This is the research homepage for metaSEQ.
metaSEQ: De Novo metagenomic genome assembler
Introduction
metaSEQ is the first de novo metagenomic sequence assembly program. It is
currently the only solution to a computational challenge common to many
metagenomic projects: the need for a robust de novo sequence assembly program
capable of simultaneously assembling multiple, diverse, and yet highly similar
microbial genomes or short regions. metaSEQ can assemble short regions in
microbial genomes, such as 16S rRNA gene sequences, using micro-reads 100 base
pairs long (the 454 sequencer provides reads of about 250 bp). metaSEQ uses an
Eulerian graph as its backbone, on top of which it implements various novel
algorithmic and statistical methods to resolve read orientations, correct
sequencing errors, assemble sequences, estimate abundance levels, and cluster
assembled sequences into species. Multiple simulation results show that metaSEQ
can accurately assemble 1 million reads from 100 species (similar or dissimilar)
within an hour on a regular 32-bit desktop computer. The time and space
efficiency of the proposed computational framework demonstrates its potential to
scale up to de novo assembly of whole microbial genomes.
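To illustrate the general idea of an Eulerian-graph backbone, here is a minimal Python sketch. It is not metaSEQ's implementation: it assumes error-free reads that all come from the same strand of a single short sequence, and it skips orientation resolution, error correction, abundance estimation, and clustering. The function names (build_de_bruijn, eulerian_path, assemble) and the choice of k are hypothetical and used only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Turn every distinct k-mer into an edge from its (k-1)-prefix to its (k-1)-suffix."""
    kmers = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the result deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once."""
    in_deg = defaultdict(int)
    for succs in graph.values():
        for succ in succs:
            in_deg[succ] += 1
    nodes = set(graph) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next(
        (n for n in sorted(nodes) if len(graph.get(n, [])) - in_deg[n] == 1),
        min(nodes),
    )
    remaining = {node: list(succs) for node, succs in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path (toy, single-sequence case)."""
    graph = build_de_bruijn(reads, k)
    if not graph:
        return ""
    path = eulerian_path(graph)
    return path[0] + "".join(node[-1] for node in path[1:])


if __name__ == "__main__":
    toy_reads = ["ATGGCGT", "GCGTGCA", "GGCGTGC"]  # made-up reads
    print(assemble(toy_reads, 4))  # prints ATGGCGTGCA
```

In this toy case every (k-1)-mer occurs once, so the Eulerian path is unique. Real metagenomic data contains repeats, sequencing errors, and many similar genomes, which is where the additional methods described above come in.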
metaSEQ offers a range of features, each of which can be enabled or disabled
through options (http://www-scf.usc.edu/~sungjech/metaSEQ/metaSEQ.html).
Download
1. Source code: metaSEQ-release-v1.0.tar.gz
2. User guide: user_guide.pdf
3. Simulated read sets
A. 10 underlying sequences, 100K reads, 0.2% sequencing error rate: ITS_02_100K.tar.gz
B. 10 underlying sequences, 1M reads, 0.2% sequencing error rate: ITS_02_1M.tar.gz
C. 100 underlying sequences, 1M reads, 0.2% sequencing error rate: ITS_02_1M_100.tar.gz
HowTo
1. Preparation
A. This project assumes a Linux environment.
B. The needle program from the EMBOSS package is used to calculate pairwise similarity ratios among template sequences.
2. Compile
A. Unpack the tar file using 'tar -xzvf metaSEQ-release-v1.0.tar.gz'
B. Type 'make'
C. Type 'make install'
3. Place the simulated read set in the 'data' directory.
4. Adjust the configuration file (see user_guide.pdf).
5. Assembly
Type './metaSEQ'
Various options are available, such as the expected target sequence length (minimum and maximum length thresholds), depth of initial layer, depth of inter layer, and so on. Please check 'Adjusting Configuration File.'
6. Check debugging messages in the 'debug_message' directory.
7. Check template sequences, clusters, and their abundance levels. The template sequences are saved in the 'data' directory as *.template files, and clustering information is saved in the 'clusters' directory.
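Step 1B relies on EMBOSS needle for pairwise similarity ratios. As a rough illustration of what such a ratio measures, the sketch below computes percent identity from a Needleman-Wunsch global alignment. It is only an approximation of the concept: the simple match/mismatch/linear-gap scores are assumptions chosen for brevity, not needle's defaults (needle uses a substitution matrix with affine gap penalties), and the function name global_identity is hypothetical.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Fraction of identical columns in one optimal global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Trace back one optimal alignment, counting identical columns.
    i, j = n, m
    matches = columns = 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            if a[i - 1] == b[j - 1]:
                matches += 1
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return matches / columns if columns else 1.0


if __name__ == "__main__":
    print(global_identity("ACGT", "ACGA"))  # prints 0.75
```

For the actual workflow, use needle itself on the *.template sequences; this sketch is meant only to make the notion of a pairwise similarity ratio concrete.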
Last Updated 12/03/2008 by Sungje Cho