This is the research homepage for metaSEQ.


metaSEQ: De novo metagenomic genome assembler

 

Introduction

metaSEQ is the first de novo metagenomic sequence assembly program. It addresses a computational challenge common to many metagenomic projects: building a robust de novo assembler that can simultaneously assemble multiple diverse, yet highly similar, microbial genomes or short genomic regions. metaSEQ can assemble short regions of microbial genomes, such as 16S rRNA gene sequences, from micro-reads 100 bp long (for comparison, the 454 sequencer produces reads of about 250 bp).

metaSEQ uses an Eulerian graph as its backbone, on top of which it implements several novel algorithmic and statistical methods to resolve read orientations, correct sequencing errors, assemble sequences, estimate abundance levels, and cluster the assembled sequences into species. In multiple simulations, metaSEQ accurately assembled 1 million reads from 100 species (similar or dissimilar) within an hour on a regular 32-bit desktop computer. This time and space efficiency suggests that the framework could be scaled up for de novo assembly of whole microbial genomes.
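This page does not describe metaSEQ's graph construction in detail. As a hedged illustration of the general idea behind Eulerian-graph assembly, the sketch below builds a de Bruijn graph from the distinct k-mers in a set of reads and walks it with Hierholzer's algorithm. It assumes error-free reads from one strand of a single sequence with no repeats of length k-1 or longer, and it leaves out everything metaSEQ adds on top (orientation resolution, error correction, abundance estimation, clustering). The names build_de_bruijn, eulerian_path and assemble are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer node to its successors; every distinct k-mer is one edge."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted for a deterministic traversal order
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: return the node sequence using every edge once,
    or None if no Eulerian path exists."""
    out_deg = defaultdict(int)
    in_deg = defaultdict(int)
    for u, vs in graph.items():
        out_deg[u] += len(vs)
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    if not nodes:
        return None
    diff = {n: out_deg[n] - in_deg[n] for n in nodes}
    starts = [n for n in nodes if diff[n] == 1]
    if any(abs(d) > 1 for d in diff.values()) or len(starts) > 1:
        return None  # degree imbalance rules out an Eulerian path
    start = starts[0] if starts else min(n for n in nodes if out_deg[n] > 0)
    remaining = {u: list(vs) for u, vs in graph.items()}  # edges not yet walked
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    n_edges = sum(len(vs) for vs in graph.values())
    # A shorter walk means some edges were unreachable (disconnected graph).
    return path if len(path) == n_edges + 1 else None


def assemble(reads, k):
    """Spell a contig from the Eulerian path, or return None."""
    path = eulerian_path(build_de_bruijn(reads, k))
    if path is None:
        return None
    return path[0] + "".join(node[-1] for node in path[1:])
```

For example, assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4) spells ATGGCGTGCA. Deduplicating k-mers before adding edges is what lets overlapping reads share edges. With real metagenomic data, sequencing errors, repeats, and regions shared by similar genomes create branching nodes, so a single Eulerian path is generally not unique.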

metaSEQ offers many functions, each of which can be enabled or disabled with options (http://www-scf.usc.edu/~sungjech/metaSEQ/metaSEQ.html).

 

Download

1. Source code: metaSEQ-release-v1.0.tar.gz

2. User guide: user_guide.pdf

3. Simulated read sets

A. 10 underlying sequences, 100K reads, 0.2% sequencing error rate: ITS_02_100K.tar.gz

B. 10 underlying sequences, 1M reads, 0.2% sequencing error rate: ITS_02_1M.tar.gz

C. 100 underlying sequences, 1M reads, 0.2% sequencing error rate: ITS_02_1M_100.tar.gz
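The page does not say how these read sets were generated. As a rough, hypothetical illustration of what a 0.2% error rate means, the sketch below samples fixed-length reads uniformly from template sequences and substitutes each base independently with probability 0.002. At 100 bp per read that is 100 × 0.002 = 0.2 expected errors per read, or about 20,000 errors per 100K reads. The function name simulate_reads, the substitution-only error model, and the uniform sampling are assumptions, not the procedure behind the files above.

```python
import random

BASES = "ACGT"


def simulate_reads(templates, n_reads, read_len, error_rate, seed=0):
    """Hypothetical simulator: sample reads uniformly from templates (each must be
    at least read_len long) and replace each base with a different random base
    with probability error_rate (substitutions only, single strand)."""
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        template = rng.choice(templates)
        start = rng.randrange(len(template) - read_len + 1)
        bases = list(template[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                bases[i] = rng.choice([b for b in BASES if b != base])
        reads.append("".join(bases))
    return reads
```

A call such as simulate_reads(templates, 100_000, 100, 0.002) would produce a set with the same read count, read length, and error rate as set A, though real sequencer errors also include insertions and deletions, which this sketch omits.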

 

HowTo

1. Preparation

A. This project assumes a Linux environment.

B. The needle program from the EMBOSS package is used to calculate pairwise similarity ratios among template sequences.
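EMBOSS needle computes an optimal global (Needleman–Wunsch) alignment. To show what a similarity ratio derived from a global alignment looks like, here is a minimal Needleman–Wunsch sketch that reports identity: identical aligned positions divided by alignment length. It is not a substitute for needle, which uses a substitution matrix and affine gap penalties; this sketch assumes hypothetical linear scores (match +1, mismatch -1, gap -1), and the function name global_identity is hypothetical.

```python
def global_identity(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch with a linear gap penalty; returns the fraction of
    alignment columns that are identical in one optimal global alignment."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell, counting columns and identities.
    i, j = n, m
    identities = length = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            identities += a[i - 1] == b[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        length += 1
    return identities / length if length else 1.0
```

For instance, global_identity("ACGT", "ACCT") aligns the two sequences without gaps and returns 0.75. For real template comparisons, needle itself remains the tool the pipeline relies on.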

2. Compile

A. Unpack the tar file with ‘tar -xzvf metaSEQ-release-v1.0.tar.gz’.

B. Type ‘make’.

C. Type ‘make install’.

3. Place the simulated read set in the ‘data’ directory.

4. Adjust the configuration file (see user_guide.pdf).

5. Assembly

Type ‘./metaSEQ’.

There are various options, such as the expected target sequence length (minimum and maximum length thresholds), the depth of the initial layer, the depth of the inter layer, and so on. See ‘Adjusting Configuration File’ in the user guide.

6. Check the debugging messages in the ‘debug_message’ directory.

7. Check the template sequences, clusters, and their abundance levels. Template sequences are saved in the ‘data’ directory with the extension .template (*.template), and clustering information is saved in the ‘clusters’ directory.
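The page does not say how metaSEQ forms clusters or what the files in ‘clusters’ contain. Purely as a hypothetical illustration of grouping templates into species-level clusters with summed abundance, the sketch below applies single-linkage clustering (union-find) to pairwise similarities. The similarity function here is a simple position-by-position identity for equal-length sequences, standing in for an alignment-based ratio such as the one needle provides; the threshold, the abundance values, and all names (hamming_identity, cluster_templates) are assumptions.

```python
def hamming_identity(a, b):
    """Stand-in similarity: fraction of identical positions for equal-length
    sequences (0.0 if lengths differ). Not metaSEQ's measure."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a) if a else 1.0


def cluster_templates(templates, abundance, threshold, similarity=hamming_identity):
    """Hypothetical single-linkage clustering: two templates share a cluster if a
    chain of pairs with similarity >= threshold connects them.
    Returns sorted (member names, summed abundance) pairs."""
    names = sorted(templates)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(templates[a], templates[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted((members, sum(abundance[m] for m in members))
                  for members in groups.values())
```

Single linkage is easy to reason about, but it can chain distinct groups together through intermediate sequences; complete or average linkage would be equally reasonable choices for a sketch like this.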

 


Last updated 12/03/2008 by Sungje Cho

The University of Southern California does not screen or control the content on this website and thus does not guarantee the accuracy, integrity, or quality of such content. All content on this website is provided by and is the sole responsibility of the person from whom such content originated, and such content does not necessarily reflect the opinions of the University administration or the Board of Trustees.