2013 Beyond the Genome Informatics Challenge: Metagenomics Variant Encoding
Sven-Eric Schelhorn and Michael Schatz
The goal of this challenge is to identify a secret message we encoded as
variants into a metagenomics sample. The sample was generated by mixing
portions of the reference sequences of several microbial species. Sequence reads
were simulated from these portions. Within each portion of a
reference sequence, a foreign insert (i.e., not originating from any of the microbial
species) was placed. These inserts encode a message.
See the presentation for all the details. Download the data here: btg2013.tgz
You can use the included dna-encode.pl script to decode the message. It uses
the algorithm defined in
GM Church, Y Gao, and S Kosuri. (2012)
Next-Generation Digital Information Storage in DNA. Science. DOI: 10.1126/science.1226355
If you have Perl installed on your computer, you can get the documentation for the
dna-encode.pl script with the following command:
$ perldoc dna-encode.pl
For a brief synopsis of its options, run:
$ perl dna-encode.pl --help
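To illustrate the idea behind the script, here is a minimal Python sketch of the one-bit-per-base scheme described by Church et al.: A or C stands for a 0 bit, G or T for a 1 bit. The function names, the 8-bit ASCII grouping, the most-significant-bit-first order, and the rule for choosing between the two possible bases are assumptions made for this example. dna-encode.pl may differ in these details, so use the script itself for the real decoding.

```python
# Sketch of a one-bit-per-base code: A/C -> 0, G/T -> 1.
# Assumption: bits are grouped into 8-bit ASCII characters, MSB first.
BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def dna_to_text(dna):
    """Decode a DNA string into text, ignoring trailing bases that do not fill a byte."""
    bits = "".join(BASE_TO_BIT[base] for base in dna.upper())
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


def text_to_dna(text):
    """Encode ASCII text as DNA.

    Assumption: the base is picked by position parity (A/G at even positions,
    C/T at odd positions), so identical bases are never adjacent.
    """
    bits = "".join(format(byte, "08b") for byte in text.encode("ascii"))
    return "".join(
        "AC"[i % 2] if bit == "0" else "GT"[i % 2]
        for i, bit in enumerate(bits)
    )
```

Because two bases map to each bit, many different DNA strings decode to the same text. A single substitution changes the message only if it swaps between the {A, C} and {G, T} groups.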
The included files, and the types of reads in each FASTQ (.fastq.gz) file, are described below.
- dna-encode.pl: Perl script to encode/decode text to/from DNA
- sh_end_{1,2}.fastq.gz: Paired-end read data from the mixed references, FASTQ format, 2x250 bp from 1000 +/- 50 bp fragments
- lo_end_{1,2}.fastq.gz: Paired-end read data from the mixed references, FASTQ format, 2x150 bp from 5300 +/- 500 bp fragments
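For a quick look at the read files before running an assembler or mapper, the following Python sketch iterates over gzipped paired-end FASTQ data. It assumes plain four-line FASTQ records (no wrapped sequence lines) and that mates appear in the same order in the `_1` and `_2` files. The function names are made up for this example.

```python
import gzip
from itertools import islice


def read_fastq(handle):
    """Yield (name, sequence, quality) tuples from an open text handle.

    Assumes four lines per record; an incomplete trailing record is skipped.
    """
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header[1:], seq, qual  # drop the leading "@" of the header


def read_pairs(path_1, path_2):
    """Yield (mate_1, mate_2) record pairs from two gzipped FASTQ files."""
    with gzip.open(path_1, "rt") as h1, gzip.open(path_2, "rt") as h2:
        yield from zip(read_fastq(h1), read_fastq(h2))


# Example usage (requires the files from the challenge archive):
# for (name_1, seq_1, _), (name_2, seq_2, _) in read_pairs(
#         "sh_end_1.fastq.gz", "sh_end_2.fastq.gz"):
#     print(name_1, len(seq_1), len(seq_2))
```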
Hints
- It’s a metagenomic sample. Choose your tools accordingly.
- After you have identified an insert, you need to determine its wildtype. There are several ways to distinguish it from the variants: BLAST, consensus, pairwise similarities...
- NCBI BLAST may be unreliable due to the government shutdown. If so, try the public BLAST server at EBI/EMBL (WU-BLAST).
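The consensus approach mentioned in the second hint can be sketched in Python as a column-wise majority vote. This is only an illustration under strong assumptions: the insert copies have already been extracted, have equal length, and are aligned without gaps. The function names are hypothetical.

```python
from collections import Counter


def consensus(sequences):
    """Return the column-wise majority base of equal-length, pre-aligned sequences."""
    if len({len(seq) for seq in sequences}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sequences)
    )


def variant_positions(sequence, reference):
    """List (position, reference_base, observed_base) where the two sequences differ."""
    return [
        (i, ref_base, obs_base)
        for i, (ref_base, obs_base) in enumerate(zip(reference, sequence))
        if ref_base != obs_base
    ]
```

With the consensus taken as a candidate wildtype, `variant_positions` lists where each insert copy deviates from it. A BLAST hit or pairwise comparison can then be used to cross-check the candidate.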
Solution Guide
Solution Guide available here