2012 Beyond the Genome Informatics Challenge: Digital Encoding
James David Dooling, Michael Schatz, and James Taylor
The goal of this challenge is to identify a secret message inserted into an unknown microbial genome.
See the presentation for all the details. Download the data here: btg12.tgz
The tarball contains several read sets that are the starting point for the
challenge. The reads were generated by taking a portion of an organism's
reference sequence and inserting a DNA-encoded famous quote into the sequence.
Your challenge is to identify the inserted sequence, decode the quote, and
identify its speaker.
You can use the included dna-encode.pl script to decode the message. It uses
the algorithm defined in
GM Church, Y Gao, and S Kosuri. (2012)
Next-Generation Digital Information Storage in DNA. Science. DOI: 10.1126/science.1226355
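To give a feel for what the decoding step does, here is a minimal Python sketch of the one-bit-per-base scheme described in that paper (A or C encodes 0, G or T encodes 1). The function names, the 8-bit ASCII packing, and the most-significant-bit-first ordering are assumptions made for illustration; the included dna-encode.pl script may frame the message differently, so treat its perldoc as the authority.

```python
# Sketch only: one bit per base, A/C = 0 and G/T = 1.
# ASSUMPTION: characters are 8-bit ASCII, most significant bit first.
# The real dna-encode.pl script may differ in framing or bit order.

BASE_TO_BIT = {"A": "0", "C": "0", "G": "1", "T": "1"}


def decode_dna(seq: str) -> str:
    """Turn a DNA string into text, ignoring any trailing partial byte."""
    bits = "".join(BASE_TO_BIT[base] for base in seq.upper())
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


def encode_text(text: str) -> str:
    """Turn ASCII text into DNA, never repeating the previous base."""
    choices = {"0": ("A", "C"), "1": ("G", "T")}
    bases = []
    for char in text:
        for bit in format(ord(char), "08b"):
            first, second = choices[bit]
            # Pick the alternate base if the first would create a repeat.
            base = second if bases and bases[-1] == first else first
            bases.append(base)
    return "".join(bases)


if __name__ == "__main__":
    dna = encode_text("Hi")
    print(dna, "->", decode_dna(dna))
```

Because two bases map to each bit, many different DNA strings decode to the same text. If a candidate insert decodes to noise under these assumptions, trying its reverse complement or a different starting offset may help.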
If you have Perl installed on your computer, you can get the documentation for the
dna-encode.pl script with the following command:
$ perldoc dna-encode.pl
For a brief synopsis of the script's usage, run:
$ perl dna-encode.pl --help
The types of reads in each FASTQ (.fq) file are described below.
i2x100f180.1.fq   Read 1 of Illumina 2x100 reads from 180+/-20 bp fragments
i2x100f180.2.fq   Read 2 of Illumina 2x100 reads from 180+/-20 bp fragments
i2x50f2000.1.fq   Read 1 of Illumina 2x50 reads from 2+/-0.2 kbp fragments
i2x50f2000.2.fq   Read 2 of Illumina 2x50 reads from 2+/-0.2 kbp fragments
i2x250f700.fq     Interleaved reads 1 and 2 of Illumina 2x250 reads from 700+/-50 bp fragments
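Some tools expect read 1 and read 2 in separate files, so the interleaved file may need to be split into pairs first. The following Python sketch shows one way to walk an interleaved FASTQ file pair by pair. It assumes the common four-line record layout (header, sequence, "+" line, qualities) with no wrapped sequence lines; the function names are hypothetical and not part of the challenge tarball.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple

# (header, sequence, quality) for one read
Record = Tuple[str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream.

    ASSUMPTION: exactly four lines per record, no line wrapping.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank trailing line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_interleaved(records: Iterable[Record]) -> Iterator[Tuple[Record, Record]]:
    """Group consecutive records into (read 1, read 2) pairs."""
    it = iter(records)
    for read1 in it:
        read2 = next(it, None)
        if read2 is None:
            raise ValueError("odd number of records in interleaved input")
        yield read1, read2


if __name__ == "__main__":
    demo = io.StringIO("@r/1\nACGT\n+\nIIII\n@r/2\nTTGA\n+\nIIII\n")
    for read1, read2 in split_interleaved(read_fastq(demo)):
        print(read1[0], read2[0])
```

For the real data, opening i2x250f700.fq and passing the file handle to read_fastq in place of the in-memory demo should behave the same way, provided the file follows the four-line layout.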
See this presentation for background on genome assembly and whole genome alignment.
One approach is to assemble the reads, BLAST the contigs to identify the microbe,
and align the contigs to the reference to locate the inserted sequence. Then decode
the inserted sequence using the included dna-encode.pl script.
Solution Guide available here