Genome Assembly Class (2006)
An eight-part lecture series given at the University of Hawaii, August 13-18, 2006. The
lecture series covers the entire assembly process, from sequencing reactions
through assembly and finishing. The discussion begins with an overview of the assembly
process and its theoretical foundations: Lander-Waterman statistics and
Shortest-Common-Superstring. Next comes an in-depth discussion of the Celera
Assembler, covering the details of overlapping, unitigging, and scaffolding.
An introduction to AMOS follows, describing its motivation, its framework, and
some of the currently available tools. Lecture 4
discusses current methods to discover mis-assemblies and the Interactive Genome
Visual Analytics tool Hawkeye, which acts as a visual portal to understanding
and validating your assembly data. Lecture 5 turns to two common problems in
assembly, base calling and trimming, and describes AutoEditor and
AutoJoiner, second-generation assembly tools that address these areas.
Lecture 6 is given by Adam Phillippy and covers all aspects of Whole Genome
Alignment, centered on the MUMmer suite. The following lecture, also by
Adam Phillippy, describes the AMOScmp Comparative Assembler, which uses MUMmer to
assemble genomes without the costly overlapping step, even at extremely low
coverage. The final lecture summarizes the class and provides a checklist
of potential problem areas one might encounter during whole-genome assembly.
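
As a rough illustration of the Lander-Waterman statistics mentioned above, the sketch below computes the expected coverage, the expected number of contigs, and the expected fraction of the genome covered under the standard idealized model (uniformly random reads of equal length, no repeats). The function name, parameter names, and example numbers are hypothetical and are not taken from the lecture materials.

```python
import math


def lander_waterman(genome_len, read_len, num_reads, min_overlap=0):
    """Idealized Lander-Waterman estimates for a shotgun project.

    Assumes reads of equal length placed uniformly at random on the genome.
    min_overlap is the smallest overlap (in bases) an assembler can detect.
    """
    # Coverage c = N * L / G
    coverage = num_reads * read_len / genome_len
    # sigma = 1 - T / L, the fraction of a read that must be free of overlap
    sigma = 1.0 - min_overlap / read_len
    # Expected number of contigs (islands): N * exp(-c * sigma)
    expected_contigs = num_reads * math.exp(-coverage * sigma)
    # Probability that a given base is covered by at least one read
    fraction_covered = 1.0 - math.exp(-coverage)
    return coverage, expected_contigs, fraction_covered


if __name__ == "__main__":
    # Hypothetical project: 1 Mbp genome, 16,000 reads of 500 bp (8x coverage)
    c, contigs, frac = lander_waterman(1_000_000, 500, 16_000)
    print(f"coverage={c:.1f}x contigs={contigs:.1f} covered={frac:.5f}")
```

With these example numbers the model predicts roughly five contigs at 8x coverage. Real projects typically end up more fragmented, because repeats and sequencing biases violate the model's assumptions.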
1. Genome Assembly: Assembly Concepts and Methods:
   Lander-Waterman Statistics, Shortest-Common-Superstring
2. Celera Assembler: Theory and Practice:
   runCA, overlapper, unitigging, scaffolding
3. AMOS: A Modular Open Source Assembler:
   AMOS overview, runAMOS, AMOS banks, Converters
4. AMOS Assembly Validation and Visualization:
   Mate-pairs, SNPs, Coverage levels, Hawkeye, stitchContigs, Assembly Repair
5. Improving Assembly without Sequencing:
   Basecalling, AutoEditor, Trimming, AutoJoiner
6. Whole Genome Alignment:
   Alignment, Smith-Waterman, MUMmer, Suffix Trees
7. Comparative Genome Assembly:
   AMOScmp, MUMmer, reference assembly
8. Assembly Checklist:
   Sequencing, Libraries, Biases, Coverage, Unitigging, Scaffolding
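
The Shortest-Common-Superstring formulation listed under lecture 1 can be illustrated with the classic greedy heuristic: repeatedly merge the two reads with the longest suffix-prefix overlap. This is a minimal sketch for error-free toy reads, not the method used by the Celera Assembler or AMOS. The function names and example reads are hypothetical.

```python
def overlap(a, b):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(reads):
    """Greedy approximation to the shortest common superstring."""
    # Drop duplicate reads and reads fully contained in another read.
    reads = list(dict.fromkeys(reads))
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = -1, 0, 1
        # Find the ordered pair (i, j) with the largest overlap.
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping bases only once.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads[0] if reads else ""


if __name__ == "__main__":
    print(greedy_superstring(["ACGTAC", "GTACGG", "CGGTTA"]))  # ACGTACGGTTA
```

The greedy result is always a superstring of the input reads, but it is not guaranteed to be the shortest one, and on real genomes repeats can cause it to merge reads that do not belong together.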