2012

Quantitative Biology - 2012


Lecture 1: Exact Matching

Brute-force, Binary search, quicksort, binary trees, hashing, BWT

Lecture 2: Dynamic Programming and Applications

Dynamic programming, Fibonacci, Longest-increasing subsequencing, Edit Distance, Smith-Waterman, Dynamic Time Warp, BLAST

Lecture 3: Graphs and Genomes

Graph representations, network motifs, modularity, breath-first-search, depth-first-search, eulerian path, traveling salesman, genome assembly, assembly validation.

Lecture 4: Gene Finding and HMMs

Prokaryotic and Eukaryotic gene finding, Glimmer, GlimmerHMM, Forward algorithm, Viterbi, GHMM

Code Sample: BWT Construction and Searching

Construct the BWT for a genome, and compute exact matches against it.

Handouts on the BWT

Details on BWT Construction and searching



Quantitative Biology Bootcamp - 2012


Lecture 1: Molecular Biology, Computers, and Unix

Milestones in Molecular Biology and the rise of sequencing. Overview of computer systems, introduction to unix and scripting.

Lecture 2: Sequence Alignment and Computational Thinking

Introduction to Alignment and Algorithms, Suffix Arrays, Binary Search, QuickSort, Burrows-Wheeler Transform, Bowtie.

Lecture 3: Genomic Resources

NCBI, UCSC, CSHL Meetings and Courses, Galaxy.

Lecture 4: Dicovering Origins of Replication

Background on tracking DNA replication with next-gen sequencing, Walk-through of analysis steps, Visualization of discovered replication sites.

Lecture 5: Matlab Tutorial

Introduction to Matlab, working with variables and matrices, ploting, working with strings, working with structures, working with CEL arrays.



SBU Graduate Genetics - 2012


Lecture 1: Sequence Alignment and Computational Thinking

Introduction to Alignment and Algorithms, Suffix Arrays, Binary Search, QuickSort, Burrows-Wheeler Transform, Bowtie.

Lecture 2: Whole Genome Assembly and Alignment

Theory and Practice of whole genome assembly and alignment highlighting the Celera Assembler.



CSHL Advanced Sequencing Course - 2012


Course Wiki

Schedule of presentations and slides.

Whole Genome Assembly and Alignment [ppt]

Theory and Practice of whole genome assembly and alignment highlighting Allpaths, SOAPdenovo, and the Celera Assembler.

Assembly Tutorial [ppt]

Tutorial on using Allpaths for whole genome assembly

Challenge Problem Data [ref genome]

Data for challenge problem in the tutorial



Undergraduate Research Program in Bioinformatics - 2012


Lecture 1: Sequence Alignment and Computational Thinking

In this class we explored the problem of finding exact occurrences of a query sequence in a large genome or database of sequences. Under this theme, we started by analyzing the brute force approach introducing the concepts of algorithm, complexity analysis, and E-values. Next we discussed suffix arrays as an index for accelerating the search, including analyzing the performance of binary search. We also considered two traditional algorithms for sorting (Selection Sort versus QuickSort) and their relative performance. In the second half of the class we discussed finding approximate occurrences of a short query sequence in a large genome or database of sequences. We first defined the problem by considering various metrics of an approximate occurrence such as hamming distance, or edit distance. We then considered different methods for computing inexact alignments including brute force global & local alignments, and seed-and-extend algorithms. Finally we discussed Bowtie as a Burrows-Wheeler transform based short read mapping algorithm for discovering alignments to reference genome.


Lecture 2: Sequencing Pitfalls

In this session we reviewed the currently available sequencing technologies and best practices, focusing on the widely used Illumina sequencing platform, the up and coming PacBio sequencing platform, and the recently announced Oxford Nanopore instruments. Special attention was placed on the complexities and biases with Illumina.


Lecture 3: Graphs and Genomes

The theme of this class was graphs and methods for graph analysis. The emphasis was on genome assembly but included a discussion of other biological networks including PPI networks, regulation networks, neuron interaction networks, and cell cycle graphs. In the class, we considered fundamental properties of graphs, such as nodes, edges, degrees, and shortest paths. We then examined in detail algorithms for searching graphs with a with breadth-first-search, and then approaches for finding minimum cost paths through weighed graphs (traveling salesman problem), including exhaustive search, greedy algorithms, and branch-and-bound. This lead to a discussion of the intractable nature of NP-complete problems, and reviewed several important examples (vertex cover, clique finding, knapsack problem, Hamiltonian cycle).