2014 CSHL Undergraduate Research Program in Bioinformatics |
---|
Searching for GATTACA
In this class we explored the problem of finding exact occurrences of a query
sequence in a large genome or database of sequences. Under this theme, we
started by analyzing the brute-force approach, introducing the concepts of
algorithm, complexity analysis, and E-values. Next, we discussed suffix arrays
as an index for accelerating the search, including analyzing the performance of
binary search. We also compared two classic sorting algorithms
(Selection Sort and QuickSort) and their relative performance. In the second
half of the class we discussed finding approximate occurrences of a short query
sequence in a large genome or database of sequences. We first defined the
problem by considering various metrics for an approximate occurrence, such as
Hamming distance and edit distance. We then considered different methods for
computing inexact alignments, including brute-force global and local
alignments and seed-and-extend algorithms. Finally, we discussed Bowtie, a
short-read mapping algorithm based on the Burrows-Wheeler transform, for
discovering alignments to a reference genome.
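As a rough illustration of two of the ideas above, the following sketch builds a suffix array, uses binary search to find exact occurrences, and falls back to a brute-force Hamming-distance scan for approximate ones. It is not the course's own code: the function names and the toy genome string are made up for this example, and the suffix array is built by naively sorting suffixes, which is only practical for small inputs.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in sorted order.

    Naive construction (assumption for this sketch): sorting whole
    suffixes is simple but far too slow for genome-sized inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_exact(text, suffix_array, query):
    """Binary-search the suffix array for every exact occurrence of query."""
    m = len(query)

    # Lower bound: first suffix whose m-character prefix is >= query.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > query.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix between the two bounds starts with query.
    return sorted(suffix_array[start:lo])


def hamming_distance(a, b):
    """Count mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def find_approximate(text, query, max_mismatches):
    """Brute-force scan: positions within max_mismatches of query."""
    m = len(query)
    return [i for i in range(len(text) - m + 1)
            if hamming_distance(text[i:i + m], query) <= max_mismatches]


genome = "TGATTACAGATTACC"  # hypothetical toy sequence
sa = build_suffix_array(genome)
print(find_exact(genome, sa, "GATTAC"))        # [1, 8]
print(find_approximate(genome, "GATTACA", 1))  # [1, 8]
```

Once the index exists, each exact query needs only a logarithmic number of string comparisons, whereas the approximate scan still inspects every window of the text.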
Python & Bioinformatics
Python Class 1
Introduction to Python: variables, lists, conditions, loops
Python Class 2
Brute force search, dictionaries, motif finding
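A minimal sketch of what these topics might look like together: a brute-force scan for a query, and a dictionary of k-mer counts used to pick out the most frequent motif candidates. The function names and the example sequence are assumptions for illustration, not material taken from the class.

```python
def brute_force_search(genome, query):
    """Check every alignment of query against genome, left to right."""
    positions = []
    for start in range(len(genome) - len(query) + 1):
        if genome[start:start + len(query)] == query:
            positions.append(start)
    return positions


def count_kmers(sequence, k):
    """Build a dictionary mapping each k-mer to its number of occurrences."""
    counts = {}
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts


def most_frequent_kmers(sequence, k):
    """Return the sorted list of k-mers that occur most often."""
    counts = count_kmers(sequence, k)
    best = max(counts.values())
    return sorted(kmer for kmer, n in counts.items() if n == best)


sequence = "ACGTTGCATGTCGCATGATGCATGAGAGCT"  # hypothetical example input
print(brute_force_search(sequence, "GCAT"))  # [5, 12, 19]
print(most_frequent_kmers(sequence, 4))      # ['CATG', 'GCAT']
```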
IPython Notebooks for Probability & Statistics
- Rolling a die (Uniform Random Probability)
- Flipping a coin (Binomial & Normal Distributions)
- Throwing Marbles into Jars (Poisson Distribution)
- Throwing Darts (Exponential Distribution)
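The notebooks themselves are not reproduced here. As a hedged sketch of the kind of simulation the first two titles suggest, the code below rolls a fair die and counts heads in repeated coin-flip trials using only the standard library. The function names, seed, and sample sizes are assumptions chosen for this example.

```python
import random
from collections import Counter


def roll_die(num_rolls, rng):
    """Tally the faces seen when rolling a fair six-sided die."""
    return Counter(rng.randint(1, 6) for _ in range(num_rolls))


def count_heads(num_flips, rng, p_heads=0.5):
    """Number of heads in num_flips independent coin flips."""
    return sum(rng.random() < p_heads for _ in range(num_flips))


rng = random.Random(2014)  # fixed seed so reruns give the same numbers

# Uniform distribution: each face should appear about 1/6 of the time.
num_rolls = 6000
rolls = roll_die(num_rolls, rng)
for face in range(1, 7):
    print(face, rolls[face] / num_rolls)

# Binomial distribution: heads per 100 flips cluster around 50, and the
# histogram of many such trials looks approximately normal.
heads_per_trial = [count_heads(100, rng) for _ in range(1000)]
mean_heads = sum(heads_per_trial) / len(heads_per_trial)
print(mean_heads)
```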
We also used the exercises at Rosalind throughout the course.
Special topics
Talk by Anne Churchland on balancing work and life.
|