1 |
26-Aug |
Mon |
Introduction |
Sign Up for Piazza |
* Molecular Structure of Nucleic Acids (Watson and Crick, 1953, Nature) * Biological data sciences in genome research (Schatz, 2015, Genome Research) * Big Data: Astronomical or Genomical? (Stephens et al, 2015, PLOS Biology) |
2 |
28-Aug |
Wed |
Genomic Technologies, kmers |
Assignment 1 |
* Coming of age: ten years of next-generation sequencing technologies (Goodwin et al, 2016, Nature Reviews Genetics) * Guide to k-mer approaches for genomics across the tree of life (Jenike et al., 2024, arXiv) |
* |
2-Sep |
Mon |
\({\color{red}\text{Labor Day}}\) |
|
|
3 |
4-Sep |
Wed |
Assembly, WGA |
|
* Toward simplifying and accurately formulating fragment assembly. (Myers, 1995, J. Comp. Bio.) * Velvet: Algorithms for de novo short read assembly using de Bruijn graphs (Zerbino and Birney, 2008, Genome Research) * SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing (Bankevich, et al. 2012, J Comput Biol) * MUMmer: Alignment of Whole Genomes (Delcher et al, 1999, NAR) |
4 |
9-Sep |
Mon |
Human Genome, Long Reads |
Assignment 2 |
* Initial sequencing and analysis of the human genome (International Human Genome Sequencing Consortium, 2001, Nature) * FALCON-unzip: Phased diploid genome assembly with single-molecule real-time sequencing (Chin et al, 2016, Nature Methods) * MHAP: Assembling large genomes with single-molecule sequencing and locality-sensitive hashing (Berlin et al, 2015, Nature Biotech) * Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation (Koren et al, 2017, Genome Research) * Telomere-to-telomere assembly of diploid chromosomes with Verkko (Rautiainen et al, 2023, Nature Biotechnology) * Piercing the dark matter: bioinformatics of long-range sequencing and mapping (Sedlazeck et al, 2018, Nature Reviews Genetics) |
5 |
11-Sep |
Wed |
T2T, HPRC, pangenome |
|
* The complete sequence of a human genome (Nurk et al, 2022, Science) * Approaching complete genomes, transcriptomes and epi-omes with accurate long-read sequencing (Kovaka et al, 2023, Nature Methods) * A draft human pangenome reference (Liao et al, 2023, Nature) * Beyond the Human Genome Project: The Age of Complete Human Genome Sequences and Pangenome References (Taylor et al., 2024, Annual Review of Genomics and Human Genetics) |
6 |
16-Sep |
Mon |
Read Mapping |
|
* How to map billions of short reads onto genomes (Trapnell and Salzberg, 2009, Nature Biotech) * Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome (Langmead et al, 2009, Genome Biology) * BWA-MEM: Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM (Li, 2013, arXiv) * Sapling: Accelerating Suffix Array Queries with Learned Data Models (Kirsche et al, 2020, bioRxiv) |
7 |
18-Sep |
Wed |
Variant Analysis |
|
* Haplotype-based variant detection from short-read sequencing (Garrison and Marth, 2012, arXiv) * The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data (McKenna et al, 2010, Genome Research) * A universal SNP and small-indel variant caller using deep neural networks (Poplin et al, 2018, Nature Biotechnology) * SAM/BAM/Samtools: The Sequence Alignment/Map format and SAMtools (Li et al, 2009, Bioinformatics) * IGV: Integrative genomics viewer (Robinson et al, 2011, Nature Biotech) |
8 |
23-Sep |
Mon |
Human evolution |
Assignment 3 |
* An integrated map of genetic variation from 1,092 human genomes (1000 Genomes Consortium, 2012, Nature) * Analysis of protein-coding genetic variation in 60,706 humans (Lek et al, 2016, Nature) * A Draft Sequence of the Neandertal Genome (Green et al, 2010, Science) * Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals (Vernot et al, 2016, Science) * Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) (Schatz et al, 2022, Cell Genomics) |
9 |
25-Sep |
Wed |
Intro to ML: PCA, Clustering, tSNE, UMAP, Decision Trees, NN |
|
* What are decision trees? (Kingsford and Salzberg, 2008, Nature Biotechnology) * What is a hidden Markov model? (Eddy, 2004, Nature Biotechnology) * Deep learning in biomedicine (Wainberg et al, 2018, Nature Biotechnology) * Visualizing Data Using t-SNE (van der Maaten and Hinton, 2008, Journal of Machine Learning Research) |
10 |
30-Sep |
Mon |
CNN + DeepVariant |
|
* ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012, NIPS) |
11 |
2-Oct |
Wed |
Functional Analysis 1: Annotation |
|
* BLAST: Basic Local Alignment Search Tool (Altschul et al, 1990, J Mol Biol) * Glimmer: Microbial gene identification using interpolated Markov models (Salzberg et al, 1998, NAR) * MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects (Holt and Yandell, 2011, BMC Bioinformatics) * BEDTools: a flexible suite of utilities for comparing genomic features (Quinlan & Hall, 2010, Bioinformatics) |
12 |
7-Oct |
Mon |
Functional Analysis 2: RNA-seq |
Assignment 4 |
* RNA-Seq: a revolutionary tool for transcriptomics (Wang et al, 2009. Nature Reviews Genetics) * Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks (Trapnell et al, 2012, Nature Protocols) * Salmon provides fast and bias-aware quantification of transcript expression (Patro et al, 2017, Nature Methods) * Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications (Krueger and Andrews, 2011, Bioinformatics) |
13 |
9-Oct |
Wed |
Functional Analysis 3: Methyl-seq, Chip-seq, and Hi-C |
|
* ChIP-seq and beyond: new and improved methodologies to detect and characterize protein-DNA interactions (Furey, 2012, Nature Reviews Genetics) * PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls (Rozowsky et al. 2009. Nature Biotech) * Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome (Lieberman-Aiden et al, 2009, Science) |
14 |
14-Oct |
Mon |
Functional Analysis 4: Regulatory States, ENCODE, GTEx, RoadMap |
Project proposal |
* An integrated encyclopedia of DNA elements in the human genome (The ENCODE Project Consortium, Nature, 2012) * Genetic effects on gene expression across human tissues (GTEx Consortium, Nature, 2017) * Integrative analysis of 111 reference human epigenomes (Roadmap Epigenome Consortium, Nature, 2015) * ChromHMM: automating chromatin-state discovery and characterization (Ernst & Kellis, 2012, Nature Methods) * Segway: Unsupervised pattern discovery in human chromatin structure through genomic segmentation (Hoffman et al, 2012, Nature Methods) |
15 |
16-Oct |
Wed |
Functional Analysis 5: Single Cell Genomics |
|
* Ginkgo: Interactive analysis and assessment of single-cell copy-number variations (Garvin et al, 2015, Nature Methods) * The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells (Trapnell et al, Nature Biotech, 2014) * Eleven grand challenges in single-cell data science (Lahnemann et al, Genome Biology, 2020) |
16 |
21-Oct |
Mon |
Transformers |
Assignment 5 |
* Attention is all you need (Vaswani et al. 2017, arXiv) |
17 |
23-Oct |
Wed |
Transformers + Enformer |
|
* Effective gene expression prediction from sequence by integrating long-range interactions (Avsec et al., 2021, Nature Methods) * Personal transcriptome variation is poorly explained by current genomic deep learning models (Huang et al., 2023, Nature Genetics) * Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings (Sasse et al., 2023, Nature Genetics) |
18 |
28-Oct |
Mon |
Other applications of DL in Genomics |
Prelim report assigned |
* Deep Learning Sequence Models for Transcriptional Regulation (Sokolova et al., 2024, Annual Review of Genomics and Human Genetics) * AlphaFold: Highly accurate protein structure prediction with AlphaFold (Jumper et al, 2021, Nature) |
19 |
30-Oct |
Wed |
Midterm review |
|
|
20 |
4-Nov |
Mon |
Midterm [In class exam] |
|
|
21 |
6-Nov |
Wed |
Human Genetic Diseases |
|
* Genome-Wide Association Studies (Bush & Moore, 2012, PLOS Comp Bio) * The contribution of de novo coding mutations to autism spectrum disorder (Iossifov et al, 2014, Nature) |
22 |
11-Nov |
Mon |
Metagenomics |
Prelim Report Due; Final Report Assigned |
* Kraken: ultrafast metagenomic sequence classification using exact alignments (Wood and Salzberg, 2014, Genome Biology) * Chapter 12: Human Microbiome Analysis (Morgan and Huttenhower) |
23 |
13-Nov |
Wed |
\({\color{red}\text{No class BIODATA24}}\) |
|
|
24 |
18-Nov |
Mon |
Cancer Genomics |
|
* The Hallmarks of Cancer (Hanahan & Weinberg, 2000, Cell) * Evolution of Cancer Genomes (Yates & Campbell, 2012, Nature Reviews Genetics) * Comprehensive molecular portraits of human breast tumours (TCGA, 2012, Nature) |
25 |
20-Nov |
Wed |
In class project presentation |
|
|
* |
25-Nov |
Mon |
\({\color{red}\text{Thanksgiving Break}}\) |
|
|
* |
27-Nov |
Wed |
\({\color{red}\text{Thanksgiving Break}}\) |
|
|
26 |
2-Dec |
Mon |
In class project presentation |
|
|
27 |
4-Dec |
Wed |
In class project presentation |
|
|
* |
16-Dec |
Mon |
Final Report Due |
Final Report Due |
|