About This Page
After students have completed their gene annotations, they can use a variety of bioinformatics tools and data sources to investigate interesting biological questions about the genes they have annotated. This section contains curriculum materials developed by GEP faculty to guide students undertaking such explorations.
Searching for Transcription Start Sites in Drosophila
Last update: 12/21/2016
This presentation describes the recommended annotation strategy for identifying transcription start sites in Drosophila. The presentation provides an overview of the promoter architecture in D. melanogaster and describes the types of evidence that can be used to support transcription start site annotations.
Annotation of Transcription Start Sites in Drosophila
Last update: 12/24/2016
This walkthrough illustrates the GEP protocol for the comparative annotation of transcription start sites (TSS) in D. biarmipes. The walkthrough also includes a sample GEP TSS Report for the TSS annotation of onecut.
An Introduction to Hidden Markov Models
Last update: 02/16/2016
Developed by Dr. Anton E. Weisstein (Truman State University), Zongtai Qi and Zane Goodwin (TAs for Bio 4342), this curriculum introduces students to Hidden Markov Models (HMMs), which form the core component of most gene predictors. The lecture by Zongtai Qi uses weather prediction to illustrate the key concepts of the HMM, whereas the lecture by Zane Goodwin focuses on the HMM that models a splice donor site. The exercise developed by Dr. Weisstein includes a spreadsheet with a simple HMM that models a splice donor site. The spreadsheet allows students to examine the impact of different transition and emission probabilities on splice site predictions. A video recording of Dr. Weisstein's HMM presentation during the 2014 GEP Alumni Workshop is also available online. This HMM curriculum is also available on CourseSource.
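To give a feel for how transition and emission probabilities drive a splice donor prediction, the following is a minimal Python sketch of the Viterbi algorithm on a three-state toy HMM (exon, donor site, intron). The state names, every probability value, and the `viterbi` function are illustrative assumptions; they are not taken from the GEP spreadsheet or lectures.

```python
import math

# Hypothetical toy model: E = exon, D = splice donor site, I = intron.
STATES = ("E", "D", "I")
START = {"E": 1.0, "D": 0.0, "I": 0.0}   # sequences begin in the exon
END = {"E": 0.0, "D": 0.0, "I": 0.1}     # sequences must end in the intron
TRANS = {
    "E": {"E": 0.9, "D": 0.1, "I": 0.0},
    "D": {"E": 0.0, "D": 0.0, "I": 1.0},
    "I": {"E": 0.0, "D": 0.0, "I": 0.9},
}
EMIT = {
    "E": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "D": {"A": 0.05, "C": 0.0, "G": 0.95, "T": 0.0},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return (most probable state path, its log probability) for seq."""
    # Scores for the first base.
    v = [{s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for base in seq[1:]:
        col, ptr = {}, {}
        for s in STATES:
            # Best previous state from which to enter state s.
            prev = max(STATES, key=lambda p: v[-1][p] + log(TRANS[p][s]))
            col[s] = v[-1][prev] + log(TRANS[prev][s]) + log(EMIT[s][base])
            ptr[s] = prev
        v.append(col)
        back.append(ptr)
    # Pick the best final state, then trace the pointers back to the start.
    last = max(STATES, key=lambda s: v[-1][s] + log(END[s]))
    best = v[-1][last] + log(END[last])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return "".join(reversed(path)), best


path, logp = viterbi("CCGTAA")
print(path, logp)  # path is "EEDIII": the donor site is placed on the G
```

Raising `EMIT["D"]["A"]` or lowering `TRANS["E"]["E"]` in this sketch changes which position is called as the donor site, which is the kind of experiment the spreadsheet supports. A sequence with no feasible path under the model (for example, one shorter than three bases) yields a log probability of negative infinity.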
Introduction to Dynamic Programming
Last update: 08/04/2015
Developed by Dr. Anton E. Weisstein (Truman State University) and Mingchao Xie (TA for Bio 4342), this lecture and exercise introduce students to dynamic programming, the core algorithm used by many sequence alignment tools (e.g., BLAST). The exercise includes a spreadsheet with a dynamic programming matrix that allows students to explore the impact of different types of alignments (i.e., global, semiglobal, and local) and scoring systems on the resulting sequence alignment.
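As a rough illustration of how the alignment type changes the dynamic programming matrix, the Python sketch below fills one matrix under a linear gap penalty and reports the best score for each mode. The function name `align_score`, the default scoring values, and the choice of semiglobal variant (gaps at both ends of either sequence are free) are assumptions made for this example, not the settings used in the GEP spreadsheet.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, mode="global"):
    """Best alignment score of a and b; mode is global, semiglobal, or local."""
    if mode not in ("global", "semiglobal", "local"):
        raise ValueError("mode must be 'global', 'semiglobal', or 'local'")
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if mode == "global":
        # Leading gaps are penalized only in a global alignment.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if mode == "local":
                cell = max(cell, 0)  # a local alignment may restart anywhere
            score[i][j] = cell
            best = max(best, cell)
    if mode == "global":
        return score[-1][-1]  # both sequences must be aligned end to end
    if mode == "semiglobal":
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(score[-1]), max(row[-1] for row in score))
    return best  # local: the best cell anywhere in the matrix


for mode in ("global", "semiglobal", "local"):
    print(mode, align_score("GATTACA", "TTGATTACATT", mode=mode))
# global -1, semiglobal 7, local 7
```

The short sequence is fully contained in the longer one, so the global score pays for four gap positions while the semiglobal and local scores do not. Changing `match`, `mismatch`, or `gap` shows how the scoring system alters the result.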
From Smith-Waterman to BLAST
Last update: 07/23/2015
This lecture from Dr. Jeremy Buhler discusses the limitations of the Smith-Waterman local alignment algorithm and the heuristics that the BLAST program uses to reduce the search space and quickly produce high-scoring local alignments.
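The general idea of reducing the search space can be sketched as a seed-and-extend procedure: find short exact word matches first, then extend only those. The Python below is a heavily simplified illustration, not the BLAST implementation; the function names, the word size `k`, the scores, and the `x_drop` cutoff are all assumptions for this example, and real BLAST adds further steps such as gapped extension and statistical evaluation of hits.

```python
from collections import defaultdict


def find_seeds(query, subject, k=4):
    """Return (query_pos, subject_pos) pairs where a k-letter word matches exactly."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    return [
        (qi, sj)
        for sj in range(len(subject) - k + 1)
        for qi in index.get(subject[sj:sj + k], ())
    ]


def extend_seed(query, subject, qi, sj, k=4, match=1, mismatch=-2, x_drop=3):
    """Ungapped extension of one seed; returns (score, query_start, query_end)."""

    def walk(q, s, step):
        # Extend in one direction until the score falls x_drop below its best.
        score = best = best_len = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
            q += step
            s += step
        return best, best_len

    right, right_len = walk(qi + k, sj + k, 1)
    left, left_len = walk(qi - 1, sj - 1, -1)
    # query_end is exclusive, so query[query_start:query_end] is the aligned part.
    return k * match + left + right, qi - left_len, qi + k + right_len


query, subject = "ACGTACGT", "TTACGTACGTTT"
seeds = find_seeds(query, subject)
print(max(extend_seed(query, subject, qi, sj) for qi, sj in seeds))
```

Only the diagonals that contain a seed are ever examined, whereas Smith-Waterman fills every cell of the full matrix. That saving is also why a seed-based search can miss alignments that contain no exact word match.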
Introduction to Motifs and Motif Finding
Last update: 07/29/2014
This document contains the notes from a lecture on motif finding given by Dr. Jeremy Buhler in the Bio 4342 course at WU. The lecture covers the different approaches used to represent sequence motifs and to search for sequence motifs in a genome.
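One common way to represent a motif and search for it is a position weight matrix (PWM) of log-odds scores that is slid along a sequence. The Python below is a minimal sketch under stated assumptions: the example sites are made up for illustration (they are not real binding-site data), the background is uniform, and the names `build_pwm` and `scan` are hypothetical.

```python
import math


def build_pwm(sites, background=0.25, pseudocount=1.0):
    """Build a log-odds PWM (one dict per column) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(pwm, seq):
    """Score every window of seq against the PWM; returns (position, score) pairs."""
    width = len(pwm)
    return [
        (i, sum(col[seq[i + j]] for j, col in enumerate(pwm)))
        for i in range(len(seq) - width + 1)
    ]


# Hypothetical aligned sites, invented for this example.
sites = ["TATAAA", "TATAAA", "TATATA", "TATAAA"]
pwm = build_pwm(sites)
hits = scan(pwm, "GGCTATAAAGGC")
print(max(hits, key=lambda hit: hit[1]))  # best window starts at position 3
```

The pseudocount keeps bases that never appear in a column from receiving a score of negative infinity. In practice a score threshold decides which windows are reported as putative sites.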
Behavior and Limitations of Motif Finding
Last update: 12/10/2016
Developed by Dr. Jeremy Buhler, this exercise uses MEME to discover putative regulatory motifs in a collection of D. melanogaster promoter sequences. It also illustrates some of the challenges associated with motif finding and the limitations of motif finding programs.
Annotation of Conserved Motifs in Drosophila
Last update: 12/26/2016
This walkthrough uses FlyBase, FlyFactorSurvey, and Patser to identify transcription factor binding sites in the region surrounding the transcription start site of onecut in D. biarmipes.
Motif Discovery in Drosophila
Last update: 12/26/2016
This walkthrough uses FlyBase RNA-Seq Search and the MEME suite to discover motifs that are enriched in a collection of D. melanogaster Muller F element genes that show similar expression patterns.
Multiple Sequence Alignments with Clustal Omega
Last update: 12/08/2016
Developed by Yu He (TA for Bio 4342), this presentation provides a basic overview of the common algorithms used to generate multiple sequence alignments. The presentation also illustrates how to use Clustal Omega to generate a multiple sequence alignment for a set of orthologous proteins in order to identify conserved domains within the proteins.
Generating Multiple Sequence Alignments with ClustalW
Last update: 04/03/2011
Dr. Susan Parrish (McDaniel College) developed a basic lecture and weblem exercise (found at the end of the lecture) on using ClustalW to generate multiple sequence alignments, phylograms, and cladograms. This lecture and exercise are given prior to beginning the GEP annotation projects. Students who submit their GEP annotation projects early are then asked to generate multiple sequence alignments and phylograms that compare the putative proteins encoded within their assigned contig or fosmid with the related proteins encoded by other Drosophila species of interest to the GEP.