Engaging Students in Research in Genomics
At Washington University, interested students enroll in a one-semester upper-level laboratory course, Bio 4342 Research Explorations in Genomics, which meets for two hours of lecture and six hours of lab a week. The course is designed as a collaborative laboratory investigation of a problem in genomics, involving wet-lab generation of a large data set (finishing a genomic sequence) and computer analysis of the data (including annotation of genes, assessment of repeats, and exploration of evolutionary questions). At present the research problem involves generating finished sequence from the dot chromosome of various species of Drosophila and making comparisons among species to discern patterns of genome organization related to control of gene expression. The scientific interest is based on observations indicating that the dot chromosome is largely heterochromatic, yet contains ~80 genes in the 1.2 Mb distal region, a gene density similar to that of the euchromatic arms. Understanding this domain requires careful analysis not just of the genes present, but also of the type and distribution of repetitious sequences. A series of PowerPoint presentations on this question by SCR Elgin is available here. These lectures discuss 1) the presence of repetitious sequences in eukaryotic genomes, analyzed by techniques including hybridization kinetics (see also the excerpt from Wood et al., "Biochemistry: A Problems Approach"); 2) data establishing the nucleosome model of chromatin structure; 3) the cytological and biochemical distinction between euchromatin and heterochromatin; and 4) our current understanding of fourth chromosome organization and function. All slides are annotated to indicate the source and to provide a text for the lecture. Other cases where finished and annotated sequences are of particular interest will be taken up in future years. A single introductory lecture was added in 2011 for the Cal State CSUPERB Workshop.
The class is organized into two main sections. During the first six weeks students analyze raw sequence data generated by NIH grantees (taken from the web), identify additional sequencing reactions needed, obtain the data in collaboration with the Washington University Genome Center, and generate finished, high-quality sequence. In the second half of the course, students analyze and annotate the new sequences they (or their GEP classmates) produced during the first half. In a one-semester course, most students can finish and annotate one or two 40 kb fosmids of DNA, depending on the difficulty. As one example of such a course, a copy of the schedule for Bio 4342 for 2012 is provided, along with course information indicating the basis for grading. Note that students in the Genomics Education Partnership can participate in an annotation project without having to be involved in finishing a sequence, as some publicly posted sequence data is of sufficient quality that no further finishing is needed. Background reading for Bio 4342, including papers on DNA sequencing technology, on ways that sequencing is currently being used in a variety of research projects, and on the Drosophila genomes, is listed on the literature page.
The materials that we have adapted and authored to teach finishing using Consed (a program for viewing and editing sequence assemblies, developed by David Gordon in Phil Green's laboratory), and to teach annotation skills using a variety of web-based tools, are posted on the website. Note that academic users can obtain Consed for restricted academic use free of charge directly from the University of Washington. (For details, please visit the phred, phrap, consed website. Members of the Genomics Education Partnership should consult Chris Shaffer at Washington University.) Our annotation strategy makes extensive use of the UCSC Genome Browser (including custom tracks for our own projects), the Ensembl Genome Browser, NCBI BLAST, FlyBase, and the ExPASy Proteomics Server, all publicly accessible sites.
Bio 4342 was designed by Professors Sarah Elgin Ph.D., Department of Biology; Elaine Mardis Ph.D., Department of Genetics, Genome Center; and Jeremy Buhler Ph.D., Department of Computer Science & Engineering; with considerable input from Christopher Shaffer, Wilson Leung, and other students and staff at Washington University.
