sgdClone WashU Clones Washington University Clones Mapping and Sequencing Description This track displays the location of clones (mostly lambda and cosmid clones) from Washington University in St. Louis using the names assigned by that group. This information was downloaded from the Saccharomyces Genome Database (SGD) from the file https://downloads.yeastgenome.org/curation/chromosomal_feature/clone.tab. Credits Thanks to Washington University in St. Louis and the SGD for the data used in this track. sgdGene SGD Genes Protein-Coding Genes from Saccharomyces Genome Database Genes and Gene Predictions Description This track shows annotated genes and open reading frames (ORFs) of Saccharomyces cerevisiae obtained from the Saccharomyces Genome Database (SGD). The data were downloaded from the file ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/s_cerevisiae.gff3 on 27 Nov. 2003. This track excludes the ORFs classified as dubious by SGD. Clicking on an item in this track brings up a display that synthesizes available data on the gene from a wide variety of sources. Credits Thanks to the SGD for providing the data used in this annotation. sgdOther SGD Other Other Features from Saccharomyces Genome Database Genes and Gene Predictions Description This track shows a variety of features in the Saccharomyces cerevisiae genome, including tRNAs, transposons, centromeres, and open reading frames (ORFs) classified as dubious. The data were downloaded from the Saccharomyces Genome Database (SGD) from the file ftp://genome-ftp.stanford.edu/pub/yeast/data_download/chromosomal_feature/s_cerevisiae.gff3 on 27 Nov. 2003. Click on an item in this track to display details about it. Credits Thanks to the SGD for providing the data used in this annotation. transRegCode Regulatory Code Transcriptional Regulatory Code from Harbison Gordon et al. 
Expression and Regulation Description This track shows putative regulatory elements in Saccharomyces cerevisiae that are supported by cross-species evidence (Harbison, Gordon, et al., 2004). Harbison, Gordon, et al. performed a genome-wide location analysis with 203 known DNA-binding transcriptional regulators (some under multiple environmental conditions) and identified 11,000 high-confidence interactions between regulators and promoter regions. They then compiled a compendium of motifs for 102 transcriptional regulators based on a combination of their experimental results, cross-species conservation data for four species of yeast and motifs from the literature. Finally, they mapped these motifs to the S. cerevisiae genome. This track shows positions at which these motifs matched the genome with high confidence and at which the matching sequence was well conserved across yeast species. The details page for each putative binding site shows the sequence at that site compared to the position-specific probability matrix for the associated transcriptional regulator (shown as both a table and a graphical logo). It also indicates whether the binding site is supported by experimental (ChIP-chip) results and the number of other yeast species in which it is conserved. See also the "Reg. ChIP-chip" track for additional related information. Display Conventions The scoring ranges from 200 to 1000 and is based on the number of lines of evidence that support the motif being active. Each of the two sensu stricto species in which the motif was conserved counts as a line of evidence. If the ChIP-chip data showed good (P ≤ 0.001) evidence of binding to the transcription factor associated with the motif, that counts as two lines of evidence. If the ChIP-chip data showed weaker (P ≤ 0.005) evidence of binding, that counts as just one line of evidence. 
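As a rough illustration of the counting rule just described, the Python sketch below tallies lines of evidence for one motif match and looks up the corresponding score. The function and parameter names are hypothetical, and the lookup values are copied from the evidence/score table in this section; this is not the code used to build the track.

```python
# Score lookup copied from the track's evidence/score table.
SCORE_BY_EVIDENCE = {4: 1000, 3: 500, 2: 333, 1: 250, 0: 200}


def lines_of_evidence(conserved_species, chip_p_value=None):
    """Count lines of evidence for one motif match.

    conserved_species: how many of the two sensu stricto species
        conserve the motif (0, 1 or 2).
    chip_p_value: ChIP-chip binding P value, or None when no binding
        data are available (an assumption of this sketch).
    """
    if conserved_species not in (0, 1, 2):
        raise ValueError("conserved_species must be 0, 1 or 2")
    evidence = conserved_species
    if chip_p_value is not None:
        if chip_p_value <= 0.001:
            evidence += 2  # good binding evidence counts twice
        elif chip_p_value <= 0.005:
            evidence += 1  # weaker binding evidence counts once
    return evidence


def track_score(conserved_species, chip_p_value=None):
    """Map the evidence count to the displayed score."""
    return SCORE_BY_EVIDENCE[lines_of_evidence(conserved_species, chip_p_value)]
```

For example, a motif conserved in one species with weak binding evidence (P = 0.003) has two lines of evidence and would be scored 333 under this sketch.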
The following table shows the relationship between lines of evidence and score:

Evidence  Score
4         1000
3         500
2         333
1         250
0         200

Credits The data for this track was provided by the Young and Fraenkel labs at MIT/Whitehead/Broad. The track was created by Jim Kent. References Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J et al. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004 Sep 2;431(7004):99-104. PMID: 15343339; PMC: PMC3006441 Supplementary data at http://younglab.wi.mit.edu/regulatory_code/ and http://fraenkel.mit.edu/Harbison/. transRegCodeProbe Reg. ChIP-chip ChIP-chip Results from Harbison Gordon et al. Expression and Regulation Description This track shows the location of the probes spotted on a slide in the chromatin immunoprecipitation/microarray hybridization (ChIP-chip) experiments described in Harbison, Gordon et al. below. Click on an item in this track to display a page showing which transcription factors pulled down DNA that is enriched for this probe sequence, which transcription factor binding site motifs are present in the probe and whether these motifs are conserved in related yeast species. See also the "Regulatory Code" track for the position of the individual motifs. Credits The data for this track was provided by the Young and Fraenkel labs at MIT/Whitehead/Broad. The track was created by Jim Kent. References Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J et al. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004 Sep 2;431(7004):99-104. PMID: 15343339; PMC: PMC3006441 Supplementary data at http://younglab.wi.mit.edu/regulatory_code/ and http://fraenkel.mit.edu/Harbison/. augustusGene AUGUSTUS AUGUSTUS ab initio gene predictions v3.1 Genes and Gene Predictions Description This track shows ab initio predictions from the program AUGUSTUS (version 3.1).
The predictions are based on the genome sequence alone. For more information on the different gene tracks, see our Genes FAQ. Methods Statistical signal models were built for splice sites, branch-point patterns, translation start sites, and the poly-A signal. Furthermore, models were built for the sequence content of protein-coding and non-coding regions as well as for the length distributions of different exon and intron types. Detailed descriptions of most of these different models can be found in Mario Stanke's dissertation. This track shows the most likely gene structure according to a Semi-Markov Conditional Random Field model. Alternative splicing transcripts were obtained with a sampling algorithm (--alternatives-from-sampling=true --sample=100 --minexonintronprob=0.2 --minmeanexonintronprob=0.5 --maxtracks=3 --temperature=2). The different models used by Augustus were trained on a number of different species-specific gene sets, which included 1000-2000 training gene structures. The --species option allows one to choose the species used for training the models. Different training species were used for the --species option when generating these predictions for different groups of assemblies.

Assembly Group                   Training Species
Fish                             zebrafish
Birds                            chicken
Human and all other vertebrates  human
Nematodes                        caenorhabditis
Drosophila                       fly
A. mellifera                     honeybee1
A. gambiae                       culex
S. cerevisiae                    saccharomyces

This table describes which training species was used for a particular group of assemblies. When available, the closest related training species was used. Credits Thanks to the Stanke lab for providing the AUGUSTUS program. The training for the chicken version was done by Stefanie König and the training for the human and zebrafish versions was done by Mario Stanke. References Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics. 2008 Mar 1;24(5):637-44.
PMID: 18218656 Stanke M, Waack S. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics. 2003 Oct;19 Suppl 2:ii215-25. PMID: 14534192 multizYeast Conservation Seven Species of Saccharomyces, Alignments & Conservation Comparative Genomics Description This track shows a measure of evolutionary conservation in seven species of the genus Saccharomyces based on a phylogenetic hidden Markov model (phastCons). The graphic display shows the alignment projected onto S. cerevisiae. The genomes were downloaded from: S. cerevisiae - http://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/ S. paradoxus - http://www.broadinstitute.org/ftp/pub/annotation/fungi/comp_yeasts/S1a.Assembly/Spar_contigs.fasta S. mikatae - http://www.broadinstitute.org/ftp/pub/annotation/fungi/comp_yeasts/S1a.Assembly/Smik_contigs.fasta S. kudriavzevii - http://www.genetics.wustl.edu/saccharomycesgenomes/Contigs/YM6553.fsa.gz S. bayanus - http://www.broadinstitute.org/ftp/pub/annotation/fungi/comp_yeasts/S1a.Assembly/Sbay_contigs.fasta S. castelli - http://www.genetics.wustl.edu/saccharomycesgenomes/Contigs/YM476.fsa.gz S. kluyveri - http://www.genetics.wustl.edu/saccharomycesgenomes/Contigs/YM479.fsa.gz In full display mode, this track shows the overall conservation score across all species as well as pairwise alignments of each species with S. cerevisiae. The pairwise alignments are shown in dense display mode using a grayscale density gradient. The checkboxes in the track configuration section allow the exclusion of species from the pairwise display; however, this does not remove them from the conservation score display. When zoomed-in to the base-display level, the track shows the base composition of each alignment. The numbers and symbols on the Gaps line indicate the lengths of gaps in the S. cerevisiae sequence at those alignment positions relative to the longest non-S. cerevisiae sequence. 
If there is sufficient space in the display, the size of the gap is shown; if not, and if the gap size is a multiple of 3, a "*" is displayed, otherwise "+" is shown. To view detailed information about the alignments at a specific position, zoom in the display to 30,000 or fewer bases, then click on the alignment. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the Graph configuration help link for an explanation of the configuration options. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. The pairwise alignments were then multiply aligned using multiz, and the resulting multiple alignments were assigned conservation scores by phastCons. The phastCons program computes conservation scores based on a phylo-HMM, a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next (Felsenstein and Churchill 1996, Yang 1995, Siepel and Haussler 2005). PhastCons uses a two-state phylo-HMM, with a state for conserved regions and a state for non-conserved regions. The value plotted at each site is the posterior probability that the corresponding alignment column was "generated" by the conserved state of the phylo-HMM. These scores reflect the phylogeny (including branch lengths) of the species in question, a continuous-time Markov model of the nucleotide substitution process, and a tendency for conservation levels to be autocorrelated along the genome (i.e., to be similar at adjacent sites). The general reversible (REV) substitution model was used. Note that, unlike many conservation-scoring programs, phastCons does not rely on a sliding window of fixed size, so short highly-conserved regions and long moderately conserved regions can both obtain high scores. More information about phastCons can be found in Siepel et al. (2005). 
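To make the idea of a per-site posterior probability concrete, the following Python sketch runs the forward-backward algorithm on a toy two-state HMM (conserved and non-conserved). It is only an illustration under stated assumptions: the per-column likelihoods are supplied directly instead of being computed from a phylogenetic model, and the transition and initial probabilities are arbitrary placeholder values, not parameters estimated by phastCons.

```python
def conserved_posteriors(lik_cons, lik_non,
                         stay_cons=0.9, stay_non=0.95, init_cons=0.3):
    """Posterior probability of the conserved state at each column.

    lik_cons[i] and lik_non[i] are the likelihoods of alignment column i
    under the conserved and non-conserved models. The remaining
    arguments are illustrative defaults, not phastCons parameters.
    """
    n = len(lik_cons)
    if n != len(lik_non):
        raise ValueError("likelihood lists must have equal length")
    if n == 0:
        return []

    # State 0 = conserved, state 1 = non-conserved.
    trans = [[stay_cons, 1.0 - stay_cons],
             [1.0 - stay_non, stay_non]]
    init = [init_cons, 1.0 - init_cons]
    emit = list(zip(lik_cons, lik_non))

    # Forward pass, normalised at every column to avoid underflow.
    fwd = []
    for i in range(n):
        if i == 0:
            cur = [init[s] * emit[0][s] for s in (0, 1)]
        else:
            prev = fwd[-1]
            cur = [emit[i][s] * sum(prev[r] * trans[r][s] for r in (0, 1))
                   for s in (0, 1)]
        total = sum(cur)
        fwd.append([c / total for c in cur])

    # Backward pass, also normalised column by column.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        cur = [sum(trans[s][r] * emit[i + 1][r] * bwd[i + 1][r]
                   for r in (0, 1))
               for s in (0, 1)]
        total = sum(cur)
        bwd[i] = [c / total for c in cur]

    # Combine both passes; the scaling factors cancel in the ratio.
    posteriors = []
    for f, b in zip(fwd, bwd):
        cons = f[0] * b[0]
        posteriors.append(cons / (cons + f[1] * b[1]))
    return posteriors
```

Because each column's posterior depends on its neighbours through the transition probabilities, a run of columns that favour the conserved model reinforces itself, which is how scores come to be similar at adjacent sites without a fixed-size sliding window.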
PhastCons currently treats alignment gaps as missing data, which sometimes has the effect of producing undesirably high conservation scores in gappy regions of the alignment. We are looking at several possible ways of improving the handling of alignment gaps. Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. "Wiggle track" plotting software by Hiram Clawson at UCSC. References Phylo-HMMs and phastCons: Felsenstein J, Churchill GA. A Hidden Markov Model approach to variation among sites in rate of evolution. Mol Biol Evol. 1996 Jan;13(1):93-104. PMID: 8583911 Siepel A, Haussler D. Phylogenetic Hidden Markov Models. In: Nielsen R, editor. Statistical Methods in Molecular Evolution. New York: Springer; 2005. pp. 325-351. Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S, et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Yang Z. A space-time process model for the evolution of DNA sequences. Genetics. 1995 Feb;139(2):993-1005. PMID: 7713447; PMC: PMC1206396 Chain/Net: Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz: Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Harris RS. Improved pairwise alignment of genomic DNA. Ph.D. Thesis. 
Pennsylvania State University, USA. 2007. Blastz: Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 ensGene Ensembl Genes Ensembl Genes Genes and Gene Predictions Description These gene predictions were generated by Ensembl. For more information on the different gene tracks, see our Genes FAQ. Methods For a description of the methods used in Ensembl gene predictions, please refer to Hubbard et al. (2002), also listed in the References section below. Data access Ensembl Gene data can be explored interactively using the Table Browser or the Data Integrator. For local downloads, the genePred format files for sacCer1 are available in our downloads directory as ensGene.txt.gz or in our genes download directory in GTF format. For programmatic access, the data can be queried from the REST API or directly from our public MySQL servers. Instructions on this method are available on our MySQL help page and on our blog. Previous versions of this track can be found on our archive download server. Credits We would like to thank Ensembl for providing these gene annotations. For more information, please see Ensembl's genome annotation page. References Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T et al. The Ensembl genome database project. Nucleic Acids Res. 2002 Jan 1;30(1):38-41. PMID: 11752248; PMC: PMC99161 gcPercent GC Percent Percentage GC in 20,000-Base Windows Mapping and Sequencing Description The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in a 20,000 base window. Windows with high GC content are drawn more darkly than windows with low GC content. High GC content is typically associated with gene-rich areas. Credits This track was generated at UCSC. 
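The windowed GC percentage described for the GC Percent track can be sketched in a few lines of Python. This is a minimal illustration, not the UCSC implementation: it assumes consecutive non-overlapping windows and ignores any character other than A, C, G or T when computing the percentage, neither of which is stated in the track description. The function name is hypothetical.

```python
def gc_percent_windows(sequence, window=20000):
    """Return (start, end, gc_percent) for consecutive windows.

    Windows are non-overlapping; the last one may be shorter. Bases
    other than A, C, G, T (for example N) are left out of the
    denominator, which is an assumption of this sketch.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        percent = 100.0 * gc / acgt if acgt else 0.0
        results.append((start, start + len(chunk), percent))
    return results
```

Calling `gc_percent_windows(chromosome_sequence)` with the default window size yields one value per 20,000-base window, which could then be mapped to a shade for display.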
blastHg16KG Human Proteins Human Proteins (hg16) Mapped by Chained tBLASTn Genes and Gene Predictions Description This track contains tBLASTn alignments of the peptides from the predicted and known genes identified in the hg16 Known Genes track. Methods First, the predicted proteins from the human Known Genes track were aligned with the human genome using the Blat program to discover exon boundaries. Next, the amino acid sequences that make up each exon were aligned with the S. cerevisiae sequence using the tBLASTn program. Finally, the putative S. cerevisiae exons were chained together using an organism-specific maximum gap size but no gap penalty. The single best exon chains extending over more than 60% of the query protein were included. Exon chains that extended over 60% of the query and matched at least 60% of the protein's amino acids were also included. Credits tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410. Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney. phastConsElements Most Conserved PhastCons Conserved Elements (Seven Species of Saccharomyces) Comparative Genomics Description This track shows predictions of conserved elements produced by the phastCons program. PhastCons is part of the PHAST (PHylogenetic Analysis with Space/Time models) package. The predictions are based on a phylogenetic hidden Markov model (phylo-HMM), a type of probabilistic model that describes both the process of DNA substitution at each site in a genome and the way this process changes from one site to the next. Methods Best-in-genome pairwise alignments were generated for each species using blastz, followed by chaining and netting. A multiple alignment was then constructed from these pairwise alignments using multiz. 
Predictions of conserved elements were then obtained by running phastCons on the multiple alignments with the --most-conserved option. PhastCons constructs a two-state phylo-HMM with a state for conserved regions and a state for non-conserved regions. The two states share a single phylogenetic model, except that the branch lengths of the tree associated with the conserved state are multiplied by a constant scaling factor rho (0 <= rho <= 1). The free parameters of the phylo-HMM, including the scaling factor rho, are estimated from the data by maximum likelihood using an EM algorithm. This procedure is subject to certain constraints on the "coverage" of the genome by conserved elements and the "smoothness" of the conservation scores. Details can be found in Siepel et al. (2005). The predicted conserved elements are segments of the alignment that are likely to have been "generated" by the conserved state of the phylo-HMM. Each element is assigned a log-odds score equal to its log probability under the conserved model minus its log probability under the non-conserved model. The "score" field associated with this track contains transformed log-odds scores, taking values between 0 and 1000. (The scores are transformed using a monotonic function of the form a * log(x) + b.) The raw log odds scores are retained in the "name" field and can be seen on the details page or in the browser when the track's display mode is set to "pack" or "full". Credits This track was created at UCSC using the following programs: Blastz and multiz by Minmei Hou, Scott Schwartz and Webb Miller of the Penn State Bioinformatics Group. AxtBest, axtChain, chainNet, netSyntenic, and netClass by Jim Kent at UCSC. PhastCons by Adam Siepel at Cornell University. References PhastCons Siepel A, Bejerano G, Pedersen JS, Hinrichs AS, Hou M, Rosenbloom K, Clawson H, Spieth J, Hillier LW, Richards S et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 
2005 Aug;15(8):1034-50. PMID: 16024819; PMC: PMC1182216 Chain/Net Kent WJ, Baertsch R, Hinrichs A, Miller W, and Haussler D. Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784 Multiz Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AF, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004 Apr;14(4):708-15. PMID: 15060014; PMC: PMC383317 Blastz Chiaromonte F, Yap VB, Miller W. Scoring pairwise genomic sequence alignments. Pac Symp Biocomput. 2002:115-26. PMID: 11928468 Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison R, Haussler D, and Miller W. Human-Mouse Alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961 oreganno ORegAnno Regulatory elements from ORegAnno Expression and Regulation Description This track displays literature-curated regulatory regions, transcription factor binding sites, and regulatory polymorphisms from ORegAnno (Open Regulatory Annotation). For more detailed information on a particular regulatory element, follow the link to ORegAnno from the details page. Display Conventions and Configuration The display may be filtered to show only selected region types, such as: regulatory regions (shown in light blue) regulatory polymorphisms (shown in dark blue) transcription factor binding sites (shown in orange) regulatory haplotypes (shown in red) miRNA binding sites (shown in blue-green) To exclude a region type, uncheck the appropriate box in the list at the top of the Track Settings page. Methods An ORegAnno record describes an experimentally proven and published regulatory region (promoter, enhancer, etc.), transcription factor binding site, or regulatory polymorphism.
Each annotation must have the following attributes: A stable ORegAnno identifier. A valid taxonomy ID from the NCBI taxonomy database. A valid PubMed reference. A target gene that is either user-defined, in Entrez Gene or in EnsEMBL. A sequence with at least 40 flanking bases (preferably more) to allow the site to be mapped to any release of an associated genome. At least one piece of specific experimental evidence, including the biological technique used to discover the regulatory sequence. (Currently only the evidence subtypes are supplied with the UCSC track.) A positive, neutral or negative outcome based on the experimental results from the primary reference. (Only records with a positive outcome are currently included in the UCSC track.) The following attributes are optionally included: A transcription factor that is either user-defined, in Entrez Gene or in EnsEMBL. A specific cell type for each piece of experimental evidence, using the eVOC cell type ontology. A specific dataset identifier (e.g. the REDfly dataset) that allows external curators to manage particular annotation sets using ORegAnno's curation tools. A "search space" sequence that specifies the region that was assayed, not just the regulatory sequence. A dbSNP identifier and type of variant (germline, somatic or artificial) for regulatory polymorphisms. Mapping to genome coordinates is performed periodically to current genome builds by BLAST sequence alignment. The information provided in this track represents an abbreviated summary of the details for each ORegAnno record. Please visit the official ORegAnno entry (by clicking on the ORegAnno link on the details page of a specific regulatory element) for complete details such as evidence descriptions, comments, validation score history, etc. Credits ORegAnno core team and principal contacts: Stephen Montgomery, Obi Griffith, and Steven Jones from Canada's Michael Smith Genome Sciences Centre, Vancouver, British Columbia, Canada. 
The ORegAnno community (please see individual citations for various features): ORegAnno Citation. References Lesurf R, Cotto KC, Wang G, Griffith M, Kasaian K, Jones SJ, Montgomery SB, Griffith OL, Open Regulatory Annotation Consortium. ORegAnno 3.0: a community-driven resource for curated regulatory annotation. Nucleic Acids Res. 2016 Jan 4;44(D1):D126-32. PMID: 26578589; PMC: PMC4702855 Griffith OL, Montgomery SB, Bernier B, Chu B, Kasaian K, Aerts S, Mahony S, Sleumer MC, Bilenky M, Haeussler M et al. ORegAnno: an open-access community-driven resource for regulatory annotation. Nucleic Acids Res. 2008 Jan;36(Database issue):D107-13. PMID: 18006570; PMC: PMC2239002 Montgomery SB, Griffith OL, Sleumer MC, Bergman CM, Bilenky M, Pleasance ED, Prychyna Y, Zhang X, Jones SJ. ORegAnno: an open access database and curation system for literature-derived promoters, transcription factor binding sites and regulatory variation. Bioinformatics. 2006 Mar 1;22(5):637-40. PMID: 16397004 xenoRefGene Other RefSeq Non-S. cerevisiae RefSeq Genes Genes and Gene Predictions Description This track shows known protein-coding and non-protein-coding genes for organisms other than S. cerevisiae, taken from the NCBI RNA reference sequences collection (RefSeq). The data underlying this track are updated weekly. Display Conventions and Configuration This track follows the display conventions for gene prediction tracks. The color shading indicates the level of review the RefSeq record has undergone: predicted (light), provisional (medium), reviewed (dark). The item labels and display colors of features within this track can be configured through the controls at the top of the track description page. Label: By default, items are labeled by gene name. Click the appropriate Label option to display the accession name instead of the gene name, show both the gene and accession names, or turn off the label completely.
Codon coloring: This track contains an optional codon coloring feature that allows users to quickly validate and compare gene predictions. To display codon colors, select the genomic codons option from the Color track by codons pull-down menu. For more information about this feature, go to the Coloring Gene Predictions and Annotations by Codon page. Hide non-coding genes: By default, both the protein-coding and non-protein-coding genes are displayed. If you wish to see only the coding genes, click this box. Methods The RNAs were aligned against the S. cerevisiae genome using blat; those with an alignment of less than 15% were discarded. When a single RNA aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 25% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from RNA sequence data generated by scientists worldwide and curated by the NCBI RefSeq project. References Kent WJ. BLAT--the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018 Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979 esRegGeneToMotif Reg. Module Eran Segal Regulatory Module Expression and Regulation Description This track shows predicted transcription factor binding sites based on sequence similarities upstream of coordinately expressed genes. 
In dense display mode the gold areas indicate the extent of the area searched for binding sites; black boxes indicate the actual binding sites. In other modes the gold areas disappear and only the binding sites are displayed. Clicking on a particular predicted binding site displays a page that shows the sequence motif associated with the predicted transcription factor and the sequence at the predicted binding site. Where known motifs have been identified by this method, they are named; otherwise, they are assigned a motif number. Methods This analysis applied the method of Segal et al. (2003), "Genome-wide discovery of transcriptional modules from DNA sequence and gene expression," to various pre-existing microarray datasets. A regulatory module consists of a set of genes predicted to be regulated by the same combination of DNA sequence motifs. The predictions are based on the co-expression of the set of genes in the module and on the appearance of common combinations of motifs in the upstream regions of genes assigned to the same module. Credits Thanks to Eran Segal for providing the data analysis that forms the basis for this track. The display was programmed by Jim Kent. References Segal E, Yelensky R, Koller D. Genome-wide discovery of transcriptional modules from DNA sequence and gene expression. Bioinformatics. 2003;19 Suppl 1:i273-82. PMID: 12855470 est S. cer. ESTs S. cerevisiae ESTs Including Unspliced mRNA and EST Description This track shows alignments between S. cerevisiae expressed sequence tags (ESTs) in GenBank and the genome. ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence.
It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. 
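The filter behaviour described above (wildcards, "and"/"or" combination, "include"/"exclude") can be sketched as follows. This is a hypothetical Python model of the logic, not the browser's own code: the record layout and function names are invented for illustration, matching is assumed to be case-insensitive, and several space-separated terms in one box are assumed to match if any one of them matches.

```python
from fnmatch import fnmatchcase


def matches_filter(record, criteria, logic="and"):
    """Test one record against the filter boxes.

    record: dict mapping a field name (e.g. "tissue") to its value.
    criteria: dict mapping a field name to space-separated terms,
        which may contain shell-style wildcards such as "*".
    logic: "and" requires every box to match, "or" requires any box.
    """
    if logic not in ("and", "or"):
        raise ValueError("logic must be 'and' or 'or'")
    per_box = []
    for field, terms in criteria.items():
        value = str(record.get(field, "")).lower()
        patterns = terms.lower().split()
        per_box.append(any(fnmatchcase(value, p) for p in patterns))
    if not per_box:
        return False  # an empty filter matches nothing in this sketch
    return all(per_box) if logic == "and" else any(per_box)


def apply_filter(records, criteria, logic="and", mode="include"):
    """Keep only matching records ("include") or drop them ("exclude")."""
    if mode not in ("include", "exclude"):
        raise ValueError("mode must be 'include' or 'exclude'")
    keep_matching = mode == "include"
    return [r for r in records
            if matches_filter(r, criteria, logic) == keep_matching]
```

With `criteria = {"tissue": "liver", "author": "smi*"}`, the "and" setting keeps only records that satisfy both boxes, while "or" keeps records that satisfy either one.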
Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, S. cerevisiae ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. 
PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 mrna S. cer. mRNAs S. cerevisiae mRNAs from GenBank mRNA and EST Description The mRNA track shows alignments between S. cerevisiae mRNAs in GenBank and the genome. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, the items that are more darkly shaded indicate matches of better quality. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the mRNA display. For example, to apply the filter to all mRNAs submitted by a specific author, type the name of the individual in the author box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "author" table contains the names of all individuals who can be entered into the author text box. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only mRNAs that match all filter criteria will be displayed. If "or" is selected, only mRNAs that match any one of the filter criteria will be displayed. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display mRNAs that match the filter criteria. 
If "include" is selected, the browser will display only those mRNAs that match the filter criteria. This track may also be configured to display codon coloring, a feature that allows the user to quickly compare mRNAs against the genomic sequence. For more information about this option, see the codon coloring help page. Methods GenBank S. cerevisiae mRNAs were aligned against the genome using the blat program. When a single mRNA aligned in multiple places, the alignment having the highest base identity was found. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence were kept. Credits The mRNA track was produced at UCSC from mRNA sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 simpleRepeat Simple Repeats Simple Tandem Repeats by TRF Variation and Repeats Description This track displays simple tandem repeats (possibly imperfect repeats) located by Tandem Repeats Finder (TRF), which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. Methods For more information about the TRF program, see Benson (1999). Credits TRF was written by Gary Benson. References Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217 intronEst Spliced ESTs S. cerevisiae ESTs That Have Been Spliced mRNA and EST Description This track shows alignments between the genome and those S. cerevisiae expressed sequence tags (ESTs) in GenBank that show signs of splicing when aligned against the genome.
ESTs are single-read sequences, typically about 500 bases in length, that usually represent fragments of transcribed genes. To be considered spliced, an EST must show evidence of at least one canonical intron (i.e., the genomic sequence between EST alignment blocks must be at least 32 bases in length and have GT/AG ends). By requiring splicing, the level of contamination in the EST databases is drastically reduced at the expense of eliminating many genuine 3' ESTs. For a display of all ESTs (including unspliced), see the S. cerevisiae EST track. Display Conventions and Configuration This track follows the display conventions for PSL alignment tracks. In dense display mode, darker shading indicates a larger number of aligned ESTs. The strand information (+/-) indicates the direction of the match between the EST and the matching genomic sequence. It bears no relationship to the direction of transcription of the RNA with which it might be associated. The description page for this track has a filter that can be used to change the display mode, alter the color, and include/exclude a subset of items within the track. This may be helpful when many items are shown in the track display, especially when only some are relevant to the current task. To use the filter: Type a term in one or more of the text boxes to filter the EST display. For example, to apply the filter to all ESTs expressed in a specific organ, type the name of the organ in the tissue box. To view the list of valid terms for each text box, consult the table in the Table Browser that corresponds to the factor on which you wish to filter. For example, the "tissue" table contains all the types of tissues that can be entered into the tissue text box. Multiple terms may be entered at once, separated by a space. Wildcards may also be used in the filter. If filtering on more than one value, choose the desired combination logic. If "and" is selected, only ESTs that match all filter criteria will be highlighted. 
If "or" is selected, ESTs that match any one of the filter criteria will be highlighted. Choose the color or display characteristic that should be used to highlight or include/exclude the filtered items. If "exclude" is chosen, the browser will not display ESTs that match the filter criteria. If "include" is selected, the browser will display only those ESTs that match the filter criteria. This track may also be configured to display base labeling, a feature that allows the user to display all bases in the aligning sequence or only those that differ from the genomic sequence. For more information about this option, go to the Base Coloring for Alignment Tracks page. Several types of alignment gap may also be colored; for more information, go to the Alignment Insertion/Deletion Display Options page. Methods To make an EST, RNA is isolated from cells and reverse transcribed into cDNA. Typically, the cDNA is cloned into a plasmid vector and a read is taken from the 5' and/or 3' primer. For most — but not all — ESTs, the reverse transcription is primed by an oligo-dT, which hybridizes with the poly-A tail of mature mRNA. The reverse transcriptase may or may not make it to the 5' end of the mRNA, which may or may not be degraded. In general, the 3' ESTs mark the end of transcription reasonably well, but the 5' ESTs may end at any point within the transcript. Some of the newer cap-selected libraries cover transcription start reasonably well. Before the cap-selection techniques emerged, some projects used random rather than poly-A priming in an attempt to retrieve sequence distant from the 3' end. These projects were successful at this, but as a side effect also deposited sequences from unprocessed mRNA and perhaps even genomic sequences into the EST databases. Even outside of the random-primed projects, there is a degree of non-mRNA contamination. Because of this, a single unspliced EST should be viewed with considerable skepticism. To generate this track, S. 
cerevisiae ESTs from GenBank were aligned against the genome using blat. Note that the maximum intron length allowed by blat is 750,000 bases, which may eliminate some ESTs with very long introns that might otherwise align. When a single EST aligned in multiple places, the alignment having the highest base identity was identified. Only alignments having a base identity level within 0.5% of the best and at least 96% base identity with the genomic sequence are displayed in this track. Credits This track was produced at UCSC from EST sequence data submitted to the international public sequence databases by scientists worldwide. References Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. PMID: 23193287; PMC: PMC3531190 Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank: update. Nucleic Acids Res. 2004 Jan 1;32(Database issue):D23-6. PMID: 14681350; PMC: PMC308779 Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518 uwFootprints UW Footprints UW Protein/DNA Interaction Footprints Expression and Regulation Description The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. 
We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence. Display Conventions and Configuration DNaseI-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at that location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value. Methods To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at the nucleotide level. Yeast nuclei were isolated and treated with a DNase I concentration sufficient to release short (<300 bp) DNA fragments. Small fragments were derived from two DNase I "hits" in close proximity. Each end of those fragments represents an in vivo DNase I cleavage site. The sequence and hence genomic location of these sites were then determined by DNA sequencing. Footprints were identified using a computational algorithm that evaluates short regions (between 8 and 30 bp) over which the DNase I cleavage density was significantly reduced compared with the immediately flanking regions. FDR thresholds were assigned to each footprint by comparing p-values obtained from real and shuffled cleavage data. Detailed methods are given in Hesselberth et al.
(2009), and supplementary data and source code are available from the authors. Credits This track was produced at the University of Washington by Jay R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos (jstam@stamlab.org). References Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 2009 Apr;6(4):283-9. PMID: 19305407; PMC: PMC2668528 uwFootprintsViewCounts Tag Counts UW Protein/DNA Interaction Footprints Expression and Regulation uwFootprintsTagCounts Tag Counts UW Footprints Tag Counts Expression and Regulation Description The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence.
Display Conventions and Configuration DNaseI-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at that location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value. Methods To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at the nucleotide level. Yeast nuclei were isolated and treated with a DNase I concentration sufficient to release short (<300 bp) DNA fragments. Small fragments were derived from two DNase I "hits" in close proximity. Each end of those fragments represents an in vivo DNase I cleavage site. The sequence and hence genomic location of these sites were then determined by DNA sequencing. Footprints were identified using a computational algorithm that evaluates short regions (between 8 and 30 bp) over which the DNase I cleavage density was significantly reduced compared with the immediately flanking regions. FDR thresholds were assigned to each footprint by comparing p-values obtained from real and shuffled cleavage data. Detailed methods are given in Hesselberth et al. (2009), and supplementary data and source code are available from the authors. Credits This track was produced at the University of Washington by Jay R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos (jstam@stamlab.org).
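How q values might be derived from real and shuffled p-values can be sketched as follows. This is an illustrative empirical false discovery rate calculation under stated assumptions, not the procedure of Hesselberth et al.: the function name empirical_q_values, the assumption that the real and shuffled sets cover the same number of candidate regions, and the example p-values are all hypothetical.

```python
from bisect import bisect_right

def empirical_q_values(real_p, shuffled_p):
    """Estimate a q value for each real p-value.

    The FDR at threshold t is estimated as
    (# shuffled p <= t) / (# real p <= t). The q value of an item is the
    smallest estimated FDR over all thresholds at which it is still called.
    """
    order = sorted(range(len(real_p)), key=lambda i: real_p[i])
    real_sorted = [real_p[i] for i in order]
    null_sorted = sorted(shuffled_p)

    # Estimated FDR when thresholding at each real p-value (ties included).
    fdr = []
    for t in real_sorted:
        n_real = bisect_right(real_sorted, t)
        n_null = bisect_right(null_sorted, t)
        fdr.append(min(1.0, n_null / n_real))

    # Walk from the largest p-value down so q never increases as p decreases.
    q = [0.0] * len(real_p)
    running_min = 1.0
    for pos in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr[pos])
        q[order[pos]] = running_min
    return q

# Hypothetical p-values, for illustration only.
real = [0.001, 0.01, 0.2, 0.5]
shuffled = [0.15, 0.3, 0.6, 0.9]
q_values = empirical_q_values(real, shuffled)
significant = [p for p, q in zip(real, q_values) if q < 0.1]
print(q_values, significant)
```

The final comparison against 0.1 corresponds to the cutoff used for the footprints shown in this track.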
References Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 2009 Apr;6(4):283-9. PMID: 19305407; PMC: PMC2668528 uwFootprintsViewMap Mappability UW Protein/DNA Interaction Footprints Expression and Regulation uwFootprintsMappability Mappability UW Footprints Mappability Expression and Regulation Description The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence. Display Conventions and Configuration DNaseI-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at that location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. 
Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value. Methods To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at the nucleotide level. Yeast nuclei were isolated and treated with a DNase I concentration sufficient to release short (<300 bp) DNA fragments. Small fragments were derived from two DNase I "hits" in close proximity. Each end of those fragments represents an in vivo DNase I cleavage site. The sequence and hence genomic location of these sites were then determined by DNA sequencing. Footprints were identified using a computational algorithm that evaluates short regions (between 8 and 30 bp) over which the DNase I cleavage density was significantly reduced compared with the immediately flanking regions. FDR thresholds were assigned to each footprint by comparing p-values obtained from real and shuffled cleavage data. Detailed methods are given in Hesselberth et al. (2009), and supplementary data and source code are available from the authors. Credits This track was produced at the University of Washington by Jay R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos (jstam@stamlab.org). References Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 2009 Apr;6(4):283-9.
PMID: 19305407; PMC: PMC2668528 uwFootprintsViewPrint Footprints UW Protein/DNA Interaction Footprints Expression and Regulation uwFootprintsPrints Footprints UW Protein-binding Footprints Expression and Regulation Description The orchestrated binding of transcriptional activators and repressors to specific DNA sequences in the context of chromatin defines the regulatory program of eukaryotic genomes. We developed a digital approach to assay regulatory protein occupancy on genomic DNA in vivo by dense mapping of individual DNase I cleavages from intact nuclei using massively parallel DNA sequencing. Analysis of >23 million cleavages across the Saccharomyces cerevisiae genome revealed thousands of protected regulatory protein footprints, enabling de novo derivation of factor binding motifs as well as the identification of hundreds of novel binding sites for major regulators. We observed striking correspondence between nucleotide-level DNase I cleavage patterns and protein-DNA interactions determined by crystallography. The data also yielded a detailed view of larger chromatin features including positioned nucleosomes flanking factor binding regions. Digital genomic footprinting provides a powerful approach to delineate the cis-regulatory framework of any organism with an available genome sequence. Display Conventions and Configuration DNaseI-seq cleavage counts are displayed at nucleotide resolution, along with a 'mappability' track that indicates whether tag sequences starting at that location on both the forward and the reverse strands can be uniquely mapped to the yeast genome. Finally, the set of footprints with q values <0.1 is included, where the q value is defined as the minimal false discovery rate threshold at which the given footprint is deemed significant. The name associated with each footprint is its q value.
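Because the name of each footprint is its q value, a stricter cutoff than 0.1 can be applied after the data are exported. The following is a minimal sketch, assuming the footprints have been saved as tab-separated BED-style lines (chrom, start, end, name); the function name filter_footprints, the 0.01 default, and the sample records are hypothetical.

```python
def filter_footprints(bed_lines, max_q=0.01):
    """Return (chrom, start, end, q) for footprints with q below max_q.

    Assumes the fourth column (the item name) holds the q value as text.
    """
    kept = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.split("\t")[:4]
        q = float(name)
        if q < max_q:
            kept.append((chrom, int(start), int(end), q))
    return kept

# Hypothetical records, for illustration only.
sample = [
    "chrI\t100\t112\t0.003",
    "chrI\t250\t270\t0.08",
    "chrII\t40\t55\t0.0005",
]
for record in filter_footprints(sample):
    print(record)
```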
Methods To visualize regulatory protein occupancy across the genome of Saccharomyces cerevisiae, DNase I digestion of yeast nuclei was coupled with massively parallel DNA sequencing to create a dense whole-genome map of DNA template accessibility at the nucleotide level. Yeast nuclei were isolated and treated with a DNase I concentration sufficient to release short (<300 bp) DNA fragments. Small fragments were derived from two DNase I "hits" in close proximity. Each end of those fragments represents an in vivo DNase I cleavage site. The sequence and hence genomic location of these sites were then determined by DNA sequencing. Footprints were identified using a computational algorithm that evaluates short regions (between 8 and 30 bp) over which the DNase I cleavage density was significantly reduced compared with the immediately flanking regions. FDR thresholds were assigned to each footprint by comparing p-values obtained from real and shuffled cleavage data. Detailed methods are given in Hesselberth et al. (2009), and supplementary data and source code are available from the authors. Credits This track was produced at the University of Washington by Jay R. Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J. Sabo, Richard Sandstrom, Alex P. Reynolds, Robert E. Thurman, Shane Neph, Michael S. Kuehn, William S. Noble (william-noble@u.washington.edu), Stanley Fields (fields@u.washington.edu) and John A. Stamatoyannopoulos (jstam@stamlab.org). References Hesselberth JR, Chen X, Zhang Z, Sabo PJ, Sandstrom R, Reynolds AP, Thurman RE, Neph S, Kuehn MS, Noble WS et al. Global mapping of protein-DNA interactions in vivo by digital genomic footprinting. Nat Methods. 2009 Apr;6(4):283-9. PMID: 19305407; PMC: PMC2668528
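The footprint search described in the Methods can be illustrated with a simplified scan over per-nucleotide cleavage counts. This sketch is not the published algorithm, which assigns p-values and FDR thresholds; it only compares the mean cleavage in a candidate window of 8 to 30 bp with the means of its flanks. The function name candidate_footprints, the flank width, and the depletion ratio are hypothetical parameters chosen for illustration.

```python
def candidate_footprints(counts, min_w=8, max_w=30, flank=10, ratio=0.25):
    """Return (start, end, score) windows with depleted cleavage.

    counts: per-nucleotide DNase I cleavage counts for one sequence.
    A window [start, end) qualifies when its mean count is below `ratio`
    times the lower of the two flank means. The score is that fraction.
    Overlapping candidates are reported as-is, not merged.
    """
    # Prefix sums make each window mean an O(1) lookup.
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    def mean(a, b):
        return (prefix[b] - prefix[a]) / (b - a)

    hits = []
    n = len(counts)
    for width in range(min_w, max_w + 1):
        for start in range(flank, n - flank - width + 1):
            end = start + width
            lowest_flank = min(mean(start - flank, start), mean(end, end + flank))
            if lowest_flank > 0 and mean(start, end) < ratio * lowest_flank:
                hits.append((start, end, mean(start, end) / lowest_flank))
    return hits

# Hypothetical profile: a 12 bp protected region inside accessible DNA.
profile = [20] * 15 + [1] * 12 + [20] * 15
best = min(candidate_footprints(profile), key=lambda h: h[2])
print(best)
```

A fixed ratio is used here only to keep the example short; a real analysis would replace it with a statistical test and a shuffled-data control, as the Methods describe.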