/goldenPath/help/bigRmskTrackDescExample.html:Genome_Browser_bigRmsk_RepeatMasker_Description /goldenPath/help/interact.html:interact_and_bigInteract_Track_Format Interact and bigInteract Track Format The interact (and bigInteract) track format displays pairwise interactions as arcs or half-rectangles connecting two genomic regions on the same chromosome. Cross-chromosomal interactions can also be represented in this format; the display shows the region on the currently viewed chromosome, with a vertical bar, labeled with the chromosome of the connected region (space permitting). For directional interactions such as SNP/gene, the interactions in the reverse direction are displayed as a dashed line or curve. An alternative 'cluster' view can also be configured for directional interaction data. This view groups interactions by source or by target, producing a display of linked blocks. This format is useful for displaying functional element interactions such as SNP/gene interactions, and is also suitable for low-density chromatin interactions, such as ChIA-PET, and other use cases with a limited number of interactions on the genome. It is not suitable for high-density chromatin data such as Hi-C. The interact format is available as a standalone plain text bed5+13 format for use with smaller datasets as a custom track, and as a binary indexed format (bigInteract) suitable for track hubs and custom tracks. The bigInteract format provides more track customization features (i.e. schema customization), and is recommended for users who can use command-line tools and have web-accessible data storage. If you do not have web-accessible data storage, please see the Hosting section of the Track Hub Help documentation. Interact format files are converted to bigInteract files using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) schema file defining the fields of the bigInteract. 
Interact format definition The following autoSql definition illustrates the basic schema supporting interact (and bigInteract) tracks.
table interact
"interaction between two regions"
    (
    string chrom;        "Chromosome (or contig, scaffold, etc.). For interchromosomal, use 2 records"
    uint chromStart;     "Start position of lower region. For interchromosomal, set to chromStart of this region"
    uint chromEnd;       "End position of upper region. For interchromosomal, set to chromEnd of this region"
    string name;         "Name of item, for display. Usually 'sourceName/targetName/exp' or empty"
    uint score;          "Score (0-1000)"
    double value;        "Strength of interaction or other data value. Typically basis for score"
    string exp;          "Experiment name (metadata for filtering). Use . if not applicable"
    string color;        "Item color. Specified as r,g,b or hexadecimal #RRGGBB or html color name, as in //www.w3.org/TR/css3-color/#html4. Use 0 and spectrum setting to shade by score"
    string sourceChrom;  "Chromosome of source region (directional) or lower region. For non-directional interchromosomal, chrom of this region."
    uint sourceStart;    "Start position in chromosome of source/lower/this region"
    uint sourceEnd;      "End position in chromosome of source/lower/this region"
    string sourceName;   "Identifier of source/lower/this region"
    string sourceStrand; "Orientation of source/lower/this region: + or -. Use . if not applicable"
    string targetChrom;  "Chromosome of target region (directional) or upper region. For non-directional interchromosomal, chrom of other region"
    uint targetStart;    "Start position in chromosome of target/upper/this region"
    uint targetEnd;      "End position in chromosome of target/upper/this region"
    string targetName;   "Identifier of target/upper/this region"
    string targetStrand; "Orientation of target/upper/this region: + or -. Use . if not applicable"
    )
Column Explanations The first 5 fields of the interact format are the same as the first 5 fields of the standard BED format.
See a graphical depiction below of the columns. When creating bigInteract files, we encourage you to customize the title and field descriptions of the prototype autoSql schema to better describe your data. Customizing this file will make your data more easily interpreted by users, who will see the field descriptions when accessing the track data from the Table Browser, when viewing items on the Genome Browser details pages (via the "view table schema" link), and (for users who download files), from the -as option of the bigBedInfo tool. As an example, if the dataset represents SNP/gene interactions, replace 'sourceName' and related fields with 'snpName', etc, and 'targetName' and related fields with 'geneName', etc., editing the field descriptions to reflect the changes you make. For non-directional data such as ChIA-PET, one could use 'region1' and 'region2'. As the browser display of this format only shows the paired region labels on mouseover, we recommend including a BED or other format file to display the source and target region labels. Creating interact and bigInteract custom tracks Example #1 In this example, you will create an interact custom track using example SNP/gene interaction data in multiple tissues. This example uses the interaction coloring and directionality features of the interact track type. 1. Paste the following track line into the custom track management page for the human assembly hg19. track type=interact name="interact Example One" description="An interact file" interactDirectional=true maxHeightPixels=200:100:50 visibility=full browser position chr12:40,560,500-40,660,499 #chrom chromStart chromEnd name score value exp color sourceChrom sourceStart sourceEnd sourceName sourceStrand targetChrom targetStart targetEnd targetName targetStrand chr12 40572709 40618813 rs7974522/LRRK2/muscleSkeletal 0 0.624 muscleSkeletal #7A67EE chr12 40572709 40572710 rs7974522 . 
chr12 40618812 40618813 LRRK2 + chr12 40579899 40618813 rs17461492/LRRK2/muscleSkeletal 0 0.624 muscleSkeletal #7A67EE chr12 40579899 40579900 rs17461492 . chr12 40618812 40618813 LRRK2 + chr12 40614433 40618813 rs76904798/LRRK2/nerveTibial 0 0.625 nerveTibial #FFD700 chr12 40614433 40614434 rs76904798 . chr12 40618812 40618813 LRRK2 + chr12 40618812 40652520 rs2723264/LRRK2/lung 0 1.839 lung #9ACD32 chr12 40652519 40652520 rs2723264 . chr12 40618812 40618813 LRRK2 + 2. Click the "submit" button. After the file loads in the Genome Browser, you should see four interactions displayed; four variants interacting with the same gene (LRRK2). Hovering the mouse over the curve peak of an interaction will display the interaction name (SNP/gene/tissue). Hovering over an interaction end will show the name of the end region (e.g. SNP or gene). Clicking at one of the hoverable regions will show the details page for the interaction. The interactDirectional setting causes reverse direction interactions (where target precedes source) to be displayed as dashed lines. In this example, the green (lung) interaction is in the reverse direction. Example #2 In this example, you will create an interact custom track using example chromatin interaction data. This type of data is non-directional and commonly would represent a single experiment, with the interaction score being of interest. The settings below display using the gray-scale coloring feature, where the darkness of the interaction is based on the score. 1. Paste the following track line into the custom track management page for the human assembly hg19. track type=interact name="interact Example Two" description="Chromatin interactions" useScore=on maxHeightPixels=200:100:50 visibility=full browser position chr3:64,562,440-64,642,288 chr3 64496901 64584378 . 375 3 . 0 chr3 64496901 64498901 . . chr3 64581378 64584378 . . chr3 64568052 64569134 . 400 3 . 0 chr20 52552477 52556062 . . chr3 64568052 64569134 . . 
chr3 64596901 64615378 . 175 3 . 0 chr3 64596901 64600855 . . chr3 64612677 64615378 . . chr3 64623042 64636663 . 800 4 . 0 chr3 64623042 64625153 . . chr3 64632961 64636663 . . 2. Click the "submit" button. After the file loads in the Genome Browser, you should see four interactions displayed on chromosome 3. Two of the interactions have both interacting regions in the browser view, and two have a single region. One of these interacts across chromosomes (with a region on chromosome 20), and the other with a region outside of the browser window (indicated by rectangular connector). The darkness of the interaction indicates the strength of the interaction. Also see the graphical representation of each column of this example data below. Example #3 In this example, you will create a bigInteract track out of an existing bigInteract format file, located on the UCSC Genome Browser http server. This file contains data for the hg19 assembly. This example also contains the interactUp=true setting to flip the arcs of the interact display. To create a custom track using this file: 1. Construct a track line referencing the file and set the browser position to show region of interest in the file: track type=bigInteract interactUp=true name="interact Example Three" description="A bigInteract file" useScore=on visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/interact/interactExample3.inter.bb browser position chr3:63,820,967-63,880,091 2. Paste the track line into the custom track management page for the human assembly hg19. 3. Click the "submit" button. After the file loads in the Genome Browser, you should see a number of interactions, all arching as hills instead of valleys, with some curved and many rectangular indicating a connector to a region outside of the browser window. Press the 10x zoom out button to see the full connections. 
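As a rough illustration of the 18-field layout defined in the autoSql schema, the following Python sketch splits one plain-text interact record into named fields and derives two properties used in the examples: whether the interaction crosses chromosomes, and whether it runs in the reverse direction (target precedes source). The function names are hypothetical and are not part of any UCSC tool. The sketch also assumes whitespace-separated fields with no spaces inside names.

```python
# Field names taken from the interact autoSql definition (18 columns).
FIELDS = [
    "chrom", "chromStart", "chromEnd", "name", "score", "value", "exp", "color",
    "sourceChrom", "sourceStart", "sourceEnd", "sourceName", "sourceStrand",
    "targetChrom", "targetStart", "targetEnd", "targetName", "targetStrand",
]
INT_FIELDS = ("chromStart", "chromEnd", "score",
              "sourceStart", "sourceEnd", "targetStart", "targetEnd")


def parse_interact_line(line):
    """Split one interact record into a dict keyed by autoSql field name."""
    parts = line.split()  # assumption: no whitespace inside any field
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["value"] = float(record["value"])
    if not 0 <= record["score"] <= 1000:
        raise ValueError("score must be in the range 0-1000")
    return record


def is_interchromosomal(record):
    """True when the source and target regions lie on different chromosomes."""
    return record["sourceChrom"] != record["targetChrom"]


def is_reverse(record):
    """True when both ends share a chromosome and the target precedes the source."""
    return (not is_interchromosomal(record)
            and record["targetStart"] < record["sourceStart"])


# The lung interaction from Example #1, described there as reverse direction.
lung = parse_interact_line(
    "chr12 40618812 40652520 rs2723264/LRRK2/lung 0 1.839 lung #9ACD32 "
    "chr12 40652519 40652520 rs2723264 . chr12 40618812 40618813 LRRK2 +"
)
print(is_reverse(lung))  # True
```

A check like this can be run over a plain-text interact file before loading it, to catch rows with a wrong column count or an out-of-range score.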
Example #4 In this example, you will use an example BED file to create a bigInteract file, allowing the data to be remotely accessed and exist within a track hub. The track settings for bigInteract on a hub can be viewed here. 1. Download the example file here. 2. Download the fetchChromSizes and bedToBigBed programs from the utilities directory appropriate to your operating system. 3. Use fetchChromSizes to create a chrom.sizes file for the UCSC database you are working with (hg19 for these examples). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg19.chrom.sizes file for the hg19 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.chrom.sizes. 4. Save the autoSql file interact.as to your computer. If you want the column labels to reflect non-directional interactions, you can change the default variable names from 'source...' and 'target...' to 'region1...' and 'region2...'. 5. Run bedToBigBed to create the bigInteract file: bedToBigBed -as=interact.as -type=bed5+13 interactExample4.inter.bed hg19.chrom.sizes interactExample4.inter.bb 6. Move the newly constructed bigInteract file to a web accessible http, https, or ftp location. 7. Construct a custom track line with a bigDataUrl parameter pointing to the newly created bigInteract file. track type=bigInteract name="interact Example Four" description="A bigInteract file" useScore=on bigDataUrl=/interactExample4.inter.bb visibility=pack browser position chr3:63,820,967-63,880,091 8. 
To fully take advantage of creating a bigInteract file, create a Track Hub and use a stanza such as the following: track exampleInteractTrack type bigInteract visibility full shortLabel exInteract longLabel Example interact track spectrum on scoreMin 175 maxHeightPixels 300:150:20 bigDataUrl interactExample4.inter.bb Understanding the interact file format This graphic represents the data in Example #2 with boxes around columns of data, separately illustrated as individual custom tracks in the lower image. [] The interact file format has 18 fields where the first 5 fields (box column1) are standard BED format fields, which define the span of the interaction to be viewed on a chromosome. In the below image, see the representation of box column1 and how it spans the length of each arc. Next, there are 3 fields for value, exp, and color before two sets of 5 fields that specify the coordinates, name, and strand of the source (box column2) and target (box column3) data, defining the endpoints of each interact arc. In the below image, the box column2 represents the left foot of each arc while the box column3 represents the right foot of each arc. The second row of the example data denotes an interaction to another chromosome, chr20, and thus is not represented by an arc. [] Sharing your data with others If you would like to share your interact/bigInteract data track with a colleague, learn how to create a URL by looking at Example 6 on this page. Extracting data from the bigInteract format Because bigInteract files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory. - bigBedToBed — converts a bigBed file to ASCII BED format. - bigBedSummary — extracts summary information from a bigBed file. - bigBedInfo — prints out information about a bigBed file. 
Use the -as option to see the file field descriptions. As with all UCSC Genome Browser programs, one can type the program name (with no parameters) at the command line to view the usage statement. Troubleshooting If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program. /goldenPath/help/hgCodonColoringMrna.html:Genome_Browser_Codon_Coloring Codon and Base Coloring for Alignment Tracks The Genome Browser's codon coloring feature allows users to quickly validate and compare alignment tracks, including mRNAs. To turn on codon coloring, select the desired option from the Color track by codons pull-down menu. Color Options
- genomic codons - codons are labeled and colored according to the genomic sequence
- mRNA codons - codons are labeled and colored according to the aligned sequence
- nonsynonymous mRNA codons - codons are labeled and colored according to the aligned sequence. Only nonsynonymous codons are labeled. Conservative and non-conservative substitutions are defined using the BLOSUM62 score matrix. Conservative substitutions (yellow) are defined as positive BLOSUM scores, while non-conservative substitutions (red) are defined as negative scores.
- mRNA bases - bases are labeled.
- different mRNA bases - only those bases that differ from the genomic sequence are labeled and colored.
Color Legend
         Genomic/mRNA codons                   Nonsynonymous mRNA codons              mRNA bases/different mRNA bases
-------- ------------------------------------- -------------------------------------- ---------------------------------
red      "*" stop codons                       non-conservative codon substitutions   base substitutions
yellow                                         conservative codon substitutions
cyan     spliced or truncated partial codons   spliced or truncated partial codons
green    start codons incl. methionine
black    "X" truncated codons
Details The codon reading frame is defined by the GenBank CDS annotation of the aligned sequence. Each codon is colored and labeled in the direction of transcription. A sequence without a CDS will not be colored; these can be non-protein coding RNAs or an mRNA where a CDS annotation was not provided in the original data. The mRNA bases/different mRNA bases display options are colored and labeled in the direction of the genomic sequence. Note that it is possible to show the base complement, and thus change the base labeling, by clicking the arrow to the left of the base display or by clicking the "reverse" button. When zoomed out past the base level, the browser will choose one color to represent many bases. The priority of display, from most important to least important, is: different mRNA base/nonsynonymous codon coloring (if enabled), and then alignment coloring (if enabled). The browser will not display genomic/mRNA codon coloring when viewing large regions of the genome. To view labeling, the track must be zoomed to within 3 times the base level. For information about alignment insertion/deletion display options, click here. References Henikoff S and Henikoff JG. Amino acid substitution matrices from protein blocks. PNAS. 1992;89(22):10915-10919. /goldenPath/help/metadata.html:Adding_Track_Metadata Adding metadata to tracks Contents Adding metadata to tracks Previous metadata versions Tagstorm metadata Tab-separated metadata Adding metadata to tracks Adding metadata to your tracks about cell lines, experimental protocols, or assays can be accomplished in a number of ways, via the newly supported metaDb or metaTab trackDb fields, or via the older style metadata trackDb field. The metaDb and metaTab fields link external tagStorm or tab-separated metadata files to the data in the hub.
The new formats are preferred over the older metadata field. The metadata lines will continue to be supported for track hubs, but new features will be added only for tagStorm and tab-separated files. The following is an example of a genomes.txt file calling the tagStorm metadata file:
genome hg38
metaDb relativePath/to/tagStorm.txt
and specifying a tab-separated metadata file:
genome hg38
metaTab relativePath/to/tabSep.txt
When using tab-separated or tagStorm metadata, a meta column or line will be needed to specify which metadata information is applied to a track. The meta value should be a unique alphanumeric string. Previous metadata versions With the older metadata field, you must specify all of the metadata key-value pairs in each stanza of a track that includes metadata, like the last line of the following example:
track experiment1
shortLabel Donor A
longLabel Donor A's Metadata Experiment
type bigWig
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
parent treatmentX on
subGroups view=X
metadata differentiation="10 hour" treatment=X donor=A lab="List Meta Lab" data_set_id=ucscTest1 access=group assay=long-RNA-seq enriched_in=exon life_stage=postpartum species="Homo sapiens" ucsc_db=hg38
Each track must have a separate metadata field and its own list of key-value pairs, which can become cumbersome when the tracks in a group all share a common subset of metadata. For instance, if there are 10 tracks in a composite or multiWig, where each subtrack only differs in the "differentiation" tag, it would be more convenient to have a shared set of metadata and then specify the differences for each track. This is the motivation behind the tagStorm format, described below. You can find an example of a hub using the metadata example here and you can load the following session to view the hub, https://genome.ucsc.edu/s/PublicSessions/metadata_field.
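To make the structure of the older metadata setting concrete, here is a minimal Python sketch that splits such a line into key-value pairs. It assumes shell-style quoting (double quotes protect values containing spaces), which matches the example stanza but is not a documented description of how the Genome Browser itself tokenizes the line. The function name is hypothetical.

```python
import shlex


def parse_metadata_setting(value):
    """Turn 'key=value key2="two words"' into a dict (assumes shell-style quoting)."""
    pairs = {}
    for token in shlex.split(value):
        key, sep, val = token.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {token!r}")
        pairs[key] = val
    return pairs


example = 'differentiation="10 hour" treatment=X donor=A lab="List Meta Lab"'
tags = parse_metadata_setting(example)
print(tags["differentiation"])  # 10 hour
```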
Tagstorm metadata The tagStorm format is a plain text format similar to the trackDb file that describes all of the tracks in a track hub: in both, the first word on a line is the tag, the rest of the line is the value, and stanzas are line delimited. TagStorms are also similar to a spreadsheet, where a tag corresponds to a column and a stanza to an entire row. The tagStorm format is easy for computers to parse, reduces the redundancy of a tab-separated file, and is human readable. Here is a canonical tagStorm example:
lab tagStorm Lab
data_set_id ucscTest1
access group
assay long-RNA-seq
enriched_in exon
life_stage postpartum
species Homo sapiens
ucsc_db hg38

    treatment X
    donor A

        differentiation 10 hour
        meta ucsc1_1

        differentiation 1 day
        meta ucsc1_4

        differentiation 5 days
        meta ucsc1_7

    treatment Y

        donor B

            differentiation 10 hour
            meta ucsc1_2

            differentiation 1 day
            meta ucsc1_5

            differentiation 5 days
            meta ucsc1_8

        donor C

            differentiation 10 hour
            meta ucsc1_3

            differentiation 1 day
            meta ucsc1_6

            differentiation 5 days
            meta ucsc1_9

Each stanza, such as "donor B", inherits from any stanzas above it at the right indentation level, and is a parent to stanzas beneath. In the example above, Treatment Y applies to both "donor B" and "donor C". Treatment X only applies to "donor A" as they are at the same indentation level. There are three differentiation times that apply to each of the donors and they can be referenced in the trackDb stanza using the meta line in the tagStorm file, e.g., meta ucsc1_1.
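The inheritance rule described above can be sketched in Python as follows. This is an illustrative reading of the format, not UCSC's parser: it assumes stanzas are separated by blank lines, that nesting is expressed with consistent leading spaces, and that a stanza inherits every tag from the less-indented stanzas above it. The function names and the shortened sample are hypothetical.

```python
def split_stanzas(text):
    """Yield stanzas as lists of non-blank lines; blank lines separate stanzas."""
    stanza = []
    for line in text.splitlines():
        if line.strip():
            stanza.append(line)
        elif stanza:
            yield stanza
            stanza = []
    if stanza:
        yield stanza


def resolve_tagstorm(text):
    """Map each meta value to its own tags merged with all ancestor tags."""
    stack = []      # (indent, tags) for the current chain of ancestors
    resolved = {}
    for stanza in split_stanzas(text):
        indent = len(stanza[0]) - len(stanza[0].lstrip())
        tags = {}
        for line in stanza:
            tag, _, value = line.strip().partition(" ")
            tags[tag] = value.strip()
        # Drop stanzas that are not ancestors of this one.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, tags))
        if "meta" in tags:
            merged = {}
            for _, ancestor_tags in stack:
                merged.update(ancestor_tags)
            resolved[tags["meta"]] = merged
    return resolved


SAMPLE = """\
lab tagStorm Lab
ucsc_db hg38

    treatment X
    donor A

        differentiation 10 hour
        meta ucsc1_1

    treatment Y

        donor B

            differentiation 10 hour
            meta ucsc1_2

        donor C

            differentiation 10 hour
            meta ucsc1_3
"""

resolved = resolve_tagstorm(SAMPLE)
print(resolved["ucsc1_3"]["treatment"], resolved["ucsc1_3"]["donor"])  # Y C
```

Keeping a stack of open ancestor stanzas means each leaf only has to state what differs from its parents, which is the redundancy saving the format is designed for.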
The meta ucsc1_1 line would reference the following metadata:
lab tagStorm Lab
data_set_id ucscTest1
access group
assay long-RNA-seq
enriched_in exon
life_stage postpartum
species Homo sapiens
ucsc_db hg38
treatment X
donor A
differentiation 10 hour
A trackDb stanza using the tagStorm metadata can be seen in the following example:
track experiment1
shortLabel Donor A
longLabel Donor A's TagStorm Experiment
type bigWig
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
parent treatmentX on
subGroups view=X
meta ucsc1_1
You can find the complete example of the hub using the tagStorm metadata here and you can load the following session to view the hub, https://genome.ucsc.edu/s/PublicSessions/tagStorm_metadata. The hub uses a composite track, so if you are unfamiliar with composite tracks, the Quick Start Guide on composites can explain how the tracks are organized. The details page for the trackDb stanza example (Donor A) can be seen below. [] Tab-separated metadata Column or tab-separated metadata can be useful to store computer readable information as an array. While this format is very easy for a computer to parse, it can be a bit confusing or difficult for humans to read and interpret. As you can see in the example below, many columns become redundant as they repeat the same information on each line.
#lab	data_set_id	access	assay	enriched_in	life_stage	species	ucsc_db	treatment	donor	differentiation	meta
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	X	A	10 hour	ucsc1_1
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	X	A	1 day	ucsc1_4
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	X	A	5 days	ucsc1_7
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	B	10 hour	ucsc1_2
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	B	1 day	ucsc1_5
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	B	5 days	ucsc1_8
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	C	10 hour	ucsc1_3
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	C	1 day	ucsc1_6
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	Y	C	5 days	ucsc1_9
To reference a line in the TSV or CSV file in the trackDb stanza, a meta column must contain a unique alpha-numeric string. For example, ucsc1_1 would reference the following metadata in your track:
#lab	data_set_id	access	assay	enriched_in	life_stage	species	ucsc_db	treatment	donor	differentiation	meta
tabSepLab	ucscTest1	group	long-RNA-seq	exon	postpartum	Homo sapiens	hg38	X	A	10 hour	ucsc1_1
A trackDb stanza using the tab-separated metadata can be seen in the following example:
track experiment1
shortLabel Donor A
longLabel Donor A's Tab Separated Experiment
type bigWig
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigWigExample.bw
parent treatmentX on
subGroups view=X
meta ucsc1_1
You can find the complete example of the hub using the tab-separated metadata here and you can load the following session to view the hub, https://genome.ucsc.edu/s/PublicSessions/TabSeparated_metadata.
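The lookup that the meta column enables can be sketched in a few lines of Python. This is only an illustration, assuming a tab-delimited file whose header line begins with "#" as in the example above. The function name and the shortened sample are hypothetical, and the duplicate check reflects the requirement that meta values be unique.

```python
import csv
import io


def load_meta_table(handle):
    """Index a tab-separated metadata file by its 'meta' column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    header[0] = header[0].lstrip("#")  # header line starts with '#' in the example
    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        record = dict(zip(header, row))
        meta = record["meta"]
        if meta in table:
            raise ValueError(f"duplicate meta value: {meta}")
        table[meta] = record
    return table


# Shortened sample with only four of the twelve columns.
SAMPLE = (
    "#lab\tdonor\tdifferentiation\tmeta\n"
    "tabSepLab\tA\t10 hour\tucsc1_1\n"
    "tabSepLab\tA\t1 day\tucsc1_4\n"
)
table = load_meta_table(io.StringIO(SAMPLE))
print(table["ucsc1_4"]["differentiation"])  # 1 day
```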
The hub uses a composite track, so if you are unfamiliar with composite tracks, the Quick Start Guide on composites can explain how the tracks are organized. The details page for the trackDb stanza example (Donor A) can be seen below. [] /goldenPath/help/hgVcfTrackHelp.html:Genome_Browser_VCF_Tracks Configuring VCF tracks Genome Browser VCF tracks may be configured in a variety of ways to highlight different aspects of the displayed information. By default, VCFs will display alleles with base-specific coloring. Homozygote data are shown as one letter, while heterozygotes will be displayed with both letters. [VCF default display] The default VCF custom track will display colored bases and will not show clustering unless specified as VCF/tabix in the custom track page. The following section describes configuration settings available to VCF files compressed and indexed in the Tabix format. This requires VCF manipulation, separate index files, and a web accessible directory to reference from the bigDataUrl track line. For more information on setting up and uploading VCF/Tabix data, click the link on VCF custom track creation. Configuring the haplotype sorting display If the VCF file contains genotype columns for at least two samples (four haplotypes), then a haplotype sorting display can be configured. This can be useful for determining the similarity between the samples and inferring inheritance at a particular locus. Enable Haplotype sorting display: When this option is checked, each sample's phased and/or homozygous genotypes are split into haplotypes, clustered by similarity around a central variant, and sorted for display by their position in the clustering tree. The tree (as space allows) is drawn in the label area next to the track image. Leaf clusters, in which all haplotypes are identical (at least for the variants used in clustering), are colored purple. [VCF tree diagram] The haplotype tree can be seen to the left of the track. 
Each variant is drawn as a vertical column, using color to distinguish between reference alleles and alternate alleles of the horizontally running haplotypes. If unchecked, then the display is the same as for VCF without genotypes: a stacked bar graph of the top two alleles, showing the proportion of alleles if allele counts are available. This setting is enabled by default. The following options are applicable only when the haplotype sorting display is enabled: Haplotype sorting order: Haplotypes are sorted using a distance function that uses a central variant. Differences between haplotypes are penalized with weights that decrease for each successive variant away from the central variant. By default, the median variant in the window is used. By clicking on a variant in the display, you will get the option to always use that variant when it is in the current view. Haplotype coloring scheme: There are three ways that reference and alternate alleles can be colored:
- By default, the reference allele is invisible and the alternate allele is black. When multiple haplotypes must be combined into the same pixel row, grayscale is used to shade according to the proportions of reference and alternate alleles. The central variant has a thin purple outline. Extra pixel rows at the top and bottom show the locations of variants in case they are hard to see when the invisible reference allele is the major allele. Variants used in clustering have purple marks in these rows; variants outside the clustered regions have black marks.
- The reference allele is blue and the alternate allele is red. Purple indicates a mix of reference and alternate alleles. The central variant has a thick black outline.
- Both alleles are colored using the same color scheme as when there are no genotypes: A is red, C is blue, G is green and T is magenta. Gray indicates a mix of reference and alternate alleles. The central variant has a thick black outline.
In all coloring modes, if some alleles in a haplotype are undefined, a pale yellowish color is used for those alleles. Haplotype clustering leaf shape: Leaf clusters are collections of identical haplotypes. By default, they are drawn as open triangles <. They can also be displayed as open rectangles [. [VCF options] Haplotype sorting display height: This number represents the track height in pixels. If the number of pixels is fewer than the number of haplotypes (2 * the number of genotype columns), some horizontal pixel rows must represent multiple haplotypes; with differing haplotypes' colors combined according to the selected coloring scheme. [VCF options] Filtering out variants Variants can be filtered out of the display according to several properties: - Exclude items with QUAL score less than N: If the checkbox is checked, then all variants whose QUAL column has a non-numeric value (e.g. ".") or a value less than N are excluded from display. By default, the checkbox is not checked and N is 0. - Exclude items with these FILTER values: This option appears only if the VCF header defines at least one FILTER code. There is a checkbox for each code defined in the header. If checked, then all variants with that code in the FILTER column are excluded from display. By default, no checkboxes are checked, so all variants are displayed regardless of FILTER column values. - Minimum minor allele frequency (if INFO column includes AF or AC+AN): If a variant's INFO field includes AF (alternate allele frequency) or both AC and AN (alternate allele count and total number of alleles), then its minor allele frequency can be compared against this threshold. If the minor allele frequency is less than the threshold, the variant will not be displayed. [VCF options] When you have finished making your configuration changes, click the Submit button to return to the annotation track display page. 
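The minor allele frequency filter can be illustrated with a short Python sketch that derives a frequency from a VCF INFO string, using AF when present and otherwise AC divided by AN. Several choices here are assumptions rather than documented browser behavior: only the first alternate allele is considered, the minor allele frequency is taken as the smaller of AF and 1 - AF, and variants without usable AF or AC+AN values are kept. The function names are hypothetical.

```python
def minor_allele_frequency(info):
    """Return the minor allele frequency from a VCF INFO string, or None."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        if sep:
            fields[key] = value
    try:
        if "AF" in fields:
            # Assumption: use the first alternate allele only.
            af = float(fields["AF"].split(",")[0])
        elif "AC" in fields and "AN" in fields:
            an = int(fields["AN"])
            if an == 0:
                return None
            af = int(fields["AC"].split(",")[0]) / an
        else:
            return None
    except ValueError:
        return None
    # Assumption: the minor allele is whichever of ref/alt is rarer.
    return min(af, 1.0 - af)


def passes_maf_filter(info, threshold):
    """Keep a variant unless its minor allele frequency is below the threshold."""
    maf = minor_allele_frequency(info)
    # Assumption: variants with no frequency information are not filtered out.
    return maf is None or maf >= threshold


print(minor_allele_frequency("AC=2;AN=10"))  # 0.2
```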
/goldenPath/help/bedMethyl.html:Genome_Browser_bedMethyl_Track_Format bedMethyl and bigMethyl Track Format The bedMethyl format is an extension of the standard BED 9 format used to display DNA methylation site data in a genome browser. This format is useful for base-resolution methylation data generated by bisulfite sequencing or direct methylation detection methods such as long-read sequencing. By including both methylation level and support (coverage), bedMethyl provides a detailed view of methylation across the genome. The bedMethyl format includes the information of a BED 9 along with additional fields:
- Valid Coverage: Reads with valid modification call
- Percent Modified: Percent of valid calls that are modified
- Modified calls: Number of calls with a modified base
- Canonical calls: Number of calls with a canonical base
- Other modification calls: Number of calls with a modified base, other modifications
- Reads with a deletion: Number of reads with a deletion at this reference position
- Low-confidence calls: Number of calls where the probability of the call was below the threshold
- Reads with a base mismatch: Number of reads with a base other than the canonical base for this modification
- Reads with no modification call: Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call
[] The items are colored from 0% modified (blue) to 100% modified (red). Hovering over an item or clicking it shows the additional details found in bedMethyl. Methylation calls are shown separately for CpG sites (m) and non-CpG (CHG/CHH) sites (h). Creating a bedMethyl custom track Example #1 In this example, you will create a bedMethyl custom track using bedMethyl data for the hg38 assembly. 1. Paste the following track line into the custom track management page for the human assembly hg38.
track type=bedMethyl name="bedmethyl example" description="bedMethyl custom track" visibility="pack" chr21 5010053 5010054 h 0 + 5010053 5010054 255,0,0 1 0.00 0 0 1 0 0 0 0 chr21 5010053 5010054 m 0 + 5010053 5010054 255,0,0 1 0.00 1 0 0 0 0 0 0 chr21 5010215 5010216 h 0 + 5010215 5010216 255,0,0 1 30.00 0 0 1 0 0 0 0 chr21 5010215 5010216 m 0 + 5010215 5010216 255,0,0 1 30.00 1 0 0 0 0 0 0 chr21 5010331 5010332 h 0 + 5010331 5010332 255,0,0 1 70.00 0 0 1 0 0 0 0 chr21 5010331 5010332 m 0 + 5010331 5010332 255,0,0 1 70.00 1 0 0 0 0 0 0 chr21 5010335 5010336 h 0 + 5010335 5010336 255,0,0 1 100.00 0 0 1 0 0 0 0 chr21 5010335 5010336 m 0 + 5010335 5010336 255,0,0 1 100.00 1 0 0 0 0 0 0 2. Click the "submit" button. 3. Go to chr21:5,010,030-5,010,408 to see the data. bigMethyl Format The bigMethyl format is the indexed version of bedMethyl using bedToBigBed. See bigBed format. The bigMethyl format is more efficient to display in the Genome Browser, and it offers more trackDb options, which will allow for customization. The following autoSql definition is an example on how to specify bigMethyl files. This definition, contained in the file bigMethyl.as, is pulled in when the bedToBigBed utility is run with the -as=bigMethyl.as option. table bigMethyl "bigMethyl bedMethyl" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chrom" uint chromEnd; "End position in chrom" string name; "dbSNP Reference SNP (rs) identifier or :" uint score; "Score from 0-1000, derived from p-value" char[1] strand; "Unused. Always '.'" uint thickStart; "Start position in chrom" uint thickEnd; "End position in chrom" uint color; "Red (positive effect) or blue (negative). 
Brightness reflects pvalue" string nValidCov; "Valid Coverage" double percMod; "Percent Modified" uint nMod; "Number of calls with a modified base" uint nCanon; "Number of calls with a canonical base" uint nOther; "Number of calls with a modified base, other modification" uint nDelete; "Number of reads with a deletion at this reference position" uint nFail; "Number of calls where the probability of the call was below the threshold" uint nDiff; "Number of reads with a base other than the canonical base for this modification" uint nNoCall; "Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call" ) The first 9 fields of this bigMethyl format are the same as the first 9 fields of the standard BED format. Creating a bigMethyl custom track Example #2 In this example, you will create a bigMethyl file to display as a custom track. 1. Save this bedMethyl file to your computer. 2. Save the autoSql file bigMethyl.as to your computer. 3. Download the bedToBigBed utility. 4. Use the bedToBigBed utility to create a bigMethyl file from your sorted bedMethyl file, using the bedMethyl.bed and bigMethyl.as files saved above and the hg38.chrom.sizes file: bedToBigBed -as=bigMethyl.as -type=bed9+9 bedMethyl.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes bigMethyl.bb 5. Move the newly created bigMethyl file (bigMethyl.bb) to a web-accessible http, https, or ftp location. At this point you should have a URL to your data, such as "https://institution.edu/bigMethyl.bb", and the file should be accessible outside of your institution's or hosting provider's network. For more information on where to host your data, please see the Hosting section of the Track Hub Help documentation. Construct a custom track line with a bigDataUrl parameter pointing to the newly created bigMethyl file.
track type=bigMethyl name="bigMethyl Example" description="A bigMethyl file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMethyl.bb visibility=pack 6. Go to chr21:5,010,030-5,010,408 to see the data. Sharing your data with others Custom tracks can also be loaded with a single URL. This link loads the same bigMethyl.bb track and sets additional display parameters from Example 2 in the URL: http://genome.ucsc.edu/cgi-bin/hgTracks?ignoreCookie=1&db=hg38&position=chr21:5,010,030-5,010,408&hgct_customText=track%20type=bigMethyl%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMethyl.bb%20visibility=pack If you would like to share your bigMethyl data track with a colleague, you can learn how to create a URL link to your data by looking at Example #6 on the custom track help page. Extracting data from the bigMethyl format Because the bigMethyl files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory. - bigBedToBed -- converts a bigBed file to ASCII BED format. - bigBedSummary -- extracts summary information from a bigBed file. - bigBedInfo -- prints out information about a bigBed file. As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement. Troubleshooting If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program. /goldenPath/help/chain.html:Genome_Browser_Chain_Format Chain Format The chain format describes a pairwise alignment that allows gaps in both sequences simultaneously.
Each set of chain alignments starts with a header line, contains one or more alignment data lines, and terminates with a blank line. The format is deliberately quite dense. Example: chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1 9 1 0 10 0 5 61 4 0 16 0 4 42 3 0 16 0 8 14 1 0 3 7 0 48 chain 4900 chrY 58368225 + 25985406 25985566 chr5 151006098 - 43549808 43549970 2 16 0 2 60 4 0 10 0 4 70 Header Lines chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id The initial header line starts with the keyword chain, followed by 11 required attribute values, and ends with a blank line. The attributes include: - score -- chain score - tName -- chromosome (reference/target sequence) - tSize -- chromosome size (reference/target sequence) - tStrand -- strand (reference/target sequence) - tStart -- alignment start position (reference/target sequence) - tEnd -- alignment end position (reference/target sequence) - qName -- chromosome (query sequence) - qSize -- chromosome size (query sequence) - qStrand -- strand (query sequence) - qStart -- alignment start position (query sequence) - qEnd -- alignment end position (query sequence) - id -- chain ID The alignment start and end positions are represented as zero-based half-open intervals. For example, the first 100 bases of a sequence would be represented with start position = 0 and end position = 100, and the next 100 bases would be represented as start position = 100 and end position = 200. NOTE: When the strand value is "-", the query coordinates (qStart and qEnd) are on the reverse strand and must be subtracted from the chromosome size to obtain the correct position on the forward strand in the other genome.
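The strand rule in the note above can be sketched in a few lines of Python. This is a minimal illustration, not part of the UCSC tools: the names ChainHeader, parse_chain_header, and query_forward_interval are hypothetical, and the sketch assumes a header line carrying all of the fields listed above, separated by whitespace. The input is the first header line from the example at the top of this section.

```python
from typing import NamedTuple, Tuple


class ChainHeader(NamedTuple):
    """Fields of a chain header line (hypothetical container, names follow the list above)."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Split a 'chain ...' header line into typed fields."""
    fields = line.split()
    if len(fields) != 13 or fields[0] != "chain":
        raise ValueError("expected 'chain' followed by 12 whitespace-separated values")
    (score, t_name, t_size, t_strand, t_start, t_end,
     q_name, q_size, q_strand, q_start, q_end, chain_id) = fields[1:]
    return ChainHeader(int(score), t_name, int(t_size), t_strand,
                       int(t_start), int(t_end), q_name, int(q_size),
                       q_strand, int(q_start), int(q_end), chain_id)


def query_forward_interval(header: ChainHeader) -> Tuple[int, int]:
    """Return the query interval as zero-based half-open forward-strand coordinates."""
    if header.q_strand == "-":
        # Reverse-strand coordinates are counted from the other end of the
        # sequence, so start and end swap roles after subtraction.
        return header.q_size - header.q_end, header.q_size - header.q_start
    return header.q_start, header.q_end


header = parse_chain_header(
    "chain 4900 chrY 58368225 + 25985403 25985638 chr5 151006098 - 43257292 43257528 1"
)
start, end = query_forward_interval(header)
print(header.q_name, start, end)
```

Because the intervals are half-open, the length of the interval is unchanged by the conversion, which is a quick sanity check when debugging.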
The reverse coordinates are subtracted as follows to get forward strand coordinates: qStartForward = qSize - qEnd qEndForward = qSize - qStart For example, using the query coordinates from chain 5 in hg38ToMm10.over.chain.gz: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id chain 442878230 chr1 248956422 + 158547112 207360161 chr1 195471971 - 21022354 65032227 5 The reverse strand coordinates are subtracted from the chromosome size: mm10Start = 195471971 - 65032227 = 130439744 mm10End = 195471971 - 21022354 = 174449617 The forward strand coordinates for chain 5 on mm10 are chr1 130439744 174449617, or with 1-based coordinates for a position range, chr1:130,439,745-174,449,617. To reverse the calculation and derive the corresponding hg38 coordinates using chain 5 in mm10ToHg38.over.chain.gz, note that the derived mm10 coordinates match the tStart and tEnd values: chain 442878230 chr1 195471971 + 130439744 174449617 chr1 248956422 - 41596261 90409310 5 The hg38 coordinates are subtracted as follows: hg38Start = 248956422 - 90409310 = 158547112 hg38End = 248956422 - 41596261 = 207360161 These coordinates match the target coordinates in hg38ToMm10.over.chain.gz. Alignment Data Lines Alignment data lines contain three required attribute values: size dt dq - size -- the size of the ungapped alignment - dt -- the difference between the end of this block and the beginning of the next block (reference/target sequence) - dq -- the difference between the end of this block and the beginning of the next block (query sequence) NOTE: The last line of the alignment section contains only one number: the ungapped alignment size of the last block. "Snake" rearrangement display Rearrangement display, sometimes called snakes display, is an alternative way to view pairwise alignments. It is available for PSL and chain format tracks. Rearrangement display is a representation of the path that the sequence follows in the "other" sequence. 
You start in the upper left and move to the right, following the lines if you come to the end of a block. If a block is red, which means it is a match on the negative strand, then you reverse your course and start going from right to left. The gray lines mean there are no bases in the other sequence between the blocks. Orange lines mean there are bases between the blocks that do not align. The display type can be enabled on the track configuration page of eligible tracks. Below are two examples for clarity. [Rearrangement display example 1] This example shows a tandem duplication on CDH1 which duplicates 3 exons. [Rearrangement display example 2] This example shows an inversion, which can be identified by the red colored sequence. /goldenPath/help/docker.html:Docker_help_page Docker Help Page Contents What is Docker? How to Install Docker Desktop? Using Docker Desktop for UCSC Genome Browser Create a Docker Volume for Data Persistence Updating the Latest UCSC Genome Browser Version Customize a UCSC Genome Browser Docker Container What is Docker? Docker is a platform for developing, testing and running applications. Docker can be used to run genomics tools and manage software such as the UCSC Genome Browser. Docker offers consistency across different computers and environments by packaging everything needed including specific software versions and configurations into a self-contained unit called a container. Container A container is software that packages up code and all its dependencies to run an application quickly and reliably from one computing environment to another. A container is isolated from other containers and a Docker container can be run on a developer's local laptop, virtual machines, on cloud providers, or other combinations of environments. A genomics analysis pipeline or entire analysis environment can be packaged into a Docker container on a local computer and moved to a cluster or a cloud server.
See also: Use containers to Build, Share and Run applications Image A Dockerfile is a text file that provides instructions to build an image. The Dockerfile is written in Dockerfile syntax. A docker image is a read-only template with instructions and everything needed to run an application for the container. See also: Overview of the get started guide How to Install Docker Desktop? Windows - Go to the Install Docker Desktop on Windows page - Check system requirements - Install Docker interactively or from the command line macOS - Go to the Install and run Docker Desktop on Mac page - Check system requirements - Install Docker interactively or from the command line Linux - Go to the Install Docker Desktop on Linux page and select Linux distribution - Check system requirements - Follow Generic installation steps Using Docker Desktop for UCSC Genome Browser Start Docker Desktop after installation is complete: - Windows: start Docker Desktop from the Start menu - macOS: start Docker Desktop from the Applications folder - Linux: start the Docker service by running the following command on the terminal: sudo systemctl start docker Obtaining a UCSC Genome Browser Dockerfile The UCSC Genome Browser dockerfile can be obtained from the UCSC Genome Browser Github by using the wget command: wget https://raw.githubusercontent.com/ucscGenomeBrowser/kent/master/src/product/installer/docker/Dockerfile Creating an Image Once the dockerfile has been downloaded, running docker build with the -t option sets the name and optional tag (format: "name:tag") of the image. The image can be created by running the following command in the same directory where the dockerfile is located: docker build . -t user_name/ucsc_genomebrowser_image Creating a Container After the image has been created, running the docker run command on the image with the -d option runs the container in the background; by default, the container runs in the foreground.
The -p option publishes a container's port(s) to the host. The following command maps port 8080 on the host machine to port 80 in the container and names the container using the --name option: docker run -d --name ucsc_genomebrowser_container -p 8080:80 user_name/ucsc_genomebrowser_image The running container can then be accessed at http://localhost:8080 Running the following command will list the running container: docker container ls Running the following command stops the running container: docker stop Running the following command removes the existing container: docker rm Using Docker Desktop to Create a Container The Docker Desktop user interface can be used to run the container by going to the images tab and clicking the run button under Actions: [] Click Optional settings in the "Run a new container" pop-up window: [] Enter a Container name and a Host port in the "Run a new container" pop-up window: [] Click the link with the Host port to go to the running container via localhost: [] Create a Docker Volume for Data Persistence A Docker volume allows data to persist after the container restarts and makes it possible to mount a host directory or another container's data volume into the UCSC Genome Browser container. The following command creates a new volume named ucsc_genomebrowser_volume that containers can consume and store data in: docker volume create ucsc_genomebrowser_volume After creating the volume named ucsc_genomebrowser_volume, running the docker run command starts the UCSC Genome Browser container using the user_name/ucsc_genomebrowser_image image and the -v option to mount the volume created in the previous step. docker run -d --name ucsc_genomebrowser_volume -p 8080:80 -v ucsc_genomebrowser_volume:/data user_name/ucsc_genomebrowser_image Files can be copied into the Docker volume or a bind mount can be used to link a host directory containing data to the /data directory inside the container.
The following command copies a file to the data directory inside the container: docker cp file.txt ucsc_genomebrowser_volume:/data Running the execute command will list the file inside the running container: docker exec ucsc_genomebrowser_volume ls data Updating the Latest UCSC Genome Browser Software Access the Docker Container's Shell Updating the latest UCSC Genome Browser version will require access to the Docker container running shell (command-line interface) of the UCSC Genome Browser. The execute command can be run inside a running Docker container with the -it options. The -i or --interactive option allows interaction with the command being executed and keeps STDIN open even if not attached. This will allow input to be provided for the command. The -t or --tty option allocates a pseudo-TTY and allows for a more interactive experience. The following example shows how to run exec command and the -it options: docker exec -it /bin/bash Update the Genome Browser Software Running the following command updates the Genome Browser software: bash root/browserSetup.sh cgiUpdate Customize a UCSC Genome Browser Docker Container Editing hg.conf The hg.conf file is a file that has information on how to connect to MariaDB, the location of the other directories and various other settings. The hg.conf file can be edited by running the execute command inside a running Docker container with the -it options. The -i or --interactive option allows interaction with the command being executed and keeps STDIN open even if not attached. This will allow input to be provided for the command. The -t or --tty option allocates a pseudo-TTY and allows for a more interactive experience. Any common text editors such as vi, nano, and vim can be used with the execute command and the -it options. 
The following example shows how to edit the hg.conf file using vi: docker exec -it vi /usr/local/apache/cgi-bin/hg.conf Changing the Default Genome Browser Options Track settings such as fonts, text size, default tracks, attached hubs, and the default region can be customized and set as the default settings. These settings will appear every time the UCSC Genome Browser graphic display is opened and will also appear after a reset of all user settings. This can be useful when working with a different assembly than hg38, having track hubs automatically attached, or changing the visibility of tracks. - The first step is to create a Session containing all desired display options, hubs, tracks, and default region in the UCSC Genome Browser Docker container. Open the UCSC Genome Browser Docker container shell. See the Access the Docker Container's Shell section of this page. - Create a new file, 'defaultCart.sql', to make a new MySQL table. Add the following to the defaultCart.sql file: #The default cart CREATE TABLE defaultCart ( contents longblob not null # cart contents ); - Drop the existing defaultCart table by running the following query: mysql hgcentral -Ne "DROP TABLE defaultCart" - Load the defaultCart.sql file as a table by running the following query: mysql hgcentral < defaultCart.sql - Insert the session to the default cart table by using the user name and the session name, which was the session saved in the earlier step, and run the following query (add userName): mysql hgcentral -Ne "insert into defaultCart select contents from namedSessionDb where sessionName='nameOfSession' and userName='nameOfUser'" - Finally, make sure the following line is in your hg.conf file. This file is found in the cgi-bin directory, e.g. cgi-bin/hg.conf. 
defaultCartName=defaultCart /goldenPath/help/hubQuickStartAssembly.html:Assembly_Hub_Quick_Start Quick Start Guide to Assembly Hubs Assembly Hubs allow researchers to create Track Data Hubs on assemblies that are not in the UCSC Browser. By including the underlying reference sequence in UCSC twoBit format, as well as data tracks, researchers can browse and annotate any genome. We may have a GenArk Hub of your genome, or you can visit our assembly request page and we can build an assembly hub for you. For more information please refer to the Assembly Hub User Guide. Below is also a section about starting GBiB Assembly Hubs. STEP 1: In a publicly accessible directory, copy this Arabidopsis thaliana plant assembly hub, which includes an araTha1.2bit file, using the following wget command: wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/ Alternatively, if you do not have wget installed, you can curl these files individually. Perform the curl -O option in the location you wish to copy the files: curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt If you use curl, be sure to recreate the structure with matching araTha1 and araTha1/bbi directories. Double check you have all the files by looking here: http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/ STEP 2: Paste your hub.txt link (http://yourURL/hub.txt) into the Connected Hubs tab of the Track Data Hubs page, click the "Add Hub" button, and then click the "Genome Browser" link from the top bar. Alternatively build a URL that will directly load your assembly hub and display it on hgGateway. 
Then click the "Genome Browser" link from the top bar to view your assembly hub: http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://yourURL/hub.txt This URL should work the same as using the original data just copied: http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt STEP 3: Congratulations! Your assembly hub should display! If you are having problems, be sure all your files and the directories are publicly accessible. You may also wish to reset the browser occasionally to clear all existing data. For hubs to work, your server must also accept byte-ranges. You can check using the following command to verify "Accept-Ranges: bytes" displays: curl -I http://yourURL/hub.txt Now that you have the assembly hub copied from above, you can copy the directory and start to edit some of the documents such as genomes.txt, groups.txt, and trackDb.txt to understand how they work. Refer to the Assembly Hub User Guide to understand how to build a twoBit file for your own original fasta files. Read more about trackDb settings in the definition document. This assembly hub is an abbreviated version of a larger plant assembly Public Hub. You can explore the larger hub structure here. Please note that the Browser waits 5 minutes before checking for any changes to these files. When editing hub.txt, genomes.txt, trackDb.txt, and related hub files, shorten this delay by adding udcTimeout=1 to your URL. For more information, please see the Debugging and Updating Track Hubs section of the Track Hub User Guide. Also, for more detailed instructions on setting up a regular hub, please see the Setting Up Your Own Track Hub section of the Track Hub User Guide.
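The direct-load URL from STEP 2 can also be assembled programmatically, which helps when several hubs need links. The following Python sketch is only an illustration: the function name hub_connect_url is hypothetical, the query parameters are copied from the example URL above, and appending udcTimeout to the same query string is an assumption based on the note about shortening the refresh delay.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint taken from the example URL in STEP 2.
HUB_CONNECT = "http://genome.ucsc.edu/cgi-bin/hgHubConnect"


def hub_connect_url(hub_url: str, udc_timeout: Optional[int] = None) -> str:
    """Build a link that connects an assembly hub and redirects to it."""
    params = {
        "hgHub_do_redirect": "on",
        "hgHubConnect.remakeTrackHub": "on",
        "hgHub_do_firstDb": "1",
        "hubUrl": hub_url,
    }
    if udc_timeout is not None:
        # Assumption: the cache timeout is passed as one more query parameter.
        params["udcTimeout"] = str(udc_timeout)
    # Keep ':' and '/' literal so the result matches the style of the example URL.
    return HUB_CONNECT + "?" + urlencode(params, safe=":/")


print(hub_connect_url("http://yourURL/hub.txt"))
print(hub_connect_url("http://yourURL/hub.txt", udc_timeout=1))
```

Characters in the hub URL other than ':' and '/' are percent-encoded, so a hub.txt address that itself contains '&' or '?' does not break the outer query string.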
Setting up Blat and In-Silico PCR for an Assembly Hub By running gfServers from your institution, you can enable blat on your assembly hubs. See Starting Blat and In-Silico PCR for an Assembly Hub for details. Setting up an Assembly Hub on GBiB with Blat and In-Silico PCR included With an operational installation of Genome Browser in a Box (GBiB), you can quickly and easily acquire an example assembly hub and run gfServers locally on the GBiB to enable Blat and In-Silico PCR. See the section Starting a Blat and In-Silico PCR enabled Assembly Hub on GBiB for more information. Resources - Assembly Hub User Guide - Track Hub User Guide - Track Database (trackDb) Definition Document - Public Hub Guidelines - Quick Start Guide to a Basic Hub - Quick Start Guide to Organizing Hubs Starting Blat and In-Silico PCR for an Assembly Hub From the location of yourAssembly.2bit file, http://yourURL/yourAssembly/yourAssembly.2bit, you can start two gfServers, specifying a port for the assembly hub to access amino acid sequence, 17777 -trans, or DNA sequence, 17779, in this example: gfServer start localhost 17777 -trans -mask yourAssembly.2bit & gfServer start localhost 17779 -stepSize=5 yourAssembly.2bit & Then you can edit the genomes.txt file of your assembly hub to include three lines in the stanza referring to yourAssembly, that would have matching port numbers: transBlat yourLab.yourInstitution.edu 17777 blat yourLab.yourInstitution.edu 17779 isPcr yourLab.yourInstitution.edu 17779 The assembly hub can be configured to talk to a dynamic BLAT server that loads a pre-built index when started by an xinetd super-server. This allows genomes to have a blat server without needing it to be resident in memory at all times. See Running your own gfServer and Adding BLAT servers for details on how to set up dynamic BLAT servers. See an example genomes.txt with commented out lines here, and please note the uppercase "B" in transBlat.
For more information, see the "Adding BLAT servers" section of the Assembly Hub User Guide. The Source Downloads page offers access to utilities with pre-compiled binaries such as gfServer found in a blat/ directory for your machine type here and further blat documentation here. Please note that because the -mask option in the above 17777 -trans gfServer option will mask all lower-case sequence from being matched, you may not wish to include it. See the above blat links and gfServer usage statement for more information. If you have trouble connecting your blat servers with the browser or if the browser cannot access your files, check if your institution has a firewall that prevents the browser from sending multiple inquiries. If this is the case, ask your systems administrator to add the following IP addresses as exceptions so that access is not limited. 128.114.119.* 129.70.40.99 134.160.84.67 128.114.198.32 This will allow connections with the U.S.-based genome.ucsc.edu site, the Europe-based mirror, the Asia-based mirror, and the UCSC development server. Starting a Blat and In-Silico PCR enabled Assembly Hub on GBiB STEP 1. Acquire and install Genome Browser in a Box: http://genome.ucsc.edu/goldenPath/help/gbib.html. You may also wish to read this UCSC blog post. STEP 2. With your GBiB operational, use your computer's terminal program to ssh into your GBiB: ssh browser@localhost -p 1235, using browser for the password. STEP 3. Navigate to the GBiB's /folders directory and use sudo to wget this assembly hub: cd /folders sudo wget -r --no-parent --reject "index.html*" -nH --cut-dirs=3 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/ STEP 4. 
You now have all the required files on your local machine and can load this plant assembly hub by using this URL and selecting it under the "group" category where "Plant araTha1" displays: http://127.0.0.1:1234/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt STEP 5. To enable blat you must acquire the gfServer utility. The UCSC Genome Browser and Blat software are free for academic, nonprofit, and personal use. Commercial download and installation of the Blat and In-Silico PCR software may be licensed through Kent Informatics. You can obtain just the gfServer utility on your GBiB with either of the following commands that will create a bin directory and install the tool. The commands use the North American and the European download servers respectively. mkdir ~/bin -p; rsync -avP hgdownload.soe.ucsc.edu::genome/admin/exe/linux.x86_64/blat/gfServer ~/bin/ mkdir ~/bin -p; rsync -avP hgdownload-euro.soe.ucsc.edu::genome/admin/exe/linux.x86_64/blat/gfServer ~/bin/ The GBiB also includes a tool you can run on the command line to download an entire suite of tools including gfServer: gbibAddTools STEP 6. Navigate to the genomes.txt file of this assembly hub: cd /folders/hubExamples/hubAssembly/plantAraTha1/ Edit the currently commented-out blat lines with sudo vi genomes.txt and use "x" when the cursor is over the # at the start of the line to remove it and :w! to save the changes and :q to quit. blat localhost 17779 transBlat localhost 17777 isPcr yourLab.yourInstitution.edu 17779 Please note that if you loaded your hub earlier, it will take five minutes (300 seconds) for the browser to check for any changes to genomes.txt, and that this delay can be shortened temporarily by adding &udcTimeout=10 to the URL. See more information in the Debugging and Updating section of the Track Hub User Guide. STEP 7. 
Change directories to the 2bit file: cd /folders/hubExamples/hubAssembly/plantAraTha1/araTha1 Run the two gfServer commands to start the blat servers: gfServer start localhost 17777 -trans -mask araTha1.2bit & gfServer start localhost 17779 -stepSize=5 araTha1.2bit & STEP 8. Load this plant assembly hub by using this URL and selecting it under the "group" category where "Plant araTha1" displays: http://127.0.0.1:1234/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://127.0.0.1:1234/folders/hubExamples/hubAssembly/plantAraTha1/hub.txt On the blat page, http://127.0.0.1:1234/cgi-bin/hgBlat, you can now select the Arabidopsis thaliana assembly and blat plant amino acid sequences, such as IYQTRENKYIIGEIQITESERDRRRSSLPGNH or DNA sequences, such as TAAGTAAAAAATAATATGATTAAGACTAATAAATCTTAATAGTTAATACT. On the PCR page, http://127.0.0.1:1234/cgi-bin/hgPcr, you can now select the Arabidopsis thaliana genome and enter a forward primer such as TAGGTCTGCACCTGTGGTTCAAAATTTT and a reverse primer such as CAATACAAGTCAACATTTTAGCGCCGAGA and click the "Flip Reverse Primer" box and then click submit to find matches on the assembly. /goldenPath/help/qValue.html:Genome_Browser_Q-Value Q-Value For any genome-wide analysis, reporting individual p-values can be misleading, because the p-value does not correct for the large number of tests performed. The q-value is an analog of the p-value that incorporates multiple testing correction. The q-value is defined as the minimum false discovery rate at which an observed score is deemed significant. Thus, the q-value attempts to control the percentage of false positives among a collection of scores. This contrasts with a traditional Bonferroni correction (or E-value), which controls the probability of one or more false positives in a collection of scores. 
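To make the contrast concrete, here is a small Python sketch. It is a simplification under a stated assumption, not the method implemented by the qvalue software referenced on this page: the proportion of true null hypotheses is fixed at 1, in which case the q-value reduces to the Benjamini-Hochberg adjusted p-value. The function names and the sample p-values are hypothetical.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Bonferroni-adjusted p-values: controls the chance of any false positive."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_qvalues(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values with the null proportion assumed to be 1)."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each q-value is the minimum
    # estimated false discovery rate over all thresholds at or above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


sample = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
print("Bonferroni:", bonferroni(sample))
print("q-values:  ", bh_qvalues(sample))
```

On the same input the q-values are never larger than the Bonferroni-adjusted values, which reflects the less stringent error criterion described above.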
Software for computing q-values from a collection of p-values is available at: https://github.com/StoreyLab/qvalue For a good introduction to false discovery rate estimation and the q-value see: Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci. 2003 Aug 5;100(16):9440-5. /goldenPath/help/hubBasics.html:UCSC_Genome_Browser:_Hub_Basics Track Hub Basics Track Hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser. They allow you to display a set of custom annotations on an assembly (or assemblies) of your choice and offer several advantages over custom tracks, including more display configuration options, more track organization options, more control over your data, and easier updates to that data. This page covers the basics of setting up your own hub: 1. Creating your hub.txt 2. Common track types and their configuration 3. Grouping tracks 4. Creating description pages 5. Sharing and linking to your hub As you build your hub, use the "Hub Development" tab on the Track Data Hub page to check your hub for errors or to disable file caching to see your changes immediately, rather than after the 300-second (5-minute) cache refresh delay. Example hub.txt To begin, we want to provide an example hub.txt that has been created in a way to make it easy to swap in your data URLs in place of our examples. It indicates what settings are required and includes many optional settings that can help elevate your tracks beyond the basics. Alongside these settings, it includes short explanations of how those settings work and how to configure them, but there is also a version provided without these. - Minimal hub.txt - Detailed hub.txt with setting explanations or without explanations - Visualize this example hub.txt Creating your own hub.txt The first step in creating a track hub is to create your hub.txt file. Download the example hub.txt and use this as a starting point, changing our default values to those for your hub.
But, we'll also provide the necessary settings here. These settings control how your hub is labeled in the interface and contact information: hub myExampleHub # a short, unique internal identifier for your hub, no spaces # shortLabel and longLabel are how your hub is labeled in the Genome Browser interface # shortLabels should be under 20 characters and longLabels under 70 shortLabel Example Hub longLabel Example Hub for useOneFile option useOneFile on email genome-www@soe.ucsc.edu genome hg38 If you have tracks across multiple assemblies, see the full track hub documentation. Common Track Types The most common track types are bigBed and bigWig, compressed, binary versions of corresponding plain-text formats. Together they should cover much of what you might want to display in the Genome Browser, from transcription peaks to RNA-seq results. bigBed Tracks You can use bigBed tracks to display discrete annotations, such as genes, transcription start sites, or conserved genomic elements. The bigBed format builds off the plain-text BED format and is thus flexible in terms of what fields are included. Your file must start with a set of 12 standard fields (though not necessarily all of them), but can also extend the format with any number of additional fields. Building a bigBed Next, we'll discuss how to build a bigBed from a bed file. 1. Download the bedToBigBed utility for your system type from our download server. 2. Use bedToBigBed to build your bigBed: bedToBigBed -sort in.bed chrom.sizes myBigBed.bb - If your assembly is a UCSC-hosted assembly (e.g. hg38), chrom.sizes can be a URL (replace "genNom" with the assembly name (e.g. hg38)): http://hgdownload.soe.ucsc.edu/goldenPath/genNom/bigZips/genNom.chrom.sizes. If you're working with a GenArk assembly hub, then the chrom.sizes file can be found under the "Data file downloads" section on the assembly gateway page. - If you have custom fields in your bed file, you will need to create a custom .as file. 
You can download the basic BED .as and modify this by adding new fields below those in your file. 3. Put your bigBed file alongside your hub.txt in a web-accessible location, either through the 10GB of space we make available to users or through one of several other services. 4. You will use the file name (e.g. "myBigBed.bb") with the bigDataUrl setting in your hub.txt bigBed track hub configuration Once you have built your bigBed files, it is time to create a stanza in your hub.txt file for that track. Here is what the required settings discussed above might look like for a basic bigBed track: track bigBedRequiredSettings shortLabel bigBed Required Settings longLabel A bigBed Example with Required Settings visibility pack type bigBed 12 + bigDataUrl gtexCaviar.chr7_155799529-155812871.bb The type line consists of three parts: - "bigBed" is the basic track type - "12" indicates how many standard BED fields are included in your file. You may need to change this to match the number of standard BED fields in your file. - "+" tells the genome browser there are extra fields beyond the standard fields. If your file has no extra fields, replace this with a ".". Here is a screenshot of what this basic bigBed track looks like displayed in the Genome Browser: [] The bigBed format also offers a wide range of customization options for the display, from decorators to highlights. Additionally, it offers extensive filter controls, searching options, and mouseover configurations. Our trackDb documentation contains a full listing of settings available for the format. Here is the bigBed configuration with some commonly used settings, including filtering and mouseover configuration.
track bigBedCommonSettings shortLabel bigBed Common Settings longLabel A bigBed Example with Commonly Used Settings visibility pack type bigBed 12 + bigDataUrl gtexCaviar.chr7_155799529-155812871.bb filterLabel.cpp CPP (Causal Posterior Probability) filter.cpp 0 filterLabel.geneName Gene Symbol filterText.geneName * mouseOver $name; CPP: $cpp And here is what that track looks like in the Genome Browser: [] These common settings add options to the track configuration pop-up: [] bigWig Tracks You can use bigWig tracks to display continuous annotations, such as RNA-seq expression, conservation scores, or other genome-wide scores. You can build a bigWig using one of two plain-text formats: wiggle or bedGraph. Building a bigWig Next, we'll discuss how to build a bigWig from a wig or bedGraph file. 1. Download the wigToBigWig utility for your system type from our download server. 2. Use this utility to build your bigWig: wigToBigWig in.bedGraph chrom.sizes myBigWig.bw - If your assembly is a UCSC-hosted assembly (e.g. hg38), chrom.sizes can be a URL (replace "genNom" with the assembly name (e.g. hg38)): http://hgdownload.soe.ucsc.edu/goldenPath/genNom/bigZips/genNom.chrom.sizes. If you're working with a GenArk assembly hub, then the chrom.sizes file can be found under the "Data file downloads" section on the assembly gateway page. 3. Put your bigWig file alongside your hub.txt in a web-accessible location, either through the 10GB of space we make available to users or through one of several other services. 4. You will use the file name (e.g. "myBigWig.bw") with the bigDataUrl setting. bigWig track hub configuration The basic trackDb configuration for a bigWig track is similar to that of a bigBed track, as all tracks require the same basic settings (track, shortLabel, longLabel, type, bigDataUrl).
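Because every data track needs the same five settings, a stanza can be checked mechanically before it goes into hub.txt. The following is a minimal Python sketch of such a check. The function names `parse_stanza` and `missing_settings` are illustrative and not part of any UCSC tool, and the check is only meant for tracks that point at a data file (container tracks such as composite parents have no bigDataUrl).

```python
# Settings named in the text above as required for every data track.
REQUIRED = ("track", "shortLabel", "longLabel", "type", "bigDataUrl")


def parse_stanza(text):
    """Parse one track stanza (one 'setting value' pair per line) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # setting name, then the rest of the line
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def missing_settings(text):
    """Return the required settings that are absent or empty in the stanza."""
    settings = parse_stanza(text)
    return [name for name in REQUIRED if not settings.get(name)]


stanza = """
track bigWigExample
shortLabel bigWig Example
longLabel A bigWig Example with Commonly Used Settings
type bigWig -20 10.003
bigDataUrl hg38.phyloP100way.chr7_155799529-155812871.bw
"""

print(missing_settings(stanza))
```

An empty list means all five settings are present. The sketch does not check that the values themselves are valid.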
This is what the configuration for a bigWig track might look like (the example hub.txt includes other useful settings): track bigWigExample shortLabel bigWig Example longLabel A bigWig Example with Commonly Used Settings visibility pack type bigWig -20 10.003 bigDataUrl hg38.phyloP100way.chr7_155799529-155812871.bw color 60,60,140 The type line consists of two parts: - "bigWig" is the basic track type - "-20 10.003" indicates the minimum and maximum of the data in the bigWig Here is what this looks like visualized in the Genome Browser: [] Grouping tracks Next, we'll provide a basic overview of how to group your tracks using composite tracks and super tracks. This will allow you to pull similar data together under a single track. Composite Tracks Composite tracks can hold multiple tracks of the same type. For example, you use a composite to group together a set of RNA-seq experiments including replicates. Here's what the configuration might look like for a composite containing two bigWig tracks. There are two key components of a composite: (1) the line "compositeTrack on" in the parent track stanza, and (2) including "parent compositeName" for each track that will be part of the composite. track compositeExample shortLabel Example Composite Track longLabel Example composite track using bigWigs visibility dense type bigWig compositeTrack on track compositeBigWig1 bigDataUrl a.chr7_155799529-155812871.bw shortLabel bigWig #1 longLabel bigWig in Composite Track Example #1 parent compositeExample type bigWig 0 1 color 255,0,0 autoScale group visibility dense track compositeBigWig2 bigDataUrl c.chr7_155799529-155812871.bw shortLabel bigWig #2 longLabel bigWig in Composite Track Example #2 parent compositeExample type bigWig 0 1 color 0,255,0 autoScale group visibility dense This composite track configuration will display like so: [] Super Tracks Super tracks are a more general type of container. They can contain tracks of different types and even composites. 
Configuring a basic super track is quite similar to configuring a composite track. There are two key components of a super track: (1) the line "superTrack on" in the parent track stanza, and (2) including "parent superTrackName" for each track that will be part of the super track. track superTrackExample shortLabel Super Track Example longLabel A super-track of related data of various types together: individual, multiWig, and composite superTrack on show html examplePage track superTrackbigBed parent superTrackExample bigDataUrl gtexCaviar.chr7_155799529-155812871.bb shortLabel ST bigBed example longLabel A super-track-contained bigBed type bigBed 12 + visibility squish priority 30 track superTrackCompositeBigWig parent superTrackExample compositeTrack on shortLabel ST Composite bigWig longLabel A composite track in a super track grouping bigWigs visibility dense type bigWig priority 60 track superTrackCompositeBigWig1 bigDataUrl a.chr7_155799529-155812871.bw shortLabel ST bigWig composite #1 longLabel A composite-contained bigWig in a super track example #1 parent superTrackCompositeBigWig on type bigWig 0 1 track superTrackCompositeBigWig2 bigDataUrl c.chr7_155799529-155812871.bw shortLabel ST bigWig composite #2 longLabel A composite-contained bigWig in a super track example #2 parent superTrackCompositeBigWig on type bigWig 0 1 Loading the example hub with this super track configuration looks like this: [] Creating description pages If you plan to share your track hub more widely, you will want to create a description page for your tracks. A description page could contain a short description of what the data represents, how the data was generated, a link to the associated paper, or a contact email for questions regarding the data. We provide an example description html that you can modify with the details for your track.
Once you've modified this example html for your track, add an html setting to the corresponding track stanza: track bigWigExample shortLabel bigWig Example longLabel A bigWig Example with Commonly Used Settings type bigWig -20 10.003 bigDataUrl hg38.phyloP100way.chr7_155799529-155812871.bw html bigWigDescription.html Sharing your hub Once you have a functional hub that you would like to share with others, you can create a shareable link in two ways. The first option is to create a session link, which requires a Genome Browser account. Load your hub, configure the genome browser as you'd like (e.g. position and data tracks), select "My Sessions" under "My Data", and use the option to save the current settings as a session. You will then be provided with a URL that you can share with others. The other option is to create a URL to the Genome Browser that loads your hub on the assembly of interest. There are three URL parameters you will want to use: - db - UCSC assembly name (e.g. hg38) - position - chromosome position to load - hubUrl - URL to your hub You will then append these to a genome browser URL. For example, this URL will load the example hub: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr7:155799529-155812871&hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubBasicSettings/hub.txt If you feel that your hub would be of general use to the research community, you can contact us about making it a public hub. Note that public hubs have to meet more stringent requirements than the basics described here. Check that your hub meets the public hub requirements and then follow the directions on that page for submitting it to us for review. /goldenPath/help/decorator.html:Genome_Browser_Decorators Track Decorators Overview Track decorators allow highlighting parts of features with colors and/or symbols (glyphs/shapes) within a single track. The decorations can either be overlaid onto the feature or shown directly underneath.
Decorators can be added to BED 12+, bigBed, PSL, and bigGenePred tracks. To add decorations to a track, the decorations must first be stored in a separate bigBed file that includes extra fields to identify the decorated items. The track settings for the track must then be modified to include a pointer to that bigBed file; this can be done either in the track line for a custom track, or in the trackDb.txt file for a hub. At present, we only support a single decorator per track, but we anticipate supporting multiple decorators in the future. [] Contents Annotating the Genome The Decorator Styles Getting Started Examples Hub TrackDb Settings Troubleshooting Annotating the Genome The genome browser's primary way to annotate the genome uses colored rectangles ("exons" for gene tracks) linked by thin lines ("introns"), often stored as a bigBed. These were originally used for genes but then evolved to cover other types of annotations, e.g. enhancers, chromatin modifications, or single nucleotide variants. We usually call these annotations "features". Each rectangle ("exon") of a feature has the same color and individual parts cannot be highlighted. If you wanted to highlight parts of the features, traditionally this required a second track. [] A track decorator is more compact than creating a separate track for these sub-features, but loses some of the abilities of normal tracks; for example, there is no right-click on the sub-features and the user cannot click these sub-features. The primary use case driving this display was protein domains drawn on top of gene models, but we have found many other applications since then, e.g. drawing summits on ATAC peaks or highlighting selenocysteines on transcripts. The Decorator Styles Track decorators can be shown in two styles, "block" and "glyph" style. The "block" style option can be used to color exons and introns and can display a label for them.
For example, the "block" track decorator could be used to overlay protein domain boundaries on transcripts where usually one would use an entirely different track for the domains. [] The "glyph" style option offers 8 different types of glyphs and the color of choice. [] The "glyph" style option can be used to draw entirely new symbols, for example, to indicate insertion positions on the genome with small triangles. We appreciate user feedback. If you have glyph questions, glyph style requests, or have found new glyph applications, please contact our mailing list. To use decorators in your track hub or custom tracks, you will need to create an additional bigBed file that defines the regions, colors, and glyphs. Getting Started The Decorator bigBed A decorator bigBed file, which contains decorations for annotating another track, is very similar to our standard bigBed file format. The only difference is the addition of some extra required fields, which describe how each decoration should be drawn and what item within that other track it annotates. The full .as format for decorator bigBed files is as follows: string chrom; "Chromosome (or contig, scaffold, etc.)" uint chromStart; "Start position in chromosome" uint chromEnd; "End position in chromosome" string name; "Name of item" uint score; "Score from 0-1000" char[1] strand; "+ or -" uint thickStart; "Start of where display should be thick (start codon)" uint thickEnd; "End of where display should be thick (stop codon)" uint color; "Primary RGB color for the decoration" int blockCount; "Number of blocks" int[blockCount] blockSizes; "Comma separated list of block sizes" int[blockCount] chromStarts; "Start positions relative to chromStart" string decoratedItem; "Identity of the decorated item in chr:start-end:item_name format" string style; "Draw style for the decoration (e.g. 
block, glyph)" string fillColor; "Secondary color to use for filling decoration, blocks, supports RGBA" string glyph; "The glyph to draw in glyph mode; ignored for other styles" A copy of this file can be found here. Valid values for the style field are "block" and "glyph". Valid glyph entries include "Circle", "Square", "Diamond", "Triangle", "InvTriangle", "Octagon", "Star", and "Pentagram". If the text isn't recognized, Circle will be used by default. The "decoratedItem" field (chr:start-end:item_name format) captures the link between the decoration and what item in the track is being decorated. The contents of this field must be the chromosome, BED start coordinate, BED end coordinate, and item name for the decorated item (note - these are 0-based half-open BED coordinates, not 1-based fully closed coordinates. That means they are the same values as should appear in a BED file describing the decorated item). Examples Example #1: Building a decorator bigBed Here is an example of how to use this format. Consider the item in BED format as a "feature": chr1 1000 2000 feature 0 + 1000 2000 0 2 400,400 0,600 We can take this BED file and construct a mainEx.bb from it as described in the bigBed documentation. To add a decoration to the "feature" item that highlights the region at position chr1:1,201-1,800, we could create a corresponding item in a decorator bed file like the following: chr1 1200 1800 highlight 0 + 1200 1800 255,0,0,255 1 600 0 chr1:1000-2000:feature block 255,0,0,128 Ignored - The name of this decorator item is highlight specified in the fourth field of the decorator bed file. - The 9th field of the decorator bed file (255,0,0,255) specifies the decoration outline using RGB values and an alpha value to control opacity. - The chr1:1000-2000:feature (13th field of the decorator bed file) entry describes which item in the main bed file is to be annotated. In this case, it's an item with the name "feature" at position chr1:1,001-2,000. 
- The 15th field of the decorator bed file (255,0,0,128) specifies the interior of the decoration using RGB values and an alpha value to control opacity. - The Ignored value is used in the last field of the bed file because we are creating a block decoration (a decoration that annotates a range of bases). To add a glyph decoration that marks the final base of the transcript with a green circle, we would then include the following line in the decorator bed file: chr1 1999 2000 green_circle 0 + 1999 2000 0,255,0,255 1 1 0 chr1:1000-2000:feature glyph 0,255,0,255 Circle - The name of this decorator item is green_circle specified in the fourth field of the decorator bed file. - The 9th field of the decorator bed file (0,255,0,255) specifies the decoration outline using RGB values and an alpha value to control opacity. - The chr1:1000-2000:feature (13th field of the decorator bed file) entry describes which item in the main bed file is to be annotated. - The 15th field of the decorator bed file (0,255,0,255) specifies the interior of the decoration using RGB values and an alpha value to control opacity. - The Circle glyph style is specified in the last field of the decorator bed file. Both of these bed decorations can be stored in a file named decoratorsEx.bed and then built as a decoratorsEx.bb using the hg38.chrom.sizes and the decoration.as files and running the following command: bedToBigBed -type=bed12+ -as=decoration.as decoratorsEx.bed hg38.chrom.sizes decoratorsEx.bb Debugging the decorator bigBed The resulting decoratorsEx.bb file can be displayed as a stand-alone custom track for debugging purposes. This process allows verification that the decorations are correctly applied. 
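Some problems can be caught before the file is loaded at all. The Python sketch below checks a single decorator BED line against the rules described earlier: the style must be "block" or "glyph", the glyph name should be one of the listed glyphs, and the decoratedItem field must follow the chr:start-end:item_name format. The containment check (that a decoration lies inside the item it decorates) is an assumption made for this sketch and is not a stated requirement of the format. The function names are illustrative, and the sketch assumes whitespace-separated fields with no spaces inside names.

```python
VALID_STYLES = {"block", "glyph"}
VALID_GLYPHS = {"Circle", "Square", "Diamond", "Triangle",
                "InvTriangle", "Octagon", "Star", "Pentagram"}


def parse_decorated_item(value):
    """Split 'chr:start-end:item_name' into (chrom, start, end, name)."""
    chrom, coords, name = value.split(":", 2)
    start, end = coords.split("-")
    return chrom, int(start), int(end), name


def check_decoration(bed_line):
    """Return a list of human-readable problems found in one decorator line."""
    fields = bed_line.split()
    if len(fields) < 16:
        return ["expected at least 16 fields, found %d" % len(fields)]

    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    item_chrom, item_start, item_end, _name = parse_decorated_item(fields[12])
    style, glyph = fields[13], fields[15]

    problems = []
    if style not in VALID_STYLES:
        problems.append("unknown style %r" % style)
    if style == "glyph" and glyph not in VALID_GLYPHS:
        problems.append("unrecognized glyph %r (Circle would be drawn)" % glyph)
    if chrom != item_chrom:
        problems.append("decoration and decorated item are on different chromosomes")
    # Assumption for this sketch: a decoration should sit inside its item.
    if start < item_start or end > item_end:
        problems.append("decoration extends outside the decorated item")
    return problems


highlight = ("chr1 1200 1800 highlight 0 + 1200 1800 255,0,0,255 1 600 0 "
             "chr1:1000-2000:feature block 255,0,0,128 Ignored")
print(check_decoration(highlight))
```

An empty list means none of these checks failed. It does not guarantee that the decorated item actually exists in the main bigBed.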
To display the decorator bigBed, navigate to the UCSC Genome Browser Custom Tracks page and paste the URL into the designated text field: browser position chr1:1000-2000 https://genome.ucsc.edu/goldenPath/help/examples/decorator/decoratorsEx.bb Once the decorator bigBed is loaded, the decorations will be rendered on the UCSC Genome Browser, allowing for verification of the correct display. Example #2: Create a custom track Create a decorator bigBed custom track using the decorator bigBed file from Example #1. 1. Construct a track line that references the bigBed and the decorator bigBed file: browser position chr1:1000-2000 track type=bigBed name="Decorators Example Two" description="bigBed with decorators" visibility=pack bigDataUrl=https://genome.ucsc.edu/goldenPath/help/examples/decorator/mainEx.bb decorator.default.bigDataUrl=https://genome.ucsc.edu/goldenPath/help/examples/decorator/decoratorsEx.bb 2. Paste the track line into the custom track page for the human assembly, hg38. 3. Click the Submit button. Custom tracks can also be loaded via one URL line. The link below loads the same bigBed with decorators track and sets additional parameters in the URL: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgct_customText=track%20type=bigBed%20bigDataUrl=https://genome.ucsc.edu/goldenPath/help/examples/decorator/mainEx.bb%20decorator.default.bigDataUrl=https://genome.ucsc.edu/goldenPath/help/examples/decorator/decoratorsEx.bb&position=chr1:1000-2000 Example #3: Create a decorator bigBed with extra (custom) fields Additional fields can also be added onto the end of the .as file, though they will not be used by default. The additional fields can be used for custom feature filters and mouseOvers options. 
For example, to set up a filterValues filter, you would add the following two fields to the decoration.as file: int numKeywords; "Number of keywords" string[numKeywords] keywords; "Keywords for this decorator" The numKeywords field specifies the number of keywords and the keywords field specifies the keywords. You would then modify your decoratorsEx.bed file to include additional fields at the end of each line, detailing which keywords apply to each of the decorations: chr1 1200 1800 highlight 0 + 1200 1800 255,0,0,255 1 600 0 chr1:1000-2000:feature block 255,0,0,128 Ignored 1 Type1 chr1 1999 2000 green_circle 0 + 1999 2000 0,255,0,255 1 1 0 chr1:1000-2000:feature glyph 0,255,0,255 Circle 2 Type2,Type3 The first bed entry carries one keyword, "Type1", and the second bed entry carries two keywords, "Type2,Type3". You can also add a mouseOverField, which lets the mouseover show text that is different from the "name" of the decorator item. You would add the following field to the decoration.as file: string mouseOverField; "Mouse over text" Then modify your decoratorsEx.bed file to include the additional field.
chr1 1200 1800 highlight 0 + 1200 1800 255,0,0,255 1 600 0 chr1:1000-2000:feature block 255,0,0,128 Ignored 1 Type1 alternate_highlight_name chr1 1999 2000 green_circle 0 + 1999 2000 0,255,0,255 1 1 0 chr1:1000-2000:feature glyph 0,255,0,255 Circle 2 Type2,Type3 alternate_green_circle_name You would then rebuild the decoratorsEx.bb file using the decorationEx_fields.as file, which includes the extra fields: bedToBigBed -type=bed12+ -as=decorationEx_fields.as decoratorsEx.bed hg38.chrom.sizes decoratorsEx.bb Hub TrackDb Settings The statement below provides trackDb settings to add decorators to a track hub: track testTrack type bigBed 12 bigDataUrl https://genome.ucsc.edu/goldenPath/help/examples/decorator/mainEx.bb itemRgb on decorator.default.bigDataUrl https://genome.ucsc.edu/goldenPath/help/examples/decorator/decorationsEx.bb - The type can be BED 12+, bigBed, PSL, or bigGenePred. - The bigDataUrl is the main file to be annotated with decorators. - The itemRgb setting allows the coloring of the interior of the block decorators. - The decorator.default.bigDataUrl setting adds decorations to the track and points to the bigBed file containing the decorators. Other settings are available to further configure decorators. Each may be applied to the decorator instead of the primary track by prepending "decorator.default." to the setting. For example, to set up a filterValues filter on the "keywords" field of the decorator, allowing the user to filter to any combination of three classes "Type1", "Type2", and "Type3", the following trackDb entry could be used, where the multipleListOr setting splits the list of class values by commas: decorator.default.filterValues.keywords Type1,Type2,Type3 decorator.default.filterType.keywords multipleListOr Please note that this would also require building an extra keywords field into the decorator bigBed to hold those values; see Example #3 for more details.
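The extra keyword fields that such a filter relies on can be appended to existing decorator BED lines with a short script. The Python sketch below takes a standard 16-field decorator line and adds the numKeywords and keywords fields from Example #3. The function name `add_keywords` is illustrative, and writing the output tab-separated is an assumption of this sketch.

```python
def add_keywords(bed_line, keywords):
    """Append numKeywords and a comma-separated keywords field to a decorator line."""
    fields = bed_line.split()
    if len(fields) != 16:
        raise ValueError("expected a 16-field decorator line, found %d" % len(fields))
    # numKeywords must match the number of entries in the keywords list.
    fields += [str(len(keywords)), ",".join(keywords)]
    return "\t".join(fields)


green_circle = ("chr1 1999 2000 green_circle 0 + 1999 2000 0,255,0,255 1 1 0 "
                "chr1:1000-2000:feature glyph 0,255,0,255 Circle")
print(add_keywords(green_circle, ["Type2", "Type3"]))
```

The resulting lines still need to be built into a bigBed with an .as file that declares the two extra fields, as described in Example #3.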
Decorators also support the mouseOver and mouseOverField settings that can be applied to bigBed tracks: decorator.default.mouseOver decorator $name mouseOver decorator.default.mouseOver $mouseOver You can configure the block decoration placement visibility to "overlay", "adjacent", or "hide" using the blockMode setting: decorator.default.blockMode adjacent You can use the maxLabelBases setting to set a maximum window size (in bases) for which labels will be drawn. If not set, the value will default to 200kb. This can be useful to deactivate decoration labels when there are too many track items and too many decoration labels to process visually. decorator.default.maxLabelBases 100000 A full list of supported decorator settings is available in the trackDb documentation. Troubleshooting If you get an error when you run the bedToBigBed program, please check your input BED file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input BED file before using the bedToBigBed program. /goldenPath/help/customColumn.html:Genome_Sorter_Custom_Columns Displaying Your Own Columns in the UCSC Gene Sorter The Gene Sorter provides dozens of columns containing information on genes computed at UCSC or provided by outside collaborators. In addition to these standard columns, users may also upload their own columns for temporary display in the browser. Custom columns are viewable only on the machine from which they were uploaded and are kept only for 8 hours after the last time they were accessed. Optionally, users can make custom columns viewable by others as well. Gene Sorter custom columns are based on files in line-oriented format. Each column is described by an initial column line followed by one or more data lines. The column line describes the name, hyperlinks, and other overall characteristics of the column. 
Each data line contains specific information about a gene annotated by the column. Lines starting with # are ignored. Only one column file may be loaded at a time; however, multiple column descriptions may be included in the same custom file, separated by blank lines. The column line Each column description must begin with a column line containing the keyword column followed by an optional set of one or more attribute pairs: column [attribute1]=[value1] [attribute2]=[value2]... Attribute values must be enclosed in quotes if they contain spaces or tabs. Attribute names and data values are case-sensitive. The following attributes may be defined: name - Symbolic name of the custom column (not displayed to user). shortLabel - Label displayed at the top of the column in the Gene Sorter display. The default value is User Column. longLabel - Short description of the column displayed after the name on the configuration and filter pages. The default value is User custom column. visibility - Controls whether column is displayed by default: on = display column, off = hide column. The default is on. priority - Specifies the display order of the column relative to others. Columns with lower priority values appear toward the lefthand side of the display. The standard browser columns have priorities between 0 and 20. The default priority is 2.01. itemUrl - URL used to construct hyperlinks accessed by clicking on column data values. If the URL contains %s, the column value will be inserted at that position in the hyperlink string. For example, if itemUrl for a column is defined as http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg16& position=%s (the UCSC Genome Browser URL), clicking on the data value NM_024014 will open the Human Jul. 2003 Genome Browser to the position occupied by RefSeq accession NM_024014. There is no default for this attribute. labelUrl - URL of the hyperlink accessed by clicking on the column label. No default. 
search - When the attribute is set to one of the following values, column data may be searched using the Gene Sorter search text box. Rows containing matches will be moved to the top of the display. By default, no search criterion is set. - exact - matches only if the text entered in the position search box exactly matches the column data text. - prefix - matches if search text exactly matches the initial part of the column data text. - fuzzy - matches if search text matches any portion of the column data text. idLookup - When set, this attribute specifies the standard column that should be used to link key values to the Gene Sorter display. For example, if idLookup is set to refSeq, a custom column data row containing the key NM_024014 will display on the same row as the RefSeq row containing NM_024014. The idLookup values are case-sensitive. By default, idLookup is set to the acc (GenBank) column. To determine the idLookup value that corresponds to a specific standard column, click the column's title in the Gene Sorter display (use the configure button to turn on the column display if it is currently hidden). The near.do.colInfo parameter in the URL linked to the column title is set to the idLookup value that corresponds to that column. isNumber - When this attribute is set to on, the filter page displays numerical max/min controls for this column. Default is off. Data lines Data lines are of the format: [key] [value] - key - links the custom column data value to a data value in the column specified by the idLookup attribute. If idLookup is unset, the browser looks for a match in the acc (GenBank) column. - value - data value to be displayed in the custom column row that matches the specified key. It is permissible to have more than one key/value pair per key. In this case, the column displays a comma-separated list of values. Data line keys and values are case-sensitive. Examples Example #1 This example defines a custom column for the Jul. 2003 Gene Sorter. 
The column's key values are linked to data in the refSeq column. Column rows can be filtered by numerical range by setting the max/min values for the column on the filter page. #Custom column file for MyLab Trial 3 # #Column line: # column name="MyLab Data" shortLabel="MyLab" longLabel="MyLab Trial 3" visibility=on priority=2.05 idLookup=refSeq isNumber=on # #Data lines (key links to refSeq column): # NM_005523 1.2 NM_005522 4.5 NM_018951 5.1 NM_000522 5.7 NM_030661 9.4 NM_002141 5.2 NM_024014 4.3 NM_006896 6.0 Example #2 This example defines a custom column for the Oct. 2003 mouse Gene Sorter. The column's key values are linked by default to the acc (GenBank) column. Clicking on the column's title (UCSCLab) displays the web page http://genome.ucsc.edu/. Clicking on a specific data value displays the web page http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm4 (the UCSC Genome Browser) at the position specified by the data value. A search on the word MOUSE will display a list of all UCSC BioLab data that contains the string "MOUSE". #Custom column file for UCSC BioLab Test Data # #Column line: # column name="UCSCLab Data" shortLabel="UCSCLab" longLabel="UCSC BioLab Test Data 4/4/04" visibility=on priority=2.05 itemUrl=http://genome.ucsc.edu/cgi-bin/hgTracks?db=mm4&position=%s labelUrl=http://genome.ucsc.edu search=fuzzy # #Data lines (key links to refSeq column): # U20370 HXAB_MOUSE L08757 HXAA_MOUSE NM_008264 HXAD_MOUSE # The following 2 lines demonstrate multiple data values for 1 key: AK083575 NM_010449 AK083575 Q8BNI8 M95599 NM_010451 M28021 HXA5_MOUSE NM_010454 HXA6_MOUSE /goldenPath/help/covidBrowserIntro.html:COVID_Browser_Intro Introduction to the SARS-CoV-2 Genome Browser The UCSC Genome Browser is an open-source, interactive sequence visualization tool that has been a cornerstone of genomics since we released the first human genome assembly 20 years ago. 
Cited in more than 37,000 scientific articles and used by thousands of researchers each day, it allows for cross-referencing of research, clinical, and epidemiology data against reference genomes, including SARS-CoV-2. This data is continuously updated and expanded as new datasets become available. For a more thorough description, please see our SARS-CoV-2 Genome Browser Nature Genetics paper. We also post updates and COVID Browser resources to our COVID-19 Browser home page. This guide will go through some of the most important use cases of the SARS-CoV-2 Genome Browser. These topics include: - Orientation and Navigation - Gene Data and Sequence Alignments - Variation and Immunology data - Phylogenetic Contact Tracing using UShER - Other tools and data downloads - Support and Collaboration For those who prefer a video explanation, we also have the following tutorial: Genome Browser Orientation and Navigation The standardized reference genome displayed on the COVID Genome Browser is from one of the first isolated cases, known as NC_045512v2 or wuhCor1. With more than 80 track datasets across the SARS-CoV-2 reference genome's nearly 30,000 RNA bases, navigation is essential to finding the information you want to see. Below is an example view of the SARS-CoV-2 Genome Browser with labeled sections highlighting the navigation, reference sequence, annotations, and other available track datasets. Navigation controls at the top allow users to move left and right and to zoom. The search box allows users to search for particular features or to move to exact genomic coordinates. The RNA sequence is shown at the top only when the view is sufficiently zoomed in. Annotations are shown for data tracks that have been set to visible in the available tracks section at the bottom. Tracks can be configured with a right-click or by clicking on their name near the bottom of the page.
[Labeled orientation to the Genome Browser] This is a view of the SARS-CoV-2 Genome Browser (COVID Browser) with labeled elements to help with orientation. Interact with this session by clicking on the picture. To read the full caption, please go to our Nature Genetics paper. Genes and Sequence Alignments Gene and protein annotations are organized by the contributor, most notably NCBI and UniProt. Having multiple information sources allows a consensus to be formed among datasets. Like many viral genomes, molecular complexity arises from polyproteins rearranging, generating ~29 protein products. Most notable among these is the S (spike) protein, which defines coronaviruses and allows entry through the cell membrane. Additional tracks contain information such as interactions between viral proteins and human proteins (protein interact), PDB structures, RNA structure annotations (Rangan RNA), and more. Sequence alignments and conservation data are also available across the SARS-CoV-2 genome, from large-scale views to individual bases and amino acids. Four conservation tracks compare sequences with 44 bat coronaviruses, 119 vertebrate coronaviruses, 7 human coronaviruses, and PhyloCSF computed conservation scores. The tracks display differently depending on visibility mode and the number of bases on the screen. Datasets can be turned on by setting the dropdown next to the data track name from "hide" to dense, squish, pack, or full. Then click the refresh button to see these changes in effect. Clicking on a data track name will take you to a description with more information on the dataset, display conventions, methods, and references. Clicking on a particular item will take you to a page with complete information about that item and dataset. [Some of the gene and conservation data on the Genome Browser] This Genome Browser display shows some of the gene and conservation tracks available on the SARS-CoV-2 genome.
You should be able to see UniProt protein products, regions of interest, and domains all mapped against the SARS-CoV-2 genome. Below those tracks are two different conservation alignments in "squish" and "pack" formats, comparing bat-host and human-host coronavirus sequences with the reference SARS-CoV-2 genome. Interact with this session by clicking on the picture. Exploring Variation and Immunology Data The SARS-CoV-2 Genome Browser displays data on variation within SARS-CoV-2 from UniProt, GenBank, GISAID, Nextstrain, and other providers. These datasets cover global trends in SARS-CoV-2 variation among all available public sequences, with regional descriptions available through clicking into a particular entry. One of the most notable tracks under the "Variation and Repeats" section is the Phylogeny: Public track, which shows a continuously updating phylogenetic tree that clusters similar sequences, with the frequency of each mutation shown by the height of the bar at that particular base. Tools are provided to filter these data to show only well-supported mutation calls, set thresholds for minor-allele frequency, and display data for specific clades. Another is the track of spike protein mutations from community annotations, highlighted as amino acid changes with red indicating strong antibody escape in receptor-binding domain (RBD) mutation screens. The Genome Browser also has the Variants of Concern track, which pinpoints each accumulated mutation that defines 4 strains of SARS-CoV-2 of particular concern, labeled based on lay terms (such as 'California variant') as well as using the lineage defined by the Pangolin software (such as 'B.1.1.7'). The Genome Browser also provides 12 immunology datasets that can inform potential therapeutic targets or public health risks. Protein epitopes are highlighted in the genome by multiple tracks, including those from the Immune Epitope Database (IEDB) and from a study of COVID+ patients.
Of particular interest are the datasets describing surveys of antibody response across a variety of SARS-CoV-2 variants in the receptor-binding domain (Antibody Escape Mutations). [Some of the variation and immunology data on the Genome Browser] This image shows some of the variation data tracks that can be displayed on the SARS-CoV-2 genome, specifically zoomed into the receptor-binding domain of the Spike protein. Validated epitopes that may be targets for therapeutic antibodies are displayed in black. In red and black, antibody escape scores are shown for each genome position. Smaller tick marks show amino acid or nucleotide changes from different sources, with more information available by clicking into the item. Genetic Contact Tracing with UShER The UCSC Genome Browser has developed a tool that allows placement of SARS-CoV-2 sequences onto existing phylogenetic trees far faster than previous methods, allowing instantaneous tracing of strains and transmission events. This tool is called Ultrafast Sample placement on Existing tRees (UShER) and exists as an interactive web tool to compare sequences and link to existing public phylogenetic trees. [Example of the UShER phylogeny placement tool] After uploading a FASTA file, the tool returns a page with quality metrics such as the number of bases aligned, the number of Ns, and the number of maximally parsimonious placements, along with the lineage and clade of the nearest neighbor. Colored boxes highlight possible quality issues, with green meaning a high-confidence placement. SARS-CoV-2/COVID Phylogenetic Trees You can view your aligned SARS-CoV-2 sequence genotypes along with their closest known relatives among the 150,000+ public sequences. You can compare among your uploaded samples or trace possible transmission vectors using mutational signatures. [Example of the UShER phylogeny placement tool tree features] The uploaded sequences are highlighted in blue alongside their most closely aligned public sequences.
You can investigate genotypes and relationships between samples. Other tools, downloads, and features Custom Tracks, BLAT, Track Hubs Along with a suite of data tracks, filters, and visualization options for the SARS-CoV-2 genome, the UCSC Genome Browser offers many additional ways to interface with our data. You can upload your data on the reference genome in nearly any format with our Custom Track tool. If you have unaligned sequence, you can use our BLAT sequence alignment tool to get coordinates and base-by-base comparison with any reference genome. We also display formatted data as Track Hubs and curate a list of user-submitted Public Track Hubs. Downloads, Table Browser, JSON API, SQL As part of our open-source, open-access philosophy, we try to make it as easy as possible for researchers to download entire datasets or filtered subsets. Each track description page has a Data Access section which points users to our main options for data download. For downloading complete datasets, our SARS-CoV-2 download directory provides access to all our source files for transparency and reproducibility. Our Table Browser tool lets users interact with our data using a variety of filters based on score, identifiers, or any other field. Table Browser also allows users to convert data into multiple formats (e.g. BED, GTF) and to retrieve sequence output in FASTA format. We have a JSON API which can be called programmatically to return any dataset in its entirety or as a filtered subset based on documented input parameters. We also offer a Public SQL server for a similarly flexible, automated way to access genomic data and annotations. Along with this particular virus genome browser, we have thousands of genomes available for visualization and analysis from our genome assemblies gateway page. Support and Collaboration The Genome Browser offers rapid email support for anything related to our tools.
If your question is general or may have been asked before, please review our Browser documentation and our archive of previously answered questions. If you would still like help, please go to our Contact Us page to access our email support. When contacting us, please include a session link, images, and example data if applicable. We are active on social media; you can follow us on Twitter or Facebook. We are always looking to collaborate with researchers and add new datasets to our site. We also seek to continuously improve our tools to meet the needs of the scientific community. If you have any collaboration ideas, contributions, or feature requests, please reach out through our suggestion page. /goldenPath/help/maf.html:Genome_Browser_maf_Format Redirect /goldenPath/help/cram.html:Genome_Browser_CRAM_Format CRAM Track Format The UCSC Genome Browser is capable of displaying both the BAM and CRAM file formats. While BAM files contain all sequence data within a file, CRAM files are smaller because they rely on an additional external "reference sequence" file. This file is needed to both compress and decompress the read information. Since CRAM files are more compact than BAM files, many groups are switching to the CRAM format to save disk space. For CRAM tracks to load, the checksum of the reference sequence used to create the CRAM is expected to be in the CRAM header. A file with a matching checksum is also expected to be accessible from the EBI RefGet CRAM reference registry (see References for CRAM resources). Otherwise, users must specify a refUrl setting that points to a server offering the reference sequences (see Example #4). Since the loading of CRAM data requires the specific reference sequence used to create the CRAM file, it is very important that the exact same reference sequence is used for compression and decompression.
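As a rough illustration of the checksum matching described above, the Python sketch below computes an MD5 digest of a reference sequence after uppercasing it and removing whitespace, which is a simplified reading of how the SAM specification describes the M5 header tag. The function names and the toy sequences are hypothetical, and this is not the Genome Browser's own implementation. It also shows how a refUrl template containing %s could be filled in with such a checksum.

```python
import hashlib


def reference_md5(sequence: str) -> str:
    """Return the MD5 hex digest of a sequence, uppercased with whitespace removed.

    Assumption: this mirrors the normalization used for the M5 tag in a
    SAM/CRAM header; consult the SAM specification for the exact rules.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


def fill_ref_url(template: str, checksum: str) -> str:
    """Substitute a checksum for the %s placeholder in a refUrl template."""
    return template.replace("%s", checksum)


# Line wrapping and letter case do not change the checksum ...
same = reference_md5("acgt\nACGT") == reference_md5("ACGTACGT")
# ... but a single differing base does, so the reference would not match.
different = reference_md5("ACGTACGA") != reference_md5("ACGTACGT")

# Hypothetical server URL, taken from the refUrl example in this document.
url = fill_ref_url("http://university.edu/URL/cramRef/%s", reference_md5("ACGTACGT"))
```

The point of the sketch is only that a one-base difference between the compression-time and decompression-time references yields a different checksum, which is why the same reference must be used for both.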
When a CRAM file is first loaded on a given chromosome, a check for the preexistence in a special browser "cramCache" directory of the specified reference checksum will take place. If the reference sequence information specific for that CRAM for the currently viewed chromosome region does not exist, a message will display about the file not being found along with a note about downloading the reference from the EBI CRAM reference registry if it is available or from another Refget server using the refUrl setting. A refresh of the page once the download is complete will display the CRAM data as if it were a BAM file. The track lines to describe CRAM tracks are identical to track lines for BAM tracks. This includes the type parameter, which is still bam even for CRAM tracks. The only difference is that instead of providing the URL to a BAM file, the URL instead points to a CRAM file. Please also note that just as a BAM file requires an associated BAM.bai index file, a CRAM file will require an associated CRAM.crai index file in the same location to load. Example #1 Here is an example CRAM track that displays around the gene SOD1 on hg19 that can be cut and pasted as text into the Custom Tracks page: track type=bam db=hg19 name=exampleCRAM bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/cramExample.cram Please note at the above URL location there is also a http://genome.ucsc.edu/goldenPath/help/examples/cramExample.cram.crai file. If this .crai file is at a different URL, the bigDataIndex= option must be added. Clicking this following link will also load the above track. 
The information following hgct_customText is equivalent to pasting the text into the Custom Tracks page: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr21%3A33031597-33041570&hgct_customText=track%20type=bam%20name=exampleCRAM%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/cramExample.cram Example #2 If the URL to a CRAM file ends with .cram, you can paste the URL directly into the custom track management page, click submit and view it in the Browser. The track name will then be the name of the file. If you want to configure the track name and descriptions, you will need to create a track line, as shown in the above example. Learn more about track line options and configuring custom tracks here. Here is an example URL to a CRAM file from the 1000 Genomes Project that can be pasted directly into the Custom Tracks page: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/exome_alignment/HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam.cram After adding the above link, you can see that the Browser automatically assigns type=bam and the track name=HG00096.mapped.ILLUMINA.bwa.GBR.exome.20120522.bam to the created track. Clicking the following image will load a CRAM file from the 1000 Genomes Project. [] This CRAM display uses the new "density graph" feature, in which the bam.cram reads are displayed as a bar graph by checking the box next to "Display data as a density graph" on the Custom Track Settings page. Example #3 The CRAM format is also supported in track hubs. Below is an example trackDb.txt stanza that would display a CRAM file from the 1000 Genomes Project. To learn more about using Track Hubs, see the User Guide and associated Quick Start Guides to building hubs. Note that type bam is used to display CRAM files in hubs, just as type bam is used in custom CRAM tracks.
track cram61 type bam shortLabel HG00361 longLabel This CRAM file is from the 1000 Genomes Project HG00361 visibility pack bigDataUrl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00361/exome_alignment/HG00361.mapped.ILLUMINA.bwa.FIN.exome.20120522.bam.cram Example #4 For genomes that are not registered in the EBI CRAM Reference Registry, the refUrl setting is used to point the browser to the appropriate place to find the reference sequence. The refUrl setting is used with the URL of the reference server, such as refUrl http://university.edu/URL/cramRef/%s where the %s gets replaced by the RefGet MD5 checksum which identifies the reference sequence. The example below shows a hub track stanza using the refUrl setting: track cramExample type bam visibility full shortLabel cramExRefUrl longLabel This CRAM file points to a reference sequence specified by refUrl refUrl http://university.edu/URL/cramRef/%s bigDataUrl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram The use of refUrl can also be employed on a custom track line: track type=bam db=hg19 name=cramExRefUrl refUrl=http://university.edu/URL/cramRef/%s bigDataUrl=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram References Below is a collection of helpful CRAM resources: - CRAM toolkit - CRAM reference registry - Refget: standardized access to reference sequences - CRAM format specification (version 3.0) - Using CRAM within Samtools - International Genome Sample Resource (IGSR) CRAM Tutorial - Dave Tang blog post about "BAM to CRAM" Sharing your data with others If you would like to share your CRAM data track with a colleague, learn how to create a URL by looking at Example 11 on this page. 
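For readers who want to generate a shareable link programmatically, here is a minimal Python sketch that assembles an hgTracks URL carrying a track line in the hgct_customText parameter, following the URL pattern shown in Example #1. The function name is hypothetical, and the choice of which characters to leave unescaped is an assumption made so that the output matches that documented example; other track lines may need different handling.

```python
from urllib.parse import quote


def custom_track_url(db: str, position: str, track_line: str,
                     base: str = "http://genome.ucsc.edu/cgi-bin/hgTracks") -> str:
    """Build an hgTracks URL that loads a custom track from a track line.

    Spaces become %20 and the colon in the position becomes %3A. The
    characters '=', '/' and ':' in the track line are left readable
    (an assumption based on the Example #1 URL in this document).
    """
    encoded_position = quote(position, safe="")
    encoded_track = quote(track_line, safe="=/:")
    return f"{base}?db={db}&position={encoded_position}&hgct_customText={encoded_track}"


link = custom_track_url(
    "hg19",
    "chr21:33031597-33041570",
    "track type=bam name=exampleCRAM "
    "bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/cramExample.cram",
)
```

With these inputs the function reproduces the Example #1 link character for character; characters such as & or ? inside a bigDataUrl would be percent-encoded so they do not terminate the hgct_customText value.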
Activating CRAM support for the Genome Browser For documentation on how to set up CRAM support on a mirror of the UCSC Genome Browser, please see the README.cram file. /goldenPath/help/image.html:Genome_Browser_Tracks_Image Help with the browser tracks image If you are having difficulties viewing the Genome Browser, it may be a result of a recent update to our software. To clear up the tracks image, you can force a refresh/reload of the page in your Internet browser. Depending on which Internet browser you are using, you may need to hold down the Shift key while simultaneously clicking the "reload" button (or, in some browsers, Ctrl+R works). Once you have refreshed the page, the tracks image should display correctly. Sorry for any inconvenience. We hope that you understand that the website is continually evolving so that we can provide you with more features and data. /goldenPath/help/trix.html:Genome_Browser_Trix_Indices Trix Indices A Trix index consists of a pair of files that allow for fast look-up of free text associated with a list of identifiers. The index is created from a single line-oriented text file using the program ixIxx. Each line in the text file starts with an identifier, followed by free text associated with the ID. The search is not case sensitive, so any case-combination of the free text entered will be matched. For a more complete description of how to make a searchable track hub (or custom track), please visit the Quick Start Guide to Searchable track hubs. To complete the steps below you must first download the ixIxx utility. For more information on downloading our command line utilities, see these instructions. Example 1 To create a Trix index, follow these steps: 1. Prepare a text file that associates your IDs with free text: id1 this is text for id1 id2 this is text for id2 id3 this is text for id3 2. Run the ixIxx program on your text file. ixIxx input.txt myTrix.ix myTrix.ixx Example 2 1.
If you have a bigBed track with unique gene names such as SIRT1, BRCA1, TP53 in the fourth (name) column, and you built the bigBed with the -extraIndex=name option to index the name field, you can create an input.txt such as the following that associates the bigBed name with other identifiers people might search for or with shorter spellings of the names: SIRT1 sirt1 sir sirt Sirtuin SIR2-Like ENSG00000096717 NM_012238 BRCA1 brca1 brca brc breast cancer 1 ENSG00000012048 NM_007300 TP53 tp53 tp5 Tumor Protein P53 ENSG00000141510 NM_001126112 2. You would then run the ixIxx program on your text file, taking into account the length of your longest word. ixIxx input.txt myTrix.ix myTrix.ixx The ixIxx utility has a default maximum word length of 31 characters. If you build an input.txt with longer words, such as very long accessions, you need to add the option -maxWordLength=N to override the default and expand it to the size of the longest words in your index. Resources and examples If you want to use your Trix index in a track hub, see the searchTrix setting in the Track Database Definition Document. Review our Quick Start Guide to Searchable track hubs for illustrated steps for building a track hub. There are also tools available for converting genePred format to Trix format, such as this gpToIx.pl perl script. /goldenPath/help/bigMaf.html:Genome_Browser_bigMaf_Alignment_Format bigMaf Track Format The bigMaf format stores multiple alignments in a format compatible with MAF files, which is then compressed and indexed as a bigBed. The bigMaf files are created using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) file that defines the fields of the bigMaf. The bigMaf files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server.
Because of this, bigMaf files have considerably faster display performance than regular MAF files when working with large data sets. The bigMaf file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigMaf files, please see the Hosting section of the Track Hub Help documentation. bigMaf file definition The following autoSql definition is used to specify bigMaf multiple alignment files. This definition, contained in the file bigMaf.as, is pulled in when the bedToBigBed utility is run with the -as=bigMaf.as option. bigMaf.as table bedMaf "Bed3 with MAF block" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chromosome" uint chromEnd; "End position in chromosome" lstring mafBlock; "MAF block" ) An example: bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb Supporting frame and summary definitions Alongside the bigMaf file, two other summary and frame bigBeds are created. The following autoSql definition is used to create the first file, pointed to online with summary , rather than the standard bigDataUrl used with bigMaf. The file mafSummary.as, is pulled in when the bedToBigBed utility is run with the -as=mafSummary.as option. mafSummary.as table mafSummary "Positions and scores for alignment blocks" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chromosome" uint chromEnd; "End position in chromosome" string src; "Sequence name or database of alignment" float score; "Floating point score." 
char[1] leftStatus; "Gap/break annotation for preceding block" char[1] rightStatus; "Gap/break annotation for following block" ) An example, bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb. Another tool, hgLoadMafSummary generates the input bigMafSummary.bed file. The following autoSql definition is used to create the second file, pointed to online with frames . The file mafFrames.as, is pulled in when the bedToBigBed utility is run with the -as=mafFrames.as option. mafFrames.as table mafFrames "codon frame assignment for MAF components" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start range in chromosome" uint chromEnd; "End range in chromosome" string src; "Name of sequence source in MAF" ubyte frame; "frame (0,1,2) for first base(+) or last bast(-)" char[1] strand; "+ or -" string name; "Name of gene used to define frame" int prevFramePos; "target position of the previous base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous" int nextFramePos; "target position of the next base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous" ubyte isExonStart; "does this start the CDS portion of an exon?" ubyte isExonEnd; "does this end the CDS portion of an exon?" ) An example, bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb. Another tool, genePredToMafFrames generates the input bigMafFrames.txt file. Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file. Creating a bigMaf track To create a bigMaf track, follow these steps: Step 1. If you already have a MAF file you would like to convert to a bigMaf, skip to Step 3. Otherwise, download this example MAF file for the human GRCh38 (hg38) assembly. Step 2. 
If you would like to include optional reading frame and block summary information, download the chr22_KI270731v1_random.gp genePred file. Step 3. Download the autoSql file bigMaf.as needed by bedToBigBed. If you have opted to include the optional frame and summary information with your bigMaf file, you must also download the autoSql files mafSummary.as and mafFrames.as. Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below: wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.maf wget https://genome.ucsc.edu/goldenPath/help/examples/chr22_KI270731v1_random.gp wget https://genome.ucsc.edu/goldenPath/help/examples/bigMaf.as wget https://genome.ucsc.edu/goldenPath/help/examples/mafSummary.as wget https://genome.ucsc.edu/goldenPath/help/examples/mafFrames.as wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes Step 4. Download the bedToBigBed and mafToBigMaf programs from the UCSC binary utilities directory. If you have opted to generate the optional frame and summary files for your multiple alignment, you must also download the hgLoadMafSummary, genePredSingleCover, and genePredToMafFrames programs from the same directory. Step 5. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database with which you are working (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
mafToBigMaf hg38 chr22_KI270731v1_random.maf stdout | sort -k1,1 -k2,2n > bigMaf.txt bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt hg38.chrom.sizes bigMaf.bb Note that the hg38 in the mafToBigMaf hg38 command indicates the referenceDb and matches the expected prefix of the primary species' sequence name, for instance hg38 for the hg38.chr22_KI270731v1_random found in the input example chr22_KI270731v1_random.maf file. Step 6. Follow the below steps to create the binary indexed mafFrames and mafSummary files to accompany your bigMaf file: genePredSingleCover chr22_KI270731v1_random.gp single.gp genePredToMafFrames hg38 chr22_KI270731v1_random.maf bigMafFrames.txt hg38 single.gp bedToBigBed -type=bed3+8 -as=mafFrames.as -tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf cut -f2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb Step 7. Move the newly created bigMaf file (bigMaf.bb) to a web-accessible http, https or ftp location. If you generated the bigMafSummary.bb and/or bigMafFrames.bb files, move those to a web accessible location, likely same location as the bigMaf.bb file. Step 8. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the track line will look something like this: track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb summary=http://myorg.edu/mylab/bigMafSummary.bb frames=http://myorg.edu/mylab/bigMafFrames.bb Step 9. Paste the custom track line into the text box on the custom track management page. Navigate to chr22_KI270731v1_random to see the example data for this track. The bedToBigBed program can be run with several additional options. 
For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message. Examples Example #1 In this example, you will create a bigMaf custom track using an existing bigMaf file, bigMaf.bb, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly. To create a custom track using this bigMaf file: 1. Construct a track line that references the file: track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb summary=http://genome.ucsc.edu/goldenPath/help/examples/bigMafSummary.bb 2. Paste the track line into the custom track management page for the human assembly hg38 (Dec. 2013). 3. Click the "submit" button. Note that additional track line options exist that are specific to the MAF format. For instance, adding the parameter setting speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5" to the above example will specify the order of sequences by species. Custom tracks can also be loaded via one URL line. This link loads the same bigMaf.bb track and sets additional display parameters in the URL: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack After this example bigMaf is loaded in the Genome Browser, click into an alignment on the browser's track display. Note that the details page displays information about the individual alignments, similar to that which is available for a standard MAF track. Example #2 In this example, you will create a bigMaf file from an existing bigMaf input file, bigMaf.txt, located on the UCSC Genome Browser http server. 1. Save the bed3+1 example file, bigMaf.txt, to your computer (Step 6, above). 2. 
Save the autoSql file bigMaf.as to your computer (Step 3, above). 3. Download the bedToBigBed utility (Step 4, above). 4. Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human (hg38) assembly (Step 5, above). 5. Run the bedToBigBed utility to create a binary indexed MAF file (Step 6, above): bedToBigBed -type=bed3+1 -tab -as=bigMaf.as bigMaf.txt hg38.chrom.sizes bigMaf.bb 6. Move the newly created bigMaf file (bigMaf.bb) to a web-accessible location (Step 7, above). 7. Construct a track line that points to the bigMaf file (Step 8, above). 8. Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (Step 9, above). Sharing your data with others If you would like to share your bigMaf data track with a colleague, learn how to create a URL by looking at Example 6 on this page. Extracting data from the bigMaf format Because bigMaf files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory. - bigBedToBed — converts a bigBed file to ASCII BED format. - bigBedSummary — extracts summary information from a bigBed file. - bigBedInfo — prints out information about a bigBed file. As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement. Troubleshooting If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program. /goldenPath/help/bigMethyl.html:Genome_Browser_bigMethyl_Track_Format BigMethyl Track Format The bigMethyl format allows display of methylation sites.
General Structure The bigMethyl format is line-oriented. BigMethyl data are preceded by a track definition line, which adds a number of options for controlling the default display of this track. Following the track definition line are the track data in 18 column BED format: chromA chromStartA chromEndA dataValueA chromB chromStartB chromEndB dataValueB Parameters for bigMethyl track definition lines All options are placed in a single line separated by spaces: track type=bigMethyl name=track_label description=center_label Note: if you copy/paste the above example, you must remove the line breaks. The track type is REQUIRED, and must be bigMethyl: type=bigMethyl Data Values BigMethyl track data values can be integer or real, positive or negative values. The chromosome coordinates are zero-based, half-open. This means that the first chromosome position is 0, and the last position in a chromosome of length N would be N - 1. The positions listed in the input data must be in numerical order, and only the specified positions will be graphed. bigMethyl format has eighteen columns of data: chrom chromStart chromEnd dataValue Example Note: The above example is a custom track that includes a track type= line that is specific for loading the data in the browser. This line will cause a raw bigMethyl data file to fail validation by other tools, such as validateFiles, outside of the browser. /goldenPath/help/hgBamTrackHelp.html:Genome_Browser_BAM_Configuration Configuring BAM tracks Genome Browser BAM tracks may be configured in a variety of ways to highlight different aspects of the displayed information. The configuration options are described here and related to custom track settings that can alter the default appearance of the custom track. Click here for more information on BAM custom track creation. - Attempt to join paired end reads by name: This checkbox appears only if pairEndsByName is included in the track settings. 
When checked, SAM/BAM records with the same name will be joined into pairs for display, with a line drawn between them. - Minimum alignment quality: Exclude alignments with quality less than the given number. The default is 0, unless changed by the track setting minAliQual. - Color track by bases: By default, mismatching bases are highlighted in the display. Change the selection to "item bases" to see all base values from the query sequence, or "OFF" to ignore query sequence. - Additional coloring modes: Other aspects of the alignments can be displayed in color or grayscale. The default mode is "Color by strand" (bamColorMode=strand), unless the bamColorMode track setting specifies gray, tag or off. - Color by strand: alignments on the reverse strand are colored dark red, alignments on the forward strand are colored dark blue. - Grayscale: items are shaded according to the chosen method: alignment quality, base qualities, or unpaired ends. The alignment qualities of items are shaded on a scale of 0 (lightest) to 99 (darkest). Base qualities are shaded on a scale of 0 (lightest) to 40 (darkest). When "unpaired ends" is selected, items that were paired in sequencing but whose mate was not mapped are colored gray, while singletons and properly paired items are black. Alignment quality is the default (bamGrayMode=aliQual) unless bamGrayMode track setting is baseQual or unpaired. - Use R,G,B colors specified in user-defined tag: SAM/BAM may include user-defined tags, whose names begin with X, Y or Z and include one other letter or number. The user-defined tag named here specifies red, green and blue (RGB) intensities as a zero-terminated string (tag type Z) containing comma-separated triples of numbers from 0-255. For example, if a SAM/BAM record includes the tag YC:Z:255,0,0, then the item is colored red; YC:Z:0,0,255 makes the item blue. By default, the tag is "YC" unless changed using the track setting bamColorTag. 
- No additional coloring - Display data as a density graph: This feature enables the BAM data to be displayed as a bar graph where the height is proportional to the number of reads mapped to each genomic position. Through dynamic calculation of items in the current window, this feature plots a line similar to a wiggle graph that can be customized with a number of graph-based configuration options such as drawing indicator lines, smoothing plots, adjusting graph height and vertical range, and switching from bars to points. Please note that the feature is best displayed with Display mode set to full and that the default Data view scaling is auto-Scale to data view. Also, please note that when set to display as a density graph, other BAM display options described on this page, such as coloring for strand, alignment quality and base qualities, are not applied. When you have finished making your configuration changes, click the Submit button to return to the annotation track display page. /goldenPath/help/barChart.html:barChart_and_bigBarChart_Track_Format barChart and bigBarChart Track Format The barChart (and bigBarChart) track format displays a graph of category-specific values over genomic regions, similar to the GTEx Gene track. This format is useful for displaying gene expression and other datasets where it is desirable to compare a set of variables over genomic regions. While a barChart track can effectively show datasets with single values for each variable (e.g. comparing individual samples), the format provides specific features to display studies comprised of a large set of samples for each variable (e.g. comparing tissues with multiple samples for each tissue). In this usage, the main genome browser display presents a graph of summary values (e.g. medians) for each variable, and the distribution of sample values across variables is shown via a boxplot graph shown on the details page for each region. 
The barChart format is available as a standalone plain text bed6+ format for use with smaller datasets as a custom track, and as a binary indexed format (bigBarChart) suitable for track hubs and custom tracks. The bigBarChart format provides more track customization features (i.e. schema customization, and label configuration support), and is recommended for users who can use command-line tools and have web-accessible data storage. If you do not have web-accessible data storage, please see the Hosting section of the Track Hub Help documentation. barChart format files are converted to bigBarChart files using the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) schema file defining the fields of the bigBarChart. Below is an example of the barChart format in 'full' visibility mode: [BarChart example in full mode] The 'squish' display mode draws one colored rectangle indicating the category (e.g. tissue) with the highest value of the measured metric (e.g. gene expression) if it contributes more than 10% to the total expression; otherwise the chart is colored black. The following image shows the GTEx Genes track in 'squish' mode; the beige colored item (tissue) has the highest expression in the ACE2 gene and represents more than 10% of total expression. Click into the colored rectangle for more information. [BarChart example in squish mode] barChart format definition The following autoSql definition illustrates the basic schema supporting barChart (and bigBarChart) tracks. table bigBarChart "bigBarChart bar graph display" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chromosome" uint chromEnd; "End position in chromosome" string name; "Name or ID of item" uint score; "Score (0-1000)" char[1] strand; "'+','-' or '.'.
Indicates whether the query aligns to the + or - strand on the reference" string name2; "Alternate name of item" uint expCount; "Number of bar graphs in display, must be <= 100" float[expCount] expScores; "Comma separated list of category values." bigint _dataOffset; "Offset of sample data in data matrix file, for boxplot on details page, optional only for barChart format" int _dataLen; "Length of sample data row in data matrix file, optional only for barChart format" ) Column Explanations The first 6 fields of the barChart format are the same as the first 6 fields of the standard BED format. The name2 field provides an alternate item name, useful if you would like to associate multiple transcripts to a single gene locus, different variables to the same experiment type, etc. The expCount and expScores fields are used as in the Microarray format; they define the number of categories and a value for each category (see example #1 below). The _dataOffset and _dataLen fields are used internally by the track to locate sample values for a region in an optional matrix file containing all sample values. These values are used to draw a boxplot of all sample data on the details page for the bar chart. When a matrix file is not supplied, these fields should be set to 0. (As a convenience, these fields are optional for barChart custom tracks). When creating bigBarChart files, we encourage you to customize the title and field descriptions of the prototype autoSql schema to better describe your data. 
In the example below, the name field of the track refers to a transcript, while the name2 field represents a gene: table xyzGeneExpression "XYZ gene expression barChart" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chromosome" uint chromEnd; "End position in chromosome" string name; "Transcript name" uint score; "Score (0-1000), derived from total expScores (below)" char[1] strand; "+, -, or ., indicating orientation of the item" string name2; "Gene name" uint expCount; "Number of tissues" float[expCount] expScores; "Comma separated list of median expression in RPKM for each tissue." bigint _dataOffset; "Offset of sample data in data matrix file" int _dataLen; "Length of sample data row in data matrix file" ) Customizing this file will make your data more easily interpreted by users, who will see the field descriptions when accessing the track data from the Table Browser, when viewing items on the Genome Browser details pages (via the "view table schema" link), and (for users who download files) from the -as option of the bigBedInfo tool. Creating barChart and bigBarChart custom tracks The steps for creating barChart tracks differ from the process for creating bigBarChart tracks. The steps also differ based on whether you have an input matrix file (generated perhaps from an RNA-Seq differential expression analysis pipeline) or not. If you have an expression matrix-like file, skip to Example #3; otherwise, follow Example #1 below. Example #1 In this example, you will create a barChart custom track using example bed6+3 data. 1. Paste the following track line into the custom track management page for the human assembly hg38.
track type=barChart name="barChart Example One" description="A barChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack
browser position chr14:95,081,796-95,436,280
# chrom chromStart chromEnd name score strand name2 expCount expScores _dataOffset _dataLen
chr14 95086227 95158010 DICER1 999 - ENSG00000100697.10 5 2.94,11.60,38.00,6.69,4.89
chr14 95181939 95319906 CLMN 999 - ENSG00000165959.7 5 7.08,69.53,9.32,1.38,1.68
chr14 95417493 95475836 SYNE3 999 - ENSG00000176438.8 5 7.29,3.73,0.74,20.35,1.39
2. Click the "submit" button. After the file loads in the Genome Browser, you should see an automatically colored bar graph with 5 bars. Hovering the mouse over any of the individual bars will display the name of the particular bar ("adiposeSubcut", "wholeBlood", ...) as well as the value associated with that bar (2.94, 4.89, ...). The order of bar names in the barChartBars field of the track line should exactly match the order of the values in the expScores field.
Example #2
In this example, you will create a bigBarChart track out of an existing bigBarChart format file, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly. To create a custom track using this file:
1. Construct a track line referencing the file:
track type=bigBarChart name="bigBarChart Example One" description="A bigBarChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb
browser position chr14:95,081,796-95,436,280
2. Paste the track line into the custom track management page for the human assembly hg38.
3. Click the "submit" button. After the file loads in the Genome Browser, you should see an automatically colored bar graph with 5 bars.
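Because the bar labels and the expScores values are matched purely by position, a quick consistency check before uploading can catch an off-by-one label list. The following is a minimal sketch, not a UCSC tool: the function name check_bar_chart_row is hypothetical, and it assumes whitespace-separated fields with expCount in column 8 and expScores in column 9, as in the schema above.

```python
def check_bar_chart_row(line, bar_names):
    """Map each bar label to its value for one barChart BED row.

    Raises ValueError if expCount, expScores and the label list disagree.
    (Hypothetical helper for illustration; not part of the UCSC utilities.)
    """
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("a barChart row needs at least 9 fields (bed6+3)")
    exp_count = int(fields[7])  # column 8: expCount
    # column 9: expScores; skipping an empty entry left by a trailing
    # comma is an assumption of this sketch
    scores = [float(s) for s in fields[8].split(",") if s]
    if len(scores) != exp_count:
        raise ValueError(
            f"expCount is {exp_count} but expScores has {len(scores)} values")
    if len(bar_names) != exp_count:
        raise ValueError(
            f"barChartBars has {len(bar_names)} labels but expCount is {exp_count}")
    # Labels and values are paired by position, as the browser does.
    return dict(zip(bar_names, scores))


bars = "adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood".split()
row = "chr14 95417493 95475836 SYNE3 999 - ENSG00000176438.8 5 7.29,3.73,0.74,20.35,1.39"
values = check_bar_chart_row(row, bars)
print(values["colonTransverse"])  # 0.74
```

Running the check over every row of a file and stopping at the first ValueError is usually enough to find a mismatched barChartBars setting.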
The same rules apply to bigBarChart custom tracks as barChart custom tracks in that the order of names in the barChartBars field should exactly match the order of values from the expScores field in the bigBarChart file. Example #3 In this example, you will use the helper scripts expMatrixToBarchartBed and bedJoinTabOffset on example matrix and category files in order to generate a bed6+5 barChart format file, which can be loaded as a custom track into the Genome Browser. The matrix file is a tab-separated (must be tabs, not spaces) file of the following form, perhaps resulting from an RNA-Seq analysis pipeline. Please note that the first line must describe each column as in the example snippet below. transcript sample1 sample2 sample3 sample4 sample5 ... transcriptName value1 value2 value3 value4 value5 ... The categories file then provides more meta information about this matrix file. It is a two column, tab-separated file that maps the samples in the matrix file to a specific category: sample1 category1 sample2 category1 ... ... sampleA category2 sampleB category2 ... ... sampleX category3 sampleY category3 ... ... Each column in the first line of the matrix file must be found in the categories file. We have provided an example category file and matrix file to follow along with the rest of this example. To create a custom track in this form, follow the below steps: 1. Create a bed6+1 file to use as a map for the items in your matrix (does not need to be all-inclusive). This file, with example lines such as: chr14 95086227 95158010 ENSG00000100697.10 999 - DICER1 will be one of three input files to the helper script expMatrixToBarchartBed. For an example file to follow along with the rest of this example, you can download an example bed6+1 file here. 2. Download the helper programs expMatrixToBarchartBed and bedJoinTabOffset from the utilities directory appropriate for your operating system. 3. 
Make sure the programs downloaded above are in your system's PATH variable. For example, if you downloaded the programs to your $HOME/Downloads directory, set your PATH variable accordingly: export PATH=$PATH:$HOME/Downloads 4. Now run expMatrixToBarchartBed (which in turn runs bedJoinTabOffset) like so: expMatrixToBarchartBed categoriesFile matrixFile bedInputFile outputBed The argument outputBed will be a bed6+5 file, with the expCount, expScores, _dataOffset, and _dataLen fields computed for you, for example: chr14 95086227 95158010 ENSG00000100697.10 999 - DICER1 5 10.94,11.60,8.00,6.69,4.89 93153 26789 expMatrixToBarchartBed will also output the order of the scores in the expScores field, which you can then copy and paste into the barChartBars field of the custom track line so the bars displayed in the browser match the right values: The columns and order of the groups are: #chr start end name score strand name2 expCount expScores;adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood _offset _lineLength If you have already pre-computed expCount and expScores, and just need offsets into your matrix file for a more descriptive details page, run only bedJoinTabOffset like so: bedJoinTabOffset matrixFile exampleBed6+3 outBed 5. Now that we have a bed6+5 format file that corresponds to the barChart format, we can construct a track line and prepend it to our bed6+5 file so the Genome Browser will recognize it: track type=barChart name="barChart Example" description="A barChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" barChartMetric=median visibility=pack 6. Use the upload button on the custom track management page for the human assembly hg38 to upload the file just created. 7. Click the "submit" button.
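To make the relationship between the matrix file, the categories file and the expScores field concrete, here is a minimal sketch of a per-category median summary. It is an illustration only, not the implementation of expMatrixToBarchartBed: the function name category_medians, the tiny in-memory inputs and the alphabetical category order are all assumptions of the sketch (the real tool reports its own column order, as shown in step 4).

```python
import csv
import io
from statistics import median


def category_medians(matrix_text, categories_text):
    """Summarize a sample matrix into one median per category per row.

    Hypothetical helper: both inputs are tab-separated text, the matrix
    with a header line naming each sample column.
    """
    # Two-column categories file: sample name -> category name.
    sample_to_cat = dict(
        line.split("\t") for line in categories_text.strip().splitlines())
    rows = csv.reader(io.StringIO(matrix_text), delimiter="\t")
    samples = next(rows)[1:]          # header line minus the row-name column
    # Alphabetical order is an assumption of this sketch.
    cats = sorted({sample_to_cat[s] for s in samples})
    result = {}
    for row in rows:
        if not row:
            continue
        by_cat = {c: [] for c in cats}
        for sample, value in zip(samples, row[1:]):
            by_cat[sample_to_cat[sample]].append(float(value))
        result[row[0]] = [median(by_cat[c]) for c in cats]
    return cats, result


# Made-up sample names and values, for illustration only.
matrix = "transcript\ts1\ts2\ts3\ts4\nENSG00000100697.10\t10.0\t12.0\t3.0\t5.0\n"
categories = "s1\twholeBlood\ns2\twholeBlood\ns3\tmuscleSkeletal\ns4\tmuscleSkeletal\n"
cats, medians = category_medians(matrix, categories)
print(" ".join(cats))                                   # value for barChartBars
print(",".join(f"{v:.2f}" for v in medians["ENSG00000100697.10"]))  # expScores
```

The first printed line plays the role of the barChartBars value and the second the expScores field; keeping both derived from the same category list is what guarantees that labels and values stay in the same order.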
expMatrixToBarchartBed automatically computes the median values for all the samples in the matrix file, which is useful when your experiment contains data from 8000 samples (such as the GTEx data). Furthermore, expMatrixToBarchartBed can compute the mean value of all samples in a category of the matrix file (instead of the default median), in addition to allowing for a specific ordering of the expScores field. NOTE: Set the barChartMetric setting to 'mean' if you use this option of expMatrixToBarchartBed. For more information about expMatrixToBarchartBed or bedJoinTabOffset, run the program with no arguments to get a usage message. Example #4 In this example, you will use the bed6+5 file created in Example 3 to create a bigBarChart file, allowing the data to be remotely accessed and exist within a track hub. The track settings for bigBarChart on a hub can be viewed here. 1. If not already completed, follow steps 1-4 from Example 3 above, or download the example bed6+5 file here 2. Download the fetchChromSizes and bedToBigBed programs from the utilities directory appropriate to your operating system. 3. Use fetchChromSizes to create a chrom.sizes file for the UCSC database you are working with (hg38 for these examples). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes. 4. Save the autoSql file barChartBed.as to your computer. 5. Run bedToBigBed to create the bigBarChart file: bedToBigBed -as=barChartBed.as -type=bed6+5 inputBed hg38.chrom.sizes output.bigBed 6. Move the newly constructed bigBarChart file to a web accessible http, https, or ftp location. 7. Construct a custom track line with a bigDataUrl parameter pointing to the newly created bigBarChart file. 
If the matrix and category files used to make the precursor barChart file are also moved to an http, https, or ftp location, we can point to them on the custom track line as well (all settings must be on the same line): track type=bigBarChart name="bigBarChart Example One" description="A bigBarChart file" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" barChartMetric=median barChartUnit=RPKM bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb barChartMatrixUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleMatrix.txt barChartSampleUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleSampleData.txt visibility=pack 8. To fully take advantage of creating a bigBarChart file and the barChartMatrixUrl and barChartSampleUrl supporting files, create a Track Hub and use a stanza such as the following: track exampleBarChartTrack type bigBarChart visibility full shortLabel exBarChart longLabel Simple example bar chart track barChartBars adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood barChartColors #FF6600 #33CCCC #CC9955 #AAAAFF #FF00BB barChartLabel Tissues barChartMetric median barChartUnit RPKM bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb barChartMatrixUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleMatrix.txt barChartSampleUrl http://genome.ucsc.edu/goldenPath/help/examples/barChart/exampleSampleData.txt Please note, the fields in your barChartBars line must match the terms in your categories file (exampleBarChartSamples.txt) in order for the boxplot display to show up on the details page for tracks. Below is an example image indicating the benefit of using these files in a hub, note the "View all data points for..." link that allows extracting data from the matrix file (exampleBarChartMatrix.txt) specific for this named item. 
[Example boxPlot image] Example #5 To help track hub developers adjust the display of tracks, two settings are available: barChartBarMinWidth and barChartBarMinPadding. The first sets the minimum width of the bars in the chart, in pixels, for example barChartBarMinWidth 10. The second sets the minimum space between bars, in pixels, for example barChartBarMinPadding 5. The two example tracks below use these settings on the same source data; to see how the display differs, go to the My Data, Custom Tracks page and paste in the text below. browser position chr14:95,081,796-95,436,280 track type=bigBarChart barChartBarMinPadding=5 name="ex barChartBarMinPadding" description="A bigBarChart file with barChartBarMinPadding" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb track type=bigBarChart barChartBarMinWidth=20 name="ex barChartBarMinWidth" description="A bigBarChart file with barChartBarMinWidth" barChartBars="adiposeSubcut breastMamTissue colonTransverse muscleSkeletal wholeBlood" visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/barChart/hg38.gtexTranscripts.bb [Example1 of barChartBarMinPadding and barChartBarMinWidth] Both tracks have the same data; however, in the bottom track the barChartBarMinWidth 20 setting produces wider bars, and in the top track the barChartBarMinPadding 5 setting produces larger padding between bars. As described in the settings entries for barChartBarMinWidth and barChartBarMinPadding, the bar layout is calculated dynamically from the current window size, the width of the item, and the number of bars for the item. As a result, barCharts with these settings can look different at different zoom levels.
For instance, in the first image, you can see how much impact barChartBarMinWidth has on the second track, as well as barChartBarMinPadding on the top track. When zoomed in, as in the image below, the impact of both settings is less noticeable. [Example2 of barChartBarMinPadding and barChartBarMinWidth] Example #6 To help with the selection and exploration of large data sets, the settings barChartFacets, barChartStatsUrl, and barChartMerge were introduced. With them, checkboxes on the details page let you slice the data down to smaller collections based on metadata. The setting barChartFacets turns on the faceted selection on the track details and configure page, which is useful for choosing which bars out of a large number to display by clicking designated checkboxes. The setting barChartStatsUrl <url> associates a table of tab-separated values with the barChart, with one line per bar. The setting barChartMerge on enables a merge button inside the faceted selections; it is particularly useful when there are many bars and many facets, to condense a related group such as tissue source. Below is an example track using these settings on the source data of a Tabula Sapiens track of single-cell RNA data from many tissues. This excerpt of settings from that track lets you see these settings in action; load it from the My Data, Custom Tracks page.
track type=bigBarChart name="ex Tabula Sapiens" description="A bigBarChart using Tabula Sapiens data to illustrate new Details pages" visibility=pack barChartCategoryUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/bw_edit_tissue_cell_type.categories barChartFacets=tissue,cell_class,cell_type barChartStatsUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/bw_edit_tissue_cell_type.facets barChartMerge=on bigDataUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/tabulaSapiens/tissue_cell_type.bb Once loaded, click into an item to see the details page, in this case for the gene ACE2 at the default position in hg38. On the details page, rather than a static bar chart image, there is a dynamic interactive selection screen with checkbox facets to narrow down the display. Adding barChartMerge on enables the display of the "merge" button, and barChartFacets tissue,cell_class,cell_type sources information in barChartStatsUrl ...tissue_cell_type.facets to enable the facet options. To interact with this example, click the first two "merge" buttons next to "tissue" and "cell_class." [Example1 of Facets on barCharts] With those two selections merged, click on the "Macrophage" option to see just this one cell type selection. [Example2 of Facets on barCharts] Clicking the "unmerge" button next to "tissue" then expands the single bar chart into tissue clusters. [Example3 of Facets on barCharts] In these ways the new barChartFacets, barChartStatsUrl, and barChartMerge settings allow users to explore the barChart data on the individual details page more closely. One can use the facets to further select certain types and also click the columns (val/count/cluster) to arrange by numerical value or alphabetical name. Also, if you click the "Return to Genome Browser" link, you will see that only the selected bars are displayed.
[Example4 of Facets on barCharts] In this image after making the selections browsing ACE2 the "zoom out" button has been clicked to also view nearby genes where the expression of these tissue selections for the gene PIR is quite noticeably different. Example #7 In this example, we will be using command-line tools that were used to create the single-cell tracks available on hg38. For more in-depth examples of these tools, take a look at the following makedoc for a real-life example. The matrixClusterColumns command converts a single cell gene expression matrix to a cell-type gene expression matrix. It takes a cell-by-cell metadata matrix that refers to the same cells as a gene expression matrix and combines the gene expression values for all cells of a given type into a single value representing the cell type. It can also be used on other metadata fields to produce matrices that show mean or average gene expression levels for a donor, an organ, or any other metadata field or combination of fields. The following command uses the exprMatrix.tsv and meta.tsv files to create six files: prepMatrix.tsv, prepStats.tsv, TissueCompMatrix.tsv, TissueCompStats.tsv, SexMatrix.tsv, and SexStats.tsv. matrixClusterColumns exprMatrix.tsv meta.tsv \ prep prepMatrix.tsv prepStats.tsv \ "Tissue Composition" TissueCompMatrix.tsv TissueCompStats.tsv \ Sex SexMatrix.tsv SexStats.tsv Read 5 rows from meta.tsv matrix exprMatrix.tsv has 209126 fields 209126 total columns, 209121 unclustered, 0 misses 209126 total columns, 209121 unclustered, 0 misses 209126 total columns, 209121 unclustered, 0 misses . If you are not sure which GENCODE Genes version is best suited for your data, the gencodeVersionForGenes command takes a list of gene symbols or gene accessions and searches for the version of GENCODE or RefSeq that matches the most genes in the list. Optionally, the tool can produce a BED file containing the gene structures for the genes in the list. 
The following command uses the gene.lst and geneSymVerTx.tsv files to create a mapping.bed file that will be used in the next step. # Figure out gene set gencodeVersionForGenes -target=hg38 gene.lst geneSymVerTx.tsv -bed=mapping.bed examining 23 versions of gencode and refseq best is gencodeVM5 as sym on mm10 with 6 of 6 (100%) hits on hg38 6 of 6 (100%) hit across versions The matrixToBarChartBed combines an expression matrix and a BED file with gene structures (mapping.bed) to make a new BED file (myTissueComp.bed) with a barChart showing gene expression that can be viewed on the Genome Browser. The optional argument, stats=stats.tsv provides a statistics file and improves the coloring in trackDb. The following command uses the TissueCompMatrix.tsv, mapping.bed, and TissueCompStats.tsv to create the myTissueComp.bed file that can be viewed on the Genome Browser or converted into a bigBed file so it can be used inside of a track hub. matrixToBarChartBed TissueCompMatrix.tsv mapping.bed myTissueComp.bed -stats=TissueCompStats.tsv 5 genes found, 0 (0.00%) missed The simpleBarChartBed.as can be used with the bedToBigBed command to create the bigBarChart file. bedToBigBed myTissueComp.bed hg38.chrom.sizes myTissueComp.bb -type=bed6+3 -as=simpleBarChartBed.as pass1 - making usageList (1 chroms): 15 millis pass2 - checking and writing primary data (5 records, 9 fields): 1 millis Sharing your data with others If you would like to share your barChart/bigBarChart data track with a colleague, learn how to create a URL by looking at Example 6 on this page. Extracting data from the bigBarChart format Because bigBarChart files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory. - bigBedToBed — converts a bigBed file to ASCII BED format. 
- bigBedSummary — extracts summary information from a bigBed file. - bigBedInfo — prints out information about a bigBed file. Use the -as option to see the file field descriptions. As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement. Troubleshooting If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program. /goldenPath/help/axt.html:Genome_Browser_axt_Alignment_Format axt Alignment Format axt alignment files are produced from Blastz, an alignment tool available from Webb Miller's lab at Penn State University. The axtNet and axtChain alignments are produced by processing the alignment files with additional utilities written by Jim Kent at UCSC. Example: The following segment from an axt file shows the first 2 sets of alignments of the human assembly (the aligning assembly) to mouse chromosome 19 (the primary assembly).
0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA
TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA

1 chr19 3008279 3008357 chr11 70573976 70574054 - 3900
CACAATCTTCACATTGAGATCCTGAGTTGCTGATCAGAATGGAAGGCTGAGCTAAGATGAGCGACGAGGCAATGTCACA
CACAGTCTTCACATTGAGGTACCAAGTTGTGGATCAGAATGGAAAGCTAGGCTATGATGAGGGACAGTGCGCTGTCACA

Structure Each alignment block in an axt file contains three lines: a summary line and 2 sequence lines. Blocks are separated from one another by blank lines. Summary line
0 chr19 3001012 3001075 chr11 70568380 70568443 - 3500
The summary line contains chromosomal position and size information about the alignment. It consists of 9 required fields: - Alignment number -- The alignment numbering starts with 0 and increments by 1, i.e.
the first alignment in a file is numbered 0, the next 1, etc. - Chromosome (primary organism) - Alignment start (primary organism) -- The first base is numbered 1 - Alignment end (primary organism) -- The end base is included. - Chromosome (aligning organism) - Alignment start (aligning organism) - Alignment end (aligning organism) - Strand (aligning organism) -- If the strand value is "-", the values of the aligning organism's start and end fields are relative to the reverse-complemented coordinates of its chromosome. - Blastz score -- Different blastz scoring matrices are used for different organisms. See the README.txt file in the alignments directory for scoring information specific to a pair of alignments. Sequence lines TCAGCTCATAAATCACCTCCTGCCACAAGCCTGGCCTGGTCCCAGGAGAGTGTCCAGGCTCAGA TCTGTTCATAAACCACCTGCCATGACAAGCCTGGCCTGTTCCCAAGACAATGTCCAGGCTCAGA The sequence lines contain the sequence of the primary assembly (line 2) and aligning assembly (line 3) with inserts. Repeats are indicated by lower-case letters. /goldenPath/help/hgTrackHubHelp.html:Track_Hubs Using UCSC Genome Browser Track Hubs Contents What Are Track Hubs? What Are Assembly Hubs? Viewing Track Hubs Sharing Track Hubs Setting up a Track Hub Adding Groups to a Track hub Debugging Track Hubs Setting up item search Adding filters to your Track Hub Registering a Track Hub with UCSC Checking Hub settings and compatibility Where to host your data Track Hub Database Definition Document ------------------------------------------------------------------------ Additional resources - Raney BJ, et al. Track Data Hubs enable visualization of user-defined genome-wide annotations on the UCSC Genome Browser. Bioinformatics. 2014 Apr 1;30(7):1003-5.
- Assembly Hub User Guide - Track Database Definition Document (Reference guide for trackDb parameter definitions) - Track hub settings blog post - Public Hub Guidelines - Quick Start Guide to Basic Hubs - Quick Start Guide to Organizing Track Hubs into Groupings - Quick Start Guide to Assembly Hubs with Blat - Quick Start Guide to Searchable Track Hubs - Quick Start Guide to adding Filters to Hubs Search the Genome Browser help pages: Questions and feedback are welcome. What Are Track Hubs? Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser (please note that hosting hub files on HTTP tends to work even better than FTP and local hubs can be displayed on GBiB). Track hubs can be displayed on genomes that UCSC directly supports, or on your own sequence. Hubs are a useful tool for visualizing a large number of genome-wide data sets. For example, a project that has produced several wiggle plots of data can use the hub utility to organize the tracks into composite and super-tracks, making it possible to show the data for a large collection of tissues and experimental conditions in a visually elegant way, similar to how the ENCODE native data tracks are displayed in the browser. The track hub utility allows efficient access to data sets from around the world through the familiar Genome Browser interface. Browser users can display tracks from any public track hub that has been registered with UCSC. Additionally, users can import data from unlisted hubs or can set up, display, and share their own track hubs. Genome assemblies that UCSC does not support can be loaded and viewed with associated data. The data underlying the tracks and optional sequence in a hub reside on the remote server of the data provider rather than at UCSC. 
Genomic annotations are stored in compressed binary indexed files in bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigMethyl, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM, HAL, hic or VCF format that contain the data at several resolutions. In the case of assemblies that UCSC does not support, genomic sequence is stored in the efficient twoBit format. When a hub track is displayed in the Genome Browser, only the relevant data needed to support the view of the current genomic region are transmitted rather than the entire file. The transmitted data are cached on the UCSC server to expedite future access. This on-demand transfer mechanism eliminates the need to transmit large data sets across the Internet, thereby minimizing upload time into the browser. The track hub utility offers a convenient way to view and share very large sets of data. Individuals wishing to display only a few small data sets may find it easier to use the Genome Browser custom track utility. As with hub tracks, custom tracks can be uploaded to the UCSC Genome Browser and viewed alongside the native annotation tracks. Custom tracks can be constructed from a wide range of data types; hub tracks are limited to compressed binary indexed formats that can be remotely hosted. However, the custom tracks utility does not offer the data persistence and track configurability provided by the track hub mechanism: hub tracks can be grouped into composite or super-tracks and configured to display the data using a wide variety of options. There is no way to create a browser on your own sequence with custom tracks. In general, for users who have large data sets that would be prohibitive to upload, need to ensure the persistence of their data, or would like to take full advantage of track functionality, or create a browser on sequence not natively supported by UCSC or a genome browser mirror, track hubs are a better solution. 
Both mechanisms give data providers the flexibility to directly add, update, and remove data from their display as needed. What Are Assembly Hubs? Assembly Data Hubs extend the functionality of Track Data Hubs to assemblies that are not hosted natively on the Browser. Assembly Data Hubs were developed to address the increasing need for researchers to annotate sequence for which UCSC does not provide an annotation database. They allow researchers to include the underlying reference sequence, as well as data tracks that annotate that sequence. Sequence is stored in the UCSC twoBit format, and the annotation tracks are stored in the same manner as Track Data Hubs. For more information on how to setup your own Assembly Data Hub, please refer to the Assembly Hub User Guide and see the Quick Start Guide to Assembly Hubs. Viewing Track Hubs Public hubs The Genome Browser provides links to a collection of public track hubs that have been registered with UCSC. To view a list of the public track hubs available, click into the blue navigation bar "My Data" and then "Track Hubs" to reach the Public Track Hubs page. You can click links in the "Description" column to see details about a particular Hub. To view a hub's data, click on an assembly name on the row of your hub or the "Connect" button. If you clicked the "Connect" button, choose your assembly or click the "Genome Browser" link from the top blue bar to be brought to the default assembly. The selected hub tracks will be listed in a separate track group below the browser image and can be configured just like native browser tracks. Exercise caution when viewing a wide region that requires the Genome Browser to display a large number of track features: the browser display may time out. Unlisted hubs (located in the My Hubs tab) In addition to the Public Hubs listed, it is possible to load your own track hub or one created by a colleague. To add an unlisted hub, open the Track Hubs page and click the Connected Hubs tab. 
To import a new hub, type or paste its URL into the text box, then click the "Add Hub" button. If successful, your track hub will appear on that page. Tracks accessed through a hub can be used in Genome Browser sessions and custom tracks in the same manner as other tracks. The data underlying data hub tracks can be viewed, manipulated, and downloaded using the UCSC Table Browser. To remove a track hub from your Genome Browser display, click the "Disconnect" button on the Track Hubs page. For confidential or private data, please note that unlisted hubs are not secure. The URL helps to hide the location of the data; it is a simple barrier of obscurity. Please also know that hubs can be loaded from local directories when using GBiB. Sharing Track Hubs When sharing track hubs on a single assembly, we recommend using Saved Sessions (especially for publications). These have the advantages of being single-click access and sharing a full browser configuration. If you are just starting, an overview of the process is to make the hub on your web-accessible server, attach the hub to the Genome Browser, configure browser position and related tracks, then save your named session and share the session link. Your session links will be in the following format with your chosen username and session name: - http://genome.ucsc.edu/s/ExampleUser/TrackHubSession For additional information, see sharing Saved Sessions and backing up custom data. Creating a URL for a Track Hub Hubs can be loaded into the URL using the hubUrl= parameter. This parameter takes input similar to the track hub input box. Native UCSC supported genomes can be loaded into the URL using the db= parameter while non-natively supported genomes such as assembly hubs or GenArk hubs use the genome= parameter. URL parameters can be combined by using &. 
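The parameter rules above (db= for natively supported genomes, genome= for assembly or GenArk hubs, one hubUrl= per hub, everything joined with &) can also be applied programmatically. The following is only a minimal sketch: the function name hub_link, the HGTRACKS constant, and the choice of the https form of the hgTracks address are assumptions made for illustration, not part of the Genome Browser itself.

```python
from urllib.parse import urlencode

# Assumed base address for this sketch; adjust for a mirror.
HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def hub_link(hub_urls, db=None, genome=None, **extra):
    """Build an hgTracks link that attaches one or more hubs.

    hub_link is a hypothetical helper. Pass db= for a natively supported
    genome, or genome= for an assembly hub or GenArk assembly.
    """
    if (db is None) == (genome is None):
        raise ValueError("pass exactly one of db= or genome=")
    params = [("db", db)] if db is not None else [("genome", genome)]
    # One hubUrl= parameter per hub, in the order given.
    params += [("hubUrl", url) for url in hub_urls]
    # Any further settings, e.g. hideTracks=1 or ignoreCookie=1.
    params += list(extra.items())
    # Keep ":" and "/" unescaped so the hub URLs stay readable.
    return HGTRACKS + "?" + urlencode(params, safe=":/")


print(hub_link(
    ["https://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt"],
    db="hg19",
))
```

Settings whose names contain a dot, such as hgt.reset, cannot be written as Python keyword arguments directly; they can be passed as `**{"hgt.reset": 1}`.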
The following example links to the hg19 genome database and an example track hub using the db= and hubUrl= parameters:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt

The visibility of hub tracks can also be set with URL parameters. As an example, the following link:
- specifies the genome database (db=hg38)
- loads a track hub (hubUrl=http://hgdownload.soe.ucsc.edu/hubs/gtex/hub.txt)
- hides all tracks (hideTracks=1)
- hides the subtracks of a particular track (gtexRnaSignalMaleYoung_hideKids=1)
- sets the parent track and a specific subtrack to be displayed (gtexRnaSignalMaleYoung=full, gtexRnaSignalSRR1311243=full)
- ignores user settings (ignoreCookie=1)

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=http://hgdownload.soe.ucsc.edu/hubs/gtex/hub.txt&hideTracks=1&gtexRnaSignalMaleYoung_hideKids=1&gtexRnaSignalMaleYoung=full&gtexRnaSignalSRR1311243=full&ignoreCookie=1

Optional parameters that can be added to the URL:
- guidelines=on/off - activate or deactivate the blue guidelines
- hgFind.matches=<listOfNames> - highlight features given their names
- hgt.reset=1 - show only the default tracks
- hgt.toggleRevCmplDisp=1 - show the reverse-complement
- hgt.labelWidth=<number> - set the size of the left-side label area
- hideTracks=1 - hide all tracks
- hideTracks=1&<trackName>=full|dense|pack|hide - hide all tracks and show other tracks
- highlight=<db>.<chrom>:<chromStart>-<chromEnd>#<hexColor>|...
- ignoreCookie=1 - do not load the user's existing settings saved in the internet browser's UCSC Genome Browser cookie. This means that the link will show the Genome Browser default settings such as track selections, custom tracks, and track hubs. Any changes you make in this new session will, however, affect the user's settings. E.g., if you add a track in this new window, and come back to the Genome Browser later, the track will still be there.
This setting is useful if a website wants to link to the Genome Browser starting with a "clean slate", but expects the user to come back to the Genome Browser and find the changes still there.
- ruler=hide - hide the ruler at the top of the browser image
- oligoMatch=pack&hgt.oligoMatch=<sequence> - switch on the Short Match track and highlight a matching sequence
- pix=<number> - set the width of the image in pixels
- textSize=<number> - set the size of text font
- <trackName>=full|pack|dense|hide - keep your current tracks and set the named track to full, pack or dense visibility, or hide it
- <trackName>_imgOrd=<number> - vertically orders the tracks on the image based on the numbers provided. You need to specify an order for every visible track when using this parameter
- <trackName>.heightPer=<number> - sets the height of a bigWig track in pixels
- <trackName>_hideKids=1 - hides a specific super track's individual tracks
- <trackName>_sel=1 - selects a specific subtrack to be 'checked', allowing display

Creating a URL with Multiple Track Hubs
You can create a URL with multiple Track Hubs by repeating the hubUrl= parameter. URL parameters can be combined by using &. For example, using the following track hubs:
- ENCODE DNA Trackhub: https://storage.googleapis.com/gcp.wenglab.org/hubs/dna20/hub.txt
- JASPAR TFBS: http://expdata.cmmt.ubc.ca/JASPAR/UCSC_tracks/hub.txt
- ReMap 2022 Regulatory Atlas: https://remap.univ-amu.fr/storage/public/hubReMap2022/hub.txt

Combining the three hubs gives the following URL, which loads all three on the Genome Browser:

https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=https://storage.googleapis.com/gcp.wenglab.org/hubs/dna20/hub.txt&hubUrl=http://expdata.cmmt.ubc.ca/JASPAR/UCSC_tracks/hub.txt&hubUrl=https://remap.univ-amu.fr/storage/public/hubReMap2022/hub.txt

Creating a URL for an Assembly Hub
The following example links to an assembly hub using the hubUrl= and genome= parameters, where genome=araTha1 is the assembly name set for genome in the genomes.txt file.
URL parameters can be combined by using &.

https://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt

Additional hub connection parameters
- hubClear= - connects the hub and disconnects other hubs that are in the same directory
- hgHubConnect - replacing hgTracks with hgHubConnect in the URL connects the hub and redirects the link to the Track Data Hubs page
- hgHub_do_firstDb=1 - uses the first database in genomes.txt
- hgHub_do_redirect=on - redirects the attached hub to the Gateway page

Redirecting the URL to the Gateway page
The following example link connects the hub and redirects to the Gateway page to display the hub's description html page, which is defined in genomes.txt by the htmlPath setting. It uses hgHubConnect, hgHub_do_redirect=on, hgHubConnect.remakeTrackHub=on, hgHub_do_firstDb=1, and hubUrl.

http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt

You can also link to the Gateway page to display the hub's description html page by using hgGateway and the genome parameter. The following example links the hub to the Gateway page:

http://genome.ucsc.edu/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt

Creating a URL for an Assembly Hub with Multiple Track Hubs
You can create a URL for an assembly hub with track hubs using a combination of the hubUrl= and genome= URL parameters.
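Because a link may carry several hubUrl= parameters alongside db= or genome=, it can be useful to check which assembly and hubs a given link will actually load before sharing it. The sketch below uses only the Python standard library; the function name hubs_in_link and the shape of its return value are hypothetical choices for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def hubs_in_link(link):
    """Report the assembly and hub URLs named in a Genome Browser link.

    hubs_in_link is a hypothetical helper; it only inspects the query
    string and does not contact any server.
    """
    query = parse_qs(urlsplit(link).query)
    # db= is used for native genomes, genome= for assembly/GenArk hubs.
    assembly = (query.get("db") or query.get("genome") or [None])[0]
    # Repeated hubUrl= parameters are returned in their original order.
    return {"assembly": assembly, "hubs": query.get("hubUrl", [])}


link = (
    "https://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1"
    "&hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/hub.txt"
    "&hubUrl=https://remap.univ-amu.fr/storage/public/hubReMap2022/hub.txt"
)
info = hubs_in_link(link)
print(info["assembly"], len(info["hubs"]))
```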
URL parameters can be combined by using &. For example, using the following assembly hub and track hubs:
- Arabidopsis thaliana assembly hub: https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/hub.txt
- ReMap 2022 Regulatory Atlas: https://remap.univ-amu.fr/storage/public/hubReMap2022/hub.txt
- UniBind 2021 Robust hub: https://unibind.uio.no/static/data/latest/UniBind_hubs_Robust/UCSC/hub.txt

Combining the three hubs with genome=araTha1 gives the following URL, which loads all three on the Genome Browser:

https://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1&hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/hub.txt&hubUrl=https://remap.univ-amu.fr/storage/public/hubReMap2022/hub.txt&hubUrl=https://unibind.uio.no/static/data/latest/UniBind_hubs_Robust/UCSC/hub.txt

Creating a URL for a GenArk assembly with a track hub
When creating a track hub for a GenArk assembly, there is no need to attach the assembly hub itself via the hubUrl= URL parameter. GenArk hubs attach themselves automatically if the track hub mentions the GCA_ or GCF_ identifier of the assembly hub. Simply load the track hub on the Genome Browser, and the assembly hub will appear automatically. For example, the following example track hub loads an additional track for the pig (GCA_002844635.1) GenArk assembly. To create a link to a track hub that references a GenArk assembly, use the genome=GCA_002844635.1 and hubUrl= URL parameters as in the following example:

https://genome.ucsc.edu/cgi-bin/hgTracks?genome=GCA_002844635.1&hubUrl=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubGenArkExample/genArkTrackHub/hub.txt

Setting up your own Track Hub
This section provides a step-by-step description of the process used to set up a track hub on your own server.
If you would like information about how to attach a track hub to an existing assembly hub, please refer to the following FAQ entry. To create your own hub you will need: - one or more data sets formatted in one of the compressed binary index formats supported by the Genome Browser: bigBed, bigBarChart, bigGenePred, bigNarrowPeak, bigMethyl, bigPsl, bigChain, bigInteract, bigMaf, bigWig, BAM, CRAM, HAL, hic or VCF - a set of text files that specify properties for the track hub and for each of the data tracks within it - a twoBit file with your sequence if you are setting up an assembly hub. Note that the allowed characters in sequence names are [A-Za-z_0-9_-] - an Internet-enabled web server or ftp server The files are placed on the server in a file hierarchy like the one shown in Example 1. Users experienced in setting up Genome Browser mirrors that contain their own data will find that setting up a track hub is similar, but is usually much easier. Depending on the number and complexity of the data sets, a track hub can typically be set up in a day or two. It is generally easiest to run the command-line data formatting programs in a Linux programming environment, although it's possible to manipulate smaller data sets using Mac OS-X as well. Note: there is now a useOneFile on hub setting that allows the hub properties to be specified in a single file. More information about this setting can be found on the Genome Browser User Guide. If you would like to add metadata to your track hub, the following metadata guide contains examples of how to include the information in your tracks. Example 1: Directory hierarchy for a hub containing DNase and RNAseq data for the hg18 and hg19 human genome assemblies. The hg18/ and hg19/ subdirectories contain the assembly-specific data files. 
myHub/ - directory containing track hub files
    hub.txt - a short description of hub properties
    genomes.txt - list of genome assemblies included in the hub data
    hg19/ - directory of data for the hg19 (GRCh37) human assembly
        trackDb.txt - display properties for tracks in this directory
        dnase.html - description text for a DNase track
        dnaseLiver.bigWig - wiggle plot of DNase in liver
        dnaseLiver.bigBed - regions of active DNase
        liverGenes.bigGenePred - gene annotations of genes over-expressed in liver tissue
        dnaseLung.bigWig - wiggle plot of DNase in lung
        dnaseLung.bigBed - regions of active DNase
        ...
        rnaSeq.html - description text for an RNAseq track
        rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
        rnaSeqLiver.bigBed - intron/exon lists for liver
        rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
        rnaSeqLung.bigBed - intron/exon lists for lung
    hg18/ - directory of data for the hg18 (Build 36) human assembly
        trackDb.txt - display properties for tracks in this directory
        dnase.html - description text for a DNase track
        dnaseLiver.bigWig - wiggle plot of DNase data in liver
        dnaseLiver.bigBed - regions of active DNase
        dnaseLung.bigWig - wiggle plot of DNase data in lung
        dnaseLung.bigBed - regions of active DNase
        ...
        rnaSeq.html - description text for an RNAseq track
        rnaSeqLiver.bigWig - wiggle plot of RNAseq data in liver
        rnaSeqLiver.bigBed - intron/exon lists for liver
        rnaSeqLung.bigWig - wiggle plot of RNAseq data in lung
        rnaSeqLung.bigBed - intron/exon lists for lung

------------------------------------------------------------------------
Step 1. Format the data
The data tracks provided by a hub must be formatted in one of the compressed binary index formats supported by the Genome Browser: bigWig, bigBed, bigGenePred, bigChain, bigNarrowPeak, bigMethyl, bigBarChart, bigInteract, bigPsl, bigMaf, hic, BAM, CRAM, HAL or VCF. Many of these formats also support several different chromosome aliases (e.g. '1' or 'NC_000001.11' in place of 'chr1').
bigWig - The bigWig format is best for displaying continuous value plot data, such as read depths from short read sequencing projects or levels of conservation observed in a multiple-species alignment. A bigWig file contains a list of chromosome segments, each of which is associated with a floating point value. When graphed, the segments may appear as a big "wiggle". Although each bigWig file can contain only a single value for any given base, bigWig tracks are often combined into "container multiWig" or "compositeTrack on" tagged tracks. For information on creating and configuring bigWig tracks, see the bigWig Track Format help page. bigBed - BigBed files are binary indexed versions of Browser Extensible Data (BED) files. BED format is useful for associating a name and (optionally) a color and a score with one or more related regions on the same chromosome, such as all the exons of a gene. See the bigBed Track Format help page for information on creating and configuring bigBed tracks. bigGenePred - BigGenePred files are binary indexed versions of Browser Extensible Data (BED) files with an extra eight fields that are useful for describing gene predictions that are modeled after the fields in genePred files. BigGenePred format is useful for associating a name and (optionally) a color and a score with one or more related regions on the same chromosome, such as all the exons of a gene. See the bigGenePred Track Format help page for information on creating and configuring bigGenePred tracks. bigChain - BigChain files are binary indexed versions of chain files. BigChain format is useful for large pairwise alignment data sets. See the bigChain Track Format help page for more information on creating and configuring bigChain tracks. bigMethyl - BigMethyl files are binary indexed versions of Browser Extensible Data (BED) files with first nine fields being the same as bed, and an extra nine fields that contain various scores. 
See the bigMethyl Track Format help page for information on creating and configuring bigMethyl tracks. bigNarrowPeak - BigNarrowPeak files are binary indexed versions of Browser Extensible Data (BED) files with first six fields being the same as bed, and an extra four fields that contain various scores and the offset of the base within the block that is the peak. See the bigNarrowPeak Track Format help page for information on creating and configuring bigNarrowPeak tracks. bigBarChart - BigBarChart files are binary indexed versions of barChart files. BigBarChart format is useful for bringing barChart display into track hubs, and supports schema customization and label configuration that is not supported for regular barChart format. See the barChart Track Format help page for information on creating and configuring bigBarChart tracks. bigInteract - BigInteract files are binary indexed versions of interact files. BigInteract format is useful for bringing interact display into track hubs, and supports schema customization and label configuration that is not supported for regular interact format. See the interact Track Format help page for information on creating and configuring bigInteract tracks. bigPsl - BigPsl files are binary indexed versions of PSL files. BigPsl format is useful for large data sets created by BLAT or other tools. See the bigPsl Track Format help page for more information on creating and configuring bigPsl tracks. bigMaf - BigMaf files are binary indexed versions of MAF files. BigMaf format is useful for large multiple alignment data sets. See the bigMaf Track Format help page for more information on creating and configuring bigMaf tracks. hic - Hic files are binary files that store contact matrices from chromatin conformation experiments. This format is useful for displaying interactions at a scale and depth that exceeds what can be easily visualized with the interact and bigInteract formats. 
See the hic Track Format help page for more information on creating and configuring hic tracks. BAM - BAM files contain alignments of (generally short) DNA reads to a reference sequence, usually a complete genome. BAM files are binary versions of Sequence Alignment/Map (SAM) format files. Unlike bigWig and bigBed formats, the index for a BAM file is in a separate file, which the track hub expects to be in the same directory with the same root name as the BAM file with the addition of a .bai suffix. See the BAM Track Format help page for more information. CRAM - The CRAM file format is a more dense form of BAM files with the benefit of saving much disk space. While BAM files contain all sequence data within a file, CRAM files are smaller by taking advantage of an additional external "reference sequence" file. This file is needed to both compress and decompress the read information. See the CRAM Track Format help page for more information. HAL - HAL (Hierarchical Alignment Format) is a graph-based structure to efficiently store and index multiple genome alignments and ancestral reconstructions. HAL files are represented in HDF5 format, an open standard for storing and indexing large, compressed scientific data sets. HAL is the native output format of the Progressive Cactus alignment pipeline, and is included in the Progressive Cactus installation package. VCF - VCF (Variant Call Format) files can contain annotations of single nucleotide variants, insertions/deletions, copy number variants, structural variants and other types of genomic variation. When a VCF file is compressed and indexed using tabix (available here), it can be used as a data track file. Unlike bigWig and bigBed formats, the tabix index is in a separate file, which the track hub expects to be in the same directory with the same root name as the VCF file with the addition of a .tbi suffix. See the VCF Track Format help page for more information. 
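BAM and tabix-indexed VCF files differ from the big* formats in that their index lives in a separate file next to the data file (.bai and .tbi suffixes, as described above). A missing index is an easy mistake to make when copying files to a server, so a small pre-upload check can help. This is a minimal sketch: the helper names are hypothetical, and recognising the formats by the .bam and .vcf.gz extensions is an assumption of the sketch rather than a Genome Browser requirement.

```python
# Suffix rules follow the BAM and VCF descriptions above. The extensions
# used to recognise each format are assumptions made for this sketch.
INDEX_SUFFIX = {".bam": ".bai", ".vcf.gz": ".tbi"}


def expected_index(data_file):
    """Return the index file name expected beside data_file, or None."""
    for extension, suffix in INDEX_SUFFIX.items():
        if data_file.endswith(extension):
            return data_file + suffix
    return None


def missing_indexes(file_names):
    """List index files that should exist in a directory but do not."""
    present = set(file_names)
    missing = []
    for name in sorted(present):
        index = expected_index(name)
        if index is not None and index not in present:
            missing.append(index)
    return missing


print(missing_indexes(["dnaseReads.bam", "dnaseReads.bam.bai", "variants.vcf.gz"]))
```

The file list can come from os.listdir() on the hub directory or from any other listing of the files you intend to publish.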
------------------------------------------------------------------------
Step 2. Create the track hub directory
Create a track hub directory in an Internet-accessible location on your web or ftp server. This directory will contain the hub.txt and genomes.txt files that define properties of the track hub and a subdirectory for each of the genome assemblies covered by the hub track data. Note: there is now a useOneFile on hub setting that allows the hub properties to be specified in a single file. More information about this setting can be found on the Genome Browser User Guide.

------------------------------------------------------------------------
Step 3. Place the track data files in an Internet-accessible location
The data files underlying a track in a hub do not have to reside in the track hub directory or even on the same server, but they must be accessible via the Internet. The track hub utility supports Internet protocols such as http://, https://, and ftp://, as well as file paths relative to the hub directory hierarchy. The location of a track file is defined by its bigDataUrl tag in the associated trackDb.txt file (Step 7).

------------------------------------------------------------------------
Step 4. Create the hub.txt file
Within the hub directory, create a hub.txt file containing a single stanza with up to six fields that define properties of the track hub:

    hub hub_name
    shortLabel hub_short_label
    longLabel hub_long_label
    genomesFile genomes_filelist
    email email_address
    descriptionUrl descriptionUrl

hub - a single-word name of the directory containing the track hub files. Not displayed to hub users. This must be the first line in the hub.txt file.
shortLabel - the short name for the track hub. Suggested maximum length is 17 characters. Displayed as the hub name on the Track Hubs page and the track group name on the browser tracks page.
longLabel - a longer descriptive label for the track hub. Suggested maximum length is 80 characters.
Displayed in the description field on the Track Hubs page.
genomesFile - the relative path of the genomes.txt file, which contains the list of genome assemblies covered by the track data and the names of their associated configuration files. By convention the genomes.txt file is located in the same directory as the hub.txt file.
email - the contact to whom questions regarding the track hub should be directed.
descriptionUrl - URL to an HTML page with a description of the hub's contents. This can be relative to the directory that holds hub.txt. This file is assumed to be HTML, and if the hub is a UCSC public hub, this HTML will be crawled nightly by UCSC to build an index with which public hubs can be searched. If present, clicks on the shortLabel will open this HTML in a new tab. This field is optional.
useOneFile on - only use this setting if your hub displays a single genome and you wish to put all information into one file, hub.txt. This setting allows you to put the text lines of genomes.txt and trackDb.txt into the single hub.txt file. To learn more about this setting, see the useOneFile on documentation and its working example.

Example 2: Sample hub.txt file defining attributes for the track hub shown in Example 1.

    hub UCSCHub
    shortLabel UCSC Hub
    longLabel UCSC Genome Informatics Hub for human DNase and RNAseq data
    genomesFile genomes.txt
    email genome@soe.ucsc.edu
    descriptionUrl ucscHub.html

------------------------------------------------------------------------
Step 5. Create the genomes.txt file
Create a genomes.txt file within the track hub directory that contains a two-line stanza (plus any optional settings) for each genome assembly supported by the hub data. Stanzas must be separated by an empty line.
Each stanza shows the location of the trackDb file that defines display properties for each track in that assembly, as well as an optional metadata storage file:

    genome assembly_database_1
    trackDb assembly_1_path/trackDb.txt
    metaTab assembly_1_path/tabSeparatedFile.txt

    genome assembly_database_2
    trackDb assembly_2_path/trackDb.txt
    metaDb assembly_2_path/tagStormFile.txt

genome - a valid UCSC database name. Each stanza must begin with this tag and each stanza must be separated by an empty line.
trackDb - the relative path of the trackDb file for the assembly designated by the genome tag. By convention, the trackDb file is located in a subdirectory of the hub directory. However, the trackDb tag may also specify a complete URL.
metaDb - the path to an optional tagStorm file that has the metadata for each track. Each track with metadata should have a "meta" tag specified in the trackDb stanza for that track and a "meta" tag in the tagStorm file.
metaTab - the path to an optional tab separated file that has the metadata for each track. Each track with metadata should have a "meta" tag specified in the trackDb stanza for that track and a "meta" tag in the tab separated file. The first line of the TSV file should start with a '#' and have the field names for each column, one of them being "meta".

If this genomes.txt file is for an assembly that does not have native support in the browser, the following fields must also be present:
twoBitPath - refers to the .2bit file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program faToTwoBit. See here for instructions on how to build a 2bit file.
groups - a file which defines the track groups on this Genome Browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image. The groups.txt file defines the grouping of track controls under the primary Genome Browser image display.
The example referenced here has the usual definitions as found in the UCSC Genome Browser. Each group is defined by a stanza, for example the Mapping group:

    name map
    label Mapping
    priority 2
    defaultIsClosed 0

The name is used in the trackDb.txt track definition group setting to assign a particular track to this group. The label is displayed on the genome browser as the title of this group of track controls. The priority orders this track group relative to the other track groups. The defaultIsClosed setting determines whether this track group is expanded or closed by default. Values to use are 0 or 1.
description - displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the assembly pull-down menu on the browser gateway page.
organism - the string displayed along with the description on most title pages in the Genome Browser. Adjust your names in organism and description until they are appropriate. This organism name is the name that appears in the genome pull-down menu on the browser gateway page.
scientificName - specifies the scientific name for the assembly.
defaultPos - specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly.
orderKey - used with the other genome definitions at this hub to set the ordering of the genome pull-down menu.
htmlPath - refers to an html file that is used on the gateway page to display information for a novel assembly.
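The stanza layout described above (one "tag value" pair per line, stanzas separated by an empty line, each stanza starting with genome and naming a trackDb file) is simple enough to check before publishing a hub. The following is a minimal sketch under those stated rules only; parse_genomes and check_genomes are hypothetical helper names, and the sketch does not replace validation by the Genome Browser's own tools.

```python
def parse_genomes(text):
    """Split genomes.txt text into one dict per blank-line-separated stanza."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # An empty line closes the current stanza, if any.
            if current:
                stanzas.append(current)
                current = {}
            continue
        tag, _, value = line.partition(" ")
        current[tag] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def check_genomes(stanzas):
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for number, stanza in enumerate(stanzas, 1):
        # Dicts keep insertion order, so the first key is the first tag.
        if next(iter(stanza)) != "genome":
            problems.append(f"stanza {number}: first tag must be 'genome'")
        if "trackDb" not in stanza:
            problems.append(f"stanza {number}: missing trackDb")
    return problems


sample = "genome hg18\ntrackDb hg18/trackDb.txt\n\ngenome hg19\ntrackDb hg19/trackDb.txt\n"
print(check_genomes(parse_genomes(sample)))
```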
Example genomes.txt including a newOrg1 assembly

    genome hg18
    trackDb hg18/trackDb.txt

    genome hg19
    trackDb hg19/trackDb.txt

    genome newOrg1
    trackDb newOrg1/trackDb.txt
    twoBitPath newOrg1/newOrg1.2bit
    groups newOrg1/groups.txt
    description Big Foot V4
    organism BigFoot
    defaultPos chr21:33031596-33033258
    orderKey 4800
    scientificName Biggus Footus
    htmlPath newOrg1/description.html

------------------------------------------------------------------------
Step 6. Create the genome assembly subdirectories
Within the track hub directory, create a subdirectory for each of the genome assemblies that have track data in the hub. The subdirectory names must have a 1:1 correspondence with the database names defined by the genome tags in the genomes.txt file.

------------------------------------------------------------------------
Step 7. Create the trackDb.txt files
The trackDb.txt file, which is based on the Genome Browser .ra format, is the most complicated of the text files in the hub directory. It contains a stanza for each of the data files for the given assembly that defines display and configuration properties for the track. If the tracks are grouped into larger entities, such as composite or super-tracks, the larger entities will have a stanza in the file as well. The Track Database Definition Document will help you understand how to create a trackDb.txt file. This document describes how to declare dataset display settings and values, and indicates the support level for each setting. While there are over 100 track settings supported at UCSC, other sites that display hubs have more limited settings support. To improve the portability of hubs, we have used input from other sites to identify a "base" subset of the "full" settings list, and the document has been assigned a version number. See the document introduction for a fuller explanation.
At a minimum, each track in the trackDb.txt file must contain the "required" settings:

    track track_name
    bigDataUrl track_data_URL
    shortLabel short_label
    longLabel long_label
    type track_type

track - the symbolic name of the track. The first character must be a letter, and the remaining characters must be letters, numbers, or under-bar ("_"). Each track must have a unique name. This tag pair must be the first entry in the track's stanza.
bigDataUrl - the file name, path, or Web location of the track's data file. The bigDataUrl can be a full URL. If it is not prefaced by a protocol, such as http://, https:// or ftp://, then it is considered to be a path relative to the trackDb.txt file.
shortLabel - the short name for the track displayed in the track list, in the configuration and track settings, and on the details pages. Suggested maximum length is 17 characters.
longLabel - the longer description label for the track that is displayed in the configuration and track settings, and on the details pages. Suggested maximum length is 80 characters.
type - the format of the file specified by bigDataUrl. Must be either bigWig, bigBed, bigBarChart, bigGenePred, bigInteract, bigNarrowPeak, bigChain, bigPsl, bigMaf, hic, bam, halSnake or vcfTabix (Note: use type bam for CRAM files). If the type is bigBed, it may be followed by an optional number denoting the number of fields in the bigBed file (e.g., "type bigBed 12" for a file with 12 fields or "type bigBed 12 +" for a file that contains additional non-standard columns). If no number is given, a default value of 3 is assumed (a very limited display that omits names, strand information, and exon boundaries).

Example 4: Sample trackDb.txt file containing two simple tracks.
    track dnaseSignal
    bigDataUrl dnaseSignal.bigWig
    shortLabel DNAse Signal
    longLabel Depth of alignments of DNAse reads
    type bigWig

    track dnaseReads
    bigDataUrl dnaseReads.bam
    shortLabel DNAse Reads
    longLabel DNAse reads mapped with MAQ
    type bam

Suggestions:
Default subtracks for composite: For each composite, it is recommended that a subset of subtracks are "selected" (on) by default. This way, when a user turns the composite from hide to another visibility, they will see tracks displayed in the browser.
Default composites within a super-track: For super-tracks that you don't want displaying by default when your track hub is turned on, it is recommended that some (or all) composites within the super-track be set to dense (or some visibility other than hide) by default and that the super-track be set to hide by default. This way, if a user changes a super-track from hide to show from the controls under the browser image, tracks are displayed. To implement, change the visibility line in trackDb of the super-tracks to hide and the visibility lines of all or some of the composite tracks within to dense (or some visibility other than hide).
hgTrackUi controls: In addition to the controls for each view (click on the title of the view drop-down), there is often another set of controls above the view drop-downs (just under the "Overall display mode"). This set of controls is not associated with a particular view and clashes with the view controls. It is recommended to remove the controls that are not associated with a particular view.

------------------------------------------------------------------------
Step 8. Create track description files
Each track in the hub may have an associated description file that describes the track to viewers. The file provides detailed information about the data displayed in the track, including methods used to produce and validate the data, background information, display conventions, acknowledgments, and reference publications.
The description file, which must be in HTML format, is inserted into the track configuration page that displays when the user clicks on the track's short label. It also displays on the track details page that is shown when the user clicks on a feature in the track image. The track description file must have the same name as the symbolic name for the track (defined by the track tag in the trackDb.txt file) with a suffix of .html. For instance, a description file associated with the track named "dnaseSignal" in Example 4 would be named "dnaseSignal.html". The description file must reside in the same directory as the trackDb.txt file. Both parent and child tracks within a super-track can have their own description files. If the description file is not present, the corresponding sections of the track settings and details pages are left blank. Only one description page can be associated with composite and multiWig tracks; the file name should correspond to the symbolic name of the top-level track in the composite.

Adding groups to a Track Hub
To organize tracks within your track hub, you can use the groups setting, which references a separate text file that defines track groups for display under the genome browser graphic. The groups setting can be applied to a UCSC genome, a GenArk assembly, or an assembly hub. These track hub groups are kept separate from other track hubs and the native UCSC Genome Browser track groups, allowing for greater organizational flexibility. For instance, you can add a "genes" group without causing conflicts or confusion. You can define groups with names like "Category 1", "Category 2", and "Category 3". These group names are specified in the "groups.txt" file, which defines the track groups.
Below is an example stanza demonstrating how to define these groups:

name category1
label Category 1

name category2
label Category 2
priority 1

name category3
label Category 3
defaultIsClosed 1

You can access the complete "groups.txt" file containing these group definitions here: groups.txt. Within this file:

- The name setting is used in the trackDb.txt file to associate specific tracks with a group.
- The label setting specifies the title of the group in the genome browser. By default, groups are sorted alphabetically based on the label.
- The priority setting dictates the display order of the track groups, with lower numbers shown first.
- The defaultIsClosed setting controls whether the group is initially expanded or collapsed (0 for expanded, 1 for collapsed).

After defining the "groups.txt" file, tracks can be assigned to a specific group within the "trackDb.txt" file using the group setting. For example, to assign the "bigBed1" track to the "category1" group, you could use the following stanza:

track bigBed1
bigDataUrl https://hgdownload.soe.ucsc.edu/gbdb/hg38/ncbiRefSeq/ncbiRefSeqGenomicDiff.bb
shortLabel bigBed example 1
longLabel This bigBed file is an example from the NCBI RefSeq Diffs Track
type bigBed
group category1
visibility dense

The full "trackDb.txt" file, which includes the "bigBed1" track and additional examples, can be viewed here: trackDb.txt.

Next, incorporate the "groups.txt" file with the groups setting into the genome stanza. This can be for a UCSC genome, a GenArk assembly, or an assembly hub. Below is an example using the hg38 genome:

genome hg38
trackDb hg38/trackDb.txt
groups groups.txt

An example "genomes.txt" file, which includes the hg38 genome, a GenArk assembly, and an assembly hub, is available here: genomes.txt. By attaching the hub with these group settings, the tracks will be displayed within the specified groups.
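Because the group setting in trackDb.txt must match a name defined in groups.txt, a small consistency check can catch misspelled group names before the hub is attached. The following is a minimal Python sketch, not a UCSC utility: it assumes simple blank-line-separated stanzas with one "setting value" pair per line, and the function names (parse_stanzas, undefined_groups) are hypothetical. The track bigBed2 and its group category9 are invented here only to illustrate a mismatch.

```python
def parse_stanzas(text):
    """Split blank-line-separated stanzas into dicts of setting -> value."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # comment line, ignore
        if not line:
            if current:  # a blank line closes the current stanza
                stanzas.append(current)
                current = {}
            continue
        parts = line.split(None, 1)
        current[parts[0]] = parts[1] if len(parts) > 1 else ""
    if current:
        stanzas.append(current)
    return stanzas


def undefined_groups(groups_text, trackdb_text):
    """Map track name -> group for tracks whose group is not in groups.txt."""
    defined = {s["name"] for s in parse_stanzas(groups_text) if "name" in s}
    return {
        s.get("track", "?"): s["group"]
        for s in parse_stanzas(trackdb_text)
        if "group" in s and s["group"] not in defined
    }


GROUPS_TXT = """\
name category1
label Category 1

name category2
label Category 2
priority 1

name category3
label Category 3
defaultIsClosed 1
"""

TRACKDB_TXT = """\
track bigBed1
type bigBed
group category1

# hypothetical track with a misspelled group, for illustration only
track bigBed2
type bigBed
group category9
"""

print(undefined_groups(GROUPS_TXT, TRACKDB_TXT))  # prints {'bigBed2': 'category9'}
```

An empty result means every group referenced in the trackDb text is defined in the groups text. This sketch does not replace hubCheck, which validates the hub as the Genome Browser sees it.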
Below are URLs showing the group settings for the hg38 genome, a GenArk assembly, and an assembly hub:

- hg38: hg38 Genome.
- GCF_000951035.1: GenArk Assembly.
- Arabidopsis thaliana Assembly Hub: Assembly Hub.

Debugging Track Hubs

Not updating? Change udcTimeout

As part of the track hub mechanism, UCSC caches data from the hub on the local server. The hub utility periodically checks the time stamps on the hub files, and downloads them again only if they have a time stamp newer than the UCSC one. For performance reasons, UCSC checks the time stamps every 300 seconds, which can result in a 5-minute delay between the time a hub file is updated and the time the change appears on the Genome Browser. Hub providers can work around this delay by inserting the CGI variable udcTimeout=5 into the Genome Browser URL, which will reduce the delay to five seconds. To add this variable, open the Genome Browser tracks page and zoom or scroll the image to display a full browser URL in which the CGI variables are visible. Insert the CGI variable just after the "hgTracks" portion of the URL so that it reads http://genome.ucsc.edu/cgi-bin/hgTracks?udcTimeout=5& (with the remainder of the URL following the ampersand). While udcTimeout is set, a warning message appears on hgTracks with a link to clear the variable; use that link to restore the default timeout.

I used udcTimeout, why is it still not updating?

Browser software attempts to speed performance by reading the trackDb architecture into a separate cache. This software assumes that the timestamp on a file like a hub's trackDb.txt should always be increasing in time and never decreasing (i.e., a new update to the file should not have an earlier timestamp). If someone copies an older file into place (say with cp -p to preserve the file timestamp), for example to restore a trackDb.txt to an earlier version, they might get stuck with a hub that is not updating.
To resolve this specific problem, the person only needs to edit or touch the file to give it a new timestamp.

Check hub settings using Hub Development tool

Hubs can be checked for valid file configuration, trackDb keywords, and composite or super track settings with the Hub Development tool. This tool can be accessed from the Track Hub page under the Hub Development tab. After you enter your URL in the search box, the tool runs the hubCheck utility program, equivalent to:

hubCheck -noTracks http://url.to.hub.txt

The noTracks setting speeds up the validation process but does not check for the presence or validity of your remotely hosted data track files. This tool checks your hub.txt, genomes.txt, and trackDb.txt settings and displays warnings and errors in bright red font, such as "Missing required setting..." and "Cannot open...". The "Display load times" and "Enable hub refresh" optional settings show the load timing at the bottom of the Genome Browser page and allow instant hub refresh instead of the 5-minute refresh. These options can be checked and then activated by clicking "View Hub on Genome Browser". The following picture shows the example track grouping hub with the warning that the hub has no hub description page, no configuration errors, and "Display load times" checked:

[The Hub Development tool checks config setting]

The Hub Development tool checks for proper configuration files and track hub settings, and allows access to debugging settings.

Check hub settings using hubCheck utility

It is a good practice to run the command-line utility hubCheck on your track hub when you first bring it online and whenever you make significant changes. By default, this utility checks that the files in the hub are correctly formatted, but it can also be configured to check a few other things, including that various trackDb settings are correctly spelled and that they are supported by the UCSC Genome Browser.
You can read more about using hubCheck to check the compatibility of your hub with other genome browsers below. Here is the usage statement for the hubCheck utility:

hubCheck - Check a track data hub for integrity.
usage:
   hubCheck http://yourHost/yourDir/hub.txt
options:
   -checkSettings        - check trackDb settings to spec
   -version=[v?|url]     - version to validate settings against
                           (defaults to version in hub.txt, or current standard)
   -extra=[file|url]     - accept settings in this file (or url)
   -level=base|required  - reject settings below this support level
   -settings             - just list settings with support level
   -noTracks             - don't check remote files for tracks, just trackDb (faster)
   -udcDir=/dir/to/cache - place to put cache for remote bigBeds and bigWigs
                           Will create this directory if not existing

Note that you will have to use the -udcDir option if /tmp/udcCache is not writable on your machine. The hubCheck program is available from the UCSC downloads server at http://hgdownload.soe.ucsc.edu/admin/exe/.

Troubleshooting Track Hub connections

If the browser is unable to load a track hub, it will display an error message. Some common causes for an import to fail include typos in the URL, a hub server that is offline, or errors in the track hub configuration files. Occasionally, remote track hubs may be missing, offline, or otherwise unavailable. If a user is already browsing data from the remote hub when it disconnects, a yellow error message will be displayed instead of the expected data.

Download all files in a hub using hubClone utility

You can use the command-line utility hubClone to download all of the data and configuration files for a hub onto your local machine. To do so, use the utility with the "-download" option and it will download any files specified by bigDataUrls (e.g. bigBed, bigWig, etc). By default, it only copies the configuration files and changes any local bigDataUrls into remote URLs.
Beyond that, you may also find it useful to make a copy of a public hub's configuration files in order to imitate various settings, such as filters, search options, or track groupings. Here is the usage message for the hubClone utility:

hubClone - Clone the remote hub text files to a local copy in newDirectoryName, fixing up bigDataUrls to remote location if necessary
usage:
   hubClone http://url/to/hub.txt
options:
   -udcDir=/dir/to/udcCache   Path to udc directory
   -download                  Download data files in addition to the hub configuration files

Note that you will have to use the -udcDir option if /tmp/udcCache is not writable on your machine. The hubClone program is available from the UCSC downloads server at http://hgdownload.soe.ucsc.edu/admin/exe/.

Setting up track item search

The Genome Browser supports searching for items within bigBed tracks in track data hubs. To support this behavior, you have to add an index to the bigBed file when you initially create the bigBed file from the bed file input. Indices are usually created on the name field of the bed, but can be created on any field of the bed. Free-text searches can also be enabled by creating a TRIX index file that maps IDs in the track to free-text metadata. Further instructions can be found in the Searchable Hub Quick Start Guide. See the searchIndex and searchTrix fields in the Hub Track Database Definition document for information on how to set up your bigBed to enable searching. The searchIndex setting requires the input BED data to be case-sensitive sorted (sort -k1,1 -k2,2n). You can use either the example UNIX sort command or the bedSort utility available here. See an example searchable hub.

Adding Filters to your Track Hub

The Genome Browser supports three varieties of data filter options for bigBed data. These can improve data usability in many ways, such as displaying specific data by default (e.g.
only items that pass certain quality scores), allowing filtering based on pre-specified categories (e.g., showing only LINE repetitive elements from a list of options), and more. For more information on the options available and examples on how to set up filters, see our Track Hub Filters Quick Start Guide and the filter entries in the Track Database Definition Document.

Registering a Track Hub with UCSC

If you would like to share your track hub with other Genome Browser users, you can register your hub with UCSC by contacting the Genome Browser technical support mailing list at genome@soe.ucsc.edu. Please include the URL of your hub.txt file in the message. Once registered, your hub will appear as a link on the Public Hubs tab on the Track Hubs page. To assist developers of Public Hubs, there is a Public Hub Guidelines page. The page shares pointers and preferred style approaches, such as the need for creating description html pages for your data that display any available references and an email contact for further data questions. Alternatively, you can share your track hub with selected colleagues by providing them with the URL needed to load your hub via the Connected Hubs tab.

Checking Hub settings and Compatibility

Due to the growth in popularity of the track hub format, other genome browsers have begun supporting the UCSC track hub format. The hubCheck utility can be used to check the compatibility of your hub with the UCSC Genome Browser and other genome browsers. The following examples describe various settings that may be useful to you as you test this compatibility.

Example 1: Listing the trackDb settings and their support levels and filtering them by support level.
You can see all of the currently supported trackDb settings and their support level in the UCSC Genome Browser by running hubCheck with the "-settings" option:

$ hubCheck -settings

You can filter the displayed settings by support level by using the "-level=" option followed by the level you want displayed. For example, to show only the required settings:

$ hubCheck -settings -level=required

Or, to show settings at both the required and base support levels:

$ hubCheck -settings -level=base

When you use the "-level" option to declare a support level, it includes the level that you define plus every level above it. This hierarchy of the different support levels is defined at the beginning of the Track Database Definition document.

Example 2: Checking your trackDb settings against the settings and support levels provided by the UCSC Genome Browser.

The "-checkSettings" option can be used to check the settings in your hub's trackDb file against those provided by the UCSC Genome Browser on the Track Database definition page. This does not check that all of these settings are properly used, only that they are supported by the Genome Browser. For example, to fully run hubCheck on your hub:

$ hubCheck -checkSettings http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt

If you are just looking to check the compatibility of your hub and not the integrity of your files, you can use the "-noTracks" option to just check setting compatibility. Skipping these file integrity checks will also speed up hubCheck.
Here is an example of some of the errors you might see if your hub includes unsupported settings:

$ hubCheck -checkSettings http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubCheckUnsupportedSettings/hub.txt
Found 1 problem: Setting 'ensemblAssemblyName' is unknown/unsupported

If you want to check the settings used in your hub against those at a particular support level and higher, you can use the "-level=" option here as well. For example:

$ hubCheck -checkSettings -level=base http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubGroupings/hub.txt

The problems reported by hubCheck in this case are settings beyond the base support level, which are less likely to be supported at other genome browsers, as not all external browsers support the full list of UCSC settings.

We will periodically increment the version number for our trackDb settings as large numbers of settings are added, updated or (rarely) removed. Using just the "-checkSettings" option will check your trackDb settings against those defined in the most recent version of the Track Database definition page. However, if you want to check your settings against an older version of these settings, you can use the "-version=" option. For example, if you wanted to check your hub against version one of our trackDb settings:

$ hubCheck -checkSettings -version=v1 http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubGroupings/hub.txt

Example 3: Checking your settings against those provided by UCSC and another source, such as Ensembl.

If you want to check the settings in your hub against those supported by other genome browsers, you will first need to create a single-column file that lists each non-UCSC setting and then use the "-extra=" option to specify this file when running hubCheck.
For example, if you knew that a setting called "ensemblAssemblyName" was supported for use in track hubs by Ensembl, you could create a single-line file that included the setting "ensemblAssemblyName". Then, when you want to check a hub that includes these extra trackDb settings, you would specify this extra settings file on the command line:

$ hubCheck -checkSettings -extra=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubCheckUnsupportedSettings/myExtraSettings.txt http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubCheckUnsupportedSettings/hub.txt

(Note: The settings listed here in the "extra" file are just examples and do not represent real trackDb variables for hubs at Ensembl.)

Where to host your data?

As stated in What Are Track Hubs?, track hub files must be located in web-accessible locations that support byte-range requests. Four options for hosting include:

- Your institution's Information Technology services
- Commercial webspace providers
- Commercial cloud providers
- Free webspace providers

Your Institution: Many universities provide a location for researchers to place shareable data on the web, and contacting your institution's system administrators will help you discover a location to store your data. For example, if you work at the NIH, there is an internal data sharing NIH network site. Sometimes institution firewall rules can change, and you may need to ask your system administrators to add the browser IP addresses, listed here, as exceptions. Usually, your IT department can direct you to someone who manages webspace for individual groups. This is our recommended option, as it is usually free, very fast and you can update the files yourself easily.

Webspace providers: If your institution does not provide any web hosting space for you, the most convenient solution is usually to buy a virtualized webspace server from a commercial web hosting provider.
Files can be uploaded with FTP, rsync or scp and appear on an https:// domain. Avoid "unlimited" offers, which often do not allow binary files and are slower; look instead for a "virtual private server" (VPS). Some examples of providers are: A2 Hosting, BlueHost, GoDaddy, HostGator, Hostinger, DreamHost, but there are many others. This is not a complete list and we do not endorse a particular one. You can search the internet for "virtual private server comparison". Offers start at around $5-10/month for 25-50 GB of storage. The advantage of VPS providers is that they bill a flat rate per month, which may be easier to order through universities than the per-GB-transferred billing of cloud providers. For optimal performance, select a West Coast / San Francisco data center when ordering a web server, as this is closest and fastest from UCSC. There are usually no backups included, so it is good to keep local copies of your files.

Cloud providers: In general, commercial online cloud backup providers that charge a flat rate (Dropbox, iCloud, Google Drive, Box.com, Microsoft OneDrive, Tencent Weiyun, Yandex.Disk, etc.) do not work reliably, as their business model requires rare and rate-limited data access, which is too slow or too limited for genome annotation display. However, commercial cloud storage offers that charge per GB transferred (Amazon S3, Microsoft Azure Storage, Google Cloud Storage, Backblaze, Alibaba Object Store, etc.) typically do work. As of 2020, they cost around 2-3 US cents/GB/month to store the hub data and 12-18 US cents per GB transferred when the hub is used. For optimal performance, select a San Francisco / San Jose data center for the main UCSC site genome.ucsc.edu, a Frankfurt/Germany data center for genome-euro.ucsc.edu and a Tokyo data center for genome-asia.ucsc.edu. You may also want to review this discussion about issues with distributed storage servers. These services are external to UCSC and may change.
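Since every hosting option above depends on byte-range support, it can be useful to test a candidate URL before building a hub around it. The following is a minimal Python sketch using only the standard library; the function names are hypothetical and this is not the check the Genome Browser itself performs. It relies on standard HTTP behavior: a server that honors a "Range: bytes=0-0" request answers with status 206 and a Content-Range header, whereas a server that ignores the range typically answers 200 with the whole file.

```python
import urllib.request


def looks_like_range_response(status, headers):
    """Return True if a reply to 'Range: bytes=0-0' honored the range."""
    content_range = headers.get("Content-Range", "")
    return status == 206 and content_range.startswith("bytes 0-0/")


def supports_byte_ranges(url, timeout=10):
    """Request only the first byte of url and inspect the reply."""
    request = urllib.request.Request(url, headers={"Range": "bytes=0-0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return looks_like_range_response(response.status, response.headers)


# Example call (placeholder URL, requires network access):
# supports_byte_ranges("https://yourdomain.com/yourhub/hg38/example.bb")
```

The decision logic is kept in a separate function so it can be exercised without network access. A True result only shows that the first byte could be fetched by range; it says nothing about speed or rate limits, which the paragraphs above identify as the usual problem with flat-rate cloud backup providers.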
Free webspace: If you do not want to pay for web space, and your institution does not provide a data location supporting byte-range requests, we know of at least the following sites where you can host research data and configuration files for free:

- Galaxy - Maximum size limit is 50 GB (uncompressed). 250 GB of storage is available per Galaxy account.
- CyVerse Discovery Environment - 5 GB of space for free, but can be relatively slow to display. Offers a paid subscription service to expand space.
- GitHub - files limited to 100 MB, but very fast.
- Figshare - not limited and fast, but every file needs to be uploaded individually and cannot be changed. Optimal for very stable links, e.g. in publications.
- DropBox - 2 GB free on their BasicFree plan. Many other paid DropBox plans are available with much more space.

Each of the providers above has a slightly different approach to hosting data for compatibility with the UCSC Genome Browser, and may have different advantages and disadvantages, such as size limitations, usage statistics, and version control integration. Additionally, as previously mentioned, any provider that supports byte-range access will work for hub hosting, and you are not limited to the above sites. Below is a summarized guide for each of the providers mentioned above.

- Galaxy Hosting
- CyVerse Hosting
- GitHub Hosting
- Figshare Hosting
- DropBox Hosting

Hosting Files on Galaxy

Galaxy is an open-source platform for FAIR data analysis that enables users to streamline the analysis of genomic data and serves as a comprehensive toolkit for researchers, scientists, and bioinformaticians. The Galaxy platform provides a user-friendly web interface that allows you to host your data. You can navigate to the Galaxy platform, log in to your account (or create an account if you haven't already), and use the data upload functionality provided to host your data. Once uploaded, the data will be stored securely on the platform and made available for your analysis.
Viewing a Single File Hosted on Galaxy

1. To begin, sign into your Galaxy account and then click the "Upload Data" button on the left-hand side of the page. A pop-up will appear to upload the file to Galaxy.

2. On the pop-up menu, click "Choose local files" to select the file to host. Once the file is selected, select the genome and file type from the drop-down menus, and click the "Start" button. For this example, we will upload a bigBed file for the GRCh37/hg19 genome to Galaxy.

3. Once the file is uploaded to Galaxy, it will appear in the "History" section on the right-hand side. Click on the file to expand the menu and show additional information about the file, such as the file size, format, and database. Towards the bottom of the menu, five icons are displayed that perform different functions on Galaxy. Clicking on the "graph icon" will bring up another menu in the middle of the screen with a link to visualize the bigBed file on the UCSC Genome Browser.

Creating and Viewing a Hub Hosted on Galaxy

1. As in the previous example, sign into your Galaxy account and upload the files to Galaxy. For this example, we will be uploading a bigBed, a VCF, and a VCF index file to view on the GRCh37/hg19 genome as a track hub.

2. To create the hub, a URL to each of the files is needed to reference the Galaxy hosted data inside the hub. Click on each of the three files to expand the menu and display the five icons at the bottom. Next, click on the "chain-link" icon to copy the URL to the hosted file. With these three URLs, we can now begin to build the hub.

3. When creating the hub, it is recommended to use the useOneFile on setting to have the contents of the genomes.txt, hub.txt, and trackDb.txt files inside of a single file. With the URLs from the previous step, reference the three files hosted on Galaxy using the bigDataUrl and bigDataIndex trackDb settings.
The entire contents of the example hub can be seen below:

hub myExampleHub
shortLabel Example Hub
longLabel Example Hub Hosted on Galaxy
useOneFile on
email genome-www@soe.ucsc.edu

genome hg19

track vcfExample
shortLabel VCF example
longLabel VCF example using Galaxy
visibility pack
type vcfTabix
maxWindowToDraw 200000
bigDataUrl https://usegalaxy.org/api/datasets/f9cad7b01a472135076a05ca3c1069a7/display?to_ext=vcf
bigDataIndex https://usegalaxy.org/api/datasets/f9cad7b01a47213595c3df01954cabaf/display?to_ext=binary

track bigBedExample
shortLabel BigBed example
longLabel BigBed example using Galaxy
type bigBed
visibility pack
bigDataUrl https://usegalaxy.org/api/datasets/f9cad7b01a4721358405ebd3fc34f65a/display?to_ext=bigbed

4. After creating the hub on your local computer, upload the hub.txt file to Galaxy. Once the file is on Galaxy, click on the hub.txt file to expand the menu. Again, click on the "chain-link" icon to obtain the URL to the hub.txt file. Now, navigate to the UCSC Genome Browser's Track Hubs page and enter the URL to view the hub.

5. Alternatively, you can create a URL using the &hubUrl= URL parameter to quickly share and load the hub with colleagues:

http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr21:33031468-33034434&hubUrl=https://usegalaxy.org/api/datasets/f9cad7b01a472135a3056c5f93f137db/display?to_ext=txt

Hosting Hubs on CyVerse

CyVerse, previously known as the iPlant Collaborative, is an NSF-funded site created for assisting data scientists with their data storage and compute needs. Data hosting by CyVerse is free for academic groups and they support byte-range access, so they can be used for track hubs. However, CyVerse is sometimes slow, and may result in error messages if your hub includes many tracks that are meant to be shown at the same time by your users. In order to host your data on CyVerse, you first must create an account and then use their Discovery Environment to upload data.
After creating an account and signing in, access the data screen by clicking the second icon on the left. Use the "Upload" button on the far right to import data from a URL or locally from your machine. You can also use the command line utility iCommands to facilitate bulk transfer of data (best used for large files in the 2-100 GB range), or use Cyberduck to bulk transfer up to 80 GB of data in one go. Once your file is available, use the three dots on the far right to click the "Public Link(s)" option. Select this option for all the files you will be using with the Genome Browser, whether they are text-based files (trackDb.txt, groups.txt, description.html, etc.) or binary-indexed files (BAM, bigWig, bigBed, etc.) requiring byte-range access. Note, if you have a dataFile.bam, you must also have a dataFile.bam.bai file of the matching name, and both must have public links created. After creating public links to your binary files, you must ensure your text files (i.e., trackDb.txt) point to the CyVerse locations for the files. For instance, the bigDataUrl setting will need to point to the location of the BAM, bigWig, or bigBed (i.e., bigDataUrl https://data.cyverse.org/... /dataFile.bam). The hub.txt file (if not using the useOneFile on setting) will need to point to the related genomes.txt location, which in turn points to the trackDb.txt location, using these full https://data.cyverse.org/... links as well. Please see the Using the Discovery Environment wiki page on the CyVerse wiki for more information. Please direct any questions about CyVerse or the Discovery Environment to their Contact Us page or "Chat with CyVerse support" staff directly via the blue question box icon on the top right of the Discovery Environment page. Note, if you need to replace files once they have been uploaded into CyVerse's DE and Public links have already been created, you will need to force update the CyVerse cache.
One way is to go back to the Public links section and use the "Refresh Cache" button. Another way is to press Control-Shift-R in your browser to force a reload of the file, or to send a Cache-Control: no-cache header:

curl --head --header 'Cache-Control: no-cache' https://data.cyverse.org/.../dataFile.bam

Hosting Hubs on GitHub

GitHub supports byte-range access to files when they are accessed via the raw.githubusercontent.com style URLs. To obtain a raw URL to a file already uploaded on GitHub, click on a file in your repository and click the Raw button:

[Location of the Raw button for generating a plaintext URL to a file hosted on Github.]

The "Raw" button results in a plain text page like the following:

[Example of URL to a file on github]

The bigDataUrl field (and any other statement pointing to a URL like bigDataIndex, refUrl, barChartMatrixUrl, etc.) of your trackDb.txt file should use the "raw.githubusercontent.com" style URL as shown above. Note that, similar to CyVerse, you will need to update your trackDb.txt bigDataUrl lines to point to their correct raw.githubusercontent.com addresses. Please also note that HTML does not render properly from this address, so if you are also hosting your hub's description page here, you will want to use a site like RawGit to point to your descriptionUrl. The advantage of hosting your data on GitHub is the built-in version control of the site, meaning you can view previous hub configurations and keep backups and history of your work automatically. The disadvantage of hosting on GitHub is the relatively small file size upload limit compared to other hosting providers. For an example public hub hosted on GitHub, please see the Human cellular microRNAome barChart hub. For more information about moving files to GitHub, please see GitHub's help pages. Please direct any questions about GitHub to their help desk.
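When a hub has many files, rewriting each repository page URL by hand into its raw form is error-prone. The following is a minimal Python sketch of that rewrite, assuming the common layout in which a file page at github.com/USER/REPO/blob/BRANCH/PATH corresponds to raw.githubusercontent.com/USER/REPO/BRANCH/PATH. The function name github_raw_url and the example repository are hypothetical, and branch names that contain a slash are not handled; in that case, use the Raw button as described above.

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url):
    """Convert a github.com file-page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected path layout: USER/REPO/blob/BRANCH/path/to/file
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError("not a github.com file page URL: " + blob_url)
    user, repo, _blob, branch, *path = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])


# Hypothetical repository, for illustration only.
print(github_raw_url("https://github.com/exampleUser/exampleHub/blob/main/hg38/trackDb.txt"))
```

The resulting URL is the form to place in bigDataUrl and similar trackDb settings.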
Hosting hubs on Figshare

Figshare is a site for researchers and institutions to upload and collect usage statistics on their data, as well as make their data shareable and discoverable. The process for uploading a hub to Figshare is similar to the process involved at CyVerse: first create an account, upload the bigDataUrl files, create shareable links, and then edit your hub.txt, genomes.txt, and trackDb.txt appropriately. One advantage to using Figshare is their emphasis on usage statistics, so institutional accounts can see how often their hubs and tracks are being accessed by others. Note that Figshare does not use filenames as part of the URLs; therefore, bigDataUrl files that require a separate index file, like VCFs and BAM files, must have their index file location specified with a bigDataIndex. This keyword is relevant for Custom Tracks and Track Hubs. You can read more about bigDataIndex in the TrackDb Database Definition page. If you are having issues hosting at Figshare, try to use the file's download URL. This URL will have "ndownloader" in its path. Also, for custom tracks you will need to declare a track line with track type and bigDataUrl. Below is a simple example of a bigBed custom track on hg38:

track type=bigBed name="figshare example" bigDataUrl=https://figshare.com/ndownloader/files/38068053

Hosting hubs on DropBox

DropBox is a site for users to share data files. DropBox has recently added support for byte ranges, a feature we need for hubs and custom tracks. DropBox offers 2 GB for free in their BasicFree plan. Dropbox Plus costs $10 a month ($120 per year; you must pay for the full year to get the lowest price) and provides 1 user with 2 TB of storage and a maximum individual file size of 50 GB. They have many other plans available for users, businesses, and schools. First create a DropBox account, then upload the bigDataUrl files or drag and drop them from local disk. DropBox also has an app. They provide support for MacOS and Android too.
Copy the share URL provided. Edit your hub.txt appropriately. Use the useOneFile on hub setting, which allows the hub properties to be specified in a single file. More information about this setting can be found in the Genome Browser User Guide. If you would like to add metadata to your track hub, the following metadata guide contains examples of how to include the information in your tracks. Click share on each data file, choose copy link (Ctrl-C), and paste the DropBox URL into the bigDataUrl field in the trackDb section of your useOneFile hub.txt. It will look something like this:

bigDataUrl https://www.dropbox.com/scl/fi/8t785o3sqidp0tmar91bf/dnaseRep3.bw?rlkey=37wucbhdvwqntw4ejvig4kg7c&st=11v8l216&dl=0

Click the share button over your hub.txt file and choose Copy Link (Ctrl-C). Paste that DropBox hub URL into the hgHubConnect CGI tab:

https://www.dropbox.com/scl/fi/6wrobg6wqcgtm7khew4qo/hub1.txt?rlkey=vi32q9tb68kjpn2xhy0qjuqkf&st=nldzp2tw&dl=0

Note that DropBox does not use pathnames as part of the URLs; therefore, bigDataUrl files that require a separate index file, like VCFs and BAM files, must have their index file location specified with a bigDataIndex. This keyword is relevant for Custom Tracks and Track Hubs. You can read more about bigDataIndex in the TrackDb Database Definition page. For custom tracks you will need to declare a track line with track type and bigDataUrl. Below is a simple example of a bigWig custom track on hg38:

track type=bigWig name="dropbox example" bigDataUrl=https://www.dropbox.com/scl/fi/8t785o3sqidp0tmar91bf/dnaseRep3.bw?rlkey=37wucbhdvwqntw4ejvig4kg7c&st=1z1jce0t&dl=0

For more information on using DropBox, please see their DropBox Support. DropBox also provides a folder feature to help organize and group related data.

Troubleshooting your own HTTPS server configuration

When your own institution's system administrators are hosting your data, they may benefit from this section about ensuring a secure HTTPS configuration.
The most popular web servers that system admins use are Apache and NGINX. Instructions for setting up these web servers are widely available online, so this section does not cover them. Certs and Security As security on the Internet becomes increasingly important, SSL certificates are often required for proper server installation. Proper certificate validation helps stop "Man-In-The-Middle" attacks by ensuring that connections go to the correct server and not an impostor site. This process requires SSL certificates that have not expired, and whose domain name matches the domain name specified in the HTTPS URL. The UCSC Genome Browser's networking software uses the very popular open source library openssl 1.0. System administrators hosting your data should ensure that TLS 1.2 is allowed if you are going to provide data over HTTPS, since it is fast and secure and compatible with openssl 1.0. FREE CERT PROVIDER To help system administrators, here are groups that provide free web certs, including the popular Let's Encrypt. Testing your site certs Here are ways to check HTTPS certificates, such as with curl, which often uses openssl. curl https://yourdomain.com/yourhub/hub.txt If curl can fetch the hub.txt HTTPS URL without errors, then the certs should work with the UCSC Genome Browser. For a deeper level of debugging, system administrators can use the openssl client command: openssl s_client -trusted_first -connect yourdomain.com:443 -servername yourdomain.com Various online SSL Server Test sites have great detailed documentation about how to check your website's certs and configuration, such as https://www.ssllabs.com/ssltest/. Here is an example where you can supply yourdomain.com and discover results: https://www.ssllabs.com/ssltest/analyze.html?d=yourdomain.com&latest Feel free to contact UCSC Genome Browser for help if you are seeing certificate validation error messages you do not understand.
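As a complement to the curl and openssl checks in this section, the same two conditions (a certificate that has not expired and a domain name that matches the URL) can be sketched in Python with the standard ssl module. This is only an illustrative sketch: the function names are hypothetical, and Python's certificate validation is not guaranteed to behave identically to the Genome Browser's own openssl-based networking code.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string.

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jan 11 00:00:00 2030 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400.0


def check_https(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Connect with default verification and return days until expiry.

    The default context verifies the certificate chain and the host name,
    so an untrusted, expired, or mismatched certificate raises
    ssl.SSLCertVerificationError instead of returning a value.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))
```

Calling check_https("yourdomain.com") requires network access; a small or negative result suggests the certificate should be renewed before users start seeing validation errors.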
/goldenPath/help/liftOver.html:Genome_Browser_Track_LiftOver LiftOver of tracks from previous to new assembly The tracks indicated by a "ball" logo (e.g., [] and []) have been lifted from a previous assembly of the same organism with a minimum of quality control scrutiny (e.g., have been lifted from hg17 or hg18 to a later human assembly). The number indicated on the logo indicates the version of the assembly that the track was lifted from. These tracks are provided to our users with the intent that they assist in interpretation of other data, but must be used with caution. Not all annotations remain intact when lifted in this manner and in any case, cannot by definition contain any sequence that is new to the newer assembly. It should also be noted that tracks containing large regions will not lift as well because of the increased chance of spanning a region that has changed between the two assemblies. /goldenPath/help/hic.html:Genome_Browser_hic_Track_Format hic Track Format Note: The UCSC tools currently support hic versions 6-8. The hic track configuration help page describes hic display options. Hic format is an indexed binary format designed to permit fast random access to contact matrix heatmaps. The format was designed by the Aiden Lab at Baylor College of Medicine. More information on the hic format itself can be found in the documentation on Github. The format is used for displaying chromatin conformation data in the browser. This format is useful for displaying interactions at a scale and depth that exceeds what can be easily visualized with the interact and bigInteract formats. After running a chromatin conformation experiment such as in situ Hi-C, researchers can pass their results through the Juicer pipeline to produce a hic file. Due to the large size of many of these files, UCSC is not able to support direct upload via our Custom Track interface. Instead, users should place their Hic files in a web-accessible space and enter the URL as a bigDataUrl. 
If you do not have access to a web-accessible server and need hosting space for your Hic files, please see the Hosting section of the Track Hub Help documentation. [hic draw modes] Generating a hic track The typical workflow for generating a hic custom track is this: 1. Prepare your data by processing it with the Juicer pipeline to create a file in the hic format. 2. Move the hic file (my.hic) to an http, https, or ftp location. 3. Construct a custom track using a single track line. The basic version of the track line will look something like this: track type=hic name="My HIC" bigDataUrl=http://myorg.edu/mylab/my.hic 4. Paste the custom track line into the text box in the custom track management page, click "submit" and view in the Genome Browser. Parameters for hic custom track definition lines All options are placed in a single line separated by spaces (lines are broken only for readability here): track type=hic bigDataUrl=http://... name=track_label description=center_label visibility=display_mode db=db Note if you copy/paste the above example, you must remove the line breaks. Click here for a text version that you can paste without editing. The track type and bigDataUrl are REQUIRED: type=hic bigDataUrl=http://myorg.edu/mylab/my.hic The remaining settings are OPTIONAL: name track label # default is "User Track" description center label # default is "User Supplied Track" visibility full|dense|hide # default is hide (will also take numeric values 4|1|0) db genome database # e.g. hg19 for Human Feb. 2009 (GRCh37) Note that hic tracks currently only support the full, dense, and hide visibility modes. The hic track configuration help page describes the hic track configuration page options. Example #1 In this example, you will create a custom track for a hic file that is already on a public internet server — data from an in situ Hi-C experiment on the HMEC cell line mapped to the hg19 assembly (Rao et al., 2014). 
The line breaks inserted in the track line for readability must be removed before submitting this entry as a Custom Track. Click here for a text version you can paste without editing. The "browser" line below is used to set the default view to a region of chromosome 21.

browser position chr21:32,000,000-35,000,000
track type=hic name="hic Example One" description="hic Ex. 1: in situ Hi-C on HMEC"
    db=hg19 visibility=dense
    bigDataUrl=http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/hic/GSE63525_HMEC_combined.hic

Paste the "browser" line and "track" line into the custom track management page for the human assembly hg19 (Feb. 2009), then click the "submit" button. On the following page, click the "chr21" link in the custom track listing to view the hic track in the Genome Browser.

Example #2 In this example, you will load a hub that has hic data described in a hub's trackDb.txt file. First, navigate to the Basic Hub Quick Start Guide and review an introduction to hubs. Visualizing hic files in hubs involves creating three text files: hub.txt, genomes.txt, and trackDb.txt. The browser is passed a URL to the top-level hub.txt file that points to the related genomes.txt and trackDb.txt files. The trackDb.txt file contains a stanza for each track that outlines the details and type of the track to display, such as these lines for a hic file located at the bigDataUrl location:

track hic1
bigDataUrl http://hgdownload.soe.ucsc.edu/gbdb/hg19/bbi/hic/GSE63525_GM12878_insitu_primary+replicate_combined.hic
shortLabel hic example
longLabel This hic file shows in situ Hi-C data from Rao et al. (2014) on the GM12878 cell line
type hic
visibility dense

Note: there is now a useOneFile on hub setting that allows the hub properties to be specified in a single file. More information about this setting can be found on the Genome Browser User Guide.
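For users who generate many hic custom tracks, the single-line track definition described earlier (required type and bigDataUrl, optional name, description, visibility, and db) could be assembled with a small helper. The sketch below is illustrative only: hic_track_line is a hypothetical function name, and the quoting rule (wrap values containing spaces in double quotes) is inferred from the examples on this page.

```python
def hic_track_line(big_data_url, name=None, description=None,
                   visibility=None, db=None):
    """Build a one-line hic custom track definition.

    type and bigDataUrl are always written; the other settings are
    written only when given. Values containing spaces are quoted.
    """
    if not big_data_url:
        raise ValueError("bigDataUrl is required")
    # hic tracks support only these modes (names or numeric equivalents).
    allowed = {"full", "dense", "hide", "4", "1", "0"}
    if visibility is not None and str(visibility) not in allowed:
        raise ValueError("hic tracks support only full, dense, and hide")

    def fmt(value):
        value = str(value)
        return f'"{value}"' if " " in value else value

    parts = ["track", "type=hic"]
    for key, value in (("name", name), ("description", description),
                       ("db", db), ("visibility", visibility)):
        if value is not None:
            parts.append(f"{key}={fmt(value)}")
    parts.append(f"bigDataUrl={big_data_url}")
    # Everything goes on one line, with no line breaks.
    return " ".join(parts)


print(hic_track_line("http://myorg.edu/mylab/my.hic", name="My HIC"))
```

The printed line can be pasted into the custom track management page as-is, since it contains no line breaks.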
Here is a direct link to the trackDb.txt file to see more information about this example hub, and below is a direct link to visualize the hub in the browser, where this example hic file displays in dense mode alongside the other tracks in this hub. You can find more Track Hub hic display options on the Track Database (trackDb) Definition Document page. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/hub.txt [Ex hic image from track hub] Sharing Your Data with Others If you would like to share your hic data track with a colleague, there are a couple of options. One method is to load your data track into the UCSC Genome Browser and then create a saved session by following the instructions here. If you are looking for a more automated method of sharing data, you may be interested to learn how to create a direct URL that loads custom data files. For a demonstration of this, see Example 6 on the custom tracks page. /goldenPath/help/hgGenomeHelp.html:Genome_Graphs Genome Graphs User's Guide Contents Introduction Formatting, uploading and importing data - Formatting data - Uploading data - Importing data Quick start Displaying data in Genome Graphs - Configuring the display - Setting a significance threshold - Setting a data region Viewing data in the Genome Browser Viewing data in the Gene Sorter Deleting data Correlating data ------------------------------------------------------------------------ Questions and feedback on this User's Guide are welcome. User questions and answers on Genome Graphs and other topics are available in the Genome Browser mailing list. Introduction Genome Graphs is a tool for displaying genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies and homozygosity mapping. 
Using the Genome Graphs tool, you can: - upload several sets of genome-wide data and display them simultaneously - click on an area of interest and go directly to the genome browser at that position - set a significance threshold for your data and view only regions that meet that threshold - view the genes that exist in areas where your data meet your significance threshold To return to Genome Graphs from any other location on the Genome Browser website, use your browser's Back button, or click Home on the blue navigation bar, then click the Genome Graphs link. Note that only the "standard" chromosomes are displayed in the Genome Graphs display; haplotype and mitochondrial chromosomes are not displayed. This User's Guide is aimed at both the novice Genome Graphs user as well as the advanced user. If you are new to the Genome Graphs tool, read the Quick Start section to learn about the basics using some sample data. Advanced users may want to proceed directly to the section that addresses a particular area of functionality in detail. Formatting, uploading and importing data Formatting data Genome Graphs allows you to upload data from files that reside on your computer. Several file formats are accepted by the program. For all formats there is a single line for each marker. Each line starts with information on the marker, and ends with the numerical values associated with that marker. The markers can be of one of the following types: - chromosome base: e.g., chr1 130000 (Note that the first base in a chromosome is considered position 0) - STS Marker: e.g., RH75228 - dbSNP rsID: e.g., rs12345 - Affymetrix 500k Gene Chip: e.g., SNP_A-1780270 - Affymetrix Genome-Wide SNP Array 6: e.g., SNP_A-8575125 - Affymetrix SNP Array 6 Structural-Variation: e.g., CN_47396 - Illumina HumanHap300 Bead Chip: e.g., rs3934834 - Illumina HumanHap550 Bead Chip: e.g., rs3094315 - Illumina HumanHap650 Bead Chip: e.g., rs3094315 - Agilent CGH 244A: e.g. 
A_14_P112718 The marker-value pairs in each line of the file can be separated with a single space, a tab, or a comma. The file can contain multiple values for each marker. In that case, a separate graph will be created for each value column in the input file. For example, chromosome base markers with only one value associated with the marker would be entered like this: chrX 100000 1.23 dbSNP rsID markers with two values associated with the marker would be entered like this: rs10218492 0.384 0.882 The Genome Graph program will map the marker IDs to the genome. In cases where the marker maps to more than one location in the genome, the value(s) in your input file will be associated with each location. If the value associated with your marker is positive, do not include a sign (e.g., "+"). Include a sign ("-") only if the value is negative. Note that markers can only be mapped to assemblies for which there already exists a track of the type that contains your marker type. You can not, for example, use dbSNP rsID markers for the cow genome, as it does not have a SNP track. Uploading data Once you have created your input file, you must upload it to Genome Graphs. From the main Genome Graphs page, choose the clade, genome, and assembly to which your data pertains. If you are unsure of the UCSC assembly name, check this page. Then, click the upload button to go to the upload page. To upload a file in any of the supported formats, locate the file on your computer using the controls next to "file name", and then submit. The other controls on this form are optional, but can be used to enhance the display. In general, the controls that default to "best guess" will not need modification, since the default guess is almost always correct. The controls for display min and max values and connecting lines may be adjusted later via the configuration page. 
Here is a description of each control: - name of data set: Displayed in graph drop-down in Genome Graphs and as the track name in Genome Browser. Only the first 16 characters are visible in some contexts. For data sets with multiple graphs, this is the first part of the name, shared with all members of the data set. - description: A short sentence describing the data set. Displayed in the Genome Graphs and Genome Browser configuration pages, and as the center label in the Genome Browser. - file format: Controls whether the upload file is a tab-separated, comma-separated, or space separated table. - markers are: Describes how to map the data to chromosomes. The choices are that either the first column of the file is an ID of some sort, or the first column is a chromosome and the next a base. The IDs can be SNP rs numbers, STS marker names or IDs from any of the supported genotyping platforms. - column labels: Controls whether the first row of the upload file is interpreted as labels or data. If the first row contains text in the numerical fields, or if the mapping fields are empty, it is interpreted by "best guess" as labels. This is generally correct, but you can override this interpretation by explicitly setting the control. - display min value/max value: Set the range of the data set that will be plotted. If left blank, the range will be taken from the min/max values in the data set itself. For all data sets to share the same scale, you will usually need to set this. - label values: A comma-separated list of numbers for the vertical axis. If left blank, the axis will be labeled at the 1/3 and 2/3 points of your data range. - draw connecting lines: Lines are drawn connecting data points that are separated by this number of bases or fewer. - file name, or Paste URLs or data: Specify the uploaded data -- enter either a file on your local computer; or a URL at which the data file can be found; or simply paste-in the data. 
If entries are made in both fields, the file name will take precedence. Importing data In addition to supplying your own genome-wide data files, you can also import existing database tables from an assembly into the Genome Graphs tool. Any table containing positional information can be imported. This includes tables of the following types: BED, PSL, wiggle, MAF, and bedGraph. Custom track tables can be imported as well. The tables made by Genome Graphs (chromGraph) can not be imported as they are already in the format used by the tool, thus no conversion is necessary. All tables imported into Genome Graphs will be converted into a custom track of type chromGraph using a window-size of 10,000 bases. To import a table or custom track, choose the group, track, and table from the lists, then click the submit button. The other controls are optional, though completing them will enhance the display. The controls for display min and max values and connecting lines can be set later via the configuration page as well. Here is a description of each control. - name of data set: This will be displayed in the graph list in the Genome Graphs tool and as the track name in the Genome Browser. Only the first 16 characters are visible in some contexts. For data sets with multiple graphs, this is the first part of the name, shared with all members of the data set. - description: Enter a short sentence describing the data set. It will be displayed in the Genome Graphs tool and in the Genome Browser. - display min value/max value: Set the range of the data set to be plotted. If left blank, the range will be taken from the min and max values in the data set itself. If you would like all of your data sets to share the same scale, you will need to set this. - label values: A comma-separated list of numbers for the vertical axis. If left blank the axis will be labeled at the 1/3 and 2/3 point. 
- draw connecting lines: Lines connecting data points separated by no more than this number of bases are drawn. - depth or coverage: When importing positional tables, you can choose to convert those tables to the chromGraph format by using either the depth or coverage conversion method. Both conversion methods use a non-overlapping window size of 10,000 bases when converting to the chromGraph format. In the depth method, the weighted average for each 10,000 base window is assigned to a single point in the center of this window. The coverage method, by contrast, is binary: if there is even one point in the input table in that 10,000 base window, the resulting graph will have a value of 1 for that range. Quick start Use the examples in this section of the User's Guide to get a feel for how the tool works. Refer to other sections in this User's Guide for details and instructions for more advanced features. The Genome Graphs tool comes pre-loaded with sample data. These sample data sets are from real-world genome-wide studies. Use these data sets to quickly see what the tool looks like when data is displayed. To view the sample data, choose a data set from the graph drop-down list, then choose your desired display color from the in drop-down list. The tool will display the data set directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display. Example #1: SNPs on chr22 Follow these steps to display in Genome Graphs all of the highest quality SNPs on chromosome 22 for the hg18 assembly whose predicted functional role is "coding non-synonymous" (where there is a change in the peptide for the allele with respect to the reference assembly). Note that there are no SNPs on the p-arm of chromosome 22. This data set is formatted in the "marker value" style. The markers are dbSNP rsIDs. The associated value is +1 if the SNP is on the positive strand, and -1 if the SNP is on the negative strand.
Here are the first ten rows of the data file: rs1007298 1 rs1007863 1 rs10154509 1 rs10154678 1 rs10154785 1 rs1018448 1 rs10212022 1 rs1022478 1 rs1042311 1 rs1042435 1 Step 1. Upload the data into the Genome Graphs tool Copy the entire sample data set into a text editor and save the file to your computer. This data set is associated with the human assembly: hg18 (Mar. 2006). Be sure to configure the Genome Graphs tool to use the hg18 assembly like so: clade: Vertebrate genome: Human assembly: Mar. 2006 Upload the file into the Genome Graphs tool. You can configure each control on the upload page, or just leave them set to their default values. The upload process may take some time, as the program is actually mapping each rsID in the input file to its location(s) in the genome. Step 2. Display the graph in Genome Graphs Now that your input file has been uploaded to the server, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the graph drop-down list, then choose your desired display color from the in drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. You should see the data plotted directly above chromosome 22. Step 3. View the graph in the Genome Browser From the Genome Graphs display, click anywhere on the graph or on chromosome 22 to open the Genome Browser for hg18 centered at that location on chr22. The graph will be drawn as a track near the top of the Genome Browser display. Displaying data in Genome Graphs Once you have uploaded your data, you will want to display it in the Genome Graphs tool. To display your uploaded data, simply choose the graph name from the graph drop-down list, then choose the color in which you would like it to be displayed from the in drop-down list. Your graph will be displayed directly above the chromosomes in Genome Graphs. Read on to learn how to customize the display. 
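Before uploading, it can help to check that a file really follows the marker-value layout used by the Example #1 data set and described under Formatting data. The sketch below is one possible reading of that layout, not the parser Genome Graphs itself uses: parse_marker_values is a hypothetical name, and accepting runs of separators rather than exactly one is a convenience assumption.

```python
import re


def parse_marker_values(text, chrom_base=False):
    """Parse marker-value lines into (markers, columns).

    chrom_base=False: the first field is a marker ID such as an rsID.
    chrom_base=True:  the first two fields are a chromosome and a base.
    Fields may be separated by spaces, tabs, or commas. Each value
    column becomes its own list, mirroring one graph per column.
    """
    n_key = 2 if chrom_base else 1
    markers, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = re.split(r"[ \t,]+", line)
        values = [float(v) for v in fields[n_key:]]
        if not values:
            raise ValueError(f"no values on line: {line!r}")
        if rows and len(values) != len(rows[0]):
            raise ValueError(f"inconsistent column count: {line!r}")
        if chrom_base:
            markers.append((fields[0], int(fields[1])))
        else:
            markers.append(fields[0])
        rows.append(values)
    # Transpose rows so that each value column is a separate series.
    columns = [list(col) for col in zip(*rows)]
    return markers, columns


markers, columns = parse_marker_values("rs10218492 0.384 0.882\n")
print(markers, columns)
```

A file with two value columns yields two series, which corresponds to the two graphs that would be created on upload.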
Configuring the display Configuring the graphs display To go to the configuration page, click the configure button on the main Genome Graphs page. This is the page from which you can configure many overall aspects of the Genome Graphs display. Individual graphs can also be configured (see the next section for help on that). On this page you will find the following controls: - image width - controls the overall width of the graphs display on the main Genome Graphs page. The default is 620 pixels. - graph height - controls the height of the graph(s) in the space above each chromosome. The default is 27 pixels. - graphs per line - controls how many graphs are displayed on each line in the space above each chromosome. For example, if you set this value to two, the display will superimpose two graphs on top of each other on one line. The axis label for the first graph will appear on the left side of the display and the axis for the second graph on the right side. - lines of graphs - controls how many sets of graphs will appear above each chromosome. For example, if you set this value to 2, the display will make room for two lines of graphs (each at the graph height above) in the space above each chromosome. - chromosome layout - controls how the chromosomes are laid out in the Genome Graphs display. You can choose to view one or two chromosomes on each horizontal line in the display. Alternatively, you can set up the display such that all of the chromosomes appear in one long line. If you choose this layout, you may want to adjust the width of the image (image width above). - numerical labels - check this box if you would like to see axis labels to the right/left of the display. If you did not specify label values when you uploaded your file, the numerical labels will default to 1/3 and 2/3 of the max and min values in your data input file. - highlight missing - check this box if you would like to see the areas in your graph where there is no data. 
Note that if you are displaying more than one graph, this attribute only pertains to the first graph. - region padding - controls the size of the data regions. The data points in your graphs which exceed the significance threshold are padded by this number of bases on either side. The default places 25,000 bases on each side. When you have completed configuring the display, click the submit button to return to the Genome Graphs display. Configuring individual graphs Near the bottom of the Configuration page, you will see a list of the graphs that you have uploaded. Click on the hyperlinked graph name to configure that graph. This configuration pertains to the Genome Graphs view. You can set the range of the display by editing the display min/max value values. This will restrict the Genome Graphs display for this graph to that data range. The axis will be labeled at 1/3 and 2/3 of the data range that you set. If your data is sparse, you may want to draw lines between your data points. You can configure that by editing the draw connecting lines between markers separated by up to ... bases value. The default value is 25,000,000 bases. When you have completed configuring the display, click the submit button twice to return to the Genome Graphs display. Setting a significance threshold Most genome-wide data has some amount of noise and is only interesting when the data values are above a certain value. You can set this value using the significance threshold input box. Enter a decimal number in this input box and click Enter. The display will now have a light gray line across the graph at this data value. If you have more than one graph displayed, the significance threshold only pertains to the graphs that contain the significance threshold in the displayed data range. The significance threshold works in concert with the browse regions and sort genes buttons; it will affect the regions that are displayed once you click either of these two buttons. 
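The way the significance threshold combines with the region padding setting can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the tool's actual implementation: significant_regions is a hypothetical name, a value must be strictly greater than the threshold to count, and overlapping regions are not merged.

```python
def significant_regions(points, threshold, padding=25000):
    """Return padded (chrom, start, end) regions for significant points.

    points:    iterable of (chrom, position, value) tuples.
    threshold: values strictly above this are treated as significant.
    padding:   bases added on each side of a significant point
               (25,000 is the documented default).
    """
    regions = []
    for chrom, pos, value in points:
        if value > threshold:
            # Clamp at 0 so a region never starts before the chromosome.
            regions.append((chrom, max(0, pos - padding), pos + padding))
    return regions


data = [("chr2", 100100000, 2.3),
        ("chr2", 100100500, 4.5),
        ("chr2", 100101000, 1.2)]
print(significant_regions(data, threshold=4.0, padding=200))
```

With a threshold of 4.0 only the middle point qualifies, so a padding of 200 produces a single region spanning 200 bases on either side of it.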
To open the Genome Browser with a view of all of the regions in your graph that include data points that pass the significance threshold, click the browse regions button. This will open the Genome Browser with a navigation pane on the left side of the screen. This pane will contain links to all regions which pass your significance threshold. Note that if you are displaying more than one graph, the significant regions are based only on the first graph in the display list. To view a list of genes which are in regions that pass the significance threshold, click the sort genes button. This will open the Gene Sorter with only the genes that are in significant locations with respect to your data. If you would rather view all of your regions without restricting the output to only those regions that pass the significance threshold, simply delete any values from the significance threshold input box and click Enter before clicking browse regions. Setting a data region The data region is the span of bases that will be added to either side of the data points in your graphs which exceed the significance threshold. Set the data region by editing the region padding value on the configuration page. The combination of setting the data region and the significance threshold will affect two things: - the regions displayed in the Genome Browser after you click the browse regions button, - the genes displayed in the Gene Sorter after you click the sort genes button. For example, take a data set that contains the following data: chr2 100100000 2.3 chr2 100100500 4.5 chr2 100101000 1.2 If you set the significance threshold at 4.0, one data point in the data set passes that threshold. If you then set the data range to 200, then the one significant data point will be padded on each side by 200 base pairs. In that case, the only resulting significant data region will be chr2:100,100,300-100,100,700. 
If instead you set the data range to 2,000, then the one significant data point will be padded on each side by 2,000 base pairs. In that case, the resulting significant data region will be chr2:100,098,500-100,102,500. Viewing data in the Genome Browser To view your graphs in the Genome Browser, click the browse regions button. This will open the Genome Browser with your graph(s) displayed as track(s). You can configure and edit your track as you can any other track in the Genome Browser. In addition to the Genome Browser, you will also see a pane on the left-hand side, which contains links to all of the significant regions in your data. Please note that if you are displaying more than one graph in Genome Graphs, the significant regions are based only on the first graph in the display list. You can also navigate to the Genome Browser by clicking directly on a graph or chromosome in Genome Graphs. The Genome Browser will open with a 1,000,000 bp window centered on the location on which you clicked. Viewing data in the Gene Sorter To view the set of genes that are in significant regions in your data, click the sort genes button. This will open the Gene Sorter with a filter to include only genes that are located in regions in your input data that are above the significance threshold. Please note that if you are displaying more than one graph in Genome Graphs, the significant genes are based only on the first graph in the display list. If the graph was uploaded using markers, then a custom Gene Sorter column with the same name as the graph will be created. This column will list all markers for each gene that contain values above the significance threshold. Deleting data There are several ways to delete your data once it has been uploaded. If you are viewing your data as a track in the Genome Browser, you can click on the mini-button or track control for the track and delete the track using the Remove custom track button. 
You can also choose to reset your cart which will reset the browser interface settings to their defaults, as well as delete all custom tracks and data. Do this by clicking the "Reset All User Settings" under the top blue Genome Browser menu. Your data will be saved on our server for at least 48 hours from the time you last access it, unless it is saved in a Session. Correlating data sets To calculate how well correlated with one another your data sets are, click the correlate button. This will calculate and display the correlation coefficient (R) among each of your data sets. R, also known as Pearson's correlation coefficient, is a measure of the extent that two graphs move together. The value of R ranges between -1 and 1. A positive R indicates that the graphs tend to move in the same direction, while a negative R indicates that they tend to move in opposite directions. R-Squared (which is indeed just R*R) measures how much of the variation in one graph can be explained by a linear dependence on the other graph. R-Squared ranges between 0 when the two graphs are independent to 1 when the graphs are completely dependent. To return to the Genome Graphs, click the return to graphs button. /goldenPath/help/hgCollectionHelp.html:Track_Collection_Builder_Help Track Collection Builder Help /goldenPath/help/iupac.html:Genome_Browser_IUPAC_Codes IUPAC codes The International Union of Pure and Applied Chemistry (IUPAC) has defined a standard representation of DNA bases by single characters that specify either a single base (e.g. G for guanine, A for adenine) or a set of bases (e.g. R for either G or A). UCSC uses these single character codes to represent multiple observed alleles of single-base polymorphisms. 
Symbol   Bases              Origin of designation
-------- ------------------ ------------------------------------
G        G                  Guanine
A        A                  Adenine
T        T                  Thymine
C        C                  Cytosine
R        G or A             puRine
Y        T or C             pYrimidine
M        A or C             aMino
K        G or T             Keto
S        G or C             Strong interaction (3 H bonds)
W        A or T             Weak interaction (2 H bonds)
H        A or C or T        not-G, H follows G in the alphabet
B        G or T or C        not-A, B follows A
V        G or C or A        not-T (not-U), V follows U
D        G or A or T        not-C, D follows C
N        G or A or T or C   aNy

/goldenPath/help/hubQuickStartFilter.html:Track_Hub_Quick_Start_Filter Track Hub Filters Quick Start Guide Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any UCSC assembly or remotely-hosted sequence. There are different filtering options available for bigBed files depending on the kind of data the filter will be applied to. These filters are also described in the trackDb help doc. Note: for configurable features, like filters, an additional period "." or plus "+" is required in the type declaration, for instance type bigBed 5 . or type bigBed 9 +. Contents filter.fieldName - Used for numerical data filterText.fieldName - Used for text filtering filterValues.fieldName - Used for filtering by prespecified values or categories in data filter.fieldName filter.fieldName is used to enable numerical filtering within a field or column. It is often seen as a filter on data that contains a score field. It requires a default parameter to be passed, often this parameter is 0. By default, the range of values will be 0 to 1000. However, this range can be modified with the filterLimits.fieldName parameter. Additionally, the filter can be modified to take in a range of values with the filterByRange.fieldName on parameter. For more information on filter.fieldName, see the trackDb help doc entry. filter Example 1 In this first example, we have a simple track with 10 items.
The data looks as follows: chr7 127000000 127000005 1 1 chr7 127000010 127000015 2 2 chr7 127000020 127000025 3 3 chr7 127000030 127000035 4 4 chr7 127000040 127000045 5 5 chr7 127000050 127000055 6 6 chr7 127000060 127000065 7 7 chr7 127000070 127000075 8 8 chr7 127000080 127000085 9 9 chr7 127000090 127000095 10 10 In this case, we have duplicated the name and score fields for clarity. We will be applying a default filter of 4 to the score field. This will mean that by default only items 4-10 will display. This filter is enabled with the line filter.score 4. The hub.txt looks as follows: track filterScore4 shortLabel filter.fieldNameDefault 4 longLabel Numerical filter with a default value of 4 passed visibility pack type bigBed 5 . filter.score 4 bigDataUrl example1.bb Below are all the materials for example 1: - bed file - bigBed file - .as file - hub.txt file - Example session The example session will display an image like the following, which hides items 1-3 and displays items 4-10. The three filtered items are also noted on the longLabel above the track as (3 items filtered). If we just want to enable filtering, but pass no default value, we can use filter.score 0. [Numerical filter enabled on bigBed] filter Example 2 In this second example we have four tracks with filter.fieldName to allow numerical filtering, filterByRange.fieldName on to enable range filtering, and filterLimits.fieldName to designate upper and lower range boundaries. filter.fieldName will be used on two separate fields (score and name) to demonstrate multiple filters. Lastly, filterLabel.fieldName will be used to change the default filter message to instead "Value range to filter" for the score field. 
The data used is the same as example 1 above: chr7 127000000 127000005 1 1 chr7 127000010 127000015 2 2 chr7 127000020 127000025 3 3 chr7 127000030 127000035 4 4 chr7 127000040 127000045 5 5 chr7 127000050 127000055 6 6 chr7 127000060 127000065 7 7 chr7 127000070 127000075 8 8 chr7 127000080 127000085 9 9 chr7 127000090 127000095 10 10 The data are organized as a bed5. Standard filtering will be enabled in the 4th name field, and filtering by ranges on the 5th score field. The score field will also be given a custom label with the filterLabel parameter. Here is an example of the first track stanza: track filteringByRangeAllValues shortLabel filteringByRangeDefault longLabel Filter by range enabled with default score including all values visibility pack type bigBed 5 . filter.name 0 filter.score 0:10 filterByRange.score on filterLimits.score 0:10 filterLabel.score Value range to filter bigDataUrl example2.bb Below are all the materials for example 2: - bed file - bigBed file - .as file - hub.txt file - Example session Enabling range filters for the score field as well as the standard filter for the name field, we see the following options in the track description page. Any number of filters can be enabled on a track simultaneously. [] Going to the example session will display a browser image like so: [Range filters enabled on bigBed] Each of the tracks is filtering by a different value, and in the final track two separate filters are enabled. - The first track (5-10) displays only items with scores 5-10. - The second track (3-8) displays only items with scores 3-8. - The third track has two filters. A range filter (3-8) and a second filter (5). Only the values that pass both filters are seen. - The fourth track has a filter inclusive of all values passed(0-10), so all items are displayed. filter Example 3 This third example explores how the numerical filters interact with non-numerical characters. 
The data consist of 10 items as a bed5, with the same coordinates as the examples above, a name field, and an arbitrary score field. The name field contains a mix of numerical and non-numerical characters. chr7 127000000 127000005 0 0 chr7 127000010 127000015 -1 0 chr7 127000020 127000025 2% 0 chr7 127000030 127000035 -3 0 chr7 127000040 127000045 4 0 chr7 127000050 127000055 5n 0 chr7 127000060 127000065 5 0 chr7 127000070 127000075 NA 0 chr7 127000080 127000085 . 0 chr7 127000090 127000095 <> 0 Filtering will be enabled on the 4th name field. The trackDb stanza looks as follows: track filteringNonNumerical shortLabel filteringNonNumerical longLabel Using numerical filters on a field with both numerical and non-numerical values visibility pack type bigBed 5 . filter.name 0 bigDataUrl example3.bb Below are all the materials for example 3: - bed file - bigBed file - .as file - hub.txt file - Example session The filter is applied to the name field, with a default value of 0. The example session will show which items still display with this default value: [Filtering on non-numerical characters] The only items being filtered are the negative values, -1 and -3. Entirely non-numerical characters are interpreted as 0. If we instead change the filter to be 2, we see the following: [Filtering on non-numerical characters] In this case we see the items that start with non-numerical characters get filtered. Items that start with a number, and are followed by another character, are treated as the number. This can be seen with the 2% value, which remains visible with the filter active. It is important to keep in mind that non-numerical values, such as NA, will be visible when the default 0 filter is active, but will be removed when any positive numerical filter is activated. filterText.fieldName filterText.fieldName is used to enable text searching in the specified fieldName.
This will display any items passed which match exactly the searched term, or only part of the search term. Two types of searching are supported, wildcard searching (*) or regular expression searching (regexp). The mode between the two types can be freely changed in the track description page, or a default passed using the filterType.fieldName parameter. Lastly, the filter label will be the description of the field as specified by the autoSql (.as) file. This label can be customized with the filterLabel.fieldName parameter. A value can be passed with this setting to enable a specific filter by default. Also, the default search type is wildcard (*). For more information on filterText.fieldName, see the trackDb help doc entry. filterText Example 1 In this first example, we have 10 genes with arbitrary coordinates as follows: chr7 127000000 127000005 EGFR 1 chr7 127000010 127000015 VEGFA 2 chr7 127000020 127000025 APOE 3 chr7 127000030 127000035 IL6 4 chr7 127000040 127000045 TGFBI 5 chr7 127000050 127000055 BRCA1 6 chr7 127000060 127000065 BRCA2 7 chr7 127000070 127000075 MTHFR 8 chr7 127000080 127000085 ESR1 9 chr7 127000090 127000095 AKT1 10 By default, we would like our data to display only BRCA1 and BRCA2. The easiest way to accomplish this is to enable a filterText wildcard filter. We will be filtering on the name field using the following setting: filterText.name BRCA* The hub.txt looks as follows: track filterTextDefaultBRCA shortLabel filterTextBRCA longLabel Wildcard filterText with default BRCA value visibility pack type bigBed 5 . filterText.name BRCA* bigDataUrl filterTextExample1.bb Below are all the materials for example 1: - bed file - bigBed file - .as file - hub.txt file - Example session The example session will display an image like the following, showing only the BRCA items. The eight filtered items are also noted on the longLabel above the track as (8 items filtered). 
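The effect of the BRCA* default on the ten example genes can be sketched in a few lines of Python. This is only an illustration of glob-style matching using the standard fnmatch module; it assumes the browser's wildcard mode behaves like a shell glob, and the helper name wildcard_filter is hypothetical, not part of any UCSC tool.

```python
from fnmatch import fnmatchcase

# Name column of the ten example items above.
names = ["EGFR", "VEGFA", "APOE", "IL6", "TGFBI",
         "BRCA1", "BRCA2", "MTHFR", "ESR1", "AKT1"]

def wildcard_filter(items, pattern):
    """Return (kept_items, number_filtered) for a glob-style pattern.

    Assumption: '*' matches any run of characters, as in a shell glob.
    """
    kept = [name for name in items if fnmatchcase(name, pattern)]
    return kept, len(items) - len(kept)

kept, hidden = wildcard_filter(names, "BRCA*")
print(kept, f"({hidden} items filtered)")
```

Under that assumption the two BRCA items are kept and the other eight are counted as filtered, which matches the count shown on the longLabel.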
If we just want to enable filtering, but pass no default value, we can use filterText.name *. [filterText filter enabled on bigBed] We can also change the filter type or filter value (or remove values) by going to the track description page, which will show the following: [filterText filter track description page] Changing the filter from BRCA* to *A* would expand the wildcard match to all items with A. See the trackDb help doc entry for additional information including an example using regexp. For instance, with wildcard changed to a regexp type of search, putting in .*A\|B.* will match any items with an A or B in it, while .*[0-9] will match any item ending in a number. filterValues.fieldName filterValues.fieldName is used to enable filtering by pre-specified values within a field. It can be used on fields that can contain one text value or a list of comma-separated values of text, like "classA,classB". Usually, these are category names. The option requires at least one value to filter on. Every individual possible value that can ever occur in the field must be passed in a comma separated list. You will then be able to select those values as categories, choosing to display only items that belong to one, any, or at least one of the selected values. By default, the user can select multiple values from this list and the filter lets pass any features with at least one of these values (multipleListOr). The type of selection can be designated by passing the optional filterType.fieldName parameter. Possible options are: - single - Allows selection of a single item from dropdown menu. - singleList - Allows selection of a single item from dropdown menu. Accepts comma-separated values in the bigBed field. - multiple - Allows selection of any number of items from dropdown menu. Shows items that contain at least one of the selections. - multipleListOr - Allows selection of any number of items from dropdown menu. Shows items that contain at least one of the selections. 
Accepts comma-separated values in the bigBed field. Enables the radio button that allows swapping of filter type in the track description page. - multipleListOnlyOr - Same as multipleListOr except it disables the option to change the filter type from the browser interface. - multipleListAnd - Allows selection of any number of items from dropdown menu. Displays only items containing all of the selections. Accepts comma-separated values in the bigBed field. Enables the radio button that allows swapping of filter type in the track description page. - multipleListOnlyAnd - Same as multipleListAnd except it disables the option to change the filter type from the browser interface. As with other filters, default values can be passed to filterValues using the filterValuesDefault.fieldName parameter. It can take a comma-separated list just like filterValues.fieldName, and any items included will be automatically selected. The labels in the menu shown to the user can be configured to display a different name/label than the one present in the bigBed field. This can be helpful when the data values are written in short form, but you want a longer more descriptive name to show up in the UI. The format for this substitution is as follows: filterValues.fieldName fieldValue1|alternativeName1,fieldValue2|alternativeName2... In this example, a bigBed could have value AML in a field called Disease, but we would like the menu to display Acute Myeloid Leukemia so the line could be filterValues.Disease AML|Acute Myeloid Leukemia,MSC|Melanoma Skin Cancer... where the filter display would have the full disease names, while the data instead in the bigBed was the abbreviation. This can also be used to reduce the size of bigBed files. The following session contains a hub with example tracks of all the possible filterValues settings. It can be used to explore the differences and restrictions of each of the settings. 
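The value|label substitution format described above can be illustrated with a short Python sketch. It only shows how such a setting string breaks down into stored values and menu labels; the function name parse_filter_values is hypothetical, and the sketch assumes that neither values nor labels contain a comma or pipe character.

```python
def parse_filter_values(setting_value):
    """Split 'value|label,value|label' into a dict of value -> menu label.

    Entries without a '|' use the value itself as the label.
    Assumption: values and labels contain no ',' or '|' characters.
    """
    labels = {}
    for entry in setting_value.split(","):
        value, _, label = entry.partition("|")
        labels[value] = label or value
    return labels

disease = parse_filter_values(
    "AML|Acute Myeloid Leukemia,MSC|Melanoma Skin Cancer")
for value, label in disease.items():
    print(f"stored in bigBed: {value:4s} shown in menu: {label}")
```

A plain list such as odd,even passes through unchanged, with each value serving as its own label.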
filterValues Example 1 In this first example, we have a small track with 10 items. The score and names of the items are the same, and there is an additional field added which labels the score as either an even or odd number, making our file a bigBed 5+1: chr7 127000000 127000005 1 1 odd chr7 127000010 127000015 2 2 even chr7 127000020 127000025 3 3 odd chr7 127000030 127000035 4 4 even chr7 127000040 127000045 5 5 odd chr7 127000050 127000055 6 6 even chr7 127000060 127000065 7 7 odd chr7 127000070 127000075 8 8 even chr7 127000080 127000085 9 9 odd chr7 127000090 127000095 10 10 even We wish to add a filter which allows selection on whether the item is even or odd. This is enabled by the line filterValues.OddEven odd,even. Remember that all possible values need to be listed. We also wish for only a single selection to be possible, which we will do with the following setting filterType.OddEven singleList. Below is the track stanza in the hub.txt: track filteValuesOddEven shortLabel filterValues.OddEven odd,even longLabel filterValues categorical filter on odd and even values visibility pack type bigBed 5 + 1 filterValues.OddEven odd,even filterType.OddEven singleList bigDataUrl filterValuesExample1.bb Below are all the materials for this example: - bed file - bigBed file - .as file - hub.txt file - Example session filterValues Example 2 In this second example, we have signal data on 10 items, and want to enable filtering on any number of annotation type. 
Note that the annotation type is written in shorthand: chr7 127000000 127000005 signal1 0 DNA-BR,AH chr7 127000010 127000015 signal2 0 AS,BS chr7 127000020 127000025 signal3 0 BS chr7 127000030 127000035 signal4 0 BS chr7 127000040 127000045 signal5 0 DNA-BR chr7 127000050 127000055 signal6 0 DNA-BR,BS chr7 127000060 127000065 signal7 0 AS chr7 127000070 127000075 signal8 0 BS chr7 127000080 127000085 signal9 0 AS,AH chr7 127000090 127000095 signal10 0 DNA-BR,BS In this case, the annotations (the sixth annotationType column) represent the following: - DNA-BR - DNA-binding region - AS - active site - AH - alpha helix - BS - beta strand Some signals contain more than one type of annotation. We will enable the categorical filter using the filterValues.annotationType setting, including all the possible values. We will also substitute the complete annotation type names by using the pipe "|" character. We work with researchers most interested in DNA-binding regions that are also beta strands, so we will want the default behavior to display only those items. For this we will use filterType.annotationType multipleListAnd to require matching on all selections, and filterValuesDefault.annotationType DNA-BR,BS to pass the default values. Note that we do not use the type multipleListOnlyAnd, as that would not allow users to change selection type. 
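To make the difference between matching all selections and matching at least one selection concrete, here is a small Python sketch run against the ten example items. It is an illustration of the set logic only, not the browser's implementation; the function names passes and visible are hypothetical.

```python
# annotationType column of the ten example items above.
records = {
    "signal1": "DNA-BR,AH", "signal2": "AS,BS", "signal3": "BS",
    "signal4": "BS", "signal5": "DNA-BR", "signal6": "DNA-BR,BS",
    "signal7": "AS", "signal8": "BS", "signal9": "AS,AH",
    "signal10": "DNA-BR,BS",
}

def passes(field, selected, mode):
    """Decide whether one comma-separated field value passes the filter."""
    values = set(field.split(","))
    if mode == "multipleListAnd":   # item must carry every selected value
        return selected <= values
    if mode == "multipleListOr":    # item must carry at least one selected value
        return bool(selected & values)
    raise ValueError(f"mode not covered by this sketch: {mode}")

def visible(selected, mode):
    """Names of the items that would remain displayed."""
    return [name for name, field in records.items()
            if passes(field, set(selected), mode)]

print(visible(["DNA-BR", "BS"], "multipleListAnd"))
print(visible(["DNA-BR", "BS", "AH"], "multipleListOr"))
```

With the DNA-BR,BS default and the "and" mode, only the items carrying both annotations remain; switching to the "or" mode and adding AH hides only the item whose sole annotation is AS.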
Below is the complete track stanza of the hub.txt file: track filteValuesAnnotationType shortLabel filterValues.annotationType longLabel filterValues categorical filter on multiple annotation types visibility pack type bigBed 5 + 1 filterValues.annotationType DNA-BR|DNA-binding region,AS|active site,AH|alpha helix,BS|beta strand filterType.annotationType multipleListAnd filterValuesDefault.annotationType DNA-BR,BS bigDataUrl filterValuesExample2.bb Below are all the materials for this example: - bed file - bigBed file - .as file - hub.txt file - Example session Going to the example session will result in a browser image as such: [filterValues filter enabled on bigBed] Note that only signal6 and signal10 are displayed by default. That is because they are the only two items that have both DNA-BR (DNA-binding region) and BS (beta strand) as their type of annotation. If we click into the track, we see additional filter options. In the image below we have switched the filter to one or more match, meaning that if the items contain any of the selected items they will display. We have also expanded the selection to include AH (alpha-helix). Lastly, note that the menu displays the alternative long names instead of the shorthand: [filterValues filter enabled on bigBed] This will result in all items being displayed except for signal 7. 
This is because it is the only item that contains none of the selected categories, with its only annotation being AS (active site): [filterValues filter enabled on bigBed] Additional Resources - Track Hub User Guide - Guide To useOneFile setting - Search file .ix documentation - Mailing list question with searchable Track Hub - Mailing list question with searchable Custom Tracks - Track Database (trackDb) searchTrix Definition - Quick Start Guide to Organizing Track Hubs into Groupings - Quick Start Guide to Assembly Track Hubs /goldenPath/help/bigBed.html:Genome_Browser_bigBed_Track_Format bigBed Track Format The bigBed format stores annotation items that can be either a simple or a linked collection of exons, much as BED files do. BigBed files are created from BED type files using the program bedToBigBed. The resulting bigBed files are in an indexed binary format. The main advantage of the bigBed files is that only those portions of the files needed to display a particular region are transferred to the Genome Browser server. Because of this, bigBed has considerably faster display performance than regular BED when working with large data sets. The bigBed file remains on your local web-accessible server (http, https, or ftp), not on the UCSC server, and only the portion that is needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigBed files, please see the Hosting section of the Track Hub Help documentation. Additional indices can be created for the items in a bigBed file to support item search in track hubs. See Example #3 below for an example of how to build an additional index. See this wiki page for help in selecting the graphing track data format that is most appropriate for your type of data. To see an example of turning a text-based bedDetail custom track into the bigBed format, see this How to make a bigBed file blog post. 
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file. Quickstart example commands It is not hard to create a bigBed file. The following UNIX commands create one on a Linux machine (swap macOSX for linux for an Apple environment). The steps are explained in more detail in the following sections on this page: wget https://genome.ucsc.edu/goldenPath/help/examples/bedExample.txt wget https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed chmod a+x bedToBigBed ./bedToBigBed bedExample.txt hg19.chrom.sizes myBigBed.bb mv myBigBed.bb ~/public_html/ The last step assumes that your ~/public_html/ directory is accessible from the internet. This may not be the case on your server. You may have to copy the file to another server and web-accessible location at your University. Once you know the URL to the file myBigBed.bb, you can paste this URL into the custom track box on the UCSC Genome Browser to display the file. Creating a bigBed track To create a bigBed track, follow these steps (for concrete Unix commands, see the examples below on this page): Step 1. Create a BED format file following the directions here. When converting a BED file to a bigBed file, you are limited to one track of data in your input file; therefore, you must create a separate BED file for each data track. If your BED file was originally a custom track, remove any existing "track" or "browser" lines from your BED file so that it contains only data. Your file does not need to be sorted by chromosome name, but all entries for a single chromosome must be together and sorted by chromosome start position. If you're not sure if this is true for your BED file, it may be easiest to sort the file using the "-sort" option for bedToBigBed. Finally, if your BED files are large, they can be compressed using gzip (e.g. myTrack.bed.gz) and still read by bedToBigBed. 
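The ordering requirement in Step 1 can be checked before conversion with a few lines of Python. This is a minimal sketch, not a UCSC utility: the function name check_bed_order is hypothetical, it assumes whitespace-separated columns, and it tests only the conditions stated above (no track or browser lines, each chromosome's entries contiguous, and starts non-decreasing within a chromosome).

```python
def check_bed_order(lines):
    """Return a list of human-readable problems found in BED lines."""
    problems = []
    seen = set()            # chromosomes already encountered
    current_chrom = None
    last_start = -1
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue        # ignore blank lines
        if fields[0] in ("track", "browser"):
            problems.append(f"line {number}: remove {fields[0]} line")
            continue
        chrom, start = fields[0], int(fields[1])
        if chrom != current_chrom:
            if chrom in seen:
                problems.append(
                    f"line {number}: {chrom} entries are not together")
            seen.add(chrom)
            current_chrom, last_start = chrom, -1
        if start < last_start:
            problems.append(
                f"line {number}: {chrom} start {start} precedes {last_start}")
        last_start = start
    return problems

example = ["chr21\t100\t200", "chr21\t50\t80", "chr7\t10\t20"]
for problem in check_bed_order(example):
    print(problem)
```

To check a file, pass an open file handle (for example one returned by gzip.open(path, "rt") for a compressed BED) in place of the list. An empty result means the input meets the Step 1 ordering conditions; otherwise sorting the file first, as described above, is the simplest fix.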
Step 2. Download the bedToBigBed program from the binary utilities directory. Example #2 below shows the exact Unix command. The bedToBigBed program can be run with several additional options. Some of these, such as the -as and -type options, are used in examples below. The -type option describes the size of the bigBed file: -type=bedN[+[P]], where N is an integer between 3 and 12 and the optional +[P] parameter specifies the number of extra fields (not required, but preferred). Describing the size of the bigBed file is needed for access to extra fields like name, itemRgb, etc. Examples: -type=bed6 or -type=bed6+ or -type=bed6+3. For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message. Step 3. Use the fetchChromSizes script from the same directory to create the chrom.sizes file for the UCSC database you are working with (e.g., hg19). If the assembly genNom is hosted by UCSC, chrom.sizes can be a URL like: http://hgdownload.soe.ucsc.edu/goldenPath/genNom/bigZips/genNom.chrom.sizes. Step 4. Use the bedToBigBed utility to create a bigBed file from your sorted BED file, using the input.bed file and chrom.sizes files created in Steps 1 and 3: bedToBigBed input.bed chrom.sizes myBigBed.bb The chrom.sizes file can also be a 2bit or a chromAlias bigBed file using the following command-line arguments: -sizesIs2Bit -- If set, the chrom.sizes file is assumed to be a 2bit file. -sizesIsChromAliasBb -- If set, then chrom.sizes file is assumed to be a chromAlias bigBed file or a URL to such a file. Step 5. Move the newly created bigBed file (myBigBed.bb) to a web-accessible http, https, or ftp location. At this point you should have a URL to your data, such as "https://institution.edu/myBigBed.bb", and the file should be accessible outside of your institution's or hosting provider's network. For more information on where to host your data, please see the Hosting section of the Track Hub Help documentation.
Step 6. If the file name ends with a .bigBed or .bb suffix, you can paste the URL of the file directly into the custom track management page, click "submit" and view the file as a track in the Genome Browser. By default, the file name will be used to name the track. To configure the track name and descriptions, you must create a "track line", as shown in Example 1 Configuration Step 1. Alternatively, if you want to set the track labels and other options yourself, construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the track line will look something like this: track type=bigBed name="My Big Bed" description="A Graph of Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigBed.bb Paste this custom track line into the text box on the custom track management page. Examples Example #1: Load an existing bigBed file In this example, you will load an existing bigBed file, bigBedExample.bb, on the UCSC http server. This file contains data on chromosome 21 on the human hg19 assembly. To create a custom track using this bigBed file: 1. Paste the URL http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb into the custom track management page for the human assembly hg19 (Feb. 2009). 2. Click the "submit" button. 3. On the next page that displays, click the "go" link. To view the data in the bigBed track in the Genome Browser navigate to chr21:33,031,597-33,041,570. Configuration You can customize the track display by including track and browser lines that define certain parameters: 1. Construct a track line that references the bigBedExample.bb file: track type=bigBed name="bigBed Example One" description="A bigBed file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb 2. Paste the track line into the custom track management page, click the "submit" button. On the next page that displays, click the "go" link. 
To view the data in the bigBed track in the Genome Browser navigate to chr21:33,031,597-33,041,570. 3. With the addition of the following browser line with the track line you can ensure that the custom track opens at the correct position when you paste in the information: browser position chr21:33,031,597-33,041,570 track type=bigBed name="bigBed Example One" description="A bigBed file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb Paste the browser and track lines into the custom track page, click the "submit" button and the "go" link to see the data. Example #2: Create a bigBed file from a BED file In this example, you will convert a sample BED file to bigBed format. 1. Save the BED file bedExample.txt to a server, ideally one that is accessible from the internet. (Steps 1 and 2 in Creating a bigBed track, above). wget https://genome.ucsc.edu/goldenPath/help/examples/bedExample.txt 2. Save the file hg19.chrom.sizes to your computer. It contains the chrom.sizes data for the human (hg19) assembly (Step 3, above). wget https://genome.ucsc.edu/goldenPath/help/hg19.chrom.sizes 3. If you use your own file, it has to be sorted, first on the chrom field, and secondarily on the chromStart field. You can use the utility bedSort available here or the following UNIX sort command to do this: sort -k1,1 -k2,2n unsorted.bed > input.bed 4. Download the bedToBigBed utility (Step 2, above). Replace "linux" below with "macOSX" if your server is a Mac. wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed chmod a+x bedToBigBed 5. Run the utility to create the bigBed output file (Step 4, above): ./bedToBigBed bedExample.txt hg19.chrom.sizes myBigBed.bb 6. Place the bigBed file you just created (myBigBed.bb) on a web-accessible server (Step 5, above). mv myBigBed.bb ~/public_html/ At some Universities, this involves using the commands ftp, scp or rsync to copy the file to a different server, one that is accessible from the internet. 
We have documentation on how to find such a server. 7. Paste the URL itself into the Custom Tracks entry form or construct a track line that points to your bigBed file (Step 5, above). 8. Create the custom track on the human assembly hg19 (Feb. 2009), and view it in the Genome Browser (Step 6, above). Note that the original BED file contains data on chromosome 21 only. Example #3: Create a bigBed file with extra (custom) fields BigBed files can store extra fields in addition to the predefined BED fields. In this example, you will create your own bigBed file from a fully featured existing BED file that contains the standard BED fields up to and including the color field called itemRgb (field 9), plus two additional non-standard fields (two alternate names for each item in the file). The standard BED column itemRgb contains an R,G,B color value (e.g. "255,0,0"). The resulting bigBed file will have nine standard BED columns and two additional non-standard user-defined columns. If you add extra fields to your bigBed file, you must include an AutoSql format (.as) file describing the fields. In this file, all fields (standard and non-standard) are described with a short internal name and also a human-readable description. For more information on AutoSql, see Kent and Brumbaugh, 2002, as well as examples of .as files in this directory. Then, the bedToBigBed program is run with the arguments -type=bed9+2 and also -as=bedExample2.as to help correctly interpret all the columns in the data. This example also demonstrates how to create an extra search index on the name field, and the first of the extra fields to be used for track item search. The searchIndex setting requires the input BED data to be case-sensitive sorted (sort -k1,1 -k2,2n); newer versions of the tool bedToBigBed (available here) are enhanced to catch improper input. 1. Save the BED file bedExample2.bed to your computer (Steps 1 and 2 in Creating a bigBed track, above). 2.
Save the file hg18.chrom.sizes to your computer. This file contains the chrom.sizes for the human (hg18) assembly (Step 4, above). 3. Save the AutoSql file bedExample2.as to your computer. This file contains descriptions of the BED fields, and is required when the BED file contains a color field. 4. Download the bedToBigBed utility (Step 3, above). 5. Run the utility to create a bigBed output file with an index on the name field and the first extra field: (Step 5, above): bedToBigBed -as=bedExample2.as -type=bed9+2 -extraIndex=name,geneSymbol bedExample2.bed hg18.chrom.sizes myBigBed2.bb 6. Paste the URL of the file into the custom tracks entry form, or alternatively construct a track line that points to your bigBed file (Step 7, above). Because this bigBed file includes a field for color, you must include the itemRgb attribute in the track line. It will look somewhat similar to this (note that you must insert the URL specific to your own bigBed file): track type=bigBed name="bigBed Example Three" description="A bigBed File with Color and two Extra Fields" itemRgb="On" bigDataUrl=http://yourWebAddress/myBigBed2.bb 7. Create the custom track on the human assembly hg18 (Mar. 2006), and view it in the Genome Browser (step 8, above). Note that the original BED file contains data on chromosome 7 only. 8. If you are using the bigBed file in a track hub, you can use the additional indices for track item searches. See the setting "searchIndex" in the Track Database Definition Document for more information. For example, if you run the bedToBigBed utility with the option -extraIndex=name, you will be able to search on the "name" field by adding the line searchIndex name to the stanza about your bigBed in the hub's trackDb.txt file. While searchIndex expects a search string with an exact match in the index, another setting for Track Hubs, searchTrix allows for a fast look-up of free text associated with a list of identifiers, when a searchIndex has also been created. 
See a Searchable Track Hub Quick Start Guide here. 9. Extra fields can contain text for labels or for display with mouseover (if the BED "name" field is needed for something that is not the label). See the trackDb settings "mouseOverField" and "labelField" for more information. 10. When you click on features, the contents of all extra fields are shown as a table. You can modify the layout of the resulting page with the trackDb settings "skipFields", "sepFields" and "skipEmptyFields", and transform text fields into links with the "urls" trackDb setting. 11. Extra fields that start with the character "_" are reserved for internal use (special display code); their contents are not shown on the details page. Sharing Your Data with Others If you would like to share your bigBed data track with a colleague, the best solution is to save your current view as a stable Genome Browser Session Link. This will save the position and all settings that you made, all track visibilities, filters, highlights, etc. If you want to create URLs to your bigBed file programmatically from software, look at Example #6 on this page. Extracting Data from the bigBed Format Because the bigBed files are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory: - bigBedToBed — converts a bigBed file to ASCII BED format. - bigBedSummary — extracts summary information from a bigBed file. - bigBedInfo — prints out information about a bigBed file. These programs accept either file names or URLs to files as input. As with all UCSC Genome Browser programs, simply type the program name (with no parameters) on the command line to view the usage statement. Troubleshooting If you get an error when you run the bedToBigBed program, check your input BED file for data coordinates that extend past the end of the chromosome. 
If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input BED file before using the bedToBigBed program. /goldenPath/help/assemblyHubGuidelines.html:Public_Hub_Guidelines Assembly Please note, if you are working with a genome that has already been submitted to the NCBI Assembly system, it may already be available in the UCSC Genome Browser. Please examine the GenArk Assembly Hub collection to see if your genome of interest is already available. If it cannot be found there, you can use the UCSC Assembly Request page to request that a genome assembly be added to the UCSC Genome Browser. Contents Overview Web Server Linking to Your Assembly Hub - hub.txt - genomes.txt - 2bit File - groups.txt Building Tracks - Cytoband Track Assembly Hub Resources - G-OnRamp - MakeHub - Example NCBI Assembly Hubs - Example Loading African Bush Elephant Assembly Hub and Looking at the Related genomes.txt and trackDb.txt Adding BLAT Servers - Configuring Assembly Hubs to Use a Dedicated gfServer - Troubleshooting BLAT Servers - Process Check - Check for Correct Path/Filename - Check "gfServer Status" Check - Testing with gfClient - Configuring Assembly Hubs to Use a Dynamic gfServer - Check gfServer Status for Dynamic Servers Overview The Assembly Hub function allows you to display your novel genome sequence using the UCSC Genome Browser. Web Server To display your novel genome sequence, use a web server at your institution (or free services like Cyverse) to supply your files to the UCSC Genome Browser. For usage behind a firewall, you can also load them locally through docker. Note that hosting hub files on HTTP is highly recommended and much more efficient than FTP. You then establish a hierarchy of directories and files to host your novel genome sequence.
For example: myHub/ - directory to organize your files on this hub hub.txt - primary reference text file to define the hub, refers to: genomes.txt - definitions for each genome assembly on this hub newOrg1/ - directory of files for this specific genome assembly newOrg1.2bit - ‘2bit’ file constructed from your fasta sequence description.html - information about this assembly for users trackDb.txt - definitions for tracks on this genome assembly groups.txt - definitions for track groups on this assembly bigWig and bigBed files - data for tracks on this assembly external track hub data tracks The URL to reference this hub would be: http://yourLab.yourInstitution.edu/myHub/hub.txt Note: there is now a useOneFile on hub setting that allows the hub properties to be specified in a single file. More information about this setting can be found on the Genome Browser User Guide. You can view a working example hierarchy of files at: Plants A smaller slice of this hub is represented in a Quick Start Guide to Assembly Hubs. Linking to Your Assembly Hub You can build direct links to the genome(s) in your assembly hub: - The hub connect page: http://genome.ucsc.edu/cgi-bin/hgHubConnect?hgHub_do_redirect=on&hgHubConnect.remakeTrackHub=on&hgHub_do_firstDb=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt - The genome gateway page: http://genome.ucsc.edu/cgi-bin/hgGateway?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt - Directly to the genome browser: http://genome.ucsc.edu/cgi-bin/hgTracks?genome=araTha1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubAssembly/plantAraTha1/hub.txt hub.txt The initial file hub.txt is the primary URL reference for your assembly hub. 
The format of the file: hub hubName shortLabel genome longLabel Comment describing this hub contents genomesFile genomes.txt email contactEmail@institution.edu descriptionUrl aboutHub.html shortLabel is the name that will appear in the genome pull-down menu at the UCSC gateway page. Example: Plants. genomesFile is a reference to the next definition file in this chain that will describe the assemblies and tracks available at this hub. Typically genomes.txt is at the same directory level as this hub.txt, however it can also be a relative path reference to a different directory level. The email address provides users a contact point for queries related to this assembly hub. The descriptionUrl provides a relative path or URL link to a webpage describing the overall hub. genomes.txt The genomes.txt file provides the references to the genome assemblies and tracks available at this assembly hub. The example file indicates the typical contents: genome ricCom1 trackDb ricCom1/trackDb.txt groups ricCom1/groups.txt description July 2011 Castor bean twoBitPath ricCom1/ricCom1.2bit organism Ricinus communis defaultPos E09R7372:1000000-2000000 orderKey 4800 scientificName Ricinus communis htmlPath ricCom1/description.html transBlat yourLab.yourInstitution.edu 17777 blat yourLab.yourInstitution.edu 17777 isPcr yourLab.yourInstitution.edu 17779 There can be multiple assembly definitions in this single file. Separate these stanzas with blank lines. The references to other files are relative path references. In this example there is a sub-directory here called ricCom1 which contains the files for this specific assembly. - The genome name is the equivalent to the UCSC database name. The genome browser displays this database name in title pages in the genome browser. - The trackDb refers to a file which defines the tracks to place on this genome assembly. The format of this file is described in the Track Hub help reference documentation. 
- The groups refers to a file which defines the track groups on this genome browser. Track groups are the sections of related tracks grouped together under the primary genome browser graphics display image. - The description will be displayed for user information on the gateway page and most title pages of this genome assembly browser. It is the name displayed in the assembly pull-down menu on the browser gateway page. - The twoBitPath refers to the .2bit file containing the sequence for this assembly. Typically this file is constructed from the original fasta files for the sequence using the kent program faToTwoBit. This line can also point to a URL; for example, if you are duplicating an existing Assembly Hub, you can use the URL of the original hub's 2bit file here. - The organism string is displayed along with the description on most title pages in the genome browser. Adjust your names in organism and description until they are appropriate. This example is very close to what the genome browser normally displays. This organism name is the name that appears in the genome pull-down menu on the browser gateway page. - The defaultPos specifies the default position the genome browser will open when a user first views this assembly. This is usually selected to highlight a popular gene or region of interest in the genome assembly. - The orderKey is used with other genome definitions at this hub to order the genome pull-down menu. - The htmlPath refers to an html file that is used on the gateway page to display information about the assembly. - The transBlat, blat, and isPcr entries refer to different configurations of the gfServer that enhance search capabilities for amino acids, BLAT algorithms, and PCR respectively. More here.
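The stanza layout described above (one setting per line, stanzas separated by blank lines) is straightforward to read programmatically. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the second stanza (newOrg2) is invented for illustration, and lines starting with "#" are assumed to be comments.

```python
def parse_stanzas(text):
    """Split blank-line-separated stanzas into dicts mapping setting name to value."""
    stanzas, current = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line closes the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        if line.startswith("#"):  # assumption: treat '#' lines as comments
            continue
        key, _, value = line.partition(" ")
        current[key] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def missing_settings(stanza, wanted):
    """List the wanted setting names that a stanza does not define."""
    return [name for name in wanted if name not in stanza]


SAMPLE = """\
genome ricCom1
trackDb ricCom1/trackDb.txt
twoBitPath ricCom1/ricCom1.2bit
description July 2011 Castor bean
defaultPos E09R7372:1000000-2000000
blat yourLab.yourInstitution.edu 17777

genome newOrg2
trackDb newOrg2/trackDb.txt
twoBitPath newOrg2/newOrg2.2bit
"""

stanzas = parse_stanzas(SAMPLE)
for stanza in stanzas:
    print(stanza["genome"], missing_settings(stanza, ("twoBitPath", "defaultPos", "description")))
```

A check like this can be run over a genomes.txt file before publishing a hub to spot stanzas that lack settings you intended to include.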
Note that you are strongly encouraged to give each of your genome stanzas a line for defaultPos, scientificName, organism, and description (along with the other settings above) so that when your hub is attached it loads a specified default location and has text that is more easily searched from the Gateway page. 2bit File The .2bit file is constructed from the fasta sequence for the assembly. The kent source program faToTwoBit is used to construct this file. Download the program from the downloads section of the Browser. For example: faToTwoBit ricCom1.fa ricCom1.2bit Use twoBitInfo to verify the sequences in this assembly and create a chrom.sizes file, which is not used in the hub but is useful in later processing to construct the big* files: twoBitInfo ricCom1.2bit stdout | sort -k2rn > ricCom1.chrom.sizes The .2bit commands can function with the .2bit file at a URL: twoBitInfo -udcDir=. http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit stdout | sort -k2nr > ricCom1.chrom.sizes Sequence can be extracted from the .2bit file with the twoBitToFa command, for example: twoBitToFa -seq=chrCp -udcDir=. http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit stdout > ricCom1.chrCp.fa groups.txt The groups.txt file defines the grouping of track controls under the primary genome browser image display. The example referenced here has the usual definitions as found in the UCSC Genome Browser. Each group is defined, for example the Mapping group: name map label Mapping priority 2 defaultIsClosed 0 - The name is used in the trackDb.txt track definition group, to assign a particular track to this group. - The label is displayed on the genome browser as the title of this group of track controls. - The priority orders this track group with the other track groups. - The defaultIsClosed determines if this track group is expanded or closed by default. Values to use are 0 or 1.
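To preview how a groups.txt file will order its track groups, the stanzas can be read and sorted by priority. This is a minimal sketch with hypothetical helper names. It assumes that lower priority values are listed first, and only the Mapping group below comes from the example above; the other two groups are invented for illustration.

```python
def read_groups(text):
    """Parse blank-line-separated groups.txt stanzas into a list of dicts."""
    groups, current = [], {}
    for raw in text.splitlines() + [""]:  # trailing "" flushes the last stanza
        line = raw.strip()
        if line:
            key, _, value = line.partition(" ")
            current[key] = value.strip()
        elif current:
            groups.append(current)
            current = {}
    return groups


def display_order(groups):
    """Return group labels sorted by numeric priority (assumed: lowest first)."""
    for group in groups:
        # The guide states that defaultIsClosed takes the values 0 or 1.
        if group.get("defaultIsClosed", "0") not in ("0", "1"):
            raise ValueError(f"defaultIsClosed must be 0 or 1 in group {group.get('name')}")
    return [g["label"] for g in sorted(groups, key=lambda g: float(g["priority"]))]


SAMPLE = """\
name genes
label Genes and Gene Predictions
priority 3
defaultIsClosed 0

name map
label Mapping
priority 2
defaultIsClosed 0

name x
label Experimental
priority 10
defaultIsClosed 1
"""

print(display_order(read_groups(SAMPLE)))
```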
Building Tracks Tracks are defined in the trackDb.txt where each stanza describes how tracks are displayed (shortLabel/longLabel/color/visibility) and other information such as what group the track should belong to (referencing the groups.txt) and if any additional html should display when one clicks into the track or a track item: track gap_ longLabel Gap shortLabel Gap priority 11 visibility dense color 0,0,0 bigDataUrl bbi/ricCom1.gap.bb type bigBed 4 group map html ../trackDescriptions/gap For more information about the syntax of the trackDb.txt file, use UCSC's Hub Track Database Definition page. It helps to have a compute cluster to process the genomes when constructing tracks, although small genomes can be processed on a single multi-core computer. The process for each track is unique. Please note the continuing document: Browser Track Construction for a discussion of constructing tracks for your assembly hub. Cytoband Track Assembly hubs can have a Cytoband track that allows quicker navigation of individual chromosomes and displays banding pattern information if known. A quick version of the track can be built using the existing chrom.sizes file for your assembly (the banding options include gneg, gpos25, gpos50, gpos75, gpos100, acen, gvar, or stalk): cat araTha1.chrom.sizes | sort -k1,1 -k2,2n | awk '{print $1,0,$2,$1,"gneg"}' > cytoBandIdeo.bed The resulting bed file can be turned into a bigBed and given a .as file (example here) to inform the browser it is not a normal bed. bedToBigBed -type=bed4 cytoBandIdeo.bed -as=cytoBand.as araTha1.chrom.sizes cytoBandIdeo.bigBed In the trackDb, as long as the track is named cytoBandIdeo (track cytoBandIdeo example) it will load in the assembly hub. Assembly Hub Resources There are resources for automatically building assembly hubs available from G-OnRamp and MakeHub.
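For readers who prefer a script to the sort/awk pipeline in the Cytoband Track section, the same rows can be produced as follows. This is a minimal sketch with a hypothetical function name and invented chromosome sizes. It sorts names by plain byte order, which may differ from the locale-dependent order of sort, and it joins fields with a single space as awk does by default.

```python
# Banding options listed in the Cytoband Track section.
STAINS = {"gneg", "gpos25", "gpos50", "gpos75", "gpos100", "acen", "gvar", "stalk"}


def cytoband_rows(chrom_sizes_text, stain="gneg", sep=" "):
    """Build one whole-chromosome band per sequence: name, 0, size, name, stain."""
    if stain not in STAINS:
        raise ValueError(f"unknown banding option: {stain}")
    rows = []
    for line in chrom_sizes_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, size = fields[0], int(fields[1])
        rows.append((name, 0, size, name, stain))
    # Mirrors `sort -k1,1 -k2,2n`: by name, then numerically by size.
    rows.sort(key=lambda row: (row[0], row[2]))
    return [sep.join(str(value) for value in row) for row in rows]


# Hypothetical chrom.sizes content, for illustration only.
example = "chr2\t500\nchr1\t1000\n"
for row in cytoband_rows(example):
    print(row)
# chr1 0 1000 chr1 gneg
# chr2 0 500 chr2 gneg
```

The printed lines can be written to cytoBandIdeo.bed and passed to the bedToBigBed command shown in that section.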
There is also a collection of Example NCBI assembly hubs that are already working and can either be used or copied as a template to build further hubs. G-OnRamp G-OnRamp is a Galaxy workflow that turns a genome assembly and RNA-Seq data into a Genome Browser with multiple evidence tracks. Because G-OnRamp is based on the Galaxy platform, developing some familiarity with the key concepts and functionalities of Galaxy would be beneficial prior to using G-OnRamp. Here is a link to their instruction page that gives an overview of their process. MakeHub MakeHub is a command line tool for the fully automatic generation of track data hubs for visualizing genomes with the UCSC genome browser. More information can be found on their GitHub page. Example loading African bush elephant assembly hub and looking at the related genomes.txt and trackDb.txt Here are some quick steps to load an example hub from this collection, and an attempt to explain how to look at the files behind the hub. 1. Click the above Vertebrate Mammalian assembly hub link. 2. Scroll down and find the "common name" column and click the hyperlink for "African bush elephant" after looking at the other information on that row. 3. Note that you have arrived at a gateway page that has "African bush elephant Genome Browser - GCA_000001905.1_Loxafr3.0" displayed, where you can see a "Download files for this assembly hub:" section if you desired to access these specific files and notably a link. 4. Click "Go" or the top "Genome Browser" blue bar menu to arrive at viewing this assembly hub (note this is on our genome-test site). 5. To load this hub on our public site, at the earlier step you can copy the hyperlink for "African bush elephant" and paste it in a browser and change the very first "http://genome-test.gi.ucsc.edu/gbdb/..." to "http://genome.ucsc.edu/cgi-bin/..." instead. Now to investigate the files behind the hub to understand the process involved: 1. 
Click the link found in the "Download files for this assembly hub:" section on a loaded assembly hub's gateway page. 2. Note the "GCA_000001905.1_Loxafr3.0.ncbi2bit" file; this is the binary indexed remote file that allows the Browser to display this genome. 3. Find the "GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt" file and click the link to look at it. 4. Review this genomes.txt file, which defines each genome in a hub, showing where to find the above 2bit on the "twoBitPath" line and where to find the track database used to display data on this genome on the "trackDb" line (the real genomes.txt for this massive hub is up one directory, as this hub has 204 assemblies, and includes this stanza). 5. From the earlier link to all the files, click the GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt link. 6. Review this trackDb.txt file, which defines the tracks to display on this hub. It has "bigDataUrl" lines to tell the Browser where to find the data to display for each track. Some tracks also have "searchIndex" and "searchTrix" lines to help support finding data in the hub, "url" and "urlLabel" lines to help create links out from items in the hub to external resources, and "html" lines pointing to a file with information to display about the data for users who click into tracks. Adding BLAT servers BLAT servers (gfServer) are configured as either dedicated or dynamic servers. Dedicated BLAT servers index a genome when started and remain running in memory to quickly respond to requests. Dynamic BLAT servers pre-index genomes to files and are run on demand to handle a BLAT request and then exit. Dedicated gfServers are easier to configure and faster to respond. However, the server continually uses memory. A dynamic gfServer is more appropriate with multiple assemblies and infrequent use.
Their response time is usually acceptable; however, it varies with the speed of the disk containing the index. With repeated access, the operating system will cache the indexes in memory, improving response time. Configuring assembly hubs to use a dedicated gfServer By running your own BLAT server, you can add lines to the genomes.txt file of your assembly hub to enable the browser to access the server and activate blat searches. Please see Running your own gfServer for details on installing and configuring both dedicated and dynamic gfServers. - Next edit your genomes.txt stanza that references yourAssembly to have two lines to inform the browser of where the blat servers are located and what ports to use. See an example of commented out lines here. Please note the capital "B" in transBlat. transBlat yourServer.yourInstitution.edu 17777 blat yourServer.yourInstitution.edu 17779 isPcr yourServer.yourInstitution.edu 17779 - You should now be able to load and perform blat and PCR operations on your assembly. For example, a URL such as the following would bring up the blat CGI and have your assembly listed at the bottom of the "Genome:" drop-down menu: http://genome.ucsc.edu/cgi-bin/hgBlat?hubUrl=http://yourServer.yourInstitution.edu/myHub/hub.txt. Also note the separate isPcr line provides the option to use a different gfServer than the blat host if desired. - Some institutions have firewalls that will prevent the browser from sending multiple inquiries to your blat servers, in which case you may need to request your admins add this IP range as exceptions that are not limited: 128.114.119.* That will cover the U.S. genome.ucsc.edu site. In case you may wish the requests to work from our European Mirror genome-euro.ucsc.edu site, you would want to include 129.70.40.120 also to the exception list. Please see more about configuring your blat gfServer to replicate the UCSC Browser's settings, which will also have information about optimizing PCR results. 
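The link patterns shown in this guide (hub connect, gateway, browser, and the hgBlat link above) all pass the address of hub.txt in a hubUrl parameter. The following minimal sketch assembles them with the Python standard library; the function name and dictionary keys are hypothetical, and the CGI parameters are copied from the example links in this guide rather than taken from any other reference.

```python
from urllib.parse import urlencode

BROWSER = "http://genome.ucsc.edu/cgi-bin"


def hub_links(hub_url, genome):
    """Build the four link styles shown in this guide for one hub and genome."""
    def link(cgi, params):
        # Keep ':' and '/' unescaped so the hub URL stays readable, as in the guide.
        return f"{BROWSER}/{cgi}?" + urlencode(params, safe=":/")

    return {
        "connect": link("hgHubConnect", {
            "hgHub_do_redirect": "on",
            "hgHubConnect.remakeTrackHub": "on",
            "hgHub_do_firstDb": "1",
            "hubUrl": hub_url,
        }),
        "gateway": link("hgGateway", {"genome": genome, "hubUrl": hub_url}),
        "browser": link("hgTracks", {"genome": genome, "hubUrl": hub_url}),
        "blat": link("hgBlat", {"hubUrl": hub_url}),
    }


PLANT_HUB = ("http://genome.ucsc.edu/goldenPath/help/examples/"
             "hubExamples/hubAssembly/plantAraTha1/hub.txt")
for name, url in hub_links(PLANT_HUB, "araTha1").items():
    print(name, url)
```

Characters other than ':' and '/' that need escaping in a query string are percent-encoded by urlencode, which matters if a hub URL contains spaces or ampersands.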
The Source Downloads page offers access to utilities with pre-compiled binaries such as gfserver found in a blat/ directory for your machine type here and further blat documentation here, and the gfServer usage statement for further options. Please also know you can set up gfservers on docker and run it locally. Note: You can stop your instance of gfServer with a command. For example: gfServer stop localhost 17860 Troubleshooting BLAT servers You can see this error if you have the translatedBlat / nucleotideBlat port numbers the wrong way around: Expecting 6 words from server got 2 The following is an example of an error message when attempting to run a DNA sequence query via the web-based BLAT tool after loading a hub, after starting a gfServer instance (from the same dir as the 2bit file). For example, a command to start an instance of gfServer: gfServer start localhost 17779 -stepSize=5 contigsRenamed.2bit & Example of a possible error message, from web-based BLAT after attempting a web-based BLAT query: Error in TCP non-blocking connect() 111 - Connection refused Operation now in progress Sorry, the BLAT/iPCR server seems to be down. Please try again later. Check the following: 1.) Process check First, make sure your gfServer instance is running. Type the following command to check for your running gfServer process: ps aux | grep gfServer 2.) Check for correct path/filename In your genomes.txt file, does your twoBitPath/filename match what you specified in your command to start gfServer? In your genomes.txt file, is the location of the instance to your gfServer correct? To check this, you can cd into the directory where you started your gfServer, then type the command: hostname -i Your result should be an IP address, for example, '132.249.245.79'. Now you can test the connection to your port that you specified, with a simple telnet command. Type in the following command: telnet yourIP yourPort. 
For example: telnet 132.249.245.79 17777 The results should read, "Connected to 132.249.245.79". Otherwise, if gfServer isn't running or if you typed the wrong location in your telnet command, telnet will say, "Connection refused." In this example, check your genomes.txt file, and make sure your blat line reads, "blat 132.249.245.79 17777". You may need to change your genomes.txt file from, for example, "blat localhost 17777" to "blat 132.249.245.79 17777" (use your specific IP/host name where gfServer is running). 3.) Check "gfServer status" check To request status from the gfServer process, run: gfServer status yourLocation yourPort. For example: $ gfServer status 132.249.245.79 17777 You should see output like this: version 36x2 type nucleotide host localhost port 17777 tileSize 11 stepSize 5 minMatch 2 pcr requests 0 blat requests 0 bases 0 misses 0 noSig 1 trimmed 0 warnings 0 4.) Testing with gfClient The best troubleshooting test is to take the webpage out of the equation, and use the command line utility, gfClient, to run the query on your instance of gfServer. If you can successfully connect gfClient to gfServer, you will know that your location and port specification are correct. From the directory that holds your hub's .2bit file (should be the same directory where your instance of gfServer was launched), perform a query using gfClient: You can type "gfClient" on your command line to see the usage statement. Use the following command: gfClient yourLocation yourPort pathOf2bitFile yourFastaQuery.fa nameOfOutputFile.psl FYI: For testing with gfClient, you only need the gfServer binary on your server, not blat. For example: gfClient localhost 17777 . query.fa gfOutput.psl Note the "." after the port, to specify that the query will use the .2bit file in the current directory. After running this command, take a look at the gfOutput.psl file. If successful, you will see BLAT results. 
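Before running gfClient queries like the one above, the telnet connectivity test from step 2 can also be scripted. This is a minimal sketch using only the Python standard library; the function name and the timeout value are hypothetical choices. It only confirms that something accepts TCP connections on the port, not that the listener is a gfServer.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_is_open("132.249.245.79", 17777) plays the role of the telnet command in step 2: a True result corresponds to "Connected to", and False corresponds to "Connection refused" or an unreachable host.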
Another example: Note: In the example below, "yourServer.yourInstitution.edu" is the name of their machine where you run the gfServer command. From the test machine: Test the DNA alignment, where test.fa is some sequence to find: gfClient yourServer.yourInstitution.edu 17779 `pwd` test.fa dnaTestOut.psl From the test machine: Test the protein alignment, where proteinSequence.fa is the sequence to find: gfClient -t=dnaX -q=prot yourServer.yourInstitution.edu 17779 `pwd` proteinSequence.fa proteinOut.psl - NOTE: the yourAssembly.2bit file needs to be on this test machine also. - The pwd says to find the yourAssembly.2bit file in this directory. Configuring assembly hubs to use a dynamic gfServer A dynamic BLAT server is specified with the "dynamic" argument to the blat, transBlat, isPcr definitions in the hub genomes.txt file, followed by the gfServer root-relative path of the directory containing the 2bit and gfidx files. For example: blat yourServer.yourInstitution.edu 4096 dynamic yourAssembly transBlat yourServer.yourInstitution.edu 4096 dynamic yourAssembly isPcr yourServer.yourInstitution.edu 4096 dynamic yourAssembly The genome and gfServer indexes would be: $rootdir/yourAssembly/yourAssembly.2bit $rootdir/yourAssembly/yourAssembly.untrans.gfidx $rootdir/yourAssembly/yourAssembly.trans.gfidx See Building gfServer indexes for instructions in building the index. 
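The relationship between a dynamic genomes.txt line and the files gfServer looks for can be sketched as below. This is a minimal illustration with hypothetical function names and a hypothetical root directory (/data/blat standing in for $rootdir). It assumes, following the file listing above, that the file prefix is the last component of the path given after the "dynamic" argument.

```python
from pathlib import PurePosixPath


def parse_blat_line(line):
    """Split a blat/transBlat/isPcr line into (setting, host, port, genome_dir)."""
    parts = line.split()
    setting, host, port = parts[0], parts[1], int(parts[2])
    # genome_dir is only present for dynamic servers.
    genome_dir = parts[4] if len(parts) >= 5 and parts[3] == "dynamic" else None
    return setting, host, port, genome_dir


def dynamic_index_paths(root_dir, genome_dir):
    """Return the 2bit and gfidx paths expected under the gfServer root directory."""
    base = PurePosixPath(root_dir) / genome_dir
    prefix = base.name  # last path component doubles as the file prefix
    return {
        "2bit": str(base / f"{prefix}.2bit"),
        "untrans": str(base / f"{prefix}.untrans.gfidx"),
        "trans": str(base / f"{prefix}.trans.gfidx"),
    }


setting, host, port, genome_dir = parse_blat_line(
    "blat yourServer.yourInstitution.edu 4096 dynamic yourAssembly")
for kind, path in dynamic_index_paths("/data/blat", genome_dir).items():
    print(kind, path)
```

Comparing these expected paths with the files actually present under the server root is a quick way to catch a mismatch between genomes.txt and the index directory layout.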
For large hubs, it is possible to have more deeply nested directories, for instance, following the NCBI convention: blat yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3 transBlat yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3 isPcr yourServer.yourInstitution.edu 4096 dynamic GCF/000/181/335/GCF_000181335.3 Which will reference these genome files and indexes: $rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.2bit $rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.untrans.gfidx $rootdir/GCF/000/181/335/GCF_000181335.3/GCF_000181335.3.trans.gfidx Check gfServer status for dynamic servers A query without specifying a genome is an "I am alive" check: % gfServer status myserver 4040 version 37x1 serverType dynamic Specifying a genome checks that it is valid and gives information on how the index was built: % gfServer -genome=mm10 -genomeDataDir=test/mm10 status myserver 4040 version 37x1 serverType dynamic type nucleotide tileSize 11 stepSize 5 minMatch 2 Using -trans checks the translated index: % gfServer -genome=mm10 -genomeDataDir=test/mm10 -trans status myserver 4040 version 37x1 serverType dynamic type translated tileSize 4 stepSize 4 minMatch 3 /goldenPath/help/customTrack.html:Genome_Browser_Custom_Tracks Displaying Your Own Annotations in the Genome Browser Table of Contents What are custom annotation tracks?
- Building and sharing a custom track Loading a custom track into the Genome Browser Displaying and managing custom tracks - Creating browser lines for annotations - Additional browser line options - Defining track lines for annotations - Required and useful track attribute pairs Sharing your annotation track with others Troubleshooting annotation display problems Custom Track Examples: - Simple annotation file - Two annotations track in one file - BED custom track with multiple blocks - Simple annotation in bigBed format - Create external links using the 'name' field from the BED file - Loading a custom track via the URL - Construct a sharable URL using the bigDataUrl setting /goldenPath/help/query.html:Genome_Browser_Queries Querying the Genome Browser From the Genomes page, you can jump to the default position of an assembly by clicking the "Go" button or you can specify a particular genome position in a variety of formats. These same formats are valid in the search bar above the main Genome Browser track display. In addition to the positional queries described below, any search term can be used to find matches in track data, track names and/or descriptions, help docs, and public hub track names and/or descriptions. See our search page for more details on the search functionality. Valid position queries can include: - Chromosome numbers - Chromosomal coordinate ranges - Gene names - Accession numbers - An mRNA, EST or STS marker - Keywords from the GenBank description of an mRNA - HGVS terms - HGVS and accession searches on outdated RefSeq accession versions is available on hg38 To specify a genome position: 1. Select the desired clade, genome and assembly 2. Enter the desired query in the "Position/Search Term" box (see sample queries below) 3. Click the "Go" button A query may have multiple results. If this is the case, a results page will appear listing each result along with the track it is associated with. 
Once selected, the result will be displayed in the Browser with a highlighted label, making it easier to identify. If you have further questions, you can search the Genome Browser FAQ page and find links to further resources. Also, developers of track hubs can create searchable track hubs using the searchTrix setting. To quickly jump to a codon or exon of a gene transcript: 1. Use one of the searches below to jump to a gene, to show all transcripts of a gene or range of interest 2. Right-click any transcript, select "Choose exon" or "Zoom to codon" and enter the exon or codon position of interest Sample queries Below is a list of examples that might be used to query the Genome Browser. Note that not every query listed here will produce a result in every assembly. The list serves only to illustrate the different types of queries that can be performed. Query Genome Browser Response chr7 Displays all of chromosome 7 chr3:1-1000000 Displays the first million bases of chromosome 3, counting from the p-arm telomere 3:1-1000000 Displays the first million bases of chromosome 3, Ensembl format chromosome names chr3 0 1000000 Displays the first million bases of chromosome 3; BED format NC_000007.14:1-1000000 Displays the first million bases of chromosome 7, RefSeq format CM000665.2:1-1000000 Displays the first million bases of chromosome 3, GenBank/INSDC format chr3:1000000+2000 Displays a region of chromosome 3 that spans 2000 bases, starting with position 1000000 chrUn_GL000213v1 Displays all of the unplaced contig GL000213v1 chr3_GL000221v1_random Displays the unlocalized contig GL000221v1 chr1_KN196472v1_fix Displays all of patch fix KN196472v1 20p13 Displays the region for band p13 on chromosome 20 GTATGTAGCCACGGAGCACCATTACCTGTCACCATTACCTGAATGGCTA Displays the first best match to this DNA sequence, e.g.
chr21:33034835-33034883 for hg19 AA205474 Displays the region containing the EST with GenBank accession AA205474 in the BRCA1 cancer gene on chromosome 17 AC008101 Displays the region containing the clone with GenBank accession AC008101 AF083811 Displays the region containing the mRNA with GenBank accession number AF083811 NM_017414 Displays the region containing RefSeq identifier NM_017414 NP_059110 Displays the region containing protein accession number NP_059110 PRNP Displays the region containing HUGO Gene Nomenclature Committee identifier PRNP Q99697 Displays the region containing the alignment of the UniProt/SwissProt protein sequence with accession Q99697 (PITX2) RH18061;RH80175 15q11;15q13 NM_012090.5;NM_012421.4 Displays the region between genome landmarks, such as the STS markers RH18061 and RH80175, or chromosome bands 15q11 to 15q13, or SNPs NM_000310.4 and NM_012090.5. This syntax may also be used for other range queries, such as between uniquely determined ESTs, mRNAs, refSeqs, SNPS, etc. NR_026861.1:1-1000 Works with any other type of accession from this page: Displays the first 1000bp of NR_026861.1 NM_000310.4(PPT1):c.271_287del17insTT NM_007262.5(PARK7):c.-24+75_-24+92dup NM_006172.4(NPPA):c.456_*1delAA MYH11:c.503-14_503-12del NM_198576.4(AGRN):c.1057C>T NM_198056.3:c.1654G>T NP_002993.1:p.Asp92Glu NP_002993.1:p.D92E BRCA1 Ala744Cys BRCA1 A744C LRG_100t1:c.4G>A LRG_100t1:n.1 LRG_456p1:p.Ser190Leu LRG_321:g.16409_16461del ENST00000002596.6:c.-108-6848A>G ENSP00000005178.5:p.Val20Gly chrX:g.31500000_31600000del NR_111987:n.-1 NM_015102.5:n.3038-2 NM_001372044:c.1528_1530del Displays the region that matches the HGVS expression, usually in the format : If a gene symbol is used, HGVS search will try all RefSeq transcripts to find the nucleotide or amino acid at the position indicated in the expression. If there are multiple matches, a disambiguation page will be shown. 
If the RefSeq sequence differs from the genome sequence, then currently the search will use the genome, not the transcript, for codon counting and amino acid / nucleotide comparison. Please contact us if this is inconvenient. NM_198056.2:c.1A>C An example of an HGVS search on a previous NM version that is now outdated. Support for previous NM accessions is only available on hg38. essv8694097 Displays the region covering the copy number variant with the accession essv8694097 in the Database of Genomic Variants (DGV) nssv3446126 Displays the region covering the copy number variant with the accession nssv3446126 in the cases of developmental delay CTD-3071L10 Displays the region covering the CTD-3071L10 NCBI clone end mapping in the NCBI Clone DB database nssv16167444 Displays the region covering the common copy number genomic variant with the accession nssv16167444 in the nstd186 (NCBI Curated Common Structural Variants) dataset rs1333049 Displays results for annotations matching this rsID, including dbSNP database COSM6161404 Displays the region covering COSM6161404 in the Catalogue Of Somatic Mutations In Cancer (COSMIC) database nssv3395351 Displays the region covering ClinVar Copy Number Variant with the accession nssv3395351 in the ClinVar database BRCT_assoc Displays the region covering the manually-curated Pfam-A domain BRCT_assoc found in GENCODE Genes U133A:219211_at Displays the region containing the consensus and exemplar sequences used for the selection of probes on the Affymetrix HG-U133A chips chr1 0 1000 When entered without ":" and "-", uses 0-based, half-open coordinates (like custom tracks and internal table coordinates), so displays chr1:1-1000 pseudogene mRNA Lists transcribed pseudogenes, but not cDNAs p53 Lists mRNAs related to the p53 tumor suppressor T-cell receptor Lists mRNAs for T-cell receptor genes in GenBank breast cancer Lists mRNAs associated with breast cancer homeobox caudal Lists mRNAs for caudal homeobox genes zinc finger Lists 
zinc finger mRNAs kruppel zinc finger Lists only kruppel-like zinc fingers huntington Lists candidate genes associated with Huntington's disease zahler Lists mRNAs deposited by a scientist named Zahler Evans,J.E. Lists mRNAs deposited by co-author J.E. Evans Use this last format for author queries. Although GenBank requires the search format Evans JE, internally it uses the format Evans,J.E.. /goldenPath/help/bigLolly.html:Genome_Browser_bigLolly_Track_Format bigLolly Track Format The bigLolly format uses a standard bigBed file that is used to generate a lollipop graph where the position of a lollipop circle corresponds to a genomic coordinate. By default, the score is used to decide how high to draw the lollipop, but there are trackDb options to specify which fields to use for the height and width of the lollipop, as well as to draw lines on the graph. The bigLolly trackDb options are lollyNoStems, lollySizeField, lollyMaxSize, lollyField, yAxisLabel, and yAxisNumLabels. These options are also described in the trackDb help doc. This format is useful for displaying small genomic features such as sequence variants, as it provides two ways to characterize features and make them more visible -- stem height and radius -- in addition to color. The lollipop graph type can be used to annotate bases for variants, RNA editing, Selenocysteines, frameshifts, or any other reason. The bigBed files used in bigLolly type are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. The bigLolly file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file".
If you do not have access to a web-accessible server and need hosting space for your bigLolly files, please see the Hosting section of the Track Hub Help documentation. Contents bigLolly format definition Creating a bigLolly track Sharing your data with others Extracting data from the bigLolly format Troubleshooting bigLolly format definition Any bigBed file can be displayed as a bigLolly. See bigBed format. The following autoSql definition is an example on how to specify bigLolly files. This definition, contained in the file bigLolly.as, is pulled in when the bedToBigBed utility is run with the -as=bigLolly.as option. table bigLolly "bigLolly lollipops" ( string chrom; "Reference sequence chromosome or scaffold" uint chromStart; "Start position in chrom" uint chromEnd; "End position in chrom" string name; "dbSNP Reference SNP (rs) identifier or :" uint score; "Score from 0-1000, derived from p-value" char[1] strand; "Unused. Always '.'" uint thickStart; "Start position in chrom" uint thickEnd; "End position in chrom" uint color; "Red (positive effect) or blue (negative). Brightness reflects pvalue" double pValueLog; "-log10 p-value" ) The first 9 fields of this bigLolly format are the same as the first 9 fields of the standard BED format. The pValueLog field provides a numeric field for stem height. Creating a bigLolly track Example #1 In this example, you will create a bigLolly custom track using an existing bigBed file, located on the UCSC Genome Browser http server. By default the score field is used to define the lollipop height. This file contains data for the hg38 assembly. To create a custom track using this bigBed file: 1. Construct a track line that references the file: track type=bigLolly name="bigLolly Example One" description="A bigLolly file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample3.bb visibility=full 2. Paste the track line into the custom track management page for the human assembly hg38 (Dec. 2013). 3. 
Click the "submit" button.
4. Go to chr21:17,030,007-17,055,589 to see the data.

Example #2

In this example, you will create your own bigBed file to display as a bigLolly from a BED file, using an extra field to define the height of the lollipops.

1. Save this BED file to your computer.
2. Save the autoSql file bigLollyExample2.as to your computer.
3. Download the bedToBigBed utility.
4. Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human hg38 assembly.
5. Use the bedToBigBed utility to create a bigBed file from your sorted BED file, using the bigLollyExample2.bed and hg38.chrom.sizes files saved above:
   bedToBigBed -as=bigLollyExample2.as -type=bed9+1 bigLollyExample2.bed hg38.chrom.sizes bigLollyExample2.bb
6. Move the newly created bigBed file (bigLollyExample2.bb) to a web-accessible http, https, or ftp location. At this point you should have a URL to your data, such as "https://institution.edu/bigLollyExample2.bb", and the file should be accessible from outside your institution's or hosting provider's network. For more information on where to host your data, please see the Hosting section of the Track Hub Help documentation.
   track type=bigLolly name="bigLolly Example Two: SNP data" description="A second bigLolly file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigLollyExample2.bb lollyField=pValueLog visibility=full
7. Go to chr21:15,593,670-15,632,442 to see the data.

Example #3

In this example, you will create your own bigBed file to display as a bigLolly from a BED file, with the size of each lollipop defined by an extra field (lollySizeField=lollySize). The numbers in this field act like a radius and define the circle size. To prevent large circles from being clipped, the setting lollyMaxSize=10 ensures that circles of size 10 are fully displayed. Also, to turn off the lollipop stems, the setting lollyNoStems=on is added.
Finally, the settings yAxisLabel.0="0 on 30,30,190 0" and yAxisLabel.1="5 on 30,30,190 5" add labels and lines on the y axis, where 30,30,190 defines the color.

1. Save this BED file to your computer.
2. Save the autoSql file bigLollyExample3.as to your computer.
3. Download the bedToBigBed utility.
4. Save the hg38.chrom.sizes text file to your computer. This file contains the chrom.sizes for the human hg38 assembly.
5. Use the bedToBigBed utility to create a bigBed file from your sorted BED file, using the bigLollyExample3.bed and hg38.chrom.sizes files saved above:
   bedToBigBed -as=bigLollyExample3.as -type=bed9+1 bigLollyExample3.bed hg38.chrom.sizes bigLollyExample3.bb
6. Move the newly created bigBed file (bigLollyExample3.bb) to a web-accessible http, https, or ftp location. At this point you should have a URL to your data, such as "https://institution.edu/bigLollyExample3.bb", and the file should be accessible from outside your institution's or hosting provider's network. For more information on where to host your data, please see the Hosting section of the Track Hub Help documentation.
   track type=bigLolly name="bigLolly Example Three" description="A third bigLolly file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigLollyExample3.bb lollySizeField=lollySize visibility=full yAxisLabel.0="0 on 30,30,190 0" yAxisLabel.1="5 on 30,30,190 5" lollyMaxSize=10 lollyNoStems=on
7. Go to chr21:25,891,755-25,891,870 to see the data.

Sharing your data with others

Custom tracks can also be loaded via one URL line.
This link loads the bigBed file from Example #1 and sets its display parameters in the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?ignoreCookie=1&db=hg38&position=chr21:17,002,145-17,159,243&hgct_customText=track%20type=bigLolly%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample3.bb%20visibility=full

If you would like to share your bigLolly data track with a colleague, learn how to create a URL link to your data by looking at Example #6 on the custom track help page.

Extracting data from the bigLolly format

Because bigLolly files are bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.
- bigBedToBed: converts a bigBed file to ASCII BED format.
- bigBedSummary: extracts summary information from a bigBed file.
- bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.

Troubleshooting

If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program.
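The coordinate check described under Troubleshooting can be sketched in a few lines of Python. This is only an illustration of what "extends past the end of the chromosome" means, not a replacement for bedClip. The function names, the chromosome "chrToy", and its length are all made up for this example, and the sketch assumes whitespace-separated BED rows.

```python
def parse_chrom_sizes(text):
    """Parse chrom.sizes content (one "name size" pair per line) into a dict."""
    sizes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def find_out_of_range_rows(bed_lines, sizes):
    """Return (line_number, line) for rows that do not fit inside their chromosome."""
    bad = []
    for number, line in enumerate(bed_lines, start=1):
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if not fields or fields[0] in ("track", "browser") or fields[0].startswith("#"):
            continue
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        limit = sizes.get(chrom)
        # A row is flagged if its chromosome is unknown or its coordinates are out of range.
        if limit is None or start < 0 or start > end or end > limit:
            bad.append((number, line))
    return bad


# Toy data: "chrToy" and its length of 1000 are hypothetical values for this illustration.
sizes = parse_chrom_sizes("chrToy\t1000\n")
rows = [
    "chrToy\t0\t100\tinside",
    "chrToy\t990\t1010\tpastEnd",
]
for number, line in find_out_of_range_rows(rows, sizes):
    print(f"line {number}: {line}")
```

Running a check like this before bedToBigBed shows which rows would need to be removed or clipped.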
/goldenPath/help/hgTablesHelp.html:Table_Browser_Help

Table Browser User's Guide

Contents
Introduction
About the Table Browser databases and tables
  - Position-oriented tables
  - Non-positional tables
Getting started - simple queries
  - Simple position-based query
  - Batch query using identifiers
  - Batch query using positions
  - Query to get gene symbols
Filtering output by constraining field values
  - Filtering on fields from a single table
  - Filtering on fields from multiple tables
  - Filter constraints
Intersecting data from multiple tables
  - Intersecting data from two tables
  - Intersecting data from multiple tables
  - Intersection options
Correlating data from two tables
Output formats
  - Displaying all fields in a table
  - Displaying selected fields from one or more tables
  - Displaying sequence (FASTA) data
  - Displaying CDS FASTA alignments
  - Saving query results in GTF or BED format
  - Saving data to a file
  - Saving data as a custom track
  - Displaying query results as Genome Browser hyperlinks
  - Displaying a statistical summary of query data
Video examples of Table Browser queries
  - Find list of genes in a region
  - Obtaining coordinates and sequences of gene exons
  - Find SNPs in a gene
  - Find SNPs upstream of Genes

Questions and feedback are welcome.

Introduction

The Table Browser provides a powerful and flexible graphical interface for querying and manipulating the Genome Browser annotation tables. Because the Table Browser uses the same database as the Genome Browser, the two views are always consistent.
Using the Table Browser, you can:
- retrieve the DNA sequence data or annotation data underlying Genome Browser tracks for the entire genome, a specified coordinate range, or a set of accessions
- apply a filter to set constraints on field values included in the output
- generate a custom track and automatically add it to your session so that it can be graphically displayed in the Genome Browser
- conduct both structured and free-form SQL queries on the data
- combine queries on multiple tables or custom tracks through an intersection or union and generate a single set of output data
- display basic statistics calculated over a selected data set
- display the schema for a table and list all other tables in the database connected to the table
- organize the output data into several different formats for use in other applications, spreadsheets, or databases

This User's Guide is aimed at both the novice Table Browser user and the advanced user. If you are new to the Table Browser, read the Getting started section to learn about browser basics and try some simple queries. Advanced users may want to proceed directly to the section that addresses a particular area of functionality in detail.

Although the Table Browser provides sufficient flexibility to satisfy the needs of most users, some advanced users may require the ability to run SQL commands directly on the Genome Browser database. UCSC provides two public MariaDB servers: (1) genome-mysql.soe.ucsc.edu (US West Coast), (2) genome-euro-mysql.soe.ucsc.edu (Europe). More information can be found on our MariaDB Access page. Alternatively, the database may be downloaded to a local computer for MariaDB access. See the mirror site documentation for information on setting up a local copy of the database.

About the Table Browser databases and tables

The Table Browser is built on top of the Genome Browser database, which actually consists of several separate databases, one for each genome assembly.
Tables within the databases may be differentiated by whether the data are based on genome start-stop coordinates (positional tables) or are independent of position (non-positional tables). Some output formats and query options are applicable only to positional tables, hence the distinction.

Positional tables

Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments, and other annotations. Each of the annotation tracks displayed in the Genome Browser is based on a positional table. In some instances, data from other positional and non-positional tables may also be incorporated into the track. Data associated with custom annotation tracks active within the user's Table Browser session are also available as positional tables.

Positional tables can be further subdivided into several categories based on the type of data they describe. Alignment data are best described by using a block structure to represent each element. Other tables require only start and end coordinate data for each element. Some tables specify a translation start and end in addition to the transcription start and end. Some tables contain strand information; others don't. Most tables, but not all, specify a name for each element. Based on the format of the data described by a table, different query and output formatting options may be offered.

Non-positional tables

Non-positional tables contain data not tied to genomic location, for example a table that correlates a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA IDs to extended information such as author, tissue, or keyword. Some "meta" tables in this category contain information about the structure of the database itself or describe external files containing sequence data.
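The distinction between the two kinds of tables can be illustrated with a small in-memory database. The sketch below uses Python's built-in sqlite3 module rather than the UCSC MariaDB servers, and every table name, column name, and identifier in it (toyGenes, toyXref, toyGene1, NM_TOY0001, and so on) is hypothetical. It only shows the general idea: a region constraint can be applied to the positional table alone, while the non-positional table is reached through a shared identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Positional table: every record carries genomic coordinates (hypothetical schema).
CREATE TABLE toyGenes (name TEXT, chrom TEXT, txStart INTEGER, txEnd INTEGER);
-- Non-positional table: maps an identifier to an accession, with no coordinates.
CREATE TABLE toyXref (name TEXT, refseq TEXT);
INSERT INTO toyGenes VALUES ('toyGene1', 'chrToy', 100, 500),
                            ('toyGene2', 'chrToy', 2000, 2600);
INSERT INTO toyXref VALUES ('toyGene1', 'NM_TOY0001'),
                           ('toyGene2', 'NM_TOY0002');
""")


def genes_in_region(conn, chrom, start, end):
    """Return (name, refseq) for genes overlapping the half-open range [start, end)."""
    cursor = conn.execute(
        "SELECT g.name, x.refseq "
        "FROM toyGenes g JOIN toyXref x ON x.name = g.name "
        "WHERE g.chrom = ? AND g.txStart < ? AND g.txEnd > ? "
        "ORDER BY g.txStart",
        (chrom, end, start),
    )
    return cursor.fetchall()


print(genes_in_region(conn, "chrToy", 0, 1000))
```

The region filter touches only the coordinate columns of the positional table; the accession comes from the join on the shared name field.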
Getting started - simple queries

In its most basic form, the Table Browser can be used to retrieve a specific subset of records from a track or positional table in a selected genome assembly. The query may be based on a specific position or a set of one or more identifiers. This section describes the steps required to conduct basic data queries using the Table Browser. Once you have mastered the basic Table Browser functionality, refer to subsequent sections for information about generating more complex queries that use filters, intersections, and alternative data output formats.

Simple position-based query

Follow these steps to display a list of records that lie within a specific position in a table:

Step 1. Pick a genome assembly
Specify the genome assembly from which you'd like to retrieve the data by choosing the appropriate organism in the genome list, then selecting the assembly version from the assembly list. Note that the assembly list refreshes each time a different option is selected in the genome list. Assemblies are typically named after the first three characters of an organism's genus and species names.

Step 2. Pick an annotation track
The group list shows all the annotation track groups available in the selected genome assembly. The names correspond to the groupings displayed at the bottom of the Genome Browser annotation tracks page. When a group is selected from the list, the track list automatically updates to show all the annotation tracks available within that group.
- If you already know the name of the annotation track in which you're interested, select the All Tracks option in the group list, then select the track from the track list. Similarly, you can directly select a table by choosing the All Tables option in the group list, selecting a database from the database list, then selecting the table from the table list.
- To examine all the tracks available within a certain group (e.g., all gene prediction tracks), select the group name from the group list, then browse the entries in the track list.
- Custom annotation tracks created during the current session are listed under the Custom Tracks group.
- If no selections are made from the group or track lists, the track selection defaults to the Known Genes track in the Genes and Gene Prediction Tracks group.

Step 3. Pick a table
The table list shows all tables (both positional and non-positional) associated with the currently-selected track. By default, it displays the primary table for the track, i.e. the table containing the data shown in the Genome Browser annotation track. Other tables in the list are linked to the primary table by a common field and may provide supporting data used in constructing the annotation.
- If the group list is set to the All Tables option, the table list will show all tables present in the database currently selected in the database list, rather than those associated with a particular track.

Step 4. Pick a genomic region (positional tables only)
By default, the Table Browser region is set to genome, which will display all the data records in the selected table.
- To restrict the data to a specific position range, type the position into the position box. Some examples of specific positions include a chromosome name (chrX), a coordinate range within a chromosome (chrX:100000-400000), or a scaffold name.
- You can select multiple genomic regions by clicking the "define regions" button and entering up to 1,000 regions in a 3- or 4-field BED file format.
- To look up the position range of a genomic element -- such as a gene name, an accession ID, an STS marker, etc. -- or keywords from the GenBank description of an mRNA, type the string into the position box, then click the Lookup button.
- The data in non-positional tables are not tied to genomic coordinates; therefore, the region option is unavailable when a non-positional table is selected. A basic query on a non-positional table will show all the data in the table.

Step 5. Display the output
Click the Get Output button to display the results of the query. By default, the Table Browser outputs the data from all fields in the selected table as tab-separated text on the screen. See the Output formats section for information on configuring the query output.

Example: Here is an example of a simple query that retrieves all the RefSeq Genes records in the position range chr7:26906938-26940301 on the May 2004 human genome assembly.
1. Select the Human option in the genome list.
2. Select the May 2004 option in the assembly list.
3. Select the Genes and Gene Prediction Tracks option in the group list.
4. Select the RefSeq Genes option in the track list.
5. Type chr7:26906938-26940301 in the position box (the Table Browser will automatically select the position option button).
6. Click the Get Output button.
The Table Browser will display the records for the RefSeq accessions NM_005522, NM_153620, NM_006735, NM_153632, NM_030661, and NM_153631.

Batch query using identifiers

In many cases, you may want to retrieve data based on a list of one or more accessions, IDs, or names, rather than querying by genomic position. Many tracks in the Table Browser, such as those in the Genes and Gene Prediction or Variation track groups, support identifier queries. The identifier type used in the query must match the kind of identifiers present in the track data, e.g., mRNA accession IDs must be used to query the mRNA table and rsIDs must match those in the dbSNP table. Follow these steps to display a list of records that correspond to a set of accessions or names entered as query input.

Step 1. Pick the genome assembly, track, and table

Step 2. Select the genome region setting

Step 3.
Load the identifiers into the browser
Click the Paste List button to type or paste in the identifiers, or the Upload List button to load the data from a file on your local computer.
- If you are loading multiple identifiers, entries must be separated by a space, tab, or line.
- Wildcards may not be used in the list (see the Filter section for information about conducting queries that include wildcards).
- The Table Browser will retain the identifier list until you delete the information by clicking the Clear List button.

Step 4. Click the Get Output button
See the Output formats section for information about configuring the query output.

Batch query from positions

If you have a list of genomic positions and want to retrieve information about their properties, you can use the Define Regions button to input multiple positions to query a chosen table. Please note that any items in the table that overlap with the defined regions will be included in the Table Browser output. In this example, you want to determine the dbSNP rsID names for your list of positions.

Step 1. Select genome assembly and track
To determine dbSNP rsIDs, this example uses the human genome assembly hg38 and dbSNP153.

Step 2. Select the define regions button and enter regions
You can find the define regions button under the Define region of interest section. Upload, type, or paste in your regions of interest, making sure they use the intended 0-based or 1-based notation. Regions are accepted only in BED or position format.

Step 3. Select output format and get output
If you want all data from a table, you need not change the output format from the default. If you want only particular columns from the table, change it to selected fields from primary and related tables. When you click the get output button, you will be taken to a column selection page or, if you did not change the output format, directly to your output data.
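When a region list starts out as position strings (for example chr21:17,030,007-17,055,589), it can help to convert them to BED lines before pasting them into the define regions box. The Python sketch below does this under the usual convention that position strings are 1-based with an inclusive end, while BED uses a 0-based start and an exclusive end. The function names are made up for this illustration, and the 1,000-region limit is the one quoted earlier in this guide.

```python
import re

# Matches strings such as "chr21:17,030,007-17,055,589".
_POSITION = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")
MAX_REGIONS = 1000  # limit quoted in this guide for the "define regions" box


def position_to_bed(position):
    """Convert a 1-based position string to a 3-field BED line (0-based start)."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {position!r}")
    # Only the start shifts by one; the inclusive 1-based end equals the exclusive BED end.
    return f"{chrom}\t{start - 1}\t{end}"


def positions_to_bed(positions):
    """Convert a list of position strings to BED text, one region per line."""
    if len(positions) > MAX_REGIONS:
        raise ValueError(f"more than {MAX_REGIONS} regions")
    return "\n".join(position_to_bed(p) for p in positions)


print(positions_to_bed(["chr21:17,030,007-17,055,589", "chrX:100000-400000"]))
```

Keeping the conversion in one place avoids off-by-one mistakes when the same regions are later compared against BED output from the Table Browser.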
Get gene symbols in a query

Follow the example below to obtain gene symbols in your query:
1. Select the clade, genome, assembly, group, table, and region as desired.
2. Change the output format to selected fields from primary and related tables.
3. Click get output to go to the next step of selecting fields from related tables.
4. Select the fields you would like from your primary table.
5. On the same Select Fields form, find the related kgXref table. For example, look for the hg38.kgXref table, and then check the checkbox next to Gene Symbol to add gene symbols to your query results.
6. Click get output again to get the final query output.

Filtering output by constraining field values

The Table Browser filter option can be used to:
- apply constraints on table field values to restrict which records should appear in the query output
- conduct batch queries using wildcards
- include fields from multiple tables in the query output

Filtering on fields from a single table

Follow these steps to create a filter on one or more fields in a single table:

Step 1. Select the assembly, track, and region

Step 2. Click the Create button on the filter line

Step 3. Add the filter constraints
One or more of the fields in the currently selected table may be filtered by typing constraints into the corresponding text boxes.
- By default, the initial values set up in the filter match all records in the table.
- Constraints must match the data type of the field to be applied successfully. For example, the geneName field in the hg17 refFlat table is a string; therefore, constraining values must also be strings. See the Filter constraints section for more information on valid filter values.
- Multiple filter values may be applied against one field by separating the values with spaces.
- Individual field constraints are combined with AND, i.e. a record must meet the constraints on all fields to be retrieved.

Step 4.
Click the Submit button to apply the filter
Once a filter has been created on a table, it will persist for the duration of the Table Browser session or until it has been cleared. Only one filter can exist for a table at a time, but multiple filters may exist in one session if they are applied on different tables. To modify an existing filter, click the Edit button on the filter line. To remove a filter, click the Clear button.

Filtering on fields from multiple tables

A Table Browser filter may include constraints on fields from tables related to the primary table. To create a filter composed of fields from multiple tables:

Step 1. Select the assembly, track, and region

Step 2. Click the Create button on the filter line
Note: If a filter already exists on the table, click the Edit button to modify it or the Clear button to remove it.

Step 3. Select the tables to include in the filter
Scroll down to the Linked Tables section of the page. The tables listed in this section are linked to the selected table by one or more common fields (typically a name, accession, or ID field). Click the boxes in front of the table(s) whose fields you wish to include in the filter, then click the Allow Filtering Using Field in Checked Tables button. The fields of the selected tables will be displayed in the top portion of the page.

Step 4. Add the filter constraints

Step 5. Click the Submit button to apply the filter
Note: In the current implementation of the Table Browser, the selected fields from primary and related tables output format option must be used when including fields from multiple tables in a filter. Check the boxes for all tables in the Linked Tables list on which filter constraints have been applied, then click the Allow Selection From Checked Tables button to include them in the output.

Filter constraints

Strings
Text fields are compared to words or patterns containing wildcard characters. Valid wildcards are "*" (matches 0 or more characters) and "?"
(matches a single character). Each space-separated word or pattern in a text field box is matched against the value of that field in each record. If any word or pattern matches the value, then the record meets the constraint on that field.

Numbers
Numeric fields are compared to table data using an operator such as <, >, or != (not equals) followed by a number. To specify a range, enter two numbers (start and end) separated by white space and/or a comma.

Free-form queries
When the filters on individual fields aren't sufficiently flexible, the free-form query text box allows the application of more complex constraints that typically relate two or more field names of the selected table. Valid free-form queries use the syntax of the SQL where clause (using wildcards as defined above). Free-form queries combine simple constraints with AND, OR, and NOT, using parentheses as needed for clarity. A simple constraint consists of a table field name, a comparison operator (see below), and a value: a number, string, wildcard value (see below), or another field name. In place of a field name, you may use an arithmetic expression of numeric field names.
- String or wildcard values for text comparisons must be quoted. Single or double quotes may be used. If comparing to a literal string value, use the "=" or "!=" operator. If comparing to a wildcard value, use the "LIKE" or "NOT LIKE" operator.
- Numeric comparison operators include <, <=, =, != (not equals), >=, and >.
- Arithmetic operators include +, -, *, and /.
- Other SQL comparison keywords may also be used.

Example: The following examples show free-form queries applied to the human refGene table.
- txStart = cdsStart - searches for gene models missing expected 5' UTR upstream sequence (if strand is "+"; 3' UTR downstream if strand is "-")
- chrom NOT LIKE "chr??"
- restricts search to chromosomes 1 - 9, X and Y
- (cdsEnd - cdsStart) > 10000 - selects genes with coding regions spanning more than 10 kbp
- (txStart != cdsStart) AND (txEnd != cdsEnd) AND exonCount = 1 - finds single exon genes with both 3' and 5' flanking UTR
- ((cdsEnd - cdsStart) > 30000) AND (exonCount=2 OR exonCount=3) - finds genes with long spans but only 2 - 3 exons

Intersecting data from multiple tables

It is often interesting to compare the positions of features in different annotation tracks to identify points of overlap. The Table Browser intersection utility can be used to generate various position-based comparisons of track features. Using the intersection utility, you can:
- examine all genomic positions where the feature data from the two tracks overlap
- identify genomic locations where there is no overlap between track features
- establish thresholds for the amount of overlap that must exist between the two feature sets
- conduct feature-by-feature comparisons as well as base-by-base comparisons of tracks
- complement (invert) a position set before comparing the tracks

An intersection may be expanded to include additional tables by using the Table Browser custom track feature.

Note: The intersection utility can be used only on positional tables. To generate intersections incorporating data in non-positional tables, use the Table Browser filter utility. See the Filtering on fields from multiple tables section for more information.

Intersecting data from two tables

Follow these steps to configure and generate an intersection between two positional tables:

Step 1. Select the assembly, track, table, and region for the primary table
Note: Only positional tables may be used in an intersection.

Step 2. Click the Create button on the intersection line
Note: If an intersection already exists on the table, click the Edit button to modify it or the Clear button to remove it.

Step 3.
Select the secondary track to include in the intersection
Select a group in the group list, then select a track from the track list. To view all the tracks available, regardless of group, select the All Tracks option in the group list.

Step 4. Select a combination method
The Table Browser provides two major types of comparisons:
- feature-by-feature comparisons preserve the structure of the primary table. For example, if the primary table describes exon structure and the features are compared with a second table, the results will describe exon structure (unless you choose an output format in which the structure is lost).
- base-by-base comparisons examine the primary table and the table underlying the secondary track one base at a time. The structure of the primary table is not preserved in this comparison. For example, even if the primary table describes exon structure, the intersection results will contain only position ranges; no information about exon/block structure, strand, or translation region will be retained.
Click the circle in front of a combination method to select it. Only one method may be selected from the two sets of methods. For more information about the individual combination options, see the Intersection Options section.

Step 5. (optional) Select the complement options
Check the box in front of one or both tables to complement the feature data. The complement options allow you to invert the set of positions covered by one or both tables. For example, if you choose to complement the primary track, any position covered by that track's features will be considered not covered, and vice versa. This option provides more flexibility in comparing track positions.

Step 6. Click the Submit button to apply the intersection
Once an intersection has been created on a table, it will persist for the duration of the Table Browser session or until it has been cleared. Only one intersection may exist at a time.
To modify an existing intersection, click the Edit button on the intersection line. To remove an intersection, click the Clear button.

Intersecting data from more than two tables

The Table Browser intersection utility limits combinations to only two tables. An existing intersection may be expanded to include additional tables by using the Table Browser custom track utility. To create an intersection on multiple tables:

Step 1. Set up an intersection between two tables
See the Intersecting data from two tables section for more information.

Step 2. Save the intersection data in a custom track
See the Saving data as a custom track section for information on generating a custom track.
Note: In the current implementation of the Table Browser, you must use the Get Custom Track button on the custom track page to add the custom track to the Table Browser track list.

Step 3. Select the newly-generated custom track
Select the Custom Tracks option in the group list, then select the newly created custom track from the track list.

Step 4. Create an intersection with another track
Follow the steps in the Intersecting data from two tables section to intersect the custom track with another track.

Intersection options

Feature-by-feature comparisons
Some comparisons preserve the primary table's gene and alignment structure, if it exists. For example, if the refGene table (human RefSeq Genes track) is combined with another table using one of these comparisons, the resulting output data will describe exon structure (unless you choose an output format in which the structure is lost). Primary table features are kept or discarded based on the amount of positional overlap with the features in the table underlying the secondary track. The Table Browser offers the following options in this category:
- Any overlap: A primary table record will appear in the output if any of its base positions are covered by any feature in the secondary table.
- No overlap: A primary table record will appear in the output only if none of its base positions are covered by any feature in the secondary table.
- Overlap greater than a specified threshold: A primary table record will appear in the output if the percentage of its base positions covered by secondary table features is greater than the user-specified threshold.
- Overlap less than a specified threshold: A primary table record will appear in the output if the percentage of its base positions covered by secondary table features is less than the user-specified threshold.
Note: If the primary table has an exon/block structure, only those bases located in exons and/or blocks will be counted.

Base-by-base comparisons
In these combination options, the positions of the primary and secondary table features are compared one base position at a time. When applying base-by-base comparisons, the structure of the primary table is not preserved. For example, if the refGene table (from the human RefSeq Genes track) is compared with a secondary table using these comparisons, the resulting output data will not describe exon structure. Instead, only position ranges will be returned; the exon/block structure, strand, and translation region information will be discarded. The Table Browser provides the following base-by-base combination options:
- Base-by-base intersection (AND): A nucleotide position is included in the output if it is covered by at least one feature of both the primary table and the secondary table.
- Base-by-base union (OR): A nucleotide position is included in the output if it is covered by at least one feature of either the primary table or the secondary table.
Note: If the primary table has an exon/block structure, only base positions located in exons and/or blocks will be counted.
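The options above can be made concrete with a short Python sketch. It is not the Table Browser's implementation, only an illustration of the definitions: intervals are assumed to be half-open (start, end) pairs on a single chromosome, exon/block structure is ignored, and the function names are invented for this example.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(pair) for pair in merged]


def base_union(primary, secondary):
    """Bases covered by at least one feature of either table (OR)."""
    return merge(list(primary) + list(secondary))


def base_intersection(primary, secondary):
    """Bases covered by at least one feature of both tables (AND)."""
    a, b = merge(primary), merge(secondary)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            result.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def percent_covered(feature, secondary):
    """Percentage of one primary feature's bases covered by secondary features."""
    start, end = feature
    covered = sum(e - s for s, e in base_intersection([feature], secondary))
    return 100.0 * covered / (end - start)


primary = [(10, 50), (80, 120)]
secondary = [(40, 90)]
print(base_intersection(primary, secondary))
print(base_union(primary, secondary))
print(percent_covered((10, 50), secondary))
```

A feature-by-feature threshold option then amounts to keeping each primary feature whose percent_covered value is above (or below) the chosen threshold, while "any overlap" and "no overlap" correspond to a value greater than zero or equal to zero.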
Base-by-base complement (NOT)
Before the Table Browser applies a feature-by-feature or base-by-base comparison to the table data, the set of positions covered by one or both tables can be inverted (complemented). When the data set of a table is complemented, any position covered by the table's features in the original data will be considered not covered in the inverted data, and vice versa. This option gives the user more flexibility in comparing table positions.

Correlating data from two tables

The Table Browser Correlation function creates a scatter plot of the data points of two tables and provides individual histograms of the data points from each table. It also shows a plot of Residuals vs. Fitted, which can be used to detect non-linearity, unequal error variances, and outliers. The correlation function uses Pearson's correlation, which is optimized to work with continuous data such as wiggle tracks. For tracks that do not have data values, such as gene-structured tracks, the data value used in the calculation is 1.0 for bases covered by exons and 0.0 at all other positions in the region. Due to memory and processing limitations, the number of data points that can be plotted is limited to 300,000,000. The "Window data to" function allows you to smooth out your plot by taking the average of the number of data points specified (defaults to 1). The total number of bases analyzed is independent of the data window. There is currently no way to output the results of the Correlation function.

Output formats

The data resulting from a Table Browser query may be configured in a number of different ways:
- The output can be displayed on the screen, saved to a file, or saved to an annotation track table that can be displayed in the Genome Browser or used in a subsequent Table Browser query.
- The data can include all fields from the primary or selected table, or can be restricted to selected fields from the primary table and related tables.
- The data can be organized in one of several formats: tab-separated, sequence (FASTA), Browser Extensible Data format (BED), Gene Transfer Format (GTF), or a statistical summary of the data in the query. The output options available for a specific query may vary depending on the table(s) selected. For example, non-positional table data cannot be organized in a position-based format, but instead may be displayed only in tab-separated format. The Table Browser will automatically update the options on the output format list to show only those available for the current query. Displaying all fields in a table To display all the fields of the records in the query output in tab-separated format, select the all fields from primary table option. Displaying selected fields from one or more tables To restrict the query output to a subset of the fields in a table, choose the selected fields from primary and related tables option. You will be prompted to pick the table fields to display. Click the box in front of the fields you would like to see in the query output (or click the Check All button to select all the fields), then click the Get Fields button. To include data fields from other tables linked to the selected table, choose the selected fields from primary and related tables option, then scroll down to the Linked Tables section of the page. The tables listed in this section are linked to the selected table by one or more common fields (typically a name, accession, or ID field). Click the boxes in front of the table(s) whose fields you wish to include in the query output, then click the Allow Selection From Checked Tables button. The fields of the selected tables will be displayed in the top portion of the page. Click the boxes in front of the fields that you wish to include in the query output, then click the Get Fields button underneath any of the field lists to generate tab-separated output that includes data from all the selected fields.
Note that the Get Fields and Cancel buttons apply globally to all the selected tables, but the Check All and Clear All buttons apply only to the fields listed directly above the buttons. Displaying sequence (FASTA) data (positional tables only) To display the genomic sequence underlying the query results, select the sequence option in the output format list. The Table Browser will present you with several options to configure the output display. When you have completed the configuration, click the Get Sequence button. When displaying sequence data for gene prediction tracks, you will also be offered the option to view the protein and mRNA sequence as extracted from the data source in addition to the genomic sequence. Displaying CDS FASTA alignments (genePred tables only) The CDS FASTA alignments are created from a Multiple Alignment File (MAF) in combination with a genePred table. The UCSC MAF format stores multiple alignments at the DNA level between entire genomes. You can use the Table Browser to return FASTA alignments of coding regions in nucleotide-space or translated into amino acid-space. However, it is worth noting that the initial MAF files are all created by aligning genomes at the DNA level. Genome-wide CDS FASTA alignments Note that when using the Table Browser to fetch CDS FASTA output, it is best to restrict your query to a reasonable-sized position range rather than requesting output from the entire genome. A genome-wide query will take a substantial amount of compute time, and it is likely that your Internet browser will time out and disconnect. If you would like to download genome-wide CDS FASTA output for any of several model organisms, you can do so from the download server. Creating CDS FASTA alignments using the Table Browser To display FASTA multiple alignments for the CDS regions of genes, select the CDS FASTA alignment from multiple alignment option in the output format list. 
In order to see this output format option, you must have a genePred table selected. If you limit your search to a certain position range within the genome (rather than searching the entire genome), the tool will return FASTA alignments for all genes that overlap the position for which you are searching. The Table Browser will present you with a configuration page. On this page, you can select options for your output. First, select your MAF table. This is the table from which the multiple alignments will be extracted for the CDS regions of your gene track. If you do not know the name of the MAF table that corresponds to the Conservation track, you can find it in the Genome Browser by following these instructions. Then select any of the following choices: - Separate into exons - The default behavior is for the coding exons of each gene to be concatenated into one sequence in the output FASTA multiple alignment. In this case each output row header has the format listed below under "Whole gene format". If the separate into exons option is chosen then each exon will be listed with a separate header in the format listed below under "Exon format". - Show nucleotides - The default behavior is for the nucleotides in the alignment to be translated into amino acids according to the strand and exon frames defined in the selected genePred table. If this option is chosen, then the nucleotides in the alignment will not be translated into amino acids. - Output lines with just dashes - The default behavior is for the alignment rows that contain only dashes to not be printed. If this option is chosen, then these dashes-only rows are printed. - Format output as table - If this option is chosen, the header and sequence for each organism will appear on the same line. - Truncate headers as __ characters (enter zero for no headers) - This option works in conjunction with the "Format output as table" option. 
If you want to see only a portion of the headers, choose this option, and enter the number of characters at which you would like the headers truncated. Finally, from the list of species, select those that you would like included in the FASTA multiple alignment output. Press the "get output" button to view the output. Explanation of CDS FASTA header format Whole gene format: geneName_assemblyName peptideLength location Exon format: geneName_assemblyName_exonNum_totalExons exonLength inFrame outFrame location Here are the descriptions for each field name: - geneName- the name field from the genePred table. - assemblyName- the UCSC assembly name for the species. - peptideLength- the length of the entire coding region. If the "Show nucleotides" option is chosen, this will be in nucleotides, otherwise it will be the number of amino acids in the peptide. - location- this is the chromosome position within the assembly that is aligned in the multiple alignment. The format of this string is chrom:start-end followed by the strand where the alignment occurs. If more than one region is aligned then all the regions are listed with a semi-colon (;) between each position. This address is in genome browser coordinates (i.e. the start address is one-based). - exonNum- the ordinal of the exon. Exons are counted starting at one and begin at the transcription start site and progress along the strand of transcription. - totalExons- the number of coding exons in the gene. - exonLength- the length of the current exon. If the "Show nucleotides" option is chosen, this will be the number of nucleotides in the exon, otherwise it will be the number of amino acids in the exon (with amino acids translated from split codons placed in the exon where two of the three nucleotides lie). - inFrame- the frame number of the first nucleotide in the exon. Frame numbers can be 0, 1, or 2 depending on what position that nucleotide takes in the codon which contains it. 
- outFrame- the frame number of the nucleotide after the last nucleotide in this exon. Frame numbers can be 0, 1, or 2 depending on what position that nucleotide takes in the codon which contains it. Explanation of CDS FASTA sequence format As noted above, the CDS FASTA output files can be in either DNA-space or protein-space. In some instances, there is a dash ("–") in the sequence portion of the CDS FASTA file. Dashes are used in several circumstances. They indicate missing sequence for the aligning genome, as well as deletions in the aligning genome or insertions in the base genome. Because the CDS FASTA alignments are based on one reference genome, any amino acids or nucleotides that are not in the reference genome are not displayed. Consequently the peptides shown for aligning genomes are not necessarily the peptide that the gene of the other organism would generate. Any sequence inserted in an aligning genome or deleted in the base genome will not be present in the alignment. We represent this condition with an orange bar in the Genome Browser display, but the CDS FASTA alignments silently ignore this issue. Nucleotide CDS FASTA sequence: Consider the example below that shows the FASTA sequence for four species aligned with the first exon of the human gene PLEKHO1 (UCSC Gene: uc001ett.1). Note that the rat (rn4) row is missing the first three nucleotides. This could be due to a lineage-specific insertion between the rat and human genomes, or a lineage-specific deletion between the human and rat genomes. Note also that the Zebrafish (danRer4) row contains only dashes. This could be due to excessive evolutionary distance between the zebrafish and human, missing data in the zebrafish, or independent indels in the region in both species. Sometimes it is helpful to view the Conservation track in the Genome Browser in this area to clarify the exact meaning of the dashes. 
>uc001ett.1_hg18_1_6 30 0 0 chr1:148389072-148389101+
ATGATGAAGAAGAACAAcode
>uc001ett.1_panTro2_1_6 30 0 0 chr1:129156502-129156531+
ATGATGAAGAAGAACAAcode
>uc001ett.1_rn4_1_6 30 0 0 chr2:190795892-190795918-
---ATGAAGAAGAGCGGCTCCGGCAAGCGG
>uc001ett.1_danRer4_1_6 30 0 0
------------------------------
>uc001ett.1_oryLat2_1_6 30 0 0 chr11:3404940-3404969-
AGGATGAAGAAAAGCAACCAGAGCAGGCGG
Amino Acid CDS FASTA sequence:
- Codons that have a dash in any of the three nucleotides are represented by a dash in the amino acid.
- Codons with an N in any position are represented with an X.
- Stop codons are represented with a Z.
- All other amino acids follow the IUPAC amino acid codes.
- In exon format, when the codon triplet is split between two exons, the amino acid will be displayed as part of the exon containing two of the three nucleotides.
Saving query results in GTF or BED format (positional tables only) To format the query results using GTF or BED conventions, select the corresponding option in the output format list. Note that when you select GTF, the Table Browser translates the output into this format. For tables that lack feature designations, all records are arbitrarily assigned the feature "exon" to conform to GTF specifications. If you select BED format, you will be presented with the option to include and configure a custom track header and options for organizing the data. When you have finished the configuration -- or to accept the default options -- click the Get BED button at the bottom of the window. To understand the name column in the BED format, see this FAQ.
Saving data to a file By default, the Table Browser displays query results directly in your internet browser window. To redirect the data to a file, type a file name into the output file box before starting the query. The Table Browser will prompt you for the location of this file on your local disk while processing the query.
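Returning briefly to the amino acid conventions listed above for CDS FASTA output (dash for codons containing a dash, X for codons containing an N, Z for stop codons), the following Python sketch shows how those rules can be applied to an aligned nucleotide row. It is not the Table Browser's code: the function names are hypothetical, the standard genetic code is assumed, codons with other ambiguity codes are mapped to X as an additional assumption, and codons split across exons are not handled.

```python
# Standard genetic code, indexed by the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_codon(codon):
    """Translate one aligned codon using the CDS FASTA conventions."""
    codon = codon.upper()
    if "-" in codon:
        return "-"  # any dash in the codon gives a dash
    if "N" in codon:
        return "X"  # any N in the codon gives X
    # Assumption: codons with other non-ACGT characters are also shown as X.
    amino = CODON_TABLE.get(codon, "X")
    return "Z" if amino == "*" else amino  # stop codons are shown as Z


def translate(row):
    """Translate an aligned nucleotide row, ignoring any trailing partial codon."""
    usable = len(row) - len(row) % 3
    return "".join(translate_codon(row[i:i + 3]) for i in range(0, usable, 3))
```

Applied to the rat (rn4) row in the nucleotide example, `translate` turns the three leading dashes into a single leading dash in the peptide, which is how missing sequence carries over from DNA-space to protein-space.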
Saving data as a custom track (positional tables only) Query output may be saved in a format that can be displayed as a custom annotation track in the Genome Browser. Custom tracks created during a Table Browser session may also be used for subsequent queries and intersections in the same session. For more information on custom tracks, see the Genome Browser User's Guide. To save query data in custom track format, select the custom track option in the output format list. When the query is executed, the Table Browser will prompt you to customize the track header and configure the record layout of the data. The configuration is optional; the Table Browser automatically sets up a default track configuration. Click the Custom track link for more information on custom track syntax and format. When you have finished configuring the custom track -- or to accept the default configuration -- click one of the buttons at the bottom of the window to create the custom annotation track. - To display the query results as text on the screen, click the Get Custom Track File button. - To save the query results to a file on your local disk for future use, specify a file name in the output file box before executing the query, then click the Get Custom Track File button. - To load the query results into a table accessible from the Table Browser table list, click the Get Custom Track in Table Browser button. - To view the query results as a custom track in the Genome Browser, click the Get Custom Track in Genome Browser button. Your browser display will be redirected automatically to the Genome Browser, with your custom track positioned near the top of the annotation tracks window. - To access your custom track data in a subsequent query in the same Table Browser session, select the Custom Tracks option from the group list to display the custom tracks available.
Displaying query results as Genome Browser hyperlinks (positional tables only) To examine the records in the query output individually in the Genome Browser, select the hyperlinks to Genome Browser output option. The Table Browser will display a list of one or more hyperlinks corresponding to the individual records in the output data. Click a link to open up the Genome Browser display to the item and position shown on the hyperlink. Displaying a statistical summary of query data (positional tables only) To generate a statistical summary of the query output data, the region covered by the query, and the CPU time required to process the query, click the Summary/Statistics button. Video examples of Table Browser queries - Finding a list of genes in a genomic region - Obtaining coordinate sequences for a gene exon - Finding all the SNPs in a gene - Finding SNPs upstream of a gene Visit our Video Page. Visit our YouTube channel. /goldenPath/help/hubQuickStartSearch.html:Track_Hub_Quick_Start Searchable Track Hub Quick Start Guide Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any UCSC assembly or remotely-hosted sequence. Making your annotation data searchable is an important improvement to the usability of your hub, especially if your annotations are not otherwise represented on the Browser. This Quick Start Guide will go through making a searchable track hub from a GFF3 file; converting to a genePred, bed, and bigBed, then creating a trix search index file. This example will be made with the new "useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files. STEP 1: Downloads Gather our settings and data files in a publicly-accessible directory (such as a university web-server, CyVerse, or GitHub).
For more information on this, please see the hosting guide. Copy the hub.txt file using wget, curl, or copy-paste:
wget http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt
Download some example GFF3 data from Gencode. This file happens to be long non-coding RNAs (lncRNAs):
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.long_noncoding_RNAs.gff3.gz
Next, you will need to download four Genome Browser utilities to convert the GFF3 file to bigBed format and run the search index command. Similar commands exist to convert other file types. These are operating system specific:
Utility Name      MacOS Download    Linux Download
----------------  ----------------  ----------------
gff3ToGenePred    Download          Download
genePredToBed     Download          Download
bedToBigBed       Download          Download
ixIxx             Download          Download
STEP 2: Format Data Before formatting the data, make the downloaded utilities executable:
chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx
Then run the first conversion from GFF3 to genePred, making sure to include -geneNameAttr=gene_name so that the gene symbol, rather than the ID number, is used as name2, and sorting by chromosome and position:
gff3ToGenePred -geneNameAttr=gene_name gencode.v32.long_noncoding_RNAs.gff3.gz stdout | sort -k2,2 -k4n,4n > gencode.v32.lncRNAs.genePred
Convert that genePred file to a bed file:
genePredToBed gencode.v32.lncRNAs.genePred gencode.v32.lncRNAs.bed
Compress and index that bed file into bigBed format, adding -extraIndex=name to allow EnstID searches:
bedToBigBed -extraIndex=name gencode.v32.lncRNAs.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v32.lncRNAs.bb
If you would like to stop here, you will be able to display your bigBed hub and search for the names that were indexed into the bigBed file (EnstID).
You will not be able to use the searchIndex and searchTrix trackDb settings, which require creating a key-and-value search index for your file, as shown below.
STEP 3: Create Search Index If you want to link your annotation names to anything other than the field referenced in the -extraIndex option, you will need to make an index file. We will make an input file which will link one identifier (EnstID) with search terms composed of gene symbols and EnstIDs. Below is one example of a command to create an input file for the search indexing command:
cat gencode.v32.lncRNAs.genePred | awk '{print $1, $12, $1}' > input.txt
To examine or download that file, you can click here. Note that the first word on each line is the key referenced in the BED file; the words that follow are aliases, and searching for any of them leads to the location of the key. These search terms are case insensitive and allow partial word searches. Finally, you will make the index file (.ix) and the index of that index (.ixx), which helps the search run quickly even in large files.
ixIxx input.txt out.ix out.ixx
STEP 4: View and Search Enter the URL to your hub on the Connected Hubs tab of the Track Data Hubs page. Alternatively, you can enter your hub.txt URL in the following web address:
genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hubUrl=YourUrlHere
If you would like to look at an already-made example, click the following link, which includes hideTracks=1 to hide other tracks. After the link is a picture of what the hub should look like:
https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hideTracks=1&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt
[A display of the Searchable hub track]
Once your hub displays, you should be able to type in a gene symbol or Enst ID and scroll down the results page until you see your search results.
[Typing a search term in the search box]
You can type your search term (fam87b) in the box above the ideogram and press go.
Note that it is not case sensitive. Scrolling to the bottom of the search results page, you will see your searchable hub keyword that was linked with your search term. Clicking it will bring you to the position of your search term.
[Search hit for fam87b] [Search results for fam87b]
If you are having problems, be sure all your files are publicly-accessible and that your server accepts byte-ranges. You can run the following command and verify that "Accept-Ranges: bytes" appears in the output:
curl -IL http://yourURL/hub.txt
Note that the Browser waits 5 minutes before checking for any changes to these files. When editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding udcTimeout=1 to your URL. For more information, see the Debugging and Updating Track Hubs section of the Track Hub User Guide.
Understanding hub.txt with useOneFile The hub.txt file is a configuration file with names, descriptions, and paths to other files. The example below uses the setting useOneFile on to indicate that all the settings and paths appear in only the hub.txt file as opposed to having two additional settings files (genomes.txt and trackDb.txt). Please visit the UseOneFile guide for more information. The most important settings to make the hub searchable appear in the third section, in what would formerly be the trackDb.txt file. The searchIndex and searchTrix settings indicate which fields are indexed in the bigBed file and where to find the .ix file, respectively. To see the actual hub.txt file for the above example, click here.
hub MyHubsNameWithoutSpaces
shortLabel My Hub's Name
longLabel Name up to 80 characters versus shortLabel limited to 17 characters
email myEmail@address
descriptionUrl aboutMyHub.html
useOneFile on

genome assembly_database_2

track uniqueNameNoSpacesOrDots
type track_type
bigDataUrl track_data_url
shortLabel label 17 chars
longLabel long label up to 80 chars
visibility hide/dense/squish/pack/full
searchIndex field,field2
searchTrix path/to/.ix/file

Additional Resources - Track Hub User Guide - Guide To useOneFile setting - Search file .ix documentation - Mailing list question with searchable Track Hub - Mailing list question with searchable Custom Tracks - Track Database (trackDb) searchTrix Definition - Quick Start Guide to Organizing Track Hubs into Groupings - Quick Start Guide to Assembly Track Hubs /goldenPath/help/mirror.html:Genome_Browser_Manual_Installation Installation of a UCSC Genome Browser on a local machine ("mirror") Contents Considerations before installing a Genome Browser Installing a Genome Browser locally with the GBiC installer Docker installation instructions Manual installation instructions Using UDR to speed up downloads The genome-mirror mailing list What happened to Genome Browser in a Box (GBiB)? Considerations before installing a Genome Browser Like most web servers, running a Genome Browser installation at your institution, even for your own department, requires a Unix machine, disk space (6TB for hg19), and the resources to update the site and underlying OS regularly. You may want to consider these alternatives before embarking on a full UCSC Genome Browser installation directly on your server. For information about operating in the cloud, visit the Cloud Data and Software Resources help page. 1. Embed the Genome Browser graphic in your web page If you only want to include a genome browser view into your webpage for already existing genomes, you can use an