viralzoneGenes Viralzone Transcriptome Viralzone Transcriptome: early=green, intermediate=blue, late=red genes

Description
This track shows the genome-wide expression map of monkeypox virus. ORFs are selectively expressed at early, intermediate, or late times of infection. This is due to stage-specific viral promoters and transcription factors associated with the viral DNA-dependent RNA polymerase. ORFs are named according to the official nomenclature of orthopoxviruses.

Display Conventions and Configuration
ORFs are colored according to the time of their transcription: green for early genes, blue for intermediate genes, and red for late genes.

Methods
Expression data were obtained from the ribosome profiling data of the vaccinia virus study "Deciphering poxvirus gene expression by RNA sequencing and ribosome profiling". The monkeypox data were annotated by similarity to vaccinia virus. The monkeypox genome is very similar to the vaccinia virus genome, and all orthopoxviruses share the same expression pattern and replication cycle. The gene names follow the universal OPGXXX (OrthoPoxvirus Gene number XXX) convention published in "Ancient Gene Capture and Recent Gene Loss Shape the Evolution of Orthopoxvirus-Host Interaction Genes".

Data Access
The raw data can be explored interactively with the Table Browser or combined with other datasets in the Data Integrator tool. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed. Instructions for downloading this command can be found on our utilities page.
The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/viralzoneGenes/viralzone.bb -chrom=NC_063383.1 -start=0 -end=100000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

Credits
Many thanks to Philippe Mercier from Viralzone for providing these annotations.

References
Zhilong Yang, Shuai Cao, Craig A Martens, Stephen F Porcella, Zhi Xie, Ming Ma, Ben Shen, Bernard Moss. Deciphering poxvirus gene expression by RNA sequencing and ribosome profiling. J Virol. 2015 Jul;89(13):6874-86.
Tatiana G Senkevich, Natalya Yutin, Yuri I Wolf, Eugene V Koonin, Bernard Moss. Ancient Gene Capture and Recent Gene Loss Shape the Evolution of Orthopoxvirus-Host Interaction Genes. mBio. 2021 Aug 31;12(4):e0149521.

assembly Assembly Assembly map

Description
This track shows the sequences used in the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1 genome assembly. Genome assembly procedures are covered in the NCBI assembly documentation. NCBI also provides specific information about this assembly. There are no gaps in this assembly. There is only one sequence:

cytoBandIdeo Chromosome Band (Ideogram) Ideogram for Orientation map

gc5Base GC Percent GC Percent in 5-Base Windows map

Description
The GC percent track shows the percentage of G (guanine) and C (cytosine) bases in 5-base windows on the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1/GCF_014621545.1 genome assembly. High GC content is typically associated with gene-rich areas. The average overall GC percent for the entire assembly is 33.03%. This track may be configured in a variety of ways to highlight different aspects of the displayed information. Click the "Graph configuration help" link for an explanation of the configuration options.

Credits
The data and presentation of this graph were prepared by Hiram Clawson.
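As an illustration of what the GC Percent track measures, the following is a minimal Python sketch that computes the percentage of G and C bases in consecutive 5-base windows. It assumes non-overlapping windows and ignores any trailing partial window; the function name gc_percent_windows is hypothetical and this is not the program used to build the track.

```python
def gc_percent_windows(seq, window=5):
    """Return (window_start, gc_percent) for consecutive non-overlapping windows.

    Assumption: windows do not overlap and a trailing partial window is dropped.
    """
    seq = seq.upper()
    results = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        gc_count = sum(base in "GC" for base in chunk)
        results.append((start, 100.0 * gc_count / window))
    return results


if __name__ == "__main__":
    # Three windows: GGCCA, ATTTT, GCGCA
    for start, pct in gc_percent_windows("GGCCAATTTTGCGCA"):
        print(start, pct)
```

With a window of 5 bases, each value can only be a multiple of 20%, which is why the track is usually viewed as a smoothed graph over larger regions.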
genbankAli Genbank Alignments Genbank Alignments: alignments of sequences in Genbank to the reference map

Description
This track shows the alignments of selected full genome sequences in Genbank to the reference genome.

Display Conventions and Configuration
Alignable regions are shown with thick lines. Single-bp differences are highlighted with red tickmarks on the thick lines. Alignable regions can be separated by non-alignable sequences, which are shown with thinner lines. Deletions are shown with single lines. Deletions that include at least one short insertion are shown with double lines. Rearrangements are not shown directly, but duplications lead to two rows of thick lines, with the exact origin of the sequence shown on mouseover or by clicking the sequence.

Methods
Sequences were aligned with BLAT and filtered with our command-line tool pslReps at default settings.

Data Access
The raw data can be explored interactively with the Table Browser or combined with other datasets in the Data Integrator tool. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed. Instructions for downloading this command can be found on our utilities page. The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/genbankAli/seqs.bb -chrom=NC_063383.1 -start=0 -end=100000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
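For automated analysis, the range query above can be wrapped in a script. The following is a minimal Python sketch that builds the same bigBedToBed command line and parses its tab-separated BED output. It assumes bigBedToBed is installed and on the PATH; the helper names (build_command, parse_bed, fetch_features) are hypothetical, and only the first three BED columns are interpreted.

```python
import subprocess


def build_command(url, chrom, start, end):
    """Build the bigBedToBed argument list for a range query written to stdout."""
    return ["bigBedToBed", url, f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", "stdout"]


def parse_bed(text):
    """Parse tab-separated BED text into dictionaries (first three columns typed)."""
    features = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        features.append({
            "chrom": fields[0],
            "start": int(fields[1]),   # BED starts are 0-based
            "end": int(fields[2]),     # BED ends are exclusive
            "rest": fields[3:],        # remaining columns left uninterpreted
        })
    return features


def fetch_features(url, chrom, start, end):
    """Run bigBedToBed (assumed to be on the PATH) and return parsed features."""
    result = subprocess.run(build_command(url, chrom, start, end),
                            capture_output=True, text=True, check=True)
    return parse_bed(result.stdout)


if __name__ == "__main__":
    url = "http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/genbankAli/seqs.bb"
    for feature in fetch_features(url, "NC_063383.1", 0, 100000):
        print(feature["chrom"], feature["start"], feature["end"])
```

Keeping command construction and parsing in separate functions makes it possible to check both without network access or the binary.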
References
Monzon S et al. Genomic accordions may hold the key to Monkeypox Clade IIb's increased transmissibility. bioRxiv, Oct 01, 2022.
Gigante CM et al. Genomic deletions and rearrangements in monkeypox virus from the 2022 outbreak, USA. bioRxiv, Sep 17, 2022.

ncbiGene NCBI Genes NCBI gene predictions genes

Description
The NCBI Gene track for the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1 genome assembly is constructed from the GFF file GCF_014621545.1_ASM1462154v1_genomic.gff.gz supplied with the genome assembly at the FTP location: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/

Track statistics summary
Total genome size: 197,209
Gene count: 179
Bases in genes: 165,451
Percent genome coverage: 83.896%

Credits
This track was produced at UCSC from data generated by scientists worldwide and curated by the NCBI RefSeq project.

References
Kent WJ. BLAT - the BLAST-like alignment tool. Genome Res. 2002 Apr;12(4):656-64. PMID: 11932250; PMC: PMC187518
Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell CM, Hart J, Landrum MJ, McGarvey KM et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014 Jan;42(Database issue):D756-63. PMID: 24259432; PMC: PMC3965018
Pruitt KD, Tatusova T, Maglott DR. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005 Jan 1;33(Database issue):D501-4. PMID: 15608248; PMC: PMC539979

pcrAmplicon PCR Sequencing PCR Sequencing Protocols: Amplicons and Primers map

Description
This track shows the amplicons and primers for PCR-based genome amplification schemes. The track contains one subtrack for each sequencing protocol's primer set. By configuring the track, one can show or hide each set individually. The primer sets are labeled with the name of the first author of the protocol.
Yale hMPXV Amplicon Scheme (Chen et al 2022): This is a primer set for Illumina sequencers; it was tested on a MiSeq with 2x150 nt read lengths. It is available from Protocols.io. Based on initial validation, the approach shows notably higher depth and breadth of coverage across the genome than metagenomic sequencing, particularly for samples with higher PCR cycle threshold (Ct) values. The average amplicon size is ~2.5 kb. As the primers were designed for the pre-outbreak sequence MT903345, there is a single mismatch against the reference genome, in MPXV_102. Its left primer is cagcgtgtataggatggTgacg but the genome is cagcgtgtataggatggCgacg.

Welkers et al: This is a primer set for Oxford Nanopore sequencers. It is also available from Protocols.io. Amplicon size is ~2.5 kb.

Display Conventions and Configuration
Genomic locations of the amplicons are highlighted. Thick rectangles indicate the primers; the intervening amplicon sequence is shown with lines and arrows. Clicking an amplicon shows both primer sequences.

Methods
Primer sequences and names were downloaded from the source publications (Chen et al) and (Welkers et al), aligned to the Rivers reference with our isPcr tool and converted to bigBed with AWK.

Data Access
The raw data can be explored interactively with the Table Browser or combined with other datasets in the Data Integrator tool. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed. Instructions for downloading this command can be found on our utilities page. The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/pcrAmplicon/pcrAmplicon.bb -chrom=NC_063383.1 -start=0 -end=100000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
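The single MPXV_102 primer mismatch mentioned in the description can be located with a simple position-by-position comparison. The following is a minimal Python sketch using the two sequences quoted above; the function name find_mismatches is hypothetical, and the comparison assumes both sequences have the same length and orientation.

```python
def find_mismatches(primer, target):
    """Return (0-based position, primer base, target base) for each mismatch.

    Assumption: both sequences are the same length and in the same orientation.
    """
    if len(primer) != len(target):
        raise ValueError("sequences must have equal length")
    return [(i, p, t)
            for i, (p, t) in enumerate(zip(primer, target))
            if p.upper() != t.upper()]


if __name__ == "__main__":
    left_primer = "cagcgtgtataggatggTgacg"   # MPXV_102 left primer, from the text
    genome_site = "cagcgtgtataggatggCgacg"   # reference genome, from the text
    print(find_mismatches(left_primer, genome_site))  # [(17, 'T', 'C')]
```

The mismatch sits five bases from the 3' end of the 22-base primer (0-based position 17).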
References
Chen NFG et al. Monkeypox virus multiplexed PCR amplicon sequencing. Protocols.io V2, July 26, 2022.
Welkers M et al. Monkeypox virus whole genome sequencing using combination of NextGenPCR and Oxford Nanopore. Protocols.io V1, July 12, 2022.

pcrChen Yale hMPXV PCR Amplicons: Yale hMPXV Amplicon Scheme (Chen et al. 2022 V2) map

Description
This track shows the amplicons and primers for PCR-based genome amplification schemes. The track contains one subtrack for each sequencing protocol's primer set. By configuring the track, one can show or hide each set individually. The primer sets are labeled with the name of the first author of the protocol.

Yale hMPXV Amplicon Scheme (Chen et al 2022): This is a primer set for Illumina sequencers; it was tested on a MiSeq with 2x150 nt read lengths. It is available from Protocols.io. Based on initial validation, the approach shows notably higher depth and breadth of coverage across the genome than metagenomic sequencing, particularly for samples with higher PCR cycle threshold (Ct) values. The average amplicon size is ~2.5 kb. As the primers were designed for the pre-outbreak sequence MT903345, there is a single mismatch against the reference genome, in MPXV_102. Its left primer is cagcgtgtataggatggTgacg but the genome is cagcgtgtataggatggCgacg.

Welkers et al: This is a primer set for Oxford Nanopore sequencers. It is also available from Protocols.io. Amplicon size is ~2.5 kb.

Display Conventions and Configuration
Genomic locations of the amplicons are highlighted. Thick rectangles indicate the primers; the intervening amplicon sequence is shown with lines and arrows. Clicking an amplicon shows both primer sequences.

Methods
Primer sequences and names were downloaded from the source publications (Chen et al) and (Welkers et al), aligned to the Rivers reference with our isPcr tool and converted to bigBed with AWK.
Data Access
The raw data can be explored interactively with the Table Browser or combined with other datasets in the Data Integrator tool. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed. Instructions for downloading this command can be found on our utilities page. The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/pcrAmplicon/pcrAmplicon.bb -chrom=NC_063383.1 -start=0 -end=100000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.

References
Chen NFG et al. Monkeypox virus multiplexed PCR amplicon sequencing. Protocols.io V2, July 26, 2022.
Welkers M et al. Monkeypox virus whole genome sequencing using combination of NextGenPCR and Oxford Nanopore. Protocols.io V1, July 12, 2022.

pcrWelkers Welkers et al PCR Amplicons: Welkers et al. 2022 V1 map

Description
This track shows the amplicons and primers for PCR-based genome amplification schemes. The track contains one subtrack for each sequencing protocol's primer set. By configuring the track, one can show or hide each set individually. The primer sets are labeled with the name of the first author of the protocol.

Yale hMPXV Amplicon Scheme (Chen et al 2022): This is a primer set for Illumina sequencers; it was tested on a MiSeq with 2x150 nt read lengths. It is available from Protocols.io. Based on initial validation, the approach shows notably higher depth and breadth of coverage across the genome than metagenomic sequencing, particularly for samples with higher PCR cycle threshold (Ct) values. The average amplicon size is ~2.5 kb. As the primers were designed for the pre-outbreak sequence MT903345, there is a single mismatch against the reference genome, in MPXV_102.
Its left primer is cagcgtgtataggatggTgacg but the genome is cagcgtgtataggatggCgacg.

Welkers et al: This is a primer set for Oxford Nanopore sequencers. It is also available from Protocols.io. Amplicon size is ~2.5 kb.

Display Conventions and Configuration
Genomic locations of the amplicons are highlighted. Thick rectangles indicate the primers; the intervening amplicon sequence is shown with lines and arrows. Clicking an amplicon shows both primer sequences.

Methods
Primer sequences and names were downloaded from the source publications (Chen et al) and (Welkers et al), aligned to the Rivers reference with our isPcr tool and converted to bigBed with AWK.

Data Access
The raw data can be explored interactively with the Table Browser or combined with other datasets in the Data Integrator tool. For automated analysis, the genome annotation is stored in a bigBed file that can be downloaded from the download server. Annotations can be converted from binary to ASCII text by our command-line tool bigBedToBed. Instructions for downloading this command can be found on our utilities page. The tool can also be used to obtain features within a given range without downloading the file, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/mpxvRivers/pcrAmplicon/pcrAmplicon.bb -chrom=NC_063383.1 -start=0 -end=100000 stdout
Please refer to our mailing list archives for questions, or our Data Access FAQ for more information.
References
Chen NFG et al. Monkeypox virus multiplexed PCR amplicon sequencing. Protocols.io V2, July 26, 2022.
Welkers M et al. Monkeypox virus whole genome sequencing using combination of NextGenPCR and Oxford Nanopore. Protocols.io V1, July 12, 2022.

simpleRepeat Simple Repeats Simple Tandem Repeats by TRF varRep

Description
This track displays simple tandem repeats (possibly imperfect repeats) on the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1/GCF_014621545.1 genome assembly, located by Tandem Repeats Finder (TRF), which is specialized for this purpose. These repeats can occur within coding regions of genes and may be quite polymorphic. Repeat expansions are sometimes associated with specific diseases. There are 22 items in the track covering 1,065 bases; with an assembly size of 197,209 bases, the percent coverage is 0.54%.

Methods
For more information about the TRF program, see Benson (1999).

Credits
TRF was written by Gary Benson.

References
Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 Jan 15;27(2):573-80. PMID: 9862982; PMC: PMC148217

Credits
This track was generated using a modification of a program developed by G. Miklem and L. Hillier (unpublished).

References
Gardiner-Garden M, Frommer M. CpG islands in vertebrate genomes. J Mol Biol. 1987 Jul 20;196(2):261-82. PMID: 3656447

tanDups Tandem Dups Paired identical sequences map

Description
This track indicates pairs of exactly identical sequences in the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1 genome assembly. There may be two tracks in this composite collection:
Gap Overlaps - Paired exactly identical sequence on each side of a gap
Tandem Dups - Paired exactly identical sequence survey over entire genome assembly
The Gap Overlaps track is thus a subset of the full Tandem Dups track. This investigation began when an unusual number of paired sequences around gaps was noticed during the mouse strain sequencing project.
This naturally raised the question of how common this feature is and in what types of assemblies it can be found.

The Gap Overlaps track indicates any pair of exactly identical sequences on each side of a gap, where a gap is any run of N's, including a single N. The end of the upstream sequence before the gap is duplicated exactly at the beginning of the downstream sequence following the gap in the assembly. Data in track: .

The Tandem Dups track is a similar survey over the entire genome assembly. The separation gap between these paired sequences can range from 1 base up to 20,000 bases. Data in track: Item count: 3; Bases covered: 576.

Methods
The Gap Overlap duplicate sequences were found by extracting 1,000 bases before and after each gap and aligning them to each other with the blat command:
blat -q=dna -minIdentity=95 -repMatch=10 upstreamContig.fa downstreamContig.fa
The PSL output was then filtered for perfect matches (no mismatches, and therefore matching sequences of equal size) in which the alignment ends exactly at the end of the upstream sequence and begins exactly at the start of the downstream sequence.

The Tandem Dups paired sequences were found with the following procedure:
Generate 29-base k-mers for the entire genome, allowing only k-mers with the bases A, C, T, and G; no N's are allowed.
Pair up identical k-mers with at least one base of separation and up to 20,000 bases of separation.
Collapse overlapping k-mer pairs when they have the same sequence size and the same spacing between the pairs. This procedure preserves the definition of duplicated identical pairs. The resulting pairs can now be longer sequences with smaller separation than the constituent pairs.
The final result selects paired sequences of 30 bases or more with at least one base remaining as a separation gap. Collapsed pairs that close the gap are discarded. They appear to indicate simple repeat sequences when this happens.
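The Tandem Dups procedure described above can be sketched in a few lines of Python. This is an illustrative reading of the listed steps under stated assumptions, not the program used to build the track: the function names find_kmer_pairs and collapse_pairs are hypothetical, pairing is done by brute force (adequate only for small inputs), and "overlapping" is taken to mean that the next pair starts inside the current merged pair at the same start-to-start distance.

```python
from collections import defaultdict


def find_kmer_pairs(seq, k=29, max_sep=20000):
    """Pair identical k-mers (A/C/G/T only) separated by 1..max_sep bases."""
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):       # skip k-mers containing N or other codes
            positions[kmer].append(i)
    pairs = []
    for starts in positions.values():
        for n, left in enumerate(starts):
            for right in starts[n + 1:]:
                gap = right - (left + k)   # bases between the two copies
                if 1 <= gap <= max_sep:
                    pairs.append((left, right))
    return pairs


def collapse_pairs(pairs, k=29, min_size=30):
    """Merge overlapping pairs with equal spacing; keep size >= min_size and gap >= 1."""
    merged = []                            # entries: [left_start, right_start, size]
    for left, right in sorted(pairs, key=lambda p: (p[1] - p[0], p[0])):
        last = merged[-1] if merged else None
        same_spacing = last is not None and last[1] - last[0] == right - left
        if same_spacing and left < last[0] + last[2]:
            last[2] = max(last[2], left + k - last[0])   # extend the merged pair
        else:
            merged.append([left, right, k])
    return [(a, b, size) for a, b, size in merged
            if size >= min_size and b - (a + size) >= 1]


if __name__ == "__main__":
    unit = "ACGTTGCAAGCTTAGGCTAACCGGTATCGAT"   # 31 bases, made up for illustration
    print(collapse_pairs(find_kmer_pairs(unit + "GGGGG" + unit)))  # [(0, 36, 31)]
```

In the toy input, three overlapping 29-base pairs collapse into one 31-base pair whose separation shrinks from 7 bases to 5, which matches the remark that collapsed pairs are longer and closer together than their constituent pairs.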
It would be interesting to have these discarded results available, but they are not available at this time.
Starting with 29-base pairs and then selecting results of at least 30 bases yields a reasonable number of pairs. If the procedure starts with 30-base pairs instead, it produces far too many 30-base k-mer pairs for a reasonable count.

See Also
Interactive tables of all results: Gap Overlaps Tandem Dups

Credits
Thank you to Joel Armstrong and Benedict Paten of the Computational Genomics Lab at the U.C. Santa Cruz Genomics Institute for identifying this characteristic of genome assemblies. The data and presentation of this track were prepared by Hiram Clawson, U.C. Santa Cruz Genomics Institute.

tandemDups Tandem Dups Paired exactly identical sequence survey over entire genome assembly map

Description
This track indicates pairs of exactly identical sequences in the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1 genome assembly. There may be two tracks in this composite collection:
Gap Overlaps - Paired exactly identical sequence on each side of a gap
Tandem Dups - Paired exactly identical sequence survey over entire genome assembly
The Gap Overlaps track is thus a subset of the full Tandem Dups track. This investigation began when an unusual number of paired sequences around gaps was noticed during the mouse strain sequencing project. This naturally raised the question of how common this feature is and in what types of assemblies it can be found.

The Gap Overlaps track indicates any pair of exactly identical sequences on each side of a gap, where a gap is any run of N's, including a single N. The end of the upstream sequence before the gap is duplicated exactly at the beginning of the downstream sequence following the gap in the assembly. Data in track: .

The Tandem Dups track is a similar survey over the entire genome assembly.
The separation gap between these paired sequences can range from 1 base up to 20,000 bases. Data in track: Item count: 3; Bases covered: 576.

Methods
The Gap Overlap duplicate sequences were found by extracting 1,000 bases before and after each gap and aligning them to each other with the blat command:
blat -q=dna -minIdentity=95 -repMatch=10 upstreamContig.fa downstreamContig.fa
The PSL output was then filtered for perfect matches (no mismatches, and therefore matching sequences of equal size) in which the alignment ends exactly at the end of the upstream sequence and begins exactly at the start of the downstream sequence.

The Tandem Dups paired sequences were found with the following procedure:
Generate 29-base k-mers for the entire genome, allowing only k-mers with the bases A, C, T, and G; no N's are allowed.
Pair up identical k-mers with at least one base of separation and up to 20,000 bases of separation.
Collapse overlapping k-mer pairs when they have the same sequence size and the same spacing between the pairs. This procedure preserves the definition of duplicated identical pairs. The resulting pairs can now be longer sequences with smaller separation than the constituent pairs.
The final result selects paired sequences of 30 bases or more with at least one base remaining as a separation gap. Collapsed pairs that close the gap are discarded. They appear to indicate simple repeat sequences when this happens. It would be interesting to have these discarded results available, but they are not available at this time.
Starting with 29-base pairs and then selecting results of at least 30 bases yields a reasonable number of pairs. If the procedure starts with 30-base pairs instead, it produces far too many 30-base k-mer pairs for a reasonable count.

See Also
Interactive tables of all results: Gap Overlaps Tandem Dups

Credits
Thank you to Joel Armstrong and Benedict Paten of the Computational Genomics Lab at the U.C.
Santa Cruz Genomics Institute for identifying this characteristic of genome assemblies. The data and presentation of this track were prepared by Hiram Clawson, U.C. Santa Cruz Genomics Institute.

windowMasker WM + SDust Genomic Intervals Masked by WindowMasker + SDust varRep

Description
This track depicts masked sequence as determined by WindowMasker on the 30 May 2022 Monkeypox virus/GCF_014621545.1_ASM1462154v1/GCF_014621545.1 genome assembly. The WindowMasker tool is included in the NCBI C++ toolkit. The source code for the entire toolkit is available from the NCBI FTP site.

Methods
To create this track, WindowMasker was run with the following parameters:
windowmasker -mk_counts true -input GCF_014621545.1_ASM1462154v1.unmasked.fa -output wm_counts
windowmasker -ustat wm_counts -sdust true -input GCF_014621545.1_ASM1462154v1.unmasked.fa -output windowmasker.intervals
perl -wpe 'if (s/^>lcl\|(.*)\n$//) { $chr = $1; } \
  if (/^(\d+) - (\d+)/) { \
  $s=$1; $e=$2+1; s/(\d+) - (\d+)/$chr\t$s\t$e/; }' windowmasker.intervals > windowmasker.sdust.bed
The windowmasker.sdust.bed file included masking for areas of the assembly that are gaps. The file was 'cleaned' to remove the masking in gaps, leaving only the sequence masking. The final result covers 18,619 bases of the 197,209-base assembly, for a percent coverage of 9.44%.

References
Morgulis A, Gertz EM, Schäffer AA, Agarwala R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 2006 Jan 15;22(2):134-41. PMID: 16287941
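For readers who prefer Python to the Perl one-liner in the WindowMasker Methods, the following is a minimal sketch of the same interval-to-BED conversion, plus the percent coverage arithmetic. It assumes the interval file has ">lcl|NAME" header lines followed by "start - end" lines, as the Perl patterns imply; the function names are hypothetical. Unlike the Perl command, which passes unmatched lines through unchanged, this sketch skips them.

```python
import re

HEADER = re.compile(r"^>lcl\|(.*)$")        # sequence header, e.g. >lcl|NC_063383.1
INTERVAL = re.compile(r"^(\d+) - (\d+)")    # masked interval, e.g. "120 - 145"


def intervals_to_bed(lines):
    """Convert WindowMasker interval lines to (chrom, start, end) BED tuples.

    The end coordinate is incremented by one, mirroring $e=$2+1 in the Perl
    command (which suggests the interval ends are inclusive).
    """
    chrom = None
    bed = []
    for line in lines:
        line = line.rstrip("\n")
        header = HEADER.match(line)
        if header:
            chrom = header.group(1)
            continue
        interval = INTERVAL.match(line)
        if interval:
            bed.append((chrom, int(interval.group(1)), int(interval.group(2)) + 1))
    return bed


def percent_coverage(bed, assembly_size):
    """Percent of the assembly covered, assuming the intervals do not overlap."""
    covered = sum(end - start for _, start, end in bed)
    return 100.0 * covered / assembly_size


if __name__ == "__main__":
    sample = [">lcl|NC_063383.1\n", "0 - 9\n", "20 - 29\n"]   # made-up intervals
    bed = intervals_to_bed(sample)
    print(bed, percent_coverage(bed, 100))
    # The figure quoted in the text: 18,619 of 197,209 bases
    print(round(100.0 * 18619 / 197209, 2))  # 9.44
```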