"Clint" - Pan troglodytes. Photo: Yerkes National Primate Research Center, Emory University
The March 2006 chimpanzee (Pan troglodytes) browser displays data from the 6X whole genome shotgun draft assembly (Build 2 Version 1, Oct. 2005) produced by the Chimpanzee Sequencing and Analysis Consortium. This assembly contains sequence from the initial 4X chimpanzee assembly described and analyzed in Nature (The Chimpanzee Sequencing and Analysis Consortium, 2005), with additional 2X sequence generated, assembled, and assigned to chromosomes by the Genome Sequencing Center of Washington University School of Medicine, St. Louis, MO, USA. For more information about this assembly, see Pan_troglodytes-2.1 in the NCBI Assembly database.
This assembly uses a new chromosomal numbering scheme that reflects orthology between the human and chimpanzee chromosomes. For details, see the Assembly details section below and the Genome Browser FAQ. To read more about the chimpanzee assembly, see the Washington University in St. Louis School of Medicine Pan troglodytes web page and the National Institutes of Health NIH News summary of the chimpanzee analysis paper.
The chimpanzee, together with the bonobo, is the closest living relative of humans. It is also an endangered species and the focus of multiple conservation efforts.
A genome position can be specified by the HUGO Gene Nomenclature Committee gene name of a human RefSeq, the accession number of an EST or mRNA, a chromosomal coordinate range, or keywords from the GenBank description of an mRNA. The following list shows examples of valid position queries for the chimpanzee genome. See the User's Guide for more information.
Request | Genome Browser Response
chr22 | Displays all of chromosome 22
chr2a:11,250,001-12,250,000 | Displays a million bases of chromosome 2a, beginning at base 11,250,001. Note that chromosome 2 in this assembly has been split into two parts: 2a and 2b.
chr2a:11,250,001+2000 | Displays a region of chr 2a that spans 2000 bases, starting at position 11,250,001
BRCA1 | Displays a list of genomic regions where human RefSeq gene BRCA1 (or features associated with BRCA1) aligns
AF115459 | Displays region of genome with mRNA with GenBank accession number AF115459
348 | Displays the region of genome with Entrez Gene identifier 348
pseudogene mRNA | Lists transcribed pseudogenes, but not cDNAs
sialic acid | Lists mRNAs and RefSeqs with GenBank keywords sialic acid
huntington | Lists mRNAs associated with Huntington's disease
Paabo,S. | Lists mRNAs deposited by co-author S. Paabo
Use this last format for author queries. Although GenBank requires the search format Paabo S, internally it uses the format Paabo,S.
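The two coordinate-style requests in the list above can be illustrated with a small parser. This is only a sketch of how such strings might be interpreted, not the Genome Browser's own code: the names `Region` and `parse_position` are hypothetical, and the `+` form is read as "start plus span length", following the description in the table.

```python
import re
from typing import NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: Optional[int]  # 1-based, inclusive; None means whole chromosome
    end: Optional[int]    # 1-based, inclusive; None means whole chromosome


# Chromosome name, optionally followed by ":start-end" or ":start+length".
# Commas inside numbers are allowed, as in the sample queries.
_POSITION = re.compile(r"^(chr\w+)(?::(\d[\d,]*)([-+])(\d[\d,]*))?$")


def _to_int(text: str) -> int:
    """Convert a number such as '11,250,001' to an int."""
    return int(text.replace(",", ""))


def parse_position(query: str) -> Optional[Region]:
    """Parse a coordinate-style query; return None for any other query type."""
    match = _POSITION.match(query.strip())
    if match is None:
        return None  # gene names, accessions, keywords, author queries
    chrom, start_text, operator, second_text = match.groups()
    if start_text is None:
        return Region(chrom, None, None)  # e.g. "chr22"
    start = _to_int(start_text)
    if operator == "-":
        end = _to_int(second_text)
    else:
        # Assumption: "+N" means a span of N bases beginning at start.
        end = start + _to_int(second_text) - 1
    return Region(chrom, start, end)


if __name__ == "__main__":
    for sample in ("chr22", "chr2a:11,250,001-12,250,000",
                   "chr2a:11,250,001+2000", "BRCA1"):
        print(sample, "->", parse_position(sample))
```

Under this reading, the range 11,250,001-12,250,000 contains exactly 1,000,000 bases because both endpoints are inclusive.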
This assembly covers about 97 percent of the genome and is based on 6X sequence coverage. It is composed of 265,882 contigs with an N50 length of 29 kb and 44,460 supercontigs with an N50 length of 9.7 Mb. The total contig length, not including estimated gap sizes, is 2.97 Gb. Of that total, 2.82 Gb of sequence have been ordered and oriented along specific chimpanzee chromosomes, 107 Mb have been placed in chr*_random, and 50 Mb remain in chrUn.
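For readers unfamiliar with the N50 statistic quoted above, the following sketch shows one common way to compute it: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of the combined length. The function name `n50` and the toy lengths are illustrative only; they are not taken from the assembly data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or supercontig lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            # At least half of all bases lie in sequences this long or longer.
            return length
    raise AssertionError("unreachable: running total always reaches half")


if __name__ == "__main__":
    toy_contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up lengths, total 54
    print(n50(toy_contigs))  # 10 + 9 + 8 = 27 reaches half of 54, so N50 is 8
```

An N50 of 29 kb for contigs therefore means that half of the assembled bases sit in contigs of at least 29 kb.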
The whole genome shotgun data were derived primarily from the donor Clint, a captive-born male chimpanzee from the Yerkes National Primate Research Center in Atlanta, GA, USA. The reads were assembled with the whole-genome assembly program PCAP (Huang, 2006), using stringent parameters derived by eliminating detectable global misassemblies larger than 50 kb (interchromosomal cross-overs determined by alignment of the chimpanzee genome against the human genome).
The assembly data were aligned against the human genome at UCSC using BLASTZ (Schwartz, 2003) to align and score non-repetitive chimpanzee regions against repeat-masked human sequence. The alignment chains differentiated between orthologous and paralogous alignments (Kent, 2003); only "reciprocal best" alignments were retained in the alignment set. The chimpanzee AGP files were generated from these alignments in a manner similar to that described in The Chimpanzee Sequencing and Analysis Consortium (2005). Centromeres were introduced into the chimpanzee sequence at the positions of the centromeres in the human chromosomes. Ten documented human inversions supported by the assembly were introduced into the ordering, as was the separation of alignments to human chromosome 2 into chimpanzee chromosomes 2a and 2b. The regions in the WGS assembly corresponding to the finished sequences for chromosomes 21 and Y and a 5-Mb finished region from chimpanzee chromosome 7 were replaced with the corresponding finished AGPs/sequences. See the Credits page for acknowledgements for these chromosomal regions.
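The "reciprocal best" idea can be sketched in a few lines. This is a deliberately simplified illustration, not the chain-and-net procedure actually used at UCSC: it assumes each alignment links one named chimpanzee region to one named human region with a single score, and the type `Alignment`, the function `reciprocal_best`, and the sample data are all hypothetical.

```python
from typing import Dict, List, NamedTuple


class Alignment(NamedTuple):
    chimp_id: str
    human_id: str
    score: int


def reciprocal_best(alignments: List[Alignment]) -> List[Alignment]:
    """Keep alignments that are the top hit for both of their regions."""
    best_for_chimp: Dict[str, Alignment] = {}
    best_for_human: Dict[str, Alignment] = {}
    for aln in alignments:
        current = best_for_chimp.get(aln.chimp_id)
        if current is None or aln.score > current.score:
            best_for_chimp[aln.chimp_id] = aln
        current = best_for_human.get(aln.human_id)
        if current is None or aln.score > current.score:
            best_for_human[aln.human_id] = aln
    # An alignment survives only if the human side also ranks it first.
    return [aln for aln in best_for_chimp.values()
            if best_for_human[aln.human_id] is aln]


if __name__ == "__main__":
    sample = [
        Alignment("c1", "h1", 900),  # best in both directions: kept
        Alignment("c1", "h2", 400),  # c1 prefers h1: dropped
        Alignment("c2", "h1", 500),  # h1 prefers c1: dropped
        Alignment("c3", "h3", 300),  # best in both directions: kept
    ]
    for aln in reciprocal_best(sample):
        print(aln)
```

Filtering this way discards a lower-scoring alignment to a region that already has a better partner, which is the kind of hit a paralogous copy tends to produce.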
A major difference between this assembly and the previous Nov. 2003 version is the chromosomal numbering scheme, which has been changed to reflect a new standard that preserves orthology with human chromosomes. Proposed by E.H. McConkey in 2004, the new numbering convention was subsequently endorsed by the International Chimpanzee Sequencing and Analysis Consortium. This standard assigns the identifiers "2a" and "2b" to the two chimp chromosomes that fused in the human genome to form chromosome 2. Note that the genome assembly shown in the Nov. 2003 panTro1 Genome Browser retains the older numbering scheme in which these chromosomes are numbered 12 and 13. To view a table showing the correspondence between human and chimp chromosomes, see the FAQ.
Bulk downloads of the sequence and annotation data are available via the Genome Browser FTP server or the Downloads page. The complete set of sequence reads is available at the NCBI trace archive. These data have specific conditions for use.
The chimpanzee browser annotation tracks were generated by UCSC and collaborators worldwide. See the Credits page for a detailed list of the organizations and individuals who contributed to this release.
Chimpanzee Sequencing and Analysis Consortium. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005 Sep 1;437(7055):69-87. PMID: 16136131
Huang X, Yang SP, Chinwalla AT, Hillier LW, Minx P, Mardis ER, Wilson RK. Application of a superword array in genome assembly. Nucleic Acids Res. 2006;34(1):201-5. PMID: 16397298; PMC: PMC1325203
Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D. Evolution's cauldron: duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci U S A. 2003 Sep 30;100(20):11484-9. PMID: 14500911; PMC: PMC208784
McConkey EH. Orthologous numbering of great ape and human chromosomes is essential for comparative genomics. Cytogenet Genome Res. 2004;105(1):157-8. PMID: 15218271
Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W. Human-mouse alignments with BLASTZ. Genome Res. 2003 Jan;13(1):103-7. PMID: 12529312; PMC: PMC430961