This directory contains FASTA files holding a modified version of the
Build 36.1 finished human genome assembly (hg18, Mar. 2006). The
chromosomal sequences were assembled by the International Human Genome
Project sequencing centers. The hg18/36.1 assembly was modified to use
IUPAC ambiguity codes at each base covered by a stringently filtered
subset of the single-base substitutions annotated by dbSNP build 129.
For example, if the assembly has an 'A' at a position where dbSNP has
annotated an A/C/T substitution SNP, the 'A' is replaced by 'H' in the
FASTA file here.
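The substitution described above follows the standard IUPAC nucleotide
ambiguity codes. As a minimal sketch (the function name iupac_code is
hypothetical and is not part of any UCSC tool), the mapping from a set of
observed alleles to its code could be written as:

```shell
# iupac_code ALLELES: print the IUPAC code for a slash-separated allele
# set such as "A/C/T". Hypothetical helper for illustration only.
iupac_code() {
    local key
    # Uppercase, split into one base per line, sort and de-duplicate so
    # that "T/A/C" and "A/C/T" produce the same key.
    key=$(printf '%s' "$1" | tr -d '/' | tr 'acgt' 'ACGT' |
          fold -w1 | LC_ALL=C sort -u | tr -d '\n')
    case "$key" in
        A|C|G|T) printf '%s\n' "$key" ;;
        AG)   echo R ;;
        CT)   echo Y ;;
        CG)   echo S ;;
        AT)   echo W ;;
        GT)   echo K ;;
        AC)   echo M ;;
        CGT)  echo B ;;
        AGT)  echo D ;;
        ACT)  echo H ;;
        ACG)  echo V ;;
        ACGT) echo N ;;
        *)    return 1 ;;   # not a plain A/C/G/T allele set
    esac
}

iupac_code A/C/T    # prints H
```

Sorting the alleles first keeps the lookup independent of the order in
which they are listed.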
dbSNP single-base substitutions were excluded from masking in the
following cases:
  - UCSC tagged the dbSNP item with any of these exceptions (see also
    hg18.snp129Exceptions and hg18.snp129ExceptionDesc database tables):
    - MultipleAlignments: dbSNP mapped item to multiple locations
    - ObservedMismatch: the reference allele does not appear in the item's
      observed alleles.
    - ObservedWrongFormat: the observed sequence has an unexpected format
      (no instances of this exception were found in snp129)
  - dbSNP item class is not "single".
  - dbSNP item length is not exactly one base.
  - dbSNP item weight is greater than 1. (lower weight = higher confidence)
The remaining single-base substitutions were used to mask the genomic
sequence.
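The exclusion rules above can be sketched as a simple filter. This is an
illustration only, not the procedure UCSC used. It assumes a HYPOTHETICAL
tab-separated input with six columns (name, class, chromStart, chromEnd,
weight, exceptions), where coordinates are zero-based half-open so that
item length is chromEnd minus chromStart, and where the last column lists
any exception names attached to the item:

```shell
# filter_snps [FILE...]: print only the items that pass every rule.
# The column layout is an assumption made for this sketch:
#   1 name  2 class  3 chromStart  4 chromEnd  5 weight  6 exceptions
filter_snps() {
    awk -F '\t' '
        # Any of the three listed exceptions excludes the item.
        $6 ~ /MultipleAlignments|ObservedMismatch|ObservedWrongFormat/ { next }
        $2 != "single" { next }   # class must be "single"
        $4 - $3 != 1   { next }   # length must be exactly one base
        $5 > 1         { next }   # weight must not be greater than 1
        { print }
    ' "$@"
}
```

Each rule is a separate awk pattern, so a rule can be relaxed or removed
without touching the others.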
Files included in this directory:
  chr*.subst.fa.gz - FASTA files with IUPAC characters for substitution SNPs
  md5sum.txt       - checksums of files in this directory
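After downloading, the checksums can be used to confirm that the files
arrived intact. The sketch below assumes GNU md5sum is installed and that
md5sum.txt is in the format "md5sum -c" accepts. The function name
verify_downloads is hypothetical:

```shell
# verify_downloads DIR: check every file listed in DIR/md5sum.txt.
# Returns non-zero if any listed file is missing or does not match.
verify_downloads() {
    ( cd "$1" && md5sum -c md5sum.txt )
}
```

Usage: verify_downloads . (run from the download directory). Files that
are listed in md5sum.txt but were not downloaded are reported as failures.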
------------------------------------------------------------------
If you plan to download a large file or multiple files from this
directory, we recommend that you use ftp rather than downloading the
files via our website. To do so, ftp to hgdownload.cse.ucsc.edu
[username: anonymous, password: your email address], then cd to the
directory goldenPath/hg18/snp129Mask. To download multiple files, use
the "mget" command:
    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)
Alternative methods to ftp access:
Using an rsync command to download the entire directory:
    rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp129Mask/ .
For a single file, e.g. chr1.subst.fa.gz:
    rsync -avzP \
        rsync://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp129Mask/chr1.subst.fa.gz .
Or with wget, all files:
    wget --timestamping \
        'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp129Mask/*'
With wget, a single file:
    wget --timestamping \
        'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg18/snp129Mask/chr1.subst.fa.gz'
To uncompress the fa.gz files:
    gunzip <file>.fa.gz
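The files can also be inspected without uncompressing them on disk. As a
rough sketch (count_ambiguous is a hypothetical helper), the following
counts the IUPAC ambiguity codes in one file. It matches both upper and
lower case on the assumption that some sequence may be lower case, and it
leaves out N because the assembly already uses N for gaps:

```shell
# count_ambiguous FILE.fa.gz: print the number of two- and three-base
# IUPAC ambiguity codes in a gzipped FASTA file.
# Steps: decompress to stdout, drop FASTA header lines, delete every
# character that is not an ambiguity code, count what remains, and
# strip the padding that some versions of wc add to the count.
count_ambiguous() {
    gzip -dc "$1" |
        grep -v '^>' |
        tr -cd 'RYSWKMBDHVryswkmbdhv' |
        wc -c |
        tr -d ' '
}
```

Usage: count_ambiguous chr21.subst.fa.gz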
Name                       Last modified     Size
chr1.subst.fa.gz           2008-06-30 15:43   72M
chr2.subst.fa.gz           2008-06-30 15:45   76M
chr3.subst.fa.gz           2008-06-30 15:46   62M
chr4.subst.fa.gz           2008-06-30 15:46   60M
chr5.subst.fa.gz           2008-06-30 15:46   57M
chr5_h2_hap1.subst.fa.gz   2008-06-30 15:46  550K
chr6.subst.fa.gz           2008-06-30 15:46   53M
chr6_cox_hap1.subst.fa.gz  2008-06-30 15:47  1.5M
chr6_qbl_hap2.subst.fa.gz  2008-06-30 15:47  1.3M
chr7.subst.fa.gz           2008-06-30 15:47   49M
chr8.subst.fa.gz           2008-06-30 15:47   45M
chr9.subst.fa.gz           2008-06-30 15:47   38M
chr10.subst.fa.gz          2008-06-30 15:43   42M
chr11.subst.fa.gz          2008-06-30 15:44   42M
chr12.subst.fa.gz          2008-06-30 15:44   41M
chr13.subst.fa.gz          2008-06-30 15:44   31M
chr14.subst.fa.gz          2008-06-30 15:44   28M
chr15.subst.fa.gz          2008-06-30 15:44   26M
chr16.subst.fa.gz          2008-06-30 15:44   25M
chr17.subst.fa.gz          2008-06-30 15:45   24M
chr18.subst.fa.gz          2008-06-30 15:45   24M
chr19.subst.fa.gz          2008-06-30 15:45   17M
chr20.subst.fa.gz          2008-06-30 15:45   19M
chr21.subst.fa.gz          2008-06-30 15:45   11M
chr22.subst.fa.gz          2008-06-30 15:46   11M
chrM.subst.fa.gz           2008-06-30 15:47  6.0K
chrX.subst.fa.gz           2008-06-30 15:47   48M
chrY.subst.fa.gz           2008-06-30 15:47  7.9M
md5sum.txt                 2008-07-01 11:47  1.4K