The "analysis set" is a version of the genome prepared for next-gen sequencing
read alignment. We are making versions of the NCBI files available here,
converted into UCSC formats.

For a full description of the "analysis set" concept, see NCBI's README file:
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/

Below are the original UCSC analysisSet files. These files are no longer
updated. For the latest analysis set, see the NCBI FTP directory at the
address above.

Files included in this directory:

hg38.analysisSet.2bit - analysis set sequence
hg38.analysisSet.fa.gz - analysis set sequence
hg38.analysisSet.chroms.tar.gz - analysis set sequence, one file per chromosome
    The analysis set sequence is masked as described in ../README.txt:
    repeats from RepeatMasker and Tandem Repeats Finder (with period of
    12 or less) are shown in lower case; non-repeating sequence is shown
    in upper case. The sequences in the file are otherwise identical to
    the NCBI file GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz

hg38.fullAnalysisSet.2bit - all of the sequence from the above set, plus all
    of the alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units.
hg38.fullAnalysisSet.chroms.tar.gz - all of the sequence from the above set,
    plus all of the alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly
    units.
    The analysis set sequence is masked as described in ../README.txt:
    repeats from RepeatMasker and Tandem Repeats Finder (with period of
    12 or less) are shown in lower case; non-repeating sequence is shown
    in upper case.

md5sum.txt - checksums of files in this directory
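The soft-masking convention described above (lower case for repeats, upper
case for non-repeating sequence) can be inspected with a short script. The
following is a minimal sketch, not part of the UCSC distribution; the function
names count_case and masked_fraction are hypothetical and introduced only for
illustration. Note that runs of N are usually upper case, so they are counted
with the non-repeating sequence here.

```python
from typing import Iterable, Tuple


def count_case(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (lowercase, uppercase) base counts, skipping FASTA header lines."""
    lower = upper = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue  # sequence name line, not sequence data
        for ch in line.strip():
            if ch.islower():
                lower += 1  # soft-masked (repeat) base
            elif ch.isupper():
                upper += 1  # unmasked base (includes upper-case N)
    return lower, upper


def masked_fraction(fasta_lines: Iterable[str]) -> float:
    """Fraction of bases in lower case; 0.0 when there is no sequence."""
    lower, upper = count_case(fasta_lines)
    total = lower + upper
    return lower / total if total else 0.0
```

To run this on the compressed FASTA file, one option is to pass the file
object returned by gzip.open("hg38.analysisSet.fa.gz", "rt") to
masked_fraction. The per-character loop is simple rather than fast, so a
whole-genome pass may take a while.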
Name                                Last modified     Size  Description
Parent Directory                                         -
hg38.analysisSet.2bit               2014-01-27 10:40  770M
hg38.analysisSet.chroms.tar.gz      2014-01-27 11:02  905M
hg38.analysisSet.fa.gz              2021-09-01 00:30  905M
hg38.fullAnalysisSet.2bit           2014-03-18 13:23  797M
hg38.fullAnalysisSet.chroms.tar.gz  2014-03-18 13:41  936M
md5sum.txt                          2021-09-01 01:00  307
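
After downloading, the files can be checked against md5sum.txt. The sketch
below assumes that md5sum.txt uses the usual output format of the md5sum tool
(one "<hex digest>  <file name>" pair per line); that format is an assumption,
not something stated in this README. The function names parse_md5sum,
md5_of_file and verify_directory are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def parse_md5sum(text: str) -> Dict[str, str]:
    """Map file name -> expected hex digest from md5sum-style text (assumed format)."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        # md5sum marks binary-mode entries with a leading '*' on the name
        expected[name.strip().lstrip("*")] = digest.lower()
    return expected


def md5_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_directory(directory, manifest: str = "md5sum.txt") -> Dict[str, bool]:
    """Return {file name: digest matches} for listed files that exist locally."""
    directory = Path(directory)
    expected = parse_md5sum((directory / manifest).read_text())
    return {
        name: md5_of_file(directory / name) == digest
        for name, digest in expected.items()
        if (directory / name).exists()
    }
```

Calling verify_directory(".") in the download directory returns a dictionary
with one entry per listed file that is present locally; files that were not
downloaded are skipped rather than reported as failures. Reading in chunks
keeps memory use low for the larger files in this directory.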