This file is from:

This directory contains compressed multiple alignments of
  123 insect genome assembly sequences to the reference
  D. melanogaster/dm6/Aug 2014.

See also: assemblyInformation.txt for information about which
          assemblies have been used in this 124-way multiple alignment

Files in this directory:

dm6.124way.sequenceNames.nh  - phylogenetic tree used for multiz alignment,
                             - with UCSC database names or sequence names
dm6.124way.scientificName.nh - the same phylogenetic tree with strictly
                             - scientific names
dm6.124way.taxId.nh - the same phylogenetic tree with NCBI taxonomy IDs
                    - there is a duplicate ID in this list: 46245
                    - there are two assembly versions of: D_pseudoobscura
                    - named: D_pseudoobscura_1 and droPse3
nameCrossReference.txt - tab separated columns with the different names
                       - for these sequences and databases, columns:
                       1. sequence name - UCSC database or scientific name
                                        - sequence names used in MAF files
                       2. NCBI accession
                       3. NCBI taxon identifier
                       4. assembly name
                       5. scientific name
sciNameToUcscDbName.txt - translation of UCSC database name to an abbreviated
                        - scientific name
sequences/* - directories with 2bit files and assembly reports from NCBI
            - for each sequence used that is not in a UCSC database
sequences/md5sum.txt - the MD5 sums for the files in this directory structure

maf/*.maf.gz - alignments referenced to dm6, separate maf files for each

upstream1000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below
upstream2000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below
upstream5000.ncbiRefSeq.maf.gz - alignments in regions upstream, see below

md5sum.txt - MD5 sums of these files to verify transmission
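
As an illustration of the five-column layout described above for
nameCrossReference.txt, here is a minimal Python sketch that reads the
tab-separated rows and reports taxon IDs shared by more than one sequence
name (such as the duplicate 46245 noted above). The function names, the
skipping of blank and '#' lines, and the sample accessions and assembly
names are assumptions made for this example, not part of the data set.

```python
import io
from typing import NamedTuple


class NameRecord(NamedTuple):
    sequence_name: str    # column 1: UCSC database or scientific name
    ncbi_accession: str   # column 2
    taxon_id: str         # column 3: NCBI taxon identifier
    assembly_name: str    # column 4
    scientific_name: str  # column 5


def read_name_cross_reference(handle):
    """Yield one NameRecord per tab-separated, five-column row."""
    for line in handle:
        line = line.rstrip("\n")
        # Assumption: blank lines and '#' comment lines, if present, are skipped.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        yield NameRecord(*fields)


def shared_taxon_ids(records):
    """Return {taxon_id: [sequence names]} for IDs used by more than one row."""
    grouped = {}
    for rec in records:
        grouped.setdefault(rec.taxon_id, []).append(rec.sequence_name)
    return {tax: names for tax, names in grouped.items() if len(names) > 1}


if __name__ == "__main__":
    # Illustrative rows only: accessions and assembly names are placeholders.
    sample = io.StringIO(
        "D_pseudoobscura_1\tACC_PLACEHOLDER_1\t46245\tasm_placeholder_1\t"
        "Drosophila pseudoobscura\n"
        "droPse3\tACC_PLACEHOLDER_2\t46245\tasm_placeholder_2\t"
        "Drosophila pseudoobscura\n"
    )
    print(shared_taxon_ids(read_name_cross_reference(sample)))
```

To run it against the real file, replace the sample with
open("nameCrossReference.txt") after downloading.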

The "alignments" directory contains compressed FASTA alignments
for the NCBI RefSeq Gene CDS regions of the D. melanogaster genome
(dm6, Aug. 2014) aligned to the assemblies.

The upstream*.maf.gz files contain alignments in regions upstream of
annotated transcription starts for the NCBI RefSeq Genes with annotated 5' UTRs.
These files differ from the standard MAF format: they display
alignments that extend from start to end of the upstream region in
D. melanogaster, whether or not alignments actually exist. In situations where
no alignments exist or the alignments of one or more species are missing,
dot (".") is used as a placeholder. Multiple regions of an assembly's
sequence may align to a single region in the D. melanogaster sequence; therefore,
only the species name is displayed in the alignment data and no position
information is recorded. The alignment score is always zero in these files.
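
The dot placeholder convention described above can be handled with a small
reader. The following Python sketch is a rough illustration, not a full MAF
parser: it assumes each block starts with an "a" line, that each "s" line
carries the species name as its second field and the aligned text as its
last field, and that a species counts as missing when its text consists
only of dots. The function names and the sample block are made up for this
example.

```python
def parse_blocks(lines):
    """Yield {'score': float or None, 'rows': [(name, text), ...]} per block."""
    block = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            # A blank line ends the current block.
            if block is not None:
                yield block
                block = None
            continue
        if fields[0].startswith("#"):
            continue  # header and comment lines
        if fields[0] == "a":
            if block is not None:
                yield block
            block = {"score": None, "rows": []}
            for item in fields[1:]:
                key, _, value = item.partition("=")
                if key == "score":
                    block["score"] = float(value)
        elif fields[0] == "s" and block is not None and len(fields) >= 3:
            # Assumption: second field is the name, last field is the text.
            block["rows"].append((fields[1], fields[-1]))
    if block is not None:
        yield block


def missing_species(block):
    """Names whose aligned text is made up entirely of '.' placeholders."""
    return [name for name, text in block["rows"] if set(text) == {"."}]


if __name__ == "__main__":
    # Made-up block for illustration; real files are gzip-compressed.
    sample = """##maf version=1
a score=0.000000
s dm6 ACGTAC
s droPse3 ACG-AC
s D_pseudoobscura_1 ......
"""
    for blk in parse_blocks(sample.splitlines()):
        print(blk["score"], missing_species(blk))
```

For the real files, gzip.open(path, "rt") from the Python standard library
can supply the lines without decompressing to disk first.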

For a description of multiple alignment format (MAF), see

The phastCons data can be found at:

The phyloP data can be found at:

For more information about this data, see the track
description for the Conservation track:

Note: the maf/*.maf.gz files amount to 156 Gb of data when uncompressed;
compressed, they are approximately 18 Gb.  The entire set of
data in this directory is approximately 20 Gb.


To download a large file or multiple files from this directory, we recommend
that you use rsync or ftp rather than downloading the files via our website.

Via rsync:
rsync -avz --progress \
        rsync:// ./

Via FTP:
    user name: anonymous
    password: <your email address>
    go to the directory goldenPath/dm6/multiz124way

To download multiple files from the UNIX command line, use the "mget" command.
    mget <filename1> <filename2> ...
    - or -
    mget -a (to download all the files in the directory)
Use the "prompt" command to toggle the interactive mode if you do not want
to be prompted for each file that you download.
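
Once files have been downloaded, the md5sum.txt file listed above can be
used to check them. The Python sketch below is one possible way to do that.
It assumes md5sum.txt follows the usual md5sum output layout (a hex digest,
whitespace, then a file name relative to the directory); the function names
are chosen for this example only.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_lines(lines):
    """Map file name -> expected digest.

    Assumption: each line is '<digest><whitespace>[*]<file name>'.
    """
    expected = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        digest, _, name = line.partition(" ")
        expected[name.lstrip(" *")] = digest.lower()
    return expected


def verify_directory(directory, md5sum_name="md5sum.txt"):
    """Return {file name: True/False}; False means missing or mismatched."""
    directory = Path(directory)
    with open(directory / md5sum_name) as handle:
        expected = parse_md5sum_lines(handle)
    results = {}
    for name, digest in expected.items():
        path = directory / name
        results[name] = path.is_file() and md5_of_file(path) == digest
    return results


if __name__ == "__main__":
    # Checks the current directory, assuming md5sum.txt was downloaded into it.
    for name, ok in verify_directory(".").items():
        print("OK     " if ok else "FAILED ", name)
```

Where the md5sum command is available, "md5sum -c md5sum.txt" performs the
same check from the shell.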

All the files in this directory are freely usable for any
purpose. For data use restrictions regarding the individual
genome assemblies, see

      Name                            Last modified      Size  Description
      Parent Directory                                     -
      alignments/                     2018-11-28 08:23     -
      assemblyInformation.txt         2018-11-28 16:46   14K
      dm6.124way.scientificName.nh    2024-06-05 14:32  7.7K
      dm6.124way.sequenceNames.nh     2018-12-21 11:33  6.7K
      dm6.124way.taxId.nh             2018-12-21 11:41  6.0K
      maf/                            2018-11-27 15:36     -
      md5sum.txt                      2018-12-21 15:04   547
      nameCrossReference.txt          2018-11-28 09:37  8.7K
      sciNameToUcscDbName.txt         2018-11-28 09:45   533
      sequences/                      2018-12-21 15:10     -
      upstream1000.ncbiRefSeq.maf.gz  2018-12-21 11:58  495M
      upstream2000.ncbiRefSeq.maf.gz  2018-12-21 12:51  1.0G
      upstream5000.ncbiRefSeq.maf.gz  2018-12-21 14:16  2.7G