This file is from:
http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/multiz7way/README.txt
This directory contains compressed multiple alignments of 7 virus sequences.
These 7 sequences represent coronavirus strains in human populations.
The reference sequence for this collection is:
NC_045512v2 - 2019-12-30 - Wuhan-Hu-1
https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2
Description files in this directory:
md5sum.txt - md5 sums to verify copied files
wuhCor1.7way.nameList.txt - maps each accession name to its
sequence name and sample collection date
wuhCor1.7way.nh - phylogenetic tree used for the multiz alignment.
The tree was computed by neighbor joining on a distance matrix of
31-mer frequency similarity, using the PHYLIP toolset:
http://evolution.genetics.washington.edu/phylip.html
'neighbor' command:
http://evolution.genetics.washington.edu/phylip/progs.data.dist.html
wuhCor1.multiz7way.maf.gz - multiple alignments with gap annotation;
sequences are labeled with accession identifiers
dnaFasta7.fa.tgz - gzipped tar file for the DNA fasta, 7 sequences
- to extract sequences: tar xvzf dnaFasta7.fa.tgz
- creates seven files:
# -rw-rw-r-- 1 27718 May 13 08:44 CoV229E.fa
# -rw-rw-r-- 1 30361 May 13 08:44 HKU1.fa
# -rw-rw-r-- 1 30557 May 13 08:44 MERS.fa
# -rw-rw-r-- 1 27954 May 13 08:44 NL63.fa
# -rw-rw-r-- 1 31188 May 13 08:44 OC43.fa
# -rw-rw-r-- 1 30190 May 13 09:37 SARS_CoV_1.fa
# -rw-rw-r-- 1 30515 May 13 08:49 wuhCor1.fa
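A minimal sketch of checking the downloaded files against md5sum.txt and
then unpacking the FASTA archive. It assumes GNU md5sum and tar are
installed, and that md5sum.txt uses the usual "checksum  filename" layout
that "md5sum -c" reads. The function name verify_and_extract is
hypothetical and is not part of any UCSC tool.

```shell
# verify_and_extract [DIR]
# Hypothetical helper: verifies every file listed in DIR/md5sum.txt, then
# unpacks DIR/dnaFasta7.fa.tgz into DIR if the archive is present.
# Returns non-zero if any checksum fails or a listed file is missing.
verify_and_extract() {
    dir=${1:-.}
    (
        cd "$dir" || exit 1
        # md5sum -c recomputes each listed checksum and reports OK or FAILED.
        md5sum -c md5sum.txt || exit 1
        if [ -f dnaFasta7.fa.tgz ]; then
            tar xzf dnaFasta7.fa.tgz
        fi
    )
}
```

Usage: run "verify_and_extract ." in the directory holding the downloaded
files. Because "md5sum -c" also fails on files that are listed but absent,
this sketch expects the whole directory to have been downloaded.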
Example measurement of the sequences with the 'faCount' command:
# faCount *.fa
#
# #seq len A C G T N cpg
# CoV229E 27317 7420 4549 5903 9445 0 488
# HKU1 29926 8331 3895 5699 12001 0 340
# MERS 30119 7900 6116 6304 9799 0 711
# NL63 27553 7253 3979 5516 10805 0 332
# OC43 30741 8502 4660 6649 10930 0 485
# SARS_CoV_1 29751 8481 5940 6187 9143 0 568
# NC_045512v2 29903 8954 5492 5863 9594 0 439
# total 205310 56841 34631 42121 71717 0 3363
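If faCount is not installed, the per-sequence columns above can be
approximated with awk. This is a sketch, not the faCount implementation:
the function name fa_count is hypothetical, it prints no "total" row, and
it assumes the cpg column is the number of "CG" dinucleotides, counted
case-insensitively and across line breaks.

```shell
# fa_count FILE...
# Prints one tab-separated row per FASTA record:
#   name  len  A  C  G  T  N  cpg
fa_count() {
    awk '
        function report() {
            if (name != "") {
                printf "%s\t%d\t%d\t%d\t%d\t%d\t%d\t%d\n", name, len, n["A"], n["C"], n["G"], n["T"], n["N"], cpg
            }
        }
        /^>/ {
            report()                  # flush the previous record, if any
            name = substr($1, 2)      # first word after ">"
            len = 0; cpg = 0; prev = ""
            n["A"] = n["C"] = n["G"] = n["T"] = n["N"] = 0
            next
        }
        {
            seq = toupper($0)
            gsub(/[^A-Z]/, "", seq)   # drop whitespace and carriage returns
            for (i = 1; i <= length(seq); i++) {
                b = substr(seq, i, 1)
                n[b]++
                if (prev == "C" && b == "G") cpg++
                prev = b              # carried over to the next line
            }
            len += length(seq)
        }
        END { report() }
    ' "$@"
}
```

Usage: "fa_count *.fa" in the directory where the archive was extracted.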
For a description of multiple alignment format (MAF), see
http://genome.ucsc.edu/goldenPath/help/maf.html.
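As a quick orientation to the MAF layout, the sketch below summarizes a
gzipped MAF file per source sequence. It relies only on the documented
line types: an "a" line opens an alignment block, and each "s" line holds
src, start, size, strand, srcSize and the aligned text. The function name
maf_summary is hypothetical.

```shell
# maf_summary FILE.maf.gz
# For each source sequence, prints (tab-separated, sorted by name):
#   src  number_of_blocks_containing_it  total_aligned_bases
maf_summary() {
    gzip -dc "$1" | awk '
        # Field 2 of an "s" line is the source name; field 4 is the number
        # of non-gap bases from that source in the block.
        $1 == "s" { rows[$2]++; bases[$2] += $4 }
        END {
            for (src in rows) {
                printf "%s\t%d\t%d\n", src, rows[src], bases[src]
            }
        }
    ' | sort
}
```

Usage: "maf_summary wuhCor1.multiz7way.maf.gz". The reference sequence
should appear in every block, so its block count gives the total number of
alignment blocks.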
---------------------------------------------------------------
To download a large file or multiple files from this directory, we recommend
that you use rsync or ftp rather than downloading the files via our website.
Via rsync:
rsync -avz --progress \
rsync://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/multiz7way/ ./
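To fetch only selected files rather than the whole directory, the same
rsync location can be combined with a file name. The wrapper below is a
sketch: the names fetch_multiz7way, MULTIZ7WAY_RSYNC and DRY_RUN are
hypothetical conveniences, and only the rsync options already shown above
are used.

```shell
# Base rsync location, taken from the command above.
MULTIZ7WAY_RSYNC=rsync://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/multiz7way

# fetch_multiz7way FILE...
# Hypothetical helper: downloads only the named files into the current
# directory. Set DRY_RUN=1 to print the rsync commands without running them.
fetch_multiz7way() {
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "rsync -avz --progress $MULTIZ7WAY_RSYNC/$f ./"
        else
            rsync -avz --progress "$MULTIZ7WAY_RSYNC/$f" ./ || return 1
        fi
    done
}
```

Usage: "fetch_multiz7way md5sum.txt wuhCor1.multiz7way.maf.gz".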
Via FTP:
ftp hgdownload.soe.ucsc.edu
user name: anonymous
password: <your email address>
go to the directory goldenPath/wuhCor1/multiz7way
To download multiple files from the UNIX command line, use the "mget" command.
mget <filename1> <filename2> ...
- or -
mget -a (to download all the files in the directory)
Use the "prompt" command to toggle the interactive mode if you do not want
to be prompted for each file that you download.
---------------------------------------------------------------
All the files in this directory are freely usable for any
purpose. For data use restrictions regarding the individual
genome assemblies, see http://genome.ucsc.edu/goldenPath/credits.html.
---------------------------------------------------------------
Name                         Last modified      Size
wuhCor1.multiz7way.maf.gz    2020-05-13 10:49   43K
wuhCor1.7way.nh              2020-05-19 13:06   178
wuhCor1.7way.nameList.txt    2020-05-19 13:03   289
md5sum.txt                   2020-05-19 13:10   221
dnaFasta7.fa.tgz             2020-05-19 12:05   62K