|
The purpose of the Assembly Clone
Overlap pages is to detail the amount of overlap between
clones in each of these maps. Each clone sequence in the
freeze is compared against every other clone sequence in the
freeze for sequence overlap using Jim Kent's BLAT
program. Phase 0 sequences in the freeze are not considered.
This matching is done three separate times using different
levels of stringency, giving strong, medium, and weak
overlaps. The criteria are specified using flags to Jim's
program, g2gOverlap, which is used as a filter after running
the sequences through BLAT. The g2gOverlap program computes
the number of matching bases between two accessions (each
accession represented as a set of sequence contigs) taken from
alignments that satisfy certain properties (defaults in
parentheses):
- maxBad% (1) - Maximum percentage of mismatches and inserts
- maxTail (200) - Maximum non-aligning section on end of sequence contig
- minUnique (50) - Minimum number of non-repeat-masked matching bases
- minMatch (100) - Minimum number of matching bases
- minFragSize (200) - Minimum size of sequence contigs
For each of the levels of matching, the following parameters are used:
- Weak: maxBad 0.020000, maxTail 500, minUnique 16, minMatch 50,
minFragSize 200
- Medium: maxBad 0.010000, maxTail 100, minUnique 200, minMatch 400,
minFragSize 1500
- Strong: maxBad 0.010000, maxTail 100, minUnique 1000, minMatch 2000,
minFragSize 3000
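The three levels above can be sketched as threshold sets applied to a single alignment. This is only an illustration, not the g2gOverlap implementation: the `Alignment` record and its field names are hypothetical, and the sketch assumes an alignment's mismatch/insert measure is expressed in the same unit as the maxBad value listed for each level (the text does not say whether that is a fraction or a percentage).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criteria:
    max_bad: float      # maxBad, exactly as listed for the level
    max_tail: int       # maxTail
    min_unique: int     # minUnique
    min_match: int      # minMatch
    min_frag_size: int  # minFragSize


# Values copied from the Weak / Medium / Strong lines above.
LEVELS = {
    "weak": Criteria(0.02, 500, 16, 50, 200),
    "medium": Criteria(0.01, 100, 200, 400, 1500),
    "strong": Criteria(0.01, 100, 1000, 2000, 3000),
}


@dataclass(frozen=True)
class Alignment:
    """Hypothetical summary of one alignment between two sequence contigs."""
    bad: float           # mismatches and inserts, same unit as Criteria.max_bad (assumed)
    tail: int            # non-aligning bases on the end of the sequence contig
    unique_matches: int  # non-repeat-masked matching bases
    matches: int         # matching bases
    frag_size: int       # size of the sequence contig


def passes(aln: Alignment, c: Criteria) -> bool:
    """Return True if the alignment satisfies every threshold in c."""
    return (
        aln.bad <= c.max_bad
        and aln.tail <= c.max_tail
        and aln.unique_matches >= c.min_unique
        and aln.matches >= c.min_match
        and aln.frag_size >= c.min_frag_size
    )


def strictest_level(aln: Alignment) -> Optional[str]:
    """Return the strictest level the alignment passes, or None if it passes none."""
    for name in ("strong", "medium", "weak"):
        if passes(aln, LEVELS[name]):
            return name
    return None
```

For example, `strictest_level(Alignment(0.005, 50, 300, 500, 2000))` passes the medium thresholds but fails the strong ones on minUnique, minMatch, and minFragSize.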
The actual overlaps of the clones in the assembled draft
sequence can be seen under the Coverage track in the
corresponding freeze version of the UCSC Human Genome Browser.
The Clone Overlap pages show the results of these matches in
tables with the following columns:
- Links - links to other pages with information for this accession:
Summary (S), Genetic (G), YAC (Y), RH (R), BAC End Pairs (B), Overlaps (O)
- Contig1 - name of contig containing first accessioned clone
- Acc1 - GenBank accession of first accessioned clone.
- Start1 - base pair in chromosome where clone starts.
- Phs - phase of Acc1 sequence in freeze.
- Chr2 - chromosome containing second accessioned clone
- Contig2 - name of contig containing second accessioned clone
- Acc2 - GenBank accession of second accessioned clone.
If this clone is also used in this map, clicking on it will
cause the contig containing that clone to be displayed with
the accession at or near the top.
- Start2 - base pair in chromosome where clone starts.
- Phs - phase of Acc2 sequence in freeze.
- Strong Overlap - number of bases matched between the two clones using
the strictest matching criteria
- Medium Overlap - number of bases matched between the two clones
which are not matched using the strictest criteria but are
matched using the medium criteria.
- Weak Overlap - number of bases matched between the two clones
which are not matched using either of the stricter criteria,
but are matched using the weakest criteria.
- Total Overlap - the sum of strong, medium, and weak overlap matches.
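Because the Medium and Weak columns count only bases not already counted at a stricter level, the four overlap columns can be derived as in the minimal sketch below. The representation is an assumption made for illustration: each argument is a hypothetical set of matched base positions at one stringency level, which is not necessarily how the pages are actually generated.

```python
def overlap_columns(strong_bases: set, medium_bases: set, weak_bases: set) -> dict:
    """Compute the Strong, Medium, Weak, and Total Overlap column values.

    Each argument is the (hypothetical) set of base positions matched
    between two clones under that level's criteria.
    """
    strong = len(strong_bases)
    # Matched at the medium level but not at the strong level.
    medium = len(medium_bases - strong_bases)
    # Matched at the weak level but at neither stricter level.
    weak = len(weak_bases - medium_bases - strong_bases)
    return {
        "strong": strong,
        "medium": medium,
        "weak": weak,
        "total": strong + medium + weak,
    }
```

With 100 bases matched at the strong level, 150 at the medium level, and 180 at the weak level (each set containing the stricter one), the columns would read 100, 50, 30, and 180.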
There are separate tables for each of the contigs in a
chromosome. Each accession begins with a white line
that reports the amount of overlap with the accession
immediately following it within the same contig. Ideally,
these overlaps are significant. In some cases,
though, the overlap between finished clones may not be
significant because the clones were trimmed for sequencing
purposes.
Following the white line for each accession are the
significant overlaps with other clones in the freeze.
Currently, an overlap is considered significant if there are
at least 10,000 bases that strongly match. To help
identify where the second clone is placed in the draft
sequence in relation to the first, the lines are colored
as follows:
- Green - the second clone is in the same contig.
- Pink - the second clone is in the same chromosome, but on a different
contig
- Red - the second clone is on a different chromosome.
- Yellow - the second clone is in the freeze,
but is not used in this map.
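The significance threshold and the color rules above can be written as a small decision function. This is a sketch under stated assumptions: the `Placement` record and the function names are hypothetical, and a second clone that is in the freeze but not used in this map is represented here as `None`.

```python
from dataclasses import dataclass
from typing import Optional

# An overlap is listed only if at least this many bases strongly match.
SIGNIFICANT_STRONG_BASES = 10_000


@dataclass(frozen=True)
class Placement:
    """Hypothetical record of where a clone sits in the map."""
    chrom: str
    contig: str


def is_significant(strong_overlap: int) -> bool:
    """Return True if the strong overlap meets the significance threshold."""
    return strong_overlap >= SIGNIFICANT_STRONG_BASES


def line_color(first: Placement, second: Optional[Placement]) -> str:
    """Pick the row color for the second clone relative to the first."""
    if second is None:
        return "yellow"  # in the freeze, but not used in this map
    if second.chrom != first.chrom:
        return "red"     # on a different chromosome
    if second.contig != first.contig:
        return "pink"    # same chromosome, different contig
    return "green"       # same contig
```

The chromosome is compared before the contig so that the result does not depend on contig names being unique across chromosomes.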
The pages are designed so that you can view one contig at a
time, or all contigs for a single chromosome at once. A
header frame is provided on each page to allow for easier
navigation through the table, to provide a reference for the
meanings of the line colors, and to display the table column
names for convenience when viewing longer contigs.
Warning: Many of the pages contain very large tables
and may take a while to load, especially for the larger
chromosomes. Please be patient.
|