Read Location From Amos File
12.7 years ago
Kelly ▴ 10

Hey all,

I'm doing a project involving scaffolding with paired-end reads, and I want to find the distance between two contigs connected by a read pair.
I'm fairly new to bioinformatics, but I'm under the impression that you can infer a minimum distance by measuring the distance from each read to the end of the contig it lies in.

My problem lies with extracting this information from an AMOS (afg) file. It seems this information lies in the 'offset' field, but I'm unsure of what this number actually represents. I've browsed around on the web, but haven't found any resource that helps with this problem specifically.

Any help would be greatly appreciated.

distance read scaffolding • 2.5k views
12.7 years ago

You are right that you can calculate the distance by measuring from each read to the ends of its contig.

From http://www.cbcb.umd.edu/research/contig_representation.shtml,

{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}

The offset field marks the start of the read. In this case, off:0 means the read starts at the beginning of the contig. clr:618,0 means 618 bases of the read match the contig, and because the range runs from high to low, the read is in reverse orientation.
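As a rough illustration of reading these fields, here is a minimal Python sketch that parses a single TLE record like the one above. The function name `parse_tle` and the derived keys `reverse` and `aligned_length` are made up for this example and are not part of AMOS; it also assumes the record looks exactly like the sample (one field per line, gap list terminated by a lone `.`).

```python
def parse_tle(block):
    """Parse one {TLE ...} record into a dict (hypothetical helper, not an AMOS API)."""
    record = {"gap": []}
    in_gap = False
    for raw in block.strip().splitlines():
        line = raw.strip()
        if not line or line in ("{TLE", "}"):
            continue
        if in_gap:
            # The gap list runs until a line holding only "."
            if line == ".":
                in_gap = False
            else:
                record["gap"].extend(int(x) for x in line.split())
            continue
        key, _, value = line.partition(":")
        if key == "gap":
            in_gap = True
            record["gap"].extend(int(x) for x in value.split())
        elif key == "clr":
            start, end = (int(x) for x in value.split(","))
            record["clr"] = (start, end)
        elif key in ("src", "off"):
            record[key] = int(value)
    start, end = record["clr"]
    # Assumption taken from the answer above: start > end means reverse orientation.
    record["reverse"] = start > end
    record["aligned_length"] = abs(start - end)
    return record


example = """{TLE
src:1027
off:0
clr:618,0
gap:
250 612
.
}"""

tle = parse_tle(example)
print(tle["off"], tle["clr"], tle["reverse"], tle["aligned_length"])
```

A real afg file nests many such records inside contig blocks, so this would need a surrounding loop to be useful on actual data.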

So from the offset you get the location of the read, then you can get the distance for paired reads. For scaffolding, you have to also pay attention to the read orientation with respect to the contigs.

contig1              contig2
=============        ============
    ---->               ---->

Then you need to flip contig2, since you expect the reads to point towards one another.

contig1              contig2(-)
=============        ============
    ---->                <----
    |------- distance -------|
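The arithmetic behind the picture can be sketched as follows. This is only an illustration under stated assumptions: coordinates are 0-based, the read on contig1 points forward (towards contig2), gaps in the alignment are ignored, and `insert_size` is the expected library insert size, which is not in the AMOS record shown above and has to come from elsewhere. The function names and the numbers in the example are invented.

```python
def flip_offset(off, read_len, contig_len):
    """Offset of a read after its contig is reverse-complemented."""
    return contig_len - (off + read_len)


def estimate_gap(insert_size, len1, off1, off2, read_len2):
    """Estimate the gap between contig1 and contig2 from one read pair.

    off1: start of the forward read on contig1
    off2: start of the reverse read on contig2 (after any flip)
    """
    in_contig1 = len1 - off1        # read start to the right end of contig1
    in_contig2 = off2 + read_len2   # left end of contig2 to the far end of the mate
    return insert_size - in_contig1 - in_contig2


# Made-up numbers: both reads point forward, so contig2 is flipped first.
len1, len2 = 1000, 800
off1 = 700
off2_flipped = flip_offset(off=600, read_len=100, contig_len=len2)
gap = estimate_gap(insert_size=3000, len1=len1, off1=off1,
                   off2=off2_flipped, read_len2=100)
print(off2_flipped, gap)
```

The sum `in_contig1 + in_contig2` is the minimum distance spanned by the pair (the two contigs touching); whatever is left of the insert size is the gap estimate for that one pair.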

This is not part of what you asked, but if you are thinking of a scaffolding solution based on AMOS files, you are likely doing something similar to BAMBUS and BAMBUS2. BAMBUS looks at each read pair, calculates contig links, and then bundles them into contig edges. It then uses a set of heuristics to produce a linear order of the contigs. The paper is worth studying.


I am not certain, but it might be that there is an unaligned portion of the read. Say the first 3 bases of the read are not in the contig; then the offset would be -3. What does the clr say in that case?


If the offset is the location of the beginning of the read, could you help me understand why some of the offsets are negative?


One example: off: -72 clr: 100,0


Hmm. Try to verify the meaning of the offset by extracting the read sequence and comparing it to the contig consensus. The documentation is not very clear on what it means.
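One way to run that check is sketched below in Python. It is an untested guess at the semantics, not a statement of how AMOS defines the field: it assumes 0-based offsets, takes the clear range as a slice of the read, reverse-complements when the range runs high to low, ignores the gap list entirely, and skips positions that fall outside the consensus. The function names and the toy sequences are invented.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def identity_at_offset(read, clr, off, consensus):
    """Fraction of clear-range bases matching the consensus when placed at off."""
    start, end = clr
    if start > end:
        aligned = revcomp(read[end:start])  # assumed: high-to-low means reverse
    else:
        aligned = read[start:end]
    matches = compared = 0
    for i, base in enumerate(aligned):
        pos = off + i
        if 0 <= pos < len(consensus):  # skip bases hanging off the contig
            compared += 1
            matches += base == consensus[pos]
    return matches / compared if compared else 0.0


consensus = "ACGTTGCAAGGC"
# Forward read placed at offset 2
print(identity_at_offset("GTTGCA", (0, 6), 2, consensus))
# Reverse read (clr runs high to low) placed at offset 2
print(identity_at_offset("TGCAAC", (6, 0), 2, consensus))
# Negative offset: first two bases would overhang the contig start
print(identity_at_offset("GGACGT", (0, 6), -2, consensus))
```

If reads with negative offsets score close to 1.0 under this placement, that supports the overhang idea from the earlier comment; if they score poorly, the offset probably means something else (for example, a gapped coordinate system).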
