UCSC MAF format left or right align gaps?
7.0 years ago
jxmavs ▴ 10

Hello,

Does anyone know whether the MAF format has a convention for left- or right-aligning its gaps/indels, as VCF files do (where an indel is always left-aligned)? I generated my own multiple sequence alignment using the multiz pipeline from UCSC and found that in some cases a gap is right-aligned, while in other cases it's left-aligned.

What I mean is the following. Assume that I have 3 sequences, and look only at the sequence part of each MAF line:

A-TTA

A-TTA

ATTTA

In this case the gap is left-aligned. If it were right-aligned, I would see the following:

ATT-A

ATT-A

ATTTA

In my MAF files I see both cases, which makes me wonder whether there is a set convention or not. If there isn't, I would also appreciate references to any available tools that can left-align these gaps in the alignment file.
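
To make the question concrete, here is a minimal Python sketch of what "left-aligning" a gap could mean for a single alignment row. It is not an existing tool or a UCSC convention: `left_align_gaps` is a hypothetical helper, and it assumes that one row of the block is treated as the reference, that both rows have the same number of columns, and that a gap may be moved one column left whenever the base it jumps over still matches the reference at its new column. It makes a single left-to-right pass and does not re-shift gap runs that merge along the way.

```python
def left_align_gaps(row: str, ref: str, gap: str = "-") -> str:
    """Shift each gap run in `row` left while the displaced base still matches `ref`.

    Hypothetical helper for illustration; `ref` is the row chosen as reference.
    """
    if len(row) != len(ref):
        raise ValueError("row and ref must have the same number of alignment columns")

    cells = list(row)
    col = 0
    while col < len(cells):
        if cells[col] != gap:
            col += 1
            continue

        # Find the extent of this gap run: columns [start, end).
        start = end = col
        while end < len(cells) and cells[end] == gap:
            end += 1
        resume = end  # continue scanning after the original run

        # Move the run one column left at a time. The base just left of the
        # run jumps to the run's last column, so it must match ref there.
        while (
            start > 0
            and cells[start - 1] != gap
            and ref[end - 1] != gap
            and cells[start - 1].upper() == ref[end - 1].upper()
        ):
            cells[end - 1] = cells[start - 1]
            cells[start - 1] = gap
            start -= 1
            end -= 1

        col = resume

    return "".join(cells)


# The right-aligned row from the example, using ATTTA as the reference row.
print(left_align_gaps("ATT-A", "ATTTA"))  # prints A-TTA
```

The ungapped sequence of the row is unchanged by construction, since bases are only moved across gap columns and never reordered.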

genome ucsc MAF • 1.6k views
4.0 years ago

It is likely something that was overlooked during the processing of the data, i.e., some labs performed left-alignment while others did not. Notably, different labs also used different variant callers for the data. So the Level 3 [open access] TCGA MAF data is a real mixed bag and needs to be used with the utmost caution.

Your post is 3 years old, so perhaps the situation has improved now that the TCGA consortium has 'harmonised' all of its data.

Kevin
