Parse allele database
13 months ago
oghzzang • 40

Dear Biostars users.

I have variants in the following format, for example:

CHROM  POS  REF  ALT
1      150  CAC  CAAC

Can I convert this into the following format using Python?

CHROM  POS  REF  ALT
1      150  C    CA
1      151  A    AA
python • 143 views

Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Thanks. From now on, I'll use this button. :)

4 months ago
RamRS 21k
Houston, TX

What you're asking for is called left alignment and normalization. It represents each variant in its most parsimonious notation, and it is one of the best practices I've encountered and continue to use all the time.

If you have the VCF file this data comes from and the reference sequence used in the analysis, you can use either `bcftools norm` (bcftools) or `vt decompose | vt normalize` (vt) to get what you need from the VCF file. I'd recommend the latter, as it makes tracking changes easier by adding the OLD_MULTIALLELIC and OLD_VARIANT INFO fields.

If not, it becomes a much more challenging task because you're going to need to compare the REFERENCE sequence and ALT alleles manually to get to your solution.
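
For the manual route, here is a minimal Python sketch of the allele-trimming part only. It is an illustration, not what `bcftools` or `vt` do internally: the function name `trim_alleles` is made up for this example, and because no reference sequence is available it cannot shift an indel leftwards through a repeat. It only removes bases shared by REF and ALT, which is enough for the example in the question and yields a single record.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper: parsimony trimming only, no reference lookup,
    so it does not perform true left alignment across repeats.
    """
    # Trim the shared suffix first; POS is unaffected by this step.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, moving POS one base right per trimmed base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# The record from the question: 1  150  CAC  CAAC
print(trim_alleles(150, "CAC", "CAAC"))  # (150, 'C', 'CA')
```

Trimming the suffix before the prefix is what keeps the anchor base at position 150 here. For variants inside longer repeats you would still need the reference sequence and one of the tools above.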


Thank you for your help.
