Remove Gaps from multiple Alignment
1
0
Entering edit mode
6.6 years ago
misbahabas ▴ 70

Asslam u Alikum

I want to remove gaps from multiple sequence alignment , i tried many tools and scripts but it works in different ways.

H= -----ATGCGTACG-TGCA--C
M= ATGCCCGATCGCH----ATC

I want to remove gaps according to reference like if gaps is present in first line (reference) remove that column like this

H= ATGCGTACGTGC
M= CCCGATCGC----C

but different tools remove all gaps columns from whole file. please tell me any idea about it

alignment command sequence • 5.1k views
ADD COMMENT
0
Entering edit mode

please check your alignment, your expected output doesn't make any sense. 'M=' should end with 'T'

ADD REPLY
0
Entering edit mode

sorry it just a example both sequence have same lengths sorry again

>1
------------mfelaeySGLL---TLFL-IASFPIFT-SPIG---

>2
------------mfelsgyAVLLFFMVIFL-VASFPLLS-SPIG---

>3
--------MKLF-------TFLLFFL-LLC-LSMIPLLS-SPVS---

i want to remove gaps column according to >1 sequence

ADD REPLY
0
Entering edit mode

and what is 'H' here ??? what is the expected output ?

ADD REPLY
0
Entering edit mode

input

HUMAN= --ATGC-C

MOUSE= ATGC--T-

I just a example input, Now i want output like this

HUMAN= ATGCC

MOUSE= GC---

I want to remove gap columns according to reference not between all the file, here human is reference

ADD REPLY
0
Entering edit mode
6.6 years ago
Joe 21k

Easiest way to proceed, to my mind is to do the following (fair warning though, this is dirty):

Split the multifasta up first:

#!/bin/bash
i=1;
while read line ; do
  if [ ${line:0:1} == ">" ] ; then
    header="$line"
    echo "$header" >> seq"${i}".fasta
  else
    seq="$line"
    echo "$seq" >> seq"${i}".fasta
    ((i++))
  fi
done < myalignment.fasta

Assuming your human reference sequence is always the first one, this script will generate a file for each, e.g. seq1.fasta, seq2.fasta, seq3.fasta etc.

This means your reference should always be the first seq1.fasta.

Then to remove the gaps, you can simply do sed -i 's/-//g' seq1.fasta.

Then concatenate them all back together if you need a multifasta again:

cat seq*.fasta > edited_multifasta.fasta

ADD COMMENT
0
Entering edit mode

No doubt this could be achieved in a single script with Biopython which would also be a safer option, but I had this code to hand.

ADD REPLY

Login before adding your answer.

Traffic: 2489 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6