Biostar Beta. Not for public use.
Question: Removing unmatched Ns from query sequence based on alignment
0
Entering edit mode

Assume a contig of draft genome assembly that contains several regions of unknown sequence and length, which are indicated by stretches of 'N'. Further assume the presence of a contig of the same genome region, whose source is a complete and trustworthy genome assembly of the same organism. Finally, assume that the contig of the draft genome and the contig of the final genome have been properly aligned (i.e., positional homology is established).

The result would be something like this:

$ cat alignm.fas
>contig_draft
ACGTNNNNNNNNNNACGTNNNNNNNNNNACGT
>contig_final
ACGTACGT------ACGT------ACGTACGT

I would like to remove those 'N' from the draft contig that are matched by a gap character in the alignment (n=12 in the above example), but retain those that are matched by a nucleotide (n=8 in the above example).

In the above example, the desired output would be a new draft contig sequence with a length of 20 nucleotides, 8 of which are 'N'.

$ cat alignm.fas | commands_sought
>contig_draft
ACGTNNNNACGTNNNNACGT

Is there any code (e.g., on GitHub) already available to conduct the above-mentioned procedure?

Entering edit mode
0

You might have a look at some of the MSA trimming tools? eg. trimAl or such?

I assume they will not be default give what you want , but with a bit of parameter tuning you might get there

ADD REPLYlink 22 months ago
lieven.sterck
5.1k
Entering edit mode
0

Hi, a novice attempt with t_coffee would be

t_coffee -other_pg seq_reformat -in alignm.fas -action +rm_gap 1
>contig_draft
ACGTNNNNACGTNNNNACGT

contig_final ACGTACGTACGTACGTACGT

But this will just remove all gapped columns also those with a non-N character which is matched by a gap.

ADD REPLYlink 22 months ago
microfuge
♦ 1.1k
Entering edit mode
0

Nice, I was not aware t_coffee could do that himself.

I would give this a try and see where you end up. If the diff would be only the newly added sequence it will work perfectly. In the other case I think it might still work in most cases, given that there is only a few bases difference they will be aligned as a mismatch rather than a gap (tuning the alignment params might anyway give this result is not already the case)

ADD REPLYlink 22 months ago
lieven.sterck
5.1k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0