How to count the length of fasta sequences?
11 months ago
Kasaragod, Kerala, India

I know how to count the length of a FASTA sequence. However, I need to count the lengths of only those sequences whose headers are listed in a separate text file, and to write each length to column 3 of a specified Excel file. Please help me do this. Thank you in advance.


What have you tried? There are definitely solutions for this on the forum already.


Dear healey, the command below prints the length of each sequence in a FASTA file:

awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta

However, I do not know how to print the lengths of only the headers listed in another text file, or how to write those lengths to an Excel sheet.


You will have to provide sample data.


Suppose I have a multi-FASTA file, org1.fasta, as shown below:

>seq1
ATGCTA

>seq2
GCTAGTT

>seq3
TAGC

I need to count the lengths of the sequences whose headers are listed in id.txt, shown below:

seq1
seq2

The resulting lengths should be written to the 3rd column (Total length) of another CSV file, org.csv, as shown below:

ID      hit_length    Total length
seq1    3             **6**
seq2    4             **7**

What is hit_length?


The ID and hit_length columns are the existing data in the CSV file. I need to print the length in the 3rd column, as marked with the * symbols.


Just pipe the output through paste - - so that each header and its length end up on one tab-separated line.

unique.fasta

>seq1
ATGCTA

>seq2
GCTAGTT

>seq3
TAGC

awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta |paste - -

stdout

>seq1   6
>seq2   7
>seq3   4

If you need hit_length as the 2nd column, with the length after it, extract the lengths with cut and then join the files with paste (both are standard Unix utilities):

awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta |paste - - |cut -f 2 > col3

col3

6
7
4

Combine the files together

paste file-with-ID-as-col1-and-hit_length-as-col2 col3 > final.result

This assumes that the sequences appear in the same order, and that exactly the same set of sequences is present, in both your results file and unique.fasta.
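When the two files are not in the same order, or the CSV lists only some of the sequences (as in the sample data, where seq3 has no row), a lookup by ID may be more robust than paste. The following is a minimal sketch, not a tested pipeline. It assumes that org.csv is comma-separated with a header row and the ID in column 1, and that the ID is the first word of each FASTA header. The demo_join directory and the output name org_with_length.csv are made up for illustration.

```shell
#!/bin/sh
# Illustrative copies of the sample data from this thread (hypothetical paths).
mkdir -p demo_join
printf '>seq1\nATGCTA\n\n>seq2\nGCTAGTT\n\n>seq3\nTAGC\n' > demo_join/org1.fasta
printf 'ID,hit_length\nseq1,3\nseq2,4\n' > demo_join/org.csv   # assumed layout

# First file (FASTA): sum the residues of each record, keyed by sequence ID.
# Second file (CSV): append the stored length as a third column.
awk -F',' -v OFS=',' '
    NR == FNR {                                   # still reading the FASTA file
        if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id) }
        else      { len[id] += length($0) }       # blank lines add 0
        next
    }
    FNR == 1 { print $0, "Total length"; next }   # CSV header row
    { print $0, (($1 in len) ? len[$1] : "NA") }  # NA if the ID is not in the FASTA
' demo_join/org1.fasta demo_join/org.csv > demo_join/org_with_length.csv

cat demo_join/org_with_length.csv
```

Because the lengths are looked up by ID, the row order of org.csv is preserved and extra FASTA records such as seq3 are simply ignored. A file saved from Excel with Windows line endings would need the trailing carriage returns removed first, otherwise they are counted as part of each line.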

12 months ago
fhsantanna • 440
Brazil

I suggest using faslen from FAST tools. https://github.com/tlawrence3/FAST


Thank you fhsantanna,

faslen does the same thing as this command:

awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' unique.fasta

However, I need the lengths of only the listed FASTA headers.


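You could run faslen, and then grep ">" file.fasta > headers.txt will capture the headers. Note that the ">" must be quoted: unquoted, the shell treats it as a redirection and overwrites file.fasta.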
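To keep only the sequences named in an ID list, the filtering can also be done in a single awk pass, which matches IDs exactly rather than as substrings. This is a rough sketch under a few assumptions: id.txt holds one ID per line, the ID is the first word of each header directly after the ">", and the demo_ids directory and lengths.tsv output name are made up for illustration.

```shell
#!/bin/sh
# Illustrative copies of the sample data from this thread (hypothetical paths).
mkdir -p demo_ids
printf '>seq1\nATGCTA\n\n>seq2\nGCTAGTT\n\n>seq3\nTAGC\n' > demo_ids/org1.fasta
printf 'seq1\nseq2\n' > demo_ids/id.txt

awk '
    NR == FNR { if ($1 != "") want[$1] = 1; next }  # first file: remember wanted IDs
    /^>/ {
        if (keep) print id "\t" n                   # flush the previous wanted record
        id = substr($1, 2); n = 0
        keep = (id in want)
        next
    }
    { n += length($0) }                             # sequence (or blank) line
    END { if (keep) print id "\t" n }               # flush the last record
' demo_ids/id.txt demo_ids/org1.fasta > demo_ids/lengths.tsv

cat demo_ids/lengths.tsv
```

The output is tab-separated and follows the order of the FASTA file, not the order of id.txt, so it may still need to be matched by ID before being added to a spreadsheet.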
