Question: Help with extracting multiple sequences from a fasta file with a list of IDs (four "counting" per line)

I would like to extract multiple sequences from a fasta file using a list of IDs with four "counting" per line. I found several scripts that extract sequences from a fasta file based on a list of IDs, but they all expect one ID per line, whereas my list has four per line. These are the first lines of my ID list:

OG1.5_9691: aco|TRINITY_DN39707_c3_g4_i1.p1 bio|GFMW01138197.1.p1 lym|FX192122.1.p1 physa|Contig31631.p1
OG1.5_9693: aco|TRINITY_DN34744_c0_g1_i2.p1 bio|GFMW01140870.1.p1 lym|FX194372.1.p1 physa|Contig299.p1
OG1.5_9694: aco|TRINITY_DN40605_c7_g1_i1.p1 bio|GFMW01145544.1.p1 lym|FX194851.1.p1 physa|Contig70050.p1
OG1.5_9695: aco|Contig7627.p1 bio|GFMW01145616.1.p1 lym|FX202590.1.p1 physa|Contig22503.p1

I would really appreciate any help extracting these sequences from the fasta file.
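A minimal Python sketch of one way to handle the four-per-line format, assuming that each line is an optional group label ending in ":" (e.g. OG1.5_9691:) followed by whitespace-separated IDs, and that the first word of each fasta header is exactly one of those IDs (e.g. >aco|TRINITY_DN39707_c3_g4_i1.p1). The function names and file names (read_id_list, iter_fasta, select_records, ids.txt, sequences.fasta) are hypothetical, chosen for illustration only.

```python
import sys
from typing import Iterable, Iterator, Set, Tuple


def read_id_list(lines: Iterable[str]) -> Set[str]:
    """Collect every ID from lines like 'OG1.5_9691: aco|X.p1 bio|Y.p1 ...'."""
    ids: Set[str] = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]  # drop the group label, e.g. 'OG1.5_9691:'
        ids.update(fields)  # all remaining words are sequence IDs
    return ids


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()  # also removes Windows '\r' line endings
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # join wrapped sequence lines
    if header is not None:
        yield header, "".join(chunks)


def select_records(fasta_lines: Iterable[str], wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Keep records whose first header word is one of the wanted IDs."""
    for header, seq in iter_fasta(fasta_lines):
        parts = header.split()
        if parts and parts[0] in wanted:
            yield header, seq


def main(id_path: str, fasta_path: str) -> None:
    with open(id_path) as id_file:
        wanted = read_id_list(id_file)
    with open(fasta_path) as fasta_file:
        for header, seq in select_records(fasta_file, wanted):
            sys.stdout.write(f">{header}\n{seq}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python extract_ids.py ids.txt sequences.fasta > selected.fasta
    main(sys.argv[1], sys.argv[2])
```

Storing the IDs in a set keeps each header lookup constant-time, so the fasta file is read only once regardless of how many IDs are in the list. If your headers do not start with exactly these IDs, the matching rule in select_records would need to change.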

Asked 11 months ago by ahmed_bio82 • updated 11 months ago by Pierre Lindenbaum

I have edited the question for you this time, but for future reference, this is a Question not a Tutorial.

It's not clear to me what you mean by extracting IDs by "counting".

Can you show your input data? It looks like you've only shown us one of the 2 files.

Reply by Joe, 11 months ago

The faSomeRecords utility from Jim Kent should extract the sequences, as long as the fasta headers match exactly in both files. Linux version linked. Remember to run chmod a+x faSomeRecords after downloading it and before executing it. I assume "4 counting" means there are four identifiers separated by spaces in your headers?
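faSomeRecords is run as faSomeRecords in.fa listFile out.fa, where listFile holds one sequence name per line, so the four-per-line list would first need to be flattened. A minimal Python sketch, assuming each line is a group label ending in ":" followed by whitespace-separated IDs; the script and file names (flatten_ids, ids.txt, one_per_line.txt) are hypothetical.

```python
import sys
from typing import Iterable, Iterator


def flatten_ids(lines: Iterable[str]) -> Iterator[str]:
    """Turn 'OG...: id1 id2 id3 id4' lines into one ID per item,
    dropping group labels and repeated IDs while keeping input order."""
    seen = set()
    for line in lines:
        fields = line.split()
        if fields and fields[0].endswith(":"):
            fields = fields[1:]  # skip the 'OG1.5_....:' label
        for seq_id in fields:
            if seq_id not in seen:
                seen.add(seq_id)
                yield seq_id


if __name__ == "__main__" and len(sys.argv) == 2:
    # usage: python flatten_ids.py ids.txt > one_per_line.txt
    with open(sys.argv[1]) as handle:
        for seq_id in flatten_ids(handle):
            print(seq_id)
```

The flattened file could then be passed as the list file, e.g. faSomeRecords sequences.fasta one_per_line.txt selected.fasta, with the same caveat that each ID must match the fasta record name exactly.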

Reply by genomax, 11 months ago