transcript and fasta sequence
6.5 years ago
qudrat ▴ 100

Hello all! I have two files, each containing tens of thousands of transcript FASTA sequences with different IDs, and I am interested in finding the sequences common to both files. Somebody please help me, as it is hindering my work. Thank you in advance.

sequence Assembly • 2.1k views

Somebody please help me as it is hindering my work.

In what way?

How about this tool that merges assemblies? See more options here: "High quality de novo transcriptome assembly rely on merging multiple assembly?". Specifically, dedupe.sh from BBMap should be very simple to use.


ten thousands of transcripts fasta sequences each with different ids

Can you post an example? Is this a de novo assembly?


Actually, this is a de novo assembly produced using two different software packages to minimize false positives.

6.5 years ago
glihm ▴ 660

Hello qudrat,

  1. If you are interested in IDENTICAL sequences, you can simply write a very short script that extracts the sequences present in both files.

  2. If you want to apply a "similarity" score instead, I strongly suggest using an alignment tool (BLAST or MUSCLE, for instance) and then parsing the results to get a global overview of the similarity between the sequences in your two files.

  3. EDIT, following @genomax's comment: use an assembly merging tool.
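
For option 1, a minimal sketch of such a script in Python could look like the following. It is only an illustration under stated assumptions: matching is exact and case-insensitive on the same strand (reverse complements are not considered), both files fit in memory, and the function names `read_fasta` and `common_sequences` are made up for this example.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file.

    Multi-line sequences are joined; the leading '>' is dropped from headers.
    """
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def common_sequences(path_a, path_b):
    """Return {sequence: (ids_in_a, ids_in_b)} for sequences found in both files.

    Assumption: sequences are compared as exact, upper-cased strings.
    """
    # Index the first file by sequence so IDs can differ between files.
    ids_a = {}
    for header, seq in read_fasta(path_a):
        ids_a.setdefault(seq.upper(), []).append(header)

    # Keep only the sequences of the second file that also occur in the first.
    shared = {}
    for header, seq in read_fasta(path_b):
        key = seq.upper()
        if key in ids_a:
            shared.setdefault(key, (ids_a[key], []))[1].append(header)
    return shared
```

Calling `common_sequences("assembly_a.fasta", "assembly_b.fasta")` (hypothetical file names) would return a dictionary keyed by the shared sequence, with the matching IDs from each file. Two assemblers rarely produce byte-identical transcripts, so this will likely report far fewer matches than a similarity-based comparison.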


Hi glihm, actually I was thinking of sort, but I do not know how to write scripts. This is a de novo assembly made with two different software packages, and I am doing this to minimize false positives.


Your request is clearer now. In this case, @genomax's suggestion of assembly merging is well suited to your problem.
