Biostar Beta. Not for public use.
How to get the list of all genes present in a SAM file?
2.0 years ago
mail2steff • 50
Potsdam, Germany

I have two SAM files generated by Bowtie from whole-genome sequencing (WGS) data. Separately, I have a set of gene sequences in FASTA format. I need to check whether these genes are present in the SAM files. How can I do this?


This would be easier if you had a GTF, GFF, or BED file of your genes of interest. Do you?

If not, I believe the easiest approach is to align your FASTA sequences to your reference genome, convert the alignments to BED, and use that BED file to count reads in your SAM files (and so check for presence).
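
The counting step suggested above could be sketched roughly as follows in plain Python. This is only an illustration, not the exact commands anyone in the thread used: the function names `read_bed` and `count_reads_per_gene`, the gene names, and the toy records are all made up for the example, and it assumes a four-column BED file (chrom, start, end, name) and uncompressed SAM text. In practice a dedicated tool such as bedtools or samtools would normally do this job.

```python
import re
from collections import defaultdict

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def read_bed(lines):
    """Parse BED lines into {chrom: [(start, end, name), ...]} (0-based, half-open)."""
    genes = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, name = line.rstrip("\n").split("\t")[:4]
        genes[chrom].append((int(start), int(end), name))
    return genes


def count_reads_per_gene(sam_lines, genes):
    """Count mapped reads that overlap each BED interval by at least one base."""
    counts = {name: 0 for ivs in genes.values() for _, _, name in ivs}
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, chrom, pos, cigar = int(fields[1]), fields[2], int(fields[3]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped read, or no alignment recorded
            continue
        start = pos - 1  # SAM POS is 1-based; BED coordinates are 0-based
        end = start + sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in REF_CONSUMING)
        for g_start, g_end, name in genes.get(chrom, []):
            if start < g_end and end > g_start:
                counts[name] += 1
    return counts


# Toy data (hypothetical): two genes and four reads on chr1.
bed = ["chr1\t100\t200\tgeneA\n", "chr1\t500\t600\tgeneB\n"]
sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t151\t42\t50M\t*\t0\t0\t*\t*\n",   # spans 150-200, overlaps geneA
    "r2\t16\tchr1\t191\t42\t20M\t*\t0\t0\t*\t*\n",  # spans 190-210, overlaps geneA
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*\n",           # unmapped, skipped
    "r4\t0\tchr1\t301\t42\t30M\t*\t0\t0\t*\t*\n",   # spans 300-330, overlaps nothing
]
counts = count_reads_per_gene(sam, read_bed(bed))
print(counts)  # {'geneA': 2, 'geneB': 0}
```

Both functions take any iterable of lines, so open file handles for the real BED and SAM files can be passed in directly. A gene with a count of zero has no reads overlapping it in that SAM file.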


Thank you for the reply. I have the FASTA sequences of 25 genes, but no GTF, GFF, or BED file. For the alignment, should I merge the 25 FASTA sequences into one file and align that against the reference genome?


That would be good, yes.


If the genes are merged, they would be treated as one reference, but I think you're looking for each individual gene in your data. So I'd rather keep each gene in its own FASTA file and align them separately. Maybe I misunderstood something here.
Please feel free to correct me.


As I understood it, OP is asking whether the FASTA records should be put together in one file (multi-FASTA) or kept in separate files. I don't think OP wants to merge the FASTA records into a single sequence.
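
To make the multi-FASTA distinction concrete, here is a minimal sketch that puts several FASTA files into one file while keeping every gene as its own record. The function name `build_multifasta` and the file names are hypothetical, chosen only for the example; the sketch assumes plain-text FASTA and treats the first word after `>` as the record name.

```python
import tempfile
from pathlib import Path


def build_multifasta(fasta_paths, out_path):
    """Concatenate FASTA files into one multi-FASTA, keeping every record separate.

    Returns the record names in the order written; raises ValueError on a duplicate name.
    """
    seen = []
    with open(out_path, "w") as out:
        for path in fasta_paths:
            with open(path) as handle:
                for line in handle:
                    line = line.rstrip()
                    if not line:  # skip blank lines
                        continue
                    if line.startswith(">"):
                        parts = line[1:].split()
                        name = parts[0] if parts else ""
                        if name in seen:
                            raise ValueError(f"duplicate record name: {name!r}")
                        seen.append(name)
                    # Always terminate the line, so a file without a trailing
                    # newline cannot fuse with the next file's header.
                    out.write(line + "\n")
    return seen


# Toy demonstration with two hypothetical single-gene files.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "geneA.fa").write_text(">geneA\nACGTACGT\n")
    (tmp / "geneB.fa").write_text(">geneB some description\nGGCC\nTTAA")  # no trailing newline
    names = build_multifasta([tmp / "geneA.fa", tmp / "geneB.fa"], tmp / "genes.fa")
    merged = (tmp / "genes.fa").read_text()

print(names)  # ['geneA', 'geneB']
```

Because each header line is preserved, an aligner sees 25 separate query sequences in the combined file, not one long merged sequence, so the hits can still be attributed to individual genes.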

14 months ago
National Institutes of Health, Bethesda…

Have you considered using Salmon or Kallisto to quantify your reads against your genes?


That's a good solution.


Will Kallisto also work for WGS?


Sure, if you just want to know whether those genes are covered.


It wasn't clear from your original post that these were DNA sequencing results. In that case, I'd suggest mapping your FASTA sequences against your genome and then looking at the sequencing coverage across the regions to which they map. Kallisto may "work", but I think we'd be pretty far off the beaten path.
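
As a rough illustration of "looking at the coverage" across one mapped region, the sketch below computes breadth (fraction of bases covered by at least one read) and mean depth directly from SAM lines. It is an assumption-laden toy, not the commands from this thread: `aligned_blocks` and `region_coverage` are hypothetical helper names, the region coordinates are invented, and every read in the file is scanned for each region, which only suits small inputs. Tools such as `samtools depth` or `bedtools coverage` are the usual choice for real data.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_blocks(pos, cigar):
    """Yield 0-based half-open reference blocks covered by aligned bases (M, =, X)."""
    ref = pos - 1  # SAM POS is 1-based
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=X":
            yield ref, ref + length
            ref += length
        elif op in "DN":
            ref += length  # deletion or skipped region: moves along the reference only


def region_coverage(sam_lines, chrom, start, end):
    """Return (breadth, mean_depth) for the 0-based half-open region chrom:start-end."""
    if end <= start:
        raise ValueError("region must have positive length")
    depth = [0] * (end - start)
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        f = line.rstrip("\n").split("\t")
        if int(f[1]) & 4 or f[2] != chrom or f[5] == "*":
            continue  # unmapped, on another sequence, or no CIGAR
        for b_start, b_end in aligned_blocks(int(f[3]), f[5]):
            for i in range(max(b_start, start), min(b_end, end)):
                depth[i - start] += 1
    covered = sum(1 for d in depth if d > 0)
    return covered / len(depth), sum(depth) / len(depth)


# Toy data (hypothetical): a 20 bp gene region at chr1:100-120 and two reads.
sam = [
    "@SQ\tSN:chr1\tLN:1000\n",
    "r1\t0\tchr1\t101\t42\t10M\t*\t0\t0\t*\t*\n",      # covers 100-110
    "r2\t0\tchr1\t106\t42\t5M5D5M\t*\t0\t0\t*\t*\n",   # covers 105-110 and 115-120
]
breadth, mean_depth = region_coverage(sam, "chr1", 100, 120)
print(breadth, mean_depth)  # 0.75 1.0
```

A breadth near 1.0 suggests the gene region is essentially fully covered by reads, while a breadth near 0 suggests it is absent from (or not sequenced in) that sample. Where to put the cutoff is a judgment call that depends on sequencing depth.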


Thank you for the commands. I'll check on this.
