Fetching gene sequences from BAM or FASTQ files of RNA-seq data
14 months ago

I have six wheat samples sequenced by RNA-seq. I received forward and reverse FASTQ files and generated BAM files by aligning the reads to the wheat reference genome with HISAT2. I have been asked to build a multiple sequence alignment for three genes from this RNA-seq data. I believe I need to obtain each gene's sequence from all the samples and then run the multiple sequence alignment, but I am stuck on fetching the gene sequence from the BAM files. How do I select one gene sequence for all six samples? Any suggestions? Kindly send me commands for fetching gene sequences from this RNA-seq data, either from the FASTQ files or from the aligned BAM files.


Are you looking for isoforms? Read about transcript assembly with StringTie, or about isoform detection. Once the transcripts are detected or assembled, you can extract their sequences and align them.
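Whichever route is taken, extracting a gene's sequence starts with knowing where the gene sits in the annotation. The following is a minimal sketch, not a complete workflow: `gene_region` is a hypothetical helper name, and it assumes a tab-separated GTF with `gene` features in column 3 and a `gene_id "..."` attribute in column 9.

```shell
# gene_region GTF GENE_ID
# Prints "seqname:start-end" for the first gene feature whose attributes
# contain gene_id "GENE_ID". Assumes a tab-separated GTF file.
gene_region() {
    awk -F'\t' -v id="$2" '
        $0 !~ /^#/ && $3 == "gene" && index($9, "gene_id \"" id "\";") {
            print $1 ":" $4 "-" $5
            exit
        }
    ' "$1"
}

# Possible usage (file names taken from this thread, gene ID as an example):
#   region=$(gene_region Triticum_aestivum.IWGSC.42.gtf TraesCS3B02G000100)
#   samtools view -b G1_sorted.bam "$region" > G1_gene.bam
```

Passing a region to `samtools view` requires a coordinate-sorted, indexed BAM file, and the result is the set of reads overlapping the gene, not yet a single sequence per sample.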

14 months ago

Hi Buffo, thanks. I ran StringTie and got this error. Kindly check it; I do not understand it and could not solve it. Kindly adjust this command for my sample.

./stringtie G1_sorted.bam -B -o G1.gtf -G Triticum_aestivum.IWGSC.42.gtf -p 4 -C G1.refs.gtf -A G1.abund.tab

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences


If you use -C

-C output a file with reference transcripts that are covered by reads

and get

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped!

it literally means that your reads have mapped to regions without annotated transcripts. Be sure you are using the correct annotation, and that column 3 of the GTF contains transcript features.
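Since the warning itself mentions the naming convention of the genome sequences, one quick check is to compare the sequence names in the BAM header with those in column 1 of the GTF. This is a rough sketch under stated assumptions: the function names are hypothetical, the BAM header is assumed to have been saved as text first (for example with `samtools view -H G1_sorted.bam > header.sam`), and the GTF is assumed to be tab-separated.

```shell
# gtf_seqnames GTF: unique sequence names from column 1 of a GTF, sorted.
gtf_seqnames() {
    awk -F'\t' '$0 !~ /^#/ && NF >= 9 { print $1 }' "$1" | sort -u
}

# header_seqnames FILE: sequence names from the SN: tags of the @SQ lines
# in a SAM header saved as text, sorted.
header_seqnames() {
    awk -F'\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4)
    }' "$1" | sort -u
}

# shared_seqnames HEADER GTF: names that appear in both files.
shared_seqnames() {
    hdr_names=$(mktemp); gtf_names=$(mktemp)
    header_seqnames "$1" > "$hdr_names"
    gtf_seqnames "$2" > "$gtf_names"
    comm -12 "$hdr_names" "$gtf_names"
    rm -f "$hdr_names" "$gtf_names"
}

# Possible usage:
#   shared_seqnames header.sam Triticum_aestivum.IWGSC.42.gtf
```

If `shared_seqnames` prints nothing, the two files do not agree on any sequence name (for instance `chr3B` in one and `3B` in the other), which would be consistent with the warning above.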


I have now run the command without the -C option and got the same error again. Please suggest a solution.

./stringtie G1_sorted.bam -G Triticum_aestivum.IWGSC.42.gtf -l G1-Label -o G1_ST.gtf -p 15

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.


Here is the GTF file:

#!genome-build IWGSC
#!genome-version IWGSC
#!genome-date 2018-07
#!genome-build-accession GCA_900519105.1
3B IWGSC gene 212892 214491 . - . gene_id "TraesCS3B02G000100"; gene_source "IWGSC"; gene_biotype "protein_coding";
3B IWGSC transcript 212892 214491 . - . gene_id "TraesCS3B02G000100"; transcript_id "TraesCS3B02G000100.1"; gene_source "IWGSC"; gene_biotype "protein_coding"; transcript_source "IWGSC"; transcript_biotype "protein_coding";


Before performing any bioinformatic analysis, I would recommend that you ask yourself:

What exactly am I looking for?
Is it possible to get it by in silico analysis?
What tools are available to do it, and how do they work?
Do I need professional help?

Sorry, but it looks like you do not have a clear idea of what you are doing. Read about the GTF file format. When you run StringTie, it searches for transcripts (column 3), and therefore, if you do not have annotated transcripts or your reads map to unannotated regions, you will get a warning like:

WARNING: no reference transcripts were found for the genomic sequences where reads were mapped! Please make sure the -G annotation file uses the same naming convention for the genome sequences.
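To see which feature types an annotation actually contains, the values in column 3 can be tallied. This is a small sketch assuming a tab-separated GTF; `feature_counts` is a hypothetical helper name, not part of StringTie.

```shell
# feature_counts GTF: prints "feature<TAB>count" for each value of column 3,
# skipping comment lines that start with "#".
feature_counts() {
    awk -F'\t' '$0 !~ /^#/ && NF >= 9 { n[$3]++ }
        END { for (f in n) print f "\t" n[f] }' "$1" | sort
}

# Possible usage:
#   feature_counts Triticum_aestivum.IWGSC.42.gtf
```

If the output has no `transcript` line, or if the file turns out not to be tab-separated and the output is empty, the annotation itself is the first thing to fix.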