Do EGA RNA-seq BAM files come with a GTF/GFF file corresponding to the genome assembly used?
2.9 years ago
rva_jango ▴ 10

I have downloaded the BAM files of an RNA-seq dataset from EGA with the pyega3 tool.

I have also downloaded the .tar file that lists the experiments, runs, etc., but I do not see a .gtf or .gff file that I could use to run something like featureCounts on the BAM files.

Any input is appreciated; I feel like there may be an easy answer here.

gtf gff RNAseq bam EGA • 769 views
2.9 years ago
GenoMax 141k

You can look in the headers of the BAM files to see what genome build was used for the alignments. Most aligners also record the command line used for the alignment in the header (in the @PG lines). You can then choose a GFF/GTF file based on the source and version of that genome build.
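As a rough illustration of that header check, here is a minimal sketch that pulls the tags of the first @SQ line and any @PG command lines out of SAM header text. The function name `summarize_header` is hypothetical (it is not part of samtools or sambamba), and the sketch assumes the header is tab-separated, as the SAM format specifies.

```shell
# Reads SAM header text on stdin.
# Prints the SN/LN/AS/SP/UR tags of the first @SQ line, plus the ID and CL
# tags of every @PG line (CL holds the aligner command line, when recorded).
# summarize_header is a hypothetical helper name used only for this sketch.
summarize_header() {
  awk -F'\t' '
    $1 == "@SQ" && !seen_sq {
      seen_sq = 1
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(SN|LN|AS|SP|UR):/) print "first_contig_" $i
    }
    $1 == "@PG" {
      for (i = 2; i <= NF; i++)
        if ($i ~ /^(ID|CL):/) print "program_" $i
    }
  '
}

# Example (assumes samtools is installed and sample.bam exists):
#   samtools view -H sample.bam | summarize_header
```

The AS (assembly) and UR (reference URI) tags are optional in the SAM format, so a header may not carry them. In that case the contig names and lengths are usually enough to identify the build.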

If these happen to be unaligned BAM files (yes, you can create these from raw fastq data), then you can convert the BAM files back to fastq reads and use an aligner, genome and annotation combination of your choice.
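One way to tell the two cases apart is a sketch like the following, which assumes that an unaligned BAM typically has no @SQ (reference sequence) lines in its header. That holds for the usual tools that create unaligned BAMs, but treat it as a heuristic rather than a guarantee. The function name `has_no_reference` is hypothetical.

```shell
# Reads SAM header text on stdin.
# Succeeds (exit status 0) when the header contains no @SQ line, which
# suggests the BAM is unaligned. Fails (non-zero) when @SQ lines are present.
# has_no_reference is a hypothetical helper name used only for this sketch.
has_no_reference() {
  ! grep -q '^@SQ'
}

# Example (assumes samtools is installed and sample.bam exists):
#   if samtools view -H sample.bam | has_no_reference; then
#     echo "no reference sequences in header: probably unaligned"
#   fi
```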


Thanks for your directions.

module load sambamba/0.6.8
sambamba view -H $in | head

sambamba 0.6.8 by Artem Tarasov and Pjotr Prins (C) 2012-2018 LDC 1.10.0 / DMD v2.080.1 / LLVM6.0.1 / bootstrap LDC - the LLVM D compiler (0.17.4)

@HD VN:1.0 SO:coordinate
@SQ SN:1 LN:249250621 AS:assembly19 SP:Homo_sapiens

I then downloaded the matching GTF file from Ensembl. Thanks.
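Before counting, it can be worth confirming that the contig names in the BAM header (here `1`, without a `chr` prefix) also appear in the GTF, since a naming mismatch generally leaves no reads assigned to any feature. A minimal sketch, assuming the header has been saved to a text file; the function name `contigs_missing_from_gtf` and the file names in the example are hypothetical.

```shell
# Usage: contigs_missing_from_gtf HEADER_FILE GTF_FILE
# Prints every contig name found in the @SQ lines of HEADER_FILE that never
# occurs in column 1 of GTF_FILE (GTF comment lines starting with # are skipped).
# contigs_missing_from_gtf is a hypothetical helper name used only for this sketch.
contigs_missing_from_gtf() {
  awk -F'\t' '
    NR == FNR {                      # first file on the command line: the GTF
      if ($0 !~ /^#/) gtf[$1] = 1
      next
    }
    $1 == "@SQ" {                    # second file: the SAM header
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)
          if (!(name in gtf)) print name
        }
    }
  ' "$2" "$1"
}

# Example (assumes samtools is installed; file names are placeholders):
#   samtools view -H sample.bam > header.txt
#   contigs_missing_from_gtf header.txt annotation.gtf
```

A few missing names are normal, because unplaced scaffolds often carry no annotated genes. If every primary chromosome is reported as missing, the BAM and the GTF most likely use different naming schemes or come from different sources.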

As for converting back to fastq: I did not do that, although it was recommended to me because several pipelines start from fastq files.

bedtools bamtofastq [OPTIONS] -i <BAM> -fq <FASTQ>
