Ensembl gene annotation GRCh37.87
5.0 years ago
zizigolu ★ 4.3k

Hi,

I need to quantify gene expression with Salmon, so I need the Ensembl GRCh37.87 gene annotation, preferably in FASTA format.

I tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/ but it is not working.

Do you know where I can download such a file for Salmon?

Thank you

RNA-Seq ensembl salmon • 4.0k views

Ensembl gene annotation for GRCh37.87 would be in GTF or GFF3 format, not FASTA.

edit: ftp://ftp.ensembl.org/pub/grch37/release-87/
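To make the GTF versus FASTA distinction concrete: a GTF line stores coordinates and attributes, not nucleotide sequence, so it cannot be given to Salmon as the transcript FASTA. This is a minimal Python sketch of a naive parser that splits one line into its nine tab-separated columns; it is not a replacement for a real GTF library (repeated attribute keys such as tag simply overwrite each other, and values containing ';' are not handled).

```python
"""Minimal GTF record parser (sketch): a GTF line holds coordinates and
attributes, not nucleotide sequence."""
from typing import Dict, NamedTuple


class GtfRecord(NamedTuple):
    seqname: str   # chromosome name, e.g. "1" (Ensembl) or "chr1" (GENCODE)
    source: str
    feature: str   # e.g. "gene", "transcript", "exon"
    start: int     # 1-based, inclusive
    end: int       # 1-based, inclusive
    strand: str    # "+" or "-"
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Split one GTF line into its nine tab-separated columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    attrs: Dict[str, str] = {}
    # Attribute column looks like: gene_id "ENSG..."; transcript_id "ENST...";
    for field in cols[8].strip().split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition(" ")
        attrs[key] = value.strip().strip('"')
    return GtfRecord(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                     cols[6], attrs)


if __name__ == "__main__":
    # Illustrative line, not copied from a specific release.
    example = ('1\thavana\texon\t11869\t12227\t.\t+\t.\t'
               'gene_id "ENSG00000223972"; transcript_id "ENST00000456328";')
    rec = parse_gtf_line(example)
    print(rec.seqname, rec.feature, rec.start, rec.end,
          rec.attributes["transcript_id"])
```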


Sorry, but the Salmon manual says it should be in FASTA format:

https://salmon.readthedocs.io/en/latest/salmon.html


If Salmon needs FASTA sequences for the transcripts, you can follow these steps (http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread_ex) using the GRCh37 reference from GENCODE, with the GTF file possibly taken from ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz. This assumes your BAM files are aligned to the main assembly only, without the alternative haplotypes or patches.
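Roughly, what gffread -w produces is one spliced sequence per transcript: the exons of each transcript_id are put in genomic order, their sequences are cut out of the genome and concatenated, and minus-strand transcripts are reverse-complemented. A toy Python sketch of that idea, assuming the genome is already loaded into a dict of chromosome name to sequence (fine for a test, not for a 3 GB genome) and handling only exon lines of a GTF; gffread itself does much more (GFF3, CDS, validation).

```python
"""Sketch of the idea behind `gffread -w transcripts.fa -g genome.fa ann.gtf`:
one spliced sequence per transcript, built from its exons."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def transcript_sequences(gtf_lines: Iterable[str],
                         genome: Dict[str, str]) -> Dict[str, str]:
    """Concatenate exon sequences per transcript_id.
    GTF coordinates are 1-based and inclusive."""
    exons: Dict[str, List[Tuple[str, int, int, str]]] = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Pull transcript_id "..." out of the attribute column.
        tid = None
        for field in cols[8].split(";"):
            key, _, value = field.strip().partition(" ")
            if key == "transcript_id":
                tid = value.strip('"')
        if tid is None:
            continue
        exons[tid].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    result: Dict[str, str] = {}
    for tid, parts in exons.items():
        chrom, strand = parts[0][0], parts[0][3]
        if chrom not in genome:
            # The situation behind "couldn't find fasta record for 'chr1'".
            raise KeyError(f"no FASTA record for {chrom!r}")
        parts.sort(key=lambda p: p[1])  # put exons in genomic order
        seq = "".join(genome[chrom][start - 1:end] for _, start, end, _ in parts)
        result[tid] = reverse_complement(seq) if strand == "-" else seq
    return result
```

The KeyError branch mirrors the failure mode seen later in this thread: the annotation names a chromosome the genome FASTA does not contain.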


My BAM files come from STAR alignments against GRCh37_g1k. Now I need to quantify raw counts. I did that with featureCounts, but I got a lot of strange features, so I decided to use Salmon. Thank you anyway, but I don't know why I cannot open these links.

So now I thought about converting my BAM files to FASTQ.

When I ran this Salmon command, I got the following error:

salmon quant -t gencode.v30lift37.transcripts.fa -l A -a file.bam -o salmon_quant

If you have access to the genome FASTA and GTF used for alignment
consider generating a transcriptome fasta using a command like:
gffread -w output.fa -g genome.fa genome.gtf
you can find the gffread utility at (http://ccb.jhu.edu/software/stringtie/gff.shtml)
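Salmon's alignment-based mode (-a) expects reads aligned to the transcriptome, i.e. to the same sequences as the -t FASTA, not to the genome. A STAR genome BAM declares chromosomes in its header (@SQ lines such as SN:1), so none of its references can match transcript names, which would be consistent with the message above. A quick check is to compare the @SQ names with the FASTA record names; the helper names here are hypothetical, and the header text is assumed to come from samtools view -H file.bam.

```python
"""Sketch: check whether the reference names in a BAM header match the
transcript names in the FASTA passed to `salmon quant -t`."""
from typing import Iterable, List, Set


def bam_reference_names(header_text: str) -> List[str]:
    """Return the SN: values of @SQ lines from SAM header text."""
    names = []
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        for tag in line.split("\t")[1:]:
            if tag.startswith("SN:"):
                names.append(tag[3:])
    return names


def fasta_names(lines: Iterable[str]) -> Set[str]:
    """First whitespace-delimited token of each '>' header line."""
    return {line[1:].split()[0]
            for line in lines if line.startswith(">") and line[1:].strip()}


def missing_references(header_text: str,
                       fasta_lines: Iterable[str]) -> List[str]:
    """BAM references that have no matching record in the FASTA."""
    known = fasta_names(fasta_lines)
    return [name for name in bam_reference_names(header_text)
            if name not in known]
```

If every chromosome comes back as missing, the BAM is a genome alignment. Two common ways around that: STAR's --quantMode TranscriptomeSAM, which writes transcriptome alignments that salmon -a can use provided the transcript FASTA is derived from the same GTF used to build the STAR index, or running Salmon in mapping mode on FASTQ reads against an index built with salmon index.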

Finally, I tried gffread, but I am getting this error:

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gff3

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gtf

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$
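The gffread warning says the annotation refers to a sequence called chr1 that the genome FASTA does not contain. A minimal Python sketch to list every such mismatch at once, comparing column 1 of a GTF/GFF3 with the first token of each FASTA header (for hs37d5 that token is expected to be 1, 2, ..., X, Y, MT rather than chr1).

```python
"""Sketch: list annotation seqnames with no record in the genome FASTA,
the cause of "couldn't find fasta record for 'chr1'"."""
from typing import Iterable, List, Set


def annotation_seqnames(lines: Iterable[str]) -> Set[str]:
    """Column 1 of every non-comment, non-empty GTF/GFF3 line."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def genome_seqnames(lines: Iterable[str]) -> Set[str]:
    """First token of each FASTA header, e.g. '>1 dna:chromosome' -> '1'."""
    return {line[1:].split()[0]
            for line in lines if line.startswith(">") and line[1:].strip()}


def unmatched_seqnames(annotation_lines: Iterable[str],
                       fasta_lines: Iterable[str]) -> List[str]:
    """Sorted annotation seqnames that the genome FASTA lacks."""
    return sorted(annotation_seqnames(annotation_lines)
                  - genome_seqnames(fasta_lines))
```

For a full-size genome, reading the names from the first column of the samtools faidx index (hs37d5.fa.fai) avoids scanning all of the sequence.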

Have you tried wget ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz ?

Perhaps the chromosomes in your genome FASTA are named like >1 rather than in the GENCODE >chr1 format?
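If that is the case, one fix is to rename the annotation's seqnames to the style of the genome (chr1 to 1, chrX to X). A Python sketch that rewrites column 1 of each GTF/GFF3 line and passes comment lines through unchanged; mapping chrM to MT is an assumption that both names refer to the same mitochondrial sequence, which is worth verifying for your files.

```python
"""Sketch: rewrite GENCODE-style seqnames (chr1, chrX, chrM) to the
Ensembl / 1000 Genomes style used by hs37d5 (1, X, MT)."""
from typing import Iterable, Iterator


def to_ensembl_name(name: str) -> str:
    # Assumption: chrM in the annotation corresponds to MT in the genome.
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def rename_seqnames(lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF/GFF3 lines with column 1 renamed; comments pass through."""
    for line in lines:
        if line.startswith("#") or "\t" not in line:
            yield line
            continue
        seqname, rest = line.split("\t", 1)
        yield to_ensembl_name(seqname) + "\t" + rest
```

Usage, with hypothetical file names: stream gencode.v30lift37.annotation.gtf through rename_seqnames into renamed.gtf, then rerun gffread -w transcripts.fa -g hs37d5.fa renamed.gtf. Adding chr to the FASTA headers instead would also make the names agree, but renaming the annotation keeps the genome consistent with the existing BAM headers.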


I just tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.cdna.all.fa.gz, and it worked for me. This is the file I think you need for Salmon.
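Since the goal is gene-level expression, salmon quant can aggregate transcript estimates to genes when given a transcript-to-gene map with -g/--geneMap (a GTF, or a two-column tab-separated file). A Python sketch that builds that table from the cDNA FASTA headers, assuming each header carries a gene: token as Ensembl cDNA headers usually do; records without one are skipped.

```python
"""Sketch: build a transcript-to-gene table from Ensembl cDNA FASTA headers
for `salmon quant -g/--geneMap` (two tab-separated columns)."""
import gzip
from typing import Dict, Iterable


def tx2gene_from_headers(lines: Iterable[str]) -> Dict[str, str]:
    """Map the first header token (transcript ID) to the 'gene:' value."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if not line.startswith(">"):
            continue
        tokens = line[1:].split()
        if not tokens:
            continue
        for tok in tokens[1:]:
            if tok.startswith("gene:"):
                mapping[tokens[0]] = tok[len("gene:"):]
                break
    return mapping


def write_tx2gene(fasta_gz: str, out_tsv: str) -> int:
    """Read a gzipped cDNA FASTA, write 'transcript<TAB>gene' lines,
    and return how many transcripts were written."""
    with gzip.open(fasta_gz, "rt") as fh:
        mapping = tx2gene_from_headers(fh)
    with open(out_tsv, "w") as out:
        for tx, gene in mapping.items():
            out.write(f"{tx}\t{gene}\n")
    return len(mapping)
```

The transcript IDs in this table have to match the names in the Salmon index, so build both from the same FASTA.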

5.0 years ago
GenoMax 141k

Use the GRCh37 FASTA from GENCODE. If you need annotations, those are available there as well.

That said, for new analyses you should stick with the current release unless you are trying to reproduce a past analysis.
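GENCODE transcript FASTA headers pack several IDs into one pipe-separated string, for example (illustrative) >ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|DDX11L1-202|DDX11L1|1657|processed_transcript|. salmon index has a --gencode flag that splits names at the first '|'. If you want the IDs yourself, for example to trim headers or build a transcript-to-gene table, a Python sketch along these lines works, assuming the first two fields are the transcript and gene IDs as in standard GENCODE releases.

```python
"""Sketch: split GENCODE transcript FASTA headers into transcript and gene
IDs, and trim each header to the transcript ID."""
from typing import Iterable, Iterator, Tuple


def gencode_ids(header: str) -> Tuple[str, str]:
    """Return (transcript_id, gene_id) from a pipe-separated header."""
    fields = header.lstrip(">").strip().split("|")
    return fields[0], fields[1]


def trim_gencode_headers(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with each header reduced to '>transcript_id'."""
    for line in lines:
        if line.startswith(">"):
            tx, _ = gencode_ids(line)
            yield f">{tx}\n"
        else:
            yield line
```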

ADD COMMENT
0
Entering edit mode

Thank you. My BAM files are from GRCh37, so I guess I need that.


Sorry @genomax,

where can I find the genome FASTA and GTF files that match GRCh37_1k (1000 Genomes)?

I googled it: for the genome FASTA there were three phases, but I did not find any corresponding GTF.
