Ensembl gene annotation GRCh37.87
12 months ago
F ♦ 3.4k
Iran

Hi,

I need to quantify gene expression with Salmon, so I need the Ensembl gene annotation GRCh37.87, probably in FASTA format.

I tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/ but it is not working.

Do you know where I can download such a file for Salmon?

Thank you

The Ensembl gene annotation for GRCh37.87 would be in GTF or GFF3 format, not FASTA.

edit: ftp://ftp.ensembl.org/pub/grch37/release-87/

Sorry, but the Salmon manual says the input should be FASTA:

https://salmon.readthedocs.io/en/latest/salmon.html

If Salmon needs the FASTA sequences of the transcripts, you can follow these steps (http://ccb.jhu.edu/software/stringtie/gff.shtml#gffread_ex) using the GRCh37 reference from GENCODE and a GTF file, possibly ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz. This assumes your BAM files are alignments to the main assembly and do not include the alternative haplotypes or patches.
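
A minimal sketch of that gffread route, assuming wget, gzip and gffread are on your PATH. gffread looks up column 1 of the GTF among the FASTA headers, so the genome and the GTF must share one naming convention; here I assume the genome is the uncompressed FASTA you aligned to (e.g. hs37d5.fa, which as far as I know uses Ensembl-style names like 1 and MT). The function name build_transcriptome is hypothetical.

```shell
# Sketch: build a transcript FASTA for Salmon from the Ensembl GRCh37
# release 87 GTF and the genome you aligned to.
# Assumes wget, gzip and gffread are on PATH, and that the genome FASTA is
# uncompressed and uses Ensembl-style names (1, 2, ..., X, Y, MT).
# Usage (hypothetical helper): build_transcriptome hs37d5.fa transcripts.fa
build_transcriptome() {
  genome=$1
  out=$2
  gtf=Homo_sapiens.GRCh37.87.chr.gtf
  # Download only if not already present. If FTP is blocked on your network,
  # try the same path under https://ftp.ensembl.org/pub/ instead of ftp://.
  [ -s "$gtf.gz" ] ||
    wget "ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/$gtf.gz" ||
    return 1
  gzip -dc "$gtf.gz" > "$gtf" || return 1
  # -g: genome FASTA; -w: write the spliced exon sequence of each transcript.
  gffread -w "$out" -g "$genome" "$gtf"
}
```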

I have BAM files from STAR alignments against GRCh37_g1k, and now I need raw counts. I did that with featureCounts, but I got a lot of strange features, so I decided to use Salmon. Thank you anyway, but I don't know why I cannot open these links.

So now I thought to convert my BAM files to FASTQ.

When I ran this Salmon command, I got the error below:

salmon quant -t gencode.v30lift37.transcripts.fa -l A -a file.bam -o salmon_quant

If you have access to the genome FASTA and GTF used for alignment
consider generating a transcriptome fasta using a command like:
gffread -w output.fa -g genome.fa genome.gtf
you can find the gffread utility at (http://ccb.jhu.edu/software/stringtie/gff.shtml)
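
As far as I understand it, Salmon's alignment-based mode (-a) expects reads aligned to the transcriptome, so the @SQ names in the BAM header must match the transcript names in the FASTA; a STAR BAM against the genome carries chromosome names instead, which is likely what triggers this message. One option is re-running STAR with --quantMode TranscriptomeSAM; another is going back to FASTQ and using Salmon's mapping mode. A rough sketch of the second route, assuming paired-end reads, a recent samtools 1.x, salmon on PATH and an existing Salmon index; bam_to_salmon is a hypothetical name. Reads STAR could not map are only in the BAM if it was run with --outSAMunmapped Within.

```shell
# Sketch: recover paired-end FASTQ from a STAR BAM and quantify with
# Salmon's mapping mode. Assumes samtools and salmon are on PATH and that
# a Salmon index already exists (built with: salmon index -t transcripts.fa -i DIR).
# Usage (hypothetical helper): bam_to_salmon sample.bam salmon_index out_dir
bam_to_salmon() {
  bam=$1
  index=$2
  out=$3
  mkdir -p "$out" || return 1
  # collate groups mates together; fastq writes R1/R2 and, by default, skips
  # secondary/supplementary alignments. Singletons (-s) and reads with both
  # or neither mate flag (-0) are discarded.
  samtools collate -u -O "$bam" |
    samtools fastq -n -1 "$out/R1.fq.gz" -2 "$out/R2.fq.gz" \
      -0 /dev/null -s /dev/null - || return 1
  # -l A lets Salmon infer the library type.
  salmon quant -i "$index" -l A \
    -1 "$out/R1.fq.gz" -2 "$out/R2.fq.gz" -o "$out/quant"
}
```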

Finally, I tried gffread, but I get this error:

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gff3

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$

[fi1d18@cyan01 fi1d18]$ /temp/hgig/fi1d18/gffread-0.11.2.Linux_x86_64/gffread -w transcripts.fa -g hs37d5.fa gencode.v30lift37.annotation.gtf

Warning: couldn't find fasta record for 'chr1'!
Error: no genomic sequence available (check -g option!).
[fi1d18@cyan01 fi1d18]$
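
This error means the sequence name in column 1 of the annotation (here chr1) does not appear as a header in hs37d5.fa. A quick way to list every such name, as a sketch using only awk; the helper name missing_seqids is hypothetical and both files must be uncompressed.

```shell
# Hypothetical helper (not part of gffread): print each sequence name used in
# column 1 of a GTF/GFF3 file that has no matching header in the genome FASTA.
# Usage: missing_seqids genome.fa annotation.gtf
missing_seqids() {
  awk -F'\t' '
    # First file (FASTA): record the name after ">" up to the first space/tab.
    FNR == NR {
      if ($0 ~ /^>/) {
        id = substr($0, 2)
        sub(/[ \t].*/, "", id)
        seen[id] = 1
      }
      next
    }
    # Second file (annotation): skip comments and blank lines,
    # and report each unmatched seqid once.
    NF && !/^#/ && !($1 in seen) && !($1 in reported) {
      print $1
      reported[$1] = 1
    }
  ' "$1" "$2"
}
```

For example, missing_seqids hs37d5.fa gencode.v30lift37.annotation.gtf would, under this assumption, print chr1 and the other chr-prefixed names.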

Have you tried wget ftp://ftp.ensembl.org/pub/grch37/release-87/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.chr.gtf.gz ?

Perhaps the chromosome names in your FASTA are in the format >1 rather than the GENCODE >chr1 format?
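
If that is the case (hs37d5 uses names like 1, X and MT), one option is to rewrite the GENCODE seqids before running gffread. A minimal awk sketch, assuming a tab-separated GTF/GFF3 and assuming GENCODE's chrM corresponds to the MT sequence in hs37d5; the function name is hypothetical.

```shell
# Hypothetical helper: rewrite GENCODE-style seqids (chr1, chrX, chrM) to the
# Ensembl/1000 Genomes style used by hs37d5 (1, X, MT).
# Reads the files given as arguments, or stdin if none; writes to stdout.
gencode_to_ensembl_names() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }          # keep header/comment lines unchanged
       {
         sub(/^chr/, "", $1)          # chr1 -> 1, chrX -> X, chrM -> M
         if ($1 == "M") $1 = "MT"     # mitochondrion is named MT in hs37d5
         print
       }' "$@"
}
```

For example: gencode_to_ensembl_names gencode.v30lift37.annotation.gtf > gencode.v30lift37.ensnames.gtf, then pass the new file to gffread -w transcripts.fa -g hs37d5.fa.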

I just tried ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh37.cdna.all.fa.gz and it worked for me. This is the file that I think you need for Salmon.
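
With that cDNA file you can skip gffread and build the Salmon index directly. A sketch, assuming wget and salmon are installed; the function name is hypothetical, and salmon index should read the gzipped FASTA as-is.

```shell
# Sketch: build a Salmon index from the Ensembl GRCh37 cDNA FASTA.
# Assumes wget and salmon are on PATH.
# Usage (hypothetical helper): index_grch37_cdna salmon_index
index_grch37_cdna() {
  idx=$1
  fa=Homo_sapiens.GRCh37.cdna.all.fa.gz
  # Download only if the file is not already present.
  [ -s "$fa" ] ||
    wget "ftp://ftp.ensembl.org/pub/grch37/current/fasta/homo_sapiens/cdna/$fa" ||
    return 1
  # -t: transcript FASTA (gzipped is fine); -i: output index directory.
  salmon index -t "$fa" -i "$idx"
}
```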

9 weeks ago
genomax 68k
United States

Use the GRCh37 FASTA from GENCODE. If you need annotations, those are available there as well.

That said, for new analyses you should use the current release unless you are trying to reproduce a past analysis.

Thank you. My BAM files are from GRCh37, so I guess I need that.

Sorry @genomax

Where can I find the genome FASTA and GTF files that match GRCh37_1k (1000 Genomes)?

I googled: for the genome FASTA there were three phases, but I did not find any corresponding GTF.
