I'm trying to run rsem for S. aureus strain USA300 FPR3757.
Then I saved this file as a GenBank file, which I want to transform into a gtf file for the RSEM pipeline.
Does anyone know of a program which transforms Ganbank format files from NCBI directly into a gtf file?
I was not aware of any gb -> gtf tranformation program.
However, I did find this website:
https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html
which transforms a gb file to a gff file. Once I got the gff file, I used the gffread utility to transform it into a gtf file.
This gtf file looks like this:
Filename: Staphylococcus_aureus_genome_USA300_FPR3757.gtf
CP000255 GenBank exon 544 1905 . + . transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255 GenBank CDS 544 1905 . + 0 transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255 GenBank exon 2183 3316 . + . transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255 GenBank CDS 2183 3316 . + 0 transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255 GenBank exon 3697 3942 . + . transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255 GenBank CDS 3697 3942 . + 0 transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255 GenBank exon 3939 5051 . + . transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";
CP000255 GenBank CDS 3939 5051 . + 0 transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";
Pretty simple-looking, but it's due to the fact that I'm analyzing a prokaryote.
QUESTION: does RSEM work well with prokaryotic genomes?
Now, I ran rsem-prepare-reference this way:
rsem-prepare-reference --gtf Staphylococcus_aureus_genome_USA300_FPR3757.gtf --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ Staphylococcus_aureus_genome_USA300_FPR3757.fasta Staphylococcus_aureus_genome_USA300_FPR3757
The log file said that rsem-prepare-reference finished successfully.
However, when I run rsem-calculate-expression , it gives an error. I don't know why, I have been working on it 2 days now.
The command:
rsem-calculate-expression -p 8 --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ --output-genome-bam /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq /path/to/rsem/files/saureus_usa300_fpr3757/Staphylococcus_aureus_genome_USA300_FPR3757 output
It gives this error:
nohup: ignoring input
/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 8 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD \@HD VN:1.4 SO:unsorted --outFileNamePrefix output.temp/output --readFilesIn /path/to/fastq/1/1_S1_L001_R1_001.trimmed.fq
Aug 29 09:51:11 ..... started STAR run
Aug 29 09:51:11 ..... loading genome
Aug 29 09:51:13 ..... started mapping
"/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757 --outSAMunmapped Within --outFilterType BySJout --outSAMattributes NH HI AS NM MD --outFilterMultimapNmax 20 --outFilterMismatchNmax 999 --outFilterMismatchNoverLmax 0.04 --alignIntronMin 20 --alignIntronMax 1000000 --alignMatesGapMax 1000000 --alignSJoverhangMin 8 --alignSJDBoverhangMin 1 --sjdbScore 1 --runThreadN 8 --genomeLoad NoSharedMemory --outSAMtype BAM Unsorted --quantMode TranscriptomeSAM --outSAMheaderHD \@HD VN:1.4 SO:unsorted --outFileNamePrefix output.temp/output --readFilesIn /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq " failed! Plase check if you provide correct parameters/options for the pipeline!
Any help is appreciated.
Run the command in the error message directly and see what error message it returns. That will undoubtedly be more informative. I suspect that the lack of
gene
andtranscript
entries in you GTF file is causing issues, but the error message from STAR can likely confirm/deny that suspicion.I edited the gtf file so it looks like this: