RSEM with prokaryotes
0
0
Entering edit mode
6.6 years ago
csmatyi1 • 0

I'm trying to run rsem for S. aureus strain USA300 FPR3757.

Then I saved this file as a GenBank file, which I want to transform into a gtf file for the RSEM pipeline.

Does anyone know of a program which transforms Ganbank format files from NCBI directly into a gtf file?

I was not aware of any gb -> gtf tranformation program.

However, I did find this website:

https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html

which transforms a gb file to a gff file. Once I got the gff file, I used the gffread utility to transform it into a gtf file.

This gtf file looks like this:

Filename: Staphylococcus_aureus_genome_USA300_FPR3757.gtf

CP000255        GenBank exon    544     1905    .       +       .       transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255        GenBank CDS     544     1905    .       +       0       transcript_id "SAUSA300_0001.t01"; gene_id "SAUSA300_0001"; gene_name "SAUSA300_0001";
CP000255        GenBank exon    2183    3316    .       +       .       transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255        GenBank CDS     2183    3316    .       +       0       transcript_id "SAUSA300_0002.t01"; gene_id "SAUSA300_0002"; gene_name "SAUSA300_0002";
CP000255        GenBank exon    3697    3942    .       +       .       transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255        GenBank CDS     3697    3942    .       +       0       transcript_id "SAUSA300_0003.t01"; gene_id "SAUSA300_0003"; gene_name "SAUSA300_0003";
CP000255        GenBank exon    3939    5051    .       +       .       transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";
CP000255        GenBank CDS     3939    5051    .       +       0       transcript_id "SAUSA300_0004.t01"; gene_id "SAUSA300_0004"; gene_name "SAUSA300_0004";

Pretty simple-looking, but it's due to the fact that I'm analyzing a prokaryote.

QUESTION: does RSEM work well with prokaryotic genomes?

Now, I ran rsem-prepare-reference this way:

rsem-prepare-reference --gtf Staphylococcus_aureus_genome_USA300_FPR3757.gtf --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ Staphylococcus_aureus_genome_USA300_FPR3757.fasta Staphylococcus_aureus_genome_USA300_FPR3757

The log file said that rsem-prepare-reference finished successfully.

However, when I run rsem-calculate-expression , it gives an error. I don't know why, I have been working on it 2 days now.

The command:

rsem-calculate-expression -p 8 --star --star-path /path/to/STAR-2.5.3a/bin/Linux_x86_64/ --output-genome-bam /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq  /path/to/rsem/files/saureus_usa300_fpr3757/Staphylococcus_aureus_genome_USA300_FPR3757 output

It gives this error:

nohup: ignoring input
/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 8  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD \@HD VN:1.4 SO:unsorted  --outFileNamePrefix output.temp/output  --readFilesIn /path/to/fastq/1/1_S1_L001_R1_001.trimmed.fq
Aug 29 09:51:11 ..... started STAR run
Aug 29 09:51:11 ..... loading genome
Aug 29 09:51:13 ..... started mapping
"/path/to/STAR-2.5.3a/bin/Linux_x86_64//STAR --genomeDir /path/to/rsemfiles/saureus_usa300_fpr3757  --outSAMunmapped Within  --outFilterType BySJout  --outSAMattributes NH HI AS NM MD  --outFilterMultimapNmax 20  --outFilterMismatchNmax 999  --outFilterMismatchNoverLmax 0.04  --alignIntronMin 20  --alignIntronMax 1000000  --alignMatesGapMax 1000000  --alignSJoverhangMin 8  --alignSJDBoverhangMin 1  --sjdbScore 1  --runThreadN 8  --genomeLoad NoSharedMemory  --outSAMtype BAM Unsorted  --quantMode TranscriptomeSAM  --outSAMheaderHD \@HD VN:1.4 SO:unsorted  --outFileNamePrefix output.temp/output  --readFilesIn /path/to/samples/1/1_S1_L001_R1_001.trimmed.fq " failed! Plase check if you provide correct parameters/options for the pipeline!

Any help is appreciated.

gtf RSEM gff genome annotation • 2.0k views
ADD COMMENT
0
Entering edit mode

Run the command in the error message directly and see what error message it returns. That will undoubtedly be more informative. I suspect that the lack of gene and transcript entries in you GTF file is causing issues, but the error message from STAR can likely confirm/deny that suspicion.

ADD REPLY
0
Entering edit mode

I edited the gtf file so it looks like this:

1       GenBank gene    544     1905    .       +       .       gene_id "SAUSA300_0001";
1       GenBank transcript      544     1905    .       +       .       gene_id "SAUSA300_0001"; transcript_id "SAUSA300_0001.t01";
1       GenBank exon    544     1905    .       +       .       gene_id "SAUSA300_0001"; transcript_id "SAUSA300_0001.t01";
1       GenBank gene    2183    3316    .       +       .       gene_id "SAUSA300_0002";
1       GenBank transcript      2183    3316    .       +       .       gene_id "SAUSA300_0002"; transcript_id "SAUSA300_0002.t01";
1       GenBank exon    2183    3316    .       +       .       gene_id "SAUSA300_0002"; transcript_id "SAUSA300_0002.t01";
1       GenBank gene    3697    3942    .       +       .       gene_id "SAUSA300_0003";
1       GenBank transcript      3697    3942    .       +       .       gene_id "SAUSA300_0003"; transcript_id "SAUSA300_0003.t01";
1       GenBank exon    3697    3942    .       +       .       gene_id "SAUSA300_0003"; transcript_id "SAUSA300_0003.t01";
1       GenBank gene    3939    5051    .       +       .       gene_id "SAUSA300_0004";
ADD REPLY

Login before adding your answer.

Traffic: 1412 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6