Viral RNA-Seq Cufflinks not producing output
0
1
Entering edit mode
7.7 years ago
DG 7.3k

I'm running cufflinks on some data produced from a viral system using Ion Proton sequencing. I've been running several different RNA-Seq analysis workflows on this data. In particular this set is aligned with STAR + Bowtie2. I've also done HiSat2 + StringTie + Ballgown analysis on this data since that is sort of the replacement workflow for the traditional Tuxedo suite.

However I am running into an issue with the Cufflinks data. When running using:

cufflinks -g reference.gff -b reference.fa -u -p 24 --library-type fr-secondstrand sample.merged.sorted.bam

I end up not getting any results. Both skipped.gtf and transcript.gtf are empty, the tracking files just have the header.

Output to the screen:

You are using Cufflinks v2.2.1, which is the most recent release.
[04:39:37] Loading reference annotation. [04:39:37] Inspecting reads and determining fragment length distribution. Processed 1 loci.       [*************************] 100% 
Map Properties:     
Normalized Map Mass: 16399479.00    
Raw Map Mass: 16399479.00   
Fragment Length Distribution: Truncated Gaussian (default)
      Default Mean: 200
     Default Std Dev: 80 [04:41:49] 
Assembling transcripts and estimating abundances. 
Processed 1 loci.                           [*************************] 100% 
[04:44:31] Loading reference annotation and sequence. 
[04:44:31] Learning bias parameters.
Processed 0 loci.                            [*************************] 100% 
[04:45:50] Re-estimating abundances with bias correction. Processed 0 loci.

I tried setting -max-bundle-frags to a higher number; however then cufflinks gets stalled at 99% complete, waiting for 1 thread to finish for about a day or so, which normally these runs finish in a few minutes. I'm using a custom GFF file but I've also tried using another one that exists for this reference sequence, tried both GFF and GTF files. The gffread utility doesn't show any problems with these files. I've tried not using the -u flag.

Any suggestions what might be going on? The same dataset of Ion Proton reads has also been aligned against the human genome (since the viral system is infecting human cell culture) and that seems to work fine.

cufflinks RNA-Seq rna-seq transcriptomics • 3.2k views
ADD COMMENT
0
Entering edit mode

Sorry did not read the full post. Please ignore this. Cufflinks takes a gtf. You seem to have given a gff. Although don't know for sure if that is causing it.

ADD REPLY
0
Entering edit mode

Cufflinks will take both a GTF or a GFF. They mix and match which they specify in various in instructions but also, specifically for -g specify it as gff/gtf. Also, as described I tried both the GFF and GTF versions of my annotation files with the same results.

ADD REPLY
1
Entering edit mode

My apologies. I replied too fast without reading the entire post. I could not find any option to delete the post soon after posting. On another note, cufflinks has a --verbose option, just in case it hints to what is going on.

ADD REPLY
0
Entering edit mode

Trying that now. Not sure why I didn't even thing of that for more debugging info

ADD REPLY
0
Entering edit mode

I get some GFF Warnings (which oddly I don't get when I use gffread on the file). It seems to be something that happens around the second time it loads the reference annotation:

1: GQ994935.1
[03:49:21] Loading reference annotation.
GFF Warning: merging overlapping/adjacent feature segment CDS (1127-2779) with exon (1097-2000) for GFF ID transORF4a on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (1127-2779) with exon (1097-2234) for GFF ID transORF4b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (2609-2956) with CDS (1097-2779) for GFF ID transORF4a on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (2609-2956) with CDS (1097-2779) for GFF ID transORF4b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (20038-21051) with exon (18575-20248) for GFF ID transORFK3_70C on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment exon (20788-21099) with CDS (18575-21051) for GFF ID transORFK3_70C on GQ994935.1
GFF Warning: invalid coordinates for mRNA parent feature (ID=transORF42)
GFF Warning: merging overlapping/adjacent feature segment exon (69389-69782) with CDS (69228-69731) for GFF ID transORF47_46_45b on GQ994935.1
GFF Warning: merging overlapping/adjacent feature segment CDS (82043-83361) with exon (82042-82791) for GFF ID transORF57A on GQ994935.1
GFF Warning: invalid coordinates for mRNA parent feature (ID=transORFK12_Kaposinb)
Warning: gene geneORF8 found without exon segments (adding default exon).
Warning: gene geneORF11 found without exon segments (adding default exon).
Warning: gene geneK3 found without exon segments (adding default exon).
Warning: gene geneORF70 found without exon segments (adding default exon).
Warning: gene geneK5_6-AS found without exon segments (adding default exon).
Warning: gene geneORF21 found without exon segments (adding default exon).
Warning: gene geneORF24 found without exon segments (adding default exon).
Warning: gene geneORF28 found without exon segments (adding default exon).
Warning: gene geneORF36 found without exon segments (adding default exon).
Warning: gene geneORF43 found without exon segments (adding default exon).
Warning: gene geneORF46 found without exon segments (adding default exon).
Warning: gene geneORF47 found without exon segments (adding default exon).
Warning: gene geneORF48 found without exon segments (adding default exon).
Warning: gene geneORF50 found without exon segments (adding default exon).
Warning: gene geneORF49 found without exon segments (adding default exon).
Warning: gene geneORF53 found without exon segments (adding default exon).
Warning: gene geneORF57 found without exon segments (adding default exon).
Warning: gene geneORF59 found without exon segments (adding default exon).
Warning: gene geneORF60 found without exon segments (adding default exon).
Warning: gene geneORF61 found without exon segments (adding default exon).
Warning: gene geneORF62 found without exon segments (adding default exon).
Warning: gene geneORF66 found without exon segments (adding default exon).
Warning: gene geneORF67 found without exon segments (adding default exon).
Warning: gene geneORF67A found without exon segments (adding default exon).
Warning: gene geneORF71 found without exon segments (adding default exon).
Warning: gene geneORF72 found without exon segments (adding default exon).
Warning: gene geneORF73 found without exon segments (adding default exon).
Warning: gene geneK14 found without exon segments (adding default exon).
 Kept 128 transfrags out of 128
1: GQ994935.1
[03:49:21] Inspecting reads and determining fragment length distribution.
Inspecting bundle GQ994935.1:0-138146 with 18687057 reads
Processed 1 loci.                           
> Map Properties:
>   Normalized Map Mass: 16399479.00
>   Raw Map Mass: 16399479.00
>   Number of Multi-Reads: 951317 (with 3215392 total hits)
>   Fragment Length Distribution: Truncated Gaussian (default)
>                 Default Mean: 200
>              Default Std Dev: 80
**0 ReadHits still live**
Found 2 reference contigs
    Total map density: 16399479.000124
[03:51:33] Assembling transcripts and initializing abundances for multi-read correction.
GQ994935.1:0-138146 Processing new bundle with 18687057 alignments
Processed 1 loci.                           
[03:54:16] Loading reference annotation and sequence.
 **Kept 0 transfrags out of 0**
[03:54:16] Learning bias parameters.
Processed 0 loci.                           
[03:55:35] Re-estimating abundances with bias and multi-read correction.
Processed 0 loci.
ADD REPLY
0
Entering edit mode

Can you please also post a few lines of the vcf. Just in case is the strand in gtf a '+' and '-' and not set to a '.'. ?

ADD REPLY
0
Entering edit mode

Do you mean the GTF (there is no VCF). There is strand information in the GTF files (+ and -) for all of the features.

ADD REPLY
0
Entering edit mode

Ah my bad. Turns out it was a wrong lead. I was trying to grep the error message 'Kept \d+ transfrags out of 0' in the cufflinks source. Apparently the fault lies elsewhere.

ADD REPLY
0
Entering edit mode

I've not had any luck trying to calculate DE on viral expression with cufflinks. I've tried everything from the NCBI provided GFF/GTFs to making my own from scratch (not hard for ssRNA viruses) without success. I think that the way viruses are represented in GFF/GTFs is somehow valid to gffread but causes problems to parts of cuff* tool set. Then again I've also had instances with mammalian GTF/GFFs going through gffread without issue but parts of cuff* throwing a fit, so I don't think gffread is a completely reliable means for checking.

PS: What is a "viral system"?

ADD REPLY
0
Entering edit mode

I was just being non-specific about the virus, etc. But it is human cells infected with virus. So the mRNAs captured are a mix of virus transcripts and human transcripts. Viral system seemed an apt description for the experiment and set up.

ADD REPLY
0
Entering edit mode

And thanks for the input. It definitely seems like it is some issue with the annotations and how they are interpreted.

ADD REPLY
0
Entering edit mode

Ah, I thought you had something funky going on, like a system of multiple viral species. The cufflinks verbose output gave away the virus ;).

Have you tried deleting or manually fudging the regions cufflinks is complaining about? I've tried that without success, maybe it will work for you.

Here's a paper I had looked at previously, but haven't had a chance to try: http://journals.plos.org/plospathogens/article?id=10.1371/journal.ppat.1003896#s4

ADD REPLY
0
Entering edit mode

Yes, I'm now looking at featureCounts + DESeq2, so similar to HTSeq + edgeR

ADD REPLY

Login before adding your answer.

Traffic: 1464 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6