How to align RNA-seq data to a PolyA tail "genome"
5.2 years ago
mb314 ▴ 20

Hello,

I think my RBP of interest binds polyA tails. I have fastq.gz files from a CLIP-seq experiment and want to see whether any of my reads have high polyA content. I used the STAR aligner (version 2.5.3a) to index a very short sequence of A's, which worked:

STAR --runMode genomeGenerate --runThreadN 16 --genomeDir PolyA/star_index --genomeFastaFiles PolyA/PolyA.fasta --genomeSAindexNbases 2 --limitGenomeGenerateRAM 33000000000

I then tried aligning my fastq.gz file to the indexed polyA genome. The job ran for 24 hours and then aborted, whereas aligning to the human genome finished in less than 20 minutes. The command I used is below:

STAR --runMode alignReads \
--runThreadN 16 \
--genomeDir ../genomes/PolyA/star_index \
--genomeLoad LoadAndRemove \
--readFilesIn pathtomyfile.fastq.gz \
--readFilesCommand zcat \
--outFilterMultimapNmax 20 \
--outFileNamePrefix myfileout.bam \
--outSAMattributes All \
--outSAMtype BAM Unsorted \
--outFilterMismatchNmax 10

Does anyone have suggestions as to why this failed, or other recommendations for measuring how much polyA content I have in my samples?

Thank you!

RNA-Seq CLIP-seq Star Aligner PolyA tail • 1.7k views

I do not really see the point in doing that. Polyadenylation is a post-transcriptional modification, meaning that dedicated enzymes add the polyA tail to the pre-mRNA after transcription. The polyA tail is not encoded in the genome, so alignment won't help you. I would rather use a dedicated trimming tool such as bbduk, trimmomatic or cutadapt to trim polyA tails (please use the search function), and then check what percentage of the reads contained that pattern.
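
For a quick first look before setting up one of those trimmers, the "what percentage of reads contain the pattern" check can be roughed out with standard tools. This is only a minimal sketch under a few assumptions that are not from the post: the function name `polya_fraction` is made up, the default threshold of 15 consecutive A's is arbitrary, the FASTQ is assumed to use plain 4-line records, and only exact runs of A (no mismatches) are counted.

```shell
# Hypothetical helper (name and defaults are illustrative, not from any tool).
# Prints: total_reads <TAB> reads_with_A_run <TAB> percent
polya_fraction() {
    local fastq_gz="$1"        # gzip-compressed FASTQ, 4 lines per record (assumed)
    local min_run="${2:-15}"   # minimum number of consecutive A's (arbitrary default)

    gzip -dc "$fastq_gz" | awk -v n="$min_run" '
        BEGIN {
            # Build a string of n A characters to search for.
            pat = sprintf("%" n "s", "")
            gsub(/ /, "A", pat)
        }
        # Line 2 of every 4-line record is the read sequence.
        NR % 4 == 2 {
            total++
            if (index(toupper($0), pat) > 0) hits++
        }
        END {
            pct = (total > 0) ? 100 * hits / total : 0
            printf "%d\t%d\t%.2f\n", total, hits + 0, pct
        }
    '
}
```

It could be called as `polya_fraction pathtomyfile.fastq.gz 20`. Because it only matches exact runs, it will likely undercount reads whose A stretch contains sequencing errors, so a dedicated trimmer is still the better choice for the final numbers.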
