Find the 3' UTRs and 5' UTRs and their read counts from BAM files
18 months ago

Hello,

I have BAM files for 8 samples (4 normal and 4 diseased) from small-RNA sequencing, aligned with Novoalign. I excluded the miRNA reads from the BAM files using the following command:

bedtools intersect -v -a sample.bam -b hg19_mirna.gff3 > output.bam
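For reference, bedtools intersect -v keeps only the records in -a that overlap no interval in -b. The Python sketch below illustrates that rule under simplifying assumptions: reads and miRNA loci are already plain (chrom, start, end) tuples in 0-based half-open coordinates (GFF3 itself is 1-based, so its starts would need shifting by one), and the function names are hypothetical, not part of bedtools.

```python
from typing import Iterable, List, Tuple

# A genomic interval: (chromosome, start, end), 0-based half-open like BED.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def exclude_overlapping(reads: Iterable[Interval], mirnas: List[Interval]) -> List[Interval]:
    """Mimic the -v rule: keep only reads that overlap no miRNA interval."""
    return [r for r in reads if not any(overlaps(r, m) for m in mirnas)]


if __name__ == "__main__":
    mirnas = [("chr1", 100, 122)]
    reads = [("chr1", 90, 110), ("chr1", 200, 222), ("chr2", 100, 122)]
    print(exclude_overlapping(reads, mirnas))
```

The pairwise scan is quadratic and only meant to show the rule; for real BAM files bedtools itself is the practical choice.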


I did this for all the samples. Now I want to find which 3' and 5' UTR sequences are present in the resulting BAM files, and the read count for each UTR.

Could anyone suggest a way to do this?

You can try featureCounts, counting "UTR" features (-t UTR) summarised at the meta-feature level.
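featureCounts selects features by type (-t, default exon) and sums them into meta-features by an attribute (-g, default gene_id), so -t UTR would give UTR read counts per gene. Below is a simplified Python sketch of that meta-feature rule, assuming unstranded, unspliced reads as (chrom, start, end) tuples and features tagged with their meta-feature ID (all names hypothetical): a read overlapping several UTRs of one gene counts once, and a read overlapping two genes is skipped unless multi-overlap counting (featureCounts' -O) is switched on.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[str, int, int]          # (chrom, start, end), 0-based half-open, strand ignored
Feature = Tuple[str, int, int, str]  # (chrom, start, end, meta_feature_id), e.g. a UTR and its gene_id


def count_meta_features(reads: Iterable[Read], features: List[Feature],
                        allow_multi_overlap: bool = False) -> Counter:
    """Count reads per meta-feature (e.g. per gene) from feature-level overlaps.

    A read overlapping several features of the same meta-feature is counted once.
    A read overlapping more than one meta-feature is skipped unless
    allow_multi_overlap is True, in which case every meta-feature it hits gets a count.
    """
    counts: Counter = Counter()
    for chrom, start, end in reads:
        hits = {mid for c, s, e, mid in features if c == chrom and s < end and start < e}
        if len(hits) == 1 or (hits and allow_multi_overlap):
            for mid in hits:
                counts[mid] += 1
    return counts


if __name__ == "__main__":
    utrs = [("chr1", 0, 50, "GENE_A"), ("chr1", 40, 60, "GENE_A"), ("chr1", 55, 90, "GENE_B")]
    reads = [("chr1", 42, 48), ("chr1", 56, 58), ("chr1", 10, 20)]
    print(count_meta_features(reads, utrs))
    print(count_meta_features(reads, utrs, allow_multi_overlap=True))
```

For counts per individual UTR rather than per gene, each UTR needs its own ID, or featureCounts' -f option can be used to count at the feature level instead.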

Thank you. Actually, I want to get a GTF file of the UTRs and then count the reads with htseq-count, but I cannot find GTF files for the 3' and 5' UTRs. Do you know where I could get them?

Ensembl provides GTF files for many organisms. They contain "UTR" features (I know this for human and mouse).
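The "UTR" lines in an Ensembl GTF do not say which end of the transcript they belong to (at least in GRCh37-era releases; newer releases may instead label them five_prime_utr and three_prime_utr). One way to split them, sketched below in Python as an assumption rather than an established tool, is to compare each UTR with the CDS span of the same transcript: on the + strand a UTR to the left of the CDS is 5' and one to the right is 3', and the reverse holds on the - strand. The function names and the "5UTR"/"3UTR" labels are hypothetical.

```python
from typing import Dict, List, Tuple

# GTF coordinates are 1-based and inclusive; only their ordering matters here.


def transcript_id(attributes: str) -> str:
    """Pull the value of transcript_id "..." out of the GTF attribute column."""
    for field in attributes.split(";"):
        field = field.strip()
        if field.startswith("transcript_id"):
            return field.split(" ", 1)[1].strip('"')
    return ""


def classify_utrs(gtf_text: str) -> List[Tuple[str, str]]:
    """Return (label, gtf_line) pairs for UTR lines; label is "5UTR", "3UTR" or "unknown"."""
    rows = []
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) >= 9:
            rows.append(cols)

    # First pass: genomic span of all CDS segments per transcript.
    cds_span: Dict[str, Tuple[int, int]] = {}
    for r in rows:
        if r[2] == "CDS":
            tid, s, e = transcript_id(r[8]), int(r[3]), int(r[4])
            lo, hi = cds_span.get(tid, (s, e))
            cds_span[tid] = (min(lo, s), max(hi, e))

    # Second pass: place each UTR left or right of its transcript's CDS.
    result = []
    for r in rows:
        if r[2] != "UTR":
            continue
        tid, s, e, strand = transcript_id(r[8]), int(r[3]), int(r[4]), r[6]
        if tid not in cds_span:
            result.append(("unknown", "\t".join(r)))
            continue
        cds_lo, cds_hi = cds_span[tid]
        left = e < cds_lo    # UTR lies entirely before the CDS on the genome
        right = s > cds_hi   # UTR lies entirely after the CDS on the genome
        if strand == "+":
            label = "5UTR" if left else "3UTR" if right else "unknown"
        else:
            label = "3UTR" if left else "5UTR" if right else "unknown"
        result.append((label, "\t".join(r)))
    return result
```

Writing the 5UTR and 3UTR lines to separate files would give per-class GTFs; their feature column still reads UTR, so htseq-count would need -t UTR (its default feature type is exon).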

Hi, I am currently using the hg19 assembly (GRCh37). I cannot find GTF/GFF3 files of the UTRs for this version. Could you please link me to them? Thanks a lot.

I recommend upgrading to hg38! Otherwise, look in the Ensembl archives; release 75 was the last Ensembl release on GRCh37.

19 months ago
MPI IE, Freiburg, Germany

Using BioMart at Ensembl, you can get the 3' and 5' UTRs for hg19 (image below) at this address:

https://grch37.ensembl.org/biomart/martview/
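BioMart can also export UTR coordinates rather than sequences, which is what coverage and counting tools need. Below is a Python sketch converting such an export to BED, assuming a tab-separated file with a header row and the columns chromosome, UTR start, UTR end, strand (1/-1) and transcript ID in that order (the actual columns depend on the attributes you tick, so adjust as needed). BioMart positions are 1-based inclusive and BED is 0-based half-open, so only the start shifts by one. The optional chr prefix is there because Ensembl names chromosomes 1, 2, ... while UCSC-style hg19 references use chr1, chr2, ...; the function name is hypothetical.

```python
import csv
import io
from typing import List


def biomart_to_bed(tsv_text: str, add_chr_prefix: bool = True) -> List[str]:
    """Convert an (assumed) BioMart UTR export to 6-column BED lines."""
    bed = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 5:
            continue
        chrom, start, end, strand, tx = row[:5]
        if not start or not end:
            continue  # transcripts lacking this UTR come back with empty coordinates
        if add_chr_prefix and not chrom.startswith("chr"):
            chrom = "chr" + chrom  # Ensembl "1" -> UCSC-style "chr1"
        bed_strand = "+" if strand == "1" else "-"
        # 1-based inclusive [start, end] -> 0-based half-open [start - 1, end)
        bed.append(f"{chrom}\t{int(start) - 1}\t{int(end)}\t{tx}\t0\t{bed_strand}")
    return bed
```

Mitochondrial names (Ensembl MT versus UCSC chrM) and unplaced contigs would need extra handling.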

Then, with the UTR coordinates in a BED file, you can probably get the read coverage over these regions using, for example, deepTools multiBamSummary in BED-file mode.
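multiBamSummary in BED-file mode reports read counts for each BED region across all the BAM files, i.e. a regions-by-samples table. Below is a conceptual Python stand-in, assuming reads are already (chrom, start, end) tuples per sample and counting any overlap; deepTools applies its own filtering and read-extension rules, so its numbers may differ. The function name and sample labels are hypothetical.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # BED-style (chrom, start, end), 0-based half-open
Read = Tuple[str, int, int]    # aligned read span, same coordinate convention


def count_matrix(regions: List[Region], samples: Dict[str, List[Read]]
                 ) -> Tuple[List[str], List[Tuple[Region, List[int]]]]:
    """Count, for every region, how many reads of each sample overlap it.

    Returns the sample names (sorted, giving the column order) and one
    (region, counts) row per region.
    """
    names = sorted(samples)
    rows = []
    for chrom, start, end in regions:
        counts = [
            sum(1 for c, s, e in samples[name] if c == chrom and s < end and start < e)
            for name in names
        ]
        rows.append(((chrom, start, end), counts))
    return names, rows


if __name__ == "__main__":
    regions = [("chr1", 100, 200), ("chr2", 0, 50)]
    samples = {
        "normal_1": [("chr1", 150, 170), ("chr1", 190, 210), ("chr2", 60, 80)],
        "disease_1": [("chr2", 10, 30)],
    }
    names, rows = count_matrix(regions, samples)
    print("region", *names, sep="\t")
    for (chrom, start, end), counts in rows:
        print(f"{chrom}:{start}-{end}", *counts, sep="\t")
```

With the 8 BAM files, this layout gives one row per UTR and one column per sample, which is the shape needed to compare the normal and diseased groups.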