Biostar Beta. Not for public use.
Question: Making a counts
0
Entering edit mode

Hi everyone, I'm a newbie in R and RNA seq analysis, so don't judge me so much) I'm trying to perform RNA-seq analysis as it says here http://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html , I was given a bunch of bam/bam.bai files(C33A and HeLa) 3 scrambled and 3 shoct4 files for each gene, so as a first step I need to do counts, here is the first question: Should I use featurecounts functions for all bam files of one type of genes or parse them separately?

Secondly, I tried to parse all files as separately and together and got counts txt files, which different http://combine-australia.github.io/RNAseq-R/06-rnaseq-day1.html enter image description here

enter image description here

Tutorial count file: My Count file:

ADD COMMENTlink 15 months ago Nick • 0 • updated 15 months ago h.mon 25k
Entering edit mode
1

Use all files together when you feed them into featureCounts. That way you get a single matrix (genes in rows, samples in columns) that would be easier to use in downstream DE apps.

ADD REPLYlink 15 months ago
genomax
68k
Entering edit mode
1

Hi frozmanik,

welcome to Biostars. What you are asking are valid, but very basic questions that have been addressed a lot of times before. Therefore, I recommend that you spend quality time on reading this excellent tutorial/workflow on RNA-seq analysis, which also covers the questions you raised here. Please use google and the search function extensively because many of the common problems you'll encounter when reproducing the steps covered in the tutorial are extensively described here on Biostars as well as other communities or scientific blogs. Please also check How to add images to a Biostars post, and make sure the link you use is indeed correct. The link to the picture in your question seems to be broken. Cheers!

ADD REPLYlink 15 months ago
ATpoint
17k
Entering edit mode
0

Hello

use following commands for bam file, to create count matrix after aligning reads with genome

# in Linux
for f in *.sam; do samtools sort -@ 8 -o ${f/.sam/.bam} $f ;done
for f in *.bam; do htseq-count -f bam -i gene_name $f  genome_annotation.gtf  > ${f/.bam/.counts}; done 

#in R 
f <- list.files(".", "*.counts")
counts <- lapply(f, read.delim, header=F)
counts <- do.call(cbind, counts)
#remove 5 lines at the end
counts <- counts[-c(54135:54139),]
rownames(counts) <- counts[,1]
counts <- counts[,seq(ncol(counts)/2)*2]
colnames(counts) <- sub(".counts","",f)
ADD REPLYlink 15 months ago
ahmad mousavi
• 430
Entering edit mode
1

This might work for your specific situation, but because row numbers in counts <- counts[-c(54135:54139),] are hardcoded, it is not generic, and also rather complicated for something that can be solved with a one-liner like:

./featureCounts (options...) -a in.gtf -o countmatrix.txt *.bam
ADD REPLYlink 15 months ago
ATpoint
17k
Entering edit mode
0

While this answer may be correct on it own it is not addressing the question asked by OP. So I am moving this to a comment.

ADD REPLYlink 15 months ago
genomax
68k
1
Entering edit mode

As others said in the comments above, pass all bam files to featureCounts, then its output will contain all counts in one file, which is simpler to parse. However, there should be no differences in counts, regardless you pass one file at a time or all files together.

I don't understand which differences between your and the tutorial counts are puzzling you: 1) the actual counts; 2) the column names; 3) the missing length column.

The actual counts are different because you and the tutorial are analysing different data sets: cancer cell lines (C33A and HeLa) versus "basal stem-cell enriched cells (B) and committed luminal cells (L) in the mammary gland of virgin, pregnant and lactating mice". By the way, are you using a human or mouse reference genome?

Column names can be easily changed with colnames( seqdata ) <- c( "name1", "name2", "name3" ).

And the length column can be additionally parsed from the featureCounts output file.

ADD COMMENTlink 15 months ago h.mon 25k

Login before adding your answer.

Similar Posts
Loading Similar Posts
Powered by the version 2.0