Question

The number of times each gene is repeated in my file equals the number of reads mapped to that gene

0

8.6 years ago

zizigolu ★ 4.3k

Sorry friends,

Below are a few rows of a BED file from bowtie2. Column one is the gene name, and columns two and three are the start and end positions where the read mapped. As you can see, gene YAL001C is repeated: the number of times it appears equals the number of reads that mapped to different places on this gene.

YAL001C    0    31    SRR1944914.13670510    42    +
YAL001C    0    31    SRR1944914.14245831    42    +
YAL001C    0    31    SRR1944914.14846638    42    +
YAL001C    21    49    SRR1944914.16464709    42    +
YAL001C    34    64    SRR1944914.16452509    42    +
YAL001C    39    68    SRR1944914.9573160    42    +
YAL001C    41    72    SRR1944914.10936494    42    +
YAL001C    47    78    SRR1944914.3091079    42    +
YAL001C    51    81    SRR1944914.14101000    42    +
YAL001C    63    94    SRR1944914.6961904    42    +
YAL001C    64    94    SRR1944914.1613580    42    +
YAL001C    81    112    SRR1944914.6321368    42    +
YAL001C    87    117    SRR1944914.15157073    42    +
YAL001C    102    133    SRR1944914.6375363    42    +
YAL001C    110    142    SRR1944914.3776687    42    +
YAL001C    110    140    SRR1944914.8299121    42    +
YAL001C    110    140    SRR1944914.10247842    42    +
YAL001C    123    153    SRR1944914.17267226    42    +
YAL001C    153    184    SRR1944914.11895906    42    +
YAL001C    162    191    SRR1944914.8661898    42    +
YAL001C    162    193    SRR1944914.15558858    42    +
YAL001C    183    214    SRR1944914.1191651    42    +
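The claim that the number of rows per gene equals the number of mapped reads can be checked with a short script. This is a minimal Python sketch, assuming a whitespace-separated BED6 layout like the rows above; the function name `count_reads_per_gene` is made up for illustration.

```python
from collections import Counter


def count_reads_per_gene(bed_lines):
    """Count BED rows per value of column one (here, the gene name)."""
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[0]] += 1
    return counts


# Three of the rows shown above, one row per mapped read.
rows = [
    "YAL001C\t0\t31\tSRR1944914.13670510\t42\t+",
    "YAL001C\t0\t31\tSRR1944914.14245831\t42\t+",
    "YAL001C\t21\t49\tSRR1944914.16464709\t42\t+",
]
print(count_reads_per_gene(rows))
```

Passing an open file handle instead of `rows` would work the same way, since the function only iterates over lines.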

I am working with yeast, and I ran tophat2 with the Ensembl GTF file. For many days I have been trying to produce such a BED file from the tophat2 BAM file, but I have not managed to yet. I want a file in which column one is the gene name, repeated once for each read mapped to it, and columns two and three are the start and end of each read's mapping.

Do you have any idea how to produce such a file?

Thank you

tophat2 bed read-count • 2.3k views

Updated 19 months ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

Ram · Answer 1 · 2015-09-09

3

8.6 years ago

andrew.j.skelton73 6.5k

Below are a few rows of a BED file from bowtie2

Does bowtie2 produce a bed file? As far as I'm aware it produces a SAM file. Please be specific as to where you got your bed file from.

Did you look at the fourth column? The read ID there appears to be the only difference between the three repeated entries for each start/stop combination.

I am working with yeast, and I ran tophat2 with the Ensembl GTF file

Be specific as to what commands you've already run - reproducibility is key.

bamtobed (part of bedtools) will produce a BED file from a BAM file with chromosome, start, stop, read IDs, etc. I guess your follow-up question would be "but what about gene annotation?" For that, look at the biomaRt package.

Updated 19 months ago by Ram 43k • written 8.6 years ago by andrew.j.skelton73 6.5k

0

Oh Andrew, come on. I mean I produced a SAM file, then a BAM file, then a BED file.

Updated 19 months ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

0

I mean that with the tophat2 command tophat2 -p 8 -G genes.gtf genome file.fastq (genes.gtf is, I think, the annotation file, and genome is the whole-genome FASTA), I produced a file named accepted_hits.bam. Then, using the

bam2bed < accepted_hits.bam | bedmap --echo --count genes.bed - > answer4.bed

command, I now have a BED file. However, my adviser is asking for a file like the one I pasted above, whereas what I have looks like this:

I    334    337    "YAL069W    .    +    protein_coding    start_codon    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    0
I    334    646    "YAL069W    .    +    protein_coding    CDS    0    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; protein_id "YAL069W"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    2
I    334    649    "YAL069W    .    +    protein_coding    exon    .    exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; seqedit "false"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128";    2
I    537    540    "YAL068W-A    .    +    protein_coding    start_codon    0    exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; transcript_id "YAL068W-A"; transcript_name "YAL068W-A"; tss_id "TSS5439";    0
I    537    789    "YAL068W-A    .    +    protein_coding    CDS    0    exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; protein_id "YAL068W-A"; transcript_id "YAL068W-A"; transcript_name

Anyway, thank you.

Updated 4.5 years ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k

Ram · Answer 2 · 2015-09-09

1

8.6 years ago

Istvan Albert 100k

This has nothing to do with tophat - what you have there most likely is the result of an intersect operation between a feature file and an alignment file - produced most likely by bedtools.

By default each overlap will be reported - hence you have the same gene reported each time it overlaps with a read. Consult the bedtools documentation on how to format the results of an intersect.
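To illustrate why a gene shows up once per overlapping read, here is a minimal Python sketch of an intersect that reports every overlap. It is not how bedtools is implemented; it is a naive all-against-all comparison for small inputs. The gene coordinates come from the exon and CDS rows pasted in the reply above, while the read names and positions are hypothetical.

```python
def intersect(genes, reads):
    """Yield one (gene, read) pair per overlap.

    genes and reads are lists of (chrom, start, end, name) tuples using
    0-based, half-open coordinates, as in BED.
    """
    for gene in genes:
        g_chrom, g_start, g_end, _ = gene
        for read in reads:
            r_chrom, r_start, r_end, _ = read
            # Two half-open intervals overlap when each starts before the other ends.
            if g_chrom == r_chrom and r_start < g_end and r_end > g_start:
                yield gene, read


genes = [("I", 334, 649, "YAL069W"), ("I", 537, 789, "YAL068W-A")]
# Hypothetical reads, invented for this example.
reads = [("I", 340, 371, "read1"), ("I", 600, 631, "read2"), ("I", 700, 731, "read3")]

gene_names = [gene[3] for gene, _ in intersect(genes, reads)]
print(gene_names)
```

Each gene name appears once for every read that overlaps it, which is the repetition seen in the question. A read that overlaps two genes, like read2 here, is reported under both.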

Updated 19 months ago by Ram 43k • written 8.6 years ago by Istvan Albert 100k

0

Thanks Istvan,

What I pasted above is the result of the BEDOPS tools: first I converted genes.gtf to genes.bed, and then, using my accepted_hits.bam as input, I got that result. But I need a file in which column one contains the gene name, repeated once for each read that mapped to it.
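One possible reading of the requested format is a BED6 row per read with the gene name in column one and the read position expressed relative to the gene. The following is a minimal Python sketch under that assumption. The function name `to_gene_relative`, the minus-strand gene, and the read coordinates are hypothetical; spliced reads are not handled, and whether coordinates should be flipped for minus-strand genes is a guess about what the adviser wants.

```python
def to_gene_relative(gene, read):
    """Return a BED6 row for `read` in coordinates relative to `gene`.

    gene: (chrom, start, end, name, strand)
    read: (chrom, start, end, name, score, strand)
    Coordinates are 0-based, half-open genome positions; the read is
    assumed to overlap the gene already.
    """
    _, g_start, g_end, g_name, g_strand = gene
    _, r_start, r_end, r_name, r_score, r_strand = read
    # Clip the read to the gene boundaries.
    start = max(r_start, g_start)
    end = min(r_end, g_end)
    if g_strand == "+":
        rel_start, rel_end = start - g_start, end - g_start
        rel_strand = r_strand
    else:
        # Assumption: for minus-strand genes, count from the gene's 5' end
        # (its genomic end) and flip the read strand accordingly.
        rel_start, rel_end = g_end - end, g_end - start
        rel_strand = "+" if r_strand == "-" else "-"
    return (g_name, rel_start, rel_end, r_name, r_score, rel_strand)


plus_gene = ("I", 334, 649, "YAL069W", "+")
minus_gene = ("I", 1000, 2000, "geneX", "-")  # hypothetical gene
print(to_gene_relative(plus_gene, ("I", 340, 371, "read1", 42, "+")))
print(to_gene_relative(minus_gene, ("I", 1969, 2000, "read2", 42, "-")))
```

Applying this to every overlapping gene/read pair would give one row per read with the gene name repeated, in the same column order as the file in the question.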

Updated 19 months ago by Ram 43k • written 8.6 years ago by zizigolu ★ 4.3k