Question: the number of times each gene is repeated in my file equals the number of reads mapped to that gene

sorry friends,

Below are a few rows of a BED file from bowtie2. Column one is the gene name; columns two and three are the start and end positions where the read mapped. As you can see, gene YAL001C is repeated: the number of times it appears equals the number of reads that mapped to different places on this gene.

YAL001C 0 31 SRR1944914.13670510 42 +
YAL001C 0 31 SRR1944914.14245831 42 +
YAL001C 0 31 SRR1944914.14846638 42 +
YAL001C 21 49 SRR1944914.16464709 42 +
YAL001C 34 64 SRR1944914.16452509 42 +
YAL001C 39 68 SRR1944914.9573160 42 +
YAL001C 41 72 SRR1944914.10936494 42 +
YAL001C 47 78 SRR1944914.3091079 42 +
YAL001C 51 81 SRR1944914.14101000 42 +
YAL001C 63 94 SRR1944914.6961904 42 +
YAL001C 64 94 SRR1944914.1613580 42 +
YAL001C 81 112 SRR1944914.6321368 42 +
YAL001C 87 117 SRR1944914.15157073 42 +
YAL001C 102 133 SRR1944914.6375363 42 +
YAL001C 110 142 SRR1944914.3776687 42 +
YAL001C 110 140 SRR1944914.8299121 42 +
YAL001C 110 140 SRR1944914.10247842 42 +
YAL001C 123 153 SRR1944914.17267226 42 +
YAL001C 153 184 SRR1944914.11895906 42 +
YAL001C 162 191 SRR1944914.8661898 42 +
YAL001C 162 193 SRR1944914.15558858 42 +
YAL001C 183 214 SRR1944914.1191651 42 +

Anyway, I am working with yeast, and I used tophat2 with the Ensembl GTF file and so on. For many days I have been trying to get such a BED file from the tophat2 BAM file, but I have not managed it yet: a file in which column one is the gene name, repeated once for every read mapped to that gene, and columns two and three are the start and end of each read's mapping.

Do you have any idea how to get such a file?

Thank you

4.4 years ago by F

" in thebelow we can see few rows of a bed file from bowtie2"

Does bowtie2 produce a bed file? As far as I'm aware it produces a SAM file. Please be specific as to where you got your bed file from.

Did you look at the fourth column? The read ID there is what differs between the three repeated entries that share the same start/stop combination.
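A quick way to check this on the command line is sketched below. It assumes the file is tab-separated; `sample.bed` is a made-up file name holding the first four rows from the question, so swap in your own file.

```shell
# sample.bed is a hypothetical file name; the rows are the first four from
# the question, assumed to be tab-separated.
printf 'YAL001C\t0\t31\tSRR1944914.13670510\t42\t+\n' >  sample.bed
printf 'YAL001C\t0\t31\tSRR1944914.14245831\t42\t+\n' >> sample.bed
printf 'YAL001C\t0\t31\tSRR1944914.14846638\t42\t+\n' >> sample.bed
printf 'YAL001C\t21\t49\tSRR1944914.16464709\t42\t+\n' >> sample.bed

# Rows per gene name (column 1), printed as "gene<TAB>count".
rows_per_gene=$(cut -f1 sample.bed | sort | uniq -c | awk '{ print $2 "\t" $1 }')

# Gene/start/end combinations (columns 1-3) that occur more than once.
dup_coords=$(cut -f1-3 sample.bed | sort | uniq -d)

# Read IDs (column 4) that occur more than once; empty if every read is distinct.
dup_reads=$(cut -f4 sample.bed | sort | uniq -d)

printf '%s\n' "$rows_per_gene"
printf 'repeated coordinates: %s\n' "$dup_coords"
printf 'repeated read IDs: %s\n' "${dup_reads:-none}"
```

If the last line reports no repeated read IDs, every row is a separate read, so the row count per gene is the read count per gene.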

" anyway i am with yeast and i used tophat2 using ensemble gtf file and so on)"

Be specific as to what commands you've run already - reproducibility is key.

bamtobed (part of bedtools) will produce a BED file from a BAM file with chromosome, start, stop, read IDs, etc. I guess your follow-up question would be "but what about gene annotation?" For that, look at the biomaRt package.

4.4 years ago by andrew.j.skelton73

Oh, come on, Andrew... I mean I produced a SAM file, then a BAM file, then a BED file.

4.4 years ago by F

I mean that with the tophat2 command "tophat2 -p 8 -G genes.gtf genome file.fastq" (genes.gtf is, I think, the annotation file, and genome is the whole-genome FASTA), I produced a file named accepted_hits.bam. Then, using the command

bam2bed < accepted_hits.bam | bedmap --echo --count genes.bed - > answer4.bed

I got a BED file... but my advisor is asking me for a file like the one I pasted above.

What I have instead looks like this:

I 334 337 "YAL069W . + protein_coding start_codon 0 exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128"; 0
I 334 646 "YAL069W . + protein_coding CDS 0 exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; protein_id "YAL069W"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128"; 2
I 334 649 "YAL069W . + protein_coding exon . exon_number "1"; gene_id "YAL069W"; gene_name "YAL069W"; p_id "P3633"; seqedit "false"; transcript_id "YAL069W"; transcript_name "YAL069W"; tss_id "TSS1128"; 2
I 537 540 "YAL068W-A . + protein_coding start_codon 0 exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; transcript_id "YAL068W-A"; transcript_name "YAL068W-A"; tss_id "TSS5439"; 0
I 537 789 "YAL068W-A . + protein_coding CDS 0 exon_number "1"; gene_id "YAL068W-A"; gene_name "YAL068W-A"; p_id "P5377"; protein_id "YAL068W-A"; transcript_id "YAL068W-A"; transcript_name

anyway thank you

4.4 years ago by F

This has nothing to do with tophat - what you have there most likely is the result of an intersect operation between a feature file and an alignment file - produced most likely by bedtools.

By default each overlap will be reported - hence you have the same gene reported each time it overlaps with a read. Consult the bedtools documentation on how to format the results of an intersect.
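To make the "one row per overlap" behaviour concrete, here is a minimal awk sketch of the idea. It is not bedtools itself, only roughly what an intersect that writes out both the feature and the read would give you after reformatting. The file names and toy coordinates are made up, and two things are assumptions: that start and end in the target file are offsets from the gene start, and that strand and introns can be ignored.

```shell
# Toy inputs (hypothetical names and coordinates, tab-separated).
# genes.bed: chrom, start, end, gene name
printf 'I\t100\t400\tGENE_A\nI\t500\t900\tGENE_B\n' > genes.bed
# reads.bed: chrom, start, end, read ID, score, strand
printf 'I\t120\t150\tread1\t42\t+\n'  >  reads.bed
printf 'I\t130\t160\tread2\t42\t+\n'  >> reads.bed
printf 'I\t600\t630\tread3\t42\t+\n'  >> reads.bed
printf 'I\t950\t980\tread4\t42\t+\n'  >> reads.bed

# First file: remember every gene. Second file: for each read, print one row
# per gene it overlaps, with the gene name in column 1 and the read's
# coordinates shifted by the gene start (assumption: that is what is wanted).
# Reads that overhang the gene start would get a negative offset; this sketch
# does not clip them, and it ignores strand.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { chrom[NR] = $1; gs[NR] = $2; ge[NR] = $3; name[NR] = $4; n = NR; next }
     {
       for (i = 1; i <= n; i++)
         if ($1 == chrom[i] && $2 < ge[i] && $3 > gs[i])
           print name[i], $2 - gs[i], $3 - gs[i], $4, $5, $6
     }' genes.bed reads.bed > reads_by_gene.bed

cat reads_by_gene.bed
```

Each gene name then appears once per overlapping read, and read4, which overlaps no gene, is dropped. The nested loop is fine for a yeast-sized annotation but slow for anything large, which is one reason to use a dedicated interval tool for real data.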

4.4 years ago by Istvan Albert

Thanks, Istvan,

What I pasted above is the result of the BEDOPS tools: first I converted genes.gtf to genes.bed, then, using my accepted_hits.bam as input, I got this result. But I need a file in which column one contains the gene name, repeated as many times as the number of reads mapped to that gene.

4.4 years ago by F
