Produce A Bed Or Position List File Containing A List Of Regions For Mpileup
1
0
Entering edit mode
10.5 years ago
rob234king ▴ 610

I mapped reads for SNP calling to a full 10Gb reference. I've reduced this reference to only those contigs that have SNPs of interest mapped to. I would now like to reduce the bam file so I can easily visualise in IGV.

From samtools the method of doing this seems to do samtools view on the bam file with the list of locations. My question is how to generate this file? Could I adapt a samtools faidx of my reduced contig reference file which has output like below to a bed file format?

scaff1 3970 29 79 80 scaff2 8501 4079 79 80

The first column is the name of the scaffold, then length of contig?, then start position? If this is the case could I not just move the third column to column position 2 and add column 2 and 3 for the end position column for a bed format file?

samtools mpileup • 4.8k views
ADD COMMENT
0
Entering edit mode

About the fai file Can you please tell me where I find information about .fai file format?

It's not clear what you want to do with this file . You can reduce the calling of mpileup with the option -l file.bed; Can also reduce the size of the BAM with samtools view -L region.bed your.bam

ADD REPLY
0
Entering edit mode

Yes sorry it was samtools view, realised this in a lecture. how do you create the bed file?

ADD REPLY
0
Entering edit mode

Looks like the -t setting can use the index.fai file, so I'll give that a try.

ADD REPLY
0
Entering edit mode

The -t option will replace the header in the output when you view the file, which is probably not what you want.

ADD REPLY
0
Entering edit mode

I try this:

samtools view -L rk1.bed -h A6_genome.sort.rmdup.bam > A6_genome.sort.rmdup_filtered.sam
samtools view -bS A6_genome.sort.rmdup_filtered.sam > A6_genome.sort.rmdup_filtered.bam

but the file is the same size as what it started so don't think it is working.

The only thing I can think of is my bed file is wrong?

(name_of_scaffold) (0) (length)

ADD REPLY
0
Entering edit mode

How many scaffolds are in the bed file you're passing to samtools and how many are in the assembly? The bed file should have a much smaller number of them.

ADD REPLY
0
Entering edit mode

approx 600 scaffolds in bed file and 10 million in reference that was used to map to bam file.

ADD REPLY
0
Entering edit mode

That's odd. Does the same thing happen if your bed file contains just 1 or 2 contigs? BTW, your two commands could be merged to samtools view -L rk1.bed -b -o A6_genome.sort.rmdup_filtered.bam A6_genome.sort.rmdup.bam, which will also probably be a bit faster.

ADD REPLY
0
Entering edit mode

I got round it by converting the bam file to sam and using tablet to visualise with my reduced reference which worked fine. I'll keep going trying to figure it out because I can't see why it doesn't work.

ADD REPLY
0
Entering edit mode

Thanks, please do report back here in the comments if you find out why that's not producing the expected behaviour.

ADD REPLY
0
Entering edit mode

If you just want to be able to visualize the alignments in IGV, then just sort and index the BAM file. IGV won't need to load the whole thing into memory.

ADD REPLY
0
Entering edit mode

Doesn't load, not sure if bam file is too big but only 1.5 gb.

ADD REPLY
0
Entering edit mode

Did it give an error message (start IGV from the command line if you don't normally)?

ADD REPLY
0
Entering edit mode
10.5 years ago

You can visualize in IGV any .bam file, independently of its size, as long as it is indexed and the reference genome that was used for the alignment is selected in advance. if your .bam file is not indexed (a .bai extension with the same name of the .bam file is what you would have to look for) samtools index yourbamfile would do, and if your genome of interest is not listed in the top left combobox on IGV interface you can enter it manually as .fasta or .genome on IGV's "genome" menu. both .bam/.bai and genome file can be locally or remotely stored served by http. that's all you need to load your .bam file into IGV.

ADD COMMENT
0
Entering edit mode

I think it's just a memory issue, the computer I'm working from has only 4gb of ram so may explain why IGV stalls.

ADD REPLY
1
Entering edit mode

4GB should be enough for running IGV with the 1.2GB option from the web, and even with the 750MB. it may be the case that you could have hugely covered regions, and so you could set IGV to downsample your visualization in order not to load each and every read. this can be done through [view]>[preferences]>[alignments]>[downsample reads], and has helped us when looking at really enriched target resequencing regions where we have more than 1000x.

ADD REPLY

Login before adding your answer.

Traffic: 2926 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6