Entering edit mode
5.1 years ago
Laven9
•
0
I have got my WES data and now I want to split my data into a sub-file containing data only from chr10 (both my data that in fastq format and the reference ucsc.hg19.fasta ). My code was like this:
for my data: I map it to ucsc.hg19.fasta by BWA
bwa mem ~/reference/ucsc.hg19.fasta ~/input/data.1.fastq.gz ~/input/data.2.fastq.gz |samtools view -S -b > ~/output/data.bam
then I sort it and index it.
samtools sort -n ~/output/data.bam > ~/output/data.sorted.bam
samtools index ~/output/data.sorted.bam
then I split it and convert it back to fastq
samtools view -h -b ~/output/data.sorted.bam chr10 > ~/output/chr10.bam
samtools bam2fq ~/output/chr10.bam > ~/output/chr10.fastq
cat ~/output/chr10.fastq | grep '^@.*/1$' -A 3 --no-group-separator > ~/output/chr10.1.fastq
cat ~/output/chr10.fastq | grep '^@.*/2$' -A 3 --no-group-separator > ~/output/chr10.2.fastq
And I do for reference:
samtools faidx ~/reference/ucsc.hg19.fasta chr10 > ~/reference/ucsc.hg19.chr10.fasta
But when I use these two sub-files ( chr10.1.fastq ,chr10.2.fastq) and reference(ucsc.hg19.chr10.fasta) for BWA again, I found I got a much smaller bam file( only 1 kb). It is unusual, for my chr10.bam is more than 40,000 kb.
What is wrong with my code? Please give me some advice! Thank you!