How To Look For A Known Fusion In A FASTQ File
10.7 years ago
Angel ▴ 220

Hello:

I have an internal dataset for NCI-H660 with 8M mapped pairs (HiSeq, 50 bp paired-end) and an external dataset with 4M mapped pairs (50 bp paired-end, generated on a GAII).

Questions:

  1. I observe the TMPRSS2-ERG fusion in the external dataset, but not in the internal HiSeq data. What could be the reasons? I run TopHat2 fusion search with the same parameters on both datasets.

  2. How can I inspect the FASTQ file directly to see whether this fusion is present? The sequence of the TMPRSS2-ERG fusion is shown here: http://info.gersteinlab.org/images/c/cc/FusionSeq_results.jpg

  3. Does this mean we need to generate more data internally to find the same fusion? I am already using the following thresholds, which are the minimum possible:

tophat-fusion-post -p $np --skip-read-dist --num-fusion-reads 1 --num-fusion-pairs 1 --num-fusion-both 2 $index

Any help would be greatly appreciated. Thanks!

10.7 years ago

Use grep to search your FASTQ file for a specific sequence.

Something like:

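grep --no-group-separator -B 1 -A 2 GGAATAACCTGCCGCG myfastq.fastq > junctions.fastq

-A 2 means "print the 2 lines after each line that matches the sequence", and -B 1 means "print the 1 line before it". Together they give you the full 4-line FASTQ record (header, sequence, "+" line, qualities). --no-group-separator (GNU grep) stops grep from inserting "--" lines between non-adjacent hits, which would otherwise break the FASTQ format of the output. If you only need the matching sequence lines, you can omit these options. Also search for the reverse complement of the sequence, since reads from the opposite strand carry the junction in that orientation.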
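As a sketch of the reverse-complement check, assuming bash and GNU grep, the functions below compute the reverse complement in pure bash and then search for either orientation in a single grep pass. `revcomp` and `find_junction_reads` are hypothetical helper names, not part of any tool. `-F` treats both patterns as fixed strings, and `--no-group-separator` keeps grep from printing `--` between non-adjacent hits.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash and GNU grep; helper names are hypothetical.

# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
# Other characters (e.g. N) pass through unchanged.
revcomp() {
  local seq=$1 out="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    out+=${seq:i:1}
  done
  printf '%s\n' "$out" | tr 'ACGTacgt' 'TGCAtgca'
}

# Print every 4-line FASTQ record whose sequence contains the junction in
# either orientation. -B 1 adds the header line, -A 2 adds the "+" and
# quality lines; --no-group-separator keeps the output valid FASTQ.
find_junction_reads() {
  local seq=$1 fastq=$2 rc
  rc=$(revcomp "$seq")
  grep --no-group-separator -B 1 -A 2 -F -e "$seq" -e "$rc" "$fastq"
}
```

For example, `find_junction_reads GGAATAACCTGCCGCG myfastq.fastq > junctions.fastq`. Run it on both mate files, since either mate of a pair can span the junction.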

If your FASTQ file is gzipped, use zgrep instead of grep; it accepts the same grep options. If you have a BAM file, you can search it like this:

samtools view mybam.bam | grep GGAATAACCTGCCGCG - > junctions.sam

samtools view decodes the BAM into plain-text SAM and streams it, one record per line, to grep, which writes only the lines containing your sequence to junctions.sam. Search for the reverse complement here as well.
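For comparing the internal and external runs, a per-file count of supporting reads may be more useful than the reads themselves. Below is a minimal sketch, assuming bash and standard 4-line FASTQ records (no wrapped sequence lines); `count_junction_reads` and `revcomp` are hypothetical helpers. `gzip -cdf` decompresses gzipped input and, because of `-f`, passes plain FASTQ through unchanged, so one function covers both the grep and zgrep cases.

```bash
#!/usr/bin/env bash
# Sketch only: assumes bash, gzip, awk and grep; helper names are hypothetical.

# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
revcomp() {
  local seq=$1 out="" i
  for (( i = ${#seq} - 1; i >= 0; i-- )); do
    out+=${seq:i:1}
  done
  printf '%s\n' "$out" | tr 'ACGTacgt' 'TGCAtgca'
}

# Print the number of reads whose sequence line contains the junction in
# either orientation. Works on plain or gzipped FASTQ.
count_junction_reads() {
  local seq=$1 fastq=$2 rc
  rc=$(revcomp "$seq")
  # NR % 4 == 2 keeps only the sequence line of each 4-line record, so
  # header and quality strings can never produce a false hit.
  # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps a
  # zero count from being treated as an error.
  gzip -cdf "$fastq" | awk 'NR % 4 == 2' | grep -c -F -e "$seq" -e "$rc" || true
}
```

For example, `count_junction_reads GGAATAACCTGCCGCG internal_R1.fastq.gz` (file name hypothetical). Since the two runs differ in depth (8M vs 4M mapped pairs), you may want to compare counts per million pairs rather than raw counts.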
