Question

Difference in read reference names when aligning reads.

5.7 years ago

TonyCN ▴ 60

This might be a newbie question: I'm a QM chemist stepping in for a bioinformatician at work, so I apologize in advance if I've left out information needed to answer it.

I have a documented set of steps that lets me align my paired-end reads to a human reference genome and then perform variant calling. I am using Samtools, GATK and Picard. I am also using the same reference genome FASTA (.fa) file as my colleague.

However, after running through to variant calling, when I look inside the SAM file I generated, the only reference names are "ref|NT_.....|". The original files generated by the bioinformatician have "NC...." as reference names, and the code further down the pipeline requires the NC... naming structure.

I don't want this question to feel too much like a black box, but if I could get a general idea of what caused the difference in the reference names of my aligned reads, I would be really grateful and can go from there.

Thanks

assembly sequence genome • 1.1k views

You must have aligned your data to a reference collection that had ref|NT.. names instead of the NC names.
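To see which reference names an alignment actually used, one option is to read the `@SQ` lines of the SAM header: each `SN:` field there is the name of one reference sequence (for a BAM file, `samtools view -H` prints the same header as text). The following is a minimal Python sketch, not part of the original pipeline; the function name `sam_reference_names`, the toy header lines (made-up accessions and lengths) and the `aligned.sam` path are all hypothetical.

```python
from typing import Iterable, List


def sam_reference_names(lines: Iterable[str]) -> List[str]:
    """Collect the SN: (reference sequence name) of every @SQ line in a SAM header."""
    names: List[str] = []
    for line in lines:
        if not line.startswith("@"):
            break  # header lines start with '@'; alignment records come after them
        if line.startswith("@SQ"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


# Toy SAM lines with made-up accessions and lengths (hypothetical, for illustration)
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "@SQ\tSN:ref|NT_000001.1|\tLN:1000\n",
    "@SQ\tSN:ref|NW_000002.1|\tLN:500\n",
    "read1\t0\tref|NT_000001.1|\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]

if __name__ == "__main__":
    print(sam_reference_names(example_sam))  # ['ref|NT_000001.1|', 'ref|NW_000002.1|']
    # With a real file (hypothetical path):
    # with open("aligned.sam") as fh:
    #     print(sam_reference_names(fh))
```

If every name printed starts with `ref|NT_` or `ref|NW_` and none with `NC_`, the reads were aligned against a reference whose sequences carry those names.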

Reply • 5.7 years ago by GenoMax 142k

Would this be something inside the human genome *.fa file?

Reply • 5.7 years ago by TonyCN ▴ 60

Yes. Take a look at the output of `grep "^>" <your reference>.fa`, which lists the sequence header lines, and see if that is what you have.
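A rough Python equivalent of that `grep` can also tally the RefSeq accession prefixes, which makes the problem obvious at a glance. In RefSeq, `NC_` accessions are complete genomic molecules (for GRCh38, the assembled chromosomes), while `NT_` and `NW_` are contigs or scaffolds. This sketch assumes, as is common, that a sequence's name is the text after `>` up to the first whitespace. The function names and example header lines are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Matches a RefSeq prefix at the start of the name or right after a '|',
# e.g. "NC_000001.11" or "ref|NT_000001.1|"
REFSEQ_PREFIX = re.compile(r"(?:^|\|)(N[CTW])_")


def fasta_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield each sequence name: the text after '>' up to the first whitespace."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def prefix_counts(lines: Iterable[str]) -> Counter:
    """Count sequence names by RefSeq prefix (NC, NT, NW); anything else is 'other'."""
    counts: Counter = Counter()
    for name in fasta_names(lines):
        match = REFSEQ_PREFIX.search(name)
        counts[match.group(1) if match else "other"] += 1
    return counts


# Hypothetical example lines; descriptions and sequences are placeholders
example_fasta = [
    ">ref|NT_000001.1| example contig\n",
    "ACGT\n",
    ">ref|NW_000002.1|\n",
    "ACGT\n",
    ">NC_000001.11 example chromosome\n",
    "ACGT\n",
]

if __name__ == "__main__":
    counts = prefix_counts(example_fasta)
    for prefix, n in sorted(counts.items()):
        print(prefix, n)
    # For a real file (hypothetical path):
    # with open("reference.fa") as fh:
    #     print(prefix_counts(fh))
```

A count of zero for `NC` means the FASTA cannot produce NC-named alignments, whatever the rest of the pipeline does.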

This is why you need to use a genome sequence and annotation that match each other.
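One quick way to check that a FASTA and a GTF annotation are matched is to confirm that every sequence name in the GTF's first column (seqname) also appears as a FASTA sequence name. The following is a minimal Python sketch with hypothetical function names and toy input. It only compares names, not coordinates or assembly versions.

```python
from typing import Iterable, List, Set


def fasta_name_set(lines: Iterable[str]) -> Set[str]:
    """Sequence names from FASTA headers: text after '>' up to the first whitespace."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """First (seqname) column of each non-comment, non-blank GTF line."""
    names: Set[str] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def names_missing_from_fasta(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> List[str]:
    """GTF seqnames with no matching FASTA sequence, sorted; empty means the names agree."""
    return sorted(gtf_seqnames(gtf_lines) - fasta_name_set(fasta_lines))


# Hypothetical toy inputs: a FASTA named ref|NT_...| and a GTF that uses NC_ names
example_fasta = [">ref|NT_000001.1|\n", "ACGT\n"]
example_gtf = [
    "#!toy annotation\n",
    'NC_000001.11\tRefSeq\texon\t1\t4\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]

if __name__ == "__main__":
    print(names_missing_from_fasta(example_fasta, example_gtf))  # ['NC_000001.11']
```

A non-empty result like this one means the annotation refers to sequences the reference does not contain under those names.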

Reply • 5.7 years ago by GenoMax 142k

Ah, I see! Thanks. I didn't spot a single NC name. The file I have is hg38_GRCh38.p12.allChr.fa. Any idea where I can get hold of the corresponding reference file with NC names rather than ref|NT/NW? I appreciate your help; I'm a little out of my skill set and comfort zone at the moment.

Reply • 5.7 years ago by TonyCN ▴ 60

Do you know where you got that file from? You can get matching reference FASTA and GTF annotation files from this page. You will need to realign your data, though.

Here is an informative blog post that you will find useful about which human reference to use.

Reply • 5.7 years ago by GenoMax 142k

I'm afraid I don't recall where that file on my system came from. Thank you for the information; I'll try to plod on from here.

Reply • 5.7 years ago by TonyCN ▴ 60