Identify the number of unique fragments from MiSeq data
7.8 years ago
arash.askary ▴ 10

I have a series of single-end reads from a RAD library of 48 uniquely tagged individuals in FASTQ format. The data come from a small MiSeq run. I want to know the number of unique fragments per individual/barcode, but I'm not sure how to go about getting that number. I'm new to bioinformatics, but I was able to demultiplex the library with Stacks using the process_radtags program.

Could someone help? Thanks!

MiSeq stacks RAD • 2.0k views

I thought Stacks was a complete toolbox for RADseq analysis. So basically, you are looking to deduplicate your demultiplexed datasets to get all the unique sequences for each individual?


I'm sure there's a way to use Stacks for my problem. I just want to know the number of unique fragments associated with each barcode. I'm not sure what you mean by deduplicating...


The following may give you what you are looking for.

grep -A 1 "^@MACHINE_ID" your_file.fastq | grep -v "^@" | grep -v "\-\-" | sort | uniq -c

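Replace MACHINE_ID with the first few characters of the read header (e.g. K00045) that you see in your sequence files. The output lists each distinct sequence preceded by the number of times it occurs.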
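If all you need is the number of distinct sequences per sample, a variant that does not depend on the machine ID may be easier to reuse. This is only a sketch: it assumes uncompressed FASTQ files with exactly four lines per record, and both the `count_unique` function and the `samples/` directory of `.fq` files are hypothetical names, not part of Stacks.

```shell
#!/usr/bin/env bash
# Count the distinct read sequences in one FASTQ file (four lines per record assumed).
count_unique() {
    # Line 2 of every 4-line record is the sequence; sort -u keeps one copy of each.
    awk 'NR % 4 == 2' "$1" | sort -u | wc -l | tr -d ' '
}

# Hypothetical layout: one demultiplexed FASTQ file per barcode in ./samples
for fq in samples/*.fq; do
    [ -e "$fq" ] || continue   # skip if the glob matched nothing
    printf '%s\t%s\n' "$(basename "$fq" .fq)" "$(count_unique "$fq")"
done
```

Each output line is a sample name and its count of distinct sequences, separated by a tab. Note that this counts exact string matches only, so two reads that differ by a single sequencing error are counted as two fragments.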


Thanks! That seems to be exactly what I wanted. Just out of curiosity, is there an easy way to pick out fragments that are <95% identical in the same line of code? I've read the grep and uniq manuals and can't find a solution there.


For that you would need an aligner (e.g. BLAT or an NGS aligner): run an all-by-all search and specify your identity constraints.
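To illustrate what an all-by-all comparison involves, here is a rough sketch that is not an aligner: it compares every pair of sequences position by position, so it is only meaningful for reads of equal length with no indels, and its running time grows quadratically with the number of sequences. The function name `similar_pairs` is made up for this example.

```shell
# Print every pair of input sequences whose identity is at least MIN percent.
# Input on stdin: one sequence per line. Argument 1: MIN (defaults to 95).
similar_pairs() {
    awk -v min="${1:-95}" '
        { seq[NR] = $0 }
        END {
            for (i = 1; i < NR; i++)
                for (j = i + 1; j <= NR; j++) {
                    len = length(seq[i])
                    # Only equal-length, non-empty reads are compared.
                    if (len == 0 || len != length(seq[j])) continue
                    same = 0
                    for (k = 1; k <= len; k++)
                        if (substr(seq[i], k, 1) == substr(seq[j], k, 1)) same++
                    pct = 100 * same / len
                    if (pct >= min) printf "%s\t%s\t%.1f\n", seq[i], seq[j], pct
                }
        }'
}
```

It could be fed the distinct sequences of one sample, for example `awk 'NR % 4 == 2' sample.fq | sort -u | similar_pairs 95` (where `sample.fq` is a placeholder). Sequences that never appear in the output have no partner at or above the threshold. For anything beyond a small dataset, a real aligner as suggested above is the better route.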
