7.8 years ago
arash.askary
I have a series of single-end reads from a RAD library of 48 uniquely tagged individuals in FASTQ format. The data come from a small MiSeq run. I want to know the number of unique fragments per individual/barcode, but I'm not sure how to go about getting that number. I'm new to bioinformatics, but I was able to use Stacks to demultiplex the library with the process_radtags program.
Could someone help? Thanks!
I thought stacks was a complete toolbox for RADseq analysis. So basically you are looking to deduplicate your demultiplexed datasets to get all unique sequences for each?
I'm sure there's a way to use stacks for my problem. I just want to know the number of unique fragments that are associated to each barcode. I'm not sure what you mean by deduplicating...
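Deduplicating here means collapsing identical reads so that each distinct sequence is counted once. As a minimal sketch of that idea (not a Stacks feature), the Python below counts distinct read sequences in one demultiplexed file. It assumes uncompressed FASTQ with strict four-line records, and the function name `unique_sequences` is made up for illustration.

```python
import sys
from collections import Counter
from itertools import islice


def unique_sequences(fastq_lines):
    """Count each distinct read sequence in an iterable of FASTQ lines.

    Assumes strict 4-line records, so the sequence is the 2nd line of each.
    """
    sequence_lines = islice(fastq_lines, 1, None, 4)
    return Counter(line.strip() for line in sequence_lines)


if __name__ == "__main__":
    # One demultiplexed FASTQ file per barcode, given on the command line.
    for path in sys.argv[1:]:
        with open(path) as handle:
            counts = unique_sequences(handle)
        # Print: file name, number of distinct sequences, total reads.
        print(path, len(counts), sum(counts.values()), sep="\t")
```

Run on the per-barcode files written by process_radtags, this prints one line per file; gzipped output would need to be opened with `gzip.open(path, "rt")` instead.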
The following may give what you are looking for. Replace MACHINE_ID with a few characters of the string (e.g. K00045) you see in your sequence files.
Thanks! That seems to be exactly what I wanted. Just out of curiosity, is there an easy way of discerning fragments that are <95% identical in the same line of code? I've read the grep and uniq manuals and can't seem to find a solution there.
For that you would need to use an aligner (e.g. BLAT or an NGS aligner) and run an all-against-all search with an identity constraint.
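To illustrate what an identity constraint does, here is a rough Python sketch that greedily groups reads at 95% identity or higher. It is not an aligner: it compares reads position by position without gaps, only compares reads of equal length, and slows down quickly as the number of distinct reads grows. The names `identity` and `greedy_clusters` are made up for this example, and a real aligner or clustering tool would be the better choice for anything beyond a quick check.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / len(a)


def greedy_clusters(sequences, threshold=0.95):
    """Assign each sequence to the first centroid it matches at >= threshold.

    A sequence that matches no existing centroid starts a new cluster.
    Returns a dict mapping each centroid to the list of its members.
    """
    clusters = {}
    for seq in sequences:
        for centroid in clusters:
            same_length = len(centroid) == len(seq)
            if same_length and identity(centroid, seq) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            # No centroid was close enough, so this read founds a new cluster.
            clusters[seq] = [seq]
    return clusters
```

Feeding it the distinct sequences from one barcode and taking `len(greedy_clusters(seqs))` gives an approximate count of fragments that differ from one another by more than 5%. The result depends on input order, so sorting the sequences by abundance first is a common choice.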