5.0 years ago
c.pouchon
Hi everyone,
I am working on shotgun genomic sequences from plants, and I want to map my sequences to a list of reference genes (e.g. BUSCO) in order to retrieve a consensus for each of my samples and then compare them in phylogenetic analyses.
I ran DIAMOND with my reads/contigs (from SPAdes) against my reference proteins and identified several hits. But I am wondering how I can retrieve a consensus (in nucleotides) for each reference gene, and whether there is a proper way to do this, as with the mpileup function in samtools. Does anyone have an idea?
Thanks. Best regards,
Charles P.
It is not clear what you are asking above. Let me try to restate it; let us know if this is correct.
You want to extract the reads/contigs (you mention both above; is that what you used for the DIAMOND search?) that "align" to a particular protein, as a consensus sequence (or eventually generate a consensus from them)?
Thanks, and sorry for the unclear question. You are right, I ran two kinds of analyses. But I am interested in the reads: I want to extract the reads aligning to a particular protein (according to DIAMOND) and then generate a consensus from them.
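For the consensus step, here is a minimal sketch of the column-wise majority rule that pileup-based consensus calling builds on, written in Python. It assumes the reads have already been placed on a common nucleotide coordinate system (for example by mapping them to a nucleotide reference) and padded to equal length with "-" where a read has no coverage; DIAMOND's protein-level hits alone do not give you this. The function name majority_consensus and its min_depth parameter are hypothetical, and real tools also weigh base qualities, which this sketch ignores.

```python
from collections import Counter


def majority_consensus(aligned_reads, min_depth=1, gap="-", unknown="N"):
    """Column-wise majority-rule consensus of pre-aligned, equal-length reads.

    Hypothetical helper: assumes every read is already aligned to the same
    coordinates and padded with `gap` where it has no coverage.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(r) != length for r in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    consensus = []
    for i in range(length):
        # Count only real bases; gap characters mean "no coverage here".
        counts = Counter(r[i].upper() for r in aligned_reads if r[i] != gap)
        depth = sum(counts.values())
        if depth < min_depth:
            # Too few reads cover this column to call a base.
            consensus.append(unknown)
            continue
        base, top = counts.most_common(1)[0]
        # A tie between different bases is reported as ambiguous.
        if sum(1 for c in counts.values() if c == top) > 1:
            consensus.append(unknown)
        else:
            consensus.append(base)
    return "".join(consensus)


reads = ["ACGT--", "ACGTTA", "-CTTTA", "--GTTG"]
print(majority_consensus(reads))  # ACGTTA
```

Raising min_depth (e.g. to 3 here) turns poorly covered columns into N, which is usually safer than trusting a single read when the consensus will feed a phylogenetic analysis.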
Then I suggest you extract the read names from your DIAMOND output (if you have tabular output, some combination of grep and cut commands should work) and then get those reads from your original fastq data. You can use filterbyname.sh from the BBMap suite as one option to do that.
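The same two steps can be sketched in Python, as a rough equivalent of the grep/cut plus filterbyname.sh approach. This assumes DIAMOND tabular output (--outfmt 6) with the default column order, where column 1 is the query (read) ID and column 2 is the subject (protein) ID, and a plain 4-line-per-record FASTQ. The function names read_ids_for_protein and filter_fastq are hypothetical, and names are matched exactly (no stripping of /1 or /2 pair suffixes), so check that your read headers and DIAMOND query IDs agree.

```python
import io


def read_ids_for_protein(diamond_tsv, protein_id):
    """Collect read IDs (column 1) whose hit subject (column 2) is protein_id.

    Assumes DIAMOND --outfmt 6 with the default columns.
    """
    ids = set()
    for line in diamond_tsv:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[1] == protein_id:
            ids.add(fields[0])
    return ids


def filter_fastq(fastq, wanted_ids):
    """Yield 4-line FASTQ records whose read name is in wanted_ids."""
    while True:
        header = fastq.readline()
        if not header:
            break  # end of file
        seq = fastq.readline()
        plus = fastq.readline()
        qual = fastq.readline()
        # Read name = header without the leading '@', up to the first whitespace.
        name = header[1:].split()[0]
        if name in wanted_ids:
            yield header + seq + plus + qual


# Small in-memory example with made-up reads and protein IDs.
diamond = io.StringIO(
    "read1\tgeneA\t90.0\t50\t5\t0\t1\t150\t10\t59\t1e-20\t80.0\n"
    "read2\tgeneB\t70.0\t40\t12\t0\t1\t120\t5\t44\t1e-10\t50.0\n"
    "read3\tgeneA\t88.0\t45\t5\t0\t4\t138\t30\t74\t1e-18\t75.0\n"
)
fastq = io.StringIO(
    "@read1 extra\nACGT\n+\nIIII\n"
    "@read2\nTTTT\n+\nIIII\n"
    "@read3\nGGCC\n+\nIIII\n"
)
wanted = read_ids_for_protein(diamond, "geneA")
selected = list(filter_fastq(fastq, wanted))
print("".join(selected), end="")
```

For real data you would open the DIAMOND and FASTQ files instead of the io.StringIO examples (and gzip.open for compressed FASTQ); filterbyname.sh does the FASTQ filtering step for you and handles more formats.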