Question

Extract Consensus Sequence from DIAMOND

5.0 years ago

c.pouchon • 0

Hi everyone,

I am working with shotgun genomic sequences from plants, and I want to map my sequences to a set of reference genes (e.g. BUSCO) in order to retrieve a consensus sequence for each of my samples and then compare them in phylogenetic analyses.

I ran DIAMOND with my reads/contigs (from SPAdes) against my reference proteins and identified several hits. But I am wondering how I can retrieve a consensus (in nucleotides) for each reference gene, and whether there is a proper way to do this, as with the mpileup function in samtools. Do you have any ideas?

Thanks, Best regards

Charles P.

alignment Assembly sequencing mapping • 1.2k views

It is not clear what you are asking above. Let me try to state it as follows. Let us know if that is correct.

You want to extract the reads/contigs that align to a particular protein (you mention both above; is that what you used for the DIAMOND search?) and then generate a consensus sequence from them?

5.0 years ago by GenoMax 141k

Thanks, and sorry that my question was unclear. You are right, I ran two kinds of analyses. But I am interested in the reads: I want to extract the reads that align to a particular protein (with DIAMOND) and then generate a consensus from them.

5.0 years ago by c.pouchon • 0

Then I suggest you extract the read names from your DIAMOND output (if you used tabular output, some combination of grep and cut should work) and then pull those reads from your original FASTQ data. filterbyname.sh from the BBMap suite is one program that can do that.
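As a rough illustration of the same idea without external tools, here is a minimal Python sketch. It assumes the DIAMOND output is in the default BLAST-style tabular format, where column 1 is the query (read) name and column 2 is the subject (reference protein) name, and that the FASTQ file uses plain four-line records whose read name is the first whitespace-delimited token of the header. The function names `read_ids_by_protein` and `filter_fastq`, and the file names in the usage note, are made up for this example.

```python
from collections import defaultdict


def read_ids_by_protein(tabular_handle):
    """Map each reference protein (column 2) to the set of read names (column 1).

    Assumes tab-separated, BLAST-style tabular output.
    """
    hits = defaultdict(set)
    for line in tabular_handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip blank, malformed, or comment lines
        hits[fields[1]].add(fields[0])
    return hits


def filter_fastq(fastq_handle, wanted):
    """Yield four-line FASTQ records whose read name is in `wanted`.

    Assumes strict four-line records (no wrapped sequence lines).
    """
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = fastq_handle.readline()
        plus = fastq_handle.readline()
        qual = fastq_handle.readline()
        name = header[1:].split()[0]  # drop '@', keep first token
        if name in wanted:
            yield header + seq + plus + qual
```

With hypothetical file names, it could be used along these lines: open `hits.tsv`, call `read_ids_by_protein` on it, then for each protein of interest open `reads.fastq` and write the records yielded by `filter_fastq(fastq, hits[protein])` to a per-protein FASTQ file. Those per-protein reads are then the input for whatever consensus step you choose.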

5.0 years ago by GenoMax 141k