Question

Adding nucleotides to a sequence in a fastq file

5.2 years ago

cs1878 • 0

Hello all,

Hoping you can help me out with an idea.

I have Illumina MiSeq data. I was sent demultiplexed files with the sample tags removed from the sequences. The primers remained - so the sequences were not completely processed when I received them. However, I do not have access to the original raw, multiplexed data that still contains the sample tags, for reasons I won't go into.

I would like to analyse the data using PipeCraft. However, for this you need multiplexed files that retain their sample tags.

Part of the method requires an "oligo file" which contains the metadata for the sample tags.

I was wondering if anyone knew of a command-line (or other) method of adding a unique set of nucleotides to each sequence within a fastq file. I could then loop through my files, adding a known unique sample tag for each of my samples, create the "oligo file", and feed this into PipeCraft so that it can continue with the pipeline as expected.

My searching online has only revealed how to remove, not add, nucleotides to a sequence. At this point, I do not know whether this is a crazy idea that is silly to attempt, or a valid way of handling the data that I have.
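For illustration, the tag-adding idea described above could be sketched in Python roughly as follows. This is a minimal sketch, not a tested pipeline step: it assumes uncompressed FASTQ with exactly four lines per record, Phred+33 quality encoding, and a tag placed at the 5' end of each read. The function name `add_tag`, the example tag, and the padding quality character `I` (Phred 40) are all hypothetical choices; the prepended bases need quality characters too, otherwise the sequence and quality lines would differ in length.

```python
from itertools import islice


def add_tag(fastq_lines, tag, tag_quality="I"):
    """Yield FASTQ lines with `tag` prepended to the sequence of every read.

    Assumes 4-line records and Phred+33 qualities (both are assumptions).
    """
    lines = iter(fastq_lines)
    while True:
        record = list(islice(lines, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        header, seq, plus, qual = (line.rstrip("\n") for line in record)
        yield header + "\n"
        yield tag + seq + "\n"
        yield plus + "\n"
        # Pad the quality string so it stays as long as the sequence.
        yield tag_quality * len(tag) + qual + "\n"


def tag_file(in_path, out_path, tag):
    """Write a tagged copy of one FASTQ file (paths are supplied by the caller)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(add_tag(src, tag))
```

Calling `tag_file` once per sample with a different tag would give each file its own artificial sample tag; the same tags would then be listed in the "oligo file".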

Any suggestions that you can provide would be greatly appreciated.

Best,

Chris

sequence fastq • 1.0k views

ADD COMMENT • link updated 5.2 years ago by ATpoint 82k • written 5.2 years ago by cs1878 • 0

Hello,

Why not just skip the Demultiplexing panel step, since you already have demultiplexed data?

ADD REPLY • link 5.2 years ago by Bastien Hervé 5.3k

Hi Bastien, I will try skipping this step. Thanks

ADD REPLY • link 5.2 years ago by cs1878 • 0

Same as the first comment from Bastien Hervé: skip this step. As far as I know, you almost always get your sequences back already demultiplexed; only in rare cases would you want to do it yourself. So the pipeline should be able to handle that.

I see in the PipeCraft paper that it consists of fairly general tools that are easy to use standalone. You probably won't even need all the tools that come with this pipeline. Do you know which clustering method you prefer? For example, PipeCraft has a chimera-check step before the clustering, but if you use UPARSE you can skip the chimera checking because it already has a built-in chimera checker. Depending on the marker and its length, read assembly can be much improved by trimming off the low-quality ends first, and this is also not mentioned in the paper. I also don't understand why they use such an old version of BLAST.

"The primers remained - so the sequences were not completely processed when I received them"

This is normal (and how it should be). I use cutadapt to trim them, but there are many other tools for it.
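As a rough illustration of the cutadapt step, a paired-end primer-trimming call could be assembled from Python like this. It is only a sketch under assumptions: the primer sequences and file names are placeholders (the actual primers are not given in this thread), paired-end reads are assumed, and the helper name `cutadapt_command` is made up. The options used are cutadapt's `-g`/`-G` (5' adapter on read 1 / read 2) and `-o`/`-p` (output for read 1 / read 2).

```python
import shlex


def cutadapt_command(fwd_primer, rev_primer, r1_in, r2_in, r1_out, r2_out):
    """Build a cutadapt argument list that trims 5' primers from paired reads."""
    return [
        "cutadapt",
        "-g", fwd_primer,   # primer expected at the start of read 1
        "-G", rev_primer,   # primer expected at the start of read 2
        "-o", r1_out,       # trimmed read 1 output
        "-p", r2_out,       # trimmed read 2 output
        r1_in, r2_in,
    ]


# Placeholder primers and file names, for illustration only.
cmd = cutadapt_command(
    "ACGTACGTACGT", "TGCATGCATGCA",
    "sample1_R1.fastq", "sample1_R2.fastq",
    "sample1_R1.trimmed.fastq", "sample1_R2.trimmed.fastq",
)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)` on a machine where cutadapt is installed.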

ADD REPLY • link 5.2 years ago by gb ★ 2.2k

Hi gb, many thanks for your comments. I would like to use swarm to cluster, but I am exploring options to optimise the pipeline to best represent my mock community diversity. I appreciate the comments on the pipeline. I will look into other options about the trimming and chimera checking. Best, Chris

ADD REPLY • link 5.2 years ago by cs1878 • 0