Which Trimmomatic TruSeq adapter file should I use to remove TruSeq adapters?
19 months ago
salamandra • 200

1 - I'm analysing RNA-seq data from a publication that says the adapters used are TruSeq. I want to trim the adapters from this data with Trimmomatic, but the 'adapters' folder in Trimmomatic contains several files with 'TruSeq' in the name: 'TruSeq2-PE.fa', 'TruSeq2-SE.fa', 'TruSeq3-PE-2.fa', 'TruSeq3-PE.fa' and 'TruSeq3-SE.fa'. Which of those files should be used? All of them?

2 - Also, why does the TruSeq Index Adapter sequence in Trimmomatic's 'TruSeq3-SE.fa' file have an extra 'A' nucleotide at the beginning of the sequence:

>TruSeq3_IndexedAdapter
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC

compared with the same adapter in the adapter sequences PDF provided by Illuminia (page 25): https://support.illumina.com/content/dam/illumina-support/documents/documentation/chemistry_documentation/experiment-design/illumina-adapter-sequences-1000000002694-06.pdf ?

5’ GATCGGAAGAGCACACGTCTGAACTCCAGTCAC....

3 - Why is the TruSeq Universal Adapter in trimmomatic:

>TruSeq3_UniversalAdapter
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA

the reverse complement of the 3' part of same adapter in pdf provided by Illuminia (page 25)?

TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

4 - What are the different indexes underlined in the adapter sequences provided by Illuminia?

Maybe ask those detailed questions to one of the Trimmomatic devs (Tony Bolger is usually quite helpful -- you can find his details on the same webpage that also has the Trimmomatic documentation). If you do, please don't forget to share your newfound knowledge here.

Also, slightly off topic: it is Illumina, not Illuminia. ;)

Thanks for noticing, otherwise I would have kept saying Illuminia.

So Tony replied:

"1 - It depends mostly on which TruSeq protocol was used (V2, which is old at this stage and usually means data from the GAII, or V3, which covers everything from the HiSeq or later machines), and whether the data is single-end or paired-end (SE or PE). The only exception is TruSeq3-PE, which has two sets: TruSeq3-PE.fa works fine for high-quality libraries, but TruSeq3-PE-2.fa contains some additional sequences which find partial adapters in unusual locations/orientations.

2 - This reflects the A added during A-tailing.

3 - Because, AFAIK, that is the orientation which the adapter will have if it is included in the read. Naturally you can add it, or any other sequence you find and don't like, to the adapter file if it works better for you. "
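Answers 2 and 3 can be checked directly against the sequences quoted in the question. The following is a small Python sketch, not part of Trimmomatic; the variable names are made up for illustration, and the Illumina indexed adapter is only the 5' start shown in the post (the rest is elided there).

```python
# Sequences copied from this thread (Trimmomatic TruSeq3-SE.fa and the Illumina PDF).
TRIMMOMATIC_INDEXED = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
ILLUMINA_INDEXED_5P = "GATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # 5' start only, as quoted
TRIMMOMATIC_UNIVERSAL = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"
ILLUMINA_UNIVERSAL = "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT"


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Answer 2: the Trimmomatic entry is the Illumina sequence with a leading 'A'.
a_tail_explains_indexed = TRIMMOMATIC_INDEXED == "A" + ILLUMINA_INDEXED_5P

# Answer 3: the Trimmomatic entry is the reverse complement of the
# 3' end of the Illumina universal adapter.
universal_is_revcomp = ILLUMINA_UNIVERSAL.endswith(
    reverse_complement(TRIMMOMATIC_UNIVERSAL)
)

print(a_tail_explains_indexed, universal_is_revcomp)
```

Both checks hold for the sequences as quoted in this thread, which is consistent with Tony's explanation.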

There is a core sequence common to Illumina adapters. Once trimming programs find that sequence everything to the right of that core is generally trimmed.
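As a rough illustration of that idea, the two TruSeq3 sequences quoted in the question share a common start, and a toy trimmer can cut a read at the first occurrence of it. This is only a sketch with an exact-match search and a made-up insert; real trimmers such as Trimmomatic or bbduk.sh also handle mismatches and partial adapters at the read end.

```python
import os

# TruSeq3 sequences as quoted in the question (from TruSeq3-SE.fa).
TRUSEQ3_INDEXED = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"
TRUSEQ3_UNIVERSAL = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA"

# os.path.commonprefix compares any list of strings character by character.
core = os.path.commonprefix([TRUSEQ3_INDEXED, TRUSEQ3_UNIVERSAL])


def trim_at_core(read, core_seq):
    """Drop the core sequence and everything to its right (exact match only)."""
    pos = read.find(core_seq)
    return read if pos == -1 else read[:pos]


insert = "ACGTTGCAAGGCTTAACC"            # made-up insert, for illustration
read = insert + TRUSEQ3_INDEXED[:20]     # short insert followed by adapter read-through
trimmed = trim_at_core(read, core)

print(core)
print(trimmed)
```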

So, where do I find that sequence to provide it to trimmomatic?

The best way to know for sure is to ask the sequencing facility. They should provide you with that information as a customer service. Often the standard adapters will work, but sometimes the facility has used its own modifications. I do this whenever we receive a sample. The second-best option is to check for known sequences using FastQC or another program. SE = single-end, PE = paired-end. The protocol version should be given in the Materials and Methods section of the paper.

It's single-end. My question is whether I should use the TruSeq2 files, the TruSeq3 files, or both... and I'm not a customer of theirs.

Start with TruSeq3-SE.fa. If that does not seem to trim anything, then try TruSeq2-SE.fa.
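One rough way to judge whether an adapter file is relevant, before or after trimming, is to count how many reads contain the start of an adapter from that file. The sketch below is a hypothetical helper, not a Trimmomatic feature; it assumes a plain 4-line-per-record FASTQ and an exact match on the first 12 bases, and the two reads are invented for the example.

```python
def count_adapter_reads(fastq_lines, adapter, seed_len=12):
    """Count reads whose sequence contains the first seed_len bases of adapter.

    fastq_lines can be any iterable of lines, e.g. an open FASTQ file handle.
    Returns (reads_with_adapter, total_reads).
    """
    seed = adapter[:seed_len]
    total = hits = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # second line of each 4-line FASTQ record is the sequence
            total += 1
            if seed in line:
                hits += 1
    return hits, total


# Toy FASTQ records (made up): read1 runs into the adapter, read2 does not.
toy_fastq = [
    "@read1", "ACGTACGTACGTAGATCGGAAGAGCACACG", "+", "I" * 30,
    "@read2", "TTGACCATGGCATTGACCATGGCATTGACC", "+", "I" * 30,
]

hits, total = count_adapter_reads(toy_fastq, "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC")
print(f"{hits} of {total} reads contain the adapter start")
```

Running the same count on the input and on the trimmed output gives a quick before/after comparison; a count near zero on the untrimmed input suggests that adapter is simply not present in the data.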

How can we check that it trimmed? Are adapters always at the beginning of the reads, and do they disappear once trimmed? I'm sorry if I'm asking dumb questions, I'm just starting to learn bioinformatics on my own.

Adapter sequence should never be at the beginning of reads. If it is, then you may have an adapter dimer without an insert. You will not see any adapter sequence (and hence nothing may be trimmed) if your inserts are longer than the number of sequencing cycles. Only if you have short inserts (shorter than the read length) will you see adapter sequence towards the 3' end of the reads. I am not a regular Trimmomatic user, but I assume it produces a log of what got trimmed (if anything).
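The relationship between insert size and read length can be sketched with a toy model in Python. The function names and inserts are made up for illustration, and the model assumes the read simply continues from the insert into the 3' adapter quoted in the question.

```python
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"  # TruSeq3 indexed adapter from the question


def simulate_read(insert, read_length, adapter=ADAPTER):
    """Toy model: the sequencer reads the insert, then continues into the adapter."""
    template = insert + adapter
    return template[:read_length]


def adapter_bases_in_read(insert_len, read_length):
    """Number of cycles spent past the end of the insert (0 if the insert is longer)."""
    return max(0, read_length - insert_len)


short_insert = "ACGT" * 5    # 20 bases, shorter than the read
long_insert = "ACGT" * 15    # 60 bases, longer than the read
read_length = 50

short_read = simulate_read(short_insert, read_length)
long_read = simulate_read(long_insert, read_length)

print(adapter_bases_in_read(len(short_insert), read_length), short_read)
print(adapter_bases_in_read(len(long_insert), read_length), long_read)
```

In this model the short insert yields a read whose 3' end is adapter, while the long insert yields a read with no adapter at all, so there is nothing to trim.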

If you are willing, I suggest you give bbduk.sh from the BBMap suite a try instead of, or in addition to, Trimmomatic. Its options are easy to use and understand. Here is a guide to get you started.

It makes sense for adapter sequences to be only at the end of reads. Nevertheless, bbduk.sh has an option for removing 5' adapters ("ktrim=r" is for right-trimming (3' adapters), and "ktrim=l" is for left-trimming (5' adapters)), probably to remove the dimers, I guess.

bbduk.sh can trim any type of sequences (not only adapters). One can even provide sequences to scan/trim by using literal=seq1,seq2 etc. (with real sequences in place of seq1 and seq2).

"There is a core sequence common to Illumina adapters"

True, but mind that different types of adapters have different cores, e.g. TruSeq vs. Nextera (becomes important once you analyze stuff like ATAC-seq).

One option is to use adapters.fa included with BBMap suite in the resources directory. It contains all commonly used commercial adapter kit sequences. There may be some additional trimming of the data (by using a common file) but that should not greatly affect the end result, especially when you have millions of reads to work with.
