I have a problem with my degradome data. The company provided two files: raw reads and mappable reads (trimmed reads without the adapter). This is the FastQC report of overrepresented sequences for the mappable reads:
Sequence	Count	Percentage	Possible Source
AGTTCTACAGTCCGACGATCAGTTCTACAGTCCGACGATCAGTTCTACAGT	8897	0.29136760032893066	Illumina DpnII expression Sequencing Primer (95% over 21bp)
CAGAGTTCTACAGTCCGACGATCCAGAGTTCTACAGTCCGACGATCCAGAG	5043	0.16515306378091463	Illumina DpnII expression Sequencing Primer (100% over 23bp)
GCGACCCCAGGTCAGGCGGGACCACCCGCTGAGTTTAAGCATATCAATAAG	4503	0.14746861911668818	No Hit
GAGTTCTACAGTCCGACGATCGAGTTCTACAGTCCGACGATCGAGTTCTAC	4241	0.1388883885573783	Illumina Small RNA Adapter 1 (95% over 22bp)
AGAGTTCTACAGTCCGACGATCAGAGTTCTACAGTCCGACGATCAGAGTTC	3681	0.1205489644611435	Illumina Small RNA Adapter 1 (96% over 26bp)
CAGAGTTCTACAGTCCGACGATCCAGAGTTCTACAGTCCAACGATGGAATT	3234	0.10591017415575608	Illumina DpnII expression Sequencing Primer (100% over 23bp)
But the main issue is the sequence length. The raw reads are 51 bp long, which means RNA sequence + adapter = 51 bp. Yet in the clean reads (mappable.fq.gz, i.e. trimmed reads without adapter), about 20 percent of the sequences (500 000 reads) still have the same length of 51 bp as before trimming. How is that possible?
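To make the length observation reproducible, here is a minimal sketch of how the read-length distribution could be tallied directly from the FASTQ file. It assumes a standard four-line-per-record FASTQ; the function names `read_lengths` and `fraction_at_length` are made up for illustration, and the path in the usage comment is an assumption.

```python
import gzip
from collections import Counter


def read_lengths(fastq_lines):
    """Tally read lengths from an iterable of FASTQ lines (4 lines per record)."""
    counts = Counter()
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # the second line of each record holds the sequence
            counts[len(line.strip())] += 1
    return counts


def fraction_at_length(counts, length):
    """Fraction of all reads that have exactly the given length."""
    total = sum(counts.values())
    return counts[length] / total if total else 0.0


# Hypothetical usage on the trimmed file (path is an assumption):
# with gzip.open("mappable.fq.gz", "rt") as fh:
#     counts = read_lengths(fh)
# print(sorted(counts.items()))
# print(fraction_at_length(counts, 51))
```

If the fraction at 51 bp comes out near 0.2, that would confirm that a large share of reads passed through trimming with nothing removed.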
Furthermore, about 50 percent of the reads in the mappable file could not be matched to the target, even with 2 mismatches allowed. The sequences in this file are 10 to 51 bp long.
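To be concrete about what "2 mismatches allowed" means here, the following is a naive sketch of ungapped matching with a mismatch budget. It is only an illustration of the criterion, not the mapping tool that was actually used (which is not named above); `hamming` and `matches_target` are hypothetical helper names.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def matches_target(read, target, max_mismatches=2):
    """True if the read fits ungapped somewhere in target within the mismatch budget."""
    n = len(read)
    return any(
        hamming(read, target[i:i + n]) <= max_mismatches
        for i in range(len(target) - n + 1)
    )


# Toy example: one mismatch against the window "ACGA" is within the budget.
print(matches_target("ACGT", "TTACGATT"))
```

A read that still carries adapter sequence would typically exceed this budget, because the adapter part does not occur in the target at all.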
I think these are not clean reads and that some adapter remnants are still present in the sequences. Is that right?
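One way to check this suspicion would be to count how many reads still contain adapter-like sequence. The sketch below uses the 20-nt stretch `AGTTCTACAGTCCGACGATC`, which occurs in all five adapter-flagged sequences of the FastQC table above; treating that stretch as the adapter core is my assumption, and the function names are made up for illustration.

```python
# 20-nt motif shared by the adapter-flagged FastQC rows (assumed adapter core).
ADAPTER_CORE = "AGTTCTACAGTCCGACGATC"


def count_motif(seq, motif=ADAPTER_CORE):
    """Number of non-overlapping exact occurrences of the motif in a read."""
    return seq.count(motif)


def fraction_with_motif(seqs, motif=ADAPTER_CORE):
    """Fraction of reads that contain the motif at least once."""
    seqs = list(seqs)
    if not seqs:
        return 0.0
    hits = sum(1 for s in seqs if motif in s)
    return hits / len(seqs)


# Two sequences copied from the FastQC table: the top hit and the "No Hit" row.
examples = [
    "AGTTCTACAGTCCGACGATCAGTTCTACAGTCCGACGATCAGTTCTACAGT",
    "GCGACCCCAGGTCAGGCGGGACCACCCGCTGAGTTTAAGCATATCAATAAG",
]
for s in examples:
    print(count_motif(s), s)
```

The top hit contains the motif twice in tandem plus a partial third copy, which looks more like adapter concatemers with no insert than like degradome fragments.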