Question: Paired-end flag for singleton

Dear all,

I'm having trouble identifying singletons in paired-end sequencing data from Hi-C. My Hi-C library was sequenced on a 150 bp (75x2) paired-end Illumina flowcell. I ran HiC-Pro (https://github.com/nservant/HiC-Pro) on the .fastq files and got the following results:

Total_pairs_processed   3377696 100.0
Unmapped_pairs  227709  6.742
Low_qual_pairs  0       0.0
Unique_paired_alignments        716549  21.214
Multiple_pairs_alignments       686717  20.331
Pairs_with_singleton    1746721 51.713
Low_qual_singleton      0       0.0
Unique_singleton_alignments     0       0.0
Multiple_singleton_alignments   0       0.0
Reported_pairs  716549  21.214

I would like to learn more about the 51.713% Pairs_with_singleton, so I'm trying to extract these singleton reads. However, I can't find the proper SAM/BAM flag to retrieve them.

1- Does anyone know the proper sam/bam flag to retrieve singletons?

Apart from that, I mapped my fastq files with bowtie2, independently of HiC-Pro, using the following command:

bowtie2 -N 1 -x ~/Desktop/Genomes_ref/bowtie2/hg19 -1 mysample_S1_L001_R1_001.fastq -2 mysample_S1_L001_R2_001.fastq -S mysample.sam

When I then tried to retrieve the singleton information, I got different flag numbers for the same read pair:

~/Desktop/test$ samtools view -f 9 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145     89      chr16   150502  42      76M     =       150502  0       CACAGGCTGCAGAGAGTGGGCGCTGTTACCCGTTCACATAAACTTTCTAACCATGCACACAGATCAGAAAACACCC        CGGGEEC<ECFAF@F:GEGEGCGGGGGGGEF9EGGGGE9FDFGGGGGGGGGFGGGGFGDGGFFCFGGGGGECCCCC    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:9279:1148     89      chrUn_gl000225  71224   1       26M     =       71224   0       CAAGAGATGTAACTATTCTCCAGGCT      EECE<ACFGFGGFFE6C-G@ECC?CC AS:i:-5  XS:i:-5 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:2G23       YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    77      *       0       0       *       *       0       0       AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG     YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    141     *       0       0       *       *       0       0       NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC        #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152    73      chr9    140284100       42      76M     =       140284100       0       GAGAGGGACAGAGAGGGACAGTGAGACCAGCAAGGAGCTGGGACGCTGGGAGCCAGGTGGATGCATGCAGAGAGGG        CCCCCEGGGGGGGGGGECGGGFGGGGGGGGGFFEGGGGFFGECFCCGEGGGGGGGGG@<DEGG<EGG@CF9<6FFE    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    77      *       0       0       *       *       0       0       AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT        CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF    YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    141     *       0       0       *       *       0       0       NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152     89      chr9    40639288        1       76M     =       40639288        0       CCTGCCAGCAGATGAGCTTCAAAGTGCCTTAAGGAAGCACTTTGACCAGAAGGTAGATAACTCTTATTATAGAAGA        GEGGGGGGCGGGCGFGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGGGGGGFGGGGGGGGGCCCCC    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10574:1152    89      chr3    162948628       30      76M     =       162948628       0       GACAAAAACAAGCAATGGGGAAATAATTCCCTATTTAATAAATGGTGTTGGGAAAACTGGCTAGCCATATGCAGAA        <C7GGGGGGGGFE9GGE<C<EDCGCGGGGGGGGGGFFCGGGGGFFFGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC    AS:i:0  XS:i:-5 XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:76 YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:15507:1153    77      *       0       0       *       *       0       0       AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC        CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    YT:Z:UP
~/Desktop/test$ samtools view -f 5 mysample.sam | head
M02015:342:000000000-BPD5F:1:1101:9901:1145     133     chr16   150502  0       *       =       150502  0       NTCCAGCTCTGTATTTAGAGTCNNNNNNNGTTGGGGAGATTGGNNNNNANTTGGGGATCAGATTTGGNNATCTTGT        #8ACCFF<FGGEFGGGGCC9FC#######::CFFDGDGGGGGG#####:#696<<@7@FF,,,FEDF##::CDC9E    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9279:1148     133     chrUn_gl000225  71224   0       *       =       71224   0       NATCAGTGCATAGATAACTCACNNNNNNNCCTGTAAGCAGAGCNNNNNCNAGAGTTACATAACCCCGNNAATCAGT        #8-B-CFFG@,,;,;FEGGDG8#######,:CC6,,<CF@F@F#####:#:,,99,,CFE,C886BC##99:C<AC    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10882:1150    77      *       0       0       *       *       0       0       AGTCCTGATCCCCAAATCTGATCCCCAAATCTGATCAGTCAGAGGAAAGTGGGCCACACGGGAAGAGAGGTTCTC CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG     YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:10882:1150    141     *       0       0       *       *       0       0       NGACAGAGACAGATCCCATCCCNNNNNNNACTGGCCTTCAAACNNNNNANATTTTAAAGCCTGAAAANNAAGCTAC        #8BCCGGGGGGGGFFFFGFGGG#######::DFGGFFGGGGG?#####:#:9CCFECFGFGGCCFFF##::CFFGG    YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:13285:1152    133     chr9    140284100       0       *       =       140284100       0       *       *       YT:Z:UP YF:Z:LN
M02015:342:000000000-BPD5F:1:1101:11747:1152    77      *       0       0       *       *       0       0       AAAAAATTGGGCCAGGCATGGTAGCTCATGCCTATAATCCCAGCACTTTGGGAGGCCAAGAGGGGAGGAACAGATT        CC<CCFGG9<,6CF@@8F@C@FGGG<F<FGGGFGFCF<6,CE,EFC<<FGGG,@@@<E<E<AFCECEF:,C,C,CF    YT:Z:UP
M02015:342:000000000-BPD5F:1:1101:11747:1152    141     *       0       0       *       *       0       0       NTCATCGAATGGACTCGAAAGGNNNNNNNTAATGGACTTGAATNNNNNGNTCCCCAAATCTGATCCCNNAATCTG #-ACCGFAFFF8<C,CD<86@@#######,,:@FFG,CEFGG9#####:#,:CFFGEF<D@9@C@F@##9:CDFC     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:9920:1152     133     chr9    40639288        0       *       =       40639288        0       NACCTG  #8BCCG  YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:10574:1152    133     chr3    162948628       0       *       =       162948628       0       NAAACCTCTAGGATCCCCAAATNNNNNNNCCAAATATGATCCTNNNNNANCCTGACAAAAACAAGCANNGGGGAA #86A@<FGGGF9@AEGGCGGCG#######,:C@FC,C<,CFFG#####:#:,@FFGGFCCFE<F@FG##::7@F:     YT:Z:UP YF:Z:NS
M02015:342:000000000-BPD5F:1:1101:15507:1153    77      *       0       0       *       *       0       0       AATCCCAGCACTTTGGCAGGCCGAGGTGGGCGGATCCCCAAATCTGATCCCCAAATCTGATCCCCAAATCTGATCC        CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG    YT:Z:UP
~/Desktop/test$ samtools view -f 5 -F 9 mysample.sam | head
~/Desktop/test$

For example, the read M02015:342:000000000-BPD5F:1:1101:9901:1145 has flag 89 when I use -f 9, but the same read has flag 133 when I use -f 5.

2- Does anyone know why the flag changes?

Thank you in advance for your time, Raphael

12 months ago, docdot • 0 • updated 12 months ago by prasundutta87 • 330

The flag does not change. Each mate has its own flag. 89 means (1=paired | 8=mate unmapped | 16=read reverse strand | 64=first in pair) and 133 means (1=paired | 4=unmapped | 128=second in pair).
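To make the decomposition above easy to check, here is a minimal Python sketch that splits a SAM flag into its bits. The bit names follow the SAM specification; the helper name `decode_flag` is made up for this illustration and is not part of samtools or any library.

```python
# SAM flag bits and their meanings, as listed in the SAM specification.
SAM_FLAG_BITS = {
    1: "paired",
    2: "proper pair",
    4: "read unmapped",
    8: "mate unmapped",
    16: "read reverse strand",
    32: "mate reverse strand",
    64: "first in pair",
    128: "second in pair",
    256: "secondary alignment",
    512: "fails quality checks",
    1024: "duplicate",
    2048: "supplementary alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM flag, lowest bit first.

    Hypothetical helper written for this example only.
    """
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# Flags taken from the samtools output in the question.
for flag in (89, 133, 77, 141):
    print(flag, decode_flag(flag))
```

Flags 77 and 141 both carry "read unmapped" and "mate unmapped", i.e. they belong to pairs in which neither mate aligned.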

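So, if you want the singletons that are aligned, filter on flag 8 (mate unmapped); if you want the non-aligned mate, filter on flag 4 (read unmapped), as @prasundutta87 suggests. Pairs in which neither mate aligned (flags 77 and 141 in your output) have both bits set, so also exclude the other bit, i.e. `-f 8 -F 4` and `-f 4 -F 8` respectively.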
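As a rough illustration of how `-f` and `-F` combine, the sketch below mimics the two samtools options on the flag values seen in the question: `-f` keeps a record only if all requested bits are set, and `-F` drops it if any listed bit is set. The function names `passes` and `select` are invented for this example; this is plain Python, not a samtools API.

```python
def passes(flag, require=0, exclude=0):
    """Mimic `samtools view -f REQUIRE -F EXCLUDE` for a single flag value.

    Hypothetical helper: keep the record only if every bit in `require`
    is set and no bit in `exclude` is set.
    """
    return (flag & require) == require and (flag & exclude) == 0


def select(flags, require=0, exclude=0):
    """Return the flags that would survive the filter, in input order."""
    return [f for f in flags if passes(f, require, exclude)]


# Distinct flag values that appear in the output posted in the question.
flags = [89, 133, 77, 141, 73]

print("-f 9      ", select(flags, require=9))
print("-f 5      ", select(flags, require=5))
print("-f 5 -F 9 ", select(flags, require=5, exclude=9))
print("-f 8 -F 4 ", select(flags, require=8, exclude=4))
print("-f 4 -F 8 ", select(flags, require=4, exclude=8))
```

This also suggests why `-f 5 -F 9` printed nothing: `-f 5` requires bit 1 (paired) while `-F 9` rejects every record that has bit 1, so no record can satisfy both. With `-f 8 -F 4` only the aligned singletons (89 and 73 here) remain, and with `-f 4 -F 8` only their unaligned mates (133).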

12 months ago, cschu181 ♦ 1.7k

I believe that when a read is a singleton, it counts as 'unpaired' within a paired-end sequencing run. So, you can check for reads whose flags do not have bit 1 set, which is the flag for 'paired-end' reads.

12 months ago, prasundutta87 • 330
