@RG information from Bowtie2 Alignments
6.4 years ago
landrjos

Hi All,

I use Bowtie2 to align my Illumina FASTQ files and then convert the output to BAM with Samtools. The resulting BAM files contain no barcode information, and no @RG (read group) information at all. Here is the header from one of the BAM files:

Does Bowtie2 save this information? If so, how would I access it from the BAM file?

@HD VN:1.0  SO:coordinate
@SQ SN:chr1 LN:197195432
@SQ SN:chr2 LN:181748087
@SQ SN:chr3 LN:159599783
@SQ SN:chr4 LN:155630120
@SQ SN:chr5 LN:152537259
@SQ SN:chr6 LN:149517037
@SQ SN:chr7 LN:152524553
@SQ SN:chr8 LN:131738871
@SQ SN:chr9 LN:124076172
@SQ SN:chr10    LN:129993255
@SQ SN:chr11    LN:121843856
@SQ SN:chr12    LN:121257530
@SQ SN:chr13    LN:120284312
@SQ SN:chr14    LN:125194864
@SQ SN:chr15    LN:103494974
@SQ SN:chr16    LN:98319150
@SQ SN:chr17    LN:95272651
@SQ SN:chr18    LN:90772031
@SQ SN:chr19    LN:61342430
@SQ SN:chrX LN:166650296
@SQ SN:chrY LN:15902555
@SQ SN:chrM LN:16299
@SQ SN:chr13_random LN:404305
@SQ SN:chr17_random LN:628739
@SQ SN:chr1_random  LN:1231697
@SQ SN:chr3_random  LN:41899
@SQ SN:chr4_random  LN:160594
@SQ SN:chr5_random  LN:357350
@SQ SN:chr7_random  LN:362490
@SQ SN:chr8_random  LN:849593
@SQ SN:chr9_random  LN:449403
@SQ SN:chrUn_random LN:5900358
@SQ SN:chrX_random  LN:1785075
@SQ SN:chrY_random  LN:58682461
@PG ID:bowtie2  PN:bowtie2  VN:2.1.0
HWI-ST425:160:D1JFWACXX:3:1306:14534:76146  16  chr1    3000195 42  36M *   0   0   GTATTATAATTGTAATAGTATATACTTGTATGTACT    JJJJJJJJJJJIJJJJJJJJJJHFHHHHFFFFFBCC    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:36    YT:Z:UU
HWI-ST425:221:C6B83ACXX:8:1201:13332:28408  0   chr1    3000206 42  36M *   0   0   GTAATAGTATATACTTGTATGTACTTAAAATATTTT    CBCFFFFDHHHHHJJJJIJJJJJJJJJJJJJJJJJJ    AS:i:-4 XN:i:4  XM:i:4  XO:i:0  XG:i:0  NM:i:4  MD:Z:32N0N0N0N0    YT:Z:UU
HWI-ST425:221:C6B83ACXX:8:1113:12482:87912  0   chr1    3000818 42  36M *   0   0   CTATCATGACCTCTGAATGACTAGGGAATCTTGGAC    @@@FFDFFHHHHHJJIGIJCHGHHGGIEHHGGIIIJ    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:36    YT:Z:UU
HWI-ST425:221:C6B83ACXX:8:2309:10001:51423  0   chr1    3000818 42  36M *   0   0   CTATCATGACCTCTGAATGACTAGGGAATCTTGGAC    CCCFFFFFHHHHHJJJJJJIIJJJJJJJIJJJJGIJ    AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:36    YT:Z:UU
Tags: ChIP-Seq, next-gen, sequencing, alignment
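For reference: the header above has no @RG line and none of the records carry an RG:Z: tag, which is what bowtie2 produces when no read group is supplied at alignment time. A minimal sketch for checking any SAM stream for read groups, assuming POSIX awk and tab-separated SAM text; `report_read_groups` is a hypothetical helper name, and for a BAM file you would feed it the output of `samtools view -h`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise read-group information in a SAM stream on stdin.
# Prints the number of @RG header lines and how many alignment records
# carry an RG:Z: optional field.
report_read_groups() {
  awk -F'\t' '
    /^@RG/ { rg_headers++; next }          # header line declaring a read group
    /^@/   { next }                        # any other header line
    {
      records++
      for (i = 12; i <= NF; i++)           # optional fields start at column 12
        if ($i ~ /^RG:Z:/) { tagged++; break }
    }
    END {
      printf "RG header lines: %d\n", rg_headers + 0
      printf "records with RG:Z: %d of %d\n", tagged + 0, records + 0
    }'
}

# Usage on a BAM file (requires samtools):
#   samtools view -h sample.bam | report_read_groups
```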

Did you provide the read group information to bowtie2? If so, show the parameters you used.


Here is the script I am using to run Bowtie2. Where would I add the read group information?

#!/bin/bash
#shell
#$ -S /bin/bash
#$ -pe smp_2 10

##### export OMPI_MCA_orte_rsh_agent="rsh:ssh"
##### export LD_LIBRARY_PATH=/usr/global/openmpi-1.5.3-w-psm/lib:$LD_LIBRARY_PATH
#####
#
#    variables that need to be set/changed by user
#
#####

######
#
#   unpaired reads:  script is set up to handle up to 5 (use UNPAIRED_READ_1...UNPAIRED_READ_5) to specify basenames 
#
######

export UNPAIRED_READ_1="GEO_Rep2_FAIRE_P1B9_S2_R2_001_Trimmomatic"            ## JOB_NAME WILL be based on read_1
          ## BASENAME for read_2 (no filetype extension)
          ## BASENAME for first file containing unpaired reads

export REFERENCE_GENOME="mm9.RM"    ## genome against which alignments are to be made

#
#   Build unpaired read list from basenames specified above
#   also move various files to $TMPDIR and uncompress if applicable 
#

export UNPAIRED_READS=${TMPDIR}/${UNPAIRED_READ_1}.fastq          ### UNPAIRED_READ_1
gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_1}.fastq
cp ${SGE_O_WORKDIR}/${UNPAIRED_READ_1}.fastq.gz ${TMPDIR}
gunzip ${TMPDIR}/${UNPAIRED_READ_1}.fastq.gz

if env | grep -q ^UNPAIRED_READ_2=
then
  export UNPAIRED_READS=${UNPAIRED_READS},${TMPDIR}/${UNPAIRED_READ_2}.fastq    ### UNPAIRED_READ_2
  gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_2}.fastq
  cp ${SGE_O_WORKDIR}/${UNPAIRED_READ_2}.fastq.gz ${TMPDIR}
  gunzip ${TMPDIR}/${UNPAIRED_READ_2}.fastq.gz
fi

if env | grep -q ^UNPAIRED_READ_3=
then
  export UNPAIRED_READS=${UNPAIRED_READS},${TMPDIR}/${UNPAIRED_READ_3}.fastq    ### UNPAIRED_READ_3
  gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_3}.fastq
  cp ${SGE_O_WORKDIR}/${UNPAIRED_READ_3}.fastq.gz ${TMPDIR}
  gunzip ${TMPDIR}/${UNPAIRED_READ_3}.fastq.gz
fi

if env | grep -q ^UNPAIRED_READ_4=
then
  export UNPAIRED_READS=${UNPAIRED_READS},${TMPDIR}/${UNPAIRED_READ_4}.fastq   ### UNPAIRED_READ_4
  gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_4}.fastq
  cp ${SGE_O_WORKDIR}/${UNPAIRED_READ_4}.fastq.gz ${TMPDIR}
  gunzip ${TMPDIR}/${UNPAIRED_READ_4}.fastq.gz
fi

if env | grep -q ^UNPAIRED_READ_5=
then
  export UNPAIRED_READS=${UNPAIRED_READS},${TMPDIR}/${UNPAIRED_READ_5}.fastq    ### UNPAIRED_READ_5
  gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_5}.fastq
  cp ${SGE_O_WORKDIR}/${UNPAIRED_READ_5}.fastq.gz ${TMPDIR}
  gunzip ${TMPDIR}/${UNPAIRED_READ_5}.fastq.gz
fi
printenv UNPAIRED_READS

### ls -l $TMPDIR
##### export UNPAIRED_READS=${TMPDIR}/${UNPAIRED_READ_1}.fastq,${TMPDIR}/${UNPAIRED_READ_2}.fastq,${TMPDIR}/${UNPAIRED_READ_3}.fastq

#
#   Put the bowtie2 binaries on our executable path
#

export BOWTIE2_PATH=/usr/global/bowtie2-2.1.0/bin   ## PATH to bowtie2 suite
export TWOBIT_PATH=/home/jscarsda/bin/x86_64
export PATH=$BOWTIE2_PATH:$PATH

#
#   environment variable BOWTIE2_INDEXES specifies directory where reference genomes reside if not in current directory
#

export BOWTIE2_INDEXES=/home/jlandry/reference_genomes_mm9_masked

#
#    compress the data to be aligned to enable more efficient copying of data to $TMPDIR, which is a unique directory on
#    the /tmp filesystem on the node on which this job is running.  This directory is created by the queueing system for
#    our job.
#

#### gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_1}.fastq
#### gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_2}.fastq
#### gzip ${SGE_O_WORKDIR}/${UNPAIRED_READ_3}.fastq

#
#  copy files containing reads to be aligned to ${TMPDIR} since local I/O is much faster than nfs I/O
#
#

#
#   run bowtie2 to perform alignments
#

time bowtie2-align -p $NSLOTS -N 1 -x $REFERENCE_GENOME -U $UNPAIRED_READS \
                                                        -S ${TMPDIR}/${UNPAIRED_READ_1}.${JOB_ID}.sam --no-unal

#
#    gzip output sam file
#

gzip ${TMPDIR}/${UNPAIRED_READ_1}.${JOB_ID}.sam

#
#   copy compressed sam file back to home directory
#

cp ${TMPDIR}/${UNPAIRED_READ_1}.${JOB_ID}.sam.gz ${SGE_O_WORKDIR}
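The five near-identical `if` blocks that stage UNPAIRED_READ_1 to UNPAIRED_READ_5 can be collapsed into one loop. A minimal sketch, assuming the same SGE_O_WORKDIR/TMPDIR layout as the script above; `stage_unpaired_reads` is a hypothetical name. Unlike the original, it treats an empty variable the same as an unset one, and it skips the gzip step when a .fastq.gz already exists in the work directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy UNPAIRED_READ_1..UNPAIRED_READ_5 (basenames of
# .fastq files in $SGE_O_WORKDIR) to $TMPDIR and build the comma-separated
# list that bowtie2's -U option expects in $UNPAIRED_READS.
stage_unpaired_reads() {
  local i base reads=""
  for i in 1 2 3 4 5; do
    # Read $UNPAIRED_READ_<i>; eval is safe because i is one of the literals 1 to 5.
    eval "base=\${UNPAIRED_READ_${i}:-}"
    [ -n "$base" ] || continue                 # basename not set: skip it
    # Compress in the work directory (faster copy over NFS) unless already done.
    if [ ! -e "${SGE_O_WORKDIR}/${base}.fastq.gz" ]; then
      gzip "${SGE_O_WORKDIR}/${base}.fastq" || return 1
    fi
    cp "${SGE_O_WORKDIR}/${base}.fastq.gz" "${TMPDIR}/" || return 1
    gunzip -f "${TMPDIR}/${base}.fastq.gz" || return 1
    reads="${reads:+${reads},}${TMPDIR}/${base}.fastq"
  done
  UNPAIRED_READS="$reads"
  export UNPAIRED_READS
}

# Usage in the job script, replacing the five if-blocks:
#   stage_unpaired_reads || exit 1
#   printenv UNPAIRED_READS
```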

Right, so you didn't tell it to use any read group information.
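bowtie2 writes an @RG header line only when it is given `--rg-id <ID>`; that option also adds an `RG:Z:<ID>` tag to every record, and each `--rg TAG:VALUE` adds one more field to the @RG line (see the bowtie2 manual). Barcodes are a separate issue: bowtie2 truncates read names at the first whitespace, so the index sequence in a Casava 1.8-style FASTQ header (for example `@HWI-ST425:160:D1JFWACXX:3:1306:14534:76146 1:N:0:ATCACG`) never reaches the SAM file unless you record it yourself, for instance in the read group. A minimal sketch, assuming one sample and one lane per FASTQ; the helper `rg_args_from_fastq`, the `flowcell.lane` ID scheme, and the PL/PU/BC values are illustrative choices, not something the original script does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build bowtie2 read-group options from the first header
# line of an uncompressed Casava 1.8-style FASTQ, e.g.
#   @HWI-ST425:160:D1JFWACXX:3:1306:14534:76146 1:N:0:ATCACG
# $1 = FASTQ path, $2 = sample name. Prints one option or value per line.
rg_args_from_fastq() {
  local fastq="$1" sample="$2" header name comment rest flowcell lane barcode
  IFS= read -r header < "$fastq" || return 1
  header="${header#@}"                       # drop the leading '@'
  name="${header%% *}"                       # read name (all that bowtie2 keeps)
  comment="${header#* }"                     # text after the first space
  [ "$comment" != "$header" ] || comment=""  # no space means no comment field
  rest="${name#*:}"                          # drop instrument
  rest="${rest#*:}"                          # drop run number
  flowcell="${rest%%:*}"                     # third ':' field
  rest="${rest#*:}"
  lane="${rest%%:*}"                         # fourth ':' field
  barcode="${comment##*:}"                   # index sequence is the last ':' field
  printf '%s\n' --rg-id "${flowcell}.${lane}" --rg "SM:${sample}" \
    --rg "PL:ILLUMINA" --rg "PU:${flowcell}.${lane}"
  if [ -n "$barcode" ]; then
    printf '%s\n' --rg "BC:${barcode}"
  fi
}

# Usage in the job script (bash), keeping the original bowtie2 call:
#   mapfile -t RG_ARGS < <(rg_args_from_fastq "${TMPDIR}/${UNPAIRED_READ_1}.fastq" "${UNPAIRED_READ_1}")
#   time bowtie2-align -p $NSLOTS -N 1 -x $REFERENCE_GENOME -U $UNPAIRED_READS \
#       "${RG_ARGS[@]}" -S ${TMPDIR}/${UNPAIRED_READ_1}.${JOB_ID}.sam --no-unal
```

Because `--rg-id` sets a single read group for the whole run, FASTQ files from different lanes or barcodes passed together via `-U a.fastq,b.fastq` would all get the same ID; align them in separate runs if they need to stay distinguishable.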


I am not a professional. Could you suggest some changes to the script to retain the read group information?


Have you read the bowtie2 documentation? Do so first; this is documented there.
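If you would rather not re-run the alignments, a read group can also be added afterwards by rewriting the SAM text: insert an @RG line at the end of the header and append RG:Z:<ID> to every record. A minimal sketch, assuming POSIX awk, one read group per file, and input that has no @RG line yet; `add_read_group` and the ID/sample values in the usage comment are hypothetical. Picard's AddOrReplaceReadGroups tool does the same job directly on BAM files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: add one read group to a SAM stream (stdin -> stdout).
# $1 = read-group ID, $2 = sample name. Assumes the input has no @RG line yet.
add_read_group() {
  awk -v id="$1" -v sm="$2" '
    BEGIN { OFS = "\t" }
    function print_rg() { print "@RG", "ID:" id, "SM:" sm; done = 1 }
    /^@/ { print; next }                       # pass existing header lines through
    {
      if (!done) print_rg()                    # header ended: emit @RG once
      print $0 "\tRG:Z:" id                    # tag every alignment record
    }
    END { if (!done) print_rg() }              # header-only input
  '
}

# Usage with samtools 1.x (values are placeholders):
#   samtools view -h old.bam | add_read_group D1JFWACXX.3 sample1 | samtools view -b -o new.bam -
```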
