Freebayes outputs only one individual out of many
1
0
Entering edit mode
5.4 years ago
Fedster ▴ 30

Hi All,

I am running freebayes as:

freebayes -f Gasterosteus_aculeatus.BROADS1.dna.toplevel.fa -C 5 -L p1list.txt --populations p1pops.txt> plate1.vcf

with p1list.txt (the path is the same for all files):

/path/to/S-176.sorted.bam
/path/to/S-177.sorted.bam
/path/to/S-178.sorted.bam
/path/to/S-179.sorted.bam
/path/to/S-180.sorted.bam
/path/to/S-181.sorted.bam
/path/to/S-182.sorted.bam
/path/to/S-184.sorted.bam
/path/to/S-185.sorted.bam
/path/to/S-186.sorted.bam
...

and p1pops.txt:

S-176.sorted    pop1
S-177.sorted    pop1
S-178.sorted    pop1
S-179.sorted    pop1
S-180.sorted    pop1
S-181.sorted    pop2
S-182.sorted    pop2
S-184.sorted    pop2
S-185.sorted    pop2
S-186.sorted    pop2
...

yet if I do

awk '{if ($1 == "#CHROM"){print NF-9; exit}}' plate1.vcf
1

which means most of my samples have been ignored or discarded. Freebayes is version: v1.2.0-2-g29c4002.

I'd be grateful for any idea of what is going on.

freebayes • 4.5k views
ADD COMMENT
1
Entering edit mode

With your awk command, you just print out the content of the 9th column before the last column, in the line where the first column is #CHROM. Saying this you will print out the sample name of the first sample in your vcf. I you havent't passed a sample name via ReadGroup during alignment, freebayes will just enumerate them. I guess this is not what you like to do?

fin swimmer

EDIT: Sorry, your command should print the number of samples and not the content of the column. Therefore there had to in $.

ADD REPLY
0
Entering edit mode

are you saying I should have set a ReadGroup tag to each bam file (corresponding to the IDs in the populations file?) for freebayes to correctly ID the bam files as belonging to different IDs?

ADD REPLY
0
Entering edit mode

This would be best-practice, yes. But it isn't neccessary.

What's the output of:

$ grep "^#CHROM" plate1.vcf

fin swimmer

ADD REPLY
0
Entering edit mode
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  unknown
ADD REPLY
0
Entering edit mode

Ok, I tried to add RG tags and it is just basically a fail. I can add tags, but freebayes still does not like them. Given a file for ID x

samtools addreplacerg -r ID:x -r SM:x -o x.bam  x.sorted.bam

adds a tag (at least, I can see a tag in the file and I get no error), but to no avail:

ERROR(freebayes):  could not find SM: in @RG tag

so either there is no pleasing freebayes or samtools is not adding all the tags I am specifying.

ADD REPLY
0
Entering edit mode

What's the output from:

$ samtools view -H x.bam

and

$ samtools view x.bam|head -n10

?

Have you used x.bam in your freebayes command?

fin swimmer

ADD REPLY
0
Entering edit mode

technically I used not just x.bam but w whole lot of bams (sorted) generated by bowtie2. Using a real sample name:

samtools view -H S-177.sorted.bam |head
@HD VN:1.0  SO:coordinate
@SQ SN:MT   LN:15742
@SQ SN:groupI   LN:28185914
@SQ SN:groupII  LN:23295652
@SQ SN:groupIII LN:16798506
@SQ SN:groupIV  LN:32632948
@SQ SN:groupIX  LN:20249479
@SQ SN:groupV   LN:12251397
@SQ SN:groupVI  LN:17083675
@SQ SN:groupVII LN:27937443

samtools view -H S-177.sorted.bam |tail
@SQ SN:scaffold_1927    LN:175
@SQ SN:scaffold_1928    LN:149
@SQ SN:scaffold_1929    LN:138
@SQ SN:scaffold_1930    LN:134
@SQ SN:scaffold_1931    LN:132
@SQ SN:scaffold_1932    LN:107
@SQ SN:scaffold_1933    LN:67
@SQ SN:scaffold_1934    LN:60
@PG ID:bowtie2  PN:bowtie2  VN:2.3.4    CL:"/user/leuven/319/vsc31924/appl_taito/bowtie2-2.3.4-linux-x86_64/bowtie2-align-s --wrapper basic-0 -x /user/leuven/319/vsc31924/data/BOWTIE/tss -S S-177.sam -U S-177.flash.extendedFrags.fastq.gz"
@RG ID:S-177
ADD REPLY
0
Entering edit mode
@RG ID:S-177

Here the sample name is missing. If you really used the command you used above this line should look like this:

@RG ID:x    SM:x

(Or whatever you take for x)

Try using ' around the the values in the samtools command (even if it wasn't necessary in my test)

fin swimmer

ADD REPLY
0
Entering edit mode

and

samtools view S-177.sorted.bam |head -n5
K00335:147:HL3KJBBXX:7:1227:25286:38627 0   MT  6894    1   64M *   0   0   CTGCTAAACGTGAAGTTTTAGCAGTTGAAATAACAACAACAAACGTCGAGTGACTTCACGGCTG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ>JJJ    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0XG:i:0    NM:i:0  MD:Z:64 YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:2225:26332:25668    0   MT  6894    1   64M *   0   0   CTGCTAAACGTGAAGTTTTAGCAGTTGAAATAACAACAACAAACGTCGAGTGACTTCACGGCTG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJ    AS:i:0  XS:i:0  XN:i:0  XM:i:0  XO:i:0XG:i:0    NM:i:0  MD:Z:64 YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:2217:5852:2352  0   MT  8155    0   153M    *   0   0   CAGCCATCGCTATTGCCCTCCCCTGAGTCCTTCTTCCCACCCCTACTGCCCGATGAACAAGCAACCGATTTCTAGGTCTCCAGGGTTGATTTATTAATCGTTTCACACAACAGCTTCTCCTACCAGTAAACCTTGGAGGACATAAATGAGCAG   JJJJJJJJJJJJJJJJJJJAJJJFAJJJJJFFFFJFJ>FJJJJAFFA-FFJJJJFJJJJJJAJJJJJJJJFFFJAFAFFFJJJJJJJJJJFJJJJFJJAJFJJJJJJJJJJJFF<FA0FJ+%<0F<AJJJ5FAA0FJJJJJJJJJJJ7<-A--   AS:i:-37    XS:i:-37    XN:i:0  XM:i:7  XO:i:0  XG:i:0  NM:i:7  MD:Z:7T5C26T2C29G29T17T31   YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:1101:3315:43357 0   MT  14083   1   132M    *   0   0   CTGCTGAATATGCAAAAACAACTAATATTCCTCCTAAATAGATAAGAAACAAGACCAGGGATAAAAATGGACCCCCATGACTTACTAGCACGCCACAACCCATACCTGCTACCATCACCAACCCTAGAGCAG    JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJJJJFF<<A:JJ    AS:i:-22    XS:i:-22    XN:i:0  XM:i:4  XO:i:0XG:i:0    NM:i:4  MD:Z:25C26A56C16A5  YT:Z:UU RG:Z:S-177
 K00335:147:HL3KJBBXX:7:1103:13240:24261    0   MT  14083   1   132M    *   0   0   CTGCTGAATATGCAAAAACAACTAATATTCCTCCTAAATAGATAAGAAACAAGACCAGGGATAAAAATGGACCCCCATGACTTACTAGCACGCCACAACCCATACCTGCTACCATCACCAACCCTAGAGCAG    JJJJJJJJJJJJFFAFFJJJJJJJJJJJJJFJJA<JJJJAFFFFJFFJFJJJJFFFFFAJJAJJFAJJAJJFJJFAAAA<F<FF<<FJFFJJAJJJJJJJAJFJJJFJJJFFFAF<AFFF<FAAF<FA:*::    AS:i:-23    XS:i:-23    XN:i:0  XM:i:4  XO:i:0XG:i:0    NM:i:4  MD:Z:25C26A56C16A5  YT:Z:UU RG:Z:S-177
ADD REPLY
2
Entering edit mode
5.4 years ago
h.mon 35k

As finswimmer said, you need to add read group tags to each bam:

To call variants in a population of samples, each alignment must have a read group identifier attached to it (RG tag), and the header of the BAM file in which it resides must map the RG tags to sample names (SM). Furthermore, read group IDs must be unique across all the files used in the analysis. One read group cannot map to multiple samples. The reason this is required is that freebayes operates on a virtually merged BAM stream provided by the BamTools API. If merging the files in your analysis using bamtools merge would generate a file in which multiple samples map to the same RG, the files are not suitable for use in population calling, and they must be modified.

The freebayes readme also suggests a solution in case your bam don't have RG tags: you can stream your bam through bamaddrg and pipe directly to freebayes.

ADD COMMENT
0
Entering edit mode

finswimmer pointed out adding RG tags is best practice but technically not necessary. In any case I will give it a go as it is not working for me as it is.

ADD REPLY
0
Entering edit mode

Technically it is not neccessary because even without a ReadGroup it is a valid bam file. But there are many downstream analyse programs that need this information, e.g. for joint variant calling, like you like to do, that need this information. So it is best practice to include this information from the beginning regardless what your final goal is.

Saying this you can of course use bamaddrg as a workaround. But I would recommend to fix your bam file by adding the readgroup with the sample name using samtools addreplacerg or picard AddOrReplaceReadGroups.

fin swimmer

ADD REPLY
0
Entering edit mode

One thing that is not clear from bamaddrg is whether the option -L (list of files) is then needed. Is it?

ADD REPLY

Login before adding your answer.

Traffic: 1986 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6