Compare consecutive columns of a phased Beagle file and count the number of matching elements.
8.9 years ago
aritra90 ▴ 70

I have phased Beagle output and I want to compare consecutive columns of the file and return the number of matching elements. I would prefer to use shell scripting or awk. Here is the bash/awk script that I am trying to use:

#!/bin/bash
for i in 3 4 5 6 7 8 9
do
  for j in 3 4 5 6 7 8 9
  do
    awk "$i == $j" phased.txt | wc -l
  done
done

I have a file of size 147189 x 828 and I want to compare every pair of columns and return the number of matching elements in an 828 x 828 matrix (a similarity matrix). This would be fairly easy in MATLAB, but it takes a long time to load huge files. I can compare two columns and return the number of matching elements with the following awk command: awk '$3==$4' phased.txt | wc -l, but I need some help doing it for the entire file.

A snippet of the data:

# sampleID     HGDP00511  HGDP00511  HGDP00512  HGDP00512  HGDP00513  HGDP00513
M rs4124251    0          0          A          G          0          A
M rs6650104    0          A          C          T          0          0
M rs12184279   0          0          G          A          T          0
..
..
beagle bash awk shell • 2.7k views

Always show a snippet of the data, as I have no idea what a phased Beagle file is, but I can help you with the comparison.


Hi Sukhdeep,

Thanks for reaching out. I have posted a snippet of the sample data. Your help is much appreciated.

8.9 years ago
aritra90 ▴ 70

SOLVED.

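I was missing the $$. Inside double quotes the shell expands $i and $j to plain numbers before awk runs, so awk was evaluating constants such as 3 == 4 instead of comparing fields. The field references need an escaped dollar sign in front of the shell variable: awk "\$$i == \$$j" phased.txt | wc -l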

Thanks :)
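
For anyone who wants the whole similarity matrix rather than one pair at a time, here is a minimal sketch of a single-pass alternative to rerunning awk inside a nested loop. It is not the method from the post above. It assumes a whitespace-delimited file, sample columns starting at field 3, header lines starting with "#" as in the snippet, and that 0 is treated like any other value. The function name similarity_matrix and the variable first are made up for illustration.

```shell
#!/bin/bash
# Sketch: count matching rows for every pair of columns in one pass.
# Assumptions (not from the original post): header lines start with "#",
# sample columns start at field 3, and "0" counts as an ordinary value.
similarity_matrix() {
  awk -v first=3 '
    /^#/ { next }                         # skip header lines
    NF > last { last = NF }               # remember the widest row seen
    {
      # Compare each pair once (i <= j); the matrix is symmetric.
      for (i = first; i <= NF; i++)
        for (j = i; j <= NF; j++)
          if (($i "") == ($j ""))         # appending "" forces string comparison
            count[i, j]++
    }
    END {
      # Print the full square matrix, tab-separated, one row per column.
      for (i = first; i <= last; i++) {
        row = ""
        for (j = first; j <= last; j++) {
          a = (i < j) ? i : j             # look up the stored upper triangle
          b = (i < j) ? j : i
          n = count[a, b] + 0             # missing entries become 0
          sep = (j > first) ? "\t" : ""
          row = row sep n
        }
        print row
      }
    }
  ' "$1"
}
```

It could be called as similarity_matrix phased.txt > matrix.tsv. Passing the starting column with -v keeps the awk program in single quotes, so the shell never touches the $ signs and the quoting problem above does not arise. The work still grows with rows x columns squared, so on an 828-column file it may take a while, but the file is read only once.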
