check duplicates in two columns
21 months ago
European Union

Hi all!

I have two different files: a .map file (from Illumina BeadChip genotyping) and a .vcf file (from NGS of pooled individuals). I want to find the variants that are present in both files, so I need to compare chromosome and position: columns 1 (chromosome) and 4 (position) in the .map file against columns 1 (#CHROM) and 2 (POS) in the .vcf file. I tried using awk, but without success. Any suggestions would be much appreciated.

Regards,

Marco

On stackoverflow.com you will find "thousands" of questions related to this issue.

Yup, that's true, but not thousands :-P.

You win. Technically.

Can you post your awk command?

Thank you for the answers. My awk command is:

awk -F'\t' 'NR==FNR{c[$1$2]++;next};c[$1$4] > 0' file.vcf file.map

Here $1$2 are #CHROM and POS in the .vcf file, and $1$4 are the chromosome and position in the .map file.
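
Without seeing the input files it is hard to say why the command above finds nothing, so the following is only a sketch of a more defensive variant, not a confirmed fix. One certain weakness is that $1$2 joins chromosome and position with no separator, so chromosome 1 at position 23 and chromosome 12 at position 3 both produce the key "123"; using a comma inside the array subscript keeps them apart. The other choices are assumptions about what might differ between the two files: that the .vcf names chromosomes "chr1" while the .map uses "1", that one file may have Windows line endings (which would corrupt column 4, the last column of a .map file), and that the files may be space- rather than tab-delimited, hence no -F option. The function name common_variants is hypothetical.

```shell
# Hypothetical helper (not from the thread): print the .map rows whose
# chromosome and position also occur in the VCF.
# Usage: common_variants file.vcf file.map
common_variants() {
    awk '
        # Drop a trailing carriage return, in case a file has Windows line endings.
        { sub(/\r$/, "") }

        # Skip VCF header lines ("##..." metadata and the "#CHROM" line).
        /^#/ { next }

        # Assumption: "chr1" and "1" refer to the same chromosome.
        { chrom = $1; sub(/^chr/, "", chrom) }

        # First file (VCF): remember each CHROM,POS pair, then move on.
        NR == FNR { seen[chrom, $2] = 1; next }

        # Second file (.map): column 4 is the position; print matching rows.
        (chrom, $4) in seen
    ' "$1" "$2"
}
```

Calling common_variants file.vcf file.map > common.map would write the shared positions in .map layout. If the chromosome names already agree in both files, the "chr" stripping line does nothing and can be removed.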
