Viewing large compressed genetic relationship matrix .grm file
5.3 years ago
landscape95

Hi everyone, I have a large compressed genetic relationship matrix file (.grm.gz) of around 870 GB. As far as I know, this file is an edge list of pairs of individuals and their relationship. I want to get the relationship between individuals i and j, but the compressed file is already very large. Is there a way to view it without decompressing it?

Your help is really appreciated!

GCTA

You can use zless input.grm.gz to look into the file. Extracting specific information, however, requires other approaches. Tell us more about what your data looks like and what your goal is, so we can give more detailed help.

fin swimmer
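A minimal Python sketch of such an extraction, assuming the standard GCTA --make-grm-gz text layout: one line per pair with four whitespace-separated columns (row index, column index, number of non-missing SNPs, relatedness estimate), indices being row numbers in the accompanying .grm.id file, and only the lower triangle stored in the order (1,1), (2,1), (2,2), (3,1), and so on. Under that assumption the pair (i, j) with j <= i sits on line i(i-1)/2 + j, so the script can stop as soon as it reaches that line. Because gzip does not support random access, everything before that line still has to be decompressed, so pairs near the end of an 870 GB file will take a long time. The function names are illustrative, not part of GCTA.

```python
import gzip


def grm_line_number(i, j):
    """1-based line of pair (i, j) in a lower-triangular GRM text file.

    Assumes the GCTA --make-grm-gz ordering: (1,1), (2,1), (2,2), (3,1), ...
    i.e. for each row i the column j runs from 1 to i.
    """
    if j > i:
        i, j = j, i  # the GRM is symmetric, so only j <= i is stored
    return i * (i - 1) // 2 + j


def lookup_pair(path, i, j):
    """Stream a .grm.gz file and return (n_snps, relatedness) for pair (i, j).

    Nothing is written to disk: the file is decompressed on the fly and
    reading stops as soon as the target line has been reached.
    """
    target = grm_line_number(i, j)
    expected = (max(i, j), min(i, j))
    with gzip.open(path, "rt") as fh:
        for lineno, line in enumerate(fh, start=1):
            if lineno < target:
                continue
            row, col, n_snps, value = line.split()
            # Sanity check: the file must follow the assumed ordering.
            if (int(row), int(col)) != expected:
                raise ValueError(f"line {lineno} is ({row}, {col}), expected {expected}")
            return float(n_snps), float(value)
    raise ValueError(f"file ended before line {target}")


# Hypothetical usage; 1520 and 37 are made-up row numbers from the .grm.id file:
# print(lookup_pair("input.grm.gz", 1520, 37))
```

If many pairs are needed, collecting all of them in a single pass over the file is far cheaper than calling the lookup once per pair.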


Thank you! I want to cluster the samples based on the relationships between individuals.
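For that goal, one simple option that never needs the full matrix in memory is to stream the GRM lines once and join any two individuals whose relatedness exceeds a cutoff. This is single-linkage grouping (connected components) implemented with union-find. It is a sketch under assumptions, not GCTA functionality: the line layout is assumed to be "i j n_snps value", the default cutoff of 0.125 (roughly the expected genetic relationship of first cousins) is only an example, and the function names are hypothetical.

```python
def find(parent, x):
    """Return the representative of x's group (with path halving)."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def related_groups(lines, n, threshold=0.125):
    """Group individuals 1..n whose pairwise relatedness exceeds `threshold`.

    `lines` is any iterable of GRM text lines "i j n_snps value", e.g. an
    open gzip file, so only the n-element parent list is kept in memory.
    """
    parent = list(range(n + 1))  # index 0 unused; individuals are 1-based
    for line in lines:
        i, j, _n_snps, value = line.split()
        i, j = int(i), int(j)
        if i != j and float(value) > threshold:
            ri, rj = find(parent, i), find(parent, j)
            if ri != rj:
                parent[ri] = rj  # merge the two groups
    groups = {}
    for k in range(1, n + 1):
        groups.setdefault(find(parent, k), []).append(k)
    return sorted(groups.values())


# Hypothetical usage on the real file (n_individuals = lines in the .grm.id file):
# import gzip
# with gzip.open("input.grm.gz", "rt") as fh:
#     clusters = related_groups(fh, n=n_individuals)
```

Single linkage chains relatives together (A related to B, B related to C puts all three in one group), which suits finding families but is only one of many clustering choices.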


How did you produce the file? I have not yet come across a relationship matrix of that size, even with 1000 Genomes data.

If you are comfortable on the command line (i.e. BASH / shell) with mathematical functions, you could compute the Euclidean distances manually. The Euclidean distance is the square root of the sum of all squared differences, as I show here:

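mat <- matrix(c(2, 2, 1,
                2, 4, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Sample1", "Sample2"), c("Gene1", "Gene2", "Gene3")))
mat
        Gene1 Gene2 Gene3
Sample1     2     2     1
Sample2     2     4     1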

The Euclidean distance between Sample1 and Sample2, computed across the three genes, is (in R):

sqrt(
  sum(
    (2-2)^2,
    (2-4)^2,
    (1-1)^2 ) )
[1] 2

Check:

dist(mat, method = "euclidean")
        Sample1
Sample2       2
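The Euclidean distance and the relationship matrix are closely linked: if the GRM were an exact Gram matrix A = ZZ^T/m of (standardized) genotypes Z over m SNPs, the squared distance between individuals i and j would be A_ii + A_jj - 2*A_ij. Distances for clustering could therefore be read straight from the .grm.gz entries without going back to the genotypes. GCTA estimates the diagonal with a slightly different formula, so on a real GRM treat this as an approximation. A small Python sketch (the function name is hypothetical), checked against the toy matrix above by treating its raw rows as Z with m = 1:

```python
import math


def grm_distance(a_ii, a_jj, a_ij):
    """Euclidean distance implied by a Gram-type relationship matrix.

    If A = Z Z^T / m, then ||z_i - z_j||^2 / m = A_ii + A_jj - 2 * A_ij.
    """
    sq = a_ii + a_jj - 2.0 * a_ij
    return math.sqrt(max(sq, 0.0))  # clip small negatives from rounding or estimator noise


# Toy example: Sample1 = (2, 2, 1), Sample2 = (2, 4, 1)
# A_11 = 4 + 4 + 1 = 9, A_22 = 4 + 16 + 1 = 21, A_12 = 4 + 8 + 1 = 13
print(grm_distance(9, 21, 13))  # 2.0, matching dist(mat) in R
```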