1000 Genomes: Where To Get Samples Description
13.3 years ago
Chronos ▴ 610

What is the best source of meta-data about 1000 genomes samples (genomes)?

I can't make much sense of 201012141000genomessamples.xls, so I'm asking for a better source.

More specifically, I find these cases confusing:

unrel/duo and unrel/trio entries (is HG00144 the mother of HG00146 and HG00147? If yes - why? How come NA19625 is the only sample in family 2357-01, but is marked "trio"?):

HG00144   GBR   SRS006877   female   Mother    unrel/duo
HG00146   GBR   SRS006879   female   Sibling   unrel/duo
HG00147   GBR   SRS006880   female   Sibling   unrel/duo
NA19625   ASW   SRS003634   female   child     unrel/trio

"not father" cases - is that when the biological father is not the husband? If so, why is he listed under that family ID?

NA18510   YRI   SRS000103   Y010-03   male   not father   unrel

type=unrel for clearly related samples:

NA11932   CEU   SRS001261   1424-13   male     mat grandfather    unrel
NA11933   CEU   SRS001262   1424-14   female   mat grandmother    unrel

Two more unclear points about the example above: a) do the numbers after the dash (as in 1424-13, 1424-14) matter?, and b) what is the purpose of replacing father and mother with (maternal|paternal) grand(father|mother), if there is no way to link these grandparents to their children and grandchildren?

Also, what would addnl related mean in the unrel/duo/trio column?


Ok, I've figured one out:

trios with no obvious "third one", like in family SH071 (there are many of these):

HG00634 CHS     SRS008697       SH071   male    father  trio
HG00635 CHS     SRS008698       SH071   female  mother  trio

Here, the problem is that only parents were sequenced, with children likely postponed to a later phase. That is why there are many "trios" with only parents present.
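
A minimal Python sketch for spotting such parent-only "trios" automatically. The column order (sample, population, SRS ID, family, gender, relationship, type) is assumed from the rows quoted in this thread, not taken from any official schema, and the assumption that a sequenced child would carry the relationship label "child" is mine as well.

```python
from collections import defaultdict

# Two rows copied from the SH071 example above.
ROWS = """\
HG00634 CHS     SRS008697       SH071   male    father  trio
HG00635 CHS     SRS008698       SH071   female  mother  trio
"""

def parse_rows(text):
    """Parse whitespace-separated rows into dicts (assumed column order)."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            # Rows quoted without a family ID column are skipped here.
            continue
        records.append({
            "sample": fields[0],
            "population": fields[1],
            "srs": fields[2],
            "family": fields[3],
            "gender": fields[4],
            # Relationship may be several words, e.g. "not father".
            "relationship": " ".join(fields[5:-1]),
            "type": fields[-1],
        })
    return records

def trios_missing_child(records):
    """Return family IDs tagged 'trio' that have no row labelled 'child'."""
    roles = defaultdict(set)
    for rec in records:
        if rec["type"] == "trio":
            roles[rec["family"]].add(rec["relationship"].lower())
    return sorted(fam for fam, seen in roles.items() if "child" not in seen)

print(trios_missing_child(parse_rows(ROWS)))
```

If the real spreadsheet is exported as tab-separated text, splitting on tabs would be safer than splitting on whitespace, since some relationship labels contain spaces.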


At least for the 1424- family, "unrel" is explained by the fact that only the "father" and his grandparents are related; since neither the mother nor the children are listed, the maternal grandparents can be viewed as unrelated. This also partially answers my question b) under the 1424- example.


Many collections are trios because, although for the low-coverage sequencing we are only sequencing unrelated individuals, the presence of the child from the trio will be useful for validation purposes.

13.3 years ago

there used to be a slightly more descriptive sample file on the 1000 Genomes site that you may want to check, although the main information is present in this new one. you can grab it from here.

regarding your particular doubts, I have to say that I haven't found any more detailed source of information on those samples. if it makes you feel better, I've also gone through the list looking for truly unrelated samples, and it hasn't always been clear to me.


Thanks a lot for the file! It seems to have more details.


Yes, it does make me feel better :) The problem I had was with automatically parsing the sample information into graph-like ancestry trees. Under time pressure, I've decided to postpone ancestry parsing until the 1000 Genomes project perhaps produces a more consistent (machine-parseable/readable) description.
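
For anyone attempting the parsing anyway, a possible first step is sketched below in Python: grouping samples by the part of the family ID before the dash. That the prefix identifies the family (and that the suffix is a per-member number) is only an assumption; the thread never settles whether the numbers after the dash matter. The helper names `family_prefix` and `group_by_family` are hypothetical.

```python
from collections import defaultdict

# (sample, family ID, relationship) triples taken from rows quoted in the question.
SAMPLES = [
    ("NA11932", "1424-13", "mat grandfather"),
    ("NA11933", "1424-14", "mat grandmother"),
    ("NA18510", "Y010-03", "not father"),
]

def family_prefix(family_id):
    """Return the text before the first dash (assumed to name the family)."""
    return family_id.split("-", 1)[0]

def group_by_family(samples):
    """Group (sample, relationship) pairs under their assumed family prefix."""
    groups = defaultdict(list)
    for sample, family_id, relationship in samples:
        groups[family_prefix(family_id)].append((sample, relationship))
    return dict(groups)

for family, members in group_by_family(SAMPLES).items():
    print(family, members)
```

Each group could then serve as one candidate tree, with the relationship labels deciding the edges once their meaning is clear.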

