Question

LINCS L1000 dataset column names

2

7.6 years ago

wir ▴ 50

I'm working with a LINCS L1000 dataset that gives the gene expression (GE) of a cell line before and after perturbation by a small molecule. I am using Level 4 data. After loading the .gct file into MATLAB, I get a 22268-by-40172 matrix as well as a vector of column_ids and a vector of row_ids.

Using the row ids and the gene metadata txt file included in the download, I know that each row represents a gene.

I can't figure out what a column represents. Obviously, each column is a single experiment, but I can't understand what each id means.

For example, here is a column id "LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03".

So far, I know that "LJP001" refers to LINCS Joint Project and "BT20" refers to the specific cell line. Somewhere, it must contain information about the small molecule used as a perturbagen, but I don't know how to interpret this. Any help would be greatly appreciated!

LINCS L1000 • 4.7k views

ADD COMMENT • link updated 4.3 years ago by e.mohammadi.as ▴ 30 • written 7.6 years ago by wir ▴ 50

0

How do you get the perturbagen from the perturbagen group?

ADD REPLY • link 6.4 years ago by godwinwoo • 0

0

I have a related question. In the list of gene symbols, the landmark genes are presented first. Second are the -666 genes, which denote unavailable predicted genes. Third are the predicted genes, of which there are almost 19000 (22268 genes = 978 landmark genes + 2000 unavailable genes (-666) + 19000 predicted genes). In the list of predicted gene symbols (column 1), several gene symbols are repeated but with different expression values in the same experiment. How is this possible?

ADD REPLY • link 4.3 years ago by e.mohammadi.as ▴ 30

0

Please open a new question and mention this post in it. You're not really adding an answer, so why use the "Submit Answer" option?

ADD REPLY • link 4.3 years ago by Ram 43k

0

How do I download the data?

ADD REPLY • link 4.3 years ago by Shicheng Guo ★ 9.4k

0

22 months ago

Yep ▴ 20

As of 2022, they seem to provide more information. By querying the siginfo and compoundinfo csv files, we are able to see the perturbagen id, etc.
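A rough sketch of what such a lookup could look like in Python, using only the standard library. The column names `distil_ids` and `pert_id` and the `|` separator are assumptions made for illustration; check the header of the siginfo file you actually downloaded and pass the real names in.

```python
import csv
from typing import IO, Dict, List, Optional


def read_rows(handle: IO[str]) -> List[Dict[str, str]]:
    """Read a delimited metadata file (e.g. the siginfo csv) into dict rows."""
    return list(csv.DictReader(handle))


def perturbagen_for(
    column_id: str,
    siginfo_rows: List[Dict[str, str]],
    distil_col: str = "distil_ids",  # assumed column name, not verified
    pert_col: str = "pert_id",       # assumed column name, not verified
    sep: str = "|",                  # assumed separator between multiple ids
) -> Optional[str]:
    """Return the perturbagen id of the first row listing column_id, else None."""
    for row in siginfo_rows:
        ids = row.get(distil_col, "").split(sep)
        if column_id in ids:
            return row.get(pert_col)
    return None
```

The returned id could then be matched against the compoundinfo file in the same way to get the compound details. Whether the ids in the file use ':' or '_' before the well should be checked against the file itself.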

ADD COMMENT • link 22 months ago by Yep ▴ 20

score 4 · Accepted Answer · 2016-09-17

To answer my own question.

The column ids for Level 3 and Level 4 data are basically the distil_id. The example I posted

LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03

can be broken into

the perturbagen group "LJP001"
the cell line "BT20"
the brew prefix "LJP001_BT20_24H"
the plate index "X1_B2_DUO52HI53LO"
the well index "A03"
the distil_id "LJP001_BT20_24H_X1_B2_DUO52HI53LO_A03" (note the switch from ':' to '_')
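The breakdown above can be sketched in Python as follows. This assumes every column id follows the same layout as the example (three underscore-separated fields for the brew prefix, then the plate index, then ':' and the well); the function and field names are my own and are not part of any LINCS tooling.

```python
from typing import NamedTuple


class ColumnId(NamedTuple):
    pert_group: str
    cell_line: str
    brew_prefix: str
    plate_index: str
    well: str
    distil_id: str


def parse_column_id(column_id: str) -> ColumnId:
    """Split a Level 3/4 column id into the parts listed above."""
    # The well follows the last ':' in the id.
    plate_part, sep, well = column_id.rpartition(":")
    if not sep:
        raise ValueError(f"no ':' separator in {column_id!r}")

    fields = plate_part.split("_")
    if len(fields) < 4:
        raise ValueError(f"unexpected number of fields in {column_id!r}")

    # Assumption: the first three fields are group, cell line and time point.
    brew_prefix = "_".join(fields[:3])
    plate_index = "_".join(fields[3:])

    return ColumnId(
        pert_group=fields[0],
        cell_line=fields[1],
        brew_prefix=brew_prefix,
        plate_index=plate_index,
        well=well,
        distil_id=f"{plate_part}_{well}",  # ':' switched to '_'
    )
```

Calling `parse_column_id("LJP001_BT20_24H_X1_B2_DUO52HI53LO:A03")` returns the same six pieces as the list above.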

It turns out that the distil_id doesn't contain enough information to identify the perturbagen used. To identify it, you need to query the LINCS API. Here is more information about using the LINCS API to query the metadata. I also used this Coursera video as a reference. Note that the example id given in the question doesn't work with the API.