Programmatic access to Sample Information from GEO via GSM or SRR number
14 months ago
seidel 6.8k
United States

How does one programmatically retrieve all sample information for a given sample using either its GSM number or its SRR number? I've used esearch to get runinfo, mapped GSE series identifiers to GSM sample IDs, and mapped PRJNA project identifiers to the corresponding SRR numbers. But for a given ChIP-seq experiment, the antibody used for a sample appears nowhere in that information; it shows up only as a sample attribute on a web page:

GEO WEB page

How does one get programmatic access to any arbitrary sample attribute given a GSM or SRR id?

GEO • 392 views
14 months ago
vkkodali ♦ 1.1k
United States

You can query Biosample using GSM accessions and parse the Biosample docsum to extract this information as follows:

esearch -db biosample -query 'GSM3143747' \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute

BioSample: SAMN09214109; SRA: SRS3306622; GEO: GSM3143747       Yeast Cell      seb1-1 epe1delta tfs1DN 30 C    Anti-H3K9me2 (abcam ab1220)

On the other hand, if all you have are SRR accessions you can get to Biosample using elink as follows:

esearch -db sra -query 'SRR7172016' \
  | elink -db sra -target biosample -name sra_biosample \
  | esummary \
  | xtract -pattern DocumentSummary -element Identifiers,Attribute
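
If you want one entry point for both kinds of accession, the two routes can be wrapped in a small shell function that picks the pipeline from the accession prefix. This is only a sketch, assuming EDirect is installed and on your PATH; the function name sample_attrs is made up for illustration and is not part of EDirect. The esummary step after elink is included so that xtract has DocumentSummary records to match.

```shell
# sample_attrs is a hypothetical helper name, not an EDirect command.
# It prints the BioSample identifiers and attribute values for one
# GSM or SRR accession, choosing the route from the accession prefix.
sample_attrs() {
  local acc="$1"
  case "$acc" in
    GSM*)
      # GSM accessions can be searched in BioSample directly.
      esearch -db biosample -query "$acc" \
        | esummary \
        | xtract -pattern DocumentSummary -element Identifiers,Attribute
      ;;
    SRR*)
      # SRR accessions are searched in SRA, then linked to BioSample.
      esearch -db sra -query "$acc" \
        | elink -db sra -target biosample -name sra_biosample \
        | esummary \
        | xtract -pattern DocumentSummary -element Identifiers,Attribute
      ;;
    *)
      # Anything else is rejected rather than guessed at.
      echo "unsupported accession: $acc" >&2
      return 1
      ;;
  esac
}
```

Usage would be sample_attrs GSM3143747 or sample_attrs SRR7172016; any other prefix prints an error to stderr and returns a non-zero status.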

Excellent! This record has 80 samples so I can just write a loop and parse the results. There are some arguments in your example I didn't know about. Thanks!


If you are planning to write a loop, be sure to check out the sections on While Loop and For Loop here: https://www.ncbi.nlm.nih.gov/books/NBK179288/#chapter6.Automation
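
For what it's worth, a loop over the samples might look roughly like the sketch below. It assumes EDirect is on your PATH and that you have a plain-text file with one GSM accession per line; the function name gsm_attrs_from_file and the file name in the usage note are made up for illustration. The stdin redirect on esearch is there because EDirect commands may read standard input, which inside a while-read loop could swallow the rest of the ID list.

```shell
# gsm_attrs_from_file is a hypothetical helper name, not an EDirect command.
# It reads GSM accessions (one per line) from the file given as $1 and
# prints the BioSample identifiers and attributes for each, prefixed by
# the accession and a tab.
gsm_attrs_from_file() {
  local ids_file="$1"
  local gsm
  local attrs
  while IFS= read -r gsm; do
    # Skip blank lines in the ID file.
    [ -z "$gsm" ] && continue
    # Redirect stdin so esearch cannot consume the remaining IDs.
    attrs=$(esearch -db biosample -query "$gsm" < /dev/null \
      | esummary \
      | xtract -pattern DocumentSummary -element Identifiers,Attribute)
    printf '%s\t%s\n' "$gsm" "$attrs"
    # Short pause between samples to stay well within NCBI request limits.
    sleep 1
  done < "$ids_file"
}
```

Something like gsm_attrs_from_file gsm_ids.txt > sample_attributes.tsv would then collect everything in one tab-separated file, with the accession in the first column.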

4 months ago
genomax 68k
United States

Using Entrez Direct (EDirect), you can pull library information from the SRA record like this:

$ esearch -db sra -query "GSM3143747" | esummary | xtract -pattern DocumentSummary -element LIBRARY_STRATEGY,LIBRARY_CONSTRUCTION_PROTOCOL
ChIP-Seq    ChIP DNA was extracted by bead beater with 0.5mm zirconia beads.  ChIP DNa was isolated by antibodies directed to our protein of interest (H3K9me2 or rpb3x-FLAG).  DNA was then isolated by incubation by SDS/proteinase K followed by column purification (Macherey-Nagel Nucleospin Gel Cleanup Columns). Libraries were prepared using End-It DNA End Repair Kit (Epicenter) followed by A-tailing using Klenow fragment (NEB), ligation to Illumina adaptors using Rapid T4 DNA Ligase (Enzymatics), PCR amplification for 15 cycles, and size selection between 200-350bp by bead purification and gel extraction.

@vkkodali may be by later to provide a more refined answer.
