Best way to systematically select ENCODE data to download?
5.8 years ago
Eric Lim ★ 2.1k

I normally download ENCODE data with the following URL template, where {acc} is the file accession:

https://www.encodeproject.org/files/{acc}/@@download/{acc}.fastq.gz

Using the online data selector is one way to find the {acc} values, but I'm wondering whether I'm missing an easier way to batch-download selected data from ENCODE.
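For reference, the URL template above can be filled in and fetched with a few lines of Python. This is only a minimal sketch: the helper names `fastq_url` and `download_fastq` are made up for illustration, it uses the standard library rather than any ENCODE client, and it assumes the file really is a FASTQ that is publicly downloadable.

```python
import os
import shutil
import urllib.request

# Template copied from the post; {acc} is a file accession.
URL_TEMPLATE = 'https://www.encodeproject.org/files/{acc}/@@download/{acc}.fastq.gz'

def fastq_url(acc):
    """Fill the download URL template for one file accession."""
    return URL_TEMPLATE.format(acc=acc)

def download_fastq(acc, dest_dir='.'):
    """Stream one FASTQ file to dest_dir and return the local path.

    Hypothetical helper: no retries, no checksum verification.
    """
    path = os.path.join(dest_dir, acc + '.fastq.gz')
    with urllib.request.urlopen(fastq_url(acc)) as response, open(path, 'wb') as out:
        shutil.copyfileobj(response, out)  # copy in chunks, not all in memory
    return path
```

Batch downloading is then a loop over a list of accessions, e.g. `for acc in accessions: download_fastq(acc, 'fastq/')`, once the accessions are known.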

The .tsv provided by ENCODE has all the information I need to select the data I want (experiment, assay type, species, etc.), but I can't find anything in it that I can convert into accession IDs.
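If the .tsv in question does carry a per-file accession column, selecting rows and pulling out the accessions could look roughly like the sketch below. The column names (`File accession`, `File format`, `Assay`) and the sample rows are assumptions for illustration only; check them against the header of the actual file.

```python
import csv
import io

def select_accessions(tsv_text, criteria, acc_column='File accession'):
    """Return the accession of every row whose columns match `criteria`.

    `criteria` maps column name -> required value. The column names used
    here are assumptions, not a documented schema.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter='\t')
    return [row[acc_column] for row in reader
            if all(row.get(col) == val for col, val in criteria.items())]

# Made-up rows standing in for the real metadata file.
SAMPLE = (
    'File accession\tFile format\tAssay\n'
    'ENCFF000AAA\tfastq\tRNA-seq\n'
    'ENCFF000BBB\tbam\tRNA-seq\n'
    'ENCFF000CCC\tfastq\tChIP-seq\n'
)

selected = select_accessions(SAMPLE, {'File format': 'fastq', 'Assay': 'RNA-seq'})
print(selected)
```

With a real file, read it with `open(path, newline='').read()` and pass the text in; the resulting accessions can go straight into the download URL template.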

Any advice?

encode • 2.1k views
5.8 years ago
Eric Lim ★ 2.1k

I am primarily interested in their knockdown (KD) and control RNA-Seq data, so I ended up writing a couple of simple functions that retrieve the file accessions for a given experiment accession. Hopefully this helps someone.

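from pprint import pprint

import requests

def get(resource,
        url='https://www.encodeproject.org/{}/?format=json',
        headers={'accept': 'application/json'}):
    """Fetch one ENCODE resource (e.g. 'experiments/ENCSR426UUG') as JSON."""
    # Strip slashes so paths such as '/experiments/<acc>/' also format cleanly.
    response = requests.get(url.format(resource.strip('/')), headers=headers)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.json()

def get_exp(exp_acc):
    """Yield [label, file accession, paired_end, biological replicate] for
    each FASTQ file of the experiment and of its possible controls."""
    def describe(file):  # renamed from `format`, which shadowed the builtin
        # 'paired_end' may be absent (e.g. single-end files), so use .get().
        return [file['accession'],
                file.get('paired_end'),
                file['replicate']['biological_replicate_number']]

    # URL paths are joined with '/', so os.path.join is not needed here.
    response = get('experiments/' + exp_acc)
    controls = set()
    for file in response['files']:
        if file['file_type'] == 'fastq':
            yield ['KD'] + describe(file)
            # Collect the control experiments referenced by this file.
            controls |= set(file['replicate']['experiment']['possible_controls'])
    for ctrl in controls:
        response = get(ctrl)
        for file in response['files']:
            if file['file_type'] == 'fastq':
                yield ['Control'] + describe(file)

pprint(list(get_exp('ENCSR426UUG')))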