$ mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N -e "SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol FROM knownCanonical k, knownGene kg, kgXref x WHERE k.transcript = x.kgID AND k.transcript = kg.name AND x.geneSymbol LIKE 'CTCF';" > CTCF.bed
The chromosome and positions of CTCF in hg19 are in the first three columns of the unsorted BED file result.
A gene can have multiple transcripts, so you can get more than one record for a given HGNC gene name.
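Because the result is unsorted and may hold more than one record, a quick sort and line count can serve as a sanity check. The following is a minimal sketch, assuming the output is tab-delimited as `mysql -N` normally writes it; `sort_bed` and `CTCF.sorted.bed` are hypothetical names chosen for illustration.

```shell
# Sort a BED file by chromosome, then numerically by start position.
# LC_ALL=C keeps the chromosome ordering byte-wise and reproducible.
sort_bed() {
    LC_ALL=C sort -k1,1 -k2,2n "$1"
}

# Only runs if the query above has already produced CTCF.bed.
if [ -f CTCF.bed ]; then
    sort_bed CTCF.bed > CTCF.sorted.bed
    # Print the number of records returned for the gene symbol.
    wc -l < CTCF.sorted.bed
fi
```

A count greater than one would mean the symbol matched several records, which is worth checking before using the coordinates downstream.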
This result relies on three tables in the UCSC Genome Browser database for hg19.
The schema of knownCanonical is located here: http://genome.ucsc.edu/goldenpath/gbdDescriptionsOld.html#KnownCanonical
The schema of knownGene is located here: http://genome.ucsc.edu/cgi-bin/hgTables?db=hg19&hgta_group=genes&hgta_track=knownGene&hgta_table=knownGene&hgta_doSchema=describe+table+schema
Likewise, the schema of kgXref is located here: http://genome.ucsc.edu/goldenpath/gbdDescriptionsOld.html#KgXref
The rest is just a SQL query based on the schemas of the three tables, along with database and host parameters that are specific to UCSC.
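To make the relationship between the three tables easier to see, the same lookup can be written with explicit JOINs. This is only a sketch: `build_query` is a hypothetical helper name, and the gene symbol is interpolated into the SQL as-is, so it should only be given trusted input.

```shell
# Print the three-table lookup for one gene symbol, using explicit JOINs.
# knownCanonical.transcript is the key shared with knownGene.name and
# kgXref.kgID, exactly as in the WHERE clause of the original query.
build_query() {
    cat <<EOF
SELECT k.chrom, kg.txStart, kg.txEnd, x.geneSymbol
FROM knownCanonical k
JOIN knownGene kg ON k.transcript = kg.name
JOIN kgXref x ON k.transcript = x.kgID
WHERE x.geneSymbol LIKE '$1';
EOF
}

# Usage (needs network access to the UCSC public MySQL server):
# mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19 -N \
#     -e "$(build_query CTCF)" > CTCF.bed
```

Wrapping the query in a function makes it straightforward to repeat the lookup for other gene symbols without retyping the join conditions.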
Part of the "magic" is knowing which tables and fields to use. This comes from experience with the Genome Browser and from exploring the links to schemas that are usually available from the table description pages on the Genome Browser site. When that information is difficult to find or seems to be unavailable, scouring discussion threads and asking the UCSC mailing lists directly also helps.