How do I find promoter region sequences and coordinates for a long list of lncRNAs, given their hgnc_id, symbol, and gene ID?
8.7 years ago
2015rpro • 0

I was given a CSV file by my PI (containing the columns listed below), and he asked me to help him find the promoter regions of the genes that transcribe the 2000 HGNC-approved lncRNAs he is interested in. He specified that he wants the raw DNA sequences, not cDNA. My plan is to collect the sequences and their coordinates as GRanges objects, but I have no clue how to start and am still figuring out how to use Bioconductor. Can anyone give me some ideas or a workflow? I have intermediate knowledge of R, and I assume this kind of job should be done programmatically because the number of targets is so large.

[1] "hgnc_id"                  "symbol"                   "name"                     "locus_group"
[5] "locus_type"               "status"                   "location"                 "location_sortable"
[9] "alias_symbol"             "alias_name"               "prev_symbol"              "prev_name"
[13] "gene_family"              "gene_family_id"           "date_approved_reserved"   "date_symbol_changed"
[17] "date_name_changed"        "date_modified"            "entrez_id"                "ensembl_gene_id"
[21] "vega_id"                  "ucsc_id"                  "ena"                      "refseq_accession"
[25] "ccds_id"                  "uniprot_ids"              "pubmed_id"                "mgd_id"
[29] "rgd_id"                   "lsdb"                     "cosmic"                   "omim_id"
[33] "mirbase"                  "homeodb"                  "snornabase"               "bioparadigms_slc"
[37] "orphanet"                 "pseudogene.org"           "horde_id"                 "merops"
[41] "imgt"                     "iuphar"                   "kznf_gene_catalog"        "mamit.trnadb"
[45] "cd"                       "lncrnadb"                 "enzyme_id"                "intermediate_filament_db"
hgnc ucsc LncRNA ensembl promoter-regions • 3.2k views
8.7 years ago

RSAT (Regulatory Sequence Analysis Tools) has an online tool, "retrieve sequence", for exactly this. Upload the list of genes, define the length of the sequence to retrieve, and it returns a FASTA file with the putative promoters.
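
To get from the CSV to a gene list that can be uploaded, something like the following might work. It is only a sketch: the sample rows are made-up placeholders, the `locus_type` value `"RNA, long non-coding"` is an assumption about how the file labels lncRNAs, and which identifier column the tool accepts (here `ensembl_gene_id`) should be checked against its documentation.

```python
import csv
import io

# Made-up miniature of the CSV; the real file has the columns listed in the question.
SAMPLE_CSV = """hgnc_id,symbol,locus_type,ensembl_gene_id
HGNC:TOY1,LNC-A,"RNA, long non-coding",ENSG_TOY_A
HGNC:TOY2,LNC-B,"RNA, long non-coding",ENSG_TOY_B
HGNC:TOY3,PROT-1,gene with protein product,ENSG_TOY_C
HGNC:TOY4,LNC-C,"RNA, long non-coding",
"""


def gene_ids(handle, id_column="ensembl_gene_id",
             locus_type="RNA, long non-coding"):
    """Return the non-empty identifiers of rows whose locus_type matches.

    The locus_type label is an assumption; inspect the column in the
    real file before relying on it.
    """
    ids = []
    for row in csv.DictReader(handle):
        if row["locus_type"] != locus_type:
            continue  # skip anything that is not labelled as a lncRNA
        value = row[id_column].strip()
        if value:  # some genes have no identifier in a given column
            ids.append(value)
    return ids


ids = gene_ids(io.StringIO(SAMPLE_CSV))
print("\n".join(ids))  # one identifier per line, ready to paste or save
```

For the real file, replace `io.StringIO(SAMPLE_CSV)` with `open("your_file.csv", newline="")`. Rows with an empty identifier are dropped silently here, so it is worth comparing the output length against the expected 2000.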

http://rsat.sb-roscoff.fr/
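
If you also want the coordinates, the underlying idea of "define the length of the sequence" is a strand-aware window around the transcription start site (TSS). The sketch below is not what RSAT does internally; it is a toy illustration with a made-up 16-base "chromosome", and the function names and default window sizes (2000 bp upstream, 200 bp downstream) are assumptions chosen for the example. As far as I know, the same window convention is used by `promoters()` in Bioconductor's GenomicRanges, which would give you GRanges objects directly.

```python
# Complement table for DNA, including N and lowercase (soft-masked) bases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(tss, strand, upstream=2000, downstream=200,
                    chrom_length=None):
    """Return 1-based inclusive (start, end) of the promoter window.

    tss is the 1-based TSS position. On the minus strand "upstream"
    means higher coordinates, so the window is mirrored.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream - 1
    elif strand == "-":
        start, end = tss - downstream + 1, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    start = max(start, 1)  # do not run off the chromosome start
    if chrom_length is not None:
        end = min(end, chrom_length)  # or off its end
    return start, end


def promoter_sequence(chrom_seq, tss, strand, upstream=2000, downstream=200):
    """Return the promoter as genomic DNA, read 5'->3' on the gene's strand."""
    start, end = promoter_window(tss, strand, upstream, downstream,
                                 chrom_length=len(chrom_seq))
    seq = chrom_seq[start - 1:end]  # convert 1-based inclusive to a slice
    return reverse_complement(seq) if strand == "-" else seq


# Toy chromosome: positions 1-4 A, 5-8 C, 9-12 G, 13-16 T.
toy_chrom = "AAAACCCCGGGGTTTT"
plus_seq = promoter_sequence(toy_chrom, tss=9, strand="+",
                             upstream=4, downstream=2)
minus_seq = promoter_sequence(toy_chrom, tss=8, strand="-",
                              upstream=4, downstream=2)
print(">toy_plus 5-10(+)", plus_seq, sep="\n")
print(">toy_minus 7-12(-)", minus_seq, sep="\n")
```

In a real run the chromosome string would come from a genome FASTA and the TSS and strand from a gene annotation for the same genome build; mixing builds gives wrong sequences without any error.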
