Gene list from .bed file needed, please help!
2.8 years ago
o.hickman • 10

I have bed files (from an ENCODE eCLIP experiment) in the format below.

I need to obtain a gene list from the chromosomal coordinates.

I have tried Galaxy, using the UCSC Table Browser knownGene and kgXref tables and join operations, but the gene list I get has clearly been duplicated in some way: some genes have the correct number of eCLIP tags, while others have thousands more than are evident when I view the .bed file in IGV.

Has anybody got a simple, up-to-date way of solving this? I do not code, so simple explanations please, if possible. Previous workflows in Galaxy have not worked.

Thanks in advance!!

Oliver

chr7 155100450 155100506 rep02 1000 + 4.49254608837777 22.7294143201152 -1 -1

chr7 155100424 155100441 rep02 1000 + 3.74937915504325 15.3042207236355 -1 -1


For your next post, don't forget to mention that you don't use Linux. That makes things harder for you, because many bioinformatics tools are written for Linux. Some may be available on Windows as well, but they tend to work less well there.


don't forget to specify that you don't use Linux


If you happen to have the right version of Windows 10, you can use the Unix bash shell that is available for it. But I do concur with @Wouter.

11 months ago
France/Nantes/Institut du Thorax - INSE…
$ cat input |\
awk '{printf("select K.chrom,MIN(K.txStart),MAX(K.txEnd),X.geneSymbol from knownGene as K,kgXref as X where K.chrom=\"%s\" and NOT(K.txEnd < %s or K.txStart>%s) and K.name=X.kgId group by K.chrom,X.geneSymbol;\n",$1,$2,$3);}' |\
mysql -N --user=genome --host=genome-mysql.soe.ucsc.edu -A -D hg19  |\
sort | uniq

Hi Pierre, Would you mind explaining that post? Thanks, Oliver

  • 'input' is your bed file;
  • awk is used to build one mysql query per BED line, fetching the chrom/start/end/geneSymbol of the overlapping genes from UCSC;
  • pipe those queries into mysql
  • remove the duplicates with sort | uniq
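
To make the awk step less opaque, here is a minimal sketch that prints the query built for a single BED line instead of sending it to mysql. The function name `bed_to_sql` is hypothetical (it is not part of the original answer); the awk program is copied from the command above, and the input is the first six columns of one of the example lines in the question.

```shell
# Hypothetical helper: turn each BED line on stdin into the SQL query
# from the answer above (columns 1-3 are chrom, start, end).
bed_to_sql() {
  awk '{printf("select K.chrom,MIN(K.txStart),MAX(K.txEnd),X.geneSymbol from knownGene as K,kgXref as X where K.chrom=\"%s\" and NOT(K.txEnd < %s or K.txStart>%s) and K.name=X.kgId group by K.chrom,X.geneSymbol;\n",$1,$2,$3);}'
}

# One line from the question; nothing is sent to UCSC here.
query=$(printf 'chr7\t155100450\t155100506\trep02\t1000\t+\n' | bed_to_sql)
echo "$query"
```

Reading the printed query before piping anything into mysql is a cheap way to check that the chromosome and coordinates were picked up from the right columns.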

Hi Thanks Pierre, Is this using R? Thanks, Oliver


No, it is not. It uses cat/awk/sort/uniq, which are built into UNIX, plus the mysql client.
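
As a small illustration of what the sort | uniq step does, the sketch below runs it on made-up rows shaped like the mysql output (chrom, start, end, gene symbol). The rows, and the names GENE_A and GENE_B, are placeholders and not real query results; the extra `cut -f4` step is an assumption about how one might reduce the result to a plain gene list.

```shell
# Made-up rows in the shape the mysql step prints (chrom, start, end, symbol);
# GENE_A and GENE_B are placeholders, not real query results.
rows=$(printf 'chr7\t100\t900\tGENE_A\nchr7\t100\t900\tGENE_A\nchr7\t950\t1500\tGENE_B\n')

# sort | uniq collapses identical rows, e.g. two peaks hitting the same gene.
unique_rows=$(printf '%s\n' "$rows" | sort | uniq)

# Keeping only column 4 gives a plain gene list, one symbol per line.
genes=$(printf '%s\n' "$unique_rows" | cut -f4 | sort -u)
echo "$genes"
```

Because two peaks in the same gene return the same row, the gene appears once in the final list no matter how many peaks fall inside it.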

10 months ago
EagleEye 6.4k
Sweden

I assume you would like to associate your binding sites (from ENCODE eCLIP) with genes or different genomic regions. In that case you can use GREAT or HOMER's annotatePeaks (HOMER will provide detailed results when you use your own GTF annotation).


Thanks for that, I don't have a Unix OS but if I get access to one I will try downloading HOMER and give it a try. O.


UNIX/LINUX is always bioinformatics friendly.
