How to retrieve and download sequence information
9.2 years ago
Affan ▴ 300

I have created a position weight matrix (PWM) based on transcription factor binding sites (TFBS) in FANTOM 4.

In my code (R), I have trained my PWM on the TFBS in chr1 and chr2. Now I want to use this PWM to scan chr3 to chr22 to assess its accuracy.

What is the best way to retrieve a "stitched" string of chr3 to chr22 (or even individual chromosomes, if a single string is too large)?

I tried using the DAS server, but it doesn't work without giving coordinates (http://genome.ucsc.edu/cgi-bin/das/hg18/dna?segment=chr2).

Doing my own homework, I see that both Bioconductor and the SeqinR package for R can do this, but I can't figure out the right workflow/code to retrieve this information.

For what it's worth, I do have hg18 downloaded as separate .fa files. I am fairly certain that there is a function in SeqinR/Bioconductor to read these FASTA files. Is this the best way to do this?
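
To make the local-FASTA route concrete, here is a minimal sketch of what I have in mind, assuming the Bioconductor package Biostrings is installed and that each chromosome sits in its own file named like chr3.fa. The helper name `scan_fasta_dir`, the toy directory, the toy sequence, and the toy PWM are all made up for illustration; the real PWM would be the one trained on chr1 and chr2. It scans one chromosome at a time rather than building a single stitched string, which keeps memory use down and avoids artificial matches spanning the junction between two chromosomes.

```r
library(Biostrings)  # Bioconductor package, assumed to be installed

# Hypothetical helper: scan each chromosome FASTA file in `fasta_dir` with a PWM.
# `pwm` must be a numeric matrix with rows named A, C, G, T
# (one column per motif position).
scan_fasta_dir <- function(fasta_dir, chroms, pwm, min_score = "80%") {
  hits <- list()
  for (chrom in chroms) {
    path <- file.path(fasta_dir, paste0(chrom, ".fa"))  # assumed file naming
    if (!file.exists(path)) next                        # skip missing chromosomes
    chrom_seq <- readDNAStringSet(path)[[1]]            # first record as a DNAString
    # Forward-strand matches only; the reverse strand would need a second pass.
    hits[[chrom]] <- matchPWM(pwm, chrom_seq, min.score = min_score)
  }
  hits
}

# Toy demonstration with made-up data, so the sketch runs without hg18.
toy_dir <- tempfile("toy_genome")
dir.create(toy_dir)
writeXStringSet(DNAStringSet(c(chr3 = "TTTTACGTTTTTACGTTT")),
                file.path(toy_dir, "chr3.fa"))

# Toy PWM that scores 1 per position for the motif ACGT and 0 otherwise.
toy_pwm <- diag(4)
rownames(toy_pwm) <- c("A", "C", "G", "T")

hits <- scan_fasta_dir(toy_dir, paste0("chr", 3:22), toy_pwm, min_score = "90%")
start(hits$chr3)  # start positions of the motif matches on the toy chr3
```

With the real data, `toy_dir` would be replaced by the directory holding the hg18 .fa files and `toy_pwm` by the trained PWM. The result is a list with one set of matches per chromosome, which can then be compared against the known TFBS coordinates.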

sequencing databases • 2.0k views

What do you mean by a 'stitched' string of chr3-chr22? It sounds like you want the genomic sequences of the chromosomes pasted together, but I suspect you actually want something else.
