segmentation for RNAseq dataset from fruitfly
8.9 years ago
ha.sr.skr • 0

Hello everyone,

I am new to bioinformatics. I have several tasks to do, but I am really confused about how to approach them.

This is what is needed (as given by my professor):

Given a set of gene expression data (let's use RNAseq-data for fly to keep the memory and CPU efforts down), you map them to the genome. This gives, for each sample/data set, a single signal "expression value", i.e., coverage f(x) as a function of the genomic coordinate x.
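To make "coverage" concrete: once reads are mapped, the coverage f(x) at position x is simply the number of aligned reads overlapping x. The following is a minimal sketch only, assuming alignments are already available as 0-based, half-open (start, end) pairs on a single chromosome; the function name `coverage` and the toy reads are made up for illustration. In practice one would more likely get this from the alignment file with an existing tool such as `samtools depth` or `bedtools genomecov`.

```python
def coverage(read_intervals, genome_length):
    """Per-base coverage from 0-based, half-open (start, end) alignments.

    Hypothetical helper for illustration; not part of any library.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        start = max(start, 0)
        end = min(end, genome_length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # A running sum of the difference array gives the depth at each position.
    cov = []
    depth = 0
    for x in range(genome_length):
        depth += diff[x]
        cov.append(depth)
    return cov


# Toy example: three reads on a "genome" of length 8.
reads = [(0, 4), (2, 6), (3, 5)]
print(coverage(reads, 8))  # [1, 1, 2, 3, 2, 1, 0, 0]
```

The resulting list is the signal f(x) that the segmentation step takes as input, one such list per sample.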

Now the task is to compute segmentations of this signal, i.e., find a set of intervals on which f is approximately constant.
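One simple way to read "approximately constant" is as a penalized least-squares problem: fit a constant (the mean) on each interval, and pay a fixed penalty per interval so that fewer intervals are preferred. The sketch below is one possible baseline under that assumption, not necessarily the algorithm intended by the assignment; `segment` and `penalty` are illustrative names. It uses a straightforward O(n^2) dynamic program, so it is only practical on short stretches of signal.

```python
def segment(signal, penalty):
    """Piecewise-constant segmentation by penalized least squares.

    Returns half-open (start, end) intervals that minimize
    sum of squared errors around each interval mean + penalty per interval.
    """
    n = len(signal)
    # Prefix sums of values and squared values give O(1) interval costs.
    s1, s2 = [0.0], [0.0]
    for v in signal:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def cost(i, j):
        # Squared error of signal[i:j] around its own mean.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # best[j] = minimal penalized cost of segmenting signal[:j].
    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                prev[j] = i

    # Walk back through the stored breakpoints.
    intervals = []
    j = n
    while j > 0:
        intervals.append((prev[j], j))
        j = prev[j]
    return intervals[::-1]


print(segment([1, 1, 1, 5, 5, 5, 5, 2, 2], penalty=1.0))
# [(0, 3), (3, 7), (7, 9)]
```

A larger `penalty` yields fewer, longer intervals; a smaller one follows the signal more closely.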

First do this for every dataset separately.

Now we have a more difficult problem. Given the f_i(x) for each data set i, find a segmentation so that EACH f_i is approximately constant on each interval.

Of course, you want segmentations that have as few intervals as possible.

I would suggest doing the following:

(1) find a set of about 12 different RNAseq data sets from the fruitfly and map them to the genome.

(2) re-implement the simplest segmentation algorithms for time series-like data and test them.

(3) check how consistent the results are.

(4) work out how to combine the different signals f_i into a single criterion for segmenting them jointly.

The point now is that, of course, we want the number of segments we define to grow only slowly with the number of data sets I and eventually saturate, since otherwise you just wind up with every genomic position being its own interval, which is of course a useless segmentation.
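One possible joint criterion, offered here as an assumption rather than the intended answer, is to score an interval by the squared error of every f_i around its own mean on that interval, average that over the data sets, and add a fixed penalty per interval. Because the error is averaged rather than summed, adding more data sets that share the same breakpoints does not by itself lower the relative cost of a breakpoint. The names `joint_segment` and `penalty` are illustrative, and the O(n^2) dynamic program is again only suitable for short test signals.

```python
def joint_segment(signals, penalty):
    """Joint piecewise-constant segmentation of several equal-length signals.

    Interval cost = mean over signals of the squared error around each
    signal's own interval mean; a fixed penalty is added per interval.
    """
    n = len(signals[0])
    if any(len(s) != n for s in signals):
        raise ValueError("all signals must have the same length")
    k = len(signals)

    # One pair of prefix-sum arrays (values, squared values) per signal.
    prefix = []
    for s in signals:
        s1, s2 = [0.0], [0.0]
        for v in s:
            s1.append(s1[-1] + v)
            s2.append(s2[-1] + v * v)
        prefix.append((s1, s2))

    def cost(i, j):
        total = 0.0
        for s1, s2 in prefix:
            t = s1[j] - s1[i]
            total += (s2[j] - s2[i]) - t * t / (j - i)
        return total / k  # average, so the scale does not grow with k

    best = [0.0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j] = c
                prev[j] = i

    intervals = []
    j = n
    while j > 0:
        intervals.append((prev[j], j))
        j = prev[j]
    return intervals[::-1]


a = [0, 0, 0, 4, 4, 4, 4, 4]  # changes at position 3
b = [1, 1, 1, 1, 1, 5, 5, 5]  # changes at position 5
print(joint_segment([a, b], penalty=1.0))
# [(0, 3), (3, 5), (5, 8)]
```

In this toy case the joint segmentation is the union of the two individual breakpoint sets, so each signal is constant on every interval. Whether the segment count actually saturates on real data depends on how the penalty is chosen and would need to be checked empirically.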

Can anyone explain what these tasks mean exactly:

  1. Where can I get the RNAseq data (GEO, SRA, FlyBase...)?
  2. What are the "single signal expression value" and the coverage?
  3. Do the sequences from the databases not contain coverage? Should I calculate the coverage myself? If so, where do I get the number of reads?
  4. Which segmentation algorithms should be used?
expression_value segmentation RNA-Seq coverage • 2.3k views

Check How To Ask Good Questions On Technical And Scientific Forums for some guidelines for posting questions on technical and scientific forums. One general recommendation is "do not post homework questions".

8.8 years ago

Hello!

If you're looking to compute segmentations (or annotations) on the genome using data like RNAseq, I'd recommend using Segway:

https://www.pmgenomics.ca/hoffmanlab/proj/segway/

I'm currently a programmer on the project, and it sounds like your project might be a good fit for our software. We're constantly working on it and we're always happy to help and provide support on any technical issues.

Hope that helps!

- Eric
