Non-computational challenges in bioinformatics
4.3 years ago
igor 13k

Is there a good writeup about non-computational challenges in bioinformatics, preferably with potential solutions? Basically, issues that bioinformaticians deal with that are often not appreciated by the people they work with and require a more collaborative solution. For example, dealing with low quality data, bad experimental design, insufficient storage, etc.

I would like to have a nice collection of known problems, which would probably not be surprising to many readers here. But also, I am curious if there are ways that computational biologists can somehow help resolve those problems rather than just complain about them.

collaboration • 1.5k views

People failing to appreciate how long it takes to get something done.


So far the major issues I had were related to bad experimental design


It would be better if you could focus/refine this challenge (unless you meant it as a very general question). If a person is working in a core facility, then these sorts of challenges are part of the everyday job and can't be avoided. If you are helping people out of the goodness of your heart, then you may have a completely different reference/angle.


That's a good point. I don't know if the distinction is always clear. Sometimes people who are helping out of the goodness of their heart are treated as a core facility.

4.3 years ago

Oh boy, lots of these:

  • Poor documentation for old data that you're told to make into something publishable. Usually it's also garbage quality.

Sometimes it's better to convince someone to take a project behind the barn and put it down rather than spending immense amounts of time and effort to generate low-confidence results. Most people are too stubborn to kill projects, however. Learning to say "no" to potential collaborations that seem doomed to fail/drag on forever is perhaps the most important non-computational skill for early career scientists to learn, in my opinion.

  • Collaborators who want you to do an analysis, but don't really know the question they're trying to answer.
    • Or who label all their files with non-descriptive nonsense and don't provide a sample map despite you asking multiple times.
    • Or who say they want to know X, Y, and Z, but whose data can really only answer Q.
    • Or who get upset when your sound analysis doesn't show exactly what they want/hope.
    • Or who generally have an utter disregard for other projects/your own work, demanding an unreasonable turnaround.

Most of those can be prevented by meeting with those designing the study ahead of time, but many PIs seem to ignore the analysis aspect until they suddenly have a bunch of data that they don't know how to handle. Even then, being very clear about what you need (data, metadata, etc.), exactly what they want, and how long you expect it to take (plus an extra 20-30% of time) can help assuage them. Being clear and upfront and maintaining constant communication are crucial.
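One way to make "be very clear about what you need" concrete is to check the sample map automatically before any analysis starts and send the list of problems straight back. A minimal sketch, assuming the sample map arrives as CSV text; the column names (`sample_id`, `file_name`, `condition`, `replicate`) and the function name are hypothetical, so adapt them to whatever you actually ask collaborators for:

```python
import csv
import io

# Hypothetical set of columns a collaborator is asked to provide.
REQUIRED_COLUMNS = {"sample_id", "file_name", "condition", "replicate"}


def check_sample_sheet(sheet_text, delivered_files):
    """Return a list of readable problems; an empty list means the sheet looks usable."""
    reader = csv.DictReader(io.StringIO(sheet_text))
    columns = set(reader.fieldnames or [])
    missing_columns = REQUIRED_COLUMNS - columns
    if missing_columns:
        # Without the basic columns the remaining checks are meaningless.
        return ["missing columns: " + ", ".join(sorted(missing_columns))]

    rows = list(reader)
    problems = []
    listed = {row["file_name"] for row in rows}
    delivered = set(delivered_files)

    # Files that were handed over but never described.
    for name in sorted(delivered - listed):
        problems.append(f"file not in sample sheet: {name}")
    # Files that were described but never handed over.
    for name in sorted(listed - delivered):
        problems.append(f"file listed but not delivered: {name}")
    # Samples with no experimental condition recorded.
    for row in rows:
        if not (row["condition"] or "").strip():
            problems.append(f"no condition for sample: {row['sample_id']}")
    return problems


if __name__ == "__main__":
    example_sheet = (
        "sample_id,file_name,condition,replicate\n"
        "S1,a.fastq.gz,control,1\n"
        "S2,b.fastq.gz,,1\n"
    )
    for problem in check_sample_sheet(example_sheet, ["a.fastq.gz", "c.fastq.gz"]):
        print(problem)
```

A report like this is easier to forward than a third email asking for the sample map, and it keeps the conversation about the metadata rather than about who forgot what.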

Additionally, having some self-respect is important. You are not a robot, sometimes things take longer than you expect, and collaborators need to understand that performing complex, custom analyses is a task that requires significant planning, critical thinking, and resources...just like involved wet-lab experiments. Sometimes you need to do significant literature digging to determine the most appropriate methods for their data, particularly if you're not familiar with the assay/format/typical analysis methods. If they didn't plan for any of that, that's on them.

  • Poor experimental design in terms of lack of proper controls and replicates.

Again, all project planning should include the person who's actually going to be analyzing the data. A pipe dream, certainly, but maybe someday.

  • "Fishing"-type studies in general.

These tend to lead to drawn out analysis periods with lots of work done that doesn't produce anything of use. If you start hearing, "well can you look at the data this way" every time you meet, then yup, you're fishing.


"You are not a robot, sometimes things take longer than you expect, and collaborators need to understand that performing complex, custom analyses is a task that requires significant planning, critical thinking, and resources...just like involved wet-lab experiments."

I've come to the conclusion that this is a key thing that wetlab people struggle with - and that's because in the wet lab, a large part of the work is following a protocol. Writing a new protocol can be very demanding. And learning to do a protocol well that you've not done before can also be a challenge. But once you can do it, it is done, and you can do it again and again. Yes, you might need green fingers to make it work, but you don't have to reinvent the wheel every day.

But for all but the most basic bioinformatics it doesn't work like that. It's rarely the case that you can follow a tightly defined protocol with little to no deviation. You are effectively writing a new protocol for every analysis you do. Not only does this mean that analysis takes time, is unpredictable, and may produce results that are subject to revision, but it also means I can't just teach someone how to do a particular analysis in isolation from the rest of the bioinformatics core skills/mindset.

I get fed up with the idea that someone thinks I can just show their student how to do analysis X, Y or Z in an afternoon, and then they can fulfill the need for analysis of that experiment type going forward in their lab.

The closest thing you can get to a protocol in bioinformatics is perhaps something like an RNA-seq analysis. But even these analyses soon get beyond the standard protocol. We used to have a pipeline that took fastqs and a design matrix, and spat out DE genes and a whole load of QC. But we gave up on it. It oscillated between being so complex and covering so many bases that it was a maintenance nightmare where you couldn't find the result you wanted in the report anyway, and being so simplified that no one ever used it because they needed something more bespoke for their design.


"We used to have a pipeline that took fastqs and a design matrix, and spat out DE genes and a whole load of QC. But we gave up on it."

Maybe this is a glass half-full or half-empty kind of situation. In my personal experience, a sizable portion of researchers will just check a few of their favorite genes in the DE table. If those are changing in the right direction, case closed.


This is why I refuse to give a collaborator the DE table until we've established what the question is and answered it for them.


@Ian: That may not always work, especially when you work in a core facility. Collaborators are very different from paying/non-paying customers.


Seems that question did hit a sore spot :-D


They are the most frustrating and easily preventable issues relating to bioinformatics, in my experience. An analysis or program or my code may frustrate me at times, but nothing infuriates me more than someone wanting my help while simultaneously disrespecting me by blatantly wasting my time or acting like my contributions are trivial, run of the mill tasks.

The second most important skill is knowing when to swallow your ego and what you really want to say, taking the time to clearly explain potential issues in a rational, neutral manner, and recognizing that ignorance isn't equivalent to malice. Most people don't intend insult; they just don't understand or recognize that sound data analysis is often just as important/difficult as quality data generation. Walking them through the process you're going to take is often illuminating for the non-computationally inclined.

In short, communication skills are key.


Nothing wrong with fishing, as long as everyone knows that that is what is going on from the start, and someone has limited things in some way or another.

The scientific process is a cycle:

  ---->  Observe  ----
 |                    | 
Model          Hypothesize
 ^                    |
 |------- Test  <-----

The "Observe" part of the cycle is just as valid as the Hypothesize/Test part, and that is what a "fishing" exercise is. The problem here is that the scientist who wants to make the observations is not the same person who is able to make them. Every time they want to look at something, they must ask someone else to make the observation, who is probably not as attuned to whether the result is interesting or not, and so has to go back to the domain specialist with the results tarted up in a way suitable for them to digest. This may go on indefinitely and can be very time consuming. This is also why people who are capable of spotting the interesting things in data, and who also have the skills to manipulate the data and do "quick checks" on things, can do so well.


I agree fishing can be fine, but it's pretty frustrating when you pull out things you find interesting and it generally gets passed over time and time again because it's not what they were hoping to find. I'm hoping this changes as my career progresses.


I have seen a lot of "fishing" expeditions. Some are true explorations, but some are essentially p-hacking. The researcher is looking for something specific and they will continue to ask for different analyses until they find it. You may as well ask at the beginning what exactly they are looking for to save yourself the trouble of all the pointless intermediate steps.
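To put a rough number on why "keep trying analyses until something shows up" is a problem, here is a back-of-the-envelope sketch. It assumes every analysis is an independent test at the same significance level, which real re-analyses of one dataset are not, so treat it as an illustration rather than a calculation for any actual study. The function name is made up for this example.

```python
def chance_of_false_positive(n_analyses, alpha=0.05):
    """Probability of at least one false positive across n independent tests.

    Assumes no real effect exists and each test has false positive rate alpha.
    """
    if n_analyses < 0:
        raise ValueError("n_analyses must be non-negative")
    # P(at least one hit) = 1 - P(no hits in any of the n tests)
    return 1.0 - (1.0 - alpha) ** n_analyses


if __name__ == "__main__":
    for n in (1, 5, 14, 20):
        print(n, round(chance_of_false_positive(n), 3))
```

Under these assumptions, somewhere around the fourteenth "can you look at it this way" the odds of a spurious hit pass a coin flip, which is a handy thing to be able to say in a meeting.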


When you are in charge, you get to decide what is interesting and what is not.

4.3 years ago
ATpoint 82k

I guess for a CS or stat bioinformatician who has little background in biology, some major issues could be poorly defined scientific questions like "go analyze the data and find something interesting". This, combined with a wetlab guy who has no idea about the possibilities and especially the limitations of data analysis, and maybe also with poor/unreplicated experimental design, must be a nightmare.
