Question: What Kind Of Bioinformatics Tutorials Would You Like To See Online?
16

This is a two-part question, so bear with me!

I work on Knowledgeblog, a lightweight publication system for scientific code, data, and results, built around WordPress and extended by an ecosystem of off-the-shelf and custom plugins.

We're currently putting together a 'writeathon' to provide some bioinformatics tutorial material on a Knowledgeblog. What topics do people think would be good to cover?

We're looking for tutorials that might be good for all levels - computer scientists interested in learning some biology, biologists getting interested in bioinformatics, and of course tutorials aimed at bioinformaticians by bioinformaticians.

The second part of the question is more of a call to arms. We have a travel budget, and would be happy to spend some of this encouraging people to come to Newcastle for a day (Tuesday 21st June) to write away with us. Obviously this is more likely to occur if you're in the UK, but close international travel could also be supported in a limited number of cases.

All tutorials will be given a citable DOI, and no promises, but we will go for PubMed inclusion if we get enough content. You could also contribute remotely on the day, should travel be impossible but you still want to get some content up!

Suggestions for tutorial topics under this question would be great; votes will allow us to work out which topics we cover and whom we invite! If you're interested in joining us in Newcastle at the end of June then please drop me an email directly (d.c.swan@ncl.ac.uk).

For examples of existing Knowledgeblogs you can have a look at Ontogenesis and Taverna kblogs.

ADD COMMENTlink 8.8 years ago Daniel Swan 13k • updated 8.7 years ago Gareth Palidwor ♦ 1.6k
2

community wiki ?

ADD REPLYlink 8.8 years ago
Pierre Lindenbaum
120k
0

Will authors be able to edit the tutorials after the review?

ADD REPLYlink 8.8 years ago
Jan Kosinski
♦ 1.6k
0

Good initiative and best of luck! To add to Jan's question: will authors be able to edit tutorials that they have not written themselves? This is vital IMHO.

ADD REPLYlink 8.8 years ago
Michael Schubert
♦ 6.9k
0

Jan, very good question - the question of whether an article is canonical is important. The way we work this right now is that if new versions are edited, the old versions remain on the site, linked to at the bottom of the article.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

Michael, it doesn't work so much as a wiki. Articles can of course have multiple authors, but I don't think we envisage people changing other people's articles! The idea would be to have more of a post-publication review, in the comments or via trackbacks/pingbacks to other blog discussions, that the author could address at some point.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

Good luck with this, Daniel. Are all images and text under a Creative Commons (or similar) licence? It would be nice to be able to use material from the tutorials in both workshops and seminars without breaking copyright. On a related note, do you have a recommended image resolution for the wiki, or should the images link to a higher-resolution version? This would be ideal for their inclusion in other seminars.

ADD REPLYlink 8.8 years ago
Alastair Kerr
5.2k
0

Alastair, good point. I think we all feel an appropriate CC licence should be in place for this, but there is no decision on this yet. I guess the image resolution depends on how you author the tutorial. If the images are embedded in a Word document and then posted, I suspect they would remain at 'Word' resolution. If you were to edit the post in the WordPress interface, you would be able to exercise more control over the formatting. We would support both endeavours, but the idea of Knowledgeblog was to allow people to post articles to the system using whatever their current toolchain is.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

Regarding whether it should be a wiki: definitely it should not! I might want to publish a tutorial for solving a problem X using a tool Y, and I don't want others editing it to use a tool Z because the community believes tool Z is better. They should write their own tutorial on using tool Z.

ADD REPLYlink 8.8 years ago
Jan Kosinski
♦ 1.6k
0

Jan, this is what we envisage as well. Wikis are great, but not for what we're trying to do :)

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
13

Excellent effort Daniel ! Best wishes in advance.

I would start with a section on statistics followed by in-depth tutorials. The statistical concepts will serve as reference material for various parts of the tutorial section.

I think it will be interesting to see the tutorials organized by biological data / experiments.

For example:

Genome sequence:

  • Sequence similarity search
  • NGS/WES (QC, alignment, variant calling, annotation)
  • Phylogeny

Gene expression:

  • Mining public data resources for expression data pertaining to specific cellular events
  • Analysis of gene expression data using BioConductor packages

GWAS:

  • Background on Statistical Genetics
  • PLINK
  • dbGaP
  • Visualization tools

Protein sequence:

  • Homology
  • Domain/Motif assignment
  • Analysis of unassigned regions
  • Sequence classification (family, super family, fold level)

Protein Structure:

  • Modeling
  • Structure analysis (Hydrogen bond, solvent accessibility, disulphide bonds, higher order interactions)
  • Structure classification
  • Quality assessment of protein structures

Protein-protein interaction:

  • Databases
  • Visualization of PPI (Cytoscape, BioLayout Express 3D etc)
  • Reasoning over the data

Others:

  • Machine learning (Discuss various aspects of soft computing algorithms using published datasets)
  • Data integration and Data mining topics

ADD COMMENTlink 8.8 years ago Khader Shameer 18k
2

Looks like a fabulous beginning for an advanced course in bioinformatics!

ADD REPLYlink 8.8 years ago
Larry_Parnell
16k
0

Thanks Larry. Do you think we could really organize such a course that spans genome and proteome? EMBO is doing a great job by providing grants for teaching; is there anything similar in the US?

ADD REPLYlink 8.8 years ago
Khader Shameer
18k
0

Thanks Khader, some good suggestions there and at least some areas we have some expertise in that we could leverage locally.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

Thanks Daniel. Please let me know if I can contribute one or two tutorials. I will be happy to be a part of it !

ADD REPLYlink 8.8 years ago
Khader Shameer
18k
9

A few approaches to consider:

  1. For software installation/configuration tutorials, I recommend the approach used in the GMOD Tutorials. These include starting virtual system images (these use VMware), sample data, and step-by-step instructions. Most of these came out of the annual GMOD courses and reflect exactly what was covered in the course. One drawback of having a starting system image is that those images get stale and need to be refreshed periodically (at GMOD this happens once a year). The instructors create these tutorials in this format for the course.
  2. For using software, short video tutorials work very well. The Galaxy Project puts out wildly popular _quickies_ , video tutorials that highlight how to do specific tasks in Galaxy. These only require a few minutes from the user (but take a long time to make).
  3. Finally, I also like the OpenHelix approach. OpenHelix creates comprehensive hour long video and slide based tutorials that include worked examples. These take an enormous amount of time to make, but excel at being thorough and clear.
ADD COMMENTlink 8.8 years ago Dave Clements • 610
1

I have a lot of respect for GMOD, but I feel like providing ready-to-use virtual instances leaves beginners helpless when they inevitably need to install dependencies and muck with their PATH to get something working. This is something I've seen first-hand.

ADD REPLYlink 8.8 years ago
Jeremy Leipzig
18k
0

OpenHelix is a great resource. It's just a shame not all of the tutorials are free :( The Galaxy webcasts are also excellent.

ADD REPLYlink 8.8 years ago
Pi
• 510
0

Dave, we've used VMs for tutorials before on our Master's course, so it's not an alien idea to us. I think the idea of more screencast-style tutorials is something we had not necessarily considered but perhaps should.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

At Ensembl we also have quite a few short video tutorials, focusing on specific tasks in Ensembl and BioMart. These are made using Camtasia (http://en.wikipedia.org/wiki/Camtasia_Studio) and are made available through YouTube (http://www.ensembl.org/info/website/tutorials/index.html). They seem to be rather popular, but take quite a lot of time to make ....

ADD REPLYlink 8.8 years ago
Bert Overduin
♦ 3.6k
0

Jeremy, I agree that starting with ready-made virtual systems can leave users frustrated when they get outside the safety of that system. You can set "traps" in your teaching examples and then talk about things like checking logs, the screen command and so on, but that won't be comprehensive. I don't have a good idea on how to teach system debugging skills (in any depth) and bioinformatics tools in a short course.

ADD REPLYlink 8.8 years ago
Dave Clements
• 610
6

On a more advanced level I'd like to see:

- Multiple testing corrections 
- Getting started with medline text mining
- Building bioinformatics web apps backended by SQL
- Integrating multiple large data sets
- Bioinformatics projects: structure and lifecycle
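
To give a flavour of what the multiple testing corrections item in the list above might cover, here is a minimal sketch of the Benjamini-Hochberg false discovery rate adjustment in plain Python. The function name and the toy p-values are made up for illustration; a real tutorial would more likely lean on an established statistics package.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that the adjusted values
    # stay monotone (a smaller p-value never gets a larger adjusted value).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values, invented purely for illustration.
toy_pvalues = [0.01, 0.04, 0.03, 0.20]
qvalues = benjamini_hochberg(toy_pvalues)
print(qvalues)
```

Each raw p-value is scaled by the number of tests divided by its rank, which is why a p-value that looks significant on its own can stop being so once it is one of many tests.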

[Edit] An additional one I thought of this morning was "databases in bioinformatics". In my experience, bioinformatics people use text files or SQL databases for data persistence and access, and not a lot else. A tutorial outlining the other options (Berkeley DB, key-value stores, Lucene, object serialization, object-oriented databases, etc.) with examples for each may give even experienced bioinformatics developers some new tools to work with.
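
As a rough idea of what such a side-by-side comparison could look like, the sketch below stores the same toy records twice using only the Python standard library: once in a SQL table via `sqlite3`, and once in a key-value store via `shelve` (which pickles whole objects under string keys). The gene names, fields, and file names are hypothetical and only there to show the difference in access style.

```python
import os
import shelve
import sqlite3
import tempfile

# Hypothetical toy records, invented for illustration.
records = {
    "geneA": {"chrom": "chr1", "length": 1200},
    "geneB": {"chrom": "chr2", "length": 850},
}

workdir = tempfile.mkdtemp()

# Option 1: relational storage. The schema is fixed up front and
# lookups are expressed as SQL queries.
conn = sqlite3.connect(os.path.join(workdir, "genes.sqlite"))
conn.execute("CREATE TABLE gene (name TEXT PRIMARY KEY, chrom TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [(name, rec["chrom"], rec["length"]) for name, rec in records.items()],
)
conn.commit()
sql_hit = conn.execute(
    "SELECT chrom, length FROM gene WHERE name = ?", ("geneA",)
).fetchone()
conn.close()

# Option 2: key-value storage. No schema; each value is a serialized
# Python object fetched directly by its key.
with shelve.open(os.path.join(workdir, "genes_kv")) as store:
    for name, rec in records.items():
        store[name] = rec
    kv_hit = store["geneA"]

print(sql_hit)
print(kv_hit)
```

The SQL version makes ad hoc queries across fields easy, while the key-value version is simpler when records are only ever fetched by identifier.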

ADD COMMENTlink 8.8 years ago Gareth Palidwor ♦ 1.6k
0

I'm pretty sure we're going to hit Integration as a topic anyway, but that's a good list. I might get one of our stats lecturers in to cover MTC, as I think it's a topic only ever mentioned 'in passing' with datasets!

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
5

My suggestion is not a topic but an approach. The tutorial certainly should be hands-on - there is no doubt about that - but it should go further and offer an interactive feature or critique/accolades from the tutorial leader or writer. A tutorial is about learning and bioinformatics is best taught in a more interactive style than by data dump/slide dump/read the notes on your own time.

ADD COMMENTlink 8.8 years ago Larry_Parnell 16k
0

Agreed; my preferred approach is a standard data set and a progressive series of analyses applied to it, each building on the previous.

ADD REPLYlink 8.8 years ago
Gareth Palidwor
♦ 1.6k
0

Larry, you're right I think there's a lot of scope for critique in something like this which is often lacking from the format.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
4

my wishes :-)

  • how to write a plugin for Taverna2
  • how to "something-bio" using "language-1" when your favorite language is "language-2"
  • the internals of NCBI blast
  • biostatistics for dummies
  • ...
ADD COMMENTlink 8.8 years ago Pierre Lindenbaum 120k
0

How to write a Taverna plugin is covered in the 2x user manual, but I can't point you to a link as the Taverna web server is down for 2 days.

ADD REPLYlink 8.8 years ago
Pi
• 510
0

@pi , the documentation for T2 is, from my point of view, incomplete & unreadable.

ADD REPLYlink 8.8 years ago
Pierre Lindenbaum
120k
0

Love the cross-language idea :)

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

We've already got a knowledgeblog for taverna (taverna.knowledgeblog.org). If anyone wants to write a "how-to write a plugin", this would be a good place to add it.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

We've already got a knowledgeblog for taverna (taverna.knowledgeblog.org). If anyone wants to write a "how-to write a plugin", this would be a good place to add it.

ADD REPLYlink 8.8 years ago
phillord
• 0
0

There is a tutorial on writing plugins for Taverna 2 at http://www.mygrid.org.uk/dev/wiki/display/developer/Creating+plugins+for+Taverna+2

ADD REPLYlink 8.8 years ago
Alaninmcr
• 0
0

@alaninmcr , Thanks ! this tutorial looks far more complete than the last time I saw it. (I removed my previous comment about it)

ADD REPLYlink 8.8 years ago
Pierre Lindenbaum
120k
3

I prefer task oriented tutorials that use a standard data set to demonstrate a bunch of standard analyses. I do a lot of bioinformatics consulting for scientists and grad students and much of the work is just variations on the same tasks, for example:

  • Microarray data

    • Quality analysis
    • Normalization
    • Annotation
    • Fold change analysis
    • Gene Ontology enrichment analysis
  • ChIP Seq

    • Quality analysis
    • Peak identification
    • Peak annotation (association with genes)

Scripts in Perl and R are helpful, but I've found TM4 MeV to be particularly useful for non-programmers dealing with microarray data.
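
To illustrate how small a scripted step in such a task list can be, here is a hedged sketch of the fold change calculation for a single probe. It is written in Python rather than Perl or R, the intensity values are invented, and the pseudocount of 1 is just one common convention assumed here, not a recommendation from any particular package.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """Log2 ratio of mean treated signal to mean control signal.

    The pseudocount avoids division by zero for probes with no signal.
    """
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Invented intensities for one probe across two replicates per condition.
treated_values = [31.0, 31.0]
control_values = [7.0, 7.0]

lfc = log2_fold_change(treated_values, control_values)
print(lfc)  # 2.0, i.e. a four-fold increase after adding the pseudocount
```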

I've worked on a few tutorials similar to what you describe; the Affymetrix one (http://www.stemcore.ca/projects/SCNcourse) is getting rather old (it doesn't handle the exon/gene chips), and the ChIP Seq one (http://regulome.ca/2010workshop) should be updated as well.

ADD COMMENTlink 8.8 years ago Gareth Palidwor ♦ 1.6k
0

My background is array data, so that's definitely along the lines of the kind of tutorials I was going to try and get written myself.

The ChIP-Seq work would be interesting. I've done a bit of this recently, and the QA/PI stage would be of great interest.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

I always do a QA step first; not much point in proceeding with analysis of crappy data.

ADD REPLYlink 8.8 years ago
Gareth Palidwor
♦ 1.6k
3

Most of the interesting things have been listed and covered;

Would add:

Annotation tools for GWAS results - database and visualisation scripts

Coalescent models

Haplotype and imputation analyses

These are more tutorials centered on problems to solve than ones focused on a language or a database.

Christian

ADD COMMENTlink 8.8 years ago Genotepes • 940
0

GWAS is an interesting topic, but one we'd need someone to come in and do! Suggestions? Volunteers? :)

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

I could write one or two things, although there are researchers who are more experienced and are native English speakers (and more of them in the UK and even in Newcastle - Heather Cordell, if I remember).

But I can definitely write short notes on some issues around GWAS.

ADD REPLYlink 8.8 years ago
Genotepes
• 940
3

As for the tutorials, I'd like to see lots of existing papers reverse engineered with their sample datasets, so that we can walk through them step-by-step and know that we got them right. This is much like a set of questions, except that the answers are worked out for you. It would probably be much like a journal club, only online.

The tutorials could also have small pointers/links to other background reading material, such as fundamental facts, structured review articles, or something along those lines.

ADD COMMENTlink 8.8 years ago Hranjeev ♦ 1.5k
1

A great example of this is: Sémon, M., Lobry, J.R., Duret, L. (2006) No Evidence for Tissue-Specific Adaptation of Synonymous Codon Usage in Humans. Molecular Biology and Evolution, 23:523-529, which has online data sets with interactive web-based R (!) so you can reproduce their analysis completely (http://pbil.univ-lyon1.fr/datasets/SemonLobryDuret2005/)

Jean Lobry does a lot of this sort of thing, check the "online reproducibility" links: http://pbil.univ-lyon1.fr/members/lobry/

ADD REPLYlink 8.8 years ago
Gareth Palidwor
♦ 1.6k
0

HRanjeev, this is something we've been thinking of doing with Knowledgeblog anyway. The thinking at the moment is more of an 'enhanced paper' where data and code is embedded into the article and can be 'read' by R, so that the work can be recapitulated on the fly and checked that what is published is indeed correct. Nice to see someone is in line with our thinking!

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

Great thinking! Since we are in the internet age, I'm glad that someone is actually considering taking it beyond the traditional publishing mode. I'm excited to see how your concept flourishes. I'm also following the academia.edu site, but I don't see it as an interactive avenue just yet. Sometimes authors don't 'feel' any tangible credit for sharing their comments or even working through a public peer-review process. I hope this can be different with Knowledgeblog. Good luck!

ADD REPLYlink 8.8 years ago
Hranjeev
♦ 1.5k
0

That was an excellent resource gawp. Something new to me and Jean Lobry is really doing a good job there.

ADD REPLYlink 8.8 years ago
Hranjeev
♦ 1.5k
2

I think Khader Shameer covered the spectrum fairly well. What I would personally like to see, though, is a primer on converting a command line-based pipeline into Galaxy. I'm becoming a fan of it, but I'm having some issues with some of the advanced features and frankly can't find all the information I'd like about Galaxy's capabilities, such as whether the load balancing (Torque/PBS, I believe) is customizable, or whether it does such a good job that I wouldn't need to mark tasks as disk-, RAM-, or CPU-intensive.

I believe there's a fair-sized market for this, and that it would ultimately render people's workflows more accessible. It would also strengthen the Galaxy framework as people add more tools and datatypes to it.

ADD COMMENTlink 8.8 years ago Lythimus • 200
0

Great idea - I was going through the Galaxy docs for this last week actually, and you're right I think this would have very broad appeal.

ADD REPLYlink 8.8 years ago
Daniel Swan
13k
0

As a biologist doing genetics, you will make my day, Thx

ADD COMMENTlink 8.8 years ago Nataly • 0
2

Nataly, try to use the "add comments" function under the Question for comments like these in the future...

ADD REPLYlink 8.8 years ago
Casey Bergman
18k
