Welcome!

Practical bioinformatics topics / NGS

I’m developing a survey instrument that I can use to assess bioinformatics training needs at UC Davis, with a particular emphasis on practical sequencing data analysis. (Please see my blog post on training for more information and background.)

A few notes –

  1. I intend this survey for biologists to fill out, so I’m avoiding technical and foundational skills (cloud computing, Linux/UNIX, R, Python, managing large data sets).
  2. I’m also avoiding sequence analysis approaches for which there are no established pipelines.

Below is my list so far. I welcome comments, additions, and critiques! The live site is at http://ngs-training-needs-survey.rtfd.org/.

Please feel free to copy, fork, and modify this survey. The source is on GitHub at https://github.com/ngs-docs/ngs-training-needs-survey.

Genome assembly and annotation:

  • Assembling and annotating bacterial and archaeal genomes (w/Illumina, PacBio)
  • Assembling and annotating non-plant/animal eukaryotic genomes (w/Illumina)
  • Assembling animal genomes (w/Illumina)
  • Annotating animal genomes
  • Assembling plant genomes (w/Illumina)
  • Annotating plant genomes
  • Annotating bacterial genomes
  • Annotating fungal genomes
  • Long-read technologies for large genomes (PacBio, Moleculo)
  • Emerging technologies for genome sequencing, assembly, and annotation in plants
  • Emerging technologies for genome sequencing, assembly, and annotation in bacteria and archaea
  • Emerging technologies for genome sequencing, assembly, and annotation in animals

Resequencing and variant calling:

  • Variant calling on bacterial, archaeal, and fungal genomes
  • Variant calling on plant and animal genomes
  • Genotyping by sequencing

Transcriptomics:

  • mRNAseq expression analysis in major model organisms (human, mouse, zebrafish, Arabidopsis, yeast, worm, Drosophila)
  • Ab initio transcriptome assembly, annotation, and expression analysis (semi-model animals, plants, and fungi)
  • De novo transcriptome assembly, annotation, and expression analysis (non-model eukaryotes)
  • Reference-genome-based bacterial and archaeal transcriptomics
  • De novo mRNAseq in bacteria and archaea (no reference genome)

Metagenomics and microbial ecology:

  • Amplicon analysis of populations and population structure
  • Reference-based metagenomics (e.g. human microbiome)
  • De novo shotgun metagenome and metatranscriptome assembly and analysis

Other:

  • ChIP-seq analysis
  • Reduced representation analysis of genomes and populations
  • Marker development
  • Genome-wide association studies (GWAS)

More open-ended questions:

What bioinformatics software/programs are you using right now?

  • CLC Workbench
  • Galaxy
  • Other (please specify)

What compute resources are you using, if any?

  • Laptop or lab computer
  • iPlant
  • XSEDE
  • DIAG
  • Amazon cloud
  • Davis Genome Center
  • Other cloud (please specify)
  • Other (please specify)

What scripting or programming languages are you using, if any?

  • MATLAB
  • R
  • Python
  • Perl
  • SAS
  • Other (please specify)

What do you feel is your major obstacle in bioinformatics or sequence analysis, i.e., what is getting in the way of your data analysis?

Tiers of training

There are a lot of necessary skills for doing data-intensive biology. How do you teach them all, and in what order?

Here’s a set of tiers of training for an intro class or course in next-gen sequence analysis; feedback welcome!

My teaching philosophy is to try to motivate later topics by first introducing the concepts in a useful workflow, and then diving deeper into the topics once they’ve been motivated. So the approach is to start with the minimum necessary to get something done, and then explain it in more detail after the first pass.

Interpreting this diagram

Except for the second tier, completion of all of the topics in one tier would be necessary to move on to the next. For the second tier, completion of at least one topic would be needed to move on to the next tier.

(You can also download a PDF by clicking on the image.)

_static/tiers.png
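
The progression rule described under "Interpreting this diagram" can be written out as a small check. This is only a minimal sketch in Python: the function name `can_advance` and the example tier and topic names are hypothetical and are not taken from the diagram. It assumes tiers are numbered from 1.

```python
def can_advance(tier_number, tier_topics, completed):
    """Return True if a student may move on from the given tier.

    tier_number is 1-based; tier_topics lists the topics in that tier;
    completed is the set of topics the student has finished.
    """
    finished = [topic in completed for topic in tier_topics]
    if tier_number == 2:
        # Second tier: any one completed topic is enough.
        return any(finished)
    # Every other tier: all topics must be completed.
    return all(finished)


# Hypothetical tiers for illustration only; not taken from the diagram.
example_tiers = {
    1: ["shell basics", "file formats"],
    2: ["variant calling", "mRNAseq", "genome assembly"],
}
done = {"shell basics", "file formats", "mRNAseq"}

print(can_advance(1, example_tiers[1], done))
print(can_advance(2, example_tiers[2], done))
```

The second tier is the only special case, which reflects the idea that students pick one applied workflow there before moving on.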

Learning Aims

(Long term goals of training.)

The workshops will, in the long term, enable students to:

  • Be capable of doing their own data analysis.
  • Be future-proofed against new software. Students will be able to pick up new versions of existing software, or entirely new software, and apply it to their data.
  • Know how to assess computational performance and design appropriate computational controls, and use these to evaluate parameter and software choices.
  • Record their analysis workflow, publish reproducible analyses, and track data provenance manually.
  • Use the appropriate statistical models and tools.
  • Gain in efficiency and expertise on their own, through reading, or via informal interactions in person and online.
  • Choose and apply the appropriate computational resources.
  • Appropriately manage raw data and associated metadata.
  • Identify, troubleshoot, and solve common technical problems on their own, or via informal interactions in person and/or online.

(Thanks to Tracy Teal for comments on this.)

Also see:

https://twitter.com/JasonWilliamsNY/status/544853305017765888 (Jason Williams)

https://twitter.com/Vaguery/status/544847281124835328 (Bill Tozier)

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.