Tool: UMI-Tools 0.5, now with tools for cell barcoded scRNA-seq
6.7 years ago

We are proud to announce the release of UMI-Tools 0.5.


UMI-tools provides error-aware tools for dealing with short random oligos (Unique Molecular Identifiers/Random Molecular Tags).

The novel, error-corrected UMI deduplication algorithm was published here. We provide tools to group, deduplicate or count reads by their UMIs.

Find us on PyPI, conda or here: https://github.com/CGATOxford/UMI-tools

General walk-through

Droplet-barcoded single cell RNA-seq walk-through

Version 0.5

Version 0.5.0 introduces new commands to support single-cell RNA-Seq and reduces run-time. The underlying methods have not changed, hence the minor release number uptick.

UMI-tools goes single cell

New commands for single cell RNA-Seq (scRNA-Seq):

  • whitelist - Extract cell barcodes (CB) from droplet-based scRNA-Seq fastqs and estimate the number of "true" CBs. Outputs a flatfile listing the true cell barcodes and 'error' barcodes within a set distance. See #97 for a motivating example. Thanks to @Hoohm for input and patience in testing. Thanks to Avi Srivastava for input in discussions about implementing a 'knee' method.

  • count - Count the number of reads per cell per gene after de-duplication. This tool uses the same underlying methods as group and dedup and acts to simplify scRNA-Seq read-counting with umi_tools. See #114, #131

  • count_tab - As per count, but works from a flatfile input from e.g. featureCounts. See #44, #121, #125
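The idea of listing 'error' barcodes within a set distance of each true barcode, as whitelist does, can be sketched in a few lines. This is only an illustration and not the UMI-tools implementation: the function names, the use of Hamming distance, and the choice to drop barcodes that are equally close to more than one true barcode are all assumptions made for this example.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_error_barcodes(true_barcodes, observed, max_distance=1):
    """Map each true barcode to the observed barcodes that look like errors of it.

    Assumption for this sketch: an observed barcode that is within
    max_distance of more than one true barcode is ambiguous and is dropped.
    """
    true_set = set(true_barcodes)
    errors = {cb: [] for cb in true_barcodes}
    for barcode in observed:
        if barcode in true_set:
            continue  # already a true barcode, nothing to correct
        matches = [
            cb for cb in true_barcodes
            if len(cb) == len(barcode) and hamming(cb, barcode) <= max_distance
        ]
        if len(matches) == 1:
            errors[matches[0]].append(barcode)
    return errors


if __name__ == "__main__":
    whitelist = assign_error_barcodes(
        ["AAAA", "CCCC"], ["AAAA", "AAAT", "CCCG", "GGGG", "AACC"]
    )
    for true_cb, error_cbs in whitelist.items():
        # One line per true barcode, followed by its error barcodes
        print(true_cb, ",".join(error_cbs), sep="\t")
```

Reads carrying an error barcode can then be reassigned to the corresponding true barcode before counting.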

In the process of creating these commands, the options for dealing with UMIs on a "per-gene" basis have been re-jigged to make their purpose clearer. See e.g. #127 for a motivating example.

To perform group, dedup or count on a per-gene basis, the --per-gene option should be provided. This must be combined with either --gene-tag if the BAM contains gene assignments in a tag, or --per-contig if the reads have been aligned to a transcriptome. In the latter case, if the reads have been aligned to a transcriptome where each contig is a transcript, the option --gene-transcript-map can be used to operate at the gene level. These options are standardised across all tools such that one can easily change e.g. a count command into a dedup command.
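To make the gene-level behaviour concrete, here is a minimal sketch of what collapsing transcript contigs to genes means for counting. It is not how UMI-tools is implemented: it counts exactly matching UMIs only (no error correction), and the function name, the (contig, UMI) read representation and the decision to skip contigs missing from the map are assumptions made for this example.

```python
from collections import defaultdict


def count_umis_per_gene(reads, transcript_to_gene):
    """Count distinct UMIs per gene from (contig, umi) pairs.

    reads: iterable of (contig, umi) tuples, where each contig is a transcript.
    transcript_to_gene: dict mapping transcript name -> gene name.
    Assumption for this sketch: reads on contigs absent from the map are skipped.
    """
    umis_by_gene = defaultdict(set)
    for contig, umi in reads:
        gene = transcript_to_gene.get(contig)
        if gene is None:
            continue
        # The same UMI seen on two transcripts of one gene is counted once
        umis_by_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_by_gene.items()}


if __name__ == "__main__":
    reads = [("tx1", "AAAA"), ("tx2", "AAAA"), ("tx2", "CCCC"), ("tx3", "AAAA")]
    mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
    print(count_umis_per_gene(reads, mapping))
```

Without the transcript-to-gene map, the UMI AAAA on tx1 and tx2 would be treated as two separate molecules, one per contig.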

Updated options:

  • extract - Can now accept regex patterns to describe UMI +/- CB encoding in read(s). See --extract-method=regex option.
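As a rough illustration of describing a barcode layout with a regex, the sketch below uses Python's standard re module with named groups to pull a cell barcode and UMI off the start of a read. The layout (8-base cell barcode followed by a 6-base UMI) and the group names are hypothetical and chosen only for this example; check the extract documentation for the pattern syntax UMI-tools itself expects.

```python
import re

# Hypothetical layout: 8-base cell barcode, then 6-base UMI, at the read start.
PATTERN = re.compile(r"^(?P<cell>[ACGTN]{8})(?P<umi>[ACGTN]{6})")


def split_read(sequence):
    """Return (cell_barcode, umi, remaining_sequence), or None if no match."""
    match = PATTERN.match(sequence)
    if match is None:
        return None
    return match.group("cell"), match.group("umi"), sequence[match.end():]


if __name__ == "__main__":
    print(split_read("AAAACCCCGGGTTTACGTACGT"))
    print(split_read("AAAA"))  # too short to contain the barcodes
```

A regex makes it straightforward to describe layouts that a fixed-position pattern cannot, such as barcodes separated by a constant linker sequence.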

We have written a guide for how to use UMI-tools for scRNA-Seq analysis including estimation of the number of true CBs, flexible extraction of cell barcodes and UMIs and per-cell read-counting as well as common workflow variations.

Reduced run-time

Introduced a hashing step to limit the scope of the edit-distance comparisons required to build the networks. Big thanks to @mparker2 for this!
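One way such a hashing step can work is sketched below; this is an illustration under stated assumptions, not the code that was merged. It assumes unique, equal-length UMIs compared by Hamming distance, and all function names are made up for this example. Each UMI is cut into threshold + 1 chunks: two UMIs that differ at no more than threshold positions must share at least one identical chunk, so only UMIs that land in the same chunk bucket need a full comparison.

```python
from collections import defaultdict
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_pairs(umis, threshold=1):
    """Pairs of UMIs sharing at least one identical chunk (same position)."""
    n_chunks = threshold + 1
    length = len(umis[0])  # assumes all UMIs have the same length
    bounds = [
        (i * length // n_chunks, (i + 1) * length // n_chunks)
        for i in range(n_chunks)
    ]
    index = defaultdict(set)
    for umi in umis:
        for k, (start, end) in enumerate(bounds):
            index[(k, umi[start:end])].add(umi)
    pairs = set()
    for bucket in index.values():
        pairs.update(combinations(sorted(bucket), 2))
    return pairs


def neighbours(umis, threshold=1):
    """UMI pairs within the threshold, checking only the candidate pairs."""
    return {
        (a, b)
        for a, b in candidate_pairs(umis, threshold)
        if hamming(a, b) <= threshold
    }


if __name__ == "__main__":
    print(sorted(neighbours(["ATAT", "ATAA", "GGGG", "GGGC", "TTTT"])))
```

The result is the same set of edges an all-against-all comparison would give, but the number of full comparisons depends on bucket sizes rather than on the square of the number of UMIs.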

Simplified installation ( #145 )

Previously, extensions were cythonized and compiled on the fly using pyximport, requiring users to have access to the install directory the first time the extension was required. Now the cythonized extension is provided, and is compiled at install-time.

Drop us a line here, on Twitter (@IanSudbery) or on GitHub if you need further help or advice.

UMI RNA-Seq single-cell • 4.8k views