Single-cell RNA sequencing (scRNA-seq) can be used to dissect transcriptomic heterogeneity that is masked in population-averaged measurements. We validated a fully integrated and robust droplet-based system that enables 3’ mRNA digital profiling of thousands of single cells in a highly multiplexed fashion. We demonstrate the clinical utility of our technology to characterize both immune cell subtypes and genotypes by integrating single-cell digital RNA profiling with de novo single nucleotide variant (SNV) calling.
To permit the measurement of spontaneous and induced nuclear and mitochondrial mutations, we developed the digital Random Mutation Capture assay (dRMC). The dRMC permits the analysis of millions of nucleotides, and can identify one mutant base pair among 10⁹ wild-type base pairs. In our approach, enrichment for mutant mtDNA with restriction endonucleases precedes single molecule amplification, effectively eliminating issues with polymerase fidelity.
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. We developed a method, termed CypherSeq, which combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection.
Multiple independent studies have documented that the presence and quantity of tumor-infiltrating lymphocytes (TILs) are strongly correlated with increased survival. However, because of methodological factors, the exact effect of TILs on prognosis has remained enigmatic, and inclusion of TILs in standard prognostic panels has been limited. To address this limitation, we introduced a robust digital DNA-based assay, termed QuanTILfy, to count TILs and assess T cell clonality in tissue samples, including tumors.
Characterizing the transcriptome of individual cells is fundamental to understanding complex biological systems. We validated a fully integrated and robust droplet-based system that enables 3’ mRNA digital profiling of thousands of single cells in a highly multiplexed fashion. Encapsulation of cells from up to 8 samples at a time takes ∼6 min, with ∼50% cell capture efficiency.
To demonstrate the system’s technical performance, we collected transcriptome data from ∼250k single cells across 29 samples. We validated the sensitivity of the system and its ability to detect rare populations using cell lines and synthetic RNAs. We profiled 68k peripheral blood mononuclear cells to demonstrate the system’s ability to characterize large immune populations.
Previous mutational assays able to identify rare random spontaneous mutations have ultimately been restricted to model systems. Although tissue culture and transgenic animal systems are powerful tools for identifying potential mutagens, they cannot accurately predict mutagenesis in humans. To permit the measurement of rare random mutation in human tissues, we developed the Random Mutation Capture (RMC) assay. The RMC assay is >100-fold more sensitive than previous methods that employ genomic selection, permits analysis of a large number of nucleotides, and can identify one mutant base pair among 10⁹ wild-type base pairs.
This technology allowed us to provide the most convincing evidence to date for the existence of a mutator phenotype in human cancers, a hypothesis proposed more than 30 years earlier.
Although this assay was initially developed to study point mutation accumulation in the nuclear genome, we have since adapted it to resolve mitochondrial mutations and increased its resolution and throughput by “digitizing” the assay to more sensitively monitor base substitution and deletion mutations (Figure 1). This has allowed us to redefine the relationship among mitochondrial mutagenesis, cancer and aging.
For example, we recently demonstrated two surprising phenomena: 1) far fewer mitochondrial mutations arise in tumors than in normal healthy tissue, and 2) mitochondrial DNA exhibits mutagenic resistance to DNA-damaging agents.
Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. We have developed a method, termed CypherSeq, that combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection.
CypherSeq is designed to overcome the three main barriers to rare variant detection: (i) error correction, (ii) read depth and (iii) enrichment. CypherSeq employs double-stranded molecular barcoding to achieve high-sensitivity base calling. Additionally, we exploit the circular nature of the plasmid-based sequencing library to enrich for specific targets by rolling circle amplification (RCA), which reduces off-target reads and maximizes read depth. CypherSeq's combination of accuracy and enrichment will enable the full potential of personalized, sequencing-based clinical applications to be realized.
The CypherSeq methodology incorporates the error-correcting capabilities of double-stranded barcodes into a circular construct that carries all the components required for NGS. The sequencing construct is cloned into a bacterial plasmid, which permits the replication and storage of the barcoded CypherSeq vectors in bacteria, while its circular nature allows for enrichment and amplification of specific targets via RCA. The CypherSeq workflow is compatible across many NGS platforms including the Illumina, Ion Torrent, Pacific Bio, 454 and SMRT systems, and is also capable of large-scale multiplexing using conventional indexes.
Figure 4. Overview of rolling circle amplification (RCA) enrichment from CypherSeq libraries. A CypherSeq vector library is amplified by extension of biotinylated, target-specific primers using the strand displacement synthesis-proficient polymerase Bst. Two primers, one targeting each of the complementary strands, must be used to achieve double-strand molecular barcoded error correction. Template CypherSeq vectors containing non-target sequences remain unamplified while templates containing the target sequence are amplified via RCA into long single-stranded products containing redundant copies of the target sequence and sequencing cassette. Unlike conventional PCR, each redundant copy of the target sequence is copied directly from the original DNA fragment. Thus, errors occurring in early rounds of amplification are not reproduced in later duplications, preventing exponential amplification of error. The RCA products are purified using magnetic streptavidin-coated beads, subjected to limited PCR with the library preparation primers (Supplementary Table SI4), and sequenced. The error correction methodology is performed identically to samples not subjected to enrichment. Namely, sequencing reads are compiled by barcode and a consensus is made for each barcode family independently. Substitutions occurring in <90% of the reads within a family are rejected as artifacts, while substitutions present in all or nearly all (>90%) of a family are accepted as true mutations.
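The barcode-family consensus step described in the caption can be illustrated with a short Python sketch. This is not the CypherSeq pipeline itself: the function name `call_family_substitutions`, the input layout (pre-aligned, equal-length reads paired with their barcode), and the toy sequences are all assumptions made for illustration, and the sketch does not model the pairing of the two complementary strands. It only shows the 90% rule: a substitution is kept when it appears in more than 90% of the reads in a barcode family and is otherwise discarded as an artifact.

```python
from collections import Counter, defaultdict


def call_family_substitutions(reads, reference, min_fraction=0.9):
    """Return accepted substitutions per barcode family.

    reads: iterable of (barcode, sequence) pairs; sequences are assumed to be
           pre-aligned to `reference` and of the same length (a simplification).
    A substitution is accepted only if it is present in more than
    `min_fraction` of the reads that share a barcode.
    """
    # Compile reads by barcode so each family is evaluated independently.
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    calls = {}
    for barcode, seqs in families.items():
        accepted = []
        # Walk the alignment column by column: reference base plus one base per read.
        for pos, column in enumerate(zip(reference, *seqs)):
            ref_base, read_bases = column[0], column[1:]
            base, count = Counter(read_bases).most_common(1)[0]
            if base != ref_base and count / len(read_bases) > min_fraction:
                accepted.append((pos, ref_base, base))
        calls[barcode] = accepted
    return calls


# Toy data (hypothetical): family "AAA" carries G>T at position 2 in every read;
# family "BBB" has a single discordant read, which is treated as an artifact.
reference = "ACGT"
reads = [("AAA", "ACTT")] * 10 + [("BBB", "ACGT")] * 9 + [("BBB", "AGGT")]
calls = call_family_substitutions(reads, reference)
print(calls)
```

In this toy input, the substitution shared by the whole "AAA" family is reported, while the lone mismatch in "BBB" is dropped because it is supported by only one of ten reads.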
We demonstrate that CypherSeq corrects errors inherent in NGS outputs, allowing detection of mutations down to a frequency of 2.4 × 10⁻⁷ per base pair. However, the sensitivity of the CypherSeq methodology is likely even greater, as double-stranded barcoding-based error correction can theoretically resolve mutation frequencies as low as 10⁻⁹ to 10⁻¹⁰ per nucleotide (19); the achievable sensitivity depends on the number of unique reads generated.
Translation of robust rare variant detection methods, such as CypherSeq, to the clinic has the potential to dramatically transform disease diagnostics, monitoring and prognostication. Circulating tumor DNA (ctDNA) and circulating tumor cells (CTCs) are detectable in the blood of most patients with advanced cancer and in a significant percentage of patients in the early stages of cancer (36,37). Early cancer diagnosis is currently the most promising approach to reducing mortality, as early detection is associated with more favorable prognosis for nearly all cancer types (36). Reliable detection of early-stage cancer, by quantifying ctDNA or CTCs marked by cancer-specific mutations, will require highly sensitive and specific rare variant detection assays to enable screening in a vast background of wild-type normal cells. By exploiting CypherSeq's highly sensitive error correction abilities and by targeting the enrichment step to a panel of genes known to be mutated in cancers, we expect CypherSeq will be able to achieve the sensitivity and specificity required for the early detection of disease.
The human cellular adaptive immune system identifies and destroys cells expressing aberrant proteins or protein fragments. The source of the abnormal protein fragments can include intracellular pathogenic infection, genomic mutations, or deregulation of gene expression. Cancerous cells often express such aberrant peptides, prompting a cellular adaptive immune response. These peptides are presented on the surface of cells by human leukocyte antigen molecules for binding by T cell receptors (TCRs) on the surface of T-lymphocytes, the primary mediators of the cellular adaptive immune response.
Tumor-infiltrating lymphocytes (TILs) have been shown to directly attack tumor cells in a variety of types of cancer, and multiple independent studies have demonstrated that the presence of TILs is strongly correlated with increased survival. For both colorectal and ovarian carcinoma patients, the presence or absence of TILs provides a strong prognostic marker for survival independent of current staging methods. However, existing assays and pathology tests to measure TILs are cumbersome, have inherent variability, are mostly restricted to research studies, and thus are not used for clinical decision-making.
As the importance of TILs gains appreciation, particularly given their potential utility for cancer prognostication and their role in immunotherapeutic response, new technologies to quantitatively measure TILs are needed. Fortunately, adaptive immune cells have a molecular signature that can be exploited for direct measurement. T cells have gene rearrangements in their TCR loci. The nucleotide sequences that encode the TCR regions are generated by somatic rearrangement of noncontiguous variable (V), diversity (D), and joining (J) region gene segments for the β chain, and V and J segments for the α chain. The existence of multiple V, D, and J gene segments in germline DNA permits substantial combinatorial diversity in receptor composition, and receptor diversity is further increased by the deletion of nucleotides adjacent to the recombination signal sequences (RSSs) of the V, D, and J segments, and template-independent insertion of nucleotides at the Vβ-Dβ, Dβ-Jβ, and Vα-Jα junctions.
We have developed QuanTILfy to measure the number of T-lymphocytes and assess clonality in a tissue using droplet digital polymerase chain reaction (ddPCR) technology. The massive sample partitioning is a key aspect of the ddPCR technique and a vital component of the QuanTILfy assay. ddPCR surpasses the performance of earlier techniques by introducing a scalable implementation of digital PCR, where the creation of tens of thousands of droplets allows for the generation of tens of thousands of data points, bringing the power of statistical analysis inherent to digital PCR into practical application.
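The statistical power of droplet partitioning comes from the standard Poisson correction used in digital PCR generally: if a fraction p of droplets is positive, the mean number of template copies per droplet is estimated as −ln(1 − p). The Python sketch below illustrates that calculation only. The function names, the use of a reference assay to normalize the target signal, and the droplet counts are assumptions for illustration; the text above does not specify how QuanTILfy computes its final T cell counts.

```python
import math


def copies_per_droplet(positive, total):
    """Poisson estimate of mean template copies per droplet.

    positive: number of droplets with signal
    total:    number of droplets analyzed
    """
    if total <= 0 or not 0 <= positive < total:
        # If every droplet is positive the estimate is undefined (log of zero).
        raise ValueError("need 0 <= positive < total and total > 0")
    return -math.log(1 - positive / total)


def target_to_reference_ratio(target_positive, reference_positive, total):
    """Ratio of target copies to reference copies in the same droplet set.

    Hypothetical normalization: both assays are assumed to be read out
    from the same `total` droplets.
    """
    return copies_per_droplet(target_positive, total) / copies_per_droplet(
        reference_positive, total
    )


# Hypothetical droplet counts, for illustration only.
ratio = target_to_reference_ratio(
    target_positive=1500, reference_positive=9000, total=15000
)
print(f"target/reference copy ratio: {ratio:.3f}")
```

The logarithmic correction matters most when many droplets are positive, because such droplets increasingly contain more than one template molecule and a simple positive-droplet count would underestimate the true copy number.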