HK1164945A - Single cell gene expression for diagnosis, prognosis and identification of drug targets - Google Patents
- Publication number
- HK1164945A (application HK12105664.0A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- cells
- cell
- cancer
- disease state
- expression
Description
Cross referencing
This application claims the benefit of U.S. Provisional Application No. 61/205,485, filed on January 20, 2009, which is hereby incorporated by reference.
Government authority
This invention was made with government support under federal grant U54CA126524 awarded by the National Cancer Institute. The government has certain rights in this invention.
Background
In recent years, analysis of gene expression patterns has provided a means to improve the diagnosis and risk stratification of many diseases. For example, unsupervised analysis of global gene expression patterns has identified molecularly distinct subtypes of cancers that standard diagnostic methods treat as homogeneous diseases; these subtypes differ widely in gene expression. Such molecular subtypes are often associated with different clinical outcomes. Global gene expression patterns can also be examined for features that correlate with clinical behavior in order to generate prognostic markers.
As with many diseases, cancer does not usually have a single, well-defined cause, but can be viewed as multiple diseases, each caused by a different aberration in information pathways, which ultimately lead to apparently similar pathological manifestations. The identification of polynucleotides that are differentially expressed in cancerous cells, precancerous cells, or cells with low metastatic potential relative to normal cells of the same tissue type can provide a basis for diagnostic tools, aid drug discovery by providing targets for candidate agents, and help identify therapeutic targets that are more appropriate for the type of cancer being treated.
The identification of differentially expressed gene products also advances the understanding of the progression and nature of complex diseases, and is critical to identifying genetic factors that cause phenotypes associated with, for example, the development of metastatic or inflammatory phenotypes. The identification of gene products that are differentially expressed at different stages and cell types can provide both an early diagnostic test and further serve as therapeutic targets. In addition, differentially expressed gene products may be the basis of screening assays to identify chemotherapeutic agents that modulate their activity (e.g., their expression, biological activity, etc.).
Early disease diagnosis is important to prevent disease progression and reduce morbidity. Analysis of patient samples to identify gene expression patterns provides a more specific basis for rational disease treatment that may produce fewer side effects than traditional treatments. Moreover, confirming that a lesion poses less risk to the patient (e.g., the tumor is benign) may avoid unnecessary treatment. In short, the identification of gene expression patterns in disease-associated cells can provide a basis for therapy, diagnosis, prognosis, therapeutic assays (therametrics), and the like.
As another example, infectious diseases cause damage to tissues and organs, resulting in morbidity and mortality for the affected organism. In the case of influenza A infection, the most common cause of hospitalization and death is infection of lung tissue. However, at the single cell level, the exact cells that influenza infects, as well as the cells that repair the damaged lung, are not known. Such knowledge would help identify new therapeutic targets for intervention, such as new drugs to prevent viral infection and new therapies to reduce morbidity.
Many tumors contain mixed populations of cancer cells that may differ in the signaling pathways they use for growth and survival. Because these cancer cells respond differently to specific treatments, resistance of a specific population of cancer cells results in recurrence after cytotoxic radiation and chemotherapy. Thus, the failure of clinical therapy is due in part to the resistance of a particular population of cancer cells to the therapy.
The initial shrinkage of the tumor, which is often observed shortly after treatment, reflects only the relative sensitivity of a subset of cancer cells, which may make up the bulk of the tumor, and is not important for long-term survival. Thus, the most important clinical variable for assessing response to treatment and prognosis is not absolute tumor size, but the absolute number of specific populations of cancer cells remaining after treatment. If one were able to identify the differences in signaling pathways used by the different populations of cancer cells in these tumors, one could design treatments that target each population of cells. By targeting all populations with drugs that affect each of them, one could eliminate the tumor.
As another example, inflammatory bowel disease causes destruction of the normal structure of the intestine, thereby causing problems such as diarrhea, bleeding, and malabsorption. These problems are caused by the breakdown of the normal lining of the intestinal mucosa. The mucosal lining of the colon consists of crypts, with goblet, stem and progenitor cells at the base of the crypts, and mature cells, including gut and goblet cells, at the top of the crypts. In inflammatory bowel disease, it is unclear which cell population is damaged and the signaling pathways required to repair the damaged mucosa.
Methods for accurately determining the number and phenotype of cells in disease lesions using small numbers of cells are of great interest for prognosis, diagnosis and identification of signaling pathways that can be targeted by specific therapies in a variety of diseases, including inflammatory bowel disease, infections, cancer, and autoimmune diseases such as rheumatoid arthritis. The present invention addresses this problem.
Disclosure of Invention
The present invention provides compositions and methods for single cell gene expression profiling and/or transcriptome analysis. One method provided herein is a method of identifying different cell populations in a heterogeneous solid tumor sample, comprising: randomly partitioning individual cells from the tumor into discrete locations; performing transcriptome analysis on a plurality of genes of the individually partitioned cells in the discrete locations; and performing a cluster analysis to identify one or more distinct cell populations. In some instances, the individual cells are not enriched prior to partitioning. Transcriptome analysis can be performed simultaneously on at least 1000 individual cells. Transcriptome analysis can be performed using nucleic acid analysis. The discrete locations may be on a planar substrate. In certain embodiments, random partitioning is performed in a microfluidic system. Transcriptome analysis may include analysis of expressed RNA, non-expressed RNA, or both. The transcriptome analysis may be a whole transcriptome analysis. Transcriptome analysis may involve amplification of RNA using a single set of primer pairs, which in some embodiments are non-nested primers. Transcriptome analysis may be performed simultaneously or substantially in real time on all or a subset of the individual cells. The one or more cell populations may be normal stem cells, normal progenitor cells, normal mature cells, inflammatory cells, cancer stem cells, or non-tumorigenic stem cells.
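The cluster analysis step can be illustrated with a minimal sketch. The disclosure does not specify a clustering algorithm, so everything here is an assumption made for illustration: each cell is represented as a vector of expression values, distance is Euclidean, and average-linkage agglomerative clustering is stopped at a chosen number of clusters. The function name and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def cluster_cells(profiles, n_clusters):
    """Average-linkage agglomerative clustering of per-cell expression vectors.

    profiles: dict mapping a cell id to a list of expression values (one per gene).
    Returns a list of sets of cell ids, one set per cluster.
    """
    clusters = [{cell} for cell in profiles]

    def linkage(a, b):
        # Mean pairwise Euclidean distance between the members of two clusters.
        total = sum(dist(profiles[x], profiles[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical toy data: four cells, four genes (arbitrary expression units).
cells = {
    "cell_1": [9.0, 8.5, 0.5, 0.2],
    "cell_2": [8.7, 9.1, 0.3, 0.4],
    "cell_3": [0.4, 0.6, 7.9, 8.8],
    "cell_4": [0.2, 0.3, 8.4, 9.0],
}
populations = cluster_cells(cells, n_clusters=2)
```

In practice the number of clusters is rarely known in advance; a distance cutoff or a dendrogram inspection could replace the fixed `n_clusters` used in this sketch.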
Further provided herein are methods of analyzing a heterogeneous tumor biopsy sample from a subject, comprising: randomly partitioning cells from a biopsy sample to discrete locations; performing a transcriptome analysis of at least 50 genes of the individually partitioned cells; and using the transcriptome data to identify one or more characteristics of the tumor. The performing step may be performed without prior enrichment of cell types. The characteristic identified may be the presence, absence or number of cancer cells. The identified characteristic may also be the presence, absence or number of stem cells, early progenitor cells, initially differentiated progenitor cells, later differentiated progenitor cells or mature cells. The identified characteristic may also be the effectiveness of the therapeutic agent to eliminate one or more cells. The identified characteristic may also be an activity of a signaling pathway, e.g., a specific pathway of a cancer stem cell, a differentiated cancer cell, a mature cancer cell, or a combination thereof. The methods disclosed herein may further comprise the step of using the signature to diagnose cancer or a stage of cancer in the subject.
Another method disclosed herein is a method of identifying a signaling pathway utilized by a disease state cell, comprising: randomly partitioning cells from a heterogeneous sample; performing transcriptome analysis on the partitioned cells; identifying at least one disease state cell using the transcriptome analysis; comparing the transcriptome analysis of the at least one disease state cell with the transcriptomes of the following cells: a) a non-disease state cell; b) a different disease state cell; and c) a disease state stem cell; and identifying signaling pathways expressed in (i) the disease state cell, (ii) the disease state stem cell, and (iii) optionally the different disease state cell, but not in the non-disease state cell, thereby identifying signaling pathways utilized by the disease state cell. The disease state may be cancer, ulcerative colitis or inflammatory bowel disease. In certain embodiments, the signaling pathway is required for survival of the disease state cell.
The present disclosure also provides a method for diagnosing a subject with a condition, comprising: randomly partitioning cells from a heterogeneous sample; performing a first transcriptome analysis on the separated cells; identifying at least one disease state cell using transcriptome analysis by comparing a first transcriptome analysis from at least one disease state cell with a second transcriptome analysis from a non-disease state cell, thereby diagnosing the presence or absence of a condition associated with a disease state cell in the subject. The disease state may be breast cancer, colon cancer, ulcerative colitis or inflammatory bowel disease. Transcriptome analysis may include analysis of expressed RNA, non-expressed RNA, or both. The transcriptome analysis may be a whole transcriptome analysis.
Another method provided herein is a method for screening for a therapeutic agent comprising: exposing a first subject having disease state cells to one or more test agents; obtaining a heterogeneous tumor biopsy sample from a target region of a subject; performing transcriptome analysis on at least one individual cell from a heterogeneous tumor biopsy sample, wherein the biopsy sample comprises one or more disease state cells; and comparing the transcriptome analysis to transcriptomes from either: (i) a second subject without disease state cells; or (ii) a first subject prior to the exposing step; and identifying an agent that affects the transcriptome of cells from the test region to make it more like the transcriptome of the second subject or the first subject prior to exposure. The condition may be breast cancer, colon cancer, ulcerative colitis or inflammatory bowel disease. The therapeutic agent can be an antibody or antibody fragment, a small molecule, a nucleic acid (e.g., siRNA), RNA, DNA, RNA-DNA chimera, a protein, or a polypeptide.
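The comparison in the screening method can be sketched as follows. The disclosure does not say how similarity between transcriptomes is measured; this sketch assumes Euclidean distance between expression vectors over a shared gene list, and all names and numbers are hypothetical.

```python
from math import dist


def shift_toward_reference(before, after, reference):
    """Positive when the post-exposure profile is closer to the reference
    profile than the pre-exposure profile was (Euclidean distance)."""
    return dist(before, reference) - dist(after, reference)


def rank_agents(before, reference, after_by_agent):
    """Return (agent, score) pairs for agents that moved the profile toward
    the reference, best first."""
    scores = {
        agent: shift_toward_reference(before, profile, reference)
        for agent, profile in after_by_agent.items()
    }
    improving = [(agent, score) for agent, score in scores.items() if score > 0]
    return sorted(improving, key=lambda item: item[1], reverse=True)


# Hypothetical expression vectors over the same three genes.
before_exposure = [8.0, 1.0, 7.0]   # disease state cell before the test agent
reference = [2.0, 6.0, 1.0]         # cell from a subject without disease state cells
after_exposure = {
    "agent_A": [3.0, 5.0, 2.0],     # moves toward the reference
    "agent_B": [8.0, 1.0, 8.0],     # moves away from the reference
}
ranked = rank_agents(before_exposure, reference, after_exposure)
```

A correlation-based measure could be substituted for Euclidean distance if only the pattern of expression, rather than its magnitude, is of interest.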
The present disclosure also provides a method of determining the potential effectiveness of a therapeutic agent to treat a disease, comprising: isolating a first population of disease state cells to individual locations, wherein an individual location comprises an individual cell; determining the expression level of at least one nucleic acid or protein from at least one individual cell, thereby generating a disease state expression signature; exposing a second population of disease state cells to an agent; isolating the second population of disease state cells to individual locations, wherein an individual location comprises an individual cell; determining the expression level of at least one nucleic acid or protein from at least one individual cell of the second population; and comparing the expression levels of the individual cells from the second population to the disease state expression signature, thereby determining the effectiveness of the agent in treating the disease. The exposing step may be performed in vivo. In some examples, the first population and the second population are isolated from one subject, e.g., a human. The disease may be cancer, ulcerative colitis or inflammatory bowel disease. The nucleic acid or protein can be a cancer cell marker, a cancer stem cell marker, or both. The expression level may be the expression level of mRNA. In some embodiments, determining the mRNA expression level comprises detecting the expression or non-expression of 10 or more nucleic acids. The expression level may also be the expression level of the protein. The isolating step may comprise exposing the population of cells to an antibody that specifically binds to a protein present on individual cells.
Further provided herein are methods of determining the likelihood of a subject responding to a therapeutic agent, comprising: isolating a population of cells from the subject to individual locations, wherein an individual location comprises individual cells and wherein at least one individual cell is a disease state cell; determining the expression level of at least one nucleic acid or protein from individual cells of at least one disease state, wherein the nucleic acid or protein is a target of a therapeutic agent; and determining the likelihood of the subject's response based on the expression level of the at least one nucleic acid or protein. The expression level may be the expression level of mRNA. In certain embodiments, determining the expression level of the mRNA comprises detecting the expression or non-expression of 10 or more nucleic acids. The expression level may also be the expression level of a protein. The isolating step may comprise exposing the cell population to an antibody that specifically binds to a protein present on individual cells. The therapeutic agent may be an anti-cancer agent.
Another method detailed herein is a method for prognosis or diagnosis using gene expression of individual cells, comprising the steps of: isolating cells from a heterogeneous sample to respective addressable locations; lysing individual cells and dividing the resulting lysate into at least 2 fractions; amplifying the mRNA or cDNA obtained from the individual cells; determining a gene expression profile of one fraction of the lysate, wherein the gene expression profile provides information on the subpopulation; and performing a transcriptome analysis on at least one cell in the target subpopulation. In some methods, at least 10^2 or at least 10^3 individual cells are analyzed. Cells can be classified based on the expression of at least one cell surface marker. The cells analyzed by the methods disclosed herein can be stem cells, such as hematopoietic stem cells. The initial sample may comprise fewer than 10^6 or fewer than 10^5 cells. Cells may be classified according to the expression of at least one of CD34 and Thy1. In certain embodiments, the expression of at least one or at least five (5) genes associated with hematopoietic stem cells is determined. The transcriptome analysis may be a whole transcriptome analysis.
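The two-fraction workflow can be sketched as a selection step: the first lysate fraction of each well is profiled on a small gene panel, and only wells matching the target pattern have their second fraction sent on to whole transcriptome analysis. The gene names, the threshold, and the function below are hypothetical placeholders, not values taken from the disclosure.

```python
def select_for_whole_transcriptome(panel_results, expressed, not_expressed, threshold):
    """Return wells whose panel fraction shows the wanted expression pattern.

    panel_results: dict mapping a well id to {gene: expression level}, measured
                   on the first lysate fraction of the single cell in that well.
    expressed:     genes that must be at or above `threshold`.
    not_expressed: genes that must be below `threshold`.
    """
    selected = []
    for well, levels in sorted(panel_results.items()):
        has_required = all(levels[gene] >= threshold for gene in expressed)
        lacks_excluded = all(levels[gene] < threshold for gene in not_expressed)
        if has_required and lacks_excluded:
            selected.append(well)
    return selected


# Hypothetical panel measurements (arbitrary units) for four wells.
panel = {
    "A1": {"stem_gene_1": 6.2, "stem_gene_2": 5.8, "mature_gene_1": 0.3},
    "A2": {"stem_gene_1": 0.4, "stem_gene_2": 0.2, "mature_gene_1": 7.1},
    "A3": {"stem_gene_1": 5.9, "stem_gene_2": 6.4, "mature_gene_1": 0.1},
    "A4": {"stem_gene_1": 6.0, "stem_gene_2": 0.5, "mature_gene_1": 0.2},
}
wells_to_profile = select_for_whole_transcriptome(
    panel,
    expressed=["stem_gene_1", "stem_gene_2"],
    not_expressed=["mature_gene_1"],
    threshold=1.0,
)
```

The second fractions of the selected wells could be analyzed individually or pooled, depending on how much material each analysis needs.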
Further provided herein is a method of classifying a stem cell comprising the steps of: (a) obtaining a stem cell transcriptome profile from the sample; and (b) comparing the obtained transcriptome profile with a reference stem cell transcriptome profile. The transcriptome profile may include a data set derived from at least about 5 stem cell-associated proteins. The stem cells analyzed may be cancer stem cells, hematopoietic stem cells, intestinal stem cells, leukemia stem cells, or lung stem cells. The sample analyzed may include cells from a cancer, such as breast or colon cancer. The transcriptome profiling may also comprise the additional steps of: extracting mRNA from the stem cell sample; quantifying the level of one or more mRNA species corresponding to the stem cell-specific sequence; and comparing the level of one or more mRNA species with the level of the mRNA species in the reference sample.
Also provided herein is a method of collecting transcriptome-related data comprising the steps of: collecting transcriptome-related data using any of the methods described herein, and transmitting the data to a computer. The computer may be connected to a sequencer. The transcriptome-related data may be stored after transfer; for example, the data may be saved on a computer-readable medium that can be removed from the computer. Data may be transmitted from the computer to a remote location, such as via the internet.
Drawings
This patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be available from the office upon request and payment of the necessary fee.
FIG. 1, single cell gene expression analysis by real-time PCR of human "colorectal cancer stem cells" (EpCAMhigh) purified from human colorectal cancer tissue (tumor #4m6) xenografted in NOD/SCID mice. In the first experiment (panel A), 16 single cells were analyzed for expression of 5 genes, with 27 replicates per cell-gene combination; in this experiment, each mRNA preparation from an individual single cell was used in 3 consecutive rows of the reaction matrix, and each gene-specific primer set was used in 9 consecutive columns, with the only exception of the first 3 columns, which contained no primers; the gene expression level of each individual cell was visualized as a 3x9 block using a color scale. In a second experiment (panel B), a similar approach was used, in which 16 single cells were analyzed for expression of 16 genes, with 9 replicates per cell-gene combination; in this second case, each mRNA preparation from an individual single cell was used in 3 consecutive rows of the reaction matrix and each gene-specific primer set was used in 3 consecutive columns, so that the gene expression level of each individual cell could be visualized as a 3x3 block using a color scale. In both cases, the experiments showed a high level of repeatability and consistency within each set of replicates.
FIG. 2, single cell gene expression analysis by real-time PCR of human "colorectal cancer stem cells" (EpCAMhigh/CD166+ cells from xenograft #8m3). In this figure, each row identifies a single cell and each column identifies a different gene. The intensity of gene expression is depicted using a color code, where dark red indicates stronger intensity and dark green indicates weaker intensity. The analysis clearly shows that EpCAMhigh/CD166+ tumor cells can be subdivided into different subgroups based on their overall transcriptome composition. Most importantly, subsets of cells that show coordinated and simultaneous high-level expression of genes encoding terminal differentiation markers of colonic epithelium (e.g., cytokeratin 20, CD66a/CEACAM1, carbonic anhydrase II, MUC2, trefoil factor 3) do not express, or express only low levels of, genes encoding candidate intestinal stem cell markers or genes known to be essential for stem cell function (e.g., hTERT, LGR5, survivin), and vice versa.
FIGS. 3A-B, a: MTIC 11' ESA+H2K- purified from lung cells of NOD/SCID mice bearing mammary tumors. The upper panel is gated on H2K- cells (DAPI, lineage); the lower left panel gates ESA+ cells, which were used to further gate CD 2441' cells in the lower right panel. b: Real-time PCR analysis of mRNA levels of HIF1a, Snail2, Zeb2, E-cadherin, Vimentin, VEGFC, CCR7, Lox, and Cox2 in MTIC and non-TIC.
FIG. 4 CT values from real-time PCR analysis comparing initial TIC and MTIC microRNA (mirs) levels.
FIGS. 5A-5D, CD66a as a marker of non-tumorigenic cancer cells in breast cancer.
Fig. 6, copy number variant analysis of thousands of CNVs from 18 cell samples. Several may be associated with genomic instability and result in altered pluripotent stem cell traits.
FIG. 7 Single cell analysis device, principle.
FIG. 8, gene set enrichment analysis of stem cell associated gene expression. Genes expressed by self-renewing normal HSCs and by leukemic stem cells derived from granulocyte/macrophage progenitor cells (GMPs), but not by non-self-renewing normal GMPs, were analyzed in breast cancer stem cells (CSCs) and their non-tumorigenic progeny (NTGs). As expected, these genes were significantly over-represented in the CSC gene expression signature. Heatmaps of the overexpressed genes are shown.
FIG. 9, a simplified diagram of "in silico" sorting of rare subsets of cells. A cell population, such as hematopoietic stem cells, is sorted by FACS into 96-well plates, one cell per well. The cells are lysed and each lysate is divided into 2 fractions. One fraction of the lysate is used to analyze the expression of a set of genes, allowing the cells to be characterized on the basis of transcription rather than expression of surface proteins. Using this information, selected lysates and/or pooled lysates from similar cells are subjected to whole transcriptome analysis.
FIG. 10, a diagrammatic representation of data collected, stored and transmitted by a computer.
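The reaction-matrix layout described for FIG. 1 can be checked with a little arithmetic. With 16 cells at 3 rows each, and either 3 no-primer columns plus 5 genes at 9 columns each (panel A) or 16 genes at 3 columns each (panel B), both panels come to 48 rows by 48 columns. The sketch below maps a well position back to its cell and gene; the 0-based indexing and the function names are assumptions made for illustration.

```python
def well_to_assay(row, col, rows_per_cell, cols_per_gene, blank_cols=0):
    """Map a 0-based (row, col) of the reaction matrix to (cell index, gene index).

    The gene index is None for the leading no-primer control columns.
    """
    cell = row // rows_per_cell
    if col < blank_cols:
        return cell, None
    return cell, (col - blank_cols) // cols_per_gene


def replicates_per_combination(rows_per_cell, cols_per_gene):
    """Number of wells that share the same cell-gene combination."""
    return rows_per_cell * cols_per_gene


# Panel A: 16 cells x 3 rows; 3 no-primer columns, then 5 genes x 9 columns.
panel_a_shape = (16 * 3, 3 + 5 * 9)
panel_a_replicates = replicates_per_combination(3, 9)

# Panel B: 16 cells x 3 rows; 16 genes x 3 columns, no blank columns.
panel_b_shape = (16 * 3, 16 * 3)
panel_b_replicates = replicates_per_combination(3, 3)

# Last well of panel A belongs to the last cell and the last gene.
last_well_a = well_to_assay(47, 47, rows_per_cell=3, cols_per_gene=9, blank_cols=3)
```

The replicate counts produced this way (27 and 9) match the figures stated in the description of panels A and B.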
Detailed Description
The methods of the invention use single cell gene expression profiling of primary cells to characterize cell populations for disease diagnosis, for determining sensitivity to specific therapeutic interventions, for prognosis, and for identification of new drug targets. Heterogeneous cell samples are divided into spatially separated individual cells, optionally sorted according to properties of interest (which may include surface markers), then lysed, the contents amplified, and individually analyzed for expression of genes of interest. The cells analyzed are thereby classified according to the genetic signature of each individual cell. Such classification allows for accurate assessment of the cellular composition of the test sample.
Conventional methods for analyzing single cells for diagnostic purposes include counting the number of cells of a given type using Coulter counters and flow cytometers. However, these measurements are typically based on the use of antibodies against surface markers and do not allow assessment of gene expression at the mRNA level or of protein expression. There are previous examples of single cell PCR analysis, but these were performed on too few cells and/or genes to provide useful diagnostic information or to distinguish subtle or related cell subsets in a tissue. The tissue staining methods used by pathologists suffer from similar drawbacks and depend strongly on the pathologist's qualitative judgment. Moreover, these measurements are limited to detecting a small number of parameters. In contrast, the methods of the invention allow for the detection of at least 10, at least 15, at least 20, at least 50, at least 100, at least 200, at least 300, at least 400, at least 500 or more different parameters, wherein the parameters comprise mRNA expression, gene expression, and protein expression, and may further comprise cell surface markers in combination with mRNA, gene and/or protein expression.
Before the invention is further described, it is to be understood that this invention is not limited to particular embodiments described below, as variations of the particular embodiments may be made, which will still fall within the scope of the appended claims. It is also to be understood that the terminology used is for the purpose of describing particular embodiments only, and is not intended to be limiting. In this specification and the appended claims, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise.
Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range, and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the invention, the illustrative methods, devices, and materials are now described.
All publications mentioned herein are incorporated herein by reference for the purpose of describing and disclosing the subject matter of the inventions described in the publications, which might be used in connection with the invention now described.
Identification and classification of cells into populations and subpopulations
The present disclosure relates to methods of identifying and classifying cell populations and subpopulations, and of using those populations and/or subpopulations to diagnose, prognose, and/or identify therapeutic targets for conditions such as disease. The disease may include any kind of cancer (including but not limited to solid tumors, breast cancer, colon cancer, lung cancer, leukemia), inflammatory bowel disease, ulcerative colitis, autoimmune disease, inflammatory disease, and infectious disease. The present disclosure also provides reagents and kits for practicing the subject methods, such as antibodies and nucleic acid probes for detecting any of the biomarkers described herein, or agents that modulate the biomarkers herein. The methods can also determine an appropriate level of treatment for a particular cancer.
Isolation of Single cells
Single cell gene expression profiling is provided for disease diagnosis or prognosis applications, and as a research tool for identifying new drug targets. Target diseases include, but are not limited to, immune-mediated dysfunction, cancer, and the like. In the methods of the invention, heterogeneous mixtures of cells, such as needle biopsies of tumors, biopsies of inflammatory lesions, synovial fluid, spinal cord aspirates, etc., are divided, randomly or in some order, into spatially separated single cells, e.g., in a multi-well plate, microarray, microfluidic device, or slide. The cells are then lysed and the contents amplified and analyzed individually for expression of the genes of interest. The cells analyzed are thereby classified according to the genetic signature of each individual cell. Such classification allows for accurate assessment of the cellular composition of the sample being tested, which may find use, for example, in determining the identity and number of cancer stem cells in a tumor, or in determining the identity and number of immune-related cells, e.g., the number and specificity of T cells, dendritic cells, B cells, and the like.
In certain embodiments, the sample of cells analyzed is a primary sample, which may be freshly isolated, frozen, or the like. However, the cells to be analyzed may also be cultured cells. Typically the sample is a heterogeneous mixture of cells comprising a number of different cell types, populations or subsets, such as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 or more cell types, populations or subsets. In certain embodiments, the sample is a cancer sample from a solid tumor, leukemia, lymphoma, or the like, which may be a biopsy sample, such as a needle biopsy sample, or a blood sample for disseminated tumors and leukemias, or the like. The sample may be obtained before diagnosis, during treatment, etc.
To dissociate cells from tissue, an appropriate solution is used for dispersion or suspension. Such solutions are typically balanced salt solutions, such as normal saline, PBS, Hank's balanced salt solution, and the like, conveniently supplemented with fetal bovine serum or other naturally occurring factors, together with an acceptable buffer at low concentration, typically 5-25 mM. Suitable buffers include HEPES, phosphate buffer, lactate buffer, and the like. The separated cells can be collected in any suitable medium that maintains the viability of the cells, typically with a cushion of serum at the bottom of the collection tube. Various media are commercially available and may be used depending on the nature of the cells, including DMEM, HBSS, DPBS, RPMI, Iscove's medium, and the like, typically supplemented with fetal bovine serum.
In certain embodiments, the cells in the sample are separated on a microarray. For example, highly integrated hepatocyte microarray systems may utilize microwells that are each just large enough to hold a single cell (see Tokimitsu et al. (2007) Cytometry Part A 71: 1003-1010; and Yamamura et al. (2005) Analytical Chemistry 77: 8050; each of which is specifically incorporated herein by reference). Prior enrichment of the target cells, for example by FACS or other separation methods, is optional, and in some embodiments cells from the sample are separated into discrete locations without any prior isolation or enrichment. For example, cells from a sample (e.g., blood sample, biopsy sample, solid tumor) can be separated individually into different locations. Typically, solid tumor samples are dissociated mechanically, chemically, and/or enzymatically (e.g., by treatment with trypsin or by sonication). Cells from a sample can be placed in any cell sorter (e.g., a microfluidic cell sorter) such that individual cells are separated at addressable locations on, for example, a plate surface. The plate surface may have indentations, barriers or other features that ensure separation of individual cells. The separated cells can then be analyzed according to the methods herein. Preferably, the cells are separated into different locations such that each location contains 1 or 0 cells.
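How dilute the loading must be for each location to hold 1 or 0 cells can be estimated with a simple model. The disclosure gives no loading statistics; the sketch below assumes that random partitioning of a dilute suspension follows a Poisson distribution, which is a common approximation and not something the text states. The function name and the example mean are hypothetical.

```python
from math import exp


def occupancy_probabilities(mean_cells_per_location):
    """Poisson estimate of (P[0 cells], P[exactly 1 cell], P[2 or more cells])
    for a location, given the mean number of cells loaded per location."""
    lam = mean_cells_per_location
    p_empty = exp(-lam)
    p_single = lam * exp(-lam)
    p_multiple = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multiple


# Hypothetical loading: on average one cell for every ten locations.
p_empty, p_single, p_multiple = occupancy_probabilities(0.1)

# Among occupied locations, the fraction holding exactly one cell.
single_fraction_of_occupied = p_single / (p_single + p_multiple)
```

Under this model, lowering the mean occupancy reduces multi-cell locations at the cost of more empty ones; physical features such as single-cell-sized microwells are another way to enforce the 1-or-0 condition.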
Optionally, the cells are separated, for example, after sorting the cells by flow cytometry. For example, FACS sorting or size difference sorting may be used to increase the initial concentration of target cells by at least 1000, 10000, 100000, or more times, depending on the presence of one or more markers on the cell surface. Optionally, such cells may be classified according to the presence and/or absence of cell surface markers, in particular markers of the target population or subpopulation.
When cells are separated into different locations for analysis, the cells may be sorted using a microfluidic sorter, by flow cytometry, microscopy, or the like. Microfabricated fluorescence-activated cell sorter was described in Fu et al (1999) Nature Biotechnology 17: 1109 and Fu et al (2002) anal. chem.74: 2451-2457, each of which is incorporated herein by reference. Samples can be sorted by an integrated microfabricated cell sorter using multilayer soft etching technology (multi-layer soft lithography). The integrated cell sorter may incorporate various microfluidic functions including peristaltic pumps, shock absorbers, on-off valves, and input and output wells to perform cell sorting in a coordinated and automated manner. The effective volume of the control valve on the integrated cell sorter can be as small as 1pL and the volume of the optical interrogation can be as small as 100 fL. Microfluidic FACS offers higher sensitivity, no cross-contamination and lower cost than traditional FACS machines.
Individual cells can be separated into different locations (e.g., 96-well plates or microarray addresses) for further analysis and/or manipulation. For example, cell populations comprising hematopoietic stem cells (HSCs) are sorted by FACS using antibodies that are capable of distinguishing HSCs from mature cells. Cells are sorted into 96-well plates and lysed using appropriate methods, and the lysates are analyzed by qPCR, microarray analysis, and/or sequencing.
The apparatus for single cell separation comprises a microfluidic cell sorter that separates living cells from cell debris and sorts cells from a single cell suspension. The microfluidic device may be used in combination with fluorescent signals derived from 1, 2, 3, 4, 5 or more different surface markers (e.g., labeled antibodies to the markers of the target population or subpopulation), and the sorted cells are placed in separate chambers for subsequent genetic analysis. Other upstream steps, e.g., digestion of tumors or cell cultures to obtain cell suspensions and staining with fluorescent surface markers, can be incorporated into the system. The number of cells to be analyzed depends on the heterogeneity of the sample, as well as the expected frequency of the target cells in the sample. Typically at least about 10^2 cells, at least about 10^3, at least 5 x 10^3, at least about 10^4, at least about 10^5, at least about 10^6, at least about 10^7, at least about 10^8, at least about 10^9, at least about 10^10, at least about 10^11, at least about 10^12, at least about 10^13, at least about 10^14, at least about 10^15, or more cells are analyzed.
In some examples, single cell analysis devices (SCADs) are modular and can perform the following steps in an integrated, fully automated manner. 1) Digestion of the tissue. Tissue is placed at the input portion of the device. Appropriate enzymes are introduced into the device and flowed through it to digest the extracellular matrix and obtain a cell suspension. 2) Separation of viable cells from cell debris, for example by flowing the digested sample suspension through a microfluidic "metamaterial", which allows the fluid flow to be split according to particle size. 3) Staining. Optionally, the filtered single cell suspension is stained with appropriate surface markers within a compartment of the microfluidic device. Staining with up to 5 different markers is useful for obtaining a highly pure population of cancer cells. 4) Sorting. The stained single cell suspension flows into the next compartment of the microfluidic device, where cancer cells are sorted from the remaining cells. Various specific embodiments of the sorter are described in the examples.
Expression Profile
The sorted cells can be individually lysed for analysis of the genetic (RNA, DNA) and/or protein content of the cells. mRNA can be captured on a column of oligo-dT beads, reverse transcribed on the beads, processed off the chip, transferred to macroscopic wells, and the like. Optionally, the DNA or RNA is pre-amplified prior to analysis. The preamplification may cover the whole genome or transcriptome, or a portion thereof (e.g., a target gene/transcript). The polynucleotide sample can be transferred to a chip for analysis (e.g., by qRT-PCR) and determination of the expression profile.
The term "expression profile" is used broadly to include expressed proteins and/or expressed nucleic acids. The nucleic acid sample comprises a plurality or population of different nucleic acids, which may carry information on the expression of genes that determine the phenotype of interest of an individual cell. The nucleic acid sample may comprise RNA or DNA nucleic acids, such as mRNA, cRNA, cDNA, and the like. The expression profile can be generated in any convenient manner that determines differential gene expression between two samples, such as quantitative hybridization of mRNA, labeled mRNA, amplified mRNA, cDNA, etc., quantitative PCR, and the like. A subject or patient sample, such as a cell or a collection thereof, e.g., a tissue, is analyzed. The sample may be collected by any convenient method known in the art. Additionally, tumor samples can be collected and examined to determine the relative effectiveness of a treatment in causing differential death between normal and diseased cells. A gene/protein of interest is a gene/protein found to be predictive, including the genes/proteins provided herein, wherein an expression profile can include expression data for 5, 10, 20, 25, 50, 100 or more (including all) of the listed genes/proteins.
Samples can be prepared in several different ways known in the art, such as by isolating mRNA from single cells, where the isolated mRNA is used as is, amplified, or used to prepare cDNA, cRNA, and the like, as known in the art of differential expression (see, e.g., Marcus et al Anal Chem (2006); 78 (9): 3084-89). Samples can be prepared from any tissue harvested from a subject (e.g., lesion or tumor tissue). Analysis of the sample can be used for any purpose (e.g., diagnosis, prognosis, classification, follow-up, and/or development of therapy). The cells may be cultured prior to analysis.
An expression profile is generated from the initial nucleic acid sample using any conventional protocol. Although many different ways of generating expression profiles are known, such as those used in the field of differential gene expression analysis, one representative and convenient type of protocol for generating expression profiles is quantitative PCR (QPCR or QT-PCR). Any available method for performing QPCR may be utilized, e.g., as described in Valera et al J Neurooncol (2007) 85 (1): 1-10.
After the expression profile is obtained from the sample being analyzed, it can be compared to a reference or control profile to make a diagnosis, prognosis, analysis of drug effectiveness, or other desired analysis. A reference or control profile is provided or may be obtained empirically. The resulting expression profile may be compared to a single reference/control profile to obtain information about the cell/tissue phenotype being analyzed. Alternatively, the resulting expression profile may be compared to two or more different reference/control profiles to obtain further information about the phenotype of the cell/tissue being analyzed. For example, the resulting expression profile can be compared to a positive and a negative reference profile to obtain definitive information about whether the cell/tissue has the phenotype of interest.
Determination or analysis of the difference value, i.e., the difference in expression between the two profiles, can be performed using any conventional method, many of which are known to those skilled in the array art, such as by comparing digital images of expression profiles, by comparing databases of expression data, and the like. Patents describing ways to compare expression profiles include, but are not limited to, U.S. Pat. Nos. 6,308,170 and 6,228,575, the disclosures of which are incorporated herein by reference. Methods of comparing expression profiles are also described herein.
A statistical analysis step can then be performed to obtain the weighted contributions of the set of genes. For example, nearest shrunken centroids analysis, as described by Tibshirani et al (2002) P.N.A.S. 99: 6567-6572, can be applied to calculate the centroid of each class and then calculate the mean squared distance between a given expression profile and each centroid, normalized by the within-class standard deviation.
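The centroid comparison described above can be sketched as follows. This is a minimal illustration only, assuming a small in-memory list of expression profiles; it computes class centroids and a standardized squared distance but omits the shrinkage step of the cited method. The function names, the `s0` stabilizing constant, and the example values and labels are hypothetical and are not taken from the disclosure.

```python
from statistics import mean


def fit_centroids(profiles, labels):
    """Compute one centroid per class and a pooled within-class SD per gene."""
    classes = sorted(set(labels))
    n_genes = len(profiles[0])
    centroids = {}
    squared_dev = [0.0] * n_genes
    for cls in classes:
        members = [p for p, lab in zip(profiles, labels) if lab == cls]
        centroids[cls] = [mean(p[g] for p in members) for g in range(n_genes)]
        for p in members:
            for g in range(n_genes):
                squared_dev[g] += (p[g] - centroids[cls][g]) ** 2
    dof = len(profiles) - len(classes)  # requires more cells than classes
    pooled_sd = [(s / dof) ** 0.5 for s in squared_dev]
    return centroids, pooled_sd


def classify(profile, centroids, pooled_sd, s0=1e-9):
    """Return the class whose centroid is nearest in standardized distance."""
    def distance(cls):
        # Mean squared distance, each gene scaled by its within-class SD.
        # s0 is an assumed small constant that avoids division by zero.
        return mean(((x - m) / (sd + s0)) ** 2
                    for x, m, sd in zip(profile, centroids[cls], pooled_sd))
    return min(centroids, key=distance)


# Hypothetical two-gene profiles for four cells (illustrative values only).
profiles = [[8.0, 2.0], [9.0, 3.0], [2.0, 8.0], [3.0, 9.0]]
labels = ["CSC", "CSC", "non-CSC", "non-CSC"]
centroids, pooled_sd = fit_centroids(profiles, labels)
print(classify([8.5, 2.5], centroids, pooled_sd))
```

A fuller implementation would also shrink each class centroid toward the overall centroid, which is what selects the informative genes in the cited approach.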
The classification may be defined probabilistically, wherein the cutoff value may be empirically derived. In a specific embodiment of the invention, a probability of about 0.4 may be used to distinguish between dormant and induced patients, more typically a probability of about 0.5, and a probability of about 0.6 or higher may also be used. A "high" probability may be at least about 0.75, at least about 0.7, at least about 0.6, or at least about 0.5. A "low" probability may be no more than about 0.25, no more than 0.3, or no more than 0.4. In many embodiments, the information obtained above about the cells/tissues analyzed can be used to predict whether a host, subject or patient should be treated with a therapy of interest and to optimize its dosage.
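As a rough illustration of how such cutoffs might be applied, the sketch below maps a class probability to a "high" or "low" call using the 0.75 and 0.25 values mentioned above as defaults. The function name and the "indeterminate" label for probabilities between the two cutoffs are assumptions made for the example; in practice the cutoffs would be derived empirically.

```python
def call_from_probability(p, high=0.75, low=0.25):
    """Map a class probability to a qualitative call using two cutoffs."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if low >= high:
        raise ValueError("low cutoff must be below high cutoff")
    if p >= high:
        return "high"
    if p <= low:
        return "low"
    # Assumed label for values that fall between the two cutoffs.
    return "indeterminate"


for p in (0.8, 0.5, 0.1):
    print(p, call_from_probability(p))
```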
Identification of cell populations and subpopulations
In some embodiments of the invention, for example with epithelial cancers, including but not limited to breast cancer and colon cancer, CSCs may be identified by characterizing cancer stem cells according to their expression of a cancer stem cell marker (e.g., CD66a). There is a subset of tumorigenic cancer cells that are capable of both self-renewal and differentiation. These tumorigenic cells are responsible for the maintenance of the tumor and also produce large numbers of abnormally differentiated, non-tumorigenic progeny, and thus meet the definition of cancer stem cells. Tumorigenic potential is contained in a subpopulation of cancer cells that differentially express the markers of the invention. As shown herein, there is heterogeneity in the population of cells that positively express cancer stem cell markers, such as where the CD66a-negative (CD66a-) subpopulation is enriched for cancer stem cells (tumorigenic), whereas CD66a+ cells are not tumorigenic. Detection of such heterogeneity in the population enables the determination of the subpopulation.
One skilled in the art will appreciate that a variety of sequences representing genes, transcripts and/or proteins may be analyzed. Such sequences may determine and/or differentiate between cellular phenotypes in a sample.
The marker or set of markers may be selected based on multiple aspects of a target population or subpopulation in a sample, e.g., tissue origin (e.g., neural versus epithelial) or disease state (e.g., cancerous versus non-cancerous). Other sequences that can be used to differentiate between cell populations (e.g., cancer stem cells from normal cells) can be determined using the methods described herein, such as by detecting a change (e.g., up-or down-regulation) in a gene of a target population.
Nucleic acids used to distinguish one population from another may be up- or down-regulated when the populations are compared. For example, the expression of certain nucleic acids is up- or down-regulated in cancer cells as compared to normal cells, in stem cells as compared to differentiated cells, and in cancer stem cells as compared to differentiated cancer cells. In certain instances, up- or down-regulation of a gene can be used to differentiate between subpopulations in a large population. For example, certain nucleic acids are expressed only in normal cells, in both normal cells and cancer stem cells, or only in cancer stem cells.
A nucleic acid is considered up-regulated or down-regulated when compared to another population or subpopulation, or to a specific nucleic acid of known or standard expression level. Alternatively, when the expression of multiple genes is analyzed, a heatmap can be created by subtracting the mean and dividing by the standard deviation for each gene independently, and assigning values based on the degree of deviation from the mean. For example, a value of +/-1 may represent 2.5-3 standard deviations from the mean. Such analysis may be further refined so that genes in the "+/-3" range may be used to cluster different types of populations (e.g., "+3" for cancer and "-3" for normal cells, so that clustering algorithms can distinguish between them). An up-regulated gene may be given a "+" value.
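The per-gene standardization used for such a heatmap can be sketched as follows. This is an illustration under stated assumptions: expression values are held in a plain dictionary keyed by gene, the population standard deviation is used, and clipping to a +/-3 display range is an assumed convention for drawing the heatmap rather than something specified above. Gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore_per_gene(expression):
    """Standardize each gene independently: subtract its mean, divide by its SD."""
    scaled = {}
    for gene, values in expression.items():
        mu = mean(values)
        sd = pstdev(values)
        # A gene with no variation carries no information; report zeros.
        scaled[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in values]
    return scaled


def clip(z, limit=3.0):
    """Clamp a z-score to the assumed heatmap display range [-limit, +limit]."""
    return max(-limit, min(limit, z))


# Hypothetical expression values for two genes across four single cells.
expression = {"GENE_A": [1.0, 3.0, 1.0, 3.0], "GENE_B": [5.0, 5.0, 5.0, 5.0]}
heatmap = {g: [clip(z) for z in zs]
           for g, zs in zscore_per_gene(expression).items()}
print(heatmap)
```

Positive values then mark cells in which a gene is expressed above its mean (up-regulated) and negative values mark cells in which it is below.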
In certain examples, a combination of differentially expressed nucleic acids can be used as a profile for a particular population or subpopulation. A profile can comprise any number of differentially expressed nucleic acids and/or proteins, e.g., at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50 or more nucleic acids and/or proteins. In certain examples, nucleic acids used to identify a target population or subpopulation may be similarly expressed in a target and non-target population or non-target subpopulation. Such similarly expressed nucleic acids are typically used in combination with other differentially expressed nucleic acids to identify target populations or subpopulations.
The methods described herein can be used to analyze heterogeneous cell populations from any source (e.g., biopsy samples, normal tissue, solid tumors, etc.). Such methods can be used to isolate and analyze any cell population, such as a target population in a larger heterogeneous population or subpopulation, the presence of target cells, cancer or other stem cells in a heterogeneous population or subpopulation, or an intact heterogeneous population.
Discovery of biomarkers
The methods disclosed herein allow for the determination of novel markers associated with a population or subpopulation of cells (e.g., normal cells, cancer cells, disease state cells). The marker may include any biomarker including, but not limited to, DNA, RNA, and protein. In certain examples, the marker for the cell population is a gene or mRNA that is not normally expressed in the designated cell (e.g., a progenitor cell, or a cell expressing a differentiation marker, that expresses a stem cell gene; or a cell expressing a differentiation marker that also expresses a proliferation gene). Typically, more than one marker is assessed, e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 or more markers. Where the marker is expressed RNA, any portion of the transcriptome may be determined, up to and including the full transcriptome.
Analysis of the expression pattern of nucleic acids in a given target cell population or subpopulation can lead to the identification of new markers that distinguish the target population or subpopulation from other cell populations or subpopulations. For example, when a unique surface marker protein is expressed in a target population or subpopulation, antibodies that bind the marker can be developed for use in isolating and/or identifying the population or subpopulation of cells in the same or other individuals (e.g., by FACS). The identification of population- or subpopulation-specific markers includes markers that are absent from the cell population or subpopulation, which may be used for negative selection. The presence of a marker in a population or subpopulation can be determined using the methods described herein, and the presence of the marker can be used to define a cell population. Analysis of mRNA in a population or subpopulation of cells can show that certain genes are differentially expressed in normal and cancer cells. Differential expression may include increased or decreased transcript levels, lack of transcription, and/or altered expression regulation (e.g., different expression patterns in response to a stimulus). The mRNA or other marker used as a marker for a population or subpopulation of cells may also include mutations present in the population or subpopulation of cells (e.g., in cancer cells and cancer stem cells, but not in normal cells). One skilled in the art will appreciate that such markers may represent a population of cells from a single individual being tested and/or may represent markers from many individuals. In some instances, the expressed mRNA is translated into a protein that can be detected by any of a wide range of protein detection methods (e.g., immunoassays, western blotting, etc.).
Other detectable markers include microRNAs (miRNAs). In some examples, the expression level of a microRNA can serve as a marker for a population of cells in which the expression of that microRNA is increased or decreased by about 1.5, 1.6, 1.7, 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3.0 or more fold compared to a similar population of cells.
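The fold-change criterion above can be illustrated with a short sketch. It assumes that expression levels are positive, linear-scale quantities (not log-transformed) and uses 1.5-fold as the default threshold; the function names and example values are hypothetical.

```python
def fold_change(target_level, reference_level):
    """Return (fold, direction) for a target versus a reference population."""
    if target_level <= 0 or reference_level <= 0:
        raise ValueError("expression levels must be positive, linear-scale values")
    ratio = target_level / reference_level
    if ratio >= 1:
        return ratio, "increased"
    # Report decreases as a fold value of at least 1 as well.
    return 1 / ratio, "decreased"


def qualifies_as_marker(target_level, reference_level, min_fold=1.5):
    """True when expression differs by at least min_fold in either direction."""
    fold, _ = fold_change(target_level, reference_level)
    return fold >= min_fold


print(fold_change(1.0, 2.0))
print(qualifies_as_marker(1.2, 1.0))
```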
Determination of transcriptomes in cell populations and subpopulations
In order to obtain further information about the cells isolated by any of the methods of the invention (e.g., FACS isolation of cells from a population followed by partial transcriptional analysis), it is advantageous to further analyze the cells. In some instances, individual cells isolated from a sample (e.g., by isolation of individual cells, with or without prior enrichment) are lysed and target nucleic acids (e.g., genomic DNA, mRNA, etc.) are collected. As described herein, transcriptional analysis of a gene or group of genes can be used to classify isolated cells into groups whose expression profiles exhibit similarity (e.g., cancer stem cells versus non-stem cells). Without being bound by theory, such information suggests differences in function, as the genes transcribed by the cell are closely related to their function. Once the cells are organized into groups of similar cells (e.g., those exhibiting similar or identical transcription profiles), individual cell lysates and/or lysates containing collected nucleic acids of similar cells can be further analyzed at the transcriptome level. In certain instances, the lysate (single cell or collection of similar cells) is used in methodologies (e.g., high throughput sequencing) to define the portion of the transcriptome for each cell and/or collection of similar cells. Transcriptome information from individual cells can be analyzed at the population level by comparing and/or combining the results of individual cells with the results of other similar cells. Transcriptome information from a similar collection of cells may also be used to define the transcriptional characteristics of such a collection.
Any cell population, such as a cell population comprising stem cells, can be studied in this way. In some embodiments, the cells comprise stem cells, including embryonic stem cells, adult stem cells (including, but not limited to, cancer stem cells, hematopoietic stem cells (HSCs), and mesenchymal stem cells), and induced pluripotent stem cells. Typically, the cell population is a heterogeneous population (e.g., a clinical specimen). The methods herein (e.g., FACS sorting) can be used to isolate target subpopulations from a larger cell population according to any relevant criteria (e.g., expression of surface proteins). In some embodiments, such sorted cells are compartmentalized such that each sorted population contains 10 cells or fewer, 5 cells or fewer, 4 cells, 3 cells, 2 cells, or 1 cell.
In certain embodiments, the cells are lysed and the lysate is divided into 2 or more portions. One portion of the lysate is further analyzed (e.g., a small set of genes is analyzed to detect expression) to detect and/or differentiate subpopulations within a larger heterogeneous population. Where the further analysis indicates that a cell (e.g., a hematopoietic stem cell) belongs to a subpopulation of interest, the lysate of that individual cell, or pooled lysates from a collection of similar cells, can be analyzed. A "similar cell" population can be determined based on the similarity of expression of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 or more genes.
The target cells and cell populations or subpopulations may be further analyzed. The population or subpopulation of cells may make up a portion of the original sample, e.g., 1%, 2%, 3%, 4%, 5%, 6%, 7%, 8%, 9%, 10% or more of the cells of the original sample. By using the methods described herein, a population or subpopulation of cells of interest can be isolated from a heterogeneous sample such that the isolated population or subpopulation is at least 51%, 52%, 53%, 54%, 55%, 56%, 57%, 58%, 59%, 60%, 61%, 62%, 63%, 64%, 65%, 66%, 67%, 68%, 69%, 70%, 71%, 72%, 73%, 74%, 75%, 76%, 77%, 78%, 79%, 80%, 81%, 82%, 83%, 84%, 85%, 86%, 87%, 88%, 89%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99%, or 100% free of cells that are not members of the target population or subpopulation. Since the lysate is prepared from cells isolated from the original population, the study of similar cell populations can be accomplished by pooling the lysates from similar cells.
Further analysis of the cells, populations, and/or subpopulations may include analysis of the whole transcriptome. In some examples, the lysate includes mRNA that is amplified for analysis (e.g., as cDNA) or analyzed directly (e.g., mRNA sequencing, microarray analysis). Amplification of mRNA can be performed by any method known in the art (e.g., in vitro transcription, ligation-PCR cDNA amplification). In certain embodiments, the amplification of the mRNA can be performed in, or using, a microfluidic device. Analysis of the whole transcriptome can be performed on sequencing platforms such as those commercially available from Illumina (RNA-Seq) and Helicos (Digital Gene Expression or "DGE"). In certain embodiments, the polynucleotide of interest is sequenced. The target nucleic acid can be sequenced by conventional gel electrophoresis based methods using, for example, Sanger-type sequencing. Alternatively, sequencing may be accomplished using a "next generation" method. Such "next generation" sequencing methods include, but are not limited to, those commercially available from: 1) 454/Roche Lifescience, including, but not limited to, those described in Margulies et al Nature (2005) 437: 376-380; and U.S. Pat. Nos. 7,244,559; 7,335,762; 7,211,390; 7,244,567; 7,264,929; and 7,323,305; 2) Helicos BioSciences Corporation (Cambridge, MA), as described in U.S. application Ser. No. 11/167046 and U.S. Pat. Nos. 7,501,245; 7,491,498; and 7,276,720; and U.S. patent application publication Nos. US 20090061439; US 20080087826; US 20060286566; US 20060024711; US 20060024678; US 20080213770; and US 20080103058; 3) Applied Biosystems (e.g., SOLiD sequencing); 4) Dover Systems (e.g., Polonator G.007 sequencing); 5) Illumina, as described in U.S. Pat. Nos. 5,750,341; 6,306,597 and 5,969,119; and 6) Pacific Biosciences, as described in U.S. Pat. Nos. 7,462,452; 7,476,504; 7,405,281; 7,170,050; 7,462,468; 7,476,503; 7,315,019; 7,302,146; and 7,313,308; and U.S. application publication Nos. US 20090029385; US 20090068655; US 20090024331 and US 20080206764. All documents are incorporated herein by reference. Such methods and apparatus are provided herein by way of example, and not by way of limitation.
A whole transcriptome analysis, which can further characterize different cell subpopulations, can be performed for a number of reasons, including but not limited to: 1) detecting genes, and/or transcription factors regulating their expression, that are active in the subpopulation and may reveal its unique biological properties; 2) locating and/or identifying surface markers that can be used to purify the subpopulation (e.g., by FACS sorting); and 3) detecting and/or identifying cellular genes and/or gene products that distinguish a subpopulation from the general population, such as cancer stem cells versus normal tissues, as potential drug targets for disease.
Analysis of the population and/or subpopulation (e.g., by transcriptome analysis) may allow for refinement of techniques for isolating cells belonging to the subpopulation. For example, where the methods reveal a subpopulation-specific surface antigen, such cells can be isolated from a heterogeneous population (e.g., a patient sample) using antibodies developed by any available antibody synthesis method. In addition, transcriptome profiles can be used to develop gene expression sets that can be used to identify cells from other populations (e.g., samples from the same or different patients).
Diagnosis and prognosis
The invention may be used for the prevention, treatment, detection or study of any condition, including cancer, inflammatory diseases, autoimmune diseases and infections. Examples of cancers include prostate, pancreatic, colon, brain, lung, breast, bone and skin cancers. Examples of inflammatory conditions include irritable bowel syndrome and ulcerative colitis. Examples of autoimmune diseases include crohn's disease, lupus, and graves' disease. For example, the present invention is useful for the prevention, treatment, detection or study of the following diseases: gastrointestinal cancers, such as anal, colon, esophageal, gallbladder, stomach, liver and rectal cancers; genitourinary cancers such as cancers of the penis, prostate and testis; gynecological cancers, such as ovarian, cervical, endometrial, uterine, fallopian tube, vaginal and vulvar cancers; head and neck cancers, such as hypopharynx, larynx, oropharynx, lip, mouth, and oral cavity cancers, salivary gland cancers, digestive tract cancers, and sinus cancers; metastatic cancer; a sarcoma; skin cancer; urinary tract cancer, including bladder, kidney, and urinary tract cancer; cancers of the endocrine system, such as thyroid, pituitary, and adrenal glands and pancreatic islets; and pediatric cancers.
The present invention also provides a method of optimizing treatment by first classifying individual cells in a sample and then selecting the appropriate therapy, dose, treatment modality, etc. based on the classification, in a manner that optimizes the differential between delivering an anti-proliferative treatment to the undesirable target cells and limiting unwanted toxicity. The treatment is optimized by selecting a treatment that reduces unwanted toxicity while providing effective anti-proliferative activity. Treatments that affect only a subset of the cells in the sample may be selected. In certain examples, treatments are selected that affect less than about 5%, less than about 1%, less than about 0.5%, less than about 0.2%, less than about 0.1%, less than about 0.05%, less than about 0.02%, less than about 0.01%, or less of the cells in the sample.
A marker of a condition may refer to the expression pattern of one or more genes or proteins in a single cell that indicates the presence of the condition. A cancer stem cell marker refers to the expression pattern of one or more genes and/or proteins whose expression is indicative of a cancer stem cell phenotype. An autoimmune or inflammatory cell marker refers to a gene and/or protein whose expression is indicative of an autoimmune or inflammatory cell phenotype. Markers can be obtained from all or part of the data set. Typically, a marker includes expression information from at least about 5 genes and/or proteins, at least about 10 genes and/or proteins, at least about 15 genes and/or proteins, at least about 20 genes and/or proteins, at least about 25 genes and/or proteins, at least about 50 genes and/or proteins, at least about 75 genes and/or proteins, at least about 100 genes and/or proteins, at least about 150 genes and/or proteins, at least about 200 genes and/or proteins, at least about 300 genes and/or proteins, at least about 400 genes and/or proteins, at least about 500 genes and/or proteins, or more genes and/or proteins. When a subset of the data set is used, the subset can include up-regulated genes, down-regulated genes, or combinations thereof.
Clinical application analysis of patient samples
Although the following description focuses primarily on cancer stem cells, the methods described herein may be used to isolate and/or analyze any cell population, including but not limited to normal cells (e.g., normal stem cells, normal progenitor cells, and normal mature cells), virally infected cells, inflammatory cells, progenitor cells, cancer cells (e.g., oncogenic cells, non-oncogenic cells, cancer stem cells, and differentiated cancer cells), disease state cells (e.g., cancer cells, inflammatory bowel disease cells, ulcerative colitis cells, etc.), microbial (bacterial, fungal, protozoal) cells, and the like, of any tissue. Thus, the details provided for using Cancer Stem Cells (CSCs) are examples of analyses that can be performed on any disease state or condition.
In some embodiments of the invention, the number of CSCs in a patient sample may be determined relative to the total number of cancer cells. For example, cells from a biopsy sample are isolated, expression of one or more mRNAs and/or proteins that are indicative of cancer cells is analyzed, and cells that exhibit a CSC phenotype are quantified. Alternatively, the data collected for a particular CSC population or subpopulation may be used to develop an affinity (e.g., antibody) screen for the population or subpopulation, and such an affinity screen may be used to quantify the number of cells. Typically, a greater percentage of CSCs indicates a greater potential for continued self-renewal of cells with a cancer phenotype. The number of CSCs in a patient sample may be compared to positive and/or negative reference samples, such as patient samples, e.g., blood samples, remission stage patient samples, and the like. In some embodiments, CSCs are quantified during a treatment period, wherein the number of cancer cells and the percentage of such cells that are CSCs are quantified before, during, or after the course of treatment. Desirably, treatment that targets cancer stem cells results in a reduction in the total number and/or percentage of CSCs in a patient sample.
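The quantification described above amounts to simple arithmetic, sketched below. The function names, timepoint labels, and cell counts are hypothetical and are used only to show how the CSC percentage might be tracked over a course of treatment.

```python
def csc_percentage(n_csc, n_cancer_cells):
    """Percentage of cancer cells in a sample that show a CSC phenotype."""
    if n_cancer_cells <= 0:
        raise ValueError("sample must contain at least one cancer cell")
    if not 0 <= n_csc <= n_cancer_cells:
        raise ValueError("CSC count must lie between 0 and the cancer cell count")
    return 100.0 * n_csc / n_cancer_cells


def csc_reduced(before, after):
    """True when the CSC percentage after treatment is lower than before.

    Each argument is a (n_csc, n_cancer_cells) tuple for one timepoint.
    """
    return csc_percentage(*after) < csc_percentage(*before)


# Hypothetical counts from single-cell classification at two timepoints.
before_treatment = (5, 200)
after_treatment = (1, 160)
print(csc_percentage(*before_treatment), csc_percentage(*after_treatment))
print(csc_reduced(before_treatment, after_treatment))
```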
CSCs may be identified by their phenotype associated with a particular marker and/or their functional phenotype. In some embodiments, CSCs are identified and/or isolated by binding the cells to an agent specific for a marker of interest. The cells to be analyzed may be viable cells, or may be fixed or embedded cells.
The presence of CSCs in a patient sample may be indicative of the stage of cancer (e.g. leukemia, breast cancer, prostate cancer). In addition, detection of CSCs can be used to monitor response to treatment and aid prognosis. The presence of CSCs can be determined by quantifying cells with a stem cell phenotype. In addition to the determination of cell surface phenotype, it may be useful to quantify cells in a sample that have "stem cell" characteristics, which can be determined by functional criteria, such as self-renewal capacity, the ability to generate tumors in vivo, such as in a xenograft model, and the like.
Clinical samples for use in the methods of the invention may be obtained from a variety of sources, particularly blood, although in some instances samples such as bone marrow, lymph, cerebrospinal fluid, synovial fluid, and the like may be used. The sample may comprise a biopsy sample, or other clinical sample containing cells. Some samples include solid tumors or portions thereof. In the case of analyzing cell masses, such cell masses can be separated by appropriate methods known in the art (e.g., enzymatic digestion, physical separation). Such samples can be separated prior to analysis by centrifugation, washing, density fractionation, apheresis, affinity selection, panning, FACS, centrifugation with Hypaque, and the like, typically using mononuclear fractions (PBMC). In this manner, individual cells from a sample (e.g., a solid tumor) can be analyzed for differential gene expression and/or transcriptome analysis as described herein.
Once the sample is obtained, it can be used directly, frozen, or maintained in an appropriate medium for a short period of time. A variety of media can be used to maintain the cells. The sample may be obtained by any convenient procedure, such as biopsy, blood draw, venipuncture, and the like. In some embodiments, the sample will comprise at least about 10^2 cells, more typically at least about 10^3, 10^4, 10^5, or more cells. Typically, the sample is from a human patient, although animal models such as equine, bovine, porcine, canine, feline, rodent (e.g., mouse, rat, hamster), primate, and the like may be used.
The cell sample may be dispersed or suspended using an appropriate solution. Such solutions are typically balanced salt solutions, such as normal saline, PBS, Hank's balanced salt solution, and the like, conveniently supplemented with fetal bovine serum or other naturally occurring factors, and low concentrations of acceptable buffers, typically 5-25 mM. Suitable buffers include HEPES, phosphate buffer, lactate buffer, and the like.
Cell staining assays can be performed using conventional methods. Techniques that provide accurate counts include fluorescence activated cell sorters, which can have varying degrees of sophistication, such as multiple color channels, low angle and obtuse light scattering detection channels, impedance channels, and the like. Dead cells can be selected against by using a dye associated with dead cells (e.g., propidium iodide).
The affinity reagent may be a specific receptor or ligand for the cell surface molecule described above. In addition to antibody reagents, peptide-MHC antigen and T cell receptor pairs can be used; peptide ligands and receptors; effector and receptor molecules, and the like. Antibodies and T cell receptors can be monoclonal or polyclonal and can be produced by transgenic animals, immunized animals, immortalized human or animal B cells, cells transformed with DNA vectors encoding antibodies or T cell receptors, and the like. The details of antibody preparation and its suitability for use as a specific binding member are well known to those skilled in the art.
One approach is to use antibodies as affinity reagents. Conveniently, these antibodies may be conjugated to a label for use in separation. Labels include any label known in the art, including but not limited to magnetic beads that allow direct separation, biotin that can be removed with avidin or streptavidin bound to a support, fluorescent dyes that can be used with fluorescence activated cell sorters, and the like, to allow for convenient separation of specific cell types. Useful fluorescent dyes include phycobiliproteins such as phycoerythrin and phycocyanin, fluorescein and Texas Red. Each antibody is often labeled with a different fluorescent dye to allow for independent sorting of each marker.
The antibody can be added to the cell suspension and incubated for a period of time sufficient to bind the available cell surface antigen. Incubation is typically at least about 5 minutes, and typically less than about 30 minutes. It is desirable to have sufficient antibody concentration in the reaction mixture so that the efficiency of the separation is not limited by a lack of antibody. The appropriate concentration is determined by titration. The medium in which the cells are separated is any medium that can maintain the viability of the cells. One medium that can be utilized is phosphate buffered saline containing 0.1% to 0.5% BSA. Various media are commercially available and used depending on the nature of the cells, including Dulbecco's Modified Eagle Medium (DMEM), Hank's Balanced Salt Solution (HBSS), Dulbecco's phosphate buffered saline (DPBS), RPMI, Iscove's medium, PBS containing 5 mM EDTA, etc., typically supplemented with fetal bovine serum, BSA, HSA, etc. The labeled cells can then be quantified based on the expression of cell surface markers as described above.
Comparison of differential progenitor cell analysis (differential progenitor analysis) obtained from patient samples with reference differential progenitor cell analysis can be accomplished by using appropriate deductive protocols, AI systems, statistical comparisons, and the like. Comparison of reference differential progenitor cell analysis from normal cells, cells from similar diseased tissue, etc., can provide an indication of the stage of disease. A database of differential progenitor cell analyses of the reference can be compiled. A particular interesting analysis is to follow patients, e.g. in the chronic and pre-leukemic stages of the disease, in order to observe an acceleration of the disease in the early stages. The methods of the invention allow for early therapeutic intervention, such as initiating chemotherapy, increasing chemotherapeutic doses, changing the choice of chemotherapeutic drugs, etc., by detection of pre-clinical acceleration.
Tumor classification and patient stratification
Also provided are methods of optimizing treatment by first classifying the lesion and then, based on this information, selecting the appropriate therapy, dose, treatment modality, etc. that optimizes the differential between delivering an anti-proliferative therapy to the undesirable target cells and minimizing undesirable toxicity. The treatment is optimized by selecting a treatment that minimizes undesirable toxicity while providing effective antiproliferative activity.
In one aspect, the present disclosure provides methods of classifying lesions, such as tumor lesions, immune disorder samples, etc., thus grouping or "stratifying" patients according to single cell (including CSC) gene expression markers. For example, tumors classified as having a high percentage of cancer stem cells have a higher risk of metastasis and death and can therefore be treated more aggressively than more benign types of tumors. Thus, analysis of the population or subpopulation present in a patient sample can be used to identify disease states, monitor treatment patterns, and/or develop treatment methods.
Samples collected from each patient in a pool of potential clinical trial participants may be classified as described above. Patients with similarly classified lesions can then be selected for participation in an investigative or clinical trial in which a homogeneous patient population is desired. The classification of patients can also be used to assess the effectiveness of a treatment in heterogeneous patient populations. Thus, comparing the disease classification of an individual's expression profile with the profile of a population may allow for the selection or design of drugs or other treatment modalities that are expected to be safe and effective for a particular patient or patient population (i.e., a group of patients with the same type of cancer). Classification can be based on expression (or lack thereof) of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 35, 40, 45, 50 or more nucleic acids and/or proteins.
Diagnosis, prognosis, treatment assessment (Therametrics) and treatment of disorders
The classification methods described herein, as well as their gene products and corresponding genes and gene products, are of particular interest as genetic or biochemical markers (e.g., in blood or tissue) that can detect the earliest changes in disease pathways (e.g., oncogenic pathways, inflammatory pathways, etc.) and/or monitor the efficacy of various therapeutic and prophylactic interventions.
Staging is a process used by physicians to describe how advanced the cancerous state is in a patient. Staging helps physicians determine prognosis, plan treatment, and evaluate the outcome of such treatment. Staging systems vary with the type of cancer, but they generally involve the following "TNM" system: the type of tumor, indicated by T; whether the cancer has metastasized to nearby lymph nodes, indicated by N; and whether the cancer has metastasized to more distant parts of the body, indicated by M. In general, if cancer is detected only in the area of the primary lesion and has not spread to any lymph nodes, it is called stage I. If it has spread only to the nearest lymph nodes, it is called stage II. In stage III, the cancer has usually spread to lymph nodes in close proximity to the site of the primary lesion. Cancers that have spread to distant parts of the body, such as the liver, bones, brain, or other sites, are referred to as stage IV, the most advanced stage.
The methods described herein can facilitate fine-tuning of staging by identifying the aggressiveness of the cancer, such as its metastatic potential, as well as its presence in different regions of the body. Thus, a stage II cancer whose classification indicates high metastatic potential can be used to change a borderline stage II tumor to a stage III tumor, justifying more aggressive treatment. Conversely, the presence of polynucleotides that indicate low metastatic potential allows for more conservative staging of the tumor.
For example, breast cancer biopsy samples from stage II patients are analyzed by the methods described herein. The breast cancer may be classified as having high metastatic potential based on an expression profile determined from the patient's cells. Thus, the treating physician can utilize such information to treat the patient more aggressively than he or she would have without the further classification. Determination of the expression of a particular marker may also provide information on a potential target of drug therapy (e.g., tumorigenic cells from the patient that express the drug target).
Development and identification of therapeutic methods
The methods and compositions described herein can be used for the development of new therapeutic agents or to identify and/or refine existing therapies. For example, by using single cell analysis, the expression profile of a target cell population (e.g., cancer stem cells and differentiated cancer cells, or differentiated cancer cells) can be analyzed to detect potential targets for therapeutic agents. Potential targets include, but are not limited to, specific biomarkers and misregulated pathways. The target of interest may comprise a marker or pathway specific to the cell population of interest.
In one example, the nucleic acid expression of cells of a target population or subpopulation can be analyzed as described herein to detect novel biomarkers that can be therapeutic targets. For example, specific cell surface molecules that are widely expressed in cancer stem cells and/or differentiated cancer cells can be studied as targets for potential therapeutic agents (such as antibodies or other binding moieties, potentially linked to toxins or other such effectors, having specificity for the surface molecules). In other examples, the target cell population may be analyzed for misregulated pathways involved in the disease process (e.g., loss of regulation of cell cycle mechanisms in cancer cells). Pathways include, but are not limited to, activators and/or repressors of gene expression, expression of particular genes and/or groups of genes, and more complex general pathways. Therapeutic agents that target such misregulation can potentially affect target cells to alter expression of nucleic acids associated with the target cells. Altered expression induced by the therapeutic agent may result in up-regulation or down-regulation of the nucleic acid. In certain examples, treatment of the cells and/or subject with one or more therapeutic agents can result in nucleic acid expression similar to that of cells in a comparable non-disease state (e.g., treatment results in expression of cell cycle-associated genes similar to that in comparable non-cancerous cells).
By using the methods and compositions described herein, a target cell population can be analyzed for altered expression of one or more nucleic acids. The development of new and/or refined therapeutics may include analysis of target cell populations (e.g., colon cancer stem cells, breast cancer cells, etc.) to determine nucleic acids that exhibit altered expression profiles compared to "normal" cells. Such cells can be used to screen for potential therapeutic agents that affect expression of these and/or other nucleic acids by exposing an isolated target cell population to a candidate agent and detecting altered expression of the gene following exposure.
The methods described herein can be used to analyze the effects of compounds on certain cell phenotypes, including but not limited to gene expression, pathway function (e.g., cell cycle, TERT pathway, oxidative stress pathway), and/or cell type or morphology. Thus, in addition to or in lieu of analyzing the potential of a compound as a therapeutic agent, compounds affecting such phenotypic characteristics may also be analyzed. For example, an assay for changes in gene expression in a target population (e.g., normal colon cells, normal breast cells, cancer cells, stem cells, cancer stem cells, etc.) exposed to one or more test compounds can be performed to analyze the effect of the test compound on gene expression or other desired phenotype (e.g., marker expression, cell viability). Such assays are useful for a variety of purposes, such as cell cycle studies or assays of known or unknown pathways.
The agent to be analyzed for its potential therapeutic value may be any compound, small molecule, protein, lipid, carbohydrate, nucleic acid or other agent suitable for therapeutic use. The isolated target population of cells can be exposed to a library of potential therapeutic agents (e.g., antibody library, small molecule library) to determine their effect on gene expression and/or cell viability. In certain instances, the candidate therapeutic agent will specifically target the target cell population. For example, single cell analysis may reveal a mutation present in a target cell (e.g., a cancer stem cell and/or a differentiated cancer cell), and that mutation can be targeted by a candidate therapeutic agent. In certain examples, the treated cells can be subjected to single cell analysis to determine the effect of a candidate therapeutic on the expression of one or more genes of interest and/or on the transcriptome.
In other embodiments of the invention, the agent is targeted to a population or subpopulation of disease state cells by specifically binding to a marker or combination of markers present on the target population or subpopulation. In certain embodiments, the agent comprises an antibody or antigen-binding derivative thereof specific for the marker or combination of markers, optionally linked to a cytotoxic moiety. Such methods can be used to deplete a target population or subpopulation (e.g., deplete a population of cancer stem cells) from a patient.
Therapeutic agent screening assays
Cells expressing the marker or combination of markers (e.g., disease state cells) can be used for in vitro analysis and screening to detect factors and chemotherapeutic agents active on differentiated cancer cells and/or cancer stem cells. Of particular interest are screening assays for agents active on human cells. A number of assays are available for this purpose, including immunoassays for protein binding; determination of cell growth, differentiation and functional activity; production of factors; and so on (see, e.g., Balis (2002) J Natl Cancer Inst 94:2;78). In other embodiments, isolated polynucleotides corresponding to the markers and marker combinations of the invention are used in drug screening assays.
In screening assays for bioactive agents, antiproliferative drugs, and the like, a marker or target cell composition is contacted with an agent of interest, and the effect of the agent is evaluated by monitoring output parameters on the cell, such as expression of the marker, viability of the cell, and the like; or by the efficacy of binding to, or the effect on, the enzymatic or receptor activity of the polypeptide. For example, a composition of breast cancer cells known to have an expression profile of "cancer stem cells" is exposed to a test agent and the exposed cells are analyzed individually as described herein to determine whether the test agent changes the expression profile as compared to untreated cells. Any isolated population of cells described herein or produced by the methods described herein can be freshly isolated, cultured, genetically altered, and the like. The cells may be environmentally induced variants of clonal cultures: e.g., separated into separate cultures and grown under different conditions, e.g., with or without drugs, or in the presence or absence of a cytokine or a combination thereof. The way in which cells respond to agents (e.g., peptides, siRNA, small molecules, etc.), particularly pharmaceutical agents (including the timing of the response), is an important reflection of the physiological state of the cell.
A parameter is a cellular component that can be quantified, particularly a component that can be accurately measured, for example, in a high-throughput system. A parameter may be any cellular component or cellular product, including a cell surface determinant, receptor, protein or conformational or post-translational modification thereof, lipid, carbohydrate, organic or inorganic molecule, nucleic acid such as mRNA or DNA, or a portion derived from such a cellular component, or combinations thereof. For example, in a specific embodiment, an isolated cell as described herein is contacted with one or more agents and the expression level of a target nucleic acid is determined. Agents that alter the expression of the detected nucleic acid can be further analyzed for therapeutic potential, e.g., where the cells come to exhibit an expression pattern more similar to that of cells in a non-disease state. While most parameters (such as mRNA or protein expression) provide a quantitative readout, in some instances, semi-quantitative or qualitative results may be acceptable. The readout may comprise a single determined value, or may comprise an average, median or deviation value, and so forth. Typically, a range of readout values is obtained for each parameter from multiple runs of the same assay. Variability is expected, and a range of values for each of the set of test parameters is obtained by using standard statistical methods, with a common statistical method used to provide single values.
Target agents for screening include compounds, known or unknown, including various chemical classes, primarily organic molecules, which may include organometallic molecules, genetic sequences, and the like. An important aspect of the present invention is the evaluation of drug candidates, including toxicity tests and the like.
In addition to complex biological agents, candidate agents include organic molecules containing functional groups necessary for structural interactions, particularly hydrogen bonding, and typically include at least one amine, carbonyl, hydroxyl, or carboxyl group, often at least two functional chemical groups. Candidate agents may include carbocyclic or heterocyclic structures and/or aromatic or polyaromatic structures substituted with one or more of the functional groups described above. Candidate agents may also be found among biomolecules, including peptides, polynucleotides, carbohydrates, fatty acids, steroids, purines, pyrimidines, or derivatives, structural analogs, or combinations thereof. In some instances, the test compound may have a known function (e.g., reducing oxidative stress), but it may act through an unknown mechanism or act on an unknown target.
Included are pharmacologically active drugs, genetically active molecules, and the like. Compounds of interest include chemotherapeutic agents, hormones or hormone antagonists, and the like. Examples of pharmaceutical agents suitable for the present invention are those described in "The Pharmacological Basis of Therapeutics," Goodman and Gilman, McGraw-Hill, New York, New York, (1996), ninth edition, under the sections: Water, Salts and Ions; Drugs Affecting Renal Function and Electrolyte Metabolism; Drugs Affecting Gastrointestinal Function; Chemotherapy of Microbial Diseases; Chemotherapy of Neoplastic Diseases; Drugs Acting on Blood-Forming Organs; Hormones and Hormone Antagonists; Vitamins; Dermatology; and Toxicology, all of which are incorporated herein by reference. Toxins and biological and chemical warfare agents are also included; see, for example, Somani, S.M. (Ed.), "Chemical Warfare Agents," Academic Press, New York, 1992.
Test compounds include all of the above classes of molecules, and may further include samples of unknown content. Of interest are complex mixtures of naturally occurring compounds obtained from natural sources such as plants, fungi, bacteria, protists or animals. Although many samples include compounds in solution, solid samples that are soluble in an appropriate solvent can also be analyzed. Samples of interest include environmental samples such as groundwater, seawater, mineral water, etc.; biological samples such as lysates prepared from grains, tissue samples, etc.; manufacturing samples, such as a time course during drug preparation; libraries of compounds prepared for analysis; and the like (e.g., compounds analyzed for potential therapeutic value, i.e., drug candidates).
The sample or compound may also include additional components, for example, components that affect ionic strength, pH, total protein concentration, and the like. Additionally, the sample may be processed to achieve at least partial fractionation or concentration. If it is desired to reduce degradation of the compound, the biological sample may be stored under conditions such as nitrogen, freezing, or a combination thereof. The sample volume used is sufficient to allow measurable detection, e.g., from about 0.1ml to 1ml of biological sample is sufficient.
Compounds comprising candidate agents may be obtained from a wide variety of sources, including libraries of synthetic or natural compounds. For example, there are many ways to synthesize randomly and directionally a large number of organic compounds, including expression of biomolecules, including randomized oligonucleotides and oligopeptides. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant and animal extracts are available or readily prepared. In addition, natural or synthetically produced libraries and compounds can be readily modified by conventional chemical, physical and biochemical means and can be used to generate combinatorial libraries. Known pharmaceutical formulations may be subjected to directed or random chemical modifications, such as acylation, alkylation, esterification, amidation, and the like, to produce structural analogs.
Agents are screened for biological activity by adding the agent to at least one and typically a plurality of cell samples, typically alongside cells that receive no agent. Changes in parameters in response to the agent are measured, and the results are evaluated by comparison to reference cultures, e.g., cultures in the presence and absence of the agent, cultures obtained with other agents, etc.
The agent may conveniently be added, in solution or in a readily soluble form, to the medium of cells in culture. The agent may be added in a flow-through system as a stream, intermittent or continuous, or alternatively, a bolus of the compound may be added, singly or incrementally, to an otherwise static solution. In a flow-through system, two fluids are used, one of which is a physiologically neutral solution and the other of which is the same solution to which the test compound is added. The first fluid is passed over the cells, followed by the second fluid. In the single solution approach, a bolus of test compound is added to the volume of medium surrounding the cells. The total concentration of culture medium components should not change significantly when the bolus is added, or between the two solutions in a flow-through process.
Some agent formulations do not include additional ingredients, such as preservatives, that can significantly affect the overall formulation. Such formulations therefore consist essentially of the biologically active compound and a physiologically acceptable carrier such as water, alcohol, DMSO and the like. However, if the compound is a liquid without a solvent, the formulation may consist essentially of the compound itself.
Multiple assays can be performed in parallel at different agent concentrations to obtain differential responses to the various concentrations. As is known in the art, determining an effective concentration of an agent typically uses a range of concentrations resulting from 1:10 or other log-scale dilutions. The concentrations may be further refined with a second series of dilutions if necessary. Typically, one of these concentrations serves as a negative control, i.e., a concentration of zero, below the detection level of the agent, or at or below the concentration of the agent that does not cause a detectable change in the phenotype.
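The dilution scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `dilution_series` and `refine`, the starting concentration, and the number of steps are hypothetical and do not come from the specification.

```python
def dilution_series(top_conc, factor=10.0, steps=4, include_zero=True):
    """Return `steps` concentrations, each `factor`-fold below the previous one.

    The names and defaults are illustrative assumptions only.
    """
    series = [top_conc / factor ** i for i in range(steps)]
    if include_zero:
        series.append(0.0)  # negative control: no agent added
    return series


def refine(low, high, points=5):
    """Second, finer series: evenly spaced on a log scale between two doses."""
    ratio = (high / low) ** (1.0 / (points - 1))
    return [low * ratio ** i for i in range(points)]


# First pass: 1:10 steps starting from an arbitrary top concentration.
doses = dilution_series(100.0)
# Second pass: refine between two adjacent doses of the first series.
finer = refine(1.0, 10.0, points=5)
```

The zero-concentration entry plays the role of the negative control; a concentration below the detection level could be substituted for it.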
A variety of methods can be used to quantify the presence of a selected marker. To determine the amount of a molecule present, the conventional method is to label it with a detectable moiety, which may be fluorescent, luminescent, radioactive, enzymatically active, etc., particularly a molecule that binds specifically and with high affinity to the parameter. Fluorescent moieties are readily available for labeling virtually any biomolecule, structure, or cell type.
Immunofluorescent moieties can be directed to bind not only to specific proteins, but also to specific conformations, cleavage products or site modifications such as phosphorylation. Individual peptides and proteins can be engineered to be autofluorescent, e.g., by expressing them as green fluorescent protein chimeras within cells (for review see Jones et al. (1999) Trends Biotechnol 17(12):477-81). Thus, antibodies may be genetically modified to provide a fluorescent dye as part of their structure. Depending on the label chosen, the parameters can be measured with labels other than fluorescent labels, using immunoassay techniques such as radioimmunoassay (RIA) or enzyme-linked immunosorbent assay (ELISA), homogeneous enzyme immunoassays, and related non-enzymatic techniques. The quantification of nucleic acids, in particular messenger RNA, is also of interest as a parameter. These can be measured by hybridization techniques that depend on the nucleotide sequence of the nucleic acids. Techniques include polymerase chain reaction and gene array techniques. See, e.g., Current Protocols in Molecular Biology, eds. Ausubel et al., John Wiley & Sons, New York, NY, 2000; Freeman et al. (1999) Biotechniques 26(1):112-225; Kawamoto et al. (1999) Genome Res 9(12):1305-12; and Chen et al. (1998) Genomics 51(3):313-24.
Database and data analysis of expression profiles
The invention also provides databases of gene expression profiles of cancer stem cells and other cell types and uses thereof. Typically, such databases include expression profiles from various cell subsets such as cancer stem cells, cancer non-stem cells, normal counterparts of cancer cells, disease state cells (e.g., inflammatory bowel cells, ulcerative colitis cells), virally infected cells, early progenitor cells, initially differentiated progenitor cells, later differentiated progenitor cells, and mature cells. The expression profiles and their databases can be provided in a variety of media to aid in their use. "Medium" refers to an article of manufacture containing the expression profile information of the present invention. The database of the present invention may be recorded on a computer-readable medium, such as any medium that can be read and accessed directly by a computer. Such media include, but are not limited to: magnetic storage media such as floppy disks, hard disk storage media, and magnetic tape; optical storage media such as CD-ROM; electrical storage media such as RAM and ROM; and hybrids of these classes, such as magnetic/optical storage media. Those skilled in the art will readily understand how to use any existing computer-readable medium to create an article of manufacture containing a record of the database information of the present invention. "Recording" refers to a process of storing information on a computer-readable medium using any such method known in the art. Any convenient data storage structure may be selected based on the means used to access the stored information. A variety of data processing programs and formats may be used for storage, such as word processing text files, database formats, and the like.
As used herein, "computer-based system" refers to hardware devices, software devices, and data storage devices used to analyze the information of the present invention. The minimal hardware of the computer-based system of the present invention includes a Central Processing Unit (CPU), input devices, output devices, and data storage devices. Those skilled in the art will readily appreciate that any presently available computer-based system is suitable for use with the present invention. The data storage device may comprise any product containing a recording of the information of the invention as described above, or a memory access device accessible to such a product.
A variety of structural formats of input and output devices may be used to input and output information in the computer-based system of the present invention. Such presentation provides the skilled artisan with an ordering of the similarities contained in the test expression profiles and with an identification of the degree of similarity.
The data set can be analyzed using a variety of methods. In one embodiment, the expression data is transformed and normalized. For example, (1) ratios are generated by mean-centering the expression data for each gene (dividing the intensity measurement for each gene on a given array by the average intensity of that gene across all arrays), (2) the ratios are log-transformed (base 2), and (3) the expression data are then median-centered across the arrays and then across the genes.
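The three normalization steps above can be illustrated with a minimal Python sketch. It assumes strictly positive intensities stored as a list of rows (one row per gene, one column per array); the function name `normalize` and the toy values are hypothetical and are not taken from the specification.

```python
import math
from statistics import mean, median


def normalize(intensity):
    """Mean-ratio, log2, then median-center arrays and genes.

    intensity[g][a] is the (positive) intensity of gene g on array a.
    """
    n_genes, n_arrays = len(intensity), len(intensity[0])

    # Steps (1) and (2): ratio to the per-gene average, then log base 2.
    logr = []
    for row in intensity:
        avg = mean(row)
        logr.append([math.log2(x / avg) for x in row])

    # Step (3a): median-center each array (column).
    for a in range(n_arrays):
        med = median(logr[g][a] for g in range(n_genes))
        for g in range(n_genes):
            logr[g][a] -= med

    # Step (3b): median-center each gene (row).
    for row in logr:
        med = median(row)
        for a in range(n_arrays):
            row[a] -= med
    return logr


# Toy data: 3 genes measured on 3 arrays (made-up numbers).
raw = [
    [100.0, 200.0, 400.0],
    [50.0, 50.0, 50.0],
    [10.0, 40.0, 160.0],
]
normalized = normalize(raw)
```

Because the gene-wise centering is applied last, every gene has a median of zero in the result, while the array medians may shift slightly away from zero.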
For cDNA microarray data, genes with fluorescent hybridization signals at least 1.5 times greater than the local background fluorescent signal in the reference channel were considered to be sufficiently detectable. Genes were centered by mean value within each data set, and average linkage clustering was performed.
Data analysis can also be performed using a scaled approach. For example, the Pearson correlation of gene expression values may provide a quantitative score reflecting the signature for each CSC. The higher the correlation value, the more the sample resembles the reference CSC phenotype. Similar correlations can be computed for any cell type, including normal cells, progenitor cells, autoimmune phenotype cells, inflammatory phenotype cells, infected cells, differentiated cancer cells, normal stem cells, normal mature cells, and the like. Negative correlation values indicate the opposite behavior. The threshold for classification can be moved up or down from 0 depending on the clinical purpose. For example, to predict metastasis as the first recurrent event, sensitivity and specificity may be calculated for every threshold between -1 and +1 for the correlation score, in increments of 0.05, and a threshold may be selected that gives the desired sensitivity of metastasis prediction, such as 80%, 90%, 95%, etc.
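The scoring and threshold sweep can be sketched as follows. This is a hedged illustration, not the method of the specification: the helper names `pearson`, `sweep_thresholds` and `pick_threshold`, the choice to call a sample positive when its score is at or above the threshold, and the toy scores and outcomes are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sweep_thresholds(scores, outcomes, step=0.05):
    """Sensitivity and specificity for each threshold from -1 to +1.

    scores[i] is the correlation of sample i with the reference profile;
    outcomes[i] is True if the event (e.g., metastasis) occurred.
    A sample is called positive when its score is >= the threshold.
    """
    rows = []
    n_steps = int(round(2.0 / step))
    for i in range(n_steps + 1):
        t = round(-1.0 + i * step, 10)  # avoid floating-point drift
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= t and o)
        fn = sum(1 for s, o in zip(scores, outcomes) if s < t and o)
        tn = sum(1 for s, o in zip(scores, outcomes) if s < t and not o)
        fp = sum(1 for s, o in zip(scores, outcomes) if s >= t and not o)
        sens = tp / (tp + fn) if tp + fn else float("nan")
        spec = tn / (tn + fp) if tn + fp else float("nan")
        rows.append((t, sens, spec))
    return rows


def pick_threshold(rows, min_sensitivity):
    """Highest threshold that still reaches the desired sensitivity."""
    ok = [r for r in rows if r[1] >= min_sensitivity]
    return max(ok, key=lambda r: r[0]) if ok else None


# Toy example: made-up correlation scores and outcomes for four samples.
scores = [0.82, 0.63, 0.11, -0.34]
outcomes = [True, True, False, False]
rows = sweep_thresholds(scores, outcomes)
best = pick_threshold(rows, min_sensitivity=1.0)
```

Picking the highest threshold that still meets the sensitivity target is one possible rule; it tends to preserve specificity, but other selection rules are equally compatible with the text.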
To provide a significance ordering, a False Discovery Rate (FDR) may be determined. First, a set of null distributions of dissimilarity values is generated. In one embodiment, the values of the observed profiles are permuted to produce a series of distributions of correlation coefficients obtained by chance, thereby generating an appropriate set of null distributions of correlation coefficients (see Tusher et al. (2001) PNAS 98, 5118-21, incorporated herein by reference). The set of null distributions is obtained as follows: permuting the values of each profile for all available profiles; calculating the pairwise correlation coefficients for all the profiles; calculating the probability density function of the correlation coefficients for this permutation; and repeating the procedure N times, where N is a large number, typically 300. By using the N distributions, one can calculate an appropriate measure (mean, median, etc.) of the count of correlation coefficients whose values exceed the (similarity) value obtained from the distribution of experimentally observed similarity values at a given level of significance.
The FDR is the ratio of the number of expected falsely significant correlations (estimated from the correlations greater than the selected Pearson correlation in the randomized data sets) to the number of correlations greater than the selected Pearson correlation in the empirical data (significant correlations). This cut-off correlation value can be applied to the correlations between the experimental profiles.
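A minimal Python sketch of this permutation-based estimate is given below. It is an illustration under assumptions: the function names, the use of the mean as the summary measure across permutations, the fixed random seed, and the toy profiles are hypothetical. Profiles are assumed to be non-constant so that the correlation is defined.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise(profiles):
    """Correlation coefficients for every pair of profiles."""
    return [pearson(a, b) for a, b in combinations(profiles, 2)]


def null_correlations(profiles, n_perm=300, seed=0):
    """Permute the values within each profile and recompute all pairs.

    Returns one list of pairwise correlations per permutation round.
    """
    rng = random.Random(seed)
    runs = []
    for _ in range(n_perm):
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        runs.append(pairwise(shuffled))
    return runs


def fdr(observed, null_runs, cutoff):
    """Expected false positives divided by observed positives at a cutoff."""
    n_obs = sum(1 for r in observed if r > cutoff)
    if n_obs == 0:
        return None  # nothing is called significant at this cutoff
    per_run = [sum(1 for r in run if r > cutoff) for run in null_runs]
    return (sum(per_run) / len(per_run)) / n_obs


# Toy profiles (made-up values): the first two are strongly correlated.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [1.1, 2.3, 2.9, 4.2, 5.1, 6.4],
    [3.0, 1.0, 4.0, 1.5, 5.0, 2.0],
]
observed = pairwise(profiles)
null_runs = null_correlations(profiles, n_perm=50)
estimate = fdr(observed, null_runs, cutoff=0.9)
```

With so few profiles the estimate is coarse; the text suggests a large number of permutations, typically 300, for real data.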
Using the aforementioned distributions, a confidence level is chosen for significance. This is used to determine the lowest value of the correlation coefficient that exceeds the result that would be obtained by chance. Using this method, one can obtain a threshold for positive correlations, negative correlations, or both. Using this threshold, the user can filter the observed values of the pairwise correlation coefficients and remove those that do not exceed the threshold. In addition, an estimate of the false positive rate for a given threshold may be obtained. For each individual "random correlation" distribution, one can find how many observations fall outside the threshold range. This procedure provides a series of counts. The mean and standard deviation of the series provide the average number of potential false positives and its standard deviation.
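The permutation procedure and FDR estimate described above can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the default of 300 permutations taken from the text, the use of the mean as the summary measure, and a one-sided (positive correlation) cut-off are choices made for the example, and the code counts correlations above the cut-off directly rather than building an explicit probability density function.

```python
import random
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient (undefined for a flat profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_correlations(profiles):
    """Correlation coefficient for every pair of profiles."""
    return [pearson(a, b) for a, b in combinations(profiles, 2)]


def permutation_fdr(profiles, cutoff, n_perm=300, seed=0):
    """Estimate the FDR for correlations above `cutoff`.

    Returns (fdr, mean_false_positives, sd_false_positives, n_observed).
    """
    rng = random.Random(seed)
    observed = sum(r > cutoff for r in pairwise_correlations(profiles))
    null_counts = []
    for _ in range(n_perm):
        # Permute the values within each profile independently.
        shuffled = [rng.sample(p, len(p)) for p in profiles]
        null_counts.append(
            sum(r > cutoff for r in pairwise_correlations(shuffled)))
    expected_false = mean(null_counts)
    fdr = expected_false / observed if observed else float("nan")
    return fdr, expected_false, stdev(null_counts), observed
```

The mean and standard deviation of `null_counts` correspond to the average number of potential false positives and its standard deviation; dividing the mean by the observed count gives the FDR at the chosen cut-off.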
The data may be subjected to unsupervised hierarchical clustering to reveal relationships between profiles. For example, hierarchical clustering may be performed with Pearson correlation as the clustering metric. Clustering of the correlation matrix, such as by multidimensional scaling, enhances the visualization of functional homology similarities and dissimilarities. Multidimensional scaling (MDS) can be applied in one, two or three dimensions.
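As an illustration of correlation-based hierarchical clustering, the sketch below uses 1 minus the Pearson correlation as the distance between profiles and average linkage to merge clusters. Average linkage, the function names and the four example cells are assumptions for the example; the text does not specify a linkage rule, and multidimensional scaling is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (undefined for a flat profile)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of named profiles into n_clusters groups.

    profiles: dict mapping a cell name to its expression values.
    Distance between two cells is 1 - Pearson r; clusters are merged by
    average linkage (an assumed choice) until n_clusters remain.
    """
    names = list(profiles)
    dist = {(a, b): 1 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical single cell profiles: two stem-like and two mature-like cells.
cells = {
    "cell_a": [9, 8, 1, 1],
    "cell_b": [8, 9, 2, 1],
    "cell_c": [1, 2, 9, 8],
    "cell_d": [2, 1, 8, 9],
}
groups = hierarchical_clusters(cells, n_clusters=2)
```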
The analysis may be implemented in hardware or software, or a combination of both. In one embodiment of the invention, a machine-readable storage medium is provided, the medium comprising a data storage material encoded with machine-readable data which, when used by a machine programmed with instructions for using the data, is capable of displaying any of the data sets and data comparisons of the present invention. Such data can be used for a variety of purposes, such as drug discovery, analysis of interactions between cellular components, and the like. In some embodiments, the invention is implemented in a computer program executing on a programmable computer comprising a processor, a data storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device. Program code is applied to input data to perform the functions described above and generate output information. The output information is applied to one or more output devices in a known manner. For example, the computer may be a personal computer, a microcomputer or a workstation of conventional design.
Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Each such computer program may be stored on a storage medium or device (e.g., ROM or magnetic diskette) readable by a general or special purpose programmable computer, for configuring and operating the computer when the storage medium or device is read by the computer to perform the steps described herein. The system can also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific predefined manner to perform the functions described herein.
A variety of structural formats for the input and output means may be used to input and output information in the computer-based system of the present invention. One format for the output means tests data sets that have varying degrees of similarity to a trusted profile. Such a presentation provides the skilled artisan with a ranking of similarities and identifies the degree of similarity contained in the test pattern.
Storing and transferring data
Further provided herein are methods of storing and/or transmitting, via computer, sequences and other data collected by the methods disclosed herein. Any computer or computer accessory, including but not limited to software and storage devices, may be used to implement the present invention. The user can enter sequences or other data (e.g., transcriptome data) into the computer directly or indirectly. In addition, any device for sequencing DNA or analyzing transcriptome data may be connected to a computer, such that the data is transferred to the computer and/or a computer-compatible storage device. The data may be stored on a computer or on a suitable storage device (e.g., a CD). Data may be sent from one computer to another computer or to a data collection point by methods well known in the art (e.g., internet, ground mail, air mail). Thus, data collected by the methods described herein may be collected at any point or geographic location and sent to any geographic location.
An exemplary method is depicted in fig. 10. In this embodiment, the user provides a sample to the sequencer. Data is collected and/or analyzed by a sequencer connected to a computer. Software on the computer allows data collection and/or analysis. The data may be stored, presented (via a display or other similar device), and/or transmitted to other locations. As shown in fig. 10, the computer is connected to the internet, which can be utilized to transmit data to a handheld device used by a remote user (e.g., a doctor, scientist, or analyst). It should be appreciated that the data may be stored and/or analyzed prior to transmission. In certain embodiments, raw data can be collected and transmitted to a remote user who analyzes and/or stores the data. As shown in fig. 10, the transfer may be over the internet, but may also be over a satellite or other connection. Alternatively, the data may be stored on a computer readable medium (e.g., CD, memory storage device) and the medium may be sent to the end user (e.g., by mail). Remote users may be in the same or different geographic locations including, but not limited to, buildings, cities, states, countries, or continents.
Reagents and kits
Also provided are reagents and kits thereof for carrying out one or more of the methods described above. The reagents and kits may vary greatly. Reagents of interest include reagents specifically designed for generating expression profiles of the phenotype-determining genes described above. For example, the reagents may include primer sets for genes known to be differentially expressed in the target population or subpopulation (e.g., reagents for detecting oncogenic breast cancer cells include primers and probes for amplifying and detecting CD49f, CD24, and/or EPCAM expression).
One class of reagents specifically tailored for generating expression profiles of target cell populations and subpopulations is a collection of gene-specific primers designed to selectively amplify such genes for use in quantitative PCR or other quantitative methods. Gene-specific primers and methods of using them are described in U.S. Pat. No. 5,994,076, the disclosure of which is incorporated herein by reference. Of particular interest are sets of gene-specific primers having primers for at least 5 genes, typically a plurality of such genes, such as at least 10, 15, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000 genes or more. The gene-specific primer sets may include only primers for genes associated with the target population or subpopulation (e.g., mutations, known misregulated genes, etc.), or they may include primers for additional genes (e.g., housekeeping genes, controls).
The kit of the present invention may comprise the above gene-specific primer set. The kit may further comprise a software package for statistical analysis of one or more phenotypes, and may further comprise a reference database for calculating the probability of susceptibility. The kit may comprise reagents for use in a variety of methods, such as: primers for generating a target nucleic acid; dNTPs and/or rNTPs (which may be premixed or separate); one or more uniquely labeled dNTPs and/or rNTPs, such as biotin-labeled or Cy3- or Cy5-labeled dNTPs; gold or silver particles with different scattering spectra, or other post-synthetic labeling reagents, such as chemically active derivatives of fluorescent dyes; enzymes, such as reverse transcriptase, DNA polymerase, RNA polymerase, and the like; different buffer media, such as hybridization and wash buffers; pre-formed probe arrays; labeled probe purification reagents and components, such as spin columns, and the like; and signal generating and detecting reagents, such as streptavidin-alkaline phosphatase conjugates, chemifluorescent or chemiluminescent substrates, and the like.
In addition to the above components, the kit further comprises instructions for performing the method. These instructions may be present in the kit in a variety of forms, one or more of which may be present in the kit. One form in which these instructions may be present is as printed information on a suitable medium or substrate, such as one or more sheets of paper on which the information is printed, in the packaging of the kit, in a package insert, etc. Another form is a computer readable medium, such as a floppy disk, a CD, etc., on which the information is recorded. Yet another form is a website address that may be used via the internet to access the information at a remote location. Any convenient means may be present in the kit.
The analysis methods described above may be embodied as a program of computer-executable instructions to perform various aspects of the present invention. Any of the above-described techniques may be performed by software components loaded into a computer or other information appliance or digital device. When so enabled, the computer, appliance or device can then perform the techniques described above to facilitate analysis of sets of values associated with a plurality of genes, or to compare such associated values, in the manner described above. The software component may be loaded from a fixed medium or accessed via a communication medium such as the internet or other type of computer network. The above features are embodied in one or more computer programs that are executable by one or more computers running such programs.
Examples
The following examples are offered by way of illustration and not limitation.
Example 1: analysis of gene expression in single cells.
A significant fraction of murine mammary CSCs contain relatively low levels of ROS, so it is hypothesized that these cells may express enhanced levels of ROS defense compared to their NTC counterparts.
Single cell gene expression analysis. To perform single cell gene expression experiments, we used qPCR Dynamic Array microfluidic chips (Fluidigm). Using FACS, single cells from the MMTV-Wnt-1 Thy1+CD24+Lin- CSC-enriched (tumorigenic, TG) population and from the "Not Thy1+CD24+" Lin- non-tumorigenic (NTG) population were sorted into PCR mix (CellsDirect, Invitrogen) containing an RNase inhibitor (SUPERase-In, Invitrogen). After hypotonic lysis, we added RT-qPCR enzymes (SuperScript III RT/Platinum Taq, Invitrogen) and a mixture containing low concentrations of the assays (primers/probes) of interest (Gclm-Mm00514996_m1, Gs-Mm00515065_m1, Foxo1-Mm00490672_m1, Foxo4-Mm00840140_g1, Hif1a-Mm00468875_m1, Epas1-Mm00438717_m1). Reverse transcription (50 °C for 15 minutes, 95 °C for 2 minutes) was followed by pre-amplification for 22 PCR cycles (each cycle: 95 °C for 15 seconds; 60 °C for 4 minutes). Total RNA controls were run in parallel. The amplified cDNA obtained from each cell was loaded into the chip sample inlets with TaqMan qPCR mix (Applied Biosystems). Individual assays (primers/probes) were loaded into the chip assay inlets (2 replicates each). The chip was loaded in a chip loader (Nanoflex, Fluidigm) for 1 hour and then transferred to a reader (Biomark, Fluidigm) for thermal cycling and fluorescence quantification. To remove low quality gene assays, we discarded gene assays whose qPCR curves showed a non-exponential increase. To remove low quality cells (e.g., dead cells), we discarded cells that did not express the housekeeping genes Actb (β actin) and Hprt1 (hypoxanthine guanine phosphoribosyltransferase 1). This resulted in a single cell gene expression dataset of 248 cells (109 tumorigenic and 139 non-tumorigenic) from a total of 7 chip-runs. Two-sample Kolmogorov-Smirnov (K-S) statistics were calculated to test whether each gene was differentially expressed between the two populations.
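The statistical comparison used in this example (a two-sample K-S statistic per gene, a p-value from permuting the TG/NTG labels, and Bonferroni adjustment across genes) can be sketched as follows. The function names, the number of permutations and the add-one p-value estimator are assumptions for the illustration and are not taken from the original analysis.

```python
import random


def ks_statistic(a, b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    d = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(x <= v for x in a) / len(a)
        fb = sum(x <= v for x in b) / len(b)
        d = max(d, abs(fa - fb))
    return d


def permutation_p(a, b, n_perm=1000, seed=0):
    """P-value for the K-S statistic obtained by shuffling group labels.

    Uses the (hits + 1) / (n_perm + 1) estimator (an assumed choice) so that
    the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return {gene: min(1.0, p * m) for gene, p in p_values.items()}
```

In use, `permutation_p` would be called once per gene with the expression values of the TG cells and the NTG cells, and the resulting dictionary of p-values passed to `bonferroni`.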
We generated p-values by permuting the sample labels (i.e., TG versus NTG) and comparing the actual K-S statistic with those from the permutation-derived null distribution. The p-values were further corrected by Bonferroni correction to adjust for multiple hypothesis testing.
Example 2: analysis and quantification of human "colorectal cancer stem cells" (Co-CSC) using SINCE-PCR, a new approach based on "single cell gene expression analysis".
The SINCE-PCR method allows the identification, characterization and quantification of "cancer stem cells" in human colorectal cancer tissue, with a degree of purity and resolution that was previously unachievable. Cancer stem cells, also referred to as tumorigenic or tumor-initiating cells, are a subpopulation of cancer cells that have the ability to form tumors when transplanted into immunodeficient mice. Cancer stem cell populations have so far been identified in breast, brain, head and neck, pancreas and colon cancers. The precise functional definition and quantification of "cancer stem cells" has several important implications for the diagnosis, prognosis, classification and therapeutic targeting of human cancers.
We describe a new method for identifying, analyzing and quantifying "cancer stem cells" in human colorectal cancer tissue, based on single cell gene expression analysis using real-time polymerase chain reaction (real-time PCR). We have identified a new set of genes whose coordinated and differential expression can be used as "markers" to identify different subsets of cancer cells in the same tumor tissue. This new set of genes includes housekeeping genes common to all epithelial cells (EpCAM, β actin, GAPDH), genes biologically relevant to stem cells (hTERT, LGR5, survivin), and genes involved in the tissue-specific differentiation pathways associated with the different cell lineages (carbonic anhydrase II, MUC2, trefoil factor 3) and differentiation stages (cytokeratin 20, CD66a/CEACAM1) of normal colonic epithelium. Based on the expression pattern of this set of genes, epithelial cells purified from human colorectal cancer tissue and individually analyzed as single cells can be "classified" and clustered into different groups corresponding to more or less advanced stages of differentiation (e.g., differentiated cells at the top of a human colonic crypt versus less mature cells at the bottom of a human colonic crypt), and to different differentiation lineages of the colonic epithelium (e.g., goblet cells, enterocytes, immature cells). Each group can be quantified as a percentage of the entire population. We have named this method for analyzing the cellular composition of biological tissues "SINCE-PCR" (single cell expression-polymerase chain reaction).
Our findings are based on several observations. Human "colorectal cancer stem cells" enriched by flow cytometry directly from freshly collected solid tumor tissue can be reproducibly and robustly analyzed at the single cell level (fig. 1).
Single cell gene expression analysis using real-time PCR in human colon cancer xenografts indicated that EpCAM-/CD44+ and EpCAM-/CD166+ cancer cells, which are known to be enriched in the "colorectal cancer stem cell" population, can be further subdivided into distinct cell subsets characterized by the coordinated and differential expression of different sets of genes involved in stem cell biology and differentiation. More interestingly, cell subsets displaying high levels of genes encoding known terminal differentiation markers of colonic epithelium (e.g., cytokeratin 20, CD66a/CEACAM1, carbonic anhydrase II, MUC2, trefoil factor 3) express no or low levels of genes encoding candidate intestinal stem cell markers or genes known to be necessary for stem cell function (e.g., hTERT, LGR5, survivin), and vice versa. This indicates that EpCAM/CD44+/CD166+ cancer cells contain distinct cell subsets characterized by different differentiation stages (fig. 2).
When purified by Fluorescence Activated Cell Sorting (FACS) and reinjected into immunodeficient NOD/SCID mice, CD44+/CD66a+ and CD44+/CD66a-/low cells display substantially different oncogenic properties, with the CD44+/CD66a-/low population appearing to be the one with the highest carcinogenic potential (table 4). This indicates that, within the EpCAM+/CD44+ cell population, cell subsets characterized by high expression of genes encoding differentiation markers such as CD66a/CEACAM1 (i.e., more "mature" cell subsets) are relatively deficient in oncogenic capacity. On the other hand, cell subsets characterized by a lack or low level of expression of differentiation markers such as CD66a/CEACAM1 (i.e., more "immature" cell subsets) are enriched in "colorectal cancer stem cell" content.
Table 1. Oncogenic properties of human colon cancer cells based on CD66a/CEACAM1 expression in combination with EpCAM and/or CD44.
a For each experiment, the number of serial in vivo passages of the tumor xenograft used as the source of purified cancer cells is reported as follows: m1 denotes first-round tumors obtained from primary tumor grafts, m2 denotes second-round tumors obtained from m1 grafts, m3 denotes third-round tumors obtained from m2 grafts, and so on; "primary" denotes primary tumors obtained directly from surgical specimens.
b All sorted populations were considered lineage marker negative (Lin-neg). Lineage markers included mouse CD45 and mouse H2-Kd in the case of human tumor xenografts established in NOD/SCID mice, and human CD3 and CD45 in the case of primary human tumors (in which case EpCAM was used as a positive epithelial selection marker).
c Tumor take is reported as: number of tumors obtained/number of injections; tumor take was considered unsuccessful when no tumor mass was visible after 5 weeks.
Example 2: generation and imaging of human breast cancer xenograft models with lung metastasis
Patient-derived breast cancer samples (either bulk or TIC) were orthotopically transplanted into the mammary fat pads of NOD-SCID mice. Six xenograft tumor models were generated (1 ER+, 1 Her2+, and 4 triple-negative ER-PR-Her2-). All 4 triple-negative xenografts developed spontaneous lung micrometastases, which were detected by H&E staining and by IHC staining for the proliferation marker Ki67 and for vimentin (Vim). These data indicate that breast tumor cells or TICs, when transplanted into immunodeficient mice, are able to adapt to the mouse microenvironment and recapitulate human tumor growth and progression with spontaneous lung metastases.
To aid in the kinetic and semi-quantitative imaging of human breast cancer and metastasis in mice, mammary TIC were transduced with a firefly luciferase-EGFP fusion gene using pHRuKFG lentivirus (moi 50). Four days post-transplantation, TIC at the primary site were detectable by a weak bioluminescent signal. After one month, both the primary tumors (at the L4 and R2 mammary fat pads) and lung metastases could be detected and imaged using the Xenogen IVIS200 system located at Stanford's small animal imaging center. We observed a good correlation of tumor size or cell number with signal intensity. The generation and bioluminescence imaging of xenograft tumors with metastasis provide a promising and feasible means of demonstrating miRNA function in human breast cancer MTIC in this proposal.
Example 3: microarray and real-time PCR analysis of human mammary MTIC
Human primary breast tumorigenic cells (TIC) or metastatic TIC (MTIC) (CD44+CD24-/lowESA+Lineage-) were isolated from the primary breast cancer site or from pleural effusions. Once lung metastasis was detected in the xenograft model, lungs were dissociated with Blendzyme (Roche) and cells were stained for mouse H2K and human CD44, CD24 and ESA to purify the MTIC population (CD44+CD24-/lowESA+H2K-, Figure 8a), which grew orthotopic tumors in 5 of 8 transplants of 200-1000 sorted cells into the mouse mammary fat pad.
As shown by microarray analysis and real-time PCR, HIF1α and HIF1-regulated target genes, including Snail, Zeb2, vimentin, E-cadherin, Lox, Cox2, VEGF, etc., were differentially expressed in MTIC compared to non-oncogenic tumor cells (fig. 8B). Colocalization of HIF1α, vimentin, and CD44 was confirmed using immunohistochemical staining.
Example 4: micro RNA analysis
Differential expression profiles of parental breast cancer stem cells and metastatic cancer cells isolated from the lung were identified by microRNA screening. For example, lung MTIC show higher miR-10a expression and lower levels of miR-490, miR-199a, etc., compared to primary mammary TIC. As shown in fig. 4, real-time PCR performed in triplicate gave the following differences in mean CT value between lung MTIC and primary TIC: miR-10a (-7.9), miR-490 (+3.0) and miR-199a (+12.9). NR3 was used as the internal control. The data show that, compared with primary breast cancer TIC, miR-10a is up-regulated 2^7.9-fold and miR-199a is down-regulated 2^12.9-fold in MTIC.
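The conversion from a CT difference to a fold change can be illustrated with a short calculation. This sketch assumes roughly 100% PCR efficiency (template doubling every cycle) and that the reported differences are already normalized to the internal control; the function name `fold_change` is introduced only for this example.

```python
def fold_change(delta_ct):
    """Relative expression implied by a difference in CT values.

    Assumes the template doubles every cycle, so a CT that is lower by one
    cycle corresponds to twice as much starting material. A negative
    delta_ct therefore means higher expression in the test sample.
    """
    return 2.0 ** (-delta_ct)


# CT differences (lung MTIC minus primary TIC) quoted in the text.
delta_ct = {"miR-10a": -7.9, "miR-490": 3.0, "miR-199a": 12.9}

for mirna, d in delta_ct.items():
    fc = fold_change(d)
    if fc >= 1:
        print(f"{mirna}: up-regulated about {fc:.0f}-fold")
    else:
        print(f"{mirna}: down-regulated about {1 / fc:.0f}-fold")
```

Under these assumptions, 2^7.9 is approximately 239 and 2^12.9 is approximately 7,643, while the +3.0 difference for miR-490 corresponds to an 8-fold reduction.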
Example 5: CD66a as a non-oncogenic cancer cell marker for breast cancer
When the majority of cells were CD24-/low, breast cancer cells were sorted based on CD44 and CD66a. Cells were then transplanted into the mammary fat pads of NOD/SCID mice and tumor growth was monitored. As shown in FIG. 5 using bioluminescence imaging, CD44+/CD66a- cells showed high engraftment and high growth rates. CD66+ cells showed a lower and delayed tumor growth rate and gave rise to much smaller tumors, which displayed a flow profile similar to that of tumors derived from CD66- cells.
In fig. 5a, the flow profile based on the CD44 and CD66a markers is shown. CD66-CD44+ and CD66+CD44+ cells were sorted for in vivo tumorigenesis assays (100 cells or 1000 cells transplanted into the 2nd or 4th mammary fat pad of NOD/SCID mice). As shown in 10b, 5 of 8 grafts of 100 CD66- cells grew tumors, whereas 2 of 8 grafts of 100 CD66+ cells did. For 1000 cells, 8 of 8 injections of CD66- cells grew tumors, but only 3 of 8 injections of CD66+ cells did. Palpable tumors derived from CD66+ cells grew more slowly and were smaller than those derived from CD66- cells (FIG. 5c). In FIG. 5d, 100K CD66-CD44+ or CD66+CD44+ cells were infected with firefly luciferase-EGFP lentivirus prior to injection. The bioluminescent signals from CD66+ cells were initially higher than those from CD66- cells (day 13). However, after 1 or 2 months, CD66a- cells showed a dominant bioluminescent signal and eventually developed palpable tumors (day 68).
Example 6: optimization of gene lists for identifying and measuring cancer stem cell frequency
Most markers used to identify both normal stem cells and cancer stem cells at this time are not associated with important stem cell function. Their expression is associated with the particular microenvironment in which the stem cells are present upon isolation. Thus, the utility of common markers for identifying stem cells may vary depending on where they are collected.
Our method has identified markers of important stem cell function. Since self-renewal is a characteristic of typical stem cells, we focus our efforts on the self-renewal pathway. In each respective tissue, we identified a number of genes that are highly expressed by normal HSCs, progenitor cell-derived leukemia stem cells, and human epithelial cancer stem cells, but not by non-self-renewing cells. This genomic analysis, described in the initial results, identified a large number of genes previously thought to be associated with stem cell self-renewal. Similarly, we identified candidate micrornas that are differentially expressed by breast cancer stem cells and non-oncogenic cancer cells. Evidence demonstrates that several of these genes and micrornas have important stem cell functions, and that the function of these genes is also important for hESC and iPSC self-renewal and maintenance.
In order to build a device capable of measuring the frequency of cancer stem cells in a tumor cell population, it is necessary to optimize the gene list used to identify cancer stem cells. As shown in fig. 1B, we have made substantial progress in this regard, identifying telomerase as a cancer stem cell marker along with several genes associated with the self-renewal process. TERT, the telomerase component, is expressed only in colon cancer cells with immature phenotypes. Furthermore, TERT is not effectively down-regulated upon differentiation of some hESC and iPSC lines.
Both normal and cancerous colonic epithelial cells were analyzed for expression of genes associated with crypt cell maturation and self-renewal. The self-renewal gene list is extended beyond TERT to maximize the confidence that the cells are stem cells. The expression of genes identified in our analysis of normal and cancer stem cells was measured. Because cancer stem cells can potentially arise from progenitors that escape the limits on expansion, or the counting mechanism that limits the number of mitoses they can undergo, candidate genes are those expressed by normal murine HSCs, murine leukemic stem cells derived from progenitors, and human breast cancer stem cells. The top candidate genes identified in the table that are associated with stem cell maintenance include BMI1, ID1, IGFBP3, the HOX family members HOXA3 and HOXA5, MEIS1, ETS1, ETS2, RUNX2, and STAT3. We will confirm which of these genes are involved in cancer stem cell self-renewal. To this end, we will systematically test the role of our candidate genes in cancer stem cell self-renewal using in vitro and in vivo techniques.
Expression of genes regulating self-renewal is linked to epithelial cell-specific gene expression, including maturation markers such as keratins and intestinal mucins. This will make it possible to establish that the cells under analysis are not normal cellular contaminants in the biopsy sample. Mutations in tumor suppressor genes whose expression is down-regulated by the self-renewal gene BMI1 allow early progenitor cells to self-renew. These genes are often mutated in colon cancer, and thus self-renewing colon cancer stem cells can arise from both normal stem cells and early colon progenitor cells. Moreover, oncogenic mutations will alter gene expression by colon cancer cells. Thus, there is a difference in the expression of at least some of the genes associated with early crypt maturation between normal colonic epithelial stem cells and their malignant counterparts, which enables the two self-renewing cell populations to be distinguished from each other.
We identified 37 miRNAs differentially expressed between cancer stem cells and non-oncogenic cancer cells. Some miRNA clusters are down-regulated in normal tissue stem cells and not in cancer stem cells; moreover, expression of some miRNAs, such as miR-200c and miR-183, inhibited the growth of embryonal carcinoma cells in vitro, abrogated their tumor-forming ability in vivo, and inhibited the colony formation of breast cancer cells in vitro. These miRNAs and the other clusters we identified provide a molecular link connecting breast cancer stem cells with normal stem cell biology. Expression of these microRNAs, which can be consistently up- or down-regulated in oncogenic cells, was probed in single cells from undifferentiated and differentiated hESCs and iPSCs. In essence, undifferentiated cells are sorted by distinct cell surface markers of pluripotent stem cells, such as the Tra and SSEA antigens, and evaluated for miRNA expression, reprogramming efficiency and population parameters (results of teratoma assays in the form of embryonal carcinoma, mixed embryonal carcinoma/differentiated cells (% EC-to-differentiated), and differentiated cells). After 28 days of differentiation, differentiated embryonic stem cell populations were obtained from the embryoid body products and sorted by positive and negative selection for SSEA/Tra markers. We will examine single cells in the sorted populations for: 1) a microRNA profile indicative of cancer stem cells, 2) the gene expression profile (below), and 3) the results of transplantation/teratoma assays. We expect that the differentiation-resistant cells in these populations will form malignant embryonal carcinoma derivatives and co-express markers of differentiated and undifferentiated cells in single cells.
Example 7: gene expression profiling at the Single cell level
In the pluripotent cell population, we observed cell lines that failed to down-regulate key oncogenic markers such as TERT even after 21 days of differentiation (see fig. 6). In addition, we observed that approximately 50% of our iPSC lines were unable to down-regulate exogenous and endogenous pluripotency markers in the differentiated state. In essence, this suggests a "molecular war" between differentiation and self-renewal, which we predict is linked to carcinogenic predisposition. We will optimize the gene list for identifying malignant cells in hESC and iPSC cell cultures by: 1) analyzing genes overexpressed in EC (embryonal carcinoma) cells relative to undifferentiated hESC and iPSC and human embryonic blastomeres, 2) cross-referencing the gene list to include those from Aim 1 (to identify cancer stem cells), and 3) adding genes for differentiated somatic and germ cell lineages (the latter retain genes of differentiation-resistant pluripotent stem cells). We will then use immunodeficient mouse assays to assess the oncogenic potential of subpopulations diagnosed as potentially malignant on the basis of single cell gene expression.
CNV analysis. Chromosomal variations are associated with instability in pluripotent human stem cell populations, with chromosomal deletions and gains commonly observed. However, few studies have addressed fine-structure, high-throughput methods to assess copy number at multiple sites. We propose to adapt our technique to evaluate genome-wide CNV in independently derived pluripotent stem cell lines; changes in CNV reflect sub-chromosomal instability. Initially, we designed specific probe sets, to be added to our gene/locus list, that identify repeats across the genome, including those previously observed by our laboratory (fig. 6). In its original design, SCAD can accommodate analysis of up to 1000 markers. CNV analysis is commercially available and can be correlated with genome instability in hESCs/iPSCs.
Example 8: design of an automated apparatus for identifying and quantifying cancer stem cells
Automated equipment was designed to identify cancer stem cells and calculate their frequency in tumors based on a combination of cell surface phenotype and gene expression. Using a similar strategy and the optimized marker/genetic analysis described herein, cells with malignant potential are identified based on the co-expression, in single cells, of markers of the differentiated and undifferentiated states. The device will make a single cell suspension from embryoid bodies or tumor needle biopsies, isolate cell subsets (epithelial, differentiated, undifferentiated), then perform qRT-PCR on hundreds to thousands of single cells, and measure the stem cell content of the tumor or pluripotent cell culture. Such a fully automated device would eliminate the labor-intensive steps currently required for flow cytometric sorting of cancer stem cells, allowing for a truly hands-off, bedside diagnostic tool that requires fewer than 100,000 cells to isolate sufficient cancer cells for PCR analysis to quantify cancer stem cells. The automation, availability and low cost of microfluidic chip technology will enable individualized, rapid genetic diagnosis.
The core of the system is a microfluidic cell sorter. The device separates living cells (epithelial cells or cultured pluripotent cells or their products) from debris (necrotic cells and other particles), sorts the cells from single cell suspensions using fluorescent signals from up to 5 different surface markers, and places them in separate compartments for subsequent genetic studies. Other upstream steps, such as digestion of the tumor or cell culture to obtain a cell suspension and staining of the cells with fluorescent surface markers, may also be included in the present system. The system would be used for tumor analysis as follows: once a biopsy sample is obtained, the physician places the sample in the input portion of the system. Using a user-friendly computer interface, the physician sets the parameters needed for sorting and genetic analysis, such as the number and type of surface markers and the number of PCR cycles required, and the machine performs the remaining steps without human intervention. Based on previously demonstrated techniques, the system would allow a sorting throughput of at least 30 cells/second.
The Single Cell Analysis Device (SCAD) can be modular (fig. 7) and performs the following steps in a fully automated way:
1) Digestion of tissue: tissue is placed at the device input. Appropriate enzymes are introduced into the device and flowed over the tissue to digest the extracellular matrix and obtain a cell suspension.
2) Separation of viable cells from debris: the suspension typically contains viable cells with an average size of 10 to 15 microns, while the debris has an average size of about 5 microns. The amount of dead material is sometimes high relative to live cells, so filtering it out is important for efficient cell separation. We accomplish this by flowing the digested tissue suspension into a microfluidic "metamaterial" that splits the fluid flow according to particle size.
3) Staining: the filtered single cell suspension is stained with appropriate surface markers in a separate compartment of the microfluidic device. Staining with up to 5 different markers is useful for obtaining a high purity cancer cell population.
4) Sorting: the stained single cell suspension flows into the next compartment of the microfluidic device, where cancer cells are sorted from the remaining cells. Poisson statistics and Monte Carlo simulations indicate that only 2,000 to 20,000 cancer cells need to be sorted to detect a 2-fold change in cancer stem cell frequency at the 99% confidence level. Currently, such small numbers of cells cannot be sorted efficiently by flow cytometry, because the initial sample size required for FACS is approximately 1 million cells. We achieve this using microfluidics-based sorting in a closed, small-volume environment in which cells can be recirculated indefinitely and are therefore not wasted.
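The kind of Poisson/Monte Carlo reasoning referred to above can be sketched as follows. This is only an illustrative sketch, not the simulation used by the inventors: the baseline stem cell frequency (`base_freq`), the two-sample z-test on Poisson counts, and the number of trials are all assumptions introduced here, since the text does not state them.

```python
import random


def poisson_draw(rng, mean):
    """Draw a Poisson count by counting unit-rate arrivals in [0, mean]."""
    count = 0
    elapsed = rng.expovariate(1.0)
    while elapsed < mean:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count


def detection_power(n_cells, base_freq, fold=2.0, z_crit=2.576,
                    trials=500, seed=0):
    """Fraction of simulated experiments in which a fold-change is detected.

    base_freq is a HYPOTHETICAL baseline cancer stem cell frequency.
    z_crit=2.576 corresponds to a two-sided 99% confidence level.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Stem cell counts seen among n_cells sorted cells, before and after
        # the frequency changes by `fold`.
        k_base = poisson_draw(rng, n_cells * base_freq)
        k_changed = poisson_draw(rng, n_cells * base_freq * fold)
        total = k_base + k_changed
        if total == 0:
            continue  # nothing observed, so nothing can be detected
        # For two Poisson counts with equal exposure, the difference has
        # variance of roughly k_base + k_changed under the null hypothesis.
        z = abs(k_changed - k_base) / total ** 0.5
        if z > z_crit:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for n in (2_000, 20_000):
        print(n, detection_power(n, base_freq=0.01))
```

Under these assumptions, the detection rate rises with the number of cells sorted; the exact cell number needed depends strongly on the assumed baseline frequency.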
Fluid-based microfluidic cell sorter: microfluidic cytometers with throughput approaching 50 cells/sec have been demonstrated where cells are flowed through a laser beam at high speed (see Di Carlo et al, Lab Chip 2006; 6: 1445-. Faster electronics and more efficient imaging devices allow for improved throughput by orders of magnitude, which reduces the sorting time to less than 10 minutes.
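As a rough check on the sorting times implied by these figures, the arithmetic can be sketched as below. The throughputs (30 and 50 cells/second) and the cell numbers (2,000 to 20,000) are taken from this example; the helper name `sort_time_minutes` is hypothetical, and the calculation assumes a constant throughput with no setup or recirculation overhead.

```python
def sort_time_minutes(n_cells, cells_per_second):
    """Time in minutes to sort n_cells at a constant throughput."""
    if cells_per_second <= 0:
        raise ValueError("throughput must be positive")
    return n_cells / cells_per_second / 60.0


if __name__ == "__main__":
    for rate in (30, 50):
        for n_cells in (2_000, 20_000):
            minutes = sort_time_minutes(n_cells, rate)
            print(f"{n_cells} cells at {rate} cells/s: {minutes:.1f} min")
```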
Parallel sorting: a cell sorter is developed based on capturing cells on a dense 2D array of independently addressable microfluidic compartments (fig. 7B and 7C above). Cells are flowed into the sorter array and captured by microfabricated cages. Such cages have previously been shown to capture single cells from free-flowing suspensions with an efficiency of over 50% (Di Carlo et al, supra). When all cages are filled, the microfluidic valves are closed and custom-designed, computer-controlled optics are used to image the array in all 5 fluorescent colors needed to identify tumorigenic cells. The new chip also allows phase contrast imaging, which may prove useful for studying cell morphology. The identified tumorigenic cells are flowed into the next module for lysis, while the remaining cells are flowed out of the chip. This new cell sorter can work with very small initial cell numbers, since cells can be cycled many times and are therefore not wasted. Current microfluidic chip technology allows us to place nearly 10,000 of these compartments on a 3 x 3 cm area, which can be rapidly interrogated (in a single shot) using state-of-the-art imagers such as the one used by the Fluidigm Biomark system. This cell sorter has a throughput of approximately 30 cells/second. One advantage of a parallel sorting apparatus over a fluid-based cell sorter is that sorting and imaging during PCR can be performed by the same imager, allowing us to correlate fluorescence and morphological data with genetic data of individual cells.
Cell lysis and mRNA capture: the sorted cancer stem cells are flowed into the next module for lysis in separate chambers. mRNA can be captured on a column containing oligo-dT beads, reverse transcribed on the beads as generally described (Marcus et al, Anal Chem 2006; 78: 3084-: current protocols are followed by reagents to pre-amplify the gene set. The pre-amplified samples are transferred to a module similar to the Fluidigm Dynamic Array chip for qRT-PCR to determine the true cancer stem cell content.
Based on the analysis of normal breast and blood stem cells as well as colon, head and neck, and breast cancer stem cells, we developed a new single cell assay that, for the first time, makes it possible to accurately and unequivocally identify and enumerate cancer stem cells in biopsy samples and cultured pluripotent stem cell populations. As proof of principle, we used this assay to analyze single colon cancer cells. To this end, we used FACS to sort CD66+CD44+Lineage− colon cancer cells from early passage xenografts established from two different patients. These markers allow approximately 3-5 fold enrichment of colon cancer stem cells (CoCSCs) in tumors. We speculated that cancer cells isolated with these markers are only partially enriched for CoCSCs. Single cell gene expression analysis and subsequent tumorigenicity studies demonstrated that CD66+CD44+Lineage− cells are indeed a mixture of CoCSCs and non-tumorigenic cells, and that this assay can be used to identify the frequency of CoCSCs in biopsy samples more accurately. Single cell analysis revealed a hierarchical developmental structure of colon cancer cells that is reminiscent of normal colon crypts. Notably, we found that the most immature cells in colon tumors express TERT, a component of the telomerase complex that is important for long-term maintenance of tumors. Expression of LGR5, a marker of normal colon stem cells, is also restricted to immature cells. In contrast, genes expressed by mature colon crypt cells (including MUC2, CK20, CA-2, and particularly CD66a) were not co-expressed with immature cell markers, most notably TERT. This suggests that these cells, like normal mature epithelial crypt cells, are limited in their ability to undergo extensive mitosis. Indeed, we transplanted CD66a+ (differentiated) colon cancer cells and CD66a− colon cancer cells into immunodeficient mice.
CD66a− cells formed tumors (5 out of 6 injections) while CD66+ cells did not (0 out of 5 injections). Similarly, in the 2 human breast cancer tumors tested, CD66−/low cells were enriched for cancer stem cells when tested in an immunodeficient mouse model. These results demonstrate that single cell gene expression analysis makes it possible to identify and quantify cancer stem cells in biopsy samples and cultures.
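The text reports the transplantation outcome (tumors in 5 of 6 injections for one population versus 0 of 5 for the other) without a statistical test. Purely as an illustration of how such a small 2x2 outcome could be assessed, the following sketch computes a two-sided Fisher's exact test from first principles. The choice of test and the function name `fisher_exact_two_sided` are assumptions made here and are not part of the original analysis.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two transplanted populations; columns are
    "tumor formed" and "no tumor".
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x tumors in the first row,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


if __name__ == "__main__":
    # 5 of 6 injections formed tumors versus 0 of 5 (counts from the text).
    print(fisher_exact_two_sided(5, 1, 0, 5))
```

With so few injections, any such p-value should be read as indicative only.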
Example 9: gene expression markers shared by normal stem cells and cancer stem cells in both blood and mammary epithelial tissues
In recent years it has become clear that cancer stem cells can arise from different cell compartments. Some may arise from mutant stem cells that have lost the constraints on expansion of the stem cell pool. Others may arise from more differentiated early progenitor cells that have lost the counting mechanism that normally limits the number of mitoses they can undergo. Naturally, many markers differ between cancer or leukemia stem cells derived from stem cells and those derived from progenitor cells. Regardless of the cell of origin, however, the cancer stem cell retains the ability to self-renew. We reasoned that some of the pathways regulating self-renewal in cancer stem cells, whether derived from the stem cell compartment or from partially differentiated progeny, are shared with each other and with normal HSCs. To test this hypothesis, we analyzed whether genes expressed by normal mouse HSCs and by murine leukemia stem cells derived from progenitors (i.e., self-renewing populations), but not by normal progenitors (i.e., non-self-renewing populations), are also expressed by human breast cancer stem cells but not by their non-tumorigenic counterparts. Notably, human cancer stem cells, but not their non-tumorigenic counterparts, overexpress these genes (fig. 8). We also generated 2 other gene lists to identify additional potential candidates: i) genes expressed by breast cancer stem cells and normal breast stem cells, but not by non-tumorigenic cancer cells or mature mammary epithelial progenitor cells; ii) genes expressed by normal human HSCs and human breast cancer stem cells, but not by human blood progenitor cells or non-tumorigenic breast cells.
Many of these genes are associated with self-renewal and cancer. These include the insulin-like growth factor binding protein IGFBP3, the HOX family members HOXA3, HOXA5 and MEIS1, and transcription factors such as ETS1, ETS2, RUNX2, and STAT3. We tested whether the transcription factor STAT3 is indeed a cancer stem cell regulator. STAT3 plays a role in the maintenance of both ES cells and HSCs. First, genetic analysis of both mouse and human breast cancer stem cells revealed that many STAT3-activated transcripts are overexpressed by cancer stem cells. Second, when we examined breast tumors by immunochemical analysis, STAT3-positive cells tended to concentrate at the invasive edge of the tumor and were not found among the apparently more differentiated cells in the tumor interior. Finally, small molecule inhibitors of STAT3 exist, and such inhibitors can be tested in a cancer stem cell model. The effect of the STAT3 inhibitor cucurbitacin on the clonogenic capacity of murine breast cancer stem cells was examined. A brief, 24-hour exposure to the inhibitor reduced the number of clones by ~50% (p < 0.02, t-test). These results indicate that STAT3 plays an important role in at least some breast cancer stem cells.
The second gene of interest is MEIS1. MEIS1 is preferentially expressed by normal blood and breast stem cells, leukemia stem cells, and breast cancer stem cells. Genetic studies have shown that expression of MEIS1 is absolutely required for self-renewal and maintenance of normal stem cells and their leukemic counterparts. MEIS1 can regulate the turnover of breast cancer stem cells.
Candidate genes of particular interest for expression in both normal and cancer stem cells include CAV1, GAS1, MAP4K4 (kinase), MYLK (kinase), PTK2 (kinase), DAPK1 (kinase), LATS (kinase), FOSL2, AKT3 (kinase), PTPRC (tyrosine phosphatase), MAFF (oncogene), RRAS2 (RAS-related), NFKB, ROBO1, IL6ST (activating STAT3), CRIM1, PLS3, SOX2, CXCL14, ETS1, ETS2, MEIS1, STAT3, and CD47. Candidate target genes overexpressed by cancer stem cells but not normal stem cells include RGS4, CAV2, MAF (oncogene), WT1 (oncogene), SNAI2, FGFR2, MEIS2, ID1, ID3, ID4, and FOXC1.
Example 10: whole transcriptome analysis of hematopoietic stem cells
In this example, whole transcriptome analysis of hematopoietic stem cells is performed. A general overview of this embodiment is shown in fig. 9. A population of cells suspected of comprising hematopoietic stem cells is isolated from a test subject. The cells are then prepared for FACS analysis by exposing the cell population to fluorescent antibodies against known hematopoietic stem cell markers (e.g., CD34, Thy1, etc.). The cells are sorted into 96-well plates so that each well contains no more than a single cell.
The isolated single cells are lysed and each lysate is divided into 2 fractions. The first fraction is used for single cell gene expression analysis by real-time PCR, essentially as described in example 1, using a selection of genes whose presence or level of expression distinguishes HSCs from non-HSCs (e.g. CD34+, CD19-, CD17-). After the HSCs in the population are identified, the second fractions of the lysates from single cells identified as HSCs are pooled. A cDNA library is created by amplifying total mRNA using standard methods. The cDNA is then sequenced using any of the "next generation" methods, such as those described herein. The sequenced transcriptome is then analyzed to determine the presence or absence of unique genes and/or surface markers.
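The identify-then-pool step described above can be sketched as a simple gating rule over per-well real-time PCR results. This is a minimal sketch under stated assumptions: the Ct cutoff of 30, the data layout, and the function names are hypothetical, and the example plate values are invented solely to exercise the code. Only the marker pattern (CD34 detected, CD19 not detected) is taken from the text.

```python
def is_detected(ct, ct_cutoff=30.0):
    """Treat a gene as expressed if it amplified at or before the cutoff.

    ct is None when no amplification was observed. The cutoff of 30 cycles
    is a HYPOTHETICAL value, not one given in the source.
    """
    return ct is not None and ct <= ct_cutoff


def is_candidate_hsc(ct_by_gene, positive=("CD34",), negative=("CD19",),
                     ct_cutoff=30.0):
    """Apply a marker pattern such as CD34+ CD19- to one cell's Ct values."""
    has_positive = all(is_detected(ct_by_gene.get(gene), ct_cutoff)
                       for gene in positive)
    has_negative = any(is_detected(ct_by_gene.get(gene), ct_cutoff)
                       for gene in negative)
    return has_positive and not has_negative


def wells_to_pool(plate, **gating):
    """Return the wells whose remaining lysate would be pooled for sequencing."""
    return sorted(well for well, cts in plate.items()
                  if is_candidate_hsc(cts, **gating))


if __name__ == "__main__":
    # Invented example data: well -> {gene: Ct or None}.
    plate = {
        "A1": {"CD34": 22.5, "CD19": None},
        "A2": {"CD34": 24.0, "CD19": 21.0},
        "A3": {"CD34": None, "CD19": None},
    }
    print(wells_to_pool(plate))
```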
Following identification of surface markers unique to HSCs, antibodies that specifically bind to the surface markers are prepared using commercially available techniques. The specificity and effectiveness of the antibody (e.g., binding to isolated and/or recombinant proteins) is confirmed. The antibody is then labeled with a fluorescent moiety. FACS sorting and/or analysis can then be performed on other cell populations (e.g., from the same or different subjects) using antibodies against the newly discovered surface antigens.
Example 11: analysis of therapeutic agents
In this example, candidate therapeutic agents are screened. Target cells, such as colon cancer stem cells and differentiated colon cancer cells, are isolated and analyzed at the single cell level, as described above. The target cells are isolated from the biopsy sample using previously identified target cell-specific markers (e.g., FACS isolation using fluorescently labeled target cell-specific antibodies and/or target cell-specific nucleic acids).
The target cells are isolated into addressable locations, each containing a single cell. The isolated cells are then exposed to a pool of candidate therapeutic agents (e.g., antibodies, toxin-linked antibodies, small molecules). The cells are then harvested and analyzed for gene expression patterns and/or cell viability. Successful candidate therapeutic agents may be those that cause the target cells to die. Alternatively, a candidate therapeutic agent may alter the expression of a gene known to be misregulated (e.g., up-regulated or down-regulated) relative to the expression pattern of non-disease state cells. Exposure of target cells to such a candidate agent can shift the nucleic acid expression pattern so that it more closely resembles the pattern of normal (i.e., non-disease state) cells. Candidate therapeutic agents found to kill or alter the target cells are then applied to normal cells to determine their potential use as therapeutic agents (e.g., a candidate agent that kills both the target cells and normal cells is excluded as a potential agent).
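The selection logic in this example can be sketched as two small checks: one on viability and one on whether an agent moves the expression pattern toward that of normal cells. This is an illustrative sketch only. The viability cutoff of 0.5, the use of Euclidean distance between expression profiles, and the function names are assumptions introduced here; the text does not specify how viability or pattern similarity is scored.

```python
from math import dist


def triage_agent(target_viability, normal_viability, kill_cutoff=0.5):
    """Classify an agent from the surviving fraction of each cell type.

    kill_cutoff is a HYPOTHETICAL threshold: a population counts as killed
    when less than this fraction of cells survives exposure.
    """
    target_killed = target_viability < kill_cutoff
    normal_killed = normal_viability < kill_cutoff
    if target_killed and normal_killed:
        return "excluded"   # kills target cells and normal cells alike
    if target_killed:
        return "candidate"  # kills target cells, spares normal cells
    return "inactive"       # does not kill the target cells


def moves_toward_normal(untreated, treated, normal):
    """True if treatment brings the expression profile closer to normal.

    Each argument is a sequence of expression values for the same genes in
    the same order; Euclidean distance is an assumed similarity measure.
    """
    return dist(treated, normal) < dist(untreated, normal)


if __name__ == "__main__":
    print(triage_agent(target_viability=0.1, normal_viability=0.9))
    print(moves_toward_normal([8.0, 1.0], [5.0, 3.0], [4.0, 4.0]))
```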
Claims (45)
1. A method of identifying distinct cell populations in a heterogeneous solid tumor sample, comprising:
randomly segregating individual cells from the tumor into discrete locations;
performing transcriptome analysis of a plurality of genes of the individually partitioned cells in separate locations; and
performing a cluster analysis to identify one or more distinct cell populations.
2. The method of claim 1, wherein the individual cells are not enriched prior to the partitioning.
3. The method of claim 1, wherein said transcriptome analysis is performed simultaneously on at least 1000 individual cells.
4. The method of claim 1, wherein said transcriptome analysis is performed using a nucleic acid assay.
5. The method of claim 1, wherein the discrete locations are on a planar substrate.
6. The method of claim 1, wherein said randomly partitioning is performed in a microfluidic system.
7. The method of claim 1, wherein said transcriptome analysis comprises analysis of expressed RNA, non-expressed RNA, or both.
8. The method of claim 1, wherein said transcriptome analysis is a whole transcriptome analysis.
9. The method of claim 1, wherein said transcriptome analysis comprises amplification of RNA using a single primer set.
10. The method of claim 9, wherein the primer pair is a non-nested primer.
11. The method of claim 1, wherein the transcriptome analysis is performed on all or a subset of the individual cells simultaneously or in substantially real time.
12. The method of claim 1, wherein the one or more cell populations are normal stem cells, normal progenitor cells, normal mature cells, inflammatory cells, cancer stem cells, or non-carcinogenic stem cells.
13. A method of analyzing a heterogeneous tumor biopsy sample from a subject, comprising:
randomly segregating cells from the biopsy sample into discrete locations;
performing transcriptome analysis on at least 50 genes of the individually separated cells; and
using transcriptome data to identify one or more characteristics of the tumor.
14. The method of claim 13, wherein the performing step is accomplished without prior enrichment of cell types.
15. The method of claim 13, wherein a characteristic is the presence, absence or number of cancer cells.
16. The method of claim 13, wherein the characteristic is the presence, absence or number of stem cells, early progenitor cells, initially differentiated progenitor cells, later differentiated progenitor cells or mature cells.
17. The method of claim 13, wherein the characteristic is the effectiveness of the therapeutic agent to eliminate one or more cells.
18. The method of claim 13, further comprising using the feature to diagnose the subject with cancer or a stage of cancer.
19. The method of claim 13, wherein the characteristic is activity of a signaling pathway.
20. The method of claim 19, wherein the signaling pathway is specific for cancer stem cells, differentiated cancer cells, mature cancer cells, or a combination thereof.
21. A method of identifying a signaling pathway used by a disease state cell, comprising:
randomly partitioning cells from a heterogeneous sample;
performing transcriptome analysis on the separated cells;
identifying at least one disease state cell using a transcriptome analysis;
comparing the transcriptome analysis of the at least one disease state cell to the transcriptomes of:
a) a non-disease state cell;
b) different disease state cells; and
c) disease state stem cells; and
identifying the signaling pathway expressed in (i) the disease state cell, (ii) the disease state stem cell, and (iii) optionally a different disease state cell but not a non-disease state cell, thereby identifying the signaling pathway used by the disease state cell.
22. The method of claim 21, wherein the disease state is cancer, ulcerative colitis or inflammatory bowel disease.
23. The method of claim 21, wherein said signaling pathway is required for survival of said disease state cell.
24. A method of diagnosing a subject with a condition, comprising,
randomly partitioning cells from a heterogeneous sample;
performing a first transcriptome analysis on the partitioned cells;
identifying at least one disease state cell using a transcriptome analysis by comparing the first transcriptome analysis from at least one disease state cell with a second transcriptome analysis from a non-disease state cell, thereby diagnosing the presence or absence of a condition associated with the disease state cell in the subject.
25. The method of claim 24, wherein the disease state is breast cancer, colon cancer, ulcerative colitis, or inflammatory bowel disease.
26. The method of claim 21, wherein the transcriptome analysis comprises analyzing expressed RNA, non-expressed RNA, or both.
27. The method of claim 21, wherein the transcriptome analysis is a whole transcriptome analysis.
28. The method of claim 21, wherein the disease state cell is a breast cancer stem cell.
29. A method of screening for a therapeutic agent comprising:
exposing a first subject having disease state cells to one or more test agents;
obtaining a heterogeneous tumor biopsy sample from a target region of a subject;
performing transcriptome analysis on at least one individual cell from the heterogeneous tumor biopsy sample,
wherein the biopsy sample comprises one or more disease state cells; and
comparing the transcriptome analysis to a transcriptome from:
(i) a second subject without disease state cells; or
(ii) the first subject prior to the exposing step; and
identifying agents that affect the transcriptome of cells from the target region to make it more like that of the second subject or of the first subject prior to exposure.
30. A method of determining the potential effectiveness of a therapeutic agent to treat a disease, comprising:
isolating a first population of disease state cells to a single location, wherein the single location comprises a single cell;
determining the expression level of at least one nucleic acid or protein from at least one of said individual cells, thereby producing a disease state expression marker;
exposing a second population of disease state cells to an agent;
isolating a second population of the disease state cells into individual locations, wherein the individual locations comprise an individual cell;
determining the expression level of at least one nucleic acid or protein from at least one of the individual cells of the second population; and
comparing the expression level of the individual cells from the second population to the disease state expression marker, thereby determining the effectiveness of the agent in treating the disease.
31. The method of claim 30, wherein the exposing step is performed in vivo.
32. The method of claim 30, wherein said first population and said second population are isolated from a subject.
33. The method of claim 32, wherein the subject is a human.
34. The method of claim 30, wherein the disease is cancer, ulcerative colitis or inflammatory bowel disease.
35. The method of claim 30, wherein the nucleic acid or the protein is a cancer stem cell marker.
36. The method of claim 30, wherein the expression level is an mRNA expression level.
37. The method of claim 33, wherein determining the mRNA expression level comprises detecting the expression or non-expression of 10 or more nucleic acids.
38. The method of claim 30, wherein the expression level is a protein expression level.
39. The method of claim 30, wherein the isolating step comprises exposing the population of cells to an antibody that specifically binds to a protein present on the individual cells.
40. A method of determining the likelihood of a subject responding to a therapeutic agent, comprising:
isolating a population of cells from a subject to individual locations, wherein the individual locations comprise individual cells and wherein at least one of the individual cells is a disease state cell;
determining the expression level of at least one nucleic acid or protein from at least one of the disease state individual cells, wherein the nucleic acid or protein is a target for a therapeutic agent; and
determining a likelihood of a subject's response based on the expression level of the at least one nucleic acid or protein.
41. The method of claim 40, wherein the expression level is an mRNA expression level.
42. The method of claim 41, wherein determining the mRNA expression level comprises detecting expression or non-expression of 10 or more nucleic acids.
43. The method of claim 42, wherein the expression level is a protein expression level.
44. The method of claim 42, wherein the isolating step comprises exposing the population of cells to an antibody that specifically binds to a protein present on the individual cells.
45. The method of claim 40, wherein the therapeutic agent is an anti-cancer agent.
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US61/205,485 | 2009-01-20 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| HK1164945A (en) | 2012-09-28 |