HK1229002B - Lung cancer biomarkers and uses thereof - Google Patents

Publication number
HK1229002B
Authority
HK
Hong Kong
Prior art keywords
biomarker
individual
biomarkers
nsclc
cancer
Prior art date
Application number
HK17102551.8A
Other languages
Chinese (zh)
Other versions
HK1229002A (en
HK1229002A1 (en
Inventor
M. Riel-Mehan
A.A.E. Stewart
R.M. Ostroff
S.A. Williams
E.N. Brody
Original Assignee
SomaLogic Operating Co., Inc. (私募蛋白质体运营有限公司)
Filing date
Publication date
Application filed by SomaLogic Operating Co., Inc. (私募蛋白质体运营有限公司)
Publication of HK1229002A
Publication of HK1229002A1
Publication of HK1229002B

Description

Lung cancer biomarkers and uses thereof
This application is a divisional of Chinese invention patent application No. 201180074349.4, entitled "Lung cancer biomarkers and uses thereof", filed on October 24, 2011.
Technical Field
The present application relates generally to the detection of biomarkers and diagnosis of cancer in an individual, and more particularly to one or more biomarkers, methods, devices, reagents, systems and kits for diagnosing cancer, more particularly lung cancer, in an individual.
Background
The following description provides a summary of information relevant to the present application and is not an admission that any of the information provided herein or publications referred to is prior art to the present application.
More people die from lung cancer than from any other type of cancer. This is true for both men and women. Lung cancer is responsible for more deaths than breast, prostate, and colon cancers combined. Lung cancer was responsible for an estimated 157,300 deaths, or 28% of all cancer deaths, in the United States in 2010. It was estimated that in 2010, 116,750 men and 105,770 women would be diagnosed with lung cancer, and 86,220 men and 71,080 women would die of lung cancer (Jemal, CA Cancer J Clin 2010; 60: 277). Among men in the United States, lung cancer is the second most common cancer among white, black, Asian/Pacific Islander, American Indian/Alaska Native, and Hispanic men. Among women in the United States, lung cancer is the second most common cancer among white, black, and American Indian/Alaska Native women, and the third most common cancer among Asian/Pacific Islander and Hispanic women. For those who do not quit smoking, the chance of dying of lung cancer is 15%, and it remains above 5% even for those who quit at age 50-59. The annual healthcare cost of lung cancer in the United States alone is $95 billion.
Ninety-one percent of lung cancers caused by smoking are non-small cell lung cancers (NSCLC), which represent about 85% of all lung cancers. The remaining 15% of all lung cancers are small cell lung cancers, although mixed cell lung cancers do occur. Because small cell lung cancer is rare and rapidly fatal, there is little opportunity for early detection.
There are three main types of NSCLC: squamous cell carcinoma, large cell carcinoma, and adenocarcinoma. Adenocarcinoma is the most common form of lung cancer (30%-65%) and is the lung cancer most frequently found in both smokers and non-smokers. Squamous cell carcinoma accounts for 25-30% of all lung cancers and is commonly found in the proximal bronchi. Early NSCLC tends to be localized, and if detected early, it can often be treated by surgery, with favorable outcomes and improved survival. Other treatment options include radiation therapy, drug therapy, and combinations of these methods.
NSCLC is staged by tumor size and by its presence in other tissues, including lymph nodes. In the occult stage, cancer cells can be found in sputum or lavage samples, but no tumor can be detected in the lung. In stage 0, only the innermost lining of the lung shows cancer cells, and the tumor has not grown through the lining. In stage IA, the cancer is considered locally invasive and has grown deep into lung tissue, but the tumor is less than 3 cm in diameter. At this stage, the tumor is not found in the main bronchi or lymph nodes. In stage IB, the tumor is larger than 3 cm in diameter or has grown into the bronchi or pleura, but has not yet grown into the lymph nodes. In stage IIA, the tumor is less than 7 cm in diameter and may have grown into lymph nodes. In stage IIB, the tumor has been found in lymph nodes and is greater than 5 cm in diameter, or has grown into the bronchi or pleura; or the cancer is not found in the lymph nodes but is found in the chest wall, diaphragm, pleura, bronchi, or tissue surrounding the heart, or a separate tumor nodule is present in the same lobe of the lung. In stage IIIA, cancer cells are found in lymph nodes near the lungs and bronchi and in those between the lungs, but on the side of the chest where the tumor is located. In stage IIIB, cancer cells are found on the opposite side of the chest from the tumor or in the neck. Other organs near the lung may also have cancer cells, and multiple tumors may be found in one lobe of the lung. In stage IV, tumors are found in more than one lobe of the same lung or in both lungs, and cancer cells are found in other parts of the body.
Current methods for lung cancer diagnosis include sputum testing for cancer cells, chest X-ray, fiberoptic evaluation and biopsy of the airways, and low-dose spiral computed tomography (CT). Sputum cytology has very low sensitivity. Chest X-ray is also relatively insensitive, requiring lesions larger than 1 cm to be visible. Bronchoscopy requires that the tumor be visible inside an airway accessible to the bronchoscope. The most widely accepted diagnostic method is low-dose chest CT, but, as with X-ray, CT involves ionizing radiation, which can itself cause cancer. CT also has significant limitations: scans require a high level of expertise to interpret, many of the observed abnormalities are not in fact lung cancer, and following up CT findings generates substantial healthcare costs. The most common incidental finding is a benign lung nodule.
Pulmonary nodules are relatively round lesions, or areas of abnormal tissue, located within the lung, and they vary in size. Pulmonary nodules can be benign or cancerous, but most are benign. If a nodule is smaller than 4 mm, the prevalence of malignancy is only 1.5%; if it is 4-8 mm, the prevalence is approximately 6%; and if it exceeds 20 mm, the prevalence is approximately 20%. For small and medium-sized nodules, the patient is advised to undergo repeat scans over a period of three months to a year. For many large nodules, the patient undergoes a biopsy (which is invasive and can lead to complications), even though most of these nodules are benign.
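The size bands quoted above can be written as a small lookup. The following is a minimal sketch, not part of the application: the function name `malignancy_prevalence` is hypothetical, and because the text gives no figure for nodules between 8 mm and 20 mm, the sketch returns `None` for that band rather than interpolating.

```python
from typing import Optional


def malignancy_prevalence(diameter_mm: float) -> Optional[float]:
    """Approximate malignancy prevalence for a nodule of the given diameter.

    Only the bands stated in the text are encoded. The text gives no
    figure for nodules larger than 8 mm and up to 20 mm, so None is
    returned there instead of a guessed value.
    """
    if diameter_mm <= 0:
        raise ValueError("diameter must be positive")
    if diameter_mm < 4:
        return 0.015  # below 4 mm: 1.5%
    if diameter_mm <= 8:
        return 0.06   # 4-8 mm: approximately 6%
    if diameter_mm > 20:
        return 0.20   # above 20 mm: approximately 20%
    return None       # 8-20 mm: not stated in the text
```

For example, `malignancy_prevalence(6)` returns `0.06`, while `malignancy_prevalence(12)` returns `None` because that band is not covered by the figures above.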
Therefore, there is a need for diagnostic methods that can replace or supplement CT, to reduce the number of surgical procedures performed and minimize the risk of surgical complications. In addition, even when lung nodules are absent or unknown, methods of detecting lung cancer in its early stages are needed to improve patient outcomes. Only 16% of lung cancer cases are diagnosed as localized, early-stage cancer, for which the 5-year survival rate is 46%; the remaining 84% are diagnosed at a late stage, for which the 5-year survival rate is only 13%. This demonstrates that relying on symptoms for diagnosis is not useful, because many symptoms are common to other lung diseases and usually appear only in the late stages of lung cancer. These symptoms include persistent cough, bloody sputum, chest pain, and recurrent bronchitis or pneumonia.
Where methods for early diagnosis of a cancer exist, the medical community generally recognizes the benefit. Cancers with widely used screening programs have the highest 5-year survival rates, such as breast cancer (88%) and colon cancer (65%), versus 16% for lung cancer. However, if the cancer is diagnosed at stage I through screening, up to 88% of lung cancer patients survive ten years or more. This confirms a clear need for diagnostic methods that can reliably detect early-stage NSCLC.
Biomarker selection for a specific disease state involves first identifying markers that show a measurable and statistically significant difference in the disease population compared to a control population for a specific medical application. Biomarkers may include secreted or shed molecules that parallel disease development or progression and that readily diffuse into the bloodstream from lung tissue, or from distal tissue in response to a lesion. They may also include proteins produced by cells in response to the tumor. The identified biomarker or panel of biomarkers is then typically clinically validated, or shown to be a reliable indicator for the intended use for which it was originally selected. Biomarkers can include small molecules, metabolites, peptides, proteins, and nucleic acids. Some of the key issues affecting biomarker identification include overfitting of the available data and bias in the data.
Various methods have been used in attempts to identify biomarkers and diagnose disease. For protein-based markers, these include two-dimensional electrophoresis, mass spectrometry, and immunoassay methods. For nucleic acid markers, these include mRNA expression profiling, microRNA profiling, FISH, serial analysis of gene expression (SAGE), and large-scale gene expression arrays.
The utility of two-dimensional electrophoresis is limited by low detection sensitivity; issues with protein solubility, charge, and hydrophobicity; gel reproducibility; and the possibility that a single spot represents multiple proteins. For mass spectrometry, depending on the format used, limitations revolve around sample processing and separation, sensitivity to low-abundance proteins, signal-to-noise considerations, and the inability to immediately identify the detected protein. Limitations of immunoassay approaches to biomarker discovery center on the inability of antibody-based multiplex assays to measure a large number of analytes. One might simply print an array of high-quality antibodies and, without sandwiches, measure the analytes bound to those antibodies. However, even very good antibodies are not stringent enough in selecting their binding partners to work in the context of blood or even cell extracts, because the proteins in these matrices have very different abundances. Thus, immunoassay-based biomarker discovery would require multiplexed ELISA assays (that is, sandwiches) to obtain sufficient stringency to measure many analytes simultaneously and decide which analytes are indeed biomarkers. Sandwich immunoassays do not scale to high content, so biomarker discovery using stringent sandwich immunoassays is not possible with standard array formats. Finally, antibody reagents are subject to substantial lot variability and reagent instability. The instant platform for protein biomarker discovery overcomes this problem.
Many of these methods rely on or require some type of sample fractionation prior to analysis. Thus, the sample preparation required to run a sufficiently powered study designed to identify or discover statistically relevant biomarkers in a series of well-defined sample populations is extremely difficult, costly, and time consuming. During fractionation, a wide range of variability can be introduced into the various samples. For example, a potential marker may be unstable to the process, the concentration of the marker may change, inappropriate aggregation or disaggregation may occur, and inadvertent sample contamination may occur, thereby obscuring the subtle changes expected in early disease.
It is widely recognized that biomarker discovery and detection methods using these techniques have serious limitations for identifying diagnostic biomarkers. These limitations include an inability to detect low-abundance biomarkers, an inability to consistently cover the entire dynamic range of the proteome, irreproducibility in sample processing and fractionation, and an overall lack of reproducibility and robustness of the method. Further, these studies have introduced bias into the data and have failed to adequately address the complexity of the sample populations, including the appropriate controls, in terms of the distribution and randomization required to identify and validate biomarkers within a target disease population.
Although efforts aimed at discovering new and effective biomarkers have been ongoing for decades, they have been largely unsuccessful. Biomarkers for various diseases have typically been identified in academic laboratories, usually through an accidental discovery made during basic research on some disease process. Based on the discovery and a small amount of clinical data, papers were published suggesting that a new biomarker had been identified. However, most of these proposed biomarkers have not been confirmed as real or useful biomarkers, mainly because the small number of clinical samples tested provides only weak statistical evidence that an effective biomarker has in fact been found. That is, the initial identification was not rigorous with respect to the basic elements of statistics. A search of the scientific literature shows that in each of the years 1994 through 2003, thousands of references directed to biomarkers were published. During the same period, however, the FDA approved at most three new protein biomarkers per year for diagnostic use, and in several years no new protein biomarkers were approved.
Based on the history of failed biomarker discovery efforts, mathematical theories have been proposed that further promote the general understanding that biomarkers for disease are rare and difficult to find. Biomarker research based on 2D gels or mass spectrometry supports these notions: very few useful biomarkers have been identified by these methods. However, it is usually overlooked that 2D gels and mass spectrometry measure proteins present in blood at concentrations of approximately 1 nM and higher, and that this population of proteins may well be the least likely to change with disease. Other than the instant biomarker discovery platform, no proteomic biomarker discovery platform exists that can accurately measure protein expression levels at much lower concentrations.
Much is known about the biochemical pathways of complex human biology. Many biochemical pathways culminate in, or are started by, secreted proteins that act locally within the pathology: for example, growth factors are secreted to stimulate the replication of other cells in the pathology, and other factors are secreted to evade the immune system. While many of these secreted proteins work in a paracrine fashion, some operate distally in the body. One skilled in the art with a basic understanding of biochemical pathways would understand that many pathology-specific proteins should exist in blood at concentrations below (even far below) the detection limits of 2D gels and mass spectrometry. What must precede the identification of this relatively abundant number of disease biomarkers is a proteomics platform that can analyze proteins at concentrations below those detectable by 2D gels or mass spectrometry.
Accordingly, there is a need for biomarkers, methods, devices, reagents, systems, and kits that enable: (a) screening high-risk smokers for lung cancer; (b) distinguishing benign pulmonary nodules from malignant pulmonary nodules; (c) detecting lung cancer biomarkers; and (d) diagnosing lung cancer.
Disclosure of Invention
The present application includes biomarkers, methods, reagents, devices, systems, and kits for the detection and diagnosis of cancer, and more particularly NSCLC. The biomarkers of the present application were identified using a multiplex aptamer-based assay, which is described in detail in example 1. By using the aptamer-based biomarker identification method described herein, the present application describes a surprisingly large number of NSCLC biomarkers that are useful for the detection and diagnosis of NSCLC, as well as a large number of cancer biomarkers that are useful for the detection and diagnosis of cancer more generally. In identifying these biomarkers, over 1000 proteins from hundreds of individual samples were measured, some of which were at concentrations in the low femtomolar range. This is about four orders of magnitude lower than biomarker discovery experiments done with 2D gels and/or mass spectrometry.
Although some of the NSCLC biomarkers may be used alone to detect and diagnose NSCLC, described herein are methods for grouping a plurality of NSCLC biomarker subsets that may be used as a biomarker panel. Once a single biomarker or subset of biomarkers has been identified, NSCLC detection or diagnosis in an individual can be accomplished using any assay platform or format that is capable of measuring differences in the levels of a selected biomarker or biomarkers in a biological sample.
However, it was only by using the aptamer-based biomarker identification method described herein, in which more than 1000 separate potential biomarker values were individually screened from a large number of individuals previously diagnosed as either having or not having NSCLC, that the NSCLC biomarkers disclosed herein could be identified. This discovery approach is in stark contrast to biomarker discovery from conditioned media or lysed cells, because it queries a more patient-relevant system that requires no translation to human pathology.
Thus, in one aspect of the present application, one or more biomarkers are provided for use, alone or in various combinations, to diagnose NSCLC or to permit the differential diagnosis of NSCLC from benign conditions, such as those found in individuals with indeterminate pulmonary nodules identified by CT scan or other imaging methods, to screen high-risk smokers for NSCLC, and to diagnose individuals with NSCLC. Exemplary embodiments include the biomarkers provided in table 1, which were identified as described above using the multiplex aptamer-based assay described generally in example 1 and more specifically in examples 2 and 5. The markers provided in table 1 are useful for diagnosing NSCLC in high-risk populations and for distinguishing benign pulmonary disease from NSCLC in individuals with indeterminate pulmonary nodules.
Although some of the NSCLC biomarkers may be used alone to detect and diagnose NSCLC, methods for grouping a plurality of NSCLC biomarker subsets, each of which may be used as a panel of two or more biomarkers, are also described herein. Accordingly, various embodiments of the present application provide combinations comprising N biomarkers, wherein N is at least two biomarkers. In other embodiments, N is selected as any number from 2-59 biomarkers.
In still other embodiments, N is selected as any number from 2-5, 2-10, 2-15, 2-20, 2-25, 2-30, 2-35, 2-40, 2-45, 2-50, 2-55, or 2-59. In other embodiments, N is selected as any number from 3-5, 3-10, 3-15, 3-20, 3-25, 3-30, 3-35, 3-40, 3-45, 3-50, 3-55, or 3-59. In other embodiments, N is selected as any number from 4-5, 4-10, 4-15, 4-20, 4-25, 4-30, 4-35, 4-40, 4-45, 4-50, 4-55, or 4-59. In other embodiments, N is selected as any number from 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, or 5-59. In other embodiments, N is selected as any number from 6-10, 6-15, 6-20, 6-25, 6-30, 6-35, 6-40, 6-45, 6-50, 6-55, or 6-59. In other embodiments, N is selected as any number from 7-10, 7-15, 7-20, 7-25, 7-30, 7-35, 7-40, 7-45, 7-50, 7-55, or 7-59. In other embodiments, N is selected as any number from 8-10, 8-15, 8-20, 8-25, 8-30, 8-35, 8-40, 8-45, 8-50, 8-55, or 8-59. In other embodiments, N is selected as any number from 9-10, 9-15, 9-20, 9-25, 9-30, 9-35, 9-40, 9-45, 9-50, 9-55, or 9-59. In other embodiments, N is selected from any number from 10-15, 10-20, 10-25, 10-30, 10-35, 10-40, 10-45, 10-50, 10-55, or 10-59. It should be understood that N may be selected to encompass ranges of similar or higher order.
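To give a sense of how many distinct panels these ranges of N cover, the following minimal sketch counts the combinations. It assumes a pool of 59 biomarkers, taken from the upper bound of N stated above; the names `TABLE_1_SIZE`, `panel_count`, and `panel_count_range` are hypothetical and used only for illustration.

```python
from math import comb

# Assumption: the biomarker pool has 59 members, the upper bound of N in the text.
TABLE_1_SIZE = 59


def panel_count(n: int, total: int = TABLE_1_SIZE) -> int:
    """Number of distinct unordered panels of exactly n biomarkers."""
    if not 2 <= n <= total:
        raise ValueError("n must be between 2 and the pool size")
    return comb(total, n)


def panel_count_range(low: int, high: int, total: int = TABLE_1_SIZE) -> int:
    """Number of distinct panels whose size lies in the range low..high inclusive."""
    if low > high:
        raise ValueError("low must not exceed high")
    return sum(panel_count(n, total) for n in range(low, high + 1))
```

Under this assumption there are 1,711 possible two-marker panels and 32,509 possible three-marker panels, so even the narrowest ranges listed above describe a large space of candidate panels.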
In another aspect, a method for diagnosing NSCLC in an individual is provided, the method comprising detecting, in a biological sample from the individual, at least one biomarker value corresponding to at least one biomarker selected from the group of biomarkers provided in table 1, wherein the individual is classified as having NSCLC based on the at least one biomarker value.
In another aspect, a method for diagnosing NSCLC in an individual is provided, the method comprising detecting biomarker values in a biological sample from the individual, each corresponding to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the likelihood of the individual having NSCLC is determined based on the biomarker values.
In another aspect, a method is provided for diagnosing NSCLC in an individual, the method comprising detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the individual is classified as having NSCLC based on the biomarker values, and wherein N = 2-10.
In another aspect, a method is provided for diagnosing NSCLC in an individual, the method comprising detecting biomarker values in a biological sample from the individual, each corresponding to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the likelihood of the individual having NSCLC is determined based on the biomarker values, and wherein N = 2-10.
In another aspect, a method for diagnosing that an individual does not have NSCLC is provided, the method comprising detecting, in a biological sample from the individual, at least one biomarker value that corresponds to at least one biomarker selected from the group of biomarkers set forth in table 1, wherein the individual is classified as not having NSCLC based on the at least one biomarker value.
In another aspect, a method is provided for diagnosing that an individual does not have NSCLC, the method comprising detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the individual is classified as not having NSCLC based on the biomarker values, and wherein N = 2-10.
In another aspect, a method for diagnosing NSCLC is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in table 1, wherein a classification of the biomarker values indicates that the individual has NSCLC, and wherein N = 3-10.
In another aspect, a method for diagnosing NSCLC is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a biomarker panel selected from the group of biomarkers set forth in tables 2-11, wherein a classification of the biomarker values indicates that the individual has NSCLC.
In another aspect, a method for diagnosing the absence of NSCLC is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in table 1, wherein a classification of the biomarker values indicates the absence of NSCLC in the individual, and wherein N = 3-10.
In another aspect, a method for diagnosing the absence of NSCLC is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a biomarker panel selected from the group of biomarkers provided in tables 2-11, wherein a classification of the biomarker values indicates the absence of NSCLC in the individual.
In another aspect, a method for diagnosing NSCLC in an individual is provided, the method comprising detecting, in a biological sample from the individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the individual is classified as having NSCLC based on a classification score that deviates from a predetermined threshold, and wherein N = 2-10.
In another aspect, a method for diagnosing the absence of NSCLC in an individual is provided, the method comprising detecting, in a biological sample from the individual, biomarker values that correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein the individual is classified as not having NSCLC based on a classification score that deviates from a predetermined threshold, and wherein N = 2-10.
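The comparison of a classification score against a predetermined threshold can be illustrated with a minimal sketch. The scoring model here is an assumption: this section does not say how the score is computed, so a logistic function of a weighted sum is swapped in purely for illustration. The marker names, weights, intercept, and threshold are all hypothetical.

```python
from math import exp
from typing import Mapping


def classification_score(values: Mapping[str, float],
                         weights: Mapping[str, float],
                         intercept: float = 0.0) -> float:
    """Logistic score in (0, 1) from a weighted sum of biomarker values.

    Assumption: a logistic model stands in for the unspecified scoring
    method; weights and intercept are hypothetical.
    """
    missing = set(weights) - set(values)
    if missing:
        raise KeyError(f"missing biomarker values: {sorted(missing)}")
    z = intercept + sum(weights[name] * values[name] for name in weights)
    return 1.0 / (1.0 + exp(-z))


def classify(values: Mapping[str, float],
             weights: Mapping[str, float],
             threshold: float = 0.5,
             intercept: float = 0.0) -> str:
    """Return 'NSCLC' when the score reaches the threshold, else 'no NSCLC'."""
    score = classification_score(values, weights, intercept)
    return "NSCLC" if score >= threshold else "no NSCLC"
```

In practice the threshold would be chosen on training data to trade sensitivity against specificity; the default of 0.5 here is only a placeholder.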
In another aspect, a computer-implemented method for indicating the likelihood of NSCLC is provided. The method comprises: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein N is as defined above; performing, with the computer, a classification of each of the biomarker values; and indicating a likelihood that the individual has NSCLC based on the plurality of classifications.
In another aspect, a computer-implemented method for classifying an individual as having or not having NSCLC is provided. The method comprises: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in table 1; performing, with the computer, a classification of each of the biomarker values; and indicating whether the individual has NSCLC based on the plurality of classifications.
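The three steps of the computer-implemented methods above (retrieve biomarker values, classify each value, combine the classifications) can be sketched as follows. This is a minimal illustration under stated assumptions: each biomarker is classified against a hypothetical cutoff with a hypothetical direction of change, and the classifications are combined as a simple vote fraction, which is not a calibrated probability. The names `Rule`, `classify_each`, and `vote_fraction` are not from the application.

```python
from typing import Dict, Mapping, Tuple

# Hypothetical per-biomarker rule: (cutoff, True if elevated values suggest disease).
Rule = Tuple[float, bool]


def classify_each(values: Mapping[str, float],
                  rules: Mapping[str, Rule]) -> Dict[str, bool]:
    """Classify each biomarker value on its own; True means 'suggests NSCLC'."""
    calls: Dict[str, bool] = {}
    for name, (cutoff, elevated_in_disease) in rules.items():
        value = values[name]  # raises KeyError if a measurement is missing
        calls[name] = value > cutoff if elevated_in_disease else value < cutoff
    return calls


def vote_fraction(calls: Mapping[str, bool]) -> float:
    """Fraction of per-biomarker classifications that suggest NSCLC."""
    if not calls:
        raise ValueError("at least one classification is required")
    return sum(calls.values()) / len(calls)
```

A usage example: with rules `{"marker_a": (1.0, True), "marker_b": (2.0, False), "marker_c": (0.5, True)}` and values `{"marker_a": 1.5, "marker_b": 2.5, "marker_c": 0.9}`, two of the three per-biomarker classifications suggest NSCLC, so `vote_fraction` returns two thirds.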
In another aspect, a computer program product for indicating the likelihood of NSCLC is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 1, wherein N is defined above; and code for performing a classification method that indicates a likelihood that the individual has NSCLC based on the biomarker values.
In another aspect, a computer program product for indicating NSCLC status in an individual is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in table 1; and code for performing a classification method that indicates the NSCLC status of the individual based on the biomarker values.
In another aspect, a computer-implemented method for indicating the likelihood of NSCLC is provided. The method comprises the following steps: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values corresponding to biomarkers selected from the group of biomarkers set forth in table 1; performing classification of the biomarker values with a computer; and indicating a likelihood that the individual has NSCLC based on the classification.
In another aspect, a computer-implemented method for classifying an individual as having or not having NSCLC is provided. The method includes retrieving, by a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values corresponding to biomarkers selected from the group of biomarkers provided in table 1; performing classification of the biomarker values with a computer; and indicating whether the individual has NSCLC based on the classification.
In another aspect, a computer program product for indicating the likelihood of NSCLC is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample corresponding to biomarkers selected from the group of biomarkers set forth in table 1; and code for performing a classification method that indicates a likelihood that the individual has NSCLC based on the biomarker values.
In another aspect, a computer program product for indicating NSCLC status in an individual is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample corresponding to biomarkers selected from the group of biomarkers provided in table 1; and code for performing a classification method that indicates the NSCLC status of the individual based on the biomarker values.
Although some of the biomarkers may also be used individually to detect and diagnose cancer in general, methods are described herein for grouping a plurality of biomarker subsets that can be used as a biomarker panel to detect and diagnose cancer in general. Once a single biomarker or subset of biomarkers has been identified, cancer detection or diagnosis in an individual can be accomplished using any assay platform or format that is capable of measuring differences in the levels of a selected biomarker or biomarkers in a biological sample.
However, it was only by using the aptamer-based biomarker identification method described herein, in which more than 1000 separate potential biomarker values were individually screened from a large number of individuals previously diagnosed as either having or not having cancer, that the cancer biomarkers disclosed herein could be identified. This discovery approach is in stark contrast to biomarker discovery from conditioned media or lysed cells, because it queries a more patient-relevant system that requires no translation to human pathology.
Thus, in one aspect of the present application, one or more biomarkers are provided for use, alone or in various combinations, to diagnose cancer. Exemplary embodiments include the biomarkers provided in table 19, which were identified using the aptamer-based multiplex assay described generally in example 1 and more particularly in example 6. The markers provided in table 19 can be used to distinguish individuals with cancer from those without cancer.
Although some of the cancer biomarkers may be used individually to detect and diagnose cancer, methods for grouping a plurality of cancer biomarker subsets, each of which may be used as a panel of three or more biomarkers, are also described herein. Accordingly, various embodiments of the present application provide combinations comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected as any number from 3-23 biomarkers.
In still other embodiments, N is selected from any number from 2-5, 2-10, 2-15, 2-20, or 2-23. In other embodiments, N is selected as any number from 3-5, 3-10, 3-15, 3-20, or 3-23. In other embodiments, N is selected as any number from 4-5, 4-10, 4-15, 4-20, or 4-23. In other embodiments, N is selected as any number from 5-10, 5-15, 5-20, or 5-23. In other embodiments, N is selected as any number from 6-10, 6-15, 6-20, or 6-23. In other embodiments, N is selected as any number from 7-10, 7-15, 7-20, or 7-23. In other embodiments, N is selected as any number from 8-10, 8-15, 8-20, or 8-23. In other embodiments, N is selected as any number from 9-10, 9-15, 9-20, or 9-23. In other embodiments, N is selected to be any number from 10-15, 10-20, or 10-23. It should be understood that N may be selected to encompass ranges of similar or higher order.
In another aspect, a method for diagnosing cancer in an individual is provided that includes detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the likelihood of the individual having cancer is determined based on the biomarker values, and wherein N = 2-10.
In another aspect, a method for diagnosing cancer in an individual is provided that includes detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the likelihood of the individual having cancer is determined based on the biomarker values.
In another aspect, a method for diagnosing cancer in an individual is provided that includes detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the individual is classified as having cancer based on the biomarker values, and wherein N = 3-10.
In another aspect, a method for diagnosing cancer in an individual is provided that includes detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the likelihood of the individual having cancer is determined based on the biomarker values, and wherein N = 3-10.
In another aspect, a method for diagnosing that an individual does not have cancer is provided that includes detecting, in a biological sample from an individual, at least one biomarker value that corresponds to at least one biomarker selected from the group of biomarkers set forth in table 19, wherein the individual is classified as not having cancer based on the at least one biomarker value.
In another aspect, a method is provided for diagnosing that an individual does not have cancer, the method comprising detecting, in a biological sample from the individual, biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the individual is classified as not having cancer based on the biomarker values, and wherein N = 3-10.
In another aspect, a method for diagnosing cancer is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in table 19, wherein a classification of the biomarker values indicates that the individual has cancer, and wherein N = 3-10.
In another aspect, a method for diagnosing cancer is provided that includes detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a biomarker panel selected from the group of biomarkers set forth in tables 20-29, wherein a classification of the biomarker values indicates that the individual has cancer.
In another aspect, a method for diagnosing the absence of cancer is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a panel of N biomarkers, wherein the biomarkers are selected from the group of biomarkers set forth in table 19, wherein a classification of the biomarker values indicates the absence of cancer in the individual, and wherein N = 3-10.
In another aspect, a method for diagnosing the absence of cancer is provided, the method comprising detecting, in a biological sample from an individual, biomarker values that each correspond to a biomarker on a biomarker panel selected from the group of biomarkers provided in tables 20-29, wherein a classification of the biomarker values indicates the absence of cancer in the individual.
In another aspect, a method for diagnosing cancer in an individual is provided that includes detecting, in a biological sample from the individual, a biomarker value that corresponds to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the individual is classified as having cancer based on a classification score derived from a predetermined threshold, and wherein N = 3-10.
In another aspect, a method for diagnosing the absence of cancer in an individual is provided, the method comprising detecting, in a biological sample from an individual, a biomarker value corresponding to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein the individual is classified as not having cancer based on a classification score derived from a predetermined threshold, and wherein N = 3-10.
In another aspect, a computer-implemented method for indicating likelihood of cancer is provided. The method comprises the following steps: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values each corresponding to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein N is defined above; performing respective classifications of the biomarker values with a computer; and indicating a likelihood that the individual has cancer based on the plurality of classifications.
In another aspect, a computer-implemented method for classifying an individual as having or not having cancer is provided. The method comprises the following steps: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in table 19; performing respective classifications of the biomarker values with a computer; and indicating whether the individual has cancer based on the plurality of classifications.
In another aspect, a computer program product for indicating likelihood of cancer is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample that each correspond to one of at least N biomarkers selected from the group of biomarkers set forth in table 19, wherein N is defined above; and code that performs a classification method that indicates a likelihood that the individual has cancer based on the biomarker values.
In another aspect, a computer program product for indicating a cancer status of an individual is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample that each correspond to one of at least N biomarkers selected from the group of biomarkers provided in table 19; and code for performing a classification method that indicates a cancer status of the individual as a function of the biomarker values.
In another aspect, a computer-implemented method for indicating likelihood of cancer is provided. The method comprises the following steps: retrieving, on a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values corresponding to biomarkers selected from the group of biomarkers set forth in table 19; performing classification of the biomarker values with a computer; and indicating the likelihood of the individual having cancer based on the classification method.
In another aspect, a computer-implemented method for classifying an individual as having or not having cancer is provided. The method includes retrieving, by a computer, biomarker information for an individual, wherein the biomarker information comprises biomarker values corresponding to biomarkers selected from the group of biomarkers provided in table 19; performing classification of the biomarker values with a computer; and indicating whether the individual has cancer based on the classification.
In another aspect, a computer program product for indicating likelihood of cancer is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample corresponding to biomarkers selected from the group of biomarkers set forth in table 19; and code that performs a classification method that indicates a likelihood that the individual has cancer based on the biomarker values.
In another aspect, a computer program product for indicating a cancer status of an individual is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values in the biological sample corresponding to biomarkers selected from the group of biomarkers provided in table 19; and code for performing a classification method that indicates a cancer status of the individual as a function of the biomarker values.
Drawings
Fig. 1A is a flow diagram of an exemplary method for detecting NSCLC in a biological sample.
FIG. 1B is a flow diagram of an exemplary method for detecting NSCLC in a biological sample using a naive Bayes classification method.
Figure 2 shows a ROC curve for a single biomarker MMP7 using a naive bayes classifier for testing for NSCLC.
Figure 3 shows ROC curves for biomarker panels of from two to ten biomarkers using a naive Bayes classifier for testing for NSCLC.
Figure 4 illustrates the increase in classification score (AUC) as the number of biomarkers is increased from one to ten using naive Bayes classification for an NSCLC panel.
Figure 5 shows the measured biomarker distributions for MMP7 as a cumulative distribution function (cdf) in log-transformed RFU for the pooled smoker and benign pulmonary nodule controls (solid line) and the NSCLC disease group (dashed line), along with their curve fits to a normal cdf (dashed line) used to train the naive Bayes classifier.
FIG. 6 illustrates an exemplary computer system for use with the various computer-implemented methods described herein.
Fig. 7 is a flow diagram of a method of indicating a likelihood that an individual has NSCLC, according to one embodiment.
Fig. 8 is a flow diagram of a method of indicating a likelihood that an individual has NSCLC, according to one embodiment.
Figure 9 illustrates an exemplary aptamer assay that can be used to detect one or more NSCLC biomarkers in a biological sample.
Figure 10 shows a histogram of the frequency of biomarkers from an aggregated set of potential biomarkers in constructing a classifier for distinguishing NSCLC from smokers and a benign pulmonary nodule control group.
Fig. 11A shows a pair of histograms summarizing all possible single-protein naive bayes classifier scores (AUC) using the biomarkers (black) and random marker sets (grey) set forth in table 1.
Fig. 11B shows a pair of histograms summarizing all possible two-protein naive bayes classifier scores (AUC) using the biomarkers (black) and random marker sets (gray) set forth in table 1.
Fig. 11C shows a pair of histograms summarizing all possible three-protein naive bayes classifier scores (AUC) using the biomarkers (black) and random marker sets (gray) set forth in table 1.
Figure 12 shows the AUC for naive Bayes classifiers using 2-10 markers selected from the full panel, and the scores obtained by dropping the best 5, 10 and 15 markers during classifier generation.
Fig. 13A shows a set of ROC curves modeled from the data in table 14 for panels of from two to five markers.
Fig. 13B shows a set of ROC curves calculated from the training data for panels of from two to five markers as in fig. 12.
FIG. 14 shows ROC curves calculated from the clinical biomarker panel described in example 5.
Fig. 15A and 15B show performance comparisons between the ten cancer biomarkers selected by the greedy selection procedure described in example 6 (table 19) and 1,000 randomly sampled sets of ten "non-marker" biomarkers. The mean AUC for the ten cancer biomarkers in table 19 is shown as the vertical dashed line. In FIG. 15A, the sets of ten "non-markers" were randomly selected from biomarkers that were not selected by the greedy selection procedure described in example 6. In fig. 15B, the same procedure as in fig. 15A was used; however, sampling was limited to the remaining 49 NSCLC biomarkers from table 1 that were not selected by the greedy selection procedure described in example 6.
Fig. 16 shows Receiver Operating Characteristic (ROC) curves for the 3 naive Bayes classifiers set forth in table 31. For each study, the area under the curve (AUC) is also shown next to the legend.
Detailed Description
Reference will now be made in detail to the representative embodiments of the present invention. While the invention will be described in conjunction with the illustrated embodiments, it will be understood that it is not intended to limit the invention to these embodiments. On the contrary, the invention is intended to cover all alternatives, modifications and equivalents, which may be included within the scope of the invention as defined by the appended claims.
Those skilled in the art will recognize many methods and materials similar or equivalent to those described herein, which can be used in the practice of the present invention and are within the scope of the practice of the present invention. The present invention is in no way limited to the methods and materials described.
Unless defined otherwise, technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although the preferred methods, devices, and materials are now described, any methods, devices, and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention.
All publications, published patent documents, and patent applications cited in this application are indicative of the level of skill of one or more areas of art to which this application pertains. All publications, published patent documents and patent applications cited herein are incorporated by reference to the same extent as if each individual publication, published patent document or patent application were specifically and individually indicated to be incorporated by reference.
As used in this application, including the appended claims, the singular forms "a," "an," and "the" include plural references and are used interchangeably with "at least one" and "one or more" unless the content clearly dictates otherwise. Thus, reference to "an aptamer" includes mixtures of aptamers, reference to "a probe" includes mixtures of probes, and the like.
As used herein, the term "about" represents insignificant numerical modifications or variations such that the essential function of the item to which the value relates is unchanged.
As used herein, the terms "comprises," "comprising," "includes," "including," "contains," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, product-by-process, or composition of matter that comprises, includes, or contains an element or list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, product-by-process, or composition of matter.
The present application includes biomarkers, methods, devices, reagents, systems and kits for detecting and diagnosing NSCLC and more generally cancer.
In one aspect, one or more biomarkers are provided for use, alone or in various combinations, to diagnose NSCLC, to allow differential diagnosis of NSCLC and non-malignant conditions found in individuals with indeterminate lung nodules identified using CT scans or other imaging methods, to screen high risk smokers for NSCLC, and to diagnose individuals with NSCLC, to identify NSCLC recurrence, or to address other clinical indications. As described in detail below, exemplary embodiments include the biomarkers provided in table 1, which are identified using the aptamer-based multiplex assay described generally in example 1 and more particularly in example 2.
Table 1 sets forth the findings obtained from analyzing hundreds of individual blood samples from NSCLC cases and hundreds of equivalent individual control blood samples from high-risk smokers and individuals with benign lung nodules. The smoker and benign pulmonary nodule control groups were designed to match the populations for which an NSCLC diagnostic test may be of greatest benefit, including asymptomatic individuals and symptomatic individuals. These cases and controls were obtained from multiple clinical sites to simulate the range of real-world conditions under which such a test could be applied. The potential biomarkers were measured in individual samples rather than in pooled disease and control blood; this allows a better understanding of the individual and group variations in the phenotypes associated with the presence and absence of disease (in this case NSCLC). Because more than 1000 protein measurements were made on each sample, and several hundred samples from each of the disease and control populations were individually measured, table 1 results from the analysis of an uncommonly large data set. The measurements were analyzed using the methods described in the section "classification of biomarkers and calculation of disease scores" herein. Table 1 lists 59 biomarkers found to be useful in distinguishing samples obtained from individuals with NSCLC from "control" samples obtained from smokers and individuals with benign lung nodules.
Although some of the NSCLC biomarkers may be used individually to detect and diagnose NSCLC, methods are also described herein for grouping multiple subsets of the NSCLC biomarkers, where each grouping or subset selection is useful as a panel of three or more biomarkers, referred to herein interchangeably as a "biomarker panel" and a "panel." Accordingly, various embodiments of the present application provide combinations comprising N biomarkers, wherein N is at least two biomarkers. In other embodiments, N is selected from 2-59 biomarkers.
In still other embodiments, N is selected as any number from 2-5, 2-10, 2-15, 2-20, 2-25, 2-30, 2-35, 2-40, 2-45, 2-50, 2-55, or 2-59. In other embodiments, N is selected as any number from 3-5, 3-10, 3-15, 3-20, 3-25, 3-30, 3-35, 3-40, 3-45, 3-50, 3-55, or 3-59. In other embodiments, N is selected as any number from 4-5, 4-10, 4-15, 4-20, 4-25, 4-30, 4-35, 4-40, 4-45, 4-50, 4-55, or 4-59. In other embodiments, N is selected as any number from 5-10, 5-15, 5-20, 5-25, 5-30, 5-35, 5-40, 5-45, 5-50, 5-55, or 5-59. In other embodiments, N is selected as any number from 6-10, 6-15, 6-20, 6-25, 6-30, 6-35, 6-40, 6-45, 6-50, 6-55, or 6-59. In other embodiments, N is selected as any number from 7-10, 7-15, 7-20, 7-25, 7-30, 7-35, 7-40, 7-45, 7-50, 7-55, or 7-59. In other embodiments, N is selected as any number from 8-10, 8-15, 8-20, 8-25, 8-30, 8-35, 8-40, 8-45, 8-50, 8-55, or 8-59. In other embodiments, N is selected as any number from 9-10, 9-15, 9-20, 9-25, 9-30, 9-35, 9-40, 9-45, 9-50, 9-55, or 9-59. In other embodiments, N is selected from any number from 10-15, 10-20, 10-25, 10-30, 10-35, 10-40, 10-45, 10-50, 10-55, or 10-59. It should be understood that N may be selected to encompass ranges of similar or higher order.
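To give a sense of how many candidate panels the ranges above span, the following minimal sketch counts the distinct unordered panels of N biomarkers that can be drawn from the 59 biomarkers of table 1. The function name `panel_count` and the constant `TABLE_1_SIZE` are illustrative assumptions and do not appear in the application; the count is plain combinatorics and says nothing about how well any given panel performs.

```python
from math import comb

# Number of biomarkers listed in table 1, as stated in the text.
TABLE_1_SIZE = 59


def panel_count(n_markers: int, pool_size: int = TABLE_1_SIZE) -> int:
    """Return the number of distinct unordered panels of n_markers
    that can be drawn from a pool of pool_size biomarkers."""
    if not 0 <= n_markers <= pool_size:
        raise ValueError("n_markers must be between 0 and pool_size")
    return comb(pool_size, n_markers)


if __name__ == "__main__":
    # Print the panel count for a few of the panel sizes named above.
    for n in (2, 3, 5, 10):
        print(f"N = {n}: {panel_count(n)} possible panels")
```

The rapid growth of this count with N is one reason a selection procedure, rather than exhaustive enumeration, is typically needed for larger panels.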
In one embodiment, the number of biomarkers useful for a biomarker subset or panel is based on the sensitivity and specificity values for the particular combination of biomarker values. The terms "sensitivity" and "specificity" are used herein with respect to the ability to correctly classify an individual as having NSCLC or not having NSCLC based on the values of one or more biomarkers detected in the individual's biological sample. "Sensitivity" indicates the performance of the one or more biomarkers with respect to correctly classifying individuals who have NSCLC. "Specificity" indicates the performance of the one or more biomarkers with respect to correctly classifying individuals who do not have NSCLC. For example, 85% specificity and 90% sensitivity for a panel of markers used to test a set of control samples and NSCLC samples indicates that 85% of the control samples were correctly classified as control samples by the panel and 90% of the NSCLC samples were correctly classified as NSCLC samples by the panel. The desired or preferred minimum value can be determined as described in example 3. Representative panels are set forth in tables 4-11, which set forth a series of 100 different panels of 3-10 biomarkers having the indicated levels of specificity and sensitivity for each panel. The total number of occurrences of each marker in each of these panels is indicated in table 12.
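The definitions of sensitivity and specificity above can be illustrated with a short calculation. The sketch below is an assumption-laden illustration, not part of the application: the function name `sensitivity_specificity` and the boolean encoding (True for NSCLC, False for control) are hypothetical choices made for the example.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    has_nsclc: Sequence[bool], classified_nsclc: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for paired truth/prediction labels.

    has_nsclc[i] is True when sample i truly comes from an NSCLC case;
    classified_nsclc[i] is True when the panel classified it as NSCLC.
    """
    if len(has_nsclc) != len(classified_nsclc):
        raise ValueError("label sequences must have the same length")
    tp = fn = tn = fp = 0
    for truth, predicted in zip(has_nsclc, classified_nsclc):
        if truth and predicted:
            tp += 1  # NSCLC sample correctly classified as NSCLC
        elif truth:
            fn += 1  # NSCLC sample missed by the panel
        elif predicted:
            fp += 1  # control sample wrongly classified as NSCLC
        else:
            tn += 1  # control sample correctly classified as control
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one NSCLC sample and one control")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical data: 20 NSCLC samples (18 detected), 20 controls (17 cleared).
    truth = [True] * 20 + [False] * 20
    calls = [True] * 18 + [False] * 2 + [False] * 17 + [True] * 3
    sens, spec = sensitivity_specificity(truth, calls)
    print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

With these made-up counts the panel reproduces the 90% sensitivity and 85% specificity used in the example above.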
In one aspect, NSCLC in an individual is detected or diagnosed by performing an assay on a biological sample from the individual and detecting biomarker values that each correspond to at least one of the biomarkers MMP7, CLIC1, or STX1A and at least N additional biomarkers selected from the list of biomarkers in table 1, wherein N is equal to 2, 3, 4, 5, 6, 7, 8, or 9. In a further aspect, NSCLC in an individual is detected or diagnosed by performing an assay on a biological sample from the individual and detecting biomarker values that each correspond to one of the biomarkers MMP7, CLIC1, or STX1A and at least N additional biomarkers selected from the list of biomarkers in table 1, wherein N is equal to 1, 2, 3, 4, 5, 6, or 7. In a further aspect, NSCLC is detected or diagnosed in an individual by performing an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker MMP7 and one of at least N additional biomarkers selected from the list of biomarkers in table 1, wherein N is equal to 2, 3, 4, 5, 6, 7, 8, or 9. In a further aspect, NSCLC is detected or diagnosed in an individual by performing an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker CLIC1 and one of at least N additional biomarkers selected from the list of biomarkers in table 1, wherein N is equal to 2, 3, 4, 5, 6, 7, 8, or 9. In a further aspect, NSCLC is detected or diagnosed in an individual by performing an assay on a biological sample from the individual and detecting biomarker values that each correspond to the biomarker STX1A and one of at least N additional biomarkers selected from the list of biomarkers in table 1, wherein N is equal to 2, 3, 4, 5, 6, 7, 8, or 9.
The NSCLC biomarkers identified herein represent a relatively large number of choices for subsets or panels of biomarkers that can be used to effectively detect or diagnose NSCLC. The selection of the desired number of such biomarkers depends on the particular combination of biomarkers chosen. It is important to remember that a biomarker panel used to detect or diagnose NSCLC may also include biomarkers not found in table 1, and that the inclusion of additional biomarkers not found in table 1 may reduce the number of biomarkers in the particular subset or panel that is selected from table 1. The number of biomarkers from table 1 used in a subset or panel can also be reduced if additional biomedical information is used in conjunction with the biomarker values to establish acceptable sensitivity and specificity values for a given assay.
Another factor that may affect the number of biomarkers to be used in a biomarker subset or panel is the procedure used to obtain biological samples from individuals who are being diagnosed for NSCLC. In a carefully controlled sample procurement environment, the number of biomarkers necessary to meet the desired sensitivity and specificity values will be lower than in a situation where there may be more variation in sample collection, handling and storage. In developing the biomarker list set forth in table 1, multiple sample collection sites were used to collect the data for classifier training. This yields biomarkers that are more robust and less sensitive to variations in sample collection, handling and storage, but it may also require a larger number of biomarkers in a subset or panel than if the training data had all been obtained under very similar conditions.
One aspect of the present application may be generally described with respect to fig. 1A and 1B. The biological sample is obtained from one or more individuals of interest. The biological sample is then assayed to detect the presence of one or more (N) biomarkers of interest, and the respective biomarker values for the N biomarkers (referred to as markers RFU in fig. 1B) are determined. Once the biomarkers have been detected and the biomarker values assigned, each marker is scored or classified as described in detail herein. The marker scores are then combined to provide a total diagnostic score that indicates the likelihood that the individual from which the sample was obtained has NSCLC.
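The scoring step described above can be sketched in code. The following is a minimal illustration of one common way to combine per-marker scores, a naive Bayes log-likelihood ratio with normal class-conditional densities on log-transformed RFU; it is offered as a sketch under stated assumptions rather than as the application's exact procedure. The class `MarkerModel`, the function names, the marker names "marker_A" and "marker_B", and every numeric parameter are hypothetical placeholders and are not values from table 1 or from any example.

```python
import math
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class MarkerModel:
    """Normal fits to log-RFU for one marker (all values hypothetical)."""
    name: str
    control_mean: float
    control_sd: float
    disease_mean: float
    disease_sd: float


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Natural log of the normal density with the given mean and sd at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def marker_score(log_rfu: float, model: MarkerModel) -> float:
    """Per-marker score: log-likelihood ratio of disease versus control."""
    return (log_normal_pdf(log_rfu, model.disease_mean, model.disease_sd)
            - log_normal_pdf(log_rfu, model.control_mean, model.control_sd))


def total_score(log_rfu_by_marker: Dict[str, float],
                models: Iterable[MarkerModel],
                prior_disease: float = 0.5) -> float:
    """Sum the per-marker scores and add the log prior odds.

    Summing is what the naive (independence) assumption amounts to.
    """
    score = math.log(prior_disease / (1.0 - prior_disease))
    for model in models:
        score += marker_score(log_rfu_by_marker[model.name], model)
    return score


def probability_from_score(score: float) -> float:
    """Convert a log-odds score into a probability of disease."""
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    # Hypothetical two-marker panel and one hypothetical sample.
    panel = [MarkerModel("marker_A", 8.0, 0.5, 9.0, 0.5),
             MarkerModel("marker_B", 7.0, 0.4, 6.5, 0.4)]
    sample = {"marker_A": 8.9, "marker_B": 6.6}
    s = total_score(sample, panel)
    print(f"total score = {s:.3f}, probability = {probability_from_score(s):.3f}")
```

A positive total score favors the disease class and a negative score favors the control class; comparing the score against a chosen threshold is one way to turn it into a classification.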
As used herein, "lung" may be interchangeably referred to as "pulmonary".
As used herein, "smoker" refers to an individual who has a history of tobacco smoke inhalation.
"biological sample," "sample," and "test sample" are used interchangeably herein to refer to any material, biological fluid, tissue, or cell obtained or otherwise derived from an individual. This includes blood (including whole blood, leukocytes, peripheral blood mononuclear cells, buffy coat, plasma and serum), sputum, tears, mucus, nasal washes, nasal aspirates, respiratory (break), urine, semen, saliva, peritoneal washes, cyst fluid, meningeal fluid, amniotic fluid, glandular fluid, lymph fluid, cytological fluid, ascites, pleural fluid, nipple aspirates, bronchial scrubs, synovial fluid, joint aspirates, organ secretions, cells, cell extracts, and cerebrospinal fluid. This also includes all the aforementioned fractions that were experimentally separated. For example, a blood sample may be fractionated into serum, plasma, or a fraction containing a particular type of blood cells, such as red blood cells or white blood cells (leukocytes). If desired, the sample can be a combination of samples from an individual, such as a combination of a tissue and a fluid sample. The term "biological sample" also includes, for example, a material containing homogenized solid material, such as from a stool sample, a tissue sample, or a tissue biopsy. The term "biological sample" also includes materials derived from tissue culture or cell culture. Any suitable method for obtaining a biological sample may be employed; exemplary methods include, for example, phlebotomy, swab (e.g., cheek swab), and fine needle aspiration biopsy procedures. Exemplary tissues that are sensitive to fine needle aspiration include lymph nodes, lungs, lung washes, BAL (bronchoalveolar lavage), pleura, thyroid, breast, pancreas, and liver. Samples can also be collected, for example, by microdissection (e.g., Laser Capture Microdissection (LCM) or Laser Microdissection (LMD)), bladder wash, smear (e.g., PAP smear), or ductal lavage. 
"biological sample" obtained or derived from an individual includes any such sample that has been processed in any suitable manner after having been obtained from the individual.
Further, it is recognized that biological samples may be obtained by obtaining biological samples from a number of individuals and combining them or combining aliquots of the biological samples of each individual. Pooled samples may be treated as samples from a single individual and, if the presence of cancer is determined in pooled samples, each individual biological sample may be retested to determine which individual or individuals have NSCLC.
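The pool-then-retest workflow described above can be sketched as follows. This is a hedged illustration only: the function `pooled_screen` and the two callables `pool_is_positive` and `sample_is_positive` are hypothetical stand-ins for running the assay and classifier on a pooled sample and on an individual sample, and the sketch assumes an idealized pooled assay whose result can be trusted.

```python
from typing import Callable, List, Sequence


def pooled_screen(
    sample_ids: Sequence[str],
    pool_size: int,
    pool_is_positive: Callable[[List[str]], bool],
    sample_is_positive: Callable[[str], bool],
) -> List[str]:
    """Test samples in pools; retest members individually only when
    their pool is positive. Returns the ids of the positive individuals."""
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    positives: List[str] = []
    for start in range(0, len(sample_ids), pool_size):
        pool = list(sample_ids[start:start + pool_size])
        if pool_is_positive(pool):
            # Pool flagged: retest each contributing individual separately.
            positives.extend(s for s in pool if sample_is_positive(s))
    return positives


if __name__ == "__main__":
    # Hypothetical ground truth standing in for real assay results.
    truth = {"s1": False, "s2": True, "s3": False, "s4": False}
    found = pooled_screen(
        list(truth), 2,
        pool_is_positive=lambda pool: any(truth[s] for s in pool),
        sample_is_positive=lambda s: truth[s],
    )
    print(found)
```

In practice, dilution of a positive sample within a pool may reduce the pooled signal, so the idealized `pool_is_positive` behavior assumed here would need to be validated for any real assay.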
For the purposes of this specification, the phrase "data attributed to a biological sample from an individual" means that the data in some form is derived from or generated using the biological sample of the individual. The data may be reformatted, corrected or mathematically altered to some extent after having been generated, for example by being converted from units in one measurement system to units in another measurement system; however, data is understood to have been derived from or generated using a biological sample.
"target," "target molecule," and "analyte" are used interchangeably herein to refer to any molecule of interest that may be present in a biological sample. "molecule of interest" includes any minor change to a particular molecule, for example in the case of a protein, for example in amino acid sequence, disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation or any other manipulation or modification, for example conjugation to a marker component which does not substantially alter the properties of the molecule. A "target molecule," "target," or "analyte" is a copy of a class of molecules or multi-molecular structures or a collection of classes of molecules or multi-molecular structures. "target molecule," "target," and "analyte" refer to more than one such collection of molecules. Exemplary target molecules include proteins, polypeptides, nucleic acids, carbohydrates, lipids, polysaccharides, glycoproteins, hormones, receptors, antigens, antibodies, affibodies, autoantibodies, antibody mimetics, viruses, pathogens, toxic substances, substrates, metabolites, transition state analogs, cofactors, inhibitors, drugs, dyes, nutrients, growth factors, cells, tissues, and any fragments or portions of any of the foregoing.
As used herein, "polypeptide," "peptide," and "protein" are used interchangeably herein to refer to a polymer of amino acids of any length. The polymer may be linear or branched, it may comprise modified amino acids, and it may be interrupted by non-amino acids. The term also encompasses amino acid polymers that have been modified, either naturally or by intervention; such as disulfide bond formation, glycosylation, lipidation, acetylation, phosphorylation or any other manipulation or modification, such as conjugation to a labeling component. Also included within this definition are, for example, polypeptides containing one or more amino acid analogs (including, for example, unnatural amino acids, etc.), as well as other modifications known in the art. The polypeptide may be a single chain or related chain. Also included within this definition are preproteins and intact mature proteins; a peptide or polypeptide derived from a mature protein; a fragment of a protein; a splice variant; a protein in recombinant form; protein variants having amino acid modifications, deletions or substitutions; digesting the product; and post-translational modifications, such as glycosylation, acetylation, phosphorylation, and the like.
As used herein, "marker" and "biomarker" are used interchangeably to refer to a target molecule that indicates, or is a sign of, a normal or abnormal process in an individual or a disease or other condition in an individual. More specifically, a "marker" or "biomarker" is an anatomical, physiological, biochemical, or molecular parameter associated with the presence of a particular physiological state or process, whether normal or abnormal, and, if abnormal, whether chronic or acute. Biomarkers are detectable and measurable by a variety of methods, including laboratory assays and medical imaging. When the biomarker is a protein, the expression of the corresponding gene can also be used as a surrogate measure of the amount or the presence or absence of the corresponding protein biomarker in the biological sample, as can the methylation state of the gene encoding the biomarker or of the proteins that control expression of the biomarker.
As used herein, "biomarker value," "biomarker level," and "level" are used interchangeably to refer to a measurement that is made using any analytical method for detecting the biomarker in a biological sample and that indicates the presence, absence, absolute amount or concentration, relative amount or concentration, titer, level, expression level, ratio of measured levels, or the like, of, for, or corresponding to the biomarker in the biological sample. The exact nature of the "value" or "level" depends on the specific design and components of the particular analytical method used to detect the biomarker.
When a biomarker indicates, or is a sign of, an abnormal process or a disease or other condition in an individual, that biomarker is generally described as being over-expressed or under-expressed as compared to an expression level or value of the biomarker that indicates, or is a sign of, a normal process or the absence of a disease or other condition in an individual. "Up-regulation," "up-regulated," "over-expression," "over-expressed," and any variations thereof are used interchangeably to refer to a biomarker value or level in a biological sample that is greater than the biomarker value or level (or range of values or levels) typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a biomarker value or level in a biological sample that is greater than the biomarker value or level (or range of values or levels) that may be detected at a different stage of a particular disease.
"Down-regulated," "under-expressed," and any variations thereof are used interchangeably to refer to a biomarker value or level in a biological sample that is less than the biomarker value or level (or range of values or levels) typically detected in similar biological samples from healthy or normal individuals. The terms may also refer to a biomarker value or level in a biological sample that is less than the biomarker value or level (or range of values or levels) that may be detected at a different stage of a particular disease.
Further, a biomarker that is either over-expressed or under-expressed can also be referred to as being "differentially expressed" or as having a "differential level" or "differential value" as compared to a "normal" expression level or value of the biomarker that indicates, or is a sign of, a normal process or the absence of a disease or other condition in an individual. Thus, "differential expression" of a biomarker can also be referred to as a variation from the "normal" expression level of the biomarker.
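The relationship among these terms can be illustrated with a minimal sketch. It assumes that a reference range for healthy or normal individuals is already available; the function names, the range bounds, and the numeric values are hypothetical and are not taken from the present application.

```python
def classify_level(value: float, normal_low: float, normal_high: float) -> str:
    """Compare a biomarker value with a (hypothetical) normal reference range."""
    if normal_low > normal_high:
        raise ValueError("normal_low must not exceed normal_high")
    if value > normal_high:
        return "over-expressed"
    if value < normal_low:
        return "under-expressed"
    return "within normal range"


def is_differentially_expressed(value: float, normal_low: float, normal_high: float) -> bool:
    """A value is differentially expressed if it is over- or under-expressed."""
    return classify_level(value, normal_low, normal_high) != "within normal range"


# Illustrative values only (arbitrary units).
print(classify_level(12.0, 2.0, 8.0))
print(is_differentially_expressed(5.0, 2.0, 8.0))
```

In practice the reference could equally be the range observed at a different stage of the same disease, as the definitions above allow.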
The terms "differential gene expression" and "differential expression" are used interchangeably to refer to a gene (or its corresponding protein expression product) whose expression is activated to a higher or lower level in a subject with a particular disease relative to its expression in a normal or control subject. The terms also include genes (or the corresponding protein expression products) whose expression is activated to a higher or lower level at different stages of the same disease. It is also understood that a differentially expressed gene may be activated or inhibited at the nucleic acid level or the protein level, or may undergo alternative splicing to result in a different polypeptide product. Such differences can be evidenced by a variety of changes, including changes in mRNA levels, surface expression, secretion, or other partitioning of a polypeptide. Differential gene expression may include a comparison of expression between two or more genes or their gene products; a comparison of the ratios of expression between two or more genes or their gene products; or even a comparison of two differently processed products of the same gene, which differ between normal subjects and subjects with disease or between stages of the same disease. Differential expression includes both quantitative and qualitative differences in the temporal or cellular expression pattern of a gene or its expression products among, for example, normal and diseased cells, or among cells that have undergone different disease events or disease stages.
As used herein, "individual" refers to a test subject or patient. The subject may be a mammal or a non-mammal. In various embodiments, the subject is a mammal. The mammalian subject may be a human or non-human. In various embodiments, the subject is a human. A healthy or normal individual is one in which a disease or condition of interest (including, for example, a lung disease, lung-related disease, or other lung condition) cannot be detected by conventional diagnostic methods.
"Diagnosis" and variations thereof refer to the detection, determination, or identification of a health state or condition of an individual based on one or more signs, symptoms, data, or other information associated with the individual. The health state of an individual may be diagnosed as healthy/normal (i.e., diagnosis of the absence of a disease or condition) or as diseased/abnormal (i.e., diagnosis of the presence of a disease or condition, or assessment of the characteristics of a disease or condition). The terms "diagnosis" and the like with respect to a particular disease or condition encompass initial detection of the disease; characterization or classification of the disease; detection of disease progression, remission, or recurrence; and detection of disease response following administration of a treatment or therapy to the individual. Diagnosis of NSCLC involves distinguishing individuals who have cancer from individuals who do not. It further includes differentiating smokers and benign lung nodules from NSCLC.
"Prognosis" and variations thereof refer to the prediction of the future course of a disease or condition in an individual having the disease or condition (e.g., predicting patient survival), and such terms encompass the assessment of disease response following administration of a treatment or therapy to the individual.
"Assessing" and variations thereof include "diagnosis" and "prognosis" and also include determination or prediction of the future course of a disease or condition in an individual who does not have the disease, as well as determination or prediction of the likelihood that the disease or condition will recur in an individual who apparently has been cured of the disease. The term "assessing" also encompasses evaluating the response of an individual to treatment, such as predicting whether an individual is likely to respond favorably or is unlikely to respond to a therapeutic agent (or will experience toxicity or other undesirable side effects, for example), selecting a therapeutic agent for administration to an individual, or monitoring or determining the response of an individual to a therapy that has been administered to the individual. Thus, "assessing" NSCLC may include, for example, any of the following: predicting a future course of NSCLC in the individual; predicting NSCLC recurrence in an individual who apparently has been cured of NSCLC; determining or predicting the response of the individual to a treatment for NSCLC; or selecting a treatment for NSCLC to administer to the individual based on the determination of the biomarker values derived from a biological sample of the individual.
Any of the following examples may be referred to as "diagnosing" or "assessing" NSCLC: initially detecting the presence or absence of NSCLC; determining a particular stage, type or subtype, or other classification or characteristic of NSCLC; determining whether a suspicious pulmonary nodule or mass is benign or is malignant NSCLC; or detecting/monitoring NSCLC progression (e.g., monitoring tumor growth or metastatic spread), remission, or recurrence.
As used herein, "additional biomedical information" refers to one or more assessments of an individual, other than using any of the biomarkers described herein, that are associated with a risk of cancer or, more specifically, a risk of NSCLC. "Additional biomedical information" includes any of the following: physical descriptors of the individual, physical descriptors of pulmonary nodules observed by CT imaging, height and/or weight of the individual, sex of the individual, ethnicity of the individual, smoking history, occupational history, exposure to known carcinogens (e.g., any of asbestos, radon gas, chemicals, smoke from fires, and air pollution, which may include emissions from stationary or mobile sources, such as industrial/factory or automobile/marine/aircraft emissions), exposure to second-hand smoke, family history of NSCLC (or other cancers), presence of pulmonary nodules, nodule size, nodule location, nodule morphology (e.g., as observed by CT imaging: ground glass opacity (GGO), solid, non-solid), edge characteristics of nodules (e.g., smooth, lobulated, sharp and smooth, spiculated, infiltrating), and the like. Smoking history is usually quantified in terms of "pack-years," which refers to the number of years an individual has smoked multiplied by the average number of packs smoked per day. For example, an individual who has smoked, on average, one pack of cigarettes per day for 35 years is said to have a smoking history of 35 pack-years. Additional biomedical information may be obtained from the individual using conventional techniques known in the art, for example, from the individual themselves by use of a conventional patient questionnaire or health history questionnaire, or from a medical practitioner, etc. Alternatively, the additional biomedical information may be derived from conventional imaging techniques, including CT imaging (e.g., low-dose CT imaging) and X-ray.
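The smoking-history quantity described above (years smoked multiplied by the average number of packs smoked per day, commonly called pack-years) can be computed as in the following minimal sketch. The function name is hypothetical and is used here only for illustration.

```python
def pack_years(packs_per_day: float, years_smoked: float) -> float:
    """Smoking history: average packs smoked per day multiplied by years smoked."""
    if packs_per_day < 0 or years_smoked < 0:
        raise ValueError("inputs must be non-negative")
    return packs_per_day * years_smoked


# Example from the text: one pack per day for 35 years.
print(pack_years(1, 35))  # 35
```

An individual who smoked half a pack per day for 20 years would, by the same arithmetic, have a history of 10 pack-years.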
Testing of biomarker levels in combination with an evaluation of any additional biomedical information may, for example, improve the sensitivity, specificity, and/or AUC for detecting NSCLC (or for other NSCLC-related uses) compared to biomarker testing alone or to evaluating any particular item of additional biomedical information alone (e.g., CT imaging alone).
The term "area under the curve" or "AUC" refers to the area under a Receiver Operating Characteristic (ROC) curve, both of which are well known in the art. AUC measures are useful for comparing the accuracy of a classifier across the complete data range. A classifier with a greater AUC has a greater capacity to classify unknowns correctly between two groups of interest (e.g., NSCLC samples and normal or control samples). ROC curves are useful for plotting the performance of a particular feature (e.g., any of the biomarkers described herein and/or any item of additional biomedical information) in distinguishing between two populations (e.g., cases having NSCLC and controls without NSCLC). Typically, the feature data across the entire population (e.g., the cases and controls) are sorted in ascending order based on the value of a single feature. Then, for each value of that feature, the true positive and false positive rates for the data are calculated. The true positive rate is determined by counting the number of cases above the value for that feature and then dividing by the total number of cases. The false positive rate is determined by counting the number of controls above the value for that feature and then dividing by the total number of controls. Although this definition refers to scenarios in which a feature is elevated in cases compared to controls, it also applies to scenarios in which a feature is lower in cases compared to controls (in such a scenario, samples below the value for that feature are counted). ROC curves can be generated for a single feature as well as for other single outputs; for example, a combination of two or more features can be mathematically combined (e.g., added, subtracted, multiplied, etc.) to provide a single sum value, and this single sum value can be plotted in a ROC curve. Additionally, any combination of multiple features, in which the combination yields a single output value, can be plotted in a ROC curve.
The combination of features may comprise a test. The ROC curve is a plot of the true positive rate (sensitivity) of the test versus the false positive rate (1-specificity) of the test.
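The ROC construction described above can be sketched as follows. This is a minimal illustration, assuming a single feature that is elevated in cases relative to controls; the function names and the sample values are hypothetical, and the area is approximated with the trapezoidal rule, which is one common choice rather than a method specified in the present application.

```python
from typing import List, Sequence, Tuple


def roc_points(cases: Sequence[float], controls: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    For each observed feature value, cases and controls strictly above that
    value are counted and divided by the total cases and controls.
    """
    thresholds = sorted(set(cases) | set(controls))
    points = [(1.0, 1.0)]  # threshold below all values: every sample is called positive
    for t in thresholds:
        tpr = sum(v > t for v in cases) / len(cases)
        fpr = sum(v > t for v in controls) / len(controls)
        points.append((fpr, tpr))
    return sorted(points)


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Trapezoidal area under a curve given as sorted (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical feature values (arbitrary units); one case and one control tie at 3.
cases = [3, 4, 5]
controls = [1, 2, 3]
print(round(auc(roc_points(cases, controls)), 3))  # 0.944
```

For a feature that is lower in cases than in controls, the same sketch applies after counting samples below each threshold instead of above it, or simply after negating the feature values.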
As used herein, "detection" or "determination" with respect to a biomarker value includes the use of both the instrument required to observe and record a signal corresponding to the biomarker value and the material or materials required to generate that signal. In various embodiments, biomarker values are detected using any suitable method, including fluorescence, chemiluminescence, surface plasmon resonance, surface acoustic wave, mass spectrometry, infrared spectroscopy, Raman spectroscopy, atomic force microscopy, scanning tunneling microscopy, electrochemical detection methods, nuclear magnetic resonance, quantum dots, and the like.
"Solid support" herein refers to any substrate having a surface to which molecules can be attached, directly or indirectly, by covalent or non-covalent bonds. A "solid support" may have a variety of physical forms, which may include, for example, a membrane; a chip (e.g., a protein chip); a slide (e.g., a glass slide or coverslip); a column; a hollow, solid, semi-solid, pore- or cavity-containing particle, such as a bead; a gel; a fiber, including a fiber optic material; a matrix; and a sample container. Exemplary sample containers include sample wells, tubes, capillaries, vials, and any other vessel, groove, or recess capable of holding a sample. The sample containers may be contained on a multi-sample platform, such as a microtiter plate, a slide, a microfluidic device, and the like. The support may be composed of natural or synthetic materials, organic or inorganic materials. The composition of the solid support to which the capture reagent is attached generally depends on the method of attachment (e.g., covalent attachment). Other exemplary containers include microdroplets and microfluidic-controlled or bulk oil/water emulsions in which assays and related operations can occur. Suitable solid supports include, for example, plastics, resins, polysaccharides, silica or silica-based materials, functionalized glasses, modified silicon, carbon, metals, inorganic glasses, membranes, nylon, natural fibers (e.g., silk, wool, and cotton), polymers, and the like. The material constituting the solid support may comprise reactive groups for attaching the capture reagent, such as carboxyl, amino, or hydroxyl groups.
Polymeric solid supports may include, for example, polystyrene, polyethylene terephthalate, polyvinyl acetate, polyvinyl chloride, polyvinyl pyrrolidone, polyacrylonitrile, polymethyl methacrylate, polytetrafluoroethylene, butyl rubber, styrene butadiene rubber, natural rubber, polyethylene, polypropylene, (poly)tetrafluoroethylene, (poly)vinylidene fluoride, polycarbonate, and polymethylpentene. Suitable solid support particles that may be used include, for example, encoded particles, such as Luminex-type encoded particles, magnetic particles, and glass particles.
Exemplary uses of biomarkers
In various exemplary embodiments, methods are provided for diagnosing NSCLC in an individual by detecting one or more biomarker values corresponding to one or more biomarkers present in the circulation, e.g., serum or plasma, of the individual via any number of analytical methods, including any of the analytical methods described herein. These biomarkers are differentially expressed, for example, in individuals with NSCLC, as compared to individuals without NSCLC. Detection of differential expression of biomarkers in an individual may be used, for example, to allow early diagnosis of NSCLC, to distinguish benign and malignant lung nodules (such as those observed on Computed Tomography (CT) scans), to monitor NSCLC recurrence, or for other clinical indications.
Any of the biomarkers described herein can be used in a variety of clinical indications for NSCLC, including any of the following: detection of NSCLC (e.g., in high-risk individuals or populations); characterizing NSCLC (e.g., determining NSCLC type, subtype, or stage), e.g., by distinguishing between non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) and/or between adenocarcinoma and squamous cell carcinoma (or otherwise facilitating histopathology); determining whether a lung nodule is a benign nodule or a malignant lung tumor; determining NSCLC prognosis; monitoring NSCLC progression or remission; monitoring for NSCLC recurrence; monitoring metastasis; selecting treatment; monitoring the response to a therapeutic agent or other treatment; stratifying individuals for computed tomography (CT) screening (e.g., identifying those individuals at greater risk of NSCLC and thus most likely to benefit from spiral CT screening, thereby increasing the positive predictive value of CT); combining biomarker testing with additional biomedical information, such as smoking history, etc., or with nodule size, morphology, etc. (e.g., to provide an assay with increased diagnostic performance compared to CT testing or biomarker testing alone); facilitating the diagnosis of a pulmonary nodule as malignant or benign; facilitating clinical decision-making once a pulmonary nodule is observed on CT (e.g., ordering repeat CT scans if the nodule is deemed low-risk, such as if the biomarker-based test is negative, with or without categorization of nodule size, or considering a biopsy if the nodule is deemed medium- to high-risk, such as if the biomarker-based test is positive, with or without categorization of nodule size); and facilitating decisions regarding clinical follow-up (e.g., whether to perform repeat CT scans, fine needle biopsy, nodule resection, or thoracotomy after observing a non-calcified nodule on CT).
Biomarker testing can improve Positive Predictive Value (PPV) over CT or chest X-ray screening of high-risk individuals alone. In addition to their utility in conjunction with CT screening, the biomarkers described herein may also be used in conjunction with any other imaging modality used for NSCLC, such as chest X-ray, bronchoscopy or fluorescent bronchoscopy, MRI, or PET scanning. In addition, the biomarkers may permit some of these uses before indications of NSCLC are detected by imaging modalities or other clinical correlates, or before symptoms appear. These uses further include distinguishing among individuals with indeterminate pulmonary nodules identified by CT scanning or other imaging methods, screening of high-risk smokers for NSCLC, and diagnosing individuals with NSCLC.
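Positive predictive value depends on the sensitivity and specificity of a test and on the prevalence of disease in the tested population. The following minimal sketch uses the standard relationship PPV = TP / (TP + FP); the function name and all numeric values are hypothetical illustrations and are not performance figures from the present application.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    for name, x in (("sensitivity", sensitivity), ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence                   # fraction of population: diseased and test-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # fraction: not diseased but test-positive
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical illustration: at a fixed sensitivity, higher specificity
# (fewer false positives) or higher prevalence in the tested group raises PPV.
print(ppv(0.90, 0.80, 0.02) < ppv(0.90, 0.95, 0.02))  # True
print(ppv(0.90, 0.80, 0.02) < ppv(0.90, 0.80, 0.10))  # True
```

The second comparison reflects the stratification idea in the text: testing a subgroup in which disease is more prevalent yields a higher PPV for the same test.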
As an example of the manner in which any of the biomarkers described herein may be used to diagnose NSCLC, differential expression of one or more of the described biomarkers in an individual who is not known to have NSCLC may indicate that the individual has NSCLC, thereby enabling detection of NSCLC at an early stage of the disease, when treatment is most effective, perhaps before NSCLC is detected by other means or before symptoms appear. Over-expression of one or more of the biomarkers during the course of NSCLC can indicate NSCLC progression, e.g., that an NSCLC tumor is growing and/or metastasizing (and thus indicating a poor prognosis), while a decrease in the extent to which one or more of the biomarkers is differentially expressed (i.e., in subsequent biomarker tests, the expression level in the individual moves toward or approaches a "normal" expression level) can indicate NSCLC remission, e.g., that an NSCLC tumor is shrinking (and thus indicating a good or better prognosis). Similarly, an increase in the extent to which one or more of the biomarkers is differentially expressed during the course of treatment for NSCLC (i.e., in subsequent biomarker tests, the expression level in the individual moves further away from the "normal" expression level) may indicate that NSCLC is progressing and thus that the treatment is ineffective, while a decrease in the differential expression of one or more of the biomarkers during the course of treatment for NSCLC may indicate that NSCLC is in remission and thus that the treatment is working successfully. In addition, an increase or decrease in the differential expression of one or more of the biomarkers after the individual has apparently been cured of NSCLC can indicate a recurrence of NSCLC.
In such cases, for example, the individual can be restarted on treatment at an earlier stage than if the recurrence of NSCLC had not been detected until later (or, if the individual is on maintenance therapy, the treatment regimen can be modified, for example to increase the dose amount and/or frequency). In addition, the differential expression levels of one or more of the biomarkers in an individual can be predictive of the individual's response to a particular therapeutic agent. In monitoring for NSCLC recurrence or progression, a change in biomarker expression levels may indicate a need for repeat imaging (e.g., repeat CT scanning), for example to determine NSCLC activity or to determine the need for a change in treatment.
Detection of any of the biomarkers described herein may be particularly useful following or concurrent with NSCLC treatment, e.g., to assess success of treatment or to monitor NSCLC remission, recurrence and/or progression (including metastasis) following treatment. NSCLC treatment may include, for example, administration of a therapeutic agent to an individual, performance of surgery (e.g., surgical resection of at least a portion of a NSCLC tumor or removal of NSCLC and surrounding tissue), administration of radiation therapy, or any other type of NSCLC treatment used in the art, and any combination of these treatments. Lung cancer treatment may include, for example, administration of a therapeutic agent to an individual, performance of surgery (e.g., surgical resection of at least a portion of a lung tumor), administration of radiation therapy, or any other type of NSCLC treatment used in the art, and any combination of these treatments. For example, siRNA molecules are synthetic double-stranded RNA molecules that inhibit gene expression and can serve as targeted lung cancer therapeutics. For example, any of the biomarkers can be detected at least once after treatment, or can be detected multiple times after treatment (e.g., at periodic intervals), or can be detected both before and after treatment. 
Differential expression levels of any of the biomarkers in the individual over a period of time may indicate NSCLC progression, remission, or relapse, examples of which include any of the following: an increase or decrease in the expression level of the biomarker after treatment as compared to the expression level of the biomarker before treatment; an increase or decrease in the expression level of the biomarker at a later time point after treatment compared to the expression level of the biomarker at an earlier time point after treatment; and the differential expression level of the biomarker at a single time point after treatment compared to the normal level of the biomarker.
As a specific example, the biomarker levels for any of the biomarkers described herein can be determined in pre-operative and post-operative (e.g., 2-16 weeks post-operative) serum or plasma samples. An increase in the expression level of the one or more biomarkers in the post-operative sample compared to the pre-operative sample can indicate progression of NSCLC (e.g., unsuccessful surgery), while a decrease in the expression level of the one or more biomarkers in the post-operative sample compared to the pre-operative sample can indicate remission of NSCLC (e.g., successful surgical removal of lung tumor). Similar analysis of biomarker levels can be performed before and after other forms of treatment, such as before and after radiation therapy or administration of a therapeutic agent or cancer vaccine.
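The pre-operative versus post-operative comparison can be sketched as below. This is only an illustration under stated assumptions: it treats the biomarker as one that is over-expressed in NSCLC, and the function name and the `min_fold` threshold are hypothetical choices made for this sketch, not values from the present application.

```python
def compare_pre_post(pre_level: float, post_level: float, min_fold: float = 1.5) -> str:
    """Label the change in a biomarker level between two time points.

    min_fold is a hypothetical threshold: a change smaller than this fold
    difference in either direction is reported as no clear change.
    """
    if pre_level <= 0 or post_level <= 0:
        raise ValueError("levels must be positive")
    if min_fold <= 1.0:
        raise ValueError("min_fold must be greater than 1")
    fold = post_level / pre_level
    if fold >= min_fold:
        return "increase"   # per the text, may indicate progression
    if fold <= 1.0 / min_fold:
        return "decrease"   # per the text, may indicate remission
    return "no clear change"


# Hypothetical levels (arbitrary units) before and 2-16 weeks after surgery.
print(compare_pre_post(200.0, 100.0))
```

For a biomarker that is under-expressed in NSCLC, the interpretation of the two labels would be reversed; the same comparison applies before and after other forms of treatment.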
In addition to being used as a stand-alone diagnostic test, testing of biomarker levels can also be performed in conjunction with the determination of SNPs or other genetic lesions or variability indicative of increased risk of disease susceptibility (see, e.g., Amos et al, Nature Genetics 40, 616-622 (2009)).
In addition to being used as a stand-alone diagnostic test, testing of biomarker levels may also be performed in conjunction with radiological screening, such as CT screening. For example, the biomarkers may facilitate the medical and economic justification for implementing CT screening, such as for screening large asymptomatic populations at risk for NSCLC (e.g., smokers). For example, a "pre-CT" test of biomarker levels may be used to stratify high-risk individuals for CT screening, e.g., to identify those who are at highest risk for NSCLC based on their biomarker levels and who should be prioritized for CT screening. If a CT test is implemented, the biomarker levels of one or more biomarkers may be measured (e.g., as determined by an aptamer assay of serum or plasma samples), and the diagnostic score may be evaluated in conjunction with additional biomedical information (e.g., tumor parameters determined by the CT test) to enhance Positive Predictive Value (PPV) over CT or biomarker testing alone. A "post-CT" aptamer panel for determining biomarker levels can be used to determine the likelihood that a lung nodule observed by CT (or other imaging modality) is malignant or benign.
Detection of any of the biomarkers described herein can be used for post-CT testing. For example, biomarker testing may eliminate or reduce a significant number of false positive tests relative to CT alone. Further, biomarker testing may facilitate treatment of patients. For example, if a lung nodule is less than 5 mm in size, the results of biomarker testing may advance the patient from "watch and wait" to a biopsy at an earlier time; if a lung nodule is 5-9 mm, biomarker testing may eliminate the use of biopsy or thoracotomy on false positive scans; and if a lung nodule is larger than 10 mm, biomarker testing may eliminate surgery for the subpopulation of these patients with benign nodules. Eliminating the need for biopsy in some patients based on biomarker testing would be beneficial, because there is significant morbidity associated with nodule biopsy, and obtaining nodule tissue can be difficult depending on the location of the nodule. Similarly, eliminating the need for surgery in some patients, such as those whose nodules are actually benign, would avoid the unnecessary risks and costs associated with surgery.
In addition to testing biomarker levels in conjunction with radiological screening in high-risk individuals (e.g., assessing biomarker levels in conjunction with the size or other characteristics of lung nodules or masses observed on an imaging scan), information about biomarkers may also be assessed in conjunction with other types of data, particularly data indicative of an individual's risk with respect to NSCLC (e.g., patient clinical history, occupational exposure history, symptoms, family history of cancer, risk factors such as whether an individual is a smoker, and/or the status of other biomarkers, etc.). These various data may be evaluated by automated methods, such as computer programs/software, which may be embodied in a computer or other apparatus/device.
Any of the biomarkers can also be used in imaging tests. For example, an imaging agent may be conjugated to any of the biomarkers, which may be used to aid in NSCLC diagnosis, to monitor disease progression/remission or metastasis, to monitor disease recurrence, or to monitor response to therapy, among other uses.
Detection and determination of biomarkers and biomarker values
Biomarker values for the biomarkers described herein can be detected using any of a variety of known analytical methods. In one embodiment, the biomarker values are detected using a capture reagent. As used herein, "capture agent" or "capture reagent" refers to a molecule capable of specifically binding to a biomarker. In various embodiments, the capture reagent may be exposed to the biomarker in solution, or may be exposed to the biomarker while the capture reagent is immobilized on a solid support. In other embodiments, the capture reagent contains a feature that reacts with a secondary feature on the solid support. In these embodiments, the capture reagent may be exposed to the biomarker in solution, and then the feature on the capture reagent may be used in conjunction with the secondary feature on the solid support to immobilize the biomarker on the solid support. The capture reagent is selected based on the type of assay to be performed. Capture reagents include, but are not limited to, aptamers, antibodies, antigens, adnectins, ankyrins, other antibody mimetics and other protein scaffolds, autoantibodies, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, Fv fragments, single chain Fv fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, imprinted polymers, avimers, peptidomimetics, hormone receptors, cytokine receptors, and synthetic receptors, as well as modifications and fragments of these.
In some embodiments, biomarker values are detected using biomarker/capture reagent complexes.
In other embodiments, biomarker values are derived from biomarker/capture agent complexes and are detected indirectly, for example, due to a reaction that is subsequent to the biomarker/capture agent interaction but that relies on the formation of the biomarker/capture agent complex.
In some embodiments, the biomarker values are detected directly from the biomarkers in the biological sample.
In one embodiment, the biomarkers are detected using a multiplex format that allows for the simultaneous detection of two or more biomarkers in a biological sample. In one embodiment of the multiplex format, the capture reagent is immobilized directly or indirectly, covalently or non-covalently, in discrete locations on the solid support. In another embodiment, the multiplexed format uses discrete solid supports, wherein each solid support has a unique capture reagent, such as a quantum dot, associated with the solid support. In another embodiment, a single device is used to detect each of the multiple biomarkers to be detected in the biological sample. A single device may be configured to allow each biomarker in a biological sample to be processed simultaneously. For example, microtiter plates can be used such that each well in the plate is used to uniquely analyze one of the multiple biomarkers to be detected in a biological sample.
In one or more of the foregoing embodiments, a fluorescent tag can be used to label a component of the biomarker/capture complex to enable detection of a biomarker value. In various embodiments, the fluorescent label can be conjugated to a capture reagent specific for any of the biomarkers described herein using known techniques, and the fluorescent label can then be used to detect the corresponding biomarker value. Suitable fluorescent labels include rare earth chelates, fluorescein and its derivatives, rhodamine and its derivatives, dansyl, allophycocyanin, PBXL-3, Qdot 605, lissamine, phycoerythrin, Texas Red and other such compounds.
In one embodiment, the fluorescent label is a fluorescent dye molecule. In some embodiments, the fluorochrome molecule includes at least one substituted indolium ring system, wherein the substituent on the 3-carbon of the indolium ring contains a chemically reactive group or a conjugate species. In some embodiments, the dye molecule comprises an AlexaFluor molecule, such as AlexaFluor 488, AlexaFluor 532, AlexaFluor 647, AlexaFluor 680, or AlexaFluor 700. In other embodiments, the dye molecules include a first type and a second type of dye molecules, such as two different AlexaFluor molecules. In other embodiments, the dye molecules include a first type and a second type of dye molecules, and the two dye molecules have different emission spectra.
Fluorescence can be measured with a variety of instruments compatible with a wide range of assay formats. For example, spectrofluorometers have been designed to analyze microtiter plates, microscope slides, printed arrays, cuvettes, and the like. See Principles of Fluorescence Spectroscopy, by J.R. Lakowicz, Springer Science + Business Media, Inc., 2004. See also Bioluminescence & Chemiluminescence: Progress & Current Applications, edited by Philip E. Stanley and Larry J. Kricka, World Scientific Publishing Company, January 2002.
In one or more of the foregoing embodiments, a chemiluminescent label may optionally be used to label a component of the biomarker/capture complex to enable detection of the biomarker value. Suitable chemiluminescent materials include oxalyl chloride, rhodamine 6G, Ru(bipy)₃²⁺, TMAE (tetrakis(dimethylamino)ethylene), pyrogallol (1,2,3-trihydroxybenzene), lucigenin, peroxyoxalates, aryl oxalates, acridinium esters, dioxetanes, and others.
In still other embodiments, the detection method comprises an enzyme/substrate combination that generates a detectable signal corresponding to a biomarker value. Typically, the enzyme catalyzes a chemical change in the chromogenic substrate that can be measured using a variety of techniques, including spectrophotometry, fluorescence, and chemiluminescence. Suitable enzymes include, for example, luciferase, luciferin, malate dehydrogenase, urease, horseradish peroxidase (HRPO), alkaline phosphatase, β-galactosidase, glucoamylase, lysozyme, glucose oxidase, galactose oxidase, glucose-6-phosphate dehydrogenase, uricase, xanthine oxidase, lactoperoxidase, microperoxidase, and the like.
In still other embodiments, the detection method can be a combination of fluorescence, chemiluminescence, radionuclide or enzyme/substrate combination that generates a measurable signal. Multimodal signaling can have unique and advantageous features in biomarker assay formats.
More specifically, biomarker values for the biomarkers described herein can be detected using known analytical methods including singleplex aptamer assays, multiplexed aptamer assays, singleplex or multiplexed immunoassays, mRNA expression profiling, miRNA expression profiling, mass spectrometric analysis, histological/cytological methods, and the like, as described in detail below.
Determining biomarker values using aptamer-based assays
Assays directed to the detection and quantification of physiologically meaningful molecules in biological and other samples are important tools in the fields of scientific research and health care. One such class of assays involves the use of microarrays that include one or more aptamers immobilized on a solid support. Aptamers are each capable of binding to a target molecule in a highly specific manner and with very high affinity. See, e.g., U.S. Pat. No. 5,475,096 entitled "Nucleic Acid Ligands"; see also, for example, U.S. Pat. No. 6,242,246, U.S. Pat. No. 6,458,543, and U.S. Pat. No. 6,503,715, each entitled "Nucleic Acid Ligand Diagnostic Biochip". Once the microarray is contacted with the sample, the aptamers bind to their respective target molecules present in the sample, thereby enabling the determination of biomarker values corresponding to the biomarkers.
As used herein, "aptamer" refers to a nucleic acid having specific binding affinity for a target molecule. It is recognized that affinity interactions are a matter of degree; however, in this context, "specific binding affinity" of an aptamer for its target means that the aptamer generally binds to its target with a much higher degree of affinity than it binds to other components in the test sample. An "aptamer" is a collection of copies of a class or species of nucleic acid molecule having a particular nucleotide sequence. Aptamers can comprise any suitable number of nucleotides, including any number of chemically modified nucleotides. "Aptamers" refers to more than one such collection of molecules. Different aptamers may have the same or different numbers of nucleotides. Aptamers may be DNA or RNA or chemically modified nucleic acids, and may be single-stranded, double-stranded, or contain double-stranded regions, and may comprise higher order structures. Aptamers may also be photoaptamers, in which a photoreactive or chemically reactive functional group is included in the aptamer to allow covalent attachment to its corresponding target. Any of the aptamer methods disclosed herein can include the use of two or more aptamers that specifically bind to the same target molecule. As described further below, the aptamer may comprise a tag. If an aptamer includes a tag, then all copies of the aptamer need not have the same tag. Furthermore, if different aptamers each include a tag, these different aptamers may have the same tag or different tags.
Aptamers can be identified using any known method, including the SELEX process. Once identified, aptamers can be prepared or synthesized according to any known method, including chemical and enzymatic synthetic methods.
As used herein, "SOMAmer" or slow off-rate modified aptamer refers to an aptamer with improved off-rate characteristics. SOMAmers can be generated using the improved SELEX methods described in U.S. publication No. 2009/0004667 entitled "Method for Generating Aptamers with Improved Off-Rates".
The terms "SELEX" and "SELEX process" are used interchangeably herein and generally refer to a combination of: (1) the selection of aptamers that interact with the target molecule in a desired manner, e.g., bind to proteins with high affinity, and (2) the amplification of the selected nucleic acids. The SELEX process can be used to identify aptamers with high affinity for a particular target or biomarker.
SELEX generally involves preparing a candidate mixture of nucleic acids, binding the candidate mixture to a selected target molecule to form an affinity complex, separating the affinity complex from unbound candidate nucleic acids, dissociating and isolating the nucleic acids from the affinity complex, purifying the nucleic acids, and identifying specific aptamer sequences. The process may include multiple cycles to further improve the affinity of the selected aptamer. The process may include an amplification step at one or more points in the process. See, for example, U.S. Pat. No. 5,475,096 entitled "Nucleic Acid Ligands". The SELEX process can be used to generate aptamers that bind covalently to their target, as well as aptamers that bind non-covalently to their target. See, for example, U.S. Pat. No. 5,705,337 entitled "Systematic Evolution of Nucleic Acid Ligands by Exponential Enrichment: Chemi-SELEX".
The SELEX process can be used to identify high affinity aptamers containing modified nucleotides that impart improved characteristics to the aptamers, such as improved in vivo stability or improved delivery characteristics. Examples of such modifications include chemical substitutions at the ribose and/or phosphate and/or base positions. Aptamers containing modified nucleotides identified by the SELEX process are described in U.S. Pat. No. 5,660,985 entitled "High Affinity Nucleic Acid Ligands Containing Modified Nucleotides", which describes oligonucleotides containing nucleotide derivatives that are chemically modified at the 5'- and 2'-positions of pyrimidines. U.S. Pat. No. 5,580,737, supra, describes highly specific aptamers containing one or more nucleotides modified with 2'-amino (2'-NH2), 2'-fluoro (2'-F), and/or 2'-O-methyl (2'-OMe). See also U.S. patent application publication 2009/0098549 entitled "SELEX and PHOTOSELEX", which describes nucleic acid libraries having expanded physical and chemical properties and their use in SELEX and photoSELEX.
SELEX can also be used to identify aptamers with desirable off-rate characteristics. See U.S. patent application publication 2009/0004667 entitled "Method for Generating Aptamers with Improved Off-Rates", which describes improved SELEX methods for generating aptamers that can bind to target molecules. Methods for generating aptamers and photoaptamers having slower rates of dissociation from their respective target molecules are described. The methods involve contacting the candidate mixture with the target molecule, allowing nucleic acid-target complexes to form, and performing a slow off-rate enrichment process, wherein nucleic acid-target complexes with fast dissociation rates dissociate and do not re-form, while complexes with slow dissociation rates remain intact. In addition, the methods include the use of modified nucleotides in generating the candidate nucleic acid mixture to produce aptamers with improved off-rate performance.
Variations of this assay employ aptamers that include photoreactive functional groups that enable the aptamer to covalently bind or "photocrosslink" with its target molecule. See, for example, U.S. Pat. No. 6,544,776 entitled "Nucleic Acid Ligand Diagnostic Biochip". These photoreactive aptamers are also referred to as photoaptamers. See, e.g., U.S. Pat. No. 5,763,177, U.S. Pat. No. 6,001,577, and U.S. Pat. No. 6,291,184, each entitled "Systematic Evolution of Nucleic Acid Ligands by Exponential Enrichment: Photoselection of Nucleic Acid Ligands and Solution SELEX"; see also, for example, U.S. Pat. No. 6,458,539 entitled "Photoselection of Nucleic Acid Ligands". After the microarray is contacted with the sample and the photoaptamers have had an opportunity to bind to their target molecules, the photoaptamers are photoactivated and the solid support is washed to remove any non-specifically bound molecules. Stringent washing conditions can be used because target molecules bound to the photoaptamers are generally not removed, owing to the covalent bonds created by the one or more photoactivated functional groups on the photoaptamers. In this manner, the assay enables the detection of biomarker values corresponding to the biomarkers in the test sample.
In both assay formats, the aptamer is immobilized on a solid support prior to contact with the sample. However, in some cases, immobilizing the aptamer prior to contact with the sample may not provide an optimal assay. For example, pre-immobilization of an aptamer may result in inefficient mixing of the aptamer with the target molecule on the surface of the solid support, possibly resulting in lengthy reaction times and thus extended incubation periods to allow efficient binding of the aptamer to its target molecule. Further, when photoaptamers are used in assays and depending on the material used as the solid support, the solid support may tend to scatter or absorb the light used to effect the formation of covalent bonds between the photoaptamers and their target molecules. Furthermore, depending on the method employed, the detection of the target molecule bound to its aptamer may be imprecise, since the surface of the solid support may also be exposed to and affected by any labeling agent used. Finally, immobilization of aptamers on a solid support typically involves an aptamer preparation step (i.e., the immobilization) prior to exposure of the aptamer to the sample, and this preparation step may affect the activity or functionality of the aptamer.
Aptamer assays have also been described that allow aptamers to capture their target in solution and then employ a separation step designed to remove specific components of the aptamer-target mixture prior to detection (see U.S. patent application publication 2009/0042206 entitled "Multiplexed Analyses of Test Samples"). By detecting and quantifying nucleic acids (i.e., aptamers), the aptamer assay method enables the detection and quantification of non-nucleic acid targets (e.g., protein targets) in a test sample. The method generates nucleic acid surrogates (i.e., aptamers) for detecting and quantifying non-nucleic acid targets, thus allowing a wide variety of nucleic acid technologies, including amplification, to be applied to a wider range of desired targets, including protein targets.
Aptamers can be constructed to facilitate separation of assay components from aptamer biomarker complexes (or photoaptamer biomarker covalent complexes) and to allow separation of aptamers for detection and/or quantification. In one embodiment, these constructs may comprise cleavable or releasable elements in the aptamer sequence. In other embodiments, additional functionality may be introduced into the aptamer, such as a label or detectable component, a spacer component, or a specific binding tag or anchoring element. For example, the aptamer may include a tag linked to the aptamer via a cleavable moiety, a label, a spacer component that separates the labels, and a cleavable moiety. In one embodiment, the cleavable element is a photo-cleavable linker. The photocleavable linker may be attached to the biotin moiety and the spacer segment, may include an NHS group for derivatization of the amine, and may be used to introduce a biotin group into the aptamer, thereby allowing for later release of the aptamer in the assay method.
Homogeneous assays performed with all assay components in solution do not require separation of the sample from the reagents prior to detection of the signal. These methods are fast and easy to use. These methods generate a signal based on a molecular capture or binding agent that reacts with its specific target. For NSCLC, the molecular capture agent will be an aptamer, antibody, or the like, and the specific target will be the NSCLC biomarker of table 1.
In one embodiment, the method for signal generation utilizes anisotropic signal changes due to the interaction of fluorophore-labeled capture reagents with their specific biomarker targets. When the labeled capture reagent reacts with its target, the increased molecular weight causes the rotational motion of the fluorophore attached to the complex to become much slower, changing the anisotropy value. By monitoring the change in anisotropy, the binding events can be used to quantitatively measure biomarkers in solution. Other methods include fluorescence polarization assays, molecular beacon methods, time-resolved fluorescence quenching, chemiluminescence, fluorescence resonance energy transfer, and the like.
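As a non-limiting illustration of the anisotropy readout described above, the following Python sketch computes steady-state fluorescence anisotropy from polarized emission intensities and estimates the fraction of labeled capture reagent that is bound. The function names, the instrument correction factor (`g_factor`), and the two-state assumption (free versus bound reagent, with no change in fluorophore brightness on binding) are illustrative assumptions and are not taken from the present disclosure.

```python
def fluorescence_anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Steady-state anisotropy r = (I_par - G*I_perp) / (I_par + 2*G*I_perp).

    i_parallel / i_perpendicular: emission intensities measured through
    polarizers parallel / perpendicular to the excitation polarization.
    g_factor: instrument correction factor (assumed 1.0 if unknown).
    """
    corrected_perp = g_factor * i_perpendicular
    total = i_parallel + 2.0 * corrected_perp
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return (i_parallel - corrected_perp) / total


def fraction_bound(r_observed, r_free, r_bound):
    """Estimate the bound fraction of labeled capture reagent.

    Assumes a two-state mixture in which anisotropy is a linear,
    fraction-weighted average of the free and bound values
    (i.e., binding does not change fluorophore brightness).
    """
    if r_bound == r_free:
        raise ValueError("free and bound anisotropy must differ")
    f = (r_observed - r_free) / (r_bound - r_free)
    # Clamp to the physically meaningful range [0, 1].
    return min(1.0, max(0.0, f))
```

For example, `fluorescence_anisotropy(300.0, 100.0)` evaluates the ratio 200/500. In practice the free and bound reference values would be measured with reagent alone and with saturating target, respectively.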
An exemplary solution-based aptamer assay that can be used to detect biomarker values corresponding to biomarkers in a biological sample includes the steps of: (a) preparing a mixture by contacting a biological sample with an aptamer comprising a first tag and having a specific affinity for the biomarker, wherein when the biomarker is present in the sample, an aptamer affinity complex is formed; (b) exposing the mixture to a first solid support comprising a first capture element and allowing the first tag to bind to the first capture element; (c) removing any components of the mixture that are not bound to the first solid support; (d) attaching a second tag to the biomarker component of the aptamer affinity complex; (e) releasing the aptamer affinity complex from the first solid support; (f) exposing the released aptamer affinity complex to a second solid support comprising a second capture element and allowing the second tag to bind to the second capture element; (g) removing any uncomplexed aptamer from the mixture by partitioning the uncomplexed aptamer from the aptamer affinity complex; (h) eluting the aptamer from the solid support; and (i) detecting the biomarker by detecting the aptamer component of the aptamer affinity complex.
Any method known in the art can be used to detect a biomarker value by detecting the aptamer component of an aptamer affinity complex. Many different detection methods can be used to detect the aptamer component of the affinity complex, such as hybridization assays, mass spectrometry, or QPCR. In some embodiments, nucleic acid sequencing methods can be used to detect the aptamer component of an aptamer affinity complex, and thereby detect biomarker values. Briefly, any kind of nucleic acid sequencing method may be performed on a test sample to identify and quantify one or more sequences of one or more aptamers present in the test sample. In some embodiments, the sequence includes the entire aptamer molecule or any portion of the molecule that can be used to uniquely identify the molecule. In other embodiments, the identifying sequence is a specific sequence added to the aptamer; such sequences are commonly referred to as "tags", "barcodes", or "zip codes". In some embodiments, the sequencing method includes an enzymatic step to amplify the aptamer sequence or to convert any kind of nucleic acid (including RNA and DNA) containing chemical modifications at any position into any other kind of nucleic acid suitable for sequencing.
In some embodiments, the sequencing method comprises one or more cloning steps. In other embodiments, the sequencing method comprises a direct sequencing method without cloning.
In some embodiments, the sequencing method comprises a targeting method using specific primers that target one or more aptamers in the test sample. In other embodiments, the sequencing method comprises a shotgun method that targets all aptamers in the test sample.
In some embodiments, the sequencing method comprises an enzymatic step to amplify the molecules targeted for sequencing. In other embodiments, the sequencing method directly sequences a single molecule. An exemplary nucleic acid sequencing-based method that can be used to detect biomarker values corresponding to biomarkers in a biological sample includes the steps of: (a) converting the aptamer mixture containing chemically modified nucleotides into unmodified nucleic acids using an enzymatic step; (b) shotgun sequencing the resulting unmodified nucleic acids with a massively parallel sequencing platform, such as the 454 sequencing system (454 Life Sciences/Roche), the Illumina sequencing system (Illumina), the ABI SOLiD sequencing system (Applied Biosystems), the HeliScope single molecule sequencer (Helicos Biosciences), the Pacific Biosciences real-time single molecule sequencing system (Pacific Biosciences), or the Polonator G sequencing system (Dover Systems); and (c) identifying and quantifying the aptamers present in the mixture by specific sequence and sequence counting.
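By way of a non-limiting sketch, step (c) above (identifying and quantifying aptamers by specific sequence and sequence counting) could be expressed in Python as follows. The aptamer names and identifying sequences in the usage note are hypothetical placeholders, and exact substring matching is an assumed simplification; a practical pipeline would typically also handle sequencing errors and reverse complements.

```python
def count_aptamers(reads, identifying_sequences):
    """Count sequencing reads per aptamer.

    reads: iterable of read strings (A/C/G/T, any case).
    identifying_sequences: dict mapping an aptamer name to a sequence
        (whole aptamer, unique portion, or added barcode) that uniquely
        identifies it. Names and sequences here are hypothetical.
    Returns a dict of aptamer name -> number of matching reads.
    """
    counts = {name: 0 for name in identifying_sequences}
    for read in reads:
        read = read.upper()
        for name, seq in identifying_sequences.items():
            if seq.upper() in read:
                counts[name] += 1
                break  # assign each read to at most one aptamer
    return counts


def relative_abundance(counts):
    """Convert raw counts to fractions of all assigned reads."""
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: n / total for name, n in counts.items()}
```

A call such as `count_aptamers(reads, {"apt_A": "ACGTACGT", "apt_B": "TTGCAA"})` returns one count per named aptamer; reads matching no identifying sequence are simply left unassigned.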
Determination of biomarker values using immunoassay
Immunoassay methods are based on the reaction of an antibody with its corresponding target or analyte, and can detect an analyte in a sample according to a particular assay format. In order to improve the specificity and sensitivity of immunoreactivity-based assay methods, monoclonal antibodies are commonly used due to their specific epitope recognition. Polyclonal antibodies have also been successfully used in a variety of immunoassays due to their increased affinity for the target as compared to monoclonal antibodies. Immunoassays have been designed for use with a wide range of biological sample matrices. Immunoassay formats have been designed to provide qualitative, semi-quantitative, and quantitative results.
Quantitative results are generated by using a standard curve generated from known concentrations of a particular analyte to be detected. The response or signal from an unknown sample is plotted on a standard curve and the amount or value corresponding to the target in the unknown sample is determined.
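As a non-limiting illustration, reading an unknown sample off a standard curve can be sketched in Python as below. The choice of piecewise-linear interpolation in log10(concentration), the requirement that signal increase with concentration, and the function name are assumptions made for this sketch; many immunoassays instead fit a sigmoidal (e.g., four-parameter logistic) curve.

```python
import math


def concentration_from_signal(signal, standards):
    """Interpolate an analyte concentration from a standard curve.

    standards: list of (concentration, signal) pairs from calibrators of
        known concentration; concentrations must be > 0 and the signal is
        assumed to increase monotonically with concentration.
    Interpolates linearly in log10(concentration) between the two
    standards whose signals bracket the unknown.
    """
    points = sorted(standards, key=lambda p: p[1])
    if not points[0][1] <= signal <= points[-1][1]:
        raise ValueError("signal lies outside the standard curve range")
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo <= signal <= s_hi:
            if s_hi == s_lo:
                return c_lo
            t = (signal - s_lo) / (s_hi - s_lo)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10.0 ** log_c
    raise ValueError("no bracketing standards found")
```

Signals outside the calibrated range raise an error rather than being extrapolated, which reflects the usual practice of diluting and re-running such samples.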
Numerous immunoassay formats have been devised. ELISA or EIA can quantitatively detect an analyte. This method relies on the attachment of a label to either the analyte or the antibody, and the label component comprises, directly or indirectly, an enzyme. ELISA assays can be formatted for direct, indirect, competitive, or sandwich detection of analytes. Other methods rely on labels such as radioisotopes (I-125) or fluorescence. Additional techniques include, for example, agglutination reactions, turbidimetry, western blotting, immunoprecipitation, immunocytochemistry, immunohistochemistry, flow cytometry, serology, Luminex assays, and others (see ImmunoAssay: A Practical Guide, edited by Brian Law, published by Taylor & Francis, Ltd., 2005 edition).
Exemplary assay formats include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay, fluorescence, chemiluminescence and Fluorescence Resonance Energy Transfer (FRET) or time-resolved-FRET (TR-FRET) immunoassays. Examples of procedures for detecting biomarkers include biomarker immunoprecipitation followed by quantitative methods that allow size and peptide level discrimination, such as gel electrophoresis, capillary electrophoresis, planar electrochromatography, and the like.
The method of detecting and/or quantifying the detectable label or signal-generating material depends on the nature of the label. The reaction products catalyzed by the appropriate enzyme (where the detectable label is an enzyme; see above) may be, but are not limited to, fluorescent, luminescent or radioactive, or they may absorb visible or ultraviolet light. Examples of detectors suitable for detecting such detectable labels include, but are not limited to, x-ray films, radioactive counters, scintillation counters, spectrophotometers, colorimeters, fluorometers, luminometers, and densitometers.
Any of the detection methods may be performed in any format that allows for any suitable preparation, processing, and analysis of the reaction. This may be, for example, in a multi-well assay plate (e.g. 96-well or 384-well), or using any suitable array or microarray. Stock solutions for various reagents can be prepared manually or robotically, and all subsequent pipetting, dilution, mixing, dispensing, washing, incubation, sample reading, data collection and analysis can be accomplished robotically using commercially available analytical software, robots, and detection instruments capable of detecting detectable labels.
Determination of biomarker values using gene expression profiling
Measuring mRNA in a biological sample can be used as an alternative to detecting the level of the corresponding protein in the biological sample. Thus, any of the biomarkers or biomarker panels described herein may also be detected by detecting the appropriate RNA.
mRNA expression levels are measured by reverse transcription quantitative polymerase chain reaction (RT-PCR followed by qPCR). RT-PCR is used to generate cDNA from the mRNA. The cDNA can be used in a qPCR assay to generate fluorescence as the DNA amplification process progresses. By comparison to a standard curve, qPCR can produce an absolute measurement, such as the number of mRNA copies per cell. Northern blotting, microarrays, Invader assays, and RT-PCR in combination with capillary electrophoresis have all been used to measure mRNA expression levels in samples. See Gene Expression Profiling: Methods and Protocols, Richard A. Shimkets, editor, Humana Press, 2004.
miRNA molecules are small RNAs that are non-coding but can regulate gene expression. Any of the methods suitable for measuring mRNA expression levels can also be used for the corresponding miRNAs. Recently, many laboratories have investigated the use of miRNAs as biomarkers for disease. Many diseases involve extensive transcriptional regulation, and it is not surprising that miRNAs may find a role as biomarkers. The correlation between miRNA concentration and disease is generally less clear than the correlation between protein level and disease, yet the value of miRNA biomarkers may be substantial. Of course, as with any RNA that is differentially expressed during disease, the problems faced in the development of in vitro diagnostic products include the following requirements: the miRNAs must survive in the diseased cells and be easily extracted for analysis, or the miRNAs must be released into blood or another matrix where they survive long enough to be measured. Protein biomarkers have similar requirements, although many potential protein biomarkers are intentionally secreted at the site of pathology and function, during disease, in a paracrine fashion. Many potential protein biomarkers are designed to function outside the cells within which those proteins are synthesized.
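As a non-limiting illustration of absolute quantification by comparison to a standard curve, the following Python sketch fits the conventional linear relationship between quantification cycle (Ct) and log10 of template copy number, then inverts it for an unknown sample. The function names and the use of an ordinary least-squares fit are assumptions of this sketch. The result is copies per reaction; converting to copies per cell would additionally require the number of cells represented in the reaction, which is not modeled here.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    copies: known template copy numbers of the standards (> 0).
    ct_values: measured quantification cycles for those standards.
    Returns (slope, intercept).
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    if n < 2 or n != len(ct_values):
        raise ValueError("need at least two matched standards")
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per reaction."""
    return 10.0 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means doubling)."""
    return 10.0 ** (-1.0 / slope) - 1.0
```

A slope near -3.32 corresponds to roughly one doubling per cycle, so the fitted slope also serves as a quick check on assay performance.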
Detection of molecular markers using in vivo molecular imaging techniques
Any of the biomarkers (see table 1) can also be used in molecular imaging tests. For example, an imaging agent may be conjugated to any of the biomarkers, which may be used to aid in NSCLC diagnosis, to monitor disease progression/remission or metastasis, to monitor disease recurrence, or to monitor response to therapy, among other uses.
In vivo imaging techniques provide a non-invasive method for determining a particular disease state in an individual. For example, entire portions of the body, or even the entire body, can be viewed as a three-dimensional image, providing valuable information about the morphology and structures within the body. Such techniques can be combined with the biomarker assays described herein to provide information about the cancer status, particularly NSCLC status, of an individual.
The use of in vivo molecular imaging techniques has expanded due to various advances in technology. These advances include the development of new contrast agents or labels, such as radioactive labels and/or fluorescent labels, which can provide strong signals in vivo; and the development of powerful new imaging techniques that can detect and analyze these signals from outside the body with sufficient sensitivity and accuracy to provide useful information. The contrast agent may be visualized in a suitable imaging system, providing an image of one or more body parts in which the contrast agent is located. The contrast agent may be combined with or associated with: for example capture reagents such as aptamers or antibodies, and/or peptides or proteins, or oligonucleotides (e.g. for detection of gene expression) or complexes containing any of these together with one or more macromolecules and/or other particulate forms.
The contrast agent may also feature a radioactive atom that is useful for imaging. Suitable radioactive atoms for scintigraphic studies include technetium-99m or iodine-123. Other readily detectable moieties include, for example, spin labels for Magnetic Resonance Imaging (MRI), such as iodine-123 again, iodine-131, indium-111, fluorine-19, carbon-13, nitrogen-15, oxygen-17, gadolinium, manganese, or iron. Such labels are well known in the art and can be readily selected by one of ordinary skill in the art.
Standard imaging techniques include, but are not limited to, magnetic resonance imaging, computed tomography scans, Positron Emission Tomography (PET), Single Photon Emission Computed Tomography (SPECT), and the like. For diagnostic in vivo imaging, the type of detection instrument available is a major factor in the selection of a given contrast agent, such as a given radionuclide, and of the particular biomarker that it is used to target (protein, mRNA, etc.). The radionuclide chosen will generally have a type of decay that is detectable by a given type of instrument. In addition, when selecting a radionuclide for in vivo diagnosis, its half-life should be long enough to enable detection at the time of maximum uptake by the target tissue, but short enough to minimize harmful radiation to the host.
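The half-life trade-off described above can be illustrated, without limitation, by the standard exponential decay relationship N(t)/N0 = (1/2)^(t/T½). In the Python sketch below, the function name and the listed half-life values are assumptions for illustration only; the half-lives are approximate reference values and are not taken from the present disclosure.

```python
# Approximate physical half-lives in hours (illustrative reference values,
# not taken from the present disclosure).
APPROX_HALF_LIFE_HOURS = {
    "technetium-99m": 6.0,
    "iodine-123": 13.2,
    "fluorine-18": 1.83,
}


def fraction_remaining(elapsed_hours, half_life_hours):
    """Fraction of the initial radioactivity left after elapsed_hours.

    Implements N(t)/N0 = 0.5 ** (t / T_half); considers physical decay
    only and ignores biological clearance from the body.
    """
    if half_life_hours <= 0:
        raise ValueError("half-life must be positive")
    if elapsed_hours < 0:
        raise ValueError("elapsed time must be non-negative")
    return 0.5 ** (elapsed_hours / half_life_hours)
```

For instance, `fraction_remaining(t, APPROX_HALF_LIFE_HOURS["technetium-99m"])` can be evaluated at the expected time of maximum tissue uptake to judge whether enough activity remains for detection, and at later times to judge residual exposure.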
Exemplary imaging techniques include, but are not limited to, PET and SPECT, which are imaging techniques in which a radionuclide is administered systemically or locally to an individual. The subsequent uptake of the radiotracer is measured over a period of time and used to obtain information about the targeted tissue and the biomarker. Due to the high energy (gamma-ray) emission of the specific isotope employed and the sensitivity and complexity of the instruments used to detect it, the two-dimensional distribution of radioactivity can be inferred from outside the body.
Positron-emitting nuclides commonly used in PET include, for example, carbon-11, nitrogen-13, oxygen-15, and fluorine-18. Isotopes that decay by electron capture and/or gamma-emission are used in SPECT and include, for example, iodine-123 and technetium-99m. An exemplary method of labeling amino acids with technetium-99m is reduction of pertechnetate ion in the presence of a chelate precursor to form an unstable technetium-99m-precursor complex, which in turn reacts with the metal binding group of a bifunctional modified chemotactic peptide to form a technetium-99m-chemotactic peptide conjugate.
Antibodies are frequently used in such in vivo imaging diagnostic methods. The preparation and use of antibodies for in vivo diagnosis is well known in the art. A labeled antibody that specifically binds any of the biomarkers in Table 1 can be injected into an individual suspected of having a certain type of cancer (e.g., NSCLC), detectable according to the particular biomarker used, for the purpose of diagnosing or evaluating the disease status of the individual. As previously described, the label used is selected according to the imaging modality to be used. Localization of the label permits determination of the spread of the cancer. The amount of label within an organ or tissue also allows determination of the presence or absence of cancer in that organ or tissue.
Similarly, aptamers can be used in such in vivo imaging diagnostic methods. For example, an aptamer that was used to identify a particular biomarker described in Table 1 (and therefore binds specifically to that biomarker) may be appropriately labeled and injected into an individual suspected of having NSCLC, detectable according to the particular biomarker, for the purpose of diagnosing or evaluating the NSCLC status of the individual. As previously described, the label used is selected according to the imaging modality to be used. Localization of the label permits determination of the spread of the cancer. The amount of label within an organ or tissue also allows determination of the presence or absence of cancer in that organ or tissue. Aptamer-directed imaging agents may have unique and advantageous characteristics with respect to tissue penetration, tissue distribution, kinetics, elimination, potency, and selectivity as compared to other imaging agents.
Such techniques may also optionally be performed with labeled oligonucleotides, for example, for detecting gene expression by imaging with antisense oligonucleotides. These methods are used for in situ hybridization, for example with fluorescent molecules or radionuclides as labels. Other methods for detecting gene expression include, for example, detecting the activity of a reporter gene.
Another general type of imaging technique is optical imaging, in which a fluorescence signal within an object is detected by optical means external to the object. These signals may be due to actual fluorescence and/or bioluminescence. Improvements in the sensitivity of optical detection devices have increased the usefulness of optical imaging for in vivo diagnostic assays.
The use of in vivo molecular biomarker imaging is increasing, including for clinical trials, e.g., more rapidly measuring clinical efficacy in trials on new cancer therapies, and/or avoiding prolonged placebo treatment for those diseases such as multiple sclerosis, where such prolonged treatment may be considered ethically problematic.
For a review of other technologies, see N. Blow, Nature Methods, 6, 465-469, 2009.
Determination of biomarker values using histological/cytological methods
For the assessment of NSCLC, a variety of tissue samples may be used in histological or cytological procedures. Sample selection depends on the primary tumor location and the sites of metastasis. For example, endobronchial and transbronchial biopsies, fine needle aspirates, cutting needle biopsies, and core biopsies can be used for histology. Bronchial washings and brushings, pleural aspiration, pleural fluid and sputum can be used for cytology. Although cytological analysis is still used in the diagnosis of NSCLC, histological methods are known to provide better sensitivity for cancer detection. Any of the biomarkers identified herein that show up-regulation in individuals with NSCLC (table 1) can be used to stain histological samples as an indicator of disease.
In one embodiment, one or more capture reagents specific for the respective one or more biomarkers are used in cytological evaluation of lung tissue cell samples, which may include one or more of the following: collecting a cell sample, fixing the cell sample, dehydrating, clearing, immobilizing the cell sample on a microscope slide, permeabilizing the cell sample, treating for analyte recovery, staining, destaining, washing, blocking, and reacting with one or more capture reagents in a buffer solution. In another embodiment, the cell sample is produced from a cell block.
In another embodiment, one or more capture reagents specific for the respective one or more biomarkers are used in histological evaluation of lung tissue samples, which may include one or more of the following: collecting a tissue sample, fixing the tissue sample, dehydrating, clearing, immobilizing the tissue sample on a microscope slide, permeabilizing the tissue sample, treating for analyte recovery, staining, destaining, washing, blocking, rehydrating, and reacting with one or more capture reagents in a buffer solution. In another embodiment, fixation and dehydration are replaced by freezing.
Suitable nucleic acid amplification methods include, for example, PCR, Q-β replicase, rolling circle amplification, strand displacement, helicase-dependent amplification, loop-mediated isothermal amplification, ligase chain reaction, and restriction and circularization assisted rolling circle amplification.
In one embodiment, the one or more capture reagents specific for the respective biomarkers for histological or cytological evaluation are mixed in a buffer solution, which may include any of the following: blocking materials, competitors, detergents, stabilizers, carrier nucleic acids, polyanionic materials, and the like.
A "cytology protocol" generally includes sample collection, sample fixation, sample immobilization, and staining. "Cell preparation" may include several processing steps after sample collection, including the use of one or more slow off-rate aptamers for staining the prepared cells.
Sample collection may include placing the sample directly into an untreated transport container, placing the sample into a transport container containing some type of medium, or placing the sample directly onto a slide (immobilization) without any treatment or fixation.
Sample immobilization can be improved by applying a portion of the collected sample to a glass slide treated with polylysine, gelatin, or a silane. Slides can be prepared by smearing a thin and uniform layer of cells across the slide. Care is usually taken to minimize mechanical distortion and drying artifacts. Liquid samples may be processed by a cell block method. Alternatively, a liquid sample may be mixed 1:1 with the fixative solution for about 10 minutes at room temperature.
Cell blocks can be prepared from residual effusion, sputum, urinary sediment, gastrointestinal fluid, lung fluid, cell scrapings, or fine needle aspirates. The cells are concentrated or packed by centrifugation or membrane filtration. A number of methods have been developed for the preparation of cell blocks. Representative procedures include the fixed sediment, bacterial agar, and membrane filtration methods. In the fixed sediment method, the cell sediment is mixed with a fixative such as Bouin's, picric acid, or buffered formalin, and the mixture is then centrifuged to pellet the fixed cells. The supernatant is removed and the cell pellet is dried as completely as possible. The pellet is collected, wrapped in tissue paper, and placed in a tissue cassette. The tissue cassette is placed in a jar containing additional fixative and processed as a tissue sample. The agar method is very similar, except that the cell pellet is removed and dried on a paper towel, and then cut in half. The cut side is placed in a drop of melted agar on a glass slide, and the pellet is then covered with agar, making sure that no air bubbles form in the agar. The agar is allowed to harden, and any excess agar is then trimmed away. The block is placed in a tissue cassette and tissue processing is completed. Alternatively, the pellet can be suspended directly in 2% liquid agar at 65 ℃ and the sample centrifuged. The agar cell pellet is allowed to solidify at 4 ℃ for 1 hour. The solid agar can then be removed from the centrifuge tube and cut in half. The agar is wrapped in filter paper and placed in a tissue cassette. Processing from this point is as described above. Membrane filtration may be used in place of centrifugation in any of these procedures. Any of these processes can be used to generate a "cell block sample".
Cell blocks can also be prepared using specialized resins including Lowicryl resins, LR White, LR Gold, Unicryl, and Monostep. These resins have low viscosity and can be polymerized at low temperatures using ultraviolet (UV) light. The embedding process relies on gradually cooling the sample during dehydration, transferring the sample to the resin, and polymerizing the block at the final low temperature at the appropriate UV wavelength.
Sections of the cell block can be stained with hematoxylin-eosin for examination of cell morphology, while additional sections are used for examination of specific markers.
Whether the process is cytological or histological, the sample may be fixed prior to additional processing to prevent degradation of the sample. This process is called "fixation" and covers a wide range of materials and procedures that may be used interchangeably. The sample fixation protocol and reagents are best selected empirically based on the target to be detected and the particular cell/tissue type to be analyzed. Sample fixation relies on reagents such as ethanol, polyethylene glycol, methanol, formalin or isopropanol. The sample should be fixed as soon as possible after collection and attachment to the slide. However, the selected fixative may introduce structural changes into various molecular targets, making their subsequent detection more difficult. The fixation and immobilization processes and their order can modify the appearance of the cells, and these changes must be anticipated and recognized by the cytotechnologist. Fixatives can cause certain cell types to shrink and cause the cytoplasm to appear granular or reticulated. Many fixatives work by cross-linking cellular components. This can damage or modify specific epitopes, generate new epitopes, cause molecular associations, and reduce membrane permeability. Formalin fixation is one of the most commonly used cytological/histological methods. Formalin forms methylene bridges between adjacent proteins or within proteins. Precipitation or coagulation is also used for fixation, and ethanol is frequently used in this type of fixation. A combination of cross-linking and precipitation may also be used for fixation. A strong fixation process is best at preserving morphological information, while a weaker fixation process is best for preserving molecular targets.
A representative fixative is 50% absolute ethanol, 2 mM polyethylene glycol (PEG), 1.85% formaldehyde. Variations on this formulation include ethanol (50%-95%), methanol (20%-50%) and formalin (formaldehyde) only. Another common fixative is 2% PEG 1500, 50% ethanol and 3% methanol. Slides are placed in the fixative at room temperature for about 10-15 minutes and then removed and allowed to dry. Once a slide is fixed, it can be rinsed with a buffer solution such as PBS.
A wide range of dyes can be used to differentially highlight and contrast, or "stain", cellular, subcellular, and tissue features or morphological structures. Hematoxylin is used to stain nuclei blue or black. Orange G-6 and Eosin Azure both stain the cytoplasm of the cell. Orange G stains keratin- and glycogen-containing cells yellow. Eosin Y is used to stain nucleoli, cilia, red blood cells and superficial squamous epithelial cells. Romanowsky stains are used for air-dried slides and are useful for enhancing pleomorphism and distinguishing extracellular from intracytoplasmic material.
The staining process may include a treatment that increases the permeability of the cells to a stain. Treatment of cells with detergent may be used to increase permeability. To increase cell and tissue permeability, the fixed sample may be further treated with solvents, saponins or non-ionic detergents. Enzymatic digestion may also improve the accessibility of specific targets in tissue samples.
After staining, the samples are dehydrated using a series of alcohol washes of increasing alcohol concentration. The final wash is done using xylene or a xylene substitute, such as citrus terpene, that has a refractive index close to that of the coverslip to be applied to the slide. This last step is called clearing. Once the sample is dehydrated and cleared, a mounting medium is applied. The mounting medium is selected to have a refractive index close to that of glass and to bond the coverslip to the slide. It also inhibits additional drying, shrinking or fading of the cell sample.
The final assessment of the lung cytology specimen is performed by some type of microscopic examination, regardless of the stain or processing used, to allow visual inspection of the morphology and determination of the presence or absence of the marker. Exemplary microscopy methods include bright field, phase contrast, fluorescence, and differential interference contrast.
If a secondary test is required on the sample after examination, the coverslip can be removed and the slide destained. Destaining involves using the original solvent systems used in the initial staining of the slide, without the added dye and in the reverse order of the original staining procedure. Destaining can also be accomplished by soaking the slide in acid alcohol until the cells are colorless. Once colorless, the slide is rinsed thoroughly in a water bath and the second staining procedure is applied.
In addition, specific molecular differentiation may be possible in conjunction with the cellular morphological analysis through the use of specific molecular reagents such as antibodies, nucleic acid probes or aptamers. This improves the accuracy of diagnostic cytology. Microdissection can be used to isolate a subset of cells for additional evaluation, in particular for genetic evaluation of abnormal chromosomes, gene expression, or mutations.
Preparation of tissue samples for histological evaluation involves fixation, dehydration, infiltration, embedding and sectioning. The fixation reagents used in histology are very similar or identical to those used in cytology and have the same problem of preserving morphological features at the expense of molecular features such as individual proteins. Time can be saved if the tissue sample is not fixed and dehydrated but is instead frozen and then sectioned while frozen. This is a gentler process and can preserve more individual markers. However, freezing is not acceptable for long-term storage of tissue samples because subcellular information is lost through the formation of ice crystals. Ice in frozen tissue samples also prevents the sectioning process from producing very thin sections, so some microscopic resolution and imaging of subcellular structures may be lost. In addition to formalin fixation, osmium tetroxide is used to fix and stain phospholipids (membranes).
Dehydration of the tissue is achieved by successive washes of increasing alcohol concentration. Clearing uses a material that is miscible with both the alcohol and the embedding material, and involves a stepwise process starting with 50:50 alcohol:clearing agent followed by 100% clearing agent (xylene or a xylene substitute). Infiltration involves incubating the tissue with the embedding agent in liquid form (warm wax, nitrocellulose solution), first at 50:50 embedding agent:clearing agent and then in 100% embedding agent. Embedding is completed by placing the tissue in a mold or cassette and filling it with a melted embedding agent such as wax, agar or gelatin. The embedding agent is allowed to harden. The hardened tissue sample may then be cut into thin sections for staining and subsequent examination.
Prior to staining, the tissue sections are dewaxed and rehydrated. Xylene is used to dewax the sections, with one or more changes of xylene, and the tissue is rehydrated by successive washes in decreasing concentrations of alcohol. Prior to dewaxing, tissue sections may be heat-immobilized to glass slides at about 80 ℃ for about 20 minutes.
Laser capture microdissection allows the isolation of subsets of cells from tissue sections for further analysis.
As in cytology, tissue sections or slices may be stained with a variety of stains in order to enhance the visualization of microscopic features. A large number of commercially available dyes can be used to enhance or identify a particular feature.
To further increase the interaction of molecular reagents with cytological/histological samples, a number of techniques for "analyte recovery" have been developed. The first such technique uses high-temperature heating of a fixed sample. This method is also known as heat-induced epitope retrieval, or HIER. A variety of heating techniques have been used, including steam heating, microwaving, autoclaving, water baths, and pressure cooking, or combinations of these heating methods. Analyte recovery solutions include, for example, water, citrate, and normal saline buffers. The key to analyte recovery is the time at high temperature, but lower temperatures for longer times have also been used successfully. Another key to analyte recovery is the pH of the heating solution. Low pH has been found to provide the best immunostaining, but it also gives rise to background that frequently requires the use of a second tissue section as a negative control. The most consistent benefit (increased immunostaining without an increase in background) is generally obtained with a high pH solution, regardless of buffer composition. The analyte recovery process for a specific target is optimized empirically, using heat, time, pH and buffer composition as variables. Use of the microwave analyte recovery method allows sequential staining of different targets with antibody reagents. However, the time required to form antibody and enzyme complexes between staining steps has also been shown to degrade cell membrane analytes. Microwave heating methods have also improved in situ hybridization methods.
To begin the analyte recovery process, the sections are first dewaxed and hydrated. The slides are then placed in 10 mM sodium citrate buffer, pH 6.0, in a dish or jar. A representative procedure uses an 1100 W microwave and microwaves the slides at 100% power for 2 minutes, followed by microwaving the slides at 20% power for 18 minutes after checking that the slides remain covered in liquid. The slides are then allowed to cool in the uncovered container and rinsed with distilled water. HIER may be used in combination with enzymatic digestion to improve the reactivity of the target with immunochemical reagents.
One such enzymatic digestion protocol uses proteinase K. Proteinase K is prepared at a concentration of 20 µg/ml in 50 mM Tris base, 1 mM EDTA, 0.5% Triton X-100, pH 8.0 buffer. The process first involves dewaxing the sections in 2 changes of xylene, 5 minutes each. The samples are then hydrated in 2 changes of 100% ethanol for 3 minutes each, then in 95% and 80% ethanol for 1 minute each, and rinsed in distilled water. The sections are covered with the proteinase K working solution and incubated in a humidified chamber at 37 ℃ for 10-20 minutes (the optimal incubation time may vary depending on tissue type and degree of fixation). The sections are cooled at room temperature for 10 minutes and then washed twice for 2 minutes each in PBS Tween 20. If desired, the sections can be blocked to eliminate potential interference from endogenous compounds and enzymes. The sections are then incubated with primary antibody at the appropriate dilution in primary antibody dilution buffer for 1 hour at room temperature or overnight at 4 ℃. The sections are then washed twice for 2 minutes each with PBS Tween 20. Additional blocking can be performed if required for the particular application, followed by three additional 2-minute washes with PBS Tween 20, and then the immunostaining protocol is completed.
Simple treatment with 1% SDS at room temperature has also been shown to improve immunohistochemical staining. Analyte recovery methods have been applied to slide-mounted sections as well as free-floating sections. Another treatment option is to place the slides in a jar containing citric acid and 0.1 Nonidet P40 at pH 6.0 and heat to 95 ℃. The slides are then washed with a buffer solution such as PBS.
For immunological staining of tissues, non-specific binding of antibodies to tissue proteins can be blocked by immersing the sections in a protein solution such as serum or skim milk powder.
Blocking reactions may include the need to: reduce the level of endogenous biotin; eliminate endogenous charge effects; inactivate endogenous nucleases; and/or inactivate endogenous enzymes such as peroxidase and alkaline phosphatase. Endogenous nucleases can be inactivated by degradation with proteinase K, by heat treatment, by use of a chelating agent such as EDTA or EGTA, by introduction of carrier DNA or RNA, or by treatment with a chaotropic agent such as urea, thiourea, guanidine hydrochloride, guanidine thiocyanate, lithium perchlorate or the like, or with diethyl pyrocarbonate. Alkaline phosphatase can be inactivated by treatment with 0.1 N HCl for 5 minutes at room temperature or with 1 mM levamisole. Peroxidase activity can be eliminated by treatment with 0.03% hydrogen peroxide. Endogenous biotin can be blocked by soaking the slide or section in a solution of avidin (streptavidin or neutravidin can be substituted) for at least 15 minutes at room temperature. The slide or section is then washed in buffer for at least 10 minutes. This step may be repeated at least three times. The slide or section is then soaked in a biotin solution for 10 minutes. This can be repeated at least three times, each time with fresh biotin solution. The buffer wash procedure is then repeated. Blocking protocols should be kept to a minimum to prevent damage to the cell or tissue structure or to the one or more targets of interest, but one or more of these protocols may be combined to "block" the slide or section prior to reaction with the one or more slow off-rate aptamers. See Basic Medical Histology: The Biology of Cells, Tissues and Organs, authored by Richard G. Kessel, Oxford University Press, 1998.
Determination of biomarker values using mass spectrometry methods
A variety of mass spectrometer configurations can be used to detect biomarker values. Several types of mass spectrometers are available or can be produced in a variety of configurations. In general, a mass spectrometer has the following main components: a sample inlet, an ion source, a mass analyzer, a detector, a vacuum system, and an instrument control system and data system. Differences in the sample inlet, ion source and mass analyzer generally define the type of instrument and its capabilities. For example, the inlet may be a capillary column liquid chromatography source, or may be a direct probe or stage such as is used in matrix-assisted laser desorption. Commonly used ion sources are, for example, electrospray, including nanospray and microspray, or matrix-assisted laser desorption. Commonly used mass analyzers include quadrupole mass filters, ion trap mass analyzers and time-of-flight mass analyzers. Additional mass spectrometry methods are well known in the art (see Burlingame et al., Anal. Chem. 70:647R-716R (1998); Kinter and Sherman, New York (2000)).
Protein biomarkers and biomarker values may be detected and measured by any of the following: electrospray ionization mass spectrometry (ESI-MS), ESI-MS/MS, ESI-MS/(MS)n, matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF-MS), desorption/ionization on silicon (DIOS), secondary ion mass spectrometry (SIMS), quadrupole time-of-flight (Q-TOF), tandem time-of-flight (TOF/TOF) technology known as ultraflex III TOF/TOF, atmospheric pressure chemical ionization mass spectrometry (APCI-MS), APCI-MS/MS, APCI-(MS)n, atmospheric pressure photoionization mass spectrometry (APPI-MS), APPI-MS/MS, APPI-(MS)n, quadrupole mass spectrometry, Fourier transform mass spectrometry (FTMS), quantitative mass spectrometry, and ion trap mass spectrometry.
Sample preparation strategies are used to label and enrich samples before mass spectrometric characterization of protein biomarkers and determination of biomarker values. Labeling methods include, but are not limited to, isobaric tags for relative and absolute quantitation (iTRAQ) and stable isotope labeling with amino acids in cell culture (SILAC). Capture reagents used to selectively enrich a sample for candidate biomarker proteins prior to mass spectrometry include, but are not limited to, aptamers, antibodies, nucleic acid probes, chimeras, small molecules, F(ab')2 fragments, single chain antibody fragments, Fv fragments, single chain Fv fragments, nucleic acids, lectins, ligand-binding receptors, affibodies, nanobodies, ankyrins, domain antibodies, alternative antibody scaffolds (e.g., diabodies, etc.), imprinted polymers, avimers, peptidomimetics, peptoids, peptide nucleic acids, threose nucleic acids, hormone receptors, cytokine receptors and synthetic receptors, and modifications and fragments of these.
Determination of biomarker values using proximity ligation assays
Proximity ligation assays can be used to determine biomarker values. Briefly, a test sample is contacted with a pair of affinity probes, which may be a pair of antibodies or a pair of aptamers, wherein each member of the pair is extended with an oligonucleotide. The targets for the pair of affinity probes may be two different determinants on one protein or one determinant on each of two different proteins, which may be present as homo- or hetero-multimeric complexes. When the probes bind to the target determinants, the free ends of the oligonucleotide extensions are brought close enough to hybridize together. Hybridization of the oligonucleotide extensions is facilitated by a common connector oligonucleotide, which serves to bridge the oligonucleotide extensions together when they are positioned in sufficient proximity. Once the oligonucleotide extensions of the probes are hybridized, the ends of the extensions are joined together by enzymatic DNA ligation.
Each oligonucleotide extension comprises a primer site for PCR amplification. Once the oligonucleotide extensions are ligated together, the oligonucleotides form a contiguous DNA sequence which, through PCR amplification, reveals information about the identity and amount of the target protein, as well as information about protein-protein interactions where the target determinants are on two different proteins. By using real-time PCR, proximity ligation can provide a highly sensitive and specific assay for real-time protein concentration and interaction information. Probes that do not bind the determinants of interest do not have their oligonucleotide extensions brought into proximity, so no ligation or PCR amplification occurs and no signal is produced.
The foregoing assays enable detection of biomarker values useful in a method of diagnosing NSCLC, wherein the method comprises detecting in a biological sample from an individual at least N biomarker values, each corresponding to a biomarker selected from the biomarkers provided in table 1, wherein a classification using the biomarker values indicates whether the individual has NSCLC as described in detail below. Although some of the NSCLC biomarkers may be used alone to detect and diagnose NSCLC, methods for grouping a plurality of subsets of NSCLC biomarkers, each of which may be used as a panel of three or more biomarkers, are also described herein. Accordingly, various embodiments of the present application provide combinations comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected as any number from 2-59 biomarkers. It should be understood that N may be selected as any number from any of the above ranges and similar but higher order ranges. In accordance with any of the methods described herein, biomarker values may be individually detected and classified, or they may be detected and classified together, as in a multiplex assay format, for example.
In another aspect, a method for detecting the absence of NSCLC is provided, the method comprising detecting at least N biomarker values in a biological sample from an individual, each corresponding to a biomarker selected from the biomarkers provided in table 1, wherein a classification of the biomarker values indicates the absence of NSCLC in the individual as described in detail below. Although certain of the NSCLC biomarkers can be used alone to detect and diagnose the absence of NSCLC, also described herein are methods for grouping a plurality of subsets of NSCLC biomarkers, each of which can be used as a panel of three or more biomarkers. Accordingly, various embodiments of the present application provide combinations comprising N biomarkers, wherein N is at least three biomarkers. In other embodiments, N is selected as any number from 2-59 biomarkers. It should be understood that N may be selected as any number from any of the above ranges and similar but higher order ranges. In accordance with any of the methods described herein, biomarker values may be individually detected and classified, or they may be detected and classified together, as in a multiplex assay format, for example.
Classification of biomarkers and calculation of disease scores
A biomarker "signature" for a given diagnostic test contains a set of markers, each marker having different levels in the populations of interest. In this context, different levels may refer to different means of the marker levels for individuals in two or more groups, or different variances in the two or more groups, or a combination of both. For the simplest form of diagnostic test, these markers can be used to assign an unknown sample from an individual to one of two groups, diseased or not diseased. The assignment of a sample to one of two or more groups is called classification, and the procedure used to accomplish this assignment is called a classifier or classification method. Classification methods may also be referred to as scoring methods. There are many classification methods that can be used to construct a diagnostic classifier from a set of biomarker values. In general, classification methods are most easily performed using supervised learning techniques, in which a data set is collected using samples obtained from individuals within the two (or more, for multiple classification states) distinct groups one wishes to distinguish. Since the class (group or population) to which each sample belongs is known in advance, the classification method can be trained to give the desired classification response. Unsupervised learning techniques can also be used to produce a diagnostic classifier.
Common approaches for developing diagnostic classifiers include decision trees; bagging, boosting, forests and random forests; rule inference based learning; Parzen windows; linear models; logistic regression; neural network methods; unsupervised clustering; K-means; hierarchical ascending/descending; semi-supervised learning; prototype methods; nearest neighbor; kernel density estimation; support vector machines; hidden Markov models; and Boltzmann learning. Classifiers may also be combined, either simply or in ways that minimize a particular objective function. For a review, see, e.g., Pattern Classification, R.O. Duda et al., editors, John Wiley & Sons, 2nd edition, 2001; see also The Elements of Statistical Learning - Data Mining, Inference, and Prediction, T. Hastie et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009; each of which is incorporated herein by reference in its entirety.
To produce a classifier using supervised learning techniques, a set of samples called training data is obtained. In the context of diagnostic tests, training data includes samples from the distinct groups (classes) to which unknown samples will later be assigned. For example, samples collected from individuals in a control population and individuals in a particular disease population can constitute training data to develop a classifier that can classify unknown samples (or, more particularly, the individuals from whom the samples were obtained) as either having the disease or being free of the disease. Developing the classifier from the training data is known as training the classifier. Specific details of classifier training depend on the nature of the supervised learning technique. For purposes of illustration, an example of training a naive Bayesian classifier is described below (see, e.g., Pattern Classification, R.O. Duda et al., editors, John Wiley & Sons, 2nd edition, 2001; see also The Elements of Statistical Learning - Data Mining, Inference, and Prediction, T. Hastie et al., editors, Springer Science+Business Media, LLC, 2nd edition, 2009).
Since there are typically many more potential biomarker values than samples in a training set, care must be taken to avoid overfitting. Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting can be avoided in a variety of ways, including, for example, limiting the number of markers used in developing the classifier, assuming that the marker responses are independent of one another, limiting the complexity of the underlying statistical model employed, and ensuring that the underlying statistical model conforms to the data.
An illustrative example of developing a diagnostic test using a set of biomarkers is the application of a naive Bayes classifier, a simple probabilistic classifier based on Bayes' theorem with strictly independent treatment of the biomarkers. Each biomarker is described by a class-dependent probability density function (pdf) for the measured RFU (relative fluorescence units) or log RFU values in each class. The joint pdf for the set of markers in one class is assumed to be the product of the individual class-dependent pdfs for each biomarker. Training a naive Bayes classifier in this context amounts to assigning parameters ("parameterization") to characterize the class-dependent pdfs. Any underlying model for the class-dependent pdfs may be used, but the model should generally conform to the data observed in the training set.
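As a minimal sketch of the parameterization step just described, the following Python fragment estimates one mean and one variance per biomarker for each class and forms the joint pdf as a product of the individual pdfs. The normal model, the function names (`fit_class_pdfs`, `normal_pdf`, `joint_pdf`) and the training values are assumptions made for illustration only and are not taken from the application.

```python
import math
from statistics import fmean, pvariance


def fit_class_pdfs(samples):
    """Estimate (mean, variance) per biomarker from one class's training samples.

    samples: list of equal-length lists of log RFU values, one list per individual.
    The population variance is used, which is the maximum likelihood estimate
    under the (assumed) normal model.
    """
    columns = list(zip(*samples))  # one tuple of values per biomarker
    return [(fmean(col), pvariance(col)) for col in columns]


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_pdf(x, params):
    """Naive Bayes joint density: product of the per-biomarker pdfs."""
    return math.prod(normal_pdf(xi, m, v) for xi, (m, v) in zip(x, params))


# Hypothetical training data: three individuals per class, two biomarkers each.
control = [[8.0, 5.0], [8.4, 5.2], [7.6, 4.8]]
disease = [[9.0, 4.0], [9.4, 4.2], [8.6, 3.8]]

control_params = fit_class_pdfs(control)
disease_params = fit_class_pdfs(disease)

sample = [8.0, 5.0]
print(joint_pdf(sample, control_params) > joint_pdf(sample, disease_params))
```

Treating each biomarker independently keeps the number of fitted parameters at two per biomarker per class, which is one of the ways of limiting model complexity mentioned above.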
Specifically, the class-dependent probability of measuring a value $x_i$ for biomarker $i$ in the disease class is written as $p(x_i \mid d)$, and the overall naive Bayes probability of observing $n$ markers with values $x = (x_1, x_2, \ldots, x_n)$ is written as

$$p(x \mid d) = \prod_{i=1}^{n} p(x_i \mid d),$$

where the individual $x_i$ are the measured biomarker levels in RFU or log RFU. Assigning a classification to an unknown is facilitated by calculating the probability of being diseased, $p(d \mid x)$, having measured $x$, compared to the probability of being disease free (control), $p(c \mid x)$, for the same measured values. The ratio of these probabilities is computed from the class-dependent pdfs by applying Bayes' theorem, i.e.,

$$\frac{p(c \mid x)}{p(d \mid x)} = \frac{p(x \mid c)\,\bigl(1 - p(d)\bigr)}{p(x \mid d)\,p(d)},$$

where $p(d)$ is the prevalence of the disease in the population appropriate to the test. Taking the logarithm of both sides of this ratio and substituting the naive Bayes class-dependent probabilities from above gives

$$\ln\frac{p(c \mid x)}{p(d \mid x)} = \sum_{i=1}^{n} \ln\frac{p(x_i \mid c)}{p(x_i \mid d)} + \ln\frac{1 - p(d)}{p(d)}.$$

This form is known as the log-likelihood ratio. It simply states that the log likelihood of being free of the particular disease versus having the disease is composed primarily of the sum of the individual log-likelihood ratios of the $n$ individual biomarkers. In its simplest form, an unknown sample (or more particularly, the individual from which the sample was obtained) is classified as free of disease if the above ratio is greater than zero, and as having disease if the ratio is less than zero.
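In an exemplary embodiment, the class-dependent biomarker pdfs $p(x_i \mid c)$ and $p(x_i \mid d)$ are assumed to be normal or log-normal distributions in the measured RFU values $x_i$, i.e.,

$$p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right),$$

with a similar expression for $p(x_i \mid d)$ using $\mu_{d,i}$ and $\sigma_{d,i}$. Parameterization of the model requires estimating two parameters for each class-dependent pdf, a mean $\mu$ and a variance $\sigma^2$, from the training data. This may be accomplished in a number of ways including, for example, maximum likelihood estimation, least squares, and any other method known to those skilled in the art. Substituting the normal distributions with these $\mu$ and $\sigma$ into the log-likelihood ratio defined above gives the following expression:

$$\ln\frac{p(c \mid x)}{p(d \mid x)} = \sum_{i=1}^{n} \ln\frac{\sigma_{d,i}}{\sigma_{c,i}} - \frac{1}{2}\sum_{i=1}^{n}\left[\left(\frac{x_i - \mu_{c,i}}{\sigma_{c,i}}\right)^2 - \left(\frac{x_i - \mu_{d,i}}{\sigma_{d,i}}\right)^2\right] + \ln\frac{1 - p(d)}{p(d)}.$$

Once a set of $\mu$s and $\sigma^2$s has been defined for each pdf in each class from the training data and the prevalence of the disease in the population has been specified, the Bayes classifier is fully determined and can be used to classify unknown samples with measured values $x$.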
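The parameterization and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: it assumes normal class-dependent pdfs, maximum-likelihood estimates of the mean and variance, and hypothetical function names (`fit_gaussian`, `train`, `log_likelihood_ratio`, `classify`) and data layout (a dictionary mapping each marker name to a list of RFU or log RFU values) that do not appear in the original text.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation for one marker in one class."""
    mu = fmean(values)
    sigma = math.sqrt(pvariance(values, mu))  # population variance = ML estimate
    return mu, sigma


def train(control, disease):
    """control/disease: dict of marker name -> list of (log) RFU training values.

    Returns dict of marker name -> ((mu_c, sigma_c), (mu_d, sigma_d)).
    """
    return {m: (fit_gaussian(control[m]), fit_gaussian(disease[m])) for m in control}


def log_likelihood_ratio(x, params, prevalence):
    """ln[p(c|x) / p(d|x)] under the naive Bayes assumption with normal pdfs.

    x: dict of marker name -> measured value; prevalence: p(d), strictly in (0, 1).
    """
    total = math.log((1.0 - prevalence) / prevalence)  # prior (prevalence) term
    for marker, ((mu_c, s_c), (mu_d, s_d)) in params.items():
        z_c = (x[marker] - mu_c) / s_c
        z_d = (x[marker] - mu_d) / s_d
        # Per-marker term: ln(sigma_d / sigma_c) - (z_c^2 - z_d^2) / 2
        total += math.log(s_d / s_c) - 0.5 * (z_c ** 2 - z_d ** 2)
    return total


def classify(x, params, prevalence):
    """Simplest decision rule: ratio greater than zero means disease free."""
    return "control" if log_likelihood_ratio(x, params, prevalence) > 0 else "disease"
```

Each marker contributes one additive term, so the per-marker contributions can be reported individually alongside the total if desired. A training class in which a marker has zero variance would cause a division by zero and would need separate handling.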
The performance of a naive Bayes classifier depends on the number and quality of the biomarkers used to construct and train the classifier. A single biomarker will perform in accordance with its KS distance (Kolmogorov-Smirnov), as defined in example 3 below. If the classifier performance metric is defined as the area under the receiver operating characteristic curve (AUC), a perfect classifier will have a score of 1 and a random classifier will, on average, have a score of 0.5. The KS distance between two sets A and B of sizes n and m is defined as the value $D_{n,m} = \sup_x |F_{A,n}(x) - F_{B,m}(x)|$, which is the largest difference between the two empirical cumulative distribution functions (cdf). The empirical cdf for a set A of n observations $X_i$ is defined as $F_{A,n}(x) = \frac{1}{n}\sum_{i=1}^{n} I_{X_i \le x}$, where $I_{X_i \le x}$ is the indicator function, which equals 1 if $X_i \le x$ and 0 otherwise. By definition, this value is bounded between 0 and 1, where a KS distance of 1 indicates that the empirical distributions do not overlap.
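For illustration, the KS distance defined above can be computed directly from two lists of measurements. The sketch below is an assumption-laden example, not code from the study: the helper names `ecdf` and `ks_distance` are hypothetical, and the empirical cdf counts observations with $X_i \le x$ (the conventional definition). Because both empirical cdfs are step functions that only change at observed values, it is sufficient to evaluate the difference at the pooled sample points.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical cdf F(x) = (number of observations <= x) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_distance(a, b):
    """Largest absolute difference between the empirical cdfs of a and b."""
    f_a, f_b = ecdf(a), ecdf(b)
    # The supremum over x is attained at one of the observed values.
    return max(abs(f_a(x) - f_b(x)) for x in set(a) | set(b))
```

Two samples with no overlap give a distance of 1, and identical samples give 0, matching the bounds stated above.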
The addition of subsequent markers with good KS distances (e.g., >0.3) will generally improve classification performance if the subsequently added markers are independent of the first marker. Using the area under the ROC curve (AUC) as the classifier score, it is straightforward to generate many high scoring classifiers with a variant of the greedy algorithm. (A greedy algorithm is any algorithm that follows the problem-solving metaheuristic of making the locally optimal choice at each stage in the hope of finding the global optimum.)
The algorithmic approach used here is described in detail in example 4. In short, all single analyte classifiers are generated from the table of potential biomarkers and added to a list. Next, all possible additions of a second analyte to each of the stored single analyte classifiers are made, and a predetermined number of the best scoring pairs, e.g., one thousand, are stored on a new list. This new list of the best two-marker classifiers is used to explore all possible three-marker classifiers, again storing the best one thousand of them. This process continues until the score plateaus or begins to deteriorate as additional markers are added. Those high scoring classifiers that remain after convergence can be evaluated for the desired performance for the intended use. For example, in one diagnostic application, a classifier with high sensitivity and moderate specificity may be more desirable than one with moderate sensitivity and high specificity. In another diagnostic application, a classifier with high specificity and moderate sensitivity may be more desirable. The desired performance level is generally selected based on a tradeoff that must be made between the number of false positives and false negatives that may each be tolerated for a particular diagnostic application. Such tradeoffs generally depend on the medical consequences of a false positive or false negative error.
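The search described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of example 4: the names `auc` and `greedy_search`, the `keep` and `max_size` parameters, and the stopping rule (stop when the best score at the new size does not exceed the best score at the previous size) are all hypothetical choices made for this sketch. The `score` argument is any callable that maps a set of markers to a number, for instance a cross-validated AUC of a naive Bayes classifier built from those markers.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Probability that a random 'pos' score exceeds a random 'neg' score (ties count 1/2)."""
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p, q in product(pos_scores, neg_scores))
    return wins / (len(pos_scores) * len(neg_scores))


def greedy_search(markers, score, keep=1000, max_size=10):
    """Grow marker sets one marker at a time, keeping the `keep` best sets per size.

    Returns a list whose k-th entry holds the retained (score, marker set)
    pairs for sets of size k + 1, best first.
    """
    current = sorted(((score(frozenset([m])), frozenset([m])) for m in markers),
                     key=lambda pair: pair[0], reverse=True)[:keep]
    history = [current]
    while len(history) < max_size:
        candidates = {}
        for _, subset in current:
            for m in markers:
                if m not in subset:
                    grown = subset | {m}
                    if grown not in candidates:  # score each distinct set once
                        candidates[grown] = score(grown)
        if not candidates:
            break
        grown_list = sorted(((s, g) for g, s in candidates.items()),
                            key=lambda pair: pair[0], reverse=True)[:keep]
        if grown_list[0][0] <= current[0][0]:  # plateau or deterioration
            break
        history.append(grown_list)
        current = grown_list
    return history
```

Keeping many sets per size, rather than only the single best one, is what makes this a variant of the plain greedy algorithm: a marker that looks weak on its own can still enter a retained set at a later stage.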
A variety of other techniques are known in the art and may be employed to generate many potential classifiers from a list of biomarkers using a naive Bayes classifier. In one embodiment, a so-called genetic algorithm may be used to combine different markers using the fitness score as defined above. Genetic algorithms are particularly well suited to exploring a large, diverse population of potential classifiers. In another embodiment, so-called ant colony optimization may be used to generate sets of classifiers. Other strategies known in the art may also be employed, including, for example, other evolutionary strategies as well as simulated annealing and other stochastic search methods. Metaheuristic methods such as harmony search may also be employed.
Exemplary embodiments use any number of the NSCLC biomarkers listed in table 1 in various combinations to generate diagnostic tests for detecting NSCLC (see example 2 for a detailed description of how these biomarkers were identified). In one embodiment, a method for diagnosing NSCLC uses a naive Bayes classification method in conjunction with any number of the NSCLC biomarkers listed in table 1. In the illustrative example (example 3), the simplest test for detecting NSCLC in a population of smokers and individuals with benign lung nodules can be constructed with a single biomarker such as MMP7, which is differentially expressed in NSCLC with a KS distance of 0.59. Using the parameters $\mu_{c,i}$, $\sigma_{c,i}$, $\mu_{d,i}$, and $\sigma_{d,i}$ for MMP7 from table 16 and the equation for the log-likelihood ratio described above, a diagnostic test with an AUC of 0.803 can be derived (see table 15). The ROC curve for this test is shown in fig. 2.
For example, the addition of the biomarker CLIC1, whose KS distance is 0.53, significantly improved classifier performance to an AUC of 0.883. It should be noted that the score of a classifier constructed from two biomarkers is not a simple sum of the KS distances; KS distances are not additive when biomarkers are combined, and it takes many more weak markers to achieve the same level of performance as a strong marker. The addition of a third marker, STX1A, for example, improved classifier performance to an AUC of 0.901. The addition of further biomarkers such as CHRDL1, PA2G4, SERPINA1, BDNF, GHR, TGFBI, and NME2 produced a series of NSCLC tests, which are summarized in table 15 and shown as a series of ROC curves in fig. 3. Classifier scores as a function of the number of analytes used in classifier construction are shown in fig. 4. The AUC for this exemplary ten-marker classifier was 0.948.
The markers listed in table 1 can be combined in a number of ways to generate a classifier for diagnosing NSCLC. In some embodiments, the panel of biomarkers consists of different numbers of analytes depending on the particular diagnostic performance criteria selected. For example, certain combinations of biomarkers yield more sensitive (or more specific) tests than other combinations.
Once a panel is defined to include a particular set of biomarkers from table 1, and a classifier is constructed from a set of training data, the definition of the diagnostic test is complete. In one embodiment, the procedure for classifying an unknown sample is outlined in fig. 1A. In another embodiment, the procedure for classifying an unknown sample is outlined in fig. 1B. The biological sample is suitably diluted and then run in one or more assays to generate the relevant quantitative biomarker levels used for classification. The measured biomarker levels are used as input to the classification method, which outputs a classification and optionally a score for the sample that reflects the confidence in the class assignment.
Table 1 identifies 59 biomarkers that can be used to diagnose NSCLC. This is a surprisingly larger number than is typically found in biomarker discovery efforts, and can be attributed to the scale of the study, which encompassed over 1000 proteins measured in hundreds of individual samples, in some cases at concentrations in the low femtomolar range. It is speculated that the large number of biomarkers found reflects the different biochemical pathways involved in tumor biology and in the body's response to the presence of a tumor; each pathway and process involves a number of proteins. The results show that no single protein in the panel is uniquely informative about such complex processes; rather, multiple proteins are involved in processes such as apoptosis or extracellular matrix repair.
Given the numerous biomarkers identified during the course of the study, it is expected that a large number of high performance classifiers can be derived that can be used in a variety of diagnostic methods. To test this concept, tens of thousands of classifiers were evaluated using the biomarkers in table 1. As described in example 4, many subsets of the biomarkers presented in table 1 can be combined to produce useful classifiers. For example, descriptions are provided for classifiers containing 1, 2, and 3 biomarkers for detecting NSCLC. As described in example 4, all classifiers constructed using the biomarkers in table 1 performed significantly better than classifiers constructed using "non-markers".
The performance of classifiers obtained by randomly excluding some of the markers in table 1, which left smaller subsets from which to construct the classifiers, was also tested. As described in example 4, classifiers constructed from random subsets of the markers in table 1 performed similarly to the best classifiers constructed using the full list of markers in table 1.
The performance of ten-marker classifiers obtained by excluding the "best" individual markers from the ten-marker aggregation was also tested. Classifiers constructed without the "best" markers of table 1 also performed well, as described in example 4. Many subsets of the biomarkers listed in table 1 performed close to optimally even after the best 15 of the markers listed in table 1 were removed. This suggests that the performance characteristics of any particular classifier are likely not due to some small core set of biomarkers, and that the disease process likely affects numerous biochemical pathways that alter the expression levels of many proteins.
The results from example 4 suggest some possible conclusions. First, the identification of a large number of biomarkers enables them to be aggregated into a vast number of classifiers that provide similarly high performance. Second, classifiers can be constructed such that certain biomarkers can be substituted for other biomarkers in a manner that reflects the redundancies that undoubtedly pervade the complexity of the underlying disease process. That is, the information about the disease contributed by any single biomarker identified in table 1 overlaps with the information contributed by other biomarkers, such that no particular biomarker or small group of biomarkers in table 1 must be included in any classifier.
The exemplary embodiment uses a naive bayes classifier constructed from the data in table 16 to classify unknown samples. The procedure is summarized in fig. 1A and 1B. In one embodiment, the biological sample is optionally diluted and run in a multiplex aptamer assay. The data from the assay was normalized and calibrated as described in example 3, and the resulting biomarker levels were used as input to a bayesian classification scheme. The log-likelihood ratios are calculated individually for each measured biomarker and then summed to produce a final classification score, also referred to as a diagnostic score. The resulting assignments can be reported as well as the overall classification score. Optionally, a single log-likelihood risk factor calculated for each biomarker level may also be reported. Details of the classification score calculation are presented in example 3.
Reagent kit
Any combination of biomarkers (and additional biomedical information) in table 1 can be detected using a suitable kit, e.g., for performing the methods disclosed herein. In addition, any kit can contain one or more detectable labels as described herein, e.g., a fluorescent moiety, and the like.
In one embodiment, as further described herein, a kit comprises: (a) one or more capture reagents (e.g., at least one aptamer or antibody) for detecting one or more biomarkers in a biological sample, wherein the biomarkers comprise any of the biomarkers set forth in table 1, and optionally (b) one or more software or computer program products for classifying an individual from which the biological sample was obtained as having or not having lung cancer, or for determining the likelihood that the individual has NSCLC. Alternatively, one or more instructions for performing the above steps manually by a human may be provided in place of one or more computer program products.
The combination of a solid support with the corresponding capture reagent and signal-generating material is referred to herein as a "detection device" or "kit". The kit may also include instructions for using the devices and reagents, processing the sample, and analyzing the data. Further, the kit may be used with a computer system or software to analyze and report the results of the analysis of the biological sample.
The kit may also contain one or more reagents (e.g., solubilization buffer, detergent, or buffer) for processing the biological sample. Any of the kits described herein can also include, for example, buffers, blocking agents, mass spectrometry matrix materials, antibody capture agents, positive control samples, negative control samples, software, and information such as protocols, instructions, and reference data.
In one aspect, the invention provides a kit for analyzing NSCLC status. The kit comprises PCR primers for one or more biomarkers selected from table 1. The kit may further comprise instructions for use of the biomarker and association of the biomarker with NSCLC. The kit may further comprise a DNA array containing complement for one or more biomarkers selected from table 1, reagents and/or enzymes for amplifying or isolating sample DNA. The kit may include reagents for real-time PCR, such as TaqMan probes and/or primers and enzymes.
For example, a kit may comprise: (a) reagents comprising at least a capture reagent for quantifying one or more biomarkers in a test sample, wherein the biomarkers comprise the biomarkers set forth in table 1 or any other biomarker or panel of biomarkers described herein, and optionally (b) one or more algorithms or computer programs for performing the steps of: comparing the amount of each biomarker quantified in the test sample to one or more predetermined cutoffs and assigning a score to each biomarker quantified based on the comparison, combining the assigned scores for each biomarker quantified to obtain a total score, comparing the total score to a predetermined score, and using the comparison to determine whether the individual has NSCLC. Alternatively, instead of one or more algorithms or computer programs, one or more instructions for manually performing the above steps by a human may be provided.
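The cutoff-based scoring steps listed in item (b) above can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the function names, the convention that a marker's score is the number of ascending cutoffs its measured amount meets or exceeds, and the rule that a total at or above the predetermined score indicates NSCLC. A real kit could score markers that decrease in disease in the opposite direction, or weight markers differently.

```python
from bisect import bisect_right


def marker_score(amount, cutoffs):
    """Score one marker: count of ascending cutoffs that the amount meets or exceeds."""
    return bisect_right(cutoffs, amount)


def total_score(amounts, cutoff_table):
    """Combine the per-marker scores into a single total score.

    amounts: dict of marker name -> quantified amount
    cutoff_table: dict of marker name -> ascending list of predetermined cutoffs
    """
    return sum(marker_score(amounts[m], cutoff_table[m]) for m in cutoff_table)


def indicates_nsclc(amounts, cutoff_table, predetermined_score):
    """Compare the total score with the predetermined score (assumed rule: >= means positive)."""
    return total_score(amounts, cutoff_table) >= predetermined_score
```

Because each step is a simple comparison or sum, the same procedure could be carried out by hand from a printed cutoff table, which is the manual alternative mentioned above.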
Computer method and software
Once a biomarker or set of biomarkers is selected, a method of diagnosing an individual may comprise the steps of: 1) collecting or otherwise obtaining a biological sample; 2) performing an analytical method to detect and measure the biomarker or the biomarkers of the panel in the biological sample; 3) performing any data normalization or standardization required by the method used to collect biomarker values; 4) calculating a marker score; 5) combining the marker scores to obtain a total diagnostic score; and 6) reporting the diagnostic score of the individual. In this method, the diagnostic score may be a single number determined from the sum over all markers, which is compared to a preset threshold indicative of the presence or absence of disease. Alternatively, the diagnostic score may be a series of bars, each representing a biomarker value, and the pattern of responses may be compared to a preset pattern to determine the presence or absence of disease.
At least some embodiments of the methods described herein may be implemented using a computer. An example of a computer system 100 is shown in FIG. 6. Referring to fig. 6, a system 100 is shown including hardware elements, including a processor 101, an input device 102, an output device 103, a storage device 104, a computer-readable storage medium reader 105a, a communication system 106, a process acceleration (e.g., DSP or special purpose processor) 107, and a memory 109, electrically coupled via a bus 108. Computer-readable storage media reader 105a is further coupled to computer-readable storage media 105b, the combination comprehensively representing remote, local, fixed, and/or removable storage devices plus storage media, memory, etc. for temporarily and/or more permanently containing computer-readable information, which can include storage device 104, memory 109, and/or any other such accessible system 100 resource. The system 100 also includes software elements (shown as being currently located within the working memory 191) including an operating system 192 and other code 193, such as programs, data, and so forth.
Referring to fig. 6, the system 100 has a wide range of flexibility and configurability. Thus, for example, a single architecture may be used to implement one or more servers, which may be further configured in accordance with presently desired schemes, scheme variations, extensions, and the like. However, those skilled in the art will appreciate that the embodiments may be better utilized in accordance with more specific application requirements. For example, one or more system elements may be implemented as sub-elements within a component of system 100 (e.g., within communication system 106). Customized hardware might also be used and/or particular elements might be implemented in hardware, software, or both. Further, while connections to other computing devices, such as network input/output devices (not shown), may be employed, it is to be understood that wired, wireless, modem and/or other connection or connections to other computing devices may also be utilized.
In one aspect, the system can include a database containing characteristics of biomarkers characteristic of NSCLC. The biomarker data (or biomarker information) may be used as an input to a computer to be used as part of a computer-implemented method. The biomarker data may include data as described herein.
In one aspect, the system further includes one or more means for providing input data to the one or more processors.
The system also includes a memory for storing the data set of hierarchical data elements.
In another aspect, the means for providing input data comprises a detector, such as a mass spectrometer or gene chip reader, for detecting a characteristic of the data element.
The system may additionally include a database management system. The user request or query may be formatted in an appropriate language understood by a database management system that processes the query to extract relevant information from the database of the training set.
The system may be connected to a network that connects a network server and one or more clients. The network may be a Local Area Network (LAN) or a Wide Area Network (WAN) as is known in the art. Preferably, the server includes the hardware required to run a computer program product (e.g., software) to access database data for processing user requests.
The system may include an operating system (e.g., UNIX or Linux) for executing instructions from a database management system. In one aspect, the operating system may operate over a global communications network, such as the Internet, and utilize a global communications network server to connect to such networks.
The system may include one or more devices that include a graphical display interface including interface elements such as buttons, drop down menus, scroll bars, fields for entering text, and the like, as conventionally found in graphical user interfaces known in the art. Requests entered on the user interface may be passed to applications in the system for formatting to search one or more system databases for relevant information. The user entered request or query may be constructed in any suitable database language.
The graphical user interface may be generated by graphical user interface code that is part of the operating system and may be used to input data and/or display input data. The results of the processed data may be displayed on an interface, printed on a printer in communication with the system, stored in a storage device, and/or transmitted over a network or may be provided in the form of a computer-readable medium.
The system may be in communication with an input device for providing data regarding the data elements to the system (e.g., expression values). In one aspect, the input device can include a gene expression profiling system, including, for example, a mass spectrometer, a gene chip or array reader, or the like.
The methods and apparatus for analyzing NSCLC biomarker information according to various embodiments may be implemented in any suitable manner, for example using a computer program operating on a computer system. A conventional computer system may be used which includes a processor and random access memory such as a remotely accessible application server, web server, personal computer or workstation. Additional computer system components may include storage devices or information storage systems, such as mass storage systems, and user interfaces, such as conventional displays, keyboards, and tracking devices. The computer system may be a stand-alone system or part of a computer network that includes a server and one or more databases.
NSCLC biomarker analysis systems may provide functions and operations to perform data analysis, such as data collection, processing, analysis, reporting, and/or diagnosis. For example, in one embodiment, a computer system may execute a computer program that can receive, store, search, analyze, and report information about NSCLC biomarkers. The computer program may include a plurality of modules that perform a variety of functions or operations, such as a processing module for processing the raw data and generating supplemental data, and an analysis module for analyzing the raw data and supplemental data to generate a NSCLC status and/or diagnosis. Diagnosing a NSCLC state may include generating or collecting any other information, including additional biomedical information, information about the individual's condition associated with a disease, identifying whether further testing is required, or otherwise assessing the health status of the individual.
Referring now to FIG. 7, an example of a method of utilizing a computer in accordance with the principles of the disclosed embodiments can be seen. In fig. 7, a flowchart 3000 is shown. In block 3004, biomarker information for the individual may be retrieved. The biomarker information may be retrieved from a computer database, for example, after a test of a biological sample of the individual has been performed. The biomarker information may comprise biomarker values each corresponding to one of at least N biomarkers selected from the biomarkers provided in table 1, wherein N = 2-59. In block 3008, a computer may be used to classify each biomarker value. Additionally, in block 3012, an indication may be made regarding the likelihood that the individual has NSCLC based on the plurality of classifications. The indication may be output to a display or other display device so that it may be viewed by a person. Thus, for example, the indication may be displayed on a display screen or other output device of the computer.
Referring now to FIG. 8, an alternative method of utilizing a computer in accordance with another embodiment may be illustrated via a flow diagram 3200. In block 3204, biomarker information about the individual may be retrieved using a computer. The biomarker information comprises biomarker values corresponding to biomarkers selected from the group of biomarkers provided in table 1. In block 3208, a computer may be used to classify the biomarker values. Additionally in block 3212, an indication may be made as to the likelihood that the individual has NSCLC based on the classification. The indication may be output to a display or other display device so that it may be viewed by the individual. Thus, for example, the indication may be displayed on a display screen or other output device of the computer.
Some embodiments described herein may be implemented so as to comprise a computer program product. The computer program product may include a computer readable medium having computer readable program code embodied in the medium for causing an application program to execute on a computer having a database.
As used herein, a "computer program product" refers to an organized collection of instructions in the form of natural or programming language statements contained on a physical medium of any nature (e.g., written, electronic, magnetic, optical, or otherwise) and usable with a computer or other automated data processing system. When executed by a computer or data processing system, such programming language statements cause the computer or data processing system to function in accordance with the particular contents of the statements. Computer program products include, but are not limited to, programs in source code and object code and/or test or data libraries embedded in a computer readable medium. Furthermore, a computer program product that enables a computing system or data processing apparatus to function in a preselected manner may be provided in a variety of forms including, but not limited to, raw source code, assembly code, object code, machine language, encrypted or compressed forms of the foregoing, and any and all equivalents.
In one aspect, a computer program product for indicating a likelihood of NSCLC is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values that each correspond to one of at least N biomarkers in the biological sample selected from the group of biomarkers provided in table 1, wherein N = 2-59; and code for performing a classification method that indicates the NSCLC status of the individual based on the biomarker values.
In another aspect, a computer program product for indicating a likelihood of NSCLC is provided. The computer program product includes a computer readable medium embodying program code executable by a processor of a computing device or system, the program code comprising: code that retrieves data attributed to a biological sample from an individual, wherein the data comprises biomarker values corresponding to biomarkers in the biological sample selected from the group of biomarkers provided in table 1; and code for performing a classification method that indicates the NSCLC status of the individual based on the biomarker values.
While various embodiments have been described as methods or apparatus, it will be appreciated that embodiments may be implemented by code coupled to a computer, e.g., code resident on or accessible by the computer. For example, software and databases may be used to implement many of the methods described above. Thus, in addition to embodiments performed by hardware, it should also be noted that these embodiments may be performed using an article of manufacture that includes a computer usable medium having computer readable program code embodied therein that causes implementation of the functions disclosed in this specification. It is therefore desired that embodiments in the form of such program code means likewise be considered protected by this patent. Further, embodiments may be embodied as code stored in virtually any kind of computer readable memory, including but not limited to RAM, ROM, magnetic media, optical media, or magneto-optical media. Even more generally, embodiments may be implemented in software or hardware, or any combination thereof, including but not limited to software running on a general purpose processor, microcode, PLA, or ASIC.
It is also contemplated that embodiments may be implemented as computer signals embodied in a carrier wave as well as signals (e.g., electrical and optical) propagated over a transmission medium. Thus, the various types of information described above can be formatted in a structure, such as a data structure, and transmitted as an electrical signal through a transmission medium or stored on a computer readable medium.
It will also be noted that many of the structures, materials, and acts described herein can be described as a means for performing a function or a step for performing a function. It should be understood, therefore, that such language is entitled to cover all such structures, materials, or acts disclosed within this specification and their equivalents, including those incorporated by reference.
The biomarker identification process, the use of the biomarkers disclosed herein, and the various methods for determining biomarker values are described in detail above with respect to NSCLC. However, the application of the process, the use of the identified biomarkers, and the methods for determining biomarker values are fully applicable to other specific types of cancer, to cancer in general, to any other disease or medical condition, or to the identification of individuals who may or may not benefit from adjuvant medical treatment. Except where a specific result related to NSCLC is referred to, as is clear from the context, references herein to NSCLC may be understood to include other types of cancer, cancer in general, or any other disease or medical condition.
Examples
The following examples are provided for illustrative purposes only and are not intended to limit the scope of the present application as defined by the appended claims. All of the examples described herein were carried out using standard techniques that are well known and conventional to those skilled in the art. Conventional molecular biology techniques described in the following examples can be performed as described in standard laboratory manuals, such as Sambrook et al., Molecular Cloning: A Laboratory Manual, 3rd edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y. (2001).
Example 1 multiplex aptamer analysis of samples
This example describes a multiplex aptamer assay for analyzing samples and controls for identifying biomarkers set forth in table 1 (see fig. 9) and for identifying cancer biomarkers set forth in table 19. For NSCLC, mesothelioma, and renal cell carcinoma studies, the multiplex analysis utilized 1,034 aptamers each unique to a particular target.
In this method, the pipette tip is replaced for each solution addition.
Additionally, unless otherwise noted, most solution transfers and wash additions used the 96-well head of a Beckman Biomek FxP. Unless otherwise noted, the manual pipetting steps used a twelve-channel P200 Pipetteman (Rainin Instruments, LLC, Oakland, CA). A custom buffer called SB17 was prepared in-house and contained 40 mM HEPES, 100 mM NaCl, 5 mM KCl, 5 mM MgCl2, 1 mM EDTA, pH 7.5. A custom buffer called SB18 was prepared in-house and contained 40 mM HEPES, 100 mM NaCl, 5 mM KCl, 5 mM MgCl2, pH 7.5. All steps were performed at room temperature unless otherwise indicated.
1. Preparation of aptamer stock solution
Custom aptamer stocks were prepared at 2x concentration for 5%, 0.316% and 0.01% sera in 1x SB17, 0.05% Tween-20.
These solutions were stored at -20 ℃ until use. On the day of the assay, each aptamer mixture was thawed at 37 ℃ for 10 minutes, placed in a boiling water bath for 10 minutes, and allowed to cool to 25 ℃ for 20 minutes, with vigorous mixing between each heating step. After heat-cooling, 55 μL of each 2x aptamer mixture was manually pipetted into a 96-well Hybaid plate and the plate was sealed with foil. The final result was three 96-well, foil-sealed Hybaid plates containing the 5%, 0.316%, and 0.01% aptamer mixtures. The individual aptamer concentration was 2x the final concentration, or 1 nM.
2. Assay sample preparation
Frozen aliquots of 100% serum or plasma, stored at -80 ℃, were placed in a 25 ℃ water bath for 10 minutes. The thawed samples were placed on ice, gently vortexed (set on 4) for 8 seconds, and then placed on ice again.
A 10% sample solution (2x final concentration) was prepared by transferring 8 μL of sample, using a 50 μL 8-channel spanning pipettor, into 96-well Hybaid plates, each well containing 72 μL of the appropriate sample diluent at 4 ℃ (1x SB17 for serum or 0.8x SB18 for plasma, plus 0.06% Tween-20, 11.1 μM Z-block_2, 0.44 mM MgCl2, 2.2 mM AEBSF, 1.1 mM EGTA, 55.6 μM EDTA). This plate was stored on ice until the next sample dilution steps were started on the Biomek FxP robot.
To begin sample and aptamer equilibration, the 10% sample plate was briefly centrifuged and placed on the Beckman FX, where it was mixed by pipetting up and down with the 96-well pipettor. A 0.632% sample plate (2x final concentration) was then prepared by diluting 6 μL of the 10% sample into 89 μL of 1x SB17, 0.05% Tween-20 with 2 mM AEBSF. Next, 6 μL of the resulting 0.632% sample was diluted into 184 μL of 1x SB17, 0.05% Tween-20 to make a 0.02% sample plate (2x final concentration). Dilutions were done on the Beckman Biomek FxP. After each transfer, the solutions were mixed by pipetting up and down. The 3 sample dilution plates were then transferred to their respective aptamer solutions by adding 55 μL of the sample to 55 μL of the appropriate 2x aptamer mixture. The sample and aptamer solutions were mixed on the robot by pipetting up and down.
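The dilution series in this section can be checked with a few lines of arithmetic. The following is a minimal sketch rather than part of the original protocol; the function name `dilute` is hypothetical, and the volumes are the ones stated in the text.

```python
from fractions import Fraction


def dilute(stock_percent, stock_ul, diluent_ul):
    """Return the concentration (in % sample) after adding stock to diluent."""
    stock_ul, diluent_ul = Fraction(stock_ul), Fraction(diluent_ul)
    return Fraction(stock_percent) * stock_ul / (stock_ul + diluent_ul)


# Sample plates at 2x final concentration (volumes taken from the text).
plate_10 = dilute(100, 8, 72)            # 8 uL neat sample + 72 uL diluent
plate_0632 = dilute(plate_10, 6, 89)     # 6 uL of 10% + 89 uL buffer
plate_002 = dilute(plate_0632, 6, 184)   # 6 uL of 0.632% + 184 uL buffer

# Mixing 55 uL of sample with 55 uL of 2x aptamer mix halves each concentration.
final = [dilute(p, 55, 55) for p in (plate_10, plate_0632, plate_002)]

for two_x, one_x in zip((plate_10, plate_0632, plate_002), final):
    print(f"2x plate: {float(two_x):.4f}%  ->  binding reaction: {float(one_x):.4f}%")
```

The computed values round to the nominal 10%, 0.632%, and 0.02% sample plates and to the 5%, 0.316%, and 0.01% equilibrium binding reactions.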
3. Sample equilibrium binding
The sample/aptamer plates were sealed with foil and placed in a 37 ℃ incubator for 3.5 hours before proceeding to the capture 1 step.
4. Preparation of Capture 2 bead plate
An 11 mL aliquot of MyOne (Invitrogen Corp., Carlsbad, Calif.) streptavidin C1 beads was washed 2 times with an equal volume of 20 mM NaOH (5 minute incubation for each wash), 3 times with an equal volume of 1x SB17, 0.05% Tween-20, and resuspended in 11 mL of 1x SB17, 0.05% Tween-20. Using a 12-span multichannel pipettor, 50 μL of this solution was manually pipetted into each well of a 96-well Hybaid plate. The plate was then covered with foil and stored at 4 ℃ for use in the assay.
5. Preparation of Capture 1 bead plate
Three 0.45 μm Millipore HV plates (Durapore membrane, Cat # MAHVN4550) were equilibrated with 100 μL of 1x SB17, 0.05% Tween-20 for at least 10 minutes. The equilibration buffer was then filtered through the plate and 133.3 μL of a 7.5% streptavidin-agarose bead slurry (in 1x SB17, 0.05% Tween-20) was added to each well. To keep the streptavidin-agarose beads suspended while they were transferred into the filter plates, the bead solution was manually mixed with a 200 μL 12-channel pipettor at least 6 times between pipetting events. After the beads were distributed across the 3 filter plates, a vacuum was applied to remove the bead supernatant. Finally, the beads were washed in the filter plates with 200 μL of 1x SB17, 0.05% Tween-20 and then resuspended in 200 μL of 1x SB17, 0.05% Tween-20. The bottoms of the filter plates were blotted dry and the plates were stored for use in the assay.
6. Loading Cytomat
The Cytomat was loaded with all tips, plates, all reagents in troughs (except the NHS-biotin reagent, which was prepared fresh immediately before addition to the plates), 3 prepared capture 1 filter plates, and 1 prepared MyOne plate.
7. Capture 1
After the 3.5 hour equilibration time, the sample/aptamer plates were removed from the incubator, centrifuged for about 1 minute, uncovered, and placed on the platform (deck) of the Beckman Biomek FxP. The Beckman Biomek FxP program was started. Unless otherwise stated, all subsequent steps in capture 1 were performed by the Beckman Biomek FxP robot. Within the program, a vacuum was applied to the capture 1 filter plates to remove the bead supernatant. One hundred microliters of each of the 5%, 0.316%, and 0.01% equilibrium binding reactions was added to its respective capture 1 filter plate, and each plate was mixed for 10 minutes at 800 rpm using a platform-type (on-deck) orbital shaker.
Unbound solution was removed via vacuum filtration. The capture 1 beads were washed with 190 μ L of 100 μ M biotin in 1 × SB17, 0.05% Tween-20 followed by 5 × 190 μ L of 1 × SB17, 0.05% Tween-20 by dispensing the solution and immediately applying vacuum to filter the solution through the plate.
8. Tagging
An aliquot of 100 mM NHS-PEO4-biotin in anhydrous DMSO was thawed at 37 ℃ for 6 minutes and then diluted 1:100 with tagging buffer (SB17 at pH 7.25, 0.05% Tween-20). Upon a prompt from the robot, the diluted NHS-PEO4-biotin reagent was manually added to a trough on the platform, and the robot program was manually restarted to dispense 100 μL of NHS-PEO4-biotin into each well of each capture 1 filter plate. This solution was allowed to incubate with the capture 1 beads for 5 minutes, shaking at 800 rpm on the orbital shakers.
9. Kinetic challenge and photo-cleavage
The tagging reaction was removed by vacuum filtration and quenched by adding 150 μL of 20 mM glycine in 1x SB17, 0.05% Tween-20 to the capture 1 plates. The NHS-tag/glycine solution was removed via vacuum filtration. Next, 1500 μL of 20 mM glycine (in 1x SB17, 0.05% Tween-20) was added to each plate and incubated for 1 minute at 800 rpm on the orbital shakers before removal by vacuum filtration.
The wells of the capture 1 plates were then washed three times by adding 1x SB17, 0.05% Tween-20 followed by vacuum filtration, and then by adding 190 μL of 1x SB17, 0.05% Tween-20 with shaking at 800 rpm for 1 minute followed by vacuum filtration. After the last wash, the plates were placed on top of a 1 mL deep well plate and removed from the platform. The capture 1 plates were centrifuged at 1000 rpm for 1 minute to remove as much extraneous volume from the agarose beads as possible before elution.
The plates were returned to the Beckman Biomek FxP and 85 μL of 10 mM DxSO4 in 1x SB17, 0.05% Tween-20 was added to each well of the filter plates.
The filter plates were removed from the platform, placed on a Variomag Thermoshaker (Thermo Fisher Scientific, Inc., Waltham, MA) under a BlackRay (Ted Pella, Inc., Redding, CA) light source, and irradiated for 5 minutes while shaking at 800 rpm. After the 5 minute incubation, the plates were rotated 180 degrees and irradiated with shaking for 5 minutes more.
The photocleaved solution was sequentially eluted from each capture 1 plate into a common deep well plate by first placing a 5% capture 1 filter plate on top of the 1mL deep well plate and centrifuging at 1000rpm for 1 minute. Subsequently, 0.316% and 0.01% capture 1 plates were centrifuged sequentially into the same deep well plate.
10. Capture 2 bead Capture
A 1mL deep well block containing pooled capture 1 eluates was placed on the platform of Beckman Biomek Fxp for capture 2.
The robot transferred all of the photo-cleaved eluate from the 1 mL deep well plate onto the Hybaid plate containing the previously prepared capture 2 MyOne magnetic beads (after removal of the MyOne buffer via magnetic separation).
The solution was incubated on a Variomag Thermoshaker (Thermo Fisher Scientific, Inc., Waltham, MA) at 25 ℃ for 5 minutes while shaking at 1350 rpm.
The robot transferred the plate to the platform-type magnetic separator station. The plate was incubated on the magnet for 90 seconds before the supernatant was removed and discarded.
11. 37 ℃ 30% glycerol washes
The capture 2 plate was moved to the platform-type thermal shaker and 75 μL of 1x SB17, 0.05% Tween-20 was transferred to each well. The plate was mixed for 1 minute at 1350 rpm and 37 ℃ to resuspend and warm the beads. To each well of the capture 2 plate, 75 μL of 60% glycerol at 37 ℃ was transferred, and the plate continued to mix for another minute at 1350 rpm and 37 ℃. The robot transferred the plate to the 37 ℃ magnetic separator, where it was incubated on the magnet for 2 minutes, and then the robot removed and discarded the supernatant. These washes were repeated 2 more times.
After removing the third 30% glycerol wash from the capture 2 beads, 150 μ L of 1 × SB17, 0.05% Tween-20 was added to each well and incubated at 37 ℃ for 1 minute with shaking at 1350rpm before removal by magnetic separation on a magnet at 37 ℃.
The capture 2 beads were finally washed once with 150 μ L of 1XSB17, 0.05% Tween-20 with a 1 minute incubation, before magnetic separation, while shaking at 1350rpm at 25 ℃.
12. Capture 2 bead elution and neutralization
Aptamers were eluted from the capture 2 beads by adding 105 μ L of 100mM CAPSO with 1M NaCl, 0.05% Tween-20 to each well. The beads were incubated with this solution for 5 minutes with shaking at 1300 rpm.
Subsequently, the capture 2 plate was placed on the magnetic separator for 90 seconds before transferring 63 μL of the eluate to a new 96-well plate containing 7 μL of 500 mM HCl, 500 mM HEPES, 0.05% Tween-20 in each well. After transfer, the solution was mixed robotically by pipetting 90 μL up and down five times.
13. Hybridization
The Beckman Biomek FxP transferred 20 μL of the neutralized capture 2 eluate to a fresh Hybaid plate, and 6 μL of 10x Agilent Block, containing a 10x spike of hybridization controls, was added to each well. Next, 30 μL of 2x Agilent hybridization buffer was manually pipetted into each well of the plate containing the neutralized samples and blocking buffer, and the solution was mixed by slowly pipetting 25 μL up and down 15 times by hand to avoid extensive bubble formation. The plate was spun at 1000 rpm for 1 minute.
Custom Agilent microarray slides (Agilent Technologies, Inc., Santa Clara, Calif.) were designed to contain probes complementary to the random regions of the aptamer plus some primer regions. For most aptamers, the optimal length of the complementary sequence is determined empirically and ranges from 40-50 nucleotides. For future aptamers, the 46-mer complementary region was chosen by default. The probes were attached to the slide surface by poly-T linkers for a total probe length of 60 nucleotides.
A backing (gasket) slide was placed in an Agilent hybridization chamber, and 40 μL of each sample containing hybridization and blocking solution was manually pipetted into each gasket. An 8-channel adjustable spanning pipettor was used in a manner intended to minimize bubble formation. A custom Agilent microarray slide (Agilent Technologies, Inc., Santa Clara, CA), with its barcode facing up, was then slowly lowered onto the backing slide (see the Agilent manual for a detailed description).
The upper part of the hybridization chamber was placed on the slide/backing sandwich and the clamping carriage was slid over the entire assembly. The assemblies are clamped by firmly turning the screws.
Each slide/backing slide sandwich was visually inspected to ensure that the solution bubble could move freely within the sample. If the bubble did not move freely, the hybridization chamber was tapped to free bubbles lodged near the gasket.
The assembled hybridization chamber was incubated in an Agilent hybridization oven at 60 ℃ for 19 hours with 20rpm rotation.
14. Post-hybridization washes
Approximately 400mL of Agilent wash buffer 1 was placed in each of two separate glass staining dishes. One staining dish was placed on a magnetic stir plate and the slide rack and stir bar were placed in buffer.
The staining dish for Agilent wash 2 was prepared by placing a stir bar in an empty glass staining dish.
A fourth glass staining dish was set aside for the final acetonitrile wash.
Each of the six hybridization chambers was disassembled. The slide/backing sandwich was removed one by one from its hybridization chamber and immersed in a staining dish containing wash 1. The slide/backing sandwich was pried apart using a pair of tweezers while the microarray slide was still submerged. Slides were quickly transferred to slide racks in wash 1 staining dishes on magnetic stir plates.
The slide rack was gently raised and lowered 5 times. The magnetic stirrer was turned on at the low setting and the slides were incubated for 5 minutes.
With one minute remaining in wash 1, wash buffer 2, pre-warmed to 37 ℃ in an incubator, was added to the second prepared staining dish. The slide rack was quickly transferred to wash buffer 2, and any excess buffer on the bottom of the rack was removed by scraping it on the top of the staining dish. The slide rack was gently raised and lowered 5 times. The magnetic stirrer was turned on at the low setting and the slides were incubated for 5 minutes.
The slide rack is slowly pulled out of wash 2, taking about 15 seconds to remove the slides from the solution.
With one minute remaining in wash 2, acetonitrile (ACN) was added to the fourth staining dish. The slide rack was transferred to the acetonitrile staining dish. The slide rack was gently raised and lowered 5 times. The magnetic stirrer was turned on at the low setting and the slides were incubated for 5 minutes.
The slide rack was slowly pulled out of the ACN staining dish and placed on an absorbent towel. The bottom edges of the slides were quickly dried and the slides were placed into a clean slide box.
15. Microarray imaging
The microarray slides were placed in an Agilent scanner slide holder and loaded into the Agilent microarray scanner according to the manufacturer's instructions.
Slides were imaged in the Cy3 channel at 5 μm resolution at the 100% PMT setting, with the XDR option enabled at 0.05. The resulting TIFF images were processed using Agilent feature extraction software version 10.5.
Example 2 biomarker identification
The identification of potential NSCLC biomarkers was performed for diagnosis of NSCLC in individuals with indeterminate pulmonary nodules identified by CT scan or other imaging methods, for screening of high-risk smokers for NSCLC, and for diagnosing an individual with NSCLC. Enrollment criteria for this study were smokers, age 18 or older, able to give informed consent, with a blood sample and a documented diagnosis of NSCLC or benign findings. For cases, blood samples were collected prior to treatment or surgery, and NSCLC was subsequently diagnosed. Exclusion criteria included a prior diagnosis or treatment of cancer (excluding squamous cell carcinoma of the skin) within 5 years of the blood draw. Serum samples were collected from 4 different sites and included 46 NSCLC samples and 218 control samples, as described in table 17. The multiplexed aptamer affinity assay described in example 1 was used to measure and report the RFU value for 1,034 analytes in each of these 264 samples.
Each of the case and control populations were compared separately by generating a class-dependent cumulative distribution function (cdf) for each of the 1,034 analytes. The KS distance (Kolmogorov-Smirnov statistic) between values from two sample sets is a non-parametric measure of the degree to which the empirical distribution of values from one set (set a) differs from the distribution of values from the other set (set B). For any value of the threshold T, a proportion of the values from set a are less than T, and a proportion of the values from set B are less than T. The KS distance measure selects the largest (unsigned) difference between the proportion of values from the two sets for any T.
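The KS distance defined above can be computed directly from two lists of measurements. The sketch below is illustrative only and is not the implementation used in the study; the function name `ks_distance` and the toy values are assumptions. It evaluates both empirical distribution functions at every observed value, which is where the largest difference must occur.

```python
from bisect import bisect_right


def ks_distance(set_a, set_b):
    """Two-sample KS statistic: the largest absolute difference between the
    fractions of values in each set that lie at or below a threshold T."""
    a_sorted, b_sorted = sorted(set_a), sorted(set_b)
    largest = 0.0
    # The empirical cdfs only change at observed values, so those are the
    # only thresholds that need to be checked.
    for t in sorted(set(a_sorted) | set(b_sorted)):
        frac_a = bisect_right(a_sorted, t) / len(a_sorted)
        frac_b = bisect_right(b_sorted, t) / len(b_sorted)
        largest = max(largest, abs(frac_a - frac_b))
    return largest


# Toy log(RFU) values, invented purely for illustration.
controls = [1.0, 2.0, 3.0, 4.0]
cases = [3.0, 4.0, 5.0, 6.0]
print(ks_distance(controls, cases))
```

A KS distance of 0 means the two empirical distributions coincide, and a distance of 1 means the two sets do not overlap at all.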
This set of potential biomarkers can be used to build classifiers that assign samples to either a control or a disease group. In fact, many such classifiers were produced from these sets of biomarkers, and the frequency with which any biomarker was used in good scoring classifiers was determined. Those biomarkers that occurred most frequently among the top scoring classifiers were the most useful for creating a diagnostic test. In this example, Bayesian classifiers were used to explore the classification space, but many other supervised learning techniques may be employed for this purpose. The scoring fitness of any individual classifier was gauged by the area under the receiver operating characteristic curve (AUC of the ROC) of the classifier at the Bayesian surface, assuming a disease prevalence of 0.5. This scoring metric varies from zero to one, with one being an error-free classifier. The details of constructing a Bayesian classifier from biomarker population measurements are described in example 3.
Using the 59 analytes in table 1, a total of 964 ten-analyte classifiers were found with an AUC of 0.94 for distinguishing NSCLC from the control group. From this set of classifiers, a total of 12 biomarkers were found to be present in 30% or more of the high scoring classifiers. Table 13 provides a list of these potential biomarkers, and fig. 10 is a frequency plot for the identified biomarkers.
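Counting how often each biomarker appears among the high scoring classifiers is a simple tally. The following sketch assumes each classifier is represented as a tuple of marker names; the function names and the placeholder markers "A" to "E" are hypothetical and do not correspond to entries in table 1.

```python
from collections import Counter


def marker_frequencies(classifiers):
    """Fraction of classifiers in which each marker appears."""
    counts = Counter(m for panel in classifiers for m in set(panel))
    return {marker: n / len(classifiers) for marker, n in counts.items()}


def frequent_markers(classifiers, min_fraction=0.30):
    """Markers present in at least min_fraction of the classifiers."""
    freqs = marker_frequencies(classifiers)
    return sorted(m for m, f in freqs.items() if f >= min_fraction)


# Toy two-marker panels standing in for the high scoring classifiers.
top_classifiers = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E")]
print(marker_frequencies(top_classifiers))
print(frequent_markers(top_classifiers))
```

With the 30% cutoff used in this example, only markers "A" (in 75% of the toy panels) and "B" (in 50%) would be retained.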
Example 3 naive Bayes Classification for NSCLC
From the list of biomarkers identified as useful for discriminating between NSCLC and controls, a panel of ten biomarkers was selected and a naive Bayes classifier was constructed; see tables 16 and 18. The class-dependent probability density functions (pdfs), p(X_i | c) and p(X_i | d), where X_i is the logarithm of the measured RFU value for biomarker i and c and d refer to the control and disease populations, were modeled as log-normal distribution functions characterized by a mean μ and a variance σ². The parameters of the pdfs for the ten biomarkers are listed in table 16, and an example of the raw data along with the model fit to a normal pdf is shown in fig. 5. As fig. 5 demonstrates, the underlying assumption appears to fit the data quite well.
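A naive Bayes classification for such a model is given by the following formula, where p(d) is the prevalence of the disease in the population appropriate to the test and n = 10:

$$\ln\frac{p(d \mid X)}{p(c \mid X)} = \sum_{i=1}^{n}\left[\ln\frac{\sigma_{c,i}}{\sigma_{d,i}} - \frac{1}{2}\left(\left(\frac{X_i-\mu_{d,i}}{\sigma_{d,i}}\right)^{2} - \left(\frac{X_i-\mu_{c,i}}{\sigma_{c,i}}\right)^{2}\right)\right] + \ln\frac{p(d)}{1-p(d)}$$

Each term in the summation is a log-likelihood ratio for a single marker, and the total log-likelihood ratio of a sample X having the disease of interest (i.e., NSCLC in this case) versus being free of it is simply the sum of these individual terms plus a term that accounts for the prevalence of the disease. For simplicity, we assume p(d) = 0.5, so that $\ln\frac{p(d)}{1-p(d)} = 0$.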
Given unknown sample measurements in log(RFU) for each of the ten biomarkers of 6.9, 8.7, 7.9, 9.8, 8.4, 10.6, 7.3, 6.3, 7.3, 8.1, the calculation of the classification is detailed in table 16. The individual components comprising the log-likelihood ratio for the disease versus control class are tabulated and can be computed from the parameters in table 16 and the values of X. The sum of the individual log-likelihood ratios is -11.584, or a likelihood of being free from the disease versus having the disease of 107,386, where the likelihood is e^11.584 = 107,386. The first 3 biomarker values have likelihoods more consistent with the disease group (log-likelihood > 0), but the remaining 7 biomarkers are all consistently found to favor the control group. Multiplying the likelihoods together gives the same result as shown above: a likelihood of 107,386 that the unknown sample is free from the disease. In fact, this sample came from the control population in the training set.
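The per-marker calculation can be sketched in code under the stated modeling assumption that log(RFU) is normally distributed in each class, in which case each marker contributes ln(σ_c/σ_d) − ½[((x − μ_d)/σ_d)² − ((x − μ_c)/σ_c)²] to the disease-versus-control log-likelihood ratio. The function names are hypothetical, and the parameters below are invented for illustration; they are not the values in table 16.

```python
import math


def marker_llr(x, mu_c, sigma_c, mu_d, sigma_d):
    """Log-likelihood ratio (disease vs. control) for one marker, assuming a
    normal pdf for log(RFU) in each class."""
    return (math.log(sigma_c / sigma_d)
            - 0.5 * (((x - mu_d) / sigma_d) ** 2 - ((x - mu_c) / sigma_c) ** 2))


def total_llr(xs, params, prevalence=0.5):
    """Sum of the per-marker terms plus the prevalence (prior) term."""
    prior = math.log(prevalence / (1.0 - prevalence))
    return sum(marker_llr(x, *p) for x, p in zip(xs, params)) + prior


# Hypothetical (mu_c, sigma_c, mu_d, sigma_d) for two markers; NOT table 16.
toy_params = [(0.0, 1.0, 2.0, 1.0), (0.0, 1.0, 2.0, 1.0)]
toy_sample = [0.0, 0.0]  # sits exactly on the control means

llr = total_llr(toy_sample, toy_params)
print(llr)             # negative values favor the control class
print(math.exp(-llr))  # likelihood of control relative to disease
```

A negative total favors the control class, and exponentiating its magnitude gives the corresponding likelihood; for the sum of -11.584 reported above, `math.exp(11.584)` evaluates to roughly 1.07 × 10^5, matching the reported 107,386 to within the rounding of the log value.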
Example 4 greedy Algorithm for selecting biomarker panels for classifiers
This example describes the selection of biomarkers from table 1 to form panels that can be used as classifiers in any of the methods described herein. Subsets of the biomarkers in table 1 were selected to construct classifiers with good performance. This method was also used to determine which potential markers were included as biomarkers in example 2.
The measure of classifier performance used here is the AUC; a performance of 0.5 is the baseline expectation for a random (coin toss) classifier, a classifier worse than random would score between 0.0 and 0.5, and a classifier with better than random performance would score between 0.5 and 1.0. A perfect classifier with no errors would have a sensitivity of 1.0 and a specificity of 1.0. The methods described in example 4 can be applied to other common measures of performance, such as the F-measure, the sum of sensitivity and specificity, or the product of sensitivity and specificity. Specifically, one might want to treat sensitivity and specificity with differing weights, so as to select those classifiers that perform with higher specificity at the expense of some sensitivity, or to select those classifiers that perform with higher sensitivity at the expense of some specificity. Because the method described here only involves a measure of "performance", any weighting scheme that results in a single performance measure can be used. Different applications will have different benefits for true positive and true negative findings, and different costs associated with false positive and false negative findings. For example, screening asymptomatic smokers and the differential diagnosis of benign nodules found on CT will not in general have the same optimal trade-off between specificity and sensitivity. The different demands of the two tests will in general require setting different weightings for positive and negative misclassifications, which are reflected in the performance measure. Changing the performance measure will in general change the exact subset of markers selected from table 1 for a given set of data.
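As an illustration of these performance measures, the sketch below computes an empirical AUC from classifier scores and a weighted combination of sensitivity and specificity. It is an assumption-laden example: the AUC in this application is evaluated from the Bayesian model, whereas this code uses the rank-based empirical estimate, and the function names and the weight `w_sens` are hypothetical.

```python
def empirical_auc(control_scores, disease_scores):
    """Probability that a randomly chosen disease sample scores higher than a
    randomly chosen control sample (ties count one half)."""
    wins = 0.0
    for d in disease_scores:
        for c in control_scores:
            if d > c:
                wins += 1.0
            elif d == c:
                wins += 0.5
    return wins / (len(disease_scores) * len(control_scores))


def weighted_performance(sensitivity, specificity, w_sens=0.5):
    """Single performance number that trades sensitivity against specificity."""
    return w_sens * sensitivity + (1.0 - w_sens) * specificity


print(empirical_auc([0.0, 1.0], [2.0, 3.0]))   # perfectly separated scores
print(empirical_auc([1.0, 2.0], [1.0, 2.0]))   # indistinguishable scores
print(weighted_performance(0.80, 0.90, w_sens=0.25))
```

Raising `w_sens` favors classifiers with higher sensitivity, and lowering it favors higher specificity, which is one simple way to encode the different demands of a screening test and a nodule-management test.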
For the Bayesian approach to discriminating NSCLC samples from control samples described in example 3, the classifier was completely parameterized by the distributions of the biomarkers in the disease and benign training samples, and the list of biomarkers was chosen from table 1; that is, given a set of training data, the subset of markers chosen for inclusion determined a classifier in a one-to-one manner.
The greedy method employed here was used to search for the optimal subset of markers from table 1. For small numbers of markers, or classifiers with relatively few markers, every possible subset of markers was enumerated and evaluated in terms of the performance of the classifier constructed with that particular set of markers (see example 4, section 2). (This approach is well known in the field of statistics as "best subset selection"; see, e.g., Hastie et al.) However, for the classifiers described herein, the number of combinations of multiple markers can be very large, and it was not feasible to evaluate every possible set of 10 markers, since there are 30,045,015 possible combinations that can be generated from a list of only 30 total analytes. Because searching through every subset of markers is impractical, the single optimal subset may not be found; however, by using this approach, many excellent subsets were found, and in many cases any of these subsets may represent an optimal one.
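The size of the search space can be confirmed with the binomial coefficient. This short check uses Python's standard `math.comb` (available from Python 3.8); the variable names are illustrative.

```python
import math

# Ten-marker panels that can be drawn from a list of 30 analytes.
ten_of_thirty = math.comb(30, 10)
print(ten_of_thirty)

# Exhaustive enumeration stays tractable for small panels drawn from the
# 59 analytes of table 1: one-, two-, and three-marker subsets.
small_panels = [math.comb(59, k) for k in (1, 2, 3)]
print(small_panels)
```

The first count reproduces the 30,045,015 combinations quoted in the text, while the one- to three-marker subsets of 59 analytes number only in the tens of thousands, which is why those can be enumerated exhaustively.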
Instead of evaluating every possible set of markers, a "greedy" forward stepwise approach may be followed (see, e.g., Dabney AR, Storey JD (2007) Optimality Driven Nearest Centroid Classification from Genomic Data. PLoS ONE 2(10): e1002. doi:10.1371/journal.pone.0001002). Using this method, a classifier is started with the best single marker (based on the KS distance for the individual markers) and is grown at each step by trying, in turn, each member of the marker list that is not currently a member of the set of markers in the classifier. The one marker that scores best in combination with the existing classifier is added to the classifier. This is repeated until no further improvement in performance is achieved. Unfortunately, this approach may miss valuable combinations of markers for which some of the individual markers are not all chosen before the process stops.
The greedy procedure used here was an elaboration of the preceding forward stepwise approach in that, to broaden the search, rather than keeping just a single candidate classifier (marker subset) at each step, a list of candidate classifiers was kept. The list was seeded with every single-marker subset (using every marker in the table on its own). The list was expanded in steps by deriving new classifiers (marker subsets) from the ones currently on the list and adding them to the list. Each marker subset currently on the list was extended by adding any marker from table 1 that was not already part of that classifier and that would not, on its addition to the subset, duplicate an existing subset (these are termed "allowed markers"). Every existing marker subset was extended by every allowed marker from the list. Clearly, such a process would eventually generate every possible subset, and the list would run out of space. Therefore, all the generated classifiers were kept only while the list was smaller than some predetermined size (often enough to hold all three-marker subsets). Once the list reached the predetermined size limit, it became elitist; that is, only those classifiers that showed a certain level of performance were kept on the list, and the others fell off the end of the list and were lost. This was achieved by keeping the list sorted in order of classifier performance; new classifiers that were at least as good as the worst classifier currently on the list were inserted, forcing the expulsion of the current bottom underachiever. One further implementation detail is that the list was completely replaced on each generational step; therefore, every classifier on the list had the same number of markers, and at each step the number of markers per classifier grew by one.
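The list-based greedy search can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: `greedy_search`, `toy_score`, the weights, and the marker names "A" to "D" are all hypothetical, and the scoring function stands in for the AUC of a classifier built from each subset.

```python
def greedy_search(markers, score, n_markers, list_limit):
    """Grow marker subsets one marker per step, keeping at most list_limit
    of the best-scoring subsets at each generation."""
    def rank(subsets):
        # Best first; ties are broken by marker names for reproducibility.
        return sorted(subsets, key=lambda s: (-score(s), sorted(s)))[:list_limit]

    # Seed the list with every single-marker subset.
    candidates = rank({frozenset([m]) for m in markers})
    for _ in range(n_markers - 1):
        # Extend every subset by every marker it lacks; using a set of
        # frozensets removes duplicate subsets automatically.
        extended = {s | {m} for s in candidates for m in markers if m not in s}
        # The list is replaced wholesale, so all subsets have the same size.
        candidates = rank(extended)
    return candidates


# Toy score: additive marker weights plus a bonus when C and D occur together.
WEIGHTS = {"A": 0.3, "B": 0.2, "C": 0.1, "D": 0.05}


def toy_score(subset):
    bonus = 0.5 if {"C", "D"} <= subset else 0.0
    return sum(WEIGHTS[m] for m in subset) + bonus


print(sorted(greedy_search(list(WEIGHTS), toy_score, 2, list_limit=1)[0]))
print(sorted(greedy_search(list(WEIGHTS), toy_score, 2, list_limit=10)[0]))
```

With a list limit of 1 the search reduces to plain forward stepwise selection and returns the pair A and B, whereas a longer list finds the better-scoring pair C and D, whose members are individually weak. This is the kind of combination that the single-path approach can miss.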
Because this method produced a list of candidate classifiers using different combinations of markers, one may ask whether the classifiers can be combined in order to avoid errors that might be made by the best single classifier, or by minority groups of the best classifiers. Such "ensemble" and "committee of experts" methods are well known in the fields of statistical and machine learning and include, for example, "averaging", "voting", "stacking", "bagging", and "boosting" (see, e.g., Hastie et al.). These combinations of simple classifiers provide a method for reducing the variance in the classifications due to noise in any particular set of markers by including several different classifiers, and therefore information from a larger set of the markers in the biomarker table, effectively averaging between the classifiers. An example of the usefulness of this approach is that it can prevent outliers in a single marker from adversely affecting the classification of a single sample. The requirement to measure a larger number of signals may be impractical in conventional "one marker at a time" antibody assays, but has no downside for a fully multiplexed aptamer assay. Techniques such as these benefit from a more extensive table of biomarkers and use the multiple sources of information concerning the disease processes to provide a more robust classification.
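Two of the simplest combination rules, averaging and voting, can be illustrated as below. The sketch assumes each classifier reports a log-likelihood ratio for the same sample, with positive values favoring disease; the function names and the scores are invented for illustration.

```python
from statistics import mean


def average_decision(log_likelihood_ratios, threshold=0.0):
    """Classify by the mean of the individual classifier scores."""
    return "disease" if mean(log_likelihood_ratios) > threshold else "control"


def majority_vote(log_likelihood_ratios, threshold=0.0):
    """Classify by the call made by more than half of the classifiers."""
    votes = sum(1 for llr in log_likelihood_ratios if llr > threshold)
    return "disease" if 2 * votes > len(log_likelihood_ratios) else "control"


# Three hypothetical classifiers; the third is driven by one outlying marker.
panel_scores = [-2.0, -1.5, 6.0]
print(average_decision(panel_scores))
print(majority_vote(panel_scores))
```

In this toy case the vote follows the two classifiers that agree, while the average is pulled toward the single extreme score, which shows why the choice of combination rule matters when one marker is an outlier.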
The biomarkers selected in table 1 resulted in classifiers that performed better than classifiers constructed with "non-markers" (i.e., proteins with signals that did not meet the criteria for inclusion in table 1 (as described in example 2)).
For classifiers containing only one, two, and three markers, all possible classifiers obtained using the biomarkers in table 1 are enumerated and the performance profiles are examined in comparison to classifiers constructed from a similar list of randomly selected non-marker signals.
In fig. 11, AUC was used as the measure of performance; a performance of 0.5 is the baseline expectation for a random (coin toss) classifier. The histogram of classifier performance was compared with the histogram of performance from a similar exhaustive enumeration of classifiers built from a "non-marker" table of 59 non-marker signals; the 59 signals were randomly chosen from aptamers that did not demonstrate differential signaling between control and disease populations.
Figure 11 shows histograms of the performance of all possible one-, two-, and three-marker classifiers built from the biomarker parameters in table 14 for biomarkers that can discriminate between the control group and NSCLC, and compares these classifiers with all possible one-, two-, and three-marker classifiers built using the 59 "non-marker" aptamer RFU signals. Fig. 11A shows the histograms of single-marker classifier performance, fig. 11B shows the histograms of two-marker classifier performance, and fig. 11C shows the histograms of three-marker classifier performance.
In fig. 11, the solid lines represent the histograms of classifier performance of all one-, two-, and three-marker classifiers using the biomarker data in table 14 for smokers and benign pulmonary nodules and NSCLC. The dashed lines are the histograms of classifier performance of all one-, two-, and three-marker classifiers using the data for controls and NSCLC but using the set of random non-marker signals.
The classifiers built from the markers listed in table 1 form a distinct histogram, well separated from the classifiers built with signals from the "non-markers", for all one-, two-, and three-marker comparisons. The performance and AUC score of the classifiers built from the biomarkers in table 1 also increase faster with the number of markers than do those of the classifiers built from the non-markers; the separation between marker and non-marker classifiers increases as the number of markers per classifier increases. All classifiers built using the biomarkers listed in table 14 perform distinctly better than classifiers built using the "non-markers".
The distribution of classifier performance shows that there are many possible multi-labeled classifiers that can be derived from the set of analytes in table 1. Although some biomarkers are themselves superior to other biomarkers, as evidenced by the distribution of classifier scores and AUCs for a single analyte, it is desirable to determine whether such biomarkers are required to construct a high performance classifier. To make this distinction, the behavior of the classifier's performance is examined by eliminating some number of the best biomarkers. Figure 12 compares the classifier performance constructed with the full list of biomarkers in table 1 with the classifier performance constructed with the subset of biomarkers from table 1 excluding the top ranked markers.
Figure 12 demonstrates that classifiers constructed with less than optimal markers perform well, suggesting that the performance of the classifier is not due to some small core set of markers, and that changes in the potential processes associated with disease are reflected in the activity of many proteins. Even after removing the best 15 of the 59 markers from table 1, the multiple biomarker subsets in table 1 performed close to optimal. After discarding the 15 highest ranked (ranked by KS distance) markers from table 1, classifier performance increased with the number of markers selected from the table to reach an AUC of almost 0.93, approaching the performance of the best classifier score of 0.948 from the full biomarker list.
Finally, fig. 13 shows the ROC performance of a typical classifier constructed from the parameter list in table 14 according to example 3. A five-analyte classifier was constructed with MMP7, CLIC1, STX1A, CHRDL1, and PA2G4. Fig. 13A shows the model performance assuming independence of these markers, as in example 3, and fig. 13B shows the empirical ROC curve generated from the study data set used to define the parameters in table 14. The performance for a given number of selected markers is qualitatively consistent between the two, and, as evidenced by the AUCs, the quantitative agreement is generally quite good, although the model calculation tends to overestimate classifier performance. This is consistent with the notion that the information contributed by any particular biomarker about the disease process is redundant with the information contributed by the other biomarkers in table 1, whereas the model calculation assumes complete independence. Figure 13 thus demonstrates that table 1, in combination with the method described in example 3, enables the construction and evaluation of a very large number of classifiers that can be used to distinguish NSCLC from a control group.
Example 5 clinical biomarker panel
A random forest classifier was constructed from a selected panel of biomarkers that may be best suited for use in a clinical diagnostic test. Unlike the models chosen by the naive Bayes greedy forward algorithm, the random forest classifier does not assume that the biomarker measurements are randomly distributed. This model can therefore make use of biomarkers from table 1 that are not effective in a naive Bayes classifier.
The panel was selected using a backward elimination procedure that relies on the Gini importance measure provided by the random forest classifier. Gini importance is a measure of how effective a biomarker is at correctly classifying the samples in the training set.
This biomarker importance measure can be used to eliminate markers that matter less to classifier performance. The backward elimination procedure begins by constructing a random forest classifier that includes all 59 markers in table 1. The least important biomarkers are then eliminated, and a new model is constructed with the remaining biomarkers. This procedure continues until only a single biomarker remains.
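The elimination loop described above can be sketched as follows. This is only an illustration under stated assumptions: the function name backward_elimination, the callback importance_fn and the toy importance values are hypothetical and do not come from the source, one marker is dropped per round, and the callback stands in for refitting a random forest and reading its Gini importances.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_elimination(
    markers: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
) -> List[Tuple[List[str], str]]:
    """Refit on the remaining markers each round and drop the least important one.

    Returns one (fitted marker set, marker dropped) pair per round; the last
    entry is the final single-marker model, with an empty string as "dropped".
    """
    remaining = list(markers)
    history: List[Tuple[List[str], str]] = []
    while len(remaining) > 1:
        scores = importance_fn(remaining)  # stand-in for a refit + Gini importance
        weakest = min(remaining, key=lambda m: scores[m])
        history.append((list(remaining), weakest))
        remaining.remove(weakest)
    history.append((list(remaining), ""))
    return history


# Hypothetical importance values, used only to exercise the loop.
toy_importance = {"A": 0.50, "B": 0.10, "C": 0.30, "D": 0.05}


def toy_importance_fn(current: List[str]) -> Dict[str, float]:
    return {m: toy_importance[m] for m in current}


history = backward_elimination(list(toy_importance), toy_importance_fn)
dropped_order = [dropped for _, dropped in history if dropped]
final_set = history[-1][0]
print(dropped_order, final_set)
```

In practice the AUC of each intermediate model would also be recorded, so that a panel size balancing AUC against the number of markers can be chosen afterwards.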
The final panel selected provided the best balance between the maximum AUC and the minimum number of markers in the model. The 8-biomarker panel meeting these criteria consisted of the following analytes: MMP12, MMP7, KLK3-SERPINA3, CRP, C9, CNDP1, CA6, and EGFR. The ROC curve for this biomarker panel is shown in figure 14. The sensitivity of this model was 0.70, with a corresponding specificity of 0.89.
Example 6 biomarkers for cancer diagnosis
Potential biomarkers for the general diagnosis of cancer were identified. Case and control samples from 3 different types of cancer (lung cancer, mesothelioma and renal cell carcinoma) were evaluated. Across sites, the inclusion criteria were an age of at least 18 years and signed informed consent. Both cases and controls were excluded for known malignancies other than the cancer in question.
Lung cancer. Case and control samples were obtained as described in example 2. A total of 46 cases and 218 controls were used in this example.
Pleural mesothelioma. Case and control samples were obtained from academic cancer center biorepositories to identify potential biomarkers for the differential diagnosis of pleural mesothelioma versus benign lung disease, including suspicious radiological findings that were later diagnosed as non-malignant. A total of 124 mesothelioma cases and 138 asbestos-exposed controls were used in this example.
Renal cell carcinoma. Case and control samples were obtained from academic cancer center biorepositories from patients with renal cell carcinoma (RCC) and benign masses (BEN). Preoperative samples (TP1) were obtained for all subjects. The preliminary analysis compared outcome data (as recorded in the SEER database field CA status 1) for RCC patients with "evidence of disease" (EVD) versus "no evidence of disease" (NED), as documented by clinical follow-up. A total of 38 EVD cases and 104 NED controls were used in this example.
A final list of cancer biomarkers was identified by combining the sets of biomarkers considered for each of the 3 different cancer studies. Bayesian classifiers using biomarker sets of increasing size were successively constructed using a greedy algorithm (as described in more detail in section 6.2 of this example). The sets (or panels) of biomarkers useful for diagnosing cancer in general, across the different sites and cancer types, were compiled as a function of set (or panel) size and analyzed for their performance. This analysis yielded the list of 23 cancer biomarkers shown in table 19, each of which was present in at least one of these successive marker sets, which ranged in size from three to ten markers. As an illustrative example, we describe the generation of a specific panel consisting of the ten cancer biomarkers shown in table 32.
6.1 naive Bayes classification for cancer
From the biomarker list in table 1, a panel of ten potential biomarkers was selected using the greedy algorithm for biomarker selection outlined in section 6.2 of this example. A separate naive Bayes classifier was constructed for each of the 3 cancers. The class-dependent probability density functions (pdfs), p(X_i | c) and p(X_i | d), where X_i is the logarithm of the measured RFU value for biomarker i and c and d refer to the control and disease populations, were modeled as lognormal distribution functions characterized by a mean μ and a variance σ². The parameters of the pdfs for the 3 models, each consisting of ten potential biomarkers, are listed in table 31.
The naive Bayes classification for such a model is given by the following equation, where p(d) is the prevalence of the disease in the population appropriate to the test and n = 10:

$$\ln\frac{p(c \mid X)}{p(d \mid X)} = \sum_{i=1}^{n} \ln\frac{p(X_i \mid c)}{p(X_i \mid d)} + \ln\frac{1 - p(d)}{p(d)}$$

Each term in the summation is a log-likelihood ratio for a single marker, and the total log-likelihood ratio of a sample X being free of the disease of interest (in this case, each particular disease from the 3 different cancer types) versus having the disease is simply the sum of these individual terms plus a term that accounts for the prevalence of the disease. For simplicity, we assume p(d) = 0.5, so that the prevalence term vanishes:

$$\ln\frac{p(c \mid X)}{p(d \mid X)} = \sum_{i=1}^{n} \ln\frac{p(X_i \mid c)}{p(X_i \mid d)}$$

The calculation of the classification is detailed in table 32 for an unknown sample with measurements, in log(RFU), of 9.5, 8.8, 7.8, 8.3, 9.4, 7.0, 7.9, 6.3, 7.7 and 10.6 for the ten biomarkers. The individual components of the log-likelihood ratio for the disease versus the control class are tabulated and can be calculated from the parameters in table 31 and the values of X. The sum of the individual log-likelihood ratios is -3.326; equivalently, the likelihood of being free of disease versus having disease is 28, since e^3.326 ≈ 28. The first 4 biomarker values had likelihoods more consistent with the disease group (log-likelihood > 0), but the remaining 6 biomarkers all consistently favored the control group. Multiplying the likelihoods together gives the same result as shown above: a likelihood of 28 that the unknown sample is free of disease. In fact, this sample came from the control population in the renal cell carcinoma training set.
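The per-marker log-likelihood calculation described above can be sketched as follows. This is a minimal illustration, assuming a normal density on log(RFU) for each class; the function names and the single-marker parameters are hypothetical and are not the values of table 31. The sign convention is disease versus control, so a positive value favors the disease group.

```python
import math
from typing import Sequence, Tuple


def log_normal_density(x: float, mu: float, sigma: float) -> float:
    """Natural log of a normal density at x, where x is already log(RFU)."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mu) / sigma) ** 2


def log_likelihood_ratio(
    x: Sequence[float],
    disease_params: Sequence[Tuple[float, float]],  # (mean, standard deviation) per marker
    control_params: Sequence[Tuple[float, float]],
    prevalence: float = 0.5,
) -> float:
    """ln[p(d|X) / p(c|X)] under the naive Bayes independence assumption."""
    total = math.log(prevalence / (1.0 - prevalence))  # zero when prevalence is 0.5
    for xi, (mu_d, sd_d), (mu_c, sd_c) in zip(x, disease_params, control_params):
        total += log_normal_density(xi, mu_d, sd_d) - log_normal_density(xi, mu_c, sd_c)
    return total


# Hypothetical one-marker example: the sample sits exactly on the disease mean.
llr = log_likelihood_ratio([1.0], [(1.0, 1.0)], [(0.0, 1.0)])
print(llr)

# A summed disease-versus-control ratio of -3.326 corresponds to odds of
# exp(3.326) in favor of the control class.
control_odds = math.exp(3.326)
print(round(control_odds))
```

Summing log-likelihoods rather than multiplying likelihoods avoids numerical underflow when many markers are combined; the two are equivalent, as the text notes.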
6.2 greedy Algorithm for selecting cancer biomarker panels for classifiers
Part 1
Subsets of the biomarkers in table 1 were selected to construct potential classifiers that could be used to determine which markers can serve as general cancer biomarkers for detecting cancer.
Because a different model was trained for each of the 3 cancer studies with a given set of markers, an overall measure of performance was needed to select a set of biomarkers able to classify many different types of cancer simultaneously. The measure of classifier performance used here is the mean of the area under the ROC curve across all of the naive Bayes classifiers. The ROC curve is a plot of the true positive rate (sensitivity) against the false positive rate (1 - specificity) for a single classifier. The area under the ROC curve (AUC) ranges from 0 to 1.0, where an AUC of 1.0 corresponds to perfect classification and an AUC of 0.5 corresponds to a random (coin toss) classifier. Other common measures of performance could be applied, such as the F-measure or the sum or product of sensitivity and specificity. In particular, it may be desirable to weight sensitivity and specificity differently, so as to select classifiers with higher specificity at the expense of some sensitivity, or classifiers with higher sensitivity at the expense of some specificity. We chose to use the AUC because it encompasses all combinations of sensitivity and specificity in a single measure. Different applications will have different benefits for true positive and true negative findings and different costs associated with false positive and false negative findings. Changing the performance measure may change the exact subset of markers selected for a given data set.
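The mean-AUC performance measure can be sketched as follows. This is an illustration only: the function names and the scores are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half), which equals the area under the empirical ROC curve.

```python
from typing import List, Sequence, Tuple


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Area under the empirical ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for s in case_scores:
        for t in control_scores:
            if s > t:
                wins += 1.0
            elif s == t:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def mean_auc(studies: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Average the AUC over several studies, one classifier per study."""
    return sum(auc(cases, controls) for cases, controls in studies) / len(studies)


# Hypothetical classifier scores for three studies (cases, controls).
toy_studies: List[Tuple[List[float], List[float]]] = [
    ([0.9, 0.8], [0.1, 0.2]),  # perfectly separated
    ([0.9, 0.4], [0.5, 0.1]),  # one case ranked below one control
    ([1.0], [1.0]),            # a tie, equivalent to a coin toss
]
per_study = [auc(cases, controls) for cases, controls in toy_studies]
overall = mean_auc(toy_studies)
print(per_study, overall)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be the usual choice for larger data sets.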
For the Bayesian approach to discriminating cancer samples from control samples described in section 6.1 of this example, the classifier was completely parameterized by the distributions of the biomarkers in each of the 3 cancer studies, and the list of biomarkers was chosen from table 19. That is, given a set of training data, the subset of markers chosen for inclusion determines the classifier in a one-to-one manner.
The greedy approach employed here was used to search for the best subset of markers from table 1. For small numbers of markers, or for classifiers with relatively few markers, every possible subset of markers is enumerated and evaluated in terms of the performance of the classifier constructed with that particular marker set (see example 4). (This approach is well known in the field of statistics as "best subset selection"; see, e.g., Hastie et al.) However, for the classifiers described herein, the number of combinations of multiple markers can be very large, and it is not feasible to evaluate every possible set of 10 markers, since there are 30,045,015 possible combinations that can be generated from a list of only 30 total analytes. Because searching through every subset of markers is impractical, the single optimal subset may not be found; however, by using this approach, many excellent subsets are found, and in many cases any of these subsets may represent an optimal one.
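The combination count quoted above can be checked directly with the standard library. The figure for 30 analytes is from the text; applying the same call to the 59 markers of table 1 is an added illustration of how quickly the search space grows.

```python
import math

# Number of distinct 10-marker sets that can be drawn from 30 analytes.
n_from_30 = math.comb(30, 10)
print(f"{n_from_30:,}")  # 30,045,015

# The same count for the 59 markers of table 1 is far larger.
n_from_59 = math.comb(59, 10)
print(f"{n_from_59:,}")
```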
Instead of evaluating every possible set of markers, a "greedy" forward stepwise approach may be followed (see, e.g., Dabney AR, Storey JD (2007) Optimality Driven Nearest Centroid Classification from Genomic Data. PLoS ONE 2(10): e1002. doi:10.1371/journal.pone.0001002). Using this method, a classifier is started with the best single marker (based on the KS distance for the individual markers) and is grown at each step by trying, in turn, each member of the marker list that is not currently a member of the set of markers in the classifier. The one marker that scores best in combination with the existing classifier is added to the classifier. This is repeated until no further improvement in performance is achieved. Unfortunately, this approach may miss valuable combinations of markers for which some of the individual markers are not all chosen before the process stops.
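The forward stepwise procedure described above can be sketched as follows. This is an illustration under stated assumptions: greedy_forward, score_fn and the toy gains are hypothetical names and values, the seed marker is passed in rather than chosen by KS distance, and score_fn stands in for training a classifier on the candidate set and returning its performance.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def greedy_forward(
    markers: Sequence[str],
    score_fn: Callable[[List[str]], float],
    seed: str,
) -> Tuple[List[str], float]:
    """Grow a marker set one marker at a time until the score stops improving."""
    selected = [seed]
    best = score_fn(selected)
    while True:
        candidates = [m for m in markers if m not in selected]
        if not candidates:
            break
        # Try every marker not yet in the classifier and keep the best addition.
        trial_score, trial = max((score_fn(selected + [m]), m) for m in candidates)
        if trial_score <= best:
            break  # no further improvement
        selected.append(trial)
        best = trial_score
    return selected, best


# Hypothetical additive gains; marker "D" makes the score slightly worse.
toy_gain: Dict[str, float] = {"A": 0.80, "B": 0.05, "C": 0.03, "D": -0.01}


def toy_score(subset: List[str]) -> float:
    return sum(toy_gain[m] for m in subset)


selected, best_score = greedy_forward(list(toy_gain), toy_score, seed="A")
print(selected, round(best_score, 2))
```

Because only one set is carried forward at each step, a combination whose members look weak individually is never examined, which is the limitation noted in the text.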
The greedy procedure used here is an elaboration of the preceding forward stepwise approach in that, to broaden the search, a list of candidate marker sets is retained rather than just a single subset of markers at each step. The list is seeded with the list of single markers. The list is expanded in steps by deriving new marker subsets from the ones currently on the list and adding them to the list. Each marker subset currently on the list is extended by adding any marker from table 1 that is not already part of that classifier and that would not, on its addition to the subset, duplicate an existing subset (these are termed "allowed markers"). Each time a new set of markers is defined, a set of classifiers, one for each cancer study, is trained using these markers, and the overall performance is measured via the mean AUC across all 3 studies. To avoid potential overfitting, the AUC for each cancer study model was calculated via a ten-fold cross-validation procedure. Every existing marker subset is extended by every allowed marker from the list. Clearly, such a process would eventually generate every possible subset, and the list would run out of space. Therefore, all the generated marker sets are kept only while the list is smaller than some predetermined size. Once the list reaches the predetermined size limit, it becomes elitist; that is, only those classifier sets that show a certain level of performance are kept on the list, and the others fall off the end of the list and are discarded. This is achieved by keeping the list sorted in order of classifier set performance; inserting a new classifier set that is at least as good as the worst classifier set currently on the list forces the exclusion of the current bottom underachiever.
A further implementation detail is that the list is completely replaced at each generation step; therefore, every marker set on the list has the same number of markers, and the number of markers per classifier increases by one at each step.
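The list-based search described above can be sketched as follows. This is an illustration under stated assumptions: elitist_list_search, score_fn, list_limit and the toy weights are hypothetical names and values, and score_fn stands in for the mean cross-validated AUC across the 3 studies. Sorting and truncating each generation is used here as a simple equivalent of keeping a sorted list and dropping the worst entries once the size limit is reached.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple


def elitist_list_search(
    markers: Sequence[str],
    score_fn: Callable[[FrozenSet[str]], float],
    max_size: int,
    list_limit: int,
) -> List[List[Tuple[List[str], float]]]:
    """Return, for each panel size, the retained marker sets with their scores."""
    current = [frozenset([m]) for m in markers]  # seed: all single markers
    generations: List[List[Tuple[List[str], float]]] = []
    for size in range(1, max_size + 1):
        # Keep only the best-scoring sets once the list exceeds its size limit.
        ranked = sorted(current, key=lambda s: (-score_fn(s), sorted(s)))
        current = ranked[:list_limit]
        generations.append([(sorted(s), score_fn(s)) for s in current])
        if size == max_size:
            break
        # Replace the list wholesale: extend every retained set by every
        # marker it lacks; the set comprehension removes duplicate subsets.
        current = list({s | {m} for s in current for m in markers if m not in s})
    return generations


# Hypothetical integer weights, so that the toy scores are exact.
toy_weight: Dict[str, int] = {"A": 5, "B": 3, "C": 2, "D": 1}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(toy_weight[m] for m in subset)


generations = elitist_list_search(list(toy_weight), toy_score, max_size=3, list_limit=3)
best_per_size = [gen[0][0] for gen in generations]
print(best_per_size)
```

Every set in a given generation has the same number of markers, matching the implementation detail in the text, and the list limit trades search breadth against run time.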
In one embodiment, the sets (or panels) of biomarkers that can be used to construct classifiers for diagnosing general cancer versus non-cancer are selected based on the mean AUC for the particular combination of biomarkers used in the classification scheme. We identified many combinations of biomarkers derived from the markers in table 19 that are able to effectively classify different cancer samples from controls. Representative panels are set forth in tables 22-29, which set forth a series of 100 different panels of 3-10 biomarkers, with the indicated mean cross-validation (CV) AUC for each panel. The total number of occurrences of each marker in each of these panels is indicated at the bottom of each table.
The biomarkers selected in table 19 produced classifiers that performed better than classifiers constructed with "non-markers". Fig. 15 shows the performance of our ten-biomarker classifier compared with the performance of other possible classifiers.
Figure 15A shows the distribution of mean AUC for classifiers constructed from a randomly sampled set of ten "non-markers" taken from the entire set of 23 markers present in all 3 studies, excluding the ten markers in table 19. The performance of ten potential cancer biomarkers is shown as a vertical dashed line. This figure clearly shows that the performance of the ten cancer biomarkers far exceeds the distribution of the other marker combinations.
Fig. 15B shows a similar distribution as fig. 15A, however, the random sampling set was limited to 49 biomarkers from table 1 that were not selected by the greedy biomarker selection program for the ten analyte classifier. This figure demonstrates that the ten biomarkers selected by the greedy algorithm represent a subset of biomarkers that generalize to other types of cancer, which is far superior to classifiers constructed with the remaining 49 biomarkers.
Finally, fig. 16 shows the classifier ROC curves for each of the 3 cancer study classifiers.
The foregoing embodiments and examples are intended to be examples only. No particular embodiment, example, or element of a particular embodiment or example is to be construed as a critical, required, or essential element or feature of any of the claims. Further, no element described herein is required for the practice of the appended claims unless expressly described as "essential" or "critical". Various alterations, modifications, substitutions and other variations may be made to the disclosed embodiments without departing from the scope of the invention, which is defined by the appended claims. The specification, including the drawings and examples, is to be regarded in an illustrative rather than a restrictive sense, and all such modifications and substitutions are intended to be included within the scope of the present application. Accordingly, the scope of the application should be determined by the appended claims and their legal equivalents, rather than by the examples given above. For example, the steps recited in any of the method or process claims may be performed in any feasible order and are not limited to the order presented in any of the embodiments, examples, or claims. Further, in any of the above methods, one or more of the biomarkers of table 1 or table 19 can be specifically excluded, either as an individual biomarker or as a biomarker from any panel.
Table 1: cancer biomarkers
Table 2: panel of 1 biomarker
Table 3: panel of 2 biomarkers
Table 4: panel of 3 biomarkers
Table 5: panel of 4 biomarkers
Table 6: panel of 5 biomarkers
Table 7: panel of 6 biomarkers
Table 8: panel of 7 biomarkers
Table 9: panel of 8 biomarkers
Table 10: panel of 9 biomarkers
Table 11: panel of 10 biomarkers
Table 12: marker enumeration in biomarker panel
Table 13: analytes in ten-marker classifiers
CLIC1 BDNF
MMP7 STX1A
GHR TGFBI
CHRDL1 CRP
LRIG3 KLK3-SERPINA3
AHSG KIT
Table 14: parameters derived from a training set for a naive Bayes classifier
Table 15: AUC for an exemplary combination of biomarkers
# Biomarkers AUC
1 MMP7 0.803
2 MMP7 CLIC1 0.883
3 MMP7 CLIC1 STX1A 0.901
4 MMP7 CLIC1 STX1A CHRDL1 0.899
5 MMP7 CLIC1 STX1A CHRDL1 PA2G4 0.912
6 MMP7 CLIC1 STX1A CHRDL1 PA2G4 SERPINA1 0.922
7 MMP7 CLIC1 STX1A CHRDL1 PA2G4 SERPINA1 BDNF 0.930
8 MMP7 CLIC1 STX1A CHRDL1 PA2G4 SERPINA1 BDNF GHR 0.937
9 MMP7 CLIC1 STX1A CHRDL1 PA2G4 SERPINA1 BDNF GHR TGFBI 0.944
10 MMP7 CLIC1 STX1A CHRDL1 PA2G4 SERPINA1 BDNF GHR TGFBI NME2 0.948
Table 16: calculations derived from a training set for a naive Bayes classifier
Table 17: clinical features of training sets
Table 18: ten biomarker classifier proteins
Table 19: biomarkers for cancer in general
KLK3-SERPINA3 EGFR
BMPER FGA-FGB-FGG
C9 STX1A
AKR7A2 CKB-CKM
DDC CA6
IGFBP2 IGFBP4
FN1 BMP1
CRP KIT
CNTN1 SERPINA1
BDNF GHR
ITIH4 NME2
AHSG
Table 20: panel of 1 biomarker
Table 21: panel of 2 biomarkers
Table 22: panel of 3 biomarkers
Table 23: panel of 4 biomarkers
Table 24: panel of 5 biomarkers
Table 25: panel of 6 biomarkers
Table 26: panel of 7 biomarkers
Table 27: panel of 8 biomarkers
Table 28: panel of 9 biomarkers
Table 29: panel of 10 biomarkers
Table 30: marker enumeration in biomarker panel
Table 31: parameters derived from a cancer training set for a naive Bayes classifier
Table 32: calculations derived from a training set for a naive Bayes classifier

Claims (46)

1. Use of a capture agent specific for a biomarker protein listed in table 1 in the preparation of a kit for diagnosing non-small cell lung cancer in an individual by a method comprising:
providing a biomarker panel comprising N biomarker proteins listed in table 1, wherein at least one of the biomarker proteins is CHRDL1; and
determining biomarker levels in a biological sample from the individual, the biomarker levels each corresponding to one of N biomarker proteins selected from table 1, wherein the levels of the N biomarker proteins provide an indication of the likelihood that the individual has or does not have non-small cell lung cancer, and wherein N is at least 2.
2. The use of claim 1, wherein N is at least 3.
3. The use of claim 1, wherein N is at least 4.
4. The use of claim 1, wherein N is at least 5.
5. The use of claim 1, wherein N is at least 6.
6. The use of claim 1, wherein N is at least 7.
7. The use of claim 1, wherein N is at least 8.
8. The use of claim 1, wherein N is at least 9.
9. The use of claim 1, wherein N is at least 10.
10. The use of claim 1, wherein N is at least 11.
11. The use of claim 1, wherein N is at least 12.
12. Use of a capture agent specific for a biomarker protein listed in table 1 in the preparation of a kit for screening an individual for non-small cell lung cancer by a method comprising:
providing a biomarker panel comprising N biomarker proteins listed in table 1, wherein at least one of the biomarker proteins is CHRDL1; and
determining biomarker levels in a biological sample from the individual, the biomarker levels each corresponding to one of N biomarker proteins selected from table 1, wherein the levels of the N biomarker proteins provide an indication of the likelihood that the individual has or does not have non-small cell lung cancer, and wherein N is at least 2.
13. The use of claim 12, wherein N is at least 3.
14. The use of claim 12, wherein N is at least 4.
15. The use of claim 12, wherein N is at least 5.
16. The use of claim 12, wherein N is at least 6.
17. The use of claim 12, wherein N is at least 7.
18. The use of claim 12, wherein N is at least 8.
19. The use of claim 12, wherein N is at least 9.
20. The use of claim 12, wherein N is at least 10.
21. The use of claim 12, wherein N is at least 11.
22. The use of claim 12, wherein N is at least 12.
23. Use of a capture agent specific for a biomarker protein listed in table 1 in the preparation of a kit for diagnosing non-small cell lung cancer in an individual by a method comprising:
providing a biomarker panel comprising N biomarker proteins listed in table 1, wherein at least one of the biomarker proteins is CHRDL1;
determining biomarker levels in a biological sample from the individual, the biomarker levels each corresponding to one of N biomarker proteins selected from table 1, wherein the levels of the N biomarker proteins provide an indication of the likelihood that the individual has or does not have non-small cell lung cancer, and wherein N is at least 2; and
determining a biomarker value for each protein biomarker of the biomarker panel, wherein the combined biomarker value of the biomarker panel provides an indication of the likelihood that the individual has or does not have non-small cell lung cancer, and wherein the combined biomarker value of the biomarker panel has an AUC value of 0.80 or greater.
24. The use of claim 23, wherein the biomarker panel has an AUC value of 0.85 or greater.
25. The use of claim 23, wherein N is at least 3.
26. The use of claim 23, wherein N is at least 4.
27. The use of claim 23, wherein N is at least 5.
28. The use of claim 23, wherein N is at least 6.
29. The use of claim 23, wherein N is at least 7.
30. The use of claim 23, wherein N is at least 8.
31. The use of claim 23, wherein N is at least 9.
32. The use of claim 23, wherein N is at least 10.
33. The use of claim 23, wherein N is at least 11.
34. The use of claim 23, wherein N is at least 12.
35. The use of any one of claims 1-11 and 12-34, wherein determining the biomarker level comprises performing an in vitro assay.
36. The use of claim 35, wherein said in vitro assay comprises at least one capture reagent corresponding to each of said biomarkers, and further comprising selecting said at least one capture reagent from the group consisting of aptamers and antibodies.
37. The use of claim 36, wherein the at least one capture reagent is an aptamer.
38. The use of claim 35, wherein the in vitro assay is selected from the group consisting of an immunoassay, an aptamer-based assay, a histological or cytological assay.
39. The use of any one of claims 1-11 and 12-34, wherein each protein biomarker level is assessed based on a predetermined level or a predetermined level range.
40. The use of any one of claims 1-11 and 12-34, wherein the biological sample is selected from the group consisting of whole blood, plasma, and serum.
41. The use of claim 40, wherein the biological sample is serum.
42. The use of any one of claims 1-11 and 12-34, wherein the individual is a human.
43. The use of any one of claims 1-11 and 12-34, wherein the individual is a smoker.
44. The use of any one of claims 1-11 and 12-34, wherein the individual has a pulmonary nodule.
45. The use of any one of claims 1-11 and 12-34, wherein the individual is diagnosed as having, or the likelihood of the individual having, non-small cell lung cancer is determined based on the biomarker levels and at least one additional item of biomedical information corresponding to the individual.
46. The use according to claim 45, wherein the at least one further item of biomedical information is independently selected from the group consisting of:
(c) information corresponding to the presence or absence of pulmonary nodules in the individual,
(e) information corresponding to a change in height and/or weight of the individual,
(f) information corresponding to the ethnicity of the individual,
(g) information corresponding to the gender of the individual,
(h) information corresponding to the individual's smoking history,
(j) information corresponding to a history of alcohol consumption in the individual,
(k) information corresponding to a professional history in the individual,
(l) information corresponding to a family history of lung cancer or other cancer in the individual, and
(m) information corresponding to the presence or absence in the individual of at least one genetic marker associated with a higher risk of lung cancer or other cancer in the individual or a family member of the individual.
HK17102551.8A 2017-03-13 Lung cancer biomarkers and uses thereof HK1229002B (en)

Publications (3)

Publication Number Publication Date
HK1229002A HK1229002A (en) 2017-11-10
HK1229002A1 HK1229002A1 (en) 2017-11-10
HK1229002B HK1229002B (en) 2019-06-06
