WO2010004562A2

WO2010004562A2 - Methods and compositions for detecting colorectal cancer

Info

Publication number: WO2010004562A2
Application number: PCT/IL2009/000683
Authority: WO
Inventors: Baruch Brenner; Shlomit Gilad; Yaron Goren; Ayelet Chajut
Original assignee: Mor Research Applications Ltd; Rosetta Genomics Ltd
Current assignee: Mor Research Applications Ltd; Rosetta Genomics Ltd
Priority date: 2008-07-09
Filing date: 2009-07-08
Publication date: 2010-01-14
Anticipated expiration: 2011-01-09
Also published as: WO2010004562A3

Abstract

The invention provides a method for conducting minimally-invasive early detection of colorectal cancer and/or of colorectal cancer precursor cells, by using microRNA molecules associated with colorectal cancer, as well as various nucleic acid molecules relating thereto or derived thereof.

Description

METHODS AND COMPOSITIONS FOR DETECTING COLORECTAL CANCER

CROSS REFERENCE TO RELATED APPLICATIONS

The present application claims priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 61/079,136, filed July 9, 2008, which is herein incorporated by reference in its entirety.

FIELD OF THE INVENTION

The invention relates in general to microRNA molecules associated with colorectal cancer, as well as various nucleic acid molecules relating thereto or derived thereof.

BACKGROUND OF THE INVENTION

In recent years, microRNAs (miRs) have emerged as an important novel class of regulatory RNA, which have a profound impact on a wide array of biological processes. These small (typically 17-24 nucleotides long) non-coding RNA molecules can modulate protein expression patterns by promoting RNA degradation, inhibiting mRNA translation, and also affecting gene transcription. miRs play pivotal roles in diverse processes such as development and differentiation, control of cell proliferation, stress response and metabolism. The expression of many miRs was found to be altered in numerous types of human cancer, and in some cases strong evidence has been put forward in support of the conjecture that such alterations may play a causative role in tumor progression. There are currently about 885 known human miRs.

Colorectal cancer (CRC) is the third most frequently diagnosed malignancy in the United States and the second most common cause of cancer death. The prognosis of CRC is directly related to the stage of the disease. The five-year survival rate for patients with an early stage is approximately 90% but only one third of patients are diagnosed at this stage. The survival rate drops to 70-80% in case of deep tumor penetration into the bowel wall, 50-70% in case of regional lymph node involvement and 5-7% in patients with distant metastases. As described above, the prognosis of colorectal cancer is directly related to the degree of penetration of the tumor through the bowel wall and the presence or absence of nodal involvement; consequently early detection and treatment are crucial. Currently, screening tests for CRC, aimed at detecting early malignancy or premalignant polyps, consist of fecal occult blood test (FOBT), sigmoidoscopy, colonoscopy and double contrast barium enemas. Each one of these methods has significant limitations, mainly low sensitivity or specificity, or both, as well as associated risks and discomfort and a reluctance to comply with collection of fecal samples in the case of FOBT. Consequently, none has been widely accepted by the medical community or the public. Treatment regimens are determined by the type and stage of the cancer, and include surgery, radiation therapy and/or chemotherapy (including biological agents). Recurrence following surgery (the most common form of therapy) is a major problem and is often the ultimate cause of death. In spite of considerable research into diagnosis and therapies for colorectal cancer, it remains difficult to diagnose and treat effectively. Accordingly, there is a need in the art for improved methods for detecting colorectal cancer early in the course of the disease.

SUMMARY OF THE INVENTION

Circulating nucleic acids in body fluids offer unique opportunities for early diagnosis of colorectal cancer. The present invention provides specific nucleic acid sequences for use in the identification, early detection and diagnosis of colorectal cancer. The nucleic acid sequences can also be used as prognostic markers for prognostic evaluation of a subject based on their expression pattern in a biological sample. The invention further provides a method of minimally-invasive early detection of colorectal cancer and/or of colorectal cancer precursor cells.

The invention further provides a method for the detection of colorectal cancer, the method comprising: obtaining a biological sample from a subject; determining an expression profile in said sample of nucleic acid sequences selected from the group consisting of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof or a sequence having at least about 80% identity thereto; and comparing said expression profile to a reference expression profile representing the expression levels of said nucleic acid in healthy controls; whereby an altered expression level of the nucleic acid sequence allows the detection of said colorectal cancer. According to some embodiments, said altered expression level is a change in a score based on a combination of expression level of said nucleic acid sequences.

According to some embodiments, relatively high expression levels of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1, 4, 5, 9-12, 17, 18, 22-24, 33-34, 81-84, 86-104, 124, 128-129, 132-154, 174; a fragment thereof and a sequence having at least about 80% identity thereto is indicative of the presence of colorectal cancer.

According to other embodiments, relatively low expression levels of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 2, 3, 6-8, 13-16, 19-21, 105, 107-123, 125-127, 155, 157-173, 175-177; a fragment thereof and a sequence having at least about 80% identity thereto is indicative of the presence of colorectal cancer.

According to some embodiments, said method further comprises managing subject treatment based on the colorectal cancer status.

According to other embodiments, managing subject treatment is selected from ordering further diagnostic tests, administering at least one therapeutic agent, administering radiation therapy, immunotherapy, hyperthermia, surgery, surgery followed or preceded by chemotherapy and/or radiation therapy, biotherapy, and taking no further action.

According to some embodiments, said biological sample is selected from the group consisting of bodily fluid, a cell line and a tissue sample. According to one embodiment, the bodily fluid sample is a serum sample. According to another embodiment, said bodily fluid sample is a urine sample.

According to some embodiments, the method comprises determining the expression of at least two nucleic acid sequences. According to some embodiments, the method further comprises combining one or more expression ratios. According to some embodiments, the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization, nucleic acid amplification, and a combination thereof. According to some embodiments, the nucleic acid amplification method is real-time PCR (RT-PCR). According to one embodiment, said real-time PCR is quantitative real-time PCR (qRT-PCR). According to some embodiments, the RT-PCR method comprises forward and reverse primers. According to other embodiments, the forward primer comprises a sequence selected from the group consisting of SEQ ID NOS: 36-46, 66-67, 225-265, 267-271; a fragment thereof and a sequence having at least about 80% identity thereto. According to some embodiments, the real-time PCR method further comprises hybridization with a probe. According to some embodiments, the probe comprises a nucleic acid sequence that is complementary to a sequence selected from the group consisting of any one of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof and sequences at least about 80% identical thereto. According to other embodiments, the probe comprises a sequence selected from the group consisting of any one of SEQ ID NOS: 47-57, 77-78, 178-203, 205-218, 220-224; a fragment thereof and a sequence having at least about 80% identity thereto.

The invention further provides a kit for the detection of colorectal cancer; said kit comprises a probe comprising a nucleic acid sequence that is complementary to a sequence selected from the group consisting of any one of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof and sequences having at least about 80% identity thereto. According to some embodiments, said probe comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 47-57, 77-78, 178-203, 205-218, 220-224; a fragment thereof and sequences having at least about 80% identity thereto.

According to other embodiments, the kit further comprises a forward primer comprising a sequence selected from the group consisting of SEQ ID NOS: 36-46, 66-67, 225-265, 267-271; a fragment thereof and sequences having at least about 80% identity thereto. According to other embodiments, the kit further comprises a reverse primer comprising SEQ ID NO: 80, a fragment thereof and sequences having at least about 80% identity thereto.

These and other embodiments of the present invention will become apparent in conjunction with the figures, description and claims that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A-1D are plots showing the detection of microRNAs in serum. The microRNA profile is relatively constant in the population. Figures 1A-1B show the microRNA levels in serum samples taken from ten healthy individuals. Signals (C_T values) from two samples are shown (one sample C_T, vertical axis) relative to the median of all ten samples (horizontal axis). The levels of 363 different microRNAs were similar across the 10 samples, with a correlation coefficient above 0.91 for most pairs of samples. Figure 1C is a histogram of standard deviations (horizontal axis) for the detection level (C_T) of the microRNAs shown in Figures 1A-1B (vertical axis indicates the number of miRs with the specified standard deviation). For most of the microRNAs the standard deviation across the 10 samples was below 1 C_T. Figure 1D shows box plots of the signals (C_T values, vertical axis) of 11 microRNAs from Table 3 in serum samples taken from 74 healthy individuals. Boxplots show the median (horizontal line), 25th to 75th percentile (box), extent of data up to 1.5 times the interquartile range ("whiskers"), and outliers (crosses).

Figures 2A-2C demonstrate the different levels of circulating microRNAs in the sera of healthy individuals and CRC patients. Figures 2A-2B demonstrate that the amounts of circulating hsa-miR-16 (SEQ ID NO: 2) (Fig. 2A) and hsa-miR-125b (SEQ ID NO: 3) (Fig. 2B) are lower (higher C_T) in sera of CRC patients (boxes 2, 4) compared to healthy individuals (boxes 1, 3). The differences observed in the initial screening (10 samples in each group) (boxes 1, 2) were maintained in the larger cohort (boxes 3, 4). Boxplots show the median (horizontal line), 25th to 75th percentile (box), and extent of data ("whiskers"). Figure 2C demonstrates the classification of CRC based on microRNA profile in serum. Each of the 74 healthy controls and 44 CRC patients is represented by a marker. Squares represent healthy controls, and circles (of increasing darkness) represent CRC patients (with increasing stages of disease). The horizontal axis shows the normalized C_T values of hsa-miR-16 (C_T^16), where increasing values of C_T^16 indicate lower levels of hsa-miR-16. The vertical axis shows the normalized C_T values of hsa-miR-125b (C_T^125b), where increasing values of C_T^125b indicate lower levels of hsa-miR-125b. CRC patients have lower levels of hsa-miR-16 and hsa-miR-125b (higher values of C_T^16 and C_T^125b). Two possible thresholds for classification are shown. A cutoff at the solid line (C_T^16 + C_T^125b = 58.6) had sensitivity of 91% (40/44) and specificity of 72% (53/74); a cutoff at the dashed line (C_T^16 + C_T^125b = 59.6) had sensitivity of 55% (24/44) and specificity of 93% (69/74) in identifying colorectal cancer patients. For the group of non-metastatic cancer patients, the sensitivities of these cutoffs were 93% and 57%, respectively (the specificities are unchanged); for the metastatic patients the sensitivities were 88% and 50%, respectively (specificities are unchanged). These differences are not significant (p-value>0.5 by Fisher exact test). Inset: Receiver operating characteristic (ROC) for the metric defined by the combination of these two microRNAs (C ≡ C_T^16 + C_T^125b), which had an area under the curve (AUC) of 0.86. The ROC curves had AUC=0.86 for non-metastatic CRC patients and AUC=0.85 for metastatic cases. The dark square indicates the cutoff at C=58.6; the light square indicates the cutoff at C=59.6. The dotted line indicates a random classifier, with AUC=0.5.
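The two-marker rule described for Figure 2C can be illustrated with a minimal Python sketch. It is not the applicants' implementation: the function names are hypothetical, treating the cutoff as inclusive is an assumption, and pairing the 91%/72% figures with C=58.6 is inferred from the cutoffs marked on the ROC inset. The patient and control counts are those quoted above.

```python
def combined_score(ct_mir16: float, ct_mir125b: float) -> float:
    """Combined metric C = C_T^16 + C_T^125b (normalized C_T values)."""
    return ct_mir16 + ct_mir125b


def is_crc_suspected(ct_mir16: float, ct_mir125b: float, cutoff: float) -> bool:
    """Call a sample CRC-suspected when C reaches the cutoff.

    Higher C_T means less microRNA, and CRC sera show lower levels of both
    markers, so a larger C points towards CRC. Treating the boundary as
    inclusive (>=) is an assumption of this sketch.
    """
    return combined_score(ct_mir16, ct_mir125b) >= cutoff


def sensitivity(detected_patients: int, total_patients: int) -> float:
    """Fraction of CRC patients that the cutoff classifies as CRC."""
    return detected_patients / total_patients


def specificity(correct_controls: int, total_controls: int) -> float:
    """Fraction of healthy controls that the cutoff classifies as healthy."""
    return correct_controls / total_controls


# Counts quoted in the description of Figure 2C (44 patients, 74 controls).
for label, detected, correct in [("C=58.6", 40, 53), ("C=59.6", 24, 69)]:
    print(label,
          f"sensitivity={sensitivity(detected, 44):.0%}",
          f"specificity={specificity(correct, 74):.0%}")
```

Raising the cutoff trades sensitivity for specificity, which is the behaviour the two thresholds in the figure illustrate.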

Figure 3 is a graph showing differential levels of microRNA in serum samples obtained from CRC patients (vertical axis) (n=19) as compared to microRNA levels in serum samples obtained from healthy individuals (horizontal axis) (n=19). The results are based on RT-PCR analysis, and show the median of the normalized signal of each microRNA (represented by crosses) for each of the two groups (the horizontal/vertical axes). The parallel lines describe a fold change of 1.5 in either direction between the groups. Statistically significant microRNAs are marked with circles (see details in Table 6). P-values are calculated by two-sided Student t-test, and significance is adjusted using an FDR (false discovery rate) of 0.1.

Figures 4A-4H are boxplot presentations comparing distributions of the presence of exemplified statistically significant microRNAs: hsa-miR-211 (SEQ ID NO: 94) (4A), hsa-miR-451 (SEQ ID NO: 6) (4B), hsa-miR-107 (SEQ ID NO: 111) (4C), hsa-miR-15a (SEQ ID NO: 117) (4D), hsa-miR-622 (SEQ ID NO: 82) (4E), hsa-miR-658 (SEQ ID NO: 4) (4F), hsa-miR-501-5p (SEQ ID NO: 88) (4G) and hsa-miR-500* (SEQ ID NO: 92) (4H) in serum samples obtained from CRC patients or healthy subjects. The results are based on real-time PCR, and a higher normalized signal indicates higher amounts of microRNA present in the sample or samples. The normalized C_T signal (vertical axis) is calculated as follows: for each sample, the sample-average-C_T is calculated by taking the average C_T of all probes tested for this sample. The overall-average-C_T is calculated by taking the mean of the sample-average-C_T over all samples. For each sample, the rescaling-number is calculated by subtracting the overall-average-C_T from the sample-average-C_T. The rescaled-signals (for each probe) are calculated for each sample by subtracting the rescaling-number from the original C_T of each probe.
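The rescaling steps just listed can be sketched in Python as follows. This is a minimal illustration only: the function name, the nested-dictionary data layout and the example values are assumptions, not part of the original disclosure.

```python
from statistics import mean


def rescale_ct(ct_by_sample):
    """Rescale raw C_T values so that every sample shares the same average.

    ct_by_sample maps sample name -> {probe name -> raw C_T}.
    Returns a mapping of the same shape holding the rescaled signals.
    """
    # sample-average-C_T: average C_T of all probes tested for each sample
    sample_avg = {s: mean(list(probes.values()))
                  for s, probes in ct_by_sample.items()}
    # overall-average-C_T: mean of the sample averages over all samples
    overall_avg = mean(list(sample_avg.values()))

    rescaled = {}
    for s, probes in ct_by_sample.items():
        # rescaling-number: sample average minus overall average
        rescaling_number = sample_avg[s] - overall_avg
        # rescaled signal: original C_T minus the rescaling-number
        rescaled[s] = {p: ct - rescaling_number for p, ct in probes.items()}
    return rescaled


# Hypothetical example: sample B reads uniformly two cycles later than sample A.
example = {
    "A": {"miR-x": 30.0, "miR-y": 32.0},
    "B": {"miR-x": 32.0, "miR-y": 34.0},
}
rescaled_example = rescale_ct(example)
```

In this made-up example the uniform two-cycle offset between the samples is removed, so both samples end up with identical rescaled signals.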
The C_T measurement by PCR, as well as the rescaled-signal described above, produces higher numbers if the amount of original measured sequence is lower. In order to show the measurement on a more intuitive scale, where higher numbers represent higher amount of measured substance, we use the "normalized signal" which is the rescaled-signal subtracted from the arbitrary number 50, so chosen because all C_T measurements in our system are smaller than 50, which is above the maximal cycle used. For calculation of fold-changes, the data is translated from the C_T-space, which is logarithmic in the amounts measured, to a linear measurement space by taking the exponent (base 2). For each miR two boxes are shown: the left box is for the group of serum samples obtained from CRC patients and the right box is for the group of serum samples obtained from healthy subjects. The line in the box indicates the median value. The box boundaries indicate the 25th and 75th percentiles. The horizontal lines and crosses (outliers whose distance from the top or bottom box boundary is more than 1.5 times the height of the box) show the full range of signals in this group.

DETAILED DESCRIPTION OF THE INVENTION

The invention is based on the discovery that specific biomarker sequences (SEQ ID NOS: 1-271) can be used for the identification, early detection and diagnosis of colorectal cancers. Biomarkers have the potential to revolutionize diagnosis and treatment of various medical conditions. In particular, a theme of current cancer research is the quest for sensitive biomarkers that can be exploited to detect early neoplastic changes. Ideally, biomarkers should be sampled in a minimally-invasive way. Therefore the challenge of diverse biomedical research fields has been to identify biomarkers in body fluids, such as serum or urine. In recent years it has become clear that both cell-free DNA and mRNA are present in serum, as well as in other body fluids, and represent potential biomarkers. However, monitoring the typically small amounts of these nucleic acids in body fluids requires sensitive detection methods, which are not currently clinically applicable.

The present invention provides a sensitive, specific and accurate method which can be used for conducting minimally-invasive early detection of colorectal cancer and/or of colorectal cancer precursor cells. The methods of the present invention have high sensitivity and specificity.

Surprisingly, the above method allows a simple, minimally-invasive test for easy detection of colorectal cancer and/or colorectal cancer precursor cells at a very early stage with higher reliability and effectiveness, saving time, material and operating steps, as well as saving cost and fine chemicals that are difficult to obtain.

Furthermore, the method according to the invention combines the advantages of easy sample collection and the option of diagnosing colorectal cancer or colorectal cancer precursor cells at an early stage. Being a minimally-invasive method, requiring e.g. only the delivery of a serum sample, the method has good potential to achieve high acceptance among subjects, which subjects can be humans or animals, for example. Therefore, the method can be used in routine tests, but also in prophylactic medical examinations.

Definitions

Before the present compositions and methods are disclosed and described, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. It must be noted that, as used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. For the recitation of numeric ranges herein, each intervening number therebetween with the same degree of precision is explicitly contemplated. For example, for the range of 6-9, the numbers 7 and 8 are contemplated in addition to 6 and 9, and for the range 6.0-7.0, the numbers 6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8, 6.9 and 7.0 are explicitly contemplated.

aberrant proliferation

As used herein, the term "aberrant proliferation" means cell proliferation that deviates from the normal, proper, or expected course. For example, aberrant cell proliferation may include inappropriate proliferation of cells whose DNA or other cellular components have become damaged or defective. Aberrant cell proliferation may include cell proliferation whose characteristics are associated with an indication caused by, mediated by, or resulting in inappropriately high levels of cell division, inappropriately low levels of apoptosis, or both. Such indications may be characterized, for example, by single or multiple local abnormal proliferations of cells, groups of cells, or tissue(s), whether cancerous or non-cancerous, benign or malignant.

about

As used herein, the term "about" refers to +/-10%.

antisense

The term "antisense," as used herein, refers to nucleotide sequences which are complementary to a specific DNA or RNA sequence. The term "antisense strand" is used in reference to a nucleic acid strand that is complementary to the "sense" strand. Antisense molecules may be produced by any method, including synthesis by ligating the gene(s) of interest in a reverse orientation to a viral promoter which permits the synthesis of a complementary strand. Once introduced into a cell, this transcribed strand combines with natural sequences produced by the cell to form duplexes. These duplexes then block either the further transcription or translation. In this manner, mutant phenotypes may be generated.

attached

"Attached" or "immobilized" as used herein refer to a probe and a solid support and may mean that the binding between the probe and the solid support is sufficient to be stable under conditions of binding, washing, analysis, and removal. The binding may be covalent or non-covalent. Covalent bonds may be formed directly between the probe and the solid support or may be formed by a cross linker or by inclusion of a specific reactive group on either the solid support or the probe, or both. Non-covalent binding may be one or more of electrostatic, hydrophilic, and hydrophobic interactions. Included in non-covalent binding is the covalent attachment of a molecule, such as streptavidin, to the support and the non- covalent binding of a biotinylated probe to the streptavidin. Immobilization may also involve a combination of covalent and non-covalent interactions. biological sample

"Biological sample" as used herein means a sample of biological tissue or fluid that comprises nucleic acids. Such samples include, but are not limited to, tissue or fluid isolated from subjects. Biological samples may also include sections of tissues such as biopsy and autopsy samples, FFPE samples, frozen sections taken for histological purposes, blood, plasma, serum, sputum, stool, tears, mucus, hair, and skin. Biological samples also include explants and primary and/or transformed cell cultures derived from animal or patient tissues. Biological samples may also be blood, a blood fraction, urine, effusions, ascitic fluid, saliva, cerebrospinal fluid, cervical secretions, vaginal secretions, endometrial secretions, gastrointestinal secretions, bronchial secretions, sputum, cell line, tissue sample, or secretions from the breast. A biological sample may be provided by removing a sample of cells from an animal, but can also be accomplished by using previously isolated cells (e.g., isolated by another person, at another time, and/or for another purpose), or by performing the methods described herein in vivo. Archival tissues, such as those having treatment or outcome history, may also be used. cancer The term "cancer" is meant to include all types of cancerous growths or oncogenic processes, metastatic tissues or malignantly transformed cells, tissues, or organs, irrespective of histopathologic type or stage of invasiveness. 
Examples of cancers include but are not limited to solid tumors and leukemias, including: apudoma, choristoma, branchioma, malignant carcinoid syndrome, carcinoid heart disease, carcinoma (e.g., Walker, basal cell, basosquamous, Brown-Pearce, ductal, Ehrlich tumor, small cell lung, non-small cell lung (e.g., lung squamous cell carcinoma, lung adenocarcinoma and lung undifferentiated large cell carcinoma), oat cell, papillary, bronchiolar, bronchogenic, squamous cell, and transitional cell), histiocytic disorders, leukemia (e.g., B cell, mixed cell, null cell, T cell, T-cell chronic, HTLV-II-associated, lymphocytic acute, lymphocytic chronic, mast cell, and myeloid), histiocytosis malignant, Hodgkin disease, immunoproliferative small, non-Hodgkin lymphoma, plasmacytoma, reticuloendotheliosis, melanoma, chondroblastoma, chondroma, chondrosarcoma, fibroma, fibrosarcoma, giant cell tumors, histiocytoma, lipoma, liposarcoma, mesothelioma, myxoma, myxosarcoma, osteoma, osteosarcoma, Ewing sarcoma, synovioma, adenofibroma, adenolymphoma, carcinosarcoma, chordoma, craniopharyngioma, dysgerminoma, hamartoma, mesenchymoma, mesonephroma, myosarcoma, ameloblastoma, cementoma, odontoma, teratoma, thymoma, trophoblastic tumor, adeno-carcinoma, adenoma, cholangioma, cholesteatoma, cylindroma, cystadenocarcinoma, cystadenoma, granulosa cell tumor, gynandroblastoma, hepatoma, hidradenoma, islet cell tumor, Leydig cell tumor, papilloma, Sertoli cell tumor, theca cell tumor, leiomyoma, leiomyosarcoma, myoblastoma, myosarcoma, rhabdomyoma, rhabdomyosarcoma, ependymoma, ganglioneuroma, glioma, medulloblastoma, meningioma, neurilemmoma, neuroblastoma, neuroepithelioma, neurofibroma, neuroma, paraganglioma, paraganglioma nonchromaffin, angiokeratoma, angiolymphoid hyperplasia with eosinophilia, angioma sclerosing, angiomatosis, glomangioma, hemangioendothelioma, hemangioma, hemangiopericytoma, hemangiosarcoma, lymphangioma, lymphangiomyoma, lymphangiosarcoma, pinealoma, carcinosarcoma, 
chondrosarcoma, cystosarcoma phyllodes, fibrosarcoma, hemangiosarcoma, leiomyosarcoma, leukosarcoma, liposarcoma, lymphangiosarcoma, myosarcoma, myxosarcoma, ovarian carcinoma, rhabdomyosarcoma, sarcoma (e.g., Ewing, experimental, Kaposi, and mast cell), neurofibromatosis, and cervical dysplasia, and other conditions in which cells have become immortalized or transformed.

classification

"Classification" as used herein refers to a procedure and/or algorithm in which individual items are placed into groups or classes based on quantitative information on one or more characteristics inherent in the items (referred to as traits, variables, characters, features, etc) and based on a statistical model and/or a training set of previously labeled items. According to one embodiment, classification means determination of the type of colorectal cancer. colorectal cancer status The term "colorectal cancer status" refers to the status of the disease in the patient.

Examples of types of colorectal cancer statuses include, but are not limited to, the subject's risk of cancer, including colorectal carcinoma, the presence or absence of disease (e.g., carcinoma), the stage of disease in a patient (e.g., carcinoma), and the effectiveness of treatment of disease. Other statuses and degrees of each status are known in the art.

complement

"Complement" or "complementary" as used herein means Watson-Crick (e.g., A- TYU and C-G) or Hoogsteen base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. A full complement or fully complementary may mean 100% complementary base pairing between nucleotides or nucleotide analogs of nucleic acid molecules. CT

C_T signals represent the first cycle of PCR where amplification crosses a threshold (cycle threshold) of fluorescence. Accordingly, low values of C_T represent high abundance or expression levels of the microRNA. In some embodiments the PCR C_T signal is normalized such that the normalized C_T remains inversed from the expression level. In other embodiments the PCR C_T signal may be normalized and then inverted such that low normalized-inverted C_T represents low abundance or expression levels of the microRNA.

detection

"Detection" means detecting the presence of a component in a sample. Detection also means detecting the absence of a component. Detection also means measuring the level of a component, either quantitatively or qualitatively. differential expression

"Differential expression" means qualitative or quantitative differences in the temporal and/or cellular gene expression patterns within and among cells and tissue. Thus, a differentially expressed gene may qualitatively have its expression altered, including an activation or inactivation, in, e.g., normal versus disease tissue. Genes may be turned on or turned off in a particular state, relative to another state thus permitting comparison of two or more states. A qualitatively regulated gene may exhibit an expression pattern within a state or cell type which may be detectable by standard techniques. Some genes may be expressed in one state or cell type, but not in both. Alternatively, the difference in expression may be quantitative, e.g., in that expression is modulated, either up-regulated, resulting in an increased amount of transcript, or down-regulated, resulting in a decreased amount of transcript. The degree to which expression differs need only be large enough to quantify via standard characterization techniques such as expression arrays, quantitative reverse transcriptase PCR, northern analysis, real-time PCR, in situ hybridization and RNase protection. expression profile

The term "expression profile" is used broadly to include a genomic expression profile, e.g., an expression profile of microRNAs. Profiles may be generated by any convenient means for determining a level of a nucleic acid sequence, e.g. quantitative hybridization of microRNA, labeled microRNA, amplified microRNA, cDNA, etc., quantitative PCR, ELISA for quantitation, and the like, and allow the analysis of differential gene expression between two samples. A subject or patient tumor sample, e.g., cells or collections thereof, e.g., tissues, is assayed. Samples are collected by any convenient method, as known in the art. Nucleic acid sequences of interest are nucleic acid sequences that are found to be predictive, including the nucleic acid sequences provided above, where the expression profile may include expression data for 5, 10, 20, 25, 50, 100 or more of, including all of, the listed nucleic acid sequences. According to some embodiments, the term "expression profile" means measuring the abundance of the nucleic acid sequences in the measured samples.

expression ratio

"Expression ratio" as used herein refers to relative expression levels of two or more nucleic acids as determined by detecting the relative expression levels of the corresponding nucleic acids in a biological sample.

FDR

When performing multiple statistical tests, for example in comparing the signal between two groups in multiple data features, there is an increasingly high probability of obtaining false positive results, by random differences between the groups that can reach levels that would otherwise be considered as statistically significant. In order to limit the proportion of such false discoveries, statistical significance is defined only for data features in which the differences reached a p-value (by two-sided t-test) below a threshold, which is dependent on the number of tests performed and the distribution of p-values obtained in these tests.

fragment

"Fragment" is used herein to indicate a non-full length part of a nucleic acid or polypeptide. Thus, a fragment is itself also a nucleic acid or polypeptide, respectively. gene

"Gene" as used herein may be a natural (e.g., genomic) or synthetic gene comprising transcriptional and/or translational regulatory sequences and/or a coding region and/or non- translated sequences (e.g., introns, 5'- and 3'-untranslated sequences). The coding region of a gene may be a nucleotide sequence coding for an amino acid sequence or a functional RNA, such as tRNA, rRNA, catalytic RNA, siRNA, miRNA or antisense RNA. A gene may also be an niRNA or cDNA corresponding to the coding regions (e.g., exons and miRNA) optionally comprising 5'- or 3 '-untranslated sequences linked thereto. A gene may also be an amplified nucleic acid molecule produced in vitro comprising all or a part of the coding region and/or 5'- or 3 '-untranslated sequences linked thereto.

Groove binder/minor groove binder (MGB)

"Groove binder" and/or "minor groove binder" may be used interchangeably and refer to small molecules that fit into the minor groove of double-stranded DNA, typically in a sequence-specific manner. Minor groove binders may be long, flat molecules that can adopt a crescent-like shape and thus, fit snugly into the minor groove of a double helix, often displacing water. Minor groove binding molecules may typically comprise several aromatic rings connected by bonds with torsional freedom such as furan, benzene, or pyrrole rings. Minor groove binders may be antibiotics such as netropsin, distamycin, berenil, pentamidine and other aromatic diamidines, Hoechst 33258, SN 6999, aureolic anti-tumor drugs such as chromomycin and mithramycin, CC- 1065, dihydrocyclopyrroloindole tripeptide (DPI₃), l,2-dihydro-(3H)-pyrrolo[3,2-e]indole-7-carboxylate (CDPI₃), and related compounds and analogues, including those described in Nucleic Acids in Chemistry and Biology, 2d ed., Blackburn and Gait, eds., Oxford University Press, 1996, and PCT

Published Application No. WO 03/078450, the contents of which are incorporated herein by reference. A minor groove binder may be a component of a primer, a probe, a hybridization tag complement, or combinations thereof. Minor groove binders may increase the T_m of the primer or a probe to which they are attached, allowing such primers or probes to effectively hybridize at higher temperatures. host cell

"Host cell" as used herein may be a naturally occurring cell or a transformed cell that may contain a vector and may support replication of the vector. Host cells may be cultured cells, explants, cells in vivo, and the like. Host cells may be prokaryotic cells such as E. coli, or eukaryotic cells such as yeast, insect, amphibian, or mammalian cells, such as CHO and

HeLa. identity

"Identical" or "identity" as used herein in the context of two or more nucleic acids or polypeptide sequences mean that the sequences have a specified percentage of residues that are the same over a specified region. The percentage may be calculated by optimally aligning the two sequences, comparing the two sequences over the specified region, determining the number of positions at which the identical residue occurs in both sequences to yield the number of matched positions, dividing the number of matched positions by the total number of positions in the specified region, and multiplying the result by 100 to yield the percentage of sequence identity. In cases where the two sequences are of different lengths or the alignment produces one or more staggered ends and the specified region of comparison includes only a single sequence, the residues of the single sequence are included in the denominator but not the numerator of the calculation. When comparing DNA and RNA, thymine (T) and uracil (U) may be considered equivalent. Identity may be performed manually or by using a computer sequence algorithm such as BLAST or BLAST 2.0. in situ detection

"In situ detection" as used herein means the detection of expression or expression levels in the original site hereby meaning in a tissue sample such as biopsy. label

"Label" as used herein means a composition detectable by spectroscopic, photochemical, biochemical, immunochemical, chemical, or other physical means. For example, useful labels include ³²P, fluorescent dyes, electron-dense reagents, enzymes (e.g., as commonly used in an ELISA), biotin, digoxigenin, or haptens and other entities which can be made detectable. A label may be incorporated into nucleic acids and proteins at any position. nucleic acid

"Nucleic acid" or "oligonucleotide" or "polynucleotide" as used herein mean at least two nucleotides covalently linked together. The depiction of a single strand also defines the sequence of the complementary strand. Thus, a nucleic acid also encompasses the complementary strand of a depicted single strand. Many variants of a nucleic acid may be used for the same purpose as a given nucleic acid. Thus, a nucleic acid also encompasses substantially identical nucleic acids and complements thereof. A single strand provides a probe that may hybridize to a target sequence under stringent hybridization conditions. Thus, a nucleic acid also encompasses a probe that hybridizes under stringent hybridization conditions.

Nucleic acids may be single stranded or double stranded, or may contain portions of both double stranded and single stranded sequence. The nucleic acid may be DNA, both genomic and cDNA, RNA, or a hybrid, where the nucleic acid may contain combinations of deoxyribo- and ribo-nucleotides, and combinations of bases including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine and isoguanine. Nucleic acids may be obtained by chemical synthesis methods or by recombinant methods.

A nucleic acid will generally contain phosphodiester bonds, although nucleic acid analogs may be included that may have at least one different linkage, e.g., phosphoramidate, phosphorothioate, phosphorodithioate, or O-methylphosphoroamidite linkages and peptide nucleic acid backbones and linkages. Other analog nucleic acids include those with positive backbones, non-ionic backbones, and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, which are incorporated by reference. Nucleic acids containing one or more non-naturally occurring or modified nucleotides are also included within one definition of nucleic acids. The modified nucleotide analog may be located for example at the 5'-end and/or the 3'-end of the nucleic acid molecule. Representative examples of nucleotide analogs may be selected from sugar- or backbone-modified ribonucleotides. It should be noted, however, that nucleobase-modified ribonucleotides, i.e., ribonucleotides containing a non-naturally occurring nucleobase instead of a naturally occurring nucleobase, are also suitable, such as uridines or cytidines modified at the 5-position, e.g., 5-(2-amino)propyl uridine, 5-bromo uridine; adenosines and guanosines modified at the 8-position, e.g., 8-bromo guanosine; deaza nucleotides, e.g., 7-deaza-adenosine; and O- and N-alkylated nucleotides, e.g., N6-methyl adenosine. The 2'-OH-group may be replaced by a group selected from H, OR, R, halo, SH, SR, NH₂, NHR, NR₂ or CN, wherein R is C₁-C₆ alkyl, alkenyl or alkynyl and halo is F, Cl, Br or I. Modified nucleotides also include nucleotides conjugated with cholesterol through, e.g., a hydroxyprolinol linkage as described in Krutzfeldt et al., Nature 438:685-689 (2005) and Soutschek et al., Nature 432:173-178 (2004), which are incorporated herein by reference.
Modifications of the ribose-phosphate backbone may be done for a variety of reasons, e.g., to increase the stability and half-life of such molecules in physiological environments, to enhance diffusion across cell membranes, or as probes on a biochip. The backbone modification may also enhance resistance to degradation, such as in the harsh endocytic environment of cells. The backbone modification may also reduce nucleic acid clearance by hepatocytes, such as in the liver. Mixtures of naturally occurring nucleic acids and analogs may be made; alternatively, mixtures of different nucleic acid analogs may be made.

probe

"Probe" as used herein means an oligonucleotide capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. There may be any number of base pair mismatches which will interfere with hybridization between the target sequence and the single stranded nucleic acids described herein. However, if the number of mutations is so great that no hybridization can occur under even the least stringent of hybridization conditions, the sequence is not a complementary target sequence. A probe may be single stranded or partially single and partially double stranded. The strandedness of the probe is dictated by the structure, composition, and properties of the target sequence. Probes may be directly labeled or indirectly labeled such as with biotin to which a streptavidin complex may later bind.

promoter

"Promoter" as used herein means a synthetic or naturally-derived molecule which is capable of conferring, activating or enhancing expression of a nucleic acid in a cell. A promoter may comprise one or more specific transcriptional regulatory sequences to further enhance expression and/or to alter the spatial expression and/or temporal expression of same. A promoter may also comprise distal enhancer or repressor elements, which can be located as much as several thousand base pairs from the start site of transcription. A promoter may be derived from sources including viral, bacterial, fungal, plants, insects, and animals. A promoter may regulate the expression of a gene component constitutively, or differentially with respect to the cell, tissue or organ in which expression occurs, or with respect to the developmental stage at which expression occurs, or in response to external stimuli such as physiological stresses, pathogens, metal ions, or inducing agents. Representative examples of promoters include the bacteriophage T7 promoter, bacteriophage T3 promoter, SP6 promoter, lac operator-promoter, tac promoter, SV40 late promoter, SV40 early promoter, RSV-LTR promoter and CMV IE promoter.

reference expression profile

As used herein, the phrase "reference expression profile" refers to a criterion expression value to which measured values are compared in order to determine whether a subject has colorectal cancer. The reference may be based on a combined metric score.

selectable marker

"Selectable marker" as used herein means any gene which confers a phenotype on a host cell in which it is expressed to facilitate the identification and/or selection of cells which are transfected or transformed with a genetic construct. Representative examples of selectable markers include the ampicillin-resistance gene (Amp¹), tetracycline-resistance gene (Tc¹), bacterial kanamycin-resistance gene (Kan¹), zeocin resistance gene, the AURI-C gene which confers resistance to the antibiotic aureobasidin A, phosphinothricin-resistance gene, neomycin phosphotransferase gene (nptπ), hygromycin-resistance gene, beta- glucuronidase (GUS) gene, chloramphenicol acetyltransferase (CAT) gene, green fluorescent protein (GFP)-encoding gene and luciferase gene. sensitivity

"sensitivity" used herein may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example how frequently it correctly classifies a cancer into the correct type. The sensitivity for class A is the proportion of cases that are determined to belong to class "A" by the test out of the cases that are in class "A", as determined by some absolute or gold standard. specificity

"Specificity" used herein may mean a statistical measure of how well a binary classification test correctly identifies a condition, for example how frequently it correctly classifies a cancer into the correct type. The specificity for class A is the proportion of cases that are determined to belong to class "not A" by the test out of the cases that are in class

"not A", as determined by some absolute or gold standard. stringent hybridization conditions

"Stringent hybridization conditions" as used herein mean conditions under which a first nucleic acid sequence (e.g., probe) will hybridize to a second nucleic acid sequence (e.g., target), such as in a complex mixture of nucleic acids. Stringent conditions are sequence-dependent and will be different in different circumstances. Stringent conditions may be selected to be about 5-1O⁰C lower than the thermal melting point (T_m) for the specific sequence at a defined ionic strength pH. The T_m may be the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (as the target sequences are present in excess, at T_m, 50% of the probes are occupied at equilibrium).

Stringent conditions may be those in which the salt concentration is less than about 1.0 M sodium ion, such as about 0.01-1.0 M sodium ion concentration (or other salts) at pH 7.0 to 8.3 and the temperature is at least about 30°C for short probes (e.g., about 10-50 nucleotides) and at least about 60°C for long probes (e.g., greater than about 50 nucleotides). Stringent conditions may also be achieved with the addition of destabilizing agents such as formamide. For selective or specific hybridization, a positive signal may be at least 2 to 10 times background hybridization. Exemplary stringent hybridization conditions include the following: 50% formamide, 5x SSC, and 1% SDS, incubating at 42°C, or, 5x SSC, 1% SDS, incubating at 65°C, with wash in 0.2x SSC, and 0.1% SDS at 65°C.

substantially complementary

"Substantially complementary" as used herein means that a first sequence is at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical to the complement of a second sequence over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides, or that the two sequences hybridize under stringent hybridization conditions.

substantially identical

"Substantially identical" as used herein means that a first and a second sequence are at least 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98% or 99% identical over a region of 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100 or more nucleotides or amino acids, or with respect to nucleic acids, if the first sequence is substantially complementary to the complement of the second sequence. subject

As used herein, the term "subject" refers to a mammal, including both human and other mammals. The methods of the present invention are preferably applied to human subjects.

target nucleic acid

"Target nucleic acid" as used herein means a nucleic acid or variant thereof that may be bound by another nucleic acid. A target nucleic acid may be a DNA sequence. The target nucleic acid may be RNA. The target nucleic acid may comprise a mRNA, tRNA, shRNA, siRNA or Piwi-interacting RNA, or a pri-miRNA, pre-miRNA, miRNA, or anti- miRNA.

The target nucleic acid may comprise a target miRNA binding site or a variant thereof. One or more probes may bind the target nucleic acid. The target binding site may comprise 5-100 or 10-60 nucleotides. The target binding site may comprise a total of 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30-40, 40-50, 50-60, 61, 62 or 63 nucleotides. The target site sequence may comprise at least 5 nucleotides of the sequence of a target miRNA binding site disclosed in U.S. Patent Application Nos. 11/384,049, 11/418,870 or 11/429,720, the contents of which are incorporated herein.

tissue sample

As used herein, a tissue sample is tissue obtained from a tissue biopsy using methods well known to those of ordinary skill in the related medical arts. The phrase "suspected of being cancerous" as used herein means a cancer tissue sample believed by one of ordinary skill in the medical arts to contain cancerous cells. Methods for obtaining the sample from the biopsy include gross apportioning of a mass, microdissection, laser-based microdissection, or other art-known cell-separation methods.

variant

"Variant" as used herein referring to a nucleic acid means (i) a portion of a referenced nucleotide sequence; (ii) the complement of a referenced nucleotide sequence or portion thereof; (iii) a nucleic acid that is substantially identical to a referenced nucleic acid or the complement thereof; or (iv) a nucleic acid that hybridizes under stringent conditions to the referenced nucleic acid, complement thereof, or a sequence substantially identical thereto. vector

"Vector" as used herein means a nucleic acid sequence containing an origin of replication. A vector may be a plasmid, bacteriophage, bacterial artificial chromosome or yeast artificial chromosome. A vector may be a DNA or RNA vector. A vector may be either a self-replicating extrachromosomal vector or a vector which integrates into a host genome. wild type

As used herein, the term "wild type" sequence refers to a coding, a non-coding or an interface sequence which is an allelic form of a sequence that performs the natural or normal function for that sequence. Wild type sequences include multiple allelic forms of a cognate sequence; for example, multiple alleles of a wild type sequence may encode silent or conservative changes to the protein sequence that a coding sequence encodes.

The present invention employs miRNA for the identification, classification and diagnosis of colorectal cancer.
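Because the performance of such a classification is described by the sensitivity and specificity defined above, a minimal sketch of those two measures may be helpful. The function name, class labels and example data below are hypothetical and are not taken from this disclosure.

```python
# Illustrative sketch only: names, labels and example data are hypothetical.
from typing import Sequence, Tuple


def sensitivity_specificity(
    truth: Sequence[str], predicted: Sequence[str], positive: str = "A"
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for class `positive`.

    `truth` holds the gold-standard class of each case and `predicted`
    holds the class assigned by the test.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(truth, predicted) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(truth, predicted) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(truth, predicted) if t != positive and p == positive)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the gold standard")
    # Sensitivity: cases called "A" out of all cases truly in "A".
    # Specificity: cases called "not A" out of all cases truly "not A".
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    truth = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
    predicted = ["A", "A", "A", "B", "B", "B", "B", "B", "A", "B"]
    print(sensitivity_specificity(truth, predicted))
```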

MicroRNA processing

A gene coding for a microRNA (miRNA) may be transcribed leading to production of an miRNA precursor known as the pri-miRNA. The pri-miRNA may be part of a polycistronic RNA comprising multiple pri-miRNAs. The pri-miRNA may form a hairpin structure with a stem and loop. The stem may comprise mismatched bases. The hairpin structure of the pri-miRNA may be recognized by Drosha, which is an RNase III endonuclease. Drosha may recognize terminal loops in the pri-miRNA and cleave approximately two helical turns into the stem to produce a 60-70 nucleotide precursor known as the pre-miRNA. Drosha may cleave the pri-miRNA with a staggered cut typical of RNase III endonucleases yielding a pre-miRNA stem loop with a 5' phosphate and ~2 nucleotide 3' overhang. Approximately one helical turn of the stem (~10 nucleotides) extending beyond the Drosha cleavage site may be essential for efficient processing. The pre-miRNA may then be actively transported from the nucleus to the cytoplasm by Ran-GTP and the export receptor Exportin-5. The pre-miRNA may be recognized by Dicer, which is also an RNase III endonuclease. Dicer may recognize the double-stranded stem of the pre-miRNA. Dicer may also recognize the 5' phosphate and 3' overhang at the base of the stem loop. Dicer may cleave off the terminal loop two helical turns away from the base of the stem loop leaving an additional 5' phosphate and ~2 nucleotide 3' overhang. The resulting siRNA-like duplex, which may comprise mismatches, comprises the mature miRNA and a similar-sized fragment known as the miRNA*. The miRNA and miRNA* may be derived from opposing arms of the pri-miRNA and pre-miRNA. MiRNA* sequences may be found in libraries of cloned miRNAs but typically at lower frequency than the miRNAs.

Although initially present as a double-stranded species with miRNA*, the miRNA may eventually become incorporated as a single-stranded RNA into a ribonucleoprotein complex known as the RNA-induced silencing complex (RISC). Various proteins can form the RISC, which can lead to variability in specificity for miRNA/miRNA* duplexes, binding site of the target gene, activity of miRNA (repression or activation), and which strand of the miRNA/miRNA* duplex is loaded into the RISC. When the miRNA strand of the miRNA:miRNA* duplex is loaded into the RISC, the miRNA* may be removed and degraded. The strand of the miRNA:miRNA* duplex that is loaded into the RISC may be the strand whose 5' end is less tightly paired. In cases where both ends of the miRNA:miRNA* have roughly equivalent 5' pairing, both miRNA and miRNA* may have gene silencing activity. The RISC may identify target nucleic acids based on high levels of complementarity between the miRNA and the mRNA, especially by nucleotides 2-7 of the miRNA. Only one case has been reported in animals where the interaction between the miRNA and its target was along the entire length of the miRNA. This was shown for mir-196 and HoxB8, and it was further shown that mir-196 mediates the cleavage of the HoxB8 mRNA (Yekta et al 2004, Science 304-594). Otherwise, such interactions are known only in plants (Bartel & Bartel 2003, Plant Physiol 132-709).

A number of studies have examined the base-pairing requirement between a miRNA and its mRNA target for achieving efficient inhibition of translation (reviewed by Bartel 2004, Cell 116-281). In mammalian cells, the first 8 nucleotides of the miRNA may be important (Doench & Sharp 2004 Genes Dev 2004-504). However, other parts of the microRNA may also participate in mRNA binding. Moreover, sufficient base pairing at the 3' end can compensate for insufficient pairing at the 5' end (Brennecke et al, 2005 PLoS 3-e85).

Computational studies analyzing miRNA binding on whole genomes have suggested a specific role for bases 2-7 at the 5' end of the miRNA in target binding, but the role of the first nucleotide, usually found to be "A", was also recognized (Lewis et al 2005 Cell 120-15). Similarly, nucleotides 1-7 or 2-8 were used to identify and validate targets by Krek et al (2005, Nat Genet 37-495).
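The role of the 5' nucleotides described above can be sketched as a simple seed search: take nucleotides 2-7 (or 2-8) of the miRNA, form the reverse complement, and look for perfect matches in an mRNA. This is a minimal illustration and not the algorithm of any cited study; the function names and both sequences are hypothetical.

```python
# Illustrative sketch only: a perfect-match seed search with hypothetical
# sequences. It is not the target prediction method of any cited study.
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Upper-case the sequence and treat T as U."""
    return seq.upper().replace("T", "U")


def seed_site(mirna: str, first: int = 2, last: int = 7) -> str:
    """Return the mRNA site (5' to 3') that pairs with miRNA nucleotides first..last."""
    seed = to_rna(mirna)[first - 1:last]  # positions are 1-based and inclusive
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_sites(mirna: str, mrna: str, first: int = 2, last: int = 7) -> List[int]:
    """Return 0-based start positions in the mRNA of perfect seed matches."""
    site = seed_site(mirna, first, last)
    target = to_rna(mrna)
    return [
        i
        for i in range(len(target) - len(site) + 1)
        if target[i:i + len(site)] == site
    ]


if __name__ == "__main__":
    mirna = "UGAGCUAGCAUCGGAUACCGUA"     # hypothetical 22-nucleotide miRNA
    mrna = "AAACCUAGCUCAGGGUUUUAGCUCAA"  # hypothetical mRNA fragment
    print(find_seed_sites(mirna, mrna))        # seed positions 2-7
    print(find_seed_sites(mirna, mrna, 2, 8))  # seed positions 2-8
```

Lengthening the seed from 2-7 to 2-8 makes the search stricter, so it can only return the same number of sites or fewer.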

The target sites in the mRNA may be in the 5' UTR, the 3' UTR or in the coding region. Interestingly, multiple miRNAs may regulate the same mRNA target by recognizing the same or multiple sites. The presence of multiple miRNA binding sites in most genetically identified targets may indicate that the cooperative action of multiple RISCs provides the most efficient translational inhibition. miRNAs may direct the RISC to downregulate gene expression by either of two mechanisms: mRNA cleavage or translational repression. The miRNA may specify cleavage of the mRNA if the mRNA has a certain degree of complementarity to the miRNA. When a miRNA guides cleavage, the cut may be between the nucleotides pairing to residues 10 and 11 of the miRNA. Alternatively, the miRNA may repress translation if the mRNA does not have the requisite degree of complementarity to the miRNA. Translational repression may be more prevalent in animals since animals may have a lower degree of complementarity between the miRNA and the binding site.

It should be noted that there may be variability in the 5' and 3' ends of any pair of miRNA and miRNA*. This variability may be due to variability in the enzymatic processing of Drosha and Dicer with respect to the site of cleavage. Variability at the 5' and 3' ends of miRNA and miRNA* may also be due to mismatches in the stem structures of the pri-miRNA and pre-miRNA. The mismatches of the stem strands may lead to a population of different hairpin structures. Variability in the stem structures may also lead to variability in the products of cleavage by Drosha and Dicer.

Nucleic Acids

Nucleic acids are provided herein. The nucleic acids comprise the sequence of SEQ ID NOS: 1-271 or variants thereof. The variant may be a complement of the referenced nucleotide sequence. The variant may also be a nucleotide sequence that is substantially identical to the referenced nucleotide sequence or the complement thereof. The variant may also be a nucleotide sequence which hybridizes under stringent conditions to the referenced nucleotide sequence, complements thereof, or nucleotide sequences substantially identical thereto.
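Whether a candidate variant is substantially identical to a referenced sequence depends on the percent identity calculation given in the definition of "identity". A minimal sketch of that calculation follows. It assumes the two sequences have already been optimally aligned (for example by an external alignment tool) and are supplied as equal-length strings with "-" marking positions covered by only one sequence; the function name and the example sequences are hypothetical.

```python
# Illustrative sketch only: assumes pre-aligned, equal-length inputs in which
# "-" marks positions where only one sequence is present. Names are hypothetical.
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return matched positions / total positions in the region, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("the compared region must not be empty")
    # Thymine (T) and uracil (U) are treated as equivalent.
    a = aligned_a.upper().replace("U", "T")
    b = aligned_b.upper().replace("U", "T")
    # Gapped positions count in the denominator but never in the numerator.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    reference = "UGAGGUAGUAGG"  # hypothetical RNA sequence
    candidate = "TGAGGTAGTA--"  # hypothetical DNA sequence, two positions shorter
    print(round(percent_identity(reference, candidate), 2))
```

The result can then be compared against whichever threshold (for example 60% or 90%) applies under the "substantially identical" definition.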

The nucleic acid may have a length of from 10 to 250 nucleotides. The nucleic acid may have a length of at least 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 125, 150, 175, 200 or 250 nucleotides. The nucleic acid may be synthesized or expressed in a cell (in vitro or in vivo) using a synthetic gene described herein. The nucleic acid may be synthesized as a single strand molecule and hybridized to a substantially complementary nucleic acid to form a duplex. The nucleic acid may be introduced to a cell, tissue or organ in a single- or double-stranded form or capable of being expressed by a synthetic gene using methods well known to those skilled in the art, including as described in U.S. Patent No. 6,506,559 which is incorporated by reference.

Nucleic acid complexes

The nucleic acid may further comprise one or more of the following: a peptide, a protein, an RNA-DNA hybrid, an antibody, an antibody fragment, a Fab fragment, and an aptamer.

Pri-miRNA

The nucleic acid may comprise a sequence of a pri-miRNA or a variant thereof. The pri-miRNA sequence may comprise from 45-30,000, 50-25,000, 100-20,000, 1,000-1,500 or 80-100 nucleotides. The sequence of the pri-miRNA may comprise a pre-miRNA, miRNA and miRNA*, as set forth herein, and variants thereof. The sequence of the pri-miRNA may comprise the sequence of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; or variants thereof. The pri-miRNA may form a hairpin structure. The hairpin may comprise a first and a second nucleic acid sequence that are substantially complementary. The first and second nucleic acid sequence may be from 37-50 nucleotides. The first and second nucleic acid sequence may be separated by a third sequence of from 8-12 nucleotides. The hairpin structure may have a free energy of less than -25 kcal/mole, as calculated by the Vienna algorithm, with default parameters as described in Hofacker et al., Monatshefte f. Chemie 125: 167-188 (1994), the contents of which are incorporated herein. The hairpin may comprise a terminal loop of 4-20, 8-12 or 10 nucleotides. The pri-miRNA may comprise at least 19% adenosine nucleotides, at least 16% cytosine nucleotides, at least 23% thymine nucleotides and at least 19% guanine nucleotides.

Pre-miRNA

The nucleic acid may also comprise a sequence of a pre-miRNA or a variant thereof. The pre-miRNA sequence may comprise from 45-90, 60-80 or 60-70 nucleotides. The sequence of the pre-miRNA may comprise a miRNA and a miRNA* as set forth herein. The sequence of the pre-miRNA may also be that of a pri-miRNA excluding from 0-160 nucleotides from the 5' and 3' ends of the pri-miRNA. The sequence of the pre-miRNA may comprise the sequence of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; or variants thereof.

miRNA

The nucleic acid may also comprise a sequence of a miRNA (including miRNA*) or a variant thereof. The miRNA sequence may comprise from 13-33, 18-24 or 21-23 nucleotides. The miRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleotides. The sequence of the miRNA may be the first 13-33 nucleotides of the pre-miRNA. The sequence of the miRNA may also be the last 13-33 nucleotides of the pre-miRNA. The sequence of the miRNA may comprise the sequence of SEQ ID NOS: 1-11, 33-34, 81-84, 86-104, 124; or variants thereof.

Anti-miRNA

The nucleic acid may also comprise a sequence of an anti-miRNA capable of blocking the activity of a miRNA or miRNA*, such as by binding to the pri-miRNA, pre-miRNA, miRNA or miRNA* (e.g. antisense or RNA silencing), or by binding to the target binding site. The anti-miRNA may comprise a total of 5-100 or 10-60 nucleotides. The anti-miRNA may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39 or 40 nucleotides. The sequence of the anti-miRNA may comprise (a) at least 5 nucleotides that are substantially identical or complementary to the 5' of a miRNA and at least 5-12 nucleotides that are substantially complementary to the flanking regions of the target site from the 5' end of the miRNA, or (b) at least 5-12 nucleotides that are substantially identical or complementary to the 3' of a miRNA and at least 5 nucleotides that are substantially complementary to the flanking region of the target site from the 3' end of the miRNA. The sequence of the anti-miRNA may comprise the complement of SEQ ID NOS: 1-11, 33-34, 81-84, 86-104, 124; or variants thereof.

Binding Site of Target

The nucleic acid may also comprise a sequence of a target microRNA binding site or a variant thereof. The target site sequence may comprise a total of 5-100 or 10-60 nucleotides. The target site sequence may also comprise a total of at least 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62 or 63 nucleotides. The target site sequence may comprise at least 5 nucleotides of the sequence of SEQ ID NOS: 1-11, 33-34, 81-84, 86-104, 124.

Synthetic Gene

A synthetic gene is also provided comprising a nucleic acid described herein operably linked to a transcriptional and/or translational regulatory sequence. The synthetic gene may be capable of modifying the expression of a target gene with a binding site for a nucleic acid described herein. Expression of the target gene may be modified in a cell, tissue or organ. The synthetic gene may be synthesized or derived from naturally-occurring genes by standard recombinant techniques. The synthetic gene may also comprise terminators at the 3'-end of the transcriptional unit of the synthetic gene sequence. The synthetic gene may also comprise a selectable marker.

Vector

A vector is also provided comprising a synthetic gene described herein. The vector may be an expression vector. An expression vector may comprise additional elements. For example, the expression vector may have two replication systems allowing it to be maintained in two organisms, e.g., in one host cell for expression and in a second host cell (e.g., bacteria) for cloning and amplification. For integrating expression vectors, the expression vector may contain at least one sequence homologous to the host cell genome, and preferably two homologous sequences which flank the expression construct. The integrating vector may be directed to a specific locus in the host cell by selecting the appropriate homologous sequence for inclusion in the vector. The vector may also comprise a selectable marker gene to allow the selection of transformed host cells.

Host Cell

A host cell is also provided comprising a vector, synthetic gene or nucleic acid described herein. The cell may be a bacterial, fungal, plant, insect or animal cell. For example, the host cell line may be DG44 and DUXB11 (Chinese Hamster Ovary lines, DHFR minus), HELA (human cervical carcinoma), CV1 (monkey kidney line), COS (a derivative of CV1 with SV40 T antigen), R1610 (Chinese hamster fibroblast), BALBC/3T3 (mouse fibroblast), HAK (hamster kidney line), SP2/0 (mouse myeloma), P3x63-Ag3.653 (mouse myeloma), BFA-1c1BPT (bovine endothelial cells), RAJI (human lymphocyte) and 293 (human kidney). Host cell lines may be available from commercial services, the American Type Culture Collection or from published literature.

Probes

A probe is provided herein. A probe may comprise a nucleic acid. The probe may have a length of from 8 to 500, 10 to 100 or 20 to 60 nucleotides. The probe may also have a length of at least 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280 or 300 nucleotides. The probe may comprise a nucleic acid of 18-25 nucleotides.

A probe may be capable of binding to a target nucleic acid of complementary sequence through one or more types of chemical bonds, usually through complementary base pairing, usually through hydrogen bond formation. Probes may bind target sequences lacking complete complementarity with the probe sequence depending upon the stringency of the hybridization conditions. A probe may be single stranded or partially single and partially double stranded. The strandedness of the probe is dictated by the structure, composition, and properties of the target sequence. Probes may be directly labeled or indirectly labeled.

Test Probe

The probe may be a test probe. The test probe may comprise a nucleic acid sequence that is complementary to a miRNA, a miRNA*, a pre-miRNA, or a pri-miRNA. The sequence of the test probe may be selected from SEQ ID NOS: 47-57, 77-78, 178-195, 198-203, 205-218, 220-224; or variants thereof.

Linker Sequences

The probe may further comprise a linker. The linker may be 10-60 nucleotides in length. The linker may be 20-27 nucleotides in length. The linker may be of sufficient length to allow the probe to be a total length of 45-60 nucleotides. The linker may not be capable of forming a stable secondary structure, or may not be capable of folding on itself, or may not be capable of folding on a non-linker portion of a nucleic acid contained in the probe. The sequence of the linker may not appear in the genome of the animal from which the probe non-linker nucleic acid is derived.

Reverse Transcription

Target sequences of a cDNA may be generated by reverse transcription of the target RNA. Methods for generating cDNA may comprise reverse transcribing polyadenylated RNA or, alternatively, RNA with a ligated adaptor sequence.

Reverse Transcription using Adaptor Sequence Ligated to RNA

The RNA may be ligated to an adaptor sequence prior to reverse transcription. A ligation reaction may be performed by T4 RNA ligase to ligate an adaptor sequence at the 3' end of the RNA. A reverse transcription (RT) reaction may then be performed using a primer comprising a sequence that is complementary to the 3' end of the adaptor sequence.

Reverse Transcription using Polyadenylated Sequence Ligated to RNA

Polyadenylated RNA may be used in a reverse transcription (RT) reaction using a poly(T) primer comprising a 5' adaptor sequence. The poly(T) sequence may comprise 8, 9, 10, 11, 12, 13, or 14 consecutive thymines. The reverse transcription primer may comprise SEQ ID NO: 80 or variants thereof.

RT-PCR of RNA

The reverse transcript of the RNA may be amplified by real time PCR, using a specific forward primer comprising at least 15 nucleic acids complementary to the target nucleic acid and a 5' tail sequence; a reverse primer that is complementary to the 3' end of the adaptor sequence; and a probe comprising at least 8 nucleic acids complementary to the target nucleic acid. The probe may be partially complementary to the 5' end of the adaptor sequence.

PCR of Target Nucleic Acids

Methods of amplifying target nucleic acids are described herein. The amplification may be by a method comprising PCR. The first cycles of the PCR reaction may have an annealing temperature of 56°C, 57°C, 58°C, 59°C, or 60°C. The first cycles may comprise 1-10 cycles. The remaining cycles of the PCR reaction may be 60°C. The remaining cycles may comprise 2-40 cycles. The annealing temperature may cause the PCR to be more sensitive. The PCR may generate longer products that can serve as higher stringency PCR templates.
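By way of illustration only, the two-phase annealing scheme described above can be written out as a list of per-cycle annealing temperatures. The following Python sketch uses a hypothetical helper, `annealing_schedule`, and hypothetical constant names; it merely encodes the ranges stated in the text (1-10 first cycles at 56-60°C, followed by 2-40 cycles at 60°C) and is not a validated thermocycler program.

```python
# Illustrative sketch only; all names here are hypothetical, not part of the disclosure.
ALLOWED_FIRST_TEMPS_C = (56, 57, 58, 59, 60)  # annealing temperatures named in the text
REMAINING_TEMP_C = 60                         # annealing temperature of the remaining cycles


def annealing_schedule(first_temp_c, n_first, n_remaining):
    """Return the annealing temperature (in °C) of each PCR cycle, in order."""
    if first_temp_c not in ALLOWED_FIRST_TEMPS_C:
        raise ValueError("first-cycle annealing temperature must be 56-60 °C")
    if not 1 <= n_first <= 10:
        raise ValueError("the first cycles may comprise 1-10 cycles")
    if not 2 <= n_remaining <= 40:
        raise ValueError("the remaining cycles may comprise 2-40 cycles")
    return [first_temp_c] * n_first + [REMAINING_TEMP_C] * n_remaining


# Example: 5 first cycles at 57 °C, then 35 cycles at 60 °C.
schedule = annealing_schedule(first_temp_c=57, n_first=5, n_remaining=35)
print(len(schedule), schedule[:6])
```

The range checks make the sketch reject any combination outside the values recited above, so the list it returns is always one of the described schedules.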

Forward Primer

The PCR reaction may comprise a forward primer. The forward primer may comprise 15, 16, 17, 18, 19, 20, or 21 nucleotides identical to the target nucleic acid. The 3' end of the forward primer may be sensitive to differences in sequence between a target nucleic acid and a sibling nucleic acid.

The forward primer may also comprise a 5' overhanging tail. The 5' tail may increase the melting temperature of the forward primer. The sequence of the 5' tail may comprise a sequence that is non-identical to the genome of the animal from which the target nucleic acid is isolated. The sequence of the 5' tail may also be synthetic. The 5' tail may comprise 8, 9, 10, 11, 12, 13, 14, 15, or 16 nucleotides. The forward primer may comprise SEQ ID NOS: 36-46, 66-67, 225-250, 252-265, 267-271; or variants thereof.
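The assembly of such a forward primer can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `build_forward_primer` is hypothetical, the target and tail sequences are invented placeholders rather than any SEQ ID NO of this disclosure, and the target-identical portion is assumed to be taken from the 5' end of the target.

```python
# Illustrative sketch only; names and sequences are hypothetical placeholders.
def build_forward_primer(target_rna, tail, n_identical=18):
    """Join a 5' tail to the first n_identical nucleotides of the target (as DNA)."""
    if not 15 <= n_identical <= 21:
        raise ValueError("target-identical portion should be 15-21 nucleotides")
    if not 8 <= len(tail) <= 16:
        raise ValueError("5' tail should be 8-16 nucleotides")
    if len(target_rna) < n_identical:
        raise ValueError("target is shorter than the requested identical portion")
    target_dna = target_rna.upper().replace("U", "T")  # write the RNA target as DNA
    # Assumption: the identical portion starts at the 5' end of the target.
    return tail.upper() + target_dna[:n_identical]


placeholder_target = "ACGUACGUACGUACGUACGUAC"  # invented 22-nt sequence, not a real miRNA
placeholder_tail = "GGCCTTAAGGCC"              # invented 12-nt synthetic tail
primer = build_forward_primer(placeholder_target, placeholder_tail)
print(primer, len(primer))
```

In practice the tail would additionally be checked against the genome of the animal from which the target is isolated, which this sketch does not attempt.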

Reverse Primer

The PCR reaction may comprise a reverse primer. The reverse primer may be complementary to a target nucleic acid. The reverse primer may also comprise a sequence complementary to an adaptor sequence. The sequence complementary to an adaptor sequence may comprise SEQ ID NO: 80 or variants thereof.

Biochip

A biochip is also provided. The biochip may comprise a solid substrate comprising an attached probe or plurality of probes described herein. The probes may be capable of hybridizing to a target sequence under stringent hybridization conditions. The probes may be attached at spatially defined locations on the substrate. More than one probe per target sequence may be used, with either overlapping probes or probes to different sections of a particular target sequence. The probes may be capable of hybridizing to target sequences associated with a single disorder appreciated by those in the art. The probes may either be synthesized first, with subsequent attachment to the biochip, or may be directly synthesized on the biochip.

The solid substrate may be a material that may be modified to contain discrete individual sites appropriate for the attachment or association of the probes and is amenable to at least one detection method. Representative examples of substrate materials include glass and modified or functionalized glass, plastics (including acrylics, polystyrene and copolymers of styrene and other materials, polypropylene, polyethylene, polybutylene, polyurethanes, Teflon, etc.), polysaccharides, nylon or nitrocellulose, resins, silica or silica-based materials including silicon and modified silicon, carbon, metals, inorganic glasses and plastics. The substrates may allow optical detection without appreciably fluorescing.

The substrate may be planar, although other configurations of substrates may be used as well. For example, probes may be placed on the inside surface of a tube, for flow-through sample analysis to minimize sample volume. Similarly, the substrate may be flexible, such as flexible foam, including closed cell foams made of particular plastics.

The substrate of the biochip and the probe may be derivatized with chemical functional groups for subsequent attachment of the two. For example, the biochip may be derivatized with a chemical functional group including, but not limited to, amino groups, carboxyl groups, oxo groups or thiol groups. Using these functional groups, the probes may be attached using functional groups on the probes either directly or indirectly using a linker. The probes may be attached to the solid support by either the 5' terminus, 3' terminus, or via an internal nucleotide. The probe may also be attached to the solid support non-covalently. For example, biotinylated oligonucleotides can be made, which may bind to surfaces covalently coated with streptavidin, resulting in attachment. Alternatively, probes may be synthesized on the surface using techniques such as photopolymerization and photolithography.

Diagnostics

A method of diagnosis is also provided. The method comprises detecting a differential expression level of colorectal cancer-associated nucleic acids in a biological sample. The sample may be derived from a patient. Diagnosis of a cancer state, and its histological type, in a patient may allow for prognosis and selection of therapeutic strategy. Further, the developmental stage of cells may be classified by determining temporarily expressed cancer-associated nucleic acids.

In situ hybridization of labeled probes to tissue sections and smears may be performed. When comparing the fingerprints between an individual and a standard, the skilled artisan can make a diagnosis, a prognosis, or a prediction based on the findings. It is further understood that the genes which indicate the diagnosis may differ from those which indicate the prognosis and molecular profiling of the condition of the cells may lead to distinctions between responsive or refractory conditions or may be predictive of outcomes.

Kits

A kit is also provided and may comprise a nucleic acid described herein together with any or all of the following: assay reagents, buffers, probes and/or primers, and sterile saline or another pharmaceutically acceptable emulsion and suspension base. In addition, the kits may include instructional materials containing directions (e.g., protocols) for the practice of the methods described herein. For example, the kit may be used for the amplification, detection, identification or quantification of a target nucleic acid sequence. The kit may comprise a poly(T) primer, a forward primer, a reverse primer, and a probe.

Any of the compositions described herein may be comprised in a kit. In a non-limiting example, reagents for isolating miRNA, labeling miRNA, and/or evaluating a miRNA population using an array are included in a kit. The kit may further include reagents for creating or synthesizing miRNA probes. The kits will thus comprise, in suitable container means, an enzyme for labeling the miRNA by incorporating labeled nucleotide or unlabeled nucleotides that are subsequently labeled. It may also include one or more buffers, such as reaction buffer, labeling buffer, washing buffer, or a hybridization buffer, compounds for preparing the miRNA probes, components for in situ hybridization and components for isolating miRNA. Other kits of the invention may include components for making a nucleic acid array comprising miRNA, and thus, may include, for example, a solid support.

The following examples are presented in order to more fully illustrate some embodiments of the invention. They should in no way be construed, however, as limiting the broad scope of the invention.

EXAMPLES

Example 1

Methods

Patients and samples

Serum samples were collected from colorectal cancer (CRC) patients and healthy individuals ("Controls"). CRC patients were defined as patients with histologically confirmed colorectal adenocarcinoma, regardless of stage. Patients must not have undergone surgery or received any anti-neoplastic therapy prior to study entry, to avoid any possible influence of such interventions on microRNA expression. Healthy controls were people in whom the presence of CRC was excluded by complete colonoscopy within six months prior to study entry. Colonoscopy was done as a screening test or as an investigational procedure for suspected cancer or benign conditions. All participants had to be 18 years or older at registration and able to understand and sign the informed consent. The study protocol was approved by the local institutional review board.

Fifty-two CRC patients and 81 healthy individuals participated in this study. Of these, successful microRNA extraction was achieved in 49 patients and 79 healthy individuals (96% of samples). Ten CRC cases (with metastatic cancer) and ten control samples were used for the first screening phase, which measured the expression of hundreds of microRNAs; 44 cases and 74 control samples were used in the second phase, which studied a smaller set of candidate microRNA markers. Five samples from each group were studied in both the first and second phases for internal validation. Most of the demographic parameters were similarly distributed between the two groups, with a mean age around 60 years, an even gender distribution and predominance of Jewish subjects (Table 1). Family history of cancer was slightly more prevalent among the healthy controls (p=0.02). Normal endoscopic examination was by far the most common finding among the healthy controls.

Another screening experiment was performed using 19 CRC patients and 19 healthy controls. Nineteen of them were also in the previous experiment. The expression of 252 microRNAs was measured in each of these serum samples.

Table 1: Clinical features of study subjects

Data was missing on family history of cancer (2 subjects), ethnicity (1) and grade (13).

Sample collection

8 ml of blood was collected from each subject directly into serum collection tubes (Greiner Bio-one, VACUETTE® Serum Tubes 455071). The whole blood was allowed to stand for about 1 h at RT before being centrifuged at 1800 g for 10 minutes at RT. The resultant serum was aliquoted into Eppendorf tubes and stored at -80°C.

RNA extraction

Serum (100 μl) was incubated overnight at 57°C with 300 μl pre-heated Proteinase K extraction solution as detailed in Table 2.

Table 2: Proteinase K extraction solution

Following acid phenol:chloroform extraction, linear acrylamide (8 μl) was added. RNA was ethanol-precipitated overnight at -20°C and re-suspended in DDW (43 μl). Next, DNase (Ambion) treatment was performed to eliminate residual DNA fragments. Finally, after a second acid phenol:chloroform extraction, the pellet was re-suspended in DDW.

qRT-PCR

RNA was subjected to a polyadenylation reaction as described previously (Shi, R. and Chiang, V.L. 2005, Biotechniques. 39(4):519-25). Briefly, RNA was incubated in the presence of poly(A) polymerase (PAP; Takara-2180A), MnCl2, and ATP for 1 h at 37°C. Then, using an oligodT primer harboring a consensus sequence (complementary to the reverse primer), reverse transcription was performed on total RNA using SuperScript II RT (Invitrogen). Next, the cDNA was amplified by real time PCR; this reaction contained a microRNA-specific forward primer, a TaqMan probe complementary to the 3' of the specific microRNA sequence as well as to part of the polyA adaptor sequence, and a universal reverse primer complementary to the 3' sequence of the oligodT tail.

In addition to the 11 microRNAs listed in Table 1, the following 11 microRNAs were measured for normalization purposes: hsa-miR-126 (SEQ ID NO: 25), hsa-miR-19a (SEQ ID NO: 27), hsa-miR-92a (SEQ ID NO: 26), hsa-miR-423-3p (SEQ ID NO: 28), hsa-miR-22 (SEQ ID NO: 29), hsa-miR-24 (SEQ ID NO: 30), hsa-let-7b (SEQ ID NO: 31), hsa-let-7d (SEQ ID NO: 32), hsa-miR-23a (SEQ ID NO: 33), hsa-miR-148a (SEQ ID NO: 34), and hsa-miR-27a (SEQ ID NO: 35).

Data analysis and statistics

In the initial screen, the data was rescaled by subtracting for each sample the mean C_T of all 363 microRNAs, and adding back the mean C_T across all measured samples. Twenty-two microRNAs were chosen for the second stage using the following criteria: 11 differentially expressed microRNAs with the smallest p-values (two-sided unpaired t-test) among those with fold change (exponent base 2 of the difference in the median normalized C_T of the two groups) above 1.5; and 11 additional microRNAs with smallest variation in levels were chosen for normalization purposes. These 22 microRNAs were measured on the larger cohort, and the data was rescaled by subtracting for each sample the mean C_T of the 11 normalization microRNAs, and adding back the mean C_T (of the normalization microRNAs) across all measured samples.
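The rescaling and fold-change calculations described above can be sketched in Python as follows. This is a minimal illustration, not the analysis code actually used: the function names, the nested-dictionary data layout and the example values are all hypothetical, and the "mean C_T across all measured samples" is assumed here to be the mean of the per-sample means (the two coincide when every sample has the same set of microRNAs measured).

```python
# Illustrative sketch only; names, data layout and values are hypothetical.
from statistics import mean, median


def rescale(ct, normalizers=None):
    """Subtract each sample's mean C_T and add back the mean across samples.

    ct: {sample_id: {microRNA_name: C_T}}
    normalizers: microRNA names used for the per-sample mean; None means all.
    """
    per_sample = {}
    for sample, values in ct.items():
        names = normalizers if normalizers is not None else list(values)
        per_sample[sample] = mean(values[name] for name in names)
    # Assumption: grand mean taken as the mean of the per-sample means.
    grand = mean(per_sample.values())
    return {
        sample: {name: value - per_sample[sample] + grand
                 for name, value in values.items()}
        for sample, values in ct.items()
    }


def fold_change(ct_group_a, ct_group_b):
    """Exponent (base 2) of the absolute difference in median normalized C_T."""
    return 2 ** abs(median(ct_group_a) - median(ct_group_b))


# Hypothetical values: two samples, two candidates ("a", "b"), one normalizer ("n").
raw = {
    "s1": {"a": 30.0, "b": 32.0, "n": 20.0},
    "s2": {"a": 31.0, "b": 35.0, "n": 22.0},
}
print(rescale(raw, normalizers=["n"]))
print(fold_change([30.0, 31.0, 32.0], [29.0, 30.0, 31.0]))
```

Because C_T is on a log2 scale, a one-cycle difference in median C_T corresponds to a two-fold difference in level, which is why the fold change is computed as a power of 2.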

Simple combinations (sums or differences of C_T) of pairs of microRNAs were examined for their ability to discriminate CRC patients from healthy individuals, calculating for each such combination the area under the curve (AUC) of the receiver operating characteristic (ROC). Several different combinations (e.g. Fig. 2) had AUC>0.80. The combination with the highest AUC is shown in Figure 2C.

Example 2

Specific microRNAs are used for the detection of colorectal cancer in serum samples

Protocols for extracting and quantifying microRNA in serum and other body fluids were developed. The sensitivity and specificity of this qRT-PCR method makes it possible to monitor the minute amount of microRNA present in cell-free body fluids (Fig. 1).

An important parameter for potential biomarkers is their general distribution in the population. The expression levels of the microRNAs chosen for the initial screening were compared in the set of ten healthy individuals, in whom the presence of CRC was ruled out by a complete colonoscopy within the last 6 months.

The microRNA profile was consistent among the healthy controls, with a mean correlation coefficient (between all pairs of individuals) of 0.90 (Figs. 1A-1B). The median standard deviation for these microRNAs was less than 1 C_T (Fig. 1C). Thus, microRNAs are generally found in serum within a limited range. Importantly for their potential use as diagnostic tools, it was found that microRNA levels in unfrozen serum did not change substantially when samples were left at room temperature for up to 4 hours, or by twice freezing and re-thawing of samples. This further indicates that microRNA levels in serum samples are sufficiently robust to serve as potential clinical biomarkers.

To test whether circulating microRNAs can be used to identify CRC patients, the serum levels of 363 microRNAs were measured in the sera of 10 metastatic CRC patients and compared with microRNA levels in the sera of 10 healthy individuals. From this small dataset a subset of 11 microRNAs was identified (Table 3) (Fig. 1D) which showed the strongest differences between the two groups, and a further set of 11 microRNAs for normalization. The levels of these 22 microRNAs were measured in serum samples from the cohort of 44 CRC patients and 74 controls. This cohort included 5 of each group from the first set of samples. All of the microRNAs showed the same direction of change in the large sample set as in the initial screen (Fig. 2A-2B). All of the 11 microRNAs were indeed found to have different levels (p-value<0.05 in two-sided t-test) in the sera of the 16 metastatic CRC patients compared to controls (Table 3). Satisfyingly, 4 of these microRNAs were also statistically significant when comparing the group of 28 earlier-stage, non-metastatic CRC patients to the controls (Table 3).
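The two consistency measures reported above for the healthy controls (the mean correlation coefficient between all pairs of individuals, and the median standard deviation per microRNA) can be sketched in Python as follows. This is an illustration under stated assumptions: the function names and the example C_T values are hypothetical, and the Pearson coefficient is assumed because the text does not state which correlation coefficient was used.

```python
# Illustrative sketch only; names and values are hypothetical.
from itertools import combinations
from math import sqrt
from statistics import mean, median, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length C_T profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_pairwise_correlation(profiles):
    """profiles: one list of C_T values per individual, same microRNA order."""
    return mean(pearson(a, b) for a, b in combinations(profiles, 2))


def median_sd_per_microrna(profiles):
    """Median, over microRNAs, of the sample standard deviation across individuals."""
    return median(stdev(column) for column in zip(*profiles))


# Hypothetical C_T profiles of three individuals over three microRNAs.
profiles = [
    [25.0, 30.0, 35.0],
    [26.0, 31.0, 36.0],
    [24.0, 29.0, 34.0],
]
print(mean_pairwise_correlation(profiles))
print(median_sd_per_microrna(profiles))
```

A high pairwise correlation together with a small per-microRNA standard deviation is what supports the statement that serum microRNA levels fall within a limited range across individuals.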

These differential microRNAs can be used to identify subsets of the population that are enriched for CRC patients. hsa-miR-16 (SEQ ID NO: 2) and hsa-miR-125b (SEQ ID NO: 3) have relatively consistent differences between controls and both metastatic and non-metastatic subjects (Table 3), with the most significant p-values and largest fold-change between the healthy controls and the non-metastatic cases (Table 3). These microRNAs have lower levels (higher C_T) in the sera of CRC patients (Fig. 2). A simple sum of the abundance signals (C_T's) of these microRNAs can be used to identify CRC patients with 91% sensitivity and 72% specificity (Fig. 2C, solid line). An alternative cutoff on the levels can reach specificity of 93% at a sensitivity of 55% (Fig. 2C, dashed line).
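The use of a summed C_T score with a cutoff, and the corresponding ROC AUC, can be sketched in Python as follows. This is a minimal illustration only: the function names and all numeric values are hypothetical and are not data from the study. A subject is called positive when the summed C_T is at or above the cutoff, since higher C_T corresponds to the lower levels seen in CRC sera, and the AUC is computed by the standard pairwise-comparison (Mann-Whitney) formulation.

```python
# Illustrative sketch only; names and values are hypothetical, not study data.
def sum_score(ct_first, ct_second):
    """Simple sum of the normalized C_T values of two microRNAs."""
    return ct_first + ct_second


def auc(case_scores, control_scores):
    """ROC AUC: probability that a case scores above a control (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def sensitivity_specificity(case_scores, control_scores, cutoff):
    """Call a subject positive when the summed C_T is at or above the cutoff."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in control_scores) / len(control_scores)
    return sensitivity, specificity


# Hypothetical normalized C_T pairs (first microRNA, second microRNA) per subject.
cases = [sum_score(30.0, 30.0), sum_score(31.0, 31.0), sum_score(29.0, 29.0)]
controls = [sum_score(27.5, 27.5), sum_score(29.5, 29.5), sum_score(27.0, 27.0)]
print(auc(cases, controls))
print(sensitivity_specificity(cases, controls, cutoff=58.0))
```

Moving the cutoff upward trades sensitivity for specificity, which is the relationship between the two operating points (solid and dashed lines) described above.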

P-values are calculated on normalized C_T. Fold-changes are calculated by the exponent (base 2) of the difference in the median normalized C_T of the two groups. "+" marks higher expression (lower C_T) and "-" marks lower expression (higher C_T) in colorectal cancer patients.

Table 3: Comparison between microRNA levels in serum samples obtained from healthy controls and patients with colorectal cancer, based on qRT-PCR measurements

Table 4: Sequences of primers and probes used for the detection of differential miRs

Table 5: Sequences of primers and probes used for normalization

Example 3

Serum microRNAs as biomarkers for colorectal cancer

Another screening experiment was performed on serum samples obtained from 19 CRC patients and 19 healthy controls. The expression of 252 microRNAs was measured in each of these serum samples. The differential levels of microRNA in serum samples obtained from CRC patients as compared to microRNA levels in serum samples obtained from healthy individuals are presented in Figure 3. Statistically significant microRNAs up-regulated in serum samples obtained from CRC patients as compared to healthy individuals are presented in Table 6. Statistically significant microRNAs down regulated in serum samples obtained from CRC patients as compared to healthy individuals are presented in Table 7. Boxplot presentations comparing distributions of the presence of exemplified statistically significant microRNAs are shown in Figures 4A-4H.

Table 6: microRNAs up-regulated in serum samples obtained from CRC patients as compared to healthy individuals

Table 7: microRNAs down regulated in serum samples obtained from CRC patients as compared to healthy individuals

Table 8: Sequences of primers and probes used for the detection of differential miRs

The foregoing description of the specific embodiments will so fully reveal the general nature of the invention that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without undue experimentation and without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art.

Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

It should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

Claims

1. A method for the detection of colorectal cancer, the method comprising: obtaining a biological sample from a subject; measuring an expression profile in said sample of nucleic acid sequences selected from the group consisting of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof and a sequence having at least about 80% identity thereto; and comparing said obtained expression profile to a reference expression profile representing the expression levels of said nucleic acid in healthy controls; whereby an altered expression level of the nucleic acid sequence allows the detection of said colorectal cancer.

2. The method of claim 1, wherein relatively high expression levels of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 1, 4, 5, 9-12, 17, 18, 22-24, 33-34, 81-84, 86-104, 124, 128-129, 132-154, 174; a fragment thereof and a sequence having at least about 80% identity thereto is indicative of the presence of colorectal cancer.

3. The method of claim 1, wherein relatively low expression levels of a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 2, 3, 6-8, 13-16, 19-21, 105, 107-123, 125-127, 155, 157-173, 175-177; a fragment thereof and a sequence having at least about 80% identity thereto is indicative of the presence of colorectal cancer.

4. The method of claim 1, wherein said biological sample is selected from the group consisting of bodily fluid, a cell line and a tissue sample.

5. The method of claim 4, wherein said bodily fluid sample is a serum sample.

6. The method of claim 4, wherein said bodily fluid sample is a urine sample.

7. The method of claim 1, wherein the method comprises determining the expression levels of at least two nucleic acid sequences.

8. The method of claim 7, wherein the method further comprises combining one or more expression ratios of said nucleic acid sequences.

9. The method of claim 1, wherein the expression levels are determined by a method selected from the group consisting of nucleic acid hybridization, nucleic acid amplification, and a combination thereof.

10. The method of claim 9, wherein the nucleic acid amplification method is real-time PCR.

11. The method of claim 10, wherein the real-time PCR method comprises forward and reverse primers.

12. The method of claim 11, wherein the forward primer comprises a sequence selected from the group consisting of SEQ ID NOS: 36-46, 66-67, 225-265, 267-271; a fragment thereof and a sequence having at least about 80% identity thereto.

13. The method of claim 12, wherein the real-time PCR method further comprises a probe.

14. The method of claim 13, wherein the probe comprises a nucleic acid sequence that is complementary to a sequence selected from the group consisting of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof and a sequence having at least about 80% identity thereto.

15. The method of claim 14, wherein the probe comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 47-57, 77-78, 178-203, 205-218, 220-224; a fragment thereof and a sequence having at least about 80% identity thereto.

16. The method of claim 11, wherein the reverse primer comprises SEQ ID NO: 80, a fragment thereof and a sequence having at least about 80% identity thereto.

17. A kit for the detection of colorectal cancer, said kit comprising a probe comprising a nucleic acid sequence that is complementary to a sequence selected from the group consisting of SEQ ID NOS: 1-24, 33-34, 81-84, 86-105, 107-129, 132-155, 157-177; a fragment thereof and a sequence having at least about 80% identity thereto.

18. The kit of claim 17, wherein the probe comprises a nucleic acid sequence selected from the group consisting of SEQ ID NOS: 47-57, 77-78, 178-203, 205-218, 220-224; a fragment thereof and a sequence having at least about 80% identity thereto.

19. The kit of claim 17, wherein the kit further comprises a forward primer comprising a sequence selected from the group consisting of any one of SEQ ID NOS: 36-46, 66-67, 225-265, 267-271; a fragment thereof and a sequence having at least about 80% identity thereto.

20. The kit of claim 17, wherein the kit further comprises a reverse primer comprising SEQ ID NO: 80, a fragment thereof and a sequence having at least about 80% identity thereto.

21. The method of claim 1, further comprising managing subject treatment based on the colorectal cancer status.

22. The method of claim 21, wherein managing subject treatment is selected from ordering further diagnostic tests, administering at least one therapeutic agent, administering radiation therapy, immunotherapy, hyperthermia, surgery, surgery followed or preceded by chemotherapy and/or radiation therapy, biotherapy, and taking no further action.