HK1033473B - Method for producing complex DNA methylation fingerprints

Info

Publication number: HK1033473B
Application number: HK01104134.8A
Authority: HK
Inventors: A. Olek; S. S. Olek; J. Walter
Original assignee: Epigenomics
Priority date: 1997-11-27
Filing date: 1998-11-27
Publication date: 2006-10-13

Description

Preparation method of complex DNA methylation fingerprint

1. Field of the invention

The method claimed herein opens a new possibility for the differential diagnosis of cancer diseases. It leads to a deeper understanding of carcinogenesis and of the pathogenesis of polygenic genetic diseases. In addition, the method enables the identification of all genes involved in disease progression. Until now, the development of higher organisms and cellular differentiation have remained largely unexplained. The method greatly improves the understanding of these processes as well.

In recent years, the levels of observation that have been well studied thanks to methodological developments in molecular biology include the gene itself, the transcription of genes into RNA, and the resulting proteins. How genes are switched on during ontogeny, and how the activation and inhibition of certain genes in specific cells and tissues is controlled, may correlate closely with the nature and extent of methylation of the gene or genome. It is therefore reasonable to assume that pathogenic states are expressed as a modified methylation pattern of individual genes or of the genome.

The prior art comprises methods that can study the methylation pattern of an individual gene. More recent developments of these methods also make it possible to analyze very small quantities of starting material; however, the total number of measurement points remains at most in the two-digit range, against a theoretical range of at least 10⁷ measurement points. With the method claimed, it is now possible for the first time to examine any desired part of the genome with any desired number of measurement points. Thus, this method allows the identification of the causes of all types of genetic diseases that cannot be determined by any other method, and allows the development of new therapeutic strategies and the identification of new drug target proteins.

2. Description of the Prior Art

2.1 Prior Art analysis of cellular phenotypic molecules

The study of gene expression may be carried out at the RNA level or at the protein level. Both levels essentially reflect important phenotypic parameters. Protein assays using two-dimensional gels (O'Farrell method) have been known for about 15 years. With these assays, the positions of thousands of proteins on the gel can be analyzed in detail. From very early on, these electropherograms were processed and evaluated by data processing methods. In principle, the method is highly informative; however, it is inferior in two respects to modern gene expression methods based on RNA analysis.

In particular, proteins of regulatory importance cannot be detected from small numbers of cells, due to the fact that the sensitivity of the method used is too low. In fact, unlike nucleic acids, proteins cannot be amplified. In addition, the method is extremely complex, not amenable to automation, and very expensive. In contrast, RNA analysis has considerable advantages and is more sensitive due to the use of PCR. In particular, every RNA species considered important can be identified immediately on the basis of the sequence.

Overexpression or underexpression of individual RNAs of known sequence is often easily detected; however, in the application described here, this is only valid in exceptional cases. The "differential display" method is at best able to perform semi-quantitative studies of expression. The expression products of the PCR amplification were separated by gel electrophoresis. The effectiveness of gel electrophoresis is limited due to its resolution. In addition, the method is not sensitive and stable enough for routine diagnostic applications (Liang, P. and Pardee, A.B., Science 257, 967-.

Genes with strong over- or under-expression are often identified by subtractive techniques. cDNA clones of the cell or tissue type to be tested are plated, and cDNA from a comparison sample is hybridized to the clones. Expression patterns cannot be reliably prepared using this technique.

One activity of the U.S. human genome project is the systematic sequencing of expressed genes. The data thus obtained can be used to build expression chips, which allow the study of almost all expression sequences of one cell or tissue type in one experiment.

2.2 State of the Art analysis of cancer diseases

Cancer, i.e., cellular degeneration, is invariably caused by genetic mutations. The cause of these mutations may be exogenous influences or intracellular events. In a few exceptional cases, individual mutations (translocations, deletions), which often affect large regions of the genome, lead to cell degeneration; in most cases, however, a series of mutations in different genes is involved, and only their combined action causes malignant disease. These events at the DNA level are also reflected at the RNA and protein levels. In this connection, the effects are very likely to multiply, since in many cases the amount and type of one RNA clearly influences the extent of synthesis of several other RNA species. This leads to a change in the synthesis rate of the corresponding proteins, which, in turn, can lead to deregulated metabolism and thus initiate regulatory and counter-regulatory mechanisms. The result is a gene expression pattern of the cell that has been modified in a very specific (but largely undeterminable) manner, specific for the particular cancer, the stage of the cancer and its degree of malignancy. Until now, this phenomenon has been beyond the reach of scientific investigation, because it has been impossible to examine gene expression or cellular metabolism in its entirety. Chip technology offers this possibility for the first time (Schena, M. et al., Science 270, 467-.

If one wishes to solve the problem of early tumor diagnosis at the molecular level, one faces today, with few exceptions, nearly insurmountable difficulties: since the molecular events, i.e., the different mutations, are only poorly known for most tumors, the investigator does not know what to look for in the clinical sample. As a result, the remarkable sensitivity and specificity of the polymerase chain reaction cannot be exploited at all. The exceptions include certain intestinal tumors, Ewing's sarcoma, and certain forms of leukemia, each of which is defined almost entirely by a single, precisely described mutation. In these cases, it is possible to identify degenerated cells among millions of normal cells. However, even within these apparently well-defined tumor groups, there are differences in behavior that force the conclusion that other, unknown genetic parameters (e.g., the genetic background of the individual) play an important role. Immunological tumor markers are useful auxiliary parameters, but at present they play only a supporting role alongside other conventional diagnostic parameters. They can, however, be used for prescreening suspicious cells.

Histology plays an important and indispensable role in the identification of degenerated tissue, but is not precise in early diagnosis.

Since most tumors are not well characterized at the molecular level for diagnostic purposes, there is generally no possibility of subdividing into stages or into fractions according to degree of risk. However, this subdivision is an absolute prerequisite for improved selectivity of therapy, especially for the development of effective new drugs and gene therapy.

2.3 State of the Art for the study of the number, type and nature of the possible Stable states of cells of higher organisms

There is now increasing evidence that complex regulatory systems (of which cellular regulation is an excellent example), when left to themselves, can exist in only a limited number of stable states above a critical minimum complexity and below a critical maximum connectivity (the average number of components linked to any given component) (Kauffman, S.A., "The Origins of Order", Oxford University Press, 1993). In this connection, the term state should be understood as the concept of choice for the general phenomenon. For cells as biological regulatory systems, one may also speak of the state of differentiation or the cell type. Although this connection has not been proven, and even the limitation of the possible states of a biological system has not been demonstrated, its practical significance would be considerable. Given the constant information content of biological cells (such constancy in fact exists essentially only within one population), if only a limited number of stable states exist, degenerated cells can also only be in one of these states or in a transition between possible states. At present, it is impossible to define these states on a molecular basis. According to the state of the art, it is almost impossible to establish a correlation between individual states and cell behavior. Such an analysis would, however, clearly contribute to the diagnosis and prognosis of disease. It might even be possible to establish a correlation between the likely state of the diseased cells and the most appropriate therapy. Furthermore, such a method could have a decisive influence on the choice of treatment time. For example, if tumor cells were found to be in a transition between possible states, one could assume that such a cell population yields more readily to the selective pressure induced by treatment and can thus escape it more easily. In this case, the transitional cell population has greatly enhanced flexibility and can easily reach a possible stable state in which the selection pressure no longer applies and the treatment is ineffective. Methods that can classify cells and groups of cells according to state would help to recognize, understand, and possibly solve these problems. However, according to the state of the art, it is not possible to determine whether only a limited number of cell states exist. It is thus not possible to distinguish groups of cells according to abstract criteria regarding their states, or to correlate these states with particular cell behaviors.

2.4 Genetic diseases

Today, the genetic map of the human genome comprises 2500 so-called microsatellites. These tools are used to locate, by linkage analysis, a large number of genes, usually genes whose defects cause genetic disease, and then to identify them. The common genetic diseases caused by a single defective gene have thus been elucidated from the geneticist's point of view, and in principle polygenic diseases can also be understood in this way. Many polygenic diseases are very common and are counted among the so-called widespread diseases. Asthma and diabetes are examples, as are many types of cancer. The application of the linkage analysis strategy described above was initially highly successful here as well. In many cases, a large number of causative genes have been found for important polygenic diseases such as diabetes, schizophrenia, atherosclerosis and obesity. In addition to the availability of appropriate molecular biology laboratory techniques, the availability of a relatively large number of patients and relatives affected by each disease is an important prerequisite for genetic elucidation. Over the past two years, it has become apparent that the several hundred patients initially used for linkage analysis of polygenic diseases are very likely too few by an order of magnitude. This applies in any case where the complete spectrum of causative genes is to be elucidated. Because the amount of manual work required for such linkage analysis is extraordinarily high, only very slow progress can be expected in the analysis of polygenic diseases. Alternative strategies are being sought, since it is precisely these diseases that are of great social and economic importance.

2.5 State of the Art DNA chips

The principle developed by Affymetrix is the most advanced of all developments (e.g., U.S. Pat. Nos. 5,593,839, 5,999,695 or 5,631,734). However, many other companies and research projects have produced DNA chips with different characteristics for special purposes (e.g., U.S. Pat. Nos. 5,667,667, 5,525,464, or 5,492,806 or, for example, Goffeau, A., Nature 385, 202-. Recent literature has reported a commercially available HIV chip that allows the examination of the complete HIV genome. The fluorescently labeled PCR products of the sample to be tested can hybridize with up to 400,000 oligonucleotides. The signals are evaluated with the aid of a CCD camera. The long-known suitability of such systems for allele-specific hybridization is exploited here: a signal is retained through the hybridization and washing steps only where the sample is fully complementary to the immobilized oligonucleotide. The examination of known gene sequences for mutations succeeds because every partial region of the complete sequence is present on the substrate in the form of an oligonucleotide sequence, as is every possible deviation from the normal sequence. The efficiency of the chip method is partly due to the fact that sequence information on a large number of genes or gene loci is obtained by two simple work steps, i.e., hybridization and washing.

2.6 Analysis methods for length determination

Several variants of the method according to the invention require extremely rapid and accurate size determination at the end of the method. Since the fragment length must be measured for tens of thousands of data points, a very efficient measurement system is required. According to the state of the art, possible systems include automated sequencing devices (U.S. Pat. No. 4,811,218), capillary electrophoresis (e.g., Woolley, A.T. et al., Analytical Chemistry (Anal. Chem.) 68, 4081-. The prior art allows these methods to be implemented efficiently, although considerable modifications and the introduction of new logic are required for the method according to the invention.

2.6.1 Mass Spectrometry

The mass of short DNA sequences can be determined accurately by MALDI-TOF mass spectrometry. Furthermore, methods exist in the prior art which combine this analytical method with primer extension reactions. In such a method, for example, an oligonucleotide having a specific sequence is hybridized with a DNA sample, and only one of the four nucleotides is added in each reaction. Detecting which nucleotide the polymerase has added to the 3' end of the oligonucleotide after hybridization reveals the identity of the base following the 3' end of the oligonucleotide. Variations of this method include one that allows the determination of the length of a repeat sequence containing only two of the four possible bases. In this method, the natural nucleotides complementary to the bases present in the repeat are added together with one or two other nucleotides that are modified so as to act as terminators of the polymerization reaction, so that the reaction terminates directly after the repeat sequence. The terminators are typically ddNTPs. The length of the repeat sequence can then be derived from the length measurement.
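The repeat-length variant described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the procedure of any cited reference: the function name `extend_primer`, the template sequence and the choice of which nucleotides are natural and which are terminators are all hypothetical.

```python
# Watson-Crick partner of each template base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def extend_primer(template, natural, terminators):
    """Simulate primer extension along `template` (bases listed in the order
    the polymerase reads them, starting right after the primer's 3' end).

    natural     -- nucleotides supplied as normal dNTPs
    terminators -- nucleotides supplied only as chain terminators (ddNTPs)
    Returns (added_bases, terminated).
    """
    added = []
    for base in template:
        nucleotide = COMPLEMENT[base]
        if nucleotide in natural:
            added.append(nucleotide)        # chain keeps growing
        elif nucleotide in terminators:
            added.append(nucleotide)        # terminator incorporated, chain ends
            return "".join(added), True
        else:
            break                           # nothing to incorporate: extension stalls
    return "".join(added), False

# Hypothetical template: a (CA)4 repeat followed by flanking sequence.
template = "CACACACA" + "TGCA"
product, terminated = extend_primer(template, natural={"G", "T"}, terminators={"A", "C"})

# The terminator is the first base after the repeat, so it is not counted.
repeat_length = len(product) - 1 if terminated else len(product)
```

Because only the two nucleotides complementary to the repeat are supplied in natural form, the extension necessarily ends at the first flanking base, and the product length minus one gives the repeat length.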

2.7 State of the Art methylation analysis

The modification of the genomic base cytosine to 5-methylcytosine is by far the most important and best-examined epigenetic parameter. Nevertheless, although methods exist to determine the complete genotype of cells and individuals, there are to date no comparable methods for generating and evaluating epigenotypic information on a large scale.

In principle, there are three fundamentally different methods for determining the 5-methyl state of cytosines in a sequence.

The first method is based in principle on the use of "methylation-sensitive" restriction endonucleases (REs). REs are characterized in that they cut DNA at certain DNA sequences (typically 4-8 bases in length). The positions of these cuts can be detected by gel electrophoresis, transfer to a membrane and hybridization. Methylation-sensitive means that certain bases within the recognition sequence must be unmethylated for the cut to occur. The band pattern after restriction digestion and gel electrophoresis thus varies with the methylation pattern of the DNA. However, most of the CpGs that can be methylated lie outside the recognition sequences of REs and thus cannot be examined.
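How a methylation-sensitive digest turns a methylation pattern into a band pattern can be sketched as follows. The sketch uses HpaII (recognition sequence CCGG, cut after the first C, blocked when the inner cytosine is methylated) purely as an illustrative enzyme; the text above names no specific enzyme, and the sequence, the function name `digest_lengths` and its parameters are hypothetical.

```python
def digest_lengths(seq, methylated, site="CCGG", cut_offset=1, sensitive_index=1):
    """Return fragment lengths after a methylation-sensitive digest.

    seq             -- DNA sequence (one strand, 5'->3')
    methylated      -- set of 0-based positions of methylated cytosines
    site            -- recognition sequence (HpaII-like by default)
    cut_offset      -- cut position relative to the start of the site
    sensitive_index -- position within the site whose methylation blocks cutting
    """
    lengths = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        if (pos + sensitive_index) not in methylated:
            cut = pos + cut_offset
            lengths.append(cut - start)   # fragment ends at this cut
            start = cut
        pos = seq.find(site, pos + 1)
    lengths.append(len(seq) - start)      # remainder after the last cut
    return lengths

# Hypothetical 20-base sequence with two recognition sites.
seq = "AAACCGGTTTTTCCGGAAAA"
unmethylated_bands = digest_lengths(seq, methylated=set())
methylated_bands = digest_lengths(seq, methylated={13})  # inner C of the second site
```

The two band patterns differ only because a cytosine inside a recognition site changed its state; a methylated CpG anywhere else in the sequence would leave the pattern unchanged, which is the limitation noted above.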

The sensitivity of this method is very low (Bird, A.P. and Southern, E.M., Journal of Molecular Biology (J. Mol. Biol.) 118, 27-47). One variant combines PCR with this method: after the digestion, amplification with two primers flanking the recognition sequence succeeds only if the recognition sequence is in methylated form and has therefore not been cut. In this case, the sensitivity could theoretically be increased to one molecule of the target sequence; however, only individual positions can be examined, and only at considerable expense (Shemer, R. et al., PNAS 93, 6371-.

The second variant is based on the partial chemical cleavage of the entire DNA following the model of the Maxam-Gilbert sequencing reaction, ligation of adaptors to the ends thus generated, amplification using universal primers, and separation by gel electrophoresis. Using this method, defined regions up to a few thousand base pairs in size can be examined. However, this method is so complex and unreliable that it is practically no longer used (Ward, C. et al., J. Biol. Chem. 265, 3030-3033).

A newer method for examining DNA for the presence of 5-methylcytosine is based on the specific reaction of bisulfite with cytosine. Under appropriate conditions, cytosine is converted into uracil, which corresponds to thymine, and thus to a base other than cytosine, in its base-pairing behavior. 5-Methylcytosine is not modified. As a result, the original DNA is converted in such a way that methylcytosine, which originally could not be distinguished from cytosine by its hybridization behavior, can now be detected by "ordinary" molecular biology techniques. All of these techniques are based on base pairing, which can now be fully exploited. In terms of sensitivity, the prior art is defined by a method that embeds the DNA to be tested in an agarose matrix, which prevents diffusion and renaturation of the DNA (bisulfite reacts only with single-stranded DNA), and replaces all precipitation and purification steps with rapid dialysis (Olek, A. et al., Nucleic Acids Research (Nucl. Acids Res.) 24, 5064-. Using this method, individual cells can be examined, which illustrates its potential. However, only individual regions up to about 3000 base pairs in length have been examined to date, and a global examination of cells for thousands of possible methylation events is not possible. Moreover, this method cannot reliably analyze very small fragments from small sample amounts: despite the protection against diffusion, such fragments are lost through the matrix.
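The logic of the bisulfite conversion described above can be sketched in a few lines. This is a toy model under stated assumptions: the function names, the example strand and the set of methylated positions are hypothetical, and conversion is assumed to be complete.

```python
def bisulfite_convert(strand, methylated):
    """Convert every unmethylated C to U; methylated C (by 0-based position) is kept."""
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )

def amplify(converted):
    """After amplification, U is read as T (it pairs like thymine)."""
    return converted.replace("U", "T")

def call_methylation(original, amplified):
    """Positions that were C before treatment and are still C afterwards."""
    return [
        i for i, (before, after) in enumerate(zip(original, amplified))
        if before == "C" and after == "C"
    ]

# Hypothetical strand with cytosines at positions 1, 4 and 5; only position 5 is methylated.
original = "ACGTCCGA"
converted = bisulfite_convert(original, methylated={5})
amplified = amplify(converted)
methylated_positions = call_methylation(original, amplified)
```

After conversion, the methylation state is encoded as an ordinary sequence difference (C versus T), which is why any base-pairing technique can subsequently be used to read it.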

2.8 State of the Art in the application of the bisulfite technique

To date, with very few exceptions (e.g., Zeschnigk, M., European Journal of Human Genetics (Eur. J. Hum. Gen.) 5, 94-98; Kubota, T. et al., Nature Genetics (Nat. Genet.) 16, 16-17), bisulfite technology has only been used for research. However, specific short fragments of known genes are routinely amplified after bisulfite treatment and either completely sequenced (Olek, A. and Walter, J., Nature Genetics (Nat. Genet.) 17, 275-. All of these references are from 1997. The concept of using complex methylation patterns for correlation with phenotypic data on complex genetic diseases, let alone via evaluation algorithms such as neural networks, has not been mentioned in the literature to date; furthermore, it cannot be carried out with the methodology of the prior art.

3. Problems of the invention and solutions to the problems

In summary, the prior art has weaknesses, which can be solved by the method according to the invention.

The problem is solved by a method for the characterization, classification and differentiation of tissue and cell types, for the prediction of tissue and cell group behavior, and for the identification of genes with altered expression, characterized in that:

in genomic DNA obtained from any tissue sample, which may have been treated, subjected to shearing or cleaved by restriction endonucleases in a manner known per se, the base cytosine, but not 5-methylcytosine, is converted to uracil in a manner known per se by treatment with a bisulfite solution,

amplifying the portion of genomic DNA thus treated using either very short or degenerate oligonucleotides or oligonucleotides complementary to adaptor oligonucleotides which have been ligated to the ends of the cleaved DNA prior to bisulfite treatment,

the remaining cytosines on the guanine-rich DNA strand and/or the guanines on the cytosine-rich DNA strand of the amplified portions are detected by hybridization or polymerase reaction in such a number that the data generated by the analysis, when automatically fed to a processing algorithm, make it possible to draw conclusions about the phenotype of the analyzed cellular material.

According to the invention, this is advantageous:

data obtained from several or many such analyses of DNA samples derived from phenotypically identical or similar cells or tissues are correlated, in a training phase, via a neural network or another evaluation algorithm, with the phenotype of the cells whose DNA was analyzed,

the data acquired in this training phase about the relation between phenotype and methylation pattern are used in an evaluation mode to derive, from the methylation pattern generated for a DNA sample of unknown origin, the phenotype of the cells whose DNA was analyzed, or

the data acquired in the training phase about the methylation pattern of the DNA of known cell types are used to identify cytosine positions in the analyzed DNA that deviate from the methylation state determined in the training phase.
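The training phase and the two evaluation modes described above can be sketched with a deliberately simple classifier. Note the substitution: the text names neural networks or other evaluation algorithms, whereas this sketch uses a nearest-centroid rule as a stand-in because it fits in a few lines without external libraries. All data values, labels and function names are hypothetical.

```python
def train(samples):
    """Training phase: average the methylation vectors of each phenotype.

    samples -- list of (methylation_vector, phenotype_label);
               each vector holds 0 (unmethylated) or 1 (methylated) per CpG position.
    Returns {phenotype_label: centroid_vector}.
    """
    sums, counts = {}, {}
    for vector, label in samples:
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(model, vector):
    """Evaluation mode 1: derive the phenotype of a sample of unknown origin."""
    def squared_distance(centroid):
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))
    return min(model, key=lambda label: squared_distance(model[label]))

def deviating_positions(model, label, vector, threshold=0.5):
    """Evaluation mode 2: CpG positions that differ from the trained state of a known cell type."""
    return [i for i, (c, v) in enumerate(zip(model[label], vector)) if abs(c - v) > threshold]

# Hypothetical training data: four CpG positions, two phenotypes.
training = [
    ([1, 1, 0, 0], "normal"), ([1, 1, 0, 1], "normal"),
    ([0, 0, 1, 1], "tumor"),  ([0, 1, 1, 1], "tumor"),
]
model = train(training)
predicted = classify(model, [0, 0, 1, 1])
changed = deviating_positions(model, "normal", [1, 0, 0, 0])
```

With real data the vectors would contain thousands of positions, and a more expressive algorithm would likely be needed; the division into a training phase and an evaluation mode stays the same.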

In addition, according to the invention, it is advantageous to cleave the DNA before the bisulfite treatment with restriction endonucleases that contain cytosine in the context 5'-CpG-3' in their recognition sequence and that cleave the DNA only at those recognition sequences in which the cytosine in the 5'-CpG-3' context is unmethylated at the 5 position.

Furthermore, according to the invention, the following is also advantageous:

before the genomic DNA is modified in a manner known per se with a bisulfite solution, the genomic DNA is cleaved with a restriction endonuclease,

the resulting ends are provided with known, short, double-stranded DNA sequences, also called adaptors or linkers,

oligonucleotides complementary to the adaptors which have been treated with bisulfite are used to amplify all DNA fragments or subpopulations from the total fragments thus generated after bisulfite treatment.

In this connection, it is advantageous to carry out the reaction of the genomic DNA sample with the bisulfite solution, which converts cytosine to uracil while preserving methylcytosine, under cyclic variation of the reaction temperature between 0 and 100 °C.

It is also preferred that the DNA is cleaved before the bisulfite treatment and then introduced into a heatable porous capillary which is permeable only to small molecules, wherein the subsequent reaction steps of the bisulfite treatment are carried out by adding and removing reagents by dialysis.

Furthermore, according to the invention, it is advantageous to transfer the sample before the bisulfite treatment into a heatable capillary that is impermeable to small molecules, wherein the subsequent reaction steps of the bisulfite treatment are carried out by adding and removing reagents through connected capillaries.

Furthermore, according to the invention, it is advantageous to carry out the polymerase reaction after the bisulfite treatment in the same capillary as the bisulfite treatment or in a capillary connected to the capillary, or in a container connected to the capillary.

In capillaries in which the polymerase reaction is carried out with bisulfite-treated DNA samples, it is also advantageous to carry out the separation according to the length of the resulting fragment population.

Furthermore, the treated DNA is preferably separated from the bisulfite by precipitation.

Furthermore, according to the invention, it is preferred to combine two classes of oligonucleotides for the amplification of genomic DNA samples treated with bisulfite. One class contains the base cytosine or its analogs not at all, only to a very small extent, only in the context 5'-CpG-3', or only in regions of the oligonucleotides which are not essential for amplification. The other class contains the base guanine or its analogs not at all, only to a very small extent, only in the context 5'-CpG-3', or only in regions of the oligonucleotides which are not essential for amplification, such as the 5' region. The two classes of oligonucleotides

a) are so short that in an amplification containing only one representative of each of the two classes, more than 100 different fragments are amplified, or

b) contain so many so-called degenerate positions that in an amplification containing only one representative of each of the two classes, more than 100 different fragments are amplified, or

c) are used with so many representatives of both classes in the amplification that more than 100 different fragments are amplified.
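The composition rules for the two oligonucleotide classes, and the effect of degenerate positions, can be checked mechanically. The following is a minimal sketch assuming the strictest reading of the rules (the forbidden base occurs only within 5'-CpG-3'); the function names and example oligonucleotides are hypothetical, and the degenerate positions use the standard IUPAC ambiguity codes.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def c_only_in_cpg(oligo):
    """First class: every C must be directly followed by G."""
    return all(oligo[i + 1:i + 2] == "G" for i, base in enumerate(oligo) if base == "C")

def g_only_in_cpg(oligo):
    """Second class: every G must be directly preceded by C."""
    return all(i > 0 and oligo[i - 1] == "C" for i, base in enumerate(oligo) if base == "G")

def expand(degenerate_oligo):
    """List every concrete sequence represented by a degenerate oligonucleotide."""
    choices = [IUPAC[code] for code in degenerate_oligo]
    return ["".join(bases) for bases in product(*choices)]

# Hypothetical cytosine-free primer with two degenerate positions (D = A, G or T).
variants = expand("ATDDGT")
all_valid = all(c_only_in_cpg(v) for v in variants)
```

Each degenerate position multiplies the number of concrete primer sequences in the reaction, which is one way a single oligonucleotide preparation can prime the more than 100 different fragments required above.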

It is particularly preferred to mix the treated and amplified DNA, in separate preparations for polymerase reactions, with a different oligonucleotide in each reaction, these oligonucleotides

being complementary at their 5' end to the adaptors or, more generally, to the oligonucleotides used for amplification of the bisulfite-treated DNA, and

being different at their 3' end in each reaction, and

having a variable 3' end that begins downstream of the known adaptor sequence or oligonucleotide sequence,

this variable 3' end extending between 2 and 12 nucleotides beyond the known adaptor sequence into the unknown template DNA sequence.
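How a variable 3' end selects a subpopulation of fragments can be sketched as follows. The sketch makes a simplifying assumption that is not in the text: fragments are written in the same sense as the primer, so priming is modeled as a prefix match rather than as hybridization to the complementary strand. The adaptor sequence, fragment sequences and function names are hypothetical.

```python
from itertools import product

def variable_primers(adaptor, n, alphabet="ACGT"):
    """All primers consisting of the known adaptor sequence plus n variable 3' bases."""
    return [adaptor + "".join(ext) for ext in product(alphabet, repeat=n)]

def primed_fragments(primer, fragments):
    """Fragments on which this primer can start a polymerase reaction (prefix match)."""
    return [fragment for fragment in fragments if fragment.startswith(primer)]

ADAPTOR = "GATTGA"  # hypothetical adaptor sequence
fragments = [ADAPTOR + "TAGGA", ADAPTOR + "TTGAA", ADAPTOR + "AGTTA"]

primers = variable_primers(ADAPTOR, 2)             # one primer per separate reaction
selected = primed_fragments(ADAPTOR + "TA", fragments)
```

With n variable bases there are up to 4^n separate reactions, each extending only the fragments whose first n unknown bases match, so a complex fragment mixture is split into smaller, separately analyzable populations.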

In this connection, it is also particularly preferred that these reactions, in which the polymerase reaction is started with an oligonucleotide which is complementary to the DNA treated with bisulfite, contain, in addition to the three nucleotides dATP, dTTP and dCTP or analogs of these three nucleotides:

a nucleotide analogue which is complementary to the base cytosine and blocks any further extension of the strand after incorporation by a polymerase, or

no nucleotide or nucleotide analogue complementary to the base cytosine at all.

Furthermore, according to the invention, it is preferred here that the reactions in which the polymerase reaction is started with oligonucleotides complementary to the bisulfite-treated DNA contain, in addition to the three nucleotides dATP, dTTP and dGTP or analogues of these three nucleotides:

a nucleotide analogue which is complementary to the base guanine and blocks any further extension of the strand after incorporation by a polymerase, or

no nucleotide or nucleotide analogue complementary to the base guanine at all.

It is particularly preferred in this regard that the termination of the polymerase reaction takes place at a position which earlier contained methylcytosine in the DNA sample, using a terminator which itself is modified in such a way that it allows the detection of the specifically terminated polymerase reaction product.
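The termination principle described above can be sketched as follows: after bisulfite treatment and amplification, a cytosine remains in the template only where the sample carried methylcytosine, so a reaction that offers only a terminator opposite cytosine stops exactly there. This is a toy model under stated assumptions: the template is given in the order the polymerase reads it, conversion is complete, and the sequence, primer length and function names are hypothetical.

```python
def bisulfite_pcr(strand, methylated):
    """Bisulfite treatment plus amplification: unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(strand)
    )

def terminated_length(primer_length, template):
    """Length of the product ending at the first remaining C in the template.

    The reaction contains dATP, dTTP, dCTP and a terminator complementary to
    cytosine, so extension proceeds until the template presents a C.
    Returns None when no C remains (no specifically terminated product).
    """
    for added, base in enumerate(template, start=1):
        if base == "C":
            return primer_length + added
    return None

# Hypothetical region downstream of a 20-base primer; C at positions 3, 5 and 8.
region = "TGACTCGAC"
methylated_template = bisulfite_pcr(region, methylated={5})
unmethylated_template = bisulfite_pcr(region, methylated=set())

product_length = terminated_length(20, methylated_template)
no_product = terminated_length(20, unmethylated_template)
```

The measured fragment length therefore reports the position of the first methylcytosine downstream of the primer, which is what the subsequent length or mass measurement reads out.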

Furthermore, according to the invention, the different fragment mixtures of the individual reaction preparations, in suitable combinations, are applied to individual points of the ion source of a MALDI-TOF or other mass spectrometer, and the fragment composition of the individual reactions is determined by determining the mass of all DNA fragments.

Furthermore, it is preferred that different fragment mixtures of the respective reaction preparations resulting from the appropriate combinations are added to the respective lanes in gel electrophoresis, and the fragment composition of the respective reactions is determined by determining the length of all DNA fragments.

Furthermore, the oligonucleotides according to the invention, with which the polymerase reaction is initiated, are coupled to different chemical labels depending on the sequence of the oligonucleotide, the chemical and/or physical properties of which enable the detection and differentiation of the different labels by standard chromatographic or mass spectrometry methods.

In this respect, it is particularly advantageous:

the fragment portion of the DNA to be examined which has been treated with bisulfite and prepared in the first amplification step is simultaneously mixed with two or more chemically differently labeled oligonucleotides,

these oligonucleotides are used as primers for polymerase reactions in reaction preparations,

the resulting complex mixture of fragments is subjected to electrophoretic separation according to length in a first analytical step, and

individual length fractions of the fragment mixture resulting from the electrophoresis are subjected to chromatographic or mass spectrometric analysis, which detects the presence or absence of chemical labels characterizing the oligonucleotides in each length fraction.

Furthermore, according to the invention, oligonucleotides are applied to a surface which

either do not contain the base cytosine or an analogue thereof, or contain it only in the context 5'-CpG-3' or only in regions which are not essential for hybridization with the sample DNA,

or do not contain the base guanine, or contain it only in the context 5'-CpG-3' or only in regions which are not essential for hybridization with the sample DNA.

In this connection, it is preferred according to the invention that the DNA sample treated with bisulfite and amplified

is hybridized with oligonucleotides immobilized in a known manner on a surface, such that for each point of the surface it is known which oligonucleotide sequence is present there, wherein hybridization of the amplified sample DNA with the immobilized oligonucleotides takes place, or is retained after suitable washing steps, only when the oligonucleotide and the sample DNA are completely complementary in the region essential for hybridization.
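The read-out principle of such a surface can be sketched as follows: each point carries a known oligonucleotide, and a signal remains only where the sample contains the perfectly complementary sequence. The sketch models hybridization as an exact match against the reverse complement, which ignores all real hybridization kinetics; the probe sequences, point coordinates and function names are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def hybridizes(probe, sample):
    """Signal is retained only for a perfectly complementary stretch in the sample."""
    return reverse_complement(probe) in sample

def read_chip(chip, sample):
    """Map each point of the surface to True (signal) or False (no signal)."""
    return {point: hybridizes(probe, sample) for point, probe in chip.items()}

# Two hypothetical probes for the same CpG position:
# one matches the sequence retained after methylation (..CG..),
# the other matches the converted, unmethylated sequence (..TG..).
chip = {
    (0, 0): "ATCGAAC",   # guanine only within 5'-CpG-3'
    (0, 1): "ATCAAAC",   # no guanine at all
}

methylated_sample = "ATGTTCGATT"     # bisulfite-treated, amplified; CpG was methylated
unmethylated_sample = "ATGTTTGATT"   # same region; CpG was unmethylated

signals_methylated = read_chip(chip, methylated_sample)
signals_unmethylated = read_chip(chip, unmethylated_sample)
```

Because the position of every probe is known, the pattern of retained signals translates directly into the methylation state of each interrogated CpG.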

Another object of the invention is a kit comprising a combination of at least two of the components described above (e.g., oligonucleotides for amplifying bisulfite-treated DNA together with oligonucleotides immobilized on a substrate for detection) for the treatment of DNA with bisulfite, the amplification of the DNA thus treated, and the detection of the methylation status of more than 100 CpG dinucleotides of a mammalian genome in one reaction, enabling a clinically relevant diagnosis of a cancer disease.

The method solves the problem of determining a very large number of parameters that are diagnostic for cell behavior. For this purpose, a completely new concept of cell analysis had to be established, a completely new evaluation mechanism had to be combined with the analysis, and the technical basis for data generation had to be made available. The method makes use of the information content of cytosine methylation for the first time and makes available the analytical methods and associated evaluation algorithms required for this purpose. The method according to the invention can therefore be used to find secondary relevant loci in cells affected by genetic defects, loci which could be determined with prior art methods either not at all or only with great difficulty: the method reveals loci whose (possibly epigenetic) alterations do not involve any actual change in the base sequence. Thus, the method according to the invention makes available targets for new therapeutic strategies. Furthermore, the method solves the problem of classifying degenerated cells in such a way that a much more precise correlation between (epi)genotype and phenotype is established than in the prior art. In addition, the method according to the invention makes it possible to predict the likely future behavior of degenerated cells and the response of these cells to stimuli in vivo or in vitro. Finally, the method also helps to select the optimal treatment for cancer diseases. Moreover, the method allows the detection of common genetic and/or biochemical characteristics of tumor cells of similar phenotype but different genotype (to the extent that these differences can be detected by prior art techniques). The method rests on the assumption that very different genotypes can lead to very similar epigenotypes and thus to very similar phenotypes. Thus, the method also enables detection of those changes in tumor cell gene expression that are not caused by changes in the base sequence, or are caused by them only indirectly.

4. Detailed description of the solution to a specific problem according to the method of the invention

The method solves the specified problems in an innovative way by combining and improving different methods of the prior art. These methods, known per se, are modified according to the invention to adapt them to the new requirements, resulting in a completely new overall method, which is described below by way of example with reference to preferred variants.

4.1 Pretreatment of the bisulfite solution-treated DNA samples

The basic process steps, such as isolation of the tissue or cells and extraction of the DNA from the latter, are carried out in a manner known per se. However, the extraction of DNA for further analysis will be carried out in small amounts in a preferred variant of the method, usually in an oil layer, as with the treatment with bisulfite itself, which prevents contact with the environment. The aim of the method is to reduce the loss of DNA, so that even with very small starting quantities reproducible results can be guaranteed. The extraction of DNA from cells or tissues can also be carried out directly in capillaries as described below, in which all subsequent reactions can be carried out. However, the limitation of the extraction volume is not an essential part of the method.

The extracted DNA can now be subjected to bisulfite treatment in untreated form, to shearing, or to specific cleavage with restriction endonucleases.

In this regard, the method according to the invention can be subdivided into two different variants. In the first variant, the final detection of individual methylcytosine positions is carried out by hybridization with oligonucleotides; this generally requires no additional pretreatment of the DNA. In the second variant, whole genome amplification of the DNA sample is carried out via oligonucleotides complementary to adaptors which are ligated to the ends of the DNA and treated with bisulfite together with it; this variant requires the ligation of these adaptors to the individual fragments of the cleaved DNA. An adaptor (linker) is a short double-stranded DNA molecule, typically with a single-stranded overhang. The overhang is complementary to the ends of the cleaved DNA sample, so that adaptors can be attached at both ends of the sample DNA fragments using appropriate ligases. For this purpose, the adaptors must be present in excess relative to the number of fragment ends. However, the ligation of the adaptors to the sample fragments can in principle also be carried out without complementary single-stranded overhangs. The individual reactions are essentially known from the prior art (Sambrook et al., Molecular Cloning: A Laboratory Manual, CSHLP, 1989) and are therefore not described further. The combination of adaptors with bisulfite treatment and subsequent whole genome amplification is in principle innovative and is not mentioned in the scientific or patent literature.

4.2 Modification of the bisulfite process according to the invention

All variants of the method according to the invention are based on a method for modifying single-stranded DNA with bisulfite. However, in order to make certain variants of the process according to the invention possible, certain modifications to the bisulfite process are required.

The main variant of the method is based on the fact that, on the one hand, the total amount of starting material should be small (in limited cases, only one cell or a few tens of cells), but also on the fact that several variants of the method actually require the use of tiny fragments. In addition, the routine use of the method according to the invention for clinical diagnosis requires the automation of all method steps, so that the highest possible degree of reproducibility is achieved.

All steps of the bisulfite method should therefore be carried out in small amounts, completely protected from "external" influences. The bisulfite reaction contained in the agarose matrix has advanced with respect to fragment diffusion, but the reaction is still carried out in a very large amount of aqueous bisulfite solution. As a result, important small DNA fragments can diffuse into the solution and thus cannot be further analyzed.

The method according to the invention comprises the implementation of the bisulfite method without using any additional volume. For example, the bisulfite reaction is carried out under oil in a volume of only 1-10 μL, and all the components can be pipetted directly under the oil by a robot, where they form a single droplet in which all subsequent reaction steps are carried out. The difficulties of preparing the bisulfite solutions at the concentrations required according to the prior art, and the dilemma of the fact that solutions with lower reaction times and lower bisulfite concentrations lead to significant damage to the DNA sample, can be solved by the method according to the invention.

The method takes advantage of the fact that the different steps of the bisulfite reaction are equilibrium reactions. For the two important reaction steps, the sulfonation and the subsequent deamination of cytosine, these equilibria lie on the desired (sulfonated and deaminated) side at different temperatures. If the kinetics with which the individual equilibria are established are taken into account, it is clearly advantageous to carry out the bisulfite reaction under cyclic conditions of varying temperature. A preferred variant of the process alternates between 4 ℃ (10 minutes) and 50 ℃ (20 minutes). However, the process according to the invention is intended to cover all other temperatures and all other reaction times at a given temperature. For example, under certain conditions it is advantageous to use greatly reduced reaction times. It is also useful, and in principle novel, to insert a step in which the DNA under test is denatured again at very high temperature between the deamination step (at the higher temperature, 50 ℃) and the subsequent repeated sulfonation step. For high molecular weight DNA, the denaturation temperature is usually > 90 ℃; it may also be below this temperature and still be within the scope of the method. There are two reasons for this. On the one hand, there are variants of the method for detecting very short DNA fragments. On the other hand, in each reaction cycle, the complementarity between strands decreases due to the conversion of cytosine to uracil that occurs. Thus, cyclic reaction schemes can become extremely complex. For example, in the first cycle, the denaturation temperature may be above 90 ℃, but in later cycles, it may be adjusted to a lower value. In all cases, multistep reactions can only be optimized by carrying out appropriate test series. Thus, the claimed protection relates generally to a cyclically performed bisulfite reaction.
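The cyclic scheme described above can be written down as a simple temperature program. The following Python sketch is only an illustration under stated assumptions: the 4 ℃ / 10 minute and 50 ℃ / 20 minute steps and the denaturation above 90 ℃ in the first cycle come from the text, whereas the function name, the number of cycles, the denaturation duration and the lowered denaturation temperature for later cycles are hypothetical placeholders that would have to be determined by test series.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    minutes: float


def cyclic_bisulfite_program(
    n_cycles: int,
    first_denature_c: float = 95.0,            # text: usually > 90 C in the first cycle
    later_denature_c: Optional[float] = None,  # hypothetical lower value for later cycles
    denature_min: float = 1.0,                 # hypothetical duration, not given in the text
) -> List[Step]:
    """Build one denaturation/sulfonation/deamination block per cycle."""
    if n_cycles < 1:
        raise ValueError("n_cycles must be at least 1")
    program: List[Step] = []
    for cycle in range(n_cycles):
        if cycle == 0 or later_denature_c is None:
            denature_c = first_denature_c
        else:
            denature_c = later_denature_c
        # Denaturation sits between deamination and the next sulfonation step.
        program.append(Step("denaturation", denature_c, denature_min))
        program.append(Step("sulfonation", 4.0, 10.0))   # 4 C, 10 min (from the text)
        program.append(Step("deamination", 50.0, 20.0))  # 50 C, 20 min (from the text)
    return program


def total_minutes(program: List[Step]) -> float:
    """Total hold time of the program, ignoring ramp times."""
    return sum(step.minutes for step in program)


if __name__ == "__main__":
    for step in cyclic_bisulfite_program(3, later_denature_c=80.0):
        print(f"{step.name:12s} {step.temp_c:5.1f} C  {step.minutes:4.1f} min")
```

Keeping the denaturation temperature as a per-cycle parameter reflects the remark that strand complementarity decreases with every cycle, so that later cycles may tolerate a lower value.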

Other prior art solutions to the above problem are based on the transfer of one or more steps of the method into the capillary. There are generally two variations: capillaries can be 1) impermeable, or 2) permeable to certain solvents, like very fine dialysis tubing.

The variant according to 1) shows that droplets containing DNA, bisulfite and free-radical scavenger as described in the above example can be added from the outside to an aqueous solution by means of a heatable, coolable capillary. In this way, the droplets can be separated in the liquid or gas phase within the capillary. All reactions are then carried out in the capillary and other reagents can be added through the inlet fitting. Since the capillary according to variant 1) is completely closed to the outside, a matrix solution has to be added for the subsequent step, which causes the above-mentioned problems and requires a solution according to the invention.

The variant according to 2) shows that, firstly, only the DNA solution can be passed through a porous capillary, which is pretreated by a corresponding pretreatment method using the method steps according to the invention or further method steps. The capillary itself is capable of passing the solution through the vessel required for the reaction step within the capillary. In particular, in this variant, the DNA solution inside the capillary is first passed through a bisulfite solution, which can additionally be subjected to periodic temperature changes or a constant temperature. In another step, after completion of the bisulfite reaction, the capillary tube is passed through a dialysis solution, then through an alkaline solution, and finally through another dialysis solution. After these bisulfite treatment steps in the capillary, another variant of the method is provided in which all other PCR and primer extension steps are performed in the same capillary. When the different primers used for primer extension according to the invention are labeled by specific chemical modifications, capillary electrophoresis can also be performed in the extension of the same capillary directly after all these PCR and primer extension steps. In electrophoresis, the extension products are separated according to length, and subsequent mass spectrometry, chromatography or optical analysis separates the collected size fractions according to their labels, thereby producing a result spectrum or a result chromatogram in a second analysis process.

The use of capillaries for bisulfite and PCR and/or extension reactions also simplifies another detection variant according to the invention. In fact, as described further below, the fragments can be introduced into the capillary immediately after amplification, carrying on their insides oligonucleotides specific for individual methylcytosines as hybridization partners.

Another variant of the method is based on the removal of the highly concentrated (high-molarity) bisulfite solution without dialysis. The advantage of this variant is that the remaining disadvantages of the variants described so far are eliminated.

Each dialysis in agarose requires part of the process to take place in a large volume of aqueous solution. As a result, there is a risk of DNA loss due to diffusion. One problem with the variant performed in the capillary is that, in the case of minute amounts of DNA, a non-negligible percentage of the DNA fragments may bind to the inner wall of the capillary and thus be lost to analysis.

Therefore, the following method is proposed. DNA extraction is performed in a small volume under the oil layer as described. In a preferred variant of the method, the volume is 1 μL. Naturally, the method is essentially unchanged if smaller or larger volumes are used; such variants are accordingly also within the scope of what is claimed. The DNA is denatured (as described above). The required bisulfite concentration is then established by adding a larger volume (e.g. 4 μL) of a bisulfite solution more concentrated than the treatment itself requires, so that the required final concentration and pH are established automatically under the oil. The bisulfite reaction is subsequently carried out in one of the ways described.
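The mixing step above is simple dilution arithmetic, sketched below in Python. The volumes of 1 μL sample and 4 μL bisulfite solution are the examples given in the text; the function names and the concentration values are hypothetical and serve only to show the calculation, not to state actual reaction conditions.

```python
def final_concentration(stock_conc: float, stock_volume_ul: float,
                        sample_volume_ul: float) -> float:
    """Concentration after mixing a stock solution into a sample droplet.

    Assumes ideal mixing (volumes are additive) and no bisulfite in the sample.
    """
    total_volume = stock_volume_ul + sample_volume_ul
    return stock_conc * stock_volume_ul / total_volume


def required_stock_concentration(target_conc: float, stock_volume_ul: float,
                                 sample_volume_ul: float) -> float:
    """Stock concentration needed to reach target_conc after mixing."""
    total_volume = stock_volume_ul + sample_volume_ul
    return target_conc * total_volume / stock_volume_ul


if __name__ == "__main__":
    # 1 uL DNA sample plus 4 uL bisulfite solution (volumes from the text).
    # The target of 4.0 is an arbitrary illustrative number in arbitrary units.
    stock = required_stock_concentration(4.0, stock_volume_ul=4.0, sample_volume_ul=1.0)
    print(stock)
    print(final_concentration(stock, 4.0, 1.0))
```

With these volumes the stock is diluted to four fifths of its concentration, so it has to be prepared 1.25 times more concentrated than the desired final value.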

In the next process step (in a preferred variant according to the invention), a small molar amount of a salt, for example barium hydroxide, whose cation forms an insoluble salt with the bisulfite, is added to the solution, whereupon it precipitates out of solution. The addition of this solution also increases the pH to a value which enables the desulfonation of the cytosine which was sulfonated and deaminated in the first reaction step. In the desulfonation reaction which occurs very rapidly, the precipitated bisulfite can be separated in the aqueous sample solution by brief centrifugation. However, it is preferred to use salts having the following characteristics. The cation forms a salt with bisulfite, which is insoluble even under the conditions of the amplification process and has no harmful effect on the amplification process at all. In addition, none of the ions that are not precipitated in solution in this way is in an amount such that the amount of ions present interferes with the amplification process. Moreover, possible interference of these salts in the amplification process can also be prevented by using salt solutions which are very finely prepared and which can also be transferred very precisely. The use of an equivalent amount of salt results in a reduction in the number of potentially interfering ions. The use of potassium bisulfite and other counterions supplemented into the subsequent amplification buffers also simplifies the buffer changes described below for the amplification reactions.

In the next method step, an additional volume of a solution having the following characteristics is added under the oil. First, its salt composition is such that, on mixing with the treated DNA solution located under the oil, a salt concentration and pH are achieved that allow an enzymatic amplification process. In this connection, all thermostable polymerases of any origin can be used. The type of polymerase used is not critical and may vary depending on the buffer conditions present, and therefore all applications of these polymerases are claimed. Secondly, the solution contains this polymerase, all nucleotides and the required oligonucleotide primers. After addition of this solution, amplification can be performed directly in the same reaction vessel. Thus, contact with the "outside world" is excluded throughout the entire process, and sample loss is kept to a minimum.

4.3 Genome-wide general amplification of bisulfite-treated DNA

In each case, detection of thousands to millions of methylcytosine positions requires amplification of a large percentage of all possible sequences of the sample genome. This part of the method according to the invention should be subdivided into two principle-different variants, as is done in the "pretreatment" part.

The first variant of these method steps is based on the ligation of adaptors to the fragmented DNA prior to bisulfite treatment. In its simplest form, an oligonucleotide complementary to the linker sequence as it appears after bisulfite treatment is used for this purpose. In this method, the oligonucleotide can hybridize to any region of the linker sequence. In a polymerase reaction with these components, all fragments carrying linkers at both ends are in theory amplified. For example, these may be all fragments produced by prior restriction endonuclease cleavage. However, for certain variants of this method, since only a limited number of individual fragments are produced in one such amplification, it is necessary to subdivide the reaction into different partial reactions after a small number of amplification cycles. These partial reactions can then be performed with oligonucleotides that extend a few bases (i.e. 1-4) beyond the linker sequence into the unknown sequence of the different fragments. The oligonucleotides for the different reactions are selected such that each covers a part of all possible unknown sequences, so that the totality of the oligonucleotides in the different reactions comprises all possible sequences which may theoretically follow the known linker sequence. For example, four reactions can be set up, where the oligonucleotide of the first reaction carries the base adenine at the 3' end after the known adaptor-complementary sequence, that of the second reaction cytosine, that of the third guanine, and that of the fourth thymine. This principle naturally also applies to more than 4 different reactions in which the sequence at the 3' end of the oligonucleotide contains more than one base. The position at the 3' end of the oligonucleotide can also be a so-called degenerate position. 
This means that in one position more than one base with similar efficiency is attached to the oligonucleotide, or two or more oligonucleotides are mixed with non-degenerate sequences. Thus, all possible sequences can be covered with a total number of reactions that is not a power of 4.
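The subdivision into partial reactions can be illustrated by enumerating the 3' extensions of a linker-complementary primer. The Python sketch below is a minimal illustration, assuming a made-up linker primer sequence and hypothetical function names; the degenerate symbols R (A or G), Y (C or T) and N (any base) follow the standard IUPAC nucleotide code.

```python
from itertools import product
from typing import Iterable, List, Set

BASES = "ACGT"
# Subset of the IUPAC nucleotide code, sufficient for this example.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "N": "ACGT"}


def extension_primers(linker_primer: str, n_bases: int) -> List[str]:
    """One primer per partial reaction: linker primer plus every 3' extension."""
    return [linker_primer + "".join(ext) for ext in product(BASES, repeat=n_bases)]


def expand(extension: str) -> Set[str]:
    """All concrete sequences represented by a (possibly degenerate) extension."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in extension))}


def covers_all_extensions(reaction_extensions: Iterable[str], n_bases: int) -> bool:
    """True if the reactions together cover every possible n_bases extension."""
    covered: Set[str] = set()
    for ext in reaction_extensions:
        covered |= expand(ext)
    return covered == {"".join(p) for p in product(BASES, repeat=n_bases)}


if __name__ == "__main__":
    linker = "AATTCC"  # hypothetical linker-complementary primer sequence
    print(extension_primers(linker, 1))               # four reactions: A, C, G, T
    print(len(extension_primers(linker, 2)))          # 16 reactions for two bases
    print(covers_all_extensions(["R", "C", "T"], 1))  # three reactions, full coverage
```

The last call shows the point made about degenerate positions: with one degenerate extension (R), three reactions cover all four bases, so the number of reactions need not be a power of 4.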

In this way, a subset of all fragments can be amplified in each reaction, resulting in higher reliability and higher amplification of individual fragments. In principle, stepwise subdivision of the reaction is also possible, so that only one oligonucleotide covering all sequences is used for the first amplification cycle, and the subsequent reactions are subdivided into, for example, four reactions each containing one specific 3' base, followed by several further amplification cycles, and then followed by one or more subdivisions. One key point here is the accurate determination of the amount of oligonucleotide added. Ideally, an amount of oligonucleotide is added to each series of amplification cycles such that it is completely or nearly completely consumed during the reaction. The reaction mixture of each cycle can then be transferred directly or automatically to other steps.

Alternative variants with different principles do not require pre-ligation of adaptors to pre-cut DNA. Several methods have been specified in the prior art which, to varying degrees of success, achieve whole genome amplification of DNA. All these methods must be varied according to the method of the invention. We tested the application of three different methods. First, as a preferred variant, we apply an improvement of the "DOPE" technique. Unlike the methods described in the literature, we use two or more different oligonucleotides in each amplification, which can be subdivided into two classes. These classes are characterized by the absence, or near absence, or presence only in the 5' region of the base guanine in one class and the base cytosine in the other class. If these bases are present completely in these oligonucleotide sequences, they are usually in the 5 '-CpG-3' sequence. The aim is that each of these two classes of oligonucleotides hybridizes to either the two (G-rich) strands present after bisulfite treatment or to the (C-rich) opposite strand copied from these strands using a polymerase reaction. By combining the representatives of these two sequence classes, it is then possible to achieve amplification of the bisulfite-treated DNA. In most cases, cytosines outside of the 5 '-CpG-3' sequence should be converted to uracils in the template DNA so that guanine is not required for efficient amplification of oligonucleotides that can hybridize to bisulfite-treated strands. On the opposite strand, the same applies to guanine. In these classes of oligonucleotides, if guanine or cytosine is present in the 5 '-CpG-3', the possibility arises that these oligonucleotides can also hybridize to potential methylation positions. This is not useful for the method. However, it happens that the disadvantages are so few that a considerable part of the method can also be carried out in this way. 
Therefore, the scope of protection should also include these oligonucleotides. The presence of individual guanines in positions other than 5 '-CpG-3' is also contemplated, although in principle this tends to be detrimental to the efficient implementation of the method. During hybridization of the oligonucleotide to the target DNA, which is necessary for amplification, this usually results in the generation of sites which are not base-paired, which in most cases reduces the amplification efficiency and is therefore undesirable. However, amplification of this strand is possible, although not ideal, using oligonucleotides containing one or a few guanine bases. Since such amplification still completes the subject of the invention, the use of these oligonucleotides which do not strictly fall into this category due to the use of several guanines should also be within the scope of protection. We use in particular the second technique, which requires an exception of this type. In this technique, oligonucleotides are used whose 3' region in principle belongs to one of the sequence classes mentioned. In the 5' region of these oligonucleotides, a so-called "sequence tag" is attached, which is used in a subsequent step for further amplification. In this variant, in a first cycle of amplification, a large number of fragments are amplified with the 3' region of oligonucleotides which in principle belong to one of the above-mentioned classes. In a subsequent step, each fragment amplified so far has a sequence at the 3' end corresponding to the sequence tag. Similar to amplification using oligonucleotides complementary to the adaptors, these sequences can then be used as hybridization partners for one of the oligonucleotides used for the other amplification. 
Naturally, the sequence tag of this first oligonucleotide may contain guanine in the 5 ' region of the oligonucleotide, which 3 ' region belongs to the first class, and cytosine in the 5 ' region, which belongs to the second class.

Oligonucleotides, or oligonucleotides belonging to one of the two classes according to their 3' region, can be constructed differently. Our variant of the DOPE method uses a combination of oligonucleotides of two sequence classes, which show a predetermined base sequence in the 3' region. In the method according to the present invention, the base sequence may have a length of 2 to 20 bases. The first type has a portion of the "H" position, typically 5-20 bases in length, before this sequence, while the second type has a "D" position. This means that in the synthesis of oligonucleotides, one of the three bases A, C or T is incorporated into these positions in class "H", and one of the three bases A, G or T is incorporated into class "D" (where the above exceptions that do not affect the subject invention are within the protection). Before this part (5'), there may be (but is not necessarily) another part with a specific sequence. If these oligonucleotides are used under conditions corresponding to the amplification of bisulfite treated DNA, a small part of the complete genome can be amplified in a reproducible manner, which may be determined to cover a specific region of the oligonucleotide. In the case of sequence tags, the 5' region of the oligonucleotide can have a defined sequence, which breaks through the definition of two sequence classes. Oligonucleotides are also included within the scope of protection for the overall method if they contain regions "H" and "D" in the 3' region, or if they contain specific base positions alternating in either form with the "H" or "D" classes.
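The two primer classes can be illustrated with a short Python sketch. In the standard IUPAC code, "H" stands for A, C or T (no guanine) and "D" for A, G or T (no cytosine), which matches the description above. In a real synthesis a degenerate position yields a mixture of molecules; the hypothetical function below merely samples one member of that mixture. All sequences and names are invented for illustration; only the length ranges (2 to 20 bases for the defined 3' sequence) are taken from the text.

```python
import random
from typing import Optional

CLASS_ALPHABET = {"H": "ACT", "D": "AGT"}  # IUPAC: H = not G, D = not C
FORBIDDEN_BASE = {"H": "G", "D": "C"}


def make_primer(cls: str, degenerate_len: int, fixed_3prime: str,
                tag_5prime: str = "", rng: Optional[random.Random] = None) -> str:
    """Sample one primer: optional 5' tag + degenerate H/D part + defined 3' sequence."""
    if not 2 <= len(fixed_3prime) <= 20:
        raise ValueError("defined 3' sequence should be 2 to 20 bases long")
    rng = rng or random.Random()
    alphabet = CLASS_ALPHABET[cls]
    degenerate = "".join(rng.choice(alphabet) for _ in range(degenerate_len))
    return tag_5prime + degenerate + fixed_3prime


def in_class(seq: str, cls: str) -> bool:
    """Strict class membership: the forbidden base does not occur at all."""
    return FORBIDDEN_BASE[cls] not in seq


if __name__ == "__main__":
    rng = random.Random(0)
    h_primer = make_primer("H", 10, "ATTCA", rng=rng)
    # A 5' sequence tag may break the class rule; only the 3' region is checked.
    d_primer = make_primer("D", 8, "TGAAT", tag_5prime="CCGG", rng=rng)
    print(h_primer, in_class(h_primer, "H"))
    print(d_primer, in_class(d_primer[4:], "D"))
```

The strict check deliberately ignores the exceptions discussed in the text (bases within 5'-CpG-3' or isolated mismatching bases); those would need a more permissive rule.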

In addition, the scope of protection shall also include oligonucleotides used as amplification primers which are used in the general concept of the method and form "hairpin" structures at their 5' ends; molecules that exhibit base-pairing behavior similar to that implied in the above description, e.g. PNA (peptide nucleic acid) based oligonucleotides and chemically modified oligonucleotides; and modified or unmodified oligonucleotides synthesized with nucleotides other than natural nucleotides.

4.4 Detection of methylation status of CpG dinucleotides

4.4.1 Detection of methylated CpG dinucleotides on DNA chips

In its final form, the method according to the invention is applied to DNA chips. Thus, the use of a DNA chip represents a preferred variant of the method. In principle, all the described variants of the method make it possible to achieve amplification of bisulfite-treated DNA. In a preferred variant, the chip for implementing the method has the following form: on a surface provided for this purpose, at least one thousand, and often more than one hundred thousand, oligonucleotides are either synthesized in situ in a known manner or applied using micropipettes or nanopipettes, stamp-like devices or microfluidic networks. Each oligonucleotide is specific for one CpG position; this means that an oligonucleotide can hybridize to the target DNA either only if the CpG position it covers is methylated or only if this position is unmethylated. Thus, for each position, at least two oligonucleotides can be used (see below). There is no upper limit on the number of different oligonucleotides, which can even be 8 times higher than the number of CpG dinucleotides contained in the genome. For each point of the DNA chip, it is known exactly which oligonucleotide sequence is located there.

The method according to the invention requires modifications to the way such DNA chips are normally populated. On a DNA chip according to the prior art, there are oligonucleotides which are complementary to genomic or expressed sequences. This means that all oligonucleotides correspond on average to the base composition of the genomic DNA or of the biologically expressed sequences. Most oligonucleotides located on such DNA chips therefore contain all four bases, and on average their ratio of guanine and cytosine bases corresponds to that of the genomic and/or expressed sequences.

This situation differs in the method according to the invention. In principle, 8 types of oligonucleotides can be synthesized for each sequence covered by the oligonucleotide. Due to the bisulfite treatment, the DNA is modified such that the initially complementary top and bottom strands (Watson and Crick strands, also known as the coding and template strands) are now no longer complementary. This means that oligonucleotides can be synthesized for both strands. This possibility exists because it is possible in this way to use both chains as internal controls for each other. Due to the partly significant differences in the sequence, the hybridization behavior of the two different strands with the oligonucleotides suitable for each case is different. As a result, when the same result is achieved for both chains, it can be considered that this has been independently confirmed. The amount of methylcytosine and cytosine at each position measured should also be quantified. The use of two strands allows quantification of data independent of different hybridization parameters of the oligonucleotide due to the evaluation of different hybridization events at each CpG position. The background error is thus minimized.

After the bisulfite treatment, however, it is not only the two original strands that differ: after the treatment, amplification is carried out in each case, which synthesizes a new complementary opposite strand on each of the two strands. Just as the original strands are no longer complementary to each other after bisulfite treatment, the two newly synthesized opposite strands are not complementary to each other. Nor is a newly synthesized opposite strand complementary to the other original strand (the one that did not serve as its template) in the amplification. Thus, four different hybridization targets are generated for each single CpG position. All four strands contain (here we assume symmetric methylation, i.e. methylation of CpG positions on both strands) the same information, but they hybridize to oligonucleotides with different sequences. Thus, each piece of information obtained at any CpG position is independently validated four times. However, the signal intensity of the four different oligonucleotides cannot be directly correlated with the degree of methylation at a position (except via empirical values generated with the system). In fact, different fragments are also amplified with different efficiencies in the enzymatic amplification, so the signal intensity need not correlate with the degree of methylation but may instead reflect the amplification efficiency of the fragment containing the CpG position. Thus, in each case, all four strands must be analyzed with two possible oligonucleotides: on the one hand those which hybridize only when the CpG position to be determined is methylated (containing CpG) and on the other hand those which hybridize only when the CpG position is unmethylated (and thus do not contain CpG). The two possible variants of a DNA strand, namely the methylated and the unmethylated variant, are amplified with essentially the same efficiency and can thus be compared. 
Since complementary information is now available for all four strands, the overall result can also be confirmed with all four strands. In the method according to the invention, the main criterion distinguishing the oligonucleotides from those of other methods is that they contain in each case essentially only three of the four bases. The oligonucleotides complementary to the original (bisulfite-treated) DNA strands contain the base C but no base G, with one exception: half of these oligonucleotides contain exactly one guanine, namely in the CpG at the position whose methylation state is to be measured. In contrast, the second type of oligonucleotide, which is complementary to the opposite strand generated during amplification, contains the base cytosine only at the position whose methylation state is to be determined. Those oligonucleotides which can hybridize with the target DNA only when the position to be determined is unmethylated contain no guanine at all or no cytosine at all (depending on the strand). Naturally, the 8 classes of oligonucleotides may also be varied in other ways in the method. Several representatives of a class can also be used simultaneously for the detection of each individual position of possible methylation. For example, the number of bases the oligonucleotide contains on each side of the potentially methylated position is not fixed. The position capable of being methylated does not have to be exactly in the middle of the oligonucleotide. Thus, for each position measured, multiple permutations are possible.
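The four hybridization targets per CpG position can be reproduced in silico. The following Python sketch is an idealized illustration under stated assumptions: complete conversion of unmethylated cytosine, uracil read as thymine after amplification, and symmetric methylation as assumed in the text. The example sequence and all function names are hypothetical.

```python
from typing import Dict, Set

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def bisulfite_convert(seq: str, methylated: Set[int]) -> str:
    """Convert every C to T except methylated Cs (0-based indices in seq)."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def four_strands(top: str, methylated_top: Set[int]) -> Dict[str, str]:
    """Return the four strands present after bisulfite treatment and amplification.

    methylated_top holds indices of methylated Cs within CpGs on the top strand.
    """
    n = len(top)
    bottom = revcomp(top)
    # Symmetric methylation: the C paired with the G at top index i + 1
    # sits at index n - 2 - i on the bottom strand.
    methylated_bottom = {n - 2 - i for i in methylated_top}
    top_bs = bisulfite_convert(top, methylated_top)
    bottom_bs = bisulfite_convert(bottom, methylated_bottom)
    return {
        "top_bs": top_bs,                     # converted original top strand
        "top_bs_copy": revcomp(top_bs),       # strand synthesized on top_bs
        "bottom_bs": bottom_bs,               # converted original bottom strand
        "bottom_bs_copy": revcomp(bottom_bs), # strand synthesized on bottom_bs
    }


if __name__ == "__main__":
    seq = "AACGTTCAG"  # hypothetical fragment with one CpG at index 2
    for label, methylated in (("methylated", {2}), ("unmethylated", set())):
        print(label)
        for name, strand in four_strands(seq, methylated).items():
            print(f"  {name:15s} {strand}")
```

In this idealized model the CpG dinucleotide survives in all four strands only when the position is methylated, which is why the methylation-specific oligonucleotides contain CpG and the others do not. A chip oligonucleotide for a given strand would be the reverse complement of that strand around the CpG position.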

In the extreme case, the site to be detected is located at one of the ends of the oligonucleotide, or even (although this is already a component of another variant of the method) one position after the 3' end, so that the presence of cytosine or guanine (and thus methylation of the original sample) is not detected by simple hybridization, but by primer extension. In this variant of the method, modified nucleotide triphosphates carrying a different label for each of the four nucleotides are added to the target DNA (modified such that, although incorporation of the nucleotide at the 3' end of the primer is possible, no further extension beyond this nucleotide is possible). Instead of detecting hybridization directly, a polymerase is added and exactly one nucleotide is attached at the 3' end of each oligonucleotide. The incorporated nucleotide is exactly complementary to the nucleotide on the hybridized target DNA that lies immediately 5' of the region bound by the oligonucleotide. In our method, this position is a position in the original DNA which may be methylated. Thus, when a position in a DNA sample is methylated (depending on the strand), C is located at that position; G is then "added" to the oligonucleotide. If dGTP (or an analogue of this nucleotide) is now unambiguously labeled and (as a prerequisite) the oligonucleotide sequences are known at all positions, then in this case the detection of guanine incorporation can be used to detect the presence of a methyl group in the original sample. If adenine is instead attached to the same oligonucleotide, thymine has been detected at this position and, by the same reasoning, the position is shown to be unmethylated. The same demonstration can be performed on the opposite strand of the amplification product, but with labeled ddNTP analogues of cytosine and thymine. 
In this variant of the method, the oligonucleotides of the two sequence classes contain no guanine or no cytosine, respectively (depending on the strand). However, this rule can be broken in exceptional cases (e.g. when it is well known that a position is always methylated or always unmethylated, or if the methylation state of the position has no effect on the hybridization behavior of the oligonucleotide). Furthermore, oligonucleotides with one or several "mismatch positions" can fulfill the essential requirements of the method despite the fact that such positions often have a detrimental effect. Thus, oligonucleotides which do not strictly belong to this sequence class but which satisfy the main part of the method are intended to be included in the scope of patent protection. In addition, attachment of the oligonucleotides to the surface of the DNA chip can occur through sequence tags on the oligonucleotides that are complementary to a universal sequence of oligonucleotides attached to the surface. These oligonucleotides belong to a certain sequence class only in the regions suitable for hybridization with a DNA sample. Furthermore, other molecules used as hybridization partners on the surface of a DNA chip should also be included in the scope of protection, for example PNA (peptide nucleic acid) based oligonucleotides, chemically modified oligonucleotides and modified or unmodified oligonucleotides synthesized with nucleotides other than natural nucleotides, if they have a base-pairing behavior similar to the one implied in the above description.

This naturally also applies to all variants of the method which are based on direct hybridization of the oligonucleotide to the position to be determined or on primer extension by only one base: usually, only one position is detected, and this position also contains the complete cytosine or guanine content of the oligonucleotide. However, exceptions to this rule may be meaningful in individual cases, and therefore they are also an object of the present invention.

Detection of the differently labeled nucleotide analogs in a primer extension reaction on a DNA chip (which may be degenerate to any degree) can also be accomplished in a variety of ways. A preferred variant is detection in a known manner by means of a CCD camera which records a fluorescence signal indicating that the (naturally fluorescently labeled) nucleotide has bound to the chip. In this regard, in the above-described variant of the method, each nucleotide analogue is labeled with a different color, so that it is possible to detect which nucleotide is incorporated at each position.

Yet another important variant consists in labeling each of the four nucleotide analogues with one chemical molecule, then separating the nucleotides photochemically (either by heat generation, or similar methods) by exposure to laser irradiation of MALDI-TOF, then ionizing directly and determining their molecular weight. The laser of the MALDI-TOF apparatus can be directed precisely to each position of the chip, so that it is also possible to determine for each position on the chip which weight change has occurred at said position. Usually (since in this variant both methylated and unmethylated target DNA can hybridize with the same oligonucleotide and the methylation state can be determined by the labeling of the incorporated nucleotide), two labels are detected at each position (this naturally also applies to fluorescent labels), and these two signals have to be quantified and compared with one another to determine the degree of methylation.
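The comparison of the two signals at one chip position can be sketched as a simple ratio. This is an assumed evaluation, not one specified in the description: the function name is hypothetical, and background correction and label-specific response factors are deliberately ignored.

```python
def methylation_degree(signal_methylated, signal_unmethylated):
    """Fraction of the total signal that comes from the 'methylated' label.

    Assumes both labels respond equally and that background has already
    been subtracted; returns None when no signal was detected at all.
    """
    total = signal_methylated + signal_unmethylated
    if total <= 0:
        return None
    return signal_methylated / total


# Hypothetical intensities for one chip position
print(methylation_degree(750.0, 250.0))  # 0.75
print(methylation_degree(0.0, 0.0))      # None
```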

However, in a currently preferred method variant, detection based on fluorescence is used. Alternatively, hybridization may be detected directly without performing a primer extension reaction.

4.4.2 Detection of the methylation state of cytosine by mass spectrometry of the length of the "primer extension" product

A variant of this method was developed which enables the detection of a very large number of cytosines and/or guanines in bisulfite-treated DNA samples by length measurement with a MALDI-based mass spectrometer. The technological basis for this improvement of the method has been described above.

In the method, we used oligonucleotides that, because they belong to one of the two sequence classes mentioned above, hybridize with the greatest probability to only one of the two strands of bisulfite-treated DNA. The oligonucleotides used in this variant of the method enable the detection of cytosine and/or guanine from an amplification mixture prepared by any of the amplification methods described above. This means that in principle it is possible to use oligonucleotides which are complementary to adaptors ligated to the sample fragments before the bisulfite treatment, as well as oligonucleotides which hybridize at undefined positions on the otherwise amplified fragments.

Preferred variants of this method include the use of DNA samples in which adaptors are ligated to their restriction fragments (and then amplified after bisulfite treatment), or DNA samples amplified with oligonucleotides containing a fixed sequence tag in the 5' region. To this end, the adaptors are synthesized such that after bisulfite treatment of both strands (i.e., the original bisulfite-modified strand and the strand newly synthesized during amplification) their cytosine or guanine content is different, enabling the preparation of oligonucleotides for primer extension reactions that specifically recognize one of the two strands. This means that, also in this case, two sequence classes of oligonucleotides can be distinguished. The oligonucleotides used have the property that their 3' region extends via the known linker sequence, via the sequence recognized by the restriction endonuclease and thus via the known sequence, to an unknown region of the DNA sample. If the universal amplification with stepwise extension of oligonucleotides as described above is carried out in successively subdivided separate reactions, the oligonucleotides described here also extend outside this known region. In this regard, the oligonucleotide can extend 2-20 bases into the unknown region. The mixture of fragments obtained from the first or from the first universal amplification is now subdivided and mixed with the different oligonucleotides of each (sub) reaction. In each sub-reaction, it is known which oligonucleotide is added, and the sub-reactions differ only in the sequence of the oligonucleotide that has to be added. It is not critical whether the oligonucleotide sequence is precisely determined, or whether each position is occupied by the degenerate nucleotide position "H" or "D" described above. 
The use of degenerate positions allows the use of longer regions that extend into the unknown region, thus allowing potentially more precise regulation and increase in the number and type of extension fragments produced by such reactions.
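How degenerate positions multiply the number of concrete primer sequences can be sketched as follows. The sketch assumes the standard IUPAC meaning of the two codes (H = A, C or T, i.e. no guanine; D = A, G or T, i.e. no cytosine); the example primers are hypothetical.

```python
from itertools import product

# IUPAC codes assumed: H excludes G, D excludes C.
DEGENERATE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "ACT",
    "D": "AGT",
}


def expand(primer):
    """List every concrete sequence represented by a degenerate primer."""
    choices = (DEGENERATE[base] for base in primer)
    return ["".join(combo) for combo in product(*choices)]


print(expand("TAD"))        # ['TAA', 'TAG', 'TAT']
print(len(expand("ATHH")))  # 9 = 3 * 3 concrete sequences
```

Each additional degenerate position triples the number of sequences, which is why the degree of degeneracy has to be balanced against the number of extension fragments that can still be resolved.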

All of the different sub-reactions are then subjected to a polymerase reaction with the following composition. Reactions containing oligonucleotides that hybridize to the cytosine-deficient strand (corresponding to the original strand of the bisulfite-treated DNA) contain the nucleotides dATP, dCTP and dTTP and a terminator whose base-pairing behavior is analogous to that of dGTP, such as ddGTP or a functionally equivalent nucleotide. Reactions with oligonucleotides of the other sequence class contain a mixture of dATP, dGTP and dTTP and a terminator whose base-pairing behavior is analogous to that of dCTP, such as ddCTP or a functionally equivalent nucleotide. Starting from the oligonucleotide, the polymerase reaction then synthesizes a new DNA strand that extends only as far as the first cytosine on one (cytosine-deficient) strand or the first guanine on the other strand.
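The relationship between the position of the first remaining cytosine and the length of the extension product can be sketched as below. The primer length and template sequences are hypothetical, and the template is assumed to be written in the order in which the polymerase reads it.

```python
def extension_length(primer_len, template_downstream, stop_base="C"):
    """Length of the terminated product for one primer/template pair.

    Synthesis stops when the terminator is incorporated opposite the first
    stop_base on the template (C when ddGTP is the terminator). Returns None
    if the template contains no such base, i.e. no specific termination.
    """
    idx = template_downstream.find(stop_base)
    if idx == -1:
        return None
    return primer_len + idx + 1  # +1 for the incorporated terminator


# Cytosine-deficient strand: a C remains only where the sample was methylated
print(extension_length(20, "TTATGTTACGTT"))  # 29
print(extension_length(20, "TTATGTTATGTT"))  # None (position unmethylated)
```

For the other sequence class the same function applies with `stop_base="G"`, corresponding to termination by ddCTP.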

For mass spectrometric analysis, it is also suitable to replace the naturally occurring nucleotides with nucleotides chemically modified in a known manner, in order to facilitate the subsequent analysis of the extension products by mass spectrometry. To this end, in our variant, phosphorothioate analogues of the natural nucleotides are used. They can be alkylated in a subsequent step, which eliminates the charge of the DNA backbone and improves the quality and sensitivity of the analysis. However, other modifications made for this purpose are also within the scope of protection. Furthermore, the charge of the oligonucleotides used, and their hybridization properties, can be improved or changed by modification.

The aim of this variant of the method is the preparation of populations of fragments in individual reactions, which are so complex, or only so complex, that they can be separated by gel electrophoresis or more precisely by mass spectrometry according to length. As a result, it is necessary to adjust the number of synthetic fragments and the degree of degeneracy over the length of the portion of the oligonucleotide extending into the unknown sequence range, so that in each reaction there is one fragment to as many as several thousand different fragments possible.

In a preferred variant, the individual reactions are applied separately at defined coordinates of the ion source of the mass spectrometer. The fragment spectrum for each coordinate is then determined by mass spectrometry. With up to several thousand coordinates on the ion source and several hundred fragments per spectrum, each of which reports a cytosine or guanine position as an indication of methylation, it is thus possible to assess up to several hundred thousand individual CpG dinucleotides.

In a similar manner, it is also possible to detect a spectrum of fragments generated from a population of fragments amplified by the above-mentioned oligonucleotide primers without ligation of adaptors. In the case of this variant, the sequence complementary to the linker is omitted and instead a 5' region containing several degenerate positions is used.

In the case of pre-amplification of bisulfite-treated DNA with an oligonucleotide containing the above-mentioned (5 ') sequence tag in the 3' region, DNA similar to the adaptor sequence can also be used as the constant region to which the oligonucleotide hybridizes, as described in the section above.

4.4.4 Detection of the methylation state of cytosine by mass spectrometric detection of chemically modified oligonucleotides

A further variant of this method uses a method known per se which enables mass spectrometric identification of certain sequences indirectly by detection of chemical modifications for the oligonucleotides.

In the mass spectrometric detection variant described above, a number of different primer extension reactions are performed, each of which contains one or several oligonucleotide sequences. In principle, the maximum number of different fragments that can be analyzed can only be achieved by a subdivision into a number of different reactions (and coordinates on the MALDI ion source).

If chemical modifications are used for each primer sequence, this separation can be omitted in the case of another analysis technique than MALDI alone.

In practice, this means that all the different primers used are given such a chemical modification during or after synthesis; in principle, this modification must meet two requirements. On the one hand, it must not prevent separation of the generated fragments according to length. On the other hand, in the second analysis step after capillary electrophoresis, the type of modification must allow the fragments that were separated according to length to be identified. Thus, the type of modification depends on the type of analysis in the second step. In a preferred embodiment variant, the 5' end of the primer carries a short peptide sequence which can be identified in a subsequent step by a variety of conventional analytical methods. One of the greatest advantages of this variant is that even in the first non-specific amplification step, only a relatively small total amount of DNA must be amplified, since this amount no longer needs to be distributed over further reactions. The second dimension of separation, which in the above-described variant is achieved by subdivision into individual reactions, is achieved in a preferred variant of the method according to the invention by first separating the fragments produced by capillary electrophoresis. In this connection, it is not essential for correct results whether or not the chemical modification influences the migration behavior of the fragments, as long as a separation according to length is still possible. Each "fraction" at the end of the capillary electrophoresis contains a number of fragments with identical electrophoretic migration behavior, which differ only in the chemical modification of the respective 5' region (in the primer region that was used for the extension reaction). These populations of fragments separated according to electrophoretic migration behavior are then checked in a second step for the presence of the chemical modifications.
A preferred variant of this method is the use of the capillary electrophoresis output volume for direct injection into a fast atom bombardment (FAB-MS) or electrospray ionization (ESI-MS) mass spectrometer, or for application to a MALDI mass spectrometer or equivalent analysis apparatus.

Specifically, this variant is performed, for example, as follows. The reaction steps of preparing DNA from cells as described, precutting with restriction endonucleases and adding adaptors are carried out, and the sample is then passed into a heatable capillary permeable to small molecules, in which the bisulfite reaction is carried out by adding and removing reagents by dialysis. The volume of the total reaction is minute here. After completion of the bisulfite reaction, the capillary can be connected to further inlet capillaries to receive the reagents required for amplification, and amplification can then be performed in the same heatable capillary. However, it is also possible to carry out the amplification not directly in the capillaries but in containers connected to these capillaries. After universal amplification of the genomic fraction, a second, linear extension step is carried out as described, using a mixture of chemically modified oligonucleotides which are thus distinguishable by their weight and are complementary to the bisulfite-modified adaptors. The next step is the separation of the extension products according to length in another part of the capillary and, possibly, a further dialysis against a buffer suitable for mass spectrometry, such as ammonium sulfate.

Each individual portion is added to the coordinates of the ion source of the mass spectrometer and each coordinate is then checked for the presence of chemical modifications, which are distinguished by their weight. In this variant, in order to be suitable for the apparatus, it is preferable to use a MALDI-TOF with an extremely large ion source, which makes it possible to apply an extremely large number of different coordinates in succession for a short time.
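The two-dimensional readout (length fraction in the first dimension, weight of the chemical modification in the second) can be sketched as a lookup. All tag names, masses, the tolerance and the example peaks are hypothetical values chosen for illustration only.

```python
# Hypothetical tag masses; a real assay would use the actual masses of the
# chemical modifications attached to each primer.
TAG_MASSES = {"primer_A": 1200.5, "primer_B": 1350.7, "primer_C": 1502.2}


def identify_tags(peaks, tolerance=0.5):
    """Return the primers whose tag mass matches one of the observed peaks."""
    return [
        name
        for name, mass in TAG_MASSES.items()
        if any(abs(peak - mass) <= tolerance for peak in peaks)
    ]


def build_grid(fractions):
    """Map {fragment length: observed peak masses} to (length, primer) points."""
    return {
        (length, name)
        for length, peaks in fractions.items()
        for name in identify_tags(peaks)
    }


# Two electrophoresis fractions with their measured peak masses
fractions = {29: [1200.4, 1502.3], 41: [1350.7]}
print(sorted(build_grid(fractions)))
# [(29, 'primer_A'), (29, 'primer_C'), (41, 'primer_B')]
```

Each resulting pair is one measurement point: the primer identifies the sub-population of fragments, and the length locates the first remaining cytosine or guanine downstream of that primer.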

Other variants of the method can be obtained due to the fact that, in general, the method generates all the measurement points on a two-dimensional plane, which, as described herein, is a necessary condition for a detailed description of the number of measurement points. In the described variant of the analysis of the individual "subreactions" on a MALDI-TOF ion source, the two-dimensional measurement spots are spatially arranged on the DNA chip. In the capillary electrophoresis variant, two dimensions are achieved by the sequential connection of two separation methods distinguished by different criteria. There are several other variants of this method, which are to be protected as they correspond to the general concept according to the invention. For assays performed at a very large number of points in the method according to the invention, it is not absolutely necessary to know the origin of each assay point. For many applications of this approach, it is sufficient to correlate a large amount of abstract data with the phenotypic properties of the cells. As a result, there are a considerable number of possible analytical methods. However, capillary electrophoresis is generally necessary in all variants, in which the hybridization results occurring are detected by indirect methods (the result itself being one-dimensional to the analysis).

4.5 Analysis of the universal data

The main claims generally relate to methods for preparing complex methylation fingerprints and methods for correlating them with phenotypic characteristics of the cells under examination using evaluation algorithms. However, patent protection should also extend to all methods suitable for generating methylation data which are aimed at evaluating these data according to the invention, since the generation and application of the combined data is a factor in actually achieving the level of inventive activity.

At the end of all the above method steps, a large number of measurement points are available. Three different types of values can be generated. The pure plus-minus signal of a position (present in methylated or unmethylated form on all analyzed chromosomes) may not constitute the largest fraction of detectable positions that can be methylated. A large number of locations will generate such signals and must be described in the above-mentioned manner.

In principle, the analysis of a pure plus-minus signal is rather simple. The analysis strategy is as follows. Data are generated in a number of assays from a variety of different DNA samples of known origin (e.g., cells of the same phenotype labeled with antibodies and isolated by immunofluorescence) and tested for reproducibility. Positions that do not produce reproducible results are separated from all other positions by logical methods, because no evaluation is made in the first step to determine whether the differences at individual positions are biologically significant. These test series are performed on different types of cells. The result of these test series should be a large number of CpG dinucleotides, still unknown today, that show reproducible differences in methylation status between pairs of cell types. Not every position that differs in the direct comparison of two cell types is informative in all comparisons. If all the positions that are distinguishable in at least one cell type comparison are now analyzed, a characteristic pattern can be established for each cell type tested. Thus, a DNA sample of unknown origin can be assigned to a cell type. These patterns need not be fixed at all test positions. At this point it is not possible to assess to what extent the methylation pattern of an individual sample of a cell type deviates from the characteristic pattern (the method according to the invention provides, for the first time in practice, a basis for such an assessment).
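The strategy for plus-minus signals can be sketched in three steps: keep only reproducible positions, keep only positions that differ in at least one cell type comparison, and assign an unknown sample to the best-matching pattern. The cell type names, CpG identifiers and the simple match-count score are assumptions made for illustration.

```python
def reproducible_positions(replicates):
    """Keep positions whose plus-minus state is identical in all replicates."""
    first = replicates[0]
    return {
        pos: state
        for pos, state in first.items()
        if all(rep.get(pos) == state for rep in replicates)
    }


def informative_positions(patterns):
    """Positions shared by all cell types that differ in at least one comparison."""
    shared = set.intersection(*(set(p) for p in patterns.values()))
    return {pos for pos in shared if len({p[pos] for p in patterns.values()}) > 1}


def assign(sample, patterns, positions):
    """Assign a sample to the cell type with the most matching positions."""
    def score(pattern):
        return sum(sample.get(pos) == pattern[pos] for pos in positions)
    return max(patterns, key=lambda name: score(patterns[name]))


# Hypothetical test series: 1 = methylated, 0 = unmethylated
patterns = {
    "T_cell": reproducible_positions([
        {"cpg1": 1, "cpg2": 0, "cpg3": 1, "cpg4": 0},
        {"cpg1": 1, "cpg2": 0, "cpg3": 1, "cpg4": 1},  # cpg4 not reproducible
    ]),
    "B_cell": reproducible_positions([
        {"cpg1": 1, "cpg2": 1, "cpg3": 0, "cpg4": 0},
        {"cpg1": 1, "cpg2": 1, "cpg3": 0, "cpg4": 0},
    ]),
}
positions = informative_positions(patterns)
print(sorted(positions))                                           # ['cpg2', 'cpg3']
print(assign({"cpg1": 1, "cpg2": 1, "cpg3": 0}, patterns, positions))  # B_cell
```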

Ideally, the pattern produced by each cell type and individual is fixed, so that such tissues can be identified without significant effort. The sample can then be directly assigned to a cell type by a predetermined matrix whose coordinates specify the characteristic signals. In the most complex case, there is no single definable signal pattern characteristic of one cell type, but rather a number of such patterns which are essentially characteristic but not clearly distinguishable. Indeed, it may be inferred from prior-art methylation analysis that apparently very different patterns can have very similar functions. However, no statement can be made at this point as to the degree of difficulty, since the method of the invention in fact makes it possible for the first time to assess this situation. Thus, it may be the case that the source of a sample cannot be determined using conventional methods such as "visual inspection". In this case, the method includes the possibility of "training" a "neural network" (NN) with the data determined in the test series. In practice, this looks as follows: a very large series of experiments is performed with cellular DNA samples, and the methylation data of the samples are fed to the input layer of the NN. At the same time, the NN is provided with information about the origin of the samples. After a sufficient number of experiments, the neural network can learn, for example, which pattern belongs to which cell type. In this way, even extremely complex and apparently unclear patterns, which seem completely confusing to human understanding and conventional algorithms, can be categorized.
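The training idea can be illustrated with the simplest possible network, a single-layer perceptron. This is only a stand-in: the description does not specify a network architecture, and the toy methylation patterns, labels and parameters below are hypothetical.

```python
def train_perceptron(samples, labels, epochs=20, lr=1.0):
    """Train a single-layer perceptron on binary methylation patterns."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = y - (1 if activation > 0 else 0)
            if error:
                # Classic perceptron update: move weights toward the target
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 or 0, i.e. the predicted cell type label."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


# Toy test series: each row is the methylation state at four CpG positions;
# the label is the known origin of the sample (1 = cell type X, 0 = cell type Y).
samples = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1],
           [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_perceptron(samples, labels)
print([predict(weights, bias, x) for x in samples])  # [1, 1, 1, 0, 0, 0]
```

A perceptron can only separate patterns that are linearly separable; the complex, overlapping patterns discussed above would call for multi-layer networks, which is why this sketch should be read as an illustration of the training loop rather than of a suitable model.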

However, as mentioned, it is not yet possible to predict how complex and apparently confusing the resulting patterns will be. Each of the described cases is possible. Therefore, in order to be able to classify all cell types of unknown origin, every method of assigning complex methylation patterns to cell types of known origin in a test series is an object of the present invention.

In the analysis of cells of abnormal origin, the analysis of data will certainly become more complex. The aim of the method is to allow classification of unknown diseased cell types. Together with the methylation data of the examined samples, the phenotypic parameters of the examined cells in the test series must be made available to the NN and/or other evaluation systems; in this respect it is not at all clear in advance which of these phenotypic data need to be correlated with the methylation pattern and which will yield meaningful results in such a correlation. In these cases, the difficulty lies in the amount of data, which appears confusing although it is in principle classifiable. For degenerated cells, different epigenotypic states may result in similar phenotypic characteristics. These cases can be recognized particularly well by NNs and can then lead to the definition of new, precisely differentiated phenotypes, which is one of the main objectives of the method. Therefore, patent protection is intended specifically to include the use of different neural network types in methylation data analysis where methylation patterns are correlated with phenotypic data. However, the simpler cases also realize the subject of the invention, and they should therefore not be excluded from patent protection.

Claims

1. A method for the characterization, classification and identification of tissue and cell types for the prediction of the behaviour of tissue and cell populations and for the identification of genes with altered expression, characterized in that:

in genomic DNA obtained from any tissue sample, untreated, sheared or cleaved by restriction endonucleases, the base cytosine, but not 5-methylcytosine, is converted to uracil by treatment with a bisulfite solution,

amplifying the fraction of the genomic DNA thus treated using a very short oligonucleotide which is so short that more than 100 different fragments are amplified, or a degenerate oligonucleotide containing so many degenerate positions that more than 100 different fragments are amplified, or an oligonucleotide complementary to a linker oligonucleotide which has been ligated to the ends of the cut or sheared genomic DNA prior to bisulfite treatment,

the remaining cytosines on the guanine-rich DNA strands obtained from the amplified fraction, and/or the amount of guanines on the cytosine-rich DNA strands obtained from the amplified fraction are detected by hybridization or polymerase reactions, thereby enabling their characterization, classification and identification.

2. A method for the characterization, classification and identification of tissue and cell types for the prediction of the behaviour of tissue and cell populations and for the identification of genes with altered expression, characterized in that:

in genomic DNA obtained from any tissue sample that has been treated, sheared, or cleaved by a restriction endonuclease, the base cytosine, but not 5-methylcytosine, is converted to uracil by treatment with a bisulfite solution,

the amount of cytosine remaining on the guanine-rich DNA strand from the amplification fraction and/or the amount of guanine on the cytosine-rich DNA strand from the amplification fraction is detected by hybridization or polymerase reaction, and

the data generated by the analysis is automatically applied to processing algorithms, thereby drawing conclusions about the phenotype of the analyzed cellular material and thereby enabling its characterization, classification and identification.

3. A method according to claim 1, characterized in that:

data obtained from several or more such assays of DNA samples derived from phenotypically identical or similar cells or tissues is correlated during a training phase with the phenotype of the cells for which DNA was detected using a neural network or other evaluation algorithm;

The data taken in the evaluation mode during the training phase is used to identify the difference between the methylation status of cytosine positions in DNA of the known cell type determined during the training phase and the methylation status of cytosine positions in DNA detected, using the methylation status of DNA of the known cell type.

4. Method according to claim 1, characterized in that the DNA is cleaved with a restriction endonuclease containing the cytosine in the 5'-CpG-3' in the recognition sequence prior to the treatment with bisulfite and that the DNA is cleaved only at recognition sequences in which the cytosine in the 5'-CpG-3' is in unmethylated form at the 5' position.

5. A method according to claim 1, characterized in that:

cleaving the genomic DNA with a restriction endonuclease before modifying the genomic DNA with a bisulfite solution, providing a linker for the resulting terminus by a ligation reaction,

6. A method according to claim 1, characterized in that:

before the bisulfite treatment, the DNA sample is transferred to a heatable porous capillary which is permeable only to small molecules, wherein reagents are added and removed by dialysis, and the subsequent reaction step of the bisulfite treatment is carried out.

7. A method according to claim 1, characterized in that:

before the bisulfite treatment, the DNA sample is transferred into a heatable capillary which is impermeable to small molecules, wherein the subsequent reaction step of the bisulfite treatment is carried out by supplying reagents for adding and removing reagents through the connected capillary.

8. Method according to claim 6 or 7, characterized in that: the polymerase reaction after the bisulfite treatment is carried out in the same capillary as the bisulfite treatment, or in a capillary connected to the capillary, or in a vessel connected to the capillary.

9. The method of claim 8, wherein: in the capillary where the polymerase reaction is carried out with the bisulfite treated DNA sample, separation is again carried out according to the length of the resulting fragment population.

10. A method according to claim 1, characterized in that: after bisulfite treatment, the bisulfite treated DNA is separated from the bisulfite by precipitation of the bisulfite prior to DNA amplification.

11. A method according to claim 1, characterized in that:

oligonucleotides for the amplification of bisulfite-treated genomic DNA samples are two classes of primer oligonucleotides, wherein one class of oligonucleotides does not contain the base cytosine except in the 5'-CpG-3' region, either only to a very small extent in amounts that do not affect amplification, or only in regions of the oligonucleotides that are not essential for amplification; while the other class of oligonucleotides does not contain the base guanine except in the 5'-CpG-3' region, either only to a very small extent in amounts that do not affect the amplification or only in regions of the oligonucleotide that are not essential for the amplification, wherein the two classes of primer oligonucleotides

a) are so short that in each amplification containing only one representative of each of the two classes, more than 100 different fragments are amplified, or

b) contain so many so-called degenerate positions that in an amplification with only one representative of each of the two classes, more than 100 different fragments are amplified, or

c) are used with so many representatives of both classes in one amplification that more than 100 different fragments are amplified.

12. A method according to claim 4 or 11, characterized in that:

in a separate preparation for detection of polymerase reactions, the treated and amplified DNA is mixed in each reaction with different oligonucleotides which are complementary at their 5' end to the adaptors or to the oligonucleotides used for amplification of the nucleic acids treated with bisulfite,

wherein the oligonucleotide differs at its 3' end in each reaction, and

the variable 3' end of which starts downstream of the known linker sequence or the oligonucleotide sequence treated with bisulfite,

the variable 3' end extends 2-12 nucleotides beyond the known linker sequence.

13. The method of claim 12, wherein:

these reactions in which the polymerase reaction assay is initiated with oligonucleotides complementary to bisulfite-treated DNA contain, in addition to the three nucleotides dATP, dTTP and dCTP:

additional nucleotides or nucleotide analogs that are complementary to the base cytosine and block any further extension of the strand after incorporation by the polymerase, or

no nucleotides or nucleotide analogues complementary to the base cytosine.

14. The method of claim 12, wherein:

these reactions, in which the polymerase reaction assay is initiated with oligonucleotides complementary to bisulfite-treated DNA, contain, in addition to the three nucleotides dATP, dTTP and dGTP:

an additional nucleotide or nucleotide analogue which is complementary to the base guanine and blocks any further extension of the strand after incorporation by the polymerase, or

no nucleotides or nucleotide analogues complementary to the base guanine.

15. The method of claim 12, wherein:

the polymerase reaction detection is terminated at the positions containing methylcytosine in the DNA sample prior to amplification using a terminator, which is itself modified in such a way that it allows the detection of specifically terminated polymerase reaction products.

16. A method according to claim 1, characterized in that:

the different fragment mixtures produced by each reaction are smeared at various points of the ion source of a MALDI-TOF or another mass spectrometer and the fragment composition of each reaction is determined by determining the weight of all DNA fragments.

17. A method according to claim 1, characterized in that:

the different fragment mixtures produced by each reaction are each applied to a separate lane in gel electrophoresis, and the fragment composition of each reaction is determined by determining the length of all DNA fragments.

18. The method according to claim 5, characterized in that: the oligonucleotides with which the polymerase reaction is initiated are coupled to different chemical labels depending on the sequence of the oligonucleotide, wherein their chemical and/or physical properties allow the detection and differentiation of the different labels by standard chromatographic or mass spectrometry methods.

19. The method of claim 18, wherein:

the fragment fraction of the DNA to be examined which has been treated with bisulfite and prepared in the first amplification step is simultaneously mixed with two or more chemically differently labeled oligonucleotides which serve as primers for the detection of the polymerase reaction in the reaction preparation, the complex mixture of fragments resulting from said polymerase reaction is subjected to electrophoretic separation according to length, and

20. A method according to claim 1, characterized in that:

in the hybridization for detection, the following oligonucleotides are applied to one surface:

oligonucleotides which do not contain the base cytosine, or contain it only in 5'-CpG-3' or only in regions which are not essential for hybridization with the sample DNA,

or, alternatively, oligonucleotides which do not contain the base guanine, or contain it only in 5'-CpG-3' or only in regions which are not essential for hybridization with the sample DNA.

21. The method of claim 20, wherein:

hybridizing a bisulfite treated and amplified DNA sample with oligonucleotides, wherein the oligonucleotides are immobilized on a surface such that the oligonucleotides can be positioned at specific points on the surface, and wherein hybridization of the amplified sample DNA with the immobilized oligonucleotides occurs only if the oligonucleotides and the sample DNA are fully complementary in the regions necessary for hybridization.