WO2012059563A1 - Genomic analysis method

Info

Publication number: WO2012059563A1
Application number: PCT/EP2011/069374
Authority: WO
Inventors: Hugues Roest Crollius; Xavier Darzacq; Leila Muresan; Jean-François MARIET
Original assignee: Centre National de la Recherche Scientifique CNRS; Institut National de la Sante et de la Recherche Medicale INSERM; Ecole Normale Superieure de Paris; Vivatech Co
Current assignee: Centre National de la Recherche Scientifique CNRS; Institut National de la Sante et de la Recherche Medicale INSERM; Ecole Normale Superieure de Paris; Vivatech Co
Priority date: 2010-11-03
Filing date: 2011-11-03
Publication date: 2012-05-10
Anticipated expiration: 2013-05-03
Also published as: FR2966844A1

Abstract

The invention relates to a method for obtaining a specific signature of a DNA molecule. This method makes it possible in particular to obtain signatures for long DNA fragments. It also makes it possible to analyse and compare genomic differences between various individuals.

Description

Genomic analysis method

The invention relates to a method for obtaining a specific signature of a DNA molecule. In particular, this method makes it possible to obtain signatures for long DNA fragments. It also allows genomic differences between individuals to be analyzed and compared.

Introduction:

At present, in the field of medical diagnosis of structural genetic alterations (copy number variants, CNVs), needs are mainly met by two complementary approaches: comparative genomic hybridization (CGH) and individual genome sequencing. CGH on metaphase chromosomes is relatively inexpensive but has limitations. Its throughput and precision are low. Only unbalanced chromosomal changes of more than 1 million bases are detected. Structural chromosomal aberrations such as reciprocal translocations or inversions are not detected. In addition, this technique gives no information about ploidy: tetraploids without unbalanced rearrangements appear normal. There is also a CGH technique that uses DNA fragments attached to a chip (CGH array). This technology has a higher resolution (down to 5 or 10 kb), but it cannot find an anomaly involving DNA that is not present on the chip, and it is insensitive to translocations and inversions.

More recently, very high throughput sequencing technologies have made it possible, in isolated cases in basic research, to sequence the genomic DNA of patients. The fragments obtained by sequencing are very short (35 to 300 bp) and must be aligned to a reference genome to recover their respective positions. CNV-type anomalies are identified by counting the number of fragments aligned at a position and comparing it with the average number of reads aligned per base across the whole genome. If the number of fragments aligned to a region is 0% or 50% of this average, a complete (homozygous) or partial (heterozygous) deletion of the region is diagnosed, respectively. If the number is 150% or more, a duplication is inferred. In practice, this approach is limited by the difficulty of accounting for sequencing biases, which locally skew the number of fragments obtained. In addition, only about 60% of the genomic regions corresponding to genes and 45% of the other regions can be "seen" by this technology. Finally, the cost of sequencing a human genome remains prohibitive.
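The read-depth rule in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration rather than a procedure taken from the application: the function name `classify_region` and the cut-offs placed halfway between the 0%, 50%, 100% and 150% levels (that is, at 25%, 75% and 125%) are assumptions made for the example.

```python
def classify_region(region_depth: float, genome_mean_depth: float) -> str:
    """Classify a region from its read depth relative to the genome-wide mean.

    Cut-offs (0.25, 0.75, 1.25) are assumed midpoints between the levels
    named in the text (0%, 50%, 100%, 150%); they are not from the source.
    """
    if genome_mean_depth <= 0:
        raise ValueError("genome_mean_depth must be positive")
    ratio = region_depth / genome_mean_depth
    if ratio < 0.25:
        return "homozygous deletion"    # close to 0% of the mean
    if ratio < 0.75:
        return "heterozygous deletion"  # close to 50% of the mean
    if ratio < 1.25:
        return "normal"                 # close to 100% of the mean
    return "duplication"                # about 150% of the mean or more


if __name__ == "__main__":
    for depth in (0, 15, 30, 45):
        print(depth, classify_region(depth, genome_mean_depth=30))
```

A region covered by 15 reads where the genome-wide mean is 30 has a ratio of 0.5 and is reported as a heterozygous deletion under these assumed cut-offs. The sketch ignores the local sequencing biases that the text identifies as the main practical limitation.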

Methods based on enzymes that recognize specific DNA sequences have also been proposed for producing signatures of DNA molecules. One example is the study by Neely et al., 2010. That article describes a method for mapping bacteriophage λ DNA by fluorescent labeling at consensus sequences recognized by the methyltransferase M.HhaI. The labeled DNA is then immobilized and the fluorescence is observed under a microscope to establish a signature that the authors call a "fluorocode". This technique does not yield a signature specific to one DNA molecule, since the same fluorocode could be obtained for different DNA molecules, for example DNA molecules whose consensus sequences lie at equivalent relative positions. It also requires several "reads" of the labeled DNA molecule to obtain a fluorocode.

It would therefore be advantageous to have a method for obtaining a specific signature of a given DNA. Preferably, such a method would allow long DNA molecules to be studied. It would also be advantageous to have techniques that allow sequencing at the single-molecule level.

Summary of the invention

The present application describes a method for obtaining a specific signature of a DNA molecule. According to the invention, two different molecules will have different signatures. Each DNA sequence corresponds to a single specific signature. The signature obtained by the method according to the invention therefore makes it possible to recognize identical DNA molecules specifically and reproducibly, and to distinguish different DNA molecules. A first subject of the invention therefore relates to a method for obtaining a specific signature of a DNA molecule, comprising:

a) labeling a DNA molecule

- by incorporation of at least one modified nucleotide during a DNA polymerization reaction, the modified nucleotide being detectable, or being capable of being modified so as to be detectable; or

- by hybridization of one or more labeled oligonucleotides, said oligonucleotides being between 10 and 15 nucleotides in size; or

- by contacting the DNA with a fluorescent intercalating molecule;

b) immobilizing and stretching the labeled DNA on a solid surface;

c) detecting the labeling along the DNA by means of a suitable optical device;

whereby a specific signature of said DNA sequence is obtained.

Another subject of the invention relates to a method for identifying a DNA molecule, the method comprising:

i) obtaining the signature of said DNA molecule according to the method described above; and

ii) comparing the signature obtained in step i) with predetermined signatures of reference DNA molecules;

the comparison making it possible to identify the DNA molecule.
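Steps i) and ii) above amount to a nearest-match lookup against a library of reference signatures. The Python sketch below shows one way this could look, assuming that signatures are equal-length intensity profiles, that similarity is measured by Pearson correlation, and that a match requires a score of at least 0.9. None of these choices, nor the names `pearson` and `identify`, come from the application, which does not specify a comparison metric at this point.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2 bins)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat profile carries no pattern to compare
    return cov / math.sqrt(var_a * var_b)


def identify(signature, references, min_score=0.9):
    """Return (name, score) of the best-matching reference signature.

    The name is None when no reference reaches min_score (an assumed
    threshold, to be tuned on real data).
    """
    best_name, best_score = None, -1.0
    for name, ref in references.items():
        score = pearson(signature, ref)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    refs = {"ref_A": [1, 5, 2, 8, 3], "ref_B": [8, 3, 5, 1, 2]}
    print(identify([1.1, 4.9, 2.2, 7.8, 3.0], refs))
```

Correlation is insensitive to overall brightness, which is convenient when labeling efficiency varies between experiments. It does not handle differences in stretching or orientation, which a practical comparison would also need to address.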

The invention also relates to a method for identifying a genomic anomaly in a subject, said method comprising:

i) obtaining the molecular signature of a genomic DNA molecule from a tissue or a cell of a subject, according to the method described above;

ii) comparing the signature obtained in step i) with the predetermined signatures of reference DNA molecules from a healthy subject in order to identify said genomic DNA molecule;

a genomic anomaly being detected if the signature of the subject's genomic DNA molecule differs from the signatures of the reference DNA molecules.
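Once a subject's molecule has been matched to its reference, the final clause above reduces to locating where the two signatures differ. The following Python sketch illustrates that idea under stated assumptions: both signatures are intensity profiles already aligned bin by bin, and a bin counts as different when the absolute gap exceeds a fixed `threshold`. The function names and the threshold value are hypothetical, not part of the application.

```python
def differing_bins(subject, reference, threshold=2.0):
    """Return indices of bins where the two aligned profiles differ.

    threshold is an assumed noise tolerance in fluorescence units.
    """
    if len(subject) != len(reference):
        raise ValueError("profiles must be aligned and of equal length")
    return [
        i
        for i, (s, r) in enumerate(zip(subject, reference))
        if abs(s - r) > threshold
    ]


def has_anomaly(subject, reference, threshold=2.0):
    """True when at least one bin differs beyond the threshold."""
    return bool(differing_bins(subject, reference, threshold))


if __name__ == "__main__":
    reference = [5, 6, 6, 5, 7]
    subject = [5, 6, 0, 0, 7]  # signal missing in bins 2 and 3
    print(differing_bins(subject, reference))
    print(has_anomaly(subject, reference))
```

A run of consecutive differing bins, as in bins 2 and 3 of this example, would be the kind of pattern worth inspecting as a candidate deletion. Interpreting the difference is outside the scope of the sketch.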

According to a first variant, the above-mentioned method for identifying a genomic anomaly further comprises:

- obtaining, in step i), the signatures of several genomic DNA fragments and concatenating them on the basis of overlapping signatures;

- the concatenate(s) then being compared with the reference sequences to identify said concatenate(s);

a genomic anomaly being detected if a concatenate differs from the corresponding reference DNA molecule.
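Concatenation on the basis of overlapping signatures resembles overlap-based assembly of sequencing reads. The Python sketch below shows the idea for two fragments, assuming each signature is a list of successive inter-label distances, that two distances match when they agree within a tolerance `tol`, and that at least `min_overlap` consecutive distances must match. These parameters and the function names are assumptions made for illustration.

```python
def overlap_length(left, right, tol=0.05, min_overlap=3):
    """Length of the longest suffix of `left` matching a prefix of `right`.

    Returns 0 when no overlap of at least min_overlap distances is found.
    """
    max_k = min(len(left), len(right))
    for k in range(max_k, min_overlap - 1, -1):
        if all(abs(x - y) <= tol for x, y in zip(left[-k:], right[:k])):
            return k
    return 0


def concatenate(left, right, tol=0.05, min_overlap=3):
    """Merge two distance signatures on their overlap, or return None."""
    k = overlap_length(left, right, tol, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


if __name__ == "__main__":
    frag_1 = [2.0, 3.0, 0.5, 1.2, 4.0]
    frag_2 = [0.5, 1.2, 4.0, 2.2, 0.9]
    print(concatenate(frag_1, frag_2))
```

Requiring several consecutive matching distances lowers the risk of joining fragments that share a short pattern by chance. A fuller treatment would also try the reversed orientation of each fragment.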

According to a second variant, the above-mentioned method for identifying a genomic anomaly is used to identify a complementary DNA molecule, the signature of said complementary DNA being compared with the signatures of a set of reference complementary DNA molecules.

In addition, the invention relates to a method for obtaining the nucleotide sequence of a DNA molecule, comprising:

a) labeling a DNA molecule by incorporation of at least one modified nucleotide during one or more independent DNA polymerization reactions, the modified nucleotide(s) being detectable, or being capable of being modified so as to be detectable;

b) immobilizing and stretching the labeled DNA on a solid surface;

c) detecting the labeling by means of an optical device and a processing means suitable for obtaining information at a resolution less than or equal to 0.5 nm.

According to a particular embodiment of this subject, the detection means used is the TIRF technique coupled to a STORM system.

The invention further relates to a method for sorting different DNA molecules contained in a mixture, the method comprising:

a) labeling the DNA molecules

- by incorporation, during a polymerization reaction, of at least one modified nucleotide that is detectable or capable of being modified so as to be detectable; or

- by hybridization of one or more labeled oligonucleotides, said oligonucleotides being between 10 and 15 nucleotides in size; or

- by contacting the DNA with a fluorescent intercalating molecule;

b) immobilizing and stretching the labeled DNA molecules on a solid surface;

c) detecting the labeling along the DNA molecules by means of a suitable optical device;

whereby a set of light profiles corresponding to the different DNA molecules is obtained, the different DNA molecules being discriminated according to their light profile.

Detailed description of the invention

According to a first aspect, the invention relates to a method for obtaining a specific signature of a DNA molecule. This method comprises:

a) labeling a DNA molecule

Depending on the labeling and immobilization methods used, the order of steps a) and b) can be reversed. Thus, in one variant, labeling precedes immobilization and stretching of the DNA molecule. In a second variant, immobilization and stretching of the DNA molecule precede labeling of said DNA molecule, in particular when the DNA molecule is labeled by hybridization with labeled oligonucleotides.

In the context of the present invention, the term "signature of a DNA" denotes the specific labeling obtained along said DNA after optical reading of the signal emitted by the label used in step a) of the method according to the invention. As described below, the signature may in particular correspond:

- to the labeling density along the DNA molecule, converted into fluorescence units, or

- to the sequence of successive distances between the positions of the fluorescent molecules along the DNA molecules, determined using a point spread function (PSF).
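The two forms of signature listed above can be made concrete with a short Python sketch. It assumes that the positions of the fluorescent labels along a stretched molecule have already been localized (for example by PSF fitting, which is not reproduced here) and are expressed in one consistent length unit. The function names, the binning scheme and the use of raw label counts in place of calibrated fluorescence units are assumptions made for illustration.

```python
import math


def density_signature(positions, length, bin_size=1.0):
    """Count labels per bin along a molecule of the given length.

    Raw counts stand in for fluorescence units, which would require an
    instrument-specific calibration not modeled here.
    """
    if length <= 0 or bin_size <= 0:
        raise ValueError("length and bin_size must be positive")
    n_bins = math.ceil(length / bin_size)
    counts = [0] * n_bins
    for p in positions:
        if not 0 <= p <= length:
            raise ValueError(f"position {p} lies outside the molecule")
        # A label exactly at the far end goes into the last bin.
        idx = min(int(p // bin_size), n_bins - 1)
        counts[idx] += 1
    return counts


def distance_signature(positions):
    """Successive distances between neighboring label positions."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    labels = [0.0, 2.0, 5.0, 5.5]
    print(density_signature(labels, length=6.0, bin_size=2.0))
    print(distance_signature(labels))
```

The distance form does not depend on where the molecule sits in the field of view, since only gaps between labels are kept. The density form is more tolerant of labels that cannot be resolved individually.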

Said signature obtained according to the invention is "specific" in the sense that, because it depends on the primary sequence of said DNA molecule, the probability that two DNA molecules have the same signature is low or even zero, unlike signatures obtained by techniques that rely on the recognition of consensus sequences.

DNA molecule studied

The DNA molecule subjected to the method according to the invention may be any DNA molecule for which a specific signature is desired. It may come from a cell of a unicellular or multicellular organism, and in particular may be extracted from a sample of cells or tissue of a multicellular organism (for example a sample of hair, skin, blood, etc.). The DNA molecule to be studied may in particular be a molecule of unknown sequence.

The DNA subjected to the method according to the invention may be a DNA molecule, in particular a genomic DNA molecule, extracted from a eukaryotic cell (for example a plant, fungal or animal cell, in particular a mammalian cell, in particular a human cell) or a prokaryotic cell, or from a virus, for example a mammalian virus with a DNA genome, in particular a human virus, or a bacteriophage with a DNA genome. The method according to the invention can also be applied to a complementary DNA molecule. The DNA molecule to be studied may also be an isolated DNA molecule or one contained in a DNA library.

The genomic DNA may in particular come from a patient who may have a disease involving an anomaly in their genome, for example a genetic disease or a cancer. The genomic DNA may also come from a healthy subject, in particular so that the method according to the invention can be used to obtain the signatures of "reference" DNA molecules (see below).

The method according to the invention may in particular comprise, in a particular embodiment, prior extraction of the DNA from the cells of the organism whose genome is to be analyzed. By way of illustration, in the case of mammals (for example humans or other animals), the DNA can be extracted from lymphocytes contained in a blood sample. Techniques for sampling tissues or cells (in particular by biopsy, smear, blood sampling, etc.) and for extracting DNA are well known to those skilled in the art. The genomic DNA can in particular be extracted by alkaline lysis of the cells, taking the necessary precautions during handling to minimize fragmentation of the DNA molecules. Indeed, according to a particular embodiment, the DNA molecules subjected to the method according to the invention are high molecular weight molecules, preferably larger than 50 kb, in particular larger than 60 kb, in particular larger than 100 kb, more particularly larger than 200 kb. Those skilled in the art can take the labeling density obtained into account to determine the most suitable fragment size. Thus, a specific signature can in particular be obtained for a 60 kb fragment at a labeling density of 50% of one of the four DNA bases (A, T, C or G), that is, with 12.5% of the DNA molecule labeled on average. Those skilled in the art know the techniques for obtaining DNA molecules of suitable size. For example, this can be done by isolating the cells in agarose blocks into which the cell lysis and DNA digestion solutions diffuse, followed by separation of the DNA molecules by electrophoresis, excision of an agarose block corresponding to the desired size, and digestion of the agarose with an agarase.
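The 12.5% figure above follows from simple arithmetic, which the Python sketch below spells out. It assumes that each of the four bases makes up 25% of the sequence, which is what the 12.5% average in the text implies; real sequences deviate from this. The function name `expected_labels` and the count per 60,000 nucleotide positions are illustrative and not taken from the application.

```python
def expected_labels(fragment_nt, base_fraction=0.25, labeled_fraction=0.5):
    """Return (overall labeled fraction, expected number of labels).

    base_fraction: assumed share of the chosen base (25% if all four
    bases are equally frequent).
    labeled_fraction: share of that base carrying a label (50% in the text).
    """
    overall_fraction = base_fraction * labeled_fraction
    return overall_fraction, fragment_nt * overall_fraction


if __name__ == "__main__":
    fraction, count = expected_labels(60_000)
    print(f"{fraction:.1%} of positions labeled, about {count:.0f} labels")
```

Under these assumptions a 60 kb fragment carries on the order of 7,500 labeled positions per 60,000 nucleotides, which gives a sense of why the resulting pattern is unlikely to be shared by two different sequences.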

The DNA to be studied may or may not be anchored to the solid surface before stretching. In a particular embodiment, the DNA to be studied is modified so that it can be immobilized on a solid surface (for example by using the biotin-streptavidin system). Anchoring by one end can advantageously make it possible to obtain a particular distribution of the molecules on the solid surface.

According to another embodiment, the DNA molecules are combed "as is" from a solution of DNA that is not modified at one of its ends (it being understood that the DNA may be modified at one of its ends, provided the modification is not made for the purpose of anchoring it to the surface).

DNA labeling

According to a particular embodiment, the DNA is labeled by incorporation of nucleotide analogues during a polymerization reaction, using techniques well known to those skilled in the art. These techniques use a DNA polymerase.

Those skilled in the art have several techniques available for carrying out this labeling. In particular, the DNA can be labeled by in vitro synthesis using a recombinant DNA polymerase. Examples of suitable DNA polymerases are the Klenow fragment of Escherichia coli DNA polymerase I, the thermostable polymerases Taq and Pfu and their commercial variants, and Phi29 polymerase. Preferably, a DNA polymerase is used that is sufficiently processive to incorporate modified nucleotides into high molecular weight DNA. Preferably, the DNA polymerase used is a high-fidelity polymerase. Phi29 DNA polymerase, commercially available in particular from New England Biolabs, is an example of a polymerase that combines these properties and can be used in the context of the present invention. According to another advantageous embodiment, the incorporation of modified nucleotides into double-stranded DNA, in particular nucleotides coupled to a fluorophore, is carried out using a DNA polymerase that is particularly tolerant of modified nucleotides. For example, the mutated form, called E10, of the polB-family polymerase from Pyrococcus furiosus (Pfu) can substitute up to 100% of cytosines with cytosines coupled to the Cy3 fluorophore, as described in Ramsay et al. This Pfu E10 polymerase can of course be used to incorporate other labeled nucleotides, in particular nucleotides labeled with fluorescent labels other than Cy3, in particular the other labels cited in the present application. Modified nucleotides can thus be incorporated into double-stranded DNA molecules (for example a circular plasmid, a circular bacterial chromosome, linearized genomic or plasmid DNA, or other) according to one of the detailed protocols provided in Examples 1 to 5.

According to a particular embodiment, the in vitro synthesis is carried out in a single synthesis cycle, so as to obtain DNA molecules labeled on only one of their two strands.

The incorporation can be carried out using cytosolic extracts of cultured cells, preferably human cells, containing the components of the replication machinery. Typically, these components are extracted from cultured cells (such as HeLa cells), supplemented with the modified nucleotides to be incorporated into the DNA, and brought into contact with the DNA to be labeled, for example genomic DNA. These techniques are known to those skilled in the art (for example Marheineke et al., 2009). According to another embodiment of the invention, the DNA to be labeled is DNA located in the nucleus of cultured cells, into which fluorescent nucleotides are introduced by microinjection. The modified nucleotides are then incorporated into the DNA in vivo by the native replication machinery of the cells. These techniques are known to those skilled in the art (Zink et al., 1998).

The polymerization reaction is carried out in the presence of the four DNA bases dATP, dTTP, dGTP and dCTP. According to one embodiment, one or more of the nucleotides may be replaced by an analogue. For example, dUTP can replace dTTP. According to the invention, the labeling step may be carried out by incorporating at least one nucleotide analogue, said analogue being either a fluorescent analogue or an analogue capable of being modified to make it fluorescent. According to one variant, the polymerization is carried out in the presence of the four DNA bases, supplemented with the analogue(s) to be incorporated into the DNA. According to another variant, the analogue(s) to be incorporated completely replace the corresponding unmodified base in the reaction mixture.

Thus, according to a particular embodiment, the method according to the invention comprises labeling the DNA with at least one fluorescent analogue of a nucleotide selected from dATP, dTTP (or dUTP), dGTP and dCTP. According to a specific variant of this embodiment, a single fluorescent analogue is used. When several different analogues of different nucleotides are used, the fluorescent label is identical or different for each of these analogues. Preferably, the fluorescent label is different for each of the analogues used.

Fluorescent nucleotide analogues are well known to those skilled in the art and are commercially available. Any fluorophore capable of being bound to a nucleotide, either directly or via a spacer (for example (CH₂)ₙ), can be used in the present invention. Mention may be made in particular of nucleotide analogues labeled with diethylaminocoumarin, fluorescein, a cyanine (in particular cyanine 3 or cyanine 5), tetramethylrhodamine, Lissamine®, Texas Red®, Alexa, or a marker of the ATTO family (in particular ATTO 647N, ATTO 550 and ATTO 565, available from Jena BioSciences coupled to nucleotides (or analogues) in different variants). Various suppliers market this type of fluorescent analogue, in particular Perkin Elmer, Amersham, etc.

According to a particular embodiment, the markers are molecules that are photoactivatable depending on the redox conditions of the reaction medium. This embodiment is particularly advantageous when implementing the STORM technology described below.

Alternatively, the nucleotide analogue used may not itself be fluorescent, but may carry a group that can be modified to make it fluorescent. The coupling of the fluorophores can be covalent or transient (to overcome possible steric limitations). In the case of transient coupling, fluorescent molecules having an affinity for the nucleotide analogue incorporated into the DNA are used. By way of illustration, mention may be made of bromo-, iodo- and fluoro-nucleotides (in particular bromo-, iodo- or fluoro-UTP, -ATP or -CTP), oxo-nucleotides (such as oxo-GTP), nucleotides containing an alkyne group (for example 5-ethynyl-dUTP, C8-Alkyne-dUTP, γ-Propargyl-ATP), and nucleotides modified with a group containing an amine function, in particular an aminoallyl function, such as aminoallyl-dUTP and aminoallyl-dCTP.

The coupling of a fluorescent label to a modified nucleotide is well known to those skilled in the art.

By way of illustration, the fluorescent labeling of a modified nucleotide carrying an amine function, such as aminoallyl-dUTP or aminoallyl-dCTP, may be carried out by reacting the modified DNA with a fluorescent group carrying an N-hydroxysuccinimide (NHS), sulfo-NHS or sulfotetrafluorophenyl (STP) function, which allows coupling at the amine function. The fluorescent label used can thus be an NHS ester of a fluorescent molecule, such as Cy3-NHS, Cy5-NHS, ATTO-NHS, Alexa-NHS, fluorescein-NHS, rhodamine-NHS or others.

The fluorescent labeling of a modified nucleotide carrying an alkyne function, such as 5-ethynyl-dUTP, can be carried out by "Click Chemistry", that is, a reaction of the modified DNA with a solution of copper bromide (CuBr), tris-(benzyltriazolylmethyl)amine and a fluorescent molecule containing an azide group (such as fluorescein-azide, cyanine3-azide, cyanine5-azide, etc.). In a particular embodiment, the coupling with a fluorescent marker may be carried out on a single-stranded DNA molecule comprising a modified nucleotide, in order to increase the efficiency of this coupling. Methods for obtaining single-stranded DNA are well known to those skilled in the art. Mention may be made in particular of the addition of concentrated urea to the double-stranded DNA solution. The fluorescent labeling is then carried out after purification of the single-stranded DNA.

According to a particular embodiment, the labeling is carried out by replication of a DNA molecule (template DNA) as described above with Phi29 DNA polymerase, in the presence of unmodified dATP, dTTP (or alternatively dUTP), dCTP and dGTP, and of one or more nucleotide analogues that are fluorescent or capable of being made fluorescent. According to a variant of this embodiment, the nucleotide analogue completely replaces the corresponding unmodified base in the reaction medium. The polymerization reaction is carried out at the temperature required for the enzyme to function according to the supplier's recommendations (30 °C for Phi29) and for a time sufficient to allow the incorporation of modified nucleotides over several tens of kilobases on at least some of the template molecules.

According to a second embodiment of the invention, the labeling is carried out by hybridization of the DNA molecule with one or more labeled single-stranded oligonucleotides of between 10 and 15 nucleotides in length. The sequence of these oligonucleotides is chosen so that it is distributed approximately uniformly in the genome of interest, while still allowing a specific labeling of a given DNA molecule to be obtained. In the case of the human genome, these oligonucleotides are chosen according to the composition of the genome and their distribution within it. The sequence of the oligonucleotides may be of the form NNX1X2X3X4X5X6X7X8NN, in which X1X2X3X4X5X6X7X8 is a defined sequence and N is a random base. In this way, the defined sequence of the oligonucleotide is short enough (for example 8 bases) to have numerous occurrences in the genome, and the oligonucleotide is made long enough by the addition of random bases (for example 8 + 4 = 12 bases here) to hybridize stably. The oligonucleotides may in particular be labeled by incorporating one or more fluorescent molecules at their end, according to procedures well known to those skilled in the art. According to a particular aspect of this second embodiment, the labeling by hybridization is carried out after immobilization and stretching of the DNA molecule to be studied.
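The trade-off between a short defined core and its number of occurrences can be illustrated with a minimal sketch. The helper names (`reverse_complement`, `probe_sites`, `expected_spacing`) and the toy sequence are hypothetical and are not taken from the application; the spacing estimate assumes a random sequence with equal base frequencies, which a real genome only approximates.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def probe_sites(genome, core):
    """0-based start positions where the defined core, or its reverse
    complement, occurs on the given strand of the sequence."""
    targets = {core, reverse_complement(core)}
    k = len(core)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] in targets]


def expected_spacing(core_length):
    """Average distance in bases between occurrences of one fixed core on one
    strand, ASSUMING a random sequence with equal base frequencies."""
    return 4 ** core_length


# Toy sequence (hypothetical): the core at position 2 and its reverse
# complement at position 13.
core = "ACGTACGG"
genome = "TT" + core + "AAA" + reverse_complement(core) + "TT"
sites = probe_sites(genome, core)
spacing = expected_spacing(len(core))
```

Under the random-sequence assumption, an 8-base core is expected roughly once every 4^8 = 65,536 bases per strand, which gives an order of magnitude for the density of labels along a fragment of 50 to 200 kb.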

According to a third particular embodiment, the labeling of the DNA molecule is carried out by bringing said DNA into contact with an intercalating agent that is capable of binding to double-stranded DNA and that emits a fluorescent signal when so bound. Many intercalating agents are known to those skilled in the art. By way of illustration, mention may be made of ethidium bromide, propidium iodide, Sybr Green, TOTO (1,1'-(4,4,7,7-tetramethyl-4,7-diazaundecamethylene)-bis-4-[3-methyl-2,3-dihydro-(benzo-1,3-thiazole)-2-methylidene]-quinolinium tetraiodide) or oxazole orange (YOYO).

The intercalating agent can be used to label the DNA molecule before or after immobilization on a solid support. Preferably, the labeling is carried out before immobilization of the DNA molecule, in particular using the YOYO molecule.

Immobilization and stretching of the DNA

Step b) of the method for obtaining a specific signature of a DNA molecule comprises immobilizing and stretching said DNA on a solid support.

Preferably, the labeled DNA is immobilized and stretched by the molecular combing technique. This technique is well known to those skilled in the art. A detailed description can be found in particular in application WO9521939, incorporated herein by reference. In summary, this technique exploits the movement of a solvent meniscus, which stretches and orients the DNA molecules in solution in the direction of the movement while locally applying a force that presses the DNA molecules onto the solid surface. A stretching factor of 1 base per 0.5 nm can be obtained with this technique.

The surface on which the DNA molecules are immobilized is rigid (for example glass) or semi-rigid (for example a flexible polymer sheet). The surface can in particular be composed of or covered with an organic or inorganic polymer, a metal, a metal oxide, a semiconductor, or a combination of these materials.

According to a particular embodiment, the surface is covered, in whole or in part, with groups capable of reacting with one end of the DNA molecules in solution and thus of anchoring it. Of course, the skilled person knows the different systems that can be used to anchor a DNA molecule and will know in which cases the DNA molecule must be functionalized in parallel with the surface. By way of example, mention may be made of the streptavidin-biotin system, an antibody-antigen system, etc.

According to another particular embodiment, the DNA is not modified with a view to its anchoring. In that case, silanized glass slides are used, for example (obtained for example by dipping the glass slide in a bath of 80% trimethoxy(7-octen-1-yl)silane), which have not undergone any additional treatment, and the DNA molecule is then immobilized by applying the molecular combing technique. According to a variant of this embodiment, the support may also be composed of or covered with a synthetic polymer, polyacrylamide, polycarbonates, etc.

In a particular embodiment, the labeled DNA is immobilized in an individualized manner. In other words, the labeled DNA is diluted to a concentration suitable for obtaining, after immobilization, labeled DNA molecules that are separated from one another. By way of illustration, DNA fragments 50 to 200 kb in length are used at a concentration of 0.2 to 1 pM. The dilution may be carried out, for example, in 50 mM Tris buffer, pH 8, or 50 mM MES, pH 5.5.

Molecular combing can be carried out in a suitable apparatus that moves the support at constant speed out of a reservoir containing the labeled DNA to be anchored and combed. The speed at which the support is moved out of the reservoir can be adapted by the skilled person. By way of illustration, the examples below use a constant speed of 300 µm/s out of the reservoir.

Observation

After immobilization, the stretched DNA molecules are observed using an optical microscope in order to generate a fluorescent signal. The microscope may in particular include a TIRF (Total Internal Reflection Fluorescence) system to minimize the noise around the DNA molecule to be studied. In a particular embodiment, the microscope is equipped with an objective whose magnification is chosen so as to optimize the observation of the labeled DNA molecule. The objective may in particular have a magnification of at least 40X, preferably at least 50X, at least 60X or even at least 100X. In another particular embodiment, the microscope is equipped with a high-resolution CCD camera (for example 6.5 µm x 6.5 µm pixels, i.e. 65 nm per pixel with a 100X objective) with high quantum efficiency (preferably greater than 65%). Under these conditions, one camera pixel covers a length of about 130 base pairs of DNA stretched and immobilized on the surface.
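The figure of about 130 base pairs per pixel follows from the camera pixel size, the objective magnification and the stretching factor of 1 base per 0.5 nm given for molecular combing. A minimal sketch of this arithmetic, with hypothetical function names:

```python
def sample_pixel_nm(camera_pixel_um, magnification):
    """Size of one camera pixel projected onto the sample, in nanometres."""
    return camera_pixel_um * 1000.0 / magnification


def bases_per_pixel(pixel_nm, nm_per_base=0.5):
    """Number of bases of stretched DNA covered by one pixel, using the
    stretching factor of 1 base per 0.5 nm quoted for molecular combing."""
    return pixel_nm / nm_per_base


pixel_nm = sample_pixel_nm(6.5, 100)        # 6.5 µm pixel, 100X objective
bases = bases_per_pixel(pixel_nm)           # stretched DNA per pixel
fragment_length_um = 50_000 * 0.5 / 1000.0  # a 50 kb fragment, in µm
```

With these values a pixel corresponds to 65 nm on the sample, and a 50 kb fragment stretches to about 25 µm, that is, a few hundred pixels.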

According to a preferred embodiment, the microscope used includes a TIRF system and is equipped with an objective with a magnification of at least 100X and a high-resolution CCD camera (for example 6.5 µm x 6.5 µm pixels, i.e. 65 nm per pixel) with high quantum efficiency (preferably greater than 65%).

The fluorescent molecules are excited by a laser of suitable wavelength, known to those skilled in the art. The emitted light is captured by the camera attached to the microscope, whose objective is equipped with a suitable optical filter. Means suitable for obtaining fluorescence are therefore used. Mention may be made in particular of the use of excitation, dichroic and emission filters appropriate to the fluorophore used to label the DNA, and of excitation by a laser at a suitable wavelength. By way of illustration, the examples describe filters that can be used for the detection of the Cy3 marker, as well as the use of a laser with a wavelength of 633 nm to excite Cy5 or 532 nm to excite Cy3; the skilled person knows the means needed to obtain fluorescence from other markers.

Of course, the skilled person will be able to adapt the equipment to obtain the best possible resolution. According to a preferred embodiment, the microscopy used is super-resolution fluorescence microscopy. In a preferred aspect, the super-resolution microscopy used relies on the successive localization of individual fluorophores. Among the available technologies, mention may be made of PALM (photoactivated localization microscopy) and STORM (stochastic optical reconstruction microscopy), which make it possible to overcome the resolution limit imposed by diffraction. These techniques are described in the articles by Betzig et al., 2006 and Rust et al., 2006, which are incorporated herein by reference. These techniques photoactivate the fluorophores incorporated into the DNA with flashes of light that randomly excite a small number of fluorescent molecules, rather than exciting them continuously. These fluorescent molecules are then imaged until the fluorophore is destroyed or returns to an inactive state.

According to a particular embodiment, the microscope used includes a TIRF system and a STORM or PALM system, preferably STORM.

The light signals collected in the form of images are then processed by a computerized process in order to locate the positions of the fluorophores (and therefore of the modified bases in the DNA) relative to one another along the linear axis of the DNA molecules, and therefore their distances from one another.
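As an illustration of how relative distances can be derived from localized fluorophores, the following minimal sketch takes positions measured along the axis of one stretched molecule and returns the successive distances between neighbours. The function name and the input values are hypothetical; the conversion to bases assumes the stretching factor of 1 base per 0.5 nm given for molecular combing.

```python
def successive_distances(positions_nm, nm_per_base=0.5):
    """Distances, in bases, between neighbouring fluorophores along the axis
    of one stretched DNA molecule.

    positions_nm: fluorophore positions along the molecule axis, in nm,
    in any order. nm_per_base is an ASSUMED uniform stretching factor.
    """
    ordered = sorted(positions_nm)
    return [round((b - a) / nm_per_base) for a, b in zip(ordered, ordered[1:])]


# Hypothetical positions of four fluorophores on one molecule.
distances = successive_distances([120.0, 0.0, 450.0, 500.0])
```

Because only differences between positions are used, the result does not depend on where the origin of the axis is placed, although it does depend on the direction in which the molecule is read.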

The light profile of each molecule present in an image can be determined automatically by an algorithm implemented in a computer program. According to one particular aspect, the algorithm can proceed in two steps. The first step comprises identifying the peaks of maximum intensity on the foreground pixels along a ridge line. A ridge line is a surface of high curvature (maximum curvature in at least one direction) produced by the image of the DNA molecule, defined by high absolute values of the smallest eigenvalues of the symmetric matrix of second-order partial derivatives (Hessian matrix) at each pixel. According to a particular embodiment, a Gaussian blur (sigma = 1 pixel) can be applied to reduce the influence of noise. The second step comprises determining a spline curve by interpolating the points of maximum curvature detected on the ridge in the first step. The curve is sampled at equidistant points (for example at each pixel, or at each half-pixel) in order to approximate the value of the surface at these points. The approximation can be made by taking the pixel value directly, by a Taylor approximation of the surface at the chosen point, or by a Richardson-Lucy deconvolution based on a point spread function. The successive list of these values constitutes the fluorescent profile of the DNA molecule (Figure 3).

In one embodiment of the invention based on STORM or PALM technology, a point spread function (PSF), that is, a mathematical function describing the signal emitted by a single fluorescent molecule, is fitted to the light signal in order to decompose it into as many PSFs as there are single emitting molecules. This operation makes it possible to locate the center of each PSF and thus to position the emitting molecules relative to one another along the axis of the immobilized and stretched DNA molecule (preferably stretched by molecular combing). The localization is carried out using algorithms well known to those skilled in the art (for example, Mortensen et al., 2010). The center of each PSF, corresponding to the position of each fluorescent molecule, can thus be located with a precision that depends on the number of photons acquired rather than on the diffraction limit of the emitted light. The sequence of successive distances between the positions of the fluorescent molecules along the DNA molecules thus provides a specific signature of each molecule, expressed in DNA base pairs.

In another embodiment of the invention, localization of the individual molecules producing the light signal is not required: the linear light signal, expressed in arbitrary intensity units per nanometer, directly represents the specific signature of each DNA molecule. In this case, a signal decomposition operation can be performed to increase the resolution of the signature. This operation can be performed by methods well known to those skilled in the art, such as wavelet-based methods (Donoho et al., 1995) or deconvolution-based methods (Bertero and Boccacci, 1998).
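As a non-limiting illustration of the first step of the profile determination, the ridge criterion based on the smallest Hessian eigenvalue can be sketched in Python as follows. The central finite-difference scheme, the function names `hessian_eigenvalues` and `ridge_strength`, and the omission of the Gaussian pre-smoothing and of the spline fitting are assumptions of this sketch and are not features disclosed in the description.

```python
import math

def hessian_eigenvalues(img, y, x):
    """Return (smallest, largest) eigenvalue of the 2x2 Hessian at an
    interior pixel (y, x) of img, a list of rows of intensities.
    Second derivatives are estimated by central finite differences
    (an assumption of this sketch)."""
    ixx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    iyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    ixy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    # Closed-form eigenvalues of the symmetric matrix [[ixx, ixy], [ixy, iyy]].
    mean = (ixx + iyy) / 2.0
    delta = math.sqrt(((ixx - iyy) / 2.0) ** 2 + ixy ** 2)
    return mean - delta, mean + delta

def ridge_strength(img, y, x):
    """A bright ridge gives a strongly negative smallest eigenvalue;
    its absolute value is used here as the ridge strength."""
    smallest, _ = hessian_eigenvalues(img, y, x)
    return max(0.0, -smallest)
```

In such a sketch, pixels whose ridge strength exceeds a chosen threshold would be the candidates through which the spline curve of the second step is fitted.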

It is possible that different fragments representing different copies of the same DNA sequence are immobilized on the surface in the method according to the invention. The method for obtaining a specific signature of a DNA molecule may then further comprise, after step c), a concatenation step in which signature overlaps are identified between several fragments, indicating that the same region of the genome is observed. In this case, the two signatures are merged to generate a longer signature.

Other objects of the invention

The method described above makes it possible to obtain the signature of a DNA molecule. This signature can advantageously be used to build libraries of signatures of reference DNA molecules.

Thus, according to one of its aspects, the invention relates to a specific signature of a given DNA obtained according to the method detailed above.

The signatures of reference DNA molecules can advantageously be used to identify unknown DNA molecules. Accordingly, another aspect of the invention relates to a method for identifying a DNA molecule comprising:

i) obtaining the signature of said DNA molecule according to the method described above; and

ii) comparing the signature obtained in step i) with predetermined signatures of reference DNA molecules;

said comparison making it possible to identify the DNA molecule.

The signature obtained in step i) is compared by a computerized method based on statistical correlations.

It is possible that different fragments originating from the same DNA molecule are immobilized on the surface in the method according to the invention. As mentioned above, the identification method may then further comprise, between steps i) and ii), a concatenation step in which signature overlaps are identified between several fragments, indicating that the same region of the genome is observed. In this case, the two signatures are merged to generate a longer signature.
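Purely by way of example, the concatenation step can be sketched in Python as follows. The description does not specify how overlaps are scored or merged; this sketch assumes that both signatures are intensity profiles sampled on the same grid and in the same orientation, scores a candidate overlap by its mean squared difference, and averages the overlapping samples. The names `best_overlap` and `merge_signatures` and the parameters `min_overlap` and `max_cost` are hypothetical.

```python
def best_overlap(sig_a, sig_b, min_overlap):
    """Find the overlap length (at least min_overlap) for which the end of
    sig_a best matches the start of sig_b, by mean squared difference.
    Returns (length, cost), or (None, inf) if no overlap is possible."""
    best_len, best_cost = None, float("inf")
    for length in range(min_overlap, min(len(sig_a), len(sig_b)) + 1):
        tail, head = sig_a[-length:], sig_b[:length]
        cost = sum((x - y) ** 2 for x, y in zip(tail, head)) / length
        if cost < best_cost:
            best_len, best_cost = length, cost
    return best_len, best_cost

def merge_signatures(sig_a, sig_b, min_overlap, max_cost):
    """Merge two overlapping signatures into a longer one, or return None
    when the best overlap is worse than the (assumed) max_cost threshold."""
    length, cost = best_overlap(sig_a, sig_b, min_overlap)
    if length is None or cost > max_cost:
        return None
    # Average the two signatures over the shared region.
    shared = [(x + y) / 2 for x, y in zip(sig_a[-length:], sig_b[:length])]
    return sig_a[:-length] + shared + sig_b[length:]
```

A practical implementation would also need to handle fragments in the reverse orientation and differences in stretching, which this sketch deliberately ignores.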

The reference signatures may in particular correspond to the signatures of DNA molecules obtained from healthy subjects. Examples include the genomic DNA of a healthy subject, or complementary DNA obtained from the transcriptome of a cell or tissue of a healthy subject. In the case of complementary DNA, reference signatures may be produced for each tissue or cell type. According to the invention, "healthy subject" means a subject in whom no genomic alteration is suspected. Preferably, the DNA molecule signatures of healthy subjects are representative of the diversity of the populations considered, in particular human populations. The reference DNA molecules may correspond to the signature of chromosomes, for example the signature of the 23 human chromosomes. The complete signature of a chromosome will of course have been obtained by concatenating the overlapping signals corresponding to that chromosome, as set out above. The reference signatures may also correspond to DNA molecules from patients suffering from diagnosed genetic diseases or from a documented cancer. These signatures can allow the pathologies concerned to be identified.

Of course, as mentioned above, the applications of the method according to the invention are not limited to the study of the human genome. The DNA studied can in particular be extracted from an animal, in particular a mammal (including humans), but also from a plant, a fungus, a prokaryote, or a virus, for example a mammalian virus with a DNA genome, in particular a human virus, or a bacteriophage with a DNA genome. In particular, the reference signatures may correspond to DNA molecules from each of these organisms.

The method according to the invention may therefore in particular be a method for identifying DNA extracted from a plant that is to be identified. In this context, the signature of the DNA studied is compared with reference signatures from different plants or plant varieties. According to a particular embodiment, the reference signatures consist wholly or partly of DNA signatures of genetically modified plants. Thus, the method according to the invention can also make it possible to determine whether a given plant is genetically modified.

According to another aspect, the invention relates to a method for identifying a genomic abnormality in a subject comprising:

i) obtaining the molecular signature of a genomic DNA molecule from a tissue or cell of a subject;

ii) comparing the signature obtained in step i) with the predetermined signatures of reference DNA molecules from a healthy subject in order to identify said genomic DNA molecule;

a genomic abnormality being detected if the signature of the genomic DNA of the subject differs from the signatures of the reference DNA molecules. These differences include in particular deletions, inversions, and duplications of chromosome segments.

According to another embodiment, the DNA obtained in step i) is compared with predetermined signatures of reference DNA molecules from diseased subjects whose disease results from an alteration in the genome of one or more cells of said patient, a genomic abnormality being suspected if the signature obtained in step i) corresponds to one of the reference signatures. The invention therefore also relates to a method for identifying a genomic abnormality in a subject comprising:

i) obtaining the molecular signature of a genomic DNA molecule from a tissue or cell of a subject;

ii) comparing the signature obtained in step i) with the predetermined signatures of reference DNA molecules from diseased subjects whose disease results from a genetic alteration, in order to identify said genomic DNA molecule;

a genomic abnormality being detected if the signature of the genomic DNA of the tested subject corresponds to at least one of the reference signatures.

Moreover, the invention may advantageously be used to diagnose a genetic alteration in the context of amniocentesis. This method allows a finer identification of genetic alterations (inversions, deletions, duplications, translocations affecting one or more chromosomes) than cytogenetic karyotyping. The method for identifying a genomic abnormality according to the invention can be implemented independently of, or in addition to, such karyotyping.

The invention also relates to a method for sorting molecules. This method is applied to a mixture of different DNA molecules. According to this aspect, the invention therefore relates to a method for sorting different DNA molecules contained in a mixture, the method comprising:

a) labeling the DNA molecules - by incorporation, during a polymerization reaction, of at least one modified nucleotide that is detectable or capable of being modified so as to be detectable; or

c) detecting the labeling along the DNA molecules by means of a suitable optical device;

whereby a set of light profiles corresponding to the different DNA molecules is obtained, the different DNA molecules being sorted according to their light profile.

Sorting molecules is of interest when a DNA extract contains very small quantities of molecules (for example obtained at a crime scene, or from traces in a container that held food potentially contaminated by a GMO, etc.). If the quantity of DNA is too small to consider existing DNA characterization techniques such as sequencing, but one nevertheless wishes to recognize the presence of a particular redundant sequence in the mixture (for example to identify the presence of human DNA through the repeated elements of the genome, or a multi-copy plasmid containing a transgene in a potentially genetically modified organism), molecule sorting is a very attractive alternative. In this case, the DNA molecules in the mixture typically exhibit some redundancy due to the presence in the mixture of molecules having the same DNA sequence, exactly or in part. In the latter case, the molecules are said to be overlapping. Thus, the mixture may comprise n subgroups of DNA molecules, each subgroup having one or more properties specific to it (for example a biological origin or a structural property). In this case, the sorting method according to the invention can also be followed by a step of pairwise comparison of the different light profiles obtained, making it possible to assemble profiles on the basis of the overlap observed between two profiles.

According to a specific embodiment, the profiles of each molecule are cut into blocks for the comparison. This approach has the advantage of avoiding any problem of discontinuity in the similarity of the profiles along the molecules, since molecules that share part of a common sequence are matched locally (through the blocks) rather than globally (through the complete profiles). Thus, the light profiles obtained can be cut into blocks of size t overlapping by p pixels, with p < t. A similarity measure (for example a Euclidean distance) is calculated between all possible pairs of blocks. Situations are then sought in which a succession of blocks belonging to a given molecule shows strong similarity to a succession of blocks of another molecule. This situation is characteristic of two molecules sharing a region of identical DNA sequence.
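By way of illustration only, cutting a light profile into blocks of size t overlapping by p pixels can be sketched in Python as follows. The function name `cut_blocks` and the choice to discard trailing samples that do not fill a complete block are assumptions of this sketch; the description does not specify how the end of a profile is handled.

```python
def cut_blocks(profile, t, p):
    """Cut a 1-D light profile (a list of intensity values) into blocks
    of t samples, consecutive blocks overlapping by p samples (p < t).
    Trailing samples that cannot fill a whole block are dropped
    (an assumption of this sketch)."""
    if t <= 0 or not 0 <= p < t:
        raise ValueError("require t > 0 and 0 <= p < t")
    step = t - p  # offset between the starts of consecutive blocks
    return [profile[i:i + t] for i in range(0, len(profile) - t + 1, step)]
```

With this convention, a large overlap p gives many closely spaced blocks and therefore finer positional resolution when matching, at the cost of more block pairs to compare.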

For example, the Euclidean distance between two blocks $b_1$ and $b_2$ of size $N$ can be determined by the following equation:

$$d(b_1, b_2) = \sqrt{\sum_{i=1}^{N} \left( b_1(i) - b_2(i) \right)^2}$$

For each block of each molecule, its k nearest neighbors (for example with k = 50, 100, 150, etc.) are determined among all the blocks of all the other molecules. The k nearest neighbors (kNN) of a given block are those most similar to it, that is, those separated from it by a small Euclidean distance. According to an alternative embodiment, other strategies for calculating the similarity between two profiles or profile sub-segments of molecules include:

- normalized cross-correlations

- normalized cross-correlations with partial overlap

- the Euclidean distance on short-term Fourier transforms

- the Hellinger distance

- Minimum Variance Matching (MVM) according to Keogh

- the Euclidean distance on thresholded wavelet coefficients

- the Weyl discrepancy

According to a particular embodiment, the similar blocks are then represented in a matrix. For example, for every possible pair of molecules (m₁, m₂), the blocks belonging to m₂ are selected from among the kNN of each block of m₁; these can be represented on a two-dimensional graph whose axes are the positions of the respective blocks of m₁ and m₂ (see Figure 4 for illustration).
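As a minimal sketch of the nearest-neighbor search and of the selection of matching blocks for a pair of molecules, the following Python code uses a brute-force comparison of every block with every block of every other molecule. The data layout (a dictionary mapping a molecule identifier to its list of equal-length blocks) and the names `euclidean`, `k_nearest_blocks` and `match_positions` are assumptions introduced for illustration.

```python
import math

def euclidean(b1, b2):
    """Euclidean distance between two blocks of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)))

def k_nearest_blocks(blocks_by_molecule, k):
    """For each block (molecule, index), return its k nearest blocks among
    the blocks of all OTHER molecules, as (distance, molecule, index)."""
    neighbours = {}
    for mol, blocks in blocks_by_molecule.items():
        for i, block in enumerate(blocks):
            candidates = [
                (euclidean(block, other_block), other_mol, j)
                for other_mol, other_blocks in blocks_by_molecule.items()
                if other_mol != mol
                for j, other_block in enumerate(other_blocks)
            ]
            candidates.sort()  # smallest distance first
            neighbours[(mol, i)] = candidates[:k]
    return neighbours

def match_positions(neighbours, mol_a, mol_b):
    """Pairs (i, j) such that block j of mol_b is among the kNN of
    block i of mol_a: the points of the two-dimensional graph."""
    return sorted(
        (i, j)
        for (mol, i), nearest in neighbours.items()
        if mol == mol_a
        for _, other_mol, j in nearest
        if other_mol == mol_b
    )
```

The brute-force search grows quadratically with the total number of blocks; for large data sets an indexed nearest-neighbor structure would be a natural substitute, although the description does not mention one.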

A correlation coefficient between two molecules can also be determined. In particular, in order to quantify the comparison of molecules m₁ and m₂, the Pearson correlation coefficient of the positions of the blocks represented in the matrix (m₁, m₂) can be calculated. The Pearson correlation coefficient makes it possible to rank all possible pairs of molecules (m₁, m₂) from most similar to least similar. If necessary, the Pearson coefficient can be replaced by the Kendall coefficient.
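The scoring of a pair of molecules can be illustrated by the following Python sketch, which computes the Pearson correlation coefficient of the positions of the matched blocks. It assumes that the matches are supplied as a list of (i, j) block positions; the names `pearson` and `pair_score` are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def pair_score(matches):
    """Score a pair of molecules from its matched block positions (i, j).
    Points aligned along a diagonal give a coefficient close to 1."""
    return pearson([i for i, _ in matches], [j for _, j in matches])
```

Two molecules sharing a region of identical sequence produce matches lying along a diagonal, hence a coefficient close to 1 (or close to -1 if one molecule is read in the opposite orientation), whereas unrelated molecules produce scattered matches and a coefficient near 0.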

Of course, these calculation methods can also be applied to the other methods according to the invention that require a comparison of profiles.

After the different light profiles identified on the support have been assembled, a specific signature (corresponding to the succession of profiles obtained through the assembly) can be obtained for each different DNA molecule present in the initial mixture. These specific profiles can then be used to identify the DNA molecules present in the mixture, according to the procedures set out above.

In view of the resolutions obtained using super-resolution microscopy techniques, a method for obtaining the nucleotide sequence of a DNA molecule is also described, comprising:

This resolution can in particular be achieved by using TIRF technologies coupled to a STORM or PALM system as described above, with analysis by PSF localization as also described above. The resolution of an optical microscope is commonly considered to be λ/2, i.e. approximately 250 nm. However, the center of the image of a point (for example a single fluorophore) can be located with much greater precision. Indeed, if the observation is made with a microscope equipped with a CCD camera whose pixels measure a nm on a side, and in which the root-mean-square noise of each pixel is b (in number of photons counted), then the uncertainty σ on the position of the photon emitter (Thompson, R.E., Larson, D.R., Webb, W.W. Precise nanometer localization analysis for individual fluorescent probes. Biophys. J. (2002) 82:2775-2783) is given by

$$\sigma = \sqrt{\frac{s^2 + a^2/12}{N} + \frac{8 \pi s^4 b^2}{a^2 N^2}}$$

with s (in nm) the standard deviation of the PSF and N the total number of photons collected. This uncertainty is in theory limited only by the number N of photons (the other parameters being constants), and thus makes it possible to reach sub-nanometer precision. With methods for correcting microscope drift that are well known to those skilled in the art, and after calibration of the quantum response of each pixel, such resolutions have been achieved in practice (Pertsinidis, A., Zhang, Y., Chu, S. (2010) Subnanometre single-molecule localization, registration and distance measurements, Nature 466:647-51). In the method for obtaining the nucleotide sequence of a DNA molecule according to the invention, the DNA studied can be genomic DNA or complementary DNA.
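The dependence of the localization uncertainty on the number of photons can be illustrated numerically with the following Python sketch of the Thompson et al. (2002) expression, in which s is the standard deviation of the PSF, a the pixel size, b the noise per pixel and N the number of photons collected. The function name `localization_sigma` and the numerical values in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def localization_sigma(s, a, b, n_photons):
    """Localization uncertainty (same length unit as s and a), with
    s: standard deviation of the PSF, a: pixel side length,
    b: RMS background noise per pixel (photons),
    n_photons: total number of photons collected."""
    if a <= 0 or n_photons <= 0:
        raise ValueError("a and n_photons must be positive")
    variance = ((s ** 2 + a ** 2 / 12.0) / n_photons
                + 8.0 * math.pi * s ** 4 * b ** 2 / (a ** 2 * n_photons ** 2))
    return math.sqrt(variance)

# Illustrative (hypothetical) values: s = 125 nm, a = 100 nm, b = 2 photons.
for n in (1_000, 100_000, 10_000_000):
    print(n, round(localization_sigma(125.0, 100.0, 2.0, n), 3))
```

For a large photon count the first term dominates, so the uncertainty decreases approximately as the inverse square root of N.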

Figure legends

Figure 1: Example of an image obtained after labeling DNA with a Xenopus egg cytoplasmic extract, combing, and fluorescence acquisition.

Figure 2: Light signal converted into arbitrary fluorescence units by image analysis software (ImageJ; pink curve, top), together with a representative sample of the surrounding background noise (blue curve, bottom).

Figure 3. Example of a comparison of the profiles of two lambda phage molecules. Molecule 1 (abscissa) has 208 blocks and molecule 2 (ordinate) has 147 blocks. A blue dot is drawn at coordinates (i, j) when block j of molecule 2 is among the k nearest neighbors (kNN) of block i of molecule 1.

Figure 4. Histogram of the number of pairs recognized as similar by the method (ordinate) as a function of the minimum correlation coefficient required to consider two pairs similar (abscissa).

Figure 5. Distribution of the correlation coefficients obtained by comparing the profiles of two lambda phage molecules (solid curve) and the profiles of a lambda phage molecule and a mouse DNA molecule (dotted curve).

Examples

Example 1. Labeling with extracts of the Xenopus laevis egg replication system

1°) DNA labeling

The labeling technique is described in detail in Marheineke et al., 2009 and is based on a protocol originally described in Blow et al., 1986. Briefly, the protocol requires the extraction of cytosol from Xenopus eggs. Eggs are collected from female Xenopus and rinsed in deionized water for 5 minutes. The eggs are then dissociated with an appropriate solution (dejellying solution) and activated with an appropriate solution (Barth's solution) supplemented with 0.25 µg/ml of calcium ionophore A23187. After rinsing, the eggs are centrifuged at 2 °C at 350 x g for 1 minute to remove all traces of liquid, then at 20,000 x g for 15 minutes to crush them. Of the three resulting liquid phases, the middle phase is the cytoplasmic extract; it is collected with a syringe and transferred to a tube kept on ice until use. For labeling, a mixture is prepared containing 1/20 cytosol extract, 1/20 creatine phosphate solution, 1/50 cycloheximide solution and 1/50 dUTP-rhodamine solution. Once the reaction is complete, it is stopped by adding ice-cold PBS. The DNA is centrifuged at 1000 x g for 7 minutes and resuspended in 50 µl of PBS.

2°) Preparation of glass coverslips for molecular combing

The preparation of the coverslips for combing is described in detail in Marheineke et al. First, the coverslips are thoroughly cleaned in a plasma cleaner to remove all traces of organic matter. They are then sonicated in pure heptane for 10 minutes, and a silane layer is deposited by incubation for 16 hours in 100 ml of heptane supplemented with 100 µl of octenyltrichlorosilane. The coverslips are then sonicated for 5 minutes each in a succession of liquids: heptane, water, 50% methanol/water, water, chloroform, water, 50% methanol/water and water, followed by 2 minutes in chloroform. Finally, the coverslips are air-dried.

3°) Molecular combing

The DNA labeled in step 1°), in a volume of 50 µl, is incubated for 30 minutes at 65 °C, cooled, and then incubated for 30 minutes at room temperature. The DNA is diluted in 1.2 ml of MES solution at pH 5.7-6.2, and the mixture is placed in the reservoir of the combing apparatus. A coverslip prepared in step 2°) is mounted in the holder of the combing apparatus and immersed in the DNA solution for 5 minutes. The coverslip is withdrawn at a constant speed of 300 µm/s, which combs the DNA onto both faces of the coverslip. The coverslip is then air-dried and mounted on a glass slide. The combed DNA is viewed under an optical microscope equipped with a 100X oil-immersion objective and a FITC filter.

4°) Image acquisition

The coverslips are placed under the 100X objective of an optical microscope equipped with a CCD camera (CoolSnap HQ type, Photometrics) with a grid of 512 x 512 photosensors of 6.5 µm². Excitation light at 510 nm is supplied by a laser, and a suitable filter is used to collect the signal. 16-bit TIFF images are captured and transferred to computer systems for analysis (Figure 1).

5°) Image analysis

An example of an image obtained by this method is given in Figure 1, and an example of the result of the image analysis is given in Figure 2. The luminous trace, converted into fluorescence units, constitutes the digital signature of this DNA molecule.

Figure 1 shows a straight vertical trace corresponding to a single molecule of a DNA fragment of about 30,000 bases, labeled with fluorescent rhodamine molecules covalently attached to dUTP bases, an analog of thymidine. The scattered bright spots correspond to background noise.

In Figure 2, the light signal has been converted into arbitrary fluorescence units by image analysis software (ImageJ; pink curve, top), while a representative sample of the surrounding background noise is shown by the blue curve (bottom).

Example 2. Labeling by hybridization of fluorescent oligonucleotides

A specific signature of a DNA molecule can be obtained by hybridizing to it short DNA sequences (oligomers) that were labeled during synthesis, and therefore covalently, with a fluorescent molecule. The method comprises four main steps:

1) High-molecular-weight DNA (for example genomic DNA) is combed on glass slides as described above. This DNA is not labeled with fluorescent molecules.

2) This DNA is hybridized with single-stranded oligomers of 10 to 15 bases that combine known and random sequences as described above and are known to be distributed approximately uniformly in the genome of interest. These oligomers carry one or more fluorescent molecules at one of their ends. Hybridization attaches the oligomers, by base complementarity, to the combed DNA molecules at positions that depend on the sequence of those molecules.

3) Images of the combed DNA molecules are acquired at super-resolution (TIRF and STORM system).

4) Computer analysis of the images converts the sequence of light signals generated by the oligomers, and the distances separating them, into a signature specific to each combed DNA molecule. Using different types of fluorophores, each associated with a specific oligonucleotide sequence, can yield a more complex signal, which in turn increases the specificity of the signatures.
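Step 4 can be pictured with a minimal sketch that turns a set of localized fluorescent spots into an ordered list of fluorophore labels and inter-spot distances. The function name, the input format (positions in nm along the combed molecule, one fluorophore label per spot) and the sample values are assumptions made for illustration; the description does not specify how the signature is encoded.

```python
def spot_signature(positions_nm, labels):
    """Return (ordered labels, gaps in nm between consecutive spots).

    positions_nm : position of each spot along the molecule, in nm
    labels       : fluorophore label of each spot (same order as positions)
    """
    if len(positions_nm) != len(labels):
        raise ValueError("one label is required per position")
    # Order the spots along the molecule
    spots = sorted(zip(positions_nm, labels))
    ordered_labels = [label for _, label in spots]
    # Distance separating each spot from the next one
    gaps = [b[0] - a[0] for a, b in zip(spots, spots[1:])]
    return ordered_labels, gaps

if __name__ == "__main__":
    # Hypothetical localizations of three oligomers on one molecule
    print(spot_signature([1250.0, 0.0, 340.0], ["Cy3", "Cy3", "Cy5"]))
```

Encoding the fluorophore order alongside the gaps reflects the remark that several fluorophore types make the signal more complex and hence more specific.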

Example 3: DNA labeling with amino-allyl nucleotides

Modified nucleotides are incorporated into double-stranded DNA molecules (for example a circular plasmid, a circular bacterial chromosome, linearized genomic or plasmid DNA, or other) by in vitro synthesis using the DNA-dependent polymerase Phi29 (for example Fermentas #E0402, 10 U/µl). Briefly, a 25 ng/µl DNA solution is prepared in 10 µl of pure water supplemented with 0.5 µl of cresol red. The DNA is denatured by adding 1 µl of 0.2 M KOH buffer and incubating for 3 minutes at room temperature. The mixture is neutralized by adding 1 µl of Tris-Cl buffer. The polymerization reaction is carried out by preparing the following mixture: 1 µl of random 6-base oligomers, 3 µl of the denatured DNA prepared above, 6 µl of pure water, 1 µl of pyrophosphatase (10 U/µl), 2 µl of 10 mM amino-allyl dCTP, 1 µl each of dATP, dTTP and dGTP at 25 mM, 2 µl of 2.5 mM dCTP, 5 µl of the reaction buffer supplied with the commercial enzyme, 1 µl of Phi29 enzyme and 26 µl of pure water. The mixture is incubated for 16 hours at 37 °C, then the DNA is denatured for 20 minutes at 70 °C. The DNA is purified on a filtration column (for example Macherey-Nagel NucleoSpin Extract II).
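As a bookkeeping check on the polymerization mixture listed above, the sketch below sums the component volumes and derives the final nucleotide concentrations. It assumes that both water entries (6 µl and 26 µl) are added as listed and that "1 µl of each" means 1 µl per nucleotide for dATP, dTTP and dGTP; the variable and function names are illustrative.

```python
# (component, volume in µl, stock concentration in mM or None if not relevant)
MIX = [
    ("random hexamers", 1.0, None),
    ("denatured DNA", 3.0, None),
    ("pure water", 6.0, None),
    ("pyrophosphatase", 1.0, None),
    ("amino-allyl dCTP", 2.0, 10.0),
    ("dATP", 1.0, 25.0),
    ("dTTP", 1.0, 25.0),
    ("dGTP", 1.0, 25.0),
    ("dCTP", 2.0, 2.5),
    ("reaction buffer", 5.0, None),
    ("Phi29 polymerase", 1.0, None),
    ("pure water (second addition)", 26.0, None),
]

def total_volume(mix):
    """Total reaction volume in µl."""
    return sum(volume for _, volume, _ in mix)

def final_concentrations(mix):
    """Final concentration (mM) of every component that has a stock concentration."""
    total = total_volume(mix)
    return {name: volume * stock / total
            for name, volume, stock in mix if stock is not None}

if __name__ == "__main__":
    print(total_volume(MIX))  # 50.0
    conc = final_concentrations(MIX)
    modified = conc["amino-allyl dCTP"]
    fraction = modified / (modified + conc["dCTP"])
    print(conc)
    print("fraction of dCTP pool carrying the amino-allyl group:", round(fraction, 3))
```

Under these assumptions the components add up to a 50 µl reaction in which each of dATP, dTTP and dGTP, and the combined dCTP pool, ends up at 0.5 mM, with four fifths of the dCTP pool modified.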

The coupling between the modified nucleotides carrying amino-allyl groups (for example dUMP-aa or dCMP-aa) and the fluorescent group is carried out by a chemical reaction known to those skilled in the art. Briefly, 10 µl of 0.1 M carbonate buffer at pH 9.3 containing 200 mg of dissolved fluorescent group (for example Cy3-NHS or Cy5-NHS) is combined with 1 µg of DNA containing the modified nucleotide and 80 µl of pure water. The mixture is incubated overnight at 25 °C with gentle, constant agitation. Uncoupled fluorophores are then removed by precipitation, for example using an acetate buffer, or with a commercial purification kit (for example Promega ref. #A7280 or Macherey-Nagel NucleoSpin Extract II). The purified DNA is redissolved in pure water (for example 50 µl). The labeling efficiency is determined by a spectrophotometric measurement known to those skilled in the art, by calculating the ratio of the absorbance of the DNA (260 nm) to that of the fluorophore (at the fluorophore's specific wavelength).

If necessary, more efficient coupling can be achieved on single-stranded DNA containing the modified nucleotides. After the polymerization step in the presence of modified nucleotides carrying an amino-allyl group, the DNA is denatured by methods known to those skilled in the art, for example by adding concentrated urea. After purification, the DNA is treated as described above to obtain fluorescent labeling.

Example 4: DNA labeling with ethynyl-dUTP nucleotides and "click chemistry"

5-Ethynyl-dUTP (5-EdUTP) nucleotides can also be incorporated into double-stranded DNA using the same protocol as described in Example 3. The coupling between the 5-EdUTP and the fluorescent group follows a different protocol, as follows. To 5 µl of a DNA solution (2 mM) in pure water are added 2 µl of an azide solution (50 mM) and 3 µl of a solution of 0.1 M CuBr, 0.1 M tris-(benzyltriazolylmethyl)amine buffer. The mixture is incubated at 25 °C for 3 hours. The DNA is purified by acetate precipitation using a protocol well known to those skilled in the art.

If necessary, more efficient coupling can be achieved on single-stranded DNA containing the modified nucleotides. After the polymerization step in the presence of modified nucleotides carrying an ethynyl group, the DNA is denatured by methods known to those skilled in the art, for example by adding concentrated urea. After purification, the DNA is treated as described in the preceding paragraph to carry out the fluorescent labeling.

Example 5: Incorporation of fluorescent nucleotides with the E10 mutant of Pfu polymerase

Nucleotides coupled to a fluorophore can be incorporated directly into double-stranded DNA by using DNA polymerases that are particularly tolerant of modified nucleotides. For example, the mutant form, called E10, of the polB-family polymerase from the archaeon Pyrococcus furiosus (Pfu) can replace up to 100% of cytosines with cytosines coupled to the Cy3 fluorophore, as described in Ramsay et al. (J. Am. Chem. Soc. 2010, 132, 5096-5104).

Example 6: Molecular combing

Preparation of the support

A glass microscope coverslip measuring 22 x 22 mm and 0.17 mm thick is washed twice successively in acetone, isopropanol and pure water, dried, and placed immediately in a plasma cleaner (for example Diener electronics Femto) for a 2-minute treatment at 80% intensity. The coverslip is then immediately dipped for a few seconds in a bath of 100 ml of heptane and 150 µl of 80% trimethoxy(7-octen-1-yl)silane, then dried for 12 hours at room temperature. Finally, the coverslip is sonicated successively in baths of heptane, pure water and chloroform, for 5 minutes each.

Combing

Molecular combing is carried out in a purpose-built apparatus that moves a glass coverslip out of a reservoir at constant speed. The DNA solution to be combed is prepared as follows: 150 ng of DNA suspended in water and 150 µl of MES buffer (for example Euromedex ref. #EU0033), made up to 1.5 ml with pure water. The solution is placed in the reservoir of the apparatus. A glass coverslip measuring 22 x 22 mm and 0.17 mm thick, prepared as described in 2.1, is placed on the appropriate holder and incubated for 5 minutes in the DNA solution. The coverslip is then moved out of the reservoir at a constant speed of 300 µm/s.

Example 7: Image acquisition by optical microscopy

7.1. High-resolution optical microscopy

A microscope slide on which the DNA molecules are immobilized is mounted on an optical microscope (for example Nikon Eclipse Ti) equipped with a high-resolution camera (for example Coolsnap HQ2 from Roper Scientific Photometrics or Ixon 897 from ANDOR Technology). Anti-bleaching mounting medium is used (Spector lab). Images are acquired through a 60X objective (for example Nikon Plan Apo Vc; 60x/1.40 OIL; ∞/0.17 WD 0.13) or a 100X objective (for example Nikon Plan APO Vc; 100x/1.40 OIL; ∞/0.17 WD DIC N2) under oil immersion (for example Olympus type F; ne = 1.518, νe = 41 at 23 °C). Appropriate excitation, dichroic and emission filters are chosen according to the fluorophore used to label the DNA. For example, for Cy3 one uses an FF01-531/40-25 excitation filter, an FF562-Di02-25x36 dichroic filter and an FF593/40-25 emission filter from SEMROCK Brightline. For Cy5 one uses an FF01-628/40-25 excitation filter, an FF660-Di01-25x36 dichroic filter and an FF01-692/40-25 emission filter from SEMROCK Brightline.

7.2. Implementation of the TIRF system

An alternative to the method of 7.1 is to use the Total Internal Reflection Microscopy (TIRF) technique. The modifications are as follows: an objective with a high numerical aperture is used (Nikon APO TIRF; 100x/1.49 OIL; ∞/0.13-0.20 DIC N2), and the preferred immersion oil is, for example, Nikon Immersion oil type F (nd = 1.515; νd (dispersion) = 43 at 23 °C; viscosity = 800 cSt). Excitation is provided by a laser whose wavelength matches the chosen fluorophore: for example 633 nm for Cy5 (Vortran Laser Technology (class IIIB)) or 532 nm for Cy3 (Melles Griot 43 Series ion laser). The laser beam excites the medium at an angle of incidence greater than the critical angle by means of the "TIRF" mirror in the microscope.

Example 8: Obtaining a light-intensity profile from individual DNA molecules

Obtaining a profile by ridge detection

The light profile of each molecule present in an image is determined automatically by an algorithm implemented in a computer program. The algorithm proceeds in two steps. The first step detects the ridge line of the molecule. A ridge line is a surface of high curvature (maximal curvature in at least one direction) produced by the image of the DNA molecule, defined by high absolute values of the smallest eigenvalue of the symmetric matrix of second-order partial derivatives (Hessian matrix) at each pixel. In a particular embodiment, a Gaussian blur (sigma = 1 pixel) can be applied to reduce the influence of noise. The second step determines a spline curve by interpolating the points of maximal curvature detected on the ridge in the first step. The curve is sampled at equidistant points (for example every pixel or every half-pixel) in order to approximate the value of the surface at those points. The approximation can be made by taking the pixel value directly, by a Taylor approximation of the surface at the chosen point, or by a Richardson-Lucy deconvolution based on a point spread function. The ordered list of these values constitutes the fluorescence profile of the DNA molecule.
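The ridge criterion described above (a strongly negative smallest Hessian eigenvalue along a bright line) can be illustrated with a deliberately simplified sketch. It is not the program referred to in the text: it assumes a roughly vertical molecule, uses plain finite differences without Gaussian blur, replaces the spline resampling with one sample per image row, and reads the pixel value directly. The function names, the `min_strength` threshold and the synthetic image are all hypothetical.

```python
import math

def smallest_hessian_eigenvalue(img, r, c):
    """Smallest eigenvalue of the 2x2 Hessian at pixel (r, c), by central differences."""
    frr = img[r + 1][c] - 2.0 * img[r][c] + img[r - 1][c]
    fcc = img[r][c + 1] - 2.0 * img[r][c] + img[r][c - 1]
    frc = (img[r + 1][c + 1] - img[r + 1][c - 1]
           - img[r - 1][c + 1] + img[r - 1][c - 1]) / 4.0
    mean = (frr + fcc) / 2.0
    # Closed-form eigenvalues of a symmetric 2x2 matrix: mean +/- radius
    radius = math.hypot((frr - fcc) / 2.0, frc)
    return mean - radius

def ridge_profile(img, min_strength=1.0):
    """Follow a roughly vertical bright ridge; return [(row, col, intensity), ...]."""
    profile = []
    for r in range(1, len(img) - 1):
        best_col, best_ev = None, -min_strength
        for c in range(1, len(img[r]) - 1):
            ev = smallest_hessian_eigenvalue(img, r, c)
            if ev < best_ev:  # most negative eigenvalue = strongest bright ridge
                best_col, best_ev = c, ev
        if best_col is not None:
            # Direct pixel value taken as the profile sample for this row
            profile.append((r, best_col, img[r][best_col]))
    return profile

def synthetic_molecule():
    """7x7 test image: a vertical line in column 3 with half-intensity flanks."""
    values = [10.0, 12.0, 15.0, 11.0, 9.0, 14.0, 13.0]
    img = [[0.0] * 7 for _ in values]
    for r, v in enumerate(values):
        img[r][2], img[r][3], img[r][4] = v / 2.0, v, v / 2.0
    return img

if __name__ == "__main__":
    for row, col, intensity in ridge_profile(synthetic_molecule()):
        print(row, col, intensity)
```

For curved or tilted molecules, the per-row search would have to give way to the spline interpolation and equidistant sampling described in the text.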

Example 9: Sorting molecules on the basis of their light profiles

The aim of this application is to sort a group of molecules composed of n different types of molecule into n distinct subgroups, each type of molecule possibly being represented many times. The (real) example below shows that lambda phage molecules can be distinguished from molecules derived from the E. coli genome.

9.1. Cutting profiles into blocks

The profile of each molecule is cut into blocks of size t that overlap by p pixels, with p < t. A Euclidean distance is calculated between all possible pairs of blocks. The search then targets cases where a succession of blocks belonging to one molecule lies at a small distance from a succession of blocks of another molecule. This situation is characteristic of two molecules sharing a region with the same DNA sequence.
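The block-cutting step can be sketched as follows. The function name `cut_into_blocks` and the choice to drop a trailing remainder shorter than t are assumptions; the text above only fixes the block size t and the overlap p.

```python
def cut_into_blocks(profile, t, p):
    """Split a profile into blocks of size t that overlap by p samples (p < t).

    Returns a list of (start_index, block) tuples. A trailing remainder
    shorter than t is dropped (an assumption, not stated in the text).
    """
    if not 0 <= p < t:
        raise ValueError("require 0 <= p < t")
    step = t - p  # consecutive blocks start (t - p) samples apart
    return [(s, profile[s:s + t])
            for s in range(0, len(profile) - t + 1, step)]

# Hypothetical 10-sample profile cut into blocks of 4 overlapping by 2.
blocks = cut_into_blocks(list(range(10)), t=4, p=2)
```

Keeping the start index alongside each block preserves the block position along the molecule, which the later steps use.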

9.2. Calculating the Euclidean distance between each block

The Euclidean distance between two blocks b1 and b2 of size N is given by the following equation:

$$d(b_1, b_2) = \sqrt{\sum_{i=1}^{N} \left(b_1(i) - b_2(i)\right)^2}$$

For each block of each molecule, its k nearest neighbors (for example with k = 50, 100, 150, etc.) are determined among all the blocks of all the other molecules. The k nearest neighbors (kNN) of a given block are the blocks most similar to it, that is, those separated from it by the smallest Euclidean distances.
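The Euclidean distance between two blocks and the selection of the k nearest neighbors can be sketched as below. The tuple layout `(molecule_id, position, block)` and the brute-force sort are assumptions chosen for readability; nothing here reflects how the search was actually implemented.

```python
import math

def euclidean(b1, b2):
    """Euclidean distance between two blocks of equal size."""
    if len(b1) != len(b2):
        raise ValueError("blocks must have the same size")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(b1, b2)))

def k_nearest_blocks(query, candidates, k):
    """Return the k candidates closest to the query block.

    candidates is a list of (molecule_id, position, block) tuples taken
    from all the other molecules (hypothetical layout).
    """
    ranked = sorted(candidates, key=lambda cand: euclidean(query, cand[2]))
    return ranked[:k]

# Hypothetical blocks from two other molecules.
candidates = [("m2", 0, [1.0, 2.0, 3.0]),
              ("m2", 1, [9.0, 9.0, 9.0]),
              ("m3", 0, [1.0, 2.0, 4.0])]
nearest = k_nearest_blocks([1.0, 2.0, 3.0], candidates, k=2)
```

A brute-force sort costs one distance evaluation per candidate block; for large data sets a spatial index would be a natural replacement.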

Other strategies are also possible for calculating the distance between two molecule profiles or profile sub-segments:

- Normalized cross-correlations

- Normalized cross-correlations with partial overlap

- Euclidean distance on Fourier transforms ("à terme")

- Hellinger distance

- Minimum Variance Matching (MVM) according to Keogh

- Euclidean distance on thresholded wavelet coefficients

- Weyl discrepancy
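To make one of the alternatives listed above concrete, here is a minimal sketch of the Hellinger distance between two profile segments. It assumes the intensities are non-negative and that each segment is first normalized to unit sum so that it can be treated as a discrete distribution; this normalization is an assumption, since the text does not say how the profiles would be prepared.

```python
import math

def hellinger(p1, p2):
    """Hellinger distance between two non-negative profiles of equal length,
    each normalized to sum to 1 (normalization is an assumption)."""
    if len(p1) != len(p2):
        raise ValueError("profiles must have the same length")
    s1, s2 = sum(p1), sum(p2)
    if s1 <= 0 or s2 <= 0 or min(p1) < 0 or min(p2) < 0:
        raise ValueError("profiles must be non-negative with positive sum")
    acc = sum((math.sqrt(a / s1) - math.sqrt(b / s2)) ** 2
              for a, b in zip(p1, p2))
    return math.sqrt(acc / 2.0)

d_same_shape = hellinger([1, 2, 3], [2, 4, 6])  # same shape, different scale
d_disjoint = hellinger([1, 0], [0, 1])          # no overlap at all
```

With this normalization the distance ignores overall brightness and lies between 0 (same shape) and 1 (no overlap).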

9.3. Matrix representation of similar blocks

For every possible pair of molecules (m₁, m₂), the blocks belonging to m₂ are selected from among the kNN of each block of m₁ and plotted on a two-dimensional graph whose axes are the positions of the respective blocks of m₁ and m₂ (Figure 3).

9.4. Calculation of a correlation coefficient between two molecules

In order to quantify the comparison of molecules m₁ and m₂, the Pearson correlation coefficient of the positions of the blocks represented in the matrix (m₁, m₂) is calculated. The Pearson correlation coefficient makes it possible to rank all possible pairs of molecules (m₁, m₂) from most similar to least similar. If necessary, the Pearson coefficient can be replaced by the Kendall coefficient.
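The correlation step can be sketched as follows: each retained match is a pair (position of a block in m₁, position of the matching block in m₂), and the Pearson coefficient is computed over these pairs. The `matches` list is hypothetical illustration data, not a result from the experiment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical matches: (block position in m1, block position in m2).
matches = [(0, 10), (1, 11), (2, 12), (3, 13), (4, 15)]
r = pearson([a for a, _ in matches], [b for _, b in matches])
```

Matches that fall along a diagonal in the (m₁, m₂) graph give a coefficient close to 1, which is what two molecules sharing a region would be expected to produce.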

9.5. Example result on 52 lambda molecules and 52 E. coli molecules

In this example, 52 lambda phage molecules (30 to 40 kb) and 52 molecules from the E. coli genome (30 to 40 kb) were identified by the method described above (steps 9.1 to 9.4). The molecules were labeled by substituting 17% of the native thymidines with a uracil coupled to a fluorescent molecule on either DNA strand (amino-allyl labeling + Cy3-NHS). The lambda phage genome is about 48 kb long, whereas the E. coli genome is about 4.6 Mb long, roughly a hundred times larger. The starting hypothesis is therefore that many lambda molecules are likely to share sequence segments with one another, whereas, because the E. coli genome is so much larger than that of lambda phage, two E. coli molecules are very unlikely to share a sequence segment. Comparing the profiles by the method described above should therefore distinguish the lambda molecules, whose profiles will be similar to one another, from the E. coli molecules, each of which will have a profile different from all the others. Figure 4 illustrates a comparison of the profiles of two different lambda molecules by the method described above. In this comparison the Pearson correlation coefficient is 0.99. The results show that the method clearly sorts out the pairs of lambda molecules, which are in clear excess over the pairs of E. coli molecules and over the E. coli-lambda pairs (Figure 4). Indeed, the higher the correlation coefficient threshold (that is, the higher the specificity), the better the method distinguishes the pairs of lambda molecules, to the point of completely eliminating the artifactual lambda-E. coli pairs.

Example 10: Identification of specific signatures of molecules

We show below that two profiles of lambda DNA molecules labeled at 20% each generate a specific signature, because their maximum correlation coefficient is obtained only when the two sequences are perfectly aligned. Moreover, this maximum coefficient is much higher than those produced by aligning the profile of a lambda molecule with that of an E. coli molecule, in all possible combinations.

10.1. Obtaining images of intensity profiles of DNA sequences

An image of a DNA molecule is generated as follows. A point spread function (PSF), that is, a mathematical function describing the distribution of the photons emitted by a single fluorescent molecule, is determined by fitting a Gaussian function to the points obtained empirically on individual fluorophores with the microscope system described in Example 5.1. This PSF is assigned to a given percentage (the percentage of labeled bases) of the adenine and thymidine bases, chosen at random on a DNA molecule of interest. Each base of the DNA molecule is assigned to a pixel of size 0.5 nm (the size of one base when a DNA molecule is stretched) on a line of pixels in the image. A random number of the pixels corresponding to the labeled base (for example adenine) is selected according to the desired labeling percentage, and the maximum value of a pixel is set to the constant value of 100 photons. A two-dimensional convolution is then performed by integrating the PSF, after which random Poisson noise (a function of the pixel intensity) and a slight Gaussian noise (mean 0, standard deviation 10) are added.
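The simulation described above can be approximated by the following simplified sketch. It works in one dimension rather than two, uses a hypothetical Gaussian PSF width (`sigma`, in pixels), normalizes the kernel to unit sum, and draws Poisson noise with Knuth's algorithm; all of these are assumptions made to keep the example short, and the function names are illustrative only.

```python
import math
import random

def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's algorithm (fine for small means)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian kernel normalized to sum to 1."""
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def simulate_profile(sequence, fraction, sigma=3.0, photons=100,
                     noise_sd=10.0, seed=0):
    """1-D simulated intensity line: one pixel per base, a random fraction
    of A/T sites set to `photons`, Gaussian blur, then Poisson and
    Gaussian noise."""
    rng = random.Random(seed)
    sites = [i for i, base in enumerate(sequence) if base in "AT"]
    labeled = set(rng.sample(sites, round(fraction * len(sites))))
    signal = [photons if i in labeled else 0 for i in range(len(sequence))]
    radius = int(3 * sigma)
    kernel = gaussian_kernel(sigma, radius)
    blurred = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - radius
            if 0 <= k < len(signal):
                acc += w * signal[k]
        blurred.append(acc)
    return [poisson(v, rng) + rng.gauss(0.0, noise_sd) for v in blurred]

# Hypothetical 200-base sequence labeled at 20% of its A/T sites.
profile = simulate_profile("ATGCGTTAAC" * 20, fraction=0.2)
```

Fixing the seed makes a run reproducible, which is convenient when the same simulated molecule has to be compared under different labeling percentages.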

10.2. Calculation of the intensity profile

The profile of each molecule is determined by the ridge detection method described in Example 8.1 above.

10.3. Calculation of the similarity between intensity profiles by correlation coefficient

Comparison of two lambda molecules by sliding

The profiles of the lambda molecules (complete molecule and fragment) derived from the images shown in Figure 6 are compared by sliding the two profiles relative to each other, pixel by pixel, and calculating a Pearson correlation coefficient at each position. This experiment shows that the maximum coefficient of 0.669 stands out clearly from the other values and is obtained precisely at the position corresponding to the alignment of the two sequences (Figure 5, solid curve). The overall shape of the curve also reflects the specific character of the signature, through the gradual and symmetrical increase of the correlation coefficients on either side of the position corresponding to the exact alignment.
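The sliding comparison can be sketched as below. The sketch only evaluates offsets where the shorter profile fully overlaps the longer one and returns 0.0 for a constant window; both choices are assumptions, as are the function names and the toy profiles.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def sliding_correlation(long_profile, short_profile):
    """Slide short_profile along long_profile one sample at a time and
    return the Pearson coefficient at each full-overlap offset."""
    n = len(short_profile)
    return [pearson(long_profile[o:o + n], short_profile)
            for o in range(len(long_profile) - n + 1)]

# Hypothetical profiles: the fragment is an exact excerpt starting at offset 3.
complete = [0, 1, 0, 3, 7, 2, 5, 1, 0, 2]
fragment = complete[3:7]
coeffs = sliding_correlation(complete, fragment)
best_offset = max(range(len(coeffs)), key=coeffs.__getitem__)
```

In this noise-free toy case the coefficient reaches 1 at the true offset; with noisy profiles the maximum is lower but is still expected at the alignment position.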

Comparison of a lambda molecule and a mouse DNA molecule

The following negative control reinforces the notion of a specific signature. A 212 kb molecule from the mouse genome was also used to produce an image as described in 10.1 (random labeling at 80%), then a profile as described in 8.1. This profile was compared with that of lambda phage by progressive sliding and calculation of a correlation coefficient at each position (Figure 5, dotted curve). The coefficients resulting from this comparison of two different molecules unequivocally show a maximum of 0.27, far below the maximum coefficient obtained when comparing two identical molecules as described above.

References

M. Bertero and P. Boccacci, 1998, "Introduction to Inverse Problems in Imaging" (IOP Publishing, Bristol)

Betzig et al., "Imaging Intracellular Fluorescent Proteins at Nanometer Resolution", Science, 2006, 1642-1645

Blow et al., "Initiation of DNA replication in nuclei and purified DNA by a cell-free extract of Xenopus eggs", Cell, 1986, 47, 577-587

Donoho et al., "Nonlinear Solution of Linear Inverse Problems by Wavelet-Vaguelette Decomposition", Appl. Comput. Harmon. Anal., 2 (1995), 101-126

Marheineke et al., "Use of DNA combing to study DNA replication in Xenopus and human cell-free systems", Methods in Molecular Biology, DNA Replication: Methods and Protocols, 2009, 521, 575-60

Mortensen et al., "Optimized localization analysis for single-molecule tracking and super-resolution microscopy", Nature Methods 7, 2010, 377-381

Neely et al., "DNA fluorocode: a single molecule, optical map of DNA with nanometre resolution", Chem. Sci., 2010, 1, 453-460

Pertsinidis et al., "Subnanometre single-molecule localization, registration and distance measurements", Nature 2010, 466:647-51

Ramsay et al., "CyDNA: Synthesis and Replication of Highly Cy-Dye Substituted DNA by an Evolved Polymerase", J. Am. Chem. Soc. 2010, 132, 5096-5104

Rust et al., "Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM)", Nat Methods, 2006, 793-795

Thompson, R.E., Larson, D.R., Webb, W.W., "Precise nanometer localization analysis for individual fluorescent probes", Biophys. J. (2002) 82:2775-2783

Zink et al., "Structure and dynamics of human interphase chromosome territories in vivo", Hum Genet, 1998 Feb, 102(2):241-51

Claims

1. A method for obtaining a specific signature of a DNA molecule, comprising:

a) labeling a DNA molecule by incorporation of at least one modified nucleotide during a DNA polymerization reaction, the modified nucleotide being detectable, or being capable of being modified so as to be detectable;

b) immobilizing and stretching the labeled DNA on a solid surface;

c) detecting the labeling along the DNA by means of a suitable optical device, whereby a specific signature of said DNA sequence is obtained.

2. The method according to claim 1, wherein the DNA molecule has a size greater than 50 kb, preferably greater than 60 kb, more preferably greater than 100 kb, even more preferably greater than 200 kb.

3. The method of claim 1 or 2, the labeling step a) comprising incorporating at least one modified nucleotide during a DNA polymerization reaction in vitro with Phi29 polymerase.

4. The method according to any one of claims 1 to 3, wherein the detection in step c) uses a TIRF system.

5. The method according to any one of claims 1 to 4, wherein the detection in step c) uses a STORM system.

6. The method of any one of claims 1 to 5, wherein the immobilizing and stretching step b) precedes the labeling step a).

7. A method for identifying a DNA molecule, comprising:

i) obtaining the signature of said DNA molecule according to the method of any one of claims 1 to 6; and

ii) comparing the signature obtained in step i) with predetermined signatures of reference DNA molecules;

the comparison making it possible to identify the DNA molecule.

8. A method for identifying a genomic abnormality in a subject, comprising:

i) obtaining the molecular signature of a genomic DNA molecule from a tissue or cell of the subject, according to the method of any one of claims 1 to 6;

ii) comparing the signature obtained in step i) with the predetermined signatures of reference DNA molecules originating from

- a healthy subject and/or

- a diseased subject whose disease results from an alteration of the genome of one or more of that subject's cells;

in order to identify said genomic DNA molecule;

a genomic abnormality being detected

if the signature of the subject's genomic DNA differs from the signatures of the reference DNA molecules, in the case of reference DNA molecules originating from a healthy subject, or

if the signature of the genomic DNA tested corresponds to at least one of the reference signatures originating from a diseased subject.

9. The method of claim 8, further comprising

obtaining in step i) the signatures of several genomic DNA fragments and concatenating them on the basis of overlapping signatures;

the concatenate(s) then being compared with the reference sequences in order to identify said concatenate(s);

a genomic abnormality being detected if a concatenate differs from the reference DNA molecule that made it possible to identify it.

10. The method of claim 8 for identifying a complementary DNA molecule, the signature of said complementary DNA being compared with the signatures of a set of reference complementary DNA molecules.

11. A method for obtaining the nucleotide sequence of a DNA molecule, comprising:

a) labeling a DNA molecule by incorporation of at least one modified nucleotide during one or more independent DNA polymerization reactions, the modified nucleotide(s) being detectable, or capable of being modified so as to be detectable;

b) immobilizing and stretching the labeled DNA on a solid surface;

c) detecting the labeling by means of an optical device and a processing means suitable for obtaining information at a resolution less than or equal to 0.5 nm.

12. The method of claim 11, wherein the DNA molecule is genomic DNA or complementary DNA.

13. The method of any one of claims 1 to 12, wherein the immobilizing and stretching step is performed using the molecular combing technique.

14. A method of sorting different DNA molecules comprised in a mixture, the method comprising:

a) the labeling of the DNA molecules by incorporation during a polymerization reaction of at least one modified nucleotide detectable or capable of being modified so as to be detectable;

b) immobilizing and stretching the labeled DNA molecules on a solid surface;

c) detecting the labeling along the DNA molecules by means of a suitable optical device;

whereby a set of light profiles corresponding to the different DNA molecules is obtained, the different DNA molecules being discriminated according to their light profile.