HK1118870B - Chromosome conformation capture-on-chip (4c) assay - Google Patents
- Publication number
- HK1118870B (HK08112831.0A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- dna
- nucleotide sequence
- restriction enzyme
- probes
- sequence
Description
Technical Field
The present invention relates to the analysis of the frequency of interaction of two or more nucleotide sequences in the nuclear space.
Background
The aim of studies on the structure of mammalian nuclei is to understand how 2 m of DNA folds into a nucleus 10 μm in diameter while still allowing cell-type-specific genes to be expressed correctly and the genome to be replicated faithfully in every cell cycle. Much of the progress in this area has come from microscopy studies, which reveal that genomes are arranged non-randomly in nuclear space. For example, densely packed heterochromatin is separated from more open euchromatin, and chromosomes occupy distinct territories in the nucleus. The relationship between nuclear localization and transcriptional activity is complex. Although transcription occurs throughout the nuclear interior, active genes that are clustered on a chromosome tend to lie at the edge of, or outside, their chromosome territory. Individual genes can relocate after a change in their transcriptional state, as measured against large nuclear landmarks such as chromosome territories, centromeres or the nuclear periphery. In addition, as recently demonstrated by fluorescence in situ hybridization (FISH) for the β-globin locus and a few other selected genes, actively transcribed genes that lie tens of megabases apart on a chromosome can be brought together in the nucleus. Besides transcription, genome organization has been associated with the coordination of replication and recombination, with the likelihood of translocations between loci (which can lead to malignancy), and with the placement and replacement of foreign genetic processes. On the basis of these observations, the structural organization of DNA in the nucleus is believed to be a key contributor to genome function.
Several assays have been developed to provide insight into the spatial organization of genomic loci in vivo. One, known as RNA-TRAP (Carter et al. (2002) Nat. Genet. 32, 623), involves targeting horseradish peroxidase (HRP) to nascent RNA transcripts and then quantifying HRP-catalysed deposition of biotin on nearby chromatin.
Another assay is the chromosome conformation capture (3C) technique, which provides a tool for studying the structural organization of genomic regions. The 3C technique uses quantitative PCR to measure the cross-linking frequency between two given DNA restriction fragments, which reflects their proximity in nuclear space (see FIG. 1). Originally developed to analyze chromosome conformation in yeast (Dekker et al., 2002), the technique has been adapted to study the relationship between gene expression and chromatin folding at complex mammalian gene clusters (see, e.g., Tolhuis et al., 2002; Palstra et al., 2003; and Drissen et al., 2004). Briefly, cells are cross-linked with formaldehyde in vivo, the chromatin is digested in the nucleus with a restriction enzyme, and DNA fragments that are cross-linked into the same complex are ligated to each other. The ligation products are then quantified by PCR. Because the PCR amplification step requires sequence information for each DNA fragment to be amplified, the 3C technique measures the frequency of interaction only between pre-selected DNA fragments.
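The restriction-digestion step can be illustrated with a minimal in-silico sketch. It assumes a linear, top-strand-only sequence and uses HindIII (recognition site A^AGCTT) as the example enzyme; the helper names `find_sites` and `digest` are hypothetical, and effects such as incomplete digestion, methylation sensitivity and chromatin accessibility are ignored.

```python
import re


def find_sites(seq, site):
    """Return the 0-based start position of every occurrence of `site` in `seq`."""
    # The lookahead lets re.finditer report overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq, site, cut_offset):
    """Cut `seq` (top strand only) at every `site`, `cut_offset` bases into the site."""
    cuts = [pos + cut_offset for pos in find_sites(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    # Consecutive cut positions delimit the restriction fragments.
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


# HindIII cuts A^AGCTT, i.e. one base into its 6-bp site.
fragments = digest("GGGAAGCTTCCCAAGCTTTT", "AAGCTT", 1)
print(fragments)  # ['GGGA', 'AGCTTCCCA', 'AGCTTTT']
```

In an actual 3C experiment, only fragments that are cross-linked in the nucleus are subsequently ligated to one another; the sketch models the cutting alone.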
There is a significant need for high-throughput techniques that can systematically screen the entire genome, without bias, for DNA loci that contact each other in nuclear space.
The present invention seeks to provide improvements to 3C technology.
Brief description of the invention
The currently used 3C technology can analyze only a limited number of selected DNA-DNA interactions because of the PCR amplification step, which requires specific sequence information for each fragment to be analyzed. Moreover, selecting restriction fragments as candidates for long-range DNA interactions requires considerable prior knowledge of the locus of interest (e.g., the location of hypersensitive sites), which is generally not available. Given the functional relevance of many of the long-range DNA-DNA interactions described to date, the ability to screen, without bias, for DNA elements that loop to a sequence of interest (e.g., a gene promoter, enhancer, insulator, silencer, origin of replication or MAR/SAR) or to a genomic region of interest (e.g., a gene-dense or gene-poor region or a repetitive element) would greatly facilitate the mapping of sequences involved in regulatory networks.
The present invention relates to 4C technology (capture and characterisation of co-localised chromatin), which provides a method for analyzing, at high throughput, the frequency of interaction of two or more nucleotide sequences in nuclear space.
4C (capture and characterization of co-localized chromatin) technology is a modified version of 3C technology that can search, in an unbiased, genome-wide manner, for DNA fragments that interact with a selected locus. In short, 3C analysis is performed as usual, but the PCR step is omitted. The 3C template comprises a bait (e.g., a selected restriction fragment comprising a gene of interest) ligated to a number of different nucleotide sequences of interest (representing the genomic environment of the gene). The template is digested with a second restriction enzyme and ligated. Advantageously, the one or more nucleotide sequences of interest ligated to the target nucleotide sequence are amplified with at least one (preferably at least two) oligonucleotide primers, at least one of which hybridizes to a DNA sequence flanking the nucleotide sequence of interest. Typically, this yields a PCR fragment pattern that is highly reproducible between independent amplification reactions and specific for a given tissue. In one embodiment, HindIII and DpnII are used as the first and second restriction enzymes. The amplified fragments can then be labeled and optionally hybridized to an array, typically against a control sample containing genomic DNA digested with the same combination of restriction enzymes.
In a preferred embodiment of the invention, the ligated fragments cut with the second restriction enzyme are subsequently religated to form small DNA loops.
The 3C technique was therefore modified so that all nucleotide sequences of interest that interact with the target nucleotide sequence are amplified. In effect, the amplification reaction is carried out not with primers specific for each fragment to be analyzed, but with one or more oligonucleotide primers that hybridize to DNA sequences flanking the nucleotide sequences of interest. Advantageously, 4C is not biased by the design of primers for the PCR amplification step and can therefore be used to search the complete genome for interacting DNA elements.
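The amplification principle (primers inside the target sequence that read outward, as in the inverse PCR mentioned for FIG. 2) can be sketched on a single DpnII circle. This is a simplified, hypothetical model: sequences are top-strand strings, HindIII and DpnII are the example enzymes named in the text, the circle is linearised at its DpnII junction, primer positions are plain indices rather than designed oligonucleotides, and the captured end is assumed to contain no internal HindIII or DpnII site (as follows from the two digestions). The function names are illustrative only.

```python
HINDIII = "AAGCTT"  # first restriction enzyme site (example from the text)
DPNII = "GATC"      # second restriction enzyme site (example from the text)


def make_4c_circle(bait_body, captured_body):
    """Return a 4C circle as a string read from its DpnII junction.

    bait_body: bait sequence lying between its DpnII and HindIII sites.
    captured_body: captured sequence between its HindIII and DpnII sites.
    The last base of the returned string is joined to the first.
    """
    return DPNII + bait_body + HINDIII + captured_body


def inverse_pcr(circle, fwd_start, rev_end):
    """Top-strand amplicon from two outward-facing primers inside the bait.

    fwd_start: index of the forward primer's 5' end.
    rev_end: index just past the reverse primer's 5' end (exclusive).
    Reading from fwd_start crosses the HindIII junction, the captured
    sequence and the DpnII junction, then wraps round to rev_end.
    """
    return circle[fwd_start:] + circle[:rev_end]


def captured_from_amplicon(amplicon):
    """Return the captured sequence between the HindIII and DpnII junctions."""
    return amplicon.split(HINDIII, 1)[1].split(DPNII, 1)[0]


circle = make_4c_circle("TTTTTCCCCC", "GGGGAAAA")
amplicon = inverse_pcr(circle, fwd_start=9, rev_end=9)
print(amplicon)                          # CCCCCAAGCTTGGGGAAAAGATCTTTTT
print(captured_from_amplicon(amplicon))  # GGGGAAAA
```

Because both primers sit in the bait, the same primer pair amplifies whatever sequence was captured, which is what removes the need for partner-specific primers.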
Brief aspects of the invention
Aspects of the invention appear in the appended claims.
In a first aspect, there is provided a method of analysing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences of interest (such as one or more genomic loci), comprising the steps of: (a) providing a sample of cross-linked DNA; (b) digesting the cross-linked DNA with a first restriction enzyme; (c) ligating the cross-linked nucleotide sequences; (d) reversing the cross-links; (e) digesting the nucleotide sequences with a second restriction enzyme; (f) ligating one or more DNA sequences of known nucleotide composition to the available second restriction enzyme digestion site(s) flanking the one or more nucleotide sequences of interest; (g) amplifying the one or more nucleotide sequences of interest using at least two oligonucleotide primers, wherein each primer hybridizes to a DNA sequence flanking the nucleotide sequence of interest; (h) hybridizing the amplified sequence(s) to an array; and (i) determining the frequency of interaction between DNA sequences.
In a second aspect, there is provided a method of analysing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences (such as one or more genomic loci), comprising the steps of: (a) providing a sample of cross-linked DNA; (b) digesting the cross-linked DNA with a first restriction enzyme; (c) ligating the cross-linked nucleotide sequences; (d) reversing the cross-links; (e) digesting the nucleotide sequences with a second restriction enzyme; (f) circularising the nucleotide sequences; (g) amplifying one or more nucleotide sequences ligated to the target nucleotide sequence; (h) optionally hybridizing the amplified sequences to an array; and (i) determining the frequency of interaction between DNA sequences.
In a third aspect, there is provided a circularised nucleotide sequence comprising a first and a second nucleotide sequence, wherein the first and second nucleotide sequences are joined at their two ends by different restriction enzyme recognition sites, and wherein said first nucleotide sequence is a target nucleotide sequence and said second nucleotide sequence is obtained by cross-linking genomic DNA.
In a fourth aspect, there is provided a method of preparing a circularised nucleotide sequence comprising the steps of: (a) providing a sample of cross-linked DNA; (b) digesting the cross-linked DNA with a first restriction enzyme; (c) ligating the cross-linked nucleotide sequences; (d) reversing the cross-links; (e) digesting the nucleotide sequences with a second restriction enzyme; and (f) circularising the nucleotide sequences.
In a fifth aspect, there is provided a method of analysing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences (such as one or more genomic loci) comprising the use of a circularised nucleotide sequence.
In a sixth aspect, there is provided an array of probes immobilized on a support comprising one or more probes that hybridize or are capable of hybridizing to a circularized nucleotide sequence.
In a seventh aspect, there is provided a set of probes that are complementary in sequence to the nucleic acid sequences adjacent to each recognition site of the first restriction enzyme in the genomic DNA.
In an eighth aspect, there is provided a method of preparing a probe set comprising the steps of: (a) identifying each recognition site of the first restriction enzyme in the genomic DNA; (b) designing probes capable of hybridizing to the sequence adjacent to each of these recognition sites; (c) synthesizing the probes; and (d) combining the probes to form a probe set, or substantially a probe set.
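Steps (a) to (c) of this method can be sketched in silico. The sketch below is a hypothetical illustration, not the patented design procedure: it uses HindIII (AAGCTT) and DpnII (GATC) as first and second enzymes, takes one probe of an assumed length of 50 nt on each side of every first-enzyme site, keeps each probe within an assumed 100 bp of the site (cf. the 0-100 bp option among the preferred embodiments) and between the site and the nearest second-enzyme site, and returns top-strand genomic sequence. Strand choice, melting temperature and repeat masking are left out.

```python
import re


def find_sites(seq, site):
    """0-based start positions of every occurrence of `site` in `seq`."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]


def design_probes(genome, primary="AAGCTT", secondary="GATC",
                  probe_len=50, max_dist=100):
    """Return (site_pos, side, probe_seq) for each usable flank of each primary site.

    A flank is used only if `probe_len` bases fit between the primary site and
    both the nearest secondary site and the `max_dist` limit.
    """
    genome = genome.upper()
    sec_starts = find_sites(genome, secondary)
    sec_ends = [s + len(secondary) for s in sec_starts]
    probes = []
    for pos in find_sites(genome, primary):
        site_end = pos + len(primary)
        # Right flank: from the primary site up to the next secondary site.
        next_sec = min((s for s in sec_starts if s >= site_end), default=len(genome))
        if min(next_sec, site_end + max_dist) - site_end >= probe_len:
            probes.append((pos, "right", genome[site_end:site_end + probe_len]))
        # Left flank: from the previous secondary site up to the primary site.
        prev_sec = max((e for e in sec_ends if e <= pos), default=0)
        if pos - max(prev_sec, pos - max_dist) >= probe_len:
            probes.append((pos, "left", genome[pos - probe_len:pos]))
    return probes


genome = "C" * 60 + "AAGCTT" + "T" * 60 + "GATC" + "A" * 20
for site_pos, side, seq in design_probes(genome):
    print(site_pos, side, seq[:10] + "...")
# 60 right TTTTTTTTTT...
# 60 left CCCCCCCCCC...
```

A flank whose nearest second-enzyme site lies closer than the probe length yields no probe, which mirrors the preference for probes lying between the two sites.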
In a ninth aspect, there is provided a probe set, or substantially a probe set, obtained or obtainable by a method as described herein.
In a 10th aspect, there is provided an array comprising an array of probes as described herein, or substantially comprising a set of probes as described herein.
In an 11th aspect, there is provided an array comprising a set of probes as described herein.
In a 12th aspect, there is provided a method of preparing an array comprising the step of immobilising an array of probes or a set of probes as described herein on a solid support.
In a 13th aspect, there is provided a method of preparing an array comprising the step of immobilizing an array or set of probes as described herein on a solid support.
In a 14th aspect, there is provided an array obtained or obtainable by a method as described herein.
In a 15th aspect, there is provided a method of identifying one or more DNA-DNA interactions indicative of a particular disease state, comprising performing steps (a) to (i) of the first or second aspect of the invention, wherein the sample of cross-linked DNA in step (a) is provided from diseased and non-diseased cells, and wherein a difference in the frequency of interaction between DNA sequences from the diseased and non-diseased cells indicates that the DNA-DNA interaction is indicative of the particular disease state.
In a 16th aspect, there is provided a method of diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction, comprising performing steps (a) to (i) of the first or second aspect of the invention, wherein step (a) comprises providing a sample of cross-linked DNA from a subject, and step (i) comprises comparing the frequency of interaction between DNA sequences with that of unaffected controls; wherein a difference between the control value and the subject value indicates that the subject is suffering from, or will suffer from, the disease or syndrome.
In a 17th aspect, there is provided a method of diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction, comprising performing steps (a) to (i) of the first or second aspect of the invention, wherein step (a) comprises providing a sample of cross-linked DNA from the subject, and wherein the method comprises the additional step of: (j) identifying one or more loci that have undergone a genomic rearrangement associated with the disease.
In an 18th aspect, there is provided an assay for identifying one or more agents that modulate DNA-DNA interactions, comprising the steps of: (a) contacting a sample with one or more agents; and (b) performing steps (a) to (i) of the first or second aspect of the invention, wherein step (a) comprises providing cross-linked DNA from the sample; wherein a difference between (i) the frequency of DNA sequence interactions in the presence of the agent and (ii) that in the absence of the agent indicates that the agent is capable of modulating DNA-DNA interactions.
In a 19th aspect, there is provided a method of detecting the location of a balanced and/or unbalanced breakpoint (such as a translocation), comprising the steps of: (a) performing steps (a) to (i) of the first or second aspect of the invention; and (b) comparing the frequency of interaction between the DNA sequences with that of a control; wherein a transition from low to high DNA-DNA interaction frequency in the sample relative to the control indicates the location of the breakpoint.
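One simple way to locate such a low-to-high transition is to compare mean log2(sample/control) ratios in windows on either side of each probe. This is a hypothetical sketch, not the analysis used in the patent: the window size, the pseudocount and the single-best-step rule are assumptions, and the per-probe signals are assumed to be already ordered by chromosomal position.

```python
import math


def log2_ratios(sample, control, pseudocount=1.0):
    """Per-probe log2(sample/control); the pseudocount avoids division by zero."""
    return [math.log2((s + pseudocount) / (c + pseudocount))
            for s, c in zip(sample, control)]


def find_low_to_high_transition(values, window):
    """Return (index, step) where mean(next window) - mean(previous window) is largest."""
    best_index, best_step = None, -math.inf
    for i in range(window, len(values) - window + 1):
        left = sum(values[i - window:i]) / window
        right = sum(values[i:i + window]) / window
        if right - left > best_step:
            best_index, best_step = i, right - left
    return best_index, best_step


# Probes ordered along one chromosome: signal rises from probe 10 onward in the sample only.
sample = [10] * 10 + [80] * 10
control = [10] * 20
index, step = find_low_to_high_transition(log2_ratios(sample, control), window=3)
print(index)  # 10
```

The returned index marks the first probe of the high-signal stretch, so the breakpoint would lie between that probe and the one before it.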
In a 20th aspect, there is provided a method of detecting the location of a balanced and/or unbalanced inversion, comprising the steps of: (a) performing steps (a) to (i) of the first or second aspect of the invention; and (b) comparing the frequency of interaction between the DNA sequences with that of a control; wherein an inverted pattern of DNA-DNA interaction frequency in the sample relative to the control indicates the inversion.
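A segment-level check for such an inverted pattern can be sketched by asking whether, over a candidate segment, the sample profile correlates better with the mirror image of the control profile than with the control profile itself. This is a hypothetical illustration: the segment boundaries are assumed to be known already, the example values are invented, and Pearson correlation is one possible similarity measure, not one specified in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def looks_inverted(sample_segment, control_segment):
    """True if the sample profile matches the mirror image of the control better."""
    direct = pearson(sample_segment, control_segment)
    mirrored = pearson(sample_segment, control_segment[::-1])
    return mirrored > direct


control = [1, 2, 3, 5, 8, 13]
print(looks_inverted(control[::-1], control))  # True
print(looks_inverted(control, control))        # False
```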
In a 21st aspect, there is provided a method of detecting the location of a deletion, comprising the steps of: (a) performing steps (a) to (i) of the first or second aspect of the invention; and (b) comparing the frequency of interaction between the DNA sequences with that of a control; wherein a decrease in the DNA-DNA interaction frequency of the sample relative to the control is indicative of a deletion.
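A minimal sketch of the decrease criterion flags runs of consecutive probes whose log2(sample/control) ratio falls below a threshold. The threshold of -1 (a two-fold drop) and the minimum run length of 3 probes are arbitrary, hypothetical choices, and the sketch does not test the additional increase in interactions with more distant regions that the preferred embodiments also associate with deletions.

```python
def low_runs(log_ratios, threshold=-1.0, min_len=3):
    """Return (start, end) index pairs (end exclusive) of runs of at least
    `min_len` consecutive probes whose log2(sample/control) is below `threshold`."""
    runs, start = [], None
    # A +inf sentinel closes any run that reaches the last probe.
    for i, value in enumerate(list(log_ratios) + [float("inf")]):
        if value < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


print(low_runs([0.1, 0.0, -2.0, -2.5, -1.8, 0.2, -3.0, -3.0]))  # [(2, 5)]
```

The trailing run of two probes is not reported because it is shorter than `min_len`.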
In a 22nd aspect, there is provided a method of detecting the location of a duplication, comprising the steps of: (a) performing steps (a) to (i) of the first or second aspect of the invention; and (b) comparing the frequency of interaction between the DNA sequences with that of a control; wherein an increase or decrease in the frequency of DNA-DNA interactions in the subject relative to the control is indicative of a duplication or insertion.
In a 23rd aspect, there is provided an agent obtained or obtainable by an assay method as described herein.
In a 24th aspect, there is provided the use of a circularized nucleotide sequence for identifying one or more DNA-DNA interactions in a sample.
In a 25th aspect, there is provided the use of a circularised nucleotide sequence for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
In a 26th aspect, there is provided the use of a probe array or probe set as described herein for identifying one or more DNA-DNA interactions in a sample.
In a 27th aspect, there is provided use of a probe array or probe set as described herein for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
In a 28th aspect, there is provided use of an array as described herein for identifying one or more DNA-DNA interactions in a sample.
In a 29th aspect, there is provided the use of an array as described herein for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
In a 30th aspect, there is provided a method, probe array, probe set, process, array, test method, reagent, or use substantially as described herein and with reference to any example or drawing.
Detailed description of the preferred embodiments
Preferably, the ligation reaction in step (f) results in the formation of a DNA loop.
Preferably, the target nucleotide sequence is selected from the group consisting of genomic rearrangements, promoters, enhancers, silencers, insulators, matrix attachment regions, locus control regions, transcription units, replication origins, recombination hotspots, translocation breakpoints, centromeres, telomeres, gene-dense regions, gene-poor regions, repetitive elements and (viral) integration sites.
Preferably, the target nucleotide sequence is a disease-associated or disease-causing nucleotide sequence, or is located up to 15 Mb or more from a disease-associated or disease-causing locus on the linear DNA template.
Preferably, the target nucleotide sequence is selected from the group consisting of AML1, MLL, MYC, BCL, BCR, ABL1, IGH, LYL1, TAL1, TAL2, LMO2, TCRα/δ, TCRβ and HOX, or other disease-associated loci described in Albert Schinzel, "Catalogue of Unbalanced Chromosome Aberrations in Man", 2nd edition, Berlin: Walter de Gruyter, 2001, ISBN 3-11-011607-3.
Preferably, the first restriction enzyme is a restriction enzyme that recognizes a 6-8bp recognition site.
Preferably, the first restriction enzyme is selected from the group consisting of BglII, HindIII, EcoRI, BamHI, SpeI, PstI and NdeI.
Preferably, the second restriction enzyme is a restriction enzyme that recognizes a 4 or 5bp nucleotide sequence recognition site.
Preferably, the second restriction enzyme recognition site is located more than about 350bp from the first restriction enzyme site in the target nucleotide sequence.
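The distance preference can be checked in silico for a candidate target fragment. This hypothetical sketch measures the distance from the edges of a first-enzyme site (HindIII, AAGCTT) to the nearest second-enzyme site (DpnII, GATC) on either side; treating the absence of any second-enzyme site in the supplied window as unsuitable is an assumption, not something stated in the text.

```python
import re


def nearest_secondary_distance(seq, primary_pos, primary="AAGCTT", secondary="GATC"):
    """Distance (bp) from the edges of the primary site at `primary_pos` to the
    nearest secondary site on either side, or None if there is none."""
    seq = seq.upper()
    p_start, p_end = primary_pos, primary_pos + len(primary)
    distances = []
    for m in re.finditer(f"(?={secondary})", seq):
        s_start, s_end = m.start(), m.start() + len(secondary)
        if s_start >= p_end:
            distances.append(s_start - p_end)   # downstream of the site
        elif s_end <= p_start:
            distances.append(p_start - s_end)   # upstream of the site
        else:
            distances.append(0)                 # sites overlap
    return min(distances) if distances else None


def is_suitable_bait(seq, primary_pos, min_dist=350):
    """Apply the 'more than about 350 bp' preference; no secondary site counts as unsuitable."""
    d = nearest_secondary_distance(seq, primary_pos)
    return d is not None and d > min_dist


good = "C" * 400 + "AAGCTT" + "T" * 360 + "GATC"
bad = "C" * 400 + "AAGCTT" + "T" * 100 + "GATC"
print(is_suitable_bait(good, 400), is_suitable_bait(bad, 400))  # True False
```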
Preferably, the nucleotide sequence is labeled.
Preferably, the probes are complementary in sequence to the nucleic acid sequences adjacent to both sides of each recognition site of the first restriction enzyme in the genomic DNA.
Preferably, the probe is complementary in sequence to a nucleic acid sequence that is less than 300 base pairs from each recognition site of the first restriction enzyme in the genomic DNA.
Preferably, the probe is complementary to a sequence that is 200-300 bp from each recognition site of the first restriction enzyme in the genomic DNA.
Preferably, the probe is complementary to a sequence that is 100-200 bp or 0-100 bp from each recognition site of the first restriction enzyme in the genomic DNA.
Preferably, two or more probes are capable of hybridizing to the sequences adjacent to each recognition site of the first restriction enzyme in the genomic DNA.
Preferably, the probes overlap or partially overlap.
Preferably, the overlap is less than 10 nucleotides.
Preferably, the probe sequence corresponds to all or part of the sequence between each recognition site of the first restriction enzyme and the first adjacent recognition site of the second restriction enzyme.
Preferably, each probe is at least a 25 mer.
Preferably, each probe is a 25-60 mer.
Preferably, the probe is a PCR amplification product.
Preferably, the array comprises between about 300,000 and 400,000 probes.
Preferably, the array comprises about 385,000 or more probes, preferably about 750,000 probes, more preferably 6 x 750,000 probes.
Preferably, the array comprises or consists of a lower resolution representation of the complete genome of a given species.
Preferably, only one of every 2, 3, 4, 5, 6, 7, 8, 9 or 10 probes, taken in order along the linear chromosome template, is included in the array.
Preferably, a transition of the interaction frequency from low to high indicates the location of a balanced and/or unbalanced breakpoint.
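Building such a lower-resolution array from a full probe set amounts to systematic sampling along each chromosome. A minimal sketch, assuming probes are given as hypothetical (chromosome, position, sequence) tuples with placeholder sequences; chromosome names are sorted as plain strings, which is adequate for grouping but not for natural chromosome order.

```python
from itertools import groupby


def thin_probes(probes, n):
    """Keep one of every `n` probes per chromosome, in positional order.

    probes: iterable of (chromosome, position, sequence) tuples.
    """
    ordered = sorted(probes, key=lambda p: (p[0], p[1]))
    kept = []
    for _chrom, group in groupby(ordered, key=lambda p: p[0]):
        kept.extend(list(group)[::n])  # probes 0, n, 2n, ... on this chromosome
    return kept


# "N" is a placeholder probe sequence for illustration only.
probes = [("chr1", pos, "N") for pos in (10, 20, 30, 40, 50)] + [("chr2", 5, "N"), ("chr2", 15, "N")]
print([(c, p) for c, p, _ in thin_probes(probes, 2)])
# [('chr1', 10), ('chr1', 30), ('chr1', 50), ('chr2', 5)]
```

Restarting the count on each chromosome keeps the sampling even along every chromosome rather than across the whole probe list.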
Preferably, an inversion pattern of the DNA-DNA interaction frequency of the subject sample relative to the interaction frequency of the control indicates balanced and/or unbalanced inversion.
Preferably, the combination of a decrease in the frequency of DNA-DNA interaction of the subject sample relative to the frequency of interaction of the control, and an increase in the frequency of DNA-DNA interaction with more distant regions, is indicative of a balanced and/or unbalanced deletion.
Preferably, an increase or decrease in the DNA-DNA interaction frequency of the subject sample relative to the interaction frequency of the control is indicative of a balanced and/or unbalanced duplication or insertion.
Preferably, spectral karyotyping (SKY) and/or FISH is used before the method is performed.
Preferably, the disease is a genetic disease.
Preferably, the disease is cancer.
Preferably, the two or more amplified sequences are differentially labeled.
Preferably, when two or more amplified sequences are located on different chromosomes, the sequences are labeled identically.
Preferably, two or more amplified sequences are labeled identically when they are located on the same chromosome far enough apart that there is minimal overlap between their DNA-DNA interaction signals.
Preferably, the diagnosis or prognosis is a prenatal diagnosis or prognosis.
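These labelling rules can be sketched as a greedy grouping of baits into label classes. The bait names and coordinates are hypothetical, as is the 10 Mb minimum separation used in the example (chosen only because the figures show high cis signal extending over more than 5 Mb around a bait); a greedy assignment is simple but does not guarantee the fewest labels.

```python
def assign_labels(baits, min_separation):
    """Greedily group baits that may share one label.

    baits: iterable of (name, chromosome, position) tuples.
    Two baits may share a label if they lie on different chromosomes or are
    more than `min_separation` bp apart on the same chromosome.
    """
    groups = []
    for bait in sorted(baits, key=lambda b: (b[1], b[2])):
        for group in groups:
            if all(other[1] != bait[1] or abs(other[2] - bait[2]) > min_separation
                   for other in group):
                group.append(bait)
                break
        else:  # no compatible group found: open a new label
            groups.append([bait])
    return groups


baits = [("A", "chr7", 100_000_000), ("B", "chr7", 103_000_000),
         ("C", "chr8", 50_000_000), ("D", "chr7", 130_000_000)]
print([[name for name, _, _ in g] for g in assign_labels(baits, 10_000_000)])
# [['A', 'D', 'C'], ['B']]
```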
Advantages of the invention
The present invention has many advantages. These advantages will be apparent from the following description.
For example, the present invention is advantageous because it provides nucleotide sequences, methods, probes and arrays that are particularly suitable for commercial use.
By way of further example, the present invention is advantageous because it provides a method for analyzing the frequency of interaction of two or more nucleotide sequences in the nuclear space at high throughput.
By way of further example, the present invention is advantageous because, with conventional 3C techniques, each DNA-DNA interaction must be analyzed in a separate PCR reaction with its own primer pair. High-throughput analysis is then possible only if PCR is automated, and the cost of so many primers is high, so high-throughput (genome-wide) analysis of DNA-DNA interactions with conventional 3C technology is not feasible. In contrast, the present invention makes it possible to screen thousands of DNA-DNA interactions simultaneously, greatly increasing the scale and resolution of the analysis.
By way of further example, the present invention is advantageous because, with conventional 3C techniques, screening is biased towards the DNA sequences for which oligonucleotide primers are designed and included in the analysis. These primers are generally selected on the basis of knowledge of the location of, for example, (distant) enhancers and/or other regulatory elements or hypersensitive sites that are believed to cross-link with the nucleotide sequence under investigation. Thus, whereas conventional 3C is biased by the design of the PCR primers used in the amplification step, 4C is unbiased and can be used to search the complete genome for interacting DNA elements. This is because amplification of the cross-linked sequences in 4C does not rely on prior knowledge of the sequences cross-linked to the nucleotide sequence under study. More specifically, in one embodiment of 4C, the sequences cross-linked to the first (target) nucleotide sequence can be amplified using PCR primers that hybridize to the target nucleotide sequence. Thus, the present invention enables unbiased, genome-wide screening of DNA-DNA interactions.
By way of further example, the present invention is advantageous because conventional 3C techniques selectively amplify only a single DNA-DNA interaction per reaction, which is not informative when hybridized to an array. The technique has been improved so that all fragments that interact with the first (target) nucleotide sequence are now amplified, e.g., selectively amplified.
By way of further example, the present invention is beneficial because 4C technology can be used to detect balanced or unbalanced genetic abnormalities in nucleic acids (e.g., chromosomes), such as all types of translocations, deletions, inversions, duplications and other genomic rearrangements. Because it measures the proximity of DNA fragments, the 4C technique can even determine a subject's propensity to acquire certain (balanced or unbalanced) translocations, deletions, inversions, duplications and other genomic rearrangements. An advantage over current strategies is that knowledge of the exact location of the change is not required: the resolution of 4C technology allows rearrangements to be detected even when the '4C baits' (defined by the first and second restriction enzyme recognition sites being analyzed) are far from the change (e.g., up to 1 Mb or even more). Another advantage is that the 4C technique allows accurate mapping of changes, since it can identify the two (first) restriction enzyme sites between which a change occurs. A further advantage is that cells do not need to be cultured before fixation, so genomic rearrangements in, for example, solid tumors can also be analyzed.
By way of further example, the present invention is advantageous because 4C technology is also capable of detecting changes (e.g., rearrangements) in a pre-malignant state (i.e., before all cells contain such changes). Thus, the technique can be used not only to diagnose a disease, but also to predict a disease.
By way of further example, the array design described herein is particularly advantageous over existing genomic tiling arrays (e.g., NimbleGen genomic tiling arrays) because it can represent a much larger portion of the genome on each individual array. For example, for a restriction enzyme that recognizes a six-nucleotide sequence, 3 arrays (each with about 385,000 probes) would be sufficient to cover the entire human or mouse genome. For restriction enzymes recognizing more than 6 bp, a single array of about 385,000 probes can be used to cover, for example, the entire human or mouse genome. The advantages of this array design are: (1) each probe is highly informative, because each probe analyzes an independent ligation event, which greatly facilitates interpretation of the results; and (2) a large part of the genome can be represented on a single array, which is cost-effective.
The 4C technique can be advantageously used to fine-map rearrangements that were originally detected, but not well characterized, by cytogenetic methods (light microscopy, FISH, SKY, etc.).
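The array arithmetic can be roughed out by assuming that an n-bp site occurs on average once every 4^n bp, as in a random sequence with equal base frequencies. This is only an approximation: real genomes are not random, so actual site counts (and the number of probes per site chosen in a given design) determine the real number of arrays, and the estimate is not expected to reproduce the array counts stated in the text exactly. The 3 Gb genome size and one probe per site are illustrative assumptions.

```python
import math


def expected_sites(genome_size_bp, site_len):
    """Expected number of sites for an n-bp recognition sequence in a random genome."""
    return genome_size_bp / 4 ** site_len


def arrays_needed(n_probes, probes_per_array=385_000):
    """Number of arrays of a given capacity needed to hold `n_probes` probes."""
    return math.ceil(n_probes / probes_per_array)


genome = 3e9  # assumed, illustrative genome size (bp)
for n in (6, 8):
    sites = expected_sites(genome, n)
    print(n, round(sites), arrays_needed(round(sites)))
# 6 732422 2
# 8 45776 1
```

The 16-fold drop in expected sites from a 6-bp to an 8-bp recognition sequence is why a single array suffices for enzymes with longer recognition sites.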
The 4C technique can be advantageously used to simultaneously screen combinations of rearrangements occurring near multiple loci on a single array.
Brief Description of Drawings
FIG. 1
Schematic view of the principle of 3C technology.
FIG. 2
(a) Principle of one embodiment of 4C technology. 3C analysis is carried out as usual with a restriction enzyme such as HindIII (H). After reversal of the cross-links, the DNA mixture contains the first (target) nucleotide sequence ligated to a number of different fragments. After digestion with DpnII and formation of DpnII circles, these fragments are amplified and labeled, for example by an amplification method such as inverse PCR with primers specific for the first (target) nucleotide sequence. The labeled amplification products can be hybridized to an array as described herein. HindIII and DpnII are exemplified, but other combinations of restriction enzymes (e.g., 6- or 8-bp cutters combined with 4- or 5-bp cutters) can be used. (b) Gel electrophoresis of PCR products from two independent fetal liver (L1, L2) and brain (B1, B2) samples. (c) Schematic of the microarray probe positions. Probes were designed within 100 bp of HindIII sites, so each probe interrogates one possible ligation partner.
FIG. 3
The 4C technique detects the genomic environment of β-globin (chromosome 7). The raw ratio (4C signal for β-globin HS2 divided by the signal obtained for the control sample) is shown for probes located in a ~35 Mb genomic region on mouse chromosomes 10, 11, 12, 14, 15, 7 and 8 (from top to bottom; the regions shown are at the same distance from the respective centromeres). Note the large cluster of strong signals around the (globin) bait on chromosome 7 (row 6), which confirms that the 4C technique detects genomic fragments close to the bait on the linear chromosome template (consistent with interaction frequencies decreasing as the linear distance between loci increases). Note also that the cis region around the bait showing high signal density is large (> 5 Mb), suggesting that, e.g., translocations can be detected even with baits located more than 1 Mb from the breakpoint.
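The raw ratio, and the cis enrichment visible around the bait, can be summarised per chromosome. A minimal sketch with hypothetical probe tuples and an arbitrary threshold: it simply counts, per chromosome, the probes whose 4C signal is at least `threshold` times the control signal, without normalisation or statistics, and it assumes positive control signals.

```python
from collections import defaultdict


def high_signal_fraction(probes, threshold):
    """Fraction of probes per chromosome whose raw ratio (4C signal / control) >= threshold.

    probes: iterable of (chromosome, position, signal_4c, signal_control) tuples
    with positive control signals.
    """
    total, high = defaultdict(int), defaultdict(int)
    for chrom, _pos, signal, control in probes:
        total[chrom] += 1
        if signal / control >= threshold:
            high[chrom] += 1
    return {chrom: high[chrom] / total[chrom] for chrom in total}


probes = [("chr7", 1, 40, 10), ("chr7", 2, 30, 10), ("chr7", 3, 5, 10),
          ("chr11", 1, 10, 10), ("chr11", 2, 8, 10)]
print(high_signal_fraction(probes, threshold=2.0))
# {'chr7': 0.6666666666666666, 'chr11': 0.0}
```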
FIG. 4
The 4C technique detects the genomic environment of Rad23A (chromosome 8). The raw ratio (4C signal for Rad23A divided by the signal from the control sample) is shown for probes located in genomic regions of ~15 Mb or more on mouse chromosomes 10, 11, 12, 14, 15, 7 and 8 (from top to bottom; the regions shown are at the same distance from the respective centromeres). Note the large cluster of strong signals around the bait (Rad23A) on chromosome 8 (row 7), which confirms that the 4C technique detects genomic fragments close to the bait on the linear chromosome template (consistent with interaction frequencies decreasing as the linear distance between loci increases). Note also that the cis region around the bait showing high signal density is large (> 5 Mb), suggesting that, e.g., translocations can be detected even with baits located more than 1 Mb from the breakpoint.
FIG. 5
4C interactions of β-globin along chromosome 7 (~135 Mb) in a tissue in which the gene is transcribed (fetal liver) and one in which it is not (fetal brain), analyzed using a running mean. Note that the long-range interactions of β-globin differ between the tissues (possibly depending on the transcriptional status of the gene). A strong 4C signal marks a large region (> 5 Mb) around the bait, independent of the tissue.
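A running mean over probes ordered along the chromosome smooths the per-probe 4C ratios. The sketch below is a hypothetical, unweighted sliding-window version; the window size used for FIG. 5 is not stated, and the example values are invented.

```python
def running_mean(values, window):
    """Mean of each run of `window` consecutive values (len(values) - window + 1 results)."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one probe
        means.append(total / window)
    return means


print(running_mean([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

Updating a running total keeps the cost linear in the number of probes, which matters for chromosome-wide profiles with tens of thousands of probes.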
FIG. 6
In fetal liver cells, Uros and Eraf interact with β-globin. The 4C method revealed that both genes (Eraf and Uros), located some 30 Mb or more from the β-globin locus, interact with it. These two interactions were previously discovered by a different technique (fluorescence in situ hybridization) and are described in Osborne et al., Nature Genetics 36, 1065 (2004). This example shows that long-range interactions detected by the 4C technique can be verified by FISH and truly reflect nuclear proximity.
FIG. 7
The 4C technique accurately identifies transitions between cis-linked, unrelated genomic regions. For these experiments, transgenic mice were used that carry a human β-globin Locus Control Region (LCR) expression cassette (~20 kb) inserted (by homologous recombination) into the Rad23A locus on mouse chromosome 8. The 4C technique was performed on E14.5 fetal liver from transgenic embryos homozygous for the insertion. A HindIII fragment within the integrated expression cassette (HS2) was used as the 4C bait. The data show that the 4C technique accurately determines both the position of integration on mouse chromosome 8 (top row: signal on chromosome 8, with the arrow marking the integration site, compared with signal on 6 other mouse chromosomes) and the two ends of the transgene expression cassette (bottom row: signal is detected only for probes within the human LCR (20 kb), and not for probes elsewhere in the 380 kb human β-globin sequence), clearly revealing the integration site on mouse chromosome 8 (the complete chromosome is depicted). This example shows that 4C technology can be used to detect the genomic position of ectopically integrated DNA fragments (viruses, transgenes, etc.). It also shows that transitions between cis-linked, unrelated genomic regions can be accurately identified, which can be used to identify genomic breakpoints and translocation partners.
FIG. 8
The 4C technique produces reproducible data, and the profiles obtained for HS2 and β-major are very similar. Four biologically independent 4C experiments were performed on E14.5 fetal liver using either the β-globin gene β-major (top 2 rows) or β-globin HS2 (bottom 2 rows) as bait. These baits are ~40 kb apart on the linear chromosome template but have previously been shown to be close in nuclear space (Tolhuis et al., Molecular Cell 10, 1453 (2002)). Shown is a 5 Mb region on mouse chromosome 7 located approximately 20 Mb from the β-globin locus. The data show high reproducibility between independent experiments and confirm that 2 fragments close in nuclear space share interaction partners located elsewhere in the genome.
FIG. 9
The 4C technique was used to measure the DNA-DNA interaction frequencies of sequence X (on chromosome A) in cells from a healthy person (top) and from a patient carrying a t(A;B) translocation (bottom). The signal intensity (Y-axis), representing the frequency of DNA-DNA interaction, is plotted against the probes (X-axis) ordered along the linear chromosome template. In normal cells, frequent DNA-DNA interactions are detected on chromosome A around sequence X. In patient cells, a 50% decrease in interaction frequency is observed for probes located on chromosome A on the other side of the breakpoint (BP) (compare the gray curve (patient) with the black line (healthy person)). Moreover, the translocation brings part of chromosome B physically close to sequence X, and frequent DNA-DNA interactions are now observed for this region of chromosome B. The sudden transition from low to high interaction frequency on this chromosome marks the location of its breakpoint.
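The transition described above can also be located computationally. The sketch below is a minimal, illustrative change-point search, assuming that the 4C signals for one chromosome are already ordered by probe position; the function name `locate_transition` and its `min_segment` parameter are hypothetical, and real data would additionally need noise handling and statistical testing.

```python
from statistics import mean


def locate_transition(signals: list[float], min_segment: int = 3) -> int:
    """Estimate a single breakpoint in probe-ordered 4C signals.

    Returns the index i that maximizes |mean(signals[i:]) - mean(signals[:i])|,
    i.e. the split between a low-signal and a high-signal segment. Both
    segments must contain at least `min_segment` probes (hypothetical safeguard).
    """
    if len(signals) < 2 * min_segment:
        raise ValueError("need at least 2 * min_segment probes")
    best_index, best_gap = min_segment, float("-inf")
    for i in range(min_segment, len(signals) - min_segment + 1):
        gap = abs(mean(signals[i:]) - mean(signals[:i]))
        if gap > best_gap:
            best_index, best_gap = i, gap
    return best_index


# Toy chromosome B profile: low signal before the breakpoint, high after it.
profile = [0.1, 0.2, 0.1, 0.15, 0.1, 2.0, 2.2, 1.9, 2.1, 2.0]
print(locate_transition(profile))  # 5: first probe on the high-signal side
```

A single mean-difference split is the simplest possible detector; profiles with several rearrangements would need a multi-breakpoint method.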
FIG. 10
The 4C technique can detect one or more (balanced) inversions. An inverted pattern of DNA-DNA interaction frequencies (measured by the 4C technique as hybridization signal intensity) is observed in the diseased subject (solid curve) compared with the non-diseased subject (dotted curve), revealing the presence and size of the inversion.
FIG. 11
Detection of heterozygous deletions by the 4C technique. Probes with reduced DNA-DNA interaction frequency (measured by the 4C technique as hybridization signal intensity) in the diseased subject (gray curve) compared with the non-diseased subject (black curve) reveal the location and size of the deleted region. The remaining hybridization signal in the deleted region of the diseased subject comes from the intact allele (heterozygous deletion). Deletions are usually accompanied by an increase in signal intensity for probes located directly outside the deleted region (note that the gray curve lies above the black curve to the right of the deletion), because these regions are now physically closer to the 4C sequence (bait).
FIG. 12
Detection of duplications by the 4C technique. Probes with increased hybridization signal in the patient (gray curve) compared with the normal subject (black curve) indicate the location and size of the duplication. Duplications detected by the 4C technique are usually accompanied by a decrease in hybridization signal for probes outside the duplicated region compared with non-diseased subjects, because the duplication increases their genomic distance from the 4C sequence (bait).
FIG. 13
4C reveals long-range interactions of β-globin. a, The raw ratio of 4C to control hybridization signals reveals interactions of β-globin HS2 within chromosome 7 and with 2 unrelated chromosomes (8 and 14). b-c, Raw data for 2 independent fetal liver samples (top, red) and 2 independent fetal brain samples (bottom, blue), plotted along 2 different 1-2 Mb regions of chromosome 7. Highly reproducible clusters of interactions were observed in the 2 fetal liver samples (b) or in the 2 brain samples (c). d-e, Running mean data for the same regions. The false discovery rate is set at 5% (dotted line). f, Schematic representation of the regions on chromosome 7 that interact with active (fetal liver, top) and inactive (fetal brain, bottom) β-globin.
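The running-mean panels (d-e) smooth the raw probe signals along the chromosome. A minimal sketch of such smoothing follows, assuming a centered window of consecutive probes. The window size and threshold are hypothetical, and the threshold here only stands in for, and is not derived from, the 5% false discovery rate used in the figure.

```python
def running_mean(values: list[float], window: int) -> list[float]:
    """Centered running mean over `window` consecutive probes (window must be odd).

    Near the ends of the probe list the window is truncated, so each output is
    the mean of the probes that actually fall inside it.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def positive_probes(smoothed: list[float], threshold: float) -> list[int]:
    """Indices of probes whose smoothed signal exceeds a (hypothetical) threshold."""
    return [i for i, value in enumerate(smoothed) if value > threshold]


smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(smoothed)                        # [1.5, 2.0, 3.0, 4.0, 4.5]
print(positive_probes(smoothed, 2.5))  # [2, 3, 4]
```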
FIG. 14
Active and inactive β-globin interact with active and inactive chromosomal regions, respectively. a, Comparison between long-range β-globin interactions in fetal liver (4C running mean, top), microarray expression analysis in fetal liver (logarithmic scale, middle) and gene positions (bottom) along a 4 Mb region containing the gene Uros (~30 Mb from β-globin) indicates that active β-globin preferentially interacts with other actively transcribed genes. b, A similar comparison around an olfactory receptor (OR) gene cluster located 38 Mb from globin in fetal brain shows that inactive β-globin preferentially interacts with inactive regions. c, Characterization of the regions interacting with β-globin in fetal liver (left) and brain (right) based on gene content and activity.
FIG. 15
The ubiquitously expressed Rad23A gene interacts with very similar active regions in fetal liver and brain. a, Schematic representation of the regions on chromosome 8 that interact with active Rad23A in fetal liver (top, red) and brain (bottom, blue). b, Comparison of Rad23A long-range interactions (4C running mean) with microarray expression analysis (logarithmic scale) in fetal liver (top two rows) and in fetal brain (rows 3 and 4), together with gene positions along a 3 Mb region of chromosome 8 (bottom row). c, Characterization of the regions interacting with Rad23A in fetal liver (left) and brain (right) based on gene content and activity.
FIG. 16
Cryo-FISH confirms that the 4C technique identifies genuinely interacting regions. a, Example of part of an ultrathin (200 nm) cryosection showing more than 10 nuclei, some of which contain the β-globin locus (green) and/or Uros (red). Because of the sectioning, many nuclei do not contain signals for both loci. b-d, Examples of fully (b) and partially (c) overlapping signals and of touching signals (d), all of which were scored positive for interaction. e-g, Examples of nuclei containing both loci without contact (e-f) and of nuclei containing only β-globin (g), all of which were scored negative for interaction. h-i, Schematic summary of the cryo-FISH results. The percentages of interaction with β-globin (h) and Rad23A (i) are shown above the chromosome for regions identified by the 4C technique as positive (red arrows) and as negative (blue arrows). The same BACs were used for both tissues. The interaction frequency between two distant OR gene clusters in fetal liver and brain, as measured by cryo-FISH, is written below the chromosome.
FIG. 17
4C analysis of HS2 and β-major gives highly similar results. (a) Raw 4C data for 4 independent E14.5 liver samples show very similar patterns of interaction with HS2 (top) and with β-major (bottom). (b) There is a large overlap between probes scored positive for interaction in the HS2 experiments and probes scored positive for interaction in the β-major experiments.
FIG. 18
Comparison of cis and trans interactions. (a) Raw 4C data from 2 independent experiments showing interactions of β-globin with a cis region identified as positive (chromosome 7, top) and with the trans region containing the α-globin locus (chromosome 11, bottom). (b) Raw 4C data from 2 independent experiments showing interactions of Rad23A with a cis region identified as positive (chromosome 8, top) and with the trans region that ranks highest by running mean (chromosome 11, bottom). No trans region met the stringent criteria used to identify the long-range interacting cis regions.
FIG. 19
Regions that interact with β-globin also frequently contact each other. Two regions (almost 60 Mb apart) that contain actively transcribed genes and were identified by the 4C technique as interacting with β-globin in fetal liver showed a co-localization frequency of 5.5% by cryo-FISH, which is significantly higher than the background co-localization frequency.
Detailed Description
3C technique
The 3C method has been described in detail in Dekker et al. (2002), Tolhuis et al. (2002), Palstra et al. (2003), Splinter et al. (2004) and Drissen et al. (2004). Briefly, in 3C, cross-linked DNA is digested with a primary restriction enzyme and then ligated at very low DNA concentration. Under these conditions, intramolecular ligation of cross-linked fragments is strongly favored over intermolecular ligation of random fragments. The cross-links are then reversed, and the individual ligation products are detected and quantified by polymerase chain reaction (PCR) using locus-specific primers. The cross-linking frequency (X) of two specific loci is determined by quantitative PCR on control and cross-linked templates, and X is expressed as the ratio of the amount of product obtained with the cross-linked template to the amount obtained with the control template.
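The ratio that defines X can be illustrated with threshold-cycle (Ct) values from quantitative PCR. This sketch is an illustration under stated assumptions, not the quantification protocol of the cited papers: it assumes that product amount scales as (1 + E)^(-Ct) with the same amplification efficiency E for both templates (E = 1.0 meaning perfect doubling per cycle), and the function name is hypothetical.

```python
def crosslinking_frequency(ct_crosslinked: float, ct_control: float,
                           efficiency: float = 1.0) -> float:
    """Cross-linking frequency X = amount(cross-linked) / amount(control).

    Assumes amount ~ (1 + efficiency) ** (-Ct) for both templates, so a
    product that crosses the threshold earlier is more abundant.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    base = 1.0 + efficiency
    return base ** (ct_control - ct_crosslinked)


# Cross-linked template reaches threshold 2 cycles earlier than the control:
print(crosslinking_frequency(ct_crosslinked=25.0, ct_control=27.0))  # 4.0
```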
According to the present invention, 3C templates are prepared using the method described by Splinter et al. (2004), Methods Enzymol. 375, 493-, i.e. formaldehyde fixation, digestion with the (primary) restriction enzyme, religation of cross-linked DNA fragments and DNA purification. Briefly, a sample (e.g., cells, tissue or nuclei) is fixed with a cross-linking agent (e.g., formaldehyde). The DNA is then digested with a first restriction enzyme within the confines of the cross-linked nucleus. Next, ligation is performed at low DNA concentration (e.g., about 3.7 ng/μl), which favors ligation between cross-linked DNA fragments (i.e., intramolecular ligation) over ligation between non-cross-linked DNA fragments (i.e., intermolecular or random ligation). The cross-links are then reversed and the DNA is purified. The resulting 3C template contains restriction fragments that were ligated because they were originally in proximity in the nuclear space.
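The dilution step amounts to choosing a ligation volume for a given amount of digested DNA. A minimal arithmetic sketch, assuming the concentration of about 3.7 ng/μl given above and a hypothetical DNA input:

```python
def ligation_volume_ml(dna_ng: float, target_ng_per_ul: float = 3.7) -> float:
    """Volume (ml) needed to ligate `dna_ng` of DNA at `target_ng_per_ul`."""
    if dna_ng <= 0 or target_ng_per_ul <= 0:
        raise ValueError("amounts must be positive")
    volume_ul = dna_ng / target_ng_per_ul
    return volume_ul / 1000.0


# Hypothetical input of 10 ug (10,000 ng) of digested DNA:
print(round(ligation_volume_ml(10_000), 2))  # 2.7
```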
Because the primary restriction enzyme digests the DNA before the intramolecular ligation step, the recognition site of the primary restriction enzyme separates the first (target) nucleotide sequence from the nucleotide sequence ligated to it. Thus, the first recognition site is located between the first (target) nucleotide sequence and the ligated nucleotide sequence (i.e., the ligated second sequence).
Nucleotide sequence
The present invention relates to the use of nucleotide sequences (e.g. 3C templates, 4C templates, DNA templates, amplification templates, DNA fragments and genomic DNA), which may be available in databases.
The nucleotide sequence may be DNA or RNA, such as cDNA, of genomic, synthetic or recombinant origin. For example, recombinant nucleotide sequences can be prepared using PCR cloning techniques. This would involve preparing primer pairs flanking the region of the sequence to be cloned, contacting the primers with mRNA or cDNA obtained from, for example, a mammalian (e.g., animal or human cells) or non-mammalian cell, performing Polymerase Chain Reaction (PCR) under conditions that amplify the region of interest, isolating the amplified fragments (e.g., by purifying the reaction mixture on an agarose gel) and harvesting the amplified DNA. The primers may be designed to contain suitable restriction enzyme recognition sites so that the amplified DNA can be cloned into a suitable cloning vector.
The nucleotide sequence may be double-stranded or single-stranded, whether representing the sense or antisense strand or a combination thereof.
For some aspects, it is preferred that the nucleotide sequence is single-stranded DNA, such as single-stranded primers and probes.
For some aspects, it is preferred that the nucleotide sequence is double-stranded DNA, such as double-stranded 3C and 4C templates.
For some aspects, it is preferred that the nucleotide sequence is genomic DNA, such as one or more genomic loci.
For some aspects, it is preferred that the nucleotide sequence is chromosomal DNA.
The nucleotide sequence may comprise a first (target) nucleotide sequence and/or a second nucleotide sequence.
In one aspect, a circularized nucleotide sequence is provided comprising a first nucleotide sequence and (e.g., linked to) a second nucleotide sequence, said first and second nucleotide sequences being separated (e.g., split or cleaved) by a first and a second restriction enzyme recognition site, wherein said first nucleotide sequence is a target nucleotide sequence and said second nucleotide sequence is obtainable by cross-linking genomic DNA (e.g., in vivo or in vitro). The first and second restriction enzyme recognition sites will be different from each other and will usually occur only once in the nucleotide sequence.
In another aspect, there is provided a circularized nucleotide sequence comprising a first nucleotide sequence and (e.g., linked to) a second nucleotide sequence, said first and second nucleotide sequences being separated (e.g., split or cleaved) by a first and a second restriction enzyme recognition site, wherein said first nucleotide sequence is a target nucleotide sequence, and wherein said first and second nucleotide sequences are obtainable by a process comprising the steps of: (a) cross-linking genomic DNA (e.g., in vivo or in vitro); (b) digesting the cross-linked DNA with a first restriction enzyme; (c) ligating the cross-linked nucleotide sequences; (d) reversing the cross-links; and (e) circularizing the nucleotide sequence by digesting the nucleotide sequence with a second restriction enzyme.
Preferably, the second nucleotide sequence interrupts (e.g., bisects) the first (target) nucleotide sequence. Thus, the nucleotide sequence comprises a second nucleotide sequence that separates the first (target) nucleotide sequence into two portions or fragments, e.g., 2 portions or fragments of approximately equal size. Typically each portion or fragment will be at least about 16 nucleotides in length.
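The circular product of steps (a) to (e), and the way the captured sequence interrupts the bait, can be traced on toy strings. This is a simplified illustration under stated assumptions: HindIII (AAGCTT) stands in for the first enzyme, NlaIII (CATG) for the second, and EcoRI (GAATTC) for an optional third enzyme that linearizes the circle; sticky-end overhangs are ignored, and the sequences and helper names are hypothetical.

```python
PRIMARY_SITE = "AAGCTT"    # HindIII, example first enzyme
SECONDARY_SITE = "CATG"    # NlaIII, example second enzyme
THIRD_SITE = "GAATTC"      # EcoRI, example third enzyme (cuts G^AATTC)


def circularize(ligation_product: str, secondary: str = SECONDARY_SITE) -> str:
    """Cut with the second enzyme on both sides of the first-enzyme junction,
    then ligate the two ends; one copy of the secondary site is regenerated.
    The circle is returned as a string starting at the regenerated site."""
    junction = ligation_product.find(PRIMARY_SITE)
    if junction == -1:
        raise ValueError("no first-enzyme junction")
    left = ligation_product.rfind(secondary, 0, junction)
    right = ligation_product.find(secondary, junction)
    if left == -1 or right == -1:
        raise ValueError("no second-enzyme site on one side of the junction")
    return secondary + ligation_product[left + len(secondary):right]


def linearize(circle: str, site: str = THIRD_SITE, cut_offset: int = 1) -> str:
    """Open the circle at the third-enzyme site (cut after `cut_offset` bases)."""
    start = circle.find(site)
    if start == -1:
        raise ValueError("third-enzyme site not found")
    cut = start + cut_offset
    return circle[cut:] + circle[:cut]


bait = "ACGT" + SECONDARY_SITE + "TTTT" + THIRD_SITE + "CCCC"  # bait fragment
captured = "GGGG" + SECONDARY_SITE + "ACGT"                     # captured fragment
product = bait + PRIMARY_SITE + captured  # ligation product of step (c)

circle = circularize(product)
print(circle)             # CATGTTTTGAATTCCCCCAAGCTTGGGG
print(linearize(circle))  # AATTCCCCCAAGCTTGGGGCATGTTTTG: GGGG lies between bait pieces
```

In a real experiment both segments are hundreds of base pairs long and the captured segment is unknown; primers on the bait read outward into it.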
A first nucleotide sequence
The first nucleotide sequence is a target nucleotide sequence.
The term "target nucleotide sequence" as used herein refers to a sequence that serves as a bait sequence for identifying one or more sequences (e.g., one or more nucleotide sequences of interest, or one or more sequences of unknown nucleotide composition) to which it is cross-linked.
The sequence of the target nucleotide sequence is known.
Cross-linking indicates that the target nucleotide sequence and the sequence cross-linked to it were originally close in the nuclear space. By determining how frequently sequences are close to each other, it is possible, for example, to understand the conformation of chromosomes and chromosomal regions in nuclear space (e.g., in vivo or in vitro). Furthermore, complex structural organization in the genome may be understood, for example when enhancers or other transcriptional regulatory elements interact with promoters positioned in cis or even in trans. It is even possible to understand the positioning of a given genomic region relative to nucleotide sequences on the same chromosome (cis) as well as nucleotide sequences on other chromosomes (trans). Thus, nucleotide sequences on different chromosomes that frequently share sites in the nuclear space can be mapped. Furthermore, it is even possible to detect balanced and/or unbalanced genetic abnormalities such as balanced and/or unbalanced translocations, deletions, inversions, duplications and other genomic rearrangements (such as deletions or translocations in one or more chromosomes). In this regard, genetic abnormalities can lead to detectable changes in DNA-DNA interactions at the location where the change occurs.
The first (target) nucleotide sequence according to the invention may be any sequence for which it is desirable to determine the frequency of interaction in the nuclear space with one or more other sequences.
In a specific embodiment, the first (target) nucleotide sequence will be longer than about 350 bp, because the second restriction enzyme is selected to cleave the first (target) nucleotide sequence at a distance of about 350 bp or more from the first restriction site. This minimizes bias in loop formation due to topological constraints (Rippe et al. (2001) Trends Biochem. Sci. 26, 733-740).
Suitably, the amplified first (target) nucleotide sequence comprises at least about 32bp, since the minimum length of at least 2 amplification primers used for amplifying the second nucleotide sequence is about 16 bases each.
In a preferred embodiment, the first (target) nucleotide sequence may comprise, in whole or in part (e.g., a fragment thereof), or be in proximity to (e.g., adjacent to): promoters, enhancers, silencers, insulators, matrix attachment regions, locus control regions, transcription units, origins of replication, recombination hotspots, translocation breakpoints, centromeres, telomeres, gene-dense regions, gene-poor regions, repetitive elements, (viral) integration sites, nucleotide sequences whose deletions and/or mutations are associated with an effect (such as a disease, physiological, functional or structural effect, e.g. an SNP (single nucleotide polymorphism)), nucleotide sequences containing such deletions and/or mutations, or any sequence for which it is desired to determine the frequency of interaction with other sequences in the nuclear space.
As described above, the first (target) nucleotide sequence may comprise, in whole or in part (e.g., a fragment), or be in proximity to (e.g., adjacent to), a nucleotide sequence in which a genetic abnormality (e.g., a deletion and/or mutation) is associated with an effect (e.g., a disease). Thus, according to this embodiment of the invention, the first (target) nucleotide sequence may be a nucleotide sequence (e.g., a gene or locus) in which a change is associated with a disease (e.g., a genetic or congenital disease), a sequence adjacent to it (on the physical DNA template), or a sequence within such a genomic region. In other words, the first (target) nucleotide sequence may be selected on the basis of its association with a clinical phenotype. In preferred embodiments, the change is a change in one or more chromosomes, and the disease may be the result of, for example, one or more deletions, translocations, duplications and/or inversions therein. Non-limiting examples of such genes/loci are AML1, MLL, MYC, BCL, BCR, ABL1, immunoglobulin loci, LYL1, TAL1, TAL2, LMO2, TCR α/δ, TCR β, HOX and other loci involved in various lymphoblastic leukemias.
Other examples are described in electronic databases, such as:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cancerchromosomes
http://cgap.nci.nih.gov/chromosomes/Mitelman
http://www.progenetix.net/progenetix/P14603437/ideogram.html
http://www.changbioscience.com/cytogenetics/cytol.pl?query=47,xy
http://www.possum.net.au/
http://www.lmdatabases.com/
http://www.wiley.com/legacy/products/subject/life/borgaonkar/index.html
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
http://www.sanger.ac.uk/PostGenomics/decipher/
http://agserver01.azn.nl:8080/ecaruca/ecaruca.jsp
Other examples are described in "Catalogue of Unbalanced Chromosome Aberrations in Man", 2nd edition, Albert Schinzel, Walter de Gruyter, 2001, ISBN 3-11-011607-3.
In one embodiment, the term "adjacent" refers to "directly adjacent" such that there are no intervening nucleotides between 2 adjacent sequences.
In another embodiment, the term "adjacent" in the context of the nucleic acid sequence and the primary restriction enzyme recognition site refers to "directly adjacent" such that there are no intervening nucleotides between the nucleic acid sequence and the primary restriction enzyme recognition site.
A second nucleotide sequence
The second nucleotide sequence is obtainable, obtained, identifiable or identified by cross-linking genomic DNA (e.g., in vivo or in vitro).
The second nucleotide sequence (e.g., the nucleotide sequence of interest) becomes linked to the first (target) nucleotide sequence after the sample is treated with the cross-linking agent and the cross-linked DNA fragments are digested and ligated. This sequence becomes cross-linked to the first (target) nucleotide sequence because the two were originally close in the nuclear space, and it becomes ligated to the first (target) nucleotide sequence because the ligation conditions favor (intramolecular) ligation between cross-linked DNA fragments over random ligation events.
Diseases based on alterations such as translocations, deletions, inversions, duplications and other genomic rearrangements are generally caused by aberrant DNA-DNA interactions. The 4C technique measures the frequency of DNA-DNA interactions, which is primarily a function of genomic site separation, i.e., the frequency of DNA-DNA interactions is inversely proportional to the linear distance (in kilobases) between 2 DNA loci present on the same physical DNA template (Dekker et al., 2002). Thus, changes that produce new and/or physically different DNA templates are accompanied by changes in DNA-DNA interactions, which can be measured by the 4C technique.
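The inverse relationship stated above gives a simple way to reason about how a rearrangement changes 4C signals. A toy sketch follows, assuming that interaction frequency equals a constant divided by linear distance (in kb) and that a single deletion affects one physical template; the helper names are hypothetical.

```python
from typing import Optional


def expected_signal(bait_kb: float, probe_kb: float, scale: float = 1.0) -> float:
    """Toy model: interaction frequency inversely proportional to linear distance."""
    distance = abs(probe_kb - bait_kb)
    if distance == 0:
        raise ValueError("probe coincides with bait")
    return scale / distance


def position_after_deletion(pos_kb: float, del_start_kb: float,
                            del_end_kb: float) -> Optional[float]:
    """Map a reference coordinate onto a template lacking [del_start, del_end).

    Returns None for coordinates inside the deletion.
    """
    if pos_kb < del_start_kb:
        return pos_kb
    if pos_kb < del_end_kb:
        return None
    return pos_kb - (del_end_kb - del_start_kb)


# Bait at 0 kb, probe at 3,000 kb, deletion of 1,000-2,000 kb between them.
before = expected_signal(0, 3000)
moved = position_after_deletion(3000, 1000, 2000)
assert moved is not None
after = expected_signal(0, moved)
print(before < after)  # True: the probe beyond the deletion now lies closer to the bait
```

For a heterozygous deletion only one allele is rearranged, so under this toy model the measured signal would be roughly the average of the normal and the rearranged allele.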
Suitably, the second nucleotide sequence is at least 40 base pairs.
Cross-linking agents (e.g., formaldehyde) can be used to cross-link proteins to other nearby proteins and to nucleic acids. Thus, two or more nucleotide sequences may be cross-linked to each other via proteins bound to (one of) these nucleotide sequences. Cross-linking agents other than formaldehyde may also be used in accordance with the present invention, including agents that directly cross-link nucleotide sequences. Examples of agents that cross-link DNA include, but are not limited to, ultraviolet light, mitomycin C, nitrogen mustard, melphalan, 1,3-butadiene diepoxide, cis-diamminedichloroplatinum(II) and cyclophosphamide.
Suitably, the cross-linking agent forms cross-links spanning a relatively short distance (e.g., about 2 Å), thereby selecting for close interactions that can be reversed.
Cross-linking may be performed, for example, by incubating cells in 2% formaldehyde at room temperature, e.g. by incubating 1 × 10^7 cells for 10 minutes at room temperature in 10 ml of DMEM supplemented with 10% FCS and 2% formaldehyde.
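Preparing the 2% formaldehyde fixation mix is a C1V1 = C2V2 calculation. The sketch below assumes a 37% formaldehyde stock, which the text does not specify, and the 10 ml final volume of the example above; the function name is hypothetical.

```python
def stock_volume_ml(final_percent: float, final_volume_ml: float,
                    stock_percent: float = 37.0) -> float:
    """Volume of stock to add so that the final mix reaches `final_percent`
    (C1 * V1 = C2 * V2, with V2 the total final volume)."""
    if not 0 < final_percent < stock_percent:
        raise ValueError("final concentration must be below the stock concentration")
    return final_percent * final_volume_ml / stock_percent


v_stock = stock_volume_ml(2.0, 10.0)  # assumes a 37% stock
print(f"{v_stock:.2f} ml stock + {10.0 - v_stock:.2f} ml medium")  # 0.54 ml stock + 9.46 ml medium
```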
First restriction enzyme
The term "primary restriction enzyme" as used herein refers to the first restriction enzyme used to digest the cross-linked DNA.
The choice of the first restriction enzyme depends on the type of target sequence (e.g.locus) to be analyzed. It is desirable to perform preliminary experiments to optimize digestion conditions.
The primary restriction enzyme may be selected from restriction enzymes that recognize DNA sequences of at least 6 bp.
Restriction enzymes recognizing 6 bp DNA sequences include, but are not limited to, AclI, HindIII, SspI, BspLU11I, AgeI, MluI, SpeI, BglII, Eco47III, StuI, ScaI, ClaI, AvaIII, VspI, MfeI, PmaCI, PvuII, NdeI, NcoI, SmaI, SacII, AvrII, PvuI, XmaIII, SplI, XhoI, PstI, AflII, EcoRI, AatII, SacI, EcoRV, SphI, NaeI, BsePI, BamHI, rI, ApaI, KpnI, SnaI, SalI, ApaLI, HpaI, SnaBI, BspHI, pMBsiI, NruI, XbaI, BstI, BaidII, BspII, BsiI, BaihII, AilII, AsuII, and AsuIII.
Restriction enzymes recognizing DNA sequences longer than 6 bp include, but are not limited to, BbvCI, AscI, AsiSI, FseI, NotI, PacI, PmeI, SbfI, SgrAI, SwaI, SapI, CciNI, FspAI, MssI, SgfI, SmiI, SrfI and Sse8387I.
For some aspects of the invention, BglII, HindIII or EcoRI are the preferred restriction enzymes among those that recognize 6 bp sequences.
The term "primary restriction enzyme recognition site" refers to a site in a nucleotide sequence that is recognized and cleaved by a primary restriction enzyme.
Second restriction enzyme
The term "second restriction enzyme" as used herein refers to the restriction enzyme used after digestion with the first restriction enzyme, ligation of the cross-linked DNA, reversal of the cross-links and (optionally) purification of the DNA. In one embodiment, the second restriction enzyme is used to provide defined DNA ends for the nucleotide sequence of interest, allowing a sequence of known nucleotide composition to be ligated to the second restriction enzyme recognition sites flanking the nucleotide sequence of interest.
In one embodiment, ligating a sequence of known nucleotide composition to the second restriction enzyme recognition site flanking (e.g., on each side or end of) the nucleotide sequence of interest involves ligation under dilute conditions to facilitate intramolecular ligation between the second restriction enzyme recognition site flanking the target nucleotide sequence and the ligated nucleotide sequence of interest. This effectively results in the formation of a DNA loop in which the known target nucleotide sequence flanks the unknown sequence of interest.
In another embodiment, ligating a sequence of known nucleotide composition to a second restriction enzyme recognition site flanking (e.g., on each side or at each end of) the nucleotide sequence of interest involves adding a unique DNA sequence of known nucleotide composition and then ligating under conditions that promote intermolecular ligation between the second restriction enzyme recognition site flanking the nucleotide sequence of interest and the introduced unique DNA sequence of known nucleotide composition.
In one embodiment, the second restriction enzyme is selected such that the second restriction enzyme site is not within about 350bp (e.g., 350-400bp) of the first restriction site.
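A candidate second enzyme can be screened against this distance criterion in silico. A minimal sketch, assuming the bait fragment sequence is known and measuring distance between site start positions; the helper names and toy sequences are hypothetical, and a real design would scan the genome sequence around the bait.

```python
def site_positions(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of `site` in `seq` (overlaps allowed)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def secondary_site_ok(fragment: str, primary_site: str, secondary_site: str,
                      min_bp: int = 350) -> bool:
    """True if every secondary site lies at least `min_bp` from every primary
    site; False if either site is absent from `fragment`."""
    primary = site_positions(fragment, primary_site)
    secondary = site_positions(fragment, secondary_site)
    if not primary or not secondary:
        return False
    nearest = min(abs(p - s) for p in primary for s in secondary)
    return nearest >= min_bp


far = "CATG" + "A" * 400 + "AAGCTT"               # NlaIII site ~400 bp from HindIII site
near = "A" * 100 + "CATG" + "A" * 100 + "AAGCTT"  # only ~100 bp apart
print(secondary_site_ok(far, "AAGCTT", "CATG"))   # True
print(secondary_site_ok(near, "AAGCTT", "CATG"))  # False
```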
In another embodiment, the second restriction enzyme is selected such that a recognition site for the same second restriction enzyme is likely to be present in the ligated nucleotide sequence (i.e., the ligated cross-linked sequence). Because the ends of the first (target) nucleotide sequence and of the ligated nucleotide sequence are then compatible cohesive (or blunt) ends, the two can be ligated to each other so as to circularize the DNA. Thus, following the digestion step, ligation is performed under dilute conditions that favor intramolecular events, optionally circularizing the DNA via the compatible ends.
Preferably, the second restriction enzyme recognition site is a 4 or 5 bp nucleotide sequence. Enzymes that recognize 4 or 5 bp DNA sequences include, but are not limited to, TspEI, MaeII, AluI, NlaIII, HpaII, FnuDII, MaeI, DpnI, MboI, HhaI, HaeIII, RsaI, TaqI, CviRI, MseI, Sth132I, AciI, DpnII, Sau3AI and MnlI.
In a preferred embodiment, the second restriction enzyme is NlaIII and/or DpnII.
The term "secondary restriction enzyme recognition site" refers to a site in a nucleotide sequence that is recognized and cleaved by a secondary restriction enzyme.
After digestion with the second restriction enzyme, a further ligation reaction is performed. In one embodiment, this ligation reaction joins a DNA sequence of known nucleotide composition to one or more second restriction enzyme digestion sites on the sequence contiguous with the target nucleotide sequence.
Third restriction enzyme
The term "third restriction enzyme" as used herein refers to a third restriction enzyme that may optionally be used after the second restriction enzyme step in order to linearize the circularized DNA prior to amplification.
The third restriction enzyme is preferably an enzyme that recognizes a nucleotide recognition site of 6bp or more.
The third restriction enzyme preferably digests the first (target) nucleotide sequence between the recognition sites for the first and second restriction enzymes. As will be appreciated by those of ordinary skill, the third restriction enzyme should not cut the first (target) nucleotide sequence so close to the recognition sites of the first and second restriction enzymes that the amplification primers can no longer hybridize. Thus, it is preferred that the third restriction enzyme recognition site is at least as far from the first and second restriction enzyme recognition sites as the length of the primers used, so that one or more amplification primers can still hybridize.
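The placement rule for the third enzyme can be checked in the same way. A minimal sketch, assuming that `bait` is the target sequence lying between the first- and second-enzyme sites, a primer length of 16 nt (the minimum mentioned above), and, as an added assumption not stated in the text, that a usable enzyme should cut this stretch exactly once; the candidate enzyme table is illustrative.

```python
def usable_third_enzymes(bait: str, candidates: dict[str, str],
                         primer_len: int = 16) -> list[str]:
    """Names of candidate enzymes whose site occurs exactly once in `bait`
    and leaves at least `primer_len` bases on each side for primer binding."""
    usable = []
    for name, site in candidates.items():
        hits = [i for i in range(len(bait) - len(site) + 1)
                if bait.startswith(site, i)]
        if len(hits) != 1:
            continue
        start = hits[0]
        bases_left = start
        bases_right = len(bait) - (start + len(site))
        if bases_left >= primer_len and bases_right >= primer_len:
            usable.append(name)
    return usable


bait = "A" * 20 + "GAATTC" + "T" * 20 + "GGATCC" + "T" * 5
enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "PstI": "CTGCAG"}
print(usable_third_enzymes(bait, enzymes))  # ['EcoRI']: BamHI sits too close to the end
```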
In a preferred embodiment, the third restriction enzyme is a restriction enzyme recognizing a 6 bp DNA sequence.
The term "recognition site for a third restriction enzyme" refers to a site in a nucleotide sequence that is recognized and cleaved by the third restriction enzyme.
Recognition sites
Restriction enzymes are enzymes that cleave the sugar-phosphate backbone of DNA. In the most practical case, a given restriction enzyme cleaves both strands of duplex DNA within a stretch only a few bases long. The substrates of restriction endonucleases are double-stranded DNA sequences called recognition sites or recognition sequences.
The length of the restriction recognition site may vary depending on the restriction enzyme used. The length of the recognition sequence controls how frequently the enzyme will cut in the DNA sequence.
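The effect of recognition-site length on cutting frequency can be estimated with the usual random-sequence approximation: a site of n fully specified bases occurs on average once every 4^n bp when all four bases are equally frequent. A minimal sketch, assuming independent bases and a uniform GC content (the `gc_content` parameter is an assumption; real genomes deviate from this, and CpG-containing sites in particular are markedly rarer in mammalian DNA than the estimate suggests).

```python
from math import prod


def expected_spacing_bp(site: str, gc_content: float = 0.5) -> float:
    """Average distance between occurrences of `site` in random DNA.

    Each A/T position has probability (1 - gc_content) / 2, each G/C position
    gc_content / 2; an N matches any base (probability 1).
    """
    p_base = {"A": (1 - gc_content) / 2, "T": (1 - gc_content) / 2,
              "G": gc_content / 2, "C": gc_content / 2, "N": 1.0}
    return 1.0 / prod(p_base[b] for b in site.upper())


for site in ("CATG", "AAGCTT", "GGTNACC", "GCGGCCGC"):
    print(site, expected_spacing_bp(site))
# CATG 256.0, AAGCTT 4096.0, GGTNACC 4096.0, GCGGCCGC 65536.0
```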
For example, many restriction endonucleases recognize a 4 bp DNA sequence. Sequences and enzymes that recognize a 4 bp DNA sequence include, but are not limited to, AATT (TspEI), ACGT (MaeII), AGCT (AluI), CATG (NlaIII), CCGG (HpaII), CGCG (FnuDII), CTAG (MaeI), GATC (DpnI, DpnII, Sau3AI & MboI), GCGC (HhaI), GGCC (HaeIII), GTAC (RsaI), TCGA (TaqI), TGCA (CviRI), TTAA (MseI), CCCG (Sth132I), CCGC (AciI) and CCTC (MnlI).
By way of further example, many restriction endonucleases recognize a 6 bp DNA sequence. Sequences and enzymes that recognize 6 bp DNA sequences include, but are not limited to, AACGTT (AclI), AAGCTT (HindIII), AATATT (SspI), ACATGT (BspLU11I), ACCGGT (AgeI), ACGCGT (MluI), ACTAGT (SpeI), AGATCT (BglII), AGCGCT (Eco47III), AGGCCT (StuI), AGTACT (ScaI), ATCGAT (ClaI), ATGCAT (AvaIII), ATTAAT (VspI), CAATTG (MfeI), CACGTG (PmaCI), CAGCTG (PvuII), CATATG (NdeI), CCATGG (NcoI), CCCGGG (SmaI), CCGCGG (SacII), CCTAGG (AvrII), CGATCG (PvuI), CGGCCG (XmaIII), CGTACG (SplI), CTCGAG (XhoI), CTGCAG (PstI), CTTAAG (AflII), GAATTC (EcoRI), GACGTC (AatII), GAGCTC (SacI), GATATC (EcoRV), GCATGC (SphI), GCCGGC (NaeI), GCGCGC (BsePI), GGATCC (BamHI), GGGCCC (ApaI), GGTACC (KpnI), GTATAC (SnaI), GTCGAC (SalI), GTGCAC (ApaLI), GTTAAC (HpaI), TACGTA (SnaBI), TCATGA (BspHI), TCGCGA (NruI), TCTAGA (XbaI), TGATCA (BclI), TGCGCA (MstI), TGGCCA (BalI), TGTACA (Bsp1407I), TTATAA (PsiI), TTCGAA (AsuII) and TTTAAA (AhaIII).
By way of further example, many restriction endonucleases recognize a 7bp DNA sequence. The sequences and enzymes that recognize the 7bp DNA sequence include, but are not limited to, CCTNAGG (SauI), GCTNAGC (EspI), GGTNACC (BstEII), and TCCNGGA (PfoI).
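Several of the 7bp sites above contain N, meaning any base. The following is a minimal sketch of how such a degenerate site can be scanned for. It uses only Python's standard `re` module and handles only the N code. Other IUPAC ambiguity codes (R, Y, W, and so on) are deliberately out of scope. `find_degenerate_sites` is a hypothetical name.

```python
import re


def find_degenerate_sites(sequence: str, site: str) -> list:
    """0-based starts of a recognition site that may contain N (any base).

    Only N is translated; other IUPAC ambiguity codes are not supported here.
    """
    pattern = re.escape(site.upper()).replace("N", "[ACGT]")
    # Lookahead so that overlapping occurrences are also reported.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", sequence.upper())]


if __name__ == "__main__":
    # GGTNACC (BstEII) matches both GGTAACC and GGTCACC.
    print(find_degenerate_sites("GGTAACCttGGTCACC", "GGTNACC"))  # [0, 9]
```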
By way of further example, many restriction endonucleases recognize an 8bp DNA sequence. The sequences and enzymes recognizing the DNA sequence of 8bp include, but are not limited to, ATTTAAAT (SwaI), CCTGCAGG (Sse8387I), CGCCGGCG (Sse232I), CGTCGACG (SgrDI), GCCCGGGC (SrfI), GCGATCGC (SgfI), GCGGCCGC (NotI), GGCCGGCC (FseI), GGCGCGCC (AscI), GTTTAAAC (PmeI), and TTAATTAA (PacI).
Many of these sequences contain the dinucleotide CG, which can be methylated in vivo. Many restriction enzymes are sensitive to this methylation and will not cleave methylated sequences. For example, HpaII will not cleave the sequence CmCGG, whereas its isoschizomer MspI is insensitive to this modification and cleaves the methylated sequence. Thus, in some cases, methylation-sensitive enzymes are not used on eukaryotic DNA.
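Recognition sites that contain CG can be flagged with a simple string check. This is only a heuristic sketch. A CG in a site means the site *can* carry CpG methylation, not that a particular enzyme is blocked by it. HpaII and MspI, for example, share CCGG but differ in sensitivity, so sensitivity has to be checked enzyme by enzyme. The small enzyme table below is an illustrative assumption drawn from the lists above.

```python
def contains_cpg(site: str) -> bool:
    """True if the recognition site contains a CG dinucleotide (a potential
    CpG methylation target). Says nothing about a given enzyme's sensitivity."""
    return "CG" in site.upper()


# Illustrative subset of enzymes and sites taken from the lists above.
SITES = {"HpaII": "CCGG", "DpnII": "GATC", "NotI": "GCGGCCGC", "HindIII": "AAGCTT"}

if __name__ == "__main__":
    flagged = sorted(name for name, site in SITES.items() if contains_cpg(site))
    print(flagged)  # ['HpaII', 'NotI']
```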
In one embodiment, the restriction enzyme recognition site is also the restriction enzyme digestion site.
Circularization
According to one embodiment of the present invention, the material for 4C is prepared by digesting a 3C template with a second restriction enzyme, followed by ligation, thereby generating a DNA loop.
Preferably, the second restriction enzyme is selected so that it cleaves the first (target) nucleotide sequence more than about 350bp (e.g., 350-400bp) from the first restriction site. This minimizes bias in loop formation caused by topological constraints (Rippe et al. (2001) Trends Biochem. Sci. 26, 733-40).
The secondary restriction enzyme is preferably a frequent cutter that recognizes a 4 or 5bp restriction enzyme recognition site. Such an enzyme produces small restriction fragments. Small fragments help ensure that all ligated fragments are amplified with similar efficiency.
Before the second restriction enzyme digestion and ligation, the DNA template contains two relevant second restriction enzyme recognition sites. One lies in the first (target) nucleotide sequence, more than about 350-400bp from the first restriction enzyme site. The other lies in the already ligated nucleotide sequence (i.e., in the second nucleotide sequence).
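The spacing criterion above can be checked against a bait fragment sequence. This minimal sketch makes two assumptions. First, the fragment string starts at the end created by the first restriction enzyme. Second, distance is measured to the start of the nearest second-enzyme site. `check_secondary_spacing` and the toy sequences are hypothetical.

```python
def check_secondary_spacing(fragment: str, secondary_site: str, min_distance: int = 350):
    """Return (ok, position) for the second-enzyme site nearest the first-enzyme end.

    Assumes position 0 of `fragment` is the end generated by the first restriction
    enzyme. `ok` is True when that nearest site is at least `min_distance` bp away,
    since the second enzyme will cut at the nearest site, not a more distant one.
    """
    pos = fragment.upper().find(secondary_site.upper())
    if pos == -1:
        # No second-enzyme site in the fragment at all.
        return False, None
    return pos >= min_distance, pos


if __name__ == "__main__":
    bait = "A" * 400 + "GATC" + "A" * 10  # toy bait fragment with a DpnII site at 400
    print(check_secondary_spacing(bait, "GATC"))  # (True, 400)
```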
The second restriction enzyme digestion step is preferably carried out for more than 1 hour (up to overnight), after which the enzyme is heat-inactivated.
The DNA in the reaction mixture is preferably purified by conventional methods/kits known in the art.
After the second restriction enzyme digestion step, the first (target) nucleotide sequence has been cut at a second restriction enzyme site more than 350-400bp from the first restriction enzyme site. The ligated nucleotide sequence (i.e., the second nucleotide sequence) has been cut at another second restriction enzyme site. The first (target) nucleotide sequence and the ligated nucleotide sequence now carry compatible ends, so they can be ligated to circularize the DNA.
Following the digestion step, ligation is performed under dilute conditions favoring intramolecular interactions, and the DNA is circularized through the appropriate termini.
The ligation reaction is preferably performed at a DNA concentration of about 1-5 ng/µl.
The ligation reaction is preferably carried out at about 16-25 °C for more than 1 hour (e.g., 2, 3, 4, or more hours).
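The dilute-ligation condition translates into a simple volume calculation. This sketch assumes, purely for illustration, 10 µg of digested 3C template. The function name `ligation_volume_ul` is hypothetical.

```python
def ligation_volume_ul(dna_ng: float, target_ng_per_ul: float = 2.5) -> float:
    """Ligation volume (µl) that brings `dna_ng` of DNA to `target_ng_per_ul`.

    The check enforces the ~1-5 ng/µl range given above for dilute,
    intramolecular (circularizing) ligation.
    """
    if not 1.0 <= target_ng_per_ul <= 5.0:
        raise ValueError("target concentration outside the ~1-5 ng/µl range")
    return dna_ng / target_ng_per_ul


if __name__ == "__main__":
    # Assumed example: 10 µg (10,000 ng) of template at 2.5 ng/µl.
    print(ligation_volume_ul(10_000))  # 4000.0 µl, i.e. 4 ml
```

At the upper end of the range (5 ng/µl), the same 10 µg needs only 2 ml. The lower end (1 ng/µl) needs 10 ml, which is the trade-off between favoring intramolecular ligation and keeping reaction volumes manageable.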
Thus, following ligation, circularized DNA is obtained. The circularized DNA will comprise at least the recognition site for the second restriction enzyme, or the recognition sites for both the first and second restriction enzymes. In circularized DNA comprising a first (target) nucleotide sequence, the first and second restriction enzyme recognition sites will define the ends of the first (target) nucleotide sequence and of the ligated nucleotide sequence (i.e., the second nucleotide sequence). Thus, the first and second restriction enzyme recognition sites mark the boundaries between the first (target) nucleotide sequence and the ligated nucleotide sequence.
Amplification
One or more amplification reactions may be performed to amplify the 4C DNA template.
DNA amplification can be carried out using a number of different methods known in the art. Examples include:
- the polymerase chain reaction (Saiki et al., 1988);
- ligation-mediated PCR;
- Qβ replicase amplification (Cahill, Foster and Mahan, 1991; Chetverin and Spirin, 1995; Katanaev, Kurnasov and Spirin, 1995);
- the ligase chain reaction (LCR) (Landegren et al., 1988; Barany, 1991);
- the self-sustained sequence replication system (Fahy, Kwoh and Gingeras, 1991); and
- strand displacement amplification (Walker et al., 1992).
Preferably, PCR is used to amplify the DNA. "PCR" refers to the method of U.S. Pat. Nos. 4,683,195, 4,683,202, and 4,965,188 to Mullis, which describe a method to increase the concentration of nucleotide sequence fragments in a genomic DNA mixture without cloning or purification.
In one embodiment, inverse PCR is used (Ochman et al. (1988) Genetics 120(3), 621-3). Inverse PCR (IPCR) is a method for the rapid in vitro amplification of DNA sequences that flank a region of known sequence. It uses the polymerase chain reaction (PCR) with primers oriented in the opposite direction to the conventional orientation. The template for inverse PCR is a restriction fragment that has been self-ligated into a circle. Inverse PCR has many applications in molecular genetics, e.g., the amplification and identification of sequences flanking transposable elements. To increase the efficiency and reproducibility of the amplification, the DNA circle is preferably linearized with a third restriction enzyme before amplification. The third restriction enzyme preferably recognizes a site of 6bp or more. It preferably cleaves the first (target) nucleotide sequence between the first and second restriction enzyme sites.
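The logic of inverse PCR on a self-ligated circle can be modelled with strings. This is an idealized sketch under several assumptions: primers match exactly, annealing and polymerase behaviour are ignored, and the circle is written as a linear string that wraps around at its end. The toy sequences and function names are hypothetical. In the example, the bait fragment is `CCCTTTGGGAAA`, with outward-facing primers at its two ends, and `TCGATCGA` stands in for a captured (second) nucleotide sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverse_pcr_product(circle: str, fwd: str, rev: str):
    """Idealized inverse-PCR product from a circular template given as a linear string.

    `fwd` has the top-strand sequence and extends rightwards from its match;
    `rev` has the bottom-strand sequence, so its binding site on `circle`
    is revcomp(rev). Returns None if either primer site cannot be reached.
    """
    circle = circle.upper()
    f = circle.find(fwd.upper())
    if f == -1:
        return None
    target = revcomp(rev)
    doubled = circle + circle  # lets the product run across the string's origin
    r = doubled.find(target, f + len(fwd))
    if r == -1 or r + len(target) - f > len(circle):
        return None  # reverse-primer site not reachable within one turn of the circle
    return doubled[f:r + len(target)]


if __name__ == "__main__":
    circle = "CCCTTT" + "GGGAAA" + "TCGATCGA"  # bait ends + captured fragment
    product = inverse_pcr_product(circle, "GGGAAA", "AAAGGG")
    print(product)         # GGGAAATCGATCGACCCTTT
    print(product[6:-6])   # TCGATCGA, the captured sequence between the primers
```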
Digestion of the 3C template with a second restriction enzyme, optional circularization, ligation (e.g., ligation under dilute conditions), and optional linearization of the loop containing the first (target) nucleotide sequence, can yield a DNA template for amplification ("4C DNA template").
For the amplification step, at least 2 oligonucleotide primers are used, wherein each primer hybridizes to a DNA sequence flanking the nucleotide sequence of interest. In a preferred embodiment, at least 2 oligonucleotide primers are used, wherein each primer hybridizes to a target sequence flanking a nucleotide sequence of interest.
In one embodiment, the term "flanking" in the context of primer hybridization means that at least one primer hybridizes to a DNA sequence adjacent to one end (e.g., the 5' end) of the nucleotide sequence of interest and at least one primer hybridizes to a DNA sequence at the other end (e.g., the 3' end) of the nucleotide sequence of interest. Preferably, at least one forward primer hybridizes to a DNA sequence adjacent to one end (e.g., the 5' end) of the nucleotide sequence of interest, and at least one reverse primer hybridizes to a DNA sequence at the other end (e.g., the 3' end) of the nucleotide sequence of interest.
In a preferred embodiment, the term "flanking" in the context of primer hybridization means that at least one primer hybridizes to the target sequence adjacent to one end (e.g., the 5' end) of the nucleotide sequence of interest and at least one primer hybridizes to the target sequence at the other end (e.g., the 3' end) of the nucleotide sequence of interest. Preferably, at least one forward primer hybridizes to the target sequence adjacent to one end (e.g., the 5' end) of the nucleotide sequence of interest, and at least one reverse primer hybridizes to the target sequence at the other end (e.g., the 3' end) of the nucleotide sequence of interest.
The term "primer" as used herein refers to an oligonucleotide, whether naturally occurring in the form of a purified restriction digest or produced synthetically, which can serve as a point of initiation of synthesis when placed under conditions (i.e., in the presence of nucleotides and an inducing agent such as a DNA polymerase, and at a suitable temperature and pH) that induce synthesis of a primer extension product complementary to a nucleic acid strand. The primer is preferably single stranded for maximum amplification efficiency, but may be double stranded. If double stranded, the primer is first treated to separate its strands and then used to prepare extension products. The primer is preferably an oligodeoxyribonucleotide. The primer must be long enough to prime the synthesis of extension products in the presence of the inducing agent. The exact length of the primer will depend on many factors, including temperature, source of primer, and method used.
Suitably, the primer will be at least 15, preferably at least 20, for example at least 25, 30 or 40 nucleotides in length. Preferably, the amplification primers are 16 to 30 nucleotides in length.
The primers are preferably designed as close as possible to the first and second restriction enzyme recognition sites that separate the first (target) nucleotide sequence and the second nucleotide sequence. The primers may be designed such that they are within about 100 nucleotides, such as about 90, 80, 70, 60, 50, 40, 30, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, or 1 nucleotide, of the first and second restriction enzyme recognition sites.
Suitably, the amplification primers are designed so that their 3' ends face outwards towards the first and second restriction enzyme recognition sites. Extension then proceeds immediately across the restriction enzyme sites into the second nucleotide sequence.
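The placement rules above can be expressed as a simple check. This minimal sketch makes two assumptions. First, the primer is written 5'→3' on the same (top) strand as `fragment`. Second, the restriction site lies downstream of the primer, toward its 3' end. A primer on the other side would be checked the same way against the reverse complement. The 100-nt and 16-30-nt thresholds come from the ranges given above. `primer_placement` is a hypothetical name.

```python
def primer_placement(fragment: str, primer: str, site: str,
                     max_gap: int = 100, length_range: tuple = (16, 30)) -> dict:
    """Check a top-strand primer whose 3' end should face a restriction site.

    `gap` is the number of bases between the primer's 3' end and the nearest
    downstream occurrence of `site`.
    """
    fragment, primer, site = fragment.upper(), primer.upper(), site.upper()
    start = fragment.find(primer)
    site_pos = fragment.find(site, start + len(primer)) if start != -1 else -1
    if start == -1 or site_pos == -1:
        return {"found": False}
    gap = site_pos - (start + len(primer))
    lo, hi = length_range
    return {"found": True, "gap": gap, "gap_ok": gap <= max_gap,
            "length_ok": lo <= len(primer) <= hi}


if __name__ == "__main__":
    primer = "GCTAGCTAGCTAGCTAGC"  # hypothetical 18-nt primer
    fragment = "A" * 50 + primer + "T" * 40 + "AAGCTT"
    print(primer_placement(fragment, primer, "AAGCTT"))
    # {'found': True, 'gap': 40, 'gap_ok': True, 'length_ok': True}
```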
If the amplification method used is inverse PCR, the amplification reaction is preferably performed on about 100-400ng of 4C template DNA (in about 50 µl of PCR reaction mixture per reaction). Alternatively, another amount of DNA may be used if it gives reproducible results in repeated PCR reactions (see FIG. 1) and includes the largest possible number of ligation events in each PCR reaction.
Preferably, the inverse PCR amplification reaction is performed with the Expand Long Template PCR System (Roche), using buffer 1 according to the manufacturer's instructions.
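The template amounts above can be related to the number of genome copies in each reaction. This sketch assumes human DNA at roughly 3.3 pg per haploid genome. Each haploid genome carries one copy of the bait fragment, so the result is only a rough indication of how many bait copies, and hence potential ligation events, enter a single PCR. The names are hypothetical.

```python
# Assumption: human DNA, roughly 3.3 pg per haploid genome; adjust for other species.
HAPLOID_GENOME_PG = 3.3


def haploid_genome_equivalents(template_ng: float,
                               haploid_pg: float = HAPLOID_GENOME_PG) -> float:
    """Approximate number of haploid genome copies in `template_ng` of DNA."""
    return template_ng * 1000.0 / haploid_pg  # ng -> pg, then divide by pg per genome


if __name__ == "__main__":
    for ng in (100, 400):
        print(ng, "ng ->", round(haploid_genome_equivalents(ng)), "haploid genomes")
    # 100 ng -> 30303 haploid genomes
    # 400 ng -> 121212 haploid genomes
```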
Sample
The term "sample" as used herein has its normal meaning. The sample may be any entity of matter that comprises cross-linked or cross-linkable DNA. The sample may be or may be derived from a biological material.
The sample may be or may be derived from one or more entities, such as one or more cells, one or more nuclei, or one or more tissue samples. The entity may be or may be derived from any entity in which DNA, such as chromatin, is present. The sample may be or may be derived from one or more isolated cells or one or more isolated tissue samples, or one or more isolated nucleic acids.
The sample may be or may be derived from living and/or dead cells and/or nuclear lysates and/or isolated chromatin.
The sample may be or may be derived from a subject with and/or without a disease.
The sample may be or may be derived from a subject suspected of suffering from a disease.
The sample may be or may be derived from a subject to be tested for the likelihood that they will suffer from a disease in the future.
The sample may be or may be derived from living or non-living patient material.
Splinter et al. (2004) Methods Enzymol. 375, 493-507 describe in detail the use of fixed cells and tissues for the preparation of 3C templates.
Labeling
Preferably, the nucleotide sequences (e.g., amplified 4C DNA templates, primers or probes) are labeled to aid their downstream use (e.g., array hybridization). For example, 4C DNA templates can be labeled by random priming or nick translation.
The nucleotide sequences described herein can be labeled with a variety of labels (e.g., reporter molecules), particularly during the amplification step. Suitable labels include radionuclides, enzymes, fluorescent, chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic particles, and the like. Patents teaching the use of such labels include US-A-3817837; US-A-3850752; US-A-3939350; US-A-3996345; US-A-4277437; US-A-4275149 and US-A-4366241.
Other labels include, but are not limited to, beta-galactosidase, invertase, green fluorescent protein, luciferase, chloramphenicol acetyltransferase, beta-glucuronidase, exoglucanase, and glucoamylase. Fluorescent labels may also be used, as may fluorescent reagents synthesized to have particular chemical properties. A number of ways of measuring fluorescence are available:
- some fluorescent labels exhibit a change in excitation or absorption spectra;
- some exhibit resonance energy transfer, in which one fluorescent reporter emits and a second absorbs that emission;
- some exhibit a loss (quenching) or gain of fluorescence; and
- some report rotational motion.
To obtain sufficient material for labeling, multiple amplifications can be pooled without increasing the number of amplification cycles in each reaction. Alternatively, labeled nucleotides can be incorporated during the last few cycles of the amplification reaction (e.g., 30 cycles of PCR without label followed by 10 cycles of PCR with label).
Arrays
In particularly advantageous embodiments, 4C DNA templates prepared according to the methods described herein can be hybridized to an array. Thus, array (e.g., microarray) technology can be used to identify nucleotide sequences (e.g., genomic fragments) that frequently share a nuclear site with a first (target) nucleotide sequence.
Existing arrays (e.g., expression and genomic arrays) can be used in accordance with the present invention. However, the present invention also seeks to provide novel arrays (e.g., DNA arrays) as described herein.
An "array" is a collection of intentionally generated nucleic acids that can be prepared synthetically or biosynthetically and screened for biological activity in a variety of different formats (e.g., libraries of soluble molecules; and libraries of oligomers attached to resin beads, silicon wafers, or other solid supports). In addition, the term "array" includes those libraries of nucleic acids prepared by spotting nucleic acids of almost any length (e.g., from 1 to about 1000 nucleotide monomers in length) onto a substrate.
Array technology and its associated techniques and applications are described generally in numerous texts and articles. These include:
- Lemieux et al., 1998, Molecular Breeding 4, 277-289;
- Schena and Davis, Parallel analysis with biological chips, in PCR Methods Manual (M. Innis, D. Gelfand, J. Sninsky, eds.);
- Schena and Davis, 1999, Genes, genomes and chips, in DNA Microarrays: A Practical Approach (M. Schena, ed.), Oxford University Press, Oxford, UK, 1999;
- The Chipping Forecast (Nature Genetics special supplement, January 1999);
- Mark Schena (ed.), Microarray Biochip Technology (Eaton Publishing Company);
- Cortese, 2000, The Scientist 14[17]: 25;
- Gwynne and Page, Microarray analysis: the next revolution in molecular biology, Science, 6 August 1999; and
- Ekins and Chu, 1999, Trends in Biotechnology 17, 217-218.
Array technology overcomes the limitations of traditional molecular biology methods. Those methods generally work on a "one gene, one experiment" basis, have low throughput, and cannot give a "panoramic" view of gene function. Currently, the main applications of array technology include the identification of sequences (genes/gene mutations) and the determination of gene expression levels (abundances). Gene expression profiling can be performed using array technology, optionally in combination with proteome technology (Celis et al, 2000, FEBS Lett, 480 (1): 2-16; Lockhart and Winzeler, 2000, Nature 405 (6788): 827-. Other applications of array technology are also known in the art; for example, gene discovery, cancer research (Marx, 2000, Science 289: 1670-.
In general, any library can be arranged in an ordered fashion by spatially separating the members of the library. Examples of suitable array libraries include nucleic acid libraries (including libraries of DNA, cDNA, oligonucleotides, etc.), peptide, polypeptide, and protein libraries, as well as libraries comprising any molecule, such as ligand libraries, among others.
The samples (e.g., members of the library) are typically immobilized or otherwise fixed on a solid phase, preferably a solid substrate, which limits their diffusion and mixing. In a preferred embodiment, a library of ligand-bound DNA is prepared. In particular, the library may be immobilized on a substantially flat solid phase, including membranes and non-porous substrates (e.g., plastics and glass). In addition, the samples are preferably arranged in a way that facilitates indexing (i.e., allows a particular sample to be referenced or retrieved). Typically, the samples are applied as spots in a grid format. Conventional assay systems may be adapted for this purpose. For example, the array may be immobilized on the surface of a microplate, with multiple samples in one well or a single sample in each well. Furthermore, the solid substrate may be a membrane, such as a nitrocellulose or nylon membrane (e.g., a membrane used in blotting experiments). Other substrates include glass- or silicon-based substrates. The samples may be immobilized by any suitable means known in the art, for example by charge interaction, or by chemical coupling to the walls or bottom of the wells or to the membrane surface. Other arraying and immobilization methods may also be used, for example pipetting, drop-touch, piezoelectric, ink-jet and bubble-jet techniques, electrostatic application, and the like. For silicon-based chips, photolithography can be used to arrange and fix the samples on the chip.
Samples are arranged as spots on the solid substrate; this can be done manually or by using robotics to deposit the samples. In general, arrays can be described as macroarrays or microarrays, the difference being the size of the sample spots. Macroarrays typically contain sample spots of about 300 microns or larger and can be conveniently imaged with existing gel and blot scanners. Sample spots in microarrays are typically less than 200 microns in diameter, and these arrays typically contain thousands of spots. Microarrays therefore require specialized robotics and imaging equipment, which may need to be custom-built. A general overview of the instrumentation used is given in Cortese, 2000, The Scientist 14[11]: 26.
Techniques for generating libraries of immobilized DNA molecules have been described in the prior art. Most prior art methods describe how to synthesize libraries of single-stranded nucleic acid molecules at discrete locations on a solid substrate. They typically use, for example, masking techniques to generate different sequence permutations at those locations. U.S. Pat. No. 5,837,832 describes an improved method, based on large-scale integration techniques, for generating DNA arrays immobilized on a silicon substrate. In particular, U.S. Pat. No. 5,837,832 describes a strategy called "tiling" for synthesizing specific sets of probes at spatially defined locations on a substrate. This strategy can be used to generate an immobilized DNA library of the invention. U.S. Pat. No. 5,837,832 also provides references to earlier techniques that may also be used.
Arrays can also be fabricated using photo-deposition chemistry.
Peptide (or peptidomimetic) arrays can also be synthesized on a surface by placing each unique library member (e.g., a unique peptide sequence) at a discrete, predetermined position in the array. The identity of each library member is determined by its spatial position in the array. The array positions at which a predetermined molecule (e.g., a target or probe) binds to reactive library members are determined. The sequences of those reactive library members are then identified from their spatial locations. These methods are described in U.S. Pat. No. 5,143,854; WO 90/15070 and WO 92/10092; Fodor et al. (1991) Science 251: 767; Dower and Fodor (1991) ann. 271.
To facilitate detection, a label (as described above) is typically used, i.e., any reporter that facilitates detection, e.g., a fluorescent, bioluminescent, phosphorescent or radioactive reporter. These reporter molecules, their detection, their coupling to targets/probes, and related matters are discussed elsewhere herein. Labeling of probes and targets is also disclosed in Shalon et al., 1996, Genome Res. 6(7): 639-45.
Examples of specific DNA arrays are as follows:
Type I: probe cDNAs (500-5,000 bases long) are immobilized on a solid surface (e.g., glass) by robotic spotting and exposed to a set of targets, either separately or as a mixture. This approach is widely considered to have been developed at Stanford University (Ekins and Chu, 1999, Trends in Biotechnology, 1999, 17, 217-.
Type II: arrays of oligonucleotide probes (20-25-mers, preferably 40-60-mers) or peptide nucleic acid (PNA) probes are synthesized either in situ (on-chip) or conventionally and then immobilized on a chip. The array is exposed to labeled sample DNA and hybridized, and the identity/abundance of the complementary sequences is determined. Such DNA chips are sold by Affymetrix, Inc. under the GeneChip® trademark. Agilent and NimbleGen also provide suitable arrays (e.g., genomic tiling arrays).
Examples of some commercially available microarray models are listed in Table 1 below (see also Marshall and Hodgson, 1998, Nature Biotechnology, 16(1), 27-31).
| Company | Product name | Arraying method | Hybridization step | Readout |
|---|---|---|---|---|
| Affymetrix, Inc., Santa Clara, California | GeneChip | In situ (on-chip) photolithographic synthesis of ~20-25-mer oligomers on silicon wafers, which are diced into 1.25 cm² or 5.25 cm² square chips | 10,000-260,000 oligomer features probed with labeled 30-40-nucleotide fragments of sample cDNA or antisense RNA | Fluorescence |
| Brax, Cambridge, UK | | Short synthetic oligomers, synthesized off-chip | 1,000 oligomers on a "universal chip" probed with labeled nucleic acids | Mass spectrometry |
| Gene Logic, Inc., Columbia, Maryland | READS™ | | | |
| Genometrix Inc., The Woodlands, Texas | UniversalArrays™ | | | |
| GENSET, Paris, France | | | | |
| Hyseq Inc., Sunnyvale, California | HyChip™ | 500-2,000 nt DNA samples printed on 0.6 cm² (HyGnostics) or 18 cm² (Gene Discovery) membranes; 5-mer oligomers printed as 1.15 cm² arrays on glass (HyChip) | 64 sample cDNA spots probed with 8,000 7-mer oligomers (HyGnostics), or 55,000 sample cDNA spots probed with 300 7-mer oligomers (Gene Discovery); 1,024 universal oligomer spots probed with 10 kb sample cDNA, labeled 5-mers and ligase (HyChip) | Radioisotope, fluorescence |
| Incyte Pharmaceuticals, Inc., Palo Alto, California | GEM | Piezoelectric printing of spotted PCR fragments and on-chip synthesis of oligomers | 1,000 (up to 10,000) oligomer/PCR fragment spots probed with labeled RNA | Fluorescence, radioisotope |
| Molecular Dynamics, Inc., Sunnyvale, California | Storm, FluorImager | 500-5,000 nt cDNAs printed with a pen tip onto ~10 cm² glass slides | ~10,000 cDNA spots probed with 200-400 nt labeled sample cDNA | Fluorescence |
| Nanogen, San Diego, California | Semiconductor microchip | Prefabricated ~20-mer oligomers captured on electroactive sites on silicon wafers, diced into <1 cm² square chips | 25, 64, 400 (and up to 10,000) oligomer spots polarized to enhance hybridization with 200-400 nt labeled sample cDNA | Fluorescence |
| Protogene Laboratories, Palo Alto, California | | 40-50-mer oligonucleotide chips synthesized by printing on a surface-tension array on 9 cm² glass chips | <8,000 oligomer spots probed with 200-400 nt labeled sample nucleic acids | Fluorescence |
| Sequenom, Hamburg, Germany, and San Diego, California | MassArray SpectroChip | Offset printing of arrays of ~20-25-mer oligomers | 250 positions per SpectroChip interrogated by laser desorption and mass spectrometry | Mass spectrometry |
| Synteni, Inc., Fremont, California | UniGEM™ | 500-5,000 nt cDNAs printed with a pen tip onto ~4 cm² glass chips | 10,000 cDNA spots probed with labeled sample cDNA (200- nt) | Fluorescence |
| NimbleGen Systems Inc., Madison | Chile whole genome 60-mer microarray | 38,000 transcripts, 5 probes per gene; 17.4 mm × 13 mm | | 5-micron scanning platform |
| German Cancer Institute, Heidelberg, Germany | Prototype PNA macrochip | Probes synthesized on-chip using f-moc or t-boc chemistry | ~1,000 spots on an 8 × 12 cm chip | Fluorescence/mass spectrometry |

Table 1: Examples of currently available hybridization microarray formats
To generate data from array-based assays, a signal is detected that indicates the presence or absence of hybridization between a probe and a nucleotide sequence. The present invention also contemplates direct and indirect labeling techniques.

Direct labeling incorporates a fluorescent dye directly into the nucleotide sequence that hybridizes to a probe attached to the array. This can be done, for example, by enzymatic synthesis in the presence of labeled nucleotides or with labeled PCR primers. Direct labeling schemes are easy to implement and can produce strong hybridization signals. They typically use a family of fluorescent dyes with similar chemical structures and properties. In preferred embodiments involving direct labeling of nucleic acids, cyanine or Alexa dye analogues are used in multiplexed comparative fluorescence array assays.

In other embodiments, an indirect labeling scheme can be used to incorporate an epitope into a nucleic acid, either before or after hybridization to the microarray probes. One or more staining procedures and reagents may then be used to label the hybridized complex. For example, a fluorescent molecule that binds to the epitope provides a fluorescent signal from the dye attached to the epitope of the hybrid.
Data analysis is also an important part of experiments involving arrays. The raw data obtained in an array experiment are typically images that must be converted into a matrix (table). In this matrix, rows represent, for example, genes, and columns represent, for example, different samples (e.g., tissues) or experimental conditions. The number in each cell characterizes, for example, the level of a particular sequence in a particular sample, preferably a second sequence ligated to a first (target) nucleotide sequence. These matrices must be analyzed further if knowledge about the underlying biological processes is to be extracted. Data analysis methods, including both supervised and unsupervised data analysis and bioinformatics methods, are disclosed in Brazma and Vilo (2000) FEBS Lett. 480(1): 17-24.
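The image-to-matrix step described above can be sketched in plain Python. This sketch assumes the image has already been quantified into (probe, sample, intensity) records. It also assumes a reference hybridization (here called "control") is available to compare against. Both are assumptions of the sketch, not steps described in this section. The function names and data are hypothetical.

```python
import math
from collections import defaultdict


def to_matrix(records):
    """Turn (probe_id, sample, intensity) records into a probes x samples matrix.

    Missing measurements are filled with NaN. Rows and columns are sorted.
    """
    table = defaultdict(dict)
    samples = set()
    for probe, sample, value in records:
        table[probe][sample] = value
        samples.add(sample)
    probes, cols = sorted(table), sorted(samples)
    matrix = [[table[p].get(s, math.nan) for s in cols] for p in probes]
    return probes, cols, matrix


def log2_ratio(matrix, cols, test, reference):
    """Per-row log2(test / reference); assumes non-zero reference intensities."""
    i, j = cols.index(test), cols.index(reference)
    return [math.log2(row[i] / row[j]) for row in matrix]


if __name__ == "__main__":
    records = [("p1", "4C", 800.0), ("p1", "control", 200.0),
               ("p2", "4C", 100.0), ("p2", "control", 100.0)]
    probes, cols, m = to_matrix(records)
    print(probes, cols)                           # ['p1', 'p2'] ['4C', 'control']
    print(log2_ratio(m, cols, "4C", "control"))   # [2.0, 0.0]
```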
As described herein, the nucleotide sequences (e.g., DNA templates) that are labeled and then hybridized to the array are enriched for short stretches of sequence of particular importance. Each such stretch runs from a first restriction enzyme recognition site that became ligated to the first (target) nucleotide sequence during the 3C procedure to the adjacent second restriction enzyme recognition site.
A single array may contain multiple (e.g., two or more) bait sequences.
Probes
The term "probe" as used herein refers to a molecule (e.g., an oligonucleotide, whether it is naturally occurring as a purified restriction digest or produced synthetically, recombinantly or by PCR amplification) that is capable of hybridizing to another molecule of interest (e.g., another oligonucleotide). When the probes are oligonucleotides, they may be single-stranded or double-stranded. Probes can be used to detect, identify and isolate specific targets (e.g., gene sequences). As described herein, it is contemplated that probes used in the present invention may be labeled with a label so as to be detectable in any detection system, including but not limited to enzymes (e.g., ELISA, and enzyme-based histochemical tests), fluorescent, radioactive, and luminescent systems.
With respect to arrays and microarrays, the term "probe" is used to refer to any hybridizable material that can be immobilized on an array for the purpose of detecting a nucleotide sequence that has hybridized to the probe. These probes are preferably 25-60 mer or longer.
Probe design strategies are described in WO 95/11995, EP 717 113 and WO 97/29212.
Because 4C allows an unbiased genome-wide search for interactions, it can be advantageous to prepare arrays with probes for every possible (e.g., unique/non-repetitive) primary restriction enzyme recognition site in the genome. The array design then depends only on the choice of the first restriction enzyme, and not on the actual sequence of the first or second nucleotide sequence.
Although existing arrays may be used in accordance with the present invention, other configurations are preferred.
In one configuration, one or more probes on the array are designed so that they hybridize near the site of digestion by the first restriction enzyme. More preferably, the one or more probes are within about 20bp from the recognition site of the first restriction enzyme. More preferably, the one or more probes are within about 50bp from the recognition site for the first restriction enzyme.
Suitably, one or more of the probes is within about 100bp (e.g.about 0-100bp, about 20-100bp) from the recognition site of the primary restriction enzyme.
In a preferred configuration, a single, unique probe is designed within 100bp of each side of the site digested by the primary restriction enzyme.
In another preferred configuration, the positions of the second restriction enzyme sites relative to the first restriction enzyme sites are also taken into account. In this configuration, a single, unique probe is designed only on those sides of a primary restriction enzyme site where the nearest secondary restriction enzyme recognition site is far enough away. It must be far enough that a probe of the given length fits between the primary and secondary restriction enzyme recognition sites. For example, no probe is designed on a given side of a primary restriction enzyme recognition site if a secondary restriction enzyme recognition site lies within 10bp on that same side.
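The side-by-side rule above can be sketched as a scan over a genome sequence. This minimal sketch places one probe directly next to each side of every primary site. It keeps a probe only if it fits before the nearest secondary site on that side, and within the 100bp window mentioned earlier. Several things are assumptions here: the exact-adjacency placement, the default probe length, and the fact that probe uniqueness and repeat filtering are not checked. All names are hypothetical.

```python
import re
from bisect import bisect_left, bisect_right


def find_sites(seq: str, site: str) -> list:
    """Ascending 0-based starts of every (overlapping) occurrence of `site`."""
    return [m.start() for m in re.finditer(f"(?=({re.escape(site)}))", seq)]


def design_flank_probes(genome: str, primary: str, secondary: str,
                        probe_len: int = 50, max_dist: int = 100) -> list:
    """Return (primary_pos, side, probe_start, probe_end) tuples (0-based, end-exclusive)."""
    genome, primary, secondary = genome.upper(), primary.upper(), secondary.upper()
    sec = find_sites(genome, secondary)
    probes = []
    for p in find_sites(genome, primary):
        p_end = p + len(primary)
        # Right side: nearest secondary site starting at or after the primary site's end.
        i = bisect_left(sec, p_end)
        right_limit = min(sec[i] if i < len(sec) else len(genome), p_end + max_dist)
        if right_limit - p_end >= probe_len:
            probes.append((p, "right", p_end, p_end + probe_len))
        # Left side: nearest secondary site ending at or before the primary site's start.
        j = bisect_right(sec, p - len(secondary)) - 1
        left_limit = max(sec[j] + len(secondary) if j >= 0 else 0, p - max_dist)
        if p - left_limit >= probe_len:
            probes.append((p, "left", p - probe_len, p))
    return probes


if __name__ == "__main__":
    # Toy genome: HindIII site at 34, DpnII sites at 20 (too close on the left) and 120.
    genome = "T" * 20 + "GATC" + "A" * 10 + "AAGCTT" + "C" * 80 + "GATC"
    print(design_flank_probes(genome, "AAGCTT", "GATC", probe_len=30))
    # [(34, 'right', 40, 70)]
```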
In another configuration, the probes on the array are designed so that they can hybridize to either side of the site of the first restriction enzyme digestion. Suitably, a single probe on each side of the primary restriction enzyme recognition site may be used.
In yet another configuration, two or more (e.g., 3, 4, 5, 6, 7, or 8 or more) probes can be designed on each side of the primary restriction enzyme recognition site and used to study the same ligation event. The exact genomic positions of the adjacent secondary restriction enzyme recognition sites can be taken into account when choosing the number and location of the probes for each primary restriction enzyme recognition site.
In yet another configuration, two or more (e.g., 3, 4, 5, 6, 7, or 8 or more) probes may be designed to be located near each primary restriction enzyme recognition site regardless of the closest secondary restriction enzyme recognition site. In this configuration, all probes should remain close to the primary restriction enzyme recognition site (preferably within 300bp of the restriction site).
Advantageously, this latter design, like the design with one probe per side of each primary restriction enzyme recognition site, allows different secondary restriction enzymes to be used in combination with a given primary restriction enzyme.
Advantageously, the use of multiple (e.g., 2, 3, 4, 5, 6, 7, or 8 or more) probes per primary restriction enzyme recognition site minimizes the problem of false negative results due to poor performance of a single probe. Moreover, it also increases the reliability of data obtained in a single chip experiment and reduces the number of arrays required to reach a statistically reliable conclusion.
Probes used in the array may be more than 40 nucleotides long and may be isothermal (i.e., designed to have similar melting temperatures).
Probes containing repetitive DNA sequences are preferably excluded.
Probes that detect restriction enzyme sites immediately flanking or near the first nucleotide sequence are expected to give very strong hybridization signals and may also be excluded from the probe design.
The array may encompass any genome, including mammalian (e.g., human, mouse (e.g., chromosome 7)), vertebrate (e.g., zebrafish), or non-vertebrate (e.g., bacterial, yeast, fungal, or insect (e.g., Drosophila)) genomes.
In a further preferred embodiment, the array contains 2-6 probes around each unique primary restriction enzyme site and as close as possible to the restriction enzyme digestion site.
The maximum distance from the restriction enzyme digestion site is preferably about 300 bp.
In a further preferred embodiment of the invention, arrays for restriction enzymes (e.g., HindIII, EcoRI, BglII and NotI) are provided that encompass mammalian or non-mammalian genomes. Advantageously, the array design described herein removes the need to redesign the array for each target sequence, provided that the analysis is performed in the same species.
Probe set
The term "set of probes" as used herein refers to a combination or collection of probes that hybridize to each of the primary restriction enzyme recognition sites of a primary restriction enzyme in a genome.
Thus, in another aspect, there is provided a set of probes complementary in sequence to the nucleic acid sequence adjacent to each of the primary restriction enzyme recognition sites of the primary restriction enzymes in the genomic DNA.
Suitably, the set of probes is complementary in sequence to the first 25-60 (e.g., 35-60, 45-60, or 50-60) or more nucleotides adjacent to each of the primary restriction enzyme recognition sites in the genomic DNA. The probe set may be complementary in sequence to one or both sides of the primary restriction enzyme recognition site. Thus, the probes may be complementary in sequence to the nucleic acid sequence adjacent to each side of each of the primary restriction enzyme recognition sites in the genomic DNA.
It is also possible to define a window in which one or more probes of the set can be designed (e.g., 300 bp or less, such as 250 bp, 200 bp, 150 bp, or 100 bp from the primary restriction enzyme recognition site). Important factors in defining the window in which the probe is designed include GC content, the absence of palindromic sequences capable of forming hairpin structures, and the maximum length of a run of a single type of nucleotide. Thus, a probe set may be complementary in sequence to a nucleic acid sequence that is less than 300 bp from each of the primary restriction enzyme recognition sites in the genomic DNA.
It is also possible to define a window of about 100 bp from the primary restriction enzyme recognition site in which to identify the optimal probe near each restriction site.
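The three window criteria named above (GC content, hairpin-forming palindromes, and single-nucleotide runs) can be illustrated as simple sequence filters. This is a sketch only: the thresholds (40-60% GC, runs of at most 5, a 6-bp stem with a loop of at least 3 bases) and all function names are hypothetical choices, not values given in the source, and real probe design would also screen for repeats and cross-hybridization.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_run(seq):
    """Length of the longest stretch of a single repeated nucleotide."""
    if not seq:
        return 0
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def has_hairpin(seq, stem=6, min_loop=3):
    """True if a `stem`-mer's reverse complement occurs at least `min_loop` bases downstream."""
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        rc = seq[i:i + stem].translate(COMPLEMENT)[::-1]
        if rc in seq[i + stem + min_loop:]:
            return True
    return False


def candidate_probes(region, probe_len=50, gc_range=(0.4, 0.6), max_run=5):
    """Yield (offset, probe) for every window of `region` that passes all three filters."""
    for start in range(len(region) - probe_len + 1):
        probe = region[start:start + probe_len]
        if not gc_range[0] <= gc_content(probe) <= gc_range[1]:
            continue
        if longest_run(probe) > max_run or has_hairpin(probe):
            continue
        yield start, probe
```

Here `region` would be, for instance, the first 100 bp next to a primary site. The candidate closest to the site (smallest offset) can then be chosen, in line with the preference for probes as close as possible to the restriction site.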
In other embodiments of the invention, the set of probes is complementary to sequences less than 300bp from each of the primary restriction enzyme recognition sites in the genomic DNA, to sequences 200-300bp from each of the primary restriction enzyme recognition sites in the genomic DNA and/or to sequences 100-200bp from each of the primary restriction enzyme recognition sites in the genomic DNA.
In other embodiments of the invention, the set of probes is complementary to a sequence 0-300bp from each of the primary restriction enzyme recognition sites in the genomic DNA, 0-200bp from each of the primary restriction enzyme recognition sites in the genomic DNA, and/or 0-100bp from each of the primary restriction enzyme recognition sites in the genomic DNA (e.g., about 10, 20, 30, 40, 50, 60, 70, 80, or 90bp from each of the primary restriction enzyme recognition sites in the genomic DNA).
It is even possible to design two or more probes capable of hybridizing to sequences adjacent to each of the primary restriction enzyme recognition sites in the genomic DNA.
The probes may overlap or partially overlap. If the probes overlap, it is preferred that the overlap is less than 10 nucleotides.
PCR fragments representing the first 1-300 nucleotides (e.g., 1-20, 1-40, 1-60, 1-80, 1-100, 1-120, 1-140, 1-160, 1-180, 1-200, 1-220, 1-240, 1-260, or 1-280 nucleotides) flanking each primary restriction enzyme recognition site may also be used.
PCR fragments that correspond exactly to each genomic region flanked by a primary restriction enzyme recognition site and the nearest adjacent secondary restriction enzyme recognition site can also be used as probes. Thus, the probe sequence may correspond to all or part of the sequence between each primary restriction enzyme recognition site and the nearest adjacent secondary restriction enzyme recognition site.
Typically, the probes, probe arrays or probe sets will be immobilized on a support. The support (e.g., solid support) can be made of a variety of materials, such as glass, silica, plastic, nylon, or nitrocellulose. The support is preferably rigid and has a flat surface. The support typically has about 1-10,000,000 discrete spatially addressable areas, or cells. Supports having about 10-1,000,000 or about 100-100,000 or about 1000-100,000 units are common. The cell density is typically at least about 1000, 10,000, 100,000, or 1,000,000 cells per square centimeter. In some supports, all of the elements are occupied by pooled probes or probe set mixtures. In other supports, some units are occupied by pooled probe or probe set mixtures, while other units are occupied by a single type of oligonucleotide of at least the degree of purity available with synthetic methods.
The arrays described herein preferably comprise at least one probe per primary restriction enzyme recognition site, e.g., for each of the approximately 750,000 sites that a restriction enzyme with a 6 bp recognition sequence has in the human or mouse genome.
For example, for restriction endonucleases with a recognition sequence of 6 bp or more, a single array of about 2 × 750,000 probes, with 1 probe on each side of each restriction site, can be used to cover the complete human or mouse genome.
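The figure of roughly 750,000 sites follows from simple arithmetic. The sketch below assumes a random sequence with equal base frequencies, so that any particular 6-bp site occurs about once every 4^6 = 4,096 bases, and an approximate haploid human genome size of 3.1 × 10^9 bp (an assumption, not a value from the source). Real genomes deviate from this model; sites containing CpG, for example, are rarer than predicted.

```python
def expected_sites(genome_bp, site_len):
    """Expected number of occurrences of one site of `site_len` bp.

    Assumes a random sequence with equal base frequencies, so a given
    site occurs on average once every 4 ** site_len bases.
    """
    return genome_bp / 4 ** site_len


GENOME_BP = 3.1e9                     # approximate haploid human genome size (assumption)
sites = expected_sites(GENOME_BP, 6)  # about 756,836 sites for a 6-bp cutter
probes = 2 * sites                    # one probe on each side of each site
print(f"{sites:,.0f} sites -> {probes:,.0f} probes")
```

This gives about 757,000 sites and about 1.5 million probes, consistent with the "2 × 750,000" estimate above.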
In a preferred array design, the total number of probe molecules for a given nucleotide sequence present on the array greatly exceeds the number of homologous fragments present in the 4C sample to be hybridized to the array. Given the nature of the 4C technique, fragments representing genomic regions adjacent to the nucleotide sequence to be analyzed on the linear chromatin template will be present in large excess in the 4C hybridization sample (as depicted in figure 2). To obtain quantitative information about the hybridization efficiency of these abundant fragments, it may be necessary to reduce the amount of sample to be hybridized and/or increase the number of molecules of a given oligonucleotide probe sequence on the array.
Thus, to detect frequent contacts with DNA regulatory elements such as gene promoter elements, it may be necessary to use an array in which the probes represent only a selected genomic region (e.g., about 0.5-10 Mb), but in which each particular probe appears at multiple (e.g., about 100, 200, or 1000) locations on the array. This design may also preferably be used for diagnostic purposes, to detect local (e.g., within about 10 Mb) genomic rearrangements (e.g., deletions, inversions, duplications) around a site such as a gene of interest.
The array may contain about 3 × 750,000, 4 × 750,000, 5 × 750,000, or preferably 6 × 750,000 probes. More preferably, the array comprises 6 × 750,000 probes, with 2, 3, 4, 5, 6, 7 or 8 or more probes on each side of each restriction site. Most preferably, the array comprises 6 × 750,000 probes, with 3 probes on each side of each restriction site.
The probe array or set of probes may be synthesized on a support in a step-by-step fashion, or may be attached in presynthesized form. One method of synthesis is VLSIPS™ (as described in US 5,143,854 and EP 476,014), which uses light-directed synthesis of oligonucleotide probes in high-density, miniaturized arrays. Algorithms such as those described in US 5,571,639 and US 5,593,839 can be used to design masks that reduce the number of synthesis cycles. Arrays can also be synthesized combinatorially by delivering monomers to cells of the support via mechanically constrained flow paths, as described in EP 624,059. Arrays can also be synthesized by spotting reagents onto a support with an inkjet printer (see, e.g., EP 728,520).
In the context of the present invention, the terms "substantial set of probes" and "substantial array of probes" mean that the set or array includes at least about 50, 60, 70, 80, 90, 95, 96, 97, 98, or 99% of the complete set or array of probes. The set or array of probes is preferably complete (i.e., 100%).
In a preferred embodiment, the array comprises a single unique probe on each side of each primary restriction enzyme recognition site present in a given genome. If this number of probes exceeds the capacity of a single array, the array may preferably still represent the complete genome of a given species, but at lower resolution; e.g., only one of every 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 1,000 or 10,000 probes arranged in sequence along the linear chromosome template is present on the array. For example, when translocation partners are to be found, an array that covers the entire human (or other) genome at sub-optimal resolution may be preferred over a high-resolution array that covers only part of the same genome.
A lower-resolution representation of the complete genome of a given species is preferably obtained with probes on the array that each represent a single restriction fragment obtained after digestion with the primary restriction enzyme. This is preferably achieved by omitting one of every two, three, four, five, six, seven, eight, nine, ten, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety, or one hundred, etc., probes that hybridize to the same restriction fragment.
A lower resolution representation of the complete genome of a given species preferably comprises probes evenly distributed along a linear chromosome template. This is preferably achieved by omitting one or more probes in the genomic region that shows the highest probe density.
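The two thinning rules above (one probe per restriction fragment, then removal of probes where probe density is highest) can be combined in a short greedy routine. This is a hypothetical sketch: the input format of `(position, fragment_id)` tuples, the name `thin_probes`, and the rule "drop the right-hand member of the closest pair" are illustrative assumptions, one simple way among many to even out probe spacing.

```python
def thin_probes(probes, max_probes):
    """Reduce a probe list to at most `max_probes` positions on one chromosome.

    `probes` is a list of (position, fragment_id) tuples (hypothetical format).
    """
    # Step 1: keep only the first probe (by position) of each restriction fragment.
    seen = set()
    kept = []
    for pos, frag in sorted(probes):
        if frag not in seen:
            seen.add(frag)
            kept.append(pos)
    # Step 2: while still too many, drop a probe from the densest spot, i.e.
    # the right-hand member of the closest pair of neighbouring probes.
    while len(kept) > max_probes and len(kept) > 1:
        gaps = [kept[i] - kept[i - 1] for i in range(1, len(kept))]
        del kept[gaps.index(min(gaps)) + 1]
    return kept


probes = [(100, "a"), (150, "a"), (200, "b"), (210, "c"), (1000, "d")]
print(thin_probes(probes, 3))  # [100, 200, 1000]
```

The second probe on fragment "a" is dropped first, and then the probe at 210, which sits only 10 bp from its neighbour, is removed to reach the target count.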
Hybridization
The term "hybridization" as used herein shall include "the process by which a strand of nucleic acid joins with a complementary strand through base pairing" as well as the process of amplification as carried out in Polymerase Chain Reaction (PCR) technology.
A nucleotide sequence capable of selective hybridization will generally be at least 75%, preferably at least 85 or 90% and more preferably at least 95% or 98% homologous to the corresponding complementary nucleotide sequence over a region of at least 20, preferably at least 25 or 30, e.g., at least 40, 60 or 100 or more nucleotides.
"Specifically hybridize" refers to the binding, duplexing, or hybridizing of a molecule only to a specific nucleotide sequence under stringent conditions (e.g., 65 °C and 0.1× SSC; 1× SSC = 0.15 M NaCl, 0.015 M sodium citrate, pH 7.0). Stringent conditions are those under which a probe hybridizes to its target sequence but not to other sequences. Stringent conditions are sequence dependent and differ between circumstances; longer sequences hybridize specifically at higher temperatures. Generally, stringent conditions are selected to be about 5 °C below the thermal melting point (Tm) of the specific sequence at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength, pH, and nucleic acid concentration) at which 50% of the probes complementary to the target hybridize to the target sequence at equilibrium (because the target sequence is generally present in excess, 50% of the probes are occupied at equilibrium at Tm). Typically, stringent conditions include a Na+ (or other salt) ion concentration of about 0.01-1.0 M at pH 7.0-8.3 and a temperature of at least about 30 °C for short probes. Stringent conditions may also be achieved by adding destabilizing agents such as formamide or tetraalkylammonium salts.
As will be appreciated by those skilled in the art, hybridization of maximum stringency can be used to identify or detect identical nucleotide sequences, while hybridization of intermediate (or low) stringency can be used to identify or detect similar or related polynucleotide sequences.
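The relationship between Tm, salt concentration and the "about 5 °C below Tm" rule can be made concrete with an empirical estimate. The sketch below uses one widely used salt-adjusted approximation, Tm = 81.5 + 16.6·log10[Na+] + 0.41·(%GC) − 675/N, where N is the probe length. This formula is an assumption, not part of the source; published variants differ in the length term, and none of them accounts for formamide. The Na+ content of SSC is computed from the buffer composition given above, treating sodium citrate as trisodium citrate.

```python
import math


def ssc_sodium_molar(ssc_x):
    """Na+ molarity of an SSC buffer at strength ssc_x (e.g. 0.1 for 0.1x SSC).

    1x SSC = 0.15 M NaCl + 0.015 M trisodium citrate, i.e. 0.15 + 3 * 0.015 M Na+.
    """
    return ssc_x * (0.15 + 3 * 0.015)


def estimate_tm(seq, na_molar):
    """Salt-adjusted empirical Tm (deg C) for a probe; an approximation only."""
    seq = seq.upper()
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / len(seq)
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / len(seq)


def stringent_temp(seq, na_molar, offset=5.0):
    # Stringent conditions: about 5 deg C below Tm, per the text above.
    return estimate_tm(seq, na_molar) - offset


probe = "ACGTTGCAAGCTTGCA" * 3 + "AC"  # hypothetical 50-mer
print(round(stringent_temp(probe, ssc_sodium_molar(0.1)), 1))
```

In this model, lengthening the probe or raising its GC content raises Tm, which matches the statement that longer sequences hybridize specifically at higher temperatures.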
Methods of hybridizing probe arrays to labeled or unlabeled nucleotide sequences are also described. Specific hybridization reaction conditions can be controlled to alter hybridization (e.g., increase or decrease probe/target binding stringency). For example, reaction temperature, anion and cation concentrations, addition of detergents, etc., can alter the hybridization characteristics of the array probes and target molecules.
Frequency of interaction
The cross-linking frequency of restriction fragments can be measured by quantifying their ligation frequency. Suitably, this may be done by PCR using conventional 3C technology as described by Splinter et al. (2004) (see above). Briefly, formation of PCR products is measured by separation on an ethidium bromide-stained agarose gel, followed by scanning of the signal intensity with a Typhoon 9200 imager (Molecular Dynamics, Sunnyvale, CA). Suitably, several controls are used to interpret the data correctly, also as described by Splinter et al. (2004) (see above).
Because the 4C technique described herein provides a method for high-throughput analysis of the frequency of interaction of two or more nucleotide sequences in the nuclear space, the ligation frequency of restriction fragments is preferably quantified using the arrays described herein.
For quantification, the signal obtained for the 4C sample is normalized to the signal obtained for a control sample. The 4C sample and one or more control samples are labeled with different, distinguishable labels (e.g., dyes) and hybridized to the array simultaneously. The control sample(s) will typically contain all of the DNA fragments (i.e., all potential second nucleotide sequences that can be ligated to the first (target) nucleotide sequence) in equimolar amounts, and these should be similar in size to the second nucleotide sequence(s) in order to rule out bias in hybridization efficiency. Thus, the control template will typically comprise genomic DNA (of the same genetic background as the genomic DNA used to obtain the 4C template) that is digested with the primary and secondary restriction enzymes and labeled by the same method as the 4C template (e.g., random priming). Such a control template makes it possible to correct for probe-to-probe differences in hybridization efficiency. Normalizing the 4C array signal to the control array signal makes it possible to express the results as enrichment over random ligation events.
The labeled 4C template can also be hybridized to an array together with one or more control samples and/or other 4C templates, each carrying the same or a different label. The other 4C template may be unrelated to the first; e.g., it may be obtained from a different tissue and/or with a different set of inverse PCR primers. For example, the first 4C template may be derived from patient material, while the second 4C template may be obtained from a healthy subject or a control sample.
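Normalization against the control template, as described above, can be sketched as a per-probe log2 ratio. This is a minimal sketch under stated assumptions: the input is a dictionary of background-subtracted intensities per probe (a hypothetical format), the two channels are first scaled to equal total signal (a simple global normalization chosen for illustration; the source does not specify one), and a pseudocount avoids taking the logarithm of zero.

```python
import math


def log2_enrichment(sample, control, pseudocount=1.0):
    """Per-probe log2(4C / control) after scaling both channels to equal totals.

    sample, control: dicts mapping probe id -> background-subtracted intensity
    (hypothetical input format). The pseudocount avoids log(0).
    """
    probes = sample.keys() & control.keys()
    s_total = sum(sample[p] for p in probes)
    c_total = sum(control[p] for p in probes)
    scale = c_total / s_total  # global normalization between the two labels
    return {
        p: math.log2((sample[p] * scale + pseudocount) / (control[p] + pseudocount))
        for p in probes
    }


ratios = log2_enrichment({"p1": 300, "p2": 100}, {"p1": 200, "p2": 200}, pseudocount=0.0)
print(ratios["p2"])  # -1.0
```

Positive values indicate enrichment relative to the equimolar control, and values near zero indicate signal no higher than expected from random ligation. Because the control corrects for probe-to-probe differences in hybridization efficiency, a weakly hybridizing probe is not mistaken for a non-interacting region.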
Given the distinctive hybridization pattern expected from a genomic rearrangement, it is not always necessary to compare diseased subjects with healthy subjects. Thus, multiple (e.g., two or more) 4C templates (each of which may study a different locus in the same patient or subject) may be hybridized to one or more arrays.
The 4C templates may be labeled differently (e.g., for two-color or multicolor hybridization), or may carry the same label when the loci normally lie on different chromosomes, or far enough apart on the same chromosome, that there is minimal overlap between their DNA-DNA interaction signals. For example, material from a subject with T-cell leukemia can be processed to obtain 4C templates for TCR α/δ (labeled with one color, to detect translocations) and for MLL, TAL1, HOX11, and LMO2 (each labeled with the same second color, to detect other gene rearrangements). These five 4C templates can be hybridized to one array, allowing simultaneous analysis of disease-associated genomic rearrangements at multiple loci.
To quantify interaction frequency, the absolute signal intensity or the ratio relative to the control sample may also be considered. In addition, the signals of neighboring probes on the linear chromosome template can be used to identify interacting chromosomal regions. This positional information is preferably analyzed, for absolute signal intensities or for ratios relative to the control template signal, by ordering the probes along the linear chromosome template and applying a sliding-window method, e.g., a running average or running median.
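The sliding-window analysis described above can be sketched with a running median. In this illustration the window size (a fixed number of consecutive probes rather than a fixed number of kilobases) and the truncation of the window at chromosome ends are assumptions, and the function name is hypothetical. A running median is shown because it suppresses isolated outlier probes; replacing `median` with a mean gives the running-average variant.

```python
from statistics import median


def running_median(positions, values, window=5):
    """Smooth probe values ordered along the linear chromosome template.

    Probes are first sorted by genomic position; each output value is the
    median of `window` consecutive probes centred on that probe (the window
    is truncated at chromosome ends). `window` should be odd.
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    ordered = [values[i] for i in order]
    half = window // 2
    smoothed = []
    for i in range(len(ordered)):
        lo, hi = max(0, i - half), min(len(ordered), i + half + 1)
        smoothed.append(median(ordered[lo:hi]))
    return [positions[i] for i in order], smoothed


pos, smooth = running_median([30, 10, 20, 40, 50], [100, 1, 2, 3, 4], window=3)
print(pos, smooth)  # [10, 20, 30, 40, 50] [1.5, 2, 3, 4, 3.5]
```

The single outlying probe at position 30 (value 100) does not survive smoothing, whereas a genuinely interacting region, where many neighboring probes are elevated together, would.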
Test method
In another aspect of the invention, an assay is provided that identifies one or more agents that modulate DNA-DNA interactions.
The term "modulate" as used herein refers to preventing, reducing, inhibiting, restoring, elevating, increasing, or otherwise affecting DNA-DNA interactions.
In some cases, it is desirable to evaluate two or more agents together for their ability to modulate DNA-DNA interactions. In these cases, the assay may conveniently be modified by adding the one or more additional agents at the same time as, or after, the first agent.
The methods of the invention may also be screening methods whereby a number of agents are tested for their ability to modulate DNA-DNA interactions.
The assay methods of the invention are expected to be suitable for small- and large-scale screening of agents and for quantitative assays.
Medical applications of such therapeutic agents are included within the scope of the present invention, as are drug development procedures per se and pharmaceutical compositions containing these agents. For example, a drug development procedure may comprise taking an agent identified or identifiable by a method described herein, optionally modifying it (e.g., modifying its structure and/or providing a new composition comprising the moiety), and performing further studies (e.g., toxicity studies and/or studies of activity, structure or function). Such studies can be performed in non-human animals and ultimately in humans, and will generally involve determining one or more effects at different dosage levels. The drug development program can use a computer to analyze the moieties identified by the screening method (e.g., to predict structure and/or function, identify potential agonists or antagonists, or search for other moieties that may have similar structure or function).
Diagnostic test
Currently, various genomic rearrangements remain difficult to detect by molecular cytogenetic techniques. Although array comparative genomic hybridization (array-CGH) is a recently developed technique that detects chromosomal amplifications and/or deletions at a resolution of 35-300 kb, it is not suitable for detecting balanced translocations and chromosomal inversions. Spectral karyotyping (SKY) or conventional karyotyping, on the other hand, is often performed on patient material to detect chromosomal translocations and numerical changes, but the resolution with which translocation breakpoints can be determined is low, typically 10-50 Mb and 5-10 Mb, respectively. Results obtained with these two methods (especially SKY) therefore have to be followed up with time-consuming and laborious validation experiments such as fluorescence in situ hybridization (FISH) and molecular breakpoint cloning strategies.
The 4C technique provides a process by which any chromosomal rearrangement can be detected on the basis of changes in the frequency of interaction between physically linked DNA sequences. Thus, the 4C technique can be used to identify (recurrent) chromosomal rearrangements in most human malignancies, multiple congenital malformations or mental retardation. An important advantage of the 4C technique is that it allows very accurate mapping of breakpoints to regions of only a few kilobases. Another advantage is that the exact breakpoint location need not be known in advance, since breakpoints can be detected even when the 4C bait sequence lies 1-5 Mb from the breakpoint. This also means that the same bait sequence can be used to detect specific chromosomal rearrangements spanning large breakpoint regions. Accurate mapping of genomic rearrangements by the 4C technique would greatly facilitate the identification of one or more aberrantly expressed genes involved in a disease or genetic disorder, which would play an important role in understanding genotype-phenotype relationships, aid treatment decisions, and add important prognostic information.
In one embodiment of the invention, a normal or standard value for a subject is established in order to provide a basis for diagnosing or prognosing a disease. This is obtained by testing a sample taken from a normal subject, such as an animal or human. The frequency of DNA-DNA interactions can be quantified by comparing it to a series of dilutions of a positive control. Normal values from a normal sample can then be compared to values from a sample from a subject that is or is potentially affected by the disease or condition. The deviation between the standard and subject values establishes the presence of a disease state.
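Quantifying an interaction frequency against a positive-control dilution series, as described above, amounts to reading a measured signal off a standard curve. The sketch below assumes piecewise-linear interpolation between the dilution points and a signal that increases strictly with template amount; both the calibration model and the function name are illustrative assumptions.

```python
def interpolate_frequency(signal, dilution_signals, dilution_amounts):
    """Convert a measured signal to an amount using a positive-control dilution series.

    Piecewise-linear interpolation between the bracketing dilution points
    (an assumed calibration model). Signals must increase strictly with amount.
    """
    points = sorted(zip(dilution_signals, dilution_amounts))
    for (s0, a0), (s1, a1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal outside the range of the dilution series")


# Dilution series: signal 100, 200 and 400 for relative amounts 1, 2 and 4.
print(interpolate_frequency(150, [100, 200, 400], [1, 2, 4]))  # 1.5
```

Signals outside the calibrated range raise an error rather than being extrapolated, since a standard curve says nothing about values beyond its highest and lowest dilutions. Values obtained this way for normal samples and for subject samples can then be compared directly.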
These diagnostic tests can be adapted to assess the efficacy of a particular treatment regimen and used in animal studies, in clinical trials, or to monitor the treatment of individual patients. To provide a basis for diagnosing disease, a normal or standard profile of DNA-DNA interactions is established. Standard values from normal samples can be compared with values from samples from subjects who are, or may be, affected by the disease or disorder. The deviation between the standard and subject values establishes the presence of a disease state. If a disease is identified, an existing therapeutic agent can be administered and a treatment profile or values generated. Finally, the method is repeated at regular intervals to assess whether the values progress toward or return to the normal or standard pattern. Successive treatment profiles can be used to show the efficacy of treatment over a period of days or months.
The 4C technique accurately detects at least 5 Mb of genomic DNA linked in cis to the nucleotide sequence being analyzed (see FIGS. 2-3 and 5). Advantageously, the 4C technique can be used to detect any genomic abnormality that changes the linear genomic distance between the rearranged sequence and the selected 4C sequence (bait). For example, the change may be an increase or decrease in genomic distance, or an under-representation (as in a deletion) or over-representation (as in a duplication) of sequences near (e.g., up to or more than 15 Mb from) the 4C sequence (bait). Typically, the genomic abnormality or rearrangement is the cause of, or is associated with, a disease such as cancer (e.g., leukemia) or another genetic or congenital disease as described herein.
Genetic abnormalities (e.g., genomic or chromosomal abnormalities, such as balanced and/or unbalanced genomic or chromosomal abnormalities) include, but are not limited to, rearrangements, translocations, inversions, insertions, deletions and other mutations of nucleic acids (e.g., chromosomes), and the loss or gain of part or all of a chromosome. They are a major cause of genetic disorders or diseases, both congenital and acquired (e.g., malignancies). Many rearrangements involve two different chromosomes. In such cases, a gene (or gene fragment) is removed from its normal physiological context on a particular chromosome and placed on the recipient chromosome adjacent to an unrelated gene or gene fragment (typically an oncogene or proto-oncogene).
Malignancies may include acute leukemias, malignant lymphomas, and solid tumors. Non-limiting examples of such alterations are: t(14;18), which usually occurs in NHL; t(12;21), which is commonly found in childhood precursor B-ALL; and 11q23 abnormalities involving the MLL (myeloid-lymphoid leukemia or mixed-lineage leukemia) gene, which occur in acute leukemias.
The MLL gene in chromosomal region 11q23 is involved in several translocations in ALL and acute myeloid leukemia (AML). To date, at least 10 partner genes have been identified. Some of these translocations (e.g., t(4;11)(q21;q23), t(11;19)(q23;p13) and t(1;11)(p32;q23)) occur predominantly in ALL, whereas others, such as t(1;11)(q21;q23), t(2;11)(p21;q23), t(6;11)(q27;q23) and t(9;11)(p22;q23), are more commonly observed in AML. Rearrangements involving the 11q23 region are very frequent in infant acute leukemia (about 60-70%) and less frequent in childhood and adult leukemia (about 5% each).
Rearrangements in lymphoid malignancies often involve the Ig or TCR genes. Examples include the three classes of translocation found in Burkitt's lymphoma (t(8;14), t(2;8), and t(8;22)), in which the MYC gene is juxtaposed to Ig heavy-chain (IGH), Ig kappa (IGK), or Ig lambda (IGL) gene segments, respectively. Another common translocation of this class is t(14;18)(q32;q21), which is observed in about 90% of follicular lymphomas, one of the major NHL types. In this translocation, the BCL2 gene is rearranged into the JH gene segments of the IGH locus or an adjacent region. The result of this chromosomal abnormality is overexpression of the BCL2 protein, which acts as a survival factor in growth control by inhibiting programmed cell death.
The BCL2 gene consists of three exons, which are spread over a large region. The last exon encodes a large 3' untranslated region (3' UTR). The 3' UTR is one of the two regions in which many t(14;18) breakpoints cluster and is called the "major breakpoint region"; the other breakpoint region involved in the t(14;18) translocation lies 20-30 kb downstream of the BCL2 locus and is called the "minor cluster region". A third BCL2 breakpoint region, the VCR (variant cluster region), is located 5' of the BCL2 locus and is involved in variant translocations, i.e., t(2;18) and t(18;22), in which the IGK and IGL gene segments, respectively, are the partner genes.
Thus, for example, the 4C technology can be used to screen patient material for genetic abnormalities in or near loci selected for their frequent association with a given clinical phenotype. Other non-limiting examples of such loci are AML1, MLL, MYC, BCL, BCR, ABL1, the immunoglobulin loci, LYL1, TAL1, TAL2, LMO2, TCR α/δ, TCR β, HOX, and other loci involved in various lymphoblastic leukemias.
Advantageously, if a genetic abnormality is suspected, the 4C technique can be used as the initial and sole screening method to confirm and map the presence of the abnormalities described herein.
Detecting genomic rearrangements
In a particularly preferred embodiment of the invention, the methods described herein can be used to detect genomic rearrangements.
Currently, genomic rearrangements (such as translocation breakpoints) are very difficult to detect. For example, comparative genomic hybridization (CGH) microarrays can detect several types of rearrangement but not translocations. If a translocation is suspected in a patient but the chromosomal partner is not known, spectral karyotyping (SKY) can be performed to find the translocation partner and roughly estimate the breakpoint location. However, the resolution is very low (typically no better than about 50 Mb), and additional fine mapping, which is time consuming and expensive, is often required. This is usually done by fluorescence in situ hybridization (FISH), which also provides only limited resolution: at best, breakpoints can be localized to within about ±50 kb.
The frequency of DNA-DNA interactions is primarily a function of the genomic separation of the sites; i.e., the frequency of DNA-DNA interaction is inversely proportional to the linear distance (in kilobases) between two DNA loci on the same physical DNA template (Dekker et al., 2002). Thus, a translocation that produces one or more new physical DNA templates is accompanied by a change in DNA-DNA interactions near the breakpoint, and this change can be measured by the 4C technique. Translocation-based diseases are often caused by abnormal DNA-DNA interactions, because translocations result from broken physical connections (interactions) between chromosome (DNA) arms.
Thus, to detect translocations, the 4C technology can be used to identify differences in DNA-DNA interactions between subjects with and without the disease.
For example, 4C technology can be used to screen patient material for translocations near loci that are selected based on their frequent association with a given clinical phenotype as described herein.
If a translocation is suspected in a patient but the chromosomal partner is not known, initial mapping can be performed with currently available methods such as spectral karyotyping (SKY). This allows identification of translocation partners and a very rough estimate of breakpoint locations (typically no better than about 50 Mb resolution). The 4C technique can then be used to fine-map the breakpoints, for example using bait sequences located every 2, 5, 10, or 20 Mb (or at other intervals as described herein) across the region, and to identify one or more genes that are misexpressed, e.g., as a result of the translocation.
Typically, a translocation is identified by an abrupt change from low to high interaction frequency on a chromosome that does not carry the 4C bait sequence, or at a distant position on the bait's own chromosome.
In a preferred embodiment, the sample is obtained from a subject in a pre-malignant state.
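The abrupt low-to-high change described above can be located with a simple step detector. This is a sketch under explicit assumptions: probe positions are sorted along one chromosome, signals are positive (e.g., normalized intensities), the mean signal of `window` probes on each side of every probe boundary is compared, and a step is reported only if the after/before ratio reaches `min_ratio`. The window of 5 probes, the threshold of 4, and the function name are hypothetical choices, not values from the source.

```python
from statistics import mean


def find_step_up(positions, signal, window=5, min_ratio=4.0):
    """Locate the largest low-to-high jump in 4C signal along one chromosome.

    Compares the mean signal of `window` probes before vs. after each probe
    boundary; returns the boundary position (midpoint between flanking probes)
    with the largest ratio, or None if no ratio reaches `min_ratio`.
    """
    best_pos, best_ratio = None, min_ratio
    for i in range(window, len(signal) - window + 1):
        before = mean(signal[i - window:i])
        after = mean(signal[i:i + window])
        ratio = (after + 1e-9) / (before + 1e-9)  # small offset avoids division by zero
        if ratio > best_ratio or (best_pos is None and ratio >= min_ratio):
            best_pos, best_ratio = (positions[i - 1] + positions[i]) / 2, ratio
    return best_pos


positions = [i * 1000 for i in range(20)]
signal = [1.0] * 10 + [20.0] * 10  # low signal, then high signal beyond a breakpoint
print(find_step_up(positions, signal))  # 9500.0
```

Means are used rather than medians because, for a clean step, the mean-based ratio peaks exactly at the true boundary, whereas median-based ratios form a plateau of ties around it. A breakpoint found this way would then be narrowed down with the finer probe resolution of the array.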
In a preferred embodiment, the subject's sample consists of cultured or uncultured amniotic fluid cells obtained by amniocentesis for prenatal diagnosis.
In a preferred array design, the probes present on a single array represent the entire genome of a given species with maximum resolution. Thus, arrays or the like for detecting translocations by 4C techniques comprise probes complementary to each side of each primary restriction enzyme recognition site in the genome of a given species (e.g., human) as described herein.
In another preferred array design, the probes present on a single array represent the entire genome of a given species, but the resolution is not maximal. Thus, arrays or the like for detecting translocations by 4C techniques comprise probes complementary to only one side of each primary restriction enzyme recognition site in the genome of a given species (e.g., human) as described herein.
In another preferred array design, the probes present on a single array represent the entire genome of a given species, but the resolution is not maximal. Thus, arrays for detection of translocations, deletions, inversions, duplications and other genomic rearrangements by 4C techniques comprise probes complementary to one side of every other primary restriction enzyme recognition site arranged along a genomic linear template of a given species (e.g., human), as described herein.
Thus, arrays for detection of translocations, deletions, inversions, duplications and other genomic rearrangements by 4C techniques may comprise probes as described herein that each represent a single restriction fragment obtained after digestion with the primary restriction enzyme. This is preferably achieved by omitting one of every two (or three, four, five, six, seven, eight, nine, ten, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety or one hundred, etc.) probes that hybridize to the same restriction fragment. Such arrays may also comprise probes evenly distributed along the linear chromosome template as described herein. This is preferably achieved by omitting one or more probes in those genomic regions that show the highest probe density.
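The two thinning rules above (one probe per restriction fragment, and fewer probes where density is highest) can be sketched as simple filters. Probes are modelled as hypothetical `(fragment_id, position)` pairs; which probe survives is an arbitrary assumption (the first in input order, or the first by position within a window), and the window size and cap are placeholder values.

```python
from collections import defaultdict


def one_probe_per_fragment(probes):
    """Keep only the first probe seen for each restriction fragment."""
    seen = set()
    kept = []
    for frag, pos in probes:
        if frag not in seen:
            seen.add(frag)
            kept.append((frag, pos))
    return kept


def cap_density(probes, window=100_000, max_per_window=2):
    """Even out coverage by dropping probes in the densest windows.

    Probes are grouped into fixed `window`-bp bins by position; in any bin
    holding more than `max_per_window` probes, only the first
    `max_per_window` (by position) are retained.
    """
    bins = defaultdict(list)
    for frag, pos in sorted(probes, key=lambda p: p[1]):
        bins[pos // window].append((frag, pos))
    return [p for b in sorted(bins) for p in bins[b][:max_per_window]]
```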
In another preferred array design, the probes present on a single array represent the entire genome of a given species, but not at maximum resolution. Thus, arrays for detection of translocations, deletions, inversions, duplications and other genomic rearrangements by 4C techniques comprise probes complementary to one side of one in every three, four, five, six, seven, eight, nine, ten, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety or one hundred, etc., primary restriction enzyme recognition sites arranged sequentially along the linear genomic template of a given species (e.g., human), as described herein. Such arrays may also comprise probes, as described herein, that represent the entire genome at a density of one probe per 100 kilobases. Such arrays may also comprise probes, as described herein, representing every primary restriction enzyme recognition site in the genome that can be represented by a unique probe sequence.
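The selection rules in the preceding paragraph (one in every n sites, one probe per 100 kb, and unique probes only) can each be expressed as a small filter. This is a sketch under stated assumptions: positions are plain integers on a single chromosome, and uniqueness is checked only within the supplied probe set, whereas a real design would check each probe against the whole genome.

```python
from collections import Counter


def every_nth_site(site_positions, n):
    """Keep one of every n consecutive restriction sites along the template."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return sorted(site_positions)[::n]


def one_per_bin(positions, bin_size=100_000):
    """Keep the first probe position falling in each bin_size-bp bin."""
    chosen = {}
    for pos in sorted(positions):
        chosen.setdefault(pos // bin_size, pos)
    return [chosen[b] for b in sorted(chosen)]


def unique_probes(probe_seqs):
    """Drop probe sequences that occur more than once in the set."""
    counts = Counter(probe_seqs)
    return [s for s in probe_seqs if counts[s] == 1]
```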
In another preferred array design, probes on a single array as described herein represent genomic regions of a given size, e.g., about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, or 10Mb (e.g., about 50kb-10Mb), around all loci known to be involved in translocations, deletions, inversions, duplications, and other genomic rearrangements.
In another preferred array design, probes on a single array as described herein represent genomic regions of a given size, e.g., about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, or 10Mb (e.g., about 50kb-10Mb), around selected loci known to be involved in translocations, deletions, inversions, duplications, and other genomic rearrangements. The loci may be selected according to the criteria taught herein; for example, they may simply be the loci involved in a given disease type.
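Building the target regions for such a focused array amounts to padding each locus by a fixed flank and merging overlapping windows. The sketch below assumes loci are given as hypothetical `(chrom, position)` pairs together with chromosome lengths; the coordinates are made up, and the 250 kb flank stands in for any of the region sizes listed above.

```python
def regions_around_loci(loci, flank, chrom_sizes):
    """Build merged (chrom, start, end) windows of +/- flank bp around loci.

    Windows are clipped to the chromosome ends given in `chrom_sizes`, and
    overlapping or touching windows on the same chromosome are merged.
    """
    windows = sorted(
        (chrom, max(0, pos - flank), min(chrom_sizes[chrom], pos + flank))
        for chrom, pos in loci
    )
    merged = []
    for chrom, start, end in windows:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


print(regions_around_loci(
    [("chrA", 1_000_000), ("chrA", 1_300_000), ("chrB", 50_000)],
    250_000,
    {"chrA": 5_000_000, "chrB": 200_000},
))
# [('chrA', 750000, 1550000), ('chrB', 0, 200000)]
```

Restricting the loci to those associated with one disease type, as described above, only changes the input list.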
In another preferred array design, probes on a single array as described herein represent genomic regions of interest of 100kb, 200kb, 300kb, 400kb, 500kb, 600kb, 700kb, 800kb, 900kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, 10Mb, 20Mb, 30Mb, 40Mb, 50Mb, 60Mb, 70Mb, 80Mb, 90Mb, or 100Mb (e.g., 100kb-100Mb) covering part or all of one or more chromosomes, wherein each probe is represented multiple times (e.g., 10, 100 or 1000 times) so that the hybridization signal intensity at each probe sequence can be measured quantitatively.
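With each probe spotted many times, the replicate spots can be collapsed into one robust value per probe, and their spread gives a check on measurement quality. The sketch below uses the median and the coefficient of variation, an illustrative choice not taken from the text; `summarize_replicates` is a hypothetical name.

```python
from statistics import mean, median, pstdev


def summarize_replicates(spot_intensities):
    """Collapse replicate spots of each probe into (median, CV).

    `spot_intensities` maps probe_id -> list of intensities measured on the
    replicate spots of that probe. The coefficient of variation is the
    population standard deviation divided by the mean.
    """
    summary = {}
    for probe, values in spot_intensities.items():
        m = mean(values)
        cv = pstdev(values) / m if m else float("inf")
        summary[probe] = (median(values), cv)
    return summary
```

A high CV for a probe flags spots whose signal should not be trusted for quantitative comparison.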
In a preferred experimental design, the 4C sequence (bait) is located about 0kb, 10kb, 20kb, 30kb, 40kb, 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, 10Mb, 11Mb, 12Mb, 13Mb, 14Mb or 15Mb (e.g., about 0-15Mb) or more away from the actual rearranged sequence (i.e., the breakpoint in the case of a translocation).
In a preferred hybridization, two differentially labeled 4C templates, generated with the same sequence (4C bait) from a diseased and from a non-diseased subject, are hybridized simultaneously to the same array. Differences in DNA-DNA interactions allow detection of breakpoints in cis (on the same chromosome as the 4C bait) and in trans (on the translocation partner).
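A minimal sketch of how the two channels might be compared probe by probe, assuming a simple global normalization (scaling the diseased channel to the control channel's total intensity), a pseudocount to avoid division by zero, and a two-fold cut-off. None of these choices come from the text; dedicated microarray normalization methods would normally be used.

```python
import math


def log2_ratios(diseased, control, pseudocount=1.0):
    """Per-probe log2(diseased / control) from a two-colour hybridization.

    The diseased channel is scaled to the control channel's total intensity
    so that a ratio of 0 means no difference between the two samples.
    """
    if len(diseased) != len(control):
        raise ValueError("channels must have the same number of probes")
    sd, sc = sum(diseased), sum(control)
    return [
        math.log2((d / sd * sc + pseudocount) / (c + pseudocount))
        for d, c in zip(diseased, control)
    ]


def flag_probes(ratios, threshold=1.0):
    """Indices of probes whose |log2 ratio| exceeds `threshold` (2-fold)."""
    return [i for i, r in enumerate(ratios) if abs(r) > threshold]
```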
In a preferred hybridization, multiple differentially labeled 4C templates, generated with the same sequence (4C bait) from diseased and non-diseased subjects, are hybridized simultaneously to the same array. Differences in DNA-DNA interactions allow detection of breakpoints in cis (on the same chromosome as the 4C bait) and in trans (on the translocation partner).
Advantageously, multicolor microarray analysis can be used instead of two-color analysis, allowing more than two samples to be hybridized to a single array simultaneously. Multicolor hybridization can therefore be used in the 4C technique.
In a preferred hybridization, multiple differentially labeled 4C templates generated with the same sequence (4C bait) from diseased subjects, together with one differentially labeled 4C template from a non-diseased subject, are hybridized simultaneously to the same array. Differences in DNA-DNA interactions allow detection of breakpoints in cis (on the same chromosome as the 4C bait) and in trans (on the translocation partner).
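With several diseased-subject channels and one non-diseased reference channel on the same array, the comparison extends to one ratio profile per patient. The sketch assumes strictly positive intensities and a global scaling of each channel to the reference total; labels such as `"patient1"` are hypothetical.

```python
import math


def ratios_against_reference(channels, reference):
    """log2 ratio of each diseased-subject channel to one reference channel.

    `channels` maps a sample label to its per-probe intensities; `reference`
    holds the non-diseased channel's intensities for the same probes. Each
    channel is scaled to the reference's total intensity before comparison.
    """
    ref_total = sum(reference)
    out = {}
    for label, values in channels.items():
        if len(values) != len(reference):
            raise ValueError(f"{label}: probe count mismatch")
        scale = ref_total / sum(values)
        out[label] = [math.log2(v * scale / r) for v, r in zip(values, reference)]
    return out
```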
In another preferred hybridization, two differentially labeled 4C templates from the same non-diseased subject, obtained with two different sequences (4C baits) each representing a different possible translocation partner, can be hybridized simultaneously to the same array. A cluster of strong hybridization signals observed along the linear template of a chromosome other than the one carrying the sequence of interest (4C bait) will identify the translocation partner chromosome and the breakpoint on that partner.
In another preferred hybridization, multiple differentially labeled 4C templates from the same non-diseased subject, obtained with multiple different sequences (4C baits) each representing a different possible translocation partner, can be hybridized simultaneously to the same array. A cluster of strong hybridization signals observed along the linear template of a chromosome other than the one carrying the sequence of interest (4C bait) will identify the translocation partner chromosome and its breakpoint relative to the sequence of interest.
Materials for detecting translocations, deletions, inversions, duplications and other genomic rearrangements by 4C techniques can be obtained by cross-linking (and further processing as described) live and/or dead cells and/or nuclear lysates from diseased and/or non-diseased subjects and/or isolated chromatin (as described herein) and the like.
Detecting inversions
Inversions (e.g., balanced inversions) do not change DNA copy number and therefore cannot be detected by methods such as comparative genomic hybridization, but they can be detected by the 4C technique, particularly when the (balanced) inversion lies close to the 4C sequence (bait) (e.g., up to about 1-15Mb or more).
The detection of (balanced) inversions is based on identifying those DNA-DNA interactions that differ between diseased and non-diseased subjects. An inversion alters the position (in kilobases) on the physical DNA template of all sequences in the rearranged region, except the most centrally located ones, relative to nearby sequences on the same chromosome as the 4C sequence (bait). Since the frequency of DNA-DNA interactions is inversely related to the genomic distance separating the sites, diseased subjects will show a reversed pattern of hybridization intensities for all probes located in the rearranged genomic region compared with non-diseased subjects. The 4C technique is therefore able to identify the location and size of a (balanced) inversion.
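The reasoning about inversions can be checked with a toy model in which interaction signal falls off as 1/distance from the bait. The 1/distance form is an assumed idealization of the inverse relationship described above, chosen only for illustration. Moving each probe inside an inverted segment [a, b] to its new coordinate a + b - p reproduces the expected result: the profile inside the segment is reversed, and a probe exactly at the segment's midpoint would map onto itself, which is why the most central sequences are unaffected.

```python
def expected_signal(probe_positions, bait, inversion=None):
    """Toy 4C profile in which signal ~ 1 / distance from the bait.

    If `inversion` = (a, b) is given, probes inside [a, b] are first moved to
    their post-inversion coordinate a + b - p; probes outside are unaffected.
    """
    signals = []
    for p in probe_positions:
        if inversion is not None and inversion[0] <= p <= inversion[1]:
            p = inversion[0] + inversion[1] - p  # position after inversion
        signals.append(1.0 / max(abs(p - bait), 1))
    return signals


probes = [2_000_000, 3_000_000, 4_000_000, 5_000_000, 6_000_000, 7_000_000]
control = expected_signal(probes, bait=0)
patient = expected_signal(probes, bait=0, inversion=(3_000_000, 6_000_000))
# Inside the inverted segment the patient profile is the control profile reversed.
print(patient[1:5] == control[1:5][::-1])  # True
# Outside the segment the two profiles are identical.
print(patient[0] == control[0] and patient[5] == control[5])  # True
```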
According to this aspect of the invention, a preferred specialized array design comprises probes on a single array that represent a genomic region of a given size, such as about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb or 10Mb (e.g., 50kb-10Mb), around a locus suspected of carrying an inversion or other rearrangement.
In another preferred specialized array design, probes on a single array represent genomic regions of a given size (50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, etc.) around loci suspected of carrying inversions or other rearrangements. To quantify signal intensities reliably, the amount of each probe on the array should be in large excess over the amount of the cognate fragments hybridized to it. It may therefore be necessary for each probe to be present on the array multiple times (e.g., 10, 20, 50, 100 or 1000 times, etc.). In addition, it may be necessary to titrate the amount of template hybridized to the array.
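Titrating the template can be sketched as finding the largest input amount for which signal still rises in proportion to input, i.e. before probes begin to saturate. The proportionality check (slope taken from the smallest input, 10% relative tolerance), the function name and the amount units are assumptions for illustration; input amounts must be positive.

```python
def linear_range(titration, tol=0.1):
    """Largest template amount whose signal stays proportional to input.

    `titration` is a list of (template_amount, mean_signal) pairs. The slope is
    estimated from the smallest amount; scanning upwards, an amount stays in the
    linear range while its signal is within `tol` (relative) of slope * amount,
    and scanning stops at the first point that deviates.
    """
    points = sorted(titration)
    slope = points[0][1] / points[0][0]
    best = points[0][0]
    for amount, signal in points[1:]:
        expected = slope * amount
        if abs(signal - expected) / expected > tol:
            break  # probes are starting to saturate
        best = amount
    return best
```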
Detecting deletions
The detection of deletions is based on identifying those DNA-DNA interactions that differ between diseased and non-diseased subjects. Probes within a deleted region will lack DNA interactions with a 4C sequence (bait) located near that region (e.g., within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15Mb or more). If the deletion is present on both alleles (homozygous), it may result in a complete lack of hybridization signal for all probes located in the deleted region; if it is present on only one allele (heterozygous), it may result in reduced signal intensity in diseased subjects compared with non-diseased subjects. A deletion also brings sequences that lie further away on the physical DNA template closer to the 4C sequence under analysis (bait), which results in a stronger hybridization signal for probes located immediately beyond the far side of the deleted region.
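The deletion signatures described above (loss of signal inside the deleted region, and a gain just beyond it) can be sketched with per-probe patient/control ratios. The thresholds below are illustrative assumptions, probes are assumed to be ordered by increasing distance from the bait, intensities are assumed to be positive, and both function names are hypothetical.

```python
def classify_deletion(control, patient, hom_max=0.1, het_range=(0.3, 0.7)):
    """Label each probe from its patient/control signal ratio.

    ratio <= hom_max        -> "homozygous_del" (signal essentially absent)
    ratio within het_range  -> "heterozygous_del" (roughly halved)
    otherwise               -> "normal"
    """
    labels = []
    for c, p in zip(control, patient):
        ratio = p / c
        if ratio <= hom_max:
            labels.append("homozygous_del")
        elif het_range[0] <= ratio <= het_range[1]:
            labels.append("heterozygous_del")
        else:
            labels.append("normal")
    return labels


def far_flank_ratio(labels, control, patient):
    """Patient/control ratio at the first probe beyond the last deleted probe.

    This is the probe on the far side of the deletion, which is expected to
    gain signal; returns None if there is no deletion or no such probe.
    """
    deleted = [i for i, label in enumerate(labels) if label != "normal"]
    if not deleted or deleted[-1] + 1 >= len(labels):
        return None
    k = deleted[-1] + 1
    return patient[k] / control[k]
```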
Detecting duplications
The detection of duplications is generally based on identifying those DNA-DNA interactions that differ between diseased and non-diseased subjects. Probes in the duplicated region will show enhanced hybridization signals with a 4C sequence (bait) located near the rearranged region (e.g., within about 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, or 15Mb or more) compared with the signal from a control, non-diseased subject. Probes located beyond the duplicated region are displaced further from the 4C sequence and will therefore show a weaker hybridization signal than in a control, non-diseased subject.
An increase or decrease in the frequency of DNA-DNA interaction in the subject sample compared to the control is preferably indicative of a duplication or insertion.
An increase in the frequency of DNA-DNA interaction within a region, and/or a decrease in the frequency of DNA-DNA interaction for a more distant region, in the subject sample compared with the control is preferably indicative of a duplication or insertion.
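The duplication signature (gain within the duplicated span, loss beyond it) can likewise be tested on patient/control ratios ordered by increasing distance from the bait. The sketch returns the two pieces of evidence separately, since either or both may be observed; the 1.3 and 0.77 cut-offs and the function name are hypothetical.

```python
def duplication_evidence(ratios, region, gain=1.3, loss=0.77):
    """Return (gain_inside, loss_beyond) for a candidate duplicated span.

    `ratios` are patient/control signal ratios for probes ordered by increasing
    distance from the bait; `region` = (i, j) is the half-open index span of the
    candidate duplication. A flag is False when its probe set is empty.
    """
    i, j = region
    inside, beyond = ratios[i:j], ratios[j:]
    gain_inside = bool(inside) and sum(inside) / len(inside) >= gain
    loss_beyond = bool(beyond) and sum(beyond) / len(beyond) <= loss
    return gain_inside, loss_beyond
```

A result of `(True, True)` meets both criteria; a single `True` corresponds to the "and/or" case.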
Prenatal diagnosis
Advantageously, the 4C technique can also be used for prenatal diagnosis.
Nucleic acids can be obtained from a fetus using a variety of methods known in the art. Amniotic fluid may be obtained, for example, by amniocentesis, and a fetal cell suspension extracted from it is cultured for several days (Mercier & Bresson (1995) Ann. Génét., 38, 151-157). Nucleic acids are then extracted from the cells. Chorionic villus sampling may eliminate the culturing step and avoids the collection of amniotic fluid. These techniques can be applied early in pregnancy (as early as 7 weeks for chorionic villus sampling; 13-14 weeks for amniocentesis) but slightly increase the risk of miscarriage.
Fetal blood collected directly from the umbilical cord can also be used to obtain nucleic acids, but this usually requires a clinical team specialized in the technique (Donner et al. (1996) Fetal Diagn. Ther., 10, 192-199).
Advantageously, genetic abnormalities (e.g., genomic or chromosomal abnormalities), such as rearrangements, translocations, inversions, insertions, deletions and other mutations in chromosomes and nucleic acids, can be detected at this stage.
Preferably, genetic abnormalities (e.g., genomic or chromosomal abnormalities) such as rearrangements, translocations, inversions, insertions, deletions and other mutations in chromosome 21, 18, 13, X or Y, and loss or gain of part or all of chromosome 21, 18, 13, X or Y, are detected, because most fetal chromosomal abnormalities involve these chromosomes.
Determination of genomic integration sites
The 4C technique also allows the determination of the genomic integration sites of viruses, transgenes and the like, even when multiple copies are inserted at different locations in the genome (as depicted in fig. 4).
Determining the propensity to acquire a certain translocation
Beneficially, the 4C technique can also be used in non-diseased subjects to measure the genomic environment of loci frequently involved in genetic abnormalities. In this way, it is possible to determine a subject's propensity to acquire a certain genetic abnormality.
Thus, in addition to the medical applications described herein, the present invention may be used for diagnostics.
Test subject
The term "subject" includes mammals, such as animals and humans.
Agent
The agent may be an organic compound or other chemical. The agent may be a compound obtained or produced from any suitable source, whether natural or artificial. The agent may be an amino acid molecule, a polypeptide, or a chemical derivative thereof, or a combination thereof. The agent may even be a polynucleotide molecule, which may be a sense or antisense molecule, or an antibody, e.g., a polyclonal antibody, a monoclonal antibody or a monoclonal humanized antibody.
Various strategies have been developed to produce monoclonal antibodies with human characteristics that do not require antibody-producing human cell lines. For example, useful mouse monoclonal antibodies are "humanized" by linking rodent variable regions to human constant regions (Winter, G., and Milstein, C. (1991) Nature 349, 293-). This reduces the human anti-mouse immunogenicity of the antibody, but residual immunogenicity is retained because of the foreign V-region framework. Furthermore, the antigen-binding specificity is essentially that of the murine donor. CDR grafting and framework manipulation (EP 0239400) have refined antibody engineering to the point where it is possible to produce humanized murine antibodies suitable for therapeutic use in humans. Humanized antibodies can be obtained using methods known in the art (e.g., as described in US-A-239400).
The agent may be attached to the entity (e.g., an organic molecule) via a linker, which may be a hydrolysable bifunctional linker.
Entities can be designed or obtained from libraries of compounds, including peptides, as well as other compounds, such as small organic molecules.
For example, the entity can be a natural substance, a biological macromolecule, or an extract prepared from cells or tissues such as bacteria, fungi, or animals (particularly mammals), organic or inorganic molecules, synthetic agents, semi-synthetic agents, structural or functional mimetics, peptides, peptidomimetics, peptides cleaved from intact proteins, or peptides that are synthetic (such as, for example, using a peptide synthesizer or by recombinant techniques or combinations thereof), recombinant agents, antibodies, natural or non-natural agents, fusion proteins, or equivalents thereof, and mutants, derivatives, or combinations thereof.
The entity will typically be an organic compound. For some cases, the organic compound will comprise two or more hydrocarbyl groups. As used herein, the term "hydrocarbyl" refers to a group that comprises at least C and H and optionally may comprise one or more other suitable substituents. Examples of such substituents may include halogen, alkoxy, nitro, alkyl, cyclic groups, and the like. In addition to the substituents possibly being cyclic groups, combinations of substituents may form cyclic groups. If the hydrocarbyl group contains more than one C, those carbons need not be linked to each other. For example, at least 2 carbons may be attached through a suitable element or group. Thus, the hydrocarbyl group may comprise heteroatoms. Suitable heteroatoms will be apparent to those skilled in the art and include, for example, sulfur, nitrogen and oxygen. For some applications, the entity preferably comprises at least one cyclic group. The cyclic group can be a polycyclic group, such as a non-fused polycyclic group. For some applications, the entity comprises at least one of said cyclic groups attached to another hydrocarbyl group.
The entity may comprise a halogen group such as a fluoro, chloro, bromo or iodo group.
The entity may comprise one or more alkyl, alkoxy, alkenyl, alkylene and alkenylene groups, which may be linear or branched.
Prodrugs
One skilled in the art will appreciate that the entity may be derived from a prodrug. Examples of prodrugs include entities bearing certain protected groups, which may not themselves be pharmaceutically active but which may, in some cases, be administered (e.g., orally or parenterally) and then metabolized in the body to form the pharmaceutically active entity.
Suitable prodrugs may include, but are not limited to, adriamycin, mitomycin, phenol mustard, methotrexate, antifolate, chloramphenicol, camptothecin, 5-fluorouracil, cyanide, quinine, dipyridamole, and taxol.
It will further be understood that certain moieties known as "pro-moieties", for example as described in H. Bundgaard, "Design of Prodrugs", Elsevier, 1985, may be placed on appropriate functionalities of the agent. Such prodrugs are also included within the scope of the present invention.
The agent may be in the form of a pharmaceutically acceptable salt (e.g., an acid addition salt or a base salt) or solvate thereof, including hydrates thereof. For an overview of suitable salts, see Berge et al., J. Pharm. Sci., 1977, 66, 1-19.
The agent can exhibit other therapeutic properties.
The agent may be used with one or more other pharmaceutically active agents.
If a combination of active agents is administered, the combination of active agents may be administered simultaneously, separately or sequentially.
Stereo and geometric isomers
An entity may exist in stereoisomeric and/or geometric forms; for example, an entity may possess one or more asymmetric and/or geometric centers and may therefore exist in two or more stereoisomeric and/or geometric forms. The present invention contemplates the use of all the individual stereoisomers and geometric isomers of those entities, as well as mixtures thereof.
Pharmaceutical salts
The agent may be administered in the form of a pharmaceutically acceptable salt.
Pharmaceutically acceptable salts are well known to those skilled in the art and include, for example, those described by Berge et al., J. Pharm. Sci. 66, 1-19 (1977). Suitable acid addition salts are formed from acids which form non-toxic salts and include the hydrochloride, hydrobromide, hydroiodide, nitrate, sulfate, bisulfate, phosphate, hydrogen phosphate, acetate, trifluoroacetate, gluconate, lactate, salicylate, citrate, tartrate, ascorbate, succinate, maleate, fumarate, formate, benzoate, methanesulfonate, ethanesulfonate, benzenesulfonate and p-toluenesulfonate salts.
When one or more acidic moieties are present, suitable pharmaceutically acceptable base addition salts can be formed from bases that form non-toxic salts, including salts of aluminum, calcium, lithium, magnesium, potassium, sodium, zinc, and pharmaceutically active amines such as diethanolamine.
Pharmaceutically acceptable salts of the agents can be conveniently prepared by mixing together a solution of the agent and the desired acid or base, as appropriate, under suitable conditions. The salt may be precipitated from the solution and collected by filtration, or may be recovered by evaporation of the solvent.
The agent may be present in polymorphic form.
The agent may comprise one or more asymmetric carbon atoms and thus exist in two or more stereoisomeric forms. Where the agent contains an alkenyl or alkenylene group, cis (Z) and trans (E) isomerism may also occur. The invention includes the individual stereoisomers of the agents and, where appropriate, the individual tautomeric forms thereof, as well as mixtures thereof.
Separation of diastereomers or cis and trans isomers may be achieved by conventional techniques, such as fractional crystallization, chromatography or h.p.l.c. of a stereoisomeric mixture of the agent or a suitable salt or derivative thereof. Individual enantiomers of the agents may also be prepared from the corresponding optically pure intermediates, or by resolution (e.g., by h.p.l.c. of the corresponding racemate using a suitable chiral support), or by fractional crystallization of the diastereomeric salts formed by reaction of the corresponding racemate with a suitable optically active acid or base under suitable conditions.
The agent may also include all suitable isotopic variations of the agent or a pharmaceutically acceptable salt thereof. An isotopic variation of an agent or a pharmaceutically acceptable salt thereof is defined as one in which at least one atom is replaced by an atom having the same atomic number but an atomic mass different from the atomic mass usually found in nature. Examples of isotopes that can be incorporated into agents and pharmaceutically acceptable salts thereof include isotopes of hydrogen, carbon, nitrogen, oxygen, phosphorus, sulfur, fluorine and chlorine, such as ²H, ³H, ¹³C, ¹⁴C, ¹⁵N, ¹⁷O, ¹⁸O, ³¹P, ³²P, ³⁵S, ¹⁸F and ³⁶Cl, respectively. Certain isotopic variations of the agents and pharmaceutically acceptable salts thereof, for example those into which a radioactive isotope such as ³H or ¹⁴C is incorporated, are useful in drug and/or substrate tissue distribution studies. Tritiated (i.e., ³H) and carbon-14 (i.e., ¹⁴C) isotopes are particularly preferred because of their ease of preparation and detectability. Further, substitution with isotopes such as deuterium (i.e., ²H) may afford certain therapeutic advantages resulting from greater metabolic stability, for example increased in vivo half-life or reduced dosage requirements, and may therefore be preferred in some circumstances. Isotopic variations of the agents of the present invention and pharmaceutically acceptable salts thereof can generally be prepared by conventional procedures using appropriate isotopic variations of suitable reagents.
Pharmaceutically acceptable salts
The agent may be administered as a pharmaceutically acceptable salt. Generally, pharmaceutically acceptable salts can be conveniently prepared using the desired acid or base, as appropriate. The salt may be precipitated from the solution and collected by filtration, or may be harvested by evaporation of the solvent.
Chemical synthesis method
The agents may be prepared by chemical synthesis techniques.
It will be apparent to the skilled person that sensitive functional groups may need to be protected and deprotected during synthesis of the compounds of the invention. This can be achieved by conventional techniques, such as those described in "Protective Groups in Organic Synthesis" by T. W. Greene and P. G. M. Wuts, John Wiley and Sons Inc. (1991), and "Protecting Groups" by P. J. Kocienski, Georg Thieme Verlag (1994).
In some reactions, any stereocenters present may racemize under certain conditions, for example if a base is used in a reaction with a substrate that has an optical center comprising a base-sensitive group. This could occur, for example, during a guanylation step. Such potential problems should be overcome by appropriate choice of reaction sequence, conditions, reagents, protection/deprotection regimes, etc., in accordance with known methods.
The compounds and salts can be isolated and purified by conventional methods.
The separation of the diastereomers may be obtained by conventional techniques, such as by fractional crystallization, chromatography, or h.p.l.c. of a stereoisomeric mixture of the compound of formula (I) or a suitable salt or derivative thereof. Each enantiomer of a compound of formula (I) may also be prepared from the corresponding optically pure intermediate, or by resolution, such as by h.p.l.c. of the corresponding racemate using a suitable chiral support, or by fractional crystallization of the diastereomeric salts formed by reaction of the corresponding racemate with a suitable optically active acid or base.
Agents can be generated by chemical synthesis of the whole or part of the agent. For example, if the agents comprise peptides, the peptides can be synthesized by solid-phase techniques, cleaved from the resin and purified by preparative high-performance liquid chromatography (e.g., Creighton (1983) Proteins: Structures and Molecular Principles, WH Freeman and Co, N.Y.). The composition of the synthetic peptides can be confirmed by amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra).
The synthesis of peptide inhibitors (or variants, homologues, derivatives, fragments or mimetics thereof) can be carried out using various solid phase techniques (Roberge JY et al (1995) Science 269: 202-204) and can be automated, for example, using the ABI 431A peptide synthesizer (PerkinElmer) according to the instructions provided by the manufacturer. In addition, the amino acid sequence containing the agent can be altered during direct synthesis and/or chemically combined with sequences from other subunits, or any portion thereof, to produce variant agents.
Chemical derivatives
The term "derivative" or "derivatised" as used herein includes chemical modification of the agent. Examples of such chemical modifications are the replacement of hydrogen by halogen groups, alkyl groups, acyl groups or amino groups.
Chemical modification
The agent may be a modified agent, such as, but not limited to, a chemically modified agent.
Chemical modification of the agent may enhance or reduce hydrogen bonding interactions, charge interactions, hydrophobic interactions, van der Waals interactions or dipolar interactions.
In one aspect, the agents can be used as models (e.g., templates) for the development of other compounds.
Pharmaceutical composition
In another aspect, a pharmaceutical composition is provided comprising an agent identified by the test methods described herein admixed with a pharmaceutically acceptable carrier, diluent, excipient or adjuvant and/or combinations thereof.
In another aspect, a vaccine composition is provided that includes an agent.
In another aspect, a method of preparing a pharmaceutical composition is provided, comprising mixing an agent identified by a test with a pharmaceutically acceptable diluent, excipient or adjuvant and/or combinations thereof.
In another aspect, a method of preventing and/or treating a disease is provided, comprising administering to a subject an agent or a pharmaceutical composition or a vaccine.
The pharmaceutical compositions may be for use in human or veterinary medicine and will typically comprise one or more pharmaceutically acceptable diluents, carriers or excipients. Acceptable carriers or diluents for therapeutic use are well known in the pharmaceutical art and are described, for example, in Remington's Pharmaceutical Sciences, Mack Publishing Co. (A. R. Gennaro ed., 1985). The pharmaceutical carrier, excipient or diluent can be selected with regard to the intended route of administration and standard pharmaceutical practice. The pharmaceutical compositions may comprise, as or in addition to the carrier, excipient or diluent, any suitable binder(s), lubricant(s), suspending agent(s) or solvent(s).
Preservatives, stabilizers, dyes and even flavoring agents may be provided in the pharmaceutical composition. Examples of preservatives include sodium benzoate, sorbic acid and esters of p-hydroxybenzoic acid. Antioxidants and suspending agents may also be used.
There may be different combination/formulation requirements depending on the different delivery systems. For example, the pharmaceutical compositions of the present invention may be formulated for administration using a minipump or by a mucosal route, such as for example as a nasal spray or an inhalation aerosol or an ingestible solution, or by a parenteral route, wherein the composition is formulated for delivery in an injectable form, such as by intravenous, intramuscular or subcutaneous routes. Alternatively, the dosage form may be designed to be administered by several routes.
If the agent is administered mucosally through the gastrointestinal mucosa, it should remain stable during transit through the gastrointestinal tract; for example, it should be resistant to proteolytic enzyme degradation, stable at acidic pH and resistant to the detergent effects of bile.
Where appropriate, the pharmaceutical compositions may be administered by inhalation, in the form of suppositories or pessaries, topically in the form of lotions, solutions, creams, ointments or dusting powders, by application of a skin patch, orally in the form of tablets containing excipients such as starch or lactose, or in capsules or pills, alone or in admixture with excipients, or as elixirs, solutions or suspensions containing flavoring or coloring agents, or parenterally, for example intravenously, intramuscularly or subcutaneously. For parenteral administration, the compositions are best used in the form of a sterile aqueous solution which may contain other substances, for example, enough salts or monosaccharides to make the solution isotonic with blood. For buccal or sublingual administration, the compositions may be administered in the form of tablets or lozenges which can be formulated in a conventional manner.
The agent may be used in combination with a cyclodextrin. Cyclodextrins are known to form inclusion and non-inclusion complexes with drug molecules. Formation of a drug-cyclodextrin complex may modify the solubility, dissolution rate, bioavailability and/or stability of the drug molecule. Drug-cyclodextrin complexes are generally useful for most dosage forms and administration routes. As an alternative to direct complexation with the drug, the cyclodextrin may be used as an auxiliary additive, e.g., as a carrier, diluent or solubilizer. Alpha-, beta- and gamma-cyclodextrins are most commonly used, and suitable examples are described in WO-A-91/11172, WO-A-94/02518 and WO-A-98/55148.
If the agent is a protein, the protein may be prepared in situ in the subject to be treated. In this regard, the nucleotide sequence encoding the protein may be delivered by using non-viral techniques (e.g., by using liposomes) and/or viral techniques (e.g., by using retroviruses) such that the protein is expressed from the nucleotide sequence.
The pharmaceutical compositions of the present invention may also be used in combination with conventional therapies.
Administration of drugs
The term "administering" includes delivery by viral or non-viral techniques. Viral delivery mechanisms include, without limitation, adenoviral vectors, adeno-associated virus (AAV) vectors, herpesvirus vectors, retroviral vectors, lentiviral vectors and baculovirus vectors. Non-viral delivery mechanisms include lipid-mediated transfection, liposomes, immunoliposomes, lipofectin, cationic facial amphiphiles (CFAs) and combinations thereof.
The ingredients may be administered alone, but are generally administered as a pharmaceutical composition (e.g., when the ingredients are mixed together with a suitable pharmaceutical excipient, diluent or carrier selected with regard to the intended route of administration and standard pharmaceutical practice).
For example, the ingredients may be administered in the form of tablets, capsules, pills, elixirs, solutions or suspensions, which may contain flavouring or colouring agents, for immediate-, extended-, modified-, sustained-, pulsed- or controlled-release applications.
If the agent is formulated as a tablet, the tablet may contain excipients such as microcrystalline cellulose, lactose, sodium citrate, calcium carbonate, dicalcium phosphate and glycine; disintegrants such as starch (preferably corn, potato or tapioca starch), sodium starch glycolate, croscarmellose sodium and certain complex silicates; and granulation binders such as polyvinylpyrrolidone, hydroxypropyl methylcellulose (HPMC), hydroxypropyl cellulose (HPC), sucrose, gelatin and acacia.
Solid compositions of a similar type may also be used as fillers in gelatin capsules. Preferred excipients in this regard include lactose, starch, cellulose, milk sugar or high molecular weight polyethylene glycols. For water soluble suspensions and/or elixirs, the agent may be combined with various sweetening or flavouring agents, colouring matter or dyes, with emulsifying and/or suspending agents and with diluents such as water, ethanol, propylene glycol and glycerin, and combinations thereof.
Routes of administration (delivery) may include, but are not limited to, one or more of oral (e.g., as a tablet, capsule, or as a swallowable solution), topical, mucosal (e.g., as a nasal spray or inhalation aerosol), nasal, parenteral (e.g., by injectable form), gastrointestinal, intraspinal, intraperitoneal, intramuscular, intravenous, intrauterine, intraocular, intradermal, intracranial, intratracheal, intravaginal, intracerebroventricular, subcutaneous, ocular (including intravitreal or intraocular), transdermal, rectal, buccal, vaginal, epidural and sublingual.
Dosage level
Typically, the physician will determine the actual dosage which will be most suitable for each subject. The specific dose level and frequency of dosage for any particular patient may be varied and will depend upon a variety of factors including the activity of the specific compound employed, the metabolic stability and length of action of that compound, the age, body weight, general health, sex, diet, mode and time of administration, rate of excretion, drug combination, the severity of the particular condition, and the therapy to which the individual is being subjected.
Formulation
One or more of the ingredients may be formulated into pharmaceutical compositions, for example by mixing with one or more suitable carriers, diluents or excipients using techniques known in the art.
Diseases and disorders
Aspects of the invention are useful in the treatment and/or prevention and/or diagnosis and/or prognosis of diseases such as those listed in WO-A-98/09985.
For ease of reference, a portion of this list is now provided: macrophage inhibitory and/or T cell inhibitory activity and anti-inflammatory activity resulting therefrom; anti-immune activity, i.e., an inhibitory effect against cellular and/or humoral immune responses, including responses not associated with inflammation; diseases associated with viruses and/or other intracellular pathogens; inhibiting the ability of macrophages and T cells to adhere to extracellular matrix components and fibronectin, and upregulating fas receptor expression in T cells; inhibiting unwanted immune responses and inflammation, including arthritis (including rheumatoid arthritis), inflammation associated with hypersensitivity, anaphylaxis, asthma, systemic lupus erythematosus, collagen diseases and other autoimmune diseases, inflammation associated with arteriosclerosis, atherosclerotic heart disease, reperfusion injury, cardiac arrest, myocardial infarction, vascular inflammatory diseases, respiratory distress syndrome or other cardiopulmonary diseases, inflammation associated with gastric ulcers, ulcerative colitis and other gastrointestinal diseases, liver fibrosis, cirrhosis or other liver diseases, thyroiditis or other glandular diseases, glomerulonephritis or other kidney and urinary diseases, otitis or other otorhinolaryngological diseases, dermatitis or other skin diseases, periodontal diseases or other dental diseases, orchitis or epididymitis, infertility, testicular injury or other immune-related testicular diseases, placental dysfunction, placental insufficiency, habitual abortion, eclampsia, preeclampsia and other immune and/or inflammation related gynaecological diseases, posterior uveitis, intermediate uveitis, anterior uveitis, conjunctivitis, chorioretinitis, uveoretinitis, optic neuritis, intraocular inflammation such as retinitis or cystoid macular edema, sympathetic ophthalmia, scleritis, retinitis pigmentosa, immune and inflammatory components of degenerative fundus disease, inflammatory components of ocular injury, inflammation of the eye resulting from infection, proliferative vitreoretinopathy, acute ischemic optic neuropathy, excessive scarring such as after glaucoma filtration surgery, immune and/or inflammatory responses against ocular grafts and other immune and inflammation related eye diseases, inflammation associated with autoimmune diseases, conditions or disorders where suppression of immune responses and/or inflammation in the central nervous system (CNS) or in any other organ would be beneficial, Parkinson's disease, complications and/or side effects from treatment of Parkinson's disease, AIDS-related dementia complex, HIV-related encephalopathy, Devic's disease, Sydenham's chorea, Alzheimer's disease and other degenerative diseases, conditions or disorders of the CNS, inflammatory components of stroke, post-polio syndrome, immunological and inflammatory components of psychosis, myelitis, encephalitis, subacute sclerosing panencephalitis, encephalomyelitis, acute neuropathy, subacute neuropathy, chronic neuropathy, Guillain-Barre syndrome, Sydenham's chorea, myasthenia gravis, pseudotumor cerebri, Down syndrome, Huntington's disease, amyotrophic lateral sclerosis, inflammatory components of CNS compression, CNS trauma or CNS infection, inflammatory components of muscular atrophy and muscular dystrophy, and immunological and inflammation related diseases, conditions or disorders of the central and peripheral nervous systems, post-traumatic inflammation, septic shock, infectious diseases, inflammatory complications or side effects of surgery, complications and/or side effects of bone marrow transplantation or other transplantation, inflammatory and/or immune complications and side effects of gene therapy, e.g., due to viral vector infection, or inflammation associated with AIDS; attenuating or inhibiting humoral and/or cellular immune responses; treating or ameliorating monocyte or leukocyte proliferative diseases (e.g., leukemia) by reducing the amount of monocytes or lymphocytes; and preventing and/or treating transplant rejection in the case of transplantation of natural or artificial cells, tissues and organs (e.g., cornea, bone marrow, organs, crystalline lens, pacemaker, natural or artificial skin tissue). Specific cancer-related disorders include, but are not limited to: solid tumors; blood-borne tumors, such as leukemia; tumor metastasis; benign tumors such as hemangioma, acoustic neuroma, neurofibroma, trachoma and pyogenic granuloma; rheumatoid arthritis; psoriasis; ocular angiogenic diseases such as diabetic retinopathy, retinopathy of prematurity, macular degeneration, corneal graft rejection, neovascular glaucoma, retrolental fibroplasia and rubeosis; Osler-Webber syndrome; myocardial angiogenesis; plaque neovascularization; telangiectasia; hemophiliac joints; angiofibroma; wound granulation; coronary collaterals; cerebral collaterals; arteriovenous malformations; ischemic limb angiogenesis; neovascular glaucoma; retrolental fibroplasia; diabetic neovascularization; Helicobacter pylori related diseases, bone fractures, angiogenesis, hematopoiesis, ovulation, menstruation and placentation.
The disease is preferably cancer, such as acute lymphocytic leukemia (ALL), acute myeloid leukemia (AML), adrenocortical cancer, anal cancer, bladder cancer, blood cancer, bone cancer, brain tumor, breast cancer, female reproductive system cancer, male reproductive system cancer, central nervous system lymphoma, cervical cancer, childhood rhabdomyosarcoma, childhood sarcoma, chronic lymphocytic leukemia (CLL), chronic myeloid leukemia (CML), colon and rectal cancer, colon cancer, endometrial sarcoma, esophageal cancer, eye cancer, gallbladder cancer, stomach cancer, gastrointestinal cancer, hairy cell leukemia, head and neck cancer, hepatocellular carcinoma, Hodgkin's disease, hypopharyngeal cancer, Kaposi's sarcoma, kidney cancer, larynx cancer, leukemia, liver cancer, lung cancer, malignant fibrous histiocytoma, malignant thymoma, melanoma, mesothelioma, multiple myeloma, myeloma, nasal and paranasal sinus cancer, nasopharyngeal carcinoma, nervous system cancer, neuroblastoma, non-Hodgkin's lymphoma, oral cancer, oropharyngeal cancer, osteosarcoma, ovarian cancer, pancreatic cancer, parathyroid cancer, penile cancer, pharyngeal cancer, pituitary tumor, plasmacytoma, primary CNS lymphoma, prostate cancer, rectal cancer, respiratory system cancer, retinoblastoma, salivary gland cancer, skin cancer, small intestine cancer, soft tissue sarcoma, stomach cancer, testicular cancer, thyroid cancer, urinary system cancer, uterine sarcoma, vaginal cancer, vascular system cancer, Waldenström's macroglobulinemia and Wilms' tumor.
Reagent kit
The materials used in the method of the invention are ideally suited for use in the preparation of kits.
The kit may comprise containers, each with one or more of the various reagents (typically in concentrated form) used in the methods described herein, including, for example, primary restriction enzymes, secondary restriction enzymes, crosslinking agents, enzymes for ligation (e.g., ligase), and reagents for de-crosslinking (e.g., proteinase K).
The oligonucleotides may also be provided in containers, which may be in any form, such as lyophilized or in solution (e.g., distilled water or buffer), and the like.
In a preferred aspect of the invention, there is provided a kit comprising a set of probes, an array as described herein and optionally one or more labels.
A set of instructions is also typically included.
Applications
Advantageously, the present invention is used in order to obtain spatial organisational information about a nucleotide sequence, such as a genomic locus in vitro or in vivo.
For example, 4C techniques can be used to study the three-dimensional organization of one or more loci. This technique is particularly useful for studying the role of one or more transcription factors in the three-dimensional organization of one or more loci.
By way of further example, 4C technology can be used to study the effects of trans-acting factors and cis-regulatory DNA elements.
By way of further example, 4C technology can be used to study long-range gene regulation in vitro or in vivo.
By way of further example, 4C technology can be used to study the proximity and interaction within chromosomes.
By way of further example, 4C technology can be used to study the proximity and interactions between chromosomes.
By way of further example, 4C technology can be used to identify nucleotide sequences that function together with promoters, enhancers, silencers, insulators, locus control regions, origins of replication, MARs, SARs, centromeres, telomeres or any other sequence of interest in a regulatory network.
By way of further example, 4C technology can be used to identify genes responsible for a phenotype (e.g., a disease) in cases where mutations and/or deletions affect distant regulatory elements, so that mapping the mutations alone would not reveal the affected gene.
By way of further example, 4C techniques can be used to ultimately reconstruct the spatial conformation of a locus, large genomic region, or even an entire chromosome.
By way of further example, 4C techniques can be used to determine potential anchor sequences that hold certain chromosomes together in the nuclear space.
By way of further example, 4C techniques may be used to ultimately reconstruct the position of chromosomes relative to each other with high resolution.
By way of further example, 4C technology can be used in diagnostics (e.g., prenatal diagnostics) to detect or identify genomic rearrangements and/or abnormalities, such as translocations, deletions, inversions and duplications.
General recombinant DNA methodology
Unless otherwise indicated, the present invention employs conventional chemical, molecular biological, microbiological, recombinant DNA and immunological techniques, which are within the capabilities of one of ordinary skill in the art. These techniques are explained in the literature. See, e.g., J. Sambrook, E. F. Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second Edition, Books 1-3, Cold Spring Harbor Laboratory Press; F. M. Ausubel et al. (1995 and periodic supplements), Current Protocols in Molecular Biology, chapters 9, 13, and 16, John Wiley & Sons, New York, N.Y.; B. Roe, J. Crabtree, and A. Kahn, 1996, DNA Isolation and Sequencing: Essential Techniques, John Wiley & Sons; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical Approach, IRL Press; and D. M. J. Lilley and J. E. Dahlberg, 1992, Methods in Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA, Academic Press. Each of these general textbooks is incorporated herein by reference.
The invention will now be described by way of further example for the purpose of assisting a person skilled in the art in the implementation of the invention, without intending to limit the scope of the invention in any way.
Example 1
Materials and methods
4C technique
The initial steps of the 3C procedure were as described previously (Splinter et al. (2004) Methods Enzymol. 375, 493-507) [...], thereby promoting DpnII-loop formation. The ligation products were extracted with phenol and precipitated with ethanol, using glycogen (Roche) as a carrier (20 μg/ml). The loops of interest were linearized by overnight digestion with 50 U of a third restriction enzyme that cleaves the bait between the first and second restriction enzyme recognition sites: SpeI for HS2, PstI for Rad23A and PflMI for β-major. This linearization step was performed to facilitate primer hybridization during the first round of the subsequent PCR amplification. The digested products were purified using QIAquick Nucleotide Removal (250) columns (Qiagen).
The PCR reaction was performed using the Expand Long Template PCR System (Roche) under carefully optimized conditions that ensure linear amplification of fragments up to 1.2 kb, the longest expected (80% of the 4C-PCR fragments are shorter than 600 bp). The PCR conditions were: 94 ℃ for 2 minutes; 30 cycles of 94 ℃ for 15 seconds, 55 ℃ for 1 minute and 68 ℃ for 3 minutes; and a final step at 68 ℃ for 7 minutes. The maximum amount of template that still gave linear amplification was determined. To this end, serial dilutions of template were added to the PCR reaction, the amplified DNA was separated on an agarose gel, and the PCR products were quantified using ImageQuant software. In general, 100-200 ng of template per 50 μl PCR reaction gave products within the linear amplification range. Sixteen to 32 PCR reactions were pooled and the 4C template was purified using the QIAquick Nucleotide Removal (250) system (Qiagen). The purified 4C templates were labeled and hybridized to the arrays according to the standard ChIP-chip protocol (NimbleGen Systems of Iceland, LLC). Differentially labeled genomic DNA, digested with the first and second enzymes used in the 4C procedure, served as the control template to correct for differences in hybridization efficiency. For each experiment, 2 independently processed samples were labeled with swapped dye orientation.
The 4C-primer sequences used were:
HS2:5’-ACTTCCTACACATTAACGAGCC-3’,
5’-GCTGTTATCCCTTTCTCTTCTAC-3’
Rad23A:5’-TCACACGCGAAGTAGGCC-3’,
5’-CCTTCCTCCACCATGATGA-3’
β-major:5’-AACGCATTTGCTCAATCAACTACTG-3’,
5’-GTTGCTCCTCACATTTGCTTCTGAC-3’
4C array
Arrays were designed based on NCBI mouse genome build m34. Probes (60-mers) were selected from sequences within 100 bp upstream and downstream of HindIII sites. GC content was optimized towards 50% to homogenize hybridization signals. To avoid cross-hybridization, probes with any similarity to abundant repeat sequences (RepBase 10.09) were removed from the probe set [3]. In addition, probes with more than two BLAST hits in the genome were removed from the probe set. Sequence alignments were performed with MegaBLAST using standard settings (Zhang et al. (2000) J Comput Biol 7, 203-14). A hit was defined as an alignment of 30 nt or more.
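The probe-selection rule above can be illustrated with a minimal Python sketch. It assumes a plain uppercase genome string, uses the HindIII site AAGCTT, considers every 60-mer within 100 bp on either side of each site, and keeps the one whose GC content is closest to 50%. The RepBase and MegaBLAST filters are not reproduced, and all function names are hypothetical.

```python
from typing import Optional

HINDIII = "AAGCTT"  # HindIII recognition site
PROBE_LEN = 60      # probe length (60-mers)
FLANK = 100         # bp searched on each side of a site


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_sites(genome: str, site: str = HINDIII) -> list[int]:
    """Start positions of every occurrence of site in genome."""
    hits, i = [], genome.find(site)
    while i != -1:
        hits.append(i)
        i = genome.find(site, i + 1)
    return hits


def best_probe(region: str) -> Optional[str]:
    """60-mer in region with GC content closest to 50% (None if region is too short)."""
    candidates = [region[i:i + PROBE_LEN]
                  for i in range(len(region) - PROBE_LEN + 1)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: abs(gc_fraction(p) - 0.5))


def design_probes(genome: str) -> list[tuple[int, str, str]]:
    """(site position, side, probe) for up to two probes per HindIII site."""
    probes = []
    for pos in find_sites(genome):
        up = genome[max(0, pos - FLANK):pos]
        down = genome[pos + len(HINDIII):pos + len(HINDIII) + FLANK]
        for side, region in (("upstream", up), ("downstream", down)):
            probe = best_probe(region)
            if probe is not None:
                probes.append((pos, side, probe))
    return probes
```

In a real design the candidate list would be filtered against repeats and BLAST hits before the GC criterion is applied. Here that filtering is deliberately omitted.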
4C data analysis
The 4C-sample/genomic DNA signal ratio was calculated for each probe, and the data were visualized with the SignalMap software provided by NimbleGen Systems. The data were analyzed using the R software package (http://www.r-project.org), Spotfire and Excel. The raw hybridization ratios showed clusters of 20-50 positive 4C signals along the chromosome template. These clusters were identified using a running average method. Window sizes from 9 to 39 probes were tested, and all identified the same clusters. The results shown are based on a window size of 29 probes (60 kb on average) and were compared with a running average performed on randomized data. This was done separately for each array, so that all measurements were evaluated relative to the amplitude and noise of that particular array. The false discovery rate (FDR) is defined as (number of false positives)/(number of false positives + number of true positives). It was estimated as (number of positives in the randomized data)/(number of positives in the data). The threshold was set using a top-down approach as the lowest value for which FDR < 0.05.
Biological replicates were then compared. Windows that reached the threshold in both replicates were considered positive. When randomized data were compared in the same way, no window exceeded the threshold in both replicate experiments. Directly adjacent positive windows on the chromosome template were joined (no gaps allowed) to produce positive regions.
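The running-average and randomization scheme can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. It assumes the per-probe ratios are already ordered along the chromosome and that a single shuffle of the same ratios serves as the randomized data. It also reads "top-down" as lowering the threshold from the highest observed window mean until the estimated FDR first reaches 0.05. The function names are hypothetical.

```python
import random
from typing import Optional, Sequence


def running_average(values: Sequence[float], window: int) -> list[float]:
    """Mean of every full window of `window` consecutive probes."""
    if window > len(values):
        return []
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]  # slide the window by one probe
        means.append(total / window)
    return means


def fdr_threshold(ratios: Sequence[float], window: int = 29,
                  max_fdr: float = 0.05, seed: int = 0) -> Optional[float]:
    """Lowest threshold, scanning top-down, whose estimated FDR < max_fdr.

    FDR is estimated as (windows >= threshold in shuffled data) /
    (windows >= threshold in real data). Returns None if even the
    highest observed window mean fails.
    """
    real = running_average(ratios, window)
    shuffled = list(ratios)
    random.Random(seed).shuffle(shuffled)       # randomized control data
    rand = running_average(shuffled, window)

    best = None
    for t in sorted(set(real), reverse=True):   # top-down scan
        n_real = sum(m >= t for m in real)      # >= 1 because t comes from real
        n_rand = sum(m >= t for m in rand)
        if n_rand / n_real >= max_fdr:
            break                               # FDR too high: stop lowering
        best = t
    return best
```

Running this once per array, as the text describes, gives each array its own threshold relative to its own amplitude and noise.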
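The replicate comparison and merging step can be expressed as a short Python sketch. It assumes that each replicate has already been reduced to one boolean per window position along the same chromosome, True where that array's threshold was reached. The function name is hypothetical.

```python
from typing import Sequence


def positive_regions(rep1: Sequence[bool],
                     rep2: Sequence[bool]) -> list[tuple[int, int]]:
    """(first, last) window indices of merged positive regions.

    A window counts only if it passes in both replicates. Directly
    adjacent positive windows are joined, and no gaps are allowed.
    """
    both = [a and b for a, b in zip(rep1, rep2)]
    regions, start = [], None
    for i, ok in enumerate(both):
        if ok and start is None:
            start = i                       # a new region begins here
        elif not ok and start is not None:
            regions.append((start, i - 1))  # region ended at the previous window
            start = None
    if start is not None:
        regions.append((start, len(both) - 1))
    return regions
```

For example, replicates `[T, T, T, F, T, T]` and `[T, T, F, F, T, T]` yield the regions `(0, 1)` and `(4, 5)`.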
Expression analysis
For each tissue, 3 independent microarray hybridizations were performed according to the Affymetrix protocol (mouse 430_2 arrays). The data were normalized using RMA ca-tools (www.bioconductor.org), and the measurements from the 3 microarrays were averaged for each probe set. When multiple probe sets represented the same gene, they were also averaged. MAS5 calls (affy library: www.bioconductor.org) were used to make "present", "absent" and "marginal" calls. Genes called "present" in all 3 arrays with expression values greater than 50 are referred to as expressed genes. "Fetal liver-specific genes" are defined as genes that meet our criteria for expression in fetal liver and whose expression values are more than 5-fold higher than in fetal brain. A running sum of log-transformed expression values was used to measure the overall transcriptional activity around each gene. For each gene, we calculated the summed expression of all genes (including the gene itself) found in the window from 100 kb upstream of its start to 100 kb downstream of its end. The results for active genes located in positive 4C regions (n = 124, 123 and 208 for HS2 in liver, Rad23A in brain and Rad23A in liver, respectively) were compared with those for active genes outside positive 4C regions (n = 153, 301 and 186, respectively; the 153 genes are the active, non-interacting genes located between the most centromeric interacting region and the telomere of chromosome 7). The two groups were compared using a one-tailed Wilcoxon rank-sum test.
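The per-gene running sum can be sketched in Python. The sketch makes three assumptions that the text does not fix: log2 is used, since the text only says the values were converted to logarithms; a gene counts as "in the window" if it overlaps the window at all; and all genes lie on one chromosome with start <= end in bp. The `Gene` record and function name are hypothetical.

```python
import math
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int          # bp
    end: int            # bp, >= start
    expression: float   # normalized expression value, must be > 0


FLANK_BP = 100_000      # 100 kb on each side of the gene


def running_sums(genes: list[Gene]) -> dict[str, float]:
    """Sum of log2 expression of all genes overlapping each gene's window."""
    sums = {}
    for g in genes:
        lo, hi = g.start - FLANK_BP, g.end + FLANK_BP
        sums[g.name] = sum(math.log2(o.expression) for o in genes
                           if o.start <= hi and o.end >= lo)  # overlap test
    return sums
```

The resulting values for genes inside and outside positive 4C regions would then be compared with a one-sided rank-sum test, as described in the text.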
FISH probes
The following BAC clones (BACPAC Resources Centre) were used: RP23-370E12 for Hbb-1; RP23-317H16 at 80.1 Mb on chromosome 7 (OR gene cluster); RP23-334E9 for Uros; RP23-32C19 at 118.3 Mb on chromosome 7; RP23-143F10 at 130.1 Mb on chromosome 7; RP23-470N5 at 73.1 Mb on chromosome 7; RP23-247L11 at 135.0 Mb on chromosome 7 (OR gene cluster); RP23-136A15 for Rad23A; RI23-307P24 at 21.8 Mb on chromosome 8; and RP23-460F21 at 122.4 Mb on chromosome 8. As a centromere-specific probe for chromosome 7, we used P1 clone 5279 (Genome Systems Inc.), which anneals to DNA segment D7Mit21. Random-primed labeled probes were prepared using the BioPrime Array CGH Genomic Labeling System (Invitrogen). Before labeling, the DNA was digested with DpnII and purified with the DNA Clean & Concentrator-5 kit (Zymo Research). The digested DNA (300 ng) was labeled with SpectrumGreen dUTP (Vysis) or Alexa Fluor 594 dUTP (Molecular Probes) and purified with the GFX PCR DNA and Gel Band Purification Kit (Amersham Biosciences) to remove unincorporated nucleotides. The specificity of the labeled probes was tested on metaphase spreads prepared from murine embryonic stem cells.
Cryo-FISH
Cryo-FISH was performed as described previously [5]. Briefly, E14.5 liver and brain were fixed in 4% paraformaldehyde/250 mM HEPES (pH 7.5) for 20 minutes, cut into small tissue blocks and then fixed in 8% paraformaldehyde at 4 ℃ for 2 hours. The fixed tissue blocks were soaked in 2.3 M sucrose for 20 minutes at room temperature, mounted on sample holders and snap-frozen in liquid nitrogen, where they were stored until sectioning. Ultrathin cryosections of approximately 200 nm were cut with a Reichert Ultracut E (Leica) with cryo-attachment. Sections were transferred to coverslips and stored at -20 ℃. For hybridization, the sections were washed with PBS to remove sucrose, treated with 250 ng/ml RNase in 2× SSC for 1 hour at 37 ℃, incubated in 0.1 M HCl for 10 minutes, dehydrated in a graded ethanol series and re-incubated in 70% formamide/2× SSC (pH 7.5) for 8 minutes at 80 ℃ just before denaturation of the probe, dehydrated in 0.1 ng PBS, re-incubated with 10 minutes in 10% PBS, and annealed in 10% PBS buffer (10 mM) at 37 ℃ and incubated with 10% PBS after denaturing the probe, renaturing the probe, the PCR probe, incubated with 10 min at least 50% PBS, and annealed in 10% PBS, and incubated with 10% PBS, and denatured buffer (10 min) for denaturing the probe after denaturing the PCR probe at 37 ℃ and denaturing 20 mM).
Images were collected by fluorescence microscopy on a Zeiss Axio Imager Z1 (100× Plan-Apochromat, 1.4 NA oil objective) equipped with a CCD camera and Isis FISH Imaging System software (MetaSystems). A minimum of 250 β-globin or Rad23A alleles were analyzed. They were scored as overlapping or non-overlapping with BACs located elsewhere in the genome by a person unaware of the probe combination used on the sections [6]. [...] to assess the significance of the difference between the measurements for 4C-positive versus 4C-negative regions. An overview of the results is provided in Table 2.
Although the difference between background (0.4-3.9%) and true (5-20.4%) interaction frequencies was statistically significant, the frequencies measured by cryo-FISH are clearly lower than those reported by others using different FISH protocols. Sectioning may separate some interacting loci, so cryo-FISH measurements will slightly underestimate the true interaction frequency. On the other hand, current 2D and 3D FISH protocols will overestimate these percentages because of their limited resolution in the z-direction. In the future, improved microscopy combined with more specific FISH probes will better reveal the true interaction frequencies.
Example 2
The 3C procedure (i.e., fixation with formaldehyde, digestion with a (primary) restriction enzyme, religation of cross-linked DNA fragments and DNA purification) was performed essentially as described (Splinter et al. (2004) Methods Enzymol. 375: 493-507). This generates a DNA mixture ('3C template') containing restriction fragments that were ligated because they were originally close together in the nuclear space.
Inverse PCR is performed to amplify all fragments ligated to a given restriction fragment ("bait"). The bait is chosen because it contains a promoter, enhancer, insulator, nuclear matrix attachment region, origin of replication or any other first (target) nucleotide sequence.
To this end, DNA loops are generated by digesting the 3C template with a second restriction enzyme (preferably a frequent cutter that recognizes a four- or five-nucleotide sequence), followed by ligation under dilute conditions that favour intramolecular interactions. Topological constraints can bias loop formation (Rippe et al. (2001) Trends Biochem. Sci. 26, 733-40). To minimize this bias, the second restriction enzyme is preferably chosen so that it cuts the bait more than 350-400 bp from the first restriction site. To increase the efficiency and reproducibility of the inverse PCR amplification, the loop is preferably linearized before PCR amplification. For this, a restriction enzyme (e.g., one that recognizes 6 or more bp) is used that cleaves the bait between the first and second restriction sites.
The 3C template was digested with a second restriction enzyme, circularized by ligation under dilute conditions, and the bait-containing loops were linearized. These manipulations were performed under standard conditions and yield the DNA template for inverse PCR amplification ('4C template').
Specifically, 10 μg of 3C template was digested overnight with 20 U of the second restriction enzyme in 100 μl, after which the enzyme was heat-inactivated and the DNA purified. Ligation was performed in 10 ml (1 ng/μl DNA) with 50 U of T4 ligase (4 h at 16 ℃, then 30 min at room temperature), followed by DNA purification. Finally, the loop of interest was linearized overnight with 20 U of restriction enzyme in 100 μl, and the DNA was purified again.
For inverse PCR, two bait-specific primers were designed: one as close as possible to the first restriction enzyme recognition site and the other directly adjacent to the second restriction enzyme recognition site. Each primer has its 3' end facing outward, so that extension immediately proceeds across the restriction site into the fragment ligated to the bait. Inverse PCR with these primers is preferably performed on 100-400 ng of 4C template DNA per 50 μl PCR reaction, so that each PCR reaction includes the maximum number of ligation events. We performed inverse PCR using the Expand Long Template PCR System (Roche) with buffer 1, according to the manufacturer's protocol.
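The enzyme-choice criteria can be checked with a small Python sketch, under these assumptions. `bait` is the primary restriction fragment written 5'->3' from the primary site at its left end. Recognition sites are plain strings, and exact cut offsets within a site are ignored. The 350 bp default comes from the text. The function names are hypothetical, and the example sites in the test (GATC for DpnII, ACTAGT for SpeI, CTGCAG for PstI) are illustrative choices only.

```python
from typing import Optional


def find_all(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of site in seq."""
    hits, i = [], seq.find(site)
    while i != -1:
        hits.append(i)
        i = seq.find(site, i + 1)
    return hits


def first_secondary_site(bait: str, secondary_site: str,
                         min_distance: int = 350) -> Optional[int]:
    """Position of the nearest secondary site if it is >= min_distance from the primary end.

    The secondary enzyme cuts at its nearest site, so a site closer than
    min_distance makes the enzyme a poor choice (returns None).
    """
    hits = find_all(bait, secondary_site)
    if hits and hits[0] >= min_distance:
        return hits[0]
    return None


def usable_linearizer(bait: str, secondary_pos: int, site: str) -> bool:
    """True if site occurs exactly once between the primary and secondary sites."""
    return len(find_all(bait[:secondary_pos], site)) == 1
```

A candidate linearizing enzyme should cut the bait once, between the two sites. Cutting more than once would split the bait part of the loop, and not cutting at all would leave the loop circular.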
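The dilute-ligation figures in this step can be checked with simple arithmetic: 10 μg (10,000 ng) in 10 ml (10,000 μl) is 1 ng/μl. The illustrative helpers below, whose names are hypothetical, recompute this concentration. They also give the volume that would keep a different DNA amount at the same concentration.

```python
def concentration_ng_per_ul(dna_ug: float, volume_ml: float) -> float:
    """DNA concentration in ng/ul from an amount in ug and a volume in ml."""
    dna_ng = dna_ug * 1000.0      # ug -> ng
    volume_ul = volume_ml * 1000.0  # ml -> ul
    return dna_ng / volume_ul


def volume_ml_for(dna_ug: float, target_ng_per_ul: float = 1.0) -> float:
    """Ligation volume (ml) that keeps dna_ug at target_ng_per_ul."""
    volume_ul = dna_ug * 1000.0 / target_ng_per_ul
    return volume_ul / 1000.0     # ul -> ml
```

For example, keeping 25 μg of template at 1 ng/μl would need a 25 ml ligation.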
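The outward-facing primer layout can be sketched in Python under simplifying assumptions. `bait` runs 5'->3' from the primary site (left end) to the secondary site (right end). Each primer is a fixed-length window taken immediately inside its site, with no Tm, secondary-structure or specificity checks. The 20 nt length and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def outward_primers(bait: str, primary_site: str, secondary_site: str,
                    length: int = 20) -> tuple[str, str]:
    """(primer at primary end, primer at secondary end), both written 5'->3'.

    The first primer is the reverse complement of the bases just inside the
    primary site, so its 3' end points back across that site into the
    captured fragment. The second primer copies the bases just before the
    secondary site, so its 3' end extends through that site.
    """
    inner_start = len(primary_site)
    inner_end = len(bait) - len(secondary_site)
    primer_a = revcomp(bait[inner_start:inner_start + length])
    primer_b = bait[inner_end - length:inner_end]
    return primer_a, primer_b
```

Because the two primers point away from each other within the bait, they only produce a product after circularization and linearization. In that configuration they extend into the captured fragment from both of its ends.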
The following PCR cycles were performed:
1. 94 ℃ for 2 minutes
2. 94 ℃ for 15 seconds
3. 55 ℃ for 1 minute
4. 68 ℃ for 3 minutes
5. Repeat steps 2-4 29 times (or any number of times between 25 and 40)
6. 68 ℃ for 7 minutes
7. End
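The PCR program listed above can be expanded into individual holds to check the cycle count and run time. The sketch assumes that only hold times count (ramp times are ignored). It also reads step 5 as running steps 2-4 once and then repeating them 29 times, which gives 30 cycles and matches the cycle number used elsewhere in the text. The names are hypothetical.

```python
# Each hold is (temperature in degrees C, duration in seconds).
INITIAL = (94, 120)                          # step 1
CYCLE = [(94, 15), (55, 60), (68, 180)]      # steps 2-4
FINAL = (68, 420)                            # step 6


def program(repeats: int = 29) -> list[tuple[int, int]]:
    """Flat list of holds: steps 2-4 run once, then `repeats` more times."""
    return [INITIAL] + CYCLE * (1 + repeats) + [FINAL]


def total_hold_minutes(repeats: int = 29) -> float:
    """Summed hold time in minutes, ignoring ramp times."""
    return sum(seconds for _, seconds in program(repeats)) / 60
```

With the default 30 cycles the holds add up to 120 s + 30 × 255 s + 420 s = 8190 s, or 136.5 minutes, before ramping.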
Gel electrophoresis was performed to analyze reproducibility among PCR reactions. Generally a consistent product pattern should be obtained.
To obtain sufficient material for labeling by random priming and array hybridization, multiple PCR reactions (each of 30 cycles) were pooled rather than increasing the number of PCR cycles in each reaction. As an alternative to random priming, labeled nucleotides can be added during the final cycles of the PCR (e.g., 30 cycles without label followed by 10 cycles with label).
Example 3
Detection of translocations using 4C technology
The 4C technique was used to measure the interaction frequencies of a sequence X located on chromosome A, in cells from healthy subjects and in cells from patients carrying a single reciprocal translocation between chromosomes A and B with a breakpoint close to sequence X (as shown in figure 9).
In normal cells, this analysis revealed increased hybridization signals (i.e., frequent interactions with X) for (almost) every probe located within 0.2-10 Mb of sequence X on chromosome A. The actual size of the chromosomal region showing a strong signal depends mainly on the complexity of the sample hybridized to the array. Such large regions of contiguous probes (on the linear DNA template) with increased hybridization signals are not observed elsewhere on chromosome A or on other chromosomes.
In patient cells, however, the hybridization signals obtained with all chromosome A probes located on the other side of the breakpoint are reduced by ~50% (one copy of chromosome A is still intact and produces a normal signal). Meanwhile, probes flanking the breakpoint on chromosome B show a distinct cluster of increased hybridization signals that is not present in normal cells. The abrupt transition on chromosome B from probes with no hybridization signal to probes with a strong signal reveals the location of the breakpoint on chromosome B.
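Locating the abrupt transition on chromosome B can be sketched as a simple step detector, which is an illustrative technique and not one stated in the source. The assumptions are that `signal` holds per-probe 4C/genomic ratios ordered along chromosome B, and that the breakpoint is the probe boundary maximizing the mean of the `flank` probes after it minus the mean of the `flank` probes before it. This looks for a rise in signal. If the translocated segment lay on the other side, the sign would be flipped. The function name and the flank of 10 probes are hypothetical.

```python
from statistics import mean
from typing import Sequence


def breakpoint_index(signal: Sequence[float], flank: int = 10) -> int:
    """Index of the first probe after the largest low-to-high step in signal.

    Returns -1 if the signal is shorter than 2 * flank probes.
    """
    best_i, best_jump = -1, float("-inf")
    for i in range(flank, len(signal) - flank + 1):
        jump = mean(signal[i:i + flank]) - mean(signal[i - flank:i])
        if jump > best_jump:          # keep the first maximum on ties
            best_i, best_jump = i, jump
    return best_i
```

On real data the flank size trades off noise suppression against positional precision. The breakpoint can only be placed to within the spacing of the probes on the array.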
Example 4
Analysis of 4C technical results
The genomic environment of the mouse beta-globin Locus Control Region (LCR) was characterized using 4C technique, focusing on the restriction fragment containing its hypersensitive site 2(hypersensitive site2, HS 2). LCR is a strong red blood cell-specific transcriptional regulatory element, necessary for high β -globin gene expression levels. The β -globin locus occurs at the 97Mb position on chromosome 7, where it is located in a large 2.9Mb tract of olfactory receptor genes that are transcribed only in olfactory neurons. The interaction in 2 tissues was analyzed: e14.5 fetal liver (where LCR is active and the beta-globin gene is highly transcribed), and E14.5 fetal brain (where LCR is inactive and the globin gene is silent). In both tissues, the vast majority of interactions were found in sequences on chromosome 7, whereas only few LCR interactions were detected for 6 unrelated chromosomes (8, 10, 11, 12, 13, 14) (fig. 13 a). The strongest signal on chromosome 7 was found in the 5-10Mb region centered on the beta-globin chromosomal position, consistent with the notion that the frequency of interaction is inversely proportional to the distance (in base pairs) between physically linked DNA sequences. It is not possible to quantitatively elucidate the interaction in this region. Our reason is that these adjacent sequences are too frequent with β -globin, so that they are greatly overexpressed in our hybridization samples, saturating the corresponding probes. We performed hybridization with samples diluted 1: 10 and 1: 100 and found that signal intensity was reduced for both the outer and marginal probes, but not for the probes in this region (data not shown), confirming this reason.
The 4C process generates highly repeatable data. FIGS. 2b-C show the untreated ratio of 4C-signal to the control hybridization signal for two 1.5Mb regions on chromosome 7 (approximately 25Mb and 80Mb apart from the. beta. -globin genes). At this resolution level, the results for the independently treated samples were almost identical. In both the fetal liver and brain, a positive signal beam is identified on chromosome 7, which is usually located at a chromosomal location ten million bases away from β -globin. These beams typically consist of a minimum of 20-50 probes juxtaposed on a chromosome template, the signal ratios of which are increased (FIGS. 13 b-c). Each probe on the array analyzes an independent ligation event. Furthermore, there were only 2 copies of the HS2 restriction fragment per cell, each of which was ligated to only one other restriction fragment. Thus, detection of independent ligation events with 20 or more adjacent restriction fragments strongly suggests that the corresponding locus is in contact with β -globin LCR in a variety of cells.
To determine the statistical significance of these bundles, the data from each experiment were sequenced on the chromosome map and analyzed with a running average algorithm with a window size of approximately 60 kb. The threshold was set with a running average distribution of randomly shuffled data, allowing a false discovery rate of 5%. This analysis identified 66 bundles in fetal liver and 45 bundles in brain, which were found repeatedly in repeated experiments (fig. 13 d-f). Indeed, high resolution FISH confirmed that these bundles truly represent frequently interacting loci (see below).
Thus, the 4C technique identifies long-range interaction loci by detecting independent ligation events of multiple restriction fragments clustered at a chromosomal location.
A series of completely independent 4C experiments was performed with a different set of inverse PCR primers to study the genomic environment of the β-major gene, which is located 50 kb downstream of HS2. In fetal liver, the β-major gene is highly transcribed and frequently contacts the LCR. In both fetal liver and brain, nearly identical clusters of long-range interactions were found for β-major and HS2, further confirming that these loci frequently contact the β-globin locus (fig. 17).
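The agreement between two viewpoints (here HS2 and β-major) can be summarized by asking what fraction of the clusters called for one viewpoint overlap a cluster called for the other. The sketch below is a hypothetical illustration; the function name and the inclusive (start, end) interval convention are assumptions, and this document does not state how the comparison was quantified.

```python
def overlap_fraction(clusters_a, clusters_b):
    """Fraction of clusters in `clusters_a` overlapping at least one in `clusters_b`.

    Clusters are inclusive (start, end) coordinates in bp on the same chromosome.
    """
    if not clusters_a:
        return 0.0
    hits = sum(
        any(a_start <= b_end and b_start <= a_end for b_start, b_end in clusters_b)
        for a_start, a_end in clusters_a
    )
    return hits / len(clusters_a)


# Hypothetical cluster coordinates for the two viewpoints
hs2 = [(100, 200), (500, 700), (900, 950)]
beta_major = [(150, 260), (680, 720)]
print(overlap_fraction(hs2, beta_major))  # 0.6666666666666666
```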
Example 5
Active and inactive β-globin loci occupy different genomic environments
Comparison between the 2 tissues revealed that the actively transcribed β-globin locus in fetal liver and its transcriptionally silent counterpart in brain interact with completely different sets of loci (τ = -0.03; Spearman rank correlation) (fig. 13f). This rules out the possibility that the results are driven by the composition of the probe sequences. In fetal liver, the interacting DNA fragments are located within a 70 Mb region centered on the β-globin locus, most of them (40/66) lying toward the telomere of chromosome 7. In fetal brain, interacting loci are found at similar or even greater distances from β-globin than in fetal liver, and the most prominent interactions (43/45) lie toward the centromere of chromosome 7. These data demonstrate that active and inactive β-globin loci contact different parts of chromosome 7.
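The tissue comparison above relies on a Spearman rank correlation between the two interaction profiles. A minimal sketch, assuming the inputs are per-probe (or per-window) 4C values for the same probes in the two tissues, computes Spearman's coefficient as the Pearson correlation of average ranks, which handles ties. The function names are hypothetical, and this document does not specify whether raw ratios or running averages were correlated.

```python
def average_ranks(xs):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1  # extend the block of tied values
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant profile")
    return cov / (vx * vy) ** 0.5


# Hypothetical per-probe signals for the same four probes in two tissues
print(spearman([0.2, 1.5, 0.9, 3.1], [0.1, 1.2, 1.4, 2.8]))  # 0.8
```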
Another 6 chromosomes (8, 10, 11, 12, 13 and 14) are represented on the microarray. Strong hybridization signals on these chromosomes are rare, usually appear as isolated probes on the linear DNA template, and are often absent from replicate experiments. In addition, the running average along these chromosomes never reproducibly approached the levels scored for chromosome 7 (fig. 19). Thus, our data show that the β-globin locus mostly contacts loci elsewhere on the same chromosome, consistent with the preferred position of the locus within its own chromosome territory. We note that the α-globin locus is also represented on the array (chromosome 11) and shows no positive interaction with β-globin, consistent with recent FISH evidence that mouse α- and β-globin do not frequently meet in nuclear space (Brown, J.M., et al. (2006) J Cell Biol 172, 177-87).
To better understand the relevance of the long-range interactions observed on chromosome 7, we compared the interacting loci with the chromosomal locations of genes. In addition, Affymetrix expression array analysis was performed to determine transcriptional activity at these locations in both tissues. Although the average sizes of the interacting regions in fetal liver and brain were similar (183 kb and 159 kb, respectively), their gene content and activity differed greatly. In fetal liver, 80% of β-globin-interacting loci contain one or more actively transcribed genes, whereas in fetal brain most (87%) show no detectable gene activity (fig. 15). Thus, the β-globin locus resides in very different genomic environments in the two tissues. In brain, where the locus is inactive, it mainly contacts transcriptionally silent loci distributed toward the centromere of chromosome 7. In fetal liver, where the locus is highly active, it preferentially interacts with actively transcribed regions located predominantly toward the telomeric side of chromosome 7. Importantly, the 4C technique identified Uros and Eraf (located about 30 Mb from β-globin) as genes that interact with the active β-globin locus in fetal liver, consistent with previous observations made by FISH (Osborne, C.S., et al. (2004) Nat Genet 36, 1065-71). Interestingly, in brain, contacts were observed with two other olfactory receptor gene clusters on chromosome 7, located on either side of β-globin at distances of 17 and 37 Mb from it, respectively.
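The percentages above (for example, 80% of interacting loci in fetal liver containing at least one active gene) amount to an overlap count between interacting regions and active gene positions. A minimal sketch follows; the function name, the representation of genes as (position, expression signal) pairs and the activity cut-off are hypothetical assumptions, since this document does not state how Affymetrix signals were converted into active/inactive calls.

```python
def fraction_with_active_gene(regions, genes, min_signal=100.0):
    """Fraction of inclusive (start, end) regions containing >= 1 active gene.

    `genes` is a list of (position_bp, expression_signal); a gene counts as
    active when its signal reaches `min_signal` (hypothetical cut-off).
    """
    if not regions:
        return 0.0
    active = [pos for pos, signal in genes if signal >= min_signal]
    hits = sum(any(start <= pos <= end for pos in active) for start, end in regions)
    return hits / len(regions)


# Hypothetical interacting regions and gene positions/signals
regions = [(0, 100), (200, 300), (400, 500), (600, 700)]
genes = [(50, 500.0), (250, 20.0), (450, 150.0), (900, 1000.0)]
print(fraction_with_active_gene(regions, genes))  # 0.5
```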
In fetal liver, not all transcribed regions on chromosome 7 interact with the active β-globin locus. We therefore looked for a feature shared exclusively by the interacting loci and not by other active regions in fetal liver. The β-globin genes, Uros and Eraf are all red blood cell-specific genes regulated by the same set of transcription factors, and an attractive idea is that these factors coordinate the expression of their target genes in nuclear space. We compared Affymetrix expression array data from E14.5 fetal liver and fetal brain to identify genes preferentially expressed (> 5-fold) in fetal liver. By this criterion, 28% of the active genes on chromosome 7 were classified as "fetal liver-specific", compared with 25% of the active genes in the co-localizing regions. Therefore, "fetal liver-specific" genes are not enriched in the co-localizing regions. More importantly, 49 of the 66 interacting regions (74%) did not contain any "fetal liver-specific" gene, so our data provide no evidence for coordinated expression of tissue-specific genes in nuclear space. Because the β-globin genes are transcribed at an exceptionally high rate, we next asked whether the locus preferentially interacts with other regions of high transcriptional activity, whether these are highly expressed genes or regions with a high density of active genes. Gene activity was determined from Affymetrix expression values, and a running sum algorithm was used to measure the overall transcriptional activity in the 200 kb region surrounding each transcriptionally active gene. This analysis showed that transcriptional activity around the interacting genes was not higher than around the non-interacting active genes on chromosome 7 (p = 0.9867; Wilcoxon rank sum test).
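The two gene-level measures above, the "> 5-fold" classification of fetal liver-specific genes and the running sum of expression in a 200 kb region around each active gene, can be sketched as below. This is a hypothetical illustration: the function names, the dictionary/tuple inputs and the floor used to avoid division by zero are assumptions, not details given in this document.

```python
def liver_specific(liver, brain, min_fold=5.0, floor=1.0):
    """Genes whose liver signal exceeds their brain signal by more than `min_fold`.

    `liver` and `brain` map gene name -> expression signal; a missing or very
    low brain signal is raised to `floor` to avoid dividing by zero.
    """
    return {g for g in liver if liver[g] / max(brain.get(g, 0.0), floor) > min_fold}


def activity_around(genes, window=200_000):
    """Running sum of expression within a window centred on each gene.

    `genes` is a list of (name, position_bp, signal) for active genes only.
    Returns {name: summed signal of all listed genes within +/- window/2 bp}.
    """
    half = window / 2
    return {
        name: sum(s for _, q, s in genes if abs(q - pos) <= half)
        for name, pos, _ in genes
    }


liver = {"a": 600.0, "b": 100.0, "c": 50.0}
brain = {"a": 100.0, "b": 90.0}
print(sorted(liver_specific(liver, brain)))  # ['a', 'c']
genes = [("x", 0, 10.0), ("y", 50_000, 5.0), ("z", 500_000, 7.0)]
print(activity_around(genes))  # {'x': 15.0, 'y': 15.0, 'z': 7.0}
```

The running sums for interacting and non-interacting genes can then be compared with a rank-based test such as the Wilcoxon rank sum test named above.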
Example 6
The genomic environment of housekeeping genes is largely conserved between tissues
We then investigated whether genes that are expressed similarly in both tissues also switch their genomic environment. Rad23A is a ubiquitously expressed gene located in a dense gene cluster on chromosome 8 that is composed primarily of housekeeping genes. In E14.5 fetal liver and brain, this gene and many of its immediate neighbors are active. 4C analysis identified many long-range interactions with loci up to 70 Mb from Rad23A. Importantly, the interactions of Rad23A were highly correlated between fetal liver and brain (τ = 0.73; Spearman rank correlation) (FIG. 15a). In addition, these loci share the feature of containing transcriptionally active genes: in each tissue, about 70% of them contained at least one active gene (FIGS. 15b-c). As determined by the running sum algorithm, the regions around the interacting genes showed statistically significantly higher gene activity than those around active genes elsewhere on the chromosome (p < 0.001 for both tissues). Thus, unlike the β-globin locus, the Rad23A gene, located in a gene-rich region, preferentially interacts with other chromosomal regions of increased transcriptional activity. FISH showed that the Rad23A-containing chromosomal region was mostly located at the border (90%) or outside (10%) of its chromosome territory (unpublished, d.). However, 4C analysis revealed only intrachromosomal interactions, and no region on chromosomes 7, 10, 11, 12, 13 or 14 reproducibly met our stringent interaction criteria. Thus, Rad23A engages in similar, primarily intrachromosomal interactions in two very different tissues. If Rad23A has preferred neighboring loci on these other chromosomes, the interactions are not frequent enough to be detected under the 4C conditions used here.
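The statistical comparisons in Examples 5 and 6 (activity around interacting versus non-interacting active genes) use a Wilcoxon rank sum test. A minimal sketch with the large-sample normal approximation and average ranks for ties is shown below; it omits the tie correction to the variance and any continuity correction, so it is a simplified stand-in rather than the exact test implementation used. The function name is hypothetical.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test, normal approximation (no tie correction).

    Returns (W, p) where W is the rank sum of sample `x` in the pooled data.
    """
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # tied values share the average of their ranks
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the normal tail
    return w, p


# Hypothetical running sums around interacting vs non-interacting genes
print(rank_sum_test([1, 2, 3], [1, 2, 3]))  # (10.5, 1.0)
```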
Example 7
Validation of the 4C technique by high-resolution microscopy
To verify the results obtained with the 4C technique, cryo-FISH experiments were performed. Cryo-FISH is a recently developed microscopy technique that, compared with existing 3D-FISH protocols, better preserves nuclear ultrastructure and improves z-axis resolution by using ultrathin cryosections (Branco, M.R. & Pombo, A. (2006)). The 4C data were validated by measuring how frequently the β-globin or Rad23A alleles (usually n > 250) co-localize with more than 15 selected chromosomal regions in 200 nm ultrathin sections prepared from E14.5 liver and brain. Importantly, all interaction frequencies measured by cryo-FISH were consistent with the 4C results (fig. 17). For example, distant regions identified by 4C as interacting with β-globin co-localized with it more frequently (7.4% and 9.7%, respectively) than intervening regions not detected by 4C (3.6% and 3.5%). In addition, two distant olfactory receptor gene clusters that 4C identified as interacting with β-globin in fetal brain but not in liver co-localized with it at frequencies of 12.9% and 7%, respectively, in brain sections, versus 3.6% and 1.9% in liver sections. In conclusion, all loci identified as positive by the 4C technique had significantly higher co-localization frequencies than those measured for background loci (p < 0.05; G-test). We conclude that the 4C technique faithfully identifies interacting DNA loci. Finally, using cryo-FISH we demonstrated that loci identified as interacting with β-globin also frequently contact each other. This holds for 2 active regions separated by large chromosomal distances in fetal liver (fig. 19) and for two inactive OR gene clusters located even farther apart on the chromosome in brain (fig. 17).
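The significance statement above compares the co-localization frequency of a 4C-positive locus with that of a background locus using a G-test. A minimal sketch for one such comparison treats the counts as a 2x2 table (co-localized or not, for each locus) and computes the p-value from the chi-square distribution with 1 degree of freedom, using the identity that its upper tail equals erfc(sqrt(G/2)). The function name and example counts are hypothetical, and no Williams correction is applied; this document does not say which G-test variant was used.

```python
import math


def g_test_2x2(colocalized_a, total_a, colocalized_b, total_b):
    """G-test of independence on a 2x2 table (co-localized vs not, locus A vs B).

    Returns (G, p) with p from the chi-square distribution with 1 df.
    """
    table = [[colocalized_a, total_a - colocalized_a],
             [colocalized_b, total_b - colocalized_b]]
    n = total_a + total_b
    row = [sum(r) for r in table]
    col = [table[0][j] + table[1][j] for j in range(2)]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # cells with 0 observations contribute 0 (0 * log 0 := 0)
                expected = row[i] * col[j] / n
                g += obs * math.log(obs / expected)
    g *= 2
    p = math.erfc(math.sqrt(g / 2))  # upper tail of chi-square with 1 df
    return g, p


print(g_test_2x2(10, 100, 10, 100))  # (0.0, 1.0): identical frequencies
print(g_test_2x2(30, 250, 9, 250))   # hypothetical: 12% vs 3.6% of alleles
```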
Interestingly, frequent contacts between these two distant OR gene clusters were also found in fetal liver, where they did not interact with the OR gene cluster containing the actively transcribed β-globin locus. These data show that nuclear interactions between distant OR gene clusters are not unique to the fetal brain tissue analyzed. It is tempting to speculate that such spatial contacts aid coordination among many OR genes, which is necessary to ensure that only one allele is transcribed per olfactory neuron (Shykind, B. (2005) Hum Mol Genet 14 Spec No 1, R33-9).
Example 8
Nuclear organization of active and inactive chromatin domains
The observations described herein demonstrate that not only active but also inactive genomic regions occupy distinct areas in nuclear space that involve many long-range contacts, strongly suggesting that each DNA segment has its own preferred set of interactions. Our data suggest that when the β-globin locus is switched on, it leaves a transcriptionally silent genomic environment and enters a nuclear region that is conducive to interactions with active domains. We expect that such a dramatic relocation upon transcriptional activation is likely to be characteristic only of tissue-specific genes that reach a certain expression level and, more importantly, are separated from other active genes on the linear chromosome template (as is the case for β-globin). We propose that the extensive networks of long-range interactions identified for both inactive and active genomic loci reflect cell-to-cell differences in chromosome conformation rather than the result of dynamic movements during interphase (Chakalova et al. (2005) Nat Rev Genet 6, 669-77). Presumably, varying degrees of decondensation following cell division drive active genomic regions away from inactive chromatin (Gilbert, N. et al. (2004) Cell 118, 555-66) and, through the affinity of chromatin-binding proteins, stabilize contacts between distant loci of similar chromatin composition; such contacts would then require cell division to be reset. This view is consistent with live-cell imaging studies showing restricted movement of tagged DNA loci within the nucleus (Chubb et al. (2002) Curr Biol 12, 439-45), and fits well with studies showing that information on chromatin position is often transmitted through cell division but is not preserved across the cell population (Essers, J. et al. (2005) Mol Biol Cell 16, 769-75; Gerlich, D. et al. (2003) Cell 112, 751-64).
Other aspects 1
Further aspects of the invention are set forth in the following numbered paragraphs.
1. A set of probes complementary to each side of each primary restriction enzyme recognition site in the genome of a given species (e.g. human).
2. A set of probes complementary to only one side of each of the primary restriction enzyme recognition sites in the genome of a given species (e.g., human).
3. A set of probes complementary to one side of every other primary restriction enzyme recognition site arranged along a linear template of the genome of a given species (e.g. human).
4. A set of probes complementary to one side of every third, fourth, fifth, sixth, seventh, eighth, ninth, tenth, twentieth, thirtieth, fortieth, fiftieth, sixtieth, seventieth, eightieth, ninetieth or hundredth primary restriction enzyme recognition site arranged along a linear template of the genome of a given species (e.g., human).
5. A probe set representing a genomic region of a given size (e.g., about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, or 10Mb) (e.g., 50kb-10Mb) surrounding all loci known to be involved in translocations, deletions, inversions, duplications, and other genomic rearrangements.
6. A probe set representing a genomic region of a given size (e.g., about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, or 10Mb) (e.g., 50kb-10Mb) surrounding a selected locus known to be involved in translocations, deletions, inversions, duplications, and other genomic rearrangements.
7. The 4C sequence (bait) is preferably located within a distance of about 50kb, 100kb, 200kb, 300kb, 400kb, 500kb, 1Mb, 2Mb, 3Mb, 4Mb, 5Mb, 6Mb, 7Mb, 8Mb, 9Mb, 10Mb, 11Mb, 12Mb, 13Mb, 14Mb or 15Mb or more from the actual rearranged sequence (i.e. the breakpoint in the case of a translocation).
8. A set of probes representing the complete genome of a given species, wherein each probe represents a single restriction fragment obtained or obtainable after digestion with a first restriction enzyme.
9. A set of probes representing a complete genome of a given species, wherein the probes are evenly distributed along a linear chromosome template.
10. An array comprising a set of probes according to any of paragraphs 1-9.
11. A method of analysing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences (such as one or more genomic loci) comprising the use of a nucleotide sequence or probe array or set or array of probes as described herein.
12. A method of identifying one or more DNA-DNA interactions indicative of a particular disease state comprising the use of a nucleotide sequence or probe array or set or array of probes as described herein.
13. A method of diagnosing or prognosing a disease or syndrome caused by or associated with a DNA-DNA alteration, comprising the use of a nucleotide sequence or probe array or set or array of probes as described herein.
14. An assay method for identifying one or more agents which modulate DNA-DNA interactions comprising the use of a nucleotide sequence or probe array or set or array of probes as described herein.
15. A method of detecting the location of a breakpoint (e.g. a translocation) comprising the use of a nucleotide sequence or probe array or probe set or array as described herein.
16. A method of detecting the position of an inversion comprising the use of a nucleotide sequence or probe array or probe set or array as described herein.
17. A method of detecting the location of a deletion comprising the use of a nucleotide sequence or probe array or probe set or array as described herein.
18. A method of detecting the position of a duplication comprising the use of a nucleotide sequence or probe array or probe set or array as described herein.
19. Use of a microarray in 4C technology to identify (all) DNA fragments in close spatial proximity to a selected DNA fragment.
20. A microarray comprising probes homologous to DNA sequences directly adjacent to a first restriction enzyme recognition site present in a genomic region (which may be a complete genome or a genomic portion) included in the assay: each probe is preferably located within 100bp, or up to 300bp, from a unique primary restriction enzyme recognition site, or alternatively is designed to be between each primary restriction enzyme recognition site and its closest secondary restriction enzyme recognition site.
21. An array as described herein comprising probes complementary to selected locus sequences, wherein the array represents a complete genome of a given species.
22. The array of paragraph 21, wherein the loci are loci associated with one or more diseases.
23. The array of paragraph 21 or paragraph 22, wherein the selected locus sequences comprise sequences up to 20Mb from the locus.
24. A method for analyzing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences of interest (such as one or more genomic loci) comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) ligating the nucleotide sequences;
(g) amplifying one or more nucleotide sequences of interest ligated to a target nucleotide sequence using at least two oligonucleotide primers, wherein each primer hybridizes to a known DNA sequence flanking the nucleotide sequence of interest;
(h) hybridizing the amplified one or more sequences to an array; and
(i) determining the frequency of interactions between DNA sequences.
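The probe designs in paragraphs 1-4 and 20 above (probes directly adjacent to primary restriction sites, placed between each primary site and its nearest secondary site, optionally using only every n-th site along the template) can be sketched as a scan over a genome sequence. This is a hypothetical illustration: HindIII (AAGCTT) as the primary enzyme, DpnII (GATC) as the secondary enzyme, the 50-nt probe length and all function names are assumptions chosen for the example. Because both recognition sequences are palindromic, scanning one strand finds every site.

```python
def find_sites(genome, motif):
    """0-based start coordinates of every occurrence of `motif`."""
    sites, i = [], genome.find(motif)
    while i != -1:
        sites.append(i)
        i = genome.find(motif, i + 1)
    return sites


def design_probes(genome, primary="AAGCTT", secondary="GATC",
                  probe_len=50, every_nth=1):
    """Return (site, side, sequence) probes flanking primary restriction sites.

    A probe is kept only if it fits between the primary site and the nearest
    secondary site on that side, so its target survives secondary digestion.
    """
    probes = []
    for k, site in enumerate(find_sites(genome, primary)):
        if k % every_nth:  # keep only every n-th primary site along the template
            continue
        # Left side: probe ends at the primary site, starts after the nearest
        # upstream secondary site.
        prev_sec = genome.rfind(secondary, 0, site)
        left_bound = 0 if prev_sec == -1 else prev_sec + len(secondary)
        if site - probe_len >= left_bound:
            probes.append((site, "left", genome[site - probe_len:site]))
        # Right side: probe starts right after the primary site, ends before the
        # nearest downstream secondary site.
        right_start = site + len(primary)
        next_sec = genome.find(secondary, right_start)
        right_bound = len(genome) if next_sec == -1 else next_sec
        if right_start + probe_len <= right_bound:
            probes.append((site, "right", genome[right_start:right_start + probe_len]))
    return probes


# Synthetic sequence: secondary site, 60 nt, primary site, 60 nt, secondary site
toy = "GATC" + "C" * 60 + "AAGCTT" + "T" * 60 + "GATC"
print([(site, side) for site, side, _ in design_probes(toy)])  # [(64, 'left'), (64, 'right')]
```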
Other aspects 2
Still further aspects of the invention are set forth in the following numbered paragraphs.
1. A circularized nucleotide sequence comprising a first and a second nucleotide sequence separated by a first and a second restriction enzyme recognition site, wherein said first nucleotide sequence is a target nucleotide sequence and said second nucleotide sequence is obtainable by cross-linking genomic DNA.
2. The circularized nucleotide sequence of paragraph 1, wherein the target nucleotide sequence is selected from the group consisting of a promoter, an enhancer, a silencer, an insulator, a nuclear matrix attachment region, a locus control region, a transcription unit, an origin of replication, a recombination hotspot, a translocation breakpoint, a centromere, a telomere, a gene-dense region, a gene-poor region, a repeat element, and a (viral) integration site.
3. The circularized nucleotide sequence of paragraph 1, wherein the target nucleotide sequence is a disease-associated or disease-causing nucleotide sequence or is located less than 15Mb from a disease-associated or disease-causing locus on a linear DNA template.
4. The circularized nucleotide sequence of any one of paragraphs 1-3, wherein the target nucleotide sequence is selected from the group consisting of: AML1, MLL, MYC, BCL, BCR, ABL1, IGH, LYL1, TAL1, TAL2, LMO2, TCR α/δ, TCR β and HOX, or other disease-related loci such as those described in "Catalogue of Unbalanced Chromosome Aberrations in Man", 2nd edition, Albert Schinzel, Berlin: Walter de Gruyter, 2001. ISBN 3-11-011607-3.
5. The circularized nucleotide sequence of any of paragraphs 1-4, wherein the first restriction enzyme recognition site is a 6-8bp recognition site, preferably selected from the group consisting of BglII, HindIII, EcoRI, BamHI, SpeI, PstI, and NdeI.
6. The circularized nucleotide sequence of any of the preceding paragraphs, wherein the second restriction enzyme recognition site is a 4 or 5bp nucleotide sequence recognition site.
7. The circularized nucleotide sequence of any of the preceding paragraphs, wherein the secondary restriction enzyme recognition site is located more than about 350bp from the primary restriction site.
8. The circularized nucleotide sequence of any of the preceding paragraphs, wherein the nucleotide sequence is labeled.
9. A nucleotide sequence comprising a first and a second nucleotide sequence separated by a first and a second restriction enzyme recognition site, wherein the first nucleotide sequence is a target nucleotide sequence and the second nucleotide sequence is obtainable by cross-linking genomic DNA, and wherein the second nucleotide sequence interacts with the target nucleotide sequence.
10. A method for preparing a circularized nucleotide sequence comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme; and
(f) circularizing the nucleotide sequences.
11. A method of preparing a nucleotide sequence comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences; and
(g) amplifying one or more nucleotide sequences associated with the target nucleotide sequence.
12. The method of paragraph 11, wherein the circularized target nucleotide sequence is linearized prior to amplification.
13. The method of paragraph 12, wherein the circularized target nucleotide sequence is linearized with a restriction enzyme that recognizes a recognition site of 6bp or more.
14. The method of any one of paragraphs 10-13, wherein the crosslinked nucleotide sequences are amplified by PCR.
15. The method of paragraph 14 wherein the crosslinked nucleotide sequence is amplified by inverse PCR.
16. The method of paragraph 14 or paragraph 15, wherein the Expand Long Template PCR System (Roche) is used.
17. A method of analysing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences, such as one or more genomic loci, comprising the use of a nucleotide sequence as described in any of paragraphs 1-9.
18. An array of probes immobilized on a support comprising one or more probes that hybridize or are capable of hybridizing to a nucleotide sequence as described in paragraphs 1-9.
19. A set of probes complementary in sequence to a nucleic acid sequence in the genomic DNA near each of the primary restriction enzyme recognition sites for the primary restriction enzymes.
20. The set of probes described in paragraph 19, wherein the probes are complementary in sequence to the nucleic acid sequences adjacent to each side of each primary restriction enzyme recognition site for a primary restriction enzyme in the genomic DNA.
21. The set of probes of paragraph 19 or paragraph 20, wherein the probes are complementary in sequence to a nucleic acid sequence that is less than 300 base pairs from each of the primary restriction enzyme recognition sites of the primary restriction enzymes in the genomic DNA.
22. The set of probes as in any of paragraphs 19-21, wherein the probes are complementary to less than 300bp of sequence from each of the primary restriction enzyme recognition sites of the primary restriction enzymes in the genomic DNA.
23. The set of probes as described in any of paragraphs 19-22, wherein the probes are complementary to a sequence 200-300bp from each of the primary restriction enzyme recognition sites of the primary restriction enzymes in the genomic DNA.
24. The set of probes as described in any of paragraphs 19-23, wherein the probes are complementary to sequences that are 100-200bp from each of the primary restriction enzyme recognition sites of the primary restriction enzymes in the genomic DNA.
25. The set of probes as described in any of paragraphs 19-24, wherein two or more probes are designed to hybridize to adjacent sequences of each primary restriction enzyme recognition site of a primary restriction enzyme in the genomic DNA.
26. The set of probes of paragraph 25, wherein the probes overlap or partially overlap.
27. The set of probes of paragraph 26, wherein the overlap is less than 10 nucleotides.
28. The set of probes as described in any of paragraphs 19-27, wherein the probe sequence corresponds to all or part of the sequence between each primary restriction enzyme recognition site of a primary restriction enzyme and the nearest adjacent secondary restriction enzyme recognition site of a secondary restriction enzyme.
29. The set of probes of any of paragraphs 19-28, wherein each probe is at least a 25-mer.
30. The set of probes of any of paragraphs 19-29, wherein each probe is a 25- to 60-mer.
31. A method of making a probe set comprising the steps of:
(a) identifying each of the primary restriction enzyme recognition sites for the primary restriction enzymes in the genomic DNA;
(b) designing a probe capable of hybridizing to the adjacent sequence of each of the first restriction enzyme recognition sites in the genomic DNA;
(c) synthesizing a probe; and
(d) grouping the probes together to form, or substantially form, a set of probes.
32. The method of paragraph 31 wherein the probe is a PCR amplification product.
33. A set of probes, or substantially a set of probes, obtained or obtainable by the method of paragraph 31 or paragraph 32.
34. An array comprising the probe array of paragraph 18 or consisting essentially of the probe set of any one of paragraphs 19-30 or 33.
35. An array comprising a set of probes as described in any of paragraphs 19-30 or 33.
36. The array of paragraph 34 or paragraph 35, wherein the array comprises between about 300,000 and 400,000 probes.
37. The array of any of paragraphs 34-36, wherein the array comprises about 385,000 or more probes, preferably about 750,000 probes, more preferably 6 x 750,000 probes.
38. The array of any of paragraphs 34-37, wherein if the number of probes exceeds the number of probes a single array can contain, the array comprises or consists of a representation of a complete genome of a given species at a lower resolution.
39. The array of paragraph 38, wherein the array comprises one probe out of every 2, 3, 4, 5, 6, 7, 8, 9 or 10 probes ordered along the linear chromosome template.
40. A method of preparing an array comprising the step of immobilizing substantially the probe array of paragraph 18 or substantially the probe set of any one of paragraphs 19-30 or 33 on a solid support.
41. A method of preparing an array comprising the step of immobilizing substantially the probe array of paragraph 18 or substantially the probe set of any one of paragraphs 19-30 or 33 on a solid support.
42. An array obtained or obtainable by the method of paragraph 40 or paragraph 41.
43. A method for analyzing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences (such as one or more genomic loci) comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying one or more nucleotide sequences ligated to a target nucleotide sequence;
(h) optionally hybridizing the amplified sequences to an array; and
(i) determining the frequency of interactions between DNA sequences.
44. A method of identifying one or more DNA-DNA interactions indicative of a particular disease state comprising the steps of:
(a) providing a sample of cross-linked DNA from diseased and non-diseased cells;
(b) digesting the cross-linked DNA in each sample with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying the one or more sequences ligated to the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array; and
(i) determining the frequency of interactions between DNA sequences,
wherein a difference in the frequency of interaction between DNA sequences from diseased and non-diseased cells indicates that a DNA-DNA interaction is indicative of a particular disease state.
45. A method of diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction comprising the steps of:
(a) providing a sample of cross-linked DNA from a subject;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying the one or more sequences ligated to the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array;
(i) determining the frequency of interactions between DNA sequences; and
(j) comparing the frequency of interaction of the DNA sequence to the frequency of interaction of an unaffected control;
wherein a difference between the value obtained for the control and the value obtained from the subject indicates that the subject is suffering from a disease or syndrome, or that the subject will suffer from a disease or syndrome.
46. The method of paragraph 45 wherein a transition in the interaction frequency from low to high indicates the location of the breakpoint.
47. The method of paragraph 45, wherein inversion is indicated by a frequency of DNA-DNA interactions in the subject sample in an inverted pattern relative to the control.
48. The method of paragraph 45 wherein a decrease in the frequency of DNA-DNA interaction of the subject sample relative to the control combined with an increase in the frequency of DNA-DNA interaction in more distant regions indicates a deletion.
49. The method of paragraph 45, wherein an increase or decrease in the frequency of DNA-DNA interaction of the subject sample relative to the control is indicative of a duplication or insertion.
50. The method of any one of paragraphs 45-49, wherein spectral karyotyping and/or FISH are used prior to performing the method.
51. The method of any of paragraphs 45-50, wherein the disease is a genetic disease.
52. The method of any of paragraphs 45-51 wherein the disease is cancer.
53. A method of diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction comprising the steps of:
(a) providing a sample of cross-linked DNA from a subject;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying two or more sequences linked to one or more target nucleotide sequences;
(h) labeling two or more amplified sequences;
(i) hybridizing the nucleotide sequence to the array;
(j) determining the frequency of interactions between DNA sequences; and
(k) identifying one or more loci that are subject to a genomic rearrangement associated with the disease.
54. The method of paragraph 53, wherein two or more of the amplified sequences are differentially labeled.
55. The method of paragraph 54, wherein the labels are the same when the two or more amplified sequences are located on different chromosomes.
56. The method of paragraph 53, wherein the two or more amplified sequences carry the same label when they are located on the same chromosome at a distance sufficiently far apart to minimize overlap between their DNA-DNA interaction signals.
57. An assay method for identifying one or more agents that modulate DNA-DNA interactions, comprising the steps of:
(a) contacting the sample with one or more agents;
(b) providing cross-linked DNA from the sample;
(c) digesting the cross-linked DNA with a first restriction enzyme;
(d) ligating the cross-linked nucleotide sequences;
(e) reversing the cross-linking;
(f) digesting the nucleotide sequences with a second restriction enzyme;
(g) circularizing the nucleotide sequences;
(h) amplifying one or more nucleotide sequences linked to a target nucleotide sequence;
(i) optionally hybridizing the amplified nucleotide sequences to an array; and
(j) determining the frequency of interactions between DNA sequences,
wherein a difference between (i) the frequency of interaction between DNA sequences in the presence of the agent and (ii) the frequency of interaction between DNA sequences in the absence of the agent indicates that the agent is capable of modulating DNA-DNA interaction.
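The comparison in paragraph 57 can be sketched as a per-locus fold change between treated and untreated samples. This is a minimal illustration, assuming interaction frequencies have already been determined for the same loci in both samples; `min_abs_log2_fc` is a hypothetical cut-off, and a real assay would also need replicates and a statistical test.

```python
import math
from typing import Dict


def changed_interactions(with_agent: Dict[str, float],
                         without_agent: Dict[str, float],
                         min_abs_log2_fc: float = 1.0) -> Dict[str, float]:
    """Return {locus: log2 fold change} for loci measured in both samples whose
    interaction frequency changes by at least `min_abs_log2_fc` (log2 units)."""
    changed = {}
    for locus in sorted(with_agent.keys() & without_agent.keys()):
        fold_change = math.log2(with_agent[locus] / without_agent[locus])
        if abs(fold_change) >= min_abs_log2_fc:
            changed[locus] = fold_change
    return changed


def modulates_interactions(with_agent: Dict[str, float],
                           without_agent: Dict[str, float],
                           min_abs_log2_fc: float = 1.0) -> bool:
    """True if at least one shared locus changes beyond the threshold."""
    return bool(changed_interactions(with_agent, without_agent, min_abs_log2_fc))


if __name__ == "__main__":
    treated = {"A": 8.0, "B": 2.0, "C": 5.0}      # hypothetical frequencies
    untreated = {"A": 2.0, "B": 2.0, "C": 10.0}
    print(changed_interactions(treated, untreated))  # {'A': 2.0, 'C': -1.0}
```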
58. A method of detecting the location of a breakpoint (e.g., a translocation) comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying one or more sequences associated with the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array;
(i) determining the frequency of interactions between DNA sequences; and
(j) comparing the frequency of interaction between the DNA sequences to the frequency of interaction of a control;
wherein a transition from low to high in the frequency of DNA-DNA interaction in the sample relative to the control indicates the location of the breakpoint.
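One way to locate the low-to-high transition of paragraph 58 is a single change-point search over sample/control log2 ratios along one chromosome. The sketch is an illustration under stated assumptions, not the method of the specification: it tries every split with at least `min_probes` (a hypothetical parameter) probes on each side and keeps the split with the largest shift in mean ratio. It reports the sign of the shift, so the transition is found in either orientation.

```python
from typing import List, Tuple


def find_breakpoint(positions: List[int], log2_ratio: List[float],
                    min_probes: int = 2) -> Tuple[int, int, float]:
    """Single change-point search along one chromosome.

    Returns (left_position, right_position, shift). The breakpoint is assumed
    to lie between these two adjacent probes; `shift` is the mean ratio right
    of the split minus the mean ratio left of it (positive = low to high)."""
    n = len(log2_ratio)
    if n != len(positions):
        raise ValueError("positions and log2_ratio must have the same length")
    if min_probes < 1 or n < 2 * min_probes:
        raise ValueError("need min_probes >= 1 and at least 2 * min_probes probes")
    best = None
    for k in range(min_probes, n - min_probes + 1):
        left, right = log2_ratio[:k], log2_ratio[k:]
        shift = sum(right) / len(right) - sum(left) / len(left)
        if best is None or abs(shift) > abs(best[2]):
            best = (positions[k - 1], positions[k], shift)
    return best


if __name__ == "__main__":
    pos = [10, 20, 30, 40, 50, 60]                 # hypothetical probe positions
    ratio = [-1.0, -1.2, -0.9, 1.1, 0.8, 1.0]
    print(find_breakpoint(pos, ratio))
```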
59. A method of detecting the location of an inversion, comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying one or more sequences associated with the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array;
(i) determining the frequency of interactions between DNA sequences; and
(j) comparing the frequency of interaction between the DNA sequences to the frequency of interaction of a control;
wherein an inversion is indicated when the DNA-DNA interaction frequencies of the sample form a pattern that is inverted relative to the interaction frequencies of the control.
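Paragraph 59 calls an inversion when the sample profile is the mirror image of the control profile. A minimal sketch, assuming both profiles cover the same probes in the same genomic order across the candidate segment: compare the Pearson correlation of the sample with the control and with the reversed control. `margin` is a hypothetical threshold.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def looks_inverted(sample: Sequence[float], control: Sequence[float],
                   margin: float = 0.2) -> bool:
    """True if the sample matches the mirrored control profile better than
    the control profile itself, by more than `margin`."""
    direct = pearson(sample, control)
    mirrored = pearson(sample, list(control)[::-1])
    return mirrored - direct > margin


if __name__ == "__main__":
    control = [1.0, 2.0, 3.0, 5.0, 8.0]   # hypothetical interaction profile
    print(looks_inverted(control[::-1], control))  # True
    print(looks_inverted(control, control))        # False
```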
60. A method of detecting the location of a deletion, comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying one or more sequences associated with the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array;
(i) determining the frequency of interactions between DNA sequences; and
(j) comparing the frequency of interaction between the DNA sequences to the frequency of interaction of a control;
wherein a decrease in the frequency of DNA-DNA interaction of the sample relative to the frequency of interaction of the control is indicative of a deletion.
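The deletion call of paragraph 60 can be sketched as a search for runs of consecutive probes whose sample/control log2 ratio stays below a threshold. Both `threshold` and `min_run` are hypothetical parameters chosen for illustration; requiring several consecutive probes is one simple way to avoid calling a deletion from a single noisy probe.

```python
from typing import List, Tuple


def deletion_candidates(positions: List[int], log2_ratio: List[float],
                        threshold: float = -1.0,
                        min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (first_position, last_position) for every run of at least
    `min_run` consecutive probes whose log2(sample/control) is <= threshold."""
    if len(positions) != len(log2_ratio):
        raise ValueError("positions and log2_ratio must have the same length")
    runs = []
    start = None  # index of the first probe in the current low run
    for i, value in enumerate(log2_ratio):
        if value <= threshold:
            if start is None:
                start = i
            continue
        if start is not None and i - start >= min_run:
            runs.append((positions[start], positions[i - 1]))
        start = None
    # A low run may extend to the last probe.
    if start is not None and len(log2_ratio) - start >= min_run:
        runs.append((positions[start], positions[-1]))
    return runs


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600, 700, 800]      # hypothetical probes
    ratio = [0.1, -1.5, -2.0, -1.8, 0.0, -1.2, -1.1, 0.2]
    print(deletion_candidates(pos, ratio))  # [(200, 400)]
```

The second low stretch (600 to 700) is ignored because it spans only two probes.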
61. A method of detecting the location of a duplication, comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) circularizing the nucleotide sequences;
(g) amplifying one or more sequences associated with the target nucleotide sequence;
(h) optionally hybridizing the amplified nucleotide sequences to an array;
(i) determining the frequency of interactions between DNA sequences; and
(j) comparing the frequency of interaction between DNA sequences to the frequency of interaction of a control;
wherein an increase or decrease in the DNA-DNA interaction frequency of the subject sample relative to the DNA-DNA interaction frequency of the control indicates a duplication or insertion.
62. An agent obtainable or obtained by the assay method described in paragraph 57.
63. Use of a nucleotide sequence as described in any of paragraphs 1-9 for identifying one or more DNA-DNA interactions in a sample.
64. Use of a nucleotide sequence as described in any of paragraphs 1-9 for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
65. Use of the probe array of paragraph 18 or the set of probes of any of paragraphs 19-30 or 33 for identifying one or more DNA-DNA interactions in a sample.
66. Use of the probe array of paragraph 18 or the set of probes of any of paragraphs 19-30 or 33 for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
67. Use of the array of any of paragraphs 34-39 or 42 to identify one or more DNA-DNA interactions in a sample.
68. Use of an array as described in any of paragraphs 34-39 or 42 for diagnosing or prognosing a disease or syndrome caused by or associated with a change in DNA-DNA interaction.
69. The use of any of paragraphs 64, 66 or 68, wherein the diagnosis or prognosis is a prenatal diagnosis or prognosis.
70. A method substantially as described herein and with reference to any one of the examples or figures.
71. A probe array substantially as described herein and with reference to any one of the examples or figures.
72. A set of probes substantially as described herein and with reference to any one of the examples or figures.
73. A method substantially as described herein and with reference to any one of the examples or figures.
74. An array substantially as described herein and with reference to any of the examples or figures.
75. An assay method substantially as described herein and with reference to any one of the examples or figures.
76. An agent substantially as described herein and with reference to any of the examples or figures.
77. Use substantially as described herein and with reference to any of the examples or figures.
TABLE 2

| Interaction | In 4C | N | % overlap | In cryo-FISH | P value |
|---|---|---|---|---|---|
| β-globin / chr 7, 73.1 Mb | + | 258 | 7.4 | + | P<0.001 |
| β-globin / chr 7, 80.1 Mb (OR) | - | 254 | 3.6 | - | |
| β-globin / chr 7, 118.3 Mb | - | 255 | 3.5 | - | |
| β-globin / chr 7, 127.9 Mb (Uros) | + | 259 | 6.6 | + | P<0.001 |
| β-globin / chr 7, 130.1 Mb | + | 413 | 9.7 | + | P<0.001 |
| β-globin / chr 7, 135.0 Mb (OR) | - | 261 | 1.9 | - | |
| β-globin / D7Mit21 | x | 258 | 0.4 | - | |
| chr 7, 80.1 Mb / chr 7, 135.0 Mb | x | 253 | 5.9 | + | P<0.05 |
| chr 7, 73.1 Mb / chr 7, 130.1 Mb | x | 254 | 5.5 | + | P<0.05 |
| Rad23A / chr 8, 21.8 Mb | + | 255 | 5.9 | + | P<0.05 |
| Rad23A / chr 8, 122.4 Mb | + | 261 | 8 | + | P<0.001 |

| Interaction | In 4C | N | % overlap | In cryo-FISH | P value |
|---|---|---|---|---|---|
| β-globin / chr 7, 73.1 Mb | - | 256 | 3.9 | - | |
| β-globin / chr 7, 80.1 Mb (OR) | + | 256 | 12.9 | + | P<0.001 |
| β-globin / chr 7, 118.3 Mb | - | 242 | 4.1 | - | |
| β-globin / chr 7, 130.1 Mb | - | 263 | 3 | - | |
| β-globin / chr 7, 135.0 Mb (OR) | + | 256 | 7 | + | P<0.05 |
| β-globin / D7Mit21 | | 258 | 6.2 | + | P<0.05 |
| chr 7, 80.1 Mb / chr 7, 135.0 Mb | | 261 | 5 | + | P<0.1 |
| Rad23A / chr 8, 21.8 Mb | - | 260 | 38 | - | |
| Rad23A / chr 8, 122.3 Mb | + | 258 | 8.1 | + | P<0.001 |
References
Blanton J, Gaszner M, Schedl P. 2003. Protein:protein interactions and the pairing of boundary elements in vivo. Genes Dev 17:664-75.
Dekker J, Rippe K, Dekker M, Kleckner N. 2002. Capturing chromosome conformation. Science 295:1306-11.
Drissen R, Palstra RJ, Gillemans N, Splinter E, Grosveld F, Philipsen S, de Laat W. 2004. The active spatial organization of the beta-globin locus requires the transcription factor EKLF. Genes Dev 18:2485-90.
Horike S, Cai S, Miyano M, Cheng JF, Kohwi-Shigematsu T. 2005. Loss of silent-chromatin looping and impaired imprinting of DLX5 in Rett syndrome. Nat Genet 37:31-40.
Murrell A, Heeson S, Reik W. 2004. Interaction between differentially methylated regions partitions the imprinted genes Igf2 and H19 into parent-specific chromatin loops. Nat Genet 36:889-93.
Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W. 2003. The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35:190-4.
Patrinos GP, de Krom M, de Boer E, Langeveld A, Imam AMA, Strouboulis J, de Laat W, Grosveld FG. 2004. Multiple interactions between regulatory regions are required to stabilize an active chromatin hub. Genes Dev 18:1495-1509.
Spilianakis CG, Flavell RA. 2004. Long-range intrachromosomal interactions in the T helper type 2 cytokine locus. Nat Immunol 5:1017-27.
Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W. 2002. Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell 10:1453-65.
Vakoc CR, Letting DL, Gheldof N, Sawado T, Bender MA, Groudine M, Weiss MJ, Dekker J, Blobel GA. 2005. Proximity among distant regulatory elements at the beta-globin locus requires GATA-1 and FOG-1. Mol Cell 17:453-62.
All publications mentioned in the above specification are herein incorporated by reference. Various modifications and alterations of the described methods and systems of the present invention will be apparent to those skilled in the art without departing from the scope and spirit of the invention. While the invention has been described in connection with specific preferred embodiments, it should be understood that the invention as claimed should not be unduly limited to such specific embodiments. Indeed, various modifications of the described modes for carrying out the invention which are obvious to those skilled in molecular biology or related fields are intended to be within the scope of the following claims.
Sequence listing
<110> Erasmus University Medical Center
<120> Chromosome conformation capture-on-chip (4C) assay
<130>P022908WO
<140>PCT/IB2006/002268
<141>2006-07-03
<150>GB 0513676.7
<151>2005-07-04
<150>GB 0605449.8
<151>2006-03-17
<160>6
<170>PatentIn version 3.4
<210>1
<211>22
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>1
acttcctaca cattaacgag cc 22
<210>2
<211>23
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>2
gctgttatcc ctttctcttc tac 23
<210>3
<211>18
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>3
tcacacgcga agtaggcc 18
<210>4
<211>19
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>4
ccttcctcca ccatgatga 19
<210>5
<211>25
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>5
aacgcatttg ctcaatcaac tactg 25
<210>6
<211>25
<212>DNA
<213> Artificial
<220>
<223> oligonucleotide primer
<400>6
gttgctcctc acatttgctt ctgac 25
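In each entry, the <211> field gives the length of the <400> sequence. The following small sanity check is written for illustration only: it strips the spacing used in the listing, confirms each declared length and reports GC content. The dictionary simply transcribes SEQ ID NOs 1 to 6 above; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# SEQ ID NO -> (sequence as printed in the listing, declared <211> length)
PRIMERS: Dict[int, Tuple[str, int]] = {
    1: ("acttcctaca cattaacgag cc", 22),
    2: ("gctgttatcc ctttctcttc tac", 23),
    3: ("tcacacgcga agtaggcc", 18),
    4: ("ccttcctcca ccatgatga", 19),
    5: ("aacgcatttg ctcaatcaac tactg", 25),
    6: ("gttgctcctc acatttgctt ctgac", 25),
}


def clean(seq: str) -> str:
    """Remove the 10-base grouping spaces used in sequence listings and upper-case."""
    return "".join(seq.split()).upper()


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = clean(seq)
    return (s.count("G") + s.count("C")) / len(s)


def length_mismatches(primers: Dict[int, Tuple[str, int]]) -> List[int]:
    """SEQ ID NOs whose cleaned length differs from the declared <211> value."""
    return [sid for sid, (seq, declared) in primers.items()
            if len(clean(seq)) != declared]


if __name__ == "__main__":
    print(length_mismatches(PRIMERS))  # []
    for sid, (seq, _) in PRIMERS.items():
        print(sid, clean(seq), round(gc_fraction(seq), 2))
```

For SEQ ID NO: 3, 11 of the 18 bases are G or C, a GC fraction of about 0.61.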
Claims (6)
1. A circularized nucleotide sequence obtainable by a method for analyzing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences of interest, said method comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme;
(f) ligating one or more DNA sequences of known nucleotide composition to one or more available second restriction enzyme digestion sites flanking the one or more nucleotide sequences of interest;
wherein the ligation reaction in step (f) results in the formation of a DNA loop,
wherein the circularized nucleotide sequence comprises a first and a second nucleotide sequence, the first and second nucleotide sequences being separated at each end by a different restriction enzyme recognition site, and wherein the first nucleotide sequence is a target nucleotide sequence and the second nucleotide sequence is obtained by cross-linking genomic DNA.
2. The circularized nucleotide sequence according to claim 1, wherein said one or more nucleotide sequences of interest are one or more genomic loci.
3. A method for preparing a circularized nucleotide sequence comprising the steps of:
(a) providing a sample of cross-linked DNA;
(b) digesting the cross-linked DNA with a first restriction enzyme;
(c) ligating the cross-linked nucleotide sequences;
(d) reversing the cross-linking;
(e) digesting the nucleotide sequences with a second restriction enzyme that cleaves at a distance of more than about 350 bp from the first restriction enzyme site; and
(f) circularizing the nucleotide sequences.
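The distance condition in claim 3 can be checked in silico on a reference sequence. This sketch assumes HindIII (AAGCTT) as the first enzyme and DpnII (GATC) as the second; these enzyme choices, the function names, and the reading of "distance" as the gap between recognition-site start positions to the nearest second-enzyme site on either side are all assumptions for illustration. Both motifs are palindromic, so scanning one strand finds every site.

```python
import re
from typing import List

FIRST_SITE = "AAGCTT"   # HindIII, assumed first enzyme
SECOND_SITE = "GATC"    # DpnII, assumed second enzyme


def site_positions(seq: str, motif: str) -> List[int]:
    """0-based start positions of every (possibly overlapping) match of `motif`."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq.upper())]


def suitable_viewpoints(seq: str, min_distance: int = 350,
                        first: str = FIRST_SITE, second: str = SECOND_SITE) -> List[int]:
    """First-enzyme sites whose nearest second-enzyme site (either side) lies more
    than `min_distance` bp away, measured between recognition-site start positions."""
    second_sites = site_positions(seq, second)
    keep = []
    for pos in site_positions(seq, first):
        nearest = min((abs(s - pos) for s in second_sites), default=None)
        if nearest is None or nearest > min_distance:
            keep.append(pos)
    return keep


if __name__ == "__main__":
    # Synthetic sequence: GATC at 10, AAGCTT at 114 and 620, GATC at 1126.
    seq = ("T" * 10 + "GATC" + "T" * 100 + "AAGCTT" + "T" * 500
           + "AAGCTT" + "T" * 500 + "GATC" + "T" * 10)
    print(suitable_viewpoints(seq))  # [620]
```

The site at 114 is rejected because a GATC lies only 104 bp away; the site at 620 is kept because its nearest GATC is 506 bp away.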
4. An assay method for identifying one or more agents that modulate DNA-DNA interactions, comprising the steps of:
(a) contacting the sample with one or more agents; and
(b) performing steps (a) to (f) of the method of claim 1 or 2 to analyse the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences of interest, wherein step (a) comprises providing cross-linked DNA from the sample;
wherein a difference between (i) the frequency of DNA sequence interaction in the presence of the agent and (ii) the frequency of DNA sequence interaction in the absence of the agent indicates that the agent is capable of modulating DNA-DNA interaction.
5. Use of a nucleotide sequence according to claim 1 for analyzing the frequency of interaction of a target nucleotide sequence with one or more nucleotide sequences of interest, wherein the nucleotide sequence is not used for the diagnosis of a disease.
6. Use of a nucleotide sequence according to claim 1 to identify an agent that modulates DNA-DNA interactions, for a pharmaceutical composition comprising said agent.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| GB0513676.7 | 2005-07-04 | | |
| GB0605449.8 | 2006-03-17 | | |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1118870A (en) | 2009-02-20 |
| HK1118870B (en) | 2018-02-09 |