WO2016069539A1

WO2016069539A1 - Systems and methods of screening with a molecule recorder

Info

Publication number: WO2016069539A1
Application number: PCT/US2015/057479
Authority: WO
Inventors: Carina NAMIH; Katharina WENDELSTADT; Geoffrey SIWO; Hannu Rajaniemi; Remco LOOS; Cristina COTOBAL; Tim SMALLIE
Original assignee: Helix Nanotechnologies Inc
Current assignee: Helix Nanotechnologies Inc
Priority date: 2014-10-27
Filing date: 2015-10-27
Publication date: 2016-05-06
Anticipated expiration: 2017-04-27

Abstract

The present invention is directed to compositions and methods related to nucleotide enzymes and their use in screening assays. The enzyme comprises two substrate binding domains and a target or effector binding domain that responds allosterically to initiate ligation activity by a nucleotide enzyme moiety, releasing the ligated oligonucleotide as a reporter for the detected moiety.

Description

PCT PATENT APPLICATION

FOR

SYSTEMS AND METHODS OF SCREENING WITH A MOLECULAR RECORDER

BY

GEOFFREY SIWO, LEBANON, NH, USA

KATHARINA WENDELSTADT, SAO PAOLO, SAO PAOLO, BRAZIL

CARINA NAMIH, LONDON, BRITAIN

HANNU RAJANIEMI, OAKLAND, USA

TIM SMALLIE, LONDON, BRITAIN

CRISTINA COTOBAL, CAMBRIDGE, BRITAIN

REMCO LOOS, CAMBRIDGE, BRITAIN


CROSS-REFERENCE

[001] This application claims the benefit of U.S. Provisional Application Nos. 62/068,872, filed October 27, 2014, and 62/068,880, filed October 27, 2014, both of which applications are expressly incorporated herein by reference.

BACKGROUND

[002] Current methods of measuring biological effects of agents or fluctuations in phenotypes in cells are time-consuming, expensive and generally require destruction of cells at given time points in order to capture the state of the cell at a given point in time. While real-time imaging can occur when monitoring labeled cells, this frequently requires a priori knowledge of an agent that can be labeled. Accordingly, notable challenges exist in monitoring real-time fluctuations of cellular phenotypes.

SUMMARY

[003] In one embodiment, the principles of the present disclosure provide a method for recording a detectable signal onto an oligomer, comprising providing a molecular recorder system, said system comprising at least a first, a second and a third molecular entity, wherein the first molecular entity exhibits a ligase activity whose ability to ligate the second and third molecular entities depends on the presence of the signal; the method further comprises exposing the system to the signal, whereupon, in the presence of the signal, the first molecular entity joins the second and third molecular entities, thereby generating an oligomer indicative of the presence of the signal.

[004] In yet another embodiment, principles of the present disclosure provide a polynucleotide comprising first and second substrate binding domains, at least one nucleotide enzyme domain and at least one target binding domain.

[005] In one embodiment, principles of the present disclosure provide a molecular recorder system comprising a polynucleotide comprising first and second substrate binding domains, at least one nucleotide enzyme domain, at least one autoinhibitory domain and at least one target binding domain, and first and second polynucleotide substrates, wherein said first and second polynucleotide substrates are complementary to said first and second substrate binding domains.

[006] In one embodiment, principles of the present disclosure provide a multi-functional molecular recorder system comprising a first and second molecular recorder system as described herein, wherein the target binding domain of the first molecular recorder system is correlated with a sequence of at least said first or second substrate binding domains of said first molecular recorder system and the target binding domain of said second molecular recorder system is correlated with a sequence of at least said first or second substrate binding domains of said second molecular recorder system.

[007] In one embodiment, principles of the present disclosure provide a method of detecting a plurality of target analytes comprising contacting the multifunctional molecular recorder system as described herein with at least first and second candidate agents, whereby upon binding of said first candidate agent to said target binding domain of said polynucleotide of said first molecular recorder system, first and second polynucleotide substrates bind to first and second substrate binding domains of said polynucleotide of said first molecular recorder system, whereby said first and second substrates are ligated to produce a first product indicative of the presence of said first target, whereby upon binding of said second candidate agent to said target binding domain of said polynucleotide of said second molecular recorder system, first and second polynucleotide substrates bind to first and second substrate binding domains of said polynucleotide of said second molecular recorder system, whereby said first and second substrates are ligated to produce a second product indicative of the presence of said second target.

[008] In one embodiment, principles of the present disclosure provide a cell comprising at least a first and second heterologous nucleotide enzyme, wherein each of the first and second nucleotide enzymes comprises a first binding moiety, at least a first and second target binding sequence, and a catalytic sequence, wherein said first and second target binding sequences of said first and second nucleotide enzymes are substantially complementary to polynucleotide target sequences, and wherein, upon hybridization of polynucleotide target sequences with the first and second target binding sequences of any of said nucleotide enzymes, said nucleotide enzyme is capable of catalyzing ligation of said first and second polynucleotide sequences.

[009] In one embodiment, principles of the present disclosure provide a method of screening for the presence of target molecules comprising culturing the cell as described herein and detecting the presence of ligated products of at least one of said heterologous nucleotide enzymes.

[010] In one embodiment, principles of the present disclosure provide a method of screening for a biological effect of an agent comprising culturing the cell as described herein with a candidate agent, detecting the presence or absence of the products of at least one of said heterologous nucleotide enzymes, whereby a change in the presence or absence of the product of said first or second heterologous nucleotide enzymes following culturing with said candidate agent indicates that the candidate agent has a biological effect.

[011] In one embodiment, principles of the present disclosure provide a method of recording temporal effects of a plurality of candidate agents on a biological response in a cell comprising culturing a cell as described herein with a first candidate agent, then culturing the cell with a second candidate agent, detecting the presence or absence of the products of said first or second heterologous nucleotide enzymes, whereby a change in the presence or absence of the product of said first or second heterologous nucleotide enzymes following culturing with said first or second candidate agents indicates that the candidate agent has a biological effect.

[012] In one embodiment, principles of the present disclosure provide a molecular recorder system comprising first and second polynucleotides comprising first and second substrate binding domains, at least one nucleotide enzyme domain and at least one target binding domain and first and second pairs of polynucleotide substrates, wherein said first pair of polynucleotide substrates is complementary to said first and second substrate binding domains of said first polynucleotide and said second pair of polynucleotide substrates is complementary to said first and second substrate binding domains of said second polynucleotide.

[013] In one embodiment the present disclosure provides a molecular recorder as described herein, wherein said first and second polynucleotides each further comprise at least one autoinhibitory domain.

[014] In one embodiment the present disclosure provides a multi-functional molecular recorder system as described herein, wherein the target binding domain of said first polynucleotide is correlated with a sequence of at least said first or second substrate binding domains of said first polynucleotide and the target binding domain of said second polynucleotide is correlated with a sequence of at least said first or second substrate binding domains of said second polynucleotide.

[015] In one embodiment the present disclosure provides a method of detecting a plurality of target analytes comprising contacting a multifunctional molecular recorder system as described herein with at least first and second candidate agents, whereby upon binding of said first candidate agent to said target binding domain of said first polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said first polynucleotide, whereby said first and second substrates are ligated to produce a first product indicative of the presence of said first target, whereby upon binding of said second candidate agent to said target binding domain of said second polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said second polynucleotide, whereby said first and second substrates are ligated to produce a second product indicative of the presence of said second target.

[016] In one embodiment the present disclosure provides a method of detecting a plurality of messenger RNA molecules comprising contacting a multifunctional molecular recorder system as described herein with at least first and second mRNA molecules, whereby upon binding of said first candidate agent to said target binding domain of said first polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said first polynucleotide, whereby said first and second substrates are ligated to produce a first product indicative of the presence of said first target, whereby upon binding of said second candidate agent to said target binding domain of said second polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said second polynucleotide, whereby said first and second substrates are ligated to produce a second product indicative of the presence of said second target, wherein poly-A tails of said mRNA molecules hybridize with at least one of said substrate binding domains of each of said first or second polynucleotides.

[017] It is contemplated that any embodiment of a method or composition described herein can be implemented with respect to any other method or composition described herein.

[018] The use of the word "a" or "an" when used in conjunction with the term "comprising" in the claims and/or the specification may mean "one," but it is also consistent with the meaning of "one or more," "at least one," and "one or more than one."

[019] The use of the term "or" in the claims is used to mean "and/or" unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive, although the disclosure supports a definition that refers to only alternatives and "and/or."

[020] Throughout this application, the term "about" is used to indicate that a value includes the standard deviation of error for the device or method being employed to determine the value.

[021] As used in this specification and claim(s), the words "comprising" (and any form of comprising, such as "comprise" and "comprises"), "having" (and any form of having, such as "have" and "has"), "including" (and any form of including, such as "includes" and "include") or "containing" (and any form of containing, such as "contains" and "contain") are inclusive or open-ended and do not exclude additional, unrecited elements or method steps.

[022] Other objects, features and advantages of the present invention will become apparent from the following detailed description. It should be understood, however, that the detailed description and the specific examples, while indicating specific embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.

DESCRIPTION OF THE DRAWINGS

[023] The following drawings form part of the present specification and are included to further demonstrate certain aspects of the present invention. The invention may be better understood by reference to one or more of these drawings in combination with the detailed description of the specification embodiments presented herein.

[024] FIG. 1 (SEQ ID NOs: 2, 3 and 4) Diagram of the domains of a nucleotide ligase as described herein.

[025] FIG. 2 E47-catalyzed reaction using two arbitrary DNA sequences. One DNA fragment was labeled with FAM while the other was attached to biotin to allow tracking of the ligation reaction on a flow cytometer.

[026] FIG. 3 Multiple ligations by an active molecular recorder generate a distribution of products.

[027] FIG. 4 Drawing showing split catalytic domain of a polynucleotide enzyme.

[028] FIG. 5 Ligations of multiple substrates. Products from one reaction can be substrates for a second reaction yielding products containing substrates from different nucleotide enzymes.

[029] FIG. 6 Detection of multiple targets using target specific polynucleotide enzymes.

[030] FIG. 7 Discrimination of phenotypic states based on the mean tape lengths. Each line represents the distribution of tapes of different lengths observed in each biological sample. Samples in the cancer state express the gene of interest and as a consequence are associated with activation of the DNAzyme, resulting in the production of longer tapes.

[031] FIG. 8 An exemplary sensor, substrate and DNAzyme for a BRCA1 recorder (SEQ ID NOs:5, 6 and 7).

[032] FIG. 9 A depiction of the product of reactions as described herein showing A. relative timestamp and B. biological time stamp.

[033] FIG. 10 A depiction of the product of reactions as described herein demonstrating detection of molecules of different concentrations.

[034] FIG. 11 A depiction of the reaction to detect mRNA molecules.

[035] FIG. 12 A depiction showing that heterogeneity (H) in gene expression, measured using molecular recorders that detect an mRNA signal, is directly proportional to the variance in the length of tapes encoding the mRNA signal.

[036] FIG. 13 A depiction of configurations of a system for detecting nucleic acid targets and transducing the associated sequence information into a detectable product nucleic acid.

[037] FIG. 14 A depiction of configurations of a system for detecting nucleic acid targets and transducing information from multiple ligation reactions.

[038] FIG. 15 A depiction of a switchable DNAzyme ligase. (a) Diagram showing a design of a switchable ligase system. A switching input (oligo) is used to join together the two halves of the ligase and render them active. (b) An experiment to test the switchable ligase. Note the presence of a ligation product even in the absence of the switching input. (c) Diagram showing a new switchable split ligase system, in which one half of the ligase acts as the switch input. (d) Image showing products of a ligation reaction. Two ssDNA substrates (each 50 nt in length) were ligated into a 100 bp product in the presence of the 26 nt ssDNA switch (Split2). The 26 nt ssDNA switch interacts with an inactive 41 nt DNAzyme ligase (Split1) and renders the ligase active. DNA fragments were resolved using TBE-urea polyacrylamide gel electrophoresis (PAGE) (SEQ ID NOs: 8, 9, 10 and 11).

DESCRIPTION

[039] The present disclosure overcomes limitations of current screening methods by providing novel nucleic acids, systems and methods for screening of a variety of agents. Notably, principles of the present disclosure provide the ability to screen for agents in vitro, in vivo, intracellularly or extracellularly. Moreover, levels of agents to be screened can be monitored over time.

[040] Accordingly, principles of the present disclosure provide novel nucleotide enzymes and their use in screens. The nucleotide enzymes that find use in the methods disclosed herein may be referred to as nucleotide enzyme recorders; they contain features in addition to those found in a core nucleotide enzyme, which includes two substrate binding elements flanking a catalytic domain. While DNA enzymes such as the E47 DNA ligase have been described previously, development of related technologies has not advanced. For instance, previously disclosed nucleotide enzyme ligases were limited in their substrate specificity, and efforts to alter substrate specificity have languished. In contrast, the present disclosure provides nucleotide enzymes with diverse substrate specificity that can be tailored or customized to ligate a target of choice. As such, the nucleotide enzymes described herein may be described as "programmable," "customizable" or "tailored" nucleotide enzymes. In addition, nucleotide enzyme recorders described herein contain additional features that may include a target binding domain or moiety (also referred to as a binding ligand), an autoinhibitory domain, and one or more linkers separating these features.

[041] By "nucleotide ligase," "DNA ligase" or "DNAzyme" as described herein is meant a nucleotide sequence having a first and second substrate binding sequence flanking a catalytic domain (Figure 1). The substrate binding sequences hybridize with target sequences, also referred to herein as substrate sequences, that are complementary or substantially complementary to the substrate binding sequences. Upon hybridization with the first and second substrate binding sequences, the two substrate molecules are ligated together. Notably, this ligation requires only single-stranded substrate molecules and requires no overlapping regions of complementarity between the two substrate sequences. As such, nucleotide enzymes can be designed to ligate any two substrate molecules of interest, allowing for the first time protein-free, multiplex ligation of a plurality of molecules in a single reaction vessel. Moreover, ligation reactions are not limited by the size of the target molecule, so very large substrates can be targeted for ligation and larger products generated. Thus, the present disclosure provides methods for sequential ligation of a plurality of target sequences to generate products larger than could be achieved by prior nucleotide enzymes.

[042] Another notable feature of nucleotide ligases is that the product formed has a 2'-5' phosphodiester bond instead of a 3'-5' phosphodiester bond. This provides numerous advantages, such as resistance to nucleases while the product is still recognized by DNA polymerase.

[043] Accordingly, in one embodiment, the nucleotide ligases described herein may ligate any of a variety of nucleotide molecules. For instance, the disclosure need not be limited to ligation of DNA molecules. Rather, other nucleic acids, such as RNA, LNA, PNA and the like, may be ligated. Other nucleic acids that may be ligated include, but are not limited to, xeno nucleic acids (XNAs), in which the natural sugar is replaced by one of several other sugar moieties: 1,5-anhydrohexitol nucleic acids (HNAs), cyclohexenyl nucleic acids (CeNAs), 2'-O,4'-C-methylene-beta-D-ribonucleic acids [locked nucleic acids (LNAs)], arabinonucleic acids (ANAs), 2'-fluoro-arabinonucleic acids (FANAs) and alpha-L-threofuranosyl nucleic acids (TNAs).

[044] As noted above, the nucleotide enzymes of the present disclosure comprise first and second substrate binding sequences. These regions hybridize with complementary or substantially complementary substrate sequences in target molecules. In some embodiments the substrates are found in target molecules to be investigated. In other embodiments the substrates are provided as a tool to measure the ligation reaction. A benefit of the nucleotide enzymes described herein is that the substrate binding sequences can be custom designed and may be of different lengths. That is, in some embodiments each of the substrate binding sequences may have as few as 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 or 15 nucleotides. In one embodiment, the substrate binding sequences of the nucleotide enzyme are up to 20, 30, 40 or 50 nucleotides in length. In some embodiments the target binding sequences are from 3 to 50, 4 to 40, 5 to 30, 6 to 25, or 8 to 10 nucleotides in length.

[045] The nucleotide ligases described herein also comprise a catalytic domain between the two substrate binding sequences.

[046] In some embodiments, DNAzymes or nucleotide enzymes are specifically designed or programmed. In this embodiment, substrate-binding domains of a nucleotide enzyme are designed to be complementary to substrates with which they hybridize. Factors influencing hybridization include degree of complementarity, length of complementarity region, nucleotide composition of the hybridizing regions, secondary structure and other factors as known in the art. Accordingly, the present disclosure also provides methods of designing a library of nucleotide enzymes or DNAzymes that can be used to ligate a plurality of target molecules.

[047] In some embodiments designing the DNAzymes or nucleotide enzymes includes modularizing the catalytic nucleic acid into one or more catalytic domains and at least two substrate recognition domains. Catalytic domains can be identified by their characteristic sequence of 5'-CCTGTTTCATGAGACCATGTGACGCATGGCCCG-3' (SEQ ID NO: 1). Variants, however, can be identified by methods disclosed herein and by methods known to those of ordinary skill in the art. Substrate binding (also referred to as substrate recognition) domains are found 5' and 3' of the catalytic domain. As described above, the substrate binding domains can be designed to be complementary to any target of choice and thus provide the ability for a DNAzyme to ligate any two nucleic acids together. As such, the method provides programming the catalytic nucleic acid to ligate two nucleic acids by designing at least one of the substrate recognition domains to be complementary to the terminus of one of the target nucleic acids and the second substrate recognition domain to be complementary to the terminus of the other target nucleic acid, for instance, based on Watson-Crick base pairing.
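The arm-programming step can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed design procedure. It assumes that the ligation joins the 3' end of an "upstream" substrate to the 5' end of a "downstream" substrate. It also assumes that the enzyme is written 5' to 3' as arm, catalytic core (SEQ ID NO: 1), arm, and that both arms pair antiparallel by strict Watson-Crick rules. The names `reverse_complement`, `design_ligase` and the example substrates are hypothetical.

```python
# Minimal sketch: program a DNAzyme ligase by designing its two
# substrate-binding arms around the catalytic core of SEQ ID NO: 1.
# The arm layout below is an assumption, not taken from the disclosure.

CATALYTIC_CORE = "CCTGTTTCATGAGACCATGTGACGCATGGCCCG"  # SEQ ID NO: 1

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_ligase(upstream: str, downstream: str, arm_len: int = 8) -> str:
    """Return a hypothetical ligase sequence (5' to 3'): arm + core + arm.

    Assumes the 3' end of `upstream` is joined to the 5' end of
    `downstream`. Because the enzyme pairs antiparallel with its
    substrates, its 5' arm binds the 5' terminus of `downstream` and its
    3' arm binds the 3' terminus of `upstream`.
    """
    if arm_len < 1 or min(len(upstream), len(downstream)) < arm_len:
        raise ValueError("each substrate must be at least arm_len nucleotides")
    arm_5prime = reverse_complement(downstream[:arm_len])
    arm_3prime = reverse_complement(upstream[-arm_len:])
    return arm_5prime + CATALYTIC_CORE + arm_3prime


# A small hypothetical library: one programmed ligase per substrate pair.
pairs = {
    "pair_1": ("GGGAAACCCTTT", "ACGTACGTAAAA"),
    "pair_2": ("TTGCAGGCATCA", "GATCCATGGCTA"),
}
library = {name: design_ligase(up, down, 6) for name, (up, down) in pairs.items()}
for name, seq in library.items():
    print(name, seq)
```

Because the core is held constant and only the arms change, building a library reduces to iterating over substrate pairs. Any real design would still need the hybridization and structure checks listed in the next paragraph.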

[048] In some embodiments, the substrate binding domains are of the same or different length within a DNAzyme or within a set of DNAzymes. In one embodiment, designing DNAzymes, therefore, includes defining the optimal length of the substrate-binding domain. Factors to consider and to calculate when designing optimal lengths include calculating the thermodynamic stability of Watson-Crick pairs between the substrate recognition domains of a given length and the terminal regions of the target nucleic acid sequences, calculating the probability of secondary structures forming on the substrate recognition sequences, calculating the probability of intra-molecular interactions between the substrate recognition sequences and the catalytic domains, calculating the expected catalytic efficiency of the catalytic nucleic acid sequence given the sequence of the substrate recognition domain, calculating the error rate of the nucleic acid synthesizer for making the catalytic nucleic acid; and/or calculating the cost of synthesizing the nucleic acid of a given length on the selected nucleic acid synthesizer.
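Two of the listed calculations can be roughed out in code: the duplex stability of a candidate arm, and its potential for unwanted pairing with itself or with the catalytic core. The sketch below deliberately swaps in crude heuristics. The Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, is a rule of thumb for short oligonucleotides and stands in for a proper nearest-neighbor thermodynamic model. A shared complementary k-mer stands in for a secondary-structure prediction. The thresholds (`min_tm`, `k`, the length range) and all function names are hypothetical.

```python
# Rough sketch of substrate-arm length selection. The Wallace rule and the
# k-mer complementarity screen are stand-ins for nearest-neighbor
# thermodynamics and secondary-structure prediction; thresholds are
# illustrative assumptions.
from typing import Iterable, Optional, Tuple

CATALYTIC_CORE = "CCTGTTTCATGAGACCATGTGACGCATGGCCCG"  # SEQ ID NO: 1
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def wallace_tm(seq: str) -> int:
    """Wallace-rule melting temperature estimate in degrees C."""
    s = seq.upper()
    return 2 * (s.count("A") + s.count("T")) + 4 * (s.count("G") + s.count("C"))


def shares_complementary_kmer(a: str, b: str, k: int) -> bool:
    """True if some k-mer of `a` could pair antiparallel with a k-mer of `b`."""
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return any(reverse_complement(a[i:i + k]) in kmers_b
               for i in range(len(a) - k + 1))


def choose_arm(upstream: str, min_tm: int = 30,
               lengths: Iterable[int] = range(6, 21),
               k: int = 4) -> Optional[Tuple[int, str]]:
    """Pick the shortest arm pairing with the 3' end of `upstream` that
    reaches `min_tm` and passes both pairing screens; None if none does."""
    for n in lengths:
        if n > len(upstream):
            break
        arm = reverse_complement(upstream[-n:])
        if wallace_tm(arm) < min_tm:
            continue  # duplex predicted too weak at this length
        if shares_complementary_kmer(arm, arm, k):
            continue  # possible hairpin or self-dimer
        if shares_complementary_kmer(arm, CATALYTIC_CORE, k):
            continue  # possible intramolecular pairing with the core
        return n, arm
    return None


print(choose_arm("CAAACAAACAAACAAACAAA"))
print(choose_arm("ATATATATATATATATATAT"))
```

Scanning from short to long reflects a trade-off the paragraph implies: shorter arms are cheaper and less error-prone to synthesize, so the first length that clears every screen is kept. A self-complementary terminus such as an AT repeat is rejected at every length.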

[049] Additionally, the catalytic nucleotide ligases are generally metalloenzymes. That is, they require metal co-factors for enzymatic activity. Preferred metals include Zn²⁺ and/or Cu²⁺, Ag(I), Pb(II), Hg(II), As(III), Fe(III), Zn(II), Cd(II), Cu(II), Sr(II), Sa(II), Ni(II), Co(II), As(V), U(VI), Cr(VI), Ca(II), Mg(II).

[050] Other components of the nucleotide enzyme include a target-binding moiety. By "target binding moiety" is meant a portion of the nucleotide enzyme that binds with high affinity to a target. See Figure 3. In a preferred embodiment, the binding is specific, and the binding moiety, also referred to herein as binding ligand, is part of a binding pair, wherein one part of the binding pair is in the nucleotide enzyme. By "specifically bind" herein is meant that the moiety binds the target or analyte with specificity sufficient to differentiate between the analyte and other cellular components. In some embodiments, for example in the detection of certain biomolecules, the binding constants of the analyte to the binding ligand will be at least about 10⁴ to 10⁶ M⁻¹, with at least about 10⁵ to 10⁹ M⁻¹ being preferred and at least about 10⁷ to 10⁹ M⁻¹ or higher affinity being particularly preferred.
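To make the quoted affinity ranges concrete, the sketch below computes the fraction of binding ligand occupied at a given analyte concentration. It assumes a simple 1:1 equilibrium with the analyte in excess, so that occupancy is θ = Ka[A] / (1 + Ka[A]). Real recorder binding, especially when coupled to an autoinhibitory switch, need not follow this model. The function name is hypothetical.

```python
# Fractional occupancy of a binding ligand under a simple 1:1 binding model,
# assuming the analyte is in excess (free concentration ~ total concentration).


def fraction_bound(ka_per_molar: float, analyte_molar: float) -> float:
    """Return theta = Ka*[A] / (1 + Ka*[A])."""
    if ka_per_molar < 0 or analyte_molar < 0:
        raise ValueError("Ka and concentration must be non-negative")
    x = ka_per_molar * analyte_molar
    return x / (1.0 + x)


# Compare affinities spanning the ranges given above at 1 uM analyte.
for ka in (1e4, 1e6, 1e9):
    print(f"Ka = {ka:.0e} M^-1: fraction bound = {fraction_bound(ka, 1e-6):.4f}")
```

Under this model, at 1 µM analyte a Ka of 10⁶ M⁻¹ gives half occupancy, while 10⁴ M⁻¹ leaves only about 1% of the ligand bound. This illustrates why higher affinities matter for low-abundance analytes.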

[051 ] As will be appreciated by those in the art, the composition of the binding ligand will depend on the composition of the target. Binding ligands to a wide variety of analytes are known or can be readily found using known techniques. For example, when the analyte is a single-stranded nucleic acid, the binding ligand may be a complementary nucleic acid. Similarly, the analyte may be a nucleic acid binding protein and the binding ligand is either single-stranded or double stranded nucleic acid; alternatively, the binding ligand may be a nucleic acid-binding protein when the analyte is a single or double- stranded nucleic acid. When the analyte is a protein, the binding ligands include proteins or small molecules. Binding ligands may include peptides in some embodiments. For example, when the analyte is an enzyme, suitable binding ligands include substrates and inhibitors. As will be appreciated by those in the art, any two molecules that will associate may be used, either as an analyte or as the binding ligand. Suitable analyte/binding ligand pairs include, but are not limited to, antibodies/antigens, receptors/ligands, proteins/nucleic acid, enzymes/substrates and/or inhibitors, carbohydrates (including glycoproteins and glycolipids)/lectins, proteins/proteins, proteins/small molecules; and carbohydrates and their binding partners are also suitable analyte-binding ligand pairs. These may be wild-type or derivative sequences. In a preferred embodiment, the binding ligands are portions of transmembrane proteins, such as cell surface receptors.

[052] As noted above, in some embodiments the target-binding moiety is a nucleotide sequence capable of binding to a target. In some embodiments the target-binding moiety hybridizes with a nucleotide having a complementary sequence. In some embodiments, the target-binding moiety binds to a molecule based on structure specificity, such as an aptamer. As such, in one embodiment, the disclosure provides the use of nucleic acids as the binding ligand or target binding moiety, for example when the target analyte is a nucleic acid or a nucleic acid binding protein, or when the nucleic acid serves as an aptamer for binding a protein; see U.S. Pat. Nos. 5,270,163, 5,475,096, 5,567,588, 5,595,877, 5,637,459, 5,683,867, 5,705,337, and related patents, hereby incorporated by reference.

[053] Thus, in one embodiment, the target-binding moiety is a nucleotide sequence that is contiguous with the sequence of the nucleotide enzyme. In some embodiments, the target-binding moiety may be separated from the sequence defining the nucleotide enzyme by a linker, which may also be a nucleotide sequence. In some embodiments, the linker may be from 2 to 100 nucleotides in length. In some embodiments the linker may be from 4 to 70 nucleotides in length. In some embodiments the linker may be from 8 to 50 nucleotides in length. In some embodiments the linker may be from 10 to 25 nucleotides in length.

[054] The method of attachment of the binding ligand to the linker will generally be done as is known in the art, and will depend on the composition of the linker and the binding ligand. In general, the binding ligands are attached to the linker through the use of functional groups on each that can then be used for attachment. Preferred functional groups for attachment are amino groups, carboxy groups, oxo groups and thiol groups. These functional groups can then be attached, either directly or through the use of a linker. Linkers are known in the art; for example, homo- or heterobifunctional linkers are well known (see 1994 Pierce Chemical Company catalog, technical section on cross-linkers, pages 155-200, incorporated herein by reference). Preferred linkers include, but are not limited to, alkyl groups (including substituted alkyl groups and alkyl groups containing heteroatom moieties), with short alkyl groups, esters, amide, amine, epoxy groups and ethylene glycol and derivatives being preferred. Linkers may also be a sulfone group, forming a sulfonamide. In this way, binding ligands comprising proteins, lectins, nucleic acids, small organic molecules, carbohydrates, etc. can be added. In one embodiment, when the binding ligand or target binding moiety is a nucleic acid, it is synthesized during the synthesis of the nucleotide enzyme and linker.

[055] Another feature of the nucleotide enzyme as described herein is an autoinhibitory segment. By "autoinhibitory" is meant that a component of the nucleotide enzyme inhibits the activity of the enzyme or inhibits substrate binding of the nucleotide enzyme until the inhibition is repressed. See Fig. 3 and Fig. 8. In one embodiment, binding by the target-binding moiety to the target represses the inhibition allowing for catalysis by the nucleotide enzyme. In one embodiment, the autoinhibitory region is a nucleotide sequence that hybridizes to complementary sequences found in other parts of the nucleotide enzyme molecule. In this embodiment, hybridization of the autoinhibitory domain with its complement in the nucleotide enzyme polynucleotide prevents activity of the nucleotide enzyme. However, upon binding of the target binding moiety with its cognate target, hybridization between the autoinhibitory domain and its complementary sequence is disrupted allowing for increased activity of the catalytic domain of the nucleotide enzyme. Thus, in the presence of substrates and a target, the nucleotide enzyme is derepressed and catalysis can proceed resulting in target induced ligation of the substrates.

[056] Accordingly, the nucleotide enzyme recorder, also referred to herein as a molecular recorder, may include the following elements: an autoinhibitory domain, a target binding domain, a first substrate binding domain, a catalytic domain and a second substrate binding domain. In one embodiment these elements are in a contiguous polynucleotide sequence, although in some embodiments they may be found in multiple sequences that are brought into sufficiently close proximity to allow for the nucleotide enzyme activity. See Figure 4. In some embodiments, these components may be separated by a linker sequence as described herein.
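The element order listed above can be expressed as a small data structure. The sketch is hypothetical throughout. It places the autoinhibitory domain at the 5' end and designs it as the reverse complement of the first few bases of the first substrate-binding arm, which is one possible way to block substrate binding until target binding disrupts the duplex. It inserts a single linker between the target-binding domain and the enzyme, and it enforces the 2 to 100 nucleotide linker range given in [053]. A real design would need structure prediction to confirm the intended switching behavior.

```python
# Hypothetical layout of a single-strand molecular recorder, 5' to 3':
# autoinhibitory domain, target-binding domain, linker,
# first substrate-binding arm, catalytic core, second substrate-binding arm.
from dataclasses import dataclass
from typing import List, Tuple

CATALYTIC_CORE = "CCTGTTTCATGAGACCATGTGACGCATGGCCCG"  # SEQ ID NO: 1
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


@dataclass
class RecorderDesign:
    target_binding: str      # e.g. an aptamer or antisense sequence (input)
    arm_5prime: str          # first substrate-binding domain
    arm_3prime: str          # second substrate-binding domain
    linker: str = "TTTTTT"   # illustrative spacer
    inhibit_len: int = 6     # illustrative length of the autoinhibitory duplex

    def __post_init__(self) -> None:
        if not 2 <= len(self.linker) <= 100:
            raise ValueError("linker must be 2 to 100 nucleotides")
        if not 0 < self.inhibit_len <= len(self.arm_5prime):
            raise ValueError("inhibit_len must fit within the first arm")

    def autoinhibitory(self) -> str:
        # Pairs with the 5' end of the first arm, masking it until released.
        return reverse_complement(self.arm_5prime[: self.inhibit_len])

    def segments(self) -> List[Tuple[str, str]]:
        return [
            ("autoinhibitory", self.autoinhibitory()),
            ("target_binding", self.target_binding),
            ("linker", self.linker),
            ("substrate_arm_1", self.arm_5prime),
            ("catalytic_core", CATALYTIC_CORE),
            ("substrate_arm_2", self.arm_3prime),
        ]

    def assemble(self) -> str:
        return "".join(seq for _, seq in self.segments())


design = RecorderDesign(target_binding="GGAGG", arm_5prime="ACGTACGT", arm_3prime="AAAG")
for name, seq in design.segments():
    print(f"{name:16s} {seq}")
print(design.assemble())
```

Keeping the elements as named segments rather than one string makes it easy to swap a target-binding domain while holding the enzyme fixed. This mirrors the modular design the text describes.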

[057] Accordingly, in one embodiment the disclosure provides for screening a plurality of candidate agents. These agents may be found in any of a variety of compositions including, but not limited to, within a cell, within an organism, in a mixture of candidate agents from a chemical library, or in a biological fluid. In some embodiments, the disclosure provides a method of detecting a candidate agent's ability to modulate or influence a biological system and cause a detectable change in a biological effect. Biological effects that can be measured using the nucleotide enzyme recorder include, but are not limited to, gene expression, polynucleotide degradation, nucleotide methylation, phosphorylation state of a protein, protein degradation, protein glycosylation, levels of metabolites, pH, binding kinetics of metabolites or small molecules, levels of exogenous small molecules such as drugs, lipids, and post-translational modifications on proteins such as ubiquitination. In some embodiments, the nucleotide enzyme can be used to study the biological effect of a particular organelle or subcellular location. In some embodiments, these assays may be performed in a high throughput manner. In some embodiments, when performing cell-based screens, the cells comprising the nucleotide enzymes disclosed herein are cultured in the presence or absence of various candidate agents. Culturing methods are known in the art and include the use of selective markers and selection strategies. In one embodiment the cells are cultured in multiwell plates, such as 96-well, 384-well, 1536-well or higher-density microtiter plates.
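One way the recorder's output can register a biological effect is through tape length. The figure descriptions above note three related points: repeated target-induced ligations lengthen tapes, mean tape length can separate phenotypic states, and heterogeneity tracks the variance of tape length. The toy simulation below illustrates that readout under deliberately simplified, hypothetical assumptions. Each recorder attempts one ligation per time step and succeeds with a fixed probability that is higher when the target is present. Each ligation adds one substrate unit of fixed length. None of the parameter values come from the disclosure.

```python
# Toy model of tape growth: per-step ligation probability is higher when the
# target (signal) is present. All parameters are illustrative assumptions.
import random
import statistics
from typing import List, Optional


def simulate_tapes(p_ligate: float, steps: int, n_recorders: int,
                   unit_nt: int = 20,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Return tape lengths in nucleotides, one per recorder molecule.

    Each tape starts as a single substrate unit; every successful ligation
    appends one more unit.
    """
    if rng is None:
        rng = random.Random(0)
    tapes = []
    for _ in range(n_recorders):
        ligations = sum(rng.random() < p_ligate for _ in range(steps))
        tapes.append(unit_nt * (1 + ligations))
    return tapes


rng = random.Random(42)
treated = simulate_tapes(0.30, steps=50, n_recorders=2000, rng=rng)  # target present
control = simulate_tapes(0.02, steps=50, n_recorders=2000, rng=rng)  # basal leak only
mixed = (simulate_tapes(0.30, 50, 1000, rng=rng)
         + simulate_tapes(0.02, 50, 1000, rng=rng))                  # heterogeneous population

for label, tapes in (("treated", treated), ("control", control), ("mixed", mixed)):
    print(f"{label:8s} mean = {statistics.mean(tapes):7.1f} nt, "
          f"variance = {statistics.pvariance(tapes):9.1f}")
```

In this model the expected tape length is `unit_nt * (1 + steps * p_ligate)`, about 320 nt for the treated population versus about 40 nt for the control. The half-responding mixed population shows a much larger variance than either homogeneous one, which is the intuition behind using tape-length variance as a heterogeneity readout.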

[058] Nucleotide enzymes as described herein may be transiently or permanently transfected into the recipient cell using methods known in the art. Such methods include, but are not limited to, intracellular delivery of recorders, e.g., nucleotide enzymes, using standard approaches for delivering nucleic acids into cells, such as electroporation, nucleofection or lipofection. Delivery may also be performed using viral vectors commonly applied in delivery of DNA into cells. Delivery can also be targeted to specific cell types by attaching tapes and recorders to aptamers, to specific cell-penetrating nucleic acid sequences such as CpG oligodeoxynucleotides, or to other small molecule ligands that recognize cell surface receptors. Alternatively, cell-penetrating peptides can be attached to tapes to enhance their delivery. Delivery may also be attained by micromanipulation devices such as patch-clamp pipettes, using standard protocols in which the recorder components are added in solution.

[059] In addition to nucleotide enzymes to be used as molecular recorders, the disclosure herein also provides for genes encoding molecular recorders. In this embodiment, a molecular recorder gene comprises a DNA sequence encoding a ribozyme ligase and RNA-based molecular recorder tapes, which are short mRNA oligomers (<1000 bases, but preferably fewer than 20 bases). A simple molecular recorder gene comprises a promoter for transcribing non-coding RNAs (ncRNAs), such as an RNA polymerase III (RNA pol III) promoter. The gene can have two RNA pol III promoters: one for transcribing the ribozyme ligase and another for transcribing the tapes. The recorder genes can be encoded on plasmids or engineered into host genomes.

[060] Activity of molecular recorder genes can be controlled by modifying their RNA pol III promoters through the addition of promoter elements, such as transcription factor binding sites, that occur naturally in RNA pol III promoters and are responsive to specific stimuli such as nutrient deprivation or cell stress.

[061] In the presence of a target signal (e.g., high salt concentrations), the RNA pol III promoters driving the expression of the recorder genes become activated, leading to the production of a ribozyme ligase and tapes. The expressed ribozyme ligase then ligates the tapes, which can be detected by RNA-seq or by any of the methods below for reading the resulting sequences.

[062] In addition to the nucleotide enzymes, substrates for the nucleotide enzyme(s) are also transfected into the cell. Because the nucleotide enzymes are ligases, the substrates are polynucleotides. As described above, each nucleotide enzyme has two substrate binding domains that hybridize with complementary polynucleotides. As such, for each nucleotide enzyme there are two substrate polynucleotides, each complementary to one of the substrate binding domains of the polynucleotide enzyme.

[063] When a plurality of nucleotide enzymes are transfected into a cell, so too are the corresponding polynucleotide substrates. Thus, in one embodiment, when n polynucleotide enzymes are used, at least 2n corresponding polynucleotide substrates are also included: at least 4 substrates when 2 polynucleotide enzymes are used, at least 6 substrates when 3 polynucleotide enzymes are used, and so on. As such, the present disclosure provides for at least 2, at least 5, at least 10, at least 20, at least 30, at least 40 or more different polynucleotides being used in the screening as disclosed herein.

[064] In an alternative embodiment, substrate redundancy is used. In this embodiment, one polynucleotide substrate may be shared by a plurality of polynucleotide enzymes, so long as each enzyme ligates it to a different partner substrate, so that every ligation reaction yields a unique combination of substrates. In this fashion, each of the products of the ligation reaction has a unique sequence.
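The substrate bookkeeping in paragraphs [063] and [064] can be checked with a short Python sketch. It is a minimal illustration, assuming each recorder is described only by the ordered pair of substrates it joins; the recorder names (E1, E2, E3) and substrate labels (A to F) are hypothetical placeholders, not designs from this disclosure. It counts the distinct substrate species a design needs and confirms that every recorder still yields a unique ligation product when substrates are shared.

```python
from itertools import chain


def substrate_count(designs):
    """Number of distinct substrate species a set of recorders requires."""
    return len(set(chain.from_iterable(designs.values())))


def products_are_unique(designs):
    """True if no two recorders ligate the same ordered (left, right) substrate pair.

    Pairs are treated as ordered because ligation joins the 3' end of one
    substrate to the 5' end of the other.
    """
    pairs = list(designs.values())
    return len(pairs) == len(set(pairs))


# Hypothetical designs: recorder name -> (left substrate, right substrate).
dedicated = {"E1": ("A", "B"), "E2": ("C", "D"), "E3": ("E", "F")}  # 2n substrates
shared = {"E1": ("A", "B"), "E2": ("A", "C"), "E3": ("D", "B")}     # substrates reused

print(substrate_count(dedicated), products_are_unique(dedicated))  # 6 True
print(substrate_count(shared), products_are_unique(shared))        # 4 True
```

With redundancy, three recorders need four substrate species instead of six, and the uniqueness check is the condition paragraph [064] places on such a design.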

[065] In one embodiment, the disclosure therefore provides a method of screening and detecting distinct candidate agents. In some embodiments, screening can be performed for the presence of an agent or for the effect(s) of an agent on biological systems. In this embodiment, candidate agents are applied to a cell and biological effects are monitored by molecular recorders either within or outside of the cell. When inside the cell, the molecular recorders may record intracellular events. When outside the cell, the molecular recorders may record changes in the extracellular milieu. If the candidate agent induces a biological response resulting in the production of a target of a particular target-binding moiety, binding of the binding pair (the target with the target-binding moiety) relieves the inhibition imposed by the autoinhibitory domain of the polynucleotide enzyme, resulting in ligation of two polynucleotides and producing a uniquely identifiable polynucleotide sequence.

[066] Candidate agents can be any of a variety of agents including, but not limited to, small molecules, proteins, bacteria, viruses, fungi and the like. In one embodiment, candidate agents can be detected using the molecular recorders described herein in in vitro assays by detecting target-induced ligation of substrates as described herein. Alternatively, the candidate agents may be detected within an organism, such as in the bloodstream. In this embodiment, molecular recorders and corresponding substrates are injected into, or otherwise enter, the bloodstream of an organism. Detection of a ligated product indicates the presence of the cognate target in the bloodstream of the organism. Detection of a plurality of targets is possible using molecular recorders with specific target binding moieties and a substrate binding domain that correlates with the target binding domain. In this manner, reading the sequence of substrates that are ligated together yields information on the identity of the target that induced the ligation.
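Reading out which target induced a ligation, as described in paragraph [066], amounts to a lookup from ligation junctions to targets. The Python sketch below is a minimal illustration under stated assumptions: each target-specific recorder joins a unique left/right substrate pair, and the product reads 5'-left-right-3' on the sequenced strand. The target_X pair reuses the Example 1 substrates S1 and S2; the target names and the target_Y sequences are hypothetical placeholders, not validated designs.

```python
# Hypothetical lookup: target -> (left substrate, right substrate).
# target_X reuses S1/S2 from Example 1; target_Y sequences are placeholders.
RECORDERS = {
    "target_X": ("AAGCATCTCAAGC", "GGAACACTATCCG"),
    "target_Y": ("TTCGTAGCAGTCA", "GCATGCAAGTTCA"),
}


def identify_targets(read, recorders=RECORDERS):
    """Return the targets whose ligation junction (left + right substrate) occurs in read.

    Only a ligated product contains the contiguous junction, so an unligated
    substrate on its own produces no hit. Reads from the opposite strand are
    not handled here (an assumption of this sketch).
    """
    return [target for target, (left, right) in recorders.items() if left + right in read]


print(identify_targets("TTT" + "AAGCATCTCAAGC" + "GGAACACTATCCG" + "TTT"))  # ['target_X']
print(identify_targets("AAGCATCTCAAGC"))                                   # []
```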

[067] In one embodiment, individual candidate agents are assayed one at a time. If the first candidate agent binds, or induces a biological response in which the target binds the target-binding moiety, ligation of the corresponding substrates will occur, resulting in a product that can be associated with the presence of the first target or a biological effect of the first candidate agent. Notably, following addition of the first candidate agent, a second candidate agent may be investigated. If the second candidate agent binds to the target binding domain, or induces a biological response in which a target is induced, produced or becomes available for binding to the target-binding moiety, ligation of the corresponding substrates will occur, resulting in a product that can be associated with the second candidate agent or a biological effect of the second candidate agent. In one embodiment, the product of a first ligation reaction may also contain sequences corresponding to a substrate of a second polynucleotide enzyme molecule. In this situation, the second polynucleotide enzyme may ligate its two substrates together, forming a single product containing the products of the first and second ligation reactions. In this manner, a plurality of polynucleotide enzymes may ligate substrates together to form discrete products or a single contiguous product. See Figures 5 and 6.

[068] Once the screen is complete, the products may be isolated from a cell by methods known in the art and sequenced to identify the nature of the ligated products. In some embodiments, this is referred to herein as decoding the ligated product or products, e.g., the molecular tape. Several methods can be employed for reading or decoding the products. These include, but are not limited to, any of the commercial nucleic acid sequencing technologies, including Sanger sequencing, 454 pyrosequencing, Illumina, Ion Torrent, single-molecule real-time sequencing, and nanopore sequencing methods in development.
Data can also be obtained by estimating the length of tapes without single-base resolution of their sequences. For example, capillary electrophoresis instruments, widely applied for the determination of length polymorphisms of microsatellite markers, can be leveraged to estimate the fragment length distributions of the tapes. Real-time quantitative PCR (qPCR) using primers that target ligation junctions of tapes, e.g., enzymatic products, can also be applied to estimate the number of recording events, e.g., ligation reactions, collected by the enzyme. In addition, tapes containing fluorescent tags can be read by fluorescence imaging. Reading may also be performed using FISH probes complementary to tapes of varying lengths. Recorder data may also be read in nanochannel arrays, in which nucleic acid sequences labeled with a fluorescent dye are stretched in a nanochannel and imaged at high resolution using a CCD camera. Reading may also be performed with microarrays containing nucleic acid probes complementary to ligated tapes. Quantitative detection of ligated tapes can then be performed by measuring fluorescence intensity using high-resolution scanners, as in standard microarray applications.
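The length-only readout in paragraph [068] can be turned into an estimate of recording events with simple arithmetic. The Python sketch below is a minimal illustration, assuming that every tape unit has the same length (13 nt here, the length of both substrates in Example 1), that any fixed flanking sequence is known, and that a tape of k units reflects k - 1 ligation events. The function names and parameters are hypothetical.

```python
def estimate_units(fragment_length, unit_length, flank_length=0):
    """Estimate the number of tape units in a fragment sized by capillary electrophoresis.

    Rounds to the nearest whole unit to absorb small sizing errors.
    """
    return max(round((fragment_length - flank_length) / unit_length), 0)


def ligation_events(fragment_lengths, unit_length, flank_length=0):
    """Total ligation events implied by a list of fragment sizes (k units -> k - 1 junctions)."""
    return sum(
        max(estimate_units(f, unit_length, flank_length) - 1, 0)
        for f in fragment_lengths
    )


# Hypothetical fragment sizes (nt) with 13-nt units: 2, 3, 1 and 4 units.
print(ligation_events([26, 39, 13, 52], unit_length=13))  # 6
```

Rounding to whole units is a design choice that tolerates a few nucleotides of sizing error; it would misassign fragments if unit lengths varied within a library.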

[069] As described above, substrates can be RNA, DNA and the like. To facilitate ligation of two target nucleotides, the substrates may be activated. Specifically, the 3' phosphate of at least one of the substrates should be activated for the ligation reaction to proceed. In one embodiment the substrates may be activated by phosphocyanate (cyanogen bromide), azide-alkyne (triazole linkage), phosphorimidazolide, including 3'-phosphorimidazolide, 3'-phosphorothioate, 5'-iodide, 5'-bromoacetyl, 5'-tosyl, phosphoramidate, boranophosphate linkages, peptide nucleic acid linkages (PNAs), locked nucleic acid linkages (LNAs), xeno nucleic acid linkages (XNAs), and morpholino linkages. Reaction classes that are applicable in activating nucleic acid templates or that can be catalyzed by the nucleotide ligase include, but are not limited to, reductive amination, amine acylation, acyl transfer, SN2 reaction, conjugate addition, Henry reaction, nitro-Michael reaction, Wittig reaction, 1,3-nitrone cycloaddition, Huisgen cycloaddition, oxazolidine formation, Heck coupling, cross coupling, Me-salen formation, and aldol reaction (e.g., using DNA-linked glyceraldehyde and glycolaldehyde).

[070] As such, the present disclosure provides a molecular recorder system. A molecular recorder system comprises a nucleotide enzyme as described herein and substrates that are designed to hybridize to the substrate binding domains of the nucleotide enzyme as well as provide an indication of the presence of and in some embodiments the identity of a target molecule. As such, when detecting a plurality of target molecules, a corresponding number of molecular recorder systems may be employed.

[071] As described herein, the nucleotide enzymes generally have ligase activity and therefore employ a ligation reaction to provide a read-out. The ligation reaction may include at least a first and a second nucleotide ligase, metal co-factors if necessary, and substrates. Once the components are combined, the reaction continues for at least 1, 2, 3, 4, 5, 10, 20, 30, 40, 60, 120, 240, 360, 480 minutes or more. In one embodiment, the reaction occurs at room temperature. In other embodiments, the reaction occurs at 37°C. The two target polynucleotides are ligated to assemble a functional ligation product, such as a chimeric polynucleotide encoding a chimeric protein or RNA molecule. In addition, the two target polynucleotides may assemble nucleotide regulatory regions, such as promoters/enhancers, with coding sequences to be expressed or placed under the control of the promoter/enhancer.

[072] A benefit of using nucleotide ligases as described herein is the ability to perform simultaneous, multiplex sequence specific ligation in a single reaction vessel. First and second different nucleotide ligases are included in a single reaction vessel and perform at least two sequence specific ligations yielding two distinct products. Further, the reaction may include additional nucleotide ligases to produce a plurality of ligation products in a single reaction. In this embodiment, at least 5, 10, 15, 25, 50, 75, 100, 150, or more different nucleotide ligases are included in a single reaction to produce the plurality of ligation products. That is, up to 1000, 2000 or more ligation reactions can occur in a single reaction vessel. Moreover, following a first ligation reaction, substrates can be activated resulting in a second ligation reaction using the first set of nucleotide ligases. The cycle may continue until the appropriate number of ligations has occurred.

[073] The stability and flexibility of nucleotide ligases provide for numerous uses for them and for molecular recorders comprising them. For instance, the molecular recorder systems can be used in a variety of assays to determine whether target molecules are present in a sample, such as a biological sample. By "biological sample" is meant any bodily fluids (including, but not limited to, blood, urine, serum, lymph, saliva, anal and vaginal secretions, perspiration and semen, of virtually any organism, with mammalian samples being preferred and human samples being particularly preferred); biopsy/tissue material; environmental samples (including, but not limited to, air, agricultural, water and soil samples); biological warfare agent samples; research samples; purified samples, such as purified genomic DNA, RNA, proteins, etc.; and raw samples (bacteria, virus, genomic DNA, etc.). As will be appreciated by those in the art, virtually any experimental manipulation may have been done on the sample.

[074] In this embodiment, nucleotide enzymes are designed to ligate two nucleotide substrate sequences in the presence of a target molecule anticipated to be in the sample. Following the reaction, the longer, ligated product is detected by methods known in the art. Production of the ligation product is evidence of the presence of the target molecule.

[075] In one embodiment the disclosure provides systems and methods for detecting target nucleotides, such as RNA, DNA, RNA/DNA hybrids and the like and transducing the sequence information of the target onto customizable nucleic acid molecules. In this embodiment nucleotide ligases as described herein contain a recognition domain for the targeted molecules and a highly programmable substrate recognition domain. The target molecule serves as a second substrate recognition domain, either on its own or in combination with a part of the ligase. The substrate recognition domain of the ligase can contain the information of the recognition sequence directly, or in encoded form (See Figure 13).

[076] In this embodiment the ligated DNA molecule contains the complementary sequence of part of the target molecule. In addition, the molecule can contain any combination of functional elements and additional information including but not limited to PCR primers, sequencing adapters, modifications/additions to allow extraction or sorting, such as biotin or fluorophores, a unique molecular barcode, sequences representing additional biological information, such as localization, time, cell type, etc. Libraries of various ligases and substrates can be used in parallel to measure different targets simultaneously.
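Paragraph [076] describes an output molecule that carries the complement of part of the target together with functional elements. The Python sketch below assembles such a molecule as a string. It is a minimal illustration, assuming a hypothetical element order (5' adapter, molecular barcode, complement of the target fragment, 3' adapter); the function names, element order and sequences are placeholders, not a layout specified in this disclosure. RNA targets are accepted by complementing U to A in the DNA output.

```python
# Complement table covering DNA (T) and RNA (U) input; the output is DNA.
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Complementary strand of seq, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_output(target_fragment, umi, adapter_5, adapter_3):
    """Concatenate hypothetical functional elements around the target complement."""
    return adapter_5 + umi + reverse_complement(target_fragment) + adapter_3


print(build_output("AACG", umi="ACGT", adapter_5="GGG", adapter_3="CCC"))  # GGGACGTCGTTCCC
```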

[077] In one embodiment, ligation results in a DNA molecule containing sequencing adapters and, optionally, a unique molecular barcode. The ligated fragments can then be sequenced directly after extraction, with or without further amplification by PCR. This results in a one-step library preparation, for instance for transcriptome profiling by high-throughput sequencing, and can be performed in vitro or in vivo. In one embodiment, the composition of the ligase/substrate libraries determines the fraction of the transcriptome that is interrogated. For instance, a targeted library allows deep sequencing of specific regions of the transcriptome, such as a gene of interest or splice junctions; a whole-genome library, i.e., a random library or one that reflects the transcriptome composition, allows measurement of the entire transcriptome; and a selected library can be designed to select a specific subset of the RNA, e.g., non-ribosomal, viral/host, or microRNAs.

[078] In one embodiment, the ligated fragments described above may contain an additional sequence that marks an individual cell. The cell-specific substrates can be delivered directly through microinjection, or can be made available through an external agent or stimulus, for example light (as used in Transcriptome In Vivo Analysis: US 2013/0267678 A1, incorporated herein by reference). This unique cell marker sequence allows many cells to be multiplexed in the same sequencing sample, e.g., multiplexed single-cell sequencing.
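Multiplexing cells in one sequencing sample, as in paragraph [078], requires assigning each read back to its cell marker. The Python sketch below is a minimal illustration that assumes a hypothetical read layout with the cell marker in the first 8 bases and requires an exact match to a known marker. Both the layout and the marker sequences are assumptions; a real design would also tolerate sequencing errors in the marker.

```python
from collections import defaultdict


def demultiplex(reads, cell_barcodes, barcode_length=8):
    """Group reads by a cell-marker sequence assumed to occupy the first barcode_length bases.

    Returns ({barcode: [remaining sequences]}, [reads with no known barcode]).
    """
    by_cell = defaultdict(list)
    unassigned = []
    for read in reads:
        barcode, rest = read[:barcode_length], read[barcode_length:]
        if barcode in cell_barcodes:
            by_cell[barcode].append(rest)
        else:
            unassigned.append(read)
    return dict(by_cell), unassigned


cells = {"AAAACCCC", "GGGGTTTT"}  # hypothetical cell markers
reads = ["AAAACCCCGATTACA", "GGGGTTTTTTT", "AAAACCCCCCC", "NNNNNNNNA"]
by_cell, unassigned = demultiplex(reads, cells)
print(by_cell)     # {'AAAACCCC': ['GATTACA', 'CCC'], 'GGGGTTTT': ['TTT']}
print(unassigned)  # ['NNNNNNNNA']
```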

[079] In one embodiment the system utilizes a previous ligation product. In this embodiment, a complete transduced molecule is obtained only if a previous ligation, triggered by a marker molecule (e.g., a gene marker that identifies cell type, disease state, etc.), has occurred. The resulting molecule contains information about the molecule that triggered it, thus carrying additional biological information (See Figure 14). This can be extended to involve complex conditions and encode logical operations, and can be combined with the sequencing approaches above.

[080] In one embodiment the system provides for single-molecule quantification of expression. In this embodiment, the ligase sequesters the target RNA molecule upon binding, making it unavailable for further transduction (for instance, the ligase library is attached to a solid support). This ensures that a single molecule gives rise to only a single ligated output molecule. After denaturing and sequencing of the output molecules, quantification can be done at the single-molecule level.
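Because each target molecule gives rise to a single output molecule in the [080] embodiment, counting distinct output molecules counts target molecules. If the outputs also carry the optional unique molecular barcode of paragraph [077], PCR duplicates can be collapsed by barcode before counting. The Python sketch below is a minimal illustration of that deduplication, assuming reads have already been parsed into hypothetical (target, barcode) pairs.

```python
from collections import defaultdict


def count_molecules(records):
    """Count distinct output molecules per target from (target, barcode) pairs.

    Reads that share a target and a barcode are treated as PCR copies of one molecule.
    """
    barcodes = defaultdict(set)
    for target, umi in records:
        barcodes[target].add(umi)
    return {target: len(umis) for target, umis in barcodes.items()}


records = [("geneA", "AAT"), ("geneA", "AAT"), ("geneA", "CCG"), ("geneB", "TTT")]
print(count_molecules(records))  # {'geneA': 2, 'geneB': 1}
```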

[081] In one embodiment the system provides for converting similar sequences into divergent results.

In this embodiment, the recognition domains of the library distinguish between very similar target sequences. Through the substrate recognition domains, they transduce this information into very divergent DNA molecules, which are robust to imprecise measurement, e.g., sequencing errors or nonspecific aptamers.
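The robustness described in paragraph [081] can be framed in coding terms. If the output sequences assigned to similar targets differ from each other at many positions (a large minimum Hamming distance d), a read carrying up to floor((d - 1) / 2) errors is still closer to its true output than to any other. The Python sketch below illustrates this with nearest-codeword decoding, a standard technique swapped in here for illustration only; the codebook names and sequences are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance(codes):
    """Smallest pairwise Hamming distance in a codebook."""
    return min(hamming(a, b) for a, b in combinations(codes, 2))


def decode(observed, codebook):
    """Assign an observed code to the nearest codeword; return None on a tie."""
    ranked = sorted((hamming(observed, code), name) for name, code in codebook.items())
    if len(ranked) > 1 and ranked[0][0] == ranked[1][0]:
        return None
    return ranked[0][1]


# Hypothetical outputs for two near-identical target variants plus a third target.
codebook = {"variant_1": "AAAAAAAA", "variant_2": "CCCCCCCC", "variant_3": "GGGGAAAA"}
print(min_distance(codebook.values()))  # 4, so any single error is correctable
print(decode("AAAAAAAT", codebook))     # variant_1
```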

[082] By "nucleic acid" or "oligonucleotide" or grammatical equivalents herein is meant at least two nucleotides covalently linked together. A nucleic acid of the present invention will generally contain phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are included that may have alternate backbones, comprising, for example, phosphoramide (Beaucage et al., Tetrahedron 49(10):1925 (1993) and references therein; Letsinger, J. Org. Chem. 35:3800 (1970); Sprinzl et al., Eur. J. Biochem. 81:579 (1977); Letsinger et al., Nucl. Acids Res. 14:3487 (1986); Sawai et al., Chem. Lett. 805 (1984); Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); and Pauwels et al., Chemica Scripta 26:141 (1986)), phosphorothioate (Mag et al., Nucleic Acids Res. 19:1437 (1991); and U.S. Pat. No. 5,644,048), phosphorodithioate (Brill et al., J. Am. Chem. Soc. 111:2321 (1989)), O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A Practical Approach, Oxford University Press), and peptide nucleic acid backbones and linkages (see Egholm, J. Am. Chem. Soc. 114:1895 (1992); Meier et al., Angew. Chem. Int. Ed. Engl. 31:1008 (1992); Nielsen, Nature 365:566 (1993); Carlsson et al., Nature 380:207 (1996), all of which are incorporated by reference). Other analog nucleic acids include those with positive backbones (Dempcy et al., Proc. Natl. Acad. Sci. USA 92:6097 (1995)); non-ionic backbones (U.S. Pat. Nos. 5,386,023, 5,637,684, 5,602,240, 5,216,141 and 4,469,863; von Kiedrowski et al., Angew. Chem. Intl. Ed. English 30:423 (1991); Letsinger et al., J. Am. Chem. Soc. 110:4470 (1988); Letsinger et al., Nucleoside & Nucleotide 13:1597 (1994); Chapters 2 and 3, ACS Symposium Series 580, "Carbohydrate Modifications in Antisense Research", Ed. Y. S. Sanghvi and P. Dan Cook; Mesmaeker et al., Bioorganic & Medicinal Chem. Lett. 4:395 (1994); Jeffs et al., J. Biomolecular NMR 34:17 (1994); Tetrahedron Lett. 37:743 (1996)) and non-ribose backbones, including those described in U.S. Pat. Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ACS Symposium Series 580, "Carbohydrate Modifications in Antisense Research", Ed. Y. S. Sanghvi and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also included within the definition of nucleic acids (see Jenkins et al., Chem. Soc. Rev. (1995) pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News Jun. 2, 1997, page 35. All of these references are hereby expressly incorporated by reference. These modifications of the ribose-phosphate backbone may be done to facilitate the addition of labels, or to increase the stability and half-life of such molecules in physiological environments.

[083] As will be appreciated by those in the art, all of these nucleic acid analogs may find use in the present invention. In addition, mixtures of naturally occurring nucleic acids and analogs, or mixtures of different nucleic acid analogs, can be made.

[084] Particularly preferred are peptide nucleic acids (PNA), which include peptide nucleic acid analogs. These backbones are substantially non-ionic under neutral conditions, in contrast to the highly charged phosphodiester backbone of naturally occurring nucleic acids. This results in two advantages. First, the PNA backbone exhibits improved hybridization kinetics. PNAs have larger changes in the melting temperature (Tm) for mismatched versus perfectly matched base pairs: DNA and RNA typically exhibit a 2-4°C drop in Tm for an internal mismatch, whereas with the non-ionic PNA backbone the drop is closer to 7-9°C. This allows for better detection of mismatches. Second, due to their non-ionic nature, hybridization of the bases attached to these backbones is relatively insensitive to salt concentration.

[085] The nucleic acids may be single stranded or double stranded, as specified, or contain portions of both double stranded and single stranded sequence. The nucleic acid may be DNA (both genomic and cDNA), RNA or a hybrid, where the nucleic acid contains any combination of deoxyribo- and ribo-nucleotides, and any combination of bases, including uracil, adenine, thymine, cytosine, guanine, inosine, xanthine, hypoxanthine, isocytosine, isoguanine, etc. A preferred embodiment utilizes isocytosine and isoguanine in nucleic acids designed to be complementary to other probes, rather than target sequences, as this reduces non-specific hybridization, as is generally described in U.S. Pat. No. 5,681,702. As used herein, the term "nucleoside" includes nucleotides as well as nucleoside and nucleotide analogs, and modified nucleosides such as amino-modified nucleosides. In addition, "nucleoside" includes non-naturally occurring analog structures. Thus, for example, the individual units of a peptide nucleic acid, each containing a base, are referred to herein as nucleosides. In embodiments where target nucleic acids are found in double stranded nucleic acids, the samples may be treated to disrupt the double strand so that the single stranded polynucleotides may serve as templates for nucleotide ligation.

[086] While the compositions and methods of this invention have been described in terms of preferred embodiments, it will be apparent to those of skill in the art that variations may be applied to the compositions and/or methods and in the steps or in the sequence of steps of the method described herein without departing from the concept, spirit and scope of the invention. More specifically, it will be apparent that certain agents which are both chemically and physiologically related may be substituted for the agents described herein while the same or similar results would be achieved. All such similar substitutes and modifications apparent to those skilled in the art are deemed to be within the spirit, scope and concept of the present invention.

EXAMPLES

[087] The following examples are included to demonstrate preferred embodiments of the invention. It should be appreciated by those of skill in the art that the techniques disclosed in the examples which follow represent techniques discovered by the inventor to function well in the practice of the invention, and thus can be considered to constitute preferred modes for its practice. However, those of skill in the art should, in light of the present disclosure, appreciate that many changes can be made in the specific embodiments which are disclosed and still obtain a like or similar result without departing from the spirit and scope of the invention.

EXAMPLE 1

NUCLEOTIDE LIGASE CATALYZED ASSEMBLY OF TARGET DNA

[088] E47 ligase:

[089] 5'-CGGATAGTGTTCTTTCGCTAGACCATGTGACGCATGGTGAGATGCTT-3' (SEQ ID NO:2)

[090] Substrate 1 (S1): 5'-AAGCATCTCAAGC-3' (SEQ ID NO:3)

[091] Substrate 2 (S2): 5'-GGAACACTATCCG-3' (SEQ ID NO:4)

IMIDAZOLE ACTIVATION OF S1

[092] Substrate 1 was first prephosphorylated at the 3' end (Integrated DNA Technologies). To add the imidazole group to the 3' phosphate group, 20 µL of 100 µM S1 was incubated with 2.5 µL of 1 M imidazole (pH 6.0) and 4.5 µL of 1 M EDC·HCl at room temperature for 1 hour. The mixture was then purified on a desalting column (PD-10, Amersham Biosciences) and the first fraction collected. FIG. 1 depicts the E47 nucleotide ligase hybridized to target DNA.
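The working concentrations in the activation step of paragraph [092] follow from C_final = V × C_stock / V_total. The Python sketch below reproduces that arithmetic. It assumes that the 27 µL total consists only of the three listed components; the function name and dictionary keys are illustrative.

```python
def final_concentrations(components):
    """components: {name: (volume_uL, stock_concentration)}.

    Returns (total volume in uL, {name: final concentration in the stock's units}).
    """
    total = sum(volume for volume, _ in components.values())
    return total, {name: volume * conc / total for name, (volume, conc) in components.items()}


total_uL, conc = final_concentrations({
    "S1 (uM)": (20.0, 100.0),          # 100 uM S1 stock
    "imidazole (mM)": (2.5, 1000.0),   # 1 M = 1000 mM
    "EDC.HCl (mM)": (4.5, 1000.0),     # 1 M = 1000 mM
})
print(total_uL)                                   # 27.0
print({k: round(v, 1) for k, v in conc.items()})  # S1 74.1 uM, imidazole 92.6 mM, EDC 166.7 mM
```

Under that assumption, S1 is activated at about 74 µM in roughly 93 mM imidazole and 167 mM EDC.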

E-47 CATALYZED LIGATION

[093] The activated S1 was mixed with the E-47 ligase (2 µM), Cu(NO3)2 (20 µM) and S2 (3 µM) in 25 mM HEPES (pH 7.0) containing 300 mM NaCl. The solution was incubated for 4 hours at room temperature.

ASSESSMENT OF LIGATION

[094] The success of the ligation reaction was assessed using FACS. Fig. 2 depicts results demonstrating increased ligation of two nucleotides. Fluorescence measurements of ligated DNA molecules were obtained after exposure of a DNA sequence tagged with a polystyrene bead (biotin labeled) and another sequence labeled with the fluorescent dye FAM. This method detected only ligated DNA molecules, because only these carry both the polystyrene bead and the FAM dye, which were detected by a fluorescence-activated cell sorter (FACS).

EXAMPLE 2

CELL BASED SCREEN FOR MRNA

[095] Transcriptional profiling of cells currently relies on technologies such as microarrays and RNA-seq. While these technologies can assay the levels of thousands of transcripts in parallel, they provide snapshots of the levels of each gene at specific time-points. The generation of time-series transcriptional data sets, for example throughout the developmental cycle of cells, requires samples to be taken at multiple time-points. As a result, transcriptional changes that occur over short periods of time are easily missed, depending on the time-points chosen. Furthermore, collection of biological samples of cells over short time intervals can interfere with the integrity of cell cultures and demands large culture volumes. In addition, because cell cultures are commonly highly heterogeneous, cells sampled at different time intervals do not necessarily share the transcriptional patterns of other cells within the same culture, thereby confounding extrapolation of time-series data sets. Therefore, strictly speaking, many existing transcriptional data sets collected in various phenotypic states (e.g., cancers) reflect transcriptional patterns only at specific time-points.

[096] The molecular recorder described herein is fundamentally distinct from these existing technologies because of its ability to track fluctuations in transcript levels over time without the need for sampling at multiple time-points. Because each recording unit of the recorder measures changes in only a single cell at a time, the data from each product, e.g. a molecular tape, of the recorder directly relates to events in a single cell.

Recording temporal dynamics in an mRNA signal across a cell population

[097] One application of the recorder for transcriptional profiling is to assay the temporal fluctuations in an mRNA transcript across a cell population without the need for collection at multiple time-points. For example, one may track the expression of a cancer-specific mRNA transcript in cancer cells vs. normal cells. The procedure described below is applicable to any situation in which one wishes to compare the expression of a signal in any two phenotypic states, including drug-exposed vs. unexposed cells, one cell type vs. another cell type, one developmental state vs. another, drug-resistant vs. sensitive cells, etc.

[098] For such an application, the recorder is constructed by first designing a switchable DNAzyme (such as presented in Figure 3) that becomes active in the presence of the mRNA signal of interest. The recorder, together with its DNA substrates, can then be transfected into replicate cultures of cancer cells and into control cultures composed of normal cells. The cell cultures can then be incubated for a desired period of time (from minutes to days). At the end of the desired period, the cell cultures are harvested by standard procedures and DNA extraction is performed.

[099] Following DNA extraction, targeted sequencing of the recorder products, e.g., tapes, from each culture is performed at high depth by first ligating adapter sequences 10 to 50 bases in length that have no sequence similarity to the tapes or to host genome sequences. Ligation of adapter sequences can be performed using standard methods for adapter ligation or sequencing library preparation for next-generation sequencing technologies. The ligation of adapters ensures that, during sequencing of the tapes, other sequences, e.g., genomic DNA and mRNA transcripts, are not sequenced. This increases the potential sequencing coverage and depth of tapes relative to other nucleic acid sequences, increasing the overall sensitivity of the recording system. Recorder tapes may also be enriched by sequence capture methods such as hybrid selection on microarrays or beads, multiplex PCR and targeted circularization. The sequencing libraries of the tapes can then be sequenced on a variety of sequencing platforms such as Illumina MiSeq, 454 or Ion Torrent.

[0100] The resulting sequencing reads can then be aligned and the length of each read consisting of ligated units of tapes can then be determined. From this data, the following parameters can be directly computed for each sample (in this example cancer vs. normal cell cultures): i) mean length of tapes, ii) variance of tape lengths, iii) frequency distributions of the tape lengths.
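The three per-sample parameters in paragraph [0100] can be computed directly once each aligned read has been converted to a tape length in ligated units. The Python sketch below uses only the standard library. The population variance is one reasonable choice (the sample variance, `statistics.variance`, would suit small samples), and the example lengths are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def tape_length_summary(lengths):
    """Return (mean length, population variance, {length: relative frequency}) for one sample."""
    counts = Counter(lengths)
    total = len(lengths)
    freqs = {length: counts[length] / total for length in sorted(counts)}
    return mean(lengths), pvariance(lengths), freqs


# Hypothetical tape lengths (in ligated units) from one culture.
m, v, f = tape_length_summary([1, 2, 2, 3])
print(m, v, f)  # 2 0.5 {1: 0.25, 2: 0.5, 3: 0.25}
```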

EXAMPLE 3

PREDICTION/ CLASSIFICATION OF PHENOTYPIC STATES SUCH AS CANCER OR OTHER DISEASES

[0101] The parameters above can be used to differentiate phenotypic states (e.g., cancer vs. normal cells). For example, a highly expressed gene in cancer cells will lead to the generation of longer tapes (high mean tape length), which differ significantly from those observed in normal cell cultures/controls. While cancer biomarkers that involve over- or under-expression of specific genes (such as P53) or aberrant mutations can be assayed by methods such as qPCR, RNA-seq or microarrays, the results of such methods can be confounded when tissue samples from a patient are assayed at an incorrect time-point, or when the cancer cells constitute only a small fraction of the assayed cells (e.g., in early-stage cancer). The recorder can also be used to detect mutated cancer genes or drug resistance genes by using DNAzymes that are activated only by the mutated versions of the genes.

[0102] A robust method for predicting phenotypic states from the recorder data may use the frequency distribution of tape lengths, for several samples of cultures from two or more phenotypic states, as a feature vector for machine learning classifiers. For instance, biological samples S_1 to S_n are each described by a feature vector of Z elements, where each element f_ij is the observed frequency of tapes of length j units in the i-th sample. Each sample in the training set has an associated phenotype (e.g., represented by 1 or 0 for a binary phenotype, but which can also be a continuous variable representing the level of a phenotype, e.g., growth rate, levels of a protein or metabolite, or toxicity).
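The feature vector of paragraph [0102] can be built as a samples-by-lengths matrix whose entry (i, j) is f_ij. The Python sketch below is a minimal illustration under stated assumptions: "frequency" is taken as relative frequency, every tape is at least 1 unit long, and tapes longer than Z units are folded into the last bin. The binning rule and example data are hypothetical.

```python
from collections import Counter


def feature_matrix(samples, max_length):
    """Row i holds f_i1 ... f_iZ (Z = max_length) for sample S_i.

    Each sample is a list of tape lengths in ligated units (assumed >= 1);
    lengths above max_length are folded into the last bin (an assumption).
    """
    rows = []
    for lengths in samples:
        counts = Counter(min(n, max_length) for n in lengths)
        total = len(lengths)
        rows.append([counts[j] / total if total else 0.0 for j in range(1, max_length + 1)])
    return rows


# Two hypothetical samples, Z = 3.
print(feature_matrix([[1, 1, 2, 5], [2, 2]], max_length=3))  # [[0.5, 0.25, 0.25], [0.0, 1.0, 0.0]]
```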

[0103] Here, the recorder feature vector contains the frequencies of tape lengths for a given number of biological samples. Machine learning classifiers (e.g. SVMs, Naïve Bayes) can be trained on such feature vectors, and the resulting models applied to classify phenotypic states or to predict levels of non-binary phenotypes.
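A minimal sketch of the feature vector $f_{ij}$ and a classifier trained on it. To keep the example dependency-free, a small Gaussian Naive Bayes is written out by hand; `TinyGaussianNB` is a simplified stand-in with a fixed variance floor (`var_smoothing`), not any specific library's implementation. All names, the choice $Z = 6$ and the training data are hypothetical.

```python
import math
from collections import Counter


def feature_vector(tape_lengths, Z):
    """f_ij for one sample i: relative frequency of tapes of length j = 1..Z."""
    counts = Counter(tape_lengths)
    total = len(tape_lengths)
    return [counts.get(j, 0) / total for j in range(1, Z + 1)]


class TinyGaussianNB:
    """Hand-written Gaussian Naive Bayes (illustrative, not a library API)."""

    def __init__(self, var_smoothing=1e-6):
        self.var_smoothing = var_smoothing  # floor added to every variance

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.params = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + self.var_smoothing
                         for col, m in zip(cols, means)]
            self.params[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_posterior(self, x, c):
        # log prior + sum of per-feature Gaussian log-likelihoods
        log_prior, means, variances = self.params[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_posterior(x, c)) for x in X]


Z = 6  # longest tape length considered (hypothetical)
train = [([5, 6, 6, 5], 1), ([6, 5, 6, 6], 1),   # phenotype 1, e.g. cancer
         ([1, 2, 2, 1], 0), ([2, 1, 1, 2], 0)]   # phenotype 0, e.g. normal
X = [feature_vector(lengths, Z) for lengths, _ in train]
y = [label for _, label in train]
model = TinyGaussianNB().fit(X, y)
print(model.predict([feature_vector([6, 6, 5, 5], Z), feature_vector([1, 1, 2, 2], Z)]))
```

In practice, a library classifier (e.g. an SVM) would be substituted for the hand-written model; the feature construction stays the same.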

[0104] Identification of distinct phenotypic states by unsupervised methods

[0105] The feature vector shown above can also be used to characterize relationships between biological samples or phenotypic states. For example, similarity between phenotypic states can be determined by computing correlations between the feature vectors. Biological samples can be placed into groups by clustering approaches with this feature vector as input (using K-means, self-organizing maps, expectation-maximization algorithm or hierarchical clustering approaches). Previously unknown phenotypic states can also be identified as biological samples that cluster distinctly from other samples.
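The two unsupervised steps described above, correlating feature vectors and clustering them, can be sketched as follows. This assumes each sample is already summarised as a tape-length frequency vector; the functions, the farthest-first initialisation and the data are illustrative choices, not taken from the source. K-means is used because it is the first clustering method listed.

```python
import math


def pearson(a, b):
    """Pearson correlation between two feature vectors (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def sq_dist(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialisation."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        # Next seed: the vector farthest from all seeds chosen so far.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:  # converged: assignments unchanged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical tape-length frequency vectors (Z = 4) for four samples.
samples = [[0.5, 0.5, 0.0, 0.0], [0.4, 0.6, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.6, 0.4]]
print([[round(pearson(a, b), 2) for b in samples] for a in samples])
print(kmeans(samples, k=2))
```

A sample that ends up far from every existing cluster centroid would be a candidate for a previously unknown phenotypic state.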

[0106] A single biological sample can also be segmented into distinct phenotypic states, referred to as pseudo-cells, based on the feature vector of the sample. Assuming that recorder tapes with the same characteristics (e.g. length) are more likely to derive from the same single cell, or from another cell of similar phenotype, tapes that are highly similar to each other reflect highly related phenotypes. Pseudo-cells can be identified by applying binning algorithms to the feature vector of a given sample.
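One simple binning approach, assumed here for illustration (the source does not specify the algorithm), sorts the tape lengths of a single sample and starts a new pseudo-cell whenever consecutive lengths differ by more than `max_gap` units. The function name and threshold are hypothetical.

```python
def pseudo_cells(tape_lengths, max_gap=1):
    """Group a sample's tapes into pseudo-cells by 1-D gap-based binning.

    Tapes are sorted by length; a new pseudo-cell starts whenever the next
    length exceeds the previous one by more than max_gap units.
    """
    groups = []
    for length in sorted(tape_lengths):
        if groups and length - groups[-1][-1] <= max_gap:
            groups[-1].append(length)
        else:
            groups.append([length])
    return groups


# Hypothetical tape lengths from one sample.
print(pseudo_cells([12, 1, 2, 3, 10, 11, 30], max_gap=1))
```

Unlike fixed-width bins, gap-based bins do not split a tight cluster of similar lengths at an arbitrary boundary.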

EXAMPLE 4

RECORDING CELL TO CELL VARIATION (HETEROGENEITY) IN EXPRESSION OF SPECIFIC GENES

[0107] Molecular processes in biological systems are inherently noisy. For example, even genetically identical cells cultured under the same conditions are rarely phenotypically identical. Single-cell heterogeneity can arise from intrinsic features of transcriptional regulatory systems or signaling pathways. Heterogeneity can also arise from genetic and/or epigenetic variation among cells in the same tissue, as happens in cancer cells due to the rapid accumulation of mutations. Heterogeneity has important implications both for understanding basic biological processes, such as stem cell differentiation or tissue development, and for developing therapeutic interventions against multi-drug resistant cancer cells and pathogens.

[0108] At steady state, heterogeneity (H) in gene expression, as measured by molecular recorders that detect an mRNA signal, is directly proportional to the variance in the length of the tapes encoding the mRNA signal:

$$H \propto \frac{1}{R}\sum_{i=1}^{R}\left(Z_i - \mu\right)^2$$

[0110] where $R$ is the total number of tape sequences (reads), $Z_i$ is the length of the $i$-th tape (read), and $\mu$ is the mean length of all tapes (reads). Heterogeneity can be expressed in relative units as the ratio between H in a given sample (e.g. cancer or drug-resistant cells) and H in a control (e.g. normal) sample.

[0111] Simultaneous detection of heterogeneity and temporal fluctuations of a signal

[0112] Many biological signals vary not only across cells but also across time. In fact, cellular heterogeneity commonly arises because cells within a tissue or culture sampled at the same time do not occupy the same position in 'biological time' due to developmental or cell-cycle differences. To detect the heterogeneity of a signal and its fluctuations over time (i.e. variation in both space and time) simultaneously, the use of a timestamp is recommended: a nucleic acid sequence that is incorporated into the molecular tape at a known rate. Two basic types of timestamp may be used: a relative timestamp and a biological event timestamp (see Figure 9).

[0113] For a relative timestamp, the timestamp is a set of "DNA codewords" of known concentration (or introduced into cells at known intervals), and its encoding rate into the tape is used as a reference. The fluctuation frequency/concentration of the signal being recorded is expressed in relative units (molecular recording units), i.e. normalized by the frequency of the timestamp. Because most biological studies compare conditions (e.g. drug-treated and untreated cells), relative time will be useful for many biological applications. For a biological event timestamp, biological events such as cell division or the activity of a housekeeping gene are continuously encoded by the recorder and used to set a fixed time reference. The time reference can also be set by another signal of interest being recorded, for instance to compare two signals. The frequency/concentration of the signal of interest is then expressed relative to the frequency of the known biological signal; for instance, the signal can be described as fluctuating at a rate of X per cell division.
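Normalizing a recorded signal by a reference can be sketched in a few lines, assuming each tape has been decoded into a string with one character per segment. Following the fingerprint convention used for Figure 10, '0' marks a timestamp segment and '1' a signal segment; the 'D' symbol for a cell-division segment, the function name and the example tapes are hypothetical.

```python
def recording_units(tape, signal="1", reference="0"):
    """Signal segments per reference segment in one decoded tape."""
    n_ref = tape.count(reference)
    if n_ref == 0:
        raise ValueError("tape contains no reference segments")
    return tape.count(signal) / n_ref


# Relative timestamp: '0' = timestamp segment, '1' = signal segment.
treated, untreated = "0110111", "0100100"      # hypothetical tapes
fold = recording_units(treated) / recording_units(untreated)
print(f"treated/untreated = {fold:.2f}")

# Biological event timestamp: 'D' marks a hypothetical cell-division segment.
print(recording_units("D1101D111D", reference="D"), "signal segments per division")
```

Because both conditions are normalized by the same timestamp, their ratio does not require knowing the absolute incorporation rate.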

[0114] Transforming relative fluctuations into fluctuations in 'clock-time'

[0115] Molecular recording units can be transformed into clock time when the incorporation rate of the timestamp is known. The incorporation rate of the timestamp can be determined by performing an experiment in which a known concentration of the timestamp is transfected into cells and its incorporation rate into molecular tapes followed over multiple time intervals, e.g. 10 mins, 20 mins, 30 mins, 40 mins, 50 mins, 60 mins, etc. The incorporation rate of the timestamp is not necessarily linear with time and hence increased accuracy of the timestamp can be obtained by measurements

detection threshold of the recorder such that the lowest concentration of the signal that can be detected reliably is equal to the concentration of timestamp (See Figure 10).
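A minimal sketch of turning such a calibration experiment into a clock, assuming the measurement at each time point is summarized as the mean number of timestamp segments per tape. Because incorporation need not be linear in time, the curve is interpolated piecewise-linearly between calibration points rather than fitted with a single rate. `make_clock` and the calibration numbers are hypothetical.

```python
from bisect import bisect_left


def make_clock(times_min, ts_counts):
    """Build a count -> minutes converter from a timestamp calibration curve.

    times_min: sampling times of the calibration experiment (minutes).
    ts_counts: mean timestamp segments per tape observed at each time;
               must increase strictly with time. Between calibration points
               the curve is interpolated linearly, so it may be non-linear overall.
    """
    pairs = sorted(zip(times_min, ts_counts))
    ts = [t for t, _ in pairs]
    cs = [c for _, c in pairs]
    if any(c1 <= c0 for c0, c1 in zip(cs, cs[1:])):
        raise ValueError("timestamp counts must increase strictly with time")

    def to_minutes(count):
        if not cs[0] <= count <= cs[-1]:
            raise ValueError("count outside the calibrated range")
        i = bisect_left(cs, count)
        if cs[i] == count:
            return float(ts[i])
        c0, c1, t0, t1 = cs[i - 1], cs[i], ts[i - 1], ts[i]
        return t0 + (t1 - t0) * (count - c0) / (c1 - c0)

    return to_minutes


# Hypothetical, saturating calibration measured every 10 minutes.
clock = make_clock([10, 20, 30, 40, 50, 60], [2, 5, 7, 8, 9, 10])
minutes = clock(6)              # a tape carrying 6 timestamp segments
print(minutes, 12 / minutes)    # elapsed minutes; rate if it also carries 12 signal segments
```

Counts outside the calibrated range raise an error rather than being extrapolated, since the shape of the curve beyond the measured interval is unknown.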

[0116] Figure 10 exemplifies the transformation of sequence data from a single tape into relative concentrations of the signal over time. Timestamps are indicated in red, while black segments represent pieces of tape that are added in the presence of the signal of interest. The relative time depends on the basal activity of the DNAzyme and is therefore proportional to the number of tape segments incorporated, or to the length of the tapes generated. The basal rate can be determined by running a control experiment (e.g. if investigating cancer cells, control experiments can be run using normal cells and the basal rate determined as the number of timestamp segments incorporated into a tape in the control experiment). In a typical experiment, sequence data may be obtained from thousands of tapes coming from many cells/biological states within the sample. The tapes can be placed into groups of biological states/pseudo-cells by applying data clustering approaches to the tape 'fingerprints': binary sequences in which each tape segment is represented by '0' if it is a timestamp and '1' if it encodes the signal being monitored. Tapes whose fingerprints are identical are highly likely to come from the same cell or from highly related biological states.
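Building fingerprints and grouping tapes with identical fingerprints can be sketched as follows, assuming an upstream decoding step has already labelled each tape segment as "timestamp" or "signal" (these labels, the read IDs and the function names are hypothetical). Exact-match grouping is the simplest clustering; a similarity-based method would also merge near-identical fingerprints.

```python
from collections import defaultdict

# Hypothetical segment labels produced by an upstream read-decoding step.
CODE = {"timestamp": "0", "signal": "1"}


def fingerprint(segments):
    """Binary fingerprint: '0' for a timestamp segment, '1' for a signal segment."""
    return "".join(CODE[s] for s in segments)  # KeyError on an unknown label


def group_by_fingerprint(tapes):
    """Map each fingerprint to the IDs of tapes sharing it (candidate pseudo-cells)."""
    groups = defaultdict(list)
    for tape_id, segments in tapes.items():
        groups[fingerprint(segments)].append(tape_id)
    return dict(groups)


tapes = {
    "read1": ["timestamp", "signal", "signal"],
    "read2": ["timestamp", "signal", "signal"],
    "read3": ["signal", "timestamp", "timestamp"],
}
print(group_by_fingerprint(tapes))
```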

[0117] Estimating temporal fluctuations in 'clock-time' using machine learning

[0118] To estimate temporal fluctuations in clock-time, training data can first be generated from a range of known concentrations of input signals (e.g. a specific mRNA) spiked into cells at known intervals of time. For example, a specific mRNA can be transfected into cells at 10-minute intervals in increasing or decreasing concentrations of 0.01 nM, 0.1 nM, 1 nM, 10 nM, 100 nM, 100 nM, etc. to mimic fluctuations of a signal over an hour. From each of these experiments, tape fingerprints encoding the timestamp and the signal of interest (see figure above) can be generated after sequencing all the tapes. The fingerprints from experiments in which cells are exposed to the same input concentration can then be combined to obtain a 'biological sample signature'. To construct this signature, the unique tape fingerprints are first determined, and the frequency at which each unique fingerprint is represented in the sequence reads is then obtained. The sample signature is a vector whose elements are the frequencies of each unique tape fingerprint. This signature can then be used as a feature vector to train machine learning classifiers or to cluster biological samples into groups.
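A minimal sketch of constructing sample signatures, assuming fingerprints have already been pooled per input concentration. The vocabulary of unique fingerprints is taken across all samples so that every signature has the same length and element order, which is what lets the vectors be compared, clustered, or fed to a classifier. Relative frequencies are used here as an assumption (raw counts would also fit the description); names and data are hypothetical.

```python
from collections import Counter


def sample_signatures(samples):
    """Build frequency vectors over a shared vocabulary of unique fingerprints.

    samples: {sample_name: [fingerprint, ...]} pooled per input concentration.
    Returns (vocabulary, {sample_name: [relative frequency per fingerprint]}).
    """
    counts = {name: Counter(fps) for name, fps in samples.items()}
    vocab = sorted(set().union(*counts.values()))   # unique fingerprints across samples
    signatures = {}
    for name, c in counts.items():
        total = sum(c.values())
        signatures[name] = [c[fp] / total for fp in vocab]  # 0.0 if absent
    return vocab, signatures


vocab, sigs = sample_signatures({
    "0.01 nM": ["0101", "0101", "0011"],   # hypothetical fingerprints
    "1 nM": ["0111", "0111", "0101"],
})
print(vocab)
print(sigs)
```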

[0119] Characterization of the transcriptomes of specific cells using whole transcriptome tagging

[0120] The molecular recorder can also be used to record/infer the whole transcriptome of cells that express a particular phenotype, or a particular genotype within a protein coding sequence, in a biological sample in which the cells of interest are underrepresented. Currently, there is no specific way to characterize the transcriptome of a minority group of cells expressing a particular genotype or phenotype without first isolating such cells, and isolating such cells can be very challenging and can interfere with their biology. Examples include capturing the transcriptional state of a highly transient developmental checkpoint during stem cell differentiation, or the transcriptome of a cell carrying a specific drug resistance or cancer mutation. An approach for this is outlined in Figure 11.

[0121] Figure 11 shows parallel tagging of the transcriptome of cells displaying a particular phenotype or genotype. The tagging process leverages a switchable catalytic nucleic acid ligase that becomes active in the presence of the signal of interest and attaches a tape of nucleic acid sequence to mRNA transcripts in the cell. The tagging process takes advantage of the fact that RNA polymerase II transcripts (mRNAs) carry a poly-A tail. Thus, nucleic acid ligases containing substrate recognition domains complementary to this poly-A tail will ligate specified tapes to the mRNAs in cells expressing the signal of interest. The transcriptome of the cells of interest can then be identified by targeted sequencing using primers complementary to the ligated tapes.

EXAMPLE 5

DETECTION OF MIRNA

[0122] We attempted to design a split switch system in which E47 was split into two separate molecules and a bridging oligo would join the two halves together, making a functional ligase from two separate parts (see Figure 15a). Figure 15b shows this initial attempt. Notably, the two separate halves of E47 retained ligase activity even in the absence of our bridging input switch. This presented an opportunity for designing a nucleotide ligase in which the second half of the ligase is the switching input. The inactive or 'off' switch would therefore be one half of E47 by itself, and the 'on' ligase would be formed when the second half completes the ligase and confers ligating activity (Figure 4c). We tested this idea and observed a ligation product only in the presence of all the components of the ligation. Having discovered that E47 can be split into two separate halves forming a split switch system, we then explored its sensor functions. We changed the sequence of the target (i.e. the second half of E47) to match that of the physiologically relevant microRNA miRNA122. The switchable ligase was capable of sensing and ligating specific ssDNA not only in response to ssDNA representing miRNA122, but also in response to the RNA of miRNA122.

Claims

1. A method for recording a detectable signal onto an oligomer comprising providing a molecular recorder system said system comprising at least a first, a second and a third molecular entity, wherein the first molecular entity exhibits a ligase activity whose activity to ligate the second and third molecular entities depends on the presence of said signal, said method comprising exposing said system to a signal whereupon in the presence of the signal, the first molecular entity joins the second and third molecular entities, thereby generating an oligomer indicative of the presence of said signal.

2. The method according to claim 1, wherein said second and third molecular entities comprise targets for the enzymatic entity where the second entity binds to one end of the enzymatic entity and the third entity binds to another end of the enzymatic entity.

3. The method according to claim 1, further comprising decoding said oligomers.

4. The method of claim 1 wherein the second and third molecular entities comprise oligonucleotide targets for the nucleotide enzyme where the second oligonucleotide is complementary to the 5' end of the nucleotide enzyme and the third oligonucleotide is complementary to the 3' end of the nucleotide enzyme.

5. The method according to claim 1, wherein the amount of oligomer produced correlates with the concentration of said signal.

6. The method according to claim 5, wherein said signal is selected from the group consisting of a small molecule, a polynucleotide, a metabolite, a protein, and a lipid.

7. The method according to claim 5, wherein said signal is a chemical property such as pH, ionic concentration or the state/ rate activity of a biological system such as respiration, cell division, apoptosis, necrosis, death or the electronic state of a solution or cell such as voltage, or chemical modifications existing on another molecular entity such as post-translational modifications on a protein.

8. The method of claim 5 wherein said nucleotide enzyme is activated by bringing together first and second domains.

9. The method of claim 8, wherein the signal binds to a region of the first and second domains bringing the domains of the nucleotide enzyme together for catalysis.

10. The method of claim 1 wherein the nucleotide enzyme further comprises an autoinhibitory domain and target binding domain and the activity of the nucleotide enzyme is controlled by a signal that binds to the target binding domain to cause activation of the enzyme.

11. The method of claim 10, wherein said target binding domain is an aptamer.

12. The method of claim 10 wherein the activity of the nucleotide enzyme is controlled by a polynucleotide signal that binds to a polynucleotide sequence contiguous with the nucleotide enzyme.

13. The method of claim 10 wherein the activity of the nucleotide enzyme is controlled by a polynucleotide sequence released in the presence of a signal wherein the signal displaces the polynucleotide from a self-hybridized state and releases it to activate the nucleotide enzyme.

14. The method of claim 3, wherein said decoding comprises sequencing said oligomers.

15. The method of claim 14, wherein said sequencing is by a method comprising Sanger sequencing, pyrosequencing by 454, Illumina, Ion Torrent, single-molecule real-time sequencing and nanopore sequencing methods.

16. The method of claim 3, wherein said decoding comprises estimating the length of the ligated oligomer product.

17. The method of claim 16, wherein said estimating comprises capillary electrophoresis, as widely applied for the determination of length polymorphisms of microsatellite markers, to estimate the fragment length distributions of the ligated oligomers, wherein the length of the oligomers is proportional to the signal concentration over time.

18. The method of claim 3, wherein said decoding is by a method comprising real-time quantitative PCR (qPCR), wherein the primers for said qPCR hybridize to junctions of the ligated oligomers wherein the amplicon lengths are proportional to the signal concentration over time.

19. The method of claim 3, wherein said decoding comprises fluorescence imaging of ligated oligomers that contain fluorescent tags.

20. The method of claim 3, wherein said decoding comprises using FISH probes complementary to molecular tapes of varying lengths.

21. The method of claim 3 wherein said decoding comprises the use of nanochannel arrays in which nucleic acid sequences labeled by a fluorescent dye are stretched in a nanochannel and imaged at high resolution using a CCD camera.

22. The method of claim 3, wherein said decoding comprises the use of microarrays containing nucleic acid probes complementary to ligated oligomers.

23. The method of claim 22, wherein quantitative detection of ligated oligomers comprises measuring fluorescent intensity.

24. A polynucleotide comprising: a. first and second substrate binding domains; b. at least one nucleotide enzyme domain; and c. at least one target binding domain.

25. The polynucleotide according to claim 24, further comprising at least one autoinhibitory domain.

26. A molecular recorder system comprising: a. a polynucleotide comprising: i. first and second substrate binding domains; ii. at least one nucleotide enzyme domain; iii. at least one autoinhibitory domain; and iv. at least one target binding domain; and b. first and second polynucleotide substrates, wherein said first and second polynucleotide substrates are complementary to said first and second substrate binding domains.

27. The molecular recorder system of claim 26 comprising at least first and second of said polynucleotides and first and second pairs of polynucleotide substrates, wherein said first pair of polynucleotide substrates is complementary to said first and second substrate binding domains of said first polynucleotide and said second pair of polynucleotide substrates is complementary to said first and second substrate binding domains of said second polynucleotide.

28. The molecular recorder according to claim 26 or claim 27, wherein said polynucleotides further comprise at least one autoinhibitory domain.

29. A multi-functional molecular recorder system comprising a first and second molecular recorder systems of claim 26 or claim 27, wherein the target binding domain of said first polynucleotide is correlated with a sequence of at least said first or second substrate binding domains of said first polynucleotide and the target binding domain of said second polynucleotide is correlated with a sequence of at least said first or second substrate binding domains of said second polynucleotide.

30. A method of detecting a plurality of target analytes comprising contacting the multifunctional molecular recorder system of claim 29 with at least first and second candidate agents, whereby upon binding of said first candidate agent to said target binding domain of said polynucleotide of said first molecular recorder system, first and second polynucleotide substrates bind to first and second substrate binding domains of said polynucleotide of said first molecular recorder system, whereby said first and second substrates are ligated to produce a first product indicative of the presence of said first target, whereby upon binding of said second candidate agent to said target binding domain of said polynucleotide of said second molecular recorder system, first and second polynucleotide substrates bind to first and second substrate binding domains of said polynucleotide of said second molecular recorder system, whereby said first and second substrates are ligated to produce a second product indicative of the presence of said second target.

31. The method according to claim 29, wherein at least one of said substrates comprises sequences complementary to at least one substrate binding domain of each of the polynucleotides in said first and second molecular recorder systems.

32. The method according to claim 31, wherein said first product can become a substrate for said polynucleotide of said second molecular recorder system and in the presence of said second target, said first product is ligated to a second substrate that correlates with said second target, thereby producing a product comprising sequences that correlate with said first and second targets.

33. The method according to claim 31, wherein said first target is at known levels and said second target is at variable levels.

34. The method according to claim 33, further comprising detecting said product.

35. The method according to claim 34, wherein said detecting comprises detecting the occurrence of said first and second products to determine temporal occurrence of said first and second product.

36. A cell comprising at least a first and second heterologous nucleotide enzyme, wherein each of said first and second nucleotide enzymes comprise:

a. a first binding moiety;

b. at least a first and second target binding sequence; and

c. a catalytic sequence;

wherein said first and second target binding sequences of said first and second nucleotide enzyme are substantially complementary to polynucleotide target sequences and wherein upon hybridization of polynucleotide target sequences with first and second target binding sequences of any of said nucleotide enzymes, said nucleotide enzyme is capable of catalyzing ligation of said first and second polynucleotide sequences.

37. The cell of claim 36, wherein each of said first and second heterologous nucleotide enzymes further comprises at least a first inhibitory component capable of preventing activity of said nucleotide enzyme in the absence of a target of said first binding moiety.

38. The cell of claim 37, further comprising a target of said binding moiety of either said first or second heterologous nucleotide enzyme or both.

39. The cell of claim 38, further comprising a product of an enzymatic reaction of said first or second nucleotide enzyme or both.

40. The cell of claim 36, comprising at least 100 of said heterologous nucleotides.

41. The cell of claim 36, wherein the target of the first binding moiety of said first or second heterologous nucleotides is different.

42. The cell of claim 36, wherein the binding moiety of at least one of said first or second heterologous nucleotide enzymes is an aptamer.

43. The cell of claim 36 wherein said first and second heterologous nucleotide enzymes are nucleotide ligases.

44. The cell of claim 43, wherein the products of the enzymatic reactions of said first and second heterologous nucleotides are: a. in one contiguous polynucleotide sequence; or b. in first and second polynucleotide sequences.

45. A method of screening for the presence of target molecules comprising culturing the cell of claim 36 and detecting the presence of ligated products of at least one of said heterologous nucleotide enzymes.

46. The method of claim 45, wherein said detecting comprises sequencing said products.

47. A method of screening for a biological effect of an agent comprising: a. culturing the cell of claim 36 with a candidate agent; b. detecting the presence or absence of the products of at least one of said heterologous nucleotide enzymes, whereby a change in the presence or absence of the product of said first or second heterologous nucleotide enzymes following culturing with said candidate agent indicates that the candidate agent has a biological effect.

48. The method of claim 47, wherein said biological effect is selected from the group consisting of gene expression, nucleotide degradation, nucleotide methylation, phosphorylation state of a protein, protein degradation, and protein glycosylation.

49. The method of claim 47, wherein said detecting comprises sequencing said first, second or both products.

50. The method of claim 47, wherein said agent is selected from the group consisting of small molecules, antibodies, bacteria, and viruses.

51. A method of recording temporal effects of a plurality of candidate agents on a biological response in a cell comprising: a. culturing the cell of claim 36 with a first candidate agent; b. then culturing the cell of claim 36 with a second candidate agent; c. detecting the presence or absence of the products of said first or second heterologous nucleotide enzymes, whereby a change in the presence or absence of the product of said first or second heterologous nucleotide enzymes following culturing with said first or second candidate agents indicates that the candidate agent has a biological effect.

52. The method of claim 51, wherein said detecting comprises sequencing said products.

53. The method of claim 51, wherein said first and second candidate agents are different and the products of the first and second heterologous nucleotides have different sequences.

54. A method of detecting a plurality of messenger RNA molecules comprising contacting the multifunctional molecular recorder system of claim 29 with at least first and second mRNA molecules, whereby upon binding of said first candidate agent to said target binding domain of said first polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said first polynucleotide, whereby said first and second substrates are ligated to produce a first product indicative of the presence of said first target, whereby upon binding of said second candidate agent to said target binding domain of said second polynucleotide, first and second polynucleotide substrates bind to first and second substrate binding domains of said second polynucleotide, whereby said first and second substrates are ligated to produce a second product indicative of the presence of said second target, wherein poly-A tails of said mRNA molecules hybridize with at least one of said substrate binding domains of each of said first or second polynucleotides.

55. A method of phenotypic classification of a cell comprising performing the method of claim 54, wherein detection of longer products is indicative of a more highly expressed molecule.

56. The method of claim 55, wherein detection of said more highly expressed molecule indicates that said cell is a cancerous cell.