WO2008091225A1 - Comparative detection of structure patterns in interaction sites of molecules - Google Patents
Comparative detection of structure patterns in interaction sites of molecules
- Publication number
- WO2008091225A1 (PCT/SG2008/000025)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- interaction site
- descriptor
- local
- target molecule
- global
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/20—Screening of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/60—In silico combinatorial chemistry
- G16C20/64—Screening of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/30—Drug targeting using structural data; Docking or binding prediction
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
Definitions
- the present invention relates to methods and systems for the systematic identification of structure patterns and representation of an interaction site of one or more molecules or macromolecules and analysis of interaction sites of associated pairs of molecules or macromolecules and of interactions between the associated pairs.
- a receptor protein has a binding site which selectively binds a particular ligand, and which initiates a cascade of reactions that induce a change of the state of the affected cell. This new state of the cell results in a biological response, such as enzyme activation or deactivation, protein synthesis, protein stabilization, or the release of hormones or transmitters, among others.
- ligands of receptors include naturally occurring and synthetic hormones, pheromones, neurotransmitters, peptides, drugs, and small molecules.
- a receptor may bind multiple ligands, or the same ligand may be recognized by multiple receptors.
- a cell may contain multiple copies of a particular receptor. The same type of receptor may be present in different cells. Some of these receptors belong to families with a large number of variants. Even if two proteins share similar structures, they may have different functions, as the binding site is highly sensitive and a small number of amino acid residue differences may alter the function of the protein.
- High-throughput flexible docking is an emerging technology for rational lead discovery that attempts to find the correct binding mode of a candidate ligand within a target receptor (binding site) over a three-dimensional search space (Abagyan and Totrov, 2001; Shoichet and Bussiere, 2000; Gane and Dean, 2000). Many docking algorithms have been developed, guided by the geometry and/or energy of the candidate receptor and ligand. FlexX (Rarey et al., 1999) is an incremental docking technique that places short fragments of target ligand within the receptor binding site and gradually constructs the entire ligand by extending and linking the fragments together in a series of steps.
- DOCK4.0 (Fradera et al., 2000) is another incremental docking technique in which, at certain steps of the calculation, the similarity of a target ligand to a reference ligand is used as a weighting factor to correct the docking energy score.
- DARWIN (Taylor and Burnett, 2000) was developed based on a genetic algorithm and CHARMM force field.
- Internal Coordinate Mechanics (Abagyan and Totrov, 2001; Fernandez-Recio et al., 2002) utilizes a biased Monte Carlo technique to sample different orientations of a ligand around a receptor (binding site) using a soft energy function precalculated on a grid.
- a method for representing an interaction site on a target molecule may comprise selecting a global region of the target molecule.
- the global region may encompass an interaction site on the target molecule.
- a plurality of local regions may be selected.
- At least one local descriptor for each of the local regions may be determined.
- the plurality of local regions may lie substantially within the global region.
- the plurality of local regions may be partially overlapping with one, two, three, four, five, six or more adjacent local regions.
- at least one global descriptor for the global region may be determined.
- a representation of the interaction site is formed by combining the local descriptors for each of the local regions.
- the representation of the interaction site may be formed by combining the at least one global descriptor for the global region and the at least one local descriptor for each of the local regions.
- the method may comprise in combination the steps of: selecting a global region of the target molecule, the global region encompassing an interaction site on the target molecule; selecting a plurality of local regions which lie substantially within the global region; determining at least one local descriptor for each of the local regions; and forming a representation of the interaction site from the local descriptors for each of the local regions.
- the method may further comprise determining at least one global descriptor for the global region.
- the representation of the interaction site may be formed by combining the global descriptors for the global region and the local descriptors for each of the local regions.
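- The steps described above (select a global region, select local regions within it, determine descriptors, combine them) can be sketched as follows. This is a minimal illustration and not the implementation from the application: the `Residue` class, the spherical regions, the per-residue charge values and the `G_C`/`Lx_C` key names are all assumptions made for the example, and overall charge is used as the only descriptor.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    name: str
    xyz: tuple      # (x, y, z) position, assumed to be in angstroms
    charge: float   # hypothetical per-residue net charge


def within(center, radius, residues):
    """Return the residues whose position lies inside the given sphere."""
    return [r for r in residues if math.dist(center, r.xyz) <= radius]


def total_charge(residues):
    """One example descriptor: overall charge of the enclosed residues."""
    return sum(r.charge for r in residues)


def represent_site(residues, global_center, global_radius,
                   local_centers, local_radius):
    """Combine one global descriptor with one local descriptor per region."""
    global_members = within(global_center, global_radius, residues)
    representation = {"G_C": total_charge(global_members)}
    for x, center in enumerate(local_centers, start=1):
        # Local regions only see residues already inside the global region.
        local_members = within(center, local_radius, global_members)
        representation[f"L{x}_C"] = total_charge(local_members)
    return representation


# Toy data: three residues near the site, one far away.
residues = [
    Residue("ASP", (0.0, 0.0, 0.0), -1.0),
    Residue("LYS", (3.0, 0.0, 0.0), 1.0),
    Residue("GLU", (0.0, 4.0, 0.0), -1.0),
    Residue("ARG", (20.0, 0.0, 0.0), 1.0),
]
site = represent_site(
    residues,
    global_center=(0.0, 0.0, 0.0), global_radius=10.0,
    local_centers=[(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)],
    local_radius=3.5,
)
print(site)
```

- In this toy example the first two local regions overlap (both enclose ASP and LYS), which mirrors the partially overlapping local regions described above; the distant ARG residue falls outside the global region and contributes to no descriptor.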
- the target molecule may be a macromolecule, for example a receptor, ligand, antibody, epitope, antigen, enzyme, or substrate, among others, or it may be a small molecule.
- the small molecule may bind to a macromolecule.
- the at least one descriptor may be selected from the group consisting of free binding energy, entropic free energy, electrostatic energy, charge, van der Waals energy, torsion energy, hydrogen bonding energy, hydrophobic energy, bond stretching energy, disulfide bonding energy, overall charge, charge distribution, contour, solvent accessible surface area, energetics, and torsion angles.
- the interaction site may comprise a plurality of elements (which may be functional elements of the interaction site or non-functional "spacer" elements), which may be residues, for example amino acid residues, nucleotide residues or monosaccharide residues for proteins, polynucleotides (e.g. DNA/RNA) or polysaccharides (e.g. carbohydrates) respectively.
- the residues which participate in an interaction between interaction sites of two or more molecules are referred to as functional residues.
- There may be a plurality of functional residues and a plurality of non-functional residues, and each of the plurality of functional residues may be encompassed by the global region and at least one local region.
- Each of the functional residues may be encompassed by a plurality of local regions.
- the plurality of local regions may be a plurality of adjacent overlapping local regions.
- the local regions may be selected such that they are evenly distributed throughout the global region, i.e. from the centre of the global region to its boundary; however, the local regions may be selected in an irregular manner provided that each of the functional elements of the interaction site is encompassed by at least one local region. Selection of a particular method of selecting the local regions may be based on its accuracy, soundness, availability and computational time for a particular interaction site.
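- One way to obtain evenly distributed, overlapping local regions is to place their centres on a regular lattice inside the global sphere and then verify that every functional element is covered. The sketch below assumes a cubic lattice and spherical regions; the function names `seed_local_centers` and `uncovered`, the lattice spacing and the radii are illustrative choices, not values taken from the application.

```python
import itertools
import math


def seed_local_centers(global_center, global_radius, spacing):
    """Cubic-lattice centres that lie inside the global sphere (assumed scheme)."""
    steps = int(global_radius // spacing)
    offsets = [k * spacing for k in range(-steps, steps + 1)]
    centers = []
    for dx, dy, dz in itertools.product(offsets, repeat=3):
        candidate = (global_center[0] + dx,
                     global_center[1] + dy,
                     global_center[2] + dz)
        if math.dist(global_center, candidate) <= global_radius:
            centers.append(candidate)
    return centers


def uncovered(points, centers, local_radius):
    """Return the points that are not inside any local region."""
    return [p for p in points
            if not any(math.dist(p, c) <= local_radius for c in centers)]


# Spacing equal to the local radius makes neighbouring spheres overlap.
centers = seed_local_centers((0.0, 0.0, 0.0), global_radius=6.0, spacing=3.0)
functional = [(1.0, 1.0, 1.0), (-4.0, 2.0, 0.5)]  # hypothetical functional elements
missing = uncovered(functional, centers, local_radius=3.0)
print(len(centers), missing)
```

- An irregular seeding scheme could reuse the same `uncovered` check: any placement is acceptable under the condition stated above as long as the returned list is empty.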
- the representation of the interaction site may be selected from the group consisting of: a list of the at least one descriptors for each of the local regions; a delimited list of the at least one descriptors for each of the local regions; a linear combination of the at least one descriptors for each of the local regions and a matrix representation of the at least one descriptors for each of the local regions.
- the linear combination of the at least one descriptors for each of the local regions may be a weighted linear combination of the at least one descriptors for each of the local regions wherein each of the descriptors is assigned a weight.
- the matrix representation of the at least one descriptors for each of the local regions may be a weighted matrix representation of the at least one descriptors for each of the local regions wherein each of the descriptors is assigned a weight.
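- The weighted linear combination and weighted matrix representations can be illustrated with a short sketch. It shows one possible reading, in which each descriptor type carries a single weight and each local region contributes one row; the descriptor names `C`, `SA` and `E`, the weights and the values are hypothetical.

```python
def weighted_linear_combination(local_descriptors, weights):
    """Collapse each local region to one number: sum of weight * descriptor."""
    return [sum(weights[name] * value for name, value in region.items())
            for region in local_descriptors]


def weighted_matrix(local_descriptors, weights, order):
    """One row per local region, one column per descriptor, each entry weighted."""
    return [[weights[name] * region[name] for name in order]
            for region in local_descriptors]


# Hypothetical descriptors for two local regions.
regions = [
    {"C": -1.0, "SA": 120.0, "E": -3.5},
    {"C": 0.0, "SA": 80.0, "E": -1.0},
]
weights = {"C": 2.0, "SA": 0.5, "E": 1.0}  # assumed weights

combined = weighted_linear_combination(regions, weights)
matrix = weighted_matrix(regions, weights, order=["C", "SA", "E"])
print(combined)
print(matrix)
```

- The matrix form keeps each descriptor separately comparable, whereas the linear combination trades that detail for a shorter representation.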
- a method of identifying or screening for common structural patterns in an interaction site of molecules may comprise representing an interaction site of at least one first target molecule and of at least one second target molecule using the method according to the first aspect.
- the representation of the interaction site of the first target molecule may be compared with the representation of the interaction site of the at least one second target molecule to identify or screen for common structural patterns in the interaction sites.
- the method may comprise the combination of representing an interaction site of at least one first target molecule and of at least one second target molecule using the method according to the first aspect; and comparing the representation of the interaction site of the at least one first target molecule with the representation of the interaction site of the at least one second target molecule to identify or screen for common structural patterns in the interaction sites.
- the first target molecule and the second target molecule form an association pair, where one or both of the molecules in the association pair may be macromolecules.
- the association pair may be selected from the group consisting of: a receptor-ligand pair; an antibody-epitope pair or antibody-antigen pair; an enzyme-substrate pair; protein-protein pairs; channel/transporter-solute pairs or cytoskeletal-protein pairs; or other donor-acceptor molecule pair.
- the method of representing the interaction site may be provided by the method of the first aspect.
- the target molecule is at least one molecule, and in certain embodiments a plurality of molecules.
- the at least one or the plurality of molecules may be macromolecules.
- the target molecule is selected from the group consisting of a polypeptide, a polynucleotide, a polysaccharide, a glycoprotein, and a lipoprotein.
- the target molecule is in association with an associated molecule which interacts with the target molecule via the association site.
- a method of predicting interactions between at least one first and at least one second target molecule may comprise representing at least one interaction site of each of the at least one first target molecule and of the at least one second target molecule using the method according to the first aspect.
- the method may further comprise comparing the representation of the at least one interaction site of the at least one first target molecule with the at least one representation of the interaction site of the at least one second target molecule to identify or screen for common structural patterns in the interaction sites.
- the method may comprise in combination: representing at least one interaction site of each of the at least one first target molecule and of the at least one second target molecule using the method according to the first aspect; and comparing the representation of the at least one interaction site of the at least one first target molecule with the at least one representation of the interaction site of the at least one second target molecule to identify or screen for common structural patterns in the interaction sites.
- the comparison of the interaction site representations may comprise comparison of the descriptors in each of the interaction site representations.
- the comparison of the descriptors may be performed by either matching descriptor values or fuzzy matching of descriptor values.
- the fuzzy matching of descriptor values may comprise matching of a descriptor value to a fuzzy value within a desired range of values.
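- Fuzzy matching of descriptor values within a desired range can be sketched as a tolerance comparison. The example below is an assumption-laden illustration: representations are dictionaries keyed by labels such as `L1_SA`, the tolerance is looked up by the descriptor suffix, and the fraction of matching descriptors is used as a simple similarity score. None of these choices is specified in the application.

```python
def fuzzy_match(a, b, tolerance):
    """Two values match if they differ by no more than the tolerance."""
    return abs(a - b) <= tolerance


def compare_representations(rep_a, rep_b, tolerances):
    """Compare shared descriptors; a missing tolerance means exact matching."""
    shared = sorted(rep_a.keys() & rep_b.keys())
    matches = {}
    for key in shared:
        suffix = key.rsplit("_", 1)[-1]          # e.g. "L1_SA" -> "SA"
        matches[key] = fuzzy_match(rep_a[key], rep_b[key],
                                   tolerances.get(suffix, 0.0))
    score = sum(matches.values()) / len(shared) if shared else 0.0
    return matches, score


# Hypothetical representations of two interaction sites.
rep_a = {"G_C": -1.0, "L1_C": 0.0, "L1_SA": 118.0}
rep_b = {"G_C": -1.0, "L1_C": 1.0, "L1_SA": 125.0}
matches, score = compare_representations(rep_a, rep_b, {"C": 0.5, "SA": 10.0})
print(matches, score)
```

- Setting every tolerance to zero reduces the comparison to plain matching of descriptor values, the other option named above.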
- the comparison of the descriptors may be performed by one or more machine-learning techniques selected from the group consisting of artificial neural network, support vector machine, hidden Markov model and genetic algorithms.
- the global region may encompass an interaction site in the at least one target molecule.
- the system may further comprise means for selecting a plurality of local regions which lie substantially within the global region.
- the system may further comprise means for determining at least one local descriptor for each of the local regions.
- the system may further comprise means for forming a representation of the interaction site from a combination of the at least one descriptor for each of the local regions.
- the system may further comprise means for comparing the representation of the interaction site of the target molecule with the representations of an interaction site of at least one test molecule to identify or screen for common structural patterns in the interaction sites of the at least one target molecule and the at least one test molecule.
- the system may further comprise means for determining at least one global descriptor for the global region.
- the means for forming the representation of the interaction site may comprise means for forming a representation of the interaction site from a combination of the at least one descriptor for the global region and for each of the local regions.
- the system of the fourth aspect may, in a particular embodiment, comprise in combination: means for selecting a global region in at least one target molecule, the global region encompassing an interaction site in the at least one target molecule; optionally means for determining at least one descriptor for each of the local regions; means for forming a representation of the interaction site from a combination of the at least one descriptor for each of the local regions; and means for comparing the representation of the interaction site of the target molecule with the representations of an interaction site of at least one test molecule to identify or screen for common structural patterns in the interaction sites of the at least one target molecule and the at least one test molecule.
- the system may further comprise means for selecting a plurality of local regions which lie substantially within the global region wherein the means representation of the interaction site is formed from a combination of
- a computer system comprising a computer processor and memory, the memory comprising software code stored therein for execution by the computer processor of a method for representing an interaction site on a target molecule, the method comprising: selecting a global region in at least one target molecule, the global region encompassing an interaction site in the at least one target molecule; selecting a plurality of local regions which lie substantially within the global region; determining at least one local descriptor for each of the local regions; and forming representation of the interaction site by combining the at least one local descriptor for each of the local regions.
- the computer system may further comprise means for determining at least one global descriptor for the global region wherein the representation of the interaction site is formed by combining the at least one global descriptor for the global region and the at least one local descriptor for each of the local regions.
- a computer system comprising a computer processor and memory, the memory comprising software code stored therein for execution by the computer processor of a method for identifying or screening for common structural patterns in an interaction site of molecules, the method comprising: representing an interaction site of at least one first target molecule and of at least one second target molecule using the method according to the method of the first or fifth aspects; and comparing the representation of the interaction site of the at least one first target molecule with the representation of the interaction site of the at least one second target molecule to identify or screen for common structural patterns in the interaction sites.
- a software product for representing an interaction site of a molecule may comprise, either in combination or separately in discrete functional software code units or modules: code for selecting a global region in at least one target molecule, the global region encompassing an interaction site in the at least one target molecule; code for selecting a plurality of local regions which lie substantially within the global region; code for determining at least one local descriptor for each of the local regions; and/or code for forming a representation of the interaction site by combining the at least one local descriptor for each of the local regions.
- the software product may further comprise code for determining at least one global descriptor for the global region.
- the code for forming a representation of the interaction site may comprise code for forming a representation of the interaction site by combining the plurality of global descriptors for the global region and the at least one local descriptor for each of the local regions.
- the software product comprises code in discrete functional software code units or modules
- the software product further comprises code for using one or more of the functional software code units or modules in combination to form a representation of an interaction site in a molecule.
- the software product for identifying/screening for common structural patterns in an interaction site of molecules.
- the software product may comprise, either in combination or separately in discrete functional software code units or modules: code for representing an interaction site of at least one first target molecule and of at least one second target molecule; and code for comparing the representation of the interaction site of the target molecule with the representations of an interaction site of at least one test molecule to identify or screen for common structural patterns in the interaction sites of the at least one target molecule and the at least one test molecule.
- the software product comprises code in discrete functional software code units or modules
- the software product further comprises code for using one or more of the functional software code units or modules in combination to either identify or screen for common structural patterns in an interaction site of molecules from a representation of an interaction site in a molecule.
- the code of the eighth aspect for representing an interaction site may comprise, either in combination or separately in discrete functional software code units or modules: code for selecting a global region in at least one target molecule, the global region encompassing an interaction site in the at least one target molecule; code for selecting a plurality of local regions which lie substantially within the global region; code for determining at least one local descriptor for each of the local regions; and code for forming a representation of the interaction site by combining the at least one descriptor for each of the local regions.
- the code may further comprise code for determining at least one global descriptor for the global region.
- the code for forming a representation of the interaction site may comprise code for forming a representation of the interaction site by combining the at least one descriptor for the global region and the at least one descriptor for each of the local regions.
- an information database product comprising information on interaction site representations for a plurality of molecules.
- the interaction site representations for each of the plurality of molecules in the information database product may each be formed by the method comprising: selecting a global region in a molecule, the global region encompassing an interaction site in the molecule; selecting a plurality of local regions which lie substantially within the global region; determining at least one local descriptor for each of the local regions; and forming a representation of the interaction site of the molecule by combining the at least one descriptor for each of the local regions.
- the method for deriving the interaction site representations may further comprise determining at least one global descriptor for the global region.
- the representation of the interaction site of the method may comprise forming a representation of the interaction site by combining the at least one descriptor for the global region and the at least one descriptor for each of the local regions.
- an information database product comprising information on a plurality of interaction site representations for each of a plurality of molecules, each of the plurality of interaction site representations for each molecule being formed by the method of the ninth aspect.
- Figure 1 is a diagrammatic illustration of the process of extracting a representation of descriptors for a protein interaction site using a single source protein (which for example may be a receptor or ligand).
- the contact amino acids are those that are involved in interaction, or that provide for structural integrity of the interaction.
- a 3D global region container, in this case a sphere, is used to enclose structural features and optionally define global descriptors, such as structural features, of the source protein interaction site.
- Figure 1A illustrates the extraction of descriptors from contact elements at the global level.
- Figure 1C illustrates the seeding of the global region with a plurality of (possibly overlapping) local regions, in this case spherical local regions, to capture local-3D substructure features similar to those described in 1B.
- Figure 1D illustrates the extraction of local descriptors from contact elements at the local level.
- Figure 1E illustrates the combination of descriptors from both global and local levels.
- Figure 1F illustrates the representation of descriptors of the source protein in a format suitable for use with a means which is capable of comparison with other representations of descriptors from other molecules.
- Prefixes used are: G - Global region descriptor, Lx - Local region descriptor for local region number x.
- Suffixes used are: C - Overall charge within enclosed environment, SA - Solvent accessible surface area, E - Binding energy within enclosed environment.
- Figure 2 provides a diagrammatic illustration of the process of extracting a representation of descriptors for a protein interaction site using a receptor-ligand association pair; the contact amino acids are those that are involved in interaction, or that provide for structural integrity of the interaction.
- Figure 2A illustrates a 3D global region container, in the form of a sphere in this example, which is used to select global structural features of both the receptor and the ligand binding site.
- the contact elements are amino acids that directly or indirectly affect a ligand-receptor interaction.
- Figure 2B illustrates the extraction of global descriptors from contact elements at the global-level.
- Figure 2C illustrates the seeding of the global region with multiple (possibly overlapping) local regions to capture micro-3D substructure features similar to those described in 2B.
- Figure 2D illustrates the extraction of local descriptors from contact elements at the local region level.
- Figure 2E illustrates the combination of descriptors from both global and local levels.
- Figure 2F illustrates the representation of descriptors in a format suitable for use with a determining means. Prefixes used are: G - Global region descriptor, Lx - Local region descriptor for local region number x; Suffixes used are: C - Overall charge within enclosed environment, SA - Solvent accessible surface area, E - Binding energy within enclosed environment.
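- The prefix and suffix convention described for the figures (G or Lx combined with C, SA or E) lends itself to a flat, labelled representation. The sketch below shows one hypothetical encoding: the underscore joining prefix and suffix, the `name=value` pairs and the comma delimiter are assumptions, since the exact layout of the figures is not reproduced in this text.

```python
def label_descriptors(global_desc, local_descs):
    """Flatten global and local descriptors into (label, value) pairs."""
    labelled = [(f"G_{suffix}", value) for suffix, value in global_desc.items()]
    for x, region in enumerate(local_descs, start=1):
        labelled.extend((f"L{x}_{suffix}", value)
                        for suffix, value in region.items())
    return labelled


def to_delimited(labelled, sep=","):
    """Serialise the labelled descriptors as a delimited list."""
    return sep.join(f"{label}={value}" for label, value in labelled)


# Hypothetical descriptor values for one global and one local region.
global_desc = {"C": -1.0, "SA": 350.0, "E": -7.5}
local_descs = [{"C": 0.0, "SA": 120.0, "E": -3.5}]

line = to_delimited(label_descriptors(global_desc, local_descs))
print(line)
```

- A delimited line of this kind is one of the representation forms listed earlier (a delimited list of descriptors) and can be read back by any downstream comparison step.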
- Figure 3 illustrates the hierarchical clustering of structural interaction characteristics for ligands binding to three different families of protein receptors, the clustering identified using the methods provided herein. Three well-defined clusters representing ligands binding to their respective receptors can be identified in this figure.
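- Hierarchical clustering of interaction site representations, as shown in Figure 3, can be illustrated with a small agglomerative routine. The text does not state which linkage or distance measure was used, so single linkage with Euclidean distance is an assumption here, and the six descriptor vectors are invented toy data arranged as three obvious groups.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > max(n_clusters, 1):
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy descriptor vectors for six ligands (hypothetical values).
vectors = [
    (0.0, 0.0), (0.1, 0.0),     # group 1
    (5.0, 5.0), (5.1, 5.0),     # group 2
    (10.0, 0.0), (10.0, 0.2),   # group 3
]
clusters = single_linkage(vectors, n_clusters=3)
print(clusters)
```

- With real data the vectors would be the descriptor representations described above, and a library routine (for example from SciPy) would normally replace this quadratic-time loop.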
- Figure 4 is a flow chart of an embodiment of the procedure for forming an interaction site representation in accordance with the aspects of the invention.
- Figure 5 is a flow chart of an embodiment of the procedure for identifying and/or screening for structural patterns between two or more molecules, where one or both of the molecules may be macromolecules, in accordance with the aspects of the invention.
- Figure 6 is a schematic block diagram of a general purpose computer upon which arrangements described can be practiced.
- Figures 7A and 7B show a table of sample data for the binding and non-binding sites from mAb E5.2 (PDB ID 1DVF) and HEL (PDB ID 1A2Y) of Example 2.
- Disclosed herein are methods for representing an interaction site on a target molecule. These methods may be used in methods of identifying or screening for common structural patterns in an interaction site of molecules, by comparing the representations of interaction sites in silico. Also disclosed herein are systems for practicing these methods.
- The methods and systems provided herein may be used for the systematic identification of structure patterns involved in interactions between an association pair, such as between receptors and ligands or between an antibody and an epitope or between an enzyme and a substrate.
- the methods and systems also allow the prediction of molecule function, the prediction of receptor-ligand/antibody-epitope/enzyme-substrate binding sites, and the construction or screening of virtual ligands, receptors, antibody epitope-binding regions, epitopes, enzyme active sites, or substrates.
- the methods and systems described herein may be used to perform screening using either ligand (i.e. binding target) interaction site alone, receptor interaction site alone or ligand interaction site in combination with receptor interaction site, if available.
- the term "molecule" as described herein is intended to encompass macromolecules and compounds which interact with macromolecules via an association site.
- the molecule may be a polypeptide, such as a protein, a glycoprotein, a lipoprotein or a proteoglycan.
- the molecule may be a polynucleotide, such as an RNA or a DNA polynucleotide.
- the molecule may be a polysaccharide.
- the molecule may be a molecule which is not a polypeptide, a polynucleotide or a polysaccharide, for example a cholesterol-based hormone.
- These and other molecules may be described by common features such as geometry (e.g. the position of subunits of the molecule, surface area, bonding angles) and energy (e.g. entropic free energy, van der Waals energy, electrostatic energy or others), which may be used as descriptors in the methods described herein.
- a "target molecule" will be a molecule for which at least one structure is described from which three-dimensional data may be obtained.
- the structure may be, for example, a Nuclear Magnetic Resonance structure, an X-ray crystallography structure, or a cryo-electron microscope structure, or a structure obtained using computational modelling techniques such as homology modelling and ab initio molecular modelling techniques or any combination of these.
- sequence data may be used from which structure information of a potential test or target molecule may be determined from homology modelling or ab initio molecular modelling.
- where the target molecule is a polypeptide, the amino acid sequence will be known.
- where the target molecule comprises polysaccharide units, the sequence and saccharide bond structure between monosaccharide residues will be known.
- where the target molecule is a polynucleotide, the nucleotide sequence of the polynucleotide will be known.
- the target molecule will have X-Y-Z positional data for each of the residues or atoms which it contains, although the methods described herein can utilise low resolution structures or models (also known as "fuzzy" data).
- An "interaction site" on a molecule refers to at least one site on the molecule which is known or suspected of interacting with another molecule.
- the interaction site is the site on the molecule which is represented by the methods described herein, to allow comparison with sites on other molecules for similarity or complementarity.
- An "interaction site" consists of those parts of the target molecule which may interact (for example via contact) with an interaction site of an associated molecule when the molecule interacts with the associated molecule, such as the other component of an "association pair", or those parts of the target molecule which are essential for the structural integrity of the contact parts.
- the shape and charge distribution of an interaction site may provide for the recognition and specificity of an interaction between an association pair.
- An "association pair" is a pair of molecules, each having an interaction site on their molecular structure, which interact with each other.
- association pairs may comprise any one of, among others, receptor-ligand pairs, antibody-epitope or antibody-antigen pairs, enzyme-substrate pairs, protein-protein pairs, channel/transporter-solute pairs or cytoskeletal-protein pairs, or other donor-acceptor molecule pairs which are known to interact.
- an association pair generally includes a pair of moieties and/or molecules that can be linked directly via a respective complementary interaction site on each of the pair of moieties and/or molecules.
- the linkage between the components of the association pair may be a non-covalent linkage formed by physical interaction or binding between the members of the association pair.
- the binding interaction between members of the association pair may be driven by any suitable physical interaction(s), including but not limited to electrostatic, charge-charge (including distributed charge interactions), or ionic interactions, van der Waals interactions, hydrogen-bonding interactions, hydrophobic-hydrophilic interactions, dipole-dipole interactions, sulphide-sulphide bonding or creation of di-sulphide bridges and/or the like, and may be limited or moderated by the physical structural shape of the component moieties and/or molecules of the association pair. These interactions generally do not require covalent interactions; however, they may, in some cases, be supplemented by such interactions, for example, using cross-linking reagents.
- the linkage between the components of the association pair may be a covalent linkage, that is, one or more covalent bonds may be formed between the members of the association pair by chemical reaction.
- the components of the association pair may form a chemically reactive pair, in which both members of such a pair may be chemically modified by formation of the covalent linkage.
- association pairs may comprise channel/transporter (eg. passive or active holes) - transported solute (eg. ions, amino acids) pairs; or cytoskeletal - protein (eg. muscle fibres) pairs.
- the antibody may comprise, for example, IgA, IgE, IgG, IgM or IgD immunoglobulins, and the antigen may comprise, for example, any foreign entity such as food antigens (eg. in food allergies), foreign antigens (eg. toxins from invasive bacteria, or any antigen derived from parasites, fungi, or viruses), or self antigens (eg. those involved in autoimmune diseases such as rheumatoid arthritis).
- the analysis of the interaction between a ligand-receptor association pair may comprise, for example, the screening of receptor-ligand drug interactions, for example for predicting viral gene delivery activity and/or efficacy for diseases such as cancer and neuro-degenerative diseases such as Parkinson's disease, Huntington's disease or Alzheimer's disease.
- the interaction site may comprise those amino acids on the target molecule which contact another molecule, or which are directly involved in positioning or stabilizing the amino acids which contact the other molecule.
- the interacting site may be comprised of a contiguous sequence of amino acids, or it may be a site comprised of two or more amino acid residues which are not contiguous.
- the amino acids which make up the interaction site may be identified by substituting amino acid residues or functional groups within the subject site and determining whether there is a loss or elimination of binding strength with its association counterpart.
- the borders of an interaction site may be identified by experimental data such as the identification of essential amino acid residues, as described above, or through the use of solution structures of experimentally solved receptor-ligand complexes.
- if the molecule is a receptor, the interaction site comprises or consists of the ligand binding site on the receptor. Conversely, if the molecule is a ligand, the interaction site comprises or consists of the receptor-binding portion of the ligand. If the molecule is an enzyme, the interaction site comprises or consists of the substrate-binding portion of the enzyme. If the molecule is an antibody, the interaction site may comprise or consist of one or more of the hypervariable regions of the antibody which interact with an epitope.
- the methods provided herein include the step of selecting a global region which encompasses an interaction site on the target molecule, and the selection of a plurality of local regions which lie substantially within the global region.
- the global region encompasses the entire interaction site within a three dimensional defined space.
- the shape and size of the global region is selected so as to minimise those parts of the molecule encompassed by the global region which are not involved in the interaction site, in order to limit the comparison to the region of interest of the target molecule; however, it is not critical to the methods described herein if the global region encompasses parts of the molecule not contributing to the interaction site.
- Simple three dimensional shapes such as spheres are computationally less taxing and are easier to implement as global regions in the methods described herein; however, the methods described herein may be implemented using global regions of any three dimensional shapes.
- the global region may be any one of a platonic solid, a sphere, a cube, a prism, a pyramid, a cone or a cylinder, or a combination of any two or more of these.
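As an illustration of selecting a spherical global region, the following minimal sketch places a sphere at the centroid of the interaction-site coordinates and sizes it to reach the farthest point. The function names, the `margin` parameter and the use of the centroid are assumptions made for this example; a centroid-centred sphere is generally not the smallest enclosing sphere, and the source does not prescribe any particular construction.

```python
import math

def enclosing_sphere(points, margin=0.0):
    """Return (centre, radius) of a centroid-centred sphere containing all points.

    `points` is a list of (x, y, z) tuples for the interaction-site residues or atoms.
    `margin` (hypothetical parameter) pads the radius by a fixed amount.
    """
    n = len(points)
    centre = tuple(sum(p[i] for p in points) / n for i in range(3))
    radius = max(math.dist(centre, p) for p in points) + margin
    return centre, radius

def in_sphere(point, centre, radius):
    """True when `point` lies inside or on the sphere."""
    return math.dist(centre, point) <= radius
```

Any residue of the molecule can then be tested with `in_sphere` to decide whether it falls within the global region.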
- the global region may itself be used as an additional descriptor.
- the selection of global region may be based on either wet-lab experimental data such as point mutations, or three-dimensional solution structures of experimentally solved receptor-ligand complexes.
- the global region is used to determine the location of the entire binding site of interest and the location of local regions within the global region.
- specific descriptors (at least one, but many more may be used) for the global region may be determined, which may be used to improve the predictive performance of the method in combination with the descriptors (at least one, but many more may be used) from each of the local regions. In the methods described herein a "plurality of local regions" which lie substantially within the global region are selected.
- the plurality of local regions may be selected after the global region is selected, or the plurality of local regions may be selected before the global region, and then a global region is selected to substantially encompass the local regions.
- the local regions may be defined using standard molecular three-dimensional visualization software such as, among others, Internal Coordinate Mechanics (Abagyan and Totrov 1994, Abagyan et al. 1994), or the RasMol (http://www.RasMol.org) or Jmol (http://jmol.sourceforge.net) molecular graphics visualization software programs.
- the number of local regions within the global region may be selected based on the structure of the interaction site (or sites where there is more than one interaction site within the global region) and the granularity and spacing between elements (which may be functional elements of the interaction site or non-functional "spacer" elements) of the molecule, particularly at the interaction site.
- the local regions are typically selected such that they are evenly distributed throughout the global region, i.e. from the centre of the global region to its boundary; however, the local regions may be selected in an irregular manner provided that at least each of the functional elements of the interaction site are encompassed by at least one local region. Selection of a particular method of selecting the local regions may be based on its accuracy, soundness, availability and computational time for a particular interaction site.
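One way to obtain local regions that are evenly distributed from the centre of a spherical global region to its boundary is to place their centres on a regular grid and keep the grid points that fall inside the sphere. This is only a sketch: the function name and the `spacing` parameter are assumptions, and the source leaves the selection method open.

```python
import math

def seed_local_centres(centre, radius, spacing):
    """Return grid points inside a spherical global region.

    Points are spaced `spacing` apart along each axis, starting from the
    centre of the global region. Each point can serve as the centre of one
    local region; choosing a local radius larger than spacing / 2 makes
    neighbouring local regions overlap.
    """
    steps = int(radius // spacing)
    centres = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                p = (centre[0] + i * spacing,
                     centre[1] + j * spacing,
                     centre[2] + k * spacing)
                if math.dist(centre, p) <= radius:
                    centres.append(p)
    return centres
```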
- Each of the elements or sub-units of the molecule or interaction site may be residues, for example amino acid residues, nucleotide residues or monosaccharide residues for proteins, polynucleotides (eg. DNA/RNA) or polysaccharides (eg. carbohydrates) respectively.
- the residues which participate in an interaction between interaction sites of two or more molecules are referred to as functional residues.
- the local regions are chosen such that each of the functional residues at least in the interaction site of interest are encompassed by a local region.
- Each of the local regions may be centred on a functional residue of the interaction site.
- each of the functional residues eg. each of the contact amino acids in a protein
- the local regions may in some arrangements be approximately the same size as particular residues or they may be larger or smaller.
- the local regions may be about 1 Å or greater in diameter for a spherical local region; however, the local regions may be any convenient shape and may be a different shape to that of the global region or other local regions.
- the local regions need not necessarily completely fill the space of the global region; however, the local regions need to encompass each of the functional residues which are active in the interaction site.
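A minimal sketch of residue-centred local regions follows, assuming each residue is reduced to a single representative coordinate (for example its C-alpha atom or side-chain centroid). The data layout, the function name and the residue identifiers are hypothetical. Centring one spherical region on every functional residue guarantees that each functional residue is encompassed by at least one local region.

```python
import math

def local_regions(residues, functional_ids, radius):
    """Map each functional residue id to the ids of all residues in its local region.

    `residues` maps a residue id to an (x, y, z) coordinate.
    `functional_ids` lists the residues on which a spherical region is centred.
    """
    regions = {}
    for fid in functional_ids:
        centre = residues[fid]
        regions[fid] = sorted(
            rid for rid, xyz in residues.items()
            if math.dist(centre, xyz) <= radius
        )
    return regions
```

With closely spaced functional residues, neighbouring regions share members, which corresponds to overlapping local regions.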
- the local regions may be selected to encompass non-functional residues of the interaction site, or they may be selected to encompass empty space within the global region.
- the selection of local regions for non-functional residue sites may be useful in determining whether that residue is a functional residue.
- the selection of local regions which encompass empty space may be used to identify differences in three-dimensional shape between the interaction sites of one or more molecules.
- the local regions may overlap within the global region and may overlap significantly with adjacent local regions.
- One or more of the functional elements/residues of the interaction site may be encompassed by one or more local regions, and may be encompassed by a significant number (ie. more than 5, more than 10, or more than 20) of local regions.
- the amount of overlap may be determined on the basis of the amount that adjacent or nearby residues are expected to influence or otherwise affect each other, for example the amount in which amino acids influence the conformations of each other.
- the local regions may extend partially beyond the boundary of the global region; however, the residues of interest within the local region will also be within the global region. Where a functional residue lies near the boundary of the global region, the local region encompassing that functional residue may extend beyond the boundary of the global region, and the local region may reside substantially outside of the global region.
- the size of the interaction sites for different macromolecules will vary, and the interaction site size varies even within a particular class of macromolecule (eg. proteins); thus the total number of local regions for representation of a particular interaction site is variable depending upon the specific macromolecule of interest.
- One or more of the properties of the local regions (eg. the size, shape, number, amount of overlap between adjacent regions etc.) may be optimised to improve the performance of the method/system.
- the optimisation may, for example, be an iterative process in which one or more of the properties of the local regions are changed (eg. increased or decreased) repeatedly, and the degree to which the interaction site representation agrees with the properties of a known equivalent or similar interaction site is tested after each iteration to improve the performance.
- Machine-learning algorithms can also be used to handle fuzziness in the comparison with known or similar interaction sites to benchmark the accuracy of the interaction site representation obtained from the local regions.
- the performance may be benchmarked against a test dataset.
- the iterative process may cease when there is no further increase in accuracy of the representation to known equivalent or similar interaction site or the test dataset.
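The iterative optimisation can be sketched as a simple loop that changes one property (here the local-region radius) step by step and stops when the benchmark score no longer increases. The `score` callable, the step size and the bounds are assumptions for illustration; in practice `score` would build the representation with the given radius and return its accuracy against a known interaction site or a test dataset.

```python
def optimise_radius(score, start=1.0, step=0.5, max_radius=20.0):
    """Increase the local-region radius until `score` stops improving.

    `score` is a caller-supplied function mapping a radius to an accuracy
    value (higher is better). Returns the best radius and its score.
    """
    best_r, best_s = start, score(start)
    r = start + step
    while r <= max_radius:
        s = score(r)
        if s <= best_s:
            # No further increase in accuracy: cease iterating.
            break
        best_r, best_s = r, s
        r += step
    return best_r, best_s
```

A loop of this kind only finds a local optimum along one property; several properties (shape, number, overlap) could be varied in the same manner.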
- the methods provided herein include the step of "determining at least one descriptor" for each of the local regions and optionally for the global region.
- a "descriptor" as described herein is intended to encompass a numerical value for a measurable or calculable parameter of the molecule which may be derived from knowledge of the structure of the molecule.
- a descriptor is a value which may be computed or measured from a three dimensional structure of a target molecule.
- Non-limiting examples of descriptors include the numbers of subunits, geometry descriptors, and energy descriptors.
- Descriptor parameters associated with the number of subunits may include the number of amino acid residues (for proteins), the number of monosaccharide units (for polysaccharides), the number of nucleotides (for polynucleotides) or the number of functional groups (for small molecules). Thus one descriptor for a region may be a simple count of the number of amino acid residues within the region.
- measurable parameters may include the surface area, the solvent accessible surface area, the torsion angles of amino acids within a specific region, or contours (including height, horizontal, vertical angles, width).
- measurable parameters include free binding energy, entropic free energy, electrostatic energy, charge, van der Waals energy, torsion energy, hydrogen bonding energy, hydrophobic energy, bond stretching energy, and disulfide bonding energy.
- the accessible surface area of a region of a molecule may be measured by tracing out the maximum permitted van der Waals' contact that is covered by the center of a water molecule as it rolls over the surface of the target molecule (Tong et al., 2007). Torsion angles may be computed from 3D coordinates (Hao et al., 2007).
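For illustration, a torsion (dihedral) angle can be computed from the coordinates of four consecutive atoms using the standard projection-and-`atan2` formula. This is a generic sketch and not necessarily the procedure of the cited reference; the function name is an assumption, and the sign of the result depends on the chosen convention.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p0, p1, p2, p3):
    """Return the dihedral angle in degrees defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond vector.
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bond vectors onto the plane perpendicular to b1.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```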
- the at least one descriptor used in the methods described herein will include one or more of the surface area, the free binding energy, and the number of amino acids or functional groups within a region.
- Each descriptor for each region is assigned a value based on its calculation. This value will typically be a numerical value. Although a variety of alternative calculation methods will be available for determining descriptors, the particular calculation method used will not be critical to the methods described herein provided the calculation method used is consistent between the target molecule and the molecule being compared with the target molecule.
- Examples of software which may be used to calculate descriptors include Internal Coordinate Mechanics (Abagyan and Totrov 1994, Abagyan et al. 1994), SURFNET (Laskowski, RA, available from University College London <http://www.biochem.ucl.ac.uk/~roman/surfnet/surfnet.html>), HBPLUS (McDonald and Thornton, 1994), LIGPLOT (Andrew Wallace and Roman Laskowski, available from University College London), AutoDock (Morris et al., 1998), GOLD (available from The Cambridge Crystallographic Data Centre), GLIDE (Friesner et al., 2004) and FlexX (available from BioSolveIT GmbH).
- binding energies are typically in the negative range
- surface areas are normally greater than zero (>0)
- charges may be positive, 0 (neutral) or negative.
- the minimum number of descriptors for any given region will be one. More typically, however, there will be a plurality of descriptors for each region. In particular embodiments there will be at least 2, 3, 4, 5, 6, 7, 8, 9 or 10 descriptors for any given global or local region.
- the descriptors which are used to characterise the global region may be the same or different to the descriptors used to characterise each of the local regions.
- the number of descriptors used to characterise the global region may be the same or different to the number of descriptors used to characterise each of the plurality of local regions. Different sets and combinations of global descriptors and local descriptors may be used, provided the same global descriptors are used in a comparison between molecules and the same local descriptors are used in a comparison between molecules.
- a global descriptor, such as overall charge, may also be used as a local descriptor, although the value of the descriptor may be different for the global region and each of the local regions.
- a global or local descriptor is selected from the group consisting of charge, solvent accessible surface area, van der Waals energy, hydrophobic energy, electrostatic energy, torsion energy, free binding energy, the number of amino acid residues within the defined enclosed environment, and torsion angles.
- the step of calculation of the plurality of global descriptors for the global region and local descriptors for each of the local regions may be carried out sequentially, with either the global region or the local regions calculated first, or simultaneously.
- the methods provided herein comprise the step of forming a representation of the interaction site, by combining the at least one descriptor for each of the local regions.
- the representation of the interaction site thus comprises a series of numerical values of parameters which are characteristic of the interaction site and which may be presented in a form which is readily compared with representations of other interaction sites or possible interaction sites, for example by computer means.
- the methods may comprise the step of forming a representation of the interaction site by combining the descriptor(s) for the global region and the at least one descriptor for each of the local regions.
- a representation of the interaction site comprises providing the descriptors as a string of values or comprises providing the descriptors as a matrix of values. These embodiments are illustrated in the Examples below. Thus the representation of the interaction site preserves each of the values of the descriptor for each region.
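A minimal sketch of the two representation forms follows: a matrix with one row per region and one column per descriptor, and a string obtained by flattening that matrix. For simplicity the sketch assumes the global region and every local region use the same descriptor names, which the source does not require; the function names and the descriptor names in the usage are hypothetical.

```python
def build_representation(global_desc, local_descs, names):
    """Return a matrix (list of rows): global region first, then each local region.

    Each region is a dict mapping descriptor name to value; `names` fixes the
    column order so that representations of different sites stay comparable.
    """
    rows = [global_desc] + list(local_descs)
    return [[float(region[name]) for name in names] for region in rows]

def as_string(matrix, sep=","):
    """Flatten the matrix row by row into a delimited string of values."""
    return sep.join(f"{value:g}" for row in matrix for value in row)
```

Because every value is kept in a fixed position, the value of each descriptor for each region is preserved and can be compared position by position.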
- the methods described herein involve the comparison of two or more representations of interaction sites. Such comparisons may be performed using matrix comparison techniques, exact matching, fuzzy matching, or using machine-learning techniques such as artificial neural networks (ANN), support vector machines (SVM), hidden Markov models (HMM) or genetic algorithms.
- a system will be trained based on the descriptors of the target molecule and the same set of descriptors will be used to screen other (test) molecules to identify potential similarities. It is preferable that multiple structures are used to train the system, and techniques such as machine learning, fuzzy matching or matrix comparison may be used; however, it is possible to operate the system with a single structure (i.e. the target molecule) by utilising either exact or fuzzy matching techniques.
- the representations of the interaction sites to be compared are formed from the same set of descriptors derived from the selection of the same global and local regions around each of the interaction sites.
- the comparison of representations may be a direct one-to-one comparison of the descriptors in each of the corresponding global and local regions, or the descriptors may be compared using fuzzy logic matching or machine-learning techniques, for example by allowing a range on the descriptor value which indicates a match.
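The difference between a direct one-to-one comparison and a range-based fuzzy comparison can be sketched as follows, assuming each representation has been flattened to a list of values in the same descriptor order. The function names and the per-descriptor `tolerances` are assumptions for illustration.

```python
def exact_match(rep_a, rep_b):
    """True when every corresponding descriptor value is identical."""
    return list(rep_a) == list(rep_b)

def fuzzy_match(rep_a, rep_b, tolerances):
    """True when every pair of corresponding values differs by at most its tolerance."""
    if not (len(rep_a) == len(rep_b) == len(tolerances)):
        raise ValueError("representations and tolerances must have equal length")
    return all(abs(a - b) <= t for a, b, t in zip(rep_a, rep_b, tolerances))
```

A tolerance of zero for a descriptor makes that position behave as an exact match, so the two approaches can be mixed within one representation.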
- the descriptors in the representations of either of the interaction sites under comparison are first computed independently and compiled in a suitable representation as above for comparison. Each of the descriptors may be weighted or otherwise manipulated appropriately prior to the comparison.
- the descriptors may indicate a match or potential similarity where the descriptors for the target molecule are complementary to the corresponding descriptors of the test molecule(s), such as for example complementary charge (positive and negative) matches for interaction sites in a receptor-ligand pair.
- Further examples of complementary descriptors in receptor-ligand pairs may include geometry and binding energy and the descriptors for potential interaction sites in the receptor-ligand pair may be weighted or otherwise manipulated appropriately in accordance with known functions of interaction site processes.
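Complementary matching can be sketched by first transforming the target representation (here by inverting the sign of selected descriptors such as charge, with optional weights) and then comparing the result with the test representation. Which descriptors to invert, the weights and the tolerances are assumptions chosen for the example.

```python
def complementary_target(rep, invert, weights=None):
    """Return the representation a complementary partner is expected to have.

    `invert` holds one flag per descriptor; flagged values change sign.
    `weights` (optional) scales each descriptor after inversion.
    """
    if weights is None:
        weights = [1.0] * len(rep)
    return [(-v if flip else v) * w for v, flip, w in zip(rep, invert, weights)]

def is_complementary(target_rep, test_rep, invert, tolerances):
    """True when the test values lie within tolerance of the complemented target."""
    wanted = complementary_target(target_rep, invert)
    return all(abs(w - t) <= tol for w, t, tol in zip(wanted, test_rep, tolerances))
```

For a receptor region with charge -2.0 and surface area 150.0, inverting only the charge yields a sought ligand profile of charge +2.0 and surface area 150.0.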
- Also provided are methods of identifying or screening for common structural patterns in interaction sites of molecules comprising representing the interaction site of at least one first molecule and the interaction site of at least one second molecule using the method as described above, and comparing the representations to identify or screen for common structural patterns in the interaction sites.
- Common structural patterns in the interaction sites may, for example, include geometry/shape of the molecule at the interaction site (i.e. three dimensional structure), charge distribution (including both sign and magnitude) and hydrophilicity/hydrophobicity amongst others. Comparisons of the structural patterns between interaction sites may be performed using linear combinations and/or matrices comprising numerical values for each of the descriptors used in the representations of the interaction sites.
- the comparison of descriptors may use exact matching, fuzzy matching (such as within a desired range of values), or machine-learning techniques such as artificial neural networks, support vector machines, hidden Markov models and genetic algorithms.
- the output from the comparison of the representations may be presented as a numeric score indicative of the probability that the interaction site is a binding site or the molecule is a potential binder.
- the interaction sites for each of the test molecules may be ranked on the basis of their probability of being a binding site.
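Ranking candidate sites by a numeric score can be sketched as below. The score used here, the fraction of descriptors falling within tolerance of the target, is a hypothetical stand-in and is not a calibrated probability; a trained model would normally supply the score.

```python
def match_fraction(target, test, tolerances):
    """Fraction of descriptors whose test value lies within tolerance of the target."""
    hits = sum(abs(a - b) <= t for a, b, t in zip(target, test, tolerances))
    return hits / len(target)

def rank_sites(target, candidates, tolerances):
    """Return (score, name) pairs sorted from best to worst score.

    `candidates` maps a site name to its flattened representation.
    Ties are broken alphabetically by name.
    """
    scored = [(match_fraction(target, rep, tolerances), name)
              for name, rep in candidates.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```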
- the comparison of the interaction sites of the test molecules may be used to predict whether a particular interaction site will behave similarly or better than an alternative interaction site, where the alternative interaction site may reside on the same molecule (although not necessarily).
- the comparison may be carried out on a receptor-ligand pair which are in the act of interacting and also on each component separately, and the interaction site representation is also of use for screening for only one of the pair (i.e. either the receptor or the ligand in isolation) based on similarities between components based on the initial data such as the three-dimensional structure of the macromolecule.
- the properties of the target macromolecule may be manipulated (eg. by a weighting or inverse function) and the modified descriptors used to screen for potential test macromolecules likely to interact with the target molecule. This is particularly useful as typically only one component (either the receptor or ligand) is available for initial determination of the interaction site representation.
- Virtual ligand screening using descriptors derived from three-dimensional structures allows capturing of sophisticated patterns that define receptor-ligand binding.
- Important functional units eg. functional residues
- structure patterns analogous to the patterns of (protein) amino acid sequences or functional groups of chemicals.
- Families of proteins and synthetic catalysts with the same function may be formed. It may be possible to find a signature (pattern) derived from the protein structure which is characteristic of its function. Such descriptors can be used to match against a new protein of unknown function, and the outcome of the comparison may be indicative of the presence or absence of the particular function. These patterns can be used to classify protein structures into structure families, for example by considering the occurrence of common arrangements of secondary structure elements in the core of proteins, as described by Koch et al. (1996).
- Structure patterns can also be used to infer the function of a protein, for example the "coordinate templates" for finding Ser-His-Asp catalytic triads in the serine proteinases and lipases as reported in Wallace et al. (1996), and to screen for candidate binding partners.
- the methods described above may be used to identify or predict ligand and/or receptor interaction sites, or antibody and/or epitope interaction sites, or enzyme and/or substrate interaction sites using descriptors derived from the three-dimensional structure of the ligand, receptor, receptor-ligand complex, or antibody, epitope or antibody-epitope complex, or enzyme, substrate or enzyme-substrate complex respectively.
- the following uses of the methods described above are proposed in the context of receptor-ligand interactions, but should not be construed as suggesting that they are limited to only receptor-ligand interactions.
- the methods may be used to screen a binding candidate to a particular ligand or receptor for which no experimental data or three-dimensional structure is available, but which may be refined by inclusion of new experimental data.
- the methods may be used for the construction of a three-dimensional binding candidate to a particular ligand or receptor for which no experimental data or three-dimensional structure is available, but which may be refined by inclusion of new experimental data.
- the methods may be used to predict the activity of molecules for which no experimental data is available, but which may be refined by inclusion of new experimental data.
- the methods may be used for large-scale, high-throughput screening of molecules which can be generalized for prediction of receptor-ligand interactions for various receptor families.
- the methods may be used to study the phylogeny of protein families using descriptors derived from the three-dimensional structure of the ligand, receptor or receptor-ligand complex.
- the methods described above are based on the use of descriptors, such as functional features, derived from the three-dimensional structure of a molecule, such as a polypeptide or a protein-protein complex, for the prediction of interaction site patterns or binding activity.
- the methods described may be used in building a single model which can predict ligand binding to a multiplicity of different receptors, and/or may be used in the construction of three-dimensional representations of molecule structural formations, particularly known or potential interaction sites or vice versa.
- the methods may facilitate cyclical refinement of predictive models for improved accuracy by inclusion of new experimental data.
- the methods may facilitate high accuracy predictions of ligand binding to receptor molecules for which no experimental data are available.
- the methods may be generalised to the prediction of a wide variety of types of molecular interactions, for example interactive pair interactions such as between receptor-ligand complexes, antibody-epitope complexes or enzyme-substrate complexes.
- the invention relates to a computer program, residing on a computer-readable medium, for identifying molecule interaction sites comprising instructions or code for causing a computer to represent an interaction site (for example a ligand or receptor interaction site; or alternatively an antibody or epitope interaction site; or alternatively still an enzyme or substrate interaction site) by descriptors using a probabilistic approach or other deterministic means such as multiple regression, artificial neural network (ANN), hidden Markov model (HMM) or other dynamic Bayesian network model, support vector machines (SVM) or alternatively a fuzzy means for representing descriptors as would be appreciated by the skilled addressee. Where necessary, each descriptor may be associated with a variance in order to provide a degree of relaxation.
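Associating each descriptor with a variance can be sketched by summarising several training representations as a per-descriptor mean and standard deviation, then accepting a test value when it lies within `k` standard deviations of the mean. This is a deliberately simple stand-in for the probabilistic or machine-learning means named above; the function names, `k` and `floor` are assumptions.

```python
from statistics import mean, pstdev

def train_profile(training_reps):
    """Return one (mean, standard deviation) pair per descriptor position."""
    columns = list(zip(*training_reps))
    return [(mean(col), pstdev(col)) for col in columns]

def matches_profile(profile, rep, k=2.0, floor=1e-6):
    """True when every value lies within k standard deviations of its mean.

    `floor` keeps the allowed band from collapsing to zero width when all
    training values for a descriptor were identical.
    """
    return all(abs(v - m) <= k * max(s, floor) for (m, s), v in zip(profile, rep))
```

A larger `k` relaxes the match, and a smaller `k` tightens it towards exact matching.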
- the computer program may be generally described by the following pseudo code:
- the computer program may further include instructions or code to represent an interaction between one or more molecules by combining and/or comparing representations of interaction sites of, for example, a receptor interaction site and a ligand interaction site.
- the computer program may further include instructions or code to train the computer or other determining means with representations characterizing at least one interaction site of one or both molecules of an association pair, where the location(s) of the interaction site(s) may be known or estimated.
- the computer program may further include instructions or code to apply representations of at least one test association pair (eg. ligand-receptor) interaction of unknown interaction site(s) and/or unknown structure, using the same representation form as used in training the computer or other determining means.
- the computer program may further include instructions or code to analyse each applied test association pair interaction in order to predict the interaction site(s) of each test interaction.
- the determining means is selected from probabilistic means, fuzzy means or multiple regression means; however, other means may be employed such as direct or complementary matching means.
- the computer program may optionally comprise instructions or code for manipulating one or more of the descriptors, for example adding weights to one or more descriptors of the interaction site(s), prior to analysis.
- the methods and systems described herein utilize properties of amino acids, monosaccharides or nucleic acids which are enclosed within a three-dimensional region, such as charge, energetics, solvent accessible surface area and other properties, as descriptors.
- the characterisation of an interaction site of a molecule based solely on examination of one or two components of a pair in isolation may utilize a probabilistic approach or other deterministic means such as multiple regression, SVM, ANN or HMM for representing descriptors.
- Characterization based on complexes of an association pair may combine both representations of ligand and receptor for each single data training point and is thus based on the characteristics of the descriptors derived from receptor-ligand interactions rather than on the characteristics of either ligand or receptor component in isolation.
- the first portion is the utilization of amino acid properties (such as charge, energetics, solvent accessible surface area, among others) enclosed within a three-dimensional container (such as a sphere, cube, pyramid, cylinder, among others) as descriptors.
- the second portion is the description and the representation of the association pair (eg. receptor-ligand) interactions by using descriptors derived from a receptor-ligand complex (or a series of similar complexes with the same receptor but different ligands; or vice versa).
- the steps of the present invention include, but are not limited to, the following: a) Identification of the contact elements in a three-dimensional structure of a receptor-ligand complex from a representative known structure using an enclosed three-dimensional (3D) container (which may be, but is not limited to, a sphere, cube, pyramid or cylinder).
- the contact elements are amino acids that directly or indirectly affect a ligand-receptor interaction.
- a macro-3D container (e.g. sphere, cube, pyramid, cylinder, among others) is used to detect global structural features of both receptor and ligand binding site.
- the macro-3D container is further seeded with multiples of (possibly overlapping) micro-containers to capture micro-3D substructure features or profiles (e.g. charge distribution, contour, solvent accessible surface area, energetics, torsion angles).
- Macro- and micro-features detected by all containers are mapped into descriptors.
- e) Represent a protein target (which may be a ligand or receptor in isolation) of unknown interaction site in the format suitable for use with the determining means (following the procedure described in steps a) to c)). f) Predict the interaction site of the unknown target.
- the third portion is the description and the representation of the target binding candidate to a source protein (which may be a receptor or ligand) using descriptors derived from the source in isolation.
- the steps of the present invention include, but are not limited to, the following: a) Identification of the contact elements in a three-dimensional structure of a source receptor or source ligand in isolation from a representative known structure using an enclosed three-dimensional (3D) container (which may be but not limited to a sphere, cube, pyramid or cylinder).
- the contact elements are amino acids that directly or indirectly affect a ligand-receptor interaction.
- a macro-3D container (e.g. sphere, cube, pyramid, cylinder, among others) is used to detect global structural features of both receptor and ligand binding site.
- the macro-3D container is further seeded with multiples of (possibly overlapping) micro-containers to capture micro-3D substructure features or profiles (e.g. charge distribution, contour, solvent accessible surface area, energetics, torsion angles, among others).
- Macro- and micro-features detected by all containers are mapped into descriptors.
- c) Obtain a complementary (which may be, but is not limited to, an inverse or weighted) relationship of appropriate descriptors (e.g.
- VBS virtual binding site
- d) Represent descriptors in a format suitable for use with a determining means.
- e) Train the determining means.
- f) Represent a protein of unknown interaction site in the format suitable for use with the determining means (following the procedure described in steps a) to c)).
- g) Predict novel binding candidates or interaction site to the source protein where only experimental 3D data of the source is available.
- the VBS is also applicable for construction of three-dimensional structures/profiles of virtual targets.
- the fourth portion is the training of derived descriptors using statistical means such as probabilistic function, artificial neural network, hidden Markov model, multiple regression or Bayesian network.
- a computer-based general system and method for prediction of protein interaction sites operates as follows. In one aspect, the methods described herein perform the process of forming an interaction site representation of a molecule as depicted in one arrangement as method 100 presented in flow chart form in Figure 4.
- the method 100 for representing an interaction site on a target molecule comprises selecting 100 a global region in at least one target molecule.
- the global region may encompass an interaction site on the target molecule.
- a plurality of local regions may further be selected 105.
- the plurality of local regions may lie substantially within the global region.
- the plurality of local regions may be partially overlapping with one, two, three, four, five, six or more adjacent local regions.
- One or a plurality of descriptors are then determined 109 for each of the local regions.
- at least one descriptor for the global region may be determined 107, which may be determined either sequentially or simultaneously to the determination of the local descriptors.
- the determination 107 of the global descriptors may optionally be performed prior to the selection 105 of the local regions of interest.
- the selection 105 of the local regions of interest may be determined from or influenced by the global descriptors.
- the at least one descriptor for each of the local regions is then combined 111 to form a representation of the interaction site. Where one or more global descriptors have been determined, the representation of the interaction site is formed from a combination of the at least one global descriptor and the at least one descriptor for each of the local regions.
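The flow of method 100 can be illustrated with a minimal sketch. It assumes spherical regions and two toy descriptors per region (total charge and atom count); the function names, the atom dictionaries and the choice of descriptors are illustrative assumptions, not the method as claimed.

```python
import math

def region_descriptors(atoms, centre, radius):
    """Toy descriptors for one spherical region: total charge and atom count."""
    inside = [a for a in atoms if math.dist(a["xyz"], centre) <= radius]
    return [sum(a["charge"] for a in inside), float(len(inside))]

def represent_site(atoms, global_centre, global_radius, local_centres, local_radius):
    """Global descriptors followed by the descriptors of each local region."""
    # Determine the global descriptors (step 107).
    representation = region_descriptors(atoms, global_centre, global_radius)
    # Determine descriptors for each local region (step 109) and combine (step 111).
    for centre in local_centres:
        representation += region_descriptors(atoms, centre, local_radius)
    return representation

# Hypothetical atoms: coordinates in angstroms, charges in elementary units.
atoms = [
    {"xyz": (0.0, 0.0, 0.0), "charge": 1.0},
    {"xyz": (3.0, 0.0, 0.0), "charge": -1.0},
    {"xyz": (20.0, 0.0, 0.0), "charge": 1.0},
]
site = represent_site(
    atoms,
    global_centre=(0.0, 0.0, 0.0),
    global_radius=9.0,
    local_centres=[(0.0, 0.0, 0.0), (6.0, 0.0, 0.0)],  # overlapping local spheres
    local_radius=4.5,
)
print(site)
```

The result is a flat list ordered as global descriptors first, then one group per local region, so two sites described with the same regions and descriptors can be compared position by position.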
- the methods described herein perform the process of identifying and/or screening for common structural patterns between two or more molecules as depicted in one arrangement as method 200 presented in flow chart form in Figure 5.
- the method 200 may in one arrangement comprise selecting 201 a target molecule with at least one target interaction site. A representation of the at least one target interaction site is formed by the method 100 of Figure 4. At least one test molecule having at least one test interaction site is then selected 203 and a representation of the at least one test interaction site is formed, again by the method 100 of Figure 4.
- one or more additional test molecules and/or test interaction sites may be selected 205 and a site representation of the known and/or potential interaction sites for each additional test molecules and/or test interaction sites may be formed by the method 100 of Figure 4.
- the representation of the target interaction site is then compared 207 with the representation of the test interaction site(s) to identify 209 or screen for common structural patterns in the interaction sites.
- a test molecule may then be selected 211 on the basis of the comparison 209 where a favourable interaction is found.
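The comparison and selection steps of method 200 might be sketched as follows. The text does not fix a comparison measure, so the Euclidean distance, the threshold and the names `compare` and `screen` are assumptions made purely for illustration.

```python
import math

def compare(target_repr, test_repr):
    """Euclidean distance between two representations (smaller means more similar)."""
    if len(target_repr) != len(test_repr):
        raise ValueError("representations must use the same descriptors")
    return math.dist(target_repr, test_repr)

def screen(target_repr, test_reprs, threshold):
    """Return (name, distance) for test sites within the threshold, most similar first."""
    scored = [(name, compare(target_repr, r)) for name, r in test_reprs.items()]
    return sorted((s for s in scored if s[1] <= threshold), key=lambda s: s[1])

# Hypothetical representations produced by a method-100 style encoder.
target = [0.0, 2.0, -1.0, 1.0]
tests = {
    "site_A": [0.0, 2.0, -1.0, 1.0],
    "site_B": [3.0, 6.0, -1.0, 1.0],
    "site_C": [0.0, 2.0, 0.0, 1.0],
}
hits = screen(target, tests, threshold=2.0)
print(hits)
```

Any other similarity measure or a trained determining means could replace `compare` without changing the surrounding flow.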
- although process steps, method steps, algorithms or the like as described above may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.
- the method 100 may be implemented using a computer system 400, such as that shown in Figure 6 wherein the processes of Figures 3 or 4 may be implemented as software, such as one or more application programs executable within the computer system 400.
- the steps of the method 100 of forming an interaction site representation of a molecule or the method 200 of identifying and/or screening for common structural patterns between two or more molecules are effected by instructions in the software that are carried out within the computer system 400.
- the software may be stored in a computer readable medium, including the storage devices described below, for example.
- the software is loaded into the computer system 400 from the computer readable medium, and then executed by the computer system 400.
- a computer readable medium having such software or computer program recorded on it is a computer program product.
- the use of the computer program product in the computer system 400 preferably effects an advantageous apparatus for performing the steps of the method described herein.
- the computer system 400 is formed by a computer module 401, input devices such as a keyboard 402 and a mouse pointer device 403, and output devices including a printer 415, and a display device 414.
- the computer module 401 typically includes at least one processor unit 405, and a memory unit 406.
- the module 401 also includes a number of input/output (I/O) interfaces including a video interface 407 that couples to the video display 414, an I/O interface 413 for the keyboard 402 and mouse 403, and an interface 408 for the printer 415.
- Computer readable medium and/or storage devices 409 are provided and typically include a hard disk drive (HDD) and optical disk drive.
- the components 405 to 413 of the computer module 401 typically communicate via an interconnected bus 404 and in a manner which results in a conventional mode of operation of the computer system 400 known to those in the relevant art.
- the application programs discussed above are resident on the hard disk drive and read and controlled in execution by the processor 405. Intermediate storage of such programs and any data generated may be accomplished using the memory 406, possibly in concert with the hard disk drive.
- Non-volatile media include, for example, optical or magnetic disks and other persistent memory.
- Volatile media include dynamic random access memory (DRAM), which typically constitutes the main memory.
- Transmission media include coaxial cables, copper wire and fibre optics, including the wires that comprise a system bus coupled to the processor.
- Transmission media may include or convey acoustic waves, light waves and electromagnetic emissions, such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EEPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer can read.
- Various forms of computer readable media may be involved in carrying sequences of instructions to a processor.
- sequences of instruction (i) may be delivered from RAM to a processor, (ii) may be carried over a wireless transmission medium, and/or (iii) may be formatted according to numerous formats, standards or protocols, such as Bluetooth, TDMA, CDMA, and 3G.
- the methods described herein may alternatively be implemented in dedicated hardware such as one or more integrated circuits performing the functions or sub-functions of the methods, e.g. either or both of the methods 100 or 200.
- identification of contact (interaction) sites of the receptor-ligand complex from structural models facilitates representation of the receptor-ligand interaction by combining representations of receptor interaction site and ligand receptor site.
- a determining means such as probability density function, multiple regression system, ANN, HMM, SVM, among others
- input data characterizing instances of receptor-ligand interactions of known 3D structure.
- test data representing a test protein which may be a receptor or ligand of unknown interaction site (using the same representation form as for training the determining means) is applied and analyzed to predict the interaction site of the test protein.
- the method of the present invention may be set out in the form of a computer program, residing on a computer-readable medium, and may be implemented using a computer programmed with one of the above mentioned determining means.
- the present invention uses descriptors which may be input to a probabilistic function, an artificial neural network, a hidden Markov model, multiple regression or a Bayesian network.
- the use of such a technique facilitates cyclical refinement of predictive models for improved accuracy by inclusion of new experimental data as it becomes available. If no experimental data with which to train the determining means is available, the training process may be based on estimated binding affinity produced using other methods. For example, if binding activity of a ligand-receptor interaction is unknown, but there is experimental evidence of biological activity, a reasonable estimate of binding affinity can be deduced and used for training a predictive model.
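As a rough illustration of a probabilistic determining means and of cyclical refinement, the sketch below fits an independent Gaussian to each descriptor position of known interaction sites and retrains when new data arrive. The Gaussian model, the function names and the numbers are assumptions for illustration; the text does not prescribe a particular probabilistic function.

```python
import math
from statistics import mean, pstdev

def train(sites):
    """Fit an independent Gaussian (mean, std) to each descriptor position."""
    columns = list(zip(*sites))
    # A small floor on the standard deviation avoids division by zero.
    return [(mean(c), max(pstdev(c), 1e-6)) for c in columns]

def log_likelihood(model, site):
    """Log-likelihood of one site representation under the fitted model."""
    total = 0.0
    for (mu, sigma), x in zip(model, site):
        total += -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
    return total

def refine(sites, new_sites):
    """Cyclical refinement: retrain on the old data plus newly available data."""
    return train(list(sites) + list(new_sites))

# Hypothetical descriptor vectors of known interaction sites.
known = [[1.0, 10.0], [3.0, 14.0]]
model = train(known)
similar = log_likelihood(model, [2.0, 12.0])
dissimilar = log_likelihood(model, [9.0, 30.0])
refined = refine(known, [[2.0, 12.0]])
print(similar > dissimilar)
```

A higher log-likelihood indicates a candidate site whose descriptors resemble the training sites more closely; a neural network, HMM or regression model could take the place of `train` and `log_likelihood`.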
- the systems and methods provided herein are generally applicable to data sets based on any type of association pair interaction, such as interactions between, among others, receptor-ligand, antibody-epitope or antibody-antigen, enzyme-substrate, protein-protein, channel/transporter-solute or cytoskeletal-protein interaction.
- the methods and systems provided herein may be applicable to, but are not limited to, the identification of novel interaction sites of components of association pairs comprising pairs of moieties and/or molecules, for example receptors or ligands; the identification of unknown binding counterparts, e.g. of a known receptor or ligand; the identification of unknown and secondary therapeutic targets of drugs, drug leads, drug candidates and natural products; the identification of novel receptor or ligand molecules with a similar interaction site to the source or target molecule; the prediction of drug targets related to side effects and toxicity (including drug safety evaluation); the prediction of targets related to drug ADME (pharmacokinetics); and the construction of virtual binding targets in docking simulations.
- Receptor families typically share a common conserved structure.
- the representations of known interaction residues derived from three-dimensional models can therefore be used to train a system for prediction of interaction site and relative binding to a range of related receptors.
- Two techniques for representing interaction sites using descriptors derived from a single protein in isolation and from receptor-ligand complexes are illustrated:
- the representation of an interaction site for a protein (receptor or ligand) in isolation may be expressed as [G-P_n][Lx_m-P_n][BA], where G-P_n stands for the global descriptor for amino acid property P_n, where n is an integer between 1 and the total number of defined properties; Lx_m-P_n stands for the local descriptor enclosed by micro-container x_m for amino acid property P_n, where m is an integer between 1 and the total number of micro-containers; and BA (optional) stands for the strength of the interaction (the binding affinity or weight) where applicable.
- the descriptors may be calculated using a variety of different methods, depending on the nature of the descriptor. For example, using suitable three-dimensional information, binding energies may be computed using molecular dynamics algorithms, partitioning of the binding energy into biophysical energy terms, or knowledge-based scoring functions; the accessible surface area of a region of interest of a molecule may be measured by tracing out the maximum permitted van der Waals contact that is covered by the centre of a water molecule as it rolls over the surface of the protein; the torsion angles of amino acids may be computed from three-dimensional XYZ coordinates using mathematical formulas. The interaction site may be represented by collating the values for each of the descriptors together in either a linear combination or matrix representation.
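As one concrete case of computing a descriptor from XYZ coordinates, a torsion (dihedral) angle can be obtained from four consecutive atom positions with the standard atan2 formulation. The helper names and the example coordinates below are illustrative assumptions.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_angle(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal to the plane through p2, p3, p4
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates giving a quarter turn about the central bond.
quarter_turn = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(quarter_turn, trans)
```

For a protein backbone the four points would be, for example, the C, N, CA and C atoms that define a phi angle; which atoms are used for each descriptor is left to the implementation.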
- a continuous string of numerical digits may be suitable for representing the descriptors of the interaction site.
- the descriptors may be used for training a computer or other determining means including but not limited to binding matrices, fuzzy systems and machine-learning algorithms (e.g. support vector machines, artificial neural networks, hidden Markov models, and genetic algorithms).
- the representation of the interaction site for a protein in isolation for one global region and n local descriptor regions may be an encoded string of the form: G_SurfaceArea G_Charge G_BindingEnergy G_NoOfResidues; L1_SurfaceArea L1_Charge L1_BindingEnergy L1_NoOfResidues; L2_SurfaceArea L2_Charge L2_BindingEnergy L2_NoOfResidues; ...; Ln_SurfaceArea Ln_Charge Ln_BindingEnergy Ln_NoOfResidues where G represents a descriptor for the global region and L represents a descriptor for a local region.
- the encoded string of the same interaction site using the same descriptors may be of the form:
- G_SurfaceArea G_Charge G_BindingEnergy G_NoOfResidues L1_SurfaceArea L1_Charge L1_BindingEnergy L1_NoOfResidues L2_SurfaceArea L2_Charge L2_BindingEnergy L2_NoOfResidues ... Ln_SurfaceArea Ln_Charge Ln_BindingEnergy Ln_NoOfResidues where no delimiters are used. Alternatively, different delimiters, more delimiters, or combinations of delimiters may be used between each of or groups of descriptors as convenient.
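The two string encodings, with and without a group delimiter, can be sketched as below. The descriptor order follows the text; the function name `encode_site` and all numerical values are made up for illustration.

```python
DESCRIPTORS = ("SurfaceArea", "Charge", "BindingEnergy", "NoOfResidues")

def encode_site(global_values, local_values, group_delimiter="; "):
    """Encode the global region followed by each local region as one string."""
    groups = [global_values] + list(local_values)
    encoded = [" ".join(f"{values[name]:g}" for name in DESCRIPTORS) for values in groups]
    return group_delimiter.join(encoded)

# Made-up descriptor values for one global region and two local regions.
g = {"SurfaceArea": 812.5, "Charge": -1, "BindingEnergy": -42.0, "NoOfResidues": 18}
l1 = {"SurfaceArea": 120.0, "Charge": 0, "BindingEnergy": -6.5, "NoOfResidues": 3}
l2 = {"SurfaceArea": 95.25, "Charge": -1, "BindingEnergy": -4.0, "NoOfResidues": 2}

delimited = encode_site(g, [l1, l2])
undelimited = encode_site(g, [l1, l2], group_delimiter=" ")
print(delimited)
print(undelimited)
```

Keeping the descriptor order fixed is what makes the undelimited form decodable: every group contributes the same number of fields.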
- a further example of the representation may be a linear combination of the descriptors such as:
- X GGX are optional weights for each of the global descriptors and X GXX are optional weights for each of the local descriptors of each local region 1 to n.
- a still further example of the representation may be in matrix form such as:
- the representation of a receptor-ligand interaction site can be described as [GR-P_n][GL-P_n][LRx_m-P_n][LLx_m-P_n][BA], where GR-P_n is the global receptor descriptor for amino acid property P_n; GL-P_n is the global ligand descriptor for amino acid property P_n; LRx_m-P_n is the local receptor descriptor enclosed by micro-container x_m; LLx_m-P_n is the local ligand descriptor enclosed by micro-container x_m; and BA (optional) is the strength of the interaction (the binding affinity or weight) where applicable; and where n is an integer between 1 and the total number of defined properties and m is an integer between 1 and the total number of micro-containers for amino acid property P_n.
- Hen egg white lysozyme (HEL; Worldwide Protein Data Bank IDs: 1A2Y, 1G7M, 1G7L, 1G7I, 1G7H, 1KIR, 1VFB, 1FDL, 1KIQ, and 1KIP; Berman et al., 2000) binds monoclonal antibody (mAb) D1.3.
- This mAb D1.3 recognizes a conformational (nonlinear) epitope on HEL.
- the positional binding environments of the complex have been resolved by crystallography (Dall'Acqua et al. 1998, Fischmann et al. 1991, Sundberg et al. 2000, Fields et al. 1996, and Bhat et al. 1994).
- HEL has 18 contact amino acids that constitute the ligand interaction site.
- Table 1 shows a listing of contact amino acid residue pairs derived from the 3D coordinates of the D1.3/HEL complex (using structures provided in Protein Data Bank (PDB) IDs 1A2Y, 1G7M, 1G7L, 1G7I, 1G7H, 1KIR, 1VFB, 1FDL, 1KIQ, or 1KIP). These residues may be effectively captured by a three-dimensional sphere of radius 9.00 Å (Figure 1A).
- Table 1. HEL residue positional environments for D1.3 at a distance of 4.50 Å.
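A distance-cutoff listing of contact residue pairs could be derived from 3D coordinates along the following lines. The criterion used here (any atom pair within the cutoff), the function name and the residue labels and coordinates are assumptions for illustration; the text gives only the 4.50 Å distance.

```python
import math

def contact_pairs(receptor_atoms, ligand_atoms, cutoff=4.5):
    """Residue pairs with at least one inter-atomic distance within the cutoff (angstroms)."""
    pairs = set()
    for res_r, xyz_r in receptor_atoms:
        for res_l, xyz_l in ligand_atoms:
            if math.dist(xyz_r, xyz_l) <= cutoff:
                pairs.add((res_r, res_l))
    return sorted(pairs)

# Hypothetical (residue label, atom coordinates) records, not taken from any PDB entry.
antibody = [
    ("Ab_res1", (0.0, 0.0, 0.0)),
    ("Ab_res1", (1.0, 0.0, 0.0)),
    ("Ab_res2", (10.0, 0.0, 0.0)),
]
hel = [
    ("HEL_res1", (3.0, 0.0, 0.0)),
    ("HEL_res2", (13.0, 0.0, 0.0)),
    ("HEL_res3", (30.0, 0.0, 0.0)),
]
pairs = contact_pairs(antibody, hel)
print(pairs)
```

Collecting pairs in a set means a residue pair is reported once even when several of its atoms fall within the cutoff.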
- Monoclonal antibody (mAb) D1.3 binds HEL and the anti-idiotypic monoclonal antibody mAb E5.2.
- the structures of these antibodies are described in Fischmann et al., 1991; Bhat et al., 1994; and Sundberg et al., 2000.
- a representation of the interaction site comprising a set of descriptors was prepared consisting of binding sites (8.5 - 9.0 Å radius) from 9 HEL crystallographic structures (PDB IDs 1G7M, 1G7L, 1G7I, 1G7H, 1KIR, 1VFB, 1FDL, 1KIQ, 1KIP).
- the global and local free binding energy profiles of each HEL interaction site were effectively captured using a total of 30 three-dimensional spheres, the global sphere having a radius of 9 Å, and the 29 local spheres having a radius of 4.5 Å.
- the position and size of the global sphere was based on experimental data of interaction site, and the local spheres were defined to be half the radius of global sphere.
- ΔG_H is the hydrophobic energy computed as the product of solvent accessible surface area (determined by rolling a sphere of 1.4 Å radius along the surface of the molecule) by the surface tension.
- ΔG_S refers to the entropic contribution from the protein side-chains computed from the maximal burial entropies for each type of amino acid and their relative accessibilities.
- ΔG_EL denotes the electrostatic term composed of coulombic interactions between receptor and ligand and the desolvation of partial charges transferred from an aqueous medium to a protein core environment, and is determined by the numeric solution of the Poisson equation using an implementation of the boundary element algorithm (Schapira et al., 1999). No other descriptors were used in this example.
- the maximum (E_max) and minimum (E_min) free binding energies of each global and local region of each crystal structure were computed using the software program Internal Coordinate Mechanics and provided with a degree of relaxation from 0 kJ/mol to 90 kJ/mol, representing a range from high specific binding to low specific binding, to form the training set.
- the test dataset included 2 binding and 1344 non-binding sites from mAb E5.2 (PDB ID 1DVF) and HEL (PDB ID 1A2Y), a sample of which is shown in Figures 7A and 7B for the two binding sites and 21 1DVF and 9 1A2Y non-binding sites.
- the model is tested on the test dataset.
- SE and SP represent percentages of correctly predicted interaction sites and non-interaction sites, respectively.
- TP true positives
- TN true negatives
- FN false negatives
- FP false positives
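Given the definitions above, SE and SP follow the usual formulas SE = TP / (TP + FN) and SP = TN / (TN + FP), expressed as percentages. The short sketch below uses arbitrary counts chosen only to show the arithmetic; they are not results from this document.

```python
def sensitivity(tp, fn):
    """Percentage of interaction sites that were correctly predicted."""
    return 100.0 * tp / (tp + fn)

def specificity(tn, fp):
    """Percentage of non-interaction sites that were correctly predicted."""
    return 100.0 * tn / (tn + fp)

# Arbitrary illustrative counts.
se = sensitivity(tp=8, fn=2)
sp = specificity(tn=90, fp=10)
print(se, sp)
```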
- Chymotrypsin is a proteolytic enzyme acting in the digestive systems of mammals and other organisms. It facilitates the cleavage of peptide bonds by a hydrolysis reaction. The receptor cleaves peptides and polypeptides into shorter peptide chains, tri- and dipeptides. The crystallographic structures of chymotrypsin in complex with nine different ligands have been solved and deposited in the Protein Data Bank (Table 4).
- the training set consisted of 8 binding sites (13.0 - 13.5 Å radius) from APPI (PDB ID 1CA0), BPTI (PDB IDs 1T8O, 1T8N, 1T8M, 1CBW), autocatalytic peptide (PDB ID 1OXG), and the human pancreatic secretory trypsin inhibitor (PDB IDs 1CGJ, 1CGI).
- the global and local free binding energy profiles of each interaction site were effectively captured using a total of 30 three-dimensional spheres as described in Example 2.
- the test dataset included 7 binding sites from ecotin (PDB ID 1N8O), eglin C (PDB ID 1ACB), ovomucoid (1CHO), pmp-C (1GL1), pmp-D2v (1GL0), BPTI (1T7C, 1T8L) and 2237 non-binding sites from mAb E5.2 (PDB ID 1DVF), HEL (PDB ID 1A2Y) and MHC class I allele A*0201 (PDB ID 1DUZ). E_max and E_min were gradually relaxed from 0 kJ/mol to 90 kJ/mol. The results of prediction are shown in Table 5.
- the present method has the advantage that all the predictions were produced using a single predictive model.
- ICM A New Method For Protein Modeling and Design: Applications To Docking and Structure Prediction From The Distorted Native Conformation J. Comp. Chem., 15, 488-506.
Landscapes
- Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Chemical & Material Sciences (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Library & Information Science (AREA)
- Evolutionary Biology (AREA)
- Biotechnology (AREA)
- Medical Informatics (AREA)
- Crystallography & Structural Chemistry (AREA)
- Biophysics (AREA)
- Biochemistry (AREA)
- Molecular Biology (AREA)
- Medicinal Chemistry (AREA)
- Computing Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to methods and systems for representing an interaction site on a molecule. The method comprises: selecting a global region of the molecule, the global region enclosing an interaction site on the molecule; selecting a plurality of local regions that lie substantially within the global region; determining at least one local descriptor for each of the local regions; and forming a representation of the interaction site by combining the local descriptors for each of the local regions. Optionally, at least one descriptor for the global region may be determined, and the representation of the interaction site may be formed by combining the global descriptors and the local descriptors for each of the local regions. In addition, methods and systems are also described for identifying or screening for common structural patterns in an interaction site of molecules, the method comprising: representing an interaction site of at least one first molecule and of at least one second molecule; and comparing the representation of the interaction site of the first molecules with the representation of the interaction site of the second molecules to identify or screen for common structural patterns in the interaction sites.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US88147907P | 2007-01-22 | 2007-01-22 | |
| US60/881,479 | 2007-01-22 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2008091225A1 (fr) | 2008-07-31 |
Family
ID=39644728
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/SG2008/000025 Ceased WO2008091225A1 (fr) | Comparative detection of structure patterns in interaction sites of molecules | 2007-01-22 | 2008-01-22 |
Country Status (1)
| Country | Link |
|---|---|
| WO (1) | WO2008091225A1 (fr) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101339180B (zh) * | 2008-08-14 | 2012-05-23 | 南京工业大学 | Method for predicting combustion and explosion characteristics of organic compounds based on support vector machines |
| WO2019178610A1 (fr) * | 2018-03-16 | 2019-09-19 | uBiome, Inc. | Method and system for characterization of metabolism-associated conditions, including diagnostics and therapies, based on a bioinformatics approach |
| US10733499B2 (en) | 2014-09-02 | 2020-08-04 | University Of Kansas | Systems and methods for enhancing computer assisted high throughput screening processes |
| CN117116384A (zh) * | 2023-10-20 | 2023-11-24 | 聊城高新生物技术有限公司 | Target-induced method for generating pharmaceutical molecular structures |
| CN117275571A (zh) * | 2023-08-24 | 2023-12-22 | 安徽大学 | RNA-small molecule binding site prediction method and system |
| CN118038977A (zh) * | 2024-04-12 | 2024-05-14 | 山东大学 | Protein binding site identification method and system based on geometric deep learning |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6182016B1 (en) * | 1997-08-22 | 2001-01-30 | Jie Liang | Molecular classification for property prediction |
-
2008
- 2008-01-22 WO PCT/SG2008/000025 patent/WO2008091225A1/fr not_active Ceased
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN101339180B (zh) * | 2008-08-14 | 2012-05-23 | 南京工业大学 | Method for predicting combustion and explosion characteristics of organic compounds based on support vector machines |
| US10733499B2 (en) | 2014-09-02 | 2020-08-04 | University Of Kansas | Systems and methods for enhancing computer assisted high throughput screening processes |
| WO2019178610A1 (fr) * | 2018-03-16 | 2019-09-19 | uBiome, Inc. | Method and system for characterization of metabolism-associated conditions, including diagnostics and therapies, based on a bioinformatics approach |
| CN117275571A (zh) * | 2023-08-24 | 2023-12-22 | 安徽大学 | RNA-small molecule binding site prediction method and system |
| CN117116384A (zh) * | 2023-10-20 | 2023-11-24 | 聊城高新生物技术有限公司 | Target-induced method for generating pharmaceutical molecular structures |
| CN117116384B (zh) * | 2023-10-20 | 2024-01-09 | 聊城高新生物技术有限公司 | Target-induced method for generating pharmaceutical molecular structures |
| CN118038977A (zh) * | 2024-04-12 | 2024-05-14 | 山东大学 | Protein binding site identification method and system based on geometric deep learning |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Smith et al. | Prediction of protein–protein interactions by docking methods | |
| Skolnick et al. | FINDSITE: a combined evolution/structure-based approach to protein function prediction | |
| Desaphy et al. | Comparison and druggability prediction of protein–ligand binding sites from pharmacophore-annotated cavity shapes | |
| Loewenstein et al. | Protein function annotation by homology-based inference | |
| Singh et al. | AADS-An automated active site identification, docking, and scoring protocol for protein targets based on physicochemical descriptors | |
| Venkatraman et al. | Protein-protein docking using region-based 3D Zernike descriptors | |
| Rahman et al. | Use of computer in drug design and drug discovery: A review | |
| Kawabata et al. | 3D flexible alignment using 2D maximum common substructure: dependence of prediction accuracy on target-reference chemical similarity | |
| Fernández‐Recio | Prediction of protein binding sites and hot spots | |
| Daga et al. | Template-based protein modeling: recent methodological advances | |
| Sugita et al. | New protocol for predicting the ligand-binding site and mode based on the 3D-RISM/KH theory | |
| Osaki et al. | 3D-RISM-AI: a machine learning approach to predict protein–ligand binding affinity using 3D-RISM | |
| WO2008091225A1 (fr) | Comparative detection of structure patterns in interaction sites of molecules | |
| Chelur et al. | Birds-binding residue detection from protein sequences using deep resnets | |
| Jin et al. | Protein structure prediction in CASP13 using AWSEM-suite | |
| Xu et al. | OPUS-Rota3: improving protein side-chain modeling by deep neural networks and ensemble methods | |
| Krull et al. | ProPairs: a data set for protein–protein docking | |
| Olson et al. | Prediction of protein loop conformations using multiscale modeling methods with physical energy scoring functions | |
| Strömbergsson et al. | Interaction model based on local protein substructures generalizes to the entire structural enzyme-ligand space | |
| Feldman et al. | Pocket similarity: are α carbons enough? | |
| Guterres et al. | CHARMM-GUI LBS finder & refiner for ligand binding site prediction and refinement | |
| US20070166760A1 (en) | Ligand searching device, ligand searching method, program, and recording medium | |
| Li et al. | Simultaneous prediction of interaction sites on the protein and peptide sides of complexes through multilayer graph convolutional networks | |
| Krotzky et al. | Extraction of protein binding pockets in close neighborhood of bound ligands makes comparisons simple due to inherent shape similarity | |
| Jarmolinska et al. | DCA-MOL: a PyMOL plugin to analyze direct evolutionary couplings |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08705418; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 08705418; Country of ref document: EP; Kind code of ref document: A1 |