CN110379467B - Chemical molecular formula segmentation method - Google Patents
- Publication number
- CN110379467B (application number CN201910645530.4A)
- Authority
- CN
- China
- Prior art keywords
- atoms
- mol
- cut
- atom
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16C—COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
- G16C20/00—Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
- G16C20/10—Analysis or design of chemical reactions, syntheses or processes
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P20/00—Technologies relating to chemical industry
- Y02P20/50—Improvements relating to the production of bulk chemicals
- Y02P20/55—Design of synthesis routes, e.g. reducing the use of auxiliary or protecting groups
Landscapes
- Chemical & Material Sciences (AREA)
- Analytical Chemistry (AREA)
- Chemical Kinetics & Catalysis (AREA)
- Crystallography & Structural Chemistry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The invention belongs to the technical field of compound synthesis and, in particular, relates to a method for segmenting chemical molecular formulas.
Background
In a compound synthesis workflow, reagent molecules carrying protecting groups are first deprotected and then reacted with one another to obtain new compounds. From a chemical point of view, both steps are chemical reactions. From a computational point of view, the reaction that removes a protecting group is a "site segmentation" of the molecule, which yields sites that can react with (connect to) other molecules; the subsequent reaction between molecules is a "site splicing", which connects the reactive sites to produce a new compound.
The current computational approach records the site-bearing molecules obtained in the first step and labels the sites produced by different reactions distinctly. For example, a site obtained through reaction type 1 is labeled [R1]; the position of the site is the position of the atoms that the molecule loses in the reaction. Molecules carrying the same site label are then spliced together: two molecules that both carry [R1] can be spliced.
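The label-matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label syntax `[R<number>]` is taken from the [R1] example, while the function names `site_labels` and `can_splice` are hypothetical and do not come from the patent.

```python
import re

# Assumed label syntax: "[R<number>]", as in the [R1] example in the text.
SITE_PATTERN = re.compile(r"\[R(\d+)\]")


def site_labels(smiles):
    """Return the set of reaction-type numbers i for which [Ri] appears."""
    return {int(number) for number in SITE_PATTERN.findall(smiles)}


def can_splice(smiles_a, smiles_b):
    """Two molecules are splice candidates if they share a site label."""
    return bool(site_labels(smiles_a) & site_labels(smiles_b))
```

For instance, `can_splice("CC[R1]", "[R1]C(=O)O")` holds because both strings carry [R1], whereas a molecule carrying only [R2] would not be paired with either of them.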
Current computational chemistry tools such as Open Babel and RDKit perform segmentation by simulating the real chemical reaction process: the compounds to be reacted are given as input, and the reacted compounds are returned as output.
Summary of the Invention
In view of the shortcomings of the prior art, the object of the present invention is to provide a segmentation method that operates on the graph structure of a molecular formula, that is, a method that uses a specific type of chemical reaction to cut a certain group off the molecular formula. Segmentation simulates one sub-process of a chemical reaction: in a real reaction, part of the reacting compound is cut off, and the remainder then combines with other compounds to form a new compound. When compounds are synthesized from a large number of reagent molecules, molecular formula segmentation is an inherent sub-step, because a reagent molecule must first lose a certain group before it can combine with other compounds.
The object of the present invention is achieved through the following technical solution: a chemical molecular formula segmentation method in which, for a molecular formula smi with or without sites, each site in the formula is denoted [Ri], i = 0, 1, …, N, where [Ri] corresponds to a certain reaction type and N is the total number of reaction types;
For a reaction type Reaction, a reaction type string smi_sub and a conversion type string smi_convert are defined, meaning that if smi contains the substructure smi_sub, that part is converted to smi_convert. smi_sub contains at least one site [Ri] connected to the rest of smi, smi_convert has the same site [Ri] at the corresponding position, and smi_convert also contains the atoms to be cut, written [*]. The segmentation process is as follows:
(1) Read smi and smi_sub and convert them to the graph structures mol and mol_sub. Generate maplist2, the atom mapping table from mol_sub to mol. If mol contains no mol_sub substructure, no segmentation is performed. If mol contains several mol_sub substructures, maplist2 stores several mappings. During mapping, check whether the atom corresponding to each connection site satisfies the restriction that smi_sub places on the type of [Ri]; if it does not, the mapping fails. When smi_sub is read, record the positions of the connection sites [Ri].
(2) Read smi_convert and convert it to the graph structure mol_convert. Generate maplist1, the atom mapping table from mol_convert to mol_sub. Find cut_idx, the positions of the cut atoms [*], and cut_connect_idx, the positions of the atoms bonded to the cut atoms [*]. For each atom in mol_convert that has no corresponding atom in mol_sub, -1 is recorded as its mapped position.
(3) For each mapping in maplist2, produce one segmentation result:
(3.1) Add mol and mol_sub to obtain the updated mol.
(3.2) Determine whether the mapping would cut open a ring structure in mol. If so, skip this mapping and proceed to the next one; otherwise, go to step (3.3).
(3.3) Using the mapping, collect the positions of the atoms in mol that are not to be cut, that is, the positions corresponding to the elements of maplist1 greater than -1, and store them in not_cut_idxs.
Collect the positions in mol of the atoms bonded to the cut atoms [*], that is, the atoms corresponding to cut_connect_idx, and store them in cut_connect_idxs.
Collect the positions in mol of all atoms to be cut, that is, the positions corresponding to the elements of maplist1 equal to -1, and store them in cut_idxs. For the atom at each cut point, also store in cut_idxs those of its neighboring atoms that are not in not_cut_idxs. One cut_idxs list is maintained per cut position; if no corresponding atom can be found in mol for a position, the list for that position is empty.
For the cut_idxs list of each cut position: if the list is empty, cut a hydrogen atom at that position; if the position has no hydrogen atom, abandon this mapping and proceed to the next one. If the list is not empty, collect the atoms to be cut at that position, namely the atom at that position and its neighbors that are not in cut_connect_idxs, and recursively collect the neighbors of those neighbors.
Collect the positions of the atoms to be cut in mol, that is, those atoms in cut_connect_idx, including the atom at each position and its neighbors that are in neither cut_connect_idx nor cut_idx (these atoms have new positions in the new mol), and recursively collect the neighbors of those neighbors.
(3.4) Connect atoms: join the corresponding atoms in cut_connect_idxs and cut_idx with chemical bonds.
(3.5) Copy bond properties: copy the bond properties from mol_sub to the newly added bonds in mol. Collect the newly added atoms, that is, the atoms at the two ends of each newly added bond.
(3.6) Update the \ and / stereo properties of double bonds: if the stereo property structure of mol contains atoms to be cut, the stereo properties must be updated. The double-bond stereo property structure stores the atoms at the two ends of the double bond and the atoms bonded to them. If there is a bond between an end atom in this structure and a newly added atom from the previous step, the corresponding atom in the structure is replaced with the newly added atom.
(3.7) Remove the collected atoms to be cut.
(3.8) Output the segmented molecule for this maplist2 mapping.
Further, according to the characteristics of each reaction type, restrictions are placed on the type of atom represented by the site [Ri] in the reaction type string smi_sub, that is, on the type of the atom in the formula being segmented to which [Ri] corresponds. For example, the site [Ri] may be required not to be a hydrogen atom (H), to be only a C, N, or O atom, or be allowed to be any atom. The restrictions follow the SMARTS specification.
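As a rough illustration of the three kinds of restriction just listed, the sketch below checks an element symbol against a named rule. The rule names and the function `site_accepts` are hypothetical; the patent expresses the restrictions in SMARTS, and the SMARTS atom expressions in the comments are one plausible encoding rather than the patent's own strings.

```python
# Hypothetical rule names; each maps to a predicate over an element symbol.
RESTRICTIONS = {
    "any": lambda element: True,                                # SMARTS: [*]
    "not_hydrogen": lambda element: element != "H",             # SMARTS: [!#1]
    "c_n_o_only": lambda element: element in {"C", "N", "O"},   # SMARTS: [#6,#7,#8]
}


def site_accepts(restriction, element):
    """Return True if an atom of this element may stand at the site [Ri]."""
    return RESTRICTIONS[restriction](element)
```

A mapping would be rejected when `site_accepts` returns False for the atom matched to [Ri], which corresponds to the "mapping fails" outcome described in step (1).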
Further, in step (1), the molecular formula is read in Canonical SMILES format; input in other formats is converted to this format.
Further, in step (1), the mapping checks whether the atom corresponding to each connection site satisfies the restriction that smi_sub places on the type of [Ri]; if it does not, the mapping fails.
Further, in step (3.2), determining whether the mapping would cut open a ring structure in mol means determining whether mol_sub has been mapped onto part of a ring structure of mol. Specifically, the mapping in maplist2 is used to find the atom A in mol that corresponds to the site [Ri] to be cut; if A has a neighboring atom A_neighbor that is not in the list of atoms to be cut and A_neighbor is on a ring, the ring would be cut open.
Further, in step (3.5), copying the bond properties from mol_sub to the newly added bonds in mol means finding the atoms at the two ends of each bond in mol_sub, finding the two corresponding atoms in mol, and updating the properties of the bond between those two atoms to the property values of the corresponding bond in mol_sub.
The beneficial effects of the present invention are as follows: the invention segments molecular formulas on the graph structure of atoms and chemical bonds, which makes the operation flexible and highly customizable and suitable for many segmentation scenarios. Using the atom mappings between the formula to be segmented and the reaction type string, and between the reaction type string and the conversion type string, the method determines which branches on which atoms of the formula are to be cut and which are to be kept, and thus segments the formula correctly.
Brief Description of the Drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a segmentation example of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and do not limit it.
The chemical molecular formula segmentation method provided by the present invention is as follows:
For a molecular formula smi with or without sites (the input format is assumed to be Canonical SMILES; input in other formats can be converted to this format), each site in the formula is denoted [Ri], i = 0, 1, …, N, where [Ri] corresponds to a certain reaction type and is obtained by subjecting a reagent molecule to a chemical reaction of that type, and N is the total number of reaction types;
For a reaction type Reaction, a reaction type string smi_sub and a conversion type string smi_convert are defined, meaning that if smi contains the substructure smi_sub, that part is converted to smi_convert. smi_sub contains at least one site [Ri] connected to the rest of smi, smi_convert has the same site [Ri] at the corresponding position, and smi_convert also contains the atoms to be cut, written [*]. In particular, according to the characteristics of each reaction type, restrictions are placed on the type of atom represented by the site [Ri] in smi_sub, that is, on the type of the atom in the formula being segmented to which [Ri] corresponds: for example, the site [Ri] may be required not to be a hydrogen atom (H), to be only a C, N, or O atom, or be allowed to be any atom. The restrictions are handled mainly according to the SMARTS specification. The segmentation process is as follows:
1. Read smi and smi_sub and convert them to the graph structures mol and mol_sub. Generate maplist2, the atom mapping table from mol_sub to mol. If mol contains no mol_sub substructure, no segmentation is performed, which indicates that the molecular formula smi cannot undergo a chemical reaction of type Reaction. If mol contains several mol_sub substructures, maplist2 stores several mappings, which indicates that several groups in smi can undergo this type of reaction. During mapping, check whether the atom corresponding to each connection site satisfies the restriction that smi_sub places on the type of [Ri]; if it does not, the mapping fails.
When smi_sub is read, record the positions of the connection sites [Ri].
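To make step 1 concrete, here is a minimal backtracking sketch that enumerates every mapping from mol_sub to mol. It is not the patent's implementation: the graph encoding (a list of element symbols plus a dict of bond orders), the symbol "R" for the site, the default "not hydrogen" restriction, and the names `find_mappings` and `bonds` are all assumptions made for illustration. The data reproduce the [S]C=C pattern and the CCOC(=O)/C=C/C(O)=O molecule from the example in the text, with stereo marks and implicit hydrogens ignored.

```python
def find_mappings(q_atoms, q_bonds, t_atoms, t_bonds,
                  site_ok=lambda element: element != "H"):
    """Return every mapping (query atom index -> target atom index).

    Atoms are element symbols; "R" in the query marks a connection site.
    Bonds are dicts mapping frozenset({i, j}) to a bond order.
    """
    def atom_ok(q, t):
        if q_atoms[q] == "R":
            return site_ok(t_atoms[t])  # restriction on the site type
        return q_atoms[q] == t_atoms[t]

    def bonds_ok(q, t, partial):
        # Each query bond to an already-mapped atom must exist in the
        # target with the same bond order.
        for prev_q, prev_t in enumerate(partial):
            order = q_bonds.get(frozenset((q, prev_q)))
            if order is not None and t_bonds.get(frozenset((t, prev_t))) != order:
                return False
        return True

    results = []

    def extend(partial):
        q = len(partial)
        if q == len(q_atoms):
            results.append(list(partial))
            return
        for t in range(len(t_atoms)):
            if t not in partial and atom_ok(q, t) and bonds_ok(q, t, partial):
                partial.append(t)
                extend(partial)
                partial.pop()

    extend([])
    return results


def bonds(*triples):
    """Build a bond dict from (atom_a, atom_b, order) triples."""
    return {frozenset((a, b)): order for a, b, order in triples}


# mol_sub: [S]C=C, with the site written as "R"
mol_sub_atoms = ["R", "C", "C"]
mol_sub_bonds = bonds((0, 1, 1), (1, 2, 2))
# mol: CCOC(=O)C=CC(O)=O, atoms numbered in SMILES order
mol_atoms = ["C", "C", "O", "C", "O", "C", "C", "C", "O", "O"]
mol_bonds = bonds((0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 2), (3, 5, 1),
                  (5, 6, 2), (6, 7, 1), (7, 8, 1), (7, 9, 2))

maplist2 = find_mappings(mol_sub_atoms, mol_sub_bonds, mol_atoms, mol_bonds)
```

In this sketch the pattern matches the C=C double bond from either side, so `maplist2` holds two mappings, which corresponds to the case where several mappings are stored. An empty list would correspond to the case where no segmentation is performed.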
2. Read smi_convert and convert it to the graph structure mol_convert. Generate maplist1, the atom mapping table from mol_convert to mol_sub. Find cut_idx, the positions of the cut atoms [*], and cut_connect_idx, the positions of the atoms bonded to the cut atoms [*]. For each atom in mol_convert whose corresponding atom cannot be found in mol_sub, because it has been cut off or replaced with a longer substring, -1 is recorded in maplist1 as its mapped position.
3. For each mapping in maplist2, produce one segmentation result:
3.1. Add mol and mol_sub, that is, stack the data structures of the two, to obtain the updated mol.
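One way to picture the stacking in step 3.1 is to append the atoms of mol_sub after those of mol and shift the bond indices by the original atom count. The dictionary layout and the name `stack_molecules` below are assumptions for illustration; the patent does not specify the underlying data structure.

```python
def stack_molecules(mol, mol_sub):
    """Append mol_sub's atoms and bonds to a copy of mol.

    Returns the stacked molecule and the index offset at which
    mol_sub's atoms now start.
    """
    offset = len(mol["atoms"])
    stacked = {
        "atoms": mol["atoms"] + mol_sub["atoms"],
        "bonds": dict(mol["bonds"]),  # copy so the input is not modified
    }
    for (a, b), order in mol_sub["bonds"].items():
        stacked["bonds"][(a + offset, b + offset)] = order
    return stacked, offset


mol = {"atoms": ["C", "C", "O"], "bonds": {(0, 1): 1, (1, 2): 1}}
mol_sub = {"atoms": ["R", "C", "C"], "bonds": {(0, 1): 1, (1, 2): 2}}
stacked, offset = stack_molecules(mol, mol_sub)
```

After stacking, the two parts are still disconnected; the later steps add bonds between them and remove the cut atoms.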
3.2. Determine whether the mapping would cut open a ring structure in mol. If so, skip this mapping and proceed to the next one; otherwise, go to step 3.3. Determining whether a ring would be cut open means determining whether mol_sub has been mapped onto part of a ring structure of mol. Specifically, the mapping in maplist2 is used to find the atom A in mol that corresponds to the site [Ri] to be cut; if A has a neighboring atom A_neighbor that is not in the list of atoms to be cut and A_neighbor is on a ring, the ring would be cut open.
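The ring test in step 3.2 can be sketched on a plain adjacency dictionary. The code follows a literal reading of the rule as worded (a neighbor of A that is not being cut and lies on a ring); the helper names `in_ring` and `would_cut_ring` and the adjacency encoding are hypothetical.

```python
from collections import deque


def in_ring(adj, atom):
    """Return True if `atom` lies on a cycle of the molecular graph."""
    for start in adj[atom]:
        # Look for a path from `start` back to `atom` that does not
        # reuse the direct bond between them.
        seen = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in adj[current]:
                if current == start and nxt == atom:
                    continue  # skip the direct bond
                if nxt == atom:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return False


def would_cut_ring(adj, site_atom, cut_atoms):
    """Apply the rule of step 3.2 to atom A = site_atom."""
    return any(
        neighbor not in cut_atoms and in_ring(adj, neighbor)
        for neighbor in adj[site_atom]
    )


# Six-membered ring 0..5 with a substituent atom 6 attached to atom 0.
ring_adj = {0: {1, 5, 6}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4},
            4: {3, 5}, 5: {4, 0}, 6: {0}}
# Open chain 0-1-2.
chain_adj = {0: {1}, 1: {0, 2}, 2: {1}}
```

With these graphs, `would_cut_ring(ring_adj, 6, set())` is true because atom 0 is a ring atom that is not being cut, while `would_cut_ring(chain_adj, 1, {2})` is false because no ring is present.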
3.3. Using the mapping, collect the positions of the atoms in mol that are not to be cut, that is, the positions corresponding to the elements of maplist1 greater than -1, and store them in not_cut_idxs.
Collect the positions in mol of the atoms bonded to the cut atoms [*], that is, the atoms corresponding to cut_connect_idx, and store them in cut_connect_idxs.
Collect the positions in mol of all atoms to be cut, that is, the positions corresponding to the elements of maplist1 equal to -1, and store them in cut_idxs. For the atom at each cut point, also store in cut_idxs those of its neighboring atoms that are not in not_cut_idxs; because these atoms no longer exist in the conversion string, they must be cut as well. One cut_idxs list is maintained per cut position; if no corresponding atom can be found in mol for a position, the list for that position is empty.
For the cut_idxs list of each cut position: if the list is empty, cut a hydrogen atom at that position; if the position has no hydrogen atom, abandon this mapping and proceed to the next one. If the list is not empty, collect the atoms to be cut at that position, namely the atom at that position and its neighbors that are not in cut_connect_idxs, and recursively collect the neighbors of those neighbors.
Collect the positions of the atoms to be cut in mol, that is, those atoms in cut_connect_idx, including the atom at each position and its neighbors that are in neither cut_connect_idx nor cut_idx (these atoms have new positions in the new mol), and recursively collect the neighbors of those neighbors.
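The recursive collection in step 3.3 amounts to a graph traversal that starts at a cut atom and stops at the atoms that must be kept. The sketch below uses an explicit stack rather than recursion; the name `collect_branch` and the `blocked` parameter (standing in for cut_connect_idxs) are assumptions for illustration.

```python
def collect_branch(adj, start, blocked):
    """Collect `start` and every atom reachable from it without
    passing through an atom in `blocked`."""
    to_cut = set()
    stack = [start]
    while stack:
        atom = stack.pop()
        if atom in to_cut or atom in blocked:
            continue
        to_cut.add(atom)
        stack.extend(adj[atom])
    return to_cut


# Chain 0-1-2 with two further atoms, 3 and 4, attached to atom 2.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3, 4}, 3: {2}, 4: {2}}
branch = collect_branch(adj, start=2, blocked={1})
```

Here atom 1 plays the role of the atom bonded to the cut atom, so the traversal gathers the whole branch hanging off atom 2 and leaves atoms 0 and 1 in place.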
3.4. Connect atoms: join the corresponding atoms in cut_connect_idxs and cut_idx with chemical bonds.
3.5. Copy bond properties: copy the bond properties from mol_sub to the newly added bonds in mol, that is, find the atoms at the two ends of each bond in mol_sub, find the two corresponding atoms in mol, and update the properties of the bond between those two atoms to the property values of the corresponding bond in mol_sub. Collect the newly added atoms, that is, the atoms at the two ends of each newly added bond.
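Step 3.5 can be illustrated as a lookup through the atom mapping: each bond of mol_sub is translated to the corresponding pair of atoms in mol, and its property is written there. In this sketch the only bond property is the bond order, and the name `copy_bond_orders` is hypothetical; a real implementation would presumably copy further properties as well.

```python
def copy_bond_orders(sub_bonds, mol_bonds, mapping):
    """Copy bond orders from mol_sub onto the mapped bonds of mol.

    sub_bonds, mol_bonds: dicts mapping frozenset({i, j}) to a bond order.
    mapping: list giving, for each mol_sub atom, its atom index in mol.
    Returns the updated bond dict and the set of atoms at the ends of
    the updated bonds.
    """
    updated = dict(mol_bonds)
    touched = set()
    for pair, order in sub_bonds.items():
        a, b = tuple(pair)
        key = frozenset((mapping[a], mapping[b]))
        updated[key] = order
        touched |= key
    return updated, touched


sub_bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 2}
mol_bonds = {frozenset((3, 5)): 1, frozenset((5, 6)): 1}
new_bonds, new_atoms = copy_bond_orders(sub_bonds, mol_bonds, [3, 5, 6])
```

The returned set of end atoms corresponds to the "newly added atoms" that the following step consults when updating double-bond stereo properties.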
3.6. Update the \ and / stereo properties of double bonds: because the stereo properties are stored separately, they must be handled separately. If the stereo property structure of mol contains atoms to be cut, the stereo properties must be updated. The double-bond stereo property structure stores the atoms at the two ends of the double bond and the atoms bonded to them. If there is a bond between an end atom in this structure and a newly added atom from the previous step, the corresponding atom in the structure is replaced with the newly added atom.
3.7. Remove the collected atoms to be cut.
3.8. Output the segmented molecule for this maplist2 mapping, that is, convert mol to Canonical SMILES format.
A segmentation example follows. In Fig. 2, panels (a), (b), (c), and (d) show, in order, the reaction type, the conversion type, the molecular formula to be segmented, and the final segmentation result, where Xx denotes the [*] atom and S denotes the [R] atom. The corresponding chemical formulas are, in order:
[S]C=C
[S]C([*])C[*]
CCOC(=O)/C=C/C(O)=O
CCOC(=O)C(C(C(=O)O)[*])[*]
maplist1, the atom mapping table from mol_convert (the second formula) to mol_sub (the first formula), is {0,1,-1,2,-1}.
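The maplist1 value above can be reproduced with a short sketch. It relies on a simplifying assumption that holds for this example but is not stated in the patent: apart from the [*] atoms, the atoms of mol_convert appear in the same order as the atoms of mol_sub. The function name `build_maplist1` and the list-of-symbols encoding are likewise hypothetical.

```python
def build_maplist1(convert_atoms, sub_atoms):
    """Map each mol_convert atom to its mol_sub index, or -1 for [*].

    Assumes the non-[*] atoms occur in the same order in both lists.
    """
    maplist1 = []
    next_sub = 0
    for symbol in convert_atoms:
        if symbol == "*":
            maplist1.append(-1)  # atom has no counterpart in mol_sub
            continue
        if next_sub >= len(sub_atoms) or sub_atoms[next_sub] != symbol:
            raise ValueError("atom order differs between the two strings")
        maplist1.append(next_sub)
        next_sub += 1
    return maplist1


convert_atoms = ["S", "C", "*", "C", "*"]   # [S]C([*])C[*]
sub_atoms = ["S", "C", "C"]                  # [S]C=C
maplist1 = build_maplist1(convert_atoms, sub_atoms)
cut_idx = [i for i, symbol in enumerate(convert_atoms) if symbol == "*"]
```

For the example strings this gives the mapping 0, 1, -1, 2, -1, and `cut_idx` lists the positions of the two [*] atoms in mol_convert.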
It should be noted that the summary and the specific embodiments are intended to demonstrate the practical application of the technical solution provided by the present invention and should not be construed as limiting its scope of protection. Any modification or change made to the invention within its spirit and the scope of protection of the claims falls within the scope of protection of the invention.
Claims (6)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910645530.4A CN110379467B (en) | 2019-07-17 | 2019-07-17 | Chemical molecular formula segmentation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910645530.4A CN110379467B (en) | 2019-07-17 | 2019-07-17 | Chemical molecular formula segmentation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110379467A (en) | 2019-10-25 |
CN110379467B (en) | 2022-08-19 |
Family
ID=68253640
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910645530.4A Active CN110379467B (en) | 2019-07-17 | 2019-07-17 | Chemical molecular formula segmentation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110379467B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111613277A (en) * | 2020-05-22 | 2020-09-01 | 重庆大学 | A Knowledge Representation Method in the Field of Hazardous Chemicals |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2005350383A (en) * | 2004-06-09 | 2005-12-22 | Canon Inc | Denatured protein activation structure, method for producing the same, and denatured protein activation method |
CN101039870A (en) * | 2004-10-14 | 2007-09-19 | 国际商业机器公司 | Programmable molecular manipulating processes |
JP2009173902A (en) * | 2007-12-25 | 2009-08-06 | Sumitomo Chemical Co Ltd | Method for decomposing aromatic ether compounds |
CN104039809A (en) * | 2011-10-10 | 2014-09-10 | 希望之城公司 | Meditope and meditope-binding antibodies and uses thereof |
CN109661232A (en) * | 2016-05-23 | 2019-04-19 | 纽约哥伦比亚大学董事会 | Nucleotide derivatives and methods of use thereof |
CN110379468A (en) * | 2019-07-17 | 2019-10-25 | 成都火石创造科技有限公司 | A kind of improved chemical molecular formula cutting method |
CN110390997A (en) * | 2019-07-17 | 2019-10-29 | 成都火石创造科技有限公司 | A method for splicing chemical molecular formulas |
Non-Patent Citations (4)
Title |
---|
Segmentation and the Entropic Elasticity of Modular Proteins;Ronen Berkovich等;《The Journal of Physical Chemistry Letters》;20180730;第9卷(第16期);4707-4713 * |
Single-molecule chemical reaction reveals molecular reaction kinetics and dynamics;Yuwei Zhang等;《nature communications》;20140625;1-8 * |
催化RNA切割反应的新型短结合臂脱氧核酶;王月瑶;《中国优秀硕士学位论文全文数据库_基础科学辑》;20190715;A006-126 * |
基于金和石墨烯纳米材料的生物分子化学发光新方法及其应用;骆凯;《中国博士学位论文全文数据库_工程科技Ⅰ辑》;20160115;B014-11 * |
Also Published As
Publication number | Publication date |
---|---|
CN110379467A (en) | 2019-10-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6262874B2 (en) | Database implementation method | |
JP5241738B2 (en) | Method and apparatus for building tree structure data from tables | |
US7702641B2 (en) | Method and system for comparing and updating file trees | |
CN101504662B (en) | A method and device for converting data | |
CN105607960A (en) | Repairing method and device of file system directory tree | |
CN110390997B (en) | Chemical molecular formula splicing method | |
CA2501608A1 (en) | System and method for schemaless data mapping with nested tables | |
CN110058969B (en) | Data recovery method and device | |
JP6331756B2 (en) | Test case generation program, test case generation method, and test case generation apparatus | |
Rahman et al. | Disk compression of k-mer sets | |
CN110379468B (en) | Improved chemical molecular formula segmentation method | |
CN110379467B (en) | Chemical molecular formula segmentation method | |
JP2018045286A (en) | Pre-processing device, index addition tree data correction method, and index addition tree data correction program | |
US20070083543A1 (en) | XML schema template builder | |
CN104378362A (en) | Method and device for carrying out conversion of message interfaces | |
Pons et al. | Generation of Level-$k$ LGT Networks | |
JP5867208B2 (en) | Data model conversion program, data model conversion method, and data model conversion apparatus | |
CN101553800B (en) | Migration apparatus which convert SAM/VSAM files of mainframe system into SAM/VSAM files of open system and method for thereof | |
US20210216566A1 (en) | Method, apparatus, and computer-readable medium for extracting a subset from a database | |
CN111414741B (en) | Method, device, equipment and medium for making layout templates for publications | |
CN115017161A (en) | Method, device and application for updating tree data structure by combining virtual DOM | |
CN102193947A (en) | Data access processing method and system | |
CN114356404A (en) | Interface document generating method, system and computer readable storage medium | |
JP4957618B2 (en) | Information processing apparatus and information processing program | |
CN115525435B (en) | A method and device for deep extraction of residual area data based on UBI image format |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20250908 Address after: 310000 Zhejiang Province Hangzhou City Binjiang District Xixing Street Qianmo Road 482 Building B 7th Floor Patentee after: Huoshi Creation Technology Co.,Ltd. Country or region after: China Address before: 610200 Sichuan Province, Chengdu City, Tianfu International Biomedical City (No. 18, Second Section of Shengwu City Middle Road, Shuangliu District) Patentee before: Chengdu Firestone Creation Technology Co.,Ltd. Country or region before: China |