CN114520022A - GPU parallel computing molecular similarity method, device, system and medium - Google Patents
- Publication number
- CN114520022A CN202210144227.8A
- Authority
- CN
- China
- Prior art keywords
- molecular
- query
- compound
- gpu
- parallel computing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5072—Grid computing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Evolutionary Biology (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Medical Informatics (AREA)
- Biophysics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Bioethics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Chemical & Material Sciences (AREA)
- Crystallography & Structural Chemistry (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical field
The invention relates to the technical field of computer-aided drug research and development, and in particular to a method, device, system and medium for computing molecular similarity in parallel on a GPU.
Background
Drug research and development is characterized by large investment, high risk and long cycles. A drug development cycle typically exceeds 10 years and costs hundreds of millions of US dollars, and these figures are rising year by year. Drug screening is a key step in drug discovery, and high-throughput virtual screening can greatly reduce screening time and cost, which is of great significance for accelerating drug development. Matching and quantitatively comparing the three-dimensional shapes and pharmacophores (groups of atoms with specific properties in a molecule) of two molecules is a principal method in drug design. Molecular volume is related to shape; shape determines the physicochemical properties of a molecule, and these properties determine its biological activity.
Molecular shape comparison is a common technique for identifying spatial features shared by two or more molecules, and it is a commonly used metric in ligand-based compound discovery. For 3D shape similarity search, the best-performing algorithm is "atom-centered Gaussian overlap"; ROCS, in the OpenEye Scientific Software package, applies this algorithm to compute similarity. Even with Gaussian-based optimization, however, ROCS takes a long time to scan a large compound database.
Developing a fast molecular shape comparison method with a high success rate has therefore become an urgent problem.
Summary of the invention
The object of the invention is to provide a method, device, system and medium for computing molecular similarity in parallel on a GPU, in order to speed up the calculation of 3D Gaussian overlap similarity and shorten the time required for large-scale compound similarity searches, while adding preprocessing to improve the success rate and efficiency of the search.
The technical solution adopted by the invention to solve the above technical problem is as follows:
A GPU parallel computing molecular similarity method, comprising the following steps:
inputting a query compound;
calculating the centroid of the query compound;
translating and/or rotating the query compound based on the centroid;
performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in a molecular library;
filtering according to the results of the comparison, and outputting the molecular compounds similar to the query compound.
In one embodiment, inputting the query compound comprises:
inputting the three-dimensional structure information of the query compound, the three-dimensional structure information including the type and coordinates of each atom in the query compound;
representing the volume of the query compound by Gaussian functions.
In one embodiment, calculating the centroid of the query compound comprises the following steps:
calculating the centroid of the query compound based on its three-dimensional structure information;
and performing a singular value decomposition (SVD) to calculate a 3D rotation matrix.
In one embodiment, translating and/or rotating the query compound based on the centroid comprises the following steps:
translating and/or rotating the query compound;
recording the translation and/or rotation of the query compound with a set of quaternions.
In one embodiment, performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in the molecular library, comprises the following steps:
loading the Gaussian functions of the query compound and the quaternions recording the compound's displacement into GPU memory;
running an optimization algorithm on multiple compute units of the GPU to calculate the 3D Gaussian overlap between the query compound and the molecular compounds in the molecular library.
In one embodiment, the optimization algorithm is BFGS and gradient descent.
Another embodiment of the invention provides a GPU parallel computing molecular similarity device, the device comprising:
a molecule input module for inputting a query compound;
a centroid calculation module for calculating the centroid of the query compound;
a molecule transformation module for translating and/or rotating the query compound based on the centroid;
a molecule comparison module for performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in a molecular library;
a result output module for filtering according to the results of the comparison and outputting the molecular compounds similar to the query compound.
Another embodiment of the invention provides a GPU parallel computing molecular similarity system, the system comprising at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the GPU parallel computing molecular similarity method described above.
Another embodiment of the invention provides a non-volatile computer-readable storage medium storing computer-executable instructions which, when executed by one or more processors, cause the one or more processors to perform the GPU parallel computing molecular similarity method described above.
Beneficial effects: the GPU parallel computing molecular similarity method, device, system and medium of the invention exploit the parallelism of the GPU to perform the Gaussian overlap calculation for 3D similarity on the GPU. Compared with a CPU-based implementation, the speed improves by an order of magnitude, shortening 3D similarity query times in virtual screening. In calculating the Gaussian overlap, the overlap function and its gradient are calculated, the coordinates are transformed, and the molecular overlap is optimized, all in parallel. To find the globally optimal overlap more quickly, the compound is also translated and rotated, and the transformed poses are used as starting search points, so that the computation on the GPU is more likely to find the global optimum. The final results are generated according to a similarity threshold chosen by the user, and the calculation is significantly faster than in the prior art.
Other features and advantages of the invention will be set forth in the description that follows and will in part become apparent from the description or be learned by practice of the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and the drawings.
Description of drawings
The invention is described in detail below with reference to the accompanying drawings, so as to make the above advantages of the invention clearer.
FIG. 1 is a flowchart of a GPU parallel computing molecular similarity method according to the invention.
FIG. 2 is a schematic diagram of the functional modules of an embodiment of a GPU parallel computing molecular similarity device according to the invention.
FIG. 3 is a schematic diagram of the hardware structure of an embodiment of a GPU parallel computing molecular similarity apparatus according to the invention.
Detailed description
To make the objects, technical solutions and effects of the invention clearer, the invention is described in further detail below. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
As shown in FIGS. 1-2, a GPU parallel computing molecular similarity method comprises the following steps:
S100: inputting a query compound;
The compound to be queried is extracted from a database or custom-built and then entered; feature points are extracted from the query compound to generate a search expression.
S200: calculating the centroid of the query compound;
When the 3D Gaussian overlap between the query compound and the compounds in the library is calculated, the query compound is adjusted in order to preserve accuracy and improve speed. For convenience of calculation and comparison, the centroid of the query compound is chosen as the reference point.
S300: translating and/or rotating the query compound based on the centroid;
The query compound is displaced relative to its centroid so that the globally optimal overlap can be found more quickly when the Gaussian overlap is calculated.
S400: performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in the molecular library;
GPU cores are well suited to tasks with simple control logic that are computation-heavy and highly parallel, and compared with a CPU a GPU offers a large amount of computing power. When the query compound is compared with the molecular compounds in the molecular library, the parallelism of the GPU is exploited by using the query compound both before and after transformation as starting search points. This greatly improves the efficiency of calculating the 3D Gaussian overlap between the query compound and the compounds in the library and shortens the search time. Combining GPU parallel computation with translation and/or rotation of the query compound also improves the success rate of the 3D Gaussian overlap calculation.
S500: filtering according to the results of the comparison, and outputting the molecular compounds similar to the query compound. Based on the comparison between the query compound and the molecular compounds in the molecular library, the Tanimoto 3D similarity is calculated from the 3D Gaussian overlap, and the molecular compounds in the library that are similar to the query compound are output.
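Step S500 can be illustrated with a minimal sketch. It assumes the widely used shape Tanimoto form O_AB / (O_AA + O_BB - O_AB), which the text does not spell out, and the function names, molecule names and overlap volumes are hypothetical values chosen only for illustration.

```python
def shape_tanimoto(o_ab: float, o_aa: float, o_bb: float) -> float:
    """Shape Tanimoto from overlap volumes: O_AB / (O_AA + O_BB - O_AB) (assumed form)."""
    denom = o_aa + o_bb - o_ab
    return o_ab / denom if denom > 0.0 else 0.0


def filter_hits(scores: dict, threshold: float) -> list:
    """Keep molecules whose score meets the user threshold, best first."""
    hits = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)


# Hypothetical data: query self-overlap, then (library self-overlap, best cross overlap).
o_query = 100.0
library = {"mol_a": (120.0, 90.0), "mol_b": (80.0, 40.0), "mol_c": (100.0, 100.0)}

scores = {
    name: shape_tanimoto(o_ab, o_query, o_bb)
    for name, (o_bb, o_ab) in library.items()
}
hits = filter_hits(scores, threshold=0.6)
```

The threshold plays the role of the user-chosen similarity cutoff; a score of 1.0 means the two Gaussian volumes coincide exactly in the best overlay found.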
In one embodiment, inputting the query compound comprises:
inputting the three-dimensional structure information of the query compound, the three-dimensional structure information including the type and coordinates of each atom in the query compound; obtaining the van der Waals radius corresponding to the type of each atom, and converting the three-dimensional structure information into a set of Gaussian spheres representing the atoms of the query compound, each Gaussian sphere having the same radius as the van der Waals radius of the corresponding atom and the same position as the coordinates of that atom;
representing the volume of the query compound by Gaussian functions.
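The Gaussian-sphere representation can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: each atom is modeled as p·exp(-α|r - R|²) with amplitude p = 2√2 and α chosen so that the Gaussian integrates to the hard-sphere volume, a common convention in Gaussian shape models, and the overlap volume is approximated by the sum of pairwise atom-atom overlaps only. The radii table and all names are hypothetical.

```python
import math

P = 2.0 * math.sqrt(2.0)  # assumed Gaussian amplitude
# Assumed width constant: makes P * (pi / alpha) ** 1.5 equal to (4/3) * pi * radius ** 3.
KAPPA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0)

# Illustrative van der Waals radii in angstroms (Bondi-type values).
VDW_RADIUS = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def to_gaussian_spheres(atoms):
    """Convert (element, (x, y, z)) records into (center, radius) Gaussian spheres."""
    return [(xyz, VDW_RADIUS[element]) for element, xyz in atoms]


def pair_overlap(center_i, radius_i, center_j, radius_j):
    """Closed-form overlap integral of two atom-centered Gaussians."""
    a_i = KAPPA / radius_i ** 2
    a_j = KAPPA / radius_j ** 2
    d2 = sum((u - v) ** 2 for u, v in zip(center_i, center_j))
    total = a_i + a_j
    return P * P * math.exp(-a_i * a_j * d2 / total) * (math.pi / total) ** 1.5


def overlap_volume(spheres_a, spheres_b):
    """First-order (pairwise only) approximation of the overlap volume."""
    return sum(
        pair_overlap(ca, ra, cb, rb) for ca, ra in spheres_a for cb, rb in spheres_b
    )


query = to_gaussian_spheres([("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))])
self_overlap = overlap_volume(query, query)
```

With these constants a single atom overlapped with itself gives exactly the hard-sphere volume (4/3)πr³, which is a convenient sanity check on the formula.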
In one embodiment, calculating the centroid of the query compound comprises the following steps:
calculating the centroid of the query compound based on its three-dimensional structure information;
and performing a singular value decomposition (SVD) to calculate a 3D rotation matrix.
Compared with the atom centers, the centroid plays an important role in the subsequent transformation of the query compound; it is calculated from the three-dimensional structure information of the molecule. To reduce interference from other data and the volume of computation, once the centroid of the query compound has been found, the 3D rotation matrix is calculated by SVD.
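The centroid step can be sketched in a few lines. The sketch below assumes an unweighted centroid by default, with optional per-atom weights for a mass-weighted center; the text does not say which is intended, and the function names are hypothetical. The SVD step that yields the rotation matrix is not shown here; in practice it would typically be delegated to a numerical library.

```python
def centroid(coords, weights=None):
    """Return the (optionally weighted) centroid of a list of (x, y, z) points."""
    if weights is None:
        weights = [1.0] * len(coords)
    total = sum(weights)
    return tuple(
        sum(w * c[k] for w, c in zip(weights, coords)) / total for k in range(3)
    )


def center_on_centroid(coords, weights=None):
    """Translate the points so that their centroid sits at the origin."""
    cx, cy, cz = centroid(coords, weights)
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


coords = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)]
center = centroid(coords)
centered = center_on_centroid(coords)
```

Centering first means that any later rotation acts about the centroid, so rotation and translation of the query can be handled independently.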
In one embodiment, translating and/or rotating the query compound based on the centroid comprises the following steps:
translating and/or rotating the query compound;
recording the translation and/or rotation of the query compound with a set of quaternions.
In three-dimensional space, any coordinate system can be represented by a 4×4 transformation matrix, in which the upper-left 3×3 block is the rotation matrix and the first three entries of the fourth column are the x, y and z coordinates of the translation.
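The 4×4 layout described above can be made concrete with a short sketch. The helper names are hypothetical, and the example rotation (90 degrees about the z axis) and translation are arbitrary illustrative values.

```python
import math


def make_transform(rotation, translation):
    """Build a 4x4 matrix: upper-left 3x3 rotation, fourth column translation."""
    rows = [list(rotation[i]) + [translation[i]] for i in range(3)]
    return rows + [[0.0, 0.0, 0.0, 1.0]]


def apply_transform(matrix, point):
    """Apply the homogeneous transform to a 3D point (rotate, then translate)."""
    vec = (point[0], point[1], point[2], 1.0)
    return tuple(sum(matrix[i][k] * vec[k] for k in range(4)) for i in range(3))


def rotation_z(theta):
    """3x3 rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


transform = make_transform(rotation_z(math.pi / 2.0), (1.0, 2.0, 3.0))
moved = apply_transform(transform, (1.0, 0.0, 0.0))
```

Here the point (1, 0, 0) is first rotated onto the y axis and then shifted by the translation column, landing at approximately (1, 3, 3).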
The most common representation of a quaternion is q = s + xi + yj + zk, where s, x, y, z ∈ R. Just as an arbitrary vector can be described by its length and its direction, a rotation can be described by a quaternion q = [cos θ, sin θ·v], where v is the unit vector along the rotation axis.
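A small sketch shows how such a quaternion rotates a point through the product q·p·q⁻¹. The snippet uses the half-angle form q = [cos(θ/2), sin(θ/2)·v], which is the usual convention when θ denotes the full rotation angle; if θ in the expression above is read as the half angle, the two forms agree. All function names are hypothetical.

```python
import math


def quat_from_axis_angle(axis, theta):
    """Unit quaternion (s, x, y, z) for a rotation of theta radians about axis."""
    norm = math.sqrt(sum(a * a for a in axis))
    half = 0.5 * theta
    scale = math.sin(half) / norm
    return (math.cos(half), axis[0] * scale, axis[1] * scale, axis[2] * scale)


def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    s1, x1, y1, z1 = a
    s2, x2, y2, z2 = b
    return (
        s1 * s2 - x1 * x2 - y1 * y2 - z1 * z2,
        s1 * x2 + x1 * s2 + y1 * z2 - z1 * y2,
        s1 * y2 - x1 * z2 + y1 * s2 + z1 * x2,
        s1 * z2 + x1 * y2 - y1 * x2 + z1 * s2,
    )


def rotate(q, point):
    """Rotate a 3D point with the unit quaternion q via q * p * conjugate(q)."""
    s, x, y, z = q
    conjugate = (s, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *point)), conjugate)
    return (rx, ry, rz)


q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)
rotated = rotate(q, (1.0, 0.0, 0.0))
```

Storing four numbers per rotation instead of a nine-entry matrix keeps the per-pose record small, which matters when many starting poses are loaded into GPU memory.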
When the 3D Gaussian overlap between the query compound and the compounds in the library is calculated, the query compound is usually adjusted in order to find the optimal solution. This embodiment uses a GPU for parallel computation to take full advantage of its high computing power and parallelism: at search time the query compound is first displaced, and its rotation is recorded as a set of quaternions. The query compound both before and after transformation is then used as a search starting point for comparison with the molecular compounds in the library, which improves the success rate and efficiency of the search.
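One way to picture the starting search points is the sketch below. The text does not say which transformations are used, so the choice here (the untransformed query plus 180-degree rotations about the x, y and z axes of the centered molecule) is purely an assumption for illustration, as are the names.

```python
# Assumed starting orientations as unit quaternions (s, x, y, z).
START_QUATERNIONS = [
    (1.0, 0.0, 0.0, 0.0),  # untransformed query
    (0.0, 1.0, 0.0, 0.0),  # 180 degrees about x
    (0.0, 0.0, 1.0, 0.0),  # 180 degrees about y
    (0.0, 0.0, 0.0, 1.0),  # 180 degrees about z
]


def quat_to_matrix(q):
    """Convert a unit quaternion (s, x, y, z) to a 3x3 rotation matrix."""
    s, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - s * z), 2 * (x * z + s * y)],
        [2 * (x * y + s * z), 1 - 2 * (x * x + z * z), 2 * (y * z - s * x)],
        [2 * (x * z - s * y), 2 * (y * z + s * x), 1 - 2 * (x * x + y * y)],
    ]


def starting_poses(centered_coords):
    """Return one rotated copy of the centered coordinates per start quaternion."""
    poses = []
    for q in START_QUATERNIONS:
        m = quat_to_matrix(q)
        poses.append(
            [
                tuple(sum(m[i][k] * p[k] for k in range(3)) for i in range(3))
                for p in centered_coords
            ]
        )
    return poses


poses = starting_poses([(1.0, 2.0, 3.0)])
```

Each pose then seeds its own local optimization, so a poor initial orientation of the query is less likely to trap the search in a local optimum.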
In one embodiment, performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in the molecular library, comprises the following steps:
loading the Gaussian functions of the query compound and the quaternions recording the compound's displacement into GPU memory;
To improve the success rate and efficiency of the search and to exploit the GPU's high computing power and parallelism, the query compound both before and after transformation is used as a search starting point.
running an optimization algorithm on multiple compute units of the GPU to calculate the 3D Gaussian overlap between the query compound and the molecular compounds in the molecular library.
With the query compound before and after transformation as search starting points, the 3D Gaussian overlaps with the compounds in the library are calculated simultaneously: the overlap function and its gradient are calculated, the coordinates are transformed, and the molecular overlap is optimized.
The advantage of parallel computation shows in two respects. First, the query compound can be overlaid on many different compounds in the molecular library at the same time, which improves efficiency. Second, different poses of the query compound can be overlaid on the same library compound at the same time, which speeds up the search for the optimal solution and improves accuracy.
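The two levels of parallelism can be pictured as a flat grid of independent work items, one per (library molecule, starting pose) pair. The sketch below runs sequentially in plain Python and only illustrates the indexing and the final reduction; the mapping, the names and the score values are assumptions, and on a GPU each work item would correspond to a thread or thread block.

```python
def work_item(flat_index, n_poses):
    """Map a flat, thread-like index to (library molecule index, start pose index)."""
    return divmod(flat_index, n_poses)


def best_per_molecule(scores, n_molecules, n_poses):
    """Reduce per-(molecule, pose) scores to the best score for each molecule."""
    best = [float("-inf")] * n_molecules
    for flat_index, score in enumerate(scores):
        mol, _pose = work_item(flat_index, n_poses)
        best[mol] = max(best[mol], score)
    return best


# Hypothetical optimized scores for 2 library molecules x 3 starting poses.
scores = [0.1, 0.5, 0.2, 0.9, 0.3, 0.4]
best = best_per_molecule(scores, n_molecules=2, n_poses=3)
```

Because no work item depends on another until the final reduction, the grid can be as large as the library size times the number of poses.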
In one embodiment, the optimization algorithm is BFGS and gradient descent.
In calculating the Gaussian overlap, the overlap function and its gradient are calculated, the coordinates are transformed, and the molecular overlap is optimized using a BFGS algorithm that supports parallel computation. BFGS is a quasi-Newton method. Newton's method approximately solves equations over the real and complex numbers: it uses the first few terms of the Taylor series of a function f(x) to find the roots of the equation f(x) = 0. Its chief characteristic is fast convergence.
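The Newton iteration described above, which follows from truncating the Taylor series after the linear term, can be sketched for a one-dimensional root-finding problem. This illustrates Newton's method itself rather than BFGS, which replaces the exact second-derivative information with an approximation built from successive gradients; the function name and the test equation x² - 2 = 0 are illustrative choices.

```python
def newton_root(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f using Newton's update x <- x - f(x) / f'(x)."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x


# Solve x^2 - 2 = 0 starting from x = 1.
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

Near the root the error roughly squares at each step, which is the fast convergence the text refers to.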
Gradient descent is one of the earliest, simplest and most commonly used optimization methods. It is simple to implement, and when the objective function is convex its solution is the global solution. In general, however, its solution is not guaranteed to be the global optimum, and it is not necessarily the fastest method. The idea of gradient descent is to use the negative gradient at the current position as the search direction; because this is the direction of fastest descent at the current position, the method is also called the "steepest descent method". The closer steepest descent gets to the target, the smaller its steps become and the more slowly it progresses.
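A minimal sketch of gradient descent on a convex function shows both properties mentioned above: the iterate reaches the global minimum, and the steps shrink as it approaches. The fixed learning rate, the function names and the test function f(x, y) = (x - 1)² + 2(y + 3)² are assumptions chosen for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Follow the negative gradient with a fixed learning rate; also record step lengths."""
    x = list(x0)
    step_sizes = []
    for _ in range(max_iter):
        g = grad(x)
        x = [xi - learning_rate * gi for xi, gi in zip(x, g)]
        step = learning_rate * sum(gi * gi for gi in g) ** 0.5
        step_sizes.append(step)
        if step < tol:
            break
    return x, step_sizes


def grad_f(p):
    """Gradient of the convex function f(x, y) = (x - 1)^2 + 2 * (y + 3)^2."""
    return [2.0 * (p[0] - 1.0), 4.0 * (p[1] + 3.0)]


minimum, steps = gradient_descent(grad_f, [0.0, 0.0])
```

The recorded step lengths decrease monotonically for this function, which is why the last iterations contribute very little progress per step.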
In summary, the above technical solution exploits the parallelism of the GPU to perform the Gaussian overlap calculation for 3D similarity on the GPU. Compared with a CPU-based implementation, the speed improves by an order of magnitude, shortening 3D similarity query times in virtual screening. In calculating the Gaussian overlap, the overlap function and its gradient are calculated, the coordinates are transformed, and the molecular overlap is optimized using a BFGS algorithm that supports parallel computation. To let BFGS find the globally optimal overlap more quickly, the compound is also translated and rotated to provide starting search points for BFGS, so that the computation on the GPU is more likely to find the global optimum. The final results are generated according to a similarity threshold chosen by the user. Practice shows that the parallel GPU method is far faster than the OpenEye ROCS CPU implementation.
Another embodiment of the invention provides a GPU parallel computing molecular similarity device, the device comprising:
a molecule input module for inputting a query compound;
a centroid calculation module for calculating the centroid of the query compound;
a molecule transformation module for translating and/or rotating the query compound based on the centroid;
a molecule comparison module for performing parallel computation on a GPU, using the query compound both before and after transformation as starting search points for comparison with the molecular compounds in a molecular library;
a result output module for filtering according to the results of the comparison and outputting the molecular compounds similar to the query compound.
Another embodiment of the invention provides a GPU parallel computing molecular similarity system. As shown in FIG. 3, the system 50 comprises:
one or more processors 510 and a memory 520. FIG. 3 takes one processor 510 as an example; the processor 510 and the memory 520 may be connected by a bus or by other means, and FIG. 3 takes a bus connection as an example.
The processor 510 carries out the various control logic of the system 50. It may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a microcontroller, an ARM (Acorn RISC Machine) processor or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these. The processor 510 may also be any conventional processor, microprocessor or state machine. It may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The memory 520, as a non-volatile computer-readable storage medium, can store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions corresponding to the GPU parallel computing molecular similarity method in the embodiments of the invention. By running the non-volatile software programs, instructions and units stored in the memory 520, the processor 510 executes the various functional applications and data processing of the system 50, that is, it implements the GPU parallel computing molecular similarity method of the above method embodiments.
The memory 520 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the system 50, and the like. In addition, the memory 520 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 520 optionally includes memory located remotely from the processor 510, and such remote memory may be connected to the system 50 over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
One or more units are stored in the memory 520 and, when executed by the one or more processors 510, perform the GPU parallel computing molecular similarity method of any of the above method embodiments, for example steps S100 to S400 of the method in FIG. 1 described above.
An embodiment of the invention provides a non-volatile computer-readable storage medium storing computer-executable instructions that are executed by one or more processors, for example to perform steps S100 to S500 of the method in FIG. 1 described above.
By way of example, non-volatile storage media can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM), which acts as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM) and direct Rambus RAM (DRRAM). The disclosed memory components or memories of the operating environments described herein are intended to include one or more of these and/or any other suitable types of memory.
Another embodiment of the invention provides a computer program product comprising a computer program stored on a non-volatile computer-readable storage medium. The computer program comprises program instructions which, when executed by a processor, cause the processor to perform the GPU parallel computing molecular similarity method of the above method embodiments, for example steps S100 to S400 of the method in FIG. 1 described above.
In summary, the invention discloses a method, device, system and medium for computing molecular similarity in parallel on a GPU. The method lightens the skeleton model while binding the skeleton model to the skin model, and provides tools such as skeleton fine-tuning, motion fine-tuning and skin fine-tuning to create natural, smooth three-dimensional virtual objects. It strictly controls the volume and transmission size of the data, greatly reduces loading wait time and the amount of computation in the program, and finally saves the file in a general-purpose format that is suitable for different three-dimensional model platforms and is therefore highly versatile.
以上所描述的实施例仅仅是示意性的,其中作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际需要选择其中的部分或者全部模块来实现本实施例方案的目的。The above-described embodiments are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, Alternatively, it can be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
通过以上的实施例的描述,本领域的技术人员可以清楚地了解到各实施例可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件实现。基于这样的理解,上述技术方案本质上或者说对相关技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存在于计算机可读存储介质中,如ROM/RAM、碱碟、光盘等,包括若干指今用以使得一台计算机电子设备(可以是个人计算机,服务器,或者网络电子设备等)执行各个实施例或者实施例的某些部分的方法。From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above-mentioned technical solutions can be embodied in the form of software products in essence, or the parts that make contributions to related technologies. The computer software products can exist in computer-readable storage media, such as ROM/RAM, alkaline disks , optical disc, etc., including several instructions for causing a computer electronic device (which may be a personal computer, a server, or a network electronic device, etc.) to perform various embodiments or methods of certain parts of the embodiments.
Conditional language such as "can," "could," "might," or "may," among others, unless specifically stated otherwise or otherwise understood within the context as used, is generally intended to convey that certain implementations can include, while other implementations do not include, certain features, elements, and/or operations. Thus, such conditional language is not generally intended to imply that features, elements, and/or operations are in any way required for one or more implementations, or that one or more implementations necessarily include logic for deciding, with or without input or prompting, whether these features, elements, and/or operations are included or are to be performed in any particular implementation.
What has been described in this specification and the accompanying drawings includes examples of a method, device, system and medium for GPU parallel computing of molecular similarity. It is, of course, not possible to describe every conceivable combination of elements and/or methods for the purpose of describing the various features of the present disclosure, but it can be recognized that many further combinations and permutations of the disclosed features are possible. Accordingly, it is apparent that various modifications can be made to the present disclosure without departing from its scope or spirit. In addition, or in the alternative, other embodiments of the present disclosure may be apparent from consideration of this specification and the accompanying drawings and from practice of the present disclosure as presented herein. The examples put forward in this specification and the accompanying drawings are intended to be considered in all respects as illustrative and not restrictive. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210144227.8A CN114520022B (en) | 2022-02-17 | 2022-02-17 | A GPU parallel calculation method, device, system and medium for molecular similarity |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210144227.8A CN114520022B (en) | 2022-02-17 | 2022-02-17 | A GPU parallel calculation method, device, system and medium for molecular similarity |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN114520022A (en) | 2022-05-20 |
| CN114520022B (en) | 2025-06-10 |
Family
ID=81599429
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210144227.8A Active CN114520022B (en) | 2022-02-17 | 2022-02-17 | A GPU parallel calculation method, device, system and medium for molecular similarity |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN114520022B (en) |
Citations (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1886659A (en) * | 2003-10-14 | 2006-12-27 | 维颂公司 | Molecular conformation and combination analysis method and instrument |
| US20090006395A1 (en) * | 2007-05-25 | 2009-01-01 | Isis Innovation Ltd. | Shape recognition methods and systems for searching molecular databases |
| KR20110085706A (en) * | 2010-01-21 | 2011-07-27 | 서울대학교산학협력단 | 3D alignment method of molecules |
| US20110213567A1 (en) * | 2010-02-26 | 2011-09-01 | The Board Of Trustees Of The Leland Stanford Junior University | Method for Rapidly Approximating Similarities |
| CN102436545A (en) * | 2011-10-13 | 2012-05-02 | 苏州东方楷模医药科技有限公司 | Diversity analysis method based on chemical structure with CPU (Central Processing Unit) acceleration |
| CN107209813A (en) * | 2014-11-25 | 2017-09-26 | 国家信息及自动化研究院 | Interaction parameters for the input set of molecular structures |
| CN107657146A (en) * | 2017-09-20 | 2018-02-02 | 广州市爱菩新医药科技有限公司 | Drug molecule comparative approach based on three-dimensional minor structure |
| CN108205613A (en) * | 2017-12-11 | 2018-06-26 | 华南理工大学 | The computational methods of similarity and system and their application between a kind of compound molecule |
| CN113409896A (en) * | 2020-03-16 | 2021-09-17 | Gsi 科技公司 | Molecular Similarity Search |
Non-Patent Citations (2)
| Title |
|---|
| MA, C et al.: "GPU Accelerated Chemical Similarity Calculation for Compound Library Comparison", Journal of Chemical Information and Modeling, vol. 51, no. 7, 21 June 2011 (2011-06-21) * |
| MAGGIONI, M et al.: "GPU-accelerated Chemical Similarity Assessment for Large Scale Databases", International Conference on Computational Science, ICCS 2011, vol. 4, 31 December 2011 (2011-12-31) * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN114520022B (en) | 2025-06-10 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| Zhang et al. | A novel quantum representation for log-polar images | |
| Zhao et al. | AttentionDTA: Drug–target binding affinity prediction by sequence-based deep learning with attention mechanism | |
| CN115148279B (en) | Method and device for predicting affinity between protein and ligand molecule | |
| Finkbeiner et al. | In-memory intelligence | |
| Sarkar et al. | An algorithm for DNA read alignment on quantum accelerators | |
| Zhang et al. | Learning all-in collaborative multiview binary representation for clustering | |
| KR102028404B1 (en) | Quantum RAM and quantum database utilizing quantum superposition | |
| CN114373509B (en) | GPU (graphics processing unit) acceleration AutoDock Vina-based method | |
| KR102705722B1 (en) | Method and apparatus for predicting protein-ligand docking for heme-protein | |
| CN114520022A (en) | GPU parallel computing molecular similarity method, device, system and medium | |
| Huang et al. | Moped: Efficient motion planning engine with flexible dimension support | |
| US20200321081A1 (en) | Method and device for computing stable binding structure and computer-readable recording medium recording program | |
| CN110111837B (en) | Method and system for searching protein similarity based on two-stage structure comparison | |
| CN115527626B (en) | Molecular processing method, molecular processing device, electronic apparatus, storage medium, and program product | |
| Liu et al. | G-learned index: Enabling efficient learned index on GPU | |
| Velentzas et al. | GPU-aided edge computing for processing the k nearest-neighbor query on SSD-resident data | |
| CN118918980B (en) | Method and device for generating initial structure of small molecule transition state based on conformation and force field | |
| WO2024161359A2 (en) | Compound representation and property analysis at scale | |
| Heidari et al. | DobLIX: A Dual-Objective Learned Index for Log-Structured Merge Trees | |
| Bishwas et al. | Molecular unfolding formulation with enhanced quantum annealing approach | |
| CN116994672A (en) | Screening method, apparatus, computer device, storage medium, and program product | |
| CN117577224B (en) | A template-based protein-small molecule complex modeling method and its application | |
| CN115393561A (en) | Method, device, equipment and storage medium for generating workpiece template coordinate system | |
| Matsantonis et al. | A Geometric Algebra Solution to the Absolute Orientation Problem | |
| Vasil’ev et al. | Hierarchical Assessment of the Structural Similarity of Pharmacologically Active Compounds |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |