CN111428056A

CN111428056A - Method and device for constructing scientific research personnel cooperative community

Info

Publication number: CN111428056A
Application number: CN202010340274.0A
Authority: CN
Inventors: 郑新章; 冯伟华; 王锐; 贾楠; 宗国浩; 刘亚丽; 王迪; 王永胜; 王峙
Original assignee: Zhengzhou Tobacco Research Institute of CNTC
Current assignee: Zhengzhou Tobacco Research Institute of CNTC
Priority date: 2020-04-26
Filing date: 2020-04-26
Publication date: 2020-07-17

Abstract

The invention relates to a method and device for constructing a cooperative community of scientific researchers, and belongs to the technical field of data processing. The method includes the following steps: obtaining cooperation data, namely scientific research projects and the researchers participating in them, and scientific research results and the researchers who produced them; screening the cooperation data according to researcher influence by deleting researchers whose influence is lower than a set value; generating a researcher cooperation network from the screened cooperation data, the network including the number of collaborations between each researcher and every other researcher, where the number of collaborations is the number of times two researchers have participated in the same scientific research project or produced the same scientific research result; and using a community discovery algorithm, based on the researcher cooperation network and researcher influence, to generate a researcher cooperative community map. The method reduces the amount of data processing and improves the construction efficiency and information accuracy of the map.

Description

A method and device for constructing a cooperative community of scientific researchers

Technical field

The invention relates to a method and device for constructing a cooperative community of scientific researchers, and belongs to the technical field of data processing.

Background art

A cooperative community in a science and technology knowledge graph is a small group of researchers with close cooperative relationships. Using the cooperative relationships that arise when researchers take part in projects and produce results of various kinds, the community structure among closely cooperating researchers and the relationships between communities are identified and displayed as a visual map, thereby revealing how researchers cooperate in scientific research activities. The cooperative community network in a science and technology knowledge graph has characteristics and community structure similar to those of real-life social networks: people belong to different communities, the whole network consists of several communities, nodes within each community are relatively closely connected, and connections between communities are relatively sparse. The size of each node and the thickness of each edge carry practical meaning, revealing the influence of the node, the cooperative relationships between collaborators, and the closeness of cooperation. Community discovery methods find groups with modular structure in complex network relationships and, combined with domain knowledge data, explore the community structure of people in the domain.

Community discovery algorithms were first applied in social networks to find social groups with shared interests and hobbies. In science and technology, every field has accumulated a large amount of valuable research activity data over years of development. Community discovery based on researchers' cooperation data yields a map of researcher communities with close cooperative relationships. Researchers in the same community have something in common in their academic research, and a large community also represents a direction of development and research in the corresponding technical field. Such a map provides valuable technical intelligence and research material, offers a reference for choosing research and development topics, supports the search for partners in related research and development work, and assists in discovering research teams in different research institutions.

Today, with technology developing intensively, no field or direction of research can be developed in isolation; exploring a topic in one field depends on technical support from other fields, which has produced a large number of interdisciplinary subjects and ever finer and more precise technical branches. Against this background, auxiliary or supporting technologies from other fields appear in every research direction of every field. As a result, the cooperation data of researchers in a given technical field, for example tobacco technology, contains not only researchers active in that field but also a large number of researchers active in peripheral or other fields. The cooperative relationships and cooperation data become complex and huge, so community discovery requires a large amount of computation, the algorithm runs inefficiently and occupies substantial hardware resources, and the data cannot conveniently be updated in real time or frequently. Information lags and the accuracy of the map is hard to maintain. In addition, each researcher in the cooperation data is the smallest unit of a cooperative community and a node on the final map; a large number of nodes makes the map cluttered and hard to read, so that direct and useful information is difficult to extract. Furthermore, judging the research direction of a cooperative community relies on the research directions of the researchers who are its nodes, so the presence of many researchers from other fields skews that judgment. The research direction and related topics reflected by the community become biased or even wrong, incorrect technical intelligence is conveyed, and the user experience suffers greatly.

Summary of the invention

The purpose of the present invention is to provide a method and device for constructing a cooperative community of scientific researchers, in order to solve the problems that arise when existing community discovery methods are used to construct cooperative communities in a technical field: a large amount of computation, poor readability of the generated community map, and biased community information in the map that affects the user experience.

To achieve the above purpose, the scheme of the present invention includes:

A method of the present invention for constructing a cooperative community of scientific researchers includes the following steps:

1) Obtain cooperation data. The cooperation data includes scientific research projects and the researchers participating in each project, and scientific research results and the researchers who produced each result. The scientific research results include papers.

2) Establish a researcher influence model and calculate each researcher's influence.

3) Screen the cooperation data. The screening includes deleting from the cooperation data every researcher whose influence is lower than the influence set value.

4) Generate a researcher cooperation network from the screened cooperation data. The network includes the number of collaborations between each researcher and every other researcher, where the number of collaborations is the number of times the two researchers have participated in the same scientific research project or produced the same scientific research result.

5) Use a community discovery algorithm, based on the researcher cooperation network and researcher influence, to generate a researcher cooperative community map.
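The collaboration count defined in step 4) can be illustrated with a minimal sketch. It assumes the cooperation data is available as a mapping from a project or result identifier to the researchers attached to it; the function name `build_cooperation_network` and the sample records are hypothetical and are not taken from the patent.

```python
from collections import Counter
from itertools import combinations


def build_cooperation_network(records):
    """Count pairwise collaborations.

    `records` maps a project or result ID to the researchers attached to it.
    Every unordered pair of researchers sharing a record gains one collaboration.
    """
    counts = Counter()
    for members in records.values():
        # sorted(set(...)) removes repeated names and gives each pair a stable key
        for a, b in combinations(sorted(set(members)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical sample data for illustration only
records = {
    "project-1": ["Zhang", "Li", "Wang"],
    "paper-1": ["Zhang", "Li"],
    "patent-1": ["Li", "Chen"],
}
network = build_cooperation_network(records)
print(network[("Li", "Zhang")])  # 2: one shared project and one shared paper
```

Storing each pair under a sorted key keeps the network undirected, which matches the symmetric notion of collaboration in step 4).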

A research field contains a very large number of researchers and cooperative relationships. Some supporting or auxiliary technologies appear and are used extensively in the field, and it is useful for the researchers behind them to appear in the field's community map. However, many researchers from other fields appear in the field's cooperation data only a few times, or even once, and only in collaborations within one or a few narrow research directions. This cooperation data affects community discovery and the final community map to some extent.

Such data is very tricky to handle. Deleting it outright, or filtering out every researcher outside the field by field label, is not advisable. First, research activity is now so frequent and cross-field researchers so numerous that it is hard to assign a definite field label to each researcher. Second, the appearance of researchers from other fields in the field's cooperation map is sometimes highly valuable: it helps locate the crux of technical problems unique to the field, or reveals the technologies and researchers from other fields on which solutions to those problems depend. These researchers are of great reference value as technical intelligence and add considerably to the practical usefulness of the cooperative community map.

To address this data processing problem, the present invention filters the research activity data before running the community discovery algorithm, according to each researcher's influence in the corresponding research field. A researcher who appears rarely in the field's project and result data has low influence in the field, whereas a researcher from another technical field whose work addresses the crux of a technical problem particular to the field has high influence in it. The method therefore first removes researchers from other fields who work in a narrow research direction and collaborate rarely in the field; it also filters out researchers who belong to the field but have little reference value as technical intelligence; and it prevents researchers from cross-disciplinary or peripheral fields who provide strong support for key technical problems in the field from being filtered out. This ensures the accuracy of the data source.

The method of the invention effectively reduces the data nodes that have little reference value and cause interference in the resulting community map, which makes the map more readable and improves the user experience. It also reduces the deviation in the research directions and research topics that the generated cooperative communities reflect. Compared with adjusting parameters and thresholds inside the community discovery algorithm, it lowers computational complexity and improves computational efficiency, which provides technical support for updating the community map more frequently.

Further, the influence set value N is:

where N_max is the highest researcher influence value in the cooperation data.
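The expression for N is not reproduced in this text. Since the description states that Price's law is used to set the thresholds, the sketch below assumes the form in which Price's law is usually written, N = 0.749 × √N_max. The coefficient 0.749, the function names, and the sample scores are assumptions for illustration, not values confirmed by this document.

```python
import math

# Assumption: coefficient from the usual statement of Price's law
PRICE_COEFFICIENT = 0.749


def influence_threshold(influence):
    """Return the set value N computed from the highest influence score N_max."""
    n_max = max(influence.values())
    return PRICE_COEFFICIENT * math.sqrt(n_max)


def filter_by_influence(records, influence):
    """Remove researchers whose influence is below N from every record.

    Researchers with no influence score at all are also removed.
    """
    n = influence_threshold(influence)
    kept = {name for name, score in influence.items() if score >= n}
    return {rid: [m for m in members if m in kept] for rid, members in records.items()}


# Hypothetical influence scores and cooperation records
influence = {"Zhang": 100.0, "Li": 36.0, "Wang": 4.0}
records = {"project-1": ["Zhang", "Li", "Wang"], "paper-1": ["Li", "Wang"]}
filtered = filter_by_influence(records, influence)
# With N_max = 100 the assumed threshold is 7.49, so Wang (4.0) is removed
```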

Further, the researcher cooperation network of step 4) is filtered. The filtering includes setting to 0 the number of collaborations between any researchers whose number of collaborations is lower than the collaboration-count set value.

Further, the collaboration-count set value M is:

where M_max is the highest collaboration frequency between researchers in the cooperation data.
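The expression for M is likewise not reproduced in this text. The sketch below shows the edge filtering step under the same assumption as for N, namely the usual Price's law form M = 0.749 × √M_max; the coefficient, the function name `filter_weak_edges`, and the sample counts are illustrative assumptions.

```python
import math


def filter_weak_edges(counts, coefficient=0.749):
    """Set collaboration counts below M to 0, with M derived from M_max.

    `coefficient` defaults to the value usually quoted for Price's law (an assumption).
    """
    m_max = max(counts.values())
    m = coefficient * math.sqrt(m_max)
    return {pair: (c if c >= m else 0) for pair, c in counts.items()}


# Hypothetical collaboration counts
counts = {("Li", "Zhang"): 16, ("Li", "Wang"): 3, ("Wang", "Zhang"): 2}
filtered = filter_weak_edges(counts)
# M_max = 16 gives an assumed threshold of 2.996, so only the count of 2 is set to 0
```

Weak edges are zeroed rather than deleted, following the wording of the filtering step, so every researcher pair keeps an entry in the network.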

In the present invention, the analysis of researcher cooperation builds a cooperation network from the cooperative relationships that researchers form through projects and results of various kinds. The quality of the analyzed data directly affects the quality of the community structure found. The selection of researchers therefore takes into account both their overall influence and the frequency with which they collaborate on projects or results. The former reflects a researcher's research ability and capacity to produce results; the latter reflects the researcher's ability to cooperate and communicate. Together they allow community discovery to give stronger support to the accuracy of the research information the map conveys. Accordingly, two criteria are used to select the core researchers from which the cooperation network is built: a minimum researcher influence and a minimum collaboration frequency. These criteria effectively filter out interfering data that cannot truly and accurately reflect the research information of the generated communities.

The invention uses Price's law to determine the minimum influence score of core researchers and the minimum frequency of collaboration on projects and results, so the filtering criteria are scientific and effective.

Further, in the researcher influence model, researcher influence is determined at least by the number of research outputs, and the research outputs include the following categories: papers and patents.

A researcher of the field whose output in the field is small and of low quality naturally has little influence and little reference value as technical intelligence. (Here "the field" mainly means the scientific and technical field in which cooperative communities are being discovered.) For a researcher from another field or a cross-field researcher, participating in many projects while producing few results indicates that the researcher only assisted with routine problems and does not reflect the technical problems particular to the field or their solutions. As technical intelligence, the technology such a researcher represents in the field's cooperation data is highly replaceable and has little reference value in the map (the case of a researcher from another field solving routine problems in the field).

By contrast, when the technology represented by a researcher from another field or a cross-field researcher solves a distinctive problem in the field and brings an innovative effect, that researcher will certainly produce a considerable amount of research output in the field, for example patent applications or published papers (the case of a researcher from another field solving a prominent, distinctive problem in the field).

Therefore, using research outputs in the field, such as papers and patents, as the decisive factor in the influence calculation model gives better filtering effect and accuracy.

The difference between "a researcher from another field solving routine problems in the field" and "a researcher from another field solving a prominent, distinctive problem in the field" is illustrated as follows. Suppose a researcher in chemistry synthesizes an organic compound that, used as an additive in the cigarette manufacturing process, greatly improves the taste of cigarettes and reduces the generation of harmful substances, and so becomes a distinctive innovation in the tobacco field. That chemistry researcher will correspondingly produce many results in tobacco science and technology, such as papers, patents, and even national or industry standards on the dosage of the additive. Suppose another researcher, in the control field, develops an efficient closed-loop feedback method for furnace temperature control that is more precise and efficient and lowers energy consumption. Tobacco researchers collaborate with this researcher a few times and apply the technique to the tobacco leaf moisture-regain process. These collaborations may be confined to the leaf moisture-regain direction and are few in number, and heating and temperature control are highly replaceable for moisture regain, since temperature can be controlled in other ways. The control researcher therefore has little reference value in the cooperative community map of tobacco science and technology; keeping the researcher as a node introduces noise, and a large amount of such data increases the computation of community discovery, makes the map more complex, and makes it less readable. Because the temperature control technique is highly replaceable in the tobacco field, this researcher will have few results in tobacco science and technology; the researcher's results should fall in control-related technical fields. Based on the method and influence model of the invention, such interfering data can be effectively filtered out, reducing computation and improving the readability of the map.

Further, the researcher influence model is:

where P is the influence, n is the number of the researcher's research outputs, S_i is the set score of the category to which the researcher's i-th research output belongs, and W_i is the set weight of the researcher's i-th research output within that category.

The influence calculation model is determined by the quantity and quality of the researcher's research output in the field for which the map is generated. Different base scores and weights are set for different kinds of research output (for example award-winning scientific and technological achievements, papers, or standards) and for different levels within the same kind of output (for example achievements that received national versus provincial or ministerial science and technology awards). Calculating researcher influence in the field with this model is objective and accurate, and the resulting values are easy to compare.
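The model expression itself is not reproduced in this text. Read together with the embodiment, which describes a researcher's final influence as the sum over outputs of the output score multiplied by a weight, the definitions of P, n, S_i, and W_i are consistent with P = S_1·W_1 + S_2·W_2 + ... + S_n·W_n. The sketch below computes that sum; the class name `Output` and the sample scores and weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Output:
    category_score: float  # S_i: set score of the category the output belongs to
    weight: float          # W_i: set weight of this output within its category


def influence(outputs):
    """P as the sum of S_i * W_i over the researcher's n outputs (assumed form)."""
    return sum(o.category_score * o.weight for o in outputs)


# Hypothetical scores and weights for one researcher with n = 3 outputs
outputs = [Output(10.0, 1.0), Output(6.0, 0.5), Output(8.0, 0.25)]
p = influence(outputs)
print(p)  # 15.0 = 10*1 + 6*0.5 + 8*0.25
```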

Further, in step 4), when the researcher cooperation network is generated, for a scientific research project earlier than a set time, the collaboration that any two researchers accrue by participating in that project is counted as a collaborations, where 0 < a < 1. When counting collaborations, other mapping methods may also be used to map one collaboration at a given time to a value between 0 and 1, so as to reflect the effect of time.

When the cooperation network is built, collaboration frequency is the main data for calculating the degree (number) of cooperation between researchers. To make the analysis results more timely and the information reflected by the generated communities more valuable, a time-sequence weight parameter is added on top of the collaboration frequency. For example, one collaboration that took place earlier than a certain number of years ago is multiplied by a corresponding weight coefficient greater than 0 and less than 1.
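The time weighting can be sketched as follows, assuming each project record carries a year and that a single cutoff year separates older projects (counted as a) from newer ones (counted as 1). The function name `weighted_cooperation`, the value a = 0.5, and the sample projects are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def weighted_cooperation(projects, cutoff_year, a=0.5):
    """Count collaborations, weighting projects before `cutoff_year` by a (0 < a < 1).

    `projects` is a list of (year, members) tuples.
    """
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    counts = defaultdict(float)
    for year, members in projects:
        w = a if year < cutoff_year else 1.0
        for pair in combinations(sorted(set(members)), 2):
            counts[pair] += w
    return dict(counts)


# Hypothetical project records
projects = [
    (2008, ["Li", "Zhang"]),
    (2019, ["Li", "Zhang"]),
    (2019, ["Li", "Wang"]),
]
result = weighted_cooperation(projects, cutoff_year=2015)
# The 2008 project contributes 0.5, the 2019 projects contribute 1.0 each
```

A step function is only one possible mapping; the text allows any mapping of an older collaboration to a value between 0 and 1.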

Further, the community discovery algorithm is the Louvain algorithm.
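For orientation, the Louvain algorithm greedily maximizes modularity, first by moving individual nodes between communities and then by merging each community into a single node and repeating. The sketch below is a deliberately naive version of the first (local moving) phase only: it recomputes modularity from scratch for every trial move, where practical implementations use an incremental gain formula, and it omits the aggregation phase. The function names and the sample graph are illustrative, and the sketch does not show how researcher influence enters the patented method.

```python
def modularity(edges, community):
    """Q = sum over communities of L_c/m - (D_c/(2m))^2 for an undirected weighted graph.

    L_c is the edge weight inside community c, D_c the total degree of c,
    and m the total edge weight.
    """
    m = sum(edges.values())
    inside, degree = {}, {}
    for (u, v), w in edges.items():
        degree[community[u]] = degree.get(community[u], 0.0) + w
        degree[community[v]] = degree.get(community[v], 0.0) + w
        if community[u] == community[v]:
            inside[community[u]] = inside.get(community[u], 0.0) + w
    return sum(inside.get(c, 0.0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def local_moving(edges):
    """Naive first phase of Louvain: move nodes to neighbouring communities while Q improves."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    community = {node: node for node in neighbours}  # every node starts alone
    improved = True
    while improved:
        improved = False
        for node in sorted(neighbours):
            best_c, best_q = community[node], modularity(edges, community)
            for target in sorted({community[n] for n in neighbours[node]}):
                trial = dict(community)
                trial[node] = target
                q = modularity(edges, trial)
                if q > best_q + 1e-12:  # accept strict improvements only
                    best_c, best_q = target, q
            if best_c != community[node]:
                community[node] = best_c
                improved = True
    return community


# Hypothetical network: two triangles joined by a single edge C-D
edges = {
    ("A", "B"): 1, ("A", "C"): 1, ("B", "C"): 1,
    ("D", "E"): 1, ("D", "F"): 1, ("E", "F"): 1,
    ("C", "D"): 1,
}
community = local_moving(edges)
```

On this sample graph the two triangles end up as two separate communities. In the setting of this document, the edge weights would be the filtered collaboration counts.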

Further, a topic label is added to each community in the researcher cooperative community map. The topic label is a set number of paper keywords with the highest word frequency among the papers published by the researchers who are nodes of the corresponding community.

Using as labels the high-frequency keywords among all keywords of all papers published by all researchers in a community reflects the research direction and focus of that community, makes the map more readable, and helps readers identify the research theme of each community.
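Topic labelling can be sketched as a keyword frequency count over a community's papers. The sketch assumes each author maps to a list of (paper ID, keywords) pairs and counts a co-authored paper once; that de-duplication choice, the function name `topic_labels`, and the sample data are assumptions, since the text does not say how co-authored papers are counted.

```python
from collections import Counter


def topic_labels(community_members, papers_by_author, k=3):
    """Return the k most frequent paper keywords in a community."""
    counter = Counter()
    seen = set()
    for author in community_members:
        for paper_id, keywords in papers_by_author.get(author, []):
            if paper_id in seen:  # assumption: a co-authored paper is counted once
                continue
            seen.add(paper_id)
            counter.update(keywords)
    return [keyword for keyword, _ in counter.most_common(k)]


# Hypothetical papers and keywords
papers_by_author = {
    "Li": [("p1", ["flavor", "additive"]), ("p2", ["flavor", "smoke"])],
    "Zhang": [("p1", ["flavor", "additive"]), ("p3", ["flavor", "additive", "filter"])],
}
labels = topic_labels(["Li", "Zhang"], papers_by_author, k=2)
print(labels)  # ['flavor', 'additive']
```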

A device of the present invention for constructing a cooperative community of scientific researchers includes a processor and a memory, the processor executing instructions stored in the memory to implement the method for constructing a cooperative community of scientific researchers described above.

Description of the drawings

Fig. 1 is a flow chart of the method of the present invention for constructing a cooperative community of scientific researchers;

Fig. 2 is a network model diagram of knowledge entity relationships in the scientific research field;

Fig. 3 is a schematic diagram of the researcher influence model;

Fig. 4 is an example of a cooperative community map in the tobacco research field generated by the method of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings.

Method embodiment:

The method of the present invention for constructing a cooperative community of scientific researchers is described below using the tobacco research field as an example. The method flow is shown in Fig. 1 and includes the following steps:

S1. Obtain raw research cooperation data. Obtaining research cooperation data mainly involves extracting, fusing, and screening data on scientific research projects and the researchers participating in them, and on scientific research results and the researchers who produced them. For the tobacco field, the data sources of the cooperation knowledge graph include data in XML, RDB, EXCEL, HTML, and other formats. First, the core metadata of each entity, relationship, and attribute is defined. For example, a responsibility relationship (responsibleFor) is established between a researcher and a scientific research project; the core metadata templates of the researcher entity, the research project entity, and the responsibility relationship are shown in Tables 1, 2, and 3 below:

Table 1 Core metadata template of the researcher entity

Table 2 Core metadata template of the scientific research project entity

No.   Attribute name   Type   Description
1     CODE             long   Project ID
2     NAME             text   Project name
3     START_TIME       text   Project start date
4     END_TIME         text   Project end date
5     SOURCE           text   Project source
6     CONTENT          text   Research content and significance
7     INDICATORS       text   Project indicators
8     PROGRAM          text   Project implementation plan

Table 3 Core metadata template of the responsibility relationship

No.   Attribute name   Type   Description
1     PERSON_CODE      long   Researcher ID
2     PROJECT_CODE     long   Research project ID
3     IS_LEADER        text   Whether the researcher is the project leader

Different data sources are parsed and the data is stored according to the above templates to form the raw data. That is, extraction templates for the different data sources are defined according to the core metadata, and the raw data is then collected through these extraction templates.
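As an illustration of storing parsed records according to the templates, the sketch below models Tables 2 and 3 as Python dataclasses whose field names follow the templates. The helper `to_project`, its handling of missing or unknown keys, and the sample record are hypothetical; the actual extraction templates are not given in this text.

```python
from dataclasses import dataclass, fields


@dataclass
class Project:
    """Research project entity (Table 2). Field names follow the template."""
    CODE: int
    NAME: str = ""
    START_TIME: str = ""
    END_TIME: str = ""
    SOURCE: str = ""
    CONTENT: str = ""
    INDICATORS: str = ""
    PROGRAM: str = ""


@dataclass
class ResponsibleFor:
    """Responsibility relationship between a researcher and a project (Table 3)."""
    PERSON_CODE: int
    PROJECT_CODE: int
    IS_LEADER: str = ""


def to_project(raw):
    """Map one parsed source record (a dict) onto the Project template.

    Hypothetical helper: unknown keys are ignored, missing text fields stay empty.
    """
    text_fields = [f.name for f in fields(Project) if f.name != "CODE"]
    values = {name: str(raw[name]) for name in text_fields if name in raw}
    return Project(CODE=int(raw["CODE"]), **values)


# Hypothetical parsed record from one data source
project = to_project({"CODE": "7", "NAME": "Leaf conditioning study", "EXTRA": "ignored"})
link = ResponsibleFor(PERSON_CODE=42, PROJECT_CODE=project.CODE, IS_LEADER="yes")
```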

The raw data includes scientific research projects and the researchers participating in each project, and scientific research results and the researchers who produced each result. The relationships among research results, research projects, and researchers are shown in Fig. 2. Researchers who participate in or are responsible for the same project have a cooperative relationship, and researchers who jointly produce a result (as determined by the researchers credited on the result) have a cooperative relationship. Research results include papers, patents, standards, monographs, and achievement awards. Targeted data extraction templates are defined for the different forms of results and project records, and the extracted data that reflects cooperative relationships between researchers forms the raw data.

S2. Perform data fusion on the raw data; the data processing rules are shown in Table 4. Because there are many extraction channels, the record formats and content fields of raw data from different channels differ, and extracted researchers and research projects may be duplicated. Data from different sources is integrated according to the data fusion rules, missing fields are filled in, and duplicate data is removed. Research knowledge data fusion for the tobacco cooperation knowledge graph uses a model-driven approach: an ETL data fusion language is generated through configuration, then parsed and executed by a parsing engine. The ETL data fusion language is ultimately mapped to Spark SQL for execution to complete the fusion of the raw data. The fused raw data forms a graph database.
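The effect of fusion (integrating sources, filling in fields, removing duplicates) can be illustrated in plain Python, independent of the ETL language and Spark SQL pipeline described above. The sketch assumes records are matched on a single key field and that the first non-empty value of each field wins; both rules, the function name `fuse`, and the sample records are assumptions, not the fusion rules of Table 4.

```python
def fuse(records, key="NAME"):
    """Merge records that share `key`; the first non-empty value of each field is kept."""
    merged = {}
    for rec in records:
        target = merged.setdefault(rec[key], {})
        for field, value in rec.items():
            # Fill a field only if it is still missing or empty in the merged record
            if value not in (None, "") and target.get(field) in (None, ""):
                target[field] = value
    return list(merged.values())


# Hypothetical records for the same project extracted from two sources
records = [
    {"NAME": "P1", "SOURCE": "", "START_TIME": "2018"},
    {"NAME": "P1", "SOURCE": "CNTC", "START_TIME": "2018-01"},
    {"NAME": "P2", "SOURCE": "X"},
]
fused = fuse(records)
# P1 appears once, with SOURCE filled in from the second record
```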

Table 4. Data fusion rules for tobacco researchers and research projects

S3. Calculate researcher influence. A researcher influence calculation model is built, and influence is precomputed and written to the graph database as an attribute value. The attribute value can be refreshed periodically, in step with how often the underlying data is updated.

The researcher influence calculation model is shown in Figure 3. Influence is calculated from the researcher's research outputs in the tobacco research field. The model uses a two-level indicator system: the first level is the type of research output and the second level is the grade of the output within that type. Some output types may also have a third-level indicator, for example the prize class of an award (first prize, second prize, and so on). The invention does not limit how output types or further indicator levels are divided. In this embodiment, each output is scored at its last-level indicator: the output score is obtained from the base score of the output type plus the additional points of the corresponding second- and third-level indicators. Where the last level is the second level (patents, papers, standards and monographs in Figure 3), the output is scored under the second-level indicator; where a third level exists (awards in Figure 3), the output is scored under the third-level indicator. A researcher's score for a given output is that output's score multiplied by a score weight. The score weight reflects the size of the researcher's contribution to the output, which can be determined from the order of authorship or from the output's original records, such as work logs. The researcher's final influence score is the sum of the scores for all of their outputs. Based on this model, the researcher's score is calculated as:

$$P = \sum_{i=1}^{n} S_i W_i$$

where $P$ is the researcher's influence score, $n$ is the number of the researcher's research outputs, $S_i$ is the last-level indicator score of the researcher's $i$-th research output (that is, the output score of that output), and $W_i$ is the score weight for the researcher's $i$-th research output (determined by the size of the contribution).

For example, if a scientific achievement of a researcher wins a third prize at the provincial or ministerial level, the researcher's score for that output is the output score of a provincial or ministerial third prize multiplied by the score weight determined by the researcher's contribution to the achievement.
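As a minimal sketch of the scoring model, the snippet below sums output score times score weight over a researcher's outputs. The class name `ResearchOutput`, the function name `influence_score` and all numeric values are hypothetical and chosen only for illustration; the text does not specify concrete scores or weights.

```python
from dataclasses import dataclass


@dataclass
class ResearchOutput:
    score: float   # S_i: output score at the last-level indicator (hypothetical value)
    weight: float  # W_i: score weight reflecting the researcher's contribution


def influence_score(outputs):
    """Return P, the sum of S_i * W_i over all outputs of one researcher."""
    return sum(o.score * o.weight for o in outputs)


# Hypothetical researcher with two outputs.
outputs = [
    ResearchOutput(score=10.0, weight=0.5),  # e.g. an award with a partial contribution
    ResearchOutput(score=4.0, weight=1.0),   # e.g. a paper as the main contributor
]
p = influence_score(outputs)
```

Because the score is a plain sum, it can be precomputed and stored as a node attribute, then recomputed only when a researcher's outputs change.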

S4. Data screening. To ensure both the quantity and the quality of the project cooperation data, the researchers are screened. Only researchers whose influence in the field is above a threshold are used to generate and display the cooperation map. Following Price's Law, researchers whose influence in the field is below N are filtered out, where N is calculated as follows:

where $N_{max}$ is the highest influence value among the researchers in the field. Once N has been calculated, the researchers whose influence is greater than or equal to N are retained as the researchers in the data source for cooperative community analysis.
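The screening step can be sketched as follows. The formula for N is not reproduced in this text, so the sketch assumes the form conventionally associated with Price's Law in bibliometrics, N = 0.749 × √N_max. That constant, the function names and the sample influence values are assumptions, not values confirmed by the source.

```python
import math

# Assumption: the conventional Price's Law coefficient; the source text
# does not show the actual formula.
PRICE_COEFFICIENT = 0.749


def price_threshold(max_value):
    """Threshold derived from the highest observed value (assumed form)."""
    return PRICE_COEFFICIENT * math.sqrt(max_value)


def screen_researchers(influence_by_id):
    """Keep researchers whose influence is greater than or equal to N."""
    if not influence_by_id:
        return {}
    n = price_threshold(max(influence_by_id.values()))
    return {rid: p for rid, p in influence_by_id.items() if p >= n}


# Hypothetical influence scores; N_max = 100 gives N of about 7.49.
scores = {"a": 100.0, "b": 8.0, "c": 7.0, "d": 30.0}
kept = screen_researchers(scores)
```

Under this assumption, researcher `c` falls below the threshold and is removed before the cooperation network is built.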

S5. Generate the researcher project cooperation data. Based on the screened researcher data, the frequency of project cooperation is calculated for every pair of researchers, which yields the researcher project cooperation relationship network (that is, the cooperation frequency data).

For example, with researchers a, b, c, d and e as the researcher nodes, the set of project cooperation relationships is {<a,b>,<a,c>,<a,d>,<a,e>,<b,c>,<b,d>,<b,e>,<c,d>,<c,e>,<d,e>}, and the resulting cooperation relationship network is represented by the following matrix:

Table 5. Cooperation relationship matrix of tobacco researchers

Each number in the matrix is the cooperation count of the two researchers given by its row and column. In this embodiment, one cooperation is counted each time the two researchers participate in the same research project, and one cooperation is counted each time they jointly produce a research output.
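A minimal sketch of the pairwise counting, assuming each project or output has already been reduced to a list of researcher identifiers. The function name `cooperation_counts` and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooperation_counts(records):
    """Count, for every unordered pair, how many records list both researchers.

    Each record is the list of researchers on one project or one output.
    Pairs are stored as sorted tuples so (a, b) and (b, a) share one entry.
    """
    counts = Counter()
    for members in records:
        for u, v in combinations(sorted(set(members)), 2):
            counts[(u, v)] += 1
    return counts


# Hypothetical records: two projects and one jointly authored paper.
records = [
    ["a", "b", "c"],  # project 1
    ["a", "b"],       # project 2
    ["b", "d"],       # paper 1
]
counts = cooperation_counts(records)
```

Storing only the upper triangle keeps the structure symmetric by construction; the full matrix can be rendered from it when needed.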

The researcher project cooperation relationship network can be filtered further: cooperation relationships whose cooperation count is below a set threshold are removed. This reduces the interference of incidental project cooperation with the final cooperation map and further reduces the computation required by the community discovery algorithm.

Specifically, following Price's Law, the cooperation count threshold is set to M: the cooperation count of any two researchers who have cooperated fewer than M times is set to 0. M is calculated as follows:

where $M_{max}$ is the highest cooperation count between any two researchers in the cooperation data for the field. Once M has been calculated, two researchers whose cooperation count is below M are treated as not having cooperated.

For example, if the calculated value of M is 3.3, researchers a and c, whose cooperation count is below 3.3, are treated as not having cooperated, and the cooperation relationship matrix of Table 5 is adjusted as follows:

Table 6. Cooperation relationship matrix of tobacco researchers (after filtering by cooperation count)

      a    b    c    d    e
a     -   14    0   19   28
b    14    -   12   26   22
c     0   12    -   32   13
d    19   26   32    -   16
e    28   22   13   16    -
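The filtering that produces Table 6 can be sketched as below, using the threshold M = 3.3 from the example. The post-filter counts match Table 6; the pre-filter count of 3 for the pair (a, c) is a hypothetical value, since the text only states that it was below 3.3. The function name is also hypothetical.

```python
def filter_weak_cooperation(counts, threshold):
    """Set every pair count below the threshold to 0 and keep the rest."""
    return {pair: (n if n >= threshold else 0) for pair, n in counts.items()}


# Upper triangle of the cooperation matrix before filtering.
# The value for ("a", "c") is hypothetical; the others are taken from Table 6.
counts = {
    ("a", "b"): 14, ("a", "c"): 3, ("a", "d"): 19, ("a", "e"): 28,
    ("b", "c"): 12, ("b", "d"): 26, ("b", "e"): 22,
    ("c", "d"): 32, ("c", "e"): 13,
    ("d", "e"): 16,
}
filtered = filter_weak_cooperation(counts, threshold=3.3)
```

Only the (a, c) entry changes; every other pair keeps its original count.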

The researcher project cooperation relationship network can also be filtered by time: cooperation that took place before a set time is given a reduced weight, which lowers the influence of early cooperation. Researchers' directions and fields may change over time, so early cooperation data carries less meaning and value for the cooperative community map. One or more weight levels can therefore be set. For example, cooperation within the last 2 years has a weight of 1, cooperation from 2 to 5 years ago has a weight of 0.8, and cooperation from more than 5 years ago has a weight of 0.5. Before time-based filtering, 4 project cooperations between two researchers 6 years ago count as 4 cooperations; after applying the weights above, they count as only 2. Similarly, 1 cooperation from 3 years ago counts as only 0.8 cooperations after filtering.
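The tiered weights in this example can be sketched as follows. The function names are hypothetical, and how the boundaries at exactly 2 and 5 years are assigned is an assumption, since the text does not say which tier they belong to.

```python
def time_weight(age_years):
    """Weight of one cooperation given how many years ago it took place.

    Assumption: exactly 2 years falls in the first tier and exactly 5 years
    in the second tier.
    """
    if age_years <= 2:
        return 1.0
    if age_years <= 5:
        return 0.8
    return 0.5


def weighted_cooperation(ages):
    """Time-weighted cooperation count for one pair of researchers."""
    return sum(time_weight(age) for age in ages)


# The two cases from the text.
four_old_projects = weighted_cooperation([6, 6, 6, 6])  # 4 cooperations, 6 years ago
one_mid_project = weighted_cooperation([3])             # 1 cooperation, 3 years ago
```

The first case yields 2.0 and the second 0.8, matching the figures given in the paragraph above.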

S6. Community discovery. The Louvain algorithm is applied to the final researcher cooperation relationship network (the cooperation frequency data) together with the researcher influence data. Louvain is a modularity-based community discovery algorithm that can reveal hierarchical community structure, which makes it well suited to finding the communities to which researchers in the tobacco research field belong (groups of researchers who cooperate and share similar research directions and fields).

Specifically, the Louvain algorithm is applied to the researcher cooperation relationship network formed from the researcher project cooperation relationship matrix, and comprises the following steps:

1) Phase 1: traverse the nodes of the network and move nodes between communities. For each node A in the network, tentatively add A to the community of each of its neighbor nodes in turn and calculate the modularity change ΔQ caused by the move. If the largest ΔQ is greater than 0, move A into the community of the neighbor that gives the largest ΔQ; otherwise leave the community of A unchanged;

Repeat the node-moving step until no node changes community. The movement of nodes between communities then ends and phase 1 is complete;

2) Phase 2: rebuild the graph. Each community formed at the end of phase 1 becomes a new node. The weight of an edge between two new nodes is the sum of the weights of the edges between the two original communities, and the weight of the self-loop on a new node is the sum of the weights of the edges between the nodes inside the original community;

Iterate phase 1 and phase 2 on the rebuilt graph until the modularity of the whole graph no longer changes, and record the community to which each node belongs;

From the recorded community information, determine the final community of each node, which completes community discovery.
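The two phases above can be sketched in plain Python as follows. This is an illustrative sketch rather than the patent's implementation: it recomputes modularity from scratch for every trial move instead of using an incremental ΔQ, it uses edge weights only (how researcher influence enters the algorithm is not specified in the text), and the function names and the sample graph are hypothetical. The adjacency map follows the matrix convention, so a super-node's self-loop entry counts each internal edge in both directions.

```python
def build_graph(edges):
    """Symmetric weighted adjacency map from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def modularity(adj, community):
    """Q = sum over communities of inside/2m - (total/2m)^2."""
    two_m = sum(w for nbrs in adj.values() for w in nbrs.values())
    inside, total = {}, {}
    for u, nbrs in adj.items():
        cu = community[u]
        for v, w in nbrs.items():
            total[cu] = total.get(cu, 0.0) + w
            if community[v] == cu:
                inside[cu] = inside.get(cu, 0.0) + w
    return sum(inside.get(c, 0.0) / two_m - (t / two_m) ** 2
               for c, t in total.items())


def local_moves(adj):
    """Phase 1: move nodes to neighbor communities while Q increases."""
    community = {u: u for u in adj}
    moved = True
    while moved:
        moved = False
        for u in adj:
            original = community[u]
            best_c, best_q = original, modularity(adj, community)
            for v in adj[u]:
                if v == u or community[v] == original:
                    continue
                candidate = community[v]
                community[u] = candidate          # trial move
                q = modularity(adj, community)
                if q > best_q + 1e-12:
                    best_c, best_q = candidate, q
            community[u] = best_c                 # keep the best, or revert
            if best_c != original:
                moved = True
    return community


def aggregate(adj, community):
    """Phase 2: collapse each community into one node, summing weights."""
    new_adj = {}
    for u, nbrs in adj.items():
        row = new_adj.setdefault(community[u], {})
        for v, w in nbrs.items():
            cv = community[v]
            row[cv] = row.get(cv, 0.0) + w
    return new_adj


def louvain(adj):
    """Alternate the two phases until phase 1 moves no node."""
    membership = {u: u for u in adj}  # original node -> current super-node
    while True:
        community = local_moves(adj)
        if len(set(community.values())) == len(adj):
            return membership
        membership = {n: community[c] for n, c in membership.items()}
        adj = aggregate(adj, community)


# Hypothetical network: two tightly linked triangles joined by one weak edge.
edges = [("a", "b", 5), ("a", "c", 5), ("b", "c", 5), ("c", "d", 1),
         ("d", "e", 5), ("d", "f", 5), ("e", "f", 5)]
communities = louvain(build_graph(edges))
```

On this sample graph the sketch places a, b and c in one community and d, e and f in another. Production implementations track community totals incrementally so that each trial move costs time proportional to the node's degree rather than to the size of the graph.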

S7. Visualize the cooperative community of researchers in the tobacco field. In the community discovery results from S6, the node sizes, which represent researcher influence, and the edge widths, which represent how closely researchers cooperate, are normalized so that node and edge weight values are distributed evenly in the visual map. A logarithmic transform is applied to the raw data as follows:

For the node size value: $S_{new} = \log_{10} S$;

For the edge width value: $W_{new} = \log_{10} W$;

That is, the raw node size and edge width values are replaced by their base-10 logarithms.
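A minimal sketch of the logarithmic scaling, assuming all raw values are positive. The function name and sample values are hypothetical.

```python
import math


def log_scale(values):
    """Replace each positive raw value with its base-10 logarithm.

    math.log10 raises ValueError for values that are zero or negative, so
    pairs whose cooperation count was filtered to 0 must be dropped first.
    """
    return {key: math.log10(value) for key, value in values.items()}


# Hypothetical raw node sizes (influence) and edge widths (cooperation counts).
node_sizes = log_scale({"a": 1000.0, "b": 10.0})
edge_widths = log_scale({("a", "b"): 100.0})
```

The transform compresses a wide range of influence or cooperation values into a narrow one, but raw values below 1 (for example a time-weighted count of 0.8) map to negative numbers, so a real renderer would likely need an offset or a minimum size. That adjustment is not described in the text.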

In the visual map, each node represents one researcher, and the size of the node represents that researcher's influence in the field: the larger the node, the greater the influence. The width of the line between two nodes reflects the cooperation frequency of the two researchers, that is, the cooperation weight: the higher the cooperation frequency, the wider the line.

Specifically, the cooperative community visualization is implemented with the Echarts visualization framework. The front end loads the cooperative community JSON data and renders the cooperative community map. An example of the JSON data format is as follows:

where nodes represents the researchers (id is the researcher's unique identifier, label is the name, group is the community group number, and size is the size of the node in the community), and links represents the relationship edges (source is the unique identifier of the edge's start node, target is the unique identifier of the edge's end node, and weight is the width of the edge).
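The field layout described above can be illustrated with a short sketch that serializes community results into that shape. Only the field names (`nodes`, `links`, `id`, `label`, `group`, `size`, `source`, `target`, `weight`) come from the text; the function name, identifiers and values are hypothetical, and the exact nesting is an assumption.

```python
import json


def build_community_json(researchers, edges):
    """Serialize nodes and links using the field names listed in the text."""
    nodes = [
        {"id": r["id"], "label": r["label"], "group": r["group"], "size": r["size"]}
        for r in researchers
    ]
    links = [{"source": s, "target": t, "weight": w} for s, t, w in edges]
    return json.dumps({"nodes": nodes, "links": links}, ensure_ascii=False, indent=2)


# Hypothetical researchers and one relationship edge.
researchers = [
    {"id": "r001", "label": "Researcher A", "group": 1, "size": 1.8},
    {"id": "r002", "label": "Researcher B", "group": 1, "size": 1.2},
]
edges = [("r001", "r002", 1.15)]
document = build_community_json(researchers, edges)
```

Setting `ensure_ascii=False` keeps researcher names written in Chinese readable in the generated file.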

Figure 4 shows the visual map of the cooperative community in the tobacco research field that is finally generated by the method of the invention. Each node represents one researcher, and the size of the node represents that researcher's influence in the tobacco technology field. The lines between nodes, and their thickness, represent the cooperation frequency: the higher the frequency, the higher the degree of cooperation. Researchers with a high degree of cooperation cluster together and form a scientific community. Parts of the map with the same color (gray level) represent one community. Researchers in the same community cooperate closely and share similar research directions and fields. The cooperation relationships show that the more often two researchers cooperate, the closer their cooperation relationship is.

In the cooperative community map of the tobacco research field, the width of a relationship edge indicates how closely the two researchers cooperate. Analyzing the community clustering pattern of cooperation frequency (and time-series parameters) for a group of related researchers can reveal the most prominent links between them. Researchers in the same community can be assumed to have something in common in their academic research, which supports the recommendation of cooperation partners and helps to identify research teams across different tobacco institutions.

The research topics of each community are also identified. For example, the keywords of the papers published by the researchers in a cooperative community are extracted and their frequencies counted, and the N most frequent keywords (for example the top 3) are taken as the research topics of that community. Research topic labels are then added to the corresponding community in the map, which makes the map easier to read and helps readers identify the research topics of each community.
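A minimal sketch of the topic labelling, assuming the keywords of each paper are already available as a list. The function name and the sample keywords are hypothetical.

```python
from collections import Counter


def top_keywords(papers_keywords, n=3):
    """Return the n most frequent keywords across a community's papers."""
    counts = Counter(kw for keywords in papers_keywords for kw in keywords)
    return [keyword for keyword, _ in counts.most_common(n)]


# Hypothetical keyword lists for the papers of one community.
papers_keywords = [
    ["nicotine", "flavor"],
    ["nicotine", "filter"],
    ["nicotine", "flavor", "combustion"],
]
labels = top_keywords(papers_keywords, n=2)
```

Here "nicotine" appears three times and "flavor" twice, so those two keywords become the community's topic labels. Real keyword data would usually need normalization (case, synonyms) before counting.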

Device embodiment:

A device of the invention for constructing a cooperative community of researchers comprises a memory, a processor and a data interface. Through the data interface, the processor can obtain research data from the tobacco field or other technical fields, including information on projects, outputs and technical personnel, and it executes a program stored in the memory to implement the method of the invention for constructing a cooperative community of researchers. The method has been described in sufficient detail in the method embodiment and is not repeated here.

Claims

1. A method for constructing a cooperative community of researchers, characterized by comprising the following steps:

1) obtaining cooperation data, the cooperation data comprising research projects and the researchers participating in the corresponding research projects, as well as research outputs and the researchers who produced the corresponding research outputs, the research outputs including papers;

2) establishing a researcher influence model and calculating the influence of the researchers;

3) screening the cooperation data, the screening comprising: deleting from the cooperation data the researchers whose influence is lower than an influence set value;

4) generating a researcher cooperation relationship network from the screened cooperation data, the researcher cooperation relationship network comprising the cooperation count between each researcher and the other researchers, the cooperation count being the number of times the two corresponding researchers have participated in the same research project or produced the same research output;

5) using a community discovery algorithm to generate a researcher cooperative community map based on the researcher cooperation relationship network and the influence of the researchers.

2. The method for constructing a cooperative community of researchers according to claim 1, characterized in that the influence set value N is:

where $N_{max}$ is the highest influence value among the researchers in the cooperation data.

3. The method for constructing a cooperative community of researchers according to claim 1 or 2, characterized in that the researcher cooperation relationship network of step 4) is filtered, the filtering comprising: setting to 0 the cooperation count between researchers whose cooperation count is lower than a cooperation count set value.

4. The method for constructing a cooperative community of researchers according to claim 3, characterized in that the cooperation count set value M is:

where $M_{max}$ is the highest cooperation count between researchers in the cooperation data.

5. The method for constructing a cooperative community of researchers according to claim 4, wherein, in the researcher influence model, the influence of a researcher is determined at least by the number of the researcher's research outputs, and the research outputs include the following categories: papers and patents.

6. The method for constructing a cooperative community of researchers according to claim 5, characterized in that the researcher influence model is:

$$P = \sum_{i=1}^{n} S_i W_i$$

where $P$ is the influence, $n$ is the number of research outputs of the researcher, $S_i$ is the set score of the category to which the researcher's $i$-th research output belongs, and $W_i$ is the set weight of the researcher's $i$-th research output within its category of research output.

7. The method for constructing a cooperative community of researchers according to claim 6, wherein in step 4), when generating the researcher cooperation relationship network, for a research project earlier than a set time, the cooperation between any two researchers who participated in that research project is counted as a times, where 0<a<1.

8. The method for constructing a cooperative community of researchers according to claim 7, wherein the community discovery algorithm is the Louvain algorithm.

9. The method for constructing a cooperative community of researchers according to claim 8, wherein a topic label is added to each community in the researcher cooperative community map, the topic label being a set number of the most frequent paper keywords in the papers published by the researchers represented by the nodes of the corresponding community.

10. A device for constructing a cooperative community of researchers, comprising a processor and a memory, wherein the processor executes instructions stored in the memory to implement the method for constructing a cooperative community of researchers according to any one of claims 1 to 9.