CN116561292A - Data search method, device, electronic device and computer readable medium - Google Patents

Info

Publication number
CN116561292A
Authority
CN
China
Prior art keywords
field
maintenance data
fields
search
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310551514.5A
Other languages
Chinese (zh)
Inventor
孙博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd
Priority to CN202310551514.5A
Publication of CN116561292A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G06F16/337 Profile generation, learning or modification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data search method, apparatus, electronic device and computer-readable medium, and relates to the technical field of big data processing. One embodiment of the method includes: receiving operation and maintenance data pushed by each operation and maintenance system, or pulling the operation and maintenance data from each operation and maintenance system; adding classification fields and tag fields to the operation and maintenance data, writing the operation and maintenance data and the corresponding classification fields and tag fields into a database, creating a document from the operation and maintenance data and the corresponding classification fields and tag fields, and updating the document into an index; receiving a search request sent by a searcher, the search request carrying a search statement entered by a user, identifying target fields from the search statement, finding multiple matching documents in the index according to the target fields, and ranking the documents, so that the top-ranked documents are returned to the searcher. This embodiment solves the technical problem that full-ecosystem operation and maintenance data cannot be searched conveniently.

Description

Data search method, apparatus, electronic device and computer-readable medium

Technical Field

The present invention relates to the technical field of big data processing, and in particular to a data search method, apparatus, electronic device and computer-readable medium.

Background

With the spread of computer systems and the adoption of intelligent office systems, the scale and frequency of operation and maintenance operations keep increasing. The information that an operation and maintenance system generates while it runs can help analyze the work performed, resolve the problems found, and answer queries about related operation and maintenance data. This information is well worth studying and preserving, so the vast majority of operation and maintenance systems now record their own operation information for later use.

However, because operation and maintenance systems differ greatly in function, and because differences in operational complexity lead to operation and maintenance data of differing complexity, the data of different systems are not linked to one another. Even where a system implements cross-platform data storage, it cannot support complex searches, and it is difficult to find, in real time and along a given dimension, the full-ecosystem operation and maintenance data of a piece of hardware or a network. Searching is therefore very inconvenient for operation and maintenance personnel, who have to search several systems, or search several times, and then stitch the results together by hand to obtain the data they want.

Summary of the Invention

In view of this, embodiments of the present invention provide a data search method, apparatus, electronic device and computer-readable medium, to solve the technical problem that full-ecosystem operation and maintenance data cannot be searched conveniently.

To achieve the above object, according to one aspect of the embodiments of the present invention, a data search method is provided, including:

receiving operation and maintenance data pushed by each operation and maintenance system, or pulling operation and maintenance data from each operation and maintenance system;

adding a classification field and a tag field to the operation and maintenance data, writing the operation and maintenance data and its corresponding classification field and tag field into a database, creating a document from the operation and maintenance data and its corresponding classification field and tag field, and updating the document into an index;

receiving a search request sent by a searcher, the search request carrying a search statement entered by a user, and identifying target fields from the search statement;

finding multiple matching documents in the index according to the target fields, and ranking the documents, so that the top-ranked documents are returned to the searcher.

Optionally, writing the operation and maintenance data and its corresponding classification field and tag field into the database includes:

generating a tree directory according to the classification field, and adding the operation and maintenance data to the tree directory;

writing the tree directory, together with the correspondence between the operation and maintenance data and the tag field, into the database.

Optionally, creating a document from the operation and maintenance data and its corresponding classification field and tag field, and updating the document into the index, includes:

generating index fields from the operation and maintenance data;

assembling the index fields, the operation and maintenance data, and its corresponding classification field and tag field into a document, and writing the document to an index creator, thereby updating the document into the index.

Optionally, generating index fields from the operation and maintenance data includes:

generating index fields from the metadata of the operation and maintenance data; and/or

performing word segmentation on the operation and maintenance data to obtain index fields.
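
The document-creation steps above can be illustrated with a minimal sketch. It is not the implementation of the invention: the record layout (`id`, `metadata`, `content`), the function names, the regex-based segmenter and the `InMemoryIndex` class standing in for the index creator are all assumptions introduced for illustration. A real deployment would use a proper word segmenter, in particular for Chinese text, and a full-text search engine.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def segment(text):
    """Toy word segmentation: lowercase runs of ASCII letters, digits, underscores."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def build_index_fields(record):
    """Index fields from metadata (one field per key) and from segmented content."""
    fields = {}
    for key, value in record.get("metadata", {}).items():
        fields["meta." + key] = [str(value).lower()]
    fields["content"] = segment(record.get("content", ""))
    return fields


def assemble_document(record, classification, tags):
    """Bundle index fields, raw data, classification fields and tag fields."""
    return {
        "id": record["id"],
        "index_fields": build_index_fields(record),
        "classification": dict(classification),
        "tags": list(tags),
        "raw": record,
    }


class InMemoryIndex:
    """Hypothetical stand-in for the index creator: a tiny inverted index."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)

    @staticmethod
    def _terms(doc):
        terms = set()
        for values in doc["index_fields"].values():
            terms.update(values)
        terms.update(str(v).lower() for v in doc["classification"].values())
        terms.update(t.lower() for t in doc["tags"])
        return terms

    def update(self, doc):
        """Insert the document, replacing any older version with the same id."""
        old = self.docs.get(doc["id"])
        if old is not None:
            for term in self._terms(old):
                self.postings[term].discard(doc["id"])
        self.docs[doc["id"]] = doc
        for term in self._terms(doc):
            self.postings[term].add(doc["id"])
```

Calling `update` again with a document that has the same `id` first removes the old postings, so the index reflects only the latest version of each record.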

Optionally, identifying target fields from the search statement includes:

preprocessing the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation;

extracting keywords and/or tag fields from the preprocessed search statement, and identifying the user intent from the keywords and/or the tag fields, to obtain associated fields;

performing word segmentation on the search statement to obtain segmented words;

wherein the target fields include the keywords and/or tag fields, the associated fields, and the segmented words.
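
As a rough sketch of the target-field identification described above, the following code applies only synonym supplementation as preprocessing, matches the preprocessed words against a known vocabulary, and maps matched tags to associated fields. The lookup tables `SYNONYMS`, `KNOWN_TAGS` and `INTENT_FIELDS`, and all function names, are hypothetical; the source does not specify how intent recognition is performed, and pinyin conversion and completion are omitted here.

```python
import re

# Hypothetical lookup tables; a real system would load these from configuration.
SYNONYMS = {"server": ["host"], "outage": ["failure"]}
KNOWN_TAGS = {"disk", "network", "backup", "failure"}
INTENT_FIELDS = {"disk": ["storage", "capacity"], "network": ["switch", "segment"]}


def segment(text):
    """Toy word segmentation: lowercase runs of ASCII letters, digits, underscores."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9_]+", text)]


def preprocess(statement):
    """Synonym supplementation only: append known synonyms of each word."""
    words = segment(statement)
    extra = [syn for word in words for syn in SYNONYMS.get(word, [])]
    return words + extra


def identify_target_fields(statement):
    """Return the three groups that together make up the target fields."""
    words = preprocess(statement)
    keywords_and_tags = [w for w in words if w in KNOWN_TAGS]
    # Assumed intent model: each matched tag maps to a fixed list of fields.
    associated = [f for tag in keywords_and_tags for f in INTENT_FIELDS.get(tag, [])]
    return {
        "keywords_and_tags": keywords_and_tags,
        "associated": associated,
        "segmented_words": segment(statement),
    }
```

For the statement "disk outage on server", the synonym table adds "failure" and "host", so "failure" is picked up as a tag even though the user never typed it.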

Optionally, finding multiple matching documents in the index according to the target fields, and ranking the documents, includes:

for each document in the index, calculating a relevance score between each target field and each index field, each classification field and each tag field in the document, and computing a weighted sum of the calculated relevance scores, to obtain the BM25 value of the search statement with respect to the document;

ranking the documents in descending order of BM25 value.
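
The ranking step can be sketched as a per-field Okapi BM25 score combined by a weighted sum. This is one common reading of the description, not a formula confirmed by the source: the field weights in `FIELD_WEIGHTS`, the parameters `K1` and `B`, the document layout and the IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))` are assumptions chosen for illustration.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # commonly used BM25 defaults, assumed here
# Hypothetical weights per document field group.
FIELD_WEIGHTS = {"index": 1.0, "classification": 1.5, "tags": 2.0}


def bm25_field(terms, docs, field):
    """BM25 score of `terms` against one field of every document."""
    if not docs:
        return {}
    n_docs = len(docs)
    avgdl = (sum(len(d[field]) for d in docs.values()) / n_docs) or 1.0
    df = Counter()
    for d in docs.values():
        df.update(set(d[field]))
    scores = {}
    for doc_id, d in docs.items():
        tf = Counter(d[field])
        dl = len(d[field])
        score = 0.0
        for term in terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * dl / avgdl)
            score += idf * tf[term] * (K1 + 1) / norm
        scores[doc_id] = score
    return scores


def rank(target_fields, docs, top_n=10):
    """Weighted sum of per-field scores, sorted in descending order."""
    totals = {doc_id: 0.0 for doc_id in docs}
    for field, weight in FIELD_WEIGHTS.items():
        for doc_id, score in bm25_field(target_fields, docs, field).items():
            totals[doc_id] += weight * score
    hits = [(doc_id, s) for doc_id, s in totals.items() if s > 0]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:top_n]
```

Documents with a total score of zero are dropped, so only matching documents are returned, and `top_n` limits the result to the top-ranked ones.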

In addition, according to another aspect of the embodiments of the present invention, a data search apparatus is provided, including:

a receiving module, configured to receive operation and maintenance data pushed by each operation and maintenance system, or to pull operation and maintenance data from each operation and maintenance system;

a storage module, configured to add a classification field and a tag field to the operation and maintenance data, write the operation and maintenance data and its corresponding classification field and tag field into a database, create a document from the operation and maintenance data and its corresponding classification field and tag field, and update the document into an index;

a processing module, configured to receive a search request sent by a searcher, the search request carrying a search statement entered by a user, and to identify target fields from the search statement;

a search module, configured to find multiple matching documents in the index according to the target fields, and to rank the documents, so that the top-ranked documents are returned to the searcher.

Optionally, the storage module is further configured to:

generate a tree directory according to the classification field, and add the operation and maintenance data to the tree directory;

write the tree directory, together with the correspondence between the operation and maintenance data and the tag field, into the database.

Optionally, the storage module is further configured to:

generate index fields from the operation and maintenance data;

assemble the index fields, the operation and maintenance data, and its corresponding classification field and tag field into a document, and write the document to an index creator, thereby updating the document into the index.

Optionally, the storage module is further configured to:

generate index fields from the metadata of the operation and maintenance data; and/or

perform word segmentation on the operation and maintenance data to obtain index fields.

Optionally, the processing module is further configured to:

preprocess the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation;

extract keywords and/or tag fields from the preprocessed search statement, and identify the user intent from the keywords and/or the tag fields, to obtain associated fields;

perform word segmentation on the search statement to obtain segmented words;

wherein the target fields include the keywords and/or tag fields, the associated fields, and the segmented words.

Optionally, the search module is further configured to:

for each document in the index, calculate a relevance score between each target field and each index field, each classification field and each tag field in the document, and compute a weighted sum of the calculated relevance scores, to obtain the BM25 value of the search statement with respect to the document;

rank the documents in descending order of BM25 value.

According to another aspect of the embodiments of the present invention, an electronic device is also provided, including:

one or more processors; and

a storage device for storing one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method of any of the above embodiments.

According to another aspect of the embodiments of the present invention, a computer-readable medium is also provided, on which a computer program is stored, the program implementing the method of any of the above embodiments when executed by a processor.

According to another aspect of the embodiments of the present invention, a computer program product is also provided, including a computer program that implements the method of any of the above embodiments when executed by a processor.

One embodiment of the above invention has the following advantages or beneficial effects. The embodiment adds a classification field and a tag field to the operation and maintenance data, writes the data and its corresponding classification field and tag field into a database, creates a document from them and updates the document into an index, identifies target fields from the search statement, finds multiple matching documents in the index according to the target fields, and ranks those documents. These technical means overcome the technical problem in the prior art that full-ecosystem operation and maintenance data cannot be searched conveniently. Embodiments of the present invention can flexibly store large volumes of complex, multi-type data, and can also accurately retrieve the operation and maintenance data that matches different needs, so that full-ecosystem operation and maintenance data can be searched conveniently.

The further effects of the above non-conventional optional implementations are described below in conjunction with the specific embodiments.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:

Fig. 1 is a flowchart of a data search method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a system architecture for implementing the data search method of an embodiment of the present invention;

Fig. 3 is a flowchart of a data search method according to one referenceable embodiment of the present invention;

Fig. 4 is a flowchart of a data search method according to another referenceable embodiment of the present invention;

Fig. 5 is a schematic diagram of a data search apparatus according to an embodiment of the present invention;

Fig. 6 is a diagram of an exemplary system architecture to which embodiments of the present invention can be applied;

Fig. 7 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server of an embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. A person of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of the invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.

It should be noted that, in the technical solution of the present invention, the collection, analysis, use, transmission, storage and other handling of user personal information complies with the relevant laws and regulations, serves lawful and reasonable purposes, is not shared, disclosed or sold beyond those lawful uses, and is subject to supervision by the regulatory authorities. Necessary measures should be taken to protect user personal information: preventing unlawful access to such data, ensuring that persons authorized to access it comply with the relevant laws and regulations, and keeping user personal information secure. Once user personal information is no longer needed, risk should be minimized by restricting or even prohibiting data collection and/or by deleting the data.

Embodiments of the present invention mainly implement two functions: collecting complex, multi-type system information and data, and, on that basis, providing efficient, accurate, multi-dimensional search, so that full-ecosystem operation and maintenance information can be searched conveniently.

Fig. 1 is a flowchart of a data search method according to an embodiment of the present invention. As one embodiment of the present invention, as shown in Fig. 1, the data search method may include:

Step 101: receive operation and maintenance data pushed by each operation and maintenance system, or pull operation and maintenance data from each operation and maintenance system.

As shown in Fig. 2, a data processing layer service can be set up to receive the latest operation and maintenance data pushed by each operation and maintenance system (for example, operation and maintenance system A, operation and maintenance system B, operation and maintenance system C, and so on). As a supplementary measure, the data processing layer service also actively scans each connected operation and maintenance system at regular intervals, so as to catch anything that was missed and keep the data complete.

Because of network problems or traffic limits, an operation and maintenance system sometimes cannot guarantee that it pushes its latest data in real time. Relying only on an interface that passively receives data may therefore leave gaps, so the service also needs an active scanning mechanism. For every operation and maintenance system connected to it, the service can set a separate scan period according to that system's data volume or request frequency, so that each system is scanned continuously at a fixed frequency. Note that actively scanned data is filtered: information that has already been entered into the database is ignored and is not entered again.

Obtaining operation and maintenance data through the active and passive channels in parallel thus ensures that the ingested data is both complete and timely.
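
The combination of pushed data, periodic scans with a per-system period, and filtering of already-recorded data can be sketched as follows. The `Ingestor` class, its method names, the use of `(system, record id)` as the deduplication key and the caller-supplied `fetchers` are hypothetical; the source does not describe the interfaces of the data processing layer at this level of detail.

```python
class Ingestor:
    """Toy data processing layer: accepts pushes and runs periodic scans."""

    def __init__(self, scan_periods):
        # scan_periods maps a system name to its scan period in seconds.
        self.scan_periods = dict(scan_periods)
        self.next_scan = {name: 0.0 for name in self.scan_periods}
        self.seen = set()      # (system, record id) pairs already recorded
        self.accepted = []     # stand-in for records handed on for storage

    def on_push(self, system, record):
        """Passive path: record whatever a system pushes."""
        self.seen.add((system, record["id"]))
        self.accepted.append((system, record))

    def scan_due(self, now, fetchers):
        """Active path: scan every system whose period has elapsed.

        `fetchers` maps a system name to a zero-argument callable that
        returns that system's current records. Records already seen are
        skipped. Returns the number of newly accepted records.
        """
        added = 0
        for system, period in self.scan_periods.items():
            if now < self.next_scan[system]:
                continue
            self.next_scan[system] = now + period
            for record in fetchers[system]():
                key = (system, record["id"])
                if key in self.seen:
                    continue
                self.seen.add(key)
                self.accepted.append((system, record))
                added += 1
        return added
```

Passing the current time into `scan_due` keeps the sketch deterministic; a service would typically call it from a timer loop with a monotonic clock.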

Step 102: add a classification field and a tag field to the operation and maintenance data, write the operation and maintenance data and its corresponding classification field and tag field into a database, create a document from the operation and maintenance data and its corresponding classification field and tag field, and update the document into an index.

To store a large amount of operation and maintenance data with a flexible structure, embodiments of the present invention use a NoSQL distributed database cluster for data storage. There are two reasons for this choice: the storage structure is flexible and can accommodate the operation and maintenance data of various operation and maintenance systems; and the storage capacity is large and can be expanded at any time. Storing operation and maintenance data in a NoSQL distributed database cluster therefore both allows a large amount of data to be stored and preserves the data's original format.

Because a NoSQL database is used for storage, the interface service that receives operation and maintenance data places no strict constraints on the structure of incoming data, so the original structure of the operation and maintenance data can be preserved in storage. While preserving the native data structure, the data processing layer also adds some system information to support later search and management, for example:

1) Adding classification fields: classification fields are added to the operation and maintenance data according to its source, that is, the operation and maintenance system from which it was obtained. Examples include the name, number and type of that operation and maintenance system, and deployment and runtime information such as the names and numbers of the network segment, the machine room, the equipment and the region where it is deployed. Later searches can then use a specific classification field as a condition, for example searching by network segment or by machine room.

2) Adding tag fields: tag fields differ from classification fields in that classifications are a limited set of categories that reflect the overall information of an operation and maintenance system, whereas tags are phrases or keywords that describe the functional details of the system as fully as possible. Keywords are extracted from the operation and maintenance data as tag fields according to preset tag types. Tags are very valuable search conditions and can greatly improve accuracy in later searches. Note that, depending on which tags the operation and maintenance data matches, one tag field or several tag fields may be added to it.

3) Setting up the directory: a tree directory is generated from the classification fields above, and the operation and maintenance data is added to the directory information. A tree directory makes it easier to manage and locate operation and maintenance data, for example by owning unit, department or machine room, so that the operation and maintenance data under the same category can be viewed quickly.
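
Items 1) and 2) above can be sketched as a small enrichment function. The `SOURCE_SYSTEMS` registry, the `PRESET_TAGS` list, the field names and the substring-based tag matching are assumptions made for illustration; the source does not specify how tags are matched against preset tag types.

```python
# Hypothetical registry describing each connected operation and maintenance system.
SOURCE_SYSTEMS = {
    "sys-a": {
        "system_name": "Monitoring A",
        "system_type": "monitoring",
        "network_segment": "10.1.0.0/16",
        "machine_room": "room-1",
    },
}

# Hypothetical preset tag vocabulary.
PRESET_TAGS = ["disk", "cpu", "backup", "firewall"]


def add_fields(record, source_id):
    """Return a copy of the record with classification and tag fields added.

    The original keys are kept untouched so the native structure of the
    operation and maintenance data is preserved.
    """
    enriched = dict(record)
    enriched["classification"] = dict(SOURCE_SYSTEMS[source_id])
    text = str(record.get("content", "")).lower()
    # Zero, one or several tags may match the same record.
    enriched["tags"] = [tag for tag in PRESET_TAGS if tag in text]
    return enriched
```

Because the function returns a copy, the record as received from the source system remains available unchanged.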

Optionally, writing the operation and maintenance data and its corresponding classification field and tag field into the database includes: generating a tree directory according to the classification field, and adding the operation and maintenance data to the tree directory; and writing the tree directory, together with the correspondence between the operation and maintenance data and the tag field, into the database. In embodiments of the present invention, after the operation and maintenance data pushed by each operation and maintenance system is received, or after operation and maintenance data is pulled from each operation and maintenance system, a classification field and a tag field are added to each item of operation and maintenance data. A tree directory is then generated from the classification fields and the operation and maintenance data is added to it. The data processing layer pushes the tree directory containing the operation and maintenance data, together with the correspondence between the operation and maintenance data and the tag fields, to the data storage layer for persistent storage and subsequent processing.
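
A minimal sketch of the tree directory and the tag correspondence follows. The level names (`unit`, `department`, `machine_room`), the nested-dictionary layout and both function names are assumptions; the source states only that the directory is generated from the classification fields.

```python
def add_to_tree(tree, record, levels=("unit", "department", "machine_room")):
    """Place a record id under the directory path given by its classification."""
    node = tree
    for level in levels:
        key = record["classification"].get(level, "unknown")
        node = node.setdefault("children", {}).setdefault(key, {})
    node.setdefault("records", []).append(record["id"])


def tag_rows(record):
    """Rows of (record id, tag) expressing the data-to-tag correspondence."""
    return [(record["id"], tag) for tag in record["tags"]]
```

The tree and the rows returned by `tag_rows` are the two things that would then be handed to the storage layer for persistence.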

To facilitate the storage and management of large volumes of heterogeneous operation and maintenance data, the embodiment of the present invention uses a NoSQL distributed database cluster as the underlying database of the data storage layer. The data storage layer performs two functions: it uses the flexible data structures of NoSQL to store the heterogeneous operation and maintenance data passed from the data processing layer; and, for subsequent searching, it converts the stored data into the search index format used by the search service, that is, it creates documents from the operation and maintenance data and its corresponding classification fields and tag fields and updates the documents into the index.

Data pushed from the data processing layer is handled in one of two ways: insertion or modification. Data not yet present in the database is inserted as new data; for data that already exists, the corresponding fields are modified, which keeps the data timely and correct. Specifically, the data is stored in the distributed database cluster according to its data type. The data type reflects the category of system information, such as numeric or text system information, date-type system calendars, and file-type system logs and system manuals. The correspondence between the data identifier of an item of operation and maintenance data and its tag fields can be stored in a database table.
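The insert-or-modify decision can be sketched as follows, with an in-memory dictionary standing in for the distributed database cluster. The `id` key as data identifier and the function name `upsert` are assumptions for illustration.

```python
def upsert(store, record):
    """Insert the record if its id is unknown, otherwise update the changed fields."""
    existing = store.get(record["id"])
    if existing is None:
        store[record["id"]] = dict(record)   # new data
        return "inserted"
    existing.update(record)                  # existing data: modify matching fields
    return "updated"

store = {}
first = upsert(store, {"id": "sys-001", "region": "East", "status": "running"})
second = upsert(store, {"id": "sys-001", "status": "stopped"})
```

Only the fields present in the incoming record are overwritten, so fields that were not re-sent (here `region`) keep their stored values.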

Optionally, creating a document from the operation and maintenance data and its corresponding classification fields and tag fields, and updating the document into the index, includes: generating index fields from the operation and maintenance data; assembling the index fields, the operation and maintenance data, and its corresponding classification fields and tag fields into a document; and writing the document to an index writer, thereby updating the document into the index. As shown in Figure 2, newly stored data triggers a search index update so that users can find it as soon as possible. The process is as follows: first, index fields (Field) are generated from the operation and maintenance data; next, the classification fields, tag fields, and index fields of that data are assembled into a document (Document); finally, the document is written to the index writer (IndexWriter), which updates it into the index. In other words, the operation and maintenance data of multiple systems is organized into a single index for searching. Once the database and the corresponding index have been updated, the data update process ends. Finally, the data storage layer stores the generated search index data in the database for searching.

Optionally, generating index fields from the operation and maintenance data includes: generating index fields from the metadata of the operation and maintenance data; and/or performing word segmentation on the operation and maintenance data to obtain index fields. For each item of operation and maintenance data, all of its data types are extracted to generate the index fields. Depending on the classification of the data and the type of content, some index fields store unprocessed metadata directly, such as the data identifier, network type, and deployment region, whose values must be kept whole; other fields, such as system logs, system files, and system profiles, are stored after word segmentation to make them as easy to find as possible.
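The document-building flow described above can be sketched in a few lines. This is a toy inverted index, not the Lucene API: the class `IndexWriterSketch`, its method `update_document`, the field names, and the regex tokenizer (a stand-in for a real Chinese word segmenter) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical field names: which fields are kept whole and which are segmented.
WHOLE_FIELDS = {"id", "network_type", "region"}
TOKENIZED_FIELDS = {"log", "profile"}

def tokenize(text):
    """Naive word splitter standing in for a real Chinese word segmenter."""
    return re.findall(r"\w+", text.lower())

def build_document(record, categories, tags):
    """Assemble index fields, classification fields and tag fields into one document."""
    fields = {}
    for name, value in record.items():
        if name in WHOLE_FIELDS:
            fields[name] = [str(value)]          # stored unprocessed
        elif name in TOKENIZED_FIELDS:
            fields[name] = tokenize(str(value))  # stored after segmentation
    fields["category"] = list(categories)
    fields["tag"] = list(tags)
    return fields

class IndexWriterSketch:
    """Toy inverted index mapping (field, term) to a set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def update_document(self, doc_id, document):
        # Drop stale postings first so a modified record replaces its old entry.
        for ids in self.postings.values():
            ids.discard(doc_id)
        for field, terms in document.items():
            for term in terms:
                self.postings[(field, term)].add(doc_id)

writer = IndexWriterSketch()
record = {"id": "sys-001", "region": "East-1", "log": "Connection timeout on node 3"}
writer.update_document(record["id"], build_document(record, ["Data Center A"], ["intranet"]))
```

Keeping `region` whole means a query must match `East-1` exactly, whereas the segmented `log` field can be matched on any single word such as `timeout`.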

Step 103: receive a search request sent by a searcher, the search request carrying a search statement entered by a user, and identify target fields from the search statement.

For a search request on operation and maintenance data, the search service layer analyzes the search statement entered by the user to identify target fields, and then uses the identified target fields to find, in the data storage layer, the operation and maintenance data that meets the user's expectations. As shown in Figure 2, input processing such as keyword extraction, pinyin conversion, and intent recognition is applied to the search statement to identify target fields such as keywords, tag fields, associated fields, and word segments, so that the search service can capture the user's actual search intent more precisely.

Step 104: find multiple matching documents in the index according to the target fields, sort the documents, and return the top-ranked documents to the searcher.

As shown in Figure 2, the target fields identified in step 103 are used to find qualifying documents in the search index of the data storage layer. The qualifying documents are sorted by the relevance between the target fields and each document, and the top-ranked documents are returned to the searcher. The embodiment of the present invention can therefore find, across multiple dimensions and according to user needs, the operation and maintenance data in the data storage layer that meets the user's expectations.

The embodiment of the present invention collects a large amount of operation and maintenance data and can provide complete vertical and horizontal data for an operation and maintenance node (such as a server). The vertical data of a node is all the operation and maintenance data it generates over its lifetime, from commissioning to decommissioning; the horizontal data describes the node's environment and the content of adjacent nodes. The embodiment can therefore connect a node's vertical and horizontal data and effectively extract the value of that data.

As the embodiments described above show, the embodiment of the present invention adds classification fields and tag fields to operation and maintenance data, writes the data and its corresponding classification fields and tag fields into a database, creates documents from them and updates the documents into an index, identifies target fields from a search statement, finds multiple matching documents in the index according to the target fields, and sorts those documents. These technical means solve the technical problem that the prior art cannot conveniently search operation and maintenance data across the whole ecosystem. The embodiment can flexibly store large volumes of complex, multi-type data and can also accurately retrieve the corresponding operation and maintenance data for different needs, making whole-ecosystem operation and maintenance data convenient to search.

Fig. 3 is a flowchart of a data search method according to a reference embodiment of the present invention. As a further embodiment of the present invention, as shown in Fig. 3, the data search method may include:

Step 301: receive operation and maintenance data pushed by the operation and maintenance systems, or pull operation and maintenance data from the operation and maintenance systems.

Step 302: add classification fields and tag fields to the operation and maintenance data, write the data and its corresponding classification fields and tag fields into a database, create a document from the data and its corresponding classification fields and tag fields, and update the document into the index.

Step 303: receive a search request sent by a searcher, the search request carrying a search statement entered by a user.

Step 304: preprocess the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation.

Step 305: extract keywords and/or tag fields from the preprocessed search statement, and identify the user's intent from the keywords and/or tag fields to obtain associated fields.

The embodiment of the present invention uses Lucene full-text search. Compared with a traditional LIKE search, it is more accurate and more flexible, and it can return the results that best match the user's request while attempting to understand the user's intent.

To retrieve operation and maintenance data that satisfies the search conditions, the search statement entered by the user is first processed to help express the search intent more clearly. The processing mainly includes the following steps:

1) Pinyin conversion: This step works in both directions. For non-Chinese characters entered by the user, the system checks whether they are the pinyin of a Chinese word; if so, the Chinese word is added to the search statement. For Chinese input, the corresponding pinyin is added, to handle homophones and mistyped characters.

For example: 数据库 (database) -> shujuku

Ceshi -> 测试 (test)

2) Completion: Phrases extracted from the search index and the tag fields are used to help the user complete the search statement. Operation and maintenance is a fairly specialized domain with many proper nouns to remember, so this function helps users complete their input. For example, when the user types "Orac", the system completes it to the database name "Oracle".

3) Synonym supplementation: For the entered search statement, a maintained synonym table is consulted for phrases with similar meanings; any that are found are added to the search statement to increase the chance of a hit. In the operation and maintenance scenario, some terms can be treated as synonyms, such as "software" and "application", or "framework" and "architecture".

4) Keyword extraction: Operation and maintenance data contains many technical terms and key expressions, such as IP addresses. Using a pre-maintained keyword table and patterns such as the IP address format, the keywords of the search can be extracted, and subsequent searching gives these keywords extra weight.

5) Tag check: The entered search statement may contain tags. If a tag is hit, subsequent searching focuses on data that carries that tag. For example, for the search statement "which single-point deployment information systems does the intranet test environment contain", a system that carries the tags "intranet", "test environment", "single-point deployment", and "information system" can be considered to basically satisfy the search conditions.

6) Intent recognition: From the extracted keywords and the detected tags, the user's search intent can be inferred, such as the type of information system being sought or the emphasis of the search. These intents help later stages find the data the user needs more accurately. For example, when a user searches for "systems under heavy performance pressure recently", intent recognition allows the search to return systems with many network requests and high CPU and memory utilization.
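The deterministic parts of the processing steps above (pinyin conversion, completion, synonym supplementation, keyword extraction, and tag check) can be sketched with simple lookup tables. Every table and function name below is a hypothetical stand-in for the maintained vocabularies the text refers to; intent recognition is omitted because the text does not specify how it is implemented.

```python
import re

# All tables are hypothetical stand-ins for the maintained vocabularies.
PINYIN_TO_WORD = {"ceshi": "测试", "shujuku": "数据库"}
WORD_TO_PINYIN = {word: pinyin for pinyin, word in PINYIN_TO_WORD.items()}
VOCABULARY = ["Oracle", "MySQL", "Redis"]
SYNONYMS = {"software": ["application"], "framework": ["architecture"]}
KNOWN_TAGS = ["intranet", "test environment", "single-point deployment"]
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def convert_pinyin(terms):
    """Two-way conversion: add the Chinese word for pinyin input and vice versa."""
    extra = []
    for term in terms:
        if term.lower() in PINYIN_TO_WORD:
            extra.append(PINYIN_TO_WORD[term.lower()])
        elif term in WORD_TO_PINYIN:
            extra.append(WORD_TO_PINYIN[term])
    return terms + extra

def complete(prefix, vocabulary=VOCABULARY):
    """Suggest vocabulary entries that start with the typed prefix."""
    return [word for word in vocabulary if word.lower().startswith(prefix.lower())]

def expand_synonyms(terms):
    """Append synonyms from the maintained table to widen the search."""
    return terms + [syn for term in terms for syn in SYNONYMS.get(term.lower(), [])]

def extract_keywords(query):
    """Pull out strings shaped like IPv4 addresses (format check only)."""
    return IPV4.findall(query)

def match_tags(query):
    """Return the known tags that appear in the query."""
    lowered = query.lower()
    return [tag for tag in KNOWN_TAGS if tag in lowered]
```

Each helper only adds terms and never removes the user's original input, which matches the description of these steps as ways to widen or sharpen the search rather than replace the query.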

Step 306: perform word segmentation on the search statement to obtain word segments.

Because of the particular way Chinese expresses meaning, good search results depend on how the search statement is segmented and on measures that supplement the recalled data. Since some fields of the index data were segmented when the index was built, the search statement must also be segmented so that the searched content matches the stored content and as much of this data as possible is found.

Even the best tokenizer cannot perfectly capture the user's intent, because Chinese word segmentation is extremely complex. Sometimes searching directly on the whole statement without segmentation, that is, in LIKE form, finds the most suitable result. Therefore, in addition to segmented search, when few results are recalled or the recall scores are low, additional means are used to widen the range of recalled data:

1) Character-by-character search: The search statement is split into individual characters, and each character is searched in the index. This works well when the input is short and very little data is recalled. In the operation and maintenance search scenario, the search statement may be complex, or the source data may contain mixed information that disturbs segmentation, so that no segmentation method easily matches the documents stored in the index. Splitting the statement character by character, searching, and returning everything that matches may then surface a suitable result.

2) Unsegmented search: This uses the LIKE form of database search. The search statement is treated as one complete phrase, and data containing that phrase is recalled from the index. For statements that mix Chinese and English or segment poorly, supplementing the recalled data in this way often helps find the desired result. For example, the numbers or names of certain functions or components of an information system sometimes mix Chinese characters, digits, and letters; searching with the complete, unsegmented statement is then likely to be more accurate.
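The two supplementary recall methods above can be sketched as fallbacks behind an ordinary segmented search. The sketch assumes documents are plain strings, uses whitespace splitting as a stand-in for a real Chinese word segmenter, and treats `min_hits` as a hypothetical threshold for "few results recalled"; the order in which the fallbacks are tried is also an assumption.

```python
def token_search(docs, tokens):
    """Segmented search: a document matches if any query token equals one of its tokens."""
    return {doc_id for doc_id, text in docs.items()
            if any(token in text.split() for token in tokens)}

def phrase_search(docs, query):
    """Unsegmented, LIKE-style search: the whole query must occur as a substring."""
    return {doc_id for doc_id, text in docs.items() if query in text}

def char_search(docs, query):
    """Character-by-character search: any shared non-space character counts as a match."""
    chars = {c for c in query if not c.isspace()}
    return {doc_id for doc_id, text in docs.items() if chars & set(text)}

def search_with_fallback(docs, query, min_hits=1):
    """Widen recall step by step while fewer than min_hits documents are found."""
    hits = token_search(docs, query.split())
    if len(hits) < min_hits:
        hits |= phrase_search(docs, query)
    if len(hits) < min_hits:
        hits |= char_search(docs, query)
    return hits

docs = {"a": "支付网关A3模块 日志", "b": "订单系统 简介"}
```

With these sample documents, the mixed query `网关A3` misses under whitespace tokenization but is recovered by the LIKE-style fallback, which is the situation the text describes for component names that mix Chinese characters, digits, and letters.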

However the data is recalled, it is not yet in any particular order. To return the results that best match the user's intent first, the data must still be sorted.

Step 307: find multiple matching documents in the index according to the target fields, and sort the documents.

Step 308: return the top-ranked documents to the searcher.

In addition, the specific implementation of the data search method in this reference embodiment has been described in detail in the data search method above, so the repeated content is not described again here.

Fig. 4 is a flowchart of a data search method according to another reference embodiment of the present invention. As another embodiment of the present invention, as shown in Fig. 4, the data search method may include:

Step 401: receive operation and maintenance data pushed by the operation and maintenance systems, or pull operation and maintenance data from the operation and maintenance systems.

Step 402: add classification fields and tag fields to the operation and maintenance data, write the data and its corresponding classification fields and tag fields into a database, create a document from the data and its corresponding classification fields and tag fields, and update the document into the index.

Step 403: receive a search request sent by a searcher, the search request carrying a search statement entered by a user.

Step 404: identify target fields from the search statement.

Step 405: for each document in the index, calculate the relevance score between each target field and each index field, classification field, and tag field of the document, and compute the weighted sum of the calculated relevance scores to obtain the BM25 value of the search statement with respect to the document.

Data sorting reorders the data to be returned according to a rule. The goal is to place the results that best match the search semantics, and the data the user is most likely to select, at the top. The key choices in this step are the sorting dimensions and the sorting algorithm. The embodiment of the present invention scores the recalled data with the BM25 algorithm; comparing the scores then yields the order.

Optionally, the tag field has the highest weight, and that weight is greater than 1. An information system has many kinds of fields, and their importance differs: the system name or system profile, for instance, is clearly more valuable in a search than system logs or system files. Therefore, when the BM25 value is calculated, the scores must be weighted and summed according to the field in which each match occurred before sorting. For example:

As can be seen, tags are very important for describing the specific content of a system. Content matched in tags is given a higher weight so that the matched documents rank first. Matches in important fields such as the system name or system profile are also given a positive weighting. For fields such as system logs or system files, which are numerous and contain mixed information, a moderate negative weighting is applied to steer the sorting, so that these results are displayed further down and do not obscure the better matches.

Step 406: sort the documents in descending order of BM25 value.

Step 407: return the top-ranked documents to the searcher.

In addition, the specific implementation of the data search method in this other reference embodiment has been described in detail in the data search method above, so the repeated content is not described again here.

Fig. 5 is a schematic diagram of a data search device according to an embodiment of the present invention. As shown in Fig. 5, the data search device 500 includes a receiving module 501, a storage module 502, a processing module 503, and a search module 504. The receiving module 501 is configured to receive operation and maintenance data pushed by the operation and maintenance systems, or to pull operation and maintenance data from them. The storage module 502 is configured to add classification fields and tag fields to the operation and maintenance data, write the data and its corresponding classification fields and tag fields into a database, create a document from them, and update the document into the index. The processing module 503 is configured to receive a search request sent by a searcher, the search request carrying a search statement entered by a user, and to identify target fields from the search statement. The search module 504 is configured to find multiple matching documents in the index according to the target fields, sort the documents, and return the top-ranked documents to the searcher.

Optionally, the storage module 502 is further configured to:

generate a tree directory according to the classification fields, and add the operation and maintenance data to the tree directory;

write the tree directory, together with the correspondence between the operation and maintenance data and the tag fields, into the database.

Optionally, the storage module 502 is further configured to:

generate index fields from the operation and maintenance data;

assemble the index fields, the operation and maintenance data, and its corresponding classification fields and tag fields into a document, and write the document to an index writer, thereby updating the document into the index.

Optionally, the storage module 502 is further configured to:

generate index fields from the metadata of the operation and maintenance data; and/or,

perform word segmentation on the operation and maintenance data to obtain index fields.

Optionally, the processing module 503 is further configured to:

preprocess the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation;

extract keywords and/or tag fields from the preprocessed search statement, and identify the user's intent from the keywords and/or tag fields to obtain associated fields;

perform word segmentation on the search statement to obtain word segments;

wherein the target fields include the keywords and/or tag fields, the associated fields, and the word segments.

Optionally, the search module 504 is further configured to:

for each document in the index, calculate the relevance score between each target field and each index field, classification field, and tag field of the document, and compute the weighted sum of the calculated relevance scores to obtain the BM25 value of the search statement with respect to the document;

sort the documents in descending order of BM25 value.

It should be noted that the specific implementation of the data search device of the present invention has been described in detail in the data search method above, so the repeated content is not described again here.

Fig. 6 shows an exemplary system architecture 600 to which the data search method or data search device of the embodiments of the present invention can be applied.

As shown in Fig. 6, the system architecture 600 may include terminal devices 601, 602, and 603, a network 604, and a server 605. The network 604 is the medium that provides communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired or wireless communication links or fiber optic cables.

Users can use the terminal devices 601, 602, 603 to interact with the server 605 through the network 604 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 601, 602, 603, such as shopping applications, web browsers, search applications, instant messaging tools, email clients, and social platform software (examples only).

The terminal devices 601, 602, 603 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.

The server 605 may be a server that provides various services, such as a back-end management server that supports shopping websites browsed by users on the terminal devices 601, 602, 603 (example only). The back-end management server can analyze and otherwise process received data, such as item information query requests, and feed the processing results back to the terminal devices.

It should be noted that the data search method provided by the embodiments of the present invention is generally executed by the server 605; accordingly, the data search device is generally arranged in the server 605.

It should be understood that the numbers of terminal devices, networks, and servers in Fig. 6 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.

Referring now to Fig. 7, a schematic structural diagram is shown of a computer system 700 suitable for implementing a terminal device of an embodiment of the present invention. The terminal device shown in Fig. 7 is only an example and should not limit the functions or scope of use of the embodiments of the present invention in any way.

As shown in Fig. 7, the computer system 700 includes a central processing unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage section 708 into a random access memory (RAM) 703. The RAM 703 also stores the programs and data required for the operation of the system 700. The CPU 701, ROM 702, and RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card or a modem. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711, such as a magnetic disk, optical disk, magneto-optical disk, or semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read from it can be installed into the storage section 708 as needed.

In particular, according to the disclosed embodiments of the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, the disclosed embodiments of the present invention include a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. When the computer program is executed by the central processing unit (CPU) 701, the above-described functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium described in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random-access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, and the like, or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer programs according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the figures. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including a receiving module, a storage module, a processing module, and a search module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves.

As another aspect, the present invention also provides a computer-readable medium. The computer-readable medium may be contained in the device described in the above embodiments, or it may exist independently without being assembled into the device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the device, the device implements the following method: receiving operation and maintenance data pushed by each operation and maintenance system, or pulling operation and maintenance data from each operation and maintenance system; adding classification fields and label fields to the operation and maintenance data, writing the operation and maintenance data and its corresponding classification fields and label fields into a database, creating a document according to the operation and maintenance data and its corresponding classification fields and label fields, and updating the document into an index; receiving a search request sent by a searcher, the search request carrying a search statement entered by a user, and identifying target fields from the search statement; and finding a plurality of matching documents in the index according to the target fields and sorting the plurality of documents, so as to return the top-ranked documents to the searcher.
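The method steps listed above can be illustrated with a minimal in-memory sketch. It is not the implementation of the embodiments: the names `Document` and `TinyIndex`, the whitespace tokenisation that stands in for word segmentation, and the match-count ranking are all assumptions made for illustration, and no database is involved.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: int
    text: str        # raw operation and maintenance record
    category: str    # classification field (hypothetical values)
    tags: list = field(default_factory=list)  # label fields

    def terms(self):
        # Whitespace tokenisation stands in for real word segmentation.
        words = self.text.lower().split()
        return set(words) | {self.category.lower()} | {t.lower() for t in self.tags}


class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}

    def update(self, doc):
        # "Updating the document into the index": store it and post its terms.
        self.docs[doc.doc_id] = doc
        for term in doc.terms():
            self.postings[term].add(doc.doc_id)

    def search(self, query, top_k=3):
        # Target fields are approximated by the lower-cased query words.
        hits = defaultdict(int)
        for term in query.lower().split():
            for doc_id in self.postings.get(term, ()):
                hits[doc_id] += 1
        # Rank by number of matched terms (an assumed stand-in for real scoring).
        ranked = sorted(hits, key=lambda d: (-hits[d], d))
        return [self.docs[d] for d in ranked[:top_k]]


index = TinyIndex()
index.update(Document(1, "disk usage alert on host db01", "alert", ["disk", "db"]))
index.update(Document(2, "switch port flapping", "network", ["switch"]))
index.update(Document(3, "disk replaced on host web02", "change", ["disk"]))
results = index.search("disk alert")
print([d.doc_id for d in results])
```

In this sketch the query "disk alert" matches two terms of document 1 and one term of document 3, so document 1 is returned first.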

As another aspect, an embodiment of the present invention further provides a computer program product including a computer program that, when executed by a processor, implements the method described in any of the foregoing embodiments.

The technical solution of the embodiments of the present invention adds classification fields and label fields to the operation and maintenance data, writes the operation and maintenance data and its corresponding classification fields and label fields into a database, creates a document according to the operation and maintenance data and its corresponding classification fields and label fields, updates the document into an index, identifies target fields from a search statement, finds a plurality of matching documents in the index according to the target fields, and sorts the plurality of documents. It thereby overcomes the technical problem in the prior art that operation and maintenance data across the whole ecosystem cannot be searched conveniently. The embodiments of the present invention can both flexibly store large amounts of complex, multi-type data and accurately retrieve the corresponding operation and maintenance data according to different needs, so that operation and maintenance data across the whole ecosystem can be searched conveniently.
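One way to picture how classification fields can organise stored records is a tree-shaped directory, as recited in the claims. The sketch below is only an assumption about how such a tree might look: the slash-separated category paths, the record identifiers, and the helper names `build_tree` and `collect` are hypothetical.

```python
def build_tree(records):
    """Group records into a nested dict keyed by classification path segments."""
    root = {"children": {}, "items": []}
    for category, record_id in records:
        node = root
        # Walk (or create) one tree level per path segment.
        for part in category.split("/"):
            node = node["children"].setdefault(part, {"children": {}, "items": []})
        node["items"].append(record_id)
    return root


def collect(node):
    """Return every record id stored at or below the given node."""
    ids = list(node["items"])
    for child in node["children"].values():
        ids.extend(collect(child))
    return ids


tree = build_tree([
    ("network/switch", "rec-1"),
    ("network/firewall", "rec-2"),
    ("database/mysql", "rec-3"),
])
network_ids = sorted(collect(tree["children"]["network"]))
print(network_ids)
```

Browsing the "network" node in this sketch yields both the switch and the firewall records, which is the kind of category-level lookup a tree directory makes cheap.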

The above specific embodiments do not limit the protection scope of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may be made depending on design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (15)

1. A data search method, characterized by comprising:
receiving operation and maintenance data pushed by each operation and maintenance system, or pulling operation and maintenance data from each operation and maintenance system;
adding classification fields and label fields to the operation and maintenance data, writing the operation and maintenance data and its corresponding classification fields and label fields into a database, creating a document according to the operation and maintenance data and its corresponding classification fields and label fields, and updating the document into an index;
receiving a search request sent by a searcher, the search request carrying a search statement input by a user, and identifying target fields from the search statement;
finding a plurality of matching documents in the index according to the target fields, and sorting the plurality of documents, so as to return the top-ranked documents to the searcher.

2. The method according to claim 1, characterized in that writing the operation and maintenance data and its corresponding classification fields and label fields into the database comprises:
generating a tree-shaped directory according to the classification fields, and adding the operation and maintenance data to the tree-shaped directory;
writing the tree-shaped directory, and the correspondence between the operation and maintenance data and the label fields, into the database.

3. The method according to claim 1, characterized in that creating a document according to the operation and maintenance data and its corresponding classification fields and label fields, and updating the document into the index, comprises:
generating index fields according to the operation and maintenance data;
assembling the index fields, the operation and maintenance data, and its corresponding classification fields and label fields into one document, and writing the document into an index creator, thereby updating the document into the index.

4. The method according to claim 3, characterized in that generating index fields according to the operation and maintenance data comprises:
generating index fields according to metadata of the operation and maintenance data; and/or,
performing word segmentation on the operation and maintenance data to obtain index fields.

5. The method according to claim 1, characterized in that identifying target fields from the search statement comprises:
preprocessing the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation;
extracting keywords and/or label fields from the preprocessed search statement, and identifying user intent according to the keywords and/or the label fields, thereby obtaining associated fields;
performing word segmentation on the search statement to obtain segmented words;
wherein the target fields include the keywords and/or label fields, the associated fields, and the segmented words.

6. The method according to claim 5, characterized in that finding a plurality of matching documents in the index according to the target fields, and sorting the plurality of documents, comprises:
for each document in the index, separately calculating a relevance score between each target field and each index field, each classification field, and each label field in the document, and computing a weighted sum of the calculated relevance scores, thereby obtaining a BM25 value of the search statement with respect to the document;
sorting the documents in descending order of BM25 value.

7. A data search device, characterized by comprising:
a receiving module, configured to receive operation and maintenance data pushed by each operation and maintenance system, or to pull operation and maintenance data from each operation and maintenance system;
a storage module, configured to add classification fields and label fields to the operation and maintenance data, write the operation and maintenance data and its corresponding classification fields and label fields into a database, create a document according to the operation and maintenance data and its corresponding classification fields and label fields, and update the document into an index;
a processing module, configured to receive a search request sent by a searcher, the search request carrying a search statement input by a user, and to identify target fields from the search statement;
a search module, configured to find a plurality of matching documents in the index according to the target fields and sort the plurality of documents, so as to return the top-ranked documents to the searcher.

8. The device according to claim 7, characterized in that the storage module is further configured to:
generate a tree-shaped directory according to the classification fields, and add the operation and maintenance data to the tree-shaped directory;
write the tree-shaped directory, and the correspondence between the operation and maintenance data and the label fields, into the database.

9. The device according to claim 7, characterized in that the storage module is further configured to:
generate index fields according to the operation and maintenance data;
assemble the index fields, the operation and maintenance data, and its corresponding classification fields and label fields into one document, and write the document into an index creator, thereby updating the document into the index.

10. The device according to claim 9, characterized in that the storage module is further configured to:
generate index fields according to metadata of the operation and maintenance data; and/or,
perform word segmentation on the operation and maintenance data to obtain index fields.

11. The device according to claim 7, characterized in that the processing module is further configured to:
preprocess the search statement, the preprocessing including at least one of pinyin conversion, completion, and synonym supplementation;
extract keywords and/or label fields from the preprocessed search statement, and identify user intent according to the keywords and/or the label fields, thereby obtaining associated fields;
perform word segmentation on the search statement to obtain segmented words;
wherein the target fields include the keywords and/or label fields, the associated fields, and the segmented words.

12. The device according to claim 11, characterized in that the search module is further configured to:
for each document in the index, separately calculate a relevance score between each target field and each index field, each classification field, and each label field in the document, and compute a weighted sum of the calculated relevance scores, thereby obtaining a BM25 value of the search statement with respect to the document;
sort the documents in descending order of BM25 value.

13. An electronic device, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.

14. A computer-readable medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-6 is implemented.

15. A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-6 is implemented.
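The weighted per-field scoring recited in claims 6 and 12 can be sketched as follows. This is a hedged illustration rather than the claimed implementation: it uses the widely used BM25 term formula with the common defaults k1 = 1.5 and b = 0.75, and the field names, the `FIELD_WEIGHTS` values, and the sample documents are hypothetical. The document list is assumed to be non-empty.

```python
import math
from collections import Counter

K1, B = 1.5, 0.75  # common BM25 defaults, assumed here
FIELD_WEIGHTS = {"index": 1.0, "category": 2.0, "tags": 3.0}  # hypothetical weights


def bm25_field(query_terms, docs, field):
    """Return one BM25 score per document for a single field."""
    token_lists = [doc[field] for doc in docs]
    n_docs = len(token_lists)
    avg_len = sum(len(t) for t in token_lists) / n_docs or 1.0
    # Document frequency: in how many documents each term appears in this field.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    scores = []
    for tokens in token_lists:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + K1 * (1 - B + B * len(tokens) / avg_len)
            score += idf * tf[term] * (K1 + 1) / norm
        scores.append(score)
    return scores


def rank(query_terms, docs):
    """Weighted sum of per-field scores, sorted from largest to smallest."""
    totals = [0.0] * len(docs)
    for field, weight in FIELD_WEIGHTS.items():
        for i, s in enumerate(bm25_field(query_terms, docs, field)):
            totals[i] += weight * s
    order = sorted(range(len(docs)), key=lambda i: -totals[i])
    return [(docs[i]["id"], totals[i]) for i in order]


docs = [
    {"id": "a", "index": ["disk", "full", "on", "db01"], "category": ["alert"], "tags": ["disk"]},
    {"id": "b", "index": ["switch", "port", "down"], "category": ["network"], "tags": ["switch"]},
    {"id": "c", "index": ["disk", "replaced"], "category": ["change"], "tags": ["hardware"]},
]
ranking = rank(["disk", "alert"], docs)
print([doc_id for doc_id, _ in ranking])
```

Giving the label and classification fields larger weights than the free-text index field is one plausible design choice: a match on a curated label then outranks an incidental match in the record body.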
CN202310551514.5A 2023-05-16 2023-05-16 Data search method, device, electronic device and computer readable medium Pending CN116561292A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310551514.5A CN116561292A (en) 2023-05-16 2023-05-16 Data search method, device, electronic device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310551514.5A CN116561292A (en) 2023-05-16 2023-05-16 Data search method, device, electronic device and computer readable medium

Publications (1)

Publication Number Publication Date
CN116561292A true CN116561292A (en) 2023-08-08

Family

ID=87494280

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310551514.5A Pending CN116561292A (en) 2023-05-16 2023-05-16 Data search method, device, electronic device and computer readable medium

Country Status (1)

Country Link
CN (1) CN116561292A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN119377450A (en) * 2024-12-31 2025-01-28 长江证券股份有限公司 A method for constructing a securities industry data asset directory, a retrieval method and a device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130006999A1 (en) * 2011-06-30 2013-01-03 Copyright Clearance Center, Inc. Method and apparatus for performing a search for article content at a plurality of content sites
CN106294695A (en) * 2016-08-08 2017-01-04 深圳市网安计算机安全检测技术有限公司 A kind of implementation method towards the biggest data search engine
CN108363768A (en) * 2018-02-07 2018-08-03 深圳壹账通智能科技有限公司 A kind of document search method, storage medium and server based on Lucene
CN108520002A (en) * 2018-03-12 2018-09-11 平安科技(深圳)有限公司 Data processing method, server and computer storage media
US11030242B1 (en) * 2018-10-15 2021-06-08 Rockset, Inc. Indexing and querying semi-structured documents using a key-value store
CN112988863A (en) * 2021-02-09 2021-06-18 苏州中科蓝迪软件技术有限公司 Elasticsearch-based efficient search engine method for heterogeneous multiple data sources

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘耀,袁伟著: "政策研究自动化关键技术研发与应用", vol. 2021, 30 April 2021, 科学技术文献出版社, pages: 53 *

Similar Documents

Publication Publication Date Title
CN110647614B (en) Intelligent question-answering method, device, medium and electronic equipment
US11521603B2 (en) Automatically generating conference minutes
CN107491547B (en) Search method and device based on artificial intelligence
CN109325201B (en) Method, device, equipment and storage medium for generating entity relationship data
WO2019091026A1 (en) Knowledge base document rapid search method, application server, and computer readable storage medium
CN106960030B (en) Information pushing method and device based on artificial intelligence
CN111797214A (en) Question screening method, device, computer equipment and medium based on FAQ database
US9507867B2 (en) Discovery engine
US8661049B2 (en) Weight-based stemming for improving search quality
US20160041986A1 (en) Smart Search Engine
CN112256860A (en) Semantic retrieval method, system, equipment and storage medium for customer service conversation content
CN113988157B (en) Semantic retrieval network training method, device, electronic equipment and storage medium
CN113204621B (en) Document storage, document retrieval method, device, equipment and storage medium
CN111160007B (en) Search method and device based on BERT language model, computer equipment and storage medium
CN114722137A (en) Security policy configuration method, device and electronic device based on sensitive data identification
US10606903B2 (en) Multi-dimensional query based extraction of polarity-aware content
CN101187924A (en) A method and system for obtaining word-pair translations from bilingual sentence pairs
CN110245357B (en) Main entity identification method and device
CN116150497A (en) Text information recommendation method, device, electronic device and storage medium
CN111325018A (en) Domain dictionary construction method based on web retrieval and new word discovery
CN111126073B (en) Semantic retrieval method and device
CN110705285B (en) Government affair text subject word library construction method, device, server and readable storage medium
CN113360602B (en) Method, apparatus, device and storage medium for outputting information
CN116561292A (en) Data search method, device, electronic device and computer readable medium
CN114742062B (en) Text keyword extraction processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination