CN112269816B

CN112269816B - A Relevance Retrieval Method for Government Appointments

Info

Publication number: CN112269816B
Application number: CN202011244701.1A
Authority: CN
Inventors: Zhang Chao (张超)
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2020-11-10
Filing date: 2020-11-10
Publication date: 2023-04-21
Anticipated expiration: 2040-11-10
Also published as: CN112269816A

Abstract

The invention discloses a relevance retrieval method for government affairs appointment items, in the technical field of government affairs appointments. Based on user operation records, scheduled inductive tasks generate a relevance-type index, and basic information maintenance generates a normal-type index. The relevance-type index is produced by a scoring-based statistical model. The result is a combined retrieval mode for appointment items: keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. When the public books services online, the invention can accurately identify what the applicant needs and display the relevant services, making appointment handling more efficient. Its data-driven statistical analysis also keeps refining query accuracy, which improves performance and user experience.

Description

A Relevance Retrieval Method for Government Appointment Items

Technical Field

The invention relates to the technical field of government affairs appointments, and in particular to a relevance retrieval method for government affairs appointment items.

Background Art

Government services keep improving and the mobile Internet has entered a new stage. Online appointments through web pages, apps, and mini programs now give the public a simple, convenient, and efficient way to handle government affairs. Demand for intelligent service handling is nonetheless growing more urgent, and more people expect the process to be intelligent, personalized, and accurate. To move government service capability from "can be handled" and "handled quickly" to "handled intelligently", the service mode must change. Statistical analysis of user data should be applied to online services to raise the keyword hit rate and strengthen government service governance.

Summary of the Invention

The technical task of the invention is to address the above shortcomings by providing a relevance retrieval method for government affairs appointment items. When the public books services online, the method accurately identifies what the applicant needs and displays the relevant services, improving appointment efficiency. Its data-driven statistical analysis also keeps refining query accuracy, improving performance and user experience.

The technical solution adopted by the invention to solve its technical problem is as follows:

A relevance retrieval method for government affairs appointment items, in which, based on user operation records, scheduled inductive tasks generate a relevance-type index and basic information maintenance generates a normal-type index;

The relevance-type index is generated with a scoring-based statistical model. This yields a combined retrieval mode for appointment items: keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. In multi-channel online appointment scenarios, users can search appointment items quickly and receive analysis-based recommendations of related services. The system can also accurately predict which appointment items users need and recommend them intelligently. The outcome is personalized, intelligent, and accurate retrieval, less search pressure on the database, and more efficient service handling for the public.

When the public books services online, the method accurately identifies what the applicant needs and displays the relevant services, improving appointment efficiency. Its data-driven statistical analysis also keeps refining query accuracy, improving performance and user experience.

Preferably, retrieval uses the Elasticsearch search engine with the Chinese IK analyzer. Elasticsearch is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales well and holds the leading share of the open-source search field. The Chinese IK analyzer extracts keywords accurately, so a service analysis and retrieval method can be built on Elasticsearch.

A search engine replaces plain database queries, and Elasticsearch is a good fit. It is an open-source, distributed, RESTful search and data analytics engine built on the open-source Apache Lucene library. As a distributed full-text search engine it scales well and supports petabyte-scale structured or unstructured data. It can therefore locate appointment items quickly even in a large centralized deployment that holds a very large number of them.

Elasticsearch supports many high-quality analyzers, and the Chinese IK analyzer is chosen here. IK provides two segmentation modes, ik_smart and ik_max_word. To locate the user's target data as completely as possible, the finest-grained mode, ik_max_word, is used. It splits text into semantic units at multiple levels of granularity, which produces more index entries and higher locating precision.
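The sketch below shows two request bodies this setup implies. It assumes an Elasticsearch 7.x-style cluster with the IK Analysis plugin installed. The first body is for the `_analyze` API and segments a search term with `ik_max_word`. The second is a hypothetical mapping for the N-type index, in which each document pairs one keyword with one item. The field names (`keyword`, `item_id`, `index_type`) are illustrative and do not come from the source.

```python
import json

# Body for POST /_analyze: segment a search term with the IK plugin's finest-grained analyzer
ANALYZE_BODY = {"analyzer": "ik_max_word", "text": "车辆年检"}

# Hypothetical mapping for the N-type index: one document per (keyword, item) pair.
# Keywords are segmented before indexing, so they are stored as exact-match "keyword" fields.
N_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "keyword": {"type": "keyword"},
            "item_id": {"type": "keyword"},
            "index_type": {"type": "keyword"},  # "N", "C" or "R"
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(ANALYZE_BODY, ensure_ascii=False))
    print(json.dumps(N_INDEX_MAPPING, indent=2))
```

With keywords stored as `keyword` fields, a lookup is an exact term match rather than a second round of analysis. This is a design assumption of the sketch.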

Preferably, RocketMQ is used as the message queue to decouple the normal business flow from result recording.

Higher search precision requires collecting and summarizing users' target search terms and search results. Later scheduled tasks pull these into the analysis model service. A message queue provides asynchronous decoupling so this collection does not affect the normal business flow. RocketMQ is a good choice because its transactional message solution ensures that every result set is correctly consumed and stored.
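The decoupling can be sketched with an in-process queue standing in for a RocketMQ topic. This illustrates only the producer/consumer split, not RocketMQ's client API, and the function names are hypothetical. The search handler publishes one message per result and returns at once. A separate consumer thread plays the role of the log service.

```python
import queue
import threading


def log_service_consumer(topic, sink):
    """Log service: drain the topic and persist each record (here, append to a list)."""
    while True:
        message = topic.get()
        if message is None:  # sentinel that stops this sketch's consumer
            break
        sink.append(message)


def handle_search(topic, search_term, results):
    """Appointment service: publish one log message per result, then return immediately."""
    for record in results:
        topic.put({"search_term": search_term, **record})
    return results  # the client response never waits on log storage


def run_demo(search_term, results):
    topic = queue.Queue()  # in-process stand-in for a RocketMQ topic
    sink = []              # stand-in for the log store
    consumer = threading.Thread(target=log_service_consumer, args=(topic, sink))
    consumer.start()
    handle_search(topic, search_term, results)
    topic.put(None)
    consumer.join()
    return sink


if __name__ == "__main__":
    print(run_demo("车辆年检", [{"item_id": "A1", "keyword": "年检", "index_type": "N"}]))
```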

Preferably, the index scheme comprises three index types, and the retrieval result set is assembled from these three indexes plus database SQL queries. The three index types are:

N type (Normal): an index that associates the current appointment item with the keywords obtained by segmenting the item's own name;

C type (Correlation): an index of relevance patterns, derived by scheduled inductive analysis of user feedback and triggered-behavior logs;

and R type (Related): an index of the appointment items carried by the associated words of a keyword.

The three index types differ in importance. C-type results come first, sorted by relevance score, followed by N-type results and then R-type results, with duplicates removed. This ordering displays appointment items by relevance and makes retrieval more reliable.
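The ordering and de-duplication rule can be sketched as below. The sketch assumes each hit carries its item ID, its index type, and (for C-type hits) a relevance score. The `Hit` type is an assumption made for illustration, as is keeping only the first occurrence of a duplicated item.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    item_id: str
    index_type: str     # "C", "N" or "R"
    relevance: int = 0  # only meaningful for C-type hits


TYPE_PRIORITY = {"C": 0, "N": 1, "R": 2}


def merge_results(hits):
    """C-type hits by descending relevance, then N-type, then R-type; drop repeated items."""
    ordered = sorted(
        hits,
        key=lambda h: (TYPE_PRIORITY[h.index_type],
                       -h.relevance if h.index_type == "C" else 0),
    )
    seen, merged = set(), []
    for hit in ordered:
        if hit.item_id not in seen:  # keep the highest-priority occurrence of each item
            seen.add(hit.item_id)
            merged.append(hit)
    return merged
```

Python's `sorted` is stable, so N-type and R-type hits keep the order in which the index returned them.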

Further, each record in the retrieval result set contains the item name, item ID, item department, keyword, index type, and index ID. The overall result set also carries the UUID of the search, which supplies data for later user log collection and recall records.

Preferably, basic information maintenance generates the normal-type index. When the management service maintains appointment items, additions, modifications, and deletions all affect the base index (i.e. the N-type index):

After an appointment item is added, its name is segmented. Each keyword, together with the item's ID, forms one index document stored in the ES service;

After an appointment item is modified, its existing base-type index entries are deleted by item ID and new base-type entries are generated;

After an appointment item is deleted, its base-type index entries are deleted by item ID. The entries of the other two index types are also deleted by item ID, which keeps the data accurate.
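The add, modify, and delete rules can be sketched against an in-memory list that stands in for the ES indexes, with one document per (keyword, item ID, index type). The `ItemIndex` class and its method names are hypothetical, and `tokenize` stands in for IK `ik_max_word` segmentation. Keying C-type and R-type documents by item ID is a simplification. In a real deployment the deletions would correspond to delete-by-item-ID operations on the ES indexes.

```python
class ItemIndex:
    """In-memory stand-in for the ES indexes: one document per (keyword, item_id, index_type)."""

    def __init__(self, tokenize):
        self.tokenize = tokenize  # stand-in for IK ik_max_word segmentation
        self.docs = []

    def add_item(self, item_id, name):
        # New item: one N-type document per keyword of its name
        for keyword in self.tokenize(name):
            self.docs.append({"keyword": keyword, "item_id": item_id, "index_type": "N"})

    def _delete(self, item_id, index_types):
        # Equivalent of deleting by item ID, restricted to the given index types
        self.docs = [d for d in self.docs
                     if not (d["item_id"] == item_id and d["index_type"] in index_types)]

    def modify_item(self, item_id, new_name):
        self._delete(item_id, {"N"})      # drop the old base entries
        self.add_item(item_id, new_name)  # regenerate them from the new name

    def delete_item(self, item_id):
        self._delete(item_id, {"N", "C", "R"})  # remove all three index types
```

Modifying an item leaves its C-type and R-type entries in place. This matches the text, which regenerates only the base index on modification.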

Preferably, the user's operation records are collected as logs, as follows:

After the client's search request has been processed, each record of the retrieval result set is assembled into a message and placed in the message queue. The log service, acting as the message consumer, records the log information. The assembled log fields are the search term, keyword, ES index ID, item ID, index type, and search UUID;

If, after receiving the search results, the client user clicks to view a result, a click-location recall log is produced. The client sends it to the log service for storage. Repeated clicks are recorded only once so that the analysis data are not distorted. The log fields are the search UUID, index ID, item ID, keyword, index type, and search term;

If, after receiving the search results, the client user clicks a result and successfully completes the service, a successful-handling recall log is produced. The client sends it to the log service for storage. The log fields are the search UUID, index ID, index type, item ID, keyword, and search term;

Logs of the three kinds above that share the same search UUID are treated as one group of retrieval-process logs. Once they reach the log service, a scheduled scanning task places each group into the analysis model service for analysis and processing.
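The sketch below shows how the scheduled task might group the three log kinds by search UUID while counting repeated clicks only once. It assumes each log record carries a `kind` field (`"result"`, `"click"`, or `"success"`) plus `uuid` and `index_id`. These field names are hypothetical.

```python
from collections import defaultdict


def group_by_search(logs):
    """Group result, click and success logs by search UUID; count repeated clicks once."""
    groups = defaultdict(lambda: {"result": [], "click": [], "success": []})
    seen_clicks = set()
    for log in logs:
        kind = log["kind"]  # hypothetical discriminator: "result", "click" or "success"
        if kind == "click":
            key = (log["uuid"], log["index_id"])
            if key in seen_clicks:
                continue  # repeated click on the same result: skip to avoid skewing the analysis
            seen_clicks.add(key)
        groups[log["uuid"]][kind].append(log)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"kind": "result", "uuid": "u1", "index_id": "i1"},
        {"kind": "click", "uuid": "u1", "index_id": "i1"},
        {"kind": "click", "uuid": "u1", "index_id": "i1"},
        {"kind": "success", "uuid": "u1", "index_id": "i1"},
    ]
    print(len(group_by_search(sample)["u1"]["click"]))  # 1
```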

Preferably, the relevance index is generated as follows:

A scheduled task of the log service scans the collected logs and packages the three kinds of logs by search UUID. It sends each package to the analysis model service, which generates relevance indexes. Relevance is computed with a numerical statistical rule: each newly generated relevance index carries its own relevance field, with a default value of 100 and a range of 0 to 1000.

A relevance index holds a keyword, the ID of the corresponding base index, and a relevance score. Keywords and base indexes are mapped many-to-many, and each mapping carries a relevance score. A lookup in this index type follows the associated base index ID back to the base index. Each log type has its own step value: browse recall (+1), handling recall (+2), and miss recall (-1), and the relevance score is adjusted accordingly. A scheduled scan deletes indexes whose relevance score has reached 0. In this way the analysis service module keeps correcting the relevance score of every relevance index and raises the hit rate;

The logs carry different index types, but every log carries the keyword and the base index ID by default, so processing is essentially the same. The service checks whether a relevance index already exists. If it does, its score is adjusted by the step rule above; if not, a new index is generated by the generation rule above;

Because the model is continuously calibrated on large amounts of data, the relevance scores of all keywords of an appointment item roughly follow a normal distribution. The relevance score determines display priority.
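The scoring rule can be sketched as follows, assuming the relevance index is keyed by (keyword, base index ID). The default (100), range (0 to 1000), and step values come from the text. Two details are interpretations made for this sketch: the range is enforced by clamping, and a newly created entry gets the default score without the triggering step also being applied.

```python
DEFAULT_RELEVANCE = 100                         # default score of a new relevance index
MIN_RELEVANCE, MAX_RELEVANCE = 0, 1000          # allowed range
STEP = {"browse": 1, "handled": 2, "miss": -1}  # browse, handling and miss recall


def apply_log(relevance_index, keyword, base_index_id, log_type):
    """Update the relevance entry for one log record, or create it if it does not exist yet."""
    key = (keyword, base_index_id)
    if key in relevance_index:
        updated = relevance_index[key] + STEP[log_type]
        # Assumption: the 0..1000 range is enforced by clamping
        relevance_index[key] = max(MIN_RELEVANCE, min(MAX_RELEVANCE, updated))
    else:
        relevance_index[key] = DEFAULT_RELEVANCE


def purge_zero(relevance_index):
    """Scheduled clean-up: delete entries whose score has dropped to 0."""
    for key in [k for k, score in relevance_index.items() if score == MIN_RELEVANCE]:
        del relevance_index[key]
```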

Generating the associated-word index:

An associated-word index contains a keyword and an array of its associated words. The data come from the appointment service's search interface. After processing the search results, the interface places that search's IK segmentation result in the message queue. The analysis model service, acting as the consumer, then generates the associated-word index, and the message queue keeps the two sides asynchronously decoupled.

Searching by keywords and relevance alone may still return inaccurate results. The associated-word index therefore supplies inference-based recommendations that improve retrieval accuracy.

The retrieval method matches both keywords and associated words when a search term is queried fuzzily, raising the hit rate for the public's needs;

It uses the logs of what users browse and successfully handle after results are displayed to provide recall-driven relevance retrieval. Results are displayed by relevance weight, which improves precision;

Its recall-based relevance analysis model improves the accuracy of the relevance between keywords and item information;

It builds relevance indexes for keywords and associated words, strengthening personalized recommendation.

The invention also claims a relevance retrieval device for government affairs appointment items, comprising at least one memory and at least one processor;

the at least one memory is configured to store a machine-readable program;

the at least one processor is configured to invoke the machine-readable program to execute the method described above.

The invention also claims a computer-readable medium storing computer instructions which, when executed by a processor, cause the processor to execute the method described above.

Compared with the prior art, the relevance retrieval method for government affairs appointment items of the invention has the following beneficial effects:

During service retrieval and keyword fuzzy search, the system does not rely entirely on database queries. This reduces database pressure under concurrent load and improves concurrency;

The retrieval interface draws its data from Elasticsearch indexes and a small number of database queries. It responds at the millisecond level, which shortens the public's wait and improves the user experience.

Statistical analysis of user operation logs, together with the inductive summaries produced by scheduled tasks, makes query results more intelligent, accurate, and personalized. It replaces the former database-based fuzzy search and helps the public locate services more accurately.

Description of Drawings

Fig. 1 is a flow chart of the relevance retrieval method for government affairs appointment items provided by an embodiment of the invention;

Fig. 2 is a schematic diagram of retrieval between the client and the appointment server provided by an embodiment of the invention.

Detailed Description of Embodiments

The invention is further described below with reference to the drawings and specific embodiments.

In current government service scenarios, members of the public who are unfamiliar with the system often find the standard service item they need slowly and inaccurately. Database-based retrieval increasingly fails to meet production needs. When service query and retrieval rely on the database, keywords can usually be matched only fuzzily with the database LIKE operator. In a unified province-wide appointment system with a very large number of service items, a slight change in the keywords a user enters produces wildly different results. This makes it hard to locate the item that matches the need. Improving the situation requires service query and retrieval that support word segmentation, fuzzy matching, matching of queries that do not correspond exactly to item names, and millisecond-level response.

An embodiment of the invention provides a relevance retrieval method for government affairs appointment items that builds on the basic index-retrieval functions of the Elasticsearch search engine and the Chinese IK analyzer. Based on user operation records, scheduled inductive tasks generate a relevance-type index, and basic information maintenance generates a normal-type index;

The relevance-type index is generated with a scoring-based statistical model. This yields a combined retrieval mode for appointment items: keyword retrieval, associated-word retrieval, and ranking by the relevance between keywords and appointment services. In multi-channel online appointment scenarios, users can search appointment items quickly and receive analysis-based recommendations of related services. The system can also accurately predict which appointment items users need and recommend them intelligently. The outcome is personalized, intelligent, and accurate retrieval, less search pressure on the database, and more efficient service handling for the public.

Elasticsearch is an open-source, highly scalable, distributed full-text search engine that stores and retrieves data in near real time. It scales well and holds the leading share of the open-source search field. The Chinese IK analyzer extracts keywords accurately, so a service analysis and retrieval method can be built on Elasticsearch.

RocketMQ is used as the message queue to decouple the normal business flow from result recording. Higher search precision requires collecting and summarizing users' target search terms and search results. Later scheduled tasks pull these into the analysis model service. A message queue provides asynchronous decoupling so this collection does not affect the normal business flow. RocketMQ is a good choice because its transactional message solution ensures that every result set is correctly consumed and stored.

Terms used herein:

Search term: the search sentence or phrase entered by the user on the client;

Keyword: each segmentation result produced by the IK analyzer in ik_max_word mode;

Associated word: a keyword that yields the same or similar search results as a given keyword is called an associated word of that keyword.

The retrieval result set is assembled from three indexes plus database SQL queries. The three index types are:

N type ("Normal"): an index that associates the current appointment item with the keywords obtained by segmenting the item's own name;

C type ("Correlation"): an index of relevance patterns, derived by scheduled inductive analysis of user feedback and triggered-behavior logs;

R type ("Related"): an index of the appointment items carried by the associated words of a keyword.

In brief, retrieval between the client and the appointment server works as follows. The appointment server receives the search term entered by the user and segments it into multiple keywords with the IK analyzer in ik_max_word mode. For each keyword, it calls the ES container service to obtain results from the three index types. It takes the primary-key ID of each service item from the index results and queries the database for the items' basic information. Finally, it returns the assembled result set to the client. See Fig. 2.
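The server-side flow can be sketched as below. `tokenize`, `lookup`, and `load_items` are hypothetical hooks standing in for IK segmentation, the per-keyword ES queries, and the database query. Result ordering and de-duplication are omitted, so the sketch shows only the data flow and the fields the result set carries.

```python
import uuid


def search(search_term, tokenize, lookup, load_items):
    """Segment, query each index type per keyword, enrich from the database, tag with a UUID."""
    hits = []
    for keyword in tokenize(search_term):  # ik_max_word segmentation in the described system
        for index_type in ("C", "N", "R"):
            for index_id, item_id in lookup(keyword, index_type):  # ES query per keyword and type
                hits.append({"keyword": keyword, "index_type": index_type,
                             "index_id": index_id, "item_id": item_id})
    items = load_items({hit["item_id"] for hit in hits})  # one database query for base information
    results = [
        {**hit,
         "item_name": items[hit["item_id"]]["name"],
         "department": items[hit["item_id"]]["department"]}
        for hit in hits
    ]
    # The UUID ties this result set to the click and success logs collected later
    return {"search_uuid": str(uuid.uuid4()), "results": results}
```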

Each record in the retrieval result set contains the item name, item ID, item department, keyword, index type, and index ID. The overall result set also carries the UUID of the search, which supplies data for later user log collection and recall records.

The analysis model service depends on a large volume of user behavior logs. The end-to-end log chain of search analysis therefore contains three kinds of search-related log records. Their generation, contents, and storage are as follows:

After the client's search request has been processed, each record of the retrieval result set is assembled into a message and placed in the message queue. The log service, acting as the message consumer, records the log information. The assembled log fields are the search term, keyword, ES index ID, item ID, index type, and search UUID.

If, after receiving the search results, the client user clicks to view a result, a click-location recall log is produced. The client sends it to the log service for storage. The log fields are the search UUID, index ID, item ID, keyword, index type, and search term. Repeated clicks are recorded only once so that the analysis data are not distorted.

If, after receiving the search results, the client user clicks a result and successfully completes the service, a successful-handling recall log is produced. The client sends it to the log service for storage. The log fields are the search UUID, index ID, index type, item ID, keyword, and search term.

The management service maintains the base index:

When the management service maintains appointment items, additions, modifications, and deletions all affect the base index (i.e. the N-type index).

The analysis model service generates the relevance index:

A relevance index holds a keyword, the ID of the corresponding base index, and a relevance score. In essence it maps keywords to base indexes many-to-many and attaches a relevance score to each mapping. A lookup in this index type follows the associated base index ID back to the base index. Each log type has its own step value: browse recall (+1), handling recall (+2), and miss recall (-1), and the relevance score is adjusted accordingly. A scheduled scan deletes indexes whose relevance score has reached 0. In this way the analysis service module keeps correcting the relevance score of every relevance index and raises the hit rate;

The logs carry different index types, but every log carries the keyword and the base index ID by default, so processing is essentially the same. The service checks whether a relevance index already exists. If it does, its score is adjusted by the step rule above; if not, a new index is generated by the generation rule above.

The analysis model generates the associated-word index:

An IK segmentation result holds the segmentation of a single search. For example, when the search term is "车辆年检" (vehicle annual inspection), the keywords are "车辆" (vehicle), "年检" (annual inspection), "车检" (vehicle inspection), "车辆年检" (vehicle annual inspection), and "车" (car). These keywords are associated words of one another. An associated-word index is therefore built for each of the five, containing the keyword itself and its array of associated words. On a later search, the associated words of each keyword are looked up first, and the base index is then searched using those associated words. The associated-word array of this index type keeps growing from new IK segmentation results. It is capped at 20 words so that base-index searches do not slow down.
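The sketch below builds and uses the associated-word index. It assumes the index is a dictionary that maps each keyword to a list of its associated words. The source states the 20-word cap but not what happens on overflow, so the sketch assumes that further candidates are simply ignored once the cap is reached. `related_items` shows the R-type lookup, which goes from a keyword, through its associated words, to items in the base index.

```python
MAX_ASSOCIATED = 20  # cap from the text, keeps base-index lookups fast


def update_associations(assoc_index, keywords):
    """Make the keywords of one segmentation result associated words of one another."""
    for keyword in keywords:
        related = assoc_index.setdefault(keyword, [])
        for other in keywords:
            # Assumption: once the cap is reached, further candidates are ignored
            if other != keyword and other not in related and len(related) < MAX_ASSOCIATED:
                related.append(other)
    return assoc_index


def related_items(assoc_index, base_index, keyword):
    """R-type lookup: keyword -> associated words -> item IDs in the base (N-type) index."""
    items = []
    for word in assoc_index.get(keyword, []):
        for item_id in base_index.get(word, []):
            if item_id not in items:
                items.append(item_id)
    return items


if __name__ == "__main__":
    index = update_associations({}, ["车辆", "年检", "车检", "车辆年检", "车"])
    print(index["车"])  # ['车辆', '年检', '车检', '车辆年检']
```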

The at least one processor is configured to invoke the machine-readable program to execute the relevance retrieval method for government affairs appointment items described above.

An embodiment of the invention also provides a computer-readable medium storing computer instructions. When executed by a processor, the instructions cause the processor to execute the relevance retrieval method for government affairs appointment items described in the above embodiments. Specifically, a system or device may be provided with a storage medium that stores software program code implementing the functions of any of the above embodiments. A computer (or CPU or MPU) of the system or device then reads and executes the program code stored in the storage medium.

In this case, the program code read from the storage medium itself implements the functions of any of the above embodiments. The program code and the storage medium storing it therefore form part of the invention.

Examples of storage media for providing the program code include floppy disks, hard disks, magneto-optical disks, optical discs (such as CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, and DVD+RW), magnetic tape, non-volatile memory cards, and ROM. Alternatively, the program code may be downloaded from a server computer over a communication network.

In addition, the functions of any of the above embodiments may be implemented in more than one way. A computer may execute the program code it has read. Alternatively, an operating system or the like running on the computer may perform part or all of the actual operations according to instructions in the program code.

Furthermore, the program code read from the storage medium may be written into a memory on an expansion board inserted into the computer, or into a memory in an expansion unit connected to the computer. A CPU or similar processor mounted on the expansion board or expansion unit then performs part or all of the actual operations according to instructions based on the program code. This also realizes the functions of any of the above embodiments.

The present invention has been shown and described in detail above with reference to the accompanying drawings and preferred embodiments. However, the present invention is not limited to these disclosed embodiments. Based on the embodiments above, those skilled in the art will appreciate that the technical means of the different embodiments may be combined to obtain further embodiments of the present invention. Such embodiments also fall within the protection scope of the present invention.

Claims

1. A relevance retrieval method for government affairs reservation items, characterized in that a relevance-type index is generated from user operation records by a scheduled task that aggregates them, and a general-type index is generated through basic information maintenance;

the relevance-type index is generated by a statistical model in scoring form. A combined retrieval mode is generated for retrieving reservation service items. It consists of keyword retrieval, related-word retrieval, and ranking by the relevance between keywords and reservation services;

logs of users' operation records are collected, and the relevance index is generated as follows:

a scheduled task of the log service scans the collected logs, packages them by search UUID, and sends each package to the analysis model service, which processes it and generates the relevance index;

the relevance index comprises a keyword, the corresponding basic index ID, and a relevance value. Keywords and basic indexes are mapped many-to-many, with a relevance value attached to each mapping. The basic index ID held in the index allows a search over this index type to resolve ultimately to the basic index. Different step values are set for different log types: a browse recall adds 1, a transaction recall adds 2, and a miss recall subtracts 1. The relevance value is updated cumulatively by these steps, and indexes whose relevance value is 0 are periodically scanned and deleted;

for each analyzed result, it is checked whether a relevance index already exists. If so, its value is modified according to the step rule; if not, a new index is generated according to the generation rule;

the related-word index is generated as follows:

the fields of the related-word index are a keyword and its array of related words. After processing a search, the reservation service search interface puts the IK word segmentation result into a message queue. The analysis model service, acting as the consumer, processes it and generates the related-word index. The message queue thus provides asynchronous decoupling.

2. The relevance retrieval method for government affairs reservation items according to claim 1, wherein the search is performed using the Elasticsearch search engine and the Chinese IK word segmenter.

3. The relevance retrieval method for government affairs reservation items according to claim 1 or 2, wherein RocketMQ is selected as the message queue to decouple normal business processing from result recording.

4. The relevance retrieval method for government affairs reservation items according to claim 1, wherein the relevance-type index comprises three index types, namely:

an index associating each keyword obtained by segmenting the name of a reservation service with that reservation service, marked as type N;

an index of relevance patterns periodically aggregated and analyzed from user feedback and triggered-behavior logs, marked as type C;

and an index of reservation service items reached through the related words of a keyword, marked as type R;

wherein the search results are ranked by relevance in the order of C-type index results, then N-type index results, then R-type index results, and duplicate results are removed.

5. The relevance retrieval method for government affairs reservation items according to claim 4, wherein each entry in the search result set contains the business item name, business item ID, business item department, keyword, index type, and index ID. The result set as a whole also contains the UUID of the search, which provides data for subsequent user log collection and recall records.

6. The relevance retrieval method for government affairs reservation items according to claim 2, wherein generating the general-type index through basic information maintenance comprises:

after a reservation item is added, its business name is segmented into words. Each keyword, together with the current reservation item ID, forms an index record that is stored in the ES (Elasticsearch) service;

after a reservation item is modified, its original basic-type index is deleted by reservation item ID and a new basic-type index is generated;

after a reservation item is deleted, its original basic-type index is deleted by reservation item ID. The other two index types are likewise deleted by reservation item ID, which ensures data accuracy.

7. The relevance retrieval method for government affairs reservation items according to claim 1, 2, 4, 5, or 6, wherein collecting the logs of users' operation records comprises:

after a client search request has been processed, each record of the search result set is assembled and put into a message queue. The log service, acting as the message consumer, records the log information. The assembled log fields comprise the search word, keyword, ES index ID, item ID, index type, and current search UUID;

after obtaining the search results, the client user clicks and browses a result. This forms a click-positioning recall log, which the client sends to the log service for storage; repeated clicks are recorded only once. The log fields comprise the current search UUID, index ID, item ID, keyword, index type, and search word;

after obtaining the search results, the client user clicks and browses a result and successfully transacts the business. This forms a successful-transaction recall log, which the client sends to the log service for storage. The log fields comprise the current search UUID, index ID, index type, item ID, keyword, and search word;

of the three kinds of logs collected, those sharing the same search UUID are treated as one group of search-flow logs. After each log has entered the log service, the group waits for the scheduled scanning task to pass it to the analysis model service for analysis.

8. A relevance retrieval device for government affairs reservation items, comprising: at least one memory and at least one processor;

the at least one memory for storing a machine readable program;

said at least one processor for invoking said machine readable program to perform the method of any of claims 1 to 7.

9. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 7.
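The following minimal Python sketch illustrates the scoring rules of claims 1 and 7 in the scheduled analysis step. It is an illustration under stated assumptions, not the patentee's implementation. `LogRecord`, `group_by_uuid`, `classify`, and `update_relevance` are hypothetical names. A plain dict keyed by (keyword, basic index ID) stands in for the relevance index in ES. The sketch also makes several scoring assumptions:

- A result that was transacted is scored only as a transaction recall (+2), not also as a browse.
- A miss on a keyword/index pair that has no relevance index yet is ignored.
- Entries are pruned at a value of 0 or below, so a value that overshoots within one run is also removed.

```python
from collections import defaultdict
from dataclasses import dataclass

# Step values from claim 1: browse recall +1, transaction recall +2, miss recall -1.
STEP = {"browse": 1, "transact": 2, "miss": -1}


@dataclass(frozen=True)
class LogRecord:
    search_uuid: str   # UUID shared by all logs of one search (claim 7)
    log_type: str      # hypothetical labels: "search", "click" or "transact"
    keyword: str
    index_id: str      # simplified: the basic index ID the result points to


def group_by_uuid(logs):
    """Package search, click and transaction logs by their shared search UUID."""
    groups = defaultdict(list)
    for rec in logs:
        groups[rec.search_uuid].append(rec)
    return groups


def classify(group):
    """Assign one outcome to each (keyword, index_id) returned by a search.
    Assumption: transacted -> "transact", clicked only -> "browse", else "miss"."""
    returned, clicked, transacted = set(), set(), set()
    for rec in group:
        key = (rec.keyword, rec.index_id)
        if rec.log_type == "search":
            returned.add(key)
        elif rec.log_type == "click":
            clicked.add(key)  # a set, so repeated clicks are counted once
        elif rec.log_type == "transact":
            transacted.add(key)
    outcome = {}
    for key in returned:
        if key in transacted:
            outcome[key] = "transact"
        elif key in clicked:
            outcome[key] = "browse"
        else:
            outcome[key] = "miss"
    return outcome


def update_relevance(index, logs):
    """Scheduled-task sketch. `index` maps (keyword, basic_index_id) -> relevance value."""
    for group in group_by_uuid(logs).values():
        for key, kind in classify(group).items():
            step = STEP[kind]
            if key in index:
                index[key] += step   # index exists: apply the step rule
            elif step > 0:
                index[key] = step    # assumed generation rule: start at the first positive step
            # a miss on a pair with no relevance index yet is ignored (assumption)
    # Periodic cleanup: drop entries whose value fell to 0 (or below, as an assumption).
    for key in [k for k, v in index.items() if v <= 0]:
        del index[key]
    return index
```

Because clicks are collected into a set, repeated clicks on the same result within one search count once, as claim 7 requires.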
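Claims 1 and 3 route the IK segmentation result through a message queue so that the search interface never waits for the related-word index to be updated. The Python sketch below shows that producer/consumer shape. The standard-library `queue.Queue` and a worker thread stand in for RocketMQ; the sketch does not use the RocketMQ client API. The claims do not specify how related words are derived. The sketch therefore assumes, hypothetically, that tokens segmented from the same query are related to one another. It stores each keyword's related words as a set rather than an array.

```python
import queue
import threading
from collections import defaultdict

STOP = object()  # sentinel that tells the consumer thread to exit


def search_interface(mq, tokens):
    """Producer side: after the search has been handled, publish the segmentation
    result and return immediately instead of updating the index inline."""
    mq.put(list(tokens))


def related_word_consumer(mq, related_index):
    """Consumer side (analysis model service): build keyword -> related words.
    Hypothetical rule: tokens segmented from the same query are related."""
    while True:
        tokens = mq.get()
        try:
            if tokens is STOP:
                return
            for kw in tokens:
                related_index[kw].update(t for t in tokens if t != kw)
        finally:
            mq.task_done()


if __name__ == "__main__":
    mq = queue.Queue()                  # stand-in for a RocketMQ topic
    related = defaultdict(set)
    worker = threading.Thread(target=related_word_consumer, args=(mq, related))
    worker.start()
    search_interface(mq, ["passport", "renewal"])
    search_interface(mq, ["passport", "photo"])
    mq.put(STOP)
    worker.join()
    print(sorted(related["passport"]))  # ['photo', 'renewal']
```

The producer returns as soon as `put` completes. A slow or failed index update therefore does not delay the search response, which is the decoupling claim 3 attributes to the message queue.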
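Claims 4 and 5 fix the ranking order and the result fields, but not the tie-breaking rule or the de-duplication key. The following Python sketch assumes a per-hit `score` for ordering within one index type and uses the business item ID as the de-duplication key. `Hit`, `ResultSet`, and `merge_results` are hypothetical names. A random UUID from `uuid.uuid4()` stands in for the search UUID that later ties the logs together.

```python
import uuid
from dataclasses import dataclass, field

PRIORITY = ("C", "N", "R")  # claim 4: C-type results first, then N-type, then R-type


@dataclass
class Hit:
    """One entry of the result set; fields follow claim 5."""
    item_name: str
    item_id: str
    department: str
    keyword: str
    index_type: str        # "C", "N" or "R"
    index_id: str
    score: float = 0.0     # assumed relevance score used to order hits of one type


@dataclass
class ResultSet:
    hits: list
    # The search UUID returned with the results and echoed back in later logs.
    search_uuid: str = field(default_factory=lambda: str(uuid.uuid4()))


def merge_results(hits):
    """Rank by index type (C > N > R), then by descending score within a type,
    keeping only the first hit for each business item ID (assumed dedup key)."""
    ordered = sorted(hits, key=lambda h: (PRIORITY.index(h.index_type), -h.score))
    seen, merged = set(), []
    for hit in ordered:
        if hit.item_id not in seen:
            seen.add(hit.item_id)
            merged.append(hit)
    return ResultSet(hits=merged)
```

With this ordering, an item found through both a C-type and an N-type index is reported once, under its C-type hit.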
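The maintenance rules of claim 6 keep the basic-type index in sync with add, modify, and delete events on reservation items. In this Python sketch, an in-memory `IndexStore` stands in for the ES service. The hypothetical `segment` function, which only splits on non-word characters, stands in for the IK analyzer. Unlike IK, it does not segment Chinese text, so the example uses English item names. Deleting an item removes its N-, C-, and R-type records by item ID, mirroring claim 6.

```python
import re


def segment(name):
    """Hypothetical stand-in for the IK analyzer: lower-case and split on
    non-word characters (it does not segment Chinese text the way IK does)."""
    return [tok for tok in re.split(r"\W+", name.lower()) if tok]


class IndexStore:
    """In-memory stand-in for the ES service holding N-, C- and R-type records."""

    def __init__(self):
        self.records = {}   # index_id -> {"type", "keyword", "item_id"}
        self._next_id = 0

    def put(self, index_type, keyword, item_id):
        self._next_id += 1
        index_id = f"{index_type}{self._next_id}"
        self.records[index_id] = {"type": index_type, "keyword": keyword, "item_id": item_id}
        return index_id

    def delete_by_item(self, item_id, types):
        doomed = [i for i, r in self.records.items()
                  if r["item_id"] == item_id and r["type"] in types]
        for index_id in doomed:
            del self.records[index_id]

    def on_item_added(self, item_id, name):
        # One basic-type record per distinct keyword of the business name.
        for kw in dict.fromkeys(segment(name)):
            self.put("N", kw, item_id)

    def on_item_modified(self, item_id, new_name):
        # Delete the original basic-type index by item ID, then regenerate it.
        self.delete_by_item(item_id, {"N"})
        self.on_item_added(item_id, new_name)

    def on_item_deleted(self, item_id):
        # Delete the basic-type index and the other two index types for the item.
        self.delete_by_item(item_id, {"N", "C", "R"})

    def keywords(self, item_id, index_type="N"):
        return sorted(r["keyword"] for r in self.records.values()
                      if r["item_id"] == item_id and r["type"] == index_type)
```

Modifying an item regenerates only its basic-type records. Any C-type or R-type records for the item survive until the item itself is deleted.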