
CN111460810A - Crowd-sourced task spot check method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN111460810A
Authority
CN
China
Prior art keywords
sampling
answer
response
word
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010134385.6A
Other languages
Chinese (zh)
Inventor
王健宗
李佳琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202010134385.6A (CN111460810A)
Publication of CN111460810A
Priority to PCT/CN2020/118461 (WO2021174829A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application discloses a spot-check method, device, computer equipment and storage medium for crowdsourcing tasks. The method includes: for each historical crowdsourcing task, obtaining each respondent that participated in the task and the answer given by each respondent; parsing the answers to obtain spot-check tokens, extracting spot-check keywords from the spot-check tokens, and storing the spot-check keywords in a preset answer thesaurus; for the spot-check tokens in each answer of each respondent, counting the number of times the tokens hit the spot-check keywords in the preset answer thesaurus, and taking that number as the base count of the answer; determining a reliability value for each respondent from the base counts of its answers; and determining the spot-check objects according to the reliability values and checking the answers of those objects. By extracting spot-check keywords, the application makes spot checks more targeted and more efficient.

Figure 202010134385

Description

Spot-check method, device, computer equipment and storage medium for crowdsourcing tasks

Technical Field

The present application relates to the technical field of data processing, and in particular to a spot-check method, device, computer equipment and storage medium for crowdsourcing tasks.

Background

With the rapid development of network technology, companies and institutions that want to gather more creative input, or to solve cross-domain problems efficiently and conveniently, often issue crowdsourcing tasks to Internet users and rely on those tasks to solve the problems.

Crowdsourcing is the practice of a company or institution outsourcing work that was previously performed by employees to an unspecified (and usually large) public network on a free and voluntary basis. The participants on a crowdsourcing platform fall into two categories: those who publish tasks on the platform are called task publishers, and those who complete tasks are called respondents. A task publisher publishes a task on the platform, and respondents earn a reward by completing it. This way of working gives task publishers access to a large pool of independent respondents whose intelligence can be used to solve practical problems.

At present, because the fields of expertise and the professional level of respondents are uncertain, the correctness of the answers collected for a crowdsourcing task has to be spot-checked. When many respondents participate, and therefore many answers are collected, checking takes a long time. The current practice is random sampling: a preset number of answers is drawn at random from all answers and checked, and the crowdsourcing task is evaluated according to the check results. Random sampling is poorly targeted, so the evaluation of the crowdsourcing task is unsatisfactory and the spot check is inefficient. How to sample crowdsourcing tasks in a targeted manner and improve spot-check efficiency has become an urgent problem.

Summary of the Invention

The purpose of the embodiments of the present application is to provide a spot-check method for crowdsourcing tasks that solves the problem in the prior art that random sampling is poorly targeted and therefore makes spot checks of crowdsourcing tasks inefficient.

To solve the above technical problem, an embodiment of the present application provides a spot-check method for crowdsourcing tasks, including:

for each historical crowdsourcing task, obtaining each respondent that participated in the historical crowdsourcing task and the answer corresponding to each respondent;

parsing the answers to obtain spot-check tokens, extracting spot-check keywords from the spot-check tokens, and storing the spot-check keywords in a preset answer thesaurus;

for the spot-check tokens in each answer of each respondent, counting the number of times the spot-check tokens hit the spot-check keywords in the preset answer thesaurus, as the base count corresponding to each answer;

determining the reliability value corresponding to each respondent according to the base count corresponding to each answer;

selecting a preset number of respondents in ascending order of reliability value as the spot-check objects, and checking the answers corresponding to the spot-check objects.

Further, parsing the answers to obtain the spot-check tokens includes:

segmenting each answer into words with a dynamic programming algorithm to obtain initial tokens;

filtering the initial tokens, and performing synonym replacement on the filtered initial tokens to obtain the spot-check tokens.

Further, extracting the spot-check keywords from the spot-check tokens and storing the spot-check keywords in the preset answer thesaurus includes:

for a given historical crowdsourcing task, obtaining all spot-check tokens corresponding to that task and counting the number of occurrences of each spot-check token to obtain its spot-check word frequency;

taking the spot-check tokens whose spot-check word frequency is greater than a preset word frequency as the spot-check keywords, and storing the spot-check keywords in the preset answer thesaurus.
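
The frequency threshold described above can be illustrated with a minimal sketch. The function name `extract_keywords` and the parameter `min_freq` are hypothetical names chosen for this example; the source does not specify an implementation or a threshold value.

```python
from collections import Counter


def extract_keywords(task_tokens, min_freq):
    """Return the tokens of one task whose frequency is strictly greater than min_freq.

    task_tokens: all spot-check tokens collected from the answers to a single task.
    min_freq:    the preset word frequency (an assumed integer threshold).
    """
    freq = Counter(task_tokens)  # token -> number of occurrences in this task
    return {token for token, count in freq.items() if count > min_freq}


# Toy data: tokens pooled from several answers to the same task.
tokens = ["cat", "dog", "cat", "bird", "cat", "dog"]
keywords = extract_keywords(tokens, min_freq=1)
```

With the toy data, only tokens that occur more than once are kept as keywords; raising `min_freq` narrows the keyword set further.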

Further, performing synonym replacement on the filtered initial tokens to obtain the spot-check tokens includes:

performing synonym replacement on the initial tokens by means of named entity recognition to obtain the spot-check tokens.

Further, performing synonym replacement on the initial tokens by means of named entity recognition includes:

obtaining a preset standard vocabulary dictionary;

for each initial token, traversing the standard vocabulary dictionary and performing named entity recognition between the initial token and each word in the dictionary to obtain an entity recognition result;

if the entity recognition result indicates that the initial token and a standard word refer to the same named entity, obtaining the initial token and the standard word corresponding to that result, and replacing the initial token with the standard word.
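
The traversal and replacement steps above can be sketched as follows. The source does not describe how named entity recognition is carried out, so the predicate `same_entity` and the alias table `ALIASES` below are purely hypothetical stand-ins for a real recognizer; only the traverse-and-replace control flow is taken from the text.

```python
def normalize_tokens(tokens, standard_dict, same_entity):
    """Replace each token with the first standard word that names the same entity.

    same_entity(token, standard_word) is a caller-supplied predicate; tokens with
    no matching standard word are kept unchanged.
    """
    result = []
    for token in tokens:
        replacement = token
        for standard_word in standard_dict:  # traverse the standard dictionary
            if same_entity(token, standard_word):
                replacement = standard_word
                break
        result.append(replacement)
    return result


# Hypothetical alias table standing in for a named entity recognizer.
ALIASES = {"NYC": "New York", "Big Apple": "New York"}


def same_entity(token, standard_word):
    """Assumed rule: identical strings or a known alias refer to the same entity."""
    return token == standard_word or ALIASES.get(token) == standard_word


normalized = normalize_tokens(
    ["NYC", "rain", "Big Apple"], ["New York", "London"], same_entity
)
```

Passing the predicate as an argument keeps the traversal independent of whichever recognizer is actually used.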

Further, counting, for the spot-check tokens in each answer of each respondent, the number of times the spot-check tokens hit the spot-check keywords in the preset answer thesaurus, as the base count corresponding to each answer, includes:

obtaining each historical crowdsourcing task in which the respondent participated, as a reference task;

for each reference task, obtaining the spot-check keywords from the preset answer thesaurus, and counting the number of times the spot-check tokens of the respondent's answer hit those spot-check keywords, as the base count.
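
A minimal sketch of the base count for one answer is shown below. It assumes that a "hit" means a task keyword appears among the answer's tokens and that each keyword is counted once, which keeps the count no larger than the number of keywords; the source does not state how repeated tokens are handled, and the name `base_count` is hypothetical.

```python
def base_count(answer_tokens, task_keywords):
    """Number of distinct task keywords that appear among the answer's tokens."""
    return len(set(answer_tokens) & set(task_keywords))


# Toy data: "cat" is repeated in the answer but counted once.
hits = base_count(["cat", "cat", "fish"], {"cat", "dog"})
```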

Further, determining the reliability value corresponding to each respondent according to the base count corresponding to each answer includes: according to the base counts, computing the total number M of spot-check keywords over all reference tasks, and computing the total number N of hits of the respondent over all reference tasks, where M and N are both positive integers and N is less than or equal to M;

calculating the reliability value δ with the formula δ = N/M.
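
The formula δ = N/M can be worked through on toy data. The sketch below assumes each reference task is given as a pair of (answer tokens, task keywords) and that hits are counted per distinct keyword so that N ≤ M holds; the function name `reliability` and the input layout are hypothetical.

```python
def reliability(reference_tasks):
    """Compute delta = N / M for one respondent.

    reference_tasks: list of (answer_tokens, task_keywords) pairs, one pair per
    historical task the respondent took part in.
    """
    # M: total number of keywords over all reference tasks.
    m = sum(len(set(keywords)) for _, keywords in reference_tasks)
    # N: total number of keyword hits over all reference tasks.
    n = sum(len(set(tokens) & set(keywords)) for tokens, keywords in reference_tasks)
    if m == 0:
        raise ValueError("no keywords available for the reference tasks")
    return n / m


# Task 1: 1 of 2 keywords hit; task 2: 2 of 3 keywords hit, so M = 5 and N = 3.
history = [
    (["cat", "fish"], {"cat", "dog"}),
    (["red", "blue"], {"red", "blue", "green"}),
]
delta = reliability(history)
```

Here N = 3 and M = 5, so δ = 3/5 = 0.6. The source requires M to be positive; the sketch raises an error instead of dividing by zero when no keywords exist.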

To solve the above technical problem, one technical solution adopted by the present invention is to provide a spot-check device for crowdsourcing tasks, including:

an obtaining module, configured to obtain, for each historical crowdsourcing task, the answer of each respondent to the historical crowdsourcing task;

a parsing module, configured to parse the answers to obtain spot-check tokens, extract spot-check keywords from the spot-check tokens, and store the spot-check keywords in a preset answer thesaurus;

a counting module, configured to count, for the spot-check tokens in each answer of each respondent, the number of times the spot-check tokens hit the spot-check keywords in the preset answer thesaurus, as the base count corresponding to each answer;

a determining module, configured to determine the reliability value corresponding to each respondent according to the base count corresponding to each answer;

a selecting module, configured to select a preset number of respondents in ascending order of reliability value as the spot-check objects, and to check the answers corresponding to the spot-check objects.

To solve the above technical problem, one technical solution adopted by the present invention is to provide a computer device, including one or more processors and a memory for storing one or more programs that cause the one or more processors to implement the spot-check scheme described in any one of the above.

To solve the above technical problem, one technical solution adopted by the present invention is a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the spot-check scheme described in any one of the above.

In the spot-check method for crowdsourcing tasks of the above scheme, the answer of each respondent to each historical crowdsourcing task is obtained and parsed to yield spot-check tokens, spot-check keywords are extracted from the spot-check tokens, and the spot-check keywords are stored in a preset answer thesaurus. The resulting spot-check keywords are later used to evaluate the reliability value of each respondent, which makes that evaluation more targeted. For the spot-check tokens in each answer of each respondent, the number of times the tokens hit the spot-check keywords is counted as the base count of the answer, the reliability value of each respondent is determined from the base counts, the spot-check objects are then determined according to the reliability values, and the answers of the spot-check objects are checked. Determining a reliability value for each respondent and choosing the spot-check objects by reliability value makes the spot check more targeted, and ordering the respondents by reliability value helps improve spot-check efficiency.

Description of Drawings

To illustrate the solutions in the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic diagram of the application environment of the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 2 is a flowchart of an implementation of the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 3 is a flowchart of an implementation of step S2 in the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 4 is a flowchart of an implementation of step S221 in the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 5 is a flowchart of an implementation of step S3 in the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 6 is a flowchart of an implementation of step S4 in the spot-check method for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the spot-check device for crowdsourcing tasks provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of the computer device provided by an embodiment of the present application.

Detailed Description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.

The present invention is described in detail below with reference to the drawings and embodiments.

Referring to FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 in order to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as web browser applications, search applications and instant messaging tools.

The terminal devices 101, 102, 103 may be any electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102, 103.

It should be noted that the spot-check method for crowdsourcing tasks provided by the embodiments of the present application is generally executed by the server; accordingly, the spot-check device for crowdsourcing tasks is generally arranged in the server.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Referring to FIG. 2, FIG. 2 shows a specific implementation of the spot-check method for crowdsourcing tasks.

It should be noted that, provided substantially the same result is obtained, the method of the present invention is not limited to the sequence of steps shown in FIG. 2. The method includes the following steps:

S1: For each historical crowdsourcing task, obtain each respondent that participated in the historical crowdsourcing task and the answer corresponding to each respondent.

Specifically, the server stores every historical crowdsourcing task, respondent and answer, together with the mapping relationships among respondents, answers and historical crowdsourcing tasks. Every historical crowdsourcing task has at least one answer, and each respondent has at most one answer for a given historical crowdsourcing task. For each historical crowdsourcing task, the answer of each respondent to that task is obtained.
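
One way to picture the mapping described in step S1 is a nested dictionary keyed by task and then by respondent. The class `AnswerStore` and its method names below are hypothetical; the source only states that the server stores the mapping and that a respondent has at most one answer per task.

```python
from collections import defaultdict


class AnswerStore:
    """In-memory sketch of the task -> respondent -> answer mapping."""

    def __init__(self):
        self._answers = defaultdict(dict)  # task_id -> {respondent_id: answer}

    def add(self, task_id, respondent_id, answer):
        """Record an answer, enforcing at most one answer per respondent and task."""
        if respondent_id in self._answers[task_id]:
            raise ValueError("respondent already answered this task")
        self._answers[task_id][respondent_id] = answer

    def answers_for(self, task_id):
        """Return a copy of {respondent_id: answer} for one historical task."""
        return dict(self._answers.get(task_id, {}))


store = AnswerStore()
store.add("task-A", "model-K", "the capital of France is Paris")
store.add("task-A", "model-J", "Paris")
task_a_answers = store.answers_for("task-A")
```

A production system would more likely keep this mapping in a database table with a uniqueness constraint on the (task, respondent) pair; the dictionary is only meant to make the constraint concrete.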

The crowdsourcing task in this embodiment is a network-based task in which participants take part over the network and give corresponding answers.

A respondent is an object that gives an answer to a historical crowdsourcing task, and may specifically be one of a plurality of preset network models. For example, if a network model K identifies and parses crowdsourcing task A in a preset manner and gives an answer, then network model K can be called a respondent of crowdsourcing task A.

In this embodiment, the electronic device on which the spot-check method for crowdsourcing tasks runs (for example, the server shown in FIG. 1) may be connected through a wired connection or a wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection methods currently known or developed in the future.

S2: Parse the answers to obtain spot-check tokens, extract spot-check keywords from the spot-check tokens, and store the spot-check keywords in a preset answer thesaurus.

Specifically, the answers are parsed to obtain the tokens that express the semantics of each answer, which serve as the spot-check tokens; keywords are extracted from the spot-check tokens as the spot-check keywords; and the spot-check keywords are stored in the preset answer thesaurus. For the specific process, refer to the description of steps S21 to S24, which is not repeated here.

The parsing includes, but is not limited to, word segmentation, data cleaning, deduplication and synonym replacement.

Word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain rules. In this embodiment, it means dividing an answer into individual spot-check tokens so that the spot-check keywords can later be extracted from them.

Word segmentation can be performed with a third-party word segmentation tool or with a word segmentation algorithm.

Common third-party word segmentation tools include, but are not limited to, the Stanford NLP word segmenter, the ICTCLAS word segmentation system, the ansj word segmentation tool and the HanLP Chinese word segmentation tool.

Word segmentation algorithms include, but are not limited to, the forward Maximum Matching (MM) algorithm, the Reverse Maximum Matching (RMM) algorithm, the Bi-directional Maximum Matching (BM) algorithm, dynamic programming algorithms, the Hidden Markov Model (HMM) and the N-gram model.

Preferably, this embodiment uses a dynamic programming algorithm for word segmentation. For the specific process, refer to the description of step S21, which is not repeated here.
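
The source does not spell out its dynamic programming algorithm in this section, so the following is only a generic sketch of one common variant: choose the segmentation that maximizes the sum of unigram log-probabilities. The vocabulary, the probabilities, the unknown-character penalty `unk_logp` and the function name `dp_segment` are all assumptions made for illustration, and an unspaced English string is used in place of Chinese text.

```python
import math


def dp_segment(text, word_logp, max_len=4, unk_logp=-20.0):
    """Split text into the word sequence with the highest total log-probability.

    word_logp: dict mapping known words to log-probabilities (assumed given).
    Unknown single characters are allowed with the penalty unk_logp, so every
    input has at least one valid segmentation.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in word_logp:
                score = word_logp[piece]
            elif len(piece) == 1:
                score = unk_logp
            else:
                continue
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start
    # Walk the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]


# Toy unigram model: "cats and dogs" scores higher than "cat sand dogs".
vocab = {w: math.log(p) for w, p in
         {"cat": 0.2, "cats": 0.2, "and": 0.3, "sand": 0.05, "dogs": 0.2}.items()}
segments = dp_segment("catsanddogs", vocab)
```

In the toy model the path cats/and/dogs has probability 0.2 × 0.3 × 0.2 = 0.012, while cat/sand/dogs has 0.2 × 0.05 × 0.2 = 0.002, so the first is selected.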

Data cleaning is the procedure of finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. In this embodiment, the answers and tokens are checked for textual conformity and invalid items are removed, which improves the accuracy and efficiency of the subsequent extraction of spot-check keywords.

It should be noted that, as a preferred approach in this embodiment, storing the spot-check keywords in the preset answer thesaurus specifically includes: obtaining the preset task type corresponding to each historical crowdsourcing task; treating the spot-check keywords of historical crowdsourcing tasks with the same preset task type as one group of spot-check keywords; and establishing a mapping relationship between each group of spot-check keywords and its preset task type, and storing the mapping relationship in the preset answer thesaurus.
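
The grouping by task type described above can be sketched as a dictionary from task type to keyword set. The function `build_thesaurus`, its input layout and the example task types are hypothetical; the source does not define the storage format of the answer thesaurus.

```python
from collections import defaultdict


def build_thesaurus(task_types, task_keywords):
    """Group spot-check keywords by the preset type of the task they came from.

    task_types:    task_id -> preset task type
    task_keywords: task_id -> set of spot-check keywords extracted for that task
    Returns a dict mapping each task type to the union of its tasks' keywords.
    """
    thesaurus = defaultdict(set)
    for task_id, keywords in task_keywords.items():
        thesaurus[task_types[task_id]].update(keywords)
    return dict(thesaurus)


# Toy data: two multiple-choice tasks share one keyword group.
types = {"t1": "multiple-choice", "t2": "multiple-choice", "t3": "fill-in-the-blank"}
keywords = {"t1": {"option", "correct"}, "t2": {"option", "wrong"}, "t3": {"blank"}}
thesaurus = build_thesaurus(types, keywords)
```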

The preset task types can be set according to actual needs and are not specifically limited here. For example, in one specific implementation the preset task types include question-and-answer questions, multiple-choice questions, fill-in-the-blank questions and true-or-false questions; in another specific implementation the preset task types include material collection, graphic design, planning and publicity design.

By parsing the answers to obtain spot-check tokens and extracting spot-check keywords from them, spot-check keywords can be extracted from the answers of each respondent across a large number of historical tasks, which effectively makes the spot check more targeted and improves its efficiency.

S3:针对每个应答对象的每个应答答案中的抽检分词,统计预设的答案词库中,抽检分词命中抽检关键字的次数,作为每个应答对象对应的基础次数。S3: For the sampling word segmentation in each response answer of each answering object, count the number of times the sampling wording word hits the sampling inspection keyword in the preset answer thesaurus, as the basic number corresponding to each answering object.

Specifically, the number of times the spot-check tokens hit spot-check keywords in the preset answer thesaurus is counted to obtain a base count that reflects the reliability of the spot-check tokens. For the detailed process, refer to the descriptions of steps S31 and S32, which are not repeated here.

Here, the reliability value measures how reliable a respondent's responses to crowdsourcing tasks are, evaluated from the respondent's answers to historical crowdsourcing tasks. Generally, the higher the reliability value, the lower the risk of the answers that the respondent returns.

S4: Determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer.

Specifically, each response answer belongs to a respondent, so the reliability value of each respondent can be determined from the base counts of that respondent's answers. The reliability values are then used to decide the order of the subsequent spot check, which avoids purely random checking and makes the spot check more targeted. For the detailed process of S4, refer to the descriptions of steps S41 and S42, which are not repeated here.

S5: Select a preset number of respondents in ascending order of reliability value as the spot-check targets, and check the response answers corresponding to the spot-check targets.

Specifically, the respondents are sorted in ascending order of reliability value to obtain a respondent sequence; then, starting from the front of the sequence, a preset number of respondents are selected as the targets of the spot check.

The preset number of respondents to select is set according to actual spot-check needs and is not specifically limited here. For example, it can be set according to the manpower available for the spot check: if the reviewers can check only 100 respondents, the first 100 respondents in the sequence, that is, the 100 with the lowest reliability values, are selected.
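Step S5 amounts to a sort followed by a prefix selection. A minimal sketch, assuming the reliability values are already available in a plain dictionary (the name `select_spot_check_targets` and the sample respondent identifiers are illustrative):

```python
def select_spot_check_targets(reliability, preset_count):
    """Return the `preset_count` respondents with the lowest reliability values.

    reliability: hypothetical mapping of respondent id -> reliability value.
    """
    # Ascending sort: the least reliable respondents come first.
    ranked = sorted(reliability.items(), key=lambda item: item[1])
    return [respondent for respondent, _ in ranked[:preset_count]]


# Illustrative data only.
reliability = {"alice": 0.4, "bob": 0.2, "carol": 0.9, "dave": 0.1}
targets = select_spot_check_targets(reliability, preset_count=2)
```

Because the selection is deterministic for a given set of reliability values, reviewers always see the least reliable respondents first instead of a random subset.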

The spot-check results are sent to the terminal devices 101, 102 and 103 so that the respondents can learn the results.

In this embodiment, for each historical crowdsourcing task, the response answer of each respondent to that task is obtained and parsed to produce spot-check tokens; spot-check keywords are extracted from the tokens and stored in the preset answer thesaurus. The resulting keywords are later used to evaluate each respondent's reliability value, which makes that evaluation more targeted. Then, for the spot-check tokens in each response answer of each respondent, the number of times the tokens hit spot-check keywords is counted as the base count of that answer; the reliability value of each respondent is determined from the base counts; the spot-check targets are chosen according to the reliability values; and the answers of those targets are checked. Determining a reliability value for each respondent and choosing targets by that value makes the spot check more targeted, and ranking respondents by reliability value avoids purely random spot checks, which helps improve spot-check efficiency.

Referring to FIG. 3, which shows a specific implementation of step S2, the process in step S2 of parsing the response answers to obtain spot-check tokens, extracting spot-check keywords from the tokens, and storing the keywords in the preset answer thesaurus is described in detail as follows:

S21: Segment the response answer into words using a dynamic programming algorithm to obtain initial tokens.

Specifically, a respondent's historical response answers are often verbose and need to be simplified. A dynamic programming algorithm is used to segment each response answer and extract the words relevant to the spot check, yielding the initial tokens.

Dynamic programming is generally used to solve problems that have an optimal-substructure property; in this application it is used to find the optimal initial segmentation. Its basic idea is to decompose the problem into several subproblems, solve the subproblems first, and then derive the solution of the original problem from the solutions of the subproblems.

Preferably, the Viterbi algorithm is used to segment the response answer and obtain the initial tokens. The Viterbi algorithm is a dynamic programming algorithm that finds the most likely sequence of hidden states explaining a sequence of observations.
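To make the dynamic-programming idea concrete, the following is a minimal sketch of dictionary-based segmentation that picks the highest-probability path through the lattice of candidate words, in the spirit of a Viterbi search. It is not the application's implementation: the unigram probabilities in `word_probs`, the `max_word_len` limit and the fallback probability for unknown single characters are all assumptions chosen for illustration.

```python
import math


def segment(text, word_probs, max_word_len=4, unknown_prob=1e-8):
    """Split `text` into the most probable word sequence under a unigram model."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that path
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                prob = word_probs[word]
            elif len(word) == 1:
                prob = unknown_prob  # assumed fallback so every position stays reachable
            else:
                continue
            score = best[start] + math.log(prob)
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    end = n
    while end > 0:
        start = back[end]
        words.append(text[start:end])
        end = start
    return words[::-1]


# Illustrative probabilities only.
word_probs = {"ab": 0.4, "a": 0.1, "b": 0.1, "c": 0.2, "bc": 0.05}
initial_tokens = segment("abc", word_probs)
```

Each prefix of the text is a subproblem, and the best segmentation of a longer prefix is built from the best segmentations of shorter ones, which is the decomposition the description refers to.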

S22: Filter the initial tokens, and perform synonym replacement on the filtered initial tokens to obtain the spot-check tokens.

Specifically, the initial tokens obtained from parsing are filtered to remove unnecessary or redundant words, leaving tokens that better fit the needs of the spot check; synonym replacement is then applied to the filtered tokens to obtain the spot-check tokens. The purpose of synonym replacement is to convert synonyms and near-synonyms among the tokens into a single standard word, which further simplifies the tokens.

Here, filtering means removing initial tokens whose part of speech contributes little to the meaning, such as adjectives and adverbs, and keeping initial tokens that are essential to the meaning, such as nouns, verbs and quantifiers.

Synonym replacement means representing synonyms and near-synonyms by one unified standard word.

For example, in one specific implementation, the initial tokens obtained are "Zhang San" (张三), "currently" (正在), "wildly" (疯狂地) and "da CALL" (打CALL, slang for cheering someone on). "Currently" and "wildly" are both adverbs and can be filtered out, while "Zhang San" is a specific person's name and "da CALL" denotes a specific action. After filtering, the two tokens "Zhang San" and "da CALL" remain; synonym replacement is then applied to "da CALL", giving the spot-check tokens "Zhang San" and "cheer" (欢呼).
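The filtering and synonym replacement in this example can be sketched as below. The sketch assumes the initial tokens already carry part-of-speech tags from some upstream tagger; the tag names, the `KEPT_POS` set and the `synonyms` table are hypothetical and only mirror the example in the text.

```python
# Parts of speech treated as essential to the meaning (assumed tag names).
KEPT_POS = {"noun", "verb", "quantifier"}


def to_spot_check_tokens(tagged_tokens, synonyms):
    """Drop tokens with low-information parts of speech, then unify synonyms.

    tagged_tokens: list of (word, part_of_speech) pairs.
    synonyms:      hypothetical mapping of word -> unified standard word.
    """
    kept = [word for word, pos in tagged_tokens if pos in KEPT_POS]
    return [synonyms.get(word, word) for word in kept]


# English stand-ins for the tokens in the example above.
tagged = [
    ("Zhang San", "noun"),
    ("currently", "adverb"),
    ("wildly", "adverb"),
    ("da CALL", "verb"),
]
synonyms = {"da CALL": "cheer"}
spot_check_tokens = to_spot_check_tokens(tagged, synonyms)
```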

S23: For the same historical crowdsourcing task, obtain all spot-check tokens corresponding to that task and count the occurrences of each spot-check token to obtain the spot-check word frequency of each token.

Specifically, for a given historical crowdsourcing task, all spot-check tokens corresponding to that task are obtained and the occurrences of each distinct token are counted, giving the spot-check word frequency of that token within the task.

Here, the spot-check word frequency is how often a spot-check token occurs within one historical crowdsourcing task. It may be expressed either as the number of occurrences of the token or as the proportion of all occurrences that the token accounts for, depending on the actual situation.

For example, in one specific implementation, a historical crowdsourcing task has 8 spot-check tokens: token 1, token 2, token 3, token 4, token 5, token 6, token 7 and token 8. These 8 tokens occur 22, 5, 19, 25, 8, 1, 20 and 2 times respectively, so the corresponding spot-check word frequencies are 22, 5, 19, 25, 8, 1, 20 and 2.

S24: Take the spot-check tokens whose spot-check word frequency is greater than a preset word frequency as spot-check keywords, and store the spot-check keywords in the preset answer thesaurus.

Specifically, the server stores a preset word frequency and compares each spot-check word frequency with it. When a spot-check word frequency is greater than the preset word frequency, the corresponding spot-check token is taken as a spot-check keyword and stored in the preset answer thesaurus.

The preset word frequency can be set according to actual spot-check needs.
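Steps S23 and S24 can be sketched as a count followed by a threshold. The occurrence counts below are the ones from the example in S23; the preset word frequency of 10 and the function names are assumptions for illustration, since the application leaves the threshold to be configured.

```python
from collections import Counter


def spot_check_word_frequency(answers_tokens):
    """Count occurrences of each spot-check token across all answers to one task."""
    counts = Counter()
    for tokens in answers_tokens:
        counts.update(tokens)
    return counts


def extract_keywords(frequency, preset_frequency):
    """Keep tokens whose frequency is strictly greater than the preset word frequency."""
    return {token for token, count in frequency.items() if count > preset_frequency}


# Occurrence counts from the example: tokens 1..8 occur 22, 5, 19, 25, 8, 1, 20, 2 times.
names = [f"token{i}" for i in range(1, 9)]
frequency = Counter(dict(zip(names, [22, 5, 19, 25, 8, 1, 20, 2])))
keywords = extract_keywords(frequency, preset_frequency=10)  # threshold is assumed
```

With this assumed threshold, tokens 1, 3, 4 and 7 become spot-check keywords; a token whose count merely equals the threshold is not kept, because the comparison in S24 is "greater than".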

In this embodiment, segmenting the response answers yields initial tokens relevant to the spot check; filtering and synonym replacement further simplify those tokens into spot-check tokens; and extracting keywords from the spot-check tokens yields more precise spot-check keywords.

In one embodiment, the filtering of the initial tokens and the synonym replacement on the filtered tokens in step S22 are described in more detail. The specific process is as follows:

Synonym replacement is performed on the initial tokens by means of named entity recognition to obtain the spot-check tokens.

Specifically, named entity recognition is used to perform synonym replacement on the filtered initial tokens, yielding the spot-check tokens.

Named Entity Recognition (NER) is a basic technique for finding named entities, that is, for identifying entities with a specific meaning in text; determining entity boundaries is closely related to word segmentation. NER is an important part of making natural language processing practical and plays a foundational role in applications such as information extraction, syntactic analysis and machine translation. It must identify both the boundaries of an entity and its category, such as person name, place name or organization name.

In this embodiment, named entity recognition is used to perform synonym replacement on the already filtered initial tokens, which further simplifies them into spot-check tokens and improves the precision of the spot-check tokens.

Referring to FIG. 4, which shows the specific process in step S22 of performing synonym replacement on the initial tokens by means of named entity recognition to obtain the spot-check tokens, the details are as follows:

S221: Obtain a preset standard vocabulary dictionary.

Specifically, a standard vocabulary dictionary is configured on the server in advance. It is built from the vocabulary of the response answers to be spot-checked and makes it possible to effectively filter out relatively redundant words, such as unneeded prepositions and adjectives, as well as synonyms and near-synonyms.

S222: For each initial token, traverse the standard vocabulary dictionary and perform named entity recognition between the initial token and each word in the dictionary to obtain an entity recognition result.

Specifically, named entity recognition is performed between each initial token and each word in the standard vocabulary dictionary, one pair at a time. The recognition result for a pair is either that the two refer to the same named entity or that they refer to different named entities.

For example, the two recognized named entities "NYC" and "New York" are different strings on the surface but both refer to the city of New York, so they need to be merged. Likewise, the two recognized named entities "da CALL" (打CALL) and "cheer" (欢呼) are worded differently but both carry the meaning "cheer", so their entity names need to be merged.

S223: If the entity recognition result indicates the same named entity, obtain the initial token and the standard word corresponding to that result, and replace the initial token with the standard word.

Specifically, an entity recognition result of "same named entity" means that the two words have the same meaning. The initial token and the standard word corresponding to that result are obtained, and the initial token is replaced with the standard word to produce the spot-check token.

For example, one initial token is the named entity "da CALL" (打CALL) and another is the named entity "cheer" (欢呼). The named entity recognition in step S222 determines that they are the same named entity. The initial tokens "da CALL" and "cheer" and the standard word "applaud" (喝彩) are obtained, and "applaud" replaces both "da CALL" and "cheer", giving the spot-check token "applaud".
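Steps S221 to S223 can be sketched as a traversal of the standard vocabulary dictionary. The sketch below replaces the named entity recognition step with a simple alias lookup, which is a deliberate simplification and an assumption: the application does not say how two words are judged to be the same entity. The dictionary layout (standard word mapped to a set of known aliases) is likewise hypothetical.

```python
def normalize_tokens(initial_tokens, standard_vocab):
    """Replace each token with its standard word when the dictionary knows it.

    standard_vocab: hypothetical mapping of standard word -> set of aliases
                    treated as the same named entity.
    """
    result = []
    for token in initial_tokens:
        replacement = token  # tokens with no match are kept unchanged
        # Traverse the dictionary and compare the token with each entry (S222).
        for standard_word, aliases in standard_vocab.items():
            if token == standard_word or token in aliases:
                replacement = standard_word  # same entity: use the standard word (S223)
                break
        result.append(replacement)
    return result


# English stand-ins for the examples in the text.
standard_vocab = {"applaud": {"da CALL", "cheer"}, "New York": {"NYC"}}
normalized = normalize_tokens(["Zhang San", "da CALL", "cheer", "NYC"], standard_vocab)
```

In a real system the membership test would be replaced by whatever entity-matching model is used; the traversal and replacement logic around it would stay the same.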

In this embodiment, the initial tokens are further simplified into spot-check tokens that better serve the purpose of the spot check: meaningless words such as prepositions and adjectives are filtered out, and words with the same or similar meaning are replaced, which improves the precision with which spot-check tokens are selected.

Referring to FIG. 5, which shows a specific implementation of step S3, the process in step S3 of counting, for the spot-check tokens in each response answer of each respondent, the number of times the tokens hit spot-check keywords in the preset answer thesaurus, and taking that count as the base count of each response answer, is described in detail as follows:

S31: Obtain each historical crowdsourcing task in which the respondent participated as a reference task.

Specifically, the server stores every historical crowdsourcing task in which each respondent participated. Obtaining all of these tasks as reference tasks ensures that every response answer of each respondent is covered.

S32: For each reference task, obtain the spot-check keywords from the preset answer thesaurus, and count the number of times the spot-check tokens of the respondent's answer hit the spot-check keywords as the base count.

Specifically, because the spot check must be targeted, it needs to examine how correct each respondent's answers to the crowdsourcing tasks are. Counting how many times the spot-check tokens of different response answers hit the keywords reveals how capable each respondent is of completing crowdsourcing tasks.

In this embodiment, obtaining each historical crowdsourcing task in which the respondent participated as a reference task and determining the base counts provides the basis for the subsequent determination of the reliability value.

Referring to FIG. 6, which shows a specific implementation of step S4, the process in step S4 of determining the reliability value of each respondent according to the base count of each response answer is described in detail as follows:

S41: According to the base counts, compute M, the total number of spot-check keywords over all reference tasks, and N, the total number of hits by the respondent over all reference tasks, where M and N are both positive integers and N is less than or equal to M.

S42: Calculate the reliability value δ using the formula δ=N/M.

For example, suppose the total number of spot-check keywords over the reference tasks, M, is 10. If one respondent's total number of hits over the reference tasks, N, is 2, the formula δ=N/M gives a reliability value δ of 0.2; if another respondent's N is 4, the reliability value δ is 0.4. Because a higher reliability value means a lower risk for the answers the respondent returns, the respondent with δ of 0.4 carries less risk than the respondent with δ of 0.2.
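Steps S32, S41 and S42 together can be sketched as below. One interpretive assumption is made and should be read as such: a "hit" is counted once per distinct spot-check keyword that appears among the respondent's spot-check tokens for a task, which keeps N no greater than M as S41 requires. The function name and the sample data are illustrative.

```python
def reliability_value(reference_tasks):
    """Compute delta = N / M over a respondent's reference tasks.

    reference_tasks: list of (keywords, tokens) pairs, where `keywords` is the
    set of spot-check keywords for a task and `tokens` is the respondent's
    spot-check tokens for that task (hypothetical layout).
    """
    # M: total number of spot-check keywords over all reference tasks.
    m = sum(len(keywords) for keywords, _ in reference_tasks)
    # N: total number of keywords hit by the respondent's tokens (assumed definition).
    n = sum(len(keywords & set(tokens)) for keywords, tokens in reference_tasks)
    if m == 0:
        raise ValueError("no spot-check keywords available for the reference tasks")
    return n / m


# Two reference tasks with 5 keywords each, so M = 10.
task_a = {"k1", "k2", "k3", "k4", "k5"}
task_b = {"k6", "k7", "k8", "k9", "k10"}
first = reliability_value([(task_a, ["k1", "x"]), (task_b, ["k6"])])                 # N = 2
second = reliability_value([(task_a, ["k1", "k2", "y"]), (task_b, ["k6", "k7"])])    # N = 4
```

Here `first` is 0.2 and `second` is 0.4, matching the worked example: the first respondent has the lower reliability value and would be ranked earlier for the spot check.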

In this embodiment, counting the base counts and applying the formula yields a concrete reliability value, which addresses the prior art's over-reliance on random spot checks of response answers. With a precise reliability value, it can be determined accurately which respondents' answers are trustworthy and which respondents' answers are too arbitrary.

It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM).

Referring to FIG. 7, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a spot-check apparatus for crowdsourcing tasks. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.

As shown in FIG. 7, the spot-check apparatus for crowdsourcing tasks of this embodiment includes an acquisition module 51, a parsing module 52, a statistics module 53, a determination module 54 and a selection module 55, wherein:

The acquisition module 51 is configured to obtain, for each historical crowdsourcing task, the response answer of each respondent to that historical crowdsourcing task.

The parsing module 52 is configured to parse the response answers to obtain spot-check tokens, extract spot-check keywords from the spot-check tokens, and store the spot-check keywords in the preset answer thesaurus.

The statistics module 53 is configured to count, for the spot-check tokens in each response answer of each respondent, the number of times the spot-check tokens hit the spot-check keywords, as the base count corresponding to each response answer.

The determination module 54 is configured to determine the reliability value corresponding to each respondent according to the base count corresponding to each response answer.

The selection module 55 is configured to select a preset number of respondents in ascending order of reliability value as the spot-check targets, and to check the response answers corresponding to the spot-check targets.

Further, the parsing module 52 includes:

a word segmentation unit, configured to segment the response answer using a dynamic programming algorithm to obtain initial tokens;

a spot-check token determination unit, configured to filter the initial tokens and perform synonym replacement on the filtered initial tokens to obtain the spot-check tokens;

a spot-check word frequency determination unit, configured to obtain, for the same historical crowdsourcing task, all spot-check tokens corresponding to that task, and to count the occurrences of each spot-check token to obtain its spot-check word frequency;

a spot-check keyword determination unit, configured to take the spot-check tokens whose spot-check word frequency is greater than the preset word frequency as spot-check keywords, and to store the spot-check keywords in the preset answer thesaurus.

Further, the spot-check token determination unit includes:

a named entity recognition subunit, configured to perform synonym replacement on the initial tokens by means of named entity recognition to obtain the spot-check tokens.

Further, the spot-check token determination unit also includes:

a standard vocabulary dictionary acquisition subunit, configured to obtain the preset standard vocabulary dictionary;

an entity recognition result determination subunit, configured to traverse the standard vocabulary dictionary for each initial token and perform named entity recognition between the initial token and each word in the dictionary to obtain an entity recognition result;

a standard word replacement subunit, configured to, if the entity recognition result indicates the same named entity, obtain the initial token and the standard word corresponding to that result and replace the initial token with the standard word.

Further, the statistics module 53 includes:

a reference task determination unit, configured to obtain each historical crowdsourcing task in which the respondent participated as a reference task;

a base count determination unit, configured to obtain, for each reference task, the spot-check keywords from the preset answer thesaurus, and to count the number of times the spot-check tokens of the respondent's answer hit the spot-check keywords as the base count.

Further, the determination module 54 includes:

a base count statistics unit, configured to compute, according to the base counts, M, the total number of spot-check keywords over all reference tasks, and N, the total number of hits by the respondent over all reference tasks, where M and N are both positive integers and N is less than or equal to M;

a reliability value determination unit, configured to calculate the reliability value δ using the formula δ=N/M.

In the spot-check apparatus for crowdsourcing tasks of the above scheme, the acquisition module 51 obtains, for each historical crowdsourcing task, the response answer of each respondent to that task. The parsing module 52 parses the response answers to obtain spot-check tokens, extracts spot-check keywords from the tokens and stores the keywords in the preset answer thesaurus; distilling the numerous spot-check tokens into highly targeted spot-check keywords effectively improves spot-check efficiency. The statistics module 53 counts, for the spot-check tokens in each response answer of each respondent, the number of times the tokens hit the spot-check keywords as the base count of each answer, which allows each respondent to be characterized by a reliability value and makes the spot check more targeted. The determination module 54 determines the reliability value of each respondent according to the base count of each response answer, and the selection module 55 then selects a preset number of respondents in ascending order of reliability value as the spot-check targets, which makes the spot check more targeted and helps improve spot-check efficiency.

To solve the above technical problems, the embodiments of the present application also provide a computer device. Refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.

The computer device 6 includes a memory 61, a processor 62 and a network interface 63 that are communicatively connected to one another through a system bus. It should be noted that the figure shows only a computer device 6 with the three components memory 61, processor 62 and network interface 63, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.

The memory 61 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as its hard disk or internal memory. In other embodiments, the memory 61 may be an external storage device of the computer device 6, such as a plug-in hard disk, a SmartMedia Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 6. The memory 61 may also include both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is generally used to store the operating system and the various application software installed on the computer device 6, such as the program code of the spot-check method for crowdsourced tasks. The memory 61 may also be used to temporarily store various data that has been output or is to be output.

In some embodiments, the processor 62 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 62 is generally used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is used to run the program code stored in the memory 61 or to process data, for example to run the program code of the spot-check method for crowdsourced tasks.

The network interface 63 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 6 and other electronic devices.

The present application further provides another embodiment, namely a computer-readable storage medium storing a spot-check program that is executable by at least one processor, so that the at least one processor performs the steps of the spot-check method for crowdsourced tasks described above.

From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods of the embodiments of the present application.

Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements of some of their technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (10)

1. A spot-check method for crowdsourced tasks, characterized by comprising:
for each historical crowdsourced task, obtaining each respondent that participated in the historical crowdsourced task and the response answer corresponding to each respondent;
parsing the response answers to obtain spot-check word segments, extracting spot-check keywords from the spot-check word segments, and storing the spot-check keywords in a preset answer lexicon;
for the spot-check word segments in each response answer of each respondent, counting the number of times the spot-check word segments hit the spot-check keywords in the preset answer lexicon, as the base count corresponding to each response answer;
determining, according to the base count corresponding to each response answer, the reliability value corresponding to each respondent;
selecting a preset number of respondents in ascending order of reliability value as spot-check objects, and checking the response answers corresponding to the spot-check objects.

2. The spot-check method for crowdsourced tasks according to claim 1, characterized in that parsing the response answers to obtain spot-check word segments comprises:
performing word segmentation on the response answers using a dynamic programming algorithm to obtain initial word segments;
filtering the initial word segments, and performing synonym replacement on the filtered initial word segments to obtain the spot-check word segments.

3. The spot-check method for crowdsourced tasks according to claim 1, characterized in that extracting spot-check keywords from the spot-check word segments and storing the spot-check keywords in a preset answer lexicon comprises:
for a given historical crowdsourced task, obtaining all spot-check word segments corresponding to the historical crowdsourced task, and counting the number of occurrences of each spot-check word segment to obtain the spot-check word frequency corresponding to that word segment;
taking the spot-check word segments whose spot-check word frequency is greater than a preset word frequency as spot-check keywords, and storing the spot-check keywords in the preset answer lexicon.

4. The spot-check method for crowdsourced tasks according to claim 2, characterized in that performing synonym replacement on the filtered initial word segments to obtain the spot-check word segments comprises:
performing synonym replacement on the initial word segments by means of named entity recognition to obtain the spot-check word segments.

5. The spot-check method for crowdsourced tasks according to claim 4, characterized in that performing synonym replacement on the initial word segments by means of named entity recognition comprises:
obtaining a preset standard vocabulary dictionary;
for each initial word segment, traversing the standard vocabulary dictionary and performing named entity recognition on the initial word segment against each word in the dictionary to obtain an entity recognition result;
if the entity recognition result indicates that the same named entity is present, obtaining the initial word segment and the standard word corresponding to the recognition result, and replacing the initial word segment with the standard word.

6. The spot-check method for crowdsourced tasks according to any one of claims 1 to 5, characterized in that counting, for the spot-check word segments in each response answer of each respondent, the number of times the spot-check word segments hit the spot-check keywords in the preset answer lexicon, as the base count corresponding to each response answer, comprises:
obtaining each historical crowdsourced task in which the respondent participated, as a reference task;
for each reference task, obtaining the spot-check keywords from the preset answer lexicon, and counting the number of times the spot-check word segments corresponding to the respondent's response answer hit the spot-check keywords, as the base count.

7. The spot-check method for crowdsourced tasks according to any one of claims 1 to 5, characterized in that determining, according to the base count corresponding to each response answer, the reliability value corresponding to each respondent comprises:
according to the base counts, computing the sum M of the numbers of spot-check keywords corresponding to each reference task, and computing the sum N of the respondent's hit counts over each reference task, where M and N are both positive integers and N is less than or equal to M;
computing the reliability value δ using the formula δ = N/M.

8. A spot-check device for crowdsourced tasks, characterized by comprising:
an acquisition module, configured to obtain, for each historical crowdsourced task, the response answer of each respondent to the historical crowdsourced task;
a parsing module, configured to parse the response answers to obtain spot-check word segments, extract the spot-check keywords of the spot-check word segments, and store the spot-check keywords in a preset answer lexicon;
a statistics module, configured to count, for the spot-check word segments in each response answer of each respondent, the number of times the spot-check word segments hit the spot-check keywords in the preset answer lexicon, as the base count corresponding to each response answer;
a determination module, configured to determine, according to the base count corresponding to each response answer, the reliability value corresponding to each respondent;
a selection module, configured to select a preset number of respondents in ascending order of reliability value as spot-check objects, and to check the response answers corresponding to the spot-check objects.

9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the spot-check method for crowdsourced tasks according to any one of claims 1 to 7.

10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the spot-check method for crowdsourced tasks according to any one of claims 1 to 7.
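
As a rough illustration of the keyword-extraction step recited in claim 3, the sketch below counts how often each word segment occurs across all answers to one task and keeps the segments whose frequency exceeds a preset threshold. It is only one possible reading of the claim: the function names, the `min_freq` parameter, and the sample data are hypothetical, and the answers are assumed to have already been segmented, filtered, and normalized.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set

def extract_keywords(task_answers: Iterable[List[str]], min_freq: int) -> Set[str]:
    """Keep word segments whose occurrence count across a task's answers exceeds min_freq."""
    counts = Counter(seg for segments in task_answers for seg in segments)
    return {seg for seg, count in counts.items() if count > min_freq}

def build_lexicon(history: Dict[str, List[List[str]]], min_freq: int) -> Dict[str, Set[str]]:
    """Build the answer lexicon: one keyword set per historical task."""
    return {task: extract_keywords(answers, min_freq) for task, answers in history.items()}

# Hypothetical history: task id -> list of answers, each a list of word segments.
history = {
    "t1": [["cat", "dog"], ["cat", "bird"], ["cat", "dog", "fish"]],
}

# "cat" occurs 3 times and "dog" twice; "bird" and "fish" occur once and are dropped.
print(build_lexicon(history, min_freq=1))
```

Using a strict "greater than" comparison follows the wording of the claim; the threshold itself is a tuning parameter that the claim leaves as "preset".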
CN202010134385.6A 2020-03-02 2020-03-02 Crowd-sourced task spot check method and device, computer equipment and storage medium Pending CN111460810A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010134385.6A CN111460810A (en) 2020-03-02 2020-03-02 Crowd-sourced task spot check method and device, computer equipment and storage medium
PCT/CN2020/118461 WO2021174829A1 (en) 2020-03-02 2020-09-28 Crowdsourced task inspection method, apparatus, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010134385.6A CN111460810A (en) 2020-03-02 2020-03-02 Crowd-sourced task spot check method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111460810A true CN111460810A (en) 2020-07-28

Family

ID=71679970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010134385.6A Pending CN111460810A (en) 2020-03-02 2020-03-02 Crowd-sourced task spot check method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111460810A (en)
WO (1) WO2021174829A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765985A (en) * 2021-01-13 2021-05-07 中国科学技术信息研究所 Named entity identification method for specific field patent embodiment
WO2021174829A1 (en) * 2020-03-02 2021-09-10 平安科技(深圳)有限公司 Crowdsourced task inspection method, apparatus, computer device, and storage medium
CN113486246A (en) * 2021-07-26 2021-10-08 平安科技(深圳)有限公司 Information searching method, device, equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114926066A (en) * 2022-05-31 2022-08-19 广西盖德科技有限公司 Efficient sampling check decision method and system based on process factors
CN116303715A (en) * 2023-02-27 2023-06-23 阿里巴巴(中国)有限公司 Map data sampling inspection and classification model training method, device, equipment, medium
CN116137073B (en) * 2023-04-19 2023-06-27 北京国电通网络技术有限公司 Remote intelligent selective examination method for electric power materials and equipment materials, electronic equipment and medium
CN118229155B (en) * 2024-05-16 2024-11-05 深圳市博派智能移动科技有限公司 Mobile phone motherboard function test management method and system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106849A1 (en) * 2005-11-04 2007-05-10 Sun Microsystems, Inc. Method and system for adaptive intelligent prefetch
US20150235160A1 (en) * 2014-02-20 2015-08-20 Xerox Corporation Generating gold questions for crowdsourcing
CN109978339A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 AI interviews model training method, device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100837358B1 (en) * 2006-08-25 2008-06-12 한국전자통신연구원 Apparatus and method for field adaptive portable broadcasting subtitle machine translation using dynamic translation resources
CN105117398B (en) * 2015-06-25 2018-10-26 扬州大学 A kind of software development problem auto-answer method based on crowdsourcing
CN110196901B (en) * 2019-06-28 2022-02-11 北京百度网讯科技有限公司 Constructing method, device, computer equipment and storage medium of dialogue system
CN111460810A (en) * 2020-03-02 2020-07-28 平安科技(深圳)有限公司 Crowd-sourced task spot check method and device, computer equipment and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070106849A1 (en) * 2005-11-04 2007-05-10 Sun Microsystems, Inc. Method and system for adaptive intelligent prefetch
US20150235160A1 (en) * 2014-02-20 2015-08-20 Xerox Corporation Generating gold questions for crowdsourcing
CN109978339A (en) * 2019-02-27 2019-07-05 平安科技(深圳)有限公司 AI interviews model training method, device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021174829A1 (en) * 2020-03-02 2021-09-10 平安科技(深圳)有限公司 Crowdsourced task inspection method, apparatus, computer device, and storage medium
CN112765985A (en) * 2021-01-13 2021-05-07 中国科学技术信息研究所 Named entity identification method for specific field patent embodiment
CN112765985B (en) * 2021-01-13 2023-10-27 中国科学技术信息研究所 A named entity recognition method for patent embodiments in specific fields
CN113486246A (en) * 2021-07-26 2021-10-08 平安科技(深圳)有限公司 Information searching method, device, equipment and storage medium

Also Published As

Publication number Publication date
WO2021174829A1 (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN111460810A (en) Crowd-sourced task spot check method and device, computer equipment and storage medium
JP7153004B2 (en) COMMUNITY Q&A DATA VERIFICATION METHOD, APPARATUS, COMPUTER DEVICE, AND STORAGE MEDIUM
WO2022218186A1 (en) Method and apparatus for generating personalized knowledge graph, and computer device
US10089296B2 (en) System and method for sentiment lexicon expansion
CN106940788B (en) Intelligent scoring method and device, computer equipment and computer readable medium
CN113076735B (en) Target information acquisition method, device and server
US9710829B1 (en) Methods, systems, and articles of manufacture for analyzing social media with trained intelligent systems to enhance direct marketing opportunities
WO2020077824A1 (en) Method, apparatus, and device for locating abnormality, and storage medium
WO2021169485A1 (en) Dialogue generation method and apparatus, and computer device
CN117931991B (en) Training sample acquisition and large model optimization training method and device
CN113704422B (en) Text recommendation method, device, computer equipment and storage medium
CN117932036A (en) Dialogue processing method, device, electronic device and storage medium
CN111143556A (en) Software function point automatic counting method, device, medium and electronic equipment
CN111859969A (en) Data analysis method and device, electronic equipment and storage medium
CN114003693A (en) Question answering method, model training method, equipment and program product thereof
CN115359799A (en) Speech recognition method, training method, device, electronic equipment and storage medium
CN117332068A (en) Human-computer interaction methods, devices, electronic equipment and storage media
CN117422067A (en) Information processing method, information processing device, electronic equipment and storage medium
CN112735564A (en) Mental health state prediction method, mental health state prediction apparatus, mental health state prediction medium, and computer program product
CN112580896A (en) Knowledge point prediction method, device, equipment and storage medium
CN113343714B (en) Information extraction method, model training method and related equipment
CN113407677B (en) Method, apparatus, device and storage medium for evaluating consultation dialogue quality
CN114625960A (en) On-line evaluation method and device, electronic equipment and storage medium
CN110688558A (en) Method and device for searching web page, electronic equipment and storage medium
CN112732969A (en) Image semantic analysis method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code (Ref country code: HK; Ref legal event code: DE; Ref document number: 40031289; Country of ref document: HK)
SE01 Entry into force of request for substantive examination