CN106611058A - Test question searching method and device - Google Patents

Info

Publication number
CN106611058A
Authority
CN
China
Prior art keywords
examination question
search results
search
test question
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611229381.6A
Other languages
Chinese (zh)
Inventor
丁新朗
林亚男
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Genius Technology Co Ltd
Original Assignee
Guangdong Genius Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Genius Technology Co Ltd
Priority to CN201611229381.6A
Publication of CN106611058A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5846: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using extracted text

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a test question search method and device. The method comprises: acquiring an original image of a target test question and performing image recognition on it; performing a full-text search in a question bank based on the image recognition result to obtain test question search results; when two or more test question search results are obtained, calculating a first score for each result according to the scoring mechanism of the full-text search; calculating a second score for each result according to a similarity algorithm; and, according to a preset weighted linear scheme, performing a weighted calculation on the first score and the second score to determine a final score, then sorting the test question search results from high to low by final score and outputting them. The scheme improves the accuracy of test question search.

Description

Test question search method and device

Technical Field

The present invention relates to the field of communication technology, and in particular to a test question search method and device.

Background

At present, the education industry is also integrating with the Internet, and many online education products have appeared, including question-search products that offer functions such as answering questions from a photo. Question-search products are intended to help student users who encounter a difficult problem in their homework: an image containing the question is captured and image recognition is performed on it, and, based on the image recognition result, the back-end question bank is searched for the question and the answer analysis that the user needs.

However, because existing image recognition technology is affected by the ambient light and lighting at the time of shooting, question-search products often cannot accurately find the test questions and answer analyses that users need. This fails to meet users' question-search needs and harms the user experience of such products.

Summary of the Invention

The present invention provides a test question search method and device, aiming to improve the accuracy of test question search.

A first aspect of the embodiments of the present invention provides a test question search method, the method comprising:

acquiring an original image of a target test question, and performing image recognition on the original image of the target test question;

performing a full-text search in a question bank based on the image recognition result to obtain test question search results;

when two or more test question search results are obtained, calculating, according to the scoring mechanism of the full-text search, a first score corresponding to each obtained test question search result;

calculating, according to a similarity algorithm, a second score corresponding to each obtained test question search result;

performing, according to a preset weighted linear scheme, a weighted calculation on the first score and the second score to determine a final score, and sorting the test question search results from high to low by the final score and outputting them.

A second aspect of the embodiments of the present invention provides a test question search device, the device comprising:

a target test question acquisition unit, configured to acquire an original image of a target test question and perform image recognition on the original image of the target test question;

a preliminary retrieval unit, configured to perform a full-text search in a question bank based on the image recognition result obtained by the target test question acquisition unit, to obtain test question search results;

a first score calculation unit, configured to calculate, when the preliminary retrieval unit obtains two or more test question search results, a first score corresponding to each obtained test question search result according to the scoring mechanism of the full-text search;

a second score calculation unit, configured to calculate, according to a similarity algorithm, a second score corresponding to each test question search result obtained by the preliminary retrieval unit;

a search result determination unit, configured to perform, according to a preset weighted linear scheme, a weighted calculation on the first score obtained by the first score calculation unit and the second score obtained by the second score calculation unit to determine a final score, and to sort the test question search results from high to low by the final score and output them.

As can be seen from the above, in the solution of the present invention, an original image of the target test question is first acquired and image recognition is performed on it. A full-text search is then performed in the question bank based on the image recognition result to obtain test question search results. When there are two or more test question search results, a first score is calculated for each result according to the scoring mechanism of the full-text search, and a second score is calculated for each result according to a similarity algorithm. Finally, according to a preset weighted linear scheme, a weighted calculation is performed on the first score and the second score to determine a final score, and the test question search results are sorted from high to low by the final score and output. The search results can thereby be screened further, so that more accurate, better-matching test questions and their analyses are ultimately selected. In the prior art, inaccurate image recognition affects the test questions finally returned, so that matching test questions cannot be found. Compared with that, the solution of the present invention improves the accuracy of test question search, better meets users' question-search needs, and improves the user experience.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of the implementation of the test question search method provided by Embodiment 1 of the present invention;

FIG. 2 is a flowchart of the specific implementation of step S102 of the test question search method provided by Embodiment 1 of the present invention;

FIG. 3 is a structural block diagram of the test question search device provided by Embodiment 2 of the present invention.

Detailed Description

To make the purpose, features, and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

In the embodiments of the present invention, an original image of the target test question is first acquired and image recognition is performed on it. A full-text search is then performed in the question bank based on the image recognition result to obtain test question search results. When there are two or more test question search results, a first score is calculated for each result according to the scoring mechanism of the full-text search, and a second score is calculated for each result according to a similarity algorithm. Finally, according to a preset weighted linear scheme, a weighted calculation is performed on the first score and the second score to determine a final score, and the test question search results are sorted from high to low by the final score and output. The search results can thereby be screened further, so that more accurate, better-matching test questions and their analyses are ultimately selected.

The implementation of the present invention is described in detail below with reference to specific embodiments:

Embodiment 1

FIG. 1 shows the implementation flow of the test question search method provided by Embodiment 1 of the present invention, described in detail as follows:

In step S101, an original image of the target test question is acquired, and image recognition is performed on the original image of the target test question.

In this embodiment of the present invention, the original image containing the target test question may be acquired with a camera, for example by photographing the target test question after the photo function is started. Alternatively, in step S101 the original image containing the target test question may be obtained, actively or passively, from another device, or it may be obtained from an image library; no limitation is imposed here. Supporting several acquisition routes makes it convenient and quick to obtain an original image containing the target test question. After the original image is obtained, text recognition is performed on it. Specifically, in step S101, optical character recognition (OCR, Optical Character Recognition) may be used to perform text recognition on the acquired original image containing the target test question. Of course, step S101 may also use other text recognition technologies for this purpose, which is not limited here.

In step S102, a full-text search is performed in the question bank based on the image recognition result to obtain test question search results.

In this embodiment of the present invention, after receiving the original image containing the target test question and performing image recognition on it, the test question search device performs a full-text search in the question bank based on the image recognition result and obtains search results related to the target test question. The question bank includes local question banks as well as question banks on the Internet. A local question bank may be one built into the test question search device, or one that the user has downloaded to the device over the Internet, so that the user can download the required question bank while online and access it while offline; no limitation is imposed here. Because the full-text search uses the entire result of the image recognition in step S101 as the query, the search results obtained have relatively high relevance and matching degree.

Specifically, when performing a full-text search in the question bank based on the image recognition result, the Lucene framework may be used. Lucene is developed in Java and is an open-source full-text search engine toolkit. It is not a complete full-text search engine but an architecture for one, providing a complete query engine and indexing engine and a partial text analysis engine. Lucene gives developers a simple, easy-to-use toolkit with which a complete full-text search engine can conveniently be built into the test question search device, so that full-text search is available to the user. Of course, in step S102 other retrieval frameworks or programs, such as Galago, Xapian, or Zebra, may also be used to perform the full-text search on the image recognition result, which is not limited here.

In step S103, when two or more test question search results are obtained, a first score corresponding to each obtained test question search result is calculated according to the scoring mechanism of the full-text search.

In this embodiment of the present invention, the number of test question search results finally obtained in step S102 cannot be determined in advance. When the target test question searched for by the user is relatively new, step S102 may retrieve very few test question search results, or none at all. Depending on the number of test question search results obtained, there are the following application scenarios:

In one application scenario, step S102 does not obtain any test question search result matching the target test question. In this case, step S103 may directly return a no-result message, informing the user that no question matching the target test question can be found in the existing question banks.

In another application scenario, step S102 obtains only one test question search result. In this case, step S103 directly returns that single test question search result to the user for review.

Optionally, in the above two application scenarios, the screen may also show which question banks the test question search device searched in step S102 and recommend other question banks that the user can try. The user can then choose to download a question bank and search again, or to search again online, in order to obtain more test question search results.

In a third application scenario, step S102 obtains two or more test question search results. Because there are multiple results, step S103 first performs an initial round of score-based screening on them. Full-text search technology provides a scoring mechanism for scoring the results of a search; the higher a search result's score, the more relevant it is to the target test question. In step S103, every search result obtained in step S102 is scored once, and the score obtained is the first score corresponding to that search result, denoted S1. The first score S1 is stored temporarily for use in subsequent calculation.

Specifically, when the Lucene framework is used in step S102 to perform the full-text search on the image recognition result, then for the third application scenario step S103 may consist of computing a Lucene score for each obtained test question search result, yielding the first score corresponding to each result. A full-text search based on the Lucene framework can score every test question search result. The maximum value of the Lucene score corresponding to any test question search result is 1.0.
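The three application scenarios described for step S103 can be sketched as a small dispatch function. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `handle_search_results` and the `rank` callback are hypothetical, and results are represented as plain strings.

```python
from typing import Callable, List, Optional


def handle_search_results(
    results: List[str],
    rank: Callable[[List[str]], List[str]],
) -> Optional[List[str]]:
    """Dispatch on the number of full-text search results (hypothetical helper)."""
    if not results:
        # Scenario 1: nothing matched; the caller reports "no search result".
        return None
    if len(results) == 1:
        # Scenario 2: a single result is returned to the user as-is.
        return results
    # Scenario 3: two or more results go through scoring and re-ranking.
    return rank(results)
```

Only the third branch needs the first and second scores, which is why the later steps are described as applying to that scenario alone.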

In step S104, a second score corresponding to each obtained test question search result is calculated according to a similarity algorithm.

In this embodiment of the present invention, three application scenarios may arise in step S103, and step S104 is executed only for the third one, that is, when step S102 obtains two or more test question search results. In step S104, each search result obtained in step S102 is compared with the target test question for similarity according to the similarity algorithm and is scored again. The score obtained is the second score corresponding to that search result, denoted S2. The second score S2 is stored temporarily for use in subsequent calculation.

Specifically, the similarity algorithm may be the longest common subsequence algorithm. In that case, step S104 consists of scoring each obtained test question search result according to the longest common subsequence algorithm, yielding the second score corresponding to each result. The longest common subsequence, abbreviated LCS (Longest Common Subsequence), can describe the similarity between two pieces of text. Using the longest common subsequence algorithm in step S104 therefore makes it possible to calculate the similarity between each test question search result and the target test question effectively, and so to analyze the search results further. Optionally, the test question search device may re-sort the test question search results from high to low by the second score S2.
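A minimal sketch of an LCS-based second score follows. The dynamic-programming routine is the standard longest common subsequence length computation. The normalisation in `lcs_score` (dividing by the length of the longer text so that the score lies in [0, 1]) is an assumption made for illustration; the source does not say how the LCS length is turned into a score.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)  # DP row for the previous character of a
    for ch_a in a:
        curr = [0]  # column 0: empty prefix of b
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(target: str, candidate: str) -> float:
    """Assumed normalisation: LCS length over the longer text's length."""
    longest = max(len(target), len(candidate))
    if longest == 0:
        return 0.0
    return lcs_length(target, candidate) / longest
```

Here `target` would be the recognized text of the photographed question and `candidate` the text of one search result. Because a subsequence need not be contiguous, a few misrecognized characters lower the score only slightly.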

In step S105, according to a preset weighted linear scheme, a weighted calculation is performed on the first score and the second score to determine a final score, and the two or more test question search results are sorted from high to low by the final score and output.

In this embodiment of the present invention, in the third application scenario described above, that is, when step S102 obtains two or more test question search results, the test question search device in step S105 obtains, for each test question search result, the corresponding first score S1 and second score S2. According to the preset weighted linear scheme, it performs a weighted calculation on the first score S1 and the second score S2 of each result to obtain that result's final score, denoted S3. It then sorts the test question search results from high to low by the final score S3 and outputs the sorted results for the user to review.

Further, the preset weighted linear scheme may be: S3 = n1*S1 + n2*S2, where n1 > 0, n2 > 0, and n1 + n2 = 1. Specifically, n1 = 0.3 and n2 = 0.7 may be used; the final score S3 and the resulting ordering of the test question search results obtained in this case fit the question-search needs of most users fairly well.
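The weighted linear scheme can be sketched as follows. The formula and the default weights n1 = 0.3, n2 = 0.7 come from the text above; the function names and the tuple layout `(question_id, s1, s2)` are hypothetical choices for illustration.

```python
from typing import List, Tuple


def final_score(s1: float, s2: float, n1: float = 0.3, n2: float = 0.7) -> float:
    """S3 = n1*S1 + n2*S2 with n1 > 0, n2 > 0 and n1 + n2 = 1."""
    if n1 <= 0 or n2 <= 0 or abs(n1 + n2 - 1.0) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    return n1 * s1 + n2 * s2


def rank_results(
    scored: List[Tuple[str, float, float]],
) -> List[Tuple[str, float]]:
    """Sort (question_id, S1, S2) triples by final score S3, high to low."""
    ranked = [(qid, final_score(s1, s2)) for qid, s1, s2 in scored]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

For example, a result with S1 = 0.9 and S2 = 0.2 gets S3 = 0.41, while one with S1 = 0.5 and S2 = 0.8 gets S3 = 0.71 and is ranked first: with these weights the similarity score dominates the full-text score.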

As can be seen from the above, in this embodiment of the present invention the test question search results can be scored twice, and the two scores are combined by weighting to obtain a final score. Because the final score is a weighted combination of the two earlier scores, the accuracy of the final test question search results is improved. This addresses the problem that, when the user photographs the target test question in weak light or under other environmental interference, the search results are affected and become inaccurate. The accuracy of the user's question search is improved, the user's operating experience is enhanced, and the user's question-search needs are better met.

FIG. 2 shows a flowchart of the specific implementation of step S102 of the test question search method provided by this embodiment of the present invention, described in detail as follows:

In step S201, a full-text search is performed in the question bank according to the image recognition result, and all related test question search results are obtained.

In step S202, when there are two or more related test question search results, multi-dimensional statistics are computed over all of them according to a preset data model, and the test question category containing the largest number of test question search results is determined.

In this embodiment of the present invention, the number of related test question search results obtained in step S201 is not known in advance. Only when step S201 obtains two or more related test question search results does step S202 compute multi-dimensional statistics over them according to the preset data filtering model, so that the test question search results in the best category can be selected. Specifically, the dimensions include, but are not limited to, one or more of the following: time, region, subject, and grade.

Here, time is the year in which the test question was set. For student users, especially those in junior and senior high school, the questions they face are often real questions from previous years' high school or college entrance examinations, or mock questions for those examinations, and the question bank usually retains year information for such questions. In step S202, the test question search device can then compute the distribution over time of the test question search results related to the target test question, pick out the year containing the most test question search results, and obtain all test question search results belonging to that year.

Correspondingly, region is the area for which the test question was set. For junior and senior high school students, region is also an important factor when searching for test questions, because junior and senior high school test questions are often strongly regional. In step S202, when the user searches for a test question, the test question search device can use region as one statistical dimension, compute the regional distribution of the test question search results, pick out the region containing the most test question search results, and obtain all test question search results belonging to that region.

Subject is also an important dimension. Every test question belongs to a subject, but because knowledge overlaps in practice, some knowledge points may appear in several subjects. Taking the sciences as an example, physics, chemistry, biology, and mathematics all overlap with one another; mathematics in particular, as the foundational science subject, is involved in physics, chemistry, and biology. For example, suppose the user searches with a mathematics word problem as the target test question, and the problem has a physics background, asking for an object's displacement, velocity, distance, or another physical quantity. A full-text search of the question bank for this target test question is then very likely to retrieve not only test questions under mathematics but also questions from physics or other subjects. In step S202, subject is therefore set as one of the statistical dimensions. After subject statistics are computed over all retrieved test question results and the number of search results under each subject is determined, the test question search results in the same subject as the target test question can be selected effectively.

As for grade: in practice, questions on a given knowledge point are most concentrated in the grade in which that knowledge point is taught, while other grades do not examine it heavily. The test question search device therefore also gathers statistics on the grade for which each question was set, so that it can select fairly accurately the search results at the same grade as the target test question and avoid results that go beyond the syllabus or otherwise fail to meet the user's expectations.
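As an illustrative sketch only (the patent does not prescribe an implementation), the per-dimension counting of step S202 could look like the following Python. The record layout, the field names `region`, `subject`, and `grade`, and the function name `largest_category` are assumptions made for this example.

```python
from collections import Counter

def largest_category(results, dimension):
    """Count results per category along one dimension and return the
    category holding the most results, together with its members."""
    counts = Counter(r[dimension] for r in results)
    category, _ = counts.most_common(1)[0]
    return category, [r for r in results if r[dimension] == category]

# Hypothetical search results returned by the full-text search.
results = [
    {"id": 1, "subject": "math", "grade": 8, "region": "Guangdong"},
    {"id": 2, "subject": "physics", "grade": 9, "region": "Guangdong"},
    {"id": 3, "subject": "math", "grade": 8, "region": "Hunan"},
]

category, kept = largest_category(results, "subject")
```

Calling the same function with `"region"` or `"grade"` applies the other dimensions; how ties between equally large categories should be broken is not stated in the source, so this sketch simply takes the first one encountered.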

Optionally, the user may choose which dimension or dimensions to use for the statistics.

Optionally, after the multi-dimensional statistics have been performed, the user may choose a test question category of interest or need from among the categories under the different dimensions, instead of having the test question search device automatically pick the category that contains the most test question search results.

In step S203, within the test question category that contains the most test question search results, the search results with high relevance are retained.

In this embodiment of the present invention, step S202 yields, for a given dimension, the test question category with the most test question search results. The number of search results under that category is not known in advance, and only those highly relevant to the target test question need to be retained; search results that are related to the target test question but only weakly so may be filtered out.

Specifically, when the number of test question search results under the above category exceeds N, those results are sorted by relevance and the N most relevant are retained. The category in question is the one determined in step S202 as containing the most test question search results. The relevance may be computed with algorithms including, but not limited to, TF-IDF (Term Frequency-Inverse Document Frequency) or Okapi BM25 (Best Match 25). The search results are sorted from high to low by the resulting relevance score and filtered on that score: only the N highest-scoring results are retained, and all other search results obtained in step S201 are discarded, whatever their category.
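For readers unfamiliar with Okapi BM25, the following is a minimal Python sketch of the scoring, not the patent's implementation. It assumes documents and the query are already tokenized into lists of terms; the parameter defaults `k1=1.2` and `b=0.75` are conventional choices rather than values from the source, and the `+ 1.0` inside the logarithm is a common variant that keeps the inverse document frequency non-negative.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Return one BM25 relevance score per document in `docs`.

    `docs` is a non-empty list of token lists; `query_terms` is a token list.
    """
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n          # average document length
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

The scores can then be used as the relevance values by which the results of the selected category are sorted from high to low.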

If the number of test question search results under the above category is not more than N, all search results under that category are retained. Because the count does not exceed N, every result is kept regardless of how relevant it is to the target test question, and no further relevance-based filtering is needed within the category. The search results under all other categories, however, are discarded.

Here N is a preset natural number greater than or equal to 2. In step S203, N may be set initially by the test question search device or set by the user. Optionally, the test question search device initially sets N to 100.
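The retention rule of step S203 can be sketched as follows. This is a hedged illustration: the `relevance` key and the function name `retain_top_n` are assumptions, and the input is taken to be the results of the category already selected in step S202, each carrying a precomputed relevance score.

```python
def retain_top_n(category_results, n=100):
    """Keep at most `n` results of the selected category.

    If the category holds no more than `n` results, all are kept as they are;
    otherwise they are sorted by relevance (high to low) and the first `n` kept.
    """
    if n < 2:
        raise ValueError("N must be a natural number >= 2")
    if len(category_results) <= n:
        return list(category_results)
    ranked = sorted(category_results, key=lambda r: r["relevance"], reverse=True)
    return ranked[:n]
```

The default of 100 mirrors the optional initial value mentioned above; a user-supplied N would simply be passed as `n`.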

As can be seen from the above, in this embodiment the test question search results undergo a coarse preliminary screening before they are scored, so that the subsequent first scoring, second scoring, and final scoring need not be performed on every search result. This greatly reduces the system resources consumed by the test question search method.

It should be noted that the test question search device mentioned in this embodiment of the present invention may be integrated into a mobile terminal (for example a smartphone, tablet computer, or learning machine) as software (for example in the form of an app) and/or as hardware.

A person of ordinary skill in the art will understand that all or some of the steps of the method in the above embodiment may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, such as a ROM/RAM, magnetic disk, or optical disc.

Embodiment 2

FIG. 3 shows a structural block diagram of the test question search device provided by Embodiment 2 of the present invention. For ease of description, only the parts relevant to this embodiment are shown. The test question search device 3 includes:

a target test question acquisition unit 31, configured to acquire an original image of the target test question and perform image recognition on that original image;

a preliminary retrieval unit 32, configured to perform a full-text search of the question bank based on the image recognition result obtained by the target test question acquisition unit 31, and to obtain test question search results;

a first score calculation unit 33, configured to calculate, when the number of test question search results obtained by the preliminary retrieval unit 32 is two or more, the first score of each obtained search result according to the scoring mechanism of the full-text search;

a second score calculation unit 34, configured to calculate, according to a similarity algorithm, the second score of each test question search result obtained by the preliminary retrieval unit 32;

a search result determination unit 35, configured to weight the first score obtained by the first score calculation unit 33 and the second score obtained by the second score calculation unit 34 according to a preset weighted linear scheme to determine a final score, and to sort the test question search results from high to low by the final score and output them.
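A weighted linear combination of the two scores, as performed by the search result determination unit 35, might be sketched like this. The weights `w1` and `w2`, their default values, and the keys `first`, `second`, and `final` are assumptions for illustration; the source only states that the weighting scheme is preset. The sketch also assumes both scores have already been brought to comparable scales.

```python
def rank_by_final_score(results, w1=0.5, w2=0.5):
    """Combine first and second scores linearly and sort high to low.

    Each result is a dict with numeric "first" and "second" entries;
    a copy with an added "final" entry is returned for each.
    """
    scored = [dict(r, final=w1 * r["first"] + w2 * r["second"]) for r in results]
    return sorted(scored, key=lambda r: r["final"], reverse=True)
```

Because a full-text relevance score and a similarity ratio generally live on different scales, a practical implementation would likely normalize them before weighting; that step is outside what the source describes.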

Specifically, the preliminary retrieval unit 32 is configured to perform the full-text search on the image recognition result using the Lucene framework.

Specifically, the first score calculation unit 33 is configured to perform Lucene scoring on each test question search result obtained by the preliminary retrieval unit 32, obtaining the first score of each search result.

Specifically, when the similarity algorithm used is the longest common subsequence algorithm, the second score calculation unit 34 is configured to score each test question search result obtained by the preliminary retrieval unit 32 according to the longest common subsequence algorithm, obtaining the second score of each search result.
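The longest common subsequence (LCS) computation can be sketched in Python as below. The source does not say how the LCS length is turned into a score, so the normalization in `lcs_similarity` (dividing by the longer string's length) is an assumption of this example, as are both function names.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b,
    computed by dynamic programming with one rolling row."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        curr = [0]
        for j, bj in enumerate(b, 1):
            if ch == bj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcs_similarity(a, b):
    """Hypothetical second score in [0, 1]: LCS length over the longer length."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))
```

Here `a` would be the recognized text of the target test question and `b` the text of one search result; the running time is proportional to the product of the two lengths.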

Optionally, the preliminary retrieval unit 32 further includes:

a search result acquisition subunit, configured to perform a full-text search of the question bank according to the image recognition result obtained by the target test question acquisition unit 31, and to obtain all relevant test question search results;

a multi-dimensional statistics subunit, configured to perform, when the number of relevant test question search results obtained by the search result acquisition subunit is two or more, multi-dimensional statistics on those search results according to a preset data model, and to determine the test question category containing the most test question search results;

a result screening subunit, configured to retain the highly relevant test question search results within the test question category determined by the multi-dimensional statistics subunit as containing the most test question search results.

Specifically, the result screening subunit is configured as follows: if the number of test question search results under the category determined by the multi-dimensional statistics subunit is more than N, it sorts the search results under that category by relevance and retains the N most relevant; if the number is not more than N, it retains all search results under that category. Here N is a preset natural number greater than or equal to 2.

It should be noted that the test question search device in this embodiment of the present invention may be integrated into a mobile terminal (for example a smartphone, tablet computer, or learning machine) as software (for example in the form of an app) and/or as hardware.

It should be understood that the test question search device in this embodiment may be used to implement all of the technical solutions of the method embodiment above. The functions of its functional modules may be implemented according to the method of that embodiment; for the specific implementation process, refer to the relevant description in that embodiment, which is not repeated here.

As can be seen from the above, in this embodiment the test question search device performs a coarse preliminary screening of the test question search results before scoring them, so that the subsequent first scoring, second scoring, and final scoring need not be performed on every search result. This greatly reduces the system resources consumed by the test question search device.

It should be noted that, in the embodiments provided in this application, the disclosed device and method may be implemented in other ways. The device embodiment described above is merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: several units or components may be combined or integrated into another system, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or of another form.

For brevity, each of the foregoing method embodiments is described as a series of combined actions. A person skilled in the art should understand, however, that the present invention is not limited by the order of actions described, because according to the present invention some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily all required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not detailed in one embodiment, refer to the relevant description of the other embodiments.

The above is merely a preferred embodiment of the present invention. A person of ordinary skill in the art may, following the idea of the embodiments of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A test question search method, characterized by comprising:
acquiring an original image of a target test question, and performing image recognition on the original image of the target test question;
performing a full-text search in a question bank based on the result of the image recognition, and obtaining test question search results;
when the number of obtained test question search results is two or more, calculating, according to the scoring mechanism of the full-text search, a first score corresponding to each obtained test question search result;
calculating, according to a similarity algorithm, a second score corresponding to each obtained test question search result; and
weighting the first score and the second score according to a preset weighted linear scheme to determine a final score, and sorting the test question search results from high to low by the final score and outputting them.
2. The test question search method according to claim 1, characterized in that performing the full-text search in the question bank based on the result of the image recognition and obtaining test question search results comprises:
performing a full-text search in the question bank according to the result of the image recognition, and obtaining all relevant test question search results;
when the number of all relevant test question search results is two or more, performing multi-dimensional statistics on all the relevant test question search results according to a preset data model, and determining the test question category containing the most test question search results; and
retaining the highly relevant test question search results under the test question category containing the most test question search results.
3. The test question search method according to claim 2, characterized in that retaining the highly relevant test question search results under the test question category containing the most test question search results comprises:
if the number of test question search results under the test question category is more than N, sorting the test question search results under the test question category by relevance and retaining the N most relevant test question search results;
if the number of test question search results under the test question category is not more than N, retaining all test question search results under the test question category;
wherein N is a preset natural number greater than or equal to 2.
4. The test question search method according to any one of claims 1 to 3, characterized in that performing the full-text search in the question bank based on the result of the image recognition comprises:
performing the full-text search on the result of the image recognition using the Lucene framework;
and calculating, when the number of obtained test question search results is two or more, the first score corresponding to each obtained test question search result according to the scoring mechanism of the full-text search comprises:
performing Lucene scoring on each obtained test question search result to obtain the first score corresponding to each test question search result.
5. The test question search method according to any one of claims 1 to 3, characterized in that the similarity algorithm is the longest common subsequence algorithm, and calculating, according to the similarity algorithm, the second score corresponding to each obtained test question search result comprises:
scoring each obtained test question search result according to the longest common subsequence algorithm to obtain the second score corresponding to each test question search result.
6. A test question search device, characterized in that the test question search device comprises:
a target test question acquisition unit, configured to acquire an original image of a target test question and perform image recognition on the original image of the target test question;
a preliminary retrieval unit, configured to perform a full-text search in a question bank based on the image recognition result obtained by the target test question acquisition unit, and to obtain test question search results;
a first score calculation unit, configured to calculate, when the number of test question search results obtained by the preliminary retrieval unit is two or more, a first score corresponding to each obtained test question search result according to the scoring mechanism of the full-text search;
a second score calculation unit, configured to calculate, according to a similarity algorithm, a second score corresponding to each test question search result obtained by the preliminary retrieval unit; and
a search result determination unit, configured to weight the first score obtained by the first score calculation unit and the second score obtained by the second score calculation unit according to a preset weighted linear scheme to determine a final score, and to sort the test question search results from high to low by the final score and output them.
7. The test question search device according to claim 6, characterized in that the preliminary retrieval unit comprises:
a search result acquisition subunit, configured to perform a full-text search in the question bank according to the image recognition result obtained by the target test question acquisition unit, and to obtain all relevant test question search results;
a multi-dimensional statistics subunit, configured to perform, when the number of all relevant test question search results obtained by the search result acquisition subunit is two or more, multi-dimensional statistics on those search results according to a preset data model, and to determine the test question category containing the most test question search results; and
a result screening subunit, configured to retain the highly relevant test question search results under the test question category determined by the multi-dimensional statistics subunit as containing the most test question search results.
8. The test question search device according to claim 7, characterized in that the result screening subunit is specifically configured to: when the number of test question search results under the test question category determined by the multi-dimensional statistics subunit is more than N, sort the test question search results under the test question category by relevance and retain the N most relevant test question search results;
when the number of test question search results under the test question category determined by the multi-dimensional statistics subunit is not more than N, retain all test question search results under the test question category;
wherein N is a preset natural number greater than or equal to 2.
9. The test question search device according to any one of claims 6 to 8, characterized in that the preliminary retrieval unit is specifically configured to perform the full-text search on the result of the image recognition using the Lucene framework;
and the first score calculation unit is specifically configured to perform Lucene scoring on each test question search result obtained by the preliminary retrieval unit to obtain the first score corresponding to each test question search result.
10. The test question search device according to any one of claims 6 to 8, characterized in that the second score calculation unit is specifically configured to, when the similarity algorithm used is the longest common subsequence algorithm, score each test question search result obtained by the preliminary retrieval unit according to the longest common subsequence algorithm to obtain the second score corresponding to each test question search result.
CN201611229381.6A 2016-12-27 2016-12-27 Test question searching method and device Pending CN106611058A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611229381.6A CN106611058A (en) 2016-12-27 2016-12-27 Test question searching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611229381.6A CN106611058A (en) 2016-12-27 2016-12-27 Test question searching method and device

Publications (1)

Publication Number Publication Date
CN106611058A true CN106611058A (en) 2017-05-03

Family

ID=58636226

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611229381.6A Pending CN106611058A (en) 2016-12-27 2016-12-27 Test question searching method and device

Country Status (1)

Country Link
CN (1) CN106611058A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909520A (en) * 2017-11-02 2018-04-13 浙江工商大学 The method and apparatus that make the test based on examination question correlation
CN108416264A (en) * 2018-01-29 2018-08-17 山东汇贸电子口岸有限公司 A kind of searching method and search module of supporting OCR to input
CN109325051A (en) * 2018-08-14 2019-02-12 广东小天才科技有限公司 Problem search result output method based on solution model and learning equipment
CN111241276A (en) * 2020-01-06 2020-06-05 广东小天才科技有限公司 Topic searching method, device, equipment and storage medium
CN111563498A (en) * 2020-04-30 2020-08-21 广东小天才科技有限公司 A method, device, electronic device and storage medium for topic collection
CN111652203A (en) * 2020-06-01 2020-09-11 北京字节跳动网络技术有限公司 Resource pushing method and device
CN118132733A (en) * 2024-05-07 2024-06-04 江西风向标智能科技有限公司 Test question retrieval method, system, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140052716A1 (en) * 2012-08-14 2014-02-20 International Business Machines Corporation Automatic Determination of Question in Text and Determination of Candidate Responses Using Data Mining
CN103955525A (en) * 2014-05-09 2014-07-30 北京奇虎科技有限公司 Method and client for searching answer to test question
CN105373594A (en) * 2015-10-23 2016-03-02 广东小天才科技有限公司 A method and device for screening repeated test questions in a question bank
CN105426390A (en) * 2015-10-23 2016-03-23 广东小天才科技有限公司 Test question searching method and system based on image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170503)