CN111666383A - Information processing method, information processing device, electronic equipment and computer readable storage medium - Google Patents
- Publication number
- CN111666383A (application number CN202010622216.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- report file
- report
- valid information
- information
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3334—Selection or weighting of terms from queries, including natural language queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Accounting & Taxation (AREA)
- Development Economics (AREA)
- Finance (AREA)
- Strategic Management (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present application provides an information processing method and apparatus, an electronic device, and a computer-readable storage medium, and relates to the field of information processing. The method includes:
- for a search keyword, searching with a preset search engine to obtain at least one corresponding valid information group;
- determining the report file to which each valid information group belongs, and obtaining report file information for each report file;
- aggregating the valid information groups by the report file to which they belong, to obtain the aggregated report files and the valid information corresponding to each;
- generating a content box for each report file to obtain at least one content box, where a content box contains the report file information of the report file and the corresponding valid information;
- displaying each content box.

The present application improves the hit rate of search keywords and reduces the screening users must do while browsing, thereby improving the user experience.
Description
Technical Field
The present application relates to the technical field of information processing, and in particular to an information processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Industry reports are a form of business information and competitive intelligence and are highly time-sensitive. They are generally produced from the latest statistical and survey data published by national government agencies and professional market research organizations. Industry veterans analyze and study this data using the professional research models and specific analysis methods of partner institutions. The result is research analysis and forecasts of the current industry and market.
In the prior art, an industry report search matches the keywords entered by the user against the titles of the charts in each report. It then extracts the relevant visual chart content from inside the report and displays it as a waterfall flow on the search result page, as shown in Figure 1.
However, this search approach has the following disadvantages:
1) Matching keywords against the titles of visual charts requires the report content to be highly structured and standardized. This works well for brokerage reports, whose content structure is relatively simple. Institutional reports and other types of reports, however, have diverse and complex content formats and suffer from a low hit rate;
2) The content matched by the search keywords is displayed as a waterfall flow, and each item is independent of the others on the search result page. When the matched content is ordered chaotically, the user has to screen it, which makes for a poor user experience.
Summary
The present application provides an information processing method and apparatus, an electronic device, and a computer-readable storage medium. They address two problems in industry report search: a low hit rate, and users having to screen the results. The technical solution is as follows:
In a first aspect, an information processing method is provided, the method comprising:
for a search keyword, searching to obtain at least one valid information group corresponding to the search keyword;
determining the report file to which each valid information group belongs, and obtaining the report file information of each report file;
aggregating the valid information groups by the report file to which they belong, to obtain the aggregated valid information groups corresponding to each report file;
generating a content box for each report file to obtain at least one content box, where the content box includes the report file information of the report file and the corresponding valid information groups; and
displaying each of the at least one content box.
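The aggregation and content-box steps above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation. The `ValidInfoGroup` and `ContentBox` fields, the `report_id` key, and the `report_info_lookup` callback are hypothetical. The hits are assumed to arrive already sorted by relevance.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidInfoGroup:
    # One search hit: a chart/data page extracted from a report (hypothetical fields)
    report_id: str
    title: str
    keywords: List[str]
    image_uri: str


@dataclass
class ContentBox:
    report_info: dict  # e.g. report name, publisher, publication date
    groups: List[ValidInfoGroup] = field(default_factory=list)


def build_content_boxes(hits, report_info_lookup):
    """Aggregate search hits by their report: one content box per report file.

    `hits` is assumed to be sorted by relevance. Boxes are ordered by each
    report's best (first) hit, and groups inside a box keep their hit order.
    """
    by_report: Dict[str, List[ValidInfoGroup]] = {}  # dicts keep insertion order
    for hit in hits:
        by_report.setdefault(hit.report_id, []).append(hit)
    return [
        ContentBox(report_info=report_info_lookup(rid), groups=groups)
        for rid, groups in by_report.items()
    ]
```

Ordering boxes by each report's best hit is one plausible choice. Ranking boxes by the number of matching groups per report would be an equally reasonable alternative.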
Preferably, each of the at least one valid information group includes a valid information image, a valid information title, and valid information keywords;
the method further includes:
when a display instruction for any one of the at least one content box is received, obtaining the valid information titles in the valid information groups corresponding to that content box; and
displaying, through a preset report content reader, the valid information titles together with the valid information group corresponding to the currently selected title.
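One way to model the report content reader is as a title list plus a selection index. The sketch below assumes each valid information group is a dict with a `title` key. `ReportContentReader` and its methods are hypothetical names, since the source specifies only the behaviour and not an interface.

```python
class ReportContentReader:
    """Hypothetical model of the report content reader behind one content box.

    `groups` are the valid information groups of the box, as dicts with at
    least a "title" key; the first title is selected when the reader opens.
    """

    def __init__(self, groups):
        if not groups:
            raise ValueError("a content box needs at least one valid information group")
        self.groups = list(groups)
        self.selected = 0

    def titles(self):
        # Title list shown alongside the displayed content (e.g. as a side menu)
        return [g["title"] for g in self.groups]

    def select(self, index):
        # Switch the displayed group when the user picks another title
        if not 0 <= index < len(self.groups):
            raise IndexError(f"no valid information group at position {index}")
        self.selected = index
        return self.current()

    def current(self):
        return self.groups[self.selected]
```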
Preferably, the report content reader further provides at least one interaction instruction for the currently displayed valid information group;
the method further includes:
when any one of the at least one interaction instruction is triggered, performing the interaction action corresponding to that instruction on the currently displayed valid information group.
Preferably, the interaction instruction includes an excerpt instruction;
performing, when any interaction instruction is triggered, the interaction action corresponding to that instruction on the currently displayed valid information group includes:
when the excerpt instruction is triggered, determining whether any notebook has already been created in the preset favorites;
if so, displaying a list of the existing notebooks and, when a confirmation instruction for any notebook in the list is received, copying the currently displayed valid information group into that notebook;
if not, displaying the preset notebook creation interface, creating a new notebook through that interface, and copying the currently displayed valid information group into the new notebook.
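The excerpt branch can be sketched as follows, assuming an in-memory favorites store keyed by notebook name. `Favorites`, `excerpt`, and the two callbacks are hypothetical. The callbacks stand in for the notebook list and the notebook creation interface. A real client would likely persist notebooks on a server.

```python
import copy


class Favorites:
    """In-memory stand-in for the preset favorites (hypothetical)."""

    def __init__(self):
        self.notebooks = {}  # notebook name -> list of excerpted valid information groups


def excerpt(favorites, group, choose_notebook, create_notebook):
    """Copy the displayed group into a notebook, following the two branches above.

    choose_notebook(names) -> name the user confirms from the notebook list
    create_notebook() -> name entered in the notebook creation interface
    """
    if favorites.notebooks:
        name = choose_notebook(sorted(favorites.notebooks))
    else:
        name = create_notebook()
        favorites.notebooks[name] = []
    # Store a copy, so later edits in the notebook cannot alter the search result
    favorites.notebooks[name].append(copy.deepcopy(group))
    return name
```

The group is deep-copied as a design assumption. The source says only that the group is copied into the notebook.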
Preferably, the method further includes:
when a display instruction for any of the notebooks already created in the preset favorites is received, displaying the valid information groups in that notebook through the report content reader.
Preferably, searching to obtain at least one valid information group corresponding to the search keyword includes:
performing query analysis on the search keyword to obtain analyzed keywords;
assembling the analyzed keywords according to the Elasticsearch Query DSL syntax to obtain a query statement for valid information groups, the query statement including a keyword field and a title field; and
querying the index in the preset search engine with the query statement to obtain at least one valid information group matching the search keyword.
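The query analysis and DSL assembly steps can be sketched as code that produces an Elasticsearch request body as a plain Python dict. Several parts are assumptions: the stop-word list, the field names `title` and `keywords`, the title boost, and the regex tokenizer. Chinese queries would need a proper word segmenter, because `\w+` treats an unbroken run of Chinese characters as a single token. The `bool`/`should`, `match`, and `terms` clauses are standard Query DSL.

```python
import re

STOP_WORDS = {"the", "of", "and", "a", "report"}  # hypothetical stop-word list


def analyze_query(raw):
    """Toy query analysis: lowercase, tokenize, drop stop words, dedupe in order."""
    seen, terms = set(), []
    for token in re.findall(r"\w+", raw.lower()):
        if token not in STOP_WORDS and token not in seen:
            seen.add(token)
            terms.append(token)
    return terms


def build_es_query(terms, size=50):
    """Assemble a Query DSL body that searches both the title and keyword fields."""
    return {
        "size": size,
        "query": {
            "bool": {
                "should": [
                    # Full-text match on the title, weighted above keyword hits (assumed boost)
                    {"match": {"title": {"query": " ".join(terms), "boost": 2.0}}},
                    # Exact match against the extracted keywords
                    {"terms": {"keywords": terms}},
                ],
                "minimum_should_match": 1,
            }
        },
    }
```

The resulting dict can be serialized with `json.dumps` and sent as the body of a `_search` request against the index.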
Preferably, the preset search engine is generated as follows:
when a data update to any valid information group stored in the preset valid information database is detected, obtaining the valid information title and valid information keywords of the updated group, where a data update includes at least one of adding, deleting, and modifying a valid information group; and
generating an index from the valid information title and valid information keywords, and establishing a mapping between the title, the keywords, and the index, where the index includes a title field and a keyword field.
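Keeping the index in step with the valid information database can be sketched as translating change events into an Elasticsearch `_bulk` request body (newline-delimited JSON). The change-event shape and the index name `valid_info` are hypothetical. Additions and modifications both map to the `index` action, which creates or fully replaces a document. Deletions map to `delete`.

```python
import json

INDEX = "valid_info"  # hypothetical index name


def to_bulk_body(changes):
    """Translate database change events into an Elasticsearch _bulk NDJSON body.

    Each change is a dict such as
    {"op": "add" | "modify" | "delete", "id": ..., "title": ..., "keywords": [...]}.
    """
    lines = []
    for change in changes:
        meta = {"_index": INDEX, "_id": str(change["id"])}
        if change["op"] == "delete":
            lines.append(json.dumps({"delete": meta}))
        elif change["op"] in ("add", "modify"):
            lines.append(json.dumps({"index": meta}))
            doc = {"title": change["title"], "keywords": change["keywords"]}
            lines.append(json.dumps(doc, ensure_ascii=False))
        else:
            raise ValueError(f"unknown operation: {change['op']}")
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline
```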
Preferably, the preset valid information database is generated as follows:
obtaining a report file;
splitting the report file page by page into images, to obtain at least one report file image;
performing text block recognition on each report file image, to obtain at least one text block for each image;
taking each report file image whose text blocks meet preset requirements as a valid information image, to obtain at least one valid information image;
extracting the valid information title and valid information keywords of each valid information image, and establishing an association between each valid information image and its title and keywords; and
storing each valid information image, its valid information title and valid information keywords, and the associations in the valid information database.
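The database build steps can be sketched as a loop over page images. The source does not name concrete tools, so page rendering, OCR, validity filtering, and title/keyword extraction are passed in as callbacks. `WordBlock`, `build_valid_info_db`, and the record layout are hypothetical. Storing the image, title, and keywords in one record is one simple way to keep their association.

```python
from dataclasses import dataclass


@dataclass
class WordBlock:
    text: str
    height: float  # bounding-box height in pixels, as reported by the OCR engine (assumed)


def build_valid_info_db(report_id, page_images, ocr, is_valid,
                        extract_title, extract_keywords, db):
    """Pages -> OCR text blocks -> validity filter -> title/keywords -> stored record.

    page_images: one image per page (the page-by-page split is already done)
    ocr(image) -> list of WordBlock; is_valid(blocks) -> bool
    """
    for page_no, image in enumerate(page_images, start=1):
        blocks = ocr(image)
        if not is_valid(blocks):
            continue  # not a data page, e.g. a cover or table of contents
        db.append({
            "report_id": report_id,
            "page": page_no,
            "image": image,
            "title": extract_title(blocks),
            "keywords": extract_keywords(blocks),
        })
    return db
```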
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the number of numeric text blocks in each report file image exceeds a first number threshold;
if so, taking each report file image that exceeds the first number threshold as a valid information image, to obtain at least one valid information image.
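The numeric-block count rule can be sketched as follows. The source does not define what counts as a "numeric" text block. The regular expression below and the default threshold of 10 are assumptions. The regex accepts plain integers, numbers with thousands separators, decimals, and percentages.

```python
import re

# Assumed definition of a numeric block: 2020, 1,234, 3.5, 12%, -7
NUMERIC = re.compile(r"[-+]?\d[\d,]*(\.\d+)?%?")


def is_numeric_block(text):
    return NUMERIC.fullmatch(text.strip()) is not None


def passes_count_rule(block_texts, first_threshold=10):
    """Keep a page when its number of numeric text blocks exceeds the first threshold."""
    return sum(is_numeric_block(t) for t in block_texts) > first_threshold
```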
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the ratio of the number of numeric text blocks in each report file image to the total number of text blocks in that image exceeds a ratio threshold;
if so, taking each report file image whose ratio exceeds the ratio threshold as a valid information image, to obtain at least one valid information image.
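The ratio rule differs from the count rule only in normalizing by the total number of text blocks. This should favour dense data pages over long prose pages that merely happen to contain many numbers. The sketch below uses the same assumed definition of a numeric block and an assumed default threshold of 0.3.

```python
import re

NUMERIC = re.compile(r"[-+]?\d[\d,]*(\.\d+)?%?")  # assumed definition of a numeric block


def passes_ratio_rule(block_texts, ratio_threshold=0.3):
    """Keep a page when (numeric blocks / all blocks) exceeds the ratio threshold."""
    if not block_texts:
        return False  # a page with no recognized text cannot be a data page
    numeric = sum(NUMERIC.fullmatch(t.strip()) is not None for t in block_texts)
    return numeric / len(block_texts) > ratio_threshold
```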
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
obtaining the height of each text block in each report file image, and determining a preset number of target text blocks with the greatest heights;
detecting whether the target text blocks in each report file image include a Chinese text block;
if so, detecting whether the number of Chinese characters in the target text blocks that contain Chinese exceeds a third number threshold;
if so, taking each such report file image, whose target text blocks contain Chinese text blocks, as a valid information image, to obtain at least one valid information image.
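The heading rule can be sketched as follows. The tallest text blocks are treated as title candidates, and a page is kept only if those candidates contain enough Chinese characters. Several details are assumptions:
- the default `top_n` and threshold values;
- detecting Chinese with the basic CJK Unified Ideographs range (U+4E00 to U+9FFF) only;
- counting characters across all Chinese target blocks, since the source does not say whether the threshold applies per block or in total.

```python
import re
from dataclasses import dataclass

CJK = re.compile(r"[\u4e00-\u9fff]")  # basic CJK Unified Ideographs block only


@dataclass
class WordBlock:
    text: str
    height: float  # bounding-box height from OCR (assumed unit: pixels)


def passes_heading_rule(blocks, top_n=3, third_threshold=4):
    """Tallest blocks act as heading candidates; require enough Chinese characters in them."""
    targets = sorted(blocks, key=lambda b: b.height, reverse=True)[:top_n]
    chinese_targets = [b for b in targets if CJK.search(b.text)]
    if not chinese_targets:
        return False
    chinese_chars = sum(len(CJK.findall(b.text)) for b in chinese_targets)
    return chinese_chars > third_threshold
```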
In a second aspect, an information processing apparatus is provided, the apparatus comprising:
a search module, configured to search, for a search keyword, to obtain at least one valid information group corresponding to the search keyword;
a processing module, configured to determine the report file to which each valid information group belongs and to obtain the report file information of each report file;
an aggregation module, configured to aggregate the valid information groups by the report file to which they belong, to obtain the aggregated valid information groups corresponding to each report file;
a generation module, configured to generate a content box for each report file to obtain at least one content box, where the content box includes the report file information of the report file and the corresponding valid information groups; and
a display module, configured to display each of the at least one content box.
Preferably, each of the at least one valid information group includes a valid information image, a valid information title, and valid information keywords;
the apparatus further includes:
a receiving module, configured to receive a display instruction for any one of the at least one content box;
an acquisition module, configured to obtain the valid information titles in the valid information groups corresponding to that content box;
the display module is further configured to display, through a preset report content reader, the valid information titles together with the valid information group corresponding to the currently selected title.
Preferably, the report content reader further provides at least one interaction instruction for the currently displayed valid information group;
the apparatus further includes:
an execution module, configured to perform, when any one of the at least one interaction instruction is triggered, the interaction action corresponding to that instruction on the currently displayed valid information group.
Preferably, the interaction instruction includes an excerpt instruction;
the execution module is specifically configured to:
when the excerpt instruction is triggered, determine whether any notebook has already been created in the preset favorites;
if so, display a list of the existing notebooks and, when a confirmation instruction for any notebook in the list is received, copy the currently displayed valid information group into that notebook;
if not, display the preset notebook creation interface, create a new notebook through that interface, and copy the currently displayed valid information group into the new notebook.
Preferably, the receiving module is further configured to receive a display instruction for any of the notebooks already created in the preset favorites;
the display module is further configured to display the valid information groups in that notebook through the report content reader.
Preferably, the search module includes:
an analysis submodule, configured to perform query analysis on the search keyword to obtain analyzed keywords;
a statement assembly submodule, configured to assemble the analyzed keywords according to the Elasticsearch Query DSL syntax to obtain a query statement for valid information groups, the query statement including a keyword field and a title field; and
a query submodule, configured to query the index in the preset search engine with the query statement to obtain at least one valid information group matching the search keyword.
Preferably, the preset search engine is generated as follows:
when a data update to any valid information group stored in the preset valid information database is detected, obtaining the valid information title and valid information keywords of the updated group, where a data update includes at least one of adding, deleting, and modifying a valid information group; and
generating an index from the valid information title and valid information keywords, and establishing a mapping between the title, the keywords, and the index, where the index includes a title field and a keyword field.
Preferably, the preset valid information database is generated as follows:
obtaining a report file;
splitting the report file page by page into images, to obtain at least one report file image;
performing text block recognition on each report file image, to obtain at least one text block for each image;
taking each report file image whose text blocks meet preset requirements as a valid information image, to obtain at least one valid information image;
extracting the valid information title and valid information keywords of each valid information image, and establishing an association between each valid information image and its title and keywords; and
storing each valid information image, its valid information title and valid information keywords, and the associations in the valid information database.
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the number of numeric text blocks in each report file image exceeds a first number threshold;
if so, taking each report file image that exceeds the first number threshold as a valid information image, to obtain at least one valid information image.
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the ratio of the number of numeric text blocks in each report file image to the total number of text blocks in that image exceeds a ratio threshold;
if so, taking each report file image whose ratio exceeds the ratio threshold as a valid information image, to obtain at least one valid information image.
Preferably, taking each report file image whose text blocks meet the preset requirements as a valid information image, to obtain at least one valid information image, includes:
obtaining the height of each text block in each report file image, and determining a preset number of target text blocks with the greatest heights;
detecting whether the target text blocks in each report file image include a Chinese text block;
if so, detecting whether the number of Chinese characters in the target text blocks that contain Chinese exceeds a third number threshold;
if so, taking each such report file image, whose target text blocks contain Chinese text blocks, as a valid information image, to obtain at least one valid information image.
In a third aspect, an electronic device is provided, the electronic device comprising:
a processor, a memory, and a bus;
the bus is configured to connect the processor and the memory;
the memory is configured to store operation instructions;
the processor is configured to invoke the operation instructions, the executable instructions causing the processor to perform the operations corresponding to the information processing method of the first aspect of the present application.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program implements the information processing method of the first aspect of the present application.
The technical solution provided by the present application has the following beneficial effects:
The solution works as follows:
- For a search keyword, at least one valid information group corresponding to the keyword is found.
- The report file to which each group belongs is determined, and the report file information of each report file is obtained.
- The groups are aggregated by report file, giving the aggregated valid information groups for each report file.
- A content box is generated for each report file, containing the report file information and the corresponding valid information groups.
- Each content box is displayed.

In this way, the embodiments of the present invention identify the content of all reports comprehensively against the search keyword, including but not limited to the titles of visual charts. The prior art matches only visual chart titles, which gives a low keyword hit rate on institutional reports and other reports with diverse, complex content formats. The embodiments of the present invention place lower demands on how standardized the report content is and are compatible with more types of report content, thereby improving the hit rate of search keywords.

At the same time, comprehensive identification yields the valid information groups that match the search keyword across different report files. These groups are then displayed aggregated by report file, so that multiple matching groups from the same report file are presented together as related content. This reduces the screening users must do while browsing, thereby improving the user experience.
Description of Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below.
Figure 1 is a schematic diagram of a search result page for an industry report search in the prior art;
Figure 2 is a schematic flowchart of an information processing method provided by an embodiment of the present application;
Figure 3 is a schematic flowchart of an information processing method provided by another embodiment of the present application;
Figure 4 is a schematic diagram of the content box interface in the present application;
Figure 5 is a schematic diagram of a search result page for an industry report search in the present application;
Figures 6A and 6B are the first and second schematic diagrams of the report content reader interface in the present application;
Figure 7 is the third schematic diagram of the report content reader interface in the present application;
Figures 8A and 8B are schematic diagrams of selecting an existing notebook for an excerpt in the present application;
Figure 9 is a schematic diagram of creating a new notebook for an excerpt in the present application;
Figure 10 is a schematic flowchart of the excerpt process in the present application;
Figure 11 is a schematic diagram of the favorites interface in the present application;
Figure 12 is a schematic diagram of browsing an excerpt with the report content reader in the present application;
Figure 13 is a schematic diagram of the keyword-based search process in the present application;
Figure 14 is a schematic diagram of data processing in the ES (Elasticsearch) search engine in the present application;
Figure 15 is a schematic diagram of OCR results in the present application;
Figure 16 is a schematic flowchart of extracting valid information images in the present application;
Figure 17 is a schematic structural diagram of an information processing apparatus provided by yet another embodiment of the present application;
Figure 18 is a schematic structural diagram of an electronic device for information processing provided by yet another embodiment of the present application.
Detailed Description
The embodiments of this application are described in detail below, and examples of them are shown in the accompanying drawings, in which the same or similar reference numerals denote, throughout, the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain this application, and are not to be construed as limiting the present invention.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of this application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. It should be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. Furthermore, "connected" or "coupled" as used herein may include a wireless connection or a wireless coupling. As used herein, the term "and/or" includes all of, any one of, and all combinations of one or more of the associated listed items.
To make the objectives, technical solutions and advantages of this application clearer, the embodiments of this application are further described in detail below with reference to the accompanying drawings.
First, several terms used in this application are introduced and explained:
Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied under the cloud computing business model. It can form resource pools that are used on demand, flexibly and conveniently, and cloud computing technology will become an important support for it. The back-end services of technical network systems, such as video websites, image websites and other portal websites, require large amounts of computing and storage resources. As the Internet industry develops and its applications spread, every item may in future carry its own identification mark that must be transmitted to a back-end system for logical processing; data of different levels will be processed separately, and industry data of all kinds needs strong system backing, which can only be provided through cloud computing.
A database can be regarded, in short, as an electronic filing cabinet: a place where electronic files are stored, in which users can add, query, update and delete the data in those files. A "database" is a collection of data that is stored together in a certain way, can be shared by multiple users, has as little redundancy as possible, and is independent of applications.
A database management system (DBMS) is computer software designed to manage databases, and generally provides basic functions such as storage, retrieval, security and backup. DBMSs can be classified by the database model they support, such as relational or XML (Extensible Markup Language); by the type of computer they support, such as server clusters or mobile phones; by the query language they use, such as SQL (Structured Query Language) or XQuery; by their performance emphasis, such as maximum scale or highest running speed; or in other ways. Whatever classification is used, some DBMSs span categories, for example by supporting several query languages at the same time.
Cloud storage is a concept extended and developed from cloud computing. A distributed cloud storage system (hereinafter, the storage system) is a storage system that, through functions such as cluster applications, grid technology and distributed file systems, uses application software or application interfaces to make a large number of storage devices of various types in a network (storage devices are also called storage nodes) work together, jointly providing data storage and service access functions to the outside.
At present, a storage system stores data as follows. Logical volumes are created, and each logical volume is allocated physical storage space when it is created; this space may consist of the disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on the file system. The file system divides the data into many parts, each of which is an object; an object contains not only data but also additional information such as a data identifier (ID, identity). The file system writes each object to the physical storage space of the logical volume and records the storage location of each object, so that when the client requests access to the data, the file system can serve it according to the recorded storage location of each object.
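A minimal in-memory sketch of the object-based storage described above, assuming a single logical volume backed by a `bytearray` and fixed-size objects. The class and method names, the 4-byte object size and the use of random UUIDs as object IDs are all illustrative assumptions, not details from this application.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class ObjectRecord:
    object_id: str
    offset: int  # where the object's bytes start inside the volume
    length: int


@dataclass
class LogicalVolume:
    capacity: int  # bytes of physical space allocated when the volume is created
    space: bytearray = field(init=False)
    next_free: int = 0
    locations: dict = field(default_factory=dict)  # object_id -> ObjectRecord

    def __post_init__(self):
        self.space = bytearray(self.capacity)

    def write(self, data: bytes, object_size: int = 4) -> list[str]:
        """Split data into objects, write each one, and record its location."""
        ids = []
        for start in range(0, len(data), object_size):
            chunk = data[start:start + object_size]
            if self.next_free + len(chunk) > self.capacity:
                raise OSError("logical volume is full")
            oid = uuid.uuid4().hex  # hypothetical data identifier
            self.space[self.next_free:self.next_free + len(chunk)] = chunk
            self.locations[oid] = ObjectRecord(oid, self.next_free, len(chunk))
            self.next_free += len(chunk)
            ids.append(oid)
        return ids

    def read(self, ids: list[str]) -> bytes:
        """Reassemble the data using the recorded location of each object."""
        out = bytearray()
        for oid in ids:
            rec = self.locations[oid]
            out += self.space[rec.offset:rec.offset + rec.length]
        return bytes(out)
```

For example, writing `b"hello world"` to `LogicalVolume(64)` yields three objects, and reading those IDs back returns the original bytes.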
The storage system allocates physical storage space to a logical volume as follows: based on an estimate of the capacity of the objects to be stored in the logical volume (this estimate usually leaves a large margin over the capacity actually needed) and on the group of the redundant array of independent disks (RAID), the physical storage space is divided into stripes in advance, and a logical volume can be understood as one stripe; in this way, physical storage space is allocated to the logical volume.
In this application, the information processing method may be executed on a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker or smart watch. The terminal and the server may be connected directly or indirectly through wired or wireless communication, which is not limited in this application.
Further, the user can interact with the server through the terminal to make service requests. The terminal may have the following characteristics:
(1) In hardware, the device has a central processing unit, memory, input components and output components; that is, it is often a microcomputer device with communication functions. It may also support multiple input methods, such as a keyboard, mouse, touch screen, microphone and camera, which can be adjusted as needed. Likewise, it often has multiple output methods, such as a receiver and a display screen, which can also be adjusted as needed;
(2) In software, the device must have an operating system, such as Windows Mobile, Symbian, Palm, Android or iOS. These operating systems are increasingly open, and personalized applications developed on these open platforms, such as address books, calendars, notepads, calculators and all kinds of games, keep emerging, largely meeting the needs of individual users;
(3) In communication capability, the device has flexible access modes and high-bandwidth communication performance, and can automatically adjust the communication mode according to the selected service and its environment, making it convenient to use. The device may support GSM (Global System for Mobile Communications), WCDMA (Wideband Code Division Multiple Access), CDMA2000 (Code Division Multiple Access 2000), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), Wi-Fi (Wireless Fidelity) and WiMAX (Worldwide Interoperability for Microwave Access), so as to adapt to networks of various standards and support not only voice services but also a variety of wireless data services;
(4) In functionality, the device places more emphasis on user-friendliness, personalization and versatility. With the development of computer technology, devices have moved from a "device-centered" model to a "people-centered" model, integrating embedded computing, control technology, artificial intelligence and biometric authentication, fully reflecting a people-oriented approach. Thanks to advances in software, the device can adjust its settings to individual needs and become more personalized. At the same time, the device integrates a great deal of software and hardware, and its functions are increasingly powerful.
The information processing method, apparatus, electronic device and computer-readable storage medium provided by this application are intended to solve the above technical problems in the prior art.
The technical solutions of this application, and how they solve the above technical problems, are described in detail below through specific embodiments. These embodiments may be combined with one another, and identical or similar concepts or processes may not be repeated in some of them. The embodiments of this application are described below with reference to the accompanying drawings.
In one embodiment, an information processing method is provided. As shown in Fig. 2, the method includes:
Step S201: for a search keyword, search for at least one valid information group corresponding to the search keyword;
In this embodiment of the present invention, an application for browsing reports may be installed on the terminal. The application may include a search interface with a search bar; the user enters a search keyword in the search bar and starts a search, and the application obtains the corresponding search results and displays them to the user. In this embodiment a report may be any file containing relevant information. Taking industry reports as an example, an industry report may be an industry analysis report, an industry research report, an industry data report, and so on; for example, "OTA Industry - Guosheng Securities_Broker Report" is an industry report.
Further, the search result may be at least one valid information group corresponding to the search keyword. A valid information group may include at least one piece of valid information corresponding to the search keyword. In some embodiments, valid information refers to content in an industry report that matches the user's search intent and can directly provide high-value reference information for the user's research. Under the report-writing standards and habits of institutions and teams currently on the market, high-value information in a report generally appears in its charts.
Valid information includes, but is not limited to, the valid information images, valid information titles and valid information keywords in a report. A valid information image is the image of an entire report page that contains high-value valid information; a valid information title is the title of a valid information image; and valid information keywords are the keywords contained in a valid information image and/or the keywords around it. In some embodiments, a valid information group may include at least one piece of valid information that corresponds to the search keyword and belongs to the same report file; this may be any one or more of the valid information image, valid information title and valid information keywords corresponding to the search keyword.
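One possible in-memory shape for a valid information group, as a minimal sketch. The application does not give a schema, so the field names, the choice to reference the page image by URL, and the "at least one item" check are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ValidInfoGroup:
    report_id: str                     # report file the group belongs to
    image_url: Optional[str] = None    # whole-page image containing the chart (hypothetical field)
    title: Optional[str] = None        # title of that image
    keywords: list[str] = field(default_factory=list)  # keywords in or around the image

    def __post_init__(self):
        # A group carries at least one piece of valid information.
        if self.image_url is None and self.title is None and not self.keywords:
            raise ValueError("a valid information group needs at least one item")
```

Making every item optional reflects the text's "any one or more of" the image, title and keywords.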
Step S202: determine the report file to which each valid information group belongs, and obtain the report file information of each report file;
Specifically, the valid information groups may be stored in a valid information database. When the user searches with a search keyword, a preset search engine queries the preset valid information database for at least one matching valid information group, and the groups returned may belong to different report files. For example, the query returns three valid information groups, a, b and c, where groups a and b belong to report file A and group c belongs to report file B. A report file may be one industry report; for example, the industry report "OTA Industry - Guosheng Securities_Broker Report" is one report file.
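The query against the valid information database can be pictured with a naive stand-in: a case-insensitive substring match on titles and keywords over an in-memory list. This is only an illustrative assumption; the actual preset search engine and its matching and ranking rules are not specified here, and the data below is made up.

```python
from dataclasses import dataclass, field


@dataclass
class ValidInfoGroup:
    group_id: str
    report_id: str
    title: str
    keywords: list[str] = field(default_factory=list)


def search_groups(db: list[ValidInfoGroup], keyword: str) -> list[ValidInfoGroup]:
    """Return the groups whose title or keywords contain the search keyword."""
    kw = keyword.lower()
    return [
        g for g in db
        if kw in g.title.lower() or any(kw in k.lower() for k in g.keywords)
    ]


db = [
    ValidInfoGroup("a", "A", "OTA commission rates"),
    ValidInfoGroup("b", "A", "Hotel market", ["OTA"]),
    ValidInfoGroup("c", "B", "OTA market share"),
    ValidInfoGroup("d", "B", "Airline revenue"),
]
hits = search_groups(db, "ota")  # groups a, b and c; a and b share report A
```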
Therefore, once the valid information groups have been found, the report file to which each group belongs can be determined, and the report file information of each report file obtained. Report file information includes, but is not limited to, the report's ID, creator, tags, introduction, abstract, creation time and industry type.
Step S203: aggregate the valid information groups by the report file to which they belong, obtaining the aggregated valid information groups corresponding to each report file;
Once the report file of each valid information group has been determined, the groups can be aggregated by report file, which gives the aggregated valid information groups of each report file. For example, aggregating groups a, b and c from the previous example shows that report file A corresponds to groups a and b, and report file B corresponds to group c.
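Aggregating by report file is a group-by on the report ID. A minimal sketch, assuming each search hit arrives as a hypothetical `(group_id, report_id)` pair and that the first-appearance order of reports should be preserved:

```python
from collections import defaultdict


def aggregate_by_report(hits: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (group_id, report_id) search hits by report_id.

    Reports keep the order in which they first appear in the hits, and the
    groups keep their hit order within each report.
    """
    by_report: dict[str, list[str]] = defaultdict(list)
    for group_id, report_id in hits:
        by_report[report_id].append(group_id)
    return dict(by_report)
```

With the example above, `aggregate_by_report([("a", "A"), ("c", "B"), ("b", "A")])` returns `{"A": ["a", "b"], "B": ["c"]}`.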
Step S204: generate a content box for each report file to obtain at least one content box; a content box includes the report file information of the report file and its corresponding valid information groups;
Specifically, a content box is generated for each report file and its corresponding valid information groups, yielding as many content boxes as there are report files; each content box includes the report file information of the report file and the corresponding valid information groups.
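Building one content box per report file can then be sketched as a mapping over the aggregated result. The `ContentBox` type and the dictionary holding report file information are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContentBox:
    report_info: dict   # e.g. ID, creator, tags, abstract, creation time, industry type
    groups: list[str]   # IDs of the aggregated valid information groups


def build_content_boxes(aggregated: dict[str, list[str]],
                        report_info: dict[str, dict]) -> list[ContentBox]:
    """Create exactly one content box per report file, in aggregation order."""
    return [ContentBox(report_info[rid], gids) for rid, gids in aggregated.items()]
```

Because the boxes come from iterating over the aggregated mapping, the number of boxes always equals the number of report files.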
Step S205: display the at least one content box.
After the content boxes are obtained, each can be displayed separately in the application's interface. For example, content box 1 displays the report file information of report file A together with valid information groups a and b, and content box 2 displays the report file information of report file B together with valid information group c.
In this embodiment of the present invention, for a search keyword, at least one corresponding valid information group is found; the report file to which each group belongs is determined and its report file information obtained; the groups are aggregated by report file to obtain each report file's aggregated valid information groups; a content box containing the report file information and the corresponding valid information groups is generated for each report file; and the content boxes are displayed. In this way, the embodiment can comprehensively identify the contents of all reports against the search keyword, including but not limited to the titles of visual charts. The prior art is limited to recognizing visual chart titles, so for institutional and other reports whose content is diverse in format and highly complex, search keywords have a low hit rate. The embodiment of the present invention places lower demands on how standardized the report content is and is compatible with more types of report content, which raises the hit rate of search keywords.
At the same time, the valid information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and then displayed aggregated by report file, so that the multiple matching groups within one report file are linked to each other. This reduces the screening users must do while browsing and thereby improves the user experience.
In another embodiment, an information processing method is provided. As shown in Fig. 3, the method includes:
Step S301: for a search keyword, search for at least one corresponding valid information group;
In this embodiment of the present invention, an application for browsing industry reports may be installed on the terminal. The application may include a search interface with a search bar; the user enters a search keyword in the search bar and starts a search, and the application obtains the corresponding search results and displays them to the user. An industry report may be an industry analysis report, an industry research report, an industry data report, and so on; for example, "OTA Industry - Guosheng Securities_Broker Report" is an industry report.
Further, the search result may be at least one valid information group corresponding to the search keyword. Valid information refers to content in an industry report that matches the user's search intent and can directly provide high-value reference information for the user's research. Under the report-writing standards and habits of institutions and teams currently on the market, high-value information in a report generally appears in its charts.
A valid information group includes, but is not limited to, the valid information image, valid information title and valid information keywords in a report. A valid information image is the image of an entire report page that contains high-value valid information; a valid information title is the title of a valid information image; and valid information keywords are the keywords contained in a valid information image and/or the keywords around it.
When the application searches by the search keyword, it may call a preset valid information search interface, defined as follows:
Request method: GET
Request path: /api/search/modules
Request parameter: keyword, the search keyword.
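A client-side sketch of calling this interface with Python's standard library. Only the method, path and `keyword` parameter come from the text above; the host `https://reports.example.com` is a placeholder, and the response format is not described, so the sketch only builds the request.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://reports.example.com"  # hypothetical host, not from the source


def build_module_search_request(keyword: str) -> Request:
    """Build GET /api/search/modules?keyword=... for the valid information search."""
    query = urlencode({"keyword": keyword})  # percent-encodes non-ASCII keywords
    return Request(f"{API_BASE}/api/search/modules?{query}", method="GET")
```

Against a real server the request could be sent with `urllib.request.urlopen(req)`; parsing the reply would depend on a response schema that the document does not give.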
Step S302: determine the report file to which each valid information group belongs, and obtain the report file information of each report file;
Specifically, the valid information groups may be stored in a valid information database. When the user searches with a search keyword, a preset search engine queries the preset valid information database for at least one matching valid information group, and the groups returned may belong to different report files; one valid information group includes a valid information image, a valid information title and valid information keywords. For example, the query returns three valid information groups, a, b and c, where groups a and b belong to report file A and group c belongs to report file B. A report file may be one industry report; for example, the industry report "OTA Industry - Guosheng Securities_Broker Report" is one report file.
Therefore, once the valid information groups have been found, the report file to which each group belongs can be determined, and the report file information of each report file obtained. Report file information includes, but is not limited to, the report's ID, creator, tags, introduction, abstract, creation time and industry type.
Step S303: aggregate the valid information groups by the report file to which they belong, obtaining the aggregated valid information groups corresponding to each report file;
Once the report file of each valid information group has been determined, the groups can be aggregated by report file, which gives the aggregated valid information groups of each report file. For example, aggregating groups a, b and c from the previous example shows that report file A corresponds to groups a and b, and report file B corresponds to group c.
Step S304: generate a content box for each report file to obtain at least one content box; a content box includes the report file information of the report file and its corresponding valid information groups;
Specifically, a content box is generated for each report file and its corresponding valid information groups, yielding as many content boxes as there are report files; each content box includes the report file information of the report file and the corresponding valid information groups.
For example, as shown in Fig. 4, a content box may include two areas. The first area may show the report file information of the report file; clicking it jumps to the details page of that report file. The second area may show the valid information groups that match the search keyword; clicking any of them calls up the report content reader. In this way, the valid information groups belonging to the same report file are displayed together, which strengthens the association between them: readers obtain each report file's valid information groups simply by screening report files, avoiding the prior-art need to screen disordered search results.
It should be noted that content boxes in forms other than that shown in Fig. 4 are also applicable to this application. Moreover, when there are many valid information groups, the right side of the content box may display only a certain number of them, with all groups available in the report content reader; alternatively, a scroll bar may be provided on the right side of the content box so that all valid information groups can be displayed. In practice, the user may set the form and layout of the content box according to actual needs, and this application does not limit this.
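The "show only a certain number" option can be sketched as a simple truncation that also reports how many groups are left for the report content reader. The default limit of 3 is an arbitrary assumption; the text does not fix a number.

```python
def preview_groups(group_ids: list[str], max_shown: int = 3) -> tuple[list[str], int]:
    """Return the groups shown in the content box and how many are hidden.

    The hidden groups stay reachable through the report content reader.
    """
    if max_shown < 0:
        raise ValueError("max_shown must be non-negative")
    shown = group_ids[:max_shown]
    return shown, len(group_ids) - len(shown)
```

Returning the hidden count lets the interface render a hint such as "2 more" next to the visible groups.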
Step S305: display the at least one content box;
After the content boxes are obtained, each can be displayed in the application's interface. For example, content box 1 displays the report file information of report file A together with valid information groups a and b, and content box 2 displays the report file information of report file B together with valid information group c. As another example, searching for the keyword "OTA酒店代理抽佣" (OTA hotel agency commission) produces the content boxes shown in Fig. 5.
Step S306: when a display instruction for any one of the at least one content box is received, obtain the valid information titles in the valid information groups corresponding to that content box;
Specifically, when the user clicks any one of the content boxes, a display instruction for that content box is issued, and the valid information titles of the valid information groups in that content box are then obtained.
Step S307: through the preset report content reader, display the valid information titles, together with the valid information group corresponding to the currently selected title;
The preset report content reader displays the valid information titles and the valid information group corresponding to the currently selected title, as shown in Fig. 6A.
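The reader's core state, the list of titles from the clicked content box plus the currently selected group, can be sketched as follows. The class and its methods are hypothetical illustrations, not the application's component.

```python
from dataclasses import dataclass


@dataclass
class ValidInfoGroup:
    title: str
    image_url: str  # hypothetical reference to the valid information image


class ReportContentReader:
    """Holds the titles of one content box and the currently selected group."""

    def __init__(self, groups: list[ValidInfoGroup], selected: int = 0):
        if not groups:
            raise ValueError("a content box has at least one valid information group")
        if not 0 <= selected < len(groups):
            raise IndexError(selected)
        self.groups = groups
        self.selected = selected

    @property
    def titles(self) -> list[str]:
        return [g.title for g in self.groups]

    @property
    def current(self) -> ValidInfoGroup:
        return self.groups[self.selected]

    def select(self, index: int) -> ValidInfoGroup:
        """Switch to another title, e.g. when the user clicks it."""
        if not 0 <= index < len(self.groups):
            raise IndexError(index)
        self.selected = index
        return self.groups[index]
```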
The report content reader can be used to browse and manage all valid information groups, across different report files, that correspond to the current search keyword. As shown in FIG. 6B, the report content box may include four parts:
1) Search keyword information area
Displays the search keyword entered by the user on the current search result page.
2) Report and valid information title navigation area
Displays the report file to which the currently viewed content belongs, together with the other valid information titles belonging to the same report file. The user can switch between titles by clicking other valid information titles in the navigation area, or browse continuously by scrolling.
Further, after the last valid information title of a report file, the navigation area can automatically load the titles of the next report file, as shown in FIG. 7.
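The auto-loading behavior of the navigation area can be sketched as a lazy sequence. This is a minimal illustration under assumptions: `navigation_sequence` and the `load_titles` callback are hypothetical names, and the titles of the next report are requested only once the reader has moved past the previous report's last title.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def navigation_sequence(
    report_ids: Iterable[int],
    load_titles: Callable[[int], List[str]],
) -> Iterator[Tuple[int, str]]:
    """Yield (report_id, title) pairs in navigation order.

    Titles of the next report are loaded only after the last valid
    information title of the previous report has been consumed.
    """
    for report_id in report_ids:
        for title in load_titles(report_id):  # hypothetical loader
            yield report_id, title
```

Because the generator is lazy, scrolling past one report naturally triggers the load of the next one, without fetching every report's titles up front.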
3) Report content reading area
Displays the valid information image corresponding to the selected valid information title, that is, the detailed report content; the image can be zoomed in and out.
4) Report content operation area
Provides at least one interaction instruction for the currently displayed valid information report content; in this embodiment of the present invention, these include but are not limited to "Excerpt", "Download", and "Original".
The report content box can call a preset valid information acquisition interface to obtain the valid information. The interface is defined as follows:
Request method: GET
Request path: /api/report?modules=1
Request parameters: parameters related to the report file.
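As a hedged sketch, a client could build a request to this interface as follows. Only the method, the path, and the fixed `modules=1` parameter come from the text; the report-file parameters are unspecified, so `report_id` in the usage below is a hypothetical example, as is the function name.

```python
from typing import Dict, Tuple, Union
from urllib.parse import urlencode


def build_report_request(
    base_url: str, report_params: Dict[str, Union[str, int]]
) -> Tuple[str, str]:
    """Return the (method, URL) pair for the valid information acquisition interface."""
    query: Dict[str, Union[str, int]] = {"modules": 1}
    # Caller-supplied report parameters are appended; `modules` stays fixed at 1.
    query.update({k: v for k, v in report_params.items() if k != "modules"})
    return "GET", f"{base_url.rstrip('/')}/api/report?{urlencode(query)}"
```

For example, `build_report_request("https://example.com", {"report_id": 42})` yields a GET to `https://example.com/api/report?modules=1&report_id=42`.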
Step S308: when any one of the at least one interaction instruction is triggered, perform the interaction action corresponding to that instruction on the currently displayed valid information group.
Clicking "Excerpt" copies the valid information group into a notebook; clicking "Download" downloads the valid information image to the local device; clicking "Original" opens a new page window that displays the original report file to which the valid information image belongs, positioned at the page whose content matches the valid information image.
Performing, when any one of the at least one interaction instruction is triggered, the interaction action corresponding to that instruction on the currently displayed valid information group includes:
When the excerpt instruction is triggered, determining whether a generated notebook exists in the preset favorites;
If so, displaying a notebook list of the generated notebooks and, upon receiving a confirmation instruction for any notebook in the list, copying the currently displayed valid information group into that notebook;
If not, displaying a preset notebook creation interface, creating a new notebook through that interface, and copying the currently displayed valid information group into the new notebook.
Specifically, when the user clicks "Excerpt", it is determined whether the preset favorites contain a generated notebook, that is, a notebook the user has already created. If so, a notebook list containing all generated notebooks is displayed in a preset list window. When the user selects and confirms one of the notebooks, the currently displayed valid information group is copied into that notebook, and "Excerpt" in the report content operation area can then be changed to "Excerpted", as shown in FIGS. 8A to 8B.
If the favorites contain no generated notebook, a new notebook window is displayed directly. The user sets the notebook name in that window and, after confirming, the notebook is generated; "Excerpt" in the report content operation area can then be changed to "Excerpted", as shown in FIG. 9.
Further, the list window can also provide a "New Notebook" button. When the user clicks it, the new notebook window is displayed, as shown in FIG. 9; the user sets the notebook name there and, after confirming, the notebook is generated, at which point "Excerpt" in the report content operation area is changed to "Excerpted", as shown in FIG. 8B.
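The decision flow above can be sketched as a single function. This is a minimal model, not the actual UI code: the two callbacks are hypothetical stand-ins for the list window (returning the confirmed notebook name, or `None` when "New Notebook" is clicked) and for the new notebook window, and notebooks are modeled as a plain dictionary.

```python
from typing import Callable, Dict, List, Optional


def excerpt_to_notebook(
    group: dict,
    notebooks: Dict[str, List[dict]],
    choose_from_list: Callable[[List[str]], Optional[str]],
    create_notebook: Callable[[], str],
) -> str:
    """Excerpt `group` into a notebook and return that notebook's name."""
    name: Optional[str] = None
    if notebooks:                        # generated notebooks exist in favorites
        name = choose_from_list(sorted(notebooks))
    if name is None:                     # none exist, or "New Notebook" was clicked
        name = create_notebook()
        notebooks.setdefault(name, [])
    notebooks[name].append(dict(group))  # copy of the displayed group
    return name
```

After a successful return, the caller would switch the "Excerpt" label to "Excerpted".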
A notebook can be a container that records valid information groups and is used to manage and browse excerpted groups. Users can create, delete, and modify notebooks, and can excerpt valid information groups into notebooks for later viewing.
Referring to FIG. 10, the detailed excerpt steps can be as follows:
1) The user initiates a request to excerpt a valid information group and must select a notebook (either one of the generated notebooks or a newly created one);
2) The valid information group is fully cloned rather than referenced. Cloning ensures that the excerpt can still be viewed if the original valid information group is deleted. After cloning, the excerpt is a copy of the valid information group;
3) The excerpt is then associated with the selected notebook;
4) When the user wants to view the excerpt, they simply initiate a request to view the notebook's content.
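The difference between cloning and associating (steps 2 and 3) can be illustrated with a deep copy. In this sketch the dictionary `store` is a hypothetical stand-in for the valid information database, and `Notebook`/`excerpt_group` are illustrative names.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Notebook:
    title: str
    excerpts: List[dict] = field(default_factory=list)


def excerpt_group(store: Dict[int, dict], module_id: int, notebook: Notebook) -> dict:
    """Step 2: clone the valid information group; step 3: attach the clone."""
    clone = copy.deepcopy(store[module_id])  # full copy, not a reference
    notebook.excerpts.append(clone)
    return clone
```

Because the clone shares no nested objects with the original, later edits to or deletion of the stored group leave the excerpt intact.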
Further, the notebook interfaces can include:
1) Notebook list: GET /api/notebooks
2) Notebook details (with excerpt list): GET /api/notebooks/{$notebook_id}
3) Create notebook: POST /api/notebooks
Parameters: title (required), maximum length 255.
4) Update notebook: PUT/PATCH /api/notebooks/{$notebook_id}
Parameters: title (required), maximum length 255.
5) Delete notebook: DELETE /api/notebooks/{$notebook_id}
Parameters: force (optional), value 0 or 1, indicating whether to force deletion. If it is 0, the notebook is not deleted; if it is 1, the notebook is deleted together with the excerpts it contains (without warning that the notebook contains excerpts);
When a notebook that contains excerpts is being deleted, a prompt with a confirmation is generated, for example "This notebook contains excerpts. Delete it?", with the options "Yes" and "No". If the user clicks "Yes", force is 1; if the user clicks "No", force is 0.
6) Excerpt content: POST /api/notebooks/{$notebook_id}/excerpt
Parameters: report_module_id (required), the content module ID.
7) Delete excerpt: POST /api/notebooks/{$notebook_id}/unexcerpt
Parameters: report_module_id (required), the content module ID. Several excerpts in one notebook can be deleted at once by separating the IDs with commas, for example report_module_id=1,2,3.
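The seven notebook endpoints can be sketched as small request builders that also enforce the stated parameter rules (required title of at most 255 characters, force of 0 or 1, comma-separated module IDs). This is a hedged illustration: the function names are hypothetical, and whether parameters travel in the query string or the request body is not specified in the text, so each builder simply returns a `(method, path, params)` tuple.

```python
from typing import Iterable, Optional, Tuple

Request = Tuple[str, str, Optional[dict]]


def _title_param(title: str) -> dict:
    if not title or len(title) > 255:
        raise ValueError("title is required and must be at most 255 characters")
    return {"title": title}


def list_notebooks() -> Request:
    return ("GET", "/api/notebooks", None)


def get_notebook(notebook_id: int) -> Request:
    return ("GET", f"/api/notebooks/{notebook_id}", None)


def create_notebook(title: str) -> Request:
    return ("POST", "/api/notebooks", _title_param(title))


def update_notebook(notebook_id: int, title: str, method: str = "PUT") -> Request:
    if method not in ("PUT", "PATCH"):
        raise ValueError("update uses PUT or PATCH")
    return (method, f"/api/notebooks/{notebook_id}", _title_param(title))


def delete_notebook(notebook_id: int, user_confirmed: Optional[bool] = None) -> Request:
    # "Yes" in the confirmation prompt maps to force=1, "No" to force=0;
    # None omits the optional parameter entirely.
    params = None if user_confirmed is None else {"force": 1 if user_confirmed else 0}
    return ("DELETE", f"/api/notebooks/{notebook_id}", params)


def add_excerpt(notebook_id: int, report_module_id: int) -> Request:
    return ("POST", f"/api/notebooks/{notebook_id}/excerpt",
            {"report_module_id": report_module_id})


def remove_excerpts(notebook_id: int, module_ids: Iterable[int]) -> Request:
    ids = ",".join(str(i) for i in module_ids)
    if not ids:
        raise ValueError("at least one report_module_id is required")
    return ("POST", f"/api/notebooks/{notebook_id}/unexcerpt", {"report_module_id": ids})
```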
Step S309: when a display instruction for any of the generated notebooks in the preset favorites is received, display the valid information groups in that notebook through the report content reader.
Specifically, all excerpted content can be browsed and managed in one place in the favorites, as shown in FIG. 11. In this embodiment of the present invention, the report content reader is a highly reusable control: besides displaying valid information groups, it can be reused in similar scenarios. For example, content excerpted into a notebook can also be displayed with the report content reader control; as shown in FIG. 12, when the user clicks any generated notebook, the report content reader is opened to browse the excerpts.
In a preferred embodiment of the present invention, searching for at least one valid information group corresponding to the search keyword includes:
Performing Query analysis on the search keyword to obtain analyzed keywords;
Assembling the analyzed keywords based on the Elasticsearch Query DSL syntax to obtain a query statement for valid information groups, the query statement including a keyword field and a title field;
Querying the index in the preset search engine with the query statement to obtain at least one valid information group matching the search keyword.
Specifically, the application's search interface can offer two search modes: "Search content" and "Search reports". "Search reports" searches by report file name, that is, an ordinary search; "Search content" searches the content of the reports.
FIG. 13 is a schematic diagram of the keyword-based search flow in an embodiment of the present invention. When searching for a keyword entered by the user, it is first determined whether the search mode is "Search content". If not, an ordinary search ("Search reports") is performed and the ordinary search result page is obtained. If so, Query analysis is performed on the search keyword, including word segmentation and synonym expansion, to obtain the analyzed keywords; the analyzed keywords are then assembled using the Elasticsearch Query DSL syntax into a query statement for valid information groups, the query statement including a keyword field and a title field. The preset search engine then executes the query statement against its index to obtain at least one valid information group matching the search keyword, after which steps S201 to S205, or steps S302 to S305, are performed.
Chinese word segmentation and synonym expansion are performed on the search keyword to obtain the analyzed keywords. Chinese word segmentation can use the open-source Elasticsearch plugin IK Analysis for Elasticsearch, and synonym expansion can use a synonym lexicon compiled from experience.
The analyzed keywords are assembled based on the Elasticsearch Query DSL syntax into a query statement for valid information groups. The query statement searches the titles and keywords of the content modules, with weights assigned to the title and keyword fields. For example, for the query "市场调研" (market research), "调研" (research) has the synonyms "调查" (survey) and "研究" (study).
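A minimal sketch of the expansion and assembly steps follows. Several things here are assumptions rather than the patent's implementation: the tokens are taken as already segmented (for example by the IK analyzer), the one-entry thesaurus only mirrors the example above, and the boost values are hypothetical because the text says weights are set but gives no numbers. The `bool`, `match`, and `terms` clauses are standard Elasticsearch Query DSL; the resulting dictionary could be sent as the body of a `_search` request.

```python
from typing import Dict, List

# Hypothetical thesaurus mirroring the "调研" example in the text.
SYNONYMS: Dict[str, List[str]] = {"调研": ["调查", "研究"]}


def expand(tokens: List[str], synonyms: Dict[str, List[str]] = SYNONYMS) -> List[str]:
    """Append synonyms after each token, keeping order and dropping duplicates."""
    out: List[str] = []
    for tok in tokens:
        for term in [tok, *synonyms.get(tok, [])]:
            if term not in out:
                out.append(term)
    return out


def build_query(tokens: List[str], title_boost: float = 2.0,
                keyword_boost: float = 1.0) -> dict:
    """Assemble a Query DSL body that searches the title and keyword fields."""
    terms = expand(tokens)
    return {
        "query": {
            "bool": {
                "should": [
                    # Full-text match on the analyzed title field.
                    {"match": {"title": {"query": " ".join(terms), "boost": title_boost}}},
                    # Exact match on the keyword field.
                    {"terms": {"keyword": terms, "boost": keyword_boost}},
                ],
                "minimum_should_match": 1,
            }
        }
    }
```

For the example query, `build_query(["市场", "调研"])` searches for "市场", "调研", "调查", and "研究" in both fields, with title matches weighted more heavily.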
It should be noted that the word segmentation plugin is not limited to the one above and can be chosen according to actual requirements; likewise, the synonym lexicon can be obtained in ways other than the one described. Embodiments of the present invention do not limit either choice.
Further, the index in the search engine can also include the report ID, that is, the ID of the report file to which each valid information group belongs, so that when valid information groups are found, the report file to which each one belongs can also be determined.
In a preferred embodiment of the present invention, the index in the preset search engine is generated as follows:
When a data update is detected in any of the at least one valid information group stored in the preset valid information database, obtaining the valid information title and valid information keywords of the updated group, the data update including at least one of adding, deleting, and modifying a valid information group;
Generating an index based on the valid information title and valid information keywords, and establishing a mapping between the valid information title, the valid information keywords, and the index, the index including a title field and a keyword field.
The search engine can be ES (Elasticsearch), a distributed full-text search engine. ES is document-oriented, meaning it can store entire objects or documents. Beyond storing them, it indexes the content of each document so that the content can be searched. In ES, users index and search documents or objects rather than rows and columns of data.
Specifically, ES can obtain data from the valid information database through asynchronous scripts. As shown in FIG. 14, when a valid information group in the valid information database (MySQL) is updated, a data modification event is triggered; the event enters an event processing queue and waits for ES to process the corresponding valid information group. The data update includes at least one of adding, deleting, and modifying a valid information group. In this way, ES keeps its valid information groups updated from the valid information database in real time.
Further, when ES updates a valid information group, it must also update the index based on the updated group, including creating, modifying, and deleting the index. The index includes a title field and a keyword field, and the mappings between the index and the valid information group are determined; mappings tell ES how to handle newly added fields. The fields of a valid information group that need processing are title (the valid information title) and keyword (the valid information keyword). title is mapped as a text field, which is segmented and inverted-indexed during processing (segmentation can use the IK plugin); keyword is mapped as a keyword field and is only matched exactly.
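The mapping described above can be written as an index-creation body. In this sketch the field names `title` and `keyword` follow the text, while the `report_id` field, the analyzer choices (`ik_max_word` for indexing and `ik_smart` for search, both provided by the IK plugin), and the `to_document` helper are assumptions. The body could be sent with `PUT /<index-name>`, where the index name is up to the deployment.

```python
from typing import Any, Dict

VALID_INFO_INDEX: Dict[str, Any] = {
    "mappings": {
        "properties": {
            # Segmented by the IK analyzer and stored in an inverted index.
            "title": {
                "type": "text",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_smart",
            },
            # Not analyzed: only exact values match.
            "keyword": {"type": "keyword"},
            # Report file ID, so hits can be traced back to their report.
            "report_id": {"type": "keyword"},
        }
    }
}


def to_document(group: Dict[str, Any]) -> Dict[str, Any]:
    """Convert a valid information group row into the document indexed by ES."""
    return {
        "title": group["title"],
        "keyword": list(group["keywords"]),  # a keyword field may hold an array
        "report_id": str(group["report_id"]),
    }
```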
Furthermore, when processing a valid information group, ES can also obtain the ID of the report file to which the group belongs (except when the valid information is being deleted).
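The event-driven synchronization of FIG. 14 can be sketched with a standard-library queue. This is a simplified stand-in, not the patent's script: an in-memory dictionary plays the role of the ES index, the `ChangeEvent` shape is hypothetical, and the text does not say how the MySQL modification events are captured.

```python
import queue
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ChangeEvent:
    op: str                     # "insert", "update" or "delete"
    group_id: int
    row: Optional[dict] = None  # new row contents; None for deletes


def drain(events: "queue.Queue[ChangeEvent]", index: Dict[int, dict]) -> int:
    """Apply every queued change to `index` and return how many were applied."""
    applied = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return applied
        if event.op == "delete":
            # Deletions need only the group ID, not the report ID.
            index.pop(event.group_id, None)
        else:
            if event.row is None:
                raise ValueError("insert/update events must carry the row")
            index[event.group_id] = {
                "title": event.row["title"],
                "keyword": list(event.row["keywords"]),
                "report_id": event.row["report_id"],
            }
        events.task_done()
        applied += 1
```

Processing events in queue order means a later update to the same group overwrites an earlier one, which keeps the index consistent with the database.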
In a preferred embodiment of the present invention, the preset valid information database is generated as follows:
Obtaining a report file;
Slicing the report file page by page to obtain at least one report file image;
Performing text block recognition on each report file image to obtain at least one text block for each image;
Taking each report file image in which the at least one text block meets preset requirements as a valid information image, to obtain at least one valid information image;
Extracting the valid information title and valid information keywords of each valid information image, and establishing an association between each valid information image and its corresponding valid information title and valid information keywords;
Storing each valid information image, its corresponding valid information title and valid information keywords, and the association in the valid information database.
Specifically, a complete report file is first obtained, and each page of it is sliced to obtain at least one report file image. Document slicing converts the document into images page by page, for example PNG images; pdftopng from the xpdf toolkit can be used for this, as it converts PDF pages into PNG images.
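As a hedged sketch of the slicing step, the following wraps the xpdf `pdftopng` command. The `-r` option (resolution in DPI) and the per-page output naming `<root>-000001.png`, `<root>-000002.png`, and so on follow the xpdf documentation; check them against the installed version. The helper names and the 150 DPI default are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def pdftopng_command(pdf_path: str, png_root: str, dpi: int = 150) -> List[str]:
    """Build the pdftopng command line; -r sets the rendering resolution in DPI."""
    return ["pdftopng", "-r", str(dpi), pdf_path, png_root]


def slice_report(pdf_path: str, out_dir: str, dpi: int = 150) -> List[Path]:
    """Render every page of a PDF report to PNG and return the image paths."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    root = Path(out_dir) / Path(pdf_path).stem
    subprocess.run(pdftopng_command(pdf_path, str(root), dpi), check=True)
    # One file per page: <root>-000001.png, <root>-000002.png, ...
    return sorted(Path(out_dir).glob(f"{root.name}-*.png"))
```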
Text block recognition is then performed on each report file image to obtain at least one text block per image. Text block recognition can use OCR (Optical Character Recognition): every report file image undergoes OCR, and the text in different regions of the image is referred to as text blocks. For each block, OCR returns its content, position, confidence, paragraph, and other information. The blocks produced by OCR then need to be filtered to delete blocks that contain neither text nor digits. The OCR result is shown in FIG. 15.
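The post-OCR filtering step can be sketched as follows. The `TextBlock` fields mirror the information the text says OCR returns (content, position, confidence, paragraph); the exact field layout and the rule "keep a block if it contains at least one CJK character, Latin letter, or digit" are assumptions about what "text or digits" means.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# Keep a block if it contains at least one CJK character, Latin letter or digit.
_TEXT_OR_DIGIT = re.compile(r"[0-9A-Za-z\u4e00-\u9fff]")


@dataclass
class TextBlock:
    text: str
    box: Tuple[int, int, int, int]  # x, y, width, height in pixels
    confidence: float
    paragraph: int


def filter_blocks(blocks: List[TextBlock]) -> List[TextBlock]:
    """Drop OCR blocks that contain neither text nor digits, such as stray symbols."""
    return [b for b in blocks if _TEXT_OR_DIGIT.search(b.text)]
```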
In a preferred embodiment of the present invention, it is detected whether the number of numeric text blocks in each report file image exceeds a first number threshold;
If so, each report file image whose count exceeds the first number threshold is taken as a valid information image, to obtain at least one valid information image.
In a preferred embodiment of the present invention, taking each report file image in which the at least one text block meets preset requirements as a valid information image, to obtain at least one valid information image, includes:
Detecting whether the ratio of the number of numeric text blocks in each report file image to the total number of text blocks in that image exceeds a ratio threshold;
If so, taking each report file image whose ratio exceeds the ratio threshold as a valid information image, to obtain at least one valid information image.
In a preferred embodiment of the present invention, taking each report file image in which the at least one text block meets preset requirements as a valid information image, to obtain at least one valid information image, includes:
Obtaining the height of the at least one text block in each report file image, and determining a preset number of target text blocks with the greatest heights;
Detecting whether the target text blocks in each report file image include a Chinese text block;
If so, detecting whether the number of Chinese characters in the target text blocks containing Chinese exceeds a third number threshold;
If so, taking each report file image whose target text blocks contain such a Chinese text block as a valid information image, to obtain at least one valid information image.
Specifically, high-value valid information generally takes the form of charts and tables, carries summary conclusions, and is mostly numeric. For any report file image, the following rules can be used for the judgment:
1) Whether the number of purely numeric text blocks (including those containing "%", "-", or "+") exceeds the first number threshold, for example 30;
2) Whether the ratio of the number of purely numeric text blocks to the number of all text blocks in the report file image exceeds the ratio threshold, for example 0.2;
3) Obtain the height of the at least one text block in each report file image and determine a preset number of target text blocks with the greatest heights; detect whether the target text blocks include a Chinese text block; if so, detect whether the number of Chinese characters in the target text blocks containing Chinese exceeds the third number threshold; if so, take the report file image as a valid information image. For example, all text blocks in the report file image are sorted by height in descending order, the top three are taken as the target text blocks, and it is detected whether any of them contains Chinese; if so, it is detected whether the number of Chinese characters in the Chinese-containing target blocks exceeds 8.
Of course, the above rules and values were derived from practical experiments and can be adjusted according to actual requirements; embodiments of the present invention do not limit this. Moreover, at least one of the above rules can be used during detection, or other rules can be used in addition to them; this can likewise be set according to actual requirements and is not limited by embodiments of the present invention.
Based on the above rules, all text blocks in each page's report file image are evaluated, and each report file image that satisfies the rules is taken as a valid information image, yielding at least one valid information image.
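The three page-level rules can be sketched as a single classifier. The thresholds default to the example values given in the text (30, 0.2, the top 3 blocks, more than 8 Chinese characters). Two choices here are assumptions: a page counts as valid if any rule passes (the text allows using at least one rule), and rule 3 counts Chinese characters per block rather than summed across the target blocks.

```python
import re
from typing import Sequence, Tuple

Block = Tuple[str, float]  # (OCR text, block height in pixels)

_NUMERIC = re.compile(r"^[0-9.,%+\-]+$")
_HAN = re.compile(r"[\u4e00-\u9fff]")


def is_numeric_block(text: str) -> bool:
    """A pure number, optionally containing %, -, + and . or , separators."""
    t = text.replace(" ", "")
    return bool(_NUMERIC.match(t)) and any(c.isdigit() for c in t)


def is_valid_page(
    blocks: Sequence[Block],
    count_threshold: int = 30,     # rule 1: first number threshold
    ratio_threshold: float = 0.2,  # rule 2: ratio threshold
    top_n: int = 3,                # rule 3: number of tallest blocks examined
    han_threshold: int = 8,        # rule 3: third number threshold
) -> bool:
    if not blocks:
        return False
    numeric = sum(1 for text, _ in blocks if is_numeric_block(text))
    rule1 = numeric > count_threshold
    rule2 = numeric / len(blocks) > ratio_threshold
    # Tallest blocks first; sorted() stays stable for equal heights.
    tallest = sorted(blocks, key=lambda b: b[1], reverse=True)[:top_n]
    rule3 = any(len(_HAN.findall(text)) > han_threshold for text, _ in tallest)
    return rule1 or rule2 or rule3
```

Switching `or` to `and`, or dropping a rule, is a one-line change, which fits the text's note that the rules and values are tunable.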
The extraction process for one valid information image can be abstracted as a Job class, and Job instances are placed in a queue for execution. The execution flow is shown in FIG. 16.
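The queue-based Job pattern can be sketched with the standard library. FIG. 16 is not reproduced here, so this only shows the general shape: `ExtractImageJob`, `run_worker`, and the `extract` callback (standing in for OCR, filtering, and title/keyword extraction) are hypothetical names, and a `None` sentinel stops a worker.

```python
import queue
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ExtractImageJob:
    """One unit of work: extracting a single valid information image."""
    report_id: int
    page_no: int
    image_path: str

    def handle(self, extract: Callable[["ExtractImageJob"], dict], sink: List[dict]) -> None:
        sink.append(extract(self))


def run_worker(
    jobs: "queue.Queue[Optional[ExtractImageJob]]",
    extract: Callable[[ExtractImageJob], dict],
    sink: List[dict],
) -> None:
    """Consume jobs in FIFO order until a None sentinel is received."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            job.handle(extract, sink)
        finally:
            jobs.task_done()
```

Several threads can run `run_worker` on the same queue to process pages in parallel, with one sentinel queued per worker.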
For each valid information image, the valid information title and valid information keywords are extracted. This can use the Tencent Cloud Natural Language Processing (NLP) service, which integrates Tencent's internal NLP technology and, drawing on a Chinese corpus of hundreds of billions of words, provides 18 intelligent text processing capabilities, including intelligent word segmentation, entity recognition, text error correction, sentiment analysis, text classification, sensitive content review, word vectors, keyword extraction, automatic summarization, intelligent chat, and encyclopedic knowledge graph queries.
An association is then established between each valid information image and its corresponding valid information title and valid information keywords, and the images, titles, keywords, and associations are stored in the preset valid information database. Besides valid information, other data can also be stored, and a data table can be generated to associate each valid information group with that other data. The generated data table is shown in Table 1:
Table 1
In this embodiment of the present invention, any of the databases can use Cloud Object Storage (COS), a distributed storage service from Tencent Cloud that has no directory hierarchy and no data format restrictions, can hold massive amounts of data, and supports access over HTTP/HTTPS.
It should be noted that complete report files can be stored in a preset complete report database. The complete report database and the valid information database can be two independent databases or two independent parts of one database; this can be set according to actual requirements and is not limited by embodiments of the present invention.
In this embodiment of the present invention, at least one valid information group corresponding to a search keyword is found; the report file to which each group belongs is determined and the report file information of each report file is obtained; the groups are aggregated by report file to obtain the aggregated valid information groups of each report file; a content box is generated for each report file, yielding at least one content box, each containing the report file information of its report file and the corresponding valid information groups; and the content boxes are displayed. In this way, embodiments of the present invention can comprehensively identify the content of all reports based on the search keyword, including but not limited to visual chart titles. The prior art recognizes only visual chart titles, which leads to a low keyword hit rate for institutional and other reports with diverse, complex content formats. By contrast, embodiments of the present invention place lower demands on the standardization of report content and are compatible with more report content types, which improves the keyword hit rate.
At the same time, the valid information groups that match the search keyword and belong to different report files are obtained through comprehensive identification and then displayed aggregated by report file, so that multiple matching valid information groups within the same report file are presented together. This reduces the sifting the user must do while browsing and thereby improves the user experience.
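The aggregation into content boxes can be sketched as a group-by over the search hits. Under stated assumptions: each hit carries the `report_id` of its report file (as the index allows), hits arrive in relevance order, boxes are ordered by each report's first hit, and `ContentBox`/`build_content_boxes` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ContentBox:
    report_id: int
    report_info: Dict[str, Any]
    groups: List[Dict[str, Any]] = field(default_factory=list)


def build_content_boxes(
    hits: List[Dict[str, Any]], report_info: Dict[int, Dict[str, Any]]
) -> List[ContentBox]:
    """Aggregate matched valid information groups into one content box per report."""
    boxes: Dict[int, ContentBox] = {}  # dicts keep insertion order
    for hit in hits:
        rid = hit["report_id"]
        if rid not in boxes:
            boxes[rid] = ContentBox(rid, report_info.get(rid, {}))
        boxes[rid].groups.append(hit)  # search order preserved inside a box
    return list(boxes.values())
```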
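FIG. 17 is a schematic structural diagram of an information processing apparatus according to another embodiment of the present application. As shown in FIG. 17, the apparatus of this embodiment can include:
A search module 1701, configured to search, for a search keyword, for at least one valid information group corresponding to the search keyword;
A processing module 1702, configured to determine the report file to which each valid information group belongs and obtain the report file information of each report file;
An aggregation module 1703, configured to aggregate the valid information groups by the report file to which they belong, to obtain the aggregated valid information groups of each report file;
A generation module 1704, configured to generate a content box for each report file, to obtain at least one content box, the content box including the report file information of the report file and the corresponding valid information groups;
A display module 1705, configured to display each of the at least one content box.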
In a preferred embodiment of the present invention, any one of the at least one valid information group includes a valid information image, a valid information title, and valid information keywords;
The apparatus further includes:
A receiving module, configured to receive a display instruction for any one of the at least one content box;
An obtaining module, configured to obtain the valid information title of each valid information group corresponding to that content box;
The display module is further configured to display, through the preset report content reader, each valid information title and the valid information group corresponding to the currently selected valid information title.
In a preferred embodiment of the present invention, the report content reader is further provided with at least one interaction instruction for the currently displayed valid information group;
The apparatus further includes:
An execution module, configured to perform, when any one of the at least one interaction instruction is triggered, the interaction action corresponding to that instruction on the currently displayed valid information group.
In a preferred embodiment of the present invention, the interaction instructions include an excerpt instruction;
The execution module is specifically configured to:
when the excerpt instruction is triggered, determine whether any previously created notebook exists in the preset favorites;
if so, display the list of existing notebooks and, upon receiving a confirmation instruction for any notebook in the list, copy the currently displayed valid information group into that notebook;
if not, display the preset notebook creation interface, create a new notebook through that interface, and copy the currently displayed valid information group into the new notebook.
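The two branches of the excerpt instruction can be sketched as follows. This is a minimal illustration, not the patented implementation: `Notebook`, `Favorites` and `excerpt` are hypothetical names, and the notebook list and creation interface are replaced by two callback functions supplied by the caller.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Notebook:
    name: str
    groups: list = field(default_factory=list)


@dataclass
class Favorites:
    notebooks: list = field(default_factory=list)


def excerpt(favorites, current_group, choose_notebook, create_notebook):
    """Copy the currently displayed valid information group into a notebook.

    choose_notebook(names) -> index stands in for the notebook list and the
    user's confirmation instruction; create_notebook() -> name stands in for
    the notebook creation interface. Both are hypothetical UI hooks.
    """
    if favorites.notebooks:
        # Branch 1: notebooks exist, so let the user pick one from the list.
        index = choose_notebook([nb.name for nb in favorites.notebooks])
        target = favorites.notebooks[index]
    else:
        # Branch 2: no notebook yet, so create one first.
        target = Notebook(name=create_notebook())
        favorites.notebooks.append(target)
    # Store a copy so later changes in the reader do not alter the excerpt.
    target.groups.append(copy.deepcopy(current_group))
    return target
```

Passing the UI steps in as callbacks keeps the branching logic testable without a real interface.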
In a preferred embodiment of the present invention, the receiving module is further configured to receive a display instruction for any of the notebooks already created in the preset favorites;
the display module is further configured to display the valid information groups in that notebook through the report content reader.
In a preferred embodiment of the present invention, the search module includes:
an analysis sub-module, configured to perform query analysis on the search keyword to obtain analyzed keywords;
a statement assembly sub-module, configured to assemble the analyzed keywords according to the Elasticsearch Query DSL syntax into a query statement for valid information groups, where the query statement includes a keyword field and a title field;
a query sub-module, configured to query the index in the preset search engine with the query statement, to obtain at least one valid information group matching the search keyword.
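A minimal sketch of the analysis and assembly sub-modules follows. The query analysis shown is a hypothetical placeholder (whitespace splitting plus de-duplication), and the field names `title` and `keywords`, the result size and the 2.0 boost are assumptions; the output uses standard Elasticsearch Query DSL constructs (`bool`/`should`, `match`, `minimum_should_match`).

```python
def analyze_query(raw):
    """Hypothetical stand-in for query analysis.

    Splits on whitespace and drops duplicate terms while keeping their order.
    Real Chinese queries would need word segmentation instead.
    """
    terms = []
    for token in raw.split():
        if token not in terms:
            terms.append(token)
    return terms


def build_query(terms, title_field="title", keyword_field="keywords", size=50):
    """Assemble an Elasticsearch Query DSL body over the title and keyword fields."""
    should = []
    for term in terms:
        should.append({"match": {title_field: {"query": term}}})
        # Keyword hits are weighted higher here; the factor 2.0 is arbitrary.
        should.append({"match": {keyword_field: {"query": term, "boost": 2.0}}})
    return {
        "size": size,
        "query": {"bool": {"should": should, "minimum_should_match": 1}},
    }
```

The returned dictionary is a JSON body that could be sent as the request body of an Elasticsearch `_search` request against the index holding the valid information groups; the actual index and field names depend on the mapping in use.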
In a preferred embodiment of the present invention, the preset search engine is built as follows:
when a data update is detected for any of the at least one valid information group stored in the preset valid information database, acquire the valid information title and valid information keywords of the updated valid information group, where a data update includes at least one of adding, deleting and modifying a valid information group;
generate an index from the valid information title and valid information keywords, and establish a mapping between the valid information title, the valid information keywords and the index, where the index includes a title field and a keyword field.
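The incremental index maintenance described above can be pictured with a toy in-memory inverted index. This is an illustrative stand-in for the search engine's index, not its real API: `ValidInfoIndex` and its methods are hypothetical, titles are tokenized by whitespace for simplicity, and a modification is handled as removal of the old entry followed by re-insertion.

```python
from collections import defaultdict


class ValidInfoIndex:
    """Toy inverted index kept in sync with the valid information database.

    The two posting maps mirror the title field and keyword field.
    """

    def __init__(self):
        self.docs = {}  # group_id -> {"title": [tokens], "keywords": [tokens]}
        self.postings = {"title": defaultdict(set), "keywords": defaultdict(set)}

    def _unindex(self, group_id):
        doc = self.docs.pop(group_id, None)
        if doc is None:
            return
        for field_name, tokens in doc.items():
            for token in tokens:
                bucket = self.postings[field_name][token]
                bucket.discard(group_id)
                if not bucket:
                    del self.postings[field_name][token]

    def on_update(self, op, group_id, title="", keywords=()):
        # op is one of the three update kinds: "add", "delete", "modify".
        if op not in ("add", "delete", "modify"):
            raise ValueError(f"unknown update type: {op}")
        self._unindex(group_id)  # drop any stale entry first
        if op == "delete":
            return
        doc = {"title": title.split(), "keywords": list(keywords)}
        self.docs[group_id] = doc  # mapping from group to its indexed fields
        for field_name, tokens in doc.items():
            for token in tokens:
                self.postings[field_name][token].add(group_id)

    def lookup(self, field_name, token):
        return set(self.postings[field_name].get(token, ()))
```

Re-indexing only the group that changed keeps update cost proportional to that group's size rather than to the whole database.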
In a preferred embodiment of the present invention, the preset valid information database is generated as follows:
obtain a report file;
split the report file page by page into images, to obtain at least one report file image;
perform text block recognition on each report file image, to obtain at least one text block for each report file image;
take each report file image whose text blocks meet preset requirements as a valid information image, to obtain at least one valid information image;
extract the valid information title and valid information keywords of each valid information image, and establish an association between each valid information image and its corresponding valid information title and valid information keywords;
store each valid information image, its corresponding valid information title and valid information keywords, and the association in the valid information database.
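The database-building steps can be sketched with Python's built-in `sqlite3`. Rendering pages to images and OCR are outside the scope of this sketch: the pages are assumed to be already split into image references, and `recognize_blocks`, `is_valid` and `extract` are hypothetical hooks for the text block recognizer, the screening rule and the title/keyword extractor. The two-table schema is likewise an assumption; the `image_id` column carries the association between an image and its keywords.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class TextBlock:
    # One recognised text block: its text plus its box height in pixels.
    text: str
    height: float


SCHEMA = """
CREATE TABLE valid_image (
    id INTEGER PRIMARY KEY,
    report_id TEXT NOT NULL,
    page INTEGER NOT NULL,
    image_ref TEXT NOT NULL,
    title TEXT NOT NULL
);
CREATE TABLE image_keyword (
    image_id INTEGER NOT NULL REFERENCES valid_image(id),
    keyword TEXT NOT NULL
);
"""


def build_valid_info_db(conn, report_id, pages, recognize_blocks, is_valid, extract):
    """pages: page image references in page order (already split)."""
    for page_no, image_ref in enumerate(pages, start=1):
        blocks = recognize_blocks(image_ref)
        if not is_valid(blocks):
            continue  # not a valid information image
        title, keywords = extract(blocks)
        cur = conn.execute(
            "INSERT INTO valid_image (report_id, page, image_ref, title) "
            "VALUES (?, ?, ?, ?)",
            (report_id, page_no, image_ref, title),
        )
        # The new row id links each keyword back to its image.
        conn.executemany(
            "INSERT INTO image_keyword (image_id, keyword) VALUES (?, ?)",
            [(cur.lastrowid, kw) for kw in keywords],
        )
    conn.commit()
```

Keeping the screening rule as a pluggable `is_valid` function lets any of the preset requirements that follow be swapped in without changing the pipeline.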
Preferably, taking each report file image whose at least one text block meets the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the number of numeric text blocks in each report file image exceeds a first number threshold;
if so, taking each report file image in which that number exceeds the first number threshold as a valid information image, to obtain at least one valid information image.
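A minimal sketch of the count-based rule follows. The patent does not define a "numeric text block"; here it is assumed to be a block whose entire text is a number, optionally signed, with thousands separators, a decimal part or a trailing percent sign. The names `count_numeric_blocks` and `select_by_count` are hypothetical.

```python
import re

# Assumed definition of a numeric text block (not specified in the source).
NUMERIC = re.compile(r"[+-]?\d[\d,]*(?:\.\d+)?%?")


def count_numeric_blocks(block_texts):
    return sum(1 for text in block_texts if NUMERIC.fullmatch(text.strip()))


def select_by_count(pages, first_threshold):
    """pages: {page_id: [block text, ...]}.

    Keep pages whose numeric-block count is strictly greater than
    first_threshold, matching "exceeds" in the description.
    """
    return [pid for pid, texts in pages.items()
            if count_numeric_blocks(texts) > first_threshold]
```

A page of data tables or charts typically carries many numeric labels, which is presumably why a simple count can separate such pages from narrative text.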
Preferably, taking each report file image whose at least one text block meets the preset requirements as a valid information image, to obtain at least one valid information image, includes:
detecting whether the ratio of the number of numeric text blocks in each report file image to the total number of text blocks in that image exceeds a ratio threshold;
if so, taking each report file image in which that ratio exceeds the ratio threshold as a valid information image, to obtain at least one valid information image.
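The ratio-based rule can be sketched the same way. As before, the regular expression defining a numeric text block is an assumption, and `numeric_ratio` and `select_by_ratio` are hypothetical names; a page with no recognized blocks is treated as having ratio 0 so that it is never selected.

```python
import re

# Assumed definition of a numeric text block (not specified in the source).
NUMERIC = re.compile(r"[+-]?\d[\d,]*(?:\.\d+)?%?")


def numeric_ratio(block_texts):
    if not block_texts:
        return 0.0  # avoid division by zero on empty pages
    numeric = sum(1 for t in block_texts if NUMERIC.fullmatch(t.strip()))
    return numeric / len(block_texts)


def select_by_ratio(pages, ratio_threshold):
    """pages: {page_id: [block text, ...]}; keep pages above the threshold."""
    return [pid for pid, texts in pages.items()
            if numeric_ratio(texts) > ratio_threshold]
```

Unlike an absolute count, the ratio does not depend on how much text a page holds, so a dense text page with a few figures is less likely to pass.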
Preferably, taking each report file image whose at least one text block meets the preset requirements as a valid information image, to obtain at least one valid information image, includes:
acquiring the height of each of the at least one text block in each report file image, and determining a preset number of target text blocks with the greatest height;
detecting whether the target text blocks in each report file image include a Chinese text block;
if so, detecting whether the number of Chinese characters in the target text block containing Chinese exceeds a third number threshold;
if so, taking each report file image whose target text blocks include such a Chinese text block as a valid information image, to obtain at least one valid information image.
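The title-based rule can be sketched as below, under stated assumptions: a character counts as Chinese if it falls in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF, extensions ignored), and the character count is checked per target block rather than summed across blocks, which the source leaves open. `TextBlock`, `has_chinese_title` and `select_by_title` are hypothetical names, and the default values of `top_n` and `third_threshold` are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    text: str
    height: float  # bounding-box height from text block recognition


def is_cjk(ch):
    # Basic CJK Unified Ideographs block only.
    return "\u4e00" <= ch <= "\u9fff"


def has_chinese_title(blocks, top_n, third_threshold):
    # Step 1: the preset number of tallest blocks are the target blocks.
    targets = sorted(blocks, key=lambda b: b.height, reverse=True)[:top_n]
    # Step 2: do any target blocks contain Chinese?
    chinese_targets = [b for b in targets if any(is_cjk(ch) for ch in b.text)]
    if not chinese_targets:
        return False
    # Step 3: does such a block hold more Chinese characters than the threshold?
    return any(sum(is_cjk(ch) for ch in b.text) > third_threshold
               for b in chinese_targets)


def select_by_title(pages, top_n=3, third_threshold=4):
    return [pid for pid, blocks in pages.items()
            if has_chinese_title(blocks, top_n, third_threshold)]
```

Using block height as a proxy for "title-like" text relies on chart and section titles usually being set larger than body text; the character threshold then filters out short labels.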
The information processing apparatus of this embodiment can execute the information processing methods shown in the first and second embodiments of the present application; the implementation principles are similar and are not repeated here.
In this embodiment of the present invention, at least one valid information group corresponding to a search keyword is found by searching; the report file to which each valid information group belongs is determined and the report file information of each report file is obtained; the valid information groups are aggregated by report file to obtain an aggregated valid information group for each report file; a content box containing the report file information and the corresponding valid information groups is generated for each report file, yielding at least one content box; and each content box is displayed. In this way, the embodiment can comprehensively recognize the content of all reports according to the search keyword, including but not limited to the titles of visual charts. In the prior art, recognition is limited to visual chart titles, so institutional reports and other reports with diverse and complex content formats suffer from a low search keyword hit rate. The embodiment places lower requirements on the standardization of report content and is compatible with more content types, thereby improving the search keyword hit rate.
In addition, the valid information groups that match the search keyword and belong to different report files are obtained through comprehensive recognition and then aggregated and displayed by report file, so that the matching valid information groups within the same report file are presented together as related items. This reduces the amount of sifting the user must do while browsing and thereby improves the user experience.
Another embodiment of the present application provides an electronic device comprising a memory and a processor, with at least one program stored in the memory. When executed by the processor, the program searches for at least one valid information group corresponding to a search keyword, determines the report file to which each valid information group belongs and obtains the report file information of each report file, aggregates the valid information groups by report file, generates for each report file a content box containing its report file information and the corresponding valid information groups, and displays each content box. Compared with the prior art, this achieves the same advantages as the apparatus embodiment: recognition is not limited to visual chart titles, so reports with diverse and less standardized content formats are covered and the search keyword hit rate improves; and the per-report aggregated display presents the matching valid information groups of one report file together, reducing the sifting the user must do while browsing and improving the user experience.
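In an optional embodiment, an electronic device is provided. As shown in FIG. 18, the electronic device 18000 includes a processor 18001 and a memory 18003. The processor 18001 is connected to the memory 18003, for example via a bus 18002. Optionally, the electronic device 18000 may further include a transceiver 18004. It should be noted that in practical applications the number of transceivers 18004 is not limited to one, and the structure of the electronic device 18000 does not limit the embodiments of the present application.
The processor 18001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various exemplary logical blocks, modules and circuits described in connection with the disclosure of the present application. The processor 18001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 18002 may include a path for transferring information between the above components. The bus 18002 may be a PCI bus, an EISA bus or the like, and may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is used in FIG. 18, but this does not mean that there is only one bus or only one type of bus.
The memory 18003 may be, without limitation, a ROM or another type of static storage device capable of storing static information and instructions; a RAM or another type of dynamic storage device capable of storing information and instructions; an EEPROM, a CD-ROM or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.); a magnetic disk storage medium or other magnetic storage device; or any other medium that can carry or store desired program code in the form of instructions or data structures and can be accessed by a computer.
The memory 18003 stores the application program code for executing the solution of the present application, and its execution is controlled by the processor 18001. The processor 18001 executes the application program code stored in the memory 18003 to implement the content shown in any of the foregoing method embodiments.
The electronic device includes, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and in-vehicle terminals (for example, in-vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers.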
Yet another embodiment of the present application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the corresponding content of the foregoing method embodiments: searching for at least one valid information group corresponding to a search keyword, determining the report file to which each valid information group belongs and obtaining the report file information of each report file, aggregating the valid information groups by report file, generating for each report file a content box containing its report file information and the corresponding valid information groups, and displaying each content box. Compared with the prior art, this achieves the same advantages as the apparatus embodiment: recognition is not limited to visual chart titles, so reports with diverse and less standardized content formats are covered and the search keyword hit rate improves; and the per-report aggregated display presents the matching valid information groups of one report file together, reducing the sifting the user must do while browsing and improving the user experience.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict ordering restriction on these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times; nor must they be executed sequentially, as they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the protection scope of the present invention.
Claims (14)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010622216.7A CN111666383B (en) | 2020-06-30 | 2020-06-30 | Information processing method, device, electronic device, and computer-readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010622216.7A CN111666383B (en) | 2020-06-30 | 2020-06-30 | Information processing method, device, electronic device, and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111666383A (en) | 2020-09-15 |
CN111666383B (en) | 2025-09-26 |
Family ID: 72391184
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010622216.7A Active CN111666383B (en) | 2020-06-30 | 2020-06-30 | Information processing method, device, electronic device, and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111666383B (en) |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6513032B1 (en) * | 1998-10-29 | 2003-01-28 | Alta Vista Company | Search and navigation system and method using category intersection pre-computation |
KR20100066263A (en) * | 2008-12-08 | 2010-06-17 | 한국전자통신연구원 | Device for index managing of evidence image in digital forensic system and method therefor |
CN102483747A (en) * | 2009-08-11 | 2012-05-30 | Cpa全球专利研究有限公司 | Image element searching |
JP2011128669A (en) * | 2009-12-15 | 2011-06-30 | Nippon Telegr & Teleph Corp <Ntt> | Device and program for retrieving information |
CN103023968A (en) * | 2012-11-15 | 2013-04-03 | 中科院成都信息技术有限公司 | Network distributed storage and reading method for file |
CN104462471A (en) * | 2014-12-17 | 2015-03-25 | 北京奇虎科技有限公司 | Method and device for providing segmentation search results |
CN105786852A (en) * | 2014-12-23 | 2016-07-20 | 北京奇虎科技有限公司 | Search result integration method and apparatus |
US20180165724A1 (en) * | 2016-12-13 | 2018-06-14 | International Business Machines Corporation | Method and system for contextual business intelligence report generation and display |
CN107846352A (en) * | 2017-11-10 | 2018-03-27 | 维沃移动通信有限公司 | A kind of method for information display, mobile terminal |
CN108932294A (en) * | 2018-05-31 | 2018-12-04 | 平安科技(深圳)有限公司 | Resume data processing method, device, equipment and storage medium based on index |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112612944A (en) * | 2020-12-07 | 2021-04-06 | 深圳价值在线信息科技股份有限公司 | Case information management method, terminal equipment and system |
CN112612944B (en) * | 2020-12-07 | 2024-05-31 | 深圳价值在线信息科技股份有限公司 | Case information management method, terminal equipment and system |
CN114691974A (en) * | 2020-12-30 | 2022-07-01 | 珠海市魅族科技有限公司 | Fragmentation information processing method, device, medium and computer equipment |
CN115203398A (en) * | 2021-04-13 | 2022-10-18 | 珠海金山办公软件有限公司 | Document processing method and device, electronic equipment and storage medium |
CN113297345B (en) * | 2021-05-21 | 2021-12-03 | 深圳市智尊宝数据开发有限公司 | Analysis report generation method, electronic equipment and related product |
CN113297345A (en) * | 2021-05-21 | 2021-08-24 | 深圳市智尊宝数据开发有限公司 | Analysis report generation method, electronic equipment and related product |
CN113342967A (en) * | 2021-05-25 | 2021-09-03 | 清华大学 | Display method and device based on augmented reality technology and electronic equipment |
CN113535892A (en) * | 2021-06-08 | 2021-10-22 | 北京易创新科信息技术有限公司 | Industry research report searching method and device and electronic equipment |
CN113535892B (en) * | 2021-06-08 | 2023-12-01 | 北京易创新科信息技术有限公司 | Search method and device for industry research report and electronic equipment |
CN113239650B (en) * | 2021-07-09 | 2021-10-15 | 成都爱旗科技有限公司 | Report generation method and device and electronic equipment |
CN113239650A (en) * | 2021-07-09 | 2021-08-10 | 成都爱旗科技有限公司 | Report generation method and device and electronic equipment |
CN114298650A (en) * | 2021-11-19 | 2022-04-08 | 上海挚星信息科技有限公司 | Project report entry method and device, electronic equipment and storage medium |
CN114239485A (en) * | 2021-12-17 | 2022-03-25 | 杭州太美星程医药科技有限公司 | A method and system for generating report preview page |
Also Published As
Publication number | Publication date |
---|---|
CN111666383B (en) | 2025-09-26 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |