
CN113806491B - Information processing method, device, equipment and medium - Google Patents

Info

Publication number
CN113806491B
Authority
CN
China
Prior art keywords
document
text
query
similarity
keyword
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111143741.1A
Other languages
Chinese (zh)
Other versions
CN113806491A (en)
Inventor
李舒
周永鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Comac Software Co ltd
Shanghai Aviation Industry Group Co ltd
Original Assignee
Comac Software Co ltd
Shanghai Aviation Industry Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Comac Software Co ltd, Shanghai Aviation Industry Group Co ltd
Priority to CN202111143741.1A
Publication of CN113806491A
Application granted
Publication of CN113806491B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an information processing method, device, equipment and medium. The method comprises: acquiring a query request sent by a query terminal, the query request carrying a query text; ranking the documents in a search library by similarity according to the keywords in the query text and the first label of each document stored in the search library; for each document, determining, according to the second label corresponding to each fragment in the document, a target fragment text whose similarity to the query text meets a preset requirement; and sending the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity. The method addresses the problem of inaccurate search results.

Description

Information processing method, device, equipment and medium
Technical Field
The present application relates to the field of information processing, and in particular, to a method, an apparatus, a device, and a medium for information processing.
Background
As technology advances, scientific research materials keep accumulating. These materials are mostly stored in databases in digital form so that users can consult them in later production work. Search engines emerged to make such consultation easier: a search engine uses a specific computer program to collect information from the Internet according to a given strategy, organizes and processes that information, provides a search service to users, and displays the relevant results to them.
However, in existing search-engine-based methods of consulting materials, the user enters a query text and the retrieval system directly returns documents related to the query text, without locating the specific sections or paragraphs that match the query, so the search results are not accurate enough.
Disclosure of Invention
Accordingly, an object of the present application is to provide a method, apparatus, device and medium for information processing, which are used for solving the problem that the search result in the prior art is not accurate enough.
In a first aspect, an embodiment of the present application provides a method for information processing, including:
acquiring a query request sent by a query terminal, the query request carrying a query text;
ranking the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library;
for each document, determining, according to the second label corresponding to each fragment in the document, a target fragment text whose similarity to the query text meets a preset requirement;
and sending the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
In one possible embodiment, the first label of a document includes the second label corresponding to each fragmented text in the document and the third label corresponding to the large title of the document;
the second label includes any one or more of the following words: a subtitle keyword and a body keyword of the fragmented text, a first associated word related to the subtitle keyword, and a second associated word related to the body keyword;
the third label includes the following words: a large-title keyword of the document and a third associated word related to the large-title keyword.
In one possible embodiment, the first label of each document stored in the search library is obtained as follows:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and, for each document in the search library, integrating the second label of each fragmented text in the document and the third label corresponding to the large title of the document into the first label of the document.
In one possible embodiment, the second label of a fragmented text is determined as follows:
determining at least one keyword of the fragmented text based on the content information of the fragmented text;
determining the associated words related to the keyword according to the similarity between the keyword and each candidate word in an associated-word library;
and determining the keyword and the associated words related to it as the second label of the fragmented text.
In one possible embodiment, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text includes:
for each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragmented text; and if the document does not include subtitles, fragmenting the document by paragraph to obtain at least one fragmented text.
In one possible embodiment, ranking the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library includes:
for each document stored in the search library, calculating the vocabulary similarity between each word in the first label of the document and the keywords in the query text;
for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the vocabulary similarity calculated for each word in the first label of the document and the weight of each word in the first label of the document;
and ranking the documents in the search library according to the document similarity of each document.
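The weighted aggregation described in this embodiment can be sketched as follows. The text does not give the aggregation formula, so the weighted average used here is only one plausible choice, and the function names `document_similarity` and `rank_documents` and the default weight of 1.0 are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def document_similarity(word_sims: Dict[str, float],
                        weights: Dict[str, float]) -> float:
    """Combine per-word similarities into one document score.

    Assumption: a weighted average; words without an explicit weight get 1.0.
    """
    total_weight = sum(weights.get(word, 1.0) for word in word_sims)
    if total_weight == 0:
        return 0.0
    weighted = sum(sim * weights.get(word, 1.0) for word, sim in word_sims.items())
    return weighted / total_weight


def rank_documents(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (document id, score) pairs from most to least similar."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A larger weight on, for example, large-title words would let a title match dominate the score; whether that matches the intended weighting is not stated in the text.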
In one possible embodiment, the method further includes:
sending the keywords in the query text to the query terminal, so that the query terminal highlights the query keywords contained in the target fragment text of each document.
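On the terminal side, the highlighting step could look roughly like the sketch below. The marker strings `[[` and `]]`, the case-insensitive matching, and the function name `highlight` are assumptions for illustration; the text does not say how highlighting is rendered.

```python
import re
from typing import Iterable


def highlight(fragment: str, keywords: Iterable[str],
              left: str = "[[", right: str = "]]") -> str:
    """Wrap every occurrence of a query keyword in the fragment with markers."""
    # Try longer keywords first so "operating voltage" wins over "voltage".
    ordered = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not ordered:
        return fragment
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    return pattern.sub(lambda m: f"{left}{m.group(0)}{right}", fragment)
```

A real terminal would more likely emit markup or styling than bracket markers; the markers only keep the sketch self-contained.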
In a second aspect, an embodiment of the present application provides an apparatus for information processing, including:
an acquisition module, configured to acquire a query request sent by a query terminal, the query request carrying a query text;
a sorting module, configured to rank the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library;
a determining module, configured to determine, for each document and according to the second label corresponding to each fragment in the document, a target fragment text whose similarity to the query text meets a preset requirement;
and a sending module, configured to send the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
In one possible embodiment, the first label of a document includes the second label corresponding to each fragmented text in the document and the third label corresponding to the large title of the document;
the second label includes any one or more of the following words: a subtitle keyword and a body keyword of the fragmented text, a first associated word related to the subtitle keyword, and a second associated word related to the body keyword;
the third label includes the following words: a large-title keyword of the document and a third associated word related to the large-title keyword.
In one possible embodiment, the first label of each document stored in the search library, as used by the sorting module, is obtained as follows:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and, for each document in the search library, integrating the second label of each fragmented text in the document and the third label corresponding to the large title of the document into the first label of the document.
In one possible embodiment, the second label of a fragmented text, as used by the determining module, is determined as follows:
determining at least one keyword of the fragmented text based on the content information of the fragmented text;
determining the associated words related to the keyword according to the similarity between the keyword and each candidate word in an associated-word library;
and determining the keyword and the associated words related to it as the second label of the fragmented text.
In one possible embodiment, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text includes:
for each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragmented text; and if the document does not include subtitles, fragmenting the document by paragraph to obtain at least one fragmented text.
In one possible embodiment, when ranking the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library, the sorting module is specifically configured to:
for each document stored in the search library, calculate the vocabulary similarity between each word in the first label of the document and the keywords in the query text;
for each document stored in the search library, calculate the document similarity between the keywords in the query text and the document according to the vocabulary similarity calculated for each word in the first label of the document and the weight of each word in the first label of the document;
and rank the documents in the search library according to the document similarity of each document.
In one possible embodiment, the apparatus further includes:
a display module, configured to send the keywords in the query text to the query terminal, so that the query terminal highlights the query keywords contained in the target fragment text of each document.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method of the first aspect.
The embodiment of the application first acquires the query request, carrying the query text, sent by the query terminal, and thereby determines from the query text what the user wants to search for at the query terminal. The documents in the search library are then ranked by similarity according to the keywords in the query text and the first label of each document stored in the search library, so that they can be displayed on the display terminal in order of their similarity to the query text. For each document, the target fragment text whose similarity to the query text meets the preset requirement is determined according to the second labels corresponding to the fragments in the document. Finally, the target fragment text of each document and the similarity ranking of the documents are sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
With this method, the target fragment texts that are highly similar to the query text are displayed on the display terminal, so the search result can be pinpointed to a specific section or paragraph, and the target fragment texts and their corresponding documents are displayed in descending order of similarity.
In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a method for processing information according to an embodiment of the present application;
Fig. 2 is a flow chart of a method for determining a first label of a document according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present application.
Fig. 1 is a flow chart of a method for processing information according to an embodiment of the present application. As shown in Fig. 1, the method is implemented by the following steps:
Step 101, acquiring a query request sent by a query terminal; the query request carries a query text.
Specifically, the query text may be an entire document, a word, or a sentence; its content is what the user wants to search for through the query terminal. The query request contains the query text, the time at which the query text was submitted, information about the user, and the like, and may also contain one or more document types selected by the user. Document types may include: information, bar, encyclopedia, library, and web page. When the user does not set a document search range, the search is performed over all documents in the search library by default.
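The contents of the query request listed above can be pictured as a small record type. This is only an illustrative sketch: the class name `QueryRequest`, the field names, and the use of `None` to mean "no search range set" are assumptions, while the document type names are taken from the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet, Optional

# Document types named in the description.
ALL_TYPES: FrozenSet[str] = frozenset(
    {"information", "bar", "encyclopedia", "library", "web page"}
)


@dataclass(frozen=True)
class QueryRequest:
    query_text: str            # a word, a sentence, or an entire document
    user_id: str               # stand-in for "information of the user"
    submitted_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    document_types: Optional[FrozenSet[str]] = None  # None: no range set

    def search_scope(self) -> FrozenSet[str]:
        """Fall back to every document type when the user set no range."""
        return self.document_types or ALL_TYPES
```

An empty selection is treated the same as no selection here, which matches the default of searching all documents in the search library.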
Step 102, ranking the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library.
Specifically, the keywords in the query text may be marked when labeling or querying is performed, or may be retrieved and extracted by the system according to big data or the search library. For example, when the query text is "what is the operating voltage of the single-chip microcomputer", if first labels related to "single-chip microcomputer" and "voltage" have been marked in advance in the search library, the keywords in the query text are determined to be "single-chip microcomputer" and "operating voltage". Alternatively, a keyword extraction formula or extraction model may be set for the query text, so as to extract the keyword-bearing words in the query text, such as verbs and nouns. In the embodiment of the application, the documents in the search library may also be classified according to their first labels or the fields to which they belong, so that the user can search within a category, which reduces the amount of data the system has to process.
The first label of a document includes the second label corresponding to each fragmented text in the document and the third label corresponding to the large title of the document. The second label includes any one or more of the following words: a subtitle keyword and a body keyword of the fragmented text, a first associated word related to the subtitle keyword, and a second associated word related to the body keyword. The third label includes the following words: a large-title keyword of the document and a third associated word related to the large-title keyword.
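One simple reading of the label-based keyword determination in the example above is to look for pre-marked label words inside the query text. The sketch below assumes plain substring matching and a hypothetical `label_vocabulary` collection; the extraction formula or model also mentioned in the text is not covered.

```python
from typing import Iterable, List


def extract_keywords(query_text: str, label_vocabulary: Iterable[str]) -> List[str]:
    """Return the label words that occur in the query, in order of appearance."""
    text = query_text.lower()
    hits = [(text.find(term.lower()), term) for term in set(label_vocabulary) if term]
    # Sort by position; for equal positions prefer the longer term.
    found = sorted((pos, -len(term), term) for pos, term in hits if pos >= 0)
    return [term for _, _, term in found]
```

For the query in the example, a vocabulary containing "single-chip microcomputer" and "operating voltage" yields both terms, while unrelated label words are ignored.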
Based on the keywords determined from the query text, the similarity between the keywords and the first label of each document stored in the search library is calculated. This gives the similarity between each document in the search library and the query text, and the documents in the search library are ranked by that similarity value.
Step 103, for each document, determining, according to the second label corresponding to each fragment in the document, a target fragment text whose similarity to the query text meets a preset requirement.
Specifically, each fragment in a document corresponds to the key information of a section or paragraph of the document; the more finely a document is divided into fragments, the more second labels are stored for it in the search library.
For each document, the similarity between the content of the document and the query text is determined according to the second labels marked for the document and the keywords in the query text, and the text formed by the words, sentences, or paragraphs in the document whose similarity to the query text meets the preset requirement is taken as the target fragment text.
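Step 103 can be sketched as scoring each fragment's second label against the query keywords and keeping the fragments that pass a threshold. The text names neither the similarity measure nor the preset requirement, so the Jaccard overlap and the 0.3 threshold below are assumptions, as are the function names.

```python
from typing import Dict, Iterable, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; a stand-in for the unspecified similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_target_fragments(query_keywords: Iterable[str],
                            fragment_labels: Dict[str, Set[str]],
                            threshold: float = 0.3) -> List[Tuple[str, float]]:
    """Keep fragments whose second label is similar enough to the query."""
    keywords = set(query_keywords)
    scored = [(fid, jaccard(keywords, labels))
              for fid, labels in fragment_labels.items()]
    kept = [(fid, score) for fid, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

The preset requirement could equally be a top-k cut rather than a threshold; the text leaves this open.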
Step 104, sending the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity.
Specifically, after the target fragment texts whose similarity to the query text meets the preset requirement are obtained in step 103, the target fragment texts and their corresponding documents are sorted by their similarity to the query text, and the sorted target fragment texts, together with the document corresponding to each, are sent to the query terminal, so that the query terminal displays the target fragment text of each document in order of similarity. In the embodiment of the application, the display terminal may display the target fragment text in various ways: for example, it may provide a path leading to the document corresponding to the target fragment text, or it may directly locate the region of that document where the target fragment text appears and highlight the content of the target fragment text there.
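The message sent to the query terminal in step 104 might be assembled along the following lines. The JSON layout, the field names (`rank`, `document_id`, `similarity`, `target_fragments`), and the function name `build_response` are all assumptions for illustration; the text does not define a wire format.

```python
import json
from typing import Dict, List


def build_response(doc_scores: Dict[str, float],
                   target_fragments: Dict[str, List[str]]) -> str:
    """Serialize documents in descending similarity with their target fragments."""
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    results = [
        {
            "rank": position,
            "document_id": doc_id,
            "similarity": doc_scores[doc_id],
            "target_fragments": target_fragments.get(doc_id, []),
        }
        for position, doc_id in enumerate(ranked, start=1)
    ]
    # ensure_ascii=False keeps non-ASCII fragment text readable in the payload.
    return json.dumps({"results": results}, ensure_ascii=False)
```

Carrying the document identifier with each entry is what would let the terminal offer a path to the full document alongside the fragment.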
Specifically, for each document, a fragment text is the text formed by the content of one fragment of the document. When each fragment is a paragraph of the document, the content of each paragraph forms one fragment text; when each fragment is a section of the document, all the content of the section is taken as one fragment text, regardless of how many paragraphs of text or images the section contains. If information such as images or audio appears, it may be deleted or converted into text information by some specific means. Any one or more of the following words may be used as the second label of a fragmented text: the subtitle of the fragmented text, keywords in the subtitle, body keywords of the fragmented text, first associated words related to the subtitle keywords, and second associated words related to the body keywords. The second label of a fragmented text may also consist of two or more words together with the correspondence between them. At least one second label may be provided for each fragmented text.
In one possible embodiment, Fig. 2 is a flow chart of a method for determining a first label of a document according to an embodiment of the present application. As shown in Fig. 2, the first label of each document stored in the search library is obtained as follows:
Step 201, for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text.
Specifically, before the document is fragmented according to the preset segmentation requirement, the method further includes: acquiring each document in a database and determining its type; if the document is of a text type, storing it in the search library. The documents in the database may be uploaded, obtained from public channels, or collected by means such as big-data statistics. If a document is of a non-text type, it may be converted into a text-type document by a text conversion plug-in and then stored in the database.
The number of documents in the search library is finite. Each document in the search library is first fragmented, dividing it into at least one fragmented text. The preset segmentation requirement can be adjusted to the actual situation: for example, the text of each paragraph in the document may be taken as one fragmented text, or the text formed by the content of each section in the document may be taken as one fragmented text. In the embodiment of the application, a document judged to be of a text type may be stored in a preset area of the database, and an access interface to that preset area is provided to the search library, so that a user can access the target document in the database through the interface in the search library.
Step 202, for each document in the search library, integrating the second label of each fragmented text in the document and the third label corresponding to the large title of the document into the first label of the document.
Specifically, after the fragmented texts are obtained in step 201, at least one second label is marked for each fragmented text and at least one third label is marked for the large title of the document, and the second labels and third labels corresponding to the document together form the first label of the document.
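Step 202 amounts to merging label sets. The sketch below models labels as plain sets of words, which is an assumption (the text also allows labels that carry correspondences between words); the class names `Fragment` and `Document` are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Fragment:
    text: str
    second_label: Set[str] = field(default_factory=set)


@dataclass
class Document:
    title: str
    fragments: List[Fragment]
    third_label: Set[str] = field(default_factory=set)  # large-title words

    def first_label(self) -> Set[str]:
        """Union of the third label and every fragment's second label."""
        merged = set(self.third_label)
        for fragment in self.fragments:
            merged |= fragment.second_label
        return merged
```

Keeping the second labels on the fragments, rather than only in the merged set, is what allows the later per-fragment matching of step 103.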
In one possible embodiment, the second label of a fragmented text is determined as follows: determining at least one keyword of the fragmented text based on the content information of the fragmented text; determining the associated words related to the keyword according to the similarity between the keyword and each candidate word in an associated-word library; and determining the keyword and the associated words related to it as the second label of the fragmented text.
Specifically, the candidate words are words whose meaning is similar or identical to that of a keyword in the fragmented text, such as near-synonyms, synonyms, paraphrases, and abbreviations. The content information is all the content that makes up the fragmented text, including but not limited to: subtitles, large titles, body text, video, audio, and the like. At least one keyword of the fragmented text is determined from this content information; the keywords may be drawn from the large title, the subtitle, the body text, and the summary text of the fragmented text. For each keyword determined for the fragmented text, the similarity between the keyword and each candidate word in the associated-word library is calculated, and the candidate words whose similarity meets a preset association threshold are determined to be associated words of that keyword. Each keyword of the fragmented text and the associated words related to it are then determined as the second label of the fragmented text.
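The threshold test against the associated-word library can be sketched as below. The similarity measure is not specified in the text; the character-level ratio from Python's `difflib.SequenceMatcher` is used here purely as a stand-in for a semantic measure, and the 0.6 threshold and function names are assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable, Set


def associated_words(keyword: str, candidates: Iterable[str],
                     threshold: float = 0.6) -> Set[str]:
    """Candidates whose similarity to the keyword meets the threshold."""
    return {
        candidate for candidate in candidates
        if candidate != keyword
        and SequenceMatcher(None, keyword, candidate).ratio() >= threshold
    }


def second_label(keywords: Iterable[str], candidates: Iterable[str],
                 threshold: float = 0.6) -> Set[str]:
    """Keywords plus their associated words form the second label."""
    keyword_list = list(keywords)
    candidate_list = list(candidates)
    label = set(keyword_list)
    for keyword in keyword_list:
        label |= associated_words(keyword, candidate_list, threshold)
    return label
```

A character-level ratio catches inflections and some abbreviations but not true synonyms, so a production system would presumably substitute a semantic similarity model here.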
In one possible embodiment, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text includes: for each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragmented text; and if the document does not include subtitles, fragmenting the document by paragraph to obtain at least one fragmented text.
Specifically, in the embodiment of the application, a document is divided by subtitle, and the text formed by each subtitle and the content under it is taken as one fragmented text; when the document has no subtitles or only a single title, the document is divided by paragraph, and the text formed by the content of each paragraph is taken as one fragmented text. The embodiment of the application does not limit how fragmented texts are divided; for example, they may also be divided according to keywords, description mode, or description content.
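The subtitle-or-paragraph rule can be sketched as follows. How subtitles are recognized is not stated in the text, so this sketch assumes Markdown-style lines beginning with `## ` mark subtitles and that blank lines separate paragraphs; both conventions and the function name are assumptions.

```python
import re
from typing import List

# Assumption: a subtitle is a line of the form "## Some heading".
SUBTITLE = re.compile(r"^## .+$", re.MULTILINE)


def fragment_document(body: str) -> List[str]:
    """Split by subtitle when any exist, otherwise by blank-line paragraphs."""
    starts = [match.start() for match in SUBTITLE.finditer(body)]
    if starts:
        bounds = starts + [len(body)]
        pieces = [body[a:b] for a, b in zip(bounds, bounds[1:])]
        preamble = body[:starts[0]]
        if preamble.strip():
            # Text before the first subtitle becomes its own fragment.
            pieces.insert(0, preamble)
    else:
        pieces = re.split(r"\n\s*\n", body)
    return [piece.strip() for piece in pieces if piece.strip()]
```

Each subtitle stays attached to the content under it, matching the description of a fragment as "each subtitle and the corresponding content under the subtitle".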
The embodiment of the application first acquires the query request, sent by the query terminal, that carries the query text, and determines from the query text what the user at the query terminal wants to search for. The documents in the search library are then sorted according to the keywords in the query text and the first label of each document stored in the search library, so that the documents can be displayed on the display terminal in order of their similarity to the query text. For each document, the target fragment text whose similarity to the query text meets the preset requirement is determined according to the second labels corresponding to the fragments of the document. The target fragment text of each document and the similarity ranking of the documents are sent to the query terminal, so that the query terminal displays the target fragment text of each document according to the similarity ranking.
In this way, target fragment text that is highly similar to the query text is displayed on the display terminal, the search result can be narrowed down to a specific section or paragraph, and the target fragment texts and their corresponding documents are displayed in descending order of similarity.
In one possible embodiment, sorting the documents in the search library according to the keywords in the query text and the first tag of each document stored in the search library in step 102 includes:
Step 1021, for each document stored in the search library, calculating the vocabulary similarity between each word in the first label of the document and the keywords in the query text.
Specifically, the similarity may be calculated by comparing the words in the first tag with the words in the query text word by word, or by using a model to determine the semantic similarity between the words in the first tag and the words in the query text, so as to obtain a similarity value for the two.
Step 1022, for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity of each word in the first label of the document and the weight of each word in the first label of the document.
Specifically, the weight of a word may be preset, or may be determined by the tag group to which the word belongs. For example, the words in the second tag may be given a different weight from the words in the third tag; or the words in the document may be classified, with large-title words given a first weight, subtitle words a second weight, text keywords a third weight, and so on. After step 1021 has calculated the similarity value between each word in the first tag of every document in the search library and the keywords in the query text, the similarity between the keywords in the query text and each document stored in the search library is calculated from those similarity values and the weights. Adjusting the weight of each word in the first tag yields different document similarity rankings.
Step 1023, sorting the documents in the search library according to the document similarity of each document.
Specifically, after the similarity between each document in the search library and the query text has been calculated in step 1022, the documents are ranked. In the embodiment of the application, a minimum similarity threshold may be set: documents whose document similarity is below the threshold are not displayed, which reduces the time the user spends browsing useless information; or words in the first label whose vocabulary similarity is below the threshold are rejected and do not take part in the document similarity calculation, which reduces the amount of computation in the system.
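Steps 1021 to 1023 can be sketched as follows. The source does not give a formula for combining vocabulary similarities and weights, so this sketch assumes a plain weighted sum of each tag word's best match against the query keywords, uses the word-by-word comparison variant (exact match) for vocabulary similarity, and represents a first label as a hypothetical mapping from word to weight.

```python
def vocabulary_similarity(word, keyword):
    """Step 1021, word-by-word variant: 1.0 on an exact match, else 0.0."""
    return 1.0 if word.lower() == keyword.lower() else 0.0


def document_similarity(first_tag, query_keywords):
    """Step 1022 (assumed form): weighted sum over the words of the first label.

    `first_tag` maps each word of the label to its weight.
    """
    total = 0.0
    for word, weight in first_tag.items():
        best = max(
            (vocabulary_similarity(word, k) for k in query_keywords),
            default=0.0,
        )
        total += weight * best
    return total


def rank_documents(library, query_keywords, min_doc_similarity=0.0):
    """Step 1023: sort documents by similarity, dropping those under the threshold."""
    scored = [
        (doc_id, document_similarity(tag, query_keywords))
        for doc_id, tag in library.items()
    ]
    kept = [(d, s) for d, s in scored if s >= min_doc_similarity]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    library = {
        "doc_a": {"hydraulic": 3.0, "pump": 1.0},
        "doc_b": {"pump": 1.0, "valve": 1.0},
    }
    print(rank_documents(library, ["pump", "hydraulic"]))
```

Raising the weight of large-title words relative to body-text words changes the ordering without touching the similarity function, which is the adjustment the weighting paragraph describes.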
In one possible embodiment, the method further comprises:
Sending the keywords in the query text to the query terminal, so that the query terminal highlights the keywords of the query text that appear in the target fragment text of each document.
Specifically, at the query terminal, the keywords in the query text are displayed separately or highlighted, and the target fragment text found for those keywords is displayed. Words in the target fragment text whose similarity to the keywords of the query text exceeds a preset threshold are highlighted.
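What the query terminal does with the received keywords might look like the following sketch. The marker strings and the function name are hypothetical, and only exact (case-insensitive) occurrences are marked; highlighting words that are merely similar to a keyword would need the similarity measure as well.

```python
import re


def highlight(fragment, keywords, open_mark="<<", close_mark=">>"):
    """Wrap every occurrence of a query keyword in the fragment with markers."""
    keywords = [k for k in keywords if k]  # ignore empty strings
    if not keywords:
        return fragment
    # Longest keywords first so that a longer keyword wins over its prefix.
    alternatives = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in alternatives), re.IGNORECASE)
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", fragment)


if __name__ == "__main__":
    print(highlight("The pump feeds the Pump valve", ["pump"]))
```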
Fig. 3 is a schematic structural diagram of an information processing apparatus according to an embodiment of the present application, which corresponds to the information processing method in Fig. 1. As shown in Fig. 3, the apparatus includes: an acquisition module 301, a sorting module 302, a determining module 303 and a sending module 304.
The acquisition module 301 is configured to acquire a query request sent by a query terminal; the query request carries query text.
The sorting module 302 is configured to sort the documents in the search library according to the keywords in the query text and the first tag of each document stored in the search library.
The determining module 303 is configured to determine, for each document, the target fragment text whose similarity meets a preset requirement according to the second tags corresponding to the fragments in the document.
The sending module 304 is configured to send the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document according to the similarity ranking.
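The work of the determining module can be sketched as follows. The source does not define how the similarity between a fragment's second tag and the query text is measured or what the preset requirement is, so this sketch assumes the similarity is the number of query keywords found in the second tag and the requirement is a minimum count; both the function name and the data layout are hypothetical.

```python
def select_target_fragments(fragments, query_keywords, min_similarity=1):
    """Pick the fragments of one document that satisfy the preset requirement.

    `fragments` is a list of (fragment_text, second_tag) pairs, where
    `second_tag` is a list of words. Returns (fragment_text, score) pairs,
    highest score first.
    """
    query = {k.lower() for k in query_keywords}
    targets = []
    for text, second_tag in fragments:
        # Assumed similarity: count of query keywords present in the tag.
        score = len(query & {w.lower() for w in second_tag})
        if score >= min_similarity:
            targets.append((text, score))
    return sorted(targets, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    fragments = [
        ("Pump overview", ["pump", "hydraulic"]),
        ("Cabin lighting", ["light"]),
    ]
    print(select_target_fragments(fragments, ["Pump"]))
```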
In one possible embodiment, the first tag of the document includes a second tag corresponding to each fragmented text in the document and a third tag corresponding to a large title of the document.
The second tag includes any one or more of the following words: a subtitle keyword and a text keyword of the fragmented text, a first associated word correlated with the subtitle keyword, and a second associated word correlated with the text keyword.
The third tag includes the following words: the large title keyword of the document and a third associated word correlated with the large title keyword.
In a possible embodiment, the first tag of each document stored in the search library, as used by the sorting module, is obtained by the following steps.
For each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text.
For each document in the search library, integrating the second tag of each fragmented text in the document and the third tag corresponding to the large title of the document into the first tag of the document.
In a possible embodiment, the second tag of the fragmented text, as used by the determining module, is determined by the following steps.
Determining at least one keyword of the fragmented text based on the content information of the fragmented text.
Determining the associated words correlated with the keyword according to the similarity between the keyword and each candidate word in the associated word library.
Determining the keyword and the associated words correlated with it as the second tag of the fragmented text.
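The integration step above can be sketched as follows. The source only says that the second tags and the third tag are integrated into the first tag; storing the result as a word-to-weight mapping, the two weight values, and keeping the larger weight when a word appears in both groups are assumptions made for illustration.

```python
def build_first_tag(second_tags, third_tag, second_weight=1.0, third_weight=2.0):
    """Merge the second tags of all fragments and the third tag of the large title.

    Returns a mapping from word to weight. A word found in several tags
    keeps the largest weight it was given (assumed policy).
    """
    first_tag = {}
    for tag in second_tags:
        for word in tag:
            first_tag[word] = max(first_tag.get(word, 0.0), second_weight)
    for word in third_tag:
        first_tag[word] = max(first_tag.get(word, 0.0), third_weight)
    return first_tag


if __name__ == "__main__":
    second_tags = [["pump", "valve"], ["pump", "seal"]]
    third_tag = ["hydraulics", "pump"]
    print(build_first_tag(second_tags, third_tag))
```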
In a possible implementation, fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text includes:
For each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragmented text; and if the document does not include a subtitle, fragmenting the document by paragraph to obtain at least one fragmented text.
In a possible embodiment, when sorting the documents in the search library according to the keywords in the query text and the first tag of each document stored in the search library, the sorting module is specifically configured to:
for each document stored in the search library, calculate the vocabulary similarity between each word in the first label of the document and the keywords in the query text;
calculate the document similarity between the keywords in the query text and each document stored in the search library according to the calculated vocabulary similarity of each word in the first label of the document and the weight of each word in the first label of the document;
and sort the documents in the search library according to the document similarity of each document.
In one possible embodiment, the apparatus further comprises:
A display unit, configured to send the keywords in the query text to the query terminal, so that the query terminal highlights the keywords of the query text that appear in the target fragment text of each document.
Corresponding to the method in Fig. 1, an embodiment of the present application further provides a computer device 400. Fig. 4 is a schematic structural diagram of the computer device provided in the embodiment of the present application. The device includes a memory 401, a processor 402, and a computer program stored in the memory 401 and executable on the processor 402, and the processor 402 implements the information processing method when executing the computer program.
Specifically, the memory 401 and the processor 402 may be a general-purpose memory and processor, which are not limited herein. When the processor 402 runs the computer program stored in the memory 401, the information processing method can be performed, which addresses the problem of inaccurate search results in the prior art.
Corresponding to the information processing method in Fig. 1, an embodiment of the present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the information processing method described above.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is run, the information processing method can be performed, which addresses the problem that search results in the prior art are not accurate enough.
In the embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interface, device or unit, and may be electrical, mechanical or in another form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments provided in the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
It should be noted that: like reference numerals and letters in the following figures denote like items, and thus once an item is defined in one figure, no further definition or explanation of it is required in the following figures, and furthermore, the terms "first," "second," "third," etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above examples are only specific embodiments of the present application and are not intended to limit its scope, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not depart from the spirit and scope of the corresponding technical solutions and are intended to be encompassed within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. A method of information processing, comprising:
acquiring a query request sent by a query terminal, wherein the query request carries a query text;
sorting the documents in a search library by similarity according to the keywords in the query text and a first label of each document stored in the search library, wherein the first label comprises a second label corresponding to each fragmented text in the document, and the second label comprises associated words from an associated word library;
for each document, determining a target fragment text whose similarity meets a preset requirement according to the second label corresponding to each fragmented text in the document;
sending the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document according to the similarity ranking;
wherein the second label corresponding to each fragmented text is determined by:
determining at least one keyword of the fragmented text based on content information of the fragmented text;
determining, according to the similarity between the keyword and each candidate word in the associated word library, each candidate word that meets a preset association threshold as an associated word corresponding to the keyword, wherein a candidate word is a word whose semantics are similar or identical to those of a keyword in the fragmented text;
determining the keyword and the associated words corresponding to the keyword as the second label of the fragmented text;
and wherein sorting the documents in the search library by similarity according to the keywords in the query text and the first label of each document stored in the search library comprises:
for each document stored in the search library, calculating the vocabulary similarity between each word in the first label of the document and the keywords in the query text, wherein the vocabulary similarity represents the degree of semantic closeness between two words;
for each document stored in the search library, calculating the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity of each word in the first label of the document and the weight of each word in the first label of the document;
and sorting the documents in the search library according to the document similarity of each document.
2. The method of claim 1, wherein the first tag of the document further comprises a third tag corresponding to a large title of the document;
the second tag includes any one or more of the following words: a subtitle keyword and a text keyword of the fragmented text, a first associated word correlated with the subtitle keyword, and a second associated word correlated with the text keyword;
the third tag includes the following words: the large title keyword of the document and a third associated word correlated with the large title keyword.
3. The method of claim 1, wherein the first tag of each document stored in the search library is obtained by:
for each document in the search library, fragmenting the document according to a preset segmentation requirement to obtain at least one fragmented text;
and for each document in the search library, integrating the second tag of each fragmented text in the document and the third tag corresponding to the large title of the document into the first tag of the document.
4. The method according to claim 3, wherein fragmenting each document in the search library according to a preset segmentation requirement to obtain at least one fragmented text comprises:
for each document in the search library, if the document includes subtitles, fragmenting the document by subtitle to obtain at least one fragmented text; and if the document does not include a subtitle, fragmenting the document by paragraph to obtain at least one fragmented text.
5. The method according to claim 1, wherein the method further comprises:
sending the keywords in the query text to the query terminal, so that the query terminal highlights the keywords of the query text that appear in the target fragment text of each document.
6. An apparatus for information processing, comprising:
an acquisition module, configured to acquire a query request sent by a query terminal, wherein the query request carries a query text;
a sorting module, configured to sort the documents in a search library by similarity according to the keywords in the query text and a first label of each document stored in the search library, wherein the first label comprises a second label corresponding to each fragmented text in the document, and the second label comprises associated words from an associated word library;
a determining module, configured to determine, for each document, a target fragment text whose similarity meets a preset requirement according to the second label corresponding to each fragmented text in the document;
a sending module, configured to send the target fragment text of each document and the similarity ranking of the documents to the query terminal, so that the query terminal displays the target fragment text of each document according to the similarity ranking;
wherein the determining module is further configured to:
determine at least one keyword of the fragmented text based on content information of the fragmented text;
determine, according to the similarity between the keyword and each candidate word in the associated word library, each candidate word that meets a preset association threshold as an associated word corresponding to the keyword, wherein a candidate word is a word whose semantics are similar or identical to those of a keyword in the fragmented text;
determine the keyword and the associated words corresponding to the keyword as the second label of the fragmented text;
and wherein the sorting module is specifically configured to:
for each document stored in the search library, calculate the vocabulary similarity between each word in the first label of the document and the keywords in the query text, wherein the vocabulary similarity represents the degree of semantic closeness between two words;
for each document stored in the search library, calculate the document similarity between the keywords in the query text and the document according to the calculated vocabulary similarity of each word in the first label of the document and the weight of each word in the first label of the document;
and sort the documents in the search library according to the document similarity of each document.
7. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of the preceding claims 1-5 when the computer program is executed.
8. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the method of any of the preceding claims 1-5.
CN202111143741.1A 2021-09-28 2021-09-28 Information processing method, device, equipment and medium Active CN113806491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143741.1A CN113806491B (en) 2021-09-28 2021-09-28 Information processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN113806491A (en) 2021-12-17
CN113806491B (en) 2024-06-25

Family

ID=78938891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143741.1A Active CN113806491B (en) 2021-09-28 2021-09-28 Information processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113806491B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357511A (en) * 2021-12-30 2022-04-15 北京鼎普科技股份有限公司 A method, device and user terminal for marking key content of a document
CN114328983A (en) * 2021-12-31 2022-04-12 北京索为系统技术股份有限公司 Document shredding method, data retrieval method, device and electronic device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678412A (en) * 2012-09-21 2014-03-26 北京大学 Document retrieval method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190347281A1 (en) * 2016-11-11 2019-11-14 Dennemeyer Octimine Gmbh Apparatus and method for semantic search
CN108038096A (en) * 2017-11-10 2018-05-15 平安科技(深圳)有限公司 Knowledge database documents method for quickly retrieving, application server computer readable storage medium storing program for executing

Also Published As

Publication number Publication date
CN113806491A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
CN108829893B (en) Method and device for determining video label, storage medium and terminal equipment
TWI536181B (en) Language identification in multilingual text
CA2638558C (en) Topic word generation method and system
CN106156204B (en) Text label extraction method and device
TWI431493B (en) Method, computer readable storage medium, and computer system for optimization of fact extraction using a multi-stage approach
CN109634436B (en) Method, device, equipment and readable storage medium for associating input method
US9483557B2 (en) Keyword generation for media content
US9390161B2 (en) Methods and systems for extracting keyphrases from natural text for search engine indexing
CN109241319B (en) Picture retrieval method, device, server and storage medium
US8661049B2 (en) Weight-based stemming for improving search quality
US20130268519A1 (en) Fact verification engine
KR101377114B1 (en) News snippet generation system and method for generating news snippet
CN102043843A (en) Method and obtaining device for obtaining target entry based on target application
CN111460177B (en) Video expression search method and device, storage medium and computer equipment
CN113806491B (en) Information processing method, device, equipment and medium
JP2014106665A (en) Document retrieval device and document retrieval method
CN115794995A (en) Target answer obtaining method and related device, electronic equipment and storage medium
US9164981B2 (en) Information processing apparatus, information processing method, and program
US20230090601A1 (en) System and method for polarity analysis
CN113449063B (en) Method and device for constructing document structure information retrieval library
JP4428703B2 (en) Information retrieval method and system, and computer program
WO2019231635A1 (en) Method and apparatus for generating digest for broadcasting
KR101037091B1 (en) Ontology-based Semantic Retrieval System and Method for Multilingual Authorized Headings through Automatic Language Translation
CN116975202A (en) Document retrieval method, device, equipment and storage medium
CN112269852B (en) Method, system and storage medium for generating public opinion themes

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant