
CN117033744A - Data query method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN117033744A
Authority
CN
China
Prior art keywords
data
sentence
history
sentences
sets
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311009446.6A
Other languages
Chinese (zh)
Inventor
伏勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311009446.6A
Publication of CN117033744A
Legal status: Pending

Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/90 Details of database functions independent of the retrieved data types
              • G06F16/95 Retrieval from the web
                • G06F16/953 Querying, e.g. by the use of web search engines
                • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
            • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
              • G06F16/33 Querying
                • G06F16/335 Filtering based on additional data, e.g. user or group profiles
              • G06F16/35 Clustering; Classification
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N5/00 Computing arrangements using knowledge-based models
            • G06N5/02 Knowledge representation; Symbolic representation
              • G06N5/022 Knowledge engineering; Knowledge acquisition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data query method, a data query device, a storage medium and an electronic device, in the field of artificial intelligence. The method comprises: acquiring query information sent by a user side, determining the preset scene indicated by the query information, and determining the knowledge base for that preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model; acquiring data associated with the query information from the knowledge base for the preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data; and returning the target data and the associated data to the user side. The application addresses the low accuracy and completeness of queried data in the related art.

Description

Data query method and device, storage medium and electronic equipment
Technical Field
The application relates to the field of artificial intelligence, and in particular to a data query method, a data query device, a storage medium and an electronic device.
Background
As the amount of open-source information on the network grows, users retrieve ever larger amounts of data when they query. However, because query results are displayed solely on the basis of the query information the user enters, the displayed data may be incomplete; and when the user's query information is inaccurate, the correct result cannot be returned at all. As a result, the user cannot obtain complete and accurate data.
No effective solution has yet been proposed for the low accuracy and completeness of queried data in the related art.
Disclosure of Invention
The application provides a data query method, a data query device, a storage medium and an electronic device, to solve the problem of low accuracy and completeness of queried data in the related art.
According to one aspect of the application, a data query method is provided. The method comprises: acquiring query information sent by a user side, determining the preset scene indicated by the query information, and determining the knowledge base for that preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model; acquiring data associated with the query information from the knowledge base for the preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data; and returning the target data and the associated data to the user side.
Optionally, the target language model consists of a first language model and a second language model, and the knowledge base is generated as follows: acquiring text information for the preset scene from a public database to obtain a text information set containing M pieces of text information, and preprocessing each piece of text information to obtain M sentence sets; inputting the M sentence sets into the first language model to obtain N groups of associated sentences, wherein the first language model identifies whether an association relation exists between sentences, each group of associated sentences comprises two sentences and the association relation between them, and the two sentences belong either to the same sentence set or to different sentence sets; classifying the N groups of associated sentences according to the attribute information of each group to obtain P associated sentence sets, and inputting the P associated sentence sets into the second language model to obtain Q association relations, each being an association relation between associated sentence sets; and determining the Q association relations and the P associated sentence sets as the knowledge base.
Optionally, the first language model is trained as follows: acquiring a plurality of history sentences in the preset scene and determining the content of each history sentence; determining the association relations among the history sentences according to their content to obtain a plurality of first history association relations; and taking each first history association relation together with its corresponding group of history sentences as one first sample, thereby obtaining a plurality of first samples, and training a first initial language model with these first samples to obtain the first language model.
Optionally, the second language model is trained as follows: acquiring a plurality of history sentences in the preset scene and determining the content of each history sentence; determining the association relations among the history sentences according to their content to obtain a plurality of groups of first history associated sentences; classifying these groups according to the attribute information of each group to obtain a plurality of history associated sentence sets, in which all groups within one set share the same attribute information, and determining the association relations among the history associated sentence sets to obtain a plurality of second history association relations; and taking each second history association relation together with its corresponding group of history associated sentence sets as one second sample, thereby obtaining a plurality of second samples, and training a second initial language model with these second samples to obtain the second language model.
Optionally, preprocessing each piece of text information in the text information set to obtain the M sentence sets comprises: identifying the language type of each piece of text information; splitting each piece of text information into sentences according to the sentence-splitting rules for its language type, obtaining one sentence set per piece of text information; and screening the sentences in each sentence set against a preset dictionary to obtain M screened sentence sets.
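The language identification and sentence-splitting steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the CJK-ratio heuristic in `detect_language`, its 0.3 threshold and the per-language regular expressions in `SPLIT_RULES` are hypothetical choices; the application does not specify the splitting rules.

```python
import re

# Hypothetical sentence-splitting rules per language type (not given in the application).
SPLIT_RULES = {
    "zh": re.compile(r"(?<=[。！？；])"),   # split right after Chinese end punctuation
    "en": re.compile(r"(?<=[.!?])\s+"),    # split on whitespace following . ! ?
}

def detect_language(text: str) -> str:
    """Assumed heuristic: Chinese if over 30% of non-space chars are CJK ideographs."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return "en"
    cjk = sum(1 for c in chars if "\u4e00" <= c <= "\u9fff")
    return "zh" if cjk / len(chars) > 0.3 else "en"

def split_sentences(text: str) -> list[str]:
    """Turn one piece of text information into its sentence set."""
    rule = SPLIT_RULES[detect_language(text)]
    return [s.strip() for s in rule.split(text) if s.strip()]
```

A real deployment would more likely use a dedicated language-identification library and a proper sentence tokenizer; the regular expressions only show where a per-language rule plugs in.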
Optionally, after the sentences in each sentence set are screened through the preset dictionary to obtain the M screened sentence sets, the method further comprises: taking any two target sentences from any two of the M screened sentence sets, and searching each of them in a preset database to obtain two search results; acquiring the uniform resource locator (URL) of each search result and judging whether the two URLs are the same; and, if they are the same, deleting either one of the two target sentences from the M sentence sets to obtain updated M sentence sets.
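A minimal sketch of the URL-based de-duplication, assuming a caller-supplied `search_top_url` function that returns the URL of the top search result for a sentence. That function stands in for the unspecified preset database and is hypothetical. The sketch keeps the first sentence seen for each URL and drops later sentences from other sets that resolve to the same URL.

```python
from typing import Callable

def dedupe_by_url(sentence_sets: list[list[str]],
                  search_top_url: Callable[[str], str]) -> list[list[str]]:
    """Drop a sentence whose search-result URL was already claimed by another set."""
    first_set_for_url: dict[str, int] = {}  # URL -> index of the set that claimed it first
    updated = []
    for idx, sentences in enumerate(sentence_sets):
        kept = []
        for sentence in sentences:
            url = search_top_url(sentence)
            owner = first_set_for_url.setdefault(url, idx)
            if owner == idx:  # URL new, or claimed by this same set: keep the sentence
                kept.append(sentence)
        updated.append(kept)  # sets are never removed, so there are still M of them
    return updated
```

Because `search_top_url` is injected, the sketch can be exercised with a dictionary lookup in place of a real search backend.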
Optionally, acquiring the text information for the preset scene from the public database to obtain the text information set comprises: acquiring all first text information in the public database and acquiring the scene information of the preset scene; calculating, in turn, the matching degree between each piece of first text information and the scene information, and determining each piece of first text information whose matching degree exceeds a preset matching degree as second text information, thereby obtaining a plurality of pieces of second text information; and acquiring the generation time of each piece of second text information, and determining the second text information generated later than a preset time as the text information for the preset scene.
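The two-stage selection (matching degree first, then generation time) might look like the sketch below. The application does not say how the matching degree is computed; Jaccard overlap of word sets is used here purely as an assumed stand-in, and `TextInfo`, `matching_degree` and `select_scene_texts` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TextInfo:
    content: str
    generated_at: datetime  # generation time of the text

def matching_degree(text: str, scene_info: str) -> float:
    """Assumed measure: Jaccard overlap of lower-cased word sets, in [0, 1]."""
    a, b = set(text.lower().split()), set(scene_info.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_scene_texts(first_texts: list[TextInfo], scene_info: str,
                       min_degree: float, after: datetime) -> list[TextInfo]:
    # Stage 1: first text information above the preset matching degree -> second text information.
    second_texts = [t for t in first_texts
                    if matching_degree(t.content, scene_info) > min_degree]
    # Stage 2: keep only second text information generated after the preset time.
    return [t for t in second_texts if t.generated_at > after]
```

An embedding-based similarity would be a natural replacement for `matching_degree`; the filtering logic stays the same.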
According to another aspect of the application, a data query apparatus is provided. The apparatus comprises: a first acquisition unit, configured to acquire query information sent by a user side, determine the preset scene indicated by the query information and determine the knowledge base for the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model; a second acquisition unit, configured to acquire data associated with the query information from the knowledge base for the preset scene to obtain target data, and to acquire data associated with the target data from the knowledge base to obtain associated data; and a return unit, configured to return the target data and the associated data to the user side.
According to another aspect of the application, a computer storage medium is further provided for storing a program, wherein, when the program runs, it controls the device on which the storage medium resides to execute the data query method.
According to another aspect of the application, an electronic device is further provided, comprising one or more processors and a memory, wherein the memory stores computer-readable instructions, the processors are configured to execute the computer-readable instructions, and the instructions perform the data query method when executed.
According to the application, query information sent by a user side is acquired, the preset scene indicated by the query information is determined, and the knowledge base for the preset scene is determined, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model; data associated with the query information is acquired from the knowledge base to obtain target data, and data associated with the target data is acquired from the knowledge base to obtain associated data; and the target data and the associated data are returned to the user side. This solves the problem of low accuracy and completeness of queried data in the related art. Determining the preset scene indicated by the query information selects the corresponding knowledge base, and querying within that knowledge base obtains the requested information accurately. At the same time, the data associated with the retrieved target data is also obtained from the knowledge base, so associated data can be shown to the user even when the user's query information is inaccurate, which improves the accuracy of user queries.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application. In the drawings:
FIG. 1 is a flowchart of a data query method according to an embodiment of the application;
FIG. 2 is a flowchart of an optional knowledge base creation method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a data query device according to an embodiment of the application;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the application.
Detailed Description
It should be noted that, provided there is no conflict, the embodiments of the application and the features in them may be combined with one another. The application is described in detail below with reference to the drawings and in combination with the embodiments.
To enable those skilled in the art to better understand the solution of the application, the technical solutions in the embodiments of the application are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort shall fall within the scope of protection of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the application herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the related information (including but not limited to user device information and user personal information) and data (including but not limited to data for display and analyzed data) involved in the disclosure are information and data authorized by the user or fully authorized by all parties. For example, an interface is provided between the system and the relevant user or institution; before acquiring the relevant information, the system sends an acquisition request to the user or institution through the interface, and acquires the information only after receiving the consent fed back by the user or institution.
It should be noted that the data query method, device, storage medium and electronic device of the disclosure may be used in the field of artificial intelligence or in any other field; their field of application is not limited.
For convenience of description, some terms used in the embodiments of the application are explained below:
LLM: Large Language Model, a deep-learning-based natural language processing model that learns the grammar and semantics of natural language and can thereby generate readable text.
According to an embodiment of the application, a data query method is provided.
FIG. 1 is a flowchart of a data query method according to an embodiment of the application. As shown in FIG. 1, the method comprises the following steps:
Step S101, acquiring query information sent by a user side, determining the preset scene indicated by the query information, and determining the knowledge base for the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model.
Specifically, after the user side sends the query information, the preset scene it indicates is determined first, for example whether the query information belongs to the financial field or to the technical field. The preset scene can be determined from the specific content of the query information: for example, if the keyword "income" is detected in the query content, the query information can be determined to belong to the financial field, and the query is then carried out in the knowledge base for the financial field, which improves the efficiency and accuracy of data acquisition.
After the preset scene is determined, the knowledge base for that scene is acquired. Because this knowledge base is obtained by processing the text information of the preset scene with the target language model, target data related to the scene can be retrieved from it to complete the user's query.
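A minimal sketch of keyword-based scene detection, assuming each preset scene is described by a hand-maintained keyword set. The `SCENE_KEYWORDS` table and the `detect_scene` function are hypothetical; the application only gives the "income" to finance example and does not specify how keywords are stored or scored.

```python
from typing import Optional

# Hypothetical keyword table: preset scene -> keywords that indicate it.
SCENE_KEYWORDS = {
    "finance": {"income", "loan", "interest rate", "deposit"},
    "technology": {"software", "chip", "algorithm"},
}

def detect_scene(query: str, default: Optional[str] = None) -> Optional[str]:
    """Return the scene whose keywords occur most often in the query text."""
    text = query.lower()
    hits = {scene: sum(kw in text for kw in keywords)
            for scene, keywords in SCENE_KEYWORDS.items()}
    best = max(hits, key=hits.get)  # ties go to the scene listed first
    # No keyword matched at all: fall back to the caller's default scene.
    return best if hits[best] > 0 else default
```

The returned scene name would then select which scene-specific knowledge base to query; a trained text classifier could replace the keyword count without changing that interface.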
Step S102, acquiring data associated with the query information from a knowledge base under a preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data.
Specifically, after the target data is acquired from the knowledge base for the preset scene, the data associated with it is acquired from the same knowledge base. Because the association relations are stored in the knowledge base, the associated data can be obtained directly from the relations among the stored data once the target data is known, so the user can still obtain the required data even when the query information entered is vague.
For example, suppose the user actually wants loan information but enters a query about interest rates. After the interest-rate information is displayed, loan information related to interest rates can be displayed alongside it, so the information the user is looking for is still shown even though the user performed only a single query.
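To illustrate steps S102 and S103 together, here is a minimal in-memory sketch, assuming the knowledge base is stored as sentences plus undirected association edges and that target data is found by simple term matching. The `KnowledgeBase` class is hypothetical; a real system would presumably combine the language-model-derived relations with a proper retrieval index.

```python
from collections import defaultdict

class KnowledgeBase:
    """Hypothetical in-memory knowledge base: sentences (target data) plus
    undirected association relations between them."""

    def __init__(self) -> None:
        self.entries: list[str] = []
        self.links: dict[str, set[str]] = defaultdict(set)

    def associate(self, a: str, b: str) -> None:
        """Store both sentences and an association relation between them."""
        for entry in (a, b):
            if entry not in self.entries:
                self.entries.append(entry)
        self.links[a].add(b)
        self.links[b].add(a)

    def query(self, query: str) -> tuple[list[str], list[str]]:
        """Step S102: target data = entries containing any query term;
        associated data = their one-hop neighbours not already in the target data."""
        terms = query.lower().split()
        target = [e for e in self.entries if any(t in e.lower() for t in terms)]
        neighbours = {n for e in target for n in self.links[e]}
        associated = sorted(neighbours - set(target))
        return target, associated  # step S103: both lists are returned to the user side
```

With the interest-rate example, linking an interest-rate sentence to a loan sentence means a query for "interest rate" returns the former as target data and the latter as associated data.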
Step S103, returning the target data and the associated data to the user side.
Specifically, after the target data and the associated data are obtained from the knowledge base, they can be displayed on the user side, completing the user's data query.
With the data query method provided by the embodiment of the application, query information sent by a user side is acquired, the preset scene indicated by the query information is determined, and the knowledge base for the preset scene is determined, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set for the preset scene, acquired from a public database, into a target language model; data associated with the query information is acquired from that knowledge base to obtain target data, and data associated with the target data is acquired from the knowledge base to obtain associated data; and the target data and the associated data are returned to the user side. This solves the problem of low accuracy and completeness of queried data in the related art: querying within the knowledge base selected by the preset scene obtains the requested information accurately, and retrieving the data associated with the target data lets associated data be shown to the user even when the user's query information is inaccurate, improving the accuracy of user queries.
To improve the accuracy of the data stored in the knowledge base and of the association relations, optionally, FIG. 2 is a flowchart of an optional knowledge base creation method according to an embodiment of the application. As shown in FIG. 2, in the data query method provided by the embodiment, the target language model consists of a first language model and a second language model, and the knowledge base used in step S101 is generated as follows:
Step S201, acquiring text information for the preset scene from a public database to obtain a text information set containing M pieces of text information, and preprocessing each piece of text information to obtain M sentence sets.
Step S202, inputting the M sentence sets into the first language model to obtain N groups of associated sentences, wherein the first language model identifies whether an association relation exists between sentences, each group of associated sentences comprises two sentences and the association relation between them, and the two sentences belong either to the same sentence set or to different sentence sets.
Step S203, classifying the N groups of associated sentences according to the attribute information of each group to obtain P associated sentence sets, and inputting the P associated sentence sets into the second language model to obtain Q association relations, each being an association relation between associated sentence sets.
Step S204, determining the Q association relations and the P associated sentence sets as the knowledge base, wherein each sentence in the associated sentence sets is target data.
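Steps S201 to S204 can be sketched as a pipeline with the two language models passed in as plain callables. `first_model`, `attribute_of` and `second_model` are hypothetical stand-ins for the trained models and the attribute classifier. Pairing all pooled sentences reflects the statement that associated sentences may come from the same set or from different sets, at quadratic cost in the number of sentences.

```python
from itertools import combinations
from typing import Callable, Optional

Pair = tuple[str, str, str]  # (sentence_a, sentence_b, association relation)

def build_knowledge_base(
    sentence_sets: list[list[str]],
    first_model: Callable[[str, str], Optional[str]],
    attribute_of: Callable[[Pair], str],
    second_model: Callable[[list[Pair], list[Pair]], Optional[str]],
) -> tuple[dict[str, list[Pair]], list[tuple[str, str, str]]]:
    # S202: pool the M sets so pairs can span the same or different sets.
    pooled = [s for sentence_set in sentence_sets for s in sentence_set]
    pairs: list[Pair] = []
    for a, b in combinations(pooled, 2):
        relation = first_model(a, b)  # None means "no association relation"
        if relation is not None:
            pairs.append((a, b, relation))
    # S203 (first half): group the N pairs by attribute -> P associated sentence sets.
    grouped: dict[str, list[Pair]] = {}
    for pair in pairs:
        grouped.setdefault(attribute_of(pair), []).append(pair)
    # S203 (second half): the second model relates sets -> Q association relations.
    set_relations = []
    for (attr_a, set_a), (attr_b, set_b) in combinations(grouped.items(), 2):
        relation = second_model(set_a, set_b)
        if relation is not None:
            set_relations.append((attr_a, attr_b, relation))
    # S204: the P sets and the Q relations together form the knowledge base.
    return grouped, set_relations
```

In practice the quadratic pairing would likely be pruned (for example by a cheap similarity pre-filter) before calling a language model on each pair.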
Specifically, the knowledge base for the preset scene must be generated before it is used. To generate it, text information for the preset scene is first acquired from the public database to form a text information set. For example, an information acquisition system can be used to search for and retrieve real-time articles from across the web; it finds articles published online by the following options: keywords or phrases, publication date, source domain name, and language. When the preset scene is finance, the information acquisition system may be used to collect 100 news articles on finance-related topics from 2023. The collected texts vary in length from 50 to 4,200 words and together form the text information set, in which each news article is one piece of text information.
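The search options listed above (keyword or phrase, publication date, source domain name, language) can be expressed as a simple filter over candidate articles. `Article` and `search_articles` are hypothetical names; the actual information acquisition system and its interface are not described in the application.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse

@dataclass
class Article:
    url: str
    published: date
    language: str
    text: str

def search_articles(articles: list[Article], keyword: str,
                    start: date, end: date,
                    domains: Optional[set[str]] = None,
                    language: Optional[str] = None) -> list[Article]:
    """Keep articles matching keyword/phrase, publication date range,
    source domain name (if given) and language (if given)."""
    results = []
    for article in articles:
        if keyword.lower() not in article.text.lower():
            continue
        if not start <= article.published <= end:
            continue
        if domains and urlparse(article.url).hostname not in domains:
            continue
        if language and article.language != language:
            continue
        results.append(article)
    return results
```

Each surviving article would become one piece of text information in the set; its length in words could be checked with `len(article.text.split())`.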
After the text information set is obtained, each piece of text information is preprocessed, for example by segmentation and screening, yielding M sentence sets. A sentence may be part of the content of an article, a phrase, a key data item, and so on.
After the M sentence sets are obtained, the first language model determines the association relations among all the sentences in the M sentence sets, yielding N groups of associated sentences. Associated sentences may be similar in content or correlated in content. For example, sentence A may state the loan amount of company X and sentence B the loan interest rate in the region where company X is located; because A and B are correlated, they are determined to be associated sentences.
Further, after the N groups of associated sentences are obtained, they are classified according to the attribute information of each group, which may be, for example, the business scenario or the user group to which the group applies. After classification, the association relations between the categories are determined by the second language model. For example, if an association relation exists between business scenario A and user group B, the groups of associated sentences under scenario A can be linked to those under group B, so that when a user queries the associated sentences under scenario A, those under group B are displayed at the same time, which has the effect of pushing information to the user.
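The group-level push described above can be sketched as a lookup over the set-level relations produced by the second language model. The function name and the `(group_a, group_b, relation)` tuple format are assumptions for illustration only.

```python
def push_related_groups(queried_group: str,
                        groups: dict[str, list[str]],
                        group_relations: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """Return the sentences of every group linked to queried_group.

    groups maps an attribute value (business scenario, user group, ...) to its sentences;
    group_relations holds undirected (group_a, group_b, relation) triples.
    """
    linked = set()
    for a, b, _relation in group_relations:
        if a == queried_group:
            linked.add(b)
        elif b == queried_group:
            linked.add(a)
    # Sorted for a stable display order; unknown group names are skipped.
    return {name: groups[name] for name in sorted(linked) if name in groups}
```

Querying scenario A thus also surfaces the sentences of user group B whenever an A to B relation exists, mirroring the push effect in the text.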
Optionally, in the data query method provided by the embodiment of the application, the first language model is trained as follows: acquiring a plurality of history sentences in the preset scene and determining the content of each history sentence; determining the association relations among the history sentences according to their content to obtain a plurality of first history association relations; and taking each first history association relation together with its corresponding group of history sentences as one first sample, thereby obtaining a plurality of first samples, and training a first initial language model with these first samples to obtain the first language model.
Specifically, to train the first language model, a plurality of history sentences in the preset scene are acquired, and the sentences having association relations are identified from their content by labeling, yielding a plurality of first history association relations. The labeled associated history sentences, together with the association relations between them, are then used as first samples to train the first initial language model, producing the trained first language model, which can accurately determine the associated data of target data.
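Sample construction for the first language model might be sketched as below, assuming the labels are stored as a mapping from an unordered sentence pair to its annotated relation. Emitting unlabeled pairs as `"none"` negatives is an added assumption (a model that decides whether a relation exists plausibly needs them), and the input/output text format is likewise hypothetical.

```python
from itertools import combinations

def build_first_samples(history_sentences: list[str],
                        labels: dict[frozenset, str]) -> list[dict[str, str]]:
    """Turn labelled history sentence pairs into (input, output) training samples.

    labels maps frozenset({a, b}) to the annotated association relation.
    Pairs without a label are emitted with output "none" (assumed negatives).
    """
    samples = []
    for a, b in combinations(history_sentences, 2):
        relation = labels.get(frozenset((a, b)), "none")
        samples.append({"input": f"Sentence 1: {a}\nSentence 2: {b}",
                        "output": relation})
    return samples
```

Such records could feed any supervised fine-tuning routine for the first initial language model; the training step itself is not sketched here.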
Optionally, in the data query method provided by the embodiments of the present application, the second language model is trained as follows: a plurality of history sentences in the preset scene are acquired, and the content of each history sentence is determined; association relations among the history sentences are determined according to their content, yielding multiple groups of first history associated sentences; the groups of first history associated sentences are classified according to the attribute information of each group to obtain a plurality of history associated sentence sets, all groups within one set having the same attribute information, and the association relations among these sets are determined to obtain a plurality of second history association relations; and each second history association relation, together with the group of history associated sentence sets it connects, is taken as one group of second samples, and the resulting groups of second samples are used to train a second initial language model to obtain the second language model.
Specifically, to train the second language model, multiple groups of first history associated sentences are first obtained and classified by attribute information. The attribute information may be, for example, the business scenario in which a group of associated sentences is used, or the user group to which it applies. After classification, the association relations among the resulting history associated sentence sets are determined by labeling. Each such relation and the group of history associated sentence sets it connects form one group of second samples, and the second initial language model is trained on the multiple groups of second samples to obtain the second language model. This likewise allows the data associated with the target data to be determined accurately.
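A minimal sketch of how the second samples might be assembled, assuming each group of first history associated sentences carries a single attribute value (such as its business scenario) and that set-level relations are supplied by annotators keyed by a pair of attribute values. `AssociatedPair`, `group_by_attribute` and `build_second_samples` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociatedPair:
    sentence_a: str
    sentence_b: str
    relation: str
    attribute: str  # assumed field, e.g. business scenario or user group


def group_by_attribute(pairs):
    """Classify groups of associated sentences into sets sharing one attribute value."""
    sets = defaultdict(list)
    for pair in pairs:
        sets[pair.attribute].append(pair)
    return dict(sets)


def build_second_samples(sets, set_annotations):
    """Pair each annotated set-level relation with the two sets it links.

    `set_annotations` maps (attribute_x, attribute_y) to a relation label.
    """
    samples = []
    for (attr_x, attr_y), relation in set_annotations.items():
        samples.append((tuple(sets[attr_x]), tuple(sets[attr_y]), relation))
    return samples
```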
Optionally, in the data query method provided by the embodiments of the present application, preprocessing each piece of text information in the text information set to obtain M sentence sets includes: identifying the language type of each piece of text information in the set; dividing each piece of text information into sentences according to the sentence-division rules corresponding to its language type, to obtain a sentence set for each piece of text information; and screening the sentences in each sentence set through a preset dictionary to obtain M screened sentence sets.
Specifically, when the texts in the acquired text information set are preprocessed, each text may first be divided into several short paragraphs or sentences. The resulting sentences are then screened through a preset dictionary to obtain the screened sentence information. The preset dictionary contains words or phrases with query significance, for example finance-related terms such as "loan amount" and "loan term". A sentence containing no such term is filtered out: for example, if a financial article contains the sentence "Thank you, everyone", that sentence is removed. This ensures that the knowledge base contains only useful information.
For example, where the text is an article, it may be divided into sentences according to the content of the different sentences in its different paragraphs. The resulting sentences are then screened: sentences carrying useful information are retained, while repeated and invalid sentences are deleted, which keeps the information in the knowledge base concise.
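The language identification and sentence-division steps might look like the sketch below. The script-ratio heuristic, its 0.5 threshold and the two rules in `SPLIT_RULES` are illustrative assumptions; a production system would more likely use a dedicated language-identification model and richer segmentation rules.

```python
import re

# Hypothetical sentence-division rules keyed by language type.
SPLIT_RULES = {
    # Split right after Chinese full stop, exclamation, question mark or semicolon.
    "zh": re.compile(r"(?<=[。！？；])"),
    # Split on whitespace that follows an English sentence terminator.
    "en": re.compile(r"(?<=[.!?])\s+"),
}


def identify_language(text):
    """Crude script-based guess: 'zh' if CJK ideographs dominate the letters, else 'en'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "en"
    cjk = sum(1 for ch in letters if "\u4e00" <= ch <= "\u9fff")
    return "zh" if cjk / len(letters) > 0.5 else "en"


def split_sentences(text):
    """Divide one piece of text information using the rule for its language type."""
    rule = SPLIT_RULES[identify_language(text)]
    # Zero-width splits can leave empty strings, so drop blank fragments.
    return [s.strip() for s in rule.split(text) if s.strip()]
```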
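The dictionary screening and removal of repeated sentences described above can be sketched as follows, assuming a hypothetical `PRESET_DICTIONARY` of lower-case finance terms and treating two sentences as repeats only when they match exactly (ignoring case and surrounding whitespace).

```python
# Hypothetical preset dictionary; terms are stored in lower case.
PRESET_DICTIONARY = {"loan amount", "loan term", "interest rate"}


def screen_sentences(sentences, dictionary=PRESET_DICTIONARY):
    """Keep sentences containing at least one dictionary term, dropping exact repeats."""
    kept, seen = [], set()
    for sentence in sentences:
        key = sentence.strip().lower()
        if key in seen:
            continue  # repeated sentence: keep only the first occurrence
        if any(term in key for term in dictionary):
            kept.append(sentence)
            seen.add(key)
        # Sentences with no dictionary term (e.g. pleasantries) are discarded.
    return kept
```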
Optionally, in the data query method provided by the embodiments of the present application, screening the sentences in each sentence set through the preset dictionary to obtain M screened sentence sets includes: taking any two target sentences from any two of the M screened sentence sets, and searching each of them in turn in a preset database to obtain two search results; acquiring the uniform resource locator of each search result, and judging whether the uniform resource locators of the two search results are the same; and, if they are the same, deleting either one of the two target sentences from the M sentence sets to obtain M updated sentence sets.
Specifically, screening deletes invalid sentences according to their content, and repeated sentences must also be found and deleted. To decide whether two sentences are repeats, both may be submitted to a third-party database and searched. Once the search results are obtained, the URLs (Uniform Resource Locators) of the results are compared; if they are the same, the two sentences are treated as repeats and the copy in either sentence set is deleted. This completes the screening of the sentences and reduces redundancy in the generated knowledge base.
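The URL comparison can be sketched as below. The `search` callable stands in for the preset (third-party) database and is hypothetical; it is assumed to return the URL of the top search result, or `None`. Rather than searching every pair of sentences, the sketch looks each sentence up once and records which set first produced each URL; when a sentence in a later set yields the same URL, it is deleted and the earlier occurrence is kept. Following the claim wording, only sentences from different sets are compared.

```python
from typing import Callable, Dict, List, Optional


def dedupe_by_search_url(
    sentence_sets: List[List[str]],
    search: Callable[[str], Optional[str]],
) -> List[List[str]]:
    """Delete a sentence whose top-result URL was already produced by another set."""
    first_set_for_url: Dict[str, int] = {}
    updated: List[List[str]] = []
    for set_index, sentences in enumerate(sentence_sets):
        kept = []
        for sentence in sentences:
            url = search(sentence)  # one lookup per sentence instead of one per pair
            if url is not None:
                owner = first_set_for_url.setdefault(url, set_index)
                if owner != set_index:
                    continue  # same URL as a sentence in an earlier set: treat as a repeat
            kept.append(sentence)
        updated.append(kept)
    return updated
```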
Optionally, in the data query method provided by the embodiments of the present application, acquiring the text information in the preset scene from the public database to obtain the text information set includes: acquiring all first text information in the public database, and acquiring scene information of the preset scene; calculating in turn the matching degree between each piece of first text information and the scene information, and determining the first text information whose matching degree is greater than a preset matching degree as second text information, to obtain a plurality of pieces of second text information; and acquiring the generation time of each piece of second text information, and determining the second text information generated later than a preset time as the text information in the preset scene.
Specifically, to collect the text information set, all first text information in the public database may first be acquired, for example by searching all articles published on the Internet, and the matching degree between each article and the preset scene is calculated in turn to obtain a plurality of matching degrees. The calculation may compare the content, abstract and other information of each article with the characteristics of the preset scene, so that an accurate matching degree is obtained.
After the matching degrees are obtained, the first text information whose matching degree is greater than the preset matching degree is determined as second text information. Of the second text information, only the items generated later than the preset time are kept as the text information in the preset scene, and items generated earlier than the preset time are deleted. Using only recent text in this way keeps the content of the knowledge base timely.
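The patent does not fix a formula for the matching degree. One common choice, shown here purely as an assumption, is the cosine similarity between the bag-of-words term counts of an article and a textual description of the preset scene; `matching_degree` and `select_second_texts` are hypothetical names. The `\w+` tokenizer suits space-delimited languages; Chinese text would need a word segmenter first.

```python
import math
import re
from collections import Counter


def _terms(text):
    """Lower-cased term counts for a piece of text."""
    return Counter(re.findall(r"\w+", text.lower()))


def matching_degree(article_text, scene_description):
    """Cosine similarity between bag-of-words vectors (one possible matching measure)."""
    a, b = _terms(article_text), _terms(scene_description)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_second_texts(first_texts, scene_description, threshold):
    """Keep first text information whose matching degree exceeds the preset threshold."""
    return [t for t in first_texts if matching_degree(t, scene_description) > threshold]
```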
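The timeliness filter reduces to a comparison of generation times. A minimal sketch, assuming each piece of second text information carries a `generated_at` timestamp; the `TextInfo` type and `keep_recent` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class TextInfo:
    content: str
    generated_at: datetime  # assumed to be available, e.g. the publication time


def keep_recent(second_texts: List[TextInfo], preset_time: datetime) -> List[TextInfo]:
    """Keep only texts generated strictly later than the preset time."""
    return [t for t in second_texts if t.generated_at > preset_time]
```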
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a data query device, and the data query device of the embodiment of the application can be used for executing the data query method provided by the embodiment of the application. The following describes a data query device provided by an embodiment of the present application.
Fig. 3 is a schematic diagram of a data query device according to an embodiment of the present application. As shown in fig. 3, the apparatus includes: a first acquisition unit 31, a second acquisition unit 32, and a return unit 33.
The first obtaining unit 31 is configured to obtain query information sent by a user, determine a preset scene indicated by the query information, and determine a knowledge base under the preset scene, where the knowledge base is composed of target data and an association relationship, and the association relationship is obtained by inputting a text information set under the preset scene obtained in the public database into the target language model.
The second obtaining unit 32 is configured to obtain data associated with the query information from a knowledge base under a preset scenario, obtain target data, and obtain data associated with the target data from the knowledge base, thereby obtaining associated data.
The return unit 33 is configured to return the target data and the associated data to the user side.
According to the data query device provided by the embodiments of the present application, the first acquisition unit 31 acquires the query information sent by the user side, determines the preset scene indicated by the query information, and determines the knowledge base under the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting the text information set under the preset scene, acquired from the public database, into the target language model. The second acquisition unit 32 acquires the data associated with the query information from the knowledge base under the preset scene to obtain the target data, and acquires the data associated with the target data from the knowledge base to obtain the associated data. The return unit 33 returns the target data and the associated data to the user side. This solves the problem in the related art that data obtained by a query has low accuracy and completeness. Because the preset scene indicated by the query information is determined first, the knowledge base corresponding to that scene is selected and queried, so the information sought is obtained accurately; at the same time, the data associated with the retrieved target data is also obtained from the knowledge base. The associated data can therefore be presented to the user even when the user's query information is imprecise, which improves the accuracy of information queries.
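The cooperation of the three units can be sketched as a small class. Everything here is a hypothetical simplification: scenes are detected by keyword overlap, target data is any knowledge-base entry sharing a non-stop-word with the query, and the association relations are stored as a plain adjacency map. A real system would presumably rely on the trained language models for these steps.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Illustrative stop-word list so that words like "is" do not count as matches.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "of", "to"}


def _words(text: str) -> set:
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


@dataclass
class KnowledgeBase:
    target_data: List[str]
    # Hypothetical adjacency map: datum -> data associated with it.
    associations: Dict[str, List[str]] = field(default_factory=dict)


class DataQueryDevice:
    """Hypothetical sketch mirroring units 31, 32 and 33."""

    def __init__(self, knowledge_bases: Dict[str, KnowledgeBase],
                 scene_keywords: Dict[str, List[str]]):
        self.knowledge_bases = knowledge_bases
        self.scene_keywords = scene_keywords

    def first_acquire(self, query: str) -> KnowledgeBase:
        """Unit 31: pick the preset scene whose keywords appear in the query."""
        words = _words(query)
        for scene, keywords in self.scene_keywords.items():
            if words & set(keywords):
                return self.knowledge_bases[scene]
        raise LookupError("no preset scene matches the query")

    def second_acquire(self, query: str, kb: KnowledgeBase) -> Tuple[List[str], List[str]]:
        """Unit 32: target data sharing a content word with the query, then its associated data."""
        words = _words(query)
        targets = [d for d in kb.target_data if words & _words(d)]
        associated: List[str] = []
        for datum in targets:
            for linked in kb.associations.get(datum, []):
                if linked not in targets and linked not in associated:
                    associated.append(linked)
        return targets, associated

    def handle(self, query: str) -> Dict[str, List[str]]:
        """Unit 33: return both result lists to the user side."""
        kb = self.first_acquire(query)
        targets, associated = self.second_acquire(query, kb)
        return {"target_data": targets, "associated_data": associated}
```

Returning the associated data alongside the direct matches is what lets the user see related material even when the query wording is imprecise.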
Optionally, in the data query device provided by the embodiments of the present application, the target language model consists of a first language model and a second language model, and the first acquisition unit 31 includes: an acquisition module, configured to acquire text information in the preset scene from the public database to obtain a text information set containing M pieces of text information, and to preprocess each piece of text information in the set to obtain M sentence sets; an input module, configured to input the M sentence sets into the first language model to obtain N groups of associated sentences, wherein the first language model identifies whether an association relation exists between sentences, each group of associated sentences comprises two sentences and one association relation, and the two sentences belong either to the same sentence set or to different sentence sets; a classification module, configured to classify the N groups of associated sentences according to the attribute information of each group to obtain P associated sentence sets, and to input the P associated sentence sets into the second language model to obtain Q association relations, each of which is an association relation among a group of associated sentence sets; and a determining module, configured to determine the Q association relations and the P associated sentence sets as the knowledge base.
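The knowledge-base construction performed by these modules can be outlined as a pipeline in which the two language models are passed in as callables. `build_knowledge_base`, `first_model`, `attribute_of` and `second_model` are hypothetical names, and the callables would wrap trained models in practice.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str, str]  # (sentence_a, sentence_b, relation)


def build_knowledge_base(
    sentence_sets: List[List[str]],
    first_model: Callable[[str, str], Optional[str]],
    attribute_of: Callable[[Pair], str],
    second_model: Callable[[List[Pair], List[Pair]], Optional[str]],
):
    """Return (P associated sentence sets keyed by attribute, Q set-level relations)."""
    # Step 1: score every sentence pair, within or across the M sets -> N associated pairs.
    all_sentences = [s for group in sentence_sets for s in group]
    pairs: List[Pair] = []
    for a, b in combinations(all_sentences, 2):
        relation = first_model(a, b)
        if relation is not None:
            pairs.append((a, b, relation))

    # Step 2: classify the N pairs by attribute -> P associated sentence sets.
    sets: Dict[str, List[Pair]] = defaultdict(list)
    for pair in pairs:
        sets[attribute_of(pair)].append(pair)

    # Step 3: score every pair of sets with the second model -> Q association relations.
    relations = []
    for x, y in combinations(sorted(sets), 2):
        relation = second_model(sets[x], sets[y])
        if relation is not None:
            relations.append((x, y, relation))
    return dict(sets), relations
```

Scoring every sentence pair and every pair of sets is quadratic, so a real deployment would likely pre-filter candidate pairs; the sketch keeps the exhaustive loops for clarity.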
Optionally, in the data query device provided by the embodiment of the present application, the first language model is trained by: acquiring a plurality of history sentences in a preset scene, and determining sentence content of each history sentence; determining association relations among a plurality of history sentences according to sentence contents to obtain a plurality of first history association relations; and taking each first history association relation and a group of history sentences corresponding to the first history association relation as a group of first samples to obtain a plurality of groups of first samples, and training the first initial language model by using the plurality of groups of first samples to obtain a first language model.
Optionally, in the data query device provided by the embodiment of the present application, the second language model is trained by: acquiring a plurality of history sentences in a preset scene, and determining sentence content of each history sentence; determining association relations among a plurality of history sentences according to sentence content to obtain a plurality of groups of first history association sentences; classifying a plurality of groups of first history associated sentences according to the attribute information of each group of first history associated sentences to obtain a plurality of history associated sentence sets, and determining the association relation among the plurality of history associated sentence sets to obtain a plurality of second history associated relations, wherein the attribute information of each group of first history associated sentences in the history associated sentence sets is the same; and taking each second history association relation and a group of history association statement sets corresponding to the second history association relation as a group of second samples to obtain a plurality of groups of second samples, and training a second initial language model by using the plurality of groups of second samples to obtain a second language model.
Optionally, in the data query device provided by the embodiments of the present application, the acquisition module includes: a recognition sub-module, configured to recognize the language type of each piece of text information in the text information set; a sentence-division sub-module, configured to divide each piece of text information into sentences according to the sentence-division rules corresponding to its language type, to obtain a sentence set for each piece of text information; and a screening sub-module, configured to screen the sentences in each sentence set through a preset dictionary to obtain M screened sentence sets.
Optionally, in the data query device provided by the embodiments of the present application, the screening sub-module includes: a searching sub-module, configured to take any two target sentences from any two of the M screened sentence sets and search each of them in turn in a preset database to obtain two search results; a judging sub-module, configured to acquire the uniform resource locator of each search result and judge whether the uniform resource locators of the two search results are the same; and an updating sub-module, configured to delete either one of the two target sentences from the M sentence sets if the uniform resource locators are the same, to obtain M updated sentence sets.
Optionally, in the data query device provided by the embodiments of the present application, the acquisition module includes: an acquisition sub-module, configured to acquire all first text information in the public database and acquire scene information of the preset scene; a computing sub-module, configured to calculate in turn the matching degree between each piece of first text information and the scene information, and to determine the first text information whose matching degree is greater than the preset matching degree as second text information, obtaining a plurality of pieces of second text information; and a determining sub-module, configured to acquire the generation time of each piece of second text information and to determine the second text information generated later than the preset time as the text information in the preset scene.
The data query device includes a processor and a memory, the first acquiring unit 31, the second acquiring unit 32, the returning unit 33, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problem in the related art that queried data has low accuracy and completeness is solved by adjusting kernel parameters.
The memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM), and includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the data query method.
The embodiment of the invention provides a processor which is used for running a program, wherein the data query method is executed when the program runs.
As shown in fig. 4, an embodiment of the present invention provides an electronic device, where the electronic device 40 includes a processor, a memory, and a program stored on the memory and executable on the processor, and when the processor executes the program, the following steps are implemented: acquiring query information sent by a user side, determining a preset scene indicated by the query information, and determining a knowledge base under the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set under the preset scene acquired in a public database into a target language model; acquiring data associated with the query information from a knowledge base under a preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data; and returning the target data and the associated data to the user side. The device herein may be a server, PC, PAD, cell phone, etc.
The application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: acquiring query information sent by a user side, determining a preset scene indicated by the query information, and determining a knowledge base under the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set under the preset scene acquired in a public database into a target language model; acquiring data associated with the query information from a knowledge base under a preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data; and returning the target data and the associated data to the user side.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media, including permanent and non-permanent, removable and non-removable media, may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A method of querying data, comprising:
acquiring query information sent by a user side, determining a preset scene indicated by the query information, and determining a knowledge base under the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting a text information set under the preset scene acquired in a public database into a target language model;
acquiring data associated with the query information from the knowledge base under the preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data;
and returning the target data and the associated data to the user side.
2. The method of claim 1, wherein the target language model consists of a first language model and a second language model, and wherein the knowledge base is generated by:
acquiring text information in the preset scene from a public database to obtain a text information set, and preprocessing each text information in the text information set to obtain M statement sets, wherein the text information set comprises M text information;
inputting the M statement sets into the first language model to obtain N groups of associated statements, wherein the first language model is used for identifying whether the association relationship exists between the statements, each group of associated statements comprises two statements and one association relationship, and the two statements belong to the same statement set or belong to different statement sets;
classifying the N groups of associated sentences according to attribute information of each group of associated sentences to obtain P associated sentence sets, and inputting the P associated sentence sets into a second language model to obtain Q associated relations, wherein each associated relation is an associated relation among a group of associated sentence sets;
and determining the Q association relations and the P associated sentence sets as the knowledge base, wherein each sentence in the associated sentence sets is target data.
3. The method of claim 2, wherein the first language model is trained by:
acquiring a plurality of history sentences in the preset scene, and determining sentence content of each history sentence;
determining the association relation among the plurality of history sentences according to the sentence content to obtain a plurality of first history association relations;
and taking each first history association relation and a group of history sentences corresponding to the first history association relation as a group of first samples to obtain a plurality of groups of first samples, and training a first initial language model by using the plurality of groups of first samples to obtain the first language model.
4. The method of claim 2, wherein the second language model is trained by:
acquiring a plurality of history sentences in a preset scene, and determining sentence content of each history sentence;
determining association relations among the plurality of history sentences according to the sentence content to obtain a plurality of groups of first history association sentences;
classifying the plurality of groups of first history associated sentences according to the attribute information of each group of first history associated sentences to obtain a plurality of history associated sentence sets, and determining the association relations among the plurality of history associated sentence sets to obtain a plurality of second history association relations, wherein the attribute information of each group of first history associated sentences in a history associated sentence set is the same;
and taking each second history association relation and a group of history association statement sets corresponding to the second history association relation as a group of second samples to obtain a plurality of groups of second samples, and training a second initial language model by using the plurality of groups of second samples to obtain the second language model.
5. The method of claim 2, wherein preprocessing each text message in the set of text messages to obtain M sets of sentences comprises:
identifying a language type of each text message in the set of text messages;
sentence dividing is carried out on each text message according to sentence dividing rules corresponding to the language types, and sentence sets of each text message are obtained;
and screening the sentences in each sentence set through a preset dictionary to obtain M screened sentence sets.
6. The method of claim 5, wherein filtering the sentences in each sentence set by a pre-set dictionary to obtain M filtered sentence sets comprises:
acquiring any two target sentences in any two sentence sets from the M filtered sentence sets, and sequentially searching the two target sentences in a preset database to obtain two search results;
acquiring the uniform resource locator of each search result, and judging whether the uniform resource locators of the two search results are the same or not;
and deleting any one target sentence from the M sentence sets under the condition that the uniform resource locators are the same, so as to obtain M updated sentence sets.
7. The method of claim 2, wherein acquiring the text information in the preset scene from the public database to obtain the text information set comprises:
acquiring all first text information in the public database, and acquiring scene information of the preset scene;
sequentially calculating the matching degree between each piece of first text information and the scene information, and determining the first text information whose matching degree is greater than a preset matching degree as second text information, to obtain a plurality of pieces of second text information;
and acquiring the generation time of each piece of second text information, and determining the second text information whose generation time is later than a preset time as the text information in the preset scene.
8. A data query device, comprising:
the first acquisition unit is used for acquiring query information sent by a user side, determining a preset scene indicated by the query information and determining a knowledge base under the preset scene, wherein the knowledge base consists of target data and association relations, and the association relations are obtained by inputting text information sets under the preset scene acquired in a public database into a target language model;
the second acquisition unit is used for acquiring data associated with the query information from a knowledge base under the preset scene to obtain target data, and acquiring data associated with the target data from the knowledge base to obtain associated data;
and the return unit is used for returning the target data and the associated data to the user side.
9. A computer storage medium for storing a program, wherein the program when run controls a device in which the computer storage medium is located to perform the data query method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data query method of any of claims 1-7.
CN202311009446.6A 2023-08-10 2023-08-10 Data query method and device, storage medium and electronic equipment Pending CN117033744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311009446.6A CN117033744A (en) 2023-08-10 2023-08-10 Data query method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117033744A true CN117033744A (en) 2023-11-10

Family

ID=88625864

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311009446.6A Pending CN117033744A (en) 2023-08-10 2023-08-10 Data query method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117033744A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272073A * 2023-11-23 2023-12-22 杭州朗目达信息科技有限公司 Text unit semantic distance pre-calculation method and device, and query method and device
CN117272073B * 2023-11-23 2024-03-08 杭州朗目达信息科技有限公司 Text unit semantic distance precalculation method and device, query method and device

Similar Documents

Publication Publication Date Title
CN109284363B Question answering method and device, electronic equipment and storage medium
CN116911312B Task type dialogue system and implementation method thereof
US11663254B2 System and engine for seeded clustering of news events
WO2019200752A1 Semantic understanding-based point of interest query method, device and computing apparatus
CN111125086B Method, device, storage medium and processor for acquiring data resources
CN110968663B Answer display method and device of question-answering system
CN110968776B Policy knowledge recommendation method, device storage medium and processor
TW201546633A Method and Apparatus of Matching Text Information and Pushing a Business Object
CN112269816A Government affair appointment event correlation retrieval method
CN117688155A Service problem replying method and device, storage medium and electronic equipment
CA2956627A1 System and engine for seeded clustering of news events
CN116756290A Data query method and device, storage medium and electronic equipment
CN117149804A Data processing method, device, electronic equipment and storage medium
CN117290481A Question and answer method and device based on deep learning, storage medium and electronic equipment
CA3051919C Machine learning (ml) based expansion of a data set
CN117033744A Data query method and device, storage medium and electronic equipment
KR20220061388A A recording medium in which the program providing the keyword-item mapping information service of news articles
CN117421397A Question answering method, apparatus, electronic device, and readable storage medium
CN106776654B Data searching method and device
CN114240496A Client mining method, device, equipment and storage medium applied to insurance recommendation
CN110968691B Judicial hotspot determination method and device
KR20220061401A Program for providing stock recommendation information
CN110737851B Hyper-link semantization method, device, equipment and computer readable storage medium
CN117056482A Knowledge graph-based question and answer method and device, processor and electronic equipment
CN119293243A A contract classification method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination