CN115438166A - Keyword and semantic-based searching method, device, equipment and storage medium - Google Patents
- Publication number
- CN115438166A, CN202211202291.3A
- Authority
- CN
- China
- Prior art keywords
- text
- search
- target
- standard text
- semantic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/19—Recognition using electronic means
- G06V30/191—Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G06V30/19173—Classification techniques
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to artificial intelligence technology and discloses a search method, device, equipment and medium based on keywords and semantics. The method comprises: performing text conversion and standardization processing on input content to obtain a standard text; performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to the intention recognition result; performing keyword recall on the standard text by using the target database, and generating a first search set according to the recall result; constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a semantic vector model to obtain a corresponding semantic vector; generating a target vector database according to the target database, comparing and querying the semantic vector against the source vectors in the target vector database, and generating a second search set according to the comparison query result; and ranking and merging the search results in the first search set and the second search set to obtain a target search set. The invention can improve the accuracy of the search results and the search speed.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a search method and device based on keywords and semantics, an electronic device and a computer readable storage medium.
Background
In the digital era, the amount of information people generate is growing explosively. A large amount of news is produced and consumed every day, high-value information is often scattered within it, and people rely on search to retrieve the information they need. Existing search methods have three problems. First, they search mainly by keyword and push content containing the keywords to the user as the search result; results obtained by term matching alone cannot accurately capture the information the user actually needs, so existing search schemes are inaccurate. Second, existing search methods lack semantic understanding, or have only weak semantic understanding, so results are inaccurate or incomplete and it is harder for the user to reach useful information directly. Third, under the current explosion of internet information, keyword matching or semantic analysis over a large amount of information is inefficient, and search results are displayed slowly.
In summary, the prior art has the problems of slow search speed and low accuracy of search results.
Disclosure of Invention
The invention provides a searching method and device based on keywords and semantics, electronic equipment and a computer readable storage medium, and mainly aims to solve the problems of low searching speed and low accuracy of a searching result.
In order to achieve the above object, the present invention provides a search method based on keywords and semantics, comprising:
acquiring input content of a user, performing text conversion on the input content to obtain an input text, and performing standardization processing on the input text to obtain a standard text;
performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to an intention recognition result;
utilizing the target database to recall the keywords of the standard text, and generating a first search set according to a recall result;
constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text;
generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and sequencing and combining the search results in the first search set and the second search set to obtain a target search set.
Optionally, the performing text conversion on the input content to obtain an input text includes:
judging whether the input content is a text, voice or a picture;
when the input content is a text, taking the input content as an input text;
when the input content is voice, performing feature extraction on the input content to obtain voice features, and calculating the voice features by using a preset acoustic model to obtain an input text;
and when the input content is a picture, performing OCR picture character recognition on the input content, and taking a recognition result as an input text.
Optionally, the performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result includes:
acquiring all content documents of the target database, and extracting keywords of the standard text;
performing matching calculation on the content documents and the keywords by using a preset core search formula to obtain a matching score corresponding to each content document;
the matching calculation of the content documents and the keywords is given by the following formula:
$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k + 1)}{f(q_i, D) + k \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein score(D, Q) is the matching score corresponding to the content document D; IDF(q_i) is the inverse document frequency (IDF) of the i-th keyword; n is the number of keywords of the standard text; f(q_i, D) is the term frequency (TF) of the i-th keyword in the content document D; k and b are preset free parameters, optionally k ∈ [1.2, 2.0] and b = 0.75; |D| is the total number of words in the content document D; and avgdl (average document length) is the average length of all the content documents;
and recalling target search documents from the target database according to the matching score, and generating a first search set according to the target search documents.
Optionally, the constructing the sample pair of standard texts includes:
performing word segmentation on the standard text to obtain text segmented words, and repeating segmented words within the text to obtain a first positive sample;
searching for synonyms of the text segmented words by using a preset synonym dictionary, and replacing text segmented words with the synonyms to obtain a second positive sample;
and generating a negative sample of the standard text by random sampling, taking the first positive sample and the second positive sample as positive samples of the standard text, and determining a sample pair of the standard text according to the negative sample and the positive samples.
Optionally, the inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text includes:
performing feature coding on the standard text and the sample pair to obtain a coded text and a coded sample pair;
fully connecting the coding sample pairs by utilizing a multilayer perceptron of the semantic vector model to obtain output sample pairs;
calculating the coded text and the output sample pair by using a preset target function to obtain a function value, and judging whether the function value meets a preset requirement;
computing the function value from the coded text and the output sample pair by using the following formula:
wherein h_1 and h_2 denote arbitrary coding vectors; sim(h_1, h_2) is the cosine similarity of h_1 and h_2; L is the function value; sim(h, h_p) is the cosine similarity between the coded text h and an output positive sample h_p in the output sample pair; sim(h, h_q) is the cosine similarity between the coded text h and an output negative sample h_q in the output sample pair; N is the total number of output positive samples; M is the total number of output negative samples; and τ is a preset temperature coefficient;
if the function value does not meet the preset requirement, correcting the parameter of the semantic vector model;
and if the function value meets the preset requirement, taking an output positive sample in the output sample pair as the semantic vector.
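The formula referenced in this step does not survive in the text. As an assumption consistent with the symbol definitions given above, and not necessarily the exact claimed form, the cosine similarity and an InfoNCE-style objective over N positive and M negative samples could be written as:

```latex
\mathrm{sim}(h_1, h_2) = \frac{h_1^{\top} h_2}{\lVert h_1 \rVert \, \lVert h_2 \rVert}

L = -\log \frac{\sum_{p=1}^{N} e^{\mathrm{sim}(h, h_p)/\tau}}
               {\sum_{p=1}^{N} e^{\mathrm{sim}(h, h_p)/\tau} + \sum_{q=1}^{M} e^{\mathrm{sim}(h, h_q)/\tau}}
```

Under this form, L approaches zero as the coded text moves closer to its positive samples and away from its negative samples, which matches the stated purpose of checking the function value against a preset requirement.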
Optionally, the generating a target vector database according to the target database includes:
extracting target data in the target database, and performing feature conversion on the target data to obtain a feature vector corresponding to the target data;
and storing the characteristic vectors into a preset Milvus database to obtain a target vector database.
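The storage and comparison query steps can be illustrated with a small in-memory stand-in. This is a sketch only: it does not use the Milvus API, and the class name `InMemoryVectorStore`, its methods and the brute-force cosine search are assumptions introduced for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: stores source vectors by id."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def insert(self, doc_id: str, vector: Sequence[float]) -> None:
        self._vectors[doc_id] = list(vector)

    def search(self, query: Sequence[float], top_k: int = 3) -> List[Tuple[str, float]]:
        """Compare the query vector with every stored vector and return the best matches."""
        scored = [(doc_id, cosine(query, vec)) for doc_id, vec in self._vectors.items()]
        scored.sort(key=lambda item: -item[1])
        return scored[:top_k]


store = InMemoryVectorStore()
store.insert("a", [1.0, 0.0])
store.insert("b", [0.0, 1.0])
store.insert("c", [1.0, 1.0])
second_search_set = store.search([1.0, 0.1], top_k=2)
```

A production vector database would replace the linear scan with an approximate nearest-neighbour index; the scan is kept here only because it makes the comparison step explicit.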
Optionally, the ranking and combining the search results in the first search set and the second search set to obtain a target search set includes:
removing the same search results in the first search set and the second search set to obtain a target search result;
setting the weight of the target search result according to the ordering of the first search set and the second search set;
and reordering the target search results according to the weight, and generating a target search set according to the reordered search results.
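The deduplication, weighting and reordering steps above can be sketched as follows. The text does not say how the weights are derived from the orderings, so the reciprocal-rank weighting, the per-set factors `w_first` and `w_second`, and the function name `merge_results` are all assumptions made for illustration.

```python
from typing import Dict, List


def merge_results(
    first: List[str],
    second: List[str],
    w_first: float = 0.5,
    w_second: float = 0.5,
) -> List[str]:
    """Merge two ranked lists of result ids into one deduplicated, reordered list."""
    weights: Dict[str, float] = {}
    for set_weight, ranked in ((w_first, first), (w_second, second)):
        for rank, doc_id in enumerate(ranked, start=1):
            # Assumed weighting: a result earns more weight the higher it ranks,
            # and a result present in both sets accumulates weight from each.
            weights[doc_id] = weights.get(doc_id, 0.0) + set_weight / rank
    # Using a dict keyed by id removes duplicates; ties are broken by id for stability.
    return sorted(weights, key=lambda doc_id: (-weights[doc_id], doc_id))


target_search_set = merge_results(["a", "b", "c"], ["b", "d"])
```

In this example `b` appears in both sets and therefore moves to the top, ahead of `a`, which ranked first in only one set.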
In order to solve the above problems, the present invention further provides a search device based on keywords and semantics, the device comprising:
the standard text generation module is used for acquiring input contents of a user, performing text conversion on the input contents to obtain input texts, and performing standardization processing on the input texts to obtain standard texts;
the target database selection module is used for performing intention recognition on the standard text and extracting a target database corresponding to the standard text according to an intention recognition result;
the first search set generation module is used for recalling the keywords of the standard text by using the target database and generating a first search set according to a recall result;
the semantic vector calculation module is used for constructing a sample pair of the standard text, inputting the standard text and the sample pair into a preset semantic vector model, and obtaining a semantic vector corresponding to the standard text;
the second search set generation module is used for generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and the target search set generation module is used for sequencing and combining the search results in the first search set and the second search set to obtain a target search set.
In order to solve the above problem, the present invention also provides an electronic device, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the keyword and semantic based search method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one computer program is stored, the at least one computer program being executed by a processor in an electronic device to implement the keyword and semantic-based search method described above.
The embodiment of the invention improves the semantic comprehension capability of search by combining semantic search and keyword search, wherein the processes of keyword search and semantic search are performed in parallel, and a first search set generated by keyword search and a second search set generated by semantic search are combined and ordered when the final search result is output to obtain the output target search result; the intention recognition is carried out on the standard text, the target database corresponding to the standard text is confirmed, and further analysis is carried out according to the target database, so that the data analysis amount is reduced, and the searching speed and efficiency are improved; by recalling the keywords in the target database and comparing and querying the semantic vector generated by the semantic vector model with the source vector in the vector database, the search result meeting the conditions is obtained, and the accuracy and the search speed of the search result are improved. Therefore, the searching method, the searching device, the electronic equipment and the computer readable storage medium based on the keywords and the semantics can solve the problems of low searching speed and low accuracy of the searching result.
Drawings
FIG. 1 is a schematic flow chart of a search method based on keywords and semantics according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of performing keyword recall on the standard text by using the target database and generating a first search set according to the recall result, according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of constructing a sample pair of the standard text according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of a searching apparatus based on keywords and semantics according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device implementing the keyword and semantic-based search method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
The embodiment of the application provides a search method based on keywords and semantics. The execution subject of the keyword and semantic-based search method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the keyword and semantic based search method may be performed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like. The server may be an independent server, or may be a cloud server that provides basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like.
FIG. 1 is a schematic flow chart of a search method based on keywords and semantics according to an embodiment of the present invention. In this embodiment, the search method based on keywords and semantics includes:
S1, obtaining input content of a user, performing text conversion on the input content to obtain an input text, and performing standardization processing on the input text to obtain a standard text.
In the embodiment of the present invention, the input content may include text input, voice input, picture input, and the like. Because keyword analysis and semantic analysis in the embodiment of the invention operate on text content, the input content is converted into text to facilitate the subsequent keyword analysis and semantic analysis.
In the embodiment of the present invention, the performing text conversion on the input content to obtain an input text includes:
judging whether the input content is a text, a voice or a picture;
when the input content is a text, taking the input content as an input text;
when the input content is voice, performing feature extraction on the input content to obtain voice features, and calculating the voice features by using a preset acoustic model to obtain an input text;
and when the input content is a picture, performing OCR picture character recognition on the input content, and taking a recognition result as an input text.
In the embodiment of the invention, when the user inputs text, voice or pictures at the terminal, the triggered interfaces are different. For example, when a user inputs text content, the transmission of the input text is completed through a search box interface; when the user inputs the voice content, the voice monitoring interface is called to complete the transmission of the input voice; and when the user inputs the picture content, calling a picture uploading interface to finish the transmission of the input picture. Therefore, the type of the input content can be judged through the transmission interface of the input content.
In the embodiment of the present invention, methods for extracting features from the input content include Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC), which aim to turn each frame of the waveform into a multi-dimensional vector containing sound information. The acoustic model may be a Hidden Markov Model (HMM), which computes phoneme information from the voice features. Probability prediction is then performed using a preset dictionary and the phoneme information, and the text with the highest probability in the prediction result is taken as the input text. The dictionary is the correspondence between characters or words and phonemes; for example, for Chinese it is the correspondence between pinyin and Chinese characters, and for English it is the correspondence between phonetic symbols and words.
In the embodiment of the present invention, the standardization processing of the input text may include, but is not limited to, traditional-to-simplified Chinese conversion, special character recognition, sensitive word filtering, letter case conversion, content error correction, fuzzy word recognition, and the like. By standardizing the input text, the embodiment of the invention reduces noise in the standard text and improves its regularity and accuracy.
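A few of the standardization steps listed above can be sketched in a short function. This is a minimal illustration under stated assumptions: the word lists `SENSITIVE_WORDS` and `CORRECTIONS` are hypothetical, error correction is reduced to a lookup table, and traditional-to-simplified Chinese conversion is omitted because it needs an external mapping table.

```python
import re
import unicodedata

# Hypothetical word lists; a real system would load these from configuration.
SENSITIVE_WORDS = {"badword"}
CORRECTIONS = {"serach": "search"}


def normalize(text: str) -> str:
    """Return a standardized form of the raw input text."""
    # NFKC folds full-width letters and digits into their ordinary forms.
    text = unicodedata.normalize("NFKC", text)
    # Case conversion: compare everything in lower case.
    text = text.lower()
    # Special characters: replace anything that is not a word character or whitespace.
    text = re.sub(r"[^\w\s]", " ", text)
    tokens = []
    for token in text.split():
        token = CORRECTIONS.get(token, token)  # table-based error correction
        if token not in SENSITIVE_WORDS:       # sensitive word filtering
            tokens.append(token)
    return " ".join(tokens)


standard_text = normalize("  Serach  ENGINE!! badword ")
```

Splitting on whitespace suits space-delimited languages only; Chinese input would need a word segmenter before the filtering and correction steps.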
In the embodiment of the invention, because the content a user inputs may be incomplete, input suggestions can be offered based on the content entered so far while the input content of the user is being acquired, which helps the user retrieve the required content more quickly and improves the user experience.
In this embodiment of the present invention, before obtaining the input content of the user, the method further includes:
receiving first input content in the input content, and performing prefix matching on the first input content to obtain first matching content;
when the first matching content does not have the content required by the user, receiving second input content after the first input content, and carrying out infix matching on the second input content to obtain second matching content;
when the second matching content does not have the content required by the user, receiving third input content after the second input content, and performing field-level matching on the third input content to obtain third matching content;
and when the third matching content does not have the content required by the user, receiving subsequent input content after the third input content, and splicing the first input content, the second input content, the third input content and the subsequent input content to obtain the input content of the user.
In the embodiment of the invention, after the first matching content, the second matching content or the third matching content is obtained, if the content required by the user exists, the matching content selected by the user is used as the input content of the user.
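The staged suggestion matching above can be sketched as a fallback chain. This simplification is an assumption: the stages here fall through when a stage returns nothing, whereas the described flow advances when the user does not pick any offered match and keeps typing. The corpus `ENTRIES` and the function name `suggest` are hypothetical.

```python
from typing import Dict, List

# Hypothetical suggestion corpus: each entry has a title and a tag field.
ENTRIES: List[Dict[str, str]] = [
    {"title": "fund performance", "tags": "finance report"},
    {"title": "mutual fund fees", "tags": "finance cost"},
    {"title": "weather today", "tags": "forecast rain"},
]


def suggest(typed: str) -> List[str]:
    """Try prefix, then infix, then field-level matching; return the first non-empty result."""
    typed = typed.lower()
    # Stage 1: prefix matching against titles.
    prefix = [e["title"] for e in ENTRIES if e["title"].startswith(typed)]
    if prefix:
        return prefix
    # Stage 2: infix matching anywhere inside titles.
    infix = [e["title"] for e in ENTRIES if typed in e["title"]]
    if infix:
        return infix
    # Stage 3: field-level matching against a separate field (whole tag words).
    return [e["title"] for e in ENTRIES if typed in e["tags"].split()]
```

For instance, `suggest("fee")` finds nothing by prefix and falls back to the infix stage, while `suggest("rain")` is resolved only at the field level.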
S2, performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to an intention recognition result.
In the embodiment of the invention, when the existing massive content data is faced, in order to improve the search efficiency based on the text, the data is classified according to the type of the data content, and then a plurality of databases of different types of labels are obtained. Therefore, when searching based on text, it is first necessary to determine a target database corresponding to the text content according to the text content.
In the embodiment of the present invention, the performing intent recognition on the standard text and extracting a target database corresponding to the standard text according to an intent recognition result includes:
performing word segmentation and part-of-speech tagging on the standard text to obtain text segmented words, and searching in a preset intention dictionary according to the text segmented words;
when the text segmentation words are retrieved from the intention dictionary, extracting intention labels corresponding to the text segmentation words from the intention dictionary;
when the text participles are not retrieved from the intention dictionary, calculating semantic similarity between the text participles and a plurality of historical search texts in a preset intention dictionary, and taking an intention label corresponding to the historical search texts with semantic similarity meeting preset conditions as an intention label corresponding to the text participles;
matching the intention labels corresponding to the text participles with database labels corresponding to a plurality of preset databases, and selecting a target database from the databases according to a matching result.
In the embodiment of the invention, the intention dictionary is obtained by performing machine self-learning according to a historical search text and a search result corresponding to the historical search text, and the intention dictionary comprises the historical search text and an intention label corresponding to the historical search text. Further, the standard text can be identified to obtain more than one intention, so that each intention label can have a respective weight, and the target intention label is further selected according to the weight of the intention label.
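The dictionary lookup with a similarity fallback can be sketched as below. The data in `INTENT_DICT`, `HISTORY` and `DATABASES` is hypothetical, and token-overlap (Jaccard) similarity is used only as a simple stand-in for the semantic similarity the text describes; the threshold value is likewise an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical intention dictionary: segmented word -> intention label.
INTENT_DICT: Dict[str, str] = {"fund": "finance", "stock": "finance", "rain": "weather"}
# Hypothetical historical search texts with their intention labels.
HISTORY: Dict[str, str] = {
    "share price today": "finance",
    "will it snow tomorrow": "weather",
}
# Hypothetical mapping from database label to database name.
DATABASES: Dict[str, str] = {"finance": "finance_db", "weather": "weather_db"}


def jaccard(a: List[str], b: List[str]) -> float:
    """Token-overlap similarity, standing in for semantic similarity."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pick_database(tokens: List[str], threshold: float = 0.2) -> Optional[str]:
    """Select a target database for the segmented standard text, or None if no intent fits."""
    # Step 1: direct lookup of each segmented word in the intention dictionary.
    for token in tokens:
        if token in INTENT_DICT:
            return DATABASES[INTENT_DICT[token]]
    # Step 2: fall back to the most similar historical search text.
    best_label, best_score = None, 0.0
    for text, label in HISTORY.items():
        score = jaccard(tokens, text.split())
        if score > best_score:
            best_label, best_score = label, score
    if best_label is not None and best_score >= threshold:
        return DATABASES[best_label]
    return None
```

The sketch returns on the first dictionary hit; the weighted selection among several intention labels mentioned above would replace that early return with a scoring step.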
In another optional embodiment of the present invention, a text feature of the standard text may be further calculated, and the text feature is input into a previously trained LSTM + attention network model, BERT network classification model, and the like to perform calculation, so as to obtain probabilities corresponding to different intentions, thereby determining the intention corresponding to the standard text.
S3, performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result.
In the embodiment of the invention, an Elasticsearch engine can be adopted to perform keyword recall on the standard text. Elasticsearch is a distributed, scalable, real-time keyword-based search engine, and the standard text can be scored through the core search formula of the Elasticsearch engine, thereby realizing the recall of search results.
Referring to FIG. 2, in the embodiment of the present invention, the performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result includes:
S21, acquiring all content documents of the target database, and extracting keywords of the standard text;
S22, performing matching calculation on the content documents and the keywords by using a preset core search formula to obtain a matching score corresponding to each content document;
S23, recalling target search documents from the target database according to the matching score, and generating a first search set according to the target search documents.
Specifically, the matching calculation of the content documents and the keywords is given by the following formula:
$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k + 1)}{f(q_i, D) + k \left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein score(D, Q) is the matching score corresponding to the content document D; IDF(q_i) is the inverse document frequency (IDF) of the i-th keyword; n is the number of keywords of the standard text; f(q_i, D) is the term frequency (TF) of the i-th keyword in the content document D; k and b are preset free parameters, optionally k ∈ [1.2, 2.0] and b = 0.75; |D| is the total number of words in the content document D; and avgdl (average document length) is the average length of all the content documents.
According to the method and the device, each content document in the target database is scored through a core search formula to obtain the matching degree of each content document to the standard text, so that the recalled content documents can be sequenced by taking the matching score as a sequencing condition, and the obtained contents at the front in the first search set are more in line with the user requirements.
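The scoring and ranking described above can be sketched in plain Python as a BM25-style computation using the parameters k, b, |D| and avgdl defined in the text. The IDF variant below (the smoothed logarithmic form commonly used by Lucene-based engines) is an assumption, since the text does not define IDF, and the function name `bm25_scores` is hypothetical.

```python
import math
from typing import List


def bm25_scores(
    docs: List[List[str]], query: List[str], k: float = 1.2, b: float = 0.75
) -> List[float]:
    """Return one matching score per tokenized content document for the query keywords."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length
    scores = []
    for doc in docs:
        score = 0.0
        for q in query:
            df = sum(1 for d in docs if q in d)  # documents containing the keyword
            # Assumed IDF form: ln(1 + (N - df + 0.5) / (df + 0.5)), always non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            tf = doc.count(q)  # term frequency f(q_i, D)
            score += idf * tf * (k + 1) / (tf + k * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores


docs = [
    ["fund", "fee", "fund"],
    ["weather", "rain"],
    ["fund", "news", "today", "market"],
]
scores = bm25_scores(docs, ["fund"])
# Rank document indices by descending matching score to form the first search set.
first_search_set = sorted(range(len(docs)), key=lambda i: -scores[i])
```

The first document ranks highest because it mentions the keyword twice and is no longer than average; the document without the keyword scores zero.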
In the embodiment of the invention, keyword recall is performed by the Elasticsearch search engine; because the calculation is based on the degree of keyword matching, its logic is simple, which improves search efficiency.
S4, constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text.
In the embodiment of the invention, the semantic vector model can be a contrastive learning model, which pulls each sample closer to its positive samples and pushes it away from its negative samples; the model is corrected according to the value of the objective function during training, and finally a semantic vector meeting the function requirement is output. Positive samples are semantically similar samples, and negative samples are semantically dissimilar samples.
Referring to FIG. 3, in the embodiment of the present invention, the constructing a sample pair of the standard text includes:
S31, performing word segmentation on the standard text to obtain text segmented words, and repeating segmented words within the text to obtain a first positive sample;
S32, searching for synonyms of the text segmented words by using a preset synonym dictionary, and replacing text segmented words with the synonyms to obtain a second positive sample;
S33, generating a negative sample of the standard text by random sampling, taking the first positive sample and the second positive sample as positive samples of the standard text, and determining a sample pair of the standard text according to the negative sample and the positive samples.
In embodiments of the invention, inserting randomly selected tokens into a sentence may introduce additional noise, which may distort the meaning of the sentence, or deleting a keyword from a sentence may substantially change its semantic meaning. Therefore, the embodiment of the invention avoids the condition of influencing semantics through the methods of word segmentation repetition and synonym replacement, so that the text enhancement is safer.
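The two augmentation methods and the random negative sampling can be sketched as follows. The synonym table `SYNONYMS`, the function names, and the choice to repeat a single randomly chosen word are assumptions for illustration; the sketch also assumes a non-empty token list and a corpus containing at least one other text.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical synonym dictionary.
SYNONYMS: Dict[str, str] = {"buy": "purchase", "cheap": "inexpensive"}


def repeat_tokens(tokens: List[str], rng: random.Random) -> List[str]:
    """First positive sample: duplicate one randomly chosen segmented word in place."""
    i = rng.randrange(len(tokens))
    return tokens[: i + 1] + [tokens[i]] + tokens[i + 1 :]


def replace_synonyms(tokens: List[str]) -> List[str]:
    """Second positive sample: swap each word for its synonym when one is known."""
    return [SYNONYMS.get(t, t) for t in tokens]


def build_sample_pair(
    tokens: List[str], corpus: List[List[str]], seed: int = 0
) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (positive samples, negative samples) for the segmented standard text."""
    rng = random.Random(seed)
    positives = [repeat_tokens(tokens, rng), replace_synonyms(tokens)]
    # Negative sample: a randomly drawn text from the corpus other than the input itself.
    candidates = [c for c in corpus if c != tokens]
    negatives = [rng.choice(candidates)]
    return positives, negatives


tokens = ["buy", "cheap", "fund"]
corpus = [["weather", "today"], ["buy", "cheap", "fund"]]
positives, negatives = build_sample_pair(tokens, corpus)
```

Both positive samples keep every original word (or a synonym of it), which is why these augmentations are less likely to shift the meaning than random insertion or deletion.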
In the embodiment of the present invention, the inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text includes:
performing feature encoding on the standard text and the sample pair to obtain an encoded text and an encoded sample pair;
passing the encoded sample pair through the fully connected layers of a multilayer perceptron of the semantic vector model to obtain an output sample pair;
evaluating a preset objective function on the encoded text and the output sample pair to obtain a function value, and judging whether the function value meets a preset requirement;
if the function value does not meet the preset requirement, correcting the parameters of the semantic vector model;
and if the function value meets the preset requirement, taking the output positive sample in the output sample pair as the semantic vector.
In the embodiment of the invention, the purpose of passing the encoded sample pairs through the fully connected layers of the multilayer perceptron is to project the two groups of encoded vectors into a common space for contrastive learning.
In the embodiment of the present invention, the encoded text and the output sample pair may be calculated by the following formula:
wherein $h_1$ and $h_2$ denote any two encoded vectors; $\mathrm{sim}(h_1, h_2)$ is the cosine similarity of $h_1$ and $h_2$; $L$ is the function value; $\mathrm{sim}(h, h_p)$ is the cosine similarity between the encoded text $h$ and the output positive sample $h_p$ of the output sample pair; $\mathrm{sim}(h, h_q)$ is the cosine similarity between the encoded text $h$ and the output negative sample $h_q$ of the output sample pair; $N$ is the total number of output positive samples; $M$ is the total number of output negative samples; and $\tau$ is a preset temperature coefficient.
In the embodiment of the invention, by constructing the sample pair of the standard text and the semantic vector model, the semantic capture capability of text input is expanded, and the original semantic search effect is improved.
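The expression of the objective function is not reproduced in this text, so the sketch below is only one plausible reading of the listed symbols: an InfoNCE-style contrastive loss over cosine similarities scaled by the temperature coefficient. The exact form of the loss, the function names, and the default temperature are assumptions, not the formula of the embodiment.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def contrastive_loss(h, positives, negatives, tau=0.05):
    """Assumed InfoNCE-style loss.

    h         -- encoded text
    positives -- the N output positive samples h_p
    negatives -- the M output negative samples h_q
    tau       -- temperature coefficient
    The loss is small when h is closer to the positives than to the negatives.
    """
    pos = sum(math.exp(cosine_sim(h, p) / tau) for p in positives)
    neg = sum(math.exp(cosine_sim(h, q) / tau) for q in negatives)
    return -math.log(pos / (pos + neg))
```

Under this reading, training would adjust the model parameters until the loss falls below a preset threshold, which corresponds to the function value meeting the preset requirement.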
S5, generating a target vector database according to the target database, performing a comparison query of the semantic vector against the source vectors in the target vector database, and generating a second search set according to the result of the comparison query.
In the embodiment of the invention, a Milvus search engine can be adopted to realize the comparison query based on the semantic vector, and the process of performing similarity search by the Milvus search engine is divided into database vector storage and vector query.
The process of storing the database vector is a process of generating a target vector database according to the target database.
In this embodiment of the present invention, the generating a target vector database according to the target database includes:
extracting target data in the target database, and performing feature conversion on the target data to obtain a feature vector corresponding to the target data;
and storing the characteristic vectors into a preset Milvus database to obtain a target vector database.
In the embodiment of the invention, when the feature vector is stored, the feature vector can be stored in a partitioned mode according to the attribute of the feature vector. When the comparison query is carried out, the query can be directly carried out in the storage area corresponding to the target vector database through the vector attributes corresponding to the semantic vectors, and therefore the query speed and efficiency are improved.
The embodiment of the invention can realize the comparison query by calculating the vector similarity between the semantic vector and the source vectors in the target vector database. The vector similarity calculation method includes, but is not limited to, cosine similarity, the Pearson correlation coefficient, and Euclidean distance.
Further, the embodiment of the invention can extract the corresponding content documents from the target database according to the similarity calculation result and sort the content documents to obtain the second search set.
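The partitioned storage and similarity query described above can be sketched with a small in-memory stand-in. This is deliberately not the Milvus API: the class `PartitionedVectorStore`, its methods, and the partition names are hypothetical and only illustrate how restricting the query to one partition narrows the comparison.

```python
import math
from collections import defaultdict

def cosine_sim(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class PartitionedVectorStore:
    """In-memory stand-in for a vector database; this is not the Milvus API."""

    def __init__(self):
        # attribute (partition name) -> list of (doc_id, vector)
        self._partitions = defaultdict(list)

    def insert(self, partition, doc_id, vector):
        """Store a feature vector in the partition matching its attribute."""
        self._partitions[partition].append((doc_id, vector))

    def search(self, partition, query, top_k=3):
        """Compare the query only against vectors in the given partition."""
        scored = [
            (cosine_sim(query, vec), doc_id)
            for doc_id, vec in self._partitions.get(partition, [])
        ]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:top_k]]
```

The ranked `(doc_id, score)` list returned by `search` plays the role of the second search set once the identifiers are resolved to content documents.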
S6, ranking and combining the search results in the first search set and the second search set to obtain a target search set.
In the embodiment of the present invention, the first search set and the second search set both include a plurality of content documents, and when performing terminal display, the content documents in the two search sets need to be integrated to obtain a target search set that needs to be displayed finally.
In this embodiment of the present invention, the ranking and combining the search results in the first search set and the second search set to obtain a target search set includes:
removing the same search results in the first search set and the second search set to obtain a target search result;
setting the weight of the target search result according to the ordering of the first search set and the second search set;
and reordering the target search results according to the weight, and generating a target search set according to the reordered search results.
In the embodiment of the present invention, the weight of each target search result may be set according to its position in the original ranking, with results ranked nearer the front receiving greater weights than results ranked further back. After the weights are generated, the target search results can be sorted by weight.
In another optional embodiment of the present invention, the first search set may be placed directly in the first half of the target search set and the second search set in the second half; alternatively, the matching degree between the standard text and each search result in the first search set and the second search set may be calculated, and the search results ranked by matching degree to obtain the target search set.
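One way to realize the de-duplication and rank-based weighting is sketched below. The reciprocal-rank weight and the choice to add the weights of a result that appears in both sets are assumptions made for illustration; the text only requires that earlier-ranked results receive larger weights.

```python
def merge_search_sets(first, second):
    """Merge two ranked lists of document ids into one ranked list.

    Each result gets weight 1/rank in the list it came from (an assumed
    weighting); a result present in both lists is kept once and its
    weights are summed, so duplicates are removed and agreement is rewarded.
    """
    weights = {}
    for results in (first, second):
        for rank, doc_id in enumerate(results, start=1):
            weights[doc_id] = weights.get(doc_id, 0.0) + 1.0 / rank
    # Sort by descending weight; ties are broken by id for determinism.
    return sorted(weights, key=lambda d: (-weights[d], d))
```

For instance, with `first = ["a", "b", "c"]` and `second = ["b", "d"]`, document `b` collects weight from both lists and moves to the top of the merged ranking.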
The embodiment of the invention improves the semantic understanding capability of search by combining semantic search with keyword search. Keyword search and semantic search are performed in parallel, and the first search set generated by keyword search and the second search set generated by semantic search are combined and ranked when the final result is output, yielding the target search set. Intention recognition is performed on the standard text to determine the corresponding target database, and further analysis is restricted to that target database, which reduces the amount of data to be analyzed and improves search speed and efficiency. By performing keyword recall in the target database and comparing the semantic vector generated by the semantic vector model against the source vectors in the vector database, search results that satisfy the conditions are obtained, which improves both the accuracy of the results and the search speed. Therefore, the keyword and semantic based search method can solve the problems of low search speed and low accuracy of search results.
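The parallel execution of the two searches can be sketched with a thread pool. Both search functions are toy placeholders: `keyword_search` matches shared words and `semantic_search` uses a hypothetical related-term map (`RELATED_TERMS`) in place of a semantic vector model. The combination step follows the simpler optional embodiment, placing keyword results first and then the remaining semantic results.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical related-term map standing in for a semantic vector model.
RELATED_TERMS = {"car": {"automobile", "vehicle"}}

def keyword_search(query, docs):
    """Placeholder keyword recall: documents sharing a word with the query."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def semantic_search(query, docs):
    """Placeholder semantic recall: also match hypothetical related terms."""
    words = set(query.lower().split())
    expanded = set(words)
    for w in words:
        expanded |= RELATED_TERMS.get(w, set())
    return [d for d in docs if expanded & set(d.lower().split())]

def hybrid_search(query, docs):
    """Run both searches concurrently, then combine without duplicates."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        first_future = pool.submit(keyword_search, query, docs)
        second_future = pool.submit(semantic_search, query, docs)
        first_set = first_future.result()
        second_set = second_future.result()
    seen = set(first_set)
    return first_set + [d for d in second_set if d not in seen]
```

Because the two searches do not depend on each other, the overall latency is bounded by the slower of the two rather than by their sum.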
Fig. 4 is a functional block diagram of a search apparatus based on keywords and semantics according to an embodiment of the present invention.
The keyword and semantic based search apparatus 100 according to the present invention may be installed in an electronic device. According to the implemented functions, the search apparatus 100 based on keywords and semantics may include a standard text generation module 101, a target database selection module 102, a first search set generation module 103, a semantic vector calculation module 104, a second search set generation module 105, and a target search set generation module 106. The module of the present invention, which may also be referred to as a unit, refers to a series of computer program segments that can be executed by a processor of an electronic device and that can perform a fixed function, and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the standard text generation module 101 is configured to obtain input content of a user, perform text conversion on the input content to obtain an input text, and perform standardization processing on the input text to obtain a standard text;
the target database selection module 102 is configured to perform intention recognition on the standard text, and extract a target database corresponding to the standard text according to an intention recognition result;
the first search set generating module 103 is configured to perform keyword recall on the standard text by using the target database, and generate a first search set according to a recall result;
the semantic vector calculation module 104 is configured to construct a sample pair of the standard text, and input the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text;
the second search set generating module 105 is configured to generate a target vector database according to the target database, perform comparison query on the semantic vector and a source vector in the target vector database, and generate a second search set according to a result of the comparison query;
the target search set generating module 106 is configured to rank and combine the search results in the first search set and the second search set to obtain a target search set.
In detail, when the modules in the search apparatus 100 based on keywords and semantics according to the embodiment of the present invention are used, the same technical means as the search method based on keywords and semantics described in the drawings can be used, and the same technical effects can be produced, which is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device implementing a keyword and semantic based search method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11, a communication bus 12 and a communication interface 13, and may further comprise a computer program, such as a keyword and semantic based search program, stored in the memory 11 and executable on the processor 10.
In some embodiments, the processor 10 may be composed of an integrated circuit, for example, a single packaged integrated circuit, or may be composed of a plurality of integrated circuits packaged with the same function or different functions, and includes one or more Central Processing Units (CPUs), a microprocessor, a digital Processing chip, a graphics processor, a combination of various control chips, and the like. The processor 10 is a Control Unit (Control Unit) of the electronic device, connects various components of the electronic device by using various interfaces and lines, and executes various functions and processes data of the electronic device by running or executing programs or modules (e.g., executing a search program based on keywords and semantics, etc.) stored in the memory 11 and calling data stored in the memory 11.
The memory 11 includes at least one type of readable storage medium including flash memory, removable hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device, for example a removable hard disk of the electronic device. The memory 11 may also be an external storage device of the electronic device in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device. The memory 11 may be used not only to store application software installed in the electronic device and various types of data, such as codes of a search program based on keywords and semantics, etc., but also to temporarily store data that has been output or is to be output.
The communication bus 12 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
The communication interface 13 is used for communication between the electronic device and other devices, and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (e.g., a Wi-Fi interface, a Bluetooth interface, etc.), which are typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit such as a keyboard, and optionally a standard wired interface or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is used for displaying information processed in the electronic device and for displaying a visualized user interface.
Fig. 5 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 5 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device may further include a power supply (such as a battery) for supplying power to each component. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that functions such as charge management, discharge management, and power consumption management are realized through the power management device. The power supply may also include one or more DC or AC power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and any other such components. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The keyword and semantic based search program stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, can implement:
acquiring input content of a user, performing text conversion on the input content to obtain an input text, and performing standardization processing on the input text to obtain a standard text;
performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to an intention recognition result;
performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result;
constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text;
generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and ranking and combining the search results in the first search set and the second search set to obtain a target search set.
Specifically, the specific implementation method of the instruction by the processor 10 may refer to the description of the relevant steps in the embodiment corresponding to the drawings, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a read-only memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
acquiring input content of a user, performing text conversion on the input content to obtain an input text, and performing standardization processing on the input text to obtain a standard text;
performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to an intention recognition result;
performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result;
constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text;
generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and ranking and combining the search results in the first search set and the second search set to obtain a target search set.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
The embodiment of the application can acquire and process related data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain the best result.
Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.
Claims (10)
1. A search method based on keywords and semantics, the method comprising:
acquiring input content of a user, performing text conversion on the input content to obtain an input text, and performing standardization processing on the input text to obtain a standard text;
performing intention recognition on the standard text, and extracting a target database corresponding to the standard text according to an intention recognition result;
performing keyword recall on the standard text by using the target database, and generating a first search set according to a recall result;
constructing a sample pair of the standard text, and inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text;
generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and ranking and combining the search results in the first search set and the second search set to obtain a target search set.
2. The keyword and semantic-based search method according to claim 1, wherein the text-converting the input content to obtain an input text comprises:
judging whether the input content is a text, a voice or a picture;
when the input content is a text, taking the input content as an input text;
when the input content is voice, performing feature extraction on the input content to obtain voice features, and calculating the voice features by using a preset acoustic model to obtain an input text;
and when the input content is a picture, performing OCR picture character recognition on the input content, and taking a recognition result as an input text.
3. The keyword and semantic based search method according to claim 1, wherein the keyword recalling the standard text with the target database, and generating a first search set according to a recall result comprises:
acquiring all content documents of the target database, and extracting keywords of the standard text;
matching calculation is carried out on the content documents and the keywords by using a preset core search formula to obtain a matching score corresponding to each content document;
the matching calculation of the content documents and the keywords is given by the following formula:
$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k + 1)}{f(q_i, D) + k\left(1 - b + b \cdot \frac{|D|}{\mathrm{avgdl}}\right)}$$
wherein $\mathrm{score}(D, Q)$ is the matching score corresponding to the content document $D$; $\mathrm{IDF}(q_i)$ is the inverse document frequency (IDF) of the $i$-th keyword; $n$ is the number of keywords of the standard text; $f(q_i, D)$ is the term frequency (TF) of the $i$-th keyword in the document $D$; $k$ and $b$ are preset free parameters; $|D|$ is the total word count of the content document $D$; and $\mathrm{avgdl}$ (average document length) is the average length of all the content documents;
and recalling target search documents from the target database according to the matching scores, and generating a first search set from the target search documents.
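The symbols defined in claim 3 match the widely used Okapi BM25 ranking function. The sketch below assumes that form; the IDF variant with the added 1 inside the logarithm, the whitespace tokenization, and the default values of `k` and `b` are illustrative assumptions rather than values given in the text.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k=1.5, b=0.75):
    """Score every document against the query terms with BM25 (assumed form)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]  # stand-in for a segmenter
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for q in query_terms:
            # Smoothed IDF variant that stays non-negative (an assumption).
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            denom = tf[q] + k * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[q] * (k + 1) / denom
        scores.append(score)
    return scores
```

Documents would then be recalled in descending order of score, for example by keeping those above a threshold or the top few results.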
4. The keyword and semantic based search method according to claim 1, wherein the constructing of the sample pairs of standard texts comprises:
performing word segmentation on the standard text to obtain text word segments, and repeating some of the text word segments to obtain a first positive sample;
looking up synonyms of the text word segments in a preset synonym dictionary, and replacing the text word segments with the synonyms to obtain a second positive sample;
and generating a negative sample of the standard text by random sampling, taking the first positive sample and the second positive sample as positive samples of the standard text, and determining the sample pair of the standard text from the negative sample and the positive samples.
5. The method for searching based on keywords and semantics of claim 1, wherein the step of inputting the standard text and the sample pair into a preset semantic vector model to obtain a semantic vector corresponding to the standard text comprises:
performing feature encoding on the standard text and the sample pair to obtain an encoded text and an encoded sample pair;
passing the encoded sample pair through the fully connected layers of a multilayer perceptron of the semantic vector model to obtain an output sample pair;
evaluating a preset objective function on the encoded text and the output sample pair to obtain a function value, and judging whether the function value meets a preset requirement;
computing the encoded text and the output sample pair using:
wherein $h_1$ and $h_2$ denote any two encoded vectors; $\mathrm{sim}(h_1, h_2)$ is the cosine similarity of $h_1$ and $h_2$; $L$ is the function value; $\mathrm{sim}(h, h_p)$ is the cosine similarity between the encoded text $h$ and the output positive sample $h_p$ of the output sample pair; $\mathrm{sim}(h, h_q)$ is the cosine similarity between the encoded text $h$ and the output negative sample $h_q$ of the output sample pair; $N$ is the total number of output positive samples; $M$ is the total number of output negative samples; and $\tau$ is a preset temperature coefficient;
if the function value does not meet the preset requirement, correcting the parameters of the semantic vector model;
and if the function value meets the preset requirement, taking the output positive sample in the output sample pair as the semantic vector.
6. The keyword and semantic based search method according to claim 1, wherein the generating a target vector database from the target database comprises:
extracting target data in the target database, and performing feature conversion on the target data to obtain a feature vector corresponding to the target data;
and storing the characteristic vectors into a preset Milvus database to obtain a target vector database.
7. The keyword and semantic based search method according to any one of claims 1 to 6, wherein the ranking and combining of the search results in the first search set and the second search set to obtain a target search set comprises:
removing the same search results in the first search set and the second search set to obtain a target search result;
setting the weight of the target search result according to the ordering of the first search set and the second search set;
and reordering the target search results according to the weight, and generating a target search set according to the reordered search results.
8. A keyword and semantic based search apparatus, the apparatus comprising:
the standard text generation module is used for acquiring input contents of a user, performing text conversion on the input contents to obtain input texts, and performing standardization processing on the input texts to obtain standard texts;
the target database selection module is used for performing intention recognition on the standard text and extracting a target database corresponding to the standard text according to an intention recognition result;
the first search set generation module is used for performing keyword recall on the standard text by using the target database and generating a first search set according to a recall result;
the semantic vector calculation module is used for constructing a sample pair of the standard text, inputting the standard text and the sample pair into a preset semantic vector model, and obtaining a semantic vector corresponding to the standard text;
the second search set generation module is used for generating a target vector database according to the target database, comparing and querying the semantic vector with a source vector in the target vector database, and generating a second search set according to a comparison query result;
and the target search set generation module is used for ranking and combining the search results in the first search set and the second search set to obtain a target search set.
9. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the keyword and semantic-based search method according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, wherein the computer program, when executed by a processor, implements the keyword and semantic-based search method according to any one of claims 1 to 7.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211202291.3A CN115438166A (en) | 2022-09-29 | 2022-09-29 | Keyword and semantic-based searching method, device, equipment and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211202291.3A CN115438166A (en) | 2022-09-29 | 2022-09-29 | Keyword and semantic-based searching method, device, equipment and storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN115438166A (en) | 2022-12-06 |
Family
ID=84251560
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211202291.3A Pending CN115438166A (en) | 2022-09-29 | 2022-09-29 | Keyword and semantic-based searching method, device, equipment and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN115438166A (en) |
Cited By (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115858939A (en) * | 2022-12-31 | 2023-03-28 | 企知道网络技术有限公司 | Method, system and storage medium for recalling in-line |
| CN115905498A (en) * | 2022-12-26 | 2023-04-04 | 上海浦东发展银行股份有限公司 | Data retrieval method, device, equipment and storage medium |
| CN116049267A (en) * | 2022-12-26 | 2023-05-02 | 上海朗晖慧科技术有限公司 | A method for searching and displaying chemicals with multi-dimensional intelligent identification |
| CN116089598A (en) * | 2023-02-13 | 2023-05-09 | 合肥工业大学 | A green knowledge recommendation method based on feature similarity and user demand |
| CN116150497A (en) * | 2023-02-28 | 2023-05-23 | 北京百度网讯科技有限公司 | Text information recommendation method, device, electronic device and storage medium |
| CN116189193A (en) * | 2023-04-25 | 2023-05-30 | 杭州镭湖科技有限公司 | Data storage visualization method and device based on sample information |
| CN116450916A (en) * | 2023-04-11 | 2023-07-18 | 平安科技(深圳)有限公司 | Information query method and device based on fixed-segment classification, electronic equipment and medium |
| CN117093601A (en) * | 2023-08-31 | 2023-11-21 | 北京百度网讯科技有限公司 | Structured data recall methods, devices, equipment and media |
| CN117150144A (en) * | 2023-10-30 | 2023-12-01 | 南通苏鹏计算机技术有限公司 | A search engine optimization method based on big data |
| CN117149992A (en) * | 2023-08-24 | 2023-12-01 | 百度在线网络技术(北京)有限公司 | Data processing methods, devices, equipment and media |
| CN117235121A (en) * | 2023-11-15 | 2023-12-15 | 华北电力大学 | An energy big data query method and system |
| CN117235137A (en) * | 2023-11-10 | 2023-12-15 | 深圳市一览网络股份有限公司 | A method and device for occupational information query based on vector database |
| CN117271851A (en) * | 2023-11-22 | 2023-12-22 | 北京小米移动软件有限公司 | Vertical category search method and device, search system, storage medium |
| CN117971838A (en) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | Vector data storage method, query method, device, equipment and storage medium |
| CN118154279A (en) * | 2024-03-27 | 2024-06-07 | 珠海洋羊网络科技有限公司 | Automobile accessory searching method, device, equipment and medium based on HMM model |
| WO2024174865A1 (en) * | 2023-02-22 | 2024-08-29 | 抖音视界有限公司 | Search method and apparatus, and device and storage medium |
| CN118733744A (en) * | 2024-08-30 | 2024-10-01 | 深圳高灯计算机科技有限公司 | Intelligent question-answering method, device, computer equipment, and readable storage medium |
| CN118820407A (en) * | 2024-09-19 | 2024-10-22 | 清华大学 | Hybrid retrieval method and device for lifecycle stream data based on large language model |
| CN119396988A (en) * | 2025-01-03 | 2025-02-07 | 北京数科网维技术有限责任公司 | A hybrid document search method, device and equipment |
| CN120336511A (en) * | 2025-01-13 | 2025-07-18 | 科大讯飞股份有限公司 | Terminology standardization method, device, electronic device and storage medium |
| CN120849535A (en) * | 2025-09-22 | 2025-10-28 | 华院计算技术(上海)股份有限公司 | Large-screen content recall method, device, storage medium, and program product |
- 2022-09-29: CN application CN202211202291.3A (publication CN115438166A), status: active, Pending
Cited By (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115905498A (en) * | 2022-12-26 | 2023-04-04 | 上海浦东发展银行股份有限公司 | Data retrieval method, device, equipment and storage medium |
| CN116049267A (en) * | 2022-12-26 | 2023-05-02 | 上海朗晖慧科技术有限公司 | A method for searching and displaying chemicals with multi-dimensional intelligent identification |
| CN115858939A (en) * | 2022-12-31 | 2023-03-28 | 企知道网络技术有限公司 | Method, system and storage medium for recalling in-line |
| CN116089598B (en) * | 2023-02-13 | 2024-03-19 | 合肥工业大学 | A green knowledge recommendation method based on feature similarity and user demand |
| CN116089598A (en) * | 2023-02-13 | 2023-05-09 | 合肥工业大学 | A green knowledge recommendation method based on feature similarity and user demand |
| WO2024174865A1 (en) * | 2023-02-22 | 2024-08-29 | 抖音视界有限公司 | Search method and apparatus, and device and storage medium |
| CN116150497A (en) * | 2023-02-28 | 2023-05-23 | 北京百度网讯科技有限公司 | Text information recommendation method, device, electronic device and storage medium |
| CN116450916B (en) * | 2023-04-11 | 2026-01-06 | 平安科技(深圳)有限公司 | Information retrieval methods, devices, electronic equipment, and media based on segmented and hierarchical classification |
| CN116450916A (en) * | 2023-04-11 | 2023-07-18 | 平安科技(深圳)有限公司 | Information query method and device based on fixed-segment classification, electronic equipment and medium |
| CN116189193B (en) * | 2023-04-25 | 2023-11-10 | 杭州镭湖科技有限公司 | Data storage visualization method and device based on sample information |
| CN116189193A (en) * | 2023-04-25 | 2023-05-30 | 杭州镭湖科技有限公司 | Data storage visualization method and device based on sample information |
| CN117149992A (en) * | 2023-08-24 | 2023-12-01 | 百度在线网络技术(北京)有限公司 | Data processing methods, devices, equipment and media |
| CN117093601B (en) * | 2023-08-31 | 2025-07-25 | 北京百度网讯科技有限公司 | Recall method, device, equipment and medium for structured data |
| CN117093601A (en) * | 2023-08-31 | 2023-11-21 | 北京百度网讯科技有限公司 | Structured data recall methods, devices, equipment and media |
| CN117150144B (en) * | 2023-10-30 | 2023-12-29 | 南通苏鹏计算机技术有限公司 | Search engine optimization method based on big data |
| CN117150144A (en) * | 2023-10-30 | 2023-12-01 | 南通苏鹏计算机技术有限公司 | A search engine optimization method based on big data |
| CN117235137A (en) * | 2023-11-10 | 2023-12-15 | 深圳市一览网络股份有限公司 | A method and device for occupational information query based on vector database |
| CN117235137B (en) * | 2023-11-10 | 2024-04-02 | 深圳市一览网络股份有限公司 | Professional information query method and device based on vector database |
| CN117235121A (en) * | 2023-11-15 | 2023-12-15 | 华北电力大学 | An energy big data query method and system |
| CN117235121B (en) * | 2023-11-15 | 2024-02-20 | 华北电力大学 | An energy big data query method and system |
| CN117271851A (en) * | 2023-11-22 | 2023-12-22 | 北京小米移动软件有限公司 | Vertical category search method and device, search system, storage medium |
| CN118154279A (en) * | 2024-03-27 | 2024-06-07 | 珠海洋羊网络科技有限公司 | Automobile accessory searching method, device, equipment and medium based on HMM model |
| CN117971838A (en) * | 2024-03-29 | 2024-05-03 | 苏州元脑智能科技有限公司 | Vector data storage method, query method, device, equipment and storage medium |
| CN117971838B (en) * | 2024-03-29 | 2024-06-07 | 苏州元脑智能科技有限公司 | Vector data storage method, query method, device, equipment and storage medium |
| CN118733744A (en) * | 2024-08-30 | 2024-10-01 | 深圳高灯计算机科技有限公司 | Intelligent question-answering method, device, computer equipment, and readable storage medium |
| CN118820407A (en) * | 2024-09-19 | 2024-10-22 | 清华大学 | Hybrid retrieval method and device for lifecycle stream data based on large language model |
| CN119396988A (en) * | 2025-01-03 | 2025-02-07 | 北京数科网维技术有限责任公司 | A hybrid document search method, device and equipment |
| CN120336511A (en) * | 2025-01-13 | 2025-07-18 | 科大讯飞股份有限公司 | Terminology standardization method, device, electronic device and storage medium |
| CN120849535A (en) * | 2025-09-22 | 2025-10-28 | 华院计算技术(上海)股份有限公司 | Large-screen content recall method, device, storage medium, and program product |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN115438166A (en) | | Keyword and semantic-based searching method, device, equipment and storage medium |
| CN108304375B (en) | | Information identification method and equipment, storage medium and terminal thereof |
| WO2020224097A1 (en) | | Intelligent semantic document recommendation method and device, and computer-readable storage medium |
| CN113434636A (en) | | Semantic-based approximate text search method and device, computer equipment and medium |
| CN111428488A (en) | | Resume data information analysis and matching method, device, electronic device and medium |
| CN110737774A (en) | | Book knowledge graph construction method, book recommendation method, device, equipment and medium |
| CN113761125B (en) | | Dynamic summary determination method and device, computing device and computer storage medium |
| CN111930792A (en) | | Data resource labeling method and device, storage medium and electronic equipment |
| CN112926308B (en) | | Methods, devices, equipment, storage media and program products for matching text |
| CN112380866A (en) | | Text topic label generation method, terminal device and storage medium |
| CN113326702A (en) | | Semantic recognition method and device, electronic equipment and storage medium |
| CN113268615A (en) | | Resource label generation method and device, electronic equipment and storage medium |
| CN112632264A (en) | | Intelligent question and answer method and device, electronic equipment and storage medium |
| CN111325033A (en) | | Entity identification method, entity identification device, electronic equipment and computer readable storage medium |
| CN116186220A (en) | | Information retrieval method, question and answer processing method, information retrieval device and system |
| CN115335819B (en) | | Methods and systems for searching and retrieving information |
| CN114118082A (en) | | Resume retrieval method and device |
| CN115878849A (en) | | Video tag association method and device and electronic equipment |
| CN117609418B (en) | | Document processing method, device, electronic device and storage medium |
| CN112149424A (en) | | Semantic matching method, apparatus, computer equipment and storage medium |
| CN113076740A (en) | | Synonym mining method and device in government affair service field |
| CN114417869A (en) | | Entity identification method, apparatus, electronic device, and computer-readable storage medium |
| CN116701680B (en) | | Intelligent matching methods, devices, and equipment based on text and images |
| CN117009170A (en) | | Training sample generation method, device, equipment and storage medium |
| CN114201607A (en) | | Information processing method and device |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |