CN118689900B

CN118689900B - A method of operating Elasticsearch based on SQL

Info

Publication number: CN118689900B
Application number: CN202411156408.8A
Authority: CN
Inventors: 黄春鹏; 冯春旭
Original assignee: Beijing Meiluokesi Technology Co ltd
Current assignee: Beijing Meiluokesi Technology Co ltd
Priority date: 2024-08-22
Filing date: 2024-08-22
Publication date: 2024-12-20
Anticipated expiration: 2044-08-22
Also published as: CN118689900A

Abstract

The invention relates to the technical field of data processing, and in particular to a method for operating Elasticsearch based on SQL (Structured Query Language). The method comprises the following steps:

- obtaining an SQL statement fragment input by a user;
- parsing the fragment to obtain a plurality of original keywords;
- inputting the original keywords into a target DeepFM model to obtain a plurality of candidate keyword combinations and their corresponding probabilities;
- determining, from the candidate keyword combinations, the push keyword combinations to be pushed to the user;
- completing the SQL statement fragment with keywords from the push keyword combinations to form SQL statements, and obtaining the SQL statement the user selects;
- processing the selected SQL statement to obtain the corresponding Elasticsearch operation logic.

The invention simplifies data operations and the development process for Elasticsearch and improves Elasticsearch query efficiency.

Description

Method for operating Elasticsearch based on SQL

Technical Field

The invention relates to the technical field of data processing, and in particular to a method for operating Elasticsearch based on SQL.

Background

With the advent of the big data era, non-relational (NoSQL) databases have received increasing attention. They handle large-scale data sets, highly concurrent access and flexible data models well. Elasticsearch is a popular open-source distributed search and analytics engine built on Lucene. It provides full-text search, highlighting, autocompletion, aggregation and other functions, and is commonly used for complex search tasks such as log analysis and real-time data analysis.

Conventional relational database management systems (RDBMS) widely use SQL as their standard query language. Because SQL is structured and easy to understand and use, it has become the de facto standard for database operations. Elasticsearch, however, is a non-relational database. Its query language is a JSON-based domain-specific language (DSL) that differs significantly from SQL. Developers accustomed to SQL must spend extra time and effort learning this new query language. There is therefore a natural demand for operating on Elasticsearch data through SQL statements.

Chinese patent publication No. CN107153535A discloses a method and apparatus for operating Elasticsearch. The method comprises the following steps:

- obtaining the data to be operated on from upstream of Elasticsearch;
- converting that data, via SQL, into target operation data whose data structure corresponds to Elasticsearch;
- packaging the target operation data and the type of the conversion operation into an Action class;
- operating Elasticsearch by means of the Action class.

That approach has the following problem. The converted operation data must be packaged into an Action class, and Elasticsearch is then operated through that class. This makes the approach inflexible and its queries inefficient.

Disclosure of Invention

The invention therefore provides a method for operating Elasticsearch based on SQL. It addresses the low query efficiency of prior-art approaches that operate Elasticsearch through SQL.

To achieve the above object, the present invention provides a method for operating Elasticsearch based on SQL, comprising:

Step S1, obtaining an SQL statement fragment input by a user;

Step S2, parsing the SQL statement fragment according to a preset statement parsing method to obtain a plurality of original keywords;

Step S3, inputting the plurality of original keywords into a target DeepFM model to obtain a plurality of candidate keyword combinations output by the target DeepFM model and the probability corresponding to each candidate keyword combination;

Step S4, determining, from the plurality of candidate keyword combinations, a plurality of push keyword combinations to be pushed to the user, based on the probabilities corresponding to the candidate keyword combinations;

Step S5, completing the SQL statement fragment with keywords from each push keyword combination to form SQL statements, and pushing these statements to the user to obtain the SQL statement selected by the user;

Step S6, processing the SQL statement selected by the user according to a preset mapping rule to obtain the corresponding Elasticsearch operation logic, and determining this Elasticsearch operation logic as the target operation logic.

Further, in step S3, the target DeepFM model is obtained as follows:

Step S31, obtaining two kinds of data from historical data: the initial keyword combination corresponding to each SQL statement fragment input by a plurality of users, and a plurality of intermediate keyword combinations corresponding to the SQL statements each user went on to input. Each initial keyword combination comprises a plurality of initial keywords;

Step S32, generating a plurality of training samples from the initial keyword combinations;

Step S33, if an intermediate keyword combination contains all the initial keywords of an initial keyword combination, using that intermediate keyword combination as a sample label of the training sample corresponding to the initial keyword combination;

Step S34, training an initial DeepFM model with each training sample and its corresponding sample labels to obtain the target DeepFM model.

Further, step S4 comprises:

Step S41, traversing the probability corresponding to each candidate keyword combination, and determining each probability greater than a preset probability threshold as a key probability;

Step S42, determining the candidate keyword combinations corresponding to the key probabilities as the push keyword combinations to be pushed to the user, so as to obtain a plurality of push keyword combinations.

Further, step S5 comprises:

Step S51, comparing each push keyword in each push keyword combination with the original keywords to determine a plurality of distinguishing keywords;

Step S52, performing statement analysis on the distinguishing keywords to obtain a plurality of distinguishing statement fragments;

Step S53, concatenating the SQL statement fragment with each distinguishing statement fragment to obtain a plurality of SQL statements;

Step S54, pushing the plurality of SQL statements to the user to obtain the SQL statement selected by the user.

Further, step S6 comprises:

Step S61, parsing the SQL statement selected by the user according to a preset SQL parsing method to obtain the semantic information corresponding to that SQL statement;

Step S62, converting the semantic information into the corresponding Elasticsearch operation logic based on a preset mapping rule, and determining that Elasticsearch operation logic as the target operation logic.

Further, step S2 comprises:

Step S21, obtaining a target instruction according to the SQL statement fragment. The target instruction comprises at least the name tag of a target task type, and the target task type is any one of a plurality of preset task types;

Step S22, inputting the SQL statement fragment and the target instruction into a target information extraction model to obtain the extraction result output by that model;

Step S23, determining a plurality of original keywords according to the extraction result.

Further, in step S22, the target information extraction model is obtained as follows:

Step S221, obtaining a plurality of original text sets. Each original text set has a corresponding original task type, which is one of the plurality of preset task types, and each original text set comprises a plurality of original texts;

Step S222, obtaining a corresponding target training sample from each original text in the plurality of original text sets, so as to obtain a target training sample set;

Step S223, training a preset large language model with the target training sample set to obtain the target information extraction model.

Further, in step S222, each target training sample is obtained as follows:

Step S2221, determining the task template corresponding to each original text set according to the original task type of that set. Each preset task type has a corresponding task template. Each task template has a plurality of slots, and each slot is to be filled with the name tag of the corresponding preset task type;

Step S2222, filling the name tag of the original task type corresponding to each original text into the slots of the corresponding task template to obtain the task instruction corresponding to that original text;

Step S2223, concatenating each original text with its corresponding task instruction to obtain the corresponding target training sample.

Further, in step S61, the preset SQL parsing method includes parsing with the parse method of the CCJSqlParserUtil class.

Further, before step S41, the method further comprises:

Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset number threshold, re-obtaining the target DeepFM model through steps S31 to S34. The candidate keyword combinations output by the target DeepFM model, and their corresponding probabilities, are then determined again. N/A indicates that the probability output by the target DeepFM model for a candidate keyword combination has overflowed. 0 indicates that the probability output by the target DeepFM model for a candidate keyword combination to be displayed has underflowed.

Compared with the prior art, the invention has the following advantages. Parsing the SQL statement fragment input by the user yields a plurality of original keywords for that fragment. The target DeepFM model then predicts what the user intends to input and recommends it to the user, which improves input efficiency and therefore query efficiency. The SQL statement selected by the user is processed according to the preset mapping rule to obtain the corresponding Elasticsearch operation logic. This makes operating Elasticsearch more flexible and improves the user's query efficiency.

Further, only the candidate keyword combinations whose probabilities exceed the preset probability threshold are pushed. This improves both the user's input efficiency and the accuracy of the push.

Furthermore, the invention determines the distinguishing keywords, derives the distinguishing statement fragments from them, and concatenates the SQL statement fragment with those fragments to determine the SQL statements pushed to the user. This reduces erroneous pushes and improves the user's input efficiency.

Furthermore, by parsing the SQL statement and determining the target operation logic from it, the invention improves conversion efficiency and the flexibility of operating Elasticsearch.

Furthermore, by constructing the target information extraction model to determine the original keywords, the invention improves the accuracy with which the original keywords are determined.

Further, the invention decides whether to re-obtain the target DeepFM model according to the number of N/A and 0 values among the probabilities of the candidate keyword combinations. This improves the accuracy of the candidate keyword combinations and their probabilities, and in turn the accuracy of the SQL statements pushed to the user.

Drawings

FIG. 1 is a flowchart of the method of the present invention for operating Elasticsearch based on SQL;

FIG. 2 is a flowchart of step S2 in an embodiment of the present invention;

FIG. 3 is a flowchart of obtaining the target information extraction model in an embodiment of the present invention;

FIG. 4 is a flowchart of obtaining a target training sample in an embodiment of the present invention;

FIG. 5 is a flowchart of obtaining the target DeepFM model in an embodiment of the present invention;

FIG. 6 is a flowchart of step S4 in an embodiment of the present invention;

FIG. 7 is a flowchart of step S5 in an embodiment of the present invention;

FIG. 8 is a flowchart of step S6 in an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to embodiments to make its objects and advantages clearer. The specific embodiments described herein are merely illustrative and are not intended to limit the invention.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.

It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," and "outer" indicate directions or positional relationships based on those shown in the drawings. They are used merely for convenience of description. They do not indicate or imply that the apparatus or elements must have a specific orientation or be constructed and operated in a specific orientation, and should therefore not be construed as limiting the present invention.

In addition, unless explicitly specified and limited otherwise, the terms "mounted," "coupled," and "connected" are to be construed broadly in the description of the present invention. Such a connection may be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those skilled in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.

Referring to FIG. 1, which is a flowchart of the method of the present invention for operating Elasticsearch based on SQL, the invention provides a method for operating Elasticsearch based on SQL, comprising the following steps:

Step S1, obtaining an SQL statement fragment input by a user;

In implementation, a complete SQL statement is relatively long and contains many characters. For example, an SQL statement that inserts data may be insert into main_table (id, title) values ('22222', 'appeared'). In the invention, the SQL statement fragment input by the user is the first few characters typed, for example insert in, and the corresponding original keywords may be insert and in.
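In the patent, step S2 extracts the original keywords with a trained information extraction model. The sketch below is a deliberately simpler stand-in that only shows the input and output shape of step S2 for the example above. It splits a fragment on whitespace, commas and parentheses and lower-cases the tokens. The class and method names are hypothetical, and this is not the patented extraction method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for step S2: a plain tokenizer, NOT the
// LLM-based target information extraction model described in the patent.
final class FragmentKeywords {
    private FragmentKeywords() {}

    // Splits an SQL statement fragment into lower-case "original keywords".
    static List<String> extract(String fragment) {
        List<String> keywords = new ArrayList<>();
        // Treat runs of whitespace, commas and parentheses as separators.
        for (String token : fragment.trim().split("[\\s,()]+")) {
            if (!token.isEmpty()) { // skip the empty token produced by a leading separator
                keywords.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        System.out.println(extract("insert in")); // prints [insert, in]
    }
}
```

A real implementation would replace `extract` with a call to the trained extraction model. The tokenizer only makes the data flow into step S3 concrete.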

FIG. 2 is a flowchart of step S2 in the embodiment of the present invention. Step S2 comprises steps S21 to S23 as described above, which are explained below.

In implementation, the preset task types may be noun extraction, pronoun extraction, preposition extraction, conjunction extraction, article extraction, adjective extraction, and so on. The corresponding name tags may be noun, pronoun, preposition, conjunction, article, adjective, and so on.

In implementation, the extraction result is the information output by the target information extraction model according to the target instruction.

In implementation, the original keywords are taken from this extraction result, that is, from the information the target information extraction model outputs according to the target instruction.

FIG. 3 is a flowchart of obtaining the target information extraction model in the embodiment of the present invention. In step S22, the target information extraction model is obtained through steps S221 to S223 as described above.

In implementation, the plurality of original text sets are obtained from multiple data sources and include various SQL statements.

In implementation, the preset large language model may be a T5 model, a Baichuan model, a YAYI model, a chat model, or the like. Large language models have strong semantic understanding and generalization capabilities, so they adapt well to different information extraction tasks and can ensure high extraction accuracy. The original text sets contain original texts of different task types. Target training samples are generated from the original texts of the preset task types and used to train the preset large language model. The resulting target information extraction model can extract information for different task types, and the divergence of its outputs is reduced. The preset large language model is not specifically limited herein, and details are not repeated.

By constructing the target information extraction model to determine the original keywords, the invention improves the accuracy with which the original keywords are determined.

FIG. 4 is a flowchart of obtaining the target training sample in the embodiment of the present invention. In step S222, the target training sample is obtained through steps S2221 to S2223 as described above.

In practice, take a task template for entity extraction as an example. The template may be set as: The extracted task type is [""]; answer in the json {"": []} format. Each pair of empty quotes "" is a slot. Filling the slots with the name tag of the extraction task type gives, for example: The extracted task type is ["noun"]; answer in the json {"noun": []} format.
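Steps S2221 to S2223 (template, slot filling, concatenation) can be sketched as follows. The patent marks slots with empty quotes. The sketch instead uses a hypothetical `{slot}` marker so that the marker can be replaced unambiguously. The order in which the instruction and the original text are concatenated, and the newline separator, are also assumptions, because the patent does not specify them.

```java
// Hypothetical sketch of steps S2221-S2223: build a task instruction from a
// template and concatenate it with an original text to form a training sample.
final class TaskTemplates {
    private TaskTemplates() {}

    // Assumed slot marker; the patent marks slots with empty quotes ("").
    static final String SLOT = "{slot}";

    // Assumed template for one preset task type (entity/noun extraction).
    static final String EXTRACTION_TEMPLATE =
            "The extracted task type is [\"" + SLOT + "\"]; answer in the json {\"" + SLOT + "\": []} format.";

    // Step S2222: fill every slot of the template with the task type's name tag.
    static String instruction(String template, String nameTag) {
        return template.replace(SLOT, nameTag); // String.replace substitutes all occurrences
    }

    // Step S2223: concatenate instruction and original text (order and separator assumed).
    static String trainingSample(String template, String nameTag, String originalText) {
        return instruction(template, nameTag) + "\n" + originalText;
    }

    public static void main(String[] args) {
        System.out.println(instruction(EXTRACTION_TEMPLATE, "noun"));
        // prints: The extracted task type is ["noun"]; answer in the json {"noun": []} format.
    }
}
```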

Inputting the plurality of original keywords into the target DeepFM model yields a confidence probability for each candidate keyword combination corresponding to those keywords. This confidence probability is the probability corresponding to each candidate keyword combination.
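For reference, a DeepFM model computes each such probability by combining two components that share the same feature embeddings: a factorization-machine (FM) component and a deep neural network (DNN) component. The standard formulation is shown below; the patent itself does not restate it. Here $\mathbf{x}\in\mathbb{R}^d$ is the sparse input feature vector, $\mathbf{w}$ holds the first-order weights, $\mathbf{v}_i\in\mathbb{R}^k$ is the latent (embedding) vector of feature $i$, and $\sigma$ is the sigmoid function. $y_{\mathrm{DNN}}$ is the scalar output of a feed-forward network applied to the concatenated field embeddings. The patent does not say how keywords are encoded into $\mathbf{x}$. Treating each original keyword and each keyword of a candidate combination as an active binary feature is an assumption.

```latex
\hat{y} = \sigma\bigl(y_{\mathrm{FM}} + y_{\mathrm{DNN}}\bigr),
\qquad
y_{\mathrm{FM}} = \langle \mathbf{w}, \mathbf{x} \rangle
  + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
```

Under this reading, $\hat{y}\in(0,1)$ is the probability reported for one candidate keyword combination.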

Referring to FIG. 5, which is a flowchart of obtaining the target DeepFM model in the embodiment of the present invention, in step S3 the target DeepFM model is obtained through steps S31 to S34 as described above.

In implementation, if the SQL statement fragment is insert into, the corresponding initial keywords may be insert and into. Suppose the SQL statements the user went on to input were insert into main_table (id, title) values ('22222', 'appeared') and insert into reader (id, bid, type) values ('101', '1001', 1). The corresponding intermediate keywords are then:

- insert, into, main_table, id, title, values, 22222, appeared;
- insert, into, reader, id, bid, type, values, 101, 1001, 1.

An intermediate keyword combination may consist of any two or more of the intermediate keywords. Keyword combinations that differ only in the order of their keywords are regarded as the same keyword combination.
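The labelling rule of step S33, applied to the example above, can be sketched as follows. Representing keyword combinations as sets is an assumption. It matches the statement that keyword order does not distinguish combinations, because two Java sets with the same elements are equal. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch of step S33: an intermediate keyword combination becomes a sample
// label for an initial keyword combination only if it contains all of the
// initial keywords. Sets are used because keyword order is ignored.
final class SampleLabels {
    private SampleLabels() {}

    static List<Set<String>> labelsFor(Set<String> initial, List<Set<String>> intermediates) {
        List<Set<String>> labels = new ArrayList<>();
        for (Set<String> candidate : intermediates) {
            if (candidate.containsAll(initial)) { // every initial keyword must be present
                labels.add(candidate);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        Set<String> initial = Set.of("insert", "into");
        List<Set<String>> history = List.of(
                Set.of("insert", "into", "main_table", "id", "title", "values", "22222", "appeared"),
                Set.of("insert", "into", "reader", "id", "bid", "type", "values", "101", "1001", "1"),
                Set.of("select", "id", "from", "reader")); // hypothetical non-matching entry
        System.out.println(labelsFor(initial, history).size()); // prints 2
    }
}
```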

FIG. 6 is a flowchart of step S4 in the embodiment of the present invention. Step S4 comprises steps S41 and S42 as described above.

In a practical application scenario, the implementer can set the preset probability threshold according to the actual situation. Preferably, it is set in the range 0.7 to 0.8.
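Steps S41 and S42 amount to a threshold filter. The sketch below assumes that each candidate keyword combination is identified by a string key. The probabilities in `main` are hypothetical values chosen only to illustrate a threshold of 0.75, which lies inside the preferred 0.7 to 0.8 range.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of steps S41-S42: keep candidate keyword combinations whose
// probability exceeds the preset probability threshold.
final class PushSelector {
    private PushSelector() {}

    static List<String> select(Map<String, Double> candidates, double threshold) {
        List<String> pushed = new ArrayList<>();
        for (Map.Entry<String, Double> e : candidates.entrySet()) {
            Double p = e.getValue();
            // Strictly greater than the threshold, as in step S41;
            // null and NaN values fail the check and are never pushed.
            if (p != null && p > threshold) {
                pushed.add(e.getKey());
            }
        }
        return pushed;
    }

    public static void main(String[] args) {
        Map<String, Double> candidates = new LinkedHashMap<>(); // hypothetical model output
        candidates.put("insert into main_table", 0.91);
        candidates.put("insert into reader", 0.74);
        candidates.put("insert into user", 0.12);
        System.out.println(select(candidates, 0.75)); // prints [insert into main_table]
    }
}
```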

Pushing only the candidate keyword combinations whose probabilities exceed the preset probability threshold improves both the user's input efficiency and the accuracy of the push.

Specifically, before step S41, the method further comprises:

Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset number threshold, re-obtaining the target DeepFM model through steps S31 to S34, and then determining again the candidate keyword combinations output by the target DeepFM model and their corresponding probabilities;

where N/A indicates that the probability output by the target DeepFM model for a candidate keyword combination has overflowed, and 0 indicates that the probability output by the target DeepFM model for a candidate keyword combination to be displayed has underflowed.

In practice, the target DeepFM model is re-obtained by retraining the initial DeepFM model according to steps S31 to S34 on an expanded range of historical data, which increases the number of training samples.

In a practical application scenario, the implementer can set the preset number threshold according to the actual situation. Preferably, it is set to 0.4 to 0.6 times the total number of candidate keyword combinations output by the target DeepFM model.
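The patent does not say how an N/A probability is represented. The sketch of the step S40 check below therefore assumes that an overflowed probability arrives as NaN or as an infinite double, and an underflowed one as exactly 0.0. The preset number threshold is expressed as a fraction of the number of candidates, following the preferred 0.4 to 0.6 range. The class name and parameters are hypothetical, and the probabilities in `main` are illustrative.

```java
// Sketch of step S40 under stated assumptions: "N/A" (overflow) is encoded as
// NaN or an infinite double, and an underflowed probability as 0.0.
final class RetrainCheck {
    private RetrainCheck() {}

    static boolean needsRetraining(double[] probabilities, double thresholdFraction) {
        int abnormal = 0;
        for (double p : probabilities) {
            if (Double.isNaN(p) || Double.isInfinite(p) || p == 0.0) {
                abnormal++; // count N/A (overflow) and 0 (underflow) values
            }
        }
        // Preset number threshold = fraction x total number of candidate combinations.
        double numberThreshold = thresholdFraction * probabilities.length;
        return abnormal > numberThreshold;
    }

    public static void main(String[] args) {
        double[] probs = {0.91, Double.NaN, 0.0, 0.0, 0.63}; // hypothetical model output
        System.out.println(needsRetraining(probs, 0.5)); // prints true (3 > 2.5)
    }
}
```

If the check returns true, steps S31 to S34 are rerun on a wider range of historical data.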

The invention decides whether to re-obtain the target DeepFM model according to the number of N/A and 0 values among the probabilities of the candidate keyword combinations. This improves the accuracy of the candidate keyword combinations and their probabilities, and in turn the accuracy of the SQL statements pushed to the user.

FIG. 7 is a flowchart of step S5 in the embodiment of the present invention. Step S5 comprises steps S51 to S54 as described above.

The invention determines the distinguishing keywords and derives the distinguishing statement fragments from them. It then concatenates the SQL statement fragment with those fragments to determine the SQL statements pushed to the user. This reduces erroneous pushes and improves the user's input efficiency.
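Steps S51 to S53 can be sketched as follows. The "statement analysis" of step S52 is reduced here to joining the distinguishing keywords with spaces. That simplification assumes each push keyword combination is already in SQL order. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps S51-S53. Step S52 is simplified to joining the
// distinguishing keywords with spaces (assumes combinations are in SQL order).
final class StatementCompleter {
    private StatementCompleter() {}

    // Step S51: push keywords that do not appear among the original keywords.
    static List<String> distinguishing(List<String> pushKeywords, List<String> originalKeywords) {
        List<String> result = new ArrayList<>();
        for (String k : pushKeywords) {
            if (!originalKeywords.contains(k)) {
                result.add(k);
            }
        }
        return result;
    }

    // Steps S52-S53: build each distinguishing fragment and append it to the user's fragment.
    static List<String> complete(String fragment, List<String> originalKeywords,
                                 List<List<String>> pushCombinations) {
        List<String> statements = new ArrayList<>();
        for (List<String> combination : pushCombinations) {
            String tail = String.join(" ", distinguishing(combination, originalKeywords));
            statements.add(tail.isEmpty() ? fragment.trim() : fragment.trim() + " " + tail);
        }
        return statements;
    }

    public static void main(String[] args) {
        List<String> original = List.of("insert", "into");
        List<List<String>> pushed = List.of(
                List.of("insert", "into", "main_table"),
                List.of("insert", "into", "reader"));
        System.out.println(complete("insert into", original, pushed));
        // prints [insert into main_table, insert into reader]
    }
}
```

The resulting list corresponds to the SQL statements pushed to the user in step S54.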

FIG. 8 is a flowchart of step S6 in the embodiment of the present invention. Step S6 comprises steps S61 and S62 as described above.

In implementation, the preset SQL parsing method may be any prior-art method that can parse an SQL statement to obtain semantic information, and details are not repeated here.

Specifically, the semantic information includes the operation type (e.g., SELECT, INSERT, UPDATE, DELETE) and related behavior, query conditions, table names, field names, and the like.

In step S61, the preset SQL parsing method includes parsing with the parse method of the CCJSqlParserUtil class.

In implementation, the SQL statement is parsed with the parse method of the CCJSqlParserUtil class (JSqlParser). For example, when the SQL statement selected by the user is insert into user (id, name, age) values (...), the code for parsing it is:

String sql = "INSERT INTO user (id, name, age) VALUES (?, ?, ?)"; // ? are JDBC-style placeholders for the elided values
Statement statement = CCJSqlParserUtil.parse(sql); // net.sf.jsqlparser.statement.Statement; throws JSQLParserException on invalid SQL

In practice, the implementer can set the preset mapping rules based on an expert system according to the actual situation. For example, a WHERE clause in SQL is converted into an Elasticsearch query condition, and data is updated and inserted through UPDATE and INSERT statements, respectively.
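One possible mapping rule for a WHERE clause is sketched below. A conjunction of equality conditions becomes an Elasticsearch `bool` query, and its `filter` clause holds one `term` query per condition. The sketch assumes that the conditions have already been extracted in step S61 as field/value pairs. It also assumes the fields are mapped as exact-match keyword fields; analyzed text fields would usually need a `match` query instead. The class name is hypothetical, and the JSON escaping covers only quotes and backslashes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical mapping rule for step S62: WHERE a = 'x' AND b = 'y'
// becomes {"query":{"bool":{"filter":[{"term":{"a":"x"}},{"term":{"b":"y"}}]}}}.
final class WhereToQueryDsl {
    private WhereToQueryDsl() {}

    // Minimal JSON string escaping (quotes and backslashes only); a sketch, not a full encoder.
    private static String json(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // conditions: field name -> required value, in WHERE-clause order.
    static String toQuery(Map<String, String> conditions) {
        StringJoiner filters = new StringJoiner(",");
        for (Map.Entry<String, String> c : conditions.entrySet()) {
            filters.add("{\"term\":{" + json(c.getKey()) + ":" + json(c.getValue()) + "}}");
        }
        return "{\"query\":{\"bool\":{\"filter\":[" + filters + "]}}}";
    }

    public static void main(String[] args) {
        Map<String, String> where = new LinkedHashMap<>(); // e.g. WHERE id = '101' AND bid = '1001'
        where.put("id", "101");
        where.put("bid", "1001");
        System.out.println(toQuery(where));
        // prints {"query":{"bool":{"filter":[{"term":{"id":"101"}},{"term":{"bid":"1001"}}]}}}
    }
}
```

The `filter` context is chosen because plain SQL equality conditions need no relevance scoring.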

By parsing the SQL statement and determining the target operation logic from it, the invention improves conversion efficiency and the flexibility of operating Elasticsearch.

In the invention, parsing the SQL statement fragment input by the user yields a plurality of original keywords for that fragment. The target DeepFM model then predicts what the user intends to input and recommends it to the user, which improves input efficiency and therefore query efficiency. The SQL statement selected by the user is processed according to the preset mapping rule to obtain the corresponding Elasticsearch operation logic. This makes operating Elasticsearch more flexible and improves the user's query efficiency.

Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims

1. A method for operating Elasticsearch based on SQL, comprising:

Step S1, obtaining a SQL statement fragment input by a user;

Step S2, analyzing the SQL statement fragment according to a preset statement analysis method to obtain a number of original keywords;

The step S2 comprises:

Step S21, obtaining a target instruction according to the SQL statement fragment; the target instruction includes at least a name tag of a target task type; the target task type is any one of several preset task types;

Step S22, inputting the SQL statement fragment and the target instruction into the target information extraction model to obtain the extraction result output by the target information extraction model;

Step S23, determining a number of original keywords according to the extraction result;

In step S22, the target information extraction model is obtained by the following steps:

Step S221, obtaining a plurality of original text sets; each original text set has a corresponding original task type; each original task type is one of a plurality of preset task types; each of the original text sets includes a plurality of original texts;

Step S222, acquiring a corresponding target training sample according to each original text in the plurality of original text sets to obtain a target training sample set;

Step S223, training a preset large language model according to the target training sample set to obtain the target information extraction model;

Step S3, inputting the several original keywords into the target DeepFM model to obtain several candidate keyword combinations output by the target DeepFM model and the probability corresponding to each candidate keyword combination;

Step S4, determining a plurality of push keyword combinations to be pushed to the user from among the plurality of candidate keyword combinations based on the probabilities corresponding to the candidate keyword combinations;

Step S5, based on each of the push keyword combinations, completing the SQL statement fragment with keywords to form SQL statements, and pushing them to the user to obtain the SQL statement selected by the user;

Step S6, processing the SQL statement selected by the user according to a preset mapping rule to obtain the Elasticsearch operation logic corresponding to the SQL statement selected by the user, and determining it as the target operation logic.

2. The method for operating Elasticsearch based on SQL according to claim 1, characterized in that, in step S3, the target DeepFM model is obtained by the following steps:

Step S31, obtaining, from historical data, the initial keyword combinations corresponding to SQL statement fragments input by several users and several intermediate keyword combinations corresponding to the SQL statements input by each user; each initial keyword combination includes several initial keywords;

Step S32, generating a plurality of training samples according to each of the initial keyword combinations;

Step S33, if an intermediate keyword combination contains all the initial keywords in the initial keyword combination, using that intermediate keyword combination as a sample label of the training sample corresponding to the initial keyword combination;

Step S34, training the initial DeepFM model according to each training sample and the corresponding sample label to obtain the target DeepFM model.

3. The method for operating Elasticsearch based on SQL according to claim 2, characterized in that the step S4 comprises:

Step S41, traversing the probabilities corresponding to each candidate keyword combination, and determining the probability greater than a preset probability threshold as a key probability;

Step S42, determining the candidate keyword combinations corresponding to the key probabilities as the push keyword combinations pushed to the user, so as to obtain a plurality of push keyword combinations.

4. The method for operating Elasticsearch based on SQL according to claim 3, characterized in that step S5 comprises:

Step S51, comparing each push keyword in each push keyword combination with each original keyword to determine a number of distinguishing keywords;

Step S52, performing sentence analysis on the plurality of distinguishing keywords to obtain a plurality of distinguishing sentence fragments;

Step S53, concatenating the SQL statement fragment with each distinguishing statement fragment to obtain a plurality of SQL statements;

Step S54, pushing the plurality of SQL statements to the user to obtain the SQL statement selected by the user.

5. The method for operating Elasticsearch based on SQL according to claim 4, characterized in that step S6 comprises:

Step S61, parsing the SQL statement selected by the user according to a preset SQL parsing method to obtain semantic information corresponding to the SQL statement selected by the user;

Step S62, converting the semantic information into the corresponding Elasticsearch operation logic based on preset mapping rules, and determining it as the target operation logic.

6. The method for operating Elasticsearch based on SQL according to claim 5, characterized in that, in the step S222, the target training sample is obtained by the following steps:

Step S2221, determining the task template corresponding to each original text set according to the original task type corresponding to that set; each preset task type has a corresponding task template; each task template has a number of slots; each slot is filled with a name tag corresponding to the preset task type;

Step S2222, filling the name tag of the original task type corresponding to each original text into the slot of the corresponding task template to obtain the task instruction corresponding to each original text;

Step S2223, concatenating each original text and the task instruction corresponding to the original text to obtain the corresponding target training sample.

7. The method for operating Elasticsearch based on SQL according to claim 6, characterized in that, in step S61, the preset SQL parsing method includes parsing with the parse method of the CCJSqlParserUtil class.

8. The method for operating Elasticsearch based on SQL according to claim 7, characterized in that, before step S41, the method further comprises:

Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset number threshold, re-obtaining the target DeepFM model based on steps S31 to S34 to re-determine the several candidate keyword combinations output by the target DeepFM model and the probabilities corresponding to the candidate keyword combinations;

wherein N/A indicates that the probability of a candidate keyword combination output by the target DeepFM model has overflowed, and 0 indicates that the probability of a candidate keyword combination to be displayed, output by the target DeepFM model, has underflowed.