
CN118689900B - A method of operating Elasticsearch based on SQL - Google Patents


Info

Publication number
CN118689900B
Authority
CN
China
Prior art keywords
sql
target
user
original
preset
Legal status
Active
Application number
CN202411156408.8A
Other languages
Chinese (zh)
Other versions
CN118689900A (en)
Inventor
黄春鹏
冯春旭
Current Assignee
Beijing Meiluokesi Technology Co ltd
Original Assignee
Beijing Meiluokesi Technology Co ltd
Application filed by Beijing Meiluokesi Technology Co ltd
Priority to CN202411156408.8A
Publication of CN118689900A
Application granted
Publication of CN118689900B
Current legal status: Active
Anticipated expiration


Classifications

    • G06F16/2423 Interactive query statement specification based on a database schema
    • G06F16/24526 Internal representations for queries
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24549 Run-time optimisation
    • G06F40/216 Parsing using statistical methods
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06N5/041 Abduction
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data processing, and in particular to a method for operating Elasticsearch based on SQL (Structured Query Language). The method comprises the following steps. First, an SQL statement fragment input by a user is obtained and parsed into a plurality of original keywords. Next, the original keywords are input into a target DeepFM model to obtain a plurality of candidate keyword combinations and their corresponding probabilities, and the push keyword combinations to be pushed to the user are determined from among the candidate keyword combinations. The SQL statement fragment is then completed with the keywords supplied by the push keyword combinations to form SQL statements, and the SQL statement selected by the user is obtained. Finally, the selected SQL statement is processed to obtain the Elasticsearch operation logic corresponding to it. The invention can simplify Elasticsearch data operation and development and improve Elasticsearch query efficiency.

Description

Method for operating Elasticsearch based on SQL
Technical Field
The invention relates to the technical field of data processing, and in particular to a method for operating Elasticsearch based on SQL.
Background
With the advent of the big data era, non-relational (NoSQL) databases have drawn increasing attention for their advantages in handling large-scale data sets, highly concurrent access and flexible data models. Elasticsearch is a popular open-source distributed search and analytics engine built on Lucene. It provides full-text search, highlighting, autocompletion, aggregation and other functions, and is commonly used to implement complex search functions such as log data analysis and real-time data analysis.
Conventional relational database management systems (RDBMS) widely use SQL as their standard query language. Because SQL is structured and easy to understand and use, it has become the de facto standard for database operations. Elasticsearch, however, is a non-relational database. Its query language is a JSON-based domain-specific language (DSL) that differs markedly from SQL, so developers accustomed to SQL must spend additional time and effort learning a new query language. There is therefore a natural need to operate on Elasticsearch data through SQL statements.
Chinese patent publication No. CN107153535A discloses a method and apparatus for operating Elasticsearch. The method comprises the following steps: obtaining data to be operated on from upstream of Elasticsearch; converting that data into target operation data by means of SQL conversion, the target operation data having a data structure corresponding to Elasticsearch; encapsulating the target operation data together with the type of the conversion operation into an Action class; and operating Elasticsearch by means of the Action class.
In that method, however, the converted operation data must first be encapsulated into an Action class before Elasticsearch can be operated through it, which results in low flexibility and low query efficiency.
Disclosure of Invention
The invention therefore provides a method for operating Elasticsearch based on SQL, which solves the prior-art problem of low query efficiency when Elasticsearch is operated via SQL.
To achieve the above object, the invention provides a method for operating Elasticsearch based on SQL, comprising:
Step S1, acquiring an SQL statement fragment input by a user;
Step S2, parsing the SQL statement fragment according to a preset statement parsing method to obtain a plurality of original keywords;
Step S3, inputting the original keywords into a target DeepFM model to obtain a plurality of candidate keyword combinations output by the target DeepFM model and the probability corresponding to each candidate keyword combination;
Step S4, determining, from the candidate keyword combinations and based on their corresponding probabilities, a plurality of push keyword combinations to be pushed to the user;
Step S5, completing the SQL statement fragment based on the push keyword combinations, pushing the resulting SQL statements to the user, and obtaining the SQL statement selected by the user;
Step S6, processing the SQL statement selected by the user according to a preset mapping rule to obtain the Elasticsearch operation logic corresponding to that statement, and determining that operation logic as the target operation logic.
Further, in step S3, the target DeepFM model is obtained by the following steps:
Step S31, obtaining from historical data the initial keyword combination corresponding to each SQL statement fragment input by a plurality of users, and a plurality of intermediate keyword combinations corresponding to the complete SQL statements input by those users, wherein each initial keyword combination comprises a plurality of initial keywords;
Step S32, generating a plurality of training samples from the initial keyword combinations;
Step S33, if an intermediate keyword combination contains all the initial keywords of an initial keyword combination, using that intermediate keyword combination as a sample label of the training sample corresponding to the initial keyword combination;
Step S34, training an initial DeepFM model on the training samples and their sample labels to obtain the target DeepFM model.
Further, step S4 comprises:
Step S41, traversing the probability corresponding to each candidate keyword combination, and determining each probability greater than a preset probability threshold as a key probability;
Step S42, determining the candidate keyword combinations corresponding to the key probabilities as push keyword combinations to be pushed to the user, thereby obtaining a plurality of push keyword combinations.
Further, step S5 comprises:
Step S51, comparing the push keywords in each push keyword combination with the original keywords to determine a plurality of distinguishing keywords;
Step S52, performing statement analysis on the distinguishing keywords to obtain a plurality of distinguishing statement fragments;
Step S53, splicing the SQL statement fragment with the distinguishing statement fragments to obtain a plurality of SQL statements;
Step S54, pushing the SQL statements to the user to obtain the SQL statement selected by the user.
Further, step S6 comprises:
Step S61, parsing the SQL statement selected by the user according to a preset SQL parsing method to obtain semantic information corresponding to that statement;
Step S62, converting the semantic information into the corresponding Elasticsearch operation logic based on a preset mapping rule, and determining that operation logic as the target operation logic.
Further, step S2 comprises:
Step S21, acquiring a target instruction according to the SQL statement fragment, wherein the target instruction comprises at least the name tag of a target task type, and the target task type is one of a plurality of preset task types;
Step S22, inputting the SQL statement fragment and the target instruction into a target information extraction model to obtain the extraction result output by that model;
Step S23, determining a plurality of original keywords according to the extraction result.
Further, in step S22, the target information extraction model is obtained by the following steps:
Step S221, acquiring a plurality of original text sets, wherein each original text set has a corresponding original task type, each original task type is one of the preset task types, and each original text set comprises a plurality of original texts;
Step S222, obtaining a corresponding target training sample from each original text in the original text sets, thereby obtaining a target training sample set;
Step S223, training a preset large language model on the target training sample set to obtain the target information extraction model.
Further, in step S222, each target training sample is obtained by the following steps:
Step S2221, determining the task template corresponding to each original text set according to its original task type, wherein each preset task type has a corresponding task template, each task template has a plurality of slots, and each slot is to be filled with the name tag of the corresponding preset task type;
Step S2222, filling the name tag of the original task type corresponding to each original text into the slots of the corresponding task template to obtain the task instruction for that original text;
Step S2223, splicing each original text with its task instruction to obtain the corresponding target training sample.
Further, in step S61, the preset SQL parsing method comprises parsing with the parse method of the CCJSqlParserUtil class.
Further, before step S41, the method further comprises:
Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset number threshold, re-obtaining the target DeepFM model through steps S31-S34, so as to re-determine the candidate keyword combinations output by the target DeepFM model and their corresponding probabilities. Here, N/A indicates that the probability of a candidate keyword combination output by the target DeepFM model has overflowed, and 0 indicates that it has underflowed.
Compared with the prior art, the invention has the following advantages. Parsing the SQL statement fragment input by the user yields the original keywords of that fragment. The target DeepFM model then predicts the user's intended input and recommends it to the user, which speeds up input and thus improves query efficiency. In addition, the SQL statement selected by the user is processed according to the preset mapping rule to obtain the corresponding Elasticsearch operation logic, which makes operating Elasticsearch more flexible and further improves the user's query efficiency.
Further, only candidate keyword combinations whose probability is greater than the preset probability threshold are pushed, which improves both the user's input efficiency and the accuracy of what is pushed.
Further, the invention determines the distinguishing keywords and, from them, the distinguishing statement fragments. Splicing these fragments onto the SQL statement fragment produces the SQL statements pushed to the user, which reduces erroneous pushes and improves the user's input efficiency.
Further, parsing the SQL statement and determining the target operation logic from it improves conversion efficiency and the flexibility of operating Elasticsearch.
Further, determining the original keywords with a purpose-built target information extraction model improves the accuracy of the original keywords.
Further, the invention decides whether to re-obtain the target DeepFM model according to the number of N/A and 0 values among the probabilities of the candidate keyword combinations. This improves the accuracy of the candidate keyword combinations and their probabilities, and in turn the accuracy of the SQL statements pushed to the user.
Drawings
FIG. 1 is a flow chart of the method for operating Elasticsearch based on SQL according to the invention;
FIG. 2 is a flow chart of step S2 according to an embodiment of the invention;
FIG. 3 is a flow chart of obtaining the target information extraction model according to an embodiment of the invention;
FIG. 4 is a flow chart of obtaining a target training sample according to an embodiment of the invention;
FIG. 5 is a flow chart of obtaining the target DeepFM model according to an embodiment of the invention;
FIG. 6 is a flow chart of step S4 according to an embodiment of the invention;
FIG. 7 is a flow chart of step S5 according to an embodiment of the invention;
FIG. 8 is a flow chart of step S6 according to an embodiment of the invention.
Detailed Description
To make the objects and advantages of the invention clearer, the invention is further described below with reference to examples. It should be understood that the specific examples described herein are given by way of illustration only and are not intended to limit the invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are merely for explaining the technical principles of the present invention, and are not intended to limit the scope of the present invention.
It should be noted that, in the description of the present invention, terms such as "upper," "lower," "left," "right," "inner," "outer," and the like indicate directions or positional relationships based on the directions or positional relationships shown in the drawings, which are merely for convenience of description, and do not indicate or imply that the apparatus or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
In addition, it should be noted that, in the description of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "linked," and "connected" are to be construed broadly. For example, they may denote a fixed, detachable or integral connection; a mechanical or electrical connection; a direct connection or an indirect connection through an intermediate medium; or internal communication between two elements. Those skilled in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
Referring to FIG. 1, which is a flow chart of the method for operating Elasticsearch based on SQL according to the invention, the method comprises the following steps:
Step S1, acquiring an SQL statement fragment input by a user;
Step S2, parsing the SQL statement fragment according to a preset statement parsing method to obtain a plurality of original keywords;
In implementation, a complete SQL statement is long and contains many bytes. For example, an SQL statement for inserting data is insert into main_table (id, title) values ('22222', 'appeared'). The SQL statement fragment input by the user in the invention consists of the first few bytes typed in advance, for example insert in, and the corresponding original keywords may be insert and in.
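To illustrate roughly what the original keywords of a fragment look like, the sketch below simply splits the typed fragment on whitespace, commas and parentheses. It is a hypothetical stand-in (FragmentTokenizer and originalKeywords are illustrative names), not the extraction-model approach of steps S21-S23 described below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for step S2: splits a partially typed SQL fragment into
// lower-case tokens. The described embodiment uses a trained information
// extraction model instead; this only shows the shape of "original keywords".
final class FragmentTokenizer {
    private FragmentTokenizer() {}

    static List<String> originalKeywords(String fragment) {
        List<String> keywords = new ArrayList<>();
        // Split on runs of whitespace, commas and parentheses; skip empty pieces.
        for (String token : fragment.trim().split("[\\s,()]+")) {
            if (!token.isEmpty()) {
                keywords.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return keywords;
    }
}
```

For the fragment insert in, this returns the list [insert, in].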
FIG. 2 is a flow chart of step S2 in an embodiment of the invention. Specifically, step S2 comprises:
Step S21, acquiring a target instruction according to the SQL statement fragment, wherein the target instruction comprises at least the name tag of a target task type, and the target task type is one of a plurality of preset task types;
In implementation, the preset task types may be noun extraction, pronoun extraction, preposition extraction, conjunction extraction, article extraction, adjective extraction and the like, and the corresponding name tags may be noun, pronoun, preposition, conjunction, article, adjective and the like.
Step S22, inputting the SQL statement fragment and the target instruction into a target information extraction model to obtain the extraction result output by that model;
In implementation, the extraction result is the information output by the target information extraction model in response to the target instruction.
Step S23, determining a plurality of original keywords according to the extraction result.
In implementation, the original keywords are taken from this extraction result, that is, from the information the target information extraction model outputs in response to the target instruction.
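The task template example in this description asks the model to answer in a JSON shape such as {"noun": [...]}. Assuming the extraction result arrives in that shape, step S23 could read the keywords listed under the requested name tag as sketched below. ExtractionResultReader is a hypothetical name. The regular expressions handle only this flat shape, so a real system would use a proper JSON parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of step S23: reads the strings listed under one name tag
// in an answer shaped like {"noun": ["a", "b"]}. Regex-based and limited to this
// flat shape; it is not a general JSON parser.
final class ExtractionResultReader {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    private ExtractionResultReader() {}

    static List<String> keywordsFor(String answer, String nameTag) {
        // Find the array that follows "nameTag": and capture its contents.
        Pattern array = Pattern.compile(
                "\"" + Pattern.quote(nameTag) + "\"\\s*:\\s*\\[(.*?)\\]", Pattern.DOTALL);
        Matcher arrayMatcher = array.matcher(answer);
        List<String> keywords = new ArrayList<>();
        if (!arrayMatcher.find()) {
            return keywords; // name tag absent: nothing extracted
        }
        // Collect every double-quoted string inside that array.
        Matcher item = QUOTED.matcher(arrayMatcher.group(1));
        while (item.find()) {
            keywords.add(item.group(1));
        }
        return keywords;
    }
}
```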
FIG. 3 is a flow chart of obtaining the target information extraction model in an embodiment of the invention. Specifically, in step S22, the target information extraction model is obtained by the following steps:
Step S221, acquiring a plurality of original text sets, wherein each original text set has a corresponding original task type, each original task type is one of the preset task types, and each original text set comprises a plurality of original texts;
In implementation, the original text sets are obtained from multiple data sources and include various SQL statements.
Step S222, obtaining a corresponding target training sample from each original text in the original text sets, thereby obtaining a target training sample set;
Step S223, training a preset large language model on the target training sample set to obtain the target information extraction model.
In implementation, the preset large language model may be a T5 model, a Baichuan model, a YAYI model, a chat model or the like. Large language models have strong semantic understanding and generalization capabilities, so they adapt well to different information extraction tasks and can ensure high extraction accuracy. The original text sets contain original texts of different task types. Target training samples are generated from these texts according to the preset task types, and the preset large language model is trained on them. The resulting target information extraction model can extract information for different task types, and the divergence of its outputs is reduced. The preset large language model is not specifically limited herein and is not described further.
Determining the original keywords with the constructed target information extraction model improves the accuracy of the original keywords.
FIG. 4 is a flow chart of obtaining a target training sample in an embodiment of the invention. Specifically, in step S222, each target training sample is obtained by the following steps:
Step S2221, determining the task template corresponding to each original text set according to its original task type, wherein each preset task type has a corresponding task template, each task template has a plurality of slots, and each slot is to be filled with the name tag of the corresponding preset task type;
Step S2222, filling the name tag of the original task type corresponding to each original text into the slots of the corresponding task template to obtain the task instruction for that original text;
Step S2223, splicing each original text with its task instruction to obtain the corresponding target training sample.
In practice, take a task template for entity extraction as an example. The template may be set as: The task type to extract is [""]; answer in JSON format {"": []}. The positions inside the empty double quotes are the slots. Filling them with the task type noun gives: The task type to extract is ["noun"]; answer in JSON format {"noun": []}.
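Steps S2221-S2223 amount to string templating. The sketch below fills every slot of a template with a name tag and then splices the resulting instruction with an original text. The English template wording, the {slot} placeholder and the newline separator are illustrative assumptions, since the description does not fix them. TrainingSampleBuilder is a hypothetical name.

```java
// Hypothetical sketch of steps S2221-S2223. The template text, the "{slot}"
// placeholder and the instruction/text separator are assumptions.
final class TrainingSampleBuilder {
    static final String SLOT = "{slot}";
    static final String ENTITY_TEMPLATE =
            "The task type to extract is [\"" + SLOT + "\"]; answer in JSON format {\"" + SLOT + "\": []}.";

    private TrainingSampleBuilder() {}

    // Step S2222: fill every slot of the template with the task type's name tag.
    static String taskInstruction(String template, String nameTag) {
        return template.replace(SLOT, nameTag); // String.replace replaces all occurrences
    }

    // Step S2223: splice the task instruction and the original text into one sample.
    static String targetTrainingSample(String template, String nameTag, String originalText) {
        return taskInstruction(template, nameTag) + "\n" + originalText;
    }
}
```

Filling ENTITY_TEMPLATE with noun reproduces the filled template given above. Placing the instruction before the text is one reasonable choice for instruction-tuned models, but it is not prescribed here.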
Step S3, inputting the original keywords into a target DeepFM model to obtain a plurality of candidate keyword combinations output by the target DeepFM model and the probability corresponding to each candidate keyword combination;
It can be understood that, when the original keywords are input into the target DeepFM model, the model outputs a confidence probability for each candidate keyword combination corresponding to those keywords. This confidence probability is the probability corresponding to each candidate keyword combination.
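The description names DeepFM but does not specify its internals. For reference, the standard DeepFM output adds a factorization-machine term and a deep (feed-forward) term computed over shared feature embeddings. The following points are left open here and would be design assumptions: how the original keywords are encoded into the sparse feature vector x, and how one probability per candidate keyword combination is obtained.

```latex
\hat{y} = \sigma\left(y_{\mathrm{FM}} + y_{\mathrm{DNN}}\right),
\qquad
y_{\mathrm{FM}} = \langle w, x \rangle + \sum_{i=1}^{d} \sum_{j=i+1}^{d} \langle V_i, V_j \rangle \, x_i x_j
```

Here σ is the sigmoid function, x is the d-dimensional feature vector, w holds the first-order weights, V_i is the latent (embedding) vector of feature i, and y_DNN is the output of the feed-forward network applied to the concatenated field embeddings.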
Referring to FIG. 5, which is a flow chart of obtaining the target DeepFM model in an embodiment of the invention, in step S3 the target DeepFM model is obtained by the following steps:
Step S31, obtaining from historical data the initial keyword combination corresponding to each SQL statement fragment input by a plurality of users, and a plurality of intermediate keyword combinations corresponding to the complete SQL statements input by those users, wherein each initial keyword combination comprises a plurality of initial keywords;
In implementation, if the SQL statement fragment is insert into, the corresponding initial keywords may be insert and into. Suppose the SQL statements input by users are insert into main_table (id, title) values ('22222', 'appeared') and insert into reader (id, bid, type) values ('101', '1001', 1). The corresponding intermediate keywords are then insert, into, main_table, id, title, values, 22222, appeared for the first statement, and insert, into, reader, id, bid, type, values, 101, 1001, 1 for the second. An intermediate keyword combination may consist of any two or more of the intermediate keywords. Combinations that differ only in the order of their keywords are regarded as the same keyword combination.
Step S32, generating a plurality of training samples from the initial keyword combinations;
Step S33, if an intermediate keyword combination contains all the initial keywords of an initial keyword combination, using that intermediate keyword combination as a sample label of the training sample corresponding to the initial keyword combination;
Step S34, training an initial DeepFM model on the training samples and their sample labels to obtain the target DeepFM model.
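Step S33 can be read as a subset test if keyword combinations are compared as unordered sets, which is consistent with the statement above that reordered combinations count as the same combination. In the hypothetical sketch below, SampleLabeler.labelsFor returns every historical intermediate combination that contains all keywords of a given initial combination. Each returned set would serve as a sample label for that initial combination's training sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of step S33. Assumption: keyword combinations are
// unordered sets, so a containment check ignores keyword order.
final class SampleLabeler {
    private SampleLabeler() {}

    // Returns the intermediate combinations that contain every initial keyword.
    static List<Set<String>> labelsFor(Set<String> initialCombination,
                                       List<Set<String>> intermediateCombinations) {
        List<Set<String>> labels = new ArrayList<>();
        for (Set<String> candidate : intermediateCombinations) {
            if (candidate.containsAll(initialCombination)) {
                labels.add(candidate);
            }
        }
        return labels;
    }
}
```

With the initial combination {insert, into}, a combination drawn from insert into main_table ... qualifies as a label, whereas one drawn from a select statement does not.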
Step S4, determining, from the candidate keyword combinations and based on their corresponding probabilities, a plurality of push keyword combinations to be pushed to the user;
FIG. 6 is a flow chart of step S4 in an embodiment of the invention. Specifically, step S4 comprises:
Step S41, traversing the probability corresponding to each candidate keyword combination, and determining each probability greater than a preset probability threshold as a key probability;
In practical applications, the implementer may set the preset probability threshold according to the actual situation; preferably, it is set in the range 0.7 to 0.8.
Step S42, determining the candidate keyword combinations corresponding to the key probabilities as push keyword combinations to be pushed to the user, thereby obtaining a plurality of push keyword combinations.
Pushing only the candidate keyword combinations whose probability exceeds the preset probability threshold improves both the user's input efficiency and push accuracy.
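Steps S41 and S42 reduce to a strict threshold filter. In the hypothetical sketch below, each candidate combination is keyed by a space-joined string, and the map's iteration order is preserved in the result. Only the "greater than the threshold" rule comes from the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of steps S41-S42: keep candidates whose probability is
// strictly greater than the preset threshold. NaN and null never pass.
final class PushSelector {
    private PushSelector() {}

    static List<String> pushCombinations(Map<String, Double> probabilities, double threshold) {
        List<String> pushed = new ArrayList<>();
        for (Map.Entry<String, Double> entry : probabilities.entrySet()) {
            Double p = entry.getValue();
            if (p != null && p > threshold) { // "key probability" per step S41
                pushed.add(entry.getKey());
            }
        }
        return pushed;
    }
}
```

Take a threshold of 0.7, inside the suggested range, and candidates with probabilities 0.92, 0.55 and 0.74 in that order. The first and third combinations would be pushed.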
Specifically, before step S41, the method further comprises:
Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset number threshold, re-obtaining the target DeepFM model through steps S31-S34 so as to re-determine the candidate keyword combinations output by the target DeepFM model and their corresponding probabilities;
Here, N/A indicates that the probability of a candidate keyword combination output by the target DeepFM model has overflowed, and 0 indicates that it has underflowed.
In practice, the target DeepFM model is re-obtained by widening the range of historical data to increase the number of training samples and then retraining the initial DeepFM model through steps S31-S34.
In practical applications, the implementer may set the preset number threshold according to the actual situation; preferably, it is set to 0.4 to 0.6 times the total number of candidate keyword combinations output by the target DeepFM model.
Deciding whether to re-obtain the target DeepFM model according to the number of N/A and 0 values among the probabilities of the candidate keyword combinations improves the accuracy of the candidate keyword combinations and their probabilities, and in turn the accuracy of the SQL statements pushed to the user.
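The step S40 check can be sketched as counting degenerate probabilities. The representation is an assumption: an overflowed probability (N/A) is taken to arrive as NaN, infinity or null, and an underflowed one as exactly 0.0. The number threshold is expressed as a fraction of the candidate count, matching the suggested 0.4 to 0.6 range. RetrainCheck is a hypothetical name.

```java
import java.util.List;

// Hypothetical sketch of step S40. Assumptions: N/A (overflow) is NaN, infinity
// or null; underflow is exactly 0.0; the threshold is a ratio of the candidate count.
final class RetrainCheck {
    private RetrainCheck() {}

    static boolean needsRetraining(List<Double> probabilities, double thresholdRatio) {
        long degenerate = probabilities.stream()
                .filter(p -> p == null || p.isNaN() || p.isInfinite() || p == 0.0)
                .count();
        // Preset number threshold, e.g. 0.4-0.6 times the number of candidates.
        double numberThreshold = thresholdRatio * probabilities.size();
        return degenerate > numberThreshold; // retrain via steps S31-S34 when true
    }
}
```

With a ratio of 0.5 and four candidates, two degenerate values do not trigger retraining, but three do.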
Step S5, completing the SQL statement fragment based on the push keyword combinations, pushing the resulting SQL statements to the user, and obtaining the SQL statement selected by the user;
FIG. 7 is a flow chart of step S5 in an embodiment of the invention. Specifically, step S5 comprises:
Step S51, comparing the push keywords in each push keyword combination with the original keywords to determine a plurality of distinguishing keywords;
Step S52, performing statement analysis on the distinguishing keywords to obtain a plurality of distinguishing statement fragments;
Step S53, splicing the SQL statement fragment with the distinguishing statement fragments to obtain a plurality of SQL statements;
Step S54, pushing the SQL statements to the user to obtain the SQL statement selected by the user.
The invention determines the distinguishing keywords and, from them, the distinguishing statement fragments. Splicing these fragments onto the SQL statement fragment yields the SQL statements pushed to the user, which reduces erroneous pushes and improves input efficiency.
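Steps S51-S54 can be sketched as a set difference followed by string splicing. The simplifications are assumptions, since the description does not define the statement analysis of step S52. A push combination is treated as an ordered keyword list, the distinguishing fragment is just the distinguishing keywords joined with spaces, and splicing appends that fragment to the user's input. StatementCompleter and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of steps S51-S54. Assumptions: ordered push combinations,
// S52 reduced to joining keywords with spaces, S53 reduced to appending.
final class StatementCompleter {
    private StatementCompleter() {}

    // Step S51: push keywords that are not already among the original keywords.
    static List<String> distinguishingKeywords(List<String> pushCombination, Set<String> originalKeywords) {
        List<String> distinguishing = new ArrayList<>();
        for (String keyword : pushCombination) {
            if (!originalKeywords.contains(keyword)) {
                distinguishing.add(keyword);
            }
        }
        return distinguishing;
    }

    // Steps S52-S53: build the distinguishing fragment and splice it onto the input.
    static String complete(String fragment, List<String> pushCombination, Set<String> originalKeywords) {
        List<String> distinguishing = distinguishingKeywords(pushCombination, originalKeywords);
        String base = fragment.trim();
        return distinguishing.isEmpty() ? base : base + " " + String.join(" ", distinguishing);
    }

    // Step S54 input: one candidate SQL statement per push keyword combination.
    static List<String> candidates(String fragment, List<List<String>> pushCombinations,
                                   Set<String> originalKeywords) {
        List<String> statements = new ArrayList<>();
        for (List<String> combination : pushCombinations) {
            statements.add(complete(fragment, combination, originalKeywords));
        }
        return statements;
    }
}
```

For the fragment insert into and the push combination [insert, into, main_table], this yields insert into main_table. A production version would also need to respect SQL grammar, for instance when the user's last token is a partial word such as in.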
Step S6, processing the SQL statement selected by the user according to a preset mapping rule to obtain the corresponding Elasticsearch operation logic, and determining that operation logic as the target operation logic.
Fig. 8 is a schematic flowchart of step S6 in an embodiment of the present invention. Specifically, step S6 includes:
Step S61, parsing the SQL statement selected by the user according to a preset SQL parsing method to obtain the semantic information corresponding to that statement;
In implementation, the preset SQL parsing method may be any prior-art method capable of parsing an SQL statement into semantic information, and is not described in detail here.
Specifically, the semantic information includes the operation type (e.g., SELECT, INSERT, UPDATE, DELETE) and related behavior, query conditions, table names, field names, and the like.
In step S61, the preset SQL parsing method includes parsing with the parse method of the CCJSqlParserUtil class.
In implementation, when the parse method of the CCJSqlParserUtil class (from the JSqlParser library) is used and the SQL statement selected by the user is "insert into user (id, name, age) values (:id, :name, :age)", the code for parsing it is:
String sql = "insert into user (id, name, age) values (:id, :name, :age)";
// parse the SQL statement selected by the user
Statement statement = CCJSqlParserUtil.parse(sql);
Step S62, converting the semantic information into the corresponding Elasticsearch operation logic based on the preset mapping rule, and determining it as the target operation logic.
In practice, the implementer can set the preset mapping rule based on an expert system according to the actual situation; for example, a WHERE clause in SQL is converted into an Elasticsearch query condition, and UPDATE and INSERT statements are converted into the corresponding Elasticsearch update and insert operations, respectively.
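One possible preset mapping rule can be sketched as below. It assumes the semantic information from step S61 has already been reduced to a hypothetical SemanticInfo record (operation type, table, selected fields, column values, and equality-only WHERE conditions), uses the table name directly as the index name, and describes each result as an HTTP method, REST path and JSON body. The endpoints (_search, _doc, _update_by_query, _delete_by_query) and the bool/term query syntax are standard Elasticsearch REST features, but the mapping choices themselves are illustrative, not the patent's rule set.

```java
import java.util.*;
import java.util.stream.Collectors;

public class SqlToEsMapping {
    enum OpType { SELECT, INSERT, UPDATE, DELETE }

    // Hypothetical container for the semantic information produced in step S61.
    record SemanticInfo(OpType op, String table, List<String> fields,
                        Map<String, Object> values, Map<String, Object> whereEq) {}

    // Target operation logic described as HTTP method, REST path and JSON body.
    record EsRequest(String method, String path, String body) {}

    static String json(Object v) {
        if (v instanceof Number || v instanceof Boolean) {
            return v.toString();
        }
        String s = String.valueOf(v).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }

    static String object(Map<String, Object> m) {
        return m.entrySet().stream()
                .map(e -> json(e.getKey()) + ":" + json(e.getValue()))
                .collect(Collectors.joining(",", "{", "}"));
    }

    // WHERE a = x AND b = y  ->  bool query with one term filter per condition.
    static String query(Map<String, Object> whereEq) {
        if (whereEq.isEmpty()) {
            return "{\"match_all\":{}}";
        }
        String filters = whereEq.entrySet().stream()
                .map(e -> "{\"term\":{" + json(e.getKey()) + ":" + json(e.getValue()) + "}}")
                .collect(Collectors.joining(","));
        return "{\"bool\":{\"filter\":[" + filters + "]}}";
    }

    // Sketch of a mapping rule: table -> index, operation type -> REST endpoint.
    static EsRequest map(SemanticInfo info) {
        String index = info.table();
        switch (info.op()) {
            case SELECT: {
                String source = info.fields().stream().map(SqlToEsMapping::json)
                        .collect(Collectors.joining(",", "[", "]"));
                return new EsRequest("GET", "/" + index + "/_search",
                        "{\"_source\":" + source + ",\"query\":" + query(info.whereEq()) + "}");
            }
            case INSERT:
                return new EsRequest("POST", "/" + index + "/_doc", object(info.values()));
            case UPDATE: {
                // Painless script that assigns each SET column from the params map.
                String script = info.values().keySet().stream()
                        .map(f -> "ctx._source." + f + " = params." + f)
                        .collect(Collectors.joining("; "));
                return new EsRequest("POST", "/" + index + "/_update_by_query",
                        "{\"script\":{\"source\":" + json(script) + ",\"params\":"
                                + object(info.values()) + "},\"query\":" + query(info.whereEq()) + "}");
            }
            default: // DELETE
                return new EsRequest("POST", "/" + index + "/_delete_by_query",
                        "{\"query\":" + query(info.whereEq()) + "}");
        }
    }

    public static void main(String[] args) {
        Map<String, Object> where = new LinkedHashMap<>();
        where.put("age", 30);
        EsRequest req = map(new SemanticInfo(OpType.SELECT, "user", List.of("id", "name"), Map.of(), where));
        System.out.println(req.method() + " " + req.path()); // GET /user/_search
        System.out.println(req.body());
        // {"_source":["id","name"],"query":{"bool":{"filter":[{"term":{"age":30}}]}}}
    }
}
```

Term filters match exact values, so this sketch assumes the WHERE columns are mapped as keyword or numeric fields; range operators, OR conditions and full-text matching would need additional rules.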
By parsing the SQL statement and determining the target operation logic from it, the invention improves conversion efficiency and the flexibility of operating Elasticsearch.
According to the invention, the SQL statement fragment input by the user is analyzed to obtain its original keywords, and the user's intended input is predicted with the target deepfm model and recommended to the user, which improves input efficiency and therefore query efficiency. The SQL statement selected by the user is then processed according to the preset mapping rule to obtain the corresponding Elasticsearch operation logic, which improves the flexibility of operating Elasticsearch and the user's query efficiency.
Thus far, the technical solution of the present invention has been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of protection of the present invention is not limited to these specific embodiments. Equivalent modifications and substitutions for related technical features may be made by those skilled in the art without departing from the principles of the present invention, and such modifications and substitutions will be within the scope of the present invention.

Claims (8)

1. A method for operating Elasticsearch based on SQL, comprising:
Step S1, obtaining an SQL statement fragment input by a user;
Step S2, analyzing the SQL statement fragment according to a preset statement analysis method to obtain several original keywords;
wherein step S2 comprises:
Step S21, obtaining a target instruction according to the SQL statement fragment, the target instruction including at least a name tag of a target task type, the target task type being any one of several preset task types;
Step S22, inputting the SQL statement fragment and the target instruction into a target information extraction model to obtain an extraction result output by the target information extraction model;
Step S23, determining several original keywords according to the extraction result;
wherein, in step S22, the target information extraction model is obtained by the following steps:
Step S221, obtaining several original text sets, wherein each original text set has a corresponding original task type, each original task type is one of the several preset task types, and each original text set includes several original texts;
Step S222, acquiring a corresponding target training sample for each original text in the several original text sets to obtain a target training sample set;
Step S223, training a preset large language model on the target training sample set to obtain the target information extraction model;
Step S3, inputting the several original keywords into a target deepfm model to obtain several candidate keyword combinations output by the target deepfm model and the probability corresponding to each candidate keyword combination;
Step S4, determining, based on the probabilities corresponding to the candidate keyword combinations, several push keyword combinations to be pushed to the user from among the candidate keyword combinations;
Step S5, based on each push keyword combination, completing the SQL statement fragment into an SQL statement supplemented with keywords and pushing it to the user, to obtain the SQL statement selected by the user;
Step S6, processing the SQL statement selected by the user according to a preset mapping rule to obtain the Elasticsearch operation logic corresponding to the SQL statement selected by the user, and determining it as the target operation logic.

2. The method for operating Elasticsearch based on SQL according to claim 1, characterized in that, in step S3, the target deepfm model is obtained by the following steps:
Step S31, obtaining, from historical data, the initial keyword combinations corresponding to SQL statement fragments input by several users and several intermediate keyword combinations corresponding to the SQL statements input by each user, each initial keyword combination including several initial keywords;
Step S32, generating several training samples according to the initial keyword combinations;
Step S33, if an intermediate keyword combination contains all the initial keywords of an initial keyword combination, using it as a sample label of the training sample corresponding to that initial keyword combination;
Step S34, training an initial deepfm model according to each training sample and the corresponding sample label to obtain the target deepfm model.

3. The method for operating Elasticsearch based on SQL according to claim 2, characterized in that step S4 comprises:
Step S41, traversing the probabilities corresponding to the candidate keyword combinations, and determining each probability greater than a preset probability threshold as a key probability;
Step S42, determining the candidate keyword combinations corresponding to the key probabilities as the push keyword combinations to be pushed to the user, so as to obtain several push keyword combinations.

4. The method for operating Elasticsearch based on SQL according to claim 3, characterized in that step S5 comprises:
Step S51, comparing each push keyword in each push keyword combination with each original keyword to determine several distinguishing keywords;
Step S52, performing statement analysis on the several distinguishing keywords to obtain several distinguishing statement fragments;
Step S53, concatenating the SQL statement fragment with each distinguishing statement fragment to obtain several SQL statements;
Step S54, pushing the several SQL statements to the user and obtaining the SQL statement selected by the user.

5. The method for operating Elasticsearch based on SQL according to claim 4, characterized in that step S6 comprises:
Step S61, parsing the SQL statement selected by the user according to a preset SQL parsing method to obtain the semantic information corresponding to the SQL statement selected by the user;
Step S62, converting the semantic information into the corresponding Elasticsearch operation logic based on the preset mapping rule, and determining it as the target operation logic.

6. The method for operating Elasticsearch based on SQL according to claim 5, characterized in that, in step S222, the target training sample is obtained by the following steps:
Step S2221, determining, according to the original task type corresponding to each original text set, the task template corresponding to that original text set, wherein each preset task type has a corresponding task template, each task template has several slots, and each slot is filled with the name tag corresponding to a preset task type;
Step S2222, filling the name tag of the original task type corresponding to each original text into the slots of the corresponding task template to obtain the task instruction corresponding to each original text;
Step S2223, concatenating each original text with its corresponding task instruction to obtain the corresponding target training sample.

7. The method for operating Elasticsearch based on SQL according to claim 6, characterized in that, in step S61, the preset SQL parsing method includes parsing with the parse method of the CCJSqlParserUtil class.

8. The method for operating Elasticsearch based on SQL according to claim 7, characterized in that, before step S41, the method further comprises:
Step S40, if the total number of N/A and 0 values among the probabilities corresponding to the candidate keyword combinations is greater than a preset quantity threshold, reacquiring the target deepfm model based on steps S31 to S34, so as to re-determine the several candidate keyword combinations output by the target deepfm model and the probability corresponding to each candidate keyword combination;
wherein N/A indicates that the probability corresponding to a candidate keyword combination output by the target deepfm model has overflowed, and 0 indicates that the probability corresponding to a candidate keyword combination of a tab to be displayed, output by the target deepfm model, has underflowed.
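Steps S2221-S2223 of claim 6 (building instruction-style training samples by slot filling) can be sketched as follows. The task-type name tags, the template wording, the "{type}" slot marker, and the choice to place the instruction before the original text are all hypothetical; the claim does not fix any of them.

```java
import java.util.*;

public class InstructionSamples {
    // Hypothetical templates, one per preset task type; "{type}" marks a slot.
    static final Map<String, String> TEMPLATES = Map.of(
            "keyword_extraction", "Task type: {type}. Perform {type} on the SQL fragment below.",
            "table_extraction", "Task type: {type}. Return the result of {type} for the SQL fragment below.");

    // Steps S2221-S2222 (sketch): select the template for the task type and fill
    // every slot with that task type's name tag to obtain the task instruction.
    static String instructionFor(String taskType) {
        String template = TEMPLATES.get(taskType);
        if (template == null) {
            throw new IllegalArgumentException("no template for task type: " + taskType);
        }
        return template.replace("{type}", taskType);
    }

    // Step S2223 (sketch): concatenate the task instruction with the original text.
    static String trainingSample(String taskType, String originalText) {
        return instructionFor(taskType) + "\n" + originalText;
    }

    public static void main(String[] args) {
        System.out.println(trainingSample("table_extraction", "SELECT name FROM user WHERE age > 30"));
    }
}
```

At inference time (step S21), the same template filling would presumably yield the target instruction passed to the extraction model together with the user's fragment.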

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411156408.8A CN118689900B (en) 2024-08-22 2024-08-22 A method of operating Elasticsearch based on SQL

Publications (2)

Publication Number Publication Date
CN118689900A CN118689900A (en) 2024-09-24
CN118689900B true CN118689900B (en) 2024-12-20

Family

ID=92773049

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411156408.8A Active CN118689900B (en) 2024-08-22 2024-08-22 A method of operating Elasticsearch based on SQL

Country Status (1)

Country Link
CN (1) CN118689900B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115481145A (en) * 2022-08-15 2022-12-16 深圳壹账通智能科技有限公司 Data query method, device, equipment and medium based on search engine

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11023463B2 (en) * 2016-09-26 2021-06-01 Splunk Inc. Converting and modifying a subquery for an external data system
CN106934062B (en) * 2017-03-28 2020-05-19 广东工业大学 Implementation method and system for querying elastic search
CN114201507A (en) * 2021-12-14 2022-03-18 平安养老保险股份有限公司 Log query method, device, device and storage medium based on ElasticSearch

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant