[go: up one dir, main page]

CN115203237A - Method, device and equipment for generating query sql based on hive metadata - Google Patents

Method, device and equipment for generating query sql based on hive metadata Download PDF

Info

Publication number
CN115203237A
CN115203237A CN202210908469.XA CN202210908469A CN115203237A CN 115203237 A CN115203237 A CN 115203237A CN 202210908469 A CN202210908469 A CN 202210908469A CN 115203237 A CN115203237 A CN 115203237A
Authority
CN
China
Prior art keywords
data
metadata
preset
sql
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210908469.XA
Other languages
Chinese (zh)
Other versions
CN115203237B (en
Inventor
李臻阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Changan Automobile Co Ltd
Original Assignee
Chongqing Changan Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Changan Automobile Co Ltd filed Critical Chongqing Changan Automobile Co Ltd
Priority to CN202210908469.XA priority Critical patent/CN115203237B/en
Publication of CN115203237A publication Critical patent/CN115203237A/en
Application granted granted Critical
Publication of CN115203237B publication Critical patent/CN115203237B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/2445Data retrieval commands; View definitions
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/907Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • G06F8/33Intelligent editors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Library & Information Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请涉及一种基于hive元数据生成查询sql的方法、装置及设备,方法包括:获取元数据信息,标注元数据信息后存储至预设的mysql数据库,得到初始元数据;基于初始元数据,确定不同表的字段列表,连接不同表的字段列表,生成数据表关联关系,将数据表关联关系作为数据模型存储至预设主题域后,将预设主题域存储至预设的mysql数据库;获取数据模型中的目标数据关系,利用递归算法生成查询目标数据关系的sql模板,通过修改sql模板进行提数开发。解决了因编写复杂的sql语句借助大数据组件进行提数,导致需要对元数据及数据表之间的关系进行维护,并且操作员需要根据各表之间的关系来编写复杂sql,造成开发效率低的问题。

Figure 202210908469

The present application relates to a method, device and device for generating query sql based on hive metadata. The method includes: obtaining metadata information, marking the metadata information and storing it in a preset mysql database to obtain initial metadata; based on the initial metadata, Determine the field list of different tables, connect the field lists of different tables, generate the data table association relationship, store the data table association relationship as a data model in the preset subject domain, and store the preset subject domain in the preset mysql database; get For the target data relationship in the data model, a recursive algorithm is used to generate a SQL template for querying the target data relationship, and the data is developed by modifying the SQL template. It solves the problem that the relationship between metadata and data tables needs to be maintained because of writing complex SQL statements with the help of big data components, and operators need to write complex SQL according to the relationship between tables, resulting in development efficiency. low problem.

Figure 202210908469

Description

Method, device and equipment for generating query sql based on hive metadata
Technical Field
The application relates to the technical field of big data query, in particular to a method, a device and equipment for generating query sql based on hive metadata.
Background
With the advancement of big data informatization construction, businesses of all departments gather a large amount of business data to be stored in a data warehouse. This is accompanied by a large number of data-lifting requirements, which require the operator to be familiar with the relationships between the data tables, by writing complex sql statements to carry out the lifting with large data components.
However, this presents two problems, first, the need to maintain the relationship between the metadata and the data tables; secondly, an operator needs to write complex sql according to the relationship among the tables, and the efficiency is reduced along with the increase of the complexity of the relationship, which needs to be solved urgently.
Disclosure of Invention
The application provides a method, a device and equipment for generating query sql based on hive metadata, and aims to solve the problems that due to the fact that writing of complex sql statements is carried out through a big data assembly, relationships between metadata and data tables need to be maintained, and an operator needs to write complex sql according to the relationships between the tables, and development efficiency is low.
An embodiment of a first aspect of the present application provides a method for generating a query sql based on hive metadata, including the following steps: acquiring metadata information, labeling the metadata information, and storing the metadata information to a preset mysql database to obtain initial metadata; determining field lists of different tables based on the initial metadata, connecting the field lists of the different tables to generate a data table association relation, storing the data table association relation serving as a data model to a preset subject domain, and storing the preset subject domain to a preset mysql database; and acquiring a target data relation in the data model, and generating an sql template for inquiring the target data relation by using a recursive algorithm so as to carry out number extraction development by modifying the sql template.
According to the technical means, the problem that due to the fact that the complex sql statements are written for counting by means of a big data component, the relation between metadata and data tables needs to be maintained, and an operator needs to write the complex sql statements according to the relation between the tables, the development efficiency is low is solved, the maintenance cost of an administrator is reduced, and the development efficiency is improved.
Further, the acquiring metadata information, labeling the metadata information, and storing the labeled metadata information in a preset mysql database includes: acquiring information of an existing database and a data table through a live metastore api (a configuration unit element storage interface); based on the existing database, acquiring the metadata information in the data table information by using the hive scanner api and hdfs (Hadoop Distributed File System); and labeling the metadata information based on preset labeling data, and storing the labeled metadata information to the preset mysql database.
According to the technical means, the information of the existing database and the data table is obtained through the live metastore api, an operable interface is provided, and metadata information can be conveniently marked.
Further, the metadata information includes at least one of a table name, a table remark, table field information, a hard disk space size of table original data and a distributed file system HDFS space occupied by the table data.
According to the technical means, the information of the metadata in the database and the data table is obtained.
Further, the preset labeling data includes: at least one of the names of the characters in the table, the updating period and the preset loading strategy.
According to the technical means, the introduced metadata is labeled by the operable page, so that the metadata is more readable.
Further, the data model is a tree structure composed of a fact table and a plurality of dimension tables, wherein the fact table is a root node.
According to the technical means, the tree structure refers to a data structure with one-to-many tree relationship among data elements, the data structure is clear, and the database query rate is high.
Further, the data table association includes a first association between the fact table and any one of the plurality of dimension tables and a second association between any two of the plurality of dimension tables.
According to the technical means, the consistency and the integrity of the data in the two tables are restricted by establishing connection of the data of the two tables.
An embodiment of a second aspect of the present application provides an apparatus for generating a query sql based on hive metadata, including: the marking module is used for acquiring metadata information, marking the metadata information and storing the metadata information to a preset mysql database to obtain initial metadata; the storage module is used for determining field lists of different tables based on the initial metadata, connecting the field lists of the different tables, generating a data table association relation, storing the data table association relation serving as a data model to a preset subject domain, and storing the preset subject domain to the preset mysql database; and the generation module is used for acquiring the target data relation in the data model, generating an sql template for inquiring the target data relation by using a recursive algorithm and carrying out number-increasing development by modifying the sql template.
Further, the labeling module is specifically configured to: acquiring information of an existing database and a data table through a live metastore api; based on the existing database, acquiring the metadata information in the data table information by using the hive metastore api and the hdfs api; and marking the metadata information based on preset marking data, and storing the marked metadata information to the preset mysql database.
Further, the device for generating query sql based on hive metadata is characterized in that the metadata information includes at least one of a table name, a table remark, table field information, a hard disk space size of table original data, and a distributed file system HDFS space occupied by table data.
Further, the apparatus for generating query sql based on hive metadata is characterized in that the preset annotation data includes: at least one of a table name, an update period, and a preset loading policy.
Further, the apparatus for generating query sql based on hive metadata is characterized in that the data model is a tree structure composed of a fact table and a plurality of dimension tables, where the fact table is a root node.
Further, the apparatus for generating query sql based on hive metadata is characterized in that the data table association relationship includes a first association relationship between the fact table and any one of the plurality of dimension tables and a second association relationship between any two of the plurality of dimension tables.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of generating a query sql based on hive metadata as described in the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor, so as to implement the method for generating a query sql based on hive metadata as described in the foregoing embodiments.
Therefore, initial metadata is obtained by obtaining metadata information, labeling the metadata information and storing the metadata information to a preset mysql database, field lists of different tables are determined based on the initial metadata, the field lists of the different tables are connected, a data table association relation is generated, the data table association relation is stored to a preset subject domain as a data model, the preset subject domain is stored to the preset mysql database, a target data relation in the data model is obtained, an sql template for inquiring the target data relation is generated by using a recursive algorithm, and number extraction development is performed by modifying the sql template. The problem that due to the fact that the complex sql statements are written for number increasing through a big data assembly, the relation between metadata and data tables needs to be maintained, and an operator needs to write the complex sql statements according to the relation between the tables, development efficiency is low is solved, maintenance cost of an administrator is reduced, and development efficiency is improved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart of a method for generating a query sql based on hive metadata according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a mysql database ER storing hive table relationships according to an embodiment of the present application;
FIG. 3 is a schematic flow diagram of a recursive method for query sql generation according to an embodiment of the application;
FIG. 4 is a block schematic diagram of an apparatus for generating a query sql based on hive metadata according to an embodiment of the present application;
fig. 5 is a block diagram of an electronic device according to an embodiment of the application.
Description of reference numerals:
10-means to generate query sql based on hive metadata, 100-annotation module, 200-storage module, 300-generation module, 501-memory, 502-processor, 503-communication interface.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a method, an apparatus, and a device for generating a query sql based on hive metadata according to an embodiment of the present application with reference to the drawings. In the method, metadata information is obtained, the metadata information is labeled and then is stored in a preset mysql database to obtain initial metadata, field lists of different tables are determined based on the initial metadata and are connected with the field lists of different tables to generate a data table association relation, the data table association relation is stored to a preset mysql database as a data model, a preset theme domain is stored to the preset mysql database to obtain a target data relation in the data model, a sql template for inquiring the target data relation is generated by a recursive algorithm, and the number extraction development is carried out by modifying the sql template. Therefore, the problem that due to the fact that the writing of complex sql statements is carried out through a large data assembly, the relation between metadata and data tables needs to be maintained, and an operator needs to write the complex sql according to the relation between the tables, the development efficiency is low is solved, the maintenance cost of an administrator is reduced, and the development efficiency is improved.
Specifically, fig. 1 is a flowchart illustrating a method for generating a query sql based on hive metadata according to an embodiment of the present application.
As shown in fig. 1, the method for generating query sql based on hive metadata includes the following steps:
in step S101, metadata information is obtained, and the metadata information is tagged and then stored in a preset mysql database, so as to obtain initial metadata.
The hive is a data warehouse tool based on a distributed system infrastructure Hadoop, is used for data extraction, conversion and loading, and is a mechanism capable of storing, inquiring and analyzing large-scale data stored in the Hadoop; metadata, i.e. hive-defined table name, field name, type, partition, user data, is typically stored in mysql.
Optionally, in some embodiments, obtaining metadata information, labeling the metadata information, and storing the labeled metadata information in a preset mysql database includes: acquiring information of an existing database and a data table through a live metastore api; based on the existing database, using hive metastore api and hdfsapi to obtain the metadata information in the data table information; and marking metadata information based on preset marking data, and storing the marked metadata information to a preset mysql database.
Wherein the metadata information obtained from hive and hdfs and the operator tagged information are stored to mysql.
Optionally, in some embodiments, the metadata information may include at least one of a table name, a table remark, table field information, a size of a hard disk space where table raw data is located, and a distributed file system HDFS space occupied by the table data.
It can be understood that, the information of the existing database and the data table is obtained through the live metastore api, an operable interface is provided, the administrator introduces the metadata of the table by selecting the data table, and the following information is obtained through the live metastore api and the hdfs api: table name, table remark, table field information (field English name, field type, field division, remark), hard disk space size of table original data, and HDFS space size occupied by table data.
Optionally, in some embodiments, the preset annotation data includes: at least one of a table name, an update period, and a preset loading policy.
It should be understood that the operator may then make some annotation on the imported metadata through the operable page, so that the metadata is more readable, and may supplement the following information: table name, update period (day, week, month, year, real time), loading policy (incremental append, full overlay, full snapshot, upsert).
In step S102, based on the initial metadata, field lists of different tables are determined, the field lists of different tables are connected, a data table association relationship is generated, and after the data table association relationship is stored as a data model in a preset subject domain, the preset subject domain is stored in a preset mysql database.
Optionally, in some embodiments, the data model is a tree structure consisting of a fact table and a plurality of dimension tables, wherein the fact table is a root node.
The fact table stores data which can embody actual data or detailed numerical values and generally consists of dimension codes and fact data; the dimension table stores data with independent attributes and hierarchical structures, generally consists of dimension codes and corresponding dimension descriptions (labels), and can be associated with the fact table.
Optionally, in some embodiments, the data table associations include a first association between the fact table and any one of the plurality of dimension tables and a second association between any two of the plurality of dimension tables.
Specifically, a concept of a topic domain is introduced, a data model is located under the topic domain, and a relational graph corresponding to a plurality of tables under one model is shown in fig. 2. A tree structure of a model is defined, which is composed of a fact table and a plurality of dimension tables, and the fact table is used as a root node. Introducing a plurality of tables with mysql which is stored, labeling the table type (a fact table or a dimension table), establishing the association relation between the fields of the fact table and the dimension table and between the dimension table and the dimension table, and storing the association relation in a mysql database. The theme domain is a larger concept, the theme is a little, and the entity is the smallest, wherein the theme is a data model, and the entity represents a relational graph of a plurality of tables.
In step S103, a target data relationship in the data model is obtained, and an sql template for querying the target data relationship is generated by using a recursive algorithm, so as to perform number-increasing development by modifying the sql template.
Specifically, a data model under a certain subject domain is selected, the data relationship of the model is read, generation of query sql between relationship tables is realized by using a recursive function, the function is described below by combining a specific example, and a flow diagram of a recursive method for generating the query sql is shown in fig. 3.
In step S301, local variables (sql, modelId, tableId, tabNum) and a global variable tableNum (table number) are input.
First, a function declaration of Java language is private StringBuilder sqlnproducer (StringBuilder sql, stringmodel id, stringtableid, int tabNum); there is a global variable tablemum with an initial value of 0.
In step S302, the table field and table name are looked up using tableld. Specifically, the global variable tablemum is linked by the character't' for this part of the name, resulting in, for example:
select\n\tname,age\nfrom student t0。
step S303, splicing the 'select table field'. The field information and the table name information of the table are obtained from the database through tableId, such as the field id, the name and the age of the table, a splicing selection part is adopted, tabNum is used as indentation control, the initial value is 0, the formatting of sql is controlled and generated, the initial \ t quantity of each indentation is the same as that of tabNum, and when the indentation is needed, the \ t quantity of tabNum +1 is used. The results are for example:
select\n\tid,name,age\nfrom employee。
step S304, splicing the "from table name".
In step S305, the table is named "t" + tablemm. Specifically, the global variable tablemum is named for this part by the character't', resulting in, for example:
select\n\tname,age\nfrom student t0。
step S306, using modelId (service model id) and tableId to query the right table information. Specifically, right connection table information such as table destination field name and employeeId is queried through modeld and tableId, the connection mode is left connection left join (the other connection modes are the same), the first half of sql is spliced and connected, and the result is, for example:
select\n\tid,name,age\nfrom student t0\nleft join(\n。
in step S307, it is determined whether the query right connection table exists, if so, step S309 is executed, and if not, step S308 is executed.
And step S308, returning to the generated sql.
Step S309, determine the type left, right, inner, full and field of the connection.
And step S310, splicing and connecting the front half part of the sql and the left bracket.
In step S311, the "sql generation flow" (sql, modelId, rightTableId, tabNum + 1) is called again with the currently generated sql and the connection table id as parameters to splice the return values.
And step S312, splicing and connecting the second half part of the sql and the right bracket.
In step S313, the table is named "t" + tablemm.
Step S314, tableNum + +. Wherein, the second half of the concatenation connection sql names the connected sub-query part according to the global variable tableNum, and the result is, for example:
select\n\tid,name,age\nfrom student t0\nleft join
(\n\tselect\n\t\tname,employeeId\n\tfrom department t1)t2 on t0.id=t2.employeeId
and step S315, splicing connection conditions. Specifically, if there are a plurality of connection conditions, the and splicing process is used.
And step S316, returning the generated sql.
According to the method for generating the query sql based on the hive metadata, metadata information is obtained, the metadata information is labeled and then stored in a preset mysql database, initial metadata is obtained, field lists of different tables are determined based on the initial metadata, the field lists of the different tables are connected, a data table association relation is generated, the data table association relation is stored to a preset subject domain as a data model, the preset subject domain is stored to the preset mysql database, a target data relation in the data model is obtained, an sql template for querying the target data relation is generated by using a recursive algorithm, and the number is extracted and developed by modifying the sql template. Therefore, the problem that the relation between metadata and a data table needs to be maintained due to the fact that the writing of complex sql statements is carried out by means of a big data component, and an operator needs to write the complex sql according to the relation between the tables, so that the development efficiency is low is solved, the maintenance cost of an administrator is reduced, and the development efficiency is improved.
Next, an apparatus for generating a query sql based on hive metadata according to an embodiment of the present application will be described with reference to the drawings.
Fig. 4 is a block diagram illustrating an apparatus for generating a query sql based on hive metadata according to an embodiment of the present application.
As shown in fig. 4, the apparatus 10 for generating a query sql based on hive metadata includes: an annotation module 100, a storage module 200, and a generation module 300.
The labeling module 100 is configured to acquire metadata information, label the metadata information, and store the labeled metadata information in a preset mysql database to obtain initial metadata; the storage module 200 is configured to determine field lists of different tables based on the initial metadata, connect the field lists of the different tables, generate a data table association relationship, store the data table association relationship as a data model to a preset topic domain, and store the preset topic domain to a preset mysql database; the generating module 300 is configured to obtain a target data relationship in the data model, and generate an sql template for querying the target data relationship by using a recursive algorithm, so as to perform number-increasing development by modifying the sql template.
Optionally, in some embodiments, the labeling module 100 is configured to: acquiring information of an existing database and a data table through a live metastore api; based on the existing database, using hive metastore api and hdfs api to obtain the metadata information in the data table information; and marking metadata information based on preset marking data, and storing the marked metadata information to a preset mysql database.
Optionally, in some embodiments, the metadata information includes at least one of a table name, a table remark, table field information, a size of a hard disk space where table raw data is located, and a distributed file system HDFS space occupied by the table data.
Optionally, in some embodiments, the preset annotation data includes: at least one of the names of the characters in the table, the updating period and the preset loading strategy.
Optionally, in some embodiments, the data model is a tree structure consisting of a fact table and a plurality of dimension tables, wherein the fact table is a root node.
Optionally, in some embodiments, the data table associations include a first association between a fact table and any one of the plurality of dimension tables and a second association between any two of the plurality of dimension tables.
It should be noted that the foregoing explanation of the embodiment of the apparatus for generating the query sql based on the hive metadata is also applicable to the method for generating the query sql based on the hive metadata in this embodiment, and is not described herein again.
According to the device for generating the query sql based on the hive metadata, metadata information is obtained, the metadata information is labeled and then stored in a preset mysql database, initial metadata is obtained, field lists of different tables are determined based on the initial metadata, the field lists of the different tables are connected, a data table association relation is generated, the data table association relation is stored to a preset subject domain as a data model, the preset subject domain is stored to the preset mysql database, a target data relation in the data model is obtained, an sql template for querying the target data relation is generated by using a recursive algorithm, and the number is extracted and developed by modifying the sql template. Therefore, the problem that the relation between metadata and a data table needs to be maintained due to the fact that the writing of complex sql statements is carried out by means of a big data component, and an operator needs to write the complex sql according to the relation between the tables, so that the development efficiency is low is solved, the maintenance cost of an administrator is reduced, and the development efficiency is improved.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The vehicle may include: a memory 501, a processor 502, and a computer program stored on the memory 501 and executable on the processor 502.
The processor 502, when executing a program, implements the method of generating a query sql based on hive metadata as provided in the embodiments above.
Further, the electronic device further includes:
a communication interface 503 for communication between the memory 501 and the processor 502.
A memory 501 for storing computer programs operable on the processor 502.
The memory 501 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 501, the processor 502 and the communication interface 503 are implemented independently, the communication interface 503, the memory 501 and the processor 502 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 501, the processor 502, and the communication interface 503 are integrated on one chip, the memory 501, the processor 502, and the communication interface 503 may complete mutual communication through an internal interface.
The processor 502 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
Embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the above method for generating a query sql based on hive metadata.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays, field programmable gate arrays, and the like.
It will be understood by those skilled in the art that all or part of the steps carried out in the method of implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and the program, when executed, includes one or a combination of the steps of the method embodiments.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1.一种基于hive元数据生成查询sql的方法,其特征在于,包括以下步骤:1. a method for generating query sql based on hive metadata, is characterized in that, comprises the following steps: 获取元数据信息,并标注所述元数据信息后存储至预设的mysql数据库,得到初始元数据;Acquire metadata information, mark the metadata information and store it in a preset mysql database to obtain initial metadata; 基于所述初始元数据,确定不同表的字段列表,并连接所述不同表的字段列表,生成数据表关联关系,并将所述数据表关联关系作为数据模型存储至预设主题域后,将所述预设主题域存储至所述预设的mysql数据库;以及Based on the initial metadata, the field lists of different tables are determined, and the field lists of the different tables are connected to generate a data table association relationship, and after the data table association relationship is stored as a data model in a preset subject domain, the The preset subject domain is stored in the preset mysql database; and 获取所述数据模型中的目标数据关系,并利用递归算法生成查询所述目标数据关系的sql模板,以通过修改所述sql模板进行提数开发。The target data relationship in the data model is acquired, and a sql template for querying the target data relationship is generated by using a recursive algorithm, so as to perform data extraction development by modifying the sql template. 2.根据权利要求1所述的方法,其特征在于,所述获取元数据信息,并标注所述元数据信息后存储至预设的mysql数据库,包括:2. method according to claim 1, is characterized in that, described obtaining metadata information, and after labeling described metadata information, store to preset mysql database, comprising: 通过hive metastore api获取已有数据库和数据表信息;Obtain existing database and data table information through hive metastore api; 基于所述已有数据库,利用所述hive metastore api和hdfs api获取所述数据表信息中的所述元数据信息;Based on the existing database, use the hive metastore api and hdfs api to obtain the metadata information in the data table information; 基于预设的标注数据,标注所述元数据信息,并将标注后的元数据信息存储至所述预设的mysql数据库。Based on the preset annotation data, the metadata information is marked, and the marked metadata information is stored in the preset mysql database. 3.根据权利要求2所述的方法,其特征在于,所述元数据信息包括表名、表备注、表字段信息、表原始数据所在硬盘空间大小和表数据所占分布式文件系统HDFS空间中的至少一项。3. The method according to claim 2, wherein the metadata information includes table name, table remarks, table field information, the size of the hard disk space where the table original data is located, and the HDFS space occupied by the table data in the distributed file system. at least one of. 4.根据权利要求2所述的方法,其特征在于,所述预设的标注数据包括:表中文名、更新周期和预设的加载策略中的至少一项。4 . The method according to claim 2 , wherein the preset labeling data comprises: at least one of a table Chinese name, an update period and a preset loading strategy. 5 . 5.根据权利要求1所述的方法,其特征在于,所述数据模型为由事实表和多个维表组成的树形结构,其中,所述事实表为根节点。5. The method according to claim 1, wherein the data model is a tree structure composed of a fact table and a plurality of dimension tables, wherein the fact table is a root node. 6.根据权利要求5所述的方法,其特征在于,所述数据表关联关系包括所述事实表与所述多个维表中任一维表之间的第一关联关系和所述多个维表中任意两个维表之间的第二关联关系。6 . The method according to claim 5 , wherein the data table association relationship comprises a first association relationship between the fact table and any dimension table among the plurality of dimension tables and the plurality of dimension tables. 7 . The second association between any two dimension tables in the dimension table. 7.一种基于hive元数据生成查询sql的装置,其特征在于,包括:7. a device for generating query sql based on hive metadata, is characterized in that, comprising: 标注模块,获取元数据信息,并标注所述元数据信息后存储至预设的mysql数据库,得到初始元数据;The labeling module obtains metadata information, labels the metadata information and stores it in a preset mysql database to obtain initial metadata; 存储模块,基于所述初始元数据,确定不同表的字段列表,并连接所述不同表的字段列表,生成数据表关联关系,并将所述数据表关联关系作为数据模型存储至预设主题域后,将所述预设主题域存储至所述预设的mysql数据库;The storage module, based on the initial metadata, determines the field lists of different tables, connects the field lists of the different tables, generates a data table association relationship, and stores the data table association relationship as a data model in a preset subject domain Then, the preset subject domain is stored in the preset mysql database; 生成模块,获取所述数据模型中的目标数据关系,并利用递归算法生成查询所述目标数据关系的sql模板,以通过修改所述sql模板进行提数开发。The generating module obtains the target data relationship in the data model, and uses a recursive algorithm to generate a SQL template for querying the target data relationship, so as to perform data extraction development by modifying the SQL template. 8.根据权利要求7所述的基于hive元数据生成查询sql的装置,其特征在于,所述标注模块,具体用于:8. The device for generating query sql based on hive metadata according to claim 7, wherein the labeling module is specifically used for: 通过hive metastore api获取已有数据库和数据表信息;Obtain existing database and data table information through hive metastore api; 基于所述已有数据库,利用所述hive metastore api和hdfs api获取所述数据表信息中的所述元数据信息;Based on the existing database, use the hive metastore api and hdfs api to obtain the metadata information in the data table information; 基于预设的标注数据,标注所述元数据信息,并将标注后的元数据信息存储至所述预设的mysql数据库。Based on the preset annotation data, the metadata information is marked, and the marked metadata information is stored in the preset mysql database. 9.一种电子设备,其特征在于,包括存储器、处理器;9. An electronic device, comprising a memory and a processor; 其中,所述处理器通过读取所述存储器中存储的可执行程序代码来运行与所述可执行程序代码对应的程序,以用于实现如权利要求1-6中任一所述的基于hive元数据生成查询sql的方法。Wherein, the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to implement the hive-based system according to any one of claims 1-6 Metadata generates methods for querying SQL. 10.一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其特征在于,该程序被处理器执行时实现如权利要求1-6中任一所述的基于hive元数据生成查询sql的方法。10. A computer-readable storage medium storing a computer program, characterized in that, when the program is executed by a processor, the hive-based metadata according to any one of claims 1-6 is implemented A method to generate query sql.
CN202210908469.XA 2022-07-29 2022-07-29 Method, device and equipment for generating query sql based on hive metadata Active CN115203237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210908469.XA CN115203237B (en) 2022-07-29 2022-07-29 Method, device and equipment for generating query sql based on hive metadata

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210908469.XA CN115203237B (en) 2022-07-29 2022-07-29 Method, device and equipment for generating query sql based on hive metadata

Publications (2)

Publication Number Publication Date
CN115203237A true CN115203237A (en) 2022-10-18
CN115203237B CN115203237B (en) 2025-12-19

Family

ID=83585167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210908469.XA Active CN115203237B (en) 2022-07-29 2022-07-29 Method, device and equipment for generating query sql based on hive metadata

Country Status (1)

Country Link
CN (1) CN115203237B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109947A (en) * 2017-12-28 2019-08-09 北京京东尚科信息技术有限公司 The method and apparatus for extracting data
WO2019165691A1 (en) * 2018-02-28 2019-09-06 平安科技(深圳)有限公司 Method, apparatus and device for automatically generating test case, and readable storage medium
CN110399387A (en) * 2019-07-26 2019-11-01 北京明略软件系统有限公司 Method and device based on table incidence relation dynamic generation query SQL
CN111651531A (en) * 2020-06-05 2020-09-11 深圳前海微众银行股份有限公司 Data import method, device, device and computer storage medium
CN111797278A (en) * 2020-05-19 2020-10-20 武汉乐程软工科技有限公司 Method for mapping associated object and relation
CN111813796A (en) * 2020-06-15 2020-10-23 北京邮电大学 Column-level blood relationship processing system and method based on Hive data warehouse
CN111949856A (en) * 2020-08-11 2020-11-17 北京金山云网络技术有限公司 Object storage query method and device based on web
CN112463826A (en) * 2020-11-10 2021-03-09 银盛支付服务股份有限公司 Hive-based method and system for realizing linked list query of MongoDB
CN112948397A (en) * 2021-03-31 2021-06-11 中国建设银行股份有限公司 Data processing system, method, device and storage medium
CN113051263A (en) * 2019-12-26 2021-06-29 上海科技发展有限公司 Metadata-based big data platform construction method, system, equipment and medium
CN114328568A (en) * 2022-01-20 2022-04-12 重庆长安汽车股份有限公司 Hive job management method and system based on web application and readable storage medium

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110109947A (en) * 2017-12-28 2019-08-09 北京京东尚科信息技术有限公司 The method and apparatus for extracting data
WO2019165691A1 (en) * 2018-02-28 2019-09-06 平安科技(深圳)有限公司 Method, apparatus and device for automatically generating test case, and readable storage medium
CN110399387A (en) * 2019-07-26 2019-11-01 北京明略软件系统有限公司 Method and device based on table incidence relation dynamic generation query SQL
CN113051263A (en) * 2019-12-26 2021-06-29 上海科技发展有限公司 Metadata-based big data platform construction method, system, equipment and medium
CN111797278A (en) * 2020-05-19 2020-10-20 武汉乐程软工科技有限公司 Method for mapping associated object and relation
CN111651531A (en) * 2020-06-05 2020-09-11 深圳前海微众银行股份有限公司 Data import method, device, device and computer storage medium
CN111813796A (en) * 2020-06-15 2020-10-23 北京邮电大学 Column-level blood relationship processing system and method based on Hive data warehouse
CN111949856A (en) * 2020-08-11 2020-11-17 北京金山云网络技术有限公司 Object storage query method and device based on web
CN112463826A (en) * 2020-11-10 2021-03-09 银盛支付服务股份有限公司 Hive-based method and system for realizing linked list query of MongoDB
CN112948397A (en) * 2021-03-31 2021-06-11 中国建设银行股份有限公司 Data processing system, method, device and storage medium
CN114328568A (en) * 2022-01-20 2022-04-12 重庆长安汽车股份有限公司 Hive job management method and system based on web application and readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DONGHYUN CHOI ET AL.: "RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases", 《ARXIV:2004.03125V1》, 7 April 2020 (2020-04-07), pages 1 - 10 *
黄志 等: "基于Json的小型异构数据库同步策略研究", 《气象研究与应用》, no. 01, 15 March 2020 (2020-03-15), pages 50 - 55 *

Also Published As

Publication number Publication date
CN115203237B (en) 2025-12-19

Similar Documents

Publication Publication Date Title
CN112036736A (en) Workflow creating method and device
CN106708480B (en) Management platform implementation method and device
US8108367B2 (en) Constraints with hidden rows in a database
US7720885B2 (en) Generating a word-processing document from database content
CN115114321B (en) A dynamic query method and system
US20190228551A1 (en) Computerized systems and methods for graph data modeling
CN113608955B (en) Log recording method, device, equipment and storage medium
CN109710238A (en) Code generation method based on Spring Cloud
WO2023051125A1 (en) Structured data version management method, apparatus, and related device
CN113254455A (en) Dynamic configuration method and device of database, computer equipment and storage medium
CN115617338A (en) A method, device, and readable storage medium for quickly generating business pages
CN113918576B (en) Metadata management platform and management method
CN113282579B (en) Heterogeneous data storage and retrieval method, device, equipment and storage medium
CN117035639A (en) Business form and process automatic association processing method for process engine
US12260170B2 (en) Ability to add non-direct ancestor columns in child spreadsheets
Chellappan et al. MongoDB Recipes: With Data Modeling and Query Building Strategies
CN115629763A (en) Target code generation method and NPU instruction display method and device
CN118210809B (en) Object definition method, system, equipment and medium based on ER information
CN117033420B (en) Visual display method and device for entity data under same concept of knowledge graph
CN109471904B (en) Method and system for organizing labels
CN115203237A (en) Method, device and equipment for generating query sql based on hive metadata
CN119849462A (en) Method for automatically generating document based on label and template library and product thereof
CN109271161B (en) Template code generation method and device, storage device and computer device
JP2018112919A (en) Test input information retrieval apparatus and method
CN114895875B (en) Zero-code visual information system metadata production application method and system

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant