WO2020215689A1 - Query method and apparatus for column-oriented files - Google Patents
Query method and apparatus for column-oriented files
- Publication number
- WO2020215689A1 (PCT/CN2019/117763)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- file
- sql
- spl
- statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/182—Distributed file systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the embodiments of the present application relate to the technical field of database management, and in particular to a query method, query device, computer equipment, and readable storage medium for column storage files.
- the Search Processing Language (SPL) developed by Splunk is a common search language used to query log data that has been indexed.
- SPL Search Processing Language
- column storage such as parquet or optimized row columnar (orc)
- HDFS Hadoop Distributed File System
- this application aims to solve the problem that SPL statements cannot be used to directly query data stored in a column storage format.
- an embodiment of the present application provides a query method for column storage files, the method including:
- the query range in the first file of HDFS is determined, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
- the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL;
- an embodiment of the present application also provides a query device for column storage files, including:
- the obtaining module is used to obtain the SPL query sentence input by the user from the terminal;
- the determining module is configured to determine the query range in the first file of HDFS according to the SPL query sentence, where the first file is a column storage file, and the first file is classified and stored according to a preset storage rule,
- the preset storage rules include: time sequence, application name and/or operator ID;
- a screening module configured to screen out a second file from the first file according to the query range
- the search module is used to import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL;
- the output module is used to output the target query file to the terminal.
- an embodiment of the present application further provides a computer device, the computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the following steps are implemented when the computer-readable instructions are executed by the processor:
- the query range in the first file of HDFS is determined, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
- the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL;
- the embodiments of the present application also provide a non-volatile computer-readable storage medium.
- the non-volatile computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the following steps:
- the query range in the first file of HDFS is determined, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
- the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL;
- the query method, query device, computer equipment, and non-volatile computer-readable storage medium for column storage files convert the SPL statement into an SQL statement, determine the files in the HDFS column storage files that meet the query time range of the SPL statement, import those files into the big data platform SQL search engine, execute the SQL statement on that engine to search for the target query file, and output the target query file to the user terminal. This provides a unified query mode for users of the original log search system, expands the query range of SPL statements, and makes column storage data easier to query.
- FIG. 1 is a flowchart of the steps of a method for querying column storage files in the first embodiment of the application.
- FIG. 2 is a schematic diagram of the hardware architecture of the query device according to the second embodiment of the application.
- FIG. 3 is a schematic diagram of program modules of a column storage file query system according to the third embodiment of the application.
- FIG. 1 shows a flowchart of a method for querying column storage files in the first embodiment of the present application. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. It should be noted that, in this embodiment, the query device 2 for column storage files (hereinafter referred to as query device 2) is used as the execution subject for exemplary description. Details are as follows:
- Step S100: Obtain the SPL query statement input by the user from the terminal.
- the query device 2 obtains the SPL query sentence input by the user from the terminal.
- the query statement includes at least: query time range and name.
- Step S102: Determine the query range in the first file of HDFS according to the SPL query statement, where the first file is a column storage file, and the first file is classified and stored according to a preset storage rule.
- the preset storage rules include: time sequence, application name, and operator identification (ID).
- Step S104: Select a second file from the first file according to the query range.
- the query time range and the name of the file to be queried are identified according to the type of the SPL query statement, and the files corresponding to that query time range and name are filtered out of the column storage files.
- the column storage files are stored in accordance with preset storage rules to improve query efficiency.
- the preset storage rules can be set freely according to the needs of users, and may include the application name, the time, or other identifiers, which are not limited here.
- the files stored in the query device 2 are stored according to /[application name]/[year]/[month]/[day]/[hour]; when the user's SPL statement queries the application name syslog with a time range from September 1 to September 3, 2018, the file query range is determined to be the application name syslog and the time range September 1 to September 3, 2018.
- according to the determined query range, the files in the three folders /syslog/2018/9/1, /syslog/2018/9/2, and /syslog/2018/9/3 are filtered out.
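The folder selection in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `folders_in_range` is hypothetical, folders are resolved at day level as in the three folders listed above (the `[hour]` level of the layout is not expanded), and month and day are written without zero padding to match `/syslog/2018/9/1`.

```python
from datetime import date, timedelta
from typing import List


def folders_in_range(app_name: str, start: date, end: date) -> List[str]:
    """Return one /[application name]/[year]/[month]/[day] folder per day in [start, end].

    Hypothetical helper: the name and signature are assumptions, not part of the source.
    """
    if end < start:
        raise ValueError("end date precedes start date")
    folders = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # No zero padding, matching the /syslog/2018/9/1 form used in the text.
        folders.append(f"/{app_name}/{day.year}/{day.month}/{day.day}")
    return folders


print(folders_in_range("syslog", date(2018, 9, 1), date(2018, 9, 3)))
# ['/syslog/2018/9/1', '/syslog/2018/9/2', '/syslog/2018/9/3']
```

Because the directory layout encodes the application name and date, the query range can be resolved to folders without opening any column storage file.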
- Step S106: Convert the SPL query statement into an SQL statement according to a preset conversion rule.
- the query device 2 pre-establishes a conversion mapping table for common commands of SPL sentences and common commands of SQL sentences, and stores the conversion mapping table in a database.
- the conversion mapping table at least includes: common command types of SPL statements, common command types of SQL statements, and a mapping relationship between common commands of the SPL statement and common commands of the SQL statement when the view name is determined.
- a view, that is, a temporary table
- when the query device 2 receives the view creation instruction input by the user, it recognizes and executes the view creation instruction to create a view.
- the query device 2 obtains the view name from the name the user gives the view, and after obtaining the view name, queries the view according to the obtained SPL query statement.
- when the query device 2 receives the SPL query statement, it identifies the command type corresponding to the SPL query statement and, based on the command type and the view name, converts the SPL query statement into the corresponding SQL statement according to the mapping relationship.
- the view created by the user is named temp_1, and the content of the view includes attributes such as user name, gender, and date of birth.
- the query device 2 receives the SPL query sentence input by the user: gender : [male]
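A conversion mapping table of this kind could be sketched as follows. Everything here is an assumption for illustration: the command type name `field_filter`, the SQL template, the regular expression used to recognise the `field : [value]` form, and the resulting SQL are not given in the text, which only states that the view is named temp_1 and that the input is `gender : [male]`.

```python
import re

# Hypothetical conversion mapping table: command type -> SQL template.
CONVERSION_TABLE = {
    "field_filter": "SELECT * FROM {view} WHERE {field} = '{value}'",
}

# Assumed shape of a field filter statement, e.g. "gender : [male]".
_FIELD_FILTER = re.compile(r"^\s*(\w+)\s*:\s*\[([^\]]*)\]\s*$")
_IDENTIFIER = re.compile(r"^\w+$")


def spl_to_sql(spl: str, view_name: str) -> str:
    """Convert a 'field : [value]' statement into SQL against the given view."""
    if not _IDENTIFIER.match(view_name):
        raise ValueError(f"invalid view name: {view_name!r}")
    match = _FIELD_FILTER.match(spl)
    if match is None:
        raise ValueError(f"unsupported statement: {spl!r}")
    field = match.group(1)
    # Double any single quote so the value stays inside the SQL string literal.
    value = match.group(2).strip().replace("'", "''")
    return CONVERSION_TABLE["field_filter"].format(
        view=view_name, field=field, value=value
    )


print(spl_to_sql("gender : [male]", "temp_1"))
# SELECT * FROM temp_1 WHERE gender = 'male'
```

A real mapping table would hold one entry per supported command type; the single entry here only shows how the command type and the view name together select and fill a template.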
- step S106 can be executed before step S102, and the two can also be executed in parallel, which does not affect the implementation of the embodiment of the present application.
- Step S108: Import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file.
- the SQL search engine of the big data platform includes Hive and/or Spark SQL.
- Step S110: Output the target query file to the terminal.
- Hive's common data import methods include: importing data from the local file system into a Hive table; importing data from HDFS into a Hive table; querying the corresponding data from other tables and importing it into a Hive table; and, when creating a table, querying the corresponding records from other tables and inserting them into the created table.
- data is imported from the HDFS into the Hive table. This importing step is an existing technology and will not be described in detail here.
- the SQL search engine of the big data platform is Spark SQL
- the view is created by using Spark SQL, and the converted SQL statement is used for query to output the returned result.
- when the big data platform SQL search engine receives the file within the query range, it executes the SQL statement converted from the SPL statement, and outputs the execution result to the user terminal.
- the execution result is the file that the user needs to query, that is, the target query file.
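The create-view-then-query flow can be illustrated with a small runnable sketch. Note the substitution: Python's built-in `sqlite3` module stands in for Hive or Spark SQL purely so the example runs without a cluster, and the sample rows, the table name `imported`, and the function name `run_on_view` are hypothetical. The view name temp_1 and its columns (user name, gender, date of birth) follow the example in the text.

```python
import sqlite3
from typing import List, Sequence, Tuple

Row = Tuple[str, str, str]  # (user_name, gender, date_of_birth)


def run_on_view(rows: Sequence[Row], sql: str) -> List[tuple]:
    """Load rows, expose them as the temporary view temp_1, and run the SQL.

    sqlite3 is a stand-in for the big data platform SQL search engine.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE imported (user_name TEXT, gender TEXT, date_of_birth TEXT)"
        )
        conn.executemany("INSERT INTO imported VALUES (?, ?, ?)", rows)
        # The view plays the role of the temporary table queried by the converted SQL.
        conn.execute("CREATE TEMP VIEW temp_1 AS SELECT * FROM imported")
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical sample data standing in for the imported second file.
sample = [
    ("alice", "female", "1990-01-01"),
    ("bob", "male", "1985-05-05"),
    ("carl", "male", "1992-07-07"),
]
result = run_on_view(
    sample,
    "SELECT user_name FROM temp_1 WHERE gender = 'male' ORDER BY user_name",
)
print(result)
# [('bob',), ('carl',)]
```

With Spark SQL, the same flow would read the screened column storage files into a DataFrame and register it as a temporary view before running the converted statement.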
- the file in the HDFS column storage files that meets the query time range is determined and imported into the big data platform SQL search engine, and the SQL statement is executed on that engine to search for the target query file, which is output to the user terminal. This provides a unified query mode for users of the original log search system, expands the query range of SPL statements, and makes column storage data easier to query.
- FIG. 2 shows a schematic diagram of the hardware architecture of the query device in the second embodiment of the present application.
- the query device 2 includes, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can communicate with each other through a system bus.
- FIG. 2 only shows the query device 2 having components 21-23, but it should be understood that it is not required to implement all the illustrated components, and more or fewer components may be implemented instead.
- the memory 21 includes at least one type of readable storage medium, the readable storage medium includes flash memory, hard disk, multimedia card, card type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
- the memory 21 may be an internal storage unit of the query device 2, for example, a hard disk or a memory of the query device 2.
- the memory 21 may also be an external storage device of the query device 2, for example, a plug-in hard disk equipped on the query device 2, a smart media card (SMC), a Secure Digital (SD) card, a flash card, etc.
- the memory 21 may also include both the internal storage unit of the query device 2 and its external storage device.
- the memory 21 is generally used to store the operating system and various application software installed in the query device 2, such as the program code of the column storage file query system 24.
- the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
- the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
- the processor 22 is generally used to control the overall operation of the query device 2.
- the processor 22 is configured to run the program code or process data stored in the memory 21, for example, run the column storage file query system 24 and so on.
- the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the query device 2 and other electronic devices.
- the network interface 23 is used to connect the query device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the query device 2 and the external terminal.
- the network may be Intranet, Internet, Global System of Mobile communication (GSM), Wideband Code Division Multiple Access (WCDMA), 4G network, 5G Network, Bluetooth (Bluetooth), Wi-Fi and other wireless or wired networks.
- FIG. 3 shows a schematic diagram of program modules of the column storage file query system in the third embodiment of the present application.
- the column storage file query system 24 may include or be divided into one or more program modules.
- the one or more program modules are stored in a storage medium and executed by one or more processors to implement this application and realize the query method for column storage files.
- the program module referred to in the embodiment of the present application refers to a series of computer-readable instruction segments that can complete specific functions, and is more suitable than the program itself for describing the execution process of the column storage file query system 24 in the storage medium. The following description introduces the functions of each program module in this embodiment:
- the obtaining module 201 is used to obtain the SPL query sentence input by the user from the terminal.
- the obtaining module 201 obtains the SPL query sentence input by the user from the terminal.
- the query statement includes at least: query time range and name.
- the determining module 202 is configured to determine the query range in the first file of HDFS according to the SPL query sentence, where the first file is a column storage file, and the first file is classified and stored according to a preset storage rule ,
- the preset storage rules include: time sequence, application name, and operator ID.
- the screening module 203 is configured to screen out the second file from the first file according to the query range.
- the recognition module 208 recognizes the query time range and the name of the file to be queried according to the type of the SPL query statement, and the screening module 203 filters out, from the column storage files, the files corresponding to that query time range and name.
- the column storage files are stored in accordance with preset storage rules to improve query efficiency.
- the preset storage rules can be set freely according to the needs of users, and may include the application name, the time, or other identifiers, which are not limited here.
- the files stored in the query device 2 are stored in accordance with /[application name]/[year]/[month]/[day]/[hour], when the obtaining module 201 obtains that the user uses
- the recognition module 208 recognizes that the file query range is the application name syslog and the time range from September 1 to September 3, 2018
- the screening module 203 filters out the files in the three folders /syslog/2018/9/1, /syslog/2018/9/2, and /syslog/2018/9/3 according to the identified query range.
- the conversion module 204 is configured to convert the SPL query sentence into a SQL sentence according to a preset conversion rule.
- the query device 2 pre-establishes a conversion mapping table for common commands of SPL sentences and common commands of SQL sentences, and stores the conversion mapping table in a database.
- the conversion mapping table includes at least: common command types of SPL statements, common command types of SQL statements, and a mapping relationship between common commands of the SPL statement and common commands of the SQL statement when the view name is determined.
- a view that is, a temporary table
- the view is queried according to the input SPL query statement.
- the creation module 207 recognizes and executes the view creation instruction to create a view. After the view is created, the obtaining module 201 obtains the view name from the name the user gives the view, and then queries the view according to the obtained SPL query statement.
- the recognition module 208 recognizes the command type corresponding to the SPL query statement, and the conversion module 204 then converts the SPL query statement into the corresponding SQL statement according to the mapping relationship, based on the command type and the view name.
- the search module 205 is configured to import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file.
- the SQL search engine of the big data platform includes Hive and/or Spark SQL.
- the output module 206 is configured to output the target query file to the terminal.
- Hive's common data import methods include: importing data from the local file system into a Hive table; importing data from HDFS into a Hive table; querying the corresponding data from other tables and importing it into a Hive table; and, when creating a table, querying the corresponding records from other tables and inserting them into the created table.
- data is imported from the HDFS into the Hive table. This importing step is an existing technology and will not be described in detail here.
- the SQL search engine of the big data platform is Spark SQL
- the view is created by using Spark SQL, and the converted SQL statement is used for query to output the returned result.
- when the big data platform SQL search engine receives the file within the query range, it executes the SQL statement converted from the SPL statement, and outputs the execution result to the user terminal.
- the execution result is the file that the user needs to query, that is, the target query file.
- the file in the HDFS column storage files that meets the query time range is determined and imported into the big data platform SQL search engine, and the SQL statement is executed on that engine to search for the target query file, which is output to the user terminal. This provides a unified query mode for users of the original log search system, expands the query range of SPL statements, and makes column storage data easier to query.
- This application also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including an independent server or a server cluster composed of multiple servers), etc.
- the computer device in this embodiment at least includes, but is not limited to: a memory, a processor, etc. that can be communicatively connected to each other through a system bus.
- This embodiment also provides a non-volatile computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, app store, etc., on which computer-readable instructions are stored, and the corresponding functions are realized when the instructions are executed by the processor.
- the non-volatile computer-readable storage medium of this embodiment is used to store the column storage file query system 24, and when executed by a processor, the following steps are implemented:
- the query range in the first file of HDFS is determined, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
- the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL;
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
This application claims priority to Chinese invention patent application No. 201910331414.5, entitled "A query method and query device for column storage files", filed with the Chinese Patent Office on April 24, 2019, the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to the technical field of database management, and in particular to a query method, query device, computer equipment, and readable storage medium for column storage files.
In current log search systems, the Search Processing Language (SPL) developed by Splunk is a common search language used to query log data that has been indexed. However, because of disk space requirements, log data that has been kept for a relatively long time is sometimes stored in a column storage format (such as Parquet or Optimized Row Columnar (ORC)) on the Hadoop Distributed File System (HDFS) to save space. When such data needs to be queried, users expect to query the column storage data files with SPL statements. The inventor found that data files in current column storage formats are usually supported only by query engines that use Structured Query Language (SQL) as the query statement, and cannot be queried with SPL statements.
Therefore, this application aims to solve the problem that SPL statements cannot be used to directly query data stored in a column storage format.
Summary of the invention
In view of this, it is necessary to provide a query method, query device, computer equipment, and non-volatile computer-readable storage medium for column storage files that provide a unified query mode for users of the original log search system, expand the query range of SPL statements, and make column storage data easier to query.
To achieve the foregoing objective, an embodiment of the present application provides a query method for column storage files, the method including:
obtaining the SPL query statement input by the user from the terminal;
determining, according to the SPL query statement, the query range in the first file of HDFS, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
screening out a second file from the first file according to the query range;
converting the SPL query statement into an SQL statement according to a preset conversion rule;
importing the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL; and
outputting the target query file to the terminal.
To achieve the foregoing objective, an embodiment of the present application also provides a query device for column storage files, including:
an obtaining module, used to obtain the SPL query statement input by the user from the terminal;
a determining module, configured to determine, according to the SPL query statement, the query range in the first file of HDFS, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
a screening module, configured to screen out a second file from the first file according to the query range;
a conversion module, configured to convert the SPL query statement into an SQL statement according to a preset conversion rule;
a search module, used to import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL; and
an output module, used to output the target query file to the terminal.
To achieve the foregoing objective, an embodiment of the present application further provides a computer device, the computer device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the following steps are implemented when the computer-readable instructions are executed by the processor:
obtaining the SPL query statement input by the user from the terminal;
determining, according to the SPL query statement, the query range in the first file of HDFS, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
screening out a second file from the first file according to the query range;
converting the SPL query statement into an SQL statement according to a preset conversion rule;
importing the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL; and
outputting the target query file to the terminal.
To achieve the foregoing objective, an embodiment of the present application also provides a non-volatile computer-readable storage medium. The non-volatile computer-readable storage medium stores computer-readable instructions, and the computer-readable instructions can be executed by at least one processor, so that the at least one processor executes the following steps:
obtaining the SPL query statement input by the user from the terminal;
determining, according to the SPL query statement, the query range in the first file of HDFS, where the first file is a column storage file, the first file is classified and stored according to a preset storage rule, and the preset storage rules include: time sequence, application name and/or operator ID;
screening out a second file from the first file according to the query range;
converting the SPL query statement into an SQL statement according to a preset conversion rule;
importing the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to search for a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL; and
outputting the target query file to the terminal.
The query method, query device, computer device, and non-volatile computer-readable storage medium for column storage files provided by the embodiments of the present application convert an SPL statement into an SQL statement, determine, according to the query time range of the SPL statement, the files among the HDFS column storage files that fall within that query time range, and import those files into a big data platform SQL search engine. The engine executes the SQL statement to find the target query file, which is then output to the user terminal. This gives users of the existing log search system a unified query mode, extends the query scope of SPL statements, and makes column-stored data easier to query.
FIG. 1 is a flowchart of the steps of the query method for column storage files according to Embodiment 1 of the present application.
FIG. 2 is a schematic diagram of the hardware architecture of the query device according to Embodiment 2 of the present application.
FIG. 3 is a schematic diagram of the program modules of the column storage file query system according to Embodiment 3 of the present application.
To make the purpose, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and do not limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application, without creative effort, fall within the scope of protection of this application.
It should be noted that descriptions involving "first", "second", and the like in this application are for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only insofar as a person of ordinary skill in the art can implement the combination. Where a combination of technical solutions is contradictory or cannot be implemented, the combination should be regarded as nonexistent and outside the scope of protection claimed by this application.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of the query method for column storage files according to Embodiment 1 of the present application is shown. It can be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. It should be noted that this embodiment is described, by way of example, with the query device 2 for column storage files (hereinafter, the query device 2) as the executing entity. The details are as follows:
Step S100: obtain, from a terminal, an SPL query statement input by a user.
Specifically, when the user needs to query column storage files, the query device 2 obtains the SPL query statement that the user inputs from the terminal. The query statement includes at least a query time range and a name.
Step S102: determine, according to the SPL query statement, a query range within a first file in HDFS, wherein the first file is a column storage file, the first file is stored in categories according to a preset storage rule, and the preset storage rule includes: chronological order, application name, and operator identification (ID).
Step S104: filter a second file out of the first file according to the query range.
Specifically, when the SPL query statement is obtained, the query time range and name of the files to be queried are identified according to the type of the SPL query statement, and the files corresponding to that query time range and name are filtered out of the column storage files. It should be noted that the column storage files are stored according to a preset storage rule to improve query efficiency. The preset storage rule can be set freely according to the user's needs; it may include the application name, time, and so on, and may also include other identifiers, which are not limited here. For example, the files stored in the query device 2 are laid out as /[application name]/[year]/[month]/[day]/[hour]. When the user uses an SPL statement to query files whose application name is syslog and whose time range is September 1 to September 3, 2018, the file query range is determined to be application name syslog and time range September 1 to September 3, 2018, and the files in the three folders /syslog/2018/9/1, /syslog/2018/9/2, and /syslog/2018/9/3 are selected according to that range.
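The directory selection in the syslog example can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name `directories_for_range` is hypothetical, the range is assumed to be whole days, and the hour level of the layout is ignored because the example in the text stops at the day folder.

```python
from datetime import date, timedelta


def directories_for_range(app_name: str, start: date, end: date) -> list[str]:
    """Return one /[app]/[year]/[month]/[day] directory per day in [start, end]."""
    if end < start:
        raise ValueError("end date must not precede start date")
    paths = []
    for offset in range((end - start).days + 1):
        day = start + timedelta(days=offset)
        # Month and day are not zero-padded, matching /syslog/2018/9/1 in the text.
        paths.append(f"/{app_name}/{day.year}/{day.month}/{day.day}")
    return paths


print(directories_for_range("syslog", date(2018, 9, 1), date(2018, 9, 3)))
# ['/syslog/2018/9/1', '/syslog/2018/9/2', '/syslog/2018/9/3']
```

Because the layout encodes the application name and date in the path, the range can be resolved to directories without opening any file.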
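It should be noted that when the file paths stored in the query device 2 are compatible with Hive partitioning, each level of the file path takes the form [field]=[value]. If the application name is represented by appname, the path of the stored file /syslog/2018/9/1 is /appname=syslog/year=2018/month=9/day=1.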
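As a small illustration of the [field]=[value] convention, the sketch below rewrites a plain path into its Hive-partition-compatible form. The function name is hypothetical, and the field name `hour` for the fifth level is an assumption; the text names only appname, year, month, and day.

```python
def to_hive_partition_path(
    path: str,
    fields: tuple[str, ...] = ("appname", "year", "month", "day", "hour"),
) -> str:
    """Rewrite /syslog/2018/9/1 as /appname=syslog/year=2018/month=9/day=1."""
    parts = [p for p in path.split("/") if p]
    if len(parts) > len(fields):
        raise ValueError("path has more levels than known partition fields")
    # zip stops at the shorter sequence, so a day-level path yields four levels.
    return "/" + "/".join(f"{field}={value}" for field, value in zip(fields, parts))


print(to_hive_partition_path("/syslog/2018/9/1"))
# /appname=syslog/year=2018/month=9/day=1
```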
Step S106: convert the SPL query statement into an SQL statement according to a preset conversion rule.
In one embodiment, the query device 2 builds in advance a conversion mapping table between common SPL commands and common SQL commands and stores the table in a database. The conversion mapping table includes at least: the common SPL command types, the common SQL command types, and the mapping between the common SPL commands and the common SQL commands once the view name is determined. Specifically, when the user needs to query part of the content of a database table, a view (that is, a temporary table) is created from that content and named, and the view is then queried according to the input SPL query statement. When the query device 2 receives a create-view instruction input by the user, it identifies and executes the instruction to create the view. After the view has been created, the query device 2 obtains the view name from the name the user gave the view and then queries the view according to the obtained SPL query statement. When the query device 2 receives the SPL query statement, it identifies the command type corresponding to the statement and, according to the command type and the view name, converts the SPL query statement into the corresponding SQL statement in accordance with the mapping.
For example, common SPL command types include: the SELECT statement “Streams:[A]”; the WHERE statement “[my_field]:[number]”; the SELECT statement “FIELDS [A],[B]”; and the LIKE statement “[A]="*some text*"”. Common SQL command types include: the SELECT statement “SELECT * FROM [A]”; the WHERE statement “SELECT * FROM [streams] WHERE [my_field]=[number]”; the SELECT statement “SELECT [A],[B]”; and the LIKE statement “SELECT * FROM [streams] WHERE [A] LIKE "*some text*"”.
For example, suppose the view created by the user is named temp_1 and contains attributes such as user name, gender, and date of birth. When the query device 2 receives the SPL query statement gender:[male] from the user, it converts the statement, according to the mapping, into the SQL query SELECT * FROM temp_1 where gender=male; that is, gender:[male] maps to SELECT * FROM temp_1 where gender=male.
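The four command types listed above can be illustrated with a rule-based translator. This is a minimal sketch, not the conversion mapping table of the application: the function names and regular expressions are assumptions. It also departs from the literal examples in two places: string values are single-quoted, and `*` is rewritten to `%`, because standard SQL requires quoted string literals and uses `%` as the LIKE wildcard.

```python
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def _quote(text: str) -> str:
    """Single-quote a string for SQL, doubling embedded quotes."""
    return "'" + text.replace("'", "''") + "'"


def spl_to_sql(spl: str, view_name: str) -> str:
    """Translate one SPL command of the four listed types into SQL."""
    spl = spl.strip()

    # Streams:[A]  ->  SELECT * FROM A
    m = re.fullmatch(r"streams:\[?(\w+)\]?", spl, flags=re.IGNORECASE)
    if m:
        return f"SELECT * FROM {m.group(1)}"

    # FIELDS [A],[B]  ->  SELECT A, B FROM <view>
    m = re.fullmatch(r"fields(?:\s+|(?=\[))(.+)", spl, flags=re.IGNORECASE)
    if m:
        columns = [c.strip().strip("[]") for c in m.group(1).split(",")]
        return f"SELECT {', '.join(columns)} FROM {view_name}"

    # [A]="*some text*"  ->  ... WHERE A LIKE '%some text%'
    m = re.fullmatch(r'\[?(\w+)\]?\s*=\s*"(.*)"', spl)
    if m:
        pattern = m.group(2).replace("*", "%")
        return f"SELECT * FROM {view_name} WHERE {m.group(1)} LIKE {_quote(pattern)}"

    # [my_field]:[value]  ->  ... WHERE my_field = value
    m = re.fullmatch(r"\[?(\w+)\]?:\[?([^\[\]]+)\]?", spl)
    if m:
        field, value = m.group(1), m.group(2).strip()
        literal = value if _NUMBER.fullmatch(value) else _quote(value)
        return f"SELECT * FROM {view_name} WHERE {field} = {literal}"

    raise ValueError(f"unsupported SPL command: {spl!r}")


print(spl_to_sql("gender:[male]", "temp_1"))
# SELECT * FROM temp_1 WHERE gender = 'male'
```

The Streams rule is tested first because a statement such as Streams:[A] would otherwise also match the generic field:value rule.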
In a preferred embodiment, step S106 may be executed before step S102, or the two steps may be executed in parallel; neither ordering affects the implementation of the embodiments of the present application.
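Since range determination (S102) and statement conversion (S106) both depend only on the SPL statement, they can run concurrently. The sketch below shows one way to do that with a thread pool; `plan_query` and its two callable parameters are hypothetical stand-ins, not names from the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def plan_query(
    spl: str,
    determine_range: Callable[[str], Any],  # stand-in for step S102
    convert: Callable[[str], str],          # stand-in for step S106
) -> tuple[Any, str]:
    """Run S102 and S106 in parallel and return (query_range, sql)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        range_future = pool.submit(determine_range, spl)
        sql_future = pool.submit(convert, spl)
        # result() blocks until each task finishes and re-raises its exception.
        return range_future.result(), sql_future.result()
```

Either callable may equally be invoked sequentially in any order; the result is the same because neither step consumes the other's output.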
Step S108: import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to find a target query file. The big data platform SQL search engine includes Hive and/or Spark SQL.
Step S110: output the target query file to the terminal.
It should be noted that if the big data platform SQL search engine is Hive, the common ways of importing data into Hive include: importing data from the local file system into a Hive table; importing data from HDFS into a Hive table; querying data from another table and importing it into a Hive table; and, when creating a table, querying records from another table and inserting them into the newly created table. This embodiment imports data from HDFS into a Hive table; this import step is existing technology and is not described in detail here. If the big data platform SQL search engine is Spark SQL, a view is created with Spark SQL and queried with the converted SQL statement, and the returned result is output.
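For the Hive case, importing from HDFS is typically done with HiveQL's `LOAD DATA INPATH ... INTO TABLE ... PARTITION (...)` statement. The sketch below only builds that statement as a string; the helper name and the table name `syslog_logs` are hypothetical, and the application does not say which statement it uses.

```python
def hive_load_statement(hdfs_dir: str, table: str, partition: dict[str, str]) -> str:
    """Build a HiveQL statement that loads an HDFS directory into one partition."""
    spec = ", ".join(f"{key}='{value}'" for key, value in partition.items())
    # Without the LOCAL keyword, the path is resolved on HDFS, not the local disk.
    return f"LOAD DATA INPATH '{hdfs_dir}' INTO TABLE {table} PARTITION ({spec})"


print(hive_load_statement(
    "/syslog/2018/9/1",
    "syslog_logs",
    {"year": "2018", "month": "9", "day": "1"},
))
# LOAD DATA INPATH '/syslog/2018/9/1' INTO TABLE syslog_logs PARTITION (year='2018', month='9', day='1')
```

Note that a non-LOCAL `LOAD DATA` moves the source files into the table's location rather than copying them, which may matter if the original HDFS layout must be preserved.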
In a preferred embodiment, when the big data platform SQL search engine receives the files within the query range, it executes the SQL statement converted from the SPL statement and outputs the execution result to the user terminal. The execution result is the file the user needs to query, that is, the target query file.
By converting the SPL statement into an SQL statement, determining the files among the HDFS column storage files that fall within the query time range of the SPL statement, importing those files into the big data platform SQL search engine, executing the SQL statement on that engine to find the target query file, and outputting the target query file to the user terminal, this embodiment of the present application gives users of the existing log search system a unified query mode, extends the query scope of SPL statements, and makes column-stored data easier to query.
Embodiment 2
Referring to FIG. 2, a schematic diagram of the hardware architecture of the query device according to Embodiment 2 of the present application is shown. The query device 2 includes, but is not limited to, a memory 21, a processor 22, and a network interface 23 that can be communicatively connected to one another through a system bus. FIG. 2 shows only the query device 2 with components 21-23, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
The memory 21 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. In some embodiments, the memory 21 may be an internal storage unit of the query device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may be an external storage device of the query device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the query device 2. The memory 21 may also include both an internal storage unit and an external storage device of the query device 2. In this embodiment, the memory 21 is generally used to store the operating system and the application software installed on the query device 2, such as the program code of the column storage file query system 24. In addition, the memory 21 may be used to temporarily store data that has been output or is to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the query device 2. In this embodiment, the processor 22 runs the program code stored in the memory 21 or processes data, for example by running the column storage file query system 24.
The network interface 23 may include a wireless or wired network interface and is generally used to establish a communication connection between the query device 2 and other electronic devices. For example, the network interface 23 connects the query device 2 to an external terminal through a network and establishes a data transmission channel and a communication connection between them. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
Embodiment 3
Referring to FIG. 3, a schematic diagram of the program modules of the column storage file query system according to Embodiment 3 of the present application is shown. In this embodiment, the column storage file query system 24 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present application and implement the above query method for column storage files. A program module, as referred to in the embodiments of the present application, is a series of computer-readable instruction segments capable of performing a specific function, and is better suited than the program itself to describing how the column storage file query system 24 executes in the storage medium. The functions of the program modules of this embodiment are described below:
The obtaining module 201 is configured to obtain, from a terminal, an SPL query statement input by a user.
Specifically, when the user needs to query column storage files, the obtaining module 201 obtains the SPL query statement that the user inputs from the terminal. The query statement includes at least a query time range and a name.
The determining module 202 is configured to determine, according to the SPL query statement, a query range within a first file in HDFS, wherein the first file is a column storage file, the first file is stored in categories according to a preset storage rule, and the preset storage rule includes: chronological order, application name, and operator ID.
The filtering module 203 is configured to filter a second file out of the first file according to the query range.
Specifically, when the obtaining module 201 obtains the SPL query statement, the identification module 208 identifies the query time range and name of the files to be queried according to the type of the SPL query statement, and the filtering module 203 filters the files corresponding to that query time range and name out of the column storage files. It should be noted that the column storage files are stored according to a preset storage rule to improve query efficiency. The preset storage rule can be set freely according to the user's needs; it may include the application name, time, and so on, and may also include other identifiers, which are not limited here. For example, the files stored in the query device 2 are laid out as /[application name]/[year]/[month]/[day]/[hour]. When the obtaining module 201 finds that the user is using an SPL statement to query files whose application name is syslog and whose time range is September 1 to September 3, 2018, the identification module 208 identifies the file query range as application name syslog and time range September 1 to September 3, 2018, and the filtering module 203 selects the files in the three folders /syslog/2018/9/1, /syslog/2018/9/2, and /syslog/2018/9/3 according to the identified range.
It should be noted that when the file paths stored in the query device 2 are compatible with Hive partitioning, each level of the file path takes the form [field]=[value]. If the application name is represented by appname, the path of the stored file /syslog/2018/9/1 is /appname=syslog/year=2018/month=9/day=1.
The conversion module 204 is configured to convert the SPL query statement into an SQL statement according to a preset conversion rule.
In one embodiment, the query device 2 builds in advance a conversion mapping table between common SPL commands and common SQL commands and stores the table in a database. The conversion mapping table includes at least: the common SPL command types, the common SQL command types, and the mapping between the common SPL commands and the common SQL commands once the view name is determined. Specifically, when the user needs to query part of the content of a table, a view (that is, a temporary table) is created from that content and named, and the view is then queried according to the input SPL query statement. When the obtaining module 201 obtains a create-view instruction input by the user, the creation module 207 identifies and executes the instruction to create the view. After the view has been created, the obtaining module 201 obtains the view name from the name the user gave the view and then queries the view according to the obtained SPL query statement. When the obtaining module 201 receives the SPL query statement, the identification module 208 identifies the command type corresponding to the statement, and the conversion module 204 then converts the SPL query statement into the corresponding SQL statement in accordance with the mapping, according to the command type and the view name.
For example, common SPL command types include: the SELECT statement “Streams:[A]”; the WHERE statement “[my_field]:[number]”; the SELECT statement “FIELDS [A],[B]”; and the LIKE statement “[A]="*some text*"”. Common SQL command types include: the SELECT statement “SELECT * FROM [A]”; the WHERE statement “SELECT * FROM [streams] WHERE [my_field]=[number]”; the SELECT statement “SELECT [A],[B]”; and the LIKE statement “SELECT * FROM [streams] WHERE [A] LIKE "*some text*"”.
For example, suppose the view created by the user is named temp_1 and contains attributes such as user name, gender, and date of birth. When the query device 2 receives the SPL query statement gender:[male] from the user, it converts the statement, according to the mapping, into the SQL query SELECT * FROM temp_1 where gender=male; that is, gender:[male] maps to SELECT * FROM temp_1 where gender=male.
The search module 205 is configured to import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to find a target query file. The big data platform SQL search engine includes Hive and/or Spark SQL.
The output module 206 is configured to output the target query file to the terminal.
It should be noted that if the big data platform SQL search engine is Hive, the common ways of importing data into Hive include: importing data from the local file system into a Hive table; importing data from HDFS into a Hive table; querying data from another table and importing it into a Hive table; and, when creating a table, querying records from another table and inserting them into the newly created table. This embodiment imports data from HDFS into a Hive table; this import step is existing technology and is not described in detail here. If the big data platform SQL search engine is Spark SQL, a view is created with Spark SQL and queried with the converted SQL statement, and the returned result is output.
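For the Spark SQL case, creating a view over the selected files and querying it can be sketched with PySpark's `DataFrameReader.parquet`, `DataFrame.createOrReplaceTempView`, and `SparkSession.sql`. The sketch assumes the column storage files are Parquet, which the application does not state; the function name and the example paths are hypothetical. The session is passed in as a parameter so that the function itself carries no import.

```python
from typing import Any, Sequence


def query_with_spark(spark: Any, paths: Sequence[str], view_name: str, sql: str) -> Any:
    """Register the selected files as a temporary view and run the converted SQL.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    # Assumption: the column storage files are Parquet.
    df = spark.read.parquet(*paths)
    df.createOrReplaceTempView(view_name)
    return spark.sql(sql)


# Usage sketch (requires a Spark installation):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   rows = query_with_spark(
#       spark,
#       ["/syslog/2018/9/1", "/syslog/2018/9/2", "/syslog/2018/9/3"],
#       "temp_1",
#       "SELECT * FROM temp_1 WHERE gender = 'male'",
#   )
#   rows.show()
```

A temporary view is scoped to the session, which matches the description of the view as a temporary table.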
In a preferred embodiment, when the big data platform SQL search engine receives the files within the query range, it executes the SQL statement converted from the SPL statement and outputs the execution result to the user terminal. The execution result is the file the user needs to query, that is, the target query file.
By converting the SPL statement into an SQL statement, determining the files among the HDFS column storage files that fall within the query time range of the SPL statement, importing those files into the big data platform SQL search engine, executing the SQL statement on that engine to find the target query file, and outputting the target query file to the user terminal, this embodiment of the present application gives users of the existing log search system a unified query mode, extends the query scope of SPL statements, and makes column-stored data easier to query.
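To show how the modules fit together, the sketch below wires modules 202 to 206 into a single pipeline. It is illustrative only: the class name, field names, and callable signatures are assumptions, and each callable stands in for a module whose internals are described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class ColumnFileQueryPipeline:
    determine_range: Callable[[str], Any]            # determining module 202
    filter_files: Callable[[Any], Sequence[str]]     # filtering module 203
    convert: Callable[[str], str]                    # conversion module 204
    search: Callable[[Sequence[str], str], Any]      # search module 205
    output: Callable[[Any], None]                    # output module 206

    def run(self, spl: str) -> None:
        """Process one SPL statement already received by obtaining module 201."""
        query_range = self.determine_range(spl)
        second_files = self.filter_files(query_range)
        sql = self.convert(spl)
        result = self.search(second_files, sql)
        self.output(result)
```

Passing the modules in as callables keeps the engine choice (Hive or Spark SQL) out of the pipeline itself; only the `search` callable changes between the two.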
The present application further provides a computer device capable of executing programs, such as a smartphone, tablet computer, laptop computer, desktop computer, rack server, blade server, tower server, or cabinet server (including a standalone server or a server cluster composed of multiple servers). The computer device of this embodiment includes at least, but is not limited to, a memory and a processor that can be communicatively connected to each other through a system bus.
This embodiment further provides a non-volatile computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, a server, an app store, and the like, on which computer-readable instructions are stored; when the program is executed by a processor, the corresponding functions are implemented. The non-volatile computer-readable storage medium of this embodiment stores the column storage file query system 24, which, when executed by a processor, implements the following steps:
Obtain, from a terminal, an SPL query statement input by a user;
Determine, according to the SPL query statement, a query range within a first file in HDFS, wherein the first file is a column storage file, the first file is stored in categories according to a preset storage rule, and the preset storage rule includes: chronological order, application name, and/or operator ID;
Filter a second file out of the first file according to the query range;
Convert the SPL query statement into an SQL statement according to a preset conversion rule;
Import the second file into a big data platform SQL search engine, so that the big data platform SQL search engine executes the SQL statement to find a target query file, wherein the big data platform SQL search engine includes Hive and/or Spark SQL; and
Output the target query file to the terminal.
The numbering of the above embodiments of the present application is for description only and does not indicate the relative merits of the embodiments.
From the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect use in other related technical fields, is likewise included within the scope of patent protection of the present application.
Claims (20)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910331414.5 | 2019-04-24 | ||
| CN201910331414.5A CN110175157B (en) | 2019-04-24 | 2019-04-24 | Query method and query device for column storage file |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020215689A1 true WO2020215689A1 (en) | 2020-10-29 |
Family
ID=67690041
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2019/117763 Ceased WO2020215689A1 (en) | 2019-04-24 | 2019-11-13 | Query method and apparatus for column-oriented files |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN110175157B (en) |
| WO (1) | WO2020215689A1 (en) |
Cited By (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112905595A (en) * | 2021-03-05 | 2021-06-04 | 腾讯科技(深圳)有限公司 | Data query method and device and computer readable storage medium |
| CN113792071A (en) * | 2021-09-18 | 2021-12-14 | 上海中通吉网络技术有限公司 | SQL intelligently generates and tunes components and methods |
| CN114328581A (en) * | 2021-12-23 | 2022-04-12 | 未来电视有限公司 | Method, device and system for querying data and storage medium |
| CN114817145A (en) * | 2022-04-11 | 2022-07-29 | 平安科技(深圳)有限公司 | Report query method, device, equipment and storage medium based on timing task |
| CN115630083A (en) * | 2022-10-28 | 2023-01-20 | 土巴兔集团股份有限公司 | Data transmission method and related equipment thereof |
| CN115858563A (en) * | 2022-12-27 | 2023-03-28 | 湖南航天信息有限公司 | Data blood margin acquisition method and device based on Gbase storage process |
| CN117235100A (en) * | 2023-09-15 | 2023-12-15 | 中国银行股份有限公司 | SQL statement conversion method, device, electronic equipment and storage medium |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110175157B (en) * | 2019-04-24 | 2023-10-03 | 平安科技(深圳)有限公司 | Query method and query device for column storage file |
| CN111581236A (en) * | 2020-04-02 | 2020-08-25 | 中国邮政储蓄银行股份有限公司 | Data query method and device |
| CN111666282B (en) * | 2020-04-28 | 2025-09-26 | 中国平安财产保险股份有限公司 | Data processing method, device, computer equipment and storage medium |
| CN111782682B (en) * | 2020-06-30 | 2024-01-02 | 北京金山云网络技术有限公司 | Data query method, device, equipment and storage medium |
| CN113722337B (en) * | 2021-11-03 | 2022-06-10 | 深圳市信润富联数字科技有限公司 | Service data determination method, device, equipment and storage medium |
| CN114036181A (en) * | 2021-11-16 | 2022-02-11 | 平安养老保险股份有限公司 | Log query method and device based on Splunk, computer equipment and storage medium |
| CN118277458B (en) * | 2024-06-04 | 2024-08-30 | 华腾数云(北京)科技有限公司 | Big data cloud storage method meeting ACID attribute |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5826258A (en) * | 1996-10-02 | 1998-10-20 | Junglee Corporation | Method and apparatus for structuring the querying and interpretation of semistructured information |
| CN103324701A (en) * | 2013-06-13 | 2013-09-25 | 深圳中兴网信科技有限公司 | Data searching device and method |
| CN104794247A (en) * | 2015-05-14 | 2015-07-22 | 东南大学 | Integrated query method for multi-structure database |
| CN107305583A (en) * | 2016-04-19 | 2017-10-31 | 中华电信股份有限公司 | Real-time streaming recording data analysis system and method |
| CN110175157A (en) * | 2019-04-24 | 2019-08-27 | 平安科技(深圳)有限公司 | A kind of querying method and inquiry unit of column storage file |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US7487227B2 (en) * | 2004-06-03 | 2009-02-03 | Alcatel-Lucent Usa Inc. | Scripting engine having a sequencer table and a plurality of secondary tables for network communication software |
| US20180032930A1 (en) * | 2015-10-07 | 2018-02-01 | 0934781 B.C. Ltd | System and method to Generate Queries for a Business Database |
| CN108121709A (en) * | 2016-11-28 | 2018-06-05 | 中兴通讯股份有限公司 | A kind of search processing method and device |
| US20190034540A1 (en) * | 2017-07-28 | 2019-01-31 | Insight Engines, Inc. | Natural language search with semantic mapping and classification |
| CN109271428A (en) * | 2018-09-11 | 2019-01-25 | 北京市计算中心 | Data pick-up method and method for exhibiting data based on geography information |
-
2019
- 2019-04-24 CN CN201910331414.5A patent/CN110175157B/en active Active
- 2019-11-13 WO PCT/CN2019/117763 patent/WO2020215689A1/en not_active Ceased
Also Published As
| Publication number | Publication date |
|---|---|
| CN110175157B (en) | 2023-10-03 |
| CN110175157A (en) | 2019-08-27 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2020215689A1 (en) | Query method and apparatus for column-oriented files | |
| CN107038207B (en) | A data query method, data processing method and device | |
| CN108874924B (en) | Method and device for creating search service and computer-readable storage medium | |
| CN107526777B (en) | Method and equipment for processing file based on version number | |
| CN111177113B (en) | Data migration method, device, computer equipment and storage medium | |
| WO2018095351A1 (en) | Method and device for search processing | |
| CN111797134A (en) | Data query method, device and storage medium for distributed database | |
| WO2019161645A1 (en) | Shell-based data table extraction method, terminal, device, and storage medium | |
| CN113111038B (en) | File storage method, device, server and storage medium | |
| CN108319608A (en) | The method, apparatus and system of access log storage inquiry | |
| CN111339171A (en) | Data query method, device and device | |
| CN107784026A (en) | A kind of ETL data processing methods and device | |
| CN104536987A (en) | Data query method and device | |
| WO2018188539A1 (en) | Data processing method, terminal, device, and storage medium | |
| CN112579705B (en) | Metadata acquisition method, device, computer equipment and storage medium | |
| CN115994148B (en) | Multi-table data updating method and device, electronic equipment and readable storage medium | |
| CN112231292B (en) | File processing method, device, storage medium and computer equipment | |
| WO2022127883A1 (en) | Signaling data query method, signaling data index library construction method, and server | |
| CN105426481B (en) | Handle the method and device of data | |
| CN110222046A (en) | Processing method, device, server and the storage medium of table data | |
| CN107463618B (en) | Index creating method and device | |
| CN111651466B (en) | Data sampling method and device | |
| CN108984720B (en) | Data query method and device based on column storage, server and storage medium | |
| US9537941B2 (en) | Method and system for verifying quality of server | |
| US20170116219A1 (en) | Efficient differential techniques for metafiles |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19925815; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 19925815; Country of ref document: EP; Kind code of ref document: A1 |