CN110275920B

CN110275920B - Data query method and device, electronic equipment and computer readable storage medium

Info

Publication number: CN110275920B
Application number: CN201910570593.8A
Authority: CN
Inventors: 南菊红; 项建; 阳思伟; 王�华; 程宁; 白东营
Original assignee: BGP Inc; CNPC Beijing Richfit Information Technology Co Ltd
Current assignee: Kunlun Digital Technology Co ltd; China National Petroleum Corp; BGP Inc
Priority date: 2019-06-27
Filing date: 2019-06-27
Publication date: 2021-08-03
Anticipated expiration: 2039-06-27
Also published as: CN110275920A

Abstract

The application provides a data query method, a data query device, electronic equipment and a computer readable storage medium, wherein the data query method comprises the following steps: obtaining multi-source heterogeneous data, and performing clustering analysis to obtain dimensions and a dimension level corresponding to each dimension; constructing a dimension table based on the dimensions and the dimension level corresponding to each dimension, and constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data; constructing a data cube based on the dimension table and the fact table; based on the dimension combination contained in the dimension table, creating a structured query language statement corresponding to the dimension combination, acquiring a cube corresponding to the dimension combination from the constructed data cube, and storing the structured query language statement and the cube into a database; and receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user. The data query efficiency can be improved.

Description

Data query method and device, electronic equipment and computer readable storage medium

Technical Field

The present application relates to the field of data retrieval and query technologies, and in particular, to a data query method and apparatus, an electronic device, and a computer-readable storage medium.

Background

With the rapid development of oil exploration and development, mass data are accumulated, and the data in the oil exploration and development process are transmitted to an oil exploration and development platform for storage, so that the data can be comprehensively analyzed and utilized, data support is provided for the oil exploration and development, and data access is provided for each user. Thus, the oil exploration and development platform is required to provide the capability of query analysis on massive data and the capability of accessing massive data.

At present, a petroleum exploration and development platform generally adopts postgreSQL, Oracle, MySQL, MPP database technologies and full-text ElasticSearch information retrieval technologies to provide data management and query, but the data management and query functions provided by the technologies need to be traversed and queried on the petroleum exploration and development platform according to conditions provided by a query request, so that the data query efficiency is low under the condition that the data volume stored by the petroleum exploration and development platform is huge.

Disclosure of Invention

In view of this, an object of the present application is to provide a data query method, an apparatus, an electronic device and a computer-readable storage medium, which improve data query efficiency.

In a first aspect, an embodiment of the present application provides a data query method, including:

obtaining multi-source heterogeneous data, and performing clustering analysis on the multi-source heterogeneous data to obtain dimensions contained in the multi-source heterogeneous data and a dimension hierarchy corresponding to each dimension;

constructing a dimension table based on the obtained dimensions and the dimension level corresponding to each dimension, and constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data;

constructing a data cube based on the dimension table and the fact table;

based on the dimension combination contained in the dimension table, creating a structured query language statement corresponding to the dimension combination, acquiring a cube corresponding to the dimension combination from the constructed data cube, and storing the structured query language statement and the cube into a database;

receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user.

With reference to the first aspect, the present invention provides a first possible implementation manner of the first aspect, where constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data includes:

constructing a star graph containing a dimension table and a fact table associated with the dimension table according to the multi-source heterogeneous data;

creating a wide table by calling a data warehouse according to the fact table and the dimension table of the star map;

removing the weight of the dimension column in the wide table;

and creating a dimension dictionary according to the de-duplicated wide table, and associating the dimension dictionary with the fact table.

With reference to the first aspect, the present invention provides a second possible implementation manner of the first aspect, where constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data includes:

reading a dimension and a data value corresponding to the dimension from multi-source heterogeneous data;

inquiring whether the read dimensionality exists in the stored dimensionality table;

if the dimension does not exist, the read dimension is created in the dimension table, the data value corresponding to the read dimension is written into a stored fact table, the association relation between the read dimension and the data value written into the fact table is built, and the step of reading another dimension from the multi-source heterogeneous data is executed until the dimension in the multi-source heterogeneous data is completely read;

and if the data values exist, inquiring whether the data values corresponding to the read dimensions in the fact table are the same, if so, executing a step of reading another dimension from the multi-source heterogeneous data until the dimensions in the multi-source heterogeneous data are completely read, if not, writing the data values corresponding to the read dimensions in the fact table, constructing an association relation between the read dimensions and the data values written in the fact table, and executing a step of reading another dimension from the multi-source heterogeneous data until the dimensions in the multi-source heterogeneous data are completely read.

With reference to the first aspect, the present invention provides a third possible implementation manner of the first aspect, where the building a data cube based on the dimension table and the fact table includes:

creating a database table;

generating dimension combinations contained in the dimension table based on the dimension table, and inquiring data corresponding to each dimension combination based on the fact table to form a cube of the dimension combination;

and converting the data in the cubes of all the dimension combinations into a format supported by the database table, and writing the data into the created database table to obtain the data cube.

With reference to the first aspect, the present invention provides a fourth possible implementation manner of the first aspect, where the building a data cube based on the dimension table and the fact table includes:

constructing a coordinate system of the data cube according to the dimension table;

and reading a data value corresponding to the dimension table from a fact table associated with the dimension table, and filling the data value to a coordinate point corresponding to the dimension table in the coordinate system to obtain the data cube.

With reference to the first aspect and any one of the first to the fourth possible implementation manners of the first aspect, the present invention provides a fifth possible implementation manner of the first aspect, where after the obtaining of the multi-source heterogeneous data, before performing cluster analysis on the multi-source heterogeneous data, the method further includes:

and preprocessing the acquired multi-source heterogeneous data.

With reference to the fifth possible implementation manner of the first aspect, the present invention provides a sixth possible implementation manner of the first aspect, wherein the preprocessing includes: deleting abnormal multi-source heterogeneous data, repairing missing values in the multi-source heterogeneous data, and performing format conversion on the multi-source heterogeneous data according to a preset format or any combination thereof.

In a second aspect, an embodiment of the present application provides a data query apparatus, including:

the dimensionality acquisition module is used for acquiring multi-source heterogeneous data and carrying out clustering analysis on the multi-source heterogeneous data to obtain dimensionality contained in the multi-source heterogeneous data and a dimensionality hierarchy corresponding to each dimensionality;

the fact table building module is used for building a dimension table based on the obtained dimensions and the dimension hierarchy corresponding to each dimension, and building a fact table associated with the dimension table based on the obtained multi-source heterogeneous data;

the data cube construction module is used for constructing a data cube based on the dimension table and the fact table;

the database construction module is used for creating a structured query language sentence corresponding to the dimension combination based on the dimension combination contained in the dimension table, acquiring a cube corresponding to the dimension combination from the constructed data cube, and storing the structured query language sentence and the cube into a database;

and the query module is used for receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, performs the steps of the method described above.

According to the data query method, the data query device, the electronic equipment and the computer readable storage medium, multi-source heterogeneous data are obtained, clustering analysis is carried out on the multi-source heterogeneous data, and dimensions contained in the multi-source heterogeneous data and a dimension level corresponding to each dimension are obtained; constructing a dimension table based on the obtained dimensions and the dimension level corresponding to each dimension, and constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data; constructing a data cube based on the dimension table and the fact table; based on the dimension combination contained in the dimension table, creating a structured query language statement corresponding to the dimension combination, acquiring a cube corresponding to the dimension combination from the constructed data cube, and storing the structured query language statement and the cube into a database; receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user. Therefore, the dimension table and the fact table are built based on the multi-source heterogeneous data, the data cube is built based on the dimension table and the fact table, each dimension combination is carried out according to the dimension table, each dimension combination is packaged into the structured query language statement, the query result corresponding to the packaged structured query language statement is obtained from the data cube and stored into the database, so that when in query, the database is queried according to the structured query information corresponding to the query request, the corresponding query result is obtained, massive multi-source heterogeneous data do not need to be queried according to the query request and are matched with the data to obtain the query result, and the query speed and the query efficiency can be effectively improved.

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.

Fig. 1 is a schematic flowchart of a data query method provided in an embodiment of the present application;

FIG. 2 is a schematic structural diagram of a data query device according to an embodiment of the present application;

fig. 3 is a schematic structural diagram of a computer device 300 according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flow chart of a data query method according to an embodiment of the present application. As shown in fig. 1, the method includes:

step 101, obtaining multi-source heterogeneous data, and performing clustering analysis on the multi-source heterogeneous data to obtain dimensions contained in the multi-source heterogeneous data and a dimension level corresponding to each dimension;

in the embodiment of the application, the multi-source heterogeneous data can be from a relational database or from Hadoop Hive. The Hadoop Hive is a data warehouse established on a Hadoop file system and supports structured data, semi-structured data and unstructured data. As another alternative, multi-source heterogeneous data can also be obtained by Kafka. The Kafka is used for obtaining multi-source heterogeneous data, and the Kafka can also be used for converting and preprocessing the obtained multi-source heterogeneous data.

In the embodiment of the application, after the multi-source heterogeneous data is subjected to cluster analysis, a data analysis topic set can be obtained, and the data analysis topic set comprises a plurality of data analysis topics. Each data analysis topic corresponds to one or more dimensions.

In this embodiment, as an optional embodiment, after the obtaining the multi-source heterogeneous data, before performing cluster analysis on the multi-source heterogeneous data, the method further includes:

and preprocessing the acquired multi-source heterogeneous data.

In the embodiment of the present application, as an optional embodiment, the preprocessing includes, but is not limited to: deleting abnormal multi-source heterogeneous data, repairing missing values in the multi-source heterogeneous data, performing format conversion on the multi-source heterogeneous data according to a preset format and the like.

In the embodiment of the application, data corresponding to abnormal data values in multi-source heterogeneous data are deleted. And determining whether the data value is abnormal or not can be obtained by judging according to a data abnormal value preset for the type of data. The method comprises the steps of classifying the acquired multi-source heterogeneous data by a machine learning method, analyzing the mass data corresponding to each classification, and setting a data abnormal value corresponding to the classified data by combining expert analysis.

In this embodiment, as an optional embodiment, the repairing of the missing value in the multi-source heterogeneous data may be performed by, after deleting abnormal data and data with empty data values in the classified data, counting a mean value of each data value of the remaining multi-source heterogeneous data, where the corresponding data value is empty mapping, and replacing the empty data value.

102, constructing a dimension table based on the obtained dimensions and the dimension level corresponding to each dimension, and constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data;

in the embodiment of the application, as an optional embodiment, the dimension table is identified by a primary key, the fact table is identified by a foreign key, and the dimension table and the fact table are associated by the primary key and the foreign key.

In the embodiment of the present application, as an optional embodiment, a hash operation is performed on the dimension table to obtain the primary keyword, and a hash operation is performed on the fact table to obtain the foreign keyword.

In an embodiment of the present application, a dimension table includes one or more dimensions, and each dimension includes one or more dimension hierarchies. As an optional embodiment, the fact table includes a dimension level attribute column and a dimension level attribute value column, where the dimension level attribute corresponds to a dimension level, and the dimension level attribute value corresponds to a data value corresponding to the dimension level.

In this application embodiment, use the heterogeneous data of multisource that acquire as each model oil recovery equipment oil production data as the example, cluster based on the heterogeneous data of multisource that acquires, can obtain the three dimensions of oil recovery equipment model, region and time, set up table 1 and record each model oil recovery equipment in each region annual oil production condition. Table 2 records the model of the oil production equipment, table 3 records the region, table 4 records the time, and table 2, table 3 and table 4 are respectively associated with table 1 through external keys. Wherein, table 1 is a fact table, and tables 2, 3, and 4 are dimension tables. In the embodiment of the present application, table 2, table 3, and table 4 may be combined into one dimension table.

In this embodiment of the present application, as an optional embodiment, constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data includes:

a01, constructing a star graph containing a dimension table and a fact table associated with the dimension table according to the multi-source heterogeneous data;

in the examples of the present application, the Star chart (Star chart) was constructed to include tables 1, 2, 3 and 4.

In the embodiment of the present application, taking the multi-source heterogeneous data as the mining well development month data as an example, it is assumed that the mining well development month data includes five dimensions, which are:

the organizational dimension (T _ CAL _ ORG), the geological unit dimension (T _ CAL _ ADDRUNIT), the time dimension (year, month T _ CAL _ DT), the production mode dimension (T _ CAL _ PROMODE), and the WELLBORE dimension (T _ CAL _ WELLBORE). Wherein,

the dimension hierarchy corresponding to the dimension of the organization mechanism comprises: organization number, organization name, organization abbreviation, father organization, etc.;

the dimension hierarchy corresponding to the geocellular dimension comprises: geocellular number, geocellular name, parent geocellular, type, etc.;

the dimension hierarchy corresponding to the time dimension comprises: year, month, day, quarter, etc.;

the dimension hierarchy corresponding to the extraction mode dimension comprises: a sampling mode number, a sampling mode name, and the like;

the dimension hierarchy corresponding to the wellbore dimension comprises: wellbore number, wellbore name, wellbore geographic location, wellbore type, etc. As shown in table 5.

TABLE 5

A fact table is constructed corresponding to Table 5, which is WELL (T _ WELL). Wherein,

attributes of the WELL (T _ WELL) include, but are not limited to: wellbore ID, geocellular ID, facility ID, geographic location, key well grade, date drilled, date completed, oil production, and time of statistics, among others.

In the embodiment of the present application, the dimensions in the Star diagram are defined by the following program code segments:

t _ WELL INNER JOIN T _ CAL _ ORG ON T _ well, organization ID ═ T _ CAL _ ORG, organization ID;

t _ WELL INNER JOIN T _ CAL _ wellbolt ON T _ well. WELLBORE ID ═ T _ CAL _ wellbolt. WELLBORE ID;

t _ WELL INNER JOIN T _ CAL _ addrontit ON T _ well. geocellular ID ═ T _ CAL _ addrontit. geocellular ID;

t _ WELL INNER JOIN T _ CAL _ DT ON MONTH (T _ well, date of production) MONTH (T _ CAL _ DT. date) and YEAR (T _ well, date of production) YEAR (T _ CAL _ DT. date);

t _ WELL INNER JOIN T _ CAL _ mode ON T _ well, extraction mode ID is T _ CAL _ mode.

Metrics that define fact tables: SUM (total net production), MIN (minimum net production), MAX (maximum net production), well ID, well name.

A02, creating a wide table by calling a data warehouse (Hive) according to the fact table and the dimension table of the star map;

in the embodiment of the application, whether the dimension is in the wide table is judged aiming at each dimension in the dimension table, and if the dimension is included, the processing is not carried out; if not, the dimension is added to the broad table.

In the embodiment of the present application, the program code segments for creating the wide table are as follows:

Create Intermediate Flat Hive Table

hive-e"DROP TABLE IF EXISTS Well_cube_desc_19700101000000_20140101000000_94eeff_f3ed_4c38_9e2f_b675f6cce23d；

CREATE EXTERNAL TABLE IF NOT EXISTS Well_cube_desc_19700101000000_20140101000000_94eeff_f3ed_4c38_9e2f_b675f6cce23d

(

DEFAULT _ WELL _ CATEG _ DT date- -time dimension

DEFAULT _ WELL _ CATEG _ ORGID string- -mechanism dimension

DEFAULT _ WELL _ CATEG _ WELLBORED string- -wellbore dimension

DEFAULT _ WELL _ CATEG _ ADDRUNITID string-address unit dimension

DEFAULT _ WELL _ CATEG _ PROMODEID string-sampling mode dimension

DEFAULT _ WELL _ CATEG _ WELLNAME string-WELL name

DEFAULT _ WELL _ CATEG _ WELLBORENME string-wellbore name

DEFAULT _ WELL _ WELL _ OUTPUT decimal (19,4) - - -yield

)

ROW FORMAT DELIMITED FIELDS TERMINATED BY'\177'

STORED AS SEQUENCEFILE

LOCATION '/tmp/well-94eeff_f3ed_4c38_9e2f_b675f6cce23d/Well_cube_desc_19700101000000_20140101000000_94eeff_f3ed_4c38_9e2f_b675f6cce23d'

A03, removing the weight of the dimension column in the wide table;

in this embodiment, the deduplicated wide table may be stored in a Hadoop Distributed File System (HDFS). And performing deduplication on the same dimension column in the wide table, so that the deduplicated wide table comprises the following dimension columns:

SELECT DISTINCTION DEFELT _ WELL _ CATEG _ DT date- -time dimension

DEFAULT _ WELL _ CATEG _ ORGID string- -mechanism dimension

DEFAULT _ WELL _ CATEG _ WELLBORED string- -wellbore dimension

DEFAULT _ WELL _ CATEG _ ADDRUNITID string-address unit dimension

DEFAULT _ WELL _ CATEG _ PROMODEID string-sampling mode dimension

DEFAULT _ WELL _ CATEG _ WELLNAME string-WELL name

DEFAULT _ WELL _ CATEG _ WELLBORENME string-wellbore name

DEFAULT _ WELL _ WELL _ OUTPUT decimal (19,4) - - -yield

FROM T _ WELL INNER JOIN T _ CAL _ ORG ON T _ well, organization ID T _ CAL _ ORG

INNER JOIN T _ CAL _ WELLBORE ON T _ WELL. WELLBORE ID ═ T _ CAL _ WELLBORE. WELLBORE ID

INNER JOIN T _ CAL _ addrontit ON T _ well. geocellular ID T _ CAL _ addrontit

INNER JOIN T _ CAL _ DT ON MONTH (T _ well, date of operation) ═ MONTH (T _ CAL _ DT. date) and YEAR (T _ well, date of operation) ═ YEAR (T _ CAL _ DT. date)

The acquisition mode ID is T _ CAL _ mode

A04, creating a dimension dictionary according to the de-duplicated wide table, and associating the dimension dictionary with the fact table.

In the embodiment of the present application, as an optional embodiment, a mapping reduction (MapReduce) method is used, a deduplication value of a dimension is extracted from a fact table, compression coding is performed in a dictionary tree manner, and meanwhile, compression coding is performed on all dimension tables to generate a dimension dictionary, where the dimension dictionary is a dimension table including each dimension level.

In the embodiment of the present application, as an optional embodiment, the dimension dictionary is a Hive dictionary. The Hive is a data warehouse tool based on Hadoop, can map structured data into a database table, provides a complete structured query language function, and can convert a structured query language statement into a mapping reduction (MapReduce) task for operation.

In this embodiment of the present application, as another optional embodiment, constructing a fact table associated with the dimension table includes:

a11, reading a dimension and a data value corresponding to the dimension from multi-source heterogeneous data;

in the embodiment of the application, the multi-source heterogeneous data comprises a data name and a data value corresponding to the data name. The name of the data is the dimension to be read in the embodiment of the application.

A12, inquiring whether the read dimension exists in the stored dimension table;

in the embodiment of the application, whether the read dimension is matched with any dimension in the dimension table can be determined in a key word matching mode, and if the read dimension is matched with any dimension in the dimension table, the dimension is determined to exist in the dimension table.

A13, if not, creating the read dimension in a dimension table, writing the data value corresponding to the read dimension into a stored fact table, constructing the association relationship between the read dimension and the data value written into the fact table, and executing the step of reading another dimension from the multi-source heterogeneous data until the dimension in the multi-source heterogeneous data is completely read;

a14, if yes, inquiring whether a data value which is the same as the data value corresponding to the read dimension exists in the fact table, if yes, executing a step of reading another dimension from the multi-source heterogeneous data until the dimension in the multi-source heterogeneous data is completely read, if not, writing the data value corresponding to the read dimension into the fact table, constructing the association relationship between the read dimension and the data value written into the fact table, and executing a step of reading another dimension from the multi-source heterogeneous data until the dimension in the multi-source heterogeneous data is completely read.

103, constructing a data cube based on the dimension table and the fact table;

in this embodiment, as an optional embodiment, constructing a data cube based on the dimension table and the fact table includes:

a21, creating a database table;

in the embodiment of the application, the database Table is an HBase Table.

A22, generating dimension combinations contained in the dimension table based on the dimension table, and querying data corresponding to each dimension combination based on the fact table to form a cube of the dimension combination;

in this embodiment of the present application, generating a dimension combination included in the dimension table based on the dimension table includes:

calculating a basic cube based on the dimension table; and calculating secondary cubes based on the basic cubes, and obtaining the dimension combination contained in the dimension table according to the basic cubes and the calculated secondary cubes at all levels.

In the embodiment of the application, all dimension combinations in the dimension table are calculated and stored. Wherein, each dimension combination corresponds to a cube.

In the embodiment of the present application, if the number of dimensions included in the dimension table is N, the number of dimension combinations is N times of 2, and as an optional embodiment, a layer-by-layer algorithm is adopted to generate the dimension combinations. Wherein,

in the layer-by-layer algorithm, calculation is performed in a mode that the dimension number decreases layer by layer, all dimensions contained in the dimension table are aggregated to obtain a dimension combination of a first level, and for the dimension combinations of other levels, calculation is performed based on the dimension combination of the previous level of the level.

And A23, converting the data in the cube with all the dimension combinations into a format supported by a database table, and writing the format into the created database table to obtain the data cube.

In the embodiment of the application, data is converted into HFile and is imported into HBase Table.

In the embodiment of the present application, the program code segment for writing data into the database table is as follows:

INSERT OVERWRITE TABLE Well_cube_desc_19700101000000_20140101000000_94eeff_f3ed_4c38_9e2f_b675f6cce23d SELECT

FACT_TABLE.CATEG_DT

,FACT_TABLE.CATEG_ORGID

,FACT_TABLE.CATEG_WELLBOREID

FACT_TABLE.CATEG_ADDRUNITID

,FACT_TABLE.CATEG_PROMODEID

FACT_TABLE.CATEG_WELLNAME

,LOOKUP_2.CATEG_WELLBORENAME

,FACT_TABLE.WELL_OUTPUT

FROM DEFAULT.T_WELL as FACT_TABLE

INNER JOIN DEFAULT.T_CAL_DT as LOOKUP_1

ON FACT_TABLE.CATEG_DT＝LOOKUP_1.CAL_DT

INNER JOIN DEFAULT.T_CAL_WELLBORE as LOOKUP_2

ON FACT_TABLE.CATEG_WELLBOREID＝LOOKUP_2.CATEG_WELLBOREID

INNER JOIN T_CAL_ORG as LOOKUP_3

ON FACT_TABLE.CATEG_ORGID＝LOOKUP_3.CATEG_ORGID

INNER JOIN T_CAL_ADDRUNIT as LOOKUP_4

ON FACT_TABLE.CATEG_ADDRUNIT ID＝LOOKUP_4.CATEG_ADDRUNIT ID

INNER JOIN T_CAL_PROMODE as LOOKUP_5

ON FACT_TABLE.CATEG_PROMODE ID＝LOOKUP_5.CATEG_PROMODEID

WHERE(FACT_TABLE.CATEG_DT<'2019-05-31')

in this embodiment, as another optional embodiment, building a data cube based on the dimension table and the fact table includes:

a31, constructing a coordinate system of the data cube according to the dimension table;

in the embodiment of the application, each dimension in the dimension table corresponds to a coordinate axis in the multidimensional coordinate system, and the dimension number included in the dimension table is the number of coordinate axes included in the coordinate system of the data cube.

A32, reading the data value corresponding to the dimension table from the fact table associated with the dimension table, and filling the data value to the coordinate point corresponding to the dimension table in the coordinate system to obtain the data cube.

In the embodiment of the application, the data value is a fact in the fact table, the data value corresponding to each dimension level information is obtained from the fact table according to the dimension level information of each dimension in the dimension table, and is added to the coordinate point corresponding to the dimension level information, so that the construction of the data cube is realized.

In the embodiment of the application, because the data cube is associated with the dimension table and the fact table, when the dimension in the dimension table or the numerical value in the fact table is updated, the coordinate point corresponding to the data cube is correspondingly dynamically updated.

104, establishing a structured query language statement corresponding to the dimension combination based on the dimension combination contained in the dimension table, acquiring a cube corresponding to the dimension combination from the constructed data cube, and storing the structured query language statement and the cube into a database;

in the embodiment of the present application, taking tables 1, 2, 3 and 4 as examples, in the data cube, time, types of oil production equipment and regions are dimensions, and oil production is a measure. For each model of oil recovery equipment, a cube (cuboid) is constructed according to the oil production of the model of oil recovery equipment in a first time period and a first region, for example, the oil production of M-type oil recovery equipment in Daqing is a cube in 2019.01.01-2019.06.30 time period, and all the cubes constructed form a data cube.

In the embodiment of the application, dimension combination is performed based on a dimension table, structured query language statements are packaged based on the dimension combination, query results corresponding to the packaged structured query language statements are obtained from a data cube or a star map and stored in a database, so that in the subsequent query, the data cube does not need to be queried according to a query request, and the database is directly queried, so that corresponding query results are obtained, namely, query SQL corresponding to each combination is set for possible combinations of dimensions in the dimension table, the SQL corresponding to the combination is processed in advance to obtain the query results and is persisted in the database, so that the query results persisted in the database can be reused in the actual query, the query speed is effectively increased, and the query efficiency is improved. In the embodiment of the present application, as an optional embodiment, a combination that all dimensions may exist may also be pre-calculated for data in the star map, and stored in the database as a subcube.

And 105, receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user.

In the embodiment of the application, the data cube is transparent to user query, is mapped to the corresponding data cube by automatically analyzing the structured query language, and acquires the data of the coordinate point corresponding to the mapped data cube and outputs the data as a query result; if the mapping is not mapped to the corresponding data cube, the query result is obtained by querying the dimension table and the fact table or the data warehouse.

In the embodiment of the application, if well type dimension statistics of wells needs to be added subsequently, the well type dimensions can be adaptively added in the dimension table, blood margin analysis and feature analysis of fields contained in the dimension table are performed, for example, a Neo4j graph database is constructed according to multi-source heterogeneous data, relationships among metadata corresponding to the dimensions, entity attribute relationships, blood margin analysis of the fields, feature analysis and data access frequency analysis are performed based on the Neo4j graph database, and therefore cubes corresponding to the well type dimensions are automatically filled in data cubes. Data cubes are a type of multi-dimensional matrix that allow users to explore and analyze data from multiple dimensions.

In the embodiment of the present application, taking increasing well type dimensionality in the data of a production well development month as an example, the method includes 6 dimensionalities, which are respectively:

the organizational dimension (T _ CAL _ ORG), the geological unit dimension (T _ CAL _ ADDRUNIT), the time dimension (year, month T _ CAL _ DT), the production mode dimension (T _ CAL _ PROMODE), the WELLBORE dimension (T _ CAL _ WELLBORE), and the well type dimension (T _ CAL _ WELLBYLTYPE). As shown in table 6.

TABLE 6

Adding an attribute description corresponding to the WELL-type dimension in the fact table, namely the attributes of the WELL (T _ WELL), comprises the following steps: wellbore ID, geocellular ID, organization ID, well type ID, geographic location, key well grade, drill-in date, drill-out date, completion date, oil production, and statistical time.

In the Star diagram, a new dimension is defined:

t _ WELL INNER JOIN T _ CAL _ well ON T _ well, well ID T _ CAL _ well.

In the embodiment of the application, based on the database, the columns cached in the fact table, the oil production amount in each time period, the oil production amount in the preset time period and the like can be inquired.

Fig. 2 is a schematic structural diagram of a data query device according to an embodiment of the present application. As shown in fig. 2, the apparatus includes:

the dimension obtaining module 201 is configured to obtain multi-source heterogeneous data, perform cluster analysis on the multi-source heterogeneous data, and obtain dimensions included in the multi-source heterogeneous data and a dimension hierarchy corresponding to each dimension;

in an embodiment of the present application, the multi-source heterogeneous data includes: structured data, semi-structured data, and unstructured data.

A fact table construction module 202, configured to construct a dimension table based on the obtained dimensions and the dimension hierarchy corresponding to each dimension, and construct a fact table associated with the dimension table based on the obtained multi-source heterogeneous data;

In this embodiment of the present application, as an optional embodiment, the fact table constructing module constructs a fact table associated with the dimension table based on the obtained multi-source heterogeneous data, including:

removing the weight of the dimension column in the wide table;

In this embodiment, as another optional embodiment, the fact table constructing module constructs a fact table associated with the dimension table based on the obtained multi-source heterogeneous data, including:

A data cube construction module 203, configured to construct a data cube based on the dimension table and the fact table;

in this embodiment, as an optional embodiment, the data cube construction module 203 is specifically configured to:

creating a database table;

In this embodiment, as another optional embodiment, the data cube construction module 203 is specifically configured to:

A database construction module 204, configured to create a structured query language statement corresponding to the dimension combination based on the dimension combination included in the dimension table, obtain a cube corresponding to the dimension combination from the constructed data cube, and store the structured query language statement and the cube in a database;

the query module 205 is configured to receive query information input by a user, convert the query information into structured query information, obtain cube information corresponding to the structured query information from the database, and output the obtained cube information to the user.

In this embodiment, as an optional embodiment, the apparatus further includes:

and the preprocessing module (not shown in the figure) is used for preprocessing the acquired multi-source heterogeneous data.

In this embodiment, as an optional embodiment, the preprocessing includes: deleting abnormal multi-source heterogeneous data, repairing missing values in the multi-source heterogeneous data, and performing format conversion on the multi-source heterogeneous data according to a preset format or any combination thereof.

As shown in fig. 3, an embodiment of the present application provides a computer device 300, configured to execute the data query method in fig. 1, where the device includes a memory 301, a processor 302, and a computer program stored in the memory 301 and executable on the processor 302, where the processor 302 implements the steps of the data query method when executing the computer program.

Specifically, the memory 301 and the processor 302 can be general-purpose memory and processor, and are not limited to specific examples, and the processor 302 can execute the data query method when executing the computer program stored in the memory 301.

Corresponding to the data query method in fig. 1, an embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the steps of the data query method.

In particular, the storage medium can be a general-purpose storage medium, such as a removable disk, a hard disk, or the like, and when executed, the computer program on the storage medium can execute the data query method.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments provided in the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus once an item is defined in one figure, it need not be further defined and explained in subsequent figures, and moreover, the terms "first", "second", "third", etc. are used merely to distinguish one description from another and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present application, and are used for illustrating the technical solutions of the present application, but not limiting the same, and the scope of the present application is not limited thereto, and although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope disclosed in the present application; such modifications, changes or substitutions do not depart from the spirit and scope of the present disclosure, which should be construed in light of the above teachings. Are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method for querying data, comprising:

constructing a data cube based on the dimension table and the fact table;

receiving query information input by a user, converting the query information into structured query information, acquiring cube information corresponding to the structured query information from the database, and outputting the acquired cube information to the user;

constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data, wherein the fact table comprises:

2. The data query method of claim 1, wherein constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data comprises:

removing the weight of the dimension column in the wide table;

3. The data query method of claim 1, wherein said building a data cube based on said dimension table and fact table comprises:

creating a database table;

4. The data query method of claim 1, wherein said building a data cube based on said dimension table and fact table comprises:

5. The data query method of any one of claims 1 to 4, wherein after the obtaining the multi-source heterogeneous data and before performing cluster analysis on the multi-source heterogeneous data, the method further comprises:

and preprocessing the acquired multi-source heterogeneous data.

6. The data query method of claim 5, wherein the preprocessing comprises: deleting abnormal multi-source heterogeneous data, repairing missing values in the multi-source heterogeneous data, and performing format conversion on the multi-source heterogeneous data according to a preset format or any combination thereof.

7. A data query apparatus, comprising:

the data cube construction module is used for constructing a data cube based on the dimension table and the fact table; constructing a fact table associated with the dimension table based on the obtained multi-source heterogeneous data, wherein the fact table comprises:

if the data values exist, inquiring whether the data values corresponding to the read dimensions in the fact table are the same, if so, executing a step of reading another dimension from the multi-source heterogeneous data until the dimensions in the multi-source heterogeneous data are read completely, if not, writing the data values corresponding to the read dimensions in the fact table, constructing an association relation between the read dimensions and the data values written in the fact table, and executing a step of reading another dimension from the multi-source heterogeneous data until the dimensions in the multi-source heterogeneous data are read completely;

8. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the steps of the data query method of any one of claims 1 to 6.

9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data query method according to one of claims 1 to 6.