
CN113553345A - Query method and device for vectorization database - Google Patents


Info

Publication number
CN113553345A
Authority
CN
China
Prior art keywords
query
row
query result
null
target
Prior art date
Legal status (assumed; not a legal conclusion)
Granted
Application number
CN202110815471.8A
Other languages
Chinese (zh)
Other versions
CN113553345B (en)
Inventor
张勇
季桃桃
刘博天
Current Assignee (may be inaccurate)
Business Intelligence Of Oriental Nations Corp ltd
Original Assignee
Business Intelligence Of Oriental Nations Corp ltd
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Business Intelligence Of Oriental Nations Corp ltd
Priority to CN202110815471.8A
Publication of CN113553345A
Application granted
Publication of CN113553345B
Legal status: Active (current)
Anticipated expiration



Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a query method and a query device for a vectorized database. A query statement input by a user is received; in response to the query statement, target column data and a relational expression are determined from it; the null flags of all rows in the query result are obtained from the null flags of all rows in the target column data; and the query result is obtained from the null flags of all rows in the query result, the target column data and the relational expression. Because the null flags of the query result are derived from those of the target column data and combined with the target column data by a logical operation before the relational expression in the query statement is executed, the method reduces conditional branches and thereby accelerates query processing.

Description

Query method and device for vectorization database
Technical Field
The invention relates to the field of computer technology, and in particular to a query method and a query device for a vectorized database.
Background
More and more databases and big data analysis systems now adopt a columnar storage structure, because column storage not only reduces cache misses but also lets the processor's vectorization (SIMD) capabilities be exploited, which accelerates processing.
However, existing CPU code handles the NULL values and overflows that occur in a database with conditional branches. Every mispredicted branch incurs a clock-cycle penalty, so a loop full of such branches processes NULL values and overflows inefficiently.
Disclosure of Invention
The invention provides a query method and a query device for a vectorized database that overcome the low query-processing efficiency of the prior art and accelerate performance.
The invention provides a query method for a vectorized database, comprising the following steps:
receiving a query statement input by a user;
in response to the query statement, determining target column data and a relational expression based on the query statement;
obtaining the null flags of all rows in a query result based on the null flags of all rows in the target column data;
and obtaining the query result based on the null flags of all rows in the query result, the target column data and the relational expression.
According to the query method for a vectorized database provided by the invention, after the query result is obtained based on the null flags of all rows in the query result, the target column data and the relational expression, the method further comprises:
outputting an overflow exception prompt when an overflow exists in any row of the query result.
According to the query method for a vectorized database provided by the invention, obtaining the query result based on the null flags of all rows in the query result, the target column data and the relational expression comprises:
for each target row, performing a logical AND between the null flag of the target row in the query result and the element of the target row in each target column data, to obtain the null-value processing result of the target row;
and determining the element of the target row in the query result based on the null-value processing result of the target row and the relational expression.
According to the query method for a vectorized database provided by the invention, obtaining the null flags of all rows in the query result based on the null flags of all rows in the target column data comprises:
for each target row, obtaining the null flag of the target row in each target column data;
and performing a logical AND on the null flags of the target row in all target column data to obtain the null flag of that row in the query result.
According to the query method for a vectorized database provided by the invention, after the element of the target row in the query result is determined based on the null-value processing result of the target row and the relational expression, the method further comprises:
performing overflow detection on the element of the target row in the query result to determine whether the target row of the query result overflows.
According to the query method for a vectorized database provided by the invention, the overflow exception prompt includes the row number of each overflowing row in the query result.
The invention also provides a query device for a vectorized database, comprising:
a receiving module, configured to receive the query statement input by the user;
a statement parsing module, configured to determine, in response to the query statement, target column data and a relational expression based on the query statement;
a null value detection module, configured to obtain the null flags of all rows in the query result based on the null flags of all rows in the target column data;
and a data processing module, configured to obtain the query result based on the null flags of all rows in the query result, the target column data and the relational expression.
The query device for a vectorized database provided by the invention further comprises:
an overflow detection module, configured to output an overflow exception prompt when an overflow exists in any row of the query result.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the query method for a vectorized database.
The invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the query method for a vectorized database as described in any one of the above.
With the query method and device for a vectorized database provided by the invention, the null flags of all rows in the query result are obtained from the null flags of all rows in the target column data; these null flags are combined with the target column data by a logical operation, and only then is the relational expression in the query statement executed to obtain the query result. This reduces conditional branches and accelerates query processing.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of the query method for a vectorized database provided by the invention;
FIG. 2 is a schematic structural diagram of the query device for a vectorized database provided by the invention;
FIG. 3 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
To make the objectives, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the invention. All other embodiments that a person skilled in the art derives from the embodiments given herein without creative effort fall within the protection scope of the invention.
FIG. 1 is a schematic flow chart of the query method for a vectorized database provided by the invention. As shown in FIG. 1, the query method for a vectorized database provided by the embodiment of the invention comprises: Step 101, receiving a query statement input by a user.
It should be noted that the method is executed by the query device for the vectorized database and operates on the data stored in the database. According to their basic logical storage unit, databases fall into two types, row-store databases and column-store databases; the embodiment of the invention does not limit the type of database.
For example, the database may be a row-store database whose logical storage unit is the row, such as the traditional relational databases Oracle, DB2, MySQL and SQL Server.
Preferably, the database is a column-store database whose logical storage unit is the column, such as the emerging distributed databases HBase, HP Vertica and EMC Greenplum.
A query statement describes a data processing operation: it retrieves from the database the data that satisfy the query conditions it specifies. Different types of databases correspond to different types of data processing, and the embodiment of the invention does not limit the type of query statement.
For example, the query statement may be an On-Line Transaction Processing (OLTP) statement, used to perform basic, day-to-day transactions on a row-store database, such as adding, deleting, modifying and querying database records.
Specifically, according to actual analysis requirements, the user inputs a query statement to the query device for the vectorized database, or to an electronic device on which that query device is deployed.
Preferably, the query statement is an On-Line Analytical Processing (OLAP) statement, used to perform computational analysis on a column-store database.
Step 102, in response to the query statement, determining target column data and a relational expression based on the query statement.
It should be noted that the method targets scenarios in which an OLAP query statement may need to access millions or even billions of rows while usually involving only a few columns.
The target column data are the data columns, at least two, on which the query statement performs computational analysis.
For example, a query for today's 20 best-selling items involves only three columns: date, item and sales amount. The other columns describing an item, such as its URL, its description or the store it belongs to, are irrelevant to this query.
In this scenario, a column-store database can analyze a large amount of data for an OLAP query by reading only the columns that store date, item and sales amount. A row-store database, by contrast, must read all columns before it can extract those three, which makes reading inefficient. Even so, when a column-store database executes an OLAP query, NULL values and overflows still have to be handled.
Specifically, the query device for the vectorized database parses the received query statement to obtain the data of the fields referenced by the query and the relational expression describing how they are processed.
Preferably, all data of a field are stored contiguously in column order on the storage medium, so that a single query statement can read, in one pass, the columns involved in the query together with all the data stored in them.
The relational expression may be one of the four basic arithmetic operations (addition, subtraction, multiplication, division) among several target columns, or a compound arithmetic expression; the embodiment of the invention does not limit it.
Step 103, obtaining the null flags of all rows in the query result based on the null flags of all rows in the target column data.
Note that NULL in a database indicates that an element has no data value, and the result of any calculation involving NULL is NULL.
The null flag is a representation kept separately from the data itself and indicates whether the value of the data is NULL.
The query result is the result of applying the relational expression parsed in step 102 to the target column data.
Specifically, once step 102 has determined what data to process and how, for any row of the target column data, the null flags of that row's elements in each target column are combined by the logical operation that follows the NULL propagation rule. This yields the null flag of the query result in that row, i.e. whether the query result in that row is NULL.
The null flags of the remaining rows of the query result can be obtained by a loop or iteration over the row number; the embodiment of the invention does not limit how this is done.
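As an illustration of step 103, the following C sketch assumes that each target column is stored as a contiguous `int32_t` array with a parallel array of null flags, using the convention the embodiment below adopts (all bits 0 = NULL; all bits 1, i.e. -1 in two's complement, = not NULL). The `column_t` type and the `compute_result_isNull` function are hypothetical names, not taken from the patent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical column layout: one value and one null flag per row, each
 * stored contiguously. Flag convention: 0 = NULL, -1 (all bits set) = not NULL. */
typedef struct {
    const int32_t *values;
    const int32_t *isNull;
} column_t;

/* Step 103: a result row is NULL as soon as one operand is NULL, which under
 * this flag convention is the bitwise AND of the operand flags. */
void compute_result_isNull(const column_t *cols, size_t ncols,
                           size_t nrows, int32_t *result_isNull)
{
    for (size_t i = 0; i < nrows; i++) {
        int32_t flag = -1;                 /* start from "not NULL" */
        for (size_t c = 0; c < ncols; c++)
            flag &= cols[c].isNull[i];     /* any all-0 flag clears every bit */
        result_isNull[i] = flag;
    }
}
```

Because a single all-0 flag clears every bit of the running AND, the result row is marked NULL whenever any operand is NULL, matching the NULL propagation rule above.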
Step 104, obtaining the query result based on the null flags of all rows in the query result, the target column data and the relational expression.
Specifically, after step 103 has determined the null flags of the query result, for any row of the target column data the query device performs a logical operation between the null flag of the query result in that row and the element of each target column in that row, then applies the relational expression parsed in step 102 to these logical-operation results to obtain the processed query result.
The query results of the remaining rows can likewise be obtained by a loop or iteration over the row number; the embodiment of the invention does not limit how this is done.
The following example illustrates the query method for a vectorized database.
Consider the common query statement "select c1 + c2 from table", which adds the data in column c1 and the data in column c2 row by row.
The traditional query processing flow is as follows:
[Code listing: conventional query processing flow (image BDA0003169987780000071, not reproduced)]
where N is the total number of data rows, c1 and c2 are two columns of data in the database, c1_isNull and c2_isNull are the null flags of c1 and c2, and result is the query result.
Inside the loop, an if statement is evaluated for each row: c1_isNull and c2_isNull are combined by a bitwise AND, and c1 and c2 are added only when the outcome shows that neither c1 nor c2 is NULL.
When the loop ends, N rows have been processed with 2N operations: N evaluations of the if branch and N operations selected by the outcome of each branch.
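The conventional listing survives only as an image, so the C sketch below is a hedged reconstruction from the description above, not the patent's code. It assumes `int32_t` columns, null flags of 0 (NULL) and -1 (not NULL), an `int64_t` result so that the addition of two `int32_t` values cannot overflow, and a stored value of 0 for NULL rows. The function name `add_columns_branching` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Conventional row loop for "select c1 + c2 from table": one data-dependent
 * branch per row decides whether the addition is performed.
 * Null flags: 0 = NULL, -1 (all bits set) = not NULL. */
void add_columns_branching(const int32_t *c1, const int32_t *c1_isNull,
                           const int32_t *c2, const int32_t *c2_isNull,
                           int64_t *result, int32_t *result_isNull, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (c1_isNull[i] & c2_isNull[i]) {      /* both operands are non-NULL */
            result[i] = (int64_t)c1[i] + c2[i]; /* widened: cannot overflow */
            result_isNull[i] = -1;
        } else {                                /* at least one operand is NULL */
            result[i] = 0;
            result_isNull[i] = 0;
        }
    }
}
```

Whether a compiler actually emits a conditional jump for this `if` depends on the compiler and optimization level; simple branches like this one are sometimes turned into conditional moves.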
The query processing flow provided by the embodiment of the invention is as follows:
[Code listing: query processing flow of the embodiment (image BDA0003169987780000072, not reproduced)]
where N is the total number of data rows, c1 and c2 are two columns of data in the database, tmp_c1 and tmp_c2 are the null-processing results for c1 and c2, c1_isNull, c2_isNull and result_isNull are the null flags of c1, c2 and result, and result is the query result.
Inside the loop, for each row, result_isNull is first obtained as the bitwise AND of c1_isNull and c2_isNull and indicates whether result is NULL. The flags c1_isNull, c2_isNull and result_isNull all follow the same rule: all bits are 0 if the corresponding data is NULL, and all bits are 1 if it is not NULL.
The null flag of the query result in the row (result_isNull) is then bitwise ANDed with the elements of c1 and c2 in that row:
If the result is not NULL, result_isNull is all 1s, so the logical-operation results tmp_c1 and tmp_c2 equal the data of c1 and c2 themselves, and the query result tmp_c1 + tmp_c2 is the sum of c1 and c2.
If the result is NULL, result_isNull is all 0s, so tmp_c1 and tmp_c2 are both 0 and the query result tmp_c1 + tmp_c2 is 0 + 0. This computation costs almost nothing, and result_isNull already marks the row as NULL.
When the loop ends, N rows have been processed with N operations, half as many as in the conventional flow.
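A C sketch of this branch-free flow, assuming `int32_t` columns, null flags of 0 (NULL) and -1 (all bits set, not NULL), and an `int64_t` result so the sum cannot overflow. It is an illustration of the described masking, not the patent's listing, and the function name `add_columns_masked` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Branch-free row loop for "select c1 + c2 from table".
 * Null flags: 0 = NULL, -1 (all bits set) = not NULL. */
void add_columns_masked(const int32_t *c1, const int32_t *c1_isNull,
                        const int32_t *c2, const int32_t *c2_isNull,
                        int64_t *result, int32_t *result_isNull, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Step 103: the result is NULL if either operand is NULL. */
        int32_t mask = c1_isNull[i] & c2_isNull[i];
        result_isNull[i] = mask;

        /* Null processing: keep the values when mask is all 1s, zero them otherwise. */
        int32_t tmp_c1 = c1[i] & mask;
        int32_t tmp_c2 = c2[i] & mask;

        /* Step 104: evaluate unconditionally; a NULL row yields 0 + 0. */
        result[i] = (int64_t)tmp_c1 + tmp_c2;
    }
}
```

Since the loop body has no data-dependent branch, compilers can often auto-vectorize loops of this shape, although whether they do depends on the compiler and its flags.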
By deriving the null flags of all rows in the query result from those of the target column data, combining them with the target column data by a logical operation, and only then executing the relational expression in the query statement, the embodiment of the invention reduces conditional branches and accelerates query processing.
On the basis of any of the above embodiments, after the query result is obtained based on the null flags of all rows in the query result, the target column data and the relational expression, the method further comprises: outputting an overflow exception prompt when an overflow exists in any row of the query result.
It should be noted that, for any row of the column-store database, overflow detection is also performed on the query result after step 104.
Overflow means that an arithmetic operation produces a result outside the largest range the current data type can represent. Such an arithmetic overflow is unacceptable to a database, which must raise an exception to notify the user.
Specifically, after step 104 produces the query result, for any row of the query result the query device runs an overflow detection algorithm on that row's result; the embodiment of the invention does not limit which algorithm is used.
When the query result of the row is detected to overflow, an overflow exception prompt is output; the embodiment of the invention does not limit the form of the prompt.
For example, the prompt may be displayed as text in an interface or stored in the form of a table.
When the query result of the row is detected not to overflow, the query result of the row is output directly.
Overflow detection for the query results of the remaining rows can be performed by a loop or iteration over the row number until all rows have been checked; the embodiment of the invention does not limit how this is done.
Preferably, overflow detection for the remaining rows uses the row number as the loop variable and is performed in a loop or iteration that follows the null processing, so that the query results of all rows are checked for overflow.
The following example illustrates the query method for a vectorized database with overflow detection.
Consider again the query statement "select c1 + c2 from table", which adds the data in column c1 and the data in column c2 row by row.
The traditional query processing flow is as follows:
[Code listing: conventional query processing flow with overflow detection (images BDA0003169987780000091 and BDA0003169987780000101, not reproduced)]
where N is the total number of data rows, c1 and c2 are two columns of data in the database, c1_isNull and c2_isNull are the null flags of c1 and c2, result is the query result, over_flow_calc is the overflow detection algorithm, and over_flow is the overflow flag.
Inside the loop, each row first goes through a branch that decides whether to perform the operation; once the row's query result is obtained, overflow detection is performed, and a second branch on whether the value overflowed selects which operation to execute next.
When the loop ends, N rows have been processed and, because of the two nested if statements, 4N operations have been executed.
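A hedged C reconstruction of this conventional flow with overflow detection (the original listing is an image). It assumes `int32_t` columns and results and null flags of 0 (NULL) and -1 (not NULL). The patent does not specify the detection algorithm, so the `over_flow_calc` shown here, which widens the sum to 64 bits and checks the `int32_t` range, and its signature are assumptions. Returning the index of the overflowing row stands in for throwing an exception, and the function name `add_columns_checked_branching` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed overflow check for 32-bit addition: returns 1 if a + b does not fit
 * in int32_t. The narrowing conversion is implementation-defined when the sum
 * is out of range (it wraps on mainstream compilers); that value is never used. */
static int over_flow_calc(int32_t a, int32_t b, int32_t *sum)
{
    int64_t s = (int64_t)a + b;
    *sum = (int32_t)s;
    return s > INT32_MAX || s < INT32_MIN;
}

/* Conventional flow: a branch on the null flags, then a nested branch on the
 * overflow check. Returns -1 if no row overflows, otherwise the index of the
 * first overflowing row (standing in for "throw an exception"). */
long add_columns_checked_branching(const int32_t *c1, const int32_t *c1_isNull,
                                   const int32_t *c2, const int32_t *c2_isNull,
                                   int32_t *result, int32_t *result_isNull,
                                   size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (c1_isNull[i] & c2_isNull[i]) {         /* both operands non-NULL */
            result_isNull[i] = -1;
            if (over_flow_calc(c1[i], c2[i], &result[i]))
                return (long)i;                    /* stop at the first overflow */
        } else {                                   /* at least one operand NULL */
            result[i] = 0;
            result_isNull[i] = 0;
        }
    }
    return -1;
}
```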
The query processing flow provided by the embodiment of the invention is as follows:
[Code listing: query processing flow of the embodiment with overflow detection (images BDA0003169987780000102 and BDA0003169987780000111, not reproduced)]
where N is the total number of data rows, c1 and c2 are two columns of data in the database, tmp_c1 and tmp_c2 are the null-processing results for c1 and c2, c1_isNull, c2_isNull and result_isNull are the null flags of c1, c2 and result, result is the query result, over_flow_calc is the overflow detection algorithm, over_flow_tmp is the overflow detection result, and over_flow holds the overflow position flag.
Inside the loop, for each row, the overflow check is performed right after the row's query result is obtained. If over_flow_calc() returns 0, over_flow_tmp is 0: there is no overflow, and the row's query result can be output directly. If over_flow_calc() returns 1, over_flow_tmp is 1: there is an overflow, and an exception must be thrown. To support this, over_flow[1] is preset to a default value of -1 before the loop begins; if a row overflows, the value at this position is overwritten with that row's number. After the loop, the value at over_flow[1] is compared with the default -1: if they are equal, no exception is thrown; otherwise, an exception is thrown.
When the loop ends, N rows have been processed with N + 1 operations: N operations that produce the query results and the corresponding overflow position flags, plus one final scan of the overflow position flags. This is markedly fewer operations than in the conventional flow.
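The C sketch below shows one way to realize this branch-free flow. It assumes `int32_t` columns and results, null flags of 0 (NULL) and -1 (not NULL), and an overflow check that widens the sum to 64 bits. Writing the row index to `over_flow[over_flow_tmp]`, so that slot 0 absorbs non-overflowing rows and slot 1 records an overflowing one, is an assumed reading of the description rather than code from the patent; the function name `add_columns_masked_checked` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed overflow check: 1 if sum does not fit in int32_t, else 0.
 * Bitwise | instead of || avoids short-circuit evaluation. */
static int32_t over_flow_calc(int64_t sum)
{
    return (sum > INT32_MAX) | (sum < INT32_MIN);
}

/* Fused null processing and overflow detection without per-row branches.
 * over_flow[0] is a scratch slot written for every non-overflowing row;
 * over_flow[1] starts at -1 and receives the index of an overflowing row.
 * Returns over_flow[1]: -1 means no overflow, otherwise an exception is due. */
long add_columns_masked_checked(const int32_t *c1, const int32_t *c1_isNull,
                                const int32_t *c2, const int32_t *c2_isNull,
                                int32_t *result, int32_t *result_isNull,
                                size_t n)
{
    long over_flow[2] = {-1, -1};
    for (size_t i = 0; i < n; i++) {
        int32_t mask = c1_isNull[i] & c2_isNull[i];   /* result null flag */
        result_isNull[i] = mask;
        int64_t sum = (int64_t)(c1[i] & mask) + (c2[i] & mask);
        result[i] = (int32_t)sum;  /* implementation-defined if out of range */
        int32_t over_flow_tmp = over_flow_calc(sum);
        over_flow[over_flow_tmp] = (long)i;           /* slot 1 only on overflow */
    }
    /* The single extra step: compare the recorded position with the default. */
    return over_flow[1];
}
```

In this sketch over_flow[1] ends up holding the last overflowing row. Reporting the row number of every overflowing row would instead require keeping one overflow flag per row and scanning all of them after the loop.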
The embodiment of the invention performs overflow detection on the obtained query results with full-value scanning and full-value calculation. This reduces conditional branches and combines null value detection and overflow detection in a single parallelizable pass, accelerating query processing.
On the basis of any of the above embodiments, obtaining the query result based on the null flags of all rows in the query result, the target column data and the relational expression comprises:
for each target row, performing a logical AND between the null flag of the target row in the query result and the element of the target row in each target column data, to obtain the null-value processing result of the target row.
It should be noted that a target row is a given row of the query result column vector or of the target column data; the query result column vector and the target column data share the same target rows.
Specifically, for each target row, the query device performs a logical AND between the null flag of the query result in that row and each target column involved in the query statement, obtaining the null-value processing result of each target column in that row.
The null-value processing result is the element after the corresponding null processing. Its purpose is that, if the query result of the row is NULL, the elements of that row in every column vector participating in the query are converted to 0; if the query result of the row is not NULL, those elements keep their own values.
Preferably, the null processing performs a logical AND between the null flag of the row's query result and the element of the target row in each target column. A null flag of all 0s for a row's query result means the query result is NULL, i.e. the element of that row is NULL in one or more of the column vectors participating in the query. A null flag of all 1s means the query result is not NULL, i.e. no element of that row is NULL in any column vector participating in the query.
Determining the element of the target row in the query result based on the null-value processing result of the target row and the relational expression.
Specifically, the query device executes the relational expression parsed in step 102 on the null-value processing results of the target row to obtain the query result for that row.
For the query results of the other rows, these steps are executed in a loop or iteration over the row number, which is not described in detail in this embodiment.
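To show that the null processing does not depend on which relational expression was parsed, the C sketch below passes the expression in as a function pointer. The `binary_expr_fn` type, the `expr_*` helpers and `eval_masked` are hypothetical; column values are assumed to be `int32_t`, widened to `int64_t` so that the sum, difference or product of two values cannot overflow, and `result_isNull` uses 0 for NULL and -1 for not NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical representation of a parsed binary relational expression. */
typedef int64_t (*binary_expr_fn)(int64_t, int64_t);

static int64_t expr_add(int64_t a, int64_t b) { return a + b; }
static int64_t expr_sub(int64_t a, int64_t b) { return a - b; }
static int64_t expr_mul(int64_t a, int64_t b) { return a * b; }

/* Null processing followed by expression evaluation for every target row.
 * result_isNull: 0 = NULL, -1 (all bits set) = not NULL. */
void eval_masked(binary_expr_fn expr,
                 const int32_t *c1, const int32_t *c2,
                 const int32_t *result_isNull, int64_t *result, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int64_t tmp_c1 = c1[i] & result_isNull[i]; /* 0 if the row is NULL */
        int64_t tmp_c2 = c2[i] & result_isNull[i];
        result[i] = expr(tmp_c1, tmp_c2);
    }
}
```

Division is deliberately left out: masked NULL rows feed 0 into the expression, so an integer division would divide by zero on those rows unless the divisor is guarded. The function pointer is used only for brevity and adds an indirect call per row.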
The following example illustrates null processing in a vectorized database.
For example, suppose that in some target row c1 is NULL and c2 is 3. Because one operand is NULL, the query result of "select c1 + c2 from table" is also NULL, so its null flag result_isNull is all 0s. result_isNull is then ANDed with c1 and c2: result_isNull & c1 = 00000000 & (NULL) = 00000000, and result_isNull & c2 = 00000000 & 00000011 = 00000000. After this processing, the null-processing results tmp_c1 and tmp_c2 are both 0, so c1 + c2 with a NULL operand becomes tmp_c1 + tmp_c2 = 0 + 0 = 0, and no special handling is needed.
Suppose instead that in some target row neither c1 nor c2 is NULL, with c1 = 1 and c2 = 3. The query result of "select c1 + c2 from table" is then not NULL, so its null flag result_isNull is all 1s. ANDing gives result_isNull & c1 = 11111111 & 00000001 = 00000001 and result_isNull & c2 = 11111111 & 00000011 = 00000011, so tmp_c1 and tmp_c2 remain 1 and 3, and tmp_c1 + tmp_c2 = c1 + c2 = 4.
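The two worked examples can be checked with a small C function that uses 8-bit null flags (0x00 for NULL, 0xFF for not NULL), matching the bit patterns above. The function name `add_row_8bit` is hypothetical, and the byte passed for a NULL c1 is arbitrary because it is masked away.

```c
#include <stdint.h>

/* Evaluates c1 + c2 for one row with 8-bit null flags
 * (0x00 = NULL, 0xFF = not NULL), mirroring the two worked examples. */
unsigned add_row_8bit(uint8_t c1, uint8_t c1_isNull,
                      uint8_t c2, uint8_t c2_isNull,
                      uint8_t *result_isNull)
{
    uint8_t flag = (uint8_t)(c1_isNull & c2_isNull); /* 0x00 if either is NULL */
    uint8_t tmp_c1 = (uint8_t)(flag & c1);           /* zeroed when flag is 0x00 */
    uint8_t tmp_c2 = (uint8_t)(flag & c2);           /* kept when flag is 0xFF */
    *result_isNull = flag;
    return (unsigned)tmp_c1 + tmp_c2;
}
```

For the first example, `add_row_8bit(0x5A, 0x00, 0x03, 0xFF, &f)` returns 0 with `f == 0x00`; for the second, `add_row_8bit(0x01, 0xFF, 0x03, 0xFF, &f)` returns 4 with `f == 0xFF`.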
The embodiment of the invention performs logical operation on the null marks and the target column data of each row in the query result to obtain a null value processing result, and executes the relational expression in the query statement on the null value processing result and obtains the query result, so that branch judgment can be reduced, and performance acceleration can be achieved.
On the basis of any of the above embodiments, obtaining the null flag of each row in the query result based on the null flag of each row in the target column data includes:
For each target row, the null flag of the target row in each target column data is obtained.
Specifically, for each target column data related to the query statement, in each row, the query device of the vectorization database first obtains a null flag corresponding to an element according to the element of the target column data in the target row, so as to represent whether the element of the target column data in the target row is null.
A logical AND operation is performed on the null flags of the target row in the target column data to obtain the null flag of that row in the query result.
Specifically, the query device of the vectorization database determines whether an element of the query result in the target row is empty according to the empty identifiers of all the target column data related to the query statement in the target row.
Preferably, the query device of the vectorization database performs a logical AND operation on the null flags, in the target row, of all the target column data related to the query statement. This gives the null flag of the query result in the target row, which indicates whether the element of the query result in the target row is null.
The following illustrates a specific embodiment of determining, in a vectorized database, whether the query result is null.
For example, consider the conventional determination process when c1 of a certain target row is null and c2 is 3. For the statement select c1 + c2 from table, the engine first computes c1 + c2 and only then checks the result to learn that it is null, i.e. result_isNull is all 0.
The determination process provided by the embodiment of the invention is as follows. First, whether c1 and c2 are null is determined separately, which gives c1_isNull as all 0 and c2_isNull as all 1. A logical AND operation is then performed on the null flags of c1 and c2. As long as at least one null flag is all 0, the result of the AND operation is all 0, i.e. the query result contains a null value. The process is as follows:
result_isNull = c1_isNull & c2_isNull = 00000000 & 11111111 = 00000000. Since result_isNull is all 0, the query result contains a null value.
If neither c1 nor c2 is null, then c1_isNull and c2_isNull are both all 1, and
result_isNull = c1_isNull & c2_isNull = 11111111 & 11111111 = 11111111. Since result_isNull is all 1, the query result contains no null value.
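Here is a minimal sketch of this derivation across a whole column batch. It assumes each target column's null flags are already available as 8-bit masks, 0x00 for null and 0xFF for not null, as in the example above. The function name `result_null_flags` is hypothetical.

```python
import operator
from functools import reduce


def result_null_flags(*flag_columns):
    """Null flag of each query-result row from the target columns' flags.

    Each flag is an 8-bit mask, 0x00 (null) or 0xFF (not null). ANDing the
    flags of one row across all target columns yields 0x00 as soon as any
    operand is null, without an if per row.
    """
    return [reduce(operator.and_, row_flags) for row_flags in zip(*flag_columns)]


c1_is_null = [0x00, 0xFF]  # row 0: c1 is null; row 1: c1 = 1
c2_is_null = [0xFF, 0xFF]  # c2 = 3 in both rows
print(result_null_flags(c1_is_null, c2_is_null))  # [0, 255]
```

The same call works for any number of target columns, because the AND is folded over every flag in the row.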
According to the embodiment of the invention, the null mark of the query result is obtained from the null marks of the target row in each target column data. This avoids useless operations when a null value exists and accelerates execution.
On the basis of any of the above embodiments, after the element of the target row in the query result is determined based on the null value processing result of the target row and the relational expression, the method further includes: performing overflow detection on the element of the target row in the query result, and determining whether the target row of the query result overflows.
Specifically, for each target row, after the query result of the row is obtained, the query device of the vectorization database executes an overflow detection algorithm on the query result of the row to determine whether the numerical value of the query result of the row overflows, and the embodiment of the present invention does not specifically limit the overflow detection algorithm.
Preferably, after the query device of the vectorization database executes the overflow detection algorithm on the query result of the row, if the return value of the overflow detection function is 0, it indicates that there is no overflow, and the query result of the row may be directly output. If the return value of the overflow detection function is 1, it indicates that there is overflow and an exception needs to be thrown.
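The document leaves the overflow detection algorithm open. Below is a minimal sketch that assumes results are signed 32-bit integers. The detector `overflow_detect` is hypothetical and returns 0 or 1 as described above. The equally hypothetical `emit_row` either outputs the row or raises an exception.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def overflow_detect(value):
    """Hypothetical detector: 1 if value falls outside signed 32-bit range, else 0."""
    return int(not INT32_MIN <= value <= INT32_MAX)


def emit_row(value, row_number=None):
    """Output a row's result directly when no overflow is detected; raise otherwise."""
    if overflow_detect(value) == 1:
        raise OverflowError(f"query result overflows at row {row_number}")
    return value


print(emit_row(4, row_number=0))  # 4
try:
    emit_row(2**31, row_number=1)
except OverflowError as exc:
    print(exc)  # query result overflows at row 1
```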
For overflow detection of the query results of the remaining rows, the row number may be used as the loop variable, and the query results of all rows may be checked in a loop or an iteration. This is not specifically limited in this embodiment of the present invention.
Preferably, for overflow detection of the query results of the remaining rows, the row number is used as the loop variable, and the overflow detection of each row follows that row's null processing in the same loop or iteration, so that the query results of all rows are checked.
The embodiment of the invention performs overflow detection on the obtained query result by scanning and computing over all values. This reduces branch judgments, and because null value detection and overflow detection are computed together in one pass, execution is accelerated.
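The fused loop described above can be sketched as follows. The following are assumptions for illustration only. The function name `add_columns_fused` is hypothetical. Results are checked against the signed 32-bit range. Null flags are 0 for null and -1 for not null, because in Python -1 has every bit set, so `-1 & x == x` for any integer.

```python
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
ALL_ONES = -1  # every bit set in Python's integers, so ALL_ONES & x == x


def add_columns_fused(c1, c1_flags, c2, c2_flags):
    """One pass over the rows of select c1 + c2.

    Flags are 0 (null) or ALL_ONES (not null). For each row the result's
    null flag, the null processing, the addition and the 0/1 overflow check
    are all computed in the same iteration.
    """
    results, result_flags, overflow_flags = [], [], []
    for v1, f1, v2, f2 in zip(c1, c1_flags, c2, c2_flags):
        mask = f1 & f2                     # null flag of this result row
        value = (mask & v1) + (mask & v2)  # null processing, then c1 + c2
        results.append(value)
        result_flags.append(mask)
        overflow_flags.append(int(not INT32_MIN <= value <= INT32_MAX))
    return results, result_flags, overflow_flags


res, flags, ovf = add_columns_fused(
    [7, 1, 2**31 - 1], [0, ALL_ONES, ALL_ONES],
    [3, 3, 1], [ALL_ONES, ALL_ONES, ALL_ONES],
)
print(res, ovf)  # [0, 4, 2147483648] [0, 0, 1]
```

Null rows are zeroed before the addition, so they can never trigger a spurious overflow. That property is what makes it safe to run both checks in the same loop.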
On the basis of any one of the above embodiments, the overflow exception prompt includes the row number of each row of the query result that overflows.
It should be noted that the overflow exception prompt informs the user of an overflow exception found during overflow detection on the query result. The manner of prompting is not specifically limited in the embodiment of the present invention.
For example, the row numbers of the overflowing rows may be stored in a table for output, or the prompt may take the form of a text box.
Preferably, the overflow exception prompt is given by throwing an exception in the function body using a throw mechanism.
Specifically, for each target row, after knowing that the query result of the row has overflow, the row number of the target row is output as the content of the exception prompt, so that the user can know which row value in the query result overflows.
Preferably, a determination condition on an overflow position identifier is preset before the loop starts. The corresponding overflow position identifier is updated according to the overflow detection result, and an exception is thrown in the function body when the updated overflow position identifier does not satisfy the preset determination condition.
The following illustrates a specific embodiment of the overflow processing flow of the vectorized database.
For example, over_flow[1] is preset to a default value of -1 before the loop begins.
In the for loop with the row number i as the loop variable, the following operations are performed after the query result of a row is obtained:
over_flow_tmp=over_flow_calc();
over_flow[over_flow_tmp]=i;
In the loop, if the query result of the row overflows, the overflow detection function over_flow_calc() returns 1. over_flow[over_flow_tmp] therefore refers to over_flow[1], and the value at that position is overwritten with the current row number.
If the query result of the row does not overflow, over_flow_calc() returns 0. over_flow[over_flow_tmp] therefore refers to over_flow[0], and the value at that position is overwritten with the current row number.
The updated value of over_flow[1] is then compared with the default value of -1 set before the loop.
If over_flow[1] still equals the default value of -1, no exception is thrown. If it is not equal to -1, an exception is thrown.
The value at over_flow[0] is not processed.
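The over_flow bookkeeping above can be sketched in Python as follows. Each row's 0/1 detector result is used directly as a list index, so no if is evaluated per row. The function name `check_overflow_rows` is hypothetical, and the per-row flags are assumed to come from an overflow detector such as the over_flow_calc() described above.

```python
def check_overflow_rows(overflow_flags):
    """Branch-free bookkeeping of the over_flow[] scheme.

    overflow_flags[i] is the 0/1 return of the overflow detector for row i.
    Slot 0 absorbs writes for rows without overflow; slot 1 keeps the row
    number of the most recent overflowing row.
    """
    over_flow = [-1, -1]              # over_flow[1] preset to -1 before the loop
    for i, over_flow_tmp in enumerate(overflow_flags):
        over_flow[over_flow_tmp] = i  # the 0/1 value selects the slot
    if over_flow[1] != -1:            # one check after the loop, not per row
        raise OverflowError(f"query result overflows at row {over_flow[1]}")


check_overflow_rows([0, 0, 0])        # no exception
try:
    check_overflow_rows([0, 1, 0])
except OverflowError as exc:
    print(exc)  # query result overflows at row 1
```

Because over_flow[1] is overwritten, this scheme reports the last overflowing row. Reporting every overflowing row would need a list of row numbers instead of a single slot.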
The following illustrates another embodiment of the vectorized database overflow processing flow.
According to the query statement select c1 from table, it can be known that the query result is all elements in the c1 column, and at this time, since the operation among a plurality of column vectors is not involved, the above overflow processing procedure can be directly performed on the elements in each row in the c1 column, which is not described in detail in the embodiments of the present invention.
According to the embodiment of the invention, when the obtained query result overflows, the corresponding row number is output as the content of the exception prompt, so that the user can locate the overflow. Branch judgments are reduced, and null value detection and overflow detection are computed together to accelerate execution.
Fig. 2 is a schematic structural diagram of a query device of a vectorization database provided by the present invention. As shown in fig. 2, the apparatus includes: a receiving module 210, a statement parsing module 220, a null value detection module 230, and a data processing module 240, wherein:
The receiving module is used for receiving the query statement input by the user.
The statement parsing module is used for determining, in response to the query statement, the target column data and the relational expression based on the query statement.
The null value detection module is used for acquiring the null flag of each row in the query result based on the null flags of each row in the target column data.
The data processing module is used for acquiring the query result based on the null flag of each row in the query result, the target column data and the relational expression.
Specifically, the receiving module 210, the sentence parsing module 220, the null value detecting module 230, and the data processing module 240 are electrically connected in sequence.
The receiving module 210 receives a query statement input by a user.
The statement parsing module 220 parses the received query statement to obtain the relational expression for data processing and the data under the fields referenced by the query statement, i.e. the target column data.
For any row in the target column data, the null value detection module 230 performs the corresponding logical operation on the null flags of that row's elements in each target column data, following the operation rules of NULL. This gives the null flag of that row's query result, i.e. whether that row's query result is null.
For any row in the target column data, the data processing module 240 performs a logical operation between the null flag of that row's query result and the element of each target column data in that row. It then executes the relational expression parsed by the statement parsing module 220 on these logical operation results to obtain the processed query result.
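The module decomposition of Fig. 2 can be mirrored in a small class. This is a hypothetical sketch, not the patented device. All of the following are assumptions for illustration: the class name, the method names, the in-memory table layout `{table: {column: (values, null_flags)}}`, and the toy parser that accepts only `select a + b from t`. Null flags are 0 for null and -1 for not null, because -1 has every bit set in Python.

```python
import operator
import re
from functools import reduce

ALL_ONES = -1  # "not null" flag; every bit set, so ALL_ONES & x == x


class QueryDevice:
    """Hypothetical mirror of modules 210-240 of Fig. 2.

    tables maps a table name to {column: (values, null_flags)}, where each
    null flag is 0 (null) or ALL_ONES (not null).
    """

    def __init__(self, tables):
        self.tables = tables

    def receive(self, sql):
        # Receiving module 210: accept the user's query statement.
        return sql.strip()

    def parse(self, sql):
        # Statement parsing module 220: toy grammar "select a + b from t" only.
        m = re.fullmatch(r"select\s+(\w+)\s*\+\s*(\w+)\s+from\s+(\w+)", sql, re.IGNORECASE)
        if m is None:
            raise ValueError("this sketch only parses 'select a + b from t'")
        left, right, table = m.groups()
        target_columns = [self.tables[table][left], self.tables[table][right]]
        return target_columns, operator.add

    def detect_nulls(self, target_columns):
        # Null value detection module 230: AND the per-row flags of all columns.
        flag_columns = [flags for _, flags in target_columns]
        return [reduce(operator.and_, row) for row in zip(*flag_columns)]

    def process(self, result_flags, target_columns, expression):
        # Data processing module 240: mask the operands, then apply the expression.
        rows = []
        for i, mask in enumerate(result_flags):
            operands = [mask & values[i] for values, _ in target_columns]
            rows.append((expression(*operands), mask))
        return rows

    def query(self, sql):
        target_columns, expression = self.parse(self.receive(sql))
        return self.process(self.detect_nulls(target_columns), target_columns, expression)


device = QueryDevice({"t": {
    "c1": ([0b10110101, 1], [0, ALL_ONES]),  # row 0: c1 is null
    "c2": ([3, 3], [ALL_ONES, ALL_ONES]),
}})
print(device.query("select c1 + c2 from t"))  # [(0, 0), (4, -1)]
```

Each output row pairs the computed value with its null flag. A flag of 0 marks the row as null, whatever placeholder value was computed.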
Optionally, the apparatus may further include:
The overflow detection module is used for outputting an overflow exception prompt when overflow exists in any row of the query result.
Optionally, the data processing module 240 may include a null processing unit and a result processing unit, wherein:
the null value processing unit is used for performing, for each target row, a logical AND operation between the null mark of the target row in the query result and the element of the target row in each target column data, to obtain the null value processing result of the target row;
and the result processing unit is used for determining the elements of the target row in the query result based on the null value processing result and the relational expression of the target row.
Optionally, the null value detection module 230 may include a column null value detection unit and a result null value detection unit, wherein:
the column null value detection unit is used for acquiring a null mark of a target row in each target column data for each target row;
and the result null value detection unit is used for performing a logical AND operation on the null marks of the target row in the target column data to obtain the null marks of the rows in the query result.
Optionally, the overflow detection module may include:
the overflow detection unit, which is used for performing overflow detection on the element of the target row in the query result and determining whether the target row of the query result overflows.
Optionally, the overflow exception prompt comprises the row number of each row of the query result that overflows.
The query device of the vectorization database provided in the embodiment of the present invention is configured to execute the query method based on the vectorization database, and an implementation manner of the query device of the vectorization database is consistent with an implementation manner of the query method of the vectorization database provided in the present invention, and the same beneficial effects can be achieved, which is not described herein again.
The embodiment of the invention obtains the null marks of all rows in the query result based on the null marks of all rows in the target column data, executes the relational expression in the query statement and obtains the query result after performing logic operation on the null marks of all rows in the query result and the target column data, and can reduce branch judgment to achieve performance acceleration.
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, wherein the processor 310, the communication interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform a query method of a vectorized database, the method comprising: determining target column data and a relational expression based on the query statement; acquiring null marks of all rows in the query result based on the null marks of all rows in the target column data; and acquiring a query result based on the null marks of all rows in the query result, the target column data and the relational expression.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the method for querying a vectorized database provided by the above methods, the method comprising: receiving a query statement input by a user; in response to the query statement, determining target column data and a relational expression based on the query statement; acquiring null marks of all rows in the query result based on the null marks of all rows in the target column data; and acquiring a query result based on the null marks of all rows, the target column data and the relational expression in the query result.
In yet another aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method of querying a vectorized database provided above, the method comprising: receiving a query statement input by a user; in response to the query statement, determining target column data and a relational expression based on the query statement; acquiring null marks of all rows in the query result based on the null marks of all rows in the target column data; and acquiring a query result based on the null marks of all rows, the target column data and the relational expression in the query result.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A query method for a vectorized database, comprising:
receiving a query statement input by a user;
in response to the query statement, determining target column data and a relational expression based on the query statement;
acquiring null marks of all rows in a query result based on the null marks of all rows in the target column data;
and acquiring the query result based on the null marks of all rows in the query result, the target column data and the relational expression.
2. The query method of claim 1, wherein after the query result is acquired based on the null marks of the rows in the query result, the target column data and the relational expression, the method further comprises:
outputting an overflow exception prompt when overflow exists in any row of the query result.
3. The query method of claim 1, wherein acquiring the query result based on the null marks of the rows in the query result, the target column data and the relational expression comprises:
for each target row, performing a logical AND operation between the null mark of the target row in the query result and the element of the target row in each target column data, to obtain a null value processing result of the target row;
and determining the elements of the target row in the query result based on the null value processing result of the target row and the relational expression.
4. The query method of claim 1, wherein acquiring the null marks of the rows in the query result based on the null marks of the rows in the target column data comprises:
for each target row, acquiring a null mark of the target row in each target column data;
and performing a logical AND operation on the null marks of the target row in the target column data to obtain the null marks of the rows in the query result.
5. The query method of claim 3, wherein after determining the elements of the target row in the query result based on the null processing result of the target row and the relational expression, further comprising:
and performing overflow detection on the elements of the target row in the query result, and determining whether the target row of the query result has overflow.
6. The query method of claim 2, wherein the overflow exception prompt comprises the row number of each overflowing row of the query result.
7. An apparatus for querying a vectorized database, comprising:
the receiving module is used for receiving the query statement input by the user;
the statement analysis module is used for responding to the query statement and determining target column data and a relational expression based on the query statement;
a null value detection module, configured to obtain null identifiers of each row in the query result based on the null identifiers of each row in the target column data;
and the data processing module is used for acquiring the query result based on the null marks of all rows in the query result, the target column data and the relational expression.
8. The query device for a vectorized database of claim 7, further comprising:
an overflow detection module, used for outputting an overflow exception prompt when overflow exists in any row of the query result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the query method of a vectorized database according to any one of claims 1 to 6 when executing the program.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the query method of a vectorized database according to any one of claims 1 to 6.
CN202110815471.8A 2021-07-19 2021-07-19 Query method and device for vectorized database Active CN113553345B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110815471.8A CN113553345B (en) 2021-07-19 2021-07-19 Query method and device for vectorized database

Publications (2)

Publication Number Publication Date
CN113553345A true CN113553345A (en) 2021-10-26
CN113553345B CN113553345B (en) 2024-09-03

Family

ID=78132146

Country Status (1)

Country Link
CN (1) CN113553345B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6996557B1 (en) * 2000-02-15 2006-02-07 International Business Machines Corporation Method of optimizing SQL queries where a predicate matches nullable operands
CN103970870A (en) * 2014-05-12 2014-08-06 华为技术有限公司 Database query method and server
CN108304505A (en) * 2018-01-18 2018-07-20 上海达梦数据库有限公司 A kind of processing method of SQL statement, device, server and storage medium
CN108804554A (en) * 2018-05-22 2018-11-13 上海达梦数据库有限公司 A kind of data base query method, device, server and storage medium
CN112650766A (en) * 2019-10-10 2021-04-13 腾讯科技(深圳)有限公司 Database data operation method, system and server

Non-Patent Citations (1)

Title
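潘娜 (Pan Na), 毛宇光 (Mao Yuguang), 韩波 (Han Bo): "SQL语言中的空值问题" (The null value problem in the SQL language), 微机发展 (Microcomputer Development), no. 12, 10 December 2004 (2004-12-10) *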

Similar Documents

Publication Publication Date Title
CN108710613B (en) Text similarity obtaining method, terminal device and medium
CN110795455A (en) Dependency relationship analysis method, electronic device, computer device and readable storage medium
CN111798273A (en) Training method of purchase probability prediction model of product and purchase probability prediction method
CN112559554A (en) Query statement optimization method and device
CN112286961A (en) SQL optimization query method and device
US11386053B2 (en) Automatic generation of a data model from a structured query language (SQL) statement
CN110704486B (en) Data processing method, device, system, storage medium and server
CN111241093B (en) Dynamic storage expansion method based on database
US20200372026A1 (en) Retroreflective clustered join graph generation for relational database queries
CN112948415A (en) SQL statement detection method and device, terminal equipment and storage medium
CN112597149A (en) Data table similarity determination method and device
CN112069175A (en) Data query method and device and electronic equipment
CN113971224A (en) Image retrieval system, method and related equipment
CN110515967B (en) Spark calculation framework-based data analysis method and electronic equipment
CN110019341B (en) Data query method and device
CN110765100A (en) Label generation method and device, computer readable storage medium and server
CN113722302B (en) Data management method and device
CN119357334A (en) Production data processing method, device, storage medium and program product
KR20200118965A (en) A method for classifying sql query, a method for detecting abnormal occurrence, and a computing device
CN113553345B (en) Query method and device for vectorized database
CN113076322A (en) Commodity search processing method and device
CN111782691B (en) Method, device, electronic device and storage medium for determining index correlation
CN112766779B (en) Information processing method, computer device, and storage medium
CN115878661A (en) Query method, query device, electronic equipment and storage medium
Davitkova et al. Learning over Sets for Databases.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant