CN112860734B - Multi-dimensional range query method and device for seismic data - Google Patents
- Publication number
- CN112860734B (application CN201911181069.8A)
- Authority
- CN
- China
- Prior art keywords
- query
- data
- keywords
- track head
- seismic data
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2291—User-Defined Types; Storage management thereof
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a multi-dimensional range query method and device for seismic data. The method comprises: acquiring a group of trace header keywords of the seismic data and the query condition data corresponding to each trace header keyword in the group; determining one or more pointers corresponding to the group of trace header keywords according to the query condition data of each keyword in the group and a pre-established query model, wherein each pointer carries its corresponding pointer information, and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data; and querying the seismic data according to the one or more pointers corresponding to the group of trace header keywords. The method and device can query seismic data quickly over a multi-dimensional range, avoid accessing large amounts of redundant data during the query, improve query efficiency and improve the user experience.
Description
Technical Field
The invention relates to the technical fields of high-performance computing and big data, and in particular to a multi-dimensional range query method and device for seismic data.
Background
Seismic data processing is a key technology in the petroleum exploration industry. It processes field-acquired seismic data with specific processing algorithms to produce an image of the subsurface geological structure, which guides subsequent drilling and oil production. With the continued adoption of new exploration and high-precision acquisition technologies, the volume of raw seismic data acquired in the field is growing rapidly: a single data volume now exceeds the petabyte (PB) scale, and the number of seismic traces can reach the trillions. Seismic applications typically operate on such massive data volumes, which are logically similar to tables in a relational database: they are organized in row order, and each row record is called a seismic trace. A seismic trace consists of two parts, a trace header and a trace body. The trace header stores the attribute information associated with the trace, and each attribute is called a trace header keyword. The trace body is a floating-point array, and each floating-point number is called a sample point. Seismic data volumes are therefore high-dimensional structured data: each seismic trace carries hundreds of attributes, each stored in a different trace header keyword.
However, interactive seismic applications are usually interested in only part of a seismic data volume. Many data accesses therefore specify value ranges for some attributes to filter out a particular subset, and may also require the query results to be ordered by some attributes.
Because multi-dimensional range queries are the most common query pattern in seismic applications, their speed is critical to the performance and user experience of seismic applications, especially interactive ones. Efficient index lookup is the basis for ensuring query efficiency and reducing query latency for seismic data.
In the prior art, the query range of the first trace header keyword is generally used to determine the range of data to scan, and the query ranges of the other keywords are only used to filter data records during the scan. When the first trace header keyword has low selectivity, a large amount of redundant data is accessed during the query, which seriously degrades query efficiency and user experience.
Disclosure of Invention
An embodiment of the invention provides a multi-dimensional range query method for seismic data, which queries seismic data quickly over a multi-dimensional range, avoids accessing large amounts of redundant data during the query, improves query efficiency and improves the user experience. The method comprises:
acquiring a group of trace header keywords of the seismic data and the query condition data corresponding to each trace header keyword in the group;
determining one or more pointers corresponding to the group of trace header keywords according to the query condition data of each keyword in the group and a pre-established query model, wherein each pointer carries its corresponding pointer information, and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data;
querying the seismic data according to the one or more pointers corresponding to the group of trace header keywords;
wherein the query sequence data is obtained as follows:
determining an approximate value of the amount of data to be read for each of a plurality of preset ordering schemes;
obtaining the query sequence data according to the approximate amount of data to be read for each preset ordering scheme;
and wherein pre-establishing the query model according to the plurality of historical trace header keywords of the seismic data and the query sequence data comprises: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each of the plurality of historical trace header keywords of the seismic data.
An embodiment of the invention provides a multi-dimensional range query device for seismic data, which queries seismic data quickly over a multi-dimensional range, avoids accessing large amounts of redundant data during the query, improves query efficiency and improves the user experience. The device comprises:
a data acquisition module, configured to acquire a group of trace header keywords of the seismic data and the query condition data corresponding to each trace header keyword in the group;
a pointer determination module, configured to determine one or more pointers corresponding to the group of trace header keywords according to the query condition data of each keyword in the group and a pre-established query model, wherein each pointer carries its corresponding pointer information, and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data;
a data query module, configured to query the seismic data according to the one or more pointers corresponding to the group of trace header keywords;
wherein the query sequence data is obtained as follows:
determining an approximate value of the amount of data to be read for each of a plurality of preset ordering schemes;
obtaining the query sequence data according to the approximate amount of data to be read for each preset ordering scheme;
and wherein pre-establishing the query model according to the plurality of historical trace header keywords of the seismic data and the query sequence data comprises: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each of the plurality of historical trace header keywords of the seismic data.
An embodiment of the invention further provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above multi-dimensional range query method for seismic data when executing the computer program.
In the prior-art scheme, the query range of the first trace header keyword determines the range of data to scan, and the query ranges of the other keywords only filter data records during the scan. By contrast, the method and device provided by the embodiments of the invention acquire a group of trace header keywords of the seismic data and the query condition data of each keyword in the group; determine one or more pointers corresponding to the group according to that query condition data and a pre-established query model, wherein each pointer carries its corresponding pointer information and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data; and query the seismic data according to those pointers. Because the pointers are determined from the query conditions of every keyword in the group rather than from the first keyword alone, the range scanned by the query is effectively reduced: seismic data can be queried quickly over a multi-dimensional range, large amounts of redundant data are not accessed, query efficiency improves and the user experience is improved.
Drawings
To illustrate the embodiments of the invention and the prior-art technical solutions more clearly, the drawings used in the description are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort. In the drawings:
FIG. 1 is a schematic diagram of the multi-dimensional range query method for seismic data according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a two-level B+ tree structure according to an embodiment of the invention;
FIG. 3 is a schematic diagram of the bitmap used in the filtering phase of a multi-dimensional range query of seismic data according to an embodiment of the invention;
FIG. 4 is a flowchart of the distributed index construction algorithm according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the temporary tree structure and the merge process according to an embodiment of the invention;
FIG. 6 is a schematic diagram of the search process of the multi-dimensional range query method for seismic data according to an embodiment of the invention;
FIG. 7 is a schematic diagram of index construction implemented by the IndexBTreeWriter class according to an embodiment of the invention;
FIG. 8 is a schematic diagram of the multi-dimensional range query implemented by the IndexBTreeReader class according to an embodiment of the invention;
FIG. 9 is a structural block diagram of the multi-dimensional range query device for seismic data according to an embodiment of the invention.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the embodiments are described in further detail below with reference to the drawings. The exemplary embodiments and their descriptions are used to explain the invention and do not limit it.
As mentioned above, because multi-dimensional range queries are the most common query pattern in seismic applications, their speed is critical to the performance and user experience of seismic applications, especially interactive ones, and efficient index lookup is the basis for ensuring query efficiency and reducing query latency. A B+ tree index is a balanced search tree designed for disks and other direct-access secondary storage devices, and it effectively reduces the number of disk I/O operations during a query. Because a B+ tree also supports fast range scans along its leaf nodes, it offers good range query performance and is widely used for seismic data queries today. However, when executing a multi-dimensional range query, a B+ tree index uses only the query range of the first trace header keyword to determine the range of data to scan; the query ranges of the other keywords are used only to filter data records during the scan. When the first trace header keyword has low selectivity, a large amount of redundant data is accessed during the query, which seriously degrades query efficiency and user experience.
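The limitation of the single-level B+ tree can be illustrated with a small C++ sketch. It models a composite index on two hypothetical keywords (KeyA, KeyB) as a sorted array standing in for the B+ tree leaf level; `ScanStats` and `FirstKeyScan` are illustrative names, not part of the patent. Only the KeyA range bounds the scan, so every leaf entry in that range is read even when few of them satisfy the KeyB condition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical statistics for one scan: entries read vs. entries returned.
struct ScanStats {
    std::size_t visited = 0;
    std::size_t matched = 0;
};

// Single-level composite index on (KeyA, KeyB): leaf entries sorted by KeyA, then KeyB.
// Only the KeyA range bounds the scan; KeyB is checked entry by entry.
ScanStats FirstKeyScan(const std::vector<std::pair<int64_t, int64_t>>& sorted_leaves,
                       int64_t a_lo, int64_t a_hi, int64_t b_lo, int64_t b_hi) {
    assert(a_lo <= a_hi && b_lo <= b_hi);
    ScanStats s;
    // Stand-in for the B+ tree descent: first leaf entry with KeyA >= a_lo.
    auto it = std::lower_bound(sorted_leaves.begin(), sorted_leaves.end(),
                               std::make_pair(a_lo, std::numeric_limits<int64_t>::min()));
    for (; it != sorted_leaves.end() && it->first <= a_hi; ++it) {
        ++s.visited;  // every entry in the KeyA range is read...
        if (it->second >= b_lo && it->second <= b_hi) ++s.matched;  // ...but only some match
    }
    return s;
}
```

For example, with 10 KeyA values and 100 KeyB values per KeyA value, a query over all KeyA values and a single KeyB value reads 1,000 leaf entries to return 10.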
To query seismic data quickly over a multi-dimensional range, avoid accessing large amounts of redundant data during the query, improve query efficiency and improve the user experience, an embodiment of the invention provides a multi-dimensional range query method for seismic data. As shown in FIG. 1, the method may include:
Step 101: acquiring a group of trace header keywords of the seismic data and the query condition data corresponding to each trace header keyword in the group;
Step 102: determining one or more pointers corresponding to the group of trace header keywords according to the query condition data of each keyword in the group and a pre-established query model, wherein each pointer carries its corresponding pointer information, and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data;
Step 103: querying the seismic data according to the one or more pointers corresponding to the group of trace header keywords.
As FIG. 1 shows, the embodiment of the invention determines the one or more pointers for the group of trace header keywords from the query condition data of every keyword in the group and the pre-established query model, and then queries the seismic data through those pointers. This effectively narrows the range scanned by the query, so seismic data can be queried quickly over a multi-dimensional range, large amounts of redundant data are not accessed, query efficiency improves and the user experience is improved.
In implementation, a group of trace header keywords of the seismic data and the query condition data corresponding to each keyword in the group are acquired first.
In an embodiment, the trace header keywords include any combination of shot coordinates, receiver coordinates, sample points, shot numbers and trace numbers.
In an embodiment, the query condition data of each trace header keyword may be one or more discrete values or a value range, set as required.
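Because a condition may be either a set of discrete values or a range, one possible in-memory representation is sketched below, assuming integer keyword values. `KeyCondition` and its members are hypothetical names chosen for illustration, not part of the patent's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical query condition for one trace header keyword: either an inclusive
// value range [lo, hi] or a set of discrete values.
struct KeyCondition {
    bool is_range = true;
    int64_t lo = 0;
    int64_t hi = 0;
    std::vector<int64_t> values;  // kept sorted so Matches() can binary-search

    static KeyCondition Range(int64_t low, int64_t high) {
        assert(low <= high);
        KeyCondition c;
        c.lo = low;
        c.hi = high;
        return c;
    }

    static KeyCondition Values(std::vector<int64_t> v) {
        KeyCondition c;
        c.is_range = false;
        std::sort(v.begin(), v.end());
        c.values = std::move(v);
        return c;
    }

    // True if a trace whose keyword value is x satisfies this condition.
    bool Matches(int64_t x) const {
        if (is_range) return lo <= x && x <= hi;
        return std::binary_search(values.begin(), values.end(), x);
    }
};
```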
In implementation, one or more pointers corresponding to the group of trace header keywords are then determined according to the query condition data of each keyword in the group and the pre-established query model, wherein each pointer carries its corresponding pointer information and the query model is pre-established according to a plurality of historical trace header keywords of the seismic data and query sequence data; finally, the seismic data is queried according to those pointers.
In an embodiment, pre-establishing the query model according to a plurality of historical trace header keywords of the seismic data and query sequence data includes: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each of the plurality of historical trace header keywords of the seismic data.
In an embodiment, the query model is a multi-level B+ tree index. The index has multiple levels; each level corresponds to one historical trace header keyword and consists of several mutually independent B+ trees built from the values of that level's keyword. This structure ensures that the query condition data of every trace header keyword can be used to search the B+ trees of its own level during a query, which reduces access to redundant data and yields higher query performance.
For example, FIG. 2 shows a two-level B+ tree structure built from trace header keywords KeyA and KeyB. The first level (level 0) contains one B+ tree, TreeA_0, built from all n distinct values a_0, a_1, …, a_{n-1} of KeyA in the data volume. In the leaf nodes of TreeA_0, each keyword value a_i is followed by a pointer to a second-level (level 1) B+ tree TreeB_i. The second level contains n B+ trees over KeyB, TreeB_0, TreeB_1, …, TreeB_{n-1}; the KeyB values in TreeB_i come from all seismic trace records in the data volume that satisfy KeyA = a_i. The second level is the lowest level of this multi-level B+ tree, so the leaf nodes of each TreeB_i store pointers to the corresponding locations of the data. Each pointer carries its pointer information, which includes the number of the seismic trace.
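A minimal in-memory sketch of the two-level structure in FIG. 2 follows. It assumes integer keyword values and uses `std::map` (an ordered balanced tree) as a stand-in for each B+ tree; a real index would store disk-resident B+ tree nodes. `TraceHeader`, `BuildTwoLevel` and `QueryTwoLevel` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// std::map stands in for a B+ tree (assumption for illustration).
// Level 1, TreeB_i: KeyB value -> numbers of traces with KeyA = a_i and that KeyB value.
using TreeB = std::map<int64_t, std::vector<int64_t>>;
// Level 0, TreeA_0: each distinct KeyA value a_i -> its independent subtree TreeB_i.
using TreeA = std::map<int64_t, TreeB>;

struct TraceHeader {
    int64_t trace_no;  // pointer information: the seismic trace number
    int64_t key_a;
    int64_t key_b;
};

TreeA BuildTwoLevel(const std::vector<TraceHeader>& headers) {
    TreeA index;
    for (const auto& h : headers) index[h.key_a][h.key_b].push_back(h.trace_no);
    return index;
}

// Both keyword ranges are applied through tree searches, so only traces that
// satisfy both conditions are collected.
std::vector<int64_t> QueryTwoLevel(const TreeA& index, int64_t a_lo, int64_t a_hi,
                                   int64_t b_lo, int64_t b_hi) {
    assert(a_lo <= a_hi && b_lo <= b_hi);
    std::vector<int64_t> result;
    for (auto ia = index.lower_bound(a_lo); ia != index.end() && ia->first <= a_hi; ++ia) {
        const TreeB& tree_b = ia->second;  // follow the leaf pointer a_i -> TreeB_i
        for (auto ib = tree_b.lower_bound(b_lo); ib != tree_b.end() && ib->first <= b_hi; ++ib)
            result.insert(result.end(), ib->second.begin(), ib->second.end());
    }
    return result;
}
```

Here the KeyB range is applied by searching each selected TreeB_i rather than by filtering records one by one, which mirrors the point made about the level-by-level structure.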
In an embodiment, the query model, that is, the multi-level B+ tree index, has the following features:
1. The number of levels equals the number of trace header keywords;
2. Starting from the top level, each level corresponds to one trace header keyword, in keyword order;
3. Within each level, the index is organized into several mutually independent B+ trees;
4. Each lower-level B+ tree is a subtree of an upper-level B+ tree, and the leaf nodes of the upper-level B+ tree store pointers to its child B+ trees;
5. The leaf nodes of the lowest-level B+ trees store pointers to the corresponding positions of the seismic data;
6. The multi-level B+ tree index is a general-purpose index structure. Compared with a single-level B+ tree, it lets each trace header keyword independently bound the amount of data read at its own level during a query, so redundant data is not read, less data is accessed overall, and query performance is higher.
In an embodiment, the query sequence data is obtained as follows: an approximate value of the amount of data to be read is determined for each of a plurality of preset ordering schemes, and the query sequence data is obtained from these approximate values.
The inventors found that, when querying seismic data, the order of the trace header keywords used to build the query model strongly affects its performance. To select the most suitable index structure, the query performance of a few specific index candidates (the preset ordering schemes) must be evaluated.
The prior art proposes the concept of a three-star index; an ideal index satisfies all three of its conditions, which are as follows:
1. If the index rows a query needs to access are adjacent or sufficiently close together, the index earns the first star; this minimizes the amount of index data that must be scanned;
2. If the index rows are in the order the query requires, the index earns the second star; an index that meets this condition avoids a sort operation;
3. If the index rows contain all the trace header keywords in the query condition, no further reads from the storage device are needed when querying the index, and the index earns the third star.
In most cases, however, the first and second stars cannot both be satisfied when selecting an index. To minimize the index data read, the trace header keyword whose condition leaves the least data (the most selective one) must be placed at the front of the index, which may leave the index rows in an order different from the one the query requires. The invention therefore proposes the following three preset ordering schemes:
Scheme A: place the trace header keyword with the best selectivity in the filter condition first, then add the keywords to be sorted on in the required order, and finally add the remaining trace header keywords involved in the query to the index in any order;
Scheme B: place the trace header keywords to be sorted on at the very front of the index, in order, and then add the remaining trace header keywords involved in the query in any order;
Scheme C: divide the query into two phases, a filtering phase and a scanning phase. The filtering phase uses an index containing only the filter trace header keywords to find all pointers to valid seismic traces; these pointers are used to build a bitmap that marks all valid traces, as shown in FIG. 3. The scanning phase uses an index containing all the trace header keywords in the ordering requirement: it scans the leaf nodes of the lowest level of that index sequentially and keeps every seismic trace record marked as valid in the bitmap as the final result.
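Scheme C can be sketched as follows under simplifying assumptions: trace numbers are dense integers in [0, num_traces), the filtering index is represented only by the trace numbers it returns, and the sort-ordered index by the trace numbers in its lowest-level leaf order. `SchemeC` is a hypothetical name, and `std::vector<bool>` (typically bit-packed) stands in for the bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// filtered_traces: traces returned by the index on the filter keywords (any order).
// sorted_leaf_order: traces as they appear in the lowest-level leaves of the index
// on the ordering keywords, i.e. already in the required output order.
std::vector<int64_t> SchemeC(int64_t num_traces,
                             const std::vector<int64_t>& filtered_traces,
                             const std::vector<int64_t>& sorted_leaf_order) {
    // Filtering phase: mark every valid trace in a bitmap (one bit per trace).
    std::vector<bool> valid(static_cast<std::size_t>(num_traces), false);
    for (int64_t t : filtered_traces) {
        assert(t >= 0 && t < num_traces);
        valid[static_cast<std::size_t>(t)] = true;
    }
    // Scanning phase: one sequential pass over the sorted leaves, keeping marked traces.
    // No separate sort is needed because the leaf order is already the output order.
    std::vector<int64_t> result;
    for (int64_t t : sorted_leaf_order) {
        assert(t >= 0 && t < num_traces);
        if (valid[static_cast<std::size_t>(t)]) result.push_back(t);
    }
    return result;
}
```

The trade-off this illustrates is that scheme C reads two indexes but never sorts, while the bitmap keeps the membership test per leaf entry constant-time.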
During a query, the relative performance of the three preset ordering schemes changes dynamically with the characteristics of the data and the query conditions. To select the most suitable index, the embodiment of the invention determines an approximate amount of data to be read for each preset ordering scheme and obtains the query sequence data from these values. The inventors found that query performance depends largely on the amount of data read. Therefore, when an application performs a data query, the estimated read size (ERS) of each preset ordering scheme is determined, and the scheme with the smallest ERS is selected as the query sequence data. The specific steps are as follows:
1. For every level except the top-level B+ tree, determine the number of B+ trees to search from the number of valid index entries found at the previous level;
2. Estimate the number of query results at each level from the query range of each trace header keyword;
3. The node size of every B+ tree in the multi-level B+ tree is preset, so the height of the B+ trees at each level can be estimated from the node size and the number of distinct values of that level's trace header keyword, and from that the amount of data read during the search. ERS is the sum of the data read at all levels. Because scheme A must also account for sorting time, the dynamic index policy records the fraction of query time spent on sorting whenever it executes a query statement and computes the average, avr_sort_ratio. The ERS of scheme A is then adjusted to ERS = ERS × (1 + avr_sort_ratio);
4. Compare the ERS values of the three schemes to obtain the query sequence data.
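The ERS steps above can be sketched as a simplified cost model. This is an illustration under stated assumptions, not the patent's exact formula: each level is summarized by the average number of distinct keyword values per B+ tree and the fraction of them inside the query range; a search costs one root-to-leaf descent plus the leaf nodes spanned by the matching range; and `fanout` (entries per node) is assumed to be derived from the preset node size. For scheme C, a caller would add the estimates of its filtering and scanning indexes. `LevelStats`, `TreeHeight`, `EstimateReadSize`, `Candidate` and `ChooseScheme` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Per-level statistics for one candidate keyword order (assumed to come from index metadata).
struct LevelStats {
    double distinct_per_tree;  // average distinct keyword values in one B+ tree at this level
    double selectivity;        // fraction of those values inside the query range, in [0, 1]
};

// Height of a B+ tree holding n entries when each node holds `fanout` entries.
int TreeHeight(double n, double fanout) {
    assert(fanout > 1.0);
    int h = 1;
    for (double capacity = fanout; capacity < n; capacity *= fanout) ++h;
    return h;
}

// Estimated read size (ERS) in bytes for one candidate, accumulated level by level.
double EstimateReadSize(const std::vector<LevelStats>& levels, double node_bytes,
                        double fanout) {
    double trees = 1.0;  // the top level has a single B+ tree
    double ers = 0.0;
    for (const auto& lv : levels) {
        double hits_per_tree = lv.distinct_per_tree * lv.selectivity;
        // Internal nodes on the descent plus the leaf nodes spanned by the range.
        double nodes_per_tree = (TreeHeight(lv.distinct_per_tree, fanout) - 1)
                                + std::max(1.0, std::ceil(hits_per_tree / fanout));
        ers += trees * nodes_per_tree * node_bytes;
        trees *= hits_per_tree;  // each valid entry points to one tree at the next level
    }
    return ers;
}

struct Candidate {
    std::string name;  // "A", "B" or "C"
    double ers;        // estimated read size
    bool needs_sort;   // true for scheme A, whose result must still be sorted
};

// Pick the smallest cost, applying ERS * (1 + avr_sort_ratio) where a sort is needed.
std::string ChooseScheme(const std::vector<Candidate>& cands, double avr_sort_ratio) {
    assert(!cands.empty());
    const Candidate* best = nullptr;
    double best_cost = 0.0;
    for (const auto& c : cands) {
        double cost = c.needs_sort ? c.ers * (1.0 + avr_sort_ratio) : c.ers;
        if (best == nullptr || cost < best_cost) {
            best = &c;
            best_cost = cost;
        }
    }
    return best->name;
}
```

With the hypothetical candidates A = 1000 (needs sort), B = 1300 and C = 1250, scheme A wins at avr_sort_ratio = 0.2 (adjusted cost about 1200) but scheme C wins at 0.4 (A rises to about 1400), which shows why the sort ratio is tracked dynamically.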
In an embodiment, the user directly submits the filter conditions on the selected keywords and the ordering conditions, and does not need to know whether scheme A, scheme B or scheme C is used.
In an embodiment, index selection is hidden from the user, and the most suitable index scheme is inferred automatically.
In an embodiment, the multi-level B+ tree, that is, the query model, can be built quickly with the computing resources of multiple compute nodes using a distributed index construction algorithm based on the MapReduce programming model. Apart from the top level, the B+ trees of the multi-level B+ tree are mutually independent, so tree construction can be split by the value of the first trace header keyword into multiple subtasks that run concurrently. Building the index concurrently on multiple nodes improves index construction efficiency and the user experience of interactive applications. The flow of the distributed index construction algorithm is shown in FIG. 4; the specific steps are as follows:
1. In the Map phase, the trace header data is divided evenly into M slices; each slice is one Map task, and the tasks are assigned randomly to the Map Workers;
2. On receiving a Map task, a Map Worker runs the Map function over its slice: it reads all trace headers in the slice and generates one key/value pair per seismic trace. The key is the first trace header keyword of the index; the Value field stores the values of the index's other trace header keywords, together with pointer information pointing to the seismic trace;
3. The Map Worker uses the partitioning function hash(key) mod R to divide its locally generated key/value pairs into R groups, each belonging to one Reduce task, and sends each group to the corresponding Reduce Worker;
4. After receiving the key/value pairs from all Map Workers, each Reduce Worker runs the Reduce function and builds a temporary tree locally. The top level of the temporary tree is organized as an ordered array, and the remaining levels have the same structure as in a complete multi-level B+ tree;
5. After all Reduce Workers have produced their temporary trees, the master process merges them; the structure of the temporary trees and the merge process are shown in FIG. 5. The top-level arrays of all temporary trees are combined into a new B+ tree, which becomes the top-level B+ tree of the multi-level B+ tree. Its leaf nodes store pointers to the locations of the lower-level subtrees; these pointers contain information such as file identifiers and storage offsets.
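A single-process simulation of this MapReduce build for a two-keyword index is sketched below. It only shows how data flows through Map, the hash(key) mod R partitioning, Reduce and the final merge. It assumes integer keyword values, uses `std::map` in place of the lower-level B+ trees and a sorted vector in place of the final top-level B+ tree, and omits real workers, file identifiers and offsets; all type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

struct Header { int64_t trace_no, key1, key2; };          // one trace header (two keywords)
struct MapValue { int64_t key2, trace_no; };              // "Value": other keywords + trace pointer
using KeyValue = std::pair<int64_t, MapValue>;            // Key = first index keyword
using SubTree = std::map<int64_t, std::vector<int64_t>>;  // key2 -> trace numbers
using TopLevel = std::vector<std::pair<int64_t, SubTree>>; // ordered by key1

// Map: one key/value pair per seismic trace in the slice.
std::vector<KeyValue> MapSlice(const std::vector<Header>& slice) {
    std::vector<KeyValue> out;
    for (const auto& h : slice) out.emplace_back(h.key1, MapValue{h.key2, h.trace_no});
    return out;
}

// Partitioning function: hash(key) mod R selects the Reduce task.
std::size_t Partition(int64_t key, std::size_t r) {
    return std::hash<int64_t>{}(key) % r;
}

// Reduce: build a temporary tree whose top level is an ordered array.
TopLevel Reduce(const std::vector<KeyValue>& pairs) {
    std::map<int64_t, SubTree> tmp;
    for (const auto& [key1, v] : pairs) tmp[key1][v.key2].push_back(v.trace_no);
    return TopLevel(tmp.begin(), tmp.end());
}

// Master: merge the top-level arrays. Each key1 reaches exactly one reducer,
// so the arrays are disjoint and only need to be combined in key order.
TopLevel Merge(std::vector<TopLevel> parts) {
    TopLevel merged;
    for (auto& part : parts)
        for (auto& entry : part) merged.push_back(std::move(entry));
    std::sort(merged.begin(), merged.end(),
              [](const auto& a, const auto& b) { return a.first < b.first; });
    return merged;
}

TopLevel BuildIndex(const std::vector<std::vector<Header>>& slices, std::size_t r) {
    assert(r > 0);
    std::vector<std::vector<KeyValue>> groups(r);  // one group per Reduce task
    for (const auto& slice : slices)               // each slice is one Map task
        for (const auto& kv : MapSlice(slice)) groups[Partition(kv.first, r)].push_back(kv);
    std::vector<TopLevel> parts;
    for (const auto& g : groups) parts.push_back(Reduce(g));
    return Merge(std::move(parts));
}
```

Because partitioning is by the first keyword, no two reducers ever build subtrees for the same key1 value, which is what makes the independent lower-level trees safe to build concurrently.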
In an embodiment, the most suitable query model must be selected when an application executes a query. If that query model does not yet exist, the distributed index builder is started to create the index. A system administrator configures the list of nodes available to the index builder, and the builder uses only a small fraction of each node's computing resources so that other applications are not affected.
In an embodiment, as shown in FIG. 6, the top-level B+ tree is first searched with the query condition of its trace header keyword to find the valid index records in its leaf nodes. The pointer information in those records identifies a set of lower-level sub-B+ trees, which are searched in turn with the query condition of their own trace header keyword to find the pointer information of their valid index records. If the trees pointed to are not at the lowest level, the search continues with the next set of lower-level sub-B+ trees; otherwise, the lowest-level sub-B+ trees are searched with the query condition of the corresponding keyword, the seismic data for all final valid index entries is located, and the query ends.
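The level-by-level search can be sketched generically for any number of levels. The sketch assumes integer keyword values and one inclusive range per level (given in index order), and again uses `std::map` as a stand-in for each B+ tree; `LevelTree`, `Insert` and `Search` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// One B+ tree of the multi-level index (std::map as a stand-in). Leaf entries of
// non-bottom trees point to a subtree one level down; bottom trees hold trace numbers.
struct LevelTree {
    std::map<int64_t, std::unique_ptr<LevelTree>> children;  // used above the bottom level
    std::map<int64_t, std::vector<int64_t>> traces;          // used at the bottom level
};

// Insert one trace whose trace header keyword values are given in index order.
void Insert(LevelTree& root, const std::vector<int64_t>& keys, int64_t trace_no) {
    assert(!keys.empty());
    LevelTree* t = &root;
    for (std::size_t i = 0; i + 1 < keys.size(); ++i) {
        auto& child = t->children[keys[i]];
        if (!child) child = std::make_unique<LevelTree>();
        t = child.get();
    }
    t->traces[keys.back()].push_back(trace_no);
}

// Search the current set of trees with the range for their level, then move to the
// subtrees pointed to by the valid entries, until the bottom level is reached.
std::vector<int64_t> Search(const LevelTree& root,
                            const std::vector<std::pair<int64_t, int64_t>>& ranges) {
    assert(!ranges.empty());
    std::vector<const LevelTree*> current = {&root};
    for (std::size_t level = 0; level + 1 < ranges.size(); ++level) {
        const auto [lo, hi] = ranges[level];
        std::vector<const LevelTree*> next;
        for (const LevelTree* t : current)
            for (auto it = t->children.lower_bound(lo);
                 it != t->children.end() && it->first <= hi; ++it)
                next.push_back(it->second.get());
        current = std::move(next);
    }
    std::vector<int64_t> result;  // bottom level: collect the pointed-to traces
    const auto [lo, hi] = ranges.back();
    for (const LevelTree* t : current)
        for (auto it = t->traces.lower_bound(lo); it != t->traces.end() && it->first <= hi; ++it)
            result.insert(result.end(), it->second.begin(), it->second.end());
    return result;
}
```

Each level narrows the set of subtrees before the next keyword is searched, which is the property the description credits for avoiding redundant reads.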
In the embodiment, the index is constructed through the IndexBTreeWriter class, as shown in fig. 7. The specific steps are as follows:
1. Call the constructor IndexBTreeWriter(const IndexAttr &index_attr, headInfo head_info) to create an IndexBTreeWriter object;
2. Based on the filtering and sorting conditions of the query, call OpenWrite(int64_t diff_key_num) to estimate the total number of index entries that would have to be read under each of three index selection schemes, and then select the index scheme best suited to the query from scheme A, scheme B and scheme C;
3. Call WriteOneIndex(const IndexElement &index) repeatedly to add the index entry generated for each seismic trace according to the query, until all index entries have been built.
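The scheme selection in step 2 and the per-trace writes in step 3 can be illustrated with a small Python analogue. The patent does not say what schemes A, B and C are or how the estimate is computed, so both are assumptions here: each scheme is taken to be a keyword order, and the number of index entries read is estimated with a simple product model that assumes every combination of key values exists. IndexWriterSketch and its methods are hypothetical and only loosely mirror IndexBTreeWriter, OpenWrite and WriteOneIndex.

```python
def estimate_entries_read(order, matched):
    """Hypothetical cost model: matched[kw] is the number of distinct values of kw
    satisfying the query; assuming every key combination exists, level i touches
    the product of the matched counts of levels 1..i."""
    total, parents = 0, 1
    for keyword in order:
        parents *= matched[keyword]  # entries matching at this level
        total += parents
    return total


class IndexWriterSketch:
    """Loose Python analogue of IndexBTreeWriter; not the patent's class."""

    def __init__(self, schemes):
        self.schemes = dict(schemes)  # scheme name -> keyword order (assumed)
        self.chosen = None
        self._entries = []

    def open_write(self, matched):
        """Estimate entries read under each scheme and keep the cheapest one."""
        costs = {name: estimate_entries_read(order, matched)
                 for name, order in self.schemes.items()}
        self.chosen = min(costs, key=costs.get)
        return costs

    def write_one_index(self, header):
        """Add one trace's index entry, keyed in the chosen scheme's keyword order."""
        if self.chosen is None:
            raise RuntimeError("call open_write() first")
        key = tuple(header[kw] for kw in self.schemes[self.chosen])
        self._entries.append((key, header["trace_id"]))

    def close(self):
        """Return the finished entries sorted by composite key."""
        return sorted(self._entries)


schemes = {"A": ("shot", "trace", "cdp"),
           "B": ("trace", "shot", "cdp"),
           "C": ("cdp", "shot", "trace")}
writer = IndexWriterSketch(schemes)
costs = writer.open_write({"shot": 2, "trace": 10, "cdp": 40})
for header in [{"trace_id": 0, "shot": 7, "trace": 2, "cdp": 31},
               {"trace_id": 1, "shot": 5, "trace": 9, "cdp": 12},
               {"trace_id": 2, "shot": 5, "trace": 1, "cdp": 30}]:
    writer.write_one_index(header)
entries = writer.close()
```

Under this toy model the estimate favors placing the keyword with the fewest matching values at the top level, which is why scheme A is selected in the example; a real estimator would need statistics about the actual key distribution.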
In an embodiment, the multi-dimensional range query is implemented through the IndexBTreeReader class, as shown in fig. 8. The specific steps are as follows:
1. Call the constructor IndexBTreeReader(const IndexAttr &index_attr, headInfo head_info) to create an IndexBTreeReader object;
2. Call GetValidTraces(const RowFilterByKey &row_filter, std::vector<int64_t> *valid_trace_ar) to query the existing multi-level B+ tree index and record every trace that satisfies the filtering condition in the vector pointed to by valid_trace_ar.
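A hypothetical Python analogue of the GetValidTraces call is sketched below. For brevity it reads a flat sorted list of composite keys instead of the multi-level B+ tree, so only the leading keyword narrows the scan and the remaining keywords are checked entry by entry. IndexReaderSketch, in_range and FULL_RANGE are illustrative names, and the dictionary of (lo, hi) ranges stands in for RowFilterByKey, whose real fields the text does not describe.

```python
import bisect

FULL_RANGE = (float("-inf"), float("inf"))  # used when a keyword has no condition


def in_range(value, bounds):
    """True if value lies in the closed interval bounds = (lo, hi)."""
    lo, hi = bounds
    return lo <= value <= hi


class IndexReaderSketch:
    """Hypothetical Python analogue of IndexBTreeReader over a flat sorted list of
    (composite_key, trace_id) entries; not the patent's actual class."""

    def __init__(self, order, entries):
        self.order = tuple(order)  # keyword order of the composite key
        self.entries = sorted(entries)
        self._leading = [key[0] for key, _ in self.entries]

    def get_valid_traces(self, row_filter, valid_trace_ar):
        """Append ids of traces matching every (lo, hi) range in row_filter to
        valid_trace_ar, mirroring the output-vector parameter of the C++ call."""
        lo, hi = row_filter.get(self.order[0], FULL_RANGE)
        # Binary search on the leading keyword narrows the candidate slice.
        start = bisect.bisect_left(self._leading, lo)
        end = bisect.bisect_right(self._leading, hi)
        for key, trace_id in self.entries[start:end]:
            # The remaining keywords are checked entry by entry.
            if all(in_range(v, row_filter.get(kw, FULL_RANGE))
                   for kw, v in zip(self.order[1:], key[1:])):
                valid_trace_ar.append(trace_id)


reader = IndexReaderSketch(("shot", "trace"),
                           [((5, 1), 2), ((5, 9), 1), ((7, 2), 0), ((6, 4), 3)])
valid_trace_ar = []
reader.get_valid_traces({"shot": (5, 6), "trace": (2, 10)}, valid_trace_ar)
```

The flat layout makes the cost of a poorly chosen leading keyword visible: any condition on a non-leading keyword cannot shrink the scanned slice, which is the redundancy that the per-level independent B+ trees are meant to avoid.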
In the embodiment, each level of the multi-level B+ tree contains a plurality of mutually independent B+ trees. This structure ensures that, during a query, the B+ trees at each level can be searched with the query condition of that level's track head keyword, which reduces access to redundant data and thus yields higher query performance.
Based on the same inventive concept, an embodiment of the invention also provides a multi-dimensional range query device for seismic data, as described in the following embodiment. Because the device solves the problem on a principle similar to that of the multi-dimensional range query method for seismic data, the implementation of the device may refer to the implementation of the method, and repeated details are omitted.
FIG. 9 is a block diagram of a multi-dimensional range query device for seismic data according to an embodiment of the invention. As shown in FIG. 9, the device includes:
The data acquisition module 901 is configured to acquire a set of track head keywords of the seismic data and the query condition data corresponding to each track head keyword in the set;
The pointer determining module 902 is configured to determine one or more pointers corresponding to the set of track head keywords according to query condition data corresponding to each track head keyword in the set of track head keywords and a pre-established query model, where the one or more pointers carry pointer information corresponding to each pointer, and the query model is pre-established according to a plurality of historical track head keywords and query sequence data of the seismic data;
The data query module 903 is configured to query the seismic data according to the one or more pointers corresponding to the set of track head keywords.
In one embodiment, the track head keywords comprise any combination of shot coordinates, receiver point coordinates, sampling points, shot numbers and track numbers.
In one embodiment, the query sequence data is obtained as follows:
determining an estimate of the amount of data to be read for each preset ordering scheme in a plurality of preset ordering schemes;
and obtaining the query sequence data according to the estimated amount of data to be read for each preset ordering scheme.
In one embodiment, pre-establishing the query model according to the plurality of historical track head keywords and the query sequence data of the seismic data comprises: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each of the plurality of historical track head keywords of the seismic data.
In summary, according to the embodiment of the invention, a set of track head keywords of the seismic data and the query condition data corresponding to each track head keyword in the set are acquired; one or more pointers corresponding to the set of track head keywords are determined according to that query condition data and a pre-established query model, where the one or more pointers carry pointer information corresponding to each pointer and the query model is pre-established according to a plurality of historical track head keywords and query sequence data of the seismic data; and the seismic data is queried according to the one or more pointers. Because the seismic data is accessed only through the pointers determined in this way, the scanning range of the query is effectively reduced, fast multi-dimensional range queries on seismic data become possible, access to large amounts of redundant data is avoided during the query, query efficiency is improved, and the user experience is improved.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments has been provided to illustrate the general principles of the invention. These are merely particular embodiments and are not meant to limit the scope of the invention; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.
Claims (6)
1. A method for multi-dimensional range query of seismic data, comprising:
acquiring a set of track head keywords of the seismic data and query condition data corresponding to each track head keyword in the set;
Determining one or more pointers corresponding to the set of track head keywords according to query condition data corresponding to each track head keyword in the set of track head keywords and a pre-established query model, wherein the one or more pointers carry pointer information corresponding to each pointer, and the query model is pre-established according to a plurality of historical track head keywords and query sequence data of the seismic data;
querying the seismic data according to the one or more pointers corresponding to the set of track head keywords;
the query sequence data is obtained as follows:
determining an estimate of the amount of data to be read for each preset ordering scheme in a plurality of preset ordering schemes;
obtaining the query sequence data according to the estimated amount of data to be read for each preset ordering scheme;
pre-establishing the query model according to the plurality of historical track head keywords and the query sequence data of the seismic data comprises: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each historical track head keyword in the plurality of historical track head keywords of the seismic data.
2. The method of claim 1, wherein the track head keywords comprise any combination of shot coordinates, receiver point coordinates, sampling points, shot numbers and track numbers.
3. A multi-dimensional range query device for seismic data, comprising:
the data acquisition module is used for acquiring a set of track head keywords of the seismic data and query condition data corresponding to each track head keyword in the set;
The pointer determining module is used for determining one or more pointers corresponding to the set of track head keywords according to query condition data corresponding to each track head keyword in the set of track head keywords and a pre-established query model, wherein the one or more pointers carry pointer information corresponding to each pointer, and the query model is pre-established according to a plurality of historical track head keywords and query sequence data of the seismic data;
The data query module is used for querying the seismic data according to one or more pointers corresponding to the group of track head keywords;
the query sequence data is obtained as follows:
determining an estimate of the amount of data to be read for each preset ordering scheme in a plurality of preset ordering schemes;
obtaining the query sequence data according to the estimated amount of data to be read for each preset ordering scheme;
pre-establishing the query model according to the plurality of historical track head keywords and the query sequence data of the seismic data comprises: pre-establishing the query model according to the query sequence data and one or more B+ trees corresponding to each historical track head keyword in the plurality of historical track head keywords of the seismic data.
4. The apparatus of claim 3, wherein the track head keywords comprise any combination of shot coordinates, receiver point coordinates, sampling points, shot numbers and track numbers.
5. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 2 when executing the computer program.
6. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program which, when executed by a processor, implements the method of any of claims 1 to 2.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911181069.8A CN112860734B (en) | 2019-11-27 | 2019-11-27 | Multi-dimensional range query method and device for seismic data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112860734A (en) | 2021-05-28 |
CN112860734B (en) | 2024-08-27 |
Family ID: 75985537
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112860734B (en) |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |