CN117762920A - A Bitmap-based multi-table association query method and device - Google Patents
- Publication number
- CN117762920A; CN202311526239.8A
- Authority
- CN
- China
- Prior art keywords
- data
- bitmap
- intersection operation
- field
- perform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a Bitmap-based multi-table association query method and device. The method comprises the following steps: performing multi-threaded parallel full-text retrieval on multiple indexes and returning several fields from each index; converting the returned fields into cached data; performing an intersection operation on the cached data in memory using a bitmap (Bitmap) data structure to obtain intersection data; re-sorting the intersection data to obtain sorted data; and aggregating the sorted data and re-querying to obtain the complete search data. The method and device do not rely on external storage: complex computations such as multi-table query, association, aggregation, and sorting are performed in memory, and the efficient bitmap implementation RoaringBitmap is used to optimize association query speed and resource usage, so that query, association, and aggregation over tens of millions of records complete within 1 s.
Description
Technical Field
The present invention relates to the technical field of information search and query, and to a method that uses Bitmap to speed up complex multi-table association queries in Elasticsearch. Specifically, it relates to a Bitmap-based multi-table association query method, device, equipment, and computer-readable storage medium.
Background
The description of the background art herein concerns related art associated with the present invention. It is provided only to explain and facilitate understanding of the content of the invention, and should not be understood as an explicit admission or a presumption by the applicant that it constitutes prior art as of the filing date on which this application was first filed.
Elasticsearch (hereinafter ES) is a distributed search service based on Lucene. For full-text search it can answer queries within seconds over hundreds of millions of records, several times faster than relational databases such as MySQL. It is also widely used in scenarios such as log analysis, operations monitoring, and security analysis.
For multi-table association queries, however, ES is less efficient than a relational database, because the association queries ES offers are based mainly on two document types: nested documents and parent-child documents. For the nested type, ES creates a separate document for each nested field, so the more nested fields there are, the more efficiency suffers. Parent-child documents are stored separately, but maintaining the relationship also takes up some memory, and such queries consume more resources than nested queries. Either approach can make retrieval several times to several hundred times slower. Another way to run association queries on ES is to flatten the associated fields during data modeling into one large wide table for retrieval, but this stores a large number of redundant fields and also degrades query performance to some extent.
To solve the above technical problems, the present invention proposes a Bitmap-based multi-table association query method, device, equipment, and computer-readable storage medium, offering a new approach to complex queries under ES multi-table association. It does not rely on external storage: complex computations such as multi-table query, association, aggregation, and sorting are performed in memory, and the efficient bitmap implementation RoaringBitmap is used to optimize association query speed and resource usage, so that query, association, and aggregation over tens of millions of records complete within 1 s.
Summary of the Invention
The present invention provides a Bitmap-based multi-table association query method, device, equipment, and computer-readable storage medium, offering a new approach to complex queries under ES multi-table association. It does not rely on external storage: complex computations such as multi-table query, association, aggregation, and sorting are performed in memory, and the efficient bitmap implementation RoaringBitmap is used to optimize association query speed and resource usage, so that query, association, and aggregation over tens of millions of records complete within 1 s.
An embodiment of the first aspect of the present invention provides a Bitmap-based multi-table association query method comprising the following steps: performing multi-threaded parallel full-text retrieval on multiple indexes and returning several fields from each index; converting the returned fields into cached data; performing an intersection operation on the cached data in memory using a bitmap (Bitmap) data structure to obtain intersection data; re-sorting the intersection data to obtain sorted data; and aggregating the sorted data and re-querying to obtain the complete search data.
Preferably, the several fields include an _id field, a pid field, a vid field, and a score field. The _id field is used to query Elasticsearch for the complete information data; the pid field is the aggregation field used to return records; the vid field is the association field across the indexes and is used for the intersection operation; and the score field is used to re-sort the intersection data into sorted data.
Preferably, in the step of performing multi-threaded parallel full-text retrieval on multiple indexes and returning several fields from each index, the Elasticsearch index shards are used and a Scroll query is executed on each shard in parallel.
Preferably, the step of converting the returned fields into cached data comprises the following sub-steps: caching the returned fields of each index with vid as the key; and extracting the vid of each record and storing it in an integer array.
Preferably, the step of performing an intersection operation on the cached data in memory using the Bitmap data structure comprises the following sub-steps: expressing the values of the vid integer array in hexadecimal; dividing the storage structure into large buckets and small buckets, each large bucket containing several small buckets; storing the high 16 bits of each hexadecimal vid value in a large bucket; storing the low 16 bits of each hexadecimal vid value in a small bucket under the corresponding large bucket; and performing the intersection operation.
Preferably, the step of re-sorting the intersection data to obtain sorted data comprises the following sub-steps: starting from the original Elasticsearch scores, assigning a weight to each index according to the business scenario; using the intersection data to fetch the original scores from the cache; and re-sorting by the weighted sum.
Preferably, the step of aggregating the sorted data and re-querying to obtain the complete search data comprises the following sub-steps: aggregating by pid while returning the _id fields that satisfy the conditions according to the paging strategy; and querying Elasticsearch again by the _id fields to obtain the complete search data.
An embodiment of the second aspect of the present invention further provides a Bitmap-based multi-table association query device, comprising: a field generation module configured to perform multi-threaded parallel full-text retrieval on multiple indexes and return several fields from each index; a data cache module configured to convert the returned fields into cached data; an intersection module configured to perform an intersection operation on the cached data in memory using a bitmap (Bitmap) data structure to obtain intersection data; a data sorting module configured to re-sort the intersection data to obtain sorted data; and a re-query module configured to aggregate the sorted data and re-query to obtain the complete search data.
An embodiment of the third aspect of the present invention further provides Bitmap-based multi-table association query equipment comprising a memory and a processor, wherein the memory stores executable program code and the processor reads the executable program code stored in the memory to execute the Bitmap-based multi-table association query method.
An embodiment of the fourth aspect of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the Bitmap-based multi-table association query method.
The Bitmap-based multi-table association query method, device, equipment, and computer-readable storage medium provided by the present invention offer a new approach to complex queries under ES multi-table association. They do not rely on external storage: complex computations such as multi-table query, association, aggregation, and sorting are performed in memory, and the efficient bitmap implementation RoaringBitmap is used to optimize association query speed and resource usage, so that query, association, and aggregation over tens of millions of records complete within 1 s.
Additional aspects and advantages of the present invention will become apparent from the description below or may be learned through practice of the invention.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the drawings, in which:
Figure 1 shows a flow chart of the Bitmap-based multi-table association query method according to an embodiment of the present invention;
Figure 2 shows the Bitmap storage structure of the Bitmap-based multi-table association query method according to an embodiment of the present invention;
Figure 3 shows the RoaringBitmap storage structure of the Bitmap-based multi-table association query method according to an embodiment of the present invention;
Figure 4 is a structural diagram of an embodiment of the Bitmap-based multi-table association query equipment of this specification;
Figure 5 is a structural diagram of an embodiment of the computer-readable storage medium for the Bitmap-based multi-table association query method of this specification.
Detailed Description
To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another.
Many specific details are set forth in the following description to provide a thorough understanding of the present invention. The invention may, however, also be implemented in ways other than those described here, so its scope of protection is not limited by the specific embodiments disclosed below.
The following discussion provides several embodiments of the present invention. Although each embodiment represents a single combination of the invention, different embodiments may be substituted for one another or combined, so the invention is also considered to include all possible combinations of the same and/or different embodiments described. Thus, if one embodiment contains A, B, and C and another contains a combination of B and D, the invention should also be regarded as including embodiments containing every other possible combination of one or more of A, B, C, and D, even if such an embodiment is not explicitly recorded in the text below.
Figure 1 shows a flow chart of the Bitmap-based multi-table association query method according to an embodiment of the present invention. As shown in Figure 1, the method comprises the following steps:
Step S01: perform multi-threaded parallel full-text retrieval on multiple indexes and return several fields from each index. The data resides in multiple ES indexes. The indexes are searched in parallel by multiple threads, and several fields are returned from each index, including an _id field, a pid field, a vid field, and a score field. The _id field is used to query Elasticsearch for the complete information data; the pid field is the aggregation field used to return records; the vid field is the association field across the indexes and is used for the intersection operation; and the score field is used to re-sort the intersection data into sorted data.
Step S02: convert the returned fields into cached data. In this step, the returned fields of each index are cached with vid as the key, so that the data can be looked up again after the intersection has been computed; the vid of each record is extracted and stored in an integer array for the subsequent intersection.
Step S03: perform an intersection operation on the cached data in memory using a bitmap (Bitmap) data structure to obtain intersection data.
Step S04: re-sort the intersection data to obtain sorted data. In this step, starting from the original Elasticsearch scores, a weight is assigned to each index according to the business scenario; the intersection data is used to fetch the original scores from the cache; and the results are re-sorted by the weighted sum.
Step S05: aggregate the sorted data and re-query to obtain the complete search data. In this step, the data is aggregated by pid while the _id fields that satisfy the conditions are returned according to the paging strategy; Elasticsearch is then queried again by the _id fields to obtain the complete search data.
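Step S02 can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names Hit and HitCache are hypothetical, pid and vid are assumed to fit in an int, and a list is kept per vid on the assumption that one index may return several records for the same vid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder for the four fields returned by one index hit.
final class Hit {
    final String id;    // the _id field, used later to re-query the full document
    final int pid;      // aggregation key
    final int vid;      // association key shared by the indexes
    final float score;  // original relevance score

    Hit(String id, int pid, int vid, float score) {
        this.id = id;
        this.pid = pid;
        this.vid = vid;
        this.score = score;
    }
}

// Hypothetical cache for the hits of one index.
final class HitCache {
    // vid -> hits of this index; kept for look-ups after the intersection.
    final Map<Integer, List<Hit>> byVid = new HashMap<>();

    HitCache(List<Hit> hits) {
        for (Hit h : hits) {
            byVid.computeIfAbsent(h.vid, k -> new ArrayList<>()).add(h);
        }
    }

    // Distinct vids as a sorted primitive array, ready to be loaded into a bitmap.
    int[] vids() {
        return byVid.keySet().stream().mapToInt(Integer::intValue).sorted().toArray();
    }
}
```

One HitCache would be built per index; the array returned by vids() is what a later step loads into a bitmap, and byVid is what is consulted once the intersection is known.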
In an embodiment of the Bitmap-based multi-table association query method provided by the present invention, the step of performing multi-threaded parallel full-text retrieval on multiple indexes and returning several fields from each index uses the Elasticsearch index shards and executes a Scroll query on each shard in parallel. When ES is queried in this step, all matching documents must be returned without paging. Ordinary queries are subject to a result-window limit, so the ES Scroll query is used instead. A Scroll query over a large data volume takes a long time; to solve this, the present invention makes full use of the ES index shards and executes a Scroll query on each shard in parallel, which greatly reduces query time: with 10 shards, tens of millions of records can be returned in 500 ms.
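The fan-out described above can be sketched with the standard Java concurrency utilities. This is only an illustration of the threading pattern: the class name ShardFanOut is hypothetical, and the fetchShard parameter is a stand-in for whatever code drains the Scroll query of one shard. No Elasticsearch client calls are shown, because the source does not give them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

final class ShardFanOut {
    // Runs fetchShard once per shard, each on its own thread, and concatenates
    // the results in shard order. fetchShard is a hypothetical stand-in for
    // draining the Scroll query of a single shard.
    static <T> List<T> fetchAll(int shardCount, IntFunction<List<T>> fetchShard) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shardCount));
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (int shard = 0; shard < shardCount; shard++) {
                final int s = shard;
                Callable<List<T>> task = () -> fetchShard.apply(s);
                futures.add(pool.submit(task));
            }
            List<T> all = new ArrayList<>();
            for (Future<List<T>> f : futures) {
                try {
                    all.addAll(f.get()); // blocks until that shard has finished
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException("shard fetch failed", e);
                }
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }
}
```

Total latency in this pattern is bounded by the slowest shard rather than by the sum over shards, which is the effect the paragraph above relies on.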
In an embodiment of the Bitmap-based multi-table association query method provided by the present invention, the step of performing an intersection operation on the cached data in memory using the Bitmap data structure comprises the following sub-steps: expressing the values of the vid integer array in hexadecimal; dividing the storage structure into large buckets and small buckets, each large bucket containing several small buckets; storing the high 16 bits of each hexadecimal vid value in a large bucket; storing the low 16 bits of each hexadecimal vid value in a small bucket under the corresponding large bucket; and performing the intersection operation.
Performing an intersection over tens of millions of records in memory may take more than a day with an ordinary data structure, so the present invention uses the bitmap (Bitmap) data structure for the intersection. A Bitmap uses one bit to store a state: each bit position represents one number, with 0 meaning absent and 1 meaning present. For example, to store the number 1, the bit at index 1 of the bit array is set to 1 and the bits at the other indexes remain 0. Figure 2 shows the Bitmap storage structure of the Bitmap-based multi-table association query method according to an embodiment of the present invention; as shown in Figure 2, setting the bits at positions 1, 2, 4, and 6 among positions 0 to 7 represents the set {1, 2, 4, 6}. This kind of storage is well suited to testing whether a number belongs to a set or whether an array contains duplicates, and it also saves storage space. In Java, an int occupies 4 bytes (1 byte = 8 bits). Storing 2 billion numbers as int values takes about 2000000000 × 4 / 1024 / 1024 / 1024 ≈ 7.45 GB, whereas storing them bit by bit takes 2 billion bits, about 2000000000 / 8 / 1024 / 1024 / 1024 ≈ 0.233 GB.
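The plain-bitmap idea and the memory arithmetic above can be reproduced with java.util.BitSet from the Java standard library. This is a small illustrative sketch: the class name PlainBitmapDemo and the second set {2, 3, 6, 7} are made up for the example and do not come from the patent.

```java
import java.util.BitSet;

final class PlainBitmapDemo {
    // Builds a bitmap in which bit i is set when i is a member of the set.
    static BitSet of(int... values) {
        BitSet bits = new BitSet();
        for (int v : values) {
            bits.set(v);
        }
        return bits;
    }

    // Returns a new bitmap holding the members common to a and b.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    // Size in GiB (1024^3 bytes) of n values stored with bitsPerValue bits each.
    static double gib(long n, int bitsPerValue) {
        return n * (double) bitsPerValue / 8 / 1024 / 1024 / 1024;
    }

    public static void main(String[] args) {
        BitSet a = of(1, 2, 4, 6);
        BitSet b = of(2, 3, 6, 7);
        System.out.println(a);               // {1, 2, 4, 6}
        System.out.println(intersect(a, b)); // {2, 6}
        // Storage for 2 billion values as 32-bit ints versus one bit each.
        System.out.println(gib(2_000_000_000L, 32));
        System.out.println(gib(2_000_000_000L, 1));
    }
}
```

The intersection is a word-by-word AND over the underlying bit array, which is why it is so much cheaper than comparing lists of integers element by element.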
This structure also has a drawback. For example, to store the two values 10 and 99999999, a Bitmap of length 100000000 must be allocated even though only two values are actually stored, which wastes space. To avoid the space wasted by such sparse data, the present invention uses the efficient compressed bitmap RoaringBitmap. Figure 3 shows the RoaringBitmap storage structure of the Bitmap-based multi-table association query method according to an embodiment of the present invention. As shown in Figure 3, RoaringBitmap uses bucketing: an integer is written in hexadecimal, its high 16 bits are stored as the key in a large bucket, each large bucket contains several small buckets, and the low 16 bits are stored in the small buckets. Each small bucket can hold 2^16 (65536) values, so numbers with the same high 16 bits are stored in the same bucket and no extra large-bucket space needs to be allocated. This layout also computes intersections and unions faster than a plain Bitmap: once the key has located the small bucket, only the data in that bucket needs to be processed rather than the whole data set as with a plain Bitmap, which amounts to using a sorted first-level index to speed up the operation.
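The bucketing idea can be illustrated with a deliberately simplified two-level structure. This sketch is not the RoaringBitmap implementation, which uses several specialised container types; it assumes non-negative int values, uses a hypothetical class name TwoLevelBitmap, and maps each high-16-bit key to a single java.util.BitSet over the low 16 bits.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Simplified two-level bitmap: NOT the real RoaringBitmap, only the bucketing idea.
final class TwoLevelBitmap {
    // Key: high 16 bits of a value. Value: bitmap over the low 16 bits (at most 65536 positions).
    private final TreeMap<Integer, BitSet> buckets = new TreeMap<>();

    static int high(int v) { return v >>> 16; }
    static int low(int v)  { return v & 0xFFFF; }

    void add(int v) {
        buckets.computeIfAbsent(high(v), k -> new BitSet()).set(low(v));
    }

    boolean contains(int v) {
        BitSet b = buckets.get(high(v));
        return b != null && b.get(low(v));
    }

    int bucketCount() { return buckets.size(); }

    // Intersects bucket by bucket; keys present on only one side are skipped entirely.
    TwoLevelBitmap and(TwoLevelBitmap other) {
        TwoLevelBitmap out = new TwoLevelBitmap();
        for (Map.Entry<Integer, BitSet> e : buckets.entrySet()) {
            BitSet theirs = other.buckets.get(e.getKey());
            if (theirs == null) continue;
            BitSet both = (BitSet) e.getValue().clone();
            both.and(theirs);
            if (!both.isEmpty()) out.buckets.put(e.getKey(), both);
        }
        return out;
    }
}
```

With this layout, the values 10 and 99999999 occupy two small buckets, under keys 0 and 1525 (0x05F5), instead of one flat bitmap of 100000000 bits, and an intersection only touches buckets whose keys appear on both sides.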
In a specific embodiment, the Bitmap-based multi-table association query method disclosed by the present invention can be used for search on a medical research platform, for example, finding all patients who have had uterine fibroids or have used metformin, together with the first three visit records of each of these patients. In the original data these keywords and results are spread over three tables: diagnosis, medication, and patient. The conventional approach to this search scenario is to use data modeling to join all the tables on patient ID (pid) and visit ID (vid) into one large wide table, run a full-text search for keywords such as uterine fibroids or metformin on it, aggregate the matching documents by pid, and use the ES top_hits aggregation to limit each aggregation bucket to three records, that is, three visit records. With 1,000 fields and 100,000 records, this retrieval takes more than 10 s. The present invention does not build a wide table. Instead, it runs the keyword full-text search on the individual tables, returns only pid, vid, and score from each table, and performs intersection, aggregation, score weighting, sorting by score, and the other steps in memory. The process is executed as follows:
Step T01: parallel full-text retrieval. The medical data resides in nine ES indexes. The nine indexes are searched in parallel by multiple threads, and the four fields _id, pid, vid, and score are returned from each index. The _id field is used, once the series of computations is complete, to query ES for the complete patient information; pid serves as the aggregation field for returning three visit records per patient; vid is the association field across the nine indexes and is used for the intersection; and score is used for the custom score calculation and sorting logic.
Step T02: data cache conversion. The four fields returned from each index are cached with vid as the key, so that the data can be looked up again after the intersection has been computed; the vid of each record is extracted and stored in an integer array for the subsequent intersection. Both operations run in multiple threads; with 8 threads, tens of millions of records can be processed in 300 ms.
Step T03: intersection. Write the nine vid integer arrays from step T02 into nine RoaringBitmaps and intersect them. The implementation uses org.roaringbitmap.RoaringBitmap, which itself provides insertion, intersection, and union and is convenient to use. With RoaringBitmap, intersecting tens of millions of records takes only 140 ms. The code using org.roaringbitmap.RoaringBitmap is as follows:
import java.util.List;
import com.google.common.collect.Lists;
import org.roaringbitmap.RoaringBitmap;

// Build one RoaringBitmap per index. vidArrays (assumed name) holds the int[] of vids
// produced for each index in step T02; LogicOperator is the enum used by the source.
List<RoaringBitmap> roaringBitmaps = Lists.newArrayList();
for (int[] vidArr : vidArrays) {
    RoaringBitmap bitMap = new RoaringBitmap();
    bitMap.add(vidArr);
    roaringBitmaps.add(bitMap);
}
// Clone the first bitmap so that the in-place operations below leave it unchanged.
RoaringBitmap mixedVids = roaringBitmaps.get(0).clone();
for (int i = 1; i < roaringBitmaps.size(); i++) {
    if (logicOperator == LogicOperator.AND) {
        mixedVids.and(roaringBitmaps.get(i));     // intersection
    } else if (logicOperator == LogicOperator.OR) {
        mixedVids.or(roaringBitmaps.get(i));      // union
    } else {
        mixedVids.andNot(roaringBitmaps.get(i));  // difference
    }
}
Step T04: weighted score sorting. Starting from the original ES scores, a weight is assigned to each index according to the business scenario; the mixedVids obtained from the intersection are used to fetch the original scores from the cache; and the results are re-sorted by the weighted sum.
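A minimal sketch of this weighted re-ranking follows. The class names, the map-per-index score cache, and the rule that an index with no score for a vid contributes zero are all assumptions made for illustration; the source does not specify the weights or how missing scores are handled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class WeightedRanker {
    // One ranked entry: a vid and its combined score.
    static final class Ranked {
        final int vid;
        final double score;
        Ranked(int vid, double score) { this.vid = vid; this.score = score; }
    }

    // scoresByIndex.get(i): vid -> original score cached for index i.
    // weights[i]: business-defined weight of index i (hypothetical values).
    static List<Ranked> rank(int[] mixedVids,
                             List<Map<Integer, Float>> scoresByIndex,
                             double[] weights) {
        List<Ranked> out = new ArrayList<>();
        for (int vid : mixedVids) {
            double total = 0;
            for (int i = 0; i < scoresByIndex.size(); i++) {
                Float s = scoresByIndex.get(i).get(vid);
                if (s != null) total += weights[i] * s; // assumed: a missing score adds 0
            }
            out.add(new Ranked(vid, total));
        }
        // Highest combined score first.
        out.sort(Comparator.comparingDouble((Ranked r) -> r.score).reversed());
        return out;
    }
}
```

For example, with weights 1.0 and 0.5, a vid scoring 2.0 and 1.0 in the two indexes gets 2.5, while a vid scoring 1.0 and 4.0 gets 3.0 and is ranked first.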
Step T05: aggregation. All the data is aggregated by pid while the _id values that satisfy the conditions are returned according to the paging strategy; finally, ES is queried again by _id to obtain the complete search data.
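The aggregation and paging step might look like the sketch below. The paging strategy is not described in the source, so this example assumes pages are counted in patients (pid groups), that the input is already sorted by score, and that at most perPid records are kept per patient, matching the three-visit example. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PidAggregator {
    // Hypothetical row: one hit that survived the intersection, already sorted by score.
    static final class Row {
        final String id; // the _id used for the final re-query
        final int pid;   // aggregation key
        Row(String id, int pid) { this.id = id; this.pid = pid; }
    }

    // Groups rows by pid in first-seen (best-score) order, keeps at most perPid rows
    // per group, and returns the _id values of the groups on the requested page
    // (page is zero-based, pageSize is counted in pid groups).
    static List<String> pageIds(List<Row> sortedRows, int perPid, int page, int pageSize) {
        Map<Integer, List<String>> groups = new LinkedHashMap<>();
        for (Row r : sortedRows) {
            List<String> ids = groups.computeIfAbsent(r.pid, k -> new ArrayList<>());
            if (ids.size() < perPid) ids.add(r.id);
        }
        List<String> out = new ArrayList<>();
        int index = 0;
        for (List<String> ids : groups.values()) {
            if (index >= page * pageSize && index < (page + 1) * pageSize) out.addAll(ids);
            index++;
        }
        return out;
    }
}
```

The returned _id values are what would be sent back to ES in the final re-query, so only one page of full documents is ever fetched.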
A Bitmap-based multi-table association query device provided by an embodiment of the present invention comprises: a field generation module configured to perform multi-threaded parallel full-text retrieval on multiple indexes and return several fields from each index; a data cache module configured to convert the returned fields into cached data; an intersection module configured to perform an intersection operation on the cached data in memory using a bitmap (Bitmap) data structure to obtain intersection data; a data sorting module configured to re-sort the intersection data to obtain sorted data; and a re-query module configured to aggregate the sorted data and re-query to obtain the complete search data.
图4是本说明书基于Bitmap的多表关联查询设备的一个实施例的结构图。下面参考图4,其示出了适于用来实现本公开实施例的基于Bitmap的多表关联查询设备300的结构示意图。本公开实施例中的电子设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图4示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。Figure 4 is a structural diagram of an embodiment of the multi-table association query device based on Bitmap in this specification. Referring now to FIG. 4 , which shows a schematic structural diagram of a Bitmap-based multi-table association query device 300 suitable for implementing an embodiment of the present disclosure. Electronic devices in embodiments of the present disclosure may include, but are not limited to, mobile phones, laptops, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (Tablets), PMPs (Portable Multimedia Players), vehicle-mounted terminals (such as Mobile terminals such as car navigation terminals) and fixed terminals such as digital TVs, desktop computers, etc. The electronic device shown in FIG. 4 is only an example and should not impose any limitations on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 300 may include a processing device 301 (for example, a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the electronic device 300. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; output devices 307 including, for example, a liquid crystal display (LCD), speaker, or vibrator; storage devices 308 including, for example, a magnetic tape or hard disk; and a communication device 309. The communication device 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 4 shows the electronic device 300 with various devices, it should be understood that not all of the illustrated devices need to be implemented or present. More or fewer devices may alternatively be implemented or present.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 309, installed from the storage device 308, or installed from the ROM 302. When the computer program is executed by the processing device 301, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
FIG. 5 is a structural diagram of an embodiment of a computer-readable storage medium for the Bitmap-based multi-table association query method of this specification. As shown in FIG. 5, a computer-readable storage medium 40 according to an embodiment of the present disclosure has non-transitory computer-readable instructions 41 stored thereon. When the non-transitory computer-readable instructions 41 are run by a processor, all or some of the steps of the Bitmap-based multi-table association query method of the foregoing embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: construct a base page, where the page code of the base page is used to set up the environment required for running the business page and/or to implement the same workflow abstracted from similar business scenarios; construct one or more page templates, where the page templates provide code templates for implementing business functions in a business scenario; generate, based on the corresponding page template, the final page code of each page of the business scenario by code conversion of the specific functions of that page; and merge the generated final page code of each page into the page code of the base page to generate the code of the business page.
Alternatively, the above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: construct a base page, where the page code of the base page is used to set up the environment required for running the business page and/or to implement the same workflow abstracted from similar business scenarios; construct one or more page templates, where the page templates provide code templates for implementing business functions in a business scenario; generate, based on the corresponding page template, the final page code of each page of the business scenario by code conversion of the specific functions of that page; and merge the generated final page code of each page into the page code of the base page to generate the code of the business page.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not constitute a limitation on the unit itself.
The Bitmap-based multi-table association query method, apparatus, device, and computer-readable storage medium provided by the present invention offer a new approach to complex queries over ES multi-table associations. The approach does not rely on external storage: complex computations such as multi-table query, association, aggregation, and sorting are performed in memory, and the efficient bitmap implementation RoaringBitmap is used to optimize association query speed and resource usage, so that query, association, and aggregation over tens of millions of records complete within 1 second.
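To illustrate why a Roaring-style bitmap keeps in-memory intersection cheap, the following is a simplified sketch and not the RoaringBitmap library itself. Roaring bitmaps partition 32-bit integers into chunks keyed by the high 16 bits; the hypothetical `ChunkedBitmap` class below follows that idea but stores every chunk as a plain bitset (a Python integer), whereas the real library also uses array and run containers.

```python
class ChunkedBitmap:
    """Simplified Roaring-style bitmap: one bitset per block of 65536 values."""

    def __init__(self, values=()):
        # Maps the high 16 bits of a value to an int used as a 65536-bit bitset.
        self.containers = {}
        for value in values:
            self.add(value)

    def add(self, value):
        high, low = value >> 16, value & 0xFFFF
        self.containers[high] = self.containers.get(high, 0) | (1 << low)

    def __and__(self, other):
        result = ChunkedBitmap()
        # Only chunks present in both bitmaps can contribute to the intersection,
        # so chunks that exist on one side only are skipped without being scanned.
        for high in self.containers.keys() & other.containers.keys():
            bits = self.containers[high] & other.containers[high]
            if bits:
                result.containers[high] = bits
        return result

    def __iter__(self):
        # Yield the stored values in ascending order.
        for high in sorted(self.containers):
            bits = self.containers[high]
            while bits:
                lowest = bits & -bits  # isolate the lowest set bit
                yield (high << 16) | (lowest.bit_length() - 1)
                bits ^= lowest

    def __len__(self):
        return sum(bin(bits).count("1") for bits in self.containers.values())


# Hypothetical document IDs returned by two index queries.
left = ChunkedBitmap([1, 70000, 5_000_000])
right = ChunkedBitmap([70000, 5_000_000, 9_999_999])
common = left & right
```

Here `list(common)` is `[70000, 5000000]`. The intersection cost depends on the number of shared chunks rather than on the largest ID, which is the property that makes this family of structures attractive for sparse ID sets in memory.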
In the present invention, the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance; the term "plurality" means two or more, unless otherwise expressly limited. Terms such as "installed", "linked", "connected", and "fixed" should be understood broadly. For example, "connected" may mean a fixed connection, a detachable connection, or an integral connection; "linked" may mean directly linked or indirectly linked through an intermediary. Those of ordinary skill in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
In the description of the present invention, it should be understood that orientations or positional relationships indicated by terms such as "upper" and "lower" are based on the orientations or positional relationships shown in the drawings. They are used only to facilitate and simplify the description of the present invention, and do not indicate or imply that the device or unit referred to must have a specific orientation or be constructed and operated in a specific orientation. They are therefore not to be construed as limiting the present invention.
In the description of this specification, terms such as "one embodiment", "some embodiments", and "specific embodiments" mean that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above are only some embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311526239.8A CN117762920A (en) | 2023-11-16 | 2023-11-16 | A Bitmap-based multi-table association query method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117762920A (en) | 2024-03-26 |
Family
ID=90309558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311526239.8A Pending CN117762920A (en) | 2023-11-16 | 2023-11-16 | A Bitmap-based multi-table association query method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117762920A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119396897A (en) * | 2024-10-10 | 2025-02-07 | 北京火山引擎科技有限公司 | A method, device and equipment for index query based on database |
CN119396897B (en) * | 2024-10-10 | 2025-05-16 | 北京火山引擎科技有限公司 | Index query method, device and equipment based on database |
CN119513142A (en) * | 2025-01-21 | 2025-02-25 | 杭州古珀医疗科技有限公司 | Medical big data ES wide table generation method and device based on efficient dynamic data configuration |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992689B (en) | Search methods, terminals and media | |
CN117762920A (en) | A Bitmap-based multi-table association query method and device | |
KR101938953B1 (en) | Flash optimized columnar data layout and data access algorithms for big data query engines | |
CN103810237B (en) | Data managing method and system | |
US8117137B2 (en) | Field-programmable gate array based accelerator system | |
US8423562B2 (en) | Non-transitory, computer readable storage medium, search method, and search apparatus | |
CN112486988B (en) | Data processing method, device, equipment and storage medium | |
US11886411B2 (en) | Data storage using roaring binary-tree format | |
CN102902763B (en) | The method of association, retrieving information process data and process information task and device | |
CN113407785A (en) | Data processing method and system based on distributed storage system | |
WO2019147441A1 (en) | Wide key hash table for a graphics processing unit | |
CN117312325A (en) | Knowledge distillation-based quantization index construction method, device and equipment | |
CN111046085B (en) | Data tracing processing method and device, medium and equipment | |
CN108140022A (en) | Data query method and database system | |
Liu et al. | G-learned index: Enabling efficient learned index on GPU | |
CN109542912B (en) | Interval data storage method, device, server and storage medium | |
US20200012630A1 (en) | Smaller Proximate Search Index | |
CN117688124A (en) | Data query index creation method and device, storage medium and electronic equipment | |
CN106407137A (en) | Hardware accelerator and method of collaborative filtering recommendation algorithm based on neighborhood model | |
CN115658694A (en) | System, method and device for generating a database table | |
CN114153845B (en) | Data storage and reading method, device, equipment and medium | |
WO2023249756A1 (en) | Multi-model enrichment memory and catalog for better search recall with granular provenance and lineage | |
CN115587090A (en) | Data storage method, device, equipment and medium based on Doris | |
CN116260711A (en) | Data processing method, device, equipment and readable storage medium | |
Zhou et al. | A parallel high speed lossless data compression algorithm in large-scale wireless sensor network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | Country or region after: China; Address after: 100083 b-1011, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing; Applicant after: Beijing Haizhi Technology Group Co.,Ltd.; Address before: 100083 b-1011, bungalow, building 2, A5 Xueyuan Road, Haidian District, Beijing; Applicant before: Beijing Haizhi Technology Group Co.,Ltd.; Country or region before: China |