CN112052259A

CN112052259A - Data processing method, apparatus, equipment and computer storage medium

Info

Publication number: CN112052259A
Application number: CN202011042296.5A
Authority: CN
Inventors: 肖翔; 吴海山; 殷磊
Original assignee: WeBank Co Ltd
Current assignee: WeBank Co Ltd
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2020-12-08

Abstract

The invention discloses a data processing method, apparatus, device and computer storage medium. The data processing method includes: receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request; obtaining a query thread for each query condition, and using each query thread to retrieve the target query data for its query condition from a data repository; and aggregating and classifying the at least one piece of target query data to obtain the query result for the data query request. Whereas the prior art uses a single query thread to traverse the query conditions one by one, the proposed method retrieves the target query data for each query condition through that condition's own query thread, which shortens data query time and thereby improves query efficiency.

Description

Data processing method, apparatus, equipment and computer storage medium

Technical Field

The present invention relates to the field of data processing, and in particular to a data processing method, apparatus, device and computer storage medium.

Background

In information data research, a user enters query keywords or query conditions into a database search engine, and the search engine queries the database according to those keywords or conditions. When there are multiple query conditions, the search engine uses a single query thread to traverse and query each condition one by one, so the query time grows in proportion to the number of conditions the user enters. This not only places a heavy computational load on the search engine but also degrades the user experience.

Summary of the Invention

The present invention provides a data processing method, apparatus, device and computer storage medium, aiming to solve the technical problem in the prior art that queries take a long time because a single query thread traverses each query condition one by one.

To achieve the above object, the present invention provides a data processing method comprising the following steps:

Receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request;

Obtaining a query thread corresponding to each query condition, and retrieving, through each query thread, the target query data corresponding to that query condition from a data repository;

Aggregating and classifying the at least one piece of target query data to obtain a query result corresponding to the data query request.

Preferably, before the step of retrieving, through the query threads, the target query data corresponding to each query condition from the data repository, the method further includes:

If a data storage request is detected, determining the data to be stored in the source database that corresponds to the data storage request;

Determining the time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension.

The step of determining, when a data storage request is detected, the data to be stored corresponding to the data storage request includes:

Monitoring the source database in real time;

If a data update in the source database is detected, determining that a data storage request has been detected;

Determining the updated data corresponding to the data update in the source database, and taking the updated data as the data to be stored corresponding to the data storage request.

Preferably, the step of determining the time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension, includes:

Determining whether an index corresponding to the time dimension exists in the current data repository;

If an index corresponding to the time dimension exists in the current data repository, storing the data to be stored from the source database into that index; or

If no index corresponding to the time dimension exists in the current data repository, creating an index corresponding to the time dimension and storing the data to be stored from the source database into it.

Preferably, the step of storing the data to be stored from the source database into the index corresponding to the time dimension includes:

Invoking a read thread to read the data to be stored from the source database into a storage queue, and invoking an idle write thread from a thread pool to take the data to be stored out of the storage queue;

Performing structured processing on the data to be stored taken out by the write thread to obtain target data to be stored;

When the amount of target data to be stored reaches a preset quantity, storing the data to be stored in the index corresponding to the time dimension.

Preferably, after the step of invoking an idle write thread to take the data to be stored out of the storage queue, the method further includes:

Determining whether data to be stored remains in the storage queue;

If data to be stored remains in the storage queue, returning to the step of invoking an idle write thread from the thread pool to take the data to be stored out of the storage queue;

Continuing with the step of performing structured processing on the data to be stored taken out by the write thread to obtain the target data to be stored;

Continuing with the step of storing the data to be stored in the index corresponding to the time dimension when the amount of target data to be stored reaches the preset quantity.

Preferably, the step of obtaining the query thread corresponding to each query condition, and retrieving, through the query threads, the target data corresponding to each query condition from the data repository, includes:

Determining the query index matched by each query condition, and obtaining the query thread corresponding to that query index;

Retrieving, through each query thread, the target data corresponding to each query condition from the data repository.

In addition, to achieve the above object, the present invention further provides a data processing apparatus, comprising:

a receiving module, configured to receive a data query request sent by a client and determine at least one query condition corresponding to the data query request;

an acquisition module, configured to obtain the query thread corresponding to each query condition and retrieve, through each query thread, the target query data corresponding to that query condition from a data repository;

an aggregation module, configured to aggregate and classify the at least one piece of target query data to obtain a query result corresponding to the data query request.

In addition, to achieve the above object, the present invention further provides a data processing device comprising a processor, a memory and a data processing program stored in the memory; when the data processing program is run by the processor, the steps of the data processing method described above are implemented.

In addition, to achieve the above object, the present invention further provides a computer storage medium storing a data processing program which, when run by a processor, implements the steps of the data processing method described above.

Compared with the prior art, the present invention discloses a data processing method, apparatus, device and computer storage medium that receive a data query request sent by a client and determine at least one query condition corresponding to the request; obtain a query thread corresponding to each query condition and retrieve, through each query thread, the target query data for that condition from a data repository; and aggregate and classify the at least one piece of target query data to obtain the query result for the request. Whereas the prior art uses a single query thread to traverse the query conditions one by one, the proposed method retrieves each condition's target query data through that condition's own query thread, which shortens query time and thereby improves query efficiency.

Description of the Drawings

FIG. 1 is a schematic diagram of the hardware structure of the data processing device involved in the embodiments of the present invention;

FIG. 2 is a schematic flowchart of a first embodiment of the data processing method of the present invention;

FIG. 3 is a schematic diagram of the functional modules of a first embodiment of the data processing apparatus of the present invention.

The realization of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it.

The data processing device mainly involved in the embodiments of the present invention is a device capable of network connection, such as a data storage server or a cloud platform.

Referring to FIG. 1, FIG. 1 is a schematic diagram of the hardware structure of the data processing device involved in the embodiments of the present invention. In this embodiment, the data processing device may include a processor 1001 (for example, a central processing unit, CPU), a communication bus 1002, an input port 1003, an output port 1004 and a memory 1005. The communication bus 1002 provides connection and communication among these components; the input port 1003 is used for data input; the output port 1004 is used for data output; and the memory 1005 may be high-speed RAM or stable non-volatile memory such as disk storage, and may optionally be a storage device independent of the processor 1001. Those skilled in the art will understand that the hardware structure shown in FIG. 1 does not limit the present invention, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components.

Continuing to refer to FIG. 1, the memory 1005, as a readable storage medium, may include an operating system, a network communication module, an application program module and a data processing program. In FIG. 1, the network communication module is mainly used to connect to a data storage server and exchange data with it, and the processor 1001 can call the data processing program stored in the memory 1005 and perform the following operations:

Further, the processor 1001 may also be used to call the data processing program stored in the memory 1005 and perform the following steps:

determining whether a time index corresponding to the time dimension exists in the current data repository;

if a time index corresponding to the time dimension exists in the current data repository, reading the data to be stored from the source database and writing it into that time index; or

if no time index corresponding to the time dimension exists in the current data repository, creating the time index corresponding to the time dimension, and reading the data to be stored from the source database and writing it into that time index.

Based on the above structure, embodiments of the data processing method of the present invention are proposed.

Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the data processing method of the present invention.

In this embodiment, the data processing method includes:

Step S10: receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request;

In this embodiment, the data processing method is applied to a system for storing, retrieving and analyzing data. The system comprises a storage component and a search engine; optionally, the storage component is Elasticsearch. The storage component holds a large amount of data obtained from source data ends; optionally, this data is stored, by time dimension, in different indexes of the corresponding data repository of the storage component. Sources include, but are not limited to, public databases, public interface servers and data purchased from third parties. Further, in this embodiment, the search engine corresponds one-to-one with the indexes in the data repository. For example, information point data may be obtained from a public website providing information point services or from an interface query service website, where an information point is anything with address information, such as a house, a shop or a bus stop. For instance, the September 2020 passenger flow data of bus stop A, or of bus B, associated with a bus management system is obtained from that system and stored in the September 2020 index of the data repository.

Specifically, throughout September 2020, the passenger flow data of bus stop A or bus B is obtained from the bus management system in real time every day and promptly stored in the September 2020 index of the data repository, with bus stop A's data tagged with bus stop A's information and bus B's data tagged with bus B's information.

In this step, a customer may send a data query request through the search engine of the system, for example by entering a query such as 'Shenzhen September Yifangcheng promotional activities' or 'Shenzhen and September and promotion and activity and Yifangcheng'; the form of the query request is not limited. When the data query request sent by the client is received, that is, when the text or voice message entered by the client through the search engine is received, at least one query condition corresponding to that message is determined. Optionally, it is determined whether the user's data query request contains logical words such as "and" or "or". If it does, as in 'Shenzhen and September and promotion and activity and Yifangcheng', the keywords joined by the logical words are identified and each keyword becomes one query condition: Shenzhen, September, promotion, activity and Yifangcheng. If the request contains no logical words, conjunctions, modal particles, filler words and the like are first removed, then the keywords in the request are determined by a tokenizer or preset word segmentation rules, and each keyword becomes one query condition.
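The condition-splitting rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the stop-word list is hypothetical, and a regex word split stands in for the tokenizer or segmentation rules (Chinese input would need a real dictionary-based segmenter).

```python
import re

# Hypothetical word lists; the text names "and"/"or" but gives no full list.
LOGICAL_WORDS = {"and", "or"}
STOP_WORDS = {"a", "an", "the", "of", "in", "for", "please"}


def extract_conditions(query: str) -> list[str]:
    """Turn a free-text query into query conditions, one per keyword."""
    tokens = query.split()
    if any(token.lower() in LOGICAL_WORDS for token in tokens):
        # Logical words present: every operand they join becomes one condition.
        return [t for t in tokens if t.lower() not in LOGICAL_WORDS]
    # No logical words: a regex word split stands in for a real segmenter,
    # and stop words (connectives, fillers) are discarded.
    words = re.findall(r"\w+", query)
    return [w for w in words if w.lower() not in STOP_WORDS]


print(extract_conditions("Shenzhen and September and promotion and activity and Yifangcheng"))
# ['Shenzhen', 'September', 'promotion', 'activity', 'Yifangcheng']
print(extract_conditions("promotions in Shenzhen for September"))
# ['promotions', 'Shenzhen', 'September']
```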

Further, in this step, the data query request sent by the client may also be divided into multiple query conditions based on preset classification rules, for example by month, province or city. For instance, when the client sends the query 'coffee', the client is located to determine its location, and the initiation time of the current query request is determined; at least one query condition is then derived from the location and initiation time, namely coffee, location and time. If the query is found to have been sent at 4 p.m. on September 18, 2020 from a terminal device in Shenzhen, it can be divided into query conditions such as 2020, September, Shenzhen and coffee.
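A minimal sketch of expanding a bare keyword with context-derived conditions, assuming the client's city has already been resolved by some location lookup (hypothetical; the text does not say how the client is located) and that year, month and city are the preset classification rules in use.

```python
from datetime import datetime

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")


def conditions_from_context(keyword: str, city: str, requested_at: datetime) -> list[str]:
    """Expand a bare keyword with year, month and city conditions.

    `city` is assumed to come from locating the client (hypothetical lookup);
    `requested_at` is the request initiation time.
    """
    return [
        str(requested_at.year),               # e.g. "2020"
        MONTH_NAMES[requested_at.month - 1],  # e.g. "September" (locale-independent)
        city,                                 # e.g. "Shenzhen"
        keyword,                              # the original query text
    ]


print(conditions_from_context("coffee", "Shenzhen", datetime(2020, 9, 18, 16, 0)))
# ['2020', 'September', 'Shenzhen', 'coffee']
```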

Step S20: obtaining the query thread corresponding to each query condition, and retrieving, through each query thread, the target query data corresponding to that query condition from the data repository;

In this step, after at least one query condition has been determined for the data query request, an idle thread is obtained from the thread pool for each query condition to execute that condition's query operation. For example, if the query conditions include a time (such as May) and a geographic location (such as Guangdong), a first idle thread retrieves the May data from the data repository and a second idle thread retrieves the Guangdong data.
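The one-idle-thread-per-condition idea can be sketched with Python's standard `concurrent.futures` thread pool. The in-memory `REPOSITORY` list is a hypothetical stand-in for the data repository, and returning matching record ids per condition is an illustrative choice.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the data repository.
REPOSITORY = [
    {"id": 1, "month": "May", "province": "Guangdong"},
    {"id": 2, "month": "May", "province": "Hunan"},
    {"id": 3, "month": "June", "province": "Guangdong"},
]


def run_condition(field: str, value: str) -> set[int]:
    """The query one pooled thread runs for a single condition."""
    return {row["id"] for row in REPOSITORY if row[field] == value}


def query_all(conditions: dict[str, str], max_workers: int = 8) -> dict[str, set[int]]:
    """Submit one task per condition so the conditions are queried concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {field: pool.submit(run_condition, field, value)
                   for field, value in conditions.items()}
        return {field: future.result() for field, future in futures.items()}


print(query_all({"month": "May", "province": "Guangdong"}))
# {'month': {1, 2}, 'province': {1, 3}}
```

In CPython, threads mainly pay off here when each query waits on I/O (for example a network round trip to the search cluster); pure in-memory filtering like this stand-in does not run faster because of the global interpreter lock.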

Specifically, step S20 includes:

In this step, once the query conditions have been obtained, for example time, location and coffee shop, the query threads corresponding to the time index, location index and coffee shop index are obtained, and each query thread retrieves the time data, location data and coffee shop data, respectively, from the data repository.

Preferably, a primary query index corresponding to the multiple query conditions is determined, where the primary index is an index that already exists in the source database. For example, if the current query conditions include time, location and coffee shop, but the data in the current data repository is partitioned only by time, time is used as the primary query index. The query thread corresponding to the primary query index is then obtained and used to retrieve the initial data set for that index from the data repository, that is, the initial data set for May. The initial data set includes at least one data set: if the query condition is May, it includes the data set for May of the current year, the data set for May of the previous year, and so on; if the query condition is May 2020, it includes only the May 2020 data set.
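The "May" versus "May 2020" distinction above can be illustrated by resolving a time condition to the monthly indexes it covers. This sketch assumes a hypothetical `YYYY-MM` index naming scheme; the patent does not specify how time-partitioned indexes are named.

```python
import re

MONTH_NUMBERS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}


def resolve_time_indexes(condition: str, existing: list[str]) -> list[str]:
    """Map a time condition such as 'May' or 'May 2020' to existing monthly indexes.

    Index names are assumed to follow a hypothetical 'YYYY-MM' scheme.
    """
    words = condition.lower().split()
    month = next((MONTH_NUMBERS[w] for w in words if w in MONTH_NUMBERS), None)
    year = next((w for w in words if re.fullmatch(r"\d{4}", w)), None)
    matches = []
    for name in existing:
        index_year, index_month = name.split("-")
        if month is not None and int(index_month) != month:
            continue  # wrong month
        if year is not None and index_year != year:
            continue  # a year was given and this index belongs to another year
        matches.append(name)
    return sorted(matches)


indexes = ["2019-05", "2020-05", "2020-06"]
print(resolve_time_indexes("May", indexes))       # ['2019-05', '2020-05']
print(resolve_time_indexes("May 2020", indexes))  # ['2020-05']
```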

Further, after the initial data set for the primary query index is retrieved from the data repository, secondary query indexes are built from the remaining query conditions, such as a location index and a coffee shop index. The query threads corresponding to the secondary query indexes are then obtained and used to retrieve the target data for those indexes from the initial data set. In other words, once the time-related initial data set has been obtained, the location-related and coffee-shop-related data are each filtered out of it.
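A minimal sketch of the secondary stage: each secondary condition filters the initial (primary-index) data set on its own pooled thread. The text does not say how the per-condition results are combined, so this sketch assumes they are intersected (a row must satisfy every secondary condition); the field names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def secondary_filter(initial: list[dict], filters: dict[str, str]) -> list[dict]:
    """Filter the primary-index data set by each secondary condition on its own thread."""
    def matching_positions(field: str, value: str) -> set[int]:
        return {i for i, row in enumerate(initial) if row.get(field) == value}

    if not filters:
        return list(initial)
    with ThreadPoolExecutor(max_workers=len(filters)) as pool:
        futures = [pool.submit(matching_positions, f, v) for f, v in filters.items()]
        position_sets = [future.result() for future in futures]
    # Assumed combination rule: keep only rows satisfying every secondary condition.
    keep = set.intersection(*position_sets)
    return [initial[i] for i in sorted(keep)]


may_rows = [  # hypothetical initial data set returned by the "May" time index
    {"name": "Starbucks", "district": "Nanshan", "category": "cafe"},
    {"name": "Luckin Coffee", "district": "Futian", "category": "cafe"},
    {"name": "Noodle House", "district": "Nanshan", "category": "restaurant"},
]
print(secondary_filter(may_rows, {"district": "Nanshan", "category": "cafe"}))
# [{'name': 'Starbucks', 'district': 'Nanshan', 'category': 'cafe'}]
```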

Step S30: aggregating and classifying the at least one piece of target query data to obtain the query result corresponding to the data query request.

In this step, after the target query data for each query condition is obtained, it is aggregated and classified with a preset aggregation and classification algorithm to obtain the query result. The aggregated result is exported offline by an asynchronous thread, optionally in a format such as txt, json or csv, and returned to the client. For example, suppose the query threads retrieve from the data repository the May data set, the data set for Nanshan District, Shenzhen, and the coffee shop data set. These data sets are then aggregated and classified, for example by type of coffee shop, yielding the May turnover of Starbucks shops in Nanshan District, Shenzhen, the May turnover of Luckin Coffee shops in Nanshan District, Shenzhen, and so on; or yielding the cumulative coffee consumption of all coffee shops in Nanshan District, Shenzhen, for each day of May. The specific classification result is not limited.
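A minimal sketch of step S30 under stated assumptions: "aggregation and classification" is taken to be grouping by a hypothetical `brand` field and summing a hypothetical `turnover` field, and the asynchronous offline export is modelled as a single background worker thread that serializes the result to json or csv.

```python
import csv
import io
import json
from collections import defaultdict
from concurrent.futures import Future, ThreadPoolExecutor

# A single background thread for offline export, so the query path is not blocked.
_EXPORTER = ThreadPoolExecutor(max_workers=1)


def aggregate_turnover(rows: list[dict]) -> dict[str, float]:
    """Group matched rows by brand (hypothetical field) and sum their turnover."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["brand"]] += row["turnover"]
    return dict(sorted(totals.items()))


def export(result: dict[str, float], fmt: str) -> str:
    """Serialize an aggregated result as json or csv text."""
    if fmt == "json":
        return json.dumps(result, sort_keys=True)
    if fmt == "csv":
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow(["brand", "turnover"])
        writer.writerows(result.items())
        return buffer.getvalue()
    raise ValueError(f"unsupported format: {fmt}")


def export_async(result: dict[str, float], fmt: str) -> Future:
    """Run the export on the background thread and return a Future for its text."""
    return _EXPORTER.submit(export, result, fmt)


rows = [
    {"brand": "Starbucks", "turnover": 100.0},
    {"brand": "Luckin Coffee", "turnover": 40.0},
    {"brand": "Starbucks", "turnover": 50.0},
]
summary = aggregate_turnover(rows)
print(summary)  # {'Luckin Coffee': 40.0, 'Starbucks': 150.0}
print(export_async(summary, "csv").result())
```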

In this embodiment, a data query request sent by a client is received and at least one query condition corresponding to it is determined; the query thread corresponding to each query condition is obtained, and each query thread retrieves the target query data for its condition from the data repository; and the at least one piece of target query data is aggregated and classified to obtain the query result for the request. Whereas the prior art uses a single query thread to traverse the query conditions one by one, the proposed method retrieves each condition's target query data through that condition's own query thread, which shortens query time and thereby improves query efficiency.

In addition, a second embodiment of the present invention is proposed based on the first embodiment above. In this embodiment, before the step of retrieving the target query data corresponding to each query condition from the data repository, the method further includes:

Step S201: if a data storage request is detected, determining the data to be stored in the source database that corresponds to the data storage request;

Step S202: determining the time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension.

In this step, it should be noted that the data in the data repository in this embodiment is obtained from the source database; therefore, when a data storage request is detected, the data to be stored in the source database that corresponds to the request is determined. Optionally, the step of determining, when a data storage request is detected, the data to be stored in the source database that corresponds to the request includes:

In this step, it should be noted that the system monitors the source database in real time so that data is stored automatically and its completeness is ensured; that is, if a data update in the source database is detected, it is determined that a data storage request has been detected.
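The text does not say how the source database is monitored. One common approach, assumed here purely for illustration, is polling with a watermark on an `updated_at` column (change-data-capture from the database log would be an alternative). The table and column names are hypothetical, and SQLite stands in for the source database.

```python
import sqlite3


def poll_updates(conn: sqlite3.Connection, watermark: int) -> tuple[list[tuple], int]:
    """Return rows changed after `watermark` and the advanced watermark.

    A non-empty result is treated as a detected data storage request.
    Note: rows committed later with an equal timestamp would be missed by '>'.
    """
    rows = conn.execute(
        "SELECT id, payload, updated_at FROM source "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (id INTEGER PRIMARY KEY, payload TEXT, updated_at INTEGER)")
conn.executemany("INSERT INTO source VALUES (?, ?, ?)", [(1, "a", 1), (2, "b", 2)])

changed, mark = poll_updates(conn, 0)            # first poll sees both rows
conn.execute("INSERT INTO source VALUES (3, 'c', 3)")
changed_again, mark = poll_updates(conn, mark)   # next poll sees only the new row
print(changed, changed_again, mark)
```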

Optionally, the source database may instead automatically send a data storage request, together with the corresponding data packet, to the system; this is not specifically limited.

After the data to be stored in the source database corresponding to the data storage request is determined, the time dimension corresponding to that data is determined, optionally from the request time of the data storage request. For example, if a data storage request is detected on September 18, 2020, the time dimension of the corresponding data to be stored is 2020 or September 2020, depending on the time-dimension partitioning rule of the data repository. The data to be stored is then stored from the source database into the data repository according to that time dimension.
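Deriving the time dimension from the request time can be sketched as below. The `YYYY` / `YYYY-MM` key format and the `granularity` parameter are hypothetical stand-ins for the repository's partitioning rule.

```python
from datetime import datetime


def time_dimension(requested_at: datetime, granularity: str = "month") -> str:
    """Derive the time-dimension key (hypothetical format) for a storage request."""
    if granularity == "year":
        return f"{requested_at.year:04d}"
    if granularity == "month":
        return f"{requested_at.year:04d}-{requested_at.month:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


detected = datetime(2020, 9, 18)
print(time_dimension(detected, "year"), time_dimension(detected))  # 2020 2020-09
```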

Specifically, the step of determining the time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension, includes:

Step S2021: determining whether an index corresponding to the time dimension exists in the current data repository;

Step S2022: if an index corresponding to the time dimension exists in the current data repository, storing the data to be stored from the source database into that index; or

Step S2023: if no index corresponding to the time dimension exists in the current data repository, creating an index corresponding to the time dimension and storing the data to be stored from the source database into it.

In this step, for example, if the time dimension of the data to be stored for the data storage request is September 2020, it is determined whether a 2020 index or a September 2020 index exists in the current data repository. If a September 2020 index exists, the data to be stored is read from the source database and written into the September 2020 index. Further, if no index corresponding to the time dimension exists in the current data repository, one is created, and the data to be stored is read from the source database and written into it.
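Steps S2021 to S2023 amount to a get-or-create on the time-dimension index. The sketch below uses a hypothetical in-memory repository rather than a real search cluster; the existence check and creation happen under one lock so that two concurrent write threads cannot both create the same index.

```python
import threading


class TimeIndexedRepository:
    """Hypothetical in-memory stand-in for a repository with one index per time dimension."""

    def __init__(self) -> None:
        self._indexes: dict[str, list[dict]] = {}
        self._lock = threading.Lock()  # several write threads may store concurrently

    def store(self, dimension: str, records: list[dict]) -> bool:
        """Write records into the index for `dimension`; return True if it was created."""
        with self._lock:
            created = dimension not in self._indexes   # S2021: does the index exist?
            if created:
                self._indexes[dimension] = []          # S2023: build the missing index
            self._indexes[dimension].extend(records)   # S2022/S2023: write the data
        return created

    def count(self, dimension: str) -> int:
        with self._lock:
            return len(self._indexes.get(dimension, []))


repo = TimeIndexedRepository()
print(repo.store("2020-09", [{"stop": "A", "flow": 120}]))  # True: index created
print(repo.store("2020-09", [{"stop": "B", "flow": 80}]))   # False: index reused
print(repo.count("2020-09"))                                # 2
```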

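Further, the step of storing the data to be stored from the source database into the data repository includes:

calling a read thread to read the data to be stored in the source database into a storage queue, and calling an idle write thread from a thread pool to take the data to be stored out of the storage queue;

performing structuring processing on the data to be stored taken out by the write thread, to obtain target data to be stored;

when the data volume of the target data to be stored reaches a preset number, storing the data to be stored into the index corresponding to the time dimension.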

In this step, it should be noted that under high data concurrency, writing data takes longer than reading it. Using a single thread for both reading and writing therefore makes reads and writes inefficient and, in severe cases, can crash the system. For this reason, the embodiment of the present invention uses read-write separation to store the data to be stored into the data repository. Specifically, a read thread continuously reads the data to be stored from the source database into a storage queue, and an idle write thread is called from the thread pool to take the data to be stored out of the storage queue. The data taken out by the write thread is then structured, optionally by filtering, de-duplicating and merging it, to obtain the target data to be stored. When the volume of data to be stored reaches a preset number, for example 1,000 records, the data is written directly, in a batch, into the index corresponding to the time dimension.
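The structuring and batching steps can be sketched as follows. The concrete rules are assumptions, because the text only names filtering, de-duplication and merging. Here the hypothetical `structure` function drops records without an `id` field and merges records that share an `id`. The hypothetical `BatchBuffer` releases the accumulated target data once the preset number is reached, which defaults to the 1,000 records mentioned above.

```python
from typing import Iterable, List, Optional


def structure(records: Iterable[dict], key: str = "id") -> List[dict]:
    """Filter, de-duplicate and merge raw records (hypothetical rules)."""
    merged: dict = {}
    for record in records:
        # Filtering: skip records that lack the key field.
        if record.get(key) is None:
            continue
        # De-duplication and merging: records with the same key collapse
        # into one, with later field values overwriting earlier ones.
        merged.setdefault(record[key], {}).update(record)
    return list(merged.values())


class BatchBuffer:
    """Accumulates target data and releases it once a preset size is hit."""

    def __init__(self, batch_size: int = 1000) -> None:
        self.batch_size = batch_size
        self.pending: List[dict] = []

    def add(self, records: Iterable[dict]) -> Optional[List[dict]]:
        """Buffer records; return all pending ones as one batch once at least
        batch_size are held, otherwise return None."""
        self.pending.extend(records)
        if len(self.pending) >= self.batch_size:
            batch, self.pending = self.pending, []
            return batch
        return None
```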

Further, after the step of calling a write thread to take the data to be stored out of the storage queue, the method also includes determining whether data to be stored remains in the storage queue. If it does, the method returns to the step of calling a write thread to take the data to be stored out of the storage queue. It then continues with the step of writing the data taken out by the write thread into the data repository once the amount of data taken out reaches the preset number.
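The read/write loop described above can be sketched end to end in Python. The sketch assumes an in-process `queue.Queue` as the storage queue and a `concurrent.futures.ThreadPoolExecutor` as the write-thread pool. The names `read_worker`, `write_worker`, `run_pipeline` and the `flush` callback are hypothetical. In this sketch `flush` only collects batches; a real system would bulk-write them into the time-dimension index. The step of determining whether the storage queue still holds data is modeled as a blocking `get()` that stops on a sentinel value rather than as a poll of `Queue.empty()`, whose answer can be stale as soon as another thread touches the queue. Flushing a final partial batch at shutdown is also an assumption, since the text does not say what happens to fewer than the preset number of records.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List

_SENTINEL = object()  # end-of-stream marker placed on the queue by the reader


def read_worker(source: Iterable[dict], store_queue: queue.Queue, n_writers: int) -> None:
    """Read thread: continuously moves records from the source into the queue."""
    for record in source:
        store_queue.put(record)
    for _ in range(n_writers):
        store_queue.put(_SENTINEL)  # one stop signal per write thread


def write_worker(
    store_queue: queue.Queue,
    batch_size: int,
    flush: Callable[[List[dict]], None],
) -> None:
    """Write thread: keeps taking records while the queue still has data and
    flushes a batch each time the preset number is reached."""
    batch: List[dict] = []
    while True:
        record = store_queue.get()  # blocks until data (or a sentinel) arrives
        if record is _SENTINEL:
            break
        batch.append(record)
        if len(batch) >= batch_size:
            flush(batch)
            batch = []
    if batch:
        flush(batch)  # assumption: a final partial batch is written at shutdown


def run_pipeline(
    source: Iterable[dict], batch_size: int = 1000, n_writers: int = 4
) -> List[List[dict]]:
    written: List[List[dict]] = []
    lock = threading.Lock()

    def flush(batch: List[dict]) -> None:
        # Stand-in for a bulk write into the time-dimension index.
        with lock:
            written.append(batch)

    store_queue: queue.Queue = queue.Queue(maxsize=10_000)
    with ThreadPoolExecutor(max_workers=n_writers) as pool:
        writers = [
            pool.submit(write_worker, store_queue, batch_size, flush)
            for _ in range(n_writers)
        ]
        reader = threading.Thread(target=read_worker, args=(source, store_queue, n_writers))
        reader.start()
        reader.join()
        for future in writers:
            future.result()  # re-raises any exception from a write thread
    return written
```

A bounded queue (`maxsize`) keeps a fast reader from running arbitrarily far ahead of slower writers, which matches the observation that writing is the bottleneck.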

In this embodiment, if a data storage request is detected, the data to be stored that corresponds to the request in the source database is determined. The time dimension corresponding to that data is then determined, and the data is stored from the source database into the data repository according to the time dimension. Storing data by time dimension improves the efficiency of querying data from the data repository, and using read-write separation to store the data further improves data storage efficiency.

In addition, this embodiment also provides a data processing apparatus. Referring to FIG. 3, FIG. 3 is a schematic diagram of the functional modules of the first embodiment of the data processing apparatus of the present invention.

In this embodiment, the data processing apparatus is a virtual apparatus stored in the memory 1005 of the data processing device shown in FIG. 1. It implements all functions of the data processing program, which are:

- receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request;
- obtaining the query thread corresponding to each query condition, and obtaining through each query thread the target query data corresponding to that query condition from the data repository;
- aggregating and classifying the at least one piece of target query data to obtain the query result corresponding to the data query request.

Specifically, the data processing apparatus includes:

a receiving module 10, configured to receive a data query request sent by a client and determine at least one query condition corresponding to the data query request;

an obtaining module 20, configured to obtain the query thread corresponding to each query condition and obtain, through each query thread, the target query data corresponding to that query condition from the data repository; and

an aggregation module 30, configured to aggregate and classify the at least one piece of target query data to obtain the query result corresponding to the data query request.
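The cooperation of the receiving, obtaining and aggregation modules can be illustrated with a minimal Python sketch. Each query condition gets its own thread from a `ThreadPoolExecutor`, and the results are then grouped by condition. `parallel_query` and the `query_one` callable, which stands in for the lookup against the data repository, are hypothetical names. Grouping by condition is only one possible reading of "aggregating and classifying".

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def parallel_query(
    conditions: List[str],
    query_one: Callable[[str], List[dict]],
) -> Dict[str, List[dict]]:
    """Fetch target query data for every condition concurrently, then aggregate.

    query_one is a hypothetical callable that looks up a single query
    condition in the data repository.
    """
    # One worker thread per query condition (at least one, so the pool is valid).
    with ThreadPoolExecutor(max_workers=max(1, len(conditions))) as pool:
        # Submit every condition before waiting on any result.
        futures = {cond: pool.submit(query_one, cond) for cond in conditions}
        # Aggregate and classify: group each condition's results under its key.
        return {cond: future.result() for cond, future in futures.items()}
```

Every condition is submitted before any result is awaited. For I/O-bound lookups, the total wait therefore roughly tracks the slowest single lookup rather than the sum of all lookups, which is the advantage the method claims over a single thread traversing the conditions one by one.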

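Further, the obtaining module is also configured to:

determine a query index matching each query condition, and obtain the query thread corresponding to that query index; and

obtain, through each query thread, the target query data corresponding to each query condition from the data repository.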

In addition, an embodiment of the present invention further provides a computer storage medium storing a data processing program. When run by a processor, the program implements the steps of the data processing method described above, which are not repeated here.

Compared with the prior art, the present invention provides a data processing method, apparatus, device and computer storage medium. The data processing method includes the following steps:

- receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request;
- obtaining the query thread corresponding to each query condition, and obtaining through each query thread the target query data corresponding to that query condition from the data repository;
- aggregating and classifying the at least one piece of target query data to obtain the query result corresponding to the data query request.

In the prior-art approach, a single query thread traverses the query conditions one by one. The proposed method instead obtains the target query data for each query condition from the data repository through that condition's own query thread. This shortens data query time and thus improves data query efficiency.

It should be noted that, herein, the terms "comprise", "include" and any variation thereof are intended to cover a non-exclusive inclusion. A process, method, article or system that comprises a list of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or system. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or system that comprises that element.

The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention may be embodied as a software product, either in essence or in the part that contributes over the prior art. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk or an optical disc). It includes several instructions that cause a terminal device to execute the methods described in the embodiments of the present invention.

The above are only preferred embodiments of the present invention and do not thereby limit its patent scope. Any equivalent structural or process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims

1. A data processing method, characterized in that the data processing method comprises the steps of:

receiving a data query request sent by a client, and determining at least one query condition corresponding to the data query request;

respectively acquiring query threads corresponding to the query conditions, and respectively acquiring target query data corresponding to the query conditions from a data repository through the query threads;

and aggregating and classifying the at least one piece of target query data to obtain a query result corresponding to the data query request.

2. The data processing method according to claim 1, wherein before the step of obtaining the target query data corresponding to each query condition from the data repository by the query thread, the method further comprises:

if a data storage request is detected, determining the data to be stored corresponding to the data storage request in a source database;

and determining a time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension.

3. The data processing method according to claim 2, wherein the step of, if a data storage request is detected, determining the data to be stored corresponding to the data storage request comprises:

monitoring the source database in real time;

if an update to the data in the source database is detected, determining that a data storage request has been detected;

and determining the updated data in the source database, and using the updated data as the data to be stored corresponding to the data storage request.

4. The data processing method according to claim 2, wherein the step of determining a time dimension corresponding to the data to be stored, and storing the data to be stored from the source database into the data repository according to the time dimension comprises:

determining whether an index corresponding to the time dimension exists in the current data repository;

if the index corresponding to the time dimension exists in the current data repository, storing the data to be stored from the source database into the index corresponding to the time dimension; or

if the index corresponding to the time dimension does not exist in the current data repository, constructing the index corresponding to the time dimension, and storing the data to be stored from the source database into the index corresponding to the time dimension.

5. The data processing method of claim 4, wherein the step of storing the data to be stored from the source database into the index corresponding to the time dimension comprises:

calling a read thread to read the data to be stored in the source database into a storage queue, and calling an idle write thread from a thread pool to take the data to be stored out of the storage queue;

performing structuring processing on the data to be stored taken out by the write thread to obtain target data to be stored;

and when the data volume of the target data to be stored reaches a preset number, storing the data to be stored into the index corresponding to the time dimension.

6. The data processing method of claim 5, further comprising, after the step of calling an idle write thread from a thread pool to take the data to be stored out of the storage queue:

determining whether data to be stored remains in the storage queue;

if data to be stored remains in the storage queue, returning to the step of calling an idle write thread from a thread pool to take the data to be stored out of the storage queue;

continuing to execute the step of performing structuring processing on the data to be stored taken out by the write thread to obtain target data to be stored;

and continuing to execute the step of storing the data to be stored into the index corresponding to the time dimension when the data volume of the target data to be stored reaches a preset number.

7. The data processing method according to any one of claims 1 to 6, wherein the step of respectively obtaining the query threads corresponding to the respective query conditions and respectively obtaining the target query data corresponding to the respective query conditions from the data repository through the query threads comprises:

determining a query index matching each query condition, and acquiring a query thread corresponding to the query index;

and respectively acquiring the target query data corresponding to each query condition from the data repository through each query thread.

8. A data processing apparatus, characterized in that the data processing apparatus comprises:

the receiving module is used for receiving a data query request sent by a client and determining at least one query condition corresponding to the data query request;

the acquisition module is used for respectively acquiring the query threads corresponding to the query conditions and respectively acquiring the target query data corresponding to the query conditions from the data repository through the query threads;

and the aggregation module is used for aggregating and classifying the at least one piece of target query data to obtain a query result corresponding to the data query request.

9. A data processing device comprising a processor, a memory and a data processing program stored in the memory, which data processing program, when executed by the processor, carries out the steps of the data processing method according to any one of claims 1 to 7.

10. A computer storage medium, having stored thereon a data processing program which, when executed by a processor, implements the steps of the data processing method according to any one of claims 1 to 7.