CN115718690A - Data accuracy monitoring system and method - Google Patents
Data accuracy monitoring system and method Download PDFInfo
- Publication number
- CN115718690A CN115718690A CN202211512387.XA CN202211512387A CN115718690A CN 115718690 A CN115718690 A CN 115718690A CN 202211512387 A CN202211512387 A CN 202211512387A CN 115718690 A CN115718690 A CN 115718690A
- Authority
- CN
- China
- Prior art keywords
- data
- information
- counted
- log
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Images
Landscapes
- Debugging And Monitoring (AREA)
Abstract
本发明公开了一种数据准确性监控系统和方法,包括:数据统计模块,用于从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息;从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息;日志收集模块,用于实时采集数据链路上每个节点所包括的日志信息;数据分析模块,用于通过对比数据统计模块所确定的第一数据信息和第二数据信息,确定数据统计结果信息;通过对比日志收集模块所采集的各节点的日志信息,确定日志统计结果信息。通过数据分析模块对比第一数据信息和第二数据信息,确定数据统计结果信息,通过对比各节点的日志信息确定日志统计结果信息,提高了对数据进行监控的准确性。
The invention discloses a data accuracy monitoring system and method, comprising: a data statistics module, used to determine the first data information of the first table to be counted in the tenant database to be counted from the database cluster included in the production system; Determine the second data information of the second table to be counted corresponding to the first table to be counted from the real-time data warehouse included in the production system; the log collection module is used to collect log information included in each node on the data link in real time ; The data analysis module is used to determine the data statistics result information by comparing the first data information and the second data information determined by the data statistics module; by comparing the log information of each node collected by the log collection module, determine the log statistics result information . The data analysis module compares the first data information and the second data information to determine the data statistical result information, and determines the log statistical result information by comparing the log information of each node, thereby improving the accuracy of data monitoring.
Description
技术领域technical field
本发明实施例涉及数据处理技术领域,尤其涉及一种数据准确性监控系统和方法。The embodiments of the present invention relate to the technical field of data processing, and in particular to a data accuracy monitoring system and method.
背景技术Background technique
随着数据处理技术领域的发展,对大量数据进行数据处理的需求不断增多。在对大量数据进行数据处理时,需要对大量数据进行数据监控,以便在大量数据中发现数据丢失或数据不一致的问题。With the development of the field of data processing technology, the demand for data processing of large amounts of data is constantly increasing. When performing data processing on a large amount of data, it is necessary to perform data monitoring on the large amount of data in order to find data loss or data inconsistency in the large amount of data.
现有技术中,对数据的准确性监控通常更关注实时数据的准确性和数据链路的稳定性等,采取自动扩容、主备链路的方法提高数据链路的稳定性,而当主数据链路发生故障启动备数据链路时,可能会发生数据丢失,但现有技术中更关注实时数据的准确性,无法对丢失的数据进行定位,使数据监控的准确性降低。故,如何提高对数据进行监控的准确性是当前亟待解决的技术问题。In the existing technology, the accuracy monitoring of data usually pays more attention to the accuracy of real-time data and the stability of the data link, etc., and the method of automatic expansion and active and standby links is adopted to improve the stability of the data link, and when the main data link Data loss may occur when the standby data link is activated due to road failure. However, in the existing technology, more attention is paid to the accuracy of real-time data, and the lost data cannot be located, which reduces the accuracy of data monitoring. Therefore, how to improve the accuracy of data monitoring is an urgent technical problem to be solved at present.
发明内容Contents of the invention
本发明提供了一种数据准确性监控系统和方法,提高了对数据进行监控的准确性。The invention provides a data accuracy monitoring system and method, which improves the accuracy of data monitoring.
第一方面,本发明实施例提供了一种数据准确性监控系统,包括:In a first aspect, an embodiment of the present invention provides a data accuracy monitoring system, including:
数据统计模块,用于从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息;从生产系统所包括的实时数仓中确定与所述第一待统计表对应的第二待统计表的第二数据信息;The data statistics module is used to determine the first data information of the first statistics table in the tenant library to be counted from the database cluster included in the production system; The second data information of the second table to be counted corresponding to the table to be counted;
日志收集模块,用于实时采集数据链路上的每个节点所包括的日志信息;The log collection module is used to collect log information included in each node on the data link in real time;
数据分析模块,用于分别连接所述数据统计模块和所述日志收集模块;通过对比所述数据统计模块所确定的所述第一数据信息和所述第二数据信息,确定数据统计结果信息;通过对比所述日志收集模块所采集的各节点的日志信息,确定日志统计结果信息;The data analysis module is used to connect the data statistics module and the log collection module respectively; by comparing the first data information and the second data information determined by the data statistics module, determine the data statistics result information; By comparing the log information of each node collected by the log collection module, determine the log statistical result information;
其中,所述数据库集群的数量为一个或多个,各所述数据库集群中包括一个或多个租户库,各租户库中包括一个或多个第一待统计表;所述第二待统计表为所述第一待统计表经数据链路传输至所述实时数仓的表。Wherein, the number of the database clusters is one or more, and each of the database clusters includes one or more tenant databases, and each tenant database includes one or more first tables to be counted; the second table to be counted The first table to be counted is transmitted to the real-time data warehouse through a data link.
第二方面,本发明实施例提供了一种数据准确性监控方法,应用于如第一方面所述的数据准确性监控系统,包括:In the second aspect, an embodiment of the present invention provides a data accuracy monitoring method, which is applied to the data accuracy monitoring system described in the first aspect, including:
通过数据统计模块从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息;Determining the first data information of the first table to be counted in the tenant database to be counted from the database cluster included in the production system through the data statistics module;
通过数据统计模块从生产系统所包括的实时数仓中确定与所述第一待统计表对应的第二待统计表的第二数据信息;Determining the second data information of the second table to be counted corresponding to the first table to be counted from the real-time data warehouse included in the production system through the data statistics module;
通过日志收集模块实时采集数据链路上的每个节点所包括的日志信息;The log information included in each node on the data link is collected in real time through the log collection module;
通过数据分析模块对比所述数据统计模块所确定的所述第一数据信息和所述第二数据信息,确定数据统计结果信息;comparing the first data information and the second data information determined by the data statistics module through the data analysis module to determine the data statistics result information;
通过数据分析模块对比所述日志收集模块所采集的各节点的日志信息,确定日志统计结果信息。The log information of each node collected by the log collection module is compared by the data analysis module to determine the log statistical result information.
本发明实施例的技术方案,通过数据统计模块确定第一待统计表的第一数据信息,和与第一待统计表对应的第二待统计表的第二数据信息;通过日志收集模块实时采集数据链路上每个节点所包括的日志信息;通过数据分析模块对比第一数据信息和第二数据信息,确定数据统计结果信息,通过数据分析模块对比各节点的日志信息,确定日志统计结果信息,提高了对数据进行监控的准确性。In the technical scheme of the embodiment of the present invention, the first data information of the first table to be counted is determined by the data statistics module, and the second data information of the second table to be counted corresponding to the first table to be counted; real-time collection by the log collection module The log information included in each node on the data link; compare the first data information and the second data information through the data analysis module to determine the data statistical result information, and compare the log information of each node through the data analysis module to determine the log statistical result information , improving the accuracy of data monitoring.
应当理解,本部分所描述的内容并非旨在标识本发明的实施例的关键或重要特征,也不用于限制本发明的范围。本发明的其它特征将通过以下的说明书而变得容易理解。It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present invention, nor is it intended to limit the scope of the present invention. Other features of the present invention will be easily understood from the following description.
附图说明Description of drawings
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings that need to be used in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention. For those skilled in the art, other drawings can also be obtained based on these drawings without creative effort.
图1是根据本发明实施例一提供的一种数据准确性监控系统的结构示意图;FIG. 1 is a schematic structural diagram of a data accuracy monitoring system provided according to
图2是根据本发明实施例一提供的一种生产系统的结构示意图;Fig. 2 is a schematic structural diagram of a production system provided according to
图3是根据本发明实施例二提供的一种数据准确性监控系统的结构示意图;3 is a schematic structural diagram of a data accuracy monitoring system provided according to
图4是根据本发明实施例三提供的一种多租户库数据准确性监控系统的结构示意图;4 is a schematic structural diagram of a multi-tenant database data accuracy monitoring system provided according to Embodiment 3 of the present invention;
图5是根据本发明实施例三提供的一种数据统计子模块的结构示意图;Fig. 5 is a schematic structural diagram of a data statistics sub-module provided according to Embodiment 3 of the present invention;
图6是根据本发明实施例四提供的一种数据准确性监控方法的流程图。Fig. 6 is a flowchart of a data accuracy monitoring method according to Embodiment 4 of the present invention.
具体实施方式Detailed ways
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本发明一部分的实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都应当属于本发明保护的范围。In order to enable those skilled in the art to better understand the solutions of the present invention, the following will clearly and completely describe the technical solutions in the embodiments of the present invention in conjunction with the drawings in the embodiments of the present invention. Obviously, the described embodiments are only It is an embodiment of a part of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts shall fall within the protection scope of the present invention.
需要说明的是,本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”等是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。It should be noted that the terms "first" and "second" in the description and claims of the present invention and the above drawings are used to distinguish similar objects, but not necessarily used to describe a specific sequence or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion, for example, a process, method, system, product or device comprising a sequence of steps or elements is not necessarily limited to the expressly listed instead, may include other steps or elements not explicitly listed or inherent to the process, method, product or apparatus.
可以理解的是,在使用本发明各实施例公开的技术方案之前,均应当依据相关法律法规通过恰当的方式对本公开所涉及个人信息的类型、使用范围以及使用场景等告知用户并获得用户的授权。It can be understood that before using the technical solutions disclosed in the embodiments of the present invention, the user should be informed of the type, scope of use, and use scenarios of the personal information involved in this disclosure in an appropriate manner in accordance with relevant laws and regulations, and the authorization of the user should be obtained. .
实施例一Embodiment one
图1是根据本发明实施例一提供的一种数据准确性监控系统的结构示意图,本实施例可适用于实现数据的准确性监控的情况。如图1所示,一种数据准确性监控系统10,包括:FIG. 1 is a schematic structural diagram of a data accuracy monitoring system according to
数据统计模块11,用于从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息;从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息;The
日志收集模块12,用于实时采集数据链路上的每个节点所包括的日志信息;The
数据分析模块13,用于分别连接数据统计模块11和日志收集模块12;通过对比数据统计模块11所确定的第一数据信息和第二数据信息,确定数据统计结果信息;通过对比日志收集模块12所采集的各节点的日志信息,确定日志统计结果信息;The
其中,数据库集群的数量为一个或多个,各数据库集群中包括一个或多个租户库,各租户库中包括一个或多个第一待统计表;第二待统计表为第一待统计表经数据链路传输至实时数仓的表。Wherein, the number of database clusters is one or more, and each database cluster includes one or more tenant libraries, and each tenant library includes one or more first tables to be counted; the second table to be counted is the first table to be counted The table transmitted to the real-time data warehouse through the data link.
其中,生产系统可以是指在正常情况下支持日常业务运作的信息系统。图2是根据本发明实施例一提供的一种生产系统的结构示意图,本发明实施例中的数据准确性监控系统10可以对该生产系统中的数据进行准确性监控,从而发现生产系统中有数据丢失问题的租户库或表。Among them, the production system may refer to the information system that supports daily business operations under normal circumstances. Fig. 2 is a schematic structural diagram of a production system provided according to
如图2所示,生产系统中可以包括一个或多个数据库集群(如图2中最左侧的每个矩形框可以表示一个数据库集群),各数据库集群中可以包括一个或多个租户库(如图2中最左侧的任一矩形框中的每个圆柱型框可以表示一个租户库),各租户库中可以包括一个或多个数据库表(Table)。As shown in Figure 2, the production system may include one or more database clusters (each leftmost rectangular box in Figure 2 may represent a database cluster), and each database cluster may include one or more tenant libraries ( Each cylindrical box in any leftmost rectangular box in FIG. 2 may represent a tenant library), and each tenant library may include one or more database tables (Table).
生产系统中还可以包括数据链路,在数据链路上可以包含关系型数据库管理系统二进制日志消费者(Mysql Binlog Consumer)、存放二进制日志的消息队列中间件(Message Queue)和处理二进制日志的消息消费者(Message Consumer),这些组件对应的节点可以配备相应的日志并记录自身处理的数据记录。其中,Mysql Binlog Consumer可以用于指示租户库对应的Binlog,每个租户库可以有对应的Binlog;Message Queue可以用于存放Binlog,通过对Message Queue进行配置可以选择需要进行数据准确性分析的租户库和/或表;Message Consumer可以用于处理Binlog,如对Binlog进行格式转换等,MessageConsumer可以指示Binlog产生的时间或消息类型等信息。The production system can also include a data link, which can include a relational database management system binary log consumer (Mysql Binlog Consumer), a message queue middleware (Message Queue) for storing binary logs, and messages for processing binary logs on the data link Consumer (Message Consumer), the nodes corresponding to these components can be equipped with corresponding logs and record the data records processed by themselves. Among them, Mysql Binlog Consumer can be used to indicate the Binlog corresponding to the tenant library, and each tenant library can have a corresponding Binlog; Message Queue can be used to store Binlog, and the tenant library that needs to be analyzed for data accuracy can be selected by configuring the Message Queue And/or table; Message Consumer can be used to process Binlog, such as format conversion of Binlog, etc. Message Consumer can indicate information such as the time or message type generated by Binlog.
生产系统中还可以包括实时数仓,实时数仓更注重数据的实时性,原始的数据库集群中的租户库和/或表(即图2中最左侧的任一矩形框所示的租户库和/或表)可以经数据链路传输至实时数仓,实时数仓中可以有原始的数据库集群中的租户库和/或表,还可以有经数据链路中的节点处理后,与原始的数据库集群中的租户库和/或表相对应的租户库和/或表。The production system can also include a real-time data warehouse. The real-time data warehouse pays more attention to the real-time nature of the data. The tenant library and/or table in the original database cluster (that is, the tenant library shown in any leftmost rectangle in Figure 2 and/or table) can be transmitted to the real-time data warehouse through the data link, the real-time data warehouse can have the original tenant library and/or table in the database cluster, and there can also be processed by the nodes in the data link, and the original The tenant library and/or table corresponding to the tenant library and/or table in the database cluster.
数据统计模块11,用于从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息。The
其中,待统计的租户库可以是指,生产系统的数据库集群中待进行数据统计的一个或多个租户库。第一待统计表可以是指任一租户库中待进行数据统计的任一数据库表,各租户库中均可以包括一个或多个第一待统计表。对第一待统计表的数量以及第一待统计表所在的租户库不作限定,具体可以根据实际应用需要在待统计的租户库中选择第一待统计表。Wherein, the tenant database to be counted may refer to one or more tenant databases to be counted in the database cluster of the production system. The first table to be counted may refer to any database table to be counted in any tenant database, and each tenant database may include one or more first tables to be counted. The number of the first tables to be counted and the tenant database in which the first tables to be counted are not limited. Specifically, the first table to be counted can be selected from the tenant database to be counted according to actual application needs.
第一数据信息可以是指第一待统计表中的数据所指示的信息。如,第一数据信息可以包括数据量和业务指标。数据量可以是第一待统计表中的数据的总量,如第一待统计表中所记录数据的总行数。业务指标可以是根据实际应用需要所选择的与业务场景相关的指标,如销售额或销售数量等指标。The first data information may refer to the information indicated by the data in the first table to be counted. For example, the first data information may include data volume and service indicators. The amount of data may be the total amount of data in the first table to be counted, such as the total number of rows of data recorded in the first table to be counted. The business indicator may be an indicator related to a business scenario selected according to actual application needs, such as an indicator such as sales or sales quantity.
数据统计模块11确定待统计的租户库中第一待统计表的第一数据信息的方式不作限定,只要能够确定待统计的租户库中第一待统计表的第一数据信息即可。对于任一第一待统计表,数据统计模块11可以提供第一待统计表接入数据准确性监控系统10的接口,使第一待统计表可以通过数据统计模块11提供的接口接入至数据准确性监控系统10;当第一待统计表接入数据准确性监控系统10时,数据统计模块11可以通过count()函数将第一待统计表所包括的数据的总行数抽象化为数据量,通过sum()函数将第一待统计表所包括数据的累加值抽象化为业务指标;再通过编程语句对抽象化后的数据进行处理,数据统计模块11执行编程语句即可确定第一待统计表对应的数据量和业务指标,即确定第一待统计表的第一数据信息。其中,编程语句可以根据实际应用需要设定,如结构化查询语言(Structured Query Language,SQL)。The manner in which the
数据统计模块11,用于从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息。The
第二待统计表处在生产系统所包括的实时数仓中,第二待统计表可以是指第一待统计表经数据链路传输至实时数仓的表。在第一待统计表经数据链路传输至实时数仓的过程中,数据链路中的组件对应的节点可以配备相应的日志并记录自身处理的数据记录。The second table to be counted is located in the real-time data warehouse included in the production system, and the second table to be counted may refer to the table that the first table to be counted is transmitted to the real-time data warehouse through a data link. During the process of transmitting the first table to be counted to the real-time data warehouse through the data link, the nodes corresponding to the components in the data link can be equipped with corresponding logs and record the data records processed by themselves.
当用户对第一待统计表有修改等操作时,第一待统计表可以被数据链路中的节点处理,如记录用户对第一待统计表的修改操作等,再将处理后的第一待统计表传输至实时数仓,即为实时数仓中的第二待统计表,第二待统计表即为当前时刻对第一待统计表进行修改操作之后的数据库表。When the user has operations such as modifying the first table to be counted, the first table to be counted can be processed by nodes in the data link, such as recording the user's modification operation on the first table to be counted, and then the processed first table to be counted The table to be counted is transmitted to the real-time data warehouse, which is the second table to be counted in the real-time data warehouse, and the second table to be counted is the database table after the modification operation on the first table to be counted at the current moment.
第二数据信息可以是指第二待统计表中的数据所指示的信息。如,第二数据信息可以包括数据量和业务指标。数据量可以是第二待统计表中的数据的总量,如第二待统计表中所记录数据的总行数。业务指标可以是根据实际应用需要所选择的与业务场景相关的指标,如销售额或销售数量等指标。The second data information may refer to the information indicated by the data in the second table to be counted. For example, the second data information may include data volume and service indicators. The amount of data may be the total amount of data in the second table to be counted, such as the total number of rows of data recorded in the second table to be counted. The business indicator may be an indicator related to a business scenario selected according to actual application needs, such as an indicator such as sales or sales quantity.
数据统计模块11从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息的方式不作限定,只要能够确定第二待统计表的第二数据信息即可。如数据统计模块11确定第二待统计表的第二数据信息的方式,可以与确定第一待统计表的第一数据信息的方式相同,此处不再赘述。The
日志收集模块12,用于实时采集数据链路上的每个节点所包括的日志信息。The
其中,数据链路上的每个节点可以是指Mysql Binlog Consumer、Message Queue和Message Consumer这些组件对应的节点,如对应二进制日志消费者的第一节点、对应存放二进制日志的消息队列中间件的第二节点和处理二进制日志的消息消费者的第三节点,这些节点可以配备相应的日志并记录自身处理的数据记录。Wherein, each node on the data link may refer to the nodes corresponding to components such as Mysql Binlog Consumer, Message Queue, and Message Consumer, such as the first node corresponding to the binary log consumer, and the first node corresponding to the message queue middleware storing the binary log. The second node and the third node that handles the message consumer of the binary log, these nodes can be equipped with corresponding logs and record the data records processed by themselves.
每个节点所包括的日志信息可以是指各节点配备的日志所指示的信息。日志信息可以包括每个节点处理或存放的数据的关键信息,如表中记录的主键值。日志信息所指示的日志可以保持统一的格式和目录结构。The log information included in each node may refer to the information indicated by the log provided by each node. The log information can include key information of the data processed or stored by each node, such as the primary key value recorded in the table. The logs indicated by the log information can maintain a unified format and directory structure.
可选的,日志收集模块12还可以实时采集对应实时数仓的第四节点的日志信息。Optionally, the
日志收集模块12实时采集数据链路上的每个节点所包括的日志信息,或实时采集对应实时数仓的第四节点的日志信息的方式不作限定,如可以通过日志文件托运工具Filebeat来采集日志信息,Filebeat可以监控各节点的日志,追踪读取各节点日志对应的文件。The
数据分析模块13,用于分别连接数据统计模块11和日志收集模块12。The
数据分析模块13分别连接数据统计模块11和日志收集模块12的方式不作限定,只要能够使数据统计模块11所确定的第一数据信息和第二数据信息,和日志收集模块12所采集的各节点的日志信息能够传输至数据分析模块13即可。如可以是通过电连接,或通过网络连接等。The way that
数据统计结果信息可以是指对比第一数据信息和第二数据信息的结果所指示的信息,数据统计结果信息可以指示由第一待统计表到第二待统计表未发生数据丢失或发生数据丢失。The data statistical result information may refer to the information indicated by the result of comparing the first data information and the second data information, and the data statistical result information may indicate that no data loss or data loss occurs from the first table to be counted to the second table to be counted .
数据分析模块13用于通过对比数据统计模块11所确定的第一数据信息和第二数据信息,确定数据统计结果信息。如,通过结构化查询语言(Structured Query Language,SQL)对比当前时刻确定的第一数据信息所包括的数据量和第二数据信息所包括的数据量,或对比当前时刻确定的第一数据信息所包括的业务指标和第二数据信息所包括的业务指标,当第一数据信息所包括的数据量和业务指标与第二数据信息所包括数据量和业务指标相同时,确定数据统计结果信息指示未发生数据丢失;反之,确定数据统计结果信息指示发生数据丢失。The
又如,数据分析模块13通过趋势分析法对比数据统计模块11所确定的第一数据信息和第二数据信息,确定数据统计结果信息。趋势分析法可以是指通过对第一数据信息或第二数据信息各时间片段的变化趋势的分析,从中发现数据丢失问题的方法,通过趋势分析法可以处理实时系统无法保持静止状态而导致统计误差的分析问题。As another example, the
趋势分析法根据需要按时间长度将数据划分为片段,比如最近一周按天数可划分为7个时间片段,每个时间片段的时长为1天,每个时间片段内都会确定一次第一数据信息和第二数据信息,当第8天执行统计的时候,可根据第8天确定的第一数据信息和第二数据信息,结合前7天时间片段里所确定第一数据信息和第二数据信息,确定第8天对应的数据统计结果信息。且当数据准确性监控系统10第一次启动时,如果确定一个一致的基线,使第一数据信息和第二数据信息一致,则可以通过趋势分析法回溯到出现数据异常的时间片段,该时间片段的数据统计结果信息指示发生数据丢失。The trend analysis method divides the data into segments according to the length of time as needed. For example, the latest week can be divided into 7 time segments according to the number of days. The duration of each time segment is 1 day. The first data information and The second data information, when statistics are performed on the 8th day, can be based on the first data information and the second data information determined on the 8th day, combined with the first data information and the second data information determined in the time segment of the previous 7 days, Determine the data statistics result information corresponding to the 8th day. And when the data
日志统计结果信息可以是指对比各节点的日志信息的结果所指示的信息,日志统计结果信息可以指示在数据链路上出现数据不一致问题的节点。The log statistical result information may refer to the information indicated by the result of comparing the log information of each node, and the log statistical result information may indicate a node having a data inconsistency problem on the data link.
数据分析模块13通过对比日志收集模块12所采集的各节点的日志信息,确定日志统计结果信息。如,通过数据分析模块13对比业务主键、业务发生时间和binlog类型等来发现在数据链路上出现数据不一致问题的节点,将出现数据不一致问题的节点确定为日志统计结果信息。The
需要说明的是,数据分析模块13确定数据统计结果信息和日志统计结果信息的频率不作限定,如数据分析模块13每天确定一次数据统计结果信息,或数据分析模块13每天确定一次日志统计结果信息,或数据分析模块13每小时确定一次日志统计结果信息。It should be noted that the frequency of the
本发明实施例的技术方案,通过数据统计模块确定第一待统计表的第一数据信息,和与第一待统计表对应的第二待统计表的第二数据信息;通过日志收集模块实时采集数据链路上每个节点所包括的日志信息;通过数据分析模块对比第一数据信息和第二数据信息,确定数据统计结果信息,通过数据分析模块对比各节点的日志信息,确定日志统计结果信息。通过数据统计结果信息,可以指示未发生数据丢失或发生数据丢失,当发生数据丢失时,通过趋势分析法可以确定发生数据丢失的时间片段;通过日志统计结果信息,可以指示在数据链路上出现数据不一致问题的节点;即当发生数据丢失时,可以对丢失的数据进行定位,提高了对数据进行监控的准确性。In the technical scheme of the embodiment of the present invention, the first data information of the first table to be counted is determined by the data statistics module, and the second data information of the second table to be counted corresponding to the first table to be counted; real-time collection by the log collection module The log information included in each node on the data link; compare the first data information and the second data information through the data analysis module to determine the data statistical result information, and compare the log information of each node through the data analysis module to determine the log statistical result information . Through the statistical result information of the data, it can indicate that no data loss has occurred or that data loss has occurred. When data loss occurs, the time segment of data loss can be determined through the trend analysis method; through the statistical result information of the log, it can be indicated that there is a The node of data inconsistency; that is, when data loss occurs, the lost data can be located, which improves the accuracy of data monitoring.
进一步的,数据准确性监控系统10还包括:Further, the data
可视化处理模块,用于连接数据分析模块13;对数据分析模块13确定的数据统计结果信息和/或日志统计结果信息可视化处理。The visualization processing module is used for connecting to the
可视化处理模块连接数据分析模块13的方式可以是通过电连接或网络连接等,数据分析模块13可以将数据统计结果信息和/或日志统计结果信息传输至可视化处理模块,使可视化处理模块对数据统计结果信息和/或日志统计结果信息可视化处理。The way that the visualization processing module is connected to the
可视化处理模块对数据统计结果信息和/或日志统计结果信息可视化处理的方式不作限定,如可视化处理模块按照数据统计结果信息和/或日志统计结果信息生成分析报告,并按照数据库集群、租户库和数据库表将分析报告展示在显示屏上。其中,分析报告可以指示出现数据丢失问题的数据库集群、租户库和数据库表。显示屏可以是监控大盘。The visual processing module does not limit the way of visual processing of data statistical result information and/or log statistical result information. For example, the visual processing module generates analysis reports according to data statistical result information and/or log statistical result information, and according to database clusters, tenant libraries and The database table presents the analysis report on the display. Among them, the analysis report can indicate the database clusters, tenant libraries and database tables that have data loss problems. The display screen can be a monitoring panel.
同时,可视化处理模块可以结合不同时间片段确定的数据统计结果信息和/或日志统计结果信息,按时间维度绘制数据趋势曲线,来展示数据的准确性。其中,数据趋势曲线可以是指不同时间片段对应的数据形成的曲线,如数据趋势曲线可以指示每个时间片段下的发生数据丢失问题的数据库集群、租户库和数据库表的数量等。At the same time, the visualization processing module can combine the data statistical result information and/or log statistical result information determined in different time segments, and draw a data trend curve according to the time dimension to show the accuracy of the data. Wherein, the data trend curve may refer to a curve formed by data corresponding to different time segments. For example, the data trend curve may indicate the number of database clusters, tenant libraries, and database tables where data loss occurs in each time segment.
进一步的,数据准确性监控系统10还可以包括:Further, the data
异常告警模块,异常告警模块可以用于,将发生数据丢失问题的数据库集群、租户库和数据库表通过告警的方式发送给运维和开发人员进行处理。Abnormal alarm module, the abnormal alarm module can be used to send the database cluster, tenant library and database table with data loss problem to the operation and maintenance and development personnel for processing by means of alarm.
实施例二Embodiment two
图3是根据本发明实施例二提供的一种数据准确性监控系统的结构示意图,本实施例是在上述实施例一的基础上,对数据统计模块11和日志收集模块12的具体结构进行说明,未在本实施例中详细描述的技术细节可参见上述实施例一。Fig. 3 is a schematic structural diagram of a data accuracy monitoring system provided according to
在本发明实施例中,数据统计模块11包括:In the embodiment of the present invention, the
数据统计子模块111和数据存储子模块112;Data statistics sub-module 111 and
数据统计子模块111用于,从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息;从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息;The data statistics sub-module 111 is used to determine the first data information of the first table to be counted in the tenant database to be counted from the database cluster included in the production system; determine the first data information from the real-time data warehouse included in the production system The second data information of the second table to be counted corresponding to the table to be counted;
数据存储子模块112用于,分别连接数据统计子模块111和数据分析模块13;存储数据统计子模块111所确定的第一数据信息和第二数据信息;将第一数据信息和第二数据信息发送至数据分析模块13。The
数据存储子模块112分别连接数据统计子模块111和数据分析模块13的方式不作限定,如可以是通过电连接,或通过网络连接等。The manner in which the
数据存储子模块112连接数据统计子模块111,数据统计子模块111确定第一数据信息和第二数据信息后,数据统计子模块111可以将第一数据信息和第二数据信息传输至数据存储子模块112,数据存储子模块112接收并存储第一数据信息和第二数据信息。The
数据存储子模块112可以是在线分析处理数据库(OLAP DB),其中,OLAP是在线分析处理,OLAP可以用于数据分析,通过OLAP能够同时分析来自多个数据库集群的信息,即可以通过OLAP可以提取所需数据并查询数据,以便从不同的角度进行数据分析。The
数据存储子模块112连接数据分析模块13,数据存储子模块112可以将第一数据信息和第二数据信息发送至数据分析模块13,以便数据分析模块13根据第一数据信息和第二数据信息确定数据统计结果信息。The
进一步的,数据统计子模块111包括:Further, the data statistics sub-module 111 includes:
服务接口单元1111,用于为第一待统计表和第二待统计表提供接入系统的接口。The service interface unit 1111 is configured to provide an interface for accessing the system for the first table to be counted and the second table to be counted.
服务接口单元1111可以为第一待统计表和第二待统计表提供接入系统的接口,即生产系统中所包括的第一待统计表和第二待统计表可以通过服务接口单元1111接入至数据准确性监控系统10,使数据准确性监控系统10可以获取第一待统计表和第二待统计表,并对第一待统计表和第二待统计表作后续处理,如确定第一待统计表的第一数据信息、确定第二待统计表的第二数据信息或对比第一数据信息和第二数据信息等。相应的,第一待统计表或第二待统计表所在的租户库或数据库集群也可以通过服务接口单元1111提供的接口接入至数据准确性监控系统10。The service interface unit 1111 can provide an interface for accessing the first table to be counted and the second table to be counted to be counted, that is, the first table to be counted and the second table to be counted to be counted in the production system can be accessed through the service interface unit 1111 To the data
任务队列单元1112,用于存放等待处理的统计需求任务,统计需求任务为确定第一待统计表的第一数据信息,和/或确定第二待统计表的第二数据信息。The
统计需求任务可以是指需要对任一数据库表中的数据进行统计的任务,如统计需求任务可以是确定第一待统计表的第一数据信息,和/或确定第二待统计表的第二数据信息。对统计需求任务的数量不作限定,具体可以根据实际应用需要来确定。Statistical demand task can refer to the task that needs to carry out statistics to the data in any database table, as statistical demand task can be to determine the first data information of the first table to be counted, and/or determine the second data information of the second table to be counted. Data information. There is no limit to the number of statistical demand tasks, which can be determined according to actual application needs.
当第一待统计表或第二待统计表通过服务接口单元1111接入数据准确性监控系统10时,表明需要确定第一待统计表的第一数据信息,或需要确定第二待统计表的第二数据信息,每个第一待统计表可以对应一个统计需求任务,每个第二待统计表也可以对应一个统计需求任务,各统计需求任务均可以存放在任务队列单元1112,在任务队列单元1112等待被处理。When the first table to be counted or the second table to be counted is connected to the data
租户库管理单元1113,用于从生产系统中同步各数据库集群中各租户库的配置,并将同一数据库集群中的各租户库分为同一组。The tenant
生产系统中的各租户库在物理分布上是有组织的,同时又在根据软件即服务(Software as a Service,SaaS)业务的发展持续调整,租户库管理单元1113可以从生产系统中同步各数据库集群中各租户库的配置,并将同一数据库集群中的各租户库分为同一组,即租户库管理单元1113可以识别生产系统中每个数据库集群中的租户库,并将接入至数据准确性监控系统10的属于同一数据库集群的所有租户库分为同一组。Each tenant library in the production system is organized in physical distribution, and at the same time is continuously adjusted according to the development of the software as a service (Software as a Service, SaaS) business. The tenant
业务抽象单元1114,用于对第一待统计表和/或第二待统计表中的数据进行抽象处理。The
对第一待统计表和/或第二待统计表中的数据进行抽象处理的方式不作限定,如在后续需要通过任务执行单元1116确定第一数据信息和/或第二数据信息,且第一数据信息和/或第二数据信息包括数据量和业务指标时,则可以通过业务抽象单元1114将第一待统计表和/或第二待统计表中的数据抽象为数据量和业务指标。The way of abstracting the data in the first to-be-statistics table and/or the second to-be-statistics table is not limited, for example, the
在第一待统计表和/或第二待统计表中,可以通过count()函数将第一待统计表和/或第二待统计表所包括的数据的总行数作为数据量;可以通过sum()函数将第一待统计表和/或第二待统计表所包括的数据的累加值作为业务指标,如可以将第一待统计表和/或第二待统计表中记录的所有销售额的累加值作为业务指标。In the first table to be counted and/or the second table to be counted, the total number of rows of data included in the first table to be counted and/or the second table to be counted can be used as the amount of data through the count() function; The () function takes the cumulative value of the data included in the first table to be counted and/or the second table to be counted as a business index, such as all sales recorded in the first table to be counted and/or the second table to be counted The cumulative value of is used as a business indicator.
通过业务抽象单元1114可以实现对第一待统计表和/或第二待统计表中数据的符号化,在后续通过提供第一待统计表和/或第二待统计表以及统计数据时所需的函数,即可自动实现数据量和业务指标的统计。The symbolization of data in the first table to be counted and/or the second table to be counted to be counted can be realized through the
任务执行管理单元1115,用于管理并发处理的统计需求任务,和/或管理连接系统的租户库的数量。The task execution management unit 1115 is configured to manage concurrently processed statistical demand tasks, and/or manage the number of tenant databases connected to the system.
任务执行管理单元1115用于管理并发处理的统计需求任务,即任务执行管理单元1115能够管理并发处理的统计需求任务的数量,在需要进行统计的租户库增多时,扩展相应的线程的数量,使更多租户库的数据统计能够并发处理,即更多的统计需求任务能够并发处理,进而在生产系统中的数据持续变化时,不会因统计时间过长导致所统计的数据产生误差,保证了统计数据的时效性。The task execution management unit 1115 is used to manage the statistically required tasks of concurrent processing, that is, the task execution management unit 1115 can manage the number of concurrently processed statistically required tasks. The data statistics of more tenant databases can be processed concurrently, that is, more statistical demand tasks can be processed concurrently, so that when the data in the production system continues to change, errors in the statistical data will not be caused by too long statistical time, ensuring Timeliness of statistics.
任务执行管理单元1115用于管理连接系统的租户库的数量,当需要对租户库中的第一待统计表或第二待统计表中的数据进行统计时,表明该租户库存在对应的统计需求任务,使第一待统计表或第二待统计表所在的租户库连接至数据准确性监控系统10。相应的,当租户库中不存在第一待统计表或第二待统计表时,表明该租户库中没有需要进行数据统计的数据库表,即不存在对应的统计需求任务,则使该租户库断开与数据准确性监控系统10的连接,这样可以节省数据准确性监控系统10所运行的环境资源,同时节省租户库的连接开销。The task execution management unit 1115 is used to manage the number of tenant databases connected to the system. When it is necessary to perform statistics on the data in the first to-be-statistics table or the second to-be-statistics table in the tenant database, it indicates that the tenant database has corresponding statistics requirements The task is to connect the tenant library where the first table to be counted or the second table to be counted to be counted to the data
任务执行单元1116,用于将业务抽象单元1114抽象处理后的数据具体化为可执行的语句,并确定第一数据信息和/或第二数据信息。The
任务执行单元1116,用于将业务抽象单元1114抽象处理后的数据具体化为可执行的语句,并确定第一数据信息和/或第二数据信息,可以理解为,任务执行单元1116可以通过编程语句对业务抽象单元1114抽象处理后的数据进行处理,通过任务执行单元1116执行编程语句即可确定第一数据信息和/或第二数据信息。其中,编程语句可以根据实际应用需要设定,如结构化查询语言(Structured Query Language,SQL)。The
结果输出单元1117,用于将第一数据信息和/或第二数据信息发送至数据存储子模块112。The
在结果输出单元1117中,可以通过缓冲队列存放第一数据信息和/或第二数据信息,在需要时将缓冲队列中存放的第一数据信息和/或第二数据信息发送至数据存储子模块112。如可以是通过定时脚本将缓冲队列中存放的第一数据信息和/或第二数据信息批量写入到数据存储子模块112,其中,定时脚本可以是根据实际应用需要设定的脚本,具体不作限定,只要能够将第一数据信息和/或第二数据信息发送至数据存储子模块112即可。In the
进一步的,任务执行管理单元1115具体用于:Further, the task execution management unit 1115 is specifically used for:
通过线程管理器集中管理并发处理的统计需求任务,并发处理的统计需求任务的数量与租户库的数量呈线性关系;The concurrently processed statistical demand tasks are centrally managed through the thread manager, and the number of concurrently processed statistical demand tasks is linearly related to the number of tenant libraries;
通过数据库连接管理器结合租户库管理单元1113,管理连接系统的租户库的数量。The database connection manager is combined with the tenant
通过线程管理器集中管理并发处理的统计需求任务,并发处理的统计需求任务的数量与租户库的数量呈线性关系,可以理解为,在租户库的数量增多时,通过线程管理器扩展相应的线程的数量,使更多的统计需求任务能够并发处理。其中,并发处理能够顺应生产系统中租户库的线性扩展,而相应的扩展线程数可以保证统计数据的时效性,不会导致统计时间过长时由于生产系统中的数据持续变化而带来误差。The concurrently processed statistical demand tasks are centrally managed through the thread manager, and the number of concurrently processed statistical demand tasks is linearly related to the number of tenant libraries. It can be understood that when the number of tenant libraries increases, the corresponding threads are expanded through the thread manager so that more tasks with statistical requirements can be processed concurrently. Among them, concurrent processing can comply with the linear expansion of the tenant library in the production system, and the corresponding expansion of the number of threads can ensure the timeliness of statistical data, and will not cause errors due to continuous changes in the data in the production system when the statistical time is too long.
相应的,在租户库的数量减少时,通过线程管理器减少相应的线程的数量,使更少的统计需求任务能够并发处理,进而节省数据准确性监控系统10所运行的环境资源,同时节省租户库的连接开销。Correspondingly, when the number of tenant libraries decreases, the number of corresponding threads is reduced through the thread manager, so that fewer statistical demand tasks can be processed concurrently, thereby saving the environment resources run by the data
通过数据库连接管理器结合租户库管理单元1113,管理连接系统的租户库的数量,可以理解为,对连接系统的租户库的数量进行限流处理,可以通过HashMap数据结构记录每个需要执行统计需求任务的租户库,并通过租户库管理单元1113所记录的生产系统中各数据库集群中各租户库的配置,将处于同一数据库集群中的需要执行统计需求任务的租户库识别为同一组。对于任一数据库集群,可以将HashMap数据结构对应的键(Key)作为该数据库集群的标识,将值(Value)作为该数据库集群中连接数据准确性监控系统10的租户库的数量,通过数据库集群的标识即可确定该数据库集群中连接数据准确性监控系统10的租户库的数量。在确定数据库集群的标识后,即可通过数据库连接管理器使该数据库集群中,值(Value)对应数量的租户库连接至数据准确性监控系统10,则实现了通过数据库连接管理器结合租户库管理单元1113,管理连接系统的租户库的数量。Through the database connection manager combined with the tenant
其中,对数据库集群的标识不作限定,如可以是数字编号,每个数据库集群可以有与之唯一对应的数字编号,以便区分不同的数据库集群。限流处理可以使统计需求任务的执行不会影响生产系统的性能。Wherein, the identification of the database cluster is not limited, for example, it may be a digital number, and each database cluster may have a unique corresponding digital number, so as to distinguish different database clusters. The current limiting process can make the execution of statistical demand tasks not affect the performance of the production system.
进一步的,任务执行管理单元1115具体用于:Further, the task execution management unit 1115 is specifically used for:
通过数据库连接管理器释放没有统计需求任务的租户库与系统的连接;Use the database connection manager to release the connection between the tenant library and the system without statistical demand tasks;
在没有统计需求任务时,采用线程管理器释放线程。When there is no statistical demand task, the thread manager is used to release the thread.
在租户库中没有统计需求任务时,采用线程管理器释放线程,通过数据库连接管理器释放该租户库与数据准确性监控系统10的连接;在租户库中有统计需求任务时,采用线程管理器使线程保持活跃状态,通过数据库连接管理器使该租户库与数据准确性监控系统10保持连接,进而节省数据准确性监控系统10所运行的环境资源,同时节省租户库的连接开销。When there is no statistical requirement task in the tenant library, the thread manager is used to release the thread, and the connection between the tenant library and the data
进一步的,数据分析模块13具体用于:Further, the
在历史时间段内,将历史时间段内每个历史时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为历史时间片段对应的第一误差;In the historical time period, the absolute value of the difference between the value corresponding to the first data information and the second data information corresponding to each historical time segment in the historical time period is determined as the first error corresponding to the historical time segment ;
根据各历史时间片段对应的第一误差,确定历史时间段对应的第一均值和第一均方差;According to the first error corresponding to each historical time segment, determine the first mean value and the first mean square error corresponding to the historical time segment;
将当前时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为当前时间片段对应的第二误差;determining the absolute value of the difference between the value corresponding to the first data information corresponding to the current time segment and the value corresponding to the second data information as the second error corresponding to the current time segment;
将第二误差与第一均值的差值的绝对值,确定为当前时间片段相对于历史时间段的第三误差;The absolute value of the difference between the second error and the first mean value is determined as the third error of the current time segment relative to the historical time segment;
若第三误差不超过第一均方差,则确定数据统计结果信息指示未发生数据丢失;若第三误差超过第一均方差,则确定数据统计结果信息指示发生数据丢失;If the third error does not exceed the first mean square error, then determine that the data statistical result information indicates that no data loss has occurred; if the third error exceeds the first mean square error, then determine that the data statistical result information indicates that data loss occurs;
其中,历史时间段内包括多个相同时长的历史时间片段,当前时间片段与历史时间片段的时长相同。Wherein, the historical time segment includes multiple historical time segments of the same duration, and the current time segment and the historical time segment have the same duration.
数据分析模块13对比第一数据信息和第二数据信息,确定数据统计结果信息时,可以通过趋势分析法来实现,通过当前时间片段确定的第一数据信息和第二数据信息,结合历史时间段内每个历史时间片段确定的第一数据信息和第二数据信息,确定当前时间片段对应的数据统计结果信息。When the
以历史时间段为1周为例,一周按天数可划分为7个历史时间片段,每个历史时间片段的时长为1天,历史时间段内包括7个相同时长的历史时间片段。当前时间片段的时长也为1天,当前时间片段可以是在历史时间段之后的时间片段。Taking the historical time period as 1 week as an example, a week can be divided into 7 historical time segments according to the number of days, each historical time segment is 1 day long, and the historical time segment includes 7 historical time segments of the same duration. The duration of the current time segment is also 1 day, and the current time segment may be a time segment after the historical time segment.
具体的,在历史时间段内,将历史时间段内每个历史时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为历史时间片段对应的第一误差。Specifically, within the historical time period, the absolute value of the difference between the value corresponding to the first data information and the second data information corresponding to each historical time segment in the historical time period is determined as the value corresponding to the historical time segment first error.
历史时间段内每个历史时间片段对应的第一数据信息所对应数值,可以按时间先后依次表示为:The value corresponding to the first data information corresponding to each historical time segment in the historical time period can be expressed in order of time as:
r1,r2,r3,r4,r5,r6,r7 r 1 ,r 2 ,r 3 ,r 4 ,r 5 ,r 6 ,r 7
历史时间段内每个历史时间片段对应的第二数据信息所对应数值,可以按时间先后依次表示为:The value corresponding to the second data information corresponding to each historical time segment in the historical time period can be expressed in order of time as:
w1,w3,w3,w4,w5,w6,w7 w 1 ,w 3 ,w 3 ,w 4 ,w 5 ,w 6 ,w 7
则各历史时间片段对应的第一误差可以按时间先后依次表示为:Then the first error corresponding to each historical time segment can be expressed in order of time as:
δ1=|w1-r1|,δ2=|w2-r2|,δ3=|w3-r3|,......,δ7=|w7-r7|δ 1 =|w 1 -r 1 |, δ 2 =|w 2 -r 2 |, δ 3 =|w 3 -r 3 |,...,δ 7 =|w 7 -r 7 |
根据各历史时间片段对应的第一误差,确定历史时间段对应的第一均值和第一均方差。According to the first error corresponding to each historical time segment, a first mean value and a first mean square error corresponding to the historical time segment are determined.
历史时间段对应的第一均值的计算公式为:The formula for calculating the first mean value corresponding to the historical time period is:
历史时间段对应的第一均方差的计算公式为:The formula for calculating the first mean square error corresponding to the historical time period is:
将当前时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为当前时间片段对应的第二误差。当前时间片段对应的第一数据信息所对应数值可以表示为r8,当前时间片段对应的第二数据信息所对应数值可以表示为w8,则第二误差的计算公式为:The absolute value of the difference between the value corresponding to the first data information corresponding to the current time segment and the value corresponding to the second data information is determined as the second error corresponding to the current time segment. The value corresponding to the first data information corresponding to the current time segment can be expressed as r 8 , and the value corresponding to the second data information corresponding to the current time segment can be expressed as w 8 , then the calculation formula of the second error is:
δ8=|w8-r8|δ 8 =|w 8 -r 8 |
将第二误差与第一均值的差值的绝对值,确定为当前时间片段相对于历史时间段的第三误差,则第三误差的计算公式为:The absolute value of the difference between the second error and the first mean value is determined as the third error of the current time segment relative to the historical time segment, then the calculation formula of the third error is:
若第三误差不超过第一均方差,则确定数据统计结果信息指示未发生数据丢失,其中,第三误差不超过第一均方差可以表示为:If the third error does not exceed the first mean square error, it is determined that the statistical result information of the data indicates that no data loss occurs, wherein the third error does not exceed the first mean square error can be expressed as:
σ8≤σσ 8 ≤ σ
若第三误差超过第一均方差,则确定数据统计结果信息指示发生数据丢失,其中,第三误差超过第一均方差可以表示为:If the third error exceeds the first mean square error, it is determined that the data statistical result information indicates that data loss occurs, wherein the third error exceeds the first mean square error and can be expressed as:
σ8>σσ 8 >σ
在本发明实施例中,日志收集模块12包括:In the embodiment of the present invention, the
日志收集工具121,用于收集对应二进制日志消费者的第一节点的日志信息;收集对应存放二进制日志的消息队列中间件的第二节点的日志信息;收集对应处理二进制日志的消息消费者的第三节点的日志信息;收集对应实时数仓的第四节点的日志信息;The
日志存储工具122,用于分别连接日志收集工具121和数据分析模块13;存储日志收集工具121收集的每个节点所包括的日志信息;将日志信息发送至数据分析模块13。The
日志收集工具121可以是日志文件托运工具Filebeat,通过Filebeat可以收集第一节点、第二节点、第三节点和第四节点的日志信息。The
日志存储工具122分别连接日志收集工具121和数据分析模块13的方式可以是电连接或网络连接等。The manner in which the
日志存储工具122连接日志收集工具121,可以获取并存储日志收集工具121传输的各节点的日志信息。日志存储工具122可以是支持分布式搜索和分析引擎,比如ElasticSearch,以同时支持准确性比对和问题定位能力。日志信息的存贮可以进行生命周期自动化管理,比如Elasticsearch的索引模板和lifecycle功能,从而降低存储成本和运维成本。The
日志存储工具122连接数据分析模块13,日志存储工具122可以将每个节点所包括的日志信息发送至数据分析模块13,以便数据分析模块13对比各节点的日志信息,确定日志统计结果信息。The
进一步的,数据分析模块13具体用于:Further, the
以设定频率对比日志收集模块12所收集的各节点所包括的日志信息,确定数据链路上出现数据不一致问题的问题节点;Contrast the log information included in each node collected by the
将问题节点确定为日志统计结果信息。Determine the problem node as log statistics result information.
设定频率可以是指确定日志统计结果信息的频率,可以根据实际应用需要确定设定频率,如设定频率可以是每小时。问题节点可以是指数据链路上出现数据不一致问题的节点。The set frequency may refer to the frequency of determining the log statistical result information, and the set frequency may be determined according to actual application needs, for example, the set frequency may be every hour. The problem node may refer to a node where data inconsistency occurs on the data link.
数据分析模块13可以每小时对比一次各节点所包括的日志信息,如对比日志信息所指示的业务主键、业务发生时间和binlog类型等,来发现在数据链路上出现数据不一致问题的问题节点,将出现数据不一致问题的问题节点确定为日志统计结果信息。The
本发明实施例的技术方案,对数据统计模块和日志收集模块的具体结构进行说明,以及对数据分析模块的具体作用进行说明。数据分析模块可以通过历史时间段内每个历史时间片段对应的第一数据信息和第二数据信息,以及当前时间片段对应的第一数据信息和第二数据信息,确定当前时间片段对应的数据统计结果信息;通过选择不同的当前时间片段和历史时间片段,可以进一步确定发生数据丢失的时间片段;数据分析模块可以通过设定频率对比各节点所包括的日志信息,确定日志统计结果信息,日志统计结果信息指示在数据链路上出现数据不一致问题的节点;即当发生数据丢失时,可以对丢失的数据进行定位,提高了对数据进行监控的准确性。The technical solutions of the embodiments of the present invention describe the specific structures of the data statistics module and the log collection module, and the specific functions of the data analysis module. The data analysis module can determine the data statistics corresponding to the current time segment through the first data information and the second data information corresponding to each historical time segment in the historical time segment, and the first data information and the second data information corresponding to the current time segment Result information; By selecting different current time segments and historical time segments, you can further determine the time segment where data loss occurs; the data analysis module can compare the log information included in each node by setting the frequency, determine the log statistical result information, log statistics The result information indicates the node where data inconsistency occurs on the data link; that is, when data loss occurs, the lost data can be located, which improves the accuracy of data monitoring.
实施例三Embodiment Three
本发明实施例是对上述各实施例的示例性说明。本发明实施例提出了一种多租户库数据准确性监控系统,该系统可以对包含多服务器(即多数据库集群)和多租户的生产系统中的数据进行监控,并且支持同时OLAP和实时分析。The embodiments of the present invention are exemplary descriptions of the above-mentioned embodiments. The embodiment of the present invention proposes a multi-tenant database data accuracy monitoring system, which can monitor data in a production system including multiple servers (ie, multi-database clusters) and multiple tenants, and supports simultaneous OLAP and real-time analysis.
在实时生产系统中,面对TB级别的数据时,数据丢失很难被发现和进行问题定位,本系统从成本、效率角度综合考虑,利用统计方法和数据链路追踪方法从宏观、微观和时间维度,结合分析图表对数据准确性进行自动化检查、综合分析和告警,从而解决多租户场景下实时数据准确性检查的问题,能够在T+1周期上发现数据的准确性问题,在H+1周期内发现数万个数据源的运行状态,运行成本小,效率高,使用方便。In a real-time production system, when faced with terabytes of data, it is difficult to find data loss and locate problems. This system comprehensively considers cost and efficiency, and uses statistical methods and data link tracking methods to analyze macro, micro and time Dimensions, combined with analysis charts to automatically check, comprehensively analyze and alert data accuracy, thereby solving the problem of real-time data accuracy checking in multi-tenant scenarios, and can find data accuracy problems in the T+1 cycle, and in H+1 The operating status of tens of thousands of data sources can be found within a cycle, with low operating cost, high efficiency, and easy to use.
其中,周期就是数据监控的频率;监控有全量和抽样两个策略;在全量策略的执行上,平均每天执行1次,相应的我们定义这个周期叫T+1;在抽样策略的执行上平均每个小时执行1次,相应的我们定义这个周期叫H+1。Among them, the cycle is the frequency of data monitoring; there are two strategies for monitoring: full amount and sampling; in the execution of the full amount strategy, it is executed once a day on average, and correspondingly we define this cycle as T+1; in the execution of the sampling strategy, the average Executed once an hour, correspondingly we define this cycle as H+1.
图4是根据本发明实施例三提供的一种多租户库数据准确性监控系统的结构示意图,图4中示出的监控系统即为本发明实施例三提供的一种多租户库数据准确性监控系统,图4中还示出了生产系统,监控系统可以对生产系统进行数据准确性监控。Fig. 4 is a schematic structural diagram of a multi-tenant database data accuracy monitoring system provided according to Embodiment 3 of the present invention. The monitoring system shown in Fig. 4 is a multi-tenant database data accuracy monitoring system provided in Embodiment 3 of the present invention. As for the monitoring system, the production system is also shown in FIG. 4 , and the monitoring system can monitor the data accuracy of the production system.
如图4所示,多租户库数据准确性监控系统中包括:统计(即数据统计子模块)、收集日志(即日志收集工具)、分布式搜索和分析引擎(即日志存储工具)、在线分析处理数据库(即数据存储子模块)、分析(即数据分析模块)、监控大盘(即可视化处理模块)和异常告警(即异常告警模块)。As shown in Figure 4, the multi-tenant database data accuracy monitoring system includes: statistics (that is, data statistics sub-module), collecting logs (that is, log collection tools), distributed search and analysis engines (that is, log storage tools), online analysis Processing database (ie, data storage sub-module), analysis (ie, data analysis module), monitoring panel (ie, visual processing module) and abnormal alarm (ie, abnormal alarm module).
其中,统计(即数据统计子模块)主要用来满足数据统计步骤的各类需求;数据分布在不同的数据库服务器(即数据库集群)中,在上面运行着SaaS业务,需要解决租户库多、生产资源紧张的问题。Among them, statistics (that is, the data statistics sub-module) are mainly used to meet various needs of the data statistics step; data are distributed in different database servers (that is, database clusters), and SaaS services are running on them, so it is necessary to solve the problems of multiple tenant databases, production resource constraints.
统计(即数据统计子模块)设计为一个服务型任务,可以满足各类统计需求,从而为自动化的集成提供了可能性;在实现上采用多线程,提高并发,并发的能力取决于需求大小,即有多少数据库服务器,从而实现一个和SaaS业务对等的线性扩展能力;在底层实现了一个严格的数据库操作框架,用来严格控制数据库的连接和请求的管理;数据库连接数由生产系统运维方根据租户库容量和业务量评估并给出允许监控系统使用的值;数据库不保持长连接,只在需要时进行连接,处理完请求立刻释放;数据库请求只支持查询请求,规避对生产数据的影响;所有的请求只支持顺序执行,规避对生产资源的占用过多导致生产请求阻塞。Statistics (that is, the data statistics sub-module) is designed as a service task, which can meet various statistical needs, thus providing the possibility for automatic integration; in the implementation, multi-threading is used to improve concurrency, and the ability of concurrency depends on the size of the demand. That is, how many database servers are there, so as to achieve a linear expansion capability equivalent to the SaaS business; a strict database operation framework is implemented at the bottom layer to strictly control database connections and request management; the number of database connections is determined by the production system operation and maintenance The party evaluates and gives the value allowed to be used by the monitoring system according to the tenant database capacity and business volume; the database does not maintain a long connection, but only connects when needed, and releases the request immediately after processing the request; the database request only supports query requests, avoiding production data. Impact; all requests only support sequential execution, avoiding excessive occupation of production resources and causing production request blocking.
图5是根据本发明实施例三提供的一种数据统计子模块的结构示意图,具体的,数据统计子模块包括:FIG. 5 is a schematic structural diagram of a data statistics submodule provided according to Embodiment 3 of the present invention. Specifically, the data statistics submodule includes:
服务接口(即服务接口单元),用于提供统计需求的申请,可以按时、灵活的指定需要统计的集群(即数据库集群)或表(第一待统计表或第二待统计表);The service interface (ie, the service interface unit) is used to provide the application for statistical requirements, and can specify the cluster (ie, database cluster) or table (the first table to be counted or the second table to be counted) that needs to be counted on time and flexibly;
任务队列(即任务队列单元),统计需求(即统计需求任务)进入系统后统一进入任务队列等待处理;Task queue (i.e. task queue unit), statistical requirements (i.e. statistical demand tasks) enter the system and enter the task queue uniformly to wait for processing;
租户库管理模块(即租户库管理单元),生产系统的租户库在物理分布上是有组织的,同时又在根据SaaS业务的发展持续调整,租户库管理模块负责同步租户库配置,并对租户库进行配置识别,将同一个数据库集群的租户库分组(即将同一数据库集群中的各租户库分为同一组);Tenant library management module (that is, tenant library management unit), the tenant library of the production system is organized in physical distribution, and at the same time is continuously adjusted according to the development of the SaaS business. Databases are configured and identified, and tenant libraries of the same database cluster are grouped (that is, each tenant library in the same database cluster is divided into the same group);
业务抽象模块(即业务抽象单元),业务包括数据量统计、业务指标统计,业务抽象模块将两类统计抽象为count和sum,实现统计需求描述的符号化,即提供业务数据对应存储的表和统计类型,系统即可自动实现统计;The business abstraction module (that is, the business abstraction unit), the business includes data volume statistics and business index statistics. The business abstraction module abstracts the two types of statistics into count and sum, and realizes the symbolization of the description of statistical requirements, that is, it provides tables and tables for corresponding storage of business data. Statistics type, the system can automatically realize the statistics;
任务执行管理模块(即任务执行管理单元),任务执行管理模块对任务的执行设定了并发和限流双保证的规则:The task execution management module (i.e., the task execution management unit), the task execution management module sets the rules of concurrent and current limiting double guarantees for the execution of tasks:
并发保证能够顺应生产系统租户库的线性扩展而相应的扩展线程数,从而保证统计数据的时效性,不会导致统计时间过长时由于生产数据持续变化而带来误差;程序中建立一个线程管理器来集中管理并发任务,并发的数量和租户库的数量呈线性关系(即通过线程管理器集中管理并发处理的统计需求任务,并发处理的统计需求任务的数量与租户库的数量呈线性关系);The concurrency guarantee can adapt to the linear expansion of the production system tenant library and expand the number of threads accordingly, so as to ensure the timeliness of statistical data, and will not cause errors due to continuous changes in production data when the statistical time is too long; a thread management is established in the program The thread manager is used to centrally manage concurrent tasks, and the number of concurrent tasks is linearly related to the number of tenant libraries (that is, the number of concurrently processed statistical demand tasks is centrally managed through the thread manager, and the number of concurrently processed statistical demand tasks is linearly related to the number of tenant libraries) ;
限流能够保证统计任务不会影响生产系统的性能;程序中管理一个HashMap数据结构记录每个在统计运行(即执行统计需求任务)的租户库,并结合租户管理模块识别同一个数据库集群上的租户库,HashMap的key值即集群服务器的名称(即数据库集群的标识),value值为当前的租户库所在数据库集群的连接数,同时,实现一个数据库连接管理器来提供任务执行需要的数据库连接,这样就管理了租户数据库的连接数(即通过数据库连接管理器结合租户库管理单元,管理连接系统的租户库的数量);Current limiting can ensure that statistical tasks will not affect the performance of the production system; the program manages a HashMap data structure to record each tenant library that is running statistically (that is, performing statistical demand tasks), and combines the tenant management module to identify the same database cluster. Tenant library, the key value of HashMap is the name of the cluster server (that is, the identity of the database cluster), and the value value is the number of connections of the database cluster where the current tenant library is located. At the same time, a database connection manager is implemented to provide the database connection required for task execution In this way, the number of connections to the tenant database is managed (that is, the number of tenant libraries connected to the system is managed through the database connection manager combined with the tenant library management unit);
当有任务(即统计需求任务)持续不断进来时,线程和数据库连接都保持活跃状态,一旦发现没有任务再进来时线程管理器主动释放线程、数据库连接管理器主动释放数据库连接(即通过数据库连接管理器释放没有统计需求任务的租户库与系统的连接;在没有统计需求任务时,采用线程管理器释放线程),这样可以节省统计系统所运性的环境资源,同时节省生产系统租户库的连接开销;When tasks (that is, statistical demand tasks) continue to come in, both the thread and the database connection remain active. Once it is found that there is no task coming in, the thread manager actively releases the thread, and the database connection manager actively releases the database connection (that is, through the database connection) The manager releases the connection between the tenant library and the system that has no statistical demand task; when there is no statistical demand task, the thread manager is used to release the thread), which can save the environment resources of the statistical system and save the connection of the production system tenant library overhead;
任务执行模块(即任务执行单元),用于将业务抽象具体为可执行的SQL,和执行SQL获取结果(即将业务抽象单元抽象处理后的数据具体化为可执行的语句,并确定第一数据信息和/或第二数据信息);The task execution module (that is, the task execution unit) is used to concrete the business abstraction into executable SQL, and execute the SQL to obtain the result (that is, to materialize the abstracted data processed by the business abstraction unit into an executable statement, and to determine the first data information and/or second data information);
结果输出模块(即结果输出单元),由并发线程执行的统计结果会存放到一个缓冲队列中,由一个循环存储线程作业定时、将缓冲队列中统计结果批量写入到外部存储(即将第一数据信息和/或第二数据信息发送至数据存储子模块)。The result output module (that is, the result output unit), the statistical results executed by the concurrent threads will be stored in a buffer queue, and the statistical results in the buffer queue will be written to the external storage in batches by a cycle storage thread job timing (that is, the first data information and/or second data information is sent to the data storage sub-module).
通过分析(即数据分析模块)首先将租户库(即数据源)的数据量和实时数仓(即目标库)的数据量作对比,这是一个千万级别的数据处理过程,利用具有OLAP能力的数据库,这一步的算法用SQL实现,简单快捷,执行效率高;业务指标对比也采用同样的方法实现;Through analysis (that is, the data analysis module), first compare the data volume of the tenant database (that is, the data source) with the data volume of the real-time data warehouse (that is, the target database). database, the algorithm in this step is implemented with SQL, which is simple, fast, and highly efficient; the comparison of business indicators is also implemented using the same method;
数据源通常为MySQL型数据库集群,有多个服务器并分别包含多个租户库,将每个租户库中的表的数据进行统计,包含数据量和业务指标总计,并写入到支持OLAP分析的数据库中,需要对多个租户库的各个表进行对比,来发现有数据丢失问题的租户库和表;The data source is usually a MySQL-type database cluster, with multiple servers and multiple tenant libraries. The data of the tables in each tenant library is counted, including the data volume and business indicators, and written to the database that supports OLAP analysis. In the database, it is necessary to compare the tables of multiple tenant libraries to find tenant libraries and tables with data loss problems;
在业务指标对比时,可以选取核心业务的指标,如销售额,销售数量,通过对业务指标的总计进行比较,比较时可以按照每个租户库、和业务指标对应的存储表进行统计比较,从而发现有数据不准确问题的租户库和表。When comparing business indicators, you can select core business indicators, such as sales and sales volume, and compare the total of business indicators. When comparing, you can perform statistical comparisons according to the storage tables corresponding to each tenant database and business indicators, so that Tenant libraries and tables with inaccurate data issues were found.
基于上述处理的结果反映了静态的数据差异,需要利用趋势分析法结合历史数据进一步分析;实时系统的错误导致的数据准确性问题反映到数据上来说就是分析差异结果的缘由;通过对比历史数据可以分辨出这个差异是历史存留的问题,还是最近的问题,这需要从统计学入手。The results based on the above processing reflect static data differences, which need to be further analyzed by using the trend analysis method combined with historical data; the data accuracy problems caused by real-time system errors are reflected in the data, which is the reason for the analysis of the difference results; by comparing historical data, we can Distinguishing whether this difference is a historical problem or a recent problem requires a statistical approach.
趋势分析法叠加在统计分析法后面,用来处理实时系统无法保持静止状态而导致统计误差的分析问题;趋势分析法根据需要按时间长度将数据划分为片段,比如最近一周(即历史时间段)按天可划分为7个时间片段(即历史时间片段),每个时间片段内都会执行一次数据统计并保存统计和分析结果(即第一数据信息、第二数据信息和数据统计结果信息),当第8天执行统计的时候,可结合前7天时间片段里执行的结果,确定第8天对应的数据统计结果信息。The trend analysis method is superimposed on the back of the statistical analysis method to deal with the analysis of statistical errors caused by the inability of the real-time system to maintain a static state; the trend analysis method divides the data into segments according to the length of time as needed, such as the last week (that is, the historical time period) It can be divided into 7 time segments (that is, historical time segments) by day, and data statistics will be performed once in each time segment and the statistics and analysis results (that is, first data information, second data information and data statistical result information) will be saved. When performing statistics on the 8th day, you can combine the execution results in the time segment of the previous 7 days to determine the corresponding data statistics result information on the 8th day.
需要说明的是,趋势分析法是将每个时间段当时的差异值(即第一误差)进行对比,比如第1到7天每次统计都发现差异100条左右数据,那么第8天统计时任然相差100条左右,那么说明今天没有丢失数据。误差来源有两种:一是统计有时间差,因为数据在持续变化,所有会有误差;二是有历史积压的脏数据,这样需要往更早的时间片段进行回溯;当监控系统第一次启动时如果我们保证一个一致的基线,即租户库中的数据和实时数仓中的数据是一致的,那么我们总能回溯到出现异常的时间点,或者说,其实在该时间点时就能发现数据丢失的问题了。It should be noted that the trend analysis method is to compare the difference value (that is, the first error) in each time period. For example, from the 1st to the 7th day, each statistics finds a difference of about 100 pieces of data, then when the 8th day is counted If there is still a difference of about 100, it means that no data is lost today. There are two sources of error: one is that there is a time difference in statistics, because the data is constantly changing, so there will be errors; the other is that there is a backlog of dirty data in history, which needs to be traced back to an earlier time segment; when the monitoring system is started for the first time If we ensure a consistent baseline, that is, the data in the tenant database is consistent with the data in the real-time data warehouse, then we can always go back to the time when the exception occurred, or in fact, we can find it at that time. The problem of data loss is gone.
收集日志(即日志收集工具)设计为一个通用的采集框架,日志保持统一的格式和目录结构;日志统一收集到支持全文检索的存储引擎上(即日志存储工具),比如ElasticSearch,以同时支持准确性比对和问题定位能力;日志采集采用开源的工具,比如Filebeat;日志的存贮进行生命周期自动化管理,比如Elasticsearch的索引模板和lifecycle功能,从而降低存储成本和运维成本。Collecting logs (that is, log collection tools) is designed as a general collection framework, and the logs maintain a unified format and directory structure; the logs are uniformly collected on storage engines that support full-text retrieval (that is, log storage tools), such as ElasticSearch, to simultaneously support accurate Sexual comparison and problem location capabilities; log collection uses open source tools, such as Filebeat; log storage is automatically managed in the life cycle, such as the index template and lifecycle function of Elasticsearch, thereby reducing storage costs and operation and maintenance costs.
通过收集日志(即日志收集工具)实时采集数据链路上各个节点的日志信息,这些节点包含Mysql Binlog Consumer(即第一节点),存放Binlog的MessageQueue(即第二节点),处理binlog的Message Consumer(即第三节点),以及最终的数仓节点(即第四节点),日志信息所指示的内容为每个节点处理或存放的数据的关键信息,即表中记录的主键值。Collect the log information of each node on the data link in real time by collecting logs (that is, log collection tools). These nodes include Mysql Binlog Consumer (that is, the first node), the MessageQueue that stores Binlog (that is, the second node), and the Message Consumer that processes binlog (that is, the third node), and the final data warehouse node (that is, the fourth node), the content indicated by the log information is the key information of the data processed or stored by each node, that is, the primary key value recorded in the table.
通过分析(即数据分析模块)对日志信息进行比对,可以发现在链路上出现数据不一致问题的节点(即日志统计结果信息)。其中,抽样针对数据域;通常选择核心业务的对应的数据域,在该数据域内针对实时采集链路进行跟踪,利用Elasticsearch搜集的日志,可以对链路上各个节点之间的数据日志比对,在算法实现上主要比对业务主键、业务发生时间和binlog类型;通过抽样检查可以发现实时数据采集链路是否稳定运行,数据处理是否完整,尤其是针对变更数据可以补充统计方法发现不了的问题。By analyzing (that is, the data analysis module) and comparing the log information, it is possible to find nodes (that is, log statistical result information) where data inconsistency occurs on the link. Among them, the sampling is aimed at the data field; usually the corresponding data field of the core business is selected, and the real-time collection link is tracked in this data field. The logs collected by Elasticsearch can be used to compare the data logs between the nodes on the link. In the implementation of the algorithm, it mainly compares the business primary key, business occurrence time and binlog type; through sampling inspection, it can be found whether the real-time data collection link is running stably and whether the data processing is complete, especially for the problems that cannot be found by supplementary statistical methods for changed data.
将分析结果生成报告数据,并按照数据库实例、租户库、表进行大盘展示,同时结合历史统计结果按时间维度绘制数据趋势曲线,来展示数据准确性报告(即可视化处理模块对数据分析模块确定的数据统计结果信息和/或日志统计结果信息可视化处理);对于有数据异常的租户库和表通过异常告警模块告警的方式发送给运维和开发人员进行处理。Generate report data from the analysis results, and display them on a large scale according to the database instance, tenant library, and table, and draw the data trend curve according to the time dimension in combination with the historical statistical results to display the data accuracy report (that is, the data determined by the visual processing module to the data analysis module Visual processing of data statistical result information and/or log statistical result information); for tenant libraries and tables with abnormal data, send them to operation and maintenance and developers for processing through the abnormal alarm module alarm.
在自动化的流程中主要是发现异常问题,缺少对整体数据概括性的描述,这涉及到多租户库、多表和时间维度一起考虑时的描述复杂性,不是监控系统擅长处理的;此时可以结合分析图表,通过表格、曲线图等可视化图表工具进行分析,可以按需进行各个维度和数据范围的可配置分析,比如某个MySQL Server(即数据库集群)上、某个租户库的某个表,或者某个表的历史数据趋势图等;在具体实现可以借助开源的可视化分析工具。In the automated process, abnormal problems are mainly found, and there is a lack of a general description of the overall data. This involves the complexity of description when multi-tenant libraries, multi-tables and time dimensions are considered together, which is not what the monitoring system is good at handling; at this time, you can Combined with analysis charts, analysis is performed through visual chart tools such as tables and graphs, and configurable analysis of various dimensions and data ranges can be performed on demand, such as a table on a MySQL Server (that is, a database cluster) or a tenant library , or the historical data trend graph of a table, etc.; in the specific implementation, open source visual analysis tools can be used.
本发明实施例提出的一种多租户库数据准确性监控系统,在硬件成本上需要一个负责收集数据的机器运行收集任务(即统计需求任务),一个支持千万级别数据的数据库系统,和一个Elasticsearch系统;所有这些都是常规的需求,通常2台小规模的机器即可满足;运行资源上对生产的占用可以忽略,不需对生产进行扩容;A multi-tenant library data accuracy monitoring system proposed by the embodiment of the present invention requires a machine responsible for collecting data to run the collection task (that is, a statistical demand task), a database system supporting tens of millions of data, and a Elasticsearch system; all of these are conventional requirements, usually two small-scale machines can meet; the occupation of production resources on operating resources can be ignored, and there is no need to expand production;
在架构设计上,统计和分析分离,并且大部分通过SQL层面即可实现,系统复杂度低,开发成本低;在日志搜集上通过开源框架实现,目前有很成熟的各类组件,不需要投入额外成本;相比于需要将业务数据明细重新落库再计算指标逻辑的方式,本系统直接利用了生产资源,规避了业务数据明细落库的复杂性和硬件的维护成本。In terms of architecture design, statistics and analysis are separated, and most of them can be realized through the SQL level, the system complexity is low, and the development cost is low; the log collection is realized through an open source framework. Currently, there are various mature components that do not require investment. Additional cost; Compared with the method of re-logging the business data details and then calculating the indicator logic, this system directly utilizes production resources, avoiding the complexity of the business data detail storage and hardware maintenance costs.
本发明实施例提出的一种多租户库数据准确性监控系统,对于一个大规模(百亿级别)、多租户(万级别)的实时系统,系统可以在小时级别完成完整的数据校验,提高了数据准确性监控的效率。通过系统集成,可以同时满足定时检查、按需调用;整个过程全部自动化,很大程度降低了使用复杂度,降低了对使用人员的要求。A multi-tenant library data accuracy monitoring system proposed by the embodiment of the present invention, for a large-scale (10 billion level), multi-tenant (10,000 level) real-time system, the system can complete complete data verification at the hour level, improving Improve the efficiency of data accuracy monitoring. Through system integration, it can meet the requirements of regular inspection and on-demand calling at the same time; the whole process is fully automated, which greatly reduces the complexity of use and the requirements for users.
实施例四Embodiment four
图6是根据本发明实施例四提供的一种数据准确性监控方法的流程图,本实施例可适用于实现数据的准确性监控的情况,数据准确性监控方法可以应用于本发明任意实施例提供的数据准确性监控系统。Fig. 6 is a flow chart of a data accuracy monitoring method provided according to Embodiment 4 of the present invention. This embodiment is applicable to the realization of data accuracy monitoring, and the data accuracy monitoring method can be applied to any embodiment of the present invention Data accuracy monitoring system provided.
如图6所示,该方法包括:As shown in Figure 6, the method includes:
S110、通过数据统计模块从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息。S110. Determine the first data information of the first table to be counted in the tenant database to be counted from the database cluster included in the production system through the data counting module.
S120、通过数据统计模块从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息。S120. Determine the second data information of the second table to be counted corresponding to the first table to be counted from the real-time data warehouse included in the production system through the data statistics module.
S130、通过日志收集模块实时采集数据链路上的每个节点所包括的日志信息。S130. Collect log information included in each node on the data link in real time through the log collection module.
S140、通过数据分析模块对比数据统计模块所确定的第一数据信息和第二数据信息,确定数据统计结果信息。S140. Using the data analysis module to compare the first data information and the second data information determined by the data statistics module, determine data statistics result information.
S150、通过数据分析模块对比日志收集模块所采集的各节点的日志信息,确定日志统计结果信息。S150. Determine log statistical result information by comparing the log information of each node collected by the log collection module by the data analysis module.
该方法通过数据分析模块对比第一数据信息和第二数据信息,确定数据统计结果信息,通过对比数据分析模块各节点的日志信息确定日志统计结果信息,提高了对数据进行监控的准确性。In the method, the data analysis module compares the first data information and the second data information to determine the data statistical result information, and determines the log statistical result information by comparing the log information of each node of the data analysis module, thereby improving the accuracy of data monitoring.
进一步的,通过数据统计模块从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息,通过数据统计模块从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息,包括:Further, the first data information of the first statistics table in the tenant library to be counted is determined from the database cluster included in the production system through the data statistics module, and determined from the real-time data warehouse included in the production system through the data statistics module The second data information of the second table to be counted corresponding to the first table to be counted includes:
通过数据统计模块包括的数据统计子模块从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息,从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息;Through the data statistics sub-module included in the data statistics module, from the database cluster included in the production system, determine the first data information of the first table to be counted in the tenant database to be counted, and determine from the real-time data warehouse included in the production system. The second data information of the second table to be counted corresponding to the first table to be counted;
通过数据统计模块包括的数据存储子模块存储数据统计子模块所确定的第一数据信息和第二数据信息;将第一数据信息和第二数据信息发送至数据分析模块。The data storage sub-module included in the data statistics module stores the first data information and the second data information determined by the data statistics sub-module; and sends the first data information and the second data information to the data analysis module.
进一步的,通过数据统计模块包括的数据统计子模块从生产系统所包括的数据库集群中,确定待统计的租户库中第一待统计表的第一数据信息,从生产系统所包括的实时数仓中确定与第一待统计表对应的第二待统计表的第二数据信息,包括:Further, through the data statistics sub-module included in the data statistics module, from the database cluster included in the production system, determine the first data information of the first table to be counted in the tenant database to be counted, and from the real-time data warehouse included in the production system Determine the second data information of the second table to be counted corresponding to the first table to be counted, including:
通过数据统计子模块包括的服务接口单元,为第一待统计表和第二待统计表提供接入系统的接口;Through the service interface unit included in the data statistics sub-module, an interface for accessing the system is provided for the first table to be counted and the second table to be counted;
通过数据统计子模块包括的任务队列单元,存放等待处理的统计需求任务,统计需求任务为确定第一待统计表的第一数据信息,和/或确定第二待统计表的第二数据信息;The task queue unit included in the data statistics submodule stores the statistical demand tasks waiting to be processed, and the statistical demand task is to determine the first data information of the first table to be counted, and/or determine the second data information of the second table to be counted;
通过数据统计子模块包括的租户库管理单元,从生产系统中同步各数据库集群中各租户库的配置,并将同一数据库集群中的各租户库分为同一组;Through the tenant library management unit included in the data statistics sub-module, the configuration of each tenant library in each database cluster is synchronized from the production system, and each tenant library in the same database cluster is divided into the same group;
通过数据统计子模块包括的业务抽象单元,对第一待统计表和/或第二待统计表中的数据进行抽象处理;Abstract processing of data in the first table to be counted and/or the second table to be counted through the business abstraction unit included in the data statistics submodule;
通过数据统计子模块包括的任务执行管理单元,管理并发处理的统计需求任务,和/或管理连接系统的租户库的数量;Through the task execution management unit included in the data statistics sub-module, manage the statistical demand tasks for concurrent processing, and/or manage the number of tenant libraries connected to the system;
通过数据统计子模块包括的任务执行单元,将业务抽象单元抽象处理后的数据具体化为可执行的语句,并确定第一数据信息和/或第二数据信息;Through the task execution unit included in the data statistics sub-module, the data abstracted and processed by the business abstraction unit is embodied into an executable statement, and the first data information and/or the second data information are determined;
通过数据统计子模块包括的结果输出单元,将第一数据信息和/或第二数据信息发送至数据存储子模块。The first data information and/or the second data information are sent to the data storage sub-module through the result output unit included in the data statistics sub-module.
进一步的,通过数据统计子模块包括的任务执行管理单元,管理并发处理的统计需求任务,和/或管理连接系统的租户库的数量,包括:Further, through the task execution management unit included in the data statistics sub-module, manage the concurrently processed statistical demand tasks, and/or manage the number of tenant libraries connected to the system, including:
通过线程管理器集中管理并发处理的统计需求任务,并发处理的统计需求任务的数量与租户库的数量呈线性关系;The concurrently processed statistical demand tasks are centrally managed through the thread manager, and the number of concurrently processed statistical demand tasks is linearly related to the number of tenant libraries;
通过数据库连接管理器结合租户库管理单元,管理连接系统的租户库的数量。The number of tenant databases connected to the system is managed through the database connection manager combined with the tenant database management unit.
进一步的,通过数据库连接管理器结合租户库管理单元,管理连接系统的租户库的数量,包括:Further, the number of tenant libraries connected to the system is managed through the database connection manager combined with the tenant library management unit, including:
通过数据库连接管理器释放没有统计需求任务的租户库与系统的连接;Use the database connection manager to release the connection between the tenant library and the system without statistical demand tasks;
在没有统计需求任务时,采用线程管理器释放线程。When there is no statistical demand task, the thread manager is used to release the thread.
进一步的,通过数据分析模块对比数据统计模块所确定的第一数据信息和第二数据信息,确定数据统计结果信息,包括:Further, the data statistics result information is determined by comparing the first data information and the second data information determined by the data statistics module through the data analysis module, including:
在历史时间段内,将历史时间段内每个历史时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为历史时间片段对应的第一误差;In the historical time period, the absolute value of the difference between the value corresponding to the first data information and the second data information corresponding to each historical time segment in the historical time period is determined as the first error corresponding to the historical time segment ;
根据各历史时间片段对应的第一误差,确定历史时间段对应的第一均值和第一均方差;According to the first error corresponding to each historical time segment, determine the first mean value and the first mean square error corresponding to the historical time segment;
将当前时间片段对应的第一数据信息所对应数值和第二数据信息所对应数值的差值的绝对值,确定为当前时间片段对应的第二误差;determining the absolute value of the difference between the value corresponding to the first data information corresponding to the current time segment and the value corresponding to the second data information as the second error corresponding to the current time segment;
将第二误差与第一均值的差值的绝对值,确定为当前时间片段相对于历史时间段的第三误差;The absolute value of the difference between the second error and the first mean value is determined as the third error of the current time segment relative to the historical time segment;
若第三误差不超过第一均方差,则确定数据统计结果信息指示未发生数据丢失;若第三误差超过第一均方差,则确定数据统计结果信息指示发生数据丢失;If the third error does not exceed the first mean square error, then determine that the data statistical result information indicates that no data loss has occurred; if the third error exceeds the first mean square error, then determine that the data statistical result information indicates that data loss occurs;
其中,历史时间段内包括多个相同时长的历史时间片段,当前时间片段与历史时间片段的时长相同。Wherein, the historical time segment includes multiple historical time segments of the same duration, and the current time segment and the historical time segment have the same duration.
进一步的,通过日志收集模块实时采集数据链路上的每个节点所包括的日志信息,包括:Further, the log information included in each node on the data link is collected in real time through the log collection module, including:
通过日志收集模块包括的日志收集工具,收集对应二进制日志消费者的第一节点的日志信息;收集对应存放二进制日志的消息队列中间件的第二节点的日志信息;收集对应处理二进制日志的消息消费者的第三节点的日志信息;收集对应实时数仓的第四节点的日志信息;Through the log collection tool included in the log collection module, collect the log information corresponding to the first node of the binary log consumer; collect the log information corresponding to the second node of the message queue middleware storing the binary log; collect the message consumption corresponding to the binary log The log information of the third node of the user; the log information of the fourth node corresponding to the real-time data warehouse is collected;
通过日志收集模块包括的日志存储工具,存储日志收集工具收集的每个节点所包括的日志信息;将日志信息发送至数据分析模块。The log information included in each node collected by the log collection tool is stored through the log storage tool included in the log collection module; and the log information is sent to the data analysis module.
进一步的,通过数据分析模块对比日志收集模块所采集的各节点的日志信息,确定日志统计结果信息,包括:Further, the log information of each node collected by the log collection module is compared by the data analysis module to determine the log statistical result information, including:
以设定频率对比日志收集模块所收集的各节点所包括的日志信息,确定数据链路上出现数据不一致问题的问题节点;Compare the log information included in each node collected by the log collection module at a set frequency, and determine the problem node with data inconsistency on the data link;
将问题节点确定为日志统计结果信息。Determine the problem node as log statistics result information.
进一步的,数据准确性监控方法还包括:Further, the data accuracy monitoring method also includes:
通过可视化处理模块,对数据分析模块确定的数据统计结果信息和/或日志统计结果信息可视化处理。Through the visual processing module, the data statistical result information and/or the log statistical result information determined by the data analysis module are visually processed.
本发明实施例提供的数据准确性监控方法可基于上述任意实施例提供的数据准确性监控系统实现,属于同一发明构思,具备相应的功能和有益效果。The data accuracy monitoring method provided by the embodiments of the present invention can be implemented based on the data accuracy monitoring system provided by any of the above embodiments, belongs to the same inventive concept, and has corresponding functions and beneficial effects.
应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本发明中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行,只要能够实现本发明的技术方案所期望的结果,本文在此不进行限制。It should be understood that steps may be reordered, added or deleted using the various forms of flow shown above. For example, each step described in the present invention may be executed in parallel, sequentially, or in a different order, as long as the desired result of the technical solution of the present invention can be achieved, there is no limitation herein.
上述具体实施方式,并不构成对本发明保护范围的限制。本领域技术人员应该明白的是,根据设计要求和其他因素,可以进行各种修改、组合、子组合和替代。任何在本发明的精神和原则之内所作的修改、等同替换和改进等,均应包含在本发明保护范围之内。The above specific implementation methods do not constitute a limitation to the protection scope of the present invention. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made depending on design requirements and other factors. Any modifications, equivalent replacements and improvements made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211512387.XA CN115718690B (en) | 2022-11-29 | 2022-11-29 | Data accuracy monitoring system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211512387.XA CN115718690B (en) | 2022-11-29 | 2022-11-29 | Data accuracy monitoring system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115718690A true CN115718690A (en) | 2023-02-28 |
CN115718690B CN115718690B (en) | 2025-07-18 |
Family
ID=85257015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211512387.XA Active CN115718690B (en) | 2022-11-29 | 2022-11-29 | Data accuracy monitoring system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115718690B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118550975A (en) * | 2024-07-26 | 2024-08-27 | 舟谱数据技术南京有限公司 | Data acquisition accuracy detection and repair system and control method thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020152305A1 (en) * | 2000-03-03 | 2002-10-17 | Jackson Gregory J. | Systems and methods for resource utilization analysis in information management environments |
US20120254669A1 (en) * | 2011-04-04 | 2012-10-04 | Microsoft Corporation | Proactive failure handling in database services |
CN105447014A (en) * | 2014-08-15 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Metadata management method based on binglog, and method and device used for providing metadata |
CN105589791A (en) * | 2015-12-28 | 2016-05-18 | 江苏省电力公司信息通信分公司 | Method for application system log monitoring management in cloud computing environment |
US11314635B1 (en) * | 2017-12-12 | 2022-04-26 | Amazon Technologies, Inc. | Tracking persistent memory usage |
-
2022
- 2022-11-29 CN CN202211512387.XA patent/CN115718690B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020152305A1 (en) * | 2000-03-03 | 2002-10-17 | Jackson Gregory J. | Systems and methods for resource utilization analysis in information management environments |
US20120254669A1 (en) * | 2011-04-04 | 2012-10-04 | Microsoft Corporation | Proactive failure handling in database services |
CN105447014A (en) * | 2014-08-15 | 2016-03-30 | 阿里巴巴集团控股有限公司 | Metadata management method based on binglog, and method and device used for providing metadata |
CN105589791A (en) * | 2015-12-28 | 2016-05-18 | 江苏省电力公司信息通信分公司 | Method for application system log monitoring management in cloud computing environment |
US11314635B1 (en) * | 2017-12-12 | 2022-04-26 | Amazon Technologies, Inc. | Tracking persistent memory usage |
Non-Patent Citations (1)
Title |
---|
张旭红: "基于LVS的数据库集群负载均衡性能测试与分析", 现代信息科技, no. 03, 25 March 2018 (2018-03-25) * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118550975A (en) * | 2024-07-26 | 2024-08-27 | 舟谱数据技术南京有限公司 | Data acquisition accuracy detection and repair system and control method thereof |
Also Published As
Publication number | Publication date |
---|---|
CN115718690B (en) | 2025-07-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8938421B2 (en) | Method and a system for synchronizing data | |
CN100487700C (en) | Data processing method and system of data library | |
CN111400288A (en) | Data quality inspection method and system | |
CN103177063A (en) | Time slider operator for temporal data aggregation | |
CN113641739B (en) | Spark-based intelligent data conversion method | |
US20200250188A1 (en) | Systems, methods and data structures for efficient indexing and retrieval of temporal data, including temporal data representing a computing infrastructure | |
CN111522870B (en) | Database access method, middleware and readable storage medium | |
CN117421376A (en) | Method and device for processing number of bins of online analysis of service data stream | |
CN112925697A (en) | Operation difference monitoring method, device, equipment and medium | |
CN117149873A (en) | Data lake service platform construction method based on flow batch integration | |
CN118377768A (en) | Data ETL method, device, equipment and medium based on service flow | |
CN109933798B (en) | Audit log analysis method and audit log analysis device | |
CN115718690A (en) | Data accuracy monitoring system and method | |
CN114020819B (en) | A method and device for synchronizing multi-system parameters | |
CN114418342A (en) | A business data processing method, device and readable storage medium | |
CN117076426B (en) | Traffic intelligent engine system construction method and device based on flow batch integration | |
CN116628023B (en) | Waiting event type query method and device, storage medium and electronic equipment | |
Jiadi et al. | Research on Data Center Operation and Maintenance Management Based on Big Data | |
CN111414355A (en) | Offshore wind farm data monitoring and storing system, method and device | |
CN114942916B (en) | Real-time data warehouse design method, device, equipment and storage medium based on Doris | |
CN116049194A (en) | Database index optimization method, storage medium and device | |
CN115099769A (en) | A risk control approval platform for automatic approval of auto financial orders | |
CN119782363B (en) | Real-time number bin construction method, query method and system based on asynchronous materialized view | |
CN112131302B (en) | Commercial data analysis method and platform | |
US12298980B1 (en) | Optimized storage of metadata separate from time series data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |