CN116701426A - Data processing method, electronic device and storage medium - Google Patents
- Publication number
- CN116701426A, CN202310983572.5A
- Authority
- CN
- China
- Prior art keywords
- log
- data
- log file
- electronic device
- file
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/14—Error detection or correction of the data by redundancy in operation
- G06F11/1402—Saving, restoring, recovering or retrying
- G06F11/1446—Point-in-time backing up or restoration of persistent data
- G06F11/1448—Management of the data involved in backup or backup restore
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/113—Details of archiving
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/1734—Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/1805—Append-only file systems, e.g. using logs or journals to store data
- G06F16/1815—Journaling file systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Computer Security & Cryptography (AREA)
- Quality & Reliability (AREA)
- Computing Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The embodiments of the present application relate to the field of information technology, and in particular to a data processing method, an electronic device, and a storage medium.
Background
Data integration refers to consolidating data logically or physically; for example, integrating the data in a source database into a target database by means of the source database's log files.
When log files are obtained from the source database, some of the obtained log files may have missing content. The log data in those log files is then not integrated into the target database, that is, data is missed. As a result, the source database and the target database become inconsistent, and the data at the target end is distorted.
Summary of the Invention
The embodiments of the present application provide a data processing method, an electronic device, and a storage medium that can mitigate missed data during data integration, thereby improving data consistency between the source end and the target end and reducing data distortion at the target end.
To achieve the above object, the embodiments of the present application adopt the following technical solutions:
In a first aspect, the present application provides a data processing method that can be applied to an electronic device in a data storage system. The electronic device may be any electronic device with data transmission and processing capabilities, such as a personal computer or a notebook computer; the data storage system includes a source database and a target database. The method includes: the electronic device obtains log files from the source database according to a first identification range, where the first identification range indicates the range of log data identifiers of the log files to be obtained by the electronic device, and the obtained log files are used to perform data operations on the target database. Next, if the electronic device determines that at least one missing log file exists among the obtained log files, the electronic device obtains a log flag bit based on the start log data identifier of the at least one missing log file and the first identification range. A missing log file is a log file from which log data is missing. The electronic device then determines a second identification range according to the log flag bit, and obtains log files from the source database according to the second identification range. The start log data identifier of a missing log file is the log data identifier of the first piece of log data included in that missing log file.
In the above method, the electronic device identifies missing log files among the log files obtained the first time (that is, the log files obtained using the first identification range) and derives the log flag bit from them. It then derives the second identification range from the log flag bit and obtains log files from the source database according to the second identification range (that is, obtains log files from the source database a second time). In this way, the log files found to be missing in the first retrieval can be obtained from the source database in the second retrieval by means of the log flag bit and the second identification range. This mitigates missed data during data integration and improves data consistency between the source end and the target end, thereby reducing data distortion at the target end.
In a possible design of the first aspect, the electronic device obtaining the log flag bit based on the start log data identifier of the at least one missing log file and the first identification range may include: when the electronic device determines that exactly one missing log file exists among the log files obtained according to the first identification range, the electronic device takes, as the log flag bit, the numerically larger of the start log data identifier of that missing log file and the range start of the first identification range. Alternatively, when the electronic device determines that at least two missing log files exist among the log files obtained according to the first identification range, the electronic device determines a target missing log file among them and takes, as the log flag bit, the numerically larger of the start log data identifier of the target missing log file and the range start of the first identification range. The target missing log file is the missing log file with the smallest start log data identifier among the at least two missing log files, and its start log data identifier is the log data identifier of the first piece of log data it includes.
It can be understood that the log flag bit is positively correlated with the range start of the second identification range, and that a log data identifier is correlated with the generation time of the log data. That is, the smaller the log flag bit, the smaller the range start of the second identification range; the more log files the electronic device obtains from the source database using the second identification range; and the earlier the generation time of the log data in those log files, so the further that log data lags behind the current time of the electronic device. Hence, the smaller the log flag bit, the lower the real-time performance of data integration when the electronic device obtains log files through the second identification range. In this design, by setting the log flag bit as large as possible, the electronic device can still obtain, in the second retrieval, the missing log files identified in the first retrieval, while keeping the number of log files obtained relatively small and, to some extent, controlling the timeliness of obtaining log files from the source database. This both mitigates missed data during data integration and improves the real-time performance of data integration by the electronic device.
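The flag selection described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the `LogFile` record, its field names, and the function name are hypothetical, and the branch for the case with no missing log file assumes the flag simply equals the range end (a later design only says it is "based on" the range end).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LogFile:
    """Hypothetical record describing one retrieved log file."""
    start_id: int      # identifier of the first piece of log data in the file
    is_missing: bool   # True if log data is missing from the file


def log_flag_bit(files: Iterable[LogFile], range_start: int, range_end: int) -> int:
    """Return the log flag bit used to build the second identification range."""
    missing_starts = [f.start_id for f in files if f.is_missing]
    if not missing_starts:
        # Assumption: with no missing file, the flag equals the range end.
        return range_end
    # The target missing log file has the smallest start identifier.
    target_start = min(missing_starts)
    # The flag is the larger of that identifier and the range start.
    return max(target_start, range_start)


files = [LogFile(100, False), LogFile(250, True), LogFile(180, True)]
print(log_flag_bit(files, range_start=150, range_end=300))
```

Taking the maximum keeps the flag from falling below the start of the first range, which is what limits how far back the second retrieval has to reach.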
In another possible design of the first aspect, the electronic device determining that at least one missing log file exists among the obtained log files may include: reading log data from the obtained log files and, if there is a log file from which no log data can be read, determining that at least one missing log file exists.
In another possible design of the first aspect, the electronic device determining that at least one missing log file exists among the obtained log files may further include: reading log data from the log files obtained based on the first identification range, and treating each log file from which no log data can be read as a missing log file.
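As a rough illustration of this check, the sketch below treats every log file that yields no log data as a missing log file. The input shape (a mapping from a file name to the list of records read from it) and the function name are assumptions made for the example.

```python
from typing import Dict, List


def find_missing_log_files(read_results: Dict[str, List[str]]) -> List[str]:
    """Return the names of log files from which no log data could be read."""
    return [name for name, records in read_results.items() if not records]


read_results = {
    "log_001": ["insert AA"],
    "log_002": [],               # nothing readable: treated as missing
    "log_003": ["delete BB"],
}
missing = find_missing_log_files(read_results)
print(missing, bool(missing))
```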
In yet another possible design of the first aspect, the electronic device obtaining log files from the source database according to the first identification range may include: if the electronic device has not obtained every log file corresponding to the first identification range, the electronic device obtains log files again according to the first identification range, until every log file corresponding to the first identification range has been obtained. The electronic device then performs data operations on the target database based on every obtained log file corresponding to the first identification range.
It can be understood that, while obtaining log files from the source database according to the first identification range, the electronic device may fail to obtain every log file corresponding to the first identification range. In this design, therefore, the electronic device checks whether it has obtained every such log file and, if not, obtains log files from the source database again, until every log file corresponding to the first identification range has been obtained. This mitigates the missed data that would otherwise arise in subsequent data processing because some log files corresponding to the first identification range were never obtained. It further improves data consistency between the source end and the target end, thereby mitigating data distortion at the target end.
In yet another possible design of the first aspect, the log files that correspond to the first identification range but were not obtained by the electronic device include switching log files; a switching log file can be understood as a log file on which the source database is currently performing an archiving operation.
In another possible design of the first aspect, the electronic device obtaining log files from the source database according to the first identification range includes: if not every log file corresponding to the first identification range has been obtained, obtaining log files again according to the first identification range, until the number of retrievals according to the first identification range is greater than or equal to a preset count threshold (for example, 5 or 10). The electronic device then performs data operations on the target database according to the log files obtained in the last retrieval.
It can be understood that if the source database frequently produces switching log files, the electronic device will repeatedly obtain log files from the source end without proceeding to the subsequent data integration steps, so the delay before data in the source database reaches the target database becomes long, which harms the real-time performance of data integration. In this design, therefore, once the electronic device has obtained log files from the source database too many times, it does not obtain them again and instead performs data operations on the target database directly based on the log files obtained in the last retrieval. Log files generated by the source database can thus be integrated into the target database promptly, which reduces the integration delay and improves the real-time performance of data integration by the electronic device.
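The bounded retry described above can be sketched as follows. The `fetch` callable stands in for one retrieval from the source database and `expected` for the set of log file numbers that correspond to the first identification range; both are hypothetical stand-ins, and the default threshold of 5 is just one of the example values mentioned.

```python
from typing import Callable, Set, Tuple


def fetch_with_retry(
    fetch: Callable[[], Set[int]],
    expected: Set[int],
    max_attempts: int = 5,
) -> Tuple[Set[int], int]:
    """Fetch until every expected log file is obtained or the threshold is hit.

    Returns the result of the last retrieval and the number of retrievals made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    obtained: Set[int] = set()
    attempts = 0
    while attempts < max_attempts:
        obtained = fetch()
        attempts += 1
        if expected <= obtained:   # every expected log file is present
            break
    # Either complete, or the threshold was reached: proceed with the last result.
    return obtained, attempts


# Simulated source: the third retrieval finally returns all three files.
results = iter([{1}, {1, 2}, {1, 2, 3}])
print(fetch_with_retry(lambda: next(results), expected={1, 2, 3}))
```

Capping the loop trades completeness for latency: when the threshold is reached, integration continues with whatever the last retrieval returned.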
In yet another possible design of the first aspect, a log file may further include a log file number, and the method further includes: the electronic device determines the log file number of an absent log file, where an absent log file is a log file that corresponds to the first identification range but was not obtained (as distinct from a missing log file, which was obtained but lacks log data). Next, the electronic device obtains the start log data identifier or the end log data identifier of the absent log file according to its log file number. The electronic device then obtains the absent log file from the source database according to that start or end log data identifier, after which it can perform data operations on the target database based on the absent log file. The start log data identifier of an absent log file is the log data identifier of the first piece of log data it includes; the end log data identifier of an absent log file is the log data identifier of the last piece of log data it includes.
In this implementation, the electronic device can obtain the absent log file by means of its start log data identifier and end log data identifier, and subsequently integrate the log data in it into the target database. This further mitigates missed data during data integration, improves data consistency between the source database and the target database, and mitigates data distortion at the target end.
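One way to picture this step is shown below. The text does not say how the number of a log file that was not obtained is determined; this sketch assumes that log file numbers are consecutive, so a gap in the obtained numbers reveals the file that is absent. The `catalog` mapping from file number to (start identifier, end identifier) is a hypothetical stand-in for metadata queried from the source database.

```python
from typing import Dict, Iterable, List, Tuple


def absent_file_numbers(obtained_numbers: Iterable[int]) -> List[int]:
    """Return file numbers skipped between the smallest and largest obtained."""
    present = set(obtained_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def id_bounds_for(
    numbers: Iterable[int], catalog: Dict[int, Tuple[int, int]]
) -> Dict[int, Tuple[int, int]]:
    """Look up (start_id, end_id) for each file number known to the catalog."""
    return {n: catalog[n] for n in numbers if n in catalog}


gaps = absent_file_numbers([7, 8, 10, 12])
catalog = {9: (900, 999), 10: (1000, 1099), 11: (1100, 1199)}
print(gaps, id_bounds_for(gaps, catalog))
```

The returned identifier bounds are what a follow-up retrieval would use to request exactly the files that were not obtained.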
In another possible design of the first aspect, the log files include online log files and archived log files. The electronic device obtaining log files from the source database according to the first identification range includes: for online log files, the electronic device obtains the online log files whose start log data identifier is less than or equal to the range end of the first identification range. For archived log files, the electronic device obtains the archived log files whose start log data identifier is less than or equal to the range end and greater than or equal to the range start of the first identification range; or obtains the archived log files whose end log data identifier is less than or equal to the range end and greater than or equal to the range start of the first identification range. The start log data identifier of an online log file is the log data identifier of the first piece of log data it includes; the start and end log data identifiers of an archived log file are the log data identifiers of the first and last pieces of log data it includes, respectively.
In this design, when obtaining online log files, the electronic device obtains those whose start log data identifier is less than or equal to the range end of the first identification range, so online log files are obtained more comprehensively and completely, which reduces missed data on the electronic device. Likewise, when obtaining archived log files, the electronic device obtains those whose log data identifiers fall within the first identification range, so archived log files are obtained more comprehensively and completely, which also reduces missed data on the electronic device.
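The two selection rules can be written as a single filter. This is a sketch under stated assumptions: the `LogFileMeta` record and its fields are hypothetical, and an online log file is modeled as having no end identifier because it is still being written.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogFileMeta:
    """Hypothetical metadata for one log file in the source database."""
    name: str
    archived: bool
    start_id: int
    end_id: Optional[int] = None   # None for an online file still being written


def select_log_files(
    files: Iterable[LogFileMeta], range_start: int, range_end: int
) -> List[LogFileMeta]:
    """Pick the log files to retrieve for [range_start, range_end]."""
    selected = []
    for f in files:
        if not f.archived:
            # Online file: only the start identifier is compared to the range end.
            if f.start_id <= range_end:
                selected.append(f)
        else:
            # Archived file: start OR end identifier must fall inside the range.
            start_in = range_start <= f.start_id <= range_end
            end_in = f.end_id is not None and range_start <= f.end_id <= range_end
            if start_in or end_in:
                selected.append(f)
    return selected


files = [
    LogFileMeta("arch_1", True, 10, 90),
    LogFileMeta("arch_2", True, 100, 199),
    LogFileMeta("arch_3", True, 250, 400),
    LogFileMeta("online_1", False, 280),
]
print([f.name for f in select_log_files(files, 150, 300)])
```

Accepting an archived file when either endpoint falls inside the range picks up files that straddle a range boundary, which is where log data would otherwise be skipped.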
In yet another possible design of the first aspect, the electronic device performing data operations on the target database based on the obtained log files may include: the electronic device parses the obtained log files and performs data operations on the target database based on the parsed log files. The data operations include adding data, deleting data, or modifying data.
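To make the replay step concrete, the sketch below applies already parsed operations to a stand-in target. The tuple form `(kind, key, value)` and the use of a plain dictionary as the target database are assumptions for illustration only; the actual log format and target interface are not specified here.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical parsed form of one piece of log data: (kind, key, value).
Operation = Tuple[str, str, Any]


def apply_operations(target: Dict[str, Any], operations: Iterable[Operation]) -> None:
    """Replay add / modify / delete operations on an in-memory target."""
    for kind, key, value in operations:
        if kind in ("add", "modify"):
            target[key] = value
        elif kind == "delete":
            target.pop(key, None)   # deleting an absent key is a no-op
        else:
            raise ValueError(f"unknown operation kind: {kind!r}")


target: Dict[str, Any] = {}
apply_operations(target, [
    ("add", "AA", 1),
    ("add", "BB", 2),
    ("modify", "AA", 3),
    ("delete", "BB", None),
])
print(target)
```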
In another possible design of the first aspect, the method further includes: if the electronic device determines that no missing log file exists among the obtained log files, the electronic device obtains the log flag bit based on the range end of the first identification range.
In yet another possible design of the first aspect, the electronic device determining the second identification range according to the log flag bit may include: the electronic device takes the log flag bit as the range start of the second identification range and computes the range end of the second identification range using a dynamic step size. The dynamic step size is related to the maximum log data identifier in the source database.
In this design, by setting a dynamic step size, the electronic device can adjust the rate at which it obtains log files from the source end according to the rate at which the source end generates log data. When the source end generates log data quickly, the electronic device uses more resources to load more log data, so that it loads log files from the source end at a higher rate; when the source end generates log data slowly, the electronic device uses fewer resources to load less log data, so that it loads log files at a lower rate. This makes more reasonable use of the resources of the electronic device and the source end (for example, processor resources and input/output interface resources) and improves the real-time performance of data integration.
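The text only states that the dynamic step size is related to the maximum log data identifier in the source database; it does not give a formula. The sketch below therefore uses an assumed rule: the step is the backlog between the flag and the source maximum, clamped to hypothetical lower and upper bounds.

```python
from typing import Tuple


def second_range(
    flag: int,
    source_max_id: int,
    min_step: int = 100,       # hypothetical lower bound on the step
    max_step: int = 10_000,    # hypothetical upper bound on the step
) -> Tuple[int, int]:
    """Return (range_start, range_end) of the second identification range."""
    # Assumed rule: the further the source is ahead, the larger the step.
    backlog = max(source_max_id - flag, 0)
    step = min(max(backlog, min_step), max_step)
    return flag, flag + step


print(second_range(flag=1000, source_max_id=1050))    # slow source, small step
print(second_range(flag=1000, source_max_id=50_000))  # fast source, capped step
```

A large backlog widens the range so the device catches up faster, and the upper bound limits how much is loaded in a single retrieval.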
In a second aspect, the present application provides an electronic device including a memory, one or more processors, and a Bluetooth module; the memory is coupled to the processor. The memory stores computer program code that includes computer instructions; when the computer instructions are executed by the processor, the electronic device performs the method provided by the first aspect or any possible design of the first aspect.
In a third aspect, the present application provides a data storage system including a source database and a target database. The data storage system further includes the electronic device provided in the second aspect.
In a fourth aspect, the present application provides a computer-readable storage medium storing instructions; when the instructions are run on an electronic device, the electronic device performs the method provided by the first aspect or any possible design of the first aspect.
In a fifth aspect, the present application provides a computer program product containing instructions; when the computer program product is run on an electronic device, the electronic device can perform the method provided by the first aspect or any possible design of the first aspect.
For the technical effects of any design of the second to fifth aspects, refer to the technical effects of the corresponding designs of the first aspect; they are not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a data system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of an example of the data processing method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of log files stored by a data storage node provided by an embodiment of the present application;
FIG. 5 is a schematic flowchart of yet another example of the data processing method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of another example of the data processing method provided by an embodiment of the present application;
FIG. 7 is a schematic flowchart of an electronic device obtaining log files, provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the principle of an example of the data processing method provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of the principle of another example of the data processing method provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of the principle of yet another example of the data processing method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the principle of another example of the data processing method provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of the principle of yet another example of the data processing method provided by an embodiment of the present application;
FIG. 13 is a schematic flowchart of another example of the data processing method provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of the principle of another example of the data processing method provided by an embodiment of the present application;
FIG. 15 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application;
FIG. 16 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
Detailed Description
Before the embodiments of the present application are described, the relevant terms involved in them are introduced first.
A database is a repository that organizes, stores, and manages data according to a data structure. It is a large, organized, shareable collection of data stored in an electronic device over the long term. By content, the files in a database can be divided into data files, log files, and control files. A data file is a file whose content is the data that the database needs to store, such as customer orders, product quantities, raw material inventory, and product page views. A control file is a file whose content is the data operations performed on the data in the database, such as querying, adding, or deleting data in the database. A log file is a file whose content records changes to the database; for example, data AA in the database was added at time XX, and data BB in the database was deleted at time YY. For ease of understanding, the content of a log file is referred to below as log data.
When a data storage node of the database generates log data, it generates an identifier for that log data (hereinafter, the log data identifier) according to the order in which the log data is generated, for example a log point (system change number, scn) or a log label (binlog). The database then adds the log data, together with its log data identifier (such as its scn), to a log file. In other words, the log data identifier can be used to characterize the order in which data operations on the database occurred. It can be understood that the scn of log data is positively correlated with its generation time: the larger the scn, the later the log data was generated; the smaller the scn, the earlier it was generated. Likewise, the binlog of log data is positively correlated with its generation time: the larger the binlog, the later the log data was generated; the smaller the binlog, the earlier it was generated.
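The ordering property described above can be illustrated with a minimal sketch. It assumes the log data identifier is a plain increasing integer; the names `LogRecord` and `ScnAllocator` are hypothetical and do not correspond to any real database API.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    scn: int        # log data identifier (assumed here to be an integer)
    operation: str  # description of the data operation


class ScnAllocator:
    """Hands out strictly increasing identifiers in generation order."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def new_record(self, operation: str) -> LogRecord:
        return LogRecord(scn=next(self._counter), operation=operation)


allocator = ScnAllocator(start=100)
records = [allocator.new_record(op) for op in ("add AA", "delete BB", "modify CC")]

# Even if records arrive out of order, sorting by scn recovers
# the order in which the operations occurred.
ordered = sorted(reversed(records), key=lambda r: r.scn)
```

Because a later record always receives a larger identifier, comparing two identifiers is enough to tell which operation happened first.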
In addition, a data storage node of the database also generates a log file identifier for each log file, for example a log file number (redolog), based on the number of log files in that data storage node.
In some databases, to keep a single log file from growing too large, the database usually splits the log into multiple files: when the log file currently being written (the online log file) reaches a preset size, the online log file becomes an archived log file, and the log content not yet written is rolled over into the next log file (the ready log file). Furthermore, in some databases with multiple data storage nodes, if a single piece of log data is relatively large, the database can split it into multiple pieces of log data and distribute them across multiple data storage nodes for storage. This avoids a single-machine resource bottleneck.
It should be noted that an online log file is still being written and is not yet complete, so it has a log file number and a start log data identifier. The start log data identifier is the log data identifier of the first piece of log data written to the log file. An archived log file has been completely written, so it has a log file number, a start log data identifier, and an end log data identifier. The end log data identifier is the log data identifier of the last piece of log data written to the log file.
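The difference between the two kinds of log file can be sketched as a small data structure. This is only an illustration: the class name `LogFileMeta`, its field names, and the identifier values are assumptions, not part of the described method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogFileMeta:
    file_number: int               # log file number
    start_scn: int                 # identifier of the first log data written
    end_scn: Optional[int] = None  # identifier of the last log data; None while online

    @property
    def is_archived(self) -> bool:
        # Only a completely written (archived) file has an end identifier.
        return self.end_scn is not None

    def archive(self, last_scn: int) -> None:
        if last_scn < self.start_scn:
            raise ValueError("end identifier cannot precede start identifier")
        self.end_scn = last_scn


online = LogFileMeta(file_number=140, start_scn=5000)
archived = LogFileMeta(file_number=139, start_scn=4000)
archived.archive(last_scn=4999)
```

In this sketch an online file is recognizable simply by its missing end identifier.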
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. In the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may denote A or B. "And/or" in the present application merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may denote three cases: A alone, both A and B, or B alone, where A and B may each be singular or plural. In addition, in the description of the embodiments of the present application, unless otherwise specified, "a plurality of" means two or more. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single or plural items. For example, at least one of a, b, or c may denote: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or multiple. Furthermore, to describe the technical solutions clearly, the embodiments of the present application use words such as "first" and "second" to distinguish identical or similar items whose functions and effects are substantially the same. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or execution order, and do not require that the items be different.
Meanwhile, in the embodiments of the present application, words such as "exemplary" or "for example" are used to indicate an example, illustration, or explanation. Any embodiment or design described as "exemplary" or "for example" in the embodiments of the present application should not be construed as preferred or more advantageous than other embodiments or designs. Rather, such words are intended to present the related concepts in a concrete manner for ease of understanding.
Data integration means bringing data together logically or physically; it can be understood as gathering the data of multiple data storage nodes in a database. For example, an electronic device integrates data from a source database device or source database device cluster (hereinafter, the source) into a target database device or target database device cluster (hereinafter, the target). During data integration, various factors can cause source data to be omitted, that is, not integrated into the target; this is referred to as the missing data phenomenon. Missing data makes the target data inconsistent with the source data and thus distorts the data at the target.
For example, in a real-time data analysis service, data distortion means that accurate data changes cannot be captured in time, so the service sees data changes late and cannot respond to them promptly. For services that require highly accurate data changes, such as financial data analysis, foot-traffic analysis, and order processing, data distortion creates significant business risk. As another example, in an offline data analysis service, data distortion introduces relatively large errors into subsequent data computations and wastes computing resources.
In some solutions, the electronic device loads log files from the source and then, based on the log data in those log files, performs data operations (such as adding, deleting, or modifying data) on the data at the target. Next, based on the result of the data operations at the target (for example, the log data identifier corresponding to the last data operation), the electronic device determines the range of data it needs to load next time. By cyclically loading log files from the source and performing data operations on the target based on their log data, the electronic device integrates the source data into the target. However, such a solution does not consider the case in which a log file with missing content is obtained. Data integration based on it therefore suffers from missing data, which makes the source and target data inconsistent and distorts the data at the target.
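A minimal sketch may help show why such a loop can lose data. The scenario is an assumption made for illustration: one log file is read while it is still incomplete, so one of its records never arrives, while a record with a larger identifier arrives from elsewhere. The function name `apply_and_advance` and the sample data are hypothetical.

```python
from typing import Dict, List


def apply_and_advance(fetched_scns: List[int],
                      target: Dict[int, str],
                      source: Dict[int, str]) -> int:
    """Apply the fetched log data to the target, then return the identifier
    from which the next load starts (last applied identifier + 1)."""
    for scn in sorted(fetched_scns):
        target[scn] = source[scn]
    return max(fetched_scns) + 1


source = {1: "modify A", 2: "delete B", 3: "modify D", 4: "add E"}
target: Dict[int, str] = {}

# Assumed scenario: identifier 3 was not yet present in the file that was
# read, but identifier 4 was obtained from another log file.
next_start = apply_and_advance([1, 2, 4], target, source)

# The next load starts after 4, so identifier 3 is never requested again.
missing = sorted(set(source) - set(target))
```

Because the next range is derived only from the last applied identifier, nothing in this loop ever goes back for the skipped record.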
In view of this, an embodiment of the present application provides a data processing method in which the electronic device can identify, among the log files it has obtained, a log file with missing content. The electronic device can then derive a log flag bit from the log file with missing content and, during the next data integration, obtain log files based on that log flag bit. In this way, the next time log files are obtained, the electronic device obtains again the log file whose content was found to be missing this time. This mitigates missing data during data integration, improves the consistency between the source and target data, and thereby reduces data distortion at the target.
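The idea can be sketched as follows, under stated assumptions. This passage does not specify how a log file with missing content is detected or how the log flag bit is computed, so the sketch takes the detection result as an input (`content_missing`) and assumes the flag is the start identifier of the earliest such file. All names and values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FetchedFile:
    file_number: int
    start_scn: int
    last_scn_read: int
    content_missing: bool  # detection itself is outside this sketch


def next_flag(files: List[FetchedFile]) -> int:
    """Return the identifier from which the next fetch should start."""
    if not files:
        raise ValueError("no log files were fetched")
    incomplete = [f.start_scn for f in files if f.content_missing]
    if incomplete:
        # Restart from the earliest incomplete file so it is fetched again.
        return min(incomplete)
    # Nothing is missing: continue after the last identifier that was read.
    return max(f.last_scn_read for f in files) + 1


batch = [
    FetchedFile(file_number=139, start_scn=100, last_scn_read=199, content_missing=False),
    FetchedFile(file_number=140, start_scn=200, last_scn_read=230, content_missing=True),
    FetchedFile(file_number=77, start_scn=150, last_scn_read=260, content_missing=False),
]
flag = next_flag(batch)
```

Here the flag falls back to the start of file 140 even though identifiers up to 260 were read from another file, so the incomplete file is covered by the next fetch.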
The data processing method provided by the embodiments of the present application can be applied to a data storage system. For example, FIG. 1 shows a schematic structural diagram of a data storage system. According to the direction of data flow, the data storage nodes in the system can be divided into a source database device cluster (source 200) and a target database device cluster (target 300). One or more electronic devices 100 (only one is shown in the figure) between the source 200 and the target 300 can control data integration between the source 200 and the target 300, or operate on the data in the source 200 or the target 300 (for example, adding, deleting, or modifying data). The source 200 may establish communication connections with one or more data operation devices (only one is shown in the figure), which perform data operations (such as adding, deleting, or modifying data) on the data in the source 200. Similarly, the target 300 may establish communication connections with one or more data operation devices, which perform data operations on the data in the target 300. The source 200 may include multiple data storage node devices (hereinafter, nodes), and the target 300 may likewise include multiple data storage nodes. The nodes of the source 200 or the target 300 store log files and data files; that is, a node stores both log files and data files.
In some embodiments, the data storage system shown in FIG. 1 may have a multi-node cluster architecture (for example, an Oracle database multi-node cluster architecture). In a multi-node cluster architecture, each node rolls its log content into log files. The log content indicates that data in the database has changed; writing it to a log file records the changes made to the table data in the database. To keep a single log file from growing too large, the log is usually split into multiple files: when the current log file reaches a preset size (for example 1 GB, 3 GB, or 5 GB), the log content not yet written is rolled over into the next log file. In addition, the multi-node architecture allows large volumes of data and log content to be distributed across multiple nodes for storage, which avoids a single-node resource bottleneck.
For example, in FIG. 1, the data operation device adds data to the source 200. The source 200 then distributes the added data among multiple data storage nodes of the source (such as node 1 or other nodes) for storage. The source 200 can also generate a log file for this add operation, and that log file can be stored on any node of the source 200. Next, the electronic device 100 can integrate the data in the multiple data storage nodes of the source 200 into the target 300. If the target 300 has multiple data storage nodes, the target 300 also stores the data in a distributed manner.
It can be understood that the structure of the data storage system shown in FIG. 1 does not limit the database system in any way. In actual use, the database system may include more or fewer components than shown, combine certain components, or arrange the components differently. Those skilled in the art can design the structure of the database system according to actual use.
The data storage node device provided by the embodiments of the present application may be any device with data storage and processing capabilities, for example a server (such as a heterogeneous server or a cloud server), a personal computer, a notebook computer, network attached storage (NAS), a mobile phone, or an in-vehicle computer. The electronic device provided by the embodiments of the present application may be any electronic device with data transmission and processing capabilities, for example a personal computer (PC), a notebook computer, a tablet computer, a personal digital assistant (PDA), an ultra-mobile personal computer (UMPC), a server, or an in-vehicle computer. The embodiments of the present application do not limit the specific product form of the data storage node device or the electronic device. In some embodiments, the electronic device may be one that runs an operating system and has applications installed. For example, the operating system run by the electronic device may be Android™, Windows™, Linux™, or the like.
FIG. 2 shows a schematic diagram of the hardware structure of the electronic device 100.
As shown in FIG. 2, the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a power management module 141, an antenna, a wireless communication module 150, a display screen 140, and the like.
It can be understood that the structure illustrated in the embodiments of the present application does not specifically limit the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, combine certain components, split certain components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be independent devices or may be integrated into one or more processors.
The controller may be the nerve center and command center of the electronic device 100. It generates operation control signals according to the instruction opcode and timing signals, and controls instruction fetching and execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. It can hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and thus improves system efficiency.
The wireless communication module 150 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
The display screen 140 is used to display images, video, and the like. It includes a display panel, which may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro OLED, quantum dot light-emitting diodes (QLED), or the like.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to provide a data storage function, for example saving files such as music and video on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to perform the various functional applications and data processing of the electronic device 100. For example, in the embodiments of the present application, the processor 110 may execute instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by at least one function (such as a sound playback function or an image playback function). In addition, the internal memory 121 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS).
The USB interface 130 is an interface that conforms to the USB standard specification; specifically, it may be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 can be used to connect a charger to supply power to the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through them, or to connect other electronic devices, such as AR devices.
For example, the electronic device may obtain log files from the source database through the antenna and the wireless communication module 150, and then store the obtained log files in the internal memory 121. Alternatively, the electronic device may obtain log files from the source database through the USB interface 130. Next, the electronic device may parse the obtained log files using the processor 110. The electronic device may then send the parsed log files to the target database through the antenna and the wireless communication module 150. Alternatively, the electronic device may obtain log files from the source database through the USB interface 130. In this way, the data of the source database can be integrated into the target database.
As another example, the processor 110 of the electronic device may, by means of instructions stored in the internal memory 121, identify a missed log file among the obtained log files. Next, based on the log data identifier of the missed log file, the electronic device can determine which log files it needs to obtain the next time it obtains log files. In this way, the next time the electronic device obtains log files, it can obtain the log file that was missed this time. This mitigates missing data during data integration, improves the consistency between the source and target data, and thereby reduces data distortion at the target.
Below, the data processing method provided by the embodiments of the present application is described, taking as an example the data system shown in FIG. 1 in which the electronic device 100 is a PC and the source 200 includes at least two data storage nodes (such as node 1 and node 2).
Referring to FIG. 3, the data processing method provided by the present application may include steps S101-S103.
In some embodiments, before step S101, the data processing method provided by the present application may further include step S100.
S100. The source generates a log file.
In some embodiments, the source may receive data operations (such as adding, deleting, or modifying data) performed on it by a data operation device that has established a communication connection with the source. Because such operations change the data files of the source, the source generates log data and writes it to the online log file of any one of its data storage nodes.
For example, suppose the data operation device deletes data Z in node 1, so the data in node 1 changes. The source then generates log data recording that data Z was deleted, and writes this log data at random to the online log file of any node of the source (such as node 1 or a node other than node 1).
In some embodiments, the source may also generate log file groups. For example, when the source generates a log file, the data storage node of the source backs up the log file to produce a backup file, and packages the log file and its backup file into one file group (a log file group). This improves the reliability of the log files.
In some embodiments, a data storage node of the source may also use a rolling file mechanism and initialize multiple log files (log file groups), which improves how promptly the data storage node of the source can store log files.
For example, referring to FIG. 4, node 1 uses the rolling file mechanism and has initialized three log file groups (log file group 1, log file group 2, and log file group 3). Each log file group includes one log file and one backup file of that log file. Suppose node 1 generates log data; node 1 writes it to the log file and the backup log file of log file group 1. If the log file group, or the log file in it, reaches the preset file size (for example 1 gigabyte (GB) or 3 GB), node 1 archives the log file, producing an archived log file. Node 1 then writes newly generated log data to log file group 2 and initializes a new log file group. It can be understood that, because log file group 2 is already initialized, node 1 can write log data to log file group 2 immediately after log file group 1 reaches the preset file size, which improves how promptly the data storage nodes of the source store log files. Moreover, node 1 may generate log data quickly, possibly faster than it can initialize a log file group. Node 1 can therefore also set up log file group 3 and write log data to it once log file group 2 reaches the preset file size. In this way, even if node 1 generates log data faster than it initializes log file groups, node 1 still has an initialized log file group available to write log data in time. This further improves how promptly the data storage node stores log files.
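The rolling mechanism above can be sketched in a few lines. This is a simplified model under stated assumptions: a record count stands in for the preset file size, lists stand in for the log file and its backup file, and archiving happens when the next write finds the current group full. The names `LogGroup` and `RollingLogWriter` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogGroup:
    group_id: int
    records: List[str] = field(default_factory=list)  # the log file
    backup: List[str] = field(default_factory=list)   # its backup file


class RollingLogWriter:
    def __init__(self, max_records: int, ready_groups: int = 3) -> None:
        self.max_records = max_records  # stand-in for the preset file size
        self._next_id = 1
        # Groups are initialized ahead of time so a switch never has to wait.
        self.ready: List[LogGroup] = [self._new_group() for _ in range(ready_groups)]
        self.archived: List[LogGroup] = []

    def _new_group(self) -> LogGroup:
        group = LogGroup(self._next_id)
        self._next_id += 1
        return group

    def write(self, record: str) -> None:
        current = self.ready[0]
        if len(current.records) >= self.max_records:
            # Archive the full group, switch to the next initialized group,
            # and initialize a fresh group to keep the pool topped up.
            self.archived.append(self.ready.pop(0))
            self.ready.append(self._new_group())
            current = self.ready[0]
        current.records.append(record)
        current.backup.append(record)  # keep the backup file in step


writer = RollingLogWriter(max_records=2)
for i in range(5):
    writer.write(f"record {i}")
```

After five writes with a limit of two records per group, groups 1 and 2 are archived, group 3 is being written, and two further groups are already initialized.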
It can be understood that the log file in log file group 1 is the one node 1 is currently writing, so it is an online log file. The largest log file number among the archived log files is 139, and the log file number of this file is 140. Once this log file reaches the preset file size, node 1 archives it and places it in the location in node 1 where archived log files are stored (for example, an archived log file queue).
For example, suppose the data operation device modifies source data A; the source can then record the operation in log file A of node 1. That is, the log content (log data) of log file A records that the data operation device modified source data A.
For example, suppose the data operation device deletes source data B; the source can then record the operation in log file B of node 2. That is, the log content (log data) of log file B records that the data operation device deleted source data B.
示例性的,假设数据操作设备,对源端数据D进行了修改操作;之后,源端可以将该操作记录在节点1的日志文件A中。也就是说,该日志文件A的日志内容(日志数据)中会记录着,数据操作设备对源端数据A进行修改操作。Exemplarily, it is assumed that the data operation device modifies the source data D; after that, the source can record the operation in the log file A of node 1 . That is to say, the log content (log data) of the log file A will record that the data operation device modifies the source data A.
示例性的,假设数据操作设备,向源端增加了源端数据E;之后,源端可以将增加源端数据E的操作记录在节点2。此时,由于节点2的日志文件B的大小达到了预设大小(如,2Gb)。节点2会对日志文件B进行归档操作,以及节点2会新建日志文件C,并将增加源端数据E的操作记录在节点2的日志文件C中。也就是说,节点2的日志文件C的日志内容(日志数据)中会记录着,数据操作设备对源端数据E的增加操作。Exemplarily, it is assumed that the data operation device adds source data E to the source; after that, the source can record the operation of adding source data E in node 2 . At this time, because the size of the log file B of node 2 has reached a preset size (for example, 2Gb). Node 2 will archive log file B, and node 2 will create a new log file C, and record the operation of adding source data E in log file C of node 2. That is to say, the log content (log data) of the log file C of node 2 will record the addition operation of the data operation device on the source data E.
S101. The electronic device obtains log files from the source.
In some embodiments, the electronic device may use a log data identifier range to obtain, from the source, the log files whose log data identifiers fall within that range. It can be understood that the log data identifier range may be an scn range or a binlog range.
In some embodiments, obtaining the log files from the source may include the electronic device loading the log files from the source.
Exemplarily, the electronic device may use an scn range to obtain, from each data storage node at the source, the log files whose scn falls within that scn range.
As another example, the electronic device may use a binlog range to obtain, from each data storage node at the source, the log files whose binlog falls within that binlog range.
It can be understood that log data identifiers are generated according to certain rules, for example sequentially in order of the generation time of the log data. That is, log data identifiers increase in sequence: the larger the identifier, the later the log data was generated. Log data identifiers are therefore related to time.
It can be understood that a log data identifier range may consist of a range start point and a range end point. Thus, when obtaining log files from the source, the electronic device can obtain the log files whose log data identifiers are greater than or equal to the range start point and less than or equal to the range end point.
Exemplarily, for an online log file in a data storage node at the source, if the start log data identifier of the online log file is less than or equal to the range end point, the electronic device may obtain that online log file.
It can be understood that the log data in an online log file keeps growing. Therefore, obtaining every online log file whose start log data identifier is less than or equal to the range end point lets the electronic device obtain online log files more comprehensively and completely, which reduces data omissions on the electronic device.
Exemplarily, for an archived log file in a data storage node at the source, the electronic device may obtain the archived log file if its start log data identifier is greater than or equal to the range start point and less than or equal to the range end point, or if its end log data identifier is greater than or equal to the range start point and less than or equal to the range end point. That is, the electronic device obtains the archived log files whose log data identifiers fall within the log data identifier range.
It can be understood that, when obtaining log files from the source, the electronic device in most cases obtains only complete log files. On this basis, obtaining the archived log files whose log data identifiers fall within the log data identifier range lets the electronic device obtain archived log files more comprehensively and completely, which reduces data omissions on the electronic device.
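The two selection rules above (one for online log files, one for archived log files) can be sketched as a single filter. The `LogFile` record and the function name `select_log_files` are hypothetical; the sketch also assumes that every archived file carries a known end identifier, while an online file has none yet.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class LogFile:
    number: int
    start_id: int                 # start log data identifier
    end_id: Optional[int]         # end log data identifier; None while online
    online: bool = False


def select_log_files(files: Iterable[LogFile],
                     range_start: int,
                     range_end: int) -> List[LogFile]:
    """Return the files to load for the range [range_start, range_end]."""

    def in_range(identifier: int) -> bool:
        return range_start <= identifier <= range_end

    selected = []
    for f in files:
        if f.online:
            # Online files keep growing, so only the start is compared.
            if f.start_id <= range_end:
                selected.append(f)
        elif in_range(f.start_id) or in_range(f.end_id):
            # Archived files: the start or the end must fall inside the range.
            selected.append(f)
    return selected
```

Read literally, the archived-file rule selects nothing when the requested range lies strictly inside a single archived file, since neither endpoint of that file falls within the range; the sketch reproduces the rule as worded rather than extending it.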
It can be understood that step S101 differs slightly between the first data integration scenario and the non-first data integration scenario. The first data integration scenario refers to the case where the electronic device has never performed data integration, or has not performed data integration within a preceding period of time.
In the first data integration scenario, the log data identifier range may be specified by the user of the electronic device.
In the non-first data integration scenario, the log data identifier range is derived from a history log flag, or is specified by the user of the electronic device. The history log flag can be understood as the log flag that the electronic device obtained the last time it performed data integration.
In some embodiments, the electronic device may use the history log flag to determine whether it is in the first data integration scenario. If the electronic device can obtain a history log flag, it determines that it is in the non-first data integration scenario. If the electronic device searches its memory and finds no history log flag, it determines that it is in the first data integration scenario.
Step S101 is described below for an electronic device in the first data integration scenario.
In the first data integration scenario, the electronic device has not performed data integration before, or has not done so within a preceding period of time; that is, it cannot obtain a history log flag. Accordingly, the electronic device can obtain a range start point specified by its user. The electronic device then derives the range end point using a dynamic step size or a preset step size (for example, 500 or 5000). Next, the electronic device obtains log files from the source based on the range start point and the range end point. Exemplarily, referring to FIG. 5, step S101 may include steps S101a1-S101a4.
S101a1. The electronic device determines that it is in the first data integration scenario.
Exemplarily, if the electronic device does not obtain a history log flag, it determines that it is in the first data integration scenario.
S101a2. The electronic device obtains the specified range start point.
Exemplarily, the electronic device may obtain the range start point that its user enters on the electronic device.
S101a3. Based on the specified range start point, the electronic device calculates the range end point using a dynamic step size.
Here, the dynamic step size is a step size that changes according to the current maximum log data identifier at the source. Setting a dynamic step size allows the speed at which the electronic device loads (obtains) log files from the source to be adjusted dynamically according to the speed at which the source generates log data. When the source generates log data quickly, the electronic device uses more resources to load more log data, so that it loads log files from the source at a higher speed; when the source generates log data slowly, the electronic device uses fewer resources to load less log data, so that it loads log files at a lower speed. In this way, the resources of the electronic device and the source (for example, processor resources and input/output interface resources) are used more reasonably, and the real-time performance of data integration is improved.
Exemplarily, referring to FIG. 6, step S101a3 may include steps S101a31-S101a36.
Here, the range start point is denoted $x_0$, the initial step size is denoted $y_0$, and the current maximum log data identifier at the source is denoted $z_0$.
S101a31. The electronic device calculates the log data identifier $x_1$: $x_1 = x_0 + y_0$. Here, $y_0$ may be a preset value (for example, 500, 1000, or 2000).
S101a32. The electronic device obtains the current maximum log data identifier $z_0$ of the source.
Exemplarily, the electronic device may obtain the maximum log data identifier in the online log file of each of the multiple nodes currently at the source, and take the largest of these as the maximum log data identifier $z_0$ of the source.
S101a33. The electronic device calculates the difference $z_0 - x_1$ between the maximum log data identifier $z_0$ and the log data identifier $x_1$.
S101a34. The electronic device determines whether the difference $z_0 - x_1$ is within a preset range $k_0$ (for example, [-100, 100] or [-500, 500]). If it is, the electronic device outputs $x_1$ as the range end point; if it is not, the electronic device executes step S101a35.
S101a35. The electronic device determines whether the step size $y_0$ is within a preset interval $k_1$ (for example, [100, 10000] or [300, 5000]). If $y_0$ is outside the preset interval, the electronic device outputs $x_1$ as the range end point; if it is within the preset interval, the electronic device executes step S101a36.
S101a36. The electronic device adjusts the step size $y_0$ based on $z_0 - x_1$ and then executes S101a31 again, repeating until a range end point is output.
For example, the electronic device adds $z_0 - x_1$ to the step size $y_0$, or adds $0.8(z_0 - x_1)$, $1.2(z_0 - x_1)$, and so on. It can be understood that the source continuously generates log data identifiers; that is, while the electronic device is calculating the range end point, the source generates new log data identifiers. Therefore, dynamically adjusting the step size according to the difference $z_0 - x_1$ lets the speed at which the electronic device obtains log data follow the rate at which the source generates log data, so that the resources of the electronic device and the source are used reasonably.
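Steps S101a31-S101a36 can be sketched as the loop below. The function name `compute_range_end`, the callback `get_source_max_id`, and the `factor` parameter (standing in for the 0.8 or 1.2 multipliers mentioned above) are illustrative assumptions. The `max_rounds` cap is not part of the described method; it is added here only so the sketch is guaranteed to terminate.

```python
from typing import Callable, Tuple


def compute_range_end(
    x0: int,
    y0: int,
    get_source_max_id: Callable[[], int],
    k0: Tuple[int, int] = (-100, 100),
    k1: Tuple[int, int] = (100, 10000),
    factor: float = 1.0,
    max_rounds: int = 100,   # assumption: safety cap, not in the source text
) -> int:
    """Return the range end point x1 for a range starting at x0."""
    x1 = x0 + y0
    for _ in range(max_rounds):
        x1 = x0 + y0                        # S101a31
        z0 = get_source_max_id()            # S101a32: re-read on every round
        diff = z0 - x1                      # S101a33
        if k0[0] <= diff <= k0[1]:          # S101a34: close enough to z0
            return x1
        if not (k1[0] <= y0 <= k1[1]):      # S101a35: step left its interval
            return x1
        y0 += round(factor * diff)          # S101a36: adjust and repeat
    return x1
```

With `factor = 1.0` and a source whose maximum identifier stays fixed, the second round lands exactly on that maximum; smaller factors approach it more gradually.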
S101a4. The electronic device obtains log files from the source based on the range start point and the range end point.
In other embodiments, the electronic device may obtain the specified range start point, then obtain the maximum log data identifier of the source and use it as the range end point when obtaining log files from the source. Here, the maximum log data identifier of the source is the largest log data identifier among the log data in the online log files of the multiple data storage nodes at the source.
In other embodiments, the electronic device may also compare the difference $z_0 - x_1$ with $y_0$. If $z_0 - x_1 > y_0$, the electronic device increases $y_0$ (for example, by 10 or 100), calculates the range end point based on the increased $y_0$, and subsequently obtains log files from the source based on that range end point. If $z_0 - x_1 < y_0$, the electronic device obtains log files from the source based on $x_1$. And/or, the electronic device may compare $x_1 - z_0$ with $y_0$. If $x_1 - z_0 > y_0$, the electronic device decreases $y_0$ (for example, by 10 or 100), calculates the range end point based on the decreased $y_0$, and subsequently obtains log files from the source based on that range end point. If $x_1 - z_0 < y_0$, the electronic device obtains log files from the source based on $x_1$.
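The fixed-increment variant in the preceding paragraph can be sketched as a single adjustment. The name `adjust_range_end` and the `delta` parameter are hypothetical. The text does not say what happens when a difference exactly equals $y_0$; this sketch assumes $x_1$ is used unchanged in that case.

```python
def adjust_range_end(x0: int, y0: int, z0: int, delta: int = 100) -> int:
    """One-shot step adjustment; returns the range end point."""
    x1 = x0 + y0
    if z0 - x1 > y0:
        # The source is far ahead: enlarge the step before computing the end.
        return x0 + (y0 + delta)
    if x1 - z0 > y0:
        # The candidate end overshoots the source: shrink the step.
        return x0 + (y0 - delta)
    return x1
```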
In some embodiments, step S101 may further include step S101a5.
S101a5. If the electronic device determines that the source has produced a first type of missing log file, the electronic device executes step S101a4 again; otherwise, the electronic device executes the subsequent steps.
In some embodiments, if the electronic device determines that the source has produced a first type of missing log file, the electronic device may execute step S101a4; alternatively, it may execute step S101a4 together with any of the steps preceding it (for example, S101a1-S101a4, S101a2-S101a4, or S101a3-S101a4).
Here, a first type of missing log file can be understood as a log file that the electronic device cannot obtain, for example a log file that is being archived. When an online log file reaches the preset size, the data storage node at the source archives it, turning the online log file into an archived log file; that is, the data storage node switches log files. If the electronic device obtains log files from the source at that moment, it cannot obtain the log file that is being archived and can obtain only the current online log file and the existing archived log files. If the electronic device then performs the subsequent data integration steps directly on the log files it obtained, this round of data integration omits data, because the log file that was being archived is left out.
In some embodiments, the electronic device may determine whether the source has produced a first type of missing log file by determining whether a file switch occurred at the source while the electronic device was obtaining log files (for example, during step S101a4). If a file switch occurred at the source, the electronic device determines that the source has produced a first type of missing log file; if no file switch occurred, the electronic device determines that the source has not produced one.
On this basis, the electronic device can determine whether a file switch occurred at the source and, if so, obtain the log files from the source again. The log file that was being switched can then be obtained in the form of an archived log file once the switch has completed (that is, once it has become an archived file). The electronic device therefore does not omit that log file when it performs the subsequent data integration steps. This alleviates data omissions during data integration, improves the consistency between source and target data, and reduces data distortion at the target.
Exemplarily, the electronic device may determine whether a log file switch occurred at the source by checking whether the log file numbers of the log files from the same data storage node are consecutive. If the file numbers of the log files from the same data storage node are consecutive, no log file switch occurred; if they are not consecutive, a log file switch occurred.
For example, referring to FIG. 7, the online log file of node 1 has log file number 140, and the archived log files have log file numbers 134-139. If log file 140 reaches the preset size, node 1 archives it; node 1 then writes log data into the log file in log file group 2 and assigns that log file the number 141. If the electronic device obtains log files from node 1 at this moment, it cannot obtain log file 140; the log files it can obtain are numbered 141, 139, 138, 137, 136, 135, and 134. The electronic device then detects that the log file numbers of the obtained log files are not consecutive, determines that a log file switch has occurred on node 1, and obtains the log files from the source again.
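The continuity check in this example can be sketched as below, assuming the electronic device holds the log file numbers it obtained from one data storage node. The function names are hypothetical.

```python
from typing import Iterable, List


def missing_file_numbers(numbers: Iterable[int]) -> List[int]:
    """Return the log file numbers absent between the smallest and largest."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]


def switch_detected(numbers: Iterable[int]) -> bool:
    """A gap in the numbering is taken to mean a log file switch occurred."""
    return bool(missing_file_numbers(numbers))
```

For the numbers in FIG. 7, `missing_file_numbers([141, 139, 138, 137, 136, 135, 134])` yields `[140]`, the file that was being archived.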
Exemplarily, the electronic device may determine whether the source has produced a first type of missing log file based on a file switch indication sent by the source. For example, a data storage node at the source may generate a file switch indication and send it to the electronic device before archiving a log file (that is, before the file switch occurs). If the electronic device receives the file switch indication, it determines that the source has produced a first type of missing log file; if it does not receive the indication, it determines that the source has not produced one.
As a possible implementation, in step S101 the electronic device may also perform a cyclic acquisition process to obtain all the source log files corresponding to the log data identifier range.
The cyclic acquisition process includes: if the electronic device cannot obtain all the source log files corresponding to the log data identifier range, it obtains the source log files corresponding to that range again, until it has obtained all of them.
As another possible implementation, if the number of times the electronic device has performed the cyclic acquisition process equals a preset count threshold (for example, 5 or 10), the electronic device determines, from the log file numbers of the source log files obtained in the last cyclic acquisition process, the log file numbers of the source log files that the last cyclic acquisition process did not obtain. Next, the electronic device uses those log file numbers to obtain the start log data identifier or the end log data identifier of each source log file that was not obtained. The electronic device then uses that start or end log data identifier to obtain the missing source log file from the source database. The end log data identifier of a source log file is the log data identifier of the last piece of log data in that file.
It can be understood that if log file switches occur frequently at the source, the source frequently produces first-type missing log files. The electronic device then executes step S101a5 and step S101a4 in a loop without proceeding to the subsequent data integration steps; that is, the delay before the source's log files are integrated into the target becomes long, which harms the real-time performance of data integration.
On this basis, in some embodiments, the electronic device may record the number of times it has reloaded log files because of log file switches at the source (for example, the number of times step S101a4 is executed after step S101a5). If that number exceeds a preset count threshold (for example, 5 or 10), the electronic device may skip reloading the log files (that is, not execute step S101a4) and proceed directly to the subsequent data integration steps (for example, performing data operations on the target database directly based on the log files already loaded).
In this way, once the electronic device has repeated step S101a5 and step S101a4 a certain number of times, it performs the subsequent data integration steps even if a first type of missing log file still exists (for example, it performs data operations on the target database according to the source log files obtained most recently). This allows the log files generated at the source to be integrated into the target in time, reduces the delay of integrating the source's log files into the target, and improves the real-time performance of data integration by the electronic device.
In addition, the source keeps generating log files. If log file switches occur frequently at the source, the electronic device executes step S101a4 after step S101a5 many times, and each time it does so it also obtains the log files newly generated at the source. The number of log files obtained in step S101a4 therefore keeps growing, so the volume of log file data that the electronic device must process becomes large; it may exceed the data processing capacity of the electronic device and may even cause the electronic device to crash.
On this basis, with a preset count threshold, the electronic device proceeds directly to the subsequent data integration steps once the number of reloads exceeds the threshold. This bounds the number of log files obtained in step S101a4, bounds the volume of log file data that the electronic device must process, and reduces crashes of the electronic device.
In some embodiments, if the recorded number of times that the electronic device has reloaded log files because of log file switches at the source exceeds the preset count threshold (for example, 5 or 10), the electronic device may determine the file number of the switched log file (that is, the first type of missing log file). The electronic device then obtains the start log data identifier or the end log data identifier of the switched log file based on its file number. Next, the electronic device does not reload the log files; it proceeds directly to the subsequent data integration steps and, the next time it loads log files, loads the switched log file using the start or end log data identifier. Here, a switched log file can be understood as a log file that was switched while the electronic device was obtaining log files from the source; that is, a log file that was being archived and turned into an archived log file. In some embodiments, a first type of missing log file (such as a switched log file) may also be referred to as an absent log file.
In this way, the electronic device can obtain the switched log file using its start log data identifier and end log data identifier, and subsequently integrate the log data in the switched log file into the target database. This further alleviates data omissions during data integration, improves data consistency between the source database and the target database, and reduces data distortion at the target.
Suppose the log files that the electronic device can obtain are numbered 141, 139, 138, 137, 136, 135, and 134. The electronic device detects that the log file numbers of the obtained log files are not consecutive and determines that the log file at the break in the numbering, that is, the log file numbered 140, is the switched log file. The electronic device then sends a query request to the source to query the start log data identifier and the end log data identifier of log file 140. The next time it obtains log files from the source, the electronic device loads log file 140 from the source based on those start and end log data identifiers.
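The bounded reload behaviour described above can be sketched as follows. The function `load_with_bounded_reloads`, the callback `fetch_numbers` (standing in for step S101a4 and returning the log file numbers obtained), and the returned tuple are assumptions for illustration; querying the identifiers of the deferred files is left to the caller.

```python
from typing import Callable, List, Tuple


def _gaps(numbers: List[int]) -> List[int]:
    """Log file numbers absent between the smallest and largest obtained."""
    present = set(numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]


def load_with_bounded_reloads(
    fetch_numbers: Callable[[], List[int]],
    max_reloads: int = 5,
) -> Tuple[List[int], List[int], int]:
    """Reload while a gap persists, but at most max_reloads times.

    Returns (numbers obtained, numbers deferred to the next load, reloads).
    """
    numbers = fetch_numbers()
    reloads = 0
    while _gaps(numbers) and reloads < max_reloads:
        numbers = fetch_numbers()       # reload after a detected switch
        reloads += 1
    # Any remaining gap is recorded instead of blocking data integration.
    return numbers, _gaps(numbers), reloads
```

Returning the deferred file numbers, rather than looping until they appear, mirrors the trade-off in the text: integration latency stays bounded, and the switched file is picked up on a later load.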
在一些实施例中,电子设备在从源端获取日志文件后,电子设备还可以对获取到的日志文件进行解析(如,编码转换、解密等等)。可以理解的,对于一些数据库上的日志文件会以一些特殊的编码格式存在,或者这些日志文件会被数据存储节点加密。基于此,电子设备可以对这些日志文件进行解析,并在解析后执行后续的数据集成步骤。In some embodiments, after the electronic device obtains the log file from the source, the electronic device may also analyze (for example, code conversion, decryption, etc.) the obtained log file. Understandably, log files on some databases exist in some special encoding formats, or these log files are encrypted by data storage nodes. Based on this, the electronic device can parse these log files, and perform subsequent data integration steps after parsing.
S102.电子设备从获取到的日志文件中,确定第二类遗漏日志文件。S102. The electronic device determines the second type of missing log files from the acquired log files.
其中,第二类遗漏日志文件可以理解为,上述电子设备获取到的日志文件中,日志数据发生了缺失的日志文件。也可以理解为,日志文件内容缺失的日志文件。也就是说,电子设备获取到了第二类遗漏日志文件的,日志文件编号和起点日志数据标识等信息,未获取到第二类遗漏日志文件的日志数据。第二类遗漏日志文件的文件内容,也就是第二类遗漏日志文件的日志数据是遗漏的。Wherein, the second type of missing log file can be understood as a log file in which log data is missing among the log files obtained by the above-mentioned electronic device. It can also be understood as a log file with missing log file content. That is to say, the electronic device has obtained information such as the log file number and the starting point log data identifier of the second type of missing log file, but has not obtained the log data of the second type of missing log file. The file content of the second type of missing log file, that is, the log data of the second type of missing log file is missing.
In some embodiments, the electronic device reads the log data in a log file. If no log data can be read from the file, the electronic device determines that the log data of that file has been missed and that the file is a second-type missing log file.
For example, the electronic device obtains the log file number of a log file and, using that number, reads only the file's name, attributes, start log data identifier, end log data identifier, and so on, but does not read the log data the file contains. The electronic device can then determine that the file is a second-type missing log file.
As another example, the electronic device reads the name, attributes, start log data identifier, and end log data identifier of a log file, but the log data it reads from the file is abnormal, so the electronic device cannot perform data operations based on that log data. The electronic device can then determine that the file is a second-type missing log file.
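The two checks above can be sketched as a single predicate. This is a hedged sketch, not the patented implementation: the `LogFile` structure, the `records` field, and the rule that a record is "abnormal" when its operation is not one of three known kinds are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed set of operations a log record may describe.
VALID_OPS = {"insert", "update", "delete"}


@dataclass
class LogFile:
    number: int                            # log file number
    start_id: int                          # start log data identifier
    end_id: Optional[int] = None           # None for an online log file
    records: Optional[List[dict]] = None   # None or empty: no log data was read


def is_second_type_missing(log_file: LogFile) -> bool:
    """True if metadata was obtained but usable log data was not."""
    if not log_file.records:
        # Case 1: only name/attributes/identifiers were read, no log data.
        return True
    # Case 2: log data was read but is abnormal and cannot be replayed.
    return any(r.get("op") not in VALID_OPS for r in log_file.records)
```

A file flagged by this predicate would be carried into step S103 rather than replayed on the target end.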
Log data may correspond to any one of a data modification operation, a data addition operation, or a data deletion operation.
For example, if the data operation recorded in the log data modifies data A, the electronic device modifies data A at the target end. If the recorded operation deletes data B, the electronic device deletes data B at the target end. If the recorded operation adds data C, the electronic device adds data C to the target end.
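Replaying such log data on the target end can be sketched as below. The target end is modeled as an in-memory dictionary and each log record as a small dictionary with `op`, `key`, and `value` fields; these shapes are assumptions for illustration only.

```python
def apply_log_record(target: dict, record: dict) -> None:
    """Apply one logged operation (add, modify, or delete) to the target."""
    op, key = record["op"], record["key"]
    if op in ("insert", "update"):
        # Addition and modification both end with the key holding the new value.
        target[key] = record["value"]
    elif op == "delete":
        # Deleting an absent key is treated as a no-op.
        target.pop(key, None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


target = {"A": 1, "B": 2}
for rec in (
    {"op": "update", "key": "A", "value": 10},
    {"op": "delete", "key": "B"},
    {"op": "insert", "key": "C", "value": 3},
):
    apply_log_record(target, rec)
print(target)  # {'A': 10, 'C': 3}
```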
In some embodiments, missing data is less likely when data integration uses archived log files than when it uses online log files. The electronic device may therefore extract the online log files from the acquired log files and then determine second-type missing log files only among those online log files. This reduces the number of log files that must be checked, which lowers the processing load of the electronic device and improves the efficiency of data integration.
In some embodiments, the electronic device may also perform data operations on the target end based on the acquired log files, so that data from multiple data storage nodes at the source end is aggregated at the target end.
S103. The electronic device obtains a log flag bit based on the second-type missing log files and the current log data identifier range.
The current log data identifier range is the range (range start, range end) that the electronic device used when acquiring log files in step S101 of the current data integration. The log flag bit may also be called a snapshot flag bit.
In the embodiments of the present application, the log flag bit is derived from the second-type missing log files, so it is tied to the log files whose content was found to be missing during the current data integration. The next time the electronic device acquires log files from the source end, it generates the log data identifier range for that acquisition from the log flag bit. Because the log flag bit is tied to the second-type missing log files determined this time, the range generated from it lets the electronic device acquire those files from the source end. The log flag bit thus marks the second-type missing log files found during the current data integration so that they are acquired during the next one. In this way, the electronic device can alleviate missing data in the data integration process and improve data consistency between the source end and the target end, thereby reducing data distortion at the target end.
In some embodiments, if the electronic device identifies no missing log file in step S102, then in step S103 it may obtain the log flag bit from the log data identifier range.
For example, if the electronic device identifies no second-type missing log file in step S102, then in step S103 it may directly use the range end of the log data identifier range as the log flag bit.
For example, referring to FIG. 8, the electronic device acquires archived log file 1, archived log file 2, online log file 1, and online log file 2, and identifies no second-type missing log file. The electronic device then uses the range end as the log flag bit.
In some embodiments, the electronic device may obtain the log flag bit based on the start log data identifier of the second-type missing log file and the range start of the current log data identifier range.
For example, the electronic device may take the larger of the log data identifier of the second-type missing log file and the range start of the log data identifier range as the reference log data identifier. The electronic device then derives the log flag bit from the value of the reference log data identifier; the value of the log flag bit may be less than or equal to that value. When the two values are equal, the log flag bit is simply the reference log data identifier.
As described above, when the electronic device acquires log files from the source end, it acquires the log files whose log data identifiers are greater than or equal to the range start, or less than or equal to the range end. The smaller the range start, the more log files the electronic device acquires from the source end and the smaller the log data identifiers of the acquired files. A log file with a smaller log data identifier was generated earlier at the source end and is further from the current time of the electronic device. Consequently, if the electronic device derives the log data identifier range for the next data integration from a relatively small log data identifier, it may acquire many log files that were generated long ago. Such stale log files degrade the real-time performance of data integration.
Moreover, a second-type missing log file is a log file that was acquired this time but whose content is missing. Its log data identifier is therefore greater than or equal to the range start of the current log data identifier range (hereinafter, the current start point); that is, the electronic device can acquire the second-type missing log file by using the current start point. In addition, the start log data identifier is the smallest log data identifier in a log file, so the electronic device can also acquire the second-type missing log file by using that file's start log data identifier.
Hence, whether it uses the start log data identifier of the second-type missing log file or the current start point, the electronic device can, the next time it acquires log files from the source end, acquire the second-type missing log file determined during the current data integration.
On this basis, the electronic device may determine a threshold (that is, the reference log data identifier) from the log data identifier of the second-type missing log file and the range start of the log data identifier range. If, the next time it acquires log files from the source end, the electronic device uses a range start greater than the reference log data identifier, it may fail to acquire the second-type missing log file determined this time.
In summary, if the log flag bit equals the reference log data identifier, the electronic device can acquire the second-type missing log file determined this time the next time it acquires log files from the source end, while also limiting, to some extent, both the number of log files it acquires and how stale they are. This alleviates missing data during data integration, improves the efficiency of log file acquisition, and improves the real-time performance of data integration.
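The rule discussed above, for the case where the log flag bit is set equal to the reference log data identifier, can be sketched as follows. The function name `compute_log_flag` and the use of `None` to mean "no second-type missing log file" are assumptions for illustration; identifiers are treated as plain integers.

```python
from typing import Optional


def compute_log_flag(range_start: int, range_end: int,
                     missing_start_id: Optional[int] = None) -> int:
    """Return the log flag bit for the next data integration."""
    if missing_start_id is None:
        # No second-type missing log file: continue from the range end.
        return range_end
    # Reference log data identifier: the larger of the missing file's
    # start identifier and the current range start. Starting the next
    # range here still reaches the missing file without going further back.
    return max(missing_start_id, range_start)


print(compute_log_flag(988, 1011))        # 1011
print(compute_log_flag(988, 1011, 1010))  # 1010
print(compute_log_flag(988, 1011, 975))   # 988
```

The third call corresponds to a missing file whose start identifier precedes the current range start: the flag falls back to the range start, which avoids pulling older log files than were requested this time.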
In some embodiments, the electronic device may use the smaller of the log data identifier of the missing log file and the range start of the log data identifier range as the log flag bit.
For example, referring to FIG. 9, the electronic device acquires archived log file 1, archived log file 2, online log file 1, and online log file 2, and determines that online log file 2 is a missing log file. The electronic device then compares the log data identifier of the missing log file with the range start and uses the larger of the two, here the log data identifier of online log file 2, as the log flag bit.
For example, referring to FIG. 10, the electronic device acquires online log file 1 and online log file 2, and determines that online log file 1 is a missing log file. The electronic device then compares the log data identifier of the missing log file with the range start and uses the larger of the two, here the range start, as the log flag bit.
In some embodiments, when the electronic device determines multiple second-type missing log files, it selects a target missing log file from among them: the one with the smallest start log data identifier. The electronic device then takes the larger of the log data identifier of the target missing log file and the range start as the reference log data identifier, and obtains the log flag bit based on the reference log data identifier.
If the electronic device determines multiple second-type missing log files, the target missing log file is selected first so that all of them can be acquired the next time log files are acquired. As described above, when the electronic device acquires log files from the source end, it acquires the log files whose log data identifiers are greater than or equal to the range start. Therefore, if the electronic device acquires log files from the source end starting from the target missing log file, it acquires all of the second-type missing log files. The electronic device can thus obtain the log flag bit based on the target missing log file and, the next time it acquires log files from the source end, use that log flag bit to acquire from the source database all of the second-type missing log files determined this time.
For example, referring to FIG. 11, the electronic device acquires online log file 1, online log file 2, and online log file 3, and determines that online log file 1 and online log file 3 are missing log files. Among the missing log files, the electronic device takes the one with the smaller log data identifier, here online log file 1, as the target missing log file. The electronic device then compares the log data identifier of online log file 1 with the range start and uses the larger of the two, here the range start, as the log flag bit.
For example, referring to FIG. 12, the electronic device acquires online log file 1, online log file 2, and online log file 3, and determines that all three are missing log files. Among the missing log files, the electronic device takes the one with the smallest log data identifier, here online log file 3, as the target missing log file. The electronic device then compares the log data identifier of online log file 3 with the range start and uses the larger of the two, here the range start, as the log flag bit.
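The multi-file case can be sketched as below, assuming identifiers are integers and the missing files are represented only by their start log data identifiers. `select_log_flag` is a hypothetical name, and the sketch takes the log flag bit to be equal to the reference log data identifier.

```python
from typing import Sequence


def select_log_flag(range_start: int, range_end: int,
                    missing_start_ids: Sequence[int]) -> int:
    """Pick the log flag bit when zero or more missing files were found."""
    if not missing_start_ids:
        return range_end
    # Target missing log file: the one with the smallest start identifier,
    # so that a range starting there covers every other missing file too.
    target_start = min(missing_start_ids)
    # Reference log data identifier: never earlier than the current start.
    return max(target_start, range_start)


print(select_log_flag(988, 1011, [1010, 1000]))  # 1000
print(select_log_flag(988, 1011, [975, 1010]))   # 988
```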
Once the electronic device has obtained a log flag bit, its next data integration takes place in a non-first data integration scenario.
Step S101 is described further below for the non-first data integration scenario.
In a non-first data integration scenario, the electronic device has already performed data integration at some earlier time, so it can obtain a historical log flag bit, that is, the log flag bit obtained the last time it performed data integration. The electronic device obtains the historical log flag bit, derives the range end using a dynamic step size or a preset step size, and then acquires log files from the source end based on the historical log flag bit and the range end.
For example, referring to FIG. 13, step S101 may include steps S101b1 to S101b4.
S101b1. The electronic device determines that it is in a non-first data integration scenario.
For the implementation of step S101b1, refer to the description of step S101a1 above; details are not repeated here.
S101b2. The electronic device uses the historical log flag bit as the range start for the current data integration.
Because the historical log flag bit was obtained during the previous data integration, it reflects the second-type missing log files determined at that time; that is, the start log data identifier of each second-type missing log file determined during the previous data integration is greater than or equal to the historical log flag bit. In addition, when acquiring log files, the electronic device acquires the log files whose log data identifiers are greater than or equal to the range start. Therefore, by using the historical log flag bit as the current range start, the electronic device can acquire, during the current data integration, the second-type missing log files determined during the previous one, and subsequently integrate their log data into the target end. This alleviates missing data during data integration and improves data consistency between the source end and the target end, thereby reducing data distortion at the target end.
S101b3. The electronic device calculates the range end from the range start of the current data integration using a dynamic step size.
The range start of the current data integration may be the range start obtained in step S101b2.
For the implementation of step S101b3, refer to the description of step S101a3 above; details are not repeated here.
S101b4. The electronic device acquires log files from the source end based on the range start and the range end.
For the implementation of step S101b4, refer to the description of step S101a4 above; details are not repeated here.
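Steps S101b2 and S101b3 can be sketched as follows. How the dynamic step size is computed is described with step S101a3 and is not reproduced here; this sketch simply assumes the step is a positive integer that is added to the range start, and `next_identifier_range` is a hypothetical name.

```python
from typing import Tuple


def next_identifier_range(history_flag: int, step: int) -> Tuple[int, int]:
    """Derive (range start, range end) for a non-first data integration."""
    if step <= 0:
        raise ValueError("step must be positive")
    range_start = history_flag        # S101b2: reuse the historical log flag bit
    range_end = range_start + step    # S101b3: assumed form of the step rule
    return range_start, range_end


print(next_identifier_range(1000, 23))  # (1000, 1023)
```

The resulting pair would then be passed to the acquisition in step S101b4.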
In some embodiments, step S101 may further include step S101b5.
S101b5. If the electronic device determines that the source end has produced a first-type missing log file, it performs step S101b4; otherwise, it performs the subsequent steps.
For the implementation of step S101b5, refer to the description of step S101a5 above; details are not repeated here.
The data processing method provided by the embodiments of the present application is described below across multiple data integration rounds of the electronic device. The method may include the following. The electronic device acquires source-end log files from the source database according to a first identifier range, where the first identifier range indicates the range of log data identifiers of the source-end log files to be acquired. For the implementation of this step, refer to the description of step S101 above; details are not repeated here.
The electronic device then performs data operations on the target database based on the acquired source-end log files.
For example, if the data operation recorded in a source-end log file modifies data A, the electronic device modifies data A at the target end. If the recorded operation deletes data B, the electronic device deletes data B at the target end. If the recorded operation adds data C, the electronic device adds data C to the target end.
Next, if the electronic device determines that the acquired source-end log files include at least one missing log file, it obtains a log flag bit based on the start log data identifier of the at least one missing log file and the first identifier range. A missing log file is a log file whose log data has been missed. The start log data identifier of a missing log file is the log data identifier of the first piece of log data in that file. For the implementation of this step, refer to the descriptions of steps S102 and S103 above; details are not repeated here.
The electronic device then determines a second identifier range according to the log flag bit and acquires source-end log files from the source database according to the second identifier range.
For the implementation of this step, refer to the description of step S101 above; details are not repeated here.
The data processing method provided by the embodiments of the present application is illustrated below using data integration of source-end data in an order processing business scenario.
For example, in an order processing business scenario, the source end of the data storage system may include three data storage nodes (node 1, node 2, and node 3), and the target end may include two data storage nodes (node a and node b). The source end receives order information and performs data operations on the source end based on it. For example, when a merchant lists 100 units of product A, the source end receives a data addition operation and adds the entry "product A, quantity 100" to its data file. When a user orders 10 units of product A, the source end receives a data modification operation and changes "product A, quantity 100" in the data file to "product A, quantity 90". When the merchant delists product A, the source end receives a data deletion operation and deletes the entry "product A, quantity 90" from its data file. The target end synchronizes the data of the source end and, based on the target-end data, supports order shipment, display of product quantities, and so on.
Assume, referring to FIG. 14, that the source end receives operations that add data file 1, data file 2, and data file 3, where data file 1 is "product A, quantity 100", data file 2 is "product B, quantity 200", and data file 3 is "product C, quantity 150". The data storage nodes at the source end then generate log data for data files 1 to 3 and store that log data at random among the three data storage nodes (node 1, node 2, and node 3). For example, the source end may store log data 1 for data file 1 in node 1, log data 2 for data file 2 in node 2, and log data 3 for data file 3 in node 3. Next, the source end receives an order: it adds data file 4, "customer A, purchases 50 units of product A", and modifies data file 1 to "product A, quantity 50". The source end generates log data 4 for data file 4 and stores it in node 2, and generates log data 5 for the modification of data file 1 and stores it in node 3. Later, the source end receives a delisting operation for product C and deletes data file 3; it then generates log data 6 for that deletion and stores it in node 1. At this point, online log file a1 of node 1 has reached the preset size, so the node archives log file a1 and stores log data 6 in log file a2.
As a result, log file a1 of node 1 records log data 1, and log file a2 of node 1 records log data 6; log file b1 of node 2 records log data 2 and log data 4; and log file c1 of node 3 records log data 3 and log data 5. When the source end generates log data, it also generates a log identifier for that log data (for example, an scn): the scn of log data 1 is 1001, the scn of log data 2 is 1002, the scn of log data 3 is 1003, the scn of log data 4 is 1005, the scn of log data 5 is 1008, and the scn of log data 6 is 1010. The start log data identifier of log file a1 is 975 and its end log data identifier is 1001; the start log data identifier of log file b1 is 980; the start log data identifier of log file c1 is 1000; and the start log data identifier of log file a2 is 1010.
For example, the log data identifiers of the log files are shown in Table 1 below.
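Table 1
Log file | Start log data identifier | End log data identifier
a1 | 975 | 1001
a2 | 1010 | none
b1 | 980 | none
c1 | 1000 | none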
Log file a2, log file b1, and log file c1 are online log files to which log data may be written at any time, so they have no end log data identifier.
可以理解的,在源端生成上述日志数据1-6的过程中,源端也会生成一些针对源端数据库系统的日志数据(图中未示出),这些数据也会占用scn,因此上述日志数据1-6的scn可能不是连续的。It is understandable that during the process of generating the above log data 1-6 at the source end, the source end will also generate some log data (not shown in the figure) for the source end database system, and these data will also occupy scn, so the above log The scn of data 1-6 may not be consecutive.
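As a minimal sketch, the state described in this example can be modeled as a small data structure. The `LogFile` class, its field names, and the helper functions are illustrative assumptions and are not part of the application; only the scn values and the file-to-node assignment come from the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogFile:
    name: str
    node: int
    start_scn: int
    # None models an online log file that is still being written,
    # so it has no end log data identifier yet.
    end_scn: Optional[int] = None
    record_scns: List[int] = field(default_factory=list)

    @property
    def is_online(self) -> bool:
        return self.end_scn is None


# State after log data 1 to 6 has been written (values from the example).
LOG_FILES = [
    LogFile("a1", node=1, start_scn=975, end_scn=1001, record_scns=[1001]),
    LogFile("a2", node=1, start_scn=1010, record_scns=[1010]),
    LogFile("b1", node=2, start_scn=980, record_scns=[1002, 1005]),
    LogFile("c1", node=3, start_scn=1000, record_scns=[1003, 1008]),
]


def online_file_names(files: List[LogFile]) -> List[str]:
    """Names of the log files that have no end identifier."""
    return [f.name for f in files if f.is_online]


def all_record_scns(files: List[LogFile]) -> List[int]:
    """All scn values of the user log data, in ascending order."""
    return sorted(scn for f in files for scn in f.record_scns)
```

Sorting the record scn values makes the gaps visible (for example, 1003 is followed by 1005), which matches the remark that system log data also consumes scn values.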
Next, the electronic device performs data integration, integrating the data of the three source-end nodes (node 1, node 2, and node 3) into the target end. Because the electronic device has not obtained a historical log flag, it determines that this is a first-integration scenario, in which it loads log files from the source end for the first time. The electronic device then obtains the specified range start, for example 988, and calculates through a dynamic step size that the end point of the log data identifier range is 1011; that is, the log data identifier range is [988, 1011]. The electronic device then obtains log files from the source end based on this range.
Exemplarily, referring again to FIG. 14, the log data identifiers of log files a1, a2, b1, and c1 all overlap the above log data identifier range, so the electronic device needs to obtain log files a1, a2, b1, and c1. However, because log file a1 is being archived, the electronic device fails to obtain it; that is, it obtains only log files a2, b1, and c1. The electronic device then compares the archived log file number it obtains for node 1 (for example, the log file number of log file a0) with the log file number of log file a2, finds that the two are not consecutive, and concludes that a log file switch has occurred. The electronic device therefore obtains the log files from the source end again; this time it obtains log files a1, a2, b1, and c1, and determines that no file switch has occurred among the obtained log files. This alleviates the data omission caused by online log file switching at the source end.
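The two checks in this step, selecting the log files whose identifiers overlap the range and detecting a log file switch from non-consecutive log file numbers, can be sketched as follows. The function names and the concrete log file numbers (7, 8, 9 for a0, a1, a2) are hypothetical; the application does not give numeric log file numbers.

```python
from typing import Dict, List, Optional, Tuple


def overlaps(start_scn: int, end_scn: Optional[int], lo: int, hi: int) -> bool:
    """True if a file covering [start_scn, end_scn] intersects [lo, hi].

    end_scn is None for an online file, which is treated as open-ended.
    """
    if start_scn > hi:
        return False
    return end_scn is None or end_scn >= lo


def files_in_range(
    files: Dict[str, Tuple[int, Optional[int]]], lo: int, hi: int
) -> List[str]:
    """Names of the files that must be fetched for the range [lo, hi]."""
    return [name for name, (s, e) in files.items() if overlaps(s, e, lo, hi)]


def switch_detected(last_archived_number: int, online_number: int) -> bool:
    """A gap between the newest archived file number known to the device
    and the fetched online file number indicates a log file switch."""
    return online_number != last_archived_number + 1


# Start and end identifiers from the example.
FILES = {
    "a1": (975, 1001),
    "a2": (1010, None),
    "b1": (980, None),
    "c1": (1000, None),
}

needed = files_in_range(FILES, 988, 1011)

# Hypothetical log file numbers: a0 = 7, a1 = 8, a2 = 9.
first_fetch_switched = switch_detected(7, 9)   # a1 was not obtained
second_fetch_switched = switch_detected(8, 9)  # a1 obtained on the retry
```

With these assumed numbers, the first fetch reports a switch because 7 and 9 are not consecutive, and the retry does not.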
Next, because log files a1, a2, b1, and c1 are all encrypted, the electronic device decrypts them and, after decryption, performs data operations on the target end based on the log data in each of log files a1, a2, b1, and c1.
Afterwards, the electronic device may read the log data in the parsed log files a1, a2, b1, and c1; if no log data is read from log file b1, the electronic device determines that log file b1 is a missed log file. Alternatively, the electronic device may perform data operations on the target end using the parsed log files a1, a2, b1, and c1. For example, if, when data operations are performed based on log file c1, the target end does not receive the data operations corresponding to the log data in log file c1 and the data at the target end does not change, the electronic device determines that log file c1 is a missed log file.
Next, the electronic device determines the target missed file from among log file b1 and log file c1. As FIG. 14 shows, the start log data identifier of log file b1 (for example, scn=980) is smaller than the start log data identifier of log file c1 (for example, scn=1000). The electronic device therefore determines that log file b1 is the target missed file.
The electronic device then takes the larger of the start log data identifier of log file b1 (980) and the range start (988) as the log flag. That is, the range start, 988, is used as the log flag.
Exemplarily, the numerical comparison between the missed log file and the range start is shown in Table 2 below.
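Table 2
| Item | Log data identifier (scn) |
|---|---|
| Start of missed log file b1 | 980 |
| Range start | 988 |
| Log flag (larger value) | 988 |
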
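The log flag rule described above can be sketched in a few lines: pick the missed file with the smallest start identifier as the target missed file, then take the larger of that start identifier and the range start. The function name `next_log_flag` and the dictionary input are illustrative assumptions; raising an error for an empty input is also an assumption, since this passage does not describe that case.

```python
from typing import Dict, Tuple


def next_log_flag(missed_starts: Dict[str, int], range_start: int) -> Tuple[str, int]:
    """Return (target missed file, log flag).

    missed_starts maps each missed log file name to its start scn.
    """
    if not missed_starts:
        # Assumption: the no-missed-file case is out of scope for this sketch.
        raise ValueError("no missed log files")
    # Target missed file: the missed file with the smallest start scn.
    target = min(missed_starts, key=missed_starts.get)
    # Log flag: the larger of the target's start scn and the range start.
    return target, max(missed_starts[target], range_start)


# First data integration: b1 and c1 are missed, range start is 988.
first_round = next_log_flag({"b1": 980, "c1": 1000}, 988)

# Second data integration: only c1 is missed, range start is 988.
second_round = next_log_flag({"c1": 1000}, 988)
```

Taking the maximum keeps the flag from moving backwards past the range start, while taking the smallest start among missed files keeps the next range wide enough to cover every missed file.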
Next, the electronic device derives the log data identifier range from the log flag obtained in the first data integration, namely 988, and performs data integration a second time based on that range, obtaining log files from the source end; for example, log files a1, a2, b1, and c1. The electronic device then parses log files a1, a2, b1, and c1 and performs data operations on the target end. It can be understood that some target ends have a mechanism for avoiding repeated data operations, which filters out data operations identical to ones already applied to the target end. Therefore, even if the electronic device again performs data operations on the target end based on the parsed log files a1 and a2, the data at the target end is not corrupted.
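One way a target end could filter repeated data operations is to remember which log identifiers it has already applied. The sketch below is an assumption made for illustration: the application only says that some target ends have such a mechanism and does not describe how it works. The `Target` class and its method names are hypothetical.

```python
from typing import Dict, Optional, Set


class Target:
    """Toy target end that skips operations whose scn was already applied."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.applied_scns: Set[int] = set()

    def apply(self, scn: int, op: str, key: str, value: Optional[str] = None) -> bool:
        """Apply one logged operation; return False if it was a repeat."""
        if scn in self.applied_scns:
            return False  # repeated operation, filtered out
        if op in ("insert", "update"):
            self.rows[key] = value if value is not None else ""
        elif op == "delete":
            self.rows.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
        self.applied_scns.add(scn)
        return True


target = Target()
# Log data 1 (scn 1001) creates data file 1; log data 5 (scn 1008) updates it.
target.apply(1001, "insert", "data file 1", "product A, quantity 100")
target.apply(1008, "update", "data file 1", "product A, quantity 50")
# A later round replays log file a1; the repeated insert must not undo the update.
replayed = target.apply(1001, "insert", "data file 1", "product A, quantity 100")
```

Without the filter, replaying log data 1 would overwrite the quantity with 100 again, which is the kind of corruption the passage says the mechanism prevents.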
In this way, during the second data integration the electronic device can again obtain log files b1 and c1, which were missed in the first data integration. This reduces the cases in which log files b1 and c1 are not integrated into the target end, and alleviates the data distortion at the target end that results when the target end does not receive log data 2, log data 3, log data 4, and log data 5. It can also improve the accuracy of order shipping and product quantity display based on the target-end data.
Thus, the electronic device performs the current data integration using the log flag obtained in the previous data integration, which lets it obtain, in the current round, source-end data that was missed in the previous round. This alleviates data omission during data integration, improves the consistency between source-end and target-end data, and thereby reduces data distortion at the target end.
Then, in the second data integration, because the electronic device does not read the content of log file c1, it determines that log file c1 is a missed file.
Next, the electronic device takes the larger of the start log data identifier of log file c1 (for example, scn=1000) and the range start (for example, scn=988) as the log flag. That is, the start log data identifier of log file c1, scn=1000, is used as the log flag.
Exemplarily, the numerical comparison between the missed log file and the range start is shown in Table 3 below.
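Table 3
| Item | Log data identifier (scn) |
|---|---|
| Start of missed log file c1 | 1000 |
| Range start | 988 |
| Log flag (larger value) | 1000 |
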
Afterwards, when the electronic device performs the next data integration, it uses the log flag obtained in this data integration, namely scn=1000, to derive the log data identifier range, and obtains log files from the source end a third time based on that range.
In some embodiments, if archiving log file a1 takes a long time, the electronic device may fail to obtain log file a1 several times (for example, 3 times or 5 times). The electronic device may then treat log file a1 as a switched log file and send a query request to the source end to query the start log data identifier and the end log data identifier of log file a1. In response to the log query request, the source end sends the start log data identifier or the end log data identifier of log file a1 to the electronic device. The electronic device can then obtain log file a1 from the source end using its start log data identifier (for example, 975) or its end log data identifier (for example, 1001), that is, by specifying log data identifier 975 or log data identifier 1001. Subsequently, the electronic device performs data operations on the target-end database based on the source-end log file a1.
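The fallback described in this embodiment can be sketched as a bounded retry followed by a query for the file's identifiers. All names here (`fetch_switched_file`, `FakeSource`, and its methods) are hypothetical stand-ins for whatever interface the source end actually exposes; the attempt limit of 3 is one of the example values from the text.

```python
from typing import Callable, Optional, Tuple


def fetch_switched_file(
    name: str,
    fetch_by_range: Callable[[str], Optional[str]],
    query_bounds: Callable[[str], Tuple[int, int]],
    fetch_by_scn: Callable[[int], str],
    max_attempts: int = 3,
) -> str:
    """Try the normal range-based fetch a few times, then fall back to
    querying the file's start/end identifiers and fetching by identifier."""
    for _ in range(max_attempts):
        content = fetch_by_range(name)
        if content is not None:
            return content
    # Treat the file as a switched log file and ask the source for its bounds.
    start_scn, _end_scn = query_bounds(name)
    return fetch_by_scn(start_scn)


class FakeSource:
    """Stand-in source end in which log file a1 is still being archived."""

    def __init__(self) -> None:
        self.range_calls = 0

    def fetch_by_range(self, name: str) -> Optional[str]:
        self.range_calls += 1
        return None  # a1 cannot be obtained through the normal path

    def query_bounds(self, name: str) -> Tuple[int, int]:
        return (975, 1001)  # start and end identifiers of a1 in the example

    def fetch_by_scn(self, scn: int) -> str:
        return "a1" if 975 <= scn <= 1001 else ""


source = FakeSource()
obtained = fetch_switched_file(
    "a1", source.fetch_by_range, source.query_bounds, source.fetch_by_scn
)
```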
It can be understood that, to implement the above functions, the electronic device includes hardware and/or software modules corresponding to each function. In combination with the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Those skilled in the art may, in combination with the embodiments, use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of this application.
In this embodiment, the electronic device may be divided into functional modules according to the above method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware. It should be noted that the division of modules in this embodiment is schematic and is merely a logical function division; other division manners are possible in actual implementation.
Exemplarily, referring to FIG. 15, an embodiment of this application further provides a data integration apparatus, which includes a log acquisition module, a log switch detection module, a log parsing module, a log data operation module, and a log identifier snapshot module.
The log acquisition module is configured to implement the functions of steps S101a1-S101a4 above, or the functions of steps S101b1-S101b4 above. For details, refer to the description of the related steps above, which is not repeated here.
The log switch detection module is configured to implement the function of step S101a5 or S101b5 above. For details, refer to the description of the related steps above, which is not repeated here.
The log parsing module is configured to implement the above function of parsing log files. For details, refer to the description of the related steps above, which is not repeated here.
The log data operation module is configured to implement the above function of performing data operations on the target end based on log files. For details, refer to the description of the related steps above, which is not repeated here.
The log identifier snapshot module is configured to implement the function of step S102 or S103 above. For details, refer to the description of the related steps above, which is not repeated here.
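As a rough sketch of how the five modules could be wired together in software, each module can be represented as a callable and composed in one integration round. This is only one possible logical division, in line with the remark that the module division is schematic. The class name, the callable signatures, and the stubs below are assumptions for illustration and do not reproduce steps S101 to S103.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataIntegrationApparatus:
    acquire: Callable[[int], List[str]]            # log acquisition module
    switch_detected: Callable[[List[str]], bool]   # log switch detection module
    parse: Callable[[str], List[int]]              # log parsing module
    operate: Callable[[List[int]], None]           # log data operation module
    snapshot: Callable[[List[str], int], int]      # log identifier snapshot module

    def run_once(self, range_start: int) -> int:
        """Run one data integration round and return the recorded log flag."""
        files = self.acquire(range_start)
        if self.switch_detected(files):
            # Fetch again when a log file switch happened during acquisition.
            files = self.acquire(range_start)
        for name in files:
            self.operate(self.parse(name))
        return self.snapshot(files, range_start)


# Stub modules: the first fetch misses a1, the second fetch returns all files.
parsed: List[str] = []
fetches = iter([["a2", "b1", "c1"], ["a1", "a2", "b1", "c1"]])
apparatus = DataIntegrationApparatus(
    acquire=lambda start: next(fetches),
    switch_detected=lambda files: "a1" not in files,  # stub check only
    parse=lambda name: parsed.append(name) or [],
    operate=lambda records: None,
    snapshot=lambda files, start: start,              # stub: keep the range start
)
flag = apparatus.run_once(988)
```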
An embodiment of this application further provides an electronic device. As shown in FIG. 16, the electronic device may include one or more processors 1001, a memory 1002, and a communication interface 1003.
The memory 1002 and the communication interface 1003 are coupled to the processor 1001. For example, the memory 1002, the communication interface 1003, and the processor 1001 may be coupled together through a bus 1004.
The communication interface 1003 is used for data transmission with other devices. The memory 1002 stores computer program code. The computer program code includes computer instructions which, when executed by the processor 1001, cause the electronic device to perform the data processing method in the embodiments of this application.
The processor 1001 may be a processor or a controller, for example a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 1004 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 1004 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 16, but this does not mean that there is only one bus or only one type of bus.
An embodiment of this application further provides a computer-readable storage medium that stores computer program code. When the above processor executes the computer program code, the electronic device performs the relevant method steps in the above method embodiments.
An embodiment of this application further provides a computer program product which, when run on a computer, causes the computer to perform the relevant method steps in the above method embodiments.
The electronic device, computer-readable storage medium, and computer program product provided in this application are all used to perform the corresponding methods provided above. For the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the modules or units is merely a logical function division, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and a component displayed as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solution of the embodiments of this application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above is merely the specific implementation of this application, but the protection scope of this application is not limited thereto. Any change or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310983572.5A CN116701426B (en) | 2023-08-07 | 2023-08-07 | Data processing method, electronic device and storage medium |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310983572.5A CN116701426B (en) | 2023-08-07 | 2023-08-07 | Data processing method, electronic device and storage medium |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN116701426A (en) | 2023-09-05 |
| CN116701426B (en) | 2024-04-05 |
Family
ID=87829706
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310983572.5A Active CN116701426B (en) | 2023-08-07 | 2023-08-07 | Data processing method, electronic device and storage medium |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN116701426B (en) |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116701426B (en) | 2024-04-05 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | GR01 | Patent grant | |
| | CP03 | Change of name, title or address | Address after: Unit 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong 518040. Patentee after: Honor Terminal Co.,Ltd. Country or region after: China. Address before: 3401, unit a, building 6, Shenye Zhongcheng, No. 8089, Hongli West Road, Donghai community, Xiangmihu street, Futian District, Shenzhen, Guangdong. Patentee before: Honor Device Co.,Ltd. Country or region before: China. |