CN109992469B

CN109992469B - Method and device for merging logs

Info

Publication number: CN109992469B
Application number: CN201711489592.8A
Authority: CN
Inventors: 严锁鹏
Original assignee: 360 Technology Group Co Ltd
Current assignee: 360 Technology Group Co Ltd
Priority date: 2017-12-29
Filing date: 2017-12-29
Publication date: 2023-08-18
Anticipated expiration: 2037-12-29
Also published as: CN109992469A

Abstract

The invention provides a method and a device for merging logs, wherein the method comprises the steps of writing received logs into a set non-persistent cache and a persistent cache in a replicative manner, wherein the cache time of the non-persistent cache to the logs is smaller than that of the persistent cache to the logs. Searching the logs with the same field in the non-persistent cache, acquiring and merging various logs meeting preset backtracking conditions from the searched logs with the same field, and caching the merged logs into the persistent cache. According to the embodiment of the invention, the received log is replicatively written into the set non-persistent cache and the persistent cache, so that a large number of acquisition requests can be prevented from being directly distributed into the persistent cache when the log is acquired later, and the log processing pressure of the persistent cache is reduced. And moreover, the logs with the same fields are combined, so that the storage space of the logs can be effectively saved, and the subsequent centralized management of the logs is facilitated.

Description

A method and device for merging logs

技术领域technical field

本发明涉及计算机技术领域，特别是涉及一种合并日志的方法及装置。The invention relates to the field of computer technology, in particular to a method and device for merging logs.

背景技术Background technique

日志在计算机系统中是一个非常广泛的概念，任何程序(如操作系统内核、各种应用服务器等等)都有可能输出日志。虽然日志的内容、规模和用途各不相同，但是，其总体的功能是记录软件运行状态，存储系统产生的事件信息。Logs are a very broad concept in computer systems, and any program (such as operating system kernel, various application servers, etc.) may output logs. Although the content, scale and purpose of the log are different, its overall function is to record the running status of the software and store the event information generated by the system.

随着计算机技术的不断发展，目前的日志量越来越多很大，特别是在日志回溯规则很多的时候(例如同一日志的回溯规则包括每隔5分钟或者1小时或者5小时进行日志回溯)，需要不断的从存储日志的缓存数据库中获取日志，由于持久化缓存的性能较差，因此这会对持久化缓存数据库造成很大的压力，从而容易影响日志处理系统对日志的处理性能。因此，如何为日志的缓存和处理提供足够大的灵活性以及良好的伸缩性是目前面临的较大的挑战。With the continuous development of computer technology, the current log volume is increasing, especially when there are many log backtracking rules (for example, the backtracking rules of the same log include log backtracking every 5 minutes or 1 hour or 5 hours) , It is necessary to continuously obtain logs from the cache database storing logs. Due to the poor performance of the persistent cache, this will put a lot of pressure on the persistent cache database, which will easily affect the log processing performance of the log processing system. Therefore, how to provide sufficient flexibility and good scalability for log caching and processing is a major challenge at present.

发明内容Contents of the invention

鉴于上述问题，提出了本发明以便提供一种克服上述问题或者至少部分地解决上述问题的合并日志的方法及装置。In view of the above problems, the present invention is proposed to provide a method and device for merging logs that overcome the above problems or at least partially solve the above problems.

依据本发明的一方面，提供了一种合并日志的方法，包括：According to one aspect of the present invention, a method for merging logs is provided, including:

将接收日志复制性写入设置的非持久化缓存和持久化缓存中，其中，所述非持久化缓存对日志的缓存时长小于所述持久化缓存对日志的缓存时长；Write the receive log duplicatively into the set non-persistent cache and persistent cache, wherein the cache duration of the non-persistent cache to the log is shorter than the cache duration of the persistent cache to the log;

在所述非持久化缓存中查找具有相同字段的日志；Find a log with the same field in the non-persistent cache;

从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至所述持久化缓存中。Various logs satisfying preset backtracking conditions are obtained from the found logs of the same field and merged, and the merged logs are cached in the persistent cache.

可选地，若在所述非持久化缓存中未查找到具有相同字段的日志，则继续从所述持久化缓存中查找具有相同字段的日志。Optionally, if no log with the same field is found in the non-persistent cache, continue to search for a log with the same field from the persistent cache.

可选地，所述将接收日志复制性写入设置的非持久化缓存和持久化缓存之前，还包括：设置第一级缓存和第二级缓存作为非持久化缓存，设置第三级缓存作为持久化缓存。Optionally, before the non-persistent cache and the persistent cache are written to receive log replication, it also includes: setting the first-level cache and the second-level cache as the non-persistent cache, and setting the third-level cache as the non-persistent cache. persistent cache.

可选地，在所述非持久化缓存中查找具有相同字段的日志，包括：Optionally, look for logs with the same fields in the non-persistent cache, including:

依次从所述第一级缓存、所述第二级缓存中查找具有相同字段的日志。A log with the same field is searched from the first-level cache and the second-level cache in sequence.

可选地，所述第一级缓存包括redis cluster，所述第二级缓存包括aerospike，所述第三级缓存包括hbase。Optionally, the first-level cache includes redis cluster, the second-level cache includes aerospike, and the third-level cache includes hbase.

可选地，所述方法应用于kafka系统，所述将接收日志复制性写入设置的非持久化缓存和持久化缓存中，包括：Optionally, the method is applied to the kafka system, and the non-persistent cache and the persistent cache of the set receive log replication are written, including:

将接收日志中具有相同字段的日志分配到基于所述kafka系统中的同一worker行程中，其中，所述字段包括用户ID或者用户其他唯一标识key；Logs with the same field in the receiving log are assigned to the same worker trip based on the kafka system, wherein the field includes a user ID or other unique identification keys of the user;

各worker行程将分配到的日志复制性写入所述非持久化缓存和所述持久化缓存中。Each worker process writes the allocated log replicas into the non-persistent cache and the persistent cache.

可选地，所述将接收日志中具有相同字段的日志分配到基于所述kafka系统中的同一worker行程中，包括：Optionally, the distributing the logs with the same fields in the received logs to the same worker trip based on the kafka system includes:

采用flink keyby操作将具有相同字段的日志分配到基于所述kafka系统中的同一worker行程中。Use the flink keyby operation to assign logs with the same field to the same worker trip based on the kafka system.

到达预置回溯时间时启动回溯操作，并在所述非持久化缓存中查找具有相同字段的日志。A backtracking operation is started when the preset backtracking time is reached, and a log with the same field is searched in the non-persistent cache.

可选地，所述方法还包括：Optionally, the method also includes:

到达预置回溯时间并启动回溯操作时，判断回溯时间与日志写入时间差是否超过所述非持久化缓存的缓存时长；When the preset backtracking time is reached and the backtracking operation is started, it is judged whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache;

若是，直接从所述持久化缓存中查找具有相同字段的日志。If so, directly search for a log with the same field from the persistent cache.

可选地，所述将接收日志复制性写入设置的非持久化缓存和持久化缓存中，包括：Optionally, the copying of the received log into the set non-persistent cache and persistent cache includes:

接收日志并为所述日志添加时间戳，将携带时间戳的日志复制性写入设置的非持久化缓存和持久化缓存中。Receive the log and add a timestamp to the log, and write the log with the timestamp into the set non-persistent cache and persistent cache.

可选地，所述到达预置回溯时间并启动回溯操作时，判断回溯时间与日志写入时间差是否超过所述非持久化缓存的缓存时长，包括：Optionally, when the preset backtracking time is reached and the backtracking operation is started, judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache includes:

到达预置回溯时间并启动回溯操作时，依据日志的时间戳判断回溯时间与日志写入时间差是否超过所述非持久化缓存的缓存时长。When the preset backtracking time is reached and the backtracking operation is started, it is judged according to the timestamp of the log whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache.

可选地，所述满足预置回溯条件的多种日志包括：Optionally, the various logs satisfying preset backtracking conditions include:

完整session动作中多个组成对象所对应的日志，其中，所述完整session动作中的多个组成对象依据业务需求确定得到。Logs corresponding to multiple constituent objects in the complete session action, wherein the multiple constituent objects in the complete session action are determined according to business requirements.

可选地，所述方法还包括：若从查找到的相同字段的日志中无法获取满足预置回溯条件的多种日志，依据查找到的相同字段的日志建立topic，并为建立的topic设置最大回溯时间；Optionally, the method further includes: if various logs satisfying the preset backtracking conditions cannot be obtained from the found logs of the same field, creating a topic according to the found logs of the same field, and setting the maximum look back in time;

在达到所述最大回溯时间时从所述非持久化缓存和持久化缓存中查找是否存在与topic中的日志字段相同、且满足预置回溯条件的多种日志；When the maximum backtracking time is reached, it is searched from the non-persistent cache and the persistent cache whether there are various logs that are the same as the log fields in the topic and satisfy the preset backtracking conditions;

若是，合并具有相同字段且满足预置回溯条件的多种日志，并将合并后的日志缓存至所述持久化缓存中。If so, merge multiple logs with the same fields and satisfy preset backtracking conditions, and cache the merged logs into the persistent cache.

依据本发明的另一方面，还提供了一种合并日志的装置，包括：According to another aspect of the present invention, a device for merging logs is also provided, including:

写入模块，适于将接收日志复制性写入设置的非持久化缓存和持久化缓存中，其中，所述非持久化缓存对日志的缓存时长小于所述持久化缓存对日志的缓存时长；The writing module is adapted to write the received log duplicatively into the set non-persistent cache and persistent cache, wherein the cache duration of the non-persistent cache to the log is shorter than the cache duration of the persistent cache to the log;

查找模块，适于在所述非持久化缓存中查找具有相同字段的日志；A search module, adapted to search for logs with the same field in the non-persistent cache;

合并模块，适于从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至所述持久化缓存中。The merging module is adapted to acquire and merge various logs satisfying preset backtracking conditions from the found logs of the same field, and cache the merged logs into the persistent cache.

可选地，所述查找模块还适于：Optionally, the search module is also suitable for:

若在所述非持久化缓存中未查找到具有相同字段的日志，则继续从所述持久化缓存中查找具有相同字段的日志。If no log with the same field is found in the non-persistent cache, continue to search for a log with the same field from the persistent cache.

可选地，所述装置还包括：Optionally, the device also includes:

设置模块，适于在所述写入模块将接收日志复制性写入设置的非持久化缓存和持久化缓存之前，设置第一级缓存和第二级缓存作为非持久化缓存，设置第三级缓存作为持久化缓存。The setting module is adapted to set the first-level cache and the second-level cache as non-persistent caches before the writing module writes the received log replication into the set non-persistent cache and persistent cache, and sets the third-level The cache acts as a persistent cache.

可选地，所述装置应用于kafka系统，所述写入模块还适于：Optionally, the device is applied to the kafka system, and the writing module is also suitable for:

可选地，所述写入模块还适于：Optionally, the writing module is also suitable for:

可选地，所述装置还包括：Optionally, the device also includes:

判断模块，适于到达预置回溯时间并启动回溯操作时，判断回溯时间与日志写入时间差是否超过所述非持久化缓存的缓存时长；The judging module is adapted to determine whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache when the preset backtracking time is reached and the backtracking operation is started;

若是，所述查找模块直接从所述持久化缓存中查找具有相同字段的日志。If yes, the search module directly searches the log with the same field from the persistent cache.

可选地，所述判断模块还适于：Optionally, the judging module is also suitable for:

可选地，所述装置还包括：Optionally, the device also includes:

建立模块，适于若所述合并模块从查找到的相同字段的日志中无法获取满足预置回溯条件的多种日志，依据查找到的相同字段的日志建立topic，并为建立的topic设置最大回溯时间；Establishing a module adapted to create a topic based on the found logs of the same field if the merging module cannot obtain multiple logs that meet the preset backtracking conditions from the found logs of the same field, and setting the maximum backtracking for the established topic time;

所述查找模块在达到所述最大回溯时间时从所述非持久化缓存和持久化缓存中查找是否存在与topic中的日志字段相同、且满足预置回溯条件的多种日志；When the search module reaches the maximum backtracking time, it searches from the non-persistent cache and the persistent cache whether there are various logs that are the same as the log fields in the topic and satisfy the preset backtracking conditions;

若是，所述合并模块合并具有相同字段且满足预置回溯条件的多种日志，并将合并后的日志缓存至所述持久化缓存中。If so, the merging module merges various logs having the same fields and meeting preset backtracking conditions, and caches the merged logs into the persistent cache.

依据本发明的再一方面，还提供了一种电子设备，包括：According to another aspect of the present invention, an electronic device is also provided, including:

处理器；以及processor; and

被安排成存储计算机可执行指令的存储器，所述可执行指令在被执行时使所述处理器执行根据上文任意实施例中的合并日志的方法。A memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the method of merging logs according to any of the above embodiments.

依据本发明的又一方面，还提供了一种计算机存储介质，其中，所述计算机可读存储介质存储一个或多个程序，所述一个或多个程序当被包括多个应用程序的电子设备执行时，使得所述电子设备执行根据上文任意实施例中的合并日志的方法。According to yet another aspect of the present invention, a computer storage medium is also provided, wherein the computer-readable storage medium stores one or more programs, and the one or more programs are used when an electronic device including multiple application programs When executing, the electronic device is made to execute the method for merging logs according to any of the above embodiments.

在本发明实施例中，当接收到日志时，将接收到的日志复制性写入设置的非持久化缓存和持久化缓存中，其中，非持久化缓存对日志的缓存时长小于持久化缓存对日志的缓存时长。写入日志之后还可以在非持久化缓存中查找具有相同字段的日志，并从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至持久化缓存中。由此，本发明实施例通过设置非持久化缓存和持久化缓存，并将接收到的日志复制性的写入设置的非持久化缓存和持久化缓存，从而可以避免后续在获取日志时将大量的获取请求直接分配到持久化缓存中，以减少持久化缓存的日志处理压力。进一步地，将具有相同字段的日志进行合并还可以有效地节约日志的存储空间，并方便后续对日志的集中管理。In the embodiment of the present invention, when a log is received, the received log is duplicatively written into the set non-persistent cache and the persistent cache, wherein the non-persistent cache has a shorter cache duration for the log than the persistent cache pair Log cache duration. After writing the log, you can also search for logs with the same fields in the non-persistent cache, and obtain and merge various logs that meet the preset backtracking conditions from the found logs with the same fields, and cache the merged logs to the persistent cached. Therefore, in the embodiment of the present invention, by setting the non-persistent cache and the persistent cache, and writing the received logs to the set non-persistent cache and the persistent cache, it is possible to avoid a large number of subsequent log acquisitions. Get requests are directly assigned to the persistent cache to reduce the log processing pressure of the persistent cache. Further, merging the logs with the same field can also effectively save the storage space of the logs, and facilitate subsequent centralized management of the logs.

上述说明仅是本发明技术方案的概述，为了能够更清楚了解本发明的技术手段，而可依照说明书的内容予以实施，并且为了让本发明的上述和其它目的、特征和优点能够更明显易懂，以下特举本发明的具体实施方式。The above description is only an overview of the technical solution of the present invention. In order to better understand the technical means of the present invention, it can be implemented according to the contents of the description, and in order to make the above and other purposes, features and advantages of the present invention more obvious and understandable , the specific embodiments of the present invention are enumerated below.

根据下文结合附图对本发明具体实施例的详细描述，本领域技术人员将会更加明了本发明的上述以及其他目的、优点和特征。Those skilled in the art will be more aware of the above and other objects, advantages and features of the present invention according to the following detailed description of specific embodiments of the present invention in conjunction with the accompanying drawings.

附图说明Description of drawings

通过阅读下文优选实施方式的详细描述，各种其他的优点和益处对于本领域普通技术人员将变得清楚明了。附图仅用于示出优选实施方式的目的，而并不认为是对本发明的限制。而且在整个附图中，用相同的参考符号表示相同的部件。在附图中：Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiment. The drawings are only for the purpose of illustrating a preferred embodiment and are not to be considered as limiting the invention. Also throughout the drawings, the same reference numerals are used to designate the same components. In the attached picture:

图1示出了根据本发明一个实施例的合并日志的方法的流程示意图；FIG. 1 shows a schematic flowchart of a method for merging logs according to an embodiment of the present invention;

图2示出了根据本发明另一个实施例的合并日志的方法的流程示意图；FIG. 2 shows a schematic flowchart of a method for merging logs according to another embodiment of the present invention;

图3示出了根据本发明一个实施例的合并日志的装置的结构示意图；FIG. 3 shows a schematic structural diagram of an apparatus for merging logs according to an embodiment of the present invention;

图4示出了根据本发明另一个实施例的合并日志的装置的结构示意图；FIG. 4 shows a schematic structural diagram of an apparatus for merging logs according to another embodiment of the present invention;

图5示出了用于执行根据本发明的合并日志的方法的计算设备的框图；以及Figure 5 shows a block diagram of a computing device for performing the method for merging logs according to the present invention; and

图6示出了用于保持或者携带实现根据本发明的合并日志的方法的程序代码的存储单元。FIG. 6 shows a storage unit for holding or carrying program codes for realizing the method for merging logs according to the present invention.

具体实施方式Detailed ways

下面将参照附图更详细地描述本公开的示例性实施例。虽然附图中显示了本公开的示例性实施例，然而应当理解，可以以各种形式实现本公开而不应被这里阐述的实施例所限制。相反，提供这些实施例是为了能够更透彻地理解本公开，并且能够将本公开的范围完整的传达给本领域的技术人员。Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided for more thorough understanding of the present disclosure and to fully convey the scope of the present disclosure to those skilled in the art.

为解决上述技术问题，本发明实施例提供了一种合并日志的方法，该方法可以应用在kafka系统中，Kafka是一种高吞吐量的分布式发布订阅消息系统，它可以处理消费者规模的网站中的所有动作流数据，kafka开发的主要目标是构建一个用来处理海量日志、用户行为和网站运营统计等的数据处理框架。图1示出了根据本发明一个实施例的合并日志的方法的流程示意图。参见图1，该方法至少包括步骤S102至步骤S106。In order to solve the above technical problems, the embodiment of the present invention provides a method for merging logs, which can be applied in the Kafka system. Kafka is a high-throughput distributed publish-subscribe message system, which can handle consumer-scale For all action flow data in the website, the main goal of kafka development is to build a data processing framework for processing massive logs, user behavior and website operation statistics. Fig. 1 shows a schematic flowchart of a method for merging logs according to an embodiment of the present invention. Referring to Fig. 1, the method at least includes step S102 to step S106.

步骤S102，将接收日志复制性写入设置的非持久化缓存和持久化缓存中，其中，非持久化缓存对日志的缓存时长小于持久化缓存对日志的缓存时长。In step S102, copying the received log is written into the set non-persistent cache and persistent cache, wherein the cache duration of the log in the non-persistent cache is shorter than the cache duration of the log in the persistent cache.

在该步骤中，将日志复制性的写入设置的非持久化缓存和持久化缓存中指的是在接收到的一份日志后，不仅会将该日志写入设置的非持久化缓存，还会将该日志写入设置的持久化缓存中。写入日志的方式可以是同时写入两类缓存(即非持久化缓存和持久化缓存)，或者先写入一类缓存，再写入另一类缓存中，本发明实施例对此不做具体限定。In this step, copying the log into the set non-persistent cache and the persistent cache refers to not only writing the log to the set non-persistent cache after receiving a log, but also Write the log to the set persistent cache. The way to write the log can be to write two types of caches (ie, non-persistent cache and persistent cache) at the same time, or write to one type of cache first, and then write to another type of cache, which is not done in the embodiment of the present invention. Specific limits.

步骤S104，在非持久化缓存中查找具有相同字段的日志。Step S104, searching for a log with the same field in the non-persistent cache.

在该步骤中，具有相同字段的日志指的是日志内容虽然不同(或者相同)，但其具有相同的字段。其中，字段可以包括用户ID，还可以包括用户其他唯一标识key等。In this step, the logs with the same fields mean that although the contents of the logs are different (or the same), they have the same fields. Wherein, the field may include a user ID, and may also include other unique identification keys of the user, and the like.

步骤S106，从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至持久化缓存中。Step S106, obtaining and merging various logs satisfying preset backtracking conditions from the found logs of the same field, and caching the merged logs into a persistent cache.

在本发明实施例中，当接收到日志时将接收到的日志复制性写入设置的非持久化缓存和持久化缓存中，其中，非持久化缓存对日志的缓存时长小于持久化缓存对日志的缓存时长。写入日志之后还可以在非持久化缓存中查找具有相同字段的日志，并从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至持久化缓存中。由此，本发明实施例通过设置非持久化缓存和持久化缓存，并将接收到的日志复制性的写入设置的非持久化缓存和持久化缓存，从而可以避免后续在获取日志时将大量的获取请求直接分配到持久化缓存中，以减少持久化缓存的日志处理压力。进一步地，将具有相同字段的日志进行合并还可以有效地节约日志的存储空间，并方便后续对日志的集中管理。In the embodiment of the present invention, when a log is received, the received log is duplicatively written into the set non-persistent cache and the persistent cache, wherein the non-persistent cache has a shorter cache duration for the log than the persistent cache for the log cache duration. After writing the log, you can also search for logs with the same fields in the non-persistent cache, and obtain and merge various logs that meet the preset backtracking conditions from the found logs with the same fields, and cache the merged logs to the persistent cached. Therefore, in the embodiment of the present invention, by setting the non-persistent cache and the persistent cache, and writing the received logs to the set non-persistent cache and the persistent cache, it is possible to avoid a large number of subsequent log acquisitions. Get requests are directly assigned to the persistent cache to reduce the log processing pressure of the persistent cache. Further, merging the logs with the same field can also effectively save the storage space of the logs, and facilitate subsequent centralized management of the logs.

参见上文步骤S102，在本发明一实施例中，在将接收日志复制性写入设置的非持久化缓存和持久化缓存之前还需要先设置非持久化缓存和持久化缓存两类缓存。例如，可以设置第一级缓存和第二级缓存作为非持久化缓存，并且设置第三级缓存作为持久化缓存。其中，第一级缓存可以采用redis cluster(即Redis集群)，Redis是一个开源的Key-Value数据库，并提供多种语言的API(Application Program Interface，应用程序接口)。第二级缓存可以采用aerospike数据库，其是一个分布式可扩展的NoSql(Not Only SQL，非关系型数据库)。第三级缓存可以采用hbase数据库，其是一个分布式的、面向列的开源数据库。此处的各级缓存采用的数据库仅仅是示意性的，还可以包括其他类型的数据库，本发明实施例对此不做具体限定。Referring to step S102 above, in an embodiment of the present invention, two types of caches, the non-persistent cache and the persistent cache, need to be set before the received log is replicated into the set non-persistent cache and the persistent cache. For example, the first-level cache and the second-level cache can be set as non-persistent caches, and the third-level cache can be set as persistent caches. Among them, the first-level cache can use redis cluster (ie Redis cluster), Redis is an open source Key-Value database, and provides API (Application Program Interface, application program interface) in multiple languages. The second-level cache can use the aerospike database, which is a distributed and scalable NoSql (Not Only SQL, non-relational database). The third-level cache can use the hbase database, which is a distributed, column-oriented open source database. The databases used by the caching at all levels here are only illustrative, and may also include other types of databases, which are not specifically limited in this embodiment of the present invention.

在本发明实施例中，非持久化缓存对日志的缓存时长可以远远小于持久化缓存对日志的缓存时长。例如，非持久化缓存的缓存时长为1小时、2小时等等，而持久化缓存的缓存时长为10天、20天等等，或者还可以对持久化缓存的缓存时长不设置具体时间，即缓存至持久化缓存中的日志只要不进行人为的删除，其日志可以一直保存。若非持久化缓存包括第一级缓存和第二级缓存，那么可以分别为第一级缓存和第二级缓存设置相应的缓存时长，这两级缓存的缓存时长可以相同，也可以不同。例如，第一级缓存和第二级缓存的缓存时长均为1小时，那么两级缓存中的日志每隔1小时就会更新一次，即删除原来缓存的日志，并继续缓存新接收到的日志。若持久化缓存包含第三级缓存，其缓存时长为10天，同理，第三级缓存中的日志每隔10天更新一次，但是，若第三级缓存没有配置缓存时长，那么其中的日志能够永久保存，当然用户可以人为删除持久化缓存中的日志。本发明实施例对非持久化缓存和持久化缓存的缓存时间不做具体限定。In the embodiment of the present invention, the non-persistent cache can cache logs for a long time much shorter than the persistent cache for logs. For example, the cache duration of the non-persistent cache is 1 hour, 2 hours, etc., and the cache duration of the persistent cache is 10 days, 20 days, etc., or the cache duration of the persistent cache can not be set for a specific time, that is As long as the logs cached in the persistent cache are not manually deleted, the logs can be kept forever. If the non-persistent cache includes the first-level cache and the second-level cache, you can set corresponding cache durations for the first-level cache and the second-level cache respectively, and the cache durations of the two caches can be the same or different. For example, if the cache duration of the first-level cache and the second-level cache are both 1 hour, then the logs in the two-level cache will be updated every 1 hour, that is, the original cached logs will be deleted, and the newly received logs will continue to be cached . If the persistent cache includes a third-level cache, its cache duration is 10 days. Similarly, the logs in the third-level cache are updated every 10 days. However, if the third-level cache does not configure the cache duration, the logs in it It can be stored permanently, of course, users can manually delete the logs in the persistent cache. The embodiment of the present invention does not specifically limit the cache time of the non-persistent cache and the persistent cache.

在本发明实施例中，将接收到的日志缓存在各级数据库中而并非是系统内存，主要原因在于若将日志保存在系统内存中，那么当系统重新启动时内存中的日志有可能会丢失，即在系统重启时之前内存中缓存的日志都会miss掉，从而仅仅剩下持久化缓存(如hbase数据库)中的日志。当后续有业务使用日志时只能从持久化缓存中获取日志，这可能使得持久化缓存的处理性能无法跟上业务的需求。通过设置非持久化缓存和持久化缓存，如第一级缓存、第二级缓存以及第三级缓存，当系统重启之后各级缓存中的日志不会丢失，并且各级缓存中的日志可以满足各种业务需求，避免了增加持久化缓存的日志处理压力。In the embodiment of the present invention, the received logs are cached in databases at all levels rather than system memory. The main reason is that if the logs are stored in the system memory, the logs in the memory may be lost when the system is restarted. , that is, before the system restarts, the logs cached in the memory will be missed, so that only the logs in the persistent cache (such as the hbase database) remain. When there are subsequent business use logs, the logs can only be obtained from the persistent cache, which may make the processing performance of the persistent cache unable to keep up with the needs of the business. By setting non-persistent cache and persistent cache, such as first-level cache, second-level cache, and third-level cache, the logs in each level of cache will not be lost after the system is restarted, and the logs in each level of cache can satisfy Various business needs avoid the log processing pressure of increasing the persistent cache.

参见上文步骤S104，在本发明一实施例中，若非持久化缓存包括第一级缓存和第二级缓存。那么，在非持久化缓存中查找具有相同字段的日志时，可以依次从第一级缓存、第二级缓存中查找具有相同字段的日志，即先从第一级缓存中查找是否具有相同字段的日志，若是，则可以从相同字段的日志中获取满足预置回溯条件的多种日志，若否，再从第二级缓存中查找具有相同字段的日志。当然，若第二级缓存中仍然没有查找到具有相同字段的日志，还可以继续从持久化缓存(如第三级缓存)中查找具有相同字段的日志。Referring to step S104 above, in an embodiment of the present invention, if the non-persistent cache includes a first-level cache and a second-level cache. Then, when looking for logs with the same field in the non-persistent cache, you can search for logs with the same field from the first-level cache and the second-level cache in turn, that is, first look for the logs with the same field from the first-level cache log, if yes, you can obtain multiple logs that meet the preset backtracking conditions from logs with the same field, and if not, look for logs with the same field from the second-level cache. Certainly, if the log with the same field is still not found in the second-level cache, the log with the same field may continue to be searched from the persistent cache (such as the third-level cache).

在本发明一实施例中，在非持久化缓存中查找具有相同字段的日志的过程即对缓存的日志的回溯过程。当到达预置的回溯时间并开始启动回溯操作(即join操作)时，在非持久化缓存中查找具有相同字段的日志。其中，预置的回溯时间可以是5分钟、1小时、5小时等等时长，本发明实施例对此不做具体的限定。In an embodiment of the present invention, the process of searching for a log with the same field in the non-persistent cache is the backtracking process of the cached log. When the preset backtracking time is reached and the backtracking operation (that is, the join operation) is started, a log with the same field is searched in the non-persistent cache. Wherein, the preset look-back time may be 5 minutes, 1 hour, 5 hours, etc., which is not specifically limited in this embodiment of the present invention.

在该实施例中，当到达预置回溯时间并启动回溯操作时，还可以先判断回溯时间与日志写入时间差是否超过非持久化缓存的缓存时长，若是，则可以直接从持久化缓存中查找具有相同字段的日志。由于非持久化缓存的缓存时间较短，即日志在其中的存储时间较短，当达到预置回溯时间时需要回溯的日志可能已经不在非持久化缓存中了，因此，当判断回溯时间与日志写入时间差超过非持久化缓存的缓存时长时，可以直接从持久化缓存中查找具有相同字段的日志，以减少不必要的操作流程，从而提高回溯效率。In this embodiment, when the preset backtracking time is reached and the backtracking operation is started, it is also possible to first determine whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache, and if so, it can be directly searched from the persistent cache Logs with the same fields. Since the cache time of the non-persistent cache is short, that is, the storage time of the log in it is short, the log that needs to be backtracked may not be in the non-persistent cache when the preset backtracking time is reached. Therefore, when judging the backtracking time and the log When the write time difference exceeds the cache duration of the non-persistent cache, you can directly look up logs with the same field from the persistent cache to reduce unnecessary operation processes and improve backtracking efficiency.

其中，判断回溯时间与日志写入时间差是否超过非持久化缓存的缓存时长时，可以依据非持久化缓存中日志的时间戳判断回溯时间与日志写入时间差是否超过非持久化缓存的缓存时长。日志的时间戳可以是在接收到日志之后为日志添加的时间戳，进而将携带时间戳的日志复制性写入设置的非持久化缓存和持久化缓存中。Among them, when judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache, it can be judged based on the timestamp of the log in the non-persistent cache whether the difference between the lookback time and the log writing time exceeds the cache duration of the non-persistent cache. The timestamp of the log can be a timestamp added to the log after the log is received, and then the log with the timestamp is replicated and written into the set non-persistent cache and persistent cache.

参见上文步骤S106，在本发明实施例中，满足预置回溯条件的多种日志可以是完整session动作中多个组成对象所对应的日志。其中，完整session动作中的多个组成对象依据业务需求确定得到。例如，在业务1中，一次完整的session动作可以包括用户流量的竞价、展示、点击操作，其中竞价、展示、点击操作即为业务1完整session动作中的三个组成对象。业务2中，一次完整的session动作可以包括用户流量的竞价、展示，而不包括点击操作，其中竞价、展示即为业务2完整session动作中的两个组成对象。由此，不同的业务需求对应的完整session动作也是不相同的。Referring to step S106 above, in the embodiment of the present invention, the various logs satisfying the preset backtracking conditions may be the logs corresponding to multiple constituent objects in the complete session action. Among them, multiple constituent objects in the complete session action are determined according to business requirements. For example, in business 1, a complete session action may include user traffic bidding, display, and click operations, where bidding, display, and click operations are the three components of the complete session action of business 1. In business 2, a complete session action may include bidding and display of user traffic, excluding click operations, where bidding and display are two components of the complete session action of business 2. Therefore, the complete session actions corresponding to different business requirements are also different.

在该实施例中，如果从查找到的相同字段的日志中无法获取满足预置回溯条件的多种日志。此时，可以将不满足预置回溯条件的多种日志作为特殊日志类型来处理，即依据查找到的相同字段的日志建立topic，并为建立的topic设置最大回溯时间。并在达到最大回溯时间时从非持久化缓存和持久化缓存中查找是否存在与topic中的日志字段相同、且满足预置回溯条件的多种日志，若存在，则合并具有相同字段且满足预置回溯条件的多种日志，并将合并后的日志缓存至持久化缓存中。其中，kafka系统可以利用topic对日志进行分类管理，一个业务可以申请多个topic，并且，一个topic的日志也可以被多个业务共享。In this embodiment, if various logs satisfying the preset backtracking condition cannot be obtained from the found logs of the same field. At this point, various logs that do not meet the preset backtracking conditions can be treated as special log types, that is, a topic is created based on the found logs of the same field, and the maximum backtracking time is set for the created topic. And when the maximum backtracking time is reached, check from the non-persistent cache and the persistent cache whether there are various logs that have the same log field as the topic and meet the preset backtracking conditions. If they exist, merge them with the same field and meet the preset Various logs with backtracking conditions set, and the merged logs are cached in the persistent cache. Among them, the Kafka system can use topics to classify and manage logs. One business can apply for multiple topics, and the logs of one topic can also be shared by multiple businesses.

例如，上文提及的业务1中，在回溯过程中仅仅可以查找到用户A流量竞价、展示对应的日志，而没有查找到用户A的点击操作对应的日志，即没有回溯成功。此时，可以依据查找到的用户A的日志建立topic，并为建立的topic设置最大回溯时间，例如设置的最大回溯时间为5小时，那么在5小时之后从非持久化缓存和持久化缓存中再次查找是否存在用户A流量的竞价、展示、点击操作对应的日志，如果存在，合并查找到的日志，并将合并后的日志缓存至持久化缓存中。For example, in business 1 mentioned above, only the logs corresponding to user A’s traffic bidding and display can be found during the backtracking process, but no logs corresponding to user A’s click operations can be found, that is, the backtracking is not successful. At this point, you can create a topic based on the found user A's log, and set the maximum backtracking time for the established topic. For example, if the maximum backtracking time is set to 5 hours, then after 5 hours, the non-persistent cache and persistent cache Check again whether there are logs corresponding to the bidding, display, and click operations of user A's traffic. If yes, merge the found logs and cache the merged logs in the persistent cache.

此处需要说明的是，对于大部分的回溯操作是秒级就能回溯完成的，设置的最大回溯时间指的是业务端可以容忍的最大回溯时间，即业务端可以容忍获取回溯成功日志的最多等待时间。当然，对于没有回溯成功的用户日志，在没有到达最大回溯时间(如5小时等等)时，有可能之前没有回溯成功的用户日志就满足了预置回溯条件。例如，在之前回溯过程中仅仅可以查找到用户A流量竞价、展示对应的日志，而没有查找到用户A的点击操作对应的日志，且设置的最大回溯时间为5小时，如果在接下来的1分钟内接收到了点击日志，那么当设置的最大回溯时间到达时可以不再执行回溯操作的过程。What needs to be explained here is that most of the backtracking operations can be done in seconds. The maximum backtracking time set refers to the maximum backtracking time that the business side can tolerate, that is, the business side can tolerate the maximum number of backtracking success logs waiting time. Of course, for user logs that have not been backtracked successfully, before reaching the maximum backtracking time (such as 5 hours, etc.), it is possible that the user logs that have not been backtracked successfully before meet the preset backtracking conditions. For example, in the previous backtracking process, only the logs corresponding to user A’s traffic bidding and display can be found, but no logs corresponding to user A’s click operations can be found, and the maximum backtracking time is set to 5 hours. If in the next 1 If the click log is received within minutes, the backtracking operation can no longer be performed when the set maximum backtracking time is reached.

由此，最大回溯时间实际上可以认为是一种兜底保证，就是在业务端最大容忍时间内去尝试下是否能对日志回溯成功。因为有些业务的规则是即使在最大容忍时间(即最大回溯时间)没有回溯成功，那么也需要将部分回溯成功的日志进行合并。Therefore, the maximum backtracking time can actually be considered as a bottom-line guarantee, that is, to try whether the log can be backtracked successfully within the maximum tolerance time on the business side. Because some business rules are that even if the backtracking is not successful within the maximum tolerance time (ie, the maximum backtracking time), it is necessary to merge some of the backtracking successful logs.

继续参见步骤S106，在本发明一实时中，当日志合并成功后可以直接缓存至持久化缓存中，供离线job处理。当然，可以在日志合并成功后，直接以kafka系统为中心往外分发，供实时job处理。Continuing to refer to step S106, in a real-time method of the present invention, after the log is merged successfully, it can be directly cached in the persistent cache for offline job processing. Of course, after the log is successfully merged, it can be directly distributed outside the center of the Kafka system for real-time job processing.

本发明实施例还提供了另一种合并日志的方法。图2示出了根据本发明另一个实施例的合并日志的方法的流程示意图。参见图2，该方法至少包括步骤S202至步骤S212。The embodiment of the present invention also provides another method for merging logs. Fig. 2 shows a schematic flowchart of a method for merging logs according to another embodiment of the present invention. Referring to Fig. 2, the method at least includes step S202 to step S212.

步骤S202，将接收日志中具有相同字段的日志分配到基于kafka系统中的同一worker行程中。其中，字段可以包括用户ID或者用户其他唯一标识key。Step S202, assigning logs with the same fields in the received logs to the same worker journey in the Kafka-based system. Wherein, the field may include a user ID or other unique identification keys of the user.

在该步骤中，可以采用flink keyby操作(如hash)将具有相同字段的日志分配到基于kafka系统中的同一worker行程中。flink是一个类似spark的“开源技术栈”，可以提供批处理、流式计算、图计算、交互式查询、机器学习等功能。通过将具有相同字段的日志分配到同一worker行程中，从而可以保证后续能够更加方便地对相同字段的日志进行合并，进而在系统进行扩容或升级的时候有效地保证日志的一致性。例如，通过flink keyby操作将用户ID为“11”的日志分配到worker1中，将用户ID为“12”的日志分配到worker2中，将用户ID为“13”的日志分配到worker3中。In this step, the flink keyby operation (such as hash) can be used to assign logs with the same field to the same worker process in the kafka-based system. Flink is an "open source technology stack" similar to spark, which can provide batch processing, streaming computing, graph computing, interactive query, machine learning and other functions. By assigning logs with the same field to the same worker itinerary, it is possible to ensure that the logs with the same field can be merged more conveniently in the future, and then effectively ensure the consistency of the logs when the system is expanded or upgraded. For example, through the flink keyby operation, the log with the user ID "11" is allocated to worker1, the log with the user ID "12" is allocated to worker2, and the log with the user ID "13" is allocated to worker3.

步骤S204，各worker行程将分配到的日志复制性写入非持久化缓存和持久化缓存中。In step S204, each worker process writes the assigned log into the non-persistent cache and the persistent cache.

在该步骤中，由于上文已经将相同字段的日志分配到相同的worker行程中，因此，各worker行程可以将自身所分配到的相同字段的日志复制性的写入非持久化缓存和持久化缓存中。例如，结合上述实施例worker1分配到用户ID为“11”的日志，worker2分配到用户ID为“12”的日志，那么，worker1将用户ID为“11”的日志复制性写入非持久化缓存和持久化缓存中，worker2将用户ID为“12”的日志复制性写入非持久化缓存和持久化缓存中，并且，worker3将用户ID为“13”的日志复制性写入非持久化缓存和持久化缓存中。In this step, since the log of the same field has been assigned to the same worker trip above, each worker trip can copy the log of the same field assigned to itself to the non-persistent cache and persistence cached. For example, in combination with the above embodiment, worker1 is assigned to the log with user ID "11", and worker2 is assigned to the log with user ID "12", then worker1 will copy the log with user ID "11" into the non-persistent cache In the persistent cache, worker2 writes the log replication of the user ID "12" into the non-persistent cache and the persistent cache, and worker3 writes the log replication of the user ID "13" into the non-persistent cache and persistent cache.

步骤S206，在非持久化缓存中查找是否存在具有相同字段的日志，若是，执行步骤S208；若否，执行步骤S210。Step S206, check whether there is a log with the same field in the non-persistent cache, if yes, execute step S208; if not, execute step S210.

步骤S208，从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至持久化缓存中，并执行步骤S212。In step S208, various logs satisfying preset backtracking conditions are acquired from the found logs of the same field and merged, and the merged logs are cached in a persistent cache, and step S212 is executed.

步骤S210，在持久化缓存中查找是否具有相同字段的日志，若是，执行步骤S208；若否，执行步骤S212，结束。Step S210, check whether there is a log with the same field in the persistent cache, if yes, execute step S208; if not, execute step S212, and end.

本发明实施例可以基于kafka系统和flink对接收到的日志进行流处理，进而可以通过并行度配置来实现系统的扩容、缩容。The embodiment of the present invention can perform flow processing on the received logs based on the kafka system and flink, and then realize the expansion and contraction of the system through the configuration of parallelism.

基于同一发明构思，本发明实施例还提供了一种合并日志的装置，图3示出了根据本发明一个实施例的合并日志的装置的结构示意图，参见图3，合并日志的装置300包括写入模块310、查找模块320以及合并模块330。Based on the same inventive concept, an embodiment of the present invention also provides an apparatus for merging logs. FIG. 3 shows a schematic structural diagram of an apparatus for merging logs according to an embodiment of the present invention. Referring to FIG. 3 , an apparatus 300 for merging logs includes writing Input module 310, search module 320 and merge module 330.

现介绍本发明实施例的合并日志的装置300的各组成或器件的功能以及各部分间的连接关系：The functions of the various components or devices of the log merging apparatus 300 according to the embodiment of the present invention and the connection relationship between the various parts are now introduced:

写入模块310，适于将接收日志复制性写入设置的非持久化缓存和持久化缓存中，其中，非持久化缓存对日志的缓存时长小于持久化缓存对日志的缓存时长；The writing module 310 is adapted to write the received log duplicatively into the set non-persistent cache and persistent cache, wherein the cache duration of the non-persistent cache to the log is shorter than the cache duration of the persistent cache to the log;

查找模块320，与写入模块310耦合，适于在非持久化缓存中查找具有相同字段的日志；The search module 320, coupled with the write module 310, is adapted to search for logs with the same field in the non-persistent cache;

合并模块330，与查找模块320耦合，适于从查找到的相同字段的日志中获取满足预置回溯条件的多种日志并合并，将合并后的日志缓存至持久化缓存中。其中，满足预置回溯条件的多种日志可以包括，完整session动作中多个组成对象所对应的日志，其中，完整session动作中的多个组成对象依据业务需求确定得到。The merging module 330, coupled with the search module 320, is adapted to obtain and merge various logs satisfying preset backtracking conditions from the found logs of the same field, and cache the merged logs into a persistent cache. Among them, the various logs satisfying the preset backtracking conditions may include logs corresponding to multiple constituent objects in the complete session action, wherein the multiple constituent objects in the complete session action are determined according to business requirements.

在本发明一实施例中，查找模块320还适于若在非持久化缓存中未查找到具有相同字段的日志，则继续从持久化缓存中查找具有相同字段的日志。In an embodiment of the present invention, the search module 320 is further adapted to continue searching for a log with the same field from the persistent cache if no log with the same field is found in the non-persistent cache.

在本发明一实施例中，合并日志的装置还可以应用于kafka系统，写入模块310还适于将接收日志中具有相同字段的日志分配到基于kafka系统中的同一worker行程中，其中，字段包括用户ID或者用户其他唯一标识key。各worker行程将分配到的日志复制性写入非持久化缓存和持久化缓存中。In an embodiment of the present invention, the device for merging logs can also be applied to the kafka system, and the writing module 310 is also suitable for distributing logs with the same field in the received log to the same worker trip based on the kafka system, wherein the field Including the user ID or other unique identification keys of the user. Each worker trip writes the assigned log replicas into the non-persistent cache and the persistent cache.

在本发明一实施例中，写入模块310还适于，采用flink keyby操作将具有相同字段的日志分配到基于kafka系统中的同一worker行程中。In an embodiment of the present invention, the write module 310 is also adapted to assign logs with the same field to the same worker process in the kafka-based system by using the flink keyby operation.

在本发明一实施例中，查找模块320还适于，到达预置回溯时间时启动回溯操作，并在非持久化缓存中查找具有相同字段的日志。In an embodiment of the present invention, the search module 320 is further adapted to start a backtracking operation when the preset backtracking time is reached, and search for a log with the same field in the non-persistent cache.

本发明实施例还提供了另一种合并日志的装置，参见图4，合并日志的装置300除了包含上述各模块之外，还包括设置模块340、判断模块350和建立模块360。The embodiment of the present invention also provides another device for merging logs. Referring to FIG. 4 , the device 300 for merging logs includes a setting module 340 , a judging module 350 and a building module 360 in addition to the above modules.

设置模块340，与写入模块310耦合，适于在写入模块310将接收日志复制性写入设置的非持久化缓存和持久化缓存之前，设置第一级缓存和第二级缓存作为非持久化缓存，设置第三级缓存作为持久化缓存。The setting module 340, coupled with the writing module 310, is suitable for setting the first-level cache and the second-level cache as non-persistent cache, and set the third-level cache as a persistent cache.

判断模块350，与写入模块310和查找模块320分别耦合，适于在写入模块310将接收日志复制性写入设置的非持久化缓存和持久化缓存后，在到达预置回溯时间并启动回溯操作时，判断回溯时间与日志写入时间差是否超过非持久化缓存的缓存时长，若是，由查找模块320直接从持久化缓存中查找具有相同字段的日志。The judging module 350 is coupled with the writing module 310 and the searching module 320 respectively, and is suitable for starting after the writing module 310 writes the duplication of the received log into the set non-persistent cache and the persistent cache, when the preset backtracking time is reached and starts During the backtracking operation, it is judged whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache, and if so, the search module 320 directly searches the log with the same field from the persistent cache.

建立模块360，与合并模块330和查找模块320分别耦合，适于若合并模块330从查找到的相同字段的日志中无法获取满足预置回溯条件的多种日志，则依据查找到的相同字段的日志建立topic，并为建立的topic设置最大回溯时间。查找模块320在达到最大回溯时间时从非持久化缓存和持久化缓存中查找是否存在与topic中的日志字段相同、且满足预置回溯条件的多种日志，若是，由合并模块330合并具有相同字段且满足预置回溯条件的多种日志，并将合并后的日志缓存至持久化缓存中。The establishment module 360 is separately coupled with the merge module 330 and the search module 320, and is suitable for if the merge module 330 cannot obtain various logs satisfying preset backtracking conditions from the logs of the same field found, then according to the logs of the same field found The log creates a topic, and sets the maximum backtracking time for the created topic. When the maximum backtracking time is reached, the search module 320 searches from the non-persistent cache and the persistent cache whether there are multiple logs that are the same as the log fields in the topic and satisfy the preset backtracking conditions. If so, the merge module 330 merges them with the same Fields and various logs that meet the preset backtracking conditions, and the merged logs are cached in the persistent cache.

在本发明一实施例中，查找模块320还适于依次从第一级缓存、第二级缓存中查找具有相同字段的日志。其中，第一级缓存可以包括redis cluster，第二级缓存可以包括aerospike，第三级缓存可以包括hbase。In an embodiment of the present invention, the search module 320 is further adapted to search logs with the same field from the first-level cache and the second-level cache in sequence. Among them, the first-level cache can include redis cluster, the second-level cache can include aerospike, and the third-level cache can include hbase.

在本发明一实施例中，写入模块310还适于，接收日志并为日志添加时间戳，将携带时间戳的日志复制性写入设置的非持久化缓存和持久化缓存中。In an embodiment of the present invention, the writing module 310 is further adapted to receive the log, add a time stamp to the log, and write the log carrying the time stamp into the configured non-persistent cache and persistent cache.

在本发明一实施例中，判断模块350还适于，到达预置回溯时间并启动回溯操作时，依据日志的时间戳判断回溯时间与日志写入时间差是否超过非持久化缓存的缓存时长。In an embodiment of the present invention, the judging module 350 is further adapted to, when the preset backtracking time is reached and the backtracking operation is started, judge whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache according to the timestamp of the log.

根据上述任意一个优选实施例或多个优选实施例的组合，本发明实施例能够达到如下有益效果：According to any one of the above preferred embodiments or a combination of multiple preferred embodiments, the embodiments of the present invention can achieve the following beneficial effects:

在此处所提供的说明书中，说明了大量具体细节。然而，能够理解，本发明的实施例可以在没有这些具体细节的情况下实践。在一些实例中，并未详细示出公知的方法、结构和技术，以便不模糊对本说明书的理解。In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.

类似地，应当理解，为了精简本公开并帮助理解各个发明方面中的一个或多个，在上面对本发明的示例性实施例的描述中，本发明的各个特征有时被一起分组到单个实施例、图、或者对其的描述中。然而，并不应将该公开的方法解释成反映如下意图：即所要求保护的本发明要求比在每个权利要求中所明确记载的特征更多的特征。更确切地说，如下面的权利要求书所反映的那样，发明方面在于少于前面公开的单个实施例的所有特征。因此，遵循具体实施方式的权利要求书由此明确地并入该具体实施方式，其中每个权利要求本身都作为本发明的单独实施例。Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, in order to streamline this disclosure and to facilitate an understanding of one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or its description. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.

本领域那些技术人员可以理解，可以对实施例中的设备中的模块进行自适应性地改变并且把它们设置在与该实施例不同的一个或多个设备中。可以把实施例中的模块或单元或组件组合成一个模块或单元或组件，以及此外可以把它们分成多个子模块或子单元或子组件。除了这样的特征和/或过程或者单元中的至少一些是相互排斥之外，可以采用任何组合对本说明书(包括伴随的权利要求、摘要和附图)中公开的所有特征以及如此公开的任何方法或者设备的所有过程或单元进行组合。除非另外明确陈述，本说明书(包括伴随的权利要求、摘要和附图)中公开的每个特征可以由提供相同、等同或相似目的的替代特征来代替。Those skilled in the art can understand that the modules in the device in the embodiment can be adaptively changed and arranged in one or more devices different from the embodiment. Modules or units or components in the embodiments may be combined into one module or unit or component, and furthermore may be divided into a plurality of sub-modules or sub-units or sub-assemblies. All features disclosed in this specification (including accompanying claims, abstract and drawings) and any method or method so disclosed may be used in any combination, except that at least some of such features and/or processes or units are mutually exclusive. All processes or units of equipment are combined. Each feature disclosed in this specification (including accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.

此外，本领域的技术人员能够理解，尽管在此的一些实施例包括其它实施例中所包括的某些特征而不是其它特征，但是不同实施例的特征的组合意味着处于本发明的范围之内并且形成不同的实施例。例如，在权利要求书中，所要求保护的实施例的任意之一都可以以任意的组合方式来使用。Furthermore, those skilled in the art will appreciate that although some embodiments herein include some features included in other embodiments but not others, combinations of features from different embodiments are meant to be within the scope of the invention. And form different embodiments. For example, in the claims, any one of the claimed embodiments can be used in any combination.

本发明的各个部件实施例可以以硬件实现，或者以在一个或者多个处理器上运行的软件模块实现，或者以它们的组合实现。本领域的技术人员应当理解，可以在实践中使用微处理器或者数字信号处理器(DSP)来实现根据本发明实施例的合并日志的装置中的一些或者全部部件的一些或者全部功能。本发明还可以实现为用于执行这里所描述的方法的一部分或者全部的设备或者装置程序(例如，计算机程序和计算机程序产品)。这样的实现本发明的程序可以存储在计算机可读介质上，或者可以具有一个或者多个信号的形式。这样的信号可以从因特网网站上下载得到，或者在载体信号上提供，或者以任何其他形式提供。The various component embodiments of the present invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all functions of some or all components in the apparatus for merging logs according to the embodiment of the present invention. The present invention can also be implemented as an apparatus or an apparatus program (for example, a computer program and a computer program product) for performing a part or all of the methods described herein. Such a program for realizing the present invention may be stored on a computer-readable medium, or may be in the form of one or more signals. Such a signal may be downloaded from an Internet site, or provided on a carrier signal, or provided in any other form.

依据本发明的再一方面，还提供了一种电子设备，包括处理器；以及被安排成存储计算机可执行指令的存储器，可执行指令在被执行时使处理器执行根据上文任意实施例中的合并日志的方法。According to yet another aspect of the present invention, there is also provided an electronic device, comprising a processor; and a memory arranged to store computer-executable instructions, which when executed cause the processor to perform the The method of merging logs.

依据本发明的另一方面，还提供了一种计算机存储介质，其中，计算机存储介质存储一个或多个程序，一个或多个程序当被包括多个应用程序的电子设备执行时，使得电子设备执行根据上文任意实施例中的合并日志的方法。According to another aspect of the present invention, a computer storage medium is also provided, wherein the computer storage medium stores one or more programs, and when the one or more programs are executed by an electronic device including a plurality of application programs, the electronic device Execute the method for merging logs according to any of the above embodiments.

例如，图5示出了可以实现合并日志的方法的计算设备。该计算设备传统上包括处理器510和存储器520形式的计算机程序产品或者计算机可读介质。存储器520可以是诸如闪存、EEPROM(电可擦除可编程只读存储器)、EPROM、硬盘或者ROM之类的电子存储器。存储器520具有存储用于执行上述方法中的任何方法步骤的程序代码531的存储空间530。例如，存储程序代码的存储空间530可以包括分别用于实现上面的方法中的各种步骤的各个程序代码531。这些程序代码可以从一个或者多个计算机程序产品中读出或者写入到这一个或者多个计算机程序产品中。这些计算机程序产品包括诸如硬盘，紧致盘(CD)、存储卡或者软盘之类的程序代码载体。这样的计算机程序产品通常为例如图6所示的便携式或者固定存储单元。该存储单元可以具有与图5的计算设备中的存储器520类似布置的存储段、存储空间等。程序代码可以例如以适当形式进行压缩。通常，存储单元包括用于执行本发明的方法步骤的计算机可读代码531’，即可以由诸如510之类的处理器读取的代码，当这些代码由计算设备运行时，导致该计算设备执行上面所描述的方法中的各个步骤。For example, Figure 5 illustrates a computing device that may implement a method of merging logs. The computing device conventionally includes a processor 510 and a computer program product in the form of a memory 520 or a computer-readable medium. Memory 520 may be electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM. The memory 520 has a storage space 530 storing a program code 531 for performing any method step in the method described above. For example, the storage space 530 storing program codes may include respective program codes 531 for respectively implementing various steps in the above methods. These program codes can be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards or floppy disks. Such a computer program product is typically, for example, a portable or fixed storage unit as shown in FIG. 6 . The storage unit may have storage segments, storage spaces, etc. arranged similarly to the memory 520 in the computing device of FIG. 5 . The program code can eg be compressed in a suitable form. Typically, the memory unit includes computer readable code 531' for performing the method steps of the present invention, i.e. code readable by a processor such as 510, which when executed by a computing device causes the computing device to execute steps in the method described above.

应该注意的是上述实施例对本发明进行说明而不是对本发明进行限制，并且本领域技术人员在不脱离所附权利要求的范围的情况下可设计出替换实施例。在权利要求中，不应将位于括号之间的任何参考符号构造成对权利要求的限制。单词“包含”不排除存在未列在权利要求中的元件或步骤。位于元件之前的单词“一”或“一个”不排除存在多个这样的元件。本发明可以借助于包括有若干不同元件的硬件以及借助于适当编程的计算机来实现。在列举了若干装置的单元权利要求中，这些装置中的若干个可以是通过同一个硬件项来具体体现。单词第一、第二、以及第三等的使用不表示任何顺序。可将这些单词解释为名称。It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, and third, etc. does not indicate any order. These words can be interpreted as names.

Claims

1. A method of merging logs, the method being applied to a kafka system, comprising:

writing received logs into a set non-persistent cache and a persistent cache in a replicative manner, wherein the cache time of the non-persistent cache to the logs is smaller than that of the persistent cache to the logs, and the non-persistent cache comprises a first-level cache and a second-level cache;

When reaching the preset backtracking time and starting the backtracking operation, judging whether the difference between the backtracking time and the log writing time exceeds the cache time of the non-persistent cache;

if yes, directly searching the logs with the same fields from the persistent cache;

acquiring and combining various logs meeting preset backtracking conditions from the searched logs in the same field, and caching the combined logs into the persistent cache;

if multiple logs meeting preset backtracking conditions cannot be obtained from the searched logs in the same field, building a topic according to the searched logs in the same field, and setting the maximum backtracking time for the built topic;

searching whether various logs which are the same as the log fields in the topic and meet preset backtracking conditions exist in the non-persistent cache and the persistent cache when the maximum backtracking time is reached;

if yes, merging various logs which have the same field and meet preset backtracking conditions, and caching the merged logs into the persistent cache.

2. The method of claim 1, further comprising:

if the logs with the same fields are not found in the non-persistent cache, continuing to find the logs with the same fields from the persistent cache.

3. The method of claim 1 or 2, wherein prior to the replicative writing of the received log into the set non-persistent cache and persistent cache, further comprising: setting a first-level cache and a second-level cache as non-persistent caches, and setting a third-level cache as a persistent cache.

4. The method of claim 3, wherein looking up the log with the same field in the non-persistent cache comprises:

and searching logs with the same field from the first-level cache and the second-level cache in sequence.

5. The method of claim 3, wherein the first level cache comprises a redis cluster, the second level cache comprises an aerosepike, and the third level cache comprises hbase.

6. The method of claim 1 or 2, wherein the method is applied to a kafka system, the replicative writing of a received log into a set non-persistent cache and persistent cache, comprising:

distributing the logs with the same fields in the received logs to the same worker travel in the kafka system, wherein the fields comprise user IDs or other unique identification keys of users;

and each worker stroke duplicatively writes the allocated log into the non-persistent cache and the persistent cache.

7. The method of claim 6, wherein the assigning logs with the same fields in the received log to the same worker run in the kafka-based system comprises:

the flink key operation is used to distribute the logs with the same fields to the same worker run in the kafka system.

8. The method of claim 1, wherein the replicatively writing the received log into the set non-persistent cache and persistent cache comprises:

and receiving the log, adding a time stamp to the log, and copying the log carrying the time stamp into the set non-persistent cache and the persistent cache.

9. The method of claim 8, wherein when the preset trace back time is reached and the trace back operation is started, determining whether a difference between the trace back time and the log writing time exceeds a cache duration of the non-persistent cache comprises:

when the preset backtracking time is reached and the backtracking operation is started, judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache according to the time stamp of the log.

10. The method according to claim 1 or 2, wherein the plurality of logs satisfying a preset backtracking condition include:

And the logs corresponding to the plurality of component objects in the complete session action are obtained by determining the plurality of component objects in the complete session action according to the service requirement.

11. An apparatus for merging logs, wherein the apparatus is applied to a kafka system, comprising:

the writing module is suitable for writing the received log into a set non-persistent cache and a persistent cache, wherein the cache time of the non-persistent cache to the log is smaller than that of the persistent cache to the log, and the non-persistent cache comprises a first-level cache and a second-level cache;

the judging module is suitable for judging whether the difference between the backtracking time and the log writing time exceeds the cache duration of the non-persistent cache when the preset backtracking time is reached and the backtracking operation is started;

if yes, the searching module is suitable for directly searching the logs with the same fields from the persistent cache;

the merging module is suitable for acquiring and merging various logs meeting preset backtracking conditions from the searched logs in the same field, and caching the merged logs into the persistent cache;

the establishing module is suitable for establishing a topic according to the searched logs of the same field if the merging module cannot acquire various logs meeting preset backtracking conditions from the searched logs of the same field, and setting the maximum backtracking time for the established topic;

The searching module searches whether various logs which are the same as the log fields in the topic and meet preset backtracking conditions exist in the non-persistent cache and the persistent cache when the maximum backtracking time is reached;

if yes, the merging module merges various logs which have the same field and meet preset backtracking conditions, and caches the merged logs into the persistent cache.

12. The apparatus of claim 11, wherein the lookup module is further adapted to:

13. The apparatus of claim 11 or 12, further comprising:

the setting module is suitable for setting the first-level cache and the second-level cache as non-persistent caches and setting the third-level cache as persistent caches before the writing module is used for reproducibly writing the received log into the set non-persistent caches and persistent caches.

14. The apparatus of claim 13, wherein the lookup module is further adapted to:

15. The apparatus of claim 13, wherein the first level cache comprises a redis cluster, the second level cache comprises an aeropike, and the third level cache comprises a hbase.

16. The apparatus of claim 11 or 12, the writing module further adapted to:

17. The apparatus of claim 16, wherein the writing module is further adapted to:

18. The apparatus of claim 17, wherein the writing module is further adapted to:

19. The apparatus of claim 18, wherein the determination module is further adapted to:

20. The apparatus according to claim 11 or 12, wherein the plurality of logs satisfying a preset backtracking condition include:

21. An electronic device, comprising:

a processor; and

a memory arranged to store computer executable instructions which when executed cause the processor to perform the method of merging logs according to any of claims 1-10.

22. A computer storage medium storing one or more programs that, when executed by an electronic device comprising a plurality of application programs, cause the electronic device to perform the method of merging logs according to any of claims 1-10.