CN110750563A

CN110750563A - Multi-model data processing method, system, device, electronic equipment and storage medium

Info

Publication number: CN110750563A
Application number: CN201810805589.0A
Authority: CN
Inventors: 冯甲一; 王凯; 徐羽; 赵旭玲
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2018-07-20
Filing date: 2018-07-20
Publication date: 2020-02-04

Abstract

The disclosure provides a multi-model data processing method, system, apparatus, electronic device, and computer-readable storage medium, and belongs to the technical field of data processing. The method comprises: obtaining original model data from a plurality of original models; merging the original model data to obtain first intermediate data; according to an extraction condition for a target field of the first intermediate data, extracting from the first intermediate data the tuples whose target field satisfies the extraction condition, to obtain second intermediate data; and performing packaging processing on the second intermediate data to obtain standard model data. The disclosure achieves fusion and interoperability of multi-model data, makes subsequent use of the data more convenient, establishes a standardized processing flow for multi-model data, and is broadly applicable.

Description

Multi-model data processing method, system, device, electronic device and storage medium

Technical Field

The present disclosure relates to the technical field of data processing, and in particular to a multi-model data processing method, system, apparatus, electronic device, and computer-readable storage medium.

Background

With the development of data processing technology, operators of apps (applications) and websites are increasingly able to store and process massive amounts of data, and can use big data analysis to formulate accurate and effective marketing strategies that improve the operating performance of the app or website.

At present, the data management team of each app or website operator generally has its own data model, with specific requirements on the data storage structure, data format, field names, data processing rules, and so on, which makes it difficult for data from different models to interoperate. As the Internet develops under the influence of concepts such as "ecosystem" and "one-stop" platforms, the fusion and interoperability of data from different models has become a mainstream trend for apps and websites: compared with data analysis based on a single model, big data analysis based on multi-model data can deliver greater value.

Therefore, a multi-model data processing method is needed.

It should be noted that the information disclosed in the above Background section is only intended to enhance understanding of the background of the present disclosure, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.

Summary

The present disclosure provides a multi-model data processing method, system, apparatus, electronic device, and computer-readable storage medium, thereby overcoming, at least to a certain extent, the problem in existing multi-model data processing methods that data from different models is difficult to make interoperable.

Other features and advantages of the present disclosure will become apparent from the following detailed description, or will be learned in part through practice of the present disclosure.

According to one aspect of the present disclosure, a multi-model data processing method is provided, comprising: obtaining original model data from a plurality of original models; merging the original model data to obtain first intermediate data; according to an extraction condition for a target field of the first intermediate data, extracting from the first intermediate data the tuples whose target field satisfies the extraction condition, to obtain second intermediate data; and performing packaging processing on the second intermediate data to obtain standard model data.

In an exemplary embodiment of the present disclosure, obtaining the original model data from the plurality of original models comprises: periodically traversing the file names of original model data files sent by the plurality of original models to a file storage platform; matching each file name against a download history and, if the match fails, parsing the file name and matching the parsed file name against a preset whitelist; if the parsed file name matches the preset whitelist, downloading the original model data file from the file storage platform; and extracting the original model data from the downloaded original model data file.

In an exemplary embodiment of the present disclosure, merging the original model data to obtain the first intermediate data comprises: within a current cycle, detecting whether all the original model data of the current cycle has been obtained; if not all of it has been obtained, detecting again after a first interval whether all the original model data of the current cycle has been obtained; and if all of it has been obtained, merging the original model data of the current cycle to obtain the first intermediate data.

In an exemplary embodiment of the present disclosure, the method further comprises: if it is detected that not all the original model data of the current cycle has been obtained, determining whether a model data merging task has activated a multi-batch mode; if the model data merging task has not activated the multi-batch mode, performing the step of detecting again, after the first interval, whether all the original model data of the current cycle has been obtained; if the model data merging task has activated the multi-batch mode, then within any batch of the multi-batch mode, detecting whether the original models corresponding to the original model data obtained in the current batch include all the original models; if they do not, detecting again after a second interval whether the original models corresponding to the original model data obtained in the current batch include all the original models; and if they do, merging the original model data obtained in the current batch to obtain the first intermediate data.

In an exemplary embodiment of the present disclosure, the method further comprises: after detecting that the original models corresponding to the original model data obtained in the current batch do not include all the original models, detecting whether the model data merging task has activated a forced merge mode; if the forced merge mode has been activated, merging the original model data already obtained at the predetermined time of the forced merge mode to obtain the first intermediate data; and if the forced merge mode has not been activated, performing the step of detecting again, after the second interval, whether the original models corresponding to the original model data obtained in the current batch include all the original models.
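The decision made for an incomplete batch can be sketched as a small function. This is a minimal illustration, not the disclosed implementation: the `MergeTask` settings, the `decide` helper, and its three return values are hypothetical names, and returning "wait" while forced merge mode is active but its time has not yet come is an assumption the text leaves open.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional, Set


@dataclass
class MergeTask:
    """Hypothetical settings of one model data merging task."""
    all_models: Set[str]              # every original model expected in a batch
    force_merge: bool = False         # whether forced merge mode is activated
    force_at: Optional[time] = None   # the predetermined forced-merge time


def decide(task: MergeTask, models_seen: Set[str], now: datetime) -> str:
    """Return 'merge', 'force_merge' or 'wait' for the current batch."""
    if task.all_models <= models_seen:
        return "merge"                # every original model is present
    if task.force_merge and task.force_at is not None and now.time() >= task.force_at:
        return "force_merge"          # merge whatever has been obtained so far
    return "wait"                     # re-check after the second interval


task = MergeTask({"A", "B"}, force_merge=True, force_at=time(23, 0))
print(decide(task, {"A"}, datetime(2018, 7, 20, 22, 0)))   # wait
print(decide(task, {"A"}, datetime(2018, 7, 20, 23, 30)))  # force_merge
```

Keeping the decision separate from the scheduling loop makes it easy to test each branch without waiting for real intervals to elapse.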

In an exemplary embodiment of the present disclosure, merging the original model data to obtain the first intermediate data comprises: obtaining a merging rule, the merging rule including a field mapping rule; and, according to the field mapping rule, merging into columns the original model data of fields of the same type and assigning a standard field to each column, to obtain the first intermediate data.

In an exemplary embodiment of the present disclosure, the merging rule further includes at least one of a field screening rule, a model priority rule, a format conversion rule, a reset field calculation rule, and a filtering condition. After the original model data of fields of the same type has been merged into columns and a standard field has been assigned to each column, the method further comprises processing that data by at least one of the following steps: removing the fields specified by the field screening rule; according to the model priority rule, when several original models have different original model data for the same type of field under the same primary index, retaining the original model data of the original model with the highest priority; according to the format conversion rule, converting the original model data of each standard field into the custom format of that standard field; according to the reset field calculation rule, performing a conversion calculation on the original model data of the reset field; and removing the original model data that meets the filtering condition.
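The merging rules above can be illustrated with plain dictionaries. This is a minimal sketch under stated assumptions: rows are dicts, `merge_models` and all parameter names are hypothetical, a "reset field" is interpreted as a field recomputed from the merged row, and a `None` value is assumed never to overwrite a value from a lower-priority model.

```python
from typing import AbstractSet, Callable, Dict, List, Optional

Row = Dict[str, object]


def merge_models(
    raw: Dict[str, List[Row]],               # original model name -> rows
    mapping: Dict[str, Dict[str, str]],      # field mapping rule: model -> {raw field: standard field}
    key: str,                                # primary index (standard field name)
    priority: List[str],                     # model priority rule, highest first
    drop: AbstractSet[str] = frozenset(),    # field screening rule
    formats: Optional[Dict[str, Callable[[object], object]]] = None,  # format conversion rule
    reset: Optional[Dict[str, Callable[[Row], object]]] = None,       # reset field calculation rule
    exclude: Optional[Callable[[Row], bool]] = None,                  # filtering condition
) -> List[Row]:
    merged: Dict[object, Row] = {}
    # Visit the lowest-priority model first so higher-priority values overwrite it.
    for model in reversed(priority):
        for raw_row in raw.get(model, []):
            # Column merge: rename mapped fields to standard fields, discard unmapped ones.
            std = {mapping[model][f]: v for f, v in raw_row.items() if f in mapping[model]}
            target = merged.setdefault(std[key], {})
            for field, value in std.items():
                if field not in drop and value is not None:  # None never overwrites
                    target[field] = value
    result = []
    for row in merged.values():
        for field, convert in (formats or {}).items():
            if field in row:
                row[field] = convert(row[field])
        for field, compute in (reset or {}).items():
            row[field] = compute(row)
        if exclude is None or not exclude(row):
            result.append(row)
    return result


raw = {
    "shop": [{"user_id": 1, "nick": "amy", "vip": "2", "tmp": "x"}],
    "app": [{"uid": 1, "name": "Amy", "level": "3"}, {"uid": 2, "name": "Bob", "level": "0"}],
}
mapping = {
    "shop": {"user_id": "uid", "nick": "name", "vip": "level", "tmp": "tmp"},
    "app": {"uid": "uid", "name": "name", "level": "level"},
}
rows = merge_models(raw, mapping, key="uid", priority=["app", "shop"], drop={"tmp"},
                    formats={"level": int}, reset={"is_vip": lambda r: r["level"] >= 1},
                    exclude=lambda r: r["level"] == 0)
print(rows)  # [{'uid': 1, 'name': 'Amy', 'level': 3, 'is_vip': True}]
```

Here the "app" model has the higher priority, so its `name` and `level` values win for user 1, and user 2 is removed by the filtering condition.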

In an exemplary embodiment of the present disclosure, the target fields include a last active time and an extraction status. Extracting, according to the extraction condition for the target field of the first intermediate data, the tuples whose target field satisfies the extraction condition from the first intermediate data to obtain the second intermediate data comprises: extracting the extraction condition from an application task, the extraction condition including an extraction time, an activity period, and no repeated extraction; and, when the extraction time is reached, extracting from the first intermediate data the tuples whose last active time falls within the activity period and whose extraction status is "not extracted", to obtain the second intermediate data.
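A minimal sketch of this extraction condition, assuming hypothetical field names `last_active` and `extracted` and treating the activity period as a window ending at the extraction time. Marking the source tuple as extracted is what enforces "no repeated extraction" on the next run.

```python
from datetime import datetime, timedelta
from typing import Dict, List

Row = Dict[str, object]


def extract_active(rows: List[Row], now: datetime, period: timedelta) -> List[Row]:
    """Pick tuples active within `period` before `now` that were never extracted."""
    picked = []
    for row in rows:
        active = now - row["last_active"] <= period
        if active and not row["extracted"]:
            row["extracted"] = True       # mark in the first intermediate data: no repeat
            picked.append(dict(row))      # copy into the second intermediate data
    return picked


first_data = [
    {"uid": 1, "last_active": datetime(2018, 7, 19), "extracted": False},
    {"uid": 2, "last_active": datetime(2018, 7, 1), "extracted": False},
    {"uid": 3, "last_active": datetime(2018, 7, 20), "extracted": True},
]
now = datetime(2018, 7, 20, 12)
first_run = extract_active(first_data, now, timedelta(days=7))
second_run = extract_active(first_data, now, timedelta(days=7))
print([r["uid"] for r in first_run], second_run)  # [1] []
```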

In an exemplary embodiment of the present disclosure, the method further comprises: after the first intermediate data is obtained, sharding the first intermediate data into a plurality of first intermediate data sub-tables according to a plurality of index fields of the first intermediate data, and storing the sub-tables in a plurality of containers respectively. Extracting, according to the extraction condition for the target field of the first intermediate data, the tuples whose target field satisfies the extraction condition from the first intermediate data to obtain the second intermediate data then comprises: obtaining a target first intermediate data sub-table from the plurality of containers according to the extraction condition, and extracting from the target sub-table the tuples whose target field satisfies the extraction condition, to obtain the second intermediate data.
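One possible reading of the sharding step, sketched under assumptions: each distinct combination of index-field values forms one sub-table, and an in-memory dict stands in for the containers. The function names and the index fields `region` and `channel` are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]
Key = Tuple[object, ...]


def shard(rows: Iterable[Row], index_fields: Sequence[str]) -> Dict[Key, List[Row]]:
    """Split rows into sub-tables keyed by their values on the index fields."""
    containers: Dict[Key, List[Row]] = defaultdict(list)
    for row in rows:
        containers[tuple(row[f] for f in index_fields)].append(row)
    return dict(containers)


def extract_from(containers: Dict[Key, List[Row]], targets: Iterable[Key],
                 condition: Callable[[Row], bool]) -> List[Row]:
    """Read only the target sub-tables and keep rows meeting the condition."""
    result: List[Row] = []
    for key in targets:
        result.extend(r for r in containers.get(key, []) if condition(r))
    return result


rows = [
    {"uid": 1, "region": "north", "channel": "app", "vip": True},
    {"uid": 2, "region": "south", "channel": "app", "vip": True},
    {"uid": 3, "region": "north", "channel": "app", "vip": False},
    {"uid": 4, "region": "north", "channel": "web", "vip": True},
]
containers = shard(rows, ["region", "channel"])
picked = extract_from(containers, [("north", "app"), ("north", "web")], lambda r: r["vip"])
print(len(containers), [r["uid"] for r in picked])  # 3 [1, 4]
```

The benefit is that an extraction touching only some index values never scans the other sub-tables.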

According to one aspect of the present disclosure, a multi-model data processing system is provided, comprising: a rule customization module, configured to generate a merging rule, an extraction condition, and a packaging rule according to a rule configuration file; a data processing module, configured to merge the original model data according to the merging rule to obtain first intermediate data, extract second intermediate data from the first intermediate data according to the extraction condition, and perform packaging processing on the second intermediate data according to the packaging rule to obtain standard model data; and a data storage module, configured to obtain and store the original model data, and to store the first intermediate data, the second intermediate data, and the standard model data respectively.
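The rule customization module can be pictured as a loader that turns one configuration file into the three kinds of rules. The disclosure does not specify a file format, so JSON, the section names `merge`, `extract`, and `package`, and the `Rules`/`load_rules` names are all assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Rules:
    merge: Dict[str, Any]      # merging rule, used by the data processing module
    extract: Dict[str, Any]    # extraction conditions
    package: Dict[str, Any]    # packaging rules


def load_rules(text: str) -> Rules:
    """Rule customization module: turn a rule configuration file into rule objects."""
    cfg = json.loads(text)
    missing = {"merge", "extract", "package"} - set(cfg)
    if missing:
        raise ValueError(f"rule configuration lacks sections: {sorted(missing)}")
    return Rules(cfg["merge"], cfg["extract"], cfg["package"])


config_text = """
{
  "merge":   {"key": "uid", "priority": ["app", "shop"]},
  "extract": {"active_days": 7, "no_repeat": true},
  "package": {"sort_by": "uid", "fill": {"spend": 0}}
}
"""
rules = load_rules(config_text)
print(rules.extract["active_days"])  # 7
```

Keeping all rules in one file means a new original model or downstream requirement can be supported by editing configuration rather than code.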

According to one aspect of the present disclosure, a multi-model data processing apparatus is provided, comprising: an original data acquisition unit, configured to obtain original model data from a plurality of original models; a data merging unit, configured to merge the original model data to obtain first intermediate data; a data extraction unit, configured to extract from the first intermediate data, according to an extraction condition for a target field of the first intermediate data, the tuples whose target field satisfies the extraction condition, to obtain second intermediate data; and a data packaging unit, configured to perform packaging processing on the second intermediate data to obtain standard model data.

According to one aspect of the present disclosure, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to perform any one of the methods described above by executing the executable instructions.

According to one aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the computer program implements any one of the methods described above.

Exemplary embodiments of the present disclosure have the following beneficial effects:

On the one hand, the original model data of each original model is converted into standard model data through merging, extraction, and packaging processing, which achieves fusion and interoperability of data from different models and makes subsequent use more convenient. On the other hand, the embodiments establish a standardized processing flow for multi-model data: corresponding merging, extraction, and packaging rules can be formulated according to the characteristics of the original models or the customized requirements of downstream applications, so the embodiments are broadly applicable.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present disclosure.

Brief Description of the Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its principles. Obviously, the drawings in the following description show only some embodiments of the present disclosure, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 schematically shows a flowchart of a multi-model data processing method in this exemplary embodiment;

FIG. 2 schematically shows a flowchart of a method for obtaining original model data in this exemplary embodiment;

FIG. 3 schematically shows a flowchart of a method for merging original model data in this exemplary embodiment;

FIG. 4 schematically shows rule customization in this exemplary embodiment;

FIG. 5 schematically shows a flowchart of another multi-model data processing method in this exemplary embodiment;

FIG. 6 schematically shows a flowchart of yet another multi-model data processing method in this exemplary embodiment;

FIG. 7 schematically shows a flowchart of a data extraction method in this exemplary embodiment;

FIG. 8 schematically shows the operating environment architecture of a multi-model data processing system in this exemplary embodiment;

FIG. 9 schematically shows a structural block diagram of a multi-model data processing apparatus in this exemplary embodiment;

FIG. 10 schematically shows an electronic device for implementing the above method in this exemplary embodiment;

FIG. 11 schematically shows a computer-readable storage medium for implementing the above method in this exemplary embodiment.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.

An exemplary embodiment of the present disclosure provides a multi-model data processing method, which can be applied to the back-end server of an app or website. Referring to FIG. 1, the method may include the following steps S11 to S14:

Step S11: obtain original model data from a plurality of original models.

Step S12: merge the original model data to obtain first intermediate data.

Step S13: according to an extraction condition for a target field of the first intermediate data, extract from the first intermediate data the tuples whose target field satisfies the extraction condition, to obtain second intermediate data.

Step S14: perform packaging processing on the second intermediate data to obtain standard model data.
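Steps S11 to S14 can be read as a four-stage pipeline in which each stage is a pluggable rule. The sketch below is only an illustration of that shape: `run_pipeline`, `concat`, and the toy `id`/`price` fields are hypothetical, and real merging, extraction, and packaging rules would replace the toy callables.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]  # one tuple of a relational table: field name -> value


def run_pipeline(raw: Dict[str, List[Row]],
                 merge: Callable[[Dict[str, List[Row]]], List[Row]],
                 condition: Callable[[Row], bool],
                 package: Callable[[List[Row]], List[Row]]) -> List[Row]:
    first = merge(raw)                            # S12: first intermediate data
    second = [r for r in first if condition(r)]   # S13: second intermediate data
    return package(second)                        # S14: standard model data


def concat(tables: Dict[str, List[Row]]) -> List[Row]:
    """Simplest possible merge: stack the rows of every original model."""
    return [row for rows in tables.values() for row in rows]


raw = {"A": [{"id": 3, "price": 10}], "B": [{"id": 2, "price": 99}, {"id": 1, "price": 5}]}  # S11
standard = run_pipeline(raw, concat, lambda r: r["price"] < 50,
                        lambda rows: sorted(rows, key=lambda r: r["id"]))
print(standard)  # [{'id': 1, 'price': 5}, {'id': 3, 'price': 10}]
```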

This embodiment can be applied in many scenarios. For example, the suppliers of a product on an e-commerce platform may include manufacturers, agents, supermarkets, third-party sellers, and so on, and each supplier describes the product's parameters with a different data model; the method of this embodiment can convert them into standard model data, fusing the data of multiple models for convenient use by the downstream e-commerce platform. As another example, user data from different e-commerce modules such as apparel, fresh food, and food delivery is processed and stored according to each module's own model; the method can convert it into standard model data for convenient use by a downstream payment platform. As a further example, several teams within an enterprise, such as a big data department, a model algorithm department, and a software development department, may extract data such as users, orders, and behavioral operations from different apps or different sections of an app, and build user profiles and analyze user behavior with different models; the method can convert the results into standard model data so that the operations department can formulate corresponding marketing strategies based on it.

Based on the above, the original models in step S11 may be any models that generate original model data, such as the suppliers' data models, the e-commerce modules' data models, or the teams' user data analysis models mentioned above. The original model data generated by each original model may be one or more relational data tables, in which each column is an attribute of the data identified by a specific field, and each row is a tuple.

The first intermediate data obtained in step S12 may also be a relational data table. The original model data may be merged according to certain merging rules, for example: specifying the priority among the original models; specifying a custom data format for each field; specifying the valid fields in the original model data so that unneeded data columns are trimmed; specifying deduplication fields in the original model data so that duplicate tuples (data rows) are removed; specifying filtering conditions to filter the original model data in a customized way; and establishing field indexes between the first intermediate data and the original model data so that data can be traced back to its source. Corresponding merging rules can be formulated according to the application scenario and downstream data requirements; this embodiment does not particularly limit them.
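Two of these rules, trimming to valid fields and removing duplicate tuples by deduplication fields, can be sketched as follows. The helper names and the sample `sku`/`price`/`note` fields are hypothetical, and "first occurrence wins" is an assumed tie-break that the text does not specify.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, object]


def keep_valid(rows: Iterable[Row], valid_fields: Sequence[str]) -> List[Row]:
    """Trim each tuple to the valid fields named by the merging rule."""
    return [{f: r[f] for f in valid_fields if f in r} for r in rows]


def dedup(rows: Iterable[Row], dedup_fields: Sequence[str]) -> List[Row]:
    """Drop tuples whose dedup-field values were already seen; first occurrence wins."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(row.get(f) for f in dedup_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [
    {"sku": "A1", "price": 10, "note": "x"},
    {"sku": "A1", "price": 10, "note": "y"},
    {"sku": "B2", "price": 5, "note": "z"},
]
cleaned = dedup(keep_valid(rows, ["sku", "price"]), ["sku"])
print(cleaned)  # [{'sku': 'A1', 'price': 10}, {'sku': 'B2', 'price': 5}]
```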

The first intermediate data, that is, all of the merged data, can serve as a reserve database. Subsequent applications may not need all of the data, so extraction conditions can be set and the required data, namely the second intermediate data, extracted from the first intermediate data in step S13. In step S13, extraction is usually performed tuple by tuple: after the extraction condition for a target field is specified, the tuples meeting the condition are extracted, for example product data meeting a price-range condition or user data meeting a membership-level condition.

The packaging processing in step S14 refers to refining or optimizing the second intermediate data to meet the needs of subsequent applications, and may specifically include sorting the second intermediate data, adding labels, filling null values in the second intermediate data with preset values, and so on; this embodiment does not particularly limit it.
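The three packaging operations named above (preset filling of nulls, labeling, sorting) can be sketched in one function. The `package` helper, the `label` column, and the spend threshold in the example are hypothetical choices for illustration only.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def package(rows: List[Row], sort_key: str, labeler: Callable[[Row], str],
            defaults: Dict[str, object], reverse: bool = False) -> List[Row]:
    """Fill null values with presets, attach a label, then sort."""
    out: List[Row] = []
    for row in rows:
        r = dict(row)                       # leave the second intermediate data untouched
        for field, value in defaults.items():
            if r.get(field) is None:
                r[field] = value            # preset fill for null values
        r["label"] = labeler(r)             # hypothetical label column
        out.append(r)
    out.sort(key=lambda r: r[sort_key], reverse=reverse)
    return out


second = [{"uid": 2, "spend": None}, {"uid": 1, "spend": 300}]
standard = package(second, "uid", lambda r: "high" if r["spend"] >= 100 else "low", {"spend": 0})
print(standard)
# [{'uid': 1, 'spend': 300, 'label': 'high'}, {'uid': 2, 'spend': 0, 'label': 'low'}]
```

Filling nulls before labeling matters here: the labeler can then compare `spend` without guarding against `None`.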

In the above method, on the one hand, the original model data of each original model is converted into standard model data through merging, extraction, and packaging processing, which achieves fusion and interoperability of data from different models and makes subsequent use more convenient. On the other hand, this embodiment establishes a standardized processing flow for multi-model data: corresponding merging, extraction, and packaging rules can be formulated according to the characteristics of the original models or the customized requirements of downstream applications, so this embodiment is broadly applicable.

To process multi-model data automatically, trigger conditions can be set for each step in FIG. 1. In an exemplary embodiment, step S11 may include the following steps so that original model data is obtained automatically:

Periodically traverse the file names of the original model data files sent by the plurality of original models to the file storage platform.

Match each file name against the download history; if the match fails, parse the file name and match the parsed file name against the preset whitelist.

If the parsed file name matches the preset whitelist, download the original model data file from the file storage platform.

Extract the original model data from the downloaded original model data file.

Here, the file storage platform may be a database, container, or the like outside the back-end server. Each original model can periodically send original model data files to the file storage platform; an original model data file usually stores the original model data of a certain time period and is given a file name containing information such as the model name, data type, and data generation time. The download history stores the names of files that have already been downloaded, so a failed match means the file has not yet been downloaded. The preset whitelist is a set of preconfigured matching conditions on file-name information that prevents other types of files from being downloaded by mistake; conditions such as the time period and type of a file can be set in it to further select the required original model data files. When extracting original model data from an original model data file, further filtering can also be applied, for example extracting only the original model data of specific fields or with specific numbers.

This flow is illustrated in FIG. 2: the original models send original model data files to the file storage platform, and the back-end server may automatically traverse the file names on the platform every 30 minutes, downloading a file if it has not been downloaded before and its name matches the preset whitelist. The number of original models and the 30-minute period in the figure are only examples; any number of original models and a period of any length can be set according to actual needs, and this embodiment does not particularly limit them.
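One traversal of this flow can be sketched as a selection function; a scheduler would call it every 30 minutes and then download and parse the returned files, which is not shown. The naming convention `<model>_<datatype>_<YYYYMMDD>.csv` and the whitelist made of allowed models, data types, and a start date are hypothetical assumptions, not the file format used by the disclosure.

```python
import re
from datetime import date, datetime
from typing import AbstractSet, Iterable, List, Optional, Tuple

# Hypothetical naming convention: <model>_<datatype>_<YYYYMMDD>.csv
NAME_RE = re.compile(r"^(?P<model>[A-Za-z0-9]+)_(?P<dtype>[a-z]+)_(?P<day>\d{8})\.csv$")


def parse_name(name: str) -> Optional[Tuple[str, str, date]]:
    """Split a file name into (model name, data type, generation date), or None."""
    m = NAME_RE.match(name)
    if m is None:
        return None
    try:
        day = datetime.strptime(m.group("day"), "%Y%m%d").date()
    except ValueError:          # eight digits that are not a real date
        return None
    return m.group("model"), m.group("dtype"), day


def select_downloads(names: Iterable[str], history: AbstractSet[str],
                     models: AbstractSet[str], dtypes: AbstractSet[str],
                     since: date) -> List[str]:
    """One traversal: return file names that are new and pass the whitelist."""
    selected = []
    for name in names:
        if name in history:                 # already downloaded earlier
            continue
        parsed = parse_name(name)
        if parsed is None:                  # not an original model data file
            continue
        model, dtype, day = parsed
        if model in models and dtype in dtypes and day >= since:  # whitelist match
            selected.append(name)
    return selected


listing = ["A_user_20180719.csv", "A_user_20180720.csv", "B_order_20180720.csv",
           "notes.txt", "C_user_20180720.csv"]
picked = select_downloads(listing, {"A_user_20180719.csv"}, {"A", "B"},
                          {"user", "order"}, date(2018, 7, 20))
print(picked)  # ['A_user_20180720.csv', 'B_order_20180720.csv']
```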

In an exemplary embodiment, step S12 may include the following steps so that merging of the original model data is triggered automatically:

Within the current cycle, detect whether all the original model data of the current cycle has been obtained.

If not all the original model data of the current cycle has been obtained, detect again after the first interval whether all of it has been obtained.

If all the original model data of the current cycle has been obtained, merge the original model data of the current cycle to obtain the first intermediate data.

Here, a cycle is the time span over which a model data merging task runs, and is usually an integer multiple of the period at which original model data is sent; for example, if the original models send data once a day, the cycle of the model data merging task may be one day, two days, three days, and so on. The first interval can be regarded as the interval at which the model data merging task is periodically started. In this embodiment, merging of the original model data is triggered once all the original model data of the current cycle has been obtained. Note that within each cycle an original model may send original model data more than once, so it is necessary to detect whether all the data sent by all original models has been obtained, for example by checking the names of the original model data files or the number of files received.
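The completeness check with retries can be sketched as below, assuming the set of file names expected for the current cycle is known in advance (one way of "checking the file names"). The function name, the `max_checks` bound standing in for the end of the cycle, and the injectable `sleep` are hypothetical; the injection only exists so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, Set, TypeVar

T = TypeVar("T")


def merge_when_complete(expected: Set[str], received: Callable[[], Set[str]],
                        merge: Callable[[], T], interval_s: float, max_checks: int,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Re-check every `interval_s` seconds; merge once all expected files are in."""
    for _ in range(max_checks):
        if expected <= received():      # all original model data of this cycle obtained
            return merge()
        sleep(interval_s)               # the "first interval"
    return None                         # cycle ended before the data was complete


arrivals = iter([{"A_1"}, {"A_1", "B_1"}, {"A_1", "B_1", "A_2"}])
waits = []
outcome = merge_when_complete({"A_1", "A_2", "B_1"}, lambda: next(arrivals),
                              lambda: "merged", 1800, max_checks=5, sleep=waits.append)
print(outcome, waits)  # merged [1800, 1800]
```

Model A sends two files in this cycle, which is why the check compares file names rather than just the set of models.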

Further, in an exemplary embodiment, the multi-model data processing method may also include the following steps:

If not all of the original model data for the current cycle has been acquired, determine whether multi-batch mode is activated for the model data merging task.

If multi-batch mode is not activated for the model data merging task, perform the step of detecting again, after the first interval, whether all of the original model data for the current cycle has been acquired.

If multi-batch mode is activated for the model data merging task, then within any batch of the multi-batch mode, detect whether the original models corresponding to the original model data acquired in the current batch include all of the original models.

If the original models corresponding to the original model data acquired in the current batch do not include all of the original models, detect this again after a second interval.

If the original models corresponding to the original model data acquired in the current batch include all of the original models, merge the original model data acquired in the current batch to obtain the first intermediate data.

Here, multi-batch mode means merging the original model data in several batches within one cycle. For example, when the cycle is one day, the original model data may be merged once each in the morning, afternoon, and evening. This breaks the model data merging task into smaller pieces and reduces the amount of data handled by each run. Merging within a batch is triggered when every original model has reported in that batch, that is, when each original model has sent data at least once. This ensures that no original model's data is missing. The second interval may be the same as the first interval or different from it.
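A minimal sketch of the per-batch trigger condition follows. It assumes, hypothetically, that a one-day cycle is split into a morning batch [00:00, 12:00), an afternoon batch [12:00, 18:00), and an evening batch [18:00, 24:00), and that each received file is recorded with its model ID and arrival time. The batch boundaries and all names are illustrative assumptions.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical multi-batch check: a one-day cycle split into morning/afternoon/evening batches. */
class MultiBatchChecker {

    /** One received data file: which original model sent it and when (assumed fields). */
    record Arrival(String modelId, LocalDateTime receivedAt) {}

    /** Assumed batch boundaries: 0 = [0h, 12h), 1 = [12h, 18h), 2 = [18h, 24h). */
    static int batchOf(LocalDateTime t) {
        int h = t.getHour();
        return h < 12 ? 0 : (h < 18 ? 1 : 2);
    }

    /**
     * True when every expected original model sent at least one file in the given batch of the
     * given day, which is the condition that triggers merging for that batch.
     */
    static boolean allModelsReported(Set<String> expectedModels, Collection<Arrival> arrivals,
                                     LocalDate day, int batch) {
        Set<String> seen = new HashSet<>();
        for (Arrival a : arrivals) {
            if (a.receivedAt().toLocalDate().equals(day) && batchOf(a.receivedAt()) == batch) {
                seen.add(a.modelId());
            }
        }
        return seen.containsAll(expectedModels);
    }
}
```

Unlike the whole-cycle check, this check only asks whether each model appears at least once in the batch. That matches the "all original models have reported" condition described above.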

In addition, a forced merge mode can be configured in the model data merging task so that merging is still triggered in time when a downstream application platform urgently needs the standard model data, or when some original model may be unable to send data normally. In an exemplary embodiment, the multi-model data processing method may further include the following steps:

After detecting that the original models corresponding to the original model data acquired in the current batch do not include all of the original models, detect whether forced merge mode is activated for the model data merging task;

If forced merge mode is activated, merge the original model data already acquired at the predetermined time of the forced merge mode to obtain the first intermediate data;

If forced merge mode is not activated, perform the step of detecting again, after the second interval, whether the original models corresponding to the original model data acquired in the current batch include all of the original models.

The forced merge mode includes a predetermined time at which the forced merge is performed. The predetermined time may be set according to when the application side needs the data, or close to the end of each cycle, and so on.

Based on the above embodiments, FIG. 3 shows a flowchart of executing a model data merging task. The task is started on a schedule and proceeds as follows:

- It first detects whether all of the original model data has been acquired. If so, merging of the original data is triggered.
- If not, it detects whether multi-batch mode is activated. If multi-batch mode is not activated, the steps above are repeated after a certain interval.
- If multi-batch mode is activated, it checks whether all original models have reported. If they have, merging of the original model data already acquired is triggered.
- If they have not, it detects whether forced merge mode is activated. If forced merge mode is activated, it checks whether the predetermined time has been reached. If so, the original model data is merged regardless of whether the acquired data is complete or whether all original models have reported.
- If the predetermined time has not been reached, or forced merge mode is not activated, the steps above are repeated after a certain interval.

Merging of the original model data can therefore be triggered in three ways:

1. All of the original model data for the current cycle has been acquired.
2. Multi-batch mode is activated and all original models have reported in the current batch.
3. Forced merge mode is activated and the predetermined time has been reached.

Model data merging strategies can thus be tailored to different practical situations to meet the diverse needs of downstream application platforms.
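The three triggers of FIG. 3 can be condensed into a single decision function, evaluated each time the scheduled task runs. This is a hedged sketch. The enum, the Config record, and modeling "forced merge off" as a null time are assumptions. The source also does not say which interval applies when forced merge mode is active but its time has not yet arrived, so the sketch assumes the second interval.

```java
import java.time.LocalTime;

/** Hypothetical sketch of the FIG. 3 decision for one run of the model data merging task. */
class MergeTrigger {

    enum Decision { MERGE_ALL, MERGE_BATCH, FORCE_MERGE, RETRY_AFTER_FIRST_INTERVAL, RETRY_AFTER_SECOND_INTERVAL }

    /** Task configuration (names assumed); forcedMergeAt == null means forced merge mode is off. */
    record Config(boolean multiBatch, LocalTime forcedMergeAt) {}

    static Decision decide(Config cfg, boolean allCycleDataAcquired,
                           boolean allModelsInCurrentBatch, LocalTime now) {
        if (allCycleDataAcquired) {
            return Decision.MERGE_ALL;                      // trigger 1: full cycle data present
        }
        if (!cfg.multiBatch()) {
            return Decision.RETRY_AFTER_FIRST_INTERVAL;     // plain mode: wait and re-check
        }
        if (allModelsInCurrentBatch) {
            return Decision.MERGE_BATCH;                    // trigger 2: every model reported this batch
        }
        if (cfg.forcedMergeAt() != null && !now.isBefore(cfg.forcedMergeAt())) {
            return Decision.FORCE_MERGE;                    // trigger 3: forced merge time reached
        }
        // Forced merge off, or its time not reached yet: check the batch again later.
        return Decision.RETRY_AFTER_SECOND_INTERVAL;
    }
}
```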

In an exemplary embodiment, referring to FIG. 4, merging the original model data to obtain the first intermediate data may include the following steps:

Step S41: acquire a merging rule, the merging rule including a field mapping rule.

Step S42: according to the field mapping rule, merge the columns of original model data that belong to the same type of field, and assign a standard field to each column, to obtain the first intermediate data.

Further, the merging rule may also include at least one of a field filtering rule, a model priority rule, a format conversion rule, a reset-field calculation rule, and a filter condition. After the columns of the same type of field have been merged and standard fields assigned, the resulting original model data may be further processed by at least one of the following steps:

Step S43: remove the fields specified by the field filtering rule.

Step S44: according to the model priority rule, when multiple original models have different original model data for the same type of field and the same primary index, keep the original model data from the original model with the highest priority.

Step S45: according to the format conversion rule, convert the original model data of each standard field into that standard field's customized format.

Step S46: according to the reset-field calculation rule, perform conversion calculations on the original model data of the reset fields.

Step S47: remove the original model data that meets the filter condition.

Here, the reset-field calculation rule addresses the case where a field is calculated differently in different original models. After the calculation methods of all the original models are taken into account, a new calculation method can be determined for the first intermediate data. A filter condition removes tuples that have specific values in one or more fields, for example original model data where "user type" = "VIP".
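A minimal sketch of steps S42 to S45 and S47 on in-memory rows follows. It assumes each row is a Map<String, Object> and that each rule has the hypothetical representation shown in the MergeRules record. Step S46 is omitted for brevity. Models are visited from lowest to highest priority, so for the same primary index and standard field, a higher-priority value overwrites a lower-priority one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical sketch of merge steps S42-S45 and S47; rule representations are assumptions. */
class RuleBasedMerger {

    record MergeRules(
            Map<String, Map<String, String>> fieldMapping,  // modelId -> (raw field -> standard field)
            String primaryIndex,                            // standard field used as the primary index
            List<String> priorityHighToLow,                 // model priority rule (models not listed are ignored)
            Set<String> droppedFields,                      // field filtering rule
            Map<String, Function<Object, Object>> formats,  // format conversion per standard field
            Predicate<Map<String, Object>> filterOut) {}    // filter condition: true means remove

    static List<Map<String, Object>> merge(Map<String, List<Map<String, Object>>> rowsByModel, MergeRules rules) {
        Map<Object, Map<String, Object>> byKey = new LinkedHashMap<>();
        List<String> lowToHigh = new ArrayList<>(rules.priorityHighToLow());
        Collections.reverse(lowToHigh); // lowest priority first, so higher priority overwrites (S44)
        for (String model : lowToHigh) {
            Map<String, String> mapping = rules.fieldMapping().getOrDefault(model, Map.of());
            for (Map<String, Object> raw : rowsByModel.getOrDefault(model, List.of())) {
                Map<String, Object> std = new HashMap<>();
                raw.forEach((field, value) -> {           // S42: rename to standard fields
                    String target = mapping.get(field);
                    if (target != null && value != null) {
                        std.put(target, value);
                    }
                });
                Object key = std.get(rules.primaryIndex());
                if (key == null) {
                    continue; // a row without a primary index cannot be aligned with other models
                }
                byKey.computeIfAbsent(key, k -> new HashMap<>()).putAll(std);
            }
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : byKey.values()) {
            row.keySet().removeAll(rules.droppedFields());                                      // S43
            rules.formats().forEach((field, fn) -> row.computeIfPresent(field, (k, v) -> fn.apply(v))); // S45
            if (!rules.filterOut().test(row)) {                                                  // S47
                out.add(row);
            }
        }
        return out;
    }
}
```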

Note that, depending on application requirements, the merging rule may include all of the rules above or only some of them. Further merging rule conditions may also be added, or the rules above may be modified. For example, referring to FIG. 5, a deduplication field rule and a tracking field rule may be added to the merging rule. The deduplication field rule designates deduplication fields in the original model data so that duplicate tuples (data rows) can be removed. The tracking field rule applies when multiple original models each carry an ID: the ID from one specific original model can be designated as the ID of the first intermediate data. This embodiment places no particular limitation on the specific content of the merging rule.
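Below is a sketch of the deduplication field rule. It assumes the rule keeps the first row for each distinct combination of the designated deduplication fields. The source does not say which duplicate is kept, and the class name is illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical deduplication field rule: one row per distinct deduplication-field combination. */
class DedupFieldRule {

    static List<Map<String, Object>> dedup(List<Map<String, Object>> rows, List<String> dedupFields) {
        Set<List<Object>> seen = new HashSet<>();
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            List<Object> key = new ArrayList<>();
            for (String f : dedupFields) {
                key.add(row.get(f)); // ArrayList tolerates null for missing fields
            }
            if (seen.add(key)) {     // add() returns false when this combination was already seen
                out.add(row);
            }
        }
        return out;
    }
}
```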

Referring to FIG. 6, in practice the merging rules can be mapped to customized data structures. For example:

- The model priority rule {column,{modelid1,modelid2,…}} can be mapped to {String,PairIndex<Integer,Integer>}.
- The tracking field rule {"ext1":",|column,column,…","ext2":",|column",…} can be mapped to {BiConsumer<Integer,Map<String,Object>>,BiConsumer<Map<String,Object>,Map<String,Object>>}.
- The reset-field calculation rule {"2":"column,…,"1":"column2,column2,…"} can be mapped to {BiConsumer<Map<String,Object>,Map<String,Object>>,Consumer<Map<String,Object>>}.

The operations in the merging rules can then be applied to the original model data to merge it.
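The idea of compiling rule text into functional objects can be sketched as follows. The rule formats used here are hypothetical simplifications: "target=colA+colB" for a reset field computed as a sum, and a source/target ID field pair for tracking. The sketch does not try to interpret the exact syntax given in the FIG. 6 examples, such as the meaning of the keys "1" and "2".

```java
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

/** Hypothetical compilation of textual rules into Consumer/BiConsumer objects, in the spirit of FIG. 6. */
class RuleCompiler {

    /**
     * Compiles an assumed reset-field rule "target=colA+colB+..." into a Consumer that recomputes
     * the target field of a merged row as the numeric sum of the listed columns.
     */
    static Consumer<Map<String, Object>> compileSumRule(String rule) {
        String[] sides = rule.split("=", 2);
        String target = sides[0].trim();
        List<String> columns = List.of(sides[1].split("\\+"));
        return row -> {
            double sum = 0;
            for (String c : columns) {
                Object v = row.get(c.trim());
                if (v instanceof Number n) {
                    sum += n.doubleValue(); // missing or non-numeric values contribute nothing
                }
            }
            row.put(target, sum);
        };
    }

    /** Compiles an assumed tracking rule: copy one source model's ID field into the merged row. */
    static BiConsumer<Map<String, Object>, Map<String, Object>> compileTrackingRule(String sourceIdField,
                                                                                  String targetIdField) {
        return (sourceRow, targetRow) -> targetRow.put(targetIdField, sourceRow.get(sourceIdField));
    }
}
```

Compiling each rule once and then applying the resulting function to every row keeps rule parsing out of the per-row loop.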

In an exemplary embodiment, when the original model data is user data, each tuple of the merged first intermediate data aggregates one user's basic information, behavior data, analysis data, and so on. The target fields may include the user's most recent active time and extraction status. Step S13 may then include the following steps:

Obtain the extraction conditions from an application task. The extraction conditions include an extraction time, an activity period, and a no-repeated-extraction requirement.

When the extraction time arrives, extract from the first intermediate data the tuples whose active time falls within the activity period and whose extraction status is "not extracted", to obtain the second intermediate data.

Here, the application task is obtained from a downstream application platform. When running a marketing campaign, the platform can target users who are active within the activity period. In this embodiment, referring to FIG. 5 above, the extraction conditions have two parts: the extraction condition on the target fields and the extraction-status condition. The extraction status may be a new field generated during model data merging, with the initial value "not extracted". During data extraction, when a tuple is extracted, its value in this field is changed to "extracted". The extraction time may be the time the campaign starts or the time it is prepared. The no-repeated-extraction requirement means that, within the activity period, users who have already received a marketing push are not extracted again.
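A minimal sketch of this extraction step follows. It assumes, hypothetically, that each tuple of the first intermediate data is a UserRow object with a last-active timestamp and a boolean extraction-status flag, and that the activity period is an inclusive time range. Marking rows as extracted is what enforces the no-repeated-extraction requirement.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of step S13 for user data; field names and types are assumptions. */
class ActiveUserExtractor {

    static final class UserRow {
        final String userId;
        final LocalDateTime lastActive;
        boolean extracted; // extraction-status field added during merging; starts as "not extracted"

        UserRow(String userId, LocalDateTime lastActive) {
            this.userId = userId;
            this.lastActive = lastActive;
        }
    }

    /**
     * Once the extraction time has arrived, returns the rows active within [periodStart, periodEnd]
     * that have not been extracted yet, and marks them so that they are not pushed twice.
     */
    static List<UserRow> extract(List<UserRow> firstIntermediate, LocalDateTime now,
                                 LocalDateTime extractionTime,
                                 LocalDateTime periodStart, LocalDateTime periodEnd) {
        List<UserRow> second = new ArrayList<>();
        if (now.isBefore(extractionTime)) {
            return second; // extraction time not reached: nothing is extracted
        }
        for (UserRow r : firstIntermediate) {
            boolean active = !r.lastActive.isBefore(periodStart) && !r.lastActive.isAfter(periodEnd);
            if (active && !r.extracted) {
                r.extracted = true;
                second.add(r);
            }
        }
        return second;
    }
}
```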

The data extraction flow above is shown in FIG. 7. When a data extraction task starts, acquisition and merging of the current cycle's original model data may not yet be complete, so the first intermediate data of the previous cycle can be used instead. Two checks decide which tuples are extracted: whether the active time falls within the activity period, and whether the tuple has not yet been extracted. The result is the second intermediate data. In other cases, the extraction scope is not limited to the previous cycle's first intermediate data. Data may also be extracted from the first intermediate data of an earlier cycle or of several historical cycles; this embodiment does not particularly limit this. The data extraction task takes only the part of the large first intermediate data that the downstream application platform needs and processes it further into the standard model data the platform requires. This further reduces the amount of data processed across the whole pipeline. It also means the configuration of the standard model data need not match that of the original model data exactly, which adds flexibility.

Because the first intermediate data is usually very large, in an exemplary embodiment, referring to FIG. 6 above, the multi-model data processing method may include a further step to ease storage and retrieval: after the first intermediate data is obtained, split it into multiple first intermediate data sub-tables according to its index fields, and store the sub-tables in multiple containers. For example, a distinct operation can be performed on the index fields of the first intermediate data, and the data can then be sharded using conditions such as column and group by to obtain the individual sub-tables. Correspondingly, step S13 can be implemented as follows: according to the extraction conditions, obtain the target first intermediate data sub-table from the containers, and extract from it the tuples whose target fields meet the extraction conditions, to obtain the second intermediate data. Extraction then only needs to load one sub-table, not the whole first intermediate data, before extraction and packaging. The packaging rules used in packaging processing, shown in FIG. 5 above, typically include data sorting rules, data label adding rules, null value filling rules, and so on. Other types of packaging rules can of course be added according to actual needs; this embodiment does not particularly limit this.
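Below is an in-memory stand-in for the sharding step. It assumes the sub-tables can be modeled as a map from the tuple of index-field values (the "distinct" keys) to the rows sharing those values (the "group by"). Real sub-tables would live in separate containers, which this sketch does not model. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical in-memory model of sharding the first intermediate data by index fields. */
class IndexSharder {

    /** The shard key of a row: its values for the index fields, in order. */
    static List<Object> keyOf(Map<String, Object> row, List<String> indexFields) {
        List<Object> key = new ArrayList<>();
        for (String f : indexFields) {
            key.add(row.get(f));
        }
        return key;
    }

    /** Groups rows into sub-tables keyed by distinct index-field combinations. */
    static Map<List<Object>, List<Map<String, Object>>> shard(List<Map<String, Object>> rows,
                                                             List<String> indexFields) {
        return rows.stream().collect(Collectors.groupingBy(
                row -> keyOf(row, indexFields), LinkedHashMap::new, Collectors.toList()));
    }

    /** Step S13 on one target sub-table only: the other sub-tables are never scanned. */
    static List<Map<String, Object>> extractFrom(Map<List<Object>, List<Map<String, Object>>> shards,
                                                 List<?> targetKey, Predicate<Map<String, Object>> condition) {
        return shards.getOrDefault(targetKey, List.of()).stream()
                .filter(condition)
                .collect(Collectors.toList());
    }
}
```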

Sharding the first intermediate data and storing it in a distributed manner further reduces the amount of data handled per task and improves processing efficiency. It also makes the whole system easier to scale out, relaxes the limit on the number of upstream original models, and enables fusion and intercommunication of massive amounts of model data.

Further, the containers used for sharded storage of the first intermediate data may be Docker containers. Docker is an application container engine. Its main difference from other container technologies is that it can bundle application components within a single container. This makes it portable across different platforms and computing systems, and well suited to processing the sharding rules of the first intermediate data and the extraction conditions of the second intermediate data in parallel. In addition, Docker containers start within seconds, which can make the system run the multi-model data processing flow more efficiently. Docker also uses resources efficiently and supports thousands of containers running simultaneously on one server, which further saves system resources.

Exemplary embodiments of the present disclosure also provide a multi-model data processing system. FIG. 8 shows the architecture of the system's operating environment. Referring to FIG. 8, the system 80 may include:

- a rule customization module 81, configured to generate merging rules, extraction conditions, and packaging rules from a rule configuration file;
- a data processing module 82, configured to merge the original model data according to the merging rules to obtain the first intermediate data, extract the second intermediate data from the first intermediate data according to the extraction conditions, and package the second intermediate data according to the packaging rules to obtain the standard model data; and
- a data storage module 83, configured to acquire and store the original model data and to store the first intermediate data, the second intermediate data, and the standard model data.

The upstream original models can send data to the data storage module 83, which stores the original model data on a file storage platform. The file storage platform may be a built-in unit of the data storage module 83 or an external storage platform. The rule configuration file may be manually written rule text or a script, which the rule customization module 81 converts into merging rules, extraction conditions, and packaging rules usable by the running system. The data processing module 82 obtains these rules from the rule customization module 81, then merges, extracts, and packages the original model data on the file storage platform in turn. The resulting standard model data can be distributed to downstream application platforms for subsequent use. The multi-model data processing system 80 thus establishes a standardized flow for multi-model data processing. Any number of upstream original models or downstream application platforms can be configured, and new ones can be added as needed, so the system is easy to scale and highly general.

In an exemplary embodiment, the data storage module 83 may include a file storage platform, configured to:

- periodically acquire the original model data files sent by multiple original models;
- match the file names of the original model data files against a download history;
- when a match is unsuccessful, parse the file name and match the parsed file name against a preset whitelist; and
- extract the original model data from the original model data files whose parsed file names successfully match the preset whitelist.
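Below is a sketch of this new-file detection. It assumes, hypothetically, that the model ID is the part of the file name before the first underscore and that the whitelist lists model IDs. For simplicity, selected names are recorded in the history immediately. A real system would more likely record them only after a successful download.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical new-file detection on the file storage platform; the naming scheme is assumed. */
class RawFileSelector {

    /**
     * Returns the files to download: names not found in the download history whose parsed
     * model ID is on the whitelist. Selected names are added to the history (simplification).
     */
    static List<String> selectNewFiles(List<String> listedFiles, Set<String> downloadHistory,
                                       Set<String> modelWhitelist) {
        List<String> toDownload = new ArrayList<>();
        for (String name : listedFiles) {
            if (downloadHistory.contains(name)) {
                continue; // matched the history: already downloaded
            }
            int sep = name.indexOf('_');
            String modelId = sep > 0 ? name.substring(0, sep) : null; // parse the file name
            if (modelId != null && modelWhitelist.contains(modelId)) {
                toDownload.add(name);
                downloadHistory.add(name);
            }
        }
        return toDownload;
    }
}
```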

In an exemplary embodiment, the data processing module 82 may include a merging component, configured to detect, within the current cycle, whether all of the original model data for the current cycle has been acquired. If not, the component detects again after the first interval. Once all of it has been acquired, the component merges the original model data of the current cycle to obtain the first intermediate data.

In an exemplary embodiment, the merging component may further be configured to:

- when not all of the original model data for the current cycle has been acquired, determine whether multi-batch mode is activated for the model data merging task;
- when multi-batch mode is not activated, perform the step of detecting again, after the first interval, whether all of the original model data for the current cycle has been acquired;
- when multi-batch mode is activated, detect, within any batch of the multi-batch mode, whether the original models corresponding to the original model data acquired in the current batch include all of the original models;
- when they do not, detect this again after the second interval; and
- when they do, merge the original model data acquired in the current batch to obtain the first intermediate data.

In an exemplary embodiment, the merging component may further be configured to:

- after detecting that the original models corresponding to the original model data acquired in the current batch do not include all of the original models, detect whether forced merge mode is activated for the model data merging task;
- when forced merge mode is activated, merge the original model data already acquired at the predetermined time of the forced merge mode to obtain the first intermediate data; and
- when forced merge mode is not activated, perform the step of detecting again, after the second interval, whether the original models corresponding to the original model data acquired in the current batch include all of the original models.

In an exemplary embodiment, the rule customization module 81 may further be configured to acquire the merging rule, which includes a field mapping rule. The merging component may further be configured to merge, according to the field mapping rule, the columns of original model data that belong to the same type of field, and to assign a standard field to each column, to obtain the first intermediate data.

In an exemplary embodiment, the merging rule may further include at least one of a field filtering rule, a model priority rule, a format conversion rule, a reset-field calculation rule, and a filter condition. The merging component may further be configured to process the original model data, after column merging and standard field assignment, by at least one of the following steps to obtain the first intermediate data:

- removing the fields specified by the field filtering rule;
- according to the model priority rule, when multiple original models have different original model data for the same type of field and the same primary index, keeping the original model data from the original model with the highest priority;
- according to the format conversion rule, converting the original model data of each standard field into that field's customized format;
- according to the reset-field calculation rule, performing conversion calculations on the original model data of the reset fields; and
- removing the original model data that meets the filter condition.

In an exemplary embodiment, the data processing module 82 may include an extraction component, configured to obtain the extraction conditions (an extraction time, an activity period, and a no-repeated-extraction requirement) from the application task. When the extraction time arrives, the component extracts from the first intermediate data the tuples whose active time falls within the activity period and whose extraction status is "not extracted", to obtain the second intermediate data.

In an exemplary embodiment, the data storage module 83 may further be configured to split the first intermediate data, once obtained, into multiple first intermediate data sub-tables according to its index fields, and to store the sub-tables in multiple containers. The extraction component may further be configured to obtain, according to the extraction conditions, the target first intermediate data sub-table from the containers, and to extract from it the tuples whose target fields meet the extraction conditions, to obtain the second intermediate data.

Exemplary embodiments of the present disclosure also provide a multi-model data processing apparatus. Referring to FIG. 9, the apparatus 90 may include:

- an original acquisition unit 91, configured to acquire original model data from multiple original models;
- a data merging unit 92, configured to merge the original model data to obtain the first intermediate data;
- a data extraction unit 93, configured to extract from the first intermediate data, according to an extraction condition on a target field of the first intermediate data, the tuples whose target field meets the extraction condition, to obtain the second intermediate data; and
- a data packaging unit 94, configured to package the second intermediate data to obtain the standard model data.
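The four units can be wired together as plain functions, as sketched below. Modeling each unit as a java.util.function type and each tuple as a Map<String, Object> is an illustrative assumption, not the apparatus's actual interface.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical wiring of units 91-94; each unit is modeled as a plain function over rows. */
record MultiModelPipeline(
        Supplier<Map<String, List<Map<String, Object>>>> acquire,                          // unit 91: rows per original model
        Function<Map<String, List<Map<String, Object>>>, List<Map<String, Object>>> merge, // unit 92
        Predicate<Map<String, Object>> extractionCondition,                                // unit 93: condition on target fields
        UnaryOperator<List<Map<String, Object>>> pack) {                                   // unit 94

    /** Runs acquire, merge, extract, and package in turn, returning the standard model data. */
    List<Map<String, Object>> run() {
        List<Map<String, Object>> first = merge.apply(acquire.get());
        List<Map<String, Object>> second = first.stream()
                .filter(extractionCondition)
                .collect(Collectors.toList());
        return pack.apply(second);
    }
}
```

Because each stage is a separate function, a stage can be replaced (for example, a rule-driven merger instead of simple concatenation) without touching the others.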

An original model may be any model that generates original model data, such as the data models of upstream suppliers, the data models of e-commerce modules, or the user data analysis models of different teams. The data generated by the original models, that is, the original model data, is acquired from them by the original acquisition unit 91. The data merging unit 92 can follow certain merging rules when merging the original model data, for example:

- specifying priorities among the original models;
- specifying a customized data format for each field;
- specifying the valid fields in the original model data so that unneeded data columns are trimmed; and
- specifying deduplication fields in the original model data so that duplicate tuples (data rows) are removed.

Merging rules can be formulated to suit the application scenario and subsequent data requirements. The data extraction unit 93 usually works on tuples and extracts the tuples that meet the extraction condition on the target field, for example product data within a price range, or user data that meets a membership level condition. The data packaging unit 94 can package the second intermediate data according to certain packaging rules, for example by sorting it, adding labels, or filling its null values with preset values.
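A minimal sketch of the three packaging examples follows. It assumes, hypothetically, a single numeric sort field, one label written to a field named "label", and a map of preset values for null filling. Rows without a numeric sort value are placed last in either sort direction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical packaging step: null filling, label adding, and sorting (rule shapes assumed). */
class Packager {

    /** Numeric view of a field value, or null when it is missing or not a number. */
    static Double num(Object v) {
        return v instanceof Number n ? n.doubleValue() : null;
    }

    static List<Map<String, Object>> pack(List<Map<String, Object>> second, String sortField,
                                          boolean descending, String label, Map<String, Object> nullDefaults) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : second) {
            Map<String, Object> copy = new HashMap<>(row); // leave the second intermediate data untouched
            nullDefaults.forEach((field, preset) -> {
                if (copy.get(field) == null) {
                    copy.put(field, preset);                 // null value filling rule
                }
            });
            copy.put("label", label);                        // data label adding rule
            out.add(copy);
        }
        // Data sorting rule: numeric order on sortField; nullsLast keeps rows without a value at the end.
        Comparator<Double> valueOrder = descending ? Comparator.<Double>reverseOrder() : Comparator.<Double>naturalOrder();
        out.sort(Comparator.comparing((Map<String, Object> r) -> num(r.get(sortField)), Comparator.nullsLast(valueOrder)));
        return out;
    }
}
```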

The multi-model data processing apparatus of this exemplary embodiment enables fusion and intercommunication of data from different models and makes the data easier to use downstream. It also establishes a standardized processing flow for multi-model data and meets customization requirements for model data.

In an exemplary embodiment, the original acquisition unit 91 may include:

- a file name matching subunit, configured to periodically traverse the file names of the original model data files that multiple original models have sent to the file storage platform, match the file names against the download history, parse a file name when the match is unsuccessful, and match the parsed file name against the preset whitelist;
- a file download subunit, configured to download the original model data file from the file storage platform when the parsed file name successfully matches the preset whitelist; and
- a data extraction subunit, configured to extract the original model data from the downloaded original model data file.

In an exemplary embodiment, the data merging unit 92 may include: a full-data detection subunit, configured to detect, within the current cycle, whether all original model data of the current cycle has been acquired and, if it has not, to detect again after a first interval; and a merge processing subunit, configured to merge the original model data of the current cycle into the first intermediate data once all original model data of the current cycle has been acquired.
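
The full-data check is essentially a polling loop. The sketch below assumes the set of models that have delivered data can be queried through a callable, and it adds a `max_checks` bound that the text does not specify; `sleep` is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Set


def wait_for_full_cycle(expected_models: Set[str], arrived: Callable[[], Set[str]],
                        first_interval: float, max_checks: int,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Full-data detection for the current cycle.

    Returns True as soon as every expected model has delivered data, re-checking
    after `first_interval` seconds otherwise; returns False once `max_checks`
    checks have failed (a safety bound the text does not specify).
    """
    for attempt in range(max_checks):
        if expected_models <= arrived():       # all original models present
            return True
        if attempt < max_checks - 1:
            sleep(first_interval)              # wait, then detect again
    return False


snapshots = iter([{"supplier"}, {"supplier", "mall"}])
waits = []
ready = wait_for_full_cycle({"supplier", "mall"}, lambda: next(snapshots),
                            first_interval=60, max_checks=5, sleep=waits.append)
print(ready, waits)  # True [60]
```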

In an exemplary embodiment, when the full-data detection subunit detects that not all original model data of the current cycle has been acquired, it may further determine whether the model data merging task has the multi-batch mode activated. If the multi-batch mode is not activated, the subunit performs the step of detecting again, after the first interval, whether all original model data of the current cycle has been acquired. If the multi-batch mode is activated, then within any batch of the multi-batch mode the subunit detects whether the original models corresponding to the original model data acquired in the current batch include all of the original models and, if they do not, detects this again after a second interval. The merge processing subunit may further be configured to merge the original model data acquired in the current batch into the first intermediate data when the original models corresponding to that data include all of the original models.
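
A minimal sketch of the multi-batch branch, assuming each batch is exposed as a mapping from original model name to the rows received in that batch. The helper names and the `max_checks` bound are hypothetical; `on_incomplete_cycle` only illustrates the choice between batch-level checking and the cycle-level re-check.

```python
import time
from typing import Callable, Dict, List, Optional, Set

Batch = Dict[str, List[dict]]  # original model name -> rows received in this batch


def merge_batch_when_complete(all_models: Set[str], current_batch: Callable[[], Batch],
                              second_interval: float, max_checks: int,
                              sleep: Callable[[float], None] = time.sleep
                              ) -> Optional[List[dict]]:
    """Multi-batch mode: merge one batch once it covers every original model.

    Re-checks after `second_interval` seconds while models are missing and
    returns None after `max_checks` failed checks (an illustrative bound).
    """
    for attempt in range(max_checks):
        batch = current_batch()
        if all_models <= set(batch):               # every original model present
            return [row for model in sorted(all_models) for row in batch[model]]
        if attempt < max_checks - 1:
            sleep(second_interval)
    return None


def on_incomplete_cycle(multi_batch: bool, recheck_cycle: Callable[[], object],
                        merge_batch: Callable[[], object]) -> object:
    """Without multi-batch mode, fall back to the cycle-level re-check."""
    return merge_batch() if multi_batch else recheck_cycle()


batches = iter([{"supplier": [{"sku": 1}]},
                {"supplier": [{"sku": 1}], "mall": [{"sku": 2}]}])
waits = []
merged = on_incomplete_cycle(
    multi_batch=True,
    recheck_cycle=lambda: None,
    merge_batch=lambda: merge_batch_when_complete(
        {"supplier", "mall"}, lambda: next(batches), 30, 3, waits.append),
)
print(merged, waits)  # [{'sku': 2}, {'sku': 1}] [30]
```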

In an exemplary embodiment, after detecting that the original models corresponding to the original model data acquired in the current batch do not include all of the original models, the full-data detection subunit may further detect whether the model data merging task has the forced merging mode activated; if it does not, the subunit performs the step of detecting again, after the second interval, whether those original models include all of the original models. When the forced merging mode is activated, the merge processing subunit may merge the original model data acquired so far at the predetermined time of the forced merging mode, to obtain the first intermediate data.
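
The forced merging mode adds a deadline to the batch check. In the sketch below, which uses an injected clock so it runs instantly, an incomplete batch is merged as-is once the hypothetical `force_at` time is reached; without forced mode the loop keeps re-checking every `second_interval` seconds up to an illustrative `max_checks` bound. Re-checking before the deadline, so that a batch completing early is merged immediately, is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Set

Batch = Dict[str, List[dict]]


def merge_with_forced_mode(all_models: Set[str], current_batch: Callable[[], Batch],
                           forced: bool, force_at: float, now: Callable[[], float],
                           second_interval: float, sleep: Callable[[float], None],
                           max_checks: int = 1000) -> Optional[List[dict]]:
    """Merge a complete batch, or, in forced merging mode, whatever has arrived
    by the predetermined time `force_at`. Without forced mode, keep re-checking
    every `second_interval` seconds (bounded by `max_checks`)."""
    for _ in range(max_checks):
        batch = current_batch()
        complete = all_models <= set(batch)
        if complete or (forced and now() >= force_at):
            return [row for model in sorted(batch) for row in batch[model]]
        sleep(second_interval)
    return None


class FakeClock:
    """Deterministic clock so the sketch can be run without real waiting."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


clock = FakeClock()
partial = {"supplier": [{"sku": 1}]}          # "mall" never delivers
merged = merge_with_forced_mode({"supplier", "mall"}, lambda: partial, forced=True,
                                force_at=90, now=clock.now,
                                second_interval=30, sleep=clock.sleep)
print(merged, clock.t)  # [{'sku': 1}] 90.0
```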

In an exemplary embodiment, the data merging unit 92 may further be configured to obtain merging rules that include a field mapping rule, to merge the original model data of fields of the same type into columns according to the field mapping rule, and to assign a standard field to each column, thereby obtaining the first intermediate data.
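
Column merging with a field mapping rule amounts to renaming each model's columns onto shared standard fields. This sketch assumes a hypothetical mapping format (model name to a source-field/standard-field dict), drops unmapped fields, and keeps a `source_model` marker that later rules such as model priority could use.

```python
from typing import Dict, List

# Hypothetical field mapping rule: model -> {source field: standard field}
FieldMapping = Dict[str, Dict[str, str]]


def map_to_standard_fields(raw: Dict[str, List[dict]],
                           mapping: FieldMapping) -> List[dict]:
    """Column-merge same-type fields from different models under standard names.

    Fields without a mapping are dropped; the source model is kept so that later
    rules (e.g. model priority) can still tell the rows apart.
    """
    merged = []
    for model, rows in raw.items():
        rename = mapping.get(model, {})
        for row in rows:
            std = {rename[f]: v for f, v in row.items() if f in rename}
            std["source_model"] = model
            merged.append(std)
    return merged


raw = {
    "supplier": [{"item_id": 1, "cost": 50}],
    "mall": [{"skuId": 1, "salePrice": 55, "uiColor": "red"}],
}
mapping = {
    "supplier": {"item_id": "sku", "cost": "price"},
    "mall": {"skuId": "sku", "salePrice": "price"},
}
first = map_to_standard_fields(raw, mapping)
print(first)
```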

In an exemplary embodiment, the merging rules may further include at least one of a field screening rule, a model priority rule, a format conversion rule, a reset-field calculation rule, and a filter condition. The data merging unit may further process the original model data that has been column-merged and assigned standard fields by at least one of the following steps to obtain the first intermediate data: removing the fields specified by the field screening rule; according to the model priority rule, when several original models provide different data for the same type of field under the same primary index, keeping the data of the original model with the highest priority; according to the format conversion rule, converting the data of each standard field into that field's customized format; according to the reset-field calculation rule, recalculating the data of the reset fields; and removing original model data that meets the filter condition.
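
The optional merging rules can be chained after field mapping. The sketch below assumes rows already carry standard field names plus a `source_model` marker; the rule representations (a set of screened fields, per-field converter functions, per-field calculation functions, and a filter predicate) are illustrative choices. Letting a lower-priority model fill a field that the higher-priority model leaves empty is also an assumption beyond the text.

```python
from typing import Callable, Dict, List, Set


def apply_merge_rules(rows: List[dict], primary_index: str, screened: Set[str],
                      priority: List[str], formats: Dict[str, Callable],
                      reset_fields: Dict[str, Callable[[dict], object]],
                      is_filtered: Callable[[dict], bool]) -> List[dict]:
    """Apply the optional merging rules to rows already mapped to standard fields.

    Each row carries a `source_model` key naming the original model it came from.
    """
    rank = {model: i for i, model in enumerate(priority)}   # 0 = highest priority
    merged: Dict[object, dict] = {}
    # Model priority rule: visit low-priority rows first so that higher-priority
    # values for the same primary index overwrite them.
    for row in sorted(rows, key=lambda r: rank[r["source_model"]], reverse=True):
        record = merged.setdefault(row[primary_index], {})
        for field, value in row.items():
            if field == "source_model" or field in screened:  # field screening rule
                continue
            if value is not None:
                record[field] = value
    result = []
    for record in merged.values():
        for field, convert in formats.items():               # format conversion rule
            if field in record:
                record[field] = convert(record[field])
        for field, compute in reset_fields.items():          # reset-field calculation
            record[field] = compute(record)
        if not is_filtered(record):                           # filter condition
            result.append(record)
    return result


rows = [
    {"sku": 1, "price": "50.0", "brand": None, "note": "x", "source_model": "supplier"},
    {"sku": 1, "price": "55.0", "brand": "A", "source_model": "mall"},
    {"sku": 2, "price": "0", "brand": "B", "source_model": "mall"},
]
first = apply_merge_rules(
    rows, primary_index="sku", screened={"note"}, priority=["supplier", "mall"],
    formats={"price": float},
    reset_fields={"price_cents": lambda r: round(r["price"] * 100)},
    is_filtered=lambda r: r["price"] <= 0,
)
print(first)  # [{'sku': 1, 'price': 50.0, 'brand': 'A', 'price_cents': 5000}]
```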

In an exemplary embodiment, the target fields may include a most recent active time and an extraction state. The data extraction unit may further be configured to obtain, from an application task, extraction conditions that include an extraction time, an activity period, and no repeated extraction, and, when the extraction time is reached, to extract from the first intermediate data the tuples whose most recent active time falls within the activity period and whose extraction state is "not extracted", thereby obtaining the second intermediate data.
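
A minimal sketch of this extraction, assuming the activity period is a trailing window that ends at the moment the task runs and that the extraction state is a boolean `extracted` field updated in place. Both are interpretations, and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class ExtractionCondition:
    extraction_time: datetime   # when the application task may run
    activity_period: timedelta  # assumed: trailing window ending "now"
    no_repeat: bool = True      # skip tuples that were already extracted


def extract_active(rows: List[dict], cond: ExtractionCondition,
                   now: datetime) -> List[dict]:
    """Return tuples active within the period and not yet extracted, marking
    them as extracted in the first intermediate data."""
    if now < cond.extraction_time:
        return []                                  # extraction time not reached
    window_start = now - cond.activity_period
    picked = []
    for row in rows:
        if row["last_active"] < window_start:
            continue                               # inactive in the period
        if cond.no_repeat and row["extracted"]:
            continue                               # already extracted earlier
        row["extracted"] = True                    # update the extraction state
        picked.append(row)
    return picked


users = [
    {"uid": 1, "last_active": datetime(2018, 7, 19), "extracted": False},
    {"uid": 2, "last_active": datetime(2018, 6, 1), "extracted": False},
    {"uid": 3, "last_active": datetime(2018, 7, 20), "extracted": True},
]
cond = ExtractionCondition(datetime(2018, 7, 20, 9), timedelta(days=7))
now = datetime(2018, 7, 20, 12)
second = extract_active(users, cond, now)
print([r["uid"] for r in second])          # [1]
print(extract_active(users, cond, now))    # [] (uid 1 is now marked extracted)
```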

In an exemplary embodiment, the multi-model data processing apparatus 90 may further include a shard storage unit (not shown), configured to shard the first intermediate data, once it is obtained, into a plurality of first intermediate data sub-tables according to a plurality of index fields of the first intermediate data, and to store the sub-tables in a plurality of containers respectively. The data extraction unit may further be configured to obtain a target first intermediate data sub-table from the containers according to the extraction condition and to extract from it the tuples whose target field satisfies the extraction condition, thereby obtaining the second intermediate data.
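
Sharding by index fields lets extraction open only the sub-tables that can contain matching tuples. In the sketch below the "containers" are simply entries of an in-memory dict keyed by the tuple of index-field values; a real deployment would place them in separate storage, and the field names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Containers = Dict[Tuple, List[dict]]   # index-field values -> sub-table


def shard(rows: List[dict], index_fields: Sequence[str]) -> Containers:
    """Split first intermediate data into sub-tables keyed by index-field values."""
    containers: Dict[Tuple, List[dict]] = defaultdict(list)
    for row in rows:
        containers[tuple(row[f] for f in index_fields)].append(row)
    return dict(containers)


def extract_from_shards(containers: Containers, index_fields: Sequence[str],
                        wanted: Dict[str, object],
                        condition: Callable[[dict], bool]) -> List[dict]:
    """Open only the target sub-tables, then apply the extraction condition."""
    picked = []
    for key, table in containers.items():
        index_values = dict(zip(index_fields, key))
        if all(index_values[f] == v for f, v in wanted.items()):
            picked.extend(row for row in table if condition(row))
    return picked


rows = [
    {"region": "north", "level": "gold", "uid": 1, "spend": 300},
    {"region": "north", "level": "silver", "uid": 2, "spend": 50},
    {"region": "south", "level": "gold", "uid": 3, "spend": 20},
]
containers = shard(rows, ["region", "level"])
second = extract_from_shards(containers, ["region", "level"], {"level": "gold"},
                             lambda r: r["spend"] >= 100)
print(len(containers), [r["uid"] for r in second])  # 3 [1]
```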

Exemplary embodiments of the present disclosure also provide an electronic device capable of implementing the above method.

Those skilled in the art will appreciate that the aspects of the present disclosure may be implemented as a system, a method, or a program product. Accordingly, they may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, all of which may be referred to herein as a "circuit", "module", or "system".

An electronic device 1000 according to this exemplary embodiment of the present disclosure is described below with reference to FIG. 10. The electronic device 1000 shown in FIG. 10 is only an example and should not limit the functions or scope of use of the embodiments of the present disclosure.

As shown in FIG. 10, the electronic device 1000 takes the form of a general-purpose computing device. Its components may include, but are not limited to: at least one processing unit 1010, at least one storage unit 1020, a bus 1030 connecting the system components (including the storage unit 1020 and the processing unit 1010), and a display unit 1040.

The storage unit stores program code executable by the processing unit 1010, so that the processing unit 1010 performs the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification. For example, the processing unit 1010 may perform steps S11 to S14 shown in FIG. 1, or steps S41 to S47 shown in FIG. 4.

The storage unit 1020 may include a readable medium in the form of volatile storage, such as a random access memory (RAM) 1021 and/or a cache 1022, and may further include a read-only memory (ROM) 1023.

The storage unit 1020 may also include a program/utility 1024 having a set of (at least one) program modules 1025. Such program modules 1025 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 1030 may represent one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.

The electronic device 1000 may also communicate with one or more external devices 1200 (e.g., a keyboard, a pointing device, or a Bluetooth device), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any device (e.g., a router or modem) that enables the electronic device 1000 to communicate with one or more other computing devices. Such communication may take place through an input/output (I/O) interface 1050. The electronic device 1000 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 via the bus 1030. It should be understood that, although not shown, other hardware and/or software modules may be used with the electronic device 1000, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.

From the above description, those skilled in the art will readily understand that the exemplary embodiments described here may be implemented in software, or in software combined with the necessary hardware. The technical solutions according to the embodiments of the present disclosure may therefore be embodied as a software product, stored on a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, that includes instructions causing a computing device (such as a personal computer, server, terminal device, or network device) to perform the method according to the exemplary embodiments of the present disclosure.

Exemplary embodiments of the present disclosure also provide a computer-readable storage medium storing a program product capable of implementing the method described in this specification. In some possible implementations, aspects of the present disclosure may also be implemented as a program product including program code that, when the program product runs on a terminal device, causes the terminal device to perform the steps according to the various exemplary embodiments of the present disclosure described in the "Exemplary Methods" section of this specification.

Referring to FIG. 11, a program product 1100 for implementing the above method according to an exemplary embodiment of the present disclosure is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

The program product may use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific (non-exhaustive) examples of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

A computer-readable signal medium may include a data signal, propagated in baseband or as part of a carrier wave, that carries readable program code. Such a propagated data signal may take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of these. A readable signal medium may also be any readable medium other than a readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code contained on a readable medium may be transmitted over any suitable medium, including but not limited to wireless, wireline, optical cable, or RF, or any suitable combination of the above.

Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented languages such as Java and C++ as well as conventional procedural languages such as C or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it may be connected to the user's computing device over any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computing device (for example, through the Internet using an Internet service provider).

In addition, the above figures are merely schematic illustrations of the processes included in the methods according to the exemplary embodiments of the present disclosure and are not intended to be limiting. The processes shown in these figures do not indicate or restrict their chronological order, and they may, for example, be executed synchronously or asynchronously across multiple modules.

It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. According to the exemplary embodiments of the present disclosure, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit; conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.

Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.

It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A multi-model data processing method, characterized by comprising:

obtaining original model data from a plurality of original models;

merging the original model data to obtain first intermediate data;

extracting, from the first intermediate data and according to an extraction condition for a target field of the first intermediate data, the tuples whose target field satisfies the extraction condition, to obtain second intermediate data; and

packaging the second intermediate data to obtain standard model data.

2. The method according to claim 1, wherein the obtaining original model data from a plurality of original models comprises:

periodically traversing the file names of original model data files sent by the plurality of original models to a file storage platform;

matching each file name against a download history and, if the match is unsuccessful, parsing the file name and matching the parsed file name against a preset whitelist;

if the parsed file name matches the preset whitelist, downloading the original model data file from the file storage platform; and

extracting the original model data from the downloaded original model data file.

3. The method according to claim 1, wherein merging the original model data to obtain the first intermediate data comprises:

detecting, in the current cycle, whether all the original model data of the current cycle has been acquired;

if not all the original model data of the current cycle has been acquired, detecting again, after a first interval, whether all the original model data of the current cycle has been acquired; and

if all the original model data of the current cycle has been acquired, merging the original model data of the current cycle to obtain the first intermediate data.

4. The method according to claim 3, further comprising:

if not all the original model data of the current cycle has been acquired, determining whether a model data merging task has a multi-batch mode activated;

if the multi-batch mode is not activated for the model data merging task, performing the step of detecting again, after the first interval, whether all the original model data of the current cycle has been acquired;

if the multi-batch mode is activated for the model data merging task, detecting, within any batch of the multi-batch mode, whether the original models corresponding to the original model data acquired in the current batch include all the original models;

if the original models corresponding to the original model data acquired in the current batch do not include all the original models, detecting again, after a second interval, whether they include all the original models; and

if the original models corresponding to the original model data acquired in the current batch include all the original models, merging the original model data acquired in the current batch to obtain the first intermediate data.

5. The method according to claim 4, further comprising:

after detecting that the original models corresponding to the original model data acquired in the current batch do not include all the original models, detecting whether the model data merging task has a forced merging mode activated;

if the forced merging mode is activated for the model data merging task, merging the original model data acquired so far at a predetermined time of the forced merging mode to obtain the first intermediate data; and

if the forced merging mode is not activated for the model data merging task, performing, after the second interval, the step of detecting again whether the original models corresponding to the original model data acquired in the current batch include all the original models.

6. The method according to claim 1, wherein merging the original model data to obtain the first intermediate data comprises:

obtaining a merging rule, the merging rule including a field mapping rule; and

according to the field mapping rule, column-merging the original model data of fields of the same type and assigning a standard field to each column, to obtain the first intermediate data.

7. The method according to claim 6, wherein the merging rule further comprises at least one of a field screening rule, a model priority rule, a format conversion rule, a reset-field calculation rule, and a filter condition;

after column-merging the original model data of fields of the same type and assigning a standard field to each column, the method further comprises:

processing the column-merged original model data, to which standard fields have been assigned, by at least one of the following steps:

removing the fields specified by the field screening rule;

according to the model priority rule, when a plurality of original models provide different original model data for the same type of field under the same primary index, retaining the original model data of the original model with the highest priority;

according to the format conversion rule, converting the original model data of each standard field into the customized format of that standard field;

according to the reset-field calculation rule, recalculating the original model data of the reset field; and

removing the original model data that meets the filter condition.

8. The method according to claim 1, wherein the target field comprises a most recent active time and an extraction state;

extracting, from the first intermediate data and according to the extraction condition for the target field of the first intermediate data, the tuples whose target field satisfies the extraction condition, to obtain the second intermediate data, comprises:

obtaining the extraction condition from an application task, the extraction condition including an extraction time, an activity period, and non-repetitive extraction; and

when the extraction time is reached, extracting from the first intermediate data the tuples whose most recent active time falls within the activity period and whose extraction state is not extracted, to obtain the second intermediate data.

9. The method according to claim 1, further comprising:

after obtaining the first intermediate data, sharding the first intermediate data into a plurality of first intermediate data sub-tables according to a plurality of index fields of the first intermediate data, and storing the sub-tables in a plurality of containers respectively; and

obtaining, according to the extraction condition, a target first intermediate data sub-table from the plurality of containers, and extracting, from the target first intermediate data sub-table, the tuples whose target field satisfies the extraction condition, to obtain the second intermediate data.

10. A multi-model data processing system, comprising:

a rule customization module, configured to generate merging rules, extraction conditions, and packaging rules according to a rule configuration file;

a data processing module, configured to merge original model data according to the merging rules to obtain first intermediate data, extract second intermediate data from the first intermediate data according to the extraction conditions, and package the second intermediate data to obtain standard model data; and

a data storage module, configured to acquire and store the original model data, and to store the first intermediate data, the second intermediate data, and the standard model data respectively.

11. A multi-model data processing device, comprising:

an original acquisition unit, configured to acquire original model data from a plurality of original models;

a data merging unit, configured to merge the original model data to obtain first intermediate data;

a data extraction unit, configured to extract, from the first intermediate data and according to an extraction condition for a target field of the first intermediate data, the tuples whose target field satisfies the extraction condition, to obtain second intermediate data; and

a data packaging unit, configured to package the second intermediate data to obtain standard model data.

12. An electronic device, characterized by comprising:

a processor; and

a memory, configured to store executable instructions of the processor;

wherein the processor is configured to perform the method of any one of claims 1 to 9 by executing the executable instructions.

13. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method of any one of claims 1 to 9.