
CN114428608A - Task optimization method based on big data and related equipment - Google Patents


Info

Publication number
CN114428608A
Authority
CN
China
Prior art keywords
task
target
tasks
dependency
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210080035.5A
Other languages
Chinese (zh)
Inventor
邓雪昭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210080035.5A
Publication of CN114428608A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/36: Software reuse

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of the present application belong to the field of artificial intelligence and big data, and relate to a task optimization method based on big data. The method comprises: obtaining a task dependency set of each target task; classifying the target tasks to obtain a task coincidence degree set composed of target tasks; calculating the coincidence degree of any two target tasks in the task coincidence degree set; obtaining the reuse degree of each target task; and, when the reuse degree of a target task is low and its coincidence degree with another target task is high, merging the two target tasks into the same target task. The application also provides an apparatus, a computer device and a storage medium. In addition, the application relates to blockchain technology: the task dependency set, the repeated dependency set, the coincidence degree and the reuse degree can be stored in a blockchain. By screening out target task pairs with a low reuse degree and a high coincidence degree and merging or replacing those target tasks, the application reduces chimney-style (siloed) development and thereby optimizes task development.

Description

Task optimization method based on big data and related equipment

Technical field

The present application relates to the technical field of artificial intelligence and big data, and in particular to a task optimization method, apparatus, computer device and storage medium based on big data.

Background

Data assets are the most important foundation for the digital transformation of enterprises. As enterprises grow rapidly, data reaches an unprecedented scale. To meet business needs quickly, development is often accompanied by a large number of "chimneys", and the application data of different business lines is fragmented, which further increases the amount of chimney-style (siloed) development.

In the prior art, redundant tasks are sorted out manually from time to time, but when a task is taken offline it is laborious to analyze the related tasks on the same link, so there is a risk of omissions. Integrating data links is also difficult, because historically developed links often run to dozens of layers and are hard to analyze. In addition, reducing chimney-style development and improving task reuse requires strict control of the existing data system at the requirements design stage and selection of the optimal design scheme, which places high demands on requirements designers.

Therefore, chimney-style development causes a large amount of repeated computation and development in the system, which reduces R&D efficiency, wastes computing and storage resources, and increases labor and computing costs.

Summary of the invention

The purpose of the embodiments of the present application is to propose a task optimization method based on big data, so as to solve the technical problem that chimney-style development causes a large amount of repeated computation and development in the system, which wastes computing and storage resources and increases labor costs.

To solve the above technical problem, an embodiment of the present application provides a task optimization method based on big data, which adopts the following technical solution:

when an optimization task instruction is received, obtaining the task dependency set of each target task;

classifying the target tasks according to the optimization task instruction, and obtaining a task coincidence degree set composed of target tasks;

calculating the coincidence degree of any two target tasks in the task coincidence degree set;

obtaining the reuse degree of each target task in the task coincidence degree set;

if the reuse degree of a target task in the task coincidence degree set is lower than a preset reuse degree and the coincidence degree of that target task with another target task is higher than a preset coincidence degree, merging the target task into the other target task.

To solve the above technical problem, an embodiment of the present application further provides a task optimization apparatus based on big data, comprising:

a receiving module, configured to obtain the task dependency set of each target task when an optimization task instruction is received;

a classification module, configured to classify the target tasks according to the optimization task instruction and obtain a task coincidence degree set composed of target tasks;

a coincidence degree calculation module, configured to calculate the coincidence degree of any two target tasks in the task coincidence degree set;

a reuse degree calculation module, configured to obtain the reuse degree of each target task in the task coincidence degree set;

a merging module, configured to merge a target task into another target task if the reuse degree of the target task in the task coincidence degree set is lower than a preset reuse degree and the coincidence degree of the target task with the other target task is higher than a preset coincidence degree.

To solve the above technical problem, an embodiment of the present application further provides a computer device, comprising:

a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor implements the steps of the above big data-based task optimization method when executing the computer-readable instructions.

To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium:

computer-readable instructions are stored on the computer-readable storage medium, and the steps of the above big data-based task optimization method are implemented when the computer-readable instructions are executed by a processor.

Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:

Target tasks with a low reuse degree are found, and, based on the dependency coincidence degree between two target tasks, target task pairs with a low reuse degree and an excessively high coincidence degree are screened out and then merged or replaced. Subsequent requirements design preferentially builds on tables with a high reuse degree and also checks the dependency coincidence degree between a newly designed task and the existing tasks, which reduces chimney-style development and optimizes task development.

Description of drawings

To illustrate the solutions in the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;

FIG. 2 is a flowchart of an embodiment of the big data-based task optimization method according to the present application;

FIG. 3 is a schematic structural diagram of an embodiment of the big data-based task optimization apparatus according to the present application;

FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.

Detailed description

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of this application. The terms used in the specification are for the purpose of describing specific embodiments only and are not intended to limit the application. The terms "comprising" and "having", and any variations thereof, in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects rather than to describe a specific order.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. Appearances of the phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.

To help those skilled in the art better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.

As shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links or fiber optic cables.

A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 101, 102 and 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.

The server 105 may be a server that provides various services, for example a background server that supports the pages displayed on the terminal devices 101, 102 and 103.

The server may be an independent server, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, a Content Delivery Network (CDN), and big data and artificial intelligence platforms.

It should be noted that the big data-based task optimization method provided by the embodiments of the present application is generally executed by the server or a terminal device; accordingly, the big data-based task optimization apparatus is generally provided in the server or the terminal device.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers, as the implementation requires.

With continued reference to FIG. 2, a flowchart of an embodiment of the big data-based task optimization method according to the present application is shown. The method includes the following steps:

Step S201: when an optimization task instruction is received, obtaining the task dependency set of each target task.

In this embodiment, the electronic device on which the big data-based task optimization method runs (for example, the server or terminal device shown in FIG. 1) can receive the optimization instruction sent by the client through a wired or wireless connection. The wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connection methods now known or developed in the future.

In this embodiment, the optimization task instruction may be triggered manually, by a timer, or by the operating load of the system, so as to optimize how tasks are executed in the system. The optimization task instruction may contain the requirements of the optimization, such as the scope of the target tasks and the level to optimize. A task dependency means that a task a can be executed only after another task b has completed; task a then depends on task b. The task dependency set of a task consists of all upstream tasks that the task depends on. For example, if task a depends on tasks b and c, task b depends on tasks d and e, and task c depends on task f, then {b,c,d,e,f} is the set of all upstream dependencies of task a, {b,c} is the direct upstream dependency set of task a, {b} is the direct downstream dependency set of task d, and {a,b} is the set of all downstream dependencies of task d.
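The four kinds of dependency sets in the example above can be reproduced with a small graph traversal. The following is a minimal Python sketch, not the patent's implementation; the names `DIRECT_UPSTREAM`, `all_upstream` and `invert` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Direct upstream dependencies from the example in the text:
# a depends on b and c; b depends on d and e; c depends on f.
DIRECT_UPSTREAM = {
    "a": {"b", "c"},
    "b": {"d", "e"},
    "c": {"f"},
}


def all_upstream(task, direct_upstream):
    """Return every task reachable from `task` by following the given edges."""
    seen = set()
    stack = list(direct_upstream.get(task, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(direct_upstream.get(current, ()))
    return seen


def invert(direct_upstream):
    """Build the direct downstream map by reversing every dependency edge."""
    downstream = defaultdict(set)
    for task, parents in direct_upstream.items():
        for parent in parents:
            downstream[parent].add(task)
    return dict(downstream)


DIRECT_DOWNSTREAM = invert(DIRECT_UPSTREAM)

print(all_upstream("a", DIRECT_UPSTREAM))    # all upstream dependencies of a
print(DIRECT_DOWNSTREAM["d"])                # direct downstream dependencies of d
print(all_upstream("d", DIRECT_DOWNSTREAM))  # all downstream dependencies of d
```

Running the same traversal over the reversed map yields the downstream sets, so a single helper covers both directions.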

In an optional embodiment, step S201 is refined into the following steps:

Step S2011: obtaining the configuration parameters of each target task from the task configuration system;

Step S2012: obtaining the upstream dependent tasks of each target task according to the configuration parameters;

Step S2013: obtaining the task dependency set according to the upstream dependent tasks.

The configuration parameters of each target task are stored in a task configuration system, such as the linkdo task scheduling system, and are obtained by calling that system. The configuration parameters list the tasks that must complete before the target task can be executed. These tasks are the upstream dependent tasks of the target task, and the set of all upstream dependent tasks of a target task is its task dependency set.

By reading the configuration parameters from the task configuration system, all upstream dependent tasks of the target task are traced and combined into the task dependency set of the target task. Because the upstream dependent tasks are obtained from the hierarchical relationships of the target task, the task dependency set is obtained quickly and accurately.

In yet another optional embodiment, step S201 is further refined into the following steps:

Step S2014: obtaining the running logs of each target task;

Step S2015: finding the execution code of the target task in the running logs;

Step S2016: according to the source tables of the execution code of the target task, combining the upstream dependent tasks of the target task found through those source tables to obtain the task dependency set.

To reduce the operating load, only the running logs that contain the target tasks within a certain period may be used. A running log records the execution code of a target task; one target task may correspond to several running logs, and different target tasks may appear in the same running log. In this embodiment, the task dependency set of one target task is obtained before that of the next. For example, for target task a, the execution code of a is found in the running logs, the source tables of that execution code are traced, the upstream execution code is found from the source tables and mapped to upstream dependent tasks, and all upstream dependent tasks of target task a are combined to obtain its task dependency set.

In this embodiment, the running logs of the target task and the execution code recorded in them are used to trace all upstream dependent tasks of the target task, yielding the task dependency set composed of all upstream dependent tasks.
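As an illustration of tracing source tables from logged execution code, the sketch below assumes that the execution code is SQL and that a lookup from each table to the task that produces it is available. Neither assumption is stated in the text: the regular expression, the `TABLE_PRODUCER` mapping and the table and task names are all hypothetical, and a production system would more likely use a real SQL parser.

```python
import re

# Hypothetical lookup from an output table to the task that writes it.
TABLE_PRODUCER = {
    "dwd_orders": "task_b",
    "dwd_users": "task_c",
}

# Simplified pattern: a table name is whatever follows FROM or JOIN.
SOURCE_TABLE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def source_tables(sql):
    """Return the set of table names read by a SQL statement."""
    return set(SOURCE_TABLE.findall(sql))


def direct_upstream_from_code(sql, table_producer):
    """Map the source tables of the execution code to their producing tasks."""
    return {table_producer[t] for t in source_tables(sql) if t in table_producer}


logged_sql = (
    "INSERT INTO dws_sales "
    "SELECT * FROM dwd_orders o JOIN dwd_users u ON o.uid = u.id"
)
print(direct_upstream_from_code(logged_sql, TABLE_PRODUCER))
```

Repeating this lookup for each upstream task's own execution code gives the full task dependency set.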

Step S202: classifying the target tasks according to the optimization task instruction, and obtaining a task coincidence degree set composed of target tasks.

The optimization requirements in the optimization task instruction are parsed. The instruction may request classification by subject, by data level (ODS, DWD, DWS, DMD, DMS, DMI) or by clustering. The target tasks are classified according to the instruction to obtain the task coincidence degree set, that is, the set of tasks that, under the chosen classification standard, share the same subject, share the same data level, or have a similar clustering coincidence degree.

In an optional embodiment, step S202 is refined into the following steps:

Step S2021: parsing the optimization task instruction, wherein the optimization task instruction contains a requirement to classify by a target data level;

Step S2022: obtaining the target tasks whose level is the target data level, and generating the task coincidence degree set.

The requirement to classify by target data level is parsed from the optimization task instruction to determine the target data level. The data level of each target task is then obtained, the target tasks at the target data level are selected, and the task coincidence degree set is generated from them; the set thus consists of the target tasks at the target data level.

In this embodiment, classifying the target tasks by data level helps each department in the enterprise optimize its own tasks and achieve local optimization. Obtaining the task coincidence degree set through this partitioned processing makes it more efficient to find chimney-style development.
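Steps S2021 and S2022 amount to filtering the target tasks by their data level. A minimal Python sketch follows; the task records and the `layer` field name are assumptions made for illustration, not a format defined by the source.

```python
# Hypothetical task records; "layer" holds the data level of each task.
TASKS = [
    {"name": "t1", "layer": "DWD"},
    {"name": "t2", "layer": "DWS"},
    {"name": "t3", "layer": "DWD"},
]


def tasks_at_layer(tasks, target_layer):
    """Return the names of the tasks whose data level equals target_layer."""
    return [task["name"] for task in tasks if task["layer"] == target_layer]


# The task coincidence degree set for the DWD level.
print(tasks_at_layer(TASKS, "DWD"))
```

Classification by subject would follow the same pattern with a different field.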

Step S203: calculating the coincidence degree of any two target tasks in the task coincidence degree set.

The task coincidence degree set composed of target tasks is obtained according to the classification requirement, for example tasks with the same subject, the same data level, or a similar clustering coincidence degree. Target tasks in different classes are understood not to serve duplicate purposes, whereas target tasks within the same task coincidence degree set may have been developed repeatedly. To optimize the task links, the tasks that meet the same classification requirement are iteratively collected into the task coincidence degree set. Any two target tasks are then taken from the task coincidence degree set, and their coincidence degree is determined from their respective task dependency sets. The coincidence degree reflects the overlap rate between the upstream dependent tasks of the two target tasks: the higher the overlap rate, the higher the coincidence degree and the more likely it is that the tasks were developed repeatedly.

In an optional embodiment, step S203 is refined into the following steps:

Step S2031: obtaining any two target tasks in the task coincidence degree set;

Step S2032: obtaining the number of elements in the task dependency set of each of the two target tasks, and the number of elements in the union and in the intersection of the two task dependency sets;

Step S2033: calculating the coincidence degree of the two target tasks from the number of elements in each task dependency set and the number of elements in the union and in the intersection of the two task dependency sets.

Each time, two target tasks are selected from the task coincidence degree set and steps S2032 and S2033 are performed. If the task coincidence degree set contains n target tasks, the steps are performed C(n, 2) = n(n-1)/2 times. If the selected target tasks are a and b, the task dependency set A of target task a and the task dependency set B of target task b are obtained, giving the number of elements n(A) of set A and the number of elements n(B) of set B. From A and B, their intersection and union are obtained, giving the number of intersection elements n(A∩B) and the number of union elements n(A∪B).
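The pairwise enumeration and the four counts can be sketched in Python as follows. The function name `pair_statistics` and the example dependency sets are hypothetical.

```python
from itertools import combinations
from math import comb


def pair_statistics(dependency_sets):
    """Yield (a, b, n(A), n(B), n(A∩B), n(A∪B)) for each unordered task pair."""
    for a, b in combinations(sorted(dependency_sets), 2):
        set_a, set_b = dependency_sets[a], dependency_sets[b]
        yield a, b, len(set_a), len(set_b), len(set_a & set_b), len(set_a | set_b)


# Hypothetical task dependency sets for three target tasks.
deps = {
    "a": {"x", "y", "z"},
    "b": {"y", "z", "w"},
    "c": {"q"},
}

rows = list(pair_statistics(deps))
# n tasks give C(n, 2) = n(n-1)/2 pairs.
assert len(rows) == comb(len(deps), 2)
for row in rows:
    print(row)
```

Because the number of pairs grows quadratically with n, restricting the comparison to one task coincidence degree set at a time keeps the work manageable.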

The coincidence degree of target task a and target task b is calculated from n(A), n(B), n(A∩B) and n(A∪B). The coincidence degree measures the overlap rate of the upstream dependent tasks on which target tasks a and b depend. It reflects the redundancy between tasks and is an important optimization indicator.

In this embodiment, the task dependency sets of the two target tasks are obtained, and the coincidence degree is derived from the number of elements in each set and the number of elements in their intersection and union. The calculation is simple and easy to perform, and it accurately reflects the coincidence degree between the two target tasks.

Further, step S2033 uses one of the following calculation formulas:

[Formula image BDA0003485508400000091, not reproduced in this text]

or

[Formula image BDA0003485508400000092, not reproduced in this text]

or

[Formula image BDA0003485508400000093, not reproduced in this text]

where S is the coincidence degree; A is the task dependency set of the first target task, the first target task being either one of the two target tasks; B is the task dependency set of the other of the two target tasks; n(A∩B) is the number of elements in the intersection of set A and set B; n(A∪B) is the number of elements in the union of set A and set B; n(A) is the number of elements in set A; and n(B) is the number of elements in set B.
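The formula images themselves are not available in this text, so the exact expressions for S cannot be restated here. Purely as an illustration of how an overlap score can be built from the listed quantities, the sketch below computes two common set-overlap ratios: the Jaccard index n(A∩B)/n(A∪B) and a containment ratio n(A∩B)/n(A). These are assumptions chosen for the example and may differ from the patent's formulas.

```python
def jaccard(set_a, set_b):
    """Jaccard index n(A∩B) / n(A∪B); defined as 0.0 when both sets are empty."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def containment(set_a, set_b):
    """Share of A that also appears in B: n(A∩B) / n(A); 0.0 when A is empty."""
    return len(set_a & set_b) / len(set_a) if set_a else 0.0


A = {"x", "y", "z"}
B = {"y", "z"}
print(jaccard(A, B))      # symmetric overlap of A and B
print(containment(A, B))  # how much of A is covered by B
print(containment(B, A))  # how much of B is covered by A
```

The Jaccard index is symmetric, whereas containment depends on which set is taken as the reference; that asymmetry matters when one task's dependencies are a subset of the other's.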

Step S204: obtaining the reuse degree of each target task in the task coincidence degree set.

The reuse degree is the number of elements in the direct downstream dependency set of the target task. It indicates how many downstream tables the task supports, and can be used to judge the quality of the design and the degree to which the data model is shared: the higher the reuse degree, the more the task is shared and the better the design. The direct downstream dependency set is the set of tasks that directly depend on the target task. If tasks A1 and A2 can be executed only after target task A has completed, the direct downstream dependency set of target task A is {A1, A2} and the reuse degree of target task A is 2.
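The reuse degree in the example above can be computed by counting the tasks that list the target task among their direct upstream dependencies. A minimal Python sketch, with a hypothetical function name and dependency map:

```python
def reuse_degree(task, direct_upstream):
    """Count the tasks that directly depend on `task`."""
    return sum(1 for parents in direct_upstream.values() if task in parents)


# Example from the text: A1 and A2 can run only after A has completed.
DIRECT_UPSTREAM_EXAMPLE = {
    "A1": {"A"},
    "A2": {"A"},
}

print(reuse_degree("A", DIRECT_UPSTREAM_EXAMPLE))   # reuse degree of A
print(reuse_degree("A1", DIRECT_UPSTREAM_EXAMPLE))  # A1 has no downstream tasks
```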

在一可选的实施例中,步骤S204的细化步骤包括:In an optional embodiment, the refinement step of step S204 includes:

步骤S2041,获取各个所述目标任务的直接下游依赖集合;Step S2041, obtaining the direct downstream dependency sets of each of the target tasks;

步骤S2042,根据所述直接下游依赖集合的元素个数确定各个所述目标任务的复用度。Step S2042: Determine the multiplexing degree of each of the target tasks according to the number of elements in the direct downstream dependency set.

可通过调用任务配置系统中目标任务的配置参数或者任务运行日志中记录的调用该目标任务的代码,根据该代码包含的所有调用关系,通过进行任务匹配,可得到依赖目标任务的直接下游任务,该目标任务的所有直接下游任务构成直接下游依赖集合。直接下游依赖集合由目标任务的直接下游任务组成,其元素个数为直接下游任务的个数,即直接下游依赖集合的元素个数为目标任务的复用度。复用度用于衡量任务设计的优劣,体现数据模型共享性。By calling the configuration parameters of the target task in the task configuration system or the code that calls the target task recorded in the task operation log, according to all the calling relationships contained in the code, through task matching, the direct downstream tasks that depend on the target task can be obtained. All direct downstream tasks of the target task constitute the set of direct downstream dependencies. The direct downstream dependency set consists of the direct downstream tasks of the target task, and the number of elements is the number of direct downstream tasks, that is, the number of elements in the direct downstream dependency set is the reuse degree of the target task. The degree of reuse is used to measure the pros and cons of task design, reflecting the sharing of data models.

The coincidence degree between two target tasks measures how much their dependencies overlap, that is, the overlap rate between the upstream tasks they depend on, and so reflects redundancy. The reuse degree of a target task, obtained from its direct downstream dependency set, reflects how widely its data model is shared among target tasks. Screening from these two perspectives identifies target tasks with high overlap and poor sharing, which effectively reduces siloed (chimney-style) development.

Step S205: if the reuse degree of a target task in the task coincidence degree set is lower than a preset reuse degree and the coincidence degree between that target task and another target task is higher than a preset coincidence degree, merge the target task into the other target task.

The preset reuse degree and the preset coincidence degree can be set by designers according to how the system is running and can be adjusted for different types of target task; for example, the preset reuse degree may be 3 and the preset coincidence degree 90%. When the reuse degree of a target task is lower than the preset reuse degree and its coincidence degree with another target task is higher than the preset coincidence degree, the tasks that the two target tasks depend on overlap heavily: the target task duplicates the development of the other target task and is a case of siloed (chimney-style) development. To eliminate such redundant tasks and improve the overall operating efficiency of the system, the target task is merged into the other target task with which it has a high coincidence degree.
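The decision rule of step S205 can be sketched as follows, using the example thresholds from the text (preset reuse degree 3, preset coincidence degree 90%). The function name, the input dictionaries, and the tie-breaking choice of merging the task with the lower reuse degree into the other one are assumptions made for illustration; the application only states the two threshold conditions.

```python
def merge_candidates(reuse, coincidence, preset_reuse=3, preset_coincidence=0.9):
    """Return (task, other) pairs meaning 'merge task into other'.

    reuse:        task name -> reuse degree
    coincidence:  (task_a, task_b) -> coincidence degree in [0, 1]
    """
    merges = []
    for (a, b), s in coincidence.items():
        # Assumption: of the two tasks, the one with the lower reuse degree is merged.
        task, other = (a, b) if reuse[a] <= reuse[b] else (b, a)
        if reuse[task] < preset_reuse and s > preset_coincidence:
            merges.append((task, other))
    return merges


# Hypothetical values for three tasks.
reuse = {"T1": 1, "T2": 5, "T3": 0}
coincidence = {("T1", "T2"): 0.95, ("T1", "T3"): 0.50, ("T2", "T3"): 0.92}
merges = merge_candidates(reuse, coincidence)
print(merges)
```

Here T1 and T3 both fall below the preset reuse degree and each overlaps T2 by more than 90%, so both are proposed for merging into T2; the pair (T1, T3) is left alone because its coincidence degree is only 50%.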

In this embodiment, target tasks with a low reuse degree are found and, according to the coincidence degree of the dependencies between two target tasks, pairs of target tasks with a low reuse degree and an excessively high coincidence degree are screened out, and the target tasks are merged or replaced. Subsequent requirement designs preferentially build on tables with a high reuse degree and also consider the dependency coincidence degree between the newly designed task and existing tasks, which reduces siloed development and achieves the beneficial effect of optimizing task development.

It should be emphasized that, to further ensure the privacy and security of the above data, the task dependency sets, the task coincidence degree set, the coincidence degrees, and the reuse degrees may also be stored in a node of a blockchain.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and so on.

The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) is the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.

Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.

The present application can be used in numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices. The application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments in which tasks are performed by remote processing devices connected through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of these steps, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a big data-based task optimization apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.

As shown in FIG. 3, the big data-based task optimization apparatus 300 of this embodiment includes a receiving module 301, a classification module 302, a coincidence degree calculation module 303, a reuse degree calculation module 304, and a merging module 305. Specifically:

the receiving module 301 is configured to obtain the task dependency set of each target task when an optimization task instruction is received;

the classification module 302 is configured to classify the target tasks according to the optimization task instruction and obtain a task coincidence degree set composed of the target tasks;

the coincidence degree calculation module 303 is configured to calculate the coincidence degree of any two target tasks in the task coincidence degree set;

the reuse degree calculation module 304 is configured to obtain the reuse degree of each target task in the task coincidence degree set;

the merging module 305 is configured to merge a target task into another target task if the reuse degree of the target task in the task coincidence degree set is lower than a preset reuse degree and the coincidence degree between the target task and the other target task is higher than a preset coincidence degree.

In this embodiment, target tasks with a low reuse degree are found and, according to the coincidence degree of the dependencies between two target tasks, pairs of target tasks with a low reuse degree and an excessively high coincidence degree are screened out, and the target tasks are merged or replaced. Subsequent requirement designs preferentially build on tables with a high reuse degree and also consider the dependency coincidence degree between the newly designed task and existing tasks, which reduces siloed development and achieves the beneficial effect of optimizing task development.

In some optional implementations of this embodiment, the receiving module 301 includes:

a parameter acquisition subunit 3011, configured to obtain the configuration parameters of each target task from the task configuration system;

a task acquisition subunit 3012, configured to obtain the upstream dependent tasks of each target task according to the configuration parameters;

a set acquisition subunit 3013, configured to obtain the task dependency set according to the upstream dependent tasks.

In this embodiment, the configuration parameters in the task configuration system are used to trace all upstream dependent tasks of a target task, and these upstream dependent tasks are combined to obtain the task dependency set of the target task. Because the upstream dependent tasks are obtained precisely from the hierarchical relationships of the target task, the task dependency set is obtained quickly and accurately.
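Tracing upstream dependent tasks from configuration parameters can be sketched as a graph traversal. The sketch assumes each task's configuration lists its direct upstream tasks under a key named `upstream`, and that "all upstream dependent tasks" means the transitive closure; both the key and that interpretation are assumptions, as is the function name.

```python
def task_dependency_set(task, config):
    """Collect every upstream task reachable from 'task' through the configuration."""
    seen = set()
    stack = list(config.get(task, {}).get("upstream", []))
    while stack:
        up = stack.pop()
        if up in seen:
            continue  # already visited; this also guards against cyclic configurations
        seen.add(up)
        stack.extend(config.get(up, {}).get("upstream", []))
    return seen


# Hypothetical configuration: C depends on B, and B depends on A.
config = {
    "A": {"upstream": []},
    "B": {"upstream": ["A"]},
    "C": {"upstream": ["B"]},
}
deps_c = task_dependency_set("C", config)
print(deps_c)
```

An iterative traversal with a visited set is used so that deep dependency chains do not hit Python's recursion limit.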

In some optional implementations of this embodiment, the receiving module 301 further includes:

a log acquisition subunit 3014, configured to obtain the run log of each target task;

a search subunit 3015, configured to find the execution code of the target task in the run log;

an integration subunit 3016, configured to identify the source tables of the execution code of the target task and combine the upstream dependent tasks of the target task found in those source tables to obtain the task dependency set.

In this embodiment, the execution code recorded in the run log of a target task is used to trace all upstream dependent tasks of the target task, yielding a task dependency set composed of all of those upstream dependent tasks.
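One way to picture the log-based route is to scan the recorded execution code for its source tables and map each table to the task that produces it. The sketch below assumes the execution code is SQL, uses a deliberately simple regular expression rather than a real SQL parser, and relies on a hypothetical `table_to_task` lookup; none of these details are specified in the application.

```python
import re

# Matches the identifier following FROM or JOIN; a simplification, not a SQL parser.
_SOURCE_TABLE = re.compile(r"\b(?:from|join)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def dependency_set_from_log(execution_code, table_to_task):
    """Map the source tables read by the execution code to their producing tasks."""
    tables = set(_SOURCE_TABLE.findall(execution_code))
    # Tables with no known producing task (e.g. external tables) are skipped.
    return {table_to_task[t] for t in tables if t in table_to_task}


# Hypothetical execution code as it might appear in a run log.
execution_code = (
    "INSERT OVERWRITE TABLE dws_orders "
    "SELECT o.id, u.name FROM dwd_orders o JOIN dim_user u ON o.uid = u.id"
)
table_to_task = {"dwd_orders": "task_dwd_orders", "dim_user": "task_dim_user"}
deps = dependency_set_from_log(execution_code, table_to_task)
print(deps)
```

This sketch returns only the tasks behind the directly referenced source tables; tracing further upstream would require repeating the lookup for each of those tasks.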

In some optional implementations of this embodiment, the classification module 302 includes:

a parsing subunit 3021, configured to parse the optimization task instruction, wherein the optimization task instruction contains a requirement to classify by a target data level;

a set generation subunit 3022, configured to obtain the target tasks whose level is the target data level and generate the task coincidence degree set.

In this embodiment, classifying target tasks by data level helps each department in an enterprise optimize its own tasks and achieve local optimization. Obtaining the task coincidence degree set through this partitioned processing improves the efficiency of finding siloed development.
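The parse-then-filter behaviour of subunits 3021 and 3022 can be sketched as below. The instruction is modelled as a dictionary with a `target_level` key, and the level names (ODS, DWS) are typical data-warehouse layer names chosen for illustration; the application does not specify the instruction format or the level names.

```python
def build_coincidence_set(task_levels, instruction):
    """Keep only the target tasks whose data level matches the instruction."""
    target_level = instruction["target_level"]  # parsed classification requirement
    return {task for task, level in task_levels.items() if level == target_level}


# Hypothetical task catalogue: task name -> data level.
task_levels = {
    "t_ods_orders": "ODS",
    "t_dws_sales": "DWS",
    "t_dws_users": "DWS",
}
instruction = {"target_level": "DWS"}
candidates = build_coincidence_set(task_levels, instruction)
print(candidates)
```

Restricting the set this way also limits the number of task pairs whose coincidence degree has to be computed afterwards.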

In some optional implementations of this embodiment, the coincidence degree calculation module 303 includes:

a task acquisition subunit 3031, configured to obtain any two target tasks in the task coincidence degree set;

a counting subunit 3032, configured to obtain the number of elements in the task dependency set of each of the two target tasks, and the number of elements in the union and in the intersection of the two task dependency sets;

a coincidence degree calculation subunit 3033, configured to calculate the coincidence degree of any two target tasks in the task coincidence degree set according to the number of elements in the task dependency set of each of the two target tasks and the number of elements in the union and in the intersection of the two task dependency sets.

In this embodiment, the task dependency sets of two target tasks are obtained, and the coincidence degree is derived from the number of elements in each set, in their intersection, and in their union. The calculation is simple and easy to carry out, and it accurately reflects the coincidence degree between the two target tasks.
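The quantities gathered by the counting subunit can be combined in several ways. The sketch below shows three plausible ratios built from them: intersection over union, and intersection over the size of each individual set. These specific ratios are an assumption, since the formulas themselves are not reproduced in this text, and the function and key names are hypothetical.

```python
def coincidence_degrees(a, b):
    """Compute three candidate coincidence ratios for dependency sets a and b."""
    n_inter = len(a & b)   # n(A ∩ B)
    n_union = len(a | b)   # n(A ∪ B)
    return {
        "over_union": n_inter / n_union if n_union else 0.0,
        "over_a": n_inter / len(a) if a else 0.0,
        "over_b": n_inter / len(b) if b else 0.0,
    }


# Hypothetical dependency sets sharing three of their upstream tasks.
deps_a = {"t1", "t2", "t3", "t4"}
deps_b = {"t2", "t3", "t4", "t5"}
result = coincidence_degrees(deps_a, deps_b)
print(result)
```

For these sets the intersection has 3 elements and the union has 5, giving 0.6 for the union-based ratio and 0.75 for each of the per-set ratios. Empty sets are mapped to 0.0 to avoid division by zero, which is also a choice made only for this sketch.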

In some optional implementations of this embodiment, the reuse degree calculation module 304 includes:

a reuse degree acquisition subunit 3041, configured to obtain the direct downstream dependency set of each target task;

a reuse degree calculation subunit 3042, configured to determine the reuse degree of each target task according to the number of elements in its direct downstream dependency set.

In this embodiment, the direct downstream tasks that depend on a target task can be obtained by task matching over all the calling relationships found either in the configuration parameters of the target task in the task configuration system or in the code that calls the target task as recorded in the task run logs. All direct downstream tasks of the target task make up its direct downstream dependency set. The number of elements in this set equals the number of direct downstream tasks, and that number is the reuse degree of the target task. The reuse degree is used to assess the quality of the task design and reflects how widely the data model is shared.

The coincidence degree between two target tasks measures how much their dependencies overlap, that is, the overlap rate between the upstream tasks they depend on, and so reflects redundancy. The reuse degree of a target task, obtained from its direct downstream dependency set, reflects how widely its data model is shared among target tasks. Screening from these two perspectives identifies target tasks with high overlap and poor sharing, which effectively reduces siloed development.

To solve the above technical problem, an embodiment of the present application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.

The computer device 4 includes a memory 41, a processor 42, and a network interface 43 that are communicatively connected to one another through a system bus. Note that the figure shows only the computer device 4 with components 41-43; it is not required that all of the illustrated components be implemented, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like.

The computer device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The computer device can interact with a user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.

The memory 41 includes at least one type of readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as its hard disk or internal memory. In other embodiments, the memory 41 may be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the computer device 4. The memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device. In this embodiment, the memory 41 is generally used to store the operating system and the various application software installed on the computer device 4, such as the computer-readable instructions of the big data-based task optimization method. The memory 41 can also be used to temporarily store data that has been output or is about to be output.

In some embodiments, the processor 42 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 42 is generally used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the computer-readable instructions stored in the memory 41 or to process data, for example to run the computer-readable instructions of the big data-based task optimization method.

The network interface 43 may include a wireless network interface or a wired network interface and is generally used to establish a communication connection between the computer device 4 and other electronic devices.

Target tasks with a low reuse degree are found and, according to the coincidence degree of the dependencies between two target tasks, pairs of target tasks with a low reuse degree and an excessively high coincidence degree are screened out, and the target tasks are merged or replaced. Subsequent requirement designs preferentially build on tables with a high reuse degree and also consider the dependency coincidence degree between the newly designed task and existing tasks, which reduces siloed development and achieves the beneficial effect of optimizing task development.

The present application also provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the big data-based task optimization method described above.

Target tasks with a low reuse degree are found and, according to the coincidence degree of the dependencies between two target tasks, pairs of target tasks with a low reuse degree and an excessively high coincidence degree are screened out, and the target tasks are merged or replaced. Subsequent requirement designs preferentially build on tables with a high reuse degree and also consider the dependency coincidence degree between the newly designed task and existing tasks, which reduces siloed development and achieves the beneficial effect of optimizing task development.

From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.

Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific embodiments or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (10)

1. A big data-based task optimization method, characterized by comprising the following steps:
when an optimization task instruction is received, obtaining a task dependency set of each target task;
classifying the target tasks according to the optimization task instruction to obtain a task coincidence degree set composed of the target tasks;
calculating the coincidence degree of any two target tasks in the task coincidence degree set;
obtaining the reuse degree of each target task in the task coincidence degree set;
and if the reuse degree of a target task in the task coincidence degree set is lower than a preset reuse degree and the coincidence degree between the target task and another target task is higher than a preset coincidence degree, merging the target task into the other target task.
2. The big data-based task optimization method according to claim 1, wherein the step of obtaining the task dependency set of each target task comprises:
obtaining configuration parameters of each target task from a task configuration system;
obtaining the upstream dependent tasks of each target task according to the configuration parameters;
and obtaining the task dependency set according to the upstream dependent tasks.
3. The big data-based task optimization method according to claim 1, wherein the step of obtaining the task dependency set of each target task further comprises:
obtaining the run log of each target task;
finding the execution code of the target task in the run log;
and, according to the source tables of the execution code of the target task, combining the upstream dependent tasks of the target task in the source tables to obtain the task dependency set.
4. The big data-based task optimization method according to claim 1, wherein the step of classifying the target tasks according to the optimization task instruction to obtain a task coincidence degree set composed of the target tasks comprises:
parsing the optimization task instruction, wherein the optimization task instruction contains a requirement to classify by a target data level;
and obtaining the target tasks whose level is the target data level, and generating the task coincidence degree set.
5. The big data-based task optimization method according to claim 1, wherein the step of calculating the coincidence degree of any two target tasks in the task coincidence degree set comprises:
obtaining any two target tasks in the task coincidence degree set;
obtaining the number of elements in the task dependency set of each of the two target tasks, and the number of elements in the union and in the intersection of the task dependency sets of the two target tasks;
and calculating the coincidence degree of any two target tasks in the task coincidence degree set according to the number of elements in the task dependency set of each of the two target tasks and the number of elements in the union and in the intersection of the task dependency sets of the two target tasks.
6. The big-data-based task optimization method according to claim 5, wherein the overlap ratio of any two target tasks in the task overlap ratio set is calculated according to the number of elements in the task dependency set of each of the two target tasks, the number of union elements and the number of intersection elements in the task dependency set of the two target tasks, and the calculation formula is as follows:
Figure FDA0003485508390000021
or
Figure FDA0003485508390000022
or
Figure FDA0003485508390000023
wherein S is the overlap ratio; A is the task dependency set of a first target task, the first target task being either of the two target tasks; B is the task dependency set of the other of the two target tasks; n(A ∩ B) is the number of elements in the intersection of set A and set B; n(A ∪ B) is the number of elements in the union of set A and set B; n(A) is the number of elements in set A; and n(B) is the number of elements in set B.
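The three formula images are not reproduced in this text, so their exact form is unknown. The sketch below shows three ratios that can be built from the quantities the claim names, n(A ∩ B), n(A ∪ B), n(A) and n(B): intersection over union, intersection over n(A), and intersection over n(B). These are assumptions for illustration, not a reconstruction of the claimed formulas, and the function and key names are hypothetical.

```python
def overlap_ratios(a: set[str], b: set[str]) -> dict[str, float]:
    """Candidate overlap ratios between two task dependency sets.

    Assumption: the ratios below are plausible readings of the claim,
    not the formulas shown in the patent figures.
    """
    inter = len(a & b)  # n(A ∩ B)
    union = len(a | b)  # n(A ∪ B)
    return {
        "intersection_over_union": inter / union if union else 0.0,
        "intersection_over_a": inter / len(a) if a else 0.0,
        "intersection_over_b": inter / len(b) if b else 0.0,
    }


# Hypothetical dependency sets of two target tasks.
set_a = {"t1", "t2", "t3"}
set_b = {"t2", "t3", "t4", "t5"}
scores = overlap_ratios(set_a, set_b)
```

Here the two sets share 2 elements out of 5 distinct ones, so the intersection-over-union value is 0.4, while the per-set ratios are 2/3 and 0.5.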
7. The big-data-based task optimization method according to claim 1, wherein the step of obtaining the reusability of each target task in the task overlap ratio set comprises:
acquiring a direct downstream dependency set of each target task;
and determining the reusability of each target task according to the number of elements in its direct downstream dependency set.
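As a minimal sketch of this step, the direct downstream dependency sets can be obtained by inverting the upstream task dependency sets. The sketch assumes the reusability is simply the size of the direct downstream set; the claim only says it is determined "according to" that number, so other mappings are possible. The function name is hypothetical.

```python
def reusability(upstream: dict[str, set[str]]) -> dict[str, int]:
    """Count the direct downstream tasks of every task.

    `upstream` maps each task to the set of tasks it depends on directly.
    Assumption: reusability equals the number of direct downstream tasks.
    """
    counts = {task: 0 for task in upstream}
    for deps in upstream.values():
        for dep in deps:
            # Each appearance as an upstream dependency is one direct consumer.
            counts[dep] = counts.get(dep, 0) + 1
    return counts


# Hypothetical dependency graph: b reads a, c reads a and b.
graph = {"a": set(), "b": {"a"}, "c": {"a", "b"}}
reuse = reusability(graph)
```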
8. A big data based task optimization device, wherein the big data based task optimization device comprises:
the receiving module is used for acquiring a task dependency set of each target task when receiving the task optimization instruction;
the classification module is used for classifying the target tasks according to the optimization task instruction to obtain a task overlap ratio set formed by the target tasks;
the overlap ratio calculation module is used for calculating the overlap ratio of any two target tasks in the task overlap ratio set;
the reusability calculation module is used for acquiring the reusability of each target task in the task overlap ratio set;
and the merging module is used for merging the target task into another target task if the reusability of the target task in the task overlap ratio set is lower than a preset reusability and the overlap ratio of the target task and the other target task is higher than a preset overlap ratio.
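The merging condition can be sketched as a pairwise scan over the task overlap ratio set. The sketch assumes intersection over union as the overlap ratio and treats the two preset thresholds as plain parameters; `merge_candidates`, `min_overlap` and `max_reuse` are hypothetical names, and the actual merge of execution code is outside its scope.

```python
from itertools import combinations


def merge_candidates(
    dependency_sets: dict[str, set[str]],
    reuse: dict[str, int],
    min_overlap: float,
    max_reuse: int,
) -> list[tuple[str, str, float]]:
    """Return (source, target, overlap) triples where source may be merged into target."""
    candidates = []
    for x, y in combinations(sorted(dependency_sets), 2):
        a, b = dependency_sets[x], dependency_sets[y]
        union = a | b
        # Assumption: overlap ratio is intersection over union.
        overlap = len(a & b) / len(union) if union else 0.0
        if overlap <= min_overlap:
            continue
        # Merge the task whose reusability is below the preset value into the other.
        for source, target in ((x, y), (y, x)):
            if reuse[source] < max_reuse:
                candidates.append((source, target, overlap))
                break
    return candidates


# Hypothetical inputs: t1 and t2 share most upstream tasks, t1 is rarely reused.
dep_sets = {
    "t1": {"u1", "u2", "u3"},
    "t2": {"u1", "u2", "u3", "u4"},
    "t3": {"u9"},
}
reuse_counts = {"t1": 0, "t2": 5, "t3": 0}
result = merge_candidates(dep_sets, reuse_counts, min_overlap=0.7, max_reuse=2)
```

With these inputs only the pair (t1, t2) exceeds the overlap threshold, and t1 is proposed for merging into t2 because its reusability is below the preset value.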
9. A computer device comprising a memory and a processor, the memory having computer-readable instructions stored therein, wherein the processor, when executing the computer-readable instructions, performs the steps of the big data based task optimization method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores thereon computer-readable instructions, which when executed by a processor, implement the steps of the big data based task optimization method according to any one of claims 1 to 7.
CN202210080035.5A 2022-01-24 2022-01-24 Task optimization method based on big data and related equipment Pending CN114428608A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210080035.5A CN114428608A (en) 2022-01-24 2022-01-24 Task optimization method based on big data and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210080035.5A CN114428608A (en) 2022-01-24 2022-01-24 Task optimization method based on big data and related equipment

Publications (1)

Publication Number Publication Date
CN114428608A true CN114428608A (en) 2022-05-03

Family

ID=81313657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210080035.5A Pending CN114428608A (en) 2022-01-24 2022-01-24 Task optimization method based on big data and related equipment

Country Status (1)

Country Link
CN (1) CN114428608A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140082006A1 (en) * 2012-09-14 2014-03-20 FTI Consulting Inc. Computer-Implemented System And Method For Identifying Near Duplicate Documents
CN108519881A (en) * 2018-03-17 2018-09-11 东南大学 A Component Recognition Method Based on Multi-Rule Clustering

Similar Documents

Publication Publication Date Title
WO2022007438A1 (en) Emotional voice data conversion method, apparatus, computer device, and storage medium
CN111782649A (en) Data acquisition format update method, device, computer equipment and storage medium
CN115238009A (en) Metadata management method, device and equipment based on blood vessel margin analysis and storage medium
CN114637831A (en) Data query method and related equipment based on semantic analysis
CN114281552A (en) A task scheduling method, device, equipment and medium based on directed acyclic graph
CN116860856A (en) Financial data processing method and device, computer equipment and storage medium
CN114265835A (en) Data analysis method and device based on graph mining and related equipment
CN116860941A (en) Question and answer method, device, electronic equipment and storage medium
CN119537651A (en) A data processing method, device, equipment and medium
CN114637672A (en) Automated data testing method, device, computer equipment and storage medium
CN117235236B (en) Dialogue method, dialogue device, computer equipment and storage medium
CN116842011A (en) Blood relationship analysis method, device, computer equipment and storage medium
CN114428608A (en) Task optimization method based on big data and related equipment
CN116932697A (en) A business data processing method and related equipment based on rule engine optimization
CN117217684A (en) Index data processing method and device, computer equipment and storage medium
CN117078406A (en) Customer loss early warning method and device, computer equipment and storage medium
CN116661763A (en) Front-end and back-end development management method and device, computer equipment and storage medium
CN115829768A (en) Data calculation method, device and equipment based on rule engine and storage medium
CN115730603A (en) Information extraction method, device, equipment and storage medium based on artificial intelligence
CN116450723A (en) Data extraction method, device, computer equipment and storage medium
CN115578170A (en) Method, device, equipment and storage medium for financial batch production certificates
CN115239185A (en) Service provider distribution method, service provider distribution device, computer equipment and storage medium
CN115576837A (en) Batch number making method and device, computer equipment and storage medium
CN115168472A (en) Real-time report generation method and system based on Flink
CN114328214A (en) Method and device for improving efficiency of interface test case of report software and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination