CN112364084B

CN112364084B - A visual data processing method and system with in-depth customized algorithm integration

Info

Publication number: CN112364084B
Application number: CN202011291725.2A
Authority: CN
Inventors: 陈欣; 李勇琪
Original assignee: Shenzhen Aerospace Smart City System Technology Co ltd
Current assignee: Shenzhen Aerospace Smart City System Technology Co ltd
Priority date: 2020-11-18
Filing date: 2020-11-18
Publication date: 2024-10-22
Anticipated expiration: 2040-11-18
Also published as: CN112364084A

Abstract

The present invention relates to the field of data processing, and in particular to a visual data processing method and system with in-depth customized algorithm integration. The method comprises the following steps: S1. through analysis algorithm library management, the required algorithms are packaged and uploaded, and their basic information is configured; S2. through data service model construction, the data is processed by the algorithms and the processing is orchestrated into a data service; S3. once orchestrated, the service is submitted to the data service approval module for approval; S4. once approved, the service is managed through the data service management module. The invention manages algorithms in an integrated way, makes the orchestration of data processing visual, and can generate a service from the orchestrated flow, which makes it as convenient as possible for users. It thereby addresses, simply and at low cost, the high learning cost and cumbersome processing faced by data processing personnel.

Description

A visual data processing method and system with in-depth customized algorithm integration

Technical Field

The present invention relates to the field of data processing, and in particular to a visual data processing method and system with in-depth customized algorithm integration.

Background Art

At present, with the advancement of science and technology, we have entered a data-driven era. Every organization and enterprise hopes to find value in its data, and data has become an indispensable part of this new era. However, most data processing analysts still work in one of two traditional ways. The first is writing code: processing data with custom algorithms written in languages such as Python and Java, or with the algorithms provided by big data processing tools such as Spark and Flink. The second is SQL analysis: writing SQL statements in a database or in the Hive data warehouse tool and processing data with the built-in functions. These approaches have a high learning cost, since the user must learn to write code and SQL and to operate the data processing tools; they are cumbersome, since handling similar data means copying previously written processing logic and modifying it; and they are hard to reuse, since handing a processing method over to someone else is troublesome and the recipient may not know how to use it. Overall, the high learning cost, cumbersome use, and poor reusability of current data processing techniques make them expensive in both time and money.

In data processing today, most practitioners write code or SQL statements. A typical user must learn a large number of tools, such as the Hive data warehouse, Spark offline computing, and Oozie scheduling, or else process the data by writing code, so both the learning cost and the cost of use are high. Solving these problems requires a system that integrates a variety of processing algorithms and that data processing personnel can use easily.

The existing solution is the visual ETL tool Kettle. Kettle is an open-source ETL tool of foreign origin written in pure Java. It runs on Windows, Linux, and Unix, is portable and requires no installation, and extracts data efficiently and stably. It lets the user manage data from different databases by describing what is to be done in a graphical environment. Kettle has two kinds of script file, the transformation and the job: a transformation performs the basic transformation of data, and a job controls the overall workflow. However, Kettle cannot effectively manage the algorithms it holds, and adding a new algorithm is difficult; orchestrating a data conversion flow is overly cumbersome, because both a transformation and a job must be arranged; and an orchestrated flow cannot be published as a service for convenient data integration. As a result, this solution has not yet seen large-scale application.

At present there is no effective system or technical solution in data processing that addresses the high learning cost, the poor reusability of algorithms, and the cumbersome use of algorithms. Data processing personnel must learn many software tools and master many algorithms. The data processing software currently on the market is often impractical, too cumbersome to use, and unable to integrate algorithms of different kinds effectively, so it has few users.

Summary of the Invention

The present invention provides a visual data processing method and system with in-depth customized algorithm integration, aiming to solve the current problems in data processing of high learning cost, poor algorithm reusability, and cumbersome algorithm use.

The present invention provides a visual data processing method with in-depth customized algorithm integration, comprising the following steps:

S1. Through analysis algorithm library management, package and upload the required algorithms, and configure their basic information;

S2. Through data service model construction, process the data with the algorithms and orchestrate the processing into a data service;

S3. Once the service has been orchestrated, approve it through the data service approval module;

S4. Once the service has been approved, manage it through the data service management module.

As a further improvement of the present invention, step S1 specifically includes:

Analysis algorithm library management provides an algorithm library that includes basic business calculation formulas, algorithm models, and general calculation methods, and also provides an editing entry for adding new algorithm tools. The user packages the algorithms to be used, registers them on the algorithm management page, and configures their basic information, including algorithm properties, algorithm details, and algorithm name.

As a further improvement of the present invention, the analysis algorithm library management in step S1 includes the following sub-execution modules:

S11. Algorithm list management: display the currently added algorithms as a list and manage them by category;

S12. Algorithm query: search by algorithm name and creation time;

S13. Add algorithm: the user adds an algorithm, fills in its details, and imports a local algorithm package;

S14. Modify algorithm: the user changes the algorithm details and re-uploads the algorithm package;

S15. Remove algorithm: delete algorithms that are no longer needed, with a reminder before deletion; for an algorithm that is in use, show a prompt that deletion is not allowed;

S16. Enable/disable algorithm: enable or disable an algorithm.
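The sub-execution modules S11 to S16 can be sketched as a small in-memory registry. This is only an illustrative sketch, not the patented implementation: the class names, the `used_by` field that tracks which service models reference an algorithm, and the `confirmed` flag standing in for the deletion reminder are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime


class AlgorithmInUseError(Exception):
    """Raised when removing an algorithm that a service model still uses."""


@dataclass
class Algorithm:
    name: str
    description: str
    package_path: str                  # where the uploaded package is stored
    enabled: bool = True
    created_at: datetime = field(default_factory=datetime.now)
    used_by: set = field(default_factory=set)  # hypothetical: models using it


class AlgorithmLibrary:
    def __init__(self):
        self._algorithms = {}

    def add(self, name, description, package_path):
        # S13: register a new algorithm together with its package.
        if name in self._algorithms:
            raise ValueError(f"algorithm {name!r} is already registered")
        self._algorithms[name] = Algorithm(name, description, package_path)
        return self._algorithms[name]

    def modify(self, name, description=None, package_path=None):
        # S14: change the details and/or re-upload the package.
        algo = self._algorithms[name]
        if description is not None:
            algo.description = description
        if package_path is not None:
            algo.package_path = package_path
        return algo

    def remove(self, name, confirmed=False):
        # S15: refuse to delete an occupied algorithm; otherwise require
        # an explicit confirmation (the "reminder before deletion").
        algo = self._algorithms[name]
        if algo.used_by:
            raise AlgorithmInUseError(f"{name!r} is used by {sorted(algo.used_by)}")
        if not confirmed:
            return False
        del self._algorithms[name]
        return True

    def set_enabled(self, name, enabled):
        # S16: enable or disable an algorithm.
        self._algorithms[name].enabled = enabled

    def list_all(self):
        # S11: list view, newest first.
        return sorted(self._algorithms.values(),
                      key=lambda a: a.created_at, reverse=True)

    def query(self, name_contains=""):
        # S12 (name part only): case-insensitive substring search.
        needle = name_contains.lower()
        return [a for a in self.list_all() if needle in a.name.lower()]
```

Keeping the "in use" check inside `remove` means a caller cannot bypass it by confirming the deletion, which matches the rule that occupied algorithms must not be deleted.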

As a further improvement of the present invention, step S2 specifically includes: after the user uploads an algorithm, the data is processed through data service model construction, which provides a visual page on which registered algorithms are brought in by drag and drop to process the data. Step S2 includes the following sub-steps:

S21. Drag in the data extraction operator and select the data source;

S22. Add the processing algorithm uploaded by the user, edit the configuration properties of the algorithm, and build the service model;

S23. Run and test the constructed service model, connect the data source to the processing algorithm, and save the result;

S24. Publish the connected result.

As a further improvement of the present invention, processing data with the uploaded algorithm through data service model construction in step S22 specifically includes the following sub-steps:

S22a. Model basic information registration: the registered information includes the service model name, service model description, model permission settings, applicable region, and applicable time period;

S22b. Model tag setting: use tag setting to give the data service a specified tag;

S22c. Model parameter setting: use visualization to display the model parameter configuration intuitively; parameter setting operations include adding, deleting, and modifying parameters;

S22d. Add business process: provide a data service topic library API, or form a shared API interface according to user needs, and modularize the business process by adding sub-nodes; also configure the properties of the nodes in the data processing flow.
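The information gathered in sub-steps S22a to S22d can be pictured as one record per service model. The sketch below is an assumption about how such a record might look; the field names, the `Node` type, and the `after` argument used to connect a new sub-node to an existing one are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    algorithm: str                               # name of a registered algorithm
    properties: dict = field(default_factory=dict)  # S22d: node properties


@dataclass
class ServiceModel:
    # S22a: basic information
    name: str
    description: str = ""
    permissions: list = field(default_factory=list)
    region: str = ""
    time_period: str = ""
    # S22b: tags (several may be selected)
    tags: set = field(default_factory=set)
    # S22c: model parameters
    parameters: dict = field(default_factory=dict)
    # S22d: business process as sub-nodes plus connections between them
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def set_parameter(self, key, value):
        # Covers both "add" and "modify".
        self.parameters[key] = value

    def delete_parameter(self, key):
        self.parameters.pop(key, None)

    def add_node(self, node, after=None):
        # Add a sub-node and optionally connect it to an existing node.
        if node.node_id in self.nodes:
            raise ValueError(f"duplicate node id {node.node_id!r}")
        if after is not None and after not in self.nodes:
            raise KeyError(f"unknown predecessor {after!r}")
        self.nodes[node.node_id] = node
        if after is not None:
            self.edges.append((after, node.node_id))
```

A short usage example: `ServiceModel(name="daily_count")` followed by `add_node(Node("extract", "hive_extract"))` and `add_node(Node("count", "count"), after="extract")` yields a two-node flow with one connection, where `hive_extract` and `count` are placeholder algorithm names.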

As a further improvement of the present invention, step S23 specifically includes the following sub-steps:

S23a. Service model integrity assessment: test the completeness of the service model, covering information including task allocation feedback, process sub-nodes, data sources, and the integrity and continuity of the logical rules.

S23b. Service model trial run: if the integrity assessment of the service model is satisfactory, perform a trial run that connects all business processes, process sub-nodes, algorithms, and logical rules, generates a new data form and saves it, and produces computer-readable information.
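One way to read S23a and S23b is as a connectivity check on the node graph followed by an execution pass. The sketch below assumes that reading; the patent does not specify the checks. Here a model is reduced to a dict of node functions, a list of connections, and the id of the data source node, and a node with several predecessors simply receives the output of whichever predecessor is visited first, which is a simplification.

```python
from collections import deque


def _successors(edges):
    # Map each node id to the list of node ids it feeds into.
    result = {}
    for src, dst in edges:
        result.setdefault(src, []).append(dst)
    return result


def check_integrity(nodes, edges, source):
    """S23a (sketch): return a list of problems; an empty list means OK."""
    problems = []
    if source not in nodes:
        problems.append(f"data source node {source!r} is missing")
    for src, dst in edges:
        for node_id in (src, dst):
            if node_id not in nodes:
                problems.append(f"connection ({src!r}, {dst!r}) references "
                                f"unknown node {node_id!r}")
    # Continuity: every node must be reachable from the data source.
    successors = _successors(edges)
    seen = set()
    queue = deque([source] if source in nodes else [])
    while queue:
        node_id = queue.popleft()
        if node_id in seen:
            continue
        seen.add(node_id)
        queue.extend(successors.get(node_id, []))
    for node_id in nodes:
        if node_id not in seen:
            problems.append(f"node {node_id!r} is not connected to the data source")
    return problems


def trial_run(nodes, edges, source):
    """S23b (sketch): run every node once, feeding each its predecessor's output."""
    problems = check_integrity(nodes, edges, source)
    if problems:
        raise ValueError("; ".join(problems))
    successors = _successors(edges)
    results = {source: nodes[source](None)}   # the source node takes no input
    queue = deque([source])
    while queue:
        node_id = queue.popleft()
        for nxt in successors.get(node_id, []):
            if nxt not in results:            # run each node only once
                results[nxt] = nodes[nxt](results[node_id])
                queue.append(nxt)
    return results


if __name__ == "__main__":
    demo_nodes = {"extract": lambda _: [3, 1, 2], "sort": sorted, "count": len}
    demo_edges = [("extract", "sort"), ("sort", "count")]
    print(trial_run(demo_nodes, demo_edges, "extract"))
```

Running the trial only after the integrity check passes mirrors the order given in S23a and S23b.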

As a further improvement of the present invention, step S24 specifically includes:

Submit the reviewed service model to service release management and save it as a shared service; edit the service information, including basic information, service operation strategy, and service permission configuration, and publish it.

As a further improvement of the present invention, the approval process performed by the data service approval module in step S3 specifically includes:

Display the service release approval items according to user permissions, and approve or reject each item based on the content of the user's application. When an application is rejected, the reason for the rejection is filled in.
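The approval rule described here (items shown according to permission, approve or reject, reason required on rejection) can be sketched as follows. The permission name `approve_service`, the `ApprovalRequest` record, and the three status values are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ApprovalRequest:
    service_name: str
    applicant: str
    status: Status = Status.PENDING
    reason: str = ""            # filled in only when the request is rejected


APPROVE_PERMISSION = "approve_service"   # hypothetical permission name


def visible_requests(requests, user_permissions):
    """Show pending approval items only to users allowed to approve them."""
    if APPROVE_PERMISSION not in user_permissions:
        return []
    return [r for r in requests if r.status is Status.PENDING]


def review(request, user_permissions, approve, reason=""):
    """Approve or reject a pending request; a rejection must carry a reason."""
    if APPROVE_PERMISSION not in user_permissions:
        raise PermissionError("user may not approve service releases")
    if request.status is not Status.PENDING:
        raise ValueError("request has already been reviewed")
    if approve:
        request.status = Status.APPROVED
    else:
        if not reason.strip():
            raise ValueError("a rejection must state the reason")
        request.status = Status.REJECTED
        request.reason = reason
    return request
```

Validating the reason before changing the status keeps a request from ending up rejected with no explanation attached.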

As a further improvement of the present invention, the management of the service by the data service management module in step S4 includes:

S41. Controlling the generated services, with the following sub-execution modules:

S41a. Service information preview: preview all information and authorization information related to the service, including service metadata and model metadata such as input and output parameters, service address, and call permissions;

S41b. Service start: start a suspended service and the sharing of its authorization information;

S41c. Service suspension: temporarily stop the selected service and the sharing of its authorization information, while retaining the service;

S41d. Service delisting: take the selected service and its authorization information offline, while retaining the service version;

S41e. Service deletion: delete the selected service and its authorization information;

S42. Managing the operation strategy, with the following sub-execution modules:

S42a. Scheduled operation: set the automatic run time of the service, the data source update method, and the update frequency; call the service on schedule and save the results; when a user calls the service, filter the saved result set by the parameters the user provides;

S42b. On-call operation: using the data source and data logic saved with the service, automatically extract data according to the parameters the user provides at call time, compute in real time, and return the service result.
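The service controls in S41b to S41e behave like a small state machine. The sketch below is one possible reading: the text only says that a suspended service can be started, so the assumptions that a delisted service cannot be restarted and that a newly approved service begins in the running state are labeled as such in the code.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"   # stopped temporarily, service retained
    DELISTED = "delisted"     # offline, service version retained
    DELETED = "deleted"


# Allowed transitions per action. Assumption: only a suspended service
# can be started, and a delisted service can only be deleted.
TRANSITIONS = {
    "start": {State.SUSPENDED: State.RUNNING},
    "suspend": {State.RUNNING: State.SUSPENDED},
    "delist": {State.RUNNING: State.DELISTED,
               State.SUSPENDED: State.DELISTED},
    "delete": {s: State.DELETED for s in State if s is not State.DELETED},
}


class ManagedService:
    def __init__(self, name, version, state=State.RUNNING):
        # Assumption: an approved service starts out running.
        self.name = name
        self.version = version
        self.state = state

    def apply(self, action):
        """Apply 'start', 'suspend', 'delist', or 'delete' if it is allowed."""
        try:
            self.state = TRANSITIONS[action][self.state]
        except KeyError:
            raise ValueError(
                f"cannot {action!r} a service that is {self.state.value}"
            ) from None
        return self.state
```

Expressing the rules as a table keeps them in one place, so a different policy (for example, allowing a delisted service to be relisted) is a one-line change.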

The present invention also provides a visual data processing system with in-depth customized algorithm integration, comprising an analysis algorithm library management module, a data service model construction module, a data service approval module, and a data service management module, which are connected in sequence;

The analysis algorithm library management module provides basic, general-purpose information and an editing entry for adding new algorithm tools;

The data service model construction module provides the entry for building models and maintains the models that have been built; it mainly covers data service model construction, service model run testing, and data service model release;

The data service approval module approves released models and determines whether a service may be used externally;

The data service management module controls the generated services and manages their operation strategies.

The beneficial effects of the present invention are as follows: algorithms are managed in an integrated way, the orchestration of data processing is visual, and a service can be generated from the orchestrated flow, which makes the system as convenient as possible for users. The invention thereby solves, simply and at low cost, the high learning cost and cumbersome processing faced by data processing personnel.

Brief Description of the Drawings

FIG. 1 is a flowchart of the visual data processing method of the present invention.

Detailed Description

To make the purpose, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the accompanying drawing and embodiments.

Embodiment 1:

This technical solution configures algorithms mainly through the algorithm management module and performs visual data processing on the submitted data according to the business application scenario. Technically, it adopts a B/S (browser/server) architecture with a responsive page design that adapts to mobile browsing, and it separates the front end from the back end. The front end is based on the Vue.js technology stack, and the server is developed on the Spring Boot framework, so it can run as a standalone service or be connected seamlessly to the Spring Cloud microservice governance framework. Data storage uses the distributed framework Hadoop 3.x. Storage on Hadoop 3.x, combined with Hive queries and Spark computation, makes the system easier to scale and gives good performance: as the data volume grows, the service remains available and still returns data within a tolerable time.

As shown in FIG. 1, the visual data processing method with in-depth customized algorithm integration of the present invention comprises the following steps:

Step S1 specifically includes:

Analysis algorithm library management provides an algorithm library that includes basic business calculation formulas, algorithm models, general calculation methods, and the like, and also provides an editing entry for adding new algorithm tools. The user packages the algorithms to be used, registers them on the algorithm management page, and configures basic information such as algorithm properties, algorithm details, and algorithm name, after which the algorithms are available for visual data modeling.

The analysis algorithm library management in step S1 includes the following sub-execution modules:

Step S2 specifically includes: after the user uploads an algorithm, the data is processed through data service model construction, which provides a visual page on which registered algorithms are brought in by drag and drop to process the data. Step S2 includes the following sub-steps:

S24. Publish the connected result.

Processing data with the uploaded algorithm through data service model construction in step S22 specifically includes the following sub-steps:

Step S23 specifically includes the following sub-steps:

S23a. Service model integrity assessment: test the completeness of the service model, covering information including task allocation feedback, process sub-nodes, data sources, and the integrity and continuity of the logical rules;

Step S24 specifically includes:

The approval process performed by the data service approval module in step S3 specifically includes:

Display the service release approval items according to user permissions, and approve or reject each item based on the content of the user's application. When an application is rejected, the reason for the rejection is filled in. An orchestrated data model can be released as a service only after approval by a user with the required permission, which safeguards proper use of the service.

The management of the service by the data service management module in step S4 includes:

S41e. Service deletion: delete the selected service and its authorization information.

An approved service can be managed on the data service management page. Data service management mainly controls the generated services and manages their operation strategies: services can be started and stopped freely, and the time period during which a service is available can be specified.

In the method of the present invention, analysis algorithm library management serves as the entry point for algorithm integration and allows algorithms to be managed and integrated effectively; visual flow construction makes the method convenient to use and reduces the learning cost; and data service release publishes the processed data directly as a service for users.

The present invention integrates algorithms of various kinds, supports visual orchestration of data processing, and ultimately produces data services; it is simple to use and highly practical. Algorithms are managed in an integrated and simple way, data processing is orchestrated visually, and the result is published as a service after processing, which greatly reduces the cost of use.

Embodiment 2:

The present invention provides a visual data processing system with in-depth customized algorithm integration, comprising an analysis algorithm library management module, a data service model construction module, a data service approval module, and a data service management module, which are connected in sequence.

Analysis algorithm library management module: provides basic, general-purpose business calculation formulas, algorithm models, general calculation methods, and the like, and also provides an editing entry for adding new algorithm tools. The algorithm types broadly include Hive extraction algorithms, count algorithms, Hive query algorithms, data aggregation algorithms, statistical algorithms, calculation algorithms, Spark algorithms, and the spatial fusion calculation algorithms provided by the algorithm set of the basic support platform. The specific functions are as follows:

1) Algorithm list management: display the currently added algorithms as a list and manage them by category. The list shows the algorithm name, algorithm description, creation time, current status (enabled/disabled), and so on. The list is sorted by creation time in descending order and supports paging.

2) Algorithm query: supports searching by algorithm name and creation time.

3) Add algorithm: supports adding a new algorithm, filling in its details, and importing a local algorithm package.

4) Modify algorithm: supports changing the algorithm details and re-uploading the algorithm package.

5) Remove algorithm: supports deleting algorithms that are no longer needed, with a reminder before deletion. For an algorithm that is in use, a prompt states that deletion is not allowed.

6) Enable/disable algorithm: supports enabling and disabling an algorithm. Enabling or disabling an algorithm directly affects the operation of the data service models that use it.
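The list and query behavior in items 1) and 2) (search by name and creation time, newest first, with paging) can be sketched as a single function. The record layout, a dict with `name` and `created_at` keys, and the shape of the returned page are assumptions for illustration.

```python
from datetime import datetime


def query_algorithms(algorithms, name_contains="", created_from=None,
                     created_to=None, page=1, page_size=10):
    """Filter by name and creation time, sort newest first, return one page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    needle = name_contains.lower()
    matches = [
        a for a in algorithms
        if needle in a["name"].lower()
        and (created_from is None or a["created_at"] >= created_from)
        and (created_to is None or a["created_at"] <= created_to)
    ]
    # Descending creation time, as in the list view.
    matches.sort(key=lambda a: a["created_at"], reverse=True)
    start = (page - 1) * page_size
    return {
        "total": len(matches),
        "page": page,
        "items": matches[start:start + page_size],
    }


if __name__ == "__main__":
    sample = [
        {"name": "Hive extraction", "created_at": datetime(2020, 11, 1)},
        {"name": "count", "created_at": datetime(2020, 11, 5)},
        {"name": "Hive query", "created_at": datetime(2020, 11, 9)},
    ]
    first_page = query_algorithms(sample, name_contains="hive", page_size=1)
    print(first_page["total"], [a["name"] for a in first_page["items"]])
```

Returning the total alongside the page lets a front end render page controls without a second query.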

Data service model construction module: provides the entry for building models and maintains the models that have been built; it mainly covers data service model construction, service model run testing, and data service model release.

Data service model construction includes:

1) Model basic information registration: the registered information includes the service model name, service model description, model permission settings, applicable region, applicable time period, and so on.

2) Model tag setting: use tag setting to give the data service a specific tag, which acts as a kind of descriptive information; new tags can be added and multiple tags can be selected.

3) Model parameter setting: use visualization to display the model parameter configuration intuitively; the main functions are adding, deleting, and modifying parameters.

4) Add business process: provide a data service topic library API, or form a shared API interface according to user needs, and modularize the business process by adding sub-nodes (algorithms). The properties of the nodes in the data processing flow can also be configured.

Service model run testing runs and tests a service model that has already been built, and includes the following functions:

1) Service model integrity assessment: test the completeness of the service model, covering information such as task allocation feedback, process sub-nodes, data sources, and the integrity and continuity of the logical rules.

2) Service model trial run: if the integrity assessment of the service model is satisfactory, a trial run can be performed. The trial run connects all business processes, process sub-nodes, algorithms, and logical rules, generates a new data form and saves it, and produces computer-readable information.

Data service model release: submit the reviewed service model to service release management as a shared service, then edit and publish the service's basic information, service operation strategy, service permission configuration, and other information.

Data service approval module: approving a released model is the key step in deciding whether a service may be used externally, and it requires confirmation by a user with the necessary permission. The module displays all of the user's pending and completed tasks and shows service release approval items according to user permissions. Each item is approved or rejected based on the content of the user's application. When an application is rejected, the reason for the rejection is filled in.

Data service management module: controls the generated services and manages their operation strategies. The details are as follows:

1) Service operation management:

Service information preview: preview all information and authorization information related to the service, including service metadata and model metadata such as input and output parameters, service address, and call permissions;

Service start: start a suspended service and the sharing of its authorization information;

Service suspension: temporarily stop the selected service and the sharing of its authorization information, while retaining the service;

Service delisting: take the selected service and its authorization information offline, while retaining the service version;

Service deletion: delete the selected service and its authorization information.

2) Service operation strategy management: manages the mechanism by which services are called. The specific functions are as follows:

Scheduled operation: set the automatic run time of the service, the data source update method, and the update frequency; call the service on schedule and save the results; when a user calls the service, filter the saved result set by the parameters the user provides;

On-call operation: using the data source and data logic saved with the service, automatically extract data according to the parameters the user provides at call time, compute in real time, and return the service result.
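The difference between the two operation strategies is where the computation happens: scheduled operation computes ahead of time and filters a saved result at call time, while on-call operation extracts and computes at call time. The sketch below illustrates that contrast with toy data; the class names, the `total_by_city` aggregation, and the sample rows are all hypothetical, and the actual timer or scheduler that would trigger `run_scheduled` is out of scope.

```python
ROWS = [  # hypothetical source data
    {"city": "A", "value": 1},
    {"city": "B", "value": 2},
    {"city": "A", "value": 3},
]


def total_by_city(rows):
    """Toy 'data logic': sum the values per city."""
    totals = {}
    for row in rows:
        totals[row["city"]] = totals.get(row["city"], 0) + row["value"]
    return [{"city": city, "total": total} for city, total in totals.items()]


class ScheduledService:
    """Scheduled operation: compute on a timer, save, filter on each call."""

    def __init__(self, extract, compute):
        self.extract = extract
        self.compute = compute
        self.saved = []

    def run_scheduled(self):
        # Would be triggered by an external scheduler at the configured time.
        self.saved = self.compute(self.extract())

    def call(self, **params):
        # Filter the saved result set by the caller's parameters.
        return [row for row in self.saved
                if all(row.get(k) == v for k, v in params.items())]


class OnCallService:
    """On-call operation: extract and compute in real time for each call."""

    def __init__(self, extract, compute):
        self.extract = extract
        self.compute = compute

    def call(self, **params):
        return self.compute(self.extract(**params))


def extract_all():
    return list(ROWS)


def extract_filtered(**params):
    return [row for row in ROWS
            if all(row.get(k) == v for k, v in params.items())]


if __name__ == "__main__":
    scheduled = ScheduledService(extract_all, total_by_city)
    scheduled.run_scheduled()
    on_call = OnCallService(extract_filtered, total_by_city)
    print(scheduled.call(city="A"), on_call.call(city="A"))
```

The scheduled variant answers quickly but can return stale data between runs; the on-call variant is always current but pays the extraction and computation cost on every call.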

The system of the present invention is a visual data processing tool built specifically for a range of data processing needs and services. By customizing the underlying data processing algorithms, it lets users build ETL processes without writing code, easily bring in, move, prepare, transform, and process data, and complete data modeling in an intuitive visual environment. It provides strong support for multi-source data fusion and multi-service fusion, and improves quality and efficiency when data service capabilities are offered externally. Through customized algorithms and visual data processing, it addresses the high learning cost, cumbersome use, and poor reusability found in data processing, makes work more convenient for data processing personnel, and reduces the cost of use.

The present invention can generate data services directly, which makes data integration convenient; it has a low learning cost, since visual service construction allows a processing flow to be built quickly and simply; and analysis algorithm library management allows algorithms to be managed and integrated effectively.

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention shall not be regarded as limited to this description. A person of ordinary skill in the art to which the invention belongs may make simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims

1. A visual data processing method with in-depth customized algorithm integration, characterized by comprising the following steps:

S1. Through analysis algorithm library management, package and upload the required algorithms, and configure the basic information of the algorithms;

S2. By constructing a data service model, process the data with the algorithms and arrange the processing into a data service;

S3. After the service is arranged, have the service approved by the data service approval module;

S4. After approval, use the data service management module to manage the service;

The step S2 specifically includes: after the user uploads the algorithm, the data is processed by building a data service model; the data service model provides a visualization page on which registered algorithms are introduced by drag and drop to process the data; step S2 includes the following sub-steps:

S21. Drag in the data extraction operator and select the data source;

S22. Add the processing algorithm uploaded by the user, edit the configuration properties of the algorithm, and build the service model;

S23. Run the constructed service model for testing, and save the data source and processing algorithm after connecting them;

S24. Publish the connection result;

In step S22, the uploaded algorithm is processed by constructing a data service model, which specifically includes the following sub-steps:

S22a. Registration of basic model information: Registration information includes service model name, service model description, model permission settings, applicable area scope, and applicable time period;

S22b. Model label setting: Use label setting to give the data service a specified label;

S22c. Model parameter setting: Use visualization technology to intuitively display the model parameter configuration results. Parameter setting operations include adding, deleting, and modifying parameters;

S22d. Add business process: provide a data service theme library API or form an API sharing interface according to user needs, modularize the business process by adding sub-nodes, and configure the properties of the nodes of the data processing flow.
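As an illustration only, sub-steps S21 to S23 and S22a to S22d can be sketched as a small in-memory model. The claims do not specify any data structures, so every name below (`ServiceModel`, `Node`, `keep_at_least`, `add_doubled`) and the algorithm signature are assumptions made for this sketch; in the claimed method the same composition is done by drag and drop on a visualization page rather than in code.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
# Hypothetical algorithm signature: rows in, rows out, configured by a properties dict.
Algorithm = Callable[[List[Row], Dict[str, Any]], List[Row]]


@dataclass
class Node:
    """One process sub-node: a registered algorithm plus its configured properties (S22d)."""
    algorithm: Algorithm
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ServiceModel:
    name: str                                                  # S22a: basic model information
    source: Callable[[], List[Row]]                            # S21: data extraction operator
    description: str = ""
    tags: List[str] = field(default_factory=list)              # S22b: labels
    parameters: Dict[str, Any] = field(default_factory=dict)   # S22c: parameters
    nodes: List[Node] = field(default_factory=list)            # S22d: process nodes

    def add_node(self, algorithm: Algorithm, **properties: Any) -> "ServiceModel":
        """Append an algorithm node and return the model so calls can be chained."""
        self.nodes.append(Node(algorithm, dict(properties)))
        return self

    def run(self) -> List[Row]:
        """S23: extract the data, then apply each node in order."""
        rows = self.source()
        for node in self.nodes:
            rows = node.algorithm(rows, node.properties)
        return rows


def keep_at_least(rows: List[Row], props: Dict[str, Any]) -> List[Row]:
    """Example algorithm: keep rows whose column value is >= the configured minimum."""
    return [r for r in rows if r[props["column"]] >= props["minimum"]]


def add_doubled(rows: List[Row], props: Dict[str, Any]) -> List[Row]:
    """Example algorithm: add a target column holding twice the source column."""
    return [{**r, props["target"]: 2 * r[props["column"]]} for r in rows]


model = ServiceModel(name="demo", source=lambda: [{"v": 1}, {"v": 5}], tags=["example"])
model.add_node(keep_at_least, column="v", minimum=3)
model.add_node(add_doubled, column="v", target="v2")
result = model.run()
```

Keeping each node as an algorithm paired with its own properties mirrors the claim language, in which the user edits the configuration of each added algorithm separately from the model-level parameters.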

2. The visual data processing method with in-depth customized algorithm integration according to claim 1, characterized in that the step S1 specifically comprises:

The analysis algorithm library management provides an algorithm library including basic business calculation formulas, algorithm models, and general calculation methods. It also provides an editing entry for new algorithm tools. Users can package the algorithms they need, register the algorithms on the algorithm management page, and configure basic information including algorithm properties, algorithm details, and algorithm name.

3. The visual data processing method with in-depth customized algorithm integration according to claim 2, characterized in that the analysis algorithm library management in step S1 includes the following sub-execution modules:

S11. Algorithm list management: Display the currently added algorithms in a list and manage them by category;

S12. Algorithm query: search and query based on algorithm name and creation time;

S13. Add algorithm: The user adds an algorithm, fills in the detailed information, and imports the local algorithm package;

S14. Modify algorithm: the user changes the algorithm details and re-uploads the algorithm package;

S15. Remove algorithm: delete algorithms that are no longer needed, with a reminder before deletion; an algorithm that is in use cannot be deleted, and a reminder is given to that effect;

S16. Enable/disable algorithm: enable or disable the algorithm.
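The sub-execution modules S11 to S16 can be pictured as operations on a simple registry. The sketch below is a minimal illustration under assumptions: the class and method names (`AlgorithmLibrary`, `AlgorithmRecord`, `mark_occupied`) are hypothetical, the package is represented only by a path string, and the pre-deletion reminder is modeled as a `confirmed` flag that the caller sets after prompting the user.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Set


@dataclass
class AlgorithmRecord:
    name: str
    category: str
    details: str
    package_path: str
    created_at: datetime = field(default_factory=datetime.now)
    enabled: bool = True


class AlgorithmLibrary:
    def __init__(self) -> None:
        self._records: Dict[str, AlgorithmRecord] = {}
        self._occupied: Set[str] = set()  # names referenced by some service model

    def add(self, record: AlgorithmRecord) -> None:                    # S13
        if record.name in self._records:
            raise ValueError(f"algorithm {record.name!r} already exists")
        self._records[record.name] = record

    def list_by_category(self) -> Dict[str, List[str]]:                # S11
        grouped: Dict[str, List[str]] = {}
        for rec in self._records.values():
            grouped.setdefault(rec.category, []).append(rec.name)
        return grouped

    def query(self, name_contains: str = "",
              created_after: Optional[datetime] = None) -> List[AlgorithmRecord]:  # S12
        return [r for r in self._records.values()
                if name_contains in r.name
                and (created_after is None or r.created_at >= created_after)]

    def modify(self, name: str, details: Optional[str] = None,
               package_path: Optional[str] = None) -> None:            # S14
        rec = self._records[name]
        if details is not None:
            rec.details = details
        if package_path is not None:
            rec.package_path = package_path  # stands in for re-uploading the package

    def mark_occupied(self, name: str) -> None:
        """Record that a service model uses this algorithm (hypothetical helper)."""
        self._occupied.add(name)

    def remove(self, name: str, confirmed: bool = False) -> bool:      # S15
        if name in self._occupied:
            raise RuntimeError(f"algorithm {name!r} is in use and cannot be deleted")
        if not confirmed:
            return False  # caller should show the pre-deletion reminder first
        del self._records[name]
        return True

    def set_enabled(self, name: str, enabled: bool) -> None:           # S16
        self._records[name].enabled = enabled
```

A caller would typically invoke `remove(name)` once, show the reminder when it returns `False`, and call it again with `confirmed=True` only if the user agrees.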

4. The visual data processing method with in-depth customized algorithm integration according to claim 1, characterized in that the step S23 specifically includes the following sub-steps:

S23a. Service model integrity assessment: Test the integrity of the service model, including task allocation feedback, process sub-nodes, data sources, and the integrity and continuity of logic rules;

S23b. Service model trial run: If the service model passes the integrity assessment, a trial run operation is performed to connect all business processes, process sub-nodes, algorithms, and logical rules, and generate a new data form for storage, while forming computer-readable information.
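One possible reading of S23a and S23b is a validation pass followed by a guarded execution. The sketch below assumes a dictionary layout for the model (`data_source`, `nodes`, and per-node `id`, `input`, `algorithm` keys) that is not taken from the patent, and it checks only the data source, the presence of sub-nodes, and the continuity of the node chain; task allocation feedback is not modeled.

```python
from typing import Any, Dict, List


def assess_integrity(model: Dict[str, Any]) -> List[str]:
    """S23a (partial): return a list of problems; an empty list means the model passes."""
    problems: List[str] = []
    if not callable(model.get("data_source")):
        problems.append("no data source selected")
    nodes = model.get("nodes", [])
    if not nodes:
        problems.append("no process sub-nodes")
    for node in nodes:
        if not callable(node.get("algorithm")):
            problems.append(f"node {node.get('id')} has no algorithm")
    # Continuity: every node after the first must take the previous node as its input.
    for prev, node in zip(nodes, nodes[1:]):
        if node.get("input") != prev.get("id"):
            problems.append(f"node {node.get('id')} is not connected to {prev.get('id')}")
    return problems


def trial_run(model: Dict[str, Any]) -> List[Dict[str, Any]]:
    """S23b: run only if the assessment passes, and return the new data form."""
    problems = assess_integrity(model)
    if problems:
        raise ValueError("; ".join(problems))
    rows = list(model["data_source"]())
    for node in model["nodes"]:
        rows = node["algorithm"](rows)
    return rows


good_model = {
    "data_source": lambda: [{"n": 1}, {"n": 2}, {"n": 3}],
    "nodes": [
        {"id": "filter", "input": None,
         "algorithm": lambda rows: [r for r in rows if r["n"] > 1]},
        {"id": "square", "input": "filter",
         "algorithm": lambda rows: [{**r, "sq": r["n"] ** 2} for r in rows]},
    ],
}
output = trial_run(good_model)
```

Returning the problems as a list rather than stopping at the first one lets a visualization page mark every faulty node at once.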

5. The visual data processing method with in-depth customized algorithm integration according to claim 1, characterized in that the step S24 specifically comprises:

Submit the reviewed service model to the service release management and save it as a shared service. Edit the service information including basic information, service operation strategy, and service permission configuration for release.

6. The visual data processing method with in-depth customized algorithm integration according to claim 1, characterized in that the process by which the data service approval module executes approval in step S3 specifically includes:

Display service release approval items according to user permissions, and approve or reject each item based on the content of the user's application. When an item is rejected, fill in the reason for the rejection.
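The approval rule in this claim (items shown according to permissions, and a mandatory reason on rejection) can be sketched as follows. The role-based permission check and all names (`ReleaseRequest`, `visible_requests`, `decide`) are assumptions for illustration; the patent does not state how permissions are represented.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class ReleaseRequest:
    service_name: str
    applicant: str
    approver_role: str              # role allowed to decide this request (assumed)
    status: Status = Status.PENDING
    reason: Optional[str] = None


def visible_requests(requests: Iterable[ReleaseRequest],
                     user_roles: Iterable[str]) -> List[ReleaseRequest]:
    """Show only the pending items the current user is permitted to approve."""
    roles = set(user_roles)
    return [r for r in requests
            if r.status is Status.PENDING and r.approver_role in roles]


def decide(request: ReleaseRequest, approve: bool, reason: Optional[str] = None) -> None:
    """Approve or reject a pending request; a rejection must carry a reason."""
    if request.status is not Status.PENDING:
        raise ValueError("request has already been decided")
    if approve:
        request.status = Status.APPROVED
        return
    if not reason or not reason.strip():
        raise ValueError("a rejection requires a reason")
    request.status = Status.REJECTED
    request.reason = reason.strip()
```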

7. The visual data processing method with in-depth customized algorithm integration according to claim 1, characterized in that the management of the service by the data service management module in step S4 includes:

S41. Control the generated services, specifically the following sub-execution sections:

S41a. Service information preview: preview all relevant information and authorization information of the service, including service metadata and model metadata, such as input and output parameters, service address, and call permissions;

S41b. Service start: start the suspended service and the sharing of its authorization information;

S41c. Service suspension: temporarily stop the selected service and the sharing of authorization information, and retain the service;

S41d. Service removal: Remove the selected service and related authorization information, and retain the service version;

S41e. Service deletion: Delete the selected service and related authorization information;

S42. Manage the operation strategy, specifically the following sub-execution sections:

S42a. Scheduled operation: set the automatic run time of the service, the data source update method, and the update frequency; call the service on schedule and save the run results; when the service is called, filter the saved result set according to the parameters provided by the user;

S42b. Call and run: According to the data source and data logic saved by the service, during the user call process, the service automatically extracts data and calculates and returns service results in real time based on the parameters provided by the user.
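The difference between the two operation strategies is when the computation happens: S42a computes on a schedule and filters saved results at call time, while S42b extracts and computes at call time. The sketch below illustrates that contrast together with the start and suspend controls of S41b and S41c. All names are hypothetical, the scheduler itself is not shown, parameter filtering is simplified to exact matches on column values, and service removal and deletion (S41d, S41e) are omitted.

```python
from enum import Enum
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"


def _filter(rows: List[Row], params: Dict[str, Any]) -> List[Row]:
    """Keep rows whose columns equal every user-provided parameter."""
    return [r for r in rows if all(r.get(k) == v for k, v in params.items())]


class ManagedService:
    def __init__(self, name: str,
                 source: Callable[[], List[Row]],
                 logic: Callable[[List[Row]], List[Row]]) -> None:
        self.name = name
        self.source = source            # saved data source
        self.logic = logic              # saved data logic
        self.state = State.RUNNING
        self.saved_results: Optional[List[Row]] = None

    def suspend(self) -> None:          # S41c: stop temporarily, keep the service
        self.state = State.SUSPENDED

    def start(self) -> None:            # S41b: restart a suspended service
        if self.state is not State.SUSPENDED:
            raise RuntimeError("only a suspended service can be started")
        self.state = State.RUNNING

    def _require_running(self) -> None:
        if self.state is not State.RUNNING:
            raise RuntimeError(f"service {self.name!r} is not running")

    def scheduled_run(self) -> None:    # S42a: would be invoked by a scheduler
        self._require_running()
        self.saved_results = self.logic(self.source())

    def call_saved(self, **params: Any) -> List[Row]:   # S42a: filter saved results
        self._require_running()
        if self.saved_results is None:
            raise RuntimeError("no scheduled run has completed yet")
        return _filter(self.saved_results, params)

    def call_live(self, **params: Any) -> List[Row]:    # S42b: compute in real time
        self._require_running()
        return _filter(self.logic(self.source()), params)
```

With this split, `call_saved` stays cheap but can return stale data between scheduled runs, whereas `call_live` always reflects the current data source at the cost of recomputing on every call.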

8. A visual data processing system with in-depth customized algorithm integration, characterized by comprising an analysis algorithm library management module, a data service model construction module, a data service approval module, and a data service management module; the analysis algorithm library management module, the data service model construction module, the data service approval module, and the data service management module are connected in sequence;

The analysis algorithm library management module: provides basic general information and provides an editing entry for newly added algorithm tools;

The data service model construction module: provides an entry for model construction and maintains the constructed model, mainly including data service model construction, service model operation test, and data service model release;

The data service approval module is used to approve the published model and determine whether the service can be used externally;

The data service management module controls the generated services and manages the operation strategies;

The data service model construction includes:

1) Registration of basic model information: Registration information includes service model name, service model description, model permission settings, applicable area scope, and applicable time period;

2) Model tag setting: Use tag setting to give a data service a specific tag, which serves as a kind of descriptive information; tags can be added, and multiple tags can be selected;

3) Model parameter setting: Use visualization technology to intuitively display the model parameter configuration results. The main functions include adding, deleting and modifying parameters;

4) Add business processes: Provide a data service theme library API or form an API sharing interface based on user needs, modularize the business process by adding sub-nodes (algorithms); and configure the properties of the nodes of the data processing process.