CN120611814A - Intelligent workshop production optimization method based on data-driven and multi-cloud-edge collaborative computing
- Publication number: CN120611814A
- Application number: CN202410264439.9A
- Authority: CN
- Legal status: Pending
Abstract
The invention provides a data-driven and multi-cloud-edge collaborative computing method for intelligent workshop construction. The method covers multi-protocol heterogeneous device sensing, adaptive load balancing across edge nodes, lightweight real-time edge computing, centralized cloud management and control, deployment and scheduling of industrial intelligent application services, and cloud-edge task offloading, and comprises three components: a data acquisition and transmission middleware, an edge flow processing engine, and a cloud service center. The method is characterized by real-time performance, low latency, flexibility, scalability, loose coupling, low resource occupancy, high availability, and privacy security.
Description
Technical Field
The invention relates to the field of cloud-edge collaborative computing, and in particular to an intelligent workshop production optimization method based on data-driven and multi-cloud-edge collaborative computing.
Background
Traditional production workshops suffer from low production efficiency and opaque processes due to the limitations of manual operation and isolated production flows, and information islands prevent massive data from being effectively perceived and applied. After intelligent transformation and upgrading of a workshop, massive heterogeneous devices and the cloud computing center are fully interconnected through ubiquitous connectivity. By constructing a heterogeneous industrial Internet, intelligent perception and control over the full life cycle of workshop production scheduling is realized; this industrial interconnection mode is called the cloud-centralized processing mode. In this mode, Internet-of-things devices collect data in real time at the network edge, the data is pushed to the cloud computing center for processing and analysis through multiple transmission protocols and low-power communication technologies, and the analysis results are used for visualization, data mining, control decisions and other purposes. However, as the number of connected industrial Internet-of-things devices keeps increasing, network bandwidth becomes limited: PB-scale data and frequent data interaction may congest the cloud network, raising the average data transmission delay to 25-50 ms or even higher, which is catastrophic for industrial sites with strict real-time requirements; in severe cases, some important data may be lost.
To address these problems, a collaborative interaction mechanism under a new system architecture is studied, and the existing cloud-centralized industrial interconnection mode is optimized and upgraded to meet industrial intelligence requirements such as autonomous production control, quality control, predictive maintenance and process optimization in intelligent workshops.
Disclosure of Invention
In view of the above, the invention provides an intelligent workshop production optimization method based on data-driven and multi-cloud-edge collaborative computing. A data acquisition and transmission middleware, an edge flow processing engine and a cloud service center are developed to realize high-concurrency data acquisition and transmission, real-time edge computing and intelligent cloud analysis. Kubernetes and KubeEdge are adopted for collaborative computing between the central cloud and the edge nodes, extending containerized application orchestration and management to the edge nodes while retaining the top-level control and scheduling capability of the central cloud, thereby meeting the high-availability and high-stability requirements of Internet-of-things devices.
The technical scheme adopted by the invention for achieving the purpose is as follows:
the intelligent workshop production optimization method based on data driving and multi-cloud-edge cooperative computing comprises the following steps:
1) The data acquisition and transmission middleware acquires workshop production data, performs real-time protocol conversion for heterogeneous devices and multi-source data perception fusion, maps data from different sources to a unified format, and pushes the data to upper-layer services;
2) The edge flow processing engine performs edge computing, edge storage and edge intelligence on the data at the edge nodes based on an SQL parser;
3) The cloud service center externally provides various cloud application services in SaaS mode according to the construction requirements of intelligent workshops;
4) Based on Kubernetes and KubeEdge, a cloud-edge collaboration mechanism is constructed, a communication channel is established between cloud and edge, and the containerized application orchestration and management of the cloud service center is extended to the edge nodes.
Said step 1) comprises the steps of:
1.1) Device registration: devices are registered through the DashBoard visual interface;
1.2) Constructing a universal plug-in interface: a group of standard functions is defined for plug-in initialization, connection establishment, data reading and data writing; for each class of protocol, a specific plug-in class is defined to implement the standard functions of the universal interface; after device registration is completed, an instance of the plug-in class is created based on the factory pattern, which is responsible for connection, communication, data parsing and southbound data circulation of its specific protocol;
1.3) Constructing a star-shaped scalable topology network using the PAIR peer-to-peer communication mode of the ZeroMQ message library, with industrial device driver adapters connected southbound and data application adapters connected northbound; the southbound driver nodes use a loosely coupled architecture to install a southbound communication plug-in for each external device, create specific groups and points to communicate directly with the external devices, and forward communication data to the core message router; the northbound application nodes subscribe to the specific groups created by the southbound nodes, receive data messages from the core message router, and process or forward the data according to system requirements;
1.4) Unified management: data tags, southbound communication plug-ins, adapters and groups are managed centrally;
1.5) Constructing a unified JSON data format: multiple different communication protocols are processed for consistency, and the JSON data is routed to the northbound application nodes;
1.6) Command issuing: the core message router uses the MQTT Broker as a northbound application adapter, subscribes to designated command messages in the MQTT Broker and forwards them to ZeroMQ in a unified command format; the southbound protocol driver subscribes to the processed command messages in ZeroMQ and issues them to specific devices through the plug-in command control function.
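The group publish/subscribe flow of the star topology in steps 1.3-1.5 can be sketched as follows. This is a minimal illustrative stand-in: plain in-process queues replace ZeroMQ PAIR sockets so the sketch is self-contained, and all names (CoreRouter, publish, subscribe) are assumptions, not the middleware's actual API.

```python
import queue

# Stand-in for the core message router: southbound driver nodes publish
# device data into per-group channels; northbound application nodes
# subscribe to specific groups. In the real middleware each channel is a
# ZeroMQ PAIR socket; queue.Queue is used here so the sketch runs anywhere.

class CoreRouter:
    def __init__(self):
        self._subscribers = {}      # group name -> list of subscriber queues

    def subscribe(self, group):
        """A northbound node registers interest in one southbound group."""
        q = queue.Queue()
        self._subscribers.setdefault(group, []).append(q)
        return q

    def publish(self, group, message):
        """A southbound driver forwards device data for its group."""
        for q in self._subscribers.get(group, []):
            q.put(message)

router = CoreRouter()
inbox = router.subscribe("plc-line-1")       # northbound application node
router.publish("plc-line-1", {"tag1": 42})   # southbound driver node
received = inbox.get()
```

The loose coupling claimed in step 1.3 follows from this shape: producers and consumers only share a group name, never a direct connection.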
The device registration supports configuration plug-in registration and template batch registration, wherein:
the configuration plug-in registration completes equipment registration by selecting plug-ins supported by an equipment communication protocol, configuring connection parameters, setting groups and point positions;
the template batch registration completes batch registration of nodes by creating templates with specified plug-ins and configuration information.
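The universal plug-in interface and factory-pattern instantiation of step 1.2 can be sketched as follows. The patent implements this in C#; the Python class and method names below (ProtocolPlugin, create_plugin, the Modbus stub) are illustrative assumptions, not the actual middleware API.

```python
from abc import ABC, abstractmethod

# Sketch of step 1.2: every protocol class implements the same standard
# functions, and a factory creates the right instance after registration.

class ProtocolPlugin(ABC):
    """Standard functions every southbound protocol plug-in must provide."""
    @abstractmethod
    def init(self, params): ...
    @abstractmethod
    def connect(self): ...
    @abstractmethod
    def read(self, point): ...
    @abstractmethod
    def write(self, point, value): ...

class ModbusPlugin(ProtocolPlugin):
    """Stub plug-in; a real one would speak Modbus over the physical link."""
    def init(self, params):
        self.params = params            # baud rate, IP, port, timeouts ...
    def connect(self):
        return True                     # would open the connection here
    def read(self, point):
        return {"point": point, "value": 0}
    def write(self, point, value):
        return True

# Factory: maps a registered protocol name to its plug-in class.
PLUGIN_REGISTRY = {"modbus": ModbusPlugin}

def create_plugin(protocol, params):
    plugin = PLUGIN_REGISTRY[protocol]()
    plugin.init(params)
    return plugin

plugin = create_plugin("modbus", {"ip": "192.168.1.10", "port": 502})
```

Adding a new protocol then only requires registering one more class in the factory table, which is how the middleware's plug-in extensibility is described.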
Said step 2) comprises the steps of:
2.1 Edge calculation, namely taking output data of a data acquisition and transmission middleware as a basic data source, creating a calculation task through built-in sources, actions and functions, and submitting the calculation task through a REST API;
2.2) Edge storage: providing multiple storage modes according to data sources and data flow directions, with the local cache serving as a redundant backup when cloud storage is unavailable;
2.3 Cloud edge task allocation, namely unloading different types of tasks to the optimal node through cloud edge task allocation;
2.4) Cloud-edge consistency service: by deploying KubeEdge, a cloud-native edge computing platform, and customizing it, unified cloud-edge node management and control, unified IoT device access, reliable data transmission and edge-cloud data synchronization are realized, extending the functions of the cloud computing platform to the edge.
The step 2.2) specifically comprises the following steps:
Storing the state information, configuration information and equipment metadata of the edge nodes by adopting an embedded lightweight database SQLite;
the method comprises the steps of adopting an InfluxDB time sequence database to store atomic data of the Internet of things equipment;
And carrying out real-time edge caching on the intermediate data of the edge calculation analysis by adopting a Redis memory database.
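The three storage tiers above can be sketched as follows. Only SQLite is real here (it is embeddable from Python's standard library); the InfluxDB and Redis tiers are represented by in-memory stand-ins so the sketch stays self-contained, and all table and key names are illustrative assumptions.

```python
import sqlite3

# Tier 1: SQLite for edge-node state, configuration and device metadata.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE device_meta (node TEXT, plugin TEXT, status TEXT)")
meta.execute("INSERT INTO device_meta VALUES ('plc-1', 'modbus', 'online')")

# Tier 2: time-series store for raw IoT atomic data (stand-in for InfluxDB).
timeseries = []                       # rows of (timestamp, tag, value)
timeseries.append((1700000000, "temperature", 36.5))

# Tier 3: in-memory cache for intermediate edge-analytics results
# (stand-in for Redis), also usable as a redundant local backup when
# cloud storage is unreachable.
cache = {}
cache["window:temperature:mean"] = 36.5

row = meta.execute(
    "SELECT plugin FROM device_meta WHERE node='plc-1'").fetchone()
```

The split mirrors the access patterns named in the text: metadata is relational and rarely written, atomic device data is append-only time series, and intermediate results need sub-millisecond reads.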
Said step 2.3) comprises the steps of:
2.3.1) Constructing features from historical task data to build a labeled task allocation data set, and determining the imbalance degree of the data set samples;
2.3.2) Presetting weight coefficients for the imbalanced samples, and constructing a LightGBM lightweight gradient boosting decision tree model;
2.3.3) Determining the optimal parameter combination of the LightGBM model through multiple rounds of genetic-algorithm iteration, and determining the final inference model through multiple rounds of model training;
2.3.4) Capturing performance indexes of the edge nodes in real time; the edge flow processing engine calculates aggregate values over a time window to construct the verification set to be classified and feeds it into the inference model;
2.3.5) Taking the task classification and indexes output by the model as input parameters of the Kubernetes Custom Scheduler, and allocating the computing task to the optimal node in combination with the self-defined scheduling strategy.
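The genetic-algorithm parameter search of steps 2.3.2-2.3.3 can be sketched as follows. This is a minimal sketch under stated assumptions: the hyperparameter names mirror LightGBM's, but the fitness function is a stub standing in for cross-validated model accuracy, since actual LightGBM training is outside the scope of the sketch.

```python
import random

random.seed(0)

# Candidate LightGBM-style parameter values (illustrative search space).
SEARCH_SPACE = {
    "num_leaves":    [15, 31, 63, 127],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "max_depth":     [3, 5, 7, -1],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def fitness(params):
    # Stub: in the patent's flow this would be validation accuracy of a
    # LightGBM model trained with these parameters and class weights.
    return -abs(params["num_leaves"] - 63) - abs(params["learning_rate"] - 0.1)

def crossover(a, b):
    return {k: random.choice([a[k], b[k]]) for k in a}

def mutate(ind, rate=0.2):
    return {k: (random.choice(SEARCH_SPACE[k]) if random.random() < rate else v)
            for k, v in ind.items()}

def evolve(generations=20, pop_size=12):
    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
```

The "multiple rounds of iteration" in step 2.3.3 correspond to the generation loop; replacing the stub fitness with real training-plus-validation yields the final inference model.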
The self-defined scheduling strategy is divided into a node preselection stage and a node preference stage, wherein:
in the preselection stage, latency-sensitive tasks use network bandwidth, network delay and storage delay as preselection indexes, while computation-intensive tasks use CPU utilization, GPU utilization and memory utilization; the preselection indexes are chosen according to the task classification result, the comprehensive score of each node is computed with the time-series-correlated entropy weight method, and nodes whose scores do not meet the threshold are filtered out;
in the preference stage, all preselection indexes are fused, the comprehensive scores of the remaining nodes are calculated by weighted average, and the node with the highest score is selected as the optimal node.
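The two-stage strategy can be sketched as follows: preselection filters nodes whose composite score misses a threshold, then the preference stage ranks the survivors by weighted average and picks the best. The node metrics and weights below are illustrative assumptions (already normalized so that higher is better).

```python
def preselect(nodes, weights, threshold):
    """Preselection: keep nodes whose weighted composite score meets the threshold."""
    survivors = []
    for name, metrics in nodes.items():
        score = sum(weights[k] * metrics[k] for k in weights)
        if score >= threshold:
            survivors.append((name, score))
    return survivors

def prefer(survivors):
    """Preference stage: the highest composite score wins."""
    return max(survivors, key=lambda ns: ns[1])[0]

# Normalized metrics for a latency-sensitive task (illustrative values).
nodes = {
    "edge-1": {"bandwidth": 0.9, "net_delay": 0.8, "storage_delay": 0.7},
    "edge-2": {"bandwidth": 0.4, "net_delay": 0.5, "storage_delay": 0.6},
    "edge-3": {"bandwidth": 0.7, "net_delay": 0.9, "storage_delay": 0.8},
}
weights = {"bandwidth": 0.4, "net_delay": 0.4, "storage_delay": 0.2}

candidates = preselect(nodes, weights, threshold=0.6)   # edge-2 is filtered out
best_node = prefer(candidates)                          # -> edge-1
```

In the patent's flow the weights would come from the entropy weight method rather than being fixed by hand, and the surviving scores would feed a Kubernetes Custom Scheduler.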
The time-series-correlated entropy weight method comprises the following steps:
(1) Standardization:
Assume there are r time points, m nodes and n indexes, and let $X_{ijt}$ denote the value of the j-th index of the i-th node at the t-th time point. According to the task classification result, each index is classified as positive or negative. Positive indexes are processed as
$$X'_{ijt} = \frac{X_{ijt} - X_{\min}}{X_{\max} - X_{\min}}$$
and negative indexes as
$$X'_{ijt} = \frac{X_{\max} - X_{ijt}}{X_{\max} - X_{\min}}$$
where $X_{\max}$ and $X_{\min}$ denote the maximum and minimum values of the j-th index;
(2) Calculate the proportion $P_{ijt}$ of the j-th index of the i-th node at the t-th time point:
$$P_{ijt} = \frac{X'_{ijt}}{\sum_{t=1}^{r}\sum_{i=1}^{m} X'_{ijt}}$$
(3) Calculate the entropy value $e_j$ and difference coefficient $d_j$ of the j-th index:
$$e_j = -\frac{1}{\ln(rm)}\sum_{t=1}^{r}\sum_{i=1}^{m} P_{ijt}\ln P_{ijt}, \qquad d_j = 1 - e_j$$
where r and m are the numbers of time points and nodes, respectively;
(4) Determine the weight coefficient of the j-th index from the information-entropy redundancy:
$$w_j = \frac{d_j}{\sum_{j=1}^{n} d_j}$$
where $0 \le e_j \le 1$ and $d_j = 1 - e_j$;
(5) Calculate the comprehensive evaluation index of node i:
$$S_i = \sum_{j=1}^{n} w_j X'_{ijt}$$
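A numerical sketch of the entropy weight steps for positive indexes follows (negative indexes would use the reversed normalization). The data values are illustrative; the function follows the standardization, proportion, entropy and weight steps directly.

```python
import math

def entropy_weights(X):
    """X[t][i][j]: value of index j for node i at time point t (positive indexes)."""
    r, m, n = len(X), len(X[0]), len(X[0][0])
    # (1) min-max standardization per index j over all time points and nodes
    Xn = [[[0.0] * n for _ in range(m)] for _ in range(r)]
    for j in range(n):
        col = [X[t][i][j] for t in range(r) for i in range(m)]
        lo, hi = min(col), max(col)
        for t in range(r):
            for i in range(m):
                Xn[t][i][j] = (X[t][i][j] - lo) / (hi - lo) if hi > lo else 0.0
    # (2) proportion P_ijt, (3) entropy e_j and difference coefficient d_j
    d = []
    for j in range(n):
        total = sum(Xn[t][i][j] for t in range(r) for i in range(m))
        e = 0.0
        for t in range(r):
            for i in range(m):
                p = Xn[t][i][j] / total if total else 0.0
                if p > 0:
                    e -= p * math.log(p)
        e /= math.log(r * m)                  # normalize by ln(rm)
        d.append(1.0 - e)
    # (4) weight of index j from information-entropy redundancy
    return [dj / sum(d) for dj in d]

# r=2 time points, m=2 nodes, n=2 indexes. After normalization, index 0 is
# concentrated at the extremes (lower entropy), so it gets the larger weight.
X = [[[0.9, 0.50], [0.1, 0.52]],
     [[0.8, 0.51], [0.2, 0.53]]]
w = entropy_weights(X)
```

The returned weights sum to 1 and feed the comprehensive score of step (5) as the weighted sum of normalized index values.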
Said step 3) comprises the steps of:
3.1) The data integration layer integrates production data, computation data and cloud application data transmitted by the edge nodes to construct a layered data warehouse, deploying a central data warehouse, a logical data warehouse, a modeling data warehouse and a stream computing data warehouse. The central data warehouse adopts a multi-level architecture: the ODS layer serves as a buffer between the source systems and the data warehouse, the DWD layer stores detailed business data after cleaning, integration and conversion to provide index data for decision support and analysis, and the CTM layer realizes data integration and standardization based on general business subjects and rules;
3.2 The computing engine layer respectively designs transmission buses aiming at real-time data, offline data, object storage and cache data stored by the data integration layer, and provides different service components for the data application layer;
3.3 The data application layer is used as a cloud-side-end centralized management control center to carry out intelligent control decision and instruction issuing.
Said step 4) comprises the steps of:
4.1) The resource management module realizes state synchronization, monitoring and scaling scheduling of cloud-edge node resources based on Prometheus and KubeEdge, while achieving millisecond-level cluster response and dynamic resource adjustment;
4.2 The log analysis module fuses the ELK and KubeEdge log mechanisms to form a new mechanism suitable for cloud-edge-end log cooperation of the complex industrial site;
4.3 The security service module constructs security policies of identity authentication, communication encryption, equipment authentication and monitoring audit, each edge node is responsible for detecting the edge security state, the center cloud completes the analysis of the security state information, and the security policies are issued to the edge nodes.
The invention has the following beneficial effects and advantages:
1. Real-time performance and low latency: the data acquisition and transmission middleware and the edge flow processing engine enable the industrial Internet platform to perform real-time computation and analysis close to the data source, reducing data transmission delay and improving response speed.
2. Flexibility and scalability: the data acquisition and transmission middleware provides adapters for various industrial and Internet-of-things protocols, converts multi-source data into a unified format, supports flexible configuration and provides a more comprehensive data view. Edge flow processing adopts an SQL parser to provide real-time edge computing, with strong extensibility.
3. Loose coupling: data transmission is loosely coupled through the middleware, so modules operate without affecting one another and are easy to maintain.
4. Low resource occupancy: all cloud-edge node services are centrally managed by the central cloud and deployed as Pods, resulting in a small resource footprint. The central cloud monitors the resources and logs of each node in real time and performs dynamic load balancing and resource allocation.
5. High availability: cloud-edge collaborative computing balances the real-time requirements of latency-sensitive tasks against the computing-resource requirements of computation-intensive tasks. Performing part of the computation of latency-sensitive tasks on the edge devices further reduces the central cloud load while reducing reliance on the communication network.
6. Privacy security: some sensitive data may not be suitable for cloud storage or processing; the edge flow processing engine provides consistent storage and lightweight computing services that satisfy most edge computing and storage requirements, avoiding the privacy leakage risks caused by frequent cloud-edge communication.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a schematic diagram of data acquisition and transmission middleware;
FIG. 3 is a flow chart of an edge flow processing engine task allocation;
FIG. 4 is a diagram of a cloud service center architecture;
FIG. 5 is a flow chart of an industrial instruction issue.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
An intelligent workshop production optimization method based on data driving and multi-cloud-edge collaborative calculation comprises the following steps:
1) The self-defined adapter is used for heterogeneous equipment real-time protocol conversion and multi-source data perception fusion, and maps data of different sources to a unified format by defining a general data model and a standard field, and pushes the data to an upper layer service.
2) And the edge flow processing engine provides real-time data processing capacity on the edge nodes based on the SQL parser, and realizes edge calculation, edge storage and edge intelligence.
3) The cloud service center is oriented to the intelligent workshop construction requirements, and provides various cloud application services to the outside in a SaaS service mode, including multidimensional data warehouse construction, MES system construction, flink large-scale data flow computing service development, digital twin workshop model system architecture construction and PHM intelligent algorithm library integration.
4) Cloud-edge collaboration mechanism: containerized application orchestration and management is extended to the edge nodes based on Kubernetes and KubeEdge, a secure communication channel is established between cloud and edge, and consistent computing services between the cloud and the edge are provided through the three modules above. Global resource management and log analysis functions are provided based on Prometheus and ELK.
The step 1) specifically comprises the following steps:
Based on the data acquisition and transmission middleware, the system externally provides device registration, high-speed data acquisition, protocol conversion, data normalization, unified management, data routing and control-instruction issuing services, with ultra-low-latency processing capability. The middleware implements communication plug-ins for multiple specific protocols, supports visual device registration and configuration, and can access devices with many different communication protocols in one click by configuring connection parameters, points and group information. Its containerized lightweight deployment lets it handle the instantaneous concurrency of thousands of devices while occupying only minimal resources. The middleware converts data of different protocol types from industrial devices into unified Internet-of-things MQTT messages, normalizes them into a unified message type for access by the edge computing nodes, and transmits upstream processing results back to specific devices as commands. The middleware thus provides data support for upstream applications, performs precise reverse control of downstream devices, and realizes a full-life-cycle self-management mechanism for intelligent workshop equipment.
The step 2) is specifically as follows:
Based on the edge flow processing engine, the JSON data transmitted northbound by the data acquisition and transmission middleware is converted into streaming data, and complex computing tasks are abstracted into SQL operators, including real-time data cleaning and filtering, analysis and computation, stateful stream processing and edge data caching. The engine supports asynchronous communication and task isolation and can output computation results to multiple sinks. To avoid performance bottlenecks caused by complex computing tasks, the engine's computing feature factors are used with the LightGBM algorithm to classify latency-sensitive and computation-intensive tasks, and cloud-edge tasks are allocated with the self-defined scheduling strategy. This edge computing architecture lets devices execute part of the computing tasks locally, reducing dependence on central cloud resources and improving system response speed and real-time performance.
The step 3) is specifically as follows:
The cloud service center establishes a robust cloud computing infrastructure at the infrastructure layer, comprising basic components such as containers, storage and networking, together with artificial-intelligence and big-data components such as the Flink stream computing platform, the Kafka message queue and the PyTorch machine learning platform. Core services such as an AI model algorithm library, stream computing services, data warehouse construction and digital twin modeling are introduced at the basic capability layer to realize intelligent data processing and real-time simulation and prediction.
The step 4) is specifically as follows:
Based on Prometheus and Grafana, resource monitoring and scheduling allocation are realized, and high availability and superior performance of the system are ensured. Based on ELK log service center, zipkin call chain tracking, user-defined Kubernetes dispatcher, nacos service registration center and Apollo configuration management, comprehensive service monitoring and service management are realized, and stability and maintainability of the system are ensured.
Examples
Fig. 1 shows the overall flowchart of the intelligent workshop production optimization method based on data-driven and multi-cloud-edge collaborative computing, which comprises the following steps:
(1) Data acquisition and transmission middleware
Fig. 2 is a schematic diagram of the data acquisition and transmission middleware in this embodiment. Data of different protocol types is accessed to the core routing module of the middleware through southbound protocol adapters; the core routing module splits the raw data into different data flow groups and normalizes it into a unified data format; data is published and subscribed through ZeroMQ; and the converted data is pushed through northbound application drivers to northbound application services such as the edge flow processing engine and the MQTT Broker. The middleware mainly comprises the following functional modules:
Device registration: the data acquisition and transmission middleware performs device registration through the provided DashBoard visual interface, supporting both configuration plug-in registration and template batch registration. Configuration plug-in registration completes device registration by selecting the plug-in supported by the device's communication protocol, configuring connection parameters, and setting groups and points. Template batch registration completes batch registration of nodes by creating templates with specified plug-ins and configuration information. The connection parameters required by plug-ins of different communication protocols are determined by what the middleware needs to establish a communication connection with the device, for example Modbus (physical link, timeout period, maximum retry count, instruction retransmission interval, stop bit, check bit, baud rate, connection mode, IP address, port number, instruction interval, maximum connection time) or OPC UA (endpoint URL, user name, password, certificate, key).
Protocol conversion: the data acquisition and transmission middleware implements protocol plug-in development based on a C# function library, acquires data from designated devices through the protocol plug-ins, and provides a southbound driver interface to the core routing module. A universal plug-in interface is designed, defining a group of standard functions for plug-in initialization, connection establishment, data reading and data writing. For each class of protocol, a specific plug-in class is defined to implement the standard functions of the universal interface. After device registration is completed, an instance of the plug-in class is created based on the factory pattern, responsible for connection, communication, data parsing and southbound data circulation of its specific protocol.
Data routing, namely, a data acquisition and transmission middleware uses a lightweight message library ZeroMQ to construct a message bus, and ZeroMQ supports a large-scale system and extremely high message transmission rate through the functions of efficient message queues, asynchronous I/O communication mechanisms, load balancing, dynamic endpoint discovery and the like. A star-type expandable topology network is constructed by adopting a PAIR peer-to-peer communication mode of ZeroMQ, a message routing module is positioned at a central core position of the network, an industrial equipment driving adapter is connected in a southerly direction, and each data application adapter is connected in a northbound direction. And the southbound driving node is used as a data producer, a southbound communication plug-in is installed for each external device by adopting a loose coupling architecture, a communication driving program for accessing the external device under a specific protocol is realized, a specific group and a point location are created to directly communicate with the external device, and communication data is forwarded to a core routing module. The northbound application node acts as a data consumer, subscribes to a specific group created in the southbound node, receives data messages from the core message route, and processes or forwards the data according to system requirements or logic.
The core route module is used for receiving the south-oriented production data and forwarding the south-oriented production data to the north-oriented application node, and managing data labels, south-oriented communication plug-ins, adapters and groups. The data tag management is responsible for maintaining the data tag configuration of all equipment nodes, and the data tag stores information such as node attributes, point positions, addresses, groups and the like, so that the real-time performance and high availability of data acquisition and transmission are ensured. The southbound communication plugin management is responsible for system function expansion, supports dynamic shared object file loading and file registration to a plugin table. The adapter manages the services that the plug-in responsible for installation can provide to communicate with the designated device as intended. Group management ensures accurate routing of data information to specific recipients within each subscription node group by maintaining subscription tables for data routing.
Data normalization: the data acquisition and transmission middleware supports consistency processing of multiple communication protocols and proposes a unified JSON data format {"node": node-name, "group": group-name, "timestamp": timestamp, "values": {"tag1": tag1-value, "tag2": tag2-value}}. The JSON data is then routed to the northbound application node (MQTT is adopted here, with node-name as the MQTT topic), where the designated format finally provided as a data interface to the upper layer is formed.
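The unified JSON message and its MQTT topic can be sketched directly from the format above. Only the message construction is shown; the actual MQTT publish (via any MQTT client library) is omitted, and the helper name is an illustrative assumption.

```python
import json
import time

def build_message(node, group, tags):
    """Build one unified middleware message for a node's group of tag values."""
    return {
        "node": node,
        "group": group,
        "timestamp": int(time.time() * 1000),   # assumed: milliseconds
        "values": tags,                          # {"tag1": value, "tag2": value}
    }

msg = build_message("plc-1", "line-1", {"tag1": 36.5, "tag2": "running"})
topic = msg["node"]            # per the text, node-name is used as the MQTT topic
payload = json.dumps(msg)      # serialized form handed to the MQTT publisher
```

Because every protocol plug-in emits this same envelope, a northbound consumer needs only one parser regardless of the southbound device mix.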
Instruction issuing: a dedicated communication link for control-instruction issuing is established between the data acquisition and transmission middleware and the edge-node MQTT Broker. The core routing module uses the MQTT Broker as a northbound application adapter, subscribes to designated command messages in the MQTT Broker, and forwards them to ZeroMQ in a unified command format; the southbound protocol driver, acting as a consumer, subscribes to the processed command messages in ZeroMQ and issues them to specific devices through the plug-in command control function.
(2) Edge flow processing engine
The edge nodes use the connection module of the edge flow processing engine to integrate real-time Internet-of-things device data, edge application data and cloud MES historical data into a unified data stream. Data filtering, data cleaning and feature computation are realized by the computing module; storage schemes are provided on demand by the storage module; the algorithm module classifies computing tasks, distinguishing latency-sensitive from computation-intensive tasks, and realizes edge task allocation in combination with the scheduling strategy; and cloud-edge consistency services are realized based on KubeEdge.
Edge computing: the edge flow processing engine takes the output data of the data perception layer as its basic data source, and computing tasks are created through the built-in sources, actions and functions of the computing module; a created task includes attributes such as computing logic, sink objects and logs, and is submitted through the REST API. After a computing task is submitted, the engine continuously reads data from the data stream; the SQL parser parses, plans and optimizes the task into a series of operator flows, and finally outputs the computation result to the sink object.
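A computing-task definition of the kind described above might look as follows: a source, SQL-style computing logic, and a sink object bundled into one submission body. The field names, SQL dialect and endpoint path are illustrative assumptions, not the engine's documented schema.

```python
import json

# Hypothetical task definition: read a device stream, compute a windowed
# mean, and write the result to the Redis edge cache.
task = {
    "id": "temp-window-mean",
    "source": {"type": "mqtt", "topic": "plc-1"},
    "sql": "SELECT node, avg(temperature) AS t_mean "
           "FROM stream GROUP BY TUMBLINGWINDOW(ss, 10), node",
    "sink": {"type": "redis", "key": "window:temperature:mean"},
    "log": {"level": "info"},
}

body = json.dumps(task)
# Submission would be a single HTTP POST of `body` to the engine's REST
# API (endpoint path assumed); the network call itself is omitted here.
```

Expressing the computing logic as SQL is what lets the parser plan and optimize the task into an operator flow without custom code per task.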
1) And analyzing a standard JSON data format output by the northbound data perception layer through data filtering, extracting a key field value, and filtering different devices through a node-name field defined by the data perception layer to finally obtain a designated field value of the designated device.
2) Data cleansing defines cleansing rules to handle outliers and missing values, performs event-specific cleansing, and verifies criticality.
3) Feature computation addresses the business requirements of cloud-edge task allocation, constructing performance-index and data-index features and computing the corresponding feature values. The performance-index features include average, maximum and minimum task execution time, average CPU, GPU and memory utilization, bandwidth utilization, network packet loss rate, average network delay, and edge device load. The data-index features include average data transmission volume, data transmission time, unit data set size, mean, median, standard deviation, extrema, tiled average, and frequency-domain features such as spectral energy and spectral variance. Index values are collected in real time by deploying Prometheus on the edge, and the computing module of the edge flow processing engine completes the aggregate computation over a specific window to construct the corresponding features.
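The window aggregation used to build these features can be sketched as follows: samples of one metric captured over a time window are reduced to the aggregate values named above. The sample values are illustrative.

```python
import statistics

def window_features(samples):
    """Aggregate one metric's samples from a single time window."""
    return {
        "mean":   statistics.fmean(samples),
        "median": statistics.median(samples),
        "stdev":  statistics.stdev(samples),
        "max":    max(samples),
        "min":    min(samples),
    }

# CPU utilization samples from one window (illustrative values, percent).
cpu_window = [41.0, 44.5, 39.0, 47.5, 43.0]
features = window_features(cpu_window)
```

Each metric-window pair yields one such feature vector; concatenating them across metrics produces the verification-set rows fed to the inference model.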
Edge storage. The storage module of the edge flow processing engine provides several storage schemes according to data source and data flow direction. The embedded lightweight database SQLite stores edge node state information, configuration information and device metadata; its extremely low resource occupancy and fast processing make it suitable for concurrent access by massive devices. The InfluxDB time-series database stores the atomic data of Internet of Things devices; its sharding and index optimization strategies provide high-performance read-write operations and meet the application requirements of atomic data on edge nodes for real-time decision-making, edge inference, history backtracking and offline autonomy. The Redis in-memory database caches the intermediate data of edge computing analysis in real time, meeting the demand of application-layer services such as cloud-edge task allocation, digital twin modeling and predictive equipment maintenance for real-time computed indexes. The local cache also serves as a redundant backup when cloud storage is unavailable, improving the availability of the system.
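A minimal sketch of the metadata tier, using an in-memory SQLite database and a plain dict standing in for the Redis cache tier (table layout and cache keys are illustrative assumptions):

```python
import sqlite3

# Metadata tier: an in-memory SQLite database for node/device metadata.
meta = sqlite3.connect(":memory:")
meta.execute("CREATE TABLE device_meta (node TEXT, key TEXT, value TEXT)")
meta.execute("INSERT INTO device_meta VALUES ('edge-01', 'status', 'online')")

# Cache tier: a plain dict stands in for Redis, holding intermediate
# computed indexes produced by the edge flow processing engine.
cache = {}
cache["edge-01:temp:window_mean"] = 40.5

row = meta.execute(
    "SELECT value FROM device_meta WHERE node='edge-01' AND key='status'"
).fetchone()
print(row[0], cache["edge-01:temp:window_mean"])  # online 40.5
```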
Cloud-edge task allocation. By reasonably allocating cloud-edge tasks, different types of tasks are offloaded to the optimal nodes, so that the computing resources of the central cloud and the edge nodes are used more effectively, overall system delay is reduced and response speed is improved.
1) The specific cloud-edge task allocation data interaction flow is shown in fig. 3. A labeled task allocation data set is built from features constructed on historical task data; the imbalance of the data set samples is measured and weight coefficients for the imbalanced samples are preset. A LightGBM lightweight gradient boosting decision tree model is built through the algorithm module of the edge flow processing engine, the optimal parameter combination of the LightGBM model is determined through multiple iterations of a genetic algorithm, and the final inference model is obtained through multiple rounds of model training. Edge node performance indexes are captured in real time, and the edge flow processing engine calculates aggregate values over a time window to construct the validation set to be classified and feeds it to the inference model. The task classification output by the model, together with indexes such as priority, task load and network delay, is passed as input parameters to a Kubernetes Custom Scheduler, which distributes the computing tasks to the optimal nodes according to a custom scheduling policy.
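The genetic-algorithm parameter search can be sketched as follows; the LightGBM cross-validation fitness is replaced here by a stub function so the selection/mutation loop itself is runnable, and the search space bounds are illustrative assumptions:

```python
import random

random.seed(0)
# Hypothetical search space for two LightGBM hyperparameters.
SPACE = {"num_leaves": (8, 64), "learning_rate": (0.01, 0.3)}

def fitness(p):
    # Stub standing in for cross-validation accuracy on the task
    # allocation data set; peaks at num_leaves=31, learning_rate=0.1.
    return -abs(p["num_leaves"] - 31) - 100 * abs(p["learning_rate"] - 0.1)

def mutate(p):
    q = dict(p)
    k = random.choice(list(SPACE))
    q[k] = random.uniform(*SPACE[k])   # resample one gene within bounds
    return q

pop = [{k: random.uniform(*b) for k, b in SPACE.items()} for _ in range(10)]
for _ in range(30):                     # "multiple iterations"
    pop.sort(key=fitness, reverse=True)
    pop = pop[:5] + [mutate(random.choice(pop[:5])) for _ in range(5)]  # elitism + mutation

best = max(pop, key=fitness)
print({k: round(v, 3) for k, v in best.items()})
```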
2) The overall flow of the cloud-edge task allocation scheduling policy is divided into two stages: node preselection and node optimization. In the preselection stage, delay-sensitive tasks use network bandwidth, network delay and storage delay as preselection indexes, while computation-intensive tasks use CPU utilization, GPU utilization and memory utilization. The preselection indexes are chosen according to the task classification result, the comprehensive score of each node is calculated with a time-series entropy weight method, and the 80% of nodes with the lowest scores are filtered out. To prevent the preselection indexes of the two task types from skewing the result, leaving one type of resource abundant while another is strained, the optimization stage fuses all preselection indexes and calculates the comprehensive scores of the remaining nodes with a weighted average to select the optimal node. The specific calculation flow of the time-series entropy weight method is as follows:
Assume there are r time points, m nodes and n indexes, and let $X_{ijt}$ denote the value of the j-th index of node i at time point t.
① Normalization process
The indexes come from different levels, and their magnitudes and orders of magnitude differ significantly; only after normalization do the indexes become horizontally comparable and usable, which guarantees the accuracy of the final evaluation index.
The formula for processing a forward index is:

$X'_{ijt} = \dfrac{X_{ijt} - X_{\min}}{X_{\max} - X_{\min}}$

The formula for processing a negative index is:

$X'_{ijt} = \dfrac{X_{\max} - X_{ijt}}{X_{\max} - X_{\min}}$
where $X_{\max}$ denotes the maximum value and $X_{\min}$ the minimum value of the j-th index.
② Calculate the proportion $P_{ijt}$ of the j-th index of node i at time point t:

$P_{ijt} = \dfrac{X'_{ijt}}{\sum_{i=1}^{m}\sum_{t=1}^{r} X'_{ijt}}$
③ Calculate the entropy value $e_j$ and the difference coefficient $d_j$ of the j-th index:

$e_j = -\dfrac{1}{\ln(mr)} \sum_{i=1}^{m}\sum_{t=1}^{r} P_{ijt} \ln P_{ijt}$
④ Determine the weight coefficient of the j-th index from the information entropy redundancy:

$w_j = \dfrac{d_j}{\sum_{j=1}^{n} d_j}$
where $0 \le e_j \le 1$ and $d_j = 1 - e_j$.
The larger $d_j$ is, the greater the influence of the index in the evaluation system.
⑤ Calculate the comprehensive evaluation index

Multiplying $w_j$ by the normalized index value $X'_{ijt}$ and summing over the n indexes, averaged over the r time points, gives the comprehensive level:

$A_i = \dfrac{1}{r}\sum_{j=1}^{n}\sum_{t=1}^{r} w_j X'_{ijt}$

The value of $A_i$ lies between 0 and 1; the larger the value, the higher the node's comprehensive score.
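The five steps above can be sketched as runnable Python (standard library only); dividing the composite score by r to average over time points is an assumption made so that $A_i$ stays in [0, 1]:

```python
from math import log

def entropy_weights(data, positive):
    """Time-series entropy weight method.
    data[i][j][t] holds X_ijt for m nodes, n indexes, r time points;
    positive[j] is True for a forward index, False for a negative one."""
    m, n, r = len(data), len(data[0]), len(data[0][0])
    # step 1: normalization (forward or negative formula per index)
    norm = [[[0.0] * r for _ in range(n)] for _ in range(m)]
    for j in range(n):
        col = [data[i][j][t] for i in range(m) for t in range(r)]
        xmin, xmax = min(col), max(col)
        for i in range(m):
            for t in range(r):
                x = data[i][j][t]
                norm[i][j][t] = ((x - xmin) if positive[j] else (xmax - x)) / (xmax - xmin)
    # steps 2-4: proportions P_ijt, entropy e_j, difference d_j, weight w_j
    diffs = []
    for j in range(n):
        s = sum(norm[i][j][t] for i in range(m) for t in range(r))
        e = -sum((p := norm[i][j][t] / s) * log(p)
                 for i in range(m) for t in range(r) if norm[i][j][t] > 0) / log(m * r)
        diffs.append(1 - e)          # d_j = 1 - e_j
    total = sum(diffs)
    w = [d / total for d in diffs]   # w_j = d_j / sum(d_j)
    # step 5: composite score A_i, averaged over the r time points
    scores = [sum(w[j] * norm[i][j][t] for j in range(n) for t in range(r)) / r
              for i in range(m)]
    return w, scores

w, scores = entropy_weights([[[1.0], [4.0]], [[3.0], [2.0]]], [True, True])
print(w, scores)
```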
Cloud-edge consistency service. As the middle part of the industrial interconnection architecture of the whole intelligent workshop, the edge nodes carry a unified northbound application development interface and support heterogeneous device forms southbound. The KubeEdge cloud-native edge computing platform is extremely lightweight; within the limited resource budget of edge nodes it achieves unified cloud-edge node management and control, unified IOT device access, reliable data transmission and cross-edge-cloud data synchronization, thereby extending the functions of the cloud computing platform to the edge and meeting cloud-edge consistency requirements.
1) Unified cloud-edge node management and control. KubeEdge introduces the EdgeNode resource as an extension of the Kubernetes native Node model. EdgeNode extends the characteristics of edge nodes with information such as computing capacity, storage capacity, network conditions and geographic position, realizing integrated abstraction and modeling of edge computing nodes and central cloud nodes. KubeEdge also brings the edge nodes and central cloud nodes together under Kubernetes cluster management, realizing centralized node management and providing a unified API, command line interface and computing resource pool, thus fully supporting hybrid cloud-edge node management.
2) Unified IOT device access. KubeEdge introduces a consistent IOT device access framework at the IOT device layer; any IOT device is integrated through the same set of interfaces and specifications, improving the maintainability and expandability of device access. The framework encompasses a cloud-native IOT device model and an API edge protocol driver framework. The cloud-native IOT device model provides users with a unified device abstraction that works in conjunction with the cloud-native environment. The API edge protocol driver framework provides flexible interfaces for device access, allowing suitable communication protocols to be selected and developed as required to satisfy diversified IOT device access scenarios.
3) Reliable data transmission. Network isolation usually exists between the edge nodes and the central cloud, and the edge network environment is often accompanied by limited bandwidth, serious packet loss and excessive delay, making it difficult for Kubernetes, designed for stable networks, to run reliably in the edge environment. By introducing CloudHub and EdgeHub, KubeEdge successfully shields the edge network. The central cloud CloudHub establishes a connection with the EdgeHub of each edge node, ensuring synchronization and interoperation of global resources. EdgeHub manages all connections in a unified way through WebSocket and message encapsulation, greatly reducing communication pressure. With cloud-edge message verification, KubeEdge keeps working normally under high network delay and jitter, effectively prevents data loss when the network is unstable, and guarantees data integrity.
4) Cross-edge-cloud data synchronization mainly covers three parts: device metadata synchronization, production data synchronization and computing data synchronization. Device metadata synchronization is based on KubeEdge DeviceTwin: the metadata of IOT devices in the edge data acquisition and transmission middleware is registered with the central cloud management node, long connections are established between nodes based on the list-watch mechanism, device state changes in DeviceTwin are continuously monitored and pushed to the central cloud in real time, ensuring state consistency of registered devices between the data acquisition and transmission middleware and the central cloud. Production data synchronization covers two types: device production data and edge service production data. A lightweight communication channel between devices and services is established based on an MQTT Broker. Device production data is transmitted between the data acquisition and transmission middleware and KubeEdge EventBus through a publish-subscribe pattern, a mode called event-driven data transmission; edge service production data communicates with KubeEdge ServiceBus in real time through RESTful APIs, a mode called service-driven data transmission. EventBus and ServiceBus data are aggregated into EdgeHub, and a cloud-edge communication link is established between EdgeHub and CloudHub through the WebSocket or QUIC protocol, realizing real-time synchronization of production data.
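The event-driven publish-subscribe path can be sketched with a minimal in-process broker; the topic layout and payload are illustrative assumptions, and a real deployment would use an MQTT Broker as described above:

```python
from collections import defaultdict

class Broker:
    """Minimal in-process stand-in for an MQTT Broker: handlers
    subscribe to a topic and are called for each published message."""
    def __init__(self):
        self.subs = defaultdict(list)
    def subscribe(self, topic, handler):
        self.subs[topic].append(handler)
    def publish(self, topic, payload):
        for h in self.subs[topic]:
            h(payload)

bus = Broker()
received = []
# EventBus side: subscribe to a hypothetical device data topic
bus.subscribe("devices/cnc-01/data", received.append)
# middleware side: publish one device production data message
bus.publish("devices/cnc-01/data", {"temp": 41.5})
print(received)  # [{'temp': 41.5}]
```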
Computing data synchronization shields the heterogeneity among IOT devices, computing indexes and data types. A message bus is built with the Kafka production-consumption pattern; Kafka's decoupled architecture ensures that underlying heterogeneity does not affect the data applications of each component on the central cloud, while its persistent storage mechanism and horizontal partition scaling prevent the instantaneous mass data generated by frequent edge computing from crashing nodes, making it suitable for high-load production environments.
(3) Cloud centralized management and data application service
The cloud provides unified application management across the cloud and edge based on native Kubernetes capability, and automated container deployment, scaling and management make the scheduling and running of cloud application services more efficient. The cloud service center designs a three-layer architecture model based on the cloud-edge data collaboration mechanism, comprising a data integration layer, a computing engine layer and a data application layer, realizing cloud data storage, computation and one-stop circulation application; the detailed architecture design is shown in fig. 4.
The intelligent workshop data warehouse fully considers requirements such as underlying data quality, ETL efficiency, security and privacy, upper-layer business intelligence, decision analysis and response time, and deploys a central data warehouse, a logical data warehouse, a modeling data warehouse and a stream computing data warehouse. The central data warehouse adopts a multi-level architecture: the ODS layer serves as a buffer between the source systems and the data warehouse, with high data update frequency, meeting the service requirements of real-time query and storage; the DWD layer stores detailed business data after cleaning, integration and conversion, providing index data for decision support and analysis; the CTM layer realizes data integration and standardization based on general business topics and rules. The logical data warehouse, modeling data warehouse and stream computing data warehouse take the central data warehouse as their data source and provide data views for upper-layer application services.
The computing engine layer designs transmission buses for the real-time data, offline data, object storage and cache data held by the data integration layer, provides computation, analysis and processing service components for the data application layer, and extracts and filters valuable information from raw data. The data application layer uses the computing engine layer as its data interface to realize application services such as digital twin modeling, stream computing services and fault diagnosis.
As shown in fig. 5, the data application layer serves as the cloud-edge-end centralized management control center and is responsible for intelligent control decisions and instruction issuing. After an IOT device registers and comes online, the MES system configures an algorithm strategy for the device according to the task information and presets the model input data. When the production data in the central cloud data warehouse starts to be iteratively updated, a model in the AI algorithm library reads the relevant data in batches and calculates a control strategy, which the MES system issues manually or automatically to the edge flow processing engine of the edge node. The data acquisition and transmission middleware periodically reads and executes the instruction according to the preset rules of the device; after the IOT device completes execution, the set value is modified and the result is fed back to the central cloud MES system.
(4) Resource management module
The resource collaboration mechanism aims to solve the problems of efficient resource management and collaborative work between the cloud and the edge nodes. Based on Prometheus and KubeEdge, it realizes state synchronization, monitoring sniffing and elastic scheduling of cloud-edge node resources, achieving millisecond-level cluster response and dynamic resource adjustment.
1) State synchronization. The KubeEdge state synchronization mechanism ensures that resource changes (e.g., allocation or release of resources) are synchronized among the central cloud, edge nodes and devices; the specific workflow is as follows. (1) KubeEdge CloudCore and EdgeCore establish the connection between the central cloud and the edge nodes. EdgeCore runs workloads locally on the edge nodes, provides infrastructure resources such as computing, storage and networking for the deployment of edge platforms, applications and services, and manages local resources and containerized application services. (2) The central cloud CloudCore manages resource allocation and load balancing for the whole Kubernetes cluster and realizes cloud-edge synchronization of resource allocation, Pod state and custom resource definitions (CRDs) based on EdgeHub. (3) The edge node event-driven mechanism synchronizes node states and resource changes among edges through the publish-subscribe pattern of the data acquisition and transmission middleware, supporting IOT device resource adjustment, configuration updating and task allocation.
2) Resource sniffing. The native KubeEdge mechanism is implemented on Kubernetes components such as Kubelet, kube-apiserver and etcd, and exhibits the following problems in actual production environments: (1) the sniffing index set is relatively limited, lacking key performance indexes for specific scenarios and services; (2) resource condition evaluation of the sniffed indexes lacks advanced data processing and analysis capability and is inflexible; (3) the components themselves occupy computing resources of the nodes and control plane, and the resource overhead becomes large as the cluster grows; (4) resource index formats are not uniform, making integration with other visual monitoring tools difficult and management and maintenance cumbersome. To address these problems, Prometheus is introduced as the core resource sniffing component, collecting rich index information such as performance indexes and resource utilization of the central cloud and edge nodes in real time. Prometheus supports an advanced query language and an efficient storage mechanism for complex index processing and analysis, providing data support for resource scheduling. Prometheus also supports horizontal expansion, easily meeting the demands of large-scale cluster resource sniffing with relatively small overhead in high-density clusters.
3) Elastic scheduling forms an adaptive scheduling strategy by means of Kubernetes scheduling extension and the Prometheus resource sniffing capability. Prometheus sniffing index data is collected periodically, the resource bottlenecks and load conditions of the cloud and edge nodes are analyzed at fine granularity, and an adaptive scheduling algorithm (the same as the cloud-edge task allocation strategy proposed above) dynamically migrates application service containers to nodes with lower load, realizing balanced resource utilization.
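A minimal sketch of the optimization-stage node selection, fusing all indexes with a weighted average; the node metrics, the weights and the 50 ms delay budget are all hypothetical values for illustration (every index here is "lower is better"):

```python
# Hypothetical sniffed metrics for three candidate nodes.
nodes = {
    "edge-01": {"cpu": 0.82, "mem": 0.70, "delay_ms": 12.0},
    "edge-02": {"cpu": 0.35, "mem": 0.40, "delay_ms": 18.0},
    "cloud":   {"cpu": 0.20, "mem": 0.30, "delay_ms": 45.0},
}
weights = {"cpu": 0.4, "mem": 0.3, "delay_ms": 0.3}  # assumed fusion weights

def score(metrics):
    # normalize delay to [0, 1] against an assumed 50 ms budget, then
    # take the weighted average of all fused indexes (lower is better)
    m = dict(metrics, delay_ms=metrics["delay_ms"] / 50.0)
    return sum(weights[k] * m[k] for k in weights)

best = min(nodes, key=lambda n: score(nodes[n]))
print(best)  # edge-02
```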
(5) Log analysis module
ELK (Elasticsearch, Logstash and Kibana services) is fused with the KubeEdge log mechanism to form a new mechanism suitable for cloud-edge-end log collaboration in complex industrial sites. The log collaboration mechanism integrates the computing resources required by central cloud and edge node log management based on KubeEdge and realizes containerized orchestration of node ELK services. The central cloud is uniformly responsible for aggregation, management and analysis of node service log information, enabling service fault investigation and performance optimization; the specific design is as follows.
1) Log information collection and storage. Logstash service Pods are distributed to each node by the central cloud Kubernetes, and a Logstash agent runs on each edge node to collect the logs of edge devices and services. The Logstash agent acquires log data from KubeEdge local log files and sends it to the central Elasticsearch service for centralized storage, realizing instant collection and persistent storage of edge device and service logs. Considering the dynamics and resource limitations of the edge computing scenario, a modularized Logstash configuration is adopted, loading designated modules on demand to reduce Logstash memory occupancy and startup time, with reasonable tuning of configuration parameters, including limiting the number of concurrent threads and simplifying filtering rules.
2) Log filtering and preprocessing. The KubeEdge log mechanism is extended so that it integrates with the Logstash agent to realize local log filtering and formatting. Lightweight preprocessing is done at the edge nodes to reduce the data transmission volume and ensure that useful information is transmitted.
3) Index and sharding policies. In the central cloud Elasticsearch service, sharding policies balance query performance and storage efficiency to accommodate dynamic data changes in the edge computing environment. Reasonable indexes are established along dimensions such as device, service and time, and corresponding data retention policies are set, realizing efficient retrieval and analysis of edge devices and services and ensuring that the Elasticsearch service can effectively manage and allocate storage resources as edge nodes are added or removed.
4) Visual monitoring and analysis. The central cloud uses the Kibana service to build an intuitive and powerful visual interface for real-time monitoring and analysis of log data. Kibana's rich charts and dashboard functions can rapidly locate potential service fault points, optimize performance and monitor service health over the full life cycle.
5) Log tagging mechanism. To enhance adaptability to the edge computing environment, the concept of log labels (Logging Labels) is introduced. Tag information is uniformly added to devices and services based on KubeEdge so that the tags are embedded and captured in the log data, giving the logs more context and relevance, improving data readability, locating fault points more accurately and accelerating troubleshooting.
6) Automatic discovery and registration mechanism. The mechanism is implemented with Kubernetes service discovery and a custom controller (Custom Controller). First, using the Kubernetes service discovery mechanism, application programs or services are exposed with Service resource definitions. Second, a custom controller is written to monitor changes of Service and Pod resources in the Kubernetes cluster. When a new Service or Pod joins the cluster, the controller automatically discovers it and registers the service in the ZooKeeper service registry. The automatic discovery and registration mechanism enables new edge nodes to join the log collaboration mechanism automatically without manual configuration.
(6) Security service module
The heterogeneity and distribution of edge computing make security a critical problem. The proposed security collaboration mechanism realizes security policies such as identity authentication, communication encryption, device authentication and monitoring audit: each edge node detects its own edge security state, and the central cloud analyzes the security state information and issues security policies to the edge nodes.
1) Authentication and authorization. Based on the Kubernetes RBAC system, access of every device and service to the cluster is tightly controlled. Each device and service has a unique identity and is authenticated through a Token. RBAC rules follow the principle of least privilege, ensuring that each entity has only the minimum authority required for its task and preventing unauthorized devices or malicious entities from accessing the system.
2) Communication encryption. Communication between the central cloud and the edge devices is encrypted with TLS/SSL, a one-time timestamp token is added to each message to prevent replay attacks, and mutual cloud-edge authentication guarantees the confidentiality and integrity of data in transit. Given the complex network conditions of industrial sites and the need for privacy protection between subnets, communication between the cloud and edge devices is network-isolated to reduce the risk of lateral diffusion attacks. Virtual Local Area Networks (VLANs) isolate the different subnets, ensuring that communication channels are visible only to authorized entities and forming an end-to-end security solution.
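The one-time timestamp token can be sketched with the Python standard library; the shared key, the nonce scheme and the 30-second freshness window are assumptions for illustration:

```python
import hmac, hashlib, time

KEY = b"shared-cloud-edge-key"   # assumed pre-shared key
WINDOW = 30                      # seconds a token stays valid (assumed)
seen = set()                     # nonces already accepted (one-time use)

def make_token(nonce, now=None):
    # sender signs (timestamp, nonce) with the shared key
    ts = str(int(now if now is not None else time.time()))
    sig = hmac.new(KEY, f"{ts}:{nonce}".encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{nonce}:{sig}"

def verify(token, now=None):
    # receiver rejects bad signatures, stale timestamps and replayed nonces
    ts, nonce, sig = token.split(":")
    expect = hmac.new(KEY, f"{ts}:{nonce}".encode(), hashlib.sha256).hexdigest()
    fresh = abs((now if now is not None else time.time()) - int(ts)) <= WINDOW
    if not (hmac.compare_digest(sig, expect) and fresh and nonce not in seen):
        return False
    seen.add(nonce)  # consume the nonce so a replayed token fails
    return True

t = make_token("n1", now=1000)
print(verify(t, now=1005), verify(t, now=1005))  # True False (replay rejected)
```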
3) Device authentication. Devices must be authenticated before joining the KubeEdge cluster. Each device is assigned a unique certificate signed by the cluster's certificate authority (CA), and a device proves its identity by presenting its certificate. After authentication, the device's information is registered in the cluster's device registry, which contains each device's unique identifier, certificate information and other related metadata, enabling the cluster manager to track and manage all access devices. Once a device is successfully registered, the RBAC mechanism takes effect, ensuring that the device can only perform the operations required for its tasks and minimizing the potential attack surface.
4) Monitoring audit. The state of cluster resources and services is monitored in real time based on the resource collaboration and log collaboration capabilities. The audit function records access to and operations on the clusters and devices, providing a tracking and tracing basis for security events.
Claims (10)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202410264439.9A CN120611814A (en) | 2024-03-08 | 2024-03-08 | Intelligent workshop production optimization method based on data-driven and multi-cloud-edge collaborative computing |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN120611814A true CN120611814A (en) | 2025-09-09 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||