CN120448098A - Data processing method, device and storage medium based on distributed network - Google Patents
- Publication number
- CN120448098A (application CN202510491671.0A)
- Authority
- CN
- China
- Legal status: Pending (an assumption by Google, not a legal conclusion; no legal analysis has been performed)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to the technical field of distributed processing and provides a data processing method based on a distributed network. The method comprises: collecting, in real time, a dynamic characteristic set of an input data stream together with a real-time resource index set; inputting the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors; splitting a task to be processed into a plurality of subtasks having execution-order relations according to a data dependency graph, and dynamically mapping each subtask to a designated storage slice of a target node according to the weight vectors; during subtask execution, having each node continuously exchange execution-state data and dynamically adjust the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data, forming a closed-loop feedback optimization link; and triggering slice migration transactions and a fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, with iterative policy updating.
Description
Technical Field
The present invention relates to the field of distributed processing technologies, and in particular, to a data processing method, device, and storage medium based on a distributed network.
Background
With the popularization of cloud computing and edge computing, the volume of data to be processed by distributed systems grows exponentially, and the coupling between the dynamic characteristics of data flows (such as time-series fluctuation and business relevance) and heterogeneous resource environments (such as node computing-power fluctuation and network-state change) is increasingly complex. Existing distributed data processing methods suffer from the separation of data-characteristic acquisition and resource monitoring, static-policy-driven task allocation, a lack of dynamic coordination during execution, and the decoupling of fault recovery from policy iteration.
These technical defects cause low resource utilization, large task-delay fluctuation, and long fault-recovery periods when existing methods process highly dynamic, strongly real-time tasks. A distributed data processing method capable of data-resource collaborative awareness, dynamic policy optimization, and closed-loop elastic guarantee is therefore needed.
Disclosure of Invention
The application provides a data processing method, device, and storage medium based on a distributed network, which can efficiently process massive data streams.
In one aspect, the present application provides a data processing method based on a distributed network, the method comprising:
collecting a dynamic characteristic set of an input data stream in real time, and synchronously obtaining a real-time resource index set;
inputting the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors;
splitting a task to be processed into a plurality of subtasks having execution-order relations according to a data dependency graph based on the data slicing strategy matrix, and dynamically mapping each subtask to a designated storage slice of a target node according to the weight vectors;
during subtask execution, each node continuously exchanging execution-state data, and dynamically adjusting the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data, to form a closed-loop feedback optimization link;
triggering a slice migration transaction and a fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and writing an event log back to the autonomous optimization model for iterative policy updating.
In another aspect, the present application provides a data processing apparatus based on a distributed network, the apparatus comprising:
the acquisition module, configured to collect the dynamic characteristic set of the input data stream in real time and synchronously obtain the real-time resource index set;
the generation module, configured to input the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors;
the splitting module, configured to split the task to be processed into a plurality of subtasks having execution-order relations according to the data dependency graph based on the data slicing strategy matrix, and to dynamically map each subtask to the designated storage slice of the target node according to the weight vectors;
the adjusting module, configured to have each node continuously exchange execution-state data during subtask execution and dynamically adjust the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data, forming a closed-loop feedback optimization link;
and the triggering module, configured to trigger the slice migration transaction and the fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and to write an event log back to the autonomous optimization model for iterative policy updating.
In a third aspect, the present application provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the distributed-network-based data processing method described above.
In a fourth aspect, the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the distributed-network-based data processing method described above.
According to the technical solution provided by the application: first, the data dynamic characteristic set and the node real-time resource index set are collected synchronously, realizing multidimensional joint perception of data-type distribution, time-series pattern, and CPU/storage/network state, and providing accurate real-time input for dynamic policy generation, which significantly improves the match between the task allocation policy and real-time resources. Second, the autonomous optimization model generates a jointly optimized policy matrix and weight vector, and the slicing constraints and node priorities are dynamically adjusted based on the real-time data-resource coupling state, which effectively reduces task execution delay in heterogeneous resource scenarios and reduces resource fragmentation. Third, tasks are split according to the data dependency graph and dynamically mapped to target slices, with allocation positions constrained by the weight vector, ensuring that subtask groups with strong dependency relations are preferentially allocated to low-delay node clusters, improving data-access locality and task execution efficiency. In conclusion, the technical solution of the application can efficiently process large stream data through a distributed network.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data processing method based on a distributed network according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a distributed network-based data processing apparatus according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In this specification, adjectives such as first and second may be used solely to distinguish one element or action from another element or action without necessarily requiring or implying any actual such relationship or order. Where the environment permits, reference to an element or component or step (etc.) should not be construed as limited to only one of the element, component, or step, but may be one or more of the element, component, or step, etc.
In the present specification, for convenience of description, the dimensions of the various parts shown in the drawings are not drawn in actual scale.
With the popularization of cloud computing and edge computing, the volume of data to be processed by distributed systems grows exponentially, and the coupling between the dynamic characteristics of data flows (such as time-series fluctuation and business relevance) and heterogeneous resource environments (such as node computing-power fluctuation and network-state change) is increasingly complex. Existing distributed data processing methods suffer from the separation of data-characteristic acquisition and resource monitoring, static-policy-driven task allocation, a lack of dynamic coordination during execution, and the decoupling of fault recovery from policy iteration, and generally adopt the following technical route:
1) Separation of data-characteristic collection and resource monitoring: data characteristics and node resource indexes are collected by independent modules, so the task allocation policy cannot perceive in real time the dynamic association between data-flow changes and resource state;
2) Static-policy-driven task allocation: slicing policies are generated by an offline-trained model or predefined rules and are difficult to adapt to dynamic adjustment of service topology and burst resource bottlenecks;
3) Lack of dynamic coordination during execution: tasks depend on a fixed execution path after deployment and cannot be dynamically optimized according to real-time load, and a global lock mechanism is used during slice migration, delaying system response;
4) Fault recovery decoupled from policy iteration: migration and recovery operations do not form closed-loop feedback, so similar faults tend to recur.
In order to solve the above problems in the prior art, the present application provides a data processing method based on a distributed network, where a flowchart of the method is shown in fig. 1, and the method mainly includes steps S101 to S105, and is described in detail as follows:
Step S101, collecting dynamic characteristic sets of input data streams in real time, and synchronously obtaining real-time resource index sets.
It should be noted that if only a single dimension is collected (for example, only data features or only resource indexes), the subsequent autonomous optimization model cannot perceive the association between data features and resource fluctuations, so the slicing strategy and task allocation scheme deviate from the actual running environment; for example, the allocation cannot adapt dynamically to node computing power when the data stream grows suddenly. Therefore, joint modeling of the data flow and the resource state is achieved by synchronously collecting the data dynamic feature set and the node resource index set (e.g., data-type distribution, time-series patterns, and CPU/storage/network indexes).
In the embodiment of the application, collecting the dynamic characteristic set of the input data stream in real time while synchronously obtaining the real-time resource index set constitutes the multidimensional state-sensing stage of the distributed-network-based data processing method. Specifically, the dynamic characteristic set of the input data stream can be collected in real time by a distributed probe cluster deployed at the data-source access layer, and the real-time resource index set can be obtained synchronously by a resource-monitoring agent installed on each computing node. The dynamic characteristic set comprises the data-type distribution, the data-stream time-series pattern, and the service-logic topology; the real-time resource index set comprises each node's CPU computing-power fluctuation, dynamic storage occupancy, and network-bandwidth utilization. Synchronously collecting the data dynamic characteristic set and the node real-time resource index set enables multidimensional joint perception of data-type distribution, time-series patterns, and CPU/storage/network state, providing accurate real-time input for dynamic policy generation and significantly improving the match between the task allocation policy and real-time resources. As an embodiment of the present application, collecting the dynamic feature set of the input data stream in real time can be implemented through steps 1011 to 1013, described in detail as follows:
Step 1011, implanting a feature extractor in the data input pipeline to generate a data-type distribution vector by identifying the proportions of structured and unstructured data.
Step 1012, extracting time sequence fluctuation characteristics of the data stream based on the data type distribution vector and encoding the time sequence fluctuation characteristics into a time sequence matrix.
The time-series fluctuation characteristics of the data stream can be extracted by applying wavelet-transform analysis to the data-stream arrival intervals.
Step 1013, inputting the time sequence matrix into a business rule engine to construct a dynamically updated business logic topological graph.
Specifically, the time sequence matrix is input into a business rule engine, the business rule engine analyzes transaction labels carried by data, and a dynamically updated business logic topological graph is constructed by combining a preset cross-source association rule.
As can be seen from steps 1011 to 1013 of the foregoing embodiment, hierarchical processing (data-type identification, time-series feature extraction, and service-topology construction) reduces feature-extraction complexity and avoids data overload during service-topology construction, thereby improving the completeness and relevance of the dynamic feature set and allowing the subsequent policy-generation model to accurately identify the mapping between the data stream and the service logic.
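As an illustration only, the type-distribution and timing-encoding stages (steps 1011 and 1012) can be sketched in Python as below. The record-classification rule and the fixed-bucket encoding are simplifications (the text itself prescribes wavelet analysis of arrival intervals), and all names are hypothetical:

```python
from collections import Counter

def type_distribution(records):
    """Step 1011: classify each record as structured or unstructured
    and return the proportion (duty-cycle) vector."""
    counts = Counter("structured" if isinstance(r, dict) else "unstructured"
                     for r in records)
    total = sum(counts.values()) or 1
    return {k: counts[k] / total for k in ("structured", "unstructured")}

def timing_matrix(arrival_times, bucket=1.0):
    """Step 1012: encode inter-arrival fluctuation as per-bucket rows of
    (bucket index, arrival count, mean interval); a simplified stand-in
    for the wavelet-transform analysis named in the text."""
    intervals = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    rows = {}
    for t, iv in zip(arrival_times[1:], intervals):
        rows.setdefault(int(t // bucket), []).append(iv)
    return [(b, len(v), sum(v) / len(v)) for b, v in sorted(rows.items())]
```

The resulting matrix would then feed the business-rule engine of step 1013.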
And S102, inputting the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a data slicing strategy matrix and task allocation weight vectors of the joint optimization.
Considering that a fixed slicing strategy (for example, equal-size slicing) or task allocation relying only on resource indexes cannot dynamically adjust the slice size according to the service topology, storage space is wasted or task execution paths conflict. The method therefore inputs the dynamic feature set and the real-time resource index set into the pre-trained autonomous optimization model to generate the jointly optimized data slicing strategy matrix and task allocation weight vector, converting multidimensional sensing data into executable slicing constraints and task-priority rules to overcome the adaptability defect of static policies. The policy matrix comprises the slice count, slice size, and storage-position constraints, and the weight vector defines each node's processing priority for different task types. When generating the data slicing strategy matrix, a slice-node matching-degree evaluation is also performed: a policy score is computed for each potential slice-node combination as Score = α × (storage capacity margin / demand) + β × (1 / network delay) + γ × (historical completion rate of similar tasks), where α, β, and γ are the respective weight coefficients; candidate policies with scores above the adaptability threshold are retained to form the data slicing strategy matrix.
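A minimal sketch of this slice-node matching-degree evaluation, assuming illustrative node fields (`storage_margin`, `latency_ms`, `history_rate`) and default weight coefficients; only the Score formula itself comes from the text:

```python
def shard_node_score(node, demand, alpha=0.5, beta=0.3, gamma=0.2):
    """Score = a*(storage margin / demand) + b*(1 / network delay)
             + g*(historical completion rate of similar tasks)."""
    return (alpha * (node["storage_margin"] / demand)
            + beta * (1.0 / node["latency_ms"])
            + gamma * node["history_rate"])

def candidate_matrix(nodes, demand, threshold):
    """Keep only slice-node combinations above the adaptability threshold."""
    scored = [(n["id"], shard_node_score(n, demand)) for n in nodes]
    return [(nid, s) for nid, s in scored if s > threshold]
```

In practice the coefficients would presumably be tuned (or learned) per deployment rather than fixed as here.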
The training process of the autonomous optimization model of this embodiment follows a collaborative mechanism. A dual-input neural network is established: the first input branch performs LSTM feature extraction on the time-series encoding of the dynamic feature set, and the second input branch performs graph-convolution processing on the spatial-distribution encoding of the real-time resource index set. The features extracted by the two branches are then fused into a spatio-temporal fusion feature, which is fed into a policy-matrix generator and a weight-vector predictor; the two are jointly optimized in adversarial training through a gradient reversal layer. Based on the adversarial-training results, the weighted sum of historical task execution delay and resource fluctuation rate is computed as a joint loss function, and the fully connected layer parameters of the dual-input network are updated through backpropagation.
To solve the policy-hysteresis problem caused by offline training of conventional models, iterative updating of the autonomous optimization model can be realized through a closed-loop feedback mechanism based on feature learning of migration events and a penalty mechanism: the classification of slice-migration trigger causes is encoded into feature vectors and added to the training data set to refine the policy-matrix generation logic; for migration transactions caused by unreasonable slicing constraints in the policy matrix, a migration-frequency penalty coefficient is added to the loss function; and incremental training is started during off-peak periods, using the updated data set containing migration features to adjust the hidden-layer weight distribution of the neural network and generate the updated policy matrix and weight vectors.
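The shape of the joint loss with a migration-frequency penalty can be illustrated as follows. The weighting constants and the linear penalty form are assumptions for illustration; the text specifies only "weighted sum of delay and resource fluctuation rate, plus a migration-frequency penalty coefficient":

```python
def joint_loss(exec_delays, resource_fluct, migration_count,
               w_delay=0.6, w_fluct=0.4, penalty=0.05):
    """Weighted sum of mean historical execution delay and resource
    fluctuation rate, plus a migration-frequency penalty term."""
    base = (w_delay * sum(exec_delays) / len(exec_delays)
            + w_fluct * resource_fluct)
    return base + penalty * migration_count
```

A higher penalty coefficient discourages policy matrices whose slicing constraints repeatedly trigger migrations.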
Step S103, splitting a task to be processed into a plurality of subtasks with execution sequence relations according to a data dependency graph based on a data slicing strategy matrix, and dynamically mapping each subtask to an appointed storage slicing of a target node according to a weight vector.
If, as in the prior art, undivided tasks or randomly mapped subtasks are allocated directly, the volume of cross-node data transmission increases (for example, dependency-intensive tasks may be distributed to high-latency nodes), significantly lengthening task completion time. Therefore, to ensure that the task execution path satisfies the data-locality requirement and that the weight vector constrains node-selection priority, the subtasks are split according to a data dependency graph (a DAG) and dynamically mapped to target nodes; that is, based on the data slicing strategy matrix, the task to be processed is split into a plurality of subtasks having execution-order relations according to the data dependency graph, and each subtask is dynamically mapped to a designated storage slice of a target node according to the weight vector. When each subtask is dynamically mapped to the designated storage slice of the target node, dynamic binding can be applied: before the subtask is distributed, a resource reservation request is initiated to the target node; if the node's current load rate exceeds the processing capacity declared in its weight vector, a suboptimal node is reselected according to a sliding-window mechanism; and a bidirectional pointer index between the subtask and the storage slice is established, ensuring data locality during execution.
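The dynamic-binding step (resource reservation with sliding-window fallback) might look like this minimal sketch; the node-dictionary layout, field names, and window size are illustrative assumptions:

```python
def reserve_and_bind(subtask, ranked_nodes, window=3):
    """Try nodes in weight order; skip any whose current load exceeds the
    capacity declared in its weight vector; scan at most `window` candidates
    (the sliding-window fallback to suboptimal nodes)."""
    for node in ranked_nodes[:window]:
        if node["load"] <= node["declared_capacity"]:
            # bidirectional pointer index: subtask -> shard, shard -> subtask
            node["shard_index"][subtask] = node["id"]
            return node["id"]
    return None  # no node within the window admitted the reservation
```

Returning `None` here would correspond to deferring the subtask until a later scheduling round.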
Specifically, as an embodiment of the present application, based on the data slicing policy matrix, splitting the task to be processed into a plurality of subtasks having execution sequence relationships according to the data dependency graph may be implemented through steps S1031 to S1033, which is described in detail below:
Step S1031, converting the task input into a data dependency graph, wherein nodes represent atomic computing operations and edges represent data transmission dependencies.
Specifically, the task description is analyzed to identify the computing operations and evaluate their complexity and data-volume intensity, and data-transfer relations are traced to determine the dependency relations. Meanwhile, lightweight agents collect resource indexes such as node CPU, memory, and network bandwidth in real time, and resource labels such as "GPU acceleration" and "high bandwidth" are attached to nodes according to their hardware and software environments. It should be noted that semantic analysis can be introduced when constructing the data dependency graph of the above embodiment: for SQL queries, parsing the syntax tree can identify JOIN-operation dependency chains; for machine-learning training tasks, the pipeline dependencies of feature engineering can be parsed; and a data-transmission cost weight factor is added to each dependency edge.
And S1032, carrying out critical path analysis on the data dependency graph, and identifying the sub-task chain with the highest priority.
For example, according to the hierarchy of the task dependency graph, complex compute-intensive tasks may be split finely and simple tasks coarsely. Combining task characteristics with node resource labels, tasks dominated by matrix operations are split into small blocks and assigned to GPU-acceleration nodes, while tasks with heavy data transmission are split into data-transfer subtasks and assigned to high-bandwidth nodes.
And step S1033, dividing the data dependency graph into a plurality of subtask groups meeting the slicing constraint in the strategy matrix through a graph dividing algorithm.
Specifically, high-priority subtasks may be allocated first according to task urgency and data-dependency priority. During allocation, the nodes are traversed to find the one best matching the subtask's resource requirements; for example, compute-intensive subtasks go to nodes with strong CPU performance and low load. Meanwhile, task execution and node resources are monitored in real time, and subsequent subtasks are reallocated when node resources become strained. Task-execution and resource-utilization data are collected, and algorithms such as reinforcement learning are used to optimize the task-splitting and allocation strategies.
According to the embodiment, the tasks are split according to the data dependency graph and are dynamically mapped to the target fragments, and the deployment positions are constrained by combining the weight vectors, so that the subtask groups with strong dependency relationships can be ensured to be deployed to the low-delay node clusters preferentially, and the data access locality and the task execution efficiency are improved.
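The critical-path analysis of step S1032 amounts to a longest-path computation over the data dependency graph, which can be sketched as follows; the node names and the per-node cost model are hypothetical:

```python
from collections import defaultdict

def critical_path(edges, cost):
    """Longest path through a DAG of atomic computing operations.
    `edges` are (src, dst) data-transfer dependencies; `cost` maps
    each operation to its compute cost."""
    succ, indeg = defaultdict(list), defaultdict(int)
    for a, b in edges:
        succ[a].append(b)
        indeg[b] += 1
    order = [n for n in cost if indeg[n] == 0]   # sources first
    dist = {n: cost[n] for n in cost}
    prev = {}
    for n in order:                              # Kahn-style topological sweep
        for m in succ[n]:
            if dist[n] + cost[m] > dist[m]:
                dist[m], prev[m] = dist[n] + cost[m], n
            indeg[m] -= 1
            if indeg[m] == 0:
                order.append(m)
    end = max(dist, key=dist.get)
    path = [end]
    while path[-1] in prev:                      # walk predecessors back
        path.append(prev[path[-1]])
    return path[::-1], dist[end]
```

The returned chain is the highest-priority subtask chain; the graph-partitioning of step S1033 would then group the remaining nodes under the slicing constraints.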
Step S104, in the subtask execution process, each node continuously exchanges and executes state data, and task execution path weights of adjacent nodes are dynamically adjusted according to progress deviation rates and resource margins in the state data to form a closed loop feedback optimization link.
Although the task to be processed is split into a plurality of subtasks with execution-order relations according to the data dependency graph based on the data slicing strategy matrix, and each subtask is dynamically mapped to the designated storage slice of the target node according to the weight vector, relying only on the initial allocation strategy means the task path cannot be adjusted when a node suddenly fails or its load fluctuates, which can cause a system avalanche (for example, task backlog on a high-load node causes chained delays). It is therefore necessary to dynamically adjust task path weights based on state sharing and the progress deviation rate, forming a closed-loop feedback link that corrects execution deviations in real time: each node continuously exchanges execution-state data during subtask execution and dynamically adjusts the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data, forming a closed-loop feedback optimization link. By adjusting path weights through inter-node state sharing and progress-deviation feedback, a closed-loop optimization mechanism is formed that quickly rebalances task queues when node load changes abruptly, maintaining overall system stability.
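One possible form of the per-neighbour path-weight update driven by progress deviation rate and resource margin is sketched below; the multiplicative update rule, coefficients, and floor value are illustrative assumptions rather than taken from the text:

```python
def adjust_path_weight(weight, progress_deviation, resource_margin,
                       k_dev=0.5, k_res=0.3, floor=0.05):
    """Lower a neighbour's execution-path weight when it lags behind
    schedule (positive deviation); raise it when it reports spare
    resources. The floor keeps every path minimally reachable."""
    weight *= (1.0 - k_dev * progress_deviation)  # penalise schedule lag
    weight *= (1.0 + k_res * resource_margin)     # reward spare capacity
    return max(weight, floor)
```

Applied each feedback round, this drives traffic away from lagging nodes while never fully severing a path, which is consistent with the closed-loop rebalancing described above.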
In this embodiment, the closed-loop feedback optimization link is formed as follows: each node publishes a state snapshot to a message bus every second, the snapshot comprising the current remaining CPU clock cycles, the waiting-queue depth, and the memory page-fault count; a regional coordinator computes a load-balancing gradient from the state snapshots of nodes within a range of several adjacent hops; and when the absolute value of the gradient exceeds an adaptive threshold, a back-pressure signal is injected at the node upstream of the high-load node to slow the task injection rate. Injecting the back-pressure signal triggers the following linked operations: a checkpoint snapshot is generated for the subtask being processed by the back-pressured node, recording its data-slice version number; priority weights of the slices to be migrated are computed from the real-time resource saturation of the back-pressured node; a replacement node matching the priority weight is located on a consistent hash ring and a migration transaction lock is established; a hot-migration thread is started, transmitting slice copies marked as migratable to the replacement node via zero-copy; and after migration completes and data verification passes, the original slice space is released and the global routing table is updated. It should be noted that the back-pressure signal is a dynamic flow-control mechanism: when a node cannot process tasks in time because of resource overload (e.g., CPU or queue saturation), its load state is conveyed via the signal to the upstream node (i.e., the task sender), requiring it to reduce the task injection rate and thereby avoiding a system avalanche; accordingly, the back-pressured node is the computing node that triggers the back-pressure signal due to resource overload. Locating the replacement node matching the priority weight on the consistent hash ring and establishing the migration transaction lock can be achieved through steps S1041 to S1044, described in detail as follows:
step S1041, constructing a node hash ring.
This step specifically comprises node hash mapping and weight-factor injection. For node hash mapping, the IP address and port number of every available node in the cluster (including current and standby nodes) are taken as input, a fixed-length hash value is generated by a consistent hashing algorithm (such as SHA-256) and mapped onto a virtual ring space (0 to 2^128 - 1), and each node occupies one or more virtual positions on the ring according to its hash value. For weight-factor injection, a priority weight is computed from the node's real-time resource state (remaining CPU computing power, available memory, and network-bandwidth margin) as Weight = α × CPU margin + β × memory margin + γ × bandwidth margin, with α + β + γ = 1; the weight is then converted proportionally into a number of virtual nodes, a higher weight yielding more virtual nodes and thus a higher probability of the node being selected on the hash ring.
Step S1042, locating and screening candidate nodes.
Specifically, locating and screening candidate nodes can comprise querying the hash ring, matching the priority weight, and handling exceptions. Querying the hash ring takes the unique identifier of the slice to be migrated (such as the slice ID plus version number) as input, computes its hash value with the same hashing algorithm, searches clockwise for the nearest virtual node on the ring, and takes the physical node corresponding to that first virtual node as the initial candidate. Matching the priority weight obtains the candidate node's real-time priority weight: if the weight is at or above a preset threshold (for example, weight >= 0.7), the candidate is selected directly as the replacement node; otherwise, the next N physical nodes (N configurable) are searched clockwise along the ring, and the node with the highest weight is selected as the replacement. Exception handling mainly means that if a candidate node is offline or short of resources, an exception-marking mechanism is triggered, the node is temporarily removed from the hash ring, and the ring query and weight matching are re-executed.
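Steps S1041 and S1042 (weighted virtual-node ring construction and clockwise candidate lookup) can be sketched as follows. The virtual-node scaling factor, probe window, and threshold default are illustrative choices; SHA-256 and the weight-proportional virtual-node count come from the text:

```python
import bisect
import hashlib

def _h(key):
    """Fixed-length consistent hash of a string key."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class WeightedHashRing:
    """Virtual-node hash ring: higher-weight nodes receive more virtual
    positions, raising their probability of being selected."""
    def __init__(self, node_weights, vnodes_per_unit=10):
        self.ring = sorted(
            (_h(f"{nid}#{i}"), nid)
            for nid, w in node_weights.items()
            for i in range(max(1, int(w * vnodes_per_unit))))
        self.keys = [h for h, _ in self.ring]

    def locate(self, shard_id, weights, threshold=0.7, probe=3):
        """Walk clockwise from the shard's hash position; return the first
        node at or above the weight threshold, else the best of `probe`
        fallback hops."""
        start = bisect.bisect(self.keys, _h(shard_id))
        seen = []
        for step in range(probe):
            nid = self.ring[(start + step) % len(self.ring)][1]
            if weights[nid] >= threshold:
                return nid
            seen.append(nid)
        return max(seen, key=lambda n: weights[n])
```

The exception-handling branch (temporarily removing an offline node and re-querying) would rebuild the ring without that node before calling `locate` again.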
Step S1043, establishing a migration transaction lock.
Specifically, establishing a migration transaction lock includes applying for the transaction lock, handling transaction lock conflicts, and holding and persisting the transaction lock. Applying for the transaction lock may be implemented by initiating a migration transaction lock request to a global transaction coordinator (for example, a distributed lock service based on the Raft protocol), where the request includes metadata such as the ID and version number of the shard to be migrated, the source and target node IDs, and the migration priority weight; the coordinator then verifies the target node state, grants the lock and returns a transaction ID if the node is available for migration, and otherwise returns a rejection signal. If another migration transaction is detected to hold a lock on the same shard or target node, the conflict is resolved by priority weight: the higher-weight transaction obtains the lock first and the lower-weight transaction enters a waiting queue; if the weights are equal, the transactions are ordered by their initiation timestamps. Holding and persisting the transaction lock mainly means that if the migration times out, the coordinator automatically renews the lock and triggers an alarm, and if renewal fails repeatedly, the lock is forcibly released and the transaction is rolled back.
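The conflict-resolution rule of step S1043 — higher weight wins, ties broken by the earlier initiation timestamp — can be illustrated with a toy in-memory coordinator. A real deployment would back this with a Raft-based lock service; all class and method names here are illustrative.

```python
import heapq
import itertools

class MigrationLockCoordinator:
    """Toy stand-in for a global transaction coordinator: one lock per
    shard; conflicting requests wait in a priority queue ordered by
    weight (descending), then initiation timestamp (ascending)."""

    def __init__(self):
        self._held = {}       # shard_id -> txn_id currently holding the lock
        self._waiting = {}    # shard_id -> heap of (-weight, timestamp, txn_id)
        self._seq = itertools.count(1)

    def acquire(self, shard_id, weight, ts):
        txn_id = f"txn-{next(self._seq)}"
        if shard_id not in self._held:
            self._held[shard_id] = txn_id
            return txn_id, True               # lock granted immediately
        heapq.heappush(self._waiting.setdefault(shard_id, []),
                       (-weight, ts, txn_id))
        return txn_id, False                  # queued behind current holder

    def release(self, shard_id):
        self._held.pop(shard_id, None)
        queue = self._waiting.get(shard_id)
        if queue:
            _, _, txn_id = heapq.heappop(queue)  # highest weight, earliest ts
            self._held[shard_id] = txn_id
            return txn_id                        # next transaction promoted
        return None
```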
Step S1044, submitting the migration transaction and releasing the transaction lock.
Committing the migration transaction is mainly an atomic commit: after migration is completed, a commit request is sent to the coordinator with a data check digest attached (for example, a CRC64 check code); after the coordinator verifies the consistency of the check code, it atomically updates the global routing table and binds the shard to the new node. Releasing the transaction lock mainly comprises the coordinator releasing the lock after a successful commit and clearing the temporary metadata; if the commit fails, a rollback process is triggered, namely deleting the data copy on the target node, restoring the source node's shard to a shareable state, and releasing the transaction lock.
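The verify-then-swap commit of step S1044 can be sketched as follows. Since Python's standard library has no CRC64, `zlib.crc32` stands in for the CRC64 check code named in the text, and the copy-then-replace of the routing table models the coordinator's atomic update.

```python
import zlib

def commit_migration(routing_table, shard_id, new_node, data, claimed_crc):
    """Verify the check digest first; only on a match swap in a new routing
    entry. Returns (committed, routing_table)."""
    if zlib.crc32(data) != claimed_crc:
        return False, routing_table          # rollback path: mapping untouched
    updated = dict(routing_table)            # build the new table off to the side
    updated[shard_id] = new_node             # bind the shard to the new node
    return True, updated                     # publish the new table in one step
```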
From steps S1041 to S1044 in the foregoing embodiments it can be seen that, on the one hand, dynamically adjusting the virtual node distribution by weight ensures that nodes with high resource margins are more easily selected, significantly improving cluster load balance after migration; on the other hand, the transaction lock mechanism can significantly reduce the shard migration conflict rate; in addition, atomic commits based on the version number and check code can reduce the probability of data consistency deviation during migration to below a preset threshold, for example 0.01%.
Step S105, triggering the shard migration transaction and the fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and writing the event log back to the autonomous optimization model to perform strategy iterative updating.
If, as in the prior art, only migration recovery is performed without feedback to the model, similar failures are repeatedly triggered (for example, the same resource bottleneck recurs) and system reliability gradually deteriorates. The embodiment of the application closes the loop between fault self-healing and strategy optimization by triggering shard migration and fault recovery, writing event logs back to the model, and converting run-time anomalies into a basis for strategy iteration. Specifically, when it is detected, according to the continuously updated real-time resource index set and progress deviation rate, that a node's resource saturation exceeds a dynamic threshold or that shard data consistency deviates beyond a preset tolerance, the migration transaction and the fault recovery pipeline are triggered, and the event logs are written back to the autonomous optimization model for strategy iterative updating.
The shard migration transaction in this embodiment can implement a hierarchical atomic operation: all copy paths of the shard to be migrated are locked in a global naming service to generate a migration transaction ID containing a path fingerprint; based on the locked copy paths, cross-node Remote Direct Memory Access (RDMA) channels are established to migrate data blocks in batches while the source shard is switched into read-only mode; and after all target nodes verify the integrity of the data blocks via the copy paths, the shard position mapping table is updated atomically and the migration transaction ID is released. During execution of the shard migration transaction, a request redirection agent can be applied to shard access requests sent by clients to guide them to the latest available copy. Read-write conflicts arising during redirection are handled with multi-version concurrency control (MVCC), which generates timestamped data version branches; after the transaction completes, the version branches are merged into a linearized sequence according to timestamp priority and the service consistency rules, invalid expired data copies are cleared after merging, and the associated storage space is released.
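The MVCC merge described above — timestamped version branches linearized by timestamp priority, with superseded copies collected for reclamation — can be illustrated by a minimal sketch; the tuple layout and function name are assumptions.

```python
def merge_versions(branches):
    """branches: [(timestamp, key, value)] version branches produced while
    requests were redirected. Merge into one linearized state, later
    timestamps winning per key, and collect the superseded copies so the
    caller can reclaim their storage."""
    state, stale = {}, []
    for ts, key, value in sorted(branches):   # timestamp-priority ordering
        if key in state:
            stale.append(state[key])          # an older version is now expired
        state[key] = (ts, value)
    merged = {k: v for k, (ts, v) in state.items()}
    return merged, stale
```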
Triggering the fault recovery pipeline comprises: when a node temperature sensor detects sustained over-temperature, automatically reducing the CPU frequency and starting a standby cooling unit; if the error-correcting-code (ECC) error rate of a memory module exceeds the chip specification threshold, isolating that module and switching to a mirror channel; and for unrepairable hardware faults, triggering a standby-node cold-start process and replaying the write-ahead log to recover state.
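The tiered triggers above can be summarized as a simple dispatch function; the threshold values and step names here are illustrative placeholders, not values from the application.

```python
def recovery_action(sensor):
    """sensor: dict of node health readings. Returns the recovery steps,
    from mildest (thermal throttling) to most drastic (standby cold start)."""
    steps = []
    if sensor.get("overtemp_seconds", 0) > 30:  # sustained over-temperature
        steps += ["throttle_cpu_frequency", "enable_backup_cooling"]
    if sensor.get("ecc_error_rate", 0.0) > sensor.get("ecc_spec_threshold", 1e-6):
        steps += ["isolate_dimm", "switch_to_mirror_channel"]
    if sensor.get("unrecoverable_hw_fault", False):
        steps += ["cold_start_standby_node", "replay_write_ahead_log"]
    return steps
```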
In order to reduce storage overhead and ensure data integrity and service continuity after recovery of a failed node, the cold-start process comprises intelligent state reconstruction: metadata indexes, including the shard topology and the storage position mapping table, are recovered from the latest global checkpoint; the erasure-code distribution nodes of the missing data fragments are located based on the storage position mapping table, and the complete data fragments are reconstructed through a decoding algorithm; the original topic partitions of the message bus are rebuilt according to the message subscription configuration in the metadata indexes, and data streams are resumed from the checkpoint timestamps.
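The erasure-code reconstruction step can be illustrated with its single-parity special case: with one XOR parity fragment, any single missing fragment equals the parity XORed with every surviving fragment. Production systems typically use Reed-Solomon codes, which generalize this; the sketch below is only the one-parity case.

```python
def xor_reconstruct(surviving, parity):
    """Repair a single missing fragment in an XOR parity group: the lost
    fragment is the XOR of the parity with all surviving fragments.
    All fragments are assumed to have the same length."""
    missing = bytearray(parity)
    for frag in surviving:
        missing = bytearray(a ^ b for a, b in zip(missing, frag))
    return bytes(missing)
```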
This embodiment further comprises a collaborative comparison mechanism across data centers: a data fingerprint matrix is generated periodically and compared across centers via blockchain-based evidence storage; when fingerprint differences exceed a safety threshold, the abnormal shards are located and consistency repair is triggered, and maliciously tampered nodes are trust-downgraded and isolated in a sandbox environment.
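The periodic fingerprint comparison reduces to hashing each shard and diffing the resulting maps. A minimal sketch follows; SHA-256 as the fingerprint function is an assumption, and the blockchain evidence-storage step is omitted.

```python
import hashlib

def fingerprint(shards):
    """One data center's fingerprint map: shard_id -> SHA-256 of its bytes."""
    return {sid: hashlib.sha256(data).hexdigest() for sid, data in shards.items()}

def diverging_shards(fp_a, fp_b):
    """Cross-center comparison: shard IDs whose fingerprints disagree, or
    that are present on only one side."""
    return sorted(sid for sid in fp_a.keys() | fp_b.keys()
                  if fp_a.get(sid) != fp_b.get(sid))
```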
According to the data processing method based on a distributed network illustrated in fig. 1: first, the data dynamic feature set and the node real-time resource index set are acquired synchronously, realizing multi-dimensional joint perception of data type distribution, time-sequence patterns, and CPU/storage/network states and providing accurate real-time input for dynamic strategy generation, thereby remarkably improving the match between task allocation strategies and real-time resources; second, a jointly optimized strategy matrix and weight vector are generated through the autonomous optimization model, and the shard constraints and node priorities are dynamically adjusted based on the real-time data-resource coupling state, which can effectively reduce task execution delay in heterogeneous resource scenarios and reduce resource fragmentation; third, tasks are split according to the data dependency graph and dynamically mapped to target shards, with the weight vector constraining the deployment positions, ensuring that subtask groups with strong dependency relations are preferentially deployed to low-delay node clusters and thereby improving data access locality and task execution efficiency. In conclusion, the technical scheme of the application can efficiently process large data streams through a distributed network.
Referring to fig. 2, a data processing apparatus based on a distributed network according to an embodiment of the present application may include an obtaining module 201, a generating module 202, a splitting module 203, an adjusting module 204, and a triggering module 205, which are described in detail below:
an acquisition module 201, configured to acquire a dynamic feature set of an input data stream in real time, and synchronously acquire a real-time resource index set;
The generation module 202 is configured to input the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model, and generate a jointly optimized data slicing strategy matrix and a task allocation weight vector;
The splitting module 203 is configured to split a task to be processed into a plurality of subtasks having an execution sequence relationship according to a data dependency graph based on a data slicing policy matrix, and dynamically map each subtask to a designated storage slice of a target node according to a weight vector;
The adjustment module 204 is configured to continuously exchange execution state data between nodes during execution of the subtasks, and dynamically adjust task execution path weights of adjacent nodes according to a progress deviation rate and a resource margin in the state data to form a closed-loop feedback optimization link;
And the triggering module 205 is configured to trigger the shard migration transaction and the fault recovery pipeline according to the continuously updated real-time resource index set and the progress deviation rate, and write back the event log to the autonomous optimization model for policy iteration update.
The data processing device based on a distributed network illustrated in fig. 2 can synchronously acquire the data dynamic feature set and the node real-time resource index set, realizing multi-dimensional joint perception of data type distribution, time-sequence patterns, and CPU/storage/network states and providing accurate real-time input for dynamic strategy generation, thereby remarkably improving the match between task allocation strategies and real-time resources; it generates a jointly optimized strategy matrix and weight vector through the autonomous optimization model and dynamically adjusts the shard constraints and node priorities based on the real-time data-resource coupling state, effectively reducing task execution delay in heterogeneous resource scenarios and reducing resource fragmentation; and it splits tasks according to the data dependency graph, dynamically maps them to target shards with the weight vector constraining deployment positions, and ensures that subtask groups with strong dependency relations are preferentially deployed to low-delay node clusters, thereby improving data access locality and task execution efficiency. In conclusion, the technical scheme of the application can efficiently process large data streams through a distributed network.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the electronic device 3 of this embodiment mainly comprises a processor 30, a memory 31 and a computer program 32 stored in the memory 31 and executable on the processor 30, for example a program of a data processing method based on a distributed network. The steps in the above-described embodiment of the data processing method based on the distributed network are implemented when the processor 30 executes the computer program 32, for example, steps S101 to S105 shown in fig. 1. Or the processor 30 when executing the computer program 32 performs the functions of the modules/units in the above-described device embodiments, such as the functions of the acquisition module 201, the generation module 202, the splitting module 203, the adjustment module 204, and the triggering module 205 shown in fig. 2.
The computer program 32 of the data processing method based on a distributed network mainly comprises the steps of: collecting a dynamic feature set of an input data stream in real time and synchronously acquiring a real-time resource index set; inputting the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors; splitting a task to be processed into a plurality of subtasks with execution sequence relations according to a data dependency graph based on the data slicing strategy matrix, and dynamically mapping each subtask to a designated storage slice of a target node according to the weight vectors; continuously exchanging execution state data among nodes during subtask execution, and dynamically adjusting the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data to form a closed-loop feedback optimization link; and triggering shard migration transactions and the fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and writing event logs back to the autonomous optimization model to carry out strategy iterative updating. The computer program 32 may be divided into one or more modules/units, which are stored in the memory 31 and executed by the processor 30 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, used to describe the execution of the computer program 32 in the electronic device 3.
For example, the computer program 32 may be divided into the acquisition module 201, the generation module 202, the splitting module 203, the adjustment module 204, and the triggering module 205 (modules in a virtual device), with the following specific functions: the acquisition module 201 acquires a dynamic feature set of an input data stream in real time and synchronously acquires a real-time resource index set; the generation module 202 inputs the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors; the splitting module 203 splits a task to be processed into a plurality of subtasks with execution sequence relations according to a data dependency graph based on the data slicing strategy matrix, and dynamically maps each subtask to a designated storage slice of a target node according to the weight vectors; the adjustment module 204 continuously exchanges execution state data among nodes during subtask execution and dynamically adjusts the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data to form a closed-loop feedback optimization link; and the triggering module 205 triggers the shard migration transaction and the fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and writes the event log back to the autonomous optimization model for strategy iterative updating.
The electronic device 3 may include, but is not limited to, a processor 30, a memory 31. It will be appreciated by those skilled in the art that fig. 3 is merely an example of the electronic device 3 and does not constitute a limitation of the electronic device 3, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the electronic device may further include an input-output device, a network access device, a bus, etc.
The processor 30 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 31 may be an internal storage unit of the electronic device 3, such as a hard disk or a memory of the electronic device 3. The memory 31 may also be an external storage device of the electronic device 3, such as a plug-in hard disk provided on the electronic device 3, a smart media card (SMC), a Secure Digital (SD) card, a flash card, or the like. Further, the memory 31 may include both an internal storage unit and an external storage device of the electronic device 3. The memory 31 is used to store computer programs and other programs and data required by the electronic device. The memory 31 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that the above-described functional units and modules are merely illustrated for convenience and brevity of description, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above device may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts not described or illustrated in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other manners. For example, the apparatus/device embodiments described above are merely illustrative, e.g., the division of modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a storage medium. Based on this understanding, the application may implement all or part of the processes of the above method embodiments by instructing the relevant hardware through a computer program. The computer program of the data processing method based on a distributed network may be stored in a storage medium, and when executed by a processor can implement the steps of the method embodiments, namely: collecting a dynamic feature set of an input data stream in real time and synchronously acquiring a real-time resource index set; inputting the dynamic feature set and the real-time resource index set into a pre-trained autonomous optimization model to generate a jointly optimized data slicing strategy matrix and task allocation weight vectors; splitting a task to be processed into a plurality of subtasks with execution sequence relations according to a data dependency graph based on the data slicing strategy matrix, and dynamically mapping each subtask to a designated storage slice of a target node according to the weight vectors; continuously exchanging execution state data among nodes during subtask execution, and dynamically adjusting the task execution path weights of adjacent nodes according to the progress deviation rate and resource margin in the state data to form a closed-loop feedback optimization link; and triggering the shard migration transaction and the fault recovery pipeline according to the continuously updated real-time resource index set and progress deviation rate, and writing the event log back to the autonomous optimization model for strategy iterative updating. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
The storage medium may include any entity or device capable of carrying computer program code: a recording medium, USB flash disk, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and so forth. It should be noted that the content of the storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in some jurisdictions, storage media do not include electrical carrier signals and telecommunication signals.
The foregoing embodiments are intended to illustrate, not limit, the technical solution of the present application. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described therein may still be modified, or some of their technical features may be equivalently substituted, without departing from the spirit and scope of the technical solutions of the embodiments of the application; such modifications and substitutions shall fall within the protection scope of the application. The foregoing description of the embodiments illustrates the general principles of the application and is not meant to limit its scope to the particular embodiments; any modifications, equivalents, improvements, and the like made within the spirit and principles of the application are intended to be included within its scope.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202510491671.0A CN120448098A (en) | 2025-04-18 | 2025-04-18 | Data processing method, device and storage medium based on distributed network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN120448098A true CN120448098A (en) | 2025-08-08 |
Family
ID=96622959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202510491671.0A Pending CN120448098A (en) | 2025-04-18 | 2025-04-18 | Data processing method, device and storage medium based on distributed network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN120448098A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||