CN118264753A - Resource scheduling method and system for environment-controlled distributed KVM agents - Google Patents
- Publication number
- CN118264753A (application number CN202410512858.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- kvm
- nodes
- node
- fault
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/5141—Details of processing calls and other types of contacts in an unified manner
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
- H04M3/26—Arrangements for supervision, monitoring or testing with means for applying test signals or for measuring
- H04M3/28—Automatic routine testing ; Fault testing; Installation testing; Test methods, test equipment or test arrangements therefor
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/523—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing with call distribution or queueing
- H04M3/5232—Call distribution algorithms
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/50—Centralised arrangements for answering calls; Centralised arrangements for recording messages for absent or busy subscribers ; Centralised arrangements for recording messages
- H04M3/51—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing
- H04M3/523—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing with call distribution or queueing
- H04M3/5238—Centralised call answering arrangements requiring operator intervention, e.g. call or contact centers for telemarketing with call distribution or queueing with waiting time or load prediction arrangements
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Signal Processing (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention provides a resource scheduling method and system for an environment-controlled distributed KVM agent, applied to the field of agent resource scheduling. By monitoring the return signals against preset central control content, the invention detects an abnormal return signal in time and judges whether the abnormal condition of the return signal exceeds a preset duration. Once an abnormality is found, measures are taken immediately: the faulty node is identified quickly and the corresponding faulty device is marked, which facilitates timely fault diagnosis and handling. After the faulty node is found, computing tasks are dynamically reassigned to the other non-faulty nodes and resources are scheduled and allocated accordingly, so that tasks execute smoothly and the system runs stably.
Description
Technical Field
The invention relates to the field of agent resource scheduling, and in particular to a resource scheduling method and system for an environment-controlled distributed KVM agent.
Background
At present, when a node of a distributed KVM agent fails, the whole KVM terminal is easily disconnected from the devices it controls, which reduces the availability of the system. The failed node may even cause KVM terminal data to be lost, resulting in unrecoverable information loss.
Disclosure of Invention
The invention aims to solve the problem that, when a node of a distributed KVM agent fails, the whole KVM terminal is easily disconnected from the devices it controls, which in turn reduces the availability of the system.
To solve this technical problem, the invention adopts the following technical means:
The invention provides a resource scheduling method for an environment-controlled distributed KVM agent, comprising the following steps:
collecting, based on central control content preset by the KVM agent, the return signal sent by each controlled device to the KVM agent;
judging whether the return duration of the return signal exceeds a preset duration;
if yes, acquiring the communication state among the nodes in the KVM agent, identifying a faulty node among the nodes according to the communication state, marking in the KVM agent the faulty device corresponding to the faulty node, applying preset decentralized storage to acquire the operation data that the faulty device is transmitting to the KVM agent through the faulty node, storing the operation data in a distributed manner on the other non-faulty nodes, and generating from the KVM agent the data fragments of the operation data recorded by the non-faulty nodes;
judging whether the data fragments can be combined to obtain the operation data;
if not, constructing a computing task for the operation data in the KVM agent, splitting the computing task into corresponding computing subtasks based on the available resources of the other non-faulty nodes, scheduling and allocating resources to the other non-faulty nodes according to the computing subtasks, dynamically adjusting the computing resources those nodes require while executing the computing subtasks, aggregating the calculation results of the other non-faulty nodes, and integrating them to obtain the operation data.
Further, the step of acquiring the communication state among the nodes in the KVM agent and identifying a faulty node among the nodes according to the communication state comprises:
configuring each node in the KVM agent as a monitored object of a pre-built blockchain network, and detecting the resource lower limit of each node, wherein the resource lower limit comprises a computing resource lower limit and a storage resource lower limit;
judging whether each node can connect to the blockchain network;
if yes, periodically sending a preset heartbeat signal to each node through the blockchain network at a preset period, receiving the communication request that each node returns in response to the heartbeat signal, collecting the communication traffic among the nodes with a preset network packet capture tool, and extracting the communication parameters among the nodes from the communication traffic, wherein the communication parameters comprise delay, packet loss rate, bandwidth utilization, connection state and communication frequency.
Further, after the step of storing the operation data in a distributed manner on the other non-faulty nodes and generating from the KVM agent the data fragments of the operation data recorded by the non-faulty nodes, the method further comprises:
dividing the data fragments into a fragment index and storage metadata based on a preset data structure, and identifying a query instruction output by the KVM agent;
judging whether the query instruction corresponds to the location attribute of a data fragment;
if yes, determining the storage node corresponding to the data fragment from the fragment index and the storage metadata, initiating a data access request to the storage node according to the query instruction, reading data from or writing data to the storage node according to the data access request, and generating the operation log of the data fragment in the KVM agent.
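For illustration only, the fragment index, storage metadata and operation log described above could be modeled as follows. This is a minimal Python sketch rather than the claimed implementation: the class name `ShardStore`, its fields, and the in-memory dictionaries standing in for storage nodes are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ShardStore:
    """Hypothetical in-memory model of fragment index, metadata and log."""
    index: Dict[str, str] = field(default_factory=dict)      # fragment id -> storage node id
    metadata: Dict[str, dict] = field(default_factory=dict)  # fragment id -> storage metadata
    nodes: Dict[str, Dict[str, bytes]] = field(default_factory=dict)  # node id -> fragments held
    log: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)

    def write(self, shard_id: str, node_id: str, data: bytes) -> None:
        """Store a fragment on a node and record where it lives."""
        self.nodes.setdefault(node_id, {})[shard_id] = data
        self.index[shard_id] = node_id
        self.metadata[shard_id] = {"size": len(data)}
        self.log.append(("write", shard_id, node_id))

    def read(self, shard_id: str) -> Optional[bytes]:
        """Resolve a query to its storage node; None if it has no location."""
        node_id = self.index.get(shard_id)
        if node_id is None:
            # The query does not correspond to any fragment location.
            self.log.append(("miss", shard_id, None))
            return None
        self.log.append(("read", shard_id, node_id))
        return self.nodes[node_id][shard_id]


store = ShardStore()
store.write("frag-1", "node-b", b"abc")
payload = store.read("frag-1")
```

Every read and write appends an entry to `log`, which plays the role of the operation log of the data fragment in this sketch.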
Further, the step of constructing a computing task for the operation data in the KVM agent and splitting the computing task into corresponding computing subtasks based on the available resources of the other non-faulty nodes further comprises:
selecting, based on task resource requirements preset by the KVM agent, a pre-recorded scheduling algorithm to determine the execution order and allocation strategy of the computing task, wherein the scheduling algorithms comprise first come first served, shortest job first and shortest remaining time first;
judging whether the resource load of the computing task during execution exceeds a load threshold preset by the KVM agent;
if yes, acquiring real-time load information and task queue data of the KVM agent, dynamically scheduling and reallocating resources for the computing task according to the real-time load information and the task queue data, and generating in real time the execution state of the computing task on the KVM agent, wherein the execution state comprises waiting for execution, executing, execution abnormal and execution completed.
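The three named scheduling algorithms differ only in which ready task runs next. The following Python sketch compares them on a single executor with integer time steps; the function name `completion_order`, the task tuples and the sample workload are assumptions made for illustration, and the source does not specify how the KVM agent chooses between the algorithms.

```python
from typing import List, Tuple


def completion_order(tasks: List[Tuple[str, int, int]], policy: str) -> List[str]:
    """tasks: (name, arrival_time, burst_time); returns names in completion order.

    policy is "fcfs" (first come first served), "sjf" (shortest job first,
    non-preemptive) or "srtf" (shortest remaining time first, preemptive).
    """
    if policy not in ("fcfs", "sjf", "srtf"):
        raise ValueError(f"unknown policy: {policy}")
    if any(burst <= 0 for _, _, burst in tasks):
        raise ValueError("burst times must be positive integers")
    arrival = {name: arr for name, arr, _ in tasks}
    burst = {name: b for name, _, b in tasks}
    remaining = dict(burst)
    keys = {
        "fcfs": lambda n: (arrival[n], n),
        "sjf": lambda n: (burst[n], arrival[n], n),
        "srtf": lambda n: (remaining[n], arrival[n], n),
    }
    done: List[str] = []
    t, current = 0, None
    while len(done) < len(tasks):
        ready = [n for n in remaining if arrival[n] <= t]
        if not ready:
            t += 1  # executor idles until the next arrival
            continue
        # SRTF re-evaluates every tick; the others only when the executor is free.
        if policy == "srtf" or current is None:
            current = min(ready, key=keys[policy])
        remaining[current] -= 1
        t += 1
        if remaining[current] == 0:
            done.append(current)
            del remaining[current]
            current = None
    return done


workload = [("A", 0, 5), ("B", 1, 2), ("C", 2, 1)]
fcfs = completion_order(workload, "fcfs")  # ['A', 'B', 'C']
sjf = completion_order(workload, "sjf")    # ['A', 'C', 'B']
srtf = completion_order(workload, "srtf")  # ['B', 'C', 'A']
```

On this workload the long task A finishes first under first come first served but last under shortest remaining time first, which is the trade-off a load-threshold check would have to weigh.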
Further, the step of judging whether the return duration of the return signal exceeds the preset duration further comprises:
acquiring internal clock data preset by the KVM agent;
judging whether the internal clock data matches pre-recorded common clock data;
if not, synchronizing the internal clock data to the common clock data, and performing error correction on each node according to the synchronized internal clock data.
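A simple way to picture the synchronization and error correction is as an offset applied to every node's timestamp. The Python sketch below assumes that all node timestamps were taken from the agent's internal clock and therefore share its offset; the function name `sync_clock` and the `tolerance_s` parameter are hypothetical and not taken from the source.

```python
from typing import Dict, Tuple


def sync_clock(
    internal_s: float,
    common_s: float,
    node_timestamps: Dict[str, float],
    tolerance_s: float = 0.001,
) -> Tuple[float, Dict[str, float]]:
    """Return the (possibly corrected) internal clock and node timestamps."""
    offset = common_s - internal_s
    if abs(offset) <= tolerance_s:
        # Clocks match: nothing to correct.
        return internal_s, dict(node_timestamps)
    # Clocks differ: adopt the common clock and shift every node by the offset.
    corrected = {node: ts + offset for node, ts in node_timestamps.items()}
    return common_s, corrected


clock, stamps = sync_clock(100.0, 100.5, {"node-a": 10.0, "node-b": 12.0})
```

Correcting the clock before measuring return durations keeps a drifting internal clock from being mistaken for a slow controlled device.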
Further, the step of judging whether the data fragments can be combined to obtain the operation data further comprises:
identifying the order in which the KVM agent received the data fragments, based on a fragment sequence established in advance for the data fragments;
judging whether the receiving order matches the fragment sequence;
if not, detecting the number of fragments in the fragment sequence, collecting the duplicate data fragments present in the receiving order according to that number, and removing the duplicate data fragments in the KVM agent.
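The order check and duplicate removal can be sketched as below. This is an illustrative Python example under stated assumptions: fragments are `(fragment_id, payload)` pairs, the first copy of a duplicate is the one kept, and the function name `reorder_and_dedup` is introduced here rather than taken from the source.

```python
from typing import List, Tuple

Fragment = Tuple[str, bytes]


def reorder_and_dedup(
    received: List[Fragment], fragment_sequence: List[str]
) -> Tuple[List[Fragment], List[str]]:
    """Return fragments in the pre-established sequence plus the duplicate ids removed."""
    if [fid for fid, _ in received] == list(fragment_sequence):
        # Receiving order already matches the fragment sequence.
        return list(received), []
    first_copy = {}
    duplicates: List[str] = []
    for fid, payload in received:
        if fid in first_copy:
            duplicates.append(fid)  # redundant copy, dropped
        else:
            first_copy[fid] = payload
    ordered = [(fid, first_copy[fid]) for fid in fragment_sequence if fid in first_copy]
    return ordered, duplicates


arrived = [("f2", b"b"), ("f1", b"a"), ("f2", b"b"), ("f3", b"c")]
ordered, dropped = reorder_and_dedup(arrived, ["f1", "f2", "f3"])
```

If `ordered` is shorter than the fragment sequence, some fragment is missing and the fragments cannot be combined directly.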
Further, the step of collecting, based on the central control content preset by the KVM agent, the return signal sent by each controlled device to the KVM agent further comprises:
identifying, based on the controllable devices pre-recorded by the KVM agent, the controlled devices that correspond to those controllable devices within a preset control area;
judging whether a controlled device can establish a communication bridge with the KVM agent;
if yes, separating the uncontrolled devices from the controlled devices in the control area, constructing a separate short-term communication area for the controlled devices, outputting a communication request to the controlled devices through the KVM agent, and acquiring the communication authority of the controlled devices from the KVM agent according to the communication request.
The invention also provides a resource scheduling system for an environment-controlled distributed KVM agent, comprising:
an acquisition module, configured to collect, based on central control content preset by the KVM agent, the return signal sent by each controlled device to the KVM agent;
a judging module, configured to judge whether the return duration of the return signal exceeds a preset duration;
an execution module, configured to, if yes, acquire the communication state among the nodes in the KVM agent, identify a faulty node among the nodes according to the communication state, mark in the KVM agent the faulty device corresponding to the faulty node, apply preset decentralized storage to acquire the operation data that the faulty device is transmitting to the KVM agent through the faulty node, store the operation data in a distributed manner on the other non-faulty nodes, and generate from the KVM agent the data fragments of the operation data recorded by the non-faulty nodes;
a second judging module, configured to judge whether the data fragments can be combined to obtain the operation data;
a second execution module, configured to, if not, construct a computing task for the operation data in the KVM agent, split the computing task into corresponding computing subtasks based on the available resources of the other non-faulty nodes, schedule and allocate resources to the other non-faulty nodes according to the computing subtasks, dynamically adjust the computing resources those nodes require while executing the computing subtasks, aggregate the calculation results of the other non-faulty nodes, and integrate them to obtain the operation data.
Further, the execution module is configured for:
configuring each node in the KVM agent as a monitored object of a pre-built blockchain network, and detecting the resource lower limit of each node, wherein the resource lower limit comprises a computing resource lower limit and a storage resource lower limit;
judging whether each node can connect to the blockchain network;
if yes, periodically sending a preset heartbeat signal to each node through the blockchain network at a preset period, receiving the communication request that each node returns in response to the heartbeat signal, collecting the communication traffic among the nodes with a preset network packet capture tool, and extracting the communication parameters among the nodes from the communication traffic, wherein the communication parameters comprise delay, packet loss rate, bandwidth utilization, connection state and communication frequency.
Further, the system further comprises:
an identification module, configured to divide the data fragments into a fragment index and storage metadata based on a preset data structure, and to identify a query instruction output by the KVM agent;
a third judging module, configured to judge whether the query instruction corresponds to the location attribute of a data fragment;
a third execution module, configured to, if yes, determine the storage node corresponding to the data fragment from the fragment index and the storage metadata, initiate a data access request to the storage node according to the query instruction, read data from or write data to the storage node according to the data access request, and generate the operation log of the data fragment in the KVM agent.
The resource scheduling method and system for an environment-controlled distributed KVM agent provided by the invention have the following beneficial effects:
By monitoring the return signals against preset central control content, the invention detects an abnormal return signal in time and judges whether the abnormal condition of the return signal exceeds a preset duration. Once an abnormality is found, measures are taken immediately: the faulty node is identified quickly and the corresponding faulty device is marked, which facilitates timely fault diagnosis and handling. After the faulty node is found, computing tasks are dynamically reassigned to the other non-faulty nodes and resources are scheduled and allocated accordingly, so that tasks execute smoothly and the system runs stably.
Drawings
FIG. 1 is a flow chart of one embodiment of the resource scheduling method for an environment-controlled distributed KVM agent of the present invention;
FIG. 2 is a block diagram of one embodiment of the resource scheduling system for an environment-controlled distributed KVM agent of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present invention. The objects, functional features and advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings.
The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIG. 1, a resource scheduling method for an environment-controlled distributed KVM agent according to an embodiment of the invention comprises:
S1: collecting, based on central control content preset by the KVM agent, the return signal sent by each controlled device to the KVM agent;
S2: judging whether the return duration of the return signal exceeds a preset duration;
S3: if yes, acquiring the communication state among the nodes in the KVM agent, identifying a faulty node among the nodes according to the communication state, marking in the KVM agent the faulty device corresponding to the faulty node, applying preset decentralized storage to acquire the operation data that the faulty device is transmitting to the KVM agent through the faulty node, storing the operation data in a distributed manner on the other non-faulty nodes, and generating from the KVM agent the data fragments of the operation data recorded by the non-faulty nodes;
S4: judging whether the data fragments can be combined to obtain the operation data;
S5: if not, constructing a computing task for the operation data in the KVM agent, splitting the computing task into corresponding computing subtasks based on the available resources of the other non-faulty nodes, scheduling and allocating resources to the other non-faulty nodes according to the computing subtasks, dynamically adjusting the computing resources those nodes require while executing the computing subtasks, aggregating the calculation results of the other non-faulty nodes, and integrating them to obtain the operation data.
In this embodiment, the system collects the return signals that the controlled devices output to the KVM agent, based on the central control content preset by the KVM agent, and then judges whether the return duration of each return signal exceeds the preset duration in order to execute the corresponding steps.
For example, when the system determines that the return duration of the return signal output by a controlled device to the KVM agent does not exceed the preset duration, the system considers that communication between the device and the KVM agent is normal and that no abnormal condition has occurred. The system records the normal return duration and the corresponding device as a reference for future operation, and continues to monitor the return duration so that communication between the device and the KVM agent stays in a normal state. It also periodically checks the communication state between the device and the KVM agent, so that potential faults are prevented or handled in time and the stability and reliability of the system are ensured.
For example, when the system determines that the return duration of the return signal output by a controlled device to the KVM agent exceeds the preset duration, the system considers that communication between the device and the KVM agent is abnormal. The system acquires the communication state among the nodes in the KVM agent, identifies the faulty node among the nodes according to their differing communication states, and marks in the KVM agent the faulty device corresponding to the faulty node that cannot communicate. It then applies the preset decentralized storage to acquire the operation data that the faulty device is transmitting through the faulty node, stores the operation data in a distributed manner on the other adjacent non-faulty nodes, and generates from the KVM agent the data fragments of the operation data recorded by the non-faulty nodes. By acquiring the communication state among the nodes and identifying the faulty node, the system can quickly locate the device with abnormal communication, which helps find faults quickly, reduces fault handling time and improves the availability of the system. The preset decentralized storage mechanism ensures that the operation data being transmitted by the faulty device is acquired in time and stored in a distributed manner on the other adjacent non-faulty nodes, so that even if one node fails the data can still be stored and used by other nodes, which ensures the reliability and durability of the data. Because the operation data is stored in a distributed manner and data fragments are generated, redundant storage and recovery of the data are achieved: even if one node fails, the lost data fragments can be recovered through other nodes, which ensures the integrity and availability of the data.
The system then judges whether the data fragments on the non-faulty nodes can be recombined into the original operation data of the faulty device, in order to execute the corresponding steps.
For example, when the system determines that the data fragments on the non-faulty nodes can be recombined into the original operation data of the faulty device, the system considers that the data of the faulty device has been successfully recovered and rebuilt on the other non-faulty nodes. The system recombines the data fragments recorded on the non-faulty nodes into complete operation data, ensuring the order and integrity of the fragments while merging them. It then applies the recombined operation data to the system and restores the function and state of the faulty device, which includes configuring, updating or repairing the system so that it operates normally and returns to its state before the fault. After the function of the faulty device is restored, the system continues to monitor its own operation and makes the necessary adjustments and optimizations so that it keeps running stably, and the fault handling process is summarized and reviewed in order to improve the fault tolerance and stability of the system.
For example, when the system determines that the data fragments on the non-faulty nodes cannot be recombined into the original operation data of the faulty device, the system considers that the data of the faulty device cannot be recovered and rebuilt directly on the other non-faulty nodes. The system constructs a computing task for the operation data in the KVM agent, splits the computing task into corresponding computing subtasks based on the available resources of the other non-faulty nodes, schedules and allocates resources to those nodes according to each computing subtask, dynamically adjusts the computing resources they require while executing the subtasks, and aggregates their calculation results to recombine the operation data of the faulty device. Through dynamic scheduling and resource allocation, the system ensures that the computing task executes smoothly on the other non-faulty nodes, which improves the availability of the system: even if one node fails, the data can be rebuilt and recovered through other nodes. Recovering the operation data of the faulty device through data aggregation and recombination also reduces the risk of data loss, because the system can regenerate the data through the computing task even when it cannot be recovered directly. By rebuilding the computing task and scheduling resources dynamically, the data can still be processed and rebuilt effectively in that case, which improves the elasticity and fault tolerance of the system, reduces the impact of faults and ensures stable operation.
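The branching of steps S2, S4 and S5 can be summarized in a short Python sketch. It is only an outline under simplifying assumptions: step S3 (fault identification and distributed storage) is assumed to have already produced the list of fragments, a missing fragment is represented by `None`, and the function name `recover_operation_data` and the `recompute` callback are hypothetical names introduced here.

```python
from typing import Callable, List, Optional


def recover_operation_data(
    return_duration_s: float,
    preset_duration_s: float,
    fragments: List[Optional[bytes]],
    recompute: Callable[[], bytes],
) -> Optional[bytes]:
    """Return the recovered operation data, or None when communication is normal."""
    # S2: a return within the preset duration means communication is normal.
    if return_duration_s <= preset_duration_s:
        return None
    # S4: the fragments can be combined only if none of them is missing.
    present = [f for f in fragments if f is not None]
    if len(present) == len(fragments):
        return b"".join(present)
    # S5: otherwise regenerate the data through a computing task on healthy nodes.
    return recompute()


normal = recover_operation_data(0.2, 1.0, [b"ab", b"cd"], lambda: b"regenerated")
merged = recover_operation_data(2.0, 1.0, [b"ab", b"cd"], lambda: b"regenerated")
rebuilt = recover_operation_data(2.0, 1.0, [b"ab", None], lambda: b"regenerated")
```

The callback keeps the sketch independent of how the computing subtasks are actually distributed and aggregated.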
It should be noted that a specific example of aggregating the calculation results of the other non-faulty nodes is as follows:
Assume a KVM agent system comprising 3 nodes, each responsible for monitoring and controlling one server, where the status information of each server includes CPU utilization and memory utilization. The steps are as follows:
First, data is collected: each node periodically gathers the status information of the server it is responsible for, including CPU utilization and memory utilization, and reports it in numeric form to the central control node. After collecting the status information reported by each node, the central control node integrates and validates the data to ensure it is complete and accurate, checking for missing or erroneous values. The central control node then aggregates the reported server status information, which includes calculating the average CPU utilization and the average memory utilization of all servers. Finally, it generates system-wide status information and outputs it to a monitoring panel for the KVM agent administrator to view; for example, the monitoring panel shows that the average CPU utilization of the whole system is 60% and the average memory utilization is 70%.
In the above example, data aggregation consists of summarizing and calculating over the server status information reported by each node to obtain comprehensive status information for the whole system.
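The aggregation in this example reduces to validating the reports and averaging them. The Python sketch below shows one way to do that; the per-node values are invented so that the averages come out at the 60% and 70% quoted in the example, and the function name `aggregate_status` and the report layout are assumptions.

```python
from statistics import mean
from typing import Dict


def _is_valid(report: dict) -> bool:
    """A report is usable if both fields are numbers within 0..100 percent."""
    return all(
        isinstance(report.get(key), (int, float)) and 0 <= report[key] <= 100
        for key in ("cpu", "mem")
    )


def aggregate_status(reports: Dict[str, dict]) -> dict:
    """Average CPU and memory utilization over all valid node reports."""
    valid = {node: r for node, r in reports.items() if _is_valid(r)}
    if not valid:
        raise ValueError("no valid reports to aggregate")
    return {
        "avg_cpu": mean(r["cpu"] for r in valid.values()),
        "avg_mem": mean(r["mem"] for r in valid.values()),
        "nodes_used": sorted(valid),
    }


# Hypothetical reports from the 3 nodes in the example.
reports = {
    "node-1": {"cpu": 50, "mem": 60},
    "node-2": {"cpu": 60, "mem": 70},
    "node-3": {"cpu": 70, "mem": 80},
}
panel = aggregate_status(reports)
```

Reports with missing or out-of-range values are dropped before averaging, which corresponds to the integrity check performed by the central control node.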
In this embodiment, the step S3 of acquiring the communication state among the nodes in the KVM agent and identifying the faulty node among the nodes according to the communication state comprises:
S31: configuring each node in the KVM agent as a monitored object of a pre-built blockchain network, and detecting the resource lower limit of each node, wherein the resource lower limit comprises a computing resource lower limit and a storage resource lower limit;
S32: judging whether each node can connect to the blockchain network;
S33: if yes, periodically sending a preset heartbeat signal to each node through the blockchain network at a preset period, receiving the communication request that each node returns in response to the heartbeat signal, collecting the communication traffic among the nodes with a preset network packet capture tool, and extracting the communication parameters among the nodes from the communication traffic, wherein the communication parameters comprise delay, packet loss rate, bandwidth utilization, connection state and communication frequency.
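The source lists the communication parameters but does not state how they are turned into a fault decision. The Python sketch below shows one plausible rule, offered purely as an assumption: a node is treated as faulty if it has missed several heartbeat periods or if its extracted parameters cross illustrative thresholds. The names `CommParams` and `find_faulty_nodes` and every threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CommParams:
    delay_ms: float
    packet_loss: float      # fraction between 0 and 1
    bandwidth_util: float   # fraction between 0 and 1
    connected: bool
    msgs_per_min: float     # communication frequency


def find_faulty_nodes(
    params: Dict[str, CommParams],
    last_reply_s: Dict[str, float],
    now_s: float,
    heartbeat_period_s: float,
    max_missed: int = 3,          # assumed tolerance, not from the source
    max_delay_ms: float = 500.0,  # assumed threshold
    max_loss: float = 0.2,        # assumed threshold
) -> List[str]:
    """Return the ids of nodes judged faulty, sorted for stable output."""
    faulty = []
    for node, p in params.items():
        # A node that never replied is treated as silent forever.
        silent_for = now_s - last_reply_s.get(node, float("-inf"))
        missed = silent_for > max_missed * heartbeat_period_s
        degraded = (not p.connected) or p.delay_ms > max_delay_ms or p.packet_loss > max_loss
        if missed or degraded:
            faulty.append(node)
    return sorted(faulty)


observed = {
    "n1": CommParams(20.0, 0.0, 0.3, True, 60.0),
    "n2": CommParams(900.0, 0.0, 0.3, True, 60.0),  # excessive delay
    "n3": CommParams(20.0, 0.0, 0.3, True, 60.0),   # never answered a heartbeat
}
faulty = find_faulty_nodes(observed, {"n1": 95.0, "n2": 95.0}, now_s=100.0, heartbeat_period_s=5.0)
```

Bandwidth utilization and communication frequency are carried in the record but not used by this rule; a real deployment would choose its own criteria.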
In this embodiment, the system configures each node in the KVM agent as a monitoring object of the pre-built blockchain network and, at the same time, detects the resource lower limit of each node; the system then determines whether each node can successfully connect to the blockchain network, so as to execute the corresponding steps. For example, when the system determines that some nodes in the KVM agent cannot connect to the blockchain network, it considers that these nodes have encountered connection problems or network faults while performing blockchain-related tasks, so that blockchain data cannot be obtained from them and they cannot submit data to the blockchain network. The system performs fault diagnosis on these nodes to determine the specific cause of the fault, which may include network configuration problems, node software or hardware faults, and network delay. It also checks the network connections of these nodes to ensure that the network configuration is correct and that they can reach the nodes hosting the blockchain network, attempts to restart the affected nodes to resolve connection faults that may be caused by software or configuration problems, and rechecks after the restart whether they can reconnect to the blockchain network. Conversely, when the system determines that every node in the KVM agent can connect to the blockchain network, it considers that the nodes are able to perform blockchain-related tasks; it periodically sends the preset heartbeat signal to each node through the blockchain network within the preset period, receives the communication requests that the nodes return to the KVM agent in response to the heartbeat signals, collects the communication traffic between the nodes using the preset network packet capture tool, and extracts the communication parameters between the nodes from that traffic. By sending heartbeat signals regularly and receiving the nodes' communication requests, the system can monitor the response state of each node in real time: if a node fails to respond to the heartbeat signal or the communication request, the system immediately detects and marks it and confirms it as a fault node. By analyzing the communication traffic, the system can also detect network congestion, packet loss, and delay, so that managers can take timely measures to optimize network performance. Because the system obtains the communication state and network quality parameters of each node in real time, it provides managers with real-time feedback and decision support, and managers can adjust the system configuration and optimize the network layout accordingly to ensure the stability and performance of the system.
It should be noted that monitoring the resource lower limit helps the system prevent potential failures: if a node's resources approach or fall below the lower limit, the node is predicted to be at risk of becoming a fault node, and the system issues an alarm and takes measures, such as adding resources or reallocating tasks, to prevent the node from failing due to insufficient resources.
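By way of non-limiting illustration, the heartbeat-based identification of fault nodes and the resource lower-limit check can be sketched together as follows. The `NodeStatus` fields, the `max_missed` rule (a node is treated as failed after missing that many heartbeat periods), and the numeric floors are assumptions made for this sketch; the embodiment does not fix these values, and the blockchain transport itself is not modeled.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeStatus:
    name: str
    last_reply: Optional[float]  # time of the last heartbeat reply, None if never seen
    free_cpu: float              # available computing resource (assumed unit: cores)
    free_storage: float          # available storage resource (assumed unit: GB)

def classify_nodes(nodes, now, period, max_missed=3,
                   cpu_floor=1.0, storage_floor=10.0):
    """Return (failed, at_risk) node names.

    failed: no heartbeat reply within max_missed periods.
    at_risk: responsive, but at or below a resource lower limit.
    """
    failed, at_risk = [], []
    deadline = now - period * max_missed
    for n in nodes:
        if n.last_reply is None or n.last_reply < deadline:
            failed.append(n.name)
        elif n.free_cpu <= cpu_floor or n.free_storage <= storage_floor:
            at_risk.append(n.name)
    return failed, at_risk

nodes = [
    NodeStatus("a", last_reply=100.0, free_cpu=4.0, free_storage=50.0),
    NodeStatus("b", last_reply=80.0, free_cpu=4.0, free_storage=50.0),
    NodeStatus("c", last_reply=99.0, free_cpu=0.5, free_storage=50.0),
]
failed, at_risk = classify_nodes(nodes, now=101.0, period=5.0)
```

Here node `b` is marked as a fault node because its last reply is older than three heartbeat periods, while node `c` still responds but triggers the early warning because its available computing resource is below the assumed floor.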
In this embodiment, the step S3 of storing the operation data in other non-faulty nodes, and generating, from the KVM agents, each data fragment in which the non-faulty node records the operation data, further includes:
S301: dividing the data fragments into fragment indexes and storage metadata based on a preset data structure, and identifying a query instruction output by the KVM agent;
S302: judging whether the query instruction can correspond to the position attribute of the data fragment or not;
S303: if yes, determining a storage node corresponding to the data fragment from the fragment index and the storage metadata, initiating a data access request to the storage node according to the query instruction, reading or inputting data to the storage node according to the data access request, and generating the operation log content of the data fragment in the KVM agent.
In this embodiment, the system divides the data fragments into a fragment index and storage metadata based on the preset data structure and, at the same time, identifies the query instruction output by the KVM agent; the system then determines whether the query instruction corresponds to the position attribute of a data fragment, so as to execute the corresponding steps. For example, when the system determines that the query instruction sent by the KVM agent does not correspond to the position attribute of any data fragment, it considers that the instruction cannot be correctly located to the corresponding data fragment, that is, the data fragment at the specified position cannot be found, either because the position attribute of the data fragment does not match the position information provided in the instruction or because the data fragment is lost or damaged. The system checks whether the position information provided in the query instruction matches the position attribute of the data fragment; because the position information in the instruction may be wrong or inaccurate, it must be verified and corrected by a manager. If no data fragment matching the instruction can be found, the system tries to find backup data, which includes obtaining copies of the data fragment from other nodes or using other methods to temporarily bypass the data fragment at the specified position. If the data fragment is lost or damaged, the system advises the manager to carry out data repair and restoration, which includes repairing the lost data fragment using redundant data or error correction codes, or reacquiring it from other sources. Conversely, when the system determines that the query instruction sent by the KVM agent corresponds to the position attribute of a data fragment, it considers that the query instruction can locate the corresponding data fragment on certain nodes; the system determines the storage node corresponding to the data fragment from the fragment index and the storage metadata, initiates a data access request to that storage node according to the query instruction issued by the manager in the KVM agent, and reads data from or writes data to the storage node according to the data access request, thereby generating the operation log content of the data fragment in the KVM agent. Because the query instruction lets the system correctly locate the storage node of a data fragment and initiate a data access request to it, data query and access can be performed effectively in the KVM agent, which improves data availability and access efficiency. By generating the operation log content of the data fragment in the KVM agent, the system records the operation information of each data access request, including the read or write operation performed, the time, and the node information, which provides an important basis for subsequent data management, auditing, and tracing by managers. Generating the data operation log in the KVM agent also helps ensure the consistency and integrity of data operations, since all data access requests and operation logs are managed centrally, the data is effectively monitored and managed, and data inconsistency or loss is prevented.
It should be noted that data reading or data entry means initiating a data access request to the determined storage node through the KVM agent in order to read or write data: when reading data, an application program sends a read request to the storage node and obtains the content of the data fragment; when writing data, an application program sends a write request to the storage node and writes the data fragment to it.
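By way of non-limiting illustration, the lookup of a storage node through the fragment index, the read and write requests, and the generation of the operation log can be sketched as follows. The class name `ShardStore`, its in-memory dictionaries, and the log entry fields are hypothetical stand-ins chosen for this sketch; a real deployment would issue network requests to the storage nodes.

```python
import time

class ShardStore:
    """In-memory stand-in for fragment index, storage metadata and storage nodes."""

    def __init__(self):
        self.index = {}     # fragment id -> storage node name (fragment index)
        self.metadata = {}  # fragment id -> storage metadata (here: size only)
        self.nodes = {}     # node name -> {fragment id: bytes}
        self.log = []       # operation log entries

    def put(self, shard_id, node, data):
        """Write request: store the fragment on the node and update the index."""
        self.nodes.setdefault(node, {})[shard_id] = data
        self.index[shard_id] = node
        self.metadata[shard_id] = {"size": len(data)}
        self._record("write", shard_id, node)

    def get(self, shard_id):
        """Read request: resolve the node via the index, or log a miss."""
        node = self.index.get(shard_id)
        if node is None:
            self._record("miss", shard_id, None)
            return None
        self._record("read", shard_id, node)
        return self.nodes[node][shard_id]

    def _record(self, op, shard_id, node):
        # Each entry captures the operation, the fragment, the node and the time.
        self.log.append({"op": op, "shard": shard_id, "node": node,
                         "time": time.time()})

store = ShardStore()
store.put("s1", "node-2", b"abc")
hit = store.get("s1")    # located through the fragment index
miss = store.get("s9")   # no matching position attribute
```

A miss is logged rather than raised so that the caller can fall back to backup copies or repair, as described for the case in which the query instruction does not correspond to any position attribute.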
In this embodiment, a computing task of the operation data is built in the KVM agent, and the step S5 of distributing the computing task into corresponding computing sub-tasks based on available resources of the other non-faulty nodes further includes:
S51: selecting a pre-recorded scheduling algorithm to coordinate the execution order and allocation strategy of the computing task based on task resource requirements preset by the KVM agent, wherein the scheduling algorithm specifically comprises first come first served, shortest job first, and shortest remaining time first;
S52: judging whether the resource load of the computing task during execution exceeds a load threshold preset by the KVM agent;
S53: if yes, acquiring real-time load information and task queue data of the KVM agent, performing dynamic resource scheduling and resource reallocation for the computing task according to the real-time load information and the task queue data, and generating an execution state of the computing task on the KVM agent in real time, wherein the execution state specifically comprises waiting for execution, executing, abnormal execution, and execution completed.
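By way of non-limiting illustration, the three scheduling algorithms named in step S51 can be compared on a small task set as follows. The `Task` fields, the integer time units, and the single-executor model are assumptions made for this sketch; the embodiment does not prescribe how the algorithms are implemented.

```python
import heapq
from collections import namedtuple

# arrival and burst are integer time units (an assumption of this sketch).
Task = namedtuple("Task", "name arrival burst")

def fcfs(tasks):
    """First come first served: run in order of arrival."""
    return [t.name for t in sorted(tasks, key=lambda t: t.arrival)]

def sjf(tasks):
    """Non-preemptive shortest job first: returns the order in which tasks start."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    clock, order, ready, i = 0, [], [], 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i].arrival <= clock:
            t = pending[i]
            heapq.heappush(ready, (t.burst, t.arrival, t.name))
            i += 1
        if not ready:                  # idle until the next arrival
            clock = pending[i].arrival
            continue
        burst, _, name = heapq.heappop(ready)
        order.append(name)
        clock += burst                 # run the chosen task to completion
    return order

def srtf(tasks):
    """Preemptive shortest remaining time first: returns the completion order."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    clock, done, ready, i = 0, [], [], 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i].arrival <= clock:
            t = pending[i]
            heapq.heappush(ready, (t.burst, t.arrival, t.name))
            i += 1
        if not ready:
            clock = pending[i].arrival
            continue
        remaining, arrival, name = heapq.heappop(ready)
        clock += 1                     # run one time unit, then re-evaluate
        if remaining > 1:
            heapq.heappush(ready, (remaining - 1, arrival, name))
        else:
            done.append(name)
    return done

tasks = [Task("A", 0, 8), Task("B", 1, 4), Task("C", 2, 2)]
orders = {"fcfs": fcfs(tasks), "sjf": sjf(tasks), "srtf": srtf(tasks)}
```

For this task set, first come first served runs A, B, C; shortest job first starts A (the only task present at time 0) and then prefers C over B; shortest remaining time first preempts A and completes C, then B, then A.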
In this embodiment, the system selects a pre-recorded scheduling algorithm to coordinate the execution order and allocation strategy of the computing task based on the task resource requirements preset by the KVM agent, and then determines whether the resource load of the computing task during execution exceeds the load threshold preset by the KVM agent, so as to execute the corresponding steps. For example, when the system determines that the resource load does not exceed the preset load threshold, it considers that current resource usage is healthy and no overload has occurred. The system nevertheless continues to monitor resource usage: it periodically collects resource usage data and compares it with the preset load threshold so that potential overload problems are discovered and prevented in time. Continued monitoring of resource utilization also reveals utilization trends, so that resource allocation and task scheduling strategies can be adjusted when needed to adapt to changes in system load. The system records the current resource load data and periodically analyzes and evaluates it, which helps the system administrator understand the working state of the system, discover potential problems, and take timely measures to resolve them. Conversely, when the system determines that the resource load of the computing task during execution exceeds the load threshold preset by the KVM agent, it considers that resource usage is overloaded; the system acquires the real-time load information and task queue data of the KVM agent, performs dynamic resource scheduling and resource reallocation for the computing task according to that information, and generates the execution state of the computing task on the KVM agent in real time. By acquiring the real-time load information and task queue data of the KVM agent, the system can dynamically schedule computing tasks according to current resource usage, which avoids resource overload and ensures the stability and performance of the system. When the resource load exceeds the preset threshold, the system can reallocate resources among the computing tasks according to the real-time load information and task queue data to balance the system load, which helps make full use of system resources and improves efficiency and performance. Generating the execution state of the computing task on the KVM agent in real time lets managers monitor task execution as it happens, learn the progress and resource occupation of each task, and find and solve problems arising during task execution in time.
Specific examples of dynamic resource scheduling and resource reallocation are as follows:
Assume that a KVM agent contains a plurality of nodes, that the CPU utilization of one node exceeds the preset threshold, and that the resource utilization of the other nodes is lower; the system can then resolve the problem through dynamic resource scheduling and resource reallocation;
Dynamic resource scheduling: the system adjusts the execution order or priority of tasks according to the task queue data, so that nodes with higher resource utilization preferentially execute tasks with smaller resource requirements, and nodes with lower resource utilization execute tasks with larger resource requirements;
Resource reallocation: the system identifies the node with high CPU utilization as a resource bottleneck and, by dynamically adjusting resources, reassigns part of the computing tasks from that node to other nodes with lower resource utilization so as to balance the system load.
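By way of non-limiting illustration, the resource reallocation in the example above can be sketched as a greedy rebalancing step. The function name `rebalance`, the representation of each task as a `(name, cost)` pair whose cost is its contribution to the node's CPU load in percent, and the rule of moving the smallest tasks first are assumptions made for this sketch.

```python
def rebalance(loads, tasks, threshold):
    """Move tasks off overloaded nodes onto the currently least-loaded node.

    loads: node -> current CPU load in percent (updated in place)
    tasks: node -> list of (task_name, cost) pairs (updated in place)
    Returns the list of moves as (task_name, source, destination).
    """
    moves = []
    # Visit the most heavily loaded nodes first.
    for src in sorted(loads, key=loads.get, reverse=True):
        # Try the smallest tasks first so each move is cheap to absorb.
        for name, cost in sorted(tasks[src], key=lambda t: t[1]):
            if loads[src] <= threshold:
                break
            dst = min(loads, key=loads.get)
            # Skip moves that would overload the destination.
            if dst == src or loads[dst] + cost > threshold:
                continue
            tasks[src].remove((name, cost))
            tasks[dst].append((name, cost))
            loads[src] -= cost
            loads[dst] += cost
            moves.append((name, src, dst))
    return moves

loads = {"n1": 90, "n2": 30, "n3": 35}
tasks = {
    "n1": [("t1", 10), ("t2", 25), ("t3", 55)],
    "n2": [("t4", 30)],
    "n3": [("t5", 35)],
}
moves = rebalance(loads, tasks, threshold=70)
```

In this sample, node `n1` starts above the 70% threshold; two of its smaller tasks are reassigned to `n2` and `n3`, after which every node is at or below the threshold and the largest task stays where it is.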
In this embodiment, in step S2 of determining whether the return duration of the return signal exceeds the preset duration, the method further includes:
S21: acquiring internal clock data preset by the KVM agent;
S22: judging whether the internal clock data matches pre-recorded common clock data;
S23: if not, synchronizing the internal clock data based on the common clock data, and performing error correction on each node according to the internal clock data.
In this embodiment, the system acquires the internal clock data preset by the KVM agent and then determines whether it matches the pre-recorded common clock data, so as to execute the corresponding steps. For example, when the system determines that the internal clock data matches the common clock data, it considers that the query instructions output by the KVM agent to each node carry no timing error; this can be verified by comparing the timestamp of a query instruction with the internal clock data of the system, which confirms that the timing of the query instructions output by the KVM agent is consistent with the internal clock and unaffected by clock error. Conversely, when the system determines that the internal clock data does not match the common clock data, it considers that the query instructions output by the KVM agent to each node may carry a timing error, causing the instructions received by the nodes to be delayed; the system synchronizes the internal clock data based on the common clock data and performs error correction on each node from the KVM agent according to the internal clock data. Synchronizing the internal clock data reduces the timing error of the query instructions output by the KVM agent to each node and therefore the delay with which nodes receive instructions, which helps improve the timing precision and response speed of the system. Correcting the error of each node improves the accuracy and stability of instruction transmission and ensures that the instructions received by each node are executed accurately and on time, which helps ensure the normal operation and stability of the system. It also ensures that each node receives the same instruction at the same point in time, which helps ensure the consistency and synchronization of data among the nodes and is very important for data management and processing in a distributed system.
It should be noted that a specific example of the error correction is as follows:
First, the system collects the clock data of each node, including the node's current clock value and its deviation from the common clock data. It compares the clock data of each node with the common clock data and calculates the deviation of each node relative to the common clock; this deviation represents the error of the node clock relative to the common clock. The system then corrects the clock data of each node according to the calculated error by increasing or decreasing the node's clock value so that it stays synchronized with the common clock, and sends the corrected clock data to each node so that the node updates its own clock value; the clocks of all nodes are thereby synchronized to values close to the common clock, which reduces the timing error between nodes. Finally, after the error correction is completed, the system continues to monitor the clock data of each node and checks the effect of the correction; if the clock data of a node is found to deviate too much, the manager needs to correct it again or adjust the correction strategy. Because clock drift and error may change over time, the system needs to repeat the error correction periodically to ensure that the clock data of each node always remains synchronized with the common clock.
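By way of non-limiting illustration, the deviation calculation and correction in the example above can be sketched as follows. Clock values are taken to be integer milliseconds, and the `tolerance` parameter (below which a node is left unchanged) is an assumption of this sketch; network transmission delay, which a practical synchronization protocol must account for, is not modeled.

```python
def correct_clocks(node_clocks, common_clock, tolerance=0):
    """Compute each node's deviation from the common clock and correct it.

    node_clocks: node -> current clock value (assumed integer milliseconds)
    Returns (offsets, corrected), where offset = node clock - common clock
    and a node is corrected only if |offset| exceeds the tolerance.
    """
    offsets = {n: t - common_clock for n, t in node_clocks.items()}
    corrected = {
        # Subtracting the offset lowers a fast clock and raises a slow one.
        n: (t - offsets[n]) if abs(offsets[n]) > tolerance else t
        for n, t in node_clocks.items()
    }
    return offsets, corrected

node_clocks = {"n1": 1_000_120, "n2": 999_950, "n3": 1_000_002}
offsets, corrected = correct_clocks(node_clocks, common_clock=1_000_000,
                                    tolerance=5)
```

In this sample, `n1` runs 120 ms fast and `n2` runs 50 ms slow, so both are set to the common clock value, while `n3` deviates by only 2 ms and is left unchanged. Running the function periodically corresponds to the repeated correction that compensates for drift.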
In this embodiment, in step S4 of determining whether the data fragments can be combined to obtain the operation data, the method further includes:
S41: identifying a receiving sequence of the data fragments in the KVM agent based on a fragment order pre-established for the data fragments;
S42: judging whether the receiving sequence matches the fragment order;
S43: if not, detecting the number of fragments in the fragment order, collecting the repeated data fragments present in the receiving sequence according to the number of fragments, and removing the repeated data fragments as redundancy in the KVM agent.
In this embodiment, the system identifies the receiving sequence of the data fragments in the KVM agent based on the fragment order pre-established for the data fragments, and then determines whether the receiving sequence matches the fragment order, so as to execute the corresponding steps. For example, when the system determines that the receiving sequence matches the pre-established fragment order, it considers that the order in which the nodes received the data fragments is consistent with the expected order, which means that the transmission and assembly of the data fragments completed successfully. The system checks the receiving sequence against the expected fragment order to confirm that all data fragments have been received by the nodes and combined in the expected order. Because the data fragments reached each node in the correct order, subsequent processing steps can proceed smoothly without data assembly errors or losses. While those steps are performed, the system records the receiving state of the data fragments and the execution of the subsequent processing steps, which helps the system monitor the progress of data transmission and processing and lets managers investigate and repair problems promptly when they are detected. Conversely, when the system determines that the receiving sequence does not match the pre-established fragment order, it considers that the order in which the nodes received the data fragments is inconsistent with the expected order; the system detects the number of fragments in the fragment order, collects the repeated data fragments present in the receiving sequence according to that number, and removes the repeated data fragments as redundancy in the KVM agent. By detecting the number of fragments and removing repeated data fragments, the system attempts to recover the correct order of the data; although this cannot guarantee that all data fragments are recovered in the expected order, it reduces the impact of the inconsistent order as far as possible. Removing repeated data fragments also reduces data redundancy and saves storage space and network bandwidth, which lowers the resources required for data storage and transmission and improves the efficiency and performance of the system. Reducing data redundancy and restoring the correct data order further lowers the likelihood of errors, which improves the reliability of data transmission and processing and the stability of the system.
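By way of non-limiting illustration, the detection of repeated fragments and the restoration of the fragment order can be sketched as follows. Representing each received fragment as a `(sequence number, payload)` pair with sequence numbers starting at 0 is an assumption of this sketch, as is the function name `normalize_fragments`.

```python
def normalize_fragments(received, expected_count):
    """Deduplicate and reorder fragments received out of order.

    received: list of (seq_no, payload) pairs in arrival order
    expected_count: number of fragments in the pre-established fragment order
    Returns (ordered_payloads, duplicate_seq_nos, missing_seq_nos).
    """
    seen, duplicates = {}, []
    for seq, payload in received:
        if seq in seen:
            duplicates.append(seq)   # repeated fragment: keep the first copy only
        else:
            seen[seq] = payload
    missing = [s for s in range(expected_count) if s not in seen]
    ordered = [seen[s] for s in sorted(seen)]
    return ordered, duplicates, missing

received = [(0, b"he"), (2, b"o!"), (1, b"ll"), (2, b"o!")]
ordered, duplicates, missing = normalize_fragments(received, expected_count=3)
data = b"".join(ordered)
```

In this sample the fragments arrive out of order and fragment 2 arrives twice; the duplicate is removed, the fragments are restored to the expected order, and no fragment is missing. A non-empty `missing` list would indicate that the expected order cannot be fully recovered from what was received.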
In this embodiment, the step S1 of collecting the return signals of each controlled device to the KVM agent based on the central control content preset by the KVM agent further includes:
S11: identifying, based on the controllable devices pre-recorded by the KVM agent, the controlled devices corresponding to the controllable devices in a preset control area;
S12: judging whether the controlled devices can establish a communication bridge with the KVM agent;
S13: if yes, separating uncontrolled devices from controlled devices in the control area, constructing a separate short-time communication area for the controlled devices, outputting a communication request to the controlled devices through the KVM agent, and acquiring the communication authority of the controlled devices from the KVM agent according to the communication request.
In this embodiment, the system identifies the controlled devices corresponding to the controllable devices in the preset control area based on the controllable devices pre-recorded by the KVM agent, and then determines whether the controlled devices can establish a communication bridge with the KVM agent, so as to execute the corresponding steps. For example, when the system determines that a controlled device cannot establish a communication bridge with the KVM agent, it considers that the controlled device cannot communicate effectively with the KVM agent, possibly because of a device fault, a network problem, or a communication interruption from another cause. The system checks the connection state of the controlled device to ensure that its power supply, network connection, and other necessary connections work normally; if a physical connection problem is detected, the system advises the manager to resolve it promptly. The system also confirms whether the network configuration between the device and the KVM agent is correct by checking the network settings, IP address, subnet mask, and gateway parameters, ensuring that the device and the KVM agent can communicate within the same network environment. If the communication problem persists after these steps, the device itself may be faulty; a diagnostic tool or device management system is used to diagnose the fault, and the system advises the manager to take repair measures appropriate to its cause. Conversely, when the system determines that the controlled device can establish a communication bridge with the KVM agent, it considers that the KVM agent can communicate with the controlled device; the system separates uncontrolled devices from controlled devices in the control area, constructs a separate short-time communication area for the controlled devices, outputs a communication request to the controlled devices through the KVM agent, and acquires the communication authority of the controlled devices from the KVM agent according to the communication request. Separating the controlled devices within the control area and constructing a short-time communication area for them reduces the mixing of unrelated communications and improves communication efficiency, so that communication requests reach the target device quickly and accurately and the overall response speed of the system improves. Constructing a separate communication area for the controlled devices also reduces interference and conflicts in communication and improves its reliability and stability, so that communication requests are delivered to the target device smoothly and answered in time, which improves the overall reliability of the system. Finally, separating the controlled devices and establishing a short-time communication area for them improves communication security: limiting the scope and duration of the communication area effectively prevents unauthorized access and attacks and protects the security of the devices and the system.
Referring to FIG. 2, a resource scheduling system for a centralized and controlled distributed KVM agent according to an embodiment of the present invention comprises:
The acquisition module 10 is configured to collect the return signals of each controlled device to the KVM agent based on the central control content preset by the KVM agent;
The judging module 20 is configured to judge whether the return duration of the return signal exceeds a preset duration;
The execution module 30 is configured to, if yes, obtain the communication state between the nodes in the KVM agent, identify a fault node among the nodes according to the communication state, mark the fault device corresponding to the fault node in the KVM agent, apply preset decentralized storage to obtain the operation data that the fault device is transmitting to the KVM agent through the fault node, store the operation data in a distributed manner on the other non-fault nodes, and generate from the KVM agent the data fragments in which the non-fault nodes record the operation data;
The second judging module 40 is configured to judge whether the data fragments can be combined to obtain the operation data;
The second execution module 50 is configured to, if not, construct a computing task for the operation data in the KVM agent, distribute the computing task into corresponding computing subtasks based on the available resources of the other non-fault nodes, perform resource scheduling and allocation for the other non-fault nodes according to the computing subtasks, dynamically adjust the computing resources required by the other non-fault nodes when executing the computing subtasks, and aggregate the computing results of the other non-fault nodes to recombine the operation data.
In this embodiment, the acquisition module 10 acquires return signals output by the KVM agent to each controlled device by the KVM agent based on the central control content preset by the KVM agent, and then the judgment module 20 judges whether the return duration of the return signals exceeds the preset duration to execute corresponding steps; For example, when the system determines that the return time length of the return signal output by the controlled device to the KVM agent does not exceed the preset time length, the system considers that the communication between the device and the KVM agent is normal, no abnormal condition occurs, the system records the normal return signal time length and the corresponding device thereof, so as to be used as a reference for the operation of the system in the future, and continuously monitors the time length of the return signal at the same time, so as to ensure that the communication between the device and the KVM agent is always in a normal state, and periodically checks the communication state between the device and the KVM agent, prevents potential faults and timely processes the potential faults so as to ensure the stability and reliability of the system; For example, when the system determines that the return time of the return signal output by the controlled device to the KVM agent exceeds the preset time, the execution module 30 considers that the communication between the device and the KVM agent is abnormal, the system obtains the communication state between the nodes in the KVM agent, identifies the fault node from the nodes according to the different communication states, marks the fault device corresponding to the fault node which cannot be communicated in the KVM agent, applies the preset decentralization storage to obtain the operation data being transmitted by the fault device through the fault node, distributes and stores the operation data on other adjacent non-fault nodes, Generating fragments of various 
data recorded by non-fault nodes on the operation data from the KVM agents, the system can quickly locate equipment with abnormal communication by acquiring the communication state among the nodes in the KVM agents and identifying the fault nodes, is favorable for quickly finding faults and reducing fault processing time, improves the usability of the system, simultaneously applies a preset decentralizing storage mechanism, ensures that the operation data transmitted by the fault equipment can be timely acquired, distributes and stores the data on other adjacent non-fault nodes, so that even if one node fails, the data can be stored and used by other nodes, The reliability and the durability of the data are ensured, the operation data which are being transmitted by the fault equipment are distributed and stored on other adjacent non-fault nodes, and various data fragments of the operation data are generated, so that the redundant storage and recovery of the data are realized, even if one node breaks down, the lost data fragments can be recovered through other nodes, and the integrity and the usability of the data are ensured; Then the second judging module 40 judges whether the data fragments of the non-fault node can be recombined to obtain the original operation data of the fault equipment so as to execute the corresponding steps; For example, when the system determines that the data fragments of the non-fault node can be recombined to obtain the original operation data of the fault device, the system considers that the data of the fault device has been successfully recovered and rebuilt on other non-fault nodes, the system can recombine the data fragments recorded on the non-fault node into complete operation data, ensure the sequence and the integrity of the data fragments, perform the data merging and combining operation, simultaneously apply the operation data obtained by the recombination to the system, recover the function and the state of the fault device, 
including configuring, updating or repairing the system to ensure that the system can operate normally and return to its state before the fault occurred. After the function of the faulty device is restored, the system continues to monitor its operating condition and makes the necessary adjustments and optimizations so that it keeps running stably, and the fault handling process is summarized and reviewed to improve the fault tolerance and stability of the system. For example, when the system determines that the data fragments of the non-faulty nodes cannot be recombined to obtain the original operation data of the faulty device, the second execution module 50 considers that the data of the faulty device cannot be recovered and reconstructed on the other non-faulty nodes. The system then constructs a calculation task for the operation data in the KVM agent, divides the calculation task into corresponding calculation subtasks based on the available resources of the other non-faulty nodes, performs resource scheduling and allocation for the other non-faulty nodes according to each calculation subtask, dynamically adjusts the calculation resources the other non-faulty nodes require when executing the calculation subtasks, and aggregates the calculation results of the other non-faulty nodes to recombine the operation data of the faulty device. Through dynamic scheduling and resource allocation, the system ensures that the calculation task can be executed smoothly on the other non-faulty nodes, which improves availability: even if one node fails, the data can be reconstructed and recovered through the other nodes. By aggregating and recombining data, the system also reduces the risk of data loss, because even when the data of the faulty device cannot be recovered directly it can still be regenerated through the calculation task. This improves the elasticity and fault tolerance of the system, reduces the influence of faults, and helps keep the system running stably.
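As a non-limiting illustration, the division of a calculation task into subtasks according to the available resources of the non-faulty nodes, followed by aggregation of the partial results, might be sketched as follows. The `Node` type, the use of free CPU cores as the resource measure, and the proportional split are assumptions made for the example; the embodiment does not prescribe a particular allocation rule.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    available_cpu: int  # hypothetical resource measure: free cores
    faulty: bool = False


def split_task(total_units: int, nodes: list[Node]) -> dict[str, int]:
    """Split work units across non-faulty nodes in proportion to free CPU."""
    healthy = [n for n in nodes if not n.faulty and n.available_cpu > 0]
    if not healthy:
        raise RuntimeError("no non-faulty node has spare capacity")
    capacity = sum(n.available_cpu for n in healthy)
    shares = {n.name: total_units * n.available_cpu // capacity for n in healthy}
    # Integer division can leave a remainder; give it to the largest nodes first.
    leftover = total_units - sum(shares.values())
    for n in sorted(healthy, key=lambda n: -n.available_cpu)[:leftover]:
        shares[n.name] += 1
    return shares


def aggregate(partials: dict[str, list[int]]) -> list[int]:
    """Concatenate partial results in node-name order to rebuild the data."""
    out: list[int] = []
    for name in sorted(partials):
        out.extend(partials[name])
    return out
```

In this sketch a faulty node never receives a subtask, and the sum of the shares always equals the size of the original task.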
In this embodiment, the execution module includes:
Configuring each node in the KVM agent as a monitoring object of the blockchain network based on a pre-built blockchain network, and detecting a resource lower limit of each node, wherein the resource lower limit specifically comprises a computing resource lower limit and a storage resource lower limit;
Judging whether each node can be connected to the blockchain network;
If yes, sending a preset heartbeat signal to each node periodically in a preset period through the blockchain network, receiving a communication request returned by each node according to the heartbeat signal, collecting communication traffic among the nodes by using a preset network packet capturing tool, and extracting communication parameters among the nodes from the communication traffic, wherein the communication parameters specifically comprise delay, packet loss rate, bandwidth utilization rate, connection state and communication frequency.
In this embodiment, the system configures each node in the KVM agent as a monitoring object of a pre-built blockchain network and detects the resource lower limit of each node, and then determines whether each node can connect to the blockchain network so as to execute the corresponding steps. For example, when the system determines that some nodes in the KVM agent cannot connect to the blockchain network, it considers that these nodes have encountered a connection problem or network fault when performing blockchain-related tasks, so that blockchain data cannot be acquired from them and they cannot submit data to the blockchain network. The system performs fault diagnosis on these nodes to determine the specific cause, which may include network configuration problems, node software or hardware faults, and network delay. It also checks the network connection of the nodes to ensure that the network configuration is correct and that they can reach the nodes hosting the blockchain network, restarts the unreachable nodes to try to clear connection faults caused by software or configuration problems, and after the restart rechecks whether they can reconnect to the blockchain network. For example, when the system determines that every node in the KVM agent can connect to the blockchain network, it considers that the nodes can perform blockchain-related tasks; it periodically sends a preset heartbeat signal to each node through the blockchain network within a preset period, receives the communication requests that the nodes return to the KVM agent in response to the heartbeat signal, collects the communication traffic between the nodes with a preset network packet capture tool, and extracts the communication parameters between the nodes from that traffic. By sending heartbeat signals regularly and receiving the nodes' communication requests, the system can monitor the response state of each node in real time; if a node fails to respond to the heartbeat signal or the communication request, the system can immediately find and mark it as a faulty node. By analyzing the communication traffic, the system can also detect network congestion, packet loss and delay, so that managers can take timely measures to optimize network performance. Because the system obtains the communication state and network quality parameters of each node in real time, it gives managers real-time feedback and decision support, and they can adjust the system configuration and optimize the network layout accordingly to ensure the stability and performance of the system.
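As a non-limiting illustration, deriving the delay and packet loss rate of each node from its heartbeat replies, and marking a node that never replies as faulty, might be sketched as follows. The input format (round-trip times in seconds, with `None` for an unanswered heartbeat) and the `timeout_s` parameter are assumptions made for the example; the blockchain network and the packet capture tool are not modelled.

```python
from statistics import mean
from typing import Optional


def summarize_heartbeats(
    replies: dict[str, list[Optional[float]]], timeout_s: float = 1.0
) -> dict[str, dict]:
    """Summarize per-node heartbeat replies.

    `replies` maps a node name to the round-trip times (seconds) observed for
    each heartbeat sent to it; None means the heartbeat was never answered.
    """
    report = {}
    for node, rtts in replies.items():
        # A reply counts only if it arrived within the timeout.
        answered = [r for r in rtts if r is not None and r <= timeout_s]
        loss = 1 - len(answered) / len(rtts) if rtts else 1.0
        report[node] = {
            "delay": mean(answered) if answered else None,
            "packet_loss_rate": loss,
            "faulty": not answered,  # no valid reply at all: mark as faulty
        }
    return report
```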
In this embodiment, further comprising:
The identification module is used for dividing the data fragments into fragment indexes and storage metadata based on a preset data structure, and identifying the query instruction output by the KVM agent;
The third judging module is used for judging whether the query instruction can correspond to the position attribute of the data fragment;
And the third execution module is used for determining, if yes, a storage node corresponding to the data fragment from the fragment index and the storage metadata, initiating a data access request to the storage node according to the query instruction, reading data from or writing data to the storage node according to the data access request, and generating the operation log content of the data fragment in the KVM agent.
In this embodiment, the system divides the data fragments into fragment indexes and storage metadata based on a preset data structure and identifies the query instruction output by the KVM agent, and then determines whether the query instruction corresponds to the position attribute of a data fragment so as to execute the corresponding steps. For example, when the system determines that the query instruction sent by the KVM agent does not correspond to the position attribute of any data fragment, it considers that the instruction cannot be located to the corresponding data fragment, that is, the data fragment cannot be found at the specified position, either because the position attribute of the data fragment does not match the position information in the instruction or because the data fragment is lost or damaged. The system checks whether the position information in the query instruction matches the position attribute of the data fragment; because that information may be wrong or inaccurate, it must be verified and corrected by a manager. If no matching data fragment can be found, the system tries to find backup data, which includes acquiring a copy of the data fragment from other nodes or otherwise temporarily bypassing the data fragment at the specified position. If the data fragment is lost or damaged, the system prompts the manager to perform data repair and restoration, which includes repairing the lost data fragment with redundant data or error correction codes, or reacquiring it from other sources. For example, when the system determines that the query instruction sent by the KVM agent corresponds to the position attribute of a data fragment, it considers that the query instruction can locate the data fragment on certain nodes. The system determines the storage node corresponding to the data fragment from the fragment index and the storage metadata, initiates a data access request to that storage node according to the query instruction entered by the manager in the KVM agent, and reads data from or writes data to the storage node according to the data access request, generating the operation log content of the data fragment in the KVM agent. Because the query instruction lets the system locate the storage node of a data fragment and send the data access request to it, data can be queried and accessed effectively in the KVM agent, which improves data availability and access efficiency. By generating the operation log content in the KVM agent, the system records the operation, time and node information of each data access request, giving managers a basis for subsequent data management, auditing and tracing. Keeping all data access requests and operation logs under centralized management in the KVM agent also helps maintain the consistency and integrity of data operations and prevents data inconsistency or loss.
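As a non-limiting illustration, a fragment index and storage metadata that resolve a query to a storage node, together with an operation log entry for every access, might be sketched as follows. The `FragmentStore` class, its field names, and the in-memory dictionaries standing in for storage nodes are assumptions made for the example.

```python
from datetime import datetime, timezone
from typing import Optional


class FragmentStore:
    """In-memory stand-in for the fragment index, metadata and storage nodes."""

    def __init__(self) -> None:
        self.index: dict[str, str] = {}      # fragment id -> storage node name
        self.metadata: dict[str, dict] = {}  # fragment id -> storage metadata
        self.nodes: dict[str, dict[str, bytes]] = {}  # node -> {fragment id: data}
        self.log: list[dict] = []            # operation log kept at the agent

    def write(self, frag_id: str, node: str, data: bytes) -> None:
        """Write a fragment to a node and update index and metadata."""
        self.nodes.setdefault(node, {})[frag_id] = data
        self.index[frag_id] = node
        self.metadata[frag_id] = {"size": len(data)}
        self._record("write", frag_id, node)

    def read(self, frag_id: str) -> Optional[bytes]:
        """Resolve the fragment through the index; None if it cannot be located."""
        node = self.index.get(frag_id)
        if node is None:
            self._record("miss", frag_id, None)
            return None
        self._record("read", frag_id, node)
        return self.nodes[node][frag_id]

    def _record(self, op: str, frag_id: str, node: Optional[str]) -> None:
        self.log.append({
            "op": op,
            "fragment": frag_id,
            "node": node,
            "time": datetime.now(timezone.utc).isoformat(),
        })
```

A query that cannot be matched to any indexed fragment is logged as a miss here; the backup lookup and repair steps described above are left out of the sketch.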
In this embodiment, the second execution module further includes:
The selecting unit is used for selecting a pre-recorded scheduling algorithm to coordinate the execution order and allocation strategy of the calculation task based on the task resource requirements preset by the KVM agent, wherein the scheduling algorithm specifically comprises first come first served, shortest job first and shortest remaining time first;
The second judging unit is used for judging whether the resource load of the calculation task during execution exceeds a load threshold preset by the KVM agent;
And the second execution unit is used for acquiring, if yes, the real-time load information and task queue data of the KVM agent, carrying out dynamic resource scheduling and resource reallocation on the calculation task according to the real-time load information and the task queue data, and generating the execution state of the calculation task on the KVM agent in real time, wherein the execution state specifically comprises waiting for execution, executing, execution abnormal and execution completed.
In this embodiment, the system selects a pre-recorded scheduling algorithm to coordinate the execution order and allocation strategy of the calculation task based on the task resource requirements preset by the KVM agent, and then determines whether the resource load of the calculation task during execution exceeds the load threshold preset by the KVM agent so as to execute the corresponding steps. For example, when the system determines that the resource load does not exceed the load threshold, it considers that current resource usage is good and no overload has occurred. The system nevertheless continues to monitor resource usage: it periodically collects resource usage data and compares it with the preset load threshold to discover and prevent potential overload in time, tracks the trend of resource utilization so that resource allocation and task scheduling strategies can be adjusted when the system load changes, and records the current resource load data for periodic analysis and evaluation, which helps the system administrator understand the working state of the system, discover potential problems and resolve them in time. For example, when the system determines that the resource load exceeds the load threshold, it considers that resource usage is overloaded; it acquires the real-time load information and task queue data of the KVM agent, dynamically schedules and reallocates resources for the calculation task according to them, and generates the execution state of the calculation task on the KVM agent in real time. By acquiring the real-time load information and task queue data, the system can schedule calculation tasks dynamically according to current resource usage, which avoids resource overload and safeguards the stability and performance of the system. When the resource load exceeds the preset threshold, reallocating resources among the calculation tasks balances the system load, makes fuller use of system resources and improves efficiency. Generating the execution state in real time lets managers monitor task execution, understand the progress and resource occupation of each task, and find and solve problems arising during execution in time.
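As a non-limiting illustration, the three scheduling algorithms named above (first come first served, shortest job first, shortest remaining time first) might be sketched as follows. The `Task` type, integer time units, and the assumption that every task has a burst of at least one unit are made for the example; the load-threshold check and resource reallocation are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    arrival: int  # time unit at which the task enters the queue
    burst: int    # time units of work required (assumed >= 1)


def fcfs(tasks: list[Task]) -> list[str]:
    """First come first served: run in arrival order."""
    return [t.name for t in sorted(tasks, key=lambda t: t.arrival)]


def sjf(tasks: list[Task]) -> list[str]:
    """Non-preemptive shortest job first: pick the shortest ready task."""
    pending = sorted(tasks, key=lambda t: t.arrival)
    clock, order = 0, []
    while pending:
        ready = [t for t in pending if t.arrival <= clock]
        if not ready:
            clock = pending[0].arrival  # idle until the next arrival
            continue
        nxt = min(ready, key=lambda t: (t.burst, t.arrival))
        pending.remove(nxt)
        clock += nxt.burst
        order.append(nxt.name)
    return order


def srtf(tasks: list[Task]) -> list[str]:
    """Preemptive shortest remaining time first; returns completion order."""
    remaining = {t.name: t.burst for t in tasks}
    arrival = {t.name: t.arrival for t in tasks}
    clock, done = 0, []
    while remaining:
        ready = [n for n in remaining if arrival[n] <= clock]
        if ready:
            # Re-evaluate every time unit, so a shorter arrival preempts.
            n = min(ready, key=lambda n: (remaining[n], arrival[n]))
            remaining[n] -= 1
            if remaining[n] <= 0:
                del remaining[n]
                done.append(n)
        clock += 1
    return done
```

For the same task set the three functions generally return different orders, which is why the choice of algorithm is tied to the task resource requirements.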
In this embodiment, the judging module further includes:
the acquisition unit is used for acquiring the internal clock data preset by the KVM agent;
a third judging unit for judging whether the internal clock data matches the pre-recorded common clock data;
and the third execution unit is used for synchronizing the internal clock data based on the common clock data if not, and carrying out error correction on each node according to the internal clock data.
In this embodiment, the system acquires the internal clock data preset by the KVM agent and then determines whether the internal clock data matches the pre-recorded common clock data so as to execute the corresponding steps. For example, when the system determines that the internal clock data matches the common clock data, it considers that the query instructions output by the KVM agent to each node carry no timing error; this can be verified by comparing the timestamp of a query instruction with the internal clock data of the system, confirming that the timing of the query instructions is consistent with the internal clock and unaffected by clock error. For example, when the system determines that the internal clock data does not match the common clock data, it considers that the query instructions output by the KVM agent to each node may carry a timing error, so that the instructions reach the nodes late; the system synchronizes the internal clock data based on the common clock data and performs error correction on each node from the KVM agent according to the internal clock data. Synchronizing the internal clock data reduces the timing error of the query instructions and therefore the delay with which the nodes receive them, which improves the timing precision and response speed of the system. Error correction on each node improves the accuracy and stability of instruction transmission, so that the instructions received by each node can be executed accurately and on time. It also helps each node receive the same instruction at the same point in time, which supports the consistency and synchronization of data among the nodes and is very important for data management and processing in a distributed system.
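As a non-limiting illustration, comparing the internal clock with the common clock and applying the resulting correction to each node might be sketched as follows. The `tolerance` parameter and the reading of "error correction" as adding a single offset to every node's timestamp are assumptions made for the example; the embodiment does not specify the correction procedure.

```python
def clock_offset(internal_ts: float, common_ts: float, tolerance: float = 0.001) -> float:
    """Return the correction to add to the internal clock (seconds).

    Returns 0.0 when the two clocks already match within `tolerance`.
    """
    diff = common_ts - internal_ts
    return 0.0 if abs(diff) <= tolerance else diff


def correct_nodes(node_clocks: dict[str, float], offset: float) -> dict[str, float]:
    """Apply the same correction to the timestamp held for each node."""
    return {name: ts + offset for name, ts in node_clocks.items()}
```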
In this embodiment, the second judging module further includes:
an identifying unit, configured to identify, in the KVM agent, a reception sequence of the data fragments based on a fragment sequence pre-established for the data fragments;
a fourth judging unit, configured to judge whether the reception sequence matches the fragment sequence;
and the fourth execution unit is used for detecting, if not, the number of fragments in the fragment sequence, collecting the repeated data fragments existing in the reception sequence according to the number of fragments, and removing the repeated data fragments from the KVM agent.
In this embodiment, the system identifies the reception sequence of the data fragments in the KVM agent based on the fragment sequence pre-established for the data fragments, and then determines whether the reception sequence matches the fragment sequence so as to execute the corresponding steps. For example, when the system determines that the reception sequence matches the pre-established fragment sequence, it considers that the data fragments were received by each node in the expected order, meaning that transmission and assembly of the data fragments completed successfully. The system checks the reception sequence against the expected fragment sequence to confirm that all data fragments have been received by each node and combined in the expected order; because the data fragments arrived in the correct order, subsequent processing steps can proceed smoothly without data assembly errors or losses. While those steps run, the system should record the reception state of the data fragments and the execution of the subsequent processing steps, which helps it monitor the progress of data transmission and processing and lets managers investigate and repair problems promptly when they are detected. For example, when the system determines that the reception sequence does not match the pre-established fragment sequence, it considers that the data fragments were received by each node in an order inconsistent with the expected one; it detects the number of fragments in the fragment sequence, collects the repeated data fragments present in the reception sequence according to that number, and removes the repeated data fragments from the KVM agent. By detecting the number of fragments and removing repeated data fragments, the system attempts to restore the correct order of the data; this cannot guarantee that all data fragments are restored to the expected order, but it reduces the impact of the inconsistency as far as possible. Removing repeated data fragments also reduces data redundancy, saves storage space and network bandwidth, and lowers the resources needed for data storage and transmission, which improves the efficiency and performance of the system. Reducing redundancy and restoring the correct data order in turn lowers the likelihood of errors and improves the reliability of data transmission and processing.
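As a non-limiting illustration, checking a reception sequence against the expected fragment sequence and removing repeated fragments might be sketched as follows. Representing fragments by integer sequence numbers `0..expected_count-1` and also reporting the missing fragments are assumptions made for the example.

```python
def reconcile(received: list[int], expected_count: int):
    """Compare a reception sequence with the expected fragment sequence.

    `received` holds fragment sequence numbers in arrival order. Returns a
    tuple (ordered_unique, duplicates, missing).
    """
    expected = list(range(expected_count))
    if received == expected:
        return received, [], []  # sequences match: nothing to repair
    seen: set[int] = set()
    unique: list[int] = []
    duplicates: list[int] = []
    for seq in received:
        if seq in seen:
            duplicates.append(seq)  # repeated fragment, to be discarded
        else:
            seen.add(seq)
            unique.append(seq)
    missing = [s for s in expected if s not in seen]
    return sorted(unique), duplicates, missing
```

A non-empty `missing` list corresponds to the case noted above in which removing duplicates alone cannot restore the full expected sequence.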
In this embodiment, the acquisition module further includes:
The second identifying unit is used for identifying the controlled devices corresponding to the controllable devices in a preset control area based on the controllable devices pre-recorded by the KVM agent;
a fifth judging unit, configured to judge whether the controlled device can establish a communication bridge with the KVM agent;
And the fifth execution unit is used for separating, if yes, the uncontrolled devices from the controlled devices in the control area, independently constructing a short-time communication area for the controlled devices, outputting a communication request to the controlled device through the KVM agent, and acquiring the communication authority of the controlled device from the KVM agent according to the communication request.
In this embodiment, the system identifies the controlled devices corresponding to the controllable devices in a preset control area based on the controllable devices pre-recorded by the KVM agent, and then determines whether a controlled device can establish a communication bridge with the KVM agent so as to execute the corresponding steps. For example, when the system determines that the controlled device cannot establish a communication bridge with the KVM agent, it considers that the controlled device cannot communicate effectively with the KVM agent, possibly because of a device failure, a network problem or a communication interruption due to other causes. The system checks the connection state of the controlled device to ensure that its power supply, network connection and other necessary connections work normally, and prompts the manager to resolve any physical connection problem it detects. It also confirms that the network configuration between the device and the KVM agent is correct by checking the network settings, IP address, subnet mask and gateway parameters, so that the device and the KVM agent can communicate in the same network environment. If the communication problem persists after these steps, the device itself may have failed; the system diagnoses the device with a diagnostic tool or a device management system and prompts the manager to take repair measures appropriate to the cause. For example, when the system determines that the controlled device can establish a communication bridge with the KVM agent, it considers that the KVM agent can communicate with the controlled device; it separates the uncontrolled devices from the controlled devices in the control area, independently constructs a short-time communication area for the controlled devices, outputs a communication request to the controlled device through the KVM agent, and acquires the communication authority of the controlled device from the KVM agent according to the communication request. Separating the controlled devices within the control area and constructing a short-time communication area for them reduces the intermixing of communication traffic, so that communication requests reach the target device quickly and accurately and the overall response speed and efficiency of the system improve. An independent communication area also reduces interference and conflicts in communication, so that requests are delivered smoothly and answered promptly, which improves reliability and stability. Finally, limiting the range and duration of the communication area improves security by helping to prevent unauthorized access and attacks, protecting the devices and the system.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A resource scheduling method for an environment-controlled distributed KVM agent, characterized by comprising the following steps:
Based on the central control content preset by the KVM agent, acquiring the return signals sent by each controlled device to the KVM agent;
judging whether the return duration of the return signal exceeds a preset duration;
If yes, acquiring the communication state among the nodes in the KVM agent, identifying a faulty node among the nodes according to the communication state, marking the faulty device corresponding to the faulty node in the KVM agent, acquiring, by applying preset decentralized storage, the operation data transmitted to the KVM agent by the faulty device through the faulty node, storing the operation data on the other non-faulty nodes in a distributed manner, and generating, from the KVM agent, the data fragments of the operation data recorded by the non-faulty nodes;
judging whether the data fragments can be combined to obtain the operation data;
If not, constructing a calculation task for the operation data in the KVM agent, dividing the calculation task into corresponding calculation subtasks based on the available resources of the other non-faulty nodes, carrying out resource scheduling and allocation on the other non-faulty nodes according to the calculation subtasks, dynamically adjusting the calculation resources required by the other non-faulty nodes when executing the calculation subtasks, carrying out data aggregation on the calculation results of the other non-faulty nodes, and integrating them to obtain the operation data.
2. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of acquiring the communication state among the nodes in the KVM agent and identifying a faulty node among the nodes according to the communication state comprises:
Configuring each node in the KVM agent as a monitoring object of the blockchain network based on a pre-built blockchain network, and detecting a resource lower limit of each node, wherein the resource lower limit specifically comprises a computing resource lower limit and a storage resource lower limit;
Judging whether each node can be connected to the blockchain network;
If yes, sending a preset heartbeat signal to each node periodically in a preset period through the blockchain network, receiving a communication request returned by each node according to the heartbeat signal, collecting communication traffic among the nodes by using a preset network packet capturing tool, and extracting communication parameters among the nodes from the communication traffic, wherein the communication parameters specifically comprise delay, packet loss rate, bandwidth utilization rate, connection state and communication frequency.
3. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of storing the operation data on the other non-faulty nodes in a distributed manner and generating, from the KVM agent, the data fragments of the operation data recorded by the non-faulty nodes further comprises:
dividing the data fragments into fragment indexes and storage metadata based on a preset data structure, and identifying a query instruction output by the KVM agent;
judging whether the query instruction can correspond to the position attribute of the data fragment or not;
If yes, determining a storage node corresponding to the data fragment from the fragment index and the storage metadata, initiating a data access request to the storage node according to the query instruction, reading or inputting data to the storage node according to the data access request, and generating the operation log content of the data fragment in the KVM agent.
4. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of constructing a calculation task for the operation data in the KVM agent and dividing the calculation task into corresponding calculation subtasks based on the available resources of the other non-faulty nodes further comprises:
Selecting a pre-recorded scheduling algorithm to coordinate the execution order and allocation strategy of the calculation task based on the task resource requirements preset by the KVM agent, wherein the scheduling algorithm specifically comprises first come first served, shortest job first and shortest remaining time first;
Judging whether the resource load of the calculation task during execution exceeds a load threshold preset by the KVM agent;
If yes, acquiring real-time load information and task queue data of the KVM agent, carrying out dynamic resource scheduling and resource reallocation on the calculation task according to the real-time load information and the task queue data, and generating an execution state of the calculation task on the KVM agent in real time, wherein the execution state specifically comprises waiting for execution, executing, execution abnormal and execution completed.
5. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of judging whether the return duration of the return signal exceeds a preset duration further comprises:
acquiring internal clock data preset by the KVM agent;
judging whether the internal clock data matches the pre-recorded common clock data;
And if not, synchronizing the internal clock data based on the common clock data, and carrying out error correction on each node according to the internal clock data.
6. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of judging whether the data fragments can be combined to obtain the operation data further comprises:
Identifying a reception sequence of the data fragments in the KVM agent based on a fragment sequence pre-established for the data fragments;
judging whether the reception sequence matches the fragment sequence;
If not, detecting the number of fragments in the fragment sequence, collecting the repeated data fragments existing in the reception sequence according to the number of fragments, and removing the repeated data fragments from the KVM agent.
7. The resource scheduling method for an environment-controlled distributed KVM agent according to claim 1, wherein the step of acquiring the return signals sent by each controlled device to the KVM agent based on the central control content preset by the KVM agent further comprises:
Identifying the controlled devices corresponding to the controllable devices in a preset control area based on the controllable devices pre-recorded by the KVM agent;
judging whether the controlled device can establish a communication bridge with the KVM agent;
if yes, separating the uncontrolled devices from the controlled devices in the control area, independently constructing a short-time communication area for the controlled devices, outputting a communication request to the controlled device through the KVM agent, and acquiring the communication authority of the controlled device from the KVM agent according to the communication request.
8. A resource scheduling system for environment-controlled distributed KVM agents, comprising:
an acquisition module, configured to collect return signals of each controlled device to the KVM agent based on the central control content preset by the KVM agent;
a judging module, configured to judge whether the return time length of the return signal exceeds a preset time length;
an execution module, configured to acquire the communication state among all nodes in the KVM agent, identify fault nodes among the nodes according to the communication state, mark the fault equipment corresponding to the fault nodes in the KVM agent, apply preset decentralized storage to acquire the operation data transmitted to the KVM agent by the fault equipment through the fault nodes, store the operation data on the other non-fault nodes in a distributed manner, and generate, in the KVM agent, all data fragments of the operation data recorded by the non-fault nodes;
a second judging module, configured to judge whether the data fragments can be combined to obtain the operation data;
and a second execution module, configured to, if not, construct a calculation task for the operation data in the KVM agent, split the calculation task into corresponding calculation subtasks based on the available resources of the other non-fault nodes, schedule and allocate resources to the other non-fault nodes according to the calculation subtasks, dynamically adjust the calculation resources required by the other non-fault nodes when executing the calculation subtasks, aggregate the calculation results of the other non-fault nodes, and integrate them to obtain the operation data.
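The splitting and aggregation performed by the second execution module can be sketched as follows. This is only an illustration under stated assumptions: work is divided in proportion to each non-fault node's free capacity, the capacity unit and the names `split_task` and `run_and_aggregate` are hypothetical, and the dynamic adjustment of resources during execution is not modelled.

```python
def split_task(total_units, available):
    """Split `total_units` of work across nodes in proportion to free capacity.

    `available` maps a node name to its free capacity (hypothetical unit).
    """
    capacity = sum(available.values())
    if capacity <= 0:
        raise ValueError("no non-fault node has spare capacity")
    shares = {node: total_units * cap // capacity for node, cap in available.items()}
    # Integer division may leave a remainder; give it to the largest nodes first.
    remainder = total_units - sum(shares.values())
    for node in sorted(available, key=available.get, reverse=True)[:remainder]:
        shares[node] += 1
    return shares


def run_and_aggregate(items, available, worker):
    """Run one subtask per node on its slice of `items` and merge the results."""
    shares = split_task(len(items), available)
    results, start = [], 0
    for node, count in shares.items():
        chunk = items[start:start + count]
        # In a real deployment this call would execute on `node`.
        results.extend(worker(node, chunk))
        start += count
    return results
```

For example, `split_task(10, {"a": 2, "b": 1})` assigns seven units to node `a` and three to node `b`, so the node with more free capacity receives the larger subtask.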
9. The resource scheduling system for environment-controlled distributed KVM agents according to claim 8, wherein the execution module comprises:
configuring each node in the KVM agent as a monitoring object of a pre-built blockchain network, and detecting a resource lower limit of each node, wherein the resource lower limit comprises a computing resource lower limit and a storage resource lower limit;
judging whether each node can be connected to the blockchain network;
if yes, periodically sending a preset heartbeat signal to each node through the blockchain network at a preset period, receiving the communication request returned by each node in response to the heartbeat signal, collecting the communication traffic among the nodes with a preset network packet capture tool, and extracting the communication parameters among the nodes from the communication traffic, wherein the communication parameters comprise delay, packet loss rate, bandwidth utilization rate, connection state, and communication frequency.
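As a rough illustration of how some of the communication parameters in claim 9 could be derived from heartbeat replies, the sketch below computes delay, packet loss rate, and connection state for one node. The blockchain network and the packet capture tool are not modelled; `HeartbeatSample`, `communication_parameters`, `is_faulty`, and the threshold values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class HeartbeatSample:
    sent_at: float               # time the heartbeat was sent, in seconds
    replied_at: Optional[float]  # time the reply arrived, None if it was lost


def communication_parameters(samples):
    """Summarise delay, packet loss rate, and connection state for one node."""
    if not samples:
        raise ValueError("at least one heartbeat sample is required")
    delays = [s.replied_at - s.sent_at for s in samples if s.replied_at is not None]
    lost = len(samples) - len(delays)
    return {
        "delay": mean(delays) if delays else None,
        "packet_loss_rate": lost / len(samples),
        "connected": bool(delays),
    }


def is_faulty(params, max_delay=0.5, max_loss=0.2):
    """Flag a node as faulty using assumed, illustrative thresholds."""
    if not params["connected"]:
        return True
    return params["delay"] > max_delay or params["packet_loss_rate"] > max_loss
```

A node that answers no heartbeat at all is treated as disconnected and therefore faulty, regardless of the thresholds.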
10. The resource scheduling system for environment-controlled distributed KVM agents according to claim 8, further comprising:
an identification module, configured to divide the data fragments into fragment indexes and storage metadata based on a preset data structure, and to identify the query instruction output by the KVM agent;
a third judging module, configured to judge whether the query instruction corresponds to the position attribute of a data fragment;
and a third execution module, configured to, if yes, determine the storage node corresponding to the data fragment from the fragment index and the storage metadata, initiate a data access request to the storage node according to the query instruction, read data from or write data to the storage node according to the data access request, and generate the operation log content of the data fragment in the KVM agent.
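The lookup path described in claim 10 can be sketched with an in-memory stand-in. The class below is an assumption made for illustration: `FragmentStore`, its dictionaries, and the log tuple layout are hypothetical and are not the data structure defined by the patent.

```python
class FragmentStore:
    """Toy fragment index, storage metadata, and operation log."""

    def __init__(self):
        self.index = {}     # fragment id -> name of the storage node holding it
        self.metadata = {}  # fragment id -> storage metadata (here only the size)
        self.nodes = {}     # node name -> {fragment id: fragment bytes}
        self.log = []       # operation log entries: (operation, fragment id, node)

    def write(self, fragment_id, node, data):
        """Store a fragment on `node` and record where it lives."""
        self.nodes.setdefault(node, {})[fragment_id] = data
        self.index[fragment_id] = node
        self.metadata[fragment_id] = {"size": len(data)}
        self.log.append(("write", fragment_id, node))

    def read(self, fragment_id):
        """Resolve the storage node from the index, then read the fragment."""
        node = self.index.get(fragment_id)
        if node is None:
            # The query does not correspond to any known fragment position.
            self.log.append(("miss", fragment_id, None))
            return None
        self.log.append(("read", fragment_id, node))
        return self.nodes[node][fragment_id]
```

Every read, write, and failed lookup appends an entry to `log`, which plays the role of the operation log content generated for each data fragment.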
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410512858.XA CN118264753A (en) | 2024-04-26 | 2024-04-26 | Resource scheduling method and system for environment-controlled distributed KVM agents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410512858.XA CN118264753A (en) | 2024-04-26 | 2024-04-26 | Resource scheduling method and system for environment-controlled distributed KVM agents |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118264753A (en) | 2024-06-28 |
Family
ID=91611355
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410512858.XA Pending CN118264753A (en) | 2024-04-26 | 2024-04-26 | Resource scheduling method and system for environment-controlled distributed KVM agents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118264753A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN119376994A (en) * | 2024-12-27 | 2025-01-28 | 山东海量信息技术研究院 | A fault-tolerant processing method, device, medium and computer program product |
2024
- 2024-04-26 CN CN202410512858.XA patent/CN118264753A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6397359B1 (en) | | Methods, systems and computer program products for scheduled network performance testing |
CN108847982B (en) | | Distributed storage cluster and node fault switching method and device thereof |
JP2005209201A (en) | | Node management in high-availability cluster |
CN119011374B (en) | | A master-slave switching method and system based on device synchronization |
CN112506702A (en) | | Data center disaster tolerance method, device, equipment and storage medium |
CN107153660A (en) | | The fault detect processing method and its system of distributed data base system |
CN118331779A (en) | | Distributed system fault judgment and recovery method, cloud operating system and computing platform using the method |
CN118264753A (en) | | Resource scheduling method and system for environment-controlled distributed KVM agents |
CN105607973A (en) | | Method, device and system for processing equipment failures in virtual machine system |
CN118734994A (en) | | A Distributed Machine Learning Approach in Heterogeneous Computing |
US20040153704A1 (en) | | Automatic startup of a cluster system after occurrence of a recoverable error |
CN115017235A (en) | | Data synchronization method, electronic device and storage medium |
CN120086073A (en) | | A business failure migration system and method for a secure computing platform |
US20050234919A1 (en) | | Cluster system and an error recovery method thereof |
RU2710288C1 (en) | | Method of remote abnormal state reset of racks used in data center |
CN104199747B (en) | | High-availability system obtaining method and system based on health management |
CN104158843A (en) | | Storage unit invalidation detecting method and device for distributed file storage system |
CN117818511A (en) | | Vehicle-mounted operating system safety detection method and device based on virtualization technology |
CN111694894A (en) | | Method, server, device and storage medium for monitoring data synchronization |
KR20140029644A (en) | | Distributed computing system and recovery method thereof |
CN117827537A (en) | | Hybrid multi-cloud data backup and recovery method, device and medium based on GPT technology |
CN112965902B (en) | | Evaluation method and device of application system |
CN105550094B (en) | | A kind of high-availability system state automatic monitoring method |
CN113987065A (en) | | Database drift method, system, electronic device and storage medium |
CN105306256B (en) | | A kind of two-node cluster hot backup implementation method based on VxWorks equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||