TWI795887B - Method, electronic equipment and storage medium for virtual machine migration

Info

Publication number: TWI795887B
Application number: TW110131545A
Authority: TW
Inventors: 朱祐昇; 陳藝丰
Original assignee: 新加坡商鴻運科股份有限公司
Priority date: 2021-08-25
Filing date: 2021-08-25
Publication date: 2023-03-11
Also published as: TW202309742A

Abstract

This application discloses a virtual machine migration method, an electronic device, and a storage medium, and relates to the technical field of cloud computing. The method includes: monitoring the status of a computing node; determining whether the computing node meets a trigger condition, where the trigger condition includes the offline duration of the computing node reaching a preset duration, or the state of the computing node being unstable; and, if the computing node meets the trigger condition, sending a message to a control node to migrate the virtual machine.

Description

Virtual machine migration method, electronic device and storage medium

本申請涉及雲計算技術領域，具體涉及一種虛擬機器遷移方法、電子設備及存儲介質。 The present application relates to the technical field of cloud computing, and in particular to a virtual machine migration method, electronic equipment and storage media.

目前，對於雲平臺之管理，通常會使用一套監控物理節點之系統，需要運維人員進行整體服務之運作管理。當某一物理節點發生異常或需要維護時，將運行於此物理節點之服務轉移至其他物理節點，而服務轉移之行為通常是由運維人員進行手動操作。對於突發狀況之反應，手動操作通常會有一定之時間延遲，影響運維之效率。 At present, cloud platforms are usually managed with a system that monitors physical nodes, and operation and maintenance personnel are needed to manage the operation of the overall service. When a physical node becomes abnormal or requires maintenance, the services running on it are transferred to other physical nodes, and this transfer is usually performed manually by operation and maintenance personnel. In responding to emergencies, manual operation usually introduces a time delay, which reduces operation and maintenance efficiency.

以Openstack雲平臺為例。當一個計算節點發生異常導致該計算節點關機或重啟，或當一個計算節點發生故障影響Openstack雲平臺之正常運行時，需要將該計算節點之虛擬機器遷移到其他空閒之計算節點。當運維人員發現Openstack雲平臺出現異常時，藉由監測資料判斷是否遷移虛擬機器(Virtual Machine,VM)，並藉由Openstack Nova模組中之evacuate指令來遷移虛擬機器至其他計算節點。使用該指令遷移虛擬機器，需要運維人員手動輸入指令或是調用應用程式設計發展介面(Application Programming Interface,API)，會產生一定之時間延遲。當遇到例外狀況時，還需花費額外之時間去復原虛擬機器。 Take the Openstack cloud platform as an example. When an abnormality causes a computing node to shut down or restart, or when a failure of a computing node affects the normal operation of the Openstack cloud platform, the virtual machines of that computing node need to be migrated to other idle computing nodes. When operation and maintenance personnel find that the Openstack cloud platform is abnormal, they judge from the monitoring data whether to migrate the virtual machine (VM), and use the evacuate command in the Openstack Nova module to migrate the virtual machine to another computing node. Migrating a virtual machine with this command requires the personnel to enter the command manually or call the application programming interface (API), which introduces a time delay. When an exception occurs, additional time is needed to restore the virtual machine.

本申請提供一種虛擬機器遷移方法、電子設備及存儲介質，以提高虛擬機器之遷移效率。 The present application provides a virtual machine migration method, an electronic device and a storage medium, so as to improve the migration efficiency of the virtual machine.

本申請一實施例之虛擬機器遷移方法包括：監測計算節點之狀態。確定計算節點是否滿足觸發條件。其中，觸發條件包括計算節點之掉線時長達到預設時長，或計算節點之狀態不穩定。若計算節點滿足觸發條件，則向控制節點發送消息，以遷移虛擬機器。 A virtual machine migration method according to an embodiment of the present application includes: monitoring the status of a computing node; and determining whether the computing node meets a trigger condition, where the trigger condition includes the offline duration of the computing node reaching a preset duration, or the state of the computing node being unstable. If the computing node meets the trigger condition, a message is sent to the control node to migrate the virtual machine.

於其中一種實施方式中，計算節點掉線，包括：nova-compute代理服務掉線，處於掉線狀態之nova-compute代理服務之數目為1，10G網路掉線及1G網路掉線。 In one embodiment, the computing node goes offline, including: the nova-compute proxy service goes offline, the number of nova-compute proxy services in the offline state is 1, the 10G network goes offline and the 1G network goes offline.

於另一種實施方式中，計算節點之狀態不穩定，包括：nova-compute代理服務掉線，處於掉線狀態之nova-compute代理服務之數目為1，10G網路掉線及處於掉線狀態之10G網路中計算節點之數目為1。 In another embodiment, the state of the computing node being unstable includes: the nova-compute proxy service is offline, the number of nova-compute proxy services in the offline state is 1, the 10G network is offline, and the number of computing nodes in the offline state in the 10G network is 1.

於另一種實施方式中，當處於掉線狀態之nova-compute代理服務之數目為1，且nova-compute代理服務掉線之頻次達到第一預設頻次，及處於掉線狀態之10G網路中計算節點之數目為1，且10G網路掉線之頻次達到第二預設頻次，確定計算節點之狀態不穩定。 In another embodiment, when the number of nova-compute proxy services in the offline state is 1 and the frequency at which the nova-compute proxy service goes offline reaches a first preset frequency, and the number of computing nodes in the offline state in the 10G network is 1 and the frequency at which the 10G network goes offline reaches a second preset frequency, the state of the computing node is determined to be unstable.

於另一種實施方式中，虛擬機器遷移方法還包括：計算與顯示虛擬機器之遷移時間。 In another embodiment, the virtual machine migration method further includes: calculating and displaying the migration time of the virtual machine.

於另一種實施方式中，虛擬機器遷移方法還包括：儲存虛擬機器之遷移記錄。 In another implementation manner, the virtual machine migration method further includes: storing a migration record of the virtual machine.

本申請另一實施例之虛擬機器遷移方法包括：檢查雲平臺管理系統之狀態。向監測節點回饋虛擬機器遷移之消息。檢查虛擬機器之狀態是否與遷移之前保持一致。若虛擬機器之狀態與遷移之前不一致，則修復虛擬機器。 A virtual machine migration method according to another embodiment of the present application includes: checking the status of the cloud platform management system; feeding back a virtual machine migration message to the monitoring node; checking whether the state of the virtual machine is consistent with its state before migration; and, if the state of the virtual machine is inconsistent with its state before migration, repairing the virtual machine.

於其中一種實施方式中，檢查雲平臺管理系統之狀態，包括：檢查是否存在空閒之計算節點。檢查nova服務之狀態是否正常。檢查電源之狀態是否正常。 In one of the implementation manners, checking the status of the cloud platform management system includes: checking whether there are idle computing nodes. Check whether the status of the nova service is normal. Check whether the status of the power supply is normal.

本申請另一實施例之電子設備包括通訊模組，顯示幕，記憶體，及處理器，處理器運行存儲於記憶體中之電腦程式或代碼，實現本申請實施例之上述虛擬機器遷移方法。 The electronic device in another embodiment of the present application includes a communication module, a display screen, a memory, and a processor. The processor runs computer programs or codes stored in the memory to implement the above-mentioned virtual machine migration method in the embodiment of the present application.

本申請另一實施例之存儲介質用於存儲電腦程式或代碼，當電腦程式或代碼被處理器執行時，實現本申請實施例之上述虛擬機器遷移方法。 The storage medium in another embodiment of the present application is used to store computer programs or codes. When the computer programs or codes are executed by the processor, the above-mentioned virtual machine migration method in the embodiment of the present application is realized.

本申請實施例於雲平臺管理系統中設定監測節點與觸發條件，當監測資料滿足觸發條件，則系統自動開始進行虛擬機器遷移工作，能夠自動遷移虛擬機器離開故障節點，恢復上線，並及時處理虛擬機器遷移過程中遇到之突發狀況，減少系統宕機之時間，提高虛擬機器之遷移效率，從而提升運維之效率。 In the embodiments of this application, a monitoring node and trigger conditions are set in the cloud platform management system. When the monitoring data meets a trigger condition, the system automatically starts the virtual machine migration, so that virtual machines are automatically migrated away from the faulty node and brought back online, and unexpected situations encountered during migration are handled in time. This reduces system downtime and improves the migration efficiency of virtual machines, thereby improving operation and maintenance efficiency.

100:雲平臺管理系統 100: Cloud platform management system

110:監測節點 110: Monitoring node

120:控制節點 120:Control node

130:計算節點 130: computing node

111,121:處理器 111,121: Processor

112,122:記憶體 112,122: memory

113,123:通訊模組 113,123: communication module

114:顯示幕 114: display screen

S101-S106,S201-S210:步驟 S101-S106, S201-S210: steps

圖1是本申請一實施方式之雲平臺管理系統之結構示意圖。 FIG. 1 is a schematic structural diagram of a cloud platform management system according to an embodiment of the present application.

圖2是本申請一實施方式之虛擬機器遷移方法之流程圖。 FIG. 2 is a flowchart of a virtual machine migration method in an embodiment of the present application.

圖3是本申請一實施方式之監測節點之結構示意圖。 FIG. 3 is a schematic structural diagram of a monitoring node in an embodiment of the present application.

圖4是本申請另一實施方式之虛擬機器遷移方法之流程圖。 FIG. 4 is a flowchart of a virtual machine migration method according to another embodiment of the present application.

圖5是本申請一實施方式之控制節點之結構示意圖。 FIG. 5 is a schematic structural diagram of a control node according to an embodiment of the present application.

為能夠更清楚地理解本申請之上述目的、特徵與優點，下面結合附圖與具體實施例對本申請進行詳細描述。需要說明的是，於不衝突之情況下，本申請之實施例及實施例中之特徵可相互組合。於下面之描述中闡述了很多具體細節以便於充分理解本申請，所描述之實施例僅是本申請一部分實施例，而非全部之實施例。 In order to more clearly understand the above purpose, features and advantages of the present application, the present application will be described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, in the case of no conflict, the embodiments of the present application and the features in the embodiments can be combined with each other. A lot of specific details are set forth in the following description to facilitate a full understanding of the application, and the described embodiments are only a part of the embodiments of the application, but not all of the embodiments.

需要說明的是，雖於流程圖中示出了邏輯順序，然於某些情況下，可不同於流程圖中之循序執行所示出或描述之步驟。本申請實施例中公開之方法包括用於實現方法之一個或多個步驟或動作。方法步驟與/或動作可於不脫離權利要求之範圍之情況下彼此互換。 It should be noted that although the logical order is shown in the flow chart, in some cases, the steps shown or described in the flow chart may be executed in a different order. The methods disclosed in the embodiments of the present application include one or more steps or actions for implementing the methods. The method steps and/or actions may be interchanged with one another without departing from the scope of the claims.

於本申請實施例中，物理節點包括監測節點(Monitor Node)110、控制節點(Control Node)120及計算節點(Compute Node)130。物理節點可是電子設備，例如智慧型電話、平板電腦、個人電腦(Personal Computer,PC)、個人數位助理(Personal Digital Assistant,PDA)、路由器、工作站或伺服器等。 In the embodiment of the present application, the physical nodes include a monitor node (Monitor Node) 110 , a control node (Control Node) 120 and a computing node (Compute Node) 130 . A physical node can be an electronic device, such as a smart phone, a tablet computer, a personal computer (Personal Computer, PC), a personal digital assistant (Personal Digital Assistant, PDA), a router, a workstation, or a server.

圖1是本申請一實施方式之雲平臺管理系統100之結構示意圖。 FIG. 1 is a schematic structural diagram of a cloud platform management system 100 according to an embodiment of the present application.

可參閱圖1，雲平臺管理系統100包括監測節點110、控制節點120及計算節點130。其中，監測節點110通訊連接於計算節點130與控制節點120，控制節點120通訊連接於計算節點130。 Referring to FIG. 1 , the cloud platform management system 100 includes a monitoring node 110 , a control node 120 and a computing node 130 . Wherein, the monitoring node 110 is communicatively connected to the computing node 130 and the control node 120 , and the control node 120 is communicatively connected to the computing node 130 .

於本實施例中，通訊連接包括有線連接與無線連接。其中，有線連接是指藉由有線傳輸介質(例如光纖或雙絞線)進行連接。無線連接是指藉由無線傳輸介質(例如WiFi，藍牙，NFC，或2G/3G/4G/5G等無線通訊網路)進行連接。 In this embodiment, the communication connection includes wired connection and wireless connection. Wherein, wired connection refers to connection through a wired transmission medium (such as optical fiber or twisted pair). Wireless connection refers to connection through wireless transmission media (such as WiFi, Bluetooth, NFC, or wireless communication networks such as 2G/3G/4G/5G).

計算節點130用以運行一台或多台虛擬機器。 Computing nodes 130 are used to run one or more virtual machines.

控制節點120用以控制一個或多個計算節點130，將一個計算節點130之虛擬機器遷移至另一個計算節點130。 The control node 120 is used to control one or more computing nodes 130 , and migrate the virtual machine of one computing node 130 to another computing node 130 .

監測節點110用以監測計算節點130之狀態，以確定計算節點130是否發生異常。當計算節點130發生異常時，監測節點110向控制節點120發送消息，以將發生異常之計算節點130之虛擬機器遷移至其他可用之計算節點130。 The monitoring node 110 is used for monitoring the status of the computing node 130 to determine whether the computing node 130 is abnormal. When the computing node 130 is abnormal, the monitoring node 110 sends a message to the control node 120 to migrate the virtual machine of the abnormal computing node 130 to other available computing nodes 130 .

於本實施例中，監測節點110可藉由設置一定之觸發條件來判斷計算節點130是否發生異常。當計算節點130之狀態滿足觸發條件時，監測節點110確定計算節點130發生異常，向控制節點120發送消息，以遷移虛擬機器。當計算節點130之狀態不滿足觸發條件時，監測節點110確定計算節點130正常，繼續監測計算節點130之狀態。 In this embodiment, the monitoring node 110 can determine whether the computing node 130 is abnormal by setting a certain trigger condition. When the status of the computing node 130 satisfies the trigger condition, the monitoring node 110 determines that the computing node 130 is abnormal, and sends a message to the control node 120 to migrate the virtual machine. When the status of the computing node 130 does not meet the trigger condition, the monitoring node 110 determines that the computing node 130 is normal, and continues to monitor the status of the computing node 130 .

於其中一種實施方式中，觸發條件包括計算節點130之掉線時長達到預設時長，或計算節點130之狀態不穩定。其中，預設時長之取值範圍為1min-5min。例如，預設時間可取2min。 In one of the implementation manners, the trigger condition includes that the offline duration of the computing node 130 reaches a preset duration, or the status of the computing node 130 is unstable. Wherein, the value range of the preset duration is 1min-5min. For example, the preset time may be 2 minutes.

計算節點130掉線包括以下四個條件：nova-compute代理服務掉線，處於掉線狀態之nova-compute代理服務之數目為1，10G網路掉線及1G網路掉線。其中，10G網路是指控制節點120到計算節點130之間之網路，虛擬機器於10G網路上進行遷移。1G網路是指監測節點110到計算節點130之間之網路，監測訊號於1G網路上進行傳輸。nova-compute代理服務可提供管理與配置虛擬機器之入口。nova-compute代理服務於計算節點130上運行，負責管理計算節點130上之實例(instance)。當上述四個條件同時滿足時，可確定計算節點130掉線。 The disconnection of the computing node 130 includes the following four conditions: the nova-compute proxy service is disconnected, the number of nova-compute proxy services in the disconnected state is 1, the 10G network is disconnected and the 1G network is disconnected. Wherein, the 10G network refers to the network between the control node 120 and the computing node 130, and the virtual machine is migrated on the 10G network. The 1G network refers to the network between the monitoring node 110 and the computing node 130, and monitoring signals are transmitted on the 1G network. The nova-compute proxy service can provide an entry point for managing and configuring virtual machines. The nova-compute agent service runs on the computing node 130 and is responsible for managing instances on the computing node 130 . When the above four conditions are met simultaneously, it may be determined that the computing node 130 is offline.
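
The four-condition offline test, combined with the preset-duration trigger, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation of the embodiment: the `NodeStatus` record, its field names, and the function names are hypothetical, and the 120-second default only mirrors the 2-minute example given earlier.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one computing node as seen by the monitoring node."""
    agent_down: bool        # nova-compute agent service is offline
    agents_down_count: int  # number of nova-compute agent services currently offline
    net_10g_down: bool      # 10G network (control node <-> computing node) is offline
    net_1g_down: bool       # 1G network (monitoring node <-> computing node) is offline


def is_offline(status: NodeStatus) -> bool:
    """The node counts as offline only when all four conditions hold at once."""
    return (
        status.agent_down
        and status.agents_down_count == 1
        and status.net_10g_down
        and status.net_1g_down
    )


def offline_trigger_met(status: NodeStatus, offline_seconds: float,
                        preset_seconds: float = 120.0) -> bool:
    """Trigger when the node has been offline for at least the preset duration."""
    # The text gives 1 min to 5 min as the range of the preset duration.
    if not 60.0 <= preset_seconds <= 300.0:
        raise ValueError("preset duration must be between 60 s and 300 s")
    return is_offline(status) and offline_seconds >= preset_seconds
```

A node whose 1G link is still up does not satisfy `is_offline`, so it would not trigger a migration through this path even if the other three conditions hold.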

計算節點130之狀態不穩定包括以下四個條件：nova-compute代理服務掉線，處於掉線狀態之nova-compute代理服務之數目為1，10G網路掉線及處於掉線狀態之10G網路中計算節點130之數目為1。當處於掉線狀態之nova-compute代理服務之數目為1，且nova-compute代理服務掉線之頻次達到第一預設頻次，及處於掉線狀態之10G網路中計算節點130之數目為1，且10G網路掉線之頻次達到第二預設頻次時，可確定計算節點130之狀態不穩定。其中，第一預設頻次之取值範圍為8/10min-14/10min。例如，第一預設頻次可取14/10min。第二預設頻次之取值範圍為4/10min-7/10min。例如，第二預設頻次可取7/10min。 The state of the computing node 130 being unstable involves the following four conditions: the nova-compute proxy service is offline, the number of nova-compute proxy services in the offline state is 1, the 10G network is offline, and the number of computing nodes 130 in the offline state in the 10G network is 1. When the number of nova-compute proxy services in the offline state is 1 and the frequency at which the nova-compute proxy service goes offline reaches a first preset frequency, and the number of computing nodes 130 in the offline state in the 10G network is 1 and the frequency at which the 10G network goes offline reaches a second preset frequency, the state of the computing node 130 can be determined to be unstable. The first preset frequency ranges from 8/10min to 14/10min; for example, it may be 14/10min. The second preset frequency ranges from 4/10min to 7/10min; for example, it may be 7/10min.

具體而言，於10G網路中，nova-compute代理服務於正常工作時每30秒發送1次資料包，以管理計算節點130上之實例。10min之時段，nova-compute代理服務累計可發送20次資料包。當發包之丟包率達到40%-70%，即第一預設頻次取8/10min-14/10min時，可確定nova-compute代理服務之狀態不穩定。計算節點130於正常工作時每60秒發送1次資料包，以減小廣播風暴之干擾。10min之時段，計算節點130累計可發送10次資料包。當發包之丟包率達到40%-70%，即第二預設頻次取4/10min-7/10min時，可確定計算節點130之狀態不穩定。當確定nova-compute代理服務與計算節點130之工作狀態均不穩定時，監測節點110確定計算節點130發生異常，向控制節點120發送消息，以遷移虛擬機器。 Specifically, in the 10G network, the nova-compute proxy service sends a data packet every 30 seconds to manage instances on the computing nodes 130 during normal operation. In a period of 10 minutes, the nova-compute proxy service can send 20 data packets in total. When the packet loss rate of sending packets reaches 40%-70%, that is, when the first preset frequency is 8/10min-14/10min, it can be determined that the status of the nova-compute proxy service is unstable. The computing node 130 sends a data packet every 60 seconds during normal operation to reduce the interference of the broadcast storm. In a period of 10 minutes, the computing node 130 can send 10 data packets accumulatively. When the packet loss rate of sending packets reaches 40%-70%, that is, when the second preset frequency is 4/10min-7/10min, it can be determined that the state of the computing node 130 is unstable. When it is determined that the nova-compute agent service and the working status of the computing node 130 are both unstable, the monitoring node 110 determines that the computing node 130 is abnormal, and sends a message to the control node 120 to migrate the virtual machine.
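
The relationship between the packet interval, the loss rate, and the preset frequencies can be checked with a short calculation. The sketch below is illustrative only: the function names and parameters are hypothetical, the 10-minute window and the 30 s / 60 s intervals come from the paragraph above, and the defaults of 14 and 7 are the example values from the text.

```python
def drops_threshold(interval_s: int, loss_percent: int, window_s: int = 600) -> int:
    """Number of lost packets in the window that corresponds to a loss rate.

    Packets expected in the window = window_s / interval_s; the threshold is
    that count scaled by the loss percentage.
    """
    expected_packets = window_s // interval_s
    return expected_packets * loss_percent // 100


def is_unstable(agents_down_count: int, agent_drops: int,
                nodes_down_10g: int, net_drops: int,
                first_preset: int = 14, second_preset: int = 7) -> bool:
    """Both the agent service and the 10G link must look unstable."""
    agent_unstable = agents_down_count == 1 and agent_drops >= first_preset
    net_unstable = nodes_down_10g == 1 and net_drops >= second_preset
    return agent_unstable and net_unstable
```

With a 30 s interval the agent service sends 20 packets in 10 minutes, so a 40% to 70% loss rate maps to 8 to 14 drops; with a 60 s interval the node sends 10 packets, giving 4 to 7 drops.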

於其中一種實施方式中，當監測節點110確定多個計算節點130發生異常時，依次遷移發生異常之各個計算節點130上之虛擬機器，以減少將一個異常計算節點130上之虛擬機器遷移至另一個異常計算節點130之狀況。 In one of the implementation manners, when the monitoring node 110 determines that multiple computing nodes 130 are abnormal, the virtual machines on the abnormal computing nodes 130 are migrated one node at a time, so as to reduce cases in which virtual machines on one abnormal computing node 130 are migrated to another abnormal computing node 130.

舉例而言，當監測節點110確定第一計算節點130與第二計算節點130發生異常，監測節點110向控制節點120發送第一消息，以遷移第一計算節點130上之虛擬機器。當第一計算節點130上之虛擬機器遷移完成之後，監測節點110向控制節點120發送第二消息，以遷移第二計算節點130上之虛擬機器。 For example, when the monitoring node 110 determines that the first computing node 130 and the second computing node 130 are abnormal, the monitoring node 110 sends a first message to the control node 120 to migrate the virtual machine on the first computing node 130 . After the migration of the virtual machine on the first computing node 130 is completed, the monitoring node 110 sends a second message to the control node 120 to migrate the virtual machine on the second computing node 130 .

於其中一種實施方式中，監測節點110可切換至保險狀態，以停止遷移其他計算節點130上之虛擬機器。 In one of the implementation manners, the monitoring node 110 can switch to the safe state to stop migrating virtual machines on other computing nodes 130 .

具體而言，當監測節點110確定多個計算節點130發生異常時，向控制節點120發送消息，以遷移虛擬機器。控制節點120查找空閒之計算節點 130，依次遷移發生異常之各個計算節點130上之虛擬機器。當控制節點120開始遷移一個異常計算節點130上之虛擬機器時，監測節點110切換至保險狀態，停止遷移其他異常計算節點130上之虛擬機器。當確定一個異常計算節點130上之虛擬機器遷移完成之後，監測節點110切換回工作狀態，以繼續監測計算節點130之狀態。 Specifically, when the monitoring node 110 determines that a plurality of computing nodes 130 are abnormal, it sends a message to the control node 120 to migrate the virtual machines. The control node 120 searches for idle computing nodes 130 and migrates the virtual machines on the abnormal computing nodes 130 one node at a time. When the control node 120 starts to migrate the virtual machines on one abnormal computing node 130, the monitoring node 110 switches to the safe state and stops the migration of virtual machines on the other abnormal computing nodes 130. After the migration of the virtual machines on that abnormal computing node 130 is determined to be complete, the monitoring node 110 switches back to the working state to continue monitoring the status of the computing nodes 130.
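
The one-node-at-a-time behaviour with a safe state can be sketched as a small state machine. Everything here is hypothetical scaffolding for illustration: the `Monitor` class, the `"working"` and `"safe"` labels, and the `migrate` callback (standing in for the message sent to the control node 120) are assumptions, not names from the source.

```python
from typing import Callable, Iterable, List


class Monitor:
    """Hypothetical monitoring node that serialises evacuations."""

    def __init__(self, migrate: Callable[[str], None]) -> None:
        self.state = "working"
        self._migrate = migrate  # stand-in for "send message to control node"

    def evacuate_all(self, abnormal_nodes: Iterable[str]) -> List[str]:
        """Evacuate abnormal nodes sequentially, one node per safe-state period."""
        completed: List[str] = []
        for node in abnormal_nodes:
            self.state = "safe"  # no other evacuation may start now
            try:
                self._migrate(node)  # blocks until this node is evacuated
            finally:
                self.state = "working"  # resume normal monitoring
            completed.append(node)
        return completed
```

Because the loop only advances after `migrate` returns, the second abnormal node is never handled while the first is still being evacuated, which matches the first-message and second-message example above.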

於其中一種實施方式中，當監測節點110向控制節點120發送消息時，監測節點110檢查控制節點120之執行緒標記(flag)與狀態鎖(lock)，以減少重新進入執行緒(re-entry)之狀況。其中，執行緒標記用以表示控制節點120是否正於運行虛擬機器之遷移程式。狀態鎖用以表示控制節點120是否處於鎖定狀態。 In one of the implementation manners, when the monitoring node 110 sends a message to the control node 120, the monitoring node 110 checks the thread flag (flag) and the state lock (lock) of the control node 120, so as to reduce thread re-entry. The thread flag indicates whether the control node 120 is currently running the virtual machine migration program. The state lock indicates whether the control node 120 is in a locked state.

舉例而言，當執行緒標記flag=1時，表示控制節點120正於運行虛擬機器之遷移程式，控制節點120向監測節點110回饋狀態資訊，以停止新之遷移程式，等待當前之遷移程式運行完畢。當執行緒標記flag=0時，表示控制節點120處於空閒狀態，控制節點120向監測節點110回饋狀態資訊，以啟動遷移程式。當狀態鎖lock=1時，表示控制節點120處於鎖定狀態，此時控制節點120停止工作，不會回應監測節點110之消息。當狀態鎖lock=0時，表示控制節點120處於解鎖狀態，此時控制節點120重新開始工作，向監測節點110回饋狀態資訊。 For example, when the thread flag flag=1, it means that the control node 120 is running the migration program of the virtual machine, and the control node 120 feeds back status information to the monitoring node 110 to stop the new migration program and wait for the current migration program to run complete. When the thread flag flag=0, it means that the control node 120 is in an idle state, and the control node 120 feeds back status information to the monitoring node 110 to start the migration program. When the state lock=1, it means that the control node 120 is in the locked state, and at this time the control node 120 stops working and will not respond to the message from the monitoring node 110 . When the state lock=0, it means that the control node 120 is in an unlocked state, and at this time the control node 120 restarts to work and feeds back state information to the monitoring node 110 .
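
The four flag and lock cases above reduce to a small decision function. This is a sketch only: the function name and the `"start"` / `"wait"` reply strings are hypothetical, and `None` is used here to model the locked control node not responding at all.

```python
from typing import Optional


def control_node_reply(flag: int, lock: int) -> Optional[str]:
    """Reply of the control node to a migration message, or None if it is silent."""
    if lock == 1:
        return None     # locked: the control node has stopped working, no response
    if flag == 1:
        return "wait"   # a migration program is already running; do not start another
    return "start"      # idle and unlocked: the migration program may start
```

Checking the lock before the flag reflects the text's statement that a locked control node does not respond to the monitoring node, regardless of whether a migration is in progress.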

於其中一種實施方式中，當控制節點120遷移虛擬機器時，檢查雲平臺管理系統100之狀態。 In one of the implementation manners, when the control node 120 migrates the virtual machine, it checks the status of the cloud platform management system 100 .

具體而言，控制節點120檢查是否存在空閒之計算節點130，檢查nova服務之狀態與電源之狀態，及記錄異常計算節點130上之虛擬機器之狀態。 Specifically, the control node 120 checks whether there is an idle computing node 130 , checks the status of the nova service and the power supply, and records the status of the virtual machine on the abnormal computing node 130 .

雲平臺管理系統100可按照計算節點130之工作狀態將計算節點130之存儲區域劃分為空閒區域與工作區域。當空閒區域不存在計算節點130，即不存在空閒之計算節點130時，控制節點120向監測節點110回饋計算節點130之狀態資訊，以停止遷移程式。當空閒區域存在計算節點130，即存在空閒之計算節點130時，控制節點120向監測節點110回饋計算節點130狀態資訊，以啟動遷移程式。 The cloud platform management system 100 can divide the storage area of the computing nodes 130 into an idle area and a working area according to the working state of the computing nodes 130. When no computing node 130 exists in the idle area, that is, there is no idle computing node 130, the control node 120 feeds back the status information of the computing nodes 130 to the monitoring node 110 to stop the migration program. When a computing node 130 exists in the idle area, that is, there is an idle computing node 130, the control node 120 feeds back the status information of the computing nodes 130 to the monitoring node 110 to start the migration program.

nova服務負責維護與管理雲平臺管理系統100之計算資源。nova服務之狀態包括正常狀態或異常狀態。當nova服務處於正常狀態時，控制節點120向監測節點110回饋nova服務之狀態資訊，以啟動遷移程式。當nova服務處於異常狀態時，控制節點120向監測節點110回饋nova服務之狀態資訊，以停止遷移程式。 The nova service is responsible for maintaining and managing the computing resources of the cloud platform management system 100 . The status of nova service includes normal status or abnormal status. When the nova service is in a normal state, the control node 120 feeds back the status information of the nova service to the monitoring node 110 to start the migration procedure. When the nova service is in an abnormal state, the control node 120 feeds back the status information of the nova service to the monitoring node 110 to stop the migration procedure.

控制節點120可藉由基板管理控制器(Baseboard Management Controller,BMC)檢查電源之狀態。基板管理控制器可於機器未開機之狀態下，對機器進行固件升級、查看機器設備等操作。電源之狀態包括開機、關機或待機狀態。當電源處於開機狀態時，控制節點120向監測節點110回饋電源之狀態資訊，以啟動遷移程式。當電源處於關機或待機狀態時，雲平臺管理系統100掉線，監測節點110無法接收控制節點120之回饋消息。 The control node 120 can check the status of the power supply through a baseboard management controller (BMC). The baseboard management controller can perform operations such as upgrading the firmware of the machine and checking the machine equipment when the machine is not powered on. The state of the power supply includes on, off or standby state. When the power is turned on, the control node 120 feeds back the status information of the power to the monitoring node 110 to start the migration program. When the power is turned off or in standby state, the cloud platform management system 100 is offline, and the monitoring node 110 cannot receive the feedback message from the control node 120 .
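
The three pre-migration checks (idle computing node, nova service, power) can be gathered into one function that reports what blocks the migration. The sketch assumes the three values have already been collected; the `SystemStatus` record and its field names are hypothetical, and the actual queries to the nova service or the BMC are not modelled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemStatus:
    """Hypothetical result of the control node's status queries."""
    idle_compute_nodes: int  # computing nodes currently in the idle area
    nova_service_ok: bool    # True when the nova service is in the normal state
    power_state: str         # "on", "off", or "standby", as read via the BMC


def precheck(status: SystemStatus) -> List[str]:
    """Return the reasons that block migration; an empty list means it may start."""
    blockers: List[str] = []
    if status.idle_compute_nodes < 1:
        blockers.append("no idle compute node")
    if not status.nova_service_ok:
        blockers.append("nova service abnormal")
    if status.power_state != "on":
        blockers.append("power " + status.power_state)
    return blockers
```

Returning every blocker rather than stopping at the first one is a design choice made for this sketch, so that the feedback to the monitoring node can name all failed checks at once.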

控制節點120即時記錄計算節點130上之虛擬機器之狀態。虛擬機器之狀態包括運行(Active)狀態或故障(Error)狀態。於虛擬機器遷移完成之後，控制節點120檢查虛擬機器之狀態是否與遷移之前保持一致，以確定是否需要運行修復程式對虛擬機器進行處理，及向監測節點110回饋虛擬機器之消息。當遷移後虛擬機器之狀態與遷移之前不一致時，若經過一定之時段或重啟虛擬機器之後，虛擬機器之狀態不能恢復至遷移之前之狀態，則控制節點120運行修復程式對虛擬機器進行修復。當遷移後虛擬機器之狀態與遷移之前保持一致時，控制節點120向監測節點110回饋虛擬機器遷移成功之消息。 The control node 120 records the state of the virtual machine on the computing node 130 in real time. The state of the virtual machine includes a running (Active) state or a failure (Error) state. After the migration of the virtual machine is completed, the control node 120 checks whether the state of the virtual machine is consistent with that before the migration, to determine whether to run a repair program to process the virtual machine, and feeds back the information of the virtual machine to the monitoring node 110 . When the state of the virtual machine after the migration is inconsistent with that before the migration, if the state of the virtual machine cannot be restored to the state before the migration after a certain period of time or after restarting the virtual machine, the control node 120 runs a repair program to repair the virtual machine. When the state of the virtual machine after migration is consistent with that before the migration, the control node 120 feeds back a message of successful migration of the virtual machine to the monitoring node 110 .

舉例而言，虛擬機器於遷移之前處於運行狀態，於遷移之後處於故障狀態，若虛擬機器於5min以內或重啟之後恢復至運行狀態，則計算節點130向控制節點120回饋虛擬機器遷移成功之消息。控制節點120再向監測節點110回饋虛擬機器遷移成功之消息。其中，控制節點120可運行重啟程式，以重啟虛擬機器。若虛擬機器於超過5min或重啟之後仍處於故障狀態，則計算節點130向控制節點120回饋虛擬機器遷移失敗之消息。控制節點120再向監測節點110回饋虛擬機器遷移失敗之消息，並運行修復程式，以修復虛擬機器。 For example, suppose the virtual machine is in the running state before the migration and in the failure state after the migration. If the virtual machine returns to the running state within 5 min or after being restarted, the computing node 130 feeds back a migration-success message to the control node 120, and the control node 120 in turn feeds back the migration-success message to the monitoring node 110. The control node 120 can run a restart program to restart the virtual machine. If the virtual machine is still in the failure state after more than 5 min or after being restarted, the computing node 130 feeds back a migration-failure message to the control node 120. The control node 120 then feeds back the migration-failure message to the monitoring node 110 and runs a repair program to repair the virtual machine.
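
The post-migration check in this example can be sketched as a polling loop followed by one restart attempt. The ordering (wait first, then restart) is an assumption, since the text only says "within 5 min or after restarting"; the function name, the `get_state` and `reboot` callbacks, and the 5-second polling interval are likewise hypothetical.

```python
import time
from typing import Callable


def verify_migration(state_before: str,
                     get_state: Callable[[], str],
                     reboot: Callable[[], None],
                     timeout_s: float = 300.0,
                     poll_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Return "success" if the VM regains its pre-migration state, else "repair"."""
    waited = 0.0
    while True:
        if get_state() == state_before:
            return "success"
        if waited >= timeout_s:
            break                # the 5 min grace period has elapsed
        sleep(poll_s)
        waited += poll_s
    reboot()                     # one restart attempt before giving up
    return "success" if get_state() == state_before else "repair"
```

The `sleep` parameter is injectable so the logic can be exercised without real delays; a caller that receives `"repair"` would then run the repair program described above.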

於其中一種實施方式中，於虛擬機器遷移完成之後，監測節點110藉由顯示幕顯示虛擬機器之遷移時間。其中，虛擬機器之遷移時間包括檢查時間，實例時間及執行時間。檢查時間是指於虛擬機器遷移之前控制節點120檢查雲平臺管理系統100之狀態所消耗之時間。檢查時間之取值範圍為1min-5min。例如，檢查時間可取2min。實例時間是指於虛擬機器遷移之前控制節點120調用實例所消耗之時間。實例時間與實例之數目及實例調用之時隙相關，實例調用之時隙之取值範圍為1s-5s。例如，實例調用之時隙可取5s，表示每隔5s調用一個實例。執行時間是指虛擬機器遷移所消耗之時間。執行時間之取值範圍為30s-120s。例如，執行時間可取60s。 In one of the implementation manners, after the migration of the virtual machine is completed, the monitoring node 110 displays the migration time of the virtual machine through the display screen. Wherein, the migration time of the virtual machine includes checking time, instance time and execution time. The check time refers to the time consumed by the control node 120 to check the status of the cloud platform management system 100 before the virtual machine is migrated. The value range of inspection time is 1min-5min. For example, the inspection time may be 2 minutes. The instance time refers to the time consumed by the control node 120 to invoke the instance before the virtual machine migration. The instance time is related to the number of instances and the time slot of instance calling. The value range of the time slot of instance calling is 1s-5s. For example, the time slot for calling an instance can be 5s, which means calling an instance every 5s. Execution time refers to the time consumed by virtual machine migration. The range of execution time is 30s-120s. For example, the execution time may be 60s.

舉例而言，於虛擬機器遷移之前，檢查時間t₁=120s，實例調用之時隙t₀=5s，實例之數目n=30，執行時間t₂=60s，則監測節點110可計算虛擬機器之遷移時間T=t₁+t₀*n+t₂=330s。 For example, before the virtual machine migration, with check time t₁ = 120 s, instance-call time slot t₀ = 5 s, number of instances n = 30, and execution time t₂ = 60 s, the monitoring node 110 can calculate the migration time of the virtual machine as T = t₁ + t₀ × n + t₂ = 330 s.
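
The formula can be verified directly. The function and parameter names below are hypothetical; the values are the ones from the example above.

```python
def migration_time(check_s: float, slot_s: float, instances: int, exec_s: float) -> float:
    """T = t1 + t0 * n + t2: check time, per-instance call slots, execution time."""
    return check_s + slot_s * instances + exec_s


# Example values from the text: 120 s check, 5 s slot, 30 instances, 60 s execution.
print(migration_time(120, 5, 30, 60))  # 330
```

The instance term dominates as the number of instances grows: at a 5 s slot, every additional 12 instances add one minute to the estimate.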

於其中一種實施方式中，監測節點110可儲存虛擬機器遷移之工作記錄。例如，監測節點110可將虛擬機器遷移之全部或部分資料寫入evacuation_computeXX.log檔中，為運維人員改善系統性能提供有效之資料參考。 In one of the implementation manners, the monitoring node 110 can store the work record of virtual machine migration. For example, the monitoring node 110 can write all or part of the virtual machine migration data into the evacuation_computeXX.log file, so as to provide effective data reference for operation and maintenance personnel to improve system performance.
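
One possible way to keep such a work record is to append one JSON line per migrated virtual machine. This is only a sketch: the record fields, the JSON-lines layout, and both function names are assumptions, and the log path is supplied by the caller because the text does not specify what replaces the XX in the file name.

```python
import json
from pathlib import Path
from typing import Dict, List, Union


def append_migration_record(log_path: Union[str, Path], source_node: str,
                            vm_id: str, result: str, duration_s: float) -> Dict:
    """Append one migration record to the log file as a single JSON line."""
    record = {
        "source_node": source_node,
        "vm_id": vm_id,
        "result": result,          # e.g. "success" or "repair"
        "duration_s": duration_s,  # migration time T in seconds
    }
    with Path(log_path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def read_migration_records(log_path: Union[str, Path]) -> List[Dict]:
    """Read back all records, for later review by operation and maintenance staff."""
    with Path(log_path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending rather than rewriting keeps earlier records intact, so the file accumulates a history that can be reviewed after each migration.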

於本實施例中，雲平臺管理系統100藉由監測節點110、控制節點120及計算節點130之間之資訊交互，實現對虛擬機器遷移過程之即時監測與控制。當監測資料滿足觸發條件時，雲平臺管理系統100自動開始進行虛擬機器遷移工作，可自動遷移虛擬機器離開故障節點，恢復上線，並及時處理虛擬機器遷移過程中遇到之例外狀況，減少系統宕機之時間，提高虛擬機器之遷移效率，從而提升運維之效率。於虛擬機器遷移完成之後，運維人員可隨時查看雲平臺管理系統100之工作記錄，以評估雲平臺管理系統100之有效性與可靠性。 In this embodiment, the cloud platform management system 100 realizes real-time monitoring and control of the virtual machine migration process through the information exchange among the monitoring node 110, the control node 120, and the computing nodes 130. When the monitoring data meets a trigger condition, the cloud platform management system 100 automatically starts the virtual machine migration, so that virtual machines are automatically migrated away from the faulty node and brought back online, and exceptions encountered during migration are handled in time. This reduces system downtime and improves the migration efficiency of virtual machines, thereby improving operation and maintenance efficiency. After the migration is complete, operation and maintenance personnel can review the work records of the cloud platform management system 100 at any time to evaluate its effectiveness and reliability.

The virtual machine migration method can be applied to the monitoring node 110 of the cloud platform management system 100. Referring to FIG. 2, the virtual machine migration method includes:

S101, the monitoring node 110 monitors the status of the computing node 130.

The status of the computing node 130 is either normal or abnormal. An abnormal status includes the computing node 130 going offline or crashing. When the computing node 130 encounters an unexpected event, such as going offline or crashing, the monitoring node 110 captures the event and determines whether the computing node 130 meets the trigger condition.

In one implementation, operation and maintenance personnel can select one or more monitoring nodes 110 to monitor the status of the computing nodes 130.

S102, the monitoring node 110 determines whether the computing node 130 meets the trigger condition. If the computing node 130 meets the trigger condition, step S103 is performed. If it does not, the method returns to step S101.

In one implementation, the trigger condition is that the offline duration of the computing node 130 reaches a preset duration, or that the status of the computing node 130 is unstable. The preset duration ranges from 1 min to 5 min.

The computing node 130 being offline involves the following four conditions: the nova-compute agent service is offline, the number of nova-compute agent services in the offline state is 1, the 10G network is offline, and the 1G network is offline. When all four conditions are met at the same time, the computing node 130 is determined to be offline.
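As a non-limiting illustration, the offline determination and the duration-based trigger described above can be sketched as follows. The class `NodeProbe`, its field names, and the 180 s default are hypothetical and chosen only for this example; the description specifies only the four conditions and the 1 min to 5 min range.

```python
from dataclasses import dataclass


@dataclass
class NodeProbe:
    """One monitoring sample for a computing node (field names are assumptions)."""
    nova_compute_down: bool   # the nova-compute service is reported offline
    down_service_count: int   # number of nova-compute services currently offline
    net_10g_down: bool        # the 10G network is offline
    net_1g_down: bool         # the 1G network is offline


def is_offline(probe: NodeProbe) -> bool:
    """The node counts as offline only when all four conditions hold together."""
    return (probe.nova_compute_down
            and probe.down_service_count == 1
            and probe.net_10g_down
            and probe.net_1g_down)


def offline_trigger(offline_seconds: float, preset_seconds: float = 180.0) -> bool:
    """Trigger once the offline duration reaches the preset duration (1-5 min)."""
    if not 60.0 <= preset_seconds <= 300.0:
        raise ValueError("preset duration must lie between 1 min and 5 min")
    return offline_seconds >= preset_seconds
```

Requiring both networks to be down before declaring the node offline avoids treating a single failed link as a node failure.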

An unstable status of the computing node 130 involves the following four conditions: the nova-compute agent service is offline, the number of nova-compute agent services in the offline state is 1, the 10G network is offline, and the number of computing nodes 130 in the offline 10G network is 1. The status of the computing node 130 is determined to be unstable when the number of nova-compute agent services in the offline state is 1 and the frequency at which the nova-compute agent service goes offline reaches a first preset frequency, and the number of computing nodes 130 in the offline 10G network is 1 and the frequency at which the 10G network goes offline reaches a second preset frequency. The first preset frequency ranges from 8 to 14 times per 10 min. The second preset frequency ranges from 4 to 7 times per 10 min.
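The frequency-based determination can be illustrated with a sliding-window counter. This is a minimal sketch under stated assumptions: the names `DropCounter` and `is_unstable` are hypothetical, the frequencies are read as event counts per 10-minute window, and timestamps are assumed to be recorded in chronological order.

```python
from collections import deque


class DropCounter:
    """Counts offline events inside a sliding window (10 minutes by default)."""

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self.events = deque()

    def record(self, timestamp: float) -> None:
        # Timestamps must be appended in chronological order.
        self.events.append(timestamp)

    def count(self, now: float) -> int:
        # Drop events that have aged out of the window, then count the rest.
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events)


def is_unstable(service_drops: DropCounter, net_drops: DropCounter, now: float,
                down_services: int = 1, down_10g_nodes: int = 1,
                first_freq: int = 8, second_freq: int = 4) -> bool:
    """Both frequency thresholds must be reached, each with a count of exactly 1."""
    return (down_services == 1
            and service_drops.count(now) >= first_freq
            and down_10g_nodes == 1
            and net_drops.count(now) >= second_freq)
```

The defaults use the lower bounds of the two stated ranges; any value within 8 to 14 and 4 to 7 could be substituted.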

When the computing node 130 meets the trigger condition, the monitoring node 110 sends a message to the control node 120 so that the virtual machines of the abnormal computing node 130 are migrated to other available computing nodes 130.

S103, the monitoring node 110 sends a message to the control node 120 to migrate the virtual machine.

In one implementation, when the monitoring node 110 sends the message to the control node 120, the monitoring node 110 checks the thread flag and the state lock of the control node 120 to reduce the occurrence of thread re-entry.
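One possible way to picture the flag-and-lock check is shown below. The sketch assumes the thread flag and the state lock are two booleans guarded by a mutex; the class `ControlNodeState` and its method names are hypothetical and are not taken from the application.

```python
import threading


class ControlNodeState:
    """Hypothetical model of the thread flag and state lock of a control node."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the two fields below
        self.migrating = False          # thread flag: a migration program is running
        self.locked = False             # state lock: the control node is locked

    def try_begin_migration(self) -> bool:
        """Return True only if no migration is running and the node is unlocked."""
        with self._mutex:
            if self.migrating or self.locked:
                return False            # refuse, which avoids re-entry
            self.migrating = True
            self.locked = True
            return True

    def end_migration(self) -> None:
        """Clear both indicators once the migration program has finished."""
        with self._mutex:
            self.migrating = False
            self.locked = False
```

Checking and setting the indicators inside one critical section means that two near-simultaneous requests cannot both be accepted.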

S104, the monitoring node 110 switches to a safe state to stop the migration of virtual machines on other computing nodes 130.

When the control node 120 starts migrating the virtual machines on one abnormal computing node 130, the monitoring node 110 switches to the safe state and stops the migration of virtual machines on other abnormal computing nodes 130. Once the migration of the virtual machines on that abnormal computing node 130 is confirmed to be complete, the monitoring node 110 switches back to the working state to continue monitoring the status of the computing nodes 130.
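The switching between the working state and the protective state of step S104 can be sketched as a two-state machine. The names `Mode` and `Monitor` are hypothetical, and the `pending` list that remembers deferred nodes is an assumption added for illustration; the description states only that migration of other abnormal nodes is stopped.

```python
from enum import Enum


class Mode(Enum):
    WORKING = "working"
    SAFE = "safe"


class Monitor:
    """Hypothetical monitoring-node state machine for step S104."""

    def __init__(self):
        self.mode = Mode.WORKING
        self.pending = []  # abnormal nodes deferred while a migration is running

    def on_trigger(self, node: str) -> bool:
        """Return True if a migration request is sent for this node."""
        if self.mode is Mode.SAFE:
            self.pending.append(node)   # do not start a second migration
            return False
        self.mode = Mode.SAFE           # one migration at a time
        return True

    def on_migration_done(self) -> None:
        """Return to the working state and resume monitoring."""
        self.mode = Mode.WORKING
```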

S105, the monitoring node 110 displays the migration time of the virtual machine on the display screen.

The migration time of the virtual machine comprises the check time, the instance time, and the execution time. The check time ranges from 1 min to 5 min. The instance time depends on the number of instances and the time slot between instance calls; the time slot ranges from 1 s to 5 s. The execution time ranges from 30 s to 120 s. The monitoring node 110 calculates and displays the migration time of the virtual machine, and operation and maintenance personnel can use it to set the start time of other threads, thereby improving the operating efficiency of the system.
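A worked estimate may help. The sketch below assumes that the instance time equals the number of instances multiplied by the call time slot and that the three components simply add up; the description says only that the instance time is related to these two quantities, so both assumptions are illustrative. The function name is hypothetical.

```python
def estimate_migration_seconds(check_s: float, instances: int,
                               slot_s: float, exec_s: float) -> float:
    """Estimate migration time as check time + instance time + execution time."""
    if not 60 <= check_s <= 300:
        raise ValueError("check time must lie between 1 min and 5 min")
    if not 1 <= slot_s <= 5:
        raise ValueError("call time slot must lie between 1 s and 5 s")
    if not 30 <= exec_s <= 120:
        raise ValueError("execution time must lie between 30 s and 120 s")
    if instances < 0:
        raise ValueError("number of instances cannot be negative")
    instance_s = instances * slot_s  # assumed relation, see lead-in
    return check_s + instance_s + exec_s
```

For example, a 60 s check, 10 instances at a 2 s slot, and a 30 s execution give 60 + 20 + 30 = 110 s under these assumptions.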

S106, the monitoring node 110 stores the migration record of the virtual machine.

In one implementation, the monitoring node 110 can write all or part of the virtual machine migration data to an evacuation_computeXX.log file, giving operation and maintenance personnel useful reference data for improving system performance.

FIG. 3 is a schematic structural diagram of the monitoring node 110 according to an embodiment of the present application.

Referring to FIG. 3, the monitoring node 110 may include a processor 111, a memory 112, a communication module 113, and a display screen 114. The processor 111 is electrically connected to the other components. The processor 111 can implement the virtual machine migration method of the embodiments of the present application by running the computer program or code stored in the memory 112.

The processor 111 may include one or more processing units. For example, the processor 111 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be separate devices or may be integrated into one or more processors.

The memory 112 may include an external memory interface and an internal memory. The external memory interface can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the monitoring node 110. The external memory card communicates with the processor 111 through the external memory interface to provide data storage. The internal memory can be used to store computer-executable program code, which includes instructions. The internal memory may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during use of the monitoring node 110 (such as audio data or a phone book). In addition, the internal memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). The processor 111 performs the various functional applications and data processing of the monitoring node 110 by running instructions stored in the internal memory and/or instructions stored in a memory provided in the processor 111.

The communication module 113 may include a mobile communication module and a wireless communication module. The mobile communication module can provide wireless communication solutions for the monitoring node 110, including 2G/3G/4G/5G. The wireless communication module can provide wireless communication solutions for the monitoring node 110, including wireless local area network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), Global Navigation Satellite System (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).

The display screen 114 is used to display images, video, and the like. The display screen 114 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the monitoring node 110 may include one or N display screens 114, where N is a positive integer greater than 1.

It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the monitoring node 110. In other embodiments of the present application, the monitoring node 110 may include more or fewer components than shown, may combine or split certain components, or may arrange the components differently.

The virtual machine migration method can also be applied to the control node 120 of the cloud platform management system 100. Referring to FIG. 4, the virtual machine migration method includes:

S201, the control node 120 checks whether an idle computing node 130 exists. If an idle computing node 130 exists, step S202 is performed. If no idle computing node 130 exists, step S205 is performed.

In one implementation, the cloud platform management system 100 can divide the storage area of the computing nodes 130 into an idle area and a working area according to the working state of the computing nodes 130. When a computing node 130 is present in the idle area, that is, an idle computing node 130 exists, the control node 120 feeds the status information of the computing node 130 back to the monitoring node 110 to start the migration program. When no computing node 130 is present in the idle area, that is, no idle computing node 130 exists, the control node 120 feeds the status information of the computing node 130 back to the monitoring node 110 to stop the migration program.

S202, the control node 120 checks whether the status of the nova service is normal. If the status of the nova service is normal, step S203 is performed. If the status of the nova service is abnormal, step S205 is performed.

After the control node 120 identifies an idle computing node 130, it can establish a migration path from the source host to the target host and then check whether the status of the nova service of the cloud platform management system 100 is normal. The source host is the computing node 130 whose virtual machines are to be migrated. The target host is the computing node 130 that is to receive the virtual machines.

When the nova service is in a normal state, the control node 120 feeds the status information of the nova service back to the monitoring node 110 to start the migration program. When the nova service is in an abnormal state, the control node 120 feeds the status information of the nova service back to the monitoring node 110 to stop the migration program.

S203, the control node 120 checks whether the status of the power supply is normal. If the status of the power supply is normal, step S204 is performed. If the status of the power supply is abnormal, step S205 is performed.

In one implementation, the control node 120 can check the power status of the cloud platform management system 100 through the baseboard management controller. The power status is powered on, powered off, or standby. When the power is on, the control node 120 feeds the power status information back to the monitoring node 110 to start the migration program. When the power is off or in standby, the cloud platform management system 100 is offline, and the monitoring node 110 cannot receive the feedback message from the control node 120.

S204, the control node 120 starts the migration program.

When the control node 120 is to migrate the virtual machine, it checks the status of the cloud platform management system 100, which comprises steps S201-S203 above. When the status of the cloud platform management system 100 is normal, the control node 120 starts the migration program.

S205, the control node 120 stops the migration program.

When the status of the cloud platform management system 100 is abnormal, the control node 120 stops the migration program.
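The ordering of steps S201 to S205 can be sketched as a chain of checks that stops at the first failure. This is a non-limiting illustration: the function `run_prechecks` is hypothetical, and each check is passed in as a plain callable so that no particular OpenStack or BMC interface is implied.

```python
from typing import Callable, Optional, Sequence, Tuple

Check = Tuple[str, Callable[[], bool]]


def run_prechecks(checks: Sequence[Check]) -> Tuple[str, Optional[str]]:
    """Run the checks in order and stop at the first one that fails.

    Returns ("S204", None) when every check passes, so the migration
    program starts, or ("S205", failed_step) so the migration program stops.
    """
    for step, check in checks:
        if not check():
            return "S205", step
    return "S204", None
```

A caller might pass `[("S201", has_idle_node), ("S202", nova_is_normal), ("S203", power_is_normal)]`, where the three callables are placeholders supplied by the deployment. Later checks are skipped once an earlier one fails, matching the branch to S205 in each step.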

S206, the control node 120 checks whether the state of the virtual machine is the same as before the migration. If the state of the virtual machine is the same as before the migration, step S207 is performed. If the state of the virtual machine differs from that before the migration, step S208 is performed.

The control node 120 can record the state of the virtual machines on the computing node 130 in real time. The state of a virtual machine is either a running state or a fault state.

In one implementation, after the migration of the virtual machine is complete, the control node 120 checks whether the state of the virtual machine is the same as before the migration.

S207, the control node 120 feeds back to the monitoring node 110 a message that the virtual machine was migrated successfully.

When the state of the virtual machine after the migration is the same as before the migration, the control node 120 feeds back to the monitoring node 110 a message that the virtual machine was migrated successfully.

S208, the control node 120 determines whether the virtual machine needs to be repaired. If the virtual machine does not need to be repaired, step S209 is performed. If the virtual machine needs to be repaired, step S210 is performed.

When the state of the virtual machine after the migration differs from that before the migration, the control node 120 determines whether the virtual machine needs to be repaired.

It can be understood that when the state of the virtual machine after the migration differs from that before the migration, there are two possible cases. First, the virtual machine is down for a certain period and can automatically return to its pre-migration state after some time or after a restart. Second, the virtual machine has failed and cannot return to its pre-migration state unless it is repaired.

S209, the control node 120 restarts the virtual machine.

In the first case above, there is no need to run the repair program; restarting the virtual machine or waiting for it to recover automatically is sufficient. The control node 120 can run a restart program to restart the virtual machine.

In one implementation, if the state of the virtual machine still cannot be restored after the restart, the control node 120 feeds back to the monitoring node 110 a message that the virtual machine migration failed, and step S210 is performed.

S210, the control node 120 repairs the virtual machine.

In the second case above, the repair program needs to be run to repair the failed virtual machine.
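The branching of steps S206 to S210 can be traced with a small function that returns the sequence of steps taken. The function name, the string states, and the `restart` callable are hypothetical and serve only to illustrate the flow; a real restart or repair program is not modeled.

```python
from typing import Callable, List


def post_migration_steps(state_before: str, state_after: str,
                         needs_repair: bool,
                         restart: Callable[[], str]) -> List[str]:
    """Return the steps taken after a migration, following S206 to S210."""
    steps = ["S206"]                      # compare the state with the one before
    if state_after == state_before:
        return steps + ["S207"]           # report success
    steps.append("S208")                  # decide whether repair is needed
    if not needs_repair:
        steps.append("S209")              # restart the virtual machine
        if restart() == state_before:
            return steps                  # the restart restored the state
        # The restart did not help: a failure is reported, then repair follows.
    steps.append("S210")                  # run the repair program
    return steps
```

The fall-through from S209 to S210 reflects the implementation in which a failed restart leads to a failure message and then to repair.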

FIG. 5 is a schematic structural diagram of the control node 120 according to an embodiment of the present application.

Referring to FIG. 5, the control node 120 may include a processor 121, a memory 122, and a communication module 123. The processor 121 can implement the virtual machine migration method of the embodiments of the present application by running the computer program or code stored in the memory 122.

The processor 121 includes a baseboard management controller (BMC) and a general-purpose processor. The control node 120 can check the status of the power supply through the baseboard management controller. For the general-purpose processor, refer to the description of the processor 111; for the memory 122, refer to the description of the memory 112; and for the communication module 123, refer to the description of the communication module 113. These are not repeated here.

It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the control node 120. In other embodiments of the present application, the control node 120 may include more or fewer components than shown, or may arrange the components differently.

An embodiment of the present application further provides a storage medium for storing a computer program or code. When the computer program or code is executed by a processor, the virtual machine migration method of the embodiments of the present application is implemented.

The storage medium includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.

The embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to these embodiments. Within the scope of knowledge possessed by a person of ordinary skill in the art, various changes can be made without departing from the spirit of the present application.

S101-S106: Steps

Claims

A virtual machine migration method applied to a monitoring node, the improvement being that the method includes: monitoring the status of a computing node; determining whether the computing node satisfies a trigger condition, wherein the trigger condition includes that the offline duration of the computing node reaches a preset duration, or that the state of the computing node is unstable; if the computing node satisfies the trigger condition, sending a message to a control node to migrate a virtual machine; and checking a thread flag and a state lock of the control node, wherein the thread flag is used to indicate whether the control node is running a virtual machine migration program, and the state lock is used to indicate whether the control node is in a locked state.

The virtual machine migration method as described in claim 1, wherein the computing node being offline includes: the nova-compute agent service is offline, the number of nova-compute agent services in the offline state is 1, the 10G network is offline, and the 1G network is offline.

The virtual machine migration method as described in claim 1, wherein the state of the computing node being unstable includes: the nova-compute agent service is offline, the number of nova-compute agent services in the offline state is 1, the 10G network is offline, and the number of computing nodes in the offline 10G network is 1.

The virtual machine migration method as described in claim 3, wherein, when the number of nova-compute agent services in the offline state is 1 and the frequency at which the nova-compute agent service goes offline reaches a first preset frequency, and the number of computing nodes in the offline 10G network is 1 and the frequency at which the 10G network goes offline reaches a second preset frequency, the state of the computing node is determined to be unstable.

The virtual machine migration method according to claim 1, wherein the method further includes: calculating and displaying the migration time of the virtual machine.

The virtual machine migration method according to claim 1, wherein the method further includes: storing the migration record of the virtual machine.

A virtual machine migration method applied to a control node, the improvement being that the method includes: checking the status of a cloud platform management system; feeding back a virtual machine migration message to a monitoring node, wherein the virtual machine migration message includes a thread flag and a state lock, the thread flag is used to indicate whether the control node is running a migration program of the virtual machine, and the state lock is used to indicate whether the control node is in a locked state; checking whether the state of the virtual machine is the same as before the migration; and if the state of the virtual machine differs from that before the migration, repairing the virtual machine.

The virtual machine migration method according to claim 7, wherein the checking the status of the cloud platform management system includes: checking whether there are idle computing nodes; checking whether the status of the nova service is normal; checking whether the status of the power supply is normal.

An electronic device, including a communication module, a display screen, a memory, and a processor, the improvement being that the processor runs the computer program or code stored in the memory to implement the virtual machine migration method as described in any one of claims 1 to 8.

A storage medium for storing a computer program or code, the improvement being that, when the computer program or code is executed by a processor, the virtual machine migration method as described in any one of claims 1 to 8 is implemented.