TWI591489B - Intelligent monitoring and warning device and method for distributed software defined storage system - Google Patents

Info

Publication number
TWI591489B
Authority
TW
Taiwan
Prior art keywords
data
early warning
storage system
software definition
definition storage
Prior art date
Application number
TW105141327A
Other languages
Chinese (zh)
Other versions
TW201822018A (en)
Inventor
Hsu Fang Lai
Original Assignee
Chunghwa Telecom Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chunghwa Telecom Co Ltd
Priority to TW105141327A
Application granted
Publication of TWI591489B
Publication of TW201822018A

Landscapes

  • Debugging And Monitoring (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Description

Intelligent monitoring and early warning device and method for a distributed software-defined storage system

The present invention relates to an intelligent monitoring and early warning device, and a method thereof, for a distributed software-defined storage system. Through an automated monitoring and response process, it reduces the operation and maintenance cost of the distributed software-defined storage system and improves service quality.

Storage systems receive great attention in today's data center environments. Whether an organization is building new kinds of systems to support innovative business models or deploying long-established, mature IT applications, it needs storage systems to hold data or to serve analysis. As the capacity that systems must provision keeps growing, distributed storage systems have emerged that split data into pieces for storage, so that users can apply parallel techniques to speed up computation on the data. Such systems also raise fault tolerance through a backup mechanism, so that storing large volumes of data is no longer a problem.

To ensure the service quality of storage systems and to overcome the shortcomings of traditional storage architectures, such as the difficulty of centrally managing storage resources, software-defined storage (SDS) systems have been developed on the market. Software-defined storage is an evolving concept in computer data storage in which software control determines the policies and management of data storage; it is a storage technology that decouples the storage hardware from the software that manages the storage infrastructure. With software-defined storage, features such as deduplication, replication, thin provisioning, snapshots, and backup can be enabled, and policy-based management of storage resources can be provided.

Software-defined storage can be combined with a distributed storage system to form a distributed software-defined storage system. Such a system uses software to handle data protection, which allows a high level of protection to be reached more flexibly: more disk drives can fail at the same time without causing data loss. The system also offers freely scalable performance and a self-healing mechanism. Despite these advantages, current distributed software-defined storage systems have no mechanism for guarding against anomalies before they occur. Even with self-healing, they cannot absorb in real time the impact produced when an anomaly occurs, and the stability of storage performance suffers as a result.

In view of the above problems of the prior art, the object of the present invention is to provide an intelligent monitoring and early warning device, and a method thereof, for a distributed software-defined storage system that, through an automated monitoring and response process, reduces the operation and maintenance cost of the distributed software-defined storage system and improves service quality.

According to the object of the present invention, an intelligent monitoring and early warning device for a distributed software-defined storage system is proposed. It comprises: a status data collection module that collects status data on the operation of each node of the distributed software-defined storage system; an intelligent analysis module, connected to the status data collection module, that receives and analyzes the status data and further compares the status data with anomaly model data to produce anomaly comparison result data; and an early warning and response module, connected to the intelligent analysis module. The early warning and response module reads current configuration data of the distributed software-defined storage system and, after receiving the anomaly comparison result data, computes target configuration data according to the degree of abnormality. It then compares the degree of difference between the current configuration data and the target configuration data and adjusts the configuration of the distributed software-defined storage system gradually, step by step.

According to the object of the present invention, an intelligent monitoring and early warning method for a distributed software-defined storage system is also proposed. It comprises the following steps: using a status data collection module to collect status data on the operation of each node of the distributed software-defined storage system; using an intelligent analysis module to receive and analyze the status data and to further compare the status data with anomaly model data, thereby producing anomaly comparison result data; using an early warning and response module to read current configuration data of the distributed software-defined storage system and, after receiving the anomaly comparison result data, to compute target configuration data according to the degree of abnormality; and using the early warning and response module to compare the degree of difference between the current configuration data and the target configuration data and then to adjust the configuration of the distributed software-defined storage system gradually, step by step.

According to the above technical features, the present invention further comprises a status database that is connected to the status data collection module and stores the status data.

According to the above technical features, the intelligent analysis module is connected to the status database. It reads the existing data in the status database and performs computation and analysis on it, comparing normal status data with abnormal status data to construct the anomaly model data. The intelligent analysis module also receives analysis feedback data entered by a user and uses it to update and adjust the anomaly model data.

According to the above technical features, the status data includes processor utilization, memory utilization, disk access throughput, disk access operation rate, disk access response time, disk health information, network traffic, and node response time.

According to the above technical features, when the early warning and response module adjusts the configuration gradually, it performs a single adjustment within a specific period of time and performs the next adjustment only after the data recovery state of the distributed software-defined storage system has stabilized.

In summary, the intelligent monitoring and early warning device and method for a distributed software-defined storage system of the present invention have one or more of the following features:

1. Through automated collection and analysis of status data, the present invention builds an anomaly model. While the system is running, it can determine in real time whether each monitored device or piece of equipment shows a tendency toward abnormality and thereby detect potential anomalies. Feedback from manual review is used to correct the anomaly model and improve the accuracy of its determinations.

2. Based on the abnormal conditions identified by the intelligent analysis module, the early warning and response module of the present invention decides on a new configuration for the distributed software-defined storage system, compares it with the current configuration, and adjusts the configuration progressively, step by step. The size of each adjustment is controlled so that the distributed software-defined storage system can return to a stable state within a given time, which effectively avoids degrading its service quality.

3. With the monitoring and response process of the present invention, maintenance personnel can prepare early for expected anomalies and handle them immediately when they occur, which makes maintenance work more efficient.

10‧‧‧Status data collection module

20‧‧‧Intelligent analysis module

21‧‧‧Anomaly model data

30‧‧‧Early warning and response module

40‧‧‧Status database

100‧‧‧Distributed software-defined storage system

101‧‧‧Node

S11~S14‧‧‧Process steps

S21~S28‧‧‧Process steps

S31~S38‧‧‧Process steps

FIG. 1 is a schematic diagram of the intelligent monitoring and early warning device of the present invention.

FIG. 2 is a flow chart of the intelligent monitoring and early warning method of the present invention.

FIG. 3 is a flow chart of how the intelligent analysis module of the present invention analyzes status data.

FIG. 4 is a flow chart of how the early warning and response module of the present invention handles the anomaly analysis result.

To help the examiner understand the technical features, content, and advantages of the present invention and the effects it can achieve, the invention is described in detail below with reference to the accompanying drawings and in the form of embodiments. The drawings are intended only for illustration and to support the specification; they do not necessarily reflect the true proportions or exact configuration of an actual implementation. The proportions and layout of the drawings should therefore not be used to interpret or limit the scope of rights of the invention in actual practice.

The present invention proposes an intelligent monitoring and early warning device, and a method thereof, for a distributed software-defined storage system. It collects and stores the monitoring data and software operation records of each node in the distributed software-defined storage system and analyzes them in real time using methods such as statistics, anomaly detection, and machine learning. When a hardware failure occurs, it identifies the abnormal data that may have led to the failure and builds anomaly model data from it. If an abnormal data pattern is later detected while the distributed software-defined storage system is running, a warning can be issued in advance and the data placement weights adjusted, so that the system moves data out of the area where the anomaly is occurring and prepares for the anomaly early. This reduces the impact on the storage service when the anomaly occurs and also speeds up the replacement of damaged hardware, thereby keeping the performance of the distributed software-defined storage system stable.

To describe the technical features of the present invention more clearly, refer to FIG. 1, which is a schematic diagram of the intelligent monitoring and early warning device of the present invention. The device, which can be applied to a distributed software-defined storage system, mainly comprises a status data collection module 10, an intelligent analysis module 20, an early warning and response module 30, and a status database 40. The status data collection module 10 is connected to the status database 40, and the intelligent analysis module 20 is connected to the status data collection module 10, the early warning and response module 30, and the status database 40.

An agent program for collecting status data is deployed on each node 101 of the monitored distributed software-defined storage system 100. The agent periodically sends status data to the status data collection module 10. After receiving the latest status data, the status data collection module 10 stores it in the status database 40 and forwards it to the intelligent analysis module 20 for anomaly analysis. The status data includes processor utilization, memory utilization, disk access throughput, disk access operation rate, disk access response time, disk health information, network traffic, node response time, and similar data.
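As an illustration only, one report sent by a node agent could be modeled as follows. The patent lists the metrics but does not specify a record layout or wire format, so the field names, the units, and the use of JSON are all assumptions made for this sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class NodeStatus:
    """One status report from a node agent (hypothetical layout)."""
    node_id: str
    timestamp: float          # seconds since the epoch
    cpu_usage: float          # processor utilization, 0..1
    memory_usage: float       # memory utilization, 0..1
    disk_throughput: float    # disk access throughput, bytes/s
    disk_iops: float          # disk access operation rate, ops/s
    disk_latency_ms: float    # disk access response time
    disk_health: float        # disk health score, 0..1 (assumed scale)
    network_traffic: float    # network traffic, bytes/s
    node_latency_ms: float    # node response time


def encode_report(status: NodeStatus) -> str:
    """Serialize a report for sending to the collection module."""
    return json.dumps(asdict(status), sort_keys=True)


def decode_report(payload: str) -> NodeStatus:
    """Rebuild a report on the collection side before storing it."""
    return NodeStatus(**json.loads(payload))
```

A collection module built along these lines would decode each payload, write the record to its status database, and pass the same record on for analysis.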

When the intelligent analysis module 20 starts, it reads the status data already stored in the status database 40 to construct the anomaly model data 21. When it receives the latest status data from the status data collection module 10, it analyzes that data against the anomaly model data 21 and then sends the resulting anomaly comparison result data to the early warning and response module 30. In more detail, the intelligent analysis module 20 reads the existing data in the status database 40 and performs computation and analysis on it, comparing normal status data with abnormal status data to construct the anomaly model data 21. After receiving status data from the status data collection module 10, the intelligent analysis module 20 can detect whether a potential anomaly exists. It first normalizes the status data and filters out obviously abnormal values, then compares the status data with the anomaly model data 21 to produce the anomaly comparison result data.
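The model-building and normalization steps can be sketched as below. The patent does not say how the normal range is derived, so the mean plus or minus k standard deviations rule, the function names, and the dictionary-based sample format are assumptions chosen to keep the example small.

```python
from statistics import mean, pstdev


def build_normal_ranges(normal_samples, k=3.0):
    """Derive a (low, high) normal range per metric from historical
    samples known to be normal. The mean +/- k*std rule is an assumption."""
    ranges = {}
    for metric in normal_samples[0]:
        values = [sample[metric] for sample in normal_samples]
        mu, sigma = mean(values), pstdev(values)
        ranges[metric] = (mu - k * sigma, mu + k * sigma)
    return ranges


def normalize(sample, ranges):
    """Scale each metric so that its normal range maps onto [0, 1]."""
    scaled = {}
    for metric, value in sample.items():
        low, high = ranges[metric]
        span = high - low
        scaled[metric] = 0.5 if span == 0 else (value - low) / span
    return scaled


def out_of_range(sample, ranges):
    """Return the metrics whose raw value lies outside the normal range."""
    return sorted(
        metric for metric, value in sample.items()
        if not ranges[metric][0] <= value <= ranges[metric][1]
    )
```

Normalizing to a common scale lets metrics with very different units, such as utilization ratios and latencies, be compared against the same model.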

While running, the early warning and response module 30 detects the current settings and configuration of the distributed software-defined storage system 100. When it receives anomaly comparison result data from the intelligent analysis module 20, it sends a warning message to maintenance personnel, computes a new configuration from the anomaly comparison result data, and compares it with the current configuration. It then adjusts the distributed software-defined storage system 100 gradually, step by step, so that the system stays stable and keeps providing service. In more detail, the early warning and response module 30 reads the current configuration data of the distributed software-defined storage system 100 and, on receiving the anomaly comparison result data, computes target configuration data according to the degree of abnormality. After comparing the degree of difference between the current configuration data and the target configuration data, it adjusts the configuration of the distributed software-defined storage system 100 gradually, step by step. In this gradual adjustment, each adjustment is followed by a wait until the data recovery state of the distributed software-defined storage system 100 has stabilized before the next adjustment is made, and each adjustment is controlled so that it completes within a given time. This safeguards the operation and service quality of the distributed software-defined storage system 100.
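The gradual adjustment described above can be sketched as a loop that moves data placement weights toward the target in bounded steps and waits for the system to settle after each one. The weight dictionaries, the `apply` and `wait_until_stable` callbacks, and the fixed per-step limit are hypothetical; the patent does not prescribe an interface to the storage system.

```python
import math


def next_step(current, target, max_delta):
    """Move every weight toward its target by at most max_delta."""
    step = {}
    for key, value in current.items():
        diff = target[key] - value
        if abs(diff) <= max_delta:
            step[key] = target[key]          # close enough: land on target
        else:
            step[key] = value + math.copysign(max_delta, diff)
    return step


def adjust_gradually(current, target, max_delta, apply, wait_until_stable,
                     max_rounds=100):
    """Apply one bounded adjustment at a time, waiting for the storage
    system to stabilize before the next one (callbacks are assumptions)."""
    config = dict(current)
    rounds = 0
    while config != target and rounds < max_rounds:
        config = next_step(config, target, max_delta)
        apply(config)            # push the intermediate configuration
        wait_until_stable()      # block until data recovery has settled
        rounds += 1
    return config
```

Bounding each step limits how much data has to move at once, which is what keeps the recovery time of every individual adjustment predictable.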

In the above, maintenance personnel can confirm the abnormal condition on the distributed software-defined storage system 100 itself and return analysis feedback data to the early warning and response module 30. The early warning and response module 30 forwards the analysis feedback data to the intelligent analysis module 20, which, on receiving it, updates and adjusts the anomaly model data 21 and the status database 40 to correct subsequent analysis and thereby avoid misjudgments.

Refer to FIG. 2, which is a flow chart of the intelligent monitoring and early warning method of the present invention. The steps are as follows:

Step S11: Use a status data collection module to collect status data on the operation of each node of the distributed software-defined storage system.

Step S12: Use an intelligent analysis module to receive and analyze the status data, and further compare the status data with anomaly model data to produce anomaly comparison result data.

Step S13: Use an early warning and response module to read current configuration data of the distributed software-defined storage system and, after receiving the anomaly comparison result data, compute target configuration data according to the degree of abnormality.

Step S14: Use the early warning and response module to compare the degree of difference between the current configuration data and the target configuration data, and then adjust the configuration of the distributed software-defined storage system gradually, step by step.

Refer next to FIG. 3, which is a flow chart of how the intelligent analysis module of the present invention analyzes status data. The steps are as follows:

Step S21: Receive a status data message sent by the status data collection module.

Step S22: Determine whether the target monitored by this status data has already been marked as abnormal. If so, jump to step S28 and keep the abnormal determination; otherwise continue with the following steps.

Step S23: Normalize the status data according to its type to facilitate subsequent analysis.

Step S24: Determine whether the received status data falls within the normal range recorded in the anomaly model. If so, jump to step S27 and determine that the monitored target is normal; otherwise continue with the following steps.

Step S25: Calculate how long the status data has been outside the normal range and determine whether that duration exceeds the tolerable observation period. If so, jump to step S28 and determine that the monitored target is abnormal; otherwise continue with the following steps.

Step S26: Compare the status data of the monitored target against the anomaly model data to determine whether it matches the characteristics of previously observed anomalies, and compute a degree of similarity that expresses the likelihood of an anomaly. If the similarity exceeds a given value, the monitored target is determined to be abnormal; otherwise it is determined to be normal.
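The decision flow of steps S22 to S26 can be sketched as a single function. This is a minimal illustration under stated assumptions: the `AnomalyModel` and `TargetState` types, the use of cosine similarity as the similarity measure, and the default thresholds are not taken from the patent, which only says that a similarity is computed and compared with a given value.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class AnomalyModel:
    normal_ranges: Dict[str, Tuple[float, float]]   # per-metric (low, high)
    anomaly_signatures: List[Dict[str, float]]      # normalized past anomalies
    grace_period: float = 300.0          # tolerable observation period, seconds
    similarity_threshold: float = 0.9    # assumed cut-off for S26


@dataclass
class TargetState:
    flagged: bool = False                        # already marked abnormal?
    out_of_range_since: Optional[float] = None   # when it left the normal range


def normalize(sample, model):
    """S23: scale each metric so its normal range maps onto [0, 1]."""
    scaled = {}
    for metric, value in sample.items():
        low, high = model.normal_ranges[metric]
        span = high - low
        scaled[metric] = 0.5 if span == 0 else (value - low) / span
    return scaled


def cosine_similarity(a, b):
    keys = sorted(set(a) & set(b))
    dot = sum(a[k] * b[k] for k in keys)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in keys))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in keys))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def classify(sample, state, model, now):
    """Return True when the monitored target is judged abnormal."""
    if state.flagged:                                   # S22 -> S28
        return True
    scaled = normalize(sample, model)                   # S23
    if all(0.0 <= v <= 1.0 for v in scaled.values()):   # S24 -> S27
        state.out_of_range_since = None
        return False
    if state.out_of_range_since is None:                # S25
        state.out_of_range_since = now
    if now - state.out_of_range_since > model.grace_period:
        state.flagged = True
        return True
    score = max((cosine_similarity(scaled, sig)         # S26
                 for sig in model.anomaly_signatures), default=0.0)
    state.flagged = score > model.similarity_threshold
    return state.flagged
```

Keeping a small per-target state object is what allows S22 to short-circuit later reports and S25 to measure how long a target has stayed outside its normal range.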

Refer next to FIG. 4, which is a flow chart of how the early warning and response module of the present invention handles the anomaly analysis result. The steps are as follows:

Step S31: Receive the anomaly comparison result data sent by the intelligent analysis module and trigger step S32. Step S35 determines whether the result contains a new potentially abnormal monitored target; if so, step S36 is triggered as well.

Step S32: Compute a new target configuration from the anomaly comparison result data, including policies such as data placement weights.

Step S33: Read the current configuration, compare it with the new target configuration, and compute the difference between them.

Step S34: Adjust the configuration gradually, step by step. The size of each adjustment is fine-tuned according to the time the previous adjustment took, so that the adjustment time stays within a given range and the storage system remains stable.

Step S36: Issue a new potential-anomaly warning to maintenance personnel.

Step S37: After checking the actual situation, maintenance personnel give feedback confirming whether the potential anomaly exists.

Step S38: Send the feedback from maintenance personnel back to the intelligent analysis module to help correct the anomaly model data and subsequent determinations.
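Step S34 tunes the size of each adjustment from the time the previous one took. One simple way to do that, shown here purely as an assumption, is to scale the step in proportion to the ratio between the desired and the observed adjustment time, clamped to fixed bounds. The function name, the proportional rule, and the bounds are hypothetical.

```python
def tune_step(previous_step, elapsed, target_time,
              min_step=0.01, max_step=0.5):
    """Pick the next adjustment size from how long the last one took.

    previous_step: size of the last adjustment (e.g. weight change)
    elapsed:       seconds the system needed to stabilize afterwards
    target_time:   seconds each adjustment is supposed to take
    """
    if elapsed <= 0:
        return previous_step             # no usable measurement yet
    # Proportional rule (an assumption): a slow adjustment shrinks the
    # next step, a fast one lets it grow, within [min_step, max_step].
    scaled = previous_step * (target_time / elapsed)
    return max(min_step, min(max_step, scaled))
```

For instance, if a step of 0.2 took 120 seconds against a 60 second target, the next step is halved; if it took only 30 seconds, the next step is doubled, subject to the upper bound.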

Specifically, the present invention is divided into three main modules: the status data collection module, the intelligent analysis module, and the early warning and response module. The status data collection module collects the status data of each node in the distributed software-defined storage system and stores it in the status database. The intelligent analysis module analyzes the status data, constructs the anomaly model, and determines the degree of abnormality of each target monitored through the status data. The early warning and response module reports the potential anomalies that are found and adjusts the storage system configuration according to the degree of abnormality. By assisting the operation of the distributed software-defined storage system in an automated and intelligent way, the invention gives maintenance personnel advance warning so that they can prepare for or deal with abnormal equipment ahead of time, and it adjusts the storage system configuration in response to potential anomalies so that performance is not affected when an anomaly occurs. This can substantially reduce the cost of managing and maintaining a distributed software-defined storage system.

In view of the above, the present invention, in going beyond the prior art, has indeed achieved the intended improvements and would not be readily conceived by those skilled in the art. Moreover, the invention was not disclosed before filing, and its inventive step and practical utility clearly satisfy the requirements for a patent. This patent application is therefore filed in accordance with the law, and the Office is respectfully requested to approve it in order to encourage invention.

The embodiments described above only illustrate the technical ideas and features of the present invention. Their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly; they must not be used to limit the patent scope of the invention. Equivalent changes or modifications made in accordance with the spirit disclosed by the invention shall still fall within the patent scope of the invention.

Claims (10)

一種應用於分散式軟體定義儲存系統之智慧式監控與預警裝置,其包含:一狀態資料收集模組,係收集分散式軟體定義儲存系統之各節點運行之一狀態資料;一智慧分析模組,係連接該狀態資料收集模組,以接收該狀態資料並進行分析,且該智慧分析模組係進一步將該狀態資料與一異常模型資料進行比對,進而產生一異常比對結果資料;以及一預警與反應模組,係連接該智慧分析模組,該預警與反應模組係讀取分散式軟體定義儲存系統之一目前配置資料,並在接收到該異常比對結果資料後係依據異常程度運算出一目標配置資料,且該預警與反應模組係比較該目前配置資料與該目標配置資料之差異程度,進而以漸近方式逐步調整分散式軟體定義儲存系統之配置。 A smart monitoring and early warning device for a distributed software definition storage system, comprising: a state data collection module, which collects state data of each node operation of the distributed software definition storage system; a smart analysis module, The state data collection module is connected to receive the state data and analyzed, and the smart analysis module further compares the state data with an abnormal model data, thereby generating an abnormal comparison result data; The early warning and response module is connected to the smart analysis module, and the early warning and response module reads the current configuration data of one of the distributed software definition storage systems, and according to the abnormality degree after receiving the abnormal comparison result data A target configuration data is calculated, and the early warning and response module compares the difference between the current configuration data and the target configuration data, and then gradually adjusts the configuration of the distributed software definition storage system in an asymptotic manner. 如申請專利範圍第1項所述之智慧式監控與預警裝置,其更包含一狀態資料庫,係連接該狀態資料收集模組,以儲存該狀態資料。 The intelligent monitoring and early warning device of claim 1, further comprising a status database connected to the status data collection module to store the status data. 
如申請專利範圍第2項所述之智慧式監控與預警裝置,其中該智慧分析模組係連接該狀態資料庫,且該智慧分析模組係讀取該狀態資料庫中的既存資料並進行運算與分析,以比對正常狀態資料與異常狀態資料來建構出該異常模型資料,以及該智慧分析模組係接收使用者所輸入之分析回饋資料來更新與調整該異常模型資料。 The intelligent monitoring and early warning device according to claim 2, wherein the smart analysis module is connected to the state database, and the smart analysis module reads the existing data in the state database and performs an operation. And analyzing, comparing the normal state data with the abnormal state data to construct the abnormal model data, and the smart analysis module receives the analysis feedback data input by the user to update and adjust the abnormal model data. 如申請專利範圍第1項所述之智慧式監控與預警裝置,其中該狀態資料係包含處理器使用率、記憶體使用率、磁碟 存取吞吐流量、磁碟存取操作速率、磁碟存取反應時間、磁碟健康度資訊、網路使用流量及節點反應時間。 The intelligent monitoring and early warning device according to claim 1, wherein the status data includes processor usage, memory usage, and disk. Access throughput traffic, disk access operation rate, disk access response time, disk health information, network usage traffic, and node response time. 如申請專利範圍第1項所述之智慧式監控與預警裝置,其中該預警與反應模組以漸近方式進行調整配置係於一特定時間內執行單一次之調整,並於分散式軟體定義儲存系統資料回復狀態穩定後再進行下一次的調整。 For example, the intelligent monitoring and early warning device described in claim 1 wherein the early warning and response module is adjusted in an asymptotic manner is performed in a specific time to perform a single adjustment, and is defined in a decentralized software definition storage system. After the data recovery status is stable, the next adjustment is made. 
一種應用於分散式軟體定義儲存系統之智慧式監控與預警方法,其包含下列步驟:利用一狀態資料收集模組收集分散式軟體定義儲存系統之各節點運行之一狀態資料;利用一智慧分析模組接收該狀態資料並進行分析,並進一步將該狀態資料與一異常模型資料進行比對,進而產生一異常比對結果資料;利用一預警與反應模組讀取分散式軟體定義儲存系統之一目前配置資料,並在接收到該異常比對結果資料後依據異常程度運算出一目標配置資料;以及利用該預警與反應模組比較該目前配置資料與該目標配置資料之差異程度,進而以漸近方式逐步調整分散式軟體定義儲存系統之配置。 A smart monitoring and early warning method for a distributed software definition storage system, comprising the steps of: collecting a state data of each node operation of a distributed software definition storage system by using a state data collection module; using a smart analysis module The group receives the status data and analyzes it, and further compares the status data with an abnormal model data to generate an abnormal comparison result data; and uses an early warning and response module to read one of the distributed software definition storage systems. Currently, the configuration data is obtained, and after receiving the abnormal comparison result data, a target configuration data is calculated according to the abnormality degree; and the difference between the current configuration data and the target configuration data is compared by using the early warning and the response module, and then the asymptotic The method gradually adjusts the configuration of the decentralized software definition storage system. 如申請專利範圍第6項所述之應用於分散式軟體定義儲存系統之智慧式監控與預警方法,其更包含下列步驟:利用一狀態資料庫儲存該狀態資料。 The intelligent monitoring and early warning method applied to the distributed software definition storage system according to claim 6 of the patent application further includes the following steps: storing the status data by using a state database. 
The intelligent monitoring and early warning method for a distributed software-defined storage system of claim 7, further comprising the following steps: reading, with the intelligent analysis module, the existing data in the status database and performing computation and analysis on it, comparing normal-state data with abnormal-state data to construct the anomaly model data; and receiving, with the intelligent analysis module, analysis feedback data entered by a user to update and adjust the anomaly model data.

The intelligent monitoring and early warning method for a distributed software-defined storage system of claim 6, wherein the status data comprises processor utilization, memory utilization, disk access throughput, disk access operation rate, disk access response time, disk health information, network traffic, and node response time.

The intelligent monitoring and early warning method for a distributed software-defined storage system of claim 6, wherein, when adjusting the configuration in a gradual manner, the early warning and response module performs a single adjustment within a specific period of time and performs the next adjustment only after the data recovery state of the distributed software-defined storage system has stabilized.
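The claimed steps (compare status data against an anomaly model, derive a target configuration from the degree of anomaly, then approach that target one adjustment at a time) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the threshold-based `AnomalyModel`, the metric names, the choice of a single integer tunable (a hypothetical count of concurrent recovery tasks), the scaling rule in `target_config`, and the `is_stable` callback that stands in for checking whether data recovery has stabilized.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical metric names mapped to their latest readings.
Metrics = Dict[str, float]


@dataclass
class AnomalyModel:
    """Per-metric thresholds standing in for the claimed anomaly model data."""
    thresholds: Metrics

    def compare(self, state: Metrics) -> float:
        """Return an anomaly degree in [0, 1]: the fraction of metrics over threshold."""
        if not self.thresholds:
            return 0.0
        exceeded = sum(
            1 for name, limit in self.thresholds.items()
            if state.get(name, 0.0) > limit
        )
        return exceeded / len(self.thresholds)


def target_config(current: int, degree: float, minimum: int = 1) -> int:
    """Derive a target value for one tunable from the anomaly degree.

    Assumption: a higher anomaly degree calls for less background work,
    so the tunable is scaled down proportionally, never below `minimum`.
    """
    return max(minimum, round(current * (1.0 - degree)))


def adjust_gradually(
    current: int,
    target: int,
    is_stable: Callable[[], bool],
    step: int = 1,
    max_rounds: int = 100,
) -> List[int]:
    """Move `current` toward `target` by at most `step` per round.

    Each round stands for one "specific period of time"; a step is taken
    only when `is_stable()` reports that data recovery has settled.
    Returns the sequence of values the tunable passed through.
    """
    history = [current]
    for _ in range(max_rounds):
        if current == target:
            break
        if not is_stable():
            continue  # skip this window and check again in the next one
        current += max(-step, min(step, target - current))
        history.append(current)
    return history


if __name__ == "__main__":
    model = AnomalyModel({"cpu_usage": 0.9, "disk_latency_ms": 50.0})
    state = {"cpu_usage": 0.95, "disk_latency_ms": 20.0}
    degree = model.compare(state)
    target = target_config(8, degree)
    print(degree, target, adjust_gradually(8, target, is_stable=lambda: True))
```

In a real deployment each round would wait for the configured time window and query the storage cluster for its recovery status; the sketch replaces both with a plain loop and a callback so that the single-step-per-window behavior is easy to see.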
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW105141327A TWI591489B (en) 2016-12-14 2016-12-14 Intelligent monitoring and warning device and method for distributed software defined storage system

Publications (2)

Publication Number Publication Date
TWI591489B true TWI591489B (en) 2017-07-11
TW201822018A TW201822018A (en) 2018-06-16

Family

ID=60048583

Country Status (1)

Country Link
TW (1) TWI591489B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344026A (en) 2018-07-27 2019-02-15 阿里巴巴集团控股有限公司 Data monitoring method, device, electronic equipment and computer readable storage medium
TWI829895B (en) * 2020-03-20 2024-01-21 中華電信股份有限公司 Model monitoring system based on health and method thereof

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111913656A (en) * 2019-05-10 2020-11-10 香港商希瑞科技股份有限公司 Computer storage node and method in distributed shared storage system
US11042459B2 (en) 2019-05-10 2021-06-22 Silicon Motion Technology (Hong Kong) Limited Method and computer storage node of shared storage system for abnormal behavior detection/analysis
TWI747199B (en) * 2019-05-10 2021-11-21 香港商希瑞科技股份有限公司 Method and computer storage node of shared storage system for abnormal behavior detection/analysis
US11507484B2 (en) 2019-05-10 2022-11-22 Silicon Motion Technology (Hong Kong) Limited Ethod and computer storage node of shared storage system for abnormal behavior detection/analysis
CN111913656B (en) * 2019-05-10 2024-04-19 香港商希瑞科技股份有限公司 Computer storage node and method in distributed shared storage system

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees