TWI912560B - System and method for triggering a visual indicator of a faulty memory drive - Google Patents
System and method for triggering a visual indicator of a faulty memory drive
- Publication number: TWI912560B (TW111135257A)
- Authority: TW (Taiwan)
Description
The present technology relates to data storage solutions and, more specifically, to a system and method for triggering a visual indicator of a faulty memory drive.
Because of the large amounts of digital data created every day, the storage requirements for that data keep growing. For example, various types of user data, organization data, and/or application data may need to be stored. This increases the demand for data storage capacity. Cloud storage systems can provide users and/or organizations with data storage capacity to cope with these ever-increasing requirements.
Generally speaking, cloud storage is a model of computer storage in which digital data is stored in logical pools. The physical storage that actually holds the digital data may span multiple servers, possibly located in different locations (i.e., different data centers), and is typically managed by a company hosting the cloud storage service. Users and/or organizations usually buy or lease storage capacity from cloud storage service providers in order to store their digital data. The cloud storage service provider is in turn responsible for keeping the digital data available and accessible while ensuring that the physical storage is protected against data loss.
U.S. Patent Application No. 2020/0028902 discloses a chassis comprising a plurality of nodes, a network switch, and a programmable device configured to manage a shared resource of the chassis.
Developers of the present technology have appreciated certain technical drawbacks associated with the prior art.
Developers of the present technology have realized that some solutions employ a bus interface between a baseboard management controller (BMC) of a server unit and a BMC of a storage unit, inter alia for sending commands that trigger the visual indicators of faulty drives. However, BMC-to-BMC communication is inefficient because it is prone to timeouts, retries, and other problems that stem from the complexity of the multi-layer communication protocol used to establish it.
Broadly speaking, the baseboard management controller (BMC) of a server unit is located on the motherboard of the server and is used to monitor the physical state of the server unit. It is a specialized microcontroller that can be embedded on the motherboard of a computer, such as a server unit. A BMC can also manage the interface between system-management software and platform hardware, and can have its own firmware and volatile memory. Different types of sensors built into the server unit can report parameters to the BMC, such as temperature, cooling fan speeds, power status, and operating system (OS) status. The BMC can monitor the sensors and, if any parameter does not stay within preset limits (which indicates a potential failure in the system), send an alert to a system administrator over the network. The administrator can also communicate with the BMC remotely to take corrective action, such as resetting or power cycling the system to get a hung OS running again. These capabilities can reduce the total cost of owning or operating a system.
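The limit-checking behaviour described above can be sketched as follows. This is only an illustration under stated assumptions, not BMC firmware: the sensor names, the limit values, and the `Limit` and `out_of_range` names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Preset lower and upper bound for one monitored parameter."""
    low: float
    high: float


# Hypothetical preset limits; real values depend on the platform.
LIMITS = {
    "cpu_temp_c": Limit(5.0, 85.0),
    "fan_rpm": Limit(1500.0, 12000.0),
}


def out_of_range(readings, limits=LIMITS):
    """Return one alert string per reading that is outside its preset limits."""
    alerts = []
    for name, value in sorted(readings.items()):
        limit = limits.get(name)
        if limit is None:
            continue  # no preset limit is known for this sensor
        if not (limit.low <= value <= limit.high):
            alerts.append(f"{name}={value} outside [{limit.low}, {limit.high}]")
    return alerts


# An over-temperature reading produces an alert; the fan reading does not.
print(out_of_range({"cpu_temp_c": 91.0, "fan_rpm": 4000.0}))
```

In a real controller the alerts would be forwarded to an administrator over the network; here they are simply returned so the check itself stays easy to follow.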
Developers of the present technology have devised a system comprising a server unit, a data storage unit, and a bus architecture that provides more efficient communication between the server unit and the data storage unit. More particularly, a bus architecture implemented in accordance with non-limiting embodiments of the present technology, located between the server unit and the data storage unit, may no longer require the BMC-to-BMC communication link present in other known solutions. The bus architecture disclosed herein can, in particular, allow the visual indicators of faulty drives to be triggered without the BMC of a server unit communicating with a BMC of a data storage unit.
In a first broad aspect of the present technology, there is provided a system comprising: a server unit including a host processor for generating input/output (I/O) operations and a service processor for monitoring a physical state of the server unit; and a first bus interface for transmitting the I/O operations from the host processor to a data storage unit for execution. The data storage unit includes: a plurality of memory drives for executing the I/O operations, the plurality of memory drives having been grouped into a first group and a second group, where a given memory drive from a respective group (i) is associated with a respective position in the respective group and (ii) has a respective visual indicator indicative of a state of the given memory drive; a first serial-to-parallel input/output (S2PIO) device connected to the first group for controlling the visual indicators of the respective memory drives from the first group; and a second S2PIO device connected to the second group for controlling the visual indicators of the respective memory drives from the second group.
The system comprises a second bus interface for transmitting commands from the service processor to the first and second S2PIO devices, the second bus interface including a first link between the service processor and the first S2PIO device and a second link between the service processor and the second S2PIO device. The system is configured to acquire, by the service processor from the host processor, an indication of a faulty memory drive in the data storage unit. The indication is indicative of a target link amongst the first link and the second link, and of a position of the faulty memory drive in a target group. The target link is connected to a target S2PIO device amongst the first S2PIO device and the second S2PIO device, and the target S2PIO device is connected to the target group, amongst the first and second groups, that is associated with the faulty memory drive. The system is configured to transmit, by the service processor using the target link, a command to the target S2PIO device, the command being for causing the target S2PIO device to trigger the visual indicator associated with the faulty memory drive based on the position of the faulty memory drive in the target group.
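The control flow of this aspect can be modelled in a few lines. The sketch below is a pure software simulation under stated assumptions: the class names, the per-group size of 12, the link numbering (1 and 2), and the bitmask used to represent indicators are all hypothetical, and no real bus access is performed.

```python
from dataclasses import dataclass


@dataclass
class S2PIODevice:
    """Stand-in for an S2PIO device driving the indicators of one group."""
    unique_id: str
    group_size: int
    indicator_bits: int = 0  # bit i set => indicator at position i is triggered

    def handle_command(self, position: int) -> None:
        """Trigger the indicator at the given position within this group."""
        if not 0 <= position < self.group_size:
            raise ValueError(f"position {position} is not in this group")
        self.indicator_bits |= 1 << position


@dataclass(frozen=True)
class FaultIndication:
    """What the service processor acquires from the host processor."""
    target_link: int  # which link of the second bus interface to use
    position: int     # position of the faulty drive in the target group


class ServiceProcessor:
    def __init__(self, links):
        # Maps a link number to the S2PIO device at the far end of that link.
        self.links = dict(links)

    def on_fault(self, indication: FaultIndication) -> S2PIODevice:
        """Send the command over the target link only; return the target device."""
        target = self.links[indication.target_link]
        target.handle_command(indication.position)
        return target


first = S2PIODevice("s2pio-a", group_size=12)
second = S2PIODevice("s2pio-b", group_size=12)
sp = ServiceProcessor({1: first, 2: second})
sp.on_fault(FaultIndication(target_link=2, position=3))
print(bin(second.indicator_bits))  # prints 0b1000
print(first.indicator_bits)        # prints 0
```

The point of the model is that the indication alone (link plus position) is enough to reach the right indicator, so the service processor never has to talk to a controller inside the data storage unit.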
In some embodiments of the system, the first S2PIO device and the second S2PIO device have respective unique identifiers. The system is further configured to transmit, by the target S2PIO device, an indication of the unique identifier of the target S2PIO device to the service processor. The unique identifier of the target S2PIO device and the position of the faulty memory drive in the target group together form a unique identifier of the faulty memory drive in the data storage unit.
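As a small illustration of why the pair is unique, the identifier can be treated as a composite key. The `DriveId` type, the `drive_id` helper, and the example identifier strings are assumptions made for this sketch; the patent does not specify a format.

```python
from typing import NamedTuple


class DriveId(NamedTuple):
    """Composite identifier of a drive within the data storage unit."""
    s2pio_id: str   # unique identifier reported by the target S2PIO device
    position: int   # position of the drive within that device's group


def drive_id(s2pio_id: str, position: int) -> DriveId:
    """Combine an S2PIO identifier and an in-group position into one key."""
    if position < 0:
        raise ValueError("position must be non-negative")
    return DriveId(s2pio_id, position)


# The same position in two different groups yields two distinct identifiers.
a = drive_id("s2pio-a", 3)
b = drive_id("s2pio-b", 3)
print(a != b)  # prints True
```

Positions repeat across groups, so neither field identifies a drive on its own; only the combination does.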
In some embodiments of the system, the service processor of the server unit is a baseboard management controller (BMC) of the server unit.
In some embodiments of the system, the first bus interface is a Serial AT Attachment (SATA) bus interface.
In some embodiments of the system, the second bus interface is an Inter-Integrated Circuit (I2C) bus interface, the first link being a first I2C link and the second link being a second I2C link.
In some embodiments of the system, the data storage unit is a Just a Bunch Of Disks (JBOD) unit.
In some embodiments of the system, the plurality of memory drives comprises at least one of a hard disk drive (HDD) and a solid-state drive (SSD).
In some embodiments of the system, the S2PIO device is a general-purpose I/O (GPIO) expander device.
In some embodiments of the system, the GPIO expander device is a PCA9995.
In some embodiments of the system, the S2PIO device is a complex programmable logic device (CPLD).
In some embodiments of the system, the S2PIO device is a field-programmable gate array (FPGA).
In a second broad aspect of the present technology, there is provided a computer-implemented method of triggering an indicator of a faulty memory drive. The method is executable by a system. The system comprises: a server unit including a host processor for generating input/output (I/O) operations and a service processor for monitoring a physical state of the server unit; and a first bus interface for transmitting the I/O operations from the host processor to a data storage unit for execution. The data storage unit includes a plurality of memory drives for executing the I/O operations, the plurality of memory drives having been grouped into a first group and a second group. A given memory drive from a respective group (i) is associated with a respective position in the respective group and (ii) has a respective visual indicator indicative of a state of the given memory drive.
The system comprises a first serial-to-parallel input/output (S2PIO) device connected to the first group for controlling the visual indicators of the respective memory drives from the first group, a second S2PIO device connected to the second group for controlling the visual indicators of the respective memory drives from the second group, and a second bus interface for transmitting commands from the service processor to the first and second S2PIO devices, the second bus interface including a first link between the service processor and the first S2PIO device and a second link between the service processor and the second S2PIO device. The method comprises acquiring, by the service processor from the host processor, an indication of a faulty memory drive in the data storage unit. The indication is indicative of a target link amongst the first link and the second link, and of a position of the faulty memory drive in a target group. The target link is connected to a target S2PIO device amongst the first S2PIO device and the second S2PIO device, and the target S2PIO device is connected to the target group, amongst the first and second groups, that is associated with the faulty memory drive. The method comprises transmitting, by the service processor using the target link, a command to the target S2PIO device, the command being for causing the target S2PIO device to trigger the visual indicator associated with the faulty memory drive based on the position of the faulty memory drive in the target group.
In some embodiments of the method, the method further comprises transmitting, by the target S2PIO device, an indication of the unique identifier of the target S2PIO device to the service processor. The unique identifier of the target S2PIO device and the position of the faulty memory drive in the target group together form a unique identifier of the faulty memory drive in the data storage unit.
In a third broad aspect of the present technology, there is provided a system comprising: a server unit including a processor for generating input/output (I/O) operations and a baseboard management controller (BMC) for monitoring a physical state of the server unit; and a Just a Bunch Of Disks (JBOD) unit coupled to the server unit by a SATA bus for receiving the I/O operations from the processor. The JBOD includes a plurality of memory drives for executing the I/O operations, the plurality of memory drives having been grouped into a first group and a second group. A given memory drive from a respective group (i) is associated with a respective position in the respective group and (ii) has a respective visual indicator indicative of a state of the given memory drive. The system comprises a first general-purpose input/output (GPIO) device connected to the first group for controlling the visual indicators of the respective memory drives from the first group, and a second GPIO connected to the second group for controlling the visual indicators of the respective memory drives from the second group.
The system comprises a first I2C bus connecting the BMC with the first GPIO for transmitting commands from the BMC to the first GPIO, and a second I2C bus connecting the BMC with the second GPIO for transmitting commands to the second GPIO. The system is configured to acquire, by the BMC from the processor, an indication of a faulty memory drive in the JBOD. The indication is indicative of a target I2C bus amongst the first I2C bus and the second I2C bus, and of a position of the faulty memory drive in a target group. The target I2C bus is connected to a target GPIO amongst the first GPIO and the second GPIO, and the target GPIO is connected to the target group, amongst the first and second groups, that is associated with the faulty memory drive. The system is configured to transmit, by the BMC using the target I2C bus, a command to the target GPIO, the command being for causing the target GPIO to trigger the visual indicator associated with the faulty memory drive based on the position of the faulty memory drive in the target group.
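The text does not say how the processor derives the target bus and the in-group position that make up the indication. One possible scheme, shown purely as an assumption, is a JBOD whose drive slots are numbered consecutively and split into equal groups; the `locate` function, the group size of 12, and the 1-based bus numbering are hypothetical.

```python
def locate(slot: int, group_size: int = 12, groups: int = 2) -> tuple[int, int]:
    """Map a flat drive slot number to (target I2C bus, position in group).

    Assumes slots 0..group_size-1 belong to the first group (bus 1),
    the next group_size slots to the second group (bus 2), and so on.
    """
    if not 0 <= slot < group_size * groups:
        raise ValueError(f"slot {slot} does not exist")
    group_index, position = divmod(slot, group_size)
    return group_index + 1, position


print(locate(0))   # prints (1, 0)
print(locate(15))  # prints (2, 3)
```

Any other layout would work equally well as long as the processor and the wiring of the GPIO devices agree on it.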
In some embodiments of the system, the first GPIO and the second GPIO have respective unique identifiers, and the system is further configured to transmit, by the target GPIO, an indication of the unique identifier of the target GPIO to the BMC. The unique identifier of the target GPIO and the position of the faulty memory drive in the target group together form a unique identifier of the faulty memory drive in the JBOD.
In the context of the present specification, a "client device" is any computer hardware capable of running software appropriate to the relevant task at hand. Thus, some (non-limiting) examples of client devices include personal computers (desktops, laptops, netbooks, etc.), smartphones, and tablets, as well as network equipment such as routers, switches, and gateways. It should be noted that a device acting as a client device in the present context is not precluded from acting as a server to other client devices. The use of the expression "a client device" does not preclude multiple client devices being used in receiving/sending, carrying out or causing to be carried out any task or request, or the consequences of any task or request, or steps of any method described herein.
In the context of the present specification, the expression "information" includes information of any nature or kind whatsoever capable of being stored in a database. Thus, information includes, but is not limited to, audiovisual works (images, movies, sound recordings, presentations, etc.), data (location data, numerical data, etc.), text (opinions, comments, questions, messages, etc.), documents, spreadsheets, lists of words, etc.
In the context of the present specification, the expression "component" is meant to include software (appropriate to a particular hardware context) that is both necessary and sufficient to achieve the specific function(s) being referenced.
In the context of the present specification, the expression "computer usable information storage medium" is intended to include media of any nature and kind whatsoever, including RAM, ROM, disks (CD-ROMs, DVDs, floppy disks, hard disk drives, etc.), USB keys, solid-state drives, tape drives, etc.
In the context of the present specification, the words "first", "second", "third", etc. are used as adjectives only to distinguish the nouns they modify from one another, and not to describe any particular relationship between those nouns. Thus, for example, the use of the terms "first server" and "third server" is not intended to imply any particular order, type, chronology, hierarchy, or ranking of or between the servers, nor is their use (by itself) intended to imply that any "second server" must necessarily exist in any given situation. Further, as discussed herein in other contexts, reference to a "first" element and a "second" element does not preclude the two elements from being the same actual real-world element. Thus, for example, in some instances a "first" server and a "second" server may be the same software and/or hardware; in other cases they may be different software and/or hardware.
Implementations of the present technology each have at least one of the above-mentioned objects and/or aspects, but do not necessarily have all of them. It should be understood that some aspects of the present technology that have resulted from attempting to attain the above-mentioned objects may not satisfy these objects and/or may satisfy other objects not specifically recited herein.
Additional and/or alternative features, aspects, and advantages of implementations of the present technology will become apparent from the following description, the accompanying drawings, and the appended claims.
The present detailed description is intended only as a description of illustrative examples of the present technology. It is not intended to define the scope of the present technology or to set forth its bounds.
Further, where no examples of modifications have been set forth, this should not be interpreted to mean that no modifications are possible and/or that what is described is the sole manner of implementing that particular aspect of the present technology. In addition, it is to be understood that the present detailed description provides, in certain instances, simple implementations of the present technology, and that where this is the case they are presented in this manner as an aid to understanding. Various implementations of the present technology may be of greater complexity.
Referring to Figure 1, there is depicted a system 100. The system 100 is configured for implementing non-limiting embodiments of the present technology. It is to be expressly understood that the system 100 as depicted is merely an illustrative implementation of the present technology. Thus, the description of it that follows is intended only as a description of illustrative examples of the present technology. This description is not intended to define the scope of the present technology or to set forth its bounds.
In some cases, what are believed to be helpful examples of modifications to the system 100 may also be set forth below. This is done merely as an aid to understanding and, again, not to define the scope of the present technology or set forth its bounds. These modifications are not an exhaustive list and, as a person skilled in the art would understand, other modifications are likely possible. Further, where no examples of modifications have been set forth, this should not be interpreted to mean that no modifications are possible and/or that what is described is the sole manner of implementing that element of the present technology. As a person skilled in the art would understand, this is likely not the case. In addition, it is to be understood that the system 100 may provide, in certain instances, simple implementations of the present technology, and that where this is the case they are presented in this manner as an aid to understanding. As a person skilled in the art would understand, various implementations of the present technology may be of greater complexity.
The system 100 comprises a request source 102, a communication network 103, and a processing sub-system 108. How the above-listed components of the system 100 are implemented in accordance with various non-limiting embodiments of the present technology will now be described.
Request source
The request source 102 may be an electronic device associated with an end user (e.g., a client device) or, alternatively, any other sub-system of the system 100 that is configured to provide user requests to the system 100. It should be expressly understood that even though Figure 1 depicts only a single instance of the request source 102, the system 100 may have multiple instances of the request source 102. As illustrated herein, the request source 102 is part of the system 100; however, in some embodiments of the present technology, the request source 102 may be external to the system 100 and connected via a communication link (not numbered).
In fact, a typical implementation of the system 100 can include a large number of request sources 102, such as hundreds, thousands, or millions of instances.
In some embodiments of the present technology, where the system 100 is employed in a business-to-customer (B2C) environment, the request source 102 may be, for example, a given client device, such as a smartphone, associated with a given user of the system 100. For example, the system 100 may provide cloud storage services for the given client device of the given user.
In other embodiments of the present technology, where the system 100 is employed in a business-to-business (B2B) environment, the request source 102 may be, for example, a given sub-system, such as a remote server, that provides user requests to the system 100. For example, in some embodiments of the present technology, the system 100 may provide fault-tolerant data processing and/or storage services for an operator of the given sub-system.
Broadly speaking, irrespective of whether the system 100 is implemented as a B2C or a B2B system (or any other variation of the system for that matter), the request source 102 may be a given client device or another sub-system, which can be internal or external to the system 100.
As mentioned above, the request source 102 is configured to issue a plurality of requests 180, each of which will be referred to herein below as the request 180. The nature of the request 180 will depend on the type of the request source 102 and on the specific implementation of the present technology.
In some embodiments of the present technology, the request source 102 is also configured to receive a plurality of responses 181, each of which will be referred to herein below as the response 181. Generally speaking, in response to the request 180 being processed (or potentially not processed) by the system 100, the system 100 may generate the response 181 destined to the request source 102 associated with the respective request 180. The nature of the response 181 will depend on, amongst other things, the type of the request source 102, the type of the respective request 180, and whether the system 100 processed (or potentially did not process) the respective request 180.
In one example, during processing of the request 180, the system 100 may be configured to request additional data from the request source 102 for continuing or completing the processing of the request 180. In such a case, the system 100 may be configured to generate the response 181 in the form of a data-request message indicative of the additional data requested by the system 100 for continuing or completing the processing of the request 180.
In another example, if the system 100 successfully processes the respective request 180, the system 100 may be configured to generate the response 181 in the form of a success message indicative of successful processing of the respective request 180.
In a further example, if the system 100 fails to successfully process the respective request 180, the system 100 may be configured to generate the response 181 in the form of a failure message indicative of failed processing of the respective request 180. In such a case, the request source 102 may be configured to perform additional actions such as, but not limited to, re-issuing the request 180, performing diagnostic analyses to identify the reason the system 100 failed to process the request 180, issuing a new request destined to the system 100, and the like.
Communication network
The request source 102 is communicatively coupled to the communication network 103 for providing the request 180 to the system 100 and for receiving the response 181 from the system 100. In some non-limiting embodiments of the present technology, the communication network 103 can be implemented as the Internet. In other non-limiting embodiments of the present technology, the communication network 103 can be implemented differently, such as any wide-area communication network, local-area communication network, private communication network, and the like. How a communication link (not separately numbered) between the request source 102 and the communication network 103 is implemented will depend on, amongst other things, how the request source 102 is implemented.
Merely as an example and not as a limitation, in those embodiments of the present technology where the request source 102 is implemented as a wireless communication device (such as a smartphone), the communication link can be implemented as a wireless communication link (such as, but not limited to, a 3G communication network link, a 4G communication network link, Wireless Fidelity, or WiFi® for short, Bluetooth®, and the like). In those examples where the request source 102 is implemented as a remote server, the communication link can be either wireless (such as WiFi®, Bluetooth®, or the like) or wired (such as an Ethernet-based connection).
It should be noted that the communication network 103 is configured to transmit, amongst other things, a request data-packet comprising the request 180 from the request source 102 to the request pre-processing sub-system 104 of the system 100. For example, this request data-packet may comprise computer-executable instructions written in a given declarative-type programming language that represent the request 180. The communication network 103 is also configured to transmit, amongst other things, a response data-packet comprising the response 181 from the system 100 to the request source 102. For example, this response data-packet may comprise computer-executable instructions representing the response 181.
However, it is contemplated that, in some embodiments of the present technology where request source 102 is, for example, a given subsystem of system 100, communication network 103 may be implemented in a manner different from that described above or, in some cases, may even be omitted, without departing from the scope of the present technology.
Processing Subsystem
As mentioned above, system 100 also comprises processing subsystem 108. Generally speaking, processing subsystem 108 is configured to process and store data based on request 180.
To process and store data, processing subsystem 108 comprises a plurality of server racks 150, each of which is referred to below as a server rack 150. In accordance with various embodiments of the present technology, some or all of the plurality of server racks 150 may be located in a single location or distributed over different locations. For example, some or all of the plurality of server racks 150 may be located in a single data center and/or distributed over a plurality of data centers.
Generally speaking, a given server rack is a cabinet that encloses various hardware equipment and is typically used to house and organize that equipment in a manner suited to optimizing the equipment and the use of floor space in the respective data center. In some cases, enclosing hardware equipment in a server rack adds security against theft or accidental damage. In other cases, enclosing hardware equipment in a server rack allows better control of airflow and thus improved cooling of the equipment.
Server rack 150 comprises a plurality of trays 200, each of which is referred to below as a tray 200. Generally speaking, a given tray 200 allows a human operator to pull out the hardware equipment located in the given tray for inspection, maintenance, and replacement.
In some cases, a given tray may contain hardware components that are together referred to herein as a "server". In other cases, a given tray may contain hardware components that are together referred to herein as a "data storage unit". Server rack 150 may therefore comprise a plurality of server-dedicated trays, each containing one or more servers, and a plurality of data-storage-dedicated trays, each containing one or more data storage units.
With reference to Figure 2, there is depicted a left, front, top perspective view of tray 200 with a data storage unit 202 located therein. In at least one embodiment of the present technology, data storage unit 202 may be implemented as "Just a Bunch Of Disks" (JBOD), which is an architecture using multiple drives that can be addressed independently or combined into one or more logical volumes using a volume manager or a device-spanning file system. Tray 200 is configured to move between a closed position, in which only a front panel (not numbered) is accessible to a human operator, and one or more pulled-out positions, in which additional components of the JBOD within tray 200 become accessible to the human operator.
Data storage unit 202 comprises a plurality of memory disks 210, each of which is referred to below as a memory disk 210. Memory disk 210 may be implemented as a solid-state drive (SSD), a hard disk drive (HDD), or the like. It is contemplated that memory disk 210 may be a given removable disk-type drive or a fixed (static) drive. It is also contemplated that the plurality of memory disks 210 may include at least some HDDs and at least some SSDs. In one implementation of the present technology, the plurality of memory disks 210 may include 28 memory disks. In other implementations, more or fewer than 28 memory disks may be included in data storage unit 202, without departing from the scope of the present technology.
Data storage unit 202 comprises a motherboard 220 that holds the different hardware components of data storage unit 202 and provides communication among them. As depicted in this embodiment, motherboard 220 holds a service processor 230 of data storage unit 202 and is adapted for connecting fans (not numbered), a power supply (not numbered), and a number of buses 250.
It should be noted that memory disk 210 has visual indicators 215. Broadly speaking, visual indicators 215 are used to indicate the status of the respective memory disk 210 to a human operator. Typically, two visual indicators 215 may be located on a memory disk and may be implemented via light-emitting diodes (LEDs) or other suitable means.
When memory disk 210 requires maintenance or replacement, for example, system 100 may be configured to actuate a given visual indicator 215 of memory disk 210, so that a human operator knows which memory disk 210 to inspect in a given server room and, more specifically, in server rack 150. Additional hardware components of processing subsystem 108, and how processing subsystem 108 is configured to actuate a given visual indicator of a damaged memory disk, will now be discussed in greater detail with reference to Figure 3.
In Figure 3, there is depicted a schematic representation 300 of data storage unit 202, a server unit 302, and the bus interfaces that enable communication between data storage unit 202 and server unit 302 of processing subsystem 108. As mentioned above, server unit 302 may be located in a given server-dedicated tray of server rack 150 and/or of other server racks in the same server room and/or in a different server room.
Server unit 302 comprises a "host" section 303. Broadly speaking, host section 303 may be referred to as the main processing portion of server unit 302. Host section 303 comprises a host processor 310. It is contemplated that more than one host processor 310 may be part of host section 303 of server unit 302. Host section 303 also comprises a number of other hardware components 320, including (but not limited to): a plurality of volatile memory disks, a plurality of non-volatile memory disks, a plurality of network interfaces, a plurality of PCIe interfaces for connecting external PCIe devices, and the like. Host section 303 also comprises connectors 351 and 352 for providing communication between host section 303 and data storage unit 202 via the bus interfaces discussed in greater detail below.
Host processor 310 is configured to execute an operating system (OS) 315 of server unit 302. Broadly speaking, a given OS is a software component that performs a number of tasks for server unit 302, such as file management, memory management, process management, handling input and output, and controlling peripheral devices such as memory disks and printers.
OS 315 may be an "off-the-shelf" OS and is not particularly limited. In some embodiments, however, OS 315 may be implemented as a Linux-based OS. In other embodiments, OS 315 may be implemented as a Debian-based OS.
Host processor 310 running OS 315 is also configured to execute a disk drive health management application 317. Broadly speaking, disk drive health management application 317 is configured to monitor "health data" acquired by host processor 310 from the plurality of memory disks 210 and to analyze this health data in order to determine a health status of one or more of the plurality of memory disks 210. For example, the health data analyzed by disk drive health management application 317 may comprise indications of temperature, voltage, and current acquired from respective memory disks 210 (and/or groups thereof). In the case of HDDs, the health data analyzed by disk drive health management application 317 may also comprise an indication of the number of worn sectors on a given HDD.
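The kind of check described above can be sketched as follows. This is only an illustrative sketch: the `HealthSample` fields mirror the quantities named in the text, while the limit values, `MAX_WORN_SECTORS`, and the function names are hypothetical and are not taken from the patent or from any particular disk drive health management application.

```python
from dataclasses import dataclass


@dataclass
class HealthSample:
    position: int           # relative position of the drive in its group (1-based)
    temperature_c: float
    voltage_v: float
    current_a: float
    worn_sectors: int = 0   # only meaningful for HDDs


# Hypothetical limits, chosen for illustration only.
LIMITS = {
    "temperature_c": (5.0, 60.0),
    "voltage_v": (4.75, 5.25),
    "current_a": (0.0, 2.0),
}
MAX_WORN_SECTORS = 100


def is_damaged(sample: HealthSample) -> bool:
    """Return True if any reading is out of range or too many sectors are worn."""
    for name, (low, high) in LIMITS.items():
        if not low <= getattr(sample, name) <= high:
            return True
    return sample.worn_sectors > MAX_WORN_SECTORS


def damaged_positions(samples: list[HealthSample]) -> list[int]:
    """Relative positions of the drives flagged as damaged, in input order."""
    return [s.position for s in samples if is_damaged(s)]
```

Note that the result is a list of relative positions only, which matches the later description in which the host processor knows the position within a group but not which group it is.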
Which disk drive health management application is used is not particularly limited, and the choice of a given application may depend, inter alia, on the specific implementation of the present technology. Irrespective of which specific application is used, disk drive health management application 317 is configured to analyze health data in order to determine the health status of different disk drives and to identify one or more damaged disk drives among them. In the context of the present technology, host processor 310 may use disk drive health management application 317 to issue one or more commands for actuating a visual indicator of a damaged disk drive. How host processor 310 may use disk drive health management application 317 to issue the one or more commands is discussed in detail below.
Server unit 302 also comprises a service processor 330. In some embodiments, service processor 330 may be implemented as a baseboard management controller (BMC) of server unit 302. Broadly speaking, the BMC of server unit 302 is located on the motherboard of the server and is used to monitor the physical state of server unit 302. It is a specialized microcontroller that may be embedded on the motherboard of a computer, such as a server unit. A BMC may also manage the interface between system management software and platform hardware. A BMC may have its own firmware and volatile memory. It should be noted that different types of sensors built into server unit 302 (and potentially into other computer systems in processing subsystem 108) may report parameters (e.g., of components of server unit 302 and/or of other computer systems in processing subsystem 108) to the BMC, such as temperature, cooling fan speed, power status, operating system (OS) status, and so on. The BMC may monitor the sensors and may send an alert to a system administrator via the network if any parameter does not stay within preset limits, which indicates a potential failure in the system. The administrator may also communicate with the BMC remotely to take corrective action, such as resetting or power-cycling the system to get a hung OS running again. These capabilities can reduce the total cost of owning or operating a system.
By way of example only, service processor 330 may be used for (i) controlling the host power state (power on, power off) and allowing server unit 302 to be powered on and off remotely by commands from external systems delivered via the network, (ii) monitoring host health (temperature, hardware errors and anomalies) by means of different hardware sensors (such as temperature sensors, voltage and current sensors, and airflow sensors), (iii) providing remote access via the network to a keyboard, mouse, and video monitor of host section 303, (iv) controlling cooling fans depending on the temperature of server components, and the like.
In at least some embodiments, service processor 330, implemented as the BMC of server unit 302, may be referred to as a dedicated computing device inside server unit 302. It may be powered independently of host section 303, and it powers on when power is applied to server unit 302 and before host section 303 is powered on.
As described above with reference to Figure 2, data storage unit 202 has service processor 230, the plurality of memory disks 210, and connectors 353 and 354. Similarly to service processor 330 of server unit 302, service processor 230 of data storage unit 202 may be implemented as a BMC of data storage unit 202. Service processor 230 is configured to monitor the physical state of data storage unit 202 and may therefore perform, in data storage unit 202, functions similar to those performed by service processor 330 in server unit 302. The functions of service processor 230 of data storage unit 202 may, however, exclude some functions of service processor 330 in server unit 302, such as providing access to a keyboard, a mouse, and a monitor.
Developers of the present technology have realized that some solutions employ a bus interface between a BMC of a server unit and a BMC of a storage unit for, inter alia, sending commands for actuating visual indicators of damaged disk drives. BMC-to-BMC communication is inefficient, however, because it is prone to timeouts, retries, and other problems that stem from the complexity of the multi-layer communication protocols used to establish it.
Developers of the present technology have devised a system comprising a server unit, a data storage unit, and a bus architecture for providing more efficient communication between the server unit and the data storage unit. More particularly, a bus architecture between the server unit and the data storage unit implemented in accordance with non-limiting embodiments of the present technology may no longer require the BMC-to-BMC communication link present in other known solutions. As will become apparent from the further description below, the new bus architecture devised by the developers of the present technology allows, inter alia, actuating a visual indicator of a damaged disk drive without the BMC of a server unit communicating with a BMC of a data storage unit.
Returning to the description of Figure 3, it should be noted that the plurality of memory disks 210 is "grouped" into a set of groups that includes a first group 360 of memory disks 210 and a second group 370 of memory disks 210. Other groups of the plurality of memory disks 210 have been omitted solely for the sake of simplicity. In one implementation, data storage unit 202 may include seven groups of memory disks 210, without departing from the scope of the present technology.
In the context of the present technology, each group of memory disks is connected to a respective serial-to-parallel input/output (S2PIO) device. Broadly speaking, a given S2PIO device is a device that is connected to a serial bus interface on one side and has a number of parallel I/O connections on the other side.
In some embodiments, a given S2PIO device may be a given general-purpose I/O (GPIO) expansion device. Broadly speaking, a GPIO is an uncommitted digital signal pin on an integrated circuit or electronic circuit board that may be used as an input or an output.
In one implementation, a PCA9555 may be used as the GPIO expansion device, without departing from the scope of the present technology. The PCA9555 is a 24-pin CMOS device that provides 16 bits of GPIO expansion for I2C-bus/SMBus applications and was developed to enhance the NXP Semiconductors family of I2C-bus I/O expanders. I/O expanders provide a simple solution when additional I/O is needed for ACPI power switches, sensors, push buttons, LEDs, fans, and the like. The PCA9555 consists of two 8-bit Configuration (input or output selection), Input, Output, and Polarity Inversion (active-high or active-low operation) registers. Each I/O can be enabled as an input or an output by writing to the I/O configuration bits. The data for each input or output is kept in the corresponding Input or Output register. The polarity of the read register can be inverted with the Polarity Inversion register. The PCA9555 open-drain interrupt output is activated when any input state differs from its corresponding Input port register state, and is used to indicate to the system controller that an input state has changed. The hardware pins (A0, A1, A2) vary the fixed I2C-bus address and allow up to eight devices to share the same I2C-bus/SMBus.
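The register layout described above can be illustrated with a small sketch that only computes addresses and register values, without touching any hardware. The base address 0x20, the register indexes, and the 0xFF power-on defaults reflect the author of this sketch's reading of the PCA9555 data sheet and should be checked against it; the function names are hypothetical.

```python
# Assumed from the PCA9555 data sheet: 7-bit address 0100 A2 A1 A0.
PCA9555_BASE_ADDRESS = 0x20

# Assumed register indexes (command bytes) for port 0 and port 1.
REG_INPUT = (0x00, 0x01)
REG_OUTPUT = (0x02, 0x03)
REG_POLARITY = (0x04, 0x05)
REG_CONFIG = (0x06, 0x07)


def device_address(a2: int, a1: int, a0: int) -> int:
    """7-bit I2C address selected by the three hardware address pins."""
    return PCA9555_BASE_ADDRESS | (a2 << 2) | (a1 << 1) | a0


def split_pin(pin: int) -> tuple[int, int]:
    """Map a pin number 0..15 to (port, bit within that port)."""
    if not 0 <= pin <= 15:
        raise ValueError("PCA9555 has 16 I/O pins, numbered 0..15 here")
    return divmod(pin, 8)


def drive_pin_writes(pin: int, level: int,
                     config_byte: int = 0xFF,
                     output_byte: int = 0xFF) -> list[tuple[int, int]]:
    """Register writes that turn `pin` into an output driven to `level`.

    `config_byte` and `output_byte` are the current register contents of the
    pin's port (assumed power-on default 0xFF: all inputs, outputs high).
    The output register is written first so the pin does not glitch when it
    is switched from input to output.
    """
    port, bit = split_pin(pin)
    mask = 1 << bit
    new_output = (output_byte | mask) if level else (output_byte & ~mask)
    new_config = config_byte & ~mask  # configuration bit 0 means "output"
    return [(REG_OUTPUT[port], new_output & 0xFF),
            (REG_CONFIG[port], new_config & 0xFF)]
```

With three address pins the helper yields addresses 0x20 through 0x27, which is the "up to eight devices" limit mentioned in the text.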
In other embodiments, however, an S2PIO device may be implemented as a given complex programmable logic device (CPLD), a field-programmable gate array (FPGA), and the like, without departing from the scope of the present technology. It is contemplated that a given S2PIO device may be configured to translate data received via a respective serial bus into the states its I/O pins should have, to change the states of its I/O pins, and to translate data indicative of the current states of its I/O pins into data to be carried by the respective serial bus.
As depicted, first group 360 is connected to a first S2PIO device 380 via links 385, and second group 370 is connected to a second S2PIO device 390 via links 395. It should be noted that a given memory disk has a relative position within its respective group of memory disks. For example, in a given group of four disk drives, a given disk drive may be a first, second, third, or fourth disk drive, depending on which links connect it to the respective S2PIO device. If the given disk drive is connected to a first set of links among the links of the respective S2PIO device, the disk drive may be considered the first disk drive in the respective group of four. In the same example, if the given disk drive is connected to a second set of links among the links of the respective S2PIO device, the disk drive may be considered the second disk drive in the respective group of four. Which disk drive in a group has which relative position matters less than the fact that each disk drive has a unique relative position within its respective group.
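One way to picture "relative position determined by link set" is the sketch below. It assumes, purely for illustration, a 16-pin S2PIO device serving a group of four drives, with each drive wired to four consecutive pins; the patent does not specify the pin assignment, and all names here are hypothetical.

```python
# Hypothetical wiring: 16 I/O pins shared evenly by a group of four drives.
PINS_PER_DEVICE = 16
DRIVES_PER_GROUP = 4
PINS_PER_DRIVE = PINS_PER_DEVICE // DRIVES_PER_GROUP


def pins_for_position(position: int) -> range:
    """Pins (the 'set of links') wired to the drive at a 1-based position."""
    if not 1 <= position <= DRIVES_PER_GROUP:
        raise ValueError("position must be in 1..DRIVES_PER_GROUP")
    start = (position - 1) * PINS_PER_DRIVE
    return range(start, start + PINS_PER_DRIVE)


def position_for_pin(pin: int) -> int:
    """Relative position (1-based) of the drive that a given pin belongs to."""
    if not 0 <= pin < PINS_PER_DEVICE:
        raise ValueError("pin must be in 0..PINS_PER_DEVICE-1")
    return pin // PINS_PER_DRIVE + 1
```

Because the pin ranges are disjoint, every drive in the group ends up with a unique relative position, which is the property the paragraph above emphasizes.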
First S2PIO device 380 and first group 360 of memory disks 210 are connected via respective bus interfaces to a connector 353 for communicating with server unit 302. Second S2PIO device 390 and second group 370 of memory disks are connected via respective bus interfaces to a connector 354 for communicating with server unit 302. It should be noted that the memory disks 210 are connected to server unit 302 by a first bus interface 304, whereas the S2PIO devices are connected to server unit 302 by a second bus interface 305.
In some embodiments of the present technology, first bus interface 304 may be a Serial AT Attachment (SATA) bus interface. Broadly speaking, SATA is a computer bus interface that connects host bus adapters to mass storage devices such as HDDs, optical drives, and SSDs. In other embodiments, the first bus interface may be implemented as a SCSI, SAS, or PCIe bus.
In the illustrated example, first group 360 of memory disks 210 may be connected to host processor 310 by a first SATA link 341 running through connectors 351 and 353. In the same example, second group 370 of memory disks 210 may be connected to host processor 310 by a second SATA link 342 running through connectors 352 and 354.
First bus interface 304 is configured for transmitting I/O operations between host processor 310 and memory disks 210. First bus interface 304 is also configured for transmitting health data from memory disks 210 to host processor 310, for management and analysis by disk drive health management application 317.
In some embodiments of the present technology, second bus interface 305 may be an Inter-Integrated Circuit (I2C) bus interface. Broadly speaking, I2C is a synchronous, multi-master, multi-slave, packet-switched, single-ended serial communication bus that is widely used for attaching lower-speed peripheral ICs to processors and/or microcontrollers. It should be noted that one subset of the I2C bus is called the System Management Bus (SMBus). One purpose of SMBus is to promote robustness and interoperability. Accordingly, modern I2C systems incorporate some policies and rules from SMBus, sometimes supporting both I2C and SMBus and requiring only minimal reconfiguration, either by commanding or by output pin use. In other embodiments, the second bus interface may be implemented via CANbus or RS-422 (ANSI/TIA/EIA-422-B).
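As a small illustration of how a device is addressed on an I2C bus, the sketch below builds the first byte of a transfer from a 7-bit device address and the read/write bit (standard 7-bit I2C addressing), and then prepends it to a payload. The function names are hypothetical, and start/stop conditions and acknowledge bits, which a real bus controller handles, are not modeled.

```python
def address_byte(address_7bit: int, read: bool) -> int:
    """First byte of an I2C transfer: 7-bit address followed by the R/W bit."""
    if not 0 <= address_7bit <= 0x7F:
        raise ValueError("I2C 7-bit address must be in 0x00..0x7F")
    return (address_7bit << 1) | int(read)


def write_frame(address_7bit: int, payload: bytes) -> bytes:
    """Bytes placed on the bus for a write: address byte, then the payload."""
    return bytes([address_byte(address_7bit, read=False)]) + bytes(payload)
```

For a register-based peripheral, the payload would typically be a register index followed by the value to store in it.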
In the illustrated example, first S2PIO device 380 may be connected to service processor 330 of server unit 302 by a first I2C link 343 running through connectors 351 and 353. In the same example, second S2PIO device 390 may be connected to service processor 330 of server unit 302 by a second I2C link 344 running through connectors 352 and 354.
Second bus interface 305 is configured for transmitting commands from service processor 330 of server unit 302 to the respective S2PIO devices. The commands transmitted via second bus interface 305 may include commands for actuating the visual indicators of respective memory disks 210. Second bus interface 305 is also configured for providing service processor 330 of server unit 302 with information about the current state of the respective S2PIO devices and of their respective inputs and outputs (pins).
How processing subsystem 108 is configured to operate in order to actuate a visual indicator of a given damaged disk drive will now be discussed with reference to Figure 4. Host processor 310 is configured to receive health data 402 via second SATA link 342 and may provide health data 402 to disk drive health management application 317. Health data 402 comprises information indicative of the health status of the memory disks 210 in the corresponding target group of memory disks associated with second SATA link 342. It should be noted that host processor 310 is, in a sense, "unaware" of which specific memory disks 210 are associated with second SATA link 342. In other words, host processor 310 does not (yet) know which group of disk drives in data storage unit 202 is the target group of disk drives connected via second SATA link 342.
Host processor 310 employs disk drive health management application 317 to determine which of the memory disks 210 in the target group, if any, is a damaged memory disk. For example, disk drive health management application 317 may determine, based on health data 402, that the second memory disk in the target group (the relative position of the disk drive within the group) is damaged. Again, it should be noted that host processor 310 does not know whether the target group of memory disks corresponding to second SATA link 342 is first group 360 or second group 370. Host processor 310 knows only, based on health data 402, that the second memory disk (relative position) reachable via second SATA link 342 may be damaged.
Host processor 310 is configured to issue a command 404 to service processor 330 via a link 311. Command 404 is configured to instruct service processor 330 to actuate a visual indicator of the damaged memory disk. For example, command 404 may be transmitted via link 311 in accordance with an OEM IPMI protocol. Command 404 comprises information indicative of (i) the relative position of the damaged memory disk in the target group, and (ii) the link via which host processor 310 acquired health data 402. In other words, command 404 comprises information indicating that the second memory disk of a given group of memory disks connected via second SATA link 342 (via connectors 352 and 354) is a damaged memory disk.
Service processor 330 is configured to process command 404 in order to (i) generate a command 406 for actuating the visual indicator of the second memory disk in the target group of memory disks, and (ii) identify a target I2C link via which command 406 is to be transmitted. In this example, service processor 330 may identify, based on command 404, that the target I2C link via which command 406 is to be sent is second I2C link 344 (via connectors 352 and 354). Service processor 330 is thus configured to issue command 406 via second I2C link 344. Command 406 is a low-level command carrying information about which I/O pin of the corresponding target S2PIO device should change its state.
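The translation from command 404 to command 406 can be sketched as a lookup plus a pin calculation. The sketch assumes that each SATA link is paired with the I2C link running through the same connectors (341 with 343, 342 with 344, following the description of Figure 3), and it invents a wiring in which the fault indicator of the drive at relative position p sits on pin p - 1 and is active-low. The class names, the table, and the pin rule are hypothetical and are not the OEM IPMI encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command404:
    sata_link: int           # link via which the health data was acquired
    relative_position: int   # 1-based position of the damaged disk in its group


@dataclass(frozen=True)
class Command406:
    i2c_link: int            # target I2C link for the low-level command
    pin: int                 # I/O pin of the target S2PIO device to change
    level: int               # new pin state


# Assumed pairing of links that share the same connectors.
SATA_TO_I2C = {341: 343, 342: 344}


def fault_led_pin(relative_position: int) -> int:
    """Hypothetical wiring: fault LED of position p is on pin p - 1."""
    if relative_position < 1:
        raise ValueError("relative position is 1-based")
    return relative_position - 1


def translate(cmd: Command404) -> Command406:
    """Build the low-level command and pick the I2C link to send it on."""
    i2c_link = SATA_TO_I2C[cmd.sata_link]
    # level=0 assumes an active-low LED; this is an illustration only.
    return Command406(i2c_link, fault_led_pin(cmd.relative_position), level=0)
```

In this sketch the service processor never needs to know which group it is addressing: selecting the I2C link implicitly selects the S2PIO device, and therefore the group.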
Command 406 is received by the target S2PIO device and indicates that the state of the pin associated with a visual indicator of the second memory disk within the target group should change. The target S2PIO device changes the state of the corresponding pin, thereby actuating the visual indicator of the damaged disk drive. The visual indicator so actuated can help a human operator perform maintenance on the damaged disk drive by visually displaying the drive's damaged status.
It should be noted that each S2PIO device may have a unique identifier among the S2PIO devices associated with respective groups of memory disks 210 in data storage unit 202. It is contemplated that the target S2PIO device is configured to return information indicative of its unique identifier to service processor 330. As a result, service processor 330 now "knows" specifically which memory disk is the damaged memory disk. Indeed, based on the information provided by the target S2PIO device via second I2C link 344, service processor 330 may be configured to identify the target S2PIO device as second S2PIO device 390, which is associated with second group 370 of memory disks. In combination with the information indicative of the relative position of the damaged memory disk in the target group (the second memory disk of the target group), service processor 330 may be configured to identify the damaged memory disk as the second memory disk 210 of second group 370 of memory disks connected to second S2PIO device 390. In other words, service processor 330 may determine an absolute position of the damaged memory disk by accessing information indicative of (i) which S2PIO device in data storage unit 202 is the target S2PIO device, and (ii) which memory disk of the respective target group is the damaged disk drive.
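Combining the two pieces of information into an absolute position can be sketched as follows. The layout of four drives per group follows the four-drive example and the 28-disk, seven-group implementation mentioned earlier; the identifier strings, the table, and the slot-numbering rule are hypothetical.

```python
DRIVES_PER_GROUP = 4  # e.g. 28 memory disks arranged as seven groups of four

# Hypothetical unique identifiers reported by the S2PIO devices, mapped to
# the zero-based index of the group each device serves.
S2PIO_GROUP_INDEX = {
    "s2pio-380": 0,   # first group 360
    "s2pio-390": 1,   # second group 370
}


def absolute_slot(s2pio_id: str, relative_position: int) -> int:
    """1-based slot number of a drive within the whole data storage unit."""
    if not 1 <= relative_position <= DRIVES_PER_GROUP:
        raise ValueError("relative position out of range for the group")
    group_index = S2PIO_GROUP_INDEX[s2pio_id]
    return group_index * DRIVES_PER_GROUP + relative_position
```

Under these assumptions, the second drive behind second S2PIO device 390 maps to slot 6 of the unit.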
In some embodiments, it is contemplated that identifying a particular memory drive as the faulty drive (by the target I2C link and the relative position of the faulty drive within the target group) may allow the service processor 330 of the server unit 302 to provide support to a human operator during maintenance.
For example, assume that a human operator decides to disconnect the faulty memory drive and replace it with a new one, but mistakenly disconnects the wrong drive. In this example, assume that the operator disconnects the third memory drive of the second group 370 instead of the second. The target S2PIO device (the second S2PIO device 390) may be configured to read the current state of its pins and determine that the pin of the third memory drive in its group has been disconnected. This information may be transmitted to the service processor 330 via the second I2C link 342, and then to the host processor 310 via the link 311. The drive health management application 317 may be configured to compare the information so obtained with the information identifying the faulty drive. In this example, the drive health management application 317 may determine that a memory drive other than the faulty one has been disconnected from the target group. The drive health management application 317 may be configured to trigger one or more remedial actions notifying the human operator to that effect.
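The comparison performed by the drive health management application in this scenario could look roughly like the sketch below. The function names and the message wording are hypothetical; the description only states that the disconnected positions are compared with the position of the faulty drive.

```python
from typing import Iterable, List


def wrongly_disconnected(faulty_position: int, disconnected: Iterable[int]) -> List[int]:
    """Return the positions that were disconnected but are not the faulty drive."""
    return sorted(p for p in set(disconnected) if p != faulty_position)


def remedial_messages(faulty_position: int, disconnected: Iterable[int]) -> List[str]:
    """Build operator notifications (wording is illustrative only)."""
    return [
        f"Drive at position {p} was disconnected, "
        f"but the faulty drive is at position {faulty_position}."
        for p in wrongly_disconnected(faulty_position, disconnected)
    ]


# Scenario from the text: the faulty drive is second, the operator pulled the third.
for message in remedial_messages(faulty_position=2, disconnected=[3]):
    print(message)
```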
In some embodiments of the present technology, it is contemplated that the system 108 may be configured to execute a method 500, a schematic block representation of which is shown in Figure 5. Various steps of the method 500 will now be described in detail.
Step 502: Obtaining, by the service processor from the host processor, an indication of a faulty memory drive in the data storage unit
The method 500 begins at step 502 with a service processor obtaining, from a host processor, an indication of a faulty memory drive in a data storage unit.
In one example, during step 502, the host processor 310 may be configured to transmit the command 404 to the service processor 330 (see Figure 4). It should be noted that the command 404 is transmitted from the host processor 310 of the server unit 302 to the service processor 330 of the same server unit 302. It is contemplated that the host processor 310 may be a given processor of a host portion of a given server unit, while the service processor 330 may be the BMC of that same server unit.
The command 404 may carry the indication of the faulty memory drive 410 in the data storage unit 202. It should be noted that the plurality of memory drives in the data storage unit 202 (used for performing I/O operations) are grouped into respective groups of memory drives 210. A given memory drive from a respective group is associated with a respective position within that group (for example, the first, second, third, and fourth positions in a group of four memory drives) and has at least one respective visual indicator for displaying a status of the given memory drive. A given group of drives is connected to a respective S2PIO device for monitoring and controlling the visual indicators of the respective memory drives in the given group. In one example, a given S2PIO device may be a GPIO expander device. It can be said that, in some embodiments, the visual indicators within a given group of drives may be controlled via the respective S2PIO device rather than by a local BMC (such as the BMC of the respective data storage unit). A given group of drives is also connected to a given host processor of a server unit. In one example, the given group of drives may be connected to the given host processor via one or more SATA bus links. A given S2PIO device is connected to a given service processor of the same server unit. In one example, a given GPIO expander device may be connected to the BMC of the server unit via a respective I2C link.
The command 404 may carry the indication. The indication is indicative of, for example, a target link amongst the first link and the second link of a given bus interface (such as an I2C bus interface). The target link is connected to a target S2PIO device amongst the S2PIO devices in the data storage unit 202. The target S2PIO device is connected to a target group, amongst the groups of memory drives in the data storage unit 202, that is associated with the faulty memory drive 410. The indication is also indicative of a position of the faulty memory drive 410 within the target group. The host processor 310 may generate the command 404 once the drive health management application 317 has obtained and analyzed the health data 402. For example, the health data 402 may be obtained via a given link of the other bus interface (such as a SATA bus interface).
Step 504: Transmitting, by the service processor, a command to the target S2PIO device over the target link
The method 500 continues to step 504 with the service processor transmitting a given command to the target S2PIO device via the target link. The given command is for causing the target S2PIO device to actuate a visual indicator associated with the faulty memory drive based on the position of the faulty memory drive within the target group. It should be noted that the indication of the target link obtained at step 502 may be used by the service processor 330 to determine which of the links of the given bus interface (for example, the I2C bus interface) should carry the given command, while the indication of the position of the faulty memory drive within the target group obtained at step 502 may be used by the service processor 330 to indicate, to the target S2PIO device receiving the given command, which of the visual indicators it is connected to should be actuated.
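Steps 502 and 504 together can be sketched as a small dispatch routine. Everything named here is hypothetical: the `FaultIndication` payload, the link names `"i2c-1"` and `"i2c-2"`, the `FakeI2cLink` stand-in that records writes instead of driving a real bus, and the dictionary used as the command body are assumptions made so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class FaultIndication:
    """Assumed payload of command 404: which link, and which position in the group."""

    target_link: str
    position: int


@dataclass
class FakeI2cLink:
    """Stand-in for an I2C link; records what would be written to the S2PIO device."""

    sent: List[dict] = field(default_factory=list)

    def write(self, payload: dict) -> None:
        self.sent.append(payload)


class ServiceProcessor:
    def __init__(self, links: Dict[str, FakeI2cLink]) -> None:
        self.links = links

    def on_fault_indication(self, indication: FaultIndication) -> None:
        """Step 504: pick the target link, then tell the S2PIO device which indicator to actuate."""
        if indication.target_link not in self.links:
            raise ValueError(f"unknown link {indication.target_link!r}")
        self.links[indication.target_link].write(
            {"actuate_position": indication.position}
        )


links = {"i2c-1": FakeI2cLink(), "i2c-2": FakeI2cLink()}
service_processor = ServiceProcessor(links)
service_processor.on_fault_indication(FaultIndication(target_link="i2c-2", position=2))
print(links["i2c-2"].sent)
```

The point of the sketch is the division of labor: the link choice is consumed by the service processor, while the position travels on to the S2PIO device, which is the only component that needs to know which pin to change.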
In some embodiments, the target S2PIO device may be configured to transmit an indication of its unique identifier to the service processor 330 via the target link. It should be noted that the unique identifier of the target S2PIO device and the position of the faulty memory drive 410 within the target group together form a unique identifier of the faulty memory drive 410 in the data storage unit 202. The unique identifier of the faulty memory drive 410 may be used to monitor inspection and/or replacement of the faulty memory drive 410.
It should be noted that the server unit 302 and the data storage unit 202 are connected via a first bus interface (such as a SATA bus interface) and a second bus interface (such as an I2C bus interface). The SATA bus interface may connect the host processor 310 with the drives in the groups of drives, and the I2C bus interface may connect the service processor 330 with the S2PIO devices. It is contemplated that the plurality of memory drives may include at least one of an HDD and an SSD.
Modifications and improvements to the above-described implementations of the present technology may become apparent to those skilled in the art. The foregoing description is intended to be exemplary rather than limiting. The scope of the present technology is therefore intended to be limited solely by the scope of the appended claims.
100: System
102: Request source
103: Communication network
108: Processing subsystem
150: Server rack
180: Request
200: Tray
202: Data storage unit
210: Memory drive
215: Visual indicator
220: Motherboard
230: Service processor
250: Bus
300: Schematic representation
302: Server unit
303: Host portion
304: First bus interface
305: Second bus interface
310: Host processor
311: Link
315: Operating system (OS)
317: Drive health management application
320: Hardware components
330: Service processor
341: First Serial AT Attachment (SATA) link
342: Second Serial AT Attachment (SATA) link
343: First Inter-Integrated Circuit (I2C) link
344: Second Inter-Integrated Circuit (I2C) link
351: Connector
352: Connector
353: Connector
354: Connector
360: First group
370: Second group
380: First serial-to-parallel input/output (S2PIO) device
385: Link
390: Second serial-to-parallel input/output (S2PIO) device
395: Link
402: Health data
404: Command
406: Command
410: Faulty memory drive
500: Method
502: Step
504: Step
For a better understanding of the present technology, as well as other aspects and further features thereof, reference is made to the following description, which is to be used in conjunction with the accompanying drawings, in which:
Figure 1 depicts a system suitable for implementing non-limiting embodiments of the present technology.
Figure 2 depicts a top, front, left perspective view of a tray containing a data storage unit of the system of Figure 1.
Figure 3 depicts a representation of the data storage unit of Figure 2 connected via bus interfaces to a server unit of the system of Figure 1.
Figure 4 depicts a schematic representation of a process of the server unit of Figure 3 for actuating a visual indicator of a memory drive of the data storage unit.
Figure 5 is a schematic block diagram of a method that may be executed in some embodiments of the present technology.
202: Data storage unit
230: Service processor
300: Schematic representation
302: Server unit
303: Host portion
304: First bus interface
305: Second bus interface
310: Host processor
311: Link
315: Operating system (OS)
317: Drive health management application
320: Hardware components
330: Service processor
341: First Serial AT Attachment (SATA) link
342: Second Serial AT Attachment (SATA) link
343: First Inter-Integrated Circuit (I2C) link
344: Second Inter-Integrated Circuit (I2C) link
351: Connector
352: Connector
353: Connector
354: Connector
360: First group
370: Second group
380: First serial-to-parallel input/output (S2PIO) device
385: Link
390: Second serial-to-parallel input/output (S2PIO) device
395: Link
Claims (12)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| TW111135257A TWI912560B (en) | 2022-09-16 | System and method for triggering a visual indicator of a faulty memory drive |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| TW202414229A (en) | 2024-04-01 |
| TWI912560B (en) | 2026-01-21 |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US9367510B2 (en) | 2013-11-26 | 2016-06-14 | American Megatrends, Inc. | Backplane controller for handling two SES sidebands using one SMBUS controller and handler controls blinking of LEDs of drives installed on backplane |