CN113568776A

CN113568776A - Automatic recovery method and system for disk-dropping fault applied to disk array

Info

Publication number: CN113568776A
Application number: CN202110881357.5A
Authority: CN
Inventors: 张德明; 侯俊; 陈东海; 陈丹; 刘杰
Original assignee: Hunan Xing Tian Electronic Technology Co ltd
Current assignee: Hunan Xing Tian Electronic Technology Co ltd
Priority date: 2021-08-02
Filing date: 2021-08-02
Publication date: 2021-10-29

Abstract

The invention discloses an automatic recovery method and system for a disk-dropping fault applied to a disk array. The system includes a controller, a first bus, a second bus and a control circuit. The controller communicates with the CPU through the first bus; the controller is connected to the communication terminal of the electronic disk through the second bus; and the controller is connected to the power terminal of the electronic disk through the control circuit to control the power state of the electronic disk. In the present invention, the controller communicates with the CPU of the disk array through the first bus and with the electronic disk through the second bus, so that a soft reset can be realized; a hardware reset can be realized by switching the power supply terminal of the electronic disk on and off through the control circuit. Through the soft reset and the hardware reset, all false-alarm faults in which the electronic disk has not permanently failed can be eliminated, and the reliability of the disk array is improved.

Description

Automatic recovery method and system for disk-dropping fault applied to disk array

Technical Field

The invention relates to the field of electronic disk fault elimination, in particular to a method and a system for automatically recovering a disk-dropping fault applied to a disk array.

Background

In a storage system, a disk array is generally used as a storage medium, and when an electronic disk in the disk array fails, that is, a disk is dropped, data reading and writing fails.

Although existing disk arrays can handle data recovery after a disk is dropped, most of them cannot automatically restore the working state of the disk array, which limits their practicality. Some disk arrays can reset the electronic disk in software over the bus after a disk-dropping fault occurs, but this recovers only a small proportion of dropped electronic disks; automatic recovery is not possible when the fault requires a hardware reset of the electronic disk.

In conclusion, the conventional disk array cannot eliminate false-alarm faults in which the electronic disk has not permanently failed, and therefore has low reliability.

Disclosure of Invention

The present invention is directed to solving at least one of the problems of the prior art. Therefore, the invention provides an automatic recovery method and system for a disk-dropping fault applied to a disk array, which can eliminate a false alarm fault of non-permanent failure of an electronic disk.

According to the embodiment of the first aspect of the present invention, the automatic recovery system for a disk-dropping fault applied to a disk array is arranged between an electronic disk of the disk array and a CPU, and includes: a controller; a first bus, through which the controller is connected with the communication end of the CPU to receive communication instructions; a second bus, through which the controller is connected with the communication end of the electronic disk to read the state of the electronic disk; and a control circuit, through which the controller is connected with the power supply end of the electronic disk to control the power supply state of the electronic disk.

The automatic recovery system for a disk-dropping fault applied to a disk array according to the first aspect of the present invention has at least the following technical effects: the controller communicates with the CPU of the disk array through the first bus and with the electronic disk through the second bus, so that a soft reset can be realized; a hardware reset can be realized by switching the power supply end of the electronic disk on and off through the control circuit. Through the soft reset and the hardware reset, all false-alarm faults in which the electronic disk has not permanently failed can be eliminated, and the reliability of the disk array is improved.

According to some embodiments of the invention, the first bus is a PCIE bus.

According to some embodiments of the invention, the second bus is a SATA bus.

According to some embodiments of the invention, a first capacitor is connected in series with each of the first and second buses for signal coupling and signal level matching.

According to some embodiments of the present invention, the control circuit includes an NMOS transistor and a PMOS transistor, the IO port of the controller is connected to the gate of the NMOS transistor, the source of the NMOS transistor is grounded, the drain of the NMOS transistor is connected to the gate of the PMOS transistor, the drain of the PMOS transistor is used for connecting to a low voltage working power supply, and the source of the PMOS transistor is connected to the power supply terminal of the electronic disk.

According to some embodiments of the invention, a current limiting resistor is disposed between the IO port of the controller and the gate of the NMOS transistor.

According to some embodiments of the invention, the gate of the NMOS transistor is grounded through a second capacitor and a third capacitor connected in parallel, the second capacitor is used for realizing slow start of the electronic disk, and the third capacitor is used for filtering out high-frequency noise.

According to some embodiments of the invention, the gate of the PMOS transistor is connected to a low-voltage operating power supply through a pull-up resistor.

According to some embodiments of the invention, the controller is an FPGA.

The automatic recovery method for the disk-dropping fault applied to the disk array according to the second aspect of the invention comprises the following steps:

the CPU sends a read-write command to the controller; after receiving the read-write command, the controller performs the read-write operation on the electronic disk through the communication port and feeds the result back to the CPU;

if the electronic disk reads and writes normally, the CPU continues the read-write step; if it does not, the CPU sends a soft reset command to the electronic disk through the controller;

after the soft reset command is sent, the controller again performs a read-write operation on the electronic disk through the communication port; if reading and writing of the electronic disk has returned to normal, the CPU continues the read-write step; if reading and writing is still abnormal, the CPU sends a power-off restart command to the controller, and the controller performs a power-off restart of the electronic disk through the control circuit;

and after the power-off restart, the controller again performs a read-write operation on the electronic disk through the communication port; if reading and writing of the electronic disk has returned to normal, the CPU continues the read-write step, and if it is still abnormal, the power-off restart step is repeated until reading and writing of the electronic disk returns to normal.
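The escalation in the steps above (read-write check, then soft reset, then repeated power-off restart) can be sketched as follows. This is only an illustrative model, not the patented implementation: the function name `recover_disk`, the three callbacks, the `FakeDisk` stand-in and the `max_power_cycles` cap are all assumptions introduced here. In particular, the text repeats the power-off restart until recovery, whereas the sketch stops after a fixed number of attempts so that a permanently failed disk does not loop forever.

```python
from typing import Callable


def recover_disk(
    read_write_ok: Callable[[], bool],
    soft_reset: Callable[[], None],
    power_cycle: Callable[[], None],
    max_power_cycles: int = 3,  # assumption: the text itself sets no upper limit
) -> str:
    """Return which stage restored the disk: 'ok', 'soft_reset', 'power_cycle' or 'failed'."""
    if read_write_ok():
        return "ok"  # normal read-write, nothing to recover
    soft_reset()  # first escalation: soft reset over the bus
    if read_write_ok():
        return "soft_reset"
    for _ in range(max_power_cycles):  # second escalation: power-off restart
        power_cycle()
        if read_write_ok():
            return "power_cycle"
    return "failed"  # treated here as a permanent failure


class FakeDisk:
    """Hypothetical disk that only recovers after a given number of power cycles."""

    def __init__(self, power_cycles_needed: int) -> None:
        self.remaining = power_cycles_needed
        self.soft_resets = 0

    def read_write_ok(self) -> bool:
        return self.remaining == 0

    def soft_reset(self) -> None:
        self.soft_resets += 1  # has no effect on this simulated fault

    def power_cycle(self) -> None:
        self.remaining = max(0, self.remaining - 1)


disk = FakeDisk(power_cycles_needed=2)
result = recover_disk(disk.read_write_ok, disk.soft_reset, disk.power_cycle)
print(result)
```

Passing the three operations in as callbacks keeps the escalation order separate from how the CPU and controller actually issue the commands, which the steps above leave to the implementation.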

The automatic recovery method for a disk-dropping fault applied to a disk array according to the embodiment of the second aspect of the invention has at least the following technical effects: the controller communicates with the CPU of the disk array through the first bus and with the electronic disk through the second bus, so that a soft reset can be realized; a hardware reset can be realized by switching the power supply end of the electronic disk on and off through the control circuit. Through the soft reset and the hardware reset, all false-alarm faults in which the electronic disk has not permanently failed can be eliminated, and the reliability of the disk array is improved.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a schematic circuit diagram of a controller and a PCIE bus according to an embodiment of the present invention;

FIG. 2 is a schematic circuit diagram of a controller and a SATA bus according to an embodiment of the present invention;

FIG. 3 is a schematic circuit diagram of an IO pin of the controller according to an embodiment of the present invention;

FIG. 4 is a schematic circuit diagram of a control circuit in an embodiment of the invention;

FIG. 5 is a schematic circuit diagram of a CPU in an embodiment of the present invention;

FIG. 6 is a schematic circuit diagram of a SATA electronic disk in an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.

In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as the upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings, and is only for convenience of description and simplification of description, but does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.

In the description of the present invention, "several" means one or more and "a plurality" means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, and terms such as "above", "below" and "within" are understood as including the stated number. Where "first" and "second" are used, they serve only to distinguish technical features, and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or the precedence of the technical features indicated.

In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.

An automatic recovery system for a disk-dropping fault applied to a disk array is arranged between an electronic disk and a CPU of the disk array, and comprises a controller, a first bus, a second bus and a control circuit. The controller is an FPGA, the electronic disk is a SATA electronic disk, and the CPU is a CPU model conventionally used in disk arrays.

Referring to fig. 1 and 5, the FPGA is connected to the CPU through a PCIE bus. Referring to fig. 2 and 6, the FPGA is connected to the SATA electronic disks through a SATA bus. In this embodiment, the FPGA is connected to four electronic disks in total, SATA1 through SATA4; of course, the number of electronic disks can be increased or decreased as required in practical applications. The main function of the FPGA is to convert the PCIE bus signals of the CPU into SATA bus signals in order to communicate with the SATA electronic disks.

Referring to fig. 1 and 2, a first capacitor of 0.1 µF is connected in series on both the PCIE bus and the SATA bus; it serves as a high-speed signal coupling capacitor and also provides signal level matching.

Referring to fig. 3, 4 and 6, the IO pin of the FPGA is connected to the power supply terminal of the electronic disk through the control circuit so as to control the power supply state of the electronic disk. Taking the control circuit on the FPGA1_POWER1 signal channel of the SATA1 electronic disk as an example, the control circuit comprises an NMOS transistor Q2 and a PMOS transistor Q1. Pin B23 of the FPGA is connected to the gate of the NMOS transistor Q2 through a 22 ohm current limiting resistor R701, which protects the IO pin of the FPGA from damage. Pin B23 of the FPGA is also grounded through a 10K ohm pull-down resistor R750, which fixes the initial level at power-on as low, so that the level is not left uncontrolled before the FPGA starts working.

The gate of the NMOS transistor Q2 is grounded through a second capacitor C48 and a third capacitor C49 connected in parallel with each other. The second capacitor C48 is a 10 µF capacitor; it makes the electronic switch that supplies power to the SATA electronic disk start slowly, which effectively reduces the inrush current caused by switching an electronic disk on and off while the disk array is working. The third capacitor C49 is a 0.1 µF capacitor; it filters out interference from high-frequency noise, which effectively prevents false switching of the electronic disk.
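As a rough illustration of the slow-start effect, the gate network can be treated as a first-order RC circuit formed by R701 and the parallel combination of C48 and C49. This is a simplification made here, not something stated in the text: it assumes an ideal FPGA output driver and ignores the gate capacitance of Q2. Any output resistance of the FPGA pin would add to R701 and lengthen the result.

```python
R701_OHMS = 22.0      # current limiting resistor
C48_FARADS = 10e-6    # slow-start capacitor
C49_FARADS = 0.1e-6   # high-frequency filter capacitor


def gate_time_constant(r_ohms: float, *caps_farads: float) -> float:
    """Time constant of a series resistor charging parallel capacitors (tau = R * sum(C))."""
    return r_ohms * sum(caps_farads)


tau_s = gate_time_constant(R701_OHMS, C48_FARADS, C49_FARADS)
print(f"tau = {tau_s * 1e6:.1f} us")
```

Under these assumptions the time constant is about 222 µs, so the gate voltage of Q2 ramps over a few hundred microseconds rather than stepping instantly.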

The source of the NMOS transistor Q2 is grounded, and the drain of the NMOS transistor Q2 is connected to the gate of the PMOS transistor Q1. The drain of the PMOS transistor Q1 is connected to a low-voltage working power supply VCC5_OUT1, and the source of the PMOS transistor Q1 is connected to the power supply terminal VCC5_V1 of the SATA electronic disk. The gate of the PMOS transistor Q1 is connected to the low-voltage working power supply VCC5_OUT1 through a 100K ohm pull-up resistor R62, which ensures that the PMOS transistor is in the cut-off state at power-on. The capacitors C51 and C50 connected to the power supply terminal VCC5_V1 provide filtering.
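At the logic level, the two-transistor switch behaves as a non-inverting power enable: a high level on the FPGA pin turns both transistors on and powers the disk, while a low level turns both off. The following sketch models only that logic; it is not a circuit simulation, and the function name `disk_powered` is introduced here purely for illustration.

```python
def disk_powered(io_high: bool) -> bool:
    """Logic-level model of the Q2 (NMOS) / Q1 (PMOS) power switch."""
    q2_on = io_high        # NMOS Q2 conducts when its gate is driven high
    q1_gate_low = q2_on    # Q2 pulls Q1's gate to ground; otherwise R62 pulls it high
    q1_on = q1_gate_low    # PMOS Q1 conducts when its gate is low
    return q1_on


for level in (False, True):
    state = "on" if disk_powered(level) else "off"
    print(f"IO {'high' if level else 'low'} -> disk power {state}")
```

Because R750 holds the pin low and R62 holds the gate of Q1 high before the FPGA is configured, this model also implies that the disk stays unpowered until the FPGA actively drives the pin high.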

The invention also relates to an automatic recovery method for the disk-dropping fault applied to the disk array, which comprises the following steps:

The detailed work flow of the embodiment of the invention is described below with reference to a circuit:

When the CPU finds that the SATA1 electronic disk is not reading and writing normally, it first sends the FPGA, over the PCIE bus, a soft reset command for the SATA1 electronic disk. The FPGA then sends the soft reset command to the SATA1 electronic disk over the SATA bus and, once the command has been sent, again sends the handshake signal for connecting with the SATA1 electronic disk over the SATA bus. If the electronic disk resumes normal read-write operation at this point, the automatic recovery function is complete. If communication with the electronic disk is still abnormal, the CPU sends a power-off restart command for the SATA1 electronic disk over the PCIE bus. On receiving the command, the FPGA drives the FPGA1_POWER1 signal line on general-purpose IO pin B23 low for 500 milliseconds and then drives it high. While the output is low, the NMOS transistor Q2 is in the off state, and the drain of the NMOS transistor Q2 and the gate of the PMOS transistor Q1 are at a high level, so the PMOS transistor Q1 is also in the off state; the 5V power supply of the SATA1 electronic disk is cut off, which completes the power-off of the SATA1 electronic disk.

When the output goes high after 500 milliseconds, the NMOS transistor Q2 is in the conducting state, and the drain of the NMOS transistor Q2 and the gate of the PMOS transistor Q1 are at a low level, so the PMOS transistor Q1 is also in the conducting state; the 5V power supply of the SATA1 electronic disk is switched on, which completes the re-powering of the electronic disk. The FPGA then automatically sends the handshake signal for connecting with the SATA1 electronic disk over the SATA bus again, and once the SATA1 electronic disk has resumed normal read-write operation, the automatic recovery function is complete. To ensure the reliability of the operation, the above two operations are performed two or more times. With this self-recovery processing method, all disk array faults other than permanent failure of the electronic disk can be eliminated, and the reliability of the disk array is greatly improved.
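The power-off restart sequence described above (drive the power-enable line low, hold for 500 milliseconds, drive it high, then repeat the handshake) can be sketched as follows. The embodiment implements this inside the FPGA; the Python function below only illustrates the ordering. The names `power_cycle`, `set_power_pin` and `handshake` are hypothetical, and the sleep function is injectable so the sequence can be exercised without real hardware. The text does not specify any settling delay between re-powering and the handshake, so none is modelled.

```python
import time
from typing import Callable, List, Tuple

POWER_OFF_SECONDS = 0.5  # 500 ms low pulse, as given in the description


def power_cycle(
    set_power_pin: Callable[[bool], None],
    handshake: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
    off_seconds: float = POWER_OFF_SECONDS,
) -> bool:
    """Cut and restore disk power, then report whether the handshake succeeded."""
    set_power_pin(False)  # low: Q2 and Q1 off, 5V supply removed
    sleep(off_seconds)    # hold the disk unpowered
    set_power_pin(True)   # high: Q2 and Q1 on, 5V supply restored
    return handshake()    # re-establish the SATA link


# Dry run with recording stubs instead of real hardware.
events: List[Tuple[str, object]] = []
recovered = power_cycle(
    set_power_pin=lambda high: events.append(("pin", high)),
    handshake=lambda: True,
    sleep=lambda seconds: events.append(("sleep", seconds)),
)
print(events, recovered)
```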

The embodiment of the invention can effectively carry out automatic recovery operation on the disk-dropping fault of the disk array. The reliability and the practicability of the disk array are greatly improved, and the service life and the safety of the disk array are also improved.

The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.

Claims

1. An automatic recovery system for a disk-dropping fault applied to a disk array, arranged between an electronic disk and a CPU of the disk array, characterized in that it comprises:

a controller;

a first bus, through which the controller is connected to the communication end of the CPU;

a second bus, through which the controller is connected to the communication terminal of the electronic disk;

a control circuit, through which the controller is connected to the power supply terminal of the electronic disk so as to control the power supply state of the electronic disk.

2. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 1, wherein the first bus is a PCIE bus.

3. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 1, wherein the second bus is a SATA bus.

4. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 1, wherein a first capacitor is connected in series with each of the first bus and the second bus to realize signal coupling and signal level matching.

5. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 1, wherein the control circuit comprises an NMOS transistor and a PMOS transistor, the IO port of the controller is connected to the gate of the NMOS transistor, the source of the NMOS transistor is grounded, the drain of the NMOS transistor is connected to the gate of the PMOS transistor, the drain of the PMOS transistor is used to connect to a low-voltage working power supply, and the source of the PMOS transistor is connected to the power supply terminal of the electronic disk.

6. The system of claim 5, wherein a current-limiting resistor is set between the IO port of the controller and the gate of the NMOS transistor.

7. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 5, wherein the gate of the NMOS transistor is grounded through a second capacitor and a third capacitor connected in parallel with each other, the second capacitor is used to realize slow start of the electronic disk, and the third capacitor is used to filter out high-frequency noise.

8. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 5, wherein the gate of the PMOS transistor is connected to a low-voltage working power supply through a pull-up resistor.

9. The automatic recovery system for a disk-dropping fault applied to a disk array according to claim 1, wherein the controller is an FPGA.

10. An automatic recovery method for a disk-dropping fault applied to a disk array, characterized in that it comprises the following steps:

the CPU sends a read-write command to the controller, and after receiving the read-write command, the controller reads and writes the electronic disk through the communication port and feeds the result back to the CPU;

if the reading and writing of the electronic disk is normal, the CPU continues to perform the read-write step; if the reading and writing of the electronic disk is abnormal, the CPU sends a soft reset command to the electronic disk through the controller;

after the soft reset command is sent, the controller reads and writes the electronic disk again through the communication port; if the reading and writing of the electronic disk returns to normal, the CPU continues to perform the read-write step; if the reading and writing of the electronic disk is still abnormal, the CPU sends a power-off restart command to the controller, and the controller powers off and restarts the electronic disk through the control circuit;

after the power-off restart, the controller again reads and writes the electronic disk through the communication port; if the reading and writing of the electronic disk returns to normal, the CPU continues to perform the read-write step; if the reading and writing of the electronic disk is still abnormal, the power-off restart step is repeated until the reading and writing of the electronic disk returns to normal.