Disclosure of Invention
The present invention aims to solve at least one of the technical problems in the prior art. To this end, the invention provides an automatic recovery method and system for disk-dropping faults (an electronic disk dropping offline from the array) in a disk array, which can eliminate false disk-drop alarms caused by non-permanent failures of an electronic disk.
According to an embodiment of the first aspect of the present invention, an automatic recovery system for disk-dropping faults in a disk array is arranged between an electronic disk of the disk array and a CPU, and includes a controller. The controller is connected to the communication end of the CPU through a first bus to receive communication instructions; it is connected to the communication end of the electronic disk through a second bus to read the state of the electronic disk; and it is connected to the power supply end of the electronic disk through a control circuit to control the power supply state of the electronic disk.
The automatic recovery system for disk-dropping faults in a disk array according to the embodiment of the first aspect of the present invention has at least the following technical effects: the controller communicates with the CPU of the disk array through the first bus and with the electronic disk through the second bus, so that a soft reset can be performed; it can also perform a hardware reset by switching the power supply end of the electronic disk on and off through the control circuit. Through the soft reset and the hardware reset, false alarms caused by non-permanent failures of the electronic disks can be eliminated, which improves the reliability of the disk array.
According to some embodiments of the invention, the first bus is a PCIE bus.
According to some embodiments of the invention, the second bus is a SATA bus.
According to some embodiments of the invention, a first capacitor is connected in series with each of the first and second buses for signal coupling and signal level matching.
According to some embodiments of the present invention, the control circuit includes an NMOS transistor and a PMOS transistor, the IO port of the controller is connected to the gate of the NMOS transistor, the source of the NMOS transistor is grounded, the drain of the NMOS transistor is connected to the gate of the PMOS transistor, the drain of the PMOS transistor is used for connecting to a low voltage working power supply, and the source of the PMOS transistor is connected to the power supply terminal of the electronic disk.
According to some embodiments of the invention, a current limiting resistor is disposed between the IO port of the controller and the gate of the NMOS transistor.
According to some embodiments of the invention, the gate of the NMOS transistor is grounded through a second capacitor and a third capacitor connected in parallel; the second capacitor provides a slow (soft) start of the power supply to the electronic disk, and the third capacitor filters out high-frequency noise.
According to some embodiments of the invention, the gate of the PMOS transistor is connected to a low-voltage operating power supply through a pull-up resistor.
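For illustration only, the switching logic of the control circuit described above (NMOS transistor, PMOS transistor, and the pull-up resistor on the PMOS gate) can be modelled with ideal switches. The sketch below is a minimal Python model under that assumption; the function and field names are hypothetical, and it ignores threshold voltages, on-resistance and the slow-start capacitors. It shows that a high level on the controller's IO port powers the electronic disk and a low level removes power.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchState:
    q2_nmos_on: bool    # is the NMOS transistor conducting?
    q1_gate_high: bool  # is the PMOS gate held at the supply by the pull-up?
    q1_pmos_on: bool    # is the PMOS transistor conducting?
    disk_powered: bool  # is the disk's power supply terminal fed?


def high_side_switch(io_high: bool) -> SwitchState:
    """Idealized logic of the two-transistor power switch (hypothetical model).

    - IO high: the NMOS conducts and pulls the PMOS gate to ground, so the
      PMOS conducts and the disk is powered.
    - IO low: the NMOS is off, the pull-up holds the PMOS gate at the supply,
      so the PMOS is off and the disk is unpowered.
    """
    q2_on = io_high
    q1_gate_high = not q2_on   # pull-up wins only when the NMOS is off
    q1_on = not q1_gate_high   # PMOS conducts when its gate is pulled low
    return SwitchState(q2_on, q1_gate_high, q1_on, disk_powered=q1_on)


if __name__ == "__main__":
    for level in (False, True):
        state = high_side_switch(level)
        print(f"IO {'high' if level else 'low'}: disk powered = {state.disk_powered}")
```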
According to some embodiments of the invention, the controller is an FPGA.
The automatic recovery method for the disk-dropping fault applied to the disk array according to the second aspect of the invention comprises the following steps:
the CPU sends a read-write command to the controller; the controller receives the command, performs the read-write operation on the electronic disk through the communication port, and feeds the result back to the CPU;
if the electronic disk reads and writes normally, the CPU continues reading and writing; if not, the CPU sends a soft reset command to the electronic disk through the controller;
after the soft reset command is sent, the controller performs the read-write operation on the electronic disk through the communication port again; if reading and writing of the electronic disk return to normal, the CPU continues with the read-write step; if reading and writing still fail, the CPU sends a power-off restart command to the controller, and the controller power-cycles the electronic disk through the control circuit;
after the power-off restart, the controller performs the read-write operation on the electronic disk through the communication port again; if reading and writing return to normal, the CPU continues with the read-write step, and if they are still abnormal, the power-off restart step is repeated until reading and writing of the electronic disk return to normal.
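The escalation in the steps above (normal read-write, then soft reset, then repeated power-off restarts) can be sketched as a small Python function. The `Disk` interface, its method names, the `FakeDisk` test double and the optional `max_power_cycles` cap are hypothetical illustrations, not part of the described method; with the default `max_power_cycles=None`, the loop repeats the power-off restart until the disk recovers, as the steps state.

```python
from typing import Optional, Protocol


class Disk(Protocol):
    """Hypothetical view of one electronic disk as reached through the controller."""
    def read_write_ok(self) -> bool: ...
    def soft_reset(self) -> None: ...
    def power_cycle(self) -> None: ...


def recover(disk: Disk, max_power_cycles: Optional[int] = None) -> str:
    """Check, then soft reset, then power-off restart until the disk recovers.

    max_power_cycles=None repeats the power-off restart indefinitely, as in
    the described method; a finite cap is an added assumption for testing.
    """
    if disk.read_write_ok():
        return "ok"
    disk.soft_reset()                      # first escalation: soft reset
    if disk.read_write_ok():
        return "recovered_by_soft_reset"
    attempts = 0
    while max_power_cycles is None or attempts < max_power_cycles:
        disk.power_cycle()                 # second escalation: hardware reset
        attempts += 1
        if disk.read_write_ok():
            return "recovered_by_power_cycle"
    return "gave_up"


class FakeDisk:
    """Test double: optionally fixed by a soft reset, or after N power cycles
    (power_cycles_needed=0 models a permanent failure)."""

    def __init__(self, healthy=False, fixed_by_soft_reset=False, power_cycles_needed=0):
        self.healthy = healthy
        self.fixed_by_soft_reset = fixed_by_soft_reset
        self.power_cycles_needed = power_cycles_needed
        self.power_cycles = 0

    def read_write_ok(self) -> bool:
        return self.healthy

    def soft_reset(self) -> None:
        if self.fixed_by_soft_reset:
            self.healthy = True

    def power_cycle(self) -> None:
        self.power_cycles += 1
        if 0 < self.power_cycles_needed <= self.power_cycles:
            self.healthy = True


if __name__ == "__main__":
    print(recover(FakeDisk(power_cycles_needed=2)))
```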
The automatic recovery method for disk-dropping faults in a disk array according to the embodiment of the second aspect of the invention has at least the following technical effects: the controller communicates with the CPU of the disk array through the first bus and with the electronic disk through the second bus, so that a soft reset can be performed; it can also perform a hardware reset by switching the power supply end of the electronic disk on and off through the control circuit. Through the soft reset and the hardware reset, false alarms caused by non-permanent failures of the electronic disks can be eliminated, which improves the reliability of the disk array.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it should be understood that terms indicating an orientation or positional relationship, such as upper, lower, front, rear, left and right, are based on the orientation or positional relationship shown in the drawings and are used only for convenience and simplicity of description. They do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
In the description of the present invention, "several" means one or more and "a plurality of" means two or more; terms such as "greater than", "less than" and "exceeding" are understood as excluding the stated number, while terms such as "above", "below" and "within" are understood as including it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or their order of precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
An automatic recovery system for disk-dropping faults in a disk array is arranged between the electronic disks and the CPU of the disk array, and comprises a controller, a first bus, a second bus and a control circuit. In this embodiment, the controller is an FPGA, the electronic disks are SATA electronic disks, and the CPU is a CPU model conventionally used in disk arrays.
Referring to fig. 1 and 5, the FPGA is connected to the CPU through a PCIE bus; referring to fig. 2 and 6, the FPGA is connected to the SATA electronic disks through a SATA bus. In this embodiment, the FPGA is connected to four electronic disks, SATA1 through SATA4; in practical applications, the number of electronic disks can be increased or decreased as needed. The main function of the FPGA is to convert the CPU's PCIE bus signals into SATA bus signals for communicating with the SATA electronic disks.
Referring to fig. 1 and 2, a 0.1 uF first capacitor is connected in series in both the PCIE bus and the SATA bus; it acts as a high-speed signal coupling capacitor and also provides signal level matching.
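As a hypothetical illustration of the bridging role described above, the sketch below models the FPGA as routing each CPU command to one of a configurable number of SATA ports (four in this embodiment, SATA1 to SATA4). The `Command` names, the `BridgeController` class and its 1-based port numbering are assumptions made for the example; the actual PCIE-to-SATA conversion is performed in FPGA logic and is not modelled here.

```python
from enum import Enum, auto


class Command(Enum):
    """Hypothetical command types the CPU may send for one disk."""
    READ_WRITE = auto()
    SOFT_RESET = auto()
    POWER_CYCLE = auto()


class BridgeController:
    """Hypothetical model of the FPGA bridge: one PCIE side, N SATA ports."""

    def __init__(self, num_ports: int = 4):
        if num_ports < 1:
            raise ValueError("need at least one SATA port")
        self.num_ports = num_ports
        self.log = []  # list of (port, Command) pairs, in arrival order

    def handle(self, port: int, command: Command) -> str:
        """Route a CPU command to SATA port `port` (1-based: SATA1..SATAn)."""
        if not 1 <= port <= self.num_ports:
            raise ValueError(f"no such port SATA{port}")
        self.log.append((port, command))
        return f"SATA{port}:{command.name}"


if __name__ == "__main__":
    bridge = BridgeController()           # four ports, as in the embodiment
    print(bridge.handle(1, Command.SOFT_RESET))
```

Changing `num_ports` mirrors the statement that the number of electronic disks can be increased or decreased.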
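The coupling role of the 0.1 uF series capacitor can be illustrated with a worked calculation: the capacitor and the receiver termination form a first-order high-pass filter with corner frequency f = 1 / (2 pi R C). The 50-ohm single-ended termination used below is an assumption, not a value given in the text. With 0.1 uF the corner comes out near 31.8 kHz, far below the signalling rates of PCIE and SATA, so the data passes while the DC level on each side of the capacitor is blocked; this is one way to read the "level matching" function, since the two ends may then sit at different bias levels.

```python
import math


def coupling_corner_hz(c_farads: float, r_ohms: float) -> float:
    """-3 dB corner of the high-pass formed by a series coupling capacitor
    and the receiver's termination resistance: f = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


C_COUPLING = 0.1e-6  # 0.1 uF, from the embodiment
R_TERM = 50.0        # assumed 50-ohm single-ended termination (not in the text)

if __name__ == "__main__":
    f = coupling_corner_hz(C_COUPLING, R_TERM)
    print(f"high-pass corner ~ {f / 1e3:.1f} kHz")
```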
Referring to fig. 3, 4 and 6, an IO pin of the FPGA is connected to the power supply terminal of the electronic disk through the control circuit so as to control the power supply state of the electronic disk. Taking the control circuit on the FPGA1_POWER1 signal channel of the SATA1 electronic disk as an example, the control circuit comprises an NMOS transistor Q2 and a PMOS transistor Q1. Pin B23 of the FPGA is connected to the gate of Q2 through a 22-ohm current-limiting resistor R701, which protects the FPGA IO pin from damage. Pin B23 is also grounded through a 10-kilohm pull-down resistor R750, which fixes the line at a low level in the initial power-on state so that it is not left uncontrolled before the FPGA starts working.
The gate of NMOS transistor Q2 is grounded through a second capacitor C48 and a third capacitor C49 connected in parallel. C48 is a 10 uF capacitor that makes the electronic switch supplying the SATA electronic disk turn on slowly, which effectively reduces the inrush current caused when an electronic disk is switched on or off while the disk array is working. C49 is a 0.1 uF capacitor that filters out interference from high-frequency noise, which effectively prevents false switching of the electronic disk.
The source of NMOS transistor Q2 is grounded and its drain is connected to the gate of PMOS transistor Q1; the drain of Q1 is connected to the low-voltage operating power supply VCC5_OUT1, and the source of Q1 is connected to the power supply terminal VCC5_V1 of the SATA electronic disk. The gate of Q1 is connected to VCC5_OUT1 through a 100-kilohm pull-up resistor R62, which ensures that Q1 is off at power-on. Capacitors C51 and C50 connected to the power supply terminal VCC5_V1 provide filtering.
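A rough estimate of the slow-start effect can be made from the component values above. With C48 and C49 in parallel (10.1 uF) charged through R701 (22 ohms), the time constant is about 222 us. The sketch below also computes the time for the Q2 gate to reach an assumed 1.5 V threshold from an assumed 3.3 V IO level; both voltages are assumptions, not values from the text. Because it treats the FPGA pin as an ideal voltage source, the result (about 135 us) is a lower bound: the pin's own output resistance and drive-current limit add to R701 and stretch the real ramp, while the 10-kilohm pull-down R750 lowers the final gate voltage by only about 0.2%.

```python
import math


def rc_time_to_threshold(r_ohms: float, c_farads: float,
                         v_drive: float, v_threshold: float) -> float:
    """Time for an RC node charged from 0 V toward v_drive to reach
    v_threshold: t = -R*C*ln(1 - v_threshold / v_drive)."""
    if not 0.0 < v_threshold < v_drive:
        raise ValueError("threshold must lie between 0 and the drive voltage")
    return -r_ohms * c_farads * math.log(1.0 - v_threshold / v_drive)


R701 = 22.0               # series resistor in ohms (from the embodiment)
C_GATE = 10e-6 + 0.1e-6   # C48 + C49 in parallel, farads (from the embodiment)
V_IO = 3.3                # assumed FPGA IO-bank voltage
V_TH = 1.5                # assumed NMOS gate threshold voltage

if __name__ == "__main__":
    tau = R701 * C_GATE
    t = rc_time_to_threshold(R701, C_GATE, V_IO, V_TH)
    print(f"tau = {tau * 1e6:.0f} us, time to threshold = {t * 1e6:.0f} us")
```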
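The bias resistors draw only small static currents, which can be checked with Ohm's law. In the sketch below, the 3.3 V IO level is an assumption, the 5 V value for VCC5_OUT1 follows the 5 V supply mentioned in the workflow, and the function name is hypothetical. With the disk powered (IO high, Q2 on), the pull-down R750 draws 3.3 V / 10 kilohm = 0.33 mA from the IO pin and the pull-up R62 draws 5 V / 100 kilohm = 50 uA from VCC5_OUT1 through Q2; with the disk unpowered (IO low, Q2 off), both currents are essentially zero.

```python
def static_bias_currents(io_high: bool, v_io: float = 3.3, v_supply: float = 5.0,
                         r_pulldown: float = 10e3, r_pullup: float = 100e3):
    """Return (I_R750, I_R62) in amperes for one channel, ideal switches assumed.

    I_R750: current from the IO pin through the pull-down when the pin is high.
    I_R62:  current from VCC5_OUT1 through the pull-up and Q2 when Q2 is on.
    """
    if not io_high:
        return 0.0, 0.0  # pin low and Q2 off: no current in either resistor
    return v_io / r_pulldown, v_supply / r_pullup


if __name__ == "__main__":
    i_pd, i_pu = static_bias_currents(io_high=True)
    print(f"per powered channel: R750 {i_pd * 1e3:.2f} mA, R62 {i_pu * 1e6:.0f} uA")
```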
The invention also relates to an automatic recovery method for the disk-dropping fault applied to the disk array, which comprises the following steps:
the CPU sends a read-write command to the controller; the controller receives the command, performs the read-write operation on the electronic disk through the communication port, and feeds the result back to the CPU;
if the electronic disk reads and writes normally, the CPU continues reading and writing; if not, the CPU sends a soft reset command to the electronic disk through the controller;
after the soft reset command is sent, the controller performs the read-write operation on the electronic disk through the communication port again; if reading and writing of the electronic disk return to normal, the CPU continues with the read-write step; if reading and writing still fail, the CPU sends a power-off restart command to the controller, and the controller power-cycles the electronic disk through the control circuit;
after the power-off restart, the controller performs the read-write operation on the electronic disk through the communication port again; if reading and writing return to normal, the CPU continues with the read-write step, and if they are still abnormal, the power-off restart step is repeated until reading and writing of the electronic disk return to normal.
The detailed work flow of the embodiment of the invention is described below with reference to a circuit:
When the CPU finds that the SATA1 electronic disk cannot be read or written normally, it first sends the FPGA, through the PCIE bus, a soft reset command indicating that the SATA1 electronic disk needs to be reset. The FPGA then sends the soft reset command to the SATA1 electronic disk through the SATA bus and, once the command has been sent, re-sends the link handshake signal to the SATA1 electronic disk through the SATA bus. If the electronic disk then resumes normal read-write operation, the automatic recovery is complete. If communication with the electronic disk is still abnormal, the CPU sends a power-off restart command for the SATA1 electronic disk through the PCIE bus. On receiving this command, the FPGA outputs a low level for 500 milliseconds on the FPGA1_POWER1 signal line of general-purpose IO pin B23 and then outputs a high level. While the output is low, NMOS transistor Q2 is off, so the drain of Q2 and the gate of PMOS transistor Q1 are at a high level; Q1 is therefore also off, the 5 V supply of the SATA1 electronic disk is cut off, and the power-off of the SATA1 electronic disk is complete.
When the output goes high after 500 milliseconds, Q2 turns on and the drain of Q2 and the gate of Q1 are pulled low; Q1 therefore also turns on, the 5 V supply of the SATA1 electronic disk is restored, and the electronic disk is powered up again. The FPGA then automatically re-sends the handshake signal to the SATA1 electronic disk through the SATA bus, and once the SATA1 electronic disk resumes normal read-write operation, the automatic recovery is complete. To ensure reliable operation, the above two operations are performed two or more times. With this self-recovery method, all disk-dropping faults of the disk array other than those caused by permanent failure of an electronic disk can be eliminated, which greatly improves the reliability of the disk array.
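The controller-side power-off restart sequence in this workflow can be sketched as follows. This is a hypothetical Python model, not FPGA code: `set_power_line` stands in for driving the FPGA1_POWER1 line on pin B23, `send_handshake` stands in for re-sending the SATA link handshake, and `sleep` is injectable so the sequence can be tested without waiting. Only the 500 ms low time is taken from the text.

```python
import time
from typing import Callable


def power_off_restart(set_power_line: Callable[[bool], None],
                      send_handshake: Callable[[], bool],
                      off_time_s: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Drive the power-control line low for off_time_s (500 ms in the
    embodiment), drive it high again, then re-send the link handshake.
    Returns True if the (hypothetical) handshake reports success."""
    set_power_line(False)  # low: Q2 off, Q1 gate pulled high, Q1 off, 5 V removed
    sleep(off_time_s)
    set_power_line(True)   # high: Q2 on, Q1 gate pulled low, Q1 on, 5 V restored
    return send_handshake()


if __name__ == "__main__":
    events = []
    ok = power_off_restart(
        set_power_line=lambda level: events.append(("line", level)),
        send_handshake=lambda: True,
        sleep=lambda s: events.append(("sleep", s)),  # record instead of waiting
    )
    print(ok, events)
```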
The embodiment of the invention can effectively perform automatic recovery from disk-dropping faults in the disk array, which greatly improves the reliability and practicability of the disk array and also extends its service life and improves its safety.
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention.