CN119065879A - DRAM fault detection method, device, storage medium and electronic device

Info

Publication number: CN119065879A
Application number: CN202411071360.0A
Authority: CN
Inventors: 肖洪钦; 高影; 虞青松
Original assignee: Shenzhen Quanhuida Technology Co ltd
Current assignee: Shenzhen Quanhuida Technology Co ltd
Priority date: 2024-08-06
Filing date: 2024-08-06
Publication date: 2024-12-03

Abstract

The embodiments of the application disclose a DRAM fault detection method, a DRAM fault detection device, a storage medium and an electronic device, and relate to the field of computers. In this scheme, the address decoding mapping information of the DRAM is obtained in the PEI phase and used for fault detection in the DXE phase, so that a faulty memory address in the DRAM can be located accurately. Once a memory address error is detected, the detailed address information corresponding to that address (channel, rank, bank, chip, row and column) can be resolved. This provides an accurate basis for subsequent memory fault repair in Intel-architecture computer systems, simplifies the memory fault detection steps and improves fault detection efficiency.

Description

DRAM (dynamic random access memory) fault detection method and device, storage medium and electronic device

Technical Field

The present application relates to the field of computers, and in particular, to a method and apparatus for detecting a failure of a DRAM, a storage medium, and an electronic device.

Background

Memory is tested in order to verify its stability and optimize its parameters. Although DRAM chips (memory particles) are tested before leaving the factory, factory testing only checks the quality of individual chips. When the chips are assembled into a memory module, signal latency naturally differs between the chips at the center and those at the outermost edge because the signal traces differ in length, so testing is needed to confirm stability and optimize the parameters. In Intel-architecture computer systems, the memory address mapping is currently mostly inferred by repeated hammering (Rowhammer) followed by cross-comparison, which makes the memory testing process complex and time-consuming.

Disclosure of Invention

The embodiments of the application provide a DRAM fault detection method, a DRAM fault detection device, a storage medium and an electronic device, which can solve the problem in the prior art that fault detection by the processor involves many steps and takes a long time. The technical scheme is as follows:

In a first aspect, an embodiment of the present application provides a method for detecting a failure of a DRAM, where the method includes:

In the PEI phase, acquiring address decoding mapping information of the DRAM according to the MRC code, and transmitting the address decoding mapping information to the DXE phase;

In the DXE stage, performing fault detection on the DRAM, and inquiring channel address information, rank address information, bank address information, chip address information, row address information and column address information corresponding to the memory address according to the address mapping information when detecting that the memory address of the DRAM is wrong;

Detecting whether a preset number of redundant rows exist in the DRAM;

If yes, repairing the row indicated by the row address information in the DRAM by utilizing the redundant row;

If not, prompting the user to replace the DRAM chip indicated by the chip address information.

In a second aspect, an embodiment of the present application provides a failure detection apparatus for a DRAM, the apparatus including:

A transfer unit, configured to obtain address decoding mapping information of the DRAM according to the MRC code in the PEI phase, and transfer the address decoding mapping information to the DXE phase;

The query unit is used for carrying out fault detection on the DRAM in the DXE stage, and querying channel address information, rank address information, bank address information, chip address information, row address information and column address information corresponding to the memory address according to the address mapping information when detecting that the memory address of the DRAM is wrong;

a detection unit for detecting whether a preset number of redundant rows exist in the DRAM;

a repair unit, configured to repair, if yes, a row indicated by the row address information in the DRAM by using the redundant row;

a prompting unit, configured to prompt the user, if not, to replace the DRAM chip indicated by the chip address information.

In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-described method steps.

In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory, a memory controller, and the above-mentioned failure detection apparatus for a DRAM.

The technical solutions provided by the embodiments of the application have at least the following beneficial effects:

In an Intel-architecture computer system, the technical solution obtains the address decoding mapping information of the DRAM in the PEI phase and uses it for fault detection in the DXE phase, so that a faulty memory address in the DRAM can be located accurately. Once a memory address error is detected, the detailed address information corresponding to the address (channel, rank, bank, chip, row and column) can be resolved, providing an accurate basis for subsequent fault repair. After detecting a DRAM fault, the technical solution first checks whether the DRAM has the preset number of redundant rows. If so, the redundant rows are used to repair the faulty row without replacing the whole DRAM chip, which saves cost and time. This redundancy-based repair mechanism improves the reliability and maintenance efficiency of the system. For DRAM faults that the redundant rows are insufficient to repair, the technical solution provides a fallback handling strategy: the user is prompted to replace the faulty DRAM chip, which keeps the system running continuously and stably. This approach balances cost-effectiveness against the availability and maintainability of the system. By detecting and repairing DRAM faults in time, the technical solution improves the stability and reliability of the system and reduces the risk of system crashes and data loss caused by DRAM faults. Because DRAM fault detection and repair are integrated in the DXE phase, the system has a stronger self-repair capability, which helps to reduce human intervention and downtime and improves the overall availability of the system. Through the automatic fault detection and repair flow, the problem can be located rapidly and accurately and a solution provided, which reduces the confusion users face when a DRAM fault occurs and improves user satisfaction and trust.

Drawings

In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;

Fig. 2 is a schematic flow chart of a method for detecting faults of a DRAM according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a failure detection device of a DRAM according to the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.

It should be noted that the DRAM fault detection method provided by the present application is generally executed by a processor, and correspondingly, the DRAM fault detection device is generally disposed in the processor.

Fig. 1 shows a schematic configuration of an electronic device to which the present application can be applied.

Referring to fig. 1, a schematic structural diagram of an electronic device is provided in an embodiment of the present application. As shown in FIG. 1, the electronic device comprises a processor and a memory, wherein a memory controller is arranged in the processor and is used for executing read-write operation on the memory.

The processor comprises a central processing unit (CPU) and a main board (motherboard).

The CPU uses various interfaces and lines to connect the various parts of the overall system, performing various functions of the chip and processing data by running or executing instructions, programs, code sets, or instruction sets stored in memory, and invoking data stored in memory. The CPU is an Intel series processor.

The memory may include a random access memory (RAM), a read-only memory (ROM), or a storage device on a network server. The memory under test in Fig. 1 is DRAM (dynamic random access memory).

The memory controller shown in Fig. 1 may be configured to invoke an application program stored in the memory and execute the method shown in Fig. 2; the specific process is described with reference to Fig. 2 and is not repeated here.

The method for detecting the failure of the DRAM according to the embodiment of the present application will be described in detail with reference to fig. 2. The fault detection device of the DRAM in the embodiment of the present application may be the fault detection device of the DRAM shown in fig. 1.

Referring to fig. 2, a flow chart of a method for detecting a failure of a DRAM is provided in an embodiment of the present application. As shown in fig. 2, the method according to the embodiment of the present application may include the following steps:

S201, in the PEI phase, address decoding mapping information of the DRAM is obtained according to the MRC code, and the address decoding mapping information is transferred to the DXE phase.

After the system is powered up, the processor first executes basic boot codes in the BIOS or UEFI firmware, which are responsible for initializing the basic operating environment of the CPU and loading the PEI module. In the PEI phase, the processor will call and execute MRC (Memory Reference Code) code. MRC is a program dedicated to DRAM initialization, with an in-depth knowledge of the physical and logical structure of the DRAM. The MRC code will first scan all DRAM modules installed on the system motherboard and identify the basic information of the DRAM module such as type, capacity, speed, etc. Depending on the characteristics of the DRAM module, the MRC code configures the operating parameters of the DRAM, such as timing, voltage, frequency, etc., to ensure that the DRAM is capable of stable operation. During configuration, the MRC code may generate address decode mapping information for the DRAM. This information is used to describe in detail the internal organization of the DRAM, including the mapping between channels, rank, bank, chip, row and column, which is critical to subsequent memory access and failure detection. The MRC code stores the generated address decode mapping information in some location accessible to the processor, typically a register of the CPU, a specific area in memory, or some data structure in firmware. After the PEI phase has completed its tasks, the system will enter the DXE (Driver Execution Environment) phase. Before the DXE phase begins, the PEI phase passes all important information, including address decode mapping information, to the DXE phase. This transfer process may be accomplished by directly copying data, setting global variables, updating data structures in firmware, or triggering the DXE phase to load a particular driver. At the beginning of the DXE phase, all information passed from the PEI phase, including address decode mapping information of the DRAM, is checked and received.
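The handoff described above can be modelled with a minimal Python sketch. It is only an illustration: the record layout, the `DMAP` signature and the `AddressDecodeMap` name are assumptions made here, not details from the application, and real firmware would implement the transfer in C inside the firmware's own data structures. The producer side stands in for the PEI phase and the consumer side for the DXE phase, which checks the record before using it.

```python
import struct
from dataclasses import dataclass

# Hypothetical record layout (not from the source): one (shift, width) pair per
# address component, packed as 12 unsigned bytes after a 4-byte signature.
FIELDS = ("channel", "rank", "bank", "chip", "row", "column")
_LAYOUT = struct.Struct("<4s12B")
_MAGIC = b"DMAP"  # illustrative signature so the consumer can validate the blob


@dataclass(frozen=True)
class AddressDecodeMap:
    """Bit position (shift) and width of each component in a memory address."""

    fields: dict  # component name -> (shift, width)

    def pack(self) -> bytes:
        """Producer side (models the PEI phase): serialize to a flat blob."""
        flat = [value for name in FIELDS for value in self.fields[name]]
        return _LAYOUT.pack(_MAGIC, *flat)

    @classmethod
    def unpack(cls, blob: bytes) -> "AddressDecodeMap":
        """Consumer side (models the DXE phase): check and parse the blob."""
        if len(blob) != _LAYOUT.size:
            raise ValueError("unexpected handoff blob size")
        magic, *flat = _LAYOUT.unpack(blob)
        if magic != _MAGIC:
            raise ValueError("handoff blob has the wrong signature")
        pairs = zip(flat[0::2], flat[1::2])
        return cls(dict(zip(FIELDS, pairs)))
```

Calling `AddressDecodeMap.unpack(m.pack())` on a map `m` returns an equal map, which mirrors the "checked and received" step at the start of the DXE phase.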

S202, in the DXE stage, fault detection is carried out on the DRAM, and when the memory address of the DRAM is detected to be wrong, channel address information, rank address information, bank address information, chip address information, row address information and column address information corresponding to the memory address are queried according to the address mapping information.

Wherein after the system transitions from PEI phase to DXE phase, the processor executes the initialization code of DXE phase and loads the necessary drivers and services. The DXE phase prepares or loads a tool or module dedicated to DRAM failure detection. The tool may be a separate driver or may be a function integrated in firmware. The processor initiates a DRAM failure detection tool that traverses all address spaces of the DRAM, performs read and write tests, etc., to check the reliability of each memory address. If the tool finds that a certain memory address is wrong (e.g. the read data is inconsistent with the written data) during the detection process, the wrong address is recorded and marked as a faulty address. When a DRAM memory address error is detected, the processor accesses the address mapping information stored in the PEI phase. This information details the physical organization of the DRAM and its mapping to logical addresses. The processor uses the address mapping information to resolve the failed address. Specifically, the logical address (i.e., the erroneous memory address) is converted into a corresponding physical address component, including channel address information, rank address information, bank address information, chip address information, row address information, and column address information. By resolving the address, the processor can pinpoint the physical location of the error in the DRAM. This is critical for subsequent fault repair or handling. The memory address has a bit width of 32 bits or 64 bits.
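As an illustration of the address resolution step, the following Python sketch splits a memory address into its components using a bit-field map. The bit layout in `EXAMPLE_MAP` is purely hypothetical; the application does not give one, and an actual memory controller may use a more complex mapping than contiguous bit fields.

```python
# Hypothetical bit layout, name -> (shift, width). Not taken from the source.
EXAMPLE_MAP = {
    "chip": (0, 3),      # byte lane within a 64-bit beat (assumed x8 devices)
    "column": (3, 10),
    "channel": (13, 1),
    "bank": (14, 4),
    "rank": (18, 1),
    "row": (19, 16),
}


def decode_address(address: int, decode_map: dict) -> dict:
    """Split a memory address into its DRAM location components."""
    if address < 0:
        raise ValueError("address must be non-negative")
    return {
        name: (address >> shift) & ((1 << width) - 1)
        for name, (shift, width) in decode_map.items()
    }


def encode_address(components: dict, decode_map: dict) -> int:
    """Inverse of decode_address, useful for checking a map for consistency."""
    address = 0
    for name, (shift, width) in decode_map.items():
        value = components[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        address |= value << shift
    return address
```

With this layout, `decode_address(addr, EXAMPLE_MAP)` returns a dictionary with the six components named in step S202, and `encode_address` rebuilds the same address from them.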

S203, detecting whether the DRAM has a preset number of redundant rows.

Wherein the processor first needs to access configuration information of the DRAM module through a specific hardware interface or firmware service. This information is typically stored in one or more registers of the DRAM module itself, and possibly also in the BIOS or UEFI firmware of the motherboard. The processor parses the configuration information to obtain various parameters about the DRAM module, including whether it supports redundant row functionality and the number of redundant rows. The processor checks whether the redundant row function is explicitly indicated to be supported in the configuration information of the DRAM module. If not, there is no need to further detect the number of redundant rows. If the DRAM module supports redundant rows, the processor will read and record the number of redundant rows, which may be a value stored directly in the configuration information or may be calculated (e.g., based on the total capacity of the DRAM module and the proportion occupied by the redundant rows). The processor also needs to obtain a preset redundant line number threshold. This threshold may be set by the system manufacturer or user according to specific reliability requirements, and the number threshold may be stored in some setting of the BIOS/UEFI or as one of the system initialization parameters. The processor compares the number of redundant rows actually available in the DRAM module with a preset threshold. If the number of redundant rows in the DRAM module meets or exceeds a preset threshold, the processor will consider the DRAM to have sufficient redundant resources to handle the potential row failure and may continue to perform other system initialization tasks. If the number of redundant rows is insufficient, the processor may record an error or warning message and notify the user via a system log, screen display or other means.
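The comparison in S203 and the branch it feeds (S204 or S205) can be sketched as follows. The `DramConfig` structure and its field names are assumptions introduced for illustration; the application does not specify how the configuration information is represented.

```python
from dataclasses import dataclass


@dataclass
class DramConfig:
    """Hypothetical view of the parsed DRAM configuration information."""

    supports_redundant_rows: bool
    redundant_rows_available: int = 0


def has_enough_redundant_rows(config: DramConfig, threshold: int) -> bool:
    """Return True when the available redundant rows meet the preset threshold."""
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    if not config.supports_redundant_rows:
        # No need to look at the count if the feature is not supported.
        return False
    return config.redundant_rows_available >= threshold


def choose_action(config: DramConfig, threshold: int) -> str:
    """Map the S203 result to the next step: repair (S204) or replace (S205)."""
    return "repair" if has_enough_redundant_rows(config, threshold) else "replace"
```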

S204, if yes, repairing the row indicated by the row address information in the DRAM by using the redundant row according to the hPPR mechanism.

The processor first performs a memory stress test on the DRAM through a preset test system to detect and locate the failed row. This process may involve read-write tests, address scans and other means that simulate high-load conditions in actual use in order to trigger latent faults. The test system records the specific address information of the failed row and returns it to the processor; this information is the basis for the subsequent repair process. The processor checks whether the DRAM module has sufficient redundant row resources. According to the DDR4 standard, DRAM devices with a capacity of 8Gb and above must provide the PPR (post-package repair) function, including redundant rows. The processor needs to configure a series of system parameters and DRAM settings to ensure that the hPPR (hard post-package repair) process proceeds smoothly. This may include turning off the DBI (data bus inversion) and CRC (cyclic redundancy check) functions and setting the Guard Key security mechanism to prevent unauthorized hPPR operations. The processor then sends an hPPR repair command to the DRAM. This command typically includes the address information of the failed row and an instruction directing the DRAM to replace it with a redundant row.

After receiving the hPPR repair command, the DRAM finds an available redundant row in the DRAM module and maps it to the address of the failed row. The mapping is completed at the hardware level, so that subsequent data transfers to the repaired address proceed normally. After the address mapping is complete, the DRAM migrates the data in the failed row (if any) to the redundant row. This migration may be automatic or may require further instructions from the processor. At the same time, the DRAM performs the necessary testing and verification of the redundant row to ensure that it functions properly. To verify whether the repair is successful, the processor writes preset test data to the repaired redundant row (i.e., to the address of the original failed row). The data typically has a particular pattern or value to facilitate the subsequent verification. The processor then reads the test data back from the redundant row and compares it with the data that was written. If the two are consistent, the repair has succeeded; if they are inconsistent, the repair has failed or another problem exists. If the repair succeeds, the processor records the repair information, including the address of the failed row and the redundant row address used. This information is important for subsequent maintenance and management. If the repair fails or other problems are encountered, the processor handles the situation according to the error handling mechanism of the system, which may include logging the error, triggering an alarm notification, or attempting other repair methods.
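The remap-then-verify sequence can be illustrated with a toy simulation. This sketch does not issue real hPPR commands; `SimulatedBank` is a hypothetical in-memory model in which a faulty row corrupts data on read and a repair redirects the logical row to a spare physical row.

```python
class SimulatedBank:
    """Toy model of one DRAM bank with spare rows; not a real hPPR interface."""

    def __init__(self, rows: int, spare_rows: int, bad_rows=()):
        self.rows = rows
        self._cells = {}                      # physical row -> stored bytes
        self._remap = {}                      # logical row -> spare physical row
        self._free_spares = [rows + i for i in range(spare_rows)]
        self._bad = set(bad_rows)             # physical rows that corrupt data

    def _physical(self, row: int) -> int:
        if not 0 <= row < self.rows:
            raise IndexError("row out of range")
        return self._remap.get(row, row)

    def write_row(self, row: int, data: bytes) -> None:
        self._cells[self._physical(row)] = bytes(data)

    def read_row(self, row: int) -> bytes:
        phys = self._physical(row)
        data = self._cells.get(phys, b"")
        if phys in self._bad:
            # Model the fault by flipping the lowest bit of every byte.
            return bytes(b ^ 0x01 for b in data)
        return data

    def repair_row(self, row: int) -> bool:
        """Map a spare row onto the failed row; False if none is left."""
        self._physical(row)  # range check
        if row in self._remap or not self._free_spares:
            return False
        self._remap[row] = self._free_spares.pop(0)
        return True


def repair_and_verify(bank: SimulatedBank, row: int,
                      pattern: bytes = b"\xAA\x55" * 4) -> bool:
    """Repair the row, then write a test pattern and compare the read-back."""
    if not bank.repair_row(row):
        return False
    bank.write_row(row, pattern)
    return bank.read_row(row) == pattern
```

In this model a second repair request fails once the spare rows are used up, which corresponds to the case that falls through to step S205.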

S205, if not, prompting the user to replace the DRAM chip indicated by the chip address information.

Wherein if the processor determines that the failed row cannot be repaired by the internal mechanism, it will proceed to the next step, namely consider replacing the DRAM chip. The processor uses address information (including chip address information) for the failed rows to determine which DRAM chip contains the failed rows. The chip address information is typically used to identify a particular chip or chip set in the DRAM module. The processor generates a replacement hint based on the determined chip address information. This prompt will inform the user which DRAM chip needs to be replaced and may include specific location information for the chip (e.g., slot number, chip number, etc.). The processor displays the replacement hint to the user in some manner (e.g., system log, screen display, BIOS/UEFI interface, etc.). The processor may need to wait for the user to view the prompt and respond. This typically involves a user shutting down the system, removing the DRAM module, replacing the designated DRAM chip, and restarting the system. After the user completes the replacement of the DRAM chip and reboots the system, the processor may perform a series of verification tests to confirm whether the failure has been resolved. These tests may include memory stress tests, address scans, etc. to ensure that the DRAM module is now functioning properly. If the replacement is successful, the processor may record information of the replacement, including the replacement DRAM chip number, time stamp, etc. Such information is useful for future maintenance and troubleshooting. If there is still a problem after replacement (e.g., the failure still exists or the system cannot be started), the processor needs to perform corresponding processing according to the error handling mechanism of the system. This may include logging of errors, triggering alarm notifications, providing further troubleshooting advice, and the like.
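A replacement prompt could be built from the resolved location along the following lines. The message wording and the `format_replacement_prompt` helper are illustrative assumptions; the application only states that the prompt identifies the chip and may include position information.

```python
def format_replacement_prompt(location: dict) -> str:
    """Build a user-facing message from a decoded fault location.

    `location` is assumed to hold the components resolved in step S202;
    channel, rank and chip are required, row and column are optional.
    """
    missing = [key for key in ("channel", "rank", "chip") if key not in location]
    if missing:
        raise KeyError(f"missing location fields: {', '.join(missing)}")
    return (
        f"DRAM fault detected: replace chip {location['chip']} "
        f"(channel {location['channel']}, rank {location['rank']}). "
        f"Failing row: {location.get('row', 'unknown')}, "
        f"column: {location.get('column', 'unknown')}."
    )
```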

In some embodiments of the present application, the MRC code is obtained from the Flash memory of the motherboard. The Flash memory on the motherboard (typically the location where the BIOS or UEFI firmware is stored) is used to store the boot code and configuration information of the system. This code and information are loaded and executed by the processor at system start-up to initialize the various components of the system.

In some embodiments of the application, the performing fault detection on the DRAM includes:

Loading a memory test code, and writing target data according to a memory address;

after the writing is finished, reading data according to the memory address;

comparing whether the written target data and the read data are consistent, if not, determining that the memory address of the DRAM has faults.
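The three steps above can be sketched in Python against a simulated memory. `StuckBitMemory` is a hypothetical stand-in for the DRAM under test, in which selected bits always read back as 0; the all-0 and all-1 patterns follow the target data named in this embodiment.

```python
class StuckBitMemory:
    """Simulated byte-addressable memory with optional stuck-at-0 bits."""

    def __init__(self, size: int, stuck_at_zero=None):
        self._data = bytearray(size)
        # address -> bit mask of bits that always read back as 0
        self._stuck = dict(stuck_at_zero or {})

    def __len__(self) -> int:
        return len(self._data)

    def __setitem__(self, address: int, value: int) -> None:
        self._data[address] = value & 0xFF

    def __getitem__(self, address: int) -> int:
        return self._data[address] & ~self._stuck.get(address, 0) & 0xFF


def find_faulty_addresses(memory, addresses, patterns=(0x00, 0xFF)) -> list:
    """Write each pattern, read it back, and collect mismatching addresses."""
    faulty = []
    for address in addresses:
        for pattern in patterns:
            memory[address] = pattern           # step 1: write target data
            read_back = memory[address]         # step 2: read it back
            if read_back != pattern:            # step 3: compare
                faulty.append(address)
                break
    return faulty
```

Using both patterns matters in this model: a bit stuck at 0 passes the all-0 test and is only caught by the all-1 pattern.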

Wherein the target data is all-1 data or all-0 data. The processor loads and executes the memory test code in the DXE (driver execution environment) phase or at another appropriate stage of system startup. The test code typically contains a series of instructions for verifying the functionality and reliability of the DRAM. The memory test code first selects one or more memory addresses as test targets and then writes predetermined target data to these addresses. The data may follow a simple pattern (e.g., all 0s, all 1s, or alternating 0s and 1s) or a more complex data sequence to facilitate subsequent error detection. After the data writing is completed, the test code accesses the previously written memory addresses again and reads the data stored there. This step is critical to verifying data integrity and the storage function of the DRAM. The test code compares the target data previously written with the data now read from the DRAM, using a logical operation or bit comparison so that every bit is checked. If the read data is completely consistent with the written target data, the DRAM storage function at that memory address may be considered normal. If the read data is inconsistent with the written target data, it can be determined that the DRAM memory address has failed. Such failures may be due to hardware defects, electrical disturbances, timing problems, or other factors.

In some embodiments of the application, the loading of the memory test code includes:

The memory test code is read from an external mobile storage device or from a network server.

Reading the memory test code from the external mobile storage device proceeds as follows. During system start-up or operation, the user connects a mobile storage device containing the memory test code (such as a USB flash drive or a mobile hard disk) to the computer. The operating system recognizes the newly connected device and mounts it, either automatically or upon user instruction, so that its file system is accessible. The system (which may be a BIOS/UEFI environment, an operating system, or a specialized diagnostic tool) then scans the mounted device for the memory test code files and loads them into memory for execution.

Reading the memory test code from the network server proceeds as follows. The computer must be connected to the network and have the necessary network configuration and the rights to access the desired network server. The system sends a request to the designated server via HTTP, FTP or another network protocol and downloads the memory test code file. This process may require the user to enter the URL of the server, authentication information, and so on. After the download is completed, the system performs the necessary verification of the test code file (e.g., checking its integrity and signature) to ensure its security and correctness. After the verification passes, the system loads the test code into memory and executes it.

Further, the memory test code that has been read is decrypted, and an integrity check is performed on the decrypted memory test code, the check result being a pass.
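The integrity check can be illustrated with a digest comparison. The application does not name an algorithm; SHA-256 and the `verify_test_code` helper are assumptions made for this sketch, and the decryption step is left out because it depends on a cipher and key management that the source does not describe.

```python
import hashlib
import hmac


def verify_test_code(image: bytes, expected_sha256_hex: str) -> bool:
    """Return True when the image's SHA-256 digest matches the expected value.

    SHA-256 is an assumption of this sketch; the source only says that an
    integrity check is performed on the decrypted memory test code.
    """
    actual = hashlib.sha256(image).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def load_if_valid(image: bytes, expected_sha256_hex: str) -> bytes:
    """Hand the image on for loading only if the integrity check passes."""
    if not verify_test_code(image, expected_sha256_hex):
        raise ValueError("memory test code failed the integrity check")
    return image
```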

In an embodiment of the present application, the address decoding mapping information of the DRAM is obtained through the MRC (memory reference code) in the PEI (pre-EFI initialization) phase, and this information is transferred to the DXE (driver execution environment) phase. This keeps the memory address mapping information consistent and accurate across phases and provides the basis for subsequent memory fault detection and repair.

In the DXE phase, when a memory address of the DRAM is found to be faulty, the technical solution uses the previously transferred address decoding mapping information to look up the detailed hierarchy information corresponding to the faulty memory address, including the channel, rank, bank, chip, row and column address information. This precise localization improves troubleshooting efficiency, because the problem can be quickly narrowed down to the smallest fault range.

For a detected DRAM fault, the technical solution provides two handling modes. First, if a preset number of redundant rows exist in the DRAM, the system automatically uses these redundant rows to repair the failed row, avoiding the risk of data loss and system instability. Second, if the redundant rows are insufficient or the repair cannot be performed, the system prompts the user to replace the specific DRAM chip (indicated by the chip address information), so that the fault can be resolved promptly.

Through fast, accurate fault localization and this two-mode fault handling mechanism, the technical solution improves the stability and reliability of the system. Both automatic repair and manual replacement reduce the system downtime caused by DRAM faults, lowering maintenance cost and inconvenience for users.

For a user, the technical scheme greatly simplifies the fault processing flow by providing detailed fault information and explicit replacement guidance. The user does not need to have professional technical knowledge, and can quickly complete the replacement work of the DRAM chip according to the prompt of the system, so that the overall user experience is improved.

The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.

Referring to fig. 3, a schematic structural diagram of a failure detection apparatus for a DRAM according to an exemplary embodiment of the present application is shown, and hereinafter referred to as a failure detection apparatus 3 for a DRAM. The failure detection means 3 of the DRAM may be implemented as all or part of the processor by software, hardware or a combination of both. The fault detection device 3 of the DRAM comprises a transmission unit 301, a query unit 302, a detection unit 303, a repair unit 304 and a prompt unit 305.

A transfer unit 301, configured to obtain address decoding mapping information of the DRAM according to the MRC code in the PEI phase, and transfer the address decoding mapping information to the DXE phase;

A query unit 302, configured to perform fault detection on the DRAM in the DXE phase, and query channel address information, rank address information, bank address information, chip address information, row address information, and column address information corresponding to the memory address according to the address mapping information when detecting that the memory address of the DRAM is wrong;

A detecting unit 303, configured to detect whether a preset number of redundant rows exist in the DRAM;

a repair unit 304, configured to repair, if yes, a row indicated by the row address information in the DRAM by using the redundant row;

a prompting unit 305, configured to prompt the user, if not, to replace the DRAM chip indicated by the chip address information.

In one or more possible embodiments, further comprising:

the acquisition unit is used for acquiring the MRC code from the Flash memory of the main board.

In one or more possible embodiments, the performing fault detection on the DRAM includes:

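loading a memory test code, and writing target data according to a memory address;

after the writing is finished, reading data according to the memory address;

comparing whether the written target data and the read data are consistent, and if not, determining that the memory address of the DRAM has a fault.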

In one or more possible embodiments, the loading the memory test code includes:

reading the memory test code from an external mobile storage device or from a network server.

In one or more possible embodiments, further comprising:

The processing unit is used for decrypting the read memory test code, and carrying out integrity check on the decrypted memory test code, wherein the check result is passed.

In one or more possible embodiments, the target data is all 1 data or all 0 data.

In one or more possible embodiments, the memory address has a bit width of 32 bits or 64 bits.

It should be noted that, when the apparatus 3 provided in the foregoing embodiment executes the DRAM fault detection method, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the DRAM fault detection device provided in the above embodiment belongs to the same concept as the DRAM fault detection method embodiment; its detailed implementation process is described in the method embodiment and is not repeated here.

The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

The embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are adapted to be loaded by a processor and execute the steps of the method shown in the embodiment of fig. 2, and the specific execution process may refer to the specific description of the embodiment shown in fig. 2, which is not repeated herein.

The present application also provides a computer program product storing at least one instruction, where the at least one instruction is loaded and executed by a processor to implement the DRAM fault detection method described in the above embodiments.

Those skilled in the art will appreciate that all or part of the processes of the above-described method embodiments may be implemented by a computer program instructing related hardware. The program may be stored on a computer-readable storage medium and, when executed, may carry out the steps of the method embodiments described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, or the like.

The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims

1. A DRAM fault detection method, comprising:

in the PEI stage, obtaining address decoding mapping information of the DRAM according to the MRC code, and transferring the address decoding mapping information to the DXE stage;

in the DXE stage, performing fault detection on the DRAM, and when a memory address of the DRAM is detected to be faulty, querying, according to the address decoding mapping information, the channel address information, rank address information, bank address information, chip address information, row address information, and column address information corresponding to the memory address;

detecting whether a preset number of redundant rows exist in the DRAM;

if yes, repairing the row indicated by the row address information in the DRAM using the redundant rows;

if not, prompting the user to replace the DRAM chip indicated by the chip address information.

2. The method according to claim 1, further comprising:

acquiring the MRC code from the Flash memory of the mainboard.

3. The method according to claim 1 or 2, characterized in that the fault detection on the DRAM comprises:

loading the memory test code, and writing target data according to the memory address;

after the writing is completed, reading data according to the memory address;

comparing whether the written target data and the read data are consistent, and if not, determining that the memory address of the DRAM is faulty.

4. The method according to claim 3, characterized in that the loading of memory test code comprises:

5. The method according to claim 4, further comprising:

decrypting the read memory test code, and performing an integrity check on the decrypted memory test code, the check result being a pass.

6. The method according to claim 4, wherein the target data is all-1 data or all-0 data.

7. The method according to claim 6, characterized in that the bit width of the memory address is 32 bits or 64 bits.

8. A DRAM fault detection device, comprising:

a transfer unit, configured to obtain address decoding mapping information of the DRAM according to the MRC code in the PEI stage, and to transfer the address decoding mapping information to the DXE stage;

a query unit, configured to perform fault detection on the DRAM in the DXE stage, and when a memory address of the DRAM is detected to be faulty, query, according to the address decoding mapping information, the channel address information, rank address information, bank address information, chip address information, row address information, and column address information corresponding to the memory address;

a detection unit, configured to detect whether a preset number of redundant rows exist in the DRAM;

a repair unit, configured to, if yes, repair the row indicated by the row address information in the DRAM using the redundant rows;

a prompting unit, configured to, if not, prompt the user to replace the DRAM chip indicated by the chip address information.

9. A computer storage medium, characterized in that the computer storage medium stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor and executing the method steps according to any one of claims 1 to 7.

10. An electronic device, comprising: a memory, a memory controller and the DRAM fault detection device according to claim 8.