CN112650610B

CN112650610B - Linux system crash control method, system and medium

Info

Publication number: CN112650610B
Application number: CN202011462215.7A
Authority: CN
Inventors: 史慧娟
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Metabrain Intelligent Technology Co Ltd
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2023-01-10
Anticipated expiration: 2040-12-11
Also published as: CN112650610A

Abstract

The invention discloses a Linux system crash control method comprising: creating a kernel crash analysis thread that analyzes the cause of a system crash and classifies it as caused by the user, by hardware, or by software; creating a kernel crash avoidance thread that runs a test experiment to trigger a software-caused system crash; reading the log code generated when the system crashes and writing it into the hung task function so that it is protected and shielded; and packaging the hung task function into the Linux kernel, restarting the operating system, and re-entering the Linux kernel. In this way, the system log can be obtained through Linux commands, crashes triggered by hardware problems can be distinguished from crashes triggered by software problems by analyzing the system log, and blocking caused by heavy service traffic is protected and shielded so that it is not monitored. Large volumes of service access data can therefore be handled properly, and software faults are avoided.

Description

A Linux system crash control method, system and medium

Technical field

The invention relates to the field of system anomaly analysis, and in particular to a Linux system crash control method, system and medium.

Background

Servers in use often go down unexpectedly or hit a Linux kernel panic, causing the server to crash. The causes include hardware problems in the server itself; problems triggered by the external environment, such as an ambient temperature that is too high or too low and trips the server's self-protection threshold; viruses from the outside environment; or blocked tasks that bring the server system down. Whatever the cause, an abnormal restart or outage has an immeasurable impact on the customer experience and on customer use.

kdump is a tool and service that dumps memory runtime parameters when the system crashes, deadlocks or hangs. When the system triggers a kernel panic, a vmcore file is generated under /var/crash, and Linux engineers analyze the cause of the outage from the generated vmcore-dmesg file and the vmcore.

At present, however, there is no good solution for system crashes in which heavy data access and excessive load in customer use, banking services for example, block Linux tasks and trigger a kernel panic.

Summary of the invention

The main technical problem solved by the invention is to provide a Linux system crash control method, system and medium that can obtain the system log through Linux commands, distinguish crashes triggered by hardware problems from crashes triggered by software problems by analyzing the system log, and establish a protection module barrier for traffic-induced blocking so that it is not monitored. Large volumes of service access data can then be handled well, and software faults are avoided.

To solve the above technical problem, one technical solution adopted by the invention is to provide a Linux system crash control method, comprising: creating a kernel crash analysis thread and analyzing the cause of the system crash:

if the system log contains crash information caused by user behavior, the cause of the crash is defined as a user-caused system crash;

if the system log contains crash information caused by a hardware fault in the server, the cause of the crash is defined as a hardware-caused system crash;

if the system log contains fault information caused by blocked system tasks or soft lockup error messages, the cause of the crash is defined as a software-caused system crash;

creating a kernel crash avoidance thread, running a test experiment on software-caused system crashes, triggering a system crash, reading the log code generated when the system crashes, and writing it into the hung task function;

packaging the hung task function into the Linux kernel, restarting the operating system, and re-entering the Linux kernel.

Further, the log code generated when the system crashes contains a task process; the system crashes when it runs that task process.

Further, writing into the hung task function comprises the following steps:

reading the task processes and the number of task processes from the log code generated when the system crashes;

writing the task processes and the number of task processes into the hung task function.

Further, packaging the hung task function into the Linux kernel comprises the following steps:

clearing the build files and configuration files generated during Linux kernel compilation;

clearing the object files and executable files generated during Linux kernel compilation;

using the interface command to switch the kernel configuration to a graphical interface, selecting the hung task function, and compiling the hung task function into the Linux kernel;

compiling the Linux kernel with the kernel build command;

installing the Linux kernel driver modules with the install command;

installing the Linux kernel.

A Linux system crash control system, comprising an analysis module, an avoidance module and a packaging module;

the analysis module checks the system log and determines whether the system crash was caused by the user, by hardware, or by software;

the avoidance module runs a test experiment on software-caused system crashes, triggers a system crash, then reads the log code generated when the system crashes and writes it into the hung task function.

A computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the above Linux system crash control method.

The beneficial effects of the invention are as follows: the invention can better analyze and locate the cause of system crashes brought on by heavy service data access, can distinguish restarts triggered by hardware problems from restarts triggered by software problems, and establishes a protection module barrier for traffic-induced blocking so that it is not monitored. Large volumes of service access data can then be handled well, avoiding software faults and system outages.

Description of the drawings

Fig. 1 is a flow chart of a preferred embodiment of the Linux system crash control method of the invention;

Fig. 2 is a schematic diagram of the architecture of the Linux system crash control system of the invention.

Detailed description

Preferred embodiments of the invention are described in detail below with reference to the accompanying drawings, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the scope of protection of the invention can be defined more clearly.

Embodiments of the invention include:

In a first aspect, see Fig. 1, a Linux system crash control method comprises:

creating a kernel crash analysis thread, which uses the Linux command # more /var/log/messages to display the time at which each event occurred, the host name of the system on which it occurred, and the name of the program that generated the log message, and analyzes the cause of the system crash;

If the kernel crash analysis thread finds the following in the system log:

shutdown: shutting down for system reboot

init: Switching to runlevel: 6 exiting on signal 15

Got SIGTERM, quitting

then the cause of the system crash is a restart initiated by the user;

the kernel crash analysis thread uses commands to find what the previous user actually executed;

the kernel crash analysis thread uses lastcomm to display information about commands previously used by users and lastcomm root to view commands previously executed by the root user, and analyzes the system log; if a command previously used or executed by a user caused the system crash, the crash is user-caused.
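The user-initiated case can be sketched in Python as a substring check over the system log. This is a minimal illustration, not the patented implementation: the function name `is_user_initiated_reboot` and the marker list are assumptions built from the three log lines quoted above.

```python
# Hypothetical helper: the marker strings come from the log excerpt quoted above.
USER_REBOOT_MARKERS = (
    "shutting down for system reboot",
    "Switching to runlevel: 6",
    "Got SIGTERM",
)


def _compact(text: str) -> str:
    """Drop all whitespace and lowercase, so spacing differences do not matter."""
    return "".join(text.split()).lower()


def is_user_initiated_reboot(log_text: str) -> bool:
    """Return True if any user-initiated shutdown marker appears in the log."""
    haystack = _compact(log_text)
    return any(_compact(marker) in haystack for marker in USER_REBOOT_MARKERS)
```

Whitespace is removed before matching because log excerpts are sometimes reproduced with inconsistent spacing; a stricter matcher may be preferable on real logs.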

The system log is examined by the kernel crash analysis thread;

if the system log shows:

CPU 1: Machine Check Exception: 4 Bank 4: ba00000000070f0f

Kernel panic - not syncing: Machine check

Kernel panic - not syncing: Uncorrected machine check

then the system restart was caused by a fault in the CPU hardware itself;

if the system log shows:

kernel: CPUX: Temperature above threshold, cpu clock throttled

kernel: CPUX: Core power limit notification (total events = 1)

Power Button Pressed received event "button/power PWRF 00000000 00000000"

then the restart was caused by server overheating; the kernel crash analysis thread gives a prompt recommending that the data center cooling system and the server fans be checked;

the system crash is then attributed to a hardware fault;

Other causes of hardware faults include bent CPU pins, a large number of uncorrectable ECC errors in memory, and damaged disks or memory.
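The hardware signatures above can be matched with a few regular expressions. The sketch below is illustrative only: the names `HARDWARE_PATTERNS` and `hardware_findings` and the category labels are assumptions, and the patterns cover only the log lines quoted in this description.

```python
import re

# Hypothetical patterns derived from the hardware-related log lines quoted above.
HARDWARE_PATTERNS = {
    "cpu_machine_check": re.compile(r"machine\s*check", re.IGNORECASE),
    "overheating": re.compile(r"temperature\s+above\s+threshold", re.IGNORECASE),
    "power_limit": re.compile(r"core\s+power\s+limit\s+notification", re.IGNORECASE),
}


def hardware_findings(log_lines):
    """Return the sorted names of hardware signatures found in the log lines."""
    found = set()
    for line in log_lines:
        for name, pattern in HARDWARE_PATTERNS.items():
            if pattern.search(line):
                found.add(name)
    return sorted(found)
```

An empty result means none of these signatures were seen; it does not rule out other hardware faults such as the ECC or disk problems mentioned above.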

kdump is used through the kernel crash analysis thread;

the kernel crash analysis thread starts kdump;

when the system crashes, kdump boots a capture kernel that captures the current runtime information; this kernel collects all the runtime state and data in memory at that moment into a vmcore (virtual core) file;

Kernel Oops Analyzer is used to diagnose the system crash from the vmcore file and determine the cause of the crash fault.
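Before a vmcore can be analyzed it has to be located. The following sketch lists vmcore files under the crash directory; it assumes, as an illustration, that each dump is stored in its own subdirectory of /var/crash, and the function name `find_vmcores` is hypothetical.

```python
from pathlib import Path


def find_vmcores(crash_dir="/var/crash"):
    """Return vmcore paths found in the per-crash subdirectories, sorted by path."""
    root = Path(crash_dir)
    if not root.is_dir():
        return []
    # Look one level down: each dump is assumed to sit in its own subdirectory.
    return sorted(path for path in root.glob("*/vmcore") if path.is_file())
```

The directory is a parameter so that a differently configured dump target can be passed in.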

If the system log shows:

"kernel: INFO: task xxx:60 blocked for more than 120 seconds."

then the fault is caused by blocked system tasks;

if the system log shows:

BUG: soft lockup - CPU#2 stuck for 67s! [vmmemctl:894]

BUG: soft lockup - CPU#5 stuck for 67s! [bdi-default:49]

BUG: soft lockup - CPU#3 stuck for 67s! [irqbalance:1351]

BUG: soft lockup - CPU#4 stuck for 67s! [swapper:0]

BUG: soft lockup - CPU#6 stuck for 67s! [watchdog/6:30]

BUG: soft lockup - CPU#5 stuck for 67s! [vmmemctl:894]

BUG: soft lockup - CPU#0 stuck for 67s! [events/0:35]

BUG: soft lockup - CPU#7 stuck for 67s! [lldpad:1459]

BUG: soft lockup - CPU#6 stuck for 67s! [mpt_poll_0:376]

BUG: soft lockup - CPU#4 stuck for 67s! [ksoftirqd/4:21]

then a driver in the system is faulty and leaves the CPU short of resources; the CPU is too busy for the watchdog to run in time, so the runtime usage data of each logical CPU cannot be collected and a soft lockup error is thrown;

the system crash is then attributed to a software failure;
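The soft lockup lines above follow a regular shape, so the CPU number, stall duration, process name and PID can be extracted with one pattern. This is a sketch under that assumption; `SOFT_LOCKUP_RE`, `parse_soft_lockups` and `lockups_per_cpu` are illustrative names, not part of the described method.

```python
import re
from collections import Counter

# Matches lines such as: BUG: soft lockup - CPU#2 stuck for 67s! [vmmemctl:894]
SOFT_LOCKUP_RE = re.compile(
    r"BUG:\s*soft lockup\s*-\s*CPU#(?P<cpu>\d+)\s*stuck for\s*(?P<secs>\d+)s!\s*"
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockups(log_lines):
    """Return (cpu, seconds, process name, pid) for each soft lockup line."""
    events = []
    for line in log_lines:
        match = SOFT_LOCKUP_RE.search(line)
        if match:
            events.append(
                (int(match["cpu"]), int(match["secs"]), match["comm"], int(match["pid"]))
            )
    return events


def lockups_per_cpu(events):
    """Count soft lockup events per logical CPU."""
    return Counter(cpu for cpu, _secs, _comm, _pid in events)
```

Counting per CPU shows whether the stalls cluster on one core or are spread across all of them, which can help when looking for the responsible driver.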

To resolve faults caused by blocked system tasks in software failures, a kernel crash avoidance thread is created. The kernel crash avoidance thread first runs a test experiment on software-caused system crashes, triggering a fault caused by blocked system tasks so that the system crashes; it then reads the log code generated when the system crashes.

The log code generated when the system crashes is "kernel: INFO: task xxx:60 blocked for more than 120 seconds"; the task process task xxx shown in the log code is the process that caused the system crash;

the task processes in the log code are read, and the task processes and the number of task processes are written into the hung_task function to be protected and shielded so that they are not monitored;
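Reading the task processes and their number from the log code can be sketched as follows. The pattern is an assumption based on the single log line quoted above, and `HUNG_TASK_RE` and `blocked_tasks` are hypothetical names; how the result is then written into the hung_task function is not shown here.

```python
import re
from collections import Counter

# Matches lines such as: kernel: INFO: task xxx:60 blocked for more than 120 seconds
HUNG_TASK_RE = re.compile(
    r"INFO:\s*task\s+(?P<comm>\S+):(?P<pid>\d+)\s*"
    r"blocked for more\s*than\s*(?P<secs>\d+)\s*seconds"
)


def blocked_tasks(log_lines):
    """Return a Counter mapping (task name, pid) to the number of reports."""
    counts = Counter()
    for line in log_lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[(match["comm"], int(match["pid"]))] += 1
    return counts
```

`len(blocked_tasks(lines))` gives the number of distinct blocked task processes, and each value gives how often that task was reported.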

the kernel crash avoidance thread packages the hung_task function into the Linux kernel, restarts the operating system and re-enters the Linux kernel; running the newly compiled Linux kernel avoids system crashes caused by blocked tasks.

Packaging the hung_task function into the Linux kernel comprises the following steps:

# make mrproper clears all intermediate files generated during compilation, including any previously configured kernel configuration file ".config"; that is, the old configuration file is deleted before a new build so that it does not affect the new kernel build;

# make clean clears the object files (suffix ".o") and executable files generated by the previous build command;

# make menuconfig: after a few seconds the terminal switches to a graphical kernel configuration interface; select the modified function module (the hung_task function) and compile it into the kernel;

# make -j2 compiles the kernel; on a quad-core machine use -j4, and on an eight-core machine -j8 can be used. A larger number after -j runs more build jobs in parallel and generally shortens the build, with little further gain beyond the number of cores. This step generates the kernel modules and the vmlinuz, initrd.img and System.map files;

# make modules_install installs the kernel modules. After a successful build, the system creates a subdirectory under /lib/modules that holds all loadable modules of the new kernel (that is, the compiled modules are copied to /lib/modules);

# make install installs the kernel, that is, copies the .config, vmlinuz, initrd.img and System.map files to the /boot directory and updates grub. On RedHat systems the following three grub files are updated automatically, and the new kernel is booted by default.
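The build steps above can be collected into an ordered plan. The sketch below only returns the commands as argument lists and runs nothing; the function name `kernel_build_plan` and the `jobs` parameter are illustrative assumptions.

```python
def kernel_build_plan(jobs=2):
    """Return the build commands from the steps above as argument lists, in order."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return [
        ["make", "mrproper"],         # remove intermediate files and the old .config
        ["make", "clean"],            # remove object files and executables
        ["make", "menuconfig"],       # interactive configuration step
        ["make", f"-j{jobs}"],        # build with the given number of parallel jobs
        ["make", "modules_install"],  # install modules under /lib/modules
        ["make", "install"],          # install the kernel and update the boot loader
    ]
```

Each entry could be passed to `subprocess.run(cmd, cwd=source_dir, check=True)` from the kernel source tree with root privileges, where `source_dir` is a placeholder; `os.cpu_count()` is one way to choose `jobs`. Because `make menuconfig` is interactive, a fully automated run would need a prepared configuration instead.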

Here, kdump is a tool and service that dumps memory runtime parameters when the system crashes, deadlocks or hangs.

Kernel Oops Analyzer is a kernel crash analysis tool. The hung task function is a self-protection module that detects whether any process in the system has stayed in the D state for longer than a specific time (the duration is configurable); if such a process exists, the kernel is triggered, causing the server to crash and restart. The hung_task function checks all processes in a loop.
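The D-state check can be illustrated from user space by parsing lines in the /proc/<pid>/stat format, where the state letter follows the parenthesized command name. This sketch only samples the state at one instant and does not track how long a process has been blocked, which the kernel-side detector does; the function names are assumptions.

```python
def parse_stat_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    pid_text, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_text), comm, state


def uninterruptible(stat_lines):
    """Return (pid, comm) for every process whose state is D."""
    result = []
    for line in stat_lines:
        pid, comm, state = parse_stat_state(line)
        if state == "D":
            result.append((pid, comm))
    return result
```

Taking two samples some seconds apart and intersecting the results would approximate "blocked for longer than a threshold", at the cost of missing processes that briefly left and re-entered the D state.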

Linux ships with a watchdog implementation for monitoring the running system, consisting of a kernel watchdog module and a user-space watchdog program. The kernel watchdog module communicates with user space through the character device /dev/watchdog. Once a user-space program opens /dev/watchdog, a 1-minute timer starts in the kernel; from then on the user-space program must write to the device within every 1-minute window, and each write resets the timer. If the user-space program does not write within 1 minute, the timer expires and the system reboots.
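The keepalive behaviour described above can be modelled with a small timer class. This is a pure simulation with explicit timestamps; it never opens /dev/watchdog, and the class name `WatchdogTimer` and its methods are illustrative assumptions.

```python
class WatchdogTimer:
    """Pure simulation of the keepalive timer; it does not touch /dev/watchdog."""

    def __init__(self, timeout_s=60.0, opened_at=0.0):
        self.timeout_s = timeout_s
        # Opening the device starts the countdown.
        self.deadline = opened_at + timeout_s

    def write(self, now):
        """Model a write to the device: reset the timer unless it already expired."""
        if self.expired(now):
            return False
        self.deadline = now + self.timeout_s
        return True

    def expired(self, now):
        """True once the deadline has passed, which models the reboot trigger."""
        return now >= self.deadline
```

For example, a timer opened at t = 0 and written at t = 30 expires at t = 90; a write attempted at t = 95 comes too late.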

In a second aspect, based on the same inventive concept as the Linux system crash control method of the foregoing embodiment, an embodiment of this specification further provides a Linux system crash control system, comprising an analysis module, an avoidance module and a packaging module;

the analysis module checks the system log and determines whether the system crash was caused by the user, by hardware, or by software;

the avoidance module runs a test experiment on software-caused system crashes and triggers a software-caused system crash; it then reads the log code generated when the system crashes, reads the task processes in the log code, and writes the task processes and the number of task processes into the hung_task function to be protected and shielded so that they are not monitored;

the packaging module packages the hung_task function from the avoidance module into the Linux kernel.

In a third aspect, based on the same inventive concept as the Linux system crash control method of the foregoing embodiment, an embodiment of this specification further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of the above Linux system crash control method.

The above are only embodiments of the invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the invention, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the invention.

Claims

1. A Linux system crash control method, characterized by comprising: creating a kernel crash analysis thread and analyzing the cause of the system crash, the causes of a system crash including a user-caused system crash, a hardware-caused system crash and a software-caused system crash;

creating a kernel crash avoidance thread, running a test experiment on software-caused system crashes, triggering a system crash, reading the log code generated when the system crashes, and writing the log code into a hung task function;

packaging the hung task function into the Linux kernel, restarting the operating system, and re-entering the Linux kernel;

wherein the log code generated when the system crashes contains a task process, and the system crashes when it runs that task process.

2. The Linux system crash control method of claim 1, wherein writing into the hung task function comprises the following steps:

reading the task processes and the number of task processes from the log code generated when the system crashes;

and writing the task processes and the number of task processes into the hung task function.

3. The Linux system crash control method of claim 1, wherein packaging the hung task function into the Linux kernel comprises the following steps:

clearing the build files and configuration files generated during Linux kernel compilation;

clearing the object files and executable files generated during Linux kernel compilation;

using the interface command to switch the kernel configuration to a graphical interface, selecting the hung task function, and compiling the hung task function into the Linux kernel;

compiling the Linux kernel with the kernel build command;

installing the Linux kernel driver modules with the install command;

and installing the Linux kernel.

4. A Linux system crash control system, comprising an analysis module, an avoidance module and a packaging module;

the analysis module checks the system log and determines whether the system crash was caused by the user, by hardware, or by software;

the avoidance module runs a test experiment on software-caused system crashes, triggers a system crash, reads the log code generated when the system crashes, and writes the log code into a hung task function;

the packaging module packages the hung task function from the avoidance module into the Linux kernel;

wherein the log code generated when the system crashes contains a task process, and the system crashes when it runs that task process.

5. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, performs the steps of the Linux system crash control method of any one of claims 1 to 3.