Disclosure of Invention
The main object of the invention is to provide a single-chip server design method, an SOC chip, and a server, so as to solve the technical problem that existing servers are costly to use.
A first aspect of the invention provides a single-chip server design method, comprising: the single-chip server adopts a single SOC chip, and the SOC chip serves simultaneously as the BMC chip that carries the BMC service and as the CPU chip that carries the user system;
installing a first operating system on the SOC chip as the host, wherein the kernel of the host starts an OpenBMC service and keeps it running, the OpenBMC service responds to management and monitoring requests from the user side, and the OpenBMC service uses a gigabit network card provided by the SOC chip as the management network card;
starting a first virtual machine and a second virtual machine in the kernel of the host, wherein the first virtual machine runs an image of a second operating system installed by the user for user-side operation, the second virtual machine runs a firmware program to manage flash memory and provide a block device service, the block device of the second virtual machine is mapped to the first virtual machine as its system disk, and the second virtual machine manages the flash memory with a RAID redundancy mechanism based on the RAID engine of the SOC chip.
Optionally, in a first implementation manner of the first aspect of the present invention, the second virtual machine implements the following operations by using the firmware program:
executing a command set defined in the NVMe protocol to process the read-write IO request;
performing physical operations of the flash memory, including storage, reading and erasure of data;
executing NVMe commands to manage the reading, writing, and storage of data.
Optionally, in a second implementation manner of the first aspect of the present invention, the single-chip server design method further includes:
passing the registers, the interrupt numbers, and the memory through to the second virtual machine, so that the physical addresses and interrupt numbers accessed by the second virtual machine are fully consistent with those of the host.
Optionally, in a third implementation manner of the first aspect of the present invention, the registers and the interrupt numbers are passed through to the second virtual machine as follows:
when the first operating system of the host is started, configuring the register range and the interrupt number of each Platform device in a device tree file;
after entering the first operating system of the host, using the VFIO-Platform driver to override the original drivers of all the Platform devices and take over all the Platform devices;
adding the VFIO-Platform devices when QEMU is used to start the second virtual machine, so that the VFIO-Platform devices are passed through to the second virtual machine;
during startup of the second virtual machine, for the registers of each VFIO-Platform device, calling a first function to obtain the platform bus memory-mapped I/O (MMIO) region corresponding to the VFIO-Platform device as a memory subregion, adding the memory subregion to the memory region at the physical register address of the VFIO-Platform device, and calling a second function to map the interrupt number of each VFIO-Platform device to the physical interrupt.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the memory is passed through to the second virtual machine as follows:
when the first operating system of the host is started, configuring in a device tree file the memory range that is to be passed through to the second virtual machine;
adding a new kernel module that creates a character device, obtains the device physical address through the character device, and maps the device physical address to virtual memory;
using the file descriptor returned by the character device created by the newly added kernel module, mapping the memory physical address of the host to virtual memory and mapping the virtual memory to the same physical address in the second virtual machine.
Optionally, in a fifth implementation manner of the first aspect of the present invention, the single-chip server design method further includes:
when a power-on or power-off request sent by the user side over the HTTPS protocol is received, the OpenBMC service parses the request and then calls an API of a virtualization management tool to perform the power-on or power-off operation of the host;
when an installation request sent by the user side over the HTTPS protocol is received together with an uploaded ISO image, the OpenBMC service parses the installation request, then starts UEFI firmware through the first virtual machine to load the uploaded ISO image and install the second operating system.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the single-chip server design method further includes:
adding IOMMU support to the kernel boot parameters of the host;
unbinding a hardware device from its original kernel driver, binding it to the VFIO-Platform driver, and providing the rebound hardware device to the first virtual machine so that the first virtual machine can directly access the physical hardware.
Optionally, in a seventh implementation manner of the first aspect of the present invention, the single-chip server design method further includes:
the OpenBMC service communicates with the hardware in the single-chip server through a preset interface to read raw data for out-of-band monitoring, processes the raw data, and provides the processed data to the user side for display or alarm, wherein the raw data comprises one or more of temperature sensor readings, voltage monitor readings, and fan status;
the first virtual machine communicates with the host in bridged mode, so that the OpenBMC service periodically collects resource usage data of the first virtual machine for in-band monitoring, analyzes and processes the resource usage data, and provides the result to the user side for display.
The second aspect of the present invention provides an SOC chip designed by using any one of the single-chip server design methods provided in the first aspect.
A third aspect of the present invention provides a server including the SOC chip provided in the second aspect, the server being designed using any one of the single-chip server design methods provided in the first aspect.
A server designed by the single-chip server design method provided by the invention uses only a single SOC chip, which serves both as the BMC chip (BMC processor) carrying the BMC service and as the CPU chip (CPU processor) carrying the user system. The system disk of the server obtains redundancy protection without RAID card hardware, and the BMC service handles management requests without a separate management network card. In addition, the OpenBMC service adopted by the invention responds to management and monitoring requests from the user side and therefore provides both out-of-band and in-band management. The invention effectively reduces the hardware cost of the server and improves the degree of localization and the completeness of monitoring and management, without affecting the service functions or usability of the server.
Detailed Description
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments described herein may be implemented in other sequences than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, fig. 1 is a schematic diagram of a technical architecture of a single-chip server in an actual application scenario in an embodiment of the invention.
As shown in Fig. 1, the console is the user side, and the server is a single-chip server designed with the design method of this embodiment. The user side communicates with the single-chip server over the web.
The single-chip server of this embodiment is provided with a single SOC (System on Chip) chip, which serves both as the chip carrying the BMC service and as the CPU chip of the user system. The SOC chip features high security, thorough data fault tolerance, NAND flash management support, RAID and EC protection, high capacity with flexible expansion, and a rich set of interfaces and protocols.
In this embodiment, an operating system (OS) is installed on the SOC chip of the server as the host, for example Linux (CentOS, Debian, or Ubuntu) or a domestic operating system. The host kernel starts the OpenBMC service and keeps it running so that it can respond to management and monitoring requests from the user side, such as power-on and power-off requests, installation requests, out-of-band monitoring, and in-band monitoring. In addition, various drivers, such as a USB driver, an I2C driver, an LPC driver, and a NIC driver, are also installed on the SOC chip.
The OpenBMC service provides a flexible and extensible firmware stack for the baseboard management controller (BMC). It is compatible with the management protocols (HTTP, HTTPS, IPMI, SNMP, and so on) and functions of traditional BMC services, including hardware monitoring, remote management, and firmware updating, and its open-source nature allows it to be adapted quickly to new hardware platforms and management requirements. The OpenBMC service uses the gigabit network card provided by the SOC chip as the management network card for network communication, which saves the server the cost of a separate management network card.
In this embodiment, because a single SOC chip carries the BMC service, two virtual machines are started in the kernel of the host in order to support interaction with the user side and, at the same time, to implement the system disk of the server:
(1) The first virtual machine VM1 runs an image of the second operating system (user OS) installed by the user, for example Windows or Linux, for user-side operation.
(2) The second virtual machine VM2 runs a firmware program (FW), manages the flash memory, and provides a block device service. The block device (such as an NVMe device) of the second virtual machine is mapped to the first virtual machine as its system disk; that is, the first virtual machine accesses the flash memory as its own storage device through the second virtual machine, so that performance is not constrained by a virtualization software layer. In addition, the second virtual machine manages the flash memory with a RAID redundancy mechanism based on the RAID engine of the SOC chip. Because the SOC chip has a built-in RAID engine, the RAID redundancy mechanism can be applied directly to flash memory management and no RAID card hardware is needed. The RAID redundancy mechanism improves the reliability and safety of data and preserves data integrity and recoverability even if part of the flash memory fails.
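The recoverability claim above can be illustrated with a minimal sketch of XOR parity, the idea behind parity-based RAID levels. The source does not state which RAID level the SOC's RAID engine uses, so single-parity XOR is an assumption made purely for illustration, and the chunk contents and function names are hypothetical.

```python
from functools import reduce


def xor_parity(chunks: list[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    size = len(chunks[0])
    if any(len(c) != size for c in chunks):
        raise ValueError("all chunks must have the same length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing data chunk from the survivors and the parity."""
    return xor_parity(surviving + [parity])


# Three data chunks striped across three hypothetical flash channels.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_parity(stripe)

# Simulate losing the second chunk and rebuilding it from the rest.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

Because XOR is its own inverse, any one lost chunk in a stripe can be rebuilt from the remaining chunks plus the parity chunk; tolerating more than one failure requires a stronger code.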
Referring to Fig. 2, Fig. 2 is a schematic diagram of how the system disk mapping operates on NAND in an embodiment of the invention. SW denotes the software layer and HW denotes the hardware layer. VM1 denotes the first virtual machine and VM2 denotes the second virtual machine, in which the firmware program FW runs.
Taking a Linux operating system installed in a host machine as an example, a Linux Kernel is a core part of the Linux operating system and is responsible for managing hardware resources of the system and providing a stable and unified running environment for an upper application program.
KVM (Kernel-based Virtual Machine) is a virtualization module in the Linux system. Using functions provided by the Linux kernel, it provides a full-virtualization environment through hardware virtualization technology (such as Intel VT or AMD-V). This allows multiple fully independent guest operating systems to run simultaneously on a single physical server. Each virtual machine has its own hardware resources, such as CPU, memory, disk, and network interface, which isolates resources and uses them efficiently.
VFIO (Virtual Function I/O) is a device pass-through technology for the Linux operating system that allows physical devices (such as GPUs and PCI devices) to be mapped directly into virtual machines or containers, enabling high-performance hardware access and the construction of bare-metal cloud servers. The VFIO module provides a set of user-space interfaces and kernel drivers that let the user fully control and manage a device and bypass the performance bottlenecks of the virtualization layer, achieving lower latency and higher throughput. The VFIO module is flexibly designed, so support for other types of hardware and IOMMUs (I/O memory management units) can be added easily. The VFIO module makes full use of the DMA remapping and interrupt remapping features provided by VT-d/AMD-Vi, which ensures DMA security for passed-through devices and achieves I/O performance close to that of the physical device.
QEMU (Quick Emulator) is open-source hardware virtualization software that can run virtual machines on different host platforms. QEMU uses full-system emulation and can emulate a complete computer system, including the processor, memory, storage, and peripherals. It provides a set of virtual hardware models, so the operating system in the virtual machine believes it is interacting directly with hardware while it is actually interacting with hardware emulated by QEMU. KVM, as part of the Linux kernel, uses hardware extensions such as Intel VT or AMD-V to achieve efficient hardware-assisted virtualization. KVM provides the virtualization framework, while QEMU typically serves as the user-space component that emulates devices: QEMU emulates I/O devices (for example, network cards and disks), and KVM is responsible for virtualizing the CPU and memory. Combining QEMU and KVM therefore virtualizes the CPU, memory, and I/O devices together, offering high performance, flexibility, and broad device support. In QEMU-KVM, KVM runs in kernel space and QEMU runs in user space. Through ioctl calls on the /dev/kvm interface, QEMU hands part of the CPU instructions to the KVM kernel module for processing, thereby achieving efficient virtualization.
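As a small illustration of the ioctl-based /dev/kvm interface mentioned above, the following sketch queries the KVM API version from user space. It assumes a Linux host; the helper name `kvm_api_version` is hypothetical, and the function returns None when /dev/kvm is missing or not accessible instead of failing.

```python
import fcntl
import os
from typing import Optional

# KVM_GET_API_VERSION is defined in <linux/kvm.h> as _IO(0xAE, 0x00).
KVM_GET_API_VERSION = 0xAE00


def kvm_api_version(path: str = "/dev/kvm") -> Optional[int]:
    """Return the KVM API version, or None if the device is unavailable."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_CLOEXEC)
    except OSError:
        return None  # no KVM support, or insufficient permissions
    try:
        # The ioctl takes no argument; its return value is the version.
        return fcntl.ioctl(fd, KVM_GET_API_VERSION, 0)
    except OSError:
        return None
    finally:
        os.close(fd)


version = kvm_api_version()
```

The kernel documentation describes 12 as the stable API version, so a user-space virtualizer would normally refuse to continue if a different value came back.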
In one embodiment, the second virtual machine implements the following operations by a firmware program:
(1) Executing a command set defined in the NVMe protocol to process the read-write IO request;
(2) Performing physical operations of the flash memory, including storage, reading and erasure of data;
(3) Managing the reading, writing, and storage of data while executing NVMe commands: the algorithm optimization logic in the firmware program decides how to execute NVMe commands efficiently, and covers tasks such as wear leveling, garbage collection, and error handling.
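Wear leveling, one of the firmware tasks listed above, can be sketched as a toy block allocator that always hands out the free block with the fewest erase cycles. This is an illustrative assumption about one common policy, not the firmware's actual algorithm; the `Block` and `ToyAllocator` names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    index: int
    erase_count: int = 0
    free: bool = True


@dataclass
class ToyAllocator:
    blocks: list[Block] = field(default_factory=list)

    def allocate(self) -> Block:
        """Pick the free block with the fewest erases (simple wear leveling)."""
        candidates = [b for b in self.blocks if b.free]
        if not candidates:
            raise RuntimeError("no free blocks; garbage collection is needed")
        # Ties on erase count are broken by the lower block index.
        chosen = min(candidates, key=lambda b: (b.erase_count, b.index))
        chosen.free = False
        return chosen

    def erase(self, block: Block) -> None:
        """Erase a block: it becomes free again and its wear is recorded."""
        block.erase_count += 1
        block.free = True


alloc = ToyAllocator([Block(0, erase_count=5), Block(1, erase_count=2), Block(2, erase_count=2)])
first = alloc.allocate()   # block 1: lowest erase count, lowest index
second = alloc.allocate()  # block 2
alloc.erase(first)         # block 1 is free again, now with 3 erases
third = alloc.allocate()   # block 1 again, since 3 erases < 5 erases of block 0
```

A production flash translation layer would combine such a policy with garbage collection and bad-block handling; the sketch only shows the selection rule.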
NVMe (Non-Volatile Memory Express) is a high-performance, PCIe-based storage interface and communication protocol designed specifically for SSDs (solid-state drives). Although NVMe itself is not an integral part of an SOC chip, some high-performance SOC chips integrate controllers or interfaces that support NVMe to enable access to and control of high-speed SSDs. Whether an SOC chip integrates NVMe support depends on its design goals and application scenario.
NFC (NAND Flash Controller) is the NAND flash memory controller integrated in the SOC chip. It is a key component for managing data interaction between the NAND flash memory and the host system, and is responsible for performing read and write operations, data management, and error correction between the central processing unit (CPU) and the flash memory, and for optimizing storage performance.
NAND (Not AND) flash is a non-volatile memory technology. NAND flash is a common storage medium in solid-state drives (SSDs), USB flash drives, SD cards, and other portable storage devices.
In order for the firmware program to normally run in the second virtual machine, thereby performing the above operation, it is necessary to construct an operating environment for the firmware program that is substantially the same as the bare metal environment.
In one embodiment, the following design is adopted to ensure that the firmware program runs normally in the second virtual machine:
The registers, the interrupt numbers, and the memory are each passed through to the second virtual machine, so that the physical addresses and interrupt numbers accessed by the second virtual machine are fully consistent with those of the host.
(I) The registers and interrupt numbers are passed through to the second virtual machine as follows:
1.1, when the first operating system of the host is started, configuring the register range and the interrupt number of each Platform device in a device tree file;
1.2, after entering the first operating system of the host, using the VFIO-Platform driver to override the original drivers of all the Platform devices and take over all the Platform devices;
In this embodiment, the register range and interrupt number of each Platform device, for example NVMe and NFC, are added to the device tree file used when the operating system of the host is started. After entering the operating system of the host, the Platform devices added in the device tree file are visible under, for example, the /sys/bus/platform/devices/ directory. The original driver of each device is then overridden with VFIO-Platform, which takes over the Platform device and yields a VFIO-Platform device.
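The driver takeover described above is commonly done through the platform bus sysfs files `driver_override`, `unbind`, and `drivers_probe`. The sketch below only builds the list of writes rather than performing them, since doing so needs root and real hardware; the device name `a0000000.nvme` is hypothetical and not taken from the source.

```python
from pathlib import PurePosixPath

PLATFORM_BUS = PurePosixPath("/sys/bus/platform")


def vfio_platform_rebind_plan(device: str) -> list[tuple[str, str]]:
    """Return the (sysfs path, value) writes that hand `device` to vfio-platform."""
    dev_dir = PLATFORM_BUS / "devices" / device
    return [
        # 1. Tell the driver core that only vfio-platform may bind this device.
        (str(dev_dir / "driver_override"), "vfio-platform"),
        # 2. Detach the device from its original driver (if one is bound).
        (str(dev_dir / "driver" / "unbind"), device),
        # 3. Ask the bus to re-probe the device, which now selects vfio-platform.
        (str(PLATFORM_BUS / "drivers_probe"), device),
    ]


# "a0000000.nvme" is a hypothetical device name used only for illustration.
for path, value in vfio_platform_rebind_plan("a0000000.nvme"):
    print(f"echo {value} > {path}")
```

Printing the plan as shell commands makes it easy to review before applying; a real setup would also need the vfio-platform kernel module loaded beforehand.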
The Platform device model is a mechanism in the Linux system for managing and organizing hardware devices that are independent of the conventional system buses. Such devices include, but are not limited to, GPIO controllers, temperature sensors, and clock controllers. A Platform device is connected to the system through a virtual bus (the platform bus). The Platform driver is a driver model mainly used for managing platform-specific devices; devices and drivers are matched through the device tree, the devices are generally registered with the kernel during the platform initialization stage, and a platform_device structure represents each device.
VFIO (Virtual Function I/O) is a framework in the operating system kernel that provides a mechanism for user-space programs to access and manage hardware devices directly and securely. Through the VFIO framework, a user-space program can obtain direct control over a hardware device. VFIO-Platform devices are Platform devices that are managed and passed through by the VFIO framework. Platform devices are typically integrated on the system board and do not rely on conventional physical buses (such as PCI or USB); instead they are connected directly to the processor or memory controller of the system. Through the VFIO framework, each Platform device can be exposed safely and efficiently to user-space programs or virtual machines, which enables more flexible and efficient use of hardware resources.
1.3, adding the VFIO-Platform devices when QEMU is used to start the second virtual machine, so that the VFIO-Platform devices are passed through to the second virtual machine;
QEMU (Quick EMUlator) is an open-source machine emulator and virtualizer. It can emulate a whole computer, including the CPU, memory, disks, and other hardware devices. QEMU supports processors of multiple architectures and can run on different host operating systems.
When the second virtual machine is started with QEMU, the VFIO-Platform devices are added through QEMU parameters; for example, VFIO-Platform devices such as NVMe and NFC are passed through to the second virtual machine.
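A hedged sketch of such a parameter configuration follows. It assumes an AArch64 host, the QEMU `virt` machine, and a QEMU build that exposes a `vfio-platform` device with a `host` property; none of these details are stated in the source, and the device names are hypothetical. The function only builds the argument list and does not launch QEMU.

```python
def qemu_vm2_args(passthrough_devices: list[str], memory_mib: int = 512) -> list[str]:
    """Build a QEMU command line that passes platform devices through to VM2."""
    args = [
        "qemu-system-aarch64",   # assumed target architecture
        "-enable-kvm",           # use KVM for CPU and memory virtualization
        "-machine", "virt",      # assumed machine type
        "-cpu", "host",
        "-m", str(memory_mib),
        "-nographic",
    ]
    for dev in passthrough_devices:
        # One -device option per platform device already bound to vfio-platform.
        args += ["-device", f"vfio-platform,host={dev}"]
    return args


# Hypothetical device names standing in for the NVMe and NFC platform devices.
vm2_args = qemu_vm2_args(["a0000000.nvme", "a0010000.nfc"])
print(" ".join(vm2_args))
```

In the scheme described here, stock QEMU would additionally need the register and interrupt mapping changes discussed in step 1.4, so this command line is only the starting point.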
1.4, during startup of the second virtual machine, for the registers of each VFIO-Platform device, calling a first function to obtain the platform bus memory-mapped I/O (MMIO) region corresponding to the VFIO-Platform device as a memory subregion, adding the memory subregion to the memory region at the physical register address of the VFIO-Platform device, and calling a second function to map the interrupt number of each VFIO-Platform device to the physical interrupt.
In this embodiment, after the above steps are completed, the register mapping and the interrupt mapping are each further modified, specifically:
(1) Register mapping modification
During startup of the second virtual machine, for the registers of each VFIO-Platform device, a first function (such as a platform bus map function) is called to obtain the platform bus MMIO region corresponding to the VFIO-Platform device as a memory subregion. The memory subregion is added to the memory region at the physical register address of the VFIO-Platform device, so that the physical addresses accessed by the second virtual machine are fully consistent with those of the host.
(2) Interrupt mapping modification
For each VFIO-Platform device, a second function (such as the platform_bus_map_irq function) is called to map the interrupt number of the device to the physical interrupt, so that the interrupt numbers accessed by the second virtual machine are fully consistent with those of the host.
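The goal of both modifications is an identity mapping: each device appears in the second virtual machine at the same register address and interrupt number as on the host. The following sketch models that invariant in plain Python; it is not QEMU code, and the device names, addresses, and interrupt numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformDevice:
    name: str
    reg_base: int   # physical base address of the register window
    reg_size: int   # size of the register window in bytes
    irq: int        # physical interrupt number


def identity_guest_map(devices: list[PlatformDevice]) -> dict[str, tuple[int, int, int]]:
    """Place every device in the guest at its host address and interrupt number."""
    guest: dict[str, tuple[int, int, int]] = {}
    placed: list[PlatformDevice] = []
    for dev in sorted(devices, key=lambda d: d.reg_base):
        for other in placed:
            # Devices are visited in address order, so an overlap exists
            # exactly when this window starts before an earlier one ends.
            if dev.reg_base < other.reg_base + other.reg_size:
                raise ValueError(f"{dev.name} overlaps {other.name}")
        placed.append(dev)
        guest[dev.name] = (dev.reg_base, dev.reg_size, dev.irq)
    return guest


# Hypothetical register windows and interrupt numbers from a device tree.
host_devices = [
    PlatformDevice("nvme", 0xA000_0000, 0x1000, 64),
    PlatformDevice("nfc", 0xA001_0000, 0x2000, 65),
]
guest_map = identity_guest_map(host_devices)
```

With an identity mapping, firmware built for bare metal can keep its hard-coded register addresses and interrupt numbers when it runs inside the virtual machine.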
(II) The memory is passed through to the second virtual machine as follows:
2.1, when the first operating system of the host is started, configuring in a device tree file the memory range that is to be passed through to the second virtual machine, wherein the memory range covers two memory devices, namely DRAM (dynamic random access memory) and SRAM (static random access memory);
2.2, adding a new kernel module that creates a character device, obtains the device physical address through the character device, and maps the device physical address to virtual memory;
2.3, using the file descriptor returned by the character device created by the newly added kernel module, mapping the memory physical address of the host to virtual memory and mapping the virtual memory to the same physical address in the second virtual machine, thereby ensuring that the memory physical addresses in the second virtual machine and on the physical machine are identical.
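The user-space half of steps 2.2 and 2.3 amounts to an mmap call on the character device's file descriptor at a given offset. The sketch below shows that call and demonstrates it on an ordinary temporary file, because the character device created by the custom kernel module is not available here; treating the offset as the physical address is an assumption about how such a module might be written, and `map_region` is a hypothetical helper. It assumes a Unix-like host.

```python
import mmap
import os
import tempfile


def map_region(path: str, phys_offset: int, length: int) -> mmap.mmap:
    """Map `length` bytes starting at `phys_offset` of the file or device at `path`."""
    if phys_offset % mmap.ALLOCATIONGRANULARITY:
        raise ValueError("offset must be aligned to the allocation granularity")
    fd = os.open(path, os.O_RDWR)
    try:
        # MAP_SHARED so that writes reach the backing memory, not a private copy.
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE, offset=phys_offset)
    finally:
        os.close(fd)  # the mapping remains valid after the descriptor is closed


# Demonstration on a temporary file standing in for the character device.
page = mmap.ALLOCATIONGRANULARITY
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes(2 * page))
    backing = tmp.name

region = map_region(backing, phys_offset=page, length=page)
region[:4] = b"FWOK"      # write through the mapping
region.flush()
region.close()

with open(backing, "rb") as f:
    f.seek(page)
    readback = f.read(4)  # the write is visible at the same offset in the backing store
os.unlink(backing)
```

A virtual machine monitor would then register the mapped virtual memory as guest RAM at the same guest physical address, which is the second half of step 2.3.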
In this embodiment, through the above design, the register, the interrupt number and the memory are respectively directly sent to the second virtual machine, so that the physical address and the interrupt number accessed by the second virtual machine and the host machine are completely consistent, and normal running of the firmware program in the second virtual machine is further ensured.
As shown in Fig. 1, so that the OpenBMC service can respond to management and monitoring requests from the user side, this embodiment further handles BMC requests on the system side as follows:
(1) Power-on and power-off request processing
When a power-on or power-off request sent by the user side over the HTTPS protocol is received, the OpenBMC service parses the request and then calls an API of a virtualization management tool to perform the power-on or power-off operation of the host. For example, the OpenBMC service calls the libvirt interfaces system_powerdown/system_reset to complete the power-on and power-off operations of the host.
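A minimal sketch of the dispatch step, assuming the target is managed as a libvirt domain: the `create`, `shutdown`, and `reset` methods mirror those of `virDomain` in the libvirt Python bindings, while the action names and the `FakeDomain` stand-in are hypothetical. The code does not import libvirt, so it can run without a hypervisor.

```python
from typing import Protocol


class Domain(Protocol):
    """The subset of libvirt's virDomain methods used below."""

    def create(self) -> int: ...
    def shutdown(self) -> int: ...
    def reset(self, flags: int = 0) -> int: ...


def handle_power_request(domain: Domain, action: str) -> str:
    """Translate a parsed power request into a virtualization-management call."""
    if action == "on":
        domain.create()      # start a defined but inactive domain
    elif action == "off":
        domain.shutdown()    # request a graceful guest shutdown
    elif action == "reset":
        domain.reset(0)      # hard reset, like pressing a reset button
    else:
        raise ValueError(f"unsupported power action: {action!r}")
    return action


class FakeDomain:
    """Stand-in that records calls instead of talking to a hypervisor."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def create(self) -> int:
        self.calls.append("create")
        return 0

    def shutdown(self) -> int:
        self.calls.append("shutdown")
        return 0

    def reset(self, flags: int = 0) -> int:
        self.calls.append("reset")
        return 0


fake = FakeDomain()
for requested in ("on", "reset", "off"):
    handle_power_request(fake, requested)
```

With the real bindings installed, the domain object would come from something like `libvirt.open("qemu:///system").lookupByName("vm1")`, where the domain name `vm1` is again only an example.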
(2) Installation request processing
When an installation request sent by the user side over the HTTPS protocol is received together with an uploaded ISO image, the OpenBMC service parses the installation request, starts UEFI (Unified Extensible Firmware Interface) firmware through the first virtual machine to load the uploaded ISO image, and installs the second operating system. UEFI firmware is software on a computer motherboard that runs first when the computer starts, and is responsible for initializing hardware, detecting hardware functions, and booting the operating system.
(3) Hardware driver binding
IOMMU (Input-Output Memory Management Unit) support is added to the kernel boot parameters of the host. A hardware device is unbound from its original kernel driver, bound to the VFIO-Platform driver, and then provided to the first virtual machine, so that the first virtual machine can directly access the physical hardware.
The IOMMU is used for managing and translating the access of peripherals (such as a network adapter, a graphic card, a storage device and the like) to the system memory, realizing the virtualization and conversion of physical addresses and providing address space isolation and memory protection for direct memory access, thereby enhancing the security, stability and performance of the system.
The OpenBMC service designed in this embodiment obtains the hardware state and, at the same time, the real-time resource usage of the user system, and thus provides both in-band and out-of-band monitoring.
(1) Out-of-band monitoring
The OpenBMC service communicates with the hardware in the single-chip server through a preset interface to read raw data for out-of-band monitoring, processes the raw data, and provides the processed data to the user side for display or alarm, wherein the raw data comprises one or more of temperature sensor readings, voltage monitor readings, and fan status.
For example, interfaces such as I2C, GPIO, and IPMI are used to communicate directly with the hardware; raw data such as temperature sensor readings, voltage monitor readings, and fan status is read, processed, and then used for display or alarm.
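The "process, then display or alarm" step can be sketched for a temperature reading. The example assumes the raw value arrives as a string in millidegrees Celsius, which is how the Linux hwmon sysfs interface reports `temp*_input`; the warning and critical thresholds are hypothetical.

```python
def classify_temperature(raw_millidegrees: str,
                         warn_c: float = 75.0,
                         crit_c: float = 90.0) -> tuple[float, str]:
    """Convert a raw millidegree reading to Celsius and assign an alarm level."""
    celsius = int(raw_millidegrees.strip()) / 1000.0
    if celsius >= crit_c:
        status = "critical"
    elif celsius >= warn_c:
        status = "warning"
    else:
        status = "ok"
    return celsius, status


# Raw strings as they might be read from a sensor file (trailing newline included).
readings = [classify_temperature(raw) for raw in ("45000\n", "80500\n", "95000\n")]
```

Voltage and fan readings could be handled the same way with their own units and thresholds.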
(2) In-band monitoring
The first virtual machine communicates with the host in bridged mode, so that the OpenBMC service periodically collects resource usage data of the first virtual machine for in-band monitoring, analyzes and processes the data, and provides the result to the user side for display.
For example, the virtual machine VM1 running the user operating system communicates with the host in bridged mode; the resource usage data of VM1 is collected periodically, analyzed, and provided to the web front end for display, so that the user can view it conveniently.
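One way to turn periodically collected data into a displayable figure is to compute CPU utilization from two samples of cumulative CPU time. The sketch assumes each sample carries a wall-clock timestamp and the virtual machine's cumulative CPU time, both in nanoseconds; the `Sample` type and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    wall_ns: int      # monotonic timestamp of the sample, in nanoseconds
    cpu_time_ns: int  # cumulative CPU time consumed by the VM, in nanoseconds


def cpu_utilization(prev: Sample, curr: Sample, vcpus: int) -> float:
    """Return the average CPU utilization in percent between two samples."""
    wall = curr.wall_ns - prev.wall_ns
    if wall <= 0 or vcpus <= 0:
        raise ValueError("samples must be in time order and vcpus must be positive")
    busy = curr.cpu_time_ns - prev.cpu_time_ns
    # Normalise by the CPU time available on all vCPUs, then clamp to 0..100.
    return max(0.0, min(100.0, 100.0 * busy / (wall * vcpus)))


# Two samples taken one second apart on a VM with two vCPUs.
usage = cpu_utilization(Sample(0, 0), Sample(1_000_000_000, 500_000_000), vcpus=2)
```

Here the VM consumed 0.5 s of CPU time out of 2 s available across both vCPUs, giving 25 percent.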
The single-chip server design method in the embodiments of the present invention has been described in detail above with reference to Fig. 1 and Fig. 2. An embodiment of the present invention further provides an SOC chip designed with the single-chip server design method of any one of the above embodiments, with the following main design points:
(1) The SOC chip features high security, thorough data fault tolerance, NAND flash management support, RAID and EC protection, high capacity with flexible expansion, and a rich set of interfaces and protocols.
(2) A first operating system is installed on the SOC chip and serves as the host.
The kernel of the host starts the OpenBMC service and keeps it running, the OpenBMC service responds to management and monitoring requests from the user side, and the OpenBMC service uses the gigabit network card provided by the SOC chip as the management network card.
(3) A first virtual machine and a second virtual machine are started in the kernel of the host.
The first virtual machine runs an image of the second operating system installed by the user for user-side operation. The second virtual machine runs a firmware program to manage the flash memory and provide a block device service. The block device of the second virtual machine is mapped to the first virtual machine as its system disk, and the second virtual machine manages the flash memory with a RAID redundancy mechanism based on the RAID engine of the SOC chip.
The embodiment of the invention also provides a server, which comprises the SOC chip described in the embodiment, and is designed by adopting the single-chip server design method described in any one of the embodiments.
The server designed in this embodiment saves the hardware cost of a BMC chip, a RAID card, a BMC management network card, and the like, solves the chip compatibility problem of the server BMC service, and markedly increases the degree of localization. Compared with a traditional BMC service, it also covers both out-of-band and in-band monitoring content, which improves monitoring completeness and provides a reference for more efficient and flexible resource configuration and management strategies in the industry.
Compared with existing servers, the server design provided by this embodiment has a significant cost advantage and is well suited to scenarios that are highly cost-sensitive, such as cold data storage nodes. The chip compatibility of the BMC service is greatly improved, and the BMC service is no longer restricted to foreign special-purpose chips.
It will be apparent to those skilled in the art that the integrated modules or units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
While the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that the foregoing embodiments may be modified or equivalents may be substituted for some of the features thereof, and that the modifications or substitutions do not depart from the spirit and scope of the embodiments of the invention.