[go: up one dir, main page]

CN110109873B - File management method for message queue - Google Patents

File management method for message queue Download PDF

Info

Publication number
CN110109873B
CN110109873B CN201910381124.1A CN201910381124A CN110109873B CN 110109873 B CN110109873 B CN 110109873B CN 201910381124 A CN201910381124 A CN 201910381124A CN 110109873 B CN110109873 B CN 110109873B
Authority
CN
China
Prior art keywords
message
file
index
metadata
messages
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910381124.1A
Other languages
Chinese (zh)
Other versions
CN110109873A (en
Inventor
陈咸彰
罗威
沙行勉
曾孝平
诸葛晴凤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN201910381124.1A priority Critical patent/CN110109873B/en
Publication of CN110109873A publication Critical patent/CN110109873A/en
Application granted granted Critical
Publication of CN110109873B publication Critical patent/CN110109873B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/13File access structures, e.g. distributed indices
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本发明公开了一种用于消息队列的文件管理方法,其特征在于:采用新型非易失存储设备,并在存储模块中设置有消息文件和元数据文件且建立了两个键值存储结构用于记录每个消息文件包含的消息数和消息文件大小,所述消息文件和元数据文件一一对应且分开保存,其中消息文件用于存放消息实体数据,元数据文件用于存放每条消息的描述信息,包括消息大小、在消息文件内的偏移,消息生成时间。其效果是:使用NVM做为消息的持久化存储介质,可通过进程的虚拟地址访问消息数据,规避复杂的I/O软件栈,可通过访问消息编号随机访问消息数据,建立了基于消息生产时间的多精确度的消息分层索引结构,可快速回溯消息。

Figure 201910381124

The invention discloses a file management method for a message queue, which is characterized in that: a new type of non-volatile storage device is adopted, and a message file and a metadata file are set in the storage module, and two key-value storage structures are established. To record the number of messages contained in each message file and the size of the message file, the message file and the metadata file are stored in a one-to-one correspondence and stored separately, wherein the message file is used to store the message entity data, and the metadata file is used to store the information of each message Description information, including message size, offset in the message file, and message generation time. The effect is: using NVM as a persistent storage medium for messages, the message data can be accessed through the virtual address of the process, avoiding complex I/O software stacks, and the message data can be randomly accessed by accessing the message number, and a message production time-based The multi-precision message hierarchical index structure can quickly trace back the message.

Figure 201910381124

Description

一种用于消息队列的文件管理方法A file management method for message queue

技术领域technical field

本发明涉及计算机数据存储技术,更具体地说,是一种用于消息队列的文件管理方法。The invention relates to computer data storage technology, more specifically, a file management method for message queues.

背景技术Background technique

消息队列,也称作消息中间件,是消息传递过程中保存消息的容器,是分布式系统中的重要组件,主要解决应用解耦,异步消费等问题。通常情况下,消息队列采用发布、订阅模式,提供统一的接口帮助客户端和服务器解耦,提供消息持久化机制,保证消息能够正确接收,在突发情况下不会丢失,从而使服务器能够应对大吞吐量的消息处理场景。消息队列在移动互联网、电子商务等领域有广泛的应用。Message queue, also known as message middleware, is a container for storing messages during message delivery and an important component in a distributed system. It mainly solves problems such as application decoupling and asynchronous consumption. Under normal circumstances, the message queue adopts the publish and subscribe mode, provides a unified interface to help the client and the server decouple, provides a message persistence mechanism, ensures that the message can be received correctly, and will not be lost in an emergency, so that the server can cope with High-throughput message processing scenarios. Message queues are widely used in mobile Internet, e-commerce and other fields.

现有的消息队列,大都将消息存放在内存或者基于磁盘等块设备的文件系统中。存放在内存中,可以实现对消息的实时操作,速度块,但是无法持久化保存,断电后数据会丢失。将消息保存在基于磁盘、SSD等块设备的文件系统中可以持久化保存消息。但是访问基于块设备的文件系统需要经过复杂的I/O软件栈,这大大降低了消息队列的性能。Most of the existing message queues store messages in memory or in file systems based on block devices such as disks. Stored in memory, real-time operations on messages can be realized, speed block, but it cannot be stored persistently, and the data will be lost after power failure. Saving messages in file systems based on block devices such as disks and SSDs can persist messages. However, accessing a file system based on a block device needs to go through a complex I/O software stack, which greatly reduces the performance of the message queue.

新型的非易失性存储器(non-volatile memory,NVM)具有接近DRAM(动态随机存储器)的读写速度,可直接挂载在内存总线上,高密度,低能耗,断电不丢失、可字节寻址。新型的非易失性存储器主要包括相变存储器(PCM)、阻变存储器(RRAM)、忆阻器(Memristor)等等。NVM模拟成磁盘设备,用现有的磁盘文件系统初始化,无法发挥NVM的优势。因为操作系统是通过I/O总线连接块设备的,访问基于块设备的文件系统需要经过复杂的I/O软件栈。如图1所示的Linux文件系统的I/O软件栈,首先是虚拟文件系统(Virtual FileSystem,简称VFS)。接下来是文件系统的内部操作,如数据块查找等。然后会发出访问块设备的I/O请求,之后经过通用块层、I/O调度层和块设备驱动层等面向设备的软件层。使用传统的I/O软件栈访问数据会产生极大的开销,以读取文件数据为例,首先通过DMA将文件数据从块设备拷贝到内核I/O缓冲区,一般还需要拷贝至页高速缓存,最后拷贝到用户态缓冲区。在这个过程中还需要进行进程上下文切换,占用CPU资源。The new type of non-volatile memory (non-volatile memory, NVM) has a reading and writing speed close to that of DRAM (Dynamic Random Access Memory), can be directly mounted on the memory bus, has high density, low energy consumption, will not be lost when power is off, and can be written section addressing. The new type of non-volatile memory mainly includes phase change memory (PCM), resistive memory (RRAM), memristor (Memristor) and so on. NVM is simulated as a disk device and initialized with the existing disk file system, which cannot take advantage of NVM. Because the operating system is connected to the block device through the I/O bus, accessing the file system based on the block device needs to go through a complex I/O software stack. The I/O software stack of the Linux file system shown in Figure 1 is first a virtual file system (Virtual File System, referred to as VFS). Next is the internal operation of the file system, such as data block lookup, etc. Then an I/O request to access the block device will be issued, and then go through device-oriented software layers such as the general block layer, the I/O scheduling layer, and the block device driver layer. Using the traditional I/O software stack to access data will incur a huge overhead. Take reading file data as an example. First, the file data is copied from the block device to the kernel I/O buffer through DMA. Generally, it needs to be copied to the page high-speed Cache, and finally copy to the user mode buffer. In this process, process context switching is also required, which takes up CPU resources.

发明内容Contents of the invention

针对现有技术中存在的问题,本发明提出一种用于消息队列的文件管理方法,基于新型非易失存储设备,通过访问进程的虚拟地址的方式访问文件数据,规避复杂的I/O软件栈和进程的上下文切换。Aiming at the problems existing in the prior art, the present invention proposes a file management method for message queues, based on a new type of non-volatile storage device, accessing file data by accessing the virtual address of the process, avoiding complex I/O software Context switching for stacks and processes.

为实现上述目的,本发明所采用的具体技术方案如下:In order to achieve the above object, the concrete technical scheme adopted in the present invention is as follows:

一种用于消息队列的文件管理方法,其关键在于:采用新型非易失存储设备,并在存储模块中设置有消息文件和元数据文件且建立了两个键值存储结构用于记录每个消息文件包含的消息数和消息文件大小,所述消息文件和元数据文件一一对应且分开保存,其中消息文件用于存放消息实体数据,元数据文件用于存放每条消息的描述信息,包括消息大小、在消息文件内的偏移,消息生成时间。A file management method for message queues, the key of which is: a new type of non-volatile storage device is adopted, a message file and a metadata file are set in the storage module, and two key-value storage structures are established for recording each The number of messages contained in the message file and the size of the message file, the message file and the metadata file are stored in a one-to-one correspondence and stored separately, wherein the message file is used to store the message entity data, and the metadata file is used to store the description information of each message, including Message size, offset within the message file, message generation time.

可选地,系统采用共享内存保存消息,通过进程的虚拟地址实现消息的写入和读取。Optionally, the system uses shared memory to store messages, and writes and reads messages through virtual addresses of processes.

可选地,在消息读取过程中,先读取所述元数据文件,并从所述元数据文件中获取该消息的大小以及该消息在消息文件内的偏移,然后再读取对应的消息文件。Optionally, during the message reading process, the metadata file is read first, and the size of the message and the offset of the message in the message file are obtained from the metadata file, and then the corresponding message file.

可选地,在消息写入过程中,包括写入消息文件、写入元数据文件以及计算并更新消息数量和消息文件大小键值存储结构三个步骤。Optionally, the message writing process includes three steps: writing the message file, writing the metadata file, and calculating and updating the number of messages and the key-value storage structure of the message file size.

可选地,同一个主题分区同时只运行一个生产者进程进行写入操作,如果存在两个生产者进程都在生成某一个主题的消息时,则为其分配不同的分区。Optionally, only one producer process runs on the same topic partition at the same time for writing operations. If there are two producer processes that are generating messages of a certain topic, different partitions are assigned to them.

可选地,更新键值存储结构中消息文件大小、消息文件内消息数量是后台线程通过滑动检测点检测元数据文件中的元数据项计算出的。Optionally, updating the size of the message file in the key-value storage structure and the number of messages in the message file are calculated by the background thread detecting the metadata items in the metadata file by sliding the detection point.

可选地,在消息文件中,消息是按照生成时间顺序依次存放的。Optionally, in the message file, messages are stored sequentially in order of generation time.

可选地,在消息查询时采用基于生成时间的消息快速索引结构,系统建立了时、分、秒三层的树状索引结构,每层索引的每个索引节点都是一段连续的空间,每个索引节点由若干个大小固定的索引项组成,可以随机访问每个索引项,其中:Optionally, a message fast index structure based on generation time is used in message query. The system establishes a three-layer tree index structure of hours, minutes, and seconds. Each index node of each layer of index is a continuous space. An index node consists of several fixed-size index items, and each index item can be accessed randomly, where:

第一层基于整时索引,有一个索引节点,每个索引节点由24个索引项组成,每个索引项由文件索引号、消息文件内消息编号、下一层索引节点的起始地址组成;The first layer is based on the full-time index, and has an index node, each index node is composed of 24 index items, and each index item is composed of the file index number, the message number in the message file, and the starting address of the next layer index node;

第二层基于分钟索引,有24个索引节点,每个索引节点由60个索引项组成,每个索引项由文件索引号、消息文件内消息编号、下一层索引节点的起始地址组成;The second layer is based on the minute index, with 24 index nodes, each index node is composed of 60 index items, and each index item is composed of the file index number, the message number in the message file, and the starting address of the next layer index node;

第三层基于秒级索引,有24*60个索引节点,每个索引节点由60个索引项组成,每个索引项由文件索引号和消息文件内消息编号组成。The third layer is based on the second-level index, with 24*60 index nodes, each index node is composed of 60 index items, and each index item is composed of the file index number and the message number in the message file.

本发明的显著效果是:Notable effect of the present invention is:

使用NVM做为消息的持久化存储介质,可通过进程的虚拟地址访问消息数据,规避复杂的I/O软件栈,可通过访问消息编号随机访问消息数据,建立了基于消息生产时间的多精确度的消息分层索引结构,可快速回溯消息。Using NVM as the persistent storage medium for messages, the message data can be accessed through the virtual address of the process, avoiding complex I/O software stacks, and the message data can be randomly accessed by accessing the message number, establishing multi-accuracy based on message production time The hierarchical index structure of messages can quickly trace back messages.

附图说明Description of drawings

下面将结合附图及实施例对本发明作进一步说明,附图中:The present invention will be further described below in conjunction with accompanying drawing and embodiment, in the accompanying drawing:

图1为现有Linux系统中的I/O软件栈示意图;Fig. 1 is the I/O software stack schematic diagram in the existing Linux system;

图2为本发明具体实施例中的消息存储模块逻辑结构;Fig. 2 is the logical structure of the message storage module in the specific embodiment of the present invention;

图3为本发明具体实施例中的共享内存原理框图;Fig. 3 is a functional block diagram of shared memory in a specific embodiment of the present invention;

图4为本发明具体实施例中的消息文件与元数据文件关系图;Fig. 4 is a relational diagram of a message file and a metadata file in a specific embodiment of the present invention;

图5为本发明具体实施例中的消息写操作原理框图;Fig. 5 is a functional block diagram of a message writing operation in a specific embodiment of the present invention;

图6为本发明具体实施例中的滑动检测原理图;Fig. 6 is a schematic diagram of sliding detection in a specific embodiment of the present invention;

图7为本发明具体实施例中的基于生产时间的消息索引架构图。Fig. 7 is a structural diagram of a message index based on production time in a specific embodiment of the present invention.

具体实施方式Detailed ways

为了使本发明要解决的技术问题、技术方案和优点更加清楚,下面将结合附图及具体实施例进行详细描述,应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the technical problems, technical solutions and advantages to be solved by the present invention clearer, the following will be described in detail in conjunction with the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, and are not intended to limit the invention.

如图2所示,本发明提出的一种用于消息队列的文件管理方法,用于基于主题Topic的消息队列的消息的持久化存储,Topic是一个抽象概念,即某一类消息的种类,可以横向扩展成分区Partition(下文简称为P),每个Partition包含多个用来存放消息的文件。本发明采用新型非易失存储设备,以文件的形式持久化保存消息,将消息数据和消息描述信息(本文后面称为消息元数据)分开保存,通过查看消息元数据信息查找消息数据。存储模块中有消息文件和元数据文件两种文件,消息文件中存放着消息实体数据,元数据文件中存放每条消息的描述信息,包括消息大小、消息在消息文件内偏移、消息生产时间等。每个消息文件和元数据文件的大小固定,可自行配置,通常情况下,元数据文件小于消息文件。具体实施时,消息文件的命名方式为Topic_P-date-index.log,其中Topic_P为TopicPartition名,date为当前日期,index为文件id,从0开始递增,代表该Partition当天的第几个文件,消息文件存放在log/topic目录下。与消息文件对应,存在着同名的元数据文件,元数据文件的后缀为.meta,存放在meta/topic目录下。存储模块中建立了两个键值存储结构记录每个消息文件包含的消息数和消息文件大小,分别以消息文件名Topic_P-date-index为key,以消息文件中消息数量counts为value;以Topic_P-date-index为key,以消息文件大小size为value。As shown in Fig. 2, a kind of file management method for message queue proposed by the present invention is used for the persistent storage of the message of the message queue based on topic Topic, and Topic is an abstract concept, that is, the type of a certain type of message, It can be horizontally expanded into partitions (hereinafter referred to as P), and each Partition contains multiple files for storing messages. The present invention adopts a new type of non-volatile storage device to persistently save messages in the form of files, separately saves message data and message description information (referred to as message metadata hereinafter), and searches for message data by checking message metadata information. There are two types of files in the storage module: message file and metadata file. The message file stores the message entity data, and the metadata file stores the description information of each message, including message size, message offset in the message file, and message production time. wait. The size of each message file and metadata file is fixed and configurable. Usually, the metadata file is smaller than the message file. During specific implementation, the naming method of the message file is Topic_P-date-index.log, where Topic_P is the name of the TopicPartition, date is the current date, index is the file id, and increments from 0, representing the first file of the Partition on the day, the message The files are stored in the log/topic directory. Corresponding to the message file, there is a metadata file with the same name. The suffix of the metadata file is .meta, and it is stored in the meta/topic directory. Two key-value storage structures are established in the storage module to record the number of messages contained in each message file and the size of the message file. The message file name Topic_P-date-index is used as the key, and the counts of messages in the message file are used as the value; Topic_P -date-index is the key, and the message file size is the value.

在本系统中,消息文件需要被写入进程写入,被读取进程读取,这两个进程通常情况下还是一起工作的。Linux系统提供了共享内存(shm)的机制,所谓共享内存,就是允许把同一块内存分别映射在不同进程的地址空间,如图3所示,进程1可以即时看到进程2对共享内存中数据的更新,反之亦然。本系统将NVM物理空间作为共享NVM内存池,每一个消息文件和元数据文件本质上都是一段共享NVM内存。In this system, the message file needs to be written by the writing process and read by the reading process. These two processes usually work together. The Linux system provides a shared memory (shm) mechanism. The so-called shared memory allows the same piece of memory to be mapped in the address spaces of different processes. As shown in Figure 3, process 1 can instantly see the data in the shared memory of process 2. updates, and vice versa. This system uses NVM physical space as a shared NVM memory pool, and each message file and metadata file is essentially a shared NVM memory.

共享内存的具体实施是主要包括以下4个系统调用函数:The specific implementation of shared memory mainly includes the following four system call functions:

①创建共享内存int shmget(key_t key,size_t size,int shmflg);①Create shared memory int shmget(key_t key, size_t size, int shmflg);

第一个参数是共享内存段的命名;The first parameter is the name of the shared memory segment;

第二个参数是需要创建的内存容量;The second parameter is the memory capacity to be created;

第三个是权限标志;The third is the permission flag;

shmget函数成功时返回一个与key相关的共享内存标识符shm_id。When the shmget function succeeds, it returns a shared memory identifier shm_id associated with the key.

②将共享内存映射在进程的地址空间void*shmat(int shm_id,const void*shm_addr,int shmflg);② Map the shared memory in the address space of the process void*shmat(int shm_id, const void*shm_addr, int shmflg);

第一个参数是由shmget函数返回的共享内存标识;The first parameter is the shared memory identifier returned by the shmget function;

第二个参数是指定共享内存映射到当前进程中的地址,通常为空,表示让系统来选择共享内存的地址;The second parameter is to specify the address of the shared memory mapped to the current process, which is usually empty, which means let the system choose the address of the shared memory;

第三个参数,shm_flg是一组标志位,通常为0;The third parameter, shm_flg is a set of flag bits, usually 0;

调用成功后返回映射空间的首地址。Returns the first address of the mapped space after the call is successful.

③将共享内存总当前进程中分离int shmdt(const void*shmaddr);③Separate int shmdt(const void*shmaddr) from the current process of shared memory;

该函数用于将共享内存从当前进程中分离。注意,将共享内存分离并不是删除它,只是使该共享内存对当前进程不再可用。This function is used to detach shared memory from the current process. Note that detaching shared memory does not delete it, it just makes the shared memory no longer available to the current process.

④控制共享内存int shmctl(int shm_id,int command,struct shmid_ds*buf);例如更改共享内存的关联值,删除共享内存段等。④Control shared memory int shmctl(int shm_id, int command, struct shmid_ds*buf); for example, change the associated value of shared memory, delete shared memory segment, etc.

消息文件和元数据文件大小固定,当写满后需要创建新的文件时,分别对新的消息文件名和元数据名取hash值,并作为参数调用Linux共享内存函数shmget,获取到对应的共享内存标识符shm_id,当进程需要访问消息文件和元数据文件时调用shmat函数即可映射在进程的虚拟地址空间,通过访问返回的地址就可以完成数据访问。The size of the message file and the metadata file is fixed. When a new file needs to be created after the file is full, hash values are taken for the new message file name and metadata name respectively, and the Linux shared memory function shmget is called as a parameter to obtain the corresponding shared memory ID The character shm_id, when the process needs to access the message file and metadata file, the shmat function can be called to map in the virtual address space of the process, and the data access can be completed by accessing the returned address.

在消息读取过程中,先读取所述元数据文件,并从所述元数据文件中获取该消息的大小以及该消息在消息文件内的偏移,然后再读取对应的消息文件。In the message reading process, the metadata file is read first, and the size of the message and the offset of the message in the message file are obtained from the metadata file, and then the corresponding message file is read.

具体而言,可根据消息id快速查询并读取,消息id由该消息所在文件的文件名结合该消息在该文件内的序号共同组成,在每个消息文件内消息id均从0开始递增。消息文件与元数据文件是一一对应的,如图4所示,元数据文件保存着每条消息的元数据信息,每条消息的元数据项由该消息的生产时间、该消息在消息文件中的偏移、该消息的长度组成。元数据项大小固定,将元数据文件映射在进程的虚拟地址空间后,可在用户态随机访问如何一个元数据项。读取消息时,先读取对应的元数据项,获取该消息在消息文件中的偏移和消息大小,就可以访问真正的消息数据。Specifically, it can be quickly queried and read according to the message id. The message id is composed of the file name of the file where the message is located and the sequence number of the message in the file. The message id in each message file increases from 0. There is a one-to-one correspondence between the message file and the metadata file. As shown in Figure 4, the metadata file stores the metadata information of each message. The offset in , the length of the message. The size of the metadata item is fixed, and after the metadata file is mapped in the virtual address space of the process, any metadata item can be randomly accessed in the user mode. When reading a message, first read the corresponding metadata item, obtain the offset and message size of the message in the message file, and then you can access the real message data.

在消息写入过程中,包括写入消息文件、写入元数据文件以及计算并更新消息数量和消息文件大小键值存储结构三个步骤。In the message writing process, there are three steps including writing the message file, writing the metadata file, and calculating and updating the number of messages and the key-value storage structure of the message file size.

具体实施时,同一个Partition同时只允许一个生产者进程进行写入操作,如果存在两个生产者进程都在生产某一Topic的消息,则为其分配不同的Partition。现在以生产者进程生产Topic1_P0-#-#文件的消息3为例讲述消息写入操作,如图5所示。During specific implementation, only one producer process is allowed to write to the same Partition at the same time. If there are two producer processes that are both producing messages of a certain topic, different Partitions are assigned to them. Now take the message 3 of the Topic1_P0-#-# file produced by the producer process as an example to describe the message writing operation, as shown in FIG. 5 .

1)生成进程调用API生成消息,封装处理后以append方式写入消息文件。1) The generation process calls the API to generate a message, and writes the message file in append mode after encapsulation and processing.

2)消息文件写成功后会更新消息文件对应的元数据文件,以append方式在对应的元数据文件中写入消息3的元数据项。2) After the message file is successfully written, the metadata file corresponding to the message file will be updated, and the metadata item of message 3 will be written in the corresponding metadata file in the append mode.

3)后台线程通过检测元数据项,计算出该消息文件内的消息总数和该消息文件的大小,更新相关的键值存储结构。在中间任何一步失败则表示本次写入消息失败,回到第一步重新执行。3) The background thread calculates the total number of messages in the message file and the size of the message file by detecting metadata items, and updates the relevant key-value storage structure. Failure in any step in the middle means that the writing of the message failed this time, and it returns to the first step to re-execute.

更新键值存储结构中消息文件大小、消息文件内消息数量是后台线程通过滑动检测点检测元数据项计算出的,检测流程如图6所示。Updating the size of the message file in the key-value storage structure and the number of messages in the message file are calculated by the background thread through the sliding detection point to detect metadata items. The detection process is shown in Figure 6.

元数据文件创建后,元数据项的所有字段会被初始化为0。将元数据文件映射在进程的虚拟地址空间后,因为每一个元数据项大小固定,可以访问任何一个元数据项。消息元数据项中代表消息长度的字段size放在末尾,且用整形表示,CPU进行写入操作时是原子操作,所以只要size字段不为0,就代表着该消息的写入操作已经完成,该元数据项标识了一条新的消息,随后更改消息文件中消息数量和消息文件大小,同时将检测点移向下一个元数据项。After the metadata file is created, all fields of the metadata item will be initialized to 0. After the metadata file is mapped in the virtual address space of the process, any metadata item can be accessed because the size of each metadata item is fixed. The field size representing the length of the message in the message metadata item is placed at the end, and is represented by an integer. When the CPU performs a write operation, it is an atomic operation, so as long as the size field is not 0, it means that the write operation of the message has been completed. This metadata item identifies a new message, then changes the number of messages in the message file and the size of the message file while moving the detection point to the next metadata item.

持久化的消息队列系统同时扮演着消息存储的角色,为消息队列提供快速的消息查询功能是十分必要的,用户可能需要回溯过去某个时间段的消息,一条一条的查找,显然是不合理的。在消息存储模块中定位到给定Topic的消息文件并不难,难点在于如何快速定位到给定Topic在某个时间段生产的消息。The persistent message queue system plays the role of message storage at the same time. It is very necessary to provide a fast message query function for the message queue. Users may need to go back to the messages of a certain period of time in the past, and it is obviously unreasonable to search one by one. . It is not difficult to locate the message file of a given topic in the message storage module, but the difficulty lies in how to quickly locate the messages produced by a given topic in a certain period of time.

在本系统中,消息文件名包含了当前的日期,但如果是更细粒度的时间划分,如时、分、秒,则需要建立一种数据结构来辅助快速定位消息,本文按照时、分、秒建立三层索引。每条消息的元数据信息都记录了消息的生产时间,所以在运行过程中通过检测元数据项就可以建立好索引结构,索引结构的建立按Partition实施,如图7所示。In this system, the message file name contains the current date, but if it is a finer-grained time division, such as hours, minutes, and seconds, it is necessary to establish a data structure to assist in quickly locating messages. Create a three-tier index in seconds. The metadata information of each message records the production time of the message, so the index structure can be established by detecting the metadata items during operation, and the establishment of the index structure is implemented according to Partition, as shown in Figure 7.

在消息文件中,消息是按照生产时间顺序存放的,所以只需要记录整时间点生产消息的id,那么该id之后的消息均是在该时间点之后生产的。对于不同时间粒度的需求,本文按照时、分、秒建立了三层的树状索引。每层索引的每个索引节点都是一段连续的空间,每个索引节点由若干个大小固定的索引项组成,可以随机访问每个索引项。第一层索引是基于整时的索引,只有一个索引节点,由于一天分为24小时,所以该索引节点由24个索引项组成,每个索引项由文件index、消息文件内消息id、下一层索引节点的起始地址addr组成。因为每个Topic Partition每天产生的文件可能不止一个,所以本文通过递增的index来标明不同的文件,id标识了在当前文件内的消息id。索引项0就标识了从某个消息文件的第几条消息起,所有的消息均是0点整之后生产的,以此类推。因为一个小时又分为60分钟,所以每个索引项的addr字段指向了下一层代表这60分钟的索引节点。同理在分钟级的索引项中,索引项0就标识了从某个消息文件的第几条消息起,所有的消息均是0分整之后生产的。一分钟又包括了60秒,所以第二层索引中每个索引项的addr字段又指向了第三层秒级索引节点。秒级索引节点的索引项只有index和id,标识了该id之后的所有消息均是这一秒整之后生产的。In the message file, messages are stored in the order of production time, so you only need to record the id of the production message at the entire time point, then the messages after the id are all produced after that time point. For the requirements of different time granularities, this paper establishes a three-layer tree index according to hours, minutes, and seconds. Each index node of each index layer is a continuous space, and each index node is composed of several fixed-size index items, and each index item can be accessed randomly. The first layer index is based on the whole time index, and there is only one index node. Since a day is divided into 24 hours, the index node is composed of 24 index items. Each index item consists of the file index, the message id in the message file, the next The starting address of the layer index node is composed of addr. Because each Topic Partition may generate more than one file per day, this paper uses an incremental index to indicate different files, and id identifies the message id in the current file. The index item 0 indicates that from the first message of a certain message file, all messages are produced after 0 o'clock, and so on. Because an hour is divided into 60 minutes, the addr field of each index entry points to the index node of the next layer representing the 60 minutes. Similarly, in the minute-level index items, the index item 0 identifies the number of messages from a certain message file, and all messages are produced after 0 minutes. One minute includes 60 seconds, so the addr field of each index item in the second-level index points to the third-level second-level index node. The index items of the second-level index node are only index and id, and all messages after the id are identified are produced after this second.

通过上述的三层索引结构,可以快速定位到每秒生产的消息。比如想查询8点07分56秒之后的消息,可以先查找第一层下标为8的索引项,找到其指向的第二层索引节点,然后查找该节点下标为7的索引项,找到其指向第三层索引节点的起始地址,然后查询该节点下标为56的索引项,就可以找到满足要求的消息id,然后就可以读取对应的元数据项,最终获取到消息数据。Through the above-mentioned three-tier index structure, the messages produced per second can be quickly located. For example, if you want to query the news after 8:07:56, you can first search for the index item with the subscript 8 in the first layer, find the index node in the second layer pointed to by it, and then search for the index item with the subscript 7 for the node, and find It points to the starting address of the third-level index node, and then queries the index item whose subscript is 56 to find the message id that meets the requirements, then reads the corresponding metadata item, and finally obtains the message data.

综上所述,本发明使用NVM做为消息的持久化存储介质,可通过进程的虚拟地址访问消息数据,规避复杂的I/O软件栈,可通过访问消息id随机访问消息数据。建立了基于消息生产时间的多精确度的消息分层索引结构,可快速回溯消息。To sum up, the present invention uses NVM as a persistent storage medium for messages, can access message data through the virtual address of the process, avoids complex I/O software stacks, and can randomly access message data by accessing message ids. A multi-accuracy message hierarchical index structure based on message production time is established, which can quickly trace back messages.

最后要说明的是,在本文中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者装置不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括该要素的过程、方法、物品或者装置中还存在另外的相同要素。As a final note, as used herein, the term "comprises," "comprises," or any other variation thereof is intended to cover a non-exclusive inclusion such that a process, method, article, or apparatus that includes a set of elements includes not only those elements , but also includes other elements not expressly listed, or also includes elements inherent in such a process, method, article, or device. Without further limitations, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus comprising that element.

上述本发明实施例序号仅仅为了描述,不代表实施例的优劣,通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本发明各个实施例所述的方法。The serial numbers of the above-mentioned embodiments of the present invention are only for description, and do not represent the advantages and disadvantages of the embodiments. Through the description of the above implementation modes, those skilled in the art can clearly understand that the methods of the above-mentioned embodiments can be implemented with the help of software plus the necessary general-purpose hardware platform. Of course, it can also be implemented through hardware, but in many cases the former is a better implementation. Based on such an understanding, the essence of the technical solution of the present invention or the part that contributes to the prior art can be embodied in the form of software products, and the computer software products are stored in a storage medium (such as ROM/RAM, disk, CD) contains several instructions to make a terminal (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in various embodiments of the present invention.

上面结合附图对本发明的实施例进行了描述,但是本发明并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本发明的启示下,在不脱离本发明宗旨和权利要求所保护的范围情况下,还可做出很多形式,这些均属于本发明的保护之内。Embodiments of the present invention have been described above in conjunction with the accompanying drawings, but the present invention is not limited to the above-mentioned specific implementations, and the above-mentioned specific implementations are only illustrative, rather than restrictive, and those of ordinary skill in the art will Under the enlightenment of the present invention, many forms can also be made without departing from the gist of the present invention and the protection scope of the claims, and these all belong to the protection of the present invention.

Claims (6)

1.一种用于消息队列的文件管理方法,其特征在于:采用相变存储器、阻变存储器或忆阻器作为存储模块,并在存储模块中设置有消息文件和元数据文件且建立了两个键值存储结构用于记录每个消息文件包含的消息数和消息文件大小,所述消息文件和元数据文件一一对应且分开保存,其中消息文件用于存放消息实体数据,元数据文件用于存放每条消息的描述信息,包括消息大小、在消息文件内的偏移,消息生成时间;1. A file management method for message queue, characterized in that: adopt phase-change memory, resistive change memory or memristor as storage module, and be provided with message file and metadata file in storage module and set up two A key-value storage structure is used to record the number of messages contained in each message file and the size of the message file. The message file and the metadata file are stored in a one-to-one correspondence, wherein the message file is used to store message entity data, and the metadata file is used to store the message entity data. It is used to store the description information of each message, including message size, offset in the message file, and message generation time; 在消息文件中,消息是按照生成时间顺序依次存放的;In the message file, the messages are stored in order of generation time; 在消息查询时采用基于生成时间的消息快速索引结构,系统建立了时、分、秒三层的树状索引结构,每层索引的每个索引节点都是一段连续的空间,每个索引节点由若干个大小固定的索引项组成,可以随机访问每个索引项,其中:When querying messages, a fast message index structure based on generation time is adopted. The system establishes a three-layer tree index structure of hours, minutes, and seconds. Each index node of each layer of index is a continuous space, and each index node is composed of It consists of several fixed-size index items, and each index item can be accessed randomly, among which: 第一层基于整时索引,有一个索引节点,每个索引节点由24个索引项组成,每个索引项由文件索引号、消息文件内消息编号、下一层索引节点的起始地址组成;The first layer is based on the full-time index, and has an index node, each index node is composed of 24 index items, and each index item is composed of the file index number, the message number in the message file, and the starting address of the next layer index node; 第二层基于分钟索引,有24个索引节点,每个索引节点由60个索引项组成,每个索引项由文件索引号、消息文件内消息编号、下一层索引节点的起始地址组成;The second layer is based on the minute index, with 24 index nodes, each index node is composed of 60 index items, and each index item is composed of the file index number, the message number in the message file, and the starting address of the next layer index node; 第三层基于秒级索引,有60*24个索引节点,每个索引节点由60个索引项组成,每个索引项由文件索引号和消息文件内消息编号组成。The third layer is based on the second-level index, with 60*24 index nodes, each index node is composed of 60 index items, and each index item is composed of the file index number and the message number in the message file. 2.根据权利要求1所述的用于消息队列的文件管理方法,其特征在于:采用共享内存方式保存消息,利用进程的虚拟地址实现消息的写入和读取。2. The file management method for message queue according to claim 1, characterized in that: the message is stored in a shared memory mode, and the writing and reading of the message is realized by using the virtual address of the process. 3.根据权利要求1或2所述的用于消息队列的文件管理方法,其特征在于:在消息读取过程中,先读取所述元数据文件,并从所述元数据文件中获取该消息的大小以及该消息在消息文件内的偏移,然后再读取对应的消息文件。3. The file management method for message queue according to claim 1 or 2, characterized in that: in the message reading process, first read the metadata file, and obtain the metadata file from the metadata file The size of the message and the offset of the message in the message file, and then read the corresponding message file. 4.根据权利要求1或2所述的用于消息队列的文件管理方法,其特征在于:在消息写入过程中,包括写入消息文件、写入元数据文件以及计算并更新消息数量和消息文件大小键值存储结构三个步骤。4. The file management method for message queue according to claim 1 or 2, characterized in that: in the message writing process, including writing message files, writing metadata files and calculating and updating message quantity and message The file size key-value store structure has three steps. 5.根据权利要求4所述的用于消息队列的文件管理方法,其特征在于:同一个主题分区同时只运行一个生产者进程进行写入操作,如果存在两个生产者进程都在生产某一个主题的消息时,则为其分配不同的分区。5. The file management method for message queues according to claim 4, characterized in that: the same topic partition runs only one producer process at the same time for writing operations, if there are two producer processes that are producing a certain When the topic's messages are assigned different partitions. 6.根据权利要求4所述的用于消息队列的文件管理方法,其特征在于:更新键值存储结构中消息文件大小、消息文件内消息数量是后台线程通过滑动检测点检测元数据文件中的元数据项计算出的。6. the file management method that is used for message queue according to claim 4, it is characterized in that: update the message file size in the key-value storage structure, the number of messages in the message file is that the background thread detects in the metadata file by the sliding detection point Metadata items are computed.
CN201910381124.1A 2019-05-08 2019-05-08 File management method for message queue Active CN110109873B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910381124.1A CN110109873B (en) 2019-05-08 2019-05-08 File management method for message queue

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910381124.1A CN110109873B (en) 2019-05-08 2019-05-08 File management method for message queue

Publications (2)

Publication Number Publication Date
CN110109873A CN110109873A (en) 2019-08-09
CN110109873B true CN110109873B (en) 2023-04-07

Family

ID=67488894

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910381124.1A Active CN110109873B (en) 2019-05-08 2019-05-08 File management method for message queue

Country Status (1)

Country Link
CN (1) CN110109873B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704212B (en) * 2019-09-29 2022-04-22 广州荔支网络技术有限公司 Message processing method and device
CN111125019A (en) * 2019-12-20 2020-05-08 北京无线电测量研究所 File retrieval method, writing method, system, FPGA chip and device
CN111338834B (en) * 2020-02-21 2023-06-30 北京百度网讯科技有限公司 Data storage method and device
CN111597124B (en) * 2020-04-21 2023-05-05 重庆大学 Persistent memory file system data organization method, system and storage medium
CN113239307B (en) * 2021-05-17 2025-03-21 北京百度网讯科技有限公司 Method and device for storing message topics
CN115686876A (en) * 2021-07-26 2023-02-03 深圳市腾讯计算机系统有限公司 Method and device for callback of message processing position, electronic equipment and storage medium
CN114896215A (en) * 2022-04-22 2022-08-12 阿里巴巴(中国)有限公司 Metadata storage method and device
CN116455956B (en) * 2023-06-16 2023-08-15 中国人民解放军国防科技大学 A method and system for data collection and data playback based on message middleware
CN119377232B (en) * 2024-12-27 2025-05-06 神州灵云(北京)科技有限公司 A method for storing and querying massive conversation data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055591A (en) * 2007-05-25 2007-10-17 中兴通讯股份有限公司 Data access method for all-memory database
CN105005510A (en) * 2015-07-02 2015-10-28 西安交通大学 Error correction protection architecture and method applied to resistive random access memory cache of solid state disk
JP2015191361A (en) * 2014-03-27 2015-11-02 キヤノン株式会社 Information processing apparatus, information processing system, information processing apparatus control method, and computer program
CN106612306A (en) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 Data sharing method and device of virtual machine
CN107704604A (en) * 2017-10-16 2018-02-16 中汇信息技术(上海)有限公司 A kind of information persistence method, server and computer-readable recording medium
CN108132845A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 Message storage, delivering method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9996403B2 (en) * 2011-09-30 2018-06-12 Oracle International Corporation System and method for providing message queues for multinode applications in a middleware machine environment

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101055591A (en) * 2007-05-25 2007-10-17 中兴通讯股份有限公司 Data access method for all-memory database
JP2015191361A (en) * 2014-03-27 2015-11-02 キヤノン株式会社 Information processing apparatus, information processing system, information processing apparatus control method, and computer program
CN105005510A (en) * 2015-07-02 2015-10-28 西安交通大学 Error correction protection architecture and method applied to resistive random access memory cache of solid state disk
CN106612306A (en) * 2015-10-22 2017-05-03 中兴通讯股份有限公司 Data sharing method and device of virtual machine
CN108132845A (en) * 2016-12-01 2018-06-08 阿里巴巴集团控股有限公司 Message storage, delivering method and device and electronic equipment
CN107704604A (en) * 2017-10-16 2018-02-16 中汇信息技术(上海)有限公司 A kind of information persistence method, server and computer-readable recording medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
"Redis数据库在非易失性内存上的交互技术研究与实现";秦杰杰;《中国优秀硕士学位论文全文数据库》;20190415;全文 *
"UMFS: an efficient user-space file system for non-volatile memory";xianzhang chen等;《 Journal of Systems Architecture》;20180930;全文 *
"基于异构存储设备的分布式对象存储系统优化研究";吴林;《中国博士学位论文全文数据库》;20190415;正文第2.4.5节,第3.3.4节,第4.3.5节 *
"面向非易失性内存的文件系统与页面交换机制研究";陈咸彰;《中国博士学位论文全文数据库》;20180615;全文 *
"面向非易失性存储器的多表连接写操作的优化研究";马竹琳,李心池,诸葛晴凤,吴林,陈咸彰;《计算机学报》;20180925;全文 *

Also Published As

Publication number Publication date
CN110109873A (en) 2019-08-09

Similar Documents

Publication Publication Date Title
CN110109873B (en) File management method for message queue
US12210481B2 (en) File system data access method and file system
CN103020315B (en) A kind of mass small documents storage means based on master-salve distributed file system
CN114281762B (en) A log storage acceleration method, device, device and medium
CN107168657B (en) Virtual disk hierarchical cache design method based on distributed block storage
JP5996088B2 (en) Cryptographic hash database
US9043334B2 (en) Method and system for accessing files on a storage system
EP2735978A1 (en) Storage system and management method used for metadata of cluster file system
CN113448938A (en) Data processing method and device, electronic equipment and storage medium
CN106570113B (en) Mass vector slice data cloud storage method and system
CN106909651A (en) A kind of method for being write based on HDFS small documents and being read
CN111309258A (en) A kind of B+ tree access method, device and computer readable storage medium
CN109165321B (en) Consistent hash table construction method and system based on nonvolatile memory
CN109933564A (en) File system management method, device, terminal and medium for fast rollback based on linked list and N-ary tree structure
CN113704217A (en) Metadata and data organization architecture method in distributed persistent memory file system
WO2022262381A1 (en) Data compression method and apparatus
CN116991897A (en) A query method and device based on disk storage
CN108009029A (en) Method and system based on the data cached decoupling persistence of Ignite grids
CN110851407A (en) Data distributed storage system and method
CN102073690B (en) Method for constructing memory database supporting historical Key information
CN102567415A (en) Database control method and device
CN104516945A (en) Hadoop distributed file system metadata storage method based on relational data base
CN115203211A (en) A method and system for generating a unique hash sequence number
CN104133970A (en) Data space management method and device
US11586353B2 (en) Optimized access to high-speed storage device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant