
CN118819826A - Method, system, device and medium for quickly extracting multi-level compressed files - Google Patents

Method, system, device and medium for quickly extracting multi-level compressed files

Info

Publication number
CN118819826A
Authority
CN
China
Prior art keywords
decompression
level
thread
file set
file
Prior art date
Legal status
Pending
Application number
CN202410796499.5A
Other languages
Chinese (zh)
Inventor
肖红飞
万振华
王颉
李华
董燕
Current Assignee
Yangzhou Shuan Technology Co ltd
Seczone Technology Co Ltd
Original Assignee
Yangzhou Shuan Technology Co ltd
Seczone Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Yangzhou Shuan Technology Co ltd and Seczone Technology Co Ltd
Priority to CN202410796499.5A
Publication of CN118819826A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of data decompression and discloses a method, system, device and medium for rapidly extracting multi-level compressed files. The method comprises: initializing a blocking queue, a decompression producer thread and decompression consumer threads; traversing and splitting the compressed files to be decompressed to obtain a main-level compressed file set and a main-level decompressed file set; performing layer-by-layer multi-threaded decompression on the main-level compressed file set using the decompression producer thread, the decompression consumer threads and the blocking queue to obtain a sub-level decompression result; and iteratively enqueuing and decompressing tasks in the blocking queue according to the main-level decompressed file set and the sub-level decompression result to obtain a standard decompression result. The blocking decompression queue, the producer-consumer threading model and the iterative multi-threaded task decompression make effective use of the computer's CPU, reduce memory overhead during decompression, and improve the decompression efficiency of multi-level compressed files.

Description

Method, system, device and medium for quickly extracting multi-level compressed files

Technical Field

The present invention relates to the technical field of data decompression, and in particular to a method, system, device and medium for quickly extracting multi-level compressed files.

Background Art

A multi-level compressed file is a file that has been compressed multiple times: one compressed file contains other compressed files, which in turn contain further compressed files. Multi-level compressed files are widely used in privacy-data encryption scenarios, and obtaining the data they hold requires extracting every level of the nested archives.

Traditional methods for decompressing multi-level compressed files rely mainly on recursive extraction. In practice, recursive extraction cannot make effective use of the computer's CPU, so decompression is slow. If the recursion depth is too large, the function call stack may exhaust the available memory and cause a stack overflow error. In addition, some environments limit the recursion depth; when that limit is exceeded, the program may crash or raise an exception. Together these problems make data extraction from multi-level compressed files inefficient.

Summary of the Invention

The present invention provides a method, system, device and medium for quickly extracting multi-level compressed files, whose main purpose is to solve the problem of low efficiency in data extraction from multi-level compressed files in the related art.

To achieve the above purpose, the present invention provides a method for quickly extracting multi-level compressed files, comprising: initializing a blocking queue, a decompression producer thread and decompression consumer threads; using the decompression producer thread to traverse and split the multi-level compressed files to be decompressed, obtaining a main-level compressed file set and a main-level decompressed file set; using the decompression producer thread to store the main-level compressed file set in the blocking queue, obtaining a decompression task queue; using the decompression consumer threads to perform layer-by-layer multi-threaded decompression on the decompression task queue, obtaining a sub-level decompression result; and iteratively enqueuing and decompressing tasks in the decompression task queue according to the main-level decompressed file set and the sub-level decompression result, obtaining a standard decompression result.

To solve the above problem, the present invention also provides a system for quickly extracting multi-level compressed files, comprising: a queue initialization module, used to initialize a blocking queue, a decompression producer thread and decompression consumer threads; a traversal and splitting module, used to traverse and split the multi-level compressed files to be decompressed with the decompression producer thread, obtaining a main-level compressed file set and a main-level decompressed file set; a task enqueuing module, used to store the main-level compressed file set in the blocking queue with the decompression producer thread, obtaining a decompression task queue; a layer-by-layer decompression module, used to perform layer-by-layer multi-threaded decompression on the decompression task queue with the decompression consumer threads, obtaining a sub-level decompression result; and an iterative decompression module, used to iteratively enqueue and decompress tasks in the decompression task queue according to the main-level decompressed file set and the sub-level decompression result, obtaining a standard decompression result.

To solve the above problem, the present invention further provides an electronic device, comprising:

at least one processor; and

a memory communicatively connected to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, enables the at least one processor to perform the above method for quickly extracting multi-level compressed files.

To solve the above problem, the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method for quickly extracting multi-level compressed files.

In the embodiments of the present invention, initializing a blocking queue, a decompression producer thread and decompression consumer threads enables multi-threaded, queue-based decompression tasks, improving the stability and security of multi-level decompression as well as overall decompression efficiency. Traversing and splitting the compressed files performs the primary decompression and screens out the sub-level compressed files for further decompression. Storing the main-level compressed file set in the blocking queue as a decompression task queue prevents dependency errors and thread waiting during decompression. Layer-by-layer multi-threaded decompression processes the multi-level compressed files concurrently, improving decompression efficiency and thread utilization. Iterative enqueue-and-decompress replaces the recursive decompression algorithm, improving performance and reducing memory overhead, while the blocking queue and concurrent threads make full use of the CPU. The method, system, device and medium proposed by the present invention can therefore solve the problem of low efficiency when extracting data from multi-level compressed files.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic flow chart of a method for rapidly extracting multi-level compressed files provided by an embodiment of the present invention;

FIG. 2 is a schematic flow chart of compressed-file splitting provided by an embodiment of the present invention;

FIG. 3 is a schematic flow chart of layer-by-layer multi-threaded decompression provided by an embodiment of the present invention;

FIG. 4 is a functional module diagram of a system for rapidly extracting multi-level compressed files provided by an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an electronic device for implementing the method for rapidly extracting multi-level compressed files provided by an embodiment of the present invention.

The realization of the purpose, functional features and advantages of the present invention will be further explained in conjunction with the embodiments and with reference to the accompanying drawings.

DETAILED DESCRIPTION

It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit it.

The embodiments of the present application provide a method for quickly extracting multi-level compressed files. The method may be executed by at least one electronic device, such as a server or a terminal, that can be configured to execute it; in other words, it can be executed by software or hardware installed on a terminal device or a server device. The server includes but is not limited to a single server, a server cluster, a cloud server or a cloud server cluster. It can be an independent server or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, content delivery networks (CDN), and big data and artificial intelligence platforms.

Referring to FIG. 1, which is a schematic flow chart of a method for quickly extracting multi-level compressed files provided by an embodiment of the present invention, the method in this embodiment includes:

S1. Initialize the blocking queue, the decompression producer thread and the decompression consumer threads.

Specifically, a blocking queue is a queue that supports two additional operations: when the queue is empty, a retrieval operation such as take blocks the calling thread until an element becomes available; when the queue is full, an insertion operation such as put blocks the calling thread until space becomes free.
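In Python's standard library, the blocking queue described above corresponds to `queue.Queue` with a `maxsize`; Python names the take operation `get()`. The minimal sketch below (the file names are placeholders) makes both blocking cases visible by passing a timeout, so the call raises `queue.Full` or `queue.Empty` instead of waiting indefinitely.

```python
import queue

# A bounded queue: with maxsize=2, put() blocks once two items are waiting.
q = queue.Queue(maxsize=2)
q.put("a.zip")
q.put("b.zip")

# The queue is full, so another put() would block; the timeout turns the
# wait into a queue.Full exception so the behaviour can be observed.
try:
    q.put("c.zip", timeout=0.1)
except queue.Full:
    print("put blocked: queue full")

# Drain the queue (take == get in Python).
print(q.get(), q.get())

# The queue is empty, so another get() would block until an item arrives.
try:
    q.get(timeout=0.1)
except queue.Empty:
    print("get blocked: queue empty")
```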

Specifically, the decompression producer thread is the producer thread (ProducerThread) used for decompressing multi-level compressed files. It generates or obtains the decompression tasks to be processed and puts them into the blocking queue; if the queue is full, the producer thread blocks until space becomes free.

Specifically, a decompression consumer thread is a consumer thread (ConsumerThread) used for decompressing multi-level compressed files. It takes decompression tasks out of the blocking queue and processes them; if the queue is empty, the consumer thread blocks until a decompression task becomes available.
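A minimal producer-consumer sketch using Python's `queue` and `threading` modules: one producer puts task names into a bounded queue and several consumers take them out. The `None` stop marker (one per consumer) is a common convention added here for illustration, not something the patent specifies, and the "processing" is a placeholder string.

```python
import queue
import threading

SENTINEL = None  # hypothetical "no more tasks" marker, one per consumer

def producer(tasks, q, n_consumers):
    for t in tasks:
        q.put(t)                # blocks while the queue is full
    for _ in range(n_consumers):
        q.put(SENTINEL)         # FIFO order: every real task is taken first

def consumer(q, results, lock):
    while True:
        task = q.get()          # blocks while the queue is empty
        if task is SENTINEL:
            break
        with lock:
            results.append(f"processed {task}")

def run(tasks, n_consumers=3, maxsize=4):
    q = queue.Queue(maxsize=maxsize)
    results, lock = [], threading.Lock()
    consumers = [threading.Thread(target=consumer, args=(q, results, lock))
                 for _ in range(n_consumers)]
    for c in consumers:
        c.start()
    p = threading.Thread(target=producer, args=(tasks, q, n_consumers))
    p.start()
    p.join()
    for c in consumers:
        c.join()
    return results

task_names = ["a.zip", "b.tar", "c.gz", "d.zip", "e.rar"]
results = run(task_names)
print(sorted(results))
```

Because each consumer stops at the first stop marker it receives, the number of markers must equal the number of consumers.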

In the embodiment of the present invention, initializing the blocking queue, the decompression producer thread and the decompression consumer threads comprises: initializing a task queue; initializing threads for the task queue according to the pre-acquired number of CPU cores, obtaining an initial thread pool; allocating threads from the thread pool, obtaining the decompression producer thread and the decompression consumer threads; and applying a blocking configuration to the task queue according to the producer and consumer threads, obtaining the blocking queue.

Specifically, the queue function can be used to initialize the task queue. The number of CPU cores is the number of independent processing units in the central processing unit; because each core can execute computing tasks independently, a multi-core CPU can process multiple tasks in parallel, improving overall computing performance.

Specifically, a thread pool size function can be used to initialize the threads for the task queue: the number of threads in the initial thread pool is set to the number of CPU cores minus one, so that most cores are utilized while one is left as a buffer for other system tasks.
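The patent does not identify a concrete "thread pool size function". As an assumption, the sketch below derives the pool size from `os.cpu_count()` (which may return `None`) and uses `concurrent.futures.ThreadPoolExecutor`; the `max(1, ...)` guard keeps at least one worker on a single-core machine. How much threads actually speed up decompression in CPython depends on how much of the work releases the GIL, so treat this as sizing logic only.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def consumer_pool_size() -> int:
    # os.cpu_count() can return None when the count is unknown; assume 1 then.
    cores = os.cpu_count() or 1
    # CPU cores minus one, but never fewer than one worker thread.
    return max(1, cores - 1)

size = consumer_pool_size()
with ThreadPoolExecutor(max_workers=size) as pool:
    # Placeholder work; real workers would decompress archives.
    squares = list(pool.map(lambda x: x * x, range(5)))
print(size, squares)
```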

Specifically, the join function of the threading library can be used to apply the blocking configuration to the task queue, yielding the blocking queue. Under this configuration, when the queue is empty the decompression consumer threads block until an element is available, and when the queue is full the decompression producer thread blocks until space becomes free.
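For readers implementing this in Python, note that the empty/full blocking behaviour is built into `queue.Queue` itself. The `join` methods serve a different purpose: `Queue.join()` blocks until every item that was `put()` has been acknowledged with `task_done()`, and `threading.Thread.join()` waits for a thread to finish. The sketch below (with placeholder work) shows how `task_done()` and `Queue.join()` let the producer side wait until all queued tasks are complete.

```python
import queue
import threading

q = queue.Queue(maxsize=8)
done = []

def worker():
    while True:
        item = q.get()          # blocks while the queue is empty
        try:
            done.append(item)   # stand-in for real decompression work
        finally:
            q.task_done()       # mark this item as fully processed

# Daemon threads end with the program, so they need no stop signal here.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()

for name in ["a.zip", "b.zip", "c.zip"]:
    q.put(name)                 # blocks while the queue is full

q.join()                        # returns once task_done() was called per item
print(sorted(done))
```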

In the embodiment of the present invention, initializing the blocking queue, the decompression producer thread and the decompression consumer threads enables multi-threaded, queue-based decompression tasks, improving the stability and security of multi-level decompression and the overall decompression efficiency.

S2. Use the decompression producer thread to traverse and split the multi-level compressed files to be decompressed, obtaining a main-level compressed file set and a main-level decompressed file set.

Specifically, a multi-level compressed file is a file that has been compressed multiple times: one compressed file contains other compressed files, which in turn contain further compressed files. Each main-level compressed file in the main-level compressed file set is a compressed file obtained from the first decompression of the multi-level compressed file, and each main-level decompressed file in the main-level decompressed file set is a non-compressed file obtained from that first decompression.

In the embodiment of the present invention, as shown in FIG. 2, using the decompression producer thread to traverse and split the multi-level compressed files to be decompressed, obtaining the main-level compressed file set and the main-level decompressed file set, includes the following steps:

S21. Use the decompression producer thread to scan the multi-level compressed files to be decompressed, obtaining a main-level file set;

S22. Select files from the main-level file set one by one as target files, and extract compression features from each target file to obtain its target compression feature;

S23. Perform compression attribute detection on the target compression feature to obtain a target detection result;

S24. Filter the main-level file set by compression according to all target detection results, obtaining the main-level compressed file set and the main-level decompressed file set.

Specifically, decompression libraries such as zipfile, tarfile or rarfile can be used for file scanning: the multi-level compressed file is opened, each file inside it is scanned, and all scanned files are collected into the main-level file set.
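A sketch of the file scan using the standard-library `zipfile` and `tarfile` modules (`rarfile` is a third-party package and is left out). The helper name `scan_archive` is hypothetical; it lists regular-file members without extracting them, and the archives are built in memory so the example runs on its own.

```python
import io
import tarfile
import zipfile

def scan_archive(fileobj, kind):
    """Return the regular-file member names of a zip or tar archive."""
    if kind == "zip":
        with zipfile.ZipFile(fileobj) as zf:
            return [info.filename for info in zf.infolist() if not info.is_dir()]
    if kind == "tar":
        # "r:*" lets tarfile detect gzip/bz2/xz compression transparently.
        with tarfile.open(fileobj=fileobj, mode="r:*") as tf:
            return [m.name for m in tf.getmembers() if m.isfile()]
    raise ValueError(f"unsupported archive kind: {kind}")

# A small zip built in memory; "inner/data.zip" holds placeholder bytes only.
zbuf = io.BytesIO()
with zipfile.ZipFile(zbuf, "w") as zf:
    zf.writestr("readme.txt", "hello")
    zf.writestr("inner/data.zip", b"PK\x03\x04")
zbuf.seek(0)
zip_names = scan_archive(zbuf, "zip")

# A small tar built in memory.
tbuf = io.BytesIO()
with tarfile.open(fileobj=tbuf, mode="w") as tf:
    payload = b"hello"
    info = tarfile.TarInfo("notes.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))
tbuf.seek(0)
tar_names = scan_archive(tbuf, "tar")

print(zip_names, tar_names)  # ['readme.txt', 'inner/data.zip'] ['notes.txt']
```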

Specifically, extracting compression features from the target file to obtain the target compression feature comprises: extracting the target file's suffix to obtain the target suffix; extracting the target file's header to obtain the target file header; extracting the magic number from the target file header to obtain the target magic number; and concatenating and fusing the target magic number and the target suffix to obtain the target compression feature.

Specifically, suffix extraction takes the suffix (extension) of the target file as the target suffix, for example txt, zip or 7z. File header extraction takes the header content of the target file as the target file header. The file header is the beginning of a file and contains information about its content, typically the file format, version, size, data encoding and other metadata.

Specifically, magic number extraction takes the magic number out of the target file header. A magic number is a specific byte sequence in the file header that identifies the file type; each file format usually has a distinctive magic number, so a system or application reading the file can quickly identify its type from it. For example, the magic number of GZ (gzip) files is 1F8B.
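The feature extraction in S22 can be sketched as follows, assuming the feature is simply the pair (suffix, header bytes as upper-case hex). The function name and the 8-byte header length are illustrative choices; the demo writes a real gzip file so the `1F8B` magic number can be observed.

```python
import gzip
import os
import tempfile

HEADER_BYTES = 8  # assumption: enough bytes for the signatures checked here

def extract_compression_feature(path):
    """Return (suffix, magic_hex) for a file, a stand-in for the target compression feature."""
    # Suffix: last extension without the dot, lower-cased ('' if none).
    suffix = os.path.splitext(path)[1].lstrip(".").lower()
    # File header: the first HEADER_BYTES bytes of the file.
    with open(path, "rb") as f:
        header = f.read(HEADER_BYTES)
    # Magic number rendered as upper-case hex for easy prefix comparison.
    return suffix, header.hex().upper()

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "logs.gz")
    with gzip.open(p, "wb") as g:
        g.write(b"hello")
    feature = extract_compression_feature(p)

print(feature[0], feature[1][:4])  # gz 1F8B
```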

Specifically, performing compression attribute detection on the target compression feature to obtain the target detection result comprises: determining whether the target compression feature contains a compression suffix; if so, taking the compression attribute as the target detection result; if not, determining whether the target compression feature contains a compression magic number; if so, taking the compression attribute as the target detection result; otherwise, taking the non-compression attribute as the target detection result.

Specifically, a compression suffix is the default suffix of a compressed file, such as rar, zip, tar and gz, and a compression magic number is the magic number in the header of a compressed file, such as 0x504B0304 (ZIP), 0x1F8B (GZ) and 0x526172211A0700 (RAR).
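The suffix-then-magic-number rule above can be written as a small classifier. The suffixes and magic numbers are the examples given in the text; the `(suffix, magic_hex)` input format is an assumption. Plain tar archives have no signature at offset 0 (the POSIX `ustar` marker sits at offset 257), so with a short header read they are recognized here by suffix only.

```python
COMPRESSED_SUFFIXES = {"rar", "zip", "tar", "gz"}
COMPRESSED_MAGICS = ("504B0304", "1F8B", "526172211A0700")  # ZIP, GZ, RAR

def detect_compression(feature):
    """Return 'compressed' or 'non-compressed' for a (suffix, magic_hex) pair."""
    suffix, magic_hex = feature
    if suffix.lower() in COMPRESSED_SUFFIXES:               # rule 1: suffix
        return "compressed"
    if magic_hex.upper().startswith(COMPRESSED_MAGICS):     # rule 2: magic number
        return "compressed"
    return "non-compressed"

print(detect_compression(("gz", "")))                   # compressed (suffix)
print(detect_compression(("txt", "504B030414000000")))  # compressed (ZIP magic)
print(detect_compression(("txt", "68656C6C6F")))        # non-compressed
```

Checking the magic number as a fallback catches archives whose extension has been changed, such as a ZIP saved as `.txt`.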

Specifically, the compression attribute is the attribute label of a compressed file and the non-compression attribute is the label of a non-compressed file. Compression filtering collects the files in the main-level file set whose detection result is the compression attribute into the main-level compressed file set, and the files whose detection result is the non-compression attribute into the main-level decompressed file set.
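Compression filtering is a partition of the main-level file set by detection result. In this sketch the detector is passed in as a function; the suffix-only lambda in the demo is a stand-in for the full suffix-and-magic-number check.

```python
from typing import Callable, Iterable, List, Tuple

def compression_filter(files: Iterable[str],
                       is_compressed: Callable[[str], bool]) -> Tuple[List[str], List[str]]:
    """Split a file set into (compressed set, decompressed set)."""
    compressed, decompressed = [], []
    for f in files:
        (compressed if is_compressed(f) else decompressed).append(f)
    return compressed, decompressed

# Stand-in detector for the demo: suffix check only.
by_suffix = lambda name: name.rsplit(".", 1)[-1].lower() in {"rar", "zip", "tar", "gz"}

main_compressed, main_decompressed = compression_filter(
    ["a.txt", "b.zip", "c/d.gz", "e.csv"], by_suffix)
print(main_compressed, main_decompressed)  # ['b.zip', 'c/d.gz'] ['a.txt', 'e.csv']
```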

In the embodiment of the present invention, traversing and splitting the compressed files achieves the primary decompression and screens out the sub-level compressed files, which facilitates further decompression.

S3. Use the decompression producer thread to store the main-level compressed file set in the blocking queue, obtaining a decompression task queue.

Specifically, each decompression task in the decompression task queue corresponds to a compressed file that needs further decompression; together, the tasks cover the remaining undecompressed compressed files at the main level of the multi-level compressed file.

In the embodiment of the present invention, using the decompression producer thread to store the main-level compressed file set in the blocking queue to obtain the decompression task queue comprises: using the decompression producer thread to initialize tasks for the main-level compressed file set, obtaining a decompression task set; and enqueuing the decompression task set into the blocking queue, obtaining the decompression task queue.

Specifically, task initialization sets up the decompression producer thread's enqueue tasks for the main-level compressed file set, including assigning an enqueue sequence number to each main-level compressed file. The enqueue operation can be implemented with the queue.put function.
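A sketch of task initialization and enqueuing, assuming each task is a small dict holding a sequence number and a path (the layout is hypothetical). Reusing one `itertools.count` across iterations would keep sequence numbers unique as deeper levels are enqueued.

```python
import itertools
import queue

def enqueue_tasks(q, compressed_files, counter):
    """Wrap each compressed file as a numbered task and put() it on the queue."""
    for path in compressed_files:
        task = {"seq": next(counter), "path": path}  # hypothetical task layout
        q.put(task)  # blocks if q is bounded and currently full

task_queue = queue.Queue(maxsize=16)
seq = itertools.count(1)
enqueue_tasks(task_queue, ["b.zip", "c/d.gz"], seq)
first = task_queue.get()
print(first)  # {'seq': 1, 'path': 'b.zip'}
```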

In the embodiment of the present invention, storing the main-level compressed file set in the blocking queue as a decompression task queue prevents dependency errors and thread waiting during decompression, improving decompression efficiency.

S4. Use the decompression consumer threads to perform layer-by-layer multi-threaded decompression on the decompression task queue, obtaining a sub-level decompression result.

Specifically, the sub-level decompression result consists of the files produced by the second decompression of the multi-level compressed file, i.e., the extracted next-level compressed files and non-compressed files.

In the embodiment of the present invention, as shown in FIG. 3, using the decompression consumer threads to perform layer-by-layer multi-threaded decompression on the decompression task queue to obtain the sub-level decompression result includes the following steps:

S31. Use the decompression consumer threads to select concurrent tasks from the decompression task queue, obtaining a target decompression task group;

S32. Use the decompression consumer threads to decompress the target decompression task group in parallel, obtaining a thread decompressed file set;

S33. Perform format verification on the thread decompressed file set, obtaining a verified decompressed file set;

S34. Update the results according to the verified decompressed file set, obtaining the sub-level decompression result.

Specifically, concurrent task selection means that each decompression consumer thread takes decompression tasks from the decompression task queue one at a time, together forming the target decompression task group. Multi-threaded decompression means that each consumer thread decompresses its task in the target decompression task group to produce thread decompressed files, all of which are collected into the thread decompressed file set. The decompression method can be tar, zip, rar or another format-specific method.
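A sketch of S31 and S32 under stated assumptions: consumer threads take `(seq, path)` tasks from a `queue.Queue` and extract each into its own directory with `shutil.unpack_archive`, which infers the format from the file extension (zip and the tar variants in the standard library; rar is not supported). The `None` stop signal and the `task_<seq>` directory names are illustrative, and error handling is omitted for brevity. Python's documentation warns against extracting archives from untrusted sources without prior inspection, which applies here too.

```python
import os
import queue
import shutil
import tempfile
import threading
import zipfile

def decompress_worker(task_queue, out_root, results, lock):
    """Consumer: take one task at a time and extract it into its own directory."""
    while True:
        task = task_queue.get()              # blocks while the queue is empty
        if task is None:                     # hypothetical stop signal
            break
        seq, path = task
        target = os.path.join(out_root, f"task_{seq}")
        shutil.unpack_archive(path, target)  # format inferred from extension
        with lock:
            results[seq] = sorted(os.listdir(target))

def decompress_all(tasks, out_root, n_threads=2):
    q, results, lock = queue.Queue(), {}, threading.Lock()
    threads = [threading.Thread(target=decompress_worker,
                                args=(q, out_root, results, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for task in tasks:
        q.put(task)
    for _ in threads:
        q.put(None)                          # one stop signal per worker
    for t in threads:
        t.join()
    return results

with tempfile.TemporaryDirectory() as d:
    tasks = []
    for i in (1, 2):
        p = os.path.join(d, f"part{i}.zip")
        with zipfile.ZipFile(p, "w") as zf:
            zf.writestr(f"file{i}.txt", "data")
        tasks.append((i, p))
    results = decompress_all(tasks, os.path.join(d, "out"))

print(dict(sorted(results.items())))  # {1: ['file1.txt'], 2: ['file2.txt']}
```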

Specifically, performing format verification on the thread decompressed file set to obtain the verified decompressed file set comprises: initializing directories for the thread decompressed file set, obtaining sub-level decompression directories; performing compression format processing on the thread decompressed file set, obtaining a format-checked file set; performing an integrity check on the format-checked file set, obtaining an integrity-checked file set; and placing the integrity-checked file set into the sub-level decompression directories, obtaining the verified decompressed file set.

Specifically, directory initialization creates a data directory for each thread decompressed file according to its position in the hierarchy of the multi-level compressed file; compression format processing verifies the format of the decompressed files; the integrity check verifies that each file was decompressed completely; and directory configuration stores each file of the integrity-checked set in its corresponding sub-level decompression directory.
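For ZIP archives, the format check, integrity check and directory placement of S33 can be approximated with `zipfile.is_zipfile`, `ZipFile.testzip()` (which reads every member and returns the name of the first one with a bad CRC or header, or `None`) and `extractall`. The helper name and the per-level directory layout `level_<n>` are assumptions; the demo corrupts one stored byte to show the check failing.

```python
import io
import os
import tempfile
import zipfile

def verify_and_place(zip_bytes, level, out_root):
    """Check a ZIP archive, then extract it into a per-level directory."""
    data = io.BytesIO(zip_bytes)
    if not zipfile.is_zipfile(data):            # compression format processing
        raise ValueError("not a ZIP archive")
    data.seek(0)
    with zipfile.ZipFile(data) as zf:
        bad = zf.testzip()                      # first corrupt member, or None
        if bad is not None:                     # integrity check
            raise ValueError(f"integrity check failed for {bad!r}")
        level_dir = os.path.join(out_root, f"level_{level}")
        os.makedirs(level_dir, exist_ok=True)   # directory initialization
        zf.extractall(level_dir)                # directory configuration
    return level_dir

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", "hello world")
good = buf.getvalue()
i = good.find(b"hello world")
bad = good[:i] + b"J" + good[i + 1:]            # corrupt one stored data byte

with tempfile.TemporaryDirectory() as d:
    placed = sorted(os.listdir(verify_and_place(good, 2, d)))
    try:
        verify_and_place(bad, 2, os.path.join(d, "other"))
        corrupt_detected = False
    except ValueError:
        corrupt_detected = True

print(placed, corrupt_detected)  # ['a.txt'] True
```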

Specifically, the result update generates a decompression log from the verified decompressed file set, updates the status of each decompression task in the target decompression task group, and takes the verified decompressed file set together with the decompression log as the sub-level decompression result.
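The result update in S34 is bookkeeping. A minimal sketch, with a hypothetical `SubLevelResult` container and string status values, records a status and one log line per task:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SubLevelResult:
    """Hypothetical container for the sub-level decompression result."""
    files: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

def update_results(task_status: Dict[int, str],
                   task_files: Dict[int, List[str]]) -> SubLevelResult:
    result = SubLevelResult()
    for seq, files in sorted(task_files.items()):
        task_status[seq] = "done"                 # task status update
        result.files.extend(files)                # verified decompressed files
        result.log.append(f"task {seq}: extracted {len(files)} file(s)")
    return result

status = {1: "running", 2: "running"}
res = update_results(status, {1: ["a.txt"], 2: ["b.zip", "c.csv"]})
print(status, res.log)
```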

In this embodiment of the present invention, layer-by-layer multi-thread decompression allows the multi-level compressed file to be decompressed concurrently, which improves both decompression efficiency and thread utilization.

S5. Perform iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result to obtain a standard decompression result.

In detail, the standard decompression result contains the file data decompressed from every level of the multi-level compressed file, together with the complete decompression log.

In this embodiment of the present invention, performing iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result to obtain a standard decompression result includes: performing traversal compressed-file splitting on the sub-level decompression result to obtain a sub-level compressed file set and a sub-level decompressed file set; determining whether the sub-level compressed file set is empty; if it is not empty, updating the main-level compressed file set with the sub-level compressed file set, updating the blocking queue with the decompression task queue, and returning to the step of storing the main-level compressed file set in the blocking queue using the decompression producer thread; if it is empty, aggregating the main-level decompressed file set and all sub-level decompressed file sets into the standard decompression result.

In detail, the traversal compressed-file splitting used here is the same as that in step S1 above. Checking whether the sub-level compressed file set is empty allows the compressed files at each sub-level of the sub-level decompression result to be enqueued and decompressed iteratively.
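The iterative enqueue loop of S5 can be illustrated with a single-threaded work queue that unpacks nested zip archives level by level. This is a simplified sketch: a `collections.deque` stands in for the blocking queue and consumer threads, only zip is handled, and the `max_depth` guard, the `extract_nested` name, and the `levelN/<id>_<name>` output layout are our own assumptions rather than details from the patent.

```python
import itertools
import zipfile
from collections import deque
from pathlib import Path
from typing import List


def extract_nested(root_archive: Path, out_root: Path, max_depth: int = 32) -> List[Path]:
    """Unpack nested zip archives with a work queue instead of recursion."""
    plain_files: List[Path] = []
    ids = itertools.count()  # keeps output directories unique per archive
    pending = deque([(root_archive, 1)])  # (archive, level); plays the blocking queue's role
    while pending:  # an empty queue corresponds to "sub-level compressed file set is empty"
        archive, level = pending.popleft()
        if level > max_depth:  # guard against pathological nesting (our addition)
            raise RuntimeError(f"nesting deeper than {max_depth}: {archive}")
        dest = out_root / f"level{level}" / f"{next(ids)}_{archive.stem}"
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest)
        # Traversal splitting of this sub-level result: archives vs. plain files.
        for path in sorted(p for p in dest.rglob("*") if p.is_file()):
            if zipfile.is_zipfile(path):
                pending.append((path, level + 1))  # re-enqueue the next level
            else:
                plain_files.append(path)
    return plain_files
```

Each nested archive is pushed back onto the queue instead of triggering a recursive call, so nesting depth never grows the call stack; the loop ends once the queue is empty, that is, when no newly extracted file is itself an archive.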

In this embodiment of the present invention, iterative enqueue decompression replaces the recursive decompression algorithm with an iterative one, which improves decompression performance and reduces memory overhead during decompression, while the blocking queue and multi-threaded concurrency make full use of the CPU's computing capacity for higher decompression efficiency.

By initializing the blocking queue, the decompression producer thread, and the decompression consumer thread, the embodiment of the present invention implements multi-threaded, queue-based decompression tasks, improving the stability, security, and overall efficiency of decompressing multi-level compressed files. Traversal compressed-file splitting performs the primary decompression and screens out the sub-level compressed files for further decompression. Storing the main-level compressed file set in the blocking queue to obtain the decompression task queue prevents dependency errors and thread waiting during decompression. Layer-by-layer multi-thread decompression processes the multi-level compressed file concurrently, improving decompression efficiency and thread utilization. Iterative enqueue decompression replaces the recursive decompression algorithm with an iterative one, improving decompression performance and reducing memory overhead, while the blocking queue and multi-threaded concurrency make full use of the CPU's computing capacity.
Therefore, the method for quickly extracting multi-level compressed files proposed by the present invention addresses the low efficiency of extracting data from multi-level compressed files.

FIG. 4 is a functional module diagram of a system for quickly extracting multi-level compressed files according to an embodiment of the present invention.

The system 400 for quickly extracting multi-level compressed files of the present invention may be installed in an electronic device. Depending on the functions implemented, the system 400 may include a queue initialization module 401, a traversal splitting module 402, a task enqueuing module 403, a layer-by-layer decompression module 404, and an iterative decompression module 405. A module of the present invention, which may also be called a unit, is a series of computer program segments stored in the memory of the electronic device that can be executed by the processor of the electronic device to perform a fixed function.

In this embodiment, the functions of the modules/units are as follows:

The queue initialization module 401 is configured to initialize a blocking queue, a decompression producer thread, and a decompression consumer thread;

The traversal splitting module 402 is configured to use the decompression producer thread to perform traversal compressed-file splitting on the multi-level compressed file to be decompressed, obtaining a main-level compressed file set and a main-level decompressed file set;

The task enqueuing module 403 is configured to use the decompression producer thread to store the main-level compressed file set in the blocking queue, obtaining a decompression task queue;

The layer-by-layer decompression module 404 is configured to perform layer-by-layer multi-thread decompression on the decompression task queue using the decompression consumer thread, obtaining a sub-level decompression result;

The iterative decompression module 405 is configured to perform iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result, obtaining a standard decompression result.
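Modules 401 and 403 could be sketched as follows, drawing on claims 2 and 6 (a thread pool sized from the CPU core count, and decompression tasks built from the main-level compressed file set and enqueued). The one-producer/rest-consumers split, the queue capacity, and the `init_queue`/`enqueue_level`/`DecompressionTask` names are hypothetical; a bounded `queue.Queue` serves as the blocking queue.

```python
import os
import queue
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Optional


@dataclass
class DecompressionTask:
    task_id: int
    archive: Path
    level: int


@dataclass
class QueueRuntime:
    tasks: queue.Queue
    producer_count: int
    consumer_count: int


def init_queue(cpu_cores: Optional[int] = None, capacity: int = 1024) -> QueueRuntime:
    """Module 401 sketch: size a pool from the CPU core count and split it."""
    cores = cpu_cores or os.cpu_count() or 1
    pool_size = max(2, cores)          # at least one producer and one consumer
    producers = 1                      # hypothetical split: one producer ...
    consumers = pool_size - producers  # ... and the remaining threads as consumers
    # A bounded queue.Queue blocks put() when full and get() when empty.
    return QueueRuntime(queue.Queue(maxsize=capacity), producers, consumers)


def enqueue_level(rt: QueueRuntime, archives: Iterable, level: int, start_id: int = 0) -> int:
    """Module 403 sketch: wrap each archive in a task, enqueue it, return the next id."""
    next_id = start_id
    for archive in archives:
        rt.tasks.put(DecompressionTask(next_id, Path(archive), level))  # blocks if full
        next_id += 1
    return next_id
```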

In detail, when used, each module of the system 400 for quickly extracting multi-level compressed files adopts the same technical means as the method for quickly extracting multi-level compressed files shown in FIG. 1 above and produces the same technical effects, which are not repeated here.

FIG. 5 is a schematic structural diagram of an electronic device for implementing the method for quickly extracting multi-level compressed files according to an embodiment of the present invention.

The electronic device 501 may include a processor 510, a memory 511, a communication bus 512, and a communication interface 513, and may further include a computer program stored in the memory 511 and executable on the processor 510, such as a program for quickly extracting multi-level compressed files.

In some embodiments, the processor 510 may consist of integrated circuits, for example a single packaged integrated circuit or multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 510 is the control unit of the electronic device: it connects the components of the entire device through various interfaces and lines, and performs the device's functions and processes data by running or executing programs or modules stored in the memory 511 (for example, the program for quickly extracting multi-level compressed files) and calling data stored in the memory 511.

The memory 511 includes at least one type of readable storage medium, such as flash memory, a removable hard disk, a multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 511 may be an internal storage unit of the electronic device, such as a removable hard disk of the electronic device. In other embodiments, the memory 511 may be an external storage device of the electronic device, such as a plug-in removable hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the device. Further, the memory 511 may include both an internal storage unit and an external storage device of the electronic device. The memory 511 can store application software installed on the electronic device and various kinds of data, such as the code of the program for quickly extracting multi-level compressed files, and can also temporarily store data that has been output or is to be output.

The communication bus 512 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on, and is configured to enable connection and communication between the memory 511, the at least one processor 510, and other components.

The communication interface 513 is used for communication between the electronic device and other devices and includes a network interface and a user interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi or Bluetooth interface), typically used to establish a communication connection between the electronic device and other electronic devices. The user interface may be a display or an input unit (such as a keyboard); optionally, it may also be a standard wired or wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the electronic device and to present a visual user interface.

The figure shows only an electronic device with certain components. Those skilled in the art will understand that the structure shown does not limit the electronic device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.

For example, although not shown, the electronic device may further include a power supply (such as a battery) for powering the components. Preferably, the power supply may be logically connected to the at least one processor 510 through a power management system, which implements functions such as charge management, discharge management, and power consumption management. The power supply may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator. The electronic device may further include various sensors, a Bluetooth module, a Wi-Fi module, and so on, which are not described further here.

It should be understood that the embodiments are for illustration only, and the scope of the patent application is not limited by this structure.

Specifically, for how the processor 510 implements the above instructions, refer to the description of the relevant steps in the embodiments corresponding to the accompanying drawings; details are not repeated here.

Furthermore, if the modules/units integrated in the electronic device 501 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium, which may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, or read-only memory (ROM).

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above exemplary embodiments and can be implemented in other specific forms without departing from its spirit or essential characteristics.

Therefore, the embodiments should in every respect be regarded as illustrative and non-restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalents of the claims are intended to be embraced by the present invention. Any reference signs in the claims shall not be construed as limiting the claims concerned.

Finally, it should be noted that the above embodiments only illustrate, and do not limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its spirit and scope.

Claims (12)

1. A method for quickly extracting multi-level compressed files, the method comprising:
initializing a blocking queue, a decompression producer thread, and a decompression consumer thread;
performing, by the decompression producer thread, traversal compressed-file splitting on a multi-level compressed file to be decompressed to obtain a main-level compressed file set and a main-level decompressed file set;
storing, by the decompression producer thread, the main-level compressed file set in the blocking queue to obtain a decompression task queue;
performing, by the decompression consumer thread, layer-by-layer multi-thread decompression on the decompression task queue to obtain a sub-level decompression result; and
performing iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result to obtain a standard decompression result.
2. The method for quickly extracting multi-level compressed files according to claim 1, wherein initializing the blocking queue, the decompression producer thread, and the decompression consumer thread comprises:
initializing a task queue;
performing thread initialization on the task queue according to a pre-obtained number of CPU cores to obtain an initial thread pool;
performing thread allocation on the initial thread pool to obtain the decompression producer thread and the decompression consumer thread; and
performing blocking configuration on the task queue according to the decompression producer thread and the decompression consumer thread to obtain the blocking queue.
3. The method for quickly extracting multi-level compressed files according to claim 1, wherein performing, by the decompression producer thread, traversal compressed-file splitting to obtain the main-level compressed file set and the main-level decompressed file set comprises:
performing, by the decompression producer thread, file scanning on the multi-level compressed file to be decompressed to obtain a main-level file set;
selecting files in the main-level file set one by one as a target file, and extracting compression features of the target file to obtain a target compression feature;
performing compression attribute detection on the target compression feature to obtain a target detection result; and
performing compression screening on the main-level file set according to all target detection results to obtain the main-level compressed file set and the main-level decompressed file set.
4. The method for quickly extracting multi-level compressed files according to claim 3, wherein extracting the compression features of the target file to obtain the target compression feature comprises:
extracting the suffix name of the target file to obtain a target suffix name;
extracting the file header of the target file to obtain a target file header;
extracting a magic number from the target file header to obtain a target magic number; and
splicing and fusing the target magic number and the target suffix name to obtain the target compression feature.
5. The method for quickly extracting multi-level compressed files according to claim 3, wherein performing compression attribute detection on the target compression feature to obtain the target detection result comprises:
determining whether the target compression feature contains a compressed-file suffix name;
if yes, taking the compressed attribute as the target detection result of the target compression feature;
if not, determining whether the target compression feature contains a compression magic number;
if yes, taking the compressed attribute as the target detection result of the target compression feature; and
if not, taking the non-compressed attribute as the target detection result of the target compression feature.
6. The method for quickly extracting multi-level compressed files according to claim 1, wherein storing, by the decompression producer thread, the main-level compressed file set in the blocking queue to obtain the decompression task queue comprises:
performing, by the decompression producer thread, task initialization on the main-level compressed file set to obtain a decompression task set; and
performing an enqueue operation on the blocking queue according to the decompression task set to obtain the decompression task queue.
7. The method for quickly extracting multi-level compressed files according to claim 1, wherein performing, by the decompression consumer thread, layer-by-layer multi-thread decompression on the decompression task queue to obtain the sub-level decompression result comprises:
performing, by the decompression consumer thread, concurrent task selection on the decompression task queue to obtain a target decompression task group;
performing, by the decompression consumer thread, multi-thread decompression on the target decompression task group to obtain a thread decompressed file set;
performing format verification on the thread decompressed file set to obtain a verified decompressed file set; and
performing a result update on the verified decompressed file set to obtain the sub-level decompression result.
8. The method for quickly extracting multi-level compressed files according to claim 7, wherein performing format verification on the thread decompressed file set to obtain the verified decompressed file set comprises:
performing directory initialization on the thread decompressed file set to obtain a sub-level decompression directory;
performing compression format processing on the thread decompressed file set to obtain a format decompressed file set;
performing an integrity check on the format decompressed file set to obtain a checked decompressed file set; and
performing directory configuration on the checked decompressed file set according to the sub-level decompression directory to obtain the verified decompressed file set.
9. The method for quickly extracting multi-level compressed files according to claim 1, wherein performing iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result to obtain the standard decompression result comprises:
performing traversal compressed-file splitting on the sub-level decompression result to obtain a sub-level compressed file set and a sub-level decompressed file set;
determining whether the sub-level compressed file set is empty;
if not, updating the main-level compressed file set with the sub-level compressed file set, and updating the blocking queue with the decompression task queue;
returning to the step of storing, by the decompression producer thread, the main-level compressed file set in the blocking queue; and
if yes, aggregating the main-level decompressed file set and all sub-level decompressed file sets into the standard decompression result.
10. A system for quickly extracting multi-level compressed files, the system comprising:
a queue initialization module, configured to initialize a blocking queue, a decompression producer thread, and a decompression consumer thread;
a traversal splitting module, configured to perform, by using the decompression producer thread, traversal compressed-file splitting on a multi-level compressed file to be decompressed to obtain a main-level compressed file set and a main-level decompressed file set;
a task enqueuing module, configured to store, by using the decompression producer thread, the main-level compressed file set in the blocking queue to obtain a decompression task queue;
a layer-by-layer decompression module, configured to perform, by using the decompression consumer thread, layer-by-layer multi-thread decompression on the decompression task queue to obtain a sub-level decompression result; and
an iterative decompression module, configured to perform iterative enqueue decompression on the decompression task queue according to the main-level decompressed file set and the sub-level decompression result to obtain a standard decompression result.
11. An electronic device, the electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
the memory stores a computer program executable by the at least one processor, so that the at least one processor can perform the method for quickly extracting multi-level compressed files according to any one of claims 1 to 9.
12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for quickly extracting multi-level compressed files according to any one of claims 1 to 9.
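The traversal splitting of claims 3 to 5 (scan files, build a suffix-plus-header feature, check the suffix first and the magic number only as a fallback) might be sketched as below. The suffix list and magic-number table are assumptions, since the claims do not enumerate them; the byte signatures shown are the standard ones for zip, gzip, bzip2, xz, 7z, and rar, plus the `ustar` marker that POSIX tar stores at offset 257. For simplicity the "magic number" is matched directly against the header bytes, and the names `extract_feature`, `has_magic`, `detect`, and `split_level` are hypothetical.

```python
from pathlib import Path
from typing import List, Tuple

HEADER_SIZE = 512  # enough to reach the tar "ustar" marker at offset 257

# Assumed tables; the claims do not list specific suffixes or signatures.
COMPRESSED_SUFFIXES = {".zip", ".jar", ".gz", ".tgz", ".bz2", ".xz", ".7z", ".rar", ".tar"}
MAGIC_NUMBERS = (
    b"PK\x03\x04",          # zip (also zip-based formats such as jar and docx)
    b"\x1f\x8b",            # gzip
    b"BZh",                 # bzip2
    b"\xfd7zXZ\x00",        # xz
    b"7z\xbc\xaf\x27\x1c",  # 7z
    b"Rar!\x1a\x07",        # rar
)


def extract_feature(path: Path) -> Tuple[str, bytes]:
    """Claim 4 sketch: fuse the suffix name and the file header into one feature."""
    with path.open("rb") as fh:
        header = fh.read(HEADER_SIZE)
    return path.suffix.lower(), header


def has_magic(header: bytes) -> bool:
    if any(header.startswith(m) for m in MAGIC_NUMBERS):
        return True
    return header[257:262] == b"ustar"  # tar's marker is not at offset 0


def detect(feature: Tuple[str, bytes]) -> bool:
    """Claim 5 sketch: suffix check first, magic-number check only as a fallback."""
    suffix, header = feature
    if suffix in COMPRESSED_SUFFIXES:
        return True
    return has_magic(header)


def split_level(root: Path) -> Tuple[List[Path], List[Path]]:
    """Claim 3 sketch: scan files under root and screen them into (compressed, plain)."""
    compressed: List[Path] = []
    plain: List[Path] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        (compressed if detect(extract_feature(path)) else plain).append(path)
    return compressed, plain
```

The suffix check follows the order given in claim 5; the magic-number fallback catches archives that were renamed or lack an extension.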
CN202410796499.5A 2024-06-19 2024-06-19 Method, system, device and medium for quickly extracting multi-level compressed files Pending CN118819826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410796499.5A CN118819826A (en) 2024-06-19 2024-06-19 Method, system, device and medium for quickly extracting multi-level compressed files

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410796499.5A CN118819826A (en) 2024-06-19 2024-06-19 Method, system, device and medium for quickly extracting multi-level compressed files

Publications (1)

Publication Number Publication Date
CN118819826A true CN118819826A (en) 2024-10-22

Family

ID=93067097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410796499.5A Pending CN118819826A (en) 2024-06-19 2024-06-19 Method, system, device and medium for quickly extracting multi-level compressed files

Country Status (1)

Country Link
CN (1) CN118819826A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination