CN119127514B - Pulsar Fourier domain acceleration search pipeline parallel method and device - Google Patents

Pulsar Fourier domain acceleration search pipeline parallel method and device

Info

Publication number
CN119127514B
Authority
CN
China
Prior art keywords
task
sub
thread
data
gpu
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202411614716.0A
Other languages
Chinese (zh)
Other versions
CN119127514A (en
Inventor
汤昭荣
潘秋红
毛旷
王琪
陈华曦
任祖杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202411614716.0A priority Critical patent/CN119127514B/en
Publication of CN119127514A publication Critical patent/CN119127514A/en
Application granted granted Critical
Publication of CN119127514B publication Critical patent/CN119127514B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/14 Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
    • G06F17/141 Discrete Fourier transforms
    • G06F17/142 Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/50 Indexing scheme relating to G06F9/50
    • G06F2209/5018 Thread allocation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Discrete Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a pipeline-parallel method and device for pulsar Fourier-domain acceleration search. The method comprises: distributing received astronomical data to a plurality of parallel processes; within each process, preprocessing the distributed astronomical data in a first CPU (central processing unit) sub-thread, reading the preprocessed data in a GPU (graphics processing unit) sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, and reading the candidate signal data in a second CPU sub-thread for post-processing and result summarization, while synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture; dynamically adjusting the number of processes in the processing flow based on monitoring feedback; and recording the task states and computation results of the processes in real time and handling abnormal conditions. The invention effectively speeds up the binary pulsar search process, supports parallel acceleration on multiple GPUs, and greatly increases the speed of searching for this type of celestial object in FAST astronomical data.

Description

Pulsar Fourier domain acceleration search pipeline parallel method and device
Technical Field
The invention belongs to the technical field of high-performance computing for astronomical data, and particularly relates to a pipeline-parallel method and device for pulsar Fourier-domain acceleration search.
Background
A pulsar is a compact neutron star characterized by rapid rotation and strong electromagnetic radiation; its stable rotation period makes it an important target of astronomical observation. The discovery of pulsars has provided rich data for astronomy and astrophysics and helps scientists gain a deeper understanding of key physical phenomena such as gravitational waves and general relativity. However, as observation technology advances, and especially with the use of high-sensitivity telescopes such as FAST (the Five-hundred-meter Aperture Spherical radio Telescope), the volume of generated data is growing rapidly, and pulsar searches must process massive amounts of observation data, which poses unprecedented challenges. Traditional time-domain search methods can identify the signal of an isolated pulsar to some extent, but they struggle with the complexity of binary systems and with the demand for fast searches, so developing new search methods is essential.
To address these challenges, the Fourier Domain Acceleration Search (FDAS) algorithm was developed. The algorithm transforms the observation data into the frequency domain and uses mathematical tools such as the Fast Fourier Transform (FFT) to markedly improve the efficiency of pulsar searching. At present, the PRESTO project, one of the representative implementations of the FDAS algorithm, has shown strong search capability in practice. However, its implementation on the GPU has significant performance bottlenecks: the existing command-line-based multi-process scheme executes inefficiently on the GPU, and GPU resource allocation and management are not efficient enough, so GPU utilization is low, computation latency increases markedly, and overall system throughput falls short of expectations. In addition, performance fluctuates widely and resource consumption is overly concentrated, so the search consumes a large amount of computing resources and incurs high operating costs. These problems severely limit the application potential of the PRESTO project and similar approaches in large-scale astronomical data processing.
With the rapid development of heterogeneous computing, and in particular the spread of cooperative CPU and GPU computing, a new approach has become available for compute-intensive tasks such as pulsar searching. However, existing FDAS implementations have not fully exploited its performance advantages. Existing designs do not fully combine application-layer characteristics with the strengths of the underlying hardware architecture, so system performance is not fully utilized in actual operation, and there remains considerable room to improve computational efficiency and resource utilization.
Therefore, to overcome the performance bottlenecks of existing schemes, fully exploit the potential of heterogeneous computing architectures, and achieve more efficient and economical pulsar searching, it is necessary to thoroughly redesign the FDAS algorithm, optimize the data processing flow, and improve the parallelism and execution efficiency of the algorithm. This will not only advance pulsar research but also contribute to progress in astronomy and astrophysics.
Disclosure of Invention
In view of the above, the object of the present invention is to provide a pipeline-parallel method and device for pulsar Fourier-domain acceleration search. By processing astronomical data in parallel with a multi-process architecture and a multi-thread architecture within each process, the invention speeds up the binary pulsar search process severalfold; it also supports parallel acceleration on multiple GPUs, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
In a first aspect, an embodiment of the invention provides a pulsar Fourier-domain acceleration search pipeline-parallel method, comprising the following steps:
distributing the received astronomical data to a configurable number of parallel processes;
dividing each process into three serial sub-threads, preprocessing the distributed astronomical data in a first CPU sub-thread, reading the preprocessed data in a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, and reading the candidate signal data in a second CPU sub-thread for post-processing and result summarization, while synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
dynamically adjusting the number of processes in the processing flow based on monitoring feedback;
recording the task states and the computation results of the parallel processes in real time and handling abnormal conditions.
Specifically, preprocessing the distributed astronomical data in the first CPU sub-thread comprises:
in the first CPU sub-thread, preprocessing the distributed astronomical data, wherein the preprocessing includes creating and initializing a harmonic and sub-harmonic information structure that contains the frequency-domain data distribution of each harmonic and its memory requirements.
Specifically, reading the preprocessed data in the GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data comprises:
in the GPU sub-thread, reading the harmonic and sub-harmonic information structure produced by the first CPU sub-thread, allocating resources on the GPU according to predefined task parameters, and performing accelerated computation, including Fourier transformation and candidate signal data generation, to obtain the candidate signal data.
Specifically, reading the candidate signal data in the second CPU sub-thread and performing post-processing and result summarization comprises:
in the second CPU sub-thread, reading the candidate signal data produced by the accelerated computation of the GPU sub-thread, post-processing it (including sorting and screening), and formatting and outputting a result summary file.
Specifically, the predefined task parameters include:
the maximum z value of the acceleration search, the maximum w value of the acceleration search, the signal detection threshold, the number of harmonics used in the acceleration search, and the amount of data processed by one task, wherein the z value represents the width of the Fourier window and the w value represents the acceleration search depth parameter.
Specifically, the task state includes:
the task ID, the current processing stage, the ID of the executing process, the start time and last active time of the task, and the task completion timestamp or timeout status.
Specifically, dynamically adjusting the number of processes in the processing flow based on monitoring feedback comprises:
when a task execution timeout is detected, reassigning the timed-out task to the task queue;
setting the stop flag of the process to terminate the original process, monitoring the exit status of the process and waiting a set time for it to exit normally, forcibly terminating the process if it has not exited within the preset time, and, after the original process has exited, creating a new process and adding it to the worker process pool so that the number of processes in the pool reaches the specified number.
Specifically, the method further comprises:
providing a choice between a debug mode and a non-debug mode, wherein in the non-debug mode standard output is redirected to suppress redundant task execution logs, and in the debug mode detailed task execution logs are retained and used for monitoring and troubleshooting.
In a second aspect, to achieve the above object, an embodiment of the invention further provides a pulsar Fourier-domain acceleration search pipeline-parallel device, which is implemented using the above pulsar Fourier-domain acceleration search pipeline-parallel method and comprises a task allocation module, a task execution module, a task monitoring module and a task recording module;
the task allocation module is used for distributing the received astronomical data to a configurable number of parallel processes;
the task execution module is used for dividing each process into three serial sub-threads, preprocessing the distributed astronomical data in a first CPU sub-thread, reading the preprocessed data in a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, and reading the candidate signal data in a second CPU sub-thread for post-processing and result summarization, while synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
the task monitoring module is used for dynamically adjusting the number of processes in the processing flow based on monitoring feedback;
the task recording module is used for recording the task states and the computation results of the parallel processes in real time and handling abnormal conditions.
In a third aspect, to achieve the above object, an embodiment of the present invention further provides an electronic device, including a memory and one or more processors, where the memory is configured to store a computer program, and the processors are configured to implement the above pulsar Fourier-domain acceleration search pipeline-parallel method when executing the computer program.
Compared with the prior art, the invention has at least the following beneficial effects:
(1) Multi-process parallel processing provides isolation: each process has its own memory space and resources, so thread conflicts between processes are effectively avoided and thread safety is ensured. This isolation not only improves the stability of the program but also provides a reliable runtime environment for complex data processing.
(2) By arranging several serial sub-threads in each process and operating them as a pipeline, GPU resources are used without waiting: the GPU continuously receives processing tasks, which greatly improves its resource utilization. The pipeline mode also optimizes task allocation and scheduling, further improving overall processing efficiency.
(3) Because astronomical data structures are complex, performing complex data processing and exchange through multiple processes alone gives rise to shared-memory problems. The combination of multiple processes with multiple sub-threads inside each process, as provided by the invention, makes full use of the parallel computing capability of multi-core processors and, through reasonable task division and inter-thread communication, effectively avoids the shared-memory problem, thereby markedly improving data processing efficiency, reducing system overhead, and enhancing the stability and scalability of the program.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart of a pulsar Fourier-domain acceleration search pipeline-parallel method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the sub-thread workflow in each process provided by an embodiment of the present invention;
FIG. 3 is a schematic flow chart of handling the original process of a timed-out task according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a pulsar Fourier-domain acceleration search pipeline-parallel device according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
To address the low parallelism and execution efficiency of pulsar Fourier-domain acceleration search algorithms in the prior art, the embodiments of the invention provide a pulsar Fourier-domain acceleration search pipeline-parallel method and device. Astronomical data is processed by multiple processes, and each process contains a multi-threaded processing stage made up of a first CPU sub-thread, a GPU sub-thread and a second CPU sub-thread. This accelerates the parallel processing of astronomical data and effectively speeds up the binary pulsar search process; with parallel acceleration on multiple GPUs, the speed of searching for this type of celestial object in FAST astronomical data can be increased by tens of times.
FIG. 1 is a schematic flow chart of a pulsar Fourier-domain acceleration search pipeline-parallel method according to an embodiment of the present invention. As shown in FIG. 1, an embodiment provides a pulsar Fourier-domain acceleration search pipeline-parallel method, which includes the following steps:
S1, distributing the received astronomical data to a configurable number of parallel processes.
In an embodiment, the received astronomical data is an FFT (fast Fourier transform) file generated by the realfft command of PRESTO, the pulsar search and analysis software developed by Scott Ransom. The FFT file contains the frequency information of the astronomical observation signal, specifically the Fourier transform results computed from the sampling points.
A task queue is initialized to receive and manage the FFT files. A result queue is initialized to receive the final processing results: after each task completes, the result produced by the second CPU sub-thread, comprising the acceleration search output data and related statistics, is delivered through this queue. A worker process pool made up of a plurality of worker processes is initialized, each process containing three sub-threads and two intermediate result queues, and a specified number of worker threads is created for each GPU, so that the system can efficiently process FFT files from different sources.
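The creation of a fixed number of workers per GPU can be illustrated with a minimal sketch. The helper names `plan_workers` and `worker_environment` are hypothetical, and the block-wise mapping of workers to devices is an assumption, since the text does not specify it. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable that restricts which devices a process can see.

```python
import os


def plan_workers(num_gpus, workers_per_gpu):
    """Return one GPU index per worker, e.g. [0, 0, 1, 1] for 2 GPUs x 2 workers."""
    return [gpu for gpu in range(num_gpus) for _ in range(workers_per_gpu)]


def worker_environment(gpu_index, base_env=None):
    """Copy an environment and pin it to a single GPU (assumed pinning scheme)."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env
```

Each worker process would set this variable before any GPU library is initialized, so that the GPU sub-thread inside it only sees its assigned device.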
The directory of FFT files is traversed, an independent task is generated for each file and placed on the global task queue to await processing, and a plurality of processing processes are initialized, each of which fetches different tasks from the global task queue for processing.
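The task generation step might look like the following sketch. The function names, the `.fft` suffix filter and the task dictionary layout are illustrative assumptions, not taken from the patent. The queue is passed in, so a `queue.Queue` can be used for a quick check and a `multiprocessing.Queue` when the consumers are separate processes.

```python
import queue
from pathlib import Path


def list_fft_files(directory):
    """Return the .fft files in `directory`, sorted for a stable task order."""
    return sorted(str(p) for p in Path(directory).glob("*.fft"))


def enqueue_fft_tasks(file_names, task_queue):
    """Put one independent task per FFT file on the queue; return the task count."""
    count = 0
    for name in file_names:
        if not name.endswith(".fft"):
            continue  # skip anything that is not an FFT file
        task_queue.put({"task_id": count, "fft_path": name})
        count += 1
    return count
```

A caller would typically combine the two helpers, for example `enqueue_fft_tasks(list_fft_files(data_dir), task_queue)`, where `data_dir` is a hypothetical path.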
S2, dividing each process into three serial sub-threads, preprocessing the distributed astronomical data in a first CPU sub-thread, reading the preprocessed data in a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, and reading the candidate signal data in a second CPU sub-thread for post-processing and result summarization, while synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture.
In an embodiment, as shown in FIG. 2, the first CPU sub-thread takes tasks from the task queue and preprocesses the distributed astronomical data. The preprocessing includes creating and initializing a harmonic and sub-harmonic information structure, which contains the frequency-domain data distribution of each harmonic and its memory requirements, in preparation for the subsequent efficient FFT on the GPU. Harmonics are frequency components that occur at integer multiples of the frequency of the original signal; they are typically extracted from the signal by Fourier transform or other spectral analysis in order to enhance or filter out signal components at specific frequencies. Sub-harmonics are frequency components below the fundamental, typically at a fraction (e.g., 1/2, 1/3) of the fundamental frequency; in some signal processing they are used to decompose the spectral information of the signal more finely and so support multi-level frequency analysis. The harmonic and sub-harmonic information structure is passed to the GPU sub-thread through the first intermediate result queue, where computation continues.
In an embodiment, as shown in FIG. 2, the GPU sub-thread obtains the harmonic and sub-harmonic information structure from the first intermediate result queue, selects a specific GPU device according to the environment variable CUDA_VISIBLE_DEVICES, allocates resources on the GPU according to the predefined task parameters, and performs accelerated computation, including Fourier transformation and candidate signal data generation, producing candidate signal data in a linked-list structure. GPU acceleration greatly shortens processing time and improves overall search efficiency. Finally, the candidate signal data is passed to the second CPU sub-thread through the second intermediate result queue. The predefined task parameters include:
(1) zmax: the maximum z value of the acceleration search, where the z value represents the width of the Fourier window, controls the frequency resolution, and directly affects signal detection precision;
(2) wmax: the maximum w value of the acceleration search, where w represents the acceleration search depth parameter; the higher the z value, the wider the acceleration range of the signal covered in the Fourier domain;
(3) sigma: the signal detection threshold;
(4) numharm: the number of harmonics used in the acceleration search;
(5) batchsize: the amount of data processed by one task.
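The five parameters above can be grouped into a single immutable configuration object, as in the sketch below. The class name `SearchParams`, the field types and the validation rules are assumptions made for illustration; the patent only names the parameters and their meaning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    zmax: int       # maximum z value of the acceleration search
    wmax: int       # maximum w value of the acceleration search
    sigma: float    # signal detection threshold
    numharm: int    # number of harmonics used in the acceleration search
    batchsize: int  # amount of data processed by one task

    def __post_init__(self):
        # Assumed sanity checks; the source does not state valid ranges.
        if self.zmax < 0 or self.wmax < 0:
            raise ValueError("zmax and wmax must be non-negative")
        if self.numharm < 1 or self.batchsize < 1:
            raise ValueError("numharm and batchsize must be at least 1")
```

Freezing the dataclass keeps the parameters read-only once a task has been handed to the GPU sub-thread, which avoids accidental changes while the pipeline is running.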
In an embodiment, as shown in FIG. 2, the second CPU sub-thread obtains the candidate signal data in linked-list form from the second intermediate result queue, post-processes it (including sorting and screening), and formats it into a CSV file containing detailed information on the candidate pulsar signals, which is output to the result queue. Once all processes have finished computing, the whole pulsar search task is complete.
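The three-stage structure inside one process can be sketched with three threads connected by two bounded queues. This is a minimal illustration, not the patented implementation: `preprocess`, `gpu_search` and `postprocess` are stand-in callables supplied by the caller, the queue size of 2 is arbitrary, and error handling is omitted (an exception in one stage would leave the others waiting). The blocking `put` and `get` calls of `queue.Queue` are one plausible reading of the queue blocking locks described in the text.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the task stream


def run_pipeline(tasks, preprocess, gpu_search, postprocess):
    """Run tasks through three serial stages that overlap in time."""
    q1 = queue.Queue(maxsize=2)  # first intermediate result queue
    q2 = queue.Queue(maxsize=2)  # second intermediate result queue
    results = []

    def cpu_pre():
        for task in tasks:
            q1.put(preprocess(task))  # blocks while q1 is full
        q1.put(_DONE)

    def gpu_stage():
        while (item := q1.get()) is not _DONE:  # blocks while q1 is empty
            q2.put(gpu_search(item))
        q2.put(_DONE)

    def cpu_post():
        while (item := q2.get()) is not _DONE:
            results.append(postprocess(item))

    threads = [threading.Thread(target=fn) for fn in (cpu_pre, gpu_stage, cpu_post)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

While the GPU stage works on one item, the first stage can already prepare the next and the last stage can finish the previous one, which is what keeps the GPU from idling between tasks.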
Meanwhile, the task states among the sub-threads are synchronized by means of the queues and queue blocking locks of the multi-process architecture. The task state includes:
(1) the task ID;
(2) the current processing stage;
(3) the ID of the executing process;
(4) the start time and last active time of the task;
(5) the task completion timestamp or timeout status.
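The task state fields listed above map naturally onto a small record type. The sketch below is an assumption about how such a record could look; the class name, field names, stage labels and the rule "no activity for longer than a limit means timeout" are illustrative and not specified by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    task_id: int
    stage: str            # current processing stage, e.g. "preprocess", "gpu", "postprocess"
    pid: int              # ID of the executing process
    start_time: float
    last_active: float
    finished_at: Optional[float] = None  # completion timestamp, if finished
    timed_out: bool = False

    def touch(self, stage, now):
        """Record that the task entered `stage` at time `now`."""
        self.stage = stage
        self.last_active = now

    def check_timeout(self, now, limit_seconds):
        """Mark an unfinished task as timed out after too long without activity."""
        if self.finished_at is None and now - self.last_active > limit_seconds:
            self.timed_out = True
        return self.timed_out
```

Passing `now` explicitly, rather than calling the clock inside the methods, keeps the record easy to check with fixed timestamps.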
S3, dynamically adjusting the number of processes in the processing flow based on monitoring feedback.
In an embodiment, when a task execution timeout is detected, the timed-out task is reassigned to the task queue. As shown in FIG. 3, handling the original process in which the timed-out task was running includes:
(1) setting the stop flag of the process to terminate the original process;
(2) monitoring the exit status of the process and waiting a set time for it to exit normally;
(3) forcibly terminating the process if it has not exited within the preset time;
(4) after the original process has exited, creating a new process and adding it to the worker process pool, so that the number of processes in the pool reaches the specified number, task processing remains continuous and stable, and system resources are dynamically reclaimed and reasonably used.
This monitoring process provides dynamic task allocation and load balancing, ensuring optimal use of resources and efficient task execution.
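The timeout handling steps above can be sketched as follows. The function names are hypothetical. The code assumes that each worker object offers `join(timeout)`, `is_alive()` and `terminate()` as `multiprocessing.Process` does, that each stop flag offers `set()` as `multiprocessing.Event` does, and that `spawn_worker` is a caller-supplied function returning an already started `(process, stop_flag)` pair.

```python
def retire_worker(proc, stop_flag, grace_seconds):
    """Stop a worker; return True if it exited on its own within the grace period."""
    stop_flag.set()            # (1) set the stop flag
    proc.join(grace_seconds)   # (2) wait a set time for a normal exit
    if proc.is_alive():
        proc.terminate()       # (3) forced termination
        proc.join()            # reap the terminated process
        return False
    return True


def recover_from_timeout(task, task_queue, pool, slot, spawn_worker, grace_seconds=5.0):
    """Requeue a timed-out task and replace the worker that was running it.

    `pool` is a list of (process, stop_flag) pairs; `slot` is the index of the
    worker that timed out.
    """
    task_queue.put(task)                       # give the task back to the queue
    proc, stop_flag = pool[slot]
    exited_cleanly = retire_worker(proc, stop_flag, grace_seconds)
    pool[slot] = spawn_worker()                # (4) restore the pool size
    return exited_cleanly
```

Replacing the worker in the same pool slot keeps the number of processes constant, which matches the requirement that the pool return to its specified size.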
S4, recording the task states and the computation results of the parallel processes in real time and handling abnormal conditions.
In an embodiment, the execution state of each task is tracked and written into a global task state dictionary, so that task results can be collected and returned efficiently. After all tasks have been processed, an overall result is output that includes the processing time, processing state and final output of each file.
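A global task state dictionary with an overall summary could be sketched as below. The class `TaskRegistry` and its method names are hypothetical. The `threading.Lock` shown here only protects access within one process; sharing the dictionary across worker processes would need a process-aware container such as `multiprocessing.Manager().dict()`, which is an assumption about one possible design and not something the source specifies.

```python
import threading


class TaskRegistry:
    """Task state dictionary keyed by file name, guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._states = {}

    def start(self, file_name, now):
        with self._lock:
            self._states[file_name] = {
                "status": "running", "start": now, "seconds": None, "output": None,
            }

    def finish(self, file_name, status, output, now):
        with self._lock:
            entry = self._states[file_name]
            entry.update(status=status, output=output, seconds=now - entry["start"])

    def summary(self):
        """Return {file: (status, processing seconds, output)} for all tasks."""
        with self._lock:
            return {
                name: (e["status"], e["seconds"], e["output"])
                for name, e in self._states.items()
            }
```

The summary contains the three items the text asks for per file: processing time, processing state and final output.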
In addition, a choice between a debug mode and a non-debug mode is provided. In the non-debug mode, standard output is redirected to suppress redundant task execution logs; in the debug mode, detailed task execution logs are retained and used for monitoring and troubleshooting.
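One way to switch between the two modes is a context manager that discards standard output unless debugging is enabled. This is a sketch under assumptions: the name `task_output` is hypothetical, and `contextlib.redirect_stdout` only affects output written through Python's `sys.stdout`, so messages printed directly by native GPU libraries would need redirection at the file descriptor level instead.

```python
import contextlib
import os


@contextlib.contextmanager
def task_output(debug):
    """Keep task logs in debug mode; send them to the null device otherwise."""
    if debug:
        yield  # leave standard output untouched
    else:
        with open(os.devnull, "w") as sink, contextlib.redirect_stdout(sink):
            yield
```

A worker would wrap the execution of each task in `with task_output(debug):`, where `debug` comes from a hypothetical command-line or configuration switch.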
In summary, the pulsar Fourier-domain acceleration search pipeline-parallel method provided by the embodiment of the invention speeds up the binary pulsar search process severalfold; it also supports parallel acceleration on multiple GPUs, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.
Based on the same inventive concept, as shown in FIG. 4, an embodiment of the invention further provides a pulsar Fourier-domain acceleration search pipeline-parallel device 400, which comprises a task allocation module 410, a task execution module 420, a task monitoring module 430 and a task recording module 440.
The task allocation module 410 is configured to distribute the received astronomical data to a configurable number of parallel processes;
the task execution module 420 is configured to divide each process into three serial sub-threads, preprocess the distributed astronomical data in a first CPU sub-thread, read the preprocessed data in a GPU sub-thread and perform accelerated computation on the GPU to obtain candidate signal data, and read the candidate signal data in a second CPU sub-thread for post-processing and result summarization, while synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
the task monitoring module 430 is configured to dynamically adjust the number of processes in the processing flow based on monitoring feedback;
the task recording module 440 is configured to record the task states and the computation results of the parallel processes in real time and handle abnormal conditions.
Based on the same inventive concept, as shown in FIG. 5, an embodiment of the present invention further provides an electronic device 500, which includes a memory 510 and one or more processors 520, where the memory 510 is configured to store a computer program, and the processors 520 are configured to implement the above pulsar Fourier-domain acceleration search pipeline-parallel method when executing the computer program.
It should be noted that, the pulsar fourier domain acceleration search pipeline parallel device and the electronic device provided in the foregoing embodiments all belong to the same inventive concept as a pulsar fourier domain acceleration search pipeline parallel method, and specific implementation processes of the pulsar fourier domain acceleration search pipeline parallel device and the pulsar fourier domain acceleration search pipeline parallel method are detailed in an embodiment of a pulsar fourier domain acceleration search pipeline parallel method, which is not described herein again.
The foregoing embodiments describe the technical solutions and advantages of the invention in detail. It should be understood that the foregoing description is merely illustrative of the presently preferred embodiments of the invention, and that any changes, additions, substitutions and equivalents of those embodiments are intended to be included within the scope of the invention.

Claims (9)

1. A pulsar Fourier domain acceleration search pipeline parallel method, characterized by comprising the following steps:
distributing received astronomical data to a configurable number of parallel processes;
dividing each process into three serial sub-threads, pre-processing the assigned astronomical data with a first CPU sub-thread, reading the pre-processed data with a GPU sub-thread and performing, on the GPU, accelerated computation including Fourier transform and candidate signal data generation to obtain candidate signal data, reading the candidate signal data with a second CPU sub-thread and performing post-processing and result aggregation, while using queues and queue blocking locks of the multi-process parallel architecture to synchronize task states among the sub-threads;
dynamically adjusting the number of processes in the processing flow through monitoring feedback, including: upon detecting that a task has timed out, reassigning the timed-out task to the task queue; and handling the original process in which the timed-out task was running, including: setting a stop flag of the process to terminate the original process, monitoring the exit status of the process and waiting for a set time for the process to exit normally, forcibly terminating the process if it does not exit within the predetermined time, and, after the original process has exited, creating a new process and adding it to the worker process pool so that the number of processes in the pool reaches the specified number;
recording task states and the computation results of the parallel processes in real time and handling exceptions.

2. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 1, characterized in that pre-processing the assigned astronomical data with the first CPU sub-thread comprises:
in the first CPU sub-thread, pre-processing the assigned astronomical data, including creating and initializing harmonic and sub-harmonic information structures, wherein the harmonic and sub-harmonic information structures include the frequency domain data distribution of each harmonic and its memory requirement.

3. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 2, characterized in that reading the pre-processed data with the GPU sub-thread and performing, on the GPU, accelerated computation including Fourier transform and candidate signal data generation to obtain candidate signal data comprises:
in the GPU sub-thread, reading the harmonic and sub-harmonic information structures produced by the pre-processing of the first CPU sub-thread, and, according to predefined task parameters, allocating resources on the GPU and performing accelerated computation including Fourier transform and candidate signal data generation to obtain candidate signal data.

4. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 1 or 3, characterized in that reading the candidate signal data with the second CPU sub-thread and performing post-processing and result aggregation comprises:
in the second CPU sub-thread, reading the candidate signal data obtained by the accelerated computation of the GPU sub-thread, post-processing the candidate signal data, including sorting and filtering, and formatting the output to obtain a result summary file.

5. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 3, characterized in that the predefined task parameters include:
the maximum z value of the acceleration search, the maximum w value of the acceleration search, the signal detection threshold, the number of harmonics used in the acceleration search, and the amount of data processed in one task, wherein the z value represents the width of the Fourier window and the w value represents the acceleration search depth parameter.

6. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 1, characterized in that the task state includes:
the task ID, the current processing stage, the ID of the executing process, the start time and most recent active time of the task, and the task completion timestamp or timeout status.

7. The pulsar Fourier domain acceleration search pipeline parallel method according to claim 1, characterized in that the method further comprises:
providing a choice between a debug mode and a non-debug mode, wherein in the non-debug mode standard output is redirected to suppress redundant task execution logging, and in the debug mode detailed task execution logs are retained and used for monitoring and troubleshooting.

8. A pulsar Fourier domain acceleration search pipeline parallel device, implemented using the pulsar Fourier domain acceleration search pipeline parallel method according to any one of claims 1 to 7, characterized by comprising: a task allocation module, a task execution module, a task monitoring module and a task recording module;
the task allocation module is used to distribute the received astronomical data to a configurable number of parallel processes;
the task execution module is used to divide each process into three serial sub-threads, pre-process the assigned astronomical data with the first CPU sub-thread, read the pre-processed data with the GPU sub-thread and perform accelerated computation on the GPU to obtain candidate signal data, read the candidate signal data with the second CPU sub-thread and perform post-processing and result aggregation, while using the queues and queue blocking locks of the multi-process parallel architecture to synchronize task states among the sub-threads;
the task monitoring module is used to dynamically adjust the number of processes in the processing flow through monitoring feedback;
the task recording module is used to record task states and the computation results of the parallel processes in real time and to handle exceptions.

9. An electronic device comprising a memory and one or more processors, the memory being used to store a computer program, characterized in that the processor is used to implement the pulsar Fourier domain acceleration search pipeline parallel method according to any one of claims 1 to 7 when executing the computer program.
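The timeout handling of claim 1 and the task-state fields of claim 6 can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `TaskState`, `find_timed_out`, `retire_worker`, `recover` and `spawn_worker` are hypothetical and do not come from the patent, and the worker objects are only assumed to behave like `multiprocessing.Process` (`join`, `is_alive`, `terminate`) paired with an event-like stop flag (`set`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskState:
    # Fields mirror the task-state items of claim 6; the names are illustrative.
    task_id: int
    stage: str                 # e.g. "preprocess", "gpu", "postprocess"
    pid: int                   # ID of the process executing the task
    started_at: float
    last_active: float
    finished_at: Optional[float] = None
    timed_out: bool = False


def find_timed_out(states, now, timeout_s):
    """Mark and return unfinished tasks whose last activity is older than timeout_s."""
    late = []
    for s in states:
        if s.finished_at is None and now - s.last_active > timeout_s:
            s.timed_out = True
            late.append(s)
    return late


def retire_worker(proc, stop_event, grace_s):
    """Set the stop flag, wait up to grace_s, then force-terminate if needed.

    Returns True if the worker had to be terminated forcibly.
    """
    stop_event.set()
    proc.join(grace_s)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        return True
    return False


def recover(late_tasks, task_queue, workers, spawn_worker, grace_s=5.0):
    """Requeue timed-out tasks and replace the workers that were running them.

    `workers` maps pid -> (process, stop_event); `spawn_worker` is a
    hypothetical factory returning (pid, process, stop_event).
    """
    for s in late_tasks:
        task_queue.put(s.task_id)          # hand the task back to the queue
        if s.pid not in workers:           # worker already replaced earlier
            continue
        proc, stop_event = workers.pop(s.pid)
        retire_worker(proc, stop_event, grace_s)
        new_pid, new_proc, new_event = spawn_worker()
        workers[new_pid] = (new_proc, new_event)  # keep the pool at full size
```

A monitoring loop would call `find_timed_out` periodically with the current time and pass the result to `recover`; the grace period and timeout values are left to the caller because the claims do not fix them.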
CN202411614716.0A 2024-11-13 2024-11-13 Pulsar Fourier domain acceleration search pipeline parallel method and device Active CN119127514B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411614716.0A CN119127514B (en) 2024-11-13 2024-11-13 Pulsar Fourier domain acceleration search pipeline parallel method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411614716.0A CN119127514B (en) 2024-11-13 2024-11-13 Pulsar Fourier domain acceleration search pipeline parallel method and device

Publications (2)

Publication Number Publication Date
CN119127514A CN119127514A (en) 2024-12-13
CN119127514B true CN119127514B (en) 2025-04-22

Family

ID=93765987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411614716.0A Active CN119127514B (en) 2024-11-13 2024-11-13 Pulsar Fourier domain acceleration search pipeline parallel method and device

Country Status (1)

Country Link
CN (1) CN119127514B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107205362A (en) * 2014-12-03 2017-09-26 斯马特博有限公司 Method for obtaining the information on farm-animals
CN117494060A (en) * 2023-11-15 2024-02-02 河海大学 GPU-based method for mining variable-length motifs in trend data

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8528001B2 (en) * 2008-12-15 2013-09-03 Oracle America, Inc. Controlling and dynamically varying automatic parallelization
WO2016109277A1 (en) * 2015-01-02 2016-07-07 Systech Corporation Control infrastructure
CN111368252A (en) * 2020-02-28 2020-07-03 中国科学院新疆天文台 Pulsar coherent de-dispersion system and method
CN117851330A (en) * 2023-11-06 2024-04-09 中国科学院新疆天文台 Ultra-wideband pulsar data processing method based on GPU cluster

Also Published As

Publication number Publication date
CN119127514A (en) 2024-12-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant