CN119127514B

CN119127514B - Pulsar Fourier domain acceleration search pipeline parallel method and device

Info

Publication number: CN119127514B
Application number: CN202411614716.0A
Authority: CN
Inventors: 汤昭荣; 潘秋红; 毛旷; 王琪; 陈华曦; 任祖杰
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2024-11-13
Filing date: 2024-11-13
Publication date: 2025-04-22
Anticipated expiration: 2044-11-13
Also published as: CN119127514A

Abstract

The invention discloses a pulsar Fourier domain acceleration search pipeline parallel method and device. The method comprises: distributing received astronomical data to a plurality of parallel processes; within each process, preprocessing the distributed astronomical data in a first CPU (central processing unit) sub-thread, reading the preprocessed data in a GPU (graphics processing unit) sub-thread and executing accelerated computation on the GPU to obtain candidate signal data, and reading the candidate signal data in a second CPU sub-thread for post-processing and result summarization; synchronizing task states among the sub-threads by using the queues and queue blocking locks of the multi-process architecture; dynamically adjusting the number of processes in the processing flow through monitoring feedback; and recording the task states and calculation results of the processes in real time and handling abnormal conditions. The invention can effectively speed up the binary pulsar search process, supports parallel acceleration on multiple GPUs, and greatly increases the speed of searching for this type of celestial object in FAST astronomical data.

Description

Pulsar Fourier domain acceleration search pipeline parallel method and device

Technical Field

The invention belongs to the technical field of astronomical data high-performance calculation, and particularly relates to a pulsar Fourier domain acceleration search pipeline parallel method and device.

Background

A pulsar is a compact neutron star that rotates at high speed and emits strong electromagnetic radiation; because its rotation period is highly stable, it has become an important target of astronomical observation. The discovery of pulsars has provided rich data for astronomy and astrophysics and has helped scientists gain a deeper understanding of key physical phenomena such as gravitational waves and general relativity. However, with continuing progress in observation technology, and especially with the use of high-sensitivity telescopes such as FAST (Five-hundred-meter Aperture Spherical radio Telescope), the volume of generated data is growing rapidly, and pulsar searches must process massive amounts of observation data, which poses unprecedented challenges. Traditional time-domain search methods can identify signals from isolated pulsars to a certain extent, but they struggle with the complexity of binary systems and with the demand for fast searches, so developing new search methods is of great importance.

To address these challenges, the Fourier Domain Acceleration Search (FDAS) algorithm was developed. The algorithm transforms the observation data into the frequency domain and uses mathematical tools such as the Fast Fourier Transform (FFT) to markedly improve the efficiency of pulsar searching. At present, the PRESTO project is one of the representative implementations of the FDAS algorithm and has shown strong search capability in practice. However, its implementation on the GPU has significant performance bottlenecks. The main problems are that the conventional command-line-based multi-process scheme executes inefficiently on the GPU and that GPU resource allocation and management are not efficient enough. As a result, GPU utilization is low, computation latency increases significantly, and overall system throughput falls short of expectations; in addition, performance fluctuates widely and resource consumption is overly concentrated, so the search consumes a large amount of computing resources and incurs high operating costs. These problems severely limit the potential of the PRESTO project and similar approaches in large-scale astronomical data processing.

With the rapid development of heterogeneous computing technology, and especially the spread of CPU-GPU cooperative computing, new approaches have become available for computation-intensive tasks such as pulsar searching. However, existing FDAS implementations have not fully exploited the performance advantages of such architectures. The prior art does not adequately combine application-layer characteristics with the strengths of the underlying hardware architecture in its design, so system performance is not used to the fullest in actual operation, and there remains considerable room to improve computational efficiency and resource utilization.

Therefore, to overcome the performance bottlenecks of existing schemes, fully exploit the potential of heterogeneous computing architectures, and achieve more efficient and economical pulsar searching, it is necessary to redesign the FDAS algorithm in depth, optimize the data processing flow, and improve the parallelism and execution efficiency of the algorithm. This will not only advance pulsar research but also contribute to progress in astronomy and astrophysics.

Disclosure of Invention

In view of the above, the present invention aims to provide a pulsar Fourier domain acceleration search pipeline parallel method and device. By designing multiple processes, with a multi-thread architecture inside each process, to process astronomical data in parallel, the invention speeds up the binary pulsar search process several-fold; it also supports parallel acceleration on multiple GPUs, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.

In order to achieve the above purpose, the technical scheme provided by the invention is as follows:

In a first aspect, an embodiment of the invention provides a pulsar Fourier domain acceleration search pipeline parallel method, comprising the following steps:

distributing the received astronomical data to a configurable number of parallel processes;

dividing each process into three serial sub-threads, preprocessing the distributed astronomical data by using a first CPU sub-thread, reading the preprocessed data by using a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data by using a second CPU sub-thread and performing post-processing and result summarization, and at the same time synchronizing task states among the sub-threads by using the queues and queue blocking locks of the multi-process architecture;

dynamically adjusting the number of processes in the processing flow through monitoring feedback;

recording the task states and the calculation results of the plurality of parallel processes in real time and handling abnormal conditions.

Specifically, preprocessing the distributed astronomical data by using the first CPU sub-thread includes:

In the first CPU sub-thread, preprocessing the distributed astronomical data, wherein the preprocessing includes creating and initializing harmonic and sub-harmonic information structures, and the harmonic and sub-harmonic information structures include the frequency-domain data distribution of each harmonic and the memory requirement of each harmonic.

Specifically, reading the preprocessed data by using the GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data includes:

In the GPU sub-thread, reading the harmonic and sub-harmonic information structures produced by the preprocessing in the first CPU sub-thread, allocating resources on the GPU according to predefined task parameters, and executing accelerated computation, including Fourier transformation and candidate signal data generation, to obtain the candidate signal data.

Specifically, reading the candidate signal data by using the second CPU sub-thread and performing post-processing and result summarization includes:

In the second CPU sub-thread, reading the candidate signal data obtained by the accelerated computation of the GPU sub-thread, post-processing the candidate signal data, including sorting and screening, and formatting the output into a result summary file.

Specifically, the predefined task parameters include:

The maximum z value of the accelerated search, the maximum w value of the accelerated search, the threshold of signal detection, the number of harmonics used in the accelerated search, and the amount of data processed by one task, wherein the z value represents the width of the Fourier window and the w value represents the accelerated search depth parameter.

Specifically, the task state includes:

task ID, current processing stage, process ID of execution, start time and last active time of task, and timestamp or timeout status of task completion.

Specifically, the dynamically adjusting the number of processes in the processing flow by monitoring feedback includes:

When the task execution timeout is detected, reassigning the task with the execution timeout to a task queue;

Setting a stop mark of the process to stop the original process, monitoring the exit state of the process and waiting for the process to exit normally in a set time, if the process does not exit in a preset time, forcibly stopping the process, re-creating a new process after the original process exits, and adding the new process into a working process pool to ensure that the number of processes in the working process pool reaches a specified number.

Specifically, the method further comprises:

Providing a debugging mode and a non-debugging mode selection, redirecting standard output to remove redundant task execution log records in the non-debugging mode, reserving detailed task execution logs in the debugging mode, and monitoring and troubleshooting problems according to the detailed task execution logs.

In a second aspect, to achieve the above object, an embodiment of the invention further provides a pulsar Fourier domain acceleration search pipeline parallel device, which is implemented by the above pulsar Fourier domain acceleration search pipeline parallel method and comprises a task allocation module, a task execution module, a task monitoring module, and a task recording module;

The task allocation module is used for allocating the received astronomical data to a configurable number of parallel processes;

The task execution module is used for dividing each process into three serial sub-threads, preprocessing the distributed astronomical data by using a first CPU sub-thread, reading the preprocessed data by using a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data by using a second CPU sub-thread and performing post-processing and result summarization, and synchronizing task states among the sub-threads by using the queues and queue blocking locks of the multi-process architecture;

The task monitoring module is used for dynamically adjusting the number of processes in the processing flow through monitoring feedback;

The task recording module is used for recording the task states and the calculation results of the plurality of parallel processes in real time and handling abnormal conditions.

In a third aspect, to achieve the above object, an embodiment of the present invention further provides an electronic device, including a memory and one or more processors, where the memory is configured to store a computer program, and the processors are configured to implement the above-mentioned pulsar Fourier domain accelerated search pipeline parallel method when the computer program is executed.

Compared with the prior art, the beneficial effects of the invention include at least the following:

(1) Isolation is realized through multi-process parallelization processing, so that each process has independent memory space and resources, thread conflict is effectively avoided among a plurality of processes, and thread safety is ensured. The isolation mechanism not only improves the stability of the program, but also provides a reliable running environment for complex data processing.

(2) By arranging a plurality of serial sub-threads in each process and skillfully utilizing the working mode of the pipeline, the non-waiting utilization of GPU resources is realized, and the GPU can continuously receive processing tasks by the mode, so that the resource utilization rate of the GPU is greatly improved. In addition, the pipeline mode optimizes task allocation and scheduling, and further improves the overall processing efficiency.

(3) Because astronomical data structures are complex, performing complex data processing and exchange with multiple processes alone can run into shared-memory problems. The combination provided by the invention, multiple processes with multiple sub-threads inside each process, not only makes full use of the parallel computing capability of multi-core processors but also avoids shared-memory problems through sensible task division and inter-thread communication mechanisms, thereby significantly improving data processing efficiency, reducing system overhead, and enhancing the stability and scalability of the program.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic flow diagram of a parallel method for a pulsar Fourier domain acceleration search pipeline provided by an embodiment of the invention;

FIG. 2 is a schematic illustration of the sub-thread workflow in each process provided by an embodiment of the present invention;

FIG. 3 is a schematic flow chart of processing an original process of a timeout task according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a parallel device of a pulsar Fourier domain acceleration search pipeline according to an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.

To address the low parallelism and execution efficiency of pulsar Fourier domain acceleration search algorithms in the prior art, an embodiment of the invention provides a pulsar Fourier domain acceleration search pipeline parallel method and device. Astronomical data are processed by multiple processes, and each process contains a multi-threaded processing stage consisting of a first CPU sub-thread, a GPU sub-thread, and a second CPU sub-thread. This accelerates the parallel processing of astronomical data, effectively speeds up the binary pulsar search process, and, with parallel acceleration on multiple GPUs, can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.

FIG. 1 is a schematic flow chart of a pulsar Fourier domain acceleration search pipeline parallel method according to an embodiment of the present invention. As shown in FIG. 1, an embodiment provides a pulsar Fourier domain accelerated search pipeline parallel method, which includes the following steps:

S1, distributing the received astronomical data to a configurable number of parallel processes.

In an embodiment, the received astronomical data are FFT (fast Fourier transform) files generated by the realfft command of PRESTO, the pulsar search and analysis software developed by Scott Ransom. An FFT file contains the frequency information of the astronomical observation signal, specifically the Fourier transform result computed from the sampled points.

A task queue for receiving and managing the FFT files is initialized. A result queue for receiving the final processing results is initialized; after each task completes, the result queue receives the processing result generated by the second CPU sub-thread, which contains the acceleration search output data and related statistics. A worker process pool consisting of a plurality of worker processes is initialized, in which each process contains three sub-threads and two intermediate result queues, and a specified number of worker threads is created for each GPU, so that the system can efficiently process FFT files from different sources.

The directory of FFT files is traversed, an independent task is generated for each file and placed in the global task queue to await processing, and a plurality of processing processes are initialized to fetch different tasks from the global task queue and process them.
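As an illustration of step S1, the directory traversal and task generation can be sketched as follows. This is a minimal sketch under stated assumptions, not the embodiment's code: the function name `build_task_queue`, the `.fft` file suffix, and the dictionary layout of a task are hypothetical, and a `queue.Queue` stands in for the inter-process queue that a multi-process deployment would use.

```python
import queue
from pathlib import Path


def build_task_queue(fft_dir):
    """Create one independent task per .fft file found directly in fft_dir."""
    tasks = queue.Queue()
    # Sorting makes the task order (and therefore the task IDs) reproducible.
    for task_id, path in enumerate(sorted(Path(fft_dir).glob("*.fft"))):
        # Assumed task layout: an ID plus the path of the file to search.
        tasks.put({"task_id": task_id, "fft_path": str(path)})
    return tasks
```

Worker processes would then take tasks from this queue one at a time, so that no two workers process the same file.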

S2, dividing each process into three serial sub-threads, preprocessing the distributed astronomical data by using a first CPU sub-thread, reading the preprocessed data by using a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data by using a second CPU sub-thread and performing post-processing and result summarization, and at the same time synchronizing task states among the sub-threads by using the queues and queue blocking locks of the multi-process architecture.
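The three-stage arrangement of step S2 can be sketched with standard threads and blocking queues. This is an illustrative sketch only: `run_pipeline`, `_stage`, the sentinel `_DONE`, and the queue depth are assumptions, the three stage functions are placeholders supplied by the caller, and error handling inside a stage is omitted.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the task stream


def _stage(work, inbox, outbox):
    """Apply work() to each item from inbox and forward the result to outbox."""
    while True:
        item = inbox.get()      # blocks until the upstream stage delivers
        if item is _DONE:
            outbox.put(_DONE)   # propagate shutdown to the next stage
            return
        outbox.put(work(item))  # blocks while a bounded outbox is full


def run_pipeline(tasks, preprocess, gpu_search, postprocess, depth=2):
    """Run tasks through three serial stages connected by queues."""
    q_in = queue.Queue()
    q_pre = queue.Queue(maxsize=depth)   # first intermediate result queue
    q_cand = queue.Queue(maxsize=depth)  # second intermediate result queue
    q_out = queue.Queue()
    stages = [
        threading.Thread(target=_stage, args=(preprocess, q_in, q_pre)),
        threading.Thread(target=_stage, args=(gpu_search, q_pre, q_cand)),
        threading.Thread(target=_stage, args=(postprocess, q_cand, q_out)),
    ]
    for t in stages:
        t.start()
    for task in tasks:
        q_in.put(task)
    q_in.put(_DONE)
    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results
```

Because each stage blocks on its inbox, the middle stage can start on the next task as soon as the first stage has produced it, while the last stage is still post-processing the previous one. The bounded intermediate queues keep a fast stage from running arbitrarily far ahead of a slow one.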

In an embodiment, as shown in FIG. 2, the first CPU sub-thread takes tasks from the task queue and preprocesses the distributed astronomical data. The preprocessing includes creating and initializing harmonic and sub-harmonic information structures, which contain the frequency-domain data distribution of each harmonic and its memory requirement, in preparation for the subsequent efficient FFT on the GPU. A harmonic is a frequency component located at an integer multiple of the original signal frequency; harmonics are typically extracted from the signal by Fourier transform or other spectral analysis methods in order to enhance or filter out signal components at specific frequencies. A sub-harmonic is a frequency component below the fundamental, typically at a fraction (e.g., 1/2 or 1/3) of the fundamental frequency; sub-harmonics are used in some signal processing to decompose the spectral information of a signal more finely and so support multi-level frequency analysis. The harmonic and sub-harmonic information structures are passed to the GPU sub-thread through the first intermediate result queue for further computation.
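To make the idea of a per-harmonic information structure concrete, the sketch below records, for each harmonic fraction, an assumed search extent and an estimated memory requirement. Everything here is hypothetical: the field names, the rule that the z range scales with the harmonic fraction, the z step `dz`, and the 8 bytes per complex bin are illustrative assumptions and are not taken from the embodiment or from any library.

```python
import math
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class HarmonicInfo:
    fraction: Fraction  # harmonic index divided by the number of harmonics
    zmax: int           # largest z searched for this (sub-)harmonic
    num_z: int          # number of z trials from -zmax to +zmax
    bytes_needed: int   # estimated memory for one block of trials


def plan_harmonics(numharm, zmax, block_len, dz=2, bytes_per_bin=8):
    """Describe every fraction k/numharm for k = 1..numharm."""
    plans = []
    for k in range(1, numharm + 1):
        frac = Fraction(k, numharm)
        # Assumed rule: scale the z range by the fraction and round it up
        # to a whole number of z steps.
        scaled_zmax = math.ceil(zmax * frac / dz) * dz
        num_z = 2 * scaled_zmax // dz + 1
        plans.append(
            HarmonicInfo(frac, scaled_zmax, num_z, num_z * block_len * bytes_per_bin)
        )
    return plans
```

Summing `bytes_needed` over the returned list gives a rough upper bound that a GPU stage could check against free device memory before launching work.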

In an embodiment, as shown in FIG. 2, the GPU sub-thread obtains the harmonic and sub-harmonic information structures through the first intermediate result queue, is assigned a specific GPU device according to the environment variable CUDA_VISIBLE_DEVICES, allocates the required resources on that GPU according to the predefined task parameters, and executes accelerated computation, including Fourier transformation and candidate signal data generation, producing candidate signal data in a linked-list structure. GPU acceleration greatly shortens the processing time and improves overall search efficiency. Finally, the candidate signal data are passed to the second CPU sub-thread through the second intermediate result queue. The predefined task parameters include:

(1) zmax: the maximum z value of the acceleration search, where the z value represents the width of the Fourier window, controls the frequency resolution, and directly affects signal detection precision;

(2) wmax: the maximum w value of the acceleration search, where w represents the acceleration search depth parameter; the higher the z value, the wider the acceleration range of the signal covered in the Fourier domain;

(3) sigma: the threshold of signal detection;

(4) numharm: the number of harmonics used in the acceleration search;

(5) batchsize: the amount of data processed by one task.
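The five task parameters and the per-worker device selection can be grouped as in the following sketch. The class `SearchParams`, its validation rules, and the round-robin mapping in `pin_worker_to_gpu` are assumptions made for illustration; only the parameter names and the use of the CUDA_VISIBLE_DEVICES environment variable come from the description.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    zmax: int       # maximum z value of the acceleration search
    wmax: int       # maximum w value of the acceleration search
    sigma: float    # signal detection threshold
    numharm: int    # number of harmonics used in the search
    batchsize: int  # amount of data processed by one task

    def __post_init__(self):
        # Assumed sanity checks; the description does not specify ranges.
        if self.zmax < 0 or self.wmax < 0:
            raise ValueError("zmax and wmax must be non-negative")
        if self.numharm < 1 or self.batchsize < 1:
            raise ValueError("numharm and batchsize must be at least 1")


def pin_worker_to_gpu(worker_index, num_gpus, env=None):
    """Expose a single GPU to this worker through CUDA_VISIBLE_DEVICES."""
    env = os.environ if env is None else env
    # Assumed policy: workers are spread over the GPUs round-robin.
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % num_gpus)
    return env["CUDA_VISIBLE_DEVICES"]
```

The environment variable has to be set before the process initializes CUDA, so a worker would call `pin_worker_to_gpu` at the very start of its life, before loading any GPU library.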

In an embodiment, as shown in FIG. 2, the second CPU sub-thread obtains the candidate signal data in the linked-list structure through the second intermediate result queue, performs post-processing including sorting and screening, and formats and outputs a CSV file containing the details of the candidate pulsar signals to the result queue. Once all processes have finished their computation, the whole pulsar search task is complete.
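The sorting, screening, and CSV formatting performed by the second CPU sub-thread might look like the sketch below. The candidate fields (`freq_bin`, `z`, `sigma`), the screening rule, and the function name are hypothetical; the embodiment does not list the CSV columns, and the linked list is represented here by an ordinary Python list of dictionaries.

```python
import csv


def summarize_candidates(candidates, sigma_min, out):
    """Screen by significance, sort strongest first, and write CSV rows."""
    kept = [c for c in candidates if c["sigma"] >= sigma_min]
    kept.sort(key=lambda c: c["sigma"], reverse=True)
    writer = csv.DictWriter(
        out,
        fieldnames=["freq_bin", "z", "sigma"],  # assumed column set
        extrasaction="ignore",                  # drop any other fields
    )
    writer.writeheader()
    writer.writerows(kept)
    return len(kept)
```

`out` is any text file object, for example one opened with `open(path, "w", newline="")`, which is the mode the `csv` module recommends.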

Meanwhile, the task states among all the sub-threads are synchronized by using the queues of the multi-process architecture and the queue blocking locks. Wherein the task state includes:

(1) Task ID;

(2) A current processing stage;

(3) The process ID of the execution;

(4) The start time and the last active time of the task;

(5) A time stamp of task completion or a timeout state.
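The task state fields listed above map naturally onto a small record type. The sketch below is one possible layout, with hypothetical names (`TaskState`, `touch`, `is_stale`) and an assumed staleness rule based on the last active time; the embodiment does not prescribe how a timeout is computed.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskState:
    task_id: int
    stage: str  # current processing stage, e.g. "preprocess" or "gpu"
    pid: int    # ID of the process executing the task
    started_at: float = field(default_factory=time.monotonic)
    last_active_at: float = field(default_factory=time.monotonic)
    finished_at: Optional[float] = None  # completion timestamp, if any
    timed_out: bool = False

    def touch(self, stage, now=None):
        """Record that the task entered a stage and is still making progress."""
        self.stage = stage
        self.last_active_at = time.monotonic() if now is None else now

    def is_stale(self, timeout, now=None):
        """True if the task is unfinished and has been idle longer than timeout."""
        now = time.monotonic() if now is None else now
        return self.finished_at is None and now - self.last_active_at > timeout
```

A monitor would call `is_stale` periodically on every unfinished record, and each sub-thread would call `touch` when it picks a task up.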

S3, dynamically adjusting the number of processes in the processing flow by monitoring feedback.

In an embodiment, when the task execution timeout is detected, the task executing the timeout is reassigned to the task queue. As shown in fig. 3, the processing of the original process where the timeout task is located includes:

(1) Setting a stop mark of the process to terminate the original process;

(2) Monitoring the exit state of the process and waiting for the process to exit normally in a set time;

(3) If the process does not exit within the preset time, the process is forcedly terminated;

(4) After the original process exits, a new process is created and added to the worker process pool so that the number of processes in the pool reaches the specified number, which maintains the continuity and stability of task processing and allows system resources to be recovered dynamically and used sensibly.

Through the monitoring process, the dynamic allocation and load balancing of the tasks are realized, and the optimal utilization of resources and the efficient execution of the tasks are ensured.
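The timeout handling of step S3 (requeue the task, set the stop flag, wait, force termination if needed, start a replacement) can be sketched as below. The function name, its parameters, and the default grace period are assumptions. The sketch relies only on `join(timeout)`, `is_alive()`, `terminate()`, and `start()`, which `multiprocessing.Process` provides, and on an event object with `set()` such as `multiprocessing.Event`.

```python
def recover_from_timeout(task, task_queue, pool, index, stop_event, spawn,
                         grace_seconds=5.0):
    """Requeue a timed-out task and replace the worker that was running it.

    pool is a list of process objects, stop_event is the stop flag watched by
    pool[index], and spawn() returns a new, not yet started worker process.
    Returns True if the old worker had to be terminated by force.
    """
    task_queue.put(task)     # hand the task back to the task queue
    old = pool[index]
    stop_event.set()         # (1) set the stop flag
    old.join(grace_seconds)  # (2) wait a bounded time for a normal exit
    forced = old.is_alive()
    if forced:
        old.terminate()      # (3) forced termination
        old.join()
    replacement = spawn()    # (4) create a new worker
    replacement.start()
    pool[index] = replacement  # pool size is back at the specified number
    return forced
```

In this sketch `spawn` is expected to give the replacement its own fresh stop flag, since the flag passed in has already been set.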

S4, recording the task state and the calculation results of a plurality of parallel processes in real time and processing abnormal conditions.

In an embodiment, the execution state of each task is tracked and recorded, and the task execution state is written into a global task state dictionary, so that the result of task execution can be collected and returned efficiently. And outputting an overall result after all tasks are processed, wherein the overall result comprises the processing time, the processing state and the final processing output of each file.
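As an illustration of the overall result described above, the following sketch turns a task state dictionary into one summary row per file. The dictionary layout (keys `state`, `start`, `end`, `output`) and the function name are hypothetical; a plain `dict` is used here, whereas a multi-process deployment would need a dictionary shared between processes.

```python
def summarize_run(status):
    """Turn a {file: record} status dictionary into per-file summary rows."""
    rows = []
    for path in sorted(status):
        rec = status[path]
        end = rec.get("end")
        rows.append({
            "file": path,
            "state": rec["state"],
            # Processing time is only known once an end timestamp exists.
            "seconds": None if end is None else end - rec["start"],
            "output": rec.get("output"),
        })
    return rows
```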

In addition, debug mode and non-debug mode selection are provided, the standard output is redirected to remove redundant task execution log records in the non-debug mode, detailed task execution logs are reserved in the debug mode, and monitoring and troubleshooting are performed according to the detailed task execution logs.
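The debug and non-debug modes can be illustrated with a small context manager that either leaves standard output alone or discards it. The name `task_output` and the choice to discard output entirely, rather than route it elsewhere, are assumptions.

```python
import contextlib
import os


@contextlib.contextmanager
def task_output(debug):
    """Keep stdout in debug mode; otherwise discard it for the duration."""
    if debug:
        yield
        return
    with open(os.devnull, "w") as sink, contextlib.redirect_stdout(sink):
        yield
```

Note that `contextlib.redirect_stdout` only redirects Python's `sys.stdout`; output written directly to the process's standard output by native code would need a redirection at the file descriptor level instead.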

In summary, the pulsar Fourier domain acceleration search pipeline parallel method provided by the embodiment of the invention can speed up the binary pulsar search process several-fold, and it also supports parallel acceleration on multiple GPUs, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.

Based on the same inventive concept, as shown in FIG. 4, the embodiment of the invention further provides a pulsar Fourier domain accelerated search pipeline parallel device 400, which comprises a task allocation module 410, a task execution module 420, a task monitoring module 430 and a task recording module 440.

The task allocation module 410 is configured to allocate the received astronomical data to a plurality of parallel processes with a configurable number;

The task execution module 420 is configured to divide each process into three serial sub-threads, preprocess the allocated astronomical data by using a first CPU sub-thread, read the preprocessed data by using a GPU sub-thread and perform accelerated computation on the GPU to obtain candidate signal data, read the candidate signal data by using a second CPU sub-thread and perform post-processing and result summarization, and synchronize task states among the sub-threads by using the queues and queue blocking locks of the multi-process architecture;

the task monitoring module 430 is configured to dynamically adjust the number of processes in the processing flow by monitoring feedback;

The task recording module 440 is configured to record the task state and the calculation results of the multiple parallel processes in real time and process the abnormal situation.

Based on the same inventive concept, as shown in FIG. 5, an electronic device 500 is further provided according to an embodiment of the present invention, which includes a memory 510 and one or more processors 520, where the memory 510 is configured to store a computer program, and the processors 520 are configured to implement the above-mentioned pulsar Fourier domain accelerated search pipeline parallel method when executing the computer program.

It should be noted that the pulsar Fourier domain acceleration search pipeline parallel device and the electronic device provided in the foregoing embodiments belong to the same inventive concept as the pulsar Fourier domain acceleration search pipeline parallel method; their specific implementation is detailed in the method embodiment and is not repeated here.

The foregoing embodiments describe the technical solutions and beneficial effects of the invention in detail. It should be understood that the foregoing is merely the presently preferred embodiments of the invention and is not intended to limit it; any modifications, additions, and equivalent substitutions made within the scope of the principles of the invention shall fall within the protection scope of the invention.

Claims

1. A pulsar Fourier domain accelerated search pipeline parallel method, characterized by comprising the following steps:

Distribute received astronomical data to a configurable number of parallel processes;

Divide each process into three serial sub-threads: use the first CPU sub-thread to preprocess the assigned astronomical data; use the GPU sub-thread to read the preprocessed data and perform accelerated calculations, including Fourier transform and candidate signal data generation, on the GPU to obtain candidate signal data; use the second CPU sub-thread to read the candidate signal data and perform post-processing and result aggregation; and at the same time use the queues and queue blocking locks of the multi-process architecture to synchronize the task status between sub-threads;

Dynamically adjust the number of processes in the processing flow through monitoring feedback, including: when a task execution timeout is detected, reallocate the timed-out task to the task queue; process the original process where the timed-out task is located, including: set the stop flag of the process to terminate the original process, monitor the exit status of the process and wait for the process to exit normally within the set time, forcibly terminate the process if the process does not exit within the predetermined time, recreate a new process after the original process exits and add the new process to the working process pool to ensure that the number of processes in the working process pool reaches the specified number;

Record task status and calculation results of multiple parallel processes in real time and handle exceptions.

2. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 1, wherein the step of preprocessing the allocated astronomical data using the first CPU sub-thread comprises:

In the first CPU sub-thread, the allocated astronomical data is preprocessed including creating and initializing harmonic and sub-harmonic information structures, wherein the harmonic and sub-harmonic information structures include frequency domain data distribution of each harmonic and its memory requirement.

3. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 2, characterized in that the use of a GPU subthread to read the preprocessed data and perform accelerated calculations including Fourier transform and candidate signal data generation on the GPU to obtain candidate signal data comprises:

In the GPU sub-thread, the harmonic and sub-harmonic information structure obtained by preprocessing of the first CPU sub-thread is read, and according to the predefined task parameters, resources are allocated on the GPU and accelerated calculations including Fourier transform and candidate signal data generation are performed to obtain candidate signal data.

4. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 1 or 3, characterized in that the use of the second CPU subthread to read the candidate signal data and perform post-processing and result aggregation comprises:

In the second CPU sub-thread, the candidate signal data obtained by the accelerated calculation of the GPU sub-thread is read, and the candidate signal data is post-processed including sorting and screening and formatted and output to obtain a result summary file.

5. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 3, wherein the predefined task parameters include:

The maximum z value of the accelerated search, the maximum w value of the accelerated search, the threshold of signal detection, the number of harmonics used in the accelerated search, and the amount of data processed in one task, where the z value represents the width of the Fourier window and the w value represents the accelerated search depth parameter.

6. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 1, wherein the task status includes:

The task ID, current processing stage, executing process ID, task start time and last activity time, and task completion timestamp or timeout status.

7. The pulsar Fourier domain accelerated search pipeline parallel method according to claim 1, characterized in that the method further comprises:

Providing a choice between a debugging mode and a non-debugging mode, wherein in the non-debugging mode standard output is redirected to remove redundant task execution log records, and in the debugging mode detailed task execution logs are retained and used for monitoring and troubleshooting.

8. A pulsar Fourier domain accelerated search pipeline parallel device, implemented by the pulsar Fourier domain accelerated search pipeline parallel method according to any one of claims 1 to 7, characterized in that it comprises: a task allocation module, a task execution module, a task monitoring module and a task recording module;

The task allocation module is used to allocate the received astronomical data to a configurable number of multiple parallel processes;

The task execution module is used to divide each process into three serial sub-threads, use the first CPU sub-thread to preprocess the assigned astronomical data, use the GPU sub-thread to read the preprocessed data and perform accelerated calculation on the GPU to obtain candidate signal data, use the second CPU sub-thread to read the candidate signal data and perform post-processing and result aggregation, and use the queues and queue blocking locks of the multi-process architecture to synchronize the task status between the sub-threads;

The task monitoring module is used to dynamically adjust the number of processes in the processing flow through monitoring feedback;

The task recording module is used to record the task status and the calculation results of multiple parallel processes in real time and handle abnormal situations.

9. An electronic device comprising a memory and one or more processors, wherein the memory is used to store a computer program, and wherein the processor is used to implement the pulsar Fourier domain accelerated search pipeline parallel method as described in any one of claims 1 to 7 when executing the computer program.