CN119127514B - Pulsar Fourier domain acceleration search pipeline parallel method and device - Google Patents
Pulsar Fourier domain acceleration search pipeline parallel method and device
- Publication number
- CN119127514B
- Authority
- CN
- China
- Prior art keywords
- task
- sub
- thread
- data
- gpu
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/5018—Thread allocation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Discrete Mathematics (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a pulsar Fourier domain acceleration search pipeline parallel method and device. The method comprises: distributing received astronomical data to a plurality of parallel processes; preprocessing the distributed astronomical data with a first CPU (central processing unit) sub-thread; reading the preprocessed data with a GPU (graphics processing unit) sub-thread and performing accelerated computation on the GPU to obtain candidate signal data; reading the candidate signal data with a second CPU sub-thread and performing post-processing and result summarization; synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture; dynamically adjusting the number of processes in the processing flow through monitoring feedback; and recording the task states and calculation results of the processes in real time and handling abnormal conditions. The invention effectively speeds up the binary pulsar search process, supports parallel acceleration on multiple GPUs, and greatly increases the speed at which this type of celestial object is found in FAST astronomical data.
Description
Technical Field
The invention belongs to the technical field of high-performance computing for astronomical data, and particularly relates to a pulsar Fourier domain acceleration search pipeline parallel method and device.
Background
A pulsar is a compact neutron star characterized by rapid rotation and strong electromagnetic radiation; its stable rotation period makes it an important target of astronomical observation. The discovery of pulsars has provided rich data for astronomy and astrophysics and helps scientists understand key physical phenomena such as gravitational waves and general relativity. However, as observation technology advances, and especially with high-sensitivity telescopes such as FAST (the Five-hundred-meter Aperture Spherical radio Telescope), the volume of data generated is growing rapidly, and pulsar searches must process massive amounts of observation data, which poses unprecedented challenges. Traditional time-domain search methods can identify the signal of an isolated pulsar to some extent, but they struggle with the complexity of binary systems and with the demand for fast searches, so developing new search methods is of great importance.
To address these challenges, the Fourier Domain Acceleration Search (FDAS) algorithm was developed. The algorithm converts the observation data into the frequency domain and uses mathematical tools such as the Fast Fourier Transform (FFT) to improve the efficiency of pulsar searching significantly. At present, the PRESTO project is one of the representative implementations of the FDAS algorithm and has shown strong search capability in practice. However, the implementation of PRESTO on the GPU has significant performance bottlenecks. The main problems are that the conventional command-line-based multi-process scheme executes inefficiently on the GPU, and that GPU resource allocation and management are not efficient enough. As a result, GPU utilization is low, computation latency increases significantly, and overall system throughput falls short of expectations. In addition, performance fluctuates widely and resource consumption is overly concentrated, so the search consumes a large amount of computing resources and incurs high operating costs. These problems severely limit the application potential of PRESTO and similar approaches in large-scale astronomical data processing.
The rapid development of heterogeneous computing technology, and especially the spread of cooperative CPU and GPU computing, offers a new approach to computation-intensive tasks such as pulsar searching. However, existing FDAS implementations do not fully exploit the performance advantages of this mode. Existing designs do not sufficiently combine application-layer characteristics with the strengths of the underlying hardware architecture, so system performance is not fully utilized in actual operation, and there remains considerable room to improve computational efficiency and resource utilization.
Therefore, to overcome the performance bottlenecks of existing schemes, fully exploit the potential of heterogeneous computing architectures, and achieve more efficient and economical pulsar searching, it is necessary to redesign the FDAS algorithm in depth, optimize the data processing flow, and improve the parallelism and execution efficiency of the algorithm. This will not only advance pulsar research but also contribute to progress in astronomy and astrophysics.
Disclosure of Invention
In view of the above, the present invention aims to provide a pulsar Fourier domain acceleration search pipeline parallel method and device. By designing an architecture with multiple processes and multiple threads within each process to process astronomical data in parallel, the invention speeds up the binary pulsar search process severalfold; it also supports parallel acceleration on multiple GPUs, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows:
In a first aspect, an embodiment of the invention provides a pulsar Fourier domain acceleration search pipeline parallel method, comprising the following steps:
distributing the received astronomical data to a configurable number of parallel processes;
dividing each process into three serial sub-threads, preprocessing the distributed astronomical data with a first CPU sub-thread, reading the preprocessed data with a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data with a second CPU sub-thread and performing post-processing and result summarization, and meanwhile synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
dynamically adjusting the number of processes in the processing flow through monitoring feedback;
and recording the task states and the calculation results of the plurality of parallel processes in real time and handling abnormal conditions.
Specifically, the preprocessing the distributed astronomical data by using the first CPU sub-thread includes:
In the first CPU sub-thread, preprocessing is carried out on the distributed astronomical data, wherein the preprocessing comprises the creation and initialization of a harmonic and sub-harmonic information structure body, and the harmonic and sub-harmonic information structure body comprises the frequency domain data distribution of each harmonic and the memory requirement of each harmonic.
Specifically, the reading of the preprocessed data with the GPU sub-thread and the performing of accelerated computation on the GPU to obtain candidate signal data includes:
In the GPU sub-thread, reading the harmonic and sub-harmonic information structure produced by the preprocessing in the first CPU sub-thread, allocating resources on the GPU according to predefined task parameters, and performing accelerated computation, including the Fourier transform and candidate signal data generation, to obtain the candidate signal data.
Specifically, the reading of the candidate signal data with the second CPU sub-thread and the performing of post-processing and result summarization includes:
In the second CPU sub-thread, reading the candidate signal data obtained by the accelerated computation in the GPU sub-thread, performing post-processing including sorting and screening on the candidate signal data, and formatting and outputting a result summary file.
Specifically, the predefined task parameters include:
The maximum z value of the acceleration search, the maximum w value of the acceleration search, the threshold of signal detection, the number of harmonics used in the acceleration search, and the amount of data processed by one task, wherein the z value represents the width of the Fourier window and the w value represents the acceleration search depth parameter.
Specifically, the task state includes:
task ID, current processing stage, process ID of execution, start time and last active time of task, and timestamp or timeout status of task completion.
Specifically, the dynamically adjusting the number of processes in the processing flow by monitoring feedback includes:
When the task execution timeout is detected, reassigning the task with the execution timeout to a task queue;
Setting a stop flag of the process to stop the original process, monitoring the exit state of the process, and waiting for the process to exit normally within a set time; if the process does not exit within the preset time, forcibly terminating the process; and, after the original process exits, creating a new process and adding it to the worker process pool so that the number of processes in the pool reaches the specified number.
Specifically, the method further comprises:
Providing a debugging mode and a non-debugging mode selection, redirecting standard output to remove redundant task execution log records in the non-debugging mode, reserving detailed task execution logs in the debugging mode, and monitoring and troubleshooting problems according to the detailed task execution logs.
In a second aspect, to achieve the above object, an embodiment of the invention further provides a pulsar Fourier domain acceleration search pipeline parallel device, which is implemented by the above pulsar Fourier domain acceleration search pipeline parallel method and comprises a task allocation module, a task execution module, a task monitoring module and a task recording module;
The task allocation module is used for distributing the received astronomical data to a configurable number of parallel processes;
The task execution module is used for dividing each process into three serial sub-threads, preprocessing the distributed astronomical data with a first CPU sub-thread, reading the preprocessed data with a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data with a second CPU sub-thread and performing post-processing and result summarization, and synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
The task monitoring module is used for dynamically adjusting the number of processes in the processing flow through monitoring feedback;
The task recording module is used for recording the task states and the calculation results of the plurality of parallel processes in real time and handling abnormal conditions.
In a third aspect, to achieve the above object, an embodiment of the present invention further provides an electronic device, including a memory and one or more processors, where the memory is configured to store a computer program, and the processors are configured to implement the above pulsar Fourier domain acceleration search pipeline parallel method when executing the computer program.
Compared with the prior art, the beneficial effects of the invention include at least the following:
(1) Multi-process parallel processing provides isolation: each process has its own memory space and resources, so thread conflicts between processes are effectively avoided and thread safety is ensured. This isolation mechanism not only improves the stability of the program but also provides a reliable running environment for complex data processing.
(2) By arranging several serial sub-threads in each process and operating them as a pipeline, GPU resources are used without waiting: the GPU can receive processing tasks continuously, which greatly improves its resource utilization. In addition, the pipeline mode optimizes task allocation and scheduling and further improves overall processing efficiency.
(3) Because astronomical data structures are complex, relying on multiple processes alone for complex data processing and exchange leads to shared-memory problems. The combination of multiple processes with multiple sub-threads inside each process, as provided by the invention, not only makes full use of the parallel computing capability of multi-core processors but also effectively avoids shared-memory problems through reasonable task division and inter-thread communication mechanisms, thereby significantly improving data processing efficiency, reducing system overhead, and enhancing the stability and scalability of the program.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow diagram of a parallel method for a pulsar Fourier domain acceleration search pipeline provided by an embodiment of the invention;
FIG. 2 is a schematic illustration of the sub-thread workflow in each process provided by an embodiment of the present invention;
FIG. 3 is a schematic flow chart of processing an original process of a timeout task according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a parallel device of a pulsar Fourier domain acceleration search pipeline according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description is presented by way of example only and is not intended to limit the scope of the invention.
In view of the low parallelism and execution efficiency of pulsar Fourier domain acceleration search algorithms in the prior art, embodiments of the invention provide a pulsar Fourier domain acceleration search pipeline parallel method and device. Astronomical data are processed by multiple processes, each of which contains a multithreaded processing stage comprising a first CPU sub-thread, a GPU sub-thread and a second CPU sub-thread. This accelerates the parallel processing of astronomical data and effectively speeds up the binary pulsar search process, and with multi-GPU parallel acceleration the speed of searching for this type of celestial object in FAST astronomical data can be increased by tens of times.
FIG. 1 is a schematic flow chart of a pulsar Fourier domain acceleration search pipeline parallel method according to an embodiment of the present invention. As shown in FIG. 1, an embodiment provides a pulsar Fourier domain acceleration search pipeline parallel method, which includes the following steps:
S1, distributing the received astronomical data to a configurable number of parallel processes.
In an embodiment, the received astronomical data are FFT (fast Fourier transform) files generated by the realfft command of PRESTO, the pulsar search and analysis software developed by Scott Ransom. An FFT file contains the frequency information of an astronomical observation signal, specifically the Fourier transform result computed from the sampling points.
A task queue is initialized to receive and manage FFT files. A result queue is initialized to receive final processing results: after each task completes, the result queue receives the processing result generated by the second CPU sub-thread, which comprises the output data of the acceleration search and its associated statistics. A worker process pool consisting of a plurality of worker processes is initialized, where each process comprises three sub-threads and two intermediate result queues, and a specified number of worker threads is created for each GPU, so that the system can efficiently process FFT files from different sources.
The directory of FFT files is traversed, an independent task is generated for each file and placed on the global task queue to await processing, and a plurality of worker processes are initialized to fetch different tasks from the global task queue for processing.
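The task-generation step of S1 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `enqueue_fft_tasks`, the task dictionary keys, and the `*.fft` file pattern are assumptions made for illustration. A thread-level `queue.Queue` is used to keep the example small; a multi-process deployment would pass a `multiprocessing.Queue`, which offers the same `put` method.

```python
import queue
from pathlib import Path


def enqueue_fft_tasks(fft_dir, task_queue):
    """Create one independent task per FFT file and put it on the task queue.

    The task layout (task_id, fft_path) is a hypothetical choice.
    """
    tasks = []
    # Sort the files so that task IDs are assigned deterministically.
    for task_id, path in enumerate(sorted(Path(fft_dir).glob("*.fft"))):
        task = {"task_id": task_id, "fft_path": str(path)}
        task_queue.put(task)
        tasks.append(task)
    return tasks
```

Files in the directory that do not match the pattern are ignored, so each queued task corresponds to exactly one FFT file.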
S2, dividing each process into three serial sub-threads, preprocessing the distributed astronomical data with a first CPU sub-thread, reading the preprocessed data with a GPU sub-thread and performing accelerated computation on the GPU to obtain candidate signal data, reading the candidate signal data with a second CPU sub-thread and performing post-processing and result summarization, and meanwhile synchronizing task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture.
In an embodiment, as shown in FIG. 2, the first CPU sub-thread takes a task from the task queue and preprocesses the allocated astronomical data. The preprocessing includes creating and initializing a harmonic and sub-harmonic information structure, which contains the frequency-domain data distribution of each harmonic and its memory requirement, in preparation for the subsequent efficient FFT on the GPU. Harmonics are frequency components that occur at integer multiples of the original signal frequency; they are typically extracted from the signal by Fourier transform or other spectral analysis in order to enhance or filter out signal components at specific frequencies. Sub-harmonics are frequency components below the fundamental, typically a fraction (e.g., 1/2 or 1/3) of the fundamental frequency; in some signal processing they are used to decompose the spectral information of the signal more finely and so support multi-level frequency analysis. The harmonic and sub-harmonic information structure is passed to the GPU sub-thread through the first intermediate result queue for further computation.
In an embodiment, as shown in FIG. 2, the GPU sub-thread obtains the harmonic and sub-harmonic information structure through the first intermediate result queue, selects a specific GPU device according to the environment variable CUDA_VISIBLE_DEVICES, allocates resources on the GPU according to the predefined task parameters, and performs accelerated computation, including the Fourier transform and candidate signal data generation, to produce candidate signal data in a linked-list structure. GPU acceleration greatly shortens the processing time and improves overall search efficiency. Finally, the candidate signal data are passed to the second CPU sub-thread through the second intermediate result queue. The predefined task parameters include:
(1) zmax: the maximum z value of the acceleration search, where the z value represents the width of the Fourier window, controls the frequency resolution, and directly affects the signal detection precision;
(2) wmax: the maximum w value of the acceleration search, where w represents the acceleration search depth parameter and controls the acceleration range of the signal in the Fourier domain: the higher the value, the wider the range;
(3) sigma: the threshold of signal detection;
(4) numharm: the number of harmonics used in the acceleration search;
(5) batchsize: the amount of data processed by one task.
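As an illustration only, the five task parameters can be grouped into one immutable record, and the GPU seen by a worker can be selected by setting CUDA_VISIBLE_DEVICES in that worker's environment. The class name `SearchParams`, the validation rules, and the round-robin assignment in `gpu_env_for_worker` are assumptions of this sketch and are not specified by the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    zmax: int       # maximum z value of the acceleration search
    wmax: int       # maximum w value of the acceleration search
    sigma: float    # signal detection threshold
    numharm: int    # number of harmonics used in the search
    batchsize: int  # amount of data processed by one task

    def __post_init__(self):
        # Assumed sanity checks; the patent does not state valid ranges.
        if self.zmax < 0 or self.wmax < 0:
            raise ValueError("zmax and wmax must be non-negative")
        if self.sigma <= 0:
            raise ValueError("sigma must be positive")
        if self.numharm < 1 or self.batchsize < 1:
            raise ValueError("numharm and batchsize must be at least 1")


def gpu_env_for_worker(worker_index, num_gpus):
    """Return the environment entry that pins a worker to one GPU.

    Round-robin assignment is an assumed policy for this sketch.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {"CUDA_VISIBLE_DEVICES": str(worker_index % num_gpus)}
```

A worker process would typically apply the returned entry to `os.environ` before it initializes any GPU library, since the variable is read when the CUDA runtime starts.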
In an embodiment, as shown in FIG. 2, the second CPU sub-thread obtains the candidate signal data in the linked-list structure through the second intermediate result queue, performs post-processing including sorting and screening, formats the result as a CSV file containing the detailed information of the candidate pulsar signals, and outputs it to the result queue. Once all processes have finished computing, the whole pulsar search task is complete.
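The three-stage arrangement inside one process can be sketched with Python threads and blocking queues. This is a simplified model under stated assumptions: `preprocess`, `gpu_search` and `postprocess` are hypothetical placeholder callables standing in for the real CPU and GPU work, and the `STOP` sentinel used for shutdown is a design choice of this sketch rather than part of the embodiment.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage to shut down


def stage(work, inbox, outbox):
    """Take items from inbox, apply work, and pass results to outbox."""
    while True:
        item = inbox.get()      # blocks until the previous stage delivers
        if item is STOP:
            outbox.put(STOP)    # propagate shutdown downstream
            return
        outbox.put(work(item))


def run_pipeline(tasks, preprocess, gpu_search, postprocess):
    """Run tasks through three serial stages linked by two intermediate queues."""
    task_q, q1, q2, result_q = (queue.Queue() for _ in range(4))
    stages = [
        threading.Thread(target=stage, args=(preprocess, task_q, q1)),
        threading.Thread(target=stage, args=(gpu_search, q1, q2)),
        threading.Thread(target=stage, args=(postprocess, q2, result_q)),
    ]
    for t in stages:
        t.start()
    for task in tasks:
        task_q.put(task)
    task_q.put(STOP)
    for t in stages:
        t.join()
    results = []
    while True:
        item = result_q.get()
        if item is STOP:
            return results
        results.append(item)
```

Because each stage blocks only on its own input queue, the middle stage can start on the next item as soon as the first stage delivers it, which is the property that keeps the GPU stage supplied with work.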
Meanwhile, the task states among all the sub-threads are synchronized by using the queues of the multi-process architecture and the queue blocking locks. Wherein the task state includes:
(1) Task ID;
(2) A current processing stage;
(3) The process ID of the execution;
(4) The start time and the last active time of the task;
(5) A time stamp of task completion or a timeout state.
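The task state fields listed above could be held in a small record such as the following sketch. The class name `TaskState`, the field names, the use of a monotonic clock, and the timeout rule (no activity for longer than a limit) are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskState:
    task_id: int
    stage: str                      # current processing stage
    pid: int                        # ID of the executing process
    start_time: float = field(default_factory=time.monotonic)
    last_active: float = field(default_factory=time.monotonic)
    finished_at: Optional[float] = None  # completion timestamp, if finished
    timed_out: bool = False              # timeout status

    def touch(self, stage, now=None):
        """Record that the task entered a stage and is still active."""
        self.stage = stage
        self.last_active = time.monotonic() if now is None else now

    def check_timeout(self, limit_seconds, now=None):
        """Mark the task as timed out if it has been idle for too long."""
        now = time.monotonic() if now is None else now
        if self.finished_at is None and now - self.last_active > limit_seconds:
            self.timed_out = True
        return self.timed_out
```

The optional `now` argument exists only so that the timing logic can be exercised with fixed values.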
S3, dynamically adjusting the number of processes in the processing flow by monitoring feedback.
In an embodiment, when a task execution timeout is detected, the timed-out task is reassigned to the task queue. As shown in FIG. 3, the original process in which the timed-out task was running is handled as follows:
(1) Setting a stop mark of the process to terminate the original process;
(2) Monitoring the exit state of the process and waiting for the process to exit normally in a set time;
(3) If the process does not exit within the preset time, the process is forcedly terminated;
(4) After the original process exits, a new process is created and added to the worker process pool so that the number of processes in the pool reaches the specified number, which ensures the continuity and stability of task processing as well as the dynamic recovery and reasonable use of system resources.
Through the monitoring process, the dynamic allocation and load balancing of the tasks are realized, and the optimal utilization of resources and the efficient execution of the tasks are ensured.
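The timeout handling of steps (1) to (4) can be sketched as below. The function name `recover_from_timeout`, the layout of `pool` (a dictionary mapping a worker ID to a process and its stop event) and the `spawn` factory are hypothetical. The process and event objects are assumed to offer the same methods as Python's `multiprocessing.Process` (`join`, `is_alive`, `terminate`) and `multiprocessing.Event` (`set`).

```python
def recover_from_timeout(task, task_queue, pool, worker_id, spawn, grace_seconds=5.0):
    """Requeue a timed-out task and replace the worker that was running it.

    pool maps worker_id -> (process, stop_event); spawn() returns a new pair.
    """
    task_queue.put(task)                 # hand the task back for another worker
    proc, stop_event = pool[worker_id]
    stop_event.set()                     # (1) ask the worker to stop
    proc.join(grace_seconds)             # (2) wait for a normal exit
    if proc.is_alive():                  # (3) still running: force termination
        proc.terminate()
        proc.join()
    pool[worker_id] = spawn()            # (4) restore the pool to its size
    return pool[worker_id]
```

Waiting for a normal exit before terminating gives the worker a chance to release its resources cleanly, while the forced path bounds how long a stuck worker can occupy its slot in the pool.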
S4, recording the task state and the calculation results of a plurality of parallel processes in real time and processing abnormal conditions.
In an embodiment, the execution state of each task is tracked and recorded, and the task execution state is written into a global task state dictionary, so that the result of task execution can be collected and returned efficiently. And outputting an overall result after all tasks are processed, wherein the overall result comprises the processing time, the processing state and the final processing output of each file.
In addition, debug mode and non-debug mode selection are provided, the standard output is redirected to remove redundant task execution log records in the non-debug mode, detailed task execution logs are reserved in the debug mode, and monitoring and troubleshooting are performed according to the detailed task execution logs.
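One possible way to realize the two modes in Python is a context manager that leaves standard output untouched in debug mode and discards it otherwise. The name `task_output` is hypothetical. Note that `contextlib.redirect_stdout` only affects text written through Python's `sys.stdout`; output written directly to the process's file descriptor by native code would need a descriptor-level redirect, which this sketch does not attempt.

```python
import contextlib
import os


@contextlib.contextmanager
def task_output(debug):
    """Keep stdout in debug mode; discard it in non-debug mode."""
    if debug:
        yield
        return
    # Non-debug mode: send everything printed inside the block to the null device.
    with open(os.devnull, "w") as sink, contextlib.redirect_stdout(sink):
        yield
```

Wrapping the body of a task in `with task_output(debug):` keeps the log-volume decision in one place instead of scattering conditionals through the task code.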
In summary, the pulsar Fourier domain acceleration search pipeline parallel method provided by the embodiment of the invention speeds up the binary pulsar search process severalfold, and it also supports multi-GPU parallel acceleration, which can increase the speed of searching for this type of celestial object in FAST astronomical data by tens of times.
Based on the same inventive concept, as shown in FIG. 4, the embodiment of the invention further provides a pulsar Fourier domain acceleration search pipeline parallel device 400, which comprises a task allocation module 410, a task execution module 420, a task monitoring module 430 and a task recording module 440.
The task allocation module 410 is configured to distribute the received astronomical data to a configurable number of parallel processes;
The task execution module 420 is configured to divide each process into three serial sub-threads, preprocess the allocated astronomical data with a first CPU sub-thread, read the preprocessed data with a GPU sub-thread and perform accelerated computation on the GPU to obtain candidate signal data, read the candidate signal data with a second CPU sub-thread and perform post-processing and result summarization, and synchronize task states among the sub-threads by means of the queues and queue blocking locks of the multi-process architecture;
The task monitoring module 430 is configured to dynamically adjust the number of processes in the processing flow through monitoring feedback;
The task recording module 440 is configured to record the task states and the calculation results of the multiple parallel processes in real time and to handle abnormal conditions.
Based on the same inventive concept, as shown in FIG. 5, an embodiment of the present invention further provides an electronic device 500, which includes a memory 510 and one or more processors 520, where the memory 510 is configured to store a computer program, and the processors 520 are configured to implement the above pulsar Fourier domain acceleration search pipeline parallel method when executing the computer program.
It should be noted that the pulsar Fourier domain acceleration search pipeline parallel device and the electronic device provided in the foregoing embodiments belong to the same inventive concept as the pulsar Fourier domain acceleration search pipeline parallel method; their specific implementation is detailed in the method embodiment and is not repeated here.
The foregoing embodiments describe the technical solutions and advantages of the invention in detail. It should be understood that the foregoing is merely the presently preferred embodiments of the invention and is not intended to limit the invention; any changes, additions, substitutions and equivalents made within the scope of the principles of the invention are intended to be included within the scope of protection of the invention.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411614716.0A CN119127514B (en) | 2024-11-13 | 2024-11-13 | Pulsar Fourier domain acceleration search pipeline parallel method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411614716.0A CN119127514B (en) | 2024-11-13 | 2024-11-13 | Pulsar Fourier domain acceleration search pipeline parallel method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN119127514A (en) | 2024-12-13 |
CN119127514B (en) | 2025-04-22 |
Family
ID=93765987
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202411614716.0A Active CN119127514B (en) | 2024-11-13 | 2024-11-13 | Pulsar Fourier domain acceleration search pipeline parallel method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN119127514B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107205362A (en) * | 2014-12-03 | 2017-09-26 | 斯马特博有限公司 | Method for obtaining the information on farm-animals |
CN117494060A (en) * | 2023-11-15 | 2024-02-02 | 河海大学 | GPU-based method for mining variable-length motifs in trend data |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8528001B2 (en) * | 2008-12-15 | 2013-09-03 | Oracle America, Inc. | Controlling and dynamically varying automatic parallelization |
WO2016109277A1 (en) * | 2015-01-02 | 2016-07-07 | Systech Corporation | Control infrastructure |
CN111368252A (en) * | 2020-02-28 | 2020-07-03 | 中国科学院新疆天文台 | Pulsar coherent de-dispersion system and method |
CN117851330A (en) * | 2023-11-06 | 2024-04-09 | 中国科学院新疆天文台 | Ultra-wideband pulsar data processing method based on GPU cluster |
Also Published As
Publication number | Publication date |
---|---|
CN119127514A (en) | 2024-12-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||