CN102768675A - A Parallel Astronomical Cross-certification Method - Google Patents

Info

Publication number: CN102768675A
Authority: CN (China)
Prior art keywords: astronomical, matrix, cross, node, catalog data
Legal status: Granted; Expired - Fee Related (as listed; not a legal conclusion)
Application number: CN2012101943085A
Other languages: Chinese (zh)
Other versions: CN102768675B (en)
Inventors: 孙济洲, 王润涛, 肖健, 于策, 孙超, 刘旭, 尹伶艳
Current assignee: Tianjin University
Original assignee: Tianjin University
Priority date: 2012-06-13
Filing date: 2012-06-13
Publication date: 2012-11-07
Granted publication: CN102768675B, 2014-11-12

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a parallel astronomical cross-identification method that uses a cluster system environment as its execution environment and comprises the following steps: step one, building a cluster computing environment; step two, establishing a multidimensional matrix model from astronomical catalog data; step three, distributing the matrix blocks across the cluster; and step four, performing distributed computation, in which each node in the cluster system starts as many threads as it has CPUs (central processing units). For the two astronomical catalogs to be cross-identified in parallel, the pairs of same-numbered data files of the two catalogs are assigned in round-robin fashion to the threads of the node holding them for the cross-identification computation, and the results are written to temporary files; after all threads have finished, the temporary files are merged to obtain the final identification result. Compared with the prior art, the invention provides a cross-identification method that is easy to operate, highly parallel, and highly scalable, and that improves cross-identification performance while preserving identification correctness.

Description

A Parallel Astronomical Cross-Identification Method

Technical field

The invention relates to the technical field of astronomical cross-identification, and in particular to a cross-identification method for massive multi-band astronomical catalog data.

Background

Astronomical cross-identification is a key technique for fusing multi-band catalog data. It associates sources in different catalogs by position: sources whose positions coincide, or differ by no more than a given error tolerance, are identified as the same celestial object.
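The positional criterion described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the haversine formula, and the 1-arcsecond default radius are choices made here and are not specified by the patent.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, DEC) positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine form: numerically stable for the small separations
    # that are typical when matching catalog sources.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def is_same_object(src_a, src_b, radius_arcsec=1.0):
    """Treat two (ra, dec) sources as one object if they lie within the match radius.

    The default radius is a hypothetical value; real catalogs supply their own error tolerance.
    """
    sep = angular_separation_deg(src_a[0], src_a[1], src_b[0], src_b[1])
    return sep * 3600.0 <= radius_arcsec
```

A call such as `is_same_object((10.0, 20.0), (10.0, 20.0002))` compares a separation of about 0.72 arcseconds against the radius.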

A literature search shows that two main cross-identification techniques are in common use in China and abroad:

The first is cross-identification based on a relational database: the catalog data are stored in a database, indexed, and queried to perform the cross-identification computation. A representative example is OpenSkyQuery (http://openskyquery.net) of the US Virtual Observatory, which uses Microsoft SQL Server as the underlying database and has a built-in cross-identification algorithm written in pure SQL. However, this approach is limited by the database system and by memory capacity, so only a small amount of data can be cross-identified at a time.

The second is cross-identification based on text data processing, that is, reading, parsing, and cross-identifying massive text data. A representative example is the Hadoop-cluster method (Zhao Qing, Sun Jizhou, et al. Distributed astronomical cross-identification based on the MapReduce model [J]. 计算机应用研究 (Computer Application Research), 2010(9)). It first uploads the text data to Hadoop's HDFS file system. Then, during preprocessing, it uses the HEALPix sky-region indexing scheme to compute the sky region of each record, moves catalog records belonging to the same sky region into the same file block, and tags the records so that data from different catalogs can be distinguished. Finally, it cross-identifies the records from different sources within each file block. The advantages of this method are that partitioning during preprocessing reduces the amount of computation, that distributed computing provides a speedup, and that it is easy to scale out; its disadvantage is that preprocessing takes a long time.

With the development of astronomical observation technology, the volume of astronomical data is growing explosively, and an efficient cross-identification method is urgently needed.

Summary of the invention

To address the problems of the prior art described above, the invention proposes a parallel astronomical cross-identification method that builds a multidimensional matrix model from astronomical catalog data and cross-identifies the catalogs in a distributed manner in a cluster system environment.

The invention provides a parallel astronomical cross-identification method that uses a cluster system environment as its execution environment. The method comprises the following steps:

Step 1: Set up the cluster computing environment.

Step 2: Build a multidimensional matrix model from the astronomical catalog data, which specifically comprises:

(1) Selecting the matrix dimensions and matrix cell attributes

The position attributes in the catalog data, namely right ascension (RA) and declination (DEC), are taken as the two dimensions, and the remaining attributes of the catalog data are taken as the attributes of each matrix cell, to build the multidimensional matrix.

(2) Selecting the matrix block size

A matrix block size is selected, and the whole multidimensional matrix is partitioned with this block size into multiple matrix blocks.

Step 3: Distribute the matrix blocks across the cluster, which specifically comprises:

For each astronomical catalog, a file is created for each matrix block according to the number of blocks obtained in Step 2. The matrix block to which each record belongs is computed from its position (RA and DEC), and the record is written to the corresponding file. The files are then distributed to the nodes of the cluster in round-robin fashion: block i is placed on node i % n, where n is the number of nodes.

Step 4: Perform the distributed computation, which specifically comprises:

On each node of the cluster, as many threads are started as the node has CPUs. For the two catalogs to be cross-identified in parallel, the pairs of same-numbered data files of the two catalogs are assigned in round-robin fashion to the threads of the node holding them; each thread performs the cross-identification computation and writes its results to a temporary file.

After all threads have finished, all temporary files are merged to obtain the final identification result.

Compared with the prior art, the invention provides a cross-identification method that is easy to operate, highly parallel, and highly scalable, and that improves cross-identification performance while preserving the correctness of the identification.

Description of the drawings

Fig. 1 is a schematic diagram of the matrix model of astronomical catalog data;

Fig. 2 is a schematic diagram of matrix block distribution across the cluster;

Fig. 3 is a schematic diagram of multi-thread cooperation on a node.

Detailed description

The specific embodiments, structure, features, and effects of the invention are described in detail below with reference to the accompanying drawings and preferred embodiments.

The invention proposes a parallel astronomical cross-identification method, implemented in a distributed manner on the basis of a multidimensional matrix model. It consists of the following four steps:

Step 1: Set up the environment, that is, build the cluster computing environment.

Step 2: Build the multidimensional matrix model

Astronomical catalog data consist mainly of position information (right ascension, abbreviated RA, and declination, abbreviated DEC) and other observed values. The matrix that the invention builds for the astronomical data is shown in Fig. 1. The model is built as follows:

(1) Select the matrix dimensions and matrix cell attributes

Catalog data have multiple attributes. The RA attribute (right ascension, range 0 to 360 degrees) and the DEC attribute (declination, range -90 to 90 degrees) are taken as the two dimensions, and the remaining attributes are taken as the attributes of each matrix cell, giving a two-dimensional matrix. RA and DEC are scaled up by a factor determined by their precision in the catalog data so that they become integers. For example, if RA and DEC both have a precision of 10^-6, their values are multiplied by 10^6, so that RA ranges from 0 to 360000000 and DEC from -90000000 to 90000000, and the matrix size is 360000000 × 180000000.
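The integer scaling described above might look like the following minimal sketch. The function name `to_matrix_coords` and the use of `Decimal` are assumptions made for illustration. The +90 degree shift of DEC is also an assumption, not something the patent states: it maps the range -90000000 to 90000000 onto 0 to 180000000 so that the row index is non-negative.

```python
from decimal import Decimal

def to_matrix_coords(ra, dec, precision_digits=6):
    """Map (RA, DEC) in degrees to integer matrix coordinates (col, row).

    precision_digits is n for a catalog precision of 10^-n, so the scale factor is 10^n.
    """
    scale = 10 ** precision_digits
    # Decimal(str(...)) avoids binary floating-point rounding when scaling catalog values.
    col = int(Decimal(str(ra)) * scale)
    # Assumption (not in the source): shift DEC by +90 degrees so rows start at 0.
    row = int((Decimal(str(dec)) + 90) * scale)
    return col, row
```

Passing the coordinates as strings, as they appear in a text catalog, keeps every digit of the stated precision.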

(2) Select the matrix block size

Choosing a suitable matrix block size for partitioning the matrix increases the degree of parallelism and thus improves data-processing performance. For example, if the matrix is divided into 3600 blocks, each block has size 6000000 × 3000000.
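With the example numbers above, 3600 blocks of 6000000 × 3000000 correspond to a 60 × 60 grid. The sketch below computes a block number from integer matrix coordinates. The row-major numbering and the clamping of the upper edges are assumptions, since the patent does not say how blocks are numbered, and the row coordinate is assumed to have been shifted to the range 0 to 180000000.

```python
BLOCK_W = 6_000_000                  # block width along RA, in scaled units
BLOCK_H = 3_000_000                  # block height along DEC, in scaled units
N_COLS = 360_000_000 // BLOCK_W      # 60 blocks along RA
N_ROWS = 180_000_000 // BLOCK_H      # 60 blocks along DEC

def block_id(col, row):
    """Return a row-major block number in 0..3599 for integer matrix coordinates.

    Assumes row has been shifted to 0..180000000 (an illustrative convention).
    """
    # Clamp the upper edges (RA = 360, DEC = +90) into the last block.
    bc = min(col // BLOCK_W, N_COLS - 1)
    br = min(row // BLOCK_H, N_ROWS - 1)
    return br * N_COLS + bc
```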

Step 3: Distribute the matrix blocks across the cluster

For each catalog, a file is created for each matrix block according to the number of blocks chosen in Step 2 (2); each file is tagged with its matrix block number. The matrix block to which each catalog record belongs is computed from its position, and the record is written to the corresponding file. The files are then distributed to the nodes of the cluster in round-robin fashion: block i is placed on node i % n.
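The file-per-block step and the i % n placement rule can be sketched as below. The file naming scheme, the record layout, and the helper names are hypothetical. For brevity this sketch writes files only for non-empty blocks, whereas the text creates a file for every block, and the actual transfer of files to cluster nodes is left out.

```python
import os
from collections import defaultdict

def write_block_files(records, catalog_name, out_dir, block_of):
    """Group catalog records by block and write one file per non-empty block.

    records: iterable of (col, row, line) tuples.
    block_of: function (col, row) -> block number.
    Returns a dict mapping block number to file path.
    """
    groups = defaultdict(list)
    for col, row, line in records:
        groups[block_of(col, row)].append(line)
    paths = {}
    for b, lines in groups.items():
        # The block number is embedded in the file name as the file's tag.
        path = os.path.join(out_dir, f"{catalog_name}_block{b:04d}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")
        paths[b] = path
    return paths

def node_for_block(block, n_nodes):
    """Round-robin placement: block i goes to node i % n."""
    return block % n_nodes
```

Because both catalogs use the same block function and the same placement rule, files with the same block number from the two catalogs end up on the same node.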

Step 4: Perform the distributed computation

On each node, as many threads are started as the node has CPUs. For the two catalogs to be cross-identified in parallel, Step 3 guarantees that each node holds the same number of files, with the same block numbers, from both catalogs, and that files with the same number cover the same coordinate range. The pairs of same-numbered data files of the two catalogs are therefore assigned in round-robin fashion to the threads on the node; each thread performs the cross-identification computation and writes its results to a temporary file.

After all threads have finished, all temporary files are merged to obtain the final result.
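A per-node version of this step could be sketched as follows. All names here are hypothetical, the brute-force comparison inside a block stands in for whatever matching routine is actually used, and the thread pool hands file pairs to idle workers rather than following a strict round-robin order. The sketch merges only the temporary files of one node; collecting results across nodes is not shown.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def match_block(block, path_a, path_b, tmp_dir, match_fn):
    """Cross-identify one pair of same-numbered block files; write a temporary result file."""
    with open(path_a, encoding="utf-8") as fa, open(path_b, encoding="utf-8") as fb:
        rows_a = fa.read().splitlines()
        rows_b = fb.read().splitlines()
    out_path = os.path.join(tmp_dir, f"result_{block:04d}.tmp")
    with open(out_path, "w", encoding="utf-8") as out:
        # Brute-force comparison within the block; match_fn decides whether two rows match.
        for a in rows_a:
            for b in rows_b:
                if match_fn(a, b):
                    out.write(f"{a}\t{b}\n")
    return out_path

def run_node(files_a, files_b, tmp_dir, match_fn, result_path):
    """Process every block present in both catalogs on this node, then merge the results.

    files_a, files_b: dicts mapping block number to file path for each catalog.
    """
    blocks = sorted(set(files_a) & set(files_b))
    n_threads = os.cpu_count() or 1   # one thread per CPU, as in the text
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(match_block, b, files_a[b], files_b[b], tmp_dir, match_fn)
                   for b in blocks]
        tmp_paths = [f.result() for f in futures]
    # Merge the temporary files in block order into one result file.
    with open(result_path, "w", encoding="utf-8") as merged:
        for p in tmp_paths:
            with open(p, encoding="utf-8") as f:
                merged.write(f.read())
    return result_path
```

In CPython, threads running pure-Python comparisons share the interpreter lock, so a process pool (`concurrent.futures.ProcessPoolExecutor`) may be a better fit for CPU-bound matching; threads are used here only to mirror the description.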

Claims (2)

1. A parallel astronomical cross-identification method using a cluster system environment as its execution environment, characterized in that the method comprises the following steps:

Step 1: Set up the cluster computing environment;

Step 2: Build a multidimensional matrix model from the astronomical catalog data, which specifically comprises:

(1) Selecting the matrix dimensions and matrix cell attributes

The position attributes in the catalog data, namely right ascension (RA) and declination (DEC), are taken as the two dimensions, and the remaining attributes of the astronomical catalog data are taken as the attributes of each matrix cell, to build the multidimensional matrix;

(2) Selecting the matrix block size

A matrix block size is selected, and the whole multidimensional matrix is partitioned with this block size into multiple matrix blocks;

Step 3: Distribute the matrix blocks across the cluster, which specifically comprises:

For each astronomical catalog, a file is created for each matrix block according to the number of blocks obtained in Step 2; the matrix block to which each record belongs is computed from its position (RA and DEC), and the record is written to the corresponding file; the files are then distributed to the nodes of the cluster in round-robin fashion, that is, block i is placed on node i % n;

Step 4: Perform the distributed computation, which specifically comprises:

On each node of the cluster system, as many threads are started as the node has CPUs; for the two catalogs to be cross-identified in parallel, the pairs of same-numbered data files of the two astronomical catalogs are assigned in round-robin fashion to the threads of the node holding them for the cross-identification computation, and the results are written to temporary files; after all threads have finished, all temporary files are merged to obtain the final identification result.

2. The parallel astronomical cross-identification method according to claim 1, characterized in that, in the step of taking the position attributes in the catalog data, namely RA and DEC, as the two dimensions, RA and DEC are each scaled up by a factor determined by their precision in the astronomical catalog data so as to become integers; if the precision is 10^-n, the scale factor is 10^n.
CN201210194308.5A 2012-06-13 2012-06-13 Parallel astronomical cross identification method Expired - Fee Related CN102768675B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210194308.5A CN102768675B (en) 2012-06-13 2012-06-13 Parallel astronomical cross identification method

Publications (2)

Publication Number Publication Date
CN102768675A (en) 2012-11-07
CN102768675B (en) 2014-11-12

Family

ID=47096079

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210194308.5A Expired - Fee Related CN102768675B (en) 2012-06-13 2012-06-13 Parallel astronomical cross identification method

Country Status (1)

Country Link
CN (1) CN102768675B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491471A (en) * 2017-06-19 2017-12-19 天津科技大学 Spark-based method for generating sky area coverage for large-scale astronomical data
CN111414572A (en) * 2020-04-10 2020-07-14 中国科学院国家天文台 Cross-certification method, device and readable storage medium for radio star catalogue and infrared star catalogue
CN113485638A (en) * 2021-06-07 2021-10-08 贵州大学 Access optimization system for massive astronomical data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6816848B1 (en) * 2000-06-12 2004-11-09 Ncr Corporation SQL-based analytic algorithm for cluster analysis
CN101710286A (en) * 2009-12-23 2010-05-19 天津大学 Parallel programming model system of DAG oriented data driving type application and realization method
CN101826016A (en) * 2010-05-13 2010-09-08 天津大学 Visual modeling and code skeleton generating method for supporting design of multinuclear parallel program
CN101887367A (en) * 2010-06-22 2010-11-17 天津大学 A Multilevel Parallel Programming Method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
宋烜 等: "用MapReduce实现天文星表交叉认证", 《计算机应用研究》 *
赵青 等: "基于MapReduce模型的分布式天文交叉证认", 《计算机应用研究》 *
赵青 等: "面向海量数据的并行天文交叉证认", 《计算机应用》 *
赵青: "面向海量数据的高效天文交叉证认的研究", 《中国博士学位论文全文数据库 信息科技辑 》 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107491471A (en) * 2017-06-19 2017-12-19 天津科技大学 Spark-based method for generating sky area coverage for large-scale astronomical data
CN107491471B (en) * 2017-06-19 2020-05-22 天津科技大学 A Spark-based method for generating sky area coverage for large-scale astronomical data
CN111414572A (en) * 2020-04-10 2020-07-14 中国科学院国家天文台 Cross-certification method, device and readable storage medium for radio star catalogue and infrared star catalogue
CN111414572B (en) * 2020-04-10 2023-06-13 中国科学院国家天文台 Method, device and readable storage medium for cross-certification of radio star catalog and infrared star catalog
CN113485638A (en) * 2021-06-07 2021-10-08 贵州大学 Access optimization system for massive astronomical data

Also Published As

Publication number Publication date
CN102768675B (en) 2014-11-12

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20141112
