CN102768675A - A Parallel Astronomical Cross-certification Method - Google Patents
- Publication number
- CN102768675A
- Authority
- CN
- China
- Prior art keywords
- astronomical
- matrix
- cross
- node
- catalog data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The invention relates to the field of astronomical cross-identification, and in particular to a cross-identification method for massive multi-band astronomical catalog data.
Background Art
Astronomical cross-identification is a key technique for fusing multi-band catalog data. It associates the sources in different catalogs by their positions: sources whose positions coincide, or agree to within a given error range, are identified as the same celestial object.
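The position test described above can be sketched in a few lines of Python. This is only an illustration, not the patent's implementation: the function names, the tuple layout `(ra_deg, dec_deg)`, and the 1-arcsecond default radius are assumptions, and the haversine formula is one common way to compute the angular separation.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two (RA, DEC) positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula; it stays numerically stable for very small separations.
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def is_same_object(src_a, src_b, radius_arcsec=1.0):
    """Return True if two (ra_deg, dec_deg) sources lie within the matching radius.

    radius_arcsec is a hypothetical error tolerance; the source text does not fix a value.
    """
    sep_deg = angular_separation_deg(src_a[0], src_a[1], src_b[0], src_b[1])
    return sep_deg * 3600.0 <= radius_arcsec
```

For instance, two sources that differ by 0.0002 degrees in declination (0.72 arcseconds) would count as the same object under the assumed 1-arcsecond radius.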
A literature search shows that two kinds of cross-identification techniques are in common use in China and abroad:
The first is cross-identification based on a relational database: the catalog data are stored in a database, indexed, and queried, and the cross-identification is computed there. A representative example is OpenSkyQuery (http://openskyquery.net) of the US virtual observatory, which uses Microsoft SQL Server as the underlying database and has a built-in, pure-SQL cross-identification algorithm. However, this approach is limited by the database system and by memory capacity, so only a small amount of data can be cross-identified at a time.
The second is cross-identification based on text data processing, that is, methods that read, parse, and cross-identify massive text data. A representative example is the Hadoop-cluster method (Zhao Qing, Sun Jizhou, et al. "Distributed astronomical cross-identification based on the MapReduce model" [J]. Application Research of Computers, 2010(9).). It first uploads the text data to Hadoop's HDFS file system. Then, using the HEALPix sky-region indexing scheme, a preprocessing pass computes the sky region of each record, moves catalog records belonging to the same sky region into the same file block, and tags the records so that data from different catalogs can be told apart. Finally, records from different catalogs within the same file block are cross-identified. The advantages of this method are that the partitioning done in preprocessing reduces the amount of computation, distributed computation speeds it up, and it scales easily; the drawback is that preprocessing takes a long time.
With the development of astronomical observation technology, astronomical data are growing explosively, and an efficient cross-identification method is urgently needed.
Summary of the Invention
In view of the problems of the prior art described above, the invention proposes a parallel astronomical cross-identification method that builds a multidimensional matrix model from astronomical catalog data and cross-identifies the catalogs by distributed execution in a cluster environment.
The invention provides a parallel astronomical cross-identification method that uses a cluster system as its execution environment. The method comprises the following steps:
Step 1: Set up the cluster computing environment;
Step 2: Build a multidimensional matrix model from the astronomical catalog data, which comprises the following:
(1) Select the matrix dimensions and the matrix cell attributes
The position attributes of the catalog data, namely right ascension (RA) and declination (DEC), are taken as the two dimensions, and the other attributes of the catalog data are taken as the attributes of each matrix cell, to build the multidimensional matrix;
(2) Select the matrix block size
A block size is chosen and the whole multidimensional matrix is divided by that block size into a number of matrix blocks;
Step 3: Distribute the matrix blocks across the cluster, which comprises the following:
For each astronomical catalog, a file is created for each of the matrix blocks defined in Step 2. The matrix block to which each catalog record belongs is computed from its position (RA, DEC), and the record is written to the corresponding file. The files are then distributed round-robin to the nodes of the cluster, that is, block i is placed on node i % n, where n is the number of nodes;
Step 4: Perform the distributed computation, which comprises the following:
On each node of the cluster, as many threads are started as the node has CPUs. For the two catalogs to be cross-identified in parallel, the pairs of files with the same block number, one from each catalog, are assigned round-robin to the threads on the node that holds them; each thread performs the cross-identification computation and writes its result to a temporary file;
After all threads have finished, all temporary files are merged to give the final identification result.
Compared with the prior art, the invention provides a cross-identification method that is easy to operate, highly parallel, and highly scalable, and that improves cross-identification performance while preserving the correctness of the identification.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the matrix model of astronomical catalog data;
Fig. 2 is a schematic diagram of the distribution of matrix blocks across the cluster;
Fig. 3 is a schematic diagram of multi-thread cooperation on a node.
Detailed Description of the Embodiments
The specific implementation, structure, features, and effects of the invention are described in detail below with reference to the drawings and preferred embodiments.
The invention proposes a parallel astronomical cross-identification method, implemented in a distributed manner on a multidimensional matrix model. It consists of four main steps:
Step 1: Environment setup. Set up the cluster computing environment.
Step 2: Build the multidimensional matrix model
Astronomical catalog data consist mainly of position information (right ascension, abbreviated RA, and declination, abbreviated DEC) and other observed values. The matrix that the invention builds for the astronomical data is shown in Fig. 1. The multidimensional matrix model is built as follows:
(1) Select the matrix dimensions and the matrix cell attributes
Catalog data have many attributes. The RA (right ascension, range 0 to 360) and DEC (declination, range -90 to 90) attributes are taken as the two dimensions, and the remaining attributes become the attributes of each matrix cell, giving a two-dimensional matrix. RA and DEC are multiplied by a factor chosen according to their precision in the catalog so that they become integers. For example, if RA and DEC both have a precision of 10^-6, their values are multiplied by 10^6, so that RA ranges over 0 to 360000000 and DEC over -90000000 to 90000000, and the matrix size is 360000000 × 180000000.
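The scaling in sub-step (1) could look roughly like the sketch below. The function names and the `decimals` parameter are hypothetical; the only facts taken from the text are the factor 10^6 for a precision of 10^-6 and the resulting matrix size.

```python
def to_matrix_index(ra_deg, dec_deg, decimals=6):
    """Scale RA/DEC to integer matrix coordinates.

    decimals is the assumed number of decimal places in the catalog
    (6 corresponds to the 10^-6 precision used in the example).
    """
    scale = 10 ** decimals
    # round() rather than int() so that floating-point noise such as
    # 12345677.999999998 does not truncate to the wrong cell.
    return round(ra_deg * scale), round(dec_deg * scale)

def matrix_shape(decimals=6):
    """Extent of the matrix along RA (360 degrees) and DEC (180 degrees)."""
    scale = 10 ** decimals
    return 360 * scale, 180 * scale
```

With the default precision, `matrix_shape()` gives 360000000 by 180000000, matching the size stated above.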
(2) Select the matrix block size
A well-chosen block size for partitioning the matrix raises the degree of parallelism and hence data-processing performance. For example, if the matrix is divided into 3600 blocks (60 along RA by 60 along DEC), each block measures 6000000 × 3000000.
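The block arithmetic of this example can be checked with a short sketch. The constants come from the text; the row-major block numbering and the clamping of the upper boundary (RA = 360, DEC = 90) into the last block are assumptions, since the source does not say how blocks are numbered.

```python
RA_SPAN = 360_000_000     # scaled RA extent, 0 .. 360000000
DEC_SPAN = 180_000_000    # scaled DEC extent, -90000000 .. 90000000
DEC_OFFSET = 90_000_000   # shift DEC so that it starts at 0
BLOCK_RA = 6_000_000      # block width along RA
BLOCK_DEC = 3_000_000     # block height along DEC

BLOCKS_PER_ROW = RA_SPAN // BLOCK_RA    # 60 blocks along RA
BLOCKS_PER_COL = DEC_SPAN // BLOCK_DEC  # 60 blocks along DEC

def block_id(ra_idx, dec_idx):
    """Map integer matrix coordinates to a block number in 0 .. 3599.

    Numbering is row-major (an assumption): blocks are counted along RA first,
    then along DEC. Points on the upper boundary are clamped into the last block.
    """
    col = min(ra_idx // BLOCK_RA, BLOCKS_PER_ROW - 1)
    row = min((dec_idx + DEC_OFFSET) // BLOCK_DEC, BLOCKS_PER_COL - 1)
    return row * BLOCKS_PER_ROW + col
```

Choosing a smaller block size yields more blocks and therefore more units of work to spread over nodes and threads, at the cost of more files.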
Step 3: Distribute the matrix blocks across the cluster
For each catalog, a file is created for each of the matrix blocks defined in sub-step (2) of Step 2 (each file is tagged with its matrix block number). The matrix block to which each catalog record belongs is computed from its position, and the record is written to the corresponding file. The files are then distributed round-robin to the nodes of the cluster, that is, block i is placed on node i % n, where n is the number of nodes.
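A minimal sketch of this step follows, under several assumptions: records are tuples whose first two fields are the integer RA and DEC coordinates, files are CSV, the file-name pattern is invented for illustration, and `block_of` is any function that maps coordinates to a block number. Unlike the text, the sketch creates files only for non-empty blocks and buckets the records in memory, which would not suit a truly massive catalog.

```python
import csv
import os
from collections import defaultdict

def write_block_files(rows, block_of, out_dir, catalog_name):
    """Write each record to the file of its matrix block; return {block: path}.

    rows: iterable of (ra_idx, dec_idx, *other_attributes)
    block_of: function (ra_idx, dec_idx) -> block number (hypothetical helper)
    """
    os.makedirs(out_dir, exist_ok=True)
    buckets = defaultdict(list)
    for row in rows:
        buckets[block_of(row[0], row[1])].append(row)
    paths = {}
    for block, members in buckets.items():
        # The block number in the file name serves as the tag mentioned in the text.
        path = os.path.join(out_dir, f"{catalog_name}_block{block}.csv")
        with open(path, "w", newline="") as fh:
            csv.writer(fh).writerows(members)
        paths[block] = path
    return paths

def node_for_block(block, n_nodes):
    """Round-robin placement: block i goes to node i % n."""
    return block % n_nodes
```

Because the node depends only on the block number, two catalogs partitioned with the same `block_of` end up with same-numbered files on the same node, which is what the next step relies on.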
Step 4: Perform the distributed computation
On each node, as many threads are started as the node has CPUs. For the two catalogs to be cross-identified in parallel, Step 3 guarantees that on every node they have the same number of files with the same block numbers, and that files with the same number cover the same coordinate range. The pairs of same-numbered data files from the two catalogs are therefore assigned round-robin to the threads on the node, which perform the cross-identification computation and write the results to temporary files.
After all threads have finished, all temporary files are merged to give the final result.
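The per-node work in Step 4 might be sketched as below. Everything here is illustrative: blocks are passed as in-memory dictionaries rather than files, the matching inside a block is brute force, the temporary-file naming is invented, and `ThreadPoolExecutor` hands tasks to idle workers from a queue rather than in strict round-robin order. In CPython, threads also do not run CPU-bound Python code truly in parallel, so a process pool may be a better fit in practice.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor

def separation_deg(a, b):
    """Angular separation in degrees between two (ra_deg, dec_deg, id) sources."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def match_block(block, sources_a, sources_b, radius_deg, tmp_dir):
    """Cross-match one pair of same-numbered blocks and write matches to a temp file."""
    path = os.path.join(tmp_dir, f"matches_{block}.tmp")
    with open(path, "w") as fh:
        for a in sources_a:
            for b in sources_b:
                if separation_deg(a, b) <= radius_deg:
                    fh.write(f"{a[2]},{b[2]}\n")
    return path

def cross_match_node(blocks_a, blocks_b, radius_deg, tmp_dir):
    """Match all block pairs held by this node, then merge the temporary files.

    blocks_a / blocks_b: dict mapping block number -> list of (ra_deg, dec_deg, id).
    """
    shared = sorted(set(blocks_a) & set(blocks_b))
    # One worker thread per CPU, as in the text.
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        futures = [pool.submit(match_block, k, blocks_a[k], blocks_b[k],
                               radius_deg, tmp_dir) for k in shared]
        tmp_paths = [f.result() for f in futures]
    merged = []
    for path in tmp_paths:
        with open(path) as fh:
            merged.extend(line.strip() for line in fh if line.strip())
    return merged
```

Like the procedure described above, this sketch compares only same-numbered blocks; it makes no attempt to handle a pair of sources that falls on opposite sides of a block boundary.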
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210194308.5A CN102768675B (en) | 2012-06-13 | 2012-06-13 | Parallel astronomical cross identification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102768675A (en) | 2012-11-07 |
CN102768675B (en) | 2014-11-12 |
Family
ID=47096079
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210194308.5A Expired - Fee Related CN102768675B (en) | 2012-06-13 | 2012-06-13 | Parallel astronomical cross identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102768675B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6816848B1 (en) * | 2000-06-12 | 2004-11-09 | Ncr Corporation | SQL-based analytic algorithm for cluster analysis |
CN101710286A (en) * | 2009-12-23 | 2010-05-19 | 天津大学 | Parallel programming model system of DAG oriented data driving type application and realization method |
CN101826016A (en) * | 2010-05-13 | 2010-09-08 | 天津大学 | Visual modeling and code skeleton generating method for supporting design of multinuclear parallel program |
CN101887367A (en) * | 2010-06-22 | 2010-11-17 | 天津大学 | A Multilevel Parallel Programming Method |
Non-Patent Citations (4)
Title |
---|
宋烜 等: "用MapReduce实现天文星表交叉认证", 《计算机应用研究》 * |
赵青 等: "基于MapReduce模型的分布式天文交叉证认", 《计算机应用研究》 * |
赵青 等: "面向海量数据的并行天文交叉证认", 《计算机应用》 * |
赵青: "面向海量数据的高效天文交叉证认的研究", 《中国博士学位论文全文数据库 信息科技辑 》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107491471A (en) * | 2017-06-19 | 2017-12-19 | 天津科技大学 | Extensive chronometer data day area covering generation method based on Spark |
CN107491471B (en) * | 2017-06-19 | 2020-05-22 | 天津科技大学 | A Spark-based method for generating sky area coverage for large-scale astronomical data |
CN111414572A (en) * | 2020-04-10 | 2020-07-14 | 中国科学院国家天文台 | Cross-certification method, device and readable storage medium for radio star catalogue and infrared star catalogue |
CN111414572B (en) * | 2020-04-10 | 2023-06-13 | 中国科学院国家天文台 | Method, device and readable storage medium for cross-certification of radio star catalog and infrared star catalog |
CN113485638A (en) * | 2021-06-07 | 2021-10-08 | 贵州大学 | Access optimization system for massive astronomical data |
Also Published As
Publication number | Publication date |
---|---|
CN102768675B (en) | 2014-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104376053B (en) | A kind of storage and retrieval method based on magnanimity meteorological data | |
Shao et al. | Efficient cohesive subgraphs detection in parallel | |
覃雄派 et al. | Big data analysis—competition and symbiosis of RDBMS and MapReduce | |
Neelakandan et al. | Large scale optimization to minimize network traffic using MapReduce in big data applications | |
CN103793442B (en) | The processing method and system of spatial data | |
CN102662639A (en) | Mapreduce-based multi-GPU (Graphic Processing Unit) cooperative computing method | |
CN103413067A (en) | Abstract convex lower-bound estimation based protein structure prediction method | |
CN105956666B (en) | A kind of machine learning method and system | |
US11030714B2 (en) | Wide key hash table for a graphics processing unit | |
CN105405070A (en) | Distributed memory power grid system construction method | |
CN103699656A (en) | GPU-based mass-multimedia-data-oriented MapReduce platform | |
Wang et al. | Distributed storage and index of vector spatial data based on HBase | |
Liu et al. | Profiling and improving i/o performance of a large-scale climate scientific application | |
You et al. | Spatial join query processing in cloud: Analyzing design choices and performance comparisons | |
CN105718561A (en) | Particular distributed data storage file structure redundancy removing construction method and system | |
Ding et al. | ComMapReduce: An improvement of MapReduce with lightweight communication mechanisms | |
Hashem et al. | An Integrative Modeling of BigData Processing. | |
CN107358061A (en) | Elasticity distribution formula sequence alignment system and method based on Spark and SIMD | |
CN102768675A (en) | A Parallel Astronomical Cross-certification Method | |
Ji et al. | Scalable nearest neighbor query processing based on inverted grid index | |
Mittal et al. | Efficient random data accessing in MapReduce | |
Xun et al. | Parallel spatial index algorithm based on Hilbert partition | |
Zhou et al. | SparkSCAN: a structure similarity clustering algorithm on spark | |
Ding et al. | Commapreduce: An improvement of mapreduce with lightweight communication mechanisms | |
CN106484818B (en) | A Hierarchical Clustering Method Based on Hadoop and HBase |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20141112 |