CN104199963A - Method and device for HBase data backup and recovery - Google Patents

Info

Publication number
CN104199963A
Authority
CN
China
Prior art keywords
data
hbase
file
recovery
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410483014.3A
Other languages
Chinese (zh)
Inventor
刘璧怡
郭美思
吴楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410483014.3A
Publication of CN104199963A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14: Error detection or correction of the data by redundancy in operation
    • G06F11/1402: Saving, restoring, recovering or retrying
    • G06F11/1446: Point-in-time backing up or restoration of persistent data
    • G06F11/1458: Management of the backup or restore process

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and device for HBase data backup and recovery, comprising: when HBase data is backed up, flushing the data in HBase memory into HFile files; creating a corresponding reference file for each HFile file in each Region of the HBase table structure, and backing up the HFile files of each HBase table; when HBase data is recovered, if the data to be recovered is persistent data, recovering it according to the reference files corresponding to that data; and if the data to be recovered is in-memory data, recovering the HBase in-memory data according to the log files. The invention can back up and recover HBase data efficiently and completely.

Description

Method and device for HBase data backup and recovery

Technical field

The invention relates to the technical field of data processing, and in particular to a method and a device for backing up and recovering HBase (Hadoop Database) data.

Background

With the arrival of the era of massive data, computing models have evolved through several forms. The move from single computers to distributed computing is an inevitable consequence of continually growing data volumes. At present, traditional databases cannot meet the demands of analyzing, managing, and mining large data sets: according to statistics, the structured data handled by database tools is on the order of gigabytes, and traditional technology cannot scale beyond that. Among the available technologies and tools, the most mature is the Hadoop storage and computing framework together with the components built on top of it. Hadoop itself consists of the Hadoop Distributed File System (HDFS) and the distributed computing framework MapReduce. MapReduce is mainly suited to batch file processing, whereas HBase is the main technology used for real-time data analysis and processing.

HBase is built on top of the distributed file system HDFS. It is a distributed database system offering high reliability, column-oriented storage, high performance, scalability, and real-time reads and writes. It sits between non-relational and relational databases, and retrieves data by primary key or primary key range. Data in HBase changes over time, so using the data as it stood during a particular period requires backing it up and later restoring it. HBase data backup and recovery is therefore very important.

HBase stores data in two parts: one part is held in memory, and the other is persisted to HDFS in the form of HFile (Hadoop File) files. Both parts must therefore be covered when data is backed up and recovered. However, existing HBase backup and recovery methods require the HBase service to be stopped, which disrupts user operations, and they perform backup and recovery from the HBase log files, which makes it difficult to guarantee the integrity of both parts of the data.

Summary of the invention

To solve the above technical problem, the invention provides an HBase data backup and recovery method and device that can back up and recover HBase data efficiently and completely.

To achieve this object, the invention provides an HBase data backup and recovery method, comprising: when HBase data is backed up, flushing the data in HBase memory into HFile files; creating a corresponding reference file for each HFile file in each Region of the HBase table structure, and backing up the HFile files of each HBase table; when HBase data is recovered, if the data to be recovered is persistent data, recovering it according to the reference files corresponding to that data; and if the data to be recovered is in-memory data, recovering the HBase in-memory data according to the log files.

Further, recovering persistent data according to the corresponding reference files comprises: if the data to be recovered is persistent data, placing the reference files corresponding to that data into the corresponding Region folder of the HBase table.

Further, the recovered persistent data is consolidated into complete data when the Regions of HBase are merged. The merge operation merges multiple HFile files into one large file and is triggered automatically when the HBase merge configuration parameter value is reached.

Further, the log file records the HBase table on which an operation was performed, the Region of that table, and the operation itself.

Further, recovering in-memory data according to the log files comprises: if the data to be recovered is in-memory data, restoring the log files to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; once the reference files and logs have been placed in the corresponding locations under the Region folder, starting the HBase table, whereupon each Region is assigned to a RegionServer and reads its log files into its own MemStore, completing the HBase data recovery.

The invention also provides an HBase data backup and recovery device, comprising: a backup unit, configured to flush the data in HBase memory into HFile files when HBase data is backed up, to create a corresponding reference file for each HFile file in each Region of the HBase table structure, and to back up the HFile files of each HBase table; and a recovery unit, configured, when HBase data is recovered, to recover persistent data according to the reference files corresponding to the data to be recovered if that data is persistent data, and to recover the HBase in-memory data according to the log files if that data is in-memory data.

Compared with the prior art, the invention comprises: when HBase data is backed up, flushing the data in HBase memory into HFile files; creating a corresponding reference file for each HFile file in each Region of the HBase table structure, and backing up the HFile files of each HBase table; when HBase data is recovered, recovering persistent data according to the corresponding reference files, and recovering in-memory data according to the log files. By flushing the data in HBase memory into HFile files and backing up the HFile files uniformly, the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, creating a reference file for each HFile file amounts to backing up the location where the data is stored in HBase; during recovery, the data is restored through the reference files, so the data remains usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.

Description of drawings

Fig. 1 is a schematic diagram of the framework in which HBase stores data.

Fig. 2 is a schematic flowchart of the HBase data backup and recovery method of the invention.

Fig. 3 is a schematic structural diagram of the HBase data backup and recovery device of the invention.

Detailed description

The invention is described in further detail below with reference to the accompanying drawings. The embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. Logical, implementation, and other changes may be made to the embodiments without departing from the spirit and scope of the invention.

Fig. 1 is a schematic diagram of the framework in which HBase stores data. As shown in Fig. 1, an HBase table is logically stored on RegionServers in the form of Regions. When the number of records in an HBase table grows beyond a certain threshold, the table is automatically split into multiple Regions, and the Master assigns the different Regions to RegionServers for management.
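The threshold-triggered split described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `split_region` and `max_rows` are hypothetical names, a Region is modelled as a plain dictionary of rows, and the split point is simply the middle row key. It does not reproduce HBase's actual split policy.

```python
def split_region(rows: dict, max_rows: int) -> list:
    """Return [rows] unchanged, or two halves split at the middle row key.

    Hypothetical model: `rows` maps row key -> value for one Region and
    `max_rows` stands in for the threshold mentioned in the description.
    """
    if len(rows) <= max_rows:
        return [rows]
    keys = sorted(rows)
    mid = len(keys) // 2
    left = {k: rows[k] for k in keys[:mid]}    # lower half of the key range
    right = {k: rows[k] for k in keys[mid:]}   # upper half of the key range
    return [left, right]


small = split_region({"a": 1}, max_rows=3)
large = split_region({"a": 1, "b": 2, "c": 3, "d": 4}, max_rows=3)
print(len(small), len(large))
```

Each resulting half covers a contiguous key range, which is what allows the two new Regions to be assigned to different RegionServers.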

A Region is the smallest unit of distributed storage and load balancing in HBase. Different Regions can be placed on different RegionServers, but a single Region cannot be split across multiple RegionServers. Note that a Region is the smallest unit of distributed storage but not the smallest unit of storage; the smallest unit of storage is the Store.

A Region consists of one or more Stores, and each Store holds one column family. Each Store in turn consists of one in-memory buffer (MemStore) and multiple StoreFiles, where each StoreFile is persisted on HDFS in the form of an HFile. Consequently, backing up HBase data requires backing up both the MemStore and the HFile files, and recovering HBase data requires recovering both the HFile files and the MemStore data.
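The Region / Store / MemStore / StoreFile hierarchy can be written down as a small data model. The sketch below is illustrative only: the class and function names are assumptions chosen to mirror the terms in the description, and plain dictionaries stand in for the real on-disk and in-memory structures.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """One column family: an in-memory buffer plus persisted files (toy model)."""
    memstore: dict = field(default_factory=dict)     # row key -> value, not yet persisted
    store_files: list = field(default_factory=list)  # each entry models one HFile as a dict


@dataclass
class Region:
    """A Region holds one Store per column family (toy model)."""
    name: str
    stores: dict = field(default_factory=dict)       # column family name -> Store


def unpersisted_rows(region: Region) -> int:
    """Count rows that exist only in MemStores, i.e. the part a file copy alone would miss."""
    return sum(len(store.memstore) for store in region.stores.values())


region = Region("region-a", {"cf": Store()})
region.stores["cf"].store_files.append({"row0": "v0"})  # already persisted
region.stores["cf"].memstore["row1"] = "v1"             # still only in memory
print(unpersisted_rows(region))
```

The model makes the backup problem visible: copying `store_files` alone would miss `row1`, which is why both parts have to be handled.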

Fig. 2 is a schematic flowchart of the HBase data backup and recovery method of the invention. As shown in Fig. 2, the method first backs up the HBase data and then recovers it, and specifically comprises:

Step 21: when HBase data is backed up, flush the data in HBase memory into HFile files.

HBase stores data in two parts: one part is in HBase memory, and the other is persisted to HDFS in the form of HFile files.

When backing up HBase data, the data in HBase memory is first flushed into HFiles so that all data is persisted. The subsequent backup operations can then be performed uniformly on the HFile files.
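The effect of the flush in step 21 can be sketched with a toy model. The function name `flush` and the use of dictionaries for the in-memory buffer and the persisted files are assumptions made for illustration; this is not the HBase flush implementation.

```python
def flush(memstore: dict, store_files: list) -> None:
    """Write all buffered rows out as one new sorted file and empty the buffer.

    Toy model: `memstore` is the in-memory buffer, `store_files` the list of
    persisted files, each represented as a dict ordered by row key.
    """
    if memstore:
        store_files.append(dict(sorted(memstore.items())))
        memstore.clear()


memstore = {"row2": "b", "row1": "a"}
store_files = [{"row0": "z"}]
flush(memstore, store_files)
print(memstore, store_files)
```

After the call the buffer is empty and every row lives in some file, so a backup that copies only files sees the complete data set.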

Step 22: create a corresponding reference file for each HFile file in each Region of the HBase table structure, and back up the HFile files of each HBase table.
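Step 22 can be sketched on an ordinary local file system. Everything concrete here is an assumption for illustration: the directory layout `table/<region>/<name>.hfile`, the `.ref` suffix, the function name `backup_table`, and the choice to store the backup path as the content of the reference file. The patent does not specify the reference file format.

```python
import shutil
import tempfile
from pathlib import Path


def backup_table(table_dir: Path, backup_dir: Path, refs_dir: Path) -> list:
    """Copy each HFile to the backup folder and write a small reference file for it."""
    refs = []
    for hfile in sorted(table_dir.glob("*/*.hfile")):
        region = hfile.parent.name
        target = backup_dir / region / hfile.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(hfile, target)              # back up the data itself
        ref = refs_dir / region / (hfile.name + ".ref")
        ref.parent.mkdir(parents=True, exist_ok=True)
        ref.write_text(str(target))              # the reference records only the location
        refs.append(ref)
    return refs


# Demo on a temporary directory standing in for the table's storage.
root = Path(tempfile.mkdtemp())
table_dir = root / "table"
(table_dir / "region-a").mkdir(parents=True)
(table_dir / "region-a" / "f1.hfile").write_text("row1=a\n")
refs = backup_table(table_dir, root / "backup", root / "refs")
print(refs[0].name)
```

The reference file holds only a path, so it stays tiny regardless of how large the backed-up HFile is.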

Step 23: when HBase data is recovered, determine whether the data to be recovered is persistent data. If so, go to step 24; if not, go to step 25.

Step 24: recover the persistent data according to the reference files corresponding to the data to be recovered.

Specifically, the reference files corresponding to the data to be recovered are placed into the corresponding Region folder of the HBase table. A reference file occupies very little disk space, so copying it is fast.

When data is read, the reference file is used to locate the corresponding data in the backup folder, and the data is read from there.
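Step 24 and the read path through a reference file can be sketched as follows. The layout (`refs/<region>/<name>.ref` files whose content is the path of the backed-up file) and the function names `restore_references` and `read_through` are assumptions for illustration, not details given in the patent.

```python
import shutil
import tempfile
from pathlib import Path


def restore_references(refs_dir: Path, table_dir: Path) -> list:
    """Copy the small reference files into the matching Region folders of the table."""
    restored = []
    for ref in sorted(refs_dir.glob("*/*.ref")):
        target = table_dir / ref.parent.name / ref.name
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(ref, target)
        restored.append(target)
    return restored


def read_through(ref: Path) -> str:
    """Follow a reference file to the backed-up data it points at and read it."""
    return Path(ref.read_text()).read_text()


# Demo: one backed-up file and one reference file pointing at it.
root = Path(tempfile.mkdtemp())
backup_file = root / "backup" / "region-a" / "f1.hfile"
backup_file.parent.mkdir(parents=True)
backup_file.write_text("row1=a\n")
ref = root / "refs" / "region-a" / "f1.hfile.ref"
ref.parent.mkdir(parents=True)
ref.write_text(str(backup_file))

restored = restore_references(root / "refs", root / "table")
print(read_through(restored[0]))
```

Only the references are copied at restore time; the bulk data stays in the backup folder and is reached indirectly on read, which is why the restore itself is fast.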

The recovered persistent data is consolidated into complete data only when the Regions of HBase are merged. The merge operation is triggered automatically when the value of the HBase merge configuration parameter is reached, and it merges multiple HFile files into one large file, so that queries and similar operations need to open only one file instead of several. In this way HBase can resume service as soon as possible after data recovery, which improves efficiency.
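The threshold-triggered merge can be sketched with a toy model. The function name `maybe_merge` and the parameter `threshold` are hypothetical; `threshold` stands in for the merge configuration parameter mentioned above, and each file is modelled as a dictionary of rows with the list ordered oldest first.

```python
def maybe_merge(store_files: list, threshold: int = 3) -> list:
    """Merge all files into one sorted file once their count reaches the threshold.

    Files are listed oldest first, so a newer value for the same row key
    overwrites an older one in the merged result.
    """
    if len(store_files) < threshold:
        return store_files                       # not enough files yet: leave as-is
    merged = {}
    for hfile in store_files:
        merged.update(hfile)
    return [dict(sorted(merged.items()))]


two_files = maybe_merge([{"r1": "a"}, {"r2": "b"}])
three_files = maybe_merge([{"r1": "old"}, {"r2": "b"}, {"r1": "new"}])
print(len(two_files), three_files)
```

Below the threshold nothing happens; at the threshold the files collapse into one, which is the point at which a query needs to open only a single file.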

Step 25: recover the HBase in-memory data according to the log files.

This step recovers data that had not yet been persisted, by restoring the data recorded in the log files into HBase memory. Each Region has its own memory, so the in-memory data must be recovered for each Region separately.

The HBase log file records every operation. Each record includes the HBase table on which the operation was performed, the Region of that table, and the operation itself. Based on the HBase table name and Region name recorded in the log file, the log file is restored to the corresponding Region folder.

Once the reference files and the logs have been placed in the corresponding locations under the Region folder, the HBase table can be started. When the table is started, each Region is assigned to a RegionServer and reads its log files into its own MemStore, which completes the HBase data recovery.
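Replaying log records into per-Region MemStores can be sketched as follows. The record shape `(table, region, operation, row, value)` and the two operation names `put` and `delete` are assumptions for illustration; the description only states that a record carries the table, the Region, and the operation.

```python
from collections import defaultdict


def replay_log(entries: list) -> dict:
    """Rebuild one in-memory store per (table, region) by applying records in order."""
    memstores = defaultdict(dict)
    for table, region, op, row, value in entries:
        store = memstores[(table, region)]       # route the record to its Region
        if op == "put":
            store[row] = value
        elif op == "delete":
            store.pop(row, None)
    return dict(memstores)


entries = [
    ("t1", "r1", "put", "row1", "a"),
    ("t1", "r1", "put", "row2", "b"),
    ("t1", "r2", "put", "row9", "z"),
    ("t1", "r1", "delete", "row2", None),
]
recovered = replay_log(entries)
print(recovered)
```

Because each record names its table and Region, the replay can rebuild every Region's memory independently, matching the per-Region recovery described in step 25.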

By flushing the data in HBase memory into HFile files and backing up the HFile files uniformly, the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, creating a reference file for each HFile file amounts to backing up the location where the data is stored in HBase; during recovery, the data is restored through the reference files, so the data remains usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.

Fig. 3 is a schematic structural diagram of the HBase data backup and recovery device of the invention. As shown in Fig. 3, the device specifically comprises:

A backup unit, configured to flush the data in HBase memory into HFile files when HBase data is backed up, to create a corresponding reference file for each HFile file in each Region of the HBase table structure, and to back up the HFile files of each HBase table;

A recovery unit, configured, when HBase data is recovered, to determine whether the data to be recovered is persistent data (HFile files); if so, to recover the persistent data according to the reference files corresponding to the data to be recovered; if not, to recover the HBase in-memory data according to the log files.

The HBase data backup and recovery device corresponds to the HBase data backup and recovery method, so the implementation details can be found in the description of the method and are not repeated here.

By flushing the data in HBase memory into HFile files and backing up the HFile files uniformly, the HBase data backup and recovery device of the invention guarantees the integrity of both the in-memory part and the HDFS part of the HBase data during backup and recovery. In addition, creating a reference file for each HFile file amounts to backing up the location where the data is stored in HBase; during recovery, the data is restored through the reference files, so the data remains usable while backup and recovery are in progress. HBase data can therefore be backed up and recovered efficiently and completely.

It should be understood that although this specification is organized by embodiment, not every embodiment contains only one independent technical solution. The specification is written this way solely for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.

The detailed descriptions above are merely specific descriptions of feasible embodiments of the invention and are not intended to limit its scope of protection. Any equivalent embodiment or modification that does not depart from the technical spirit of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A method for HBase data backup and recovery, characterized by comprising:
When HBase data is backed up, flushing the data in HBase memory into HFile files; creating a corresponding reference file for each HFile file in each Region of the HBase table structure, and backing up the HFile files of each HBase table;
When HBase data is recovered, if the data to be recovered is persistent data, recovering the persistent data according to the reference files corresponding to the data to be recovered; if the data to be recovered is in-memory data, recovering the HBase in-memory data according to the log files.
2. The method for HBase data backup and recovery according to claim 1, characterized in that recovering the persistent data according to the reference files corresponding to the data to be recovered, if the data to be recovered is persistent data, comprises:
If the data to be recovered is persistent data, placing the reference files corresponding to the data to be recovered into the corresponding Region folder of the HBase table to recover the persistent data.
3. The method for HBase data backup and recovery according to claim 1 or 2, characterized in that the recovered persistent data is consolidated into complete data when the Regions of HBase are merged, the merge operation merging multiple HFile files into one large file and being triggered automatically when the HBase merge configuration parameter value is reached.
4. The method for HBase data backup and recovery according to claim 1, characterized in that the log file comprises the HBase table on which an operation was performed, the Region of that HBase table, and the operation performed.
5. The method for HBase data backup and recovery according to claim 4, characterized in that recovering the HBase in-memory data according to the log files, if the data to be recovered is in-memory data, comprises:
If the data to be recovered is in-memory data, restoring the log files to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; when the reference files and logs have been placed in the corresponding locations under the Region folder, starting the HBase table, whereupon each Region is assigned to a corresponding RegionServer and reads its log files into its own MemStore, completing the HBase data recovery.
6. An HBase data backup and recovery device, characterized by comprising:
A backup unit, configured to flush the data in HBase memory into HFile files when HBase data is backed up; to create a corresponding reference file for each HFile file in each Region of the HBase table structure, and to back up the HFile files of each HBase table;
A recovery unit, configured, when HBase data is recovered, to recover the persistent data according to the reference files corresponding to the data to be recovered if the data to be recovered is persistent data; and to recover the HBase in-memory data according to the log files if the data to be recovered is in-memory data.
7. The HBase data backup and recovery device according to claim 6, characterized in that the recovery unit being configured to recover the persistent data according to the reference files corresponding to the data to be recovered, if the data to be recovered is persistent data, comprises:
The recovery unit being configured, if the data to be recovered is persistent data, to place the reference files corresponding to the data to be recovered into the corresponding Region folder of the HBase table to recover the persistent data.
8. The HBase data backup and recovery device according to claim 6 or 7, characterized in that the recovered persistent data is consolidated into complete data when the Regions of HBase are merged, the merge operation merging multiple HFile files into one large file and being triggered automatically when the HBase merge configuration parameter value is reached.
9. The HBase data backup and recovery device according to claim 6, characterized in that the log file comprises the HBase table on which an operation was performed, the Region of that HBase table, and the operation performed.
10. The HBase data backup and recovery device according to claim 9, characterized in that the recovery unit being configured to recover the HBase in-memory data according to the log files, if the data to be recovered is in-memory data, comprises:
The recovery unit being configured, if the data to be recovered is in-memory data, to restore the log files to the corresponding Region folder according to the HBase table name and Region name recorded in the log file; when the reference files and logs have been placed in the corresponding locations under the Region folder, to start the HBase table, whereupon each Region is assigned to a corresponding RegionServer and reads its log files into its own MemStore, completing the HBase data recovery.
CN201410483014.3A 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery Pending CN104199963A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410483014.3A CN104199963A (en) 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410483014.3A CN104199963A (en) 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery

Publications (1)

Publication Number Publication Date
CN104199963A true CN104199963A (en) 2014-12-10

Family

ID=52085256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410483014.3A Pending CN104199963A (en) 2014-09-19 2014-09-19 Method and device for HBase data backup and recovery

Country Status (1)

Country Link
CN (1) CN104199963A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778097A (en) * 2015-03-27 2015-07-15 新浪网技术(中国)有限公司 Data recovery method and data recovery device
CN105159945A (en) * 2015-08-10 2015-12-16 北京思特奇信息技术股份有限公司 Method and system for extracting and converting data between Hbase and Hdfs
CN105988995A (en) * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN106294008A (en) * 2016-08-05 2017-01-04 浙江宇视科技有限公司 A kind of data reconstruction method and device
CN108228752A (en) * 2017-12-21 2018-06-29 中国联合网络通信集团有限公司 Data full dose deriving method, data distribution device and data export node
US11119863B2 (en) 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 (en) 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7725470B2 (en) * 2006-08-07 2010-05-25 Bea Systems, Inc. Distributed query search using partition nodes
CN101957863A (en) * 2010-10-14 2011-01-26 广州从兴电子开发有限公司 Data parallel processing method, device and system
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
CN103106286A (en) * 2013-03-04 2013-05-15 曙光信息产业(北京)有限公司 Method and device for managing metadata

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
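栾洋洋 (Luan Yangyang): "Research on Fault Recovery Methods for the Distributed Database HBase" (《分布式数据库HBase故障恢复方法研究》), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *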

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105988995A (en) * 2015-01-27 2016-10-05 杭州海康威视数字技术股份有限公司 HFile based data batch loading method
CN105988995B (en) * 2015-01-27 2019-05-24 杭州海康威视数字技术股份有限公司 A method of based on HFile batch load data
CN104778097A (en) * 2015-03-27 2015-07-15 新浪网技术(中国)有限公司 Data recovery method and data recovery device
CN105159945A (en) * 2015-08-10 2015-12-16 北京思特奇信息技术股份有限公司 Method and system for extracting and converting data between Hbase and Hdfs
US11119863B2 (en) 2015-09-25 2021-09-14 Huawei Technologies Co., Ltd. Data backup method and data processing system
US11132260B2 (en) 2015-09-25 2021-09-28 Huawei Technologies Co., Ltd. Data processing method and apparatus
CN106294008A (en) * 2016-08-05 2017-01-04 浙江宇视科技有限公司 A kind of data reconstruction method and device
CN106294008B (en) * 2016-08-05 2019-06-11 浙江宇视科技有限公司 A data recovery method and device
CN108228752A (en) * 2017-12-21 2018-06-29 中国联合网络通信集团有限公司 Data full dose deriving method, data distribution device and data export node

Similar Documents

Publication Publication Date Title
CN104199963A (en) Method and device for HBase data backup and recovery
CN106445738B (en) Database backup method and device
CN107835983B (en) Backup and restore in distributed databases using consistent database snapshots
US8635187B2 (en) Method and system of performing incremental SQL server database backups
CN101866358B (en) Multidimensional interval querying method and system thereof
CN105573859A (en) Data recovery method and device of database
US20110106768A1 (en) Backup using metadata virtual hard drive and differential virtual hard drive
CN104239443B (en) A kind of storage method of serialized data operation log
US11663160B2 (en) Recovering the metadata of data backed up in cloud object storage
CN105630834B (en) A method and device for realizing deduplication of data
WO2012083754A1 (en) Method and device for processing dirty data
CN102314503A (en) Indexing method
CN106844089A (en) A kind of method and apparatus for recovering tree data storage
CN106469152A (en) A kind of document handling method based on ETL and system
CN114490735A (en) Method and device for constructing distributed OLAP data analysis based on MPP and full-text index
CN106155838A (en) A kind of database back-up data restoration methods and device
CN103678608A (en) Log management method and device
CN103207916A (en) Metadata processing method and device
US9053100B1 (en) Systems and methods for compressing database objects
CN105068893A (en) Database state restoration method
US10311021B1 (en) Systems and methods for indexing backup file metadata
CN111897490B (en) Method and device for deleting data
CN108984341A (en) A kind of data reconstruction method and system based on distributed memory system
CN115658391A (en) Backup recovery method of WAL mechanism based on QianBase MPP database
Prabavathy et al. Multi-index technique for metadata management in private cloud storage

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20141210