You can also specify a date range for the rows to include, which allows you to perform the process incrementally. Each date is in milliseconds since the Unix epoch. Note that you have to specify the number of versions of each row to export. To include all versions in the date range, set the number-of-versions argument to a value greater than your maximum possible row versions, such as 100000. This approach offers table-level granularity. The export destination can be a path in either Azure Data Lake Storage Gen2 or Gen1. The CopyTable utility copies data from a source table, row by row, to an existing destination table with the same schema as the source.
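An incremental, date-ranged export as described above can be sketched as follows. The table name, Data Lake Storage Gen1 account, and dates are hypothetical placeholders, and the epoch conversion assumes GNU `date`:

```shell
# Sketch of an incremental export, assuming a hypothetical table "MyTable"
# and a hypothetical Data Lake Storage Gen1 account "myadls".
# Convert calendar dates to milliseconds since the Unix epoch (GNU date):
START_MS=$(( $(date -u -d "2024-01-01" +%s) * 1000 ))
END_MS=$(( $(date -u -d "2024-02-01" +%s) * 1000 ))
echo "$START_MS $END_MS"   # 1704067200000 1706745600000

# 100000 exceeds any realistic per-row version count, so every version
# of every row in the date range is included:
# hbase org.apache.hadoop.hbase.mapreduce.Export "MyTable" \
#   "adl://myadls.azuredatalakestore.net/exports/MyTable" \
#   100000 "$START_MS" "$END_MS"
```

Running the same command later with the previous end time as the new start time picks up only rows written since the last export.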
Before you can rely on this folder as an accurate representation of the HBase data, you must shut down the cluster. After you delete the cluster, you can either leave the data in place or copy the data to a new location:

- Create a new HDInsight instance pointing to the current storage location. The new instance is created with all the existing data.
- Copy the hbase folder to a different Azure Storage blob container or Data Lake Storage location, and then start a new cluster with that data. For Azure Storage, use AzCopy, and for Data Lake Storage use AdlCopy.

On the source HDInsight cluster, use the Export utility (included with HBase) to export data from a source table to the default attached storage. You can then copy the exported folder to the destination storage location and run the Import utility on the destination HDInsight cluster. To export table data, first SSH into the head node of your source HDInsight cluster and then run the following hbase command: hbase org.apache.hadoop.hbase.mapreduce.Export "<tableName>" "/<path>/<to>/<destination>". The export directory must not already exist. To import table data, SSH into the head node of your destination HDInsight cluster and then run the following hbase command: hbase org.apache.hadoop.hbase.mapreduce.Import "<tableName>" "/<path>/<to>/<source>". Specify the full export path to the default storage or to any of the attached storage options.
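The export-then-import sequence might look like the sketch below. The container, storage account, and table names are hypothetical placeholders; the wasbs:// URI is the standard scheme for Azure Storage blob paths on HDInsight:

```shell
# Hypothetical names -- substitute your own container, account, and table.
CONTAINER="mycontainer"; ACCOUNT="mystorage"; TABLE="Customers"
EXPORT_DIR="wasbs://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/exports/${TABLE}"
echo "$EXPORT_DIR"

# On the source cluster's head node (the directory must not already exist):
# hbase org.apache.hadoop.hbase.mapreduce.Export "$TABLE" "$EXPORT_DIR"

# On the destination cluster's head node, after creating an empty table
# with the same schema as the source:
# hbase org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$EXPORT_DIR"
```

Because both clusters address the exported folder by its full storage URI, the data does not need to pass through either cluster's local disks.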
With this approach, you copy all HBase data, without being able to select a subset of tables or column families. Subsequent approaches provide greater control. HBase in HDInsight uses the default storage selected when creating the cluster, either Azure Storage blobs or Azure Data Lake Storage. In either case, HBase stores its data and metadata files under the hbase folder. In an Azure Storage account, the hbase folder resides at the root of the blob container. In Azure Data Lake Storage, the hbase folder resides under the root path you specified when provisioning the cluster; this root path typically has a clusters folder, with a subfolder named after your HDInsight cluster. In either case, the hbase folder contains all the data that HBase has flushed to disk, but it may not contain the in-memory data.
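The two default locations can be sketched as URIs. The account, container, and cluster names below are placeholder assumptions, not values from this article:

```shell
# Hypothetical names -- substitute your own.
ACCOUNT="mystorage"; CONTAINER="mycontainer"; CLUSTER="myhbasecluster"

# Azure Storage: the hbase folder sits at the blob container root.
BLOB_HBASE="wasbs://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/hbase"
# Data Lake Storage: the hbase folder sits under the cluster's root path.
ADLS_HBASE="adl://${ACCOUNT}.azuredatalakestore.net/clusters/${CLUSTER}/hbase"
printf '%s\n%s\n' "$BLOB_HBASE" "$ADLS_HBASE"
```

Whichever form applies to your cluster is the folder you would copy with AzCopy or AdlCopy after shutting the cluster down.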
The following sections describe the usage scenario for each of these approaches. Apache Phoenix stores its metadata in HBase tables, so that metadata is backed up when you back up the HBase system catalog tables.