After working on HBase for the past one and a half years, I decided to share my understanding. In this blog I will describe the high-level functioning of HBase and the different components involved.
HBase – The Basics:
HBase is an open-source, NoSQL, distributed, column-oriented data store modeled after Google Bigtable. It was developed as part of Apache’s Hadoop project and runs on top of HDFS (the Hadoop Distributed File System), providing Bigtable-like capabilities on Hadoop. We can call HBase a “data store” rather than a “database”, as it lacks many of the features available in traditional databases, such as typed columns, secondary indexes, triggers, and advanced query languages.
The data model consists of a table name, row key, column families, columns, and timestamps. Each row in an HBase table is uniquely identified by its row key, and each cell version by its timestamp. In this data model the column families are static (declared when the table is created), whereas columns are dynamic (added as data is written). Now let us look at the HBase architecture.
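Before moving on to the architecture, the data model above can be pictured as nested sorted maps: table → row key → column family → column qualifier → timestamp → value. The sketch below is a hypothetical illustration (the table, family, and column names are made up, not a real schema):

```python
# Toy model of HBase's logical data model as nested maps.
# Hypothetical data: row key "row-001", column family "info".
table = {
    "row-001": {                                     # row key
        "info": {                                    # column family (static)
            "name": {1700000000: "alice"},           # column -> {timestamp: value}
            "email": {1700000050: "a@example.com",   # newest version
                      1699999000: "old@example.com"},
        },
    },
}

def get_latest(table, row, family, column):
    """Return the most recent version of a cell, like a default Get."""
    versions = table[row][family][column]
    return versions[max(versions)]
```

A read without an explicit timestamp returns the newest version, which is why `get_latest` picks the maximum timestamp.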
- HMaster: The HBase HMaster is a lightweight process responsible for assigning regions to RegionServers in the Hadoop cluster to achieve load balancing.
- RegionServer: HBase RegionServers are the worker nodes that handle read, write, update, and delete requests from clients. The RegionServer process typically runs on each Hadoop node in the cluster.
- ZooKeeper: ZooKeeper keeps track of all the region servers in an HBase cluster: how many region servers there are and which regions each region server is holding. The HMaster gets the details of region servers by contacting ZooKeeper.
- Memstore: The MemStore is in-memory storage; it uses the memory of each region server node to buffer incoming writes. Rows are written to the MemStore, where the data is kept sorted. When certain thresholds are met, the MemStore data is flushed to an HFile; every MemStore flush creates one HFile per column family.
- HFile: HFiles are the actual storage files, i.e. the physical representation of HBase’s data on disk, created specifically to store and serve HBase’s data efficiently. Clients do not read HFiles directly but go through region servers to get to the data.
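To make the HMaster’s role concrete, here is a minimal, hypothetical sketch of region assignment: distributing regions across the region servers ZooKeeper knows about. The round-robin policy is a toy illustration only; real HBase uses a pluggable load balancer.

```python
from itertools import cycle

def assign_regions(regions, region_servers):
    """Toy round-robin assignment of regions to region servers.
    Illustrates the balancing idea, not HBase's actual balancer."""
    assignment = {server: [] for server in region_servers}
    for region, server in zip(regions, cycle(region_servers)):
        assignment[server].append(region)
    return assignment
```

Each region lands on exactly one server, matching the invariant that a region is served by a single RegionServer at a time.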
HBase Table Operations:
- Reads: Client read requests are directed to the proper RegionServer; clients locate regions with the help of ZooKeeper and the meta table. Clients can read all columns for a given row, or read an entire column or column family for a range of rows.
- Writes: An HBase write to a single row is atomic: the whole operation either succeeds or fails, even if the write spans multiple column families. A write to multiple rows, however, is not atomic; some row writes may succeed while others fail.
- Updates: Each cell in HBase can store multiple values, each with an associated version, or timestamp, corresponding to the time the value was written. Users can specify time-to-live (TTL) values for cells, instructing HBase to delete cells older than a given interval.
- Deletes: When an HBase client deletes a row, the row is not immediately removed from the table. Instead, HBase writes a tombstone marker to the blocks of data storing the row; the data is permanently removed from storage during the next major compaction.
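The delete path above can be sketched as a simplified model (not HBase code): a delete only appends a tombstone marker, and a later major compaction rewrites the store without the tombstoned rows.

```python
TOMBSTONE = object()  # sentinel meaning "this row was deleted"

def delete_row(store, row_key):
    """A delete does not remove data; it records a tombstone marker."""
    store[row_key] = TOMBSTONE

def major_compact(store):
    """Major compaction rewrites the store, dropping tombstoned rows."""
    return {k: v for k, v in store.items() if v is not TOMBSTONE}
```

Until `major_compact` runs, the deleted row still occupies storage; the tombstone simply hides it from reads.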
HBase Table Maintenance:
- Minor Compactions: When data is written to HBase, it is first written to an in-memory structure called a memstore for performance. Intermittently, when the memstore reaches a certain size, the data is written to a storefile on disk and marked read-only. When the number of storefiles reaches a configured threshold, a minor compaction occurs to merge multiple storefiles.
- Major Compactions: Periodically (by default every 24 hours), a major compaction runs to merge all storefiles together into a single storefile per region and column family. In addition, the RegionServer walks its tables to find any rows marked with a tombstone, meaning a delete was requested, and those rows are purged at this time.
Compactions, especially major compactions, can take a toll on utilization of a RegionServer. Client requests made during a compaction will experience latency and jitter as a result of resource contention.
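A minor compaction is essentially a merge of several sorted files into one. A minimal sketch, treating each storefile as a sorted list of (row key, value) pairs:

```python
import heapq

def minor_compact(storefiles):
    """Merge several sorted storefiles into one sorted storefile.
    Each storefile is a list of (row_key, value) tuples sorted by key."""
    return list(heapq.merge(*storefiles, key=lambda kv: kv[0]))
```

Because every input file is already sorted, the merge is a single streaming pass, which is why compactions are I/O-bound rather than CPU-bound.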
Diagram of HBase Table and Region servers
In HBase, the write and read path works something like this:
- Edits (Puts, etc.) are collected and sorted in memory (specifically, in a skip list). HBase calls this the “memstore”
- When the memstore reaches a certain size (hbase.hregion.memstore.flush.size), it is written (or flushed) to disk as a new “HFile”
- There is one memstore per region and column family
- Upon read, HBase performs a merge sort between the memstore and all of its on-disk images (i.e. the HFiles), each of which is individually sorted
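The steps above can be sketched end-to-end with a toy model (not HBase code): a sorted in-memory memstore, size-triggered flushes to immutable “HFiles”, and a read path that merge-sorts across all of them. Versioning is ignored here for brevity; real HBase resolves duplicate keys newest-first.

```python
import heapq

class ToyStore:
    """Toy model of a memstore with size-triggered flushes to sorted 'HFiles'."""
    def __init__(self, flush_size=3):
        self.memstore = {}      # sorted on flush; real HBase uses a skip list
        self.hfiles = []        # each flush produces one immutable sorted file
        self.flush_size = flush_size

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_size:
            self.flush()

    def flush(self):
        """Write the memstore contents to disk as a new sorted 'HFile'."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def scan(self):
        """Merge-sorted view over the memstore and all HFiles."""
        sources = self.hfiles + [sorted(self.memstore.items())]
        return list(heapq.merge(*sources, key=lambda kv: kv[0]))
```

Note how a scan never touches a single fully sorted structure; the total order only materializes during the merge, exactly as described in the list above.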
HBase stores rows of data in tables. Tables are split into chunks of rows called “regions”. Those regions are distributed across the cluster, hosted and made available to client processes by the RegionServer process. A region is a contiguous range within the key space, meaning all rows in the table that sort between the region’s start key and end key are stored in the same region. Regions are non-overlapping, i.e. a single row key belongs to exactly one region at any point in time. A region is only served by a single region server at any point in time, which is how HBase guarantees strong consistency within a single row.
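Region lookup by row key can be sketched with a sorted list of region start keys; this is a hypothetical illustration (real clients consult the meta table for region locations):

```python
import bisect

def find_region(start_keys, row_key):
    """Return the index of the region whose [start, next_start) range
    contains row_key. start_keys is the sorted list of region start keys;
    the first region starts at the empty key ''."""
    return bisect.bisect_right(start_keys, row_key) - 1
```

Because regions partition the key space without overlap, a binary search over start keys always yields exactly one region per row key.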