Tag Archives: MapReduce

Bulk loading data into an HBase table using MapReduce

My previous post gave a high-level overview of the HBase architecture, its components, and how they function. In this post I discuss how to bulk load source data directly into an HBase table using the HBase bulk-loading feature. Apache HBase gives you random, real-time, read/write access to your Big Data, but how do you […]

Excel InputFormat for Hadoop MapReduce

Excel Spreadsheet Input Format for Hadoop MapReduce: I wanted to read a Microsoft Excel spreadsheet using MapReduce and found that I could not use Hadoop's TextInputFormat to meet my requirement. Hadoop does not understand Excel spreadsheets, so I ended up writing a custom input format to achieve this. Hadoop works with […]

How to write a MapReduce program in Java, with an example

Understanding the fundamentals of MapReduce: MapReduce is a framework designed for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for those who are familiar with distributed processing frameworks. MapReduce is all about key-value pairs. I […]