Tag Archives: BigData

Converting Hive query (Joining multiple tables) into MapReduce using Job Chaining

It’s been a while since I last blogged. This post gives you an idea of how to convert a Hive query that joins multiple tables into a MapReduce job. You might be wondering why I would ever think of writing a MapReduce job when Hive does it for me? You […]

HBase Architecture

After working with HBase for the past year and a half, I decided to share my understanding. In this post I will try to describe the high-level functioning of HBase and the different components involved. HBase – The Basics: HBase is an open-source, NoSQL, distributed, column-oriented data store modeled on Google BigTable […]

Excel InputFormat for Hadoop MapReduce

Excel Spreadsheet Input Format for Hadoop MapReduce: I wanted to read a Microsoft Excel spreadsheet using MapReduce, and found that I could not use Hadoop's TextInputFormat to meet this requirement. Hadoop does not understand Excel spreadsheets, so I ended up writing a custom input format to achieve this. Hadoop works with […]

How to write MapReduce program in Java with example

Understanding the fundamentals of MapReduce: MapReduce is a framework for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for those who are familiar with distributed processing frameworks. MapReduce is all about key-value pairs. I […]