Category: MapReduce code

Converting Hive query (Joining multiple tables) into MapReduce using Job Chaining

It’s been a while since I last blogged. This post gives you an idea of how to convert a Hive query that joins multiple tables into a MapReduce job. You might be wondering why I would ever think of writing a MapReduce job when Hive does it for me. You […]

Implementing Partitioners and Combiners for MapReduce

Partitioners and Combiners in MapReduce: Partitioners are responsible for dividing up the intermediate key space and assigning intermediate key-value pairs to reducers. In other words, the partitioner specifies the task to which an intermediate key-value pair must be copied. Within each reducer, keys are processed in sorted order. Combiners are an optimization in MapReduce that […]
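To make the partitioning idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The class name `PartitionSketch` and the method `assign` are hypothetical names chosen for illustration. The hash-and-modulus formula mirrors what Hadoop's default `HashPartitioner` does, and a `TreeSet` per reducer stands in for the sorted order in which each reducer sees its keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PartitionSketch {
    // Pick a reducer for a key: clear the sign bit of the hash,
    // then take the remainder modulo the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical helper: route each intermediate key to its reducer.
    // A TreeSet keeps the keys of each reducer in sorted order.
    static List<TreeSet<String>> assign(List<String> keys, int numReducers) {
        List<TreeSet<String>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new TreeSet<>());
        }
        for (String key : keys) {
            reducers.get(partitionFor(key, numReducers)).add(key);
        }
        return reducers;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("hive", "join", "mapreduce", "combiner");
        List<TreeSet<String>> reducers = assign(keys, 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + ": " + reducers.get(i));
        }
    }
}
```

The same key always lands on the same reducer, which is what allows all values for that key to be processed together.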

How to write a MapReduce program in Java with an example

Understanding the fundamentals of MapReduce: MapReduce is a framework designed for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for those who are familiar with distributed processing frameworks. MapReduce is all about key-value pairs. I […]
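The key-value idea can be sketched as a word count in plain Java, without the Hadoop API. This is only an illustration under assumptions: `WordCountSketch`, `map`, `reduce`, and `run` are hypothetical names, and the in-memory grouping step stands in for the shuffle that the framework performs across a cluster.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: sum all values that share one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: group pairs by key, then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hive runs mapreduce", "mapreduce runs jobs");
        System.out.println(run(lines)); // prints {hive=1, jobs=1, mapreduce=2, runs=2}
    }
}
```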