Tag Archives: Big Data

Log analyzer example using Spark and Scala

It has been a long time since I last wrote any technical stuff on Big Data, but believe me, the wait was worth it. It has been a couple of months now since I started reading and writing Scala and Spark, and I am finally confident enough to share the knowledge I have gained. As said before, learning Scala is worth it, but […]

Implementing Partitioners and Combiners for MapReduce

Partitioners and Combiners in MapReduce Partitioners are responsible for dividing up the intermediate key space and assigning intermediate key-value pairs to reducers. In other words, the partitioner specifies the reduce task to which an intermediate key-value pair must be copied. Within each reducer, keys are processed in sorted order. Combiners are an optimization in MapReduce that […]
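As a rough companion to this excerpt, here is a minimal Java sketch of a custom Hadoop partitioner, assuming a word-count style job with Text keys and IntWritable counts; the class name FirstLetterPartitioner and the driver wiring in the trailing comments are illustrative, not code from the post.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Illustrative partitioner: keys starting with the same letter always land
// on the same reduce task, so each letter's counts are summed in one place.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        String word = key.toString();
        int first = word.isEmpty() ? 0 : Character.toLowerCase(word.charAt(0));
        return first % numPartitions; // always falls in [0, numPartitions)
    }
}

// Driver wiring (class names are placeholders):
// job.setPartitionerClass(FirstLetterPartitioner.class);
// job.setCombinerClass(WordCountReducer.class); // reducer reused as combiner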

How to write MapReduce program in Java with example

Understanding the fundamentals of MapReduce MapReduce is a framework for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for anyone familiar with distributed processing frameworks. MapReduce is a game all about Key-Value pairs. I […]
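As a quick sketch of the key-value idea described here, the following mapper and reducer implement the usual word count on the Hadoop MapReduce Java API; the class names WordCountMapper and WordCountReducer are placeholders rather than code taken from the post.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits an intermediate (word, 1) pair for every token in the line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(line.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reducer: receives the values grouped by word and sums them.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        context.write(word, new IntWritable(sum)); // final (word, total) pair
    }
}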

Architecture of HDFS Write and Read

The Hadoop Distributed File System HDFS is a distributed file system designed to overcome some of the limitations of other file systems such as NFS (Network File System), which Unix, Solaris, and Mac OS use, to name a few. Some of the distributed computing features which HDFS possesses are: deals with huge amounts of data […]
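To make the write and read paths a little more tangible, here is a small client-side sketch using the Hadoop FileSystem Java API, assuming the cluster settings come from the core-site.xml/hdfs-site.xml on the classpath; the class name HdfsReadWriteDemo and the path /tmp/hdfs-demo.txt are made up for illustration.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS client sketch: writes a small file and reads it back.
public class HdfsReadWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up site configs
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/hdfs-demo.txt");        // illustrative path

        // Write path: the client asks the NameNode where to place the blocks,
        // then streams the data through the DataNode replication pipeline.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read path: the client fetches block locations from the NameNode and
        // reads the bytes directly from a nearby DataNode.
        try (FSDataInputStream in = fs.open(file);
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            System.out.println(reader.readLine());
        }

        fs.close();
    }
}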

Introduction to Big Data

Big Data in layman's terms: Big Data is the latest buzzword, describing enormous volumes of both structured and unstructured data. The fundamental difference between structured and unstructured data is that the former can be consistently pumped into any relational database or any structured file format such as XML by knowing the schema, while the […]