Understanding the fundamentals of MapReduce: MapReduce is a framework for writing programs that process large volumes of structured and unstructured data in parallel across a cluster, in a reliable and fault-tolerant manner. The MapReduce concept is simple to understand for those who are familiar with distributed processing frameworks. MapReduce is all about key-value pairs. I […]
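
To make the key-value idea concrete, here is a minimal single-process sketch of a word count in Python. It is not the Hadoop API and does not run on a cluster; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. It assumes the usual three steps: map each input record to key-value pairs, group the values by key, then reduce each group to one result.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single (key, total) pair."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Calling `word_count(["big data", "big cluster"])` counts each word across both lines. In a real cluster the map and reduce calls would run on different machines, and the grouping step would move data over the network.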

The Hadoop Distributed File System: HDFS is a distributed file system designed to overcome some of the limitations of other file systems such as NFS (Network File System), which Unix, Solaris, and Mac OS use, to name a few. Some of the distributed computing features that HDFS possesses are: Deals with huge amounts of data […]

Big Data in layman’s terms: Big Data is the latest buzzword, describing enormous volumes of both structured and unstructured data. The fundamental difference between structured and unstructured data is that the former can be consistently loaded into any relational database or any structured file format such as XML when the schema is known, while the […]