Tag Archives: Hadoop

Log analyzer example using Spark and Scala

It has again been a long time since I wrote anything technical on Big Data, but believe me, the wait was worth it. It’s been a couple of months now since I started reading and writing Scala and Spark, and I am finally confident enough to share the knowledge I have gained. As I said before, learning Scala is worth it, but […]

Converting Hive query (Joining multiple tables) into MapReduce using Job Chaining

It’s been a while since I last blogged. This post gives you an idea of how to convert a Hive query that joins multiple tables into a MapReduce job. You might be wondering why I would ever think of writing a MapReduce job when Hive does it for me? You […]

BulkLoading data into HBase table using MapReduce

My previous post gives a high-level view of the different components used in HBase and how they function. In this post I will discuss how to bulk load source data directly into an HBase table using the HBase bulk-loading feature. Apache HBase gives you random, real-time, read/write access to your Big Data, but how do you […]

HBase Architecture

After working on HBase for the past year and a half, I decided to share my understanding. In this post I will try to describe the high-level functioning of HBase and the different components involved. HBase – The Basics: HBase is an open-source, NoSQL, distributed, column-oriented data store modeled on Google Bigtable […]

Architecture of HDFS Write and Read

The Hadoop Distributed File System (HDFS) is a distributed file system designed to overcome some of the limitations of other file systems such as NFS (Network File System), which is used by Unix, Solaris, and Mac OS, to name a few. Some of the distributed computing features that HDFS possesses are: Deals with huge amounts of data […]

Introduction to Big Data

Big Data in layman’s terms: Big Data is the latest buzzword describing enormous volumes of both structured and unstructured data. The fundamental difference between structured and unstructured data is that the former can be consistently pumped into any relational database, or into any structured file format such as XML, once the schema is known, while the […]