Thursday, November 15, 2012

Article on Facebook big data

Recent article on Facebook's big data. Inside Facebook, Hive is used heavily. Hive handles 60k+ queries daily.

Monday, November 12, 2012

Confusing terms

You are not confused if you are able to answer following questions without any doubts

Q) Can we build a system which is highly fault-tolerant but not highly available?
Q) Can we build a system with low reliability and High availability?
Q) is reliability
defined as MTBF or MTTR?

A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.

Elasticity often refers to a system's ability to allocate additional resources in an autonomic manner

In other words, a scalable system allows you to add resources in order to handle more load, while an elastic system will add resources itself when the load increases.


Fault tolerance refers to a system's ability to continue operating, perhaps gracefully degrading in performance when components of the system fail. There is no exact measure to measure fault-tolerance of a system


Availability is a percentage of time that a system is actually operational and providing its intended service.

A = Uptime/(Uptime + Downtime)

Where there are no single points of failure might be considered system as fault tolerant, but if application-level data migrations, software upgrades, or configuration changes take an hour or more of downtime to complete, then the system is not highly available.

In simple words, how long can a system stay up continuously?

More concrete definition is “reliability is the ability of a person or system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances”

Reliability is often defined in terms of mean time between failures (MTBF). We can build a system with low-quality, not-so-reliable components and subsystems, and still achieve HA.

Durability of a system guarantees that stored data
can't be lost.





Sunday, November 11, 2012

EC2 elastic ?

From the discussion in our last class, I thought it would be interesting to point out what elasticity means in regard to Amazon AWS.

"Elastic – Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs."

This clearly means that AWS as a service provides well defined API for the applications to scale up and down, but does not do this itself. 

FYI: Nevertheless, there is an AutoScaleUp functionality provided by AWS for Web Applications hosted on EC2 which increase the number of resources deployed when there is increase in web traffic. But I doubt if this is possible for other applications too. 

Tuesday, November 6, 2012

NoSQL databases

Dynamo is one of NoSQL database, Bigtable and HBase are other examples, crurrently there are 150 NoSQL database systems.  has a list of all types of nosql systems.

Thursday, October 25, 2012

A good article describing the evolution of GFS

It has been almost a decade since the GFS paper has been published. Since then, there are many changes that have been made. This interview here summarizes a few of them and the circumstances in which they are made