CSci 8980 Big Data and the Cloud
Jon Weissman
Saturday, December 8, 2012
Thursday, November 15, 2012
Article on Facebook big data
http://techcrunch.com/2012/11/08/a-riddle-wrapped-in-a-mystery-inside-an-enigma/
A recent article on Facebook's big data infrastructure. Hive is used heavily inside Facebook and handles 60k+ queries daily.
Monday, November 12, 2012
Confusing terms
You are not confused if you can answer the following questions without any doubt:
Q) Can we build a system that is highly fault-tolerant but not highly available?
Q) Can we build a system with low reliability and high availability?
Q) Is reliability defined as MTBF or MTTR?
Scalability
A system whose performance improves after adding hardware, proportionally to the capacity added, is said to be a scalable system.
Elasticity
Elasticity often refers to a system's ability to allocate additional resources in an autonomic manner.
In other words, a scalable system allows you to add resources in order to handle more load, while an elastic system will add resources itself when the load increases.
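The distinction above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the class ElasticPool, its methods, and the fixed per-instance capacity are hypothetical names made up for this example, not any real cloud API. The manual add_instances call stands for a scalable system (someone adds resources), while autoscale stands for an elastic one (the system resizes itself from the observed load).

```python
import math


def instances_needed(load_rps, capacity_per_instance_rps, min_instances=1):
    """Return how many instances are needed to serve the given load.

    Assumes every instance handles the same fixed number of requests
    per second (a simplification made for this sketch).
    """
    needed = math.ceil(load_rps / capacity_per_instance_rps)
    return max(min_instances, needed)


class ElasticPool:
    """Hypothetical pool of identical servers (illustration only)."""

    def __init__(self, capacity_per_instance_rps, instances=1):
        self.capacity_per_instance_rps = capacity_per_instance_rps
        self.instances = instances

    def add_instances(self, n):
        # Scalable: an operator decides to add resources by hand.
        self.instances += n
        return self.instances

    def autoscale(self, load_rps):
        # Elastic: the system itself resizes based on the observed load.
        self.instances = instances_needed(load_rps, self.capacity_per_instance_rps)
        return self.instances


pool = ElasticPool(capacity_per_instance_rps=100)
print(pool.autoscale(load_rps=250))  # grows to cover the load
print(pool.autoscale(load_rps=50))   # shrinks again when the load drops
```

With 100 requests per second per instance, a load of 250 needs three instances and a load of 50 needs one, so the pool grows and shrinks without operator involvement.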
Fault-tolerance
Fault tolerance refers to a system's ability to continue operating, perhaps with gracefully degraded performance, when components of the system fail. There is no exact metric for the fault tolerance of a system.
Availability
Availability is the percentage of time that a system is actually operational and providing its intended service.
A = Uptime/(Uptime + Downtime)
Ai = MTBF/(MTBF+MTTR)
A system with no single points of failure might be considered fault tolerant, but if application-level data migrations, software upgrades, or configuration changes take an hour or more of downtime to complete, then the system is not highly available.
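The two availability formulas above can be checked with a small Python sketch. The function names and the sample numbers (an MTBF of 999 hours, an MTTR of 1 hour, a 8760-hour year) are assumptions chosen for illustration, and Ai is read here as inherent availability.

```python
def availability(uptime_hours, downtime_hours):
    """A = Uptime / (Uptime + Downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


def inherent_availability(mtbf_hours, mttr_hours):
    """Ai = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year_hours(a, hours_per_year=8760):
    """Expected downtime in a year for availability a (between 0 and 1)."""
    return (1 - a) * hours_per_year


# Illustrative numbers only: a failure every 999 hours, repaired in 1 hour.
a_i = inherent_availability(mtbf_hours=999, mttr_hours=1)
print(f"Ai = {a_i:.3f}")
print(f"downtime per year = {downtime_per_year_hours(a_i):.2f} hours")

# A system that never fails but takes a one-hour maintenance window
# every month is down 12 hours a year.
a_maint = availability(uptime_hours=8760 - 12, downtime_hours=12)
print(f"A with monthly maintenance = {a_maint:.5f}")
```

With these numbers Ai comes out at 0.999, which is roughly 8.76 hours of downtime a year. The second calculation gives about 0.9986: planned one-hour maintenance windows alone keep a system below 99.9% availability, even if no component ever fails.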
Reliability
In simple words, how long can a system stay up continuously?
A more concrete definition is “reliability is the ability of a person or system to perform and maintain its functions in routine circumstances, as well as hostile or unexpected circumstances”
Reliability is often defined in terms of mean time between failures (MTBF). We can build a system with low-quality, not-so-reliable components and subsystems, and still achieve HA.
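One way to see how unreliable components can still give high availability is redundancy. The Python sketch below rests on two assumptions that are not stated in the post: replicas fail independently of each other, and the system counts as up as long as at least one replica is up. The function name and the 0.9 figure are made up for illustration.

```python
def parallel_availability(component_availability, replicas):
    """Availability of a group of redundant replicas.

    Assumes independent failures and that one working replica is enough.
    The group is down only when every replica is down at the same time.
    """
    all_down = (1 - component_availability) ** replicas
    return 1 - all_down


# Illustrative numbers: each component is available only 90% of the time.
for n in (1, 2, 3, 4):
    print(n, f"{parallel_availability(0.9, n):.4f}")
```

Under these assumptions, three replicas that are each up only 90% of the time give a combined availability of about 99.9%. Real failures are often correlated (shared power, network, or software bugs), so actual numbers will tend to be lower.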
Durability
Durability is a system's guarantee that data, once stored, will not be lost.
References:
1) http://www.quora.com/Distributed-Systems/What-is-the-difference-between-the-terms-scalable-and-elastic
2) http://www.ibm.com/developerworks/library/pa-bigiron2/
3) http://www.quora.com/Distributed-Systems/What-is-the-difference-between-a-highly-fault-tolerant-and-a-highly-available-system#
4) http://www.wikipedia.org/
Sunday, November 11, 2012
EC2 elastic?
From the discussion in our last class, I thought it would be interesting to point out what elasticity means in regard to Amazon AWS.
"Elastic – Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or days. You can commission one, hundreds or even thousands of server instances simultaneously. Of course, because this is all controlled with web service APIs, your application can automatically scale itself up and down depending on its needs."
This clearly means that AWS as a service provides a well-defined API that applications can use to scale up and down, but it does not do the scaling itself.
FYI: Nevertheless, AWS does provide an Auto Scaling feature for web applications hosted on EC2, which increases the number of resources deployed when web traffic increases. But I doubt whether this is possible for other applications too.
Tuesday, November 6, 2012
NoSQL databases
Dynamo is one example of a NoSQL database; Bigtable and HBase are others. Currently there are 150 NoSQL database systems. http://nosql-database.org/ has a list of all types of NoSQL systems.
Thursday, October 25, 2012
A good article describing the evolution of GFS
It has been almost a decade since the GFS paper was published, and many changes have been made since then. This interview summarizes a few of them and the circumstances in which they were made:
http://delivery.acm.org/10.1145/1600000/1594206/p10-case_study.pdf?ip=134.84.46.146&acc=OPEN&CFID=132372171&CFTOKEN=23979261&__acm__=1351194189_d2b32e9f2b0897f11d07caa688ed10c0