Thursday 12 December 2013

A History of our Understanding of Data Computation



Statistics is a fairly young branch of mathematics, tracing its roots to the eighteenth century, when Thomas Bayes devised his famous theorem and changed the way we view and interpret the world around us. It was a simpler time, when numbers were merely objects used to count; to people of that era, the idea of using machines running software like Hadoop to compute and analyse clusters of big data would have been inconceivable.

The science of statistics, as it has developed over the years, has allowed us to better understand and interpret data; now, when someone tells us the temperature outside is 30 degrees Celsius, we know without thinking that we will feel hot when we go outdoors. While this example might seem trivial, it has some merit. Today 30 degrees is no longer merely a number used to describe the weather; when we are told it is 30 degrees, we can make a number of inferences from that data: that it will feel “hot” outside, that most people will be wearing half-sleeved or sleeveless shirts, that we will need air conditioning, that our electricity bills will be higher, and so on.

While the compilation and interpretation of data began as early as 1790, with the first population census conducted in the United States, it was not until a century later that simple tabulating machines were first used to process a census. The Colossus, the first programmable digital electronic computer, was used to break the Lorenz cipher and changed our understanding and use of data forever. What gave the Colossus its edge over its predecessors was the sheer volume of data it could process. Since the advent of computers, the amount of data we compute has grown ever larger, which has deepened our understanding of the world and sharpened the precision of our predictions, but has also brought its own set of problems.

Continual advances in computer technology meant that computers could handle larger and larger calculations, rendering previous generations obsolete. However, the data to be processed always seemed to grow faster; the ability of computers to process data was not scaling as well as the size of our data, and it soon became clear that a different approach to the problem was needed.

Hadoop, named after a toy elephant belonging to the son of its creator Doug Cutting, is an open-source software framework for the storage and processing of large-scale data that emerged in 2005 as the answer to various tedious and time-consuming problems faced by computer scientists today. Hadoop, with its ingenious architecture consisting of the Hadoop Distributed File System (HDFS), Hadoop YARN and Hadoop MapReduce, works under the assumption that hardware failures are common and should therefore be handled in software by the framework itself. In layman’s terms, this means Hadoop assumes that the data is too large for any single computer, or even a small set of computers, to process at once without something failing. Working from this fundamental assumption, the framework lets its users work around the problem, empowering them, in theory, to process almost unlimited amounts of data in parallel. An online Hadoop tutorial, such as the Intellipaat Hadoop online training, is the easiest way to understand all of this deeply and clearly.
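To make the storage side of that architecture a little more concrete, the short Java sketch below writes a small file into HDFS through the FileSystem API. It is only a minimal illustration, not part of the framework's documentation: the NameNode address and file path are assumptions chosen for the example.

// A minimal sketch of storing a file in HDFS through the Java API.
// The cluster address (hdfs://localhost:9000) and the file path are
// illustrative assumptions, not values from this post.
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the NameNode; this URI is an assumption.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");

        FileSystem fs = FileSystem.get(conf);
        Path target = new Path("/user/demo/sample.txt");

        // Create the file; HDFS splits it into blocks and replicates them
        // across DataNodes so that a single hardware failure is not fatal.
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(fs.create(target)))) {
            writer.write("hello hadoop");
        }
        fs.close();
    }
}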

Saturday 7 December 2013

The Hadoop software framework



What is big data and why is it important to analyse it?

Big data is a large and complex set of data that is difficult to manage with traditional data-processing applications. In this big data tutorial provided by Intellipaat, we discuss how this huge amount of data is stored and processed through the Hadoop software framework.
 
The three Vs of big data:
· Variety: big data comes from various sources and in various formats, structured or unstructured, such as audio files, log files, emails, communication records and pictures.
· Velocity: big data arrives at high speed.
· Volume: big data comes in massive quantities.
Big data can be anything from tweets on Twitter to web logs and other interaction logs that can help a business become more user-friendly and gain an edge over its competitors. It can even help manage a company's reputation through social media posts. However, analysing big data is not possible on a single machine, and therefore a software framework is needed to do the task.

What is Apache Hadoop?
Apache Hadoop is a software framework developed by the Apache Software Foundation for processing big data. It overcomes the limitations and drawbacks of traditional data-processing software, such as the difficulty of scaling up and down, huge bandwidth demands, and loss of work when part of a process fails. The Hadoop framework distributes the big data across several machines, or clusters of machines, so that the machines and the framework together can analyse the data and arrive at a result.

How does the Hadoop software framework function?
Doug Cutting, who is the Chief Architect of Cloudera, helped the Apache Software Foundation design a new software framework, inspired by Google's published techniques for handling huge amounts of data, and named the software Hadoop. Previously, most web developers and hosts relied on separate hardware and separate systems for storing data and for processing it, but Hadoop can store as well as process huge amounts of data by itself. Another advantage is that the software can store and process data across a cluster of machines that may physically sit in different geographical locations. This makes it practical to keep all of your data, the immediately useful and the apparently useless alike, together in the Hadoop cluster, so that whenever you need it, the data is ready at hand.

The working principle discussed in this big data tutorial is that the Hadoop software framework works through the Hadoop Distributed File System, or HDFS. Every set of big data that you send to Hadoop first goes through the NameNode, the master node of the cluster, which keeps track of where each piece is stored. The data itself is distributed across many DataNodes, the subordinate nodes, where replicas of the data are automatically kept so that even if one of the machines crashes, the data can be recovered. The data is then processed in the “MapReduce” phase, the processing component of the framework, where the Map function distributes the work to the different nodes and the Reduce function gathers the results.
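To make the Map and Reduce phases concrete, here is a condensed Java sketch of the classic word-count job from the Hadoop documentation: the Map function emits a (word, 1) pair for every word it sees in its portion of the data, and the Reduce function adds the counts back together. The class names follow the standard example and are not specific to this tutorial.

// A condensed sketch of the Map and Reduce phases described above,
// based on the classic word-count example.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // The Map function runs on the nodes holding the data and
    // emits a (word, 1) pair for every word it encounters.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // The Reduce function gathers all the counts for a word and sums them.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}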

To know more about Apache Hadoop and the Hadoop software framework, you can visit Intellipaat.uk.

Friday 6 December 2013

The Hadoop process and the concept of big data

Summary: Big data is simply data that keeps accumulating every day in huge amounts. Hadoop software provides the way to process this data. Because the data cannot be stored on a single system, it is divided and sent to many systems, where it can be processed effectively.

Big data means a huge amount of data that keeps accumulating on a daily basis. It is generated in large volumes and is not easy to process; the volume is such that it cannot be handled on a single machine, and it needs a great deal of storage. The information is also not simple: it is unordered and not arranged in a well-defined relational manner, and when one cannot see the relations within it, it becomes even harder to process or arrange. The big data tutorial helps in understanding in detail the ways to process and handle big data in today's world. The scenario we see now is that there is a great deal of work to be done and very few people to do it; there are reportedly millions of vacancies in this field, driven by the sheer amount of data being generated every day.

Previously, data was processed by a number of systems attached to a single storage area network. This method has many disadvantages: the data is distributed, but huge bandwidth is required to complete the process at a normal speed, and if one of the systems fails or stops functioning, the whole process suffers. The big data tutorial explains the concept of big data and the functioning of Hadoop.
      
How Hadoop helps with big data
 
Apache Hadoop is software that helps in handling huge amounts of data. It takes the data and divides it into sections, which are then sent to different systems and processed there. Along with the data, the program that will process it is also sent to each and every system. Because the data is very large, it is divided into small blocks of 64 MB or 128 MB. The big data tutorial from Intellipaat also explains how Hadoop works. The Hadoop system was developed because the systems in use before it required data to be sent to and from the storage network many times; most of a system's capacity was consumed fetching the data from the network on which it was stored and sending it back.
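A minimal driver sketch, in Java, shows how the program travels with the data: the job's classes are packaged and shipped to the nodes, and the input is read from HDFS in block-sized splits. The paths, and the mapper and reducer classes (borrowed from the word-count sketch earlier in this blog and assumed to be on the classpath), are illustrative assumptions.

// A minimal job-submission sketch: the program is shipped to the data.
// Paths and the WordCount classes are assumptions for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");

        // Ship the job's classes to every node that processes a block.
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCount.TokenizerMapper.class);
        job.setReducerClass(WordCount.IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // Input is split roughly along HDFS block boundaries (64/128 MB).
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}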

How Hadoop handles data

Hadoop divides the data into parts and sends them to the individual systems. The data, combined with the program, behaves like a block, and each block is replicated two more times so that the data is not lost if there is any problem with one of the processors. The individual systems do not decide for themselves how the data will be divided; that is the job of the main system on which Hadoop is running. This system divides the data according to the number of systems available and tries to assign an equal amount to each, so the maximum difference between systems is about one block. The individual systems can be close to each other or far apart in separate countries; it does not affect the process at all. The word Hadoop does not mean anything in particular; it was simply a word a small child used for one of his toys. The Intellipaat Hadoop online training explains how Hadoop handles data.
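The replication behaviour described above can also be seen from the Java API. The sketch below, with an assumed file path, reads and changes the number of copies HDFS keeps of a file's blocks; by default this factor is three, matching the "replicated two more times" mentioned here.

// A small sketch of the replication idea: HDFS normally keeps three
// copies of each block, and the factor can be read or changed per file.
// The file path is an illustrative assumption.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Default number of replicas for newly written files.
        conf.set("dfs.replication", "3");

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/demo/sample.txt");

        // Inspect how many copies of this file's blocks the cluster keeps.
        FileStatus status = fs.getFileStatus(file);
        System.out.println("Replication factor: " + status.getReplication());

        // Ask the NameNode to keep an extra copy of this particular file.
        fs.setReplication(file, (short) 4);
        fs.close();
    }
}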

To know more about Apache Hadoop and the Hadoop software framework, you can visit Intellipaat.uk.