Saturday 7 December 2013

The Hadoop software framework



What is big data and why is it important to analyse it?

Big data is a set of data so large and complex that it is difficult to manage with traditional data-processing applications. This big data tutorial from Intellipaat discusses how such a huge amount of data is stored and processed using the Hadoop software framework.
 
The three Vs of big data:
· Variety: big data comes from various sources and in various formats, structured or unstructured, such as audio files, log files, emails, communication records and pictures.
· Velocity: big data arrives at high velocity.
· Volume: big data has a massive volume.
Big data can be anything from tweets on Twitter to web logs and other interaction logs that help a business become more user-friendly and gain an edge over its competitors. It can even be used to manage a company's reputation from social media posts. However, analysing this big data is not possible on a single machine, so a software framework is needed to do the task.

What is Apache Hadoop?
Apache Hadoop is a software framework designed by the Apache Software Foundation for processing big data. The Hadoop framework overcomes the limitations and drawbacks of traditional data-processing software, such as the cost of scaling up and down, the huge demand for bandwidth and the loss of data when a process partially fails. Hadoop distributes the big data across a cluster of machines so that the machines and the framework together can analyse it and reach a conclusion.

How does the Hadoop software framework function?
Doug Cutting, the Chief Architect of Cloudera, helped the Apache Software Foundation design a new software framework inspired by Google's technology for handling huge amounts of data, and named it Hadoop. Previously, most web developers and hosts relied on separate hardware and separate systems for storing data and for processing it, but Hadoop can both store and process huge amounts of data by itself. The other advantage is that the software can store and process data across a cluster of machines that physically exist in different geographical locations. This lets you keep all the data, useful or seemingly useless, together in the Hadoop cluster so that it is ready at hand whenever you need it.
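As a rough illustration of how an application hands data to a Hadoop cluster, here is a minimal sketch in Java that writes a small file into HDFS using the standard Hadoop FileSystem API. The NameNode address and file path are hypothetical and would be replaced with the values of a real cluster.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Point the client at the cluster's NameNode (address is hypothetical).
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode-host:8020"), conf);

        // Write a small file; HDFS splits larger files into blocks and
        // replicates each block across DataNodes automatically.
        Path path = new Path("/user/demo/sample.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("hello hadoop");
        }
        fs.close();
    }
}

Once the data sits in HDFS like this, any job running on the cluster can read it, regardless of which machines physically hold the blocks.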

The working principle that can be discussed in this big data tutorial is that the Hadoop framework works through the Hadoop Distributed File System, or HDFS. Every set of big data that you send to Hadoop first goes to the NameNode, the main node of the cluster. The data is then distributed across many DataNodes, the subordinate nodes, where replicas of the data are stored automatically so that even if one of the machines crashes, the data can be restored. The data is then processed in the "MapReduce" phase, the data-processing component, where the Map function distributes the work to the different nodes and the Reduce function gathers the results.
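To make the Map and Reduce phases concrete, here is a minimal word-count job that closely follows the standard Hadoop MapReduce example: the mapper emits (word, 1) pairs on the nodes holding the data, and the reducer sums the counts for each word. The input and output paths are supplied on the command line and are hypothetical.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: runs where the data lives and emits (word, 1) for every token.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: gathers all counts for a word and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

A job like this would typically be packaged into a jar and submitted to the cluster with a command such as: hadoop jar wordcount.jar WordCount /input /output (paths here are only placeholders).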

To know more about Apache Hadoop and the Hadoop software framework, you can visit Intellipaat.uk.
