BUILDING A CLASSIFICATION MODEL WITH LOGISTIC REGRESSION ON REAL TIME BIG DATA USING R, APACHE HADOOP, RHADOOP & APACHE FLUME

Authors

  • Arunendra Mishra, Consultant - Analytics, Management Consulting, KPMG

Abstract

This paper describes building a classification model with logistic regression in R, using the open-source RHadoop packages together with robust and resilient Apache Hadoop and the real-time data-handling capabilities of Apache Flume. We integrated Hadoop with Flume to handle real-time (streaming) big data and used RHadoop to integrate R with HDFS. We then used R to build a classification model for log management. The objective of the elastic classification model is to classify logs as relevant or irrelevant. We use streaming data to reduce lag, because time is the most important factor in decision making: as the saying goes, a correct decision taken at the wrong time is ultimately INCORRECT.
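The paper builds its model in R with RHadoop. The following is a minimal Python sketch, not the author's code, of the underlying technique: binary logistic regression fitted by batch gradient descent on log-loss, applied to labelling log lines as relevant or irrelevant. The keyword features, the toy log lines and their labels, and the helper names (`features`, `train`, `classify`) are hypothetical, because the abstract does not describe the actual features or data. Flume ingestion and HDFS storage are omitted.

```python
import math

# Hypothetical features: the abstract does not name them, so this sketch
# uses a bias term plus keyword indicators for each log line.
KEYWORDS = ("ERROR", "WARN", "timeout")


def features(line):
    """Map a log line to [bias, has_ERROR, has_WARN, has_timeout]."""
    return [1.0] + [1.0 if k in line else 0.0 for k in KEYWORDS]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Estimated probability that a feature vector x is 'relevant'."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train(lines, labels, lr=1.0, epochs=2000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    X = [features(line) for line in lines]
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            err = predict_proba(w, x) - y  # derivative of log-loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def classify(w, line, threshold=0.5):
    """Label a single log line using the fitted weights."""
    return "relevant" if predict_proba(w, features(line)) >= threshold else "irrelevant"


# Invented toy data (1 = relevant, 0 = irrelevant); not from the paper.
TRAIN_LINES = [
    "ERROR disk full on node3",
    "WARN connection timeout to db",
    "ERROR timeout writing block",
    "INFO heartbeat ok",
    "INFO user login",
    "DEBUG cache hit",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    weights = train(TRAIN_LINES, TRAIN_LABELS)
    for new_line in ("ERROR out of memory", "INFO disk check"):
        print(new_line, "->", classify(weights, new_line))
```

In the pipeline the abstract describes, Flume would deliver log lines into HDFS and the model would be trained from R through RHadoop; here the same fit and predict steps run in memory on a handful of invented lines, so the numbers say nothing about the paper's results.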

Published

2014-09-29