BUILDING A CLASSIFICATION MODEL WITH LOGISTIC REGRESSION ON REAL TIME BIG DATA USING R, APACHE HADOOP, RHADOOP & APACHE FLUME
Abstract
This paper describes building a classification model with logistic regression in R, using the open-source RHadoop packages together with robust and resilient Apache Hadoop and the real-time data handling capabilities of Apache Flume. We integrated Hadoop with Flume to handle real-time (streaming) big data and used RHadoop to integrate R with HDFS. We then used R to build a classification model for log management. The objective of this elastic classification model is to classify logs as relevant or irrelevant. We use streaming data to reduce lag, because time is the most important factor in decision making: as the saying goes, a correct decision taken at the wrong time is ultimately INCORRECT.