Real-time analytics, in general, means that you are analyzing the data as and when the data comes. And that is a separate branch of analytics. Because so far, whatever you have been doing, whether it is Spark, Hadoop, or your machine learning, whatever branch of analysis it is, you have been working on static data. Your data is there. You are not moving it. This is called data at rest. And then we have data in motion. Data at rest is easy to handle because the data is there.
It is not moving anywhere. You just query the data, or do whatever you want with it. But data in motion is very difficult to handle, because you need systems that can handle real-time data, and you need to take care of things like fault tolerance. What will happen if my system breaks? What if I lose real-time data? The speed of processing and all these things become a challenge. There are several examples of real-time analytics and we will discuss them, but technically speaking, real-time analytics means you are analyzing the data as and when it comes, right? We learned previously about the V's of big data: volume, velocity, variety. There, velocity stands for real-time analytics. Streaming data can be generated by thousands of sources, which typically send in the data records simultaneously, and you need the ability to collect and process it.
Handling terabytes of data in real time: that is streaming data analytics. And it is a very interesting field. It starts from ad recommendations: you are browsing some website and you start getting recommendations and ads. It goes from there all the way to things like fraud detection. Like I said, if you are swiping a credit card and immediately you get a call asking, "Did you actually make this transaction?" And customized offers: I use HDFC Bank, so when I make an online transaction I get an SMS saying, "You have transacted for this much. Here is an offer for you."
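To make the fraud detection idea a little more concrete, here is a minimal sketch of handling data in motion: each card transaction is looked at once, as it arrives, and is flagged if the amount is far above that card's running average. The rule, the `factor` and `min_history` parameters and the sample transactions are all assumptions for illustration. This is not how any particular bank does it, and it uses plain Python rather than Spark.

```python
from collections import defaultdict


def flag_suspicious(transactions, factor=5.0, min_history=3):
    """Yield (card, amount) for amounts far above that card's running mean.

    Each transaction is processed once, in arrival order. Only a count and
    a running total per card are kept, never the full history.
    """
    count = defaultdict(int)
    total = defaultdict(float)
    for card, amount in transactions:
        # Only judge a card once we have seen a few transactions for it.
        if count[card] >= min_history:
            mean = total[card] / count[card]
            if amount > factor * mean:
                yield card, amount
        # Update the running state (flagged amounts are included too).
        count[card] += 1
        total[card] += amount


# Hypothetical stream of (card_id, amount) records.
stream = [
    ("A", 20.0), ("A", 25.0), ("B", 300.0),
    ("A", 30.0), ("A", 900.0), ("B", 310.0),
]
alerts = list(flag_suspicious(stream))
print(alerts)
```

Because `flag_suspicious` is a generator, it can be fed from any iterator, including one that never ends, which is the point of data in motion.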
All this is analyzing the data in real time. And of course Twitter: streaming the data live and then predicting trends. It's very popular, right? When the World Cup matches are happening, which team is popular, which player is popular, all this has to be real-time. Why? You can't start the prediction now, right? The World Cup matches are over. If you start the prediction now, it is basically useless. So all these are examples of real-time analytics, and Spark is excellent at doing this. That is what we have to understand. We have collected some of the use cases; some of them I have done myself, and some of them my friends have worked on. The first is fraud detection in credit cards using bank transactions.

So the next question is: how do you do this? Is Spark the only framework which can do this? No. Spark is the latest one. Spark is very new, right?
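The World Cup trend example above (which team or player is popular right now) can be sketched as a sliding-window count. This is a minimal plain-Python illustration, not Spark code. The class name `TrendingCounter`, the 60-second window and the sample keywords are assumptions, and it assumes events arrive in timestamp order.

```python
from collections import Counter, deque


class TrendingCounter:
    """Count keywords seen in the last `window_seconds` of a stream."""

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        self.events = deque()   # (timestamp, keyword), oldest first
        self.counts = Counter()

    def add(self, timestamp, keyword):
        """Record one keyword and forget anything older than the window."""
        self.events.append((timestamp, keyword))
        self.counts[keyword] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, n=3):
        """Return the n most frequent keywords in the current window."""
        return self.counts.most_common(n)


# Hypothetical (seconds, keyword) events.
trends = TrendingCounter(window_seconds=60)
events = [
    (0, "team_a"), (10, "player_x"), (20, "team_a"),
    (70, "team_b"), (75, "team_b"), (80, "player_x"),
]
for ts, word in events:
    trends.add(ts, word)
print(trends.top(2))
```

Old mentions drop out as time moves on, which is why a count computed after the matches are over says nothing about what was trending during them.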
Before Spark, people still used to do these things, right? In the older days MapReduce was used to do this, but that was very slow, because, you know, MapReduce will take a lot of time to process your data. One of the frameworks I have worked on a little bit is called Apache Storm. There is something called Apache Storm: S-T-O-R-M. Storm is hardly around any more. It's very rare to see Storm, but I personally worked on Storm around 2012, because at that point in time Storm was the only framework which could handle streaming. I was not a Storm expert, but people were ready to pay any money for Storm training. It was almost the only real-time processing engine. The problem with Storm is that it's a separate framework.
You have to learn it separately: its API, its commands, whatever it has. Then you have to install it on a cluster, and Storm starts working. Then you have to integrate that with your Hadoop or whatever. It's not a default framework in Hadoop. Storm can in fact read data from anywhere. It is not tied to Hadoop; it can read from anywhere. But people had big data, so what they used to do was set up a Storm cluster and connect that with Hadoop. Citibank was using the same infrastructure. They had a Storm cluster for credit card fraud detection. Storm is good. Actually, Storm is a very nice framework. Even today the framework exists, but very few people actually use it, right?

The second use case is ad optimization and targeting based on each visit.
When you see these ads, there are two things. One is showing the ad. Second, because this is ad optimization, is the personalization of the ad. So first the ad should come, and second the personalization should come. Sometimes you can see that if you visit Amazon or something, the ad will pop up, but it will be based on personalization, or on the category of the customer. Let's say you are browsing from a particular city: based on that trend it will show the ad. All this has to happen in real time, and then it has to show the ad. So real-time processing is used there as well. Third is, again, personalized content. Then there are self-driving cars.
That is your artificial intelligence and machine learning: if you're building a self-driving car or something, it has to make real-time decisions. When to stop, how not to hit people, something like that. And social media trends: Twitter top trending keywords, Facebook top trending keywords. These things are real-time.

Then IoT devices. We have worked with Bosch in one of the projects, where we were handling IoT data. There is a company called Bosch, right? They had an industry solution where they would compute the thermal variations in factories. They had installed some sensors across the factories. The factories were in Japan.
These sensors would calculate variations in thermal power and other things. So the sensors send the data, and then Bosch themselves had a way to give structure to this data. They would create sort of a table, actually, and then they would send the data. Okay, so what were we doing? I had actually built a small case study based on this, but I don't have it with me. Basically it was like this. This was their factory. Well, it's a group of factories or whatever, I don't know. From here the sensor data would come; the sensors collect the data. Once the sensors collect the data, Bosch had a system where they would create sort of a tabular structure. It's not a table, but they would give it rows or some sort of structure in a particular system. They had an application here, right? Now, our requirement was this.
This data was really huge. We were getting around two million, up to 3.3 million, lines of data in a week, actually. And the requirement was that they were doing a lot of machine learning on top of this data. They had a data science team, and they would give us the requirement, saying: get this data. There are some parameters in the data; take those parameters. Then they have some equations; apply the equation, and if the result crosses a particular threshold, then either you have to alert them, or sometimes you have to send a notification to them, or sometimes you have to classify the data. So what we were doing was this: once we get the tabular data here, right, this is where the data will come.
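The requirement just described (take some parameters from each row, apply an equation, and act when the result crosses a threshold) can be sketched roughly as below. The lecture does not give the real parameters, equation or thresholds, so `heat_index`, the field names `temperature` and `power`, and the two threshold values are hypothetical and only show the shape of the logic.

```python
def heat_index(row):
    """Hypothetical equation over two parameters of a sensor row."""
    # The real equation from the data science team is not given.
    return row["temperature"] * 0.8 + row["power"] * 0.2


def handle(row, alert_at=90.0, notify_at=70.0):
    """Apply the equation to one row and decide what to do with it."""
    score = heat_index(row)
    if score >= alert_at:
        return "alert"     # crossed the higher threshold
    if score >= notify_at:
        return "notify"    # crossed only the lower threshold
    return "normal"        # just classify and move on


# Hypothetical rows, as they might arrive from the tabular structure.
rows = [
    {"sensor": "s1", "temperature": 60.0, "power": 50.0},
    {"sensor": "s2", "temperature": 85.0, "power": 70.0},
    {"sensor": "s3", "temperature": 100.0, "power": 95.0},
]
decisions = [(row["sensor"], handle(row)) for row in rows]
print(decisions)
```

In a streaming setup the same `handle` function would be applied to each row as it arrives instead of to a list that is already complete.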