The dimension is a weak predictor, but for small cars it comes out to be a very strong predictor. So when I'm building the model, instead of building one single model to predict the mpg of all the cars put together, which is silly, I'll build separate models: one for small cars, one for large cars. And when I build those models,
I will not use the same attributes. The attributes I use to build the model for large cars will probably be different from the attributes I use to build the model for the small cars. Right? And the one dimension you can lose is acceleration. There the pattern is almost horizontal, and this fact comes out in the pair panel. Acceleration is a useless dimension. Look at any value of acceleration: you'll find all the cars there, large cars, small cars, every kind of car. And if you look at the pair panel, look at acceleration.
This is acceleration: all the clusters are overlapping. So it is a useless dimension. So that was k-means clustering. I can actually show one more example of how I used k-means clustering. This is for tech support, and I hope I have the data and the code. What I was really trying to do was build models using different dimensions [unclear]. This is a linear model. Somewhere down the line, support vector machine didn't help much; it gave me 87% only. I used decision tree, linear regression, ridge and lasso.
None of them helped. They were still giving performance in the range of the seventies only. But decision tree came out to be better: decision tree gives me almost 90% accuracy in production. But looking at the output of this clustering on the various dimensions, I took a call that if I build a linear model, I have to build separate linear models. But if you want one single model, the decision tree regressor came out to be the right one. It gave me an accuracy of almost 90%. If you want to explore this, if you want to try this out, go and explore it. It is not complete: I have not built the linear models. Build linear models on the individual clusters and see what their performance is.
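The idea of one linear model per cluster versus one linear model for everything can be sketched roughly as below. This is only an illustration on made-up data, not the tech support or mpg code from the session: the two synthetic groups, the slopes, and every variable name are assumptions, and the comparison is in-sample only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for two groups (think "small" and "large" cars).
# The predictor is strong in the first group and weak in the second.
x_small = rng.uniform(0.0, 1.0, size=100)
x_large = rng.uniform(3.0, 4.0, size=100)
y_small = 30.0 - 8.0 * x_small + rng.normal(0.0, 0.2, size=100)
y_large = 15.0 - 0.5 * x_large + rng.normal(0.0, 0.2, size=100)

X = np.concatenate([x_small, x_large]).reshape(-1, 1)
y = np.concatenate([y_small, y_large])

# Step 1: find the groups with k-means (k=2 is assumed known here).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2a: one single linear model for all records.
single_model = LinearRegression().fit(X, y)
single_mse = np.mean((y - single_model.predict(X)) ** 2)

# Step 2b: one linear model per cluster, predictions stitched back together.
pred_split = np.empty_like(y)
for k in np.unique(labels):
    mask = labels == k
    cluster_model = LinearRegression().fit(X[mask], y[mask])
    pred_split[mask] = cluster_model.predict(X[mask])
split_mse = np.mean((y - pred_split) ** 2)

print(f"single model MSE:      {single_mse:.3f}")
print(f"per-cluster model MSE: {split_mse:.3f}")
```

On training data the per-cluster fit can never be worse than the single line, so a fair comparison on real data would use a held-out test set.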
That is how it can be used. In this particular case study it is used to suggest ideas for improving customer satisfaction with technical support. I realized I overwrote that with another code, though. The name of the file is still tech support analysis, but it is not tech support. So I'm going to share this code with you. Walk through the code; you will find it easier to understand if you have followed me on the mpg data set.
You'll be able to understand this particular example. Okay, so shall we get into hierarchical clustering? K-means clustering is a very powerful, very commonly used technique. As you would have understood, it requires a lot of thought process, a lot of understanding of the data, before you implement it. And one of the challenges of k-means is: how do you know how many clusters to look for? That is where this hierarchical clustering becomes handy. Hierarchical clustering gives you a visual way of finding out how many clusters are likely to be there in your data. But at the same time I must caution you: hierarchical clustering may show you, visually, some clusters in your data set, and those clusters may not exist. Those clusters may not really exist.
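A minimal sketch of using hierarchical clustering to guess the number of clusters, assuming SciPy and three made-up, well-separated blobs. The "largest gap in merge heights" rule used here is one common heuristic and an assumption of this sketch, not necessarily the method used in the session; on real data it can suggest clusters that do not really exist.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(42)

# Three synthetic, well-separated blobs of 20 points each (illustrative data).
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Bottom-up clustering; each row of Z records one merge,
# and column 2 holds the distance at which that merge happened.
Z = linkage(points, method="average", metric="euclidean")
heights = Z[:, 2]

# Heuristic: a big jump between consecutive merge heights suggests
# that the merges after the jump are joining genuinely distinct groups.
gaps = np.diff(heights)
merges_before_cut = int(np.argmax(gaps)) + 1
suggested_k = len(points) - merges_before_cut

# Cut the tree into that many flat clusters (labels start at 1).
labels = fcluster(Z, t=suggested_k, criterion="maxclust")
print("suggested number of clusters:", suggested_k)
```

Passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the tree, which is the visual version of the same check.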
We need to be careful. There are ways of preventing ourselves from falling into such potholes, so we'll see how we can prevent ourselves from getting into these traps, and one of the ways in which we can do it. But before we go there: in k-means clustering you find the similarity or dissimilarity between data points, for which we used Euclidean distance. All of you are comfortable with that? Okay. When you come to hierarchical clustering, what we're going to see is agglomerative, that is, bottom up. Every record, every data point in the mathematical space, is treated as a cluster.
In the next iteration, two of these clusters are coalesced, joined together, to form a larger cluster. Which clusters to join to form larger clusters is decided based on the distance between the clusters. Once again the same rule applies: the closer the clusters are to one another, the more similar they are. So I want to combine similar clusters into larger ones. Okay, but the problem is this: when you start looking for similarity between clusters, a cluster is no longer one single data point. It is a combination of data points. So the distance calculation between clusters, to find out which are more similar to each other, becomes slightly complex. It becomes complex because there are different ways of doing it. The different ways of doing it are shown here. One is called single linkage, or minimum linkage. Imagine you have three clusters: a blue cluster here, a yellow cluster, and a red cluster.
Now, if I want to find out which two clusters are closest to each other, what the single linkage method does is find the distance between the two points that come from different clusters and are nearest to each other. So it finds that distance between red and yellow, red and blue, yellow and blue. Of the three, the red and blue distance is the least, so it combines red with blue. This is called the single linkage method. If I look at max linkage, it finds the distance between the farthest points from the different clusters: red and yellow, red and blue, yellow and blue.
It takes the pair whose farthest distance is the least; it will always take the least. Once again it might combine two of them. Or average distance: average distance is the pairwise distance between all the data points here and all the data points there, and then their average. Do this for the red and yellow clusters, the red and blue clusters, the yellow and blue clusters. Combine those two clusters whose average distance is the least. So if a few points are closer, would it shift to those points? No, no. This is done over all the data points. Let me take a bunch of you as an example [unclear]. Suppose there's a blue cluster here, and the next one here. Let's do this. Okay.
There are three clusters here. Three clusters. Find out the average distance of all of you from all of them. Find out the average distance of all of you from all of them over there. Find out which average is the least; you merge with that cluster. So these two become one cluster, right? Obviously the distance between these points and those will be very high [unclear]. That is what this average method does. No, it doesn't mean the points move. All the data points are considered: all the data points of these two clusters are closest, so they become one. It is not going point to point; it is taking the average. Isn't that a better way? Because you're taking more of the variance in your data set into account to decide which ones to cluster. What about the boundary points? In this case the boundary points are those two and these three; those are the boundary points in these two clusters.
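The three linkage rules described above (single or minimum, max or complete, and average) can be compared on a toy example. This is a minimal sketch: the red, blue and yellow coordinates are invented for illustration, and `closest_pair` is a hypothetical helper, not a library function.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import cdist

# Made-up 2-D clusters; blue is deliberately stretched out.
clusters = {
    "red": np.array([[0.0, 0.0], [1.0, 0.0]]),
    "blue": np.array([[2.0, 0.0], [9.0, 0.0]]),
    "yellow": np.array([[3.5, 0.0], [4.0, 0.0]]),
}

# Each linkage reduces the matrix of pairwise point distances differently.
linkages = {
    "single": np.min,     # nearest points from the two clusters
    "complete": np.max,   # farthest points from the two clusters
    "average": np.mean,   # mean over all cross-cluster point pairs
}


def closest_pair(clusters, reduce_fn):
    """Return (distance, name_a, name_b) for the pair with the least linkage distance."""
    best = None
    for a, b in combinations(clusters, 2):
        pairwise = cdist(clusters[a], clusters[b])  # Euclidean by default
        d = reduce_fn(pairwise)
        if best is None or d < best[0]:
            best = (d, a, b)
    return best


merge_choice = {name: closest_pair(clusters, fn) for name, fn in linkages.items()}
for name, (d, a, b) in merge_choice.items():
    print(f"{name:>8}: merge {a} with {b} (distance {d:.2f})")
```

With these coordinates single linkage merges red with blue, because their nearest points are 1 apart, while complete and average linkage both merge red with yellow, because the far blue point at 9 counts against blue. Whichever rule is used, the pair with the least distance is the one merged.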