# How is the data technology evolving?

Say we want to find out whether there is any sort of relationship between people who smoke and their chance of getting cancer. Here the dependent variable has two labels, yes or no: yes, the person has cancer, or no, the person does not have cancer. The independent variable is also yes or no: yes, the person smokes, or no, the person does not smoke. With the help of a classification algorithm, I am trying to determine, given whether a person smokes or not, the probability of that person having cancer.
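A minimal sketch of the smoking example in plain Python. The records are invented toy data, and the "classifier" is the simplest probabilistic one, not any particular library's algorithm. With a single yes/no input, it estimates the probability of cancer as the fraction of smokers (or non-smokers) in the data who have cancer. It then predicts "yes" when that probability is at least 0.5, which is an assumed cutoff.

```python
# Hypothetical toy records: (smokes, has_cancer). Invented for illustration only.
records = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "no"), ("yes", "yes"),
    ("no", "no"), ("no", "no"), ("no", "yes"), ("no", "no"),
]


def probability_of_cancer(records, smokes):
    """Estimate P(cancer = yes | smokes) as a relative frequency in the data."""
    group = [cancer for s, cancer in records if s == smokes]
    if not group:
        raise ValueError(f"no records with smokes={smokes!r}")
    return sum(1 for cancer in group if cancer == "yes") / len(group)


def classify(records, smokes, threshold=0.5):
    """Predict the label 'yes' or 'no' by thresholding the estimated probability."""
    return "yes" if probability_of_cancer(records, smokes) >= threshold else "no"


print(probability_of_cancer(records, "yes"))  # 0.75
print(probability_of_cancer(records, "no"))   # 0.25
print(classify(records, "yes"), classify(records, "no"))  # yes no
```

The output is both a probability and a yes/no label, which matches the two ways the example above is phrased.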

So this is how a classification algorithm works. Next, let's look at the next type of supervised learning algorithm, which is regression. Regression is basically used to find out the relationship among different entities, and in this session we'll focus on linear regression. As the name states, linear regression is a specific type of regression algorithm that is used to find the linear relationship among different entities, and in linear regression the dependent variable is a continuous numerical value. A perfect example of this would be the following.

Say we are trying to determine whether there is a relationship between the age of a person and the salary of that person. Here the dependent variable would be the salary and the independent variable would be the age. If I give an arbitrary value for the age of the person, I want the exact salary of the person as the output. The age is X over here, so if I feed in an X value of 28, which is the age of the person, I want a particular value for the salary to come out, maybe one lakh INR.
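A minimal sketch of the age/salary example, fitting the line with ordinary least squares in plain Python. The data points are invented and deliberately chosen to lie exactly on a line, so that feeding in an age of 28 returns one lakh (100,000) INR, the figure used in the example. Real salary data would be noisy, and the fitted line would only approximate it.

```python
# Hypothetical (age, salary in INR) data, invented so the points lie exactly on
# the line salary = 30,000 + 2,500 * age.
ages = [20, 25, 30, 35, 40]
salaries = [80_000, 92_500, 105_000, 117_500, 130_000]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the common 1/n factor cancels
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(ages, salaries)
predicted_salary = intercept + slope * 28  # feed in x = 28
print(slope, intercept, predicted_salary)  # 2500.0 30000.0 100000.0
```

Because the dependent variable is continuous, the prediction is a number on the fitted line rather than a class label.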

This is how the linear regression algorithm works. Now we'll head on to the next type of machine learning algorithm, which is unsupervised learning. In supervised learning, we saw that there were independent variables and dependent variables, and there was also a class label associated with the data. In unsupervised learning, we have input data, but there are no class labels associated with it. If we look at this picture, we just have images of cars and cycles, but there are no labels indicating that this is a car or this is a cycle. So we'll take this set of input data with no class labels.

We'll feed this to an unsupervised machine learning algorithm, and this unsupervised machine learning algorithm is basically a clustering algorithm. The algorithm takes in this data and divides it into two clusters: the first cluster comprises only the cars, and the second cluster comprises only the cycles. One thing to note here is that this unsupervised learning algorithm was able to divide the entire data into these two clusters, one with only cars and the other with only cycles, without any input labels associated with them.
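The transcript does not name a specific clustering algorithm. As an assumption, here is a minimal k-means sketch in plain Python, with made-up 2-D feature vectors standing in for the car and cycle images. It uses a simple deterministic farthest-first initialisation, not the randomised k-means++ scheme. It also compares the average distance within each cluster to the average distance between the two clusters, which is the intra-cluster similarity versus inter-cluster dissimilarity idea discussed next.

```python
import math

# Hypothetical 2-D feature vectors standing in for the images.
# No labels are given; the algorithm only sees the numbers.
points = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]


def init_centroids(points, k):
    """Farthest-first initialisation: start from the first point, then repeatedly
    add the point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of the points assigned to it."""
    centroids = init_centroids(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old centroid if a cluster ends up empty
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


def within_distance(cluster):
    """Average distance between distinct points of one cluster (needs 2+ points)."""
    pairs = [(a, b) for i, a in enumerate(cluster) for b in cluster[i + 1:]]
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def between_distance(cluster_1, cluster_2):
    """Average distance between a point of cluster_1 and a point of cluster_2."""
    total = sum(math.dist(a, b) for a in cluster_1 for b in cluster_2)
    return total / (len(cluster_1) * len(cluster_2))


clusters, centroids = kmeans(points, k=2)
for cluster in clusters:
    print(cluster)
print(within_distance(clusters[0]), within_distance(clusters[1]), between_distance(*clusters))
```

On this toy data the two groups are far apart, so each cluster ends up with one group, and the within-cluster distances come out much smaller than the between-cluster distance.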

The algorithm was able to do so by understanding the underlying structure of this particular data. If I take this first cluster, you have to understand that these data points are very similar to each other, that is, there is very high intra-cluster similarity: these cars are very similar to each other. Again, if I take the second cluster, all of these cycles are very similar to each other, so there is very high intra-cluster similarity there as well. Now let's say this is cluster 1 and this is cluster 2.

If I compare cluster 1 with cluster 2, there is very high inter-cluster dissimilarity: data points from one cluster are very dissimilar to data points from the other cluster. We were able to do all of this using an unsupervised learning algorithm. So these are the basics of the data mining process. Now that we've properly understood what exactly machine learning is and what its categories are, it's time to do a demo and implement these different data mining techniques. I'll head on to Jupyter and open a new Python 3 notebook, and we'll be working with the iris dataset.

Let me load the dataset, but before I do that I'll go ahead and load the required libraries. I'll type in `import pandas as pd`; pandas is a Python library that is mostly used for data manipulation. Now that I have loaded it, let me load the file I'll be working with. I'll type in `pd.read_csv`, give it the name of the file, which is `iris.csv`, and store the result in a new object that I'll name `iris`. So I've got my dataset stored in this new object called `iris`. Now I want to have a glance at the first few records of this dataset.

To do that, I'll use the `head` function, which gives me the first five records present in this dataset. Similarly, if I want to look at the first ten records of the dataset, I just put in 10 instead of 5. What I'm doing here is performing the basic data manipulation tasks: my first task was data acquisition, where I acquired the data, and after that comes data pre-processing, where I'm understanding the structure of the data.
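A minimal sketch of these loading and inspection steps in pandas. To keep it runnable without the file, it reads the first ten rows of Fisher's iris data from an in-memory string through `io.StringIO` instead of `iris.csv`. The column names (`Sepal.Length` and so on) are an assumption, so check the header of your own file.

```python
import io

import pandas as pd

# Stand-in for iris.csv: the first ten rows of Fisher's iris data.
# Assumption: these column names; your copy of iris.csv may use different ones.
IRIS_CSV = """Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
"""

# With the real file this line would be: iris = pd.read_csv("iris.csv")
iris = pd.read_csv(io.StringIO(IRIS_CSV))

first_five = iris.head()    # head() returns the first 5 rows by default
first_ten = iris.head(10)   # pass n to get a different number of rows
print(first_five)
print(first_ten)

# A first look at the structure of the data
print(iris.shape)           # (number of rows, number of columns)
print(iris.dtypes)          # the type pandas inferred for each column
```

With the full dataset, `iris.shape` would report all of its rows; here it is `(10, 5)` because only ten rows are embedded.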

Having looked at this, we see that there are five columns in total. This dataset is about the iris flower, which has different species: the setosa species, the versicolor species and the virginica species. There are also different features associated with these species of the iris flower: the sepal length, sepal width, petal length and petal width. Now I'll go ahead and perform different data manipulation operations. Let's say I want only those records from this entire dataset where the sepal length is greater than five. I'll start by giving a condition, so first I'll type in the name of the dataset, which is `iris`.

Then I'll put in square brackets and give the name of the column, which is sepal length. Keep in mind that you have to give the correct spelling here and also take care of the capital letters; if you don't, you'll get an error. I want only those records where the sepal length is greater than five, so iris sepal length has to be greater than five. Let me click on Run and see what we get.

We get a bunch of True and False values, so what does this mean? We get a True value wherever the condition is satisfied: in the first record, the sepal length of the iris flower is greater than five, which is why we have a True value there. After that, we have four False values, because the sepal lengths in the records at index 1, 2, 3 and 4 are not greater than five. So wherever the condition is satisfied you get a True value, and wherever it is not satisfied you get a False value.
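The condition above, and the follow-up step of actually keeping only the matching records, can be sketched in pandas as follows. As an assumption, it uses an in-memory copy of the first ten rows of the iris data instead of `iris.csv`, with the column named `Sepal.Length`; the exact column name in your file may differ.

```python
import io

import pandas as pd

# Stand-in for iris.csv: the first ten rows of Fisher's iris data (column names assumed).
IRIS_CSV = """Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
"""
iris = pd.read_csv(io.StringIO(IRIS_CSV))

# Comparing a column with a number gives a boolean Series: True where the condition holds
mask = iris["Sepal.Length"] > 5
print(mask)

# Indexing the DataFrame with that mask keeps only the rows where it is True
long_sepals = iris[mask]
print(long_sepals)
```

With these ten rows, the mask reproduces the pattern described above: True at index 0, then four False values at indices 1 to 4 (5.0 is not strictly greater than 5). Indexing `iris` with the mask keeps only the rows at index 0 and 5.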