I think the data you keep in your organization is actually not right. The third area is this: if you have ever visited a manufacturing plant, a big building, or some sort of oil plant, or, let's say, a hospital or a school, you must have noticed the assets. In a manufacturing scenario, for example, there are a lot of machines and assembly lines.
There are a lot of sensors and a lot of subsystems. All these assets are managed in a particular order; they are grouped into categories. So that's another area where we want to make sure we can map, say, a sensor to a system: I should be able to map sensor XYZ to system ABC, and some other sensor to another system. This is another area where product categorization is hugely important.
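A minimal sketch of the sensor-to-system mapping idea, assuming the mapping is a simple lookup table. The identifiers (`sensor-XYZ`, `system-ABC`, `sensor-123`, `system-DEF`) and the `UNMAPPED` bucket are hypothetical and only for illustration. In practice the mapping would come from the organization's own asset records.

```python
from collections import defaultdict

# Hypothetical lookup table: asset ID -> system the asset belongs to.
ASSET_TO_SYSTEM = {
    "sensor-XYZ": "system-ABC",
    "sensor-123": "system-DEF",
}


def group_by_system(asset_ids, mapping):
    """Group asset IDs under their system; assets with no mapping go to 'UNMAPPED'."""
    groups = defaultdict(list)
    for asset_id in asset_ids:
        groups[mapping.get(asset_id, "UNMAPPED")].append(asset_id)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_system(["sensor-XYZ", "sensor-123", "sensor-999"], ASSET_TO_SYSTEM))
```

The `UNMAPPED` bucket holds the assets that nobody has assigned to a system yet. Those are the assets a categorization model would be asked to place.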
What do we do, basically? When we categorize a product, we essentially create a signature of the product: a path we can travel through a set of data points to reach a product definition. Now, what's the business impact? Why does it really matter? As I told you, one case is a good shopping experience in an e-commerce setup. In the engineering-to-business data mapping setup, it is important because if the business data is not right, your financial reporting is not going to be right. That means you're not giving correct numbers to your shareholders, your customers, and so on.
It really badly impacts your business. I've already told you about the bad shopping experience, and it all boils down to how we organize and digitize our products so that all the other dependent systems don't face challenges and work seamlessly. And if you do that through machine learning, adding some automation on top of it, then we can also save a lot of manual labor time and cost. So that, in a nutshell, is what product categorization is and why it matters. Now I'll give you some snapshots of what we're talking about here. This is what we mean by products.
Let's say the product is, for example, a sherwani, right here. It is categorized in the group Ethnic Wear, and you can see the hierarchy of product categories: Ethnic Wear is linked to the Men category, and the final top-level category is Clothing. This is one kind of product categorization that has to be right; otherwise it's going to give you a lot of challenges business-wise. So what do we generally get in our data set? Let's say that for the sherwani, they may have assigned a product number.
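The hierarchy in this example can be stored as child-to-parent links and walked upward to recover the full category path. This is a minimal sketch that treats the product (the sherwani) as the leaf. The `PARENT` table and the `category_path` helper are hypothetical names chosen for illustration.

```python
# Hypothetical child -> parent links mirroring the slide's example.
PARENT = {
    "Sherwani": "Ethnic Wear",
    "Ethnic Wear": "Men",
    "Men": "Clothing",
}


def category_path(leaf, parent=PARENT):
    """Return the path from the top-level category down to `leaf`."""
    path = [leaf]
    seen = {leaf}
    while path[-1] in parent:
        node = parent[path[-1]]
        if node in seen:  # guard against a malformed, cyclic hierarchy
            raise ValueError(f"cycle detected at {node!r}")
        seen.add(node)
        path.append(node)
    return list(reversed(path))


if __name__ == "__main__":
    print(" > ".join(category_path("Sherwani")))
    # Clothing > Men > Ethnic Wear > Sherwani
```

Getting the categorization "right" means every level of this path is correct, not just the leaf.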
They have also assigned a description. Here, for example, the description is "Sherwani 1 in 362". If you combine this description with the category level above it, we can say "Ethnic Wear Sherwani 1 in 362". This kind of textual description is very unstructured. Then you can have some other data, for example which supplier is giving you this product, where the product is manufactured, and what its color and material are. So you may have lots of features related to the product as part of your metadata, which can be used to build the machine learning solution.
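One simple way to combine the unstructured description with structured metadata is to flatten everything into a single text string, with the metadata as `key=value` tokens. This is a sketch under that assumption. The attribute names (`supplier`, `color`, `material`) and the `build_text_input` helper are hypothetical.

```python
def build_text_input(category_layer, description, metadata):
    """Join a category layer, the free-text description, and key=value metadata tokens."""
    parts = [category_layer.strip(), description.strip()]
    # Sort keys so the same record always produces the same string.
    for key in sorted(metadata):
        value = metadata[key]
        if value is None or str(value).strip() == "":
            continue  # skip missing attributes
        parts.append(f"{key}={str(value).strip()}")
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    print(build_text_input(
        "Ethnic Wear",
        "Sherwani",
        {"supplier": "S-01", "color": "maroon", "material": None},
    ))
    # Ethnic Wear Sherwani color=maroon supplier=S-01
```

If the category layer is itself the label the model must predict, pass an empty string for it at inference time. Otherwise the answer leaks into the input.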
Nowadays we also use an image of the product along with the metadata. At times the image really benefits categorization a lot, but at times it may not, so it's always a balance between the metadata and the image data. In this session we're not going through a case study with images, because images generally lead us into deep learning concepts like convolutional networks, where we have to talk about bounding boxes, segmentation, and so on, which are somewhat advanced topics. Instead, we will go through the approaches we can adopt with the metadata, that is, a mix of unstructured and structured data.
What are the key challenges? As you would have already anticipated, a large number of categories is a big challenge. Then there is the unstructured nature of the data; as I told you, it is highly textual. And because you have lots of categories, you have very high cardinality. Cardinality is simply the number of distinct categories, or levels, you have in a feature. Another, even bigger, challenge is that new products are introduced all the time. So the whole process becomes dynamic: you have to keep refining and retraining your model.
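To make cardinality concrete, this small sketch counts the distinct labels at each level of a set of category paths. It also includes a helper that flags labels in new data that were never seen in training, which is one simple signal that retraining is due. The example paths and both helper names are hypothetical. Labels are counted by name, so "Ethnic Wear" under both Men and Women counts once.

```python
def cardinality_per_level(paths):
    """Count distinct labels at each level of a list of category paths."""
    depth = max(len(p) for p in paths)
    return [len({p[i] for p in paths if len(p) > i}) for i in range(depth)]


def unseen_labels(train_labels, incoming_labels):
    """Labels in new data that never appeared in training (a retraining signal)."""
    return sorted(set(incoming_labels) - set(train_labels))


if __name__ == "__main__":
    paths = [
        ("Clothing", "Men", "Ethnic Wear"),
        ("Clothing", "Women", "Ethnic Wear"),
        ("Clothing", "Men", "Formal Wear"),
    ]
    print(cardinality_per_level(paths))  # [1, 2, 2]
    print(unseen_labels(["Ethnic Wear", "Formal Wear"], ["Ethnic Wear", "Sportswear"]))
    # ['Sportswear']
```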
Retraining is what lets the model take care of the new products being introduced. So these are the key challenges in developing a solution around this problem. What can we do about it? We'll go through a case study, but first I'll tell you the steps we will go through, in order. First, we will take you through pre-processing: how we clean up the data and prepare it. Then we will discuss some feature selection strategies, like n-grams and bigrams, which can actually help reduce the cardinality.
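A minimal sketch of these first two steps, assuming basic text cleaning followed by word unigrams and bigrams. Rare n-grams are then dropped with a document-frequency cut-off so that the vocabulary, and therefore the feature cardinality, stays small. The cleaning rules (lowercase, ASCII letters and digits only), the `min_df` threshold, and the function names are assumptions for illustration, not the speaker's exact settings.

```python
import re
from collections import Counter


def preprocess(text):
    """Lowercase, turn runs of non-alphanumeric (ASCII-only) characters into spaces, trim."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def word_ngrams(text, n_values=(1, 2)):
    """Word n-grams (unigrams and bigrams by default) of the cleaned text."""
    tokens = preprocess(text).split()
    grams = []
    for n in n_values:
        grams.extend(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return grams


def build_vocabulary(docs, min_df=2, n_values=(1, 2)):
    """Keep only n-grams that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(word_ngrams(doc, n_values)))
    return sorted(g for g, count in doc_freq.items() if count >= min_df)


if __name__ == "__main__":
    docs = ["Red Cotton Shirt", "Blue cotton-shirt", "Red wool cap"]
    print(build_vocabulary(docs))  # ['cotton', 'cotton shirt', 'red', 'shirt']
```

Raising `min_df` shrinks the vocabulary further, at the risk of discarding tokens that identify rare categories.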
We will look at some tips and tricks: how we can do better and faster model building, and how we can make sure the size of the model is not huge, considering the problem space is huge. Finally, we will look at some of the models and their results, and how they stand relative to each other. That is the overall strategy I will be going through, so let's jump into the problem statement. The problem statement I have taken is that we already have engineering data and we want to move it to business categories. As I was telling you, that means going from the engineering PLM system to the business ERP system.
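The talk doesn't say here which size-reduction tricks it uses. Purely as one illustration, the hashing trick (a swapped-in technique, not necessarily the speaker's) caps the feature space at a fixed number of buckets, so the model's input width no longer grows with the vocabulary. The bucket count and the `hash_features` name are assumptions.

```python
import hashlib


def hash_features(tokens, n_buckets=1024):
    """Map tokens into a fixed-length count vector (the "hashing trick").

    The vector length stays n_buckets no matter how many distinct tokens exist,
    at the cost of occasional collisions that merge unrelated tokens.
    """
    vec = [0] * n_buckets
    for tok in tokens:
        # blake2b gives a hash that is stable across runs, unlike the built-in
        # hash(), which is randomized per process for strings.
        digest = hashlib.blake2b(tok.encode("utf-8"), digest_size=8).digest()
        vec[int.from_bytes(digest, "little") % n_buckets] += 1
    return vec


if __name__ == "__main__":
    v = hash_features(["ethnic", "wear", "sherwani", "ethnic wear"], n_buckets=16)
    print(len(v), sum(v))  # 16 4
```

A larger `n_buckets` means fewer collisions but a bigger model, so the value trades accuracy against size.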
For example, Teamcenter to SAP, or Teamcenter to Oracle. So that is the problem statement here. In the case study I have about six columns. Object type is the type of the product, whether it's hardware or software. Part number is the number of the part; it could be a number with six or seven digits, essentially the ID of a part. And the part description is how you actually describe the part, like the sherwani we looked at in the previous slide, linked to Ethnic Wear, which is linked to the Men category, which is linked to the Clothing category.
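This sketch loads the three columns described so far (object type, part number, part description) from CSV into typed records. It sets aside rows whose part number doesn't look like a six- or seven-digit ID. The column header names, the `PartRecord` and `load_parts` names, and the strict digit check are all assumptions for illustration. The real data set has about six columns and may format IDs differently.

```python
import csv
import io
import re
from dataclasses import dataclass

# Assumption: part numbers are purely numeric with 6 or 7 digits.
PART_NUMBER_RE = re.compile(r"\d{6,7}")


@dataclass
class PartRecord:
    object_type: str       # e.g. "Hardware" or "Software"
    part_number: str       # kept as str to preserve any leading zeros
    part_description: str  # free-text, unstructured description


def load_parts(csv_text):
    """Parse CSV rows into PartRecords; split off rows with malformed part numbers."""
    good, rejected = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rec = PartRecord(
            object_type=row["object_type"].strip(),
            part_number=row["part_number"].strip(),
            part_description=row["part_description"].strip(),
        )
        if PART_NUMBER_RE.fullmatch(rec.part_number):
            good.append(rec)
        else:
            rejected.append(rec)
    return good, rejected


if __name__ == "__main__":
    data = (
        "object_type,part_number,part_description\n"
        "Hardware,1234567,Sherwani\n"
        "Software,12,bad id\n"
    )
    good, rejected = load_parts(data)
    print(len(good), len(rejected))  # 1 1
```

Keeping the rejected rows, rather than silently dropping them, makes it easy to review data-quality problems before training.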