MetaCost: A Cost-Sensitive Approach for Imbalanced Datasets

Jigar Parmar
4 min read · May 22, 2020

Hey folks, I recently ran into a classification problem with almost 400k rows of data and almost 400 unique classes in the target variable, which was, as you might expect, highly imbalanced.

There are several methods to handle imbalanced data; these are the ones I tried for my work (both sketched in code right after the list):

  1. Using the `class_weight='balanced'` parameter, if the classifier supports it.
  2. Using imblearn’s SMOTE.
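
For reference, here is a minimal sketch of both approaches using scikit-learn and imbalanced-learn. The data is synthetic (a small stand-in for my actual 400k-row set), and `LogisticRegression` is just a placeholder classifier:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for the real problem.
X_train, y_train = make_classification(
    n_samples=2000, n_classes=3, n_informative=6,
    weights=[0.9, 0.07, 0.03], random_state=0)

# Approach 1: re-weight each class in the loss instead of resampling.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)

# Approach 2: synthesize new minority-class points with SMOTE,
# then train on the resampled (now balanced) data.
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
```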

There were disadvantages to both approaches. Since I had a lot of data and highly imbalanced target values, Approach 1 took a huge amount of time; with `class_weight='balanced'` the classifier does not actually resample anything but weights each class inversely to its frequency in the loss, and I suspect that getting ~400 heavily re-weighted classes to converge is what made training so slow. The disadvantage of SMOTE was that it works well for binary classification but not so well for multi-class classification, and as I stated above I had around 400 unique class labels.

This is when I came across the MetaCost algorithm, a general method for making any classifier cost-sensitive.

What is Cost-Sensitive Learning?

Most classifiers simply try to minimize the error between the predicted label and the actual label. Cost-sensitive learning, by contrast, takes misclassification costs into consideration: the goal is to minimize the total cost while still classifying examples accurately into a set of known classes.

For instance, in a binary classification task the confusion matrix has four values: FP (false positive), FN (false negative), TP (true positive) and TN (true negative). For TP and TN it seems reasonable not to assign any cost, because those predictions match the actual values; what matters is the cost of misclassification, i.e. of FP and FN. We represent the cost as C(i, j), where i is the predicted class and j is the actual class.

These misclassification costs can be given by domain experts or learned via other approaches. In cost-sensitive learning it is usually assumed that such a cost matrix is given and known. For multiple classes, the cost matrix can easily be extended by adding more rows and columns.
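
As a toy example (the numbers are invented for illustration, not taken from the paper), a 3-class cost matrix where class 2 is the rare class might look like this:

```python
import numpy as np

# C[i, j] = cost of predicting class i when the actual class is j.
# The diagonal is zero: correct predictions cost nothing.
# Column j=2 is expensive, so misclassifying the rare class 2 hurts most.
C = np.array([
    [0.0, 1.0, 10.0],
    [1.0, 0.0, 10.0],
    [1.0, 1.0,  0.0],
])
```

One common heuristic is to make the cost of misclassifying an example of class j inversely proportional to the frequency of class j.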

A Deeper Look at the MetaCost Algorithm and How It Works

So, you might be wondering how MetaCost helps with imbalanced classification.

The pseudocode in the research paper [1] takes the following inputs:

  - S = the training set
  - L = the classifier (base learner)
  - m = the number of resamples to generate
  - n = the number of examples in each resample (n < |S|)
  - p = True if L produces class probabilities
  - q = True if all resamples are used for each example

MetaCost uses the concept of a bagging ensemble: we draw n random data points with replacement to create a resample, fit a model on this resampled data, and repeat the process m times. At the end we have m models, where model Mᵢ corresponds to resample Sᵢ, as sketched below.
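
A minimal sketch of this resampling step, reusing `X_train`/`y_train` and the `np` import from the earlier snippets (the decision tree base learner and the values of m and n are arbitrary choices of mine):

```python
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
L = DecisionTreeClassifier(random_state=0)  # base learner
m, n = 30, 1000                             # resample count, resample size (n < |S|)

models, sample_idx = [], []
for _ in range(m):
    idx = rng.choice(len(X_train), size=n, replace=True)      # draw S_i with replacement
    models.append(clone(L).fit(X_train[idx], y_train[idx]))   # fit M_i on S_i
    sample_idx.append(set(idx))                                # remember which rows were in S_i
```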

Now we loop over the entire training set. For each point x we check q: if q is True we take all m trained models; otherwise we take only those models whose resample Sᵢ does not contain x. For instance, if out of 30 resamples the point x was absent from S₁, S₂, S₁₇ and S₂₁, then we take only the four models corresponding to those resamples.

Having selected the models on the basis of q, we compute the class probabilities if the classifier supports them, meaning the classifier can output the probability of the query point x belonging to each class. If p is True, we average these probability estimates for x over the selected models; if p is False, each model simply casts a hard vote, contributing 1 for its predicted class and 0 for all others.
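
Concretely, the per-point probability estimate could look like this (continuing the sketch above; the function name and argument layout are my own, not from the paper):

```python
def class_probs(x, i, k, models, sample_idx, p=True, q=True):
    """Estimate P(j | x) for each class j by averaging over the models.
    i is the index of x in the training set; k is the number of classes."""
    probs = np.zeros(k)
    used = 0
    for M, idx in zip(models, sample_idx):
        if not q and i in idx:
            continue  # q=False: skip models whose resample contained x
        if p:
            # Index by M.classes_ in case a resample missed a rare class.
            probs[M.classes_] += M.predict_proba(x.reshape(1, -1))[0]
        else:
            probs[int(M.predict(x.reshape(1, -1))[0])] += 1.0  # hard vote
        used += 1
    # Guard against the unlikely case that x appeared in every resample.
    return probs / max(used, 1)
```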

Once we have the probability estimate for the query point x, we multiply it by the cost matrix and take the class with the minimum expected cost, i.e. we relabel x with argminᵢ Σⱼ P(j|x) C(i, j). Because misclassifying a rare class carries a high cost and we are minimizing the expected cost, the rare classes become much less likely to be misclassified.
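
With probability estimates in hand, the relabelling itself is a one-liner against the cost matrix C from earlier:

```python
probs = class_probs(X_train[0], 0, k=3, models=models, sample_idx=sample_idx)

# Expected cost of predicting class i: sum_j C[i, j] * P(j | x).
expected_cost = C @ probs
new_label = int(np.argmin(expected_cost))  # the cheapest class wins
```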

The effect is that the rarest classes have less chance of being misclassified: their high misclassification cost in the matrix pulls the argmin towards them, because we are explicitly minimizing the cost of misclassification.

At the end of this loop, having done this for every training point, we have relabelled the entire training set.

Finally, we take classifier L and training set S with the new labels, train the classifier, and return that trained model, as in the sketch below.
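
Putting the pieces from the previous snippets together, a minimal end-to-end sketch (again, my own illustrative code, not the paper's reference implementation):

```python
def metacost(X, y, L, C, m=30, n=None, p=True, q=True, seed=0):
    rng = np.random.RandomState(seed)
    k = C.shape[0]
    n = n if n is not None else len(X) // 2  # resample size, n < |S|
    # Step 1: bagging - m bootstrap resamples, one model per resample.
    models, sample_idx = [], []
    for _ in range(m):
        idx = rng.choice(len(X), size=n, replace=True)
        models.append(clone(L).fit(X[idx], y[idx]))
        sample_idx.append(set(idx))
    # Step 2: relabel every training point with its minimum-expected-cost class.
    y_new = np.array([
        np.argmin(C @ class_probs(X[i], i, k, models, sample_idx, p, q))
        for i in range(len(X))
    ])
    # Step 3: retrain the base learner on the relabelled data and return it.
    return clone(L).fit(X, y_new)

model = metacost(X_train, y_train, DecisionTreeClassifier(random_state=0), C)
```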

I hope this gives you a basic idea of how MetaCost works and how cost-sensitive learning helps penalise misclassification of the rarest classes in imbalanced data.

References and other sources

[1] https://homes.cs.washington.edu/~pedrod/papers/kdd99.pdf

[2] https://cling.csd.uwo.ca/papers/cost_sensitive.pdf

[3] https://medium.com/rv-data/how-to-do-cost-sensitive-learning-61848bf4f5e7
