Handling Imbalanced Dataset

What is an Imbalanced Dataset?

Imbalanced datasets are most common in domains such as medical diagnosis and fraud detection. Suppose Apollo Hospital builds a dataset of people who came in for a diabetes checkup. The dataset has a binary output: each person is either diabetic or not.

Let's say that out of 1000 records, 100 people are diabetic and the rest are not, so the output divides our dataset into two parts.

Diabetic = 100 and non-diabetic = 900. Most of the dataset falls into one class (the negative class), which makes it an imbalanced dataset.

In this blog we will discuss various techniques for balancing an imbalanced dataset. Let's get started.

In undersampling, we reduce the majority class until it is the same size as the minority class.

In figure 1, the imbalanced dataset had 900 data points in the majority class and 100 data points in the minority class. By downsampling, we reduced the majority class to the same number of data points as the minority class.

Disadvantage:

Removing data points from the majority class may discard useful information, so the model may not give better results.

Step 1: Check whether the dataset is balanced or imbalanced
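
A minimal sketch of this check, using only the Python standard library. The `labels` list is a hypothetical stand-in for the target column (it reuses the 900/100 diabetes example from earlier, not the dataset used in this walkthrough). With pandas, the same counts are usually obtained by calling `value_counts()` on the target column.

```python
from collections import Counter

# Hypothetical target column: 0 = non-diabetic, 1 = diabetic
labels = [0] * 900 + [1] * 100

# Count how many rows fall into each class
counts = Counter(labels)
majority_class, majority_count = counts.most_common()[0]
minority_class, minority_count = counts.most_common()[-1]

# A ratio far from 1 means the dataset is imbalanced
imbalance_ratio = majority_count / minority_count

print(counts)
print(f"majority:minority = {imbalance_ratio:.0f}:1")
```

A ratio close to 1 would mean the classes are roughly balanced and no resampling is needed.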


Step 2: Perform the operation

As we can see, class 0 has far more data than class 1, so the data is imbalanced. We will now balance the dataset using the imblearn library. It may not be preinstalled in your Jupyter notebook environment, so install it first with pip install imbalanced-learn (the package that provides the imblearn module).

We have class 1 = 492 (minority class) and class 0 = 284315 (majority class). Using undersampling, we will reduce the majority class count to match the minority class count.

After undersampling, we will have 492 values in class 1 and 492 values in class 0.
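
The mechanics can be sketched without any third-party library. The example below assumes random undersampling (which undersampler the walkthrough used is not stated) and builds a hypothetical label column with the class counts quoted above. In imblearn, `RandomUnderSampler` from `imblearn.under_sampling` performs the same kind of resampling through `fit_resample(X, y)`.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical label column with the class counts quoted above
labels = [0] * 284315 + [1] * 492

# Group row indices by class
indices_by_class = {}
for idx, label in enumerate(labels):
    indices_by_class.setdefault(label, []).append(idx)

minority_size = min(len(rows) for rows in indices_by_class.values())

# Keep a random subset of each class, each of the minority size
kept = []
for rows in indices_by_class.values():
    kept.extend(random.sample(rows, minority_size))
kept.sort()

# `kept` holds the row indices to retain; apply it to features and labels alike
balanced_labels = [labels[i] for i in kept]
print(Counter(balanced_labels))
```

All 492 minority rows survive, while 283823 majority rows are dropped, which is exactly the information loss mentioned under the disadvantage above.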

Plot a countplot chart to check whether the dataset has been balanced

Output: countplot of the balanced dataset
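
A countplot is simply one bar per class whose height is the class count (seaborn offers this as `countplot`). As a dependency-free stand-in, the hypothetical helper `text_countplot` below prints the same information as text bars; it is only an illustration, not the plotting code used for the chart.

```python
from collections import Counter

def text_countplot(labels, width=40):
    """Return one text bar per class, scaled so the largest class spans `width` characters."""
    counts = Counter(labels)
    largest = max(counts.values())
    lines = []
    for cls in sorted(counts):
        bar = "#" * max(1, round(counts[cls] / largest * width))
        lines.append(f"class {cls} | {bar} {counts[cls]}")
    return lines

# Hypothetical labels after undersampling: 492 rows in each class
balanced_labels = [0] * 492 + [1] * 492
for line in text_countplot(balanced_labels):
    print(line)
```

Bars of equal length indicate that the classes are balanced; on the original data the class 1 bar would be barely visible next to class 0.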

In oversampling, we add data points to the minority class until it is the same size as the majority class. This is the most commonly used balancing technique in machine learning because it does not lose information while balancing the dataset. There are various oversampling techniques we can use to balance the majority and minority classes. Oversampling always carries a risk of overfitting.

Method 1: Class weight

Suppose you have 900 data points of class 1 and 100 data points of class 0.

Step 1: Take the ratio of the data points in the two classes

ratio = 100 ÷ 900 = 1 ÷ 9 => 1:9

Each data point of the minority class is multiplied by the weight of the majority class (9), and each data point of the majority class is multiplied by the weight of the minority class (1), which results in a balanced dataset.

Step 2: Cross multiplication

100 × 9 = 900 and 900 × 1 = 900

Our minority class now equals the majority class, so the data is balanced.
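
The two steps can be sketched in a few lines of Python. This is only an illustration of the arithmetic for two classes; the variable names are hypothetical. In practice, many scikit-learn classifiers accept such weights through their `class_weight` argument instead of changing the data itself.

```python
from math import gcd

# Class counts from the example: class 1 is the majority here
counts = {1: 900, 0: 100}

# Step 1: reduce the counts to their simplest ratio (900:100 becomes 9:1)
divisor = gcd(*counts.values())
ratio = {cls: n // divisor for cls, n in counts.items()}

# Step 2: cross multiplication, each class is weighted by the other class's ratio term
cls_a, cls_b = counts
weights = {cls_a: ratio[cls_b], cls_b: ratio[cls_a]}

# Weighted size of each class: count multiplied by its weight
effective = {cls: counts[cls] * weights[cls] for cls in counts}

print("weights:", weights)
print("effective counts:", effective)
```

Both classes end up with the same effective count, so each class contributes equally even though no rows were added or removed.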

Method 2: Artificial or synthetic points method

Using an extrapolation technique, we create synthetic points in the minority class until it has as many points as the majority class.

Python implementation:
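
The code for this step is not included in the text, so what follows is only a sketch of the idea, assuming a SMOTE-style approach: each synthetic point is placed on the line segment between a minority point and one of its nearest minority neighbours (strictly speaking, interpolation). The function `synthetic_oversample`, the parameter `k`, and the 2-D sample data are all hypothetical. In imblearn, `SMOTE` from `imblearn.over_sampling` is the usual ready-made implementation of this family of methods.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def synthetic_oversample(minority, target_size, k=3):
    """Grow `minority` to `target_size` points by interpolating between neighbours."""
    points = list(minority)
    while len(points) < target_size:
        base = random.choice(minority)
        # k nearest original minority points to `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = random.choice(neighbours)
        gap = random.random()  # position along the segment from base to neighbour
        points.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return points

# Hypothetical 2-D feature vectors: 90 majority points, 10 minority points
majority = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(90)]
minority = [(random.gauss(3, 0.5), random.gauss(3, 0.5)) for _ in range(10)]

minority_balanced = synthetic_oversample(minority, target_size=len(majority))
print(len(majority), len(minority_balanced))
```

The original minority points are kept unchanged and the new points fill the space between them, which is why this approach tends to overfit less than plain duplication.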

We can also implement oversampling with another method:
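
The text does not say which method is meant here. One common alternative, shown below purely as an assumption, is random oversampling: existing minority rows are duplicated at random until both classes are the same size. The `rows` data is hypothetical. In imblearn, `RandomOverSampler` from `imblearn.over_sampling` does this kind of resampling.

```python
import random
from collections import Counter

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical dataset of (features, label) rows: 900 of class 0, 100 of class 1
rows = [((i,), 0) for i in range(900)] + [((i,), 1) for i in range(100)]

majority = [r for r in rows if r[1] == 0]
minority = [r for r in rows if r[1] == 1]

# Draw extra minority rows with replacement until both classes are the same size
extra = random.choices(minority, k=len(majority) - len(minority))
balanced = majority + minority + extra
random.shuffle(balanced)

print(Counter(label for _, label in balanced))
```

Because the extra rows are exact copies, a model can memorise them, which is the overfitting risk mentioned earlier for oversampling.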

Visualizing whether we were able to balance the dataset

That is all from my side. If you found this blog interesting, hang tight, I will be back with more interesting posts. Please leave your valuable suggestions in the comment box. Keep learning, keep exploring!
