Complement Naive Bayes (CNB) Algorithm
Complement Naive Bayes (CNB) is a variant of the Naive Bayes algorithm that is specifically designed to improve classification performance on imbalanced datasets and text classification tasks. It modifies the way probabilities are estimated to reduce bias towards majority classes, making it more suitable than the standard Multinomial Naive Bayes in many cases.
Challenge of Imbalanced Datasets
An imbalanced dataset is one in which one class appears far more often than the others. This is common in spam filtering (many more legitimate emails than spam) or medical diagnosis (many more healthy cases than disease cases).
Example:
If 95% of cases are "not fraud" and only 5% are "fraud," a model that always predicts "not fraud" will be 95% accurate but will miss all fraud cases. This shows why special methods are needed to deal with such uneven data.
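To make this concrete, here is a minimal sketch (using synthetic labels, not data from a real system) of a "model" that always predicts the majority class: accuracy looks excellent while every fraud case is missed.
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
# Synthetic labels: 95% "not fraud" (0), 5% "fraud" (1)
y_true = np.array([0] * 95 + [1] * 5)
# A baseline that always predicts the majority class
y_pred = np.zeros_like(y_true)
print("Accuracy:", accuracy_score(y_true, y_pred))      # 0.95 -- looks impressive
print("Fraud recall:", recall_score(y_true, y_pred))    # 0.0  -- misses every fraud case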
How Complement Naive Bayes Works
- For each class, compute the complement frequency: the frequency of features in all other classes combined.
- Estimate the conditional probabilities using these complement frequencies.
- Normalize the values to ensure they form valid probability distributions.
- Classify a sample by assigning it to the class with the smallest complement-based score, i.e., the class whose complement the sample matches least (see the sketch below).
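The following is a minimal from-scratch sketch of these steps on toy word-count data (the array values and variable names are illustrative assumptions, not part of any library API):
import numpy as np
# Toy term-count matrix: rows = documents, columns = vocabulary terms
X = np.array([[2, 0, 1],
              [3, 1, 0],
              [0, 4, 2],
              [1, 3, 3]])
y = np.array([0, 0, 1, 1])  # class label of each document
alpha = 1.0                 # Laplace smoothing parameter
classes = np.unique(y)
n_features = X.shape[1]
# Step 1: complement counts -- per-feature counts over all *other* classes
comp_counts = np.array([X[y != c].sum(axis=0) for c in classes])
# Steps 2-3: smoothed complement probabilities, normalized so each row sums to 1
comp_prob = (comp_counts + alpha) / (
    comp_counts.sum(axis=1, keepdims=True) + alpha * n_features)
# Step 4: score a new sample and choose the class with the *lowest*
# complement score (the class whose complement it matches least)
x_new = np.array([2, 1, 0])
scores = np.log(comp_prob) @ x_new
print("Predicted class:", classes[scores.argmin()])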
Formula
For a class c and feature f:
P(f|c) = \frac{count(f, \bar{c}) + \alpha}{\sum_{f'} count(f', \bar{c}) + \alpha \cdot |V|}
where:
- count(f, \bar{c}) = count of feature f in the complement of class c (all classes other than c)
- \alpha = smoothing parameter (Laplace smoothing)
- |V| = vocabulary size
Example
Suppose we are classifying sentences as Apples or Bananas using word frequencies, with training counts Apples: {Round: 3, Red: 4, Soft: 1} and Bananas: {Round: 5, Red: 1, Soft: 3}. To classify a new sentence (Round = 1, Red = 1, Soft = 1):
- MNB would estimate probabilities for Apples using only Apples data
- CNB estimates probabilities for Apples using Bananas' data (complement) and vice versa
Solving by CNB: We classify a new sentence with features {Round = 1, Red = 1, Soft = 1} and vocabulary {Round, Red, Soft}.
Step 1: Complement counts
- For Apples, use Bananas' counts -> {Round: 5, Red: 1, Soft: 3}
- For Bananas, use Apples' counts -> {Round: 3, Red: 4, Soft: 1}
Step 2: Probabilities (using Laplace smoothing, α = 1)
For Apples:
- Round = (5+1)/(5+1+3+3) = 6/12 = 0.5
- Red = (1+1)/12 ≈ 0.167
- Soft = (3+1)/12 ≈ 0.333
For Bananas:
- Round = (3+1)/(3+4+1+3) = 4/11 ≈ 0.364
- Red = (4+1)/11 ≈ 0.455
- Soft = (1+1)/11 ≈ 0.182
Step 3: Scores. Multiply the complement feature probabilities for each class:
- Apples = 0.5 × 0.167 × 0.333 ≈ 0.0278
- Bananas = 0.364 × 0.455 × 0.182 ≈ 0.0301
Because these scores are built from each class's complement, CNB assigns the sentence to the class with the lower score, i.e., the class whose complement it matches least.
Final Result -> Apples (0.0278 < 0.0301)
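The hand arithmetic above can be double-checked with a few lines of Python (the numbers are taken directly from Step 2 of the worked example):
# Complement probabilities from Step 2 (Round, Red, Soft)
apples = [6 / 12, 2 / 12, 4 / 12]   # built from Bananas' counts
bananas = [4 / 11, 5 / 11, 2 / 11]  # built from Apples' counts
score_apples = apples[0] * apples[1] * apples[2]      # ~0.0278
score_bananas = bananas[0] * bananas[1] * bananas[2]  # ~0.0301
# CNB picks the class with the smaller complement score
print("Apples" if score_apples < score_bananas else "Bananas")  # Apples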
Implementing CNB
We can implement CNB using scikit-learn on the wine dataset (for demonstration purposes).
1. Import libraries and load data
We will import the required libraries and load the dataset:
- Import load_wine from sklearn.datasets to load the dataset.
- Use train_test_split to divide data into training and test sets.
- Import ComplementNB as the classifier.
- Import evaluation metrics: classification_report and accuracy_score.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB
from sklearn.metrics import classification_report, accuracy_score
# Load the wine dataset
data = load_wine()
X, y = data.data, data.target
2. Split into training and test sets
We will split the dataset into training and test sets:
- Split the dataset into 70% training and 30% testing data.
- Set random_state=42 for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
3. Train the CNB classifier
We will train the Complement Naive Bayes classifier:
- Create a ComplementNB instance.
- Fit the classifier on the training data.
cnb = ComplementNB()
cnb.fit(X_train, y_train)
4. Evaluate the model
We will now evaluate the trained model:
- Predict class labels for the test set using predict().
- Print the accuracy score and the classification report for detailed metrics.
y_pred = cnb.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

Note: CNB is better suited for discrete data like text. For continuous features (as in this dataset), Gaussian Naive Bayes might perform better.
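As a hedged follow-up to that note, the snippet below (a sketch reusing the same wine data and split as above) compares ComplementNB with GaussianNB side by side; the exact scores depend on the split.
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import ComplementNB, GaussianNB
from sklearn.metrics import accuracy_score
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
# Fit both Naive Bayes variants on the same continuous features and compare accuracy
for model in (ComplementNB(), GaussianNB()):
    acc = accuracy_score(y_test, model.fit(X_train, y_train).predict(X_test))
    print(type(model).__name__, round(acc, 3))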
When to Use CNB
| Scenario | Why CNB is Suitable |
|---|---|
| Imbalanced class distributions | The complement approach ensures minority classes receive fairer parameter estimates. |
| Text classification | CNB handles discrete feature counts (e.g., word frequencies) very effectively. |
| Large feature spaces | CNB is computationally efficient and easy to interpret, even with many features. |
Limitations of CNB
- Feature independence assumption: Like all Naive Bayes variants, CNB assumes that features are conditionally independent given the class. This assumption is rarely true in real-world datasets and can reduce accuracy when violated.
- Best suited for discrete features: CNB is primarily designed for tasks with discrete data, such as word counts in text classification. Continuous data typically requires preprocessing for optimal results.
- Bias in balanced datasets: The complement-based parameter estimation can introduce unnecessary bias when classes are already balanced. This may reduce its advantage compared to standard Naive Bayes models.