
FatimeNazliAs/Intrusion-Detection-Systems-using-ML


Intrusion-Detection-Systems-using-ML

📜 Project Overview

This project focuses on building an Intrusion Detection System (IDS) using machine learning techniques. The IDS analyzes automotive cybersecurity datasets to identify potential intrusions and anomalies. It emphasizes efficient preprocessing of large datasets, preparing high-quality data for downstream ML modeling.

🚀 Features

  • Dual Data Processing Engines: Supports both Polars and Pandas for flexible and efficient data handling.
  • Optimized Dataset Loading: Uses Polars for fast ingestion of large files (~9 million rows).
  • Data Preprocessing: Clean, transform, and sample data using intuitive Pandas workflows.
  • Exploratory Analysis: Offers visual and statistical analysis tools via Jupyter notebooks.
  • Modular Design: Shared utility functions for easier maintenance and reusability.
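To illustrate what the "transform" part of preprocessing can look like, the sketch below converts the hexadecimal CAN ID and payload bytes into integers and turns the flag into a numeric label. It is a minimal sketch, not the code in src/preprocess_data_with_pandas.py: the column names (`can_id`, `byte_0` to `byte_7`, `flag`), the helper names `hex_or_na` and `encode_frames`, and the mapping "T" = injected, "R" = regular are all assumptions.

```python
import pandas as pd

# Hypothetical column names for the eight payload bytes
BYTE_COLS = [f"byte_{i}" for i in range(8)]


def hex_or_na(value):
    """Parse a hex string such as "6f"; anything else (a missing byte) becomes pd.NA."""
    return int(value, 16) if isinstance(value, str) else pd.NA


def encode_frames(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with integer CAN IDs and bytes plus a 0/1 `label` column."""
    out = df.copy()
    # CAN IDs are hexadecimal strings, e.g. "0316" -> 790
    out["can_id"] = out["can_id"].map(lambda s: int(s, 16))
    for col in BYTE_COLS:
        # Nullable Int64 keeps the gaps left by frames with DLC < 8
        out[col] = out[col].map(hex_or_na).astype("Int64")
    # Assumption: "T" marks an injected frame and "R" a regular one
    out["label"] = (out["flag"] == "T").astype(int)
    return out


frames = pd.DataFrame(
    {"can_id": ["0316", "02b0"], "flag": ["R", "T"], **{c: ["6f", None] for c in BYTE_COLS}}
)
print(encode_frames(frames))
```

The nullable `Int64` dtype is used so that short frames keep their missing bytes as missing values instead of being silently cast to floats.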

📂 Repository Structure

  • notebooks: Used for exploratory data analysis, visualization, and prototyping. Methods are tested and experimented with here before they are finalized.
  • src: Contains the finalized Python scripts that implement the project's core functionality. Once a method has been tested and refined in the notebooks, its final version is written here for consistent, optimized execution.
  • notebooks/eda.ipynb: The first exploration of the raw data, before any sampling or preprocessing. The goal is to understand its structure, detect anomalies, and identify potential features for further analysis.
  • notebooks/solve_dlc_flag_issue.ipynb: Fixes misplaced flag values in datasets with a variable DLC (Data Length Code). When dlc < 8, the flag sometimes appears in one of the byte columns instead of the flag column; this notebook detects and corrects such cases.

Intrusion-Detection-Systems-using-ML/
├── input/                                   # Raw dataset files from Car Hacking Dataset
│   ├── attack_free.txt
│   ├── dos_dataset.csv
│   ├── fuzzy_dataset.csv
├── output/                                  # Processed datasets ready for analysis
│   ├── attack_free_df.csv
│   ├── dos_df.csv
│   ├── fuzzy_df.csv
├── notebooks/                              # Jupyter notebooks for analysis
│   ├── eda.ipynb                           # Exploratory data analysis using Pandas
│   ├── preprocess_data_with_pandas.ipynb   # Data cleaning and transformation using Pandas
│   ├── preprocess_data_with_polars.ipynb   # Data cleaning and transformation using Polars
│   ├── solve_dlc_flag_issue.ipynb          # Fixing misplaced DLC/flag column
│   ├── utils.ipynb                         # Helper functions for notebooks
│   ├── visualize_data.ipynb                # Data visualization with charts
├── src/                                    # Python scripts for production-ready data processing
│   ├── load_data_with_polars.py            # 🚀 Actively used: Efficient loading using Polars
│   ├── preprocess_data_with_pandas.py      # ✅ Actively used: Sampling & cleaning using Pandas
│   ├── utils.py                            # ✅ Actively used: Shared helper functions
│   ├── load_data_with_pandas.py            # ⚠️ Not used (slow on large data, kept for reference)
│   ├── preprocess_data_with_polars.py      # ⚠️ Not used (replaced with Pandas version)
│   ├── train_model.py                      # ML model training (coming soon)
├── README.md                               # Project documentation
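The DLC/flag fix described for notebooks/solve_dlc_flag_issue.ipynb can be sketched in Pandas as follows. This assumes the raw CSV was read without a header into hypothetical columns `timestamp`, `can_id`, `dlc`, `byte_0` to `byte_7`, `flag`, and that for a frame with `dlc < 8` the flag lands in the first unused byte column, `byte_{dlc}`. The function name `fix_dlc_flag` is illustrative, and the notebook's actual logic may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical column names for the eight payload bytes
BYTE_COLS = [f"byte_{i}" for i in range(8)]


def fix_dlc_flag(df: pd.DataFrame) -> pd.DataFrame:
    """Move flags that landed in a byte column back into the `flag` column.

    Assumes a frame with dlc < 8 has its flag in byte_{dlc}, the column
    right after its last real data byte.
    """
    df = df.copy()
    dlc = df["dlc"].astype(int)
    for n in range(8):  # rows with dlc == 8 are already aligned
        rows = dlc == n
        if not rows.any():
            continue
        misplaced = BYTE_COLS[n]
        df.loc[rows, "flag"] = df.loc[rows, misplaced]
        df.loc[rows, misplaced] = np.nan  # this byte does not exist for the frame
    return df


# A full 8-byte frame and a 5-byte frame whose flag slid into byte_5
raw = pd.DataFrame(
    [
        [0.0, "0316", 8, "05", "21", "68", "09", "21", "21", "00", "6f", "R"],
        [0.1, "02b0", 5, "ff", "7f", "00", "05", "49", "R", None, None, None],
    ],
    columns=["timestamp", "can_id", "dlc", *BYTE_COLS, "flag"],
)
fixed = fix_dlc_flag(raw)
print(fixed[["dlc", "byte_5", "flag"]])
```

Looping over the eight possible short DLC values keeps each move vectorized: every row with the same DLC has its flag in the same column.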


✅ Currently Used Code Files

| File | Purpose |
|------|---------|
| load_data_with_polars.py | Loads full datasets efficiently using Polars |
| preprocess_data_with_pandas.py | Preprocesses sampled data using Pandas, suitable for ML workflows |
| utils.py | Stores common helper functions used across both engines |

❌ Deprecated / Reference Files

| File | Notes |
|------|-------|
| load_data_with_pandas.py | Legacy loader; not recommended for large-scale data loading |
| preprocess_data_with_polars.py | Old preprocessing logic; replaced by the Pandas version for better maintainability |

🧠 Why Both Pandas & Polars?

  • Polars is preferred for initial full data loading due to its speed and memory efficiency.
  • Pandas is used for preprocessing sampled data—it's more intuitive and integrates well with visualization and ML tools.
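The README does not specify how the data is sampled before the Pandas step. One option, sketched here under that assumption, is stratified sampling: drawing the same fraction of rows from every flag value so that the comparatively rare injected frames are not lost. The function name `stratified_sample` and the `flag` column are hypothetical.

```python
import pandas as pd


def stratified_sample(
    df: pd.DataFrame, frac: float, label_col: str = "flag", seed: int = 42
) -> pd.DataFrame:
    """Draw the same fraction of rows from each class in `label_col`."""
    sampled = df.groupby(label_col).sample(frac=frac, random_state=seed)
    # The group-wise draw reorders rows; sort by index to restore file (time) order
    return sampled.sort_index()


frames = pd.DataFrame({"flag": ["R"] * 100 + ["T"] * 20, "dlc": 8})
subset = stratified_sample(frames, frac=0.5)
print(subset["flag"].value_counts())
```

Fixing `random_state` makes the sample reproducible across runs, and restoring the original order matters if time-based features are computed afterwards.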

📊 Datasets

The raw datasets are taken from the Car Hacking Dataset, which contains records for intrusion detection, such as:

  • attack_free.txt: Attack-free dataset.
  • dos_dataset.csv: Denial of Service (DoS) dataset.
  • fuzzy_dataset.csv: Fuzzy intrusion dataset.
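Unlike the CSV files, attack_free.txt is a plain-text log. The sketch below parses a single line, assuming the layout `Timestamp: <float> ID: <hex> <digits> DLC: <n> <bytes>`; verify this against the actual file before relying on it. The names `LINE_RE` and `parse_line` are hypothetical, and every frame is labelled "R" on the assumption that an attack-free capture contains only regular traffic.

```python
import re
from typing import Optional

# Assumed layout of one attack_free.txt line
LINE_RE = re.compile(
    r"Timestamp:\s*(?P<ts>[\d.]+)\s+"
    r"ID:\s*(?P<id>[0-9a-fA-F]+)\s+(?:\d+\s+)?"  # skip the numeric field after the ID
    r"DLC:\s*(?P<dlc>\d)\s*"
    r"(?P<data>(?:[0-9a-fA-F]{2}\s*)*)"
)


def parse_line(line: str) -> Optional[dict]:
    """Parse one log line into a record, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None  # blank or malformed line
    dlc = int(m["dlc"])
    return {
        "timestamp": float(m["ts"]),
        "can_id": m["id"],
        "dlc": dlc,
        "data": m["data"].split()[:dlc],  # keep only the DLC payload bytes
        "flag": "R",  # assumption: attack-free capture, so every frame is regular
    }


print(parse_line("Timestamp: 1479121434.850202  ID: 0350  000  DLC: 8  05 28 84 66 6d 00 00 a2"))
```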

Processed datasets are saved in the output folder as:

  • attack_free_df.csv
  • dos_df.csv
  • fuzzy_df.csv

🛠️ Setup Instructions

  1. Clone the repository:
    git clone https://github.com/FatimeNazliAs/Intrusion-Detection-Systems-using-ML.git
    cd Intrusion-Detection-Systems-using-ML
  2. Install dependencies:
    Create and activate a virtual environment, then install requirements:
    python -m venv venv
    source venv/bin/activate   # Linux/macOS
    venv\Scripts\activate      # Windows (instead of the line above)
    pip install -r requirements.txt
  3. Run the preprocessing scripts in src to generate the processed datasets:
    python src/load_data_with_polars.py
    python src/preprocess_data_with_pandas.py
    

📝 Usage

  • Load Full Dataset: Use src/load_data_with_polars.py for quick ingestion of large files.
  • Preprocess Data: Run src/preprocess_data_with_pandas.py to sample the loaded data down to a manageable size and clean it for ML.
  • Explore Data: Open notebooks/eda.ipynb for insights into distributions, anomalies, and patterns.
  • Visualize Data: Generate visual summaries using notebooks/visualize_data.ipynb.

📜 License

This project is licensed under the MIT License.


🙌 Acknowledgments

  • Car Hacking Dataset — source of real CAN bus data.
  • Polars — for blazing-fast data loading.
  • Open-source community — for tools and guidance that power this project.
