Raman Spectral Classification System

This repository implements a modular, research-grade deep learning pipeline for 1D spectral classification with PyTorch. It is designed to support rapid experimentation, robust evaluation, and advanced domain adaptation to achieve high accuracy on both reference datasets and out-of-distribution (OOD) clinical Raman spectral data.

Current Architecture

The core pipeline is designed for both training speed and generalization across clinical domains. Its main components are:

Multi-Stage Training Pipeline:
- Stage 1: Isolate-space pretraining on large reference datasets.
- Stage 2: Treatment-space semantic alignment and compact transfer-space finetuning.
- Stage 3: Clinical domain transfer utilizing Domain-Adversarial Neural Networks (DANN) and CORAL loss.
Advanced Deep Architectures:
- resnet1d: Deep residual network with depthwise separable convolutions, scaling kernel sizes, and optional Squeeze-and-Excitation (SE) attention (enabled with use_se: true in the model config).
- cnn: Baseline 1D convolutional network.
- hybrid: Convolutional stem with a transformer encoder.
- transformer: Attention-based sequence model.
Dynamic Model Registry: Automatic injection of signal_length, n_classes, in_channels (with derivative-aware channel logic), and semantic_space handling.
Multitask Capabilities: Ontology-aware auxiliary heads mapping sparse clinical IDs for multi-objective transfer learning.
Fast Data Loading: Direct .npy array loading eliminates the I/O bottleneck of per-file CSV reading.
Robust Preprocessing: Configurable SNV, baseline correction (ALS), Savitzky-Golay smoothing, and standard scaling.
On-the-fly Augmentation: Gaussian noise, intensity scaling, wavenumber shifting, and baseline drift injection.
YAML-driven Configuration: Complete control over data splits, preprocessing, augmentations, and model architectures via modular YAML files.
FastAPI Inference: Built-in production-ready REST API for model serving.
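
Two of the components named above, SNV preprocessing and the CORAL loss, are compact enough to illustrate directly. The following is a minimal NumPy sketch for intuition only, not the repository's implementation (which presumably uses PyTorch tensors). The function names snv and coral_loss and the (n_samples, n_features) array layout are assumptions made for this example.

```python
import numpy as np


def snv(spectra: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Standard Normal Variate: center and scale each spectrum (row) on its own."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    # eps guards against division by zero for a perfectly flat spectrum
    return (spectra - mean) / (std + eps)


def coral_loss(source: np.ndarray, target: np.ndarray) -> float:
    """CORAL: squared Frobenius distance between feature covariances, scaled by 1/(4 d^2).

    Both inputs are (n_samples, d) feature matrices with d >= 2.
    """
    d = source.shape[1]
    cov_s = np.cov(source, rowvar=False)  # (d, d) covariance of source features
    cov_t = np.cov(target, rowvar=False)  # (d, d) covariance of target features
    return float(np.sum((cov_s - cov_t) ** 2) / (4 * d * d))
```

SNV removes per-spectrum offset and scale differences, while CORAL penalizes the mismatch in second-order statistics between reference-domain and clinical-domain features; the loss is zero when both domains have identical feature covariances.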

Dataset Handling

Due to file size limits, the dataset (large .npy files) is not included in this repository.

Expected Folder Structure: Before running any training scripts, ensure your dataset is placed at data/raw/.
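
Before training, it can help to confirm that the arrays are where the scripts expect them. The sketch below is a hypothetical helper (summarize_npy_files is not part of this repository) that lists every .npy file under data/raw/ with its shape and dtype. It makes no assumption about file names, since those are not specified here.

```python
from pathlib import Path

import numpy as np


def summarize_npy_files(data_dir: str = "data/raw") -> dict:
    """Return {filename: (shape, dtype)} for every .npy file in data_dir."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Expected dataset directory not found: {root}")
    summary = {}
    for path in sorted(root.glob("*.npy")):
        # mmap_mode="r" maps the file lazily, so large arrays are not
        # read into memory just to inspect their shape and dtype.
        arr = np.load(path, mmap_mode="r")
        summary[path.name] = (arr.shape, str(arr.dtype))
    return summary
```

Calling summarize_npy_files() from the repository root returns an empty dict if the directory exists but holds no .npy files, which is a quick sign that the data still needs to be copied in.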

Quick Start

Install dependencies:

pip install -r requirements.txt

Train a model: Pass the architecture to --model (cnn, resnet1d, hybrid, or transformer). The script loads the configs from the configs/ directory, trains the model, evaluates it, and then runs the finetuning/domain adaptation phase.

python scripts/train.py --model resnet1d

Note: Use --override to change individual configuration parameters from the command line, for example --override training.batch_size=64.
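
For readers unfamiliar with dotted-key overrides, the idea is that a key such as training.batch_size addresses a nested entry in the loaded YAML config. The sketch below shows one way such an override could be applied to a nested dict. It is an illustration only: apply_override is a hypothetical name, and how scripts/train.py actually parses values is not documented here.

```python
import ast


def apply_override(config: dict, override: str) -> dict:
    """Apply one 'dotted.key=value' override to a nested dict, in place."""
    key_path, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"Override must look like key=value, got: {override!r}")
    try:
        # Interpret numbers, lists, etc. as Python literals
        value = ast.literal_eval(raw_value)
    except (ValueError, SyntaxError):
        # Anything that is not a Python literal is kept as a plain string
        value = raw_value
    *parents, leaf = key_path.split(".")
    node = config
    for key in parents:
        # Create intermediate sections if they do not exist yet
        node = node.setdefault(key, {})
    node[leaf] = value
    return config
```

With a config of {"training": {"batch_size": 32}}, applying "training.batch_size=64" replaces the value with the integer 64 and leaves the other keys untouched. Note that this sketch parses Python literals rather than YAML, so a value like true would stay a string.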

Evaluate a trained model:

python scripts/evaluate.py --exp-dir experiments/resnet1d_YYYYMMDD_HHMMSS

Compare multiple models:

python scripts/evaluate.py --compare experiments/run1 experiments/run2 --split test
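
The project structure lists McNemar's test under src/evaluation/, which is a standard way to check whether two classifiers evaluated on the same test set differ significantly. Whether --compare reports it is not stated here; the following standard-library sketch of the exact (binomial) variant is only meant to show what the test measures, and mcnemar_exact is a hypothetical name.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts samples only model A got right
    and c counts samples only model B got right.
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        # The models never disagree in correctness, so there is no evidence of a difference
        return b, c, 1.0
    # Under the null hypothesis, each discordant pair favors A or B with probability 0.5
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)
```

Only the discordant samples matter: cases where both models are right or both are wrong carry no information about which one is better.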

Running the API

You can serve a trained model locally using the included FastAPI backend.

# Set the environment variable to your trained experiment directory
export RAMAN_EXPERIMENT_DIR=experiments/resnet1d_YYYYMMDD_HHMMSS

# Run the API
uvicorn app.api:app --host 0.0.0.0 --port 8000

Then, visit http://localhost:8000/docs to test the /predict endpoint via the Swagger UI.
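
The request format of /predict is not described in this README. FastAPI applications publish their schema at /openapi.json by default, so a client can look the format up programmatically instead of guessing. The sketch below assumes the default schema URL has not been changed and that /predict is a POST endpoint with a JSON body; both are assumptions, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI document that FastAPI serves by default."""
    with urlopen(f"{base_url}/openapi.json", timeout=10) as resp:
        return json.load(resp)


def request_schema(openapi: dict, path: str = "/predict", method: str = "post") -> dict:
    """Return the JSON request-body schema of an endpoint, resolving one $ref level."""
    operation = openapi["paths"][path][method]
    schema = operation["requestBody"]["content"]["application/json"]["schema"]
    ref = schema.get("$ref")
    if ref:
        # Local references look like "#/components/schemas/ModelName"
        node = openapi
        for part in ref.lstrip("#/").split("/"):
            node = node[part]
        return node
    return schema
```

With the server running, request_schema(fetch_openapi()) returns the field names and types the endpoint expects. A KeyError indicates that one of the assumptions above (path, method, or JSON body) does not hold for this API.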

Project Structure

.
├── app/                  # FastAPI backend for inference
├── configs/              # YAML configuration files
│   ├── data/             # Splits, preprocessing, and augmentation configs
│   ├── model/            # Architecture-specific hyperparameters
│   └── training/         # Base training config, optimizer, losses, etc.
├── data/
│   └── raw/              # .npy files go here
├── experiments/          # Output directory for training logs and artifacts
├── notebooks/            # Jupyter notebooks for data exploration and analysis
├── scripts/              # Executable entry points
│   ├── train.py          # Main training and finetuning script
│   ├── evaluate.py       # Standalone evaluation script
│   ├── setup_data.py     # Data preparation and integrity checks
│   └── deep_analysis.py  # Model interpretation and metric analysis
├── src/                  # Core package
│   ├── data/             # NumpyDataset, Dataloaders, and Augmentations
│   ├── evaluation/       # Metrics, confusion matrices, McNemar's test
│   ├── interpretability/ # Grad-CAM, Integrated Gradients, etc.
│   ├── models/           # CNN, Hybrid, ResNet1D, Transformer
│   ├── training/         # Main Trainer, Finetuner, Losses, and Schedulers
│   └── utils/            # Helper functions for config parsing and I/O
└── tests/                # Unit tests

Google Colab Execution

To train on Google Colab, mount Google Drive and copy your dataset into the expected directory:

# 1. Clone the repo and install dependencies
!git clone https://github.com/rana-rohit/raman-spectral-classifier.git
%cd raman-spectral-classifier
!pip install -r requirements.txt

# 2. Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# 3. Create the expected directory structure and copy data
!mkdir -p data/raw
!cp /content/drive/MyDrive/path_to_your_data/*.npy ./data/raw/

# 4. Run training
!python scripts/train.py --model resnet1d
