This project classifies eight types of blood cells from image data. It covers data exploration, preprocessing, feature extraction (HOG vs. ViT) with dimensionality reduction (PCA, t-SNE), model development (DNNs, CNNs, ResNet, ViT), evaluation, and interpretability analysis.
You can access a live preview of the interactive HTML document here:
Note that it may take a few moments for the files to load.
The project adopts a sequential methodology:
- Initial data exploration and validation
- Feature engineering (HOG vs. deep features with ViT)
- Dimensionality reduction and feature space visualization
- Baseline modeling with classical ML approaches
- Deep learning with custom CNN architectures
- Transfer learning with pre-trained models (ResNet18, ViT)
- Model interpretability through SHAP analysis
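The feature-engineering and dimensionality-reduction steps above can be illustrated with a minimal NumPy sketch of PCA computed through a singular value decomposition. It assumes the features (for example HOG descriptors or ViT embeddings) have already been stacked into an `(n_samples, n_features)` array. The function `pca_project` and the random stand-in data are hypothetical and are not taken from the project's code.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2):
    """Project an (n_samples, n_features) matrix onto its top principal components.

    Returns the projected data and the fraction of total variance each kept
    component explains.
    """
    # Center each feature so the principal axes pass through the data mean.
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    # Variance along each direction is proportional to the squared singular value.
    variance = singular_values ** 2
    explained_ratio = variance[:n_components] / variance.sum()
    return projected, explained_ratio


# Hypothetical stand-in for extracted features: 100 images, 64 features each.
rng = np.random.default_rng(0)
fake_features = rng.normal(size=(100, 64))
coords, ratio = pca_project(fake_features, n_components=2)
print(coords.shape, ratio.round(3))
```

PCA is linear; for the nonlinear t-SNE view, scikit-learn's `sklearn.manifold.TSNE` is a common choice and is often run on PCA-reduced features to cut noise and runtime.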
The detailed research process, experiments, and in-depth analysis are documented in the project's notebooks.
Transfer learning with pre-trained architectures such as ResNet18 and ViT outperformed both simpler models and custom CNNs trained from scratch, reaching near-perfect classification accuracy by learning discriminative visual patterns. The SHAP analysis showed that the models focused on biologically relevant features such as nuclear morphology and cell size.
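With eight classes, a near-perfect overall accuracy is easier to interpret when it is broken down per class, since a single weak class can hide behind a high average. The following is a minimal NumPy sketch of a confusion matrix with per-class recall and macro-averaged F1; the function names and the toy labels are hypothetical and are not the project's evaluation code.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Count predictions: rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when (true, pred) pairs repeat.
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def per_class_report(cm: np.ndarray):
    """Return overall accuracy, per-class recall, and macro-averaged F1."""
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # number of true samples per class
    predicted = cm.sum(axis=0)  # number of predictions per class
    # Guard against empty classes by writing 0 where the denominator is 0.
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    accuracy = tp.sum() / cm.sum()
    return accuracy, recall, f1.mean()


# Hypothetical labels for the eight classes; one class-7 sample is misclassified as 6.
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 7, 7]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 7, 6]
cm = confusion_matrix(y_true, y_pred, n_classes=8)
acc, recall, macro_f1 = per_class_report(cm)
print(acc)  # 0.9
```

Macro F1 weights every class equally, so it drops noticeably when a single class is confused even if overall accuracy stays high.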
The system uses several deep learning architectures to improve diagnostic accuracy.
Entry point that coordinates all components of the blood cell classification system.
- preprocessing.py: Functions for image data processing, validation, and preparation.
- dataset.py: PyTorch dataset classes and data loading utilities for efficient data handling.
- neural_networks.py: Neural network architecture definitions including SimpleNN, LightCNN, CNNModel, ResNet18, and ViT implementations.
- training.py: Training and evaluation functions with support for early stopping and metrics tracking.
- extraction.py: Feature extraction methods including HOG and deep learning approaches using pre-trained vision transformers.
- explorer.py: Data exploration and visualization functions for dataset analysis and result interpretation.
- evaluation.py: Model evaluation metrics and performance visualization tools.
- setup.py: Environment setup and installation utilities.
- config.py: Configuration management for consistent experiment parameters.
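`training.py` is described as supporting early stopping. The following is a minimal, framework-agnostic sketch of that logic, assuming validation loss is the monitored metric. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the loss values are hypothetical, not the project's actual API.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 5, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta  # minimum decrease that counts as an improvement
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Hypothetical validation losses from one training run.
stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.65, 0.66, 0.67, 0.5]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_loss)  # 4 0.65
```

In a PyTorch loop, one would typically also save `model.state_dict()` whenever `best_loss` improves, so the best checkpoint can be restored after stopping.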
Yehonatan Keypur
Grade: 100