
IRSM Spectral Classification using Machine Learning and Deep Learning

This repository contains the code for our work (paper 1 and paper 2) on predicting idiopathic recurrent spontaneous miscarriage using deep learning (variational autoencoders, artificial neural networks, and 1D convolutional neural networks) as well as classical machine learning models.

To run the code on your setup:

  1. Clone the repository to your local machine:
     git clone https://github.com/DhruvaRajwade/IRSM_Spectral_Classification_DL.git
  2. Install the required dependencies:
     pip install -r requirements.txt

Files

Root/
│
├── R_code/ # Requires the Chemospec package 
│   ├── HCA.R # Sample R code for hierarchical clustering using agglomerative linkage and Euclidean distances
│   └── LDA.R # Sample R code for linear discriminant analysis
│
├── dev/  # Some experimental code I implemented (Not included in the paper, but interesting nevertheless)
│   ├── PINN.py 
│   └── forward_forward_NIPS.py
│
├── sklearn/ # Classical Machine Learning code
│   ├── cross_validation.py # Functions for Cross Validation and Visualization
│   ├── grid_search.py      # Functions to conduct a GridSearch Hyperparameter sweep
│   ├── model_selection.py  # Functions to help in initial model selection
│   ├── pca_visualize.py    # Visualize PCA outputs using Loadings and scatter plots
│   └── plot.py             # Helper script for grid_search.py
│
├── tensorflow/
│   ├── cnn_bayesian_hyperparam_tuning.py # Bayesian hyperparameter tuning using keras-tuner
 
│   ├── tf_cnn.py # 1D CNN
│   └── tf_eval.py # Evaluation and model inference
│
├── torch/
│   ├── torch_eval.py # Evaluation and model inference
│   ├── torch_models.py # Variational Autoencoder and Attention Residual ANN
│   ├── torch_train.py  # Training script
│   ├── vae_cross_validation.py # Cross-validate the pipeline
│   └── vae_loss.py    # Loss functions: a KL divergence + BCE hybrid loss controlled by a `Temperature` parameter, plus negative log-likelihood and MSE loss implementations (sketched below this tree)
│
├── utils/
│   └── data_preprocessing.py # Helper functions to convert the data from raw form to NumPy arrays (for scikit-learn and TensorFlow) or PyTorch tensors
│
├── images/            # Visualizations for the README file
│
├── README.md          # What you're reading
└── requirements.txt   # Generated using pipreqs!
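
For reference, here is a minimal sketch of the kind of hybrid loss implemented in vae_loss.py, assuming `Temperature` simply weights the KL term against the BCE reconstruction term (the actual implementation may differ):

```python
import torch
import torch.nn.functional as F

def hybrid_vae_loss(recon_x, x, mu, logvar, temperature=1.0):
    """BCE reconstruction loss plus a KL divergence term weighted by
    a temperature-style coefficient (a beta-VAE-like trade-off)."""
    # Reconstruction: binary cross-entropy summed over spectral bins
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the N(0, I) prior
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + temperature * kld
```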

Overview Of Methodology

Data:

We used ATR-FTIR data from endometrial tissue for the results below. The data was generated by us and may be made available on request.

Model Architectures:

ANN

Variational Autoencoder

1D CNN
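
To make the 1D CNN concrete, here is a minimal Keras sketch for binary classification of single-channel spectra. The helper name, layer sizes, and the input length of 1000 are illustrative placeholders, not the tuned architecture from tf_cnn.py:

```python
import tensorflow as tf

def build_1d_cnn(input_length=1000):
    """Toy 1D CNN for binary spectral classification."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_length, 1)),  # one intensity channel
        tf.keras.layers.Conv1D(16, kernel_size=7, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # IRSM vs. control
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```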

Results (For ATR-FTIR Spectroscopy Data)

Initial Model Runs (for Model Selection)

| Classifier | Ideal Value | CV Value |
|---|---|---|
| Logistic Regression | 0.821 | 0.817 |
| Ridge Classifier | 0.806 | 0.817 |
| SGD Classifier | 0.746 | 0.786 |
| Passive Aggressive Classifier | 0.776 | 0.802 |
| KNeighborsClassifier | 0.851 | 0.845 |
| Decision Tree | 0.910 | 0.888 |
| Linear SVC | 0.821 | 0.802 |
| SVC | 0.821 | 0.819 |
| Gaussian Naive Bayes | 0.896 | 0.848 |
| AdaBoost | 0.851 | 0.829 |
| BaggingClassifier | 0.896 | 0.876 |
| Random Forest | 0.881 | 0.857 |
| ExtraTrees | 0.925 | 0.905 |
| Gaussian Process Classifier | 0.821 | 0.833 |
| Gradient Boosting | 0.881 | 0.871 |
| Linear Discriminant Analysis | 0.791 | 0.802 |
| Quadratic Discriminant Analysis | 0.836 | 0.836 |
| HistGradientBoostingClassifier | 0.851 | 0.860 |
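
A sweep like the one behind this table can be reproduced with scikit-learn's cross-validation utilities. Here is a minimal sketch with a synthetic stand-in for the spectra and a shortened classifier list (see model_selection.py for the actual procedure):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the preprocessed spectra
X, y = make_classification(n_samples=200, n_features=500, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # mean CV accuracy
    print(f"{name}: {scores.mean():.3f}")
```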

Pearson Correlation Analysis for the above metric data

Hyperparameter Tuning of Selected Models

(A) Support vector machine (SVM), (B) random forest (RF), (C) adaptive boosting (AdaBoost), (D) decision tree (DT), (E) extreme gradient boosting (XGBoost), (F) gradient boosting (GB)
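
This stage is driven by grid_search.py; as a minimal scikit-learn sketch of the idea, with an illustrative SVM grid rather than the grids actually swept:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.001],
    "kernel": ["rbf"],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```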

Results of best performing models

| Model | Sensitivity | Specificity | Accuracy | F1 Score |
|---|---|---|---|---|
| SVM | 81% | 100% | 90% | 90% |
| XGBoost | 80% | 90% | 85% | 84% |
| AdaBoost | 80% | 90% | 85% | 84% |
| DT | 80% | 90% | 85% | 84% |
| RF | 80% | 90% | 85% | 84% |
| GB | 80% | 90% | 85% | 84% |
| CNN | 90% | 100% | 94% | 95% |
| ANN | 88% | 87% | 88% | 89% |

Learning Curves For Best Performing Models

(A) Support vector machine (SVM), (B) adaptive boosting (AdaBoost), (C) extreme gradient boosting (XGBoost), (D) gradient boosting, (E) decision tree, (F) random forest
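
Curves of this kind can be computed with scikit-learn's learning_curve helper; a minimal sketch (not necessarily how the repository generates them):

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)
# Mean train/validation accuracy at each training-set size
print(train_sizes)
print(train_scores.mean(axis=1))
print(val_scores.mean(axis=1))
```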

K-Fold Cross-Validation: Sensitivity Analysis of the Value of K

(A) Support vector machine (SVM), (B) adaptive boosting (AdaBoost), (C) extreme gradient boosting (XGBoost), (D) gradient boosting, (E) decision tree, (F) random forest
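
The sensitivity analysis amounts to re-running cross-validation while varying the number of folds; a minimal sketch:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
for k in range(2, 11):  # sweep the number of folds
    scores = cross_val_score(SVC(), X, y, cv=k)
    print(f"k={k}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```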

Citation

Please consider citing our work using the following BibTeX entries:

@article{Sherpa2024,
  title = {Attenuated Total Reflectance–Fourier Transform Infrared (ATR-FTIR) Spectroscopy Combined With Deep Learning for Classification of Idiopathic Recurrent Spontaneous Miscarriage (IRSM)},
  ISSN = {1532-236X},
  url = {http://dx.doi.org/10.1080/00032719.2024.2333960},
  DOI = {10.1080/00032719.2024.2333960},
  journal = {Analytical Letters},
  publisher = {Informa UK Limited},
  author = {Sherpa, Dadoma and Rajwade, Dhruva Abhijit and Mitra, Imon and Biswas, Souvik and Sharma, Sunita and Chakraborty, Pratip and Kalapahar, Shovandeb and Chattopadhyay, Ratna and Chaudhury, Koel},
  year = {2024},
  month = apr,
  pages = {1–17}
}
@inproceedings{Sherpa2023,
  title = {Prediction of Idiopathic Recurrent Spontaneous Miscarriage using Machine Learning},
  url = {http://dx.doi.org/10.1109/ICCECE51049.2023.10085363},
  DOI = {10.1109/iccece51049.2023.10085363},
  booktitle = {2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE)},
  publisher = {IEEE},
  author = {Sherpa, Dadoma and Rajwade, Dhruva Abhijit and Mitra, Imon and Dhar, Dhruba and Sharma, Sunita and Chakraborty, Pratip and Chaudhury, Koel},
  year = {2023},
  month = jan 
}

Todo:

- Add example notebooks
- Add Ray Tune hyperparameter sweep code
