
IRSM Spectral Classification using Machine Learning and Deep Learning

This repository contains the code for our work (paper 1 and paper 2) on predicting idiopathic recurrent spontaneous miscarriage using deep learning (variational autoencoders, artificial neural networks, and 1D convolutional neural networks) as well as classical machine learning models.

To run the code on your setup:

  1. Clone the repository to your local machine:
     git clone https://github.com/DhruvaRajwade/IRSM_Spectral_Classification_DL.git
  2. Install the required dependencies:
     pip install -r requirements.txt

Files

Root/
│
├── R_code/ # Requires the Chemospec package 
│   ├── HCA.R # Sample R code for hierarchical clustering using agglomerative linkage and Euclidean distances
│   └── LDA.R # Sample R code for linear discriminant analysis
│
├── dev/  # Some experimental code I implemented (Not included in the paper, but interesting nevertheless)
│   ├── PINN.py 
│   └── forward_forward_NIPS.py
│
├── sklearn/ # Classical Machine Learning code
│   ├── cross_validation.py # Functions for Cross Validation and Visualization
│   ├── grid_search.py      # Functions to conduct a GridSearch Hyperparameter sweep
│   ├── model_selection.py  # Functions to help in initial model selection
│   ├── pca_visualize.py    # Visualize PCA outputs using Loadings and scatter plots
│   └── plot.py             # Helper script for grid_search.py
│
├── tensorflow/
│   ├── cnn_bayesian_hyperparam_tuning.py # Bayesian hyperparameter tuning using keras-tuner
 
│   ├── tf_cnn.py # 1D CNN
│   └── tf_eval.py # Evaluation and model inference
│
├── torch/
│   ├── torch_eval.py # Evaluation and model inference
│   ├── torch_models.py # Variational Autoencoder and Attention Residual ANN
│   ├── torch_train.py  # Training script
│   ├── vae_cross_validation.py # Cross-validate the pipeline
│   └── vae_loss.py    # Loss functions: a KL divergence + BCE hybrid loss controlled by a `Temperature` parameter, plus negative log-likelihood and MSE loss implementations (sketched below this tree)
│
├── utils/
│   └── data_preprocessing.py # Helper functions to convert the data from raw form to NumPy arrays (for scikit-learn and TensorFlow) or PyTorch tensors
│
├── images/            # Visualizations for the README file
│
├── README.md          # What you're reading
└── requirements.txt   # Generated using pipreqs!
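
For reference, here is a minimal sketch of the kind of hybrid loss implemented in vae_loss.py, assuming `Temperature` simply weights the KL term against the BCE reconstruction term (the actual implementation may differ):

```python
import torch
import torch.nn.functional as F

def hybrid_vae_loss(recon_x, x, mu, logvar, temperature=1.0):
    """BCE reconstruction loss plus a KL divergence term weighted by
    a temperature-style coefficient (a beta-VAE-like trade-off)."""
    # Reconstruction: binary cross-entropy summed over spectral bins
    bce = F.binary_cross_entropy(recon_x, x, reduction="sum")
    # KL divergence between q(z|x) = N(mu, sigma^2) and the N(0, I) prior
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + temperature * kld
```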

Overview Of Methodology

Data:

We used ATR-FTIR data from endometrial tissue for the results below. The data was generated by us and may be made available on request.

Model Architectures:

ANN

Variational Autoencoder

1D CNN
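
To make the 1D CNN concrete, here is a minimal Keras sketch for binary classification of single-channel spectra. The helper name, layer sizes, and the input length of 1000 are illustrative placeholders, not the tuned architecture from tf_cnn.py:

```python
import tensorflow as tf

def build_1d_cnn(input_length=1000):
    """Toy 1D CNN for binary spectral classification."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(input_length, 1)),  # one intensity channel
        tf.keras.layers.Conv1D(16, kernel_size=7, activation="relu"),
        tf.keras.layers.MaxPooling1D(pool_size=2),
        tf.keras.layers.Conv1D(32, kernel_size=5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # IRSM vs. control
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```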

Results (For ATR-FTIR Spectroscopy Data)

Initial Model Runs (for Model Selection)

| Classifier | Ideal Value | CV Value |
|---|---|---|
| Logistic Regression | 0.821 | 0.817 |
| Ridge Classifier | 0.806 | 0.817 |
| SGD Classifier | 0.746 | 0.786 |
| Passive Aggressive Classifier | 0.776 | 0.802 |
| KNeighborsClassifier | 0.851 | 0.845 |
| Decision Tree | 0.910 | 0.888 |
| Linear SVC | 0.821 | 0.802 |
| SVC | 0.821 | 0.819 |
| Gaussian Naive Bayes | 0.896 | 0.848 |
| AdaBoost | 0.851 | 0.829 |
| BaggingClassifier | 0.896 | 0.876 |
| Random Forest | 0.881 | 0.857 |
| ExtraTrees | 0.925 | 0.905 |
| Gaussian Process Classifier | 0.821 | 0.833 |
| Gradient Boosting | 0.881 | 0.871 |
| Linear Discriminant Analysis | 0.791 | 0.802 |
| Quadratic Discriminant Analysis | 0.836 | 0.836 |
| HistGradientBoostingClassifier | 0.851 | 0.860 |
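
A sweep like the one behind this table can be reproduced with scikit-learn's cross-validation utilities. Here is a minimal sketch with a synthetic stand-in for the spectra and a shortened classifier list (see model_selection.py for the actual procedure):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the preprocessed spectra
X, y = make_classification(n_samples=200, n_features=500, random_state=0)

classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(),
    "ExtraTrees": ExtraTreesClassifier(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)  # mean CV accuracy
    print(f"{name}: {scores.mean():.3f}")
```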

Pearson Correlation Analysis for the above metric data

Hyperparameter Tuning of Selected Models

(A) Support vector machine (SVM), (B) random forest (RF), (C) adaptive boosting (AdaBoost), (D) decision tree (DT), (E) extreme gradient boosting (XGBoost), (F) gradient boosting (GB)
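
This stage is driven by grid_search.py; as a minimal scikit-learn sketch of the idea, with an illustrative SVM grid rather than the grids actually swept:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": ["scale", 0.01, 0.001],
    "kernel": ["rbf"],
}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```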

Results of best performing models

| Model | Sensitivity | Specificity | Accuracy | F1 Score |
|---|---|---|---|---|
| SVM | 81% | 100% | 90% | 90% |
| XGBoost | 80% | 90% | 85% | 84% |
| AdaBoost | 80% | 90% | 85% | 84% |
| DT | 80% | 90% | 85% | 84% |
| RF | 80% | 90% | 85% | 84% |
| GB | 80% | 90% | 85% | 84% |
| CNN | 90% | 100% | 94% | 95% |
| ANN | 88% | 87% | 88% | 89% |

Learning Curves For Best Performing Models

(A) Support vector machine (SVM), (B) adaptive boosting (AdaBoost), (C) extreme gradient boosting (XGBoost), (D) gradient boosting, (E) decision tree, (F) random forest
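
Curves of this kind can be computed with scikit-learn's learning_curve helper; a minimal sketch (not necessarily how the repository generates them):

```python
import numpy as np
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
train_sizes, train_scores, val_scores = learning_curve(
    SVC(), X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5)
)
# Mean train/validation accuracy at each training-set size
print(train_sizes)
print(train_scores.mean(axis=1))
print(val_scores.mean(axis=1))
```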

K-Fold Cross-Validation: Sensitivity Analysis of the Value of K

(A) Support vector machine (SVM), (B) adaptive boosting (AdaBoost), (C) extreme gradient boosting (XGBoost), (D) gradient boosting, (E) decision tree, (F) random forest
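
The sensitivity analysis amounts to re-running cross-validation while varying the number of folds; a minimal sketch:

```python
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# X, y as in the model-selection sketch above
for k in range(2, 11):  # sweep the number of folds
    scores = cross_val_score(SVC(), X, y, cv=k)
    print(f"k={k}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```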

Citation

Please consider citing our work using the following BibTeX entries:

@article{Sherpa2024,
  title = {Attenuated Total Reflectance–Fourier Transform Infrared (ATR-FTIR) Spectroscopy Combined With Deep Learning for Classification of Idiopathic Recurrent Spontaneous Miscarriage (IRSM)},
  ISSN = {1532-236X},
  url = {http://dx.doi.org/10.1080/00032719.2024.2333960},
  DOI = {10.1080/00032719.2024.2333960},
  journal = {Analytical Letters},
  publisher = {Informa UK Limited},
  author = {Sherpa, Dadoma and Rajwade, Dhruva Abhijit and Mitra, Imon and Biswas, Souvik and Sharma, Sunita and Chakraborty, Pratip and Kalapahar, Shovandeb and Chattopadhyay, Ratna and Chaudhury, Koel},
  year = {2024},
  month = apr,
  pages = {1–17}
}
@inproceedings{Sherpa2023,
  title = {Prediction of Idiopathic Recurrent Spontaneous Miscarriage using Machine Learning},
  url = {http://dx.doi.org/10.1109/ICCECE51049.2023.10085363},
  DOI = {10.1109/iccece51049.2023.10085363},
  booktitle = {2023 International Conference on Computer, Electrical & Communication Engineering (ICCECE)},
  publisher = {IEEE},
  author = {Sherpa, Dadoma and Rajwade, Dhruva Abhijit and Mitra, Imon and Dhar, Dhruba and Sharma, Sunita and Chakraborty, Pratip and Chaudhury, Koel},
  year = {2023},
  month = jan 
}

Todo:

- Add example notebooks
- Add Ray Tune hyperparameter sweep code
