Malaria Detection
Using Deep Learning and Convolutional Neural Networks to create a computer vision model that detects malaria cost-effectively
By James Hochleutner
https://github.com/jhochle/MalariaDetection
James Hochleutner Deep Learning Final Presentation.pptx
- Malaria is a deadly disease that is transmitted by infected mosquitoes.
- The CDC estimates that 400,000 people die of malaria annually.
- Traditional malaria detection requires examination of samples by trained laboratory professionals
- Many types of highly effective antimalarial medications are available for treatment
- Low-cost, fast, and scalable detection could significantly reduce malaria mortality
- A computer vision model can eliminate the need for a trained lab technician to evaluate samples
- The final model, a 14-layer Convolutional Neural Network (CNN), achieved 98.8% accuracy
Data Set
The starting point is a labeled data set of 27,558 color images taken from microscope images of red blood cells
- All images are color (RGB) with color values between 0 and 255
- Images are all labeled as either “Parasitized” or “Uninfected”
- Data is split into training (24,958) and testing (2,600) data sets
- Data sets are nearly perfectly balanced between Parasitized and Uninfected
Data Pre-Processing
- Images are resized to a common size to ensure consistency when training the computer vision model
- Images are first resized to 64x64 pixels for faster training, and later to 100x100 to test whether larger image sizes improve accuracy
- I tested a number of image augmentation techniques to try to improve the model
- HSV color conversion, Gaussian blurring, and ImageDataGenerator are all used to introduce more noise to the images and decrease model overfitting
- Images are all normalized by dividing color values by 255 or 252 (HSV)
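The resizing and scaling steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's code: `resize_nearest` and `normalize_image` are hypothetical helpers, images are nested lists of pixels, and nearest-neighbor sampling stands in for whatever resizing routine an image library would provide.

```python
def resize_nearest(image, new_h, new_w):
    """Resize an H x W image (nested lists of pixels) with nearest-neighbor sampling.

    Hypothetical helper: a real pipeline would normally call an image library.
    """
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def normalize_image(image, max_value=255.0):
    """Scale every channel value into [0, 1] by dividing by max_value."""
    return [
        [[channel / max_value for channel in pixel] for pixel in row]
        for row in image
    ]


# A tiny made-up 2 x 2 RGB image with 8-bit color values (illustration only).
sample = [
    [[0, 128, 255], [64, 32, 16]],
    [[255, 255, 255], [10, 20, 30]],
]

resized = resize_nearest(sample, 4, 4)   # every image ends up the same size
scaled = normalize_image(resized)        # values now lie between 0 and 1
```

Passing a different `max_value` covers the case where the largest channel value is not 255, as with the HSV images mentioned above.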

Model Training Approach and Evaluation
- All models are trained using a common random seed
- Early stopping, triggered once validation loss fails to improve for 2 epochs, helps reduce model overfitting
- Models are evaluated on overall accuracy and recall, to help minimize cases of failing to detect parasitized samples
- Plotting a confusion matrix helps identify the tradeoffs between model accuracy and recall
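The early-stopping rule and the two evaluation metrics can be sketched in plain Python. This is only an illustration: the function names, the loss values, and the labels below are made up, and a deep learning framework would normally supply both the stopping callback and the metrics.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once validation loss has failed to improve on its best
    value for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1  # a parasitized sample the model missed
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_and_recall(tp, fp, fn, tn):
    """Accuracy over all samples; recall over the positive samples only."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall


# Made-up validation losses and labels (1 = Parasitized, 0 = Uninfected).
stop_at = stopping_epoch([0.50, 0.40, 0.35, 0.36, 0.37, 0.30])
counts = confusion_counts([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 0, 0, 0, 0, 1])
accuracy, recall = accuracy_and_recall(*counts)
```

Recall only looks at the parasitized samples, so two models with the same accuracy can differ in how many infections they miss. That is the tradeoff the confusion matrix makes visible.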
Initial Model
- The base model is a Sequential CNN with 1,058,786 trainable parameters
- The model scans each image in small patches of pixels, looking for common patterns
- Convolutional layers with ReLU activation functions are used
- Max pooling layers extract the most dominant features
- Dropout layers randomly turn off a share of activations during training to help prevent model overfitting
- This structure is repeated, followed by a flattening layer, a fully connected layer, and a softmax activation layer
- The model performed extremely well, with 98.3% accuracy
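To make the building blocks above concrete, here is a minimal plain-Python sketch of what a convolution, a ReLU activation, 2x2 max pooling, and softmax each compute. It is not the project's model: the helper names, the 5x5 image, and the 2x2 kernel are all made up for illustration, and a framework would run these operations over many filters and channels at once.

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over every patch of a single-channel image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """Keep the strongest response in each non-overlapping 2x2 block.

    Assumes the feature map has even height and width.
    """
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]), 2)
        ]
        for r in range(0, len(feature_map), 2)
    ]


def softmax(logits):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up single-channel image and kernel, for illustration only.
image = [
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
]
kernel = [[1, 0], [0, -1]]

features = max_pool_2x2(relu(conv2d_valid(image, kernel)))  # 5x5 -> 4x4 -> 2x2
probabilities = softmax([2.0, 0.5])  # hypothetical scores for the two classes
```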


Model Revisions
Many attempts were made to improve model accuracy:
- Adding layers to increase model complexity
- Changing the activation function to Leaky ReLU
- Batch normalization layers
- Image augmentation (HSV, Gaussian blurring, ImageDataGenerator, larger image sizes)
- Changing batch sizes
- Kernel regularization
None of these attempts significantly improved model performance
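For readers unfamiliar with two of the revisions listed above, the following plain-Python sketch shows what Leaky ReLU and batch normalization compute. It is a simplified illustration with assumed values: the slope of 0.01 and the epsilon are common defaults rather than settings taken from this project, and the batch normalization shown here omits the learned scale and shift that a framework layer would add.

```python
import math


def leaky_relu(values, negative_slope=0.01):
    """Like ReLU, but negative inputs are scaled down instead of zeroed."""
    return [v if v > 0 else negative_slope * v for v in values]


def batch_normalize(values, epsilon=1e-5):
    """Shift and scale a batch of activations to roughly zero mean, unit variance.

    Simplified: no learned scale/shift parameters and no running statistics.
    """
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + epsilon) for v in values]


# Made-up activations for illustration.
activated = leaky_relu([-2.0, 0.0, 3.0])
normalized = batch_normalize([1.0, 2.0, 3.0])
```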
Pre-Trained Model Structure
Using the pre-trained VGG-16 model yielded suboptimal results. The VGG-16 model is more complex, with over 19 layers and 14,714,688 trainable parameters. It performed the worst of all models tested, with 88.3% overall accuracy
Simpler Model Structure
Since attempts to increase model complexity beyond the base model failed to improve performance, and the far more complex VGG-16 model performed worse, I decided to try a simpler model with fewer trainable parameters. The approach worked and achieved superior overall accuracy and recall
- Model with fewer layers (14)
- Far fewer trainable parameters (203,970), achieved by reducing filters per layer
- New optimizer (loss minimizer)
- Faster training speeds due to fewer trainable parameters (20 seconds using a GPU on Google Colab Pro)
- Superior performance: 98.8% overall model accuracy and 99% recall
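The link between filters per layer and trainable parameters can be checked with a short calculation. The sketch below uses the standard parameter-count formulas for convolutional and fully connected layers. The filter counts and 3x3 kernels are hypothetical and do not reproduce the 1,058,786 or 203,970 totals reported above; they only show how quickly the count shrinks when each layer has fewer filters.

```python
def conv2d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel convolution."""
    return (kernel_size * kernel_size * in_channels + 1) * filters


def dense_params(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_features + 1) * out_features


def conv_stack_params(filter_counts, kernel_size=3, in_channels=3):
    """Total parameters for a chain of convolutions on an RGB input."""
    total = 0
    for filters in filter_counts:
        total += conv2d_params(kernel_size, in_channels, filters)
        in_channels = filters  # this layer's output feeds the next layer
    return total


# Hypothetical filter counts, not the architectures used in this project.
wide_total = conv_stack_params([32, 64, 128])
narrow_total = conv_stack_params([16, 32, 64])
reduction = 1 - narrow_total / wide_total
```

Because each layer's parameter count depends on both its own filter count and the previous layer's, halving the filters everywhere cuts the total by much more than half.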
Final Model

