AnoMalNet: Outlier Detection based Malaria Cell Image Classification Method Leveraging Deep Autoencoder

Class imbalance is a pervasive issue in the field of disease classification from medical images. It is necessary to balance out the class distribution while training a model for decent results. However, in the case of rare medical diseases, images from affected patients are much harder to come by compared to images from non-affected patients, resulting in unwanted class imbalance. Various processes of tackling class imbalance issues have been explored so far, each having its fair share of drawbacks. In this research, we propose an outlier detection based binary medical image classification technique which can handle even the most extreme case of class imbalance. We have utilized a dataset of malaria parasitized and uninfected cells. An autoencoder model titled AnoMalNet is trained with only the uninfected cell images at the beginning and then used to classify both the affected and non-affected cell images by thresholding a loss value. We have achieved an accuracy, precision, recall, and F1 score of 98.49%, 97.07%, 100%, and 98.52% respectively, performing better than large deep learning models and other published works. As our proposed approach can provide competitive results without needing the disease-positive samples during training, it should prove to be useful in binary disease classification on imbalanced datasets.


INTRODUCTION
Malaria is a menacing disease that has affected large numbers of people in the past and continues to do so at present. The statistics speak for themselves: 2020 saw a record-breaking estimated 241 million malaria cases worldwide [1]. Needless to say, it is imperative to work on remedies for such a deadly disease in order to mitigate the damage it inflicts.
Microscopic thick and thin blood smear examinations are the most reliable and routinely used methods for malaria diagnosis. Thin blood smears help identify the species of the parasite causing the infection, whereas thick blood smears help detect the presence of parasites. However, the efficiency of this manual analysis method depends heavily on the medical personnel carrying out the task, and each diagnosis takes a great deal of time. In this era of automation, where researchers are continuously working on fast and efficient ways of treating malaria, deep learning has been quite popular for the detection and analysis of malaria, as discussed in the literature review later on. Beyond malaria classification, deep learning has taken hold in several research fields such as medical image analysis, natural language processing, and audio processing [2]-[10]. Though deep learning techniques have proved handy in the recent past, most of these models are heavy and consume a lot of computational power, which is also an issue when deploying them to mobile or edge devices. Additionally, these methods do not exhibit much potential for solving class imbalance issues. To rectify such issues, this research work introduces AnoMalNet, an autoencoder-based method for the investigation of malaria in cell tissues that is lighter and capable of solving class imbalance problems in datasets. In this research, the following contributions are made:
- An anomaly detection-based approach, built upon autoencoders, that deals with class imbalance issues in the investigation of malaria in cell tissues has been introduced.
- The proposed model outperforms state-of-the-art models like VGG16, ResNet50, MobileNetV2, and LeNet.
- Comparative analysis with other published methods has been provided.

LITERATURE REVIEW
There has been quite a bit of research in the field of malaria cell image classification. In the initial section of this review, we discuss some of it. Later on, we have a short discussion on various techniques that have been used for handling the class imbalance issue.
Raihan and Nahid [11] used a biorthogonal wavelet to reduce the images to 72×72 resolution and extract features. Images were passed through a custom convolutional neural network (CNN) with three convolutional layers and three fully connected (FC) layers. This CNN was used to extract features from the first FC layer; in this way, 768 features were found initially. The whale optimization algorithm (WOA) was used to select the optimal subset of features. Samples of these features were passed through the XGBoost algorithm. For set 1 with 768 features, XGBoost achieved 94.92%, 94.34%, 95.57%, and 94.95%, and for set 2 with 365 features, the model achieved 94.78%, 94.39%, 95.21%, and 94.80% for accuracy, precision, recall, and F1 score respectively on the validation set. XGBoost model construction time was halved for the second set due to the reduced number of features. Shapley additive explanations (SHAP) was used as a model explainability tool to assess the importance of the features [12].
In the research article of Narayanan et al. [13], the images were reshaped to 50×50 resolution and a color constancy technique was applied to maintain the same illumination condition across all images. A fast CNN model with 6 convolution layers and 2 FC layers was deployed. Additionally, AlexNet, ResNet, VGG-16, and DenseNet with transfer learning from ImageNet were deployed. Furthermore, a bag-of-features model using SVM was used. Among all these implemented models, DenseNet achieved the highest accuracy of 96.6%. Meanwhile, Reddy and Juliet [14] used a pre-trained ResNet with a sigmoid-enabled FC layer as the last layer. Apart from the last few layers, all the other layers were frozen during training. They achieved accuracies of 95.91% and 95.4% for training and validation respectively. The authors reported the existence of a test set in the experiment, but no test result was reported on it.
Rajaraman et al. [15] introduced a customized model with three convolution layers and two FC layers in their research work. AlexNet, Xception, ResNet, and DenseNet121 were used to extract features, whereas grid search was used for hyperparameter optimization. For each individual CNN, its default input resolution was used, and for the pre-defined architectures, the authors tried extracting features from different layers and determined the most optimal layer to extract features from in order to improve accuracy. With features extracted from the most optimal layer, they obtained the highest accuracy of 95.9%, from both VGG16 and ResNet50, among all the tested models. In another research work, thick blood smear images were collected [16]. Among those, 7,245 bounding box instances of plasmodium were annotated in 1,182 images. Images were divided into small patches and passed through a CNN to learn whether each patch contains any object of interest. The CNN was run on a 50/50 train-test split, with which an area under the receiver operating characteristic curve (ROC AUC) of 1.00 was achieved. The authors claimed that their process is quite efficient since it can learn directly from pixel data. Furthermore, Bibin et al. [17] introduced a trained model based on a deep belief network (DBN) to classify 4,100 peripheral blood smear images into two classes. The DBN is pre-trained by stacking restricted Boltzmann machines and utilizing the contrastive divergence approach. They took features from the images and initialized the DBN's visible variables in order to train it. This paper's feature vector combines color and texture attributes. With an F-score of 89.66%, a sensitivity of 97.60%, and a specificity of 95.92%, the proposed method surpassed existing state-of-the-art methods significantly.
Lipsa and Dash [18] used an optimal number and size of convolution layers and pooling layers coupled with a CNN. An Adam optimizer was employed to train and validate the model in a case study on a malaria diagnosis dataset. Images were fed into the CNN with their size and color unchanged, and assessments of their performance were made. An architectural comparison was performed between the proposed CNN model and some popular CNN architectures, with the proposed model having a smaller number of hyperparameters. This comparison demonstrates that the model demands far fewer evaluation parameters, making the suggested approach time-effective and computationally precise in terms of prediction accuracy. Nugroho and Nurfauzi [19] used green, green, blue (GGB) color normalization as a preprocessing step in the detection of malaria. Their findings demonstrate that the method has greater sensitivity and consistently comparable precision across a number of intersection over union (IoU) thresholds for malaria identification. Finally, Tan et al. [20] employed automated segmentation of plasmodium falciparum, one of the 5 common types of malaria parasite, on thin blood smears. This was experimented with using their proposed residual attention U-net. When the trained system was applied to verified test data, the results indicated an accuracy of 0.9687 and a precision of 0.9691.
Of all the research work discussed so far, almost all of it used regular supervised learning-based methods on a balanced dataset. As discussed earlier, class imbalance in medical image datasets is not a rare occurrence, and various approaches are taken to handle it. One prominent way is to generate synthetic data for the minority class through a generative adversarial network (GAN) and balance out the class distribution. However, as Mariani et al. [21] mentioned, a GAN itself requires lots of images to learn to generate synthetic data. Therefore, when the synthetic data generation is for a class that is sparse in distribution, it is not realistic to generate good-quality synthetic data, since it is not possible to provide the GAN with a sufficient amount of training images in the first place. Thus, although a GAN can hypothetically solve class imbalance issues, it is very hard to train one in practice. Another common way of handling class imbalance is data augmentation. Well-known augmentation techniques include geometric transformation, noise injection, color space transformation, image mixing, applying kernel filters, cropping, random erasing, and so on [22]. Some of these techniques, for example image mixing and applying kernel filters, may completely distort an image and change the underlying feature space. This feature transformation is generally unwanted in the case of medical images, as images of different modalities come with a very specific set of features. Additionally, other augmentation tactics, such as geometric transformation, cropping, and noise injection, are quite limited in terms of creating variation. As a result, due to the limitations of the currently available approaches, the AnoMalNet architecture proposed in this paper can be useful.

METHOD
Classifying malaria cell images as either parasite-infected or uninfected cells is a binary classification task. Traditional deep neural network (DNN) models can be used to solve this problem. In addition, it can easily be formulated as an outlier detection problem with the help of autoencoders. In the following subsections, several DNN models (LeNet, VGG16, ResNet50, and MobileNetV2) as well as autoencoders are described.

Deep neural network models
One of the earliest DNN models proposed is LeNet [23]. It consists of three convolutional layers of kernel size 5 and two average pooling layers. Additionally, it has two fully-connected layers which act as the classifier. Compared to LeNet, VGG16 is a much deeper model comprising 13 convolutional layers and 3 fully-connected layers [24]. This model uses a smaller kernel size, set to 3. Theoretically, a deeper neural network should perform better than a shallow one. However, it was found that training a very deep neural network gives rise to the vanishing gradient problem. In order to solve this problem, the ResNet model, which contains residual blocks, was proposed [25]. These blocks contain skip connections that take the output from previous layers and feed it to later ones, which helps mitigate the vanishing gradient problem. MobileNetV2 is a CNN architecture specifically designed for mobile devices. It is built on an inverted residual structure where the bottleneck layers are coupled by residual connections [26]. Lightweight depthwise convolutions are used in the intermediate expansion layer as a source of non-linearity to filter features.
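The skip-connection idea behind ResNet can be illustrated with a tiny sketch (this is not the ResNet implementation used in the paper; the function names and toy weights are purely illustrative): a residual block adds its input back onto the transformed output, so an identity path always exists for gradients to flow through.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w):
    # Skip connection: output = f(x) + x
    return relu(x @ w) + x

x = np.array([1.0, -2.0, 3.0])
w = np.zeros((3, 3))          # degenerate weights: f(x) = 0
y = residual_block(x, w)      # the skip path preserves the input exactly
```

Even when the learned transformation contributes nothing (f(x) = 0 above), the block reduces to the identity, which is what makes very deep stacks of such blocks trainable.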
An autoencoder is a DNN model which tries to reconstruct its input at the output. It is a two-part DNN model where the first part is an encoder network and the second is a decoder network. The task of the encoder is to encode the input into a representation of smaller dimension, while the decoder takes this representation and reconstructs the original input. With the help of this encoder and decoder network, an autoencoder can be used to create a compressed representation of the input data. For this research work, a custom convolutional autoencoder is used. The major difference between a convolutional autoencoder such as AnoMalNet and a regular autoencoder is that the former uses convolutional layers while the latter uses regular feed-forward layers. The encoder network was built from three convolution layers of 4, 16, and 32 channels respectively. The kernel size for all of these convolution layers was set to 3×3 and the padding was set to 1. Each convolution layer was followed by a ReLU activation function and a 2×2 max pooling layer.
The decoder network comprised three transpose convolution layers of 32, 16, and 4 channels respectively. In this case, the kernel size was set to 2×2 and the stride value was set to 2. Apart from the last transpose convolution layer, all layers' outputs were passed through a ReLU function, while the output of the final layer went through a sigmoid activation function. Figure 1 provides a graphical view of our custom autoencoder.
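As a sanity check on the architecture described above, the spatial sizes can be traced with the standard convolution output-size formulas (a sketch; the helper names are ours): three 3×3 convolutions with padding 1 each preserve the spatial size, each 2×2 max pool halves it, and each 2×2, stride-2 transpose convolution doubles it back.

```python
def conv2d_out(size, kernel=3, padding=1, stride=1):
    # Standard convolution output size
    return (size + 2 * padding - kernel) // stride + 1

def maxpool_out(size, kernel=2, stride=2):
    return (size - kernel) // stride + 1

def convtranspose_out(size, kernel=2, stride=2):
    return (size - 1) * stride + kernel

size = 32                         # input resolution after preprocessing
for _ in [4, 16, 32]:             # encoder: conv (3x3, pad 1) + 2x2 max pool
    size = maxpool_out(conv2d_out(size))
print(size)                       # latent spatial size: 4

for _ in [32, 16, 4]:             # decoder: transpose conv (2x2, stride 2)
    size = convtranspose_out(size)
print(size)                       # reconstructed spatial size: 32
```

So a 32×32 input is compressed to a 4×4 spatial grid (with 32 channels) and then expanded back to 32×32, matching the reconstruction objective.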

Proposed approach
At the very beginning, some simple data preprocessing techniques are applied to the image dataset. All the images are reshaped to a dimension of 32×32. Additionally, all the images are converted from the red green blue (RGB) color space to gray-scale. Changing the color space does not create problems for this task because the parasite is still visible in gray-scale, and doing so removes unnecessary noise in the images.
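A minimal sketch of this preprocessing step follows. The paper only states a resize to 32×32 and an RGB to gray-scale conversion; the luminosity weights below are the standard ITU-R BT.601 ones, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def rgb_to_gray(img):
    """img: (H, W, 3) array with values in [0, 1] -> (H, W) gray image."""
    return img @ np.array([0.299, 0.587, 0.114])

rgb = np.random.default_rng(0).random((32, 32, 3))  # stand-in for a resized cell image
gray = rgb_to_gray(rgb)
print(gray.shape)  # (32, 32)
```

In practice a library routine (e.g. an image library's resize and grayscale functions) would handle both steps; the point is only that each 32×32×3 image becomes a single-channel 32×32 input for the autoencoder.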
After this step, the custom autoencoder with the decoder network is trained using only uninfected cell images. Mean squared error (MSE) is used as the loss function to train the weights of the model. The proposed approach is based on the intuition that, during testing, this trained autoencoder will achieve a low loss score for uninfected cell images, whereas for parasite-infected cells the model will output a significantly higher loss value. This can be visualized with the help of Figure 2: Figure 2(a) shows the uninfected cell images used for training and testing the model, while Figure 2(b) shows the infected cell images the model sees only during inference. With the help of simple statistics, a cut-off point can then be established to label unknown cell images as infected or uninfected. An unknown image is determined to be an outlier, i.e. an infected cell, if the loss value obtained after passing it through the model is more than the mean plus three times the standard deviation of the training loss. This is a standard statistical approach to determine whether a particular data point is an outlier. Figures 3(a) and (b) contain the original and reconstructed images of uninfected cells, while Figures 3(c) and (d) contain the original and reconstructed images of infected cells. From these figures, it can be seen that in the case of uninfected normal cell images, the reconstructed image is quite similar to the original one. However, in the case of parasite-infected cells, the reconstruction is not as good: the model is able to get the shape correct to some extent, but not the parasite inside the cell.
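The decision rule above can be sketched in a few lines (variable names and the toy loss values are ours, for illustration): an image is flagged as infected when its reconstruction loss exceeds the mean plus three standard deviations of the training losses.

```python
import statistics

def make_threshold(train_losses):
    # Cut-off: mean + 3 * (population) standard deviation of training losses
    return statistics.mean(train_losses) + 3 * statistics.pstdev(train_losses)

def is_infected(loss, threshold):
    return loss > threshold

train_losses = [0.010, 0.012, 0.011, 0.009, 0.013]  # hypothetical MSE values
threshold = make_threshold(train_losses)
print(is_infected(0.011, threshold))  # loss typical of uninfected cells -> False
print(is_infected(0.200, threshold))  # far higher reconstruction loss -> True
```

Because the threshold is computed purely from losses on uninfected training images, no infected samples are needed to calibrate the classifier.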

EXPERIMENTAL RESULTS AND ANALYSIS
The models and proposed approach described in the previous section were used for experimentation on a malaria parasite classification dataset. Details about the dataset, along with a detailed discussion of the experimental setup, results, and analysis, are provided in the following subsections.

Dataset description
The dataset that has been used here is collected from the National Institute of Health (NIH).It contains a total of 27,558 images.Among these, 13,779 images are infected with malaria parasites while the rest of them are uninfected cell images.All the images are in RGB color-space.

Experimental setup
In order to train the AnoMalNet model, 1,607 randomly selected uninfected cell images were used. For validating the model's performance, 407 uninfected cell images were used. A total of 4,009 images were used to train and evaluate the performance of the AnoMalNet model. The MSE loss function was used along with the Adam optimizer, and the learning rate was set to 0.01. The model was trained for a total of 200 epochs. During the testing phase, a total of 5,512 images were used; among them, 2,757 were parasite-infected images and 2,755 were uninfected cells. After training the model, the loss values on the validation set were used to calculate the mean and standard deviation, which were later used to create a threshold for decision making. As mentioned in section 3.2, a particular image is labeled as a parasitized cell image if its loss value is greater than the mean plus three times the standard deviation. Using this threshold, all the images are classified with the help of the autoencoder.
Several DNN networks, namely LeNet, VGG16, ResNet50, and MobileNetV2, were trained to compare against the performance of the proposed approach [23]-[26]. All of these models were trained on 22,046 images from the dataset, containing both infected and uninfected cell images. These DNN models were trained for 100 epochs each using the Adam optimizer, with the learning rate kept at 0.01 and cross entropy as the loss function.

Results and discussion
For all the trained models, the loss vs epoch curves can be found in Figure 4. In Figures 4(a) and (b), the training and testing loss of all the models can be visualized. From these curves, it can be seen that after a while the traditional DNNs tend to move in a direction where the test loss increases. However, the proposed autoencoder-based approach does not have this problem and moves toward a test loss of zero. It should be kept in mind, though, that the autoencoder is tested on unseen uninfected cell images only, while the DNNs are tested on both unseen infected and uninfected cell images.
After training the models properly, the results shown in Table 1 are obtained. In order to better understand the performance of the proposed approach, four different metrics are used: accuracy, precision, recall, and F1 score. To visualize the comparison of the various models, a bar chart is also displayed in Figure 5. The lowest performing model according to the bar chart and the table is LeNet, which acquired 94.64% accuracy, 94.73% precision, 94.56% recall, and 94.64% F1 score. MobileNetV2 attained the second highest accuracy and F1 score, at 96.28% and 96.27%. However, the best performing method was the proposed AnoMalNet, which achieved 98.49% accuracy, 97.07% precision, 100% recall, and 98.52% F1 score. Apart from the comparison with traditional DNN models, another study was also conducted against methods proposed by other researchers. Table 2 shows a comparison of this manuscript's proposal with other research works. From this table, it can be seen that the proposed autoencoder-based outlier detection method outperforms the other published classification techniques.

Table 2. Comparative study of the performance of the proposed method against other published approaches
Method                    Accuracy (%)
Narayanan et al. [13]     96.60
Reddy and Juliet [14]     95.40
Raihan and Nahid [11]     94.78
AnoMalNet                 98.49
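The reported metrics can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. Using only the numbers quoted above:

```python
def f1_score(precision, recall):
    # Harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# AnoMalNet: precision 97.07%, recall 100%
f1 = f1_score(0.9707, 1.0)
print(round(f1, 4))  # ~0.9851, consistent with the reported 98.52% given rounding
```

The small discrepancy in the last digit is expected, as the published precision value is itself rounded to two decimal places.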

CONCLUSION
An autoencoder-based DNN architecture is presented in this research work for classifying malaria parasite infections in cell images. This DNN model is trained to identify outliers, i.e. parasite-infected cells. As the model is trained only on normal cell images, the method provides an advantage in scenarios where disease-positive samples are scarce. With the help of the MSE loss and a threshold, this approach can correctly identify images with malaria parasites. Additional comparisons with traditional DNN models have been shown in this experiment, from which it can be seen that the proposed approach performs better. There is scope for further improvement in this research work, such as incorporating more complex datasets and expanding the task from binary classification to multi-class classification.

Figure 2. Proposed methodology (a) uninfected cell images for training and (b) infected cell images used for testing only

Figure 4. Loss vs epoch graph for (a) training data and (b) testing data