Polycystic ovarian syndrome
INTRODUCTION
Polycystic ovarian syndrome (PCOS) is an endocrine disease identified in roughly 10-20% of women of reproductive age. PCOS is characterized by menstrual disorder and hyperandrogenism, and it is classified on the basis of polycystic ovary morphology, oligo-anovulation, and hyperandrogenism. It is associated with a high risk of endometrial cancer, infertility, and metabolic syndrome. Despite its prevalence, the syndrome is not completely understood, and diagnosis remains challenging. The enhanced resolution of ultrasound technology may improve the identification of follicle counts and, therefore, the detection of morphological changes. Because diagnosis is difficult, earlier prediction of PCOS is essential. Recently, Machine Learning (ML) algorithms have come to play a vital role in disease prediction by reducing computational complexity and improving prediction accuracy, which supports appropriate decisions for treating PCOS.
- LITERATURE REVIEW
Researchers have made various attempts to diagnose disease and predict its severity with ML approaches. Here, a broad review is made to analyze the diagnostic perspectives of multiple investigators. (Fan Wang et al., 2017) reported the prediction of cancer by examining the severity of tumor features. To extract the most needed information from the predicted tumor, the authors apply a hybridization of k-means and K-SVM. The former model is applied to recognize the hidden patterns behind benign and malignant tumors.
Similarly, the latter model is used to classify the class labels of the given dataset. The hybrid model outperforms other models in prediction efficiency during cancer prediction and saves considerable time during the training process. (Rees DA et al., 2016) used NB, CART, J48, Bayes, and REPTree to predict heart disease effectively. The accuracy attained with NB is higher than that of the J48 tree.
Similarly, (Moran et al., 2015) evaluated the performance of three classifiers, k-NN, NB, and decision tree, based on parameters such as F-measure, accuracy, precision, and recall. Here, k-NN produces the highest accuracy compared to the other models. (Vieira et al., 2013) proposed two distinct ant colonies integrated with fuzzy sets, one for selecting features and another for reducing features. These are used to diminish classification errors, and a fuzzy set is used to compute classification accuracy. (Jones et al., 2016) considered a Probability Density Function with an ant colony to measure exponential functionality for feature selection, and the outcomes are compared with those obtained from genetic algorithms. (Fan et al., 2017) demonstrated cancer diagnosis using tumor features. To extract deterministic information and to predict the tumor, a hybrid of K-means and K-SVM was designed. These methods are used to identify patterns of benign and malignant tumors.
- PROBLEM STATEMENT
PCOS is a prevalent, heterogeneous hormonal and endocrine disorder that is strongly associated with anovulation, infertility, Type II diabetes, cardiovascular disease, obesity, and related conditions. Earlier prediction and diagnosis with minimal test results lead to a reduced risk of miscarriage, infertility, mental agony, and gynecological cancer. Therefore, to achieve these outcomes and to improve prediction accuracy, Machine Learning algorithms play a substantial role: they reduce time complexity while processing the deterministic features that trigger PCOS in women and provide an earlier prediction mechanism for gynecologists.
- MODELLING OF F3I BASED FEATURE SELECTION APPROACH FOR PCOS CLASSIFICATION AND PREDICTION
4.1 OBJECTIVE
The ultimate objectives of using the F3I feature selection approach are: 1) To reduce the complexity encountered in conventional FireFly approaches. 2) To enable faster training of classifier approaches. 3) To reduce over-fitting and predict PCOS with improved accuracy.
4.2. WORKING MECHANISM
Furious FireFly (F3I): The proposed Furious FireFly (F3I) based feature selection includes three phases: ROI construction, search strategy, and follicle mapping. The mapping process is updated in every successive iteration.
a) ROI construction: In general, region attraction is determined as an optimal functionality of F3I. The ovarian characteristics are examined from the given input images using spectral domain-based variation prediction.
1) Compute the input image mapping with the residual process.
2) Evaluate the highest-value pixel and resize the image to the new size derived from the original image.
3) The resized image is a unique image.
4) The ROI construction process terminates when the iteration limit is reached.
5) The ROI selection process gives better results when compared to a spectral algorithm.
b) Searching strategy for follicle identification: Here, original follicles are bounded by local minima, and ovarian boundaries are separated by real follicles. Follicle regions are distinguished by how they vary from the regions outside the ovary. Based on these observations, ultrasound images are classified into two classes: class 0 specifies an ovary with no follicles (no PCOS), and class 1 specifies an ovary with follicles (with PCOS). Follicles are treated as local minima inside the ovary. Using these ovarian boundary characteristics, the two classes are used to relate the local minima to the ovarian boundary. Here, follicle identification is performed based on object growth.
c) Follicle mapping with adaptive strategy: Follicle mapping uses internal and external factors of the ovarian regions to perform its computation. The following steps are considered for mapping follicles.
1) Determine the local minimum using the ROI center and predict the target set. Evaluate the threshold limit by setting a proper value.
2) Compute the object cost based on cost-based threshold values.
3) Remove any object that has a higher threshold cost.
4) When an object is removed, recalculate the threshold value and record the detected follicles. These features are fed as input to the classifiers.
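The idea of treating follicles as local minima and discarding candidates above a threshold can be illustrated with a minimal sketch. This is not the F3I algorithm itself: it is a simplified stand-in that assumes a small grayscale grid, an 8-neighbour comparison, and a plain intensity threshold in place of the cost-based threshold. The names `find_local_minima`, `classify_ovary`, and the sample grid `roi` are hypothetical.

```python
def find_local_minima(image, threshold):
    """Return (row, col) of interior pixels that are darker than all
    8 neighbours and also darker than the given threshold."""
    minima = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            centre = image[r][c]
            neighbours = [
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            # Candidates above the threshold are discarded (assumed rule).
            if centre < threshold and all(centre < n for n in neighbours):
                minima.append((r, c))
    return minima


def classify_ovary(image, threshold):
    """Class 1 if at least one candidate follicle is found, else class 0."""
    return 1 if find_local_minima(image, threshold) else 0


# Hypothetical 5x5 intensity grid with two dark spots (illustration only).
roi = [
    [9, 9, 9, 9, 9],
    [9, 2, 9, 9, 9],
    [9, 9, 9, 9, 9],
    [9, 9, 9, 6, 9],
    [9, 9, 9, 9, 9],
]
candidates = find_local_minima(roi, threshold=7)
label = classify_ovary(roi, threshold=7)
```

Lowering the threshold removes the shallower candidate, which mirrors the step in which objects with a higher threshold cost are dropped before the remaining follicles are recorded.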
Classification: After feature selection with F3I, the chosen features are fed to classifiers such as Naive Bayes and Artificial Neural Network (ANN). The purpose of classification is to validate accuracy.
a) Naive Bayes (NB): NB is a supervised learning approach. It applies the conditional probability (Bayes) theorem to identify the class of a feature vector. It produces better results in disease prediction with feature selection when compared to other approaches, and it yields improved accuracy while classifying feature instances. NB determines the conditional probability of each candidate class. After the conditional probability is evaluated, the class of a new vector is assigned. Thus, class labels are identified using the NB classifier.
b) Artificial Neural Networks (ANN): ANN is a supervised ML algorithm that combines neurons to transfer messages. The objective is to model a number of parameters, some constant and some variable. An ANN has three components: inputs, outputs, and transfer functions. The input unit holds values that are altered while the network is trained. The output is evaluated against the known class, and the weights are re-computed using the actual class and the predicted output.
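The NB step can be sketched as follows. This is a minimal illustration that assumes continuous features with a Gaussian likelihood, which the text does not specify; the function names, the small variance floor, and the toy feature values are all hypothetical and are not taken from the study.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate the class prior and per-feature mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small floor (assumption) keeps the variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-feature samples (e.g. follicle count, a hormone level).
X = [[2, 1.0], [3, 1.2], [12, 2.5], [14, 2.8]]
y = [0, 0, 1, 1]
nb_model = fit_gaussian_nb(X, y)
```

Working in log space avoids numerical underflow when many feature likelihoods are multiplied together.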
4.3. PERFORMANCE EVALUATION
The simulation was carried out in the MATLAB R2018a environment. Here, the input data is assessed, and outcomes are evaluated for PCOS prediction. Counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN) are used as the performance measures.
Fig 1: Performance Metrics based on F3I
The proposed F3I model gives 98.63% prediction accuracy when ultrasound images are provided as input. Similarly, 100% precision, 55% recall, 68.76% F-measure, and 100% specificity are obtained by validating the proposed model. The follicles are mapped and identified effectively using F3I, which improves the classifier’s performance by eliminating the over-fitting encountered in network training and reduces the misclassification error compared to conventional approaches.
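The metrics named above all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not the study's data.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute standard metrics from confusion-matrix counts."""

    def safe_div(num, den):
        # Return 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    total = tp + tn + fp + fn
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, total),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "error_rate": safe_div(fp + fn, total),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, tn=50, fp=0, fn=10)
```

With no false positives, precision and specificity both equal 1.0 regardless of how many positive cases are missed, which is why recall and F-measure need to be read alongside them.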
- PCOD IDENTIFICATION USING HYBRID SVM AND ACO TECHNIQUE WITH MEDICAL DATASET
5.1. OBJECTIVE
The ultimate objectives of hybridizing Support Vector Machine (SVM) and Ant Colony Optimization (ACO) are: 1) To enhance classifier performance and acquire optimal results. 2) To combine the advantages of two or more methods to reach better performance. 3) To use SVM to attain improved class results and ACO to acquire global outcomes. 4) To provide better classification results that make the decision-making process more efficient.
5.2. MECHANISM
The ultimate objective of modeling this approach is to recognize PCOS symptoms at an earlier stage and take the necessary precautions. There are two phases in this predictor model: ant colony based feature selection and the Hybrid SVM classifier. The optimization technique is used to attain an optimal solution. Similarly, the classification approach predicts the class labels, i.e., PCOS or Non-PCOS. The selected features are given as input to the classifier to attain higher accuracy.
Feature selection with ACO: Here, the optimal feature subset is chosen. Every feature carries numerical values, and all features are independent of one another. Two parameters weight the relative importance of heuristic and pheromone information. These parameters may be set to one, which is their best value, and ten is more appropriate for initializing the pheromone trail values. When every ant has completed its tour, the pheromone trail is updated based on the global update rule. Under this rule, the higher the F-measure value of the subset chosen by an ant, the more pheromone is deposited on that feature subset, and those features are more likely to be selected in successive iterations. The chosen features are fed to HSVM, where kernel computation is performed to evaluate the margins of the hyperplanes.
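The global update rule can be sketched as below. This is a generic ant colony feature selection loop under stated assumptions, not the authors' exact procedure: `alpha` and `beta` are the conventional names for the two weighting parameters (both set to one), `tau0` is the initial pheromone value (ten), `rho` is an assumed evaporation rate, and `toy_fitness` is a hypothetical stand-in for the F-measure obtained from a classifier.

```python
import random


def select_subset(pheromone, heuristic, alpha, beta, k, rng):
    """One ant picks k distinct features, each with probability
    proportional to pheromone**alpha * heuristic**beta."""
    available = list(range(len(pheromone)))
    chosen = []
    for _ in range(k):
        weights = [(pheromone[i] ** alpha) * (heuristic[i] ** beta) for i in available]
        pick = rng.choices(available, weights=weights, k=1)[0]
        chosen.append(pick)
        available.remove(pick)
    return chosen


def global_update(pheromone, subset, score, rho):
    """Evaporate every trail, then deposit the score on the best subset."""
    updated = [(1.0 - rho) * p for p in pheromone]
    for i in subset:
        updated[i] += score
    return updated


def run_aco(n_features, fitness, n_ants=5, n_iter=20, k=3,
            alpha=1.0, beta=1.0, rho=0.1, tau0=10.0, seed=0):
    rng = random.Random(seed)
    pheromone = [tau0] * n_features
    heuristic = [1.0] * n_features  # uninformative heuristic (assumption)
    best_subset, best_score = None, -1.0
    for _ in range(n_iter):
        for _ in range(n_ants):
            subset = select_subset(pheromone, heuristic, alpha, beta, k, rng)
            score = fitness(subset)
            if score > best_score:
                best_subset, best_score = subset, score
        # Global rule: only the best subset found so far is reinforced.
        pheromone = global_update(pheromone, best_subset, best_score, rho)
    return sorted(best_subset), best_score, pheromone


def toy_fitness(subset):
    """Hypothetical score in [0, 1]: share of chosen features that are informative."""
    informative = {0, 2, 4}
    return len(informative & set(subset)) / len(subset)


best, score, trails = run_aco(6, toy_fitness)
```

In a real pipeline, `toy_fitness` would be replaced by training the classifier on the candidate subset and returning its F-measure on held-out data.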
Hybrid Support Vector Machine classifier: Generally, SVM is a supervised machine learning approach used to classify data into positive and negative classes. SVM plays a significant role in clinical applications. For predicting follicles, this method provides superior performance with simple classification due to its ability to produce class weights. The feature data is supplied to the SVM classifier for analysis, and a linear equation is determined to classify patients into the final classification outcomes. While performing kernel computation, SVM has to learn the margin width around the hyperplane; diverse kernel computation methods are available to enhance the SVM algorithm’s performance. The margin size and the misclassification rate are directly related, i.e., if the margin size increases, the hyperplane-based misclassification rate also increases. A tuning parameter value assists in determining the margin of the SVM hyperplane. To reduce the misclassification rate, the margin size has to be reduced. Therefore, vectors are classified based on space regions.
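For reference, the margin discussed above can be written in the standard soft-margin SVM notation. The symbols $w$, $b$, $\xi_i$, and $C$ are the conventional textbook ones and are an assumption here, since the text does not give its own notation; the kernel used by HSVM is likewise unspecified, so the linear form is shown.

```latex
% Standard linear soft-margin SVM (textbook form; symbols are assumptions).
f(x) = \operatorname{sign}\left(w^{\top} x + b\right), \qquad
\text{margin width} = \frac{2}{\lVert w \rVert}

\min_{w,\, b,\, \xi} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left(w^{\top} x_i + b\right) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```

In this form, a smaller $C$ tolerates more margin violations and yields a wider margin, while a larger $C$ narrows the margin to reduce training misclassification, which matches the trade-off between margin size and misclassification rate described in the text.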
5.3. PERFORMANCE EVALUATION
Performance metrics such as accuracy, precision, recall, F-measure, error rate, and specificity are computed using the HSVM classifier. Here, the dataset is partitioned in a 70:30 ratio for training and testing purposes.
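A 70:30 partition can be sketched as follows, assuming that the 70% portion is used for training and that records are shuffled before the cut; neither detail is stated in the text. The function name `split_dataset` and the fixed seed are hypothetical.

```python
import random


def split_dataset(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 100 patient record identifiers.
patient_ids = list(range(100))
train_ids, test_ids = split_dataset(patient_ids)
```

Fixing the seed makes the partition reproducible, so that the reported metrics can be recomputed on the same held-out records.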
Fig 2: Performance Metrics based on HSVM
The training data gives an optimal solution by identifying the most discriminative clinical features. Features such as age, testosterone, androstenedione, number of follicles, endometrial thickness, Ferriman-Gallwey score, BMI, obesity, and free testosterone are identified as the most influential. These features are chosen based on the tours performed by the ants and the pheromone they release.
- K-NEAREST NEIGHBORS FOR PREGNANCY RISK PREDICTION BY REDUCTION AND PRUNING METHOD
6.1. OBJECTIVE
The ultimate objectives of this work are: 1) To predict the pregnancy risk of women of childbearing age. 2) To identify pathogen analysis based risk factors using k-NN and the MapReduction process. 3) To identify the factors that influence infertility in women using hit rate analysis (pregnancy hit rate and non-pregnancy hit rate). 4) To prune the non-essential elements from the overall clinical pathogens. These pathogens are associated with the genetic factors of patients.
6.2. MECHANISM
The learning algorithm allows a large number of features to be considered as inputs. The choice of input variables and the decision making are performed by the learning method. The benefits of this algorithm are its efficiency, simplicity, and ability to handle non-linearly separable and homogeneous cases. The learning approach is robust to misclassified points and outliers.
Map Reduction: Here, a MapReduction based pruning process is employed and tuned with two essential factors: the minimum number of samples provided from the feature value sets and the confidence level of pruning. Error pruning is controlled by the ‘P’ parameter, which ranges from 0% to 100%. For effective computation, the misclassification rate and confidence level are evaluated. The k-NN based pruning is used to identify the pregnancy rate and infertility rate and to perform classification without misclassification errors. With MapReduction analysis, the correlation between the pathogens and pregnancy risks is evaluated against a p-value threshold. Through this correlation process, factors such as pelvic inflammatory disease (PID), chocolate cyst, intrauterine adhesion, Kallmann syndrome, pelvic tuberculosis, endometrial hyperplasia (EH), and endometrial tuberculosis are pruned. They are removed in order to recognize the most dominant features for predicting infertility. As an outcome, 13 features are selected for further analysis.
Cumulative pregnancy rate estimation: Infertility rate prediction is exceptionally complex with pathogen analysis involving endometrial, embryo, and hyper-stimulation features. Based on various reviews, only embryo features are usually considered when predicting the pregnancy rate with machine learning algorithms. When the essential characteristics of patients under treatment are considered, the pregnancy rate predictions are also similar. This is not taken as the underlying assumption in the procedures that follow. Instead, pathogens are identified at the assumption level, and more features are extracted before classification, so that a more appropriate, individual pregnancy prediction can be made.
k-Nearest Neighbors: For classification, a supervised learning model known as k-NN is used. k-NN assigns each sample to the predefined class held by the majority of its nearest neighbors in the data space. A distance metric is used to evaluate the distance between an individual sample and all other samples, which are then sorted. A k-NN variant was developed with MapReduction to compute classification accuracy, sensitivity, and specificity. Prediction accuracy is the ratio of the number of correctly classified samples to the total number of samples.
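The k-NN decision rule and the accuracy ratio described above can be sketched as follows. This is plain k-NN with Euclidean distance, which is an assumption since the text does not name the distance metric, and it does not include the MapReduction variant. The function names and the toy samples are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Sort training samples by distance to the query and take a majority vote."""
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def prediction_accuracy(train_X, train_y, test_X, test_y, k=3):
    """Ratio of correctly classified samples to the total number of samples."""
    correct = sum(
        knn_predict(train_X, train_y, row, k) == label
        for row, label in zip(test_X, test_y)
    )
    return correct / len(test_y)


# Hypothetical two-feature samples: class 0 = non-pregnant, class 1 = pregnant.
train_X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [[1.1, 1.0], [3.0, 3.0]]
test_y = [0, 1]
knn_accuracy = prediction_accuracy(train_X, train_y, test_X, test_y, k=3)
```

An odd k avoids tied votes in the two-class case.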
6.3. PERFORMANCE EVALUATION
Specificity and sensitivity analysis is a general paradigm in the medical field and is used here, with minor computational changes, to validate the infertility outcome prediction model. This work concentrates on pregnancy and non-pregnancy rates due to infertility. In the experiments, the number of non-pregnant cases is 2.5% higher than the number of pregnant instances, which leads the learning approach to automatically raise specificity at the expense of sensitivity in order to attain higher prediction accuracy. The prediction model therefore tends to identify non-pregnant cases due to infertility more readily than pregnant cases. Variations across sample observations are more apparent at higher sensitivity and are more evident for the chosen k-value. The objective of this investigation is fulfilled by recognizing the merit of the k-NN variant over other classifier models in predicting pregnancy risk.
Fig 3: Performance Metrics based on k-NN
- CONCLUSION
PCOS is identified in many women all over the world. Here, various investigations have been carried out with feature selection approaches such as F3I, ACO, and a pruning model. Among these feature selection processes, F3I performs best for analyzing infertility, identifying the pregnancy rate and stressful syndrome, and prioritizing treatment. Similarly, the ANN, NB, HSVM, and k-NN models are used for classification. The overall prediction accuracy is higher with the F3I-ANN/NB classifier than with individual classifiers such as HSVM and k-NN. This method shows a better trade-off compared with existing approaches. The prediction accuracy attained by F3I-ANN/NB is 98.63%, which is 25.93% and 16.82% higher than that of the other individual classifiers.
REFERENCES
- Fan Wang, Shaobing Wang, Zhenghong Zhang, Qingqiang Lin, Yiping Liu, Yijun Xiao, Kaizhuan Xiao, and Zhengchao Wang (2017). Defective insulin signaling and the protective effects of dimethyldiguanide during follicular development in the ovaries of polycystic ovary syndrome. Molecular Medicine Reports. Vol. 16, no. 6, pp. 8164-8170. doi: 10.3892/mmr.2017.7678
- Rees DA, Jenkins-Jones S, Morgan CL (2016). Contemporary reproductive outcomes for patients with polycystic ovary syndrome: a retrospective observational study. J Clin Endocrinol Metab. Vol. 101, No. 4, pp. 1664–1672.
- Moran LJ, March WA, Whitrow MJ, Giles LC, Davies MJ, Moore VM (2015). Sleep disturbances in a community-based sample of women with polycystic ovary syndrome. Hum Reprod. Vol. 30, No. 2, pp. 466–472.
- Vieira, S., Mendonça, L., Farinha, G. and Sousa, J. (2013). Modified binary PSO for feature selection using SVM applied to mortality prediction of septic patients. Applied Soft Computing. Vol. 13, No. 8, pp. 3494–3504.
- Jones, M. R. & Goodarzi, M. O. (2016). Genetic determinants of polycystic ovary syndrome: progress and future directions. Fertility and Sterility. Vol. 106, pp. 25–32.
- Rustam Z and Ariantari N P A A (2018). Support Vector Machines for Classifying Policyholders Satisfactorily in Automobile Insurance. Journal of Physics: Conference Series. Vol. 1028.
- Liu, H., Zhao, H. & Chen, Z. J. (2016). Genome-Wide Association Studies for Polycystic Ovary Syndrome. Seminars in Reproductive Medicine. Vol. 34, pp. 224–229.
- Kori, M., Gov, E. & Arga, K. Y. (2016). Molecular signatures of ovarian diseases: Insights from network medicine perspective. Systems Biology in Reproductive Medicine. Vol. 62, pp. 266–282.
- Panch T, Szolovits P, Atun R (2018). Artificial intelligence, machine learning and health systems. J Global Health. Vol. 8, No. 2.
LIST OF PUBLICATIONS
- Maheswari, T. Baranidharan, S. Karthik, T. Sumathi, ‘Modelling of F3I based feature selection approach for PCOS classification and prediction’, Journal of Ambient Intelligence and Humanized Computing (2020), ISSN No: 1868-5137. https://doi.org/10.1007/s12652-020-02199-1 [Annexure I, IF: 4.594, Published in Springer].