A REVIEW OF CLUSTER UNDER-SAMPLING IN UNBALANCED DATASET AS A METHODS FOR IMPROVING SOFTWARE DEFECT PREDICTION

dc.contributor.author: Abdulhamid Sani, V. S. Manjula and Musa Ahmed Zayyad
dc.date.accessioned: 2024-04-09T08:29:57Z
dc.date.available: 2024-04-09T08:29:57Z
dc.date.issued: 2023
dc.description.abstract: Class imbalance is an inherent issue in many real-world machine learning applications, including software defect prediction, fraud detection, network intrusion and penetration detection, risk management, and medical datasets. It arises when one class has few instances, usually the very class the procedure is meant to identify, because the event that class represents is rare. A major driving force for class imbalance learning is the high priority placed on correctly classifying the relatively few minority instances, which incur a higher cost when misclassified than majority instances do. Supervised models are often designed to maximize overall classification accuracy; because minority examples are rare in the training data, such models typically misclassify them. Balancing the dataset before training keeps the model from becoming biased toward one class. Put another way, the model will not automatically favor the majority class simply because it has seen more data from it. Data sampling is one way to reduce class imbalance before training classification models; however, most of the methods now in use introduce additional problems during sampling and frequently overlook other concerns related to data quality. The goal of this work is therefore to create an effective sampling algorithm that, by employing a straightforward logical framework, enhances the performance of classification algorithms. By providing a thorough literature review on class imbalance, and by developing and implementing a novel Cluster under Sampling Technique (CUST), this research contributes to both academia and industry. CUST has been shown to greatly enhance the performance of popular classification techniques such as the C4.5 decision tree and One Rule when learning from imbalanced datasets.
dc.description.sponsorship: Kampala International University
dc.identifier.issn: 1813-3509
dc.identifier.uri: http://hdl.handle.net/20.500.12493/14445
dc.language.iso: en
dc.publisher: Journal of Applied Sciences, Information and Computing (JASIC)
dc.title: A REVIEW OF CLUSTER UNDER-SAMPLING IN UNBALANCED DATASET AS A METHODS FOR IMPROVING SOFTWARE DEFECT PREDICTION
Files
Original bundle
Name: 2JASIC 2707745889_a-review-of-cluster-under-sampling-in-unbalanced-dataset-as-a-methods-for-improving-software-defect-prediction.pdf
Size: 644.5 KB
Format: Adobe Portable Document Format