A REVIEW OF CLUSTER UNDER-SAMPLING IN UNBALANCED DATASET AS A METHODS FOR IMPROVING SOFTWARE DEFECT PREDICTION

Abdulhamid Sani, V. S. Manjula and Musa Ahmed Zayyad

dc.contributor.author	Abdulhamid Sani, V. S. Manjula and Musa Ahmed Zayyad
dc.date.accessioned	2024-04-09T08:29:57Z
dc.date.available	2024-04-09T08:29:57Z
dc.date.issued	2023
dc.description.abstract	Class imbalance is an inherent issue in many real-world machine learning applications, including software defect prediction, fraud detection, network intrusion and penetration detection, risk management, and medical datasets. It arises when a class has few instances, usually the class the procedure is meant to identify, because the event that class represents is rare. A major driving force for class imbalance learning is the high priority placed on correctly classifying minority instances, which incur a higher cost when misclassified than majority instances do. Supervised models are often designed to maximize overall classification accuracy; because minority examples are rare in the training data, such models typically misclassify minority instances. Balancing the dataset facilitates training because it keeps the model from becoming biased toward one class. Put another way, the model will not automatically favor the majority class merely because that class supplies more data. Data sampling is one method of reducing class imbalance before training classification models; however, most of the methods now in use introduce additional problems during the sampling process and frequently overlook other concerns related to data quality. Therefore, the goal of this work is to create an effective sampling algorithm that, by employing a straightforward logical framework, enhances the performance of classification algorithms. By providing a thorough literature review on class imbalance while developing and implementing a novel Cluster under Sampling Technique (CUST), this research contributes to both academia and industry. CUST has been shown to greatly enhance the performance of popular classification techniques such as the C4.5 decision tree and One Rule when learning from imbalanced datasets.
dc.description.sponsorship	Kampala International University
dc.identifier.issn	1813-3509
dc.identifier.uri	http://hdl.handle.net/20.500.12493/14445
dc.language.iso	en
dc.publisher	Journal of Applied Sciences, Information and Computing (JASIC)
dc.title	A REVIEW OF CLUSTER UNDER-SAMPLING IN UNBALANCED DATASET AS A METHODS FOR IMPROVING SOFTWARE DEFECT PREDICTION
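
The abstract describes under-sampling the majority class by clustering so that the training set becomes balanced. The sketch below illustrates that general idea only; it is not the authors' CUST algorithm, which this record does not specify. It assumes a generic approach: run k-means on the majority class with as many clusters as there are minority instances, then keep the real majority point nearest each centroid. The names `kmeans` and `cluster_undersample`, the seed, and the toy data are all hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Shrink the majority class to len(minority) representative points.

    Assumption (not from the source): one representative per cluster,
    chosen as the real majority point closest to the cluster centroid.
    """
    k = len(minority)
    if k >= len(majority):
        return list(majority)  # already balanced or minority is larger
    centroids = kmeans(majority, k, seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        best = min(remaining, key=lambda p: dist(p, c))
        kept.append(best)
        remaining.remove(best)  # never pick the same point twice
    return kept


# Toy data: 20 majority points on a grid, 4 minority points far away.
majority = [(float(i % 5), float(i // 5)) for i in range(20)]
minority = [(10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0)]

balanced_majority = cluster_undersample(majority, minority)
print(len(balanced_majority), len(minority))
```

Keeping real data points rather than the centroids themselves is a design choice made here so that no synthetic majority instances enter the training set; other cluster-based under-sampling schemes make different choices.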

Files

Original bundle

Name: 2JASIC 2707745889_a-review-of-cluster-under-sampling-in-unbalanced-dataset-as-a-methods-for-improving-software-defect-prediction.pdf
Size: 644.5 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Bachelors Degree of Management Information Systems (BIS)