Doctor of Philosophy (PhD)
In large databases, there may exist critical nuggets - small collections of records or instances that contain domain-specific important information. This information can be used for future decision making such as labeling of critical, unlabeled data records and improving classification results by reducing false positive and false negative errors. In recent years, data mining efforts have focussed on pattern and outlier detection methods. However, not much effort has been dedicated to finding critical nuggets within a data set. This work introduces the idea of critical nuggets, proposes an innovative domain-independent method to measure criticality, suggests a heuristic to reduce the search space for finding critical nuggets, and isolates and validates critical nuggets from some real world data sets. It seems that only a few subsets may qualify to be critical nuggets, underlying the importance of finding them. The proposed methodology can detect them. This work also identifies certain properties of critical nuggets and provides experimental validation of the properties. Critical nuggets were then applied to 2 important classification task related performance metrics - classification accuracy and misclassification costs. Experimental results helped validate that critical nuggets can assist in improving classification accuracies in real world data sets when compared with other standalone classification algorithms. The improvements in accuracy using the critical nuggets were statistically significant. Extensive studies were also undertaken on real world data sets that utilized critical nuggets to help minimize misclassification costs. In this case as well the critical nuggets based approach yielded statistically significant, lower misclassification costs than than standalone classification methods.
Document Availability at the Time of Submission
Secure the entire work for patent and/or proprietary purposes for a period of one year. Student has submitted appropriate documentation which states: During this period the copyright owner also agrees not to exercise her/his ownership rights, including public use in works, without prior authorization from LSU. At the end of the one year period, either we or LSU may request an automatic extension for one additional year. At the end of the one year secure period (or its extension, if such is requested), the work will be released for access worldwide.
Sathiaraj, David, "On Identifying Critical Nuggets Of Information During Classification Task" (2013). LSU Doctoral Dissertations. 506.