Towards better semi-supervised classification of malicious software
Document Type
Conference Proceeding
Publication Date
3-4-2015
Abstract
Due to the large volume of malicious software (malware) and its wide variety, automated detection and analysis using machine learning techniques have become increasingly important for network and computer security. A frequently encountered scenario in these security applications is that labeled training examples are scarce but unlabeled data are abundant. Semi-supervised learning, where both labeled and unlabeled data are used to learn a good model quickly, is a natural choice under such conditions. We investigate semi-supervised classification for malware categorization. We observe that malware data have specific characteristics and that they are noisy, so off-the-shelf semi-supervised learning may not work well in this case. We propose a semi-supervised approach that addresses these problems with malware data and can provide better classification. We conducted a set of experiments to test our method and compare it to others. The experimental results show that semi-supervised classification is a promising direction for malware classification. Our method achieved more than 90% accuracy when there were only a few training examples. The results also indicate that modifications are needed to make semi-supervised learning work with malware data. Otherwise, semi-supervised classification may perform worse than classifiers trained on only the labeled data.
Publication Source (Journal or Book title)
IWSPA 2015 - Proceedings of the 2015 ACM International Workshop on Security and Privacy Analytics, Co-located with CODASPY 2015
First Page
27
Last Page
33
Recommended Citation
Shakya, S., & Zhang, J. (2015). Towards better semi-supervised classification of malicious software. IWSPA 2015 - Proceedings of the 2015 ACM International Workshop on Security and Privacy Analytics, Co-located with CODASPY 2015, 27-33. https://doi.org/10.1145/2713579.2713587