Today, with the increasing use of the internet, new types of malware emerge on a daily basis. Because much of this new malware consists of mutations of older families, signature-based methods have become less effective for malware detection and classification. Fortunately, in recent years, considerable attention has been paid to the use of machine learning (ML) in this area; deep learning (DL), in particular, is among the most important and effective ML approaches used for malware detection and classification. However, one factor that reduces the performance of DL is catastrophic forgetting (CF): a model that was previously trained and evaluated on a set of malware families loses much of its former accuracy in detecting and classifying those families once it is trained and evaluated on new types of malware. The purpose of this study is severalfold. First, we show that even a highly accurate deep model for malware classification (MC) performs poorly in continual learning (CL) settings. Second, we implement and compare two state-of-the-art CL techniques for the problem and show that even the state of the art fails when tested in the single-head setup. Third, we study the impact of class imbalance on the CL performance of our model for MC. Fourth, we study the impact of freezing the feature-learning layers on the CL performance of our model. Finally, we show that a hybrid approach combining rehearsal and regularization methods is more accurate for single-head CL for MC.
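To illustrate one of the experimental settings described above, the following is a minimal, hypothetical sketch (not the study's actual code or architecture) of freezing the feature-learning layers of a small CNN malware classifier so that only the single shared classification head is updated when training continues on new malware classes; the class name MalwareCNN, the layer sizes, and the class count are placeholders.

```python
# Hypothetical sketch, not the authors' implementation: freeze the
# feature-learning (convolutional) layers of a malware classifier and
# train only the single-head classifier on newly arriving classes.
import torch
import torch.nn as nn


class MalwareCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # Feature-learning layers (convolutional backbone).
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        # Single head shared across all tasks/classes (single-head setup).
        self.classifier = nn.Linear(32 * 4 * 4, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))


model = MalwareCNN(num_classes=25)  # placeholder number of malware families

# Freeze the feature extractor so continual training on new malware
# classes updates only the classifier head.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```

In this setup, only `model.classifier` receives gradient updates on subsequent tasks, which is one way to probe how much of the forgetting is attributable to drift in the feature-learning layers versus the head.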
Subject: Computer Science