Detection of Malicious Vbscript Using Static and Dynamic Analysis with Recurrent Deep Learning

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |

Organized by IEEE

Related File

Attackers have used malicious VBScripts as an important computer infection vector. In this study, we explore a system that employs both static and dynamic analysis to detect malicious VBScripts. For the static analysis, we investigate two deep recurrent models, LaMP (LSTM and Max Pooling) and CPoLS (Convoluted Partitioning of Long Sequences), which process a VBScript as a byte sequence. Lower layers capture the sequential nature of these byte sequences while higher layers classify the resulting embedding as malicious or benign. Our models are trained in an end-to-end fashion allowing discriminative training even for the sequential processing layers. Dynamic analysis allows us to investigate obfuscated VBScripts an additional files which may be dropped during execution. Evaluating these models on a large corpus of 240,504 VBScript files indicates that the best performing LaMP model has a 69.3% true positive rate (TPR) at a false positive rate (FPR) of 1.0%. Similarly, the best CPoLS model has a TPR of 67.9% at an FPR of 1.0%. Our system is general in nature and can be applied to other scripting languages (e.g., JavaScript) as well.