Lux0R: Detection of Malicious PDF-embedded JavaScript code through Discriminant Analysis of API References

Corona, Igino; Maiorca, Davide; Ariu, Davide; Giacinto, Giorgio

doi:10.1145/2666652.2666657

JavaScript is a dynamic programming language adopted in a variety of applications, including web pages, PDF Readers, widget engines, network platforms, office suites. Given its widespread presence throughout different software platforms, JavaScript is a primary tool for the development of novel -rapidly evolving- malicious exploits. If the classical signature- and heuristic-based detection approaches are clearly inadequate to cope with this kind of threat, machine learning solutions proposed so far suffer from high false-alarm rates or require special instrumentation that make them not suitable for protecting end-user systems. In this paper we present Lux0R "Lux 0n discriminant References", a novel, lightweight approach to the detection of malicious JavaScript code. Our method is based on the characterization of JavaScript code through its API references, i.e., functions, constants, objects, methods, keywords as well as attributes natively recognized by a JavaScript Application Programming Interface (API). We exploit machine learning techniques to select a subset of API references that characterize malicious code, and then use them to detect JavaScript malware. The selection algorithm has been thought to be "secure by design" against evasion by mimicry attacks. In this investigation, we focus on a relevant application domain, i.e., the detection of malicious JavaScript code within PDF documents. We show that our technique is able to achieve excellent malware detection accuracy, even on samples exploiting never-before-seen vulnerabilities, i.e., for which there are no examples in training data. Finally, we experimentally assess the robustness of Lux0R against mimicry attacks based on feature addition.