An Evasion Resilient Approach to the Detection of Malicious PDF Files

Maiorca, Davide; Ariu, Davide; Corona, Igino; Giacinto, Giorgio

doi:10.1007/978-3-319-27668-7_5

Malicious PDF les still constitute a serious threat to the systems security. New reader vulnerabilities have been discovered, and research has shown that current state of the art approaches can be easily bypassed by exploiting weaknesses caused by erroneous parsing or incomplete information extraction. In this work, we present a novel machine learning system to the detection of malicious PDF les. We have developed a static approach that leverages on information extracted by both the structure and the content of PDF les, which allows to improve the system robustness against evasion attacks. Experimental results show that our system is able to outperform all publicly available state of the art tools. We also report a signicant improvement of the performances at detecting reverse mimicry attacks, which are able to completely evade systems that only extract information from the PDF le structure. Finally, we claim that, to avoid targeted attacks, a more careful design of machine learning based detectors is needed.