Scalable code clone search for malware analysis |
| |
Affiliation: | 1. School of Information Studies, McGill University, Montreal, QC, Canada; 2. Mission Critical Cyber Security Section, DRDC-Valcartier Research Centre, Quebec, QC, Canada; 3. Concordia Institute for Information Systems Engineering, Concordia University, Montreal, QC, Canada |
| |
Abstract: | Reverse engineering is the primary step in analyzing a piece of malware. After disassembling a malware binary, a reverse engineer must spend extensive effort analyzing the resulting assembly code and then documenting it through comments in the assembly code for future reference. In this paper, we develop an assembly code clone search system called ScalClone, based on our previous work on assembly code clone detection systems. The objective of the system is to identify the code clones of a target malware in a collection of previously analyzed malware binaries. Our new contributions are summarized as follows. First, we introduce two assembly code clone search methods for malware analysis with a high recall rate. Second, our methods allow malware analysts to discover both exact and inexact clones at different token normalization levels. Third, we present a scalable system with a database model to support large-scale assembly code search. Finally, experimental results on real-life malware binaries suggest that our proposed methods can effectively identify assembly code clones under different code mutation scenarios. |
| |
Keywords: | Assembly code clone detection; Malware analysis; Reverse engineering; Software fingerprinting; Software security |
This document is indexed in ScienceDirect and other databases. |
|