Scripting DNA: Identifying the JavaScript programmer
Affiliation: 1. Delft University of Technology, Software Engineering Research Group, Mekelweg 5, 2628 CD, Delft, Netherlands; 2. Netherlands Forensic Institute, Knowledge and Expertise Centre for Intelligent Data Analysis, Laan van Ypenburg 6, 2497 GB, The Hague, Netherlands
Abstract: Authorship attribution is required in diverse applications, ranging from older texts of historical interest (Shakespeare's works, the Federalist Papers) to recent novels studied for linguistic research or simply out of curiosity (Robert Galbraith, alias J.K. Rowling). Extensive research on this problem has produced effective general-purpose methods. The original author also needs to be discovered for other types of text. In particular, we are interested in methods to identify JavaScript programmers, which can be used to reveal the offender who produced malicious software on a website. So far, mainly general-purpose methods from natural language authorship attribution have been applied to this little-studied problem. Moreover, no suitable reference dataset is available for evaluating and developing methods in a supervised machine learning approach. In this work we first obtain a reference dataset of substantial size and quality. We then propose extracting structural features from the Abstract Syntax Tree (AST) to describe an author's coding style. Our experiments show that these specifically designed features improve the attribution of scripting code to programmers, especially when combined with character n-gram features.
Keywords: Authorship identification; Authorship verification; Source code; JavaScript; Abstract Syntax Tree; Syntactic features
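
The character n-gram features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`char_ngrams`, `cosine`, `attribute`), the choice of n = 3, the cosine-similarity comparison, and the two toy code samples are all assumptions made for illustration. The AST-based structural features would additionally require a JavaScript parser and are not shown here.

```python
from collections import Counter
from math import sqrt


def char_ngrams(source: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a source string."""
    return Counter(source[i:i + n] for i in range(len(source) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(unknown: str, known: dict[str, str], n: int = 3) -> str:
    """Return the candidate author whose sample is most similar to `unknown`."""
    profile = char_ngrams(unknown, n)
    return max(known, key=lambda author: cosine(profile, char_ngrams(known[author], n)))


# Hypothetical toy samples: two invented "authors" with different
# spacing and function-declaration habits.
samples = {
    "author_a": "function add(a,b){return a+b;}\nfunction sub(a,b){return a-b;}",
    "author_b": "var add = function (a, b) {\n    return a + b;\n};",
}
unknown = "function mul(a,b){return a*b;}"
print(attribute(unknown, samples))
```

Character n-grams capture surface habits such as spacing and brace placement without parsing the code, which is one plausible reason they complement tree-based features; a real evaluation would use many samples per author and a trained classifier rather than a single nearest-profile comparison.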
This article is indexed in ScienceDirect and other databases.