Applicability of Latent Dirichlet Allocation to multi-disk search
Affiliation: 1. College of Computer Science and Technology, Jilin University, Qianjin Street 2699, Changchun 130012, China; 2. Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Qianjin Street 2699, Changchun 130012, China
Abstract: Digital forensics practitioners face a continual increase in the volume of data they must analyze, which exacerbates the problem of finding relevant information in a noisy domain. Current technologies use keyword-based search to isolate relevant documents and minimize false positives with respect to investigative goals. Unfortunately, selecting appropriate keywords is a complex and challenging task. Latent Dirichlet Allocation (LDA) offers a possible way to relax keyword selection by returning topically similar documents. This research compares regular expression search techniques and LDA using the Real Data Corpus (RDC). The RDC, a set of over 2400 disks from real users, is first analyzed to craft effective tests. Three tests are executed, and the results indicate that, while LDA search should not be used as a replacement for regular expression search, it does offer benefits. First, it is able to locate documents when few, if any, of the keywords exist within them. Second, it improves data browsing and deals with keyword ambiguity by segmenting the documents into topics.
Keywords: Latent Dirichlet Allocation; Topic models; Query by document; Data mining; Text mining; Document search
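
The contrast the abstract draws between regular expression search and LDA-based query by document can be illustrated with a small sketch. The abstract does not describe the tooling used in the study, so everything below is an assumption made for illustration: the toy documents, the function names (`regex_search`, `lda_gibbs`, `query_by_document`), the hyperparameters, the use of a basic collapsed Gibbs sampler to fit LDA, and the choice of Hellinger distance to compare topic distributions.

```python
import math
import random
import re


def regex_search(docs, pattern):
    """Return indices of documents that match the regular expression."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, d in enumerate(docs) if rx.search(d)]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def lda_gibbs(token_docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic distributions.

    alpha, beta, n_iter and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in token_docs for w in doc})
    wid = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in token_docs]  # document-topic counts
    nkw = [[0] * V for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                         # total tokens per topic
    z = []                                      # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(token_docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][wid[w]] += 1
            nk[k] += 1
        z.append(zd)

    # Resample each token's topic conditioned on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(token_docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], wid[w]
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed estimate of each document's topic mixture.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(token_docs)
    ]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 means identical)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2)


def query_by_document(theta, query_idx):
    """Rank all other documents by topical closeness to the query document."""
    others = [i for i in range(len(theta)) if i != query_idx]
    return sorted(others, key=lambda i: hellinger(theta[query_idx], theta[i]))


# Hypothetical toy documents, not drawn from the Real Data Corpus.
docs = [
    "wire transfer to offshore bank account",
    "bank account statement and wire fees",
    "funds moved offshore through shell company",
    "soccer match score and league table",
    "league final match ends in penalty shootout",
]
theta = lda_gibbs([tokenize(d) for d in docs], n_topics=2)

print(regex_search(docs, r"\bbank\b"))  # only documents containing the literal keyword
print(query_by_document(theta, 0))      # every other document, topically closest first
```

The regular expression returns only documents that contain the literal keyword, whereas the topic-based ranking orders every document, including those that share no keyword with the query. On a corpus this small the fitted topics are unstable, so the ranking shown here demonstrates the mechanics rather than retrieval quality.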
This article is indexed in ScienceDirect and other databases.