Similar literature
20 similar documents found (search time: 31 ms)
1.
Large-scale digital forensic investigations present at least two fundamental challenges. The first is accommodating the computational demands of processing a large amount of data. The second is extracting useful information from the raw data in an automated fashion. Both problems can result in long processing times that seriously hamper an investigation.

In this paper, we discuss a new approach to one of the basic operations that is invariably applied to raw data – hashing. The essential idea is to produce an efficient and scalable hashing scheme that can supplement traditional cryptographic hashing during the initial pass over the raw evidence. The goal is to retain enough information to allow binary data to be queried for similarity at various levels of granularity without any further pre-processing or indexing.

The specific solution we propose, called a multi-resolution similarity hash (or MRS hash), is a generalization of recent work in the area. Its main advantages are robust performance – raw speed comparable to a high-grade block-level crypto hash; scalability – the ability to compare targets that vary in size by orders of magnitude; and space efficiency – typically below 0.5% of the size of the target.
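The abstract does not spell out the MRS hash construction, so the following is only a conceptual sketch of the multi-resolution idea: digest the same data at several block granularities and compare two digests at the resolution appropriate to the targets' sizes. Function names, block sizes, and the use of truncated SHA-256 block hashes are illustrative assumptions, not the paper's scheme.

```python
import hashlib

def multi_resolution_digest(data: bytes, block_sizes=(4096, 64 * 1024, 1024 * 1024)):
    """Illustrative only: hash the data at several block granularities so a
    later comparison can pick the level that suits the size ratio of the
    two targets."""
    digest = {}
    for size in block_sizes:
        level = []
        for offset in range(0, len(data), size):
            block = data[offset:offset + size]
            # keep only a short prefix of each block hash to stay space-efficient
            level.append(hashlib.sha256(block).digest()[:8])
        digest[size] = level
    return digest

def similarity(d1, d2, size):
    """Fraction of level-`size` block hashes shared by the two digests."""
    a, b = set(d1.get(size, [])), set(d2.get(size, []))
    return len(a & b) / max(len(a | b), 1)
```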

2.
Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners look for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions, which detect data objects with bytewise identical representations, with the capability to find objects with bytewise similar representations.

Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching functions are still in their early development stages and their evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real-world data, or controlled (pseudo-random) data to perform automated evaluation.

This work's contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance.
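For reference, the quantity the paper approximates is the classic longest common substring. Below is the exact dynamic-programming formulation in Python (O(n·m) time, O(m) space); the paper's contribution is a cheaper approximation of this value, which is not reproduced here.

```python
def longest_common_substring(a: bytes, b: bytes) -> int:
    """Exact O(len(a)*len(b)) dynamic program for the longest common
    substring length; the paper uses an efficient approximation instead."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

# e.g. longest_common_substring(b"forensics", b"forensic tools") -> 8 ("forensic")
```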

3.
4.
Investigating seized devices within digital forensics becomes more and more difficult due to the increasing amount of data. Hence, a common procedure uses automated file identification, which reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also helpful to detect similar data by applying approximate matching.

Let x denote the number of digests in a database; the lookup for a single similarity digest then has complexity O(x). In other words, the digest has to be compared against all digests in the database. In contrast, cryptographic hash values are stored in binary trees or hash tables, so the lookup complexity for a single digest is O(log2(x)) or O(1), respectively.

In this paper we present and evaluate a concept to extend existing approximate matching algorithms that reduces the lookup complexity from O(x) to O(1). Instead of using multiple small Bloom filters (the common procedure), we demonstrate that a single, huge Bloom filter has far better performance. Our evaluation shows that current approximate matching algorithms are too slow (e.g., over 21 min to compare 4457 digests of a common file corpus against each other), while the improved version solves this challenge within seconds. Studying the precision and recall rates shows that our approach works as reliably as the original implementations. The benefit comes at a cost in accuracy of attribution: the comparison is now a file-against-set comparison, so it is not possible to see which file in the database was matched.
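A minimal Bloom filter sketch illustrating the O(1) membership idea: inserting features of known files into one large filter and probing a constant number of bit positions per lookup. The filter size, number of hash functions, and the trick of slicing one SHA-256 digest into positions are illustrative assumptions, not the paper's implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.m = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item: bytes):
        # derive k bit positions from a single SHA-256 digest (illustrative only)
        h = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(h[4 * i:4 * i + 4], "big") % self.m

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

# A file-against-set lookup is a constant number of bit probes,
# independent of how many digests the set holds.
bf = BloomFilter(size_bits=2 ** 24, num_hashes=5)
bf.add(b"feature-from-known-file")
print(b"feature-from-known-file" in bf)   # True
print(b"never-seen-feature" in bf)        # almost certainly False
```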

5.
Digital Investigation, 2014, 11(2): 81–89
Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners look for fast methods to analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions, which detect data objects with bytewise identical representations, with the capability to find objects with bytewise similar representations.

Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching functions are still in their early development stages and have been evaluated in a somewhat ad hoc manner. Recently, the FRASH testing framework has been proposed as a vehicle for developing a set of standardized tests for approximate matching algorithms; the aim is to provide a useful guide for understanding and comparing the absolute and relative performance of different algorithms.

The contribution of this work is twofold: a) to expand FRASH with automated tests for quantifying approximate matching algorithm behavior with respect to precision and recall; and b) to present a case study of two algorithms already in use – sdhash and ssdeep.
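As a reminder of what the added FRASH tests measure, here is a minimal sketch of precision and recall computed over sets of reported versus ground-truth matching file pairs. It is not FRASH's test code; the pair representation and example values are assumptions for illustration.

```python
def precision_recall(reported_matches, true_matches):
    """Both arguments are sets of (file_a, file_b) pairs."""
    reported = set(reported_matches)
    truth = set(true_matches)
    tp = len(reported & truth)
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

# Example: the tool reports 3 pairs, 2 of which are genuine, and misses 1.
p, r = precision_recall({("a", "b"), ("a", "c"), ("b", "d")},
                        {("a", "b"), ("a", "c"), ("c", "d")})
print(p, r)   # 0.666..., 0.666...
```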

6.
Automated input identification is a very challenging, but also important, task. Within computer forensics it reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Word document), and fragments (e.g., network packets). Over recent years a number of different similarity hashing algorithms have been published. However, in the absence of a definition and a test framework, it is hardly possible to evaluate and compare these approaches in order to establish them in the community.

This paper aims at providing an assessment methodology and a sample implementation called FRASH: a framework to test algorithms of similarity hashing. First, we describe common use cases of a similarity hashing algorithm to motivate our two test classes: efficiency, and sensitivity & robustness. Next, our open and freely available framework is briefly described. Finally, we apply FRASH to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses.

7.
The Daubert decision motivates attempts to establish error rates for digital forensic tools. Many scientific procedures have been devised that can answer simple questions. For example, does a soil sample contain component X? A procedure can be followed that gives an answer with known rates of error. Usually the error rate of a process that tries to detect something is associated with a random component of some measurement. Typically there are two types of error: type I, also called a false positive (detecting something when it is not really there), and type II, also called a false negative (missing something when it really is there). At first thought, an error rate for a forensic acquisition tool or a write-blocking tool is a simple concept. An obvious candidate for the error rate of an acquisition is k/n, where n is the total number of bits acquired and k is the number of incorrectly acquired bits. However, the kinds of errors in the soil test and in digital acquisition are fundamentally different. The errors in the soil test can be modeled with a random distribution that can be treated statistically, but the errors that occur in a digital acquisition are systematic and triggered by specific conditions. The purpose of this paper is not to define error rates for forensic tools, but to identify some of the basic issues in order to stimulate discussion and further work on the topic.
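A small worked illustration of the k/n notion discussed above: counting the bits that differ between an acquired image and a reference copy. This is only a demonstration of the arithmetic, not a proposed validation procedure, and it sidesteps the paper's point that acquisition errors are systematic rather than random.

```python
def bit_error_rate(acquired: bytes, reference: bytes) -> float:
    """k/n, where n is the total number of bits and k the number of
    incorrectly acquired bits (illustrative; assumes equal lengths)."""
    assert len(acquired) == len(reference)
    k = sum(bin(a ^ b).count("1") for a, b in zip(acquired, reference))
    n = 8 * len(reference)
    return k / n

# A single flipped bit in a 1 MiB image:
ref = bytes(1024 * 1024)
acq = bytearray(ref)
acq[0] = 0x01
print(bit_error_rate(bytes(acq), ref))   # ~1.19e-07
```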

8.
This practitioner paper provides an introduction to investigating IPv6 networks and systems. IPv6 addressing, packet structure, and supporting protocols are explained. Collecting information from IPv6 registries and databases such as WHOIS and DNS is demonstrated. Basic concepts and methods relevant for digital forensic investigators are highlighted, including the forensic analysis of IPv6-enabled systems. Enabling IPv6 capability in a forensics lab is shown, including IPv6 connectivity and the use of IPv6-compatible tools. Collection and analysis of live network evidence from IPv6 networks is discussed, including investigation of remote IPv6 nodes and promiscuous capture of IPv6 traffic.
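A small sketch of the kind of IPv6 groundwork described above, using only the Python standard library: expanding an address, deriving its reverse-DNS pointer for PTR lookups, and resolving AAAA records. The host name is a hypothetical placeholder and the lookup will only succeed against a resolvable name.

```python
import ipaddress
import socket

addr = ipaddress.ip_address("2001:db8::1")   # documentation prefix, illustrative
print(addr.exploded)          # 2001:0db8:0000:0000:0000:0000:0000:0001
print(addr.reverse_pointer)   # the ip6.arpa name used for PTR lookups

# Resolve AAAA records for a (hypothetical) host name
for family, _, _, _, sockaddr in socket.getaddrinfo(
        "host.example.org", None, family=socket.AF_INET6):
    print(sockaddr[0])
```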

9.
Current digital forensic text string search tools use matching and/or indexing algorithms to search digital evidence at the physical level to locate specific text strings. They are designed to achieve 100% query recall (i.e., find all instances of the text strings). Given the nature of the data set, this leads to an extremely high incidence of hits that are not relevant to investigative objectives. Although Internet search engines suffer similarly, they employ ranking algorithms to present the search results in a more effective and efficient manner from the user's perspective. Current digital forensic text string search tools fail to group and/or order search hits in a manner that appreciably improves the investigator's ability to get to the relevant hits first (or at least more quickly). This research proposes and empirically tests the feasibility and utility of post-retrieval clustering of digital forensic text string search results – specifically by using Kohonen Self-Organizing Maps, a self-organizing neural network approach.

This paper is presented as a work in progress. A working tool has been developed and experimentation has begun. Findings regarding the feasibility and utility of the proposed approach will be presented at DFRWS 2007, as well as suggestions for follow-on research.
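To make the self-organizing map idea concrete, here is a minimal NumPy SOM sketch that maps search-hit feature vectors onto a 2-D grid so that similar hits land in nearby cells. It is not the authors' tool; the grid size, learning-rate schedule, and neighbourhood function are illustrative assumptions.

```python
import numpy as np

def train_som(vectors, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0, seed=0):
    """Minimal Kohonen SOM; `vectors` is an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    weights = rng.random((rows, cols, vectors.shape[1]))
    coords = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                  indexing="ij"), axis=-1)
    n_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in vectors:
            lr = lr0 * (1 - step / n_steps)
            sigma = sigma0 * (1 - step / n_steps) + 1e-3
            # best-matching unit: grid cell whose weight vector is closest to v
            bmu = np.unravel_index(
                np.argmin(((weights - v) ** 2).sum(axis=2)), (rows, cols))
            # neighbourhood-weighted update pulls nearby cells toward v
            dist2 = ((coords - np.array(bmu)) ** 2).sum(axis=2)
            h = np.exp(-dist2 / (2 * sigma ** 2))[..., None]
            weights += lr * h * (v - weights)
            step += 1
    return weights

# Search hits with similar feature vectors end up in nearby grid cells,
# which is what would let an investigator browse related hits together.
```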

10.
Current digital forensics methods capture, preserve, and analyze digital evidence in general-purpose electronic containers (typically, plain files) with no dedicated support to help establish that the evidence has been properly handled. Auditing of a digital investigation, from identification and seizure of evidence through duplication and investigation, is essentially ad hoc, recorded in separate log files or in an investigator's case notebook. Auditing performed in this fashion is bound to be incomplete, because different tools provide widely disparate amounts of auditing information – including none at all – and there is ample room for human error. The latter is a particularly pressing concern given the fast growth in the size of forensic targets.

Recently, there has been a serious community effort to develop an open standard for specialized digital evidence containers (DECs). A DEC differs from a general-purpose container in that, in addition to the actual evidence, it bundles arbitrary metadata associated with it, such as logs and notes, and provides the basic means to detect evidence tampering through digital signatures. Current approaches consist of defining a container format and providing a specialized library that can be used to manipulate it. While a big step in the right direction, this approach has some non-trivial shortcomings – it requires the retooling of existing forensic software and thereby limits the number of tools available to the investigator. More importantly, however, it does not provide a complete solution, since it only records snapshots of the state of the DEC without being able to provide a trusted log of all data operations actually performed on the evidence. Without a trusted log, the question of whether a tool worked exactly as advertised cannot be answered with certainty, which opens the door to challenges (both legitimate and frivolous) of the results.

In this paper, we propose a complementary mechanism, called the Forensic Discovery Auditing Module (FDAM), aimed at closing this loophole in the discovery process. FDAM can be thought of as a ‘clean-room’ environment for the manipulation of digital evidence, where evidence from containers is placed for controlled manipulation. It functions as an operating system component which monitors and logs all access to the evidence and enforces policy restrictions. This allows the immediate, safe, and verifiable use of any tool deemed necessary by the examiner. In addition, the module can provide transparent support for multiple DEC formats, thereby greatly simplifying the adoption of open standards.

11.
The Android operating system had the highest market share in 2014, making it the most widely used mobile operating system in the world. This fact makes Android users the biggest target group for malware developers. Trend analyses show a large increase in mobile malware targeting the Android platform. Android's security mechanism is based on an instrument that informs users about which permissions the application needs to be granted before installation. This permission system provides an overview of the application and may help raise awareness about the risks. However, we do not have enough information to conclude that standard users read these permissions or that digital investigators understand their implications. Digital investigators need to be on the alert for the presence of malware when examining Android devices, and can benefit from supporting tools that help them understand the capabilities of such malicious code. This paper presents a permission-based Android malware detection system, APK Auditor, that uses static analysis to characterize and classify Android applications as benign or malicious. APK Auditor consists of three components: (1) a signature database to store extracted information about applications and analysis results, (2) an Android client which is used by end-users to submit application analysis requests, and (3) a central server responsible for communicating with both the signature database and the smartphone client and for managing the whole analysis process. To test system performance, 8762 applications in total – 1853 benign applications from Google's Play Store and 6909 malicious applications from different sources – were collected and analyzed by the system. The results show that APK Auditor is able to detect most well-known malware and flags potentially malicious applications with approximately 88% accuracy and 0.925 specificity.
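The abstract does not name the classifier APK Auditor uses, so the following is only a hedged sketch of permission-based classification with scikit-learn: binary permission vectors as features, a logistic regression model, and accuracy/specificity computed on a held-out split. The synthetic data and label rule are placeholders, not the APK Auditor data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Placeholder data: each row is a binary permission vector for one APK,
# y = 1 for malicious, 0 for benign (the real features come from static
# analysis of the manifest; the classifier choice here is illustrative).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 50))
y = (X[:, :5].sum(axis=1) > 2).astype(int)   # synthetic label rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
specificity = tn / (tn + fp)   # the metric quoted as 0.925 in the abstract
print(accuracy, specificity)
```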

12.
The National Software Reference Library (NSRL) is an essential data source for forensic investigators, providing in its Reference Data Set (RDS) a set of hash values of known software. However, the NSRL RDS had not previously been tested against a broad spectrum of real-world data. The current work did this using a corpus of 36 million files on 2337 drives from 21 countries. These experiments answered a number of important questions about the NSRL RDS, including what fraction of files of different types it recognizes. NSRL coverage by vendor/product was also tested; 51% of the vendor/product names in our corpus had no hash values at all in the NSRL. It is shown that coverage or "recall" of the NSRL can be improved with additions from our corpus, such as frequently occurring files and files whose paths were found previously in the NSRL with a different hash value. This provided 937,570 new hash values which should be uncontroversial additions to the NSRL. Several additional tests investigated the accuracy of the NSRL data. Experiments testing the hash values found no evidence of errors. Tests of file sizes showed them to be consistent except in a few cases. On the other hand, the product types assigned by the NSRL can be disputed, and it failed to recognize any of a sample of virus-infected files. The file names provided by the NSRL had numerous discrepancies with the file names found in the corpus, so the discrepancies were categorized; among other things, there were apparent spelling and punctuation errors. Some file names suggest that NSRL hash values were computed on deleted files, which is not a safe practice. The tests had the secondary benefit of helping identify occasional errors in the metadata obtained from drive imaging of deleted files in our corpus. This research has provided much data useful in improving the NSRL and the forensic tools that depend upon it. It also provides a general methodology and software for testing hash sets against corpora.
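For context, known-file filtering against an RDS-style hash set typically looks like the sketch below: hash every file in a mounted image and check membership in the known-hash set. The file paths and the plain one-hash-per-line input file are placeholders; the real RDS ships as structured records rather than a bare list.

```python
import hashlib
from pathlib import Path

def sha1_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha1()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()

# Hypothetical input: one known SHA-1 per line, extracted from the NSRL RDS.
known = set(Path("nsrl_sha1.txt").read_text().split())

for path in Path("/evidence/mounted_image").rglob("*"):   # hypothetical mount point
    if path.is_file():
        status = "known" if sha1_of_file(path) in known else "unknown"
        print(status, path)
```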

13.
As electronic documents become more important and valuable in the modern era, attempts are invariably made to gain undue advantage by tampering with them. Tampering with the modification, access, and creation date and time stamps (MAC DTS) of digital documents poses a great threat and proves to be a major handicap in digital forensic investigation. Authentic date and time stamps (ADTS) can provide crucial evidence in linking crime to criminal in cases of Computer Fraud and Cyber Crimes (CFCC) through reliable time-lining of digital evidence. But the ease with which the MAC DTS of stored digital documents can be changed raises serious questions about the integrity and admissibility of digital evidence, potentially leading to rejection of acquired digital evidence in a court of law. The MAC DTS procedures of popular operating systems are inherently flawed and were created only for the sake of convenience, not necessarily with security and digital forensic aspects in mind. This paper explores these issues in the context of the Ext2 file system and also proposes one solution for the scenario where systems have pre-installed plug-ins in the form of Loadable Kernel Modules, which provide the capability to preserve ADTS.
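As a reminder of what MAC DTS values look like in practice, the sketch below reads them with os.stat. Note that the third timestamp's semantics differ by platform: inode/metadata change time on POSIX systems, creation time on Windows. The path is a hypothetical example.

```python
import os
from datetime import datetime, timezone

def mac_times(path: str):
    st = os.stat(path)
    to_dt = lambda ts: datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "modified": to_dt(st.st_mtime),   # content last written
        "accessed": to_dt(st.st_atime),   # last read (often mount-option dependent)
        "changed":  to_dt(st.st_ctime),   # inode change on POSIX, creation on Windows
    }

print(mac_times("/etc/hostname"))   # hypothetical path
```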

14.
The field of digital forensics relies heavily on the software it uses to acquire and investigate forms of digital evidence. Without these tools, analysis of digital devices would often not be possible. Despite such levels of reliance, techniques for validating digital forensic software are sparse and research is limited in both volume and depth. As practitioners pursue the goal of producing robust evidence, they face the onerous task of ensuring both the accuracy of their tools and their effective use. While tool errors are one issue, establishing a tool's limitations also presents an investigatory challenge, leading to the potential for practitioner error and ultimately a grey area of accountability. This article debates the problems surrounding digital forensic tool usage, evidential reliability, and validation.

15.

This paper examines how new technologies are employed by the Brazilian Chamber of Deputies to stimulate experiences of digital engagement. It also evaluates how new technologies are put into practice by the institution, considering their potentialities and limitations in mediating the relationship between the parliament and the citizens. The analysis is anchored in Polsby's concepts of arena parliaments and transformative parliaments, in order to evaluate for which of these models the engagement tools have greater potential. The study concludes that the use of digital technologies by the Brazilian Parliament is very diverse, with a variety of tools that allow for the interaction and engagement of citizens, although these tools have the greatest potential for the arena parliament model.

16.
While reflective spectrophotometry is an established method for measuring macroscopic hair colour, it can be cumbersome to use on a large number of individuals, and not all reflective spectrophotometry instruments are easily portable. This study investigates the use of digital photographs to measure hair colour and compares this approach to reflective spectrophotometry. An understanding of the accuracy of colour determination by these methods is relevant when undertaking specific investigations, such as those on the genetics of hair colour. Measurements of hair colour may also be of assistance in cases where a photograph is the only evidence of hair colour available (e.g., surveillance). Using the CIE L*a*b* colour space, the hair colour of 134 individuals of European ancestry was measured both by reflective spectrophotometry and by digital image analysis (in V++). A moderate correlation was found along all three colour axes, with Pearson correlation coefficients of 0.625, 0.593 and 0.513 for L*, a* and b* respectively (p-values = 0.000), with means being significantly overestimated by digital image analysis for all three colour components (by an average of 33.42, 3.38 and 8.00 for L*, a* and b* respectively). When digital image data were used to group individuals into clusters previously determined by reflective spectrophotometric analysis using discriminant analysis, individuals were classified into the correct clusters 85.8% of the time when there were two clusters. The percentage of cases correctly classified decreases as the number of clusters increases. It is concluded that, although more convenient, hair colour measurement from digital images has limited use in situations requiring accurate and consistent measurements.
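The correlation and mean-offset figures quoted above can be reproduced with standard statistics routines; a minimal sketch using scipy follows. The arrays are placeholders standing in for per-subject L* values from the spectrophotometer and from image analysis, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder arrays: one L* value per subject from each method.
spectro_L = np.array([38.2, 41.0, 29.5, 55.3, 47.8])
image_L   = np.array([70.1, 74.9, 62.3, 88.0, 80.2])

r, p = pearsonr(spectro_L, image_L)
mean_offset = float(np.mean(image_L - spectro_L))
print(f"r = {r:.3f}, p = {p:.4f}; image analysis overestimates L* by {mean_offset:.2f} on average")
```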

17.
Digital Investigation, 2014, 11(4): 314–322
This research comparatively evaluates four competing clustering algorithms for thematically clustering digital forensic text string search output. It does so in a more realistic context, in terms of data size and heterogeneity, than has been studied in the past. We used physical-level text string search output consisting of over two million search hits found in nearly 50,000 allocated files and unallocated blocks. Holding the data set constant, we comparatively evaluated k-Means, Kohonen SOM, Latent Dirichlet Allocation (LDA) followed by k-Means, and LDA followed by SOM. This enables true cross-algorithm evaluation, whereas past studies evaluated individual algorithms using unique, non-reproducible datasets. Our research shows that LDA + k-Means with a linear, centroid-based user navigation procedure produces the best results. The winning approach increased information retrieval effectiveness from the baseline random-walk absolute precision rate of 0.04 to an average precision rate of 0.67. We also explored a variety of algorithms for user navigation of search hit results, finding that the performance of k-Means clustering can be greatly improved with a non-linear, non-centroid-based cluster and document navigation procedure. This has potential implications for digital forensic tools and their use, particularly given the popularity and speed of k-Means clustering.
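A minimal scikit-learn sketch of the winning LDA-then-k-Means configuration: vectorize the text around each search hit, reduce it to topic proportions with LDA, then cluster the topic vectors. The toy documents, topic count, and cluster count are placeholders; the study's corpus is physical-level search output, not these strings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

# Placeholder "documents": in the study these are allocated files and
# unallocated blocks surrounding the text string search hits.
docs = [
    "invoice payment wire transfer account",
    "meeting schedule agenda minutes",
    "wire transfer offshore account payment",
    "holiday schedule agenda travel",
]

counts = CountVectorizer().fit_transform(docs)
topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(counts)
labels = KMeans(n_clusters=2, random_state=0, n_init=10).fit_predict(topics)
print(labels)   # search hits grouped by theme, e.g. [0 1 0 1]
```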

18.
Document forensics remains an important field of digital forensics. To date, existing methods have focused on the last saved version of a document file stored on the PC; the drawback of this approach is that it provides no indication as to how the contents were modified. This paper provides a novel method for document forensics based on tracking the revision history of a Microsoft Word file. The proposed method concentrates on the TMP file created when the author saves the file and the ASD file created periodically by Microsoft Word during editing. A process whereby revision history lists are generated based on the metadata of the Word, TMP, and ASD files is presented. Furthermore, we describe a technique developed to link the revision history lists based on similarity. These outcomes can provide considerable assistance to a forensic investigator trying to establish the extent to which the contents of a document file have been changed and when the file was created, modified, deleted, and copied.

19.
Considering today's challenges in digital forensics, distributed computing is a necessity for password cracking. If we limit the selection of password-cracking tools strictly to open-source software, the hashcat tool unambiguously wins in speed, repertory of supported hash formats, updates, and community support. Though hashcat itself is by design a single-machine solution, its interface makes it possible to use the tool as a base construction block of a larger distributed system. Creating a “distributed hashcat” which supports the maximum of hashcat's original features requires a smart controller that employs different distribution strategies in different cases. In this paper, we show how to use the BOINC framework to control a network of hashcat-equipped nodes and provide a working solution for performing different cracking attacks. We also provide experimental results of multiple cracking tasks to demonstrate the applicability of our approach. Last but not least, we compare our solution to an existing hashcat-based distributed tool, Hashtopolis.
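One way a controller can split work among hashcat nodes is by dividing the attack's keyspace with hashcat's --keyspace, --skip, and --limit options; the sketch below illustrates that partitioning. The hash file, wordlist, hash mode, and node count are placeholders, and the real system described in the paper uses BOINC for scheduling and result collection rather than this simple loop.

```python
import subprocess

HASH_FILE, WORDLIST, NODES = "hashes.txt", "wordlist.txt", 4   # placeholders

# Ask hashcat for the total keyspace of this attack (straight/wordlist attack, MD5 mode).
keyspace = int(subprocess.run(
    ["hashcat", "-m", "0", "-a", "0", "--keyspace", WORDLIST],
    capture_output=True, text=True, check=True).stdout.strip())

chunk = -(-keyspace // NODES)   # ceiling division
for node in range(NODES):
    skip = node * chunk
    limit = min(chunk, keyspace - skip)
    # Each node would receive and run a command like this one.
    print(["hashcat", "-m", "0", "-a", "0", HASH_FILE, WORDLIST,
           "--skip", str(skip), "--limit", str(limit)])
```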

20.
This paper defines the attributes that are required to provide ‘good’ provenance. The various sources that may be used to verify the provenance of digital information during the investigation of a Microsoft® Windows®-based computer system are presented. This paper shows how provenance can be corroborated by artefacts which indicate how a computer system was connected to the outside world and the capabilities that it provided to a user. Finally, a number of commonly used forensic tools are considered, showing how they represent the provenance of information to an investigator.
