Similar Documents (20 results)
1.
Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners look for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement cryptographic hash functions, which detect data objects with bytewise identical representations, with the capability to find objects with bytewise similar representations.

Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching algorithms are still in their early development stages, and evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real-world data, or controlled (pseudo-random) data to perform automated evaluation.

This work's contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance.
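For reference, the ground-truth measure can be specified with the classic dynamic-programming longest common substring computation sketched below. This O(n·m) version is far too slow for forensic-scale targets, which is exactly why the paper introduces an approximation; treat it as a definition, not as the authors' algorithm.

```python
def longest_common_substring(a: bytes, b: bytes) -> int:
    """Length of the longest common substring of two byte strings.

    Classic O(len(a) * len(b)) dynamic program: prev[j] holds the
    length of the common suffix of a[:i] and b[:j].
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best

print(longest_common_substring(b"forensicsABCDE", b"XYZforensics"))  # -> 9
```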

2.
Digital Investigation, 2014, 11(2): 81-89
Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners look for fast methods to analyze the increasing amounts of data in forensic investigations. The essential idea is to complement cryptographic hash functions, which detect data objects with bytewise identical representations, with the capability to find objects with bytewise similar representations.

Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching algorithms are still in their early development stages and have been evaluated in a somewhat ad hoc manner. Recently, the FRASH testing framework was proposed as a vehicle for developing a set of standardized tests for approximate matching algorithms; the aim is to provide a useful guide for understanding and comparing the absolute and relative performance of different algorithms.

The contribution of this work is twofold: a) we expand FRASH with automated tests for quantifying approximate matching algorithm behavior with respect to precision and recall; and b) we present a case study of two algorithms already in use: sdhash and ssdeep.
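As a reminder of what the new test class quantifies, a minimal precision/recall computation over sets of similarity matches might look like the sketch below; the match pairs are hypothetical placeholders, not data from the paper.

```python
def precision_recall(reported: set, relevant: set) -> tuple:
    """Precision = fraction of reported matches that are true matches;
    recall = fraction of true matches that were actually reported."""
    true_positives = len(reported & relevant)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(relevant) if relevant else 1.0
    return precision, recall

# Hypothetical example: pairs of (query_file, matched_file)
reported = {("a", "b"), ("a", "c"), ("d", "e")}
relevant = {("a", "b"), ("d", "e"), ("f", "g")}
print(precision_recall(reported, relevant))  # (0.666..., 0.666...)
```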

3.
The fast growth of the average size of digital forensic targets demands new automated means to quickly, accurately and reliably correlate digital artifacts. Such tools need to offer more flexibility than the routine known-file filtering based on cryptographic hashes. Currently, there are two tools for which NIST has produced reference hash sets: ssdeep and sdhash. The former provides a fixed-size fuzzy hash based on random polynomials, whereas the latter produces a variable-length similarity digest based on statistically identified features packed into Bloom filters.

This study provides a baseline evaluation of the capabilities of these tools, both in a controlled environment and on real-world data. The results show that the similarity digest approach significantly outperforms the fuzzy hash in terms of recall and precision in all tested scenarios, and demonstrates robust and scalable behavior.
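To illustrate how such tools are driven in practice, the snippet below scores two files with the python-ssdeep bindings (an assumption: it requires the ssdeep package and the underlying libfuzzy library to be installed; the file names are placeholders). sdhash is typically invoked as an external command-line tool, so only ssdeep is shown.

```python
import ssdeep  # pip install ssdeep; wraps the libfuzzy C library

def ssdeep_similarity(path_a: str, path_b: str) -> int:
    """Return ssdeep's 0-100 similarity score for two files."""
    with open(path_a, "rb") as f:
        h_a = ssdeep.hash(f.read())
    with open(path_b, "rb") as f:
        h_b = ssdeep.hash(f.read())
    return ssdeep.compare(h_a, h_b)  # 0 = no match, 100 = near-identical

print(ssdeep_similarity("report_v1.doc", "report_v2.doc"))
```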

4.
Automated input identification is a very challenging but important task. Within computer forensics it reduces the amount of data an investigator has to inspect by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Word document), and fragments (e.g., network packets). Over recent years several different similarity hashing algorithms have been published. However, in the absence of a definition and a test framework, it is hardly possible to evaluate and compare these approaches and establish them in the community.

This paper aims at providing an assessment methodology and a sample implementation called FRASH: a framework to test algorithms of similarity hashing. First, we describe common use cases of a similarity hashing algorithm to motivate our two test classes: efficiency, and sensitivity & robustness. Next, our open and freely available framework is briefly described. Finally, we apply FRASH to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses.
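The efficiency test class reduces to measurements such as runtime throughput and digest compression ratio. A minimal, framework-independent sketch of such a measurement (not FRASH's actual code; hash_func is any bytes-to-digest similarity hash) is:

```python
import time

def efficiency_stats(hash_func, data: bytes) -> dict:
    """Measure throughput and digest-to-input size ratio for a
    similarity hashing function: two of the metrics an efficiency
    test class would report."""
    start = time.perf_counter()
    digest = hash_func(data)
    elapsed = time.perf_counter() - start
    return {
        "throughput_mb_s": len(data) / (1024 * 1024) / elapsed,
        "digest_ratio": len(digest) / len(data),
    }

# Hypothetical use with any similarity hash and a sample target:
# print(efficiency_stats(my_similarity_hash, open("target.bin", "rb").read()))
```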

5.
Large-scale digital forensic investigations present at least two fundamental challenges. The first is accommodating the computational needs of processing a large amount of data. The second is extracting useful information from the raw data in an automated fashion. Both problems can result in long processing times that seriously hamper an investigation.

In this paper, we discuss a new approach to one of the basic operations that is invariably applied to raw data: hashing. The essential idea is to produce an efficient and scalable hashing scheme that can supplement traditional cryptographic hashing during the initial pass over the raw evidence. The goal is to retain enough information to allow binary data to be queried for similarity at various levels of granularity without any further pre-processing or indexing.

The specific solution we propose, called a multi-resolution similarity hash (or MRS hash), is a generalization of recent work in the area. Its main advantages are robust performance (raw speed comparable to a high-grade block-level cryptographic hash), scalability (the ability to compare targets that vary in size by orders of magnitude), and space efficiency (typically below 0.5% of the size of the target).
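The sketch below conveys the general multi-resolution idea (hashing the same data at several block granularities so that targets of very different sizes still share a comparison level); it is an illustration under simplified assumptions, not the actual MRS hash construction.

```python
import hashlib

RESOLUTIONS = (4096, 65536, 1048576)  # 4 KiB, 64 KiB, 1 MiB block sizes

def multi_resolution_hashes(data: bytes) -> dict:
    """Hash the input at several block granularities. Two targets are
    compared at a resolution both provide, so a multi-gigabyte image
    and a small file can still be related at the coarser level."""
    levels = {}
    for block in RESOLUTIONS:
        levels[block] = [
            hashlib.sha1(data[off:off + block]).hexdigest()
            for off in range(0, len(data), block)
        ]
    return levels

def similarity(a: dict, b: dict, block: int) -> float:
    """Jaccard overlap of the block hashes at one resolution."""
    sa, sb = set(a[block]), set(b[block])
    return len(sa & sb) / max(len(sa | sb), 1)
```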

6.
Forensic examination of Windows Mobile devices and devices running its successor, Windows Phone 7, remains relevant for the digital forensic community. On these devices, the file pim.vol is a Microsoft Embedded Database (EDB) volume that contains information related to contacts, appointments, call history, speed-dial settings and tasks. Current literature shows that analysis of the pim.vol file is less than optimal. We succeeded in reverse-engineering significant parts of the EDB volume format, and this article presents our current understanding of the format. In addition, we provide a mapping from internal column identifiers to human-readable application-level property names for the pim.vol database. We implemented a parser and compared our results to the traditional approach using an emulator and the API provided by the Windows CE operating system. We were able to recover additional databases, additional properties per record, and unallocated records.

7.
The National Software Reference Library (NSRL) is an essential data source for forensic investigators, providing in its Reference Data Set (RDS) a set of hash values of known software. However, the NSRL RDS has not previously been tested against a broad spectrum of real-world data. The current work did this using a corpus of 36 million files on 2337 drives from 21 countries. These experiments answered a number of important questions about the NSRL RDS, including what fraction of files it recognizes of different types. NSRL coverage by vendor/product was also tested: 51% of the vendor/product names in our corpus had no hash values at all in NSRL. We show that coverage, or "recall", of the NSRL can be improved with additions from our corpus, such as frequently occurring files and files whose paths were found previously in NSRL with a different hash value. This provided 937,570 new hash values which should be uncontroversial additions to NSRL.

Several additional tests investigated the accuracy of the NSRL data. Experiments testing the hash values found no evidence of errors. Tests of file sizes showed them to be consistent except in a few cases. On the other hand, the product types assigned by NSRL can be disputed, and NSRL failed to recognize any of a sample of virus-infected files. The file names provided by NSRL had numerous discrepancies with the file names found in the corpus, so the discrepancies were categorized; among other things, there were apparent spelling and punctuation errors. Some file names suggest that NSRL hash values were computed on deleted files, which is not a safe practice. The tests had the secondary benefit of helping identify occasional errors in the metadata obtained from drive imaging of deleted files in our corpus.

This research has provided much data useful in improving NSRL and the forensic tools that depend upon it. It also provides a general methodology and software for testing hash sets against corpora.
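The basic known-file filtering workflow that such tests exercise can be sketched as follows. The NSRLFile.txt column layout varies between RDS releases, so the parsing below (SHA-1 as the first CSV column, matching the legacy layout) is an assumption to adjust for the release actually in use.

```python
import csv
import hashlib

def load_rds_sha1(path: str) -> set:
    """Load the SHA-1 column of an NSRL RDS NSRLFile.txt.
    Assumes the legacy CSV layout with SHA-1 as the first column."""
    with open(path, newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return {row[0].upper() for row in reader}

def is_known(path: str, known: set) -> bool:
    """True if the file's SHA-1 appears in the reference hash set."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest().upper() in known
```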

8.
File carving is the process of reassembling files from disk fragments based on file content in the absence of file system metadata. By leveraging file header and footer pairs, traditional file carving mainly focuses on document and image files such as PDF and JPEG. With the vast amount of malware code appearing in the wild daily, recovery of binary executable files becomes an important problem, especially in the case in which malware deletes itself after compromising a computer. However, unlike image files, which usually have both a header and a footer, executable files have only header information, which makes the carving much harder. In this paper, we present Bin-Carver, a first-of-its-kind system to automatically recover executable files with deleted or corrupted metadata. The key idea is to exploit the road map information defined in executable file headers and the explicit control flow paths present in the binary code. Our experiments with thousands of binary code files have shown Bin-Carver to be highly accurate, with an identification rate of 96.3% and a recovery rate of 93.1% on average when handling file systems ranging from pristine to chaotic and highly fragmented.
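The first step of any such carver, locating candidate executable headers in a raw image, can be sketched in a few lines; Bin-Carver's actual recovery logic, which follows the header's "road map" and the binary's control flow, is far more involved than this.

```python
ELF_MAGIC = b"\x7fELF"
PE_MAGIC = b"MZ"  # DOS stub magic; the PE header is reached via e_lfanew

def find_executable_headers(image: bytes, align: int = 512):
    """Yield (offset, kind) for sector-aligned executable header
    candidates in a raw disk image. Validation against the header's
    internal road map (section tables, e_lfanew, ...) must follow."""
    for off in range(0, len(image) - 4, align):
        window = image[off:off + 4]
        if window == ELF_MAGIC:
            yield off, "elf"
        elif window[:2] == PE_MAGIC:
            yield off, "pe-candidate"

# Hypothetical use:
# image = open("disk.raw", "rb").read()
# for off, kind in find_executable_headers(image):
#     print(hex(off), kind)
```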

9.
Minnaard proposed a novel method that constructs a creation-time bound for files recovered without time information. The method exploits a relationship between the creation order of files and their locations on a storage device managed with the Linux FAT32 file system. This creation-order reconstruction method is valid only in non-wraparound situations, where the file creation time at an earlier position is earlier than that at a later position. In this article, we show that if the Linux FAT32 file allocator traverses the storage space more than once, the creation time of a recovered file can be earlier than that of a file at an earlier position and later than that of a file at a later position on the Linux FAT32 file system. We also verify analytically that there are at most n candidates for the creation-time bound of each recovered file, where n is the number of traversals by the file allocator. Our analysis is evaluated by examining the file allocation patterns of two commercial in-car dashboard cameras.

10.
Forensic imaging has been facing scalability challenges for some time. As disk capacity growth continues to outpace storage I/O bandwidth, the demands placed on storage and time are ever increasing. Data reduction and de-duplication technologies are now commonplace in the enterprise space and are potentially applicable to forensic acquisition. Using the new AFF4 forensic file format, we employ a hash-based compression scheme that leverages an existing corpus of images, reducing both acquisition time and storage requirements. This paper additionally describes some of the recent evolution of the AFF4 file format that makes an efficient implementation of hash-based imaging a reality.
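A minimal sketch of the hash-based idea (store a block's bytes only when its hash is not already in the corpus, otherwise store just a reference) is given below; the real AFF4 implementation adds the container format, block maps, and compression on top of this.

```python
import hashlib

def hash_based_image(device_blocks, corpus: dict) -> list:
    """Deduplicating acquisition sketch: each block is recorded either
    as ('ref', hash) when the corpus already holds its contents, or as
    ('data', hash, bytes) when it is new (and then added to the corpus)."""
    image = []
    for block in device_blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest in corpus:
            image.append(("ref", digest))  # no block bytes stored
        else:
            corpus[digest] = block
            image.append(("data", digest, block))
    return image
```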

11.
Cross-cultural research on psychopathy necessitates assessment methods that are generalizable across linguistic and cultural differences. Multiple-group confirmatory factor analysis was used to compare the factorial structure of Psychopathy Checklist-Revised (PCL-R) assessments obtained from file reviews of North-American (N = 2622) and German (N = 443) male offenders. The analyses indicated that the 18 item, 4-factor model of the PCL-R obtained with the standard PCL-R protocol (interview and file review) also holds for file review data. On a factor-by-factor level, the data are commensurate with strong factorial invariance of factor loadings and item thresholds for the Interpersonal and Lifestyle factors, and with likely metric invariance for the Affective factor. The Antisocial factor showed structural differences between the two samples. The results imply that cultural or environmental factors more strongly influence the judgment and/or expression of antisociality. Based on the results, cross-cultural comparisons between North-American and German offenders in terms of PCL-R psychopathy should be limited to the Interpersonal and Lifestyle factors. Further research using data obtained through the standard protocol (i.e., interview plus file information) is encouraged.

12.
An absence and its locus are the same ontological entity, but the cognition of the absence is different from the cognition of the locus. The cognitive difference is caused by a query followed by a cognitive process of introspection. The moment one perceptually knows y, which contains only one thing, z, one is in a position to conclude that y contains the absence of any non-z. After having a query as to whether y has x, one revisits one's knowledge of y containing z and comes to know that x is absent from y. Thus the knowledge of the absence of x logically follows from the knowledge of y containing z through the mediation of a query. This analysis goes against the thesis according to which an absence is an irreducible entity that is to be known through the senses, and it is inspired by the Mīmāṃsā views, especially the Prābhākara views, on absence and its cognition.

13.
Identity-based cryptography has attracted attention in the cryptographic research community in recent years. Despite the importance of cryptographic schemes for applications in business and law, the legal implications of identity-based cryptography have not yet been discussed. We investigate how identity-based signatures fit into the legal framework. We focus on the European Signature Directive, but also take the UNCITRAL Model Law on Electronic Signatures into account. In contrast to previous assumptions, identity-based signature schemes can, in principle, be used even for qualified electronic signatures, which can replace handwritten signatures in the member states of the European Union. We derive requirements to be taken into account in the development of future identity-based signature schemes.

14.
Over the past decade, a substantial effort has been put into developing methods to classify file fragments. Throughout, it has been an article of faith that data fragments, such as disk blocks, can be attributed to different file types. This work is an attempt to critically examine the underlying assumptions and compare them to empirically collected data. Specifically, we focus most of our effort on surveying several common compressed data formats, and show that the simplistic conceptual framework of prior work is at odds with the realities of actual data. We introduce a new tool, zsniff, which allows us to analyze deflate-encoded data, and we use it to perform an empirical survey of deflate-encoded text, images, and executables. The results offer a conceptually new type of classification capability that cannot be achieved by other means.
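A crude way to probe for raw deflate data at a candidate offset, in the spirit of what a tool like zsniff automates and greatly refines, is simply to attempt decompression. This is a sketch of the basic test, not zsniff itself, which additionally inspects block headers and Huffman tables.

```python
import zlib

def looks_like_deflate(buf: bytes, min_output: int = 64) -> bool:
    """Heuristic: try to inflate the buffer as a raw deflate stream
    (wbits=-15, i.e. no zlib/gzip header) and accept it if a
    non-trivial amount of data decompresses cleanly."""
    d = zlib.decompressobj(-15)
    try:
        out = d.decompress(buf, 1 << 20)
    except zlib.error:
        return False
    return len(out) >= min_output

payload = zlib.compress(b"A" * 1000)[2:-4]  # strip zlib header and checksum
print(looks_like_deflate(payload))  # True
```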

15.
The increasing popularity of cryptography poses a great challenge in the field of digital forensics. Digital evidence protected by strong encryption may be impossible to decrypt without the correct key. We propose novel methods for cryptographic key identification and present a new proof-of-concept tool named Interrogate that searches through volatile memory and recovers cryptographic keys used by the ciphers AES, Serpent and Twofish. By using the tool in a virtual digital crime scene, we simulate and examine the different states of systems on which well-known and popular cryptosystems are installed. Our experiments show that the chances of uncovering cryptographic keys are high when the digital crime scene is in certain well-defined states. Finally, we argue that the consequences of this and other recent results regarding memory acquisition require that the current practice of digital forensics be guided towards a more forensically sound way of handling live analysis at a digital crime scene.
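One of the simplest heuristics in this space, and only a rough stand-in for Interrogate's cipher-specific searches (such as validating candidate AES key schedules), is to scan memory for small windows of near-maximal byte entropy:

```python
import math
from collections import Counter

def shannon_entropy(window: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per symbol.
    For a short window the maximum is log2(len(window)), e.g. 5 bits
    for a 32-byte window with all-distinct bytes."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def entropy_scan(memory: bytes, window: int = 32, threshold: float = 4.5):
    """Yield offsets of high-entropy windows: candidate key material.
    Real key-recovery tools add cipher-specific validation, e.g.
    checking that a candidate AES key expands to the key schedule
    found adjacent to it in memory."""
    for off in range(0, len(memory) - window, window):
        if shannon_entropy(memory[off:off + window]) > threshold:
            yield off
```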

16.
A problem that arises in computer forensics is determining the type of a file fragment. An extension to the file name indicating the type is stored in the disk directory, but when a file is deleted, the entry for the file in the directory may be overwritten. This problem is easily solved when the fragment includes the initial header, which contains explicit type-identifying information, but it is more difficult to determine the type of a fragment from the middle of a file.

We investigate two algorithms for predicting the type of a fragment: one based on Fisher's linear discriminant and the other based on longest common subsequences of the fragment with various sets of test files. We test the ability of the algorithms to predict a variety of common file types. Algorithms of this kind may be useful in designing the next generation of file carvers: programs that reconstruct files when directory information is lost or deleted. These methods may also be useful in designing virus scanners, firewalls and search engines to find files that are similar to a given file.
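A minimal version of the first approach (Fisher's linear discriminant over byte-frequency features) can be put together with scikit-learn; the feature choice and training setup here are simplified assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def byte_histogram(fragment: bytes) -> np.ndarray:
    """256-bin byte-frequency feature vector, normalised to sum to 1."""
    hist = np.bincount(np.frombuffer(fragment, dtype=np.uint8),
                       minlength=256)
    return hist / max(len(fragment), 1)

def train_classifier(fragments, labels):
    """fragments: list of bytes; labels: e.g. 'jpeg', 'html', 'pdf'."""
    X = np.vstack([byte_histogram(f) for f in fragments])
    clf = LinearDiscriminantAnalysis()  # Fisher's linear discriminant
    clf.fit(X, labels)
    return clf

# Prediction on an unknown fragment:
# clf.predict([byte_histogram(unknown_fragment)])
```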

17.
We have applied the generalised and universal distance measure NCD (Normalised Compression Distance) to the problem of determining the type of file fragments. To enable later comparison of the results, the algorithm was applied to fragments of a publicly available corpus of files. The NCD algorithm, in conjunction with k-nearest-neighbour (k ranging from one to ten) as the classification algorithm, was applied to a random selection of circa 3000 512-byte file fragments from 28 different file types. This procedure was then repeated ten times. While the overall accuracy of the n-valued classification only improved on the prior probability of approximately 3.5%, reaching circa 32-36%, the classifier achieved accuracies of circa 70% for the most successful file types.

A prototype file fragment classifier was then developed and evaluated on a new set of data (from the same corpus). Circa 3000 fragments were selected at random and the experiment was repeated five times. The prototype classifier remained successful at classifying individual file types, with accuracies ranging from only slightly lower than 70% for the best class down to accuracies similar to those in the prior experiment.
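The NCD itself is a short computation over a real compressor. A sketch with zlib follows (the paper's choice of compressor and its exact k-NN machinery may differ); NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length.

```python
import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance: close to 0 for similar
    inputs, close to 1 for unrelated ones."""
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_classify(fragment: bytes, references, k: int = 10) -> str:
    """references: iterable of (bytes, file_type) pairs. Majority
    vote over the k reference fragments nearest in NCD."""
    nearest = sorted(references, key=lambda r: ncd(fragment, r[0]))[:k]
    labels = [t for _, t in nearest]
    return max(set(labels), key=labels.count)
```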

18.
Using file-carving techniques from digital forensics, and taking the AVI file as a concrete example, this paper describes a file-carving method that detects AVI file fragments and then orders and reassembles the fragments into a complete file. The method is based on the structural characteristics of the AVI file format and draws on established file-carving ideas such as file-structure keyword matching and bifragment gap carving. It can carve AVI files even when file system metadata has been lost. Experimental results show that the method improves the carving success rate for fragmented AVI files.
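The structural anchor such a method relies on is the RIFF container header at the start of every AVI file. Detecting candidate AVI headers in unallocated space might start like the sketch below (the first step only, not the full fragment-ordering and gap-carving procedure).

```python
import struct

def find_avi_headers(data: bytes):
    """Yield (offset, declared_size) for candidate AVI headers.
    An AVI file starts with 'RIFF', a little-endian 32-bit size
    (total file size minus 8), then the form type 'AVI '."""
    pos = data.find(b"RIFF")
    while pos != -1:
        if data[pos + 8:pos + 12] == b"AVI ":
            (size,) = struct.unpack_from("<I", data, pos + 4)
            yield pos, size + 8  # declared total file size
        pos = data.find(b"RIFF", pos + 1)
```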

19.
Inconsistency between the way in which the law is structured and the way in which technologies actually operate is always an interesting and useful topic to explore. When a law conflicts with a business model, the solution will often be to change the business model. However, when the law comes into conflict with the architecture of hardware and software, it is less clear how the problem will be managed.

In this paper, we analyze the contradictions between blockchain technology and the requirements of the GDPR. The three contradictions we examine are (i) the right to be forgotten versus the irreversibility/immutability of records, (ii) data protection by design versus the tamper-proofness and transparency of the blockchain, and (iii) the data controller versus decentralized nodes. We highlight that these conflicts can be handled by focusing on the commonalities of the GDPR and the blockchain, developing new approaches and interpretations, and tailoring blockchain technology to the needs of data protection law.

20.
This paper explores the use of purpose-built functions and cryptographic hashes of small data blocks for identifying data in sectors, file fragments, and entire files. It introduces and defines the concept of a "distinct" disk sector: a sector that is unlikely to exist elsewhere except as a copy of the original. Techniques are presented for improved detection of JPEG, MPEG and compressed data; for rapidly classifying the forensic contents of a drive using random sampling; and for carving data based on sector hashes.
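A minimal sector-hashing sketch in the spirit of the paper follows: hash every aligned 512-byte sector and test membership against a database of known sector hashes, with random sampling trading coverage for speed. The sample size and hash choice are assumptions for illustration.

```python
import hashlib
import random

SECTOR = 512

def sector_hashes(data: bytes):
    """SHA-1 of every aligned 512-byte sector."""
    for off in range(0, len(data) - SECTOR + 1, SECTOR):
        yield hashlib.sha1(data[off:off + SECTOR]).hexdigest()

def sampled_hits(data: bytes, known: set, samples: int = 10000) -> int:
    """Randomly sample aligned sectors and count hits against a set of
    known ('distinct') sector hashes; even a modest sample locates
    known content on a large drive with high probability."""
    n_sectors = len(data) // SECTOR
    hits = 0
    for _ in range(min(samples, n_sectors)):
        off = random.randrange(n_sectors) * SECTOR
        if hashlib.sha1(data[off:off + SECTOR]).hexdigest() in known:
            hits += 1
    return hits
```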
