Similar Articles
20 similar articles found.
1.
We present a method to determine a synchronization point within a DEFLATE-compressed bit stream (as used in Zip and gzip archives) whose beginning is unknown or damaged. Decompressing from the synchronization point forward yields a mixed stream of literal bytes and co-indexed unknown bytes. Language modeling in the form of byte trigrams and word unigrams is then applied to the resulting stream to infer probable replacements for each co-indexed unknown byte. Unique inferences can be made for approximately 30% of the co-indices, permitting reconstruction of approximately 75% of the unknown bytes recovered from the compressed data with accuracy in excess of 90%. The program implementing these techniques is available as open-source software.
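A minimal sketch of the inference step described above (the data representation is hypothetical, not the authors' implementation): a byte-trigram model trained on a reference corpus scores candidate values for one co-indexed unknown byte wherever it appears with two known bytes of left context.

```python
from collections import defaultdict

def train_trigrams(corpus: bytes):
    """Count byte trigrams in a reference corpus."""
    counts = defaultdict(int)
    for i in range(len(corpus) - 2):
        counts[corpus[i:i + 3]] += 1
    return counts

def infer_unknown(stream, index, trigrams):
    """Pick the most plausible byte value for the co-indexed unknown `index`.

    `stream` is a list mixing known literal bytes (ints) and ('UNK', i) markers;
    this representation is an assumption made for the sketch.
    """
    scores = defaultdict(int)
    for pos, item in enumerate(stream):
        if item != ('UNK', index) or pos < 2:
            continue
        left = stream[pos - 2:pos]
        if any(isinstance(b, tuple) for b in left):
            continue  # need two known bytes of left context
        prefix = bytes(left)
        for cand in range(256):
            scores[cand] += trigrams.get(prefix + bytes([cand]), 0)
    return max(scores, key=scores.get) if scores else None
```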

2.
File carving is the process of reassembling files from disk fragments based on file content, in the absence of file system metadata. By leveraging file header and footer pairs, traditional file carving mainly focuses on document and image files such as PDF and JPEG. With the vast amount of malware code appearing in the wild daily, recovery of binary executable files becomes an important problem, especially when malware deletes itself after compromising a computer. However, unlike image files, which usually have both a header and a footer, executable files carry only header information, which makes carving much harder. In this paper, we present Bin-Carver, a first-of-its-kind system to automatically recover executable files with deleted or corrupted metadata. The key idea is to exploit the road-map information defined in executable file headers and the explicit control flow paths present in the binary code. Our experiments with thousands of binary files show Bin-Carver to be highly accurate, with an identification rate of 96.3% and a recovery rate of 93.1% on average when handling file systems ranging from pristine to chaotic and highly fragmented.
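The first stage, locating executable headers among raw disk blocks, can be illustrated with a simple magic-number scan (the ELF and MZ/PE signatures are standard; Bin-Carver's header "road map" and control-flow matching go well beyond this sketch).

```python
def find_executable_headers(image: bytes, block_size: int = 512):
    """Return offsets of disk blocks that begin with an ELF or MZ (PE) signature."""
    hits = []
    for off in range(0, len(image), block_size):
        if image[off:off + 4] == b'\x7fELF':
            hits.append((off, 'ELF'))
        elif image[off:off + 2] == b'MZ':
            hits.append((off, 'MZ/PE'))
    return hits
```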

3.
We have applied the generalised and universal distance measure NCD (Normalised Compression Distance) to the problem of determining the type of file fragments. To enable later comparison of the results, the algorithm was applied to fragments of a publicly available corpus of files. The NCD algorithm, in conjunction with k-nearest-neighbour classification (k ranging from one to ten), was applied to a random selection of circa 3000 512-byte file fragments from 28 different file types. This procedure was then repeated ten times. While the overall accuracy of the n-valued classification only improved on the prior probability, from approximately 3.5% to circa 32–36%, the classifier reached accuracies of circa 70% for the most successful file types. A prototype file fragment classifier was then developed and evaluated on a new set of data (from the same corpus). Circa 3000 fragments were selected at random and the experiment was repeated five times. The prototype classifier remained successful at classifying individual file types, with accuracies ranging from slightly below 70% for the best class down to accuracies similar to those in the prior experiment.
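NCD itself has a closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)). A minimal sketch of the distance plus a k-nearest-neighbour vote follows; using zlib as the compressor C is an assumption made here, not something the abstract specifies.

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length C(x), here under zlib (assumed compressor)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalised Compression Distance between two byte strings."""
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify_fragment(fragment, references, k=1):
    """k-NN over NCD: `references` is a list of (bytes, file_type) training fragments."""
    nearest = sorted(references, key=lambda r: ncd(fragment, r[0]))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)
```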

4.
In the field of reverse engineering, knowing the correct image base of a firmware image is very important, because it lets reverse engineers build accurate cross references and thus understand the firmware. Furthermore, patching firmware requires inserting instructions that reference absolute addresses, which depend on the correct image base. However, for a large number of embedded system firmware images, the format is non-standard and the image base is unknown. In this paper, we present a two-step method to determine the image base of firmware for ARM-based devices. First, based on how strings are stored in firmware files and on the encoding features of literal pools that contain string addresses, we propose an algorithm called FIND-LP to recognize all possible literal pools in the firmware. Second, we propose an algorithm called Determining image Base by Matching Literal Pools (DBMLP) to determine the image base. DBMLP derives the relationship between the absolute addresses of strings and their corresponding offsets in the firmware file, thereby obtaining a candidate list of image base values. If the number of matched literal pools for a certain candidate image base is far greater than for the others, that candidate is considered the correct image base of the firmware. Experimental results indicate that the proposed method can effectively determine the image base for many firmware images that use literal pools to store string addresses.
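The matching idea behind DBMLP can be sketched as follows (a simplified illustration rather than the published algorithm): every absolute address found in a candidate literal pool, minus a candidate image base, should land on the file offset of a recognised string, and the base with far more such hits than any other is taken as correct.

```python
def best_image_base(pool_addresses, string_offsets):
    """Score candidate image bases by how many literal-pool addresses map onto known string offsets.

    pool_addresses: absolute addresses recovered from candidate literal pools.
    string_offsets: file offsets at which strings were found in the firmware image.
    """
    offsets = set(string_offsets)
    candidates = {addr - off for addr in pool_addresses for off in offsets if addr >= off}
    scores = {base: sum(1 for addr in pool_addresses if addr - base in offsets)
              for base in candidates}
    best = max(scores, key=scores.get)
    return best, scores
```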

5.
Global Positioning System (GPS) devices are an increasingly important source of evidence, as more of our devices have built-in GPS capabilities. In this paper, we propose a novel framework to efficiently recover National Marine Electronics Association (NMEA) logs and reconstruct GPS trajectories. Unlike existing approaches that require file system metadata, our algorithm is based on file carving and does not rely on system metadata. By understanding the characteristics and intrinsic structure of trajectory data in NMEA logs, we demonstrate how to pinpoint all data blocks belonging to NMEA logs within a forensic image acquired from a GPS device. We then present a discriminator that determines whether two data blocks can be merged, and, based on this discriminator, design a reassembly algorithm that re-orders and merges the recovered data blocks into new logs. Deleted trajectories can then be reconstructed by analyzing the recovered logs. Empirical experiments demonstrate that our algorithm performs well whether or not system metadata is available, when log files are heavily fragmented, when one or more parts of the log files have been overwritten, and across file systems with different cluster sizes.
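Pinpointing NMEA content in raw blocks relies on the sentence format itself ($TALKER...*HH, with an XOR checksum over the characters between '$' and '*'). A minimal validator is sketched below, restricted to GP-talker sentences for brevity; the block discrimination and reassembly logic are the paper's contribution and are not reproduced here.

```python
import re

NMEA_RE = re.compile(rb'\$(GP[A-Z]{3},[^*\r\n]*)\*([0-9A-Fa-f]{2})')

def nmea_checksum(payload: bytes) -> int:
    """XOR of all bytes between '$' and '*', per the NMEA 0183 convention."""
    csum = 0
    for b in payload:
        csum ^= b
    return csum

def valid_sentences(block: bytes):
    """Yield NMEA sentences found in a raw data block whose checksums verify."""
    for match in NMEA_RE.finditer(block):
        payload, claimed = match.group(1), int(match.group(2), 16)
        if nmea_checksum(payload) == claimed:
            yield match.group(0)
```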

6.
Several operating systems provide a central logging service that collects event messages from the kernel and applications, filters them, and writes them into log files. Such a system service has existed in Microsoft Windows NT for more than a decade. Its file format is well understood and supported by forensic software. Windows Vista introduces an entirely redesigned event logging service, confronting forensic examiners and software authors with unfamiliar system behavior and a new, largely undocumented file format. This article describes the history of Windows system loggers, what has changed over time, and why. It compares Vista log files in their native binary form and in textual form. Based on the results, this paper publicly describes for the first time the key elements of the new log file format and the proprietary binary encoding of XML. It discusses the problems that may arise during daily work. Finally, it proposes a procedure for recovering information from log fragments. During a criminal investigation, this procedure was successfully applied to recover information from a corrupted event log.

7.
Minnaard proposed a novel method that constructs a bound on the creation time of files recovered without time information. The method exploits a relationship between the creation order of files and their locations on a storage device managed with the Linux FAT32 file system. This creation-order reconstruction method is valid only in non-wraparound situations, where the creation time of a file at an earlier on-disk position is earlier than that of a file at a later position. In this article, we show that if the Linux FAT32 file allocator traverses the storage space more than once, the creation time of a recovered file can be earlier than that of a file at an earlier position and later than that of a file at a later position. It is also analytically verified that there are at most n candidates for the creation time bound of each recovered file, where n is the number of traversals by the file allocator. Our analysis is evaluated by examining the file allocation patterns of two commercial in-car dashboard cameras.

8.
Forensic examiners are frequently confronted with content in languages that they do not understand, and they could benefit from machine translation into their native language. Automated translation of file paths is a difficult problem, however, because of the minimal context available for translation and the frequent mixing of multiple languages within a path. This work developed a prototype file-path translator that first identifies the language of each directory segment of a path and then translates into English those segments that are not already English or artificial words. Brown's LA-Strings utility for language identification was tried, but its performance was found inadequate on short strings, so it was supplemented with clues from dictionary lookup, Unicode character distributions for languages, country of origin, and language-related keywords. To provide better data for language inference, the words used in each directory over a large corpus were aggregated for analysis. The resulting directory-language probabilities were combined with those obtained for each path segment from dictionary lookup and character-type distributions to infer the segment's most likely language. Tests were done on a corpus of 50.1 million file paths, looking for 35 different languages. They showed 90.4% accuracy in identifying the languages of directories and 93.7% accuracy in identifying the languages of directory/file segments of file paths, even after excluding 44.4% of the paths as obviously English or untranslatable. Two of seven proposed language clues were shown to impair directory-language identification. Experiments also compared three translation methods: the Systran translation tool, Google Translate, and word-for-word substitution using dictionaries. Google Translate usually performed best, but all three still made errors with European languages and a significant number of errors with Arabic and Chinese.
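A toy version of the per-segment inference might combine a directory-level prior with dictionary and character-script clues as below; the clue tables, weights, and language codes here are purely hypothetical placeholders for the much richer clues described above.

```python
import unicodedata

# Hypothetical clue tables; the real system draws on full dictionaries, Unicode
# character statistics per language, country of origin, and language keywords.
DICTIONARIES = {'en': {'documents', 'music'}, 'de': {'dokumente', 'musik'}}

def segment_language(segment: str, directory_prior: dict) -> str:
    """Pick the most likely language for one path segment given a directory-level prior."""
    scores = dict(directory_prior)           # e.g. {'en': 0.6, 'de': 0.3, 'ar': 0.1}
    token = segment.lower()
    for lang, words in DICTIONARIES.items():
        if token in words:                    # dictionary-lookup clue
            scores[lang] = scores.get(lang, 0.0) + 1.0
    if any('ARABIC' in unicodedata.name(ch, '') for ch in segment):
        scores['ar'] = scores.get('ar', 0.0) + 1.0   # character-script clue
    return max(scores, key=scores.get)
```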

9.
The classification of file fragments is an important problem in digital forensics, yet the literature includes little comprehensive work on applying machine learning techniques to it. In this work, we explore the use of techniques from natural language processing to classify file fragments. We take a supervised learning approach based on support vector machines combined with the bag-of-words model, in which text documents are represented as unordered bags of words. This technique has repeatedly been shown to be effective and robust in classifying text documents (e.g., in distinguishing positive movie reviews from negative ones). In our approach, we represent file fragments as “bags of bytes” with feature vectors consisting of unigram and bigram counts as well as other statistical measurements, such as entropy. We used the publicly available Garfinkel data corpus to generate file fragments for training and testing. We ran a series of experiments and found that this approach is effective in this domain as well.
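A minimal feature extractor along these lines, paired with a linear SVM from scikit-learn, is sketched below; the hashed bigram histogram is a space-saving simplification for the sketch, not the paper's exact feature set.

```python
import math
import numpy as np
from sklearn.svm import LinearSVC

def features(fragment: bytes) -> np.ndarray:
    """Byte unigram counts, a hashed bigram histogram, and Shannon entropy for one fragment."""
    unigrams = np.bincount(np.frombuffer(fragment, dtype=np.uint8), minlength=256)
    bigrams = np.zeros(256)
    for i in range(len(fragment) - 1):
        bigrams[(fragment[i] * 31 + fragment[i + 1]) % 256] += 1  # hashed to keep the vector small
    probs = unigrams / max(len(fragment), 1)
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return np.concatenate([unigrams, bigrams, [entropy]])

# usage sketch: X = np.array([features(f) for f in fragments]); clf = LinearSVC().fit(X, labels)
```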

10.
Over the past decade, a substantial effort has been put into developing methods to classify file fragments. Throughout, it has been an article of faith that data fragments, such as disk blocks, can be attributed to different file types. This work critically examines the underlying assumptions and compares them to empirically collected data. Specifically, we focus most of our effort on surveying several common compressed data formats, and show that the simplistic conceptual framework of prior work is at odds with the realities of actual data. We introduce a new tool, zsniff, which allows us to analyze deflate-encoded data, and we use it to perform an empirical survey of deflate-coded text, images, and executables. The results offer a conceptually new type of classification capability that cannot be achieved by other means.

11.
Cross-cultural research on psychopathy requires assessment methods that generalize across linguistic and cultural differences. Multiple-group confirmatory factor analysis was used to compare the factorial structure of Psychopathy Checklist-Revised (PCL-R) assessments obtained from file reviews of North American (N = 2622) and German (N = 443) male offenders. The analyses indicated that the 18-item, 4-factor model of the PCL-R obtained with the standard PCL-R protocol (interview and file review) also holds for file review data. On a factor-by-factor level, the data are consistent with strong factorial invariance of factor loadings and item thresholds for the Interpersonal and Lifestyle factors, and with likely metric invariance for the Affective factor. The Antisocial factor showed structural differences between the two samples. The results imply that cultural or environmental factors more strongly influence the judgment and/or expression of antisociality. Based on these results, cross-cultural comparisons between North American and German offenders in terms of PCL-R psychopathy should be limited to the Interpersonal and Lifestyle factors. Further research using data obtained through the standard protocol (i.e., interview plus file information) is encouraged.

12.
Digital Investigation, 2014, 11(3): 224–233
The allocation algorithm of the Linux FAT32 file system driver positions files on disk in such a way that their relative positions reveal information about the order in which the files were created. This provides an opportunity to enrich information from (carved) file fragments with time information, even when such fragments lack the file system metadata in which time-related information is usually found. The behaviour of the Linux FAT allocator is examined through source code analysis and experiments. How an understanding of this allocator can be applied in practice is demonstrated with a case study involving a TomTom GPS car navigation device, in which time information played a crucial role. Large numbers of location records could be carved from the device's flash storage, yielding insight into the locations the device had visited, yet the carved records themselves offered no information on when the device had been at those locations. Still, bounds on the records' time of creation could be inferred by making use of file system timestamps related to neighbouring on-disk positions. Finally, we perform experiments that contrast the Linux behaviour with that of Windows 7. We show that the latter differs subtly, breaking the strong relation between creation order and position.
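The inference step itself is a simple interpolation over on-disk order (illustrative only; establishing when the Linux allocator actually preserves this order is the article's contribution): a carved record without a timestamp inherits time bounds from the nearest neighbouring entries that do carry creation times.

```python
def time_bounds(entries):
    """Bound the creation times of undated entries by their dated on-disk neighbours.

    entries: list of (cluster_number, creation_time_or_None); carved fragments have None.
    Returns {cluster_number: (lower_bound, upper_bound)}, assuming creation order follows
    on-disk order (valid only for a non-wrapping Linux FAT32 allocator, as discussed above).
    """
    entries = sorted(entries, key=lambda e: e[0])
    bounds = {}
    for i, (cluster, ts) in enumerate(entries):
        if ts is not None:
            continue
        lower = next((t for _, t in reversed(entries[:i]) if t is not None), None)
        upper = next((t for _, t in entries[i + 1:] if t is not None), None)
        bounds[cluster] = (lower, upper)
    return bounds
```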

13.

Objectives

Introduce and test the relative efficacy of two methods for modeling the impact of cumulative ‘exposure’ to drinking facilities on violent crime at street segments.

Methods

One method, the simple count, sums the number of drinking places within a distance threshold. The other, the inverse distance weighted count, weights each drinking place within the threshold by its distance from the street segment, so that closer places are weighted more heavily than more distant ones. Distance is measured as the street length from a street segment to a drinking place along the street network. Seven distance thresholds of 400, 800, 1,200, 1,600, 2,000, 2,400 and 2,800 feet are tested (both measures are sketched in code after this abstract). A negative binomial regression model controlling for socio-economic characteristics, opportunity factors and spatial autocorrelation is used to evaluate which of the measure/threshold combinations produces a better fit than a model with no exposure measures.

Results

Exposure measured as an inverse distance weighted count produces the best-fitting model and is significantly related to violent crime at longer distances (from 400 to 2,800 feet) than the simple count. Exposure to drinking places measured as a simple count is significantly related to violent crime up to 2,000 feet. Both models indicate that the influence of drinking places is strongest at shorter distance thresholds.

Conclusions

Both researchers and practitioners can more precisely quantify the influence of drinking places in multivariate models of street-segment-level violent crime by incorporating proximity into a cumulative exposure measure. The efficacy of using exposure measures to quantify the influence of other types of facilities on crime patterns across street segments should be explored.
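The contrast between the two exposure measures is easy to express in code. The sketch below uses 1/d as the inverse-distance weight, which is one common choice and may differ from the weighting used in the study.

```python
def simple_count(distances_ft, threshold_ft):
    """Number of drinking places within the street-network distance threshold of a segment."""
    return sum(1 for d in distances_ft if d <= threshold_ft)

def idw_count(distances_ft, threshold_ft):
    """Inverse-distance-weighted count: nearer places contribute more than distant ones."""
    return sum(1.0 / d for d in distances_ft if 0 < d <= threshold_ft)

# usage sketch, distances_ft being network distances from one street segment to each drinking place:
# exposure_800 = idw_count(distances_ft, 800)
```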

14.

Objectives

We argue that assessing the level of crime concentration across cities faces four challenges: (1) how much variability we should expect to observe; (2) whether concentration should be measured across macro units of different types and sizes; (3) a statistical challenge in measuring crime concentration; (4) the temporal assumption employed when measuring high-crime locations.

Methods

We use data for 42 cities in southern California with populations of at least 40,000 to assess the level of crime concentration for five different Part 1 crimes and total Part 1 crimes over 2005–2012. We demonstrate that the traditional measure of crime concentration is confounded by crimes that may cluster spatially simply by random chance. We also use two measures employing different temporal assumptions: a historically adjusted crime concentration measure and a temporally adjusted crime concentration measure (a novel approximate solution that is simple for researchers to implement).

Results

There is considerable variability across cities in the level of crime concentration in the top 5% of street segments. The standard deviation across cities over years for the temporally adjusted crime concentration measure is between 10 and 20% across crime types (with the average range typically being about 15–90%). The historically adjusted concentration measure shows similar variability and typically ranges from about 35 to 100%.

Conclusions

The study provides evidence of variability in the level of crime concentration across cities, but also raises important questions about the temporal scale to use when measuring this concentration. The results open an exciting new area of research exploring why levels of crime concentration vary across cities; either micro- or macro-level theories may help researchers explore this new direction.

15.
In response to continuing interest in obtaining reference deoxyribonucleic acid (DNA) analysis data for previously unstudied population groups, blood samples were collected from Punjabi individuals living in East Punjab, India. This first segment of our research focuses on restriction fragment length polymorphism (RFLP) analysis, with future segments anticipated for various polymerase chain reaction (PCR) based techniques. In this study, the samples were subjected to RFLP analysis using HaeIII, followed by hybridization with variable number tandem repeat (VNTR) probes for loci D2S44, D1S7, D10S28, D4S139, D17S79 and D5S110. The band sizes of the resulting patterns were estimated using an FBI imaging system. The resulting data were subjected to statistical analysis for conformity with Hardy-Weinberg expectations, first for the total population of Punjabis and additionally for the subgroups of Sikhs and Hindus. The loci are highly polymorphic in all sample populations studied. Except for D5S110, there is no evidence of departure from Hardy-Weinberg equilibrium (HWE) at the VNTR loci in these population groups. In addition, there is little evidence of correlation between alleles at any pair of loci and no evidence of association across the six loci. Finally, the data suggest that a multiple-locus VNTR profile would be rare in the Punjabi population or either of its subgroups.
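Conformity with Hardy-Weinberg expectations is commonly checked by comparing observed genotype counts with those expected from the allele frequencies; the textbook chi-square version of that comparison is sketched below (the study's actual procedures for binned VNTR data are more involved).

```python
from collections import Counter
from itertools import combinations_with_replacement

def hwe_chi_square(genotypes):
    """Chi-square statistic against Hardy-Weinberg expectations for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    """
    n = len(genotypes)
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    observed = Counter(tuple(sorted(pair)) for pair in genotypes)
    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected = n * (freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b])
        if expected > 0:
            chi2 += (observed.get((a, b), 0) - expected) ** 2 / expected
    return chi2
```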

16.
In this paper we present a novel approach to the problem of steganography detection in JPEG images based on a statistical attack. The method builds on the empirical Benford's Law and, more specifically, on its generalized form. We prove and extend the validity of the logarithmic rule to colour images and introduce a blind steganographic detection method that can flag a file as a suspicious stego-carrier. The proposed method achieves very high accuracy and speed and is based on the distribution of the first digits of the quantized Discrete Cosine Transform coefficients present in JPEGs. In order to validate and evaluate our algorithm, we developed steganographic tools able to analyse image files and subsequently applied them to the popular Uncompressed Colour Image Database. Furthermore, we demonstrate that our method can not only detect steganography but, if certain criteria are met, can also reveal which steganographic algorithm was used to embed data in a JPEG file.
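The detection statistic rests on how closely the first-digit frequencies of the quantized DCT coefficients follow the generalized Benford model p(d) = N·log10(1 + 1/(s + d^q)). A sketch of that comparison is below, with N, q and s left as placeholder parameters rather than the paper's fitted values.

```python
import math
from collections import Counter

def first_digit_hist(coefficients):
    """Empirical first-digit frequencies of the non-zero (integer) quantized DCT coefficients."""
    digits = [int(str(abs(c))[0]) for c in coefficients if c != 0]
    counts = Counter(digits)
    total = sum(counts.values())
    return [counts.get(d, 0) / total for d in range(1, 10)]

def generalized_benford(d, n=1.0, q=1.0, s=0.0):
    """Generalized Benford's law; with n=1, q=1, s=0 it reduces to the standard law."""
    return n * math.log10(1 + 1 / (s + d ** q))

def divergence(observed):
    """Sum of squared deviations from the model; a large value flags a suspicious file."""
    return sum((observed[d - 1] - generalized_benford(d)) ** 2 for d in range(1, 10))
```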

17.
Reconstruction of 2D objects is a problem relevant to many fields, such as forensic science, archiving, and banking. In the literature it is treated as a one-sided puzzle problem, but this study handles torn banknotes as a double-sided puzzle problem for the first time. In addition, a new dataset (ToB) is created for solving this problem. A selection approach based on the Borda count method is adopted to decide which keypoint-based method should be used in the proposed reconstruction system; this approach identified Accelerated-KAZE (AKAZE) as the most successful keypoint-based method. This study also proposes new measures for determining the success ratio of reconstructed banknotes and calculating their loss ratio. When the torn banknotes were reconstructed with the AKAZE-based reconstruction system, the average success rate according to the proposed metric was 95.55%.
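The AKAZE matching core is readily reproduced with OpenCV (only the keypoint detection and matching step; the double-sided reconstruction pipeline and the proposed success/loss metrics are the study's contribution).

```python
import cv2

def match_fragments(img_a, img_b, ratio=0.75):
    """Match AKAZE keypoints between two banknote-fragment images (grayscale arrays)."""
    akaze = cv2.AKAZE_create()
    kp_a, desc_a = akaze.detectAndCompute(img_a, None)
    kp_b, desc_b = akaze.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)  # AKAZE descriptors are binary
    matches = matcher.knnMatch(desc_a, desc_b, k=2)
    good = []
    for pair in matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])  # Lowe's ratio test keeps distinctive matches
    return kp_a, kp_b, good
```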

18.

Objectives

This study applies the growing emphasis on micro-places to the analysis of addresses, assessing the presence and persistence of “problem properties” with elevated levels of crime and disorder. It evaluates what insights this additional detail offers beyond the analysis of neighborhoods and street segments.

Methods

We used over 2,000,000 geocoded emergency and non-emergency requests received by the City of Boston's 911 and 311 systems from 2011 to 2013 to calculate six indices of violent crime, physical disorder, and social disorder for all addresses (n = 123,265). We linked addresses to their street segments (n = 13,767) and census tracts (n = 178), creating a three-level hierarchy that enabled a series of multilevel hierarchical Poisson models.

Results

Less than 1% of addresses generated 25% of reports of crime and disorder. Across indices, 95–99% of the variance was at the address level, though there was significant clustering at the street segment and neighborhood levels. Models with lagged predictors found that levels of crime and disorder persisted across years for all outcomes at all three geographic levels, with stronger effects at higher geographic levels. Distinctively, roughly 15% of addresses generated crime or disorder in one year but not in the other.

Conclusions

The analysis suggests new opportunities for both the criminology of place and the management of public safety by considering addresses in conjunction with higher-order geographies. We explore directions for empirical work, including further experimentation with and evaluation of law enforcement policies targeting problem properties.

19.
Using campaign contributions to legislators as an indicator of member influence, we explore the impact of term limits on the distribution of power within state legislatures. Specifically, we perform a cross-state comparison of the relative influence of party caucus leaders, committee chairs, and rank-and-file legislators before and after term limits. The results indicate that term limits diffuse power in state legislatures, both by decreasing average contributions to incumbents and by reducing the power of party caucus leaders relative to other members. The change in contribution levels across legislators in different chambers implies a shift in power to the upper chamber in states with term limits. Thus, the impact of term limits may be attenuated in a bicameral system.

20.

Objectives

The present study focuses on Systematic Social Observation (SSO) as a method to investigate physical and social disorder at different units of analysis. The study contributes to the aggregation bias debate and to the ‘social science of ecological assessment’ in two ways: first, by presenting a new model that directly controls for observer bias in ecological constructs and second, by attempting to identify systematic sources of bias in SSO that affect the valid and reliable measurement of physical and social disorder at both street segments and neighborhoods.

Methods

Data on physical disorder (e.g., litter, cigarette butts) and social disorder (e.g., loitering adults) from 1422 street segments in 253 different neighborhoods in a conurbation of the greater The Hague area (the Netherlands) are analyzed using cross-classified multilevel models.

Results

Neighborhood differences in disorder are overestimated when scholars fail to recognize the cross-classified data structure of an SSO study, which arises from the allocation of street segments to both observers and neighborhoods. Failing to correct for observer bias and observational conditions underestimates the disorder–crime association at the street segment/grid cell level, but overestimates this association at the neighborhood level.

Conclusion

Findings indicate that SSO can be used for measuring disorder at both street segment level and neighborhood level. Future studies should pay attention to observer bias prior to their data collection by selecting a minimum number of observers, offering extensive training, and collecting information on the urban background of the observers.
