Unicode search of dirty data, or: How I learned to stop worrying and love Unicode Technical Standard #18 |
| |
Affiliation: | Lightbox Technologies, Inc., 1400 Key Boulevard, Suite 1000, Arlington, VA 22209, USA |
| |
Abstract: | This paper discusses problems that arise in digital forensics with regard to Unicode, character encodings, and search. It describes how multipattern search can handle the different text encodings encountered in digital forensics, and it addresses a number of issues pertaining to the proper handling of Unicode in search patterns. Finally, we demonstrate the feasibility of the approach and discuss the integration of our search engine, lightgrep, with the popular bulk_extractor tool. |
| |
Keywords: | Unicode; Regular expression; Regex; Search; Digital forensics |
This article is indexed in ScienceDirect and other databases. |
|