Probabilistic Redaction
An automated interactive redaction assistant prototype was developed. Based on the Web 1T 5-gram corpus, a list of every unique word and phrase in the English language, up to five words in length, that were observed on the World Wide Web and collected by Google, Inc. in 2006, the CLOAK system automa...
Main Author: | Loughry, J |
---|---|
Format: | Report |
Published: | United States Air Force Research Laboratory (AFRL), 2015 |
_version_ | 1797055939533078528 |
---|---|
author | Loughry, J |
collection | OXFORD |
description | An automated interactive redaction assistant prototype was developed. Based on the Web 1T 5-gram corpus, a list of every unique word and phrase in the English language, up to five words in length, that were observed on the World Wide Web and collected by Google, Inc. in 2006, the CLOAK system automatically flags candidate words, phrases, sentences, and paragraphs in documents under review that are likely classified and suggests redactions to make the document unclassified. Security classification guidance from more than one guide at a time is figured into each suggested redaction. The probabilistic aspect of operation is in the way the system prioritizes its suggestions according to the measured rate of occurrence of words and phrases observed |
first_indexed | 2024-03-06T19:16:27Z |
format | Report |
id | oxford-uuid:188d690f-c518-480d-9a97-cfee50353e08 |
institution | University of Oxford |
last_indexed | 2024-03-06T19:16:27Z |
publishDate | 2015 |
publisher | United States Air Force Research Laboratory (AFRL) |
record_format | dspace |
title | Probabilistic Redaction |
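
The description says that CLOAK prioritizes its redaction suggestions by the measured rate of occurrence of words and phrases, and that it combines guidance from more than one classification guide. The sketch below is only an illustration of that general idea, not the report's implementation. The names `TOY_NGRAM_COUNTS`, `ngrams`, and `rank_candidates`, the two example guides, and all counts are hypothetical. Because the description is cut off before it states the ordering, the sketch assumes that rarer phrases are suggested first.

```python
# Toy frequency table standing in for a web-scale n-gram corpus.
# These counts are invented for illustration; they are NOT Web 1T data.
TOY_NGRAM_COUNTS = {
    ("the",): 1_000_000,
    ("launch",): 5_000,
    ("window",): 8_000,
    ("launch", "window"): 300,
    ("project", "bluebird"): 2,
}


def ngrams(tokens, max_n=5):
    """Yield every n-gram (as a tuple) of length 1..max_n from tokens."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def rank_candidates(text, flagged_terms, counts=TOY_NGRAM_COUNTS):
    """Return the flagged n-grams found in text, rarest first.

    Assumption: a lower corpus count means a more specific phrase,
    so it is ranked higher. Unknown n-grams are treated as count 0.
    """
    tokens = text.lower().split()
    hits = {gram for gram in ngrams(tokens) if gram in flagged_terms}
    # Sort by count, then by the n-gram itself for a stable order.
    return sorted(hits, key=lambda gram: (counts.get(gram, 0), gram))


# Two hypothetical classification guides, merged by set union.
guide_a = {("launch", "window")}
guide_b = {("project", "bluebird"), ("launch",)}

suggestions = rank_candidates(
    "The launch window for Project Bluebird", guide_a | guide_b
)
for gram in suggestions:
    print(" ".join(gram))
```

With these toy inputs the loop prints `project bluebird`, then `launch window`, then `launch`. Merging guides by set union is the simplest possible choice here; a real system would presumably also track which guide triggered each suggestion.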