Probabilistic Redaction

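An automated interactive redaction assistant prototype was developed. The CLOAK system is based on the Web 1T 5-gram corpus, a list of every unique English word and phrase of up to five words that was observed on the World Wide Web and collected by Google, Inc. in 2006. It automatically flags candidate words, phrases, sentences, and paragraphs in documents under review that are likely classified and suggests redactions to make the document unclassified...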


Bibliographic Details
Main Author: Loughry, J
Format: Report
Published: United States Air Force Research Laboratory (AFRL) 2015
author Loughry, J
collection OXFORD
description An automated interactive redaction assistant prototype was developed. The CLOAK system is based on the Web 1T 5-gram corpus, a list of every unique English word and phrase of up to five words that was observed on the World Wide Web and collected by Google, Inc. in 2006. It automatically flags candidate words, phrases, sentences, and paragraphs in documents under review that are likely classified and suggests redactions to make the document unclassified. Security classification guidance from more than one guide at a time is factored into each suggested redaction. The probabilistic aspect of operation lies in the way the system prioritizes its suggestions according to the measured rate of occurrence of words and phrases observed
first_indexed 2024-03-06T19:16:27Z
format Report
id oxford-uuid:188d690f-c518-480d-9a97-cfee50353e08
institution University of Oxford
last_indexed 2024-03-06T19:16:27Z
publishDate 2015
publisher United States Air Force Research Laboratory (AFRL)
record_format dspace
spelling oxford-uuid:188d690f-c518-480d-9a97-cfee50353e08; 2022-03-26T10:43:51Z; Probabilistic Redaction; Report; http://purl.org/coar/resource_type/c_93fc; uuid:188d690f-c518-480d-9a97-cfee50353e08; Department of Computer Science; United States Air Force Research Laboratory (AFRL); 2015; Loughry, J
title Probabilistic Redaction
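
The description in this record says that CLOAK prioritizes its redaction suggestions according to the measured rate of occurrence of words and phrases of up to five words. The record does not say how that ranking is computed, so the following Python fragment is only a minimal sketch of the general idea and not the report's method. Everything in it is an assumption made for illustration: the function names `ngrams`, `build_counts`, and `rank_candidates`, the three-sentence toy corpus standing in for the Web 1T 5-gram counts, and above all the choice to rank rarer phrases first, which is exposed as the `rarest_first` parameter because the direction of the ordering is not stated in the record.

```python
from collections import Counter


def ngrams(tokens, max_n=5):
    """Yield every n-gram of 1..max_n tokens as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def build_counts(sentences, max_n=5):
    """Count n-grams in a toy corpus (a stand-in for real n-gram counts)."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), max_n))
    return counts


def rank_candidates(candidates, counts, rarest_first=True):
    """Order candidate phrases by observed count in the reference corpus.

    ASSUMPTION: rarer phrases are shown first. The record does not state
    the direction, so it is left as a parameter.
    """
    def count_of(phrase):
        return counts.get(tuple(phrase.lower().split()), 0)

    return sorted(candidates, key=count_of, reverse=not rarest_first)


# Hypothetical toy data, not taken from the report.
toy_corpus = [
    "the launch window opens at dawn",
    "the launch was delayed",
    "the window was open",
]
toy_counts = build_counts(toy_corpus)
flagged = ["the launch", "launch window", "the"]
print(rank_candidates(flagged, toy_counts))
# ['launch window', 'the launch', 'the']
```

In this sketch a phrase that never appears in the reference counts gets a count of zero and therefore sorts to the front when `rarest_first` is true. A real system would also need to combine guidance from several classification guides, which this fragment does not attempt.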