Provably Safe Artificial General Intelligence via Interactive Proofs

Methods are currently lacking to prove artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation AGI^1 rapidly triggers a succession of more powerful AGI^n that differ dramatically in their computational capabilities (AGI^n << AGI^(n+1)). No proof exists that AGI will benefit humans, nor that a sound value-alignment method exists. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2^-100). IPS procedures can test AGI behavior-control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior-control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. In toto, IPS provides a way to reduce AGI^n ↔ AGI^(n+1) interaction hazards to an acceptably low level.
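
The protocol idea in the abstract — a Verifier that queries a more powerful Prover and drives the probability of being deceived below any chosen threshold — can be illustrated with the textbook interactive proof for graph non-isomorphism. The Python sketch below is not from the paper; the graph encoding, the brute-force simulated Prover, and the round count are illustrative assumptions, but the soundness-amplification logic (k independent challenge rounds bound the deception probability by 2^-k, e.g., 2^-100 for k = 100) is the mechanism the abstract invokes.

import itertools
import random

# Toy Verifier/Prover protocol for graph non-isomorphism (an illustrative
# sketch, not code from the paper). Graphs are frozensets of 2-element
# frozenset edges over vertices 0..n-1; the simulated Prover is brute force,
# so keep n small.

def permute(edges, perm):
    # Apply a vertex relabeling to an edge set.
    return frozenset(frozenset((perm[u], perm[v])) for u, v in map(tuple, edges))

def isomorphic(e1, e2, n):
    # Brute-force isomorphism test over all vertex permutations (toy sizes only).
    return any(permute(e1, dict(enumerate(p))) == e2
               for p in itertools.permutations(range(n)))

def verify_non_isomorphic(g0, g1, n, rounds=40):
    # Verifier's side: each round, flip a coin b, send the Prover a randomly
    # relabeled copy of graph b, and demand that the Prover name b. If g0 and
    # g1 were actually isomorphic, no Prover could answer correctly with
    # probability better than 1/2 per round, so surviving all rounds has
    # probability at most 2**-rounds.
    for _ in range(rounds):
        b = random.randrange(2)
        relabel = dict(enumerate(random.sample(range(n), n)))
        challenge = permute((g0, g1)[b], relabel)
        # Simulated honest Prover: answer 0 iff the challenge matches g0.
        answer = 0 if isomorphic(g0, challenge, n) else 1
        if answer != b:
            return False  # Prover caught; reject the claim
    return True  # accept; residual deception probability <= 2**-rounds

if __name__ == "__main__":
    n = 4
    path = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3})})
    cycle = frozenset({frozenset({0, 1}), frozenset({1, 2}), frozenset({2, 3}), frozenset({3, 0})})
    print(verify_non_isomorphic(path, cycle, n))  # True: the graphs really differ
    print(verify_non_isomorphic(path, path, n))   # almost surely False: claim is a lie

Driving the bound to 2^-100, as in the abstract, simply means running 100 independent rounds; the Verifier's per-round work stays trivial while the Prover carries the computational load.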

Bibliographic Details
Main Author: Kristen Carlson (Beth Israel Deaconess Medical Center and Harvard Medical School, Harvard University, Boston, MA 02115, USA)
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Philosophies
ISSN: 2409-9287
DOI: 10.3390/philosophies6040083
Subjects: artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems
Online Access: https://www.mdpi.com/2409-9287/6/4/83