Provably Safe Artificial General Intelligence via Interactive Proofs


Bibliographic Details
Main Author: Kristen Carlson
Format: Article
Language: English
Published: MDPI AG 2021-10-01
Series: Philosophies
Subjects:
Online Access: https://www.mdpi.com/2409-9287/6/4/83
Description
Summary: Methods are currently lacking to <i>prove</i> artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which a first-generation <i>AGI<sup>1</sup></i> rapidly triggers a succession of more powerful <i>AGI<sup>n</sup></i> that differ dramatically in their computational capabilities (<i>AGI<sup>n</sup></i> << <i>AGI</i><sup><i>n</i>+1</sup>). No proof exists that AGI will benefit humans, nor does a sound value-alignment method exist. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2<sup>−100</sup>). IPS procedures can test AGI behavior-control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior-control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. <i>In toto</i>, IPS provides a way to reduce <i>AGI<sup>n</sup></i> ↔ <i>AGI</i><sup><i>n</i>+1</sup> interaction hazards to an acceptably low level.
ISSN: 2409-9287
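The soundness-amplification idea quoted in the abstract (a weak Verifier driving a stronger Prover's deception probability down to, e.g., 2<sup>−100</sup> by repeated random challenges) can be sketched with a toy protocol modeled on graph non-isomorphism. This is an illustrative sketch, not code from the paper; the names <code>verify_distinct</code>, <code>honest_prover</code>, and <code>soundness_error</code> are invented for the example.

```python
import random
from fractions import Fraction

def verify_distinct(a, b, prover, rounds, rng):
    """Toy interactive proof that lists a and b differ as multisets,
    modeled on the graph non-isomorphism protocol.  Each round the
    Verifier secretly picks one of the two lists, shuffles a copy, and
    challenges the Prover to identify which original it came from.  If
    a and b are in fact identical, any Prover -- however computationally
    powerful -- can only guess the Verifier's private coin, so it
    survives all rounds with probability at most 2**-rounds."""
    for _ in range(rounds):
        secret = rng.randrange(2)              # Verifier's private coin
        challenge = list(a if secret == 0 else b)
        rng.shuffle(challenge)                 # hide the source ordering
        if prover(a, b, challenge) != secret:  # Prover caught: reject
            return False
    return True                                # all rounds passed: accept

def honest_prover(a, b, challenge):
    # When a and b genuinely differ, a multiset comparison identifies
    # the source of the challenge, so the honest Prover never fails.
    return 0 if sorted(challenge) == sorted(a) else 1

def soundness_error(rounds):
    # Upper bound on the probability that a cheating Prover deceives
    # the Verifier across all rounds.
    return Fraction(1, 2) ** rounds
```

With 100 rounds, `verify_distinct([1, 2, 3], [1, 2, 4], honest_prover, 100, random.Random(0))` accepts, while `soundness_error(100)` is exactly 2<sup>−100</sup>, the deception bound cited in the abstract. The Verifier's per-round work is trivial compared to the Prover's assumed power, which is the asymmetry the paper exploits for AGI<sup>n</sup> ↔ AGI<sup>n+1</sup> interactions.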