Provably Safe Artificial General Intelligence via Interactive Proofs
Methods are currently lacking to <i>prove</i> artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation <i>AGI<sup>1</sup></i> rapidly triggers a succession of more powerful <i>AGI<sup>n</sup></i> that differ dramatically in their computational capabilities (<i>AGI<sup>n</sup></i> << <i>AGI</i><sup><i>n</i>+1</sup>). No proof exists that AGI will benefit humans or that a sound value-alignment method exists. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2<sup>−100</sup>). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. <i>In toto</i>, IPS provides a way to reduce <i>AGI<sup>n</sup></i> ↔ <i>AGI</i><sup><i>n</i>+1</sup> interaction hazards to an acceptably low level.
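The abstract's core claim — that a computationally weak Verifier can query a more powerful Prover and drive the probability of undetected deception below any chosen threshold (e.g., 2<sup>−100</sup>) — follows the standard amplification pattern of probabilistic verification. A classic concrete instance of that pattern, not taken from the article itself, is Freivalds' algorithm: a Verifier checks a Prover's claimed matrix product A·B = C in O(n²) time per round, and a false claim survives each round with probability at most 1/2, so k independent rounds bound deception by 2<sup>−k</sup>. A minimal Python sketch:

```python
import random

def freivalds_check(A, B, C, rounds=100):
    """Probabilistically verify the Prover's claim that A @ B == C.

    Each round draws a random 0/1 vector r and checks A(Br) == Cr
    using only O(n^2) work (three matrix-vector products), far cheaper
    than recomputing A @ B directly. If C != A @ B, a single round
    accepts with probability at most 1/2, so the chance the Verifier
    is deceived over all rounds is at most 2**-rounds.
    """
    n = len(C)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]
        br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]   # B @ r
        abr = [sum(A[i][j] * br[j] for j in range(n)) for i in range(n)]  # A @ (B @ r)
        cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]   # C @ r
        if abr != cr:
            return False  # caught the Prover cheating
    return True  # accept; deception probability <= 2**-rounds
```

With `rounds=100` this reproduces the abstract's 2<sup>−100</sup> deception bound; the Verifier never needs the Prover's computational power, only cheap randomized spot-checks — the asymmetry the article generalizes to AGI<sup>n</sup> ↔ AGI<sup>n+1</sup> interaction.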
Main Author: | Kristen Carlson |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-10-01 |
Series: | Philosophies |
Subjects: | artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems |
Online Access: | https://www.mdpi.com/2409-9287/6/4/83 |
_version_ | 1797225577990586368 |
---|---|
author | Kristen Carlson |
author_facet | Kristen Carlson |
author_sort | Kristen Carlson |
collection | DOAJ |
description | Methods are currently lacking to <i>prove</i> artificial general intelligence (AGI) safety. An AGI ‘hard takeoff’ is possible, in which first generation <i>AGI<sup>1</sup></i> rapidly triggers a succession of more powerful <i>AGI<sup>n</sup></i> that differ dramatically in their computational capabilities (<i>AGI<sup>n</sup></i> << <i>AGI</i><sup><i>n</i>+1</sup>). No proof exists that AGI will benefit humans or that a sound value-alignment method exists. Numerous paths toward human extinction or subjugation have been identified. We suggest that probabilistic proof methods are the fundamental paradigm for proving safety and value-alignment between disparately powerful autonomous agents. Interactive proof systems (IPS) describe mathematical communication protocols wherein a Verifier queries a computationally more powerful Prover and reduces the probability of the Prover deceiving the Verifier to any specified low probability (e.g., 2<sup>−100</sup>). IPS procedures can test AGI behavior control systems that incorporate hard-coded ethics or value-learning methods. Mapping the axioms and transformation rules of a behavior control system to a finite set of prime numbers allows validation of ‘safe’ behavior via IPS number-theoretic methods. Many other representations are needed for proving various AGI properties. Multi-prover IPS, program-checking IPS, and probabilistically checkable proofs further extend the paradigm. <i>In toto</i>, IPS provides a way to reduce <i>AGI<sup>n</sup></i> ↔ <i>AGI</i><sup><i>n</i>+1</sup> interaction hazards to an acceptably low level. |
first_indexed | 2024-03-10T03:18:44Z |
format | Article |
id | doaj.art-ba3cad24ebcb40acb664f740fcc11398 |
institution | Directory Open Access Journal |
issn | 2409-9287 |
language | English |
last_indexed | 2024-04-24T14:11:14Z |
publishDate | 2021-10-01 |
publisher | MDPI AG |
record_format | Article |
series | Philosophies |
spelling | doaj.art-ba3cad24ebcb40acb664f740fcc11398 2024-04-03T09:22:56Z; eng; MDPI AG; Philosophies; 2409-9287; 2021-10-01; vol. 6, iss. 4, art. 83; doi:10.3390/philosophies6040083; Provably Safe Artificial General Intelligence via Interactive Proofs; Kristen Carlson (Beth Israel Deaconess Medical Center and Harvard Medical School, Harvard University, Boston, MA 02115, USA); https://www.mdpi.com/2409-9287/6/4/83; artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems |
spellingShingle | Kristen Carlson; Provably Safe Artificial General Intelligence via Interactive Proofs; Philosophies; artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems |
title | Provably Safe Artificial General Intelligence via Interactive Proofs |
title_full | Provably Safe Artificial General Intelligence via Interactive Proofs |
title_fullStr | Provably Safe Artificial General Intelligence via Interactive Proofs |
title_full_unstemmed | Provably Safe Artificial General Intelligence via Interactive Proofs |
title_short | Provably Safe Artificial General Intelligence via Interactive Proofs |
title_sort | provably safe artificial general intelligence via interactive proofs |
topic | artificial general intelligence; AGI; AI safety; AI value alignment; AI containment; interactive proof systems |
url | https://www.mdpi.com/2409-9287/6/4/83 |
work_keys_str_mv | AT kristencarlson provablysafeartificialgeneralintelligenceviainteractiveproofs |