Computationally bounded rationality from three perspectives: precomputation, regret tradeoffs, and lifelong learning

Bibliographic Details
Main Author: Orton, T
Other Authors: Santhanam, R
Format: Thesis
Language: English
Published: 2023
Description
Summary: What does it mean for a computer program to be optimal? Many fields in optimal decision making, from game theory to Bayesian decision theory, define optimal solutions that can be computationally intractable to implement or find. This is problematic, because it means these solutions are sometimes not physically realizable. To address this problem, bounded rationality studies what it means to behave optimally subject to constraints on processing time, memory, and knowledge. This thesis contributes three new models for studying bounded rationality in different contexts.

The first model considers games like chess. We suppose each player can spend some time before the game precomputing (memorizing) strong moves from an oracle, but has limited memory in which to store these moves. We show how to analytically quantify how randomly optimal strategies play in equilibrium, and give polynomial-time algorithms for computing a best response and an ε-Nash equilibrium. We use the best-response algorithm to empirically evaluate the chess-playing program Stockfish.

The second model takes place in the setting of adversarial online learning. Here, an algorithm receives new problems online and is given a computational budget to run B problem solvers on each problem. We show how to trade off the budget B for a strengthening of the algorithm's regret guarantee in both the full and semi-bandit feedback settings. We then show how this tradeoff implies new results for Online Submodular Function Maximization (OSFM) (Streeter and Golovin, 2008) and Linear Programming, and use these observations to derive and benchmark a new algorithm for OSFM.

The third model approaches bounded rationality from the perspective of lifelong learning (Chen and Liu, 2018). Instead of modelling the final solution, lifelong learning models how a computationally bounded agent can accumulate knowledge over time and attempt to solve the tractable subproblems it encounters. We develop models for incrementally accumulating and learning knowledge in a domain-agnostic setting, and use these models to give an abstract framework for a lifelong reinforcement learner. The framework takes a step towards retaining analytical performance guarantees while still being able to use black-box techniques, such as neural networks, which may perform well in practice.
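As a rough picture of the first model, a memory-bounded player might fill a size-limited book with oracle moves for the positions judged most likely to arise, falling back to a weaker engine elsewhere. This is a sketch only; the functions oracle_move and weak_engine_move and the position-likelihood weights are illustrative assumptions, not the thesis's construction.

    # Illustrative sketch: build a size-limited "book" of oracle moves.
    # oracle_move, weak_engine_move and position_weights are hypothetical.
    def precompute_book(oracle_move, position_weights, memory_m):
        # Memorize oracle answers for the memory_m most likely positions.
        likely = sorted(position_weights, key=position_weights.get, reverse=True)
        return {pos: oracle_move(pos) for pos in likely[:memory_m]}

    def play(position, book, weak_engine_move):
        # Use a memorized strong move if available, otherwise compute one.
        return book.get(position) or weak_engine_move(position)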
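The budget-for-regret tradeoff of the second model can be pictured with a minimal multiplicative-weights toy under full feedback. This is an assumption-laden sketch, not the thesis's algorithm: each round the learner runs the B highest-weighted of K solvers, keeps the best output, and updates every executed solver.

    import math

    class BudgetedWeights:
        # Toy sketch: multiplicative weights over K solvers, running B per round.
        def __init__(self, K, B, eta=0.1):
            self.w = [1.0] * K
            self.B, self.eta = B, eta

        def select(self):
            # Spend the budget on the B currently highest-weighted solvers.
            return sorted(range(len(self.w)), key=lambda i: -self.w[i])[:self.B]

        def update(self, executed, losses):
            # losses[i] in [0, 1] for each solver i that was actually run.
            for i in executed:
                self.w[i] *= math.exp(-self.eta * losses[i])

Intuitively, a larger B lets the learner hedge across more solvers on each problem; quantifying how much this strengthens the regret guarantee is the kind of tradeoff the thesis studies.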
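For the OSFM results, a natural offline reference point is the classical greedy rule for monotone submodular maximization (which achieves a (1 - 1/e) approximation), the benchmark that online algorithms in the Streeter and Golovin (2008) setting compete against. A minimal sketch with illustrative names, assuming k is at most the number of jobs:

    def greedy(jobs, f, k):
        # Greedily build a k-element schedule, adding whichever job gives
        # the largest marginal gain f(S + [j]) - f(S) at each step.
        S = []
        for _ in range(k):
            best = max((j for j in jobs if j not in S),
                       key=lambda j: f(S + [j]) - f(S))
            S.append(best)
        return S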