Summary: | People leverage the compositional nature of their environment to generalize to new scenarios. For example, if you understand the meaning of the verb "to sing" and the adverb "loudly," then you can determine the meaning of the novel phrase "to sing loudly" from these known components. This process is known as generalization through systematic compositionality. Developing agents that can use systematic compositionality to generalize to new conditions has been a long-standing problem in AI, and grounded benchmarks have been developed to evaluate an agent's ability to generalize in this way. However, current grounded benchmarks have key problems. First, they are ad hoc: they propose sets of tasks without any underlying formalism, so it is difficult to determine whether those tasks exhaustively cover the space of possible generalizations, and this lack of structure also makes it hard to compare benchmarks concretely. Second, their environments are defined by a fixed set of rules and a small set of objects whose states can be changed. Because the rules are fixed, these benchmarks overlook a critical capability agents will need in the real world: understanding and manipulating the rules themselves. Our approach to addressing these issues is twofold. First, we define a formalism for studying generalization mathematically as a function of the environment's architecture. Second, we use this formalism to create a novel type of generalization benchmark in which agents must learn to change the rules of their environment. Finally, we run both supervised learning and reinforcement learning models on a small subset of the benchmark tasks to validate our environment and to pinpoint key conditions under which agents fail to generalize.