Implementing Actor-Critic with Large Language Models


Abstract

While large language models (LLMs) have shown impressive capabilities across numerous tasks, they often struggle with interactive decision-making when used as actor-only methods, i.e., directly generating and selecting actions based on previous trajectories. This struggle arises because LLMs generate textual actions auto-regressively and do not perform explicit long-term planning, which is often necessary for decision-making tasks. Hence, recent work turns to critic-only methods, which repurpose other LLMs as critics that evaluate each action candidate through planning and simulation, then select the action with the best estimated value. However, both actor-only and critic-only methods ignore the interrelation between actor and critic, prioritize one over the other, and insufficiently exploit the valuable knowledge in both for decision-making. To address this problem, we propose to integrate prior actor-only and critic-only methods in a way that combines the merits of the actor-critic algorithm with the strengths of LLMs. Specifically, we design two novel critics that exploit the strong prior knowledge in LLMs and integrate them with the actor via in-context learning and by solving an optimization problem, respectively, during different decision-making phases. Empirically, we apply our approach to a diverse set of decision-making tasks that cover both a high-level action space (ALFWorld) and a low-level action space (BabyAI-Text). Our method outperforms other state-of-the-art baselines using the same 7B/8B open-source LLMs and even exceeds ReAct with GPT-4 in most settings.
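For intuition, here is a minimal Python sketch of the decision loop the abstract contrasts: an LLM actor proposes candidate actions, and an LLM critic scores each one, so the critic's evaluation (rather than the actor's raw sampling alone) selects the executed action. The function names, prompts, candidate count, and 0-10 scoring scale are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch of an LLM actor-critic decision step.
# `actor` and `critic` stand in for any chat-style LLM call
# (e.g., a 7B/8B open-source model): prompt in, completion out.
from typing import Callable, List

LLM = Callable[[str], str]  # prompt -> completion

def propose_actions(actor: LLM, trajectory: str, k: int = 5) -> List[str]:
    """Actor: sample k candidate actions conditioned on the trajectory so far."""
    return [actor(f"Trajectory:\n{trajectory}\nNext action:") for _ in range(k)]

def score_action(critic: LLM, trajectory: str, action: str) -> float:
    """Critic: ask the LLM to rate a candidate's long-term usefulness,
    exploiting its prior knowledge instead of learned value weights."""
    reply = critic(
        f"Trajectory:\n{trajectory}\n"
        f"Candidate action: {action}\n"
        "Rate the long-term usefulness of this action from 0 to 10:"
    )
    try:
        return float(reply.strip().split()[0])
    except (ValueError, IndexError):
        return 0.0  # unparseable critic output gets the lowest score

def act(actor: LLM, critic: LLM, trajectory: str) -> str:
    """Actor-critic step: execute the candidate the critic values most."""
    candidates = propose_actions(actor, trajectory)
    return max(candidates, key=lambda a: score_action(critic, trajectory, a))
```

In the paper's full method, two critics are integrated with the actor via in-context learning and an optimization problem during different decision-making phases; the argmax selection above is only the simplest stand-in for that integration.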

Publication
Submitted to the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024)
Heng Dong (董恒)
Ph.D. Student

My research interests include reinforcement learning, robot design, embodied AI, and multi-agent systems.