AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Nan Sun1*, Bo Mao2*, Yongchang Li1*, Di Guo2, Huaping Liu1
1Department of Computer Science and Technology, Tsinghua University 2School of Artificial Intelligence, Beijing University of Posts and Telecommunications
*Equal Contribution


AssistantX: an overview video of the whole system, covering motivation, problem formulation, framework, datasets, and example demonstrations.

Abstract

Current service robots suffer from limited natural language communication abilities, heavy reliance on predefined commands, the need for ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments, resulting in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for accurate, autonomous operation in real-world scenarios. AssistantX employs a multi-agent framework of four specialized LLM agents dedicated to perception, planning, decision-making, and reflective review, respectively, providing advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. To validate AssistantX, we built a dataset of 280 real-world tasks, each comprising the instruction content and status information on whether the relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. The results demonstrate the effectiveness of the proposed framework, showing that AssistantX can reactively respond to user instructions, actively adjust its strategies to cope with contingencies, and proactively seek assistance from humans to ensure successful task completion.

Problem Formulation


When AssistantX receives an instruction, it proactively perceives both virtual (cyber) and real-world environmental information. From this, it generates cyber tasks TC and real-world tasks TR and executes them concurrently, much as a human assistant would.
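To make the split concrete, here is a minimal Python sketch of how an instruction could be decomposed into cyber tasks TC and real-world tasks TR and executed concurrently; the Task class, the task contents, and the executor are illustrative assumptions, not the actual implementation.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Task:
    description: str
    kind: str  # "cyber" (e.g. send a message) or "real" (e.g. navigate, deliver)


async def execute(task: Task) -> str:
    # Placeholder executor: a cyber task would call messaging APIs,
    # a real-world task would drive the robot's navigation stack.
    await asyncio.sleep(0)  # stand-in for actual asynchronous work
    return f"done ({task.kind}): {task.description}"


async def handle_instruction(instruction: str) -> list[str]:
    # Hypothetical decomposition of the instruction into T_C and T_R.
    cyber_tasks = [Task("ask Mao in the group chat whether he can print", "cyber")]
    real_tasks = [Task("navigate to the printer and collect the document", "real")]
    # Cyber and real-world tasks are executed concurrently, as in the figure.
    return list(await asyncio.gather(*(execute(t) for t in cyber_tasks + real_tasks)))


if __name__ == "__main__":
    print(asyncio.run(handle_instruction("Please get my report printed.")))
```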

AssistantX Framework


Illustration of our PPDR4X framework (the multi-agent collaboration architecture of AssistantX). In a given office scenario, PPDR4X accurately perceives the surroundings and human intentions, formulating comprehensive plans based on user instructions. It can also autonomously execute tasks and engage in self-reflection, even when the instructions are complex and lacking in detail. PPDR4X equips AssistantX with a problem-solving mindset similar to that of a human assistant, enabling seamless integration into authentic work environments and autonomous, effective interaction with other individuals. The components of PPDR4X are the Memory Unit, Perception Agent, Planning Agent, Decision Agent, and Reflection Agent.
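As a rough sketch (not the authors' code), the five PPDR4X components could be wired together in a loop like the one below; the prompts, the llm() placeholder, and the termination rule are assumptions made for illustration.

```python
def llm(prompt: str) -> str:
    """Placeholder for a call to any chat-completion LLM."""
    raise NotImplementedError


class MemoryUnit:
    """Shared memory that every agent reads from and writes to."""

    def __init__(self):
        self.entries: list[str] = []

    def write(self, text: str) -> None:
        self.entries.append(text)

    def read(self) -> str:
        return "\n".join(self.entries)


def run_ppdr4x(instruction: str, memory: MemoryUnit, max_rounds: int = 5) -> str:
    memory.write(f"Instruction: {instruction}")
    for _ in range(max_rounds):
        # Each agent is an LLM call that reads and writes natural language,
        # which keeps the chain of reasoning interpretable.
        perception = llm(f"Summarize the current situation.\n{memory.read()}")
        plan = llm(f"Draft a step-by-step plan.\nSituation: {perception}")
        decision = llm(f"Choose the single next action to execute.\nPlan: {plan}")
        memory.write(f"Executed: {decision}")
        reflection = llm("Did the last action complete the instruction? "
                         f"Answer DONE or CONTINUE with a reason.\n{memory.read()}")
        memory.write(f"Reflection: {reflection}")
        if reflection.startswith("DONE"):
            break
    return memory.read()
```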

Details of AssistantX agents


An illustration of the inputs and outputs of the PPDR agents, showing how they collaborate to determine the next move after completing the previous task. All agents communicate in natural language, which ensures logical consistency and interpretability.
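The natural-language hand-off shown in the figure can be pictured as a message trace like the sketch below; the message contents are invented for illustration (loosely based on the printing scenario described later), not taken from the paper's logs.

```python
from dataclasses import dataclass


@dataclass
class AgentMessage:
    sender: str   # "Perception", "Planning", "Decision", or "Reflection"
    content: str  # free-form natural language, so every hop stays readable


# Hypothetical trace of one round of collaboration after a finished task.
trace = [
    AgentMessage("Perception", "Mao has not replied in the group chat for 10 minutes."),
    AgentMessage("Planning", "Find another colleague who can print, or ask the group."),
    AgentMessage("Decision", "Next move: send a printing request to Wu."),
    AgentMessage("Reflection", "Reasonable; keep the original deadline in mind."),
]

for msg in trace:
    print(f"[{msg.sender}] {msg.content}")
```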

Outputs of AssistantX agents


We demonstrate that AssistantX can reactively respond to Lee’s request and operate autonomously. When Mao is unavailable for printing, it actively searches memory for alternatives, identifying Wu. When Wu is also unavailable, AssistantX proactively seeks help in an active group chat to complete the complex task with human collaboration. Two representative inference processes showcasing the generation of proactive thoughts and behaviors are also presented.
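A minimal, rule-based sketch of that reactive-active-proactive escalation is given below; in AssistantX the behavior emerges from LLM reasoning rather than hard-coded rules, and the availability data and helper function here are hypothetical.

```python
AVAILABILITY = {"Mao": False, "Wu": False}   # assumed personnel status information
MEMORY = {"printing": ["Mao", "Wu"]}         # people remembered as able to print


def ask(recipient: str, message: str) -> str:
    # Placeholder for sending a message through the office group-chat interface.
    return f"sent to {recipient}: {message}"


def complete_printing_task(requester: str) -> str:
    # Reactive: try the person originally associated with the task.
    if AVAILABILITY.get("Mao"):
        return ask("Mao", f"Please print the document for {requester}.")
    # Active: search memory for an alternative helper (e.g. Wu).
    for person in MEMORY.get("printing", []):
        if AVAILABILITY.get(person):
            return ask(person, f"Please print the document for {requester}.")
    # Proactive: nobody is free, so seek help in the active group chat.
    return ask("group chat", f"Could anyone help print a document for {requester}?")


print(complete_printing_task("Lee"))
```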

Execution of an instruction


We illustrate the comprehensive dialogue content and execution flow of the base instruction along with its three variants, while also offering a thorough evaluation of their respective difficulty levels.

Difficulty Level 3

Difficulty Level 4

Difficulty Level 5

Difficulty Level 8

Further exploration

By integrating a multimodal perception module into our multi-agent framework, we further explored AssistantX's ability to handle more complex tasks that require interacting with environments outside the office. Below is a brief demonstration video of AssistantX ordering takeaway online and delivering it directly to the user.
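One way such a multimodal module could be slotted in is sketched below, under the assumption that camera frames are first summarized into text and then handed to the existing text-based agents; vlm_describe() and llm() are hypothetical placeholders, not the authors' API.

```python
def vlm_describe(image_bytes: bytes) -> str:
    """Placeholder for a vision-language model that captions a camera frame."""
    raise NotImplementedError


def llm(prompt: str) -> str:
    """Placeholder for the text-only LLM calls used by the PPDR4X agents."""
    raise NotImplementedError


def perceive_multimodal(instruction: str, camera_frame: bytes) -> str:
    # The visual scene is converted to text so the downstream agents stay unchanged.
    scene = vlm_describe(camera_frame)
    return llm(
        "You are the Perception Agent. Combine the instruction and the scene "
        f"description into a situation summary.\nInstruction: {instruction}\n"
        f"Scene: {scene}"
    )
```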