AssistantX: An LLM-Powered Proactive Assistant in Collaborative Human-Populated Environment

Nan Sun1*, Bo Mao2*, Yongchang Li1*, Lumeng Ma1, Di Guo2, Huaping Liu1
1Department of Computer Science and Technology, Tsinghua University 2School of Artificial Intelligence, Beijing University of Posts and Telecommunications
*Equal Contribution


Overview video of AssistantX. The video covers the motivation, problem formulation, framework, datasets, and example demonstrations of the whole system.

Abstract

The increasing demand for intelligent assistants in human-populated environments has motivated significant research in autonomous robotic systems. Traditional service robots and virtual assistants, however, struggle with real-world task execution due to their limited capacity for dynamic reasoning and interaction, particularly when human collaboration is required. Recent developments in Large Language Models have opened new avenues for improving these systems, enabling more sophisticated reasoning and natural interaction capabilities. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed to operate autonomously in a physical office environment. Unlike conventional service robots with limited reasoning capabilities, AssistantX leverages a novel multi-agent architecture, PPDR4X, which provides it with advanced inference capabilities, as well as comprehensive collaboration awareness. By effectively bridging the gap between virtual operations and physical interactions, AssistantX demonstrates robust performance in managing complex real-world scenarios. Our evaluation highlights the architecture’s effectiveness, showing that AssistantX can respond reactively to clear instructions, actively retrieve supplementary information from memory, and proactively seek collaboration from team members to ensure successful task completion.

Problem Formulation


When AssistantX receives an instruction, it proactively perceives both virtual and real-world environmental information. From this, it generates cyber tasks TC and real-world tasks TR and executes them concurrently, in a manner akin to a human assistant.
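The concurrent handling of the two task streams can be sketched as a small Python example. This is an illustrative sketch, not the paper's implementation: the task runners, task strings, and the use of two worker threads are all assumptions chosen to show how TC and TR could be executed side by side.

```python
import threading

# Hypothetical task runners; names and behavior are illustrative only.
def run_cyber_task(task: str) -> str:
    """Placeholder for a virtual-world action (e.g. sending a message)."""
    return f"cyber done: {task}"

def run_real_task(task: str) -> str:
    """Placeholder for a physical action (e.g. navigating to a desk)."""
    return f"real done: {task}"

def execute_concurrently(cyber_tasks, real_tasks):
    """Run the cyber queue TC and the real-world queue TR in parallel,
    mirroring how a human assistant interleaves both kinds of work."""
    results = []
    lock = threading.Lock()

    def worker(tasks, runner):
        for t in tasks:
            r = runner(t)
            with lock:          # results list is shared between threads
                results.append(r)

    threads = [
        threading.Thread(target=worker, args=(cyber_tasks, run_cyber_task)),
        threading.Thread(target=worker, args=(real_tasks, run_real_task)),
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results
```

The two queues finish in nondeterministic interleaved order, which is why the sketch collects results under a lock rather than assuming a sequence.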

AssistantX Framework


Illustration of our PPDR4X framework (the multi-agent collaboration architecture of AssistantX). In a given office scenario, PPDR4X accurately perceives the surroundings and human intentions, formulating comprehensive plans based on user instructions. It can also autonomously execute tasks and engage in self-reflection, even when the instructions are complex and lacking in detail. PPDR4X equips AssistantX with a problem-solving mindset similar to that of a human assistant, facilitating seamless integration into authentic work environments for autonomous and effective interaction with other individuals. The components of PPDR4X are the Memory Unit, Perception Agent, Planning Agent, Decision Agent, and Reflection Agent.
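The perceive-plan-decide-reflect cycle described above can be sketched as a minimal control loop. The class and function interfaces below are assumptions for illustration: the paper names the five components but does not specify their APIs, and the stubbed agent logic stands in for the LLM-backed reasoning of the real system.

```python
# Minimal sketch of the PPDR4X control loop; interfaces are hypothetical.

class MemoryUnit:
    """Stores past observations and outcomes for later retrieval."""
    def __init__(self):
        self.entries = []

    def store(self, record):
        self.entries.append(record)

    def retrieve(self):
        return list(self.entries)

def perceive(instruction, memory):
    # Perception Agent: fuse the instruction with remembered context.
    return {"instruction": instruction, "context": memory.retrieve()}

def plan(observation):
    # Planning Agent: decompose the observation into ordered subtasks.
    return [f"step: {observation['instruction']}"]

def decide(steps):
    # Decision Agent: execute each step and record its outcome.
    return [(s, "success") for s in steps]

def reflect(outcomes):
    # Reflection Agent: judge whether the task succeeded or needs replanning.
    return all(status == "success" for _, status in outcomes)

def ppdr4x_loop(instruction, memory, max_rounds=3):
    """Iterate perceive -> plan -> decide -> reflect until success."""
    outcomes = []
    for _ in range(max_rounds):
        obs = perceive(instruction, memory)
        steps = plan(obs)
        outcomes = decide(steps)
        memory.store({"steps": steps, "outcomes": outcomes})
        if reflect(outcomes):
            return outcomes  # task completed
    return outcomes  # unresolved after max_rounds
```

The loop writes every round to the Memory Unit before reflecting, so a later round (or a later instruction) can retrieve earlier context — the mechanism the paper describes as actively retrieving supplementary information from memory.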

Execution of an instruction


We illustrate the full dialogue content and execution flow of the base instruction and its three variants, along with an assessment of their respective difficulty levels.

Difficulty Level 3

Difficulty Level 4

Difficulty Level 5

Difficulty Level 8

Further exploration

By integrating a multimodal perception module into our multi-agent framework, we further explored the ability of AssistantX to handle more complex tasks that require interacting with environments outside the office. Below is a brief demonstration video of AssistantX ordering takeaway online and delivering it directly to the user.