Current service robots suffer from limited natural-language communication abilities, heavy reliance on predefined commands, ongoing human intervention, and, most notably, a lack of proactive collaboration awareness in human-populated environments, which results in narrow applicability and low utility. In this paper, we introduce AssistantX, an LLM-powered proactive assistant designed for autonomous operation in real-world scenarios with high accuracy. AssistantX employs a multi-agent framework consisting of four specialized LLM agents, dedicated respectively to perception, planning, decision-making, and reflective review, which together provide advanced inference capabilities and comprehensive collaboration awareness, much like a human assistant by your side. To validate AssistantX, we built a dataset of 280 real-world tasks, each comprising the instruction content and status information on whether the relevant personnel are available. Extensive experiments were conducted in both text-based simulations and a real office environment over the course of a month and a half. The results demonstrate the effectiveness of the proposed framework: AssistantX can reactively respond to user instructions, actively adjust its strategy to cope with contingencies, and proactively seek assistance from humans to ensure successful task completion.
Upon receiving an instruction, AssistantX proactively perceives both virtual and real-world environmental information, from which it generates cyber tasks TC and real-world tasks TR and executes them concurrently, in a manner akin to a human assistant.
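The concurrent execution of cyber and real-world tasks described above can be sketched as follows; the function names and task payloads are illustrative placeholders, not the system's actual interfaces.

```python
# Minimal sketch, assuming cyber tasks (e.g. sending messages) and
# real-world tasks (e.g. robot navigation) can run in parallel threads.
from concurrent.futures import ThreadPoolExecutor

def execute_cyber_task(task):
    # Placeholder for a cyber action such as querying a chat platform.
    return f"cyber done: {task}"

def execute_real_task(task):
    # Placeholder for a physical action such as moving to a waypoint.
    return f"real done: {task}"

def run_concurrently(cyber_tasks, real_tasks):
    # Cyber tasks TC and real-world tasks TR proceed in parallel,
    # much as a human assistant can type a message while walking.
    with ThreadPoolExecutor() as pool:
        cyber = [pool.submit(execute_cyber_task, t) for t in cyber_tasks]
        real = [pool.submit(execute_real_task, t) for t in real_tasks]
        return [f.result() for f in cyber + real]
```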
Illustration of our PPDR4X framework (the multi-agent collaboration architecture of AssistantX). In a given office scenario, PPDR4X accurately perceives the surroundings and human intentions, thereby formulating comprehensive plans based on user instructions. It can also autonomously execute tasks and engage in self-reflection, even when the instructions are complex and lacking in detail. PPDR4X equips AssistantX with a problem-solving mindset similar to that of a human assistant, facilitating seamless integration into authentic work environments for autonomous and effective interaction with other individuals. The components of PPDR4X are the Memory Unit, Perception Agent, Planning Agent, Decision Agent, and Reflection Agent.
An illustration of the inputs and outputs of the PPDR4X agents, showing how they collaborate to determine the next move after the previous task. All agents communicate in natural language, ensuring logical consistency and interpretability.
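The agent collaboration described above can be sketched as a simple pipeline; the `llm()` helper and all prompts here are hypothetical stand-ins for the paper's actual prompting, shown only to illustrate that the agents pass plain natural-language strings to one another.

```python
# Hypothetical sketch of one PPDR4X step, assuming each agent is a
# prompted LLM call; llm() is a deterministic stub for illustration.

def llm(prompt: str) -> str:
    # Stand-in for a real LLM call: echoes the content after the colon.
    return prompt.split(":")[-1].strip()

def perception(instruction, observation):
    return llm(f"Summarize intent and environment: {instruction}; {observation}")

def planning(perception_out):
    return llm(f"Draft a step-by-step plan for: {perception_out}")

def decision(plan, memory):
    return llm(f"Given memory {memory}, choose the next action: {plan}")

def ppdr_step(instruction, observation, memory):
    # Every intermediate result is a natural-language string, so the
    # full chain of reasoning stays inspectable and interpretable.
    p = perception(instruction, observation)
    plan = planning(p)
    return decision(plan, memory)
```

A Reflection Agent would review each action's outcome in the same natural-language form before the next step; it is omitted here to keep the sketch minimal.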
We demonstrate that AssistantX can reactively respond to Lee’s request and operate autonomously. When Mao is unavailable for printing, it actively searches memory for alternatives, identifying Wu. When Wu is also unavailable, AssistantX proactively seeks help in an active group chat to complete the complex task with human collaboration. Two representative inference processes showcasing the generation of proactive thoughts and behaviors are also presented.
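The reactive-to-proactive escalation in this example can be sketched as below; the names Mao and Wu come from the scenario above, while the availability map and group-chat callback are illustrative assumptions about how memory and messaging are queried.

```python
# Minimal sketch of the escalation logic: try each known candidate from
# memory, then proactively ask a group chat if everyone is unavailable.

def escalate(task, candidates, availability, post_to_group_chat):
    # Reactively try each candidate; availability stands in for the
    # personnel status information stored in the Memory Unit.
    for person in candidates:
        if availability.get(person, False):
            return f"{person} handles: {task}"
    # Proactively seek human collaboration when no candidate is free.
    post_to_group_chat(f"Anyone available to help with: {task}?")
    return "escalated to group chat"
```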
We illustrate the complete dialogue content and execution flow of the base instruction and its three variants, together with a thorough evaluation of their respective difficulty levels.
Difficulty Level 3
Difficulty Level 4
Difficulty Level 5
Difficulty Level 8
By integrating a multimodal perception module into our multi-agent framework, we further explored the ability of AssistantX to handle more complex tasks that require interaction with environments outside the office. Here is a brief demonstration video of ordering takeaway online and delivering it directly to the user.