White, I., Nottingham, K., Maniar, A., Robinson, M., Lillemark, H., Maheshwari, M., ... & Ammanabrolu, P. (2025). Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning. arXiv preprint arXiv:2504.17950.
This paper introduces MINDcraft, a Minecraft-based platform for studying collaborative embodied reasoning in Large Language Models (LLMs). It also introduces MineCollab, a benchmark of cooking, crafting, and construction tasks that require inter-agent coordination. The results show that LLMs exhibit some embodied reasoning capabilities but struggle to communicate efficiently when collaborating: performance drops by up to 15% when agents must share detailed task-completion plans. The authors conclude that current LLMs are poorly optimized for multi-agent embodied collaboration and identify natural-language communication as a bottleneck for coordination. They argue that improving collaborative ability will require methods beyond in-context and imitation learning. The study also shows how MINDcraft and MineCollab can generate datasets for fine-tuning smaller, less computationally intensive models and can support future research on multi-agent embodied AI.