Active Semantic Mapping for Household Robots: Rapid Indoor Adaptation and Reduced User Burden

1Ritsumeikan University, 2Soka University, 3Kyoto University
Accepted at IEEE SMC 2023.

*Corresponding Author

Demonstration Video

Abstract

Active semantic mapping is essential for service robots to quickly capture both the map of an environment and its spatial meaning, while also minimizing the burden on users during robot operation and data collection. SpCoSLAM, a method of semantic mapping with place categorization and simultaneous localization and mapping (SLAM), is well suited to environmental adaptation, as it is not limited to predefined labels. However, SpCoSLAM presents two issues that increase the burden on users: 1) users struggle to efficiently determine a destination for the robot’s quick adaptation, and 2) providing instructions to the robot becomes repetitive and cumbersome. To address these challenges, we propose Active-SpCoSLAM, which enables the robot to actively explore uncharted areas and employs CLIP for image captioning to provide a flexible vocabulary that replaces human instructions. The robot determines its actions by calculating information gain integrated from both semantics and SLAM uncertainties. We conducted experiments in a simulated environment, comparing the proposed method to other methods in terms of efficiency and applicability to object discovery tasks. Additionally, we tested the proposed method, which combines user instruction and CLIP, in a real environment. Our results demonstrated that the robot explored its environment with approximately five fewer iterations and 11 minutes faster compared to the case of random exploration. Moreover, our method achieved a higher success rate in object discovery tasks during earlier stages of learning compared to other methods. In conclusion, the proposed method rapidly covers an environment while gathering useful data for object discovery tasks, thus reducing the burden on users and enhancing the robot’s adaptability.

Graphical Model

image of model

Focused task in this study. (Top) The robot performs SLAM to estimate self-position and map simultaneously. (Bottom) The robot acquires categorical knowledge (spatial concepts) about locations based on multimodal information observed by the robot: image features, image captions, and self-location.

Action Determination

The robot navigates to the position with the maximum information gain among the candidate points. The IG in Active-SpCoSLAM consists of a weighted sum of three IGs: related to spatial concepts, self-location, and maps, respectively.

Learning Spatial Concept

First, the robot acquires multiple images of its surroundings with the head camera. Then, the images are input to ClipCap for caption generation, and the images are input to CNN to obtain image features. The robot learns location concepts by categorizing these, plus self-position, into three multimodal pieces of information.

Video Presentation

BibTeX


      @inproceedings{ishikawa2023active,
        title={Active semantic mapping for household robots: rapid indoor adaptation and reduced user burden},
        author={Ishikawa, Tomochika and Taniguchi, Akira and Hagiwara, Yoshinobu and Taniguchi, Tadahiro},
        booktitle={2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC)},
        pages={3116--3123},
        year={2023},
        organization={IEEE}
      }
    

Funding

This work was supported by the Japan Science and Technology Agency (JST), Moonshot Research & Development Program (Grant Number JPMJMS2011), and the Japan Society for the Promotion of Science (JSPS), KAKENHI Grant Number JP20K19900, JP23K16975 and JP22K12212.