
Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Mariia Khan
  • Yue Qiu
  • Yuren Cong
  • Bodo Rosenhahn

Research Organisations

External Research Organisations

  • Edith Cowan University
  • National Institute of Advanced Industrial Science and Technology

Details

Original language: English
Title of host publication: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Pages: 9777-9783
Number of pages: 7
ISBN (electronic): 979-8-3503-7770-5
Publication status: Published - 14 Oct 2024
Event: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024 - Abu Dhabi, United Arab Emirates
Duration: 14 Oct 2024 to 18 Oct 2024

Publication series

Name: IEEE International Conference on Intelligent Robots and Systems
ISSN (Print): 2153-0858
ISSN (electronic): 2153-0866

Abstract

Understanding scene changes is crucial for embodied AI applications such as visual room rearrangement, where the agent must revert changes by restoring objects to their original locations or states. Understanding the visual changes between two scenes, pre- and post-rearrangement, encompasses two tasks: scene change detection (locating changes) and image difference captioning (describing changes). Previous methods, which focus on sequential 2D images, have addressed these tasks separately, although combining them is important. We therefore propose a new Scene Change Understanding (SCU) task for simultaneous change detection and description. Moreover, we go beyond generating language descriptions of changes and aim to generate rearrangement instructions that a robotic agent can follow to revert them. To solve this task, we propose a novel method, EmbSCU, which compares instance-level masks of changed objects (for 53 frequently seen indoor object classes) before and after changes and generates rearrangement language instructions for the agent. EmbSCU is built on our Segment Any Object Model (SAOMv2), a fine-tuned version of the Segment Anything Model (SAM) adapted to obtain instance-level object masks for both foreground and background objects in indoor embodied environments. EmbSCU is evaluated on our own dataset of sequential 2D image pairs before and after changes, collected from the Ai2Thor simulator. The proposed framework achieves promising results in both change detection and change description. Moreover, EmbSCU shows positive generalization results on real-world scenes without using any real-life data during training. The dataset and the code are available here.
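
To make the idea of comparing instance-level masks before and after a change more concrete, the following is a minimal sketch and not the authors' EmbSCU implementation. It assumes the masks have already been produced by some segmentation model and are given as sets of pixel coordinates keyed by object label. The IoU threshold, the change categories, the instruction templates, and all function names are hypothetical choices made for illustration only. State changes (for example, an opened drawer) are not covered by this sketch.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]        # (row, col) of one mask pixel
Masks = Dict[str, Set[Pixel]]  # assumption: one instance mask per object label


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_changes(before: Masks, after: Masks,
                   thresh: float = 0.5) -> List[Tuple[str, str]]:
    """Return (label, change_type) pairs in alphabetical label order.

    An object counts as "moved" when its before/after masks overlap
    with IoU below `thresh` (a hypothetical threshold).
    """
    changes = []
    for label in sorted(set(before) | set(after)):
        if label not in after:
            changes.append((label, "removed"))
        elif label not in before:
            changes.append((label, "added"))
        elif iou(before[label], after[label]) < thresh:
            changes.append((label, "moved"))
    return changes


# Hypothetical instruction templates, one per change type.
TEMPLATES = {
    "moved": "Move the {obj} back to its original location.",
    "removed": "Put the {obj} back where it was.",
    "added": "Take the {obj} out of the scene.",
}


def revert_instructions(changes: List[Tuple[str, str]]) -> List[str]:
    """Turn detected changes into plain-language revert instructions."""
    return [TEMPLATES[kind].format(obj=label) for label, kind in changes]


# Toy scene: the mug moves, the book stays put, a plate appears.
before = {"mug": {(0, 0), (0, 1)}, "book": {(5, 5), (5, 6)}}
after = {"mug": {(3, 3), (3, 4)}, "book": {(5, 5), (5, 6)}, "plate": {(8, 8)}}

changes = detect_changes(before, after)
for line in revert_instructions(changes):
    print(line)
```

In this toy version the description step is a fixed template lookup. The abstract describes generated language instructions built on SAOMv2 masks, so the sketch illustrates only the detect-then-instruct structure of the task, not the method itself.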


Cite this

Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change. / Khan, Mariia; Qiu, Yue; Cong, Yuren et al.
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2024. p. 9777-9783 (IEEE International Conference on Intelligent Robots and Systems).

Khan, M, Qiu, Y, Cong, Y, Rosenhahn, B, Suter, D & Abu-Khalaf, J 2024, Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change. in 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE International Conference on Intelligent Robots and Systems, pp. 9777-9783, 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Abu Dhabi, United Arab Emirates, 14 Oct 2024. https://doi.org/10.1109/IROS58592.2024.10801354
Khan, M., Qiu, Y., Cong, Y., Rosenhahn, B., Suter, D., & Abu-Khalaf, J. (2024). Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 9777-9783). (IEEE International Conference on Intelligent Robots and Systems). https://doi.org/10.1109/IROS58592.2024.10801354
Khan M, Qiu Y, Cong Y, Rosenhahn B, Suter D, Abu-Khalaf J. Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change. In 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2024. p. 9777-9783. (IEEE International Conference on Intelligent Robots and Systems). doi: 10.1109/IROS58592.2024.10801354
Khan, Mariia ; Qiu, Yue ; Cong, Yuren et al. / Indoor Scene Change Understanding (SCU) : Segment, Describe, and Revert Any Change. 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2024. pp. 9777-9783 (IEEE International Conference on Intelligent Robots and Systems).
@inproceedings{75ef6d7ab6c84165afdadde2be231b9f,
title = "Indoor Scene Change Understanding (SCU): Segment, Describe, and Revert Any Change",
abstract = "Understanding of scene changes is crucial for embodied AI applications, such as visual room rearrangement, where the agent must revert changes by restoring the objects to their original locations or states. Visual changes between two scenes, pre- and post-rearrangement, encompass two tasks: scene change detection (locating changes) and image difference captioning (describing changes). While previous methods, focused on sequential 2D images, have addressed these tasks separately, it is essential to emphasize the significance of their combination. Therefore, we propose a new Scene Change Understanding (SCU) task for simultaneous change detection and description. Moreover, we go beyond change language description generation and aim to generate rearrangement instructions for the robotic agent to revert changes. To solve this task, we propose a novel method - EmbSCU, which allows to compare instance-level change object masks (for 53 frequently-seen indoor object classes) before and after changes and generate rearrangement language instructions for the agent. EmbSCU is built on our Segment Any Object Model (SAOMv2) - a fine-tuned version of Segment Anything Model (SAM), adapted to obtain instance-level object masks for both foreground and background objects in indoor embodied environments. EmbSCU is evaluated on our own dataset of sequential 2D image pairs before and after changes, collected from the Ai2Thor simulator. The proposed framework achieves promising results in both change detection and change description. Moreover, EmbSCU demonstrates positive generalization results on real-world scenes without using any real-life data during training. The dataset and the code are available here.",
author = "Mariia Khan and Yue Qiu and Yuren Cong and Bodo Rosenhahn and David Suter and Jumana Abu-Khalaf",
note = "Publisher Copyright: {\textcopyright} 2024 IEEE.; 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024 ; Conference date: 14-10-2024 Through 18-10-2024",
year = "2024",
month = oct,
day = "14",
doi = "10.1109/IROS58592.2024.10801354",
language = "English",
isbn = "979-8-3503-7771-2",
series = "IEEE International Conference on Intelligent Robots and Systems",
pages = "9777--9783",
booktitle = "2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)",

}

TY - GEN

T1 - Indoor Scene Change Understanding (SCU)

T2 - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024

AU - Khan, Mariia

AU - Qiu, Yue

AU - Cong, Yuren

AU - Rosenhahn, Bodo

AU - Suter, David

AU - Abu-Khalaf, Jumana

N1 - Publisher Copyright: © 2024 IEEE.

PY - 2024/10/14

Y1 - 2024/10/14

AB - Understanding of scene changes is crucial for embodied AI applications, such as visual room rearrangement, where the agent must revert changes by restoring the objects to their original locations or states. Visual changes between two scenes, pre- and post-rearrangement, encompass two tasks: scene change detection (locating changes) and image difference captioning (describing changes). While previous methods, focused on sequential 2D images, have addressed these tasks separately, it is essential to emphasize the significance of their combination. Therefore, we propose a new Scene Change Understanding (SCU) task for simultaneous change detection and description. Moreover, we go beyond change language description generation and aim to generate rearrangement instructions for the robotic agent to revert changes. To solve this task, we propose a novel method - EmbSCU, which allows to compare instance-level change object masks (for 53 frequently-seen indoor object classes) before and after changes and generate rearrangement language instructions for the agent. EmbSCU is built on our Segment Any Object Model (SAOMv2) - a fine-tuned version of Segment Anything Model (SAM), adapted to obtain instance-level object masks for both foreground and background objects in indoor embodied environments. EmbSCU is evaluated on our own dataset of sequential 2D image pairs before and after changes, collected from the Ai2Thor simulator. The proposed framework achieves promising results in both change detection and change description. Moreover, EmbSCU demonstrates positive generalization results on real-world scenes without using any real-life data during training. The dataset and the code are available here.

UR - http://www.scopus.com/inward/record.url?scp=85216500891&partnerID=8YFLogxK

U2 - 10.1109/IROS58592.2024.10801354

DO - 10.1109/IROS58592.2024.10801354

M3 - Conference contribution

AN - SCOPUS:85216500891

SN - 979-8-3503-7771-2

T3 - IEEE International Conference on Intelligent Robots and Systems

SP - 9777

EP - 9783

BT - 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Y2 - 14 October 2024 through 18 October 2024

ER -
