Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Oliver Beren Kaul
  • Kersten Behrens
  • Michael Rohs

Details

Original language: English
Title of host publication: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021
Place of Publication: New York, NY, USA
Publisher: Association for Computing Machinery (ACM)
Pages: 1-7
ISBN (electronic): 9781450380959
Publication status: Published - 8 May 2021
Event: CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths (CHI EA 2021) - Virtual, Online, Japan
Duration: 8 May 2021 - 13 May 2021

Publication series

Name: Conference on Human Factors in Computing Systems - Proceedings

Abstract

People with visual impairments face challenges in scene and object recognition, especially in unknown environments. We combined the mobile scene detection framework Apple ARKit with MobileNet-v2 and 3D spatial audio to provide an auditory scene description to people with visual impairments. The combination of ARKit and MobileNet allows keeping recognized objects in the scene even if the user turns away from the object. An object can thus serve as an auditory landmark. With a search function, the system can even guide the user to a particular item. The system also provides spatial audio warnings for nearby objects and walls to avoid collisions. We evaluated the implemented app in a preliminary user study. The results show that users can find items without visual feedback using the proposed application. The study also reveals that the range of local object detection through MobileNet-v2 was insufficient, which we aim to overcome using more accurate object detection frameworks in future work (YOLOv5x).
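The search function described above steers the user toward a spatially anchored object using directional audio. The following is an illustrative sketch of that idea only, not the authors' implementation (which runs on iOS with Apple ARKit and 3D audio): the function names, the ground-plane coordinate convention, and the simple constant-power panning law are all assumptions made for the example.

```python
import math

def relative_bearing(user_pos, user_yaw, obj_pos):
    """Angle from the user's facing direction to the object, in radians.

    user_pos, obj_pos: (x, z) positions on the ground plane.
    user_yaw: heading in radians; 0 means facing the +z axis,
              positive yaw turns toward +x (to the right).
    Returns a bearing in (-pi, pi]; positive = object to the user's right.
    """
    dx = obj_pos[0] - user_pos[0]
    dz = obj_pos[1] - user_pos[1]
    angle_to_obj = math.atan2(dx, dz)
    # Normalize the difference into (-pi, pi].
    return (angle_to_obj - user_yaw + math.pi) % (2 * math.pi) - math.pi

def stereo_gains(bearing):
    """Constant-power pan: map a bearing to (left, right) channel gains."""
    # Clamp to +/- 90 degrees so objects behind the user pan fully to one side.
    pan = max(-1.0, min(1.0, bearing / (math.pi / 2)))  # -1 = left, +1 = right
    theta = (pan + 1.0) * math.pi / 4.0                 # 0 .. pi/2
    return math.cos(theta), math.sin(theta)
```

An object straight ahead yields equal left/right gains, while an object 90 degrees to the right silences the left channel; as the user turns, the cue shifts smoothly, which is what lets an anchored object act as an auditory landmark.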

Keywords

    Accessibility, Auditory Scene Description, Collision Warnings, Mobile Scene Recognition, Object Detection, Spatial Audio, Text-to-Speech, Visually Impaired


Cite this

Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments. / Kaul, Oliver Beren; Behrens, Kersten; Rohs, Michael.
Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021. New York, NY, USA: Association for Computing Machinery (ACM), 2021. pp. 1-7, Article 394 (Conference on Human Factors in Computing Systems - Proceedings).


Kaul, OB, Behrens, K & Rohs, M 2021, Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments. in Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021., 394, Conference on Human Factors in Computing Systems - Proceedings, Association for Computing Machinery (ACM), New York, NY, USA, pp. 1-7, CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths (CHI EA 2021), Virtual, Online, Japan, 8 May 2021. https://doi.org/10.1145/3411763.3451611
Kaul, O. B., Behrens, K., & Rohs, M. (2021). Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021 (pp. 1-7). Article 394 (Conference on Human Factors in Computing Systems - Proceedings). Association for Computing Machinery (ACM). https://doi.org/10.1145/3411763.3451611
Kaul OB, Behrens K, Rohs M. Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021. New York, NY, USA: Association for Computing Machinery (ACM). 2021. p. 1-7. 394. (Conference on Human Factors in Computing Systems - Proceedings). doi: 10.1145/3411763.3451611
Kaul, Oliver Beren ; Behrens, Kersten ; Rohs, Michael. / Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021. New York, NY, USA : Association for Computing Machinery (ACM), 2021. pp. 1-7 (Conference on Human Factors in Computing Systems - Proceedings).
@inproceedings{40e4e0a96de542cc80a88c5de98672b0,
title = "Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments",
abstract = "People with visual impairments face challenges in scene and object recognition, especially in unknown environments. We combined the mobile scene detection framework Apple ARKit with MobileNet-v2 and 3D spatial audio to provide an auditory scene description to people with visual impairments. The combination of ARKit and MobileNet allows keeping recognized objects in the scene even if the user turns away from the object. An object can thus serve as an auditory landmark. With a search function, the system can even guide the user to a particular item. The system also provides spatial audio warnings for nearby objects and walls to avoid collisions. We evaluated the implemented app in a preliminary user study. The results show that users can find items without visual feedback using the proposed application. The study also reveals that the range of local object detection through MobileNet-v2 was insufficient, which we aim to overcome using more accurate object detection frameworks in future work (YOLOv5x).",
keywords = "Accessibility, Auditory Scene Description, Collision Warnings, Mobile Scene Recognition, Object Detection, Spatial Audio, Text-to-Speech, Visually Impaired",
author = "Kaul, {Oliver Beren} and Kersten Behrens and Michael Rohs",
year = "2021",
month = may,
day = "8",
doi = "10.1145/3411763.3451611",
language = "English",
series = "Conference on Human Factors in Computing Systems - Proceedings",
publisher = "Association for Computing Machinery (ACM)",
pages = "1--7",
booktitle = "Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021",
address = "United States",
note = "CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths (CHI EA 2021), CHI EA 2021 ; Conference date: 08-05-2021 Through 13-05-2021",

}


TY - GEN

T1 - Mobile Recognition and Tracking of Objects in the Environment through Augmented Reality and 3D Audio Cues for People with Visual Impairments

AU - Kaul, Oliver Beren

AU - Behrens, Kersten

AU - Rohs, Michael

PY - 2021/5/8

Y1 - 2021/5/8

N2 - People with visual impairments face challenges in scene and object recognition, especially in unknown environments. We combined the mobile scene detection framework Apple ARKit with MobileNet-v2 and 3D spatial audio to provide an auditory scene description to people with visual impairments. The combination of ARKit and MobileNet allows keeping recognized objects in the scene even if the user turns away from the object. An object can thus serve as an auditory landmark. With a search function, the system can even guide the user to a particular item. The system also provides spatial audio warnings for nearby objects and walls to avoid collisions. We evaluated the implemented app in a preliminary user study. The results show that users can find items without visual feedback using the proposed application. The study also reveals that the range of local object detection through MobileNet-v2 was insufficient, which we aim to overcome using more accurate object detection frameworks in future work (YOLOv5x).

AB - People with visual impairments face challenges in scene and object recognition, especially in unknown environments. We combined the mobile scene detection framework Apple ARKit with MobileNet-v2 and 3D spatial audio to provide an auditory scene description to people with visual impairments. The combination of ARKit and MobileNet allows keeping recognized objects in the scene even if the user turns away from the object. An object can thus serve as an auditory landmark. With a search function, the system can even guide the user to a particular item. The system also provides spatial audio warnings for nearby objects and walls to avoid collisions. We evaluated the implemented app in a preliminary user study. The results show that users can find items without visual feedback using the proposed application. The study also reveals that the range of local object detection through MobileNet-v2 was insufficient, which we aim to overcome using more accurate object detection frameworks in future work (YOLOv5x).

KW - Accessibility

KW - Auditory Scene Description

KW - Collision Warnings

KW - Mobile Scene Recognition

KW - Object Detection

KW - Spatial Audio

KW - Text-to-Speech

KW - Visually Impaired

UR - http://www.scopus.com/inward/record.url?scp=85105779312&partnerID=8YFLogxK

U2 - 10.1145/3411763.3451611

DO - 10.1145/3411763.3451611

M3 - Conference contribution

AN - SCOPUS:85105779312

T3 - Conference on Human Factors in Computing Systems - Proceedings

SP - 1

EP - 7

BT - Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, CHI EA 2021

PB - Association for Computing Machinery (ACM)

CY - New York, NY, USA

T2 - CHI Conference on Human Factors in Computing Systems: Making Waves, Combining Strengths (CHI EA 2021)

Y2 - 8 May 2021 through 13 May 2021

ER -