A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Max Heinrich Laves
  • Jens Bicker
  • Lüder A. Kahrs
  • Tobias Ortmaier

Details

Original language: English
Pages (from-to): 483-492
Number of pages: 10
Journal: International journal of computer assisted radiology and surgery
Volume: 14
Issue number: 3
Early online date: 16 Jan 2019
Publication status: Published - 14 Mar 2019

Abstract

Purpose: Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer- and robot-aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration.

Methods: Four machine learning-based methods (SegNet, UNet, ENet and ErfNet) were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning.

Results: In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7%. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning.

Conclusion: CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data.
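
The two evaluation ideas named in the abstract, mean IoU and a weighted average ensemble of two networks, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `weight_a` parameter, the `(C, H, W)` probability layout and the choice to skip classes absent from both maps are all assumptions made for illustration. The abstract gives neither the ensemble weights nor the exact averaging convention.

```python
import numpy as np

NUM_CLASSES = 7  # the abstract describes a 7-class dataset


def mean_iou(pred, target, num_classes=NUM_CLASSES):
    """Mean Intersection-over-Union between two integer label maps.

    Classes that occur in neither map are skipped (an assumption;
    other averaging conventions exist).
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))


def weighted_ensemble(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two class-probability maps of shape (C, H, W).

    Returns the per-pixel argmax label map of the combined probabilities.
    `weight_a` is a hypothetical weight for the first model; the second
    model receives 1 - weight_a.
    """
    combined = weight_a * probs_a + (1.0 - weight_a) * probs_b
    return combined.argmax(axis=0)


if __name__ == "__main__":
    # Toy 2x2 label maps with two classes in use.
    pred = np.array([[0, 1], [1, 1]])
    target = np.array([[0, 1], [0, 1]])
    print("mean IoU:", mean_iou(pred, target))

    # Toy probability maps for two models: 2 classes, 1x2 pixels.
    probs_a = np.array([[[0.9, 0.4]], [[0.1, 0.6]]])
    probs_b = np.array([[[0.2, 0.3]], [[0.8, 0.7]]])
    print("ensemble labels:", weighted_ensemble(probs_a, probs_b, weight_a=0.5))
```

In practice the two probability maps would come from the softmax outputs of the trained networks, and the weight would be chosen on held-out validation data.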

Cite this

A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation. / Laves, Max Heinrich; Bicker, Jens; Kahrs, Lüder A. et al.
In: International journal of computer assisted radiology and surgery, Vol. 14, No. 3, 14.03.2019, p. 483-492.

Laves MH, Bicker J, Kahrs LA, Ortmaier T. A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation. International journal of computer assisted radiology and surgery. 2019 Mar 14;14(3):483-492. Epub 2019 Jan 16. doi: 10.48550/arXiv.1807.06081, 10.1007/s11548-018-01910-0
BibTeX
@article{fa55ab01bf6b4af5be1bfb99809a0802,
title = "A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation",
abstract = "Purpose: Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer- and robot-aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration. Methods: Four machine learning-based methods SegNet, UNet, ENet and ErfNet were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning. Results: In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7%. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning. Conclusion: CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data.",
keywords = "Computer vision, Larynx, Machine learning, Open-access dataset, Patient-to-patient fine-tuning, Soft tissue, Vocal folds",
author = "Laves, {Max Heinrich} and Jens Bicker and Kahrs, {L{\"u}der A.} and Tobias Ortmaier",
note = "Funding information: We thank Giorgio Peretti from the Ospedale Policlinico San Martino, University of Genova, Italy, for providing us with the in vivo laryngeal data used in this study. We would also like to thank James Napier from the Institute of Lasers and Optics, University of Applied Sciences Emden-Leer, Germany, for his thorough proofreading of this manuscript. Funding This research has received funding from the European Union as being part of the ERFE OPhonLas project.",
year = "2019",
month = mar,
day = "14",
doi = "10.48550/arXiv.1807.06081",
language = "English",
volume = "14",
pages = "483--492",
journal = "International journal of computer assisted radiology and surgery",
issn = "1861-6410",
publisher = "Springer Verlag",
number = "3",

}

RIS

TY - JOUR
T1 - A dataset of laryngeal endoscopic images with comparative study on convolution neural network-based semantic segmentation
AU - Laves, Max Heinrich
AU - Bicker, Jens
AU - Kahrs, Lüder A.
AU - Ortmaier, Tobias
N1 - Funding information: We thank Giorgio Peretti from the Ospedale Policlinico San Martino, University of Genova, Italy, for providing us with the in vivo laryngeal data used in this study. We would also like to thank James Napier from the Institute of Lasers and Optics, University of Applied Sciences Emden-Leer, Germany, for his thorough proofreading of this manuscript. Funding This research has received funding from the European Union as being part of the ERFE OPhonLas project.

PY - 2019/3/14
Y1 - 2019/3/14
N2 - Purpose: Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer- and robot-aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration. Methods: Four machine learning-based methods SegNet, UNet, ENet and ErfNet were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning. Results: In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7%. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning. Conclusion: CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data.

AB - Purpose: Automated segmentation of anatomical structures in medical image analysis is a prerequisite for autonomous diagnosis as well as various computer- and robot-aided interventions. Recent methods based on deep convolutional neural networks (CNN) have outperformed former heuristic methods. However, those methods were primarily evaluated on rigid, real-world environments. In this study, existing segmentation methods were evaluated for their use on a new dataset of transoral endoscopic exploration. Methods: Four machine learning-based methods SegNet, UNet, ENet and ErfNet were trained with supervision on a novel 7-class dataset of the human larynx. The dataset contains 536 manually segmented images from two patients during laser incisions. The Intersection-over-Union (IoU) evaluation metric was used to measure the accuracy of each method. Data augmentation and network ensembling were employed to increase segmentation accuracy. Stochastic inference was used to show uncertainties of the individual models. Patient-to-patient transfer was investigated using patient-specific fine-tuning. Results: In this study, a weighted average ensemble network of UNet and ErfNet was best suited for the segmentation of laryngeal soft tissue with a mean IoU of 84.7%. The highest efficiency was achieved by ENet with a mean inference time of 9.22 ms per image. It is shown that 10 additional images from a new patient are sufficient for patient-specific fine-tuning. Conclusion: CNN-based methods for semantic segmentation are applicable to endoscopic images of laryngeal soft tissue. The segmentation can be used for active constraints or to monitor morphological changes and autonomously detect pathologies. Further improvements could be achieved by using a larger dataset or training the models in a self-supervised manner on additional unlabeled data.

KW - Computer vision
KW - Larynx
KW - Machine learning
KW - Open-access dataset
KW - Patient-to-patient fine-tuning
KW - Soft tissue
KW - Vocal folds
UR - http://www.scopus.com/inward/record.url?scp=85060178921&partnerID=8YFLogxK
U2 - 10.48550/arXiv.1807.06081
DO - 10.48550/arXiv.1807.06081
M3 - Article
C2 - 30649670
AN - SCOPUS:85060178921
VL - 14
SP - 483
EP - 492
JO - International journal of computer assisted radiology and surgery
JF - International journal of computer assisted radiology and surgery
SN - 1861-6410
IS - 3
ER -