Transformer models for Land Cover Classification with Satellite Image Time Series

Publication: Contribution to journal › Article › Research › Peer-reviewed

Authors

  • Mirjana Voelsen
  • Franz Rottensteiner
  • Christian Heipke

Details

Original language: English
Pages (from - to): 547-568
Number of pages: 22
Journal: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
Volume: 92
Issue number: 5
Early online date: 6 Aug 2024
Publication status: Published - Oct 2024

Abstract

In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.
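The abstract reports its comparison as a mean F1-score but does not say how the mean is formed. As a minimal sketch, assuming it is the unweighted average of per-class F1 scores over all labelled pixels (often called macro F1), the metric could be computed as follows. The function name `mean_f1`, the class names, and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def mean_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 scores over flat label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the reference says otherwise
            fn[t] += 1  # reference class t was missed
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Assumed convention: a class with no reference or predicted pixels scores 0.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for six pixels and three land cover classes.
reference = ["forest", "forest", "water", "water", "crop", "crop"]
predicted = ["forest", "water", "water", "water", "crop", "forest"]
score = mean_f1(reference, predicted, ["forest", "water", "crop"])
```

Because every class contributes equally under this assumption, rare land cover classes influence the score as much as frequent ones, which is one reason such a mean is commonly preferred over overall accuracy for imbalanced label maps.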

Cite

Transformer models for Land Cover Classification with Satellite Image Time Series. / Voelsen, Mirjana; Rottensteiner, Franz; Heipke, Christian.
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, pp. 547-568.

Voelsen, M, Rottensteiner, F & Heipke, C 2024, 'Transformer models for Land Cover Classification with Satellite Image Time Series', PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 5, pp. 547-568. https://doi.org/10.1007/s41064-024-00299-7
Voelsen, M., Rottensteiner, F., & Heipke, C. (2024). Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 92(5), 547-568. https://doi.org/10.1007/s41064-024-00299-7
Voelsen M, Rottensteiner F, Heipke C. Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 Oct;92(5):547-568. Epub 2024 Aug 6. doi: 10.1007/s41064-024-00299-7
Voelsen, Mirjana ; Rottensteiner, Franz ; Heipke, Christian. / Transformer models for Land Cover Classification with Satellite Image Time Series. In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 ; Vol. 92, No. 5. pp. 547-568.
@article{a2289b5e8c4e4d1ab55fbf9b0c8601e7,
title = "Transformer models for Land Cover Classification with Satellite Image Time Series",
abstract = "In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.",
keywords = "Land Cover Classification, Satellite Image Time Series, Self-attention, Swin Transformer",
author = "Mirjana Voelsen and Franz Rottensteiner and Christian Heipke",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = oct,
doi = "10.1007/s41064-024-00299-7",
language = "English",
volume = "92",
pages = "547--568",
number = "5",

}

TY - JOUR

T1 - Transformer models for Land Cover Classification with Satellite Image Time Series

AU - Voelsen, Mirjana

AU - Rottensteiner, Franz

AU - Heipke, Christian

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024/10

Y1 - 2024/10

AB - In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.

KW - Land Cover Classification

KW - Satellite Image Time Series

KW - Self-attention

KW - Swin Transformer

UR - http://www.scopus.com/inward/record.url?scp=85200604175&partnerID=8YFLogxK

U2 - 10.1007/s41064-024-00299-7

DO - 10.1007/s41064-024-00299-7

M3 - Article

AN - SCOPUS:85200604175

VL - 92

SP - 547

EP - 568

JO - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science

JF - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science

SN - 2512-2789

IS - 5

ER -