Transformer models for Land Cover Classification with Satellite Image Time Series

Research output: Contribution to journal › Article › Research › peer review

Authors

  • Mirjana Voelsen
  • Franz Rottensteiner
  • Christian Heipke

Details

Original language: English
Pages (from-to): 547-568
Number of pages: 22
Journal: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
Volume: 92
Issue number: 5
Early online date: 6 Aug 2024
Publication status: Published - Oct 2024

Abstract

In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.
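The abstract reports its comparison in terms of the mean F1-Score. As a minimal sketch of how such a figure can be computed, the snippet below assumes that "mean F1-Score" means the unweighted average of the per-class F1 scores (often called macro F1), derived from a confusion matrix of per-pixel labels. That definition, the function names, and the toy labels are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count pixels; rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_f1(cm: List[List[int]]) -> List[float]:
    """F1 per class: 2*TP / (2*TP + FP + FN); 0.0 if the class never occurs."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # reference c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp    # predicted c, reference other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def mean_f1(cm: List[List[int]]) -> float:
    """Unweighted mean of the per-class F1 scores (assumed definition)."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)


# Hypothetical toy labels for eight pixels and three land cover classes.
y_true = [0, 0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(per_class_f1(cm), mean_f1(cm))
```

Because every class contributes equally to the average regardless of how many pixels it covers, a metric of this kind is sensitive to performance on rare classes, which is one common reason to prefer it over overall accuracy for land cover maps.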

Keywords

    Land Cover Classification, Satellite Image Time Series, Self-attention, Swin Transformer

Cite this

Transformer models for Land Cover Classification with Satellite Image Time Series. / Voelsen, Mirjana; Rottensteiner, Franz; Heipke, Christian.
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, p. 547-568.

Voelsen, M, Rottensteiner, F & Heipke, C 2024, 'Transformer models for Land Cover Classification with Satellite Image Time Series', PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 5, pp. 547-568. https://doi.org/10.1007/s41064-024-00299-7
Voelsen, M., Rottensteiner, F., & Heipke, C. (2024). Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 92(5), 547-568. https://doi.org/10.1007/s41064-024-00299-7
Voelsen M, Rottensteiner F, Heipke C. Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 Oct;92(5):547-568. Epub 2024 Aug 6. doi: 10.1007/s41064-024-00299-7
Voelsen, Mirjana ; Rottensteiner, Franz ; Heipke, Christian. / Transformer models for Land Cover Classification with Satellite Image Time Series. In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 ; Vol. 92, No. 5. pp. 547-568.
BibTeX
@article{a2289b5e8c4e4d1ab55fbf9b0c8601e7,
title = "Transformer models for Land Cover Classification with Satellite Image Time Series",
abstract = "In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.",
keywords = "Land Cover Classification, Satellite Image Time Series, Self-attention, Swin Transformer",
author = "Mirjana Voelsen and Franz Rottensteiner and Christian Heipke",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = oct,
doi = "10.1007/s41064-024-00299-7",
language = "English",
volume = "92",
pages = "547--568",
number = "5",
}

RIS

TY - JOUR
T1 - Transformer models for Land Cover Classification with Satellite Image Time Series
AU - Voelsen, Mirjana
AU - Rottensteiner, Franz
AU - Heipke, Christian
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/10
Y1 - 2024/10

AB - In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.

KW - Land Cover Classification
KW - Satellite Image Time Series
KW - Self-attention
KW - Swin Transformer
UR - http://www.scopus.com/inward/record.url?scp=85200604175&partnerID=8YFLogxK
U2 - 10.1007/s41064-024-00299-7
DO - 10.1007/s41064-024-00299-7
M3 - Article
AN - SCOPUS:85200604175
VL - 92
SP - 547
EP - 568
JO - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
JF - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
SN - 2512-2789
IS - 5
ER -