Transformer models for Land Cover Classification with Satellite Image Time Series

Mirjana Voelsen; Franz Rottensteiner; Christian Heipke

doi:10.1007/s41064-024-00299-7

Details

Original language	English
Pages (from-to)	547-568
Number of pages	22
Journal	PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
Volume	92
Issue number	5
Early online date	6 Aug 2024
Publication status	Published - Oct 2024

Abstract

In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.
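The abstract reports its headline result as a gain in the mean F1-Score. As a rough illustration only, the sketch below computes an unweighted mean of per-class F1 scores from a confusion matrix. It assumes that "mean F1-Score" means the plain average over classes, which the abstract does not state explicitly; the function name `mean_f1` and the toy matrix are hypothetical and are not taken from the paper.

```python
def mean_f1(confusion):
    """Unweighted mean of per-class F1 scores (assumed definition).

    confusion[i][j] is the number of pixels whose reference class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]                                  # correctly labelled pixels of class c
        fn = sum(confusion[c]) - tp                           # class c pixels predicted as something else
        fp = sum(confusion[r][c] for r in range(n)) - tp      # other pixels predicted as class c
        denom = 2 * tp + fp + fn
        # A class absent from both reference and prediction is scored 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy 3-class confusion matrix with made-up counts (not data from the paper).
toy = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]

if __name__ == "__main__":
    print(f"mean F1 = {mean_f1(toy):.4f}")
```

Because every class contributes equally to this average, rare land cover classes weigh as much as frequent ones, which is one common reason to report a mean F1 rather than overall accuracy.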

Keywords

Land Cover Classification, Satellite Image Time Series, Self-attention, Swin Transformer

ASJC Scopus subject areas

Social Sciences(all)
Geography, Planning and Development
Physics and Astronomy(all)
Instrumentation
Earth and Planetary Sciences(all)
Earth and Planetary Sciences (miscellaneous)

Cite this

Transformer models for Land Cover Classification with Satellite Image Time Series. / Voelsen, Mirjana; Rottensteiner, Franz; Heipke, Christian.
In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, Vol. 92, No. 5, 10.2024, p. 547-568.

Research output: Contribution to journal › Article › Research › peer review

Voelsen, M, Rottensteiner, F & Heipke, C 2024, 'Transformer models for Land Cover Classification with Satellite Image Time Series', PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, vol. 92, no. 5, pp. 547-568. https://doi.org/10.1007/s41064-024-00299-7

Voelsen, M., Rottensteiner, F., & Heipke, C. (2024). Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science, 92(5), 547-568. https://doi.org/10.1007/s41064-024-00299-7

Voelsen M, Rottensteiner F, Heipke C. Transformer models for Land Cover Classification with Satellite Image Time Series. PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 Oct;92(5):547-568. Epub 2024 Aug 6. doi: 10.1007/s41064-024-00299-7

Voelsen, Mirjana ; Rottensteiner, Franz ; Heipke, Christian. / Transformer models for Land Cover Classification with Satellite Image Time Series. In: PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science. 2024 ; Vol. 92, No. 5. pp. 547-568.

@article{a2289b5e8c4e4d1ab55fbf9b0c8601e7,
title = "Transformer models for Land Cover Classification with Satellite Image Time Series",
abstract = "In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.",

keywords = "Land Cover Classification, Satellite Image Time Series, Self-attention, Swin Transformer",
author = "Mirjana Voelsen and Franz Rottensteiner and Christian Heipke",
note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",
year = "2024",
month = oct,
doi = "10.1007/s41064-024-00299-7",
language = "English",
volume = "92",
pages = "547--568",
number = "5",
}

TY - JOUR
T1 - Transformer models for Land Cover Classification with Satellite Image Time Series
AU - Voelsen, Mirjana
AU - Rottensteiner, Franz
AU - Heipke, Christian
N1 - Publisher Copyright: © The Author(s) 2024.
PY - 2024/10
Y1 - 2024/10
AB - In this paper we address the task of pixel-wise land cover (LC) classification using satellite image time series (SITS). For that purpose, we use a supervised deep learning model and focus on combining spatial and temporal features. Our method is based on the Swin Transformer and captures global temporal features by using self-attention and local spatial features by convolutions. We extend the architecture to receive multi-temporal input to generate one output label map for every input image. In our experiments we focus on the application of pixel-wise LC classification from Sentinel‑2 SITS over the whole area of Lower Saxony (Germany). The experiments with our new model show that by using convolutions for spatial feature extraction or a temporal weighting module in the skip connections the performance improves and is more stable. The combined usage of both adaptations results in the overall best performance although this improvement is only minimal. Compared to a fully convolutional neural network without any self-attention layers our model improves the results by 2.1% in the mean F1-Score on a corrected test dataset. Additionally, we investigate different types of temporal position encoding, which do not have a significant impact on the performance.

KW - Land Cover Classification
KW - Satellite Image Time Series
KW - Self-attention
KW - Swin Transformer
UR - http://www.scopus.com/inward/record.url?scp=85200604175&partnerID=8YFLogxK
U2 - 10.1007/s41064-024-00299-7
DO - 10.1007/s41064-024-00299-7
M3 - Article
AN - SCOPUS:85200604175
VL - 92
SP - 547
EP - 568
JO - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
JF - PFG - Journal of Photogrammetry, Remote Sensing and Geoinformation Science
SN - 2512-2789
IS - 5
ER -
