Learning-Based Scalable Video Coding with Spatial and Temporal Prediction

Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-reviewed

Authors

  • Martin Benjak
  • Yi-Hsin Chen
  • Wen-Hsiao Peng
  • Jörn Ostermann

External organizations

  • National Yang Ming Chiao Tung University (NYCU)

Details

Original language: English
Title of host publication: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (electronic): 9798350359855
Publication status: Published - 2023
Event: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 - Jeju, South Korea
Duration: 4 Dec 2023 – 7 Dec 2023

Abstract

In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: a spatial inter-layer prediction signal, which is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, which is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
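The two-layer flow described in the abstract can be illustrated with a minimal NumPy sketch. This is a hypothetical toy illustration, not the authors' implementation: the VVC base-layer codec, the super-resolution network, the decoder-side motion compensation and the learned conditional EL codec are all replaced by simple placeholders, and every function name below is an assumption made for illustration only.

# Toy sketch of layered spatial scalability with a conditioned enhancement layer.
# All components are placeholders standing in for the modules named in the abstract.
import numpy as np

def downsample(frame: np.ndarray, factor: int = 2) -> np.ndarray:
    """Spatial downsampling by block averaging (stand-in for a proper downsampling filter)."""
    h, w = frame.shape
    return frame[:h - h % factor, :w - w % factor].reshape(
        h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def vvc_base_layer(frame_lr: np.ndarray) -> np.ndarray:
    """Placeholder for VVC encoding/decoding of the low-resolution base layer:
    coarse quantization mimics coding distortion."""
    step = 4.0
    return np.round(frame_lr / step) * step

def super_resolve(frame_lr: np.ndarray, factor: int = 2) -> np.ndarray:
    """Placeholder super-resolution: nearest-neighbour upsampling instead of a learned
    network, yielding the spatial inter-layer prediction."""
    return np.repeat(np.repeat(frame_lr, factor, axis=0), factor, axis=1)

def motion_compensate(prev_rec: np.ndarray) -> np.ndarray:
    """Placeholder decoder-side motion compensation: the abstract derives motion at the
    decoder without signaling motion vectors; here we simply reuse the previous
    reconstruction as the temporal inter-frame prediction."""
    return prev_rec

def fuse(spatial_pred: np.ndarray, temporal_pred: np.ndarray) -> np.ndarray:
    """Placeholder fusion of the two prediction signals (simple average)."""
    return 0.5 * (spatial_pred + temporal_pred)

def enhancement_layer(frame: np.ndarray, condition: np.ndarray) -> np.ndarray:
    """Placeholder for the learned conditional EL codec: code the full-resolution frame
    given the fused condition (here, a quantized conditional residual)."""
    step = 2.0
    residual = np.round((frame - condition) / step) * step
    return condition + residual

def code_sequence(frames: list[np.ndarray]) -> list[np.ndarray]:
    """Run the two-layer pipeline over a sequence and return EL reconstructions."""
    prev_rec = None
    reconstructions = []
    for frame in frames:
        bl_rec = vvc_base_layer(downsample(frame))           # base layer (low resolution)
        spatial_pred = super_resolve(bl_rec)                 # spatial inter-layer prediction
        temporal_pred = (motion_compensate(prev_rec)
                         if prev_rec is not None else spatial_pred)
        condition = fuse(spatial_pred, temporal_pred)        # fused condition for the EL
        rec = enhancement_layer(frame, condition)            # enhancement layer (full resolution)
        reconstructions.append(rec)
        prev_rec = rec
    return reconstructions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = [rng.uniform(0, 255, size=(64, 64)) for _ in range(3)]
    recs = code_sequence(video)
    print("MSE per frame:", [float(np.mean((a - b) ** 2)) for a, b in zip(video, recs)])

Note that in this sketch the temporal prediction is taken from the previously reconstructed EL frame at the decoder side, which mirrors the abstract's point that no motion vectors need to be transmitted.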

ASJC Scopus subject areas

Cite

Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. / Benjak, Martin; Chen, Yi Hsin; Peng, Wen Hsiao et al.
2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.


Benjak, M, Chen, YH, Peng, WH & Ostermann, J 2023, Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. in 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023, Jeju, South Korea, 4 Dec. 2023. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak, M., Chen, Y. H., Peng, W. H., & Ostermann, J. (2023). Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/VCIP59821.2023.10402677
Benjak M, Chen YH, Peng WH, Ostermann J. Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. In: 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc. 2023. doi: 10.1109/VCIP59821.2023.10402677
Benjak, Martin ; Chen, Yi Hsin ; Peng, Wen Hsiao et al. / Learning-Based Scalable Video Coding with Spatial and Temporal Prediction. 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. Institute of Electrical and Electronics Engineers Inc., 2023.
BibTeX
@inproceedings{cb17cad9bb0041e59b06124bd3d414f5,
title = "Learning-Based Scalable Video Coding with Spatial and Temporal Prediction",
abstract = "In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.",
keywords = "conditional coding, scalable coding, spatial scalability, video coding, VVC",
author = "Martin Benjak and Chen, {Yi Hsin} and Peng, {Wen Hsiao} and J{\"o}rn Ostermann",
year = "2023",
doi = "10.1109/VCIP59821.2023.10402677",
language = "English",
booktitle = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
note = "2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023 ; Conference date: 04-12-2023 Through 07-12-2023",

}

RIS

TY - GEN
T1 - Learning-Based Scalable Video Coding with Spatial and Temporal Prediction
AU - Benjak, Martin
AU - Chen, Yi Hsin
AU - Peng, Wen Hsiao
AU - Ostermann, Jörn
PY - 2023
Y1 - 2023
N2 - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
AB - In this work, we propose a hybrid learning-based method for layered spatial scalability. Our framework consists of a base layer (BL), which encodes a spatially downsampled representation of the input video using Versatile Video Coding (VVC), and a learning-based enhancement layer (EL), which conditionally encodes the original video signal. The EL is conditioned by two fused prediction signals: A spatial inter-layer prediction signal, that is generated by spatially upsampling the output of the BL using super-resolution, and a temporal inter-frame prediction signal, that is generated by decoder-side motion compensation without signaling any motion vectors. We show that our method outperforms LCEVC and has comparable performance to full-resolution VVC for high-resolution content, while still offering scalability.
KW - conditional coding
KW - scalable coding
KW - spatial scalability
KW - video coding
KW - VVC
UR - http://www.scopus.com/inward/record.url?scp=85184853773&partnerID=8YFLogxK
U2 - 10.1109/VCIP59821.2023.10402677
DO - 10.1109/VCIP59821.2023.10402677
M3 - Conference contribution
AN - SCOPUS:85184853773
BT - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Y2 - 4 December 2023 through 7 December 2023
ER -
