Details
Original language | English |
---|---|
Pages (from-to) | 1 |
Number of pages | 1 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 34 |
Issue number | 11 |
Publication status | Published - 12 Jul 2024 |
Abstract
Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when an information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought for improving on conditional coding. Notably, conditional residual coding relies heavily on the assumption that the residual frame has a lower entropy rate than the intra frame. Recognizing that this assumption does not always hold, due to disocclusion or unreliable motion estimates, we propose a masked conditional residual coding scheme. It learns a soft mask to form a hybrid of conditional coding and conditional residual coding in a pixel-adaptive manner. We further introduce a Transformer-based conditional autoencoder and investigate several strategies for conditioning it for inter-frame coding, a topic that remains largely underexplored. Additionally, we propose a channel transform module (CTM) that decorrelates the image latents along the channel dimension, allowing a simple hyperprior to approach the compression performance of a channel-wise autoregressive model. Experimental results confirm the superiority of our masked conditional residual Transformer (termed MaskCRT) over both conditional coding and conditional residual coding. On commonly used datasets, MaskCRT achieves BD-rate results comparable to VTM-17.0 under the low-delay P configuration in terms of PSNR-RGB and outperforms VTM-17.0 in terms of MS-SSIM-RGB. It also opens up a new research direction for advancing learned video compression.
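To make the core idea concrete, below is a minimal sketch, in plain PyTorch, of how a learned soft mask can blend conditional coding and conditional residual coding per pixel. It reflects a reading of the abstract rather than the authors' implementation; the class and function names, tensor shapes, and the toy mask predictor are assumptions.

```python
import torch
import torch.nn as nn


class SoftMaskPredictor(nn.Module):
    """Toy mask network (an assumption, not the paper's architecture).
    It predicts a per-pixel soft mask m in [0, 1] from the temporal
    context x_c (e.g. a motion-compensated frame), so that the decoder
    can reproduce the same mask from information it also has."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=3, padding=1),
            nn.Sigmoid(),  # keeps the mask soft, in [0, 1]
        )

    def forward(self, x_c: torch.Tensor) -> torch.Tensor:
        return self.net(x_c)


def masked_residual(x: torch.Tensor, x_c: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
    """Signal handed to the conditional autoencoder.
    m -> 1: behaves like conditional residual coding (code x - x_c).
    m -> 0: behaves like conditional coding (code x itself, with x_c
            still available as a condition inside the autoencoder)."""
    return x - m * x_c


def reconstruct(decoded: torch.Tensor, x_c: torch.Tensor, m: torch.Tensor) -> torch.Tensor:
    """Decoder side: add the masked prediction back."""
    return decoded + m * x_c


if __name__ == "__main__":
    x = torch.rand(1, 3, 64, 64)      # current frame
    x_c = torch.rand(1, 3, 64, 64)    # temporal context / prediction
    m = SoftMaskPredictor()(x_c)      # per-pixel soft mask, shape (1, 1, 64, 64)
    r = masked_residual(x, x_c, m)
    # With a lossless (identity) codec the reconstruction is exact:
    assert torch.allclose(reconstruct(r, x_c, m), x, atol=1e-6)
```

The pixel-adaptive mask lets the codec fall back toward conditional coding wherever the temporal prediction is unreliable (disocclusions, poor motion estimates), which is exactly the failure mode of pure conditional residual coding that the abstract highlights.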
ASJC Scopus subject areas
- Engineering (all)
- Media Technology
- Engineering (all)
- Electrical and Electronic Engineering
Cite this
In: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 34, No. 11, 12.07.2024, p. 1.
Publication: Contribution to journal › Article › Research › Peer review
TY - JOUR
T1 - MaskCRT
T2 - Masked Conditional Residual Transformer for Learned Video Compression
AU - Chen, Yi Hsin
AU - Xie, Hong Sheng
AU - Chen, Cheng Wei
AU - Gao, Zong Lin
AU - Benjak, Martin
AU - Peng, Wen Hsiao
AU - Ostermann, Jörn
N1 - Publisher Copyright: IEEE
PY - 2024/7/12
Y1 - 2024/7/12
N2 - Conditional coding has lately emerged as the mainstream approach to learned video compression. However, a recent study shows that it may perform worse than residual coding when an information bottleneck arises. Conditional residual coding was thus proposed, creating a new school of thought for improving on conditional coding. Notably, conditional residual coding relies heavily on the assumption that the residual frame has a lower entropy rate than the intra frame. Recognizing that this assumption does not always hold, due to disocclusion or unreliable motion estimates, we propose a masked conditional residual coding scheme. It learns a soft mask to form a hybrid of conditional coding and conditional residual coding in a pixel-adaptive manner. We further introduce a Transformer-based conditional autoencoder and investigate several strategies for conditioning it for inter-frame coding, a topic that remains largely underexplored. Additionally, we propose a channel transform module (CTM) that decorrelates the image latents along the channel dimension, allowing a simple hyperprior to approach the compression performance of a channel-wise autoregressive model. Experimental results confirm the superiority of our masked conditional residual Transformer (termed MaskCRT) over both conditional coding and conditional residual coding. On commonly used datasets, MaskCRT achieves BD-rate results comparable to VTM-17.0 under the low-delay P configuration in terms of PSNR-RGB and outperforms VTM-17.0 in terms of MS-SSIM-RGB. It also opens up a new research direction for advancing learned video compression.
KW - Encoding
KW - Entropy
KW - Feature extraction
KW - Image coding
KW - Learned video compression
KW - masked conditional residual coding
KW - Transformer-based video compression
KW - Transformers
KW - Video codecs
KW - Video compression
UR - http://www.scopus.com/inward/record.url?scp=85198379094&partnerID=8YFLogxK
U2 - 10.1109/TCSVT.2024.3427426
DO - 10.1109/TCSVT.2024.3427426
M3 - Article
AN - SCOPUS:85198379094
VL - 34
SP - 1
JO - IEEE Transactions on Circuits and Systems for Video Technology
JF - IEEE Transactions on Circuits and Systems for Video Technology
SN - 1051-8215
IS - 11
ER -