Abstract
Low-light image enhancement plays a central role in various downstream computer vision tasks. Vision Transformers (ViTs) have recently been adapted for low-level image processing and have achieved promising performance. However, ViTs process images in a window- or patch-based manner, which limits both their computational efficiency and their ability to capture long-range dependencies. Additionally, existing ViTs process RGB images rather than RAW sensor data, which is sub-optimal for exploiting the rich information that RAW data carries. We propose a fully end-to-end Conv-Transformer-based model, RawFormer, that directly uses RAW data for low-light image enhancement. RawFormer has a U-Net-like structure integrated with a carefully designed Conv-Transformer Fusing (CTF) block. The CTF block combines local attention and transposed self-attention in a single module and reduces computational overhead by adopting a transposed self-attention operation. Experiments demonstrate that RawFormer outperforms state-of-the-art models by a significant margin on low-light RAW image enhancement tasks.
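The "transposed self-attention" mentioned in the abstract most likely refers to channel-wise attention, where the attention map is formed across feature channels rather than spatial positions, so its size is C×C instead of (HW)×(HW) and the cost stays linear in image resolution. Below is a minimal PyTorch sketch of that idea under these assumptions; the module, layer names, and hyperparameters are illustrative and not the authors' exact CTF block.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TransposedSelfAttention(nn.Module):
    """Sketch of transposed (channel-wise) self-attention.

    Attention is computed between channels, giving a C x C attention map
    instead of a (H*W) x (H*W) one, which avoids quadratic cost in the
    number of pixels. Details are assumptions, not the paper's exact block.
    """

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        # Learnable temperature per head, as is common for channel attention.
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1))
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, pixels).
        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, self.num_heads, c // self.num_heads, h * w)

        q, k, v = map(split_heads, (q, k, v))

        # Normalize along the pixel dimension, then build a C x C attention map.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        attn = attn.softmax(dim=-1)

        out = attn @ v                      # (b, heads, c // heads, h * w)
        out = out.reshape(b, c, h, w)       # back to feature-map layout
        return self.project_out(out)


if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)
    y = TransposedSelfAttention(dim=32)(x)
    print(y.shape)  # torch.Size([1, 32, 64, 64])
```

In a U-Net-like enhancement network, a block of this kind would typically be paired with a convolutional (local-attention) branch at each scale; that pairing is what the CTF block is described as doing, though the exact fusion scheme is not specified in this record.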
Original language | English |
---|---|
Pages (from-to) | 2677-2681 |
Number of pages | 5 |
Journal | IEEE Signal Processing Letters |
Volume | 29 |
Publication status | Published - 2022 |
Bibliographical note
Funding Information: This work was supported in part by the National Key R&D Program of China under Grant 2021YFB3600603, in part by the Natural Science Foundation of Fujian Province under Grant 2020J01468, and in part by the Brain Pool Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science and ICT under Grant 2021H1D3A2A01099396.
Publisher Copyright:
© 1994-2012 IEEE.
All Science Journal Classification (ASJC) codes
- Signal Processing
- Applied Mathematics
- Electrical and Electronic Engineering