Event cameras report scene motion as an asynchronous stream of data called events. Unlike traditional cameras, event cameras have very low latency (microseconds vs. milliseconds), very high dynamic range (140 dB vs. 60 dB), and low power consumption, as they report changes in a scene rather than complete frames. Because they report per-pixel, feature-like events rather than whole intensity frames, they are immune to motion blur. However, event cameras require relative movement between the scene and the camera to fire events; i.e., they produce no output when the scene is static relative to the camera. Traditional cameras, in contrast, report a whole frame of pixels at fixed intervals, but have lower dynamic range and are prone to motion blur under rapid movement. We take the best of both worlds and use events and intensity images together in a complementary design to estimate dense disparity from their combination. The proposed end-to-end design combines events and images sequentially and correlates them to estimate dense depth values. Experiments in a variety of real-world and simulated settings demonstrate the superiority of our method in predicting accurate depth values with fine detail. We further extend our method to extreme cases in which the left or right event stream or intensity image of the stereo pair is missing, and also investigate stereo depth estimation with inconsistent dynamic ranges or event thresholds between the left and right views.
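To make the asynchronous event stream mentioned above concrete: each event is a tuple (x, y, timestamp, polarity), and a common way to feed such a stream to a dense stereo network is to accumulate it into a voxel grid of time bins. The sketch below illustrates that generic representation; it is an assumption for illustration, not the encoding used in this particular paper.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate an asynchronous event stream into a dense voxel grid.

    `events` is an (N, 4) array of (x, y, timestamp, polarity) rows.
    This time-binned layout is one common event representation; the
    paper's actual input encoding may differ.
    """
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return grid
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    # Map polarity to +1 / -1 so ON and OFF events cancel in a bin.
    p = np.where(events[:, 3] > 0, 1.0, -1.0)
    # Normalize timestamps to [0, num_bins - 1] and assign each event a bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    b = np.clip(t_norm.astype(int), 0, num_bins - 1)
    # Scatter-add polarities into the (bin, y, x) cells.
    np.add.at(grid, (b, y, x), p)
    return grid
```

A grid like this can be concatenated or correlated with an intensity image of the same resolution, which is the kind of complementary fusion the abstract describes at a high level.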
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision, ICCV 2021
Publisher: Institute of Electrical and Electronics Engineers Inc.
Number of pages: 10
Publication status: Published - 2021
Event: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021 - Virtual, Online, Canada
Duration: 2021 Oct 11 → 2021 Oct 17
Name: Proceedings of the IEEE International Conference on Computer Vision
Conference: 18th IEEE/CVF International Conference on Computer Vision, ICCV 2021
Period: 21/10/11 → 21/10/17
Bibliographical note / Funding Information:
Acknowledgement. This work was partly supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1C1C1009283), the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2018R1A2B3008640), and Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (No. 2019-0-01842, Artificial Intelligence Graduate School Program (GIST)), (No. 2019-0-01351, Development of Ultra Low-Power Mobile Deep Learning Semiconductor With Compression/Decompression of Activation/Kernel Data, 20%), and (No. 2021-0-02068, Artificial Intelligence Innovation Hub). The authors thank Yeong-woo Nam for helping with the DCSEC challenge.
© 2021 IEEE
All Science Journal Classification (ASJC) codes
- Computer Vision and Pattern Recognition