Scene Prior Filtering for Depth Map Super-Resolution
Zhengxue Wang¹†    Zhiqiang Yan¹†‡    Ming-Hsuan Yang²    Jinshan Pan¹    Jian Yang¹‡    Ying Tai³    Guangwei Gao⁴
† Equal contribution    ‡ Corresponding author
¹Nanjing University of Science and Technology    ²University of California at Merced
³Nanjing University    ⁴Nanjing University of Posts and Telecommunications

🔥 Amazing Depth Super-Resolution Reconstruction 🔥

For the first time, SPFNet introduces surface normal and semantic priors from large-scale models, addressing the issues of texture interference and edge inaccuracy in guided depth super-resolution (GDSR). Below are some representative super-resolution cases.

Abstract

Multi-modal fusion is vital to the success of super-resolution of depth images. However, commonly used fusion strategies, such as addition and concatenation, fall short of effectively bridging the modal gap. As a result, guided image filtering methods have been introduced to mitigate this issue. Nevertheless, their filter kernels usually suffer from significant texture interference and edge inaccuracy. To tackle these two challenges, we introduce a Scene Prior Filtering network, SPFNet, which utilizes surface normal and semantic priors from large-scale models. Specifically, we design an All-in-one Prior Propagation (APP) that computes the similarity between multi-modal scene priors, i.e., RGB, normal, semantic, and depth, to reduce texture interference. In addition, we present a One-to-one Prior Embedding (OPE) that continuously embeds each single-modal prior into depth using Mutual Guided Filtering (MGF), further alleviating texture interference while enhancing edges. Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance. The source code and pre-trained models are available at https://github.com/yanzq95/SPFNet.

Method

SPFNet. It first generates the surface normal prior I_n and semantic prior I_s from the RGB input I_r using large-scale models. Then, the scene prior branch (orange) extracts multi-modal features, while the depth branch (blue) recursively performs all-in-one prior propagation (APP) and one-to-one prior embedding (OPE). BI: bicubic interpolation.
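To make this data flow concrete, below is a minimal PyTorch sketch of the two-branch design. Everything here is illustrative: the channel width, the number of stages, and the simple concatenation-based fusion (a stand-in for the actual APP/OPE modules, sketched separately below) are assumptions, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusePlaceholder(nn.Module):
    """Stand-in for an APP/OPE stage: concatenate depth features with the
    prior features and project back (hypothetical, for illustration only)."""
    def __init__(self, c):
        super().__init__()
        self.proj = nn.Conv2d(2 * c, c, 3, padding=1)

    def forward(self, f_d, f_p):
        return f_d + self.proj(torch.cat([f_d, f_p], dim=1))

class SPFNetSkeleton(nn.Module):
    def __init__(self, c=64, num_stages=4):
        super().__init__()
        # Scene prior branch: one shallow encoder per modality.
        self.enc_rgb = nn.Conv2d(3, c, 3, padding=1)
        self.enc_normal = nn.Conv2d(3, c, 3, padding=1)
        self.enc_semantic = nn.Conv2d(3, c, 3, padding=1)  # color-coded semantic map assumed
        # Depth branch: stem plus recursive fusion stages.
        self.enc_depth = nn.Conv2d(1, c, 3, padding=1)
        self.stages = nn.ModuleList(FusePlaceholder(c) for _ in range(num_stages))
        self.head = nn.Conv2d(c, 1, 3, padding=1)

    def forward(self, d_lr, i_r, i_n, i_s):
        # BI: bicubic interpolation of low-resolution depth to the target size.
        d_up = F.interpolate(d_lr, size=i_r.shape[-2:], mode="bicubic",
                             align_corners=False)
        f_p = self.enc_rgb(i_r) + self.enc_normal(i_n) + self.enc_semantic(i_s)
        f_d = self.enc_depth(d_up)
        for stage in self.stages:
            f_d = stage(f_d, f_p)
        # Predict a residual over the bicubic upsampling.
        return d_up + self.head(f_d)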



Scheme of (a) All-in-one Prior Propagation (APP), and (b) histogram comparison of scene prior features before and after APP.
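As a rough sketch of the propagation idea, the snippet below gates each scene prior (RGB, normal, semantic) by its per-pixel cosine similarity to the depth features before aggregating them, so prior content that disagrees with the depth, e.g., flat-region texture, contributes less. The 1x1 projections and sigmoid gate are assumptions for illustration, not the paper's exact formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityGatedPropagation(nn.Module):
    """Sketch of all-in-one propagation: weight each prior by its agreement
    with the depth features so that texture interference is suppressed."""
    def __init__(self, c):
        super().__init__()
        self.q = nn.Conv2d(c, c, 1)    # projects depth features (query)
        self.k = nn.Conv2d(c, c, 1)    # projects prior features (key)
        self.out = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, f_d, priors):
        # priors: list of (B, C, H, W) feature maps, e.g., [f_rgb, f_normal, f_semantic]
        q = F.normalize(self.q(f_d), dim=1)
        fused = torch.zeros_like(f_d)
        for f_p in priors:
            k = F.normalize(self.k(f_p), dim=1)
            sim = (q * k).sum(dim=1, keepdim=True)    # per-pixel cosine similarity
            fused = fused + torch.sigmoid(sim) * f_p  # gate the prior by agreement
        return f_d + self.out(fused)

# Quick shape check.
app = SimilarityGatedPropagation(64)
f_d = torch.randn(1, 64, 128, 128)
out = app(f_d, [torch.randn(1, 64, 128, 128) for _ in range(3)])  # -> (1, 64, 128, 128)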



Scheme of (a) One-to-one Prior Embedding (OPE), and (b) gradient histogram of filter kernels in the texture area (green box). The surface normal, semantic, and RGB kernels are generated by our Mutual Guided Filtering (MGF).
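MGF builds on guided image filtering, where one image supplies the edge structure used to filter another. For reference, here is the classic (non-learned) guided filter of He et al., which could be applied once per prior; SPFNet's MGF additionally learns the kernels and lets depth and each prior guide one another, which this sketch does not reproduce.

import torch
import torch.nn.functional as F

def box_filter(x, r):
    """Local mean over a (2r+1) x (2r+1) window."""
    return F.avg_pool2d(x, kernel_size=2 * r + 1, stride=1, padding=r,
                        count_include_pad=False)

def guided_filter(guide, src, r=4, eps=1e-4):
    """Classic guided image filter: edge structure comes from `guide`,
    values from `src`. Both are (B, 1, H, W) tensors."""
    mean_g = box_filter(guide, r)
    mean_s = box_filter(src, r)
    cov_gs = box_filter(guide * src, r) - mean_g * mean_s
    var_g = box_filter(guide * guide, r) - mean_g * mean_g
    a = cov_gs / (var_g + eps)    # local linear coefficient
    b = mean_s - a * mean_g       # local offset
    return box_filter(a, r) * guide + box_filter(b, r)

# Example: refine bicubic-upsampled depth with a single-channel guide
# (e.g., a grayscale RGB image or one surface normal component).
d_up = torch.rand(1, 1, 64, 64)
guide = torch.rand(1, 1, 64, 64)
d_filtered = guided_filter(guide, d_up)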

Quantitative Comparison

Visual Comparison

Visual results and error maps on the NYU-v2 dataset (×16).



Visual results and error maps on the RGB-D-D dataset (×16).



Visual results and error maps on the Lu dataset (×16).



Visual results and error maps on the Middlebury dataset (×16).



Visual results on the real-world RGB-D-D dataset.



Visual results of joint DSR and denoising on the NYU-v2 and Middlebury datasets (×16).


BibTex

@article{wang2024scene,
  title={Scene Prior Filtering for Depth Map Super-Resolution},
  author={Wang, Zhengxue and Yan, Zhiqiang and Yang, Ming-Hsuan and Pan, Jinshan and Yang, Jian and Tai, Ying and Gao, Guangwei},
  journal={arXiv preprint arXiv:2402.13876},
  year={2024}
}

Contact

For any questions, please contact {zxwang,yanzq}@njust.edu.cn

© Zhengxue Wang, Zhiqiang Yan | Last updated: Feb. 2024