Xiaoqi Zhao (赵骁骐)
Dalian University of Technology (DLUT)
Address: No.2 Linggong Road, Ganjingzi District, Dalian City, Liaoning, China
Email: zxq[at]mail.dlut.edu.cn

About Me [GitHub] [Google Scholar]

I received the B.E. degree in electronic and information engineering from Dalian University of Technology (DLUT), Dalian, China, in 2019. I am currently pursuing the Ph.D. degree in signal and information processing at DLUT, advised by Prof. Lihe Zhang and Prof. Huchuan Lu.

My research interests include RGB/RGB-D salient object detection, video object segmentation, camouflaged object detection, medical image segmentation (polyps), crowd counting, multi-task learning, and self-supervised learning.


Honors & Awards

  • Seventh place in the VisDrone 2021 Crowd Counting Challenge (ICCV 2021 Workshop) (7th of 77 teams)
  • Seventh place in the 2020 Hualu Cup First Big Data Competition (Intelligent Diagnosis of Cancer Risk), 5,000 RMB bonus (7th of 114 teams)
  • 2020 Huawei Fellowship
  • Third place in the 2018 OPPO Top AI Competition (Portrait Segmentation), 50,000 RMB bonus (3rd of 456 teams)


News

  • 1 paper accepted to AAAI 2022 (acceptance rate ~15% [1349/9020]).
  • 1 paper accepted to ACM MM 2021 (oral, ~9% [179/1942]).
  • 1 paper accepted to MICCAI 2021.
  • 3 papers accepted to ECCV 2020 (1 oral, ~2% [104/5025]).
  • 1 paper accepted to CVPR 2020.


Publications

Self-Supervised Pretraining for RGB-D Salient Object Detection
Xiaoqi Zhao, Youwei Pang, Lihe Zhang, Huchuan Lu, Xiang Ruan
in AAAI 2022
[Paper] [Code]
In this work, we propose a novel self-supervised learning (SSL) scheme that enables effective pre-training for RGB-D SOD without requiring human annotation. As the first SSL method for RGB-D SOD, it can serve as a new baseline for future research.
Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation
Xiaoqi Zhao, Youwei Pang, Jiaxing Yang, Lihe Zhang, Huchuan Lu
in ACM MM 2021 (Oral)
[Paper] [Slide & 极市平台 (CVMart) Post] [Code]
In this paper, we propose a novel multi-source fusion network that effectively exploits complementary features from RGB, depth, static saliency, and optical flow for zero-shot video object segmentation. To eliminate the inevitable interference caused by low-quality optical flow, we further design a novel predictor selection network that automatically chooses between the results of the static saliency predictor and the moving object predictor.
Automatic Polyp Segmentation via Multi-scale Subtraction Network
Xiaoqi Zhao, Lihe Zhang, Huchuan Lu
in MICCAI 2021
[Paper] [Code]
In this paper, we present a novel multi-scale subtraction network (MSNet) to automatically segment polyps from colonoscopy images. MSNet runs at ~70 fps, the fastest speed among existing polyp segmentation methods.
Suppress and Balance: A Simple Gated Network for Salient Object Detection
Xiaoqi Zhao*, Youwei Pang*, Lihe Zhang, Huchuan Lu, Lei Zhang
in ECCV 2020 (Oral)
[Paper] [Slide] [Code]
The gate unit is simple yet effective; as a result, a gated FPN can serve as a new baseline for dense prediction tasks.
A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection
Xiaoqi Zhao, Lihe Zhang, Youwei Pang, Huchuan Lu, Lei Zhang
in ECCV 2020
[Paper] [Code]
We build a single-stream network with a novel depth-enhanced dual attention mechanism for real-time (speed: 32 FPS, model size: 106.7 MB with the VGG-16 backbone) and robust RGB-D salient object detection. Unlike existing two-stream methods, which concentrate on cross-modal fusion between the RGB and depth streams, we use depth information more efficiently to guide early fusion and to enhance feature discrimination in the decoder.
Multi-scale Interactive Network for Salient Object Detection
Youwei Pang*, Xiaoqi Zhao*, Lihe Zhang, Huchuan Lu
in CVPR 2020
[Paper] [Code]
In this paper, we investigate the multi-scale issue and propose an effective and efficient network, MINet, which follows a transformation-interaction-fusion strategy for salient object detection.
Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection
Youwei Pang, Lihe Zhang, Xiaoqi Zhao, Huchuan Lu
in ECCV 2020
[Paper] [Slide] [Code]
The proposed model (HDFNet) generates adaptive filters with different receptive field sizes through a dynamic dilated pyramid module. It makes full use of semantic cues from multi-modal mixed features to achieve multi-scale cross-modal guidance, thereby enhancing the representation capability of the decoder. HDFNet is an important baseline of the [winning solution] in the NTIRE 2021 Depth Guided Image Relighting Challenge, hosted at a CVPR 2021 workshop (winner: AICSNTU-MBNet team, Asustek Computer Inc. & National Taiwan University).

Review Services

Journal Reviewer
Conference Reviewer
ICCV 2021, AAAI 2022, CVPR 2022