Full list on Google Scholar.

Also a brief future research proposal.

$*$ = equal contribution

$\dagger$ = equal supervision

  • DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

    Brian Nlong Zhao, Yuhang Xiao*, Jiashu Xu*, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet†, Yunhao Ge†

    Submitted to ICLR 2025 (3 positive reviews)

    TL;DR: A novel, simple approach to learn a semantic distribution of images that enables diverse personalized image and 3D generation with flexible variations and editing capabilities.
  • Benchmark Dataset for Radiology Report Generation with Instructions and Contexts

    Brian Nlong Zhao, Zilong Wang, Xinyang Jiang, Xufang Luo, Yifan Yang, Bo Li, Javier Alvarez-Valle, Matthew P. Lungren, Dongsheng Li, Lili Qiu

    Submitted to ICLR 2025

    TL;DR: New radiology report generation tasks, a data generation pipeline, and a dataset in a real-world clinical setting with medical contexts, along with a novel baseline model architecture that adapts a general-domain visual LLM to the medical domain and to report generation with context.
  • Probabilistic Prompt Distribution Learning for Animal Pose Estimation

    Jiyong Rao*, Brian Nlong Zhao*, Yu Wang

    Submitted to CVPR 2025

    TL;DR: A probabilistic prompt-learning approach for multi-species animal pose estimation using diverse textual representations and cross-modal fusion to handle large data variances, achieving state-of-the-art animal pose estimation results.
  • Asymmetric Semantic Optimization for Text-Video Retrieval

    Yu Wang, Brian Nlong Zhao, Shiwei Chen

    Submitted to CVPR 2025

    TL;DR: A novel method that applies asymmetric semantic optimization to text-video retrieval, leveraging knowledge-based text editing and multi-granularity interactions to refine language features, highlight crucial visual clues, and achieve state-of-the-art performance.
  • Unbalanced Multi-view Stochastic Embedding Learning for Text-Video Retrieval

    Yu Wang, Shiwei Chen, Brian Nlong Zhao

    Submitted to CVPR 2025

    TL;DR: A text-video retrieval method enhanced with multi-view descriptions, adaptive embedding optimization, and a memory-based mechanism to reduce bias, achieving state-of-the-art results.
  • DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
    Beyond generation: Harnessing Text to Image Models for Object Detection and Segmentation (Extension)

    Yunhao Ge*, Jiashu Xu*, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet

    arXiv preprint 2023

    TL;DR: A method for generating detection and segmentation training data using text-to-image synthesis and a copy-and-paste scheme, achieving performance comparable to real data alone and performing even better when combined with real data, while excelling in zero-shot and out-of-distribution scenarios.
  • EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation

    Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet

    arXiv preprint 2022

    TL;DR: An EM-guided segment selection and cut-paste approach for weakly-supervised instance segmentation using only image-level supervision. By refining object masks, generating context-aware backgrounds, and compositing these into a pseudo-labeled dataset, EM-Paste achieves state-of-the-art results on PASCAL VOC and COCO, outperforming baselines and addressing long-tail class augmentation.
  • Progressive Motion Coherence for Remote Sensing Image Matching

    Yizhang Liu, Brian Nlong Zhao, Shengjie Zhao, Lin Zhang

    IEEE Transactions on Geoscience and Remote Sensing Volume 60

    TL;DR: A feature-based remote sensing image matching method that uses novel coherence constraints for robustness to image degradations and large rotations.
  • Rectified Neighborhood Construction for Robust Feature Matching With Heavy Outliers

    Yizhang Liu, Brian Nlong Zhao, Shengjie Zhao

    IEEE Geoscience and Remote Sensing Letters Volume 19

    TL;DR: A rectified neighborhood construction strategy to improve local consistency-based feature matching by mitigating the impact of outliers and adaptively estimating parameters.
  • Extrinsic Self-calibration of the Surround-view System: A Weakly Supervised Approach

    Yang Chen, Lin Zhang, Ying Shen, Brian Nlong Zhao, Yicong Zhou

    IEEE Transactions on Multimedia Volume 25

  • Scene Text Image Super-Resolution via Parallelly Contextual Attention Network

    Cairong Zhao, Shuyang Feng, Brian Nlong Zhao, Zhijun Ding, Jun Wu, Fuming Shen, Hengtao Shen

    ACM Multimedia 2021

    TL;DR: A network architecture that reconstructs high-frequency information and adaptively captures horizontal and vertical sequence-dependent features for text super-resolution.
  • ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of the Surround-view System

    Tianjun Zhang, Brian Nlong Zhao, Ying Shen, Xuan Shao, Lin Zhang, Yicong Zhou

    ACM Multimedia 2021
