Full list on Google Scholar.

Also a brief future research proposal.

$*$ = equal contribution

$\dagger$ = equal supervision

  • (TBD) 4D Animal Reconstruction from In-the-wild Video

    Brian Nlong Zhao, Jiajun Wu†, Shangzhe Wu†

    In submission

    TL;DR: A framework for building large-scale 4D animal assets from scratch, including a data engine that automatically scrapes and processes online videos, an animal reconstruction method adapted for sequence reconstruction, and a benchmark evaluation set for the 4D animal reconstruction task.
    Figure for (TBD) 4D Animal Reconstruction from In-the-wild Video
  • DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models

    Brian Nlong Zhao, Yuhang Xiao*, Jiashu Xu*, Xinyang Jiang, Yifan Yang, Dongsheng Li, Laurent Itti, Vibhav Vineet†, Yunhao Ge†

    ICLR 2025

    TL;DR: A novel, simple approach to learning a semantic distribution of images that enables diverse personalized image and 3D generation with flexible variations and editing capabilities.
    Figure for DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
  • Probabilistic Prompt Distribution Learning for Animal Pose Estimation

    Jiyong Rao*, Brian Nlong Zhao*, Yu Wang

    CVPR 2025

    TL;DR: A probabilistic prompt-learning approach for multi-species animal pose estimation using diverse textual representations and cross-modal fusion to handle large data variances, achieving state-of-the-art animal pose estimation results.
    Figure for Probabilistic Prompt Distribution Learning for Animal Pose Estimation
  • Benchmark Dataset for Radiology Report Generation with Instructions and Contexts

    Brian Nlong Zhao, Zilong Wang, Xinyang Jiang, Xufang Luo, Yifan Yang, Bo Li, Javier Alvarez-Valle, Matthew P. Lungren, Dongsheng Li, Lili Qiu

    Submitted to MICCAI 2025

    TL;DR: New radiology report generation tasks, a data generation pipeline, and a dataset in a real-world clinical setting with medical contexts, along with a novel baseline model architecture that adapts a general-domain visual LLM to the medical domain and to report generation with context.
    Figure for Benchmark Dataset for Radiology Report Generation with Instructions and Contexts
  • Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings

    Kenan Ye*, Brian Nlong Zhao*, Shuang Liang, Han Yao, Wenzhen Jia

    IEEE Transactions on Computational Social Systems

    TL;DR: A new approach to 3D action recognition that uses a graph-based encoder and GraphGRU with a context-aware topology attention mechanism.
    Figure for Self-Supervised 3-D Action Recognition by Contrasting Context-Enhanced Action Embeddings
  • Beyond generation: Harnessing Text to Image Models for Object Detection and Segmentation

    Yunhao Ge*, Jiashu Xu*, Brian Nlong Zhao, Neel Joshi, Laurent Itti, Vibhav Vineet

    arXiv preprint 2023

    TL;DR: A method for generating detection and segmentation training data using text-to-image synthesis and a copy-and-paste scheme, achieving performance comparable to real data alone and performing even better when combined with real data, while excelling in zero-shot and out-of-distribution scenarios.
    Figure for Beyond generation: Harnessing Text to Image Models for Object Detection and Segmentation
  • EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation

    Yunhao Ge, Jiashu Xu, Brian Nlong Zhao, Laurent Itti, Vibhav Vineet

    arXiv preprint 2022

    TL;DR: An EM-guided segment selection and cut-paste approach for weakly supervised instance segmentation using only image-level supervision. By refining object masks, generating context-aware backgrounds, and compositing these into a pseudo-labeled dataset, EM-Paste achieves state-of-the-art results on PASCAL VOC and COCO, outperforming baselines and addressing long-tail class augmentation.
    Figure for EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation
  • Progressive Motion Coherence for Remote Sensing Image Matching

    Yizhang Liu, Brian Nlong Zhao, Shengjie Zhao, Lin Zhang

    IEEE Transactions on Geoscience and Remote Sensing Volume 60

    TL;DR: A feature-based remote sensing image matching method that uses novel coherence constraints for robustness to image degradations and large rotations.
    Figure for Progressive Motion Coherence for Remote Sensing Image Matching
  • Rectified Neighborhood Construction for Robust Feature Matching With Heavy Outliers

    Yizhang Liu, Brian Nlong Zhao, Shengjie Zhao

    IEEE Geoscience and Remote Sensing Letters Volume 19

    TL;DR: A rectified neighborhood construction strategy to improve local consistency-based feature matching by mitigating the impact of outliers and adaptively estimating parameters.
    Figure for Rectified Neighborhood Construction for Robust Feature Matching With Heavy Outliers
  • Extrinsic Self-calibration of the Surround-view System: A Weakly Supervised Approach

    Yang Chen, Lin Zhang, Ying Shen, Brian Nlong Zhao, Yicong Zhou

    IEEE Transactions on Multimedia Volume 25

    Figure for Extrinsic Self-calibration of the Surround-view System: A Weakly Supervised Approach
  • Scene Text Image Super-Resolution via Parallelly Contextual Attention Network

    Cairong Zhao, Shuyang Feng, Brian Nlong Zhao, Zhijun Ding, Jun Wu, Fuming Shen, Hengtao Shen

    ACM Multimedia 2021

    TL;DR: A network architecture that reconstructs high-frequency information and adaptively captures horizontal and vertical sequence-dependent features for text super-resolution.
    Figure for Scene Text Image Super-Resolution via Parallelly Contextual Attention Network
  • ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of the Surround-view System

    Tianjun Zhang, Brian Nlong Zhao, Ying Shen, Xuan Shao, Lin Zhang, Yicong Zhou

    ACM Multimedia 2021

    Figure for ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of the Surround-view System