Full list on Google Scholar.
A brief future research proposal is also available.
$*$ = equal contribution
$\dagger$ = equal supervision
-
DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models
Submitted to ICLR 2025 (3 positive reviews)
TL;DR
A novel, simple approach to learning a semantic distribution of images, enabling diverse personalized image and 3D generation with flexible variation and editing capabilities.
-
Benchmark Dataset for Radiology Report Generation with Instructions and Contexts
Submitted to ICLR 2025
[Paper] TL;DR
New radiology report generation tasks, a data generation pipeline, and a dataset set in a real-world clinical setting with medical contexts, together with a novel baseline architecture that adapts a general-domain visual LLM to the medical domain and to context-aware report generation.
-
Probabilistic Prompt Distribution Learning for Animal Pose Estimation
Submitted to CVPR 2025
TL;DR
A probabilistic prompt-learning approach for multi-species animal pose estimation that uses diverse textual representations and cross-modal fusion to handle large data variance, achieving state-of-the-art results.
-
Asymmetric Semantic Optimization for Text-Video Retrieval
Submitted to CVPR 2025
TL;DR
A novel method that applies asymmetric semantic optimization to text-video retrieval, using knowledge-based text editing and multi-granularity interactions to refine language features and highlight crucial visual cues, achieving state-of-the-art performance.
-
Unbalanced Multi-view Stochastic Embedding Learning for Text-Video Retrieval
Submitted to CVPR 2025
TL;DR
A text-video retrieval method enhanced with multi-view descriptions, adaptive embedding optimization, and a memory-based mechanism to reduce bias, achieving state-of-the-art results.
-
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation (Extension)
arXiv preprint 2023
TL;DR
A method for generating detection and segmentation training data using text-to-image synthesis and a copy-and-paste scheme. Models trained on the synthetic data perform comparably to those trained on real data alone, improve further when the two are combined, and excel in zero-shot and out-of-distribution scenarios.
-
EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation
arXiv preprint 2022
[arXiv] TL;DR
An EM-guided segment selection and cut-paste approach for weakly-supervised instance segmentation using only image-level supervision. By refining object masks, generating context-aware backgrounds, and compositing these into a pseudo-labeled dataset, EM-Paste achieves state-of-the-art results on PASCAL VOC and COCO, outperforming baselines and addressing long-tail class augmentation.
-
Progressive Motion Coherence for Remote Sensing Image Matching
IEEE Transactions on Geoscience and Remote Sensing, Volume 60
[IEEE Xplore] TL;DR
A feature-based remote sensing image matching method that uses novel coherence constraints for robustness to image degradations and large rotations.
-
Rectified Neighborhood Construction for Robust Feature Matching With Heavy Outliers
IEEE Geoscience and Remote Sensing Letters, Volume 19
[IEEE Xplore] TL;DR
A rectified neighborhood construction strategy to improve local consistency-based feature matching by mitigating the impact of outliers and adaptively estimating parameters.
-
Extrinsic Self-calibration of the Surround-view System: A Weakly Supervised Approach
IEEE Transactions on Multimedia, Volume 25
-
Scene Text Image Super-Resolution via Parallelly Contextual Attention Network
ACM Multimedia 2021
[ACM Digital Library] [GitHub] TL;DR
A network architecture that reconstructs high-frequency information and adaptively captures horizontal and vertical sequence-dependent features for text super-resolution.
-
ROECS: A Robust Semi-direct Pipeline Towards Online Extrinsics Correction of the Surround-view System
ACM Multimedia 2021