We introduce MedVidBench, a large-scale benchmark of 531,850 video-instruction pairs across 8 medical sources spanning video-, segment-, and frame-level tasks. We also introduce MedGRPO, a novel RL framework for balanced multi-dataset training with cross-dataset reward normalization and a medical LLM judge that evaluates caption quality on five clinical dimensions.
@article{medgrpo2026,title={MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding},author={Su, Yuhao and Choudhuri, Anwesa and Gao, Zhongpai and Planche, Benjamin and Nguyen, Van Nguyen and Zheng, Meng and Shen, Yuhan and Innanje, Arun and Chen, Terrence and Elhamifar, Ehsan and Wu, Ziyan},year={2026},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
Consistent Instance Field for Dynamic Scene Understanding
We introduce Consistent Instance Field, a continuous and probabilistic spatio-temporal representation for dynamic scene understanding. Unlike prior methods that rely on discrete tracking or view-dependent features, our approach disentangles visibility from persistent object identity by modeling each space-time point with an occupancy probability and a conditional instance distribution.
@article{consistent2026,title={Consistent Instance Field for Dynamic Scene Understanding},author={Wu, Junyi and Nguyen, Van Nguyen and Planche, Benjamin and Tao, Jiachen and Sun, Changchang and Gao, Zhongpai and Zhao, Zhenghao and Choudhuri, Anwesa and Zhang, Gengyu and Zheng, Meng and Wang, Feiran and Chen, Terrence and Yan, Yan and Wu, Ziyan},year={2026},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
We introduce Universal Beta Splatting (UBS), a unified framework that generalizes 3D Gaussian Splatting to N-dimensional anisotropic Beta kernels for explicit radiance field rendering. Unlike fixed Gaussian primitives, Beta kernels enable controllable dependency modeling across spatial, angular, and temporal dimensions within a single representation.
@article{universal2026,title={Universal Beta Splatting},author={Liu, Rong and Gao, Zhongpai and Planche, Benjamin and Chen, Meida and Nguyen, Van Nguyen and Zheng, Meng and Choudhuri, Anwesa and Chen, Terrence and Wang, Yue and Feng, Andrew and Wu, Ziyan},year={2026},journal={International Conference on Learning Representations (ICLR)},}
2025
highlight
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
We propose a novel pipeline designed to reconstruct occlusion-resilient 3D humans with multiview consistency from a single occluded image, without requiring either ground-truth geometric prior annotations or 3D supervision. Specifically, CHROME leverages a multiview diffusion model to first synthesize occlusion-free human images from the occluded input.
@article{chrome2025,title={CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image},author={Dutta, Arindam and Zheng, Meng and Gao, Zhongpai and Planche, Benjamin and Choudhuri, Anwesa and Roy-Chowdhury, Amit K. and Chen, Terrence and Wu, Ziyan},year={2025},journal={IEEE/CVF International Conference on Computer Vision (ICCV)},note={highlight},}
We present 7D Gaussian Splatting (7DGS), a unified framework representing scene elements as seven-dimensional Gaussians spanning position (3D), time (1D), and viewing direction (3D). Experiments demonstrate that 7DGS outperforms prior methods by up to 7.36 dB in PSNR while achieving real-time rendering on challenging dynamic scenes.
@article{7dgs2025,title={7DGS: Unified Spatial-Temporal-Angular Gaussian Splatting},author={Gao, Zhongpai and Planche, Benjamin and Zheng, Meng and Choudhuri, Anwesa and Chen, Terrence and Wu, Ziyan},year={2025},journal={IEEE/CVF International Conference on Computer Vision (ICCV)},}
PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis
We introduce PolypSegTrack, a novel foundation model that jointly addresses polyp detection, segmentation, classification and unsupervised tracking in colonoscopic videos.
@article{polypsegtrack2025,title={PolypSegTrack: Unified Foundation Model for Colonoscopy Video Analysis},author={Choudhuri, Anwesa and Gao, Zhongpai and Zheng, Meng and Planche, Benjamin and Chen, Terrence and Wu, Ziyan},year={2025},journal={Medical Image Computing and Computer Assisted Intervention (MICCAI)},}
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
We propose Seq2Time, a data-oriented training paradigm that leverages sequences of images and short video clips to enhance temporal awareness in long videos. By converting sequence positions into temporal annotations, we transform large-scale image and clip captioning datasets into sequences that mimic the temporal structure of long videos.
@article{seq2time2025,title={Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding},author={Deng, Andong and Gao, Zhongpai and Choudhuri, Anwesa and Planche, Benjamin and Zheng, Meng and Wang, Bin and Chen, Chen and Chen, Terrence and Wu, Ziyan},year={2025},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
We introduce 6D Gaussian Splatting (6DGS), which enhances color and opacity representations and leverages the additional directional information in the 6D space for optimized Gaussian control. Our approach significantly improves real-time radiance field rendering by better modeling view-dependent effects and fine details.
@article{6dgs2025,title={6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering},author={Gao, Zhongpai and Planche, Benjamin and Zheng, Meng and Choudhuri, Anwesa and Chen, Terrence and Wu, Ziyan},year={2025},journal={International Conference on Learning Representations (ICLR)},}
We propose a 3D vision-language Gaussian splatting model for scene understanding, emphasizing representation learning for the language modality. We introduce a novel cross-modal rasterizer that uses modality fusion along with a smoothed semantic indicator to enhance semantic rasterization.
@article{3dvlgs2025,title={3D Vision-Language Gaussian Splatting},author={Peng, Qucheng and Planche, Benjamin and Gao, Zhongpai and Zheng, Meng and Choudhuri, Anwesa and Chen, Terrence and Chen, Chen and Wu, Ziyan},year={2025},journal={International Conference on Learning Representations (ICLR)},}
We introduce a novel order-aware attention mechanism, where order maps seamlessly guide user interactions to attend to the image features. Our approach allows both dense and sparse integration of user clicks, enhancing both accuracy and efficiency compared to prior work.
@article{ois2025,title={Order-aware Interactive Segmentation},author={Wang, Bin and Choudhuri, Anwesa and Zheng, Meng and Gao, Zhongpai and Planche, Benjamin and Deng, Andong and Liu, Qin and Chen, Terrence and Bagci, Ulas and Wu, Ziyan},year={2025},journal={International Conference on Learning Representations (ICLR)},}
Automated Patient Positioning with Learned 3D Hand Gestures
We propose an automated patient positioning system that utilizes a camera to detect specific hand gestures from technicians, allowing users to indicate the target patient region to the system and initiate automated positioning. Our approach relies on a novel multi-stage pipeline to recognize and interpret the technicians’ gestures, translating them into precise motions of medical devices.
@article{patient2025,title={Automated Patient Positioning with Learned 3D Hand Gestures},author={Gao, Zhongpai and Sharma, Abhishek and Zheng, Meng and Planche, Benjamin and Chen, Terrence and Wu, Ziyan},year={2025},journal={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},}
2024
DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering
We present a novel approach that marries realistic physics-inspired X-ray simulation with efficient, differentiable DRR generation using 3D Gaussian splatting (3DGS). Our direction-disentangled 3DGS (DDGS) method separates the radiosity contribution into isotropic and direction-dependent components, approximating complex anisotropic interactions without intricate runtime simulations.
@article{ddgsct2024,title={DDGS-CT: Direction-Disentangled Gaussian Splatting for Realistic Volume Rendering},author={Gao, Zhongpai and Planche, Benjamin and Zheng, Meng and Chen, Xiao and Chen, Terrence and Wu, Ziyan},year={2024},journal={Annual Conference on Neural Information Processing Systems (NeurIPS)},}
Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images.
@article{divide2024,title={Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images},author={Luan, Tianyu and Gao, Zhongpai and Xie, Luyuan and Sharma, Abhishek and Ding, Hao and Planche, Benjamin and Zheng, Meng and Lou, Ange and Chen, Terrence and Yuan, Junsong and Wu, Ziyan},year={2024},journal={European Conference on Computer Vision (ECCV)},}
early accept
Few-Shot 3D Volumetric Segmentation with Multi-Surrogate Fusion
We present MSFSeg, a novel few-shot 3D segmentation framework with a lightweight multi-surrogate fusion (MSF). MSFSeg is able to automatically segment unseen 3D objects/organs (during training) provided with one or a few annotated 2D slices or 3D sequence segments.
@article{msfseg2024,title={Few-Shot 3D Volumetric Segmentation with Multi-Surrogate Fusion},author={Zheng, Meng and Planche, Benjamin and Gao, Zhongpai and Chen, Terrence and Radke, Richard J. and Wu, Ziyan},year={2024},journal={Medical Image Computing and Computer Assisted Intervention (MICCAI)},note={early accept},}
Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models
This work addresses the issue of cross-class domain adaptation (CCDA) in semantic segmentation, where the target domain contains both shared and novel classes that are either unlabeled or unseen in the source domain.
@article{ccda2024,title={Cross-Class Domain Adaptive Semantic Segmentation with Visual Language Models},author={Ren, Wenqi and Xia, Ruihao and Zheng, Meng and Wu, Ziyan and Tang, Yang and Sebe, Nicu},year={2024},journal={ACM Multimedia Conference (MM)},}
DaReNeRF: Direction-aware Representation for Dynamic Scenes
We present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information.
@article{darenerf2024,title={DaReNeRF: Direction-aware Representation for Dynamic Scenes},author={Lou, Ange and Planche, Benjamin and Gao, Zhongpai and Li, Yamin and Luan, Tianyu and Ding, Hao and Chen, Terrence and Noble, Jack and Wu, Ziyan},year={2024},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
The 2nd AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)
The official proceedings of the Second Workshop on Artificial Intelligence with Biased or Scarce Data in conjunction with AAAI Conference on Artificial Intelligence 2024.
@article{aibsdworkshop2024,title={The 2nd AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)},author={Peng, Kuan-Chuan and Aich, Abhishek and Wu, Ziyan},year={2024},journal={MDPI Comput. Sci. Math. Forum},volume={9},number={1},}
PBADet: A One-Stage Anchor-Free Approach for Part-Body Association
We present PBADet, a novel one-stage, anchor-free approach for part-body association detection. Building upon the anchor-free object representation across multi-scale feature maps, we introduce a singular part-to-body center offset that effectively encapsulates the relationship between parts and their parent bodies.
@article{pbadet2024,title={PBADet: A One-Stage Anchor-Free Approach for Part-Body Association},author={Gao, Zhongpai and Zhou, Huayi and Sharma, Abhishek and Zheng, Meng and Planche, Benjamin and Chen, Terrence and Wu, Ziyan},year={2024},journal={International Conference on Learning Representations (ICLR)},}
Implicit Modeling of Non-rigid Objects with Cross-Category Signals
In this work, we propose MODIF, a multi-object deep implicit function that jointly learns the deformation fields and instance-specific latent codes for multiple objects at once. Our emphasis is on non-rigid, non-interpenetrating entities such as organs.
@article{implicit2024,title={Implicit Modeling of Non-rigid Objects with Cross-Category Signals},author={Liu, Yuchun and Planche, Benjamin and Zheng, Meng and Gao, Zhongpai and Sibut-Bourde, Pierre and Yang, Fan and Chen, Terrence and Wu, Ziyan},year={2024},journal={AAAI Conference on Artificial Intelligence (AAAI)},}
Disguise without Disruption: Utility-Preserving Face De-Identification
In this paper, we introduce Disguise, a novel algorithm that seamlessly de-identifies facial images while ensuring the usability of the modified data. Our solution is firmly grounded in the domains of differential privacy and ensemble-learning research.
@article{disguise2024,title={Disguise without Disruption: Utility-Preserving Face De-Identification},author={Cai, Zikui and Gao, Zhongpai and Planche, Benjamin and Zheng, Meng and Chen, Terrence and Asif, M. Salman and Wu, Ziyan},year={2024},journal={AAAI Conference on Artificial Intelligence (AAAI)},}
Federated Learning via Input-Output Collaborative Distillation
We propose a federated learning framework eliminating any requirement of recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users.
@article{federated2024,title={Federated Learning via Input-Output Collaborative Distillation},author={Gong, Xuan and Li, Shanglin and Bao, Yuxiang and Yao, Barry and Huang, Yawen and Wu, Ziyan and Zhang, Baochang and Zheng, Yefeng and Doermann, David},year={2024},journal={AAAI Conference on Artificial Intelligence (AAAI)},}
2023
CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation
Event cameras, as a new form of vision sensors, are complementary to conventional cameras with their high dynamic range. We propose a novel unsupervised Cross-Modality Domain Adaptation (CMDA) framework to leverage multi-modality (Images and Events) information for nighttime semantic segmentation.
@article{cmda2023,title={CMDA: Cross-Modality Domain Adaptation for Nighttime Semantic Segmentation},author={Xia, Ruihao and Zhao, Chaoqiang and Zheng, Meng and Wu, Ziyan and Sun, Qiyu and Tang, Yang},year={2023},journal={IEEE/CVF International Conference on Computer Vision (ICCV)},}
oral
Progressive Multi-view Human Mesh Recovery with Self-Supervision
We propose a novel simulation-based training pipeline for multi-view human mesh recovery, which (a) relies on intermediate 2D representations which are more robust to synthetic-to-real domain gap; (b) leverages learnable calibration and triangulation to adapt to more diversified camera setups; and (c) progressively aggregates multi-view information in a canonical 3D space.
@article{progressive2023,title={Progressive Multi-view Human Mesh Recovery with Self-Supervision},author={Gong, Xuan and Song, Liangchen and Zheng, Meng and Planche, Benjamin and Chen, Terrence and Yuan, Junsong and Doermann, David and Wu, Ziyan},year={2023},journal={AAAI Conference on Artificial Intelligence (AAAI)},note={oral},}
2022
Federated Learning with Privacy-Preserving Ensemble Attention Distillation
We propose a privacy-preserving FL framework leveraging unlabeled public data for one-way offline knowledge distillation. The central model is learned from local knowledge via ensemble attention distillation.
@article{flppead2022,title={Federated Learning with Privacy-Preserving Ensemble Attention Distillation},author={Gong, Xuan and Song, Liangchen and Vedula, Rishi and Sharma, Abhishek and Zheng, Meng and Planche, Benjamin and Innanje, Arun and Chen, Terrence and Yuan, Junsong and Doermann, David and Wu, Ziyan},year={2022},journal={IEEE Transactions on Medical Imaging (TMI)},}
We introduce a novel framework, the Scene History Excavating Network (SHENet), which leverages scene history in a simple yet effective way to forecast a person's future trajectory.
@article{trajectory2022,title={Forecasting Human Trajectory from Scene History},author={Meng, Mancheng and Wu, Ziyan and Chen, Terrence and Cai, Xiran and Zhou, Xiang Sean and Yang, Fan and Shen, Dinggang},year={2022},journal={Annual Conference on Neural Information Processing Systems (NeurIPS)},note={spotlight},}
We leverage a neural motion field for estimating the motion of all points in a multiview setting. We propose to regularize the estimated motion to be predictable.
@article{pref2022,title={PREF: Predictability Regularized Neural Motion Fields},author={Song, Liangchen and Gong, Xuan and Planche, Benjamin and Zheng, Meng and Doermann, David and Yuan, Junsong and Chen, Terrence and Wu, Ziyan},year={2022},journal={European Conference on Computer Vision (ECCV)},note={oral},}
Self-supervised Human Mesh Recovery with Cross-Representation Alignment
We propose cross-representation alignment utilizing the complementary information from the robust but sparse representation (2D keypoints). Specifically, the alignment errors between the initial mesh estimate and both 2D representations are forwarded into the regressor and dynamically corrected.
@article{cra2022,title={Self-supervised Human Mesh Recovery with Cross-Representation Alignment},author={Gong, Xuan and Zheng, Meng and Planche, Benjamin and Karanam, Srikrishna and Chen, Terrence and Doermann, David and Wu, Ziyan},year={2022},journal={European Conference on Computer Vision (ECCV)},}
PseudoClick: Interactive Image Segmentation with Click Imitation
We propose PseudoClick, a generic framework that enables existing segmentation networks to propose candidate next clicks as an imitation of human clicks to refine the segmentation mask.
@article{pseudoclick2022,title={PseudoClick: Interactive Image Segmentation with Click Imitation},author={Liu, Qin and Zheng, Meng and Planche, Benjamin and Karanam, Srikrishna and Chen, Terrence and Niethammer, Marc and Wu, Ziyan},year={2022},journal={European Conference on Computer Vision (ECCV)},}
early accept
Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion
We propose a generic modularized 3D patient modeling method consisting of (a) a multi-modal keypoint detection module with attentive fusion; and (b) a self-supervised 3D mesh regression module.
@article{patientmodeling2022,title={Self-supervised 3D Patient Modeling with Multi-modal Attentive Fusion},author={Zheng, Meng and Planche, Benjamin and Gong, Xuan and Yang, Fan and Chen, Terrence and Wu, Ziyan},year={2022},journal={Medical Image Computing and Computer Assisted Intervention (MICCAI)},note={early accept},}
We propose the first method to generate generic visual similarity explanations with gradient-based attention.
@article{similarityattention2022,title={Visual Similarity Attention},author={Zheng, Meng and Karanam, Srikrishna and Chen, Terrence and Radke, Richard J. and Wu, Ziyan},year={2022},journal={International Joint Conference on Artificial Intelligence (IJCAI)},}
We present the first learning-based approach to estimate the patient’s internal organ deformation for arbitrary human poses.
@article{smpla2022,title={SMPL-A: Modeling Person-Specific Deformable Anatomy},author={Guo, Hengtao and Planche, Benjamin and Zheng, Meng and Karanam, Srikrishna and Chen, Terrence and Wu, Ziyan},year={2022},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)
The official proceedings of the Workshop on Artificial Intelligence with Biased or Scarce Data in conjunction with AAAI Conference on Artificial Intelligence 2022.
@article{aibsdworkshop2022,title={AAAI Workshop on Artificial Intelligence with Biased or Scarce Data (AIBSD)},author={Peng, Kuan-Chuan and Wu, Ziyan},year={2022},journal={MDPI Comput. Sci. Math. Forum},}
Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation
We propose a quantized and noisy ensemble of local predictions from completely trained local models for stronger privacy guarantees without sacrificing accuracy. Based on extensive experiments on classification and segmentation tasks, we show that our method outperforms baseline FL algorithms with superior performance in both accuracy and data privacy preservation.
@article{preserving2022162,title={Preserving Privacy in Federated Learning with Ensemble Cross-Domain Knowledge Distillation},author={Gong, Xuan and Sharma, Abhishek and Karanam, Srikrishna and Wu, Ziyan and Chen, Terrence and Doermann, David and Innanje, Arun},year={2022},journal={AAAI Conference on Artificial Intelligence (AAAI)},}
Multi-motion and Appearance Self-Supervised Moving Object Detection
We propose a Multi-motion and Appearance Self-supervised Network (MASNet) that introduces multi-scale motion information and scene appearance information for moving object detection (MOD). Multi-scale motion helps aggregate partially detected regions into a more complete detection, while appearance information serves as an additional cue when motion independence is unreliable and helps remove false detections in the background caused by locally independent background motion.
@article{multimotion2022157,title={Multi-motion and Appearance Self-Supervised Moving Object Detection},author={Yang, Fan and Karanam, Srikrishna and Zheng, Meng and Chen, Terrence and Ling, Haibin and Wu, Ziyan},year={2022},journal={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},}
Zero-shot Deep Domain Adaptation with Common Representation Learning
We propose zero-shot deep domain adaptation (ZDDA). ZDDA-C/ML learns to generate common representations for source- and target-domain data. These representations can then be used either to train a system that works on both domains, or to eliminate the need for one domain in sensor-fusion settings. In this paper, two variants of ZDDA are developed, for classification and metric-learning tasks respectively.
@article{zeroshot2022114,title={Zero-shot Deep Domain Adaptation with Common Representation Learning},author={Kutbi, Mohammed and Peng, Kuan-Chuan and Wu, Ziyan},year={2022},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 44, No. 7, pp. 3909-3924},}
2021
Everybody Is Unique: Towards Unbiased Human Mesh Recovery
We present a generalized human mesh optimization algorithm that substantially improves the performance of existing methods on both obese person images as well as community-standard benchmark datasets. The proposed method utilizes only 2D annotations without relying on supervision from expensive-to-create mesh parameters.
@article{everybody2021140,title={Everybody Is Unique: Towards Unbiased Human Mesh Recovery},author={Li, Ren and Karanam, Srikrishna and Zheng, Meng and Chen, Terrence and Wu, Ziyan},year={2021},journal={British Machine Vision Conference (BMVC)},note={oral},}
Learning Local Recurrent Models for Human Mesh Recovery
We present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model. We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body.
@article{learning2021149,title={Learning Local Recurrent Models for Human Mesh Recovery},author={Li, Runze and Karanam, Srikrishna and Li, Ren and Chen, Terrence and Bhanu, Bir and Wu, Ziyan},year={2021},journal={International Conference on 3D Vision (3DV)},}
Ensemble Attention Distillation for Privacy-Preserving Federated Learning
We propose a new distillation-based FL framework that can preserve privacy by design, while also consuming substantially less network communication resources when compared to the current methods. Our framework engages in inter-node communication using only publicly available and approved datasets, thereby giving explicit privacy control to the user. To distill knowledge among the various local models, our framework involves a novel ensemble distillation algorithm that uses both final prediction as well as model attention.
@article{ensemble2021187,title={Ensemble Attention Distillation for Privacy-Preserving Federated Learning},author={Gong, Xuan and Sharma, Abhishek and Karanam, Srikrishna and Wu, Ziyan and Chen, Terrence and Doermann, David and Innanje, Arun},year={2021},journal={IEEE/CVF International Conference on Computer Vision (ICCV)},}
Spatio-Temporal Representation Factorization for Video-based Person Re-Identification
We propose Spatio-Temporal Representation Factorization (STRF), a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID. The key innovations of STRF over prior work include explicit pathways for learning discriminative temporal and spatial features, with each component further factorized to capture complementary person-specific appearance and motion information. Specifically, temporal factorization comprises two branches, one each for static features (e.g., the color of clothes) that do not change much over time, and dynamic features (e.g., walking patterns) that change over time.
@article{spatiotemporal2021147,title={Spatio-Temporal Representation Factorization for Video-based Person Re-Identification},author={Aich, Abhishek and Zheng, Meng and Karanam, Srikrishna and Chen, Terrence and Roy-Chowdhury, Amit K. and Wu, Ziyan},year={2021},journal={IEEE/CVF International Conference on Computer Vision (ICCV)},}
A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts
We propose a visual reasoning explanation framework (VRX) to interpret neural networks, which extracts relevant class-specific visual concepts and organizes them using structural concept graphs based on pairwise concept relationships. By means of knowledge distillation, we show VRX can take a step towards mimicking the reasoning process of NNs and provide logical, concept-level explanations for final model decisions. With extensive experiments, we empirically show VRX can meaningfully answer "why" and "why not" questions about the prediction, providing easy-to-understand insights about the reasoning process. We also show that these insights can potentially provide guidance on improving NN performance.
@article{a2021136,title={A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts},author={Ge, Yunhao and Xiao, Yao and Xu, Zhi and Zheng, Meng and Karanam, Srikrishna and Chen, Terrence and Itti, Laurent and Wu, Ziyan},year={2021},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis
We propose a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues.
@article{learning2021132,title={Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis},author={Ouyang, Xi and Karanam, Srikrishna and Wu, Ziyan and Chen, Terrence and Huo, Jiayu and Zhou, Xiang Sean and Wang, Qian and Cheng, Jie-Zhi},year={2021},journal={IEEE Transactions on Medical Imaging (TMI), Vol. 40, No. 10, pp. 2698-2710},}
This paper considers the problem of 3D patient body modeling. Such a 3D model provides valuable information for improving patient care, streamlining clinical workflow, automated parameter optimization for medical devices etc. We present a novel robust dynamic fusion technique that facilitates flexible multi-modal inference, resulting in accurate 3D body modeling even when the input sensor modality is only a subset of the training modalities.
@article{robust2020,title={Robust Multi-modal 3D Patient Body Modeling},author={Yang, Fan and Li, Ren and Georgakis, Georgios and Karanam, Srikrishna and Chen, Terrence and Ling, Haibin and Wu, Ziyan},year={2020},journal={Medical Image Computing and Computer Assisted Intervention (MICCAI)},}
The COVID-19 pandemic, caused by the highly contagious SARS-CoV-2 virus, has overwhelmed healthcare systems worldwide, putting medical professionals at a high risk of getting infected themselves due to a global shortage of personal protective equipment. To help alleviate this problem, we design and develop a contactless patient positioning system that can enable scanning patients in a completely remote and contactless fashion. Our key design objective is to reduce the physical contact time with a patient as much as possible, which we achieve with our contactless workflow.
@article{towards2020,title={Towards Contactless Patient Positioning},author={Yang, Fan and Karanam, Srikrishna and Li, Ren and Hu, Wei and Chen, Terrence and Wu, Ziyan},year={2020},journal={IEEE Transactions on Medical Imaging (TMI), Vol. 39, No. 8, pp. 2701-2710},}
Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19
We cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up. We particularly focus on the integration of AI with X-ray and CT, both of which are widely used in frontline hospitals, in order to depict the latest progress of medical imaging and radiology in fighting COVID-19.
@article{review2020,title={Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19},author={Shi, Feng and Wang, Jun and Shi, Jun and Wu, Ziyan and Wang, Qian and Tang, Zhenyu and He, Kelei and Shi, Yinghuan and Shen, Dinggang},year={2020},journal={IEEE Reviews in Biomedical Engineering (RBME), Vol. 14, pp. 4-15},}
oral
Towards Visually Explaining Variational Autoencoders
We propose the first technique to visually explain VAEs by means of gradient-based attention. We present methods to generate visual attention from the learned latent space, and also demonstrate such attention explanations serve more than just explaining VAE predictions. We show how these attention maps can be used to localize anomalies in images, and how they can be infused into model training, helping bootstrap the VAE into learning improved latent space disentanglement.
@article{towardsvae2020,title={Towards Visually Explaining Variational Autoencoders},author={Liu, Wenqian and Li, Runze and Zheng, Meng and Karanam, Srikrishna and Wu, Ziyan and Bhanu, Bir and Radke, Richard J. and Camps, Octavia},year={2020},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},note={oral},}
In this work, we address this gap by proposing a new technique for regressing a human parametric model that is explicitly informed by the known hierarchical structure, including the joint interdependencies of the model. This results in a strongly prior-informed regressor architecture and an associated hierarchical optimization that can flexibly be used in conjunction with current standard frameworks for 3D human mesh recovery. *Equal Contributions
@article{hierarchical2020100,title={Hierarchical Kinematic Human Mesh Recovery},author={Georgakis, Georgios and Li, Ren and Karanam, Srikrishna and Chen, Terrence and Kosecka, Jana and Wu, Ziyan},year={2020},journal={European Conference on Computer Vision (ECCV)},}
This is an extension of our CVPR 18 work with added support of bounding box labels seamlessly integrated with image level and pixel level labels for weakly supervised semantic segmentation.
@article{guided2020180,title={Guided Attention Inference Network},author={Li, Kunpeng and Wu, Ziyan and Peng, Kuan-Chuan and Ernst, Jan and Fu, Yun},year={2020},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), Vol. 42, No. 12, pp. 2996-3010},}
Knowledge distillation should not only focus on "what", but also "why". We proposed an online learning method to preserve the existing knowledge without storing any data.
@article{memorizing2019,title={Learning without Memorizing},author={Dhar, Prithviraj and Singh, Rajat Vikram and Peng, Kuan-Chuan and Wu, Ziyan and Chellappa, Rama},year={2019},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
Re-identification with Consistent Attentive Siamese Networks
We proposed the first learning architecture that integrates attention consistency modeling and Siamese representation learning in a joint learning framework for person re-id.
@article{casn2019,title={Re-identification with Consistent Attentive Siamese Networks},author={Zheng, Meng and Karanam, Srikrishna and Wu, Ziyan and Radke, Richard J.},year={2019},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
We present a technique to produce counterfactual visual explanations. Given a 'query' image I for which a vision system predicts class c, a counterfactual visual explanation identifies how I could change such that the system would output a different specified class c'.
@article{counterfactual2019,title={Counterfactual Visual Explanations},author={Goyal, Yash and Wu, Ziyan and Ernst, Jan and Batra, Dhruv and Parikh, Devi and Lee, Stefan},year={2019},journal={International Conference on Machine Learning (ICML)},}
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
We present an extensive review and performance evaluation of single and multi-shot re-id algorithms based on a new large-scale dataset.
@article{benchmark2019,title={A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets},author={Karanam, Srikrishna and Gou, Mengran and Wu, Ziyan and Rates-Borras, Angels and Camps, Octavia and Radke, Richard J.},year={2019},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},}
We present a method to incrementally generate complete 2D or 3D scenes with global consistency at each step according to a learned scene prior. Real observations of a scene can be incorporated while preserving global consistency, and unobserved regions can be hallucinated locally in a manner consistent with previous observations, prior hallucinations, and global priors. Hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations.
@article{incremental2019,title={Incremental Scene Synthesis},author={Planche, Benjamin and Rong, Xuejian and Wu, Ziyan and Karanam, Srikrishna and Kosch, Harald and Tian, YingLi and Ernst, Jan and Hutter, Andreas},year={2019},journal={Annual Conference on Neural Information Processing Systems (NeurIPS)},}
Sharpen Focus: Learning with Attention Separability and Consistency
We improve the generalizability of CNNs by means of a new framework that makes class-discriminative attention a principled part of the learning process. We propose new learning objectives for attention separability and cross-layer consistency, which result in improved attention discriminability and reduced visual confusion.
@article{sharpen2019,title={Sharpen Focus: Learning with Attention Separability and Consistency},author={Wang, Lezi and Wu, Ziyan and Karanam, Srikrishna and Peng, Kuan-Chuan and Singh, Rajat Vikram and Liu, Bo and Metaxas, Dimitris N.},year={2019},journal={IEEE International Conference on Computer Vision (ICCV)},}
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
We solve the key problem of existing 3D object pose estimation methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation. Our method requires neither real-world textures for CAD models nor explicit 3D pose annotations for RGB images.
@article{learning2019,title={Learning Local RGB-to-CAD Correspondences for Object Pose Estimation},author={Georgakis, Georgios and Karanam, Srikrishna and Wu, Ziyan and Kosecka, Jana},year={2019},journal={IEEE International Conference on Computer Vision (ICCV)},}
Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition
We introduce a pipeline to map unseen target samples into the synthetic domain used to train task-specific methods, denoising the data and retaining only the features these recognition algorithms are familiar with.
@article{seeing2019,title={Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition},author={Planche, Benjamin and Zakharov, Sergey and Wu, Ziyan and Hutter, Andreas and Kosch, Harald and Ilic, Slobodan},year={2019},journal={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},}
We propose zero-shot deep domain adaptation (ZDDA) for domain adaptation and sensor fusion. ZDDA learns from the task-irrelevant dual-domain pairs when the task-relevant target-domain training data is unavailable.
@article{zdda2018,title={Zero Shot Deep Domain Adaptation},author={Peng, Kuan-Chuan and Wu, Ziyan and Ernst, Jan},year={2018},journal={European Conference on Computer Vision (ECCV)},}
spotlight
Tell Me Where To Look: Guided Attention Inference Network
We address three shortcomings of previous approaches in modeling attention maps: (1) making attention maps an explicit component of end-to-end training, (2) providing self-guidance directly on these maps, and (3) bridging the gap between weak and extra supervision.
@article{tellmewhere2018,title={Tell Me Where To Look: Guided Attention Inference Network},author={Li, Kunpeng and Wu, Ziyan and Peng, Kuan-Chuan and Ernst, Jan and Fu, Yun},year={2018},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},note={spotlight},}
spotlight
Learning Compositional Visual Concepts with Mutual Consistency
We proposed ConceptGAN, a novel concept learning framework where we seek to capture underlying semantic shifts between data domains instead of mappings restricted to training distributions.
@article{conceptgan2018,title={Learning Compositional Visual Concepts with Mutual Consistency},author={Gong, Yunye and Karanam, Srikrishna and Wu, Ziyan and Peng, Kuan-Chuan and Ernst, Jan and Doerschuk, Peter C.},year={2018},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},note={spotlight},}
End-to-End Learning of Keypoint Detector and Descriptor for Pose Invariant 3D Matching
We proposed an end-to-end learning framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans.
@article{e2ekeypoint2018,title={End-to-End Learning of Keypoint Detector and Descriptor for Pose Invariant 3D Matching},author={Georgakis, Georgios and Karanam, Srikrishna and Wu, Ziyan and Ernst, Jan and Kosecka, Jana},year={2018},journal={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},}
Learning Affine Hull Representations for Multi-Shot Person Re-Identification
We describe the image sequence data using affine hulls and incorporate affine hull data modeling into the traditional distance metric learning framework.
@article{affinehull2018,title={Learning Affine Hull Representations for Multi-Shot Person Re-Identification},author={Karanam, Srikrishna and Wu, Ziyan and Radke, Richard J.},year={2018},journal={IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)},volume={28},number={10},pages={2500--2512},}
oral
Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only
We propose a novel approach leveraging only CAD models to bridge the realism gap. A GAN learns to effectively segment depth images and recover the clean synthetic-looking depth information.
@article{keepitunreal2018,title={Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only},author={Zakharov, Sergey and Planche, Benjamin and Wu, Ziyan and Hutter, Andreas and Kosch, Harald and Ilic, Slobodan},year={2018},journal={International Conference on 3D Vision (3DV)},note={oral},}
We proposed a weakly supervised approach to summarize videos with only video-level annotation, introducing an effective method for computing spatio-temporal importance scores.
@article{videosum2017,title={Weakly Supervised Summarization of Web Videos},author={Panda, Rameswar and Das, Abir and Wu, Ziyan and Ernst, Jan and Roy-Chowdhury, Amit K.},year={2017},journal={IEEE International Conference on Computer Vision (ICCV)},}
oral
DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition
We propose an end-to-end framework which simulates the whole mechanism of 3D sensors, generating realistic depth data from 3D models.
@article{depthsynth2017,title={DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition},author={Planche, Benjamin and Wu, Ziyan and Ma, Kai and Sun, Shanhui and Kluckner, Stefan and Chen, Terrence and Hutter, Andreas and Kosch, Harald and Ernst, Jan},year={2017},journal={International Conference on 3D Vision (3DV)},note={oral},}
We present a method to track vessels in angiography. The vessel tree tracking problem is solved using an efficient dynamic programming algorithm.
@article{vesseltree2017,title={Vessel Tree Tracking in Angiographic Sequences},author={Zhang, Dong and Sun, Shanhui and Wu, Ziyan and Chen, Bor-Jeng and Chen, Terrence},year={2017},journal={Journal of Medical Imaging (JMI)},volume={4},number={2},pages={025001},}
From the Lab to the Real World: Re-Identification in an Airport Camera Network
We detail the challenges of the real-world airport environment and the computer vision algorithms underlying our human detection and re-identification system.
@article{labtorealistic2017,title={From the Lab to the Real World: Re-Identification in an Airport Camera Network},author={Camps, Octavia and Gou, Mengran and Hebble, Tom and Karanam, Srikrishna and Lehmann, Oliver and Li, Yang and Radke, Richard J. and Wu, Ziyan and Xiong, Fei},year={2017},journal={IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)},volume={27},number={3},pages={540--553},}
This book covers aspects of human re-identification problems related to computer vision and machine learning, bridging the gap between research and reality.
We model the wire-like structure as a sequence of small segments and formulate guidewire tracking as a graph-based optimization problem.
@article{guidewire2016,title={Guidewire Tracking Using a Novel Sequential Segment Optimization Method in Interventional X-Ray Videos},author={Chen, Bor-Jeng and Wu, Ziyan and Sun, Shanhui and Zhang, Dong and Chen, Terrence},year={2016},journal={IEEE International Symposium on Biomedical Imaging (ISBI)},}
2015
Viewpoint Invariant Human Re-Identification in Camera Networks Using Pose Priors and Subject-Discriminative Features
We build a model for human appearance as a function of pose, using training data gathered from a calibrated camera.
@article{poseprior2015,title={Viewpoint Invariant Human Re-Identification in Camera Networks Using Pose Priors and Subject-Discriminative Features},author={Wu, Ziyan and Li, Yang and Radke, Richard J.},year={2015},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},volume={37},number={5},pages={1095--1108},}
Multi-Shot Human Re-Identification Using Adaptive Fisher Discriminant Analysis
We introduce an algorithm to hierarchically cluster image sequences and use the representative data samples to learn a feature subspace maximizing the Fisher criterion.
@article{adaptivefisher2015,title={Multi-Shot Human Re-Identification Using Adaptive Fisher Discriminant Analysis},author={Li, Yang and Wu, Ziyan and Karanam, Srikrishna and Radke, Richard J.},year={2015},journal={British Machine Vision Conference (BMVC)},}
Multi-Shot Re-identification with Random-Projection-based Random Forest
We perform dimensionality reduction on image feature vectors through random projection for multi-shot Re-ID.
@article{randomforest2015,title={Multi-Shot Re-identification with Random-Projection-based Random Forest},author={Li, Yang and Wu, Ziyan and Radke, Richard J.},year={2015},journal={IEEE Winter Conference on Applications of Computer Vision (WACV)},}
2014
Multi-Object Tracking and Association With a Camera Network
This thesis investigates several important and challenging computer vision problems related to system calibration, multi-object tracking, and target behavior analysis.
@phdthesis{phdthesis2014,title={Multi-Object Tracking and Association With a Camera Network},author={Wu, Ziyan},year={2014},school={Rensselaer Polytechnic Institute (RPI)},}
oral
Virtual Insertion: Robust Bundle Adjustment over Long Video Sequences
We propose a novel "virtual insertion" scheme for Structure from Motion (SfM), which constructs virtual points and virtual frames to handle outages in visual landmark links.
@article{virtualinsertion2014,title={Virtual Insertion: Robust Bundle Adjustment over Long Video Sequences},author={Wu, Ziyan and Chiu, Han-Pang and Zhu, Zhiwei},year={2014},journal={British Machine Vision Conference (BMVC)},note={oral},}
Improving Counterflow Detection in Dense Crowds with Scene Features
This paper addresses the problem of detecting counterflow motion in videos of highly dense crowds by identifying scene features.
@article{counterflow2014,title={Improving Counterflow Detection in Dense Crowds with Scene Features},author={Wu, Ziyan and Radke, Richard J.},year={2014},journal={Pattern Recognition Letters (PRL)},volume={44},pages={152--160},}
Real-World Re-Identification in an Airport Camera Network
We discuss the high-level system design of the video surveillance application, and the issues we encountered during our development and testing. We also describe the algorithm framework for our human re-identification software, and discuss considerations of speed and matching performance.
@article{realworld2014194,title={Real-World Re-Identification in an Airport Camera Network},author={Li, Yang and Wu, Ziyan and Karanam, Srikrishna and Radke, Richard J.},year={2014},journal={ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC)},}
We propose a complete model for a pan-tilt-zoom camera that explicitly reflects how focal length and lens distortion vary as a function of zoom scale.
@article{calibrated2013,title={Keeping a PTZ Camera Calibrated},author={Wu, Ziyan and Radke, Richard J.},year={2013},journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},volume={35},number={8},pages={1994--2007},}
2012
Using Scene Features to Improve Wide-Area Video Surveillance
We introduce two novel methods to improve the performance of wide area video surveillance applications by using scene features.
@article{scenefeatures2012,title={Using Scene Features to Improve Wide-Area Video Surveillance},author={Wu, Ziyan and Radke, Richard J.},year={2012},journal={Workshop on Camera Networks and Wide Area Scene Analysis (CVPRW)},}
2011
Real-Time Airport Security Checkpoint Surveillance Using a Camera Network
We introduce an airport security checkpoint surveillance system using a camera network that maintains the association between bags and passengers.
@article{airportsecurity2011,title={Real-Time Airport Security Checkpoint Surveillance Using a Camera Network},author={Wu, Ziyan and Radke, Richard J.},year={2011},journal={Workshop on Camera Networks and Wide Area Scene Analysis (CVPRW)},}
We present resources for fostering paper-based election technology, comprising a diverse collection of real and simulated ballot and survey images.
@article{electiontech2011,title={Towards Improved Paper-based Election Technology},author={Barney Smith, Elisa and Lopresti, Daniel and Nagy, George and Wu, Ziyan},year={2011},journal={International Conference on Document Analysis and Recognition (ICDAR)},}
Robust tools were developed for determining the underlying grid of the targets on ballots challenged in the 2008 Minnesota elections.
@article{ballots2011,title={Characterizing Challenged Minnesota Ballots},author={Nagy, George and Lopresti, Daniel and Barney Smith, Elisa and Wu, Ziyan},year={2011},journal={Document Recognition and Retrieval XVIII (DRR)},}