Ziyan Wu
I am a Principal Expert Scientist at
UII America
in Cambridge, MA, where I work on computer vision and
machine learning problems in medical environments. Before
joining UII, I worked at Siemens and Honeywell.
I received my PhD in Computer and Systems Engineering
from the Department of Electrical, Computer, and Systems Engineering at Rensselaer Polytechnic
Institute in May 2014 under the supervision of Prof.
Richard J. Radke.
Google Scholar / LinkedIn / Email
Research
My research interests are in computer vision and machine learning, with a special focus on object detection and tracking, anomaly detection, augmented reality, scene understanding, human re-identification, and camera calibration.
Learning Hierarchical Attention for Weakly-supervised Chest X-Ray Abnormality Localization and Diagnosis
Xi Ouyang*,
Srikrishna Karanam,
Ziyan Wu, Terrence Chen,
Jiayu Huo,
Xiang Sean Zhou,
Qian Wang,
Jie-Zhi Cheng,
IEEE Transactions on Medical Imaging (TMI),
To Appear
We propose a new attention-driven weakly supervised algorithm comprising a hierarchical attention mining framework that unifies activation- and gradient-based visual attention in a holistic manner. Our key algorithmic innovations include the design of explicit ordinal attention constraints, enabling principled model training in a weakly-supervised fashion, while also facilitating the generation of visual-attention-driven model explanations by means of localization cues.
Robust Multi-modal 3D Patient Body Modeling
Fan Yang*,
Ren Li*,
Georgios Georgakis,
Srikrishna Karanam,
Terrence Chen,
Haibin Ling,
Ziyan Wu
Medical Image Computing and Computer Assisted Intervention (MICCAI), 2020
This paper considers the problem of 3D patient body modeling. Such a 3D model provides valuable information for improving patient care, streamlining clinical workflows, automating parameter optimization for medical devices, etc. We present a novel robust dynamic fusion technique that facilitates flexible multi-modal inference, resulting in accurate 3D body modeling even when the input sensor modality is only a subset of the training modalities.
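As a rough illustration of the fusion idea (not the network from the paper), here is a minimal PyTorch sketch of averaging whichever modality embeddings happen to be available at inference time:

```python
# Hypothetical sketch: embeddings from whichever sensors are present (e.g. RGB,
# depth, thermal) are pooled with a mask, so the same model can run on any
# subset of the training modalities. Illustrative only.
import torch

def fuse_modalities(embeddings: dict) -> torch.Tensor:
    """embeddings maps modality name -> (batch, dim) tensor, or None if absent."""
    available = [e for e in embeddings.values() if e is not None]
    if not available:
        raise ValueError("at least one modality must be observed")
    return torch.stack(available).mean(dim=0)   # fused (batch, dim) representation

fused = fuse_modalities({
    "rgb": torch.randn(4, 128),
    "depth": torch.randn(4, 128),
    "thermal": None,                            # missing at test time
})
print(fused.shape)  # torch.Size([4, 128])
```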
Hierarchical Kinematic Human Mesh Recovery
Georgios Georgakis*,
Ren Li*,
Srikrishna Karanam,
Terrence Chen,
Jana Kosecka,
Ziyan Wu
European Conference on Computer Vision (ECCV), 2020
We propose a new technique for regressing the parameters of a human parametric model that is explicitly informed by the model's known hierarchical structure, including joint interdependencies. This results in a strongly prior-informed design of the regressor architecture and an associated hierarchical optimization that can be used flexibly in conjunction with current standard frameworks for 3D human mesh recovery.
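For intuition only, a toy PyTorch sketch of hierarchy-aware regression, in which each joint's rotation prediction is conditioned on its parent's prediction over a small hypothetical kinematic tree; this is not the architecture from the paper:

```python
import torch
import torch.nn as nn

PARENTS = {0: None, 1: 0, 2: 0, 3: 1, 4: 2}  # toy 5-joint kinematic tree (hypothetical)

class HierarchicalRegressor(nn.Module):
    def __init__(self, feat_dim=256, rot_dim=6):
        super().__init__()
        self.heads = nn.ModuleDict({
            str(j): nn.Linear(feat_dim + (0 if p is None else rot_dim), rot_dim)
            for j, p in PARENTS.items()
        })

    def forward(self, feat):
        rots = {}
        for j, p in PARENTS.items():             # parents are visited before children
            inp = feat if p is None else torch.cat([feat, rots[p]], dim=-1)
            rots[j] = self.heads[str(j)](inp)
        return torch.stack([rots[j] for j in sorted(rots)], dim=1)

model = HierarchicalRegressor()
print(model(torch.randn(2, 256)).shape)  # torch.Size([2, 5, 6])
```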
Towards Contactless Patient Positioning
Srikrishna Karanam*,
Ren Li*,
Fan Yang*,
Wei Hu,
Terrence Chen,
Ziyan Wu
IEEE Transactions on Medical Imaging (TMI), August 2020
MICCAI webinar talk
The COVID-19 pandemic, caused by the highly contagious
SARS-CoV-2 virus, has overwhelmed healthcare systems
worldwide, putting medical professionals at a high risk of
getting infected themselves due to a global shortage of
personal protective equipment. To help alleviate this
problem, we design and develop a contactless patient
positioning system that can enable scanning patients in a
completely remote and contactless fashion. Our key design
objective is to reduce the physical contact time with a
patient as much as possible, which we achieve with our
contactless workflow.
Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19
Feng Shi*,
Jun Wang*,
Jun Shi*,
Ziyan Wu,
Qian Wang,
Zhenyu Tang,
Kelei He,
Yinghuan Shi,
Dinggang Shen
IEEE Reviews in Biomedical Engineering (RBME), April 2020
We cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up. We particularly focus on the integration of AI with X-ray and CT, both of which are widely used in frontline hospitals, in order to depict the latest progress of medical imaging and radiology in the fight against COVID-19.
Towards Visually Explaining Variational Autoencoders
Wenqian Liu*,
Runze Li*,
Meng Zheng,
Srikrishna Karanam,
Ziyan Wu,
Bir Bhanu,
Richard J. Radke,
Octavia Camps
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2020 (oral)
We propose the first technique to visually explain VAEs by means of gradient-based attention. We present methods to generate visual attention from the learned latent space, and demonstrate that such attention explanations serve more than just explaining VAE predictions. We show how these attention maps can be used to localize anomalies in images, and how they can be infused into model training, helping bootstrap the VAE into learning improved latent space disentanglement.
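A minimal PyTorch sketch of the general recipe of gradient-based attention from a latent space, using a toy encoder rather than our model: the latent mean is backpropagated into the last convolutional feature map and the channels are weighted by the pooled gradients, Grad-CAM style.

```python
import torch
import torch.nn as nn

class TinyVAEEncoder(nn.Module):                  # hypothetical toy encoder
    def __init__(self, latent_dim=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, latent_dim)

    def forward(self, x):
        f = self.features(x)                      # last conv feature map
        mu = self.fc_mu(f.flatten(1))             # latent mean
        return f, mu

def latent_attention(encoder, x):
    """Backpropagate the latent means into the conv features and weight the
    feature maps by the pooled gradients (Grad-CAM style)."""
    f, mu = encoder(x)
    f.retain_grad()
    mu.sum().backward()
    weights = f.grad.mean(dim=(2, 3), keepdim=True)
    cam = torch.relu((weights * f).sum(dim=1))    # one attention map per image
    return cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)

attention = latent_attention(TinyVAEEncoder(), torch.randn(2, 1, 28, 28))
print(attention.shape)  # torch.Size([2, 7, 7])
```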
Incremental Scene Synthesis
Benjamin Planche,
Xuejian Rong, Ziyan Wu,
Srikrishna Karanam, Harald Kosch, YingLi Tian, Jan Ernst,
Andreas Hutter
Annual Conference on Neural Information Processing Systems (NeurIPS), 2019
We present a method to incrementally generate complete 2D or 3D scenes with global consistency at each step, according to a learned scene prior. Real observations of a scene can be incorporated while maintaining global consistency, and unobserved regions can be hallucinated locally in a manner consistent with previous observations, earlier hallucinations, and global priors. Hallucinations are statistical in nature, i.e., different scenes can be generated from the same observations.
Sharpen Focus: Learning with Attention Separability and Consistency
Lezi Wang,
Ziyan Wu,
Srikrishna Karanam,
Kuan-Chuan Peng,
Rajat Vikram Singh,
Bo Liu,
Dimitris N. Metaxas
IEEE International Conference on Computer Vision (ICCV), 2019
We improve the generalizability of CNNs by means of a new framework that makes class-discriminative attention a principled part of the learning process. We propose new learning objectives for attention separability and cross-layer consistency, which result in improved attention discriminability and reduced visual
confusion.
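As a rough illustration (not the exact losses from the paper), a PyTorch sketch of an attention-separability penalty that discourages the ground-truth class attention from overlapping with the attention of all other classes:

```python
import torch

def separability_loss(attn, labels):
    """attn: (batch, num_classes, H, W) non-negative attention maps;
    labels: (batch,) ground-truth class indices."""
    b, c, h, w = attn.shape
    gt = attn[torch.arange(b), labels]                  # (b, H, W)
    mask = torch.ones(b, c, dtype=torch.bool)
    mask[torch.arange(b), labels] = False
    others = attn[mask].view(b, c - 1, h, w)            # all non-ground-truth maps
    overlap = torch.minimum(gt.unsqueeze(1), others)    # element-wise overlap
    return overlap.mean()

loss = separability_loss(torch.rand(2, 5, 7, 7), torch.tensor([1, 3]))
print(loss.item())
```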
Learning Local RGB-to-CAD Correspondences for Object Pose Estimation
Georgios Georgakis,
Srikrishna Karanam,
Ziyan Wu,
Jana Kosecka
IEEE International Conference on Computer Vision (ICCV), 2019
We solve the key problem of existing 3D object pose estimation methods requiring expensive 3D pose annotations by proposing a new method that matches RGB images to CAD models for object pose estimation.
Our method requires neither real-world textures for CAD models nor explicit 3D pose annotations for RGB images.
Guided Attention Inference Network
Kunpeng Li,
Ziyan Wu,
Kuan-Chuan Peng, Jan Ernst,
Yun Fu
IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), to appear, 2019
This is an extension of our CVPR 2018 work, with added support for bounding-box labels seamlessly integrated with image-level and pixel-level labels for weakly supervised semantic segmentation.
Seeing Beyond Appearance - Mapping Real Images into Geometrical Domains for Unsupervised CAD-based Recognition
Benjamin Planche*, Sergey Zakharov*, Ziyan Wu,
Andreas Hutter,
Harald Kosch,
Slobodan Ilic
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2019
We introduce a pipeline to map unseen target samples into the synthetic domain used to train task-specific methods, denoising the data and retaining only the features these recognition algorithms are familiar with.
Learning without Memorizing
Prithviraj Dhar*,
Rajat Vikram Singh*,
Kuan-Chuan Peng, Ziyan Wu,
Rama Chellappa
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2019
Knowledge distillation should not only focus on the "what", but also on the "why". We proposed an online learning method that preserves existing knowledge without storing any data, while making the classifier progressively learn to encode the new classes.
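A small PyTorch sketch of the general idea of distilling the "why": penalize the difference between normalized attention maps of the frozen old model and the updated model, so that no exemplars need to be stored. Illustrative only, not the exact formulation from the paper.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(teacher_attn, student_attn):
    """Both inputs: (batch, H, W) attention maps computed on the same images."""
    t = F.normalize(teacher_attn.flatten(1), p=2, dim=1)
    s = F.normalize(student_attn.flatten(1), p=2, dim=1)
    return (t - s).abs().sum(dim=1).mean()              # L1 between normalized maps

loss = attention_distillation_loss(torch.rand(4, 7, 7), torch.rand(4, 7, 7))
print(loss.item())
```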
Re-identification with Consistent Attentive Siamese Networks
Meng Zheng,
Srikrishna Karanam,
Ziyan Wu,
Richard J. Radke
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2019
We proposed the first learning architecture that integrates attention consistency modeling and Siamese representation learning in a joint learning framework, called the Consistent Attentive Siamese Network (CASN), for person re-id.
Counterfactual Visual Explanations
Yash Goyal, Ziyan Wu,
Jan Ernst,
Dhruv Batra,
Devi Parikh,
Stefan Lee
International Conference on Machine Learning (ICML), 2019
slides /
supplementary
A technique to produce counterfactual visual explanations. Given a 'query' image I for which a vision system predicts class c, a counterfactual visual explanation identifies how I could change such that the system would output a different specified class c′.
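For intuition, a toy PyTorch sketch of the underlying search: swap one spatial cell of the query's feature map with the corresponding cell from a distractor image of class c′, keeping the swap that most increases the classifier's score for c′. The exhaustive single-cell search below is illustrative only, not the optimization used in the paper.

```python
import torch

def best_single_swap(feat_q, feat_d, classifier_head, target_class):
    """feat_q, feat_d: (C, H, W) feature maps of the query and distractor images;
    classifier_head maps pooled features (1, C) to class logits."""
    c, h, w = feat_q.shape
    best_score, best_cell = -float("inf"), None
    for i in range(h):
        for j in range(w):
            swapped = feat_q.clone()
            swapped[:, i, j] = feat_d[:, i, j]           # composite feature map
            logits = classifier_head(swapped.mean(dim=(1, 2)).unsqueeze(0))
            score = logits[0, target_class].item()
            if score > best_score:
                best_score, best_cell = score, (i, j)
    return best_cell, best_score

head = torch.nn.Linear(64, 10)                           # toy classifier head
cell, score = best_single_swap(torch.randn(64, 7, 7), torch.randn(64, 7, 7), head, 3)
print(cell, score)
```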
A Systematic Evaluation and Benchmark for Person Re-Identification: Features, Metrics, and Datasets
Srikrishna Karanam*,
Mengran Gou*,
Ziyan Wu, Angels Rates-Borras,
Octavia Camps,
Richard J. Radke
IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), Vol. 41, No. 3, pp. 523-536, March 2019
supplementary
/
dataset /
code
We present an extensive review and performance evaluation of single- and multi-shot re-id algorithms. The experimental protocol incorporates 11 feature extraction and 22 metric learning and ranking techniques, evaluated on a new large-scale dataset that closely mimics a real-world problem setting, in addition to 16 other publicly available datasets.
Zero Shot Deep Domain Adaptation
Kuan-Chuan Peng, Ziyan Wu,
Jan Ernst
European Conference on Computer Vision (ECCV), 2018
We propose zero-shot deep domain adaptation (ZDDA) for
domain adaptation and sensor fusion. ZDDA learns from the
task-irrelevant dual-domain pairs when the task-relevant
target-domain training data is unavailable.
Tell Me Where To Look: Guided Attention Inference Network
Kunpeng Li,
Ziyan Wu,
Kuan-Chuan Peng, Jan Ernst,
Yun Fu
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR),
2018 (spotlight)
code by
alokwhitewolf
/
talk
In one common framework, we address three shortcomings of previous approaches to modeling such attention maps: we (1) make attention maps, for the first time, an explicit and natural component of end-to-end training, (2) provide self-guidance directly on these maps by exploring supervision from the network itself to improve them, and (3) seamlessly bridge the gap between using weak supervision and extra supervision when it is available.
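A toy PyTorch sketch of the self-guidance idea (not the exact GAIN losses): erase the regions highlighted by the attention map and penalize the network if it can still confidently predict the ground-truth class from the erased image.

```python
import torch
import torch.nn.functional as F

def attention_mining_loss(model, images, attn, labels):
    """images: (B, 3, H, W); attn: (B, 1, H, W) in [0, 1]; labels: (B,)."""
    erased = images * (1.0 - attn)                       # remove the attended evidence
    probs = F.softmax(model(erased), dim=1)
    return probs[torch.arange(images.size(0)), labels].mean()

# toy classifier standing in for any CNN
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
loss = attention_mining_loss(model, torch.rand(2, 3, 32, 32),
                             torch.rand(2, 1, 32, 32), torch.tensor([0, 1]))
print(loss.item())
```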
Learning Compositional Visual Concepts with Mutual Consistency
Yunye Gong,
Srikrishna Karanam,
Ziyan Wu,
Kuan-Chuan Peng, Jan Ernst,
Peter C. Doerschuk
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2018 (spotlight)
video
We proposed ConceptGAN, a novel concept learning framework in which we seek to capture underlying semantic shifts between data domains rather than mappings restricted to the training distributions. The key idea is that, via joint concept learning, transfer, and composition, information over a joint latent space can be recovered from incomplete training data.
End-to-End Learning of Keypoint Detector and Descriptor for Pose Invariant 3D Matching
Georgios Georgakis,
Srikrishna Karanam,
Ziyan Wu, Jan Ernst,
Jana Kosecka
IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 2018
Related Product: Siemens
EasySpareIDea®
We proposed an end-to-end learning
framework for keypoint detection and its representation (descriptor) for 3D depth maps or 3D scans, where the two can
be jointly optimized towards task-specific objectives without
a need for separate annotations.
Learning Affine Hull Representations for Multi-Shot Person Re-Identification
Srikrishna Karanam,
Ziyan Wu,
Richard J. Radke
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT),
Vol.28, No.10, pp.2500-2512, Oct 2018
We describe the image sequence data using affine hulls,
and we show that directly
computing the distance between the closest points on these affine
hulls as in existing recognition algorithms is not sufficiently
discriminative in the context of person re-identification. To this
end, we incorporate affine hull data modeling into the traditional
distance metric learning framework, learning discriminative
feature representations directly using affine hulls.
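For illustration, a small NumPy sketch of the affine hull representation itself, computing the minimum distance between two hulls by least squares; the metric-learning formulation from the paper is not shown.

```python
import numpy as np

def affine_hull(features):
    """features: (num_frames, dim). Returns the hull's mean and direction basis."""
    mu = features.mean(axis=0)
    return mu, (features - mu).T                  # columns span the hull directions

def hull_distance(feats_a, feats_b):
    mu_a, U_a = affine_hull(feats_a)
    mu_b, U_b = affine_hull(feats_b)
    A = np.hstack([U_a, -U_b])                    # solve min || (mu_a - mu_b) + A v ||
    v, *_ = np.linalg.lstsq(A, mu_b - mu_a, rcond=None)
    return np.linalg.norm(mu_a - mu_b + A @ v)

print(hull_distance(np.random.randn(10, 64), np.random.randn(8, 64)))
```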
Keep it Unreal: Bridging the Realism Gap for 2.5D Recognition with Geometry Priors Only
Sergey Zakharov*, Benjamin Planche*, Ziyan Wu,
Andreas Hutter,
Harald Kosch,
Slobodan Ilic
International Conference on 3D Vision (3DV), 2018 (oral)
We propose a novel approach leveraging only CAD models to bridge the realism gap. Purely
trained on synthetic data, playing against an extensive augmentation pipeline in an unsupervised manner, a generative adversarial network learns to effectively segment depth
images and recover the clean synthetic-looking depth information even from partial occlusions.
Weakly Supervised Summarization of Web Videos
Rameswar Panda, Abir Das,
Ziyan Wu, Jan Ernst,
Amit K. Roy-Chowdhury
IEEE International Conference on Computer Vision (ICCV),
2017
supplementary
We proposed a weakly supervised approach to summarize
videos with only video-level annotation, introducing an
effective method for computing spatio-temporal importance
scores without resorting to additional training steps.
DepthSynth: Real-Time Realistic Synthetic Data Generation from CAD Models for 2.5D Recognition
Benjamin Planche*, Ziyan Wu,
Kai Ma,
Shanhui Sun, Stefan Kluckner,
Terrence Chen,
Andreas Hutter,
Harald Kosch,
Jan Ernst
International Conference on 3D Vision (3DV),
2017 (oral)
Related Product: Siemens
EasySpareIDea®
We propose an end-to-end framework which simulates the whole mechanism of 3D sensors (structured light and TOF), generating realistic depth data from 3D models by comprehensively modeling vital factors, e.g., sensor noise, material reflectance, and surface geometry.
Vessel Tree Tracking in Angiographic Sequences
Dong Zhang,
Shanhui Sun,
Ziyan Wu,
Bor-Jeng Chen,
Terrence Chen
Journal of Medical Imaging (JMI),
Vol.4, No.2, 025001, 2017
We present a method to track vessels in angiography. Our method maximizes appearance similarity while preserving the vessel structure. The vessel tree tracking problem is cast as finding the most similar tree within a directed acyclic graph (DAG) built on the next frame, and it is solved using an efficient dynamic programming algorithm.
From the Lab to the Real World: Re-Identification in an Airport Camera Network
Octavia Camps,
Mengran Gou,
Tom Hebble,
Srikrishna Karanam,
Oliver Lehmann,
Yang Li,
Richard J. Radke,
Ziyan Wu,
Fei Xiong
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT),
Vol. 27, No. 3, pp. 540-553, Mar 2017
We detail the challenges of the real-world airport environment, the computer vision algorithms underlying our human detection and re-identification algorithms,
our robust software architecture, and the ground-truthing system required to provide the training and validation data for the algorithms.
Guidewire Tracking Using a Novel Sequential Segment Optimization Method in Interventional X-Ray Videos
Bor-Jeng Chen,
Ziyan Wu,
Shanhui Sun,
Dong Zhang,
Terrence Chen
IEEE International Symposium on Biomedical Imaging (ISBI),
2016
We model the wire-like structure as a sequence of small segments and formulate guidewire tracking as a graph-based optimization problem which aims to find the optimal link set.
To overcome distracters, we extract them from the dominant motion pattern and propose a confidence re-weighting process in the appearance measurement.
Human Re-Identification
Ziyan Wu
Springer, 2016, ISBN 978-3-319-40991-7
This book covers aspects of human re-identification problems related to computer vision and machine learning. Working from a practical perspective, it introduces novel algorithms and designs for human re-identification that bridge the gap between research and reality. The primary focus is on building a robust, reliable, distributed, and scalable smart surveillance system that can be deployed in real-world scenarios.
Viewpoint Invariant Human Re-Identification in Camera Networks Using Pose Priors and Subject-Discriminative Features
Ziyan Wu,
Yang Li,
Richard J. Radke
IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), Vol. 37, No. 5, pp. 1095-1108, May 2015
We build a model for human
appearance as a function of pose, using training data gathered from a calibrated camera. We then apply this “pose prior” in
online re-identification to make matching and identification more robust to viewpoint. We further integrate person-specific features
learned over the course of tracking to improve the algorithm’s performance.
Multi-Shot Human Re-Identification Using Adaptive Fisher Discriminant Analysis
Yang Li, Ziyan Wu,
Srikrishna Karanam,
Richard J. Radke
British Machine Vision Conference (BMVC),
2015
We introduce an algorithm to
hierarchically cluster image sequences and use the representative data samples to learn a
feature subspace maximizing the Fisher criterion. The clustering and subspace learning
processes are applied iteratively to obtain diversity-preserving discriminative features.
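A rough scikit-learn sketch of the general recipe (hierarchical clustering for representative samples, then a Fisher-discriminant subspace); the adaptive, iterative formulation from the paper is not shown, and the data below is synthetic.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def representatives(features, n_clusters=3):
    """Cluster one person's frame features and return the cluster centres."""
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)
    return np.stack([features[labels == k].mean(axis=0) for k in range(n_clusters)])

rng = np.random.default_rng(0)
reps, ids = [], []
for person in range(5):                          # 5 identities, 20 frames each, 32-D
    r = representatives(rng.normal(loc=person, size=(20, 32)))
    reps.append(r); ids.extend([person] * len(r))

lda = LinearDiscriminantAnalysis(n_components=4).fit(np.vstack(reps), ids)
print(lda.transform(np.vstack(reps)).shape)      # (15, 4) discriminative features
```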
Multi-Shot Re-identification with Random-Projection-based Random Forest
Yang Li,
Ziyan Wu,
Richard J. Radke
IEEE Winter Conference on Applications of Computer Vision (WACV),
2015
We perform dimensionality reduction on image feature vectors through random projection for multi-shot re-id. A random forest is trained based on pairwise constraints in the projected subspace. At run time, we select personalized random forests for each subject using their multi-shot appearances.
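A rough scikit-learn sketch of the two ingredients (random-projection dimensionality reduction, then a random forest trained on pairwise same/different constraints); the personalized forest selection is omitted and the data below is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 500))              # 200 detections, 500-D descriptors
ids = rng.integers(0, 20, size=200)              # 20 identities

low = GaussianRandomProjection(n_components=64, random_state=0).fit_transform(feats)

# Pairwise training samples: |x_i - x_j| labelled same-person(1) / different(0).
pairs = rng.integers(0, 200, size=(1000, 2))
X = np.abs(low[pairs[:, 0]] - low[pairs[:, 1]])
y = (ids[pairs[:, 0]] == ids[pairs[:, 1]]).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict_proba(X[:5]))               # same-person probabilities
```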
Virtual Insertion: Robust Bundle Adjustment over Long Video Sequences
Ziyan Wu,
Han-Pang Chiu,
Zhiwei Zhu
British Machine Vision Conference (BMVC),
2014 (oral)
talk
We propose a novel “virtual insertion” scheme for Structure from Motion (SfM), which constructs virtual points and virtual frames to handle visual landmark link outages, namely “visual breaks” that occur when no common features are observed between neighboring camera views in challenging environments.
Multi-Object Tracking and Association With a Camera Network
Ziyan Wu
Doctoral Dissertation, Rensselaer Polytechnic Institute (RPI),
2014
Video surveillance is a critical issue for defense and
homeland security applications. There are three key steps of
video surveillance: system calibration, multi-object
tracking, and target behavior analysis. In this thesis we
investigate several important and challenging computer
vision problems and applications related to these three
steps, in order to improve the performance of video
surveillance.
Improving Counterflow Detection in Dense Crowds with Scene Features
Ziyan Wu,
Richard J. Radke
Pattern Recognition Letters (PRL), Vol. 44, pp. 152-160, July 15, 2014
This paper addresses the problem of detecting counterflow motion in
videos of highly dense crowds. We focus on improving the detection performance by identifying scene features — that is, features on motionless
background surfaces. We propose a three-way classifier to differentiate counterflow from normal flow, simultaneously identifying scene features based on
statistics of low-level feature point tracks.
Real-World Re-Identification in an Airport Camera Network
Yang Li,
Ziyan Wu,
Srikrishna Karanam, Richard J. Radke
ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC),
2014
We discuss the high-level system design of the video surveillance application, and the issues we encountered during our development and testing. We also describe the algorithm framework for our human re-identification software, and discuss considerations of speed and matching performance.
Keeping a PTZ Camera Calibrated
Ziyan Wu, Richard J. Radke
IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), Vol. 35, No. 8, pp. 1994-2007, 2013
We propose a complete model for a pan-tilt-zoom camera
that explicitly reflects how focal length and lens distortion vary as a function of zoom scale. We show how the parameters of this model
can be quickly and accurately estimated using a series of simple initialization steps followed by a nonlinear optimization. We also show how the calibration parameters can be maintained using
a one-shot dynamic correction process; this ensures that the camera returns the same field of view every time the user requests a given
(pan, tilt, zoom), even after hundreds of hours of operation.
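As a toy illustration of one piece of such a model (made-up numbers, not the paper's full parameterization): fitting focal length as a smooth function of the zoom index from a few per-zoom calibrations, so that intermediate zoom levels can be interpolated.

```python
import numpy as np

zoom_steps = np.array([0, 200, 400, 600, 800, 1000])          # zoom indices
focal_px = np.array([1000, 1450, 2100, 3100, 4600, 6800.0])   # hypothetical focal lengths

# Fit log focal length against zoom so the curve stays well behaved over the range.
coeffs = np.polyfit(zoom_steps, np.log(focal_px), deg=2)
focal_model = lambda z: np.exp(np.polyval(coeffs, z))

print(round(float(focal_model(500)), 1))   # predicted focal length at zoom index 500
```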
Using Scene Features to Improve Wide-Area Video Surveillance
Ziyan Wu, Richard J. Radke
Workshop on Camera Networks and Wide Area Scene Analysis, in conjunction with CVPR (CVPRW), 2012
We introduce two novel methods to improve the performance of wide area video surveillance applications by using scene features.
Real-Time Airport Security Checkpoint Surveillance Using a Camera Network
Ziyan Wu, Richard J. Radke
Workshop on Camera Networks and Wide Area Scene Analysis, in conjunction with CVPR (CVPRW), 2011
video
We introduce an airport security checkpoint surveillance
system using a camera network. The system tracks the
movement of each passenger and carry-on bag, continuously maintains the association between bags and passengers, and verifies that passengers leave the checkpoint with
the correct bags.
Towards Improved Paper-based Election Technology
Elisa Barney Smith, Daniel Lopresti,
George Nagy, Ziyan Wu
International Conference on Document Analysis and Recognition (ICDAR), 2011
Resources are presented for fostering paper-based election technology. They comprise a diverse collection of real and simulated ballot and survey images, and software tools for ballot synthesis, registration, segmentation, and ground truthing.
Characterizing Challenged Minnesota Ballots
George Nagy,
Daniel Lopresti,
Elisa Barney Smith, Ziyan Wu
Document Recognition and Retrieval XVIII (DRR), 2011
Photocopies of the ballots challenged in the 2008 Minnesota elections, which constitute a public record, were scanned on a high-speed scanner and made available on a public radio website. Based on a review of relevant image-processing aspects of paper-based election machinery and on additional statistics and observations on the posted sample data, robust tools were developed for determining the underlying grid of the targets on these ballots regardless of skew, clipping, and other degradations caused by high-speed copying and digitization.
Associate Editor, IEEE Access, 2017 - present
Organizer,
Workshop on Vision with Biased and Scarce
Data, in conjunction with CVPR 2019
Organizer,
Workshop on Vision with Biased and Scarce
Data, in conjunction with CVPR 2018
Organizer, EXPO
Spotlight, CVPR 2017
Organizer,
Vision Industry and Entrepreneur Workshop, in conjunction with CVPR
2016