Yu-Hsi Chen

About Me

I am a Computer Vision Engineer (born 1991) with over ten years of experience in deep learning and video/image processing, currently working at the Institute of Information Science, Academia Sinica, Taiwan. My research focuses on visual modeling, temporal modeling, and generative vision, with applications in object detection and tracking, behavior recognition, and robust visual understanding under adverse conditions.

I have developed several state-of-the-art systems. NeighborTrack is a lightweight single-object tracking method that achieved 72.2% AUC on LaSOT and ranked #1 on UAV123. LD-YOLOv7 is a license plate recognition framework augmented with latent diffusion that generalizes well from synthetic to real images under rain and fog. More recently, I have been integrating Vision-Language Models with structured temporal models (S4/ViS4mer) to build a Universal Action Space for long-term behavior analysis. I have also been exploring SAM2 and GroundingDINO for unsupervised video object segmentation.

I received my M.S. in Electrical Engineering and my B.S. in Computer Information and Network Engineering, both from Lunghwa University of Science and Technology, where I graduated first in my department. My work bridges academic research and practical applications, with publications in CVPR Workshops, IEEE CAI, and IIHMSP.

For more detailed information about my professional and academic background, please see my CV. I am open to research discussions or collaboration opportunities; feel free to contact me at franktpmvu@gmail.com.

Work Experience

2015/07 - Present

Computer Vision Engineer (Research Assistant), Institute of Information Science, Academia Sinica
Supervisor: Hong-Yuan Mark Liao

Education

2013/09 - 2015/08

Master of Science (M.S.) in Electrical Engineering, Lunghwa University of Science and Technology, Taipei, Taiwan

2009/09 - 2013/07

Bachelor of Science (B.S.) in Computer Information and Network Engineering, Lunghwa University of Science and Technology, Taipei, Taiwan

Research Projects

Deep Learning-based Animal Behavior Analysis: Insights from Mouse Chronic Pain Models (2025) [Code]
Yu-Hsi Chen et al.
Keywords: S4 / ViS4mer / VST / V-JEPA / VideoCLIP / SAM2 / GroundingDINO

Developed a modular system for long-term behavior analysis using both vision-only and vision-language models. Constructed a Universal Action Space from Kinetics-600, applied S4/ViS4mer for temporal modeling, and integrated SAM2 and GroundingDINO for unsupervised video object segmentation and foreground/background separation.

License Plate Recognition in Low Quality Images using Latent Diffusion (LD-YOLOv7) (2024) [Code]
Yu-Hsi Chen et al.

Designed a synthetic license plate generation pipeline with weather degradation (rain/fog) that improves recognition robustness without relying on real data. Integrated Latent Diffusion into YOLOv7 to restore intermediate features and improve performance in adverse weather. Achieved 87.38% on AOLP and strong synthetic-to-real generalization.

NeighborTrack: Single Object Tracking with Spatiotemporal Context (2023) [Code]
Yu-Hsi Chen et al.

Proposed a lightweight module that leverages neighboring tracklets to reduce ID switches and improve tracking stability. Achieved 72.2% AUC on LaSOT and ranked #1 on UAV123 until 2024.

Bounding box color legend: Red = Our method, Green = Ground Truth, Magenta = Others.

Publications

License Plate Recognition in Low Quality Images using Latent Diffusion YOLOv7
Yu-Hsi Chen et al.
In 2024 IEEE Conference on Artificial Intelligence (CAI), 2024
NeighborTrack: Single Object Tracking by Bipartite Matching With Neighbor Tracklets and Its Applications to Sports [Code]
Yu-Hsi Chen, Chien-Yao Wang, Cheng-Yun Yang, Hung-Shuo Chang, Youn-Long Lin, Yung-Yu Chuang, Hong-Yuan Mark Liao
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023
A multiresolution approach to recovering colors and details of clipped image regions
Yi-Hung Lu, Yu-Hsi Chen, Hsueh-Yi Sean Lin
In International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP), 2015
Full-Frame Video Stabilization via SIFT Feature Matching (Excellent Paper Award)
Yu-Hsi Chen, Hsueh-Yi Sean Lin, Chih Wen Su
In International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP), 2014