About me

I am a postdoctoral researcher at the Huazhong University of Science and Technology (HUST), in the HUST Media Lab, under the supervision of Prof. Junqing Yu. I am currently a visiting scholar at the National University of Singapore, under the supervision of Prof. Xinchao Wang.

My background is in computer science, and my research interests lie in computer vision, motion estimation, and social network analysis.

News

  • 2024.12: Two of our papers, on tracking and anomaly detection, were accepted to AAAI'25.
  • 2024.9: Our paper on multimodal fusion was accepted to NeurIPS'24.
  • 2024.7: Our paper on point tracking was accepted to ACM MM'24.
  • 2024.3: Our paper on feature compression was accepted to ICME'24.

Education

Awards and Service

  • ACM Outstanding Student
  • Outstanding Doctoral Scholarship
  • Conference PC Member/Reviewer: CVPR23/24/25, ICCV23, ECCV24, AAAI24/25, NeurIPS24, ICLR25, ACM MM24, IJCAI24
  • Journal Reviewer: TIP, PR, TCSVT, TMM, KBS, SIGPRO

Publications

Temporal Coherent Object Flow for Multi-Object Tracking
Zikai Song, Run Luo, Lintao Ma, Ying Tang, Yi-Ping Phoebe Chen, Junqing Yu, Wei Yang*
AAAI, 2025
Paper / Code

We propose a section-based multi-object tracking approach built on a temporally coherent Object Flow Tracker, which tracks objects across multiple frames simultaneously by treating a group of consecutive frames, denoted a “section”, as the basic processing unit.

Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model
Hang Zhou, Jiale Cai, Yuteng Ye, Yonghui Feng, Chenxing Gao, Junqing Yu, Zikai Song*, Wei Yang
AAAI, 2025
Paper / Code

We introduce motion and appearance conditions that guide our patch diffusion model for video anomaly detection.

Coupled Mamba: Enhanced Multimodal Fusion with Coupled State Space Model
Wenbing Li, Hang Zhou, Junqing Yu, Zikai Song*, Wei Yang*
NeurIPS, 2024
Paper / Code

We propose the Coupled SSM model, which couples the state chains of multiple modalities while keeping the intra-modality state processes independent.

Autogenic Language Embedding for Coherent Point Tracking
Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang*
ACM MM, 2024
Paper

We introduce a novel approach leveraging language embeddings to enhance the coherence of frame-wise visual features related to the same object.

Agnostic Feature Compression with Semantic Guided Channel Importance Analysis
Ying Tang, Wei Yang, Junqing Yu, Zikai Song*
ICME, 2024
Paper

We compress the less important parts more aggressively to achieve a high overall compression rate, while preserving performance by applying a lower compression ratio to the more important parts.

AMD: Anatomical Motion Diffusion with Interpretable Motion Decomposition and Fusion
Beibei Jing, Youjia Zhang, Zikai Song, Junqing Yu, Wei Yang*
AAAI, 2024
arXiv

We propose the Adaptable Motion Diffusion (AMD) model, which leverages a Large Language Model (LLM) to parse the input text into a sequence of concise and interpretable anatomical scripts that correspond to the target motion.

Progressive Text-to-Image Diffusion with Soft Latent Direction
Yuteng Ye, Jiale Cai, Hang Zhou, Guanwen Li, Youjia Zhang, Zikai Song, Chenxing Gao, Junqing Yu, Wei Yang*
AAAI, 2024
arXiv / Code

We propose to harness the capabilities of a Large Language Model (LLM) to decompose text descriptions into coherent directives adhering to stringent formats and progressively generate the target image.

DiffusionTrack: Diffusion Model For Multi-Object Tracking
Run Luo, Zikai Song*, Lintao Ma, Jinlin Wei, Wei Yang, Min Yang*
AAAI, 2024
arXiv / Code

We formulate object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes.

Compact Transformer Tracker with Correlative Masked Modeling
Zikai Song, Run Luo, Junqing Yu*, Yi-Ping Phoebe Chen, Wei Yang*
AAAI, 2023 (Oral Presentation)
arXiv / Code

We demonstrate that the basic vision transformer (ViT) architecture is sufficient for visual tracking, using correlative masked modeling to enhance information aggregation.

Transformer Tracking with Cyclic Shifting Window Attention
Zikai Song, Junqing Yu*, Yi-Ping Phoebe Chen, Wei Yang
CVPR, 2022
arXiv / Code

CSWinTT is a new transformer architecture with multi-scale cyclic shifting window attention for visual object tracking, elevating the attention from pixel to window level.

Distractor-Aware Tracker with a Domain-Special Optimized Benchmark for Soccer Player Tracking
Zikai Song, Zhiwen Wan, Wei Yuan, Ying Tang, Junqing Yu, Yi-Ping Phoebe Chen
ICMR, 2021
Project Page / Paper

We propose a distractor-aware player tracking algorithm and a high-quality benchmark for soccer player tracking, designed to deal with occlusion and similar distractors in soccer scenes.

Fine-Grain Level Sports Video Search Engine
Zikai Song, Junqing Yu, Hengyou Cai, Yangliu Hu, Yi-Ping Phoebe Chen
MultiMedia Modeling, 2020
Paper

We designed and developed a sports video search engine based on a distributed architecture, aiming to provide content-based video analysis and retrieval services.

SSET: a dataset for shot segmentation, event detection, player tracking in soccer videos
Na Feng, Zikai Song, Junqing Yu, Yi-Ping Phoebe Chen, Yizhu Zhao, Yunfeng He, Tao Guan
Multimedia Tools and Applications, 2020
Project Page / Paper

We construct a soccer dataset named Soccer Dataset for Shot, Event, and Tracking (SSET) to meet the research needs of shot segmentation, event detection, and player tracking.

Comprehensive dataset of broadcast soccer videos
Junqing Yu, Aiping Lei, Zikai Song, Tingting Wang, Hengyou Cai, Na Feng
MIPR, 2018
Project Page / Paper

We focus on broadcast soccer videos and present a comprehensive dataset for analysis, including shot boundaries, event annotations, and bounding boxes.

Projects

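  • 2023: Selected for the national government-funded postdoctoral program
  • 2024: Principal investigator, China Postdoctoral Science Foundation General Program
  • 2025: Principal investigator, National Natural Science Foundation of China (NSFC) Young Scientists Fund
  • 2025: Principal investigator, Hubei Province Postdoctoral Innovative Talent Program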