Education
2008.8 -- 2012.7: B.S. in Computer Science and Technology, School of Computer Science and Technology, Harbin Institute of Technology
2012.8 -- 2014.7: M.S. in Computer Science and Technology, School of Computer Science and Technology, Harbin Institute of Technology. Supervisor: Sheng Li
2014.8 -- 2018.9: Ph.D., Joint Education Program of Microsoft Research Asia and Harbin Institute of Technology. Supervisors: Wei-Ying Ma and Tiejun Zhao. Thesis: Research and Applications of Image-Text Multimodal Correlation Learning
Work Experience
JD AI Research, CV Lab (2018.02 -- 2023.06)
Senior Researcher, working on snapshop, VQA, fine-grained recognition, relationship modeling in images, 3D imaging, etc.
Microsoft Research Asia, Web Search and Mining Group (2013.06 -- 2018.02)
Research intern working on deep learning for image representation and computer vision.
Microsoft Research Asia, Web Search and Data Mining Group (2012.01 -- 2012.07)
Research intern working on re-ranking of document retrieval results.
Selected Honors
First Place in AliProducts Challenge: Large-scale Product Recognition at CVPR 2020
Second Place in iMet: Fine-grained Attributes Recognition Challenge at CVPR 2020
First Place in iMaterialist Challenge on Product Recognition at CVPR 2019
First Place in Fieldguide Challenge: Moths and Butterflies at CVPR 2019
Second Place in iFood Challenge at FGVC workshop, CVPR 2019
Ranked first in the track without extra data and second among all teams in the MSR Image Recognition Challenge at IEEE ICME 2016
ACM Multimedia 2015 Student Travel Grant
First Place in MSR-Bing Image Retrieval Challenge at ACM MM 2014
Selected Open-source Projects
PyTorch implementation of our research paper "Destruction and Construction Learning for Fine-grained Image Recognition" (CVPR 2019), and the first-place solutions for the CVPR 2020 AliProducts Challenge: Large-scale Product Recognition[1], the CVPR 2019 iMaterialist Challenge on Product Recognition[2], and the CVPR 2019 Fieldguide Challenge: Moths & Butterflies[3].
Official solution for the first Conversational Head Generation Challenge, including vivid talking head video generation, responsive listening head video generation and implementations of 11 quantitative evaluation metrics.
Important Preprints
Xiaoxiao Ma, Mohan Zhou, Tao Liang, Yalong Bai, Tiejun Zhao, Huaian Chen, Yi Jin. STAR: Scale-wise Text-to-image generation via Auto-Regressive representations. Arxiv, 2024 [pdf]
Yalong Bai, Mohan Zhou, Qing Yang. StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models. Arxiv, 2024 [pdf]
Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao. Interactive Conversational Head Generation. Arxiv, 2023 [pdf]
Jinhong Ni, Yalong Bai, Wei Zhang, Ting Yao, Tao Mei. Deep Equilibrium Multimodal Fusion. Arxiv, 2023 [pdf]
Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, Wei Zhang. Products-10K: A Large-scale Product Recognition Dataset. Arxiv, 2020 [pdf]
Publications
Google Scholar Profile
Wenyi Mo, Tianyu Zhang, Yalong Bai, Bing Su, Ji-Rong Wen, Qing Yang. Dynamic Prompt Optimizing for Text-to-Image Generation. CVPR, 2024 [pdf]
Guiwei Zhang, Tianyu Zhang, Guanglin Niu, Zichang Tan, Yalong Bai, Qing Yang. CAMEL: CAusal Motion Enhancement tailored for Lifting Text-driven Video Editing. CVPR, 2024 [pdf]
Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei. Learning and Evaluating Human Preferences for Conversational Head Generation. ACM MultiMedia, 2023 [pdf]
Yalong Bai, Mohan Zhou, Yuxiang Chen, Wei Zhang, Bowen Zhou, Tao Mei. Augmentation Pathways Network for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023 [pdf]
Jie Ma*, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei. Visualizing and Understanding Patch Interactions in Vision Transformer. IEEE Transactions on Neural Networks and Learning Systems, 2023 [pdf]
Mohan Zhou*, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei. Visual-Aware Text-to-Speech. ICASSP, 2023 (top 3% paper)
Guoqing Ma*, Yalong Bai, Wei Zhang, Ting Yao, Basem Shihada, Tao Mei. Boosting Generic Visual-Linguistic Representation with Dynamic Contexts. IEEE Transactions on Multimedia, 2023 [pdf]
Mohan Zhou*, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei. Responsive Listening Head Generation: A Benchmark Dataset and Baseline. ECCV, 2022 [pdf | Project]
Yalong Bai, Yifan Yang*, Wei Zhang, Tao Mei. Directional Self-supervised Learning for Heavy Image Augmentations. CVPR, 2022 [pdf]
Tianyu Hua*, Hongdong Zheng*, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei. Exploiting Relationship for Complex-scene Image Generation. AAAI, 2021 [pdf]
Mohan Zhou*, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei. Look-into-Object: Self-supervised Structure Modeling for Object Recognition. CVPR, 2020 [Source Code | pdf]
Yuanzhi Liang*, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei. VrR-VG: Refocusing Visually-Relevant Relationships. ICCV, 2019 [VrR-VG Dataset | pdf]
Yue Chen*, Yalong Bai, Wei Zhang, Tao Mei. Destruction and Construction Learning for Fine-grained Image Recognition. CVPR, 2019 [Source Code | pdf]
Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei. Deep Attention Neural Tensor Network for Visual Question Answering. ECCV, 2018 [pdf]
Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao. Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018. [Dataset and Models Download | pdf]
Chang Xu, Tao Qin, Yalong Bai, Gang Wang, Tie-Yan Liu. Convolutional Neural Networks For Posed and Spontaneous Expression Recognition. IEEE International Conference on Multimedia and Expo, 2017
Guotian Xie, Kuiyuan Yang, Yalong Bai, Min Shang, Yong Rui, Jianhuang Lai. Improve Dog Recognition By Mining More Information From Both Click-through Logs and Pre-trained Models. IEEE International Conference on Multimedia & Expo Workshops, 2016 [pdf]
Yalong Bai, Kuiyuan Yang, Wei Yu, Chang Xu, Wei-Ying Ma, Tiejun Zhao. Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network. Full Paper, ACM MultiMedia, 2015 [Dataset Download | pdf]
Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Learning Cross Space Mapping via DNN using Large Scale Click-through Logs. IEEE Transactions on Multimedia, 2015.
Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Visualizing and Comparing Convolutional Neural Networks.
Chang Xu, Yalong Bai, Jiang Bian, Bin Gao, Gang Wang, Xiaoguang Liu, Tie-Yan Liu. RC-NET: A General Framework for Incorporating Knowledge into Word Representations. CIKM, 2014 [pdf]
Yalong Bai, Wei Yu, Tianjun Xiao, Chang Xu, Kuiyuan Yang, Wei-Ying Ma, Tiejun Zhao. Bag-of-Words Based Deep Neural Network for Image Retrieval. Short Paper. ACM MultiMedia, 2014 [pdf]
Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. DNN Flow: DNN feature pyramid based image matching. BMVC, 2014.
Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao. Learning High-level Image Representation for Image Retrieval via Multi-Task DNN using Clickthrough Data. ICLR, 2014.
Mo Yu, Tiejun Zhao, Yalong Bai, Hao Tian, Dianhai Yu. Cross-lingual Projections between Languages from Different Families. ACL 2013 short paper.
Mo Yu, Tiejun Zhao, Yalong Bai. Learning Domain Differences Automatically for Dependency Parsing Adaptation. IJCAI 2013 poster.
Note: * denotes interns that I mentored.
Professional Activities
Workshop & Challenge Organizer
The second Conversational Head Generation Challenge, held in conjunction with ACM Multimedia 2023 in Ottawa, Canada. [leaderboard]
The first Conversational Head Generation Challenge, held in conjunction with ACM Multimedia 2022 in Lisbon, Portugal. [leaderboard]
Products-10K: Large Scale Product Recognition Challenge, held in conjunction with ICPR 2020 in Milan, Italy. [leaderboard]
Workshop "Collaborative Learning of Vision and Language and Its Applications" (视觉与语言协同学习及应用) at VALSE 2021 in Hangzhou, China.
Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Image Processing
IEEE Transactions on Multimedia
IEEE Transactions on Neural Networks and Learning Systems
IEEE Transactions on Circuits and Systems for Video Technology
ACM Transactions on Intelligent Systems and Technology
ACM Transactions on Multimedia Computing, Communications, and Applications
Conference Reviewer / Program Committee Member
Reviewer (PC Member): CVPR (2019-); ICCV (2019-); ECCV (2020); NeurIPS (2022); ACM Multimedia (2021-); AAAI (2019-)
SPC: IJCAI 2021
Area Chair / Meta-Reviewer: CICAI 2022; ACM Multimedia (2022-), Grand Challenges track; ICASSP (2023-)
Other
Executive Area Chairs Committee, VALSE (2020-)