Ph.D., Computer Vision Researcher


ylbai AT outlook DOT com


  • 2008.8 -- 2012.7: B.S. in Computer Science and Technology, School of Computer Science and Technology, Harbin Institute of Technology
  • 2012.8 -- 2014.7: M.S. in Computer Science and Technology, School of Computer Science and Technology, Harbin Institute of Technology. Supervisor: Sheng Li
  • 2014.8 -- 2018.9: Ph.D., Joint Education Program of Microsoft Research Asia and Harbin Institute of Technology. Supervisors: Wei-Ying Ma and Tiejun Zhao. Thesis: Research and Applications of Image-Text Multimodal Correlation Learning
    Work Experience

    JD AI Research, CV Lab (2018.02 -- Now)

    Researcher in the Image Group, working on snapshop, visual question answering (VQA), fine-grained recognition, and relationship modeling in images.

    Microsoft Research Asia, Web Search and Mining Group (2013.06 -- 2018.02)

    Research intern working on deep learning for image representation and computer vision.

    Microsoft Research Asia, Web Search and Data Mining Group (2012.01 -- 2012.07)

    Research intern working on re-ranking of document retrieval results.

    Selected Honors

  • First Place in AliProducts Challenge: Large-scale Product Recognition at CVPR 2020
  • Second Place in iMet: Fine-grained Attributes Recognition Challenge at CVPR 2020
  • First Place in iMaterialist Challenge on Product Recognition at CVPR 2019
  • First Place in Fieldguide Challenge: Moths and Butterflies at CVPR 2019
  • Second Place in iFood Challenge at FGVC workshop, CVPR 2019
  • First Place in the track without extra data, and Second Place among all teams, at the MSR Image Recognition Challenge at IEEE ICME 2016
  • ACM Multimedia 2015 Student Travel Grant
  • First Place in MSR-Bing Image Retrieval Challenge at ACM MM 2014
    Selected Open-source Projects

    DCL for Fine-grained Image Recognition

    The PyTorch implementation of our paper "Destruction and Construction Learning for Fine-grained Image Recognition" (CVPR 2019). It also provides our first-place solutions for the CVPR 2020 AliProducts Challenge: Large-scale Product Recognition[1], the CVPR 2019 iMaterialist Challenge on Product Recognition[2], and the CVPR 2019 Fieldguide Challenge: Moths & Butterflies[3].

    Conversational Head Generator

    Official solution for the first Conversational Head Generation Challenge, including vivid talking head video generation, responsive listening head video generation and implementations of 11 quantitative evaluation metrics.

    Important Preprints

  • Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei. Visualizing and Understanding Patch Interactions in Vision Transformer. arXiv, 2022 [pdf]
  • Yalong Bai, Mohan Zhou, Yuxiang Chen, Wei Zhang, Bowen Zhou, Tao Mei. Augmentation Pathways Network for Visual Recognition. arXiv, 2021 [pdf]
  • Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, Wei Zhang. Products-10K: A Large-scale Product Recognition Dataset. arXiv, 2020 [pdf]
    Publications

    Google Scholar Profile

  • Guoqing Ma*, Yalong Bai, Wei Zhang, Ting Yao, Basem Shihada, Tao Mei. Boosting Generic Visual-Linguistic Representation with Dynamic Contexts. IEEE Transactions on Multimedia, 2023
  • Mohan Zhou*, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei. Responsive Listening Head Generation: A Benchmark Dataset and Baseline. ECCV, 2022 [pdf | Project]
  • Yalong Bai, Yifan Yang*, Wei Zhang, Tao Mei. Directional Self-supervised Learning for Heavy Image Augmentations. CVPR, 2022 [pdf]
  • Tianyu Hua*, Hongdong Zheng*, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei. Exploiting Relationship for Complex-scene Image Generation. AAAI, 2021 [pdf]
  • Mohan Zhou*, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei. Look-into-Object: Self-supervised Structure Modeling for Object Recognition. CVPR, 2020 [Source Code | pdf]
  • Yuanzhi Liang*, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei. VrR-VG: Refocusing Visually-Relevant Relationships. ICCV, 2019 [VrR-VG Dataset | pdf]
  • Yue Chen*, Yalong Bai, Wei Zhang, Tao Mei. Destruction and Construction Learning for Fine-grained Image Recognition. CVPR, 2019 [Source Code | pdf]
  • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei. Deep Attention Neural Tensor Network for Visual Question Answering. ECCV, 2018 [pdf]
  • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao. Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018 [Dataset and Models Download | pdf]
  • Chang Xu, Tao Qin, Yalong Bai, Gang Wang, Tie-Yan Liu. Convolutional Neural Networks For Posed and Spontaneous Expression Recognition. IEEE International Conference on Multimedia and Expo, 2017
  • Guotian Xie, Kuiyuan Yang, Yalong Bai, Min Shang, Yong Rui, Jianhuang Lai. Improve Dog Recognition By Mining More Information From Both Click-through Logs and Pre-trained Models. IEEE International Conference on Multimedia & Expo Workshops, 2016 [pdf]
  • Yalong Bai, Kuiyuan Yang, Wei Yu, Chang Xu, Wei-Ying Ma, Tiejun Zhao. Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network. Full Paper, ACM Multimedia, 2015 [Dataset Download | pdf]
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Learning Cross Space Mapping via DNN using Large Scale Click-through Logs. IEEE Transactions on Multimedia, 2015.
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Visualizing and Comparing Convolutional Neural Networks.
  • Chang Xu, Yalong Bai, Jiang Bian, Bin Gao, Gang Wang, Xiaoguang Liu, Tie-Yan Liu. RC-NET: A General Framework for Incorporating Knowledge into Word Representations. CIKM, 2014 [pdf]
  • Yalong Bai, Wei Yu, Tianjun Xiao, Chang Xu, Kuiyuan Yang, Wei-Ying Ma, Tiejun Zhao. Bag-of-Words Based Deep Neural Network for Image Retrieval. Short Paper, ACM Multimedia, 2014 [pdf]
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. DNN Flow: DNN feature pyramid based image matching. BMVC, 2014.
  • Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao. Learning High-level Image Representation for Image Retrieval via Multi-Task DNN using Clickthrough Data. ICLR, 2014.
  • Mo Yu, Tiejun Zhao, Yalong Bai, Hao Tian, Dianhai Yu. Cross-lingual Projections between Languages from Different Families. Short Paper, ACL, 2013.
  • Mo Yu, Tiejun Zhao, Yalong Bai. Learning Domain Differences Automatically for Dependency Parsing Adaptation. Poster, IJCAI, 2013.
    Note: * denotes interns I mentored at JD AI Research.

    Professional Activities

    Workshop & Challenge Organizer

  • The first Conversational Head Generation Challenge, held in conjunction with ACM Multimedia 2022 in Lisbon, Portugal. [leaderboard]
  • Products-10K: Large Scale Product Recognition Challenge, held in conjunction with ICPR 2020 in Milan, Italy. [leaderboard]
  • Workshop "Vision-Language Collaborative Learning and Applications" (《视觉与语言协同学习及应用》) at VALSE 2021 in Hangzhou, China.
    Journal Reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • IEEE Transactions on Image Processing
  • IEEE Transactions on Multimedia
  • IEEE Transactions on Neural Networks and Learning Systems
  • IEEE Transactions on Circuits and Systems for Video Technology
  • ACM Transactions on Intelligent Systems and Technology
  • ACM Transactions on Multimedia Computing, Communications, and Applications
    Conference Reviewer / Program Committee Member

  • Reviewer (PC Member): CVPR (2019-); ICCV (2019-); ECCV (2020); NeurIPS (2022); ACM Multimedia (2021-); AAAI (2019-)
  • Senior Program Committee (SPC) Member: IJCAI 2021
  • Area Chair: CICAI 2022; ACM Multimedia 2022 (Grand Challenges Track)
    Other

  • Executive Area Chair Committee Member, VALSE 2020 and 2021