Computer Vision Researcher

JD AI Research, CV Lab


baiyalong AT jd DOT com


Building A, North-Star Century Center


No 8 Beichen West Street, Chaoyang District


Beijing 100105, China

Research Intern Position Opening

Please send me your CV if you are interested.


  • 2008.8 -- 2012.7: B.S. in Computer Science and Technology, School of Computer Science And Technology, Harbin Institute of Technology
  • 2012.8 -- 2014.7: M.S. in Computer Science and Technology, School of Computer Science And Technology, Harbin Institute of Technology. Supervisor: Sheng Li
  • 2014.8 -- 2018.9: Ph.D. in the Joint Education Program of Microsoft Research Asia and Harbin Institute of Technology. Supervisors: Wei-Ying Ma and Tiejun Zhao. Thesis: Research and Applications of Image-Text Multimodal Correlation Learning
    Selected Honors

  • First Place in AliProducts Challenge: Large-scale Product Recognition at CVPR 2020
  • Second Place in iMet: Fine-grained Attributes Recognition Challenge at CVPR 2020
  • First Place in iMaterialist Challenge on Product Recognition at CVPR 2019
  • First Place in Fieldguide Challenge: Moths and Butterflies at CVPR 2019
  • Second Place in iFood Challenge at FGVC workshop, CVPR 2019
  • Ranked first in the track without extra data and second among all teams in the MSR Image Recognition Challenge at IEEE ICME 2016
  • ACM Multimedia 2015 Student Travel Grant
  • First Place in MSR-Bing Image Retrieval Challenge at ACM MM 2014
    News

  • 2020.8.20: Products-10K: Large Scale Product Recognition Challenge @ ICPR2020 has launched on Kaggle.
  • 2020.6.9: We won first place in the CVPR 2020 AliProducts Challenge: Large-scale Product Recognition. The technical report will be presented at the Retail Vision workshop, CVPR 2020.
  • 2020.3.1: We will host a Large-scale Product Recognition Challenge at ICPR2020, Milan, Italy.
  • 2020.2.24: One generic object recognition paper was accepted by CVPR 2020. Source code will be released soon.
    Work Experience

    JD AI Research, CV Lab (2018.02 -- Now)

    Researcher in the Image Group, working on snapshop, VQA, fine-grained recognition, and relationship modeling in images.

    Microsoft Research Asia, Web Search and Mining Group (2013.06 -- 2018.02)

    Research intern working on deep learning for image representation and computer vision.

    Microsoft Research Asia, Web Search and Data Mining Group (2012.01 -- 2012.07)

    Research intern working on re-ranking document retrieval results.

    Important Preprints

  • Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei. Visualizing and Understanding Patch Interactions in Vision Transformer. arXiv, 2022 [pdf]
  • Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei. Responsive Listening Head Generation: A Benchmark Dataset and Baseline. arXiv, 2021 [pdf | Project]
  • Yalong Bai, Mohan Zhou, Yuxiang Chen, Wei Zhang, Bowen Zhou, Tao Mei. Augmentation Pathways Network for Visual Recognition. arXiv, 2021 [pdf]
  • Yalong Bai, Yuxiang Chen, Wei Yu, Linfang Wang, Wei Zhang. Products-10K: A Large-scale Product Recognition Dataset. arXiv, 2020 [pdf]
    Publications

    Google Scholar Profile

  • Yalong Bai, Yifan Yang, Wei Zhang, Tao Mei. Directional Self-supervised Learning for Heavy Image Augmentations. CVPR, 2022 [pdf]
  • Tianyu Hua*, Hongdong Zheng*, Yalong Bai, Wei Zhang, Xiao-Ping Zhang, Tao Mei. Exploiting Relationship for Complex-scene Image Generation. AAAI, 2021 [pdf]
  • Mohan Zhou*, Yalong Bai, Wei Zhang, Tiejun Zhao, Tao Mei. Look-into-Object: Self-supervised Structure Modeling for Object Recognition. CVPR, 2020 [Source Code | pdf]
  • Yuanzhi Liang*, Yalong Bai, Wei Zhang, Xueming Qian, Li Zhu, Tao Mei. VrR-VG: Refocusing Visually-Relevant Relationships. ICCV, 2019 [VrR-VG Dataset | pdf]
  • Yue Chen*, Yalong Bai, Wei Zhang, Tao Mei. Destruction and Construction Learning for Fine-grained Image Recognition. CVPR, 2019 [Source Code | pdf]
  • Yalong Bai, Jianlong Fu, Tiejun Zhao, Tao Mei. Deep Attention Neural Tensor Network for Visual Question Answering. ECCV, 2018 [pdf]
  • Yalong Bai, Kuiyuan Yang, Tao Mei, Wei-Ying Ma, Tiejun Zhao. Automatic Data Augmentation from Massive Web Images for Deep Visual Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications, 2018. [Dataset and Models Download | pdf]
  • Chang Xu, Tao Qin, Yalong Bai, Gang Wang, Tie-Yan Liu. Convolutional Neural Networks For Posed and Spontaneous Expression Recognition. IEEE International Conference on Multimedia and Expo, 2017
  • Guotian Xie, Kuiyuan Yang, Yalong Bai, Min Shang, Yong Rui, Jianhuang Lai. Improve Dog Recognition By Mining More Information From Both Click-through Logs and Pre-trained Models. IEEE International Conference on Multimedia & Expo Workshops, 2016 [pdf]
  • Yalong Bai, Kuiyuan Yang, Wei Yu, Chang Xu, Wei-Ying Ma, Tiejun Zhao. Automatic Image Dataset Construction from Click-through Logs Using Deep Neural Network. Full Paper, ACM MultiMedia, 2015 [Dataset Download | pdf]
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Learning Cross Space Mapping via DNN using Large Scale Click-through Logs. IEEE Transactions on Multimedia, 2015.
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. Visualizing and Comparing Convolutional Neural Networks.
  • Chang Xu, Yalong Bai, Jiang Bian, Bin Gao, Gang Wang, Xiaoguang Liu, Tie-Yan Liu. RC-NET: A General Framework for Incorporating Knowledge into Word Representations. CIKM, 2014 [pdf]
  • Yalong Bai, Wei Yu, Tianjun Xiao, Chang Xu, Kuiyuan Yang, Wei-Ying Ma, Tiejun Zhao. Bag-of-Words Based Deep Neural Network for Image Retrieval. Short Paper. ACM MultiMedia, 2014 [pdf]
  • Wei Yu, Kuiyuan Yang, Yalong Bai, Hongxun Yao, Yong Rui. DNN Flow: DNN feature pyramid based image matching. BMVC, 2014.
  • Yalong Bai, Kuiyuan Yang, Wei Yu, Wei-Ying Ma, Tiejun Zhao. Learning High-level Image Representation for Image Retrieval via Multi-Task DNN using Clickthrough Data. ICLR, 2014.
  • Mo Yu, Tiejun Zhao, Yalong Bai, Hao Tian, Dianhai Yu. Cross-lingual Projections between Languages from different Families. ACL 2013 short paper.
  • Mo Yu, Tiejun Zhao, Yalong Bai. Learning Domain Differences Automatically for Dependency Parsing Adaptation. IJCAI 2013 poster.
    Note: * marks interns whom I mentored at JD AI Research.

    Professional Activities

    Journal Reviewer

  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • IEEE Transactions on Image Processing
  • IEEE Transactions on Multimedia
  • IEEE Transactions on Neural Networks and Learning Systems
  • IEEE Transactions on Circuits and Systems for Video Technology
  • ACM Transactions on Intelligent Systems and Technology
  • ACM Transactions on Multimedia Computing, Communications, and Applications
    Conference Reviewer / Program Committee Member

  • CVPR 2019, 2020, 2021
  • ICCV 2019, 2021
  • ECCV 2020
  • ACM MM 2021
  • AAAI 2019
  • IJCAI 2021 (SPC)
    Other

  • Executive Area Chairs Committee, VALSE 2020, 2021