Xiulong Liu

I am an ML Researcher at Apple AI/ML. Before joining Apple, I obtained my Ph.D. degree from the University of Washington, where I conducted research in the NeuroAI Lab under the supervision of Prof. Eli Shlizerman. Prior to that, I received my B.S. degree in Electrical Engineering from Shanghai Jiao Tong University. My research interests lie broadly in computer vision, audio generation, and multi-modal learning.

News

Jul 16, 2025 My Ph.D. dissertation, titled “Towards Multi-modal Interactive Systems that Connect Vision, Audio and Beyond”, has been formally published. Please check it out here.
Jun 30, 2025 I am excited to share that I’m joining Apple as an ML Researcher!
May 22, 2025 I have successfully defended my Ph.D. thesis, titled “Towards Multi-modal Interactive Systems that Connects Audio, Vision and Beyond”, and become Dr. Dragon!
Feb 26, 2025 My first-author paper “Hearing Anywhere in Any Environment” has been accepted to CVPR 2025! The code and dataset have been released. Please check them out here and here!
Feb 10, 2025 I passed my Ph.D. General Exam and became a Ph.D. candidate!

Selected Publications

  1. CVPR
    Hearing Anywhere in Any Environment
    Liu, Xiulong, Kumar, Anurag, Calamia, Paul, Amengual, Sebastia V., Murdock, Calvin, Ananthabhotla, Ishwarya, Robinson, Philip, Shlizerman, Eli, Ithapu, Vamsi Krishna, and Gao, Ruohan
    In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Jun 2025
  2. NEURIPS
    Tell What You Hear From What You See - Video to Audio Generation Through Text
    Liu, Xiulong, Su, Kun, and Shlizerman, Eli
    In Advances in Neural Information Processing Systems 2024
  3. ICML
    From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    In Proceedings of the 41st International Conference on Machine Learning 21–27 Jul 2024
  4. CVPR
    MuseChat: A Conversational Music Recommendation System for Videos
    Dong, Zhikang, Liu, Xiulong, Chen, Bin, Polak, Pawel, and Zhang, Peng
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Jun 2024
  5. AAAI
    CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments
    Liu, Xiulong, Paul, Sudipta, Chatterjee, Moitreya, and Cherian, Anoop
    Proceedings of the AAAI Conference on Artificial Intelligence Mar 2024
  6. WACV
    Let the Beat Follow You - Creating Interactive Drum Sounds From Body Rhythm
    Liu, Xiulong, Su, Kun, and Shlizerman, Eli
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Jan 2024
  7. WACV
    Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
    Liu, Xiulong, Dong, Zhikang, and Zhang, Peng
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2024
  8. NEURIPS
    How Does it Sound? Generation of Rhythmic Soundtracks for Human Movement Videos
    Su, Kun*, Liu, Xiulong*, and Shlizerman, Eli
    Advances in Neural Information Processing Systems 2021
  9. NEURIPS
    Audeo: Audio generation for a silent performance video
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    Advances in Neural Information Processing Systems 2020
  10. CVPR
    Predict & cluster: Unsupervised skeleton based action recognition
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020