Xiulong Liu

I am Xiulong Liu, a Ph.D. graduate of the University of Washington, where I conducted research in the NeuroAI Lab under the supervision of Prof. Eli Shlizerman. My research interests lie broadly in computer vision, audio generation, and multi-modal learning. Prior to that, I received my B.S. degree in Electrical Engineering from Shanghai Jiao Tong University.

News

May 22, 2025 I have successfully defended my Ph.D. thesis titled “Towards Multi-modal Interactive Systems that Connects Audio, Vision and Beyond” and become Dr. Dragon!
Feb 26, 2025 My first-authored paper “Hearing Anywhere in Any Environment” has been accepted to CVPR 2025!
Feb 10, 2025 I passed my Ph.D. General Exam and became a Ph.D. candidate!
Sep 25, 2024 My first-authored paper “Tell What You Hear From What You See - Video to Audio Generation Through Text” has been accepted to NeurIPS 2024!
Feb 26, 2024 My first co-authored paper “MuseChat: A Conversational Music Recommendation System for Videos” has been accepted to CVPR 2024 as a Highlight Poster (Top 2.8%)!

Selected Publications

  1. CVPR
    Hearing Anywhere in Any Environment
    Liu, Xiulong, Kumar, Anurag, Calamia, Paul, Amengual, Sebastia V., Murdock, Calvin, Ananthabhotla, Ishwarya, Robinson, Philip, Shlizerman, Eli, Ithapu, Vamsi Krishna, and Gao, Ruohan
    In Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Jun 2025
  2. NEURIPS
    Tell What You Hear From What You See - Video to Audio Generation Through Text
    Liu, Xiulong, Su, Kun, and Shlizerman, Eli
    In Advances in Neural Information Processing Systems 2024
  3. ICML
    From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    In Proceedings of the 41st International Conference on Machine Learning 21–27 Jul 2024
  4. CVPR
    MuseChat: A Conversational Music Recommendation System for Videos
    Dong, Zhikang, Liu, Xiulong, Chen, Bin, Polak, Pawel, and Zhang, Peng
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Jun 2024
  5. AAAI
    CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments
    Liu, Xiulong, Paul, Sudipta, Chatterjee, Moitreya, and Cherian, Anoop
    Proceedings of the AAAI Conference on Artificial Intelligence Mar 2024
  6. WACV
    Let the Beat Follow You - Creating Interactive Drum Sounds From Body Rhythm
    Liu, Xiulong, Su, Kun, and Shlizerman, Eli
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) Jan 2024
  7. WACV
    Tackling Data Bias in MUSIC-AVQA: Crafting a Balanced Dataset for Unbiased Question-Answering
    Liu, Xiulong, Dong, Zhikang, and Zhang, Peng
    In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2024
  8. NEURIPS
    How Does it Sound? Generation of Rhythmic Soundtracks for Human Movement Videos
    Su, Kun*, Liu, Xiulong*, and Shlizerman, Eli
    Advances in Neural Information Processing Systems 2021
  9. NEURIPS
    Audeo: Audio generation for a silent performance video
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    Advances in Neural Information Processing Systems 2020
  10. CVPR
    Predict & cluster: Unsupervised skeleton based action recognition
    Su, Kun, Liu, Xiulong, and Shlizerman, Eli
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2020