👨‍💻

Xiangyu Zhou

(he/him)

Graduate Research Assistant

About Me

Hi there! 👋 I’m a Ph.D. candidate in Computer Science at Wayne State University, advised by Prof. Dongxiao Zhu, where I spend most of my time studying how to make large language models more trustworthy, robust, and safe. My research sits at the intersection of trustworthy AI, large language model safety, and reasoning, with a focus on understanding how modern models can be manipulated, misaligned, or made to forget in more precise ways.
Since joining the Trustworthy AI Lab, I have been working on problems such as jailbreak vulnerabilities, adversarial in-context learning, safety alignment, and LLM unlearning. My work explores both the weaknesses of frontier language and reasoning models and practical ways to improve their reliability under real-world conditions.
More recently, I have been studying how reasoning traces and conversational context can steer model behavior, as well as how to align models more effectively without hurting their general usefulness. I am also interested in targeted unlearning: removing unwanted or sensitive information from models while keeping useful knowledge intact. At a broader level, I care about building AI systems that are not only capable, but also dependable and responsible. My long-term goal is to help bridge cutting-edge language model research with safer deployment in high-impact settings.
If you’re interested in trustworthy AI, language model safety, robustness, or reasoning, let’s connect! 🚀

Education

PhD in Computer Science

2023-08-30

Wayne State University

Master's in Computer Science

2021-08-30
2023-05-30

Stevens Institute of Technology

Bachelor of Science in Software Engineering

2017-09-01
2019-05-30

Chongqing University of Posts and Telecommunications

Interests

Large Language Models, Reinforcement Learning, Trustworthy AI, Supervised Fine-tuning, Safety and Robustness in LLMs, Multimodal AI

Featured Publications

Accepted by AAAI as Oral

Not all tokens are meant to be forgotten

This paper shows how large language models can forget sensitive or unwanted data without over-forgetting general knowledge.

Xiangyu Zhou

• Mar 14, 2026 • 1 min read

Large Language Models

Hijacking Large Language Models via Adversarial In-Context Learning

This work introduces a novel transferable attack on in-context learning that hijacks LLMs into generating a target response or jailbreaks them. We also propose a defense strategy …

Xiangyu Zhou

• Nov 16, 2023 • 1 min read

Recent Publications

Xiangyu Zhou, Yao Qiang, Saleh Zare Zade, Douglas Zytko, Prashant Khanduri, Dongxiao Zhu (2026). Not all tokens are meant to be forgotten. AAAI-26.

Rafi Ibn Sultan, Hui Zhu, Xiangyu Zhou, Chengyin Li, Prashant Khanduri, Marco Brocanelli, Dongxiao Zhu (2026). WalkGPT: Grounded Vision-Language Conversation with Depth-Aware Segmentation for Pedestrian Navigation. CVPR-26.

Saleh Zare Zade, Xiangyu Zhou, Sijia Liu, Dongxiao Zhu (2026). Attention Smoothing Is All You Need For Unlearning. ICLR-26.

Emma Walquist, Isha Datey, Wenqi Zheng, Xiangyu Zhou, Kelly Berishaj, Melissa Mcdonald, Michele Parkhill, Dongxiao Zhu, Douglas Zytko (2025). Collective Consent: Who Needs to Consent to the Donation of Data Representing Multiple People? CSCW-25.

Saleh Zare Zade, Yao Qiang, Xiangyu Zhou, Hui Zhu, Mohammad Amin Roshani, Prashant Khanduri, Dongxiao Zhu (2025). Automatic calibration for membership inference attack on large language models. ECAI-25.

Recent & Upcoming Talks

Serving as a reviewer for ICML 2026

I was glad to serve as a reviewer for ICML 2026. Seeing papers from the reviewer's side gave me a better sense of what makes research stand out.

Xiangyu Zhou

• Jan 25, 2026 • 1 min read

Research

🎉 Paper accepted at AAAI-26

Excited to share that our paper 'Not All Tokens Are Meant to Be Forgotten' has been accepted to the 40th Annual AAAI Conference on Artificial Intelligence (accepted as Oral, …

Xiangyu Zhou

• Nov 8, 2025 • 1 min read