Xiangyu Zhou 👨‍💻

Xiangyu Zhou

(he/him)

Graduate Research Assistant

About Me

Hi there! 👋 I’m a Ph.D. candidate in Computer Science at Wayne State University, advised by Prof. Dongxiao Zhu, where I spend most of my time studying how to make large language models more trustworthy, robust, and safe. My research sits at the intersection of trustworthy AI, large language model safety, and reasoning, with a focus on understanding how modern models can be manipulated or misaligned, and how they can be made to forget specific information more precisely.
Since joining the Trustworthy AI Lab, I have been working on problems such as jailbreak vulnerabilities, adversarial in-context learning, safety alignment, and LLM unlearning. My work explores both the weaknesses of frontier language and reasoning models and practical ways to improve their reliability under real-world conditions.
More recently, I have been studying how reasoning traces and conversational context can steer model behavior, as well as how to align models more effectively without hurting their general usefulness. I am also interested in targeted unlearning: removing unwanted or sensitive information from models while keeping useful knowledge intact. At a broader level, I care about building AI systems that are not only capable, but also dependable and responsible. My long-term goal is to help bridge cutting-edge language model research with safer deployment in high-impact settings.
If you’re interested in trustworthy AI, language model safety, robustness, or reasoning, let’s connect! 🚀

Education

PhD in Computer Science

2023-08-30

Wayne State University

Master's in Computer Science

2021-08-30 to 2023-05-30

Stevens Institute of Technology

Bachelor of Science in Software Engineering

2017-09-01 to 2019-05-30

Chongqing University of Posts and Telecommunications

Interests

Large Language Models
Reinforcement Learning
Trustworthy AI
Supervised Fine-tuning
Safety and Robustness in LLMs
Multimodal AI

Featured Publications

Not All Tokens Are Meant to Be Forgotten

This paper helps large language models forget sensitive and unwanted data without over-forgetting general data.

Xiangyu Zhou

Hijacking Large Language Models via Adversarial In-Context Learning

This work introduces a novel transferable attack on in-context learning that hijacks LLMs into generating a target response or a jailbroken output. We also propose a defense strategy …

Xiangyu Zhou
Recent News

Serving as a reviewer at ICML 2026

I was glad to serve as a reviewer for ICML 2026 this year. Seeing papers from the reviewer side gave me a better sense of what makes research stand out.

Xiangyu Zhou

🎉Paper accepted by AAAI-26

Excited to share that our paper 'Not All Tokens Are Meant to Be Forgotten' has been accepted to the 40th Annual AAAI Conference on Artificial Intelligence (Accepted as Oral, …

Xiangyu Zhou

Serving as one of the Program Committee (PC) members at AAAI 2026

I will be serving as one of the Program Committee (PC) members at AAAI 2026.

Xiangyu Zhou