Selected Publications

Moral Reasoning & Alignment

  • B. Chen, G. Liu, Z. Qi, K. M. Johnson. Learning to Diagnose and Correct Moral Errors: Towards Enhancing Moral Sensitivity in Large Language Models. ArXiv 2026. [PDF]
  • Z. Xue, Z. Qi, G. Liu, B. Chen, R. Pedarsani. Deactivating Refusal Triggers: Understanding and Mitigating Overrefusal in Safety Alignment. ArXiv [PDF]
  • G. Liu, X. Chen, B. Chen, X. Zhang, K. M. Johnson. Pragmatic Inference for Moral Reasoning Acquisition: Generalization via Metapragmatic Links. ArXiv [PDF]
  • B. Chen, G. Liu, Z. Qi, X. Zhang. Diagnosing the Performance Trade-off in Moral Alignment: A Case Study on Gender Stereotypes. ArXiv 2025. [PDF]

Chatbot System Security

  • B. Chen, Z. Wang, K. M. Johnson. Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots. RAID 2023. [PDF]
  • B. Chen, X. Liu, Z. Wang. Multi-Turn Hidden Backdoor in Large Language Model-Powered Chatbot Models. ASIACCS 2024. [PDF]
  • B. Chen, G. Liu, K. M. Johnson. Jailbreaker in Jail: Moving Target Defense for Large Language Models. CCS MTD 2023. [PDF]