Haibo Jin

Doctoral Student
PhD, Information Sciences, Illinois (in progress)
Research focus
Development of trustworthy machine learning through robustness, interpretability, and alignment.
Advisor
Publications & Papers
Jin, H., Zhang, P., Luo, M., & Wang, H. (2025). Reasoning Can Hurt the Inductive Abilities of Large Language Models. Advances in Neural Information Processing Systems.
Jin, H., Zhou, A., Menke, J., & Wang, H. (2024). Jailbreaking large language models against moderation guardrails via cipher characters. Advances in Neural Information Processing Systems, 37, 59408-59435.
Jin, H., Chen, R., Chen, J., Zheng, H., Zhang, Y., & Wang, H. (2024, September). CatchBackdoor: Backdoor detection via critical trojan neural path fuzzing. In European Conference on Computer Vision (pp. 90-106). Cham: Springer Nature Switzerland.
Chen, R., Jin, H., Liu, Y., Chen, J., Wang, H., & Sun, L. (2024, September). EditShield: Protecting unauthorized image editing by instruction-guided diffusion models. In European Conference on Computer Vision (pp. 126-142). Cham: Springer Nature Switzerland.
Zhang, P., Jin, H., Hu, L., Li, X., Kang, L., Luo, M., ... & Wang, H. (2025). Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization. In Forty-second International Conference on Machine Learning.
Zhuang, J., Jin, H., Zhang, Y., Kang, Z., Zhang, W., Dagher, G. G., & Wang, H. (2025). Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.