Haibo Jin

Doctoral Student

PhD, Information Sciences, Illinois (in progress)

haibo@illinois.edu

https://haibojin001.github.io/

Research focus

Development of trustworthy machine learning through robustness, interpretability, and alignment.

Advisor

Haohan Wang

Publications & Papers

Jin, H., Zhang, P., Luo, M., & Wang, H. (2025). Reasoning Can Hurt the Inductive Abilities of Large Language Models. Advances in Neural Information Processing Systems.

Jin, H., Zhou, A., Menke, J., & Wang, H. (2024). Jailbreaking large language models against moderation guardrails via cipher characters. Advances in Neural Information Processing Systems, 37, 59408-59435.

Jin, H., Chen, R., Chen, J., Zheng, H., Zhang, Y., & Wang, H. (2024, September). CatchBackdoor: Backdoor detection via critical trojan neural path fuzzing. In European Conference on Computer Vision (pp. 90-106). Cham: Springer Nature Switzerland.

Chen, R., Jin, H., Liu, Y., Chen, J., Wang, H., & Sun, L. (2024, September). EditShield: Protecting unauthorized image editing by instruction-guided diffusion models. In European Conference on Computer Vision (pp. 126-142). Cham: Springer Nature Switzerland.

Zhang, P., Jin, H., Hu, L., Li, X., Kang, L., Luo, M., ... & Wang, H. (2025). REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization. In Forty-second International Conference on Machine Learning.

Zhuang, J., Jin, H., Zhang, Y., Kang, Z., Zhang, W., Dagher, G. G., & Wang, H. (2025). Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing.

In the News

Illinois information sciences researchers develop AI safety testing methods (August 13, 2025)