"Emoji Attack: A Method for Misleading Judge LLMs in Safety Risk Detection."

Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson (2024)

DOI: 10.48550/ARXIV.2411.01077

access: open

type: Informal or Other Publication

metadata version: 2024-12-11