Global Frontier AI Series: Watermarking Methods for Detecting LLM-generated Text
Speaker: Jeongyeon Hwang
Location:
TBA
Videoconference link:
https://nyu.zoom.us/j/93573795488
Date: Monday, February 23, 2026
This event will be fully remote.
Abstract: The rapid advancement and widespread adoption of large language models (LLMs) have heightened concerns about misuse, ranging from generating misleading content to undermining academic integrity (e.g., cheating). To address these concerns, watermarking has emerged as a promising way to detect LLM-generated text by embedding an imperceptible statistical signal in model outputs. In this talk, I first motivate the need for LLM watermarking and explain how existing methods work. I then turn to a key practical challenge: in real-world settings, adversaries may try to remove or obscure the watermark to evade detection. To study this threat, we present BIRA, a black-box watermark removal attack, and evaluate the robustness of current watermarking schemes under realistic evasion attempts. Our results show that many existing approaches can be reliably circumvented with a simple rewriting-based strategy, underscoring the need for rigorous stress testing and for more robust watermarking methods.
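For background on how such a statistical signal can be embedded and detected, the sketch below illustrates one well-known scheme from the literature, the "green list" watermark of Kirchenbauer et al. (2023); it is not drawn from the talk itself. At each generation step, the previous token seeds a pseudorandom split of the vocabulary into a "green" subset toward which sampling is biased, and the detector runs a one-proportion z-test on the count of green tokens. The vocabulary size, green fraction GAMMA, and hash key used here are illustrative placeholders, not values from any real tokenizer.

```python
import math
import random

VOCAB_SIZE = 50_000   # illustrative tokenizer vocabulary size
GAMMA = 0.25          # fraction of the vocabulary marked "green" at each step
KEY = 42              # secret key shared by the generator and the detector

def green_list(prev_token: int) -> set[int]:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    rng = random.Random(KEY * 1_000_003 + prev_token)
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def detect(tokens: list[int]) -> float:
    """z-score of the green-token count; a large value suggests a watermark."""
    n = len(tokens) - 1          # number of scored (context, token) pairs
    if n <= 0:
        return 0.0
    hits = sum(cur in green_list(prev) for prev, cur in zip(tokens, tokens[1:]))
    # Under H0 (human-written text), hits ~ Binomial(n, GAMMA).
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# Example: score a made-up token sequence (expected z-score near 0).
if __name__ == "__main__":
    sample = [random.randrange(VOCAB_SIZE) for _ in range(200)]
    print(f"z-score for random tokens: {detect(sample):+.2f}")
```

A large z-score (e.g., above 4) flags text as likely watermarked; a rewriting-based removal attack of the kind examined in the talk succeeds by paraphrasing the text so that green tokens reappear at roughly the chance rate GAMMA, driving the z-score back toward zero.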
Bio: Jeongyeon Hwang is a 4th-year integrated M.S./Ph.D. student in the Machine Learning Lab at POSTECH, advised by Jungseul Ok. His research aims to improve the real-world reliability of ML/NLP systems, with a focus on making large language models robust to malicious inputs, corrupted training data, and misuse (e.g., fake content generation). Recently, he has been working on watermarking methods for detecting LLM-generated text.