Research Paper
1 min read · Mar 23, 2026
[Paper] From Masks to Pixels and Meaning: A New Taxonomy, Benchmark, and Metrics for VLM Image Tampering
This paper introduces a new approach to vision-language model (VLM) image tampering detection, targeting the inaccuracies of current benchmarks that rely on coarse object masks. It reformulates the task as pixel-grounded, meaning-aware, and language-aware, with the aim of identifying subtle edits more precisely. It also proposes a comprehensive taxonomy of edit primitives, such as replace, remove, and inpaint, for categorizing and detecting image manipulations.
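To make the gap between a coarse object mask and pixel-level grounding concrete, here is a minimal Python sketch. It is not the paper's benchmark or metric: the toy 4x4 image, the `pixel_diff_mask` and `iou` helpers, and the choice of intersection over union as the comparison are all assumptions made for illustration.

```python
# Illustrative sketch only: NOT the paper's benchmark, data, or metric.
# All names and values here are hypothetical.

def pixel_diff_mask(original, edited, threshold=0):
    """Mark pixels whose absolute intensity change exceeds `threshold`."""
    return [
        [abs(o - e) > threshold for o, e in zip(row_o, row_e)]
        for row_o, row_e in zip(original, edited)
    ]

def iou(mask_a, mask_b):
    """Intersection over union of two same-sized boolean masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += a and b   # True counts as 1
            union += a or b
    return inter / union if union else 1.0

# Hypothetical 4x4 grayscale image; the edit changes only two pixels.
original = [[10] * 4 for _ in range(4)]
edited = [row[:] for row in original]
edited[1][1] = 200
edited[1][2] = 200

# Hypothetical coarse object mask: the full 2x2 region of the edited object.
coarse_mask = [[1 <= r <= 2 and 1 <= c <= 2 for c in range(4)] for r in range(4)]

pixel_mask = pixel_diff_mask(original, edited)
changed = sum(sum(row) for row in pixel_mask)  # pixels that actually changed
overlap = iou(coarse_mask, pixel_mask)         # agreement between the two masks

print(f"changed pixels: {changed}, IoU with coarse mask: {overlap:.2f}")
```

In this toy case the coarse mask labels four pixels as tampered while only two actually changed, so a detector scored against the coarse mask would be rewarded for flagging untouched pixels.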