Towards Authorship Obfuscation with Language Models
Computers and Communication Security (CCS) |
Published by ACM | Organized by ACM
Authorship obfuscation is the process of making changes to text such that identifying attributes (style, common words and phrases, tone) are masked. The goal of obfuscation is to retain the semantics of the text (i.e., the meaning) but rewrite it in such a way that the author cannot be identified. In this work, we investigate the effectiveness of language models for authorship obfuscation. More specifically, we examine the application of document summarization (a task where we learn to generate the summary of a text) as an authorship obfuscation method. Since summaries are shorter versions of text but which retain the significant points made in it, we hypothesize that summaries will be stripped off any stylistic identifying features of the text. Our experiments show that this is indeed the case; we were able to fool authorship classifiers and degrade their performance by as much as 70% However, this also significantly affected the semantics; there was a non-trivial loss of information and the produced text was not an accurate representation of the original.