Smokey: Automatic Recognition of Hostile Messages
- Ellen Spertus
Proceedings of IAAI-97, the 9th Conference on Innovative Applications of Artificial Intelligence
Abusive messages (flames) can be both a source of frustration and a waste of time for Internet users. This paper describes some approaches to flame recognition, including a prototype system, Smokey. Smokey builds a 47-element feature vector based on the syntax and semantics of each sentence, combining the vectors for the sentences within each message. A training set of 720 messages was used by Quinlan’s C4.5 decision-tree generator to determine feature-based rules that were able to correctly categorize 64% of the flames and 98% of the non-flames in a separate test set of 460 messages. Additional techniques for greater accuracy and user customization are also discussed.
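The per-sentence feature vectors and their combination into a per-message vector can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the three toy rules stand in for Smokey's 47 features and are not taken from the paper, the sentence splitter is deliberately naive, and the element-wise sum is only one plausible way to combine sentence vectors, since the abstract does not state the rule. The decision-tree step (C4.5) is not shown.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for Smokey's 47 features. These three rules are
# illustrative assumptions, not the rules used in the paper.
FEATURES: List[Callable[[str], bool]] = [
    # Sentence addresses the reader in the second person.
    lambda s: bool(re.search(r"\byou\b", s, re.IGNORECASE)),
    # Sentence ends with an exclamation mark.
    lambda s: s.rstrip().endswith("!"),
    # Sentence contains a politeness word.
    lambda s: bool(re.search(r"\b(thanks|please)\b", s, re.IGNORECASE)),
]


def split_sentences(message: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", message.strip())
    return [p for p in parts if p]


def sentence_vector(sentence: str) -> List[int]:
    """One 0/1 entry per feature rule for a single sentence."""
    return [int(rule(sentence)) for rule in FEATURES]


def message_vector(message: str) -> List[int]:
    """Combine sentence vectors by element-wise sum (an assumed rule)."""
    total = [0] * len(FEATURES)
    for sentence in split_sentences(message):
        for i, value in enumerate(sentence_vector(sentence)):
            total[i] += value
    return total
```

For example, `message_vector("You are wrong! Please read the FAQ. Thanks for nothing, you clown!")` counts how many sentences trigger each toy rule. A vector of this kind, one per message, is the sort of input a decision-tree learner would be trained on.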