By Janie Chang, Writer, Microsoft Research
When an oil spill happens, are we annoyed, angry, or furious? When the jobless rate drops, are we relieved, happy, or ecstatic? If these topics are being discussed on Twitter, a new study from Microsoft Research Redmond suggests that it is possible to identify our collective emotional state based on large-scale expressions of moods shared via social media.
Moods are central to the expression of thoughts, ideas, and opinions. In turn, these factors influence attitudes and behavior. Consumer research, health care, urban development, and socioeconomics are just a few of the domains that would benefit from tools that can track and gauge a population’s moods—and, therefore, behaviors—with greater accuracy. Psychologists study human emotions, but the recent growth of social media has provided researchers with rich, large-scale data that can help interpret and understand the behavior of millions of individuals.
“Greater accuracy is the key difference between this work and previous approaches,” says postdoctoral researcher Munmun De Choudhury. “Before this, social-network and media research into moods largely looked for only ‘positive affect’ or ‘negative affect.’ But moods are not binary. One of our contributions is exploring the nuances of emotional expressions online and getting beyond the coarse-grained assessment of positive and negative.”
The technical paper Not All Moods are Created Equal! Exploring Human Emotional States in Social Media will be presented during the sixth International AAAI Conference on Weblogs and Social Media, which runs from June 4 through 7 at Trinity College Dublin. In it, De Choudhury and researchers Scott Counts and Michael Gamon identified more than 200 moods from Twitter data by borrowing from the psychology literature a representation of the human mood landscape known as the “circumplex model,” which defines dimensions of mood.
“Psychologists have validated different dimensions of moods, two of which are valence and activation,” explains Counts, who has a Ph.D. in social and personality psychology. “If you take a mood such as ‘frustrated,’ valence means how positive or negative is it? ‘Frustrated’ obviously is a negative emotion, but is it more negative than ‘angry,’ or is it less negative than ‘angry’? Activation refers to the intensity of an emotion. ‘Infuriated,’ for example, is higher in activation than ‘angry.’”
The team’s first task was to construct a lexicon of mood-indicative words relevant to social media. The researchers began with a list of 3,000 words drawn from psychology literature. These words were rated, through an Amazon Mechanical Turk crowdsourcing project, on a scale of 1-7, where 1 meant “not a mood at all” and 7 indicated “absolutely a mood.” This resulted in a list of 203 words—such as “excited,” “nervous,” “quiet,” “grumpy,” “depressed,” “thankful,” and “bored”—that both Mechanical Turk users and researchers agreed were mood-indicative.
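The filtering step can be pictured with a short Python sketch. The article does not state the cutoff the team used, so the threshold of 4.0, the sample ratings, and the function name `select_mood_words` are illustrative assumptions rather than details of the study.

```python
from statistics import mean

# Hypothetical crowd ratings on the article's 1-7 scale
# (1 = "not a mood at all", 7 = "absolutely a mood").
SAMPLE_RATINGS = {
    "excited": [7, 6, 7],
    "grumpy": [6, 6, 5],
    "table": [1, 2, 1],
    "bored": [6, 7, 6],
}


def select_mood_words(ratings, threshold=4.0):
    """Keep words whose mean rating is at least `threshold`.

    The threshold is an assumed value; the article does not report one.
    Words with no ratings are dropped.
    """
    return sorted(
        word for word, scores in ratings.items()
        if scores and mean(scores) >= threshold
    )


print(select_mood_words(SAMPLE_RATINGS))
```

In the study, agreement between the Mechanical Turk raters and the researchers also played a part in the final selection, which this sketch does not model.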
Some of these words already had been rated for valence and activation in existing psychology lexicons. For those that were not, a second crowdsourcing task completed the rating exercise, and the researchers were able to place each word in a valence/activation space represented by a chart that locates valence along the x-axis and activation on the y-axis.
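As a rough illustration of that valence/activation space, the Python sketch below stores each word as a point and names the quadrant it falls in. The coordinates, the -1 to 1 scale, the `MoodWord` class, and the `quadrant` helper are all made up for this example; the study's actual ratings are not given in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoodWord:
    word: str
    valence: float      # x-axis: negative (low) to positive (high)
    activation: float   # y-axis: calm (low) to intense (high)


def quadrant(mood, midpoint=0.0):
    """Name the quadrant of the valence/activation plane a word falls in."""
    side = "positive" if mood.valence >= midpoint else "negative"
    level = "high" if mood.activation >= midpoint else "low"
    return f"{side} valence, {level} activation"


# Made-up coordinates on an assumed -1..1 scale, for illustration only.
LEXICON = [
    MoodWord("excited", 0.8, 0.9),
    MoodWord("quiet", 0.2, -0.7),
    MoodWord("angry", -0.7, 0.6),
    MoodWord("infuriated", -0.8, 0.9),
    MoodWord("bored", -0.5, -0.6),
]

for mood in LEXICON:
    print(f"{mood.word:>10}: {quadrant(mood)}")
```

The sample values follow Counts's example: “infuriated” sits higher on the activation axis than “angry,” while both fall on the negative side of the valence axis.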
“The resulting mood lexicon is definitely a nice contribution of the research,” De Choudhury says. “It can be used with other Internet communication modalities to detect mood expression and for analysis similar to what we carried out in this study.”
The researchers then applied the lexicon to social-media data to study various aspects of mood expression in the context of behavioral attributes that define an individual’s actions in social media. With access to a year’s worth of Twitter posts, the team knew that a significant percentage of tweets would not reflect moods. The researchers had to devise a way of eliminating those. Fortunately, many Twitter users use a hashtag to indicate a mood at the end of a tweet—as in “going to latest Harry Potter movie tonight #excited”—which made it easier to collect a data set of mood-indicative tweets. The final mood data set consisted of 10.6 million tweets from 4.1 million English-speaking users.
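One way to picture the hashtag filter is the Python sketch below, which keeps a tweet only if it ends with a hashtag found in the mood lexicon. The regular expression, the `trailing_mood` function, and the small lexicon subset are assumptions made for illustration; the article does not describe the team's actual matching rules.

```python
import re

# Assumed subset of the mood lexicon; the study's list has 203 words.
MOOD_WORDS = {"excited", "nervous", "grumpy", "thankful", "bored"}

# A hashtag at the very end of the tweet, allowing trailing whitespace.
TRAILING_HASHTAG = re.compile(r"#(\w+)\s*$")


def trailing_mood(tweet, lexicon=MOOD_WORDS):
    """Return the mood word in a trailing hashtag, or None if there is none."""
    match = TRAILING_HASHTAG.search(tweet)
    if match is None:
        return None
    tag = match.group(1).lower()
    return tag if tag in lexicon else None


tweets = [
    "going to latest Harry Potter movie tonight #excited",
    "New blog post is up #python",
    "Monday again #Grumpy ",
    "#bored of waiting, heading home",
]

# Keep only tweets whose final hashtag is a known mood word.
mood_tweets = [(t, trailing_mood(t)) for t in tweets if trailing_mood(t)]
print(mood_tweets)
```

Requiring the hashtag at the end of the tweet mirrors the usage pattern the article describes; a hashtag at the start or in the middle of a tweet is ignored by this sketch.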
“We worked with anonymized aggregated data to analyze usage trends,” Counts says. “We did not look at individual tweets because the goal was to map out characteristics of social-media mood expression. We were interested in discovering how often different types of moods were expressed, activity rates, and participatory patterns such as conversational engagement and information sharing. We achieved large-scale validations of what we knew of human moods, based on psychology literature, and also ran into quite a few surprises.”
One surprise was that, out of the 203 different moods, negative moods appeared more frequently than positive moods and covered a wider range of mood expressions. Furthermore, negative moods were usually of mid-level activation. Positive moods were less frequent than negative and represented a smaller range, but they tended to have high-level activation: words such as “win,” “happy,” and “ecstatic” occurred frequently.
“That was interesting, because in such a large sample size, we thought we would find an equal distribution of positive and negative mood words,” De Choudhury says. “Since Twitter users broadcast opinions and feelings, our theory is that users tend to express feelings of very high or low valence. Mildly positive feelings don’t seem to warrant tweeting, but extremely positive feelings do. We hypothesize that when users are feeling down or negative, even if mildly so, they reach out for social support, hence the higher incidence of negative mood expressions.”
Another unexpected dynamic was the relative lack of emotional content in targeted tweets, in which one user replies in a public message to another user via the @ reply. The team believed this type of exchange suggested a stronger rapport and comfort level between the users and expected such conversational exchanges to express more emotion.
Instead, they found the opposite.
“Our theory is that if two people already share a strong tie, perhaps they take discussions with emotional content offline or to a different medium, such as email, rather than having them in a broadcast medium like Twitter,” De Choudhury says. “That is a key point: how does the medium affect communication behavior? That would be an interesting direction to take later for this research.”
Overall, the team observed a relationship between the moods expressed and the social ties formed by individuals, as well as their rates of activity. Positive moods are shared by more “social” and highly “active” people. In addition, when individuals attach moods to information they share via links on social media, those moods tend to be positive, which may indicate that positive moods are contagious and likely to propagate better in a network.
From an application perspective, Counts and De Choudhury can see this research leading to mood classifiers or analytical tools useful in search and advertising. In the consumer space, it could extend traditional sentiment analysis. There are also applications in media studies and in health care. The researchers are already examining ways to aid the study of depression.
“Prior to social media, this kind of study would have been impossible,” Counts says. “The fact that we can understand the expression of so many nuances of emotion at this scale is unprecedented. To me, that’s the most exciting piece of the puzzle.”
De Choudhury adds another comment.
“We can see how people express themselves when using a completely different form of communication, one that did not exist 20 years ago,” she notes. “When we began this research, we thought people would behave similarly to the way they interact on the telephone. It turns out there are similarities and a lot of differences, all of which contributes to helping us understand how the Internet affects our lives—specifically, how it affects the way we express our emotions. That’s an incredible finding.”