Discussion Graph Tool

Established: April 25, 2014

Feature Extractors

A feature extractor in DGT analyzes the raw data of a social media message and recognizes, extracts, infers, or detects higher-level information.  The raw data of a social media message may include the text as well as available metadata about the message, the message author, and other geo-temporal or social context.

DGT includes several out-of-the-box feature extractors for common scenarios.  These include some complex analysis tasks, such as mood inference and geo-location mapping, as well as support for simpler analyses, such as customizable dictionary and regular expression-based feature extractors.
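
To make the idea of dictionary and regular expression-based extraction concrete, here is a minimal Python sketch. It is not DGT's own script syntax or API: the word lists, the hashtag pattern, and the function names are all hypothetical and chosen only for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical dictionary: feature label -> words that signal that label.
MOOD_WORDS: Dict[str, Set[str]] = {
    "happy": {"glad", "joyful", "happy"},
    "sad": {"sad", "unhappy", "gloomy"},
}

# Hypothetical regular expression: capture the word after each '#'.
HASHTAG_PATTERN = re.compile(r"#(\w+)")


def extract_dictionary_features(text: str) -> List[str]:
    """Return the sorted labels whose word list overlaps the message text."""
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(label for label, words in MOOD_WORDS.items() if tokens & words)


def extract_regex_features(text: str) -> List[str]:
    """Return every hashtag in the message, lowercased, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_PATTERN.findall(text)]


message = "So glad the rain stopped! #Seattle #weekend"
print(extract_dictionary_features(message))  # ['happy']
print(extract_regex_features(message))       # ['seattle', 'weekend']
```

Both functions take only the message text and return a list of feature values, which mirrors the general pattern of turning raw message data into higher-level labels.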

The reference guide lists the feature extractors included in DGT and gives examples of using the customizable feature extractors.

The TREC 2013 Microblog track provided a convenient set of tools for retrieving tweets, including a tool for sampling from the public Twitter stream.  To install this tool and begin downloading tweets, follow these instructions:

  1. Install the prerequisite software
    1. Java Development Kit
    2. Apache Maven
  2. Download the twitter-tools zip file from https://github.com/lintool/twitter-tools/ and extract it on your computer
  3. Open a command line in the directory where you extracted the twitter-tools zip file and run the following two commands to build the twitter-tools program
    > cd twitter-tools-core
    > mvn clean package appassembler:assemble
  4. Follow the instructions on the twitter-tools site for creating your Twitter access tokens, setting up a twitter4j.properties file, and running the GatherStatusStream.bat program to retrieve tweets from the public Twitter stream
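
Once tweets have been collected, a short script can confirm that the download worked. The Python sketch below assumes the gathered output is a gzip-compressed file with one JSON status per line; that layout is an assumption here, so check the twitter-tools documentation for the actual file names and format. The `iter_tweets` helper and the sample file are hypothetical.

```python
import gzip
import json
import os
import tempfile
from typing import Dict, Iterator


def iter_tweets(path: str) -> Iterator[Dict]:
    """Yield one parsed status per line, skipping blank lines and records without text."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if "text" in record:  # e.g. delete notices carry no "text" field
                yield record


# Demo on a tiny hand-made file, so the sketch runs without Twitter access.
sample_lines = [
    json.dumps({"id": 1, "text": "hello world"}),
    json.dumps({"delete": {"status": {"id": 2}}}),
    "",
    json.dumps({"id": 3, "text": "second tweet"}),
]
sample_path = os.path.join(tempfile.mkdtemp(), "statuses.sample.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write("\n".join(sample_lines) + "\n")

texts = [tweet["text"] for tweet in iter_tweets(sample_path)]
print(texts)  # ['hello world', 'second tweet']
```

To try it on real data, replace `sample_path` with the path to one of the files written by the stream gatherer.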