NivaDuck – A Scalable Pipeline to Build a Database of Political Twitter Handles for India and the United States

  • Anmol Panda
  • A’ndre Gonawela
  • Sreangsu Acharyya
  • Dibyendu Mishra
  • Mugdha Mohapatra
  • Ramgopal Chandrasekaran
  • Joyojeet Pal

SMSociety'20: International Conference on Social Media and Society

Published by Association for Computing Machinery | Organized by Association for Computing Machinery


We present a scalable methodology for identifying the Twitter handles of politicians in a given region, and we test our framework in the context of Indian and US politics. The main contribution of our work is a curated list of the Twitter handles of 18,500 Indian and 8,000 US politicians. Our work combined machine learning-based classification with human verification to build a dataset of Indian politicians on Twitter. We built NivaDuck, a highly precise, two-stage classification pipeline that uses Twitter description text and tweet content to identify politicians. For India, we tested NivaDuck’s recall using the Twitter handles of members of the Indian parliament; for the US, we used state-level politicians in California and local-level politicians in San Diego County. We found that while NivaDuck has lower recall scores, it produces large, diverse sets of politicians with precision exceeding 90 percent for the US dataset. We discuss the need for an ML-based, scalable method to compile such a dataset and its wide-ranging use cases for research on political communication on social media.
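
The abstract describes a two-stage filter followed by recall and precision checks against known lists. The following is a minimal sketch of that general shape, not NivaDuck's actual implementation. It assumes the first stage looks at the profile description and the second at tweet content. Simple keyword matching stands in for the trained classifiers, and the names `Account`, `DESCRIPTION_TERMS`, `TWEET_TERMS`, `stage_one`, `stage_two`, and `run_pipeline` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Account:
    """Hypothetical record for one Twitter account."""
    handle: str
    description: str
    tweets: Tuple[str, ...]


# Assumption: keyword lists stand in for the trained classifiers.
DESCRIPTION_TERMS = ("senator", "member of parliament", "mayor", "councilmember")
TWEET_TERMS = ("constituency", "bill", "campaign", "vote")


def stage_one(account: Account) -> bool:
    """Stage 1 (assumed): keep accounts whose description looks political."""
    text = account.description.lower()
    return any(term in text for term in DESCRIPTION_TERMS)


def stage_two(account: Account, min_hits: int = 2) -> bool:
    """Stage 2 (assumed): require at least min_hits political-looking tweets."""
    hits = sum(
        any(term in tweet.lower() for term in TWEET_TERMS)
        for tweet in account.tweets
    )
    return hits >= min_hits


def run_pipeline(accounts: Iterable[Account]) -> Set[str]:
    """Return handles that pass both stages."""
    return {a.handle for a in accounts if stage_one(a) and stage_two(a)}


def recall(predicted: Set[str], reference: Set[str]) -> float:
    """Share of a known reference list (e.g. sitting legislators) that was found."""
    return len(predicted & reference) / len(reference) if reference else 0.0


def precision(predicted: Set[str], verified: Set[str]) -> float:
    """Share of predicted handles that human verification confirmed."""
    return len(predicted & verified) / len(predicted) if predicted else 0.0
```

In this sketch, recall is measured against an externally compiled list of known politicians, mirroring the use of parliament members as a reference set, while precision depends on which predicted handles a human annotator confirms.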