This data set contains aggregated user-level information about two data sets of Twitter users: 1 - with self-reported political ideology (7-point scale, collected through a survey), age and gender; 2 - with follower-based political orientation (binary).

Associated paper (see it for more details):
Daniel Preotiuc-Pietro, Ye Liu, Daniel J Hopkins, Lyle Ungar
Beyond Binary Labels: Political Ideology Prediction of Twitter Users
ACL 2017
URL: http://sites.sas.upenn.edu/danielpr/files/moderates17acl.pdf

For other questions about the data set, contact Daniel Preotiuc-Pietro (www.preotiuc.ro)

Each feature file contains one category of features per data set in comma separated value format.

Feature files are split by data set:

'pi-' prefix - data set D1 from the paper (survey-based)
'po2-' prefix - data set D2 from the paper (follower-based)

Feature files are split by feature type (see header of each file):

'unigrams' - tokens used by at least 10% of the users
'emotions' - Ekman's six emotions + anticipation + trust + positive/negative computed using the NRC emotion lexicon
'liwc-2015' - LIWC 2015 categories
'w2v-500' - word2vec hard clusters, available from http://www.sas.upenn.edu/~danielpr/clusters.tar.gz
'political' - political terms as described in the paper
'user-trats' - user traits (gender, age, political orientation/ideology)
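
The layout above (one CSV file per data set and feature type) can be read with a short script. The following is a minimal Python sketch, not part of the original release. It assumes that files are named "<prefix>-<feature type>.csv" and that every file has an identifier column called user_id; both are assumptions, so check the actual file names and the header of each file first. The helper names feature_path, load_features and join_features are hypothetical.

```python
import csv
from pathlib import Path


def feature_path(data_dir, prefix, feature_type):
    # ASSUMPTION: files are named "<prefix>-<feature type>.csv",
    # e.g. "pi-unigrams.csv". Adjust to the real file names.
    return Path(data_dir) / f"{prefix}-{feature_type}.csv"


def load_features(path, id_column="user_id"):
    """Read one feature file into {user id: {column name: raw string value}}.

    ASSUMPTION: the identifier column is called "user_id"; pass a
    different id_column if the file header says otherwise.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or id_column not in reader.fieldnames:
            raise KeyError(f"{id_column!r} not found in header of {path}")
        return {
            row[id_column]: {k: v for k, v in row.items() if k != id_column}
            for row in reader
        }


def join_features(*tables):
    """Keep only users present in every table and merge their columns.

    If two tables share a column name, the later table wins.
    """
    if not tables:
        return {}
    shared = set(tables[0]).intersection(*tables[1:])
    merged = {}
    for user in shared:
        merged[user] = {}
        for table in tables:
            merged[user].update(table[user])
    return merged
```

For example, join_features(load_features(feature_path(".", "pi", "unigrams")), load_features(feature_path(".", "pi", "emotions"))) would return one merged record per user found in both files, under the naming assumptions above. Values are kept as strings; convert them to numbers as needed.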

NOTE: To protect the identity of the users participating in our survey, we have anonymized the user_ids in the D1 data set ('pi'). If you are interested in the real user_ids for research purposes only, please contact Daniel Preotiuc-Pietro (danielpr@sas.upenn.edu). Further, @-usernames are anonymized in this data set. The user_ids in the D2 data set ('po2') are the real Twitter user_ids.

If using this data set, please cite the following publication:

@inproceedings{moderates17acl,
  title = {{Beyond Binary Labels: Political Ideology Prediction of Twitter Users}},
  author = {Preo\c{t}iuc-Pietro, Daniel and Liu, Ye and Hopkins, Daniel J and Ungar, Lyle},
  series = {ACL},
  booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics},
  year = {2017},
}