This data set contains aggregated user-level information about 491 Twitter users, with their self-reported age and gender and their dark triad scores obtained using the Dirty Dozen questionnaire.

For more details, read the associated paper:
Daniel Preotiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar
Studying the Dark Triad of Personality using Twitter Behavior
CIKM 2016

For other questions about the dataset, contact Daniel Preotiuc-Pietro (www.preotiuc.ro).

Each feature file contains one feature category (space separated; the header matches the paper's denomination):
feature-unigrams - tokens used by at least 1% of the users
feature-emotions - Ekman's six emotions + anticipation + trust + positive/negative, computed using the NRC emotion lexicon
feature-LIWC2015 - LIWC 2015 categories
feature-w2v_200 - word2vec hard clusters, available from http://www.sas.upenn.edu/~danielpr/clusters.tar.gz
feature-image - profile image features
features-profile1 - profile features
features-profile2 - profile features (derived from tweets)
features-shallow - shallow tweet features
outcomes - user-level information; the 'ag' suffix indicates residual values after predicting using age and gender

Missing features are represented with -1.

We extracted text features only for the 491 users who posted more than 500 tokens. Shallow features and some of the profile features were extracted only from the 710 users who posted at least one tweet.

If using this data set, please cite the following publication:

@inproceedings{darktriad2016cikm,
  title = {{Studying the Dark Triad of Personality using Twitter Behavior}},
  author = {Preo\c{t}iuc-Pietro, Daniel and Carpenter, Jordan and Giorgi, Salvatore and Ungar, Lyle},
  series = {CIKM},
  booktitle = {Proceedings of the 25th {ACM} Conference on Information and Knowledge Management},
  year = {2016},
}
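
The file layout described above (space separated, one header row, -1 for missing values) can be read with a short script. The following is a minimal sketch, not part of the original release: the function name `load_feature_rows` and the sample column names `user_id`, `feat_a` and `feat_b` are hypothetical, and the real files may use different headers or an identifier column with another name.

```python
import csv
import io

# The README states that missing features are represented with -1.
MISSING_VALUE = -1.0


def load_feature_rows(handle):
    """Read a space-separated feature file that starts with a header row.

    Returns a list of dicts mapping column name -> value. Numeric fields
    become floats, the missing-value marker -1 becomes None, and fields
    that are not numeric (for example a textual user ID) stay as strings.
    """
    reader = csv.DictReader(handle, delimiter=" ")
    rows = []
    for record in reader:
        row = {}
        for name, raw in record.items():
            try:
                value = float(raw)
            except (TypeError, ValueError):
                # Not a number (or an absent field): keep it unchanged.
                row[name] = raw
                continue
            row[name] = None if value == MISSING_VALUE else value
        rows.append(row)
    return rows


# Hypothetical two-user sample in the documented layout.
sample = io.StringIO("user_id feat_a feat_b\nu1 0.25 -1\nu2 -1 3\n")
rows = load_feature_rows(sample)
print(rows)
```

To read a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="")` with the path of one of the feature files listed above. One caveat of this sketch: it treats every value equal to -1 as missing, which could also hide a genuine -1 in columns that can be negative, such as the residual 'ag' values in the outcomes file.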