This data set contains aggregated user-level information about 491 Twitter users with their self-reported age and gender and dark triad scores obtained using the Dirty Dozen questionnaire.

Associated paper, read for more details:

Daniel Preotiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar
Studying the Dark Triad of Personality using Twitter Behavior
CIKM 2016

Each file contains one line for each word, with the word and its weight separated by a space:

model-darktriad.txt - predicting the combined Dark Triad score of a user
model-narcissism.txt - predicting the narcissism score of a user
model-psychopathy.txt - predicting the psychopathy score of a user
model-machiavellianism.txt - predicting the Machiavellianism score of a user

To apply a model to a new user, first extract the fraction of occurrence of each token in the user's tweets (the token's count divided by the user's total number of tokens). Then multiply these fractions by the weights assigned to each word in the model, sum the products and, finally, add the '_intercept' value from the first line of the file. The result is the natural logarithm of the score for that trait. The regular score is on a scale from 1 to 5 (see the paper for more details on sample means and the questions from each scale).

For tokenization, we recommend using the WWBP social media tokenizer:
http://wwbp.org/downloads/public_data/happierfuntokenizing.zip

We recommend applying the models only to users with at least 500 tokens in their entire history (or approximately 50 tweets).

The model was developed and tested on English-speaking, US-based Twitter users. Application to data from other domains (Facebook, blogs) or cultures (e.g. UK users) is not recommended and is likely to suffer from a performance drop.
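The scoring procedure described above (token fractions, weighted sum, intercept) can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' code. It assumes each model line has the form "token weight" with the weight as the last space-separated field, and that exponentiating the output recovers the regular score, which follows from the output being a natural logarithm. The function names load_model, token_fractions, predict_log_score and predict_score are hypothetical, and the tokens are expected to come from a tokenizer such as the recommended WWBP one, which is not reimplemented here.

```python
import math
from collections import Counter


def load_model(lines):
    """Parse model lines into (intercept, weights).

    Assumption: each non-empty line is 'token weight', space separated,
    and one of the lines carries the special token '_intercept'.
    """
    weights = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace so the weight is always last.
        token, value = line.rsplit(None, 1)
        weights[token] = float(value)
    # Raises KeyError if the file has no '_intercept' entry.
    intercept = weights.pop("_intercept")
    return intercept, weights


def token_fractions(tokens):
    """Return each token's count divided by the total number of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


def predict_log_score(tokens, intercept, weights):
    """Weighted sum of token fractions plus the intercept (log of the score)."""
    fractions = token_fractions(tokens)
    # Tokens that are absent from the model contribute nothing.
    weighted = sum(frac * weights.get(tok, 0.0) for tok, frac in fractions.items())
    return intercept + weighted


def predict_score(tokens, intercept, weights):
    """Assumption: undo the natural logarithm to get the regular score."""
    return math.exp(predict_log_score(tokens, intercept, weights))
```

To use it with one of the released files, open the file and pass the file object to load_model, for example load_model(open("model-darktriad.txt", encoding="utf-8")), then call predict_score with the full list of tokens from a user's tweets. Tokens are matched exactly as they appear in the model file, so the tokenizer's casing and splitting should match the one used to build the models.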
If using this data set, please cite the following publication:

@inproceedings{darktriad2016cikm,
  title = {{Studying the Dark Triad of Personality using Twitter Behavior}},
  author = {Preo\c{t}iuc-Pietro, Daniel and Carpenter, Jordan and Giorgi, Salvatore and Ungar, Lyle},
  series = {CIKM},
  booktitle = {Proceedings of the 25th {ACM} Conference on Information and Knowledge Management},
  year = {2016},
}

For other questions about the models, please contact Daniel Preotiuc-Pietro (www.preotiuc.ro).