This data set contains aggregated user-level information about 491 Twitter users with their self-reported age and gender and dark triad scores obtained using the Dirty Dozen questionnaire.

Associated paper (see it for more details):
Daniel Preotiuc-Pietro, Jordan Carpenter, Salvatore Giorgi, Lyle Ungar
Studying the Dark Triad of Personality using Twitter Behavior
CIKM 2016

Each file contains one line per word, with the word and its weight separated by a space:

model-darktriad.txt - predicts the combined Dark Triad score of a user
model-narcissism.txt - predicts the narcissism score of a user
model-psychopathy.txt - predicts the psychopathy score of a user
model-machiavellianism.txt - predicts the Machiavellianism score of a user
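
As an illustration of the file layout, the following is a minimal Python sketch of a parser. It assumes each line holds a token followed by whitespace and a numeric weight, and that the intercept is stored under the token '_intercept'. The function name load_model and the sample values are hypothetical and are not part of the released files.

```python
import io


def load_model(lines):
    """Parse 'token weight' lines into (intercept, weights).

    Assumes one token and one numeric weight per line, separated by
    whitespace, with the intercept stored under the token '_intercept'.
    """
    intercept = 0.0
    weights = {}
    for line in lines:
        # Split on the last run of whitespace so the token stays intact.
        parts = line.strip().rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        token, value = parts
        if token == "_intercept":
            intercept = float(value)
        else:
            weights[token] = float(value)
    return intercept, weights


# Toy input standing in for a model file; these numbers are made up.
sample = io.StringIO("_intercept 0.5\nhello 0.25\nworld -0.125\n")
intercept, weights = load_model(sample)
print(intercept, weights)
```

With a real file, the same function could be called on an open file handle, for example open("model-darktriad.txt", encoding="utf-8"); the encoding is an assumption.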

To apply a model to a new user, first compute the fraction of occurrences of each token in the user's tweets. Then multiply each fraction by the weight assigned to that word in the model, sum the products and, finally, add the '_intercept' value from the first line of the file. The result is the natural logarithm of the score for the trait. The regular score is on a scale from 1 to 5 (see the paper for more details on sample means and the questions from each scale).
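
The scoring steps described above can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name predict_log_score and the toy weights are hypothetical, the token fractions are taken over all of the user's tokens (including those not in the model), and exponentiating the result to recover the regular score is one reading of "natural logarithm" here. Tokens are assumed to be already produced by a tokenizer.

```python
import math
from collections import Counter


def predict_log_score(tokens, intercept, weights):
    """Return intercept + sum(weight * fraction) over the user's tokens.

    The fraction of a token is its count divided by the total number of
    tokens; using all tokens as the denominator is an assumption.
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("the user has no tokens")
    counts = Counter(tokens)
    log_score = intercept
    for token, count in counts.items():
        weight = weights.get(token)
        if weight is not None:  # tokens missing from the model add nothing
            log_score += weight * (count / total)
    return log_score


# Toy values for illustration only; they do not come from the model files.
toy_intercept = 1.0
toy_weights = {"me": 0.5, "we": -0.5}
tokens = ["me", "me", "we", "and"]

log_score = predict_log_score(tokens, toy_intercept, toy_weights)
# Assumption: the regular score is recovered by exponentiating.
score = math.exp(log_score)
print(log_score, score)
```

In this toy case the fractions are 0.5 for "me" and 0.25 for "we", so the log score is 1.0 + 0.5 * 0.5 - 0.5 * 0.25 = 1.125.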

For tokenization, we recommend using the WWBP social media tokenizer: http://wwbp.org/downloads/public_data/happierfuntokenizing.zip

We recommend applying the models only to users with at least 500 tokens in their entire history (approximately 50 tweets).

The models were developed and tested on English-speaking, US-based Twitter users. Applying them to data from other domains (Facebook, blogs) or cultures (e.g. UK users) is not recommended and will likely lead to a drop in performance.

If using this data set, please cite the following publication:

@inproceedings {darktriad2016cikm,
	title = {{Studying the Dark Triad of Personality using Twitter Behavior}},
	author = {Preo\c{t}iuc-Pietro, Daniel and Carpenter, Jordan and Giorgi, Salvatore and Ungar, Lyle},
	series = {CIKM},
	booktitle = {Proceedings of the 25th {ACM} Conference on Information and Knowledge Management},
	year = {2016},
}

For other questions about the models, please contact Daniel Preotiuc-Pietro (www.preotiuc.ro)