==Twitter Occupation Dataset== Feature representation and tweets of a set of 5191 users mapped to their occupational class. Extracted around 5 August 2014. Associated paper, read for more details: Daniel Preotiuc-Pietro, Vasileios Lampos, Nikolaos Aletras An analysis of the user occupational class through Twitter content ACL 2015 Total number of users: 5191 Total number of tweet ids: 10796836 Contents: 1. jobs-tweetids - user_id[SPACE]tweet_id Each line represents a tweet. 2. jobs-unigrams - user_id[SPACE]wordid_1:frequency_1[SPACE]...wordid_n:frequency_n Bag-of-words unigram feature representation, one user/line. 3. dictionary - wordid[SPACE]word Mapping between word ids and words. 4. jobs-users - user_id[SPACE]occupation_code Resolved 3-digit SOC code for each user. 5. keywords - occupation_code,occupation_description,"keyphrase_1, ..., keyphrase_n" 3-digit SOC code, its corresponding class description and the keyphrases for jobs in this category used for identifying users If you are using this dataset, please cite: @inproceedings{jobs15acl, title = {An analysis of the user occupational class through {T}witter content}, journal = {Proceedings of the 53rd annual meeting of the Association for Computational Linguistics}, year = {2015}, series = {ACL}, author = {Preo\c{t}iuc-Pietro, Daniel and Lampos, Vasileios and Aletras, Nikolaos} }