Can personality predict risk of drug addiction?

Authors: Jaejin Kim and Jui Nerurkar

Research Question

Our goal for this project is to study whether personality traits of individuals affect their drug use.

  • How does personality affect drug use?
  • Can personality measurements help in predicting drug addiction/drug use?

Approach and Implementation

a. Data:

  1. To answer these questions, we used the drug consumption dataset from the UCI repository, which contains 12 measured attributes for each of 1885 respondents.
  2. Variables in the dataset include NEO-FFI-R personality measurements (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impulsivity), ImpSS (sensation seeking), level of education, age, gender, country of residence, ethnicity, and each respondent's drug use history.

b. Methodology:

  1. First, we used Principal Component Analysis (PCA) to visualize the data in a two-dimensional space. Although the plot suggested that binary classification is more appropriate, we analyzed both multi-class and binary classification to illustrate this point further.
  2. We used k-Nearest Neighbors and Logistic Regression models for binary classification, and k-Nearest Neighbors and Random Forest models for multi-class classification, with a 2/3 train and 1/3 test split.
  3. For binary classification, AUC was used as the model selection metric while F1 score was used for multi-class classification.
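
As a rough illustration of the two selection metrics named above (not the code used for the results in this post), the sketch below computes AUC and F1 in plain Python on made-up labels and scores. The function names `auc` and `macro_f1` are hypothetical, and macro averaging of the per-class F1 scores is an assumption, since the averaging scheme is not stated here.

```python
def auc(y_true, scores):
    """Area under the ROC curve via its rank interpretation.

    Returns the fraction of (positive, negative) pairs in which the
    positive example has the higher score; tied scores count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


if __name__ == "__main__":
    # Made-up binary labels (1 = has used drugs) and predicted probabilities.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
    # Made-up four-group labels and predictions.
    print(macro_f1([0, 1, 2, 3, 0, 1], [0, 1, 1, 3, 0, 2]))  # 0.625
```

In practice a library routine would normally be used instead, for example scikit-learn's `roc_auc_score` and `f1_score` with `average="macro"`.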

c. Outcomes:

  1. The initial bar plot compares the frequencies of respondents in the four groups:
  • 0: People who have never used drugs;
  • 1: People who used the given drug over a decade ago or in the past decade;
  • 2: People who used the given drug in the last year or month; and
  • 3: People who used the given drug in the last week or day.


Figure 1: Frequency Bar Plot of Drug_indicator (outcome)

Figure 2: PCA Plot
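
The four-group coding and the binary "never used" versus "ever used" split can be sketched as follows. This is a minimal illustration for a single drug column, assuming the raw data uses the CL0 to CL6 usage labels of the UCI file; how several drugs were combined into one Drug_indicator is not described here, and the names `FOUR_GROUP`, `to_four_group`, `to_binary` and `group_frequencies` are hypothetical.

```python
from collections import Counter

# Assumed mapping from the UCI usage classes to the four groups above.
FOUR_GROUP = {
    "CL0": 0,  # never used
    "CL1": 1,  # used over a decade ago
    "CL2": 1,  # used in the last decade
    "CL3": 2,  # used in the last year
    "CL4": 2,  # used in the last month
    "CL5": 3,  # used in the last week
    "CL6": 3,  # used in the last day
}


def to_four_group(usage_class):
    """Map a raw usage class such as "CL4" to a group in {0, 1, 2, 3}."""
    return FOUR_GROUP[usage_class]


def to_binary(group):
    """Collapse the four groups into 0 (never used) and 1 (ever used)."""
    return 0 if group == 0 else 1


def group_frequencies(usage_classes):
    """Count respondents per group, as one would before drawing a bar plot."""
    counts = Counter(to_four_group(c) for c in usage_classes)
    return {group: counts.get(group, 0) for group in range(4)}


if __name__ == "__main__":
    # Made-up responses for one drug.
    sample = ["CL0", "CL2", "CL1", "CL6", "CL0"]
    print(group_frequencies(sample))
    print([to_binary(to_four_group(c)) for c in sample])
```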

  1. It was evident from the PCA plot that the only notable difference in personality traits was between people who had never used drugs and those who had used drugs at some point in their lives.
  2. The overall results from the models support our intuition from the PCA plot. Hence, although personality traits can help predict whether a person will ever use drugs, it seems difficult to use personality traits to distinguish between people who used drugs in the past decade, in the last year or month, or in the last week or day.

Table 1: Summary of results for binary classification

Model                  AUC      Accuracy
Logistic Regression    0.734    0.759
kNN (5)                0.690    -

Table 2: Summary of results for multi-class classification

Model                               F1 Score    Accuracy
Random Forest (Information Gain)    0.20        0.494
Random Forest (Gini Impurity)       0.185       -
kNN (5)                             0.186       -

  1. With AUC as the model selection metric, logistic regression seems to perform best for binary classification. Similarly, judging by the F1 score of the predictions, Random Forest using information gain seems to be the best of the multi-class models. We have also reported the prediction accuracy of the best model in the binary and multi-class settings.

d. Next Steps:

  1. First, we would like to do further research to improve the accuracy of our multi-class classification models.
  2. Moreover, to advance our knowledge of the role of individual differences in drug use, in the future we would like to compare the personality profiles of individuals using Amphet, Coke, LSD, Heroin and Meth with each other, rather than simply comparing the personality profiles of drug users and non-users.

Code for this analysis available at:
