This section gives an overview of the modelling techniques and their performance for tag prediction. We used both classification and topic modelling approaches.
These chord diagrams show the misclassification patterns for the different models.
This model classified most CSS questions as HTML and most jQuery questions as JavaScript. The majority of HTML misclassifications were questions labelled as JavaScript. These technologies are frequently used together and probably often appear in the same question.
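The counts behind a misclassification pattern like this are the off-diagonal (true tag, predicted tag) pairs. The following is a minimal sketch of how they could be tallied, assuming the true and predicted tags are available as two parallel lists. The function name `top_confusions` and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter

def top_confusions(true_tags, predicted_tags, k=3):
    """Return the k most frequent (true, predicted) pairs where the prediction was wrong."""
    errors = Counter(
        (t, p) for t, p in zip(true_tags, predicted_tags) if t != p
    )
    return errors.most_common(k)

# Toy labels for illustration only (not the study's data).
true_tags = ["css", "css", "css", "jquery", "jquery", "html", "python"]
pred_tags = ["html", "html", "css", "javascript", "javascript", "javascript", "python"]

for (true_tag, pred_tag), count in top_confusions(true_tags, pred_tags):
    print(f"{true_tag} -> {pred_tag}: {count}")
```

Each printed pair corresponds to one ribbon of a chord diagram, with the count giving its width.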
Similarly, the misclassifications in Naive Bayes can be attributed to languages that are frequently used together being confused with one another.
The same holds for LDA, which mixes words from HTML and CSS into Topic 4, while Topic 9 consists of a small share of HTML words and jQuery.
The misclassification patterns are similar across all the models. This can be addressed by identifying better discriminating vocabularies for co-occurring technologies.
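One possible way to look for discriminating vocabulary between two co-occurring tags is to rank terms by a smoothed log-odds ratio. This is only a sketch of that idea and not a method used in the study. The function name, the smoothing constant `alpha`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def discriminating_terms(docs_a, docs_b, k=3, alpha=1.0):
    """Rank terms by smoothed log-odds of appearing in class A versus class B."""
    counts_a = Counter(w for doc in docs_a for w in doc)
    counts_b = Counter(w for doc in docs_b for w in doc)
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps terms that are absent from one class finite.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    scores = {
        w: math.log((counts_a[w] + alpha) / total_a)
        - math.log((counts_b[w] + alpha) / total_b)
        for w in vocab
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy noun lists for illustration only (not the study's data).
html_docs = [["div", "table", "form", "style"], ["form", "input", "div"]]
css_docs = [["selector", "margin", "style", "div"], ["margin", "padding", "selector"]]

print(discriminating_terms(html_docs, css_docs))
print(discriminating_terms(css_docs, html_docs, k=2))
```

Terms shared by both tags, such as `div` and `style` in the toy data, score near zero and drop out of the ranking, which is the behaviour wanted from a discriminating vocabulary.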
We used an 80-20 train-test split. All classifiers were trained on the nouns extracted from the posts, which were converted into TF-IDF vectors to serve as training and testing instances.
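The split and vectorisation steps can be sketched as follows. This is a simplified illustration and not the pipeline used in the study: it assumes each post has already been reduced to a list of nouns, and it assumes one common TF-IDF variant (term frequency divided by document length, with idf = log(N / df)), since the exact weighting is not stated here. All function names are hypothetical.

```python
import math
import random
from collections import Counter

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def fit_idf(train_docs):
    """Compute idf = log(N / df) for every term seen in the training documents."""
    n_docs = len(train_docs)
    doc_freq = Counter(term for doc in train_docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Map one document to {term: tf * idf}, ignoring terms unseen in training."""
    counts = Counter(term for term in doc if term in idf)
    return {term: (n / len(doc)) * idf[term] for term, n in counts.items()}

# Ten toy posts, each already reduced to nouns (illustration only).
posts = [["button", "listener"], ["query", "table"], ["selector", "margin"],
         ["array", "loop"], ["thread", "lock"], ["table", "index"],
         ["button", "layout"], ["loop", "list"], ["form", "input"],
         ["class", "method"]]

train_docs, test_docs = train_test_split(posts)
idf = fit_idf(train_docs)  # fitted on the training portion only
test_vectors = [tfidf_vector(doc, idf) for doc in test_docs]
print(len(train_docs), len(test_docs))
```

Fitting the idf weights on the training portion only keeps information from the test posts out of the features.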
Parameters:
Training times: Longest to shortest
Best Accuracies:
A detailed tabulation of the classifiers’ performance is given in the following section, which shows the per-class performance of each classifier.
Accuracy | Kappa | AccuracyLower | AccuracyUpper |
---|---|---|---|
0.73 | 0.70 | 0.72 | 0.74 |

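Kappa corrects accuracy for the agreement that would be expected by chance, which is why it is slightly lower than the accuracy figure. A minimal sketch of how both values follow from a confusion matrix is given below. The function name and the two-class toy matrix are hypothetical, and the rows are assumed to hold the true classes.

```python
def accuracy_and_kappa(matrix):
    """matrix[i][j] = number of items with true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: product of the marginal proportions, summed over classes.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Toy two-class matrix for illustration only (not the study's data).
toy_matrix = [[40, 10],
              [5, 45]]
accuracy, kappa = accuracy_and_kappa(toy_matrix)
print(round(accuracy, 2), round(kappa, 2))
```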
Class | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
Class: android | 0.89 | 0.99 | 0.91 | 0.98 | 0.91 | 0.89 | 0.90 | 0.14 | 0.12 | 0.13 | 0.94 |
Class: c | 0.61 | 0.99 | 0.41 | 1.00 | 0.41 | 0.61 | 0.49 | 0.01 | 0.01 | 0.01 | 0.80 |
Class: c# | 0.87 | 0.98 | 0.77 | 0.99 | 0.77 | 0.87 | 0.82 | 0.08 | 0.07 | 0.09 | 0.93 |
Class: c++ | 0.59 | 0.99 | 0.75 | 0.98 | 0.75 | 0.59 | 0.66 | 0.05 | 0.03 | 0.04 | 0.79 |
Class: css | 0.75 | 0.97 | 0.03 | 1.00 | 0.03 | 0.75 | 0.05 | 0.00 | 0.00 | 0.03 | 0.86 |
Class: html | 0.48 | 0.96 | 0.57 | 0.95 | 0.57 | 0.48 | 0.52 | 0.10 | 0.05 | 0.08 | 0.72 |
Class: ios | 0.93 | 0.99 | 0.78 | 1.00 | 0.78 | 0.93 | 0.85 | 0.04 | 0.04 | 0.05 | 0.96 |
Class: java | 0.87 | 0.98 | 0.79 | 0.99 | 0.79 | 0.87 | 0.83 | 0.09 | 0.07 | 0.09 | 0.92 |
Class: javascript | 0.60 | 0.95 | 0.72 | 0.93 | 0.72 | 0.60 | 0.66 | 0.16 | 0.10 | 0.13 | 0.78 |
Class: jquery | 1.00 | 0.98 | 0.01 | 1.00 | 0.01 | 1.00 | 0.01 | 0.00 | 0.00 | 0.02 | 0.99 |
Class: mysql | 0.64 | 0.98 | 0.29 | 1.00 | 0.29 | 0.64 | 0.40 | 0.01 | 0.01 | 0.03 | 0.81 |
Class: php | 0.64 | 0.98 | 0.77 | 0.97 | 0.77 | 0.64 | 0.70 | 0.07 | 0.05 | 0.06 | 0.81 |
Class: python | 0.83 | 0.97 | 0.87 | 0.96 | 0.87 | 0.83 | 0.85 | 0.18 | 0.15 | 0.17 | 0.90 |
Class: r | 0.77 | 0.99 | 0.73 | 0.99 | 0.73 | 0.77 | 0.75 | 0.04 | 0.03 | 0.04 | 0.88 |
Class: sql | 0.49 | 0.99 | 0.78 | 0.98 | 0.78 | 0.49 | 0.60 | 0.04 | 0.02 | 0.02 | 0.74 |

Accuracy | Kappa | AccuracyLower | AccuracyUpper |
---|---|---|---|
0.70 | 0.67 | 0.70 | 0.70 |

Class | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
---|---|---|---|---|---|---|---|---|---|---|---|
Class: android | 0.89 | 0.98 | 0.88 | 0.98 | 0.88 | 0.89 | 0.88 | 0.13 | 0.12 | 0.13 | 0.93 |
Class: c | 0.30 | 1.00 | 0.82 | 0.97 | 0.82 | 0.30 | 0.44 | 0.04 | 0.01 | 0.01 | 0.65 |
Class: c# | 0.88 | 0.98 | 0.74 | 0.99 | 0.74 | 0.88 | 0.80 | 0.07 | 0.06 | 0.09 | 0.93 |
Class: c++ | 0.75 | 0.99 | 0.69 | 0.99 | 0.69 | 0.75 | 0.72 | 0.04 | 0.03 | 0.04 | 0.87 |
Class: css | 0.30 | 0.99 | 0.69 | 0.96 | 0.69 | 0.30 | 0.42 | 0.06 | 0.02 | 0.03 | 0.65 |
Class: html | 0.47 | 0.94 | 0.34 | 0.97 | 0.34 | 0.47 | 0.40 | 0.06 | 0.03 | 0.08 | 0.71 |
Class: ios | 0.91 | 0.99 | 0.83 | 1.00 | 0.83 | 0.91 | 0.86 | 0.04 | 0.04 | 0.05 | 0.95 |
Class: java | 0.82 | 0.98 | 0.79 | 0.98 | 0.79 | 0.82 | 0.80 | 0.09 | 0.07 | 0.09 | 0.90 |
Class: javascript | 0.77 | 0.93 | 0.50 | 0.98 | 0.50 | 0.77 | 0.61 | 0.09 | 0.07 | 0.14 | 0.85 |
Class: jquery | 0.26 | 0.99 | 0.72 | 0.96 | 0.72 | 0.26 | 0.38 | 0.05 | 0.01 | 0.02 | 0.63 |
Class: mysql | 0.46 | 0.98 | 0.40 | 0.99 | 0.40 | 0.46 | 0.43 | 0.02 | 0.01 | 0.03 | 0.72 |
Class: php | 0.66 | 0.98 | 0.76 | 0.97 | 0.76 | 0.66 | 0.70 | 0.07 | 0.05 | 0.06 | 0.82 |
Class: python | 0.94 | 0.95 | 0.74 | 0.99 | 0.74 | 0.94 | 0.83 | 0.14 | 0.13 | 0.17 | 0.94 |
Class: r | 0.64 | 1.00 | 0.91 | 0.98 | 0.91 | 0.64 | 0.75 | 0.06 | 0.04 | 0.04 | 0.82 |
Class: sql | 0.47 | 1.00 | 0.85 | 0.98 | 0.85 | 0.47 | 0.61 | 0.04 | 0.02 | 0.02 | 0.73 |
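
The per-class columns in the tables above are one-vs-rest statistics: each class is treated in turn as the positive class and all other classes as negative. The sketch below shows how several of these columns can be derived from a confusion matrix. The function name and the three-class toy matrix are hypothetical, the rows are assumed to hold the true classes, and the code assumes every class is both present and predicted at least once (otherwise a division by zero occurs).

```python
def per_class_metrics(matrix, labels):
    """One-vs-rest metrics; matrix[i][j] counts true class i predicted as class j."""
    total = sum(sum(row) for row in matrix)
    results = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                 # true class k, predicted as another class
        fp = sum(row[k] for row in matrix) - tp  # another class, predicted as k
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)             # same quantity as recall
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)               # same quantity as positive predictive value
        results[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "precision": precision,
            "f1": 2 * precision * sensitivity / (precision + sensitivity),
            "prevalence": (tp + fn) / total,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return results

# Toy three-class matrix for illustration only (not the study's data).
labels = ["html", "css", "javascript"]
toy_matrix = [[6, 1, 3],
              [4, 5, 1],
              [2, 0, 8]]
metrics = per_class_metrics(toy_matrix, labels)
for label in labels:
    print(label, {name: round(value, 2) for name, value in metrics[label].items()})
```

The same relationships can be checked against the reported values. For the android row of the first table, sensitivity 0.89 and specificity 0.99 average to the reported balanced accuracy of 0.94, and precision 0.91 with recall 0.89 gives the reported F1 of 0.90.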