This section gives an overview of the modelling techniques and their performance on tag prediction. We used both classification and topic modelling approaches.

The chord diagrams show the misclassification patterns of the different models.
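The flows drawn in a chord diagram of this kind are counts of (true tag, predicted tag) pairs where the two differ. A minimal sketch of that tabulation follows; the function name `misclassification_flows` and the toy labels are invented for illustration and are not the study's data or code.

```python
from collections import Counter

def misclassification_flows(true_labels, predicted_labels):
    """Count (true, predicted) pairs for which the prediction was wrong."""
    flows = Counter()
    for truth, guess in zip(true_labels, predicted_labels):
        if truth != guess:
            flows[(truth, guess)] += 1
    return flows

# Toy labels, invented purely for illustration (not the study's data).
y_true = ["css", "css", "jquery", "html", "html", "python"]
y_pred = ["html", "html", "javascript", "javascript", "html", "python"]

flows = misclassification_flows(y_true, y_pred)
# Largest flows first: these would be the thickest chords in the diagram.
for (truth, guess), count in flows.most_common():
    print(f"{truth} -> {guess}: {count}")
```

Correct predictions are left out on purpose, so only the confusions between tags remain.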

SVM:

This model classified most CSS questions as HTML and most jQuery questions as JavaScript. Most HTML misclassifications were predictions of JavaScript. These technologies are frequently used together and often appear in the same question.

Multinomial Naive Bayes:

Similarly, the misclassifications of Naive Bayes can be attributed to languages that are often used together being confused with one another.

LDA:

The same holds for LDA: it mixes words from HTML and CSS into topic 4, while topic 9 consists of a small share of HTML words together with jQuery words.

The misclassification patterns are similar across all the models. This could be addressed by identifying more discriminative vocabularies for co-occurring technologies.
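One possible way to look for such vocabulary is to score each term by how much more often it occurs in one tag's posts than in another's. The sketch below uses a smoothed log-ratio of term frequencies. This is an assumption chosen for illustration, not a method used in this work, and the function name `log_ratio_terms` and the toy token lists are hypothetical.

```python
import math
from collections import Counter

def log_ratio_terms(docs_a, docs_b, smoothing=1.0):
    """Score terms by the smoothed log-ratio of their frequency in A vs B.

    Positive scores lean towards class A, negative scores towards class B,
    and scores near zero mark terms that do not separate the two classes.
    """
    counts_a = Counter(tok for doc in docs_a for tok in doc)
    counts_b = Counter(tok for doc in docs_b for tok in doc)
    vocab = set(counts_a) | set(counts_b)
    # Additive smoothing keeps terms unseen in one class from dividing by zero.
    total_a = sum(counts_a.values()) + smoothing * len(vocab)
    total_b = sum(counts_b.values()) + smoothing * len(vocab)
    scores = {}
    for term in vocab:
        p_a = (counts_a[term] + smoothing) / total_a
        p_b = (counts_b[term] + smoothing) / total_b
        scores[term] = math.log(p_a / p_b)
    return scores

# Toy noun lists, invented for illustration only.
html_docs = [["div", "tag", "page", "style"], ["form", "tag", "input", "page"]]
css_docs = [["selector", "style", "margin", "page"], ["selector", "padding", "style", "div"]]

scores = log_ratio_terms(html_docs, css_docs)
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term:10s} {scores[term]:+.2f}")
```

In this toy example, terms shared by both tags (such as `div`) score near zero, while terms at either extreme are candidates for a discriminative vocabulary.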

Performances

We used an 80-20 train-test split. All the classifiers were trained on the nouns extracted from the posts, which were converted into TF-IDF vectors and used as the training and testing instances.
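The split and vectorisation steps can be sketched as follows. This is a minimal standard-library illustration, not the pipeline used here. The smoothed IDF formula, the L2 normalisation, fitting IDF on the training portion only, and all names and toy posts are assumptions.

```python
import math
import random
from collections import Counter

def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

def fit_idf(train_docs):
    """Smoothed inverse document frequency (one common variant, assumed here)."""
    n_docs = len(train_docs)
    doc_freq = Counter(term for doc in train_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; terms unseen in training are dropped."""
    counts = Counter(term for term in doc if term in idf)
    vec = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else vec

# Toy (nouns, tag) pairs, invented for illustration only.
posts = [
    (["activity", "layout", "button"], "android"),
    (["intent", "activity", "fragment"], "android"),
    (["selector", "margin", "layout"], "css"),
    (["div", "tag", "form"], "html"),
    (["function", "array", "callback"], "javascript"),
    (["selector", "event", "callback"], "jquery"),
    (["dataframe", "list", "function"], "python"),
    (["table", "query", "index"], "sql"),
    (["class", "interface", "thread"], "java"),
    (["table", "join", "query"], "mysql"),
]

train, test = train_test_split(posts)                     # 80-20 split
idf = fit_idf([nouns for nouns, _ in train])              # fitted on training posts only
train_vectors = [tfidf_vector(nouns, idf) for nouns, _ in train]
test_vectors = [tfidf_vector(nouns, idf) for nouns, _ in test]
print(len(train), "training posts,", len(test), "test posts")
```

Fitting the IDF weights on the training posts alone keeps information from the test posts out of the features.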

Parameters:

Training times: Longest to shortest

Best Accuracies:

A detailed tabulation of each classifier's performance is given in the following section, which shows the per-class performance of each classifier.
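The per-class columns reported for each classifier all derive from four counts per class: true positives, false positives, false negatives and true negatives. The sketch below shows those relationships using invented counts rather than the study's confusion matrix; the function name `per_class_metrics` is hypothetical. Precision is the same quantity as Pos Pred Value, and Recall the same as Sensitivity, which is why those columns repeat.

```python
def per_class_metrics(tp, fp, fn, tn):
    """One-vs-rest metrics for a single class from its four confusion counts.

    Assumes every denominator is non-zero, which holds for the toy counts below.
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)          # also reported as Recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)            # also reported as Pos Pred Value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "pos_pred_value": precision,
        "neg_pred_value": tn / (tn + fn),
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
        "prevalence": (tp + fn) / total,
        "detection_rate": tp / total,
        "detection_prevalence": (tp + fp) / total,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }

# Invented counts for one class, chosen only to illustrate the formulas.
metrics = per_class_metrics(tp=89, fp=9, fn=11, tn=891)
for name, value in metrics.items():
    print(f"{name:22s} {value:.2f}")
```

Balanced accuracy is the mean of sensitivity and specificity, so a class can score well on it even when its precision is low.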

SVM Linear

##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##          0.73          0.70          0.72          0.74

| Class | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| android | 0.89 | 0.99 | 0.91 | 0.98 | 0.91 | 0.89 | 0.90 | 0.14 | 0.12 | 0.13 | 0.94 |
| c | 0.61 | 0.99 | 0.41 | 1.00 | 0.41 | 0.61 | 0.49 | 0.01 | 0.01 | 0.01 | 0.80 |
| c# | 0.87 | 0.98 | 0.77 | 0.99 | 0.77 | 0.87 | 0.82 | 0.08 | 0.07 | 0.09 | 0.93 |
| c++ | 0.59 | 0.99 | 0.75 | 0.98 | 0.75 | 0.59 | 0.66 | 0.05 | 0.03 | 0.04 | 0.79 |
| css | 0.75 | 0.97 | 0.03 | 1.00 | 0.03 | 0.75 | 0.05 | 0.00 | 0.00 | 0.03 | 0.86 |
| html | 0.48 | 0.96 | 0.57 | 0.95 | 0.57 | 0.48 | 0.52 | 0.10 | 0.05 | 0.08 | 0.72 |
| ios | 0.93 | 0.99 | 0.78 | 1.00 | 0.78 | 0.93 | 0.85 | 0.04 | 0.04 | 0.05 | 0.96 |
| java | 0.87 | 0.98 | 0.79 | 0.99 | 0.79 | 0.87 | 0.83 | 0.09 | 0.07 | 0.09 | 0.92 |
| javascript | 0.60 | 0.95 | 0.72 | 0.93 | 0.72 | 0.60 | 0.66 | 0.16 | 0.10 | 0.13 | 0.78 |
| jquery | 1.00 | 0.98 | 0.01 | 1.00 | 0.01 | 1.00 | 0.01 | 0.00 | 0.00 | 0.02 | 0.99 |
| mysql | 0.64 | 0.98 | 0.29 | 1.00 | 0.29 | 0.64 | 0.40 | 0.01 | 0.01 | 0.03 | 0.81 |
| php | 0.64 | 0.98 | 0.77 | 0.97 | 0.77 | 0.64 | 0.70 | 0.07 | 0.05 | 0.06 | 0.81 |
| python | 0.83 | 0.97 | 0.87 | 0.96 | 0.87 | 0.83 | 0.85 | 0.18 | 0.15 | 0.17 | 0.90 |
| r | 0.77 | 0.99 | 0.73 | 0.99 | 0.73 | 0.77 | 0.75 | 0.04 | 0.03 | 0.04 | 0.88 |
| sql | 0.49 | 0.99 | 0.78 | 0.98 | 0.78 | 0.49 | 0.60 | 0.04 | 0.02 | 0.02 | 0.74 |

Multinomial Naive Bayes

##      Accuracy         Kappa AccuracyLower AccuracyUpper 
##          0.70          0.67          0.70          0.70

| Class | Sensitivity | Specificity | Pos Pred Value | Neg Pred Value | Precision | Recall | F1 | Prevalence | Detection Rate | Detection Prevalence | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| android | 0.89 | 0.98 | 0.88 | 0.98 | 0.88 | 0.89 | 0.88 | 0.13 | 0.12 | 0.13 | 0.93 |
| c | 0.30 | 1.00 | 0.82 | 0.97 | 0.82 | 0.30 | 0.44 | 0.04 | 0.01 | 0.01 | 0.65 |
| c# | 0.88 | 0.98 | 0.74 | 0.99 | 0.74 | 0.88 | 0.80 | 0.07 | 0.06 | 0.09 | 0.93 |
| c++ | 0.75 | 0.99 | 0.69 | 0.99 | 0.69 | 0.75 | 0.72 | 0.04 | 0.03 | 0.04 | 0.87 |
| css | 0.30 | 0.99 | 0.69 | 0.96 | 0.69 | 0.30 | 0.42 | 0.06 | 0.02 | 0.03 | 0.65 |
| html | 0.47 | 0.94 | 0.34 | 0.97 | 0.34 | 0.47 | 0.40 | 0.06 | 0.03 | 0.08 | 0.71 |
| ios | 0.91 | 0.99 | 0.83 | 1.00 | 0.83 | 0.91 | 0.86 | 0.04 | 0.04 | 0.05 | 0.95 |
| java | 0.82 | 0.98 | 0.79 | 0.98 | 0.79 | 0.82 | 0.80 | 0.09 | 0.07 | 0.09 | 0.90 |
| javascript | 0.77 | 0.93 | 0.50 | 0.98 | 0.50 | 0.77 | 0.61 | 0.09 | 0.07 | 0.14 | 0.85 |
| jquery | 0.26 | 0.99 | 0.72 | 0.96 | 0.72 | 0.26 | 0.38 | 0.05 | 0.01 | 0.02 | 0.63 |
| mysql | 0.46 | 0.98 | 0.40 | 0.99 | 0.40 | 0.46 | 0.43 | 0.02 | 0.01 | 0.03 | 0.72 |
| php | 0.66 | 0.98 | 0.76 | 0.97 | 0.76 | 0.66 | 0.70 | 0.07 | 0.05 | 0.06 | 0.82 |
| python | 0.94 | 0.95 | 0.74 | 0.99 | 0.74 | 0.94 | 0.83 | 0.14 | 0.13 | 0.17 | 0.94 |
| r | 0.64 | 1.00 | 0.91 | 0.98 | 0.91 | 0.64 | 0.75 | 0.06 | 0.04 | 0.04 | 0.82 |
| sql | 0.47 | 1.00 | 0.85 | 0.98 | 0.85 | 0.47 | 0.61 | 0.04 | 0.02 | 0.02 | 0.73 |

Comparison of Classifiers