The visibility of Stack Overflow posts can be enhanced by applying relevant tags to them. About a third of the questions remain without accepted answers or votes. A new user who is not yet familiar with the platform may not get the full benefit of the forum for lack of basic knowledge of how to use it. In such a scenario, tag recommendation is a helpful tool. With this motivation, in this project we attempt to recommend the single most appropriate tag for a post based on its title and textual content. Because our models learn from past Stack Overflow posts, the recommended tags are specific to Stack Overflow.
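This section does not name the model the project uses. As a minimal sketch only, assuming a multinomial Naive Bayes classifier with add-one smoothing (a stand-in technique, not necessarily the project's), recommending one tag from a post's title and body could look like the following. The `tokenize` helper, the `NaiveBayesTagger` class, and the toy training posts are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Hypothetical tokenizer: lowercase, keep letters/digits plus '+' and '#' (so 'c++' survives)."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


class NaiveBayesTagger:
    """Hypothetical single-tag recommender: multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.tag_counts = Counter()              # number of training posts per tag
        self.word_counts = defaultdict(Counter)  # token frequencies per tag
        self.vocab = set()

    def fit(self, posts):
        # posts: iterable of (title, body, tag) triples; title and body are simply concatenated.
        for title, body, tag in posts:
            tokens = tokenize(title + " " + body)
            self.tag_counts[tag] += 1
            self.word_counts[tag].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title, body):
        """Return the tag with the highest log posterior for the given post."""
        tokens = tokenize(title + " " + body)
        total_posts = sum(self.tag_counts.values())
        vocab_size = len(self.vocab)
        best_tag, best_score = None, -math.inf
        for tag, n_posts in self.tag_counts.items():
            score = math.log(n_posts / total_posts)  # log prior
            denom = sum(self.word_counts[tag].values()) + vocab_size
            for tok in tokens:
                # Smoothed log likelihood; unseen tokens get count 0 + 1.
                score += math.log((self.word_counts[tag][tok] + 1) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical toy training data.
training = [
    ("Segfault in pointer arithmetic", "my c++ vector iterator crashes", "c++"),
    ("Template specialization error", "c++ template compile error", "c++"),
    ("Select element by class", "jquery selector returns empty", "jquery"),
    ("Animate div on click", "jquery click handler animate", "jquery"),
]
model = NaiveBayesTagger().fit(training)
print(model.predict("Why does my iterator crash?", "c++ vector"))  # c++
```

Working in log space avoids underflow when many small token probabilities are multiplied; a real system would train on far more posts and tags.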
We set out to answer the following questions with our project:
* In which parts of the world is Stack Overflow used most?
* What are the 10 most common categories?
* What is the overall sentiment of users, grouped by topic?
* Which question tags receive the most upvotes?
* Is the sentiment of answers/comments correlated with the number of upvotes or downvotes?
This visualization of co-occurring terms was an attempt to understand whether co-occurring words can be a factor in predicting tags. When we follow the most frequent edges in the graph, we can form pseudo-questions.
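One way such a graph could be built, sketched under assumptions not stated in the source: an edge joins two terms whenever they appear in the same post, its weight is the number of posts containing both, and a pseudo-question is formed by greedily following the heaviest edge to a term not yet visited. The helper names, whitespace tokenization, stopword list, and example posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(posts, stopwords=frozenset()):
    """Weight each unordered term pair by the number of posts containing both terms."""
    edges = Counter()
    for text in posts:
        # Hypothetical whitespace tokenization; a set so each pair counts once per post.
        terms = sorted({w for w in text.lower().split() if w not in stopwords})
        for a, b in combinations(terms, 2):  # sorted input, so a < b and keys are canonical
            edges[(a, b)] += 1
    return edges


def follow_heaviest_edges(edges, start, steps):
    """Greedily walk from `start`, each time taking the heaviest edge to an unvisited term."""
    path = [start]
    for _ in range(steps):
        current = path[-1]
        candidates = []
        for (a, b), weight in edges.items():
            if current in (a, b):
                neighbour = b if a == current else a
                if neighbour not in path:
                    candidates.append((weight, neighbour))
        if not candidates:
            break
        # Heaviest edge first; ties broken alphabetically so the walk is deterministic.
        _, nxt = min(candidates, key=lambda c: (-c[0], c[1]))
        path.append(nxt)
    return path


# Hypothetical example posts.
posts = [
    "how to convert string to int in python",
    "convert string to int java",
    "python convert string list",
]
edges = cooccurrence_edges(posts, stopwords={"how", "to", "in"})
print(edges[("convert", "string")])               # 3
print(follow_heaviest_edges(edges, "convert", 3))  # ['convert', 'string', 'int', 'java']
```

The resulting walk, "convert string int java", reads like a compressed form of a real question, which is the kind of pseudo-question the graph suggests.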
Distribution of view counts for questions in each tag. C and C++ questions tend to be viewed more often than others, possibly because of the very large C and C++ user base. jQuery questions get the fewest views.
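A per-tag comparison like this can be computed with a simple group-by. This sketch assumes each question is available as a `(tags, view_count)` pair and counts a multi-tagged question toward every one of its tags. It ranks tags by median views, which, assuming view counts are right-skewed, is less sensitive to a few very popular questions than the mean. The function name and the numbers are illustrative only, not the project's data.

```python
from collections import defaultdict
from statistics import median


def median_views_by_tag(questions):
    """Rank tags by the median view count of the questions carrying them."""
    views_by_tag = defaultdict(list)
    for tags, views in questions:
        # A question with several tags contributes its views to each of them.
        for tag in tags:
            views_by_tag[tag].append(views)
    ranked = [(tag, median(views)) for tag, views in views_by_tag.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: (tags, view_count) per question.
questions = [
    (["c++"], 1200),
    (["c"], 900),
    (["c", "c++"], 3000),
    (["jquery"], 150),
    (["jquery", "javascript"], 400),
    (["c++"], 800),
]
for tag, med in median_views_by_tag(questions):
    print(tag, med)
```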
Here we consider data from 2017 to examine how the number of questions for a few tags changes over a month.
This plot shows that the number of questions dips during weekends and is highest on weekdays. The pattern appears for questions of all tags and is in line with the working week in most countries worldwide. In addition, the end of December is a holiday period, during which the number of questions is lower than in January.
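A minimal sketch of the weekday/weekend comparison, assuming question creation times are available as ISO-8601 strings for a single tag. The function names and the toy timestamps are hypothetical; note that days with no questions do not appear in the daily counts at all.

```python
from collections import Counter
from datetime import datetime


def questions_per_day(timestamps):
    """Count questions per calendar date from ISO-8601 creation timestamps."""
    # Dates with zero questions simply do not appear in the Counter.
    return Counter(datetime.fromisoformat(ts).date() for ts in timestamps)


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def weekday_vs_weekend_mean(daily_counts):
    """Return (mean daily count on Mon-Fri, mean daily count on Sat-Sun)."""
    weekday = [n for day, n in daily_counts.items() if day.weekday() < 5]
    weekend = [n for day, n in daily_counts.items() if day.weekday() >= 5]
    return _mean(weekday), _mean(weekend)


# Hypothetical toy timestamps for one tag (1 January 2017 was a Sunday).
timestamps = [
    "2017-01-02T09:15:00", "2017-01-02T14:30:00", "2017-01-02T20:05:00",  # Monday
    "2017-01-03T10:00:00", "2017-01-03T11:45:00",                         # Tuesday
    "2017-01-07T13:00:00",                                                # Saturday
    "2017-01-08T16:20:00",                                                # Sunday
]
print(weekday_vs_weekend_mean(questions_per_day(timestamps)))  # (2.5, 1.0)
```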