(exploring the family of Stack Exchange Websites)

After running the following query:

SELECT PostTypeId AS type_of_posts, COUNT(*) AS number_of_posts
  FROM Posts
 GROUP BY PostTypeId
 ORDER BY number_of_posts DESC;

One can see that the two most numerous post types are 2 and 1, i.e., Answer and Question. Let's now focus on the Questions side.

SELECT Id, 
       PostTypeId, 
       CreationDate, 
       Score, 
       ViewCount, 
       Tags, 
       AnswerCount, 
       FavoriteCount
  FROM Posts
 WHERE PostTypeId = 1 AND CreationDate >= '2019-01-01'
ORDER BY CreationDate;

Now let's read the CSV file (2019_questions.csv) created from the query above:

In [1]:
import numpy as np
import pandas as pd


posts = pd.read_csv('2019_questions.csv', parse_dates=['CreationDate'])

# Drawing a random sample of five rows from the newly created posts DataFrame:
posts.sample(5)
Out[1]:
Id CreationDate Score ViewCount Tags AnswerCount FavoriteCount
7925 44021 2019-01-15 10:15:57 0 12 <machine-learning><k-nn> 0 NaN
2459 47565 2019-03-18 22:31:37 2 142 <machine-learning><predictive-modeling><machin... 2 NaN
3799 61127 2019-10-02 05:02:49 1 13 <nlp><topic-model> 0 NaN
4158 49470 2019-04-17 10:51:15 1 43 <machine-learning><neural-network><data-mining... 0 NaN
4366 61617 2019-10-11 18:55:06 0 7 <python><matplotlib> 0 NaN
In [2]:
posts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8839 entries, 0 to 8838
Data columns (total 7 columns):
Id               8839 non-null int64
CreationDate     8839 non-null datetime64[ns]
Score            8839 non-null int64
ViewCount        8839 non-null int64
Tags             8839 non-null object
AnswerCount      8839 non-null int64
FavoriteCount    1407 non-null float64
dtypes: datetime64[ns](1), float64(1), int64(4), object(1)
memory usage: 483.5+ KB
In [3]:
posts['FavoriteCount'].value_counts(dropna=False)
Out[3]:
NaN      7432
 1.0      953
 2.0      205
 0.0      175
 3.0       43
 4.0       12
 5.0        8
 6.0        4
 7.0        4
 11.0       1
 8.0        1
 16.0       1
Name: FavoriteCount, dtype: int64
In [4]:
posts['Tags'].value_counts()
Out[4]:
<machine-learning>                                                                               118
<python><pandas>                                                                                  58
<python>                                                                                          55
<r>                                                                                               38
<tensorflow>                                                                                      36
<nlp>                                                                                             35
<neural-network>                                                                                  35
<reinforcement-learning>                                                                          32
<keras>                                                                                           29
<deep-learning>                                                                                   29
<time-series>                                                                                     26
<keras><tensorflow>                                                                               24
<machine-learning><python>                                                                        23
<classification>                                                                                  23
<python><pandas><dataframe>                                                                       22
<clustering>                                                                                      21
<machine-learning><neural-network>                                                                21
<cnn>                                                                                             19
<machine-learning><deep-learning>                                                                 18
<lstm>                                                                                            17
<dataset>                                                                                         17
<machine-learning><classification>                                                                17
<orange>                                                                                          16
<machine-learning><neural-network><deep-learning>                                                 16
<visualization>                                                                                   16
<machine-learning><python><scikit-learn>                                                          15
<pytorch>                                                                                         15
<python><keras><tensorflow>                                                                       15
<decision-trees>                                                                                  15
<pandas>                                                                                          15
                                                                                                ... 
<time-series><forecasting><probabilistic-programming>                                              1
<neural-network><training><optimization><fuzzy-logic><fuzzy-classification>                        1
<deep-learning><cnn><image-recognition><image-preprocessing><image-size>                           1
<machine-learning><cnn><reinforcement-learning><convolution>                                       1
<python><c>                                                                                        1
<data-mining><dbscan><research><implementation>                                                    1
<deep-learning><cross-validation>                                                                  1
<dataset><lstm>                                                                                    1
<gan><databases>                                                                                   1
<machine-learning><data><categorical-data><encoding>                                               1
<training><methodology>                                                                            1
<python><statistics><geospatial>                                                                   1
<machine-learning><classification><perceptron>                                                     1
<machine-learning><machine-learning-model><azure-ml>                                               1
<classification><scikit-learn><decision-trees><multiclass-classification><unbalanced-classes>      1
<deep-learning><loss-function><cosine-distance>                                                    1
<neural-network><computer-vision><object-detection>                                                1
<deep-learning><nlp><lstm><rnn><language-model>                                                    1
<r><ensemble-modeling>                                                                             1
<machine-learning><python><scikit-learn><regression><feature-selection>                            1
<machine-learning><python><nlp><stanford-nlp>                                                      1
<training><computer-vision><gan>                                                                   1
<machine-learning><nlp><natural-language-process><nlg>                                             1
<python><pandas><matplotlib>                                                                       1
<machine-learning><python><pytorch>                                                                1
<data><feature-engineering><encoding>                                                              1
<machine-learning><python><similarity><correlation>                                                1
<multilabel-classification><confusion-matrix>                                                      1
<machine-learning><lstm><bert>                                                                     1
<deep-learning><dataset><cnn><training><image-size>                                                1
Name: Tags, Length: 6462, dtype: int64

We have a tremendous amount of missing (NaN) datapoints in our posts DataFrame, all concentrated in the FavoriteCount column: 7432 datapoints out of 8839 are NaN. Apart from this particular column there are no other missing values in our DataFrame.
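This can be confirmed in one line (a quick sketch, not run here):

posts.isnull().sum()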

From a quick search we found out that these missing values likely correspond to questions that have never been favorited. The difference between these and the 0.0 values in the FavoriteCount column is that the latter were favorited at some point but later lost all their favorites: https://meta.stackexchange.com/questions/327680/why-do-some-questions-have-a-favorite-count-of-0-while-others-have-none

We have two ways to handle these missing values. The first would be to simply drop the affected rows; the second, based on our findings above, would be to fill them with 0, meaning those questions have zero favorite votes. Given how many rows are affected, we suggest the second option, so as not to lose a large share of the data.
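For reference, the discarded first option would be a one-liner (a sketch, not applied here):

posts = posts.dropna(subset=['FavoriteCount'])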

Regarding the Tags column, to ease the analysis and smooth the results we could either group the tags further or treat each combination of tags as an individual tag. But first of all we should separate the tags properly, i.e., with a comma (,).

First let's fill the missing values in the FavoriteCount column with zeros (0), and then change the column type from float to integer:

In [5]:
#Filling in the missing values with the fillna method:
posts['FavoriteCount'] = posts['FavoriteCount'].fillna(0)
posts.sample(5)
Out[5]:
Id CreationDate Score ViewCount Tags AnswerCount FavoriteCount
7045 43471 2019-01-04 10:26:10 5 354 <neural-network><classification><overfitting> 3 1.0
1745 57692 2019-08-16 21:15:18 1 29 <regression><visualization><feature-extraction... 1 0.0
34 44500 2019-01-24 12:40:20 0 157 <deep-learning><training> 0 0.0
8324 65611 2019-12-30 08:12:01 1 23 <machine-learning><deep-learning><pytorch> 0 1.0
4289 61768 2019-10-15 13:08:21 0 14 <machine-learning><predictive-modeling><data-c... 0 0.0
In [6]:
posts['FavoriteCount'] = posts['FavoriteCount'].astype(int)
posts.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 8839 entries, 0 to 8838
Data columns (total 7 columns):
Id               8839 non-null int64
CreationDate     8839 non-null datetime64[ns]
Score            8839 non-null int64
ViewCount        8839 non-null int64
Tags             8839 non-null object
AnswerCount      8839 non-null int64
FavoriteCount    8839 non-null int64
dtypes: datetime64[ns](1), int64(5), object(1)
memory usage: 483.5+ KB
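As a side note, newer pandas versions (0.24+) offer a nullable integer dtype that keeps NaN values while staying integer-typed, in case we preferred not to fill them (a sketch, not used here):

posts['FavoriteCount'] = posts['FavoriteCount'].astype('Int64')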

It's now time to clean the Tags column and separate each tag with a comma (,):

In [7]:
posts['Tags'] = (posts['Tags']
                     .str.replace('><', ',')
                     .str.replace('<', '')
                     .str.replace('>', ''))
posts.head()
Out[7]:
Id CreationDate Score ViewCount Tags AnswerCount FavoriteCount
0 44419 2019-01-23 09:21:13 1 21 machine-learning,data-mining 0 0
1 44420 2019-01-23 09:34:01 0 25 machine-learning,regression,linear-regression,... 0 0
2 44423 2019-01-23 09:58:41 2 1651 python,time-series,forecast,forecasting 0 0
3 44427 2019-01-23 10:57:09 0 55 machine-learning,scikit-learn,pca 1 0
4 44428 2019-01-23 11:02:15 0 19 dataset,bigdata,data,speech-to-text 0 0
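The same cleanup can also be written in two chained steps, stripping the outer angle brackets and replacing the inner separators (an equivalent sketch, not run here):

posts['Tags'] = posts['Tags'].str.strip('<>').str.replace('><', ',')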
We are going to calculate the number of times each tag was used: first by splitting each Tags string on the commas (,), then by stacking the resulting DataFrame into a single Series, and finally counting how many times each distinct tag appears:
In [8]:
tags_count = posts['Tags'].str.split(',', expand=True).stack().value_counts()
tags_top5 = tags_count.head(5)
print(tags_top5)
machine-learning    2693
python              1814
deep-learning       1220
neural-network      1055
keras                935
dtype: int64
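On newer pandas versions (0.25+), Series.explode offers an equivalent route (a sketch, not run here):

tags_count = posts['Tags'].str.split(',').explode().value_counts()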
Plotting the above results:
In [9]:
import matplotlib.pyplot as plt
%matplotlib inline

#Plotting the tags_top5 Series as a horizontal bar graph:
tags_top5_graph = tags_top5.plot.barh(
                    edgecolor='none',
                    color = [(255/255,188/255,121/255),
                            (162/255,200/255, 236/255),
                            (207/255,207/255,207/255),
                            (200/255,82/255,0/255),
                            (255/255,194/255,10/255)])

#ENHANCING PLOT AESTHETICS:

#Removing all four spines from the axes:
for spine in tags_top5_graph.spines.values():
    spine.set_visible(False)
#Removing the ticks (booleans; the old 'off' strings are deprecated):
tags_top5_graph.tick_params(
                            bottom=False, top=False, left=False, right=False)
#Setting a graph title:
tags_top5_graph.set_title('Top5 Tags by Number of Usage')
#Setting an average graph line:
tags_top5_graph.axvline(tags_top5.mean(),
                       alpha=.8, linestyle='--', color='grey')

# Displaying the graph:
plt.show()
As a first probe, let's total the views of every question whose tags mention machine-learning:

In [10]:
posts[posts['Tags'].str.contains('machine-learning')]['ViewCount'].sum()
Out[10]:
398666
Now let's check which of the top5 tags computed above is the most viewed. We will use the str.contains() method as a mask to filter the posts DataFrame for each of the top5 tags, and then, concentrating on the ViewCount column, add up the views of the matching questions:
In [11]:
#creating the tags_top5_views DataFrame (DF):
tags_top5_views = pd.DataFrame(
                                columns=tags_top5.index,
                                index=['Total Views'])


#filling in the tags_top5_views DF, one column per tag, with the total number
#of views of every question whose tags mention that tag:
for tag in tags_top5.index:
    n_views = posts[posts['Tags'].str.contains(tag)]['ViewCount'].sum()
    print(tag + '__total-views:', n_views)
    tags_top5_views[tag] = [n_views]
machine-learning__total-views: 398666
python__total-views: 541691
deep-learning__total-views: 233628
neural-network__total-views: 185367
keras__total-views: 269051
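The same figures could be gathered without the helper DataFrame, using a dict comprehension (an equivalent sketch, not run here):

views = {tag: posts.loc[posts['Tags'].str.contains(tag), 'ViewCount'].sum()
         for tag in tags_top5.index}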
Plotting the above results:
In [12]:
#Plotting the tags_top5_views DataFrame in a bar graph:
tags_top5_views_graph = tags_top5_views.plot.bar(edgecolor='none',
                                            color = [(255/255,188/255,121/255),
                            (162/255,200/255, 236/255),
                            (207/255,207/255,207/255),
                            (200/255,82/255,0/255),
                            (255/255,194/255,10/255)])

#ENHANCING PLOT AESTHETICS: 

#Removing all four spines from the axes:
for spine in tags_top5_views_graph.spines.values():
    spine.set_visible(False)
    
#Removing the ticks from the graph:
tags_top5_views_graph.tick_params(
                                  top=False,
                                  bottom=False,
                                  right=False,
                                  left=False)
   
# Setting up a legend box for our bar graph:    
tags_top5_views_graph.legend(
    loc='upper right', 
    labels=(tags_top5_views.columns), 
    ncol=1, fancybox=True, framealpha=.6,
    prop={'size': 10})
#Rotating the xtick labels:
plt.xticks(rotation='horizontal')

#Drawing a line at the average of the five view totals (n_views would only
#hold the last tag's total, not the mean):
tags_top5_views_graph.axhline(tags_top5_views.loc['Total Views'].mean(),
                             color='grey',
                             alpha=.8,
                             linestyle=':')

plt.show()

It is clear that among our top5 tags two stand out: Machine-Learning (ML) and Python (Py). Not only are these two tags the most used (ML: 2693 times; Py: 1814 times) but they are also the most viewed (Py: 541691 views; ML: 398666 views).

We've got two pretty good candidates for our assignment, and complementary ones at that, which can even be combined into a single major subject: Python and Machine-Learning.

A tag closely tied to both of our candidates is tensorflow, a Python deep-learning library. Let's pull out every question whose tags mention it:

In [13]:
#Keeping any question whose tags mention tensorflow:
posts[posts['Tags'].apply(lambda tags: 'tensorflow' in tags)]
Out[13]:
Id CreationDate Score ViewCount Tags AnswerCount FavoriteCount
22 44474 2019-01-24 00:43:27 2 1810 python,keras,tensorflow,gpu 2 2
39 44508 2019-01-24 15:18:57 1 27 tensorflow 0 0
52 44537 2019-01-25 00:54:49 0 303 machine-learning,neural-network,keras,tensorflow 1 0
66 55922 2019-07-18 13:59:42 0 117 keras,tensorflow,anomaly-detection,autoencoder 0 0
69 55925 2019-07-18 14:26:20 0 16 python,tensorflow,predictive-modeling,lstm,ana... 0 0
73 55931 2019-07-18 15:07:00 1 29 tensorflow 1 0
103 55994 2019-07-19 10:55:04 0 100 machine-learning,deep-learning,tensorflow,obje... 1 0
104 56000 2019-07-19 12:35:37 0 144 tensorflow 0 0
113 44584 2019-01-25 18:22:34 0 229 deep-learning,keras,tensorflow 1 0
122 44611 2019-01-26 16:16:54 2 102 keras,tensorflow 1 1
126 44624 2019-01-27 02:53:33 2 2538 keras,tensorflow,lstm 3 1
128 44627 2019-01-27 07:54:54 0 255 python,tensorflow,anaconda 2 0
136 44645 2019-01-27 14:13:32 1 216 machine-learning,neural-network,keras,tensorflow 1 0
152 44680 2019-01-28 07:48:11 1 193 python,tensorflow,cnn,computer-vision,opencv 0 0
190 55855 2019-07-17 18:10:46 0 109 machine-learning,tensorflow 1 0
194 55859 2019-07-17 19:32:53 1 50 deep-learning,keras,tensorflow 2 1
198 55868 2019-07-17 21:30:52 0 9 python,neural-network,tensorflow,predictive-mo... 0 1
209 55887 2019-07-18 05:46:20 1 26 machine-learning,deep-learning,tensorflow,data... 0 1
216 55899 2019-07-18 08:00:57 0 29 neural-network,keras,tensorflow,convolution,au... 0 0
240 56038 2019-07-19 22:58:54 0 165 neural-network,keras,tensorflow,cnn,gpu 0 0
257 56067 2019-07-20 17:41:12 0 99 neural-network,keras,tensorflow,autoencoder 0 0
287 44840 2019-01-31 01:15:50 0 54 scikit-learn,tensorflow,algorithms 1 0
297 44864 2019-01-31 14:00:45 0 16 machine-learning,tensorflow,autoencoder 0 0
298 44866 2019-01-31 14:16:49 1 492 tensorflow 1 0
304 44883 2019-02-01 00:02:15 5 4551 deep-learning,keras,tensorflow,multiclass-clas... 6 1
318 44911 2019-02-01 12:16:58 0 11 python,neural-network,scikit-learn,tensorflow,... 0 0
326 44928 2019-02-01 17:05:44 0 77 python,keras,tensorflow 1 0
332 56171 2019-07-22 16:36:03 1 255 deep-learning,keras,tensorflow,cnn,convnet 1 0
333 56172 2019-07-22 17:05:49 0 264 tensorflow 0 0
335 56181 2019-07-22 18:53:04 0 14 keras,tensorflow 0 0
... ... ... ... ... ... ... ...
8194 65518 2019-12-27 10:20:04 2 47 keras,tensorflow,prediction 0 0
8208 54765 2019-06-30 03:30:59 0 16 neural-network,tensorflow,word2vec,word-embedd... 0 0
8209 54766 2019-06-30 04:14:43 0 15 tensorflow,multiclass-classification,word-embe... 0 0
8234 44241 2019-01-19 16:53:02 0 17 machine-learning,deep-learning,tensorflow,comp... 0 0
8237 44246 2019-01-19 18:55:30 1 55 python,neural-network,tensorflow,convnet 1 0
8297 54888 2019-07-02 06:41:19 1 414 deep-learning,tensorflow,bert 1 0
8368 55081 2019-07-04 16:50:25 0 11 deep-learning,keras,tensorflow,convnet 0 0
8373 55089 2019-07-04 18:59:56 0 206 deep-learning,tensorflow,java,opencv 0 1
8413 55188 2019-07-06 17:07:51 0 20 machine-learning,python,tensorflow,accuracy 0 1
8420 55202 2019-07-07 05:36:28 0 16 machine-learning,tensorflow 0 0
8426 55215 2019-07-07 12:33:05 3 749 python,keras,tensorflow,loss-function 1 0
8464 55312 2019-07-08 21:15:45 1 85 keras,tensorflow 2 0
8515 54972 2019-07-03 08:53:01 0 12 neural-network,tensorflow,image-classification... 0 0
8530 55004 2019-07-03 19:10:53 0 31 machine-learning,tensorflow 1 0
8563 55032 2019-07-04 09:24:26 0 24 machine-learning,python,tensorflow,pandas,data... 2 0
8574 55050 2019-07-04 13:08:49 0 393 neural-network,deep-learning,keras,tensorflow 2 0
8588 55494 2019-07-11 10:11:21 0 60 neural-network,tensorflow,regression 1 0
8594 55505 2019-07-11 14:16:52 0 75 deep-learning,keras,tensorflow 0 0
8607 55536 2019-07-12 03:20:03 0 124 python,deep-learning,keras,tensorflow,object-d... 1 0
8612 55545 2019-07-12 06:52:03 2 502 neural-network,keras,tensorflow,cnn,convolution 2 0
8656 55158 2019-07-05 21:26:40 1 38 python,tensorflow,logistic-regression,loss-fun... 0 0
8674 55641 2019-07-14 12:49:57 0 35 machine-learning,deep-learning,tensorflow,imag... 0 0
8684 55659 2019-07-14 21:31:04 0 32 python,deep-learning,tensorflow,cnn 0 0
8742 55724 2019-07-15 20:23:27 0 14 tensorflow 0 0
8748 55735 2019-07-16 01:24:41 0 23 tensorflow,multiclass-classification,multilabe... 0 0
8755 55749 2019-07-16 06:49:44 1 29 deep-learning,tensorflow,multilabel-classifica... 0 0
8769 55777 2019-07-16 13:30:23 1 1212 tensorflow 1 0
8800 55265 2019-07-08 09:38:01 1 42 neural-network,deep-learning,keras,tensorflow,... 1 1
8808 55293 2019-07-08 16:33:03 0 109 deep-learning,tensorflow,cnn 1 0
8822 55391 2019-07-09 20:28:14 0 16 python,keras,tensorflow 0 0

584 rows × 7 columns
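If we wanted the stricter subset, questions tagged with both python and tensorflow, the condition would need both membership tests (a sketch, not run here):

posts[posts['Tags'].apply(lambda tags: 'python' in tags and 'tensorflow' in tags)]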

Prior to summarizing our findings, let's dig deeper into Deep Learning and check whether or not this trend has come to stay.

In [14]:
all_questions = pd.read_csv('all_questions.csv', parse_dates=['CreationDate'])
all_questions.head()
Out[14]:
Id CreationDate Tags
0 45416 2019-02-12 00:36:29 <python><keras><tensorflow><cnn><probability>
1 45418 2019-02-12 00:50:39 <neural-network>
2 45422 2019-02-12 04:40:51 <python><ibm-watson><chatbot>
3 45426 2019-02-12 04:51:49 <keras>
4 45427 2019-02-12 05:08:24 <r><predictive-modeling><machine-learning-mode...

Applying the same process as above, cleaning the Tags column and separating each tag with a comma (,):

In [15]:
all_questions['Tags'] = (all_questions['Tags']
                             .str.replace('><', ',')
                             .str.replace('<', '')
                             .str.replace('>', ''))
all_questions.sample(5)
Out[15]:
Id CreationDate Tags
10879 61095 2019-10-01 13:10:58 keras,time-series,lstm,convolution,autoencoder
5630 17287 2017-03-01 21:16:32 machine-learning,neural-network,convnet
180 55500 2019-07-11 13:19:26 prediction,forecasting,missing-data
17393 24080 2017-10-25 19:22:39 machine-learning,prediction,gaussian
20356 13854 2016-09-04 19:01:31 machine-learning,r,apache-spark,logistic-regre...
In [16]:
#Selecting the questions whose only tag is deep-learning, on an explicit copy
#to avoid pandas' SettingWithCopyWarning:
deep_learning = all_questions[all_questions['Tags'] == 'deep-learning'].copy()

#Sorting the DataFrame by date:
deep_learning = deep_learning.sort_values(by='CreationDate')
deep_learning.head()

Out[16]:
Id CreationDate Tags
5514 5375 2015-03-23 17:36:03 deep-learning
8039 6643 2015-07-31 08:21:49 deep-learning
16750 11591 2016-05-04 18:17:38 deep-learning
2852 15984 2016-12-29 03:35:35 deep-learning
3513 16276 2017-01-12 11:28:45 deep-learning
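Note that the == 'deep-learning' filter keeps only the questions tagged exclusively with deep-learning. If we instead wanted every question that mentions the tag, a broader filter would do it (a sketch, not used here, so the figures below match the outputs shown):

deep_learning_any = all_questions[all_questions['Tags'].str.contains('deep-learning')]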
In [17]:
deep_learning.tail()
Out[17]:
Id CreationDate Tags
14682 62769 2019-11-06 12:48:16 deep-learning
15307 62896 2019-11-08 20:56:55 deep-learning
18065 64103 2019-12-02 16:10:37 deep-learning
18785 64645 2019-12-11 12:32:05 deep-learning
20660 66004 2020-01-07 06:54:06 deep-learning

It's clear that our deep-learning sample ranges from 2015 to 2020, although the 2020 slice is too small to be meaningful, containing barely one month of data.

Now it is time to group our deep-learning data by year:

In [18]:
#Summing the Id column per year, a rough proxy for yearly volume (Ids are
#assigned sequentially, so they grow over time), then sorting by that sum:
deep_learning_grp = deep_learning.groupby(
                    deep_learning.CreationDate.dt.year).sum().sort_values(by='Id')
deep_learning_grp.head(10)
Out[18]:
Id
CreationDate
2015 12018
2016 27575
2020 66004
2017 265854
2018 562010
2019 1625975
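Since post Ids are assigned sequentially, summing them mixes the number of questions with how late in time they were asked, which is why 2020 lands between 2016 and 2017 above. A more direct tally would simply count the rows per year (a sketch, not run here):

deep_learning_per_year = deep_learning.groupby(
                             deep_learning.CreationDate.dt.year).size()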

Applying the same process to the all_questions DataFrame:

In [19]:
all_questions_grp = all_questions.groupby(
                    all_questions.CreationDate.dt.year).sum()
all_questions_grp.head(10)
Out[19]:
Id
CreationDate
2014 774987
2015 8241798
2016 27199660
2017 62341989
2018 189044640
2019 482278000
2020 30388658

Let's now merge the two DataFrames into one for the sake of our analysis, and proceed with some comparisons and conclusions:

In [20]:
deep_all = pd.merge(all_questions_grp, deep_learning_grp, how='left', 
                    left_index=True, right_index=True)

deep_all = deep_all.rename(
                        columns={'Id_x':'all_questions','Id_y':'deep_learning' })

deep_all.head(10)
Out[20]:
all_questions deep_learning
CreationDate
2014 774987 NaN
2015 8241798 12018.0
2016 27199660 27575.0
2017 62341989 265854.0
2018 189044640 562010.0
2019 482278000 1625975.0
2020 30388658 66004.0
In order to align both samples and make the comparison cleaner, we will drop the 2014 row, since there is no data for the deep-learning tag in that period, and also the 2020 row, due to the lack of sufficient data for that year. This way our comparisons are more robust and consistent:
In [21]:
deep_all = deep_all.drop([2014, 2020], axis=0)
deep_all.head(10)
Out[21]:
all_questions deep_learning
CreationDate
2015 8241798 12018.0
2016 27199660 27575.0
2017 62341989 265854.0
2018 189044640 562010.0
2019 482278000 1625975.0

Let's now run another test and compare the deep_learning figures against those of all questions asked on the Stack Exchange website, in order to validate the growth relative to the overall question volume; a kind of common-size analysis:

In [22]:
deep_all['%_deep_learning'] = (deep_all[
    'deep_learning']/deep_all['all_questions'])*100
deep_all['date'] = deep_all.index
#dropping the old index (another way would be deep_all.index.name = None):
deep_all.reset_index(drop=True, inplace=True)
deep_all = deep_all[['date', 'all_questions', 'deep_learning', '%_deep_learning']]
deep_all.head(10)
Out[22]:
date all_questions deep_learning %_deep_learning
0 2015 8241798 12018.0 0.145818
1 2016 27199660 27575.0 0.101380
2 2017 62341989 265854.0 0.426445
3 2018 189044640 562010.0 0.297290
4 2019 482278000 1625975.0 0.337145
At first glimpse we can observe an upward trend over the years in the deep_learning figures: the 2015 aggregate sits around 12,000, while by 2019 it climbs to roughly 1,600,000, an impressive growth.
In terms of their percentage among all questions asked on the Stack Exchange website, the growth trend is also there, albeit weaker and less linear. Now let's visualize it.
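Before plotting, one quick way to quantify how uneven that growth is would be the year-over-year change of the percentage column (a sketch, not run here):

deep_all['%_deep_learning'].pct_change()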

Plotting the results:

In [23]:
fig = plt.figure(figsize=(6,7))

ax1 = fig.add_subplot(3,1,1)
ax2 = fig.add_subplot(3,1,2)
ax3 = fig.add_subplot(3,1,3)
x = deep_all['date']
xi = list(range(len(x)))

#Labelling the shared x axis with the years (acts on the current axes, ax3):
plt.xticks(xi, x)

ax1.plot(deep_all['all_questions'], color='green', linestyle='-.')
#Setting the yticks for ax1:
ax1.set_yticks([deep_all['all_questions'].min(), deep_all['all_questions'].max()/2,
     deep_all['all_questions'].max()])
ax2.plot(deep_all['deep_learning'], color='orange', linestyle='-.')
#Setting the yticks for ax2:
ax2.set_yticks([deep_all['deep_learning'].min(), deep_all['deep_learning'].max()/2,
     deep_all['deep_learning'].max()])
ax3.plot(deep_all['%_deep_learning'], color='grey', linestyle='-.')
#Setting the yticks for ax3:
ax3.set_yticks([0, 0.25, 0.5])

#Giving each graph a title:
ax1.set_title('All Questions')
ax2.set_title('Deep Learning Questions')
ax3.set_title('% of Deep Learning Questions')


#ENHANCING PLOT AESTHETICS:

for ax in (ax1, ax2, ax3):
    #Removing the ticks from the graph:
    ax.tick_params(top=False, bottom=False, right=False, left=False)
    #Removing all four spines:
    for spine in ax.spines.values():
        spine.set_visible(False)

#Hiding the x tick labels on the two upper graphs:
for ax in (ax1, ax2):
    ax.tick_params(labelbottom=False)

plt.show()

Concluding, over the years there is a clear upward trend in the interest shown in the Deep Learning subject. That statement is weaker, though, when we analyse the number of Deep Learning questions relative to the total (All Questions): there is still growth in Deep Learning interest from 2015 to 2019, but that growth is neither as strong nor as linear.

To sum it up, it's fair to say that Deep Learning is a subject that deserves our attention, given the traction it gained over the five years covered by our analysis.