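Matplotlib official documentation: https://matplotlib.org/
Seaborn official documentation: https://seaborn.pydata.org/
These two packages are commonly used for data visualization. Matplotlib offers a high degree of freedom, Seaborn provides a varied and mature set of ready-made plot types, and the two can be used together.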
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
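This dataset describes steel samples of several classes and includes attributes such as length, luminosity, and area.
The classes are Pastry, Z_Scratch, K_Scatch, Stains, Dirtiness, Bumps, and Other_Faults. We want to know whether differences in the attributes go along with different classes, and how the attributes correlate with one another, so we start with data visualization to get a first impression.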
df = pd.read_csv('faults.csv')
df.head()
(index) | X_Minimum | X_Maximum | Y_Minimum | Y_Maximum | Pixels_Areas | X_Perimeter | Y_Perimeter | Sum_of_Luminosity | Minimum_of_Luminosity | Maximum_of_Luminosity | ... | Orientation_Index | Luminosity_Index | SigmoidOfAreas | Pastry | Z_Scratch | K_Scatch | Stains | Dirtiness | Bumps | Other_Faults |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42 | 50 | 270900 | 270944 | 267 | 17 | 44 | 24220 | 76 | 108 | ... | 0.8182 | -0.2913 | 0.5822 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 645 | 651 | 2538079 | 2538108 | 108 | 10 | 30 | 11397 | 84 | 123 | ... | 0.7931 | -0.1756 | 0.2984 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 829 | 835 | 1553913 | 1553931 | 71 | 8 | 19 | 7972 | 99 | 125 | ... | 0.6667 | -0.1228 | 0.2150 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 853 | 860 | 369370 | 369415 | 176 | 13 | 45 | 18996 | 99 | 126 | ... | 0.8444 | -0.1568 | 0.5212 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1289 | 1306 | 498078 | 498335 | 2409 | 60 | 260 | 246930 | 37 | 126 | ... | 0.9338 | -0.1992 | 1.0000 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 34 columns
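Here we do some simple preprocessing: mainly converting the class labels from dummy variables into a single categorical column, and removing columns that are no longer needed.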
conditions=[(df['Pastry'] == 1) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 0)& (df['Stains'] == 0)& (df['Dirtiness'] == 0)& (df['Bumps'] == 0)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 1)& (df['K_Scatch'] == 0)& (df['Stains'] == 0)& (df['Dirtiness'] == 0)& (df['Bumps'] == 0)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 1)& (df['Stains'] == 0)& (df['Dirtiness'] == 0)& (df['Bumps'] == 0)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 0)& (df['Stains'] == 1)& (df['Dirtiness'] == 0)& (df['Bumps'] == 0)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 0)& (df['Stains'] == 0)& (df['Dirtiness'] == 1)& (df['Bumps'] == 0)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 0)& (df['Stains'] == 0)& (df['Dirtiness'] == 0)& (df['Bumps'] == 1)& (df['Other_Faults'] == 0),
(df['Pastry'] == 0) & (df['Z_Scratch'] == 0)& (df['K_Scatch'] == 0)& (df['Stains'] == 0)& (df['Dirtiness'] == 0)& (df['Bumps'] == 0)& (df['Other_Faults'] == 1)]
choices = ['Pastry', 'Z_Scratch', 'K_Scatch', 'Stains', 'Dirtiness', 'Bumps', 'Other_Faults']
df['class'] = np.select(conditions, choices)
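# Drop the one-hot encoded class columns now that 'class' holds the label.
# Note: drp_cols also lists the redundant 'TypeOfSteel_A400' column, but it is not used below;
# only the columns in choices are dropped, which matches the 28 columns in the output.
drp_cols=['TypeOfSteel_A400', 'Pastry', 'Z_Scratch', 'K_Scatch', 'Stains', 'Dirtiness', 'Bumps', 'Other_Faults']
df.drop(choices, inplace=True, axis = 1)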
df
(index) | X_Minimum | X_Maximum | Y_Minimum | Y_Maximum | Pixels_Areas | X_Perimeter | Y_Perimeter | Sum_of_Luminosity | Minimum_of_Luminosity | Maximum_of_Luminosity | ... | Edges_X_Index | Edges_Y_Index | Outside_Global_Index | LogOfAreas | Log_X_Index | Log_Y_Index | Orientation_Index | Luminosity_Index | SigmoidOfAreas | class |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 42 | 50 | 270900 | 270944 | 267 | 17 | 44 | 24220 | 76 | 108 | ... | 0.4706 | 1.0000 | 1.0 | 2.4265 | 0.9031 | 1.6435 | 0.8182 | -0.2913 | 0.5822 | Pastry |
1 | 645 | 651 | 2538079 | 2538108 | 108 | 10 | 30 | 11397 | 84 | 123 | ... | 0.6000 | 0.9667 | 1.0 | 2.0334 | 0.7782 | 1.4624 | 0.7931 | -0.1756 | 0.2984 | Pastry |
2 | 829 | 835 | 1553913 | 1553931 | 71 | 8 | 19 | 7972 | 99 | 125 | ... | 0.7500 | 0.9474 | 1.0 | 1.8513 | 0.7782 | 1.2553 | 0.6667 | -0.1228 | 0.2150 | Pastry |
3 | 853 | 860 | 369370 | 369415 | 176 | 13 | 45 | 18996 | 99 | 126 | ... | 0.5385 | 1.0000 | 1.0 | 2.2455 | 0.8451 | 1.6532 | 0.8444 | -0.1568 | 0.5212 | Pastry |
4 | 1289 | 1306 | 498078 | 498335 | 2409 | 60 | 260 | 246930 | 37 | 126 | ... | 0.2833 | 0.9885 | 1.0 | 3.3818 | 1.2305 | 2.4099 | 0.9338 | -0.1992 | 1.0000 | Pastry |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1936 | 249 | 277 | 325780 | 325796 | 273 | 54 | 22 | 35033 | 119 | 141 | ... | 0.5185 | 0.7273 | 0.0 | 2.4362 | 1.4472 | 1.2041 | -0.4286 | 0.0026 | 0.7254 | Other_Faults |
1937 | 144 | 175 | 340581 | 340598 | 287 | 44 | 24 | 34599 | 112 | 133 | ... | 0.7046 | 0.7083 | 0.0 | 2.4579 | 1.4914 | 1.2305 | -0.4516 | -0.0582 | 0.8173 | Other_Faults |
1938 | 145 | 174 | 386779 | 386794 | 292 | 40 | 22 | 37572 | 120 | 140 | ... | 0.7250 | 0.6818 | 0.0 | 2.4654 | 1.4624 | 1.1761 | -0.4828 | 0.0052 | 0.7079 | Other_Faults |
1939 | 137 | 170 | 422497 | 422528 | 419 | 97 | 47 | 52715 | 117 | 140 | ... | 0.3402 | 0.6596 | 0.0 | 2.6222 | 1.5185 | 1.4914 | -0.0606 | -0.0171 | 0.9919 | Other_Faults |
1940 | 1261 | 1281 | 87951 | 87967 | 103 | 26 | 22 | 11682 | 101 | 133 | ... | 0.7692 | 0.7273 | 0.0 | 2.0128 | 1.3010 | 1.2041 | -0.2000 | -0.1139 | 0.5296 | Other_Faults |
1941 rows × 28 columns
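The conversion from dummy variables to a single label column can also be written more compactly. The following is a minimal sketch, assuming exactly one class flag is set per row; the `toy` frame and its values are made up for illustration and are not taken from `faults.csv`.

```python
import pandas as pd

# Hypothetical miniature of the one-hot class columns (values are made up).
choices = ['Pastry', 'Z_Scratch', 'K_Scatch']
toy = pd.DataFrame({
    'Pixels_Areas': [267, 108, 71, 176],
    'Pastry':       [1, 0, 0, 0],
    'Z_Scratch':    [0, 1, 0, 1],
    'K_Scatch':     [0, 0, 1, 0],
})

# Sanity check: exactly one flag per row. Without it, idxmax would
# silently return the first column that holds the maximum.
assert (toy[choices].sum(axis=1) == 1).all()

# idxmax(axis=1) gives, for each row, the name of the column with the largest value.
toy['class'] = toy[choices].idxmax(axis=1)

# The one-hot columns are no longer needed once the label exists.
toy = toy.drop(columns=choices)
print(toy)
```

Compared with `np.select`, this avoids spelling out every combination of flags, at the cost of needing the explicit one-flag-per-row check.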
This section introduces the usage and syntax of five Matplotlib plot types, using the steel dataset above as the example.
plt.hist(x, bins=None, range=None, density=None, cumulative=False, histtype='bar', align='mid', orientation='vertical', rwidth=None, color=None, label=None, stacked=False)
plt.hist(df["Minimum_of_Luminosity"], bins=10, color='c') # draw a histogram; bins is the number of intervals
plt.xlabel("Minimum_of_Luminosity") # xlabel sets the x-axis label, for any plot type
plt.ylabel("frequency") # ylabel sets the y-axis label, for any plot type
plt.title("Minimum_of_Luminosity") # title sets the title of the figure
plt.show()
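To make the `bins` and `density` parameters more concrete, here is a small sketch on made-up values (not the steel data). It relies on the fact that `hist` returns the bar heights and the bin edges; the `Agg` backend line is only there so the sketch runs without a display and is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values standing in for a numeric column.
values = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4])

fig, (ax1, ax2) = plt.subplots(1, 2)

# bins=4 splits the range [1, 4] into 4 equal-width intervals.
counts, edges, _ = ax1.hist(values, bins=4)

# density=True rescales the bar heights so the total bar area equals 1.
dens, dens_edges, _ = ax2.hist(values, bins=4, density=True)

print(counts)                             # number of values in each bin
print(edges)                              # bins + 1 edges
print((dens * np.diff(dens_edges)).sum()) # total area under the density bars
```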
# First select the rows of each class separately
df1 = df[df['class'] == 'Z_Scratch']
df2 = df[df['class'] == 'K_Scatch']
df3 = df[df['class'] == 'Stains']
df4 = df[df['class'] == 'Dirtiness']
df5 = df[df['class'] == 'Bumps']
df6 = df[df['class'] == 'Other_Faults']
df7 = df[df['class'] == 'Pastry']
# Use stacked=True to stack the histograms; note that the data must be passed in as a list of arrays
plt.hist([df1["Minimum_of_Luminosity"],
df2["Minimum_of_Luminosity"],
df3["Minimum_of_Luminosity"],
df4["Minimum_of_Luminosity"],
df5["Minimum_of_Luminosity"],
df6["Minimum_of_Luminosity"],
df7["Minimum_of_Luminosity"]],
label=['Z_Scratch', 'K_Scatch', 'Stains', 'Dirtiness', 'Bumps', 'Other_Faults', 'Pastry'],
stacked=True)
plt.xlabel('Minimum_of_Luminosity')
plt.legend() # show the legend
plt.show()
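Selecting each class by hand makes it easy to mistype a class name or to let the labels drift out of order with the data. A possible alternative, sketched here on a made-up `toy` frame, is to let `groupby` produce the label and the data together.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the steel data.
toy = pd.DataFrame({
    'class': ['Pastry', 'Pastry', 'Stains', 'Bumps', 'Bumps', 'Bumps'],
    'Minimum_of_Luminosity': [76, 84, 99, 37, 120, 101],
})

# groupby yields (label, sub-frame) pairs, so labels and data stay in sync.
labels, series = [], []
for name, group in toy.groupby('class'):
    labels.append(name)
    series.append(group['Minimum_of_Luminosity'])

fig, ax = plt.subplots()
ax.hist(series, label=labels, stacked=True)  # one stacked layer per class
ax.set_xlabel('Minimum_of_Luminosity')
ax.legend()
```

Note that `groupby` sorts the class names alphabetically by default, so the stacking order differs from the hand-written version.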
colors=['lightsteelblue', 'cornflowerblue', 'royalblue', 'midnightblue', 'navy', 'darkblue', 'mediumblue']
count = df["class"].value_counts(sort=False)
y = [count[i] for i in choices]
plt.bar(choices, y, color=colors, width=0.5) # x-axis: class names, in the same order used to build y; y-axis: number of rows in each class
plt.xlabel('class')
plt.ylabel('amount')
plt.title('Class data amount')
plt.show()
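The bar heights and the tick labels must be in the same order, otherwise a bar ends up under the wrong class name. One way to guarantee this, sketched below on made-up labels, is to reorder the `value_counts` result explicitly with `reindex` and then plot its index and values together.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up class labels standing in for df['class'].
toy_classes = pd.Series(['Stains', 'Pastry', 'Bumps', 'Pastry', 'Bumps', 'Bumps'])
choices = ['Pastry', 'Stains', 'Bumps']  # the display order we want

# reindex puts the counts in the order of choices; fill_value=0 covers
# classes that do not occur at all.
counts = toy_classes.value_counts().reindex(choices, fill_value=0)

fig, ax = plt.subplots()
bars = ax.bar(counts.index, counts.values, width=0.5)
ax.set_xlabel('class')
ax.set_ylabel('amount')
```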
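plt.bar(choices, y, color=colors, width=0.5) # x-axis: class names, in the same order used to build y; y-axis: number of rows in each class
plt.axhline(y=200, c="r", ls="--", lw=2) # axhline draws a horizontal reference line; y=200 marks the threshold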
plt.show()
plt.scatter('Minimum_of_Luminosity', "Maximum_of_Luminosity", data=df[df["class"] == "Z_Scratch"], alpha=0.5)
plt.xlabel('Minimum_of_Luminosity')
plt.ylabel('Maximum_of_Luminosity')
plt.show()
df1 = df[df['class'] == 'Pastry']
df2 = df[df['class'] == 'Z_Scratch']
df3 = df[df['class'] == 'Stains']
plt.scatter('X_Maximum', "Y_Maximum", data=df1, alpha=0.2, label="Pastry") # label names this group of points for the legend
plt.scatter('X_Maximum', "Y_Maximum", data=df2, alpha=0.2, label="Z_Scratch") # label names this group of points for the legend
plt.scatter('X_Maximum', "Y_Maximum", data=df3, alpha=0.2, label="Stains") # label names this group of points for the legend
plt.legend() # show the groups labeled above as a legend
plt.xlabel('X_Maximum')
plt.ylabel('Y_Maximum')
plt.show()
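When more classes are involved, the repeated `scatter` calls can be replaced by a loop. This is a minimal sketch on a made-up `toy` frame; each pass through the loop adds one group of points with its own color and legend entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the steel data.
toy = pd.DataFrame({
    'class':     ['Pastry', 'Pastry', 'Z_Scratch', 'Z_Scratch', 'Stains', 'Stains'],
    'X_Maximum': [50, 651, 835, 860, 1306, 277],
    'Y_Maximum': [270944, 2538108, 1553931, 369415, 498335, 325796],
})

fig, ax = plt.subplots()
for name, group in toy.groupby('class'):
    # With data=group, the two strings are looked up as column names.
    ax.scatter('X_Maximum', 'Y_Maximum', data=group, alpha=0.5, label=name)

ax.set_xlabel('X_Maximum')
ax.set_ylabel('Y_Maximum')
ax.legend()
```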
plt.boxplot([df[df["class"] == "Pastry"].Minimum_of_Luminosity,
df[df["class"] == "Z_Scratch"].Minimum_of_Luminosity,
df[df["class"] == "Stains"].Minimum_of_Luminosity],
labels = ["Pastry", "Z_Scratch", "Stains"])
plt.ylabel('Minimum_of_Luminosity')
plt.xlabel('class')
plt.title('Box plot of Minimum_of_Luminosity')
plt.show()
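To read a box plot it helps to know what the box, whiskers, and outlier points stand for. The sketch below recomputes them with NumPy on made-up values, assuming Matplotlib's default whisker setting of 1.5 times the interquartile range.

```python
import numpy as np

# Made-up values with one obvious outlier.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# The box spans the first to the third quartile; the line inside is the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are drawn individually as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

# The whiskers end at the most extreme data points still inside the fences.
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(q1, median, q3)             # 3.25 5.5 7.75
print(whisker_low, whisker_high)  # 1 9
print(outliers)                   # [100]
```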
month = [1,2,3,4,5,6,7,8,9,10,11,12]
stock_tsmcc = [255,246,247.5,227,224,216.5,246,256,262.5,234,225.5,225.5]
stock_foxconnn = [92.2,88.1,88.5,82.9,85.7,83.2,83.8,80.5,79.2,78.8,71.9,70.8]
plt.plot(month, stock_tsmcc, 's-', color='r', label="TSMC")
plt.plot(month, stock_foxconnn, 'o-', color='g', label="FOXCONN")
plt.title("TSMC_FOXCONN")
plt.xlabel("month")
plt.ylabel("price")
plt.legend()
plt.show()
import seaborn as sns
sns.histplot(df["Minimum_of_Luminosity"], kde=True, bins=25) # histplot replaces distplot, which is deprecated in recent Seaborn versions
<matplotlib.axes._subplots.AxesSubplot at 0x289e7bbd250>
count = df["class"].value_counts(sort=False)
y = [count[i] for i in choices]
sns.barplot(x=choices, y=y) # use choices for x so the class names match the order of y
<AxesSubplot:>
sns.countplot(x=df['class'])
<AxesSubplot:xlabel='class', ylabel='count'>
sns.stripplot(x=df['class'], y=df["Y_Minimum"], jitter=1)
<AxesSubplot:xlabel='class', ylabel='Y_Minimum'>
sns.swarmplot(x=df['class'], y=df["Y_Minimum"])
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 44.3% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 63.7% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 73.9% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 48.6% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 38.2% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 66.7% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
c:\users\user\appdata\local\programs\python\python39\lib\site-packages\seaborn\categorical.py:1296: UserWarning: 80.5% of the points cannot be placed; you may want to decrease the size of the markers or use stripplot.
  warnings.warn(msg, UserWarning)
<AxesSubplot:xlabel='class', ylabel='Y_Minimum'>
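The warnings above mean that the swarm plot could not find room for many of the points. Two common remedies are to plot a random subsample and to shrink the markers with `size`. The sketch below shows both on randomly generated data; the sample size of 60 and the marker size of 2 are arbitrary choices for illustration, not tuned values for the steel data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption for running outside a notebook
import numpy as np
import pandas as pd
import seaborn as sns

# Randomly generated stand-in for the steel data (120 rows, 3 classes).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    'class': np.repeat(['Pastry', 'Stains', 'Bumps'], 40),
    'Y_Minimum': rng.integers(0, 1_000_000, size=120),
})

# Remedy 1: draw fewer points by taking a reproducible random subsample.
sub = toy.sample(n=60, random_state=0)

# Remedy 2: use smaller markers so more points fit side by side.
ax = sns.swarmplot(data=sub, x='class', y='Y_Minimum', size=2)
```

If the overall shape of the distribution matters more than the individual points, `sns.stripplot` or `sns.violinplot` are the alternatives the warning itself points to.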
sns.jointplot(x='X_Maximum', y='Y_Minimum', data=df, kind='reg') # kind can be scatter, reg, resid, kde, hist, or hex
<seaborn.axisgrid.JointGrid at 0x1a0beb0c0d0>
sns.boxplot(x='class', y='Minimum_of_Luminosity', data=df)
<matplotlib.axes._subplots.AxesSubplot at 0x289e7d0cfd0>
sns.violinplot(x= 'class', y = 'Minimum_of_Luminosity', data = df)
<matplotlib.axes._subplots.AxesSubplot at 0x289e807bbb0>
sns.pairplot(df)
<seaborn.axisgrid.PairGrid at 0x242283ce848>