from fastai import * # Quick access to most common functionality
from fastai.tabular import * # Quick access to tabular functionality
Tabular data should be in a Pandas DataFrame
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
train_df, valid_df = df[:-2000].copy(), df[-2000:].copy()
train_df.head()
| age | workclass | fnlwgt | education | education-num | marital-status | occupation | relationship | race | sex | capital-gain | capital-loss | hours-per-week | native-country | >=50k |
0 | 49 | Private | 101320 | Assoc-acdm | 12.0 | Married-civ-spouse | NaN | Wife | White | Female | 0 | 1902 | 40 | United-States | 1 |
1 | 44 | Private | 236746 | Masters | 14.0 | Divorced | Exec-managerial | Not-in-family | White | Male | 10520 | 0 | 45 | United-States | 1 |
2 | 38 | Private | 96185 | HS-grad | NaN | Divorced | NaN | Unmarried | Black | Female | 0 | 0 | 32 | United-States | 0 |
3 | 38 | Self-emp-inc | 112847 | Prof-school | 15.0 | Married-civ-spouse | Prof-specialty | Husband | Asian-Pac-Islander | Male | 0 | 0 | 40 | United-States | 1 |
4 | 42 | Self-emp-not-inc | 82297 | 7th-8th | NaN | Married-civ-spouse | Other-service | Wife | Black | Female | 0 | 0 | 50 | United-States | 0 |
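The split above is purely positional: the last 2000 rows become the validation set and nothing is shuffled, so it is only representative if the file is not sorted in some meaningful order. The sketch below illustrates the same slicing convention on a plain Python list standing in for DataFrame rows. The helper name `split_tail` is hypothetical and not part of fastai or pandas. It also guards against a hold-out size of 0, because `rows[:-0]` is an empty slice rather than the full list.

```python
def split_tail(rows, n_valid):
    """Return (train, valid), where valid is the last n_valid items of rows."""
    # rows[:-0] would be empty and rows[-0:] would be everything,
    # so reject sizes that leave either part empty.
    if not 0 < n_valid < len(rows):
        raise ValueError("n_valid must be between 1 and len(rows) - 1")
    # Same positional convention as df[:-n] and df[-n:] in the split above.
    return rows[:-n_valid], rows[-n_valid:]


rows = list(range(10))  # stand-in for 10 DataFrame rows
train_rows, valid_rows = split_tail(rows, 3)
print(len(train_rows), len(valid_rows))
```

If the row order carries information (for example, records sorted by date or label), shuffling before the split may be a better choice.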
Convert your DataFrame into a DataBunch suitable for modeling by calling TabularDataBunch.from_df
dep_var = '>=50k'
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race', 'sex', 'native-country']
data = TabularDataBunch.from_df(path, train_df, valid_df, dep_var,
                                tfms=[FillMissing, Categorify], cat_names=cat_names)
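To give a feel for what the two transforms in `tfms` are for, here is a conceptual sketch in plain Python, using the `education-num` and `occupation` values from the five sample rows shown earlier. It is not fastai's implementation: the helper names `fill_missing` and `categorify` are hypothetical, and the choices made here (median fill, a per-row missing flag, code 0 reserved for missing labels) are assumptions for illustration that may differ from what the library does.

```python
from statistics import median


def fill_missing(values):
    """Replace None with the median of the observed values.

    Also returns a per-row flag recording which entries were missing.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return filled, was_missing


def categorify(values):
    """Map each distinct label to an integer code starting at 1 (0 = missing)."""
    vocab = sorted({v for v in values if v is not None})
    code_of = {label: i for i, label in enumerate(vocab, start=1)}
    return [code_of.get(v, 0) for v in values], vocab


# Values taken from the five sample rows (NaN written as None).
education_num = [12.0, 14.0, None, 15.0, None]
occupation = [None, "Exec-managerial", None, "Prof-specialty", "Other-service"]

filled, was_missing = fill_missing(education_num)
codes, vocab = categorify(occupation)
print(filled, was_missing)
print(codes, vocab)
```

The integer codes are what lets a categorical column feed an embedding layer, and the missing flag keeps the information that a value was absent even after it has been filled.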
Now you can create a Learner with get_tabular_learner, and fit your model
learn = get_tabular_learner(data, layers=[200,100], metrics=accuracy)
learn.fit(1, 1e-2)
Total time: 00:05
epoch | train loss | valid loss | accuracy |
0 | 0.340980 | 0.330657 | 0.847000 | (00:05)
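The accuracy column is the fraction of validation rows whose predicted class matches the label. As a rough sketch of that definition, the hypothetical helper `simple_accuracy` below computes it on plain lists of class labels. fastai's own `accuracy` metric operates on tensors of model outputs, so this is only the underlying arithmetic. It also assumes the validation set is the 2000 rows held out earlier, which would make 0.847 correspond to 1694 correct predictions.

```python
def simple_accuracy(predictions, targets):
    """Fraction of positions where the predicted class equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Toy example: 4 of 5 predictions match their target.
toy_predictions = [1, 0, 0, 1, 0]
toy_targets = [1, 0, 1, 1, 0]
toy_accuracy = simple_accuracy(toy_predictions, toy_targets)

# Reading the reported figure, assuming a 2000-row validation set.
n_valid = 2000
n_correct = round(0.847 * n_valid)
print(toy_accuracy, n_correct)
```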