datatable_demo

This notebook demonstrates the use of the DataTable object in the ukds package.

The example dataset used in this demonstration is: Gershuny, J., Sullivan, O. (2017). United Kingdom Time Use Survey, 2014-2015. Centre for Time Use Research, University of Oxford. [data collection]. UK Data Service. SN: 8128, http://doi.org/10.5255/UKDA-SN-8128-1

Import the ukds package

This demonstration uses the ukds package, which is available on PyPI.

In [1]:
import ukds

Set up a filepath to a .tab data table file

The filepath to the data table under study is specified here. This can be changed as needed.

In [2]:
fp_tab=r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015'+\
       r'\UKDA-8128-tab\tab\uktus15_household.tab'
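
As an alternative to concatenating raw strings, the same path could be assembled with the standard-library pathlib module. This is only a sketch: the name data_root is introduced here for illustration, PureWindowsPath is used so that the result is the same on any operating system, and the path is converted to a string because it has not been checked whether ukds.DataTable accepts a path object directly.

```python
from pathlib import PureWindowsPath

# Root folder of the downloaded dataset (same location as in the cell above).
# `data_root` is an illustrative name, not part of the ukds package.
data_root = PureWindowsPath(
    r'C:\Users\cvskf\OneDrive - Loughborough University\_Data'
    r'\United_Kingdom_Time_Use_Survey_2014-2015'
)

# Join the sub-folders and the file name, then convert to a plain string
fp_tab = str(data_root / 'UKDA-8128-tab' / 'tab' / 'uktus15_household.tab')
print(fp_tab)
```

When working on the local machine only, pathlib.Path can be used in place of PureWindowsPath, which also allows a check such as Path(fp_tab).exists() before the file is read.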

Set up a filepath to a UKDS .rtf data dictionary file

The filepath to the associated data dictionary is specified here. This can be changed as needed.

In [3]:
fp_dd=r'C:\Users\cvskf\OneDrive - Loughborough University\_Data\United_Kingdom_Time_Use_Survey_2014-2015' + \
      r'\UKDA-8128-tab\mrdoc\allissue\uktus15_household_ukda_data_dictionary.rtf'

Create a DataTable object

A DataTable object is created by supplying the two filepaths as arguments. Both files are read into the DataTable object.

In [4]:
dt=ukds.DataTable(fp_tab,fp_dd)
print(dt.__doc__)
print(dt)
A class for reading a UK Data Service .tab data table file
    
<ukds.data_table.DataTable object at 0x000001DBFB2F92B0>

The data table .tab file is stored in the tab attribute as a pandas DataFrame:

In [5]:
dt.tab.head()
Out[5]:
serial strata psu HhOut hh_wt IMonth IYear DM014 DM016 DM510 ... Relate10_P1 Relate10_P2 Relate10_P3 Relate10_P4 Relate10_P5 Relate10_P6 Relate10_P7 Relate10_P8 Relate10_P9 Relate10_P10
0 11010903 -2 -2 598 NaN 9 2014 0 0 0 ... -2 -2.0 NaN NaN NaN NaN NaN NaN NaN NaN
1 11010904 -2 -2 598 NaN 9 2014 0 0 0 ... -2 -2.0 NaN NaN NaN NaN NaN NaN NaN NaN
2 11010906 -2 -2 598 NaN 10 2014 0 0 0 ... -2 -2.0 -2.0 NaN NaN NaN NaN NaN NaN NaN
3 11010907 -2 -2 598 NaN 9 2014 1 1 0 ... -2 -2.0 -2.0 NaN NaN NaN NaN NaN NaN NaN
4 11010908 -2 -2 598 NaN 9 2014 0 0 0 ... -2 NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 335 columns
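
For comparison, a .tab file is tab-separated text, so a similar DataFrame can be produced with pandas alone. The sketch below reads a small in-memory sample instead of the real file. The sample rows are copied from the first four columns of the output above, and it is an assumption (not checked against the ukds source) that the package reads the file in a comparable way.

```python
import io

import pandas as pd

# Three rows and four columns copied from the output above, tab-separated
sample = (
    "serial\tstrata\tpsu\tHhOut\n"
    "11010903\t-2\t-2\t598\n"
    "11010904\t-2\t-2\t598\n"
    "11010906\t-2\t-2\t598\n"
)

# sep='\t' tells pandas that the columns are tab-delimited
tab = pd.read_csv(io.StringIO(sample), sep='\t')
print(tab.shape)
```

To read the real file, io.StringIO(sample) would be replaced by the filepath fp_tab.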

The data dictionary .rtf file is stored in the datadictionary attribute as a ukds.DataDictionary object:

In [6]:
dt.datadictionary
Out[6]:
<ukds.data_dictionary.DataDictionary at 0x1dbfb2f9358>

Get dataframe

The information in the tab and datadictionary attributes can be combined by the get_dataframe method.

This method returns a new pandas DataFrame in which:

  • the columns are a multi-level index which holds the data dictionary information
  • the table values are converted from numerical values to the label values, where applicable

In [7]:
df=dt.get_dataframe()
df.head()
Out[7]:
variable serial strata psu HhOut hh_wt IMonth IYear DM014 DM016 DM510 ... Relate10_P1 Relate10_P2 Relate10_P3 Relate10_P4 Relate10_P5 Relate10_P6 Relate10_P7 Relate10_P8 Relate10_P9 Relate10_P10
variable_label Household number Strata Primary sampling unit Final outcome - household Household weight Interview month Interview Year Number of children aged 0-14 Number of children aged 0-16 Number of children aged 5-10 ... Relate10_P1: How related to person 10 Relate10_P2: How related to person 10 Relate10_P3: How related to person 10 Relate10_P4: How related to person 10 Relate10_P5: How related to person 10 Relate10_P6: How related to person 10 Relate10_P7: How related to person 10 Relate10_P8: How related to person 10 Relate10_P9: How related to person 10 Relate10_P10: How related to person 10
variable_type numeric numeric numeric numeric numeric numeric numeric numeric numeric numeric ... numeric numeric numeric numeric numeric numeric numeric numeric numeric numeric
SPSS_measurement_level SCALE SCALE SCALE SCALE SCALE NOMINAL NOMINAL NOMINAL NOMINAL NOMINAL ... SCALE SCALE SCALE SCALE SCALE SCALE SCALE SCALE SCALE NOMINAL
SPSS_user_missing_values ...
pos 1 2 3 4 5 6 7 8 9 10 ... 326 327 328 329 330 331 332 333 334 335
0 11010903 Schedule not applicable Schedule not applicable Other reasons why unproductive NaN September 2014 0 0 0 ... Schedule not applicable Schedule not applicable NaN NaN NaN NaN NaN NaN NaN NaN
1 11010904 Schedule not applicable Schedule not applicable Other reasons why unproductive NaN September 2014 0 0 0 ... Schedule not applicable Schedule not applicable NaN NaN NaN NaN NaN NaN NaN NaN
2 11010906 Schedule not applicable Schedule not applicable Other reasons why unproductive NaN October 2014 0 0 0 ... Schedule not applicable Schedule not applicable Schedule not applicable NaN NaN NaN NaN NaN NaN NaN
3 11010907 Schedule not applicable Schedule not applicable Other reasons why unproductive NaN September 2014 1 1 0 ... Schedule not applicable Schedule not applicable Schedule not applicable NaN NaN NaN NaN NaN NaN NaN
4 11010908 Schedule not applicable Schedule not applicable Other reasons why unproductive NaN September 2014 0 0 0 ... Schedule not applicable NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 335 columns
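
The two transformations described above can be illustrated with pandas alone. The following is a minimal sketch, not the ukds implementation: the dictionary variable is a hypothetical stand-in for the data dictionary, and only two variables and two column levels are included. The codes and labels are taken from the outputs above.

```python
import pandas as pd

# Numeric codes as they appear in the raw table (values from the outputs above)
raw = pd.DataFrame({'serial': [11010903, 11010906], 'IMonth': [9, 10]})

# Hypothetical stand-in for the data dictionary: one entry per variable
dictionary = {
    'serial': {'variable_label': 'Household number', 'value_labels': {}},
    'IMonth': {'variable_label': 'Interview month',
               'value_labels': {9: 'September', 10: 'October'}},
}

# Replace codes with labels where a value label exists; keep other values as-is
labelled = raw.copy()
for name, info in dictionary.items():
    labels = info['value_labels']
    if labels:
        labelled[name] = raw[name].map(lambda v, labels=labels: labels.get(v, v))

# Turn the flat column names into a two-level column index
labelled.columns = pd.MultiIndex.from_tuples(
    [(name, dictionary[name]['variable_label']) for name in raw.columns],
    names=['variable', 'variable_label'],
)

# Select one variable by its name on the 'variable' level
imonth = labelled.xs('IMonth', axis=1, level='variable')
print(imonth)
```

The same xs call should work on a DataFrame with more column levels, provided it has a level named variable as in the output above.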
