This notebook is intended as an example of how to "do SEM" using different tools that are accessible in a Jupyter IPython Notebook installed with Anaconda. The notebook uses a Python3 kernel and interfaces with several R packages. Alternatives would be to run an R kernel in the notebook directly, or to use an RStudio markdown notebook and the reticulate package to access Python functionality from R.
To Do for future revisions:
Integrate instructions on using different hosted / collaborative notebook options for this tutorial:
Turn it into a RISE presentation https://rise.readthedocs.io/en/docs_hot_fixes/ (ideally also using the "Split Cells Notebook" extension for better visual organization)
Integrate other SEM tools available in R:
Compare with other tools:
For local use, I recommend JupyterLab as the interface, usually accessible at http://localhost:8888/lab
To run the code in this notebook, a local (or hosted) installation of Anaconda is required. The following installs R components available through Anaconda. We use the --yes option to avoid prompts for confirmation (impossible in the notebook) but that also means you cannot check what will be installed before it proceeds. To avoid this, you could run the installations from an Anaconda prompt or using the Anaconda Navigator.
RStudio is not strictly required, but it is a useful alternative way of using R. Also note that this assumes an Anaconda installation/environment that you have permission to change. On Windows, that means choosing the recommended "for this user only" option when installing Anaconda. Otherwise, these installations require Administrator / root rights.
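If you would rather preview what an installation would change without leaving the notebook, conda's `--dry-run` flag reports the planned changes without applying them. The following is only a sketch: the helper names `dry_run_command` and `preview_install` are made up for this illustration, and it assumes `conda` can be found on the PATH.

```python
import shutil
import subprocess

def dry_run_command(packages):
    """Build a conda command that only reports what would be installed."""
    return ["conda", "install", "--dry-run", *packages]

def preview_install(packages):
    """Run the dry run if conda is available; return its text output, else None."""
    conda_path = shutil.which("conda")
    if conda_path is None:
        return None  # conda not found on PATH, nothing to preview
    command = [conda_path] + dry_run_command(packages)[1:]
    result = subprocess.run(command, capture_output=True, text=True)
    return result.stdout + result.stderr

# show the command that would be run (hypothetical helper, see above)
print(dry_run_command(["r-essentials", "r-lavaan"]))
```

Calling `preview_install(["r-lavaan"])` in a cell would then show the planned changes, after which the actual installation can still be run with `--yes`.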
!conda install --yes r-essentials
Solving environment: ...working... done # All requested packages already installed.
# !conda install --yes rstudio
Solving environment: ...working... done # All requested packages already installed.
!conda install --yes r-lavaan
Solving environment: ...working... done # All requested packages already installed.
Now we need rpy2, which provides the bridge between this Python notebook and R. (Also installing tzlocal due to a current dependency bug in rpy2.)
!conda install --yes tzlocal rpy2
Solving environment: ...working... done # All requested packages already installed.
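To see which Python-side packages are still missing before (or after) running the install cells, one option is to ask Python whether each module can be found. This is a minimal sketch using only the standard library; the function name `missing_modules` and the list of module names are assumptions chosen for this notebook.

```python
import importlib.util

def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]

# top-level module names used in this notebook (adjust as needed)
needed = ["rpy2", "tzlocal", "numpy", "pandas"]
print("Not installed:", missing_modules(needed))
```

An empty list means all of the listed modules are importable from the current kernel.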
Let's enable the rpy2 extension, so that we can then execute R code with the %%R magic command at the top of a cell.
%load_ext rpy2.ipython
If you are on Windows and the text output from %%R cells shows up in the console window where Jupyter is running rather than in the notebook, this is a bug in rpy2 on Windows. As a workaround, you can capture stdout by running the following cells; see https://github.com/vitorcurtis/RWinOut
%%R
install.packages(c("R.utils"))
!curl -O "https://raw.githubusercontent.com/vitorcurtis/RWinOut/master/RWinOut.py"
%load_ext RWinOut
The rpy2.ipython extension is already loaded. To reload it, use: %reload_ext rpy2.ipython
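Since the workaround above is only relevant on Windows, the notebook could check the platform before suggesting it. The sketch below uses the standard library's `platform.system()`; the function name `needs_stdout_workaround` is hypothetical and only serves as an illustration.

```python
import platform

def needs_stdout_workaround(system_name=None):
    """Return True on Windows, where the %%R output issue described above occurs."""
    if system_name is None:
        system_name = platform.system()
    return system_name == "Windows"

if needs_stdout_workaround():
    print("On Windows: consider loading RWinOut as shown above.")
else:
    print("Not on Windows: the RWinOut workaround should not be needed.")
```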
And finally we want to see plots directly in the notebook. The simplest way is to request inline plots.
%matplotlib inline
But assuming you are running this in a JupyterLab interface, you might want the ipympl library to get interactive widget plots.
# We install ipympl with pip, as it is not yet readily available with conda:
!pip install ipympl
# nodejs is needed for the interactive features if using JupyterLab,
# the corresponding package for normal notebooks, widgetsnbextension, should already be installed.
!conda install --yes nodejs
# install the jupyterlab extensions:
!jupyter labextension install @jupyter-widgets/jupyterlab-manager
!jupyter labextension install jupyter-matplotlib
Now enable widget-based plots:
%matplotlib widget
Finally, since this is a long document, the following extension adds a table-of-contents sidebar to the JupyterLab interface:
!jupyter labextension install @jupyterlab/toc
Now for some R packages that are not readily available in an Anaconda default install. They might be available through the conda-forge "channel" - however, at the time of writing, I cannot recommend this, as the performance of conda install is abysmal when using R packages from that repository.
When using a server-hosted notebook, some or all of these packages might already be installed.
%%R
install.packages(c("semPlot", "OpenMx", "semTools", "sem", "gpairs", "GGally"))
A note to avoid possible confusion: lavaan provides a function cfa as a convenience for confirmatory factor analysis. There is also an R package called cfa - however, that one is not related to SEM.
First, import all the basic packages in Python, such as pandas, numpy, matplotlib.
Also import seaborn for simple high-level plots with decent looks. It complements the default plotting provided by matplotlib. For a useful brief overview of looking at data using some of seaborn's plot types, see https://elitedatascience.com/python-seaborn-tutorial
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Next, some of the libraries that are statistics-oriented.
import statsmodels.api as sm
import statsmodels.formula.api as smf # the R-like interface for statsmodels
import statsmodels.graphics as smg
import sklearn
Now load the R packages we will be using.
%%R
library(lavaan)
library(semPlot)
library(OpenMx)
library(semTools)
In R, there are also several packages providing convenient high-level plots, such as generalized pairs plots.
%%R
library(GGally)
library(gpairs)
For the purpose of getting nice visual output, we will also set some defaults for plotting libraries.
defaultfigwidth, defaultfigheight = 10, 9
# set a slightly larger default size for plots. default dpi is 100
plt.rcParams['figure.figsize'] = [defaultfigwidth, defaultfigheight]
# enable seaborn's defaults for nicer plots overall:
sns.set(color_codes=True)
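If a single figure should deviate from these defaults, matplotlib's `rc_context` can override the setting temporarily and restore it afterwards. This is a small sketch, not part of the original analysis; the 4 x 3 inch size is an arbitrary example value.

```python
import matplotlib.pyplot as plt

# remember the current default so we can compare afterwards
default_size = list(plt.rcParams["figure.figsize"])

# inside the with-block, new figures use the temporary size
with plt.rc_context({"figure.figsize": [4, 3]}):
    small_fig = plt.figure()
    small_size = tuple(small_fig.get_size_inches())

# outside the block, the previous default is back in effect
after_size = list(plt.rcParams["figure.figsize"])
plt.close(small_fig)

print(small_size, after_size)
```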
There is currently no built-in way to set default dimensions for plots in R. The %R magic command from the rpy2 library accepts width, height, units, and resolution parameters like this: %%R -w 10 -h 9 -u in -r 100, but it would be nice to set defaults.
Since this is Python, there is a way around that using monkey-patching. Note that this is usually a Bad Idea(TM) and should be avoided if possible. It is also purely cosmetic for the purposes of this notebook, so it can be safely ignored. :)
# these are the defaults we want to set:
default_units = 'in' # inch, to make it more easily comparable to matplotlib
default_res = 100 # dpi, same as default in matplotlib
default_width = 10
default_height = 9
# try monkey-patching a function in rpy2, so we effectively get these
# default settings for the width, height, and units arguments of the %R magic command
import rpy2.ipython.rmagic # import the submodule explicitly; a bare "import rpy2" does not guarantee it is loaded
old_setup_graphics = rpy2.ipython.rmagic.RMagics.setup_graphics
def new_setup_graphics(self, args):
    if getattr(args, 'units') is not None:
        if args.units != default_units: # a different units argument was passed, do not apply defaults
            return old_setup_graphics(self, args)
    args.units = default_units
    if getattr(args, 'res') is None:
        args.res = default_res
    if getattr(args, 'width') is None:
        args.width = default_width
    if getattr(args, 'height') is None:
        args.height = default_height
    return old_setup_graphics(self, args)
rpy2.ipython.rmagic.RMagics.setup_graphics = new_setup_graphics
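For readers unfamiliar with monkey-patching, the same wrap-and-fill-in-defaults pattern can be shown without rpy2. In this sketch the class `Plotter` and its `setup` method are invented stand-ins for a library class; nothing here is part of the rpy2 API.

```python
class Plotter:
    """Stand-in for a library class whose method we want to wrap."""
    def setup(self, width=None, height=None):
        return (width, height)

original_setup = Plotter.setup  # keep a reference to the original method

def setup_with_defaults(self, width=None, height=None):
    # fill in defaults only for arguments the caller left unset
    if width is None:
        width = 10
    if height is None:
        height = 9
    return original_setup(self, width, height)

Plotter.setup = setup_with_defaults  # the actual monkey-patch

patched_default = Plotter().setup()          # both defaults applied
patched_explicit = Plotter().setup(width=5)  # explicit width is kept
print(patched_default, patched_explicit)

Plotter.setup = original_setup  # restoring the original undoes the patch
```

Keeping a reference to the original method is what makes the patch reversible, which is one reason to prefer this form if monkey-patching cannot be avoided.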
We are borrowing example data from this excellent course offered at Harvard, S090A1: https://canvas.harvard.edu/courses/8737/pages/data
The actual data is from the Zambian Early Childhood Development Project. The full sample has more than 1600 Zambian six-year-olds, from a study led by Günther Fink and Stephanie Zuilkowski.
The data is in the proprietary Stata format, so we first need to convert it and import it. We will use pandas.read_stata, but this could also be accomplished in R with the foreign package. First, we write a helper function to download the data file if it is not in the current directory. Defining a function will allow us to re-use it later for other datasets.
import os.path
import urllib.request
def downloadIfMissing(filenameData, remoteLocation):
    '''Check if the file exists. If not, try downloading from remoteLocation.'''
    if not os.path.isfile(filenameData):
        with urllib.request.urlopen(remoteLocation) as response:
            with open(filenameData, 'xb') as destinationFile:
                destinationFile.write(response.read())
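The behaviour of this helper can be tried out without network access by pointing it at a local file through a `file://` URL, which `urllib.request.urlopen` also accepts. The sketch below repeats the helper under the assumed name `download_if_missing` so that it is self-contained; the file names are made up for the illustration.

```python
import os.path
import tempfile
import urllib.request
from pathlib import Path

def download_if_missing(filename, remote_location):
    """Same logic as the helper above, repeated so this sketch is self-contained."""
    if not os.path.isfile(filename):
        with urllib.request.urlopen(remote_location) as response:
            with open(filename, 'xb') as destination:
                destination.write(response.read())

with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "source.bin"
    source.write_bytes(b"example bytes")
    target = os.path.join(workdir, "copy.bin")

    download_if_missing(target, source.as_uri())  # first call fetches the file
    first_copy = Path(target).read_bytes()

    source.write_bytes(b"changed")                # the "remote" file changes
    download_if_missing(target, source.as_uri())  # second call does nothing
    second_copy = Path(target).read_bytes()

print(first_copy, second_copy)
```

Note the consequence: once a local copy exists, it is never refreshed, so a stale or partially downloaded file has to be deleted by hand before the helper will fetch it again.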
# make sure we have the small data file available in the current directory, if not, try to download it:
filenameSmallZambiaData = "S090_InClass_Zambia.dta"
downloadIfMissing(filenameSmallZambiaData, "https://canvas.harvard.edu/courses/8737/files/1839865/download")
# read the data into a pandas dataframe
smallZambiaDF = pd.read_stata(filenameSmallZambiaData)
len(smallZambiaDF) # should return 1613
1613
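To get a feeling for what pandas does with Stata files, a tiny dataframe can be written with `to_stata` and read back with `read_stata`. This is a sketch on invented toy data (the column names merely mimic the course data): categorical columns are stored as Stata value labels and, by default, converted back to pandas categories on reading.

```python
import os
import tempfile
import pandas as pd

# toy data, not taken from the Zambia dataset
toy = pd.DataFrame({
    "ece": pd.Categorical(["ECE", "No ECE", "No ECE"]),
    "vocab": [18.0, 19.0, 25.0],
})

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "toy.dta")
    toy.to_stata(path, write_index=False)  # categories become Stata value labels
    roundtrip = pd.read_stata(path)        # labels are converted back to a category column

print(roundtrip.dtypes)
```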
# make sure we have the full measurement data file available in the current directory, if not, try to download it:
filenameMeasureZambiaData = "S090_InClass_Zambia_Measurement.dta"
downloadIfMissing(filenameMeasureZambiaData, "https://canvas.harvard.edu/courses/8737/files/1994882/download")
# read the data into a pandas dataframe
measureZambiaDF = pd.read_stata(filenameMeasureZambiaData)
len(measureZambiaDF) # should return 1623
1623
We now have the data in a dataframe, so let's have a look at it and get an overview of what kind of data we are dealing with.
smallZambiaDF.head() # equivalent to [:5] i.e. first five entries
| | childid | male | urban | ece | reasoning | socemo | vocab | vocabsq | wealth | books |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 101 | Male | Urban | ECE | 3.0 | 1.05 | 18.0 | 324.0 | 3.0 | No books in home |
| 1 | 102 | Male | Urban | No ECE | 4.0 | 1.00 | 19.0 | 361.0 | 2.0 | No books in home |
| 2 | 103 | Female | Urban | No ECE | 4.0 | 1.80 | 19.0 | 361.0 | 3.0 | No books in home |
| 3 | 104 | Male | Urban | No ECE | 5.0 | 2.35 | 12.0 | 144.0 | 3.0 | No books in home |
| 4 | 105 | Male | Urban | No ECE | 5.0 | 1.50 | 25.0 | 625.0 | 2.0 | Books in home |
smallZambiaDF.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1613 entries, 0 to 1612
Data columns (total 10 columns):
childid      1613 non-null int16
male         1613 non-null category
urban        1613 non-null category
ece          1613 non-null category
reasoning    1613 non-null float32
socemo       1613 non-null float32
vocab        1613 non-null float32
vocabsq      1613 non-null float32
wealth       1613 non-null float32
books        1613 non-null category
dtypes: category(4), float32(5), int16(1)
memory usage: 53.9 KB
smallZambiaDF.describe(include='all', percentiles=[]) # describe categorical and numerical columns, don't bother with percentiles
| | childid | male | urban | ece | reasoning | socemo | vocab | vocabsq | wealth | books |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 1613.000000 | 1613 | 1613 | 1613 | 1613.000000 | 1613.000000 | 1613.000000 | 1613.000000 | 1613.000000 | 1613 |
| unique | NaN | 2 | 2 | 2 | NaN | NaN | NaN | NaN | NaN | 2 |
| top | NaN | Female | Urban | No ECE | NaN | NaN | NaN | NaN | NaN | No books in home |
| freq | NaN | 810 | 814 | 1110 | NaN | NaN | NaN | NaN | NaN | 1171 |
| mean | 4158.768754 | NaN | NaN | NaN | 4.451333 | 1.647789 | 21.475512 | 489.346558 | 2.957222 | NaN |
| std | 2376.382896 | NaN | NaN | NaN | 2.521065 | 0.450903 | 5.307204 | 216.253510 | 1.430792 | NaN |
| min | 101.000000 | NaN | NaN | NaN | 0.000000 | 0.444444 | 0.000000 | 0.000000 | 1.000000 | NaN |
| 50% | 4312.000000 | NaN | NaN | NaN | 4.000000 | 1.611111 | 22.000000 | 484.000000 | 3.000000 | NaN |
| max | 8125.000000 | NaN | NaN | NaN | 10.000000 | 3.000000 | 30.000000 | 900.000000 | 5.000000 | NaN |
smallZambiaDF.male.describe()
count       1613
unique         2
top       Female
freq         810
Name: male, dtype: object
smallZambiaDF.male.value_counts()
Female    810
Male      803
Name: male, dtype: int64
smallZambiaDF.wealth.value_counts()
1.0    356
4.0    340
2.0    311
5.0    307
3.0    299
Name: wealth, dtype: int64
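By default, value_counts orders its result by frequency, which is why the wealth levels above appear as 1, 4, 2, 5, 3. For an ordinal variable it can be easier to read the counts in level order, or as proportions. The sketch below shows both on a small invented series (not the actual data).

```python
import pandas as pd

# toy values standing in for the wealth column
toy_wealth = pd.Series([1.0, 4.0, 2.0, 1.0, 5.0, 1.0, 4.0], name="wealth")

# counts ordered by wealth level rather than by frequency
counts_by_level = toy_wealth.value_counts().sort_index()

# proportions instead of raw counts
shares = toy_wealth.value_counts(normalize=True).sort_index()

print(counts_by_level)
print(shares.round(2))
```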
# visually check relations between numeric variables
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='scatter')
# we have to use an explicit height per facet-figure here, since a grid of figures doesn't follow the matplotlib default size
grid = sns.pairplot(smallZambiaDF, hue="ece", height=defaultfigheight/6, kind='reg') # linear regressions on top of scatter
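The divisor 6 in the height argument presumably corresponds to the six numeric columns of the dataframe, so that the whole grid ends up roughly as tall as the default figure height. Rather than hard-coding it, the per-facet height could be derived from the data. This is a sketch with a hypothetical helper `facet_height` and invented toy data.

```python
import pandas as pd

def facet_height(df, total_height):
    """Split total_height (inches) evenly across the numeric columns of df."""
    n_numeric = df.select_dtypes(include="number").shape[1]
    return total_height / n_numeric

# toy frame with three numeric columns and one categorical column
toy = pd.DataFrame({
    "childid": [101, 102],
    "reasoning": [3.0, 4.0],
    "vocab": [18.0, 19.0],
    "ece": pd.Categorical(["ECE", "No ECE"]),
})
print(facet_height(toy, 9))
```

With the course data, this would be used as `height=facet_height(smallZambiaDF, defaultfigheight)` in the pairplot call.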