A doodle by Tony Hirst / @psychemedia
This is a first, very weak, attempt at putting together some contentmine
IPython magics.
The magics are based on the following conditions:
the notebook container has a host directory mounted at /notebooks;
a Docker image, psychemedia/contentmine, is available containing the contentmine applications: getpapers, norma, cmine.
There are two ideas at the heart of the demo:
As an example, this notebook was run in a container fired up from the following docker-compose.yaml file, launched with the command docker-compose up -d:
notebook:
  image: jupyter/notebook
  ports:
    - "8899:8888"
  volumes:
    - ./notebooks:/notebooks
    - /var/run/docker.sock:/var/run/docker.sock
  privileged: true
State is passed between the command line Docker container and the notebook container through a shared directory: the host directory that backs a specified directory in the notebook container is also mounted at a specified directory in the command line container. Files persist in the notebook container directory; the temporary command line container can write files to, and read files from, this directory and its subdirectories.
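The lookup this relies on can be sketched in isolation. The function below takes a list of mount records shaped like the Mounts entries the Docker API reports for a container (dicts with Source and Destination keys, which is what the magics in this notebook read) and returns the host path behind a given container directory. The name host_path_for and the sample paths are made up for illustration.

```python
def host_path_for(mounts, destination):
    """Return the host-side Source of the mount at `destination`, or None."""
    for mount in mounts:
        if mount.get('Destination') == destination:
            return mount.get('Source')
    return None


# Example mount records; the host paths are invented for this sketch
sample_mounts = [
    {'Source': '/home/me/project/notebooks', 'Destination': '/notebooks'},
    {'Source': '/var/run/docker.sock', 'Destination': '/var/run/docker.sock'},
]

host_dir = host_path_for(sample_mounts, '/notebooks')
print(host_dir)
```

Once the host path is known, the same directory (or a subdirectory of it) can be handed to docker run with -v, which is how the temporary container ends up writing into a place the notebook can see.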
Install the magics:
from IPython.core.magic import Magics, magics_class, line_magic
from IPython.core.magic_arguments import (argument, magic_arguments,
                                          parse_argstring)
import shutil
import shlex
import os

!pip3 install docker-py
import docker

#Should do this as part of init
if not shutil.which("docker"):
    !apt-get update && apt-get install -y docker.io

@magics_class
class DockerMagics(Magics):

    #def dockerMagicGetPath(container,mountdir):
    def dockerMagicGetPath(self,mountdir):
        """ Returns the host path mounted at mountdir in this notebook container, or '' """
        cli=docker.Client(base_url='unix://var/run/docker.sock')
        #if cli.containers(filters={'name':container}):
        #    containerData=cli.inspect_container(container)
        #Inside a container, HOSTNAME defaults to the (short) container id
        containers=cli.containers(filters={'id':os.environ['HOSTNAME']})
        if containers==[]:
            return ''
        c=[x['Source'] for x in containers[0]['Mounts'] if 'Destination' in x and x['Destination']==mountdir ]
        #Return '' rather than raising IndexError if nothing is mounted at mountdir
        return c[0] if c else ''
        #! docker run -v /Users/ajh59/tmp/notebookdockercli/notebooks/downloads:/contentmineself --tty --interactive psychemedia/contentmine getpapers -q rhinocerous -o /contentmineself/rhinocerous -x

    #getpapers -q rhinocerous -o /contentmine/rhinocerous -x
    @line_magic
    def getpapers(self,line):
        """ Runs a contentmine command: /MOUNTDIR SEARCHTERM
        %getpapers /notebooks rhinocerous
        """
        mount=self.dockerMagicGetPath(line.strip().split()[0])
        if mount=='':
            print('No container mounted there?')
            return
        Q=' '.join(line.strip().split()[1:])
        QD=shlex.quote(Q)
        DD='{}{}'.format(mount,'/contentmineMagic')
        #Use the quoted form for -q too, so a multi-word search stays a single argument
        ! docker run --rm -v {DD}:/tmp_contentmineMagic --tty --interactive psychemedia/contentmine getpapers -q {QD} -o /tmp_contentmineMagic/{QD} -x

    #norma --project /contentmine/aardvark -i fulltext.xml -o scholarly.html --transform nlm2html
    @line_magic
    def norma(self,line):
        """
        %norma /notebooks rhinocerous
        """
        mount=self.dockerMagicGetPath(line.strip().split()[0])
        if mount=='':
            print('No container mounted there?')
            return
        Q=' '.join(line.strip().split()[1:])
        QD=shlex.quote(Q)
        DD='{}{}'.format(mount,'/contentmineMagic')
        ! docker run --rm -v {DD}:/tmp_contentmineMagic --tty --interactive psychemedia/contentmine norma --project /tmp_contentmineMagic/{QD} -i fulltext.xml -o scholarly.html --transform nlm2html

    #./contentmine cmine /contentmine/aardvark
    @line_magic
    def cmine(self,line):
        """
        %cmine /notebooks rhinocerous
        """
        mount=self.dockerMagicGetPath(line.strip().split()[0])
        if mount=='':
            print('No container mounted there?')
            return
        Q=' '.join(line.strip().split()[1:])
        QD=shlex.quote(Q)
        DD='{}{}'.format(mount,'/contentmineMagic')
        ! docker run --rm -v {DD}:/tmp_contentmineMagic --tty --interactive psychemedia/contentmine cmine /tmp_contentmineMagic/{QD}
Requirement already satisfied (use --upgrade to upgrade): docker-py in /usr/local/lib/python3.4/dist-packages
Requirement already satisfied (use --upgrade to upgrade): backports.ssl-match-hostname>=3.5 in /usr/local/lib/python3.4/dist-packages (from docker-py)
Requirement already satisfied (use --upgrade to upgrade): six>=1.4.0 in /usr/local/lib/python3.4/dist-packages (from docker-py)
Requirement already satisfied (use --upgrade to upgrade): websocket-client>=0.32.0 in /usr/local/lib/python3.4/dist-packages (from docker-py)
Requirement already satisfied (use --upgrade to upgrade): requests>=2.5.2 in /usr/local/lib/python3.4/dist-packages (from docker-py)
You are using pip version 8.0.2, however version 8.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
ip = get_ipython()
ip.register_magics(DockerMagics)
Now for a demo...
!rm -r contentmineMagic/
!ls
Contentmine Magic.ipynb Untitled.ipynb
%getpapers /notebooks rhinocerous
info: Searching using eupmc API
info: Found 4 open access results
Retrieving results [==============================] 100% (eta 0.0s)
info: Done collecting results
info: Saving result metadata
info: Full EUPMC result metadata written to eupmc_results.json
info: Individual EUPMC result metadata records written
info: Extracting fulltext HTML URL list (may not be available for all articles)
info: Fulltext HTML URL list written to eupmc_fulltext_html_urls.txt
info: Got XML URLs for 4 out of 4 results
info: Downloading fulltext XML files
Downloading files [=======-------------------] 25% (1/4) [0.0s elapsed, eta 0.0]
Downloading files [=============-------------] 50% (2/4) [0.0s elapsed, eta 0.0]
Downloading files [====================------] 75% (3/4) [0.0s elapsed, eta 0.0]
Downloading files [=========================] 100% (4/4) [0.1s elapsed, eta 0.0]
info: All downloads succeeded!
%norma /notebooks rhinocerous
.
%cmine /notebooks rhinocerous
running: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] WS: /tmp_contentmineMagic/rhinocerous 0 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/pmcstop.txt 4 [main] DEBUG org.xmlcml.ami2.wordutil.WordSetWrapper - symbol expands to: /org/xmlcml/ami2/wordutil/stopwords.txt .filter: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] frequenciesfrequencies5461 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED .5464 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 5467 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 5470 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED summary: word([frequencies])[{xpath:@count>20}, {w.stopwords:pmcstop.txt stopwords.txt}] C: frequencies.running: sequence([dnaprimer])[] .filter: sequence([dnaprimer])[] dnaprimerdnaprimer6546 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED .6549 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 6550 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 6552 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED summary: sequence([dnaprimer])[] C: dnaprimer.running: gene([human])[] .filter: gene([human])[] humanhuman8330 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED .8332 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 8334 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 8336 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED summary: gene([human])[] C: human.running: species([genus])[] SP: /tmp_contentmineMagic/rhinocerous.filter: species([genus])[] genusgenus10699 [main] WARN 
org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED .10701 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 10703 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 10706 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED summary: species([genus])[] C: genus.running: species([binomial])[] SP: /tmp_contentmineMagic/rhinocerous.filter: species([binomial])[] binomialbinomial12458 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED .12460 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 12462 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED 12464 [main] WARN org.xmlcml.cmine.util.CMineGlobber - might delete system files: IGNORED summary: species([binomial])[] C: binomial.12535 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption 12540 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption 12545 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption 12549 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption 12553 [main] WARN org.xmlcml.ami2.plugins.ResultsAnalysis - Null pluginOption
!ls
contentmineMagic Contentmine Magic.ipynb Untitled.ipynb
!ls contentmineMagic/
rhinocerous
!ls contentmineMagic/rhinocerous/
commonest.dataTables.html sequence.dnaprimer.count.xml count.dataTables.html sequence.dnaprimer.documents.xml entries.dataTables.html sequence.dnaprimer.snippets.xml eupmc_fulltext_html_urls.txt species.binomial.count.xml eupmc_results.json species.binomial.documents.xml full.dataTables.html species.binomial.snippets.xml gene.human.count.xml species.genus.count.xml gene.human.documents.xml species.genus.documents.xml gene.human.snippets.xml species.genus.snippets.xml PMC2213592 word.frequencies.count.xml PMC4698820 word.frequencies.documents.xml PMC4730296 word.frequencies.snippets.xml PMC4788244
Setting up the shared directories is a bit of a fudge - is there a better way?
The magics need to be better defined, allowing appropriate command line switches to be passed through (e.g. to getpapers), perhaps via core.magic_arguments.
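One way the switch handling might look, sketched here with the standard library's argparse rather than IPython itself (IPython's magic_arguments decorators are, to my understanding, argparse-based, so the declarations should translate fairly directly). The --no-xml switch and both function names are hypothetical; only the getpapers flags -q, -o and -x come from the commands used above.

```python
import argparse
import shlex


def parse_getpapers_line(line):
    """Parse a magic line such as '/notebooks rhinocerous --no-xml'."""
    parser = argparse.ArgumentParser(prog='%getpapers', add_help=False)
    parser.add_argument('mountdir',
                        help='directory mounted in the notebook container')
    parser.add_argument('query', nargs='+', help='search terms')
    # Hypothetical switch for this sketch: leave out the -x flag
    parser.add_argument('--no-xml', action='store_true',
                        help='do not pass -x to getpapers')
    return parser.parse_args(shlex.split(line))


def getpapers_args(opts, outdir='/tmp_contentmineMagic'):
    """Turn parsed options into the getpapers argument list."""
    query = ' '.join(opts.query)
    args = ['getpapers', '-q', query, '-o', '{}/{}'.format(outdir, query)]
    if not opts.no_xml:
        args.append('-x')
    return args


opts = parse_getpapers_line('/notebooks rhinocerous')
print(getpapers_args(opts))
```

Building an argument list, rather than a single command string, also sidesteps the quoting needed when a search term contains spaces.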
Need to consider cell magics so we can write a pipeline along the lines of something like:
%%contentmine /notebooks rhinocerous
getpapers
norma
cmine
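A cell magic receives the first line and the cell body separately, so the parsing side of such a pipeline could be sketched without any Docker involved. The function below is only a guess at how it might be organised: plan_pipeline is a made-up name, and the per-step arguments are copied from the three line magics defined earlier.

```python
def plan_pipeline(line, cell, workdir='/tmp_contentmineMagic'):
    """Map a '%%contentmine MOUNTDIR QUERY' cell onto an ordered command list."""
    parts = line.split()
    mountdir, query = parts[0], ' '.join(parts[1:])
    project = '{}/{}'.format(workdir, query)
    # Argument lists mirror the commands used by the line magics
    templates = {
        'getpapers': ['getpapers', '-q', query, '-o', project, '-x'],
        'norma': ['norma', '--project', project, '-i', 'fulltext.xml',
                  '-o', 'scholarly.html', '--transform', 'nlm2html'],
        'cmine': ['cmine', project],
    }
    # One step name per non-empty line of the cell body
    steps = [s.strip() for s in cell.splitlines() if s.strip()]
    unknown = [s for s in steps if s not in templates]
    if unknown:
        raise ValueError('Unknown step(s): {}'.format(', '.join(unknown)))
    return mountdir, [templates[s] for s in steps]


mountdir, commands = plan_pipeline('/notebooks rhinocerous',
                                   'getpapers\nnorma\ncmine\n')
for command in commands:
    print(' '.join(command))
```

Rejecting unknown step names before anything runs means a typo in the cell fails fast instead of part way through a download.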
A proper install package needs putting together.
The magics need generalising up to a generic docker magic, and then perhaps back down to magics for a particular application?
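As a rough sketch of what the generic layer might do, the function below just assembles a docker run argument list from an image, a host directory, a container directory and a command; an application-specific magic would then only need to supply the command. The function name and the sample host path are made up; --rm and -v are the docker run options already used above.

```python
import shlex


def docker_run_command(image, host_dir, container_dir, command):
    """Build a `docker run` argument list that shares host_dir with the container."""
    # --tty/--interactive are left out here; they are only needed for
    # interactive use of the command line container
    return (['docker', 'run', '--rm',
             '-v', '{}:{}'.format(host_dir, container_dir),
             image] + list(command))


argv = docker_run_command(
    'psychemedia/contentmine',
    '/home/me/notebooks/contentmineMagic',   # invented host path
    '/tmp_contentmineMagic',
    ['cmine', '/tmp_contentmineMagic/rhinocerous'])

# Show the command in a form that could be pasted into a shell
print(' '.join(shlex.quote(a) for a in argv))
```

The list could be passed straight to subprocess.run, which avoids building a shell string at all.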
More info: Defining custom magics