In [1]:
%%help
Magic: info
Example: %%info
Explanation: Outputs session information for the current Livy endpoint.

Magic: cleanup
Example: %%cleanup -f
Explanation: Deletes all sessions for the current Livy endpoint, including this notebook's session. The force flag is mandatory.

Magic: delete
Example: %%delete -f -s 0
Explanation: Deletes a session by number for the current Livy endpoint. Cannot delete this kernel's session.

Magic: logs
Example: %%logs
Explanation: Outputs the current session's Livy logs.

Magic: configure
Example:
%%configure -f
{"executorMemory": "1000M", "executorCores": 4}
Explanation: Configures the session creation parameters. If a session has already been created, the force flag is mandatory, and the session will be dropped and recreated.
See Livy's POST /sessions Request Body for a list of valid parameters. Parameters must be passed in as a JSON string.

Magic: sql
Example: %%sql -o tables -q
Explanation: Executes a SQL query against the sqlContext. Parameters:
  • -o VAR_NAME: The result of the query will be available in the %%local Python context as a Pandas dataframe.
  • -q: The magic will return None instead of the dataframe (no visualization).
  • -m METHOD: Sample method, either take or sample.
  • -n MAXROWS: The maximum number of rows of a SQL query that will be pulled from Livy to Jupyter. If this number is negative, the number of rows is unlimited.
  • -r FRACTION: Fraction used for sampling.

Magic: local
Example:
%%local
a = 1
Explanation: All the code in subsequent lines will be executed locally. The code must be valid Python.
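The -m, -n, and -r options of %%sql interact, so a small local simulation may help. The sketch below is plain Python, not sparkmagic code: `pull_rows` is a hypothetical helper, its default values are illustrative, and the ordering (sample first, then cap the row count) is an assumption about how the options combine.

```python
import random


def pull_rows(rows, method="take", maxrows=100, fraction=0.1, seed=None):
    """Hypothetical model of how %%sql might limit the rows sent to Jupyter."""
    if method not in ("take", "sample"):
        raise ValueError("method must be 'take' or 'sample'")
    selected = list(rows)
    if method == "sample":
        rng = random.Random(seed)
        # -r FRACTION: keep each row independently with this probability.
        selected = [row for row in selected if rng.random() < fraction]
    # -n MAXROWS: a negative value means "no limit".
    if maxrows >= 0:
        selected = selected[:maxrows]
    return selected


rows = list(range(1000))  # stand-in for a query result
first_five = pull_rows(rows, method="take", maxrows=5)
everything = pull_rows(rows, method="take", maxrows=-1)
sampled = pull_rows(rows, method="sample", maxrows=-1, fraction=0.1, seed=0)
print(first_five, len(everything), len(sampled))
```

With `take`, only the row cap applies; with `sample`, the fraction thins the result before the cap is applied.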
In [2]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
No active sessions.
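The session configs printed above are the kind of values that %%configure overrides. As a minimal sketch, the cell body can be generated from a Python dict so the JSON is always well formed. `configure_cell` and `KNOWN_FIELDS` are hypothetical names; the field list is an illustrative subset of Livy's POST /sessions request body, not an exhaustive one.

```python
import json

# Illustrative subset of Livy's POST /sessions request body fields.
KNOWN_FIELDS = {
    "kind", "proxyUser", "jars", "pyFiles", "files", "driverMemory",
    "driverCores", "executorMemory", "executorCores", "numExecutors",
    "archives", "queue", "name", "conf",
}


def configure_cell(settings):
    """Return the text of a %%configure -f cell for the given settings."""
    unknown = sorted(set(settings) - KNOWN_FIELDS)
    if unknown:
        raise ValueError("unrecognized session parameters: %s" % unknown)
    # The magic expects the parameters as a JSON string on the following line.
    return "%%configure -f\n" + json.dumps(settings)


cell = configure_cell({"executorMemory": "1000M", "executorCores": 4})
print(cell)
```

Pasting the printed text into a new cell reproduces the example from the help output.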
In [3]:
%%logs
No logs yet.
In [4]:
Creating SparkContext as 'sc'
ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session?
Creating HiveContext as 'sqlContext'
SparkContext and HiveContext created. Executing user code ...
In [5]:
import os
print(os.environ.get('SPARK_HOME', None))
print(os.environ.get('HADOOP_CONF_DIR', None))
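Printing each variable works, but a small helper makes it obvious which ones are unset. This is a sketch only; `missing_env` is a hypothetical name, and the check reflects the environment of whichever process actually runs the cell.

```python
import os


def missing_env(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env(["SPARK_HOME", "HADOOP_CONF_DIR"])
print("missing:", missing if missing else "none")
```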
In [6]:
%%info
Current session configs: {'kind': 'pyspark', 'driverMemory': '1000M', 'executorCores': 2}
ID | YARN Application ID | Kind | State | Spark UI | Driver log | Current session?
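The columns of the session table line up with fields in Livy's GET /sessions response. The sketch below assumes the table is built from that response; `session_rows` is a hypothetical helper, and the sample payload is made up for illustration rather than taken from this notebook.

```python
import json

COLUMNS = ["ID", "YARN Application ID", "Kind", "State", "Spark UI", "Driver log"]


def session_rows(payload):
    """Turn a GET /sessions-style response into rows matching COLUMNS."""
    rows = []
    for session in payload.get("sessions", []):
        app_info = session.get("appInfo") or {}
        rows.append([
            session.get("id"),
            session.get("appId"),
            session.get("kind"),
            session.get("state"),
            app_info.get("sparkUiUrl"),
            app_info.get("driverLogUrl"),
        ])
    return rows


# Made-up response body; the IDs and URLs are placeholders.
sample = json.loads("""
{"from": 0, "total": 1, "sessions": [
  {"id": 0, "appId": "application_0000000000000_0001", "kind": "pyspark",
   "state": "idle",
   "appInfo": {"sparkUiUrl": "http://example.invalid/ui",
               "driverLogUrl": "http://example.invalid/log"}}
]}
""")

print(" | ".join(COLUMNS))
for row in session_rows(sample):
    print(" | ".join(str(value) for value in row))
```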
In [7]:
In [8]:
In [9]:
%%sql
show tables
In [10]:
%%sql
select * from movies_pq_s3 limit 100
In [11]:
%%sql -o ratings
select movieid, rating from ratings_pq_s3
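In [12]:
%%local
# Runs locally: `ratings` is the Pandas dataframe exported by %%sql -o ratings.
%matplotlib inline
import matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
# Histogram of ratings with a rug plot (distplot is deprecated since seaborn 0.11).
sns.distplot(ratings.rating, kde=False, rug=True)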
<matplotlib.axes._subplots.AxesSubplot at 0x115f7d0b8>