In Google Colab you can easily run Optimus. If you are not there yet, you can open this notebook in Colab: https://colab.research.google.com/github/ironmussa/Optimus/blob/master/examples/10_min_from_spark_to_pandas_with_optimus.ipynb
Install Optimus and all its dependencies:
import sys
if 'google.colab' in sys.modules:
    !apt-get install openjdk-8-jdk-headless -qq > /dev/null
    !wget -q https://archive.apache.org/dist/spark/spark-2.4.1/spark-2.4.1-bin-hadoop2.7.tgz
    !tar xf spark-2.4.1-bin-hadoop2.7.tgz
    !pip install optimuspyspark
Before you continue, please go to the 'Runtime' menu above and select 'Restart runtime' (Ctrl + M + .).
if 'google.colab' in sys.modules:
    import os
    os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
    os.environ["SPARK_HOME"] = "/content/spark-2.4.1-bin-hadoop2.7"
To hack on Optimus we recommend cloning the repo and changing repo_path so that it points to the repo relative to this notebook.
repo_path=".."
# This will reload the change you make to Optimus in real time
%load_ext autoreload
%autoreload 2
import sys
sys.path.append(repo_path)
From the command line:
pip install optimuspyspark
From a notebook you can use:
!pip install optimuspyspark
from optimus import Optimus
op = Optimus(master="local")
Create a dataframe by passing a list of column names and a list of tuples with the row values. Unlike pandas, you need to specify the column names.
df = op.create.df(
[
"names",
"height(ft)",
"function",
"rank",
"weight(t)",
"japanese name",
"last position",
"attributes"
],
[
("Optim'us", 28.0, "Leader", 10, 4.3, ["Inochi", "Convoy"], "19.442735,-99.201111", [8.5344, 4300.0]),
("bumbl#ebéé ", 17.5, "Espionage", 7, 2.0, ["Bumble", "Goldback"], "10.642707,-71.612534", [5.334, 2000.0]),
("ironhide&", 26.0, "Security", 7, 4.0, ["Roadbuster"], "37.789563,-122.400356", [7.9248, 4000.0]),
("Jazz", 13.0, "First Lieutenant", 8, 1.8, ["Meister"], "33.670666,-117.841553", [3.9624, 1800.0]),
("Megatron", None, "None", None, 5.7, ["Megatron"], None, [None, 5700.0]),
("Metroplex_)^$", 300.0, "Battle Station", 8, None, ["Metroflex"], None, [91.44, None]),
]).h_repartition(1)
df.table()
names (string, nullable) | height(ft) (float, nullable) | function (string, nullable) | rank (int, nullable) | weight(t) (float, nullable) | japanese name (array<string>, nullable) | last position (string, nullable) | attributes (array<float>, nullable) |
---|---|---|---|---|---|---|---|
Optim'us | 28.0 | Leader | 10 | 4.300000190734863 | ['Inochi', 'Convoy'] | 19.442735,-99.201111 | [8.53439998626709, 4300.0] |
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 | 2.0 | ['Bumble', 'Goldback'] | 10.642707,-71.612534 | [5.334000110626221, 2000.0] |
ironhide& | 26.0 | Security | 7 | 4.0 | ['Roadbuster'] | 37.789563,-122.400356 | [7.924799919128418, 4000.0] |
Jazz | 13.0 | First Lieutenant | 8 | 1.7999999523162842 | ['Meister'] | 33.670666,-117.841553 | [3.962399959564209, 1800.0] |
Megatron | None | None | None | 5.699999809265137 | ['Megatron'] | None | [None, 5700.0] |
Metroplex_)^$ | 300.0 | Battle Station | 8 | None | ['Metroflex'] | None | [91.44000244140625, None] |
Creating a dataframe by passing a list of tuples specifying each column's data type. The data type can be a string or a Spark DataType: https://spark.apache.org/docs/2.3.1/api/java/org/apache/spark/sql/types/package-summary.html
You can also use some Optimus predefined types:
df = op.create.df(
[
("names", "str"),
("height", "float"),
("function", "str"),
("rank", "int"),
],
[
("bumbl#ebéé ", 17.5, "Espionage", 7),
("Optim'us", 28.0, "Leader", 10),
("ironhide&", 26.0, "Security", 7),
("Jazz", 13.0, "First Lieutenant", 8),
("Megatron", None, "None", None),
])
df.table()
names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Jazz | 13.0 | First Lieutenant | 8 |
Megatron | None | None | None |
Creating a dataframe, specifying for each column whether it accepts null values:
df = op.create.df(
[
("names", "str", True),
("height", "float", True),
("function", "str", True),
("rank", "int", True),
],
[
("bumbl#ebéé ", 17.5, "Espionage", 7),
("Optim'us", 28.0, "Leader", 10),
("ironhide&", 26.0, "Security", 7),
("Jazz", 13.0, "First Lieutenant", 8),
("Megatron", None, "None", None),
])
df.table()
names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (int, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Jazz | 13.0 | First Lieutenant | 8 |
Megatron | None | None | None |
Creating a dataframe from a pandas dataframe:
import pandas as pd
data = [("bumbl#ebéé ", 17.5, "Espionage", 7),
("Optim'us", 28.0, "Leader", 10),
("ironhide&", 26.0, "Security", 7)]
labels = ["names", "height", "function", "rank"]
# Create pandas dataframe
pdf = pd.DataFrame.from_records(data, columns=labels)
df = op.create.df(pdf=pdf)
df.table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Here is how to view the first 10 rows of a dataframe:
df.table(10)
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Spark and Optimus work differently than pandas or R. If you are not familiar with Spark, we recommend taking a look at the links below.
Partitions are the way Spark divides the data across your local computer or cluster so that it can be processed in parallel. Partitioning can greatly impact Spark performance.
Take 5 minutes to read this article: https://www.dezyre.com/article/how-data-partitioning-in-spark-helps-achieve-more-parallelism/297
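As a quick, hedged illustration with the plain Spark API (standard DataFrame/RDD calls, not Optimus-specific; df_repartitioned is just an illustrative name):
# Inspect how many partitions Spark is currently using for this dataframe
print(df.rdd.getNumPartitions())
# Redistribute the rows across 8 partitions; this returns a new dataframe
df_repartitioned = df.repartition(8)
print(df_repartitioned.rdd.getNumPartitions())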
Lazy evaluation in Spark means that the execution will not start until an action is triggered.
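A minimal sketch of what this means in practice (standard Spark calls; tall_bots is an illustrative name):
# filter() is a transformation: this line only builds a query plan,
# no data is read or processed yet
tall_bots = df.filter(df["height"] > 20)
# count() is an action: only now does Spark execute the plan
print(tall_bots.count())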
Immutability rules out a big set of potential problems due to updates from multiple threads at once. Immutable data is definitely safe to share across processes.
https://www.quora.com/Why-is-RDD-immutable-in-Spark
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-architecture.html
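To see immutability in practice (standard Spark calls; df2 is an illustrative name):
# withColumn() returns a brand-new dataframe; the original is never mutated
df2 = df.withColumn("rank_plus_one", df["rank"] + 1)
print("rank_plus_one" in df.columns)   # False: df is unchanged
print("rank_plus_one" in df2.columns)  # True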
Optimus organizes operations into columns and rows. This is a little different from pandas, where all operations revolve around the DataFrame class. We think this approach makes it easier to access and transform data. For a deep dive into this design decision, please read:
Sort by column names:
df.cols.sort().table()
function (string, nullable) | height (double, nullable) | names (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
Espionage | 17.5 | bumbl#ebéé⸱⸱ | 7 |
Leader | 28.0 | Optim'us | 10 |
Security | 26.0 | ironhide& | 7 |
Sort rows by rank value:
df.rows.sort("rank").table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
Optim'us | 28.0 | Leader | 10 |
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
ironhide& | 26.0 | Security | 7 |
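For comparison, a sketch of the equivalent sort with the plain Spark API:
# orderBy() with a descending expression mirrors df.rows.sort("rank")
df.orderBy(df["rank"].desc()).show()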
df.describe().table()
summary (string, nullable) | names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable) |
---|---|---|---|---|
count | 3 | 3 | 3 | 3 |
mean | None | 23.833333333333332 | None | 8.0 |
stddev | None | 5.575242894559244 | None | 1.7320508075688772 |
min | Optim'us | 17.5 | Espionage | 7 |
max | ironhide& | 28.0 | Security | 10 |
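If you only need specific statistics, you can compute them directly with standard Spark aggregations, as in this sketch:
from pyspark.sql import functions as F

# Compute selected summary statistics for a single column
df.agg(F.mean("height"), F.stddev("height"),
       F.min("height"), F.max("height")).show()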
Unlike pandas, Spark DataFrames don't support random row access, so methods like pandas' loc are not available. Spark also doesn't maintain an index, so methods like iloc are not available either.
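When you really need row-level access, the usual workaround is to filter on a condition or to materialize an explicit id column; a sketch with the standard Spark API (row_id is an illustrative name):
from pyspark.sql import functions as F

# Instead of positional access, filter on a condition...
df.filter(df["rank"] > 7).show()

# ...or add an explicit id column and filter on it
df_with_id = df.withColumn("row_id", F.monotonically_increasing_id())
df_with_id.filter(df_with_id["row_id"] == 0).show()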
Select and show a specific column:
df.cols.select("names").table()
names (string, nullable) |
---|
bumbl#ebéé⸱⸱ |
Optim'us |
ironhide& |
Select rows from a dataframe where a condition is met:
df.rows.select(df["rank"] > 7).table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
Optim'us | 28.0 | Leader | 10 |
Select rows that contain specific values:
df.rows.is_in("rank", [7, 10]).table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
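For comparison, the same membership filter expressed with the standard Spark API:
from pyspark.sql import functions as F

# isin() keeps only the rows whose rank is 7 or 10
df.filter(F.col("rank").isin(7, 10)).show()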
Create a unique id for every row:
df.rows.create_id().table()
Create new columns:
df.cols.append("Affiliation", "Autobot").table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) | Affiliation (string) |
---|---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 | Autobot |
Optim'us | 28.0 | Leader | 10 | Autobot |
ironhide& | 26.0 | Security | 7 | Autobot |
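A plain-Spark sketch that produces the same result, using a literal column:
from pyspark.sql import functions as F

# Add a constant (literal) column to every row
df.withColumn("Affiliation", F.lit("Autobot")).show()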
Dropping rows with missing data:
df.rows.drop_na("*", how='any').table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
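The same operation with the standard Spark API:
# Drop every row that contains at least one null value
df.dropna(how="any").show()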
Filling missing data.
df.cols.fill_na("*", "N//A").table()
names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
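Note that plain Spark's fillna() is type-aware: in the sketch below a string fill value only touches string columns, whereas Optimus casts the columns (see how height became a string above) so that every null can be filled:
# Only string columns would receive "N//A" here; numeric nulls stay null
df.fillna("N//A").show()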
To get a boolean mask of where values are NaN:
df.cols.is_na("*").table()
names (string, nullable) | height (boolean) | function (string, nullable) | rank (boolean) |
---|---|---|---|
bumbl#ebéé⸱⸱ | False | Espionage | False |
Optim'us | False | Leader | False |
ironhide& | False | Security | False |
df.cols.mean("height")
23.833333333333332
df.cols.mean("*")
{'rank': {'mean': 8.0}, 'height': {'mean': 23.833333333333332}}
Apply a user-defined function to a column, casting the result to float:
def func(value, args):
    # Add one to every value in the column
    return value + 1

df.cols.apply("height", func, "float").table()
names (string, nullable) | height (float, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 18.5 | Espionage | 7 |
Optim'us | 29.0 | Leader | 10 |
ironhide& | 27.0 | Security | 7 |
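A hedged sketch of the plain-Spark equivalent, which wraps the function in a UDF (add_one is an illustrative name):
from pyspark.sql import functions as F
from pyspark.sql.types import FloatType

# Wrap the Python function in a UDF and overwrite the column with it
add_one = F.udf(lambda v: v + 1 if v is not None else None, FloatType())
df.withColumn("height", add_one(df["height"])).show()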
df.cols.count_uniques("*")
{'names': {'approx_count_distinct': 3}, 'height': {'approx_count_distinct': 3}, 'function': {'approx_count_distinct': 3}, 'rank': {'approx_count_distinct': 2}}
df \
.cols.lower("names") \
.cols.upper("function").table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | ESPIONAGE | 7 |
optim'us | 28.0 | LEADER | 10 |
ironhide& | 26.0 | SECURITY | 7 |
Optimus provides an intuitive way to concatenate dataframes by columns or rows.
df_new = op.create.df(
    [
        "class"
    ],
    [
        # Note the trailing commas: each row must be a tuple,
        # and ("Autobot") without a comma is just a string
        ("Autobot",),
        ("Autobot",),
        ("Autobot",),
        ("Autobot",),
        ("Decepticons",),
    ]).h_repartition(1)
op.append([df, df_new], "columns").table()
df_new = op.create.df(
[
"names",
"height",
"function",
"rank",
],
[
("Grimlock", 22.9, "Dinobot Commander", 9),
]).h_repartition(1)
op.append([df, df_new], "rows").table()
names (string, nullable) | height (string, nullable) | function (string, nullable) | rank (string, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Grimlock | 22.9 | Dinobot⸱Commander | 9 |
# Operations like `join` and `group` are handled using Spark directly
# melt() unpivots the dataframe: id_vars stay as identifiers, and each
# value_vars column becomes a (variable, value) pair per row
df_melt = df.melt(id_vars=["names"], value_vars=["height", "function", "rank"])
df.table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
df_melt.pivot("names", "variable", "value").table()
names (string, nullable) | function (string, nullable) | height (string, nullable) | rank (string, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | Espionage | 17.5 | 7 |
ironhide& | Security | 26.0 | 7 |
Optim'us | Leader | 28.0 | 10 |
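With the standard Spark API the same pivot can be expressed as a grouped aggregation, as in this sketch:
from pyspark.sql import functions as F

# Group by the id column, turn the 'variable' values back into headers,
# and take the first value found for each cell
df_melt.groupBy("names").pivot("variable").agg(F.first("value")).show()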
Plot a histogram of the height column using 10 buckets:
df.plot.hist("height", 10)
Plot frequency charts for every column:
df.plot.frequency("*", 10)
df.cols.names()
['names', 'height', 'function', 'rank']
df.to_json()
df.schema
StructType(List(StructField(names,StringType,true),StructField(height,DoubleType,true),StructField(function,StringType,true),StructField(rank,LongType,true)))
df.table()
names
1 (string)
nullable
|
height
2 (double)
nullable
|
function
3 (string)
nullable
|
rank
4 (bigint)
nullable
|
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Run the Optimus profiler over the height column:
op.profiler.run(df, "height", infer=True)
Dataset info
Number of columns | 4 |
Number of rows | 3 |
Total Missing (%) | 0.0% |
Total size in memory | 81.7 MB |
Column types
String | 0 |
Numeric | 1 |
Date | 0 |
Bool | 0 |
Array | 0 |
Not available | 0 |
Unique | 3 |
Unique (%) | 100.0 |
Missing | 0.0 |
Missing (%) | 0 |
String | 0 |
Integer | 0 |
Float | 0 |
Bool | 0 |
Date | 0 |
Missing | 0 |
Null | 0 |
Mean | 23.833333333333332 |
Minimum | 17.5 |
Maximum | 28.0 |
Zeros(%) | 0 |
Value | Count | Frequency (%) |
---|---|---|
28.0 | 1 | 33.333% |
26.0 | 1 | 33.333% |
17.5 | 1 | 33.333% |
"Missing" | 0 | 0.0% |
Minimum | 17.5 |
5-th percentile | 17.5 |
Q1 | 17.5 |
Median | 17.5 |
Q3 | 17.5 |
95-th percentile | 17.5 |
Maximum | 28.0 |
Range | 10.5 |
Interquartile range | 0.0 |
Standard deviation | 5.575242894559244 |
Coef of variation | 0.23393 |
Kurtosis | -1.5000000000000004 |
Mean | 23.833333333333332 |
MAD | 0.0 |
Skewness | 0 |
Sum | 71.5 |
Variance | 31.083333333333336 |
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Loading a dataframe from a CSV file over HTTP; Optimus downloads the file and then creates the dataframe:
df_csv = op.load.csv("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv").limit(5)
df_csv.table()
Downloading foo.csv from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.csv
Downloaded 967 bytes
Creating DataFrame for foo.csv. Please wait...
Successfully created DataFrame for 'foo.csv'
id (int, nullable) | firstName (string, nullable) | lastName (string, nullable) | billingId (int, nullable) | product (string, nullable) | price (int, nullable) | birth (string, nullable) | dummyCol (string, nullable) |
---|---|---|---|---|---|---|---|
1 | Luis | Alvarez$$%! | 123 | Cake | 10 | 1980/07/07 | never |
2 | André | Ampère | 423 | piza | 8 | 1950/07/08 | gonna |
3 | NiELS | Böhr//((%% | 551 | pizza | 8 | 1990/07/09 | give |
4 | PAUL | dirac$ | 521 | pizza | 8 | 1954/07/10 | you |
5 | Albert | Einstein | 634 | pizza | 8 | 1990/07/11 | up |
Loading a dataframe from a JSON file works the same way:
df_json = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json").limit(5)
df_json.table()
Downloading foo.json from https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json
Downloaded 2596 bytes
Creating DataFrame for foo.json. Please wait...
Successfully created DataFrame for 'foo.json'
billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable) |
---|---|---|---|---|---|---|---|
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake |
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza |
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza |
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza |
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza |
Save the dataframe to a CSV file:
df_csv.save.csv("test.csv")
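If you prefer the standard Spark writer (which produces a directory of part files rather than a single file; the output path here is illustrative), a minimal sketch:
# coalesce(1) forces a single part file inside the 'test_spark.csv' folder
df_csv.coalesce(1).write.mode("overwrite").option("header", "true").csv("test_spark.csv")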
df.table()
names (string, nullable) | height (double, nullable) | function (string, nullable) | rank (bigint, nullable) |
---|---|---|---|
bumbl#ebéé⸱⸱ | 17.5 | Espionage | 7 |
Optim'us | 28.0 | Leader | 10 |
ironhide& | 26.0 | Security | 7 |
Load the whole JSON file, this time without the limit used earlier:
df = op.load.json("https://raw.githubusercontent.com/ironmussa/Optimus/master/examples/data/foo.json")
df.table()
billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable) |
---|---|---|---|---|---|---|---|
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake |
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza |
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza |
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza |
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza |
672 | 1930/08/12 | never | Galileo | 6 | ⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱GALiLEI | 5 | arepa |
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco |
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo |
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco |
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco |
import requests

def func_request(params):
    # You can use here whatever header or auth info you need to send.
    # For more information see the requests library
    url = "https://jsonplaceholder.typicode.com/todos/" + str(params["id"])
    return requests.get(url)

def func_response(response):
    # Here you can parse the response
    return response["title"]
Optimus can enrich a dataframe with data fetched from an external API, using MongoDB to store intermediate results:
e = op.enrich(host="localhost", port=27017, db_name="jazz")
e.flush()
df_result = e.run(df, func_request, func_response, calls=60, period=60, max_tries=8)
df_result.table()
billingId (bigint, nullable) | birth (string, nullable) | dummyCol (string, nullable) | firstName (string, nullable) | id (bigint, nullable) | lastName (string, nullable) | price (bigint, nullable) | product (string, nullable) | jazz_results (string, nullable) |
---|---|---|---|---|---|---|---|---|
123 | 1980/07/07 | never | Luis | 1 | Alvarez$$%! | 10 | Cake | delectus aut autem |
423 | 1950/07/08 | gonna | André | 2 | Ampère | 8 | piza | quis ut nam facilis et officia qui |
551 | 1990/07/09 | give | NiELS | 3 | Böhr//((%% | 8 | pizza | fugiat veniam minus |
521 | 1954/07/10 | you | PAUL | 4 | dirac$ | 8 | pizza | et porro tempora |
634 | 1990/07/11 | up | Albert | 5 | Einstein | 8 | pizza | laboriosam mollitia et enim quasi adipisci quia provident illum |
672 | 1930/08/12 | never | Galileo | 6 | ⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱⸱GALiLEI | 5 | arepa | qui ullam ratione quibusdam voluptatem quia omnis |
323 | 1970/07/13 | gonna | CaRL | 7 | Ga%%%uss | 3 | taco | illo expedita consequatur quia in |
624 | 1950/07/14 | let | David | 8 | H$$$ilbert | 3 | taaaccoo | quo adipisci enim quam ut ab |
735 | 1920/04/22 | you | Johannes | 9 | KEPLER | 3 | taco | molestiae perspiciatis ipsa |
875 | 1923/03/12 | down | JaMES | 10 | M$$ax%%well | 3 | taco | illo est ratione doloremque quia maiores aut |