#!/usr/bin/env python
# coding: utf-8
# ## Get values from jupyterlab-spreadsheet-editor formula results to use elsewhere
#
# Hacky answer to post ['Convert formulas into values using jupyterlab-spreadsheet-editor'](https://discourse.jupyter.org/t/convert-formulas-into-values-using-jupyterlab-spreadsheet-editor/30704?u=fomightez) on Jupyter Discourse page.
#
# Note that [jupyterlab-spreadsheet-editor](https://github.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor) is based on use of [jexcel/Jspreadsheet CE](https://github.com/jspreadsheet/ce).
#
#
# ### Set-up:
#
# To have something to work with, I went to [here](https://github.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor?tab=readme-ov-file#jupyterlab-spreadsheet-editor) in the JupyterLab Spreadsheet Editor repo and clicked on the `launch binder` badge to launch a session with the extension working. And then in the sessio essentially followed [the gif](https://raw.githubusercontent.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor/main/screenshots/formula-support.gif) below 'Formula support basic formula calculation (rendering) - as implemented by jExcel' as a guide to make a table that uses formulas.
#
# Got:
#
# 
#
# Note the actual content I got from copying that table, which from was made like that using [the gif](https://raw.githubusercontent.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor/main/screenshots/formula-support.gif) below 'Formula support basic formula calculation (rendering) - as implemented by jExcel' as a guide:
#
# ```text
# 5
# 1 =A3*B2
# 2 =A4*B3
# 3 =A5*B4
# 4 =A6*B5
# 5 =A7*B6
# ```
# ----------
#
# ### Getting the HTML of the resulting table
#
# I figured that the HTML representation had the values because they are rendered and so I should be able to get them from there at least.
# Getting the HTML table code:
# In the jupyterlab-spreadsheet-editor view of my demo spreadheeet, did `Shift` + Right-click on bottom value in column that I made with formula and choose from Chrome broser menu `Inspect`. (You may need to have developer tools on prior to this?) In my example it was column B. Then I right-clicked on line that is `` below that in the HTML elements listing. (For some reason when I did that, it looked more like ` == $0`.) Then I right-clicked and choose `Copy` > `OuterHTML`.
#
# I examined what that looked like by adding `%%HTML` cell magic to the start of a Jupyter code cell and pastin in the copied HTML below it, like so:
# In[1]:
get_ipython().run_cell_magic('HTML', '', '
| A | B | C | D |
1 | | | | |
2 | | 5 | | |
3 | 1 | 5 | | |
4 | 2 | 10 | | |
5 | 3 | 30 | | |
6 | 4 | 120 | | |
7 | 5 | 600 | | |
\n')
# I could see that it indeed had the values in there.
# #### How to get that as values from the HTML now?
#
# Use BeautifulSoup to parse the HTML code.
# First, assign the code text to a variable:
# In[1]:
h=''' | A | B | C | D |
1 | | | | |
2 | | 5 | | |
3 | 1 | 5 | | |
4 | 2 | 10 | | |
5 | 3 | 30 | | |
6 | 4 | 120 | | |
7 | 5 | 600 | | |
'''
# Now can parse that table.
# In[2]:
# based on https://stackoverflow.com/a/45843420/8508004
from bs4 import BeautifulSoup
html = h
soup = BeautifulSoup(html, 'html.parser')
tables = [
[
[td.get_text(strip=True) for td in tr.find_all('td')]
for tr in table.find_all('tr')
]
for table in soup.find_all('table')
]
# In[ ]:
tables
# In[3]:
print(tables)
# Now collect the B column and the values.
# In[4]:
column_b = [row[2] for row in tables[0][1:] if len(row) > 2]
column_b_nums = [int(x) for x in column_b if x.strip()]
# In[5]:
column_b_nums
# There's the values of the formula.
#
# What if just wanted a CSV?
#
# -----
# ## CSV from the HTML representation
#
# Since the HTML representation has the values, that can be used to make a CSV that won't have the formulas and instead has the values as requested in the post ['Convert formulas into values using jupyterlab-spreadsheet-editor'](https://discourse.jupyter.org/t/convert-formulas-into-values-using-jupyterlab-spreadsheet-editor/30704?u=fomightez) on Jupyter Discourse page.
# In[1]:
h=''' | A | B | C | D |
1 | | | | |
2 | | 5 | | |
3 | 1 | 5 | | |
4 | 2 | 10 | | |
5 | 3 | 30 | | |
6 | 4 | 120 | | |
7 | 5 | 600 | | |
'''
import csv
import io
from bs4 import BeautifulSoup
def html_table_to_csv(html):
# Parse the HTML
soup = BeautifulSoup(html, 'html.parser')
# Find the table
table = soup.find('table')
# Prepare a CSV output
output = io.StringIO()
csv_writer = csv.writer(output)
# Extract rows
for tr in table.find_all('tr'):
# Extract cell texts, stripping whitespace
# Skip the first column (row numbers) by slicing [1:]
row = [td.get_text(strip=True) for td in tr.find_all('td')[1:]]
# Only write non-empty rows
if row:
csv_writer.writerow(row)
# Get the CSV content as a string
csv_content = output.getvalue()
return csv_content
# Convert the HTML to CSV
csv_output = html_table_to_csv(h)
print(csv_output)
# That works.
#
# So save it as a file:
# In[2]:
get_ipython().run_line_magic('store', 'csv_output >csv_output.csv')
# In[3]:
cat csv_output.csv
# That looks like the following in the result, which is pretty close to the initially rendered table that had the formulas, but now only has values:
#
# 