#!/usr/bin/env python # coding: utf-8 # ## Get values from jupyterlab-spreadsheet-editor formula results to use elsewhere # # Hacky answer to post ['Convert formulas into values using jupyterlab-spreadsheet-editor'](https://discourse.jupyter.org/t/convert-formulas-into-values-using-jupyterlab-spreadsheet-editor/30704?u=fomightez) on Jupyter Discourse page. # # Note that [jupyterlab-spreadsheet-editor](https://github.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor) is based on use of [jexcel/Jspreadsheet CE](https://github.com/jspreadsheet/ce). # # # ### Set-up: # # To have something to work with, I went to [here](https://github.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor?tab=readme-ov-file#jupyterlab-spreadsheet-editor) in the JupyterLab Spreadsheet Editor repo and clicked on the `launch binder` badge to launch a session with the extension working. And then in the sessio essentially followed [the gif](https://raw.githubusercontent.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor/main/screenshots/formula-support.gif) below 'Formula support basic formula calculation (rendering) - as implemented by jExcel' as a guide to make a table that uses formulas. # # Got: # # ![jl-spreadsheet_demo.png](attachment:36b6c15f-b6a0-4b8d-b0be-f6e973c6cc2c.png) # # Note the actual content I got from copying that table, which from was made like that using [the gif](https://raw.githubusercontent.com/jupyterlab-contrib/jupyterlab-spreadsheet-editor/main/screenshots/formula-support.gif) below 'Formula support basic formula calculation (rendering) - as implemented by jExcel' as a guide: # # ```text # 5 # 1 =A3*B2 # 2 =A4*B3 # 3 =A5*B4 # 4 =A6*B5 # 5 =A7*B6 # ``` # ---------- # # ### Getting the HTML of the resulting table # # I figured that the HTML representation had the values because they are rendered and so I should be able to get them from there at least. # Getting the HTML table code: # In the jupyterlab-spreadsheet-editor view of my demo spreadheeet, did `Shift` + Right-click on bottom value in column that I made with formula and choose from Chrome broser menu `Inspect`. (You may need to have developer tools on prior to this?) In my example it was column B. Then I right-clicked on line that is `` below that in the HTML elements listing. (For some reason when I did that, it looked more like ` == $0`.) Then I right-clicked and choose `Copy` > `OuterHTML`. # # I examined what that looked like by adding `%%HTML` cell magic to the start of a Jupyter code cell and pastin in the copied HTML below it, like so: # In[1]: get_ipython().run_cell_magic('HTML', '', '
ABCD
1
25
315
4210
5330
64120
75600
\n') # I could see that it indeed had the values in there. # #### How to get that as values from the HTML now? # # Use BeautifulSoup to parse the HTML code. # First, assign the code text to a variable: # In[1]: h='''
ABCD
1
25
315
4210
5330
64120
75600
''' # Now can parse that table. # In[2]: # based on https://stackoverflow.com/a/45843420/8508004 from bs4 import BeautifulSoup html = h soup = BeautifulSoup(html, 'html.parser') tables = [ [ [td.get_text(strip=True) for td in tr.find_all('td')] for tr in table.find_all('tr') ] for table in soup.find_all('table') ] # In[ ]: tables # In[3]: print(tables) # Now collect the B column and the values. # In[4]: column_b = [row[2] for row in tables[0][1:] if len(row) > 2] column_b_nums = [int(x) for x in column_b if x.strip()] # In[5]: column_b_nums # There's the values of the formula. # # What if just wanted a CSV? # # ----- # ## CSV from the HTML representation # # Since the HTML representation has the values, that can be used to make a CSV that won't have the formulas and instead has the values as requested in the post ['Convert formulas into values using jupyterlab-spreadsheet-editor'](https://discourse.jupyter.org/t/convert-formulas-into-values-using-jupyterlab-spreadsheet-editor/30704?u=fomightez) on Jupyter Discourse page. # In[1]: h='''
ABCD
1
25
315
4210
5330
64120
75600
''' import csv import io from bs4 import BeautifulSoup def html_table_to_csv(html): # Parse the HTML soup = BeautifulSoup(html, 'html.parser') # Find the table table = soup.find('table') # Prepare a CSV output output = io.StringIO() csv_writer = csv.writer(output) # Extract rows for tr in table.find_all('tr'): # Extract cell texts, stripping whitespace # Skip the first column (row numbers) by slicing [1:] row = [td.get_text(strip=True) for td in tr.find_all('td')[1:]] # Only write non-empty rows if row: csv_writer.writerow(row) # Get the CSV content as a string csv_content = output.getvalue() return csv_content # Convert the HTML to CSV csv_output = html_table_to_csv(h) print(csv_output) # That works. # # So save it as a file: # In[2]: get_ipython().run_line_magic('store', 'csv_output >csv_output.csv') # In[3]: cat csv_output.csv # That looks like the following in the result, which is pretty close to the initially rendered table that had the formulas, but now only has values: # # ![jl-spreadsheet_demo_result.png](attachment:ab008dea-47de-40be-b815-d0f841e25c32.png)