The Wendelin project started at the beginning of 2015 with Nexedi as consortium leader in charge of managing the development of a big data solution "made in France". Under the Wendelin umbrella there is the Movement Visualization (MOVIS) project. This project is focused on implementing data visualization features, such as pivot tables and charts, using the scientific libraries already present in the Wendelin project and adding more to support richer, interactive visualizations.
Wendelin is getting more mature as time goes by and it is already being put to the test with many prototypes. This time a prototype of the Inventory API was created. The Inventory API is one of the areas of ERP5 that demands the most processing power, as it runs calculations over all of a client's stock movements. These inventories also hold valuable information for companies, which may request many inventories with different parameters for internal research, along with tables and charts based on this data for better visualization of important insights (such as trends and cycles).
Some inventories are so big that even MariaDB takes too much time to calculate them. The current implementation of the Inventory API relies on a very complex SQL query over the stock table. Depending on the client, this table can have millions of rows, which makes the query very slow.
Speaking of data visualization, this prototype includes integration with PivotTableJs. This tool turns any dataset into a summary table and adds a 2-dimensional drag-and-drop interface that lets any user manipulate the summary table, much like the pivot tables found in older versions of Microsoft Excel. With the included addons it can render not only tables but also various kinds of charts, turning the pivot table into a pivot chart.
From a technical point of view, MariaDB was a single point of failure: it is very hard to decentralize and parallelize a relational database in a transactional system. With Wendelin, all the stock data that was inside MariaDB can be moved to a storage system created specifically for that purpose: NEO, a distributed, redundant and transactional storage designed as an alternative to ZEO and FileStorage.
The prototype is an optional replacement for, or complement to, the original Inventory API and is in early development. It is built on top of Wendelin's Data Array to allow processing volumes of data much bigger than the available memory. Because the Data Array is compatible with NumPy's ndarray, we can also take advantage of Pandas to create a Data Frame with all the data and get advanced indexing and filtering features.
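Because the Data Array exposes a NumPy-compatible interface, turning it into a Pandas DataFrame is essentially a one-liner. The sketch below uses a plain structured ndarray as a stand-in for a real Data Array; the column names and values are illustrative, not taken from an actual instance:

```python
import numpy as np
import pandas as pd

# Stand-in for a Wendelin Data Array: a NumPy structured array
# with a few stock-movement-like columns (hypothetical values).
stock = np.array(
    [('2015-08-11', 17408, 16550, 5.0),
     ('2015-08-12', 17408, 16550, -2.0),
     ('2015-08-12', 19534, 16551, 7.0)],
    dtype=[('date', 'datetime64[D]'), ('resource_uid', 'i8'),
           ('node_uid', 'i8'), ('quantity', 'f8')],
)

# Pandas accepts any ndarray-compatible object, so the same call
# would work on the real Data Array.
df = pd.DataFrame(stock)

# Advanced indexing and filtering come for free.
print(df[df['resource_uid'] == 17408]['quantity'].sum())  # 3.0
```

The same pattern scales to out-of-core data, since the Data Array only loads the chunks that are actually accessed.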
There are 2 steps before the user can actually use this prototype Pandas-based Inventory API: create the Data Array with stock movement data, then add the category information to each stock movement. After these two steps the array is ready to be filtered as the user wishes.
The first thing to do is to create and fill a Data Array with data from the stock table. To help import the data, a class was created that transforms the result of any ERP5 query into a Data Array with an equivalent data type. The
SaleOrderModule_zGetQuantityList object is a simple ZSQLMethod that gets all the rows in the
stock table, as simple as:
SELECT * FROM stock;
data = context.sale_order_module.SaleOrderModule_zGetQuantityList()
context.Base_convertResultsToBigArray(data, reference='WendelinJupyter')
<Data Array at /erp5/data_array_module/1>
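The conversion helper itself is not shown here, but the idea behind Base_convertResultsToBigArray can be sketched as deriving a NumPy structured dtype from the query's columns and filling an array row by row. Everything below (function name, column names, values) is an illustrative assumption, not the actual implementation:

```python
import numpy as np

def results_to_array(columns, rows):
    """Toy version of the query-to-array conversion: build a
    structured dtype from the result columns and copy the rows.
    A real implementation maps SQL types to NumPy types and
    writes into a Data Array instead of plain memory."""
    dtype = np.dtype([(name, kind) for name, kind in columns])
    array = np.empty(len(rows), dtype=dtype)
    for i, row in enumerate(rows):
        array[i] = row
    return array

# Hypothetical miniature result of "SELECT * FROM stock;".
columns = [('uid', 'i8'), ('resource_uid', 'i8'), ('quantity', 'f8')]
rows = [(1, 17408, 5.0), (2, 17408, -2.0)]
stock_array = results_to_array(columns, rows)
print(stock_array['quantity'])  # [ 5. -2.]
```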
Now it's time to import the category information of each stock movement into the Data Array. This is necessary in situations where, for example, you want only the stock movements of a resource that belongs to a specific category. The method
Base_fillPandasInventoryCategoryList takes care of querying the catalog in the most efficient way possible to get the needed category information and store it in the Data Array.
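Each category column holds the uids of every category the movement's resource (or node, section, ...) belongs to, so filtering by category boils down to a membership test. A minimal sketch, assuming the categories are stored as comma-separated uid strings as in the sample output further down (the frame and the helper are made up for illustration):

```python
import pandas as pd

# Toy frame mimicking the category columns added by
# Base_fillPandasInventoryCategoryList (values are hypothetical).
df = pd.DataFrame({
    'resource_uid': [17408, 17408, 19000],
    'resource_category': ['15831,19337,19534', '15831,19534', '15826'],
    'quantity': [5.0, -2.0, 1.0],
})

def in_category(series, category_uid):
    """True where the comma-separated uid list contains category_uid."""
    uid = str(category_uid)
    return series.str.split(',').apply(lambda uids: uid in uids)

# Keep only movements whose resource is in category 19534.
swimsuits = df[in_category(df['resource_category'], 19534)]
print(swimsuits['quantity'].sum())  # 3.0
```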
Finally the Data Frame is ready to be filtered. Compatibility with the original Inventory API was kept in mind while developing the prototype: both functions receive the same parameters. The only difference is that the prototype returns a Pandas.DataFrame instead of ERP5 objects. With the Data Frame the developer can do further processing to improve the visualization of the data: more filtering, (multilevel) indexing, grouping, sorting, etc.
swimsuit_uid = 17408
resource_product_line_uid = 19534

data_frame = context.Base_getInventoryDataFrame(
    is_accountable=False,
    omit_input=True,
    resource_uid=swimsuit_uid,
    from_date='2015-08-11',
    to_date='2015-09-13',
    simulation_state='planned',
    resource_product_line_uid=resource_product_line_uid,
)
data_frame.head()
           date  explanation_uid  function_uid  funding_uid  is_accountable  \
372  2015-09-12            22278             0            0               0
374  2015-09-11            22280             0            0               0
376  2015-09-10            22282             0            0               0
378  2015-09-09            22284             0            0               0
380  2015-09-08            22286             0            0               0

     is_cancellation mirror_date  mirror_node_uid  mirror_section_uid  \
372                0  2015-01-01            17407               17407
374                0  2015-01-01            17407               17407
376                0  2015-01-01            17407               17407
378                0  2015-01-01            17407               17407
380                0  2015-01-01            17407               17407

     node_uid  ...        resource_category      node_category  \
372     16550  ...  15831,15826,19337,19534  19310,20362,20272
374     16550  ...  15831,15826,19337,19534  19310,20362,20272
376     16550  ...  15831,15826,19337,19534  19310,20362,20272
378     16550  ...  15831,15826,19337,19534  19310,20362,20272
380     16550  ...  15831,15826,19337,19534  19310,20362,20272

     payment_category   section_category  mirror_section_category  \
372                 0  19310,20362,20272        19309,20272,20362
374                 0  19310,20362,20272        19309,20272,20362
376                 0  19310,20362,20272        19309,20272,20362
378                 0  19310,20362,20272        19309,20272,20362
380                 0  19310,20362,20272        19309,20272,20362

     function_category  project_category  funding_category  \
372                  0                 0                 0
374                  0                 0                 0
376                  0                 0                 0
378                  0                 0                 0
380                  0                 0                 0

     payment_request_category  movement_category
372                         0  17408,15831,15826
374                         0  17408,15831,15826
376                         0  17408,15831,15826
378                         0  17408,15831,15826
380                         0  17408,15831,15826

[5 rows x 33 columns]
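The kind of post-processing mentioned above, such as grouping with a multilevel index and sorting, can be sketched on toy data that mimics a few of the frame's columns (all uids and quantities below are hypothetical):

```python
import pandas as pd

# Toy stock movements (hypothetical uids and quantities).
df = pd.DataFrame({
    'date': pd.to_datetime(['2015-09-10', '2015-09-10', '2015-09-11']),
    'node_uid': [16550, 16550, 16551],
    'resource_uid': [17408, 17408, 17408],
    'quantity': [3.0, 2.0, 4.0],
})

# Multilevel grouping: total quantity per (node, resource) pair.
totals = df.groupby(['node_uid', 'resource_uid'])['quantity'].sum()
print(totals.loc[(16550, 17408)])  # 5.0

# Sort days by total movement, busiest first.
by_day = df.groupby('date')['quantity'].sum().sort_values(ascending=False)
```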
PivotTableJs comes to the rescue of users and developers who want fast insights from their data. All they have to do is put the data in a Pandas.DataFrame and use the external method
Base_erp5PivotTableUI. This method is integrated with the ERP5 Jupyter kernel and renders the pivot table user interface with the input data. You can interact with the demo pivot table generated by the code below.
import pandas as pd

df = context.Base_getInventoryDataFrame(
    is_accountable=False,
    omit_input=True,
    simulation_state='planned',
    resource_uid=17408,
    from_date='2015-08-11',
    to_date='2015-09-13',
)

# bar chart: sum of quantity vs date by resource_uid/node_uid
columns = ['date', 'resource_uid', 'quantity', 'node_uid']
columns_to_delete = df.columns.difference(columns)
for column in columns_to_delete:
    df.drop(column, axis=1, inplace=True)

context.Base_renderAsHtml(
    context.Base_erp5PivotTableUI(df)
)
The GIF below shows a demonstration of the PivotTableJs UI running inside a Jupyter notebook from a Wendelin instance. It is fed with planned, non-accountable stock movements of the resource with UID 17408, omitting all the input movements. Then columns are dragged and dropped to organise the data in a meaningful way: node_uid and resource_uid are moved to the rows and date to the columns, producing an "output movements of resource_uid at node_uid over time" table. The next step is changing the aggregation function to the sum of the quantity column to get the total output of the resource at the given nodes per day. After the data is organised, the source is edited: only dates between 2015-08-13 and 2015-09-09 are taken into account and the representation is changed from a simple table to a beautiful line chart.
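The same "sum of quantity per node/resource over time" table that is assembled by drag and drop in the UI can also be produced programmatically with pandas.pivot_table, which is handy for scripted reports. A minimal sketch on toy data (all values hypothetical):

```python
import pandas as pd

# Toy stock movements (hypothetical uids, dates and quantities).
df = pd.DataFrame({
    'date': ['2015-08-13', '2015-08-13', '2015-08-14'],
    'node_uid': [16550, 16551, 16550],
    'resource_uid': [17408, 17408, 17408],
    'quantity': [3.0, 2.0, 4.0],
})

# Rows: node_uid/resource_uid, columns: date, cells: sum(quantity),
# mirroring the configuration dragged together in the animation.
table = pd.pivot_table(
    df, values='quantity',
    index=['node_uid', 'resource_uid'],
    columns='date', aggfunc='sum', fill_value=0,
)
print(table.loc[(16550, 17408), '2015-08-14'])  # 4.0
```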
from IPython.display import Image

gif_url = 'https://media.giphy.com/media/xT9DPOo4UEuKSEPmoM/giphy.gif'
context.Base_renderAsHtml(
    Image(url=gif_url)
)
Wendelin is a project in constant evolution with a simple objective: to bring the tools known in the scientific community, along with their high performance, to the ERP5 platform. Scikit-learn, statsmodels and Pandas are already integrated and more integrations are in the works. Everything is backed by Wendelin's NumPy-compatible array, letting you extend your computation beyond the memory limits of your servers without losing the valuable insights achievable through the most famous scientific libraries in the Python community.