This tutorial not only helps you ramp up on Fugue but, more importantly, helps you better understand the basic concepts of distributed computing from a higher level. The philosophy of Fugue is to adapt to you: as long as you understand the basic concepts, you can simply use Fugue to express or glue together your logic. Most of your code will stay in native Python.
How to quickly start playing with Fugue.
A Fugue use case in NLP preprocessing. It gives a general idea of what Fugue is trying to solve, and why we want to build on the Fugue layer rather than directly on Pandas.
A deep dive into the programming interfaces used in the sentiment analysis example. This tutorial covers most features of the Fugue programming interface.
Another Fugue example. This one shows how to use Fugue SQL to do data analysis.
The most fun part of Fugue. You can use SQL instead of Python to represent the backbone of your workflow and, with the help of Python extensions, stay in a SQL mindset most of the time. The SQL mindset is great for distributed computing: thinking in SQL may help make your logic more scale agnostic. This tutorial covers all the syntax of Fugue SQL.
The previous tutorials showed plenty of extension examples. Here is a complete guide to using Fugue extensions.
The most useful extension, widely used in the real world.
Transformation on multiple dataframes partitioned in the same way
Creators of dataframes for a DAG to use
Taking in one or multiple dataframes and producing a single dataframe
Taking in one or multiple dataframes to do final jobs such as saving and printing
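The extension roles listed above can be sketched as plain Python functions. This is a conceptual illustration only: it does not use the Fugue API, the function names (`create_reviews`, `add_length`, `union_all`, `show`) are hypothetical, and a list of dicts stands in for a dataframe.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def create_reviews() -> List[Row]:
    # Creator role: produces a dataframe for the rest of the workflow to use.
    return [
        {"id": 1, "text": "Great product"},
        {"id": 2, "text": "bad service"},
    ]


def add_length(rows: List[Row]) -> List[Row]:
    # Transformer role: logic applied to one partition of one dataframe.
    return [{**row, "length": len(row["text"])} for row in rows]


def union_all(*frames: List[Row]) -> List[Row]:
    # Processor role: one or more dataframes in, a single dataframe out.
    merged: List[Row] = []
    for frame in frames:
        merged.extend(frame)
    return merged


def show(*frames: List[Row]) -> None:
    # Outputter role: a final job on one or more dataframes, returning nothing.
    for frame in frames:
        for row in frame:
            print(row)


reviews = create_reviews()
# Pretend the data was split into two partitions and transformed separately.
combined = union_all(add_length(reviews[:1]), add_length(reviews[1:]))
show(combined)
```

Each function only knows about its own inputs and outputs, which is what lets a framework decide where and how to run it.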
It's time to build a systematic understanding of the Fugue architecture.
Fugue data types and schema are strictly based on Apache Arrow. DataFrame is an abstract concept with several built-in implementations that adapt to different dataframes. In this tutorial, we will go through the basic APIs and focus on the most common use cases.
This tutorial focuses on explaining the basic ideas of data partitioning and is less about Fugue itself. A good understanding of partitioning is the key to writing high-performance code.
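As a rough illustration of the idea, the sketch below partitions rows by a key and then runs the same logic on every partition independently. It uses only the Python standard library, not Fugue; `partition_by`, `map_partitions`, and `top_score` are hypothetical helper names chosen for this example.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


def partition_by(rows: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows so that all rows sharing a key value land in one partition."""
    partitions: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def map_partitions(
    partitions: Dict[Any, List[Row]],
    func: Callable[[List[Row]], List[Row]],
) -> List[Row]:
    """Apply func to each partition independently and concatenate the results."""
    out: List[Row] = []
    for part in partitions.values():
        out.extend(func(part))
    return out


def top_score(part: List[Row]) -> List[Row]:
    # Per-partition logic: keep only the highest-scoring row.
    return [max(part, key=lambda r: r["score"])]


rows = [
    {"user": "a", "score": 3},
    {"user": "b", "score": 5},
    {"user": "a", "score": 7},
    {"user": "b", "score": 1},
]
result = map_partitions(partition_by(rows, "user"), top_score)
```

Because `top_score` never looks outside its own partition, the partitions could be processed on different machines without changing the logic. The choice of partition key determines how evenly that work is spread.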
The heart of Fugue. It is the layer that unifies many of the core concepts of distributed computing and separates the underlying computing frameworks from users' higher-level logic. Normally you don't operate directly on execution engines, but it's good to understand some basics.
You may often see X-like objects in the Fugue API documentation. Here is a complete list of these objects and the ways to initialize them.