Last updated: June 22, 2021
The goal of this tutorial is to explain basic intuitions about modeling with Aerospike. The key to getting the most out of Aerospike is to find the right way to match an application’s object model and data access needs to Aerospike’s data model and access methods.
This notebook contains:
This notebook does not include:
Other tutorials will focus on these facets of modeling in more detail.
This Jupyter Notebook requires the Aerospike Database running locally with Java kernel and Aerospike Java Client. To create a Docker container that satisfies the requirements and holds a copy of these notebooks, visit the Aerospike Notebooks Repo.
Make it easier to work with Java in Jupyter.
import io.github.spencerpark.ijava.IJava;
import io.github.spencerpark.jupyter.kernel.magic.common.Shell;
IJava.getKernelInstance().getMagics().registerMagics(Shell.class);
Ensure Aerospike Database is running locally.
%sh asd
Ask Maven to download and install the project object model (POM) of the Aerospike Java Client.
%%loadFromPOM
<dependencies>
<dependency>
<groupId>com.aerospike</groupId>
<artifactId>aerospike-client</artifactId>
<version>5.0.0</version>
</dependency>
</dependencies>
Create an instance of the Aerospike Java Client, and connect to the demo cluster.
The default cluster location for the Docker container is localhost port 3000. If your cluster is not running on your local machine, modify localhost and 3000 to the values for your Aerospike cluster.
import com.aerospike.client.AerospikeClient;
AerospikeClient client = new AerospikeClient("localhost", 3000);
System.out.println("Initialized the client and connected to the cluster.");
Initialized the client and connected to the cluster.
Aerospike differs from other Key-Value Stores in its architecture and in the structure and tools that follow from it. One can throw documents of data at Aerospike and get performance that keeps up with most applications. However, when an application must achieve high performance at scale, expert use of Aerospike delivers those results, largely because of structure that other Key-Value Stores do not provide.
Aerospike was architected to efficiently store document-oriented data. Aerospike platform priorities include:
The Aerospike data model is a direct result of these priorities. These modeling notebooks teach the principles behind modeling that will result in proper use.
The pieces of the Aerospike data model can be thought of as a mirror of the anatomy of a relational database.
However, despite the similarities to their RDBMS counterparts, each of these has a well-defined purpose and characteristics that cause it to scale differently from the others.
The best practice is to consider both of the following questions when creating an application’s data model:
Because of Aerospike’s focus on scalability, properly matching the app object and Aerospike data models will result in a highly performant and scalable app.
From the application perspective, this consists of looking at the app's classes to determine the number and size of instances that will be stored in the Aerospike database. Minimum size, maximum size, and average size should all be considered, as well as the duration of storage. In addition, consider implicit dimensions of storage, such as how the data scales over time. Each object, including potentially implicit dimensions, will be directly paired with one or more elements of the Aerospike Data Model. Finally, consider the flows of how the data is created, modified, and deleted.
To determine how to match with the Aerospike data model, let's first discuss the elements of the Aerospike Data Model.
The following are the elements of the Aerospike Data Model:
At low read and write volumes, the above may seem like unnecessary complexity. However, as the application scales, the structure provided by the Aerospike data model allows Aerospike to be used surgically at petabyte scale, more efficiently (measured as ROI x Performance) than most varieties of database product. This is due to Aerospike’s architecture and a flexible data model that creates enough mesh points to match a complex application's object model and implicit data dimensions.
The following sections share modeling-related details and API code for working with those elements.
The Namespace is a top level data container that associates index and data with related storage media and policies that govern the data. Because each type of data in a data model has different read/write profile demands, it is common to divide further. For example, data for an ecommerce app might store the hottest sales items in RAM, where the rest are stored in Flash. In such a circumstance, the application may store some identical data in 2 namespaces – 1 associating a subset of products with RAM storage and 1 associating the full product data set with Flash storage.
Because Namespaces are defined in the Aerospike configuration file, some changes require a rolling warm restart to take effect. This differentiates a Namespace from other data containers.
Each Aerospike server in a cluster has a Primary Index per namespace detailing the location of all records in all storage media on the node. Within the index, each record has a 64-byte footprint. The weight of this footprint suggests that most Aerospike records should be larger than a simple data type field. However, for the rare case of extremely high throughput access, the index can contain a single numeric element instead of the data record’s location.
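The 64-byte figure above lends itself to a quick sizing estimate. The following is a minimal plain-Java sketch of that arithmetic, with no Aerospike calls. The class and method names are hypothetical, and the `copiesPerRecord` parameter (for example, a replication factor of 2) is an assumption that this section does not discuss.

```java
// Hypothetical helper: estimates primary index memory from the 64-byte-per-record figure.
class PrimaryIndexEstimate {
    static final long INDEX_BYTES_PER_RECORD = 64L;
    static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    // Total index bytes across the cluster.
    // copiesPerRecord is an assumption: how many copies of each record the cluster keeps.
    static long totalIndexBytes(long recordCount, int copiesPerRecord) {
        if (recordCount < 0 || copiesPerRecord < 1) {
            throw new IllegalArgumentException("recordCount must be >= 0 and copiesPerRecord >= 1");
        }
        long perCopy = Math.multiplyExact(recordCount, INDEX_BYTES_PER_RECORD);
        return Math.multiplyExact(perCopy, (long) copiesPerRecord);
    }

    // Converts a byte count to GiB for easier reading.
    static double toGiB(long bytes) {
        return bytes / BYTES_PER_GIB;
    }
}
```

Under these assumptions, one billion records kept in two copies come to 128,000,000,000 bytes of index, roughly 119 GiB across the cluster, before any data is stored.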
The Set is an optional label representing a segment of Records in a Namespace. A set facilitates fast access to its members.
A Record is uniquely identified by a namespace and a Digest. The digest is a client-generated RIPEMD-160 20-byte hash of the set name and the user key. The user key is the application’s unique identifier for a record in Aerospike: a string, a number, or a bytestream. The user key can optionally be stored in the Aerospike Database.
The following is Java Client code to create a key using the namespace, set, and user key.
import com.aerospike.client.Key;
String namespaceName = "test";
String setName = "dm101set";
Integer theKey = 0; // A key can be an integer, string, or blob.
Key key = new Key(namespaceName, setName, theKey);
System.out.println("Key created." );
Key created.
Aerospike offers record-level ACID-compliance. That is, Aerospike allows execution of multiple record-operations as one atomic, consistent, isolated, and durable transaction by way of the operate method.
The structure of a record is a Map containing:
A Bin is a flexible container that contains one data Value. A Value has an associated scalar or collection data type; however, a Bin's data type is not formally declared in a schema.
The following Java client code uses the key from the previous code example to put integer and string data into a record in Aerospike.
import com.aerospike.client.Bin;
import com.aerospike.client.policy.ClientPolicy;
String aString = "modeling";
Integer anInteger = 8;
String stringBinName = "str";
String integerBinName = "int";
ClientPolicy clientPolicy = new ClientPolicy();
Bin bin0 = new Bin(stringBinName, aString);
Bin bin1 = new Bin(integerBinName, anInteger);
client.put(clientPolicy.writePolicyDefault, key, bin0, bin1);
System.out.println("Put data into Aerospike: " + stringBinName + "=" + aString + ", " + integerBinName + "=" + anInteger);
Put data into Aerospike: str=modeling, int=8
Use the same key to read the record.
import com.aerospike.client.Record;
Record record = client.get(null, key);
System.out.println("Generation count: " + record.generation);
System.out.println("Record expiration: " + record.expiration);
System.out.println("Bins: " + record.bins);
Generation count: 1
Record expiration: 364672907
Bins: {str=modeling, int=8}
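The expiration value printed above is not a Unix timestamp. The sketch below assumes that the client reports it as seconds since an Aerospike epoch of January 1, 2010, 00:00:00 UTC, and that a value of 0 means the record never expires. It is plain Java with no Aerospike calls, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.util.Optional;

// Hypothetical helper: converts a record's expiration field to a calendar instant.
class ExpirationDecoder {
    // Assumption: the Aerospike epoch is 2010-01-01T00:00:00Z.
    static final Instant AEROSPIKE_EPOCH = Instant.parse("2010-01-01T00:00:00Z");

    // Returns the expiry instant, or empty when the value is 0 (assumed to mean "never expires").
    static Optional<Instant> toInstant(int expirationSeconds) {
        if (expirationSeconds == 0) {
            return Optional.empty();
        }
        return Optional.of(AEROSPIKE_EPOCH.plusSeconds(expirationSeconds));
    }
}
```

Under that assumption, `ExpirationDecoder.toInstant(364672907)` yields 2021-07-22T18:01:47Z, about a month after this notebook's last-updated date.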
Lists and Maps are Collection Data Types (CDTs). These are flexible, schema-less data types that can contain Values of any data type, either scalar data or collection data. Collection Data Types can be nested as deeply as necessary to match an application’s needs.
A List is a collection of Values. For data efficiency, Lists are frequently used as tuples, a lightweight record structure using position instead of field names.
A Map is a collection of mapkey/Value pairs. Maps are commonly used for JSON-like data structures.
Because a Record contains one or more Bins, and a Bin or CDT can contain a scalar data type or collection data type, the most common question to consider when creating an application's data model in Aerospike is whether to store a class instance as a Record, Bin, CDT, or nested CDT.
Create a tuple and put it in Aerospike.
import com.aerospike.client.Value;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
ArrayList<Value> aTuple = new ArrayList<Value>();
aTuple.add(Value.get(9.92));
aTuple.add(Value.get("Carl Lewis"));
aTuple.add(Value.get("Seoul, South Korea"));
aTuple.add(Value.get("September 24, 1988"));
String tupleBinName = "tuple";
Bin bin2 = new Bin(tupleBinName, aTuple);
client.put(clientPolicy.writePolicyDefault, key, bin2);
Record record = client.get(null, key);
System.out.println("Put data into Aerospike: " + tupleBinName + "=" + aTuple);
System.out.println("After operation, Bins: " + record.bins);
System.out.println( tupleBinName + ": " + record.getValue(tupleBinName));
Put data into Aerospike: tuple=[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
After operation, Bins: {str=modeling, int=8, tuple=[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]}
tuple: [9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
Rather than use a simple tuple, this model needs a Map containing a list of Tuples. Reuse the Tuple Bin.
import java.util.HashMap;
String tupleMapKey = "world-records";
ArrayList<Value> tupleList = new ArrayList<Value>();
tupleList.add(Value.get(aTuple));
HashMap <String, ArrayList> wrMap = new HashMap <String, ArrayList>();
wrMap.put(tupleMapKey, tupleList);
Bin bin2 = new Bin(tupleBinName, wrMap);
client.put(clientPolicy.writePolicyDefault, key, bin2);
Record record = client.get(null, key);
System.out.println("After operation, " + tupleBinName + ": " + record.getValue(tupleBinName));
After operation, tuple: {world-records=[[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]]}
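Because a tuple relies on position instead of field names, the application has to keep track of what each position means when it reads a nested structure like the one above. The following plain-Java sketch (no Aerospike calls) rebuilds the same shape with ordinary collections and reads a field by position. The class name, the position constants, and the helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for a map of tuple lists shaped like the "tuple" bin above.
class WorldRecordTuples {
    // Named positions stand in for the field names a tuple does not store.
    static final int TIME = 0;
    static final int ATHLETE = 1;
    static final int LOCATION = 2;
    static final int DATE = 3;

    // Builds the same shape as the bin: map key -> list of tuples.
    static Map<String, List<List<Object>>> sample() {
        List<Object> tuple = Arrays.<Object>asList(
            9.92, "Carl Lewis", "Seoul, South Korea", "September 24, 1988");
        List<List<Object>> tuples = new ArrayList<>();
        tuples.add(tuple);
        Map<String, List<List<Object>>> bin = new HashMap<>();
        bin.put("world-records", tuples);
        return bin;
    }

    // Returns one positional field of the tuple at tupleIndex under mapKey.
    static Object field(Map<String, List<List<Object>>> bin,
                        String mapKey, int tupleIndex, int position) {
        return bin.get(mapKey).get(tupleIndex).get(position);
    }
}
```

For example, `WorldRecordTuples.field(WorldRecordTuples.sample(), "world-records", 0, WorldRecordTuples.ATHLETE)` returns the athlete's name from the first tuple. Keeping the positions in named constants is one way to recover some of the readability that field names would give.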
From a modeling perspective, each Aerospike data model element is a potential mesh point with the application object model that will help you to store object instances from the application in Aerospike. The following questions instruct broadly how to fit them together. The text after each question explains the intuition to apply.
Q: Does the application require a specific storage medium for a particular type of data, to achieve a necessary scale and frequency of reads or writes?
The easiest way to match data to hardware is to assign it to the right namespace. Namespaces associate index and data with storage media, like fast NVMe drives, persistent memory, or DRAM.
Q: Does the application need to store an integer or float with extremely frequent reads or writes?
Aerospike can store an integer or float in the primary index, instead of storing a memory location of the data. This provides even faster access than storing in DRAM media.
Q: Does the application have an object class for which a large number of instances need to be stored in Aerospike and are frequently read together by the application?
There are two common options:
Q: Do writes occur grouped into transactions or are individual pieces of data updated one by one?
Aerospike provides single-record transactions that are ACID compliant. Store data requiring atomic updates in one or more Bins in the same Aerospike Record, and use the Operate API to execute a multiple-operation transaction. If updates occur element by element, data can be stored in one or more Records, or in one or more Bins of data.
Q: During a single database transaction when updating data in an instance of an application object, are both of the following true?
It can be helpful to store the data as if it were different objects, in separate Bins.
When applying transaction operations on a record using Operate(), the Aerospike client delivers the return values from operations per Bin. These return values can be accessed in order, making transaction results easier to work with when data is put in separate Bins.
Q: Is the size of a set of application records large? (For example, measured in MiBs rather than in KiBs.)
There is an inherent trade-off in record size, as updating an app record will require a read, modify, and write of the entire Aerospike record. Consider storing the app record in more than one Aerospike record, rather than in a single monolithic record.
When data is large, taking advantage of an intrinsic property of the data, like a timestamp, can help to distribute data in an intuitive way across records. Including a timestamp in a set name or user key, for example, allows more efficient reads and writes. It also allows graceful rotation of data.
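One way to apply the timestamp idea is to fold a day bucket into the user key, so that each Aerospike record holds one day of an entity's data rather than its whole history. The sketch below is plain Java with no Aerospike calls. The key format `<entityId>:<yyyy-MM-dd>`, the choice of UTC for bucketing, and all names are assumptions made for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: spreads an entity's events across one record per day.
class DayBucketKeys {
    // Assumed user key format: "<entityId>:<yyyy-MM-dd>", bucketed in UTC.
    static String userKey(String entityId, Instant timestamp) {
        LocalDate day = timestamp.atZone(ZoneOffset.UTC).toLocalDate();
        return entityId + ":" + day; // LocalDate prints as ISO-8601, e.g. 2021-06-22
    }

    // Groups event timestamps by the user key of the record that would hold them.
    static Map<String, List<Instant>> groupByUserKey(String entityId, List<Instant> events) {
        Map<String, List<Instant>> buckets = new TreeMap<>();
        for (Instant event : events) {
            buckets.computeIfAbsent(userKey(entityId, event), k -> new ArrayList<>()).add(event);
        }
        return buckets;
    }
}
```

Each key in the returned map would become the user key of a separate, smaller record, so an update touches only the bucket for that day. A coarser or finer bucket (month, hour) follows the same pattern.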
Q: If there are no updates, can data naturally age out of the application?
It is common for applications to naturally allow data to expire after creation or update. Aerospike records have an Expiration metadata field that can be used to automatically expire data and reclaim storage space. All operations can be configured with a policy to set or update the Expiration.
An example of this is a bank keeping track of a customer's put stock option. An option grants the holder the right to make a stock transaction until a specified date that is determined at the purchase time. Once the expiration has passed, the option expires and the holder no longer has the right. The bank would model this in their computer systems using Expiration.
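For the option example, the time-to-live given to a write would be the number of seconds between the write and the option's expiry date. The plain-Java sketch below computes that value with no Aerospike calls. The class and method names are hypothetical, and the sketch assumes the result is meant for a write policy setting that takes a positive TTL in seconds.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: derives a TTL in seconds from an option's expiry date.
class OptionTtl {
    // Seconds from writeTime until expiry; rejects an expiry that is not in the future.
    static int ttlSeconds(Instant writeTime, Instant expiry) {
        long seconds = Duration.between(writeTime, expiry).getSeconds();
        if (seconds <= 0) {
            throw new IllegalArgumentException("expiry must be after the write time");
        }
        // Fails loudly instead of silently truncating a TTL that does not fit in an int.
        return Math.toIntExact(seconds);
    }
}
```

An option written on June 22 that expires 30 days later would get a TTL of 2,592,000 seconds. Rejecting non-positive values is a deliberate choice here, since zero or negative TTLs may carry special meanings rather than "already expired".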
Q: Does the application require a group of associated records, created at distinct times, to be removed at the same time?
The most common way to explicitly rotate out data at intervals is to store Aerospike Records in Sets and truncate the Sets from the associated Namespace, when appropriate.
An example of this is data that accrues over the course of a day, but then is worthless. One way to model the data would be to insert into a Set named for the day, and at the end of the day, the application would truncate the Set.
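The daily rotation described above depends on a predictable set name for each day. The following plain-Java sketch shows one possible naming helper, with no Aerospike calls. The `<prefix>_<yyyyMMdd>` format and all names are assumptions, and the 63-character guard reflects an assumed set name limit that should be checked against the server version in use.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: names one set per day and picks the set to truncate.
class DailySets {
    // Assumed limit on set name length; verify for your server version.
    static final int MAX_SET_NAME_LENGTH = 63;

    // Assumed format: "<prefix>_<yyyyMMdd>", e.g. clicks_20210622.
    static String setNameFor(String prefix, LocalDate day) {
        String name = prefix + "_" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
        if (name.length() > MAX_SET_NAME_LENGTH) {
            throw new IllegalArgumentException("set name too long: " + name);
        }
        return name;
    }

    // The set holding the previous day's data, which is worthless once `today` begins.
    static String setToTruncate(String prefix, LocalDate today) {
        return setNameFor(prefix, today.minusDays(1));
    }
}
```

The name returned by `setToTruncate` would be passed as the set argument to a `client.truncate` call like the one used in this notebook.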
Q: Does your application's volume result in multiple servers routinely suffering simultaneous downtime for disrepair or service?
It is common for Aerospike clusters, when the model is architected properly, to replace competing databases at a 1:5 (Aerospike:Other) ratio. When handling downtime, it will be important to configure whether Aerospike will run in AP mode or SC mode.
See the Aerospike documentation on Data Consistency for more information.
Truncate the set from the Aerospike Database.
import com.aerospike.client.policy.InfoPolicy;
InfoPolicy infoPolicy = new InfoPolicy();
client.truncate(infoPolicy, namespaceName, setName, null);
System.out.println("Set Truncated.");
Set Truncated.
client.close();
System.out.println("Server connection(s) closed.");
Server connection(s) closed.
Here is a collection of all of the non-Jupyter code from this tutorial.
// Import Java Libraries
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
// Import Aerospike Client Libraries
import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Bin;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.Value;
import com.aerospike.client.Record;
import com.aerospike.client.policy.InfoPolicy;
InfoPolicy infoPolicy = new InfoPolicy();
// Start the Aerospike Client.
AerospikeClient client = new AerospikeClient("localhost", 3000);
System.out.println("Initialized the client and connected to the cluster.");
// Create a Key using Namespace Set and User Key
String namespaceName = "test";
String setName = "dm101set";
Integer theKey = 0; // A key can be an integer, string, or blob.
Key key = new Key(namespaceName, setName, theKey);
System.out.println("Key created." );
// Create Bins of Data.
// A. Integer
Integer anInteger = 8;
String integerBinName = "int";
ClientPolicy clientPolicy = new ClientPolicy();
Bin bin0 = new Bin(integerBinName, anInteger);
// B. String
String aString = "modeling";
String stringBinName = "str";
Bin bin1 = new Bin(stringBinName, aString);
// C. List
ArrayList<Value> aTuple = new ArrayList<Value>();
aTuple.add(Value.get(9.92));
aTuple.add(Value.get("Carl Lewis"));
aTuple.add(Value.get("Seoul, South Korea"));
aTuple.add(Value.get("September 24, 1988"));
String tupleBinName = "tuple";
Bin bin2 = new Bin(tupleBinName, aTuple);
client.put(clientPolicy.writePolicyDefault, key, bin2);
// D. Map
String mapTupleBinName = "maptuple";
String tupleMapKey = "world-records";
ArrayList<Value> tupleList = new ArrayList<Value>();
tupleList.add(Value.get(aTuple));
HashMap <String, ArrayList> wrMap = new HashMap <String, ArrayList>();
wrMap.put(tupleMapKey, tupleList);
Bin bin3 = new Bin(mapTupleBinName, wrMap);
// Put the Bins into Aerospike
client.put(clientPolicy.writePolicyDefault, key, bin0, bin1, bin2, bin3);
// Get the Record from Aerospike.
Record record = client.get(null, key);
System.out.println("Read from Aerospike –");
System.out.println("Generation count: " + record.generation);
System.out.println("Record expiration: " + record.expiration);
System.out.println( integerBinName + ": " + record.getValue(integerBinName));
System.out.println( stringBinName + ": " + record.getValue(stringBinName));
System.out.println( tupleBinName + ": " + record.getValue(tupleBinName));
System.out.println( mapTupleBinName + ": " + record.getValue(mapTupleBinName));
// Truncate the Set.
client.truncate(infoPolicy, namespaceName, setName, null);
System.out.println("Set Truncated.");
// Close Client Connections.
client.close();
System.out.println("Server connection(s) closed.");
Initialized the client and connected to the cluster.
Key created.
Read from Aerospike –
Generation count: 2
Record expiration: 364672908
int: 8
str: modeling
tuple: [9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
maptuple: {world-records=[[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]]}
Set Truncated.
Server connection(s) closed.
Data modeling with Aerospike is a science, but deep enough that it will seem like an art at first. An intuitive matching of your application object model with Aerospike's data model will generally result in a successful application.
When pushing the envelope of performance, do not hesitate to use additional resources. A great way to learn more about modeling is to post questions to the data modeling discussion forum. This is especially worthwhile when optimizing Aerospike performance for an application. In addition, discussing requirements with Aerospike's Solutions Architect team can result in further performance improvements and increase your ROI using Aerospike.
Knowing the Right Questions to Ask is the First Step
By nature, the above is incomplete knowledge on Modeling. This notebook may be updated with additional questions over time. Please submit feedback to help refine it.
Have questions about data modeling? Don't hesitate to reach out at https://discuss.aerospike.com/c/how-developers-are-using-aerospike/data-modeling/143.
Want to check out other Java notebooks?
Are you running this from Binder? Download the Aerospike Notebook Repo and work with Aerospike Database and Jupyter locally using a Docker container.