Concepts
Learn about the core concepts of Synnax.
Synnax is a distributed data engine designed to acquire and store data from, issue commands to, and process data generated by hardware systems. It scales horizontally, and can be deployed on edge devices for data acquisition or in cloud environments for high-performance analysis.
Synnax inherits a hybrid pedigree from hardware data acquisition systems and cloud-native, horizontally scalable databases. We’re focused on keeping the system simple, and these pages will introduce you to the core concepts needed to work effectively with a Synnax cluster.
Distribution Components
Nodes
A node is an individual, running instance of the Synnax executable. The host machine can be an edge device, VM, container, or bare metal server. The only requirement is that it can store data on disk and has an address reachable by other nodes in the cluster.
Clusters
Nodes communicate with each other to form a cluster. The nodes in a cluster collaborate to read, write, and exchange data. Nodes expose the cluster as a monolithic data space, meaning that a user can query a single node for the entire cluster’s data without being aware of where it is actually stored.
Data Components
Now that we’ve covered the basic distributed systems terminology, we’re ready to step into the data components in a cluster.
To illustrate how these components work together, we’ll use the example of a cyclist who takes several two-hour rides over the course of a month. During each ride, they use a speedometer to record their instantaneous speed once per second.
Samples
A sample is a strongly typed value recorded at a specific point in time. The readings from our cyclist’s speedometer are reported as float32 values in kilometers per hour.
Channels
A channel is a logical collection of samples emitted by or representing the values of a single source. We can store the speedometer readings across all of the cyclist’s rides in a single channel titled “speed-gps”. We can also create channels that store post-processed results or simulated values. For example, we can record target speeds for each ride in a “speed-target” channel, and then write the difference between our target and actual readings in a “speed-diff” channel. As long as the samples are time-ordered, do not have duplicates (i.e. no two samples have the same timestamp), and have a consistent data type, they can be contained in a channel.
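The three channel requirements above can be checked with a few lines of plain Python. This is only an illustrative sketch: the `Sample` tuple and the `is_valid_channel_data` helper are hypothetical names, not part of the Synnax API.

```python
from typing import Sequence, Tuple

# Hypothetical representation: (timestamp in nanoseconds, value).
Sample = Tuple[int, float]


def is_valid_channel_data(samples: Sequence[Sample]) -> bool:
    """Return True if samples are strictly time-ordered and share one value type."""
    if not samples:
        return True
    value_type = type(samples[0][1])
    for (prev_ts, _), (ts, value) in zip(samples, samples[1:]):
        # Strictly increasing timestamps cover both ordering and "no duplicates".
        if ts <= prev_ts:
            return False
        # Every value must have the same type as the first one.
        if type(value) is not value_type:
            return False
    return True
```

Requiring strictly increasing timestamps lets a single comparison enforce both the ordering rule and the no-duplicates rule.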
Domains
Domains are continuous, non-overlapping time-partitions of a channel’s data. When writing to a channel, a user must first define the starting timestamp of a new domain. After doing so, they are free to write chunks of data. Once finished, they commit the domain with an ending timestamp to atomically persist it to the channel. If the domain overlaps with a previously written domain, the commit process fails.
We can allocate a new domain at the start of each ride, and then commit the domain once finished. Over the course of three rides, we’ll create a domain for each ride and write 7200 samples to it. Our “speed-gps” channel will now contain 21600 samples.
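As a rough model of this workflow, the sketch below commits one domain per ride and rejects overlapping commits. The `Domain` and `ChannelDomains` classes are hypothetical, timestamps are plain seconds for readability, and treating the end timestamp as exclusive is an assumption rather than documented Synnax behavior.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Domain:
    start: int         # starting timestamp in seconds (inclusive)
    end: int           # ending timestamp in seconds (assumed exclusive)
    sample_count: int  # number of samples written before the commit


@dataclass
class ChannelDomains:
    domains: List[Domain] = field(default_factory=list)

    def commit(self, domain: Domain) -> None:
        """Persist a domain, failing if it overlaps an existing one."""
        for existing in self.domains:
            if domain.start < existing.end and existing.start < domain.end:
                raise ValueError("domain overlaps a previously committed domain")
        self.domains.append(domain)

    def total_samples(self) -> int:
        return sum(d.sample_count for d in self.domains)


RIDE_SECONDS = 2 * 60 * 60  # a two-hour ride sampled once per second
DAY = 24 * 60 * 60

speed_gps = ChannelDomains()
for ride in range(3):
    start = ride * DAY  # one ride per day, purely for illustration
    speed_gps.commit(Domain(start, start + RIDE_SECONDS, RIDE_SECONDS))

print(speed_gps.total_samples())  # 3 rides * 7200 samples = 21600
```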
Ranges
A range (short for “time range”) is a user-defined region of a channel’s data. Unlike domains, ranges are purely for categorization and do not affect the structure of a channel’s data. Ranges can be subsections of a domain or span multiple domains. They can also overlap with or contain other ranges. Ranges are typically used to indicate important events or categorize large periods of time.
After each ride, we can identify periods of interest, such as hills or descents, and define ranges to mark them as relevant for analysis. If our cyclist is training for a century race, we can also wrap all of their rides in a ‘century training’ range to keep them nicely categorized.
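The relationships between ranges and domains can be modeled with simple interval checks. In this sketch, `TimeRange` is a hypothetical class, timestamps are plain seconds, and the ride schedule (one ride per week) is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    start: int  # seconds, inclusive
    end: int    # seconds, assumed exclusive

    def overlaps(self, other: "TimeRange") -> bool:
        return self.start < other.end and other.start < self.end

    def contains(self, other: "TimeRange") -> bool:
        return self.start <= other.start and other.end <= self.end


DAY = 24 * 60 * 60

# Three two-hour rides, each stored as its own domain.
ride_domains = [TimeRange(d * DAY, d * DAY + 7200) for d in (0, 7, 14)]

# A range that is a subsection of the first ride's domain.
hill_climb = TimeRange(1200, 1800)

# A range that spans every ride and also contains the hill range.
century_training = TimeRange(0, 30 * DAY)
```

Because ranges only label regions of time, defining `hill_climb` or `century_training` leaves `ride_domains` unchanged.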
Series
While domains and ranges virtually separate areas of related data, series are used to hold the actual samples. Series are strongly typed and hold their values in time-order. When writing data to the cluster, a user must provide a frame (see below) containing a series of samples for each channel they wish to write to. When reading data, they receive a frame containing series of samples for each channel across the requested period of time.
Series also have a time range to specify the region of the channel’s data they represent. While a user does not need to specify a time range when writing data, all series read from the cluster will have one specified.
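One way to picture a series is as a typed array with an optional time range attached. The `Series` class below is a hypothetical stand-in built on the standard library `array` module, not the Synnax client type, and the timestamps are placeholder integers.

```python
from array import array
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Series:
    data: array  # typed storage; typecode "f" holds 32-bit floats
    time_range: Optional[Tuple[int, int]] = None  # (start, end) timestamps


# When writing, the time range may be left unspecified.
written = Series(array("f", [3.0, 12.1, 28.2]))

# A series read back from the cluster always carries a time range.
read = Series(array("f", [3.0, 12.1, 28.2]), time_range=(0, 3))

# Strong typing: the array rejects values of the wrong type.
try:
    written.data.append("fast")
except TypeError:
    print("only numeric samples are accepted")
```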
Frames
A frame is a collection of related series. These series form a table-like structure comparable to a pandas DataFrame in Python or a data.frame in R. Each column holds one or more series. Frames are the fundamental unit of data transfer within a cluster, and are used for reads, writes, and internal replication.
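A minimal way to picture a frame is as a mapping from channel names to lists of series. The `Frame` alias and the `column_values` helper below are hypothetical and only illustrate the shape of the data, not the Synnax client’s frame type.

```python
from typing import Dict, List

# Each column (channel name) holds one or more series; each series is
# modeled here as a plain list of float samples.
Frame = Dict[str, List[List[float]]]

frame: Frame = {
    "speed-gps": [[3.0, 12.1], [28.2]],
    "speed-target": [[7.0, 7.0], [7.0]],
}


def column_values(frame: Frame, channel: str) -> List[float]:
    """Flatten all series in one column into a single list of samples."""
    return [value for series in frame[channel] for value in series]


print(column_values(frame, "speed-gps"))  # [3.0, 12.1, 28.2]
```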
Synnax as a Spreadsheet
A Synnax cluster’s data can be thought of as a very large, distributed spreadsheet. Each channel is a column and each row contains several samples.
time | speed-gps | speed-target | speed-diff |
---|---|---|---|
1677282520236429056 | 3.0 | 7.0 | 4.0 |
1678282520236429056 | 12.1 | 7.0 | -5.1 |
1679282520236429056 | 28.2 | 7.0 | -21.2 |
1680282520236429056 | 15.3 | 7.0 | -8.3 |
1681282520236429056 | 22.4 | 7.0 | -15.4 |
1682282520236429056 | 11.5 | 7.0 | -4.5 |
1697282520236429056 | 3.0 | 7.0 | 4.0 |
1698282520236429056 | 3.1 | 7.0 | 3.9 |
1699282520236429056 | 9.6 | 7.0 | -2.6 |
1700282520236429056 | 18.7 | 7.0 | -11.7 |
1701282520236429056 | 13.8 | 7.0 | -6.8 |
1717282520236429056 | 3.0 | 7.0 | 4.0 |
1718282520236429056 | 19.1 | 7.0 | -12.1 |
1719282520236429056 | 27.2 | 7.0 | -20.2 |
1720282520236429056 | 15.3 | 7.0 | -8.3 |
This table describes the data layout for our cyclist’s rides. Each individual column, such as speed-gps or speed-target, is a series holding that channel’s samples. The collection of series indexed to the shared time column is a frame.
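The speed-diff column follows from the other two: each value is the target speed minus the measured speed. The sketch below recomputes it for the first three rows of the table using plain Python lists; the variable names are illustrative only.

```python
# First three rows of the table, one list per column.
time = [1677282520236429056, 1678282520236429056, 1679282520236429056]
speed_gps = [3.0, 12.1, 28.2]
speed_target = [7.0, 7.0, 7.0]

# speed-diff = target - actual, rounded to one decimal place to hide
# floating-point noise.
speed_diff = [
    round(target - actual, 1)
    for target, actual in zip(speed_target, speed_gps)
]

# Print the rows in the same layout as the table.
for row in zip(time, speed_gps, speed_target, speed_diff):
    print(" | ".join(str(cell) for cell in row))
```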