Frequently Asked Questions
Data Flow Index (DFI)
Design and Architecture
Is DFI a database?
While DFI is much like a database and carries out many of the same functions, it is designed to supplement your database, not replace it. It can be integrated with other components such as a streaming database, a data lake or a data warehouse to deliver end-to-end solutions.
Is DFI open source?
DFI is not open source and there are no plans to open source DFI.
How does DFI compare to Kafka?
DFI and Kafka are similar in that neither replaces the downstream pipeline or the need for a database. Kafka is a message broker that specializes in delivering messages across a distributed architecture, whereas DFI specializes in ingesting, indexing, storing and querying spatiotemporal data at scale.
What type of data can be stored in DFI?
DFI is specifically designed for processing spatiotemporal data. Records are stored in timestamp order; each record comprises an entity ID of up to 128 bits and a 3-dimensional WGS84 geospatial point giving latitude, longitude and altitude. All intersection and distance operations are computed on a sphere.
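To illustrate what "computed on a sphere" means in practice, here is a minimal great-circle (haversine) distance sketch. This is not DFI's implementation, and the Earth-radius constant used by DFI is not documented here; the value below is the conventional mean radius.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (assumed; DFI's constant is not documented)

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points, on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

Spherical distance ignores the WGS84 ellipsoid's flattening, trading a small accuracy loss (well under 1%) for much cheaper computation at scale.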
There is an optional payload of up to 255 bytes of metadata per row. This schema enables DFI to outperform other solutions when analyzing real-time streaming data from moving entities or running analysis on 10 billion or more spatiotemporal records.
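As a rough illustration of the record shape described above, the row could be modelled as follows. The field names and types here are assumptions for illustration, not DFI's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DFIRecord:
    """Illustrative model of one DFI row (field names are hypothetical)."""
    entity_id: int                   # up to 128 bits
    timestamp_ms: int                # records are stored in timestamp order
    latitude: float                  # WGS84 degrees
    longitude: float                 # WGS84 degrees
    altitude: float                  # metres
    payload: Optional[bytes] = None  # optional metadata, max 255 bytes

    def __post_init__(self):
        # Enforce the two size limits stated in the schema description.
        if self.entity_id.bit_length() > 128:
            raise ValueError("entity_id exceeds 128 bits")
        if self.payload is not None and len(self.payload) > 255:
            raise ValueError("payload exceeds 255 bytes")
```

The fixed, narrow row layout (ID, time, point, small payload) is what makes very high ingest and query rates feasible; anything larger belongs in an adjacent data store.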
What data volumes does DFI support?
DFI has been proven to process and query 100 billion records on a single server and is theoretically designed to handle up to 1 trillion records. Data is stored on a local disk, so the maximum size of the data set is constrained solely by the hardware available.
How does DFI fit in my architecture?
DFI is used to accelerate spatiotemporal analysis in two key areas: processing of real-time streaming data and/or fast analysis of massive data sets.
- Embed DFI within the workflow to feed real-time applications, detect events, create alerts, enable fast ad-hoc queries on streaming data, or simply filter incoming data for efficiency further down the line
- Deploy DFI alongside your current architecture to enable fast and effective ad-hoc queries on masses of data, or to feed live analytical dashboards with historic trend analysis
How do I use the DFI?
The DFI can be accessed and integrated through a standard Web API (https://api.dataflowindex.io/docs/api/). We also provide a client-side Python package (https://pypi.org/project/dfipy/) to easily query the DFI from common data analytics platforms, such as Jupyter notebooks.
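As a sketch of what Web API integration might look like, the snippet below builds (but does not send) an HTTP request for a bounding-box query. The endpoint path, payload field names and authorization scheme are assumptions for illustration; consult the API documentation linked above for the real contract.

```python
import json
import urllib.request

def build_bbox_query(token: str, min_lon: float, min_lat: float,
                     max_lon: float, max_lat: float) -> urllib.request.Request:
    """Build (without sending) a hypothetical DFI bounding-box query request."""
    payload = {"bounds": [min_lon, min_lat, max_lon, max_lat]}  # field name assumed
    return urllib.request.Request(
        "https://api.dataflowindex.io/v1/query",  # hypothetical endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",   # auth scheme assumed
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

In a notebook, the dfipy package wraps this plumbing; the raw Web API remains useful for non-Python integrations.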
Features and Capabilities
What type of queries can DFI run?
DFI can return all points, a count of points, or the set of unique sensors within a polygon or a bounding box.
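To make the three result shapes concrete, here is a minimal in-memory sketch over (entity_id, lon, lat) tuples. This is purely illustrative semantics, not DFI's indexed implementation, and it shows only the bounding-box case.

```python
def in_bbox(point, min_lon, min_lat, max_lon, max_lat):
    """True if the point's (lon, lat) falls inside the bounding box."""
    entity_id, lon, lat = point
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

def bbox_query(points, bounds):
    """Return the three DFI result shapes: all points, count, unique sensors."""
    hits = [p for p in points if in_bbox(p, *bounds)]
    return hits, len(hits), {p[0] for p in hits}
```

Example usage: `bbox_query(points, (0.0, 51.0, 1.0, 52.0))` yields the matching points, their count, and the distinct entity IDs among them.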
What can’t DFI do?
The following are roadmap capabilities, not yet supported:
- Aggregation functions such as “Group by” and analysis such as average, maximum and minimum speed
- Index and query of geometries other than points (e.g. polygons, polylines, etc.)
- Data persistence
- Storing multiple tables in a single DFI instance
- Geofencing, i.e. alerting when entities enter or exit a space
- Proximity queries (e.g. nearest neighbours)
How quickly does DFI make data available for querying?
Data is ingested and indexed at approximately 5 million rows per second. Each record can be queried immediately after ingestion.
How does DFI handle out of order events?
DFI assumes that ingestion is in temporal order. Modestly out-of-order events have little to no performance impact, depending on the size and frequency of delays. Ingestion in random temporal order adversely impacts the performance of queries with temporal attributes but does not affect performance of geospatial and entity search.
What happens when DFI is filled to capacity?
DFI enters “read only” mode. Ingestion of new data is suspended but existing data can be queried.
Does DFI support rollups and summaries?
These can be generated from the underlying raw data but are not currently computed automatically.
When should I use DFI?
Consider using DFI for quick and efficient analysis of extensive spatiotemporal data. DFI indexes data at 5 million rows per second, making it ideal for processing real-time data from moving entities. It excels at complex queries on billions of records, especially when combining data sets of 20–100 billion records, a scale other technologies struggle with.
Is DFI SaaS or on-premise?
DFI is SaaS. In specific use cases, it may be available on-premise, on a private cloud or, in the future, at the edge.
Support
What level of skills are required to implement and work with the platform?
General System customer engineers will support DFI installation and setup, and provide introductory training. Users should be familiar with Python to get the most out of the solution.