Firebolt — the new kid on the (data warehousing) block
Recently, we started using Looker for ad-hoc queries of business users in the company (PMs, managers, marketing, and basically everyone). One main problem we have though, is that our Looker performance is pretty bad — queries just take too long.
The performance isn’t bad because of Looker of course, but because of the underlying engine: Trino (formerly PrestoSQL) over parquet files in S3. Simple fetch queries take 10–12 seconds, and more complicated ones take over 30s — we realized we should find a different solution.
We considered using Snowflake, as it’s already being used in the company by another department, but we also heard about the new kid in the block — Firebolt.
It’s important to mention that this post is going to give you a brief summary of Firebolt’s product, and how it works behind the scenes (in a high level) — but at the time of writing these lines, we haven’t tested it yet (see why at the “Connectivity” section below). The goal is to give you some information so you can decide if it’s relevant for you to try Firebolt for your use-case.
What is Firebolt?
Firebolt is a data warehouse that lets you ingest your data to its dedicated storage with a proprietary format called F3 (pronounced “TripleF”). Like Snowflake, it provides you with separated compute units called Firebolt Engines to query that data (and to ingest it).
Firebolt claims to be at least 10x faster, while maintaining a reasonable level of resource consumption (meaning it delivers this speed without the need of an appropriate increase in resources).
They claim this performance can be achieved at a much lower cost than what it would cost in Snowflake.
How it works?
The speed is achieved mainly through efficient data scanning (Firebolt claims to scan significantly less data == less I/O == faster queries) using Sparse Indices. It basically means that per-table, you define the fields that are going to appear in your WHERE clause, and Firebolt sorts, partitions, and indexes the table by those fields.
Additional optimization can be achieved for 2 tasks — aggregations and joins, by using 2 tools Firebolt provides: Aggregating Index, and Join Index:
- Aggregating Index: it is similar to a materialized view, where commonly used aggregations are computed in advance, but unlike a materialized view, the queries seamlessly leverage this Aggregating Index behind the scenes, and fetch the pre-computed aggregations.
- Join Index: if you know some tables are being joined often, you can create this index that generates look-up tables and other optimizations on the relevant fields of the tables, in order to perform the joins faster.
Currently there are 2 ways to integrate with Firebolt:
But in the near future there will also be an official Looker connector available, and probably more such native connections will be available in the following year.
The reason we haven’t tested Firebolt yet is the lack of an official Looker connector. Our Looker deployment is a managed solution, as opposed to deploying Looker on our own instances, and therefore we cannot use Looker’s general JDBC connector (which is available only in the non-managed deployment).
As soon as the official Looker connector will be available, I’m going to give it a try and update this post with the results.
UPDATE (August 31st, 2021)
There is now an official Looker connector for Firebolt. We haven’t tested it yet, because we needed them to have a SOC 2 Type 2 compliance, which they are still in the progress of getting (although I think they’ll get it in the following weeks).
Firebolt seems like a very elegant solution to the performance problem we all have with ad-hoc and interactive queries. If they actually deliver what they claim to — I guess Snowflake now has a scary competitor.
— Anton Ego, Ratatouille
If you face the challenge of slow performance in your BI tools, I think you should check them out. And after you did — it would be pretty helpful if you’d share in the comments your findings from using Firebolt.