firebolt

A Go framework for streaming event processing and data pipeline apps

Introduction

Firebolt has a simple model intended to make it easier to write reliable pipeline applications that process a stream of data.

It can be used to build systems such as:

logging/observability pipelines
streaming ETL
event processing pipelines

Every application's pipeline starts with a single source, the component that receives events from some external system. Sources must implement the node.Source interface.

We provide one built-in source:

kafkaconsumer - Events come from a Kafka topic, and are passed to the root nodes as []byte
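As a rough illustration of what a custom source involves, the sketch below defines a simplified, hypothetical Source interface and a toy source that emits a fixed list of events as []byte. The interface, its method signatures, and the sliceSource and collect names are assumptions made for this example; the real node.Source interface may differ, so check the Sources documentation before implementing one.

```go
package main

import "fmt"

// Source is a simplified, hypothetical stand-in for firebolt's node.Source
// interface; the real method set and signatures may differ.
type Source interface {
	Setup(config map[string]string, out chan<- []byte) error
	Start() error
	Shutdown() error
}

// sliceSource emits a fixed list of events, then closes its output channel.
type sliceSource struct {
	events [][]byte
	out    chan<- []byte
}

func (s *sliceSource) Setup(config map[string]string, out chan<- []byte) error {
	s.out = out
	return nil
}

func (s *sliceSource) Start() error {
	for _, e := range s.events {
		s.out <- e
	}
	close(s.out)
	return nil
}

func (s *sliceSource) Shutdown() error { return nil }

// collect runs a source to completion and returns everything it emitted.
func collect(src Source) ([]string, error) {
	ch := make(chan []byte)
	if err := src.Setup(nil, ch); err != nil {
		return nil, err
	}
	errc := make(chan error, 1)
	go func() { errc <- src.Start() }()

	var got []string
	for e := range ch {
		got = append(got, string(e))
	}
	if err := <-errc; err != nil {
		return got, err
	}
	return got, src.Shutdown()
}

func main() {
	src := &sliceSource{events: [][]byte{[]byte("a"), []byte("b")}}
	got, err := collect(src)
	fmt.Println(got, err) // prints "[a b] <nil>"
}
```

In a real application the executor, not your own code, would own the channel and drive the source lifecycle.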

The processing in your application is performed by its nodes, which form a processing tree. Data (events) flows down this tree: a parent node passes its results down to its child nodes. Nodes may process events synchronously or asynchronously. A synchronous node of type node.FanoutNode returns a slice of results, for fanout or 'demultiplexing' use cases. Each node must implement the node.SyncNode, node.FanoutNode, or node.AsyncNode interface accordingly.

We provide two built-in node types:

kafkaproducer - Events are produced onto a Kafka topic by an asynchronous producer.
elasticsearch - Events are bulk indexed into Elasticsearch.
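To make the processing tree concrete, here is a minimal sketch of a fanout node feeding a synchronous filter node, which in turn feeds a collecting node. The SyncNode and FanoutNode interfaces below are simplified, hypothetical stand-ins written for this example (as are treeNode, splitLines, dropEmpty, and collector); Firebolt's real interfaces have different signatures, and wiring is normally done by the executor from configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified stand-ins for node.SyncNode and node.FanoutNode.
type SyncNode interface {
	Process(event any) (any, error)
}

type FanoutNode interface {
	Process(event any) ([]any, error)
}

// treeNode wires one processor (sync or fanout) to its children.
type treeNode struct {
	sync     SyncNode
	fanout   FanoutNode
	children []*treeNode
}

// run processes one event and passes each non-nil result to every child.
func (n *treeNode) run(event any) error {
	var results []any
	if n.fanout != nil {
		rs, err := n.fanout.Process(event)
		if err != nil {
			return err
		}
		results = rs
	} else {
		r, err := n.sync.Process(event)
		if err != nil {
			return err
		}
		results = []any{r}
	}
	for _, r := range results {
		if r == nil {
			continue // a nil result filters the event out of the stream
		}
		for _, c := range n.children {
			if err := c.run(r); err != nil {
				return err
			}
		}
	}
	return nil
}

// splitLines fans one []byte event out into one string event per line.
type splitLines struct{}

func (splitLines) Process(event any) ([]any, error) {
	var out []any
	for _, line := range strings.Split(string(event.([]byte)), "\n") {
		out = append(out, line)
	}
	return out, nil
}

// dropEmpty filters out blank lines by returning nil.
type dropEmpty struct{}

func (dropEmpty) Process(event any) (any, error) {
	if s := strings.TrimSpace(event.(string)); s != "" {
		return s, nil
	}
	return nil, nil
}

// collector is a leaf node that records upper-cased events.
type collector struct{ seen []string }

func (c *collector) Process(event any) (any, error) {
	c.seen = append(c.seen, strings.ToUpper(event.(string)))
	return nil, nil
}

// buildAndRun wires splitLines -> dropEmpty -> collector and runs one event.
func buildAndRun(input []byte) ([]string, error) {
	sink := &collector{}
	root := &treeNode{fanout: splitLines{}, children: []*treeNode{
		{sync: dropEmpty{}, children: []*treeNode{{sync: sink}}},
	}}
	err := root.run(input)
	return sink.seen, err
}

func main() {
	got, err := buildAndRun([]byte("login ok\n\nlogin failed"))
	fmt.Println(got, err) // prints "[LOGIN OK LOGIN FAILED] <nil>"
}
```

Because each node only sees the event it is handed, a node such as dropEmpty can be unit tested on its own, without a source or any sibling nodes.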

Firebolt has both runtime and compile-time dependencies on librdkafka; see Developing.

Example: Logging Pipeline

At DigitalOcean, our first use of Firebolt was in our logging pipeline. This pipeline consumes logs from just about every system we run. The diagram below depicts the source and nodes in this application.

This system uses the built-in kafkaconsumer source (in yellow) and kafkaproducer and elasticsearch nodes (in green). The blue nodes are custom to this application.

What does Firebolt do for me?

Firebolt is intended to address a number of concerns that are common to near-realtime data pipeline applications, making it easy to run a clustered application that scales predictably to handle large data volume.

It is not an analytics tool - it does not provide an easy way to support 'wide operations' like record grouping, windowing, or sorting that require shuffling data within the cluster. Firebolt is for 'straight through' processing pipelines that are not sensitive to the order in which events are processed.

Some of the concerns Firebolt addresses include:

kafka sources - Minimal configuration and no code required to consume from a Kafka topic, with consumer lag metrics included.
kafka sinks - The same for producing to a Kafka topic.
loose coupling - Nodes in the pipeline are loosely coupled, making them easily testable and highly reusable.
simple stream filtering - Filter the stream by returning nil in your nodes.
convenient error handling - Send events that fail processing to a Kafka topic for recovery or analysis with a few lines of config.
outage recovery: offset management - Configurable Kafka offset management during recovery lets you determine the maximum "catch up" to attempt after an outage, so you can quickly get back to realtime processing.
outage recovery: parallel recovery - After an outage, process realtime data and "fill in" the outage time window in parallel, with a rate limit on the recovery window.
monitorability - Firebolt exposes Prometheus metrics to track the performance of your Source and all Nodes without writing code. Your nodes can expose their own custom internal metrics as needed.
leader election - Firebolt uses Zookeeper to conduct leader elections, facilitating any processing that may need to be conducted on one-and-only-one instance.
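The idea of capping the maximum "catch up" after an outage can be sketched as simple offset arithmetic. This is only an illustration of the concept, not Firebolt's implementation; the function name startOffset and its parameters are hypothetical and do not correspond to Firebolt configuration keys.

```go
package main

import "fmt"

// startOffset picks where to resume consuming a partition after an outage.
// committed is the last committed offset, highWatermark is the newest offset
// on the partition, and maxCatchup caps how many events to replay.
// All names are illustrative assumptions, not firebolt configuration keys.
func startOffset(committed, highWatermark, maxCatchup int64) int64 {
	lag := highWatermark - committed
	if lag <= maxCatchup {
		// Short outage: resume exactly where we left off.
		return committed
	}
	// Long outage: skip ahead so that at most maxCatchup events are replayed.
	return highWatermark - maxCatchup
}

func main() {
	fmt.Println(startOffset(900, 1000, 500)) // prints 900
	fmt.Println(startOffset(100, 1000, 500)) // prints 500
}
```

In the second call, offsets 100 through 499 are skipped by the realtime consumer; that gap is the kind of window a separate, rate-limited recovery process could fill in.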

Documentation

Configuration - The configuration file format
Execution - How Firebolt processes your data
Registry - Adding node types to the registry
Sample Application Code - Example code for running the Firebolt executor
Sources - Implementing and using sources
Sync Nodes - Implementing and using synchronous nodes
Fanout Nodes - Implementing and using fanout nodes
Async Nodes - Implementing and using asynchronous nodes
Leader Election - Starting leader election and accessing election results
Messaging - How to send and receive messages between the components of your system
Metrics - What metrics are exposed by default, and how to add custom metrics to your nodes

Built-In Types

Kafka Producer - Node for producing events onto a Kafka topic
Elasticsearch - Node for indexing documents to an Elasticsearch cluster

Developing

Firebolt depends on librdkafka v1.3.0 or later. To get started building a Firebolt app (or working on Firebolt itself), install librdkafka by following its installation instructions.

An example for Debian-based distros:

sudo wget -qO - https://packages.confluent.io/deb/5.4/archive.key | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://packages.confluent.io/deb/5.4 stable main"
sudo apt-get update
sudo apt-get install -y librdkafka1 librdkafka-dev

