Apache Kafka is an [[Open Source]] **Event**-based platform for streaming and processing data. It is designed to scale horizontally, which makes it a natural fit for an [[Event-Driven Architecture]].
Kafka asks you to think of **events first, and things second**, where _things_ are stateful objects (e.g. [[Tuple]]s, rows in a database) and events are discrete records that something happened at a point in time. Whereas objects are stored in tables, events are stored in _logs_. Events appended to a log are read back in the same order in which they were written (Kafka guarantees this ordering within each partition of a topic).
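As a rough illustration of reading a log back in order, here is a hedged sketch using the Java consumer client; the broker address, group id, and topic name are placeholder assumptions, not from the source.

```java
// Hypothetical sketch: reading events back from a log (topic) in append order.
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderedReadSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "example-group");           // assumed consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Within a partition, records arrive in the order they were appended.
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```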
# Capabilities
1. To **publish** (write) streams of events for continuous consumption by other systems
2. To **subscribe** to (read) streams of events
3. To **store** streams of events durably in topics
4. To **process** streams of events as they occur or retrospectively
![[Pasted image 20250902222342.png]]
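A minimal sketch of the **publish** capability using the Java producer client; the broker address, topic name, and record contents below are placeholder assumptions.

```java
// Hypothetical sketch: appending one event to a topic.
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PublishSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The key ("order-123") influences which partition the event lands in.
            producer.send(new ProducerRecord<>("orders", "order-123", "order created"));
            producer.flush();
        }
    }
}
```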
## Principles
- **Dumb Broker, Smart Client** - the [[broker]] is intentionally kept as simple as possible, leaving business logic and other complexity to the client libraries that interact with it. Even decisions like "which partition does this record go to?" are made by the client (see the sketch after this list).
- **Immutability** - log records are created and (eventually) deleted, but never updated. If a record needs to change, a new record must be written that marks the old one as out of date; the old record itself never changes.
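To make the "smart client" point concrete, here is a hypothetical sketch of how a producer-side client can derive a partition from a record's key. Kafka's actual default partitioner uses a murmur2 hash; the simplified hash below is only for illustration.

```java
// Sketch of client-side partition selection: the producer, not the broker, decides
// where a record goes. Same key -> same partition, so events for one key stay ordered.
public class PartitionChoiceSketch {
    static int choosePartition(String key, int numPartitions) {
        // Simplified stand-in for Kafka's murmur2-based default partitioner.
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // For a topic with 6 partitions, prints an index in the range 0..5.
        System.out.println(choosePartition("order-123", 6));
    }
}
```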
# Data Containers
> [!cite] A **topic** is just an ordered collection of events that are stored in a [[Durable File Types|durable]] way.
> Tim Berglund
![[Pasted image 20250902222415.png]]
Topics are broken up into **partitions**, which are spread across different Kafka [[Broker]]s; partitioning is what lets a topic scale horizontally beyond a single machine.
![[Pasted image 20250902222430.png]]
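As a sketch of how a topic and its partitions are declared, here is a hedged example using the Java AdminClient; the topic name, partition count, and replication factor are illustrative assumptions (a replication factor of 3 assumes at least a 3-broker cluster).

```java
// Hypothetical sketch: creating a topic with several partitions via the AdminClient API.
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions spread across the brokers; each partition kept on 3 brokers.
            NewTopic topic = new NewTopic("orders", 6, (short) 3);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```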
# Connect
Kafka Connect is an API/framework for integrating Kafka with other systems: it handles getting data into and out of Kafka through source and sink connectors, so you don't have to hand-roll the underlying Producers and Consumers yourself.
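As an illustration (not from the source), a Connect source connector is typically just configuration. The sketch below assumes the FileStreamSourceConnector that ships with Kafka, with a placeholder file path and topic name.

```properties
# Hypothetical sketch: a source connector that tails a file into a topic.
name=local-file-source
connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
tasks.max=1
file=/var/log/app/events.log
topic=app-events
```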
# "Stream"
A Java API to handle filtering, joins, aggregation, etc with Kafka.
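A minimal sketch of a Streams topology that filters and transforms events from one topic into another; the application id, topic names, and serde choices are assumptions for illustration.

```java
// Hypothetical sketch: a Kafka Streams topology that filters and re-keys nothing,
// just transforms values, writing results to an output topic.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-filter");     // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");   // assumed input topic
        orders.filter((key, value) -> value.contains("created"))     // keep only "created" events
              .mapValues(value -> value.toUpperCase())
              .to("orders-created");                                 // assumed output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```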
****
# More
## Source
- https://kafka.apache.org/intro
- Grad School