Apache Kafka is an [[Open Source]] **Event**-based system for data streaming & processing. It is designed to scale (horizontally) very well, which makes it a great fit for an [[Event-Driven Architecture]]. Kafka asks you to think of **events first, and things second**, where _things_ are stateful objects (e.g. [[Tuple]]s, rows in a database), and events are discrete indications, at a point in time, that something took place. Whereas objects are stored in tables, events are stored in _logs_. Events stored in a log are guaranteed to be read back in the same order in which they were written.

# Capabilities

1. To **publish** streams of events for continuous consumption by other systems
2. To **subscribe** to events
3. To **store** streams of events durably in topics
4. To **process** streams of events

![[Pasted image 20250902222342.png]]

## Principles

- **Dumb Broker, Smart Client** - the [[broker]] is intentionally as simple as possible, leaving business logic & other complexities to the client libraries that interact with it. Even decisions like "which partition does this record go to" are made by the client.
- **Immutability** - logs are Created & Deleted, but never Updated. If a Log needs to change, a new Log must be created that says the old Log is out of date (but the old Log itself will not change).

A minimal producer & consumer sketch illustrating these ideas appears at the end of this note.

# Data Containers

> [!cite] A **topic** is just an ordered collection of events that are stored in a [[Durable File Types|durable]] way.
> Tim Berglund

![[Pasted image 20250902222415.png]]

Topics are broken up into **partitions**, which are located on different Kafka [[Broker]]s.

![[Pasted image 20250902222430.png]]

# Connect

Kafka Connect is an API for integrating Kafka with other services: getting data into & out of Kafka, and setting up Producers and Consumers.

# Streams

A Java API to handle filtering, joins, aggregation, etc. with Kafka (see the Streams sketch at the end of this note).

****

# More

## Source

- https://kafka.apache.org/intro
- Grad School
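
# Example Sketches

A minimal producer & consumer sketch using the official Java client (`org.apache.kafka:kafka-clients`), showing the publish and subscribe capabilities against a topic. The topic name `orders`, key `customer-42`, group id `orders-readers`, and the `localhost:9092` broker address are placeholders for illustration, not from the source. Note how the "smart client" determines the partition implicitly by hashing the record key, so events for the same key stay in order.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class KafkaSketch {
    public static void main(String[] args) {
        // Producer: the client (not the broker) decides the partition, here
        // implicitly by hashing the record key ("customer-42"), so all events
        // for that key land in the same partition and keep their order.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("orders", "customer-42", "order-created"));
        }

        // Consumer: subscribes to the topic and reads events back in the
        // order they were appended to each partition's log.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "orders-readers");
        consumerProps.put("auto.offset.reset", "earliest");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        record.partition(), record.offset(), record.key(), record.value());
            }
        }
    }
}
```

A local broker (e.g. from the Kafka quickstart) must be running for this to do anything.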
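
A sketch of the Streams API doing a simple filter, assuming an input topic `orders` and an output topic `large-orders` exist; the application id and broker address are again placeholders. The input log is only read and re-published, never modified, which matches the immutability principle above.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class StreamsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-filter-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");
        // Keep only events whose value mentions "large" and write them to a
        // second topic; the original "orders" log is left untouched.
        orders.filter((key, value) -> value.contains("large"))
              .to("large-orders");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```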