> [!tldr] A continuous set of samples of data taken over time

A data stream is a source of data that produces samples over time. This is my own rough definition, not a rigorous one.

In data streams **order matters**. A data stream can be processed in batches or in near real time.

**Types of data streams:**
- A [[Hash Table|key/value]] system with time-based keys
- A simple running log of measurements + contextual data, like [[Oura Ring]]'s "movement" time-series data, which presents an [[Enumeration|enum]] value every 5 minutes
- The [[PDW]] is a data stream, for sure, now that it's getting "as it happens" type data

**Tooling:**
- [[Kafka]] for large-scale streams
- Simply appending to [[CSV]] or [[NDJSON]] files works
- [[Relational Databases]] can be used, but [[Big Data]] technologies may be more useful in practice

****

# More
## Source
- self / grad school
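
## Example: appending to an NDJSON file
A minimal sketch of the "just append to a file" approach from the tooling list, assuming one JSON object per line with a timestamp and a value. The function names (`append_sample`, `read_samples`, `demo`), the file name `movement.ndjson`, and the `rest`/`low`/`high` values are all made up for illustration; they are not Oura's actual enum values or API.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone


def append_sample(path, value, ts=None):
    """Append one sample as a single NDJSON line (one JSON object per line)."""
    ts = ts or datetime.now(timezone.utc)
    record = {"ts": ts.isoformat(), "value": value}
    # Mode "a" only ever adds to the end, so file order is arrival order.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_samples(path):
    """Yield samples lazily, in the order they were written."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


def demo():
    """Write three hypothetical 5-minute samples, then read them back."""
    path = os.path.join(tempfile.mkdtemp(), "movement.ndjson")
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    for i, level in enumerate(["rest", "low", "high"]):
        append_sample(path, level, start + timedelta(minutes=5 * i))
    return list(read_samples(path))
```

Because the file is append-only, reading it top to bottom preserves sample order, which is the property that matters for a stream. The same reader works for batch processing (read the whole file) or for near-real-time use if something tails the file as it grows.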