> [!tldr] Graph, hierarchy, key/value, tables, and streams, in descending order of flexibility

There are only a few fundamentally different structures for data.

- [[Graph]] ← Most flexible, hardest to query
- [[Hierarchy]]/tree
- [[Hash Table|key/value]]
- [[Tabular Data]]
- [[Data Stream]] ← Least flexible, easiest to query (sorta)

...then there are Blobs, which are shapeless.

[[Everything can be represented in a graph]]. You can use a graph to **losslessly** represent *any* other data structure.

# Data File Shapes Table

| Shape | Description | Expressiveness | Schema Language | File Formats | Databases | Diagrams |
| --- | --- | --- | --- | --- | --- | --- |
| [[Graph]] | Nodes & edges, optionally with properties | Highest | [[SHACL]], [[OWL]], [[GQL]] | [[JSON-LD]], [[Turtle]], [[graphml]] | [[Neo4j]], [[ArangoDB]], [[Amazon Neptune]] | [[Graph Visual Representation]] |
| [[Hierarchy]]/tree | Entities with properties, which themselves may be entities | High | [[JSON Schema]] | [[JSON]], [[YAML]], [[XML]] | [[MongoDB]], [[CouchDB]], [[EA Landscapes]], [[Firestore]] | [[Mind Mapping]], Treemaps, [[Fishbone Diagrams]], Organizational charts |
| [[Tabular Data]] | Rows and columns | Medium | [[Data Package (standard)\|Frictionless]], [[SQL DDL]], [[Parquet Schema]] | [[CSV]], [[TSV]], [[Spreadsheet]] formats, [[Parquet]], [[SQLite]] files | [[Postgres]], [[MySQL]], [[SQLite]], [[DuckDB]], others ad nauseam | [[Entity-Relationship Diagrams]], [[PivotTables]], [[Heatmap]]s, Tables & grids |
| [[Hash Table\|key/value]] | Named values in a list where order doesn't matter | Low | None, really | `.env` files, [[TOML]] files (sort of), Dotfiles | [[Redis]], Cloudflare KV Store | Table |
| [[Data Stream]]s & sequences | Unnamed values in a list where order does matter | Lowest, sort of, see ==below== | Some I've never heard of. You can use [[JSON Schema]], too. | [[NDJSON]], [[Kafka]] topics, `.log` files, arguably [[CSV]] files. Oh, and [[Plaintext]] files, including [[Markdown]]. | [[Kafka]], Amazon Kinesis, RabbitMQ, InfluxDB and [[Time-series Databases]] | [[Sequence Diagrams]], [[Timeline]]s, [[Line Plot]]s |
| [[Blob]] | What it sounds like: a blob of 0s and 1s that represents something specific | - | - | Binary files, PNG, JPEG, WebP, MP4, MKV, WAV, MP3, EXE, WASM | [[S3 Bucket]]s, some other specialized databases | - |

==Notes on Data Streams==:

- I'm being a bit narrow in my definition of data streams. I'm mostly thinking of data streams in a particular context, represented mostly with [[Plaintext]]. What they represent is implied by that context.
- A data stream is arguably what *everything* is under the hood, because all digital files are [[Binary]] sequences.
- Also, the "expressiveness" ranking sort of breaks down when you start writing prose, which is an ordered sequence of characters.
- Also, data streams are typically thought of more as an "access pattern" than as a data structure.

****

# More

## Source

- self + chat with Claude
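
## Sketch: a table as a graph

The claim above that a graph can losslessly represent other structures can be illustrated with a small round trip. This is a minimal sketch, not a definitive encoding: the function names `table_to_triples` and `triples_to_table` and the sample `people` rows are made up for illustration, and it assumes every row has a unique key and at least one non-key column. Each cell becomes one edge of the form (row key, column name, value).

```python
# Hypothetical helpers for illustration; an edge is (subject, predicate, object).
def table_to_triples(rows, key):
    """Turn each row into edges from the row's key to its cell values."""
    triples = set()
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.add((subject, column, value))
    return triples


def triples_to_table(triples, key):
    """Rebuild the rows by grouping edges that share a subject."""
    rows = {}
    for subject, column, value in triples:
        rows.setdefault(subject, {key: subject})[column] = value
    # Sort by key so the output order is deterministic.
    return [rows[subject] for subject in sorted(rows)]


# Made-up sample data, already sorted by "id".
people = [
    {"id": 1, "name": "Ada", "team": "compilers"},
    {"id": 2, "name": "Grace", "team": "compilers"},
]

edges = table_to_triples(people, key="id")
restored = triples_to_table(edges, key="id")
```

Note that a plain set of edges does not carry row order, so the sketch sorts by key when rebuilding. If the original order matters (as it does for a stream), it would need to be stored explicitly, for example as an extra position edge per row.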