Data Package (standard)

**Data Package** is an open source set of standards and associated software developed by a not-for-profit organization called "Open Knowledge". They describe Data Package as:

> Data Package is a **standard** consisting of a set of **simple yet extensible specifications** to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates **findability, accessibility, interoperability, and reusability (FAIR)** of data.

# Software
- **Open Data Editor** - an Excel-ish editor
- **Data Curator** - GUI for describing, validating, and sharing data
- **Flatterer** - an opinionated converter of data between various formats including [[CSV]], [[JSON]], [[SQLite]], [[Postgres]], [[Parquet]], and [[Excel]]
- Several **validators** exist for various languages, often under the branding _frictionless-_(language shortname)
	- `frictionless-py`, `frictionless-js`, `frictionless-r`, and julia, ruby, java, php, swift, and go

> [!tip] The standard doesn't matter - it's what you can **do** with the tools that *use* it.

# Specs
## Data Package
A description of a collection of data in a single package. Includes [[Data Package (standard)#Data Resource]]s, and optionally:
- Name, id, licenses, title, description, homepage, image, version, created, keywords, contributors (with given properties), sources (with given properties)

## Data Resource
A description of a single data source, e.g. a file or table. Includes things like `name` & `path` (both required), `title`, `description`, `format`, `mediatype` (mimetype), `encoding` (e.g. utf-8), `bytes` (file size), `hash`, `sources`, and `licenses`, plus, for any tabular data, a [[Data Package (standard)#Table Dialect]] and a [[Data Package (standard)#Table Schema]].

## Table Dialect
A description of how the dataset should be interpreted - things like "what escape character are you using?". Dialects are defined for lots of data types, from [[CSV]] to [[Spreadsheet]]s to [[SQL]] tables.
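
As a rough illustration of how the package, resource, and dialect descriptors nest, here is a minimal sketch that builds one as plain Python dicts and serializes it to JSON. The dataset name, file path, and all values are made up for illustration; the property names follow the lists above plus `resources` and `dialect`, and the dialect keys (`delimiter`, `escapeChar`, `header`) are my reading of the spec, so check them against datapackage.org before relying on them.

```python
import json

# Table Dialect: how the raw file should be parsed (hypothetical values).
dialect = {
    "delimiter": ";",
    "escapeChar": "\\",
    "header": True,
}

# Data Resource: a single file or table. `name` and `path` are the required bits.
resource = {
    "name": "city-temperatures",           # hypothetical resource name
    "path": "data/city-temperatures.csv",  # hypothetical file path
    "title": "City temperatures",
    "format": "csv",
    "mediatype": "text/csv",
    "encoding": "utf-8",
    "dialect": dialect,
}

# Data Package: wraps one or more resources plus optional metadata.
package = {
    "name": "weather-example",  # hypothetical package name
    "title": "Weather example",
    "version": "0.1.0",
    "licenses": [{"name": "CC0-1.0"}],
    "keywords": ["weather", "example"],
    "resources": [resource],
}

# Serialize the descriptor; by convention it is saved as datapackage.json.
descriptor = json.dumps(package, indent=2)
print(descriptor)
```

Nothing here needs a library - the descriptor is just JSON, which is a big part of why so many tools can read it.
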
## Table Schema
The thing that made me learn about Data Package - Table Schema lets you define and describe how different types of tables should look. You can add validation rules for [[CSV]]s and other helpful metadata. There are overall schema descriptors, then descriptors for each of the fields contained therein. You can even handle [[Primary Key]]s and [[Foreign Key]]s.

This _can_ hook into [[Resource Description Framework|RDF]] at a field level. It can also hook into [[JSON Schema]] for Object/JSON type values. Sweet.

You can specify all sorts of [[Data Types]] and constraints, such as [[Enumeration]]s.

****
# More
## Source
- https://datapackage.org/
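
## Example: Table Schema sketch
To make the Table Schema idea concrete, here is a small sketch of a schema with a required field, an enumeration constraint, a primary key, and a foreign key, plus a hand-rolled checker. The field names, the `cities` resource, and the `check_rows` helper are all hypothetical. The checker is not the frictionless validator - it only covers `required`, `enum`, integer parsing, and primary key uniqueness, and it ignores the foreign key entirely.

```python
# A Table Schema descriptor as a plain dict (hypothetical fields and values).
schema = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True}},
        {"name": "city_id", "type": "integer"},
        {
            "name": "status",
            "type": "string",
            "constraints": {"enum": ["draft", "final"]},
        },
    ],
    "primaryKey": ["id"],
    "foreignKeys": [
        {
            "fields": ["city_id"],
            "reference": {"resource": "cities", "fields": ["id"]},
        }
    ],
}


def check_rows(rows, schema):
    """Return a list of problems found in rows (empty list if none).

    Rows are dicts of strings, as you would get from csv.DictReader.
    """
    problems = []
    seen_keys = set()
    for row_no, row in enumerate(rows, start=1):
        for field in schema["fields"]:
            name = field["name"]
            value = row.get(name)
            constraints = field.get("constraints", {})
            # Empty cells only matter when the field is required.
            if value in (None, ""):
                if constraints.get("required"):
                    problems.append(f"row {row_no}: {name} is required")
                continue
            if field.get("type") == "integer":
                try:
                    int(value)
                except ValueError:
                    problems.append(f"row {row_no}: {name} is not an integer")
                    continue
            if "enum" in constraints and value not in constraints["enum"]:
                problems.append(f"row {row_no}: {name} not in enum")
        # Primary key values must not repeat across rows.
        key = tuple(row.get(k) for k in schema.get("primaryKey", []))
        if key and key in seen_keys:
            problems.append(f"row {row_no}: duplicate primary key {key}")
        seen_keys.add(key)
    return problems


rows = [
    {"id": "1", "city_id": "10", "status": "draft"},
    {"id": "1", "city_id": "11", "status": "final"},  # duplicate primary key
    {"id": "", "city_id": "12", "status": "final"},   # missing required id
    {"id": "4", "city_id": "x", "status": "done"},    # bad integer, bad enum
]

problems = check_rows(rows, schema)
for problem in problems:
    print(problem)
```

In practice you would hand the schema to one of the existing validators rather than writing your own; the point is just that the descriptor is readable enough to act on directly.
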