**Data Package** is an open source set of standards and associated software developed by the not-for-profit Open Knowledge Foundation. They describe Data Package as:
> Data Package is a **standard** consisting of a set of **simple yet extensible specifications** to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates **findability, accessibility, interoperability, and reusability (FAIR)** of data.
# Software
- **Open Data Editor** - an Excel-like editor for working with data
- **Data Curator** - GUI for describing, validating, and sharing data
- **Flatterer** - an opinionated converter of data between various formats including [[CSV]], [[JSON]], [[SQLite]], [[Postgres]], [[Parquet]], and [[Excel]]
- Several **validators** exist for various languages, usually branded `frictionless-<language>`:
    - `frictionless-py`, `frictionless-js`, `frictionless-r`, plus Julia, Ruby, Java, PHP, Swift, and Go
> [!tip] The standard doesn't matter - it's what you can **do** with the tools that *use* it.
# Specs
## Data Package
A description of a collection of data in a single package. It must include one or more [[Data Package (standard)#Data Resource|Data Resource]]s and can optionally include:
- `name`, `id`, `licenses`, `title`, `description`, `homepage`, `image`, `version`, `created`, `keywords`, `contributors`, and `sources` (the last two have their own defined sub-properties)
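A minimal sketch of what a package descriptor looks like, written as a plain Python dict and serialized to the JSON that would live in `datapackage.json`. Every name, path, and value below is a hypothetical example; only the property names come from the list above. No Frictionless library is used here, this is just the shape of the data.

```python
import json

# Hypothetical package: every name, path, and value is illustrative.
package = {
    # Optional package-level metadata
    "name": "example-package",
    "title": "Example Package",
    "version": "1.0.0",
    "licenses": [{"name": "CC0-1.0"}],
    "keywords": ["example", "demo"],
    "contributors": [{"title": "Jane Doe"}],
    # Required: one Data Resource per file or table
    "resources": [
        {"name": "cities", "path": "cities.csv", "format": "csv"},
    ],
}

# Serialize to the contents of the conventional descriptor file (datapackage.json).
descriptor = json.dumps(package, indent=2)
print(descriptor)
```

Writing descriptors as plain JSON like this is the whole point of the standard: any tool in the list above should be able to read them.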
## Data Resource
A description of a single data source, e.g. a file or table. It requires a `name` plus exactly one of `path` or inline `data` to locate the data, and can include things like `title`, `description`, `format`, `mediatype` (MIME type), `encoding` (e.g. utf-8), `bytes` (file size), `hash`, `sources`, and `licenses`. Tabular resources can also carry a [[Data Package (standard)#Table Dialect|Table Dialect]] (`dialect`) and a [[Data Package (standard)#Table Schema|Table Schema]] (`schema`).
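A small hand-rolled sketch (not the official validator) of the core resource rules: `name` is required and exactly one of `path` / `data` must be present. It also fills `bytes` and `hash` from some hypothetical file contents; I'm assuming an unprefixed hash means MD5, which I believe is the spec's default.

```python
import hashlib

def check_resource(resource: dict) -> list:
    """Return problems with a Data Resource descriptor (empty list if none).

    Hand-rolled sketch of two core rules, not a full validator:
    `name` is required, and exactly one of `path` / `data` locates the data.
    """
    problems = []
    if not resource.get("name"):
        problems.append("missing 'name'")
    has_path, has_data = "path" in resource, "data" in resource
    if has_path == has_data:  # both present, or neither
        problems.append("needs exactly one of 'path' or 'data'")
    return problems

# Hypothetical file contents, used to fill in `bytes` and `hash`.
content = b"id,city\n1,Paris\n2,Lima\n"

resource = {
    "name": "cities",
    "path": "cities.csv",
    "format": "csv",
    "mediatype": "text/csv",
    "encoding": "utf-8",
    "bytes": len(content),                     # file size in bytes
    "hash": hashlib.md5(content).hexdigest(),  # MD5 hex digest (assumed default)
}

print(check_resource(resource))
print(check_resource({"path": "other.csv"}))
```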
## Table Dialect
A description of how a data file should be parsed, answering questions like "which delimiter, quote, and escape characters are used?". Dialects are defined for formats ranging from [[CSV]] to [[Spreadsheet]]s to [[SQL]] tables.
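To make "how should this be parsed" concrete, here's a sketch that maps a few CSV dialect properties (`delimiter`, `quoteChar`, `doubleQuote`, `escapeChar`, `skipInitialSpace`, `header`) onto Python's built-in `csv` module. The function name and the fallback values are my own choices for this sketch, and other dialect properties (e.g. `commentChar`) are ignored.

```python
import csv
import io

# Hypothetical dialect: semicolon-delimited, single-quoted, with a header row.
dialect = {
    "delimiter": ";",
    "quoteChar": "'",
    "doubleQuote": True,
    "skipInitialSpace": True,
    "header": True,
}

def read_with_dialect(text: str, dialect: dict) -> tuple:
    """Parse text using a subset of Table Dialect properties.

    Returns (header, rows). Fallbacks (comma, double quote, header row)
    are this sketch's choices; unsupported properties are ignored.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
        doublequote=dialect.get("doubleQuote", True),
        escapechar=dialect.get("escapeChar"),
        skipinitialspace=dialect.get("skipInitialSpace", False),
    )
    rows = list(reader)
    if dialect.get("header", True) and rows:
        return rows[0], rows[1:]
    return [], rows

header, rows = read_with_dialect("id; city\n1; 'Paris, FR'\n2; Lima\n", dialect)
print(header, rows)
```

The same bytes parse completely differently under a different dialect, which is why the dialect travels with the resource rather than being guessed.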
## Table Schema
The thing that made me learn about Data Package: Table Schema lets you define and describe what a given kind of table should look like. You can attach validation rules (constraints) to [[CSV]]s, along with other helpful metadata. A schema has top-level properties, plus a descriptor for each of its fields. It even handles [[Primary Key]]s and [[Foreign Key]]s.
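A sketch of a schema with `primaryKey` and `foreignKeys`, plus a hypothetical `check_keys` helper that does the two key checks by hand: primary-key uniqueness, and foreign-key values existing in the referenced resource. It assumes list-form keys and rows already loaded as dicts; a real validator does far more.

```python
# Hypothetical schema: top-level keys plus one descriptor per field.
schema = {
    "fields": [
        {"name": "id", "type": "integer"},
        {"name": "city", "type": "string"},
        {"name": "country_code", "type": "string"},
    ],
    "primaryKey": ["id"],
    "foreignKeys": [
        {
            "fields": ["country_code"],
            "reference": {"resource": "countries", "fields": ["code"]},
        }
    ],
}

def check_keys(rows: list, schema: dict, resources: dict) -> list:
    """Sketch: check primary-key uniqueness and foreign-key lookups.

    `rows` is a list of dicts keyed by field name; `resources` maps a
    resource name to its rows. Assumes list-form keys, as above.
    """
    errors = []
    pk = schema.get("primaryKey", [])
    if pk:
        seen = set()
        for i, row in enumerate(rows):
            key = tuple(row[f] for f in pk)
            if key in seen:
                errors.append(f"row {i}: duplicate primary key {key}")
            seen.add(key)
    for fk in schema.get("foreignKeys", []):
        ref = fk["reference"]
        targets = {tuple(r[f] for f in ref["fields"]) for r in resources[ref["resource"]]}
        for i, row in enumerate(rows):
            value = tuple(row[f] for f in fk["fields"])
            if value not in targets:
                errors.append(f"row {i}: {value} not found in {ref['resource']}")
    return errors

cities = [
    {"id": 1, "city": "Paris", "country_code": "FR"},
    {"id": 1, "city": "Lima", "country_code": "XX"},
]
countries = [{"code": "FR"}, {"code": "PE"}]
errors = check_keys(cities, schema, {"countries": countries})
for e in errors:
    print(e)
```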
Table Schema _can_ hook into [[Resource Description Framework|RDF]] at the field level (via `rdfType`). It can also hook into [[JSON Schema]] for object/JSON values. Sweet.
You can specify all sorts of [[Data Types]] and constraints, such as [[Enumeration]]s.
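Here's a rough sketch of field-level types and constraints: cast a raw cell to the field's type, then check `required`, `enum`, `minimum`, and `pattern`. One field also carries an `rdfType`, which this code ignores. The casters are deliberately simplified (the spec's types have richer lexical rules), treating `""` as missing is an assumption of this sketch, and `check_value` is a made-up name.

```python
import re

# Hypothetical field descriptors. `rdfType` ties a field to an RDF class;
# `required`, `enum`, `minimum`, and `pattern` are Table Schema constraints.
fields = [
    {"name": "status", "type": "string",
     "constraints": {"required": True, "enum": ["draft", "published"]}},
    {"name": "population", "type": "integer",
     "constraints": {"minimum": 0}},
    {"name": "country", "type": "string",
     "rdfType": "https://schema.org/Country",
     "constraints": {"pattern": "[A-Z]{2}"}},
]

# Simplified casters: the spec's types have richer lexical rules than these.
CASTERS = {"string": str, "integer": int, "number": float}

def check_value(field: dict, raw: str) -> list:
    """Cast one raw cell and check a few constraints (sketch only)."""
    c = field.get("constraints", {})
    if raw == "":  # empty string treated as a missing value here
        return ["required"] if c.get("required") else []
    try:
        value = CASTERS[field["type"]](raw)
    except ValueError:
        return [f"not a valid {field['type']}"]
    problems = []
    if "enum" in c and value not in c["enum"]:
        problems.append("not in enum")
    if "minimum" in c and value < c["minimum"]:
        problems.append("below minimum")
    # Pattern is matched against the whole raw string.
    if "pattern" in c and not re.fullmatch(c["pattern"], raw):
        problems.append("does not match pattern")
    return problems

print(check_value(fields[0], "final"))
print(check_value(fields[1], "-5"))
```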
****
# More
## Source
- https://datapackage.org/