**Central Repository of integrated, historical data from disparate sources**
A staple of business intelligence, a data warehouse is a central repository of integrated, historical (and sometimes current) data from many disparate sources. Enabling a common interface and data history, as a data warehouse contains **copies** of the data, as opposed to the data staying in the source locations.
Or, per my grad school class:
> [!cite] what is a data warehouse?
> a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
Gave rise to the terminology "facts" & "dimensions". ([[Facts vs Dimensions]])
# Data Warehouses vs Database
Data Warehouses are distinct from databases, despite being made up of them.
| | Database | Data Warehouse |
| ------------------ | ----------------------------------------- | ----------------------------------- |
| Purpose | Transactional Processing | Analytical Processing ([[OLAP]]) |
| Data Normalization | Normalized (to reduce read/write latency) | Denormalized (to reduce query time) |
| Volume | Smaller, current data | Larger, historical data |
| Update Frequency | Real time, all the time | Periodic, in batches |
| Use Case | Day-to-day operations | Business intelligence & analysis |
A data warehouse is not _just_ databases, though. Data warehousing is an architectural construct, comprising of databases themselves, but also the processes, metadata, and tooling around them.
> [!tldr] In short:
> - Databases are for running the business.
> - Data warehouses are for analyzing the business.
****
# More
## Source
- [Data warehouse](https://en.wikipedia.org/wiki/Data_warehouse)
- Gemini
- Grad school
## Related
- [[Facts vs Dimensions]]
- [[Data Normalization Levels]]
- [[Relational Databases]]
- [[Star Schema]]
- [[Snowflake Schema]]
- [[Four-Step Data Warehouse Design Process]]