**Central Repository of integrated, historical data from disparate sources** A staple of business intelligence, a data warehouse is a central repository of integrated, historical (and sometimes current) data from many disparate sources. Enabling a common interface and data history, as a data warehouse contains **copies** of the data, as opposed to the data staying in the source locations. Or, per my grad school class: > [!cite] what is a data warehouse? > a subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes Gave rise to the terminology "facts" & "dimensions". ([[Facts vs Dimensions]]) # Data Warehouses vs Database Data Warehouses are distinct from databases, despite being made up of them. | | Database | Data Warehouse | | ------------------ | ----------------------------------------- | ----------------------------------- | | Purpose | Transactional Processing | Analytical Processing ([[OLAP]]) | | Data Normalization | Normalized (to reduce read/write latency) | Denormalized (to reduce query time) | | Volume | Smaller, current data | Larger, historical data | | Update Frequency | Real time, all the time | Periodic, in batches | | Use Case | Day-to-day operations | Business intelligence & analysis | A data warehouse is not _just_ databases, though. Data warehousing is an architectural construct, comprising of databases themselves, but also the processes, metadata, and tooling around them. > [!tldr] In short: > - Databases are for running the business. > - Data warehouses are for analyzing the business. **** # More ## Source - [Data warehouse](https://en.wikipedia.org/wiki/Data_warehouse) - Gemini - Grad school ## Related - [[Facts vs Dimensions]] - [[Data Normalization Levels]] - [[Relational Databases]] - [[Star Schema]] - [[Snowflake Schema]] - [[Four-Step Data Warehouse Design Process]]