New JSON data type for ClickHouse
published on 2024/10/23
In this first post, we’ll dive into how we built this feature and how it addresses all of the aforementioned challenges (and past limitations). We’ll also show you why our implementation stands out as the best possible implementation of JSON on top of columnar storage. It features support for:
- Dynamically changing data: allow values with different data types (possibly incompatible and not known beforehand) for the same JSON paths without unification into a least common type, preserving the integrity of mixed-type data.
- High performance and dense, true column-oriented storage: store and read any inserted JSON key path as a native, dense subcolumn, allowing high data compression and maintaining the query performance seen with classic types.
- Scalability: allow limiting the number of subcolumns that are stored separately, to scale JSON storage for high-performance analytics over petabyte-scale datasets.
- Tuning: allow hints for JSON parsing (explicit types for JSON paths, paths that should be skipped during parsing, etc.).
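
To make the four properties above concrete, here is a toy Python sketch of the general idea. It is not ClickHouse's implementation: the class `ToyJSONColumn`, its parameters `max_paths`, `hints`, and `skip`, and the `shared` overflow list are all hypothetical names chosen for illustration. As a simplification, missing values are padded with `None` instead of being stored densely, and nothing is compressed.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Set, Tuple


def flatten(obj: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted.path, leaf_value) pairs for a nested JSON object."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


class ToyJSONColumn:
    """Toy model of a JSON column (hypothetical, for illustration only)."""

    def __init__(
        self,
        max_paths: int,
        hints: Optional[Dict[str, Callable[[Any], Any]]] = None,
        skip: Optional[Set[str]] = None,
    ) -> None:
        self.max_paths = max_paths   # limit on separately stored paths
        self.hints = hints or {}     # path -> conversion to an explicit type
        self.skip = skip or set()    # paths dropped during parsing
        # One list per path; each cell keeps the value together with its own
        # type name, so mixed types are never unified into a common type.
        self.subcolumns: Dict[str, List[Optional[Tuple[str, Any]]]] = {}
        # Per-row overflow for paths beyond the limit (assumed structure).
        self.shared: List[Dict[str, Any]] = []
        self.rows = 0

    def insert(self, doc: Dict[str, Any]) -> None:
        overflow: Dict[str, Any] = {}
        seen: Set[str] = set()
        for path, value in flatten(doc):
            if path in self.skip:
                continue
            if path in self.hints:
                value = self.hints[path](value)
            is_new = path not in self.subcolumns
            if is_new and len(self.subcolumns) < self.max_paths:
                # New subcolumn: backfill earlier rows with None.
                self.subcolumns[path] = [None] * self.rows
            if path in self.subcolumns:
                self.subcolumns[path].append((type(value).__name__, value))
                seen.add(path)
            else:
                overflow[path] = value
        # Pad subcolumns whose path is absent from this document.
        for path, column in self.subcolumns.items():
            if path not in seen:
                column.append(None)
        self.shared.append(overflow)
        self.rows += 1


col = ToyJSONColumn(max_paths=2, hints={"a.b": int}, skip={"debug"})
col.insert({"a": {"b": "42"}, "c": 1, "debug": "x"})
col.insert({"a": {"b": 7}, "c": "text", "d": [1, 2]})
```

In this usage, path `c` holds an integer in the first row and a string in the second, and both are kept with their own types. Path `a.b` is coerced to `int` by the hint, `debug` is skipped, and `d` lands in the overflow structure because the limit of two separately stored paths has already been reached.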