A deep dive into the development of a macro to explode your nested N-level JSON columns
As JSON is a well-known and widely used standardized encoding format, exploding JSON becomes a must-have task for data engineers and analysts. The functions that can parse JSON are supported in many data warehouses nowadays, including Snowflake. With the FLATTEN function in Snowflake, we can parse nested JSON into columns easily. If you have experienced using dbt (data build tool) to perform the transformation jobs, you might expect a user-defined function to perform a flattened JSON job, no matter what the dataset and JSON columns are. In this post, I will show you how to develop the dbt Macro to flatten JSON step-by-step.
With only SQL, we need to write down the JSON path of the elements we want. A Macro with Jinja gives us the capabilities beyond SQL to flatten JSON with only table and the JSON column names as input. The output will be a flattened table.
Below is a screenshot for the sample source.
Sample Source
What is the final result with the macro? Take a glance below.
Sample Outputs
We will go through the sample data, and the development steps in the following sessions. You can also jump to the Put it together session at the bottom to copy the Macro script directly.
The Snowflake Sample Data: OpenWeatherMap contains tables constructed by JSON columns and timestamp columns.
Screenshot of snowflake_sample_date.weather.weather_14_total
Here is an example cell selected from the sample table column āVā. The JSON object contains various data types: strings, numbers, arrays, and objects (dictionaries), which makes it a comprehensive example to develop a flattening solution.
An example JSON data from snowflake_sample_date.weather.weather_14_total