![]() ![]() Consider a data set about articles, where each article has an array of tags. UNNEST also serves as a bridge to the relational model. UNNEST can be a good way to flatten the output. Athena’s CSV output does not handle array and map data properly, and in general tools expect CSV to be flat. When to unnestīeing able to work with arrays and maps is very powerful, but most often you don’t want these data structures in the final result. The result is identical to the previous example where the cities and rivers were in separate arrays (except for the row with Marseille that was missing a river and not included here). Using the table in the examples above, this is how you would use it:įROM country_geography, UNNEST (rivers_by_city ) AS t (city, river ) You can add the WITH ORDINALITY clause to an UNNEST expression to get the element index as a separate column. As long as one of the arrays have an element the query will behave as if the other arrays were padded with NULL. Only when all of the arrays of a row are empty will that row be missing from the result. ![]() UNNEST can probably be said to be like an inner join, because when an array is empty no rows are produced from its row, just like when you inner join and a value for the join key does not exist in the other table.Īs we saw in the example of unnest with multiple arrays it’s slightly more complicated in that situation. Unnesting an array is a form of join, and different joins deal differently with missing values. If you changed the query to SELECT * FROM … you would get the following result, which may help you understand what’s going on: The result of the cross join is a relation with the source rows repeated once per element in the source row’s array, and an extra column that is the element itself. However, I think that Athena has a special case for this type of cross join that knows that it should only combine each value in the unnested relation with the current row in the relation where the array comes from – and you can’t replace it with any other type of join, so I think of it as syntactic sugar to fit the feature into the SQL structure. Cross joins can produce huge results as they combine each row in each relation with every other row in the other relation. Moving on to the CROSS JOIN, this may look a bit scary. It helps me to think of this expression as pivoting the horizontal array into a vertical column, and that there exist a hidden column that tells Athena which source row each row in this new relation came from. The alias unnested_cities is arbitrary, but more on that later. It tells Athena to for each row, flatten the array cities into a relation called unnested_cities that has a column called city. The last line contains a lot, but it’s the UNNEST(cities_and_countries.cities) AS unnested_cities (city) part that is the most important. It’s going to be easiest to understand this query by starting from the end. Another way to think of it is that the source table has been joined with another table with all the array elements, using a join key that identifies which row they belonged to. It’s like the arrays have been pivoted (or unpivoted, depending on your point of view). Say you have an Athena table called cities_and_countries that is set up to read JSON data looking like this: UNNEST is a bit peculiar as it is is an operator that produces a relation, unlike most functions which transform or aggregate scalar values. In this article I will cover how to flatten arrays to rows, how to flatten maps to rows, but also when you should be using UNNEST. In data formats like JSON it’s very common to have arrays and map properties, and one question that often comes up is how you flatten these structures to work better in a traditional tabular format – in other words, how to turn array elements into rows. ![]() Most of the time you also want something flat as output, as Athena’s CSV output format isn’t really suitable for complex values, and most consumers of the output probably don’t handle complex values either. This makes it possible to do pretty advanced things, but it’s not always easy to wrap your head around what’s going on since almost everything in the SQL world was made for scalar column values. Queries can also aggregate rows into arrays and maps. In fact, they can be deep structures of arrays and maps nested within each other. In contrast to many relational databases, Athena’s columns don’t have to be scalar values like strings and numbers, they can also be arrays and maps.
0 Comments
Leave a Reply. |