Commit 6edbf57

m-sanjana19 authored and stiga-huang committed
IMPALA-13257: [DOCS] Documentation for unnest() and querying arrays
Currently, the two topics, Querying Arrays and Zipping Unnest on Arrays from Views, were missing. The documentation has been added, and the parent topic has been updated with references to the child topics.

Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Quanlong Huang <huangquanlong@gmail.com>
1 parent 6736063 commit 6edbf57

4 files changed

Lines changed: 219 additions & 3 deletions

docs/impala.ditamap

Lines changed: 4 additions & 1 deletion
@@ -108,7 +108,10 @@ under the License.
108108
</topicref>
109109
<topicref href="topics/impala_tinyint.xml"/>
110110
<topicref href="topics/impala_varchar.xml"/>
111-
<topicref href="topics/impala_complex_types.xml"/>
111+
<topicref href="topics/impala_complex_types.xml">
112+
<topicref href="topics/impala_queryingarrays.xml"/>
113+
<topicref href="topics/impala_unnest_views.xml"/>
114+
</topicref>
112115
</topicref>
113116
<topicref href="topics/impala_literals.xml"/>
114117
<topicref href="topics/impala_operators.xml"/>

docs/topics/impala_complex_types.xml

Lines changed: 3 additions & 2 deletions
@@ -45,8 +45,6 @@ under the License.
4545
and higher. The Hive <codeph>UNION</codeph> type is not currently supported.
4646
</p>
4747

48-
<p outputclass="toc inpage"/>
49-
5048
<p>
5149
Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax
5250
and examples:
@@ -65,6 +63,9 @@ under the License.
6563
<xref href="impala_map.xml#map"/>
6664
</li>
6765
</ul>
66+
<p outputclass="toc inpage"/>
67+
<p>For information on querying arrays and improvements to zipping unnest functionality, refer to
68+
the following:</p>
6869

6970
</conbody>
7071

Lines changed: 132 additions & 0 deletions
@@ -0,0 +1,132 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!--
3+
Licensed to the Apache Software Foundation (ASF) under one
4+
or more contributor license agreements. See the NOTICE file
5+
distributed with this work for additional information
6+
regarding copyright ownership. The ASF licenses this file
7+
to you under the Apache License, Version 2.0 (the
8+
"License"); you may not use this file except in compliance
9+
with the License. You may obtain a copy of the License at
10+
11+
http://www.apache.org/licenses/LICENSE-2.0
12+
13+
Unless required by applicable law or agreed to in writing,
14+
software distributed under the License is distributed on an
15+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
16+
KIND, either express or implied. See the License for the
17+
specific language governing permissions and limitations
18+
under the License.
19+
-->
20+
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
21+
<concept id="impala_queryingarrays" rev="4.1.0">
22+
<title>Querying arrays (<keyword keyref="impala41"/> or higher only)</title>
23+
<titlealts audience="PDF">
24+
<navtitle>Querying arrays</navtitle>
25+
</titlealts>
26+
<prolog>
27+
<metadata>
28+
<data name="Category" value="Impala"/>
29+
<data name="Category" value="Impala Data Types"/>
30+
<data name="Category" value="SQL"/>
31+
<data name="Category" value="Data Analysts"/>
32+
<data name="Category" value="Developers"/>
33+
<data name="Category" value="Schemas"/>
34+
</metadata>
35+
</prolog>
36+
<conbody>
37+
<p rev="4.1.0">
38+
<indexterm audience="hidden">Querying arrays</indexterm> Describes how to use the <codeph>UNNEST</codeph> function
39+
to query arrays. <codeph>ARRAY</codeph> data types represent collections with an arbitrary number of elements,
40+
where each element is of the same type.</p>
41+
<section id="section_yl4_2qb_3cc">
42+
<title>Querying arrays using JOIN and UNNEST</title>
43+
<p>You can query arrays by making a join between the table and the array inside the table.
44+
This approach is improved with the introduction of the <codeph>UNNEST</codeph> function in
45+
the <codeph>SELECT</codeph> list or in the <codeph>FROM</codeph> clause in the
46+
<codeph>SELECT</codeph> statement. When you use <codeph>UNNEST</codeph>, you can provide
47+
more than one array in the <codeph>SELECT</codeph> statement. Querying arrays with
48+
a <codeph>JOIN</codeph> yields a <term>joining unnest</term>, whereas <codeph>UNNEST</codeph> yields a
49+
<term>zipping unnest</term>.</p>
50+
</section>
51+
<section id="section_hmf_hqb_3cc">
52+
<title>Example of querying arrays using JOIN</title>
53+
<p>Use <codeph>JOIN</codeph> when you need a joining unnest of multiple arrays. If you
54+
need a zipping unnest instead, use the <codeph>UNNEST</codeph> function.</p>
55+
<p>Here is an example of a <codeph>SELECT</codeph> statement that uses JOINs to query an
56+
array.</p>
57+
<codeblock id="codeblock_uqy_rqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl, tbl.arr1, tbl.arr2;
58+
59+
ID, ARR1.ITEM, ARR2.ITEM
60+
[1, 1, 10]
61+
[1, 1, 11]
62+
[1, 2, 10]
63+
[1, 2, 11]
64+
[1, 3, 10]
65+
[1, 3, 11]
66+
</codeblock>
67+
<note id="note_tpb_wln_htb">
68+
<p>The test data used in this example is ID: 1, arr1: {1, 2, 3}, arr2: {10, 11}</p>
69+
</note>
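<p>Outside Impala, the joining behavior can be pictured as a Cartesian product of the two arrays. The following Python sketch is illustrative only and is not Impala code; the function name <codeph>joining_unnest</codeph> is hypothetical, and the input mirrors the test data in the note above.</p>

```python
from itertools import product

def joining_unnest(row_id, arr1, arr2):
    """Pair every item of arr1 with every item of arr2 (Cartesian product)."""
    return [(row_id, a, b) for a, b in product(arr1, arr2)]

# Test data from the note: ID 1, arr1 {1, 2, 3}, arr2 {10, 11}
rows = joining_unnest(1, [1, 2, 3], [10, 11])
for row in rows:
    print(row)
```

<p>With three items in arr1 and two in arr2, the sketch produces six rows, matching the six result rows of the JOIN query.</p>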
70+
</section>
71+
<section id="section_ipq_tqb_3cc">
72+
<title>Examples of querying arrays using UNNEST</title>
73+
<p>You can use one of the two different syntaxes shown here to unnest multiple arrays in one
74+
query. Either syntax zips the items of the arrays together instead of joining them.</p>
75+
<ul id="ul_jpq_tqb_3cc">
76+
<li>ISO SQL:2016 compliant syntax:
77+
<codeblock id="codeblock_kpq_tqb_3cc">SELECT a1.item, a2.item
78+
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2);
79+
</codeblock></li>
80+
<li>Postgres-compatible
81+
syntax:<codeblock id="codeblock_yl5_3mn_htb">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock></li>
82+
</ul>
83+
<p><b>Unnest operator in SELECT list</b></p>
84+
<codeblock id="codeblock_lpq_tqb_3cc">SELECT id, unnest(arr1), unnest(arr2) FROM tbl_name;</codeblock>
85+
<p><b>Unnest operator in FROM clause</b></p>
86+
<codeblock id="codeblock_mpq_tqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl_alias, UNNEST(tbl_alias.arr1, tbl_alias.arr2);</codeblock>
87+
<p>Both forms zip the arrays next to each other, as shown here.</p>
88+
<codeblock id="codeblock_npq_tqb_3cc">ID, ARR1.ITEM, ARR2.ITEM
89+
[1, 1, 10]
90+
[1, 2, 11]
91+
[1, 3, NULL]
92+
</codeblock>
93+
<p>Note that arr2 is shorter than arr1, so the "missing" items in its column are filled
94+
with NULLs.</p>
95+
<note id="note_opq_tqb_3cc">The test data used in this example is ID: 1, arr1: {1, 2, 3},
96+
arr2: {10, 11}</note>
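<p>As a rough illustration outside Impala, zipping unnest behaves like Python's <codeph>itertools.zip_longest</codeph>, with <codeph>None</codeph> standing in for NULL. This is a sketch only, not Impala code; the function name <codeph>zipping_unnest</codeph> is hypothetical, and the input mirrors the test data in the note above.</p>

```python
from itertools import zip_longest

def zipping_unnest(row_id, arr1, arr2):
    """Place the i-th items side by side; pad the shorter array with None (NULL)."""
    return [(row_id, a, b) for a, b in zip_longest(arr1, arr2, fillvalue=None)]

# Test data from the note: ID 1, arr1 {1, 2, 3}, arr2 {10, 11}
rows = zipping_unnest(1, [1, 2, 3], [10, 11])
for row in rows:
    print(row)
```

<p>The number of rows equals the length of the longest array (three here), rather than the product of the array lengths as in a joining unnest.</p>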
97+
</section>
98+
<section id="section_i1g_wqb_3cc">
99+
<title>Limitations in using UNNEST</title>
100+
<p>
101+
<ul id="ul_j1g_wqb_3cc">
102+
<li>Only arrays from the same table can be zipping unnested together.</li>
103+
<li>The old (joining) and the new (zipping) unnests cannot be used together.</li>
104+
<li>You can add a <codeph>WHERE</codeph> filter on an unnested item only if you wrap the query in
105+
an outer <codeph>SELECT</codeph> and apply the filter there.
106+
<p>Example:</p><codeblock id="codeblock_k1g_wqb_3cc">SELECT id, arr1_unnest FROM (SELECT id, unnest(arr1) as arr1_unnest FROM tbl_name) WHERE arr1_unnest &lt; 10;</codeblock></li>
107+
</ul>
108+
</p>
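<p>The wrapper pattern from the last limitation can be pictured as two separate steps: the inner query unnests, and the outer query filters the unnested rows. The Python sketch below is illustrative only and is not Impala code; the function names and the sample array {1, 2, 30} are made up for this example.</p>

```python
def unnest_rows(row_id, arr1):
    """Inner query: emit one (id, item) row per array element."""
    return [(row_id, item) for item in arr1]

def outer_filter(rows, limit):
    """Outer query: keep only rows whose unnested item is below the limit."""
    return [(row_id, item) for row_id, item in rows if limit > item]

# Hypothetical data: id 1 with arr1 {1, 2, 30}
inner = unnest_rows(1, [1, 2, 30])
result = outer_filter(inner, 10)
print(result)
```

<p>The filter only sees rows after the unnest has produced them, which is the role the outer <codeph>SELECT</codeph> plays in the SQL example.</p>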
109+
</section>
110+
<section id="section_ewb_yqb_3cc">
111+
<title>Using ARRAY columns in the SELECT list</title>
112+
<!--Removing this since zipping unnest for arrays (IMPALA-10920) and allow array type in SELECT list (IMPALA-9498) are all added in the same upstream release (Impala 4.1)
113+
<p>Prior to this release to look into the content of an array you had to unnest the array
114+
either by the joining syntax or by using the zipping <codeph>UNNEST</codeph> operator as
115+
shown in the following example:</p>
116+
<codeblock id="codeblock_fwb_yqb_3cc">SELECT unnest(IDs), unnest(NAMES) FROM table_name;</codeblock>
117+
-->
118+
<p rev="4.1">Impala 4.1 adds support for returning <codeph>ARRAYs</codeph> as
119+
<codeph>STRINGs</codeph> (<term>JSON arrays</term>) in the <codeph>SELECT</codeph> list,
120+
for example: </p>
121+
<codeblock id="codeblock_gwb_yqb_3cc">select id, int_array from functional_parquet.complextypestbl where id = 1;
122+
returns: 1, "[1,2,3]"
123+
</codeblock>
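<p>Because the array arrives as a JSON-formatted string, a client can turn it back into a native list with any JSON parser. The Python sketch below assumes the client receives exactly the string shown in the example above; how a particular client library delivers the value is not covered here, and the variable names are hypothetical.</p>

```python
import json

# Assumption: the ARRAY column is delivered to the client as this string.
returned_value = "[1,2,3]"

# Parse the JSON array string into a Python list of integers.
int_array = json.loads(returned_value)
print(int_array)
```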
124+
<p>Returning <codeph>ARRAYs</codeph> from inline or Hive Metastore views is also supported.
125+
These arrays can be used either in the select list or as relative table references.</p>
126+
<codeblock id="codeblock_hwb_yqb_3cc">select id, int_array from (select id, int_array from complextypestbl) s;</codeblock>
127+
<p>Though <codeph>STRUCTs</codeph> are already supported, <codeph>ARRAYs</codeph> and
128+
<codeph>STRUCTs</codeph> nested within each other are not supported yet. Using them as
129+
non-relative table references is also not supported yet.</p>
130+
</section>
131+
</conbody>
132+
</concept>
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<!--
3+
Licensed to the Apache Software Foundation (ASF) under one
4+
or more contributor license agreements. See the NOTICE file
5+
distributed with this work for additional information
6+
regarding copyright ownership. The ASF licenses this file
7+
to you under the Apache License, Version 2.0 (the
8+
"License"); you may not use this file except in compliance
9+
with the License. You may obtain a copy of the License at
10+
11+
http://www.apache.org/licenses/LICENSE-2.0
12+
13+
Unless required by applicable law or agreed to in writing,
14+
software distributed under the License is distributed on an
15+
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
16+
KIND, either express or implied. See the License for the
17+
specific language governing permissions and limitations
18+
under the License.
19+
-->
20+
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
21+
<concept id="impala_unnest_views" rev="4.1.0">
22+
<title>Zipping unnest on arrays from views (<keyword keyref="impala41"/> or higher only)</title>
23+
<titlealts audience="PDF">
24+
<navtitle>Zipping unnest on arrays from views</navtitle>
25+
</titlealts>
26+
<prolog>
27+
<metadata>
28+
<data name="Category" value="Impala"/>
29+
<data name="Category" value="Impala Data Types"/>
30+
<data name="Category" value="SQL"/>
31+
<data name="Category" value="Data Analysts"/>
32+
<data name="Category" value="Developers"/>
33+
<data name="Category" value="Schemas"/>
34+
</metadata>
35+
</prolog>
36+
<conbody>
37+
<p rev="4.1.0">
38+
<indexterm audience="hidden">Zipping unnest on arrays from views</indexterm>
39+
In <keyword keyref="impala41"/> and higher, the zipping unnest functionality works for arrays in both tables
40+
and
41+
views.<codeblock id="codeblock_nwz_lyc_3cc">SELECT UNNEST(arr1) FROM view_name;</codeblock>
42+
</p>
43+
<section id="section_irf_rrb_3cc">
44+
<title>UNNEST() on array columns</title>
45+
<p>You can use <codeph>UNNEST()</codeph> on array columns with either of two syntaxes. Both
46+
zip the items of the arrays together instead of joining them.</p>
47+
<ul id="ul_jrf_rrb_3cc">
48+
<li>ISO SQL:2016 compliant syntax</li>
49+
</ul>
50+
<codeblock id="codeblock_krf_rrb_3cc">SELECT a1.item, a2.item
51+
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2);
52+
</codeblock>
53+
<ul id="ul_lrf_rrb_3cc">
54+
<li>Postgres-compatible syntax</li>
55+
</ul>
56+
<codeblock id="codeblock_mrf_rrb_3cc">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock>
57+
<p>When you unnest multiple arrays with zipping unnest, the i-th item of one array is
58+
placed next to the i-th item of each other array in the results. If the arrays differ in
59+
size, the shorter arrays are padded with NULL values up to the size of the
60+
longest array, as shown in the following example:</p>
61+
<p>The test data used in this example is arr1: {1, 2, 3}, arr2: {11, 12}</p>
62+
<p>Running either of the two queries shown above returns the following
63+
result:</p>
64+
<p>
65+
<codeblock id="codeblock_n5z_cjq_mtb">arr1 arr2
66+
[1, 11]
67+
[2, 12]
68+
[3, NULL]
69+
</codeblock>
70+
</p>
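<p>The padding rule generalizes to any number of arrays. The Python sketch below is illustrative only and is not Impala code; <codeph>zip_unnest</codeph> is a hypothetical helper, <codeph>None</codeph> stands in for NULL, and the three-array input is made up for this example.</p>

```python
from itertools import zip_longest

def zip_unnest(*arrays):
    """Zip any number of arrays, padding shorter ones with None (NULL)."""
    return [list(row) for row in zip_longest(*arrays)]

# Test data from this section: arr1 {1, 2, 3}, arr2 {11, 12}
rows = zip_unnest([1, 2, 3], [11, 12])
print(rows)

# Hypothetical three-array case, including an empty array
more_rows = zip_unnest([1], [2, 3], [])
print(more_rows)
```

<p>In both cases the row count equals the length of the longest input array, and an empty array contributes a column of NULLs.</p>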
71+
</section>
72+
<section id="section_ifc_trb_3cc">
73+
<title>Example of UNNEST() in an inline view with a filter in the outer SELECT</title>
74+
<p>In the following example, the filter is not in the <codeph>SELECT</codeph> query that
75+
creates the inline view but in the outer query, one level above it.</p>
76+
<codeblock id="codeblock_jfc_trb_3cc">SELECT id, ac1, ac2 FROM (SELECT id, UNNEST(array_col1) AS ac1, UNNEST(array_col2) AS ac2 FROM some_view) WHERE id &lt; 10;
77+
</codeblock>
78+
</section>
79+
</conbody>
80+
</concept>
