diff --git a/content/Development/desingdocs/column-statistics-in-hive.md b/content/Development/desingdocs/column-statistics-in-hive.md
index 50230987..284657c8 100644
--- a/content/Development/desingdocs/column-statistics-in-hive.md
+++ b/content/Development/desingdocs/column-statistics-in-hive.md
@@ -15,11 +15,11 @@ Column statistics are introduced in Hive 0.10.0 by [HIVE-1362](https://issues.ap
 Column statistics auto gather is introduced in Hive 2.3 by [HIVE-11160](https://issues.apache.org/jira/browse/HIVE-11160). This is also the design document.
-For general information about Hive statistics, see [Statistics in Hive]({{< ref "statsdev" >}}). For information about top K statistics, see [Column Level Top K Statistics]({{< ref "top-k-stats" >}}).
+For general information about Hive statistics, see [Statistics in Hive]({{% ref "statsdev" %}}). For information about top K statistics, see [Column Level Top K Statistics]({{% ref "top-k-stats" %}}).
 ### **HiveQL changes**
-HiveQL currently supports the [analyze command]({{< ref "#analyze-command" >}}) to compute statistics on tables and partitions. HiveQL’s analyze command will be extended to trigger statistics computation on one or more column in a Hive table/partition. The necessary changes to HiveQL are as below,
+HiveQL currently supports the [analyze command]({{% ref "#analyze-command" %}}) to compute statistics on tables and partitions. HiveQL’s analyze command will be extended to trigger statistics computation on one or more column in a Hive table/partition. The necessary changes to HiveQL are as below,
 `analyze table t [partition p] compute statistics for [columns c,...];`
diff --git a/content/Development/desingdocs/correlation-optimizer.md b/content/Development/desingdocs/correlation-optimizer.md
index 746a13cb..34532c50 100644
--- a/content/Development/desingdocs/correlation-optimizer.md
+++ b/content/Development/desingdocs/correlation-optimizer.md
@@ -127,7 +127,7 @@ In Hive, a submitted SQL query needs to be evaluated in a distributed system. Wh
 For an operator requiring data shuffling, Hive will add one or multiple `ReduceSinkOperators` as parents of this operator (the number of `ReduceSinkOperators` depends on the number of inputs of the operator requiring data shuffling). Those `ReduceSinkOperators` form the boundary between the Map phase and Reduce phase. Then, Hive will cut the operator tree to multiple pieces (MapReduce tasks) and each piece can be executed in a MapReduce job.
-For a complex query, it is possible that a input table is used by multiple MapReduce tasks. In this case, this table will be loaded multiple times when the original operator tree is used. Also, when generating those `ReduceSinkOperators`, Hive does not consider if the corresponding operator requiring data shuffling really needs a re-partitioned input data. For example, in the original operator tree of [Example 1]({{< ref "#example-1" >}}) ([Figure 1]({{< ref "#figure-1" >}})), `AGG1`, `JOIN1`, and `AGG2` require the data been shuffled in the same way because all of them require the column `key` to be the partitioning column in their corresponding `ReduceSinkOperators`. But, Hive is not aware this correlation between `AGG1`, `JOIN1`, and `AGG2`, and still generates three MapReduce tasks.
+For a complex query, it is possible that a input table is used by multiple MapReduce tasks. In this case, this table will be loaded multiple times when the original operator tree is used. 
Also, when generating those `ReduceSinkOperators`, Hive does not consider if the corresponding operator requiring data shuffling really needs a re-partitioned input data. For example, in the original operator tree of [Example 1]({{% ref "#example-1" %}}) ([Figure 1]({{% ref "#figure-1" %}})), `AGG1`, `JOIN1`, and `AGG2` require the data been shuffled in the same way because all of them require the column `key` to be the partitioning column in their corresponding `ReduceSinkOperators`. But, Hive is not aware this correlation between `AGG1`, `JOIN1`, and `AGG2`, and still generates three MapReduce tasks. Correlation Optimizer aims to exploit two intra-qeury correlations mentioned above. @@ -138,7 +138,7 @@ Correlation Optimizer aims to exploit two intra-qeury correlations mentioned abo In Hive, every query has one or multiple terminal operators which are the last operators in the operator tree. Those terminal operators are called FileSinkOperatos. To give an easy explanation, if an operator A is on another operator B's path to a FileSinkOperato, A is the downstream of B and B is the upstream of A. -For a given operator tree like the one shown in [Figure 1]({{< ref "#figure-1" >}}), the Correlation Optimizer starts to visit operators in the tree from those FileSinkOperatos in a depth-first way. The tree walker stops at every ReduceSinkOperator. Then, a correlation detector starts to find a correlation from this ReduceSinkOperator and its siblings by finding the furthest correlated upstream ReduceSinkOperators in a recursive way. If we can find any correlated upstream ReduceSinkOperator, we find a correlation. Currently, there are three conditions to determine if a upstream ReduceSinkOperator and an downstream ReduceSinkOperator are correlated, which are +For a given operator tree like the one shown in [Figure 1]({{% ref "#figure-1" %}}), the Correlation Optimizer starts to visit operators in the tree from those FileSinkOperatos in a depth-first way. 
The tree walker stops at every ReduceSinkOperator. Then, a correlation detector starts to find a correlation from this ReduceSinkOperator and its siblings by finding the furthest correlated upstream ReduceSinkOperators in a recursive way. If we can find any correlated upstream ReduceSinkOperator, we find a correlation. Currently, there are three conditions to determine if a upstream ReduceSinkOperator and an downstream ReduceSinkOperator are correlated, which are 1. emitted rows from these two ReduceSinkOperators are sorted in the same way; 2. emitted rows from these two ReduceSinkOperators are partitioned in the same way; and @@ -156,11 +156,11 @@ With these two rules, we start to analyze those parent ReduceSinkOperators of th For a UnionOperator, none of its parents will be a ReduceSinkOperator. So, we check if we can find correlated ReduceSinkOperators for every parent branch of this UnionOperator. If any branch does not have a ReduceSinkOperator, we will determine that we do not find any correlated ReduceSinkOperator at parent branches of this UnionOperator. During the process of correlation detection, it is possible that the detector can visit a JoinOperator which will be converted to a Map Join later. In this case, the detector stops searching the branch containing this Map Join. For example, -in [Figure 5]({{< ref "#figure-5" >}}), the detector knows that MJ1, MJ2, and MJ3 will be converted to Map Joins. +in [Figure 5]({{% ref "#figure-5" %}}), the detector knows that MJ1, MJ2, and MJ3 will be converted to Map Joins. ## 5. Operator Tree Transformation -In a correlation, there are two kinds of ReduceSinkOperators. The first kinds of ReduceSinkOperators are at the bottom layer of a query operator tree which are needed to emit rows to the shuffling phase. For example, in [Figure 1]({{< ref "#figure-1" >}}), RS1 and RS3 are bottom layer ReduceSinkOperators. 
The second kinds of ReduceSinkOperators are unnecessary ones which can be removed from the optimized operator tree. For example, in [Figure 1]({{< ref "#figure-1" >}}), RS2 and RS4 are unnecessary ReduceSinkOperators. Because the input rows of the Reduce phase may need to be forwarded to different operators and those input rows are coming from a single stream, we add a new operator called DemuxOperator to dispatch input rows of the Reduce phase to corresponding operators. In the operator tree transformation, we first connect children of those bottom layer ReduceSinkOperators to the DemuxOperator and reassign tags of those bottom layer ReduceSinkOperators (the DemuxOperator is the only child of those bottom layer ReduceSinkOperators). In the DemuxOperator, we record two mappings. The first one is called newTagToOldTag which maps those new tags assigned to those bottom layer ReduceSinkOperators to their original tags. Those original tags are needed to make JoinOperator work correctly. The second mapping is called newTagToChildIndex which maps those new tags to the children indexes. With this mapping, the DemuxOperator can know the correct operator that a row needs to be forwarded based on the tag of this row. The second step of operator tree transformation is to remove those unnecessary ReduceSinkOperators. To make the operator tree in the Reduce phase work correctly, we add a new operator called MuxOperator to the original place of those unnecessary ReduceSinkOperators. It is worth noting that if an operator has multiple unnecessary ReduceSinkOperators as its parents, we only add a single MuxOperator. +In a correlation, there are two kinds of ReduceSinkOperators. The first kinds of ReduceSinkOperators are at the bottom layer of a query operator tree which are needed to emit rows to the shuffling phase. For example, in [Figure 1]({{% ref "#figure-1" %}}), RS1 and RS3 are bottom layer ReduceSinkOperators. 
The second kinds of ReduceSinkOperators are unnecessary ones which can be removed from the optimized operator tree. For example, in [Figure 1]({{% ref "#figure-1" %}}), RS2 and RS4 are unnecessary ReduceSinkOperators. Because the input rows of the Reduce phase may need to be forwarded to different operators and those input rows are coming from a single stream, we add a new operator called DemuxOperator to dispatch input rows of the Reduce phase to corresponding operators. In the operator tree transformation, we first connect children of those bottom layer ReduceSinkOperators to the DemuxOperator and reassign tags of those bottom layer ReduceSinkOperators (the DemuxOperator is the only child of those bottom layer ReduceSinkOperators). In the DemuxOperator, we record two mappings. The first one is called newTagToOldTag which maps those new tags assigned to those bottom layer ReduceSinkOperators to their original tags. Those original tags are needed to make JoinOperator work correctly. The second mapping is called newTagToChildIndex which maps those new tags to the children indexes. With this mapping, the DemuxOperator can know the correct operator that a row needs to be forwarded based on the tag of this row. The second step of operator tree transformation is to remove those unnecessary ReduceSinkOperators. To make the operator tree in the Reduce phase work correctly, we add a new operator called MuxOperator to the original place of those unnecessary ReduceSinkOperators. It is worth noting that if an operator has multiple unnecessary ReduceSinkOperators as its parents, we only add a single MuxOperator. ## 6. 
Executing Optimized Operator Tree in the Reduce Phase
diff --git a/content/Development/desingdocs/default-constraint.md b/content/Development/desingdocs/default-constraint.md
index ea0aae51..13dbbdf9 100644
--- a/content/Development/desingdocs/default-constraint.md
+++ b/content/Development/desingdocs/default-constraint.md
@@ -104,7 +104,7 @@ Along with this logic change we foresee the following changes:
 ## Further Work
-[HIVE-19059](https://issues.apache.org/jira/browse/HIVE-19059) adds the keyword DEFAULT to enable users to add DEFAULT values in INSERT and UPDATE statements without specifying the column schema.  See [DEFAULT Keyword (HIVE-19059)]({{< ref "default-keyword" >}}).
+[HIVE-19059](https://issues.apache.org/jira/browse/HIVE-19059) adds the keyword DEFAULT to enable users to add DEFAULT values in INSERT and UPDATE statements without specifying the column schema.  See [DEFAULT Keyword (HIVE-19059)]({{% ref "default-keyword" %}}).
diff --git a/content/Development/desingdocs/default-keyword.md b/content/Development/desingdocs/default-keyword.md
index 7d220afe..c5ba5172 100644
--- a/content/Development/desingdocs/default-keyword.md
+++ b/content/Development/desingdocs/default-keyword.md
@@ -11,7 +11,7 @@ We propose to add DEFAULT keyword in INSERT INTO, UPDATE and MERGE statements to
 # Background
-With the addition of [DEFAULT constraint]({{< ref "default-constraint" >}}) ([HIVE-18726](https://issues.apache.org/jira/browse/HIVE-18726)) user can define columns to have default value which will be used in case user doesn’t explicitly specify it while INSERTING data. For DEFAULT constraint to kick in user has to explicitly specify column schema leaving out the column name for which user would like the sytem to use DEFAULT value. e.g. INSERT INTO TABLE1(COL1, COL3) VALUES(1,3). This statement leaves COL2 from the schema so that Hive could insert DEFAULT value if it is defined. But if user wants to insert DEFAULT value without specifying column schema it is not possible to do so. This limitation could be overcome using DEFAULT keyword. 
+With the addition of [DEFAULT constraint]({{% ref "default-constraint" %}}) ([HIVE-18726](https://issues.apache.org/jira/browse/HIVE-18726)) user can define columns to have default value which will be used in case user doesn’t explicitly specify it while INSERTING data. For DEFAULT constraint to kick in user has to explicitly specify column schema leaving out the column name for which user would like the sytem to use DEFAULT value. e.g. INSERT INTO TABLE1(COL1, COL3) VALUES(1,3). This statement leaves COL2 from the schema so that Hive could insert DEFAULT value if it is defined. But if user wants to insert DEFAULT value without specifying column schema it is not possible to do so. This limitation could be overcome using DEFAULT keyword. 
 # Proposed Changes
diff --git a/content/Development/desingdocs/design.md b/content/Development/desingdocs/design.md
index 8b4ca2e1..ffa4d453 100644
--- a/content/Development/desingdocs/design.md
+++ b/content/Development/desingdocs/design.md
@@ -5,7 +5,7 @@ date: 2024-12-12
 # Apache Hive : Design
-This page contains details about the Hive design and architecture. A brief technical report about Hive is available at [hive.pdf]({{< ref "#hive-pdf" >}}).
+This page contains details about the Hive design and architecture. A brief technical report about Hive is available at [hive.pdf]({{% ref "#hive-pdf" %}}).
 ## Hive Architecture
@@ -77,7 +77,7 @@ More plan transformations are performed by the optimizer. The optimizer is an ev
 ## Hive APIs
-[Hive APIs Overview]({{< ref "hive-apis-overview" >}}) describes various public-facing APIs that Hive provides.
+[Hive APIs Overview]({{% ref "hive-apis-overview" %}}) describes various public-facing APIs that Hive provides.
## Attachments: diff --git a/content/Development/desingdocs/designdocs.md b/content/Development/desingdocs/designdocs.md index 332f491c..672d2ee8 100644 --- a/content/Development/desingdocs/designdocs.md +++ b/content/Development/desingdocs/designdocs.md @@ -11,80 +11,80 @@ Proposals that appear in the "Completed" and "In Progress" sections should inclu ## Completed -* [Views]({{< ref "viewdev" >}}) ([HIVE-1143](https://issues.apache.org/jira/browse/HIVE-1143)) -* [Partitioned Views]({{< ref "partitionedviews" >}}) ([HIVE-1941](https://issues.apache.org/jira/browse/HIVE-1941)) -* [Storage Handlers]({{< ref "storagehandlers" >}}) ([HIVE-705](https://issues.apache.org/jira/browse/HIVE-705)) -* [HBase Integration]({{< ref "hbaseintegration" >}}) -* [HBase Bulk Load]({{< ref "hbasebulkload" >}}) -* [Locking]({{< ref "locking" >}}) ([HIVE-1293](https://issues.apache.org/jira/browse/HIVE-1293)) -* [Indexes]({{< ref "indexdev" >}}) ([HIVE-417](https://issues.apache.org/jira/browse/HIVE-417)) -* [Bitmap Indexes]({{< ref "indexdev-bitmap" >}}) ([HIVE-1803](https://issues.apache.org/jira/browse/HIVE-1803)) -* [Filter Pushdown]({{< ref "filterpushdowndev" >}}) ([HIVE-279](https://issues.apache.org/jira/browse/HIVE-279)) -* [Table-level Statistics]({{< ref "statsdev" >}}) ([HIVE-1361](https://issues.apache.org/jira/browse/HIVE-1361)) -* [Dynamic Partitions]({{< ref "dynamicpartitions" >}}) -* [Binary Data Type]({{< ref "binary-datatype-proposal" >}}) ([HIVE-2380](https://issues.apache.org/jira/browse/HIVE-2380)) +* [Views]({{% ref "viewdev" %}}) ([HIVE-1143](https://issues.apache.org/jira/browse/HIVE-1143)) +* [Partitioned Views]({{% ref "partitionedviews" %}}) ([HIVE-1941](https://issues.apache.org/jira/browse/HIVE-1941)) +* [Storage Handlers]({{% ref "storagehandlers" %}}) ([HIVE-705](https://issues.apache.org/jira/browse/HIVE-705)) +* [HBase Integration]({{% ref "hbaseintegration" %}}) +* [HBase Bulk Load]({{% ref "hbasebulkload" %}}) +* [Locking]({{% ref "locking" %}}) 
([HIVE-1293](https://issues.apache.org/jira/browse/HIVE-1293)) +* [Indexes]({{% ref "indexdev" %}}) ([HIVE-417](https://issues.apache.org/jira/browse/HIVE-417)) +* [Bitmap Indexes]({{% ref "indexdev-bitmap" %}}) ([HIVE-1803](https://issues.apache.org/jira/browse/HIVE-1803)) +* [Filter Pushdown]({{% ref "filterpushdowndev" %}}) ([HIVE-279](https://issues.apache.org/jira/browse/HIVE-279)) +* [Table-level Statistics]({{% ref "statsdev" %}}) ([HIVE-1361](https://issues.apache.org/jira/browse/HIVE-1361)) +* [Dynamic Partitions]({{% ref "dynamicpartitions" %}}) +* [Binary Data Type]({{% ref "binary-datatype-proposal" %}}) ([HIVE-2380](https://issues.apache.org/jira/browse/HIVE-2380)) * [Decimal Precision and Scale Support](/attachments/27362075/34177489.pdf) -* [HCatalog]({{< ref "hcatalog-base" >}}) (formerly [Howl]({{< ref "howl" >}})) -* [HiveServer2]({{< ref "hiveserver2-thrift-api" >}}) ([HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935)) -* [Column Statistics in Hive]({{< ref "column-statistics-in-hive" >}}) ([HIVE-1362](https://issues.apache.org/jira/browse/HIVE-1362)) -* [List Bucketing]({{< ref "listbucketing" >}}) ([HIVE-3026](https://issues.apache.org/jira/browse/HIVE-3026)) -* [Group By With Rollup]({{< ref "groupbywithrollup" >}}) ([HIVE-2397](https://issues.apache.org/jira/browse/HIVE-2397)) +* [HCatalog]({{% ref "hcatalog-base" %}}) (formerly [Howl]({{% ref "howl" %}})) +* [HiveServer2]({{% ref "hiveserver2-thrift-api" %}}) ([HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935)) +* [Column Statistics in Hive]({{% ref "column-statistics-in-hive" %}}) ([HIVE-1362](https://issues.apache.org/jira/browse/HIVE-1362)) +* [List Bucketing]({{% ref "listbucketing" %}}) ([HIVE-3026](https://issues.apache.org/jira/browse/HIVE-3026)) +* [Group By With Rollup]({{% ref "groupbywithrollup" %}}) ([HIVE-2397](https://issues.apache.org/jira/browse/HIVE-2397)) * [Enhanced Aggregation, Cube, Grouping and 
Rollup](/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup) ([HIVE-3433](https://issues.apache.org/jira/browse/HIVE-3433)) -* [Optimizing Skewed Joins]({{< ref "skewed-join-optimization" >}}) ([HIVE-3086](https://issues.apache.org/jira/browse/HIVE-3086)) -* [Correlation Optimizer]({{< ref "correlation-optimizer" >}}) ([HIVE-2206](https://issues.apache.org/jira/browse/HIVE-2206)) -* [Hive on Tez]({{< ref "hive-on-tez" >}}) ([HIVE-4660](https://issues.apache.org/jira/browse/HIVE-4660)) - + [Hive-Tez Compatibility]({{< ref "hive-tez-compatibility" >}}) -* [Vectorized Query Execution]({{< ref "vectorized-query-execution" >}}) ([HIVE-4160](https://issues.apache.org/jira/browse/HIVE-4160)) +* [Optimizing Skewed Joins]({{% ref "skewed-join-optimization" %}}) ([HIVE-3086](https://issues.apache.org/jira/browse/HIVE-3086)) +* [Correlation Optimizer]({{% ref "correlation-optimizer" %}}) ([HIVE-2206](https://issues.apache.org/jira/browse/HIVE-2206)) +* [Hive on Tez]({{% ref "hive-on-tez" %}}) ([HIVE-4660](https://issues.apache.org/jira/browse/HIVE-4660)) + + [Hive-Tez Compatibility]({{% ref "hive-tez-compatibility" %}}) +* [Vectorized Query Execution]({{% ref "vectorized-query-execution" %}}) ([HIVE-4160](https://issues.apache.org/jira/browse/HIVE-4160)) * [Cost Based Optimizer in Hive](/docs/latest/user/cost-based-optimization-in-hive) ([HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775)) * [Atomic Insert/Update/Delete](https://issues.apache.org/jira/browse/HIVE-5317) ([HIVE-5317](https://issues.apache.org/jira/browse/HIVE-5317)) * [Transaction Manager](https://issues.apache.org/jira/browse/HIVE-5843) ([HIVE-5843](https://issues.apache.org/jira/browse/HIVE-5843)) * [SQL Standard based secure authorization](/attachments/27362075/35193122.pdf) ([HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837)) -* [Hybrid Hybrid Grace Hash Join]({{< ref "hybrid-grace-hash-join-v1-0" >}}) ([HIVE-9277](https://issues.apache.org/jira/browse/HIVE-9277)) -* [LLAP 
Daemons]({{< ref "llap" >}}) ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926)) -* [Support for Hive Replication]({{< ref "hivereplicationdevelopment" >}}) ([HIVE-7973](https://issues.apache.org/jira/browse/HIVE-7973)) +* [Hybrid Hybrid Grace Hash Join]({{% ref "hybrid-grace-hash-join-v1-0" %}}) ([HIVE-9277](https://issues.apache.org/jira/browse/HIVE-9277)) +* [LLAP Daemons]({{% ref "llap" %}}) ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926)) +* [Support for Hive Replication]({{% ref "hivereplicationdevelopment" %}}) ([HIVE-7973](https://issues.apache.org/jira/browse/HIVE-7973)) ## In Progress -* [Column Level Top K Statistics]({{< ref "top-k-stats" >}}) ([HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421)) +* [Column Level Top K Statistics]({{% ref "top-k-stats" %}}) ([HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421)) * [Hive on Spark](/docs/latest/user/hive-on-spark) ([HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292)) -* [Hive on Spark: Join Design (HIVE-7613)]({{< ref "hive-on-spark-join-design-master" >}}) +* [Hive on Spark: Join Design (HIVE-7613)]({{% ref "hive-on-spark-join-design-master" %}}) * [Improve ACID Performance](https://issues.apache.org/jira/secure/attachment/12823582/Design.Document.Improving%20ACID%20performance%20in%20Hive.02.docx) – download docx file ([HIVE-14035](https://issues.apache.org/jira/browse/HIVE-14035), [HIVE-14199](https://issues.apache.org/jira/browse/HIVE-14199), [HIVE-14233](https://issues.apache.org/jira/browse/HIVE-14233)) -* [Query Results Caching]({{< ref "query-results-caching" >}}) ([HIVE-18513](https://issues.apache.org/jira/browse/HIVE-18513)) -* [Default Constraint]({{< ref "default-constraint" >}}) [(HIVE-18726)](https://issues.apache.org/jira/browse/HIVE-18726) -* [Different TIMESTAMP types]({{< ref "different-timestamp-types" >}}) ([HIVE-21348](https://issues.apache.org/jira/browse/HIVE-21348)) -* [Support SAML 2.0 authentication]({{< ref 
"support-saml-2-0-authentication-mode" >}}) ([HIVE-24543](https://issues.apache.org/jira/browse/HIVE-24543)) +* [Query Results Caching]({{% ref "query-results-caching" %}}) ([HIVE-18513](https://issues.apache.org/jira/browse/HIVE-18513)) +* [Default Constraint]({{% ref "default-constraint" %}}) [(HIVE-18726)](https://issues.apache.org/jira/browse/HIVE-18726) +* [Different TIMESTAMP types]({{% ref "different-timestamp-types" %}}) ([HIVE-21348](https://issues.apache.org/jira/browse/HIVE-21348)) +* [Support SAML 2.0 authentication]({{% ref "support-saml-2-0-authentication-mode" %}}) ([HIVE-24543](https://issues.apache.org/jira/browse/HIVE-24543)) ## Proposed -* [Spatial Queries]({{< ref "spatial-queries" >}}) -* [Theta Join]({{< ref "theta-join" >}}) ([HIVE-556](https://issues.apache.org/jira/browse/HIVE-556)) +* [Spatial Queries]({{% ref "spatial-queries" %}}) +* [Theta Join]({{% ref "theta-join" %}}) ([HIVE-556](https://issues.apache.org/jira/browse/HIVE-556)) * [attachments/27362075/55476344.pdf](/attachments/27362075/55476344.pdf) * [JDBC Storage Handler](https://issues.apache.org/jira/secure/attachment/12474978/JDBCStorageHandler+Design+Doc.pdf) -* [MapJoin Optimization]({{< ref "mapjoinoptimization" >}}) +* [MapJoin Optimization]({{% ref "mapjoinoptimization" %}}) * [Proposal to standardize and expand Authorization in Hive](https://issues.apache.org/jira/secure/attachment/12554109/Hive_Authorization_Functionality.pdf) -* [Dependent Tables]({{< ref "dependent-tables" >}}) ([HIVE-3466](https://issues.apache.org/jira/browse/HIVE-3466)) -* [AccessServer]({{< ref "accessserver-design-proposal" >}}) -* [Type Qualifiers in Hive]({{< ref "type-qualifiers-in-hive" >}}) -* [MapJoin & Partition Pruning]({{< ref "mapjoin-and-partition-pruning" >}}) ([HIVE-5119](https://issues.apache.org/jira/browse/HIVE-5119)) -* [Updatable Views]({{< ref "updatableviews" >}}) ([HIVE-1143](https://issues.apache.org/jira/browse/HIVE-1143)) -* [Phase 2 of Replication Development]({{< ref 
"hivereplicationv2development" >}}) ([HIVE-14841](https://issues.apache.org/jira/browse/HIVE-14841)) -* [Subqueries in SELECT]({{< ref "subqueries-in-select" >}}) ([HIVE-16091](https://issues.apache.org/jira/browse/HIVE-16091)) +* [Dependent Tables]({{% ref "dependent-tables" %}}) ([HIVE-3466](https://issues.apache.org/jira/browse/HIVE-3466)) +* [AccessServer]({{% ref "accessserver-design-proposal" %}}) +* [Type Qualifiers in Hive]({{% ref "type-qualifiers-in-hive" %}}) +* [MapJoin & Partition Pruning]({{% ref "mapjoin-and-partition-pruning" %}}) ([HIVE-5119](https://issues.apache.org/jira/browse/HIVE-5119)) +* [Updatable Views]({{% ref "updatableviews" %}}) ([HIVE-1143](https://issues.apache.org/jira/browse/HIVE-1143)) +* [Phase 2 of Replication Development]({{% ref "hivereplicationv2development" %}}) ([HIVE-14841](https://issues.apache.org/jira/browse/HIVE-14841)) +* [Subqueries in SELECT]({{% ref "subqueries-in-select" %}}) ([HIVE-16091](https://issues.apache.org/jira/browse/HIVE-16091)) * [DEFAULT keyword](/development/desingdocs/default-keyword) [(HIVE-19059)](https://issues.apache.org/jira/browse/HIVE-19059) -* [Hive remote databases/tables]({{< ref "hive-remote-databases-tables" >}}) +* [Hive remote databases/tables]({{% ref "hive-remote-databases-tables" %}}) ## Incomplete -* [Authorization]({{< ref "authdev" >}}) (Committed but not secure/deployable – see [Disclaimer]({{< ref "#disclaimer" >}})) +* [Authorization]({{% ref "authdev" %}}) (Committed but not secure/deployable – see [Disclaimer]({{% ref "#disclaimer" %}})) ## Abandoned -* [Hive across Multiple Data Centers (Physical Clusters)]({{< ref "hive-across-multiple-data-centers" >}}) -* [Metastore on HBase]({{< ref "hbasemetastoredevelopmentguide" >}}) ([HIVE-9452](https://issues.apache.org/jira/browse/HIVE-9452)) +* [Hive across Multiple Data Centers (Physical Clusters)]({{% ref "hive-across-multiple-data-centers" %}}) +* [Metastore on HBase]({{% ref "hbasemetastoredevelopmentguide" %}}) 
([HIVE-9452](https://issues.apache.org/jira/browse/HIVE-9452)) ## Other -* [Security Notes]({{< ref "security" >}}) -* [Hive Outer Join Behavior]({{< ref "outerjoinbehavior" >}}) +* [Security Notes]({{% ref "security" %}}) +* [Hive Outer Join Behavior]({{% ref "outerjoinbehavior" %}}) * [Metastore ER Diagram](https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf) ## Attachments: diff --git a/content/Development/desingdocs/dynamicpartitions.md b/content/Development/desingdocs/dynamicpartitions.md index 04675207..f098b61a 100644 --- a/content/Development/desingdocs/dynamicpartitions.md +++ b/content/Development/desingdocs/dynamicpartitions.md @@ -9,11 +9,11 @@ date: 2024-12-12 This is the design document for dynamic partitions in Hive. Usage information is also available: -* [Tutorial: Dynamic-Partition Insert]({{< ref "#tutorial:-dynamic-partition-insert" >}}) -* [Hive DML: Dynamic Partition Inserts]({{< ref "#hive-dml:-dynamic-partition-inserts" >}}) -* [HCatalog Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) - + [Usage with Pig]({{< ref "#usage-with-pig" >}}) - + [Usage from MapReduce]({{< ref "#usage-from-mapreduce" >}}) +* [Tutorial: Dynamic-Partition Insert]({{% ref "#tutorial:-dynamic-partition-insert" %}}) +* [Hive DML: Dynamic Partition Inserts]({{% ref "#hive-dml:-dynamic-partition-inserts" %}}) +* [HCatalog Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) + + [Usage with Pig]({{% ref "#usage-with-pig" %}}) + + [Usage from MapReduce]({{% ref "#usage-from-mapreduce" %}}) References: diff --git a/content/Development/desingdocs/filterpushdowndev.md b/content/Development/desingdocs/filterpushdowndev.md index fa74e597..2e31c940 100644 --- a/content/Development/desingdocs/filterpushdowndev.md +++ b/content/Development/desingdocs/filterpushdowndev.md @@ -12,7 +12,7 @@ This document explains how we are planning to add support in Hive's optimizer fo Below are the main use cases we are targeting. 
 * Pushing filters down into Hive's builtin storage formats such as RCFile
-* Pushing filters down into storage handlers such as the [HBase handler]({{< ref "hbaseintegration" >}}) ()
+* Pushing filters down into storage handlers such as the [HBase handler]({{% ref "hbaseintegration" %}}) ()
 * Pushing filters down into index access plans once an indexing framework is added to Hive ()
 
 ## Components Involved
diff --git a/content/Development/desingdocs/hbasebulkload.md b/content/Development/desingdocs/hbasebulkload.md
index d4a0ff43..4614081f 100644
--- a/content/Development/desingdocs/hbasebulkload.md
+++ b/content/Development/desingdocs/hbasebulkload.md
@@ -9,7 +9,7 @@ This page explains how to use Hive to bulk load data into a new (empty) HBase ta
 
 ## Overview
 
-Ideally, bulk load from Hive into HBase would be part of [HBaseIntegration]({{< ref "hbaseintegration" >}}), making it as simple as this:
+Ideally, bulk load from Hive into HBase would be part of [HBaseIntegration]({{% ref "hbaseintegration" %}}), making it as simple as this:
 
 ```
 CREATE TABLE new_hbase_table(rowkey string, x int, y int)
diff --git a/content/Development/desingdocs/hbasemetastoredevelopmentguide.md b/content/Development/desingdocs/hbasemetastoredevelopmentguide.md
index 5ebefd55..b48ed7b9 100644
--- a/content/Development/desingdocs/hbasemetastoredevelopmentguide.md
+++ b/content/Development/desingdocs/hbasemetastoredevelopmentguide.md
@@ -73,7 +73,7 @@ hive --service hbaseimport
 
  [Overall Approach](https://issues.apache.org/jira/secure/attachment/12697601/HBaseMetastoreApproach.pdf)
 
-[Hbase execution plans for RawStore partition filter condition]({{< ref "hbase-execution-plans-for-rawstore-partition-filter-condition" >}})
+[Hbase execution plans for RawStore partition filter condition]({{% ref "hbase-execution-plans-for-rawstore-partition-filter-condition" %}})
 
  
 
diff --git a/content/Development/desingdocs/hive-on-spark-join-design-master.md b/content/Development/desingdocs/hive-on-spark-join-design-master.md
index 48b01157..f8596f61 100644
--- a/content/Development/desingdocs/hive-on-spark-join-design-master.md
+++ b/content/Development/desingdocs/hive-on-spark-join-design-master.md
@@ -9,7 +9,7 @@ date: 2024-12-12
 
 The purpose of this document is to summarize the findings of all the research of different joins and describe a unified design to attack the problem in Spark.  It will identify the optimization processors will be involved and their responsibilities.
 
-It is not the purpose to go in depth for design of the various join implementations in Spark, such as the common-join ([HIVE-7384](https://issues.apache.org/jira/browse/HIVE-7384)), or the optimized join variants like mapjoin ([HIVE-7613](https://issues.apache.org/jira/browse/HIVE-7613)), skew-join ([HIVE-8406](https://issues.apache.org/jira/browse/HIVE-8406)) or SMB mapjoin ([HIVE-8202](https://issues.apache.org/jira/browse/HIVE-8202)).  It will be helpful to refer to the design documents attached on JIRA for those details before reading this document, as they will also contain some background of how they are implemented in MapReduce and comparisons.  Lastly, it will also be helpful to read the overall [Hive on Spark]({{< ref "hive-on-spark" >}}) design doc before reading this document.
+It is not the purpose to go in depth for design of the various join implementations in Spark, such as the common-join ([HIVE-7384](https://issues.apache.org/jira/browse/HIVE-7384)), or the optimized join variants like mapjoin ([HIVE-7613](https://issues.apache.org/jira/browse/HIVE-7613)), skew-join ([HIVE-8406](https://issues.apache.org/jira/browse/HIVE-8406)) or SMB mapjoin ([HIVE-8202](https://issues.apache.org/jira/browse/HIVE-8202)).  It will be helpful to refer to the design documents attached on JIRA for those details before reading this document, as they will also contain some background of how they are implemented in MapReduce and comparisons.  Lastly, it will also be helpful to read the overall [Hive on Spark]({{% ref "hive-on-spark" %}}) design doc before reading this document.
 
 ## MapReduce Summary
 
diff --git a/content/Development/desingdocs/hive-on-tez.md b/content/Development/desingdocs/hive-on-tez.md
index b8119ec8..0712fb99 100644
--- a/content/Development/desingdocs/hive-on-tez.md
+++ b/content/Development/desingdocs/hive-on-tez.md
@@ -91,7 +91,7 @@ hive.execution.engine (changed in [HIVE-6103](https://issues.apache.org/jira/bro
 tez: Submit native TEZ dags, optimized for MRR/MPJ
 + ~~False~~ 
 mr (default): Submit single map, single reduce plans
-* Update:  Several configuration variables were introduced in Hive 0.13.0.  See the [Tez section]({{< ref "#tez-section" >}}) in Configuration Properties.
+* Update:  Several configuration variables were introduced in Hive 0.13.0.  See the [Tez section]({{% ref "#tez-section" %}}) in Configuration Properties.
 
 Note: It is possible to execute an MR plan against TEZ. In order to do so, one simply has to change the following variable (assuming Tez is installed on the cluster):
 
@@ -299,11 +299,11 @@ Mini Tez Cluster will initially be the only way to run Tez during unit tests. Lo
 
 For information about how to set up Tez on a Hadoop 2 cluster, see . 
 
-For information about how to configure Hive 0.13.0+ for Tez, see the release notes for [HIVE-6098, Merge Tez branch into trunk](https://issues.apache.org/jira/browse/HIVE-6098).  Also see [Configuration Properties: Tez]({{< ref "#configuration-properties:-tez" >}}) for descriptions of all the Tez parameters.
+For information about how to configure Hive 0.13.0+ for Tez, see the release notes for [HIVE-6098, Merge Tez branch into trunk](https://issues.apache.org/jira/browse/HIVE-6098).  Also see [Configuration Properties: Tez]({{% ref "#configuration-properties:-tez" %}}) for descriptions of all the Tez parameters.
 
 ### Hive-Tez Compatibility
 
-For a list of Hive and Tez releases that are compatible with each other, see [Hive-Tez Compatibility]({{< ref "hive-tez-compatibility" >}}).
+For a list of Hive and Tez releases that are compatible with each other, see [Hive-Tez Compatibility]({{% ref "hive-tez-compatibility" %}}).
  
 
  
diff --git a/content/Development/desingdocs/hivereplicationdevelopment.md b/content/Development/desingdocs/hivereplicationdevelopment.md
index 2bc5e1fc..a2646f00 100644
--- a/content/Development/desingdocs/hivereplicationdevelopment.md
+++ b/content/Development/desingdocs/hivereplicationdevelopment.md
@@ -11,11 +11,11 @@ Replication in the context of databases and warehouses is the process of duplica
 
 In Hive, replication (introduced in [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-7973)) focuses on disaster recovery, using a lazy, primary-copy model. It uses notifications and export/import statements to implement an API for replication that can then be executed by other tools such as [Falcon](https://falcon.apache.org/).
 
-See [Hive Replication]({{< ref "replication" >}}) for usage information.
+See [Hive Replication]({{% ref "replication" %}}) for usage information.
 
 ###### Version 2 of Hive Replication
 
-This document describes the initial version of Hive Replication.  A second version is also available:  see [HiveReplicationv2Development]({{< ref "hivereplicationv2development" >}}) for details.
+This document describes the initial version of Hive Replication.  A second version is also available:  see [HiveReplicationv2Development]({{% ref "hivereplicationv2development" %}}) for details.
 
 ## Purposes of Replication
 
@@ -37,7 +37,7 @@ Replication is the process of making a duplicate copy of some object such as a d
 
 ## Replication Taxonomy
 
-Replication systems are frequently classified by their transaction source (“where”) and their synchronization strategy (“when”) [[1]({{< ref "#1" >}})].
+Replication systems are frequently classified by their transaction source (“where”) and their synchronization strategy (“when”) [[1]({{% ref "#1" %}})].
 
 ### Transaction Source
 
@@ -79,7 +79,7 @@ Since replication in Hive focuses on disaster recovery, the read-only load balan
 
 ### Eager vs Lazy
 
-Eager replication requires a guaranteed delta log for every update on the primary. This poses a problem when [external tables]({{< ref "#external-tables" >}}) are used in Hive. External tables allow data to be managed completely outside of Hive, and therefore may not provide Hive with a complete and accurate delta log. With ACID tables, such a log does exist and will be made use of in future development, but currently replication in Hive works only with traditional tables.
+Eager replication requires a guaranteed delta log for every update on the primary. This poses a problem when [external tables]({{% ref "#external-tables" %}}) are used in Hive. External tables allow data to be managed completely outside of Hive, and therefore may not provide Hive with a complete and accurate delta log. With ACID tables, such a log does exist and will be made use of in future development, but currently replication in Hive works only with traditional tables.
 
 Instead, Hive uses lazy replication. Unlike eager replication, lazy replication is asynchronous and non-blocking which allows for better resource utilization. This is prioritized in Hive with the acknowledgement that there is some complexity involved with events being processed out of order, including destructive events such as `DROP`.
 
@@ -96,9 +96,9 @@ In addition to the taxonomy choices listed above, a number of other factors infl
 
 ## Basic Approach
 
-Hive already supports [EXPORT and IMPORT commands]({{< ref "languagemanual-importexport" >}}) which can be used to dump out tables, DistCp them to another cluster, and import/create from that. A mechanism which automates exports/imports would establish a base on which replication could be developed. With the aid of HiveMetaStoreEventHandler mechanisms, such automation can be developed to generate notifications when certain changes are committed to the metastore and then translate those notifications to export actions, DistCp actions, and import actions.
+Hive already supports [EXPORT and IMPORT commands]({{% ref "languagemanual-importexport" %}}) which can be used to dump out tables, DistCp them to another cluster, and import/create from that. A mechanism which automates exports/imports would establish a base on which replication could be developed. With the aid of HiveMetaStoreEventHandler mechanisms, such automation can be developed to generate notifications when certain changes are committed to the metastore and then translate those notifications to export actions, DistCp actions, and import actions.
 
-This already partially exists with the [notification system]({{< ref "hcatalog-notification" >}}) that is part of the hcatalog-server-extensions jar. Initially, this was developed to be able to trigger a JMS notification, which an Oozie workflow could use to start off actions keyed on the finishing of a job that used HCatalog to write to a table. While this currently lives under HCatalog, the primary reason for its existence has a scope well past HCatalog alone and can be used as-is without the use of [HCatalog IF/OF]({{< ref "hcatalog-inputoutput" >}}). This can be extended with the help of a library which does that aforementioned translation of notifications to actions.
+This already partially exists with the [notification system]({{% ref "hcatalog-notification" %}}) that is part of the hcatalog-server-extensions jar. Initially, this was developed to be able to trigger a JMS notification, which an Oozie workflow could use to start off actions keyed on the finishing of a job that used HCatalog to write to a table. While this currently lives under HCatalog, the primary reason for its existence has a scope well past HCatalog alone and can be used as-is without the use of [HCatalog IF/OF]({{% ref "hcatalog-inputoutput" %}}). This can be extended with the help of a library which does that aforementioned translation of notifications to actions.
 
 # Implementation
 
diff --git a/content/Development/desingdocs/hivereplicationv2development.md b/content/Development/desingdocs/hivereplicationv2development.md
index 9380e6ac..b2b2db27 100644
--- a/content/Development/desingdocs/hivereplicationv2development.md
+++ b/content/Development/desingdocs/hivereplicationv2development.md
@@ -5,7 +5,7 @@ date: 2024-12-12
 
 # Apache Hive : HiveReplicationv2Development
 
-This document describes the second version of Hive Replication. Please refer to the [first version of Hive Replication]({{< ref "hivereplicationdevelopment" >}}) for details on prior implementation.
+This document describes the second version of Hive Replication. Please refer to the [first version of Hive Replication]({{% ref "hivereplicationdevelopment" %}}) for details on prior implementation.
 
 This work is under development and interfaces are subject to change. This has been designed for use in conjunction with external orchestration tools, which would be responsible for co-ordinating the right sequence of commands between source and target clusters, fault tolerance/failure handling, and also providing correct configuration options that are necessary to be able to do cross cluster replication.
 
@@ -15,7 +15,7 @@ As of Hive 3.0.0 release : only managed table replication where Hive user owns t
 
 # Issues with the Current Replication System
 
-Some of the observed issues with the [current replication implementation]({{< ref "hivereplicationdevelopment" >}}) are as follows:
+Some of the observed issues with the [current replication implementation]({{% ref "hivereplicationdevelopment" %}}) are as follows:
 
 1. Slowness
 2. Requiring staging dirs with full copies (4xcopy problem)
diff --git a/content/Development/desingdocs/hybrid-grace-hash-join-v1-0.md b/content/Development/desingdocs/hybrid-grace-hash-join-v1-0.md
index c5ec281a..21acee2d 100644
--- a/content/Development/desingdocs/hybrid-grace-hash-join-v1-0.md
+++ b/content/Development/desingdocs/hybrid-grace-hash-join-v1-0.md
@@ -69,7 +69,7 @@ It’s obvious that GRACE Hash Join uses main memory as a staging area during th
 
 This feature tries to avoid the unnecessary write-back of partitions to disk as much as possible, and will only do that when necessary. The idea is to fully utilize the main memory to hold existing partitions of hash tables.
 
-The key factor that will impact the performance of this algorithm is whether the data can be evenly distributed into different hash partitions. If we have skewed values, which will result in a few very big partitions, then an extra partitioning step is needed to divide the big partitions down to many. This can happen recursively. Refer to [Recursive Hashing and Spilling]({{< ref "#recursive-hashing-and-spilling" >}}) below for more details. 
+The key factor that will impact the performance of this algorithm is whether the data can be evenly distributed into different hash partitions. If we have skewed values, which will result in a few very big partitions, then an extra partitioning step is needed to divide the big partitions down to many. This can happen recursively. Refer to [Recursive Hashing and Spilling]({{% ref "#recursive-hashing-and-spilling" %}}) below for more details. 
 
 # Algorithm
 
diff --git a/content/Development/desingdocs/indexdev.md b/content/Development/desingdocs/indexdev.md
index 5379e13c..77e62cb8 100644
--- a/content/Development/desingdocs/indexdev.md
+++ b/content/Development/desingdocs/indexdev.md
@@ -10,19 +10,19 @@ date: 2024-12-12
 There are alternate options which might work similarily to indexing:
 
 * Materialized views with automatic rewriting can result in very similar results.  [Hive 2.3.0](https://issues.apache.org/jira/browse/HIVE-14249) adds support for materialzed views.
-* Using columnar file formats ([Parquet]({{< ref "parquet" >}}), [ORC](https://orc.apache.org/docs/indexes.html)) – they can do selective scanning; they may even skip entire files/blocks.
+* Using columnar file formats ([Parquet]({{% ref "parquet" %}}), [ORC](https://orc.apache.org/docs/indexes.html)) – they can do selective scanning; they may even skip entire files/blocks.
 
 Indexing has been **removed** in version 3.0 ([HIVE-18448](https://issues.apache.org/jira/browse/HIVE-18448)).
 
 ## Introduction
 
-This document explains the proposed design for adding index support to Hive ([HIVE-417](http://issues.apache.org/jira/browse/HIVE-417)). Indexing is a standard database technique, but with many possible variations. Rather than trying to provide a "one-size-fits-all" index implementation, the approach we are taking is to define indexing in a pluggable manner (related to [StorageHandlers]({{< ref "storagehandlers" >}})) and provide one concrete indexing implementation as a reference, leaving it open for contributors to plug in other indexing schemes as time goes by. No index support will be available until Hive 0.7.
+This document explains the proposed design for adding index support to Hive ([HIVE-417](http://issues.apache.org/jira/browse/HIVE-417)). Indexing is a standard database technique, but with many possible variations. Rather than trying to provide a "one-size-fits-all" index implementation, the approach we are taking is to define indexing in a pluggable manner (related to [StorageHandlers]({{% ref "storagehandlers" %}})) and provide one concrete indexing implementation as a reference, leaving it open for contributors to plug in other indexing schemes as time goes by. No index support will be available until Hive 0.7.
 
 ## Scope
 
 Only single-table indexes are supported. Others (such as join indexes) may be more appropriately expressed as materialized views once Hive has support for those.
 
-This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on [FilterPushdownDev]({{< ref "filterpushdowndev" >}})).
+This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on [FilterPushdownDev]({{% ref "filterpushdowndev" %}})).
 
 ## CREATE INDEX
 
@@ -44,7 +44,7 @@ AS 'index.handler.class.name'
 
 ```
 
-For the details of the various clauses such as ROW FORMAT, see [Create Table]({{< ref "#create-table" >}}).
+For the details of the various clauses such as ROW FORMAT, see [Create Table]({{% ref "#create-table" %}}).
 
 By default, index partitioning matches the partitioning of the base table. The PARTITIONED BY clause may be used to specify a subset of the table's partitioning columns (this column list may be empty to indicate that the index spans all partitions of the table). For example, a table may be partitioned by date+region even though the index is partitioned by date alone (each index partition spanning all regions).
 
@@ -151,7 +151,7 @@ ALTER INDEX index_name ON table_name [PARTITION (...)] REBUILD
 
 ```
 
-For the PARTITION clause syntax, see [LanguageManual DDL#Add_Partitions]({{< ref "#languagemanual-ddl#add_partitions" >}}).
+For the PARTITION clause syntax, see [LanguageManual DDL#Add_Partitions]({{% ref "#languagemanual-ddl#add_partitions" %}}).
 
 If WITH DEFERRED REBUILD is specified on CREATE INDEX, then the newly created index is initially empty (regardless of whether the table contains any data). The ALTER INDEX ... REBUILD command can be used to build the index structure for all partitions or a single partition.
 
diff --git a/content/Development/desingdocs/listbucketing.md b/content/Development/desingdocs/listbucketing.md
index 81d7bdd6..e421730e 100644
--- a/content/Development/desingdocs/listbucketing.md
+++ b/content/Development/desingdocs/listbucketing.md
@@ -78,7 +78,7 @@ This approach does not scale in the following scenarios:
 * *Skewed Table* is a table which has skewed information.
 * *List Bucketing Table* is a skewed table. In addition, it tells Hive to use the list bucketing feature on the skewed table: create sub-directories for skewed values.
 
-A normal skewed table can be used for skewed join, etc. (See the [Skewed Join Optimization]({{< ref "skewed-join-optimization" >}}) design document.) You don't need to define it as a list bucketing table if you don't use the list bucketing feature.
+A normal skewed table can be used for skewed join, etc. (See the [Skewed Join Optimization]({{% ref "skewed-join-optimization" %}}) design document.) You don't need to define it as a list bucketing table if you don't use the list bucketing feature.
 
 ## List Bucketing Validation
 
@@ -211,5 +211,5 @@ List bucketing was added in Hive 0.10.0 and 0.11.0.
 * [HIVE-3072](https://issues.apache.org/jira/browse/HIVE-3072):  Hive List Bucketing - DDL support (release 0.10.0)
 * [HIVE-3073](https://issues.apache.org/jira/browse/HIVE-3073):  Hive List Bucketing - DML support (release 0.11.0)
 
-For more information, see [Skewed Tables in the DDL document]({{< ref "#skewed-tables-in-the-ddl-document" >}}).
+For more information, see [Skewed Tables in the DDL document]({{% ref "#skewed-tables-in-the-ddl-document" %}}).
 
diff --git a/content/Development/desingdocs/llap.md b/content/Development/desingdocs/llap.md
index 857cb62b..26991148 100644
--- a/content/Development/desingdocs/llap.md
+++ b/content/Development/desingdocs/llap.md
@@ -6,11 +6,11 @@ date: 2024-12-12
 # Apache Hive : LLAP
 
 Live Long And Process (LLAP) functionality was added in Hive 2.0 ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926) and associated tasks). [HIVE-9850](https://issues.apache.org/jira/browse/HIVE-9850) links documentation, features, and issues for this enhancement.
-For configuration of LLAP, see the LLAP Section of [Configuration Properties]({{< ref "#configuration-properties" >}}).
+For configuration of LLAP, see the LLAP Section of [Configuration Properties]({{% ref "#configuration-properties" %}}).
 
 ## Overview
 
-Hive has become significantly faster thanks to various features and improvements that were built by the community in recent years, including [Tez]({{< ref "hive-on-tez" >}}) and [Cost-based-optimization]({{< ref "cost-based-optimization-in-hive" >}}). The following were needed to take Hive to the next level:
+Hive has become significantly faster thanks to various features and improvements that were built by the community in recent years, including [Tez]({{% ref "hive-on-tez" %}}) and [Cost-based-optimization]({{% ref "cost-based-optimization-in-hive" %}}). The following were needed to take Hive to the next level:
 
 * Asynchronous spindle-aware IO
 * Pre-fetching and caching of column chunks
@@ -60,7 +60,7 @@ The daemon off-loads I/O and transformation from compressed format to separate t
 
 ## Caching
 
-The daemon caches metadata for input files, as well as the data. The metadata and index information can be cached even for data that is not currently cached. Metadata is stored in process in Java objects; cached data is stored in the format described in the [I/O section]({{< ref "#i/o-section" >}}), and kept off-heap (see [Resource management]({{< ref "#resource-management" >}})).
+The daemon caches metadata for input files, as well as the data. The metadata and index information can be cached even for data that is not currently cached. Metadata is stored in process in Java objects; cached data is stored in the format described in the [I/O section]({{% ref "#i/o-section" %}}), and kept off-heap (see [Resource management]({{% ref "#resource-management" %}})).
 
 * **Eviction policy.** The eviction policy is tuned for analytical workloads with frequent (partial) table-scans. Initially, a simple policy like LRFU is used. The policy is pluggable.
 * **Caching granularity.** Column-chunks are the unit of data in the cache. This achieves a compromise between low-overhead processing and storage efficiency. The granularity of the chunks depends on the particular file format and execution engine (Vectorized Row Batch size, ORC stripe, etc.).
diff --git a/content/Development/desingdocs/locking.md b/content/Development/desingdocs/locking.md
index 284cb99e..e309ff60 100644
--- a/content/Development/desingdocs/locking.md
+++ b/content/Development/desingdocs/locking.md
@@ -84,7 +84,7 @@ The default Hive behavior will not be changed, and concurrency will not be suppo
 
 ## Turn Off Concurrency
 
-You can turn off concurrency by setting the following variable to false: [hive.support.concurrency]({{< ref "#hive-support-concurrency" >}}).
+You can turn off concurrency by setting the following variable to false: [hive.support.concurrency]({{% ref "#hive-support-concurrency" %}}).
 
 ## Debugging
 
@@ -95,18 +95,18 @@ You can see the locks on a table by issuing the following command:
 * SHOW LOCKS PARTITION ();
 * SHOW LOCKS PARTITION () EXTENDED;
 
-See also [EXPLAIN LOCKS]({{< ref "#explain-locks" >}}).
+See also [EXPLAIN LOCKS]({{% ref "#explain-locks" %}}).
 
 ## Configuration
 
-Configuration properties for Hive locking are described in [Locking]({{< ref "#locking" >}}).
+Configuration properties for Hive locking are described in [Locking]({{% ref "#locking" %}}).
 
 # Locking in Hive Transactions
 
 Hive [0.13.0](https://issues.apache.org/jira/browse/HIVE-5317) adds transactions with row-level ACID semantics, using a new lock manager. For more information, see:
 
-* [ACID and Transactions in Hive]({{< ref "hive-transactions" >}})
-* [Lock Manager]({{< ref "#lock-manager" >}})
+* [ACID and Transactions in Hive]({{% ref "hive-transactions" %}})
+* [Lock Manager]({{% ref "#lock-manager" %}})
  
  
  
diff --git a/content/Development/desingdocs/outerjoinbehavior.md b/content/Development/desingdocs/outerjoinbehavior.md
index 4f85df07..d12545fe 100644
--- a/content/Development/desingdocs/outerjoinbehavior.md
+++ b/content/Development/desingdocs/outerjoinbehavior.md
@@ -32,7 +32,7 @@ This captured in the following table:
 | Join Predicate | Case J1: Not Pushed | Case J2: Pushed |
 | Where Predicate | Case W1: Pushed | Case W2: Not Pushed |
 
-See [Examples]({{< ref "#examples" >}}) below for illustrations of cases J1, J2, W1, and W2.
+See [Examples]({{% ref "#examples" %}}) below for illustrations of cases J1, J2, W1, and W2.
 
 ### Hive Implementation
 
diff --git a/content/Development/desingdocs/partitionedviews.md b/content/Development/desingdocs/partitionedviews.md
index bb06d3f0..35d5f2cb 100644
--- a/content/Development/desingdocs/partitionedviews.md
+++ b/content/Development/desingdocs/partitionedviews.md
@@ -5,7 +5,7 @@ date: 2024-12-12
 
 # Apache Hive : PartitionedViews
 
-This is a followup to [ViewDev]({{< ref "viewdev" >}}) for adding partition-awareness to views.
+This is a followup to [ViewDev]({{% ref "viewdev" %}}) for adding partition-awareness to views.
 
 # Use Cases
 
diff --git a/content/Development/desingdocs/security.md b/content/Development/desingdocs/security.md
index 706493c8..44b2d38a 100644
--- a/content/Development/desingdocs/security.md
+++ b/content/Development/desingdocs/security.md
@@ -9,7 +9,7 @@ This page collects some resources and pointers for various efforts underway to a
 
 Authorization modes
 
-The links below refer to the [original Hive authorization mode]({{< ref "hive-deprecated-authorization-mode" >}}). See [Authorization]({{< ref "languagemanual-authorization" >}}) for an overview of authorization modes, which include [storage based authorization]({{< ref "storage-based-authorization-in-the-metastore-server" >}}) and [SQL standards based authorization]({{< ref "sql-standard-based-hive-authorization" >}}).
+The links below refer to the [original Hive authorization mode]({{% ref "hive-deprecated-authorization-mode" %}}). See [Authorization]({{% ref "languagemanual-authorization" %}}) for an overview of authorization modes, which include [storage based authorization]({{% ref "storage-based-authorization-in-the-metastore-server" %}}) and [SQL standards based authorization]({{% ref "sql-standard-based-hive-authorization" %}}).
 
 * [Thoughts on security from Venkatesh](https://issues.apache.org/jira/secure/attachment/12453831/HiveSecurityThoughts.pdf)
 * [Howl's approach for persisting and validating DDL authorization via HDFS permissions](http://mail-archives.apache.org/mod_mbox/hadoop-hive-dev/201007.mbox/%3C11ED50FC7C760F4EA9F44109C531617D06F692E6@SNV-EXVS06.ds.corp.yahoo.com%3E)
@@ -17,9 +17,9 @@ The links below refer to the [original Hive authorization mode]({{< ref "hive-de
 * [THRIFT-889: allow Kerberos authentication over Thrift HTTP](https://issues.apache.org/jira/browse/THRIFT-889)
 * [THRIFT-876: SASL integration](https://issues.apache.org/jira/browse/THRIFT-876)
 * [Howl Authorization Proposal](http://wiki.apache.org/pig/Howl/HowlAuthorizationProposal)
-* [Hive Authorization Proposal]({{< ref "authdev" >}})
+* [Hive Authorization Proposal]({{% ref "authdev" %}})
 
-Note that Howl was the precursor to [HCatalog]({{< ref "hcatalog-usinghcat" >}}).
+Note that Howl was the precursor to [HCatalog]({{% ref "hcatalog-usinghcat" %}}).
  
 
  
diff --git a/content/Development/desingdocs/skewed-join-optimization.md b/content/Development/desingdocs/skewed-join-optimization.md
index 95d39952..1da8e39f 100644
--- a/content/Development/desingdocs/skewed-join-optimization.md
+++ b/content/Development/desingdocs/skewed-join-optimization.md
@@ -45,7 +45,7 @@ The assumption is that B has few rows with keys which are skewed in A. So these
 
 ### Hive Enhancements
 
-*Original plan:*  ~~The skew data will be obtained from list bucketing (see the [List Bucketing]({{< ref "listbucketing" >}})~~~~design document). There will be no additions to the Hive grammar.~~
+*Original plan:*  ~~The skew data will be obtained from list bucketing (see the [List Bucketing]({{% ref "listbucketing" %}})~~~~design document). There will be no additions to the Hive grammar.~~
 
-*Implementation:*  Starting in Hive 0.10.0, tables can be created as skewed or altered to be skewed (in which case partitions created after the ALTER statement will be skewed). In addition, skewed tables can use the list bucketing feature by specifying the STORED AS DIRECTORIES option. See the DDL documentation for details: [Create Table]({{< ref "#create-table" >}}), [Skewed Tables]({{< ref "#skewed-tables" >}}), and [Alter Table Skewed or Stored as Directories]({{< ref "#alter-table-skewed-or-stored-as-directories" >}}).
+*Implementation:*  Starting in Hive 0.10.0, tables can be created as skewed or altered to be skewed (in which case partitions created after the ALTER statement will be skewed). In addition, skewed tables can use the list bucketing feature by specifying the STORED AS DIRECTORIES option. See the DDL documentation for details: [Create Table]({{% ref "#create-table" %}}), [Skewed Tables]({{% ref "#skewed-tables" %}}), and [Alter Table Skewed or Stored as Directories]({{% ref "#alter-table-skewed-or-stored-as-directories" %}}).
 
diff --git a/content/Development/desingdocs/statsdev.md b/content/Development/desingdocs/statsdev.md
index 79848979..6120dc71 100644
--- a/content/Development/desingdocs/statsdev.md
+++ b/content/Development/desingdocs/statsdev.md
@@ -29,7 +29,7 @@ Table and partition level statistics were added in Hive 0.7.0 by [HIVE-1361](htt
 
 ### Column Statistics
 
-The second milestone was to support column level statistics. See [Column Statistics in Hive]({{< ref "column-statistics-in-hive" >}}) in the Design Documents.
+The second milestone was to support column level statistics. See [Column Statistics in Hive]({{% ref "column-statistics-in-hive" %}}) in the Design Documents.
 
 Supported column stats are:
 
@@ -54,7 +54,7 @@ Column level statistics were added in Hive 0.10.0 by [HIVE-1362](https://issues.
 
 ### Top K Statistics
 
-[Column level top K statistics]({{< ref "top-k-stats" >}}) are still pending; see [HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421).
+[Column level top K statistics]({{% ref "top-k-stats" %}}) are still pending; see [HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421).
 
 ## Quick overview
 
@@ -63,9 +63,9 @@ Column level statistics were added in Hive 0.10.0 by [HIVE-1362](https://issues.
 | Number of partition the dataset consists of | Fictional metastore property: **numPartitions** | computed during displaying the properties of a partitioned table | [Hive 2.3](https://issues.apache.org/jira/browse/HIVE-16315) |
 | Number of files the dataset consists of | Metastore table property: **numFiles** | Automatically during Metastore operations | |
 | Total size of the dataset as its seen at the filesystem level | Metastore table property: **totalSize** | |
-| Uncompressed size of the dataset | Metastore table property: **rawDataSize** | Computed, these are the basic statistics. Calculated automatically when [hive.stats.autogather]({{< ref "#hive-stats-autogather" >}}) is enabled.Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS | [Hive 0.8](https://issues.apache.org/jira/browse/HIVE-2185) |
+| Uncompressed size of the dataset | Metastore table property: **rawDataSize** | Computed, these are the basic statistics. Calculated automatically when [hive.stats.autogather]({{% ref "#hive-stats-autogather" %}}) is enabled.Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS | [Hive 0.8](https://issues.apache.org/jira/browse/HIVE-2185) |
 | Number of rows the dataset consist of | Metastore table property: **numRows** | |
-| Column level statistics | Metastore; TAB_COL_STATS table | Computed, Calculated automatically when [hive.stats.column.autogather]({{< ref "#hive-stats-column-autogather" >}}) is enabled.Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS | |
+| Column level statistics | Metastore; TAB_COL_STATS table | Computed, Calculated automatically when [hive.stats.column.autogather]({{% ref "#hive-stats-column-autogather" %}}) is enabled.Can be collected manually by: ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS | |
 
  
 
@@ -155,25 +155,25 @@ public interface IStatsAggregator {
 
 ### Configuration Variables
 
-See [Statistics]({{< ref "#statistics" >}}) in [Configuration Properties]({{< ref "configuration-properties" >}}) for a list of the variables that configure Hive table statistics. [Configuring Hive]({{< ref "#configuring-hive" >}}) describes how to use the variables.
+See [Statistics]({{% ref "#statistics" %}}) in [Configuration Properties]({{% ref "configuration-properties" %}}) for a list of the variables that configure Hive table statistics. [Configuring Hive]({{% ref "#configuring-hive" %}}) describes how to use the variables.
 
 ### Newly Created Tables
 
-For newly created tables and/or partitions (that are populated through the [INSERT OVERWRITE](http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Inserting_data_into_Hive_Tables_from_queries) command), statistics are automatically computed by default. The user has to explicitly set the boolean variable **[hive.stats.autogather]({{< ref "#hive-stats-autogather" >}})** to **false** so that statistics are not automatically computed and stored into Hive MetaStore.
+For newly created tables and/or partitions (that are populated through the [INSERT OVERWRITE](http://wiki.apache.org/hadoop/Hive/LanguageManual/DML#Inserting_data_into_Hive_Tables_from_queries) command), statistics are automatically computed by default. The user has to explicitly set the boolean variable **[hive.stats.autogather]({{% ref "#hive-stats-autogather" %}})** to **false** so that statistics are not automatically computed and stored into Hive MetaStore.
 
 ```
 set hive.stats.autogather=false;
 
 ```
 
-The user can also specify the implementation to be used for the storage of temporary statistics setting the variable **[hive.stats.dbclass]({{< ref "#hive-stats-dbclass" >}})**. For example, to set HBase as the implementation of temporary statistics storage (the default is `jdbc:derby` or `fs`, depending on the Hive version) the user should issue the following command:
+The user can also specify the implementation to be used for the storage of temporary statistics setting the variable **[hive.stats.dbclass]({{% ref "#hive-stats-dbclass" %}})**. For example, to set HBase as the implementation of temporary statistics storage (the default is `jdbc:derby` or `fs`, depending on the Hive version) the user should issue the following command:
 
 ```
 set hive.stats.dbclass=hbase;
 
 ```
 
-In case of JDBC implementations of temporary stored statistics (ex. Derby or MySQL), the user should specify the appropriate connection string to the database by setting the variable **[hive.stats.dbconnectionstring]({{< ref "#hive-stats-dbconnectionstring" >}})**.
+In case of JDBC implementations of temporary stored statistics (ex. Derby or MySQL), the user should specify the appropriate connection string to the database by setting the variable **[hive.stats.dbconnectionstring]({{% ref "#hive-stats-dbconnectionstring" %}})**.
 
 ```
 set hive.stats.dbclass=jdbc:derby;
@@ -181,7 +181,7 @@ set hive.stats.dbconnectionstring="jdbc:derby:;databaseName=TempStatsStore;creat
 
 ```
 
-Queries can fail to collect stats completely accurately. There is a setting **[hive.stats.reliable]({{< ref "#hive-stats-reliable" >}})** that fails queries if the stats can't be reliably collected. This is `false` by default.
+Queries can fail to collect stats completely accurately. There is a setting **[hive.stats.reliable]({{% ref "#hive-stats-reliable" %}})** that fails queries if the stats can't be reliably collected. This is `false` by default.
### Existing Tables – ANALYZE @@ -205,7 +205,7 @@ When the optional parameter NOSCAN is specified, the command won't scan files so Version 0.10.0: FOR COLUMNS -As of [Hive 0.10.0](https://issues.apache.org/jira/browse/HIVE-1362), the optional parameter FOR COLUMNS computes column statistics for all columns in the specified table (and for all partitions if the table is partitioned). See [Column Statistics in Hive]({{< ref "column-statistics-in-hive" >}}) for details. +As of [Hive 0.10.0](https://issues.apache.org/jira/browse/HIVE-1362), the optional parameter FOR COLUMNS computes column statistics for all columns in the specified table (and for all partitions if the table is partitioned). See [Column Statistics in Hive]({{% ref "column-statistics-in-hive" %}}) for details. To display these statistics, use DESCRIBE FORMATTED [*db_name*.]*table_name* *column_name* [PARTITION (*partition_spec*)]. @@ -363,7 +363,7 @@ then statistics, number of files and physical size in bytes are gathered for par Feature not implemented -Hive Metastore on HBase was discontinued and removed in Hive 3.0.0. See [HBaseMetastoreDevelopmentGuide]({{< ref "hbasemetastoredevelopmentguide" >}}) +Hive Metastore on HBase was discontinued and removed in Hive 3.0.0. 
See [HBaseMetastoreDevelopmentGuide]({{% ref "hbasemetastoredevelopmentguide" %}}) diff --git a/content/Development/desingdocs/storagehandlers.md b/content/Development/desingdocs/storagehandlers.md index 468762cb..6166502f 100644 --- a/content/Development/desingdocs/storagehandlers.md +++ b/content/Development/desingdocs/storagehandlers.md @@ -7,17 +7,17 @@ date: 2024-12-12 # Hive Storage Handlers -* [Hive Storage Handlers]({{< ref "#hive-storage-handlers" >}}) - + [Introduction]({{< ref "#introduction" >}}) - + [Terminology]({{< ref "#terminology" >}}) - + [DDL]({{< ref "#ddl" >}}) - + [Storage Handler Interface]({{< ref "#storage-handler-interface" >}}) - + [HiveMetaHook Interface]({{< ref "#hivemetahook-interface" >}}) - + [Open Issues]({{< ref "#open-issues" >}}) +* [Hive Storage Handlers]({{% ref "#hive-storage-handlers" %}}) + + [Introduction]({{% ref "#introduction" %}}) + + [Terminology]({{% ref "#terminology" %}}) + + [DDL]({{% ref "#ddl" %}}) + + [Storage Handler Interface]({{% ref "#storage-handler-interface" %}}) + + [HiveMetaHook Interface]({{% ref "#hivemetahook-interface" %}}) + + [Open Issues]({{% ref "#open-issues" %}}) ## Introduction -This page documents the storage handler support being added to Hive as part of work on [HBaseIntegration]({{< ref "hbaseintegration" >}}). The motivation is to make it possible to allow Hive to access data stored and managed by other systems in a modular, extensible fashion. +This page documents the storage handler support being added to Hive as part of work on [HBaseIntegration]({{% ref "hbaseintegration" %}}). The motivation is to make it possible to allow Hive to access data stored and managed by other systems in a modular, extensible fashion. 
Besides HBase, a storage handler implementation is also available for [Hypertable](http://code.google.com/p/hypertable/wiki/HiveExtension), and others are being developed for [Cassandra](https://issues.apache.org/jira/browse/HIVE-1434), [Azure Table](https://blogs.msdn.microsoft.com/mostlytrue/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight/), [JDBC](/docs/latest/user/jdbc-storage-handler) (MySQL and others), [MongoDB](https://github.com/yc-huang/Hive-mongo), [ElasticSearch](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html), [Phoenix HBase](https://phoenix.apache.org/hive_storage_handler.html?platform=hootsuite), [VoltDB](https://issues.voltdb.com/browse/ENG-10736?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel) and [Google Spreadsheets](https://github.com/balshor/gdata-storagehandler).  A [Kafka handler](https://github.com/HiveKa/HiveKa) demo is available. @@ -65,7 +65,7 @@ CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS cannot be specified, however starting from [Hive 4.0](/docs/latest/user/hive-iceberg-integration), they can coexist to create the Iceberg table, this is the only exception. Optional SERDEPROPERTIES can be specified as part of the STORED BY clause and will be passed to the serde provided by the storage handler. -See [CREATE TABLE]({{< ref "#create-table" >}}) and [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}) for more information. +See [CREATE TABLE]({{% ref "#create-table" %}}) and [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}) for more information. Example: @@ -113,7 +113,7 @@ The `HiveMetaHook` is optional, and described in the next section. If `getMetaHo The `configureTableJobProperties` method is called as part of planning a job for execution by Hadoop. 
It is the responsibility of the storage handler to examine the table definition and set corresponding attributes on jobProperties. At execution time, only these jobProperties will be available to the input format, output format, and serde. -See also [FilterPushdownDev]({{< ref "filterpushdowndev" >}}) to learn how a storage handler can participate in filter evaluation (to avoid full-table scans). +See also [FilterPushdownDev]({{% ref "filterpushdowndev" %}}) to learn how a storage handler can participate in filter evaluation (to avoid full-table scans). ## HiveMetaHook Interface diff --git a/content/Development/desingdocs/updatableviews.md b/content/Development/desingdocs/updatableviews.md index 98ed3aae..b76a5071 100644 --- a/content/Development/desingdocs/updatableviews.md +++ b/content/Development/desingdocs/updatableviews.md @@ -35,7 +35,7 @@ Notes: * With non-dynamic partitioning, do we require the partition be on each view in the updatable view chain?  This seems burdensome if you don't have write access to all the views? * When we specify dynamic partitions for a view, do we create partitions on each view in a chain of updatable views?  If we don't, there may be strange behavior where SHOW PARTITIONS may not show anything on a view, but we can insert into such partitions of a view.  If we do, drop partition on the view actually does nothing to the data. -See [Hive Views]({{< ref "viewdev" >}}) for general information about views. +See [Hive Views]({{% ref "viewdev" %}}) for general information about views. 
diff --git a/content/Development/desingdocs/vectorized-query-execution.md b/content/Development/desingdocs/vectorized-query-execution.md index 655c4ce7..90f5c5c6 100644 --- a/content/Development/desingdocs/vectorized-query-execution.md +++ b/content/Development/desingdocs/vectorized-query-execution.md @@ -13,7 +13,7 @@ Vectorized query execution is a Hive feature that greatly reduces the CPU usage ## Enabling vectorized execution -To use vectorized query execution, you must store your data in [ORC]({{< ref "languagemanual-orc" >}}) format, and set the following variable as shown in Hive SQL (see [Configuring Hive]({{< ref "#configuring-hive" >}})): +To use vectorized query execution, you must store your data in [ORC]({{% ref "languagemanual-orc" %}}) format, and set the following variable as shown in Hive SQL (see [Configuring Hive]({{% ref "#configuring-hive" %}})): `set hive.vectorized.execution.enabled = true;` @@ -21,7 +21,7 @@ Vectorized execution is off by default, so your queries only utilize it if this `set hive.vectorized.execution.enabled = false;` -Additional configuration variables for vectorized execution are documented in [Configuration Properties – Vectorization]({{< ref "#configuration-properties –-vectorization" >}}). +Additional configuration variables for vectorized execution are documented in [Configuration Properties – Vectorization]({{% ref "#configuration-properties –-vectorization" %}}). ## Supported data types and operations @@ -36,7 +36,7 @@ The following data types are currently supported for vectorized execution: * `double` * `decimal` * `date` -* `timestamp` (see [Limitations]({{< ref "#limitations" >}}) below) +* `timestamp` (see [Limitations]({{% ref "#limitations" %}}) below) * `string` Using other data types will cause your query to execute using standard, row-at-a-time execution. 
@@ -71,7 +71,7 @@ Vectorized support continues to be added for additional functions and expression ## Seeing whether vectorization is used for a query -You can verify which parts of your query are being vectorized using the **[explain]({{< ref "languagemanual-explain" >}})** feature. For example, when Fetch is used in the plan instead of Map, it does not vectorize and the explain output will not include the "`Vectorized execution: true`" notation: +You can verify which parts of your query are being vectorized using the **[explain]({{% ref "languagemanual-explain" %}})** feature. For example, when Fetch is used in the plan instead of Map, it does not vectorize and the explain output will not include the "`Vectorized execution: true`" notation: ``` create table vectorizedtable(state string,id int) stored as orc; diff --git a/content/Development/desingdocs/viewdev.md b/content/Development/desingdocs/viewdev.md index 151abeec..441893a5 100644 --- a/content/Development/desingdocs/viewdev.md +++ b/content/Development/desingdocs/viewdev.md @@ -14,7 +14,7 @@ Views () are a standard DBMS feat At a minimum, we want to * add queryable view support at the SQL language level (specifics of the scoping are under discussion in the Issues section below) - + updatable views will not be supported (see the [Updatable Views]({{< ref "updatableviews" >}}) proposal) + + updatable views will not be supported (see the [Updatable Views]({{% ref "updatableviews" %}}) proposal) * make sure views and their definitions show up anywhere tables can currently be enumerated/searched/described * where relevant, provide additional metadata to allow views to be distinguished from tables @@ -147,7 +147,7 @@ SQL:200n prohibits ORDER BY in a view definition, since a view is supposed to be **Update 30-Dec-2009**: Prasad pointed out that even without supporting materialized views, it may be necessary to provide users with metadata about data dependencies between views and underlying table partitions so 
that users can avoid seeing inconsistent results during the window when not all partitions have been refreshed with the latest data. One option is to attempt to derive this information automatically (using an overconservative guess in cases where the dependency analysis can't be made smart enough); another is to allow view creators to declare the dependency rules in some fashion as part of the view definition. Based on a design review meeting, we will probably go with the automatic analysis approach once dependency tracking is implemented. The analysis will be performed on-demand, perhaps as part of describing the view or submitting a query job against it. Until this becomes available, users may be able to do their own analysis either via empirical lineage tools or via view->table dependency tracking metadata once it is implemented. See HIVE-1079. -**Update 1-Feb-2011**: For the latest on this, see [PartitionedViews]({{< ref "partitionedviews" >}}). +**Update 1-Feb-2011**: For the latest on this, see [PartitionedViews]({{% ref "partitionedviews" %}}). ## Metastore Upgrades diff --git a/content/Development/gettingstarted-latest.md b/content/Development/gettingstarted-latest.md index e0712ef4..4025185b 100644 --- a/content/Development/gettingstarted-latest.md +++ b/content/Development/gettingstarted-latest.md @@ -148,7 +148,7 @@ You may find it useful, though it's not necessary, to set `HIVE_HOME`: #### Running Hive CLI -To use the Hive [command line interface]({{< ref "languagemanual-cli" >}}) (CLI) from the shell: +To use the Hive [command line interface]({{% ref "languagemanual-cli" %}}) (CLI) from the shell: ``` $ $HIVE_HOME/bin/hive @@ -164,7 +164,7 @@ Starting from Hive 2.1, we need to run the schematool command below as an initia ``` -[HiveServer2]({{< ref "setting-up-hiveserver2" >}}) (introduced in Hive 0.11) has its own CLI called [Beeline]({{< ref "hiveserver2-clients" >}}).  
HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2.  To run HiveServer2 and Beeline from shell: +[HiveServer2]({{% ref "setting-up-hiveserver2" %}}) (introduced in Hive 0.11) has its own CLI called [Beeline]({{% ref "hiveserver2-clients" %}}).  HiveCLI is now deprecated in favor of Beeline, as it lacks the multi-user, security, and other capabilities of HiveServer2.  To run HiveServer2 and Beeline from shell: ``` $ $HIVE_HOME/bin/hiveserver2 @@ -197,7 +197,7 @@ To use the HCatalog command line interface (CLI) in Hive release 0.11.0 and late ``` -For more information, see [HCatalog Installation from Tarball]({{< ref "hcatalog-installhcat" >}}) and [HCatalog CLI]({{< ref "hcatalog-cli" >}}) in the [HCatalog manual]({{< ref "hcatalog-base" >}}). +For more information, see [HCatalog Installation from Tarball]({{% ref "hcatalog-installhcat" %}}) and [HCatalog CLI]({{% ref "hcatalog-cli" %}}) in the [HCatalog manual]({{% ref "hcatalog-base" %}}). #### Running WebHCat (Templeton) @@ -208,7 +208,7 @@ To run the WebHCat server from the shell in Hive release 0.11.0 and later: ``` -For more information, see [WebHCat Installation]({{< ref "webhcat-installwebhcat" >}}) in the [WebHCat manual]({{< ref "webhcat-base" >}}). +For more information, see [WebHCat Installation]({{% ref "webhcat-installwebhcat" %}}) in the [WebHCat manual]({{% ref "webhcat-base" %}}). ### Configuration Management Overview @@ -305,21 +305,21 @@ Another option for logging is TimeBasedRollingPolicy (applicable for Hive 1.1.0 Note that setting `hive.root.logger` via the 'set' command does not change logging properties since they are determined at initialization time. -Hive also stores query logs on a per Hive session basis in `/tmp//`, but can be configured in [hive-site.xml]({{< ref "adminmanual-configuration" >}}) with the `[hive.querylog.location]({{< ref "#hive-querylog-location" >}})` property.  
Starting with Hive 1.1.0, [EXPLAIN EXTENDED]({{< ref "#explain-extended" >}}) output for queries can be logged at the INFO level by setting the `[hive.log.explain.output]({{< ref "#hive-log-explain-output" >}})` property to true. +Hive also stores query logs on a per Hive session basis in `/tmp//`, but can be configured in [hive-site.xml]({{% ref "adminmanual-configuration" %}}) with the `[hive.querylog.location]({{% ref "#hive-querylog-location" %}})` property.  Starting with Hive 1.1.0, [EXPLAIN EXTENDED]({{% ref "#explain-extended" %}}) output for queries can be logged at the INFO level by setting the `[hive.log.explain.output]({{% ref "#hive-log-explain-output" %}})` property to true. Logging during Hive execution on a Hadoop cluster is controlled by Hadoop configuration. Usually Hadoop will produce one log file per map and reduce task stored on the cluster machine(s) where the task was executed. The log files can be obtained by clicking through to the Task Details page from the Hadoop JobTracker Web UI. When using local mode (using `mapreduce.framework.name=local`), Hadoop/Hive execution logs are produced on the client machine itself. Starting with release 0.6 – Hive uses the `hive-exec-log4j.properties` (falling back to `hive-log4j.properties` only if it's missing) to determine where these logs are delivered by default. The default configuration file produces one log file per query executed in local mode and stores it under `/tmp/`. The intent of providing a separate configuration file is to enable administrators to centralize execution log capture if desired (on a NFS file server for example). Execution logs are invaluable for debugging run-time errors. -For information about WebHCat errors and logging, see [Error Codes and Responses]({{< ref "#error-codes-and-responses" >}}) and [Log Files]({{< ref "#log-files" >}}) in the [WebHCat manual]({{< ref "webhcat-base" >}}). 
+For information about WebHCat errors and logging, see [Error Codes and Responses]({{% ref "#error-codes-and-responses" %}}) and [Log Files]({{% ref "#log-files" %}}) in the [WebHCat manual]({{% ref "webhcat-base" %}}). Error logs are very useful to debug problems. Please send them with any bugs (of which there are many!) to `hive-dev@hadoop.apache.org`. -From Hive 2.1.0 onwards (with [HIVE-13027](https://issues.apache.org/jira/browse/HIVE-13027)), Hive uses Log4j2's asynchronous logger by default. Setting [hive.async.log.enabled]({{< ref "#hive-async-log-enabled" >}}) to false will disable asynchronous logging and fallback to synchronous logging. Asynchronous logging can give significant performance improvement as logging will be handled in a separate thread that uses the LMAX disruptor queue for buffering log messages. Refer to  for benefits and drawbacks. +From Hive 2.1.0 onwards (with [HIVE-13027](https://issues.apache.org/jira/browse/HIVE-13027)), Hive uses Log4j2's asynchronous logger by default. Setting [hive.async.log.enabled]({{% ref "#hive-async-log-enabled" %}}) to false will disable asynchronous logging and fallback to synchronous logging. Asynchronous logging can give significant performance improvement as logging will be handled in a separate thread that uses the LMAX disruptor queue for buffering log messages. Refer to  for benefits and drawbacks. #### HiveServer2 Logs -HiveServer2 operation logs are available to clients starting in Hive 0.14. See [HiveServer2 Logging]({{< ref "#hiveserver2-logging" >}}) for configuration. +HiveServer2 operation logs are available to clients starting in Hive 0.14. See [HiveServer2 Logging]({{% ref "#hiveserver2-logging" %}}) for configuration. #### Audit Logs @@ -339,7 +339,7 @@ If the logger level has already been set to DEBUG at root via hive.root.logger, ## DDL Operations - The Hive DDL operations are documented in [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}). 
+ The Hive DDL operations are documented in [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}). ### Creating Hive Tables @@ -384,7 +384,7 @@ shows the list of columns. ### Altering and Dropping Tables -Table names can be [changed]({{< ref "#changed" >}}) and columns can be [added or replaced]({{< ref "#added-or-replaced" >}}): +Table names can be [changed]({{% ref "#changed" %}}) and columns can be [added or replaced]({{% ref "#added-or-replaced" %}}): ``` hive> ALTER TABLE events RENAME TO 3koobecaf; @@ -418,11 +418,11 @@ Metastore can be stored in any database that is supported by JPOX. The location In the future, the metastore itself can be a standalone server. -If you want to run the metastore as a network server so it can be accessed from multiple nodes, see [Hive Using Derby in Server Mode]({{< ref "hivederbyservermode" >}}). +If you want to run the metastore as a network server so it can be accessed from multiple nodes, see [Hive Using Derby in Server Mode]({{% ref "hivederbyservermode" %}}). ## DML Operations -The Hive DML operations are documented in [Hive Data Manipulation Language]({{< ref "languagemanual-dml" >}}). +The Hive DML operations are documented in [Hive Data Manipulation Language]({{% ref "languagemanual-dml" %}}). Loading data from flat files into Hive: @@ -459,7 +459,7 @@ Note that loading data from HDFS will result in moving the file/directory. As a ## SQL Operations -The Hive query operations are documented in [Select]({{< ref "languagemanual-select" >}}). +The Hive query operations are documented in [Select]({{% ref "languagemanual-select" %}}). 
### Example Queries diff --git a/content/community/meetings/contributorday2011.md b/content/community/meetings/contributorday2011.md index 368b223a..4f21ee70 100644 --- a/content/community/meetings/contributorday2011.md +++ b/content/community/meetings/contributorday2011.md @@ -9,7 +9,7 @@ date: 2024-12-12 Resources for mini-hackathon: -* [PluginDeveloperKit]({{< ref "plugindeveloperkit" >}}) has info on the new pdk; download [this snapshot build](http://people.apache.org/~jvs/hive-0.8.0-pdk-SNAPSHOT.tar.gz) of Hive which includes it -* you'll need a Mac or Linux development environment with Hive+Hadoop already installed on it per [these instructions]({{< ref "gettingstarted-latest" >}}); for Hive, use the snapshot +* [PluginDeveloperKit]({{% ref "plugindeveloperkit" %}}) has info on the new pdk; download [this snapshot build](http://people.apache.org/~jvs/hive-0.8.0-pdk-SNAPSHOT.tar.gz) of Hive which includes it +* you'll need a Mac or Linux development environment with Hive+Hadoop already installed on it per [these instructions]({{% ref "gettingstarted-latest" %}}); for Hive, use the snapshot * you'll also need [Apache ant](http://ant.apache.org) installed. 
* [HIVE-1545](https://issues.apache.org/jira/browse/HIVE-1545) has the UDF libraries we'd like to get cleaned up for inclusion in Hive or extension libraries (download core.tar.gz and/or ext.tar.gz) diff --git a/content/community/meetings/development-contributorsmeetings.md b/content/community/meetings/development-contributorsmeetings.md index 0feefc3a..1ecc71c7 100644 --- a/content/community/meetings/development-contributorsmeetings.md +++ b/content/community/meetings/development-contributorsmeetings.md @@ -11,18 +11,18 @@ Active contributors to the Hive project are invited to attend the monthly Hive C ## Meeting Minutes -* [April 18, 2012]({{< ref "contributorminutes20120418" >}}) -* [December 5, 2011]({{< ref "contributorminutes20111205" >}}) -* [September 7, 2011]({{< ref "contributorminutes20110907" >}}) -* [July 26, 2011]({{< ref "contributorsminutes110726" >}}) -* [June 30, 2011]({{< ref "contributorday2011" >}}) -* [April 25, 2011]({{< ref "development-contributorsmeetings-hivecontributorsminutes110425" >}}) +* [April 18, 2012]({{% ref "contributorminutes20120418" %}}) +* [December 5, 2011]({{% ref "contributorminutes20111205" %}}) +* [September 7, 2011]({{% ref "contributorminutes20110907" %}}) +* [July 26, 2011]({{% ref "contributorsminutes110726" %}}) +* [June 30, 2011]({{% ref "contributorday2011" %}}) +* [April 25, 2011]({{% ref "development-contributorsmeetings-hivecontributorsminutes110425" %}}) * January 11, 2011 (forgot to take notes) -* [October 25, 2010]({{< ref "development-contributorsmeetings-hivecontributorsminutes101025" >}}) -* [September 13, 2010]({{< ref "development-contributorsmeetings-hivecontributorsminutes100913" >}}) -* [August 8, 2010]({{< ref "development-contributorsmeetings-hivecontributorsminutes100808" >}}) -* [July 6, 2010]({{< ref "development-contributorsmeetings-hivecontributorsminutes100706" >}}) -* [June 1, 2010]({{< ref "development-contributorsmeetings-hivecontributorsminutes100601" >}}) +* [October 25, 2010]({{% ref 
"development-contributorsmeetings-hivecontributorsminutes101025" %}}) +* [September 13, 2010]({{% ref "development-contributorsmeetings-hivecontributorsminutes100913" %}}) +* [August 8, 2010]({{% ref "development-contributorsmeetings-hivecontributorsminutes100808" %}}) +* [July 6, 2010]({{% ref "development-contributorsmeetings-hivecontributorsminutes100706" %}}) +* [June 1, 2010]({{% ref "development-contributorsmeetings-hivecontributorsminutes100601" %}}) diff --git a/content/community/resources/books-about-hive.md b/content/community/resources/books-about-hive.md index b1096c11..091557ef 100644 --- a/content/community/resources/books-about-hive.md +++ b/content/community/resources/books-about-hive.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : Books about Hive -These books describe Apache Hive and explain how to use its features. If you know of others that should be listed here, or newer editions, please send a message to the [Hive user mailing list](http://hive.apache.org/mailing_lists.html) or add the information yourself if you have [wiki edit privileges]({{< ref "#wiki-edit-privileges" >}}). Most links go to the publishers although you can also buy most of these books from bookstores, either online or brick-and-mortar. +These books describe Apache Hive and explain how to use its features. If you know of others that should be listed here, or newer editions, please send a message to the [Hive user mailing list](http://hive.apache.org/mailing_lists.html) or add the information yourself if you have [wiki edit privileges]({{% ref "#wiki-edit-privileges" %}}). Most links go to the publishers although you can also buy most of these books from bookstores, either online or brick-and-mortar. 
* *[Programming Hive](http://shop.oreilly.com/product/0636920023555.do)* by Edward Capriolo, Dean Wampler, and Jason Rutherglen – O'Reilly Media, 2012 * *[Apache Hive Essentials](https://www.packtpub.com/application-development/apache-hive-essentials-second-edition)* by Dayong Du – Packt Publishing, [2015](http://bit.ly/1QVANQA) and [2018 (second edition)](https://www.packtpub.com/application-development/apache-hive-essentials-second-edition) diff --git a/content/community/resources/developerguide.md b/content/community/resources/developerguide.md index e31fe648..f36d0601 100644 --- a/content/community/resources/developerguide.md +++ b/content/community/resources/developerguide.md @@ -61,26 +61,26 @@ Hive currently uses these SerDe classes to serialize and deserialize data: ALTER TABLE person SET SERDEPROPERTIES ('serialization.encoding'='GBK'); ``` -LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property [hive.lazysimple.extended_boolean_literal]({{< ref "#hive-lazysimple-extended_boolean_literal" >}}) is set to `true` ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-3635) and later). The default is `false`, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals. +LazySimpleSerDe can treat 'T', 't', 'F', 'f', '1', and '0' as extended, legal boolean literals if the configuration property [hive.lazysimple.extended_boolean_literal]({{% ref "#hive-lazysimple-extended_boolean_literal" %}}) is set to `true` ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-3635) and later). The default is `false`, which means only 'TRUE' and 'FALSE' are treated as legal boolean literals. * ThriftSerDe: This SerDe is used to read/write Thrift serialized objects. The class file for the Thrift object must be loaded first. * DynamicSerDe: This SerDe also reads and writes Thrift serialized objects, but it understands Thrift DDL so the schema of the object can be provided at runtime.
Also it supports a lot of different protocols, including TBinaryProtocol, TJSONProtocol, TCTLSeparatedProtocol (which writes data in delimited records). Also: -* For JSON files, [JsonSerDe]({{< ref "#jsonserde" >}}) was added in Hive 0.12.0. An Amazon SerDe is available at `s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar` for releases prior to 0.12.0. -* An [Avro SerDe]({{< ref "avroserde" >}}) was added in Hive 0.9.1.  Starting in Hive 0.14.0 its specification is implicit with the STORED AS AVRO clause. -* A SerDe for the [ORC]({{< ref "languagemanual-orc" >}}) file format was added in Hive 0.11.0. -* A SerDe for [Parquet]({{< ref "parquet" >}}) was added via plug-in in Hive 0.10 and natively in Hive 0.13.0. +* For JSON files, [JsonSerDe]({{% ref "#jsonserde" %}}) was added in Hive 0.12.0. An Amazon SerDe is available at `s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar` for releases prior to 0.12.0. +* An [Avro SerDe]({{% ref "avroserde" %}}) was added in Hive 0.9.1.  Starting in Hive 0.14.0 its specification is implicit with the STORED AS AVRO clause. +* A SerDe for the [ORC]({{% ref "languagemanual-orc" %}}) file format was added in Hive 0.11.0. +* A SerDe for [Parquet]({{% ref "parquet" %}}) was added via plug-in in Hive 0.10 and natively in Hive 0.13.0. * A SerDe for [CSV](/docs/latest/user/csv-serde) was added in Hive 0.14. -See [SerDe]({{< ref "serde" >}}) for detailed information about input and output processing. Also see [Storage Formats]({{< ref "hcatalog-storageformats" >}}) in the [HCatalog manual]({{< ref "hcatalog-base" >}}), including [CTAS Issue with JSON SerDe]({{< ref "#ctas-issue-with-json-serde" >}}). For information about how to create a table with a custom or native SerDe, see [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}). +See [SerDe]({{% ref "serde" %}}) for detailed information about input and output processing. 
Also see [Storage Formats]({{% ref "hcatalog-storageformats" %}}) in the [HCatalog manual]({{% ref "hcatalog-base" %}}), including [CTAS Issue with JSON SerDe]({{% ref "#ctas-issue-with-json-serde" %}}). For information about how to create a table with a custom or native SerDe, see [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}). #### How to Write Your Own SerDe * In most cases, users want to write a Deserializer instead of a SerDe, because users just want to read their own data format instead of writing to it. * For example, the RegexDeserializer will deserialize the data using the configuration parameter 'regex', and possibly a list of column names (see serde2.MetadataTypedColumnsetSerDe). Please see serde2/Deserializer.java for details. * If your SerDe supports DDL (basically, SerDe with parameterized columns and column types), you probably want to implement a Protocol based on DynamicSerDe, instead of writing a SerDe from scratch. The reason is that the framework passes DDL to SerDe through "Thrift DDL" format, and it's non-trivial to write a "Thrift DDL" parser. -* For examples, see [SerDe - how to add a new SerDe]({{< ref "#serde---how-to-add-a-new-serde" >}}) below. +* For examples, see [SerDe - how to add a new SerDe]({{% ref "#serde---how-to-add-a-new-serde" %}}) below. Some important points about SerDe: @@ -104,7 +104,7 @@ NOTE: Apache Hive recommends that custom ObjectInspectors created for use with c #### Registration of Native SerDes -As of [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5976)a registration mechanism has been introduced for native Hive SerDes.  This allows dynamic binding between a "STORED AS" keyword in place of a triplet of {SerDe, InputFormat, and OutputFormat} specification, in [CreateTable]({{< ref "#createtable" >}}) statements. +As of [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5976)a registration mechanism has been introduced for native Hive SerDes.  
This allows dynamic binding between a "STORED AS" keyword in place of a triplet of {SerDe, InputFormat, and OutputFormat} specification, in [CreateTable]({{% ref "#createtable" %}}) statements. The following mappings have been added through this registration mechanism: @@ -173,7 +173,7 @@ Ant to Maven As of version [0.13](https://issues.apache.org/jira/browse/HIVE-5107) Hive uses Maven instead of Ant for its build. The following instructions are not up to date. -See the [Hive Developer FAQ]({{< ref "hivedeveloperfaq" >}}) for updated instructions. +See the [Hive Developer FAQ]({{% ref "hivedeveloperfaq" %}}) for updated instructions. Hive can be made to compile against different versions of Hadoop. diff --git a/content/community/resources/hive-apis-overview.md b/content/community/resources/hive-apis-overview.md index 8acb62b7..1c707645 100644 --- a/content/community/resources/hive-apis-overview.md +++ b/content/community/resources/hive-apis-overview.md @@ -27,7 +27,7 @@ Operation based Java API that presents a number of DDL type operations, however ## HCatalog Storage Handlers (Java) -Operation based Java API. This is well [documented on the wiki]({{< ref "hcatalog-inputoutput" >}}). +Operation based Java API. This is well [documented on the wiki]({{% ref "hcatalog-inputoutput" %}}). TODO @@ -35,7 +35,7 @@ Requires overview. ## HCatalog CLI (Command Line) -Query based API. This is well [documented on the wiki]({{< ref "hcatalog-cli" >}}). +Query based API. This is well [documented on the wiki]({{% ref "hcatalog-cli" %}}). The Hive community has been working on deprecating the Hive CLI. The HCatalog CLI is similar to the Hive CLI and will also be deprecated. @@ -49,17 +49,17 @@ There are numerous ways of instantiating the metastore API including: `HCatUtil. ## WebHCat (REST) -WebHCat is a REST operation based API for [HCatalog]({{< ref "hcatalog-base" >}}). This is well [documented on the wiki]({{< ref "webhcat-reference" >}}).
+WebHCat is a REST operation based API for [HCatalog]({{% ref "hcatalog-base" %}}). This is well [documented on the wiki]({{% ref "webhcat-reference" %}}). This not actively maintained and likely not be supported in future releases. For job submission, consider using Oozie or similar tools. For DDL, use JDBC. ## Streaming Data Ingest (Java) -Operation based Java API focused on the writing of continuous streams of data into transactional tables using Hive’s [ACID]({{< ref "hive-transactions" >}}) feature. New data is inserted into tables using small batches and short-lived transactions. Documented [on the wiki]({{< ref "streaming-data-ingest" >}}) and has [package level Javadoc](http://htmlpreview.github.io/?https://github.com/apache/hive/blob/master/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html). Introduced in Hive version 0.13.0 ([HIVE-5687](https://issues.apache.org/jira/browse/HIVE-5687)). +Operation based Java API focused on the writing of continuous streams of data into transactional tables using Hive’s [ACID]({{% ref "hive-transactions" %}}) feature. New data is inserted into tables using small batches and short-lived transactions. Documented [on the wiki]({{% ref "streaming-data-ingest" %}}) and has [package level Javadoc](http://htmlpreview.github.io/?https://github.com/apache/hive/blob/master/hcatalog/streaming/src/java/org/apache/hive/hcatalog/streaming/package.html). Introduced in Hive version 0.13.0 ([HIVE-5687](https://issues.apache.org/jira/browse/HIVE-5687)). ## Streaming Mutation (Java) -Operation based Java API focused on mutating (insert/update/delete) records into transactional tables using Hive’s [ACID]({{< ref "hive-transactions" >}}) feature. Large volumes of mutations are applied atomically in a single long-lived transaction. Documented [on the wiki]({{< ref "hcatalog-streaming-mutation-api" >}}). 
Scheduled for release in Hive version 2.0.0 ([HIVE-10165](https://issues.apache.org/jira/browse/HIVE-10165)[).](https://issues.apache.org/jira/browse/HIVE-5687) +Operation based Java API focused on mutating (insert/update/delete) records into transactional tables using Hive’s [ACID]({{% ref "hive-transactions" %}}) feature. Large volumes of mutations are applied atomically in a single long-lived transaction. Documented [on the wiki]({{% ref "hcatalog-streaming-mutation-api" %}}). Scheduled for release in Hive version 2.0.0 ([HIVE-10165](https://issues.apache.org/jira/browse/HIVE-10165)[).](https://issues.apache.org/jira/browse/HIVE-5687) ## hive-jdbc (JDBC) diff --git a/content/community/resources/hivedeveloperfaq.md b/content/community/resources/hivedeveloperfaq.md index 8c6b2ff2..63f37cbb 100644 --- a/content/community/resources/hivedeveloperfaq.md +++ b/content/community/resources/hivedeveloperfaq.md @@ -179,7 +179,7 @@ mvn clean install -DskipTests ## Testing -For general information, see [Unit Tests and Debugging]({{< ref "#unit-tests-and-debugging" >}}) in the Developer Guide. +For general information, see [Unit Tests and Debugging]({{% ref "#unit-tests-and-debugging" %}}) in the Developer Guide. ### How do I run precommit tests on a patch? @@ -276,7 +276,7 @@ java.lang.NullPointerException: null ### How do I debug into a single test in Eclipse? -You can debug into a single JUnit test in [Eclipse](https://www.eclipse.org/users/) by first making sure you've built the Eclipse files and imported the project into Eclipse as described [here]({{< ref "#here" >}}). Then set one or more breakpoints, highlight the method name of the JUnit test method you want to debug into, and do `Run->Debug`. +You can debug into a single JUnit test in [Eclipse](https://www.eclipse.org/users/) by first making sure you've built the Eclipse files and imported the project into Eclipse as described [here]({{% ref "#here" %}}). 
Then set one or more breakpoints, highlight the method name of the JUnit test method you want to debug into, and do `Run->Debug`. Another useful method to debug these tests is to attach a remote debugger. When you run the test, enable the debug mode for surefire by passing in "`-Dmaven.surefire.debug`". Additional details on how to turning on debugging for surefire can be found [here](http://maven.apache.org/surefire/maven-surefire-plugin/examples/debugging.html). Now when you run the tests, it will wait with a message similar to @@ -288,7 +288,7 @@ Note that this assumes that you are still using the default port 5005 for surefi ### How do I debug my queries in Hive? -You can also interactively debug your queries in Hive by attaching a remote debugger. To do so, start [Beeline]({{< ref "#beeline" >}}) with the "`--debug`" option. +You can also interactively debug your queries in Hive by attaching a remote debugger. To do so, start [Beeline]({{% ref "#beeline" %}}) with the "`--debug`" option. ``` $ beeline --debug diff --git a/content/community/resources/howtocommit.md b/content/community/resources/howtocommit.md index c420dbef..1f9de244 100644 --- a/content/community/resources/howtocommit.md +++ b/content/community/resources/howtocommit.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : Guide for Committers -This page contains guidelines for committers of the Apache Hive project. (If you're currently a contributor, and are interested in how we add new committers, read [BecomingACommitter]({{< ref "/community/becomingcommitter" >}})) +This page contains guidelines for committers of the Apache Hive project. (If you're currently a contributor, and are interested in how we add new committers, read [BecomingACommitter]({{% ref "/community/becomingcommitter" %}})) ## New committers @@ -29,7 +29,7 @@ Hive committers can not +1/Approve their own Pull Requests, i.e. 
you are allowed ## Reject -Pull Requests should be rejected which do not adhere to the guidelines in [HowToContribute]({{< ref "howtocontribute" >}}). Committers should always be polite to contributors and try to instruct and encourage them to contribute better Pull Requests. If a committer wishes to improve an unacceptable change, then he/she drop review comments and ask the contributor to update. +Pull Requests should be rejected which do not adhere to the guidelines in [HowToContribute]({{% ref "howtocontribute" %}}). Committers should always be polite to contributors and try to instruct and encourage them to contribute better Pull Requests. If a committer wishes to improve an unacceptable change, then he/she drop review comments and ask the contributor to update. ## PreCommit runs, and committing patches diff --git a/content/community/resources/howtocontribute.md b/content/community/resources/howtocontribute.md index 3e7be745..34165485 100644 --- a/content/community/resources/howtocontribute.md +++ b/content/community/resources/howtocontribute.md @@ -11,7 +11,7 @@ This page describes the mechanics of *how* to contribute software to Apache Hive First of all, you need the Hive source code. -Get the source code on your local drive using git. See [Understanding Hive Branches]({{< ref "#understanding-hive-branches" >}}) below to understand which branch you should be using. +Get the source code on your local drive using git. See [Understanding Hive Branches]({{% ref "#understanding-hive-branches" %}}) below to understand which branch you should be using. ``` git clone https://github.com/apache/hive @@ -22,7 +22,7 @@ Setting Up Eclipse Development Environment (Optional) This is an optional step. Eclipse has a lot of advanced features for Java development, and it makes the life much easier for Hive developers as well. -[How do I import into eclipse?]({{< ref "#how-do-i-import-into-eclipse?" >}}) +[How do I import into eclipse?]({{% ref "#how-do-i-import-into-eclipse?" 
%}}) ## Becoming a Contributor @@ -32,7 +32,7 @@ This checklist tells you how to create accounts and obtain permissions needed by + The ASF JIRA system dashboard is [here](https://issues.apache.org/jira/secure/Dashboard.jspa). + The Hive JIRA is [here](https://issues.apache.org/jira/browse/HIVE). * To review patches check the open [pull requests on GitHub](https://github.com/apache/hive/pulls) -* To contribute to the Hive wiki, follow the instructions in [About This Wiki]({{< ref "#about-this-wiki" >}}). +* To contribute to the Hive wiki, follow the instructions in [About This Wiki]({{% ref "#about-this-wiki" %}}). * To edit the Hive website, follow the instructions in [How to edit the website](https://github.com/apache/hive-site/blob/main/README.md). * Join the [Hive mailing lists](http://hive.apache.org/mailing_lists.html) to receive email about issues and discussions. @@ -78,7 +78,7 @@ Hive is a **multi-module** Maven project. If you are new to [Maven](http://maven Additionally, Hive actually has two projects, "core" and "itests". The reason that itests is not connected to the core reactor is that itests requires the packages to be built. -The actual Maven commands you will need are discussed on the [HiveDeveloperFAQ]({{< ref "hivedeveloperfaq" >}}) page. +The actual Maven commands you will need are discussed on the [HiveDeveloperFAQ]({{% ref "hivedeveloperfaq" %}}) page. ### Understanding Hive Branches @@ -92,7 +92,7 @@ Release branches are made from branch-1 (for 1.x) or master (for 2.x) when the c Feature branches are used to develop new features without destabilizing the rest of Hive. The intent of a feature branch is that it will be merged back into master once the feature has stabilized. -For general information about Hive branches, see [Hive Versions and Branches]({{< ref "#hive-versions-and-branches" >}}). +For general information about Hive branches, see [Hive Versions and Branches]({{% ref "#hive-versions-and-branches" %}}). 
### Hadoop Dependencies @@ -102,7 +102,7 @@ Hadoop dependencies are handled differently in master and branch-1. In branch-1 both Hadoop 1.x and 2.x are supported. The Hive build downloads a number of different Hadoop versions via Maven in order to compile "shims" which allow for compatibility with these Hadoop versions. However, the rest of Hive is only built and tested against a single Hadoop version. -The Maven build has two profiles, `hadoop-1` for Hadoop 1.x and `hadoop-2` for Hadoop 2.x. When building, you must specify which profile you wish to use via Maven's `-P` command line option (see [How to build all source]({{< ref "#how-to-build-all-source" >}})). +The Maven build has two profiles, `hadoop-1` for Hadoop 1.x and `hadoop-2` for Hadoop 2.x. When building, you must specify which profile you wish to use via Maven's `-P` command line option (see [How to build all source]({{% ref "#how-to-build-all-source" %}})). On this page we assume you are building from the master branch and do not include the profile in the example Maven commands. If you are building on branch-1 you will need to select the appropriate profile for the version of Hadoop you are building against. @@ -112,7 +112,7 @@ Hadoop 1.x is no longer supported in Hive's master branch. There is no need to s ### Unit Tests -When submitting a patch it's highly recommended you execute tests locally which you believe will be impacted in addition to any new tests. The full test suite can be executed by [hive-precommit on Jenkins](https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/activity). [Hive Developer FAQ]({{< ref "hivedeveloperfaq" >}}) describes how to execute a specific set of tests. +When submitting a patch it's highly recommended you execute tests locally which you believe will be impacted in addition to any new tests. The full test suite can be executed by [hive-precommit on Jenkins](https://ci.hive.apache.org/blue/organizations/jenkins/hive-precommit/activity). 
[Hive Developer FAQ]({{% ref "hivedeveloperfaq" %}}) describes how to execute a specific set of tests. ``` mvn clean install -DskipTests @@ -187,7 +187,7 @@ Please do: * try to adhere to the coding style of files you edit; * comment code whose function or rationale is not obvious; -* add one or more unit tests (see [Add a Unit Test]({{< ref "#add-a-unit-test" >}}) above); +* add one or more unit tests (see [Add a Unit Test]({{% ref "#add-a-unit-test" %}}) above); * update documentation (such as Javadocs including *package.html* files and this wiki). ### Fetching a PR from Github diff --git a/content/community/resources/metastore-api-tests.md b/content/community/resources/metastore-api-tests.md index 405879bb..3f38a832 100644 --- a/content/community/resources/metastore-api-tests.md +++ b/content/community/resources/metastore-api-tests.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : MetaStore API Tests -* [IMetaStoreClient Tests]({{< ref "#imetastoreclient-tests" >}}) +* [IMetaStoreClient Tests]({{% ref "#imetastoreclient-tests" %}}) # IMetaStoreClient Tests diff --git a/content/community/resources/testingdocs.md b/content/community/resources/testingdocs.md index 448b480a..8d8e8fcd 100644 --- a/content/community/resources/testingdocs.md +++ b/content/community/resources/testingdocs.md @@ -9,9 +9,9 @@ date: 2024-12-12 The following documents describe aspects of testing for Hive: -* [Hive Developer FAQ:  Testing]({{< ref "#hive-developer-faq:- testing" >}}) -* [Developer Guide:  Unit Tests]({{< ref "#developer-guide:- unit-tests" >}}) -* [Unit Testing Hive SQL]({{< ref "unit-testing-hive-sql" >}}) -* [Running Yetus]({{< ref "running-yetus" >}}) -* [MetaStore API Tests]({{< ref "metastore-api-tests" >}}) +* [Hive Developer FAQ:  Testing]({{% ref "#hive-developer-faq:- testing" %}}) +* [Developer Guide:  Unit Tests]({{% ref "#developer-guide:- unit-tests" %}}) +* [Unit Testing Hive SQL]({{% ref "unit-testing-hive-sql" %}}) +* [Running Yetus]({{% ref "running-yetus" 
%}})
+* [MetaStore API Tests]({{% ref "metastore-api-tests" %}})
 * [Query File Test(qtest)](/development/qtest/)
diff --git a/content/community/resources/unit-testing-hive-sql.md b/content/community/resources/unit-testing-hive-sql.md
index cb238d66..a8644889 100644
--- a/content/community/resources/unit-testing-hive-sql.md
+++ b/content/community/resources/unit-testing-hive-sql.md
@@ -31,7 +31,7 @@ By modularising processes implemented using Hive they become easier to test effe
 ### Encapsulation of column level logic
-In the case of column level logic Hive provides both [UDFs](/docs/latest/language/hiveplugins#creating-custom-udfs) and [macros]({{< ref "#macros" >}}) that allow the user to extract and reuse the expressions applied to columns. Once defined, UDFs and Macros can be readily isolated for testing. UDFs can be simply tested with existing Java/Python unit test tools such as JUnit whereas Macros require a Hive command line interface to execute the macro declaration and then exercise it with some sample `SELECT` statements.
+In the case of column level logic Hive provides both [UDFs](/docs/latest/language/hiveplugins#creating-custom-udfs) and [macros]({{% ref "#macros" %}}) that allow the user to extract and reuse the expressions applied to columns. Once defined, UDFs and Macros can be readily isolated for testing. UDFs can be simply tested with existing Java/Python unit test tools such as JUnit whereas Macros require a Hive command line interface to execute the macro declaration and then exercise it with some sample `SELECT` statements.
### Encapsulation of set level logic diff --git a/content/docs/latest/admin/adminmanual-configuration.md b/content/docs/latest/admin/adminmanual-configuration.md index 732a8816..9ad04b67 100644 --- a/content/docs/latest/admin/adminmanual-configuration.md +++ b/content/docs/latest/admin/adminmanual-configuration.md @@ -9,19 +9,19 @@ date: 2024-12-12 A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference: -* Using the **set command** in the [CLI]({{< ref "languagemanual-cli" >}}) or [Beeline]({{< ref "#beeline" >}}) for setting session level values for the configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to `/tmp/mydir` for all subsequent statements: +* Using the **set command** in the [CLI]({{% ref "languagemanual-cli" %}}) or [Beeline]({{% ref "#beeline" %}}) for setting session level values for the configuration variable for all statements subsequent to the set command. For example, the following command sets the scratch directory (which is used by Hive to store temporary output and plans) to `/tmp/mydir` for all subsequent statements: ``` set hive.exec.scratchdir=/tmp/mydir; ``` -* Using the **`--hiveconf`** option of the `[hive]({{< ref "#hive" >}})` command (in the CLI) or `[beeline]({{< ref "#beeline" >}})` command for the entire session. For example: +* Using the **`--hiveconf`** option of the `[hive]({{% ref "#hive" %}})` command (in the CLI) or `[beeline]({{% ref "#beeline" %}})` command for the entire session. For example: ``` bin/hive --hiveconf hive.exec.scratchdir=/tmp/mydir ``` -* In **`hive-site.xml`**. 
This is used for setting values for the entire Hive configuration (see [hive-site.xml and hive-default.xml.template]({{< ref "#hive-sitexml-and-hive-defaultxmltemplate" >}}) below). For example: +* In **`hive-site.xml`**. This is used for setting values for the entire Hive configuration (see [hive-site.xml and hive-default.xml.template]({{% ref "#hive-sitexml-and-hive-defaultxmltemplate" %}}) below). For example: ``` @@ -37,7 +37,7 @@ The server-specific configuration file is useful in two situations: 2. You want to set a configuration value only in a server-specific configuration file (for example – setting the metastore database password only in the metastore server configuration file). HiveMetastore server reads hive-site.xml as well as hivemetastore-site.xml configuration files that are available in the $HIVE_CONF_DIR or in the classpath. If the metastore is being used in embedded mode (i.e., hive.metastore.uris is not set or empty) in `hive` commandline or HiveServer2, the hivemetastore-site.xml gets loaded by the parent process as well. The value of hive.metastore.uris is examined to determine this, and the value should be set appropriately in hive-site.xml . - Certain [metastore configuration parameters]({{< ref "#metastore-configuration-parameters" >}}) like hive.metastore.sasl.enabled, hive.metastore.kerberos.principal, hive.metastore.execute.setugi, and hive.metastore.thrift.framed.transport.enabled are used by the metastore client as well as server. For such common parameters it is better to set the values in hive-site.xml, that will help in keeping them consistent. + Certain [metastore configuration parameters]({{% ref "#metastore-configuration-parameters" %}}) like hive.metastore.sasl.enabled, hive.metastore.kerberos.principal, hive.metastore.execute.setugi, and hive.metastore.thrift.framed.transport.enabled are used by the metastore client as well as server. 
For such common parameters it is better to set the values in hive-site.xml, that will help in keeping them consistent. HiveServer2 reads hive-site.xml as well as hiveserver2-site.xml that are available in the $HIVE_CONF_DIR or in the classpath. If HiveServer2 is using the metastore in embedded mode, hivemetastore-site.xml also is loaded. @@ -55,7 +55,7 @@ Please note that the template file `hive-default.xml.template` is not used by H In Hive releases 0.9.0 through 0.13.1, the template file does not necessarily contain all configuration options found in `HiveConf.java` and some of its values and descriptions might be out of date or out of sync with the actual values and descriptions. However, as of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-6037) the template file is generated directly from `HiveConf.java` and therefore it is a reliable source for configuration variables and their defaults. -The administrative configuration variables are listed [below]({{< ref "#below" >}}). User variables are listed in [Hive Configuration Properties]({{< ref "configuration-properties" >}}). As of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-6037) you can display information about a configuration variable with the [SHOW CONF command]({{< ref "#show-conf-command" >}}). +The administrative configuration variables are listed [below]({{% ref "#below" %}}). User variables are listed in [Hive Configuration Properties]({{% ref "configuration-properties" %}}). As of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-6037) you can display information about a configuration variable with the [SHOW CONF command]({{% ref "#show-conf-command" %}}). ### Temporary Folders @@ -68,28 +68,28 @@ Note that when writing data to a table/partition, Hive will first write to a tem ### Log Files -Hive client produces logs and history files on the client machine. Please see [Hive Logging]({{< ref "#hive-logging" >}}) for configuration details. 
+Hive client produces logs and history files on the client machine. Please see [Hive Logging]({{% ref "#hive-logging" %}}) for configuration details.
-For WebHCat logs, see [Log Files]({{< ref "#log-files" >}}) in the [WebHCat manual]({{< ref "webhcat-base" >}}).
+For WebHCat logs, see [Log Files]({{% ref "#log-files" %}}) in the [WebHCat manual]({{% ref "webhcat-base" %}}).
 ### Derby Server Mode
-[Derby](http://db.apache.org/derby/) is the default database for the Hive metastore ([Metadata Store]({{< ref "#metadata-store" >}})). To run Derby as a network server for multiple users, see [Hive Using Derby in Server Mode]({{< ref "hivederbyservermode" >}}).
+[Derby](http://db.apache.org/derby/) is the default database for the Hive metastore ([Metadata Store]({{% ref "#metadata-store" %}})). To run Derby as a network server for multiple users, see [Hive Using Derby in Server Mode]({{% ref "hivederbyservermode" %}}).
 ### Configuration Variables
 Broadly the configuration variables for Hive administration are categorized into:
-* [Hive Configuration Variables]({{< ref "#hive-configuration-variables" >}})
-* [Hive Metastore Configuration Variables]({{< ref "#hive-metastore-configuration-variables" >}})
-* [Configuration Variables Used to Interact with Hadoop]({{< ref "#configuration-variables-used-to-interact-with-hadoop" >}})
-* [Hive Variables Used to Pass Run Time Information]({{< ref "#hive-variables-used-to-pass-run-time-information" >}})
+* [Hive Configuration Variables]({{% ref "#hive-configuration-variables" %}})
+* [Hive Metastore Configuration Variables]({{% ref "#hive-metastore-configuration-variables" %}})
+* [Configuration Variables Used to Interact with Hadoop]({{% ref "#configuration-variables-used-to-interact-with-hadoop" %}})
+* [Hive Variables Used to Pass Run Time Information]({{% ref "#hive-variables-used-to-pass-run-time-information" %}})
-Also see [Hive Configuration Properties]({{< ref "configuration-properties" >}}) in the [Language Manual]({{< ref
"languagemanual" >}}) for non-administrative configuration variables. +Also see [Hive Configuration Properties]({{% ref "configuration-properties" %}}) in the [Language Manual]({{% ref "languagemanual" %}}) for non-administrative configuration variables. Version information: Metrics - A new Hive metrics system based on Codahale is introduced in releases 1.3.0 and 2.0.0 by [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761). To configure it or revert to the old metrics system, see the [Metrics section of Hive Configuration Properties]({{< ref "#metrics-section-of-hive-configuration-properties" >}}). + A new Hive metrics system based on Codahale is introduced in releases 1.3.0 and 2.0.0 by [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761). To configure it or revert to the old metrics system, see the [Metrics section of Hive Configuration Properties]({{% ref "#metrics-section-of-hive-configuration-properties" %}}). #### Hive Configuration Variables @@ -108,7 +108,7 @@ Version information: Metrics | hive.resource.use.hdfs.location | Reference HDFS based files/jars directly instead of copying to session based HDFS scratch directory. (As of Hive [2.2.1](https://issues.apache.org/jira/browse/HIVE-17574).) | true | | hive.jar.path | The location of hive_cli.jar that is used when submitting jobs in a separate jvm. |   | | hive.aux.jars.path | The location of the plugin jars that contain implementations of user defined functions and SerDes. |   | -| hive.reloadable.aux.jars.path | The location of plugin jars that can be renewed (added, removed, or updated) by executing the [Beeline reload command]({{< ref "#beeline-reload-command" >}}), without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path[for creating UDFs or SerDes](/docs/latest/user/configuration-properties#hiveauxjarspath). (As of Hive [0.14.0](https://issues.apache.org/jira/browse/HIVE-7553).) 
| | +| hive.reloadable.aux.jars.path | The location of plugin jars that can be renewed (added, removed, or updated) by executing the [Beeline reload command]({{% ref "#beeline-reload-command" %}}), without having to restart HiveServer2. These jars can be used just like the auxiliary classes in hive.aux.jars.path[for creating UDFs or SerDes](/docs/latest/user/configuration-properties#hiveauxjarspath). (As of Hive [0.14.0](https://issues.apache.org/jira/browse/HIVE-7553).) | | | hive.partition.pruning | A strict value for this variable indicates that an error is thrown by the compiler in case no partition predicate is provided on a partitioned table. This is used to protect against a user inadvertently issuing a query against all the partitions of the table. | nonstrict | | hive.map.aggr | Determines whether the map side aggregation is on or not. | true | | hive.join.emit.interval |   | 1000 | @@ -132,9 +132,9 @@ Version information: Metrics #### Hive Metastore Configuration Variables -Please see [Hive Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}) for information about the configuration variables used to set up the metastore in local, remote, or embedded mode. Also see descriptions in the [Metastore]({{< ref "#metastore" >}}) section of the Language Manual's [Hive Configuration Properties]({{< ref "configuration-properties" >}}). +Please see [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}) for information about the configuration variables used to set up the metastore in local, remote, or embedded mode. Also see descriptions in the [Metastore]({{% ref "#metastore" %}}) section of the Language Manual's [Hive Configuration Properties]({{% ref "configuration-properties" %}}). 
-For security configuration (Hive 0.10 and later), see the [Hive Metastore Security](/docs/latest/user/configuration-properties#hive-metastore-security) section in the Language Manual's [Hive Configuration Properties]({{< ref "configuration-properties" >}}). +For security configuration (Hive 0.10 and later), see the [Hive Metastore Security](/docs/latest/user/configuration-properties#hive-metastore-security) section in the Language Manual's [Hive Configuration Properties]({{% ref "configuration-properties" %}}). #### Configuration Variables Used to Interact with Hadoop @@ -191,7 +191,7 @@ See , which is used by Hive to retrieve the metastore password. -3. Remove the Hive Metastore password entry ([javax.jdo.option.ConnectionPassword]({{< ref "#javax-jdo-option-connectionpassword" >}})) from the Hive configuration. The CredentialProvider will be used instead. +3. Remove the Hive Metastore password entry ([javax.jdo.option.ConnectionPassword]({{% ref "#javax-jdo-option-connectionpassword" %}})) from the Hive configuration. The CredentialProvider will be used instead. 4. Restart Hive Metastore Server/HiveServer2. ## Configuring HCatalog and WebHCat @@ -200,12 +200,12 @@ This configures the CredentialProvider used by }}) for metastore configuration properties. -* See [HCatalog Installation from Tarball]({{< ref "hcatalog-installhcat" >}}) for additional information. +* See [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}) for metastore configuration properties. +* See [HCatalog Installation from Tarball]({{% ref "hcatalog-installhcat" %}}) for additional information. For Hive releases prior to 0.11.0, see the "Thrift Server Setup" section in the HCatalog 0.5.0 document [Installation from Tarball](/docs/latest/hcatalog/hcatalog-installhcat/). ### WebHCat -For information about configuring WebHCat, see [WebHCat Configuration]({{< ref "webhcat-configure" >}}). 
+For information about configuring WebHCat, see [WebHCat Configuration]({{% ref "webhcat-configure" %}}).
diff --git a/content/docs/latest/admin/adminmanual-installation.md b/content/docs/latest/admin/adminmanual-installation.md
index e1804a23..58a2ef8f 100644
--- a/content/docs/latest/admin/adminmanual-installation.md
+++ b/content/docs/latest/admin/adminmanual-installation.md
@@ -46,13 +46,13 @@ Finally, add `$HIVE_HOME/bin` to your `PATH`:
 Version information
-To build Hive 1.2.0 and later releases with [Apache Maven](http://maven.apache.org/), see [Getting Started: Building Hive from Source]({{< ref "#getting-started:-building-hive-from-source" >}}). You will need Java 1.7 or newer.
+To build Hive 1.2.0 and later releases with [Apache Maven](http://maven.apache.org/), see [Getting Started: Building Hive from Source]({{% ref "#getting-started:-building-hive-from-source" %}}). You will need Java 1.7 or newer.
 ## Installing from Source Code (Hive 0.13.0 and Later)
 Version information
-To build Hive 0.13.0 and later releases with [Apache Maven](http://maven.apache.org/), see [Getting Started: Building Hive from Source]({{< ref "#getting-started:-building-hive-from-source" >}}).
+To build Hive 0.13.0 and later releases with [Apache Maven](http://maven.apache.org/), see [Getting Started: Building Hive from Source]({{% ref "#getting-started:-building-hive-from-source" %}}).
## Installing from Source Code (Hive 0.12.0 and Earlier) @@ -98,7 +98,7 @@ You can begin using Hive as soon as it is installed, although you will probably ## Hive CLI and Beeline CLI -To use the Hive [command line interface]({{< ref "languagemanual-cli" >}}) (CLI) go to the Hive home directory and execute the following command: +To use the Hive [command line interface]({{% ref "languagemanual-cli" %}}) (CLI) go to the Hive home directory and execute the following command: ``` $ bin/hive @@ -117,11 +117,11 @@ $ bin/beeline Metadata is stored in an embedded Derby database whose disk storage location is determined by the Hive configuration variable named javax.jdo.option.ConnectionURL. By default, this location is ./metastore_db (see conf/hive-default.xml). -Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see [Hive Using Derby in Server Mode]({{< ref "hivederbyservermode" >}}). +Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see [Hive Using Derby in Server Mode]({{% ref "hivederbyservermode" %}}). -To configure a database other than Derby for the Hive metastore, see [Hive Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}). +To configure a database other than Derby for the Hive metastore, see [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}). -**Next Step:** [Configuring Hive]({{< ref "adminmanual-configuration" >}}). +**Next Step:** [Configuring Hive]({{% ref "adminmanual-configuration" %}}). # HCatalog and WebHCat @@ -131,9 +131,9 @@ Version HCatalog is installed with Hive, starting with Hive release 0.11.0. -If you install Hive from the binary tarball, the `hcat` command is available in the `hcatalog/bin` directory. However, most `hcat` commands can be issued as `hive` commands except for "`hcat -g`" and "`hcat -p`". 
Note that the `hcat` command uses the `-p` flag for permissions but `hive` uses it to specify a port number. The HCatalog CLI is documented [here]({{< ref "hcatalog-cli" >}}) and the Hive CLI is documented [here]({{< ref "languagemanual-cli" >}}). +If you install Hive from the binary tarball, the `hcat` command is available in the `hcatalog/bin` directory. However, most `hcat` commands can be issued as `hive` commands except for "`hcat -g`" and "`hcat -p`". Note that the `hcat` command uses the `-p` flag for permissions but `hive` uses it to specify a port number. The HCatalog CLI is documented [here]({{% ref "hcatalog-cli" %}}) and the Hive CLI is documented [here]({{% ref "languagemanual-cli" %}}). -HCatalog installation is documented [here]({{< ref "hcatalog-installhcat" >}}). +HCatalog installation is documented [here]({{% ref "hcatalog-installhcat" %}}). ## WebHCat (Templeton) @@ -143,7 +143,7 @@ WebHCat is installed with Hive, starting with Hive release 0.11.0. If you install Hive from the binary tarball, the WebHCat server command `webhcat_server.sh` is in the `hcatalog/sbin` directory. -WebHCat installation is documented [here]({{< ref "webhcat-installwebhcat" >}}). +WebHCat installation is documented [here]({{% ref "webhcat-installwebhcat" %}}). diff --git a/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md b/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md index 886b812b..f98f95a3 100644 --- a/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md +++ b/content/docs/latest/admin/adminmanual-metastore-3-0-administration.md @@ -7,13 +7,13 @@ date: 2024-12-12 ## Version Note -**This document applies only to the Metastore in Hive 3.0 and later releases.**  For Hive 0, 1, and 2 releases please see the [Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}) document. 
+**This document applies only to the Metastore in Hive 3.0 and later releases.**  For Hive 0, 1, and 2 releases please see the [Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}) document. ## Introduction The definition of Hive objects such as databases, tables, and functions are stored in the Metastore.  Depending on how the system is configured, statistics and authorization records may also be stored there.  Hive, and other execution engines, use this data at runtime to determine how to parse, authorize, and efficiently execute user queries.   -The Metastore persists the object definitions to a relational database (RDBMS) via [DataNucleus](http://www.datanucleus.org/), a Java JDO based Object Relational Mapping (ORM) layer. See [Supported RDBMSs]({{< ref "adminmanual-metastore-3-0-administration" >}}) below for a list of supported RDBMSs that can be used. +The Metastore persists the object definitions to a relational database (RDBMS) via [DataNucleus](http://www.datanucleus.org/), a Java JDO based Object Relational Mapping (ORM) layer. See [Supported RDBMSs]({{% ref "adminmanual-metastore-3-0-administration" %}}) below for a list of supported RDBMSs that can be used. The Metastore can be configured to embed the Apache Derby RDBMS or connect to a external RDBMS.  The Metastore itself can be embedded entirely in a user process or run as a service for other processes to connect to.  Each of these options will be discussed in turn below. @@ -21,13 +21,13 @@ The Metastore can be configured to embed the Apache Derby RDBMS or connect to a Beginning in Hive 3.0, the Metastore can be run without the rest of Hive being installed.  It is provided as a separate release in order to allow non-Hive systems to easily integrate with it.  (It is, however, still included in the Hive release for convenience.)  Making the Metastore a standalone service involved changing a number of configuration parameter names and tool names.  
All of the old configuration parameters and tools still work, in order to maximize backwards compatibility.  This document will cover both the old and new names.  As new functionality is added old, Hive style names will not be added. -For details on using the Metastore without Hive, see [Running the Metastore Without Hive]({{< ref "adminmanual-metastore-3-0-administration" >}}) below. +For details on using the Metastore without Hive, see [Running the Metastore Without Hive]({{% ref "adminmanual-metastore-3-0-administration" %}}) below. ## General Configuration -The metastore reads its configuration from the file `metastore-site.xml`.  It expects to find this file in $`METASTORE_HOME/conf` where `$METASTORE_HOME` is an environment variable.  For backwards compatibility it will also read any `hive-site.xml` or `hive-metastoresite.xml`files found in `HIVE_HOME/conf`.  Configuration options can also be defined on the command line (see [Starting and Stopping the Service]({{< ref "adminmanual-metastore-3-0-administration" >}}) below). +The metastore reads its configuration from the file `metastore-site.xml`.  It expects to find this file in $`METASTORE_HOME/conf` where `$METASTORE_HOME` is an environment variable.  For backwards compatibility it will also read any `hive-site.xml` or `hive-metastoresite.xml`files found in `HIVE_HOME/conf`.  Configuration options can also be defined on the command line (see [Starting and Stopping the Service]({{% ref "adminmanual-metastore-3-0-administration" %}}) below). -Configuration values specific to running the Metastore with various RDBMSs, embedded or as a service, and without Hive are discussed in the relevant sections.  The following configuration values apply to the Metastore regardless of how it is being run.  This table covers only commonly customized configuration values.  
For less commonly changed configuration values see [Less Commonly Changed Configuration Parameters]({{< ref "adminmanual-metastore-3-0-administration" >}}). +Configuration values specific to running the Metastore with various RDBMSs, embedded or as a service, and without Hive are discussed in the relevant sections.  The following configuration values apply to the Metastore regardless of how it is being run.  This table covers only commonly customized configuration values.  For less commonly changed configuration values see [Less Commonly Changed Configuration Parameters]({{% ref "adminmanual-metastore-3-0-administration" %}}).   @@ -142,7 +142,7 @@ Prior to Hive 3.0 there was only a single implementation of the MetaStore API (c The cache is automatically updated with new data when changes are made through this MetaStore. In a scenario where there are multiple MetaStore servers the caches can be out of date on some of them. To prevent this the CachedStore automatically refreshes the cache in a configurable frequency (default: 1 minute). -Details about all properties for the CachedStore can be found on [Configuration Properties]({{< ref "configuration-properties" >}}) (Prefix: `metastore.cached`).  +Details about all properties for the CachedStore can be found on [Configuration Properties]({{% ref "configuration-properties" %}}) (Prefix: `metastore.cached`).  ## Less Commonly Changed Configuration Parameters diff --git a/content/docs/latest/admin/adminmanual-metastore-administration.md b/content/docs/latest/admin/adminmanual-metastore-administration.md index a42845df..9546abb6 100644 --- a/content/docs/latest/admin/adminmanual-metastore-administration.md +++ b/content/docs/latest/admin/adminmanual-metastore-administration.md @@ -5,11 +5,11 @@ date: 2024-12-12 # Apache Hive : AdminManual Metastore Administration -This page only documents the MetaStore in Hive 2.x and earlier. 
For 3.x and later releases please see [AdminManual Metastore 3.0 Administration]({{< ref "adminmanual-metastore-3-0-administration" >}}) +This page only documents the MetaStore in Hive 2.x and earlier. For 3.x and later releases please see [AdminManual Metastore 3.0 Administration]({{% ref "adminmanual-metastore-3-0-administration" %}}) ### Introduction -All the metadata for Hive tables and partitions are accessed through the Hive Metastore. Metadata is persisted using [JPOX](http://www.datanucleus.org/) ORM solution (Data Nucleus) so any database that is supported by it can be used by Hive. Most of the commercial relational databases and many open source databases are supported. See the list of [supported databases]({{< ref "#supported-databases" >}}) in section below. +All the metadata for Hive tables and partitions are accessed through the Hive Metastore. Metadata is persisted using [JPOX](http://www.datanucleus.org/) ORM solution (Data Nucleus) so any database that is supported by it can be used by Hive. Most of the commercial relational databases and many open source databases are supported. See the list of [supported databases]({{% ref "#supported-databases" %}}) in section below. You can find an E/R diagram for the metastore [here](https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf). 
@@ -17,13 +17,13 @@ There are 2 different ways to setup the metastore server and metastore database Configuration options for **metastore database** where metadata is persisted: -* [Local/Embedded Metastore Database (Derby)]({{< ref "#localembedded-metastore-database-derby" >}}) -* [Remote Metastore Database]({{< ref "#remote-metastore-database" >}}) +* [Local/Embedded Metastore Database (Derby)]({{% ref "#localembedded-metastore-database-derby" %}}) +* [Remote Metastore Database]({{% ref "#remote-metastore-database" %}}) Configuration options for **metastore server**: -* [Local/Embedded Metastore Server]({{< ref "#localembedded-metastore-server" >}}) -* [Remote Metastore Server]({{< ref "#remote-metastore-server" >}}) +* [Local/Embedded Metastore Server]({{% ref "#localembedded-metastore-server" %}}) +* [Remote Metastore Server]({{% ref "#remote-metastore-server" %}}) #### Basic Configuration Parameters @@ -81,7 +81,7 @@ The default configuration sets up an embedded metastore which is used in unit te **An embedded metastore database is mainly used for unit tests. Only one process can connect to the metastore database at a time, so it is not really a practical solution but works well for unit tests.** -For unit tests [Local/Embedded Metastore Server]({{< ref "#localembedded-metastore-server" >}}) configuration for the metastore server is used in conjunction with embedded database. +For unit tests [Local/Embedded Metastore Server]({{% ref "#localembedded-metastore-server" %}}) configuration for the metastore server is used in conjunction with embedded database. Derby is the default database for the embedded metastore. 
@@ -111,7 +111,7 @@ In local/embedded metastore setup, the metastore server component is used like a | Config Param | Config Value | Comment | | --- | --- | --- | | hive.metastore.uris | *not needed because this is local store* |   | -| hive.metastore.local | `true` | this is local store (removed in Hive 0.10, see [configuration description]({{< ref "#configuration-description" >}}) section) | +| hive.metastore.local | `true` | this is local store (removed in Hive 0.10, see [configuration description]({{% ref "#configuration-description" %}}) section) | | hive.metastore.warehouse.dir | `` | Points to default location of non-external Hive tables in HDFS. | ### Remote Metastore Server @@ -132,7 +132,7 @@ If you execute Java directly, then JAVA_HOME, HIVE_HOME, HADOOP_HOME must be cor **Server Configuration Parameters** -The following example uses a[Remote Metastore Database]({{< ref "#remote-metastore-database" >}}). +The following example uses a[Remote Metastore Database]({{% ref "#remote-metastore-database" %}}). | Config Param | Config Value | Comment | | --- | --- | --- | @@ -186,7 +186,7 @@ hive --service metastore -p | --- | --- | --- | --- | | MySQL | 5.6.17 | `mysql` | | | Postgres | 9.1.13 | ``` postgres ``` | | -| Oracle | 11g | `oracle` | [hive.metastore.orm.retrieveMapNullsAsEmptyStrings]({{< ref "#hive-metastore-orm-retrievemapnullsasemptystrings" >}}) | +| Oracle | 11g | `oracle` | [hive.metastore.orm.retrieveMapNullsAsEmptyStrings]({{% ref "#hive-metastore-orm-retrievemapnullsasemptystrings" %}}) | | MS SQL Server | 2008 R2 | `mssql` | | ### Metastore Schema Consistency and Upgrades @@ -199,5 +199,5 @@ Hive now records the schema version in the metastore database and verifies that To suppress the schema check and allow the metastore to implicitly modify the schema, you need to set a configuration property `hive.metastore.schema.verification` to false in `hive-site.xml`. 
-Starting in release 0.12, Hive also includes an off-line schema tool to initialize and upgrade the metastore schema. Please refer to the details [here]({{< ref "hive-schema-tool" >}}). +Starting in release 0.12, Hive also includes an off-line schema tool to initialize and upgrade the metastore schema. Please refer to the details [here]({{% ref "hive-schema-tool" %}}). diff --git a/content/docs/latest/admin/adminmanual-settinguphiveserver.md b/content/docs/latest/admin/adminmanual-settinguphiveserver.md index 5b6f5a78..fa856cb1 100644 --- a/content/docs/latest/admin/adminmanual-settinguphiveserver.md +++ b/content/docs/latest/admin/adminmanual-settinguphiveserver.md @@ -8,9 +8,9 @@ date: 2024-12-12 ## Setting Up Hive Server * [Setting Up HiveServer2](/docs/latest/admin/setting-up-hiveserver2) -* [Setting Up Thrift Hive Server]({{< ref "hiveserver" >}}) -* [Setting Up Hive JDBC Server]({{< ref "hivejdbcinterface" >}}) -* [Setting Up Hive ODBC Server]({{< ref "hiveodbc" >}}) +* [Setting Up Thrift Hive Server]({{% ref "hiveserver" %}}) +* [Setting Up Hive JDBC Server]({{% ref "hivejdbcinterface" %}}) +* [Setting Up Hive ODBC Server]({{% ref "hiveodbc" %}}) diff --git a/content/docs/latest/admin/adminmanual.md b/content/docs/latest/admin/adminmanual.md index fb1e9dc3..3efa26a4 100644 --- a/content/docs/latest/admin/adminmanual.md +++ b/content/docs/latest/admin/adminmanual.md @@ -7,11 +7,11 @@ date: 2024-12-12 # Hive Administrator's Manual -* [Installing Hive]({{< ref "adminmanual-installation" >}}) -* [Configuring Hive]({{< ref "adminmanual-configuration" >}}) -* [Setting up Metastore]({{< ref "adminmanual-metastore-administration" >}}) -* [Setting up Hive Server (JDBC, ODBC, Thrift, etc)]({{< ref "adminmanual-settinguphiveserver" >}}) -* [Hive on Amazon Web Services]({{< ref "hiveaws" >}}) +* [Installing Hive]({{% ref "adminmanual-installation" %}}) +* [Configuring Hive]({{% ref "adminmanual-configuration" %}}) +* [Setting up Metastore]({{% ref 
"adminmanual-metastore-administration" %}}) +* [Setting up Hive Server (JDBC, ODBC, Thrift, etc)]({{% ref "adminmanual-settinguphiveserver" %}}) +* [Hive on Amazon Web Services]({{% ref "hiveaws" %}}) diff --git a/content/docs/latest/admin/hive-on-spark-getting-started.md b/content/docs/latest/admin/hive-on-spark-getting-started.md index b0e4667b..3624a10a 100644 --- a/content/docs/latest/admin/hive-on-spark-getting-started.md +++ b/content/docs/latest/admin/hive-on-spark-getting-started.md @@ -96,7 +96,7 @@ yarn.resourcemanager.scheduler.class=org.apache.hadoop.yarn.server.resourcemanag set hive.execution.engine=spark; ``` -See the [Spark section of Hive Configuration Properties]({{< ref "#spark-section-of-hive-configuration-properties" >}}) for other properties for configuring Hive and the Remote Spark Driver. +See the [Spark section of Hive Configuration Properties]({{% ref "#spark-section-of-hive-configuration-properties" %}}) for other properties for configuring Hive and the Remote Spark Driver. 3. Configure Spark-application configs for Hive.  See: .  This can be done either by adding a file "spark-defaults.conf" with these properties to the Hive classpath, or by setting them on Hive configuration (`hive-site.xml`). For instance: ``` @@ -171,8 +171,8 @@ On this 9 node cluster we’ll have two executors per host. As such we can confi | Issue | Cause | Resolution | | --- | --- | --- | -| Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit | Spark dependency not correctly set. | Add Spark dependency to Hive, see Step 1 [above]({{< ref "#above" >}}). | -| org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to be org.apache.spark.serializer.KryoSerializer, see Step 3 [above]({{< ref "#above" >}}). 
| +| Error: Could not find or load main class org.apache.spark.deploy.SparkSubmit | Spark dependency not correctly set. | Add Spark dependency to Hive, see Step 1 [above]({{% ref "#above" %}}). | +| org.apache.spark.SparkException: Job aborted due to stage failure: Task 5.0:0 had a not serializable result: java.io.NotSerializableException: org.apache.hadoop.io.BytesWritable | Spark serializer not set to Kryo. | Set spark.serializer to be org.apache.spark.serializer.KryoSerializer, see Step 3 [above]({{% ref "#above" %}}). | | [ERROR] Terminal initialization failed; falling back to unsupportedjava.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected | Hive has upgraded to Jline2 but jline 0.94 exists in the Hadoop lib. | 1. Delete jline from the Hadoop lib directory (it's only pulled in transitively from ZooKeeper). 2. export HADOOP_USER_CLASSPATH_FIRST=true 3. If this error occurs during mvn test, perform a mvn clean install on the root project and itests directory. | | Spark executor gets killed all the time and Spark keeps retrying the failed stage; you may find similar information in the YARN nodemanager log.WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=217989,containerID=container_1421717252700_0716_01_50767235] is running beyond physical memory limits. Current usage: 43.1 GB of 43 GB physical memory used; 43.9 GB of 90.3 GB virtual memory used. Killing container. | For Spark on YARN, nodemanager would kill Spark executor if it used more memory than the configured size of "spark.executor.memory" + "spark.yarn.executor.memoryOverhead". | Increase "spark.yarn.executor.memoryOverhead" to make sure it covers the executor off-heap memory usage. 
| | Run query and get an error like:FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTaskIn Hive logs, it shows:java.lang.NoClassDefFoundError: Could not initialize class org.xerial.snappy.Snappy  at org.xerial.snappy.SnappyOutputStream.<init>(SnappyOutputStream.java:79) | Happens on Mac (not officially supported).This is a general Snappy issue with Mac and is not unique to Hive on Spark, but workaround is noted here because it is needed for startup of Spark client. | Run this command before starting Hive or HiveServer2:export HADOOP_OPTS="-Dorg.xerial.snappy.tempdir=/tmp -Dorg.xerial.snappy.lib.name=libsnappyjava.jnilib $HADOOP_OPTS" | diff --git a/content/docs/latest/admin/hive-schema-tool.md b/content/docs/latest/admin/hive-schema-tool.md index 8324905a..7954f611 100644 --- a/content/docs/latest/admin/hive-schema-tool.md +++ b/content/docs/latest/admin/hive-schema-tool.md @@ -31,7 +31,7 @@ Caused by: MetaException(message:Version information not found in metastore. ) By default the configuration property **metastore.schema.verification** is true. If we set it false, metastore implicitly writes the schema version if it's not matching. To disable the strict schema verification, you need to set this property to false in `hive-site.xml`. -See [Hive Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}) for general information about the metastore. +See [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}) for general information about the metastore. ## The Hive Schema Tool diff --git a/content/docs/latest/admin/hiveaws.md b/content/docs/latest/admin/hiveaws.md index db7964c0..18b088eb 100644 --- a/content/docs/latest/admin/hiveaws.md +++ b/content/docs/latest/admin/hiveaws.md @@ -16,7 +16,7 @@ Hadoop already has a long tradition of being run on EC2 and S3. 
These are well d * [Hadoop and S3](http://wiki.apache.org/hadoop/AmazonS3) * [Amazon and EC2](http://wiki.apache.org/hadoop/AmazonEC2) -The second document also has pointers on how to get started using EC2 and S3. For people who are new to S3 - there's a few helpful notes in [S3 for n00bs section]({{< ref "#s3-for-n00bs-section" >}}) below. The rest of the documentation below assumes that the reader can launch a hadoop cluster in EC2, copy files into and out of S3 and run some simple Hadoop jobs. +The second document also has pointers on how to get started using EC2 and S3. For people who are new to S3 - there's a few helpful notes in [S3 for n00bs section]({{% ref "#s3-for-n00bs-section" %}}) below. The rest of the documentation below assumes that the reader can launch a hadoop cluster in EC2, copy files into and out of S3 and run some simple Hadoop jobs. ## Introduction to Hive and AWS @@ -52,7 +52,7 @@ However - the one downside of Option 2 is that jar files are copied over to the It is useful to go over the main storage choices for Hadoop/EC2 environment: * S3 is an excellent place to store data for the long term. There are a couple of choices on how S3 can be used: - + Data can be either stored as files within S3 using tools like aws and s3curl as detailed in [S3 for n00bs section]({{< ref "#s3-for-n00bs-section" >}}). This suffers from the restriction of 5G limit on file size in S3. But the nice thing is that there are probably scores of tools that can help in copying/replicating data to S3 in this manner. Hadoop is able to read/write such files using the S3N filesystem. + + Data can be either stored as files within S3 using tools like aws and s3curl as detailed in [S3 for n00bs section]({{% ref "#s3-for-n00bs-section" %}}). This suffers from the restriction of 5G limit on file size in S3. But the nice thing is that there are probably scores of tools that can help in copying/replicating data to S3 in this manner. 
Hadoop is able to read/write such files using the S3N filesystem. + Alternatively Hadoop provides a block based file system using S3 as a backing store. This does not suffer from the 5G max file size restriction. However - Hadoop utilities and libraries must be used for reading/writing such files. * HDFS instance on the local drives of the machines in the Hadoop cluster. The lifetime of this is restricted to that of the Hadoop instance - hence this is not suitable for long lived data. However it should provide data that can be accessed much faster and hence is a good choice for intermediate/tmp data. diff --git a/content/docs/latest/admin/hivederbyservermode.md b/content/docs/latest/admin/hivederbyservermode.md index 2609a753..ec0c96a0 100644 --- a/content/docs/latest/admin/hivederbyservermode.md +++ b/content/docs/latest/admin/hivederbyservermode.md @@ -7,7 +7,7 @@ date: 2024-12-12 Hive in embedded mode has a limitation of one active user at a time. You may want to run [Derby](http://db.apache.org/derby/) as a Network Server, this way multiple users can access it simultaneously from different systems. -See [Metadata Store]({{< ref "#metadata-store" >}}) and [Embedded Metastore]({{< ref "#embedded-metastore" >}}) for more information. +See [Metadata Store]({{% ref "#metadata-store" %}}) and [Embedded Metastore]({{% ref "#embedded-metastore" %}}) for more information. ### Download Derby diff --git a/content/docs/latest/admin/hivejdbcinterface.md b/content/docs/latest/admin/hivejdbcinterface.md index 7dfcf65d..5cd01faf 100644 --- a/content/docs/latest/admin/hivejdbcinterface.md +++ b/content/docs/latest/admin/hivejdbcinterface.md @@ -7,7 +7,7 @@ date: 2024-12-12 The current JDBC interface for Hive only supports running queries and fetching results. Only a small subset of the metadata calls are supported. -To see how the JDBC interface can be used, see [sample code]({{< ref "hiveclient" >}}). 
+To see how the JDBC interface can be used, see [sample code]({{% ref "hiveclient" %}}). ### Integration with Pentaho @@ -30,8 +30,8 @@ echo java -XX:MaxPermSize=512m -cp $CLASSPATH -jar launcher.jar java -XX:MaxPermSize=512m -cp $CLASSPATH org.pentaho.commons.launcher.Launcher ``` -3. Build and start the hive server with instructions from [HiveServer]({{< ref "hiveserver" >}}). -4. Compile and run the Hive JDBC client code to load some data (I haven't figured out how to do this in report designer yet). See [sample code]({{< ref "hiveclient" >}}) for loading the data. +3. Build and start the hive server with instructions from [HiveServer]({{% ref "hiveserver" %}}). +4. Compile and run the Hive JDBC client code to load some data (I haven't figured out how to do this in report designer yet). See [sample code]({{% ref "hiveclient" %}}) for loading the data. 5. Run the report designer (note step 2). ``` diff --git a/content/docs/latest/admin/hiveodbc.md b/content/docs/latest/admin/hiveodbc.md index 67766fee..cbf6ad07 100644 --- a/content/docs/latest/admin/hiveodbc.md +++ b/content/docs/latest/admin/hiveodbc.md @@ -5,8 +5,8 @@ date: 2024-12-12 # Apache Hive : ODBC Driver -These instructions are for the Hive ODBC driver available in Hive for [HiveServer1]({{< ref "hiveserver" >}}). -There is no ODBC driver available for [HiveServer2]({{< ref "setting-up-hiveserver2" >}}) as part of Apache Hive. There are third party ODBC drivers available from different vendors, and most of them seem to be free. +These instructions are for the Hive ODBC driver available in Hive for [HiveServer1]({{% ref "hiveserver" %}}). +There is no ODBC driver available for [HiveServer2]({{% ref "setting-up-hiveserver2" %}}) as part of Apache Hive. There are third party ODBC drivers available from different vendors, and most of them seem to be free. 
## Introduction @@ -18,7 +18,7 @@ The Hive ODBC Driver is a software library that implements the Open Database Con This guide assumes you are already familiar with Hive and in particular with the following: -* [Hive Server]({{< ref "hiveserver" >}}) +* [Hive Server]({{% ref "hiveserver" %}}) * [Thrift](http://wiki.apache.org/thrift/) * [ODBC API](http://msdn.microsoft.com/en-us/library/ms714177(VS.85).aspx) * [unixODBC](http://www.unixodbc.org/) @@ -27,7 +27,7 @@ This guide assumes you are already familiar with Hive and in particular with the The following software components are needed for the successful compilation and operation of the Hive ODBC driver: -* **Hive Server** – a service through which clients may remotely issue Hive commands and requests. The Hive ODBC driver depends on Hive Server to perform the core set of database interactions. Hive Server is built as part of the Hive build process. More information regarding Hive Server usage can be found [here]({{< ref "hiveserver" >}}). +* **Hive Server** – a service through which clients may remotely issue Hive commands and requests. The Hive ODBC driver depends on Hive Server to perform the core set of database interactions. Hive Server is built as part of the Hive build process. More information regarding Hive Server usage can be found [here]({{% ref "hiveserver" %}}). * **Apache Thrift** – a scalable cross-language software framework that enables the Hive ODBC driver (specifically the Hive client) to communicate with the Hive Server. See this link for the details on [Thrift Installation](http://wiki.apache.org/thrift/ThriftInstallation). The Hive ODBC driver was developed with Thrift trunk version r790732, but the latest revision should also be fine. Make sure you note the Thrift install path during the Thrift build process as this information will be needed during the Hive client build process. The Thrift install path will be referred to as THRIFT_HOME. 
### Driver Architecture @@ -45,7 +45,7 @@ NOTE: Hive client needs to be built and installed before the unixODBC API wrappe In order to build and install the Hive client: -1. Checkout and setup the latest version of Apache Hive from the Subversion or Git source code repository. For more details, see [Getting Started with Hive]({{< ref "gettingstarted-latest" >}}). From this point onwards, the path to the Hive root directory will be referred to as HIVE_SRC_ROOT. +1. Checkout and setup the latest version of Apache Hive from the Subversion or Git source code repository. For more details, see [Getting Started with Hive]({{% ref "gettingstarted-latest" %}}). From this point onwards, the path to the Hive root directory will be referred to as HIVE_SRC_ROOT. Using a tarball source release diff --git a/content/docs/latest/admin/hiveserver.md b/content/docs/latest/admin/hiveserver.md index 321aded9..fed05975 100644 --- a/content/docs/latest/admin/hiveserver.md +++ b/content/docs/latest/admin/hiveserver.md @@ -7,20 +7,20 @@ date: 2024-12-12 # Thrift Hive Server -HiveServer is an optional service that allows a remote [client]({{< ref "hiveclient" >}}) to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache ThriftTM (), therefore it is sometimes called the *Thrift server* although this can lead to confusion because a newer service named [HiveServer2]({{< ref "#hiveserver2" >}}) is also built on Thrift. Since the introduction of HiveServer2, HiveServer has also been called *HiveServer1*. +HiveServer is an optional service that allows a remote [client]({{% ref "hiveclient" %}}) to submit requests to Hive, using a variety of programming languages, and retrieve results. HiveServer is built on Apache ThriftTM (), therefore it is sometimes called the *Thrift server* although this can lead to confusion because a newer service named [HiveServer2]({{% ref "#hiveserver2" %}}) is also built on Thrift. 
Since the introduction of HiveServer2, HiveServer has also been called *HiveServer1*. WARNING! HiveServer cannot handle concurrent requests from more than one client. This is actually a limitation imposed by the Thrift interface that HiveServer exports, and can't be resolved by modifying the HiveServer code. -[HiveServer2]({{< ref "hiveserver2-clients" >}}) is a rewrite of HiveServer that addresses these problems, starting with Hive 0.11.0. Use of HiveServer2 is recommended. +[HiveServer2]({{% ref "hiveserver2-clients" %}}) is a rewrite of HiveServer that addresses these problems, starting with Hive 0.11.0. Use of HiveServer2 is recommended. -**HiveServer was [removed](https://issues.apache.org/jira/browse/HIVE-6977) from Hive releases starting in Hive 1.0.0 (**[formerly called 0.14.1]({{< ref "#formerly-called-0-14-1" >}})**). ****Please switch over to HiveServer2.****** +**HiveServer was [removed](https://issues.apache.org/jira/browse/HIVE-6977) from Hive releases starting in Hive 1.0.0 (**[formerly called 0.14.1]({{% ref "#formerly-called-0-14-1" %}})**). ****Please switch over to HiveServer2.****** -Previously its removal had been scheduled for [Hive 0.15 (now called 1.1.0)]({{< ref "#hive-0-15--now-called-1-1-0-" >}}). See [HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977). +Previously its removal had been scheduled for [Hive 0.15 (now called 1.1.0)]({{% ref "#hive-0-15--now-called-1-1-0-" %}}). See [HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977). Thrift's interface definition language (IDL) file for HiveServer is `hive_service.thrift`, which is installed in `$HIVE_HOME/service/if/`. 
-Once Hive has been built using steps in [Getting Started]({{< ref "gettingstarted-latest" >}}), the Thrift server can be started by running the following: +Once Hive has been built using steps in [Getting Started]({{% ref "gettingstarted-latest" %}}), the Thrift server can be started by running the following: **0.8 and Later** @@ -59,7 +59,7 @@ $ ant test -Dtestcase=TestHiveServer -Dstandalone=true ``` -The service supports clients in multiple languages. For more details see [Hive Client]({{< ref "hiveclient" >}}). +The service supports clients in multiple languages. For more details see [Hive Client]({{% ref "hiveclient" %}}). Troubleshooting: Connection Error diff --git a/content/docs/latest/admin/manual-installation.md b/content/docs/latest/admin/manual-installation.md index d753d1d8..0226ce84 100644 --- a/content/docs/latest/admin/manual-installation.md +++ b/content/docs/latest/admin/manual-installation.md @@ -469,25 +469,25 @@ $ bin/beeline Metadata is stored in a relational database. In our example (and as a default) it is a Derby database. By default, it's location is ./metastore_db. (See conf/hive-default.xml). You can change it by modifying the configuration variable javax.jdo.option.ConnectionURL. -Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see [Hive Using Derby in Server Mode]({{< ref "hivederbyservermode" >}}). +Using Derby in embedded mode allows at most one user at a time. To configure Derby to run in server mode, see [Hive Using Derby in Server Mode]({{% ref "hivederbyservermode" %}}). -To configure a database other than Derby for the Hive metastore, see [Hive Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}). +To configure a database other than Derby for the Hive metastore, see [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}). -**Next Step:** [Configuring Hive]({{< ref "adminmanual-configuration" >}}). 
+**Next Step:** [Configuring Hive]({{% ref "adminmanual-configuration" %}}). ## HCatalog and WebHCat ### HCatalog -If you install Hive from the binary tarball, the `hcat` command is available in the `hcatalog/bin` directory. However, most `hcat` commands can be issued as `hive` commands except for "`hcat -g`" and "`hcat -p`". Note that the `hcat` command uses the `-p` flag for permissions but `hive` uses it to specify a port number. The HCatalog CLI is documented [here]({{< ref "hcatalog-cli" >}}) and the Hive CLI is documented [here]({{< ref "languagemanual-cli" >}}). +If you install Hive from the binary tarball, the `hcat` command is available in the `hcatalog/bin` directory. However, most `hcat` commands can be issued as `hive` commands except for "`hcat -g`" and "`hcat -p`". Note that the `hcat` command uses the `-p` flag for permissions but `hive` uses it to specify a port number. The HCatalog CLI is documented [here]({{% ref "hcatalog-cli" %}}) and the Hive CLI is documented [here]({{% ref "languagemanual-cli" %}}). -HCatalog installation is documented [here]({{< ref "hcatalog-installhcat" >}}). +HCatalog installation is documented [here]({{% ref "hcatalog-installhcat" %}}). ### WebHCat (Templeton) If you install Hive from the binary tarball, the WebHCat server command `webhcat_server.sh` is in the hcatalog/webhcat/svr/src/main/bin/webhcat_server.sh directory. -WebHCat installation is documented [here]({{< ref "webhcat-installwebhcat" >}}). +WebHCat installation is documented [here]({{% ref "webhcat-installwebhcat" %}}). 
diff --git a/content/docs/latest/admin/replication.md b/content/docs/latest/admin/replication.md index 58ae9fcd..5bf833f3 100644 --- a/content/docs/latest/admin/replication.md +++ b/content/docs/latest/admin/replication.md @@ -7,13 +7,13 @@ date: 2024-12-12 ## Overview -Hive Replication builds on the [metastore event]({{< ref "hcatalog-notification" >}}) and [ExIm]({{< ref "languagemanual-importexport" >}}) features to provide a framework for replicating Hive metadata and data changes between clusters. There is no requirement for the source cluster and replica to run the same Hadoop distribution, Hive version, or metastore RDBMS. The replication system has a fairly 'light touch', exhibiting a low degree of coupling and using the Hive-metastore Thrift service as an integration point. However, the current implementation is not an 'out of the box' solution. In particular it is necessary to provide some kind of orchestration service that is responsible for requesting replication tasks and executing them. +Hive Replication builds on the [metastore event]({{% ref "hcatalog-notification" %}}) and [ExIm]({{% ref "languagemanual-importexport" %}}) features to provide a framework for replicating Hive metadata and data changes between clusters. There is no requirement for the source cluster and replica to run the same Hadoop distribution, Hive version, or metastore RDBMS. The replication system has a fairly 'light touch', exhibiting a low degree of coupling and using the Hive-metastore Thrift service as an integration point. However, the current implementation is not an 'out of the box' solution. In particular it is necessary to provide some kind of orchestration service that is responsible for requesting replication tasks and executing them. -See [HiveReplicationDevelopment]({{< ref "hivereplicationdevelopment" >}}) for information on the design of replication in Hive. 
+See [HiveReplicationDevelopment]({{% ref "hivereplicationdevelopment" %}}) for information on the design of replication in Hive.   -**A more advanced replication mechanism is being implemented in Hive to address some of the limitations of this mode. See [HiveReplicationv2Development]({{< ref "hivereplicationv2development" >}}) for details.** +**A more advanced replication mechanism is being implemented in Hive to address some of the limitations of this mode. See [HiveReplicationv2Development]({{% ref "hivereplicationv2development" %}}) for details.** ## Potential Uses @@ -34,7 +34,7 @@ See [HiveReplicationDevelopment]({{< ref "hivereplicationdevelopment" >}}) for i ## Configuration -To configure the persistence of metastore notification events it is necessary to set the following `[hive-site.xml]({{< ref "#hive-site-xml" >}})` properties on the source cluster. A restart of the metastore service will be required for the settings to take effect. +To configure the persistence of metastore notification events it is necessary to set the following `[hive-site.xml]({{% ref "#hive-site-xml" %}})` properties on the source cluster. A restart of the metastore service will be required for the settings to take effect. **hive-site.xml Configuration for Replication** diff --git a/content/docs/latest/admin/setting-up-hiveserver2.md b/content/docs/latest/admin/setting-up-hiveserver2.md index aa05bc5a..18f40ba6 100644 --- a/content/docs/latest/admin/setting-up-hiveserver2.md +++ b/content/docs/latest/admin/setting-up-hiveserver2.md @@ -5,12 +5,12 @@ date: 2024-12-12 # Apache Hive : Setting Up HiveServer2 -[HiveServer2](/docs/latest/user/hiveserver2-overview) (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results (a more detailed intro [here](/docs/latest/user/hiveserver2-overview)). 
The current implementation, based on Thrift RPC, is an improved version of [HiveServer]({{< ref "hiveserver" >}}) and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC. +[HiveServer2](/docs/latest/user/hiveserver2-overview) (HS2) is a server interface that enables remote clients to execute queries against Hive and retrieve the results (a more detailed intro [here](/docs/latest/user/hiveserver2-overview)). The current implementation, based on Thrift RPC, is an improved version of [HiveServer]({{% ref "hiveserver" %}}) and supports multi-client concurrency and authentication. It is designed to provide better support for open API clients like JDBC and ODBC. * The Thrift interface definition language (IDL) for HiveServer2 is available at . * Thrift documentation is available at . -This document describes how to set up the server. How to use a client with this server is described in the [HiveServer2 Clients document]({{< ref "hiveserver2-clients" >}}). +This document describes how to set up the server. How to use a client with this server is described in the [HiveServer2 Clients document]({{% ref "hiveserver2-clients" %}}). ## Version information @@ -28,7 +28,7 @@ hive.server2.thrift.port – TCP port number to listen on, default 10000. hive.server2.thrift.bind.host – TCP interface to bind to. -See [HiveServer2 in the Configuration Properties document]({{< ref "#hiveserver2-in-the-configuration-properties-document" >}}) for additional properties that can be set for HiveServer2. +See [HiveServer2 in the Configuration Properties document]({{% ref "#hiveserver2-in-the-configuration-properties-document" %}}) for additional properties that can be set for HiveServer2. 
**Optional Environment Settings** @@ -41,11 +41,11 @@ HiveServer2 provides support for sending Thrift RPC messages over HTTP transport | Setting | Default | Description | | --- | --- | --- | -| [hive.server2.transport.mode]({{< ref "#hive-server2-transport-mode" >}}) | binary | Set to http to enable HTTP transport mode | -| [hive.server2.thrift.http.port]({{< ref "#hive-server2-thrift-http-port" >}}) | 10001 | HTTP port number to listen on | -| [hive.server2.thrift.http.max.worker.threads]({{< ref "#hive-server2-thrift-http-max-worker-threads" >}}) | 500 | Maximum worker threads in the server pool | -| [hive.server2.thrift.http.min.worker.threads]({{< ref "#hive-server2-thrift-http-min-worker-threads" >}}) | 5 | Minimum worker threads in the server pool | -| [hive.server2.thrift.http.path]({{< ref "#hive-server2-thrift-http-path" >}}) | cliservice | The service endpoint | +| [hive.server2.transport.mode]({{% ref "#hive-server2-transport-mode" %}}) | binary | Set to http to enable HTTP transport mode | +| [hive.server2.thrift.http.port]({{% ref "#hive-server2-thrift-http-port" %}}) | 10001 | HTTP port number to listen on | +| [hive.server2.thrift.http.max.worker.threads]({{% ref "#hive-server2-thrift-http-max-worker-threads" %}}) | 500 | Maximum worker threads in the server pool | +| [hive.server2.thrift.http.min.worker.threads]({{% ref "#hive-server2-thrift-http-min-worker-threads" %}}) | 5 | Minimum worker threads in the server pool | +| [hive.server2.thrift.http.path]({{% ref "#hive-server2-thrift-http-path" %}}) | cliservice | The service endpoint | ##### Cookie Based Authentication @@ -61,10 +61,10 @@ The init file lists a set of commands that will run for users of this HiveServer HiveServer2 operation logs are available for Beeline clients (Hive 0.14 onward). 
These parameters configure logging: -* [hive.server2.logging.operation.enabled]({{< ref "#hive-server2-logging-operation-enabled" >}}) -* [hive.server2.logging.operation.log.location]({{< ref "#hive-server2-logging-operation-log-location" >}}) -* [hive.server2.logging.operation.verbose]({{< ref "#hive-server2-logging-operation-verbose" >}}) (Hive 0.14 to 1.1) -* [hive.server2.logging.operation.level]({{< ref "#hive-server2-logging-operation-level" >}}) (Hive 1.2 onward) +* [hive.server2.logging.operation.enabled]({{% ref "#hive-server2-logging-operation-enabled" %}}) +* [hive.server2.logging.operation.log.location]({{% ref "#hive-server2-logging-operation-log-location" %}}) +* [hive.server2.logging.operation.verbose]({{% ref "#hive-server2-logging-operation-verbose" %}}) (Hive 0.14 to 1.1) +* [hive.server2.logging.operation.level]({{% ref "#hive-server2-logging-operation-level" %}}) (Hive 1.2 onward) ## How to Start @@ -110,19 +110,19 @@ hive.server2.authentication.kerberos.keytab – Keytab for server principal. Set following for LDAP mode: -[hive.server2.authentication.ldap.url]({{< ref "#hive-server2-authentication-ldap-url" >}}) – LDAP URL (for example, ldap://hostname.com:389). +[hive.server2.authentication.ldap.url]({{% ref "#hive-server2-authentication-ldap-url" %}}) – LDAP URL (for example, ldap://hostname.com:389). -[hive.server2.authentication.ldap.baseDN]({{< ref "#hive-server2-authentication-ldap-basedn" >}}) – LDAP base DN. (Optional for AD.) +[hive.server2.authentication.ldap.baseDN]({{% ref "#hive-server2-authentication-ldap-basedn" %}}) – LDAP base DN. (Optional for AD.) -[hive.server2.authentication.ldap.Domain]({{< ref "#hive-server2-authentication-ldap-domain" >}}) – LDAP domain. (Hive 0.12.0 and later.) +[hive.server2.authentication.ldap.Domain]({{% ref "#hive-server2-authentication-ldap-domain" %}}) – LDAP domain. (Hive 0.12.0 and later.) 
-See [User and Group Filter Support with LDAP Atn Provider in HiveServer2]({{< ref "user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2" >}}) for other LDAP configuration parameters in Hive 1.3.0 and later. +See [User and Group Filter Support with LDAP Atn Provider in HiveServer2]({{% ref "user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2" %}}) for other LDAP configuration parameters in Hive 1.3.0 and later. Set following for CUSTOM mode: hive.server2.custom.authentication.class – Custom authentication class that implements the `org.apache.hive.service.auth.PasswdAuthenticationProvider` interface. -For PAM mode, see details in [section on PAM]({{< ref "#section-on-pam" >}}) below. +For PAM mode, see details in [section on PAM]({{% ref "#section-on-pam" %}}) below. #### **Impersonation** @@ -219,10 +219,10 @@ HiveServer2 allows the configuration of various aspects of scratch directories, The following are the properties that can be configured related to scratch directories: -* [hive.scratchdir.lock]({{< ref "#hive-scratchdir-lock" >}}) -* [hive.exec.scratchdir]({{< ref "#hive-exec-scratchdir" >}}) -* [hive.scratch.dir.permission]({{< ref "#hive-scratch-dir-permission" >}}) -* [hive.start.cleanup.scratchdir]({{< ref "#hive-start-cleanup-scratchdir" >}}) +* [hive.scratchdir.lock]({{% ref "#hive-scratchdir-lock" %}}) +* [hive.exec.scratchdir]({{% ref "#hive-exec-scratchdir" %}}) +* [hive.scratch.dir.permission]({{% ref "#hive-scratch-dir-permission" %}}) +* [hive.start.cleanup.scratchdir]({{% ref "#hive-start-cleanup-scratchdir" %}}) ### ClearDanglingScratchDir Tool @@ -245,9 +245,9 @@ Introduced in Hive 2.0.0. See [HIVE-12338](https://issues.apache.org/jira/browse A Web User Interface (UI) for HiveServer2 provides configuration, logging, metrics and active session information. The Web UI is available at port 10002 (127.0.0.1:10002) by default.   
-* [Configuration properties]({{< ref "#configuration-properties" >}}) for the Web UI can be [customized in hive-site.xml]({{< ref "#customized-in-hive-site-xml" >}}), including hive.server2.webui.host, hive.server2.webui.port, hive.server2.webui.max.threads, and others. -* [Hive Metrics]({{< ref "hive-metrics" >}}) can by viewed by using the "Metrics Dump" tab. -* [Logs]({{< ref "#logs" >}})can be viewed by using the "Local logs" tab. +* [Configuration properties]({{% ref "#configuration-properties" %}}) for the Web UI can be [customized in hive-site.xml]({{% ref "#customized-in-hive-site-xml" %}}), including hive.server2.webui.host, hive.server2.webui.port, hive.server2.webui.max.threads, and others. +* [Hive Metrics]({{% ref "hive-metrics" %}}) can by viewed by using the "Metrics Dump" tab. +* [Logs]({{% ref "#logs" %}})can be viewed by using the "Local logs" tab. The interface is currently under development with [HIVE-12338](https://issues.apache.org/jira/browse/HIVE-12338). diff --git a/content/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2.md b/content/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2.md index 38d39ad5..c2573d21 100644 --- a/content/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2.md +++ b/content/docs/latest/admin/user-and-group-filter-support-with-ldap-atn-provider-in-hiveserver2.md @@ -15,7 +15,7 @@ Starting in Hive 1.3.0, [HIVE-7193](https://issues.apache.org/jira/browse/HIVE- Filters greatly enhance the functionality of the LDAP Authentication provider. They allow Hive to restrict the set of users allowed to connect to HiveServer2. -See [Authentication/Security Configuration]({{< ref "#authentication/security-configuration" >}}) for general information about configuring authentication for HiveServer2. 
Also see [Hive Configuration Properties – HiveServer2]({{< ref "#hive-configuration-properties –-hiveserver2" >}}) for individual configuration parameters discussed below. +See [Authentication/Security Configuration]({{% ref "#authentication/security-configuration" %}}) for general information about configuring authentication for HiveServer2. Also see [Hive Configuration Properties – HiveServer2]({{% ref "#hive-configuration-properties –-hiveserver2" %}}) for individual configuration parameters discussed below. ### Group Membership @@ -230,7 +230,7 @@ will return the entries ``` but there is no means to form a query that would return just the values of "member" attributes. (LDAP APIs allow filtering of the attributes on the result set.) -To allow for such queries to return user DNs for the members of the group instead of the group DN itself, as of Hive release 2.1.1 the LDAP authentication provider will (re)use the configuration property [hive.server2.authentication.ldap.groupMembershipKey]({{< ref "#hiveserver2authenticationldapgroupmembershipkey" >}}). This property represents the attribute name that represents the user DN on the Group entry. In the example from above, that attribute is "*member*". +To allow for such queries to return user DNs for the members of the group instead of the group DN itself, as of Hive release 2.1.1 the LDAP authentication provider will (re)use the configuration property [hive.server2.authentication.ldap.groupMembershipKey]({{% ref "#hiveserver2authenticationldapgroupmembershipkey" %}}). This property represents the attribute name that represents the user DN on the Group entry. In the example from above, that attribute is "*member*". 
This allows the Hive LDAP authentication provider to specify a query that returns groups and individual users as below (all users of `group1` + the user `user4` will be allowed to authenticate): diff --git a/content/docs/latest/hcatalog/hcatalog-authorization.md b/content/docs/latest/hcatalog/hcatalog-authorization.md index ae86d06c..20b4b07d 100644 --- a/content/docs/latest/hcatalog/hcatalog-authorization.md +++ b/content/docs/latest/hcatalog/hcatalog-authorization.md @@ -9,7 +9,7 @@ date: 2024-12-12 ## Default Authorization Model of Hive -The default authorization model of Hive supports a traditional RDBMS style of authorization based on users, groups and roles and granting them permissions to do operations on database or table. It is described in more detail in [Hive Authorization]({{< ref "languagemanual-authorization" >}}) and [Hive deprecated authorization mode / Legacy Mode]({{< ref "hive-deprecated-authorization-mode" >}}). +The default authorization model of Hive supports a traditional RDBMS style of authorization based on users, groups and roles and granting them permissions to do operations on database or table. It is described in more detail in [Hive Authorization]({{% ref "languagemanual-authorization" %}}) and [Hive deprecated authorization mode / Legacy Mode]({{% ref "hive-deprecated-authorization-mode" %}}). This RDBMS style of authorization is not very suitable for the typical use cases in Hadoop because of the following differences in implementation: @@ -29,7 +29,7 @@ In the HCatalog package, we have introduced implementation of an authorization i Note -This feature is also available in Hive on the metastore-side, starting with release 0.10.0 (see [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}}) in the Hive documentation). 
Starting in Hive 0.12.0 it also runs on the client side ([HIVE-5048](https://issues.apache.org/jira/browse/HIVE-5048) and [HIVE-5402](https://issues.apache.org/jira/browse/HIVE-5402)). +This feature is also available in Hive on the metastore-side, starting with release 0.10.0 (see [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}}) in the Hive documentation). Starting in Hive 0.12.0 it also runs on the client side ([HIVE-5048](https://issues.apache.org/jira/browse/HIVE-5048) and [HIVE-5402](https://issues.apache.org/jira/browse/HIVE-5402)). In Hive, when a file system is used for storage, there is a directory corresponding to a database or a table. With this authorization model, the read/write permissions a user or group has for this directory determine the permissions a user has on the database or table. In the case of other storage systems such as HBase, the authorization of equivalent entities in the system will be done using the system’s authorization mechanism to determine the permissions in Hive. @@ -48,7 +48,7 @@ Details of HDFS permissions are given at `ht``tp://hadoop.apache.org/docs/r`*x.x Links to documentation for different releases of Hadoop can be found here: . -**Note**: If [hive.warehouse.subdir.inherit.perms](/docs/latest/user/configuration-properties#hivewarehousesubdirinheritperms) is enabled, permissions and ACL's for Hive-created files and directories will be set via the following [permission inheritance]({{< ref "permission-inheritance-in-hive" >}}) rules. +**Note**: If [hive.warehouse.subdir.inherit.perms](/docs/latest/user/configuration-properties#hivewarehousesubdirinheritperms) is enabled, permissions and ACL's for Hive-created files and directories will be set via the following [permission inheritance]({{% ref "permission-inheritance-in-hive" %}}) rules. The file system’s logic for determining if a user has permission on the directory or file will be used by Hive.  
@@ -68,7 +68,7 @@ The following table shows the minimum permissions required for Hive operations u | ALTER TABLE |   |   |   | X | | SHOW TABLES | X |   |   |   | -**Caution:** Hive's current implementation of this authorization model does not prevent malicious users from doing bad things. See the [Known Issues]({{< ref "#known-issues" >}}) section below. +**Caution:** Hive's current implementation of this authorization model does not prevent malicious users from doing bad things. See the [Known Issues]({{% ref "#known-issues" %}}) section below. ### Unused DDL for Permissions @@ -76,11 +76,11 @@ DDL statements that manage permissions for Hive's default authorization model do Caution -All GRANT and REVOKE statements for users, groups, and roles are ignored. See the [Known Issues]({{< ref "#known-issues" >}}) section below. +All GRANT and REVOKE statements for users, groups, and roles are ignored. See the [Known Issues]({{% ref "#known-issues" %}}) section below. ## Configuring Storage-System Based Authorization -The implementation of the file-system based authorization model is available through an authorization provider called StorageBasedAuthorizationProvider that is part of Hive. (Support for this was added to the Hive package in release 0.10.0 – see [HIVE-3705](https://issues.apache.org/jira/browse/HIVE-3705) and [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}}).) +The implementation of the file-system based authorization model is available through an authorization provider called StorageBasedAuthorizationProvider that is part of Hive. (Support for this was added to the Hive package in release 0.10.0 – see [HIVE-3705](https://issues.apache.org/jira/browse/HIVE-3705) and [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}}).) 
Version @@ -125,9 +125,9 @@ The HCatalog command line tool uses the same syntax as Hive, and will create the   **Navigation Links** -Previous: [Notification]({{< ref "hcatalog-notification" >}}) +Previous: [Notification]({{% ref "hcatalog-notification" %}}) -Hive documents: [Authorization]({{< ref "languagemanual-authorization" >}}) and [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}}) +Hive documents: [Authorization]({{% ref "languagemanual-authorization" %}}) and [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-base.md b/content/docs/latest/hcatalog/hcatalog-base.md index c0edab71..29166ca2 100644 --- a/content/docs/latest/hcatalog/hcatalog-base.md +++ b/content/docs/latest/hcatalog/hcatalog-base.md @@ -9,26 +9,26 @@ HCatalog is a table and storage management layer for Hadoop that enables users w This is the HCatalog manual.   
-* [Using HCatalog]({{< ref "hcatalog-usinghcat" >}}) -* [Installation from Tarball]({{< ref "hcatalog-installhcat" >}}) -* [HCatalog Configuration Properties]({{< ref "hcatalog-configuration-properties" >}}) -* [Load and Store Interfaces]({{< ref "hcatalog-loadstore" >}}) -* [Input and Output Interfaces]({{< ref "hcatalog-inputoutput" >}}) -* [Reader and Writer Interfaces]({{< ref "hcatalog-readerwriter" >}}) -* [Command Line Interface]({{< ref "hcatalog-cli" >}}) -* [Storage Formats]({{< ref "hcatalog-storageformats" >}}) -* [Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) -* [Notification]({{< ref "hcatalog-notification" >}}) -* [Storage Based Authorization]({{< ref "hcatalog-authorization" >}}) +* [Using HCatalog]({{% ref "hcatalog-usinghcat" %}}) +* [Installation from Tarball]({{% ref "hcatalog-installhcat" %}}) +* [HCatalog Configuration Properties]({{% ref "hcatalog-configuration-properties" %}}) +* [Load and Store Interfaces]({{% ref "hcatalog-loadstore" %}}) +* [Input and Output Interfaces]({{% ref "hcatalog-inputoutput" %}}) +* [Reader and Writer Interfaces]({{% ref "hcatalog-readerwriter" %}}) +* [Command Line Interface]({{% ref "hcatalog-cli" %}}) +* [Storage Formats]({{% ref "hcatalog-storageformats" %}}) +* [Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) +* [Notification]({{% ref "hcatalog-notification" %}}) +* [Storage Based Authorization]({{% ref "hcatalog-authorization" %}}) The [old HCatalog wiki page](https://cwiki.apache.org/confluence/display/HCATALOG/Index) has many other documents including additional user documentation, further information on HBase integration, and resources for contributors. -For information about the REST API for HCatalog, *WebHCat* (formerly *Templeton*), see the [WebHCat Manual]({{< ref "webhcat-base" >}}). +For information about the REST API for HCatalog, *WebHCat* (formerly *Templeton*), see the [WebHCat Manual]({{% ref "webhcat-base" %}}).   
**Navigation Links** -Next: [Using HCatalog]({{< ref "hcatalog-usinghcat" >}}) +Next: [Using HCatalog]({{% ref "hcatalog-usinghcat" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-cli.md b/content/docs/latest/hcatalog/hcatalog-cli.md index d0b8898e..5a53f287 100644 --- a/content/docs/latest/hcatalog/hcatalog-cli.md +++ b/content/docs/latest/hcatalog/hcatalog-cli.md @@ -51,11 +51,11 @@ Many `hcat` commands can be issued as `hive` commands, including all HCatalog DD For example, "`hcat -DA=B`" versus "`hive -d A=B`". * `hcat` without any flags prints a help message but `hive` uses the `-H` flag or `--help`. -The Hive CLI is documented [here]({{< ref "languagemanual-cli" >}}). +The Hive CLI is documented [here]({{% ref "languagemanual-cli" %}}). ## HCatalog DDL -HCatalog supports all [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}) except those operations that require running a MapReduce job. For commands that are supported, any variances are noted below. +HCatalog supports all [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}) except those operations that require running a MapReduce job. For commands that are supported, any variances are noted below. HCatalog does not support the following Hive DDL and other HiveQL commands: @@ -67,7 +67,7 @@ HCatalog does not support the following Hive DDL and other HiveQL commands: * IMPORT FROM ... * EXPORT TABLE -For information about using WebHCat for DDL commands, see [URL Format]({{< ref "#url-format" >}}) and [WebHCat Reference: DDL Resources]({{< ref "webhcat-reference-allddl" >}}). +For information about using WebHCat for DDL commands, see [URL Format]({{% ref "#url-format" %}}) and [WebHCat Reference: DDL Resources]({{% ref "webhcat-reference-allddl" %}}). 
### Create/Drop/Alter Table @@ -152,12 +152,12 @@ If other errors occur while using the HCatalog CLI, more detailed messages are w **Navigation Links** -Previous: [Reader and Writer Interfaces]({{< ref "hcatalog-readerwriter" >}}) - Next: [Storage Formats]({{< ref "hcatalog-storageformats" >}}) +Previous: [Reader and Writer Interfaces]({{% ref "hcatalog-readerwriter" %}}) + Next: [Storage Formats]({{% ref "hcatalog-storageformats" %}}) -Hive command line interface: [Hive CLI]({{< ref "languagemanual-cli" >}}) - Hive DDL commands: [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}) - WebHCat DDL resources: [WebHCat Reference: DDL]({{< ref "webhcat-reference-allddl" >}}) +Hive command line interface: [Hive CLI]({{% ref "languagemanual-cli" %}}) + Hive DDL commands: [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}) + WebHCat DDL resources: [WebHCat Reference: DDL]({{% ref "webhcat-reference-allddl" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-configuration-properties.md b/content/docs/latest/hcatalog/hcatalog-configuration-properties.md index 2efc2a80..ea42c7d2 100644 --- a/content/docs/latest/hcatalog/hcatalog-configuration-properties.md +++ b/content/docs/latest/hcatalog/hcatalog-configuration-properties.md @@ -9,14 +9,14 @@ Apache HCatalog's behaviour can be modified through the use of a few configurati ## Setup -The properties described in this page are meant to be job-level properties set on HCatalog through the jobConf passed into it. This means that this page is relevant for Pig users of [HCatLoader/HCatStorer]({{< ref "hcatalog-loadstore" >}}), or MapReduce users of [HCatInputFormat/HCatOutputFormat]({{< ref "hcatalog-inputoutput" >}}). For a MapReduce user of HCatalog, these must be present as key-values in the Configuration (JobConf/Job/JobContext) used to instantiate HCatOutputFormat or HCatInputFormat. 
For Pig users of HCatStorer, these parameters are set using the Pig "set" command before instantiating an HCatLoader/HCatStorer. +The properties described in this page are meant to be job-level properties set on HCatalog through the jobConf passed into it. This means that this page is relevant for Pig users of [HCatLoader/HCatStorer]({{% ref "hcatalog-loadstore" %}}), or MapReduce users of [HCatInputFormat/HCatOutputFormat]({{% ref "hcatalog-inputoutput" %}}). For a MapReduce user of HCatalog, these must be present as key-values in the Configuration (JobConf/Job/JobContext) used to instantiate HCatOutputFormat or HCatInputFormat. For Pig users of HCatStorer, these parameters are set using the Pig "set" command before instantiating an HCatLoader/HCatStorer. ## Storage Directives | Property | Default | Description | | --- | --- | --- | |  hcat.pig.storer.external.location | not set | An override to specify where HCatStorer will write to, defined from Pig jobs, either directly by the user, or by using org.apache.hive.hcatalog.pig.HCatStorerWrapper. HCatalog will write to this specified directory, rather than writing to the table or partition directory calculated by the metadata. This will be used in lieu of the table directory if this is a table-level write (unpartitioned table write) or in lieu of the partition directory if this is a partition-level write. This parameter is used only for non-dynamic-partitioning jobs which have multiple write destinations. | -|  hcat.dynamic.partitioning.custom.pattern | not set | For a dynamic partitioning job, simply specifying a custom directory is not sufficient since the job writes to multiple destinations, and thus, instead of a directory specification, it requires a pattern specification. That is where this parameter comes in. 
For example, given a table partitioned by the keys country and state, with a root directory location of /apps/hive/warehouse/geo/, a dynamic partition write into this table that writes partitions (country=US,state=CA) & (country=IN,state=KA) would create two directories: /apps/hive/warehouse/geo/country=US/state=CA/ and /apps/hive/warehouse/geo/country=IN/state=KA/. However, specifying hcat.dynamic.partitioning.custom.pattern="/ext/geo/${country}-${state}" would create the following two partition directories: /ext/geo/US-CA and /ext/geo/IN-KA. Thus, it allows the user to specify a custom directory location pattern for all writes, and will interpolate each variable it sees when attempting to create a destination location for the partitions. See [Dynamic Partitioning: External Tables]({{< ref "#dynamic-partitioning:-external-tables" >}}) for another example. | +|  hcat.dynamic.partitioning.custom.pattern | not set | For a dynamic partitioning job, simply specifying a custom directory is not sufficient since the job writes to multiple destinations, and thus, instead of a directory specification, it requires a pattern specification. That is where this parameter comes in. For example, given a table partitioned by the keys country and state, with a root directory location of /apps/hive/warehouse/geo/, a dynamic partition write into this table that writes partitions (country=US,state=CA) & (country=IN,state=KA) would create two directories: /apps/hive/warehouse/geo/country=US/state=CA/ and /apps/hive/warehouse/geo/country=IN/state=KA/. However, specifying hcat.dynamic.partitioning.custom.pattern="/ext/geo/${country}-${state}" would create the following two partition directories: /ext/geo/US-CA and /ext/geo/IN-KA. Thus, it allows the user to specify a custom directory location pattern for all writes, and will interpolate each variable it sees when attempting to create a destination location for the partitions. 
See [Dynamic Partitioning: External Tables]({{% ref "#dynamic-partitioning:-external-tables" %}}) for another example. | | hcat.append.limit(Hive 0.15.0 and later) | not set | hcat.append.limit allows an HCatalog user to specify a custom append limit. By default, while appending to an existing directory HCatalog will attempt to avoid naming clashes and try to append `_a_*NNN*`, where *`NNN`* is a number, to the desired filename to avoid clashes. However, by default, it only tries for *`NNN`* from 0 to 999 before giving up. This can cause an issue for some tables with an extraordinarily large number of files. Ideally, this should be fixed by the user changing their usage pattern and doing some manner of compaction; however, setting this parameter can be used as a temporary fix to increase that limit. (Added in Hive 0.15.0 with [HIVE-9381](https://issues.apache.org/jira/browse/HIVE-9381).) | | hcat.input.ignore.invalid.path | false | hcat.input.ignore.invalid.path allows an HCatalog user to specify whether to ignore the path and return an empty result for it when trying to get a split for an invalid input path. The default is false, and user gets an InvalidInputException if the input path is invalid. (Added in Hive 2.1.0 with [HIVE-13509](https://issues.apache.org/jira/browse/HIVE-13509).) | @@ -56,8 +56,8 @@ While reading, it is understandable that data might contain errors, but we may n | hcat.input.bad.record.min | 2 | An int parameter defaults to 2, which is the minimum number of bad records encountered before applying the hcat.input.bad.record.threshold parameter. This is to prevent an initial or early bad record from causing a task abort because the ratio of errors was too high.  
| **Navigation Links** -Previous: [Installation from Tarball]({{< ref "hcatalog-installhcat" >}}) -Next: [Load and Store Interfaces]({{< ref "hcatalog-loadstore" >}}) +Previous: [Installation from Tarball]({{% ref "hcatalog-installhcat" %}}) +Next: [Load and Store Interfaces]({{% ref "hcatalog-loadstore" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-dynamicpartitions.md b/content/docs/latest/hcatalog/hcatalog-dynamicpartitions.md index 54a5e4af..0f2d7136 100644 --- a/content/docs/latest/hcatalog/hcatalog-dynamicpartitions.md +++ b/content/docs/latest/hcatalog/hcatalog-dynamicpartitions.md @@ -56,17 +56,17 @@ Each dynamic partition column must be present in the custom location path in the * ${year}­‐${month}­‐${day}/country=${country} * output/yr=${year}/mon=${month}/day=${day}/geo=${country} -See [HCatalog Configuration Properties]({{< ref "hcatalog-configuration-properties" >}}) for another example. Also see the [PDF attachment to HIVE-6019](https://issues.apache.org/jira/secure/attachment/12622686/HIVE-6109.pdf) for details of the implementation. +See [HCatalog Configuration Properties]({{% ref "hcatalog-configuration-properties" %}}) for another example. Also see the [PDF attachment to HIVE-6019](https://issues.apache.org/jira/secure/attachment/12622686/HIVE-6109.pdf) for details of the implementation. 
 ### Hive Dynamic Partitions
 Information about Hive dynamic partitions is available here:
-* [Design Document for Dynamic Partitions]({{< ref "dynamicpartitions" >}})
+* [Design Document for Dynamic Partitions]({{% ref "dynamicpartitions" %}})
     + [Original design doc](https://issues.apache.org/jira/secure/attachment/12437909/dp_design.txt)
     + [HIVE-936](https://issues.apache.org/jira/browse/HIVE-936)
-* [Tutorial: Dynamic-Partition Insert]({{< ref "#tutorial:-dynamic-partition-insert" >}})
-* [Hive DML: Dynamic Partition Inserts]({{< ref "#hive-dml:-dynamic-partition-inserts" >}})
+* [Tutorial: Dynamic-Partition Insert]({{% ref "#tutorial:-dynamic-partition-insert" %}})
+* [Hive DML: Dynamic Partition Inserts]({{% ref "#hive-dml:-dynamic-partition-inserts" %}})
 ## Usage with Pig
@@ -141,12 +141,12 @@ With dynamic partitioning, we simply specify only as many keys as we know about,
  
 **Navigation Links**
-Previous: [Storage Formats]({{< ref "hcatalog-storageformats" >}})
- Next: [Notification]({{< ref "hcatalog-notification" >}})
+Previous: [Storage Formats]({{% ref "hcatalog-storageformats" %}})
+ Next: [Notification]({{% ref "hcatalog-notification" %}})
-Hive design document: [Dynamic Partitions]({{< ref "dynamicpartitions" >}})
- Hive tutorial: [Dynamic-Partition Insert]({{< ref "#dynamic-partition-insert" >}})
- Hive DML: [Dynamic Partition Inserts]({{< ref "#dynamic-partition-inserts" >}})
+Hive design document: [Dynamic Partitions]({{% ref "dynamicpartitions" %}})
+ Hive tutorial: [Dynamic-Partition Insert]({{% ref "#dynamic-partition-insert" %}})
+ Hive DML: [Dynamic Partition Inserts]({{% ref "#dynamic-partition-inserts" %}})
diff --git a/content/docs/latest/hcatalog/hcatalog-inputoutput.md b/content/docs/latest/hcatalog/hcatalog-inputoutput.md
index 1b5b019d..00c0db01 100644
--- a/content/docs/latest/hcatalog/hcatalog-inputoutput.md
+++ b/content/docs/latest/hcatalog/hcatalog-inputoutput.md
@@ -137,7 +137,7 @@ The types in an HCatalog table schema determine the
types of objects returned fo | ARRAY | java.util.List | values of one data type | | MAP | java.util.Map | key-value pairs | -For general information about Hive data types, see [Hive Data Types]({{< ref "languagemanual-types" >}}) and [Type System]({{< ref "#type-system" >}}). +For general information about Hive data types, see [Hive Data Types]({{% ref "languagemanual-types" %}}) and [Type System]({{% ref "#type-system" %}}). ## Running MapReduce with HCatalog @@ -352,8 +352,8 @@ To write multiple partitions simultaneously you can leave the Map null, but all   **Navigation Links** -Previous: [Load and Store Interfaces]({{< ref "hcatalog-loadstore" >}}) - Next: [Reader and Writer Interfaces]({{< ref "hcatalog-readerwriter" >}}) +Previous: [Load and Store Interfaces]({{% ref "hcatalog-loadstore" %}}) + Next: [Reader and Writer Interfaces]({{% ref "hcatalog-readerwriter" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-installhcat.md b/content/docs/latest/hcatalog/hcatalog-installhcat.md index ff10cc47..500178d7 100644 --- a/content/docs/latest/hcatalog/hcatalog-installhcat.md +++ b/content/docs/latest/hcatalog/hcatalog-installhcat.md @@ -10,7 +10,7 @@ date: 2024-12-12 Version HCatalog is installed with Hive, starting with Hive release 0.11.0. - Hive installation is documented [here]({{< ref "adminmanual-installation" >}}). + Hive installation is documented [here]({{% ref "adminmanual-installation" %}}). ## HCatalog Command Line @@ -18,7 +18,7 @@ If you install Hive from the binary tarball, the `hcat` command is available in The `hcat` command line is similar to the `hive` command line; the main difference is that it restricts the queries that can be run to metadata-only operations such as DDL and DML queries used to read metadata (for example, "show tables"). -The HCatalog CLI is documented [here]({{< ref "hcatalog-cli" >}}) and the Hive CLI is documented [here]({{< ref "languagemanual-cli" >}}). 
+The HCatalog CLI is documented [here]({{% ref "hcatalog-cli" %}}) and the Hive CLI is documented [here]({{% ref "languagemanual-cli" %}}).
 Most `hcat` commands can be issued as `hive` commands except for "`hcat -g`" and "`hcat -p`". Note that the `hcat` command uses the `-p` flag for permissions but `hive` uses it to specify a port number.
@@ -28,16 +28,16 @@ In the Hive tar.gz, HCatalog libraries are available under hcatalog/share/hcatal
 ## HCatalog Server
-HCatalog server is the same as Hive metastore. You can just follow the [Hive metastore documentation]({{< ref "adminmanual-metastore-administration" >}}) for setting it up.
+HCatalog server is the same as Hive metastore. You can just follow the [Hive metastore documentation]({{% ref "adminmanual-metastore-administration" %}}) for setting it up.
  
 **Navigation Links**
-Previous: [Using HCatalog]({{< ref "hcatalog-usinghcat" >}})
- Next: [HCatalog Configuration Properties]({{< ref "hcatalog-configuration-properties" >}})
+Previous: [Using HCatalog]({{% ref "hcatalog-usinghcat" %}})
+ Next: [HCatalog Configuration Properties]({{% ref "hcatalog-configuration-properties" %}})
-Hive installation and configuration: [Installing Hive]({{< ref "adminmanual-installation" >}}), [Configuring Hive]({{< ref "adminmanual-configuration" >}}), [Hive Configuration Properties](/docs/latest/user/configuration-properties)
- WebHCat installation and configuration: [WebHCat Installation]({{< ref "webhcat-installwebhcat" >}}), [WebHCat Configuration]({{< ref "webhcat-configure" >}})
+Hive installation and configuration: [Installing Hive]({{% ref "adminmanual-installation" %}}), [Configuring Hive]({{% ref "adminmanual-configuration" %}}), [Hive Configuration Properties](/docs/latest/user/configuration-properties)
+ WebHCat installation and configuration: [WebHCat Installation]({{% ref "webhcat-installwebhcat" %}}), [WebHCat Configuration]({{% ref "webhcat-configure" %}})
diff --git a/content/docs/latest/hcatalog/hcatalog-loadstore.md
b/content/docs/latest/hcatalog/hcatalog-loadstore.md index 9b668519..514b09a5 100644 --- a/content/docs/latest/hcatalog/hcatalog-loadstore.md +++ b/content/docs/latest/hcatalog/hcatalog-loadstore.md @@ -21,7 +21,7 @@ To bring in the appropriate jars for working with HCatalog, simply include the f pig -useHCatalog ``` -See section [Running Pig with HCatalog]({{< ref "#running-pig-with-hcatalog" >}}) below for details. +See section [Running Pig with HCatalog]({{% ref "#running-pig-with-hcatalog" %}}) below for details. Stale Content Warning @@ -54,13 +54,13 @@ You must specify the table name in single quotes: LOAD 'tablename'. If you are u The Hive metastore lets you create tables without specifying a database; if you created tables this way, then the database name is 'default' and is not required when specifying the table for HCatLoader. -If the table is partitioned, you can indicate which partitions to scan by immediately following the load statement with a partition filter statement (see [Load Examples]({{< ref "#load-examples" >}}) below). +If the table is partitioned, you can indicate which partitions to scan by immediately following the load statement with a partition filter statement (see [Load Examples]({{% ref "#load-examples" %}}) below). ### HCatLoader Data Types Restrictions apply to the types of columns HCatLoader can read from HCatalog-managed tables. HCatLoader can read ***only*** the Hive data types listed below.  -The tables in [Data Type Mappings]({{< ref "#data-type-mappings" >}}) show how Pig will interpret each Hive data type. +The tables in [Data Type Mappings]({{% ref "#data-type-mappings" %}}) show how Pig will interpret each Hive data type. #### Types in Hive 0.12.0 and Earlier @@ -92,7 +92,7 @@ Hive 0.13.0 added support for reading these Hive data types with HCatLoader ([HI * char(x) * varchar(x) -See [Data Type Mappings]({{< ref "#data-type-mappings" >}}) below for details of the mappings between Hive and Pig types. 
+See [Data Type Mappings]({{% ref "#data-type-mappings" %}}) below for details of the mappings between Hive and Pig types. Note @@ -314,7 +314,7 @@ store z into 'web_data' using org.apache.hive.hcatalog.pig.HCatStorer(); Restrictions apply to the types of columns HCatStorer can write to HCatalog-managed tables. HCatStorer can write ***only*** the data types listed below. -The tables in [Data Type Mappings]({{< ref "#data-type-mappings" >}}) show how HCatalog will interpret each Pig data type. +The tables in [Data Type Mappings]({{% ref "#data-type-mappings" %}}) show how HCatalog will interpret each Pig data type. #### Types in Hive 0.12.0 and Earlier @@ -342,7 +342,7 @@ Hive 0.13.0 added support for writing these Pig data types with HCatStorer ([HIV * datetime * bigdecimal -and added more Hive data types that the Pig types can be written to. See [Data Type Mappings]({{< ref "#data-type-mappings" >}}) below for details of the mappings between Pig and Hive types. +and added more Hive data types that the Pig types can be written to. See [Data Type Mappings]({{% ref "#data-type-mappings" %}}) below for details of the mappings between Pig and Hive types. Note @@ -350,7 +350,7 @@ Hive does not have a data type corresponding to the biginteger type in Pig. ## Data Type Mappings -The tables below show the mappings between data types in HCatalog-managed Hive tables and Pig. For general information about Hive data types, see [Hive Data Types]({{< ref "languagemanual-types" >}}) and [Type System]({{< ref "#type-system" >}}). +The tables below show the mappings between data types in HCatalog-managed Hive tables and Pig. For general information about Hive data types, see [Hive Data Types]({{% ref "languagemanual-types" %}}) and [Type System]({{% ref "#type-system" %}}). Any type mapping not listed here is not supported and will throw an exception. The user is expected to cast the value to a compatible type first (in a Pig script, for example). 
@@ -391,8 +391,8 @@ Hive does not have a data type corresponding to the BIGINTEGER type in Pig (java
  
 **Navigation Links**
-Previous: [HCatalog Configuration Properties]({{< ref "hcatalog-configuration-properties" >}})
- Next: [Input and Output Interfaces]({{< ref "hcatalog-inputoutput" >}})
+Previous: [HCatalog Configuration Properties]({{% ref "hcatalog-configuration-properties" %}})
+ Next: [Input and Output Interfaces]({{% ref "hcatalog-inputoutput" %}})
diff --git a/content/docs/latest/hcatalog/hcatalog-notification.md b/content/docs/latest/hcatalog/hcatalog-notification.md
index 92703954..d7b7fe68 100644
--- a/content/docs/latest/hcatalog/hcatalog-notification.md
+++ b/content/docs/latest/hcatalog/hcatalog-notification.md
@@ -89,7 +89,7 @@ msc.markPartitionForEvent("mydb", "mytbl", partMap, PartitionEventType.LOAD_DONE
 To receive this notification, the consumer needs to do the following:
-1. Repeat steps one and two from [above]({{< ref "#above" >}}) to establish the connection to the notification system and to subscribe to the topic.
+1. Repeat steps one and two from [above]({{% ref "#above" %}}) to establish the connection to the notification system and to subscribe to the topic.
 2.
Receive the notification as shown in this example: ``` @@ -209,8 +209,8 @@ You then need to configure your ActiveMQ Consumer(s) to listen for messages on t **Navigation Links** -Previous: [Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) -Next: [Storage Based Authorization]({{< ref "hcatalog-authorization" >}}) +Previous: [Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) +Next: [Storage Based Authorization]({{% ref "hcatalog-authorization" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-readerwriter.md b/content/docs/latest/hcatalog/hcatalog-readerwriter.md index eb88c90a..40619990 100644 --- a/content/docs/latest/hcatalog/hcatalog-readerwriter.md +++ b/content/docs/latest/hcatalog/hcatalog-readerwriter.md @@ -115,8 +115,8 @@ A complete java program for the reader and writer examples above can be found he   **Navigation Links** -Previous: [Input and Output Interfaces]({{< ref "hcatalog-inputoutput" >}}) - Next: [Command Line Interface]({{< ref "hcatalog-cli" >}}) +Previous: [Input and Output Interfaces]({{% ref "hcatalog-inputoutput" %}}) + Next: [Command Line Interface]({{% ref "hcatalog-cli" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-storageformats.md b/content/docs/latest/hcatalog/hcatalog-storageformats.md index 4892b227..032422fc 100644 --- a/content/docs/latest/hcatalog/hcatalog-storageformats.md +++ b/content/docs/latest/hcatalog/hcatalog-storageformats.md @@ -7,15 +7,15 @@ date: 2024-12-12 ### SerDes and Storage Formats -HCatalog uses Hive's SerDe class to serialize and deserialize data. SerDes are provided for RCFile, CSV text, JSON text, and SequenceFile formats. Check the [SerDe documentation]({{< ref "serde" >}}) for additional SerDes that might be included in new versions. 
For example, the [Avro SerDe]({{< ref "avroserde" >}}) was added in Hive 0.9.1, the [ORC]({{< ref "languagemanual-orc" >}}) file format was added in Hive 0.11.0, and [Parquet]({{< ref "parquet" >}}) was added in Hive 0.10.0 (plug-in) and Hive 0.13.0 (native). +HCatalog uses Hive's SerDe class to serialize and deserialize data. SerDes are provided for RCFile, CSV text, JSON text, and SequenceFile formats. Check the [SerDe documentation]({{% ref "serde" %}}) for additional SerDes that might be included in new versions. For example, the [Avro SerDe]({{% ref "avroserde" %}}) was added in Hive 0.9.1, the [ORC]({{% ref "languagemanual-orc" %}}) file format was added in Hive 0.11.0, and [Parquet]({{% ref "parquet" %}}) was added in Hive 0.10.0 (plug-in) and Hive 0.13.0 (native). Users can write SerDes for custom formats using these instructions: -* [How to Write Your Own SerDe]({{< ref "#how-to-write-your-own-serde" >}}) in the Developer Guide +* [How to Write Your Own SerDe]({{% ref "#how-to-write-your-own-serde" %}}) in the Developer Guide * [Hive User Group Meeting August 2009](http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook) pages 64-70 -* also see [SerDe]({{< ref "serde" >}}) for details about input and output processing +* also see [SerDe]({{% ref "serde" %}}) for details about input and output processing -For information about how to create a table with a custom or native SerDe, see [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}). +For information about how to create a table with a custom or native SerDe, see [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}). 
### Usage from Hive @@ -37,12 +37,12 @@ See [HCATALOG-436](https://issues.apache.org/jira/browse/HCATALOG-436) for detai   **Navigation Links** -Previous: [Command Line Interface]({{< ref "hcatalog-cli" >}}) - Next: [Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) +Previous: [Command Line Interface]({{% ref "hcatalog-cli" %}}) + Next: [Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) -SerDe general information: [Hive SerDe]({{< ref "#hive-serde" >}}) - SerDe details: [SerDe]({{< ref "serde" >}}) - SerDe DDL: [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}) +SerDe general information: [Hive SerDe]({{% ref "#hive-serde" %}}) + SerDe details: [SerDe]({{% ref "serde" %}}) + SerDe DDL: [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}) diff --git a/content/docs/latest/hcatalog/hcatalog-streaming-mutation-api.md b/content/docs/latest/hcatalog/hcatalog-streaming-mutation-api.md index 503f4203..21bc9966 100644 --- a/content/docs/latest/hcatalog/hcatalog-streaming-mutation-api.md +++ b/content/docs/latest/hcatalog/hcatalog-streaming-mutation-api.md @@ -13,9 +13,9 @@ In certain data processing use cases it is necessary to modify existing data whe The availability of ACID tables in Hive provides a mechanism that both enables concurrent access to data stored in HDFS (so long as it's in the ORC+ACID format) and also permits row level mutations on records within a table, without the need to rewrite the existing data. But while Hive itself supports `INSERT`, `UPDATE` and `DELETE` commands, and the ORC format can support large batches of mutations in a transaction, Hive's execution engine currently submits each individual mutation operation in a separate transaction and issues table scans (M/R jobs) to execute them. It does not currently scale to the demands of processing large deltas in an atomic manner. 
Furthermore it would be advantageous to extend atomic batch mutation capabilities beyond Hive by making them available to other data processing frameworks. The Streaming Mutation API does just this. -The Streaming Mutation API, although similar to the [Streaming API]({{< ref "streaming-data-ingest" >}}), has a number of differences and is built to enable very different use cases. Superficially, the Streaming API can only write new data whereas the mutation API can also modify existing data. However the two APIs are also based on very different transaction models. The Streaming API focuses on surfacing a continuous stream of new data into a Hive table and does so by batching small sets of writes into multiple short-lived transactions. Conversely the mutation API is designed to infrequently apply large sets of mutations to a data set in an atomic fashion: either all or none of the mutations will be applied. This instead mandates the use of a single long-lived transaction. This table summarises the attributes of each API: +The Streaming Mutation API, although similar to the [Streaming API]({{% ref "streaming-data-ingest" %}}), has a number of differences and is built to enable very different use cases. Superficially, the Streaming API can only write new data whereas the mutation API can also modify existing data. However the two APIs are also based on very different transaction models. The Streaming API focuses on surfacing a continuous stream of new data into a Hive table and does so by batching small sets of writes into multiple short-lived transactions. Conversely the mutation API is designed to infrequently apply large sets of mutations to a data set in an atomic fashion: either all or none of the mutations will be applied. This instead mandates the use of a single long-lived transaction. 
This table summarises the attributes of each API: -| Attribute | [Streaming API]({{< ref "streaming-data-ingest" >}}) | Mutation API | +| Attribute | [Streaming API]({{% ref "streaming-data-ingest" %}}) | Mutation API | | --- | --- | --- | | Ingest type | Data arrives continuously. | Ingests are performed periodically and the mutations are applied in a single batch. | | Transaction scope | Transactions are created for small batches of writes. | The entire set of mutations should be applied within a single transaction. | @@ -55,10 +55,10 @@ Update operations should not attempt to modify values of partition or bucketing A few things are currently required to use streaming.  -1. Currently, only [ORC storage format]({{< ref "languagemanual-orc" >}}) is supported. So '`stored as orc`' must be specified during table creation. -2. The Hive table must be bucketed, but not sorted. So something like '`clustered by (colName) into 10 buckets`' must be specified during table creation. See [Bucketed Tables]({{< ref "languagemanual-ddl-bucketedtables" >}}) for a detailed example. +1. Currently, only [ORC storage format]({{% ref "languagemanual-orc" %}}) is supported. So '`stored as orc`' must be specified during table creation. +2. The Hive table must be bucketed, but not sorted. So something like '`clustered by (colName) into 10 buckets`' must be specified during table creation. See [Bucketed Tables]({{% ref "languagemanual-ddl-bucketedtables" %}}) for a detailed example. 3. User of the client streaming process must have the necessary permissions to write to the table or partition and create partitions in the table. -4. Hive transactions must be configured for each table (see [Hive Transactions – Table Properties]({{< ref "#hive-transactions – table-properties" >}})) as well as in `hive-site.xml` (see [Hive Transactions – Configuration]({{< ref "#hive-transactions-–-configuration" >}})). +4. 
Hive transactions must be configured for each table (see [Hive Transactions – Table Properties]({{% ref "#hive-transactions – table-properties" %}})) as well as in `hive-site.xml` (see [Hive Transactions – Configuration]({{% ref "#hive-transactions-–-configuration" %}})). **Note:** Hive also supports streaming mutations to **unpartitioned** tables. diff --git a/content/docs/latest/hcatalog/hcatalog-usinghcat.md b/content/docs/latest/hcatalog/hcatalog-usinghcat.md index a0c02fad..88c1f9ca 100644 --- a/content/docs/latest/hcatalog/hcatalog-usinghcat.md +++ b/content/docs/latest/hcatalog/hcatalog-usinghcat.md @@ -24,19 +24,19 @@ HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. HCat ### Interfaces -The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. HCatLoader is implemented on top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat. (See [Load and Store Interfaces]({{< ref "hcatalog-loadstore" >}}).) +The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and optionally a specification of partition keys to create a new partition. 
You can write to a single partition by specifying the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. HCatLoader is implemented on top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat. (See [Load and Store Interfaces]({{% ref "hcatalog-loadstore" %}}).) -The HCatalog interface for MapReduce — HCatInputFormat and HCatOutputFormat — is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts a table to read data from and optionally a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the setOutput method; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. (See [Input and Output Interfaces]({{< ref "hcatalog-inputoutput" >}}).) +The HCatalog interface for MapReduce — HCatInputFormat and HCatOutputFormat — is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts a table to read data from and optionally a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and optionally a specification of partition keys to create a new partition. You can write to a single partition by specifying the partition key(s) and value(s) in the setOutput method; and you can write to multiple partitions if the partition key(s) are columns in the data being stored. (See [Input and Output Interfaces]({{% ref "hcatalog-inputoutput" %}}).) **Note:** There is no Hive-specific interface. Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly. -Data is defined using HCatalog's command line interface (CLI). 
The HCatalog CLI supports all Hive DDL that does not require MapReduce to execute, allowing users to create, alter, drop tables, etc. The CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, and so on. Unsupported Hive DDL includes import/export, the REBUILD and CONCATENATE options of ALTER TABLE, CREATE TABLE AS SELECT, and ANALYZE TABLE ... COMPUTE STATISTICS. (See [Command Line Interface]({{< ref "hcatalog-cli" >}}).) +Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI supports all Hive DDL that does not require MapReduce to execute, allowing users to create, alter, drop tables, etc. The CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES, DESCRIBE TABLE, and so on. Unsupported Hive DDL includes import/export, the REBUILD and CONCATENATE options of ALTER TABLE, CREATE TABLE AS SELECT, and ANALYZE TABLE ... COMPUTE STATISTICS. (See [Command Line Interface]({{% ref "hcatalog-cli" %}}).) ### Data Model HCatalog presents a relational view of data. Data is stored in tables and these tables can be placed in databases. Tables can also be hash partitioned on one or more keys; that is, for a given value of a key (or set of keys) there will be one partition that contains all rows with that value (or set of values). For example, if a table is partitioned on date and there are three days of data in the table, there will be three partitions in the table. New partitions can be added to a table, and partitions can be dropped from a table. Partitioned tables have no partitions at create time. Unpartitioned tables effectively have one default partition that must be created at table creation time. There is no guaranteed read consistency when a partition is dropped. -Partitions contain records. Once a partition is created records cannot be added to it, removed from it, or updated in it. Partitions are multi-dimensional and not hierarchical. 
Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive. See [Load and Store Interfaces]({{< ref "hcatalog-loadstore" >}}) for more information about datatypes. +Partitions contain records. Once a partition is created records cannot be added to it, removed from it, or updated in it. Partitions are multi-dimensional and not hierarchical. Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive. See [Load and Store Interfaces]({{% ref "hcatalog-loadstore" %}}) for more information about datatypes. ## Data Flow Example @@ -105,12 +105,12 @@ group by advertiser_id; ## HCatalog Web API -*WebHCat* is a REST API for HCatalog. (REST stands for "[representational state transfer](http://en.wikipedia.org/wiki/Representational_state_transfer)", a style of API based on HTTP verbs).  The original name of WebHCat was *Templeton*. For more information, see the [WebHCat manual]({{< ref "webhcat-base" >}}). +*WebHCat* is a REST API for HCatalog. (REST stands for "[representational state transfer](http://en.wikipedia.org/wiki/Representational_state_transfer)", a style of API based on HTTP verbs).  The original name of WebHCat was *Templeton*. For more information, see the [WebHCat manual]({{% ref "webhcat-base" %}}).   
 **Navigation Links**
-Next: [HCatalog Installation]({{< ref "hcatalog-installhcat" >}})
+Next: [HCatalog Installation]({{% ref "hcatalog-installhcat" %}})
diff --git a/content/docs/latest/language/apache-hive-sql-conformance.md b/content/docs/latest/language/apache-hive-sql-conformance.md
index 5775a103..126e33e5 100644
--- a/content/docs/latest/language/apache-hive-sql-conformance.md
+++ b/content/docs/latest/language/apache-hive-sql-conformance.md
@@ -15,7 +15,7 @@ The formal name of the current SQL standard is ISO/IEC 9075 "Database Language S
 | --- | --- |
 | Apache Hive 2.1 | [Supported SQL Features](/docs/latest/language/supported-features-apache-hive-2-1) |
 | Apache Hive 2.3 | [Supported SQL Features](/docs/latest/language/supported-features-apache-hive-2-3) |
-| Apache Hive 3.1 | [Supported SQL Features]({{< ref "supported-features" >}}) |
+| Apache Hive 3.1 | [Supported SQL Features]({{% ref "supported-features" %}}) |
 Information in these pages is not guaranteed to be accurate. Corrections can be submitted to the Apache Hive mailing list at [user@hive.apache.org](mailto:user@hive.apache.org).
diff --git a/content/docs/latest/language/common-table-expression.md b/content/docs/latest/language/common-table-expression.md
index 29d11968..9491d883 100644
--- a/content/docs/latest/language/common-table-expression.md
+++ b/content/docs/latest/language/common-table-expression.md
@@ -5,7 +5,7 @@ date: 2024-12-12
 # Apache Hive : Common Table Expression
-A Common Table Expression (CTE) is a temporary result set derived from a simple query specified in a WITH clause, which immediately precedes a SELECT or INSERT keyword.  The CTE is defined only within the execution scope of a single statement.  One or more CTEs can be used in a Hive [SELECT]({{< ref "languagemanual-select" >}}), [INSERT]({{< ref "#insert" >}}), [CREATE TABLE AS SELECT]({{< ref "#create-table-as-select" >}}), or [CREATE VIEW AS SELECT]({{< ref "#create-view-as-select" >}}) statement.
+A Common Table Expression (CTE) is a temporary result set derived from a simple query specified in a WITH clause, which immediately precedes a SELECT or INSERT keyword.  The CTE is defined only within the execution scope of a single statement.  One or more CTEs can be used in a Hive [SELECT]({{% ref "languagemanual-select" %}}), [INSERT]({{% ref "#insert" %}}), [CREATE TABLE AS SELECT]({{% ref "#create-table-as-select" %}}), or [CREATE VIEW AS SELECT]({{% ref "#create-view-as-select" %}}) statement.
 Version
diff --git a/content/docs/latest/language/exchange-partition.md b/content/docs/latest/language/exchange-partition.md
index db4b0023..118781ba 100644
--- a/content/docs/latest/language/exchange-partition.md
+++ b/content/docs/latest/language/exchange-partition.md
@@ -9,16 +9,16 @@ The EXCHANGE PARTITION command will move a partition from a source table to targ
 When the command is executed, the source table's partition folder in HDFS will be renamed to move it to the destination table's partition folder.  The Hive metastore will be updated to change the metadata of the source and destination tables accordingly.
-The partition specification can be fully or [partially specified]({{< ref "#partially-specified" >}}).
+The partition specification can be fully or [partially specified]({{% ref "#partially-specified" %}}).
-See [Language Manual DDL]({{< ref "#language-manual-ddl" >}}) for additional information on the Exchange Partition feature.
+See [Language Manual DDL]({{% ref "#language-manual-ddl" %}}) for additional information on the Exchange Partition feature.
 #### Constraints
 * The destination table cannot contain the partition to be exchanged.
 * The operation fails in the presence of an index.
-* Exchange partition is not allowed with transactional tables either as source or destination. Alternatively, use [LOAD DATA]({{< ref "#load-data" >}}) or [INSERT OVERWRITE]({{< ref "#insert-overwrite" >}}) commands to move partitions across transactional tables.
+* Exchange partition is not allowed with transactional tables either as source or destination. Alternatively, use [LOAD DATA]({{% ref "#load-data" %}}) or [INSERT OVERWRITE]({{% ref "#insert-overwrite" %}}) commands to move partitions across transactional tables. * This command requires both the source and destination table names to have the same table schema.   If the schemas are different, the following exception is thrown: diff --git a/content/docs/latest/language/genericudafcasestudy.md b/content/docs/latest/language/genericudafcasestudy.md index 6d174fa4..279948d9 100644 --- a/content/docs/latest/language/genericudafcasestudy.md +++ b/content/docs/latest/language/genericudafcasestudy.md @@ -37,7 +37,7 @@ At a high-level, there are two parts to implementing a Generic UDAF. The first i The resolver handles type checking and operator overloading for UDAF queries. The type checking ensures that the user isn't passing a **double** expression where an **integer** is expected, for example, and the operator overloading allows you to have different UDAF logic for different types of arguments. -The resolver class must extend **org.apache.hadoop.hive.ql.udf.GenericUDAFResolver2** (see [#Resolver Interface Evolution]({{< ref "##resolver-interface-evolution" >}}) for backwards compatibility information). We recommend that you extend the AbstractGenericUDAFResolver base class in order to insulate your UDAF from future interface changes in Hive. +The resolver class must extend **org.apache.hadoop.hive.ql.udf.GenericUDAFResolver2** (see [#Resolver Interface Evolution]({{% ref "##resolver-interface-evolution" %}}) for backwards compatibility information). We recommend that you extend the AbstractGenericUDAFResolver base class in order to insulate your UDAF from future interface changes in Hive. Look at one of the existing UDAFs for the *import*s you will need. 
diff --git a/content/docs/latest/language/hive-udfs.md b/content/docs/latest/language/hive-udfs.md index 21a47f71..028165fa 100644 --- a/content/docs/latest/language/hive-udfs.md +++ b/content/docs/latest/language/hive-udfs.md @@ -39,7 +39,7 @@ These functions can be used without GROUP BY as well.  | **Return Type** | **Name(Signature)** | **Description** | **Source code** | | --- | --- | --- | --- | -| **bigint** | ``` count(*) ``` ``` count(expr) ``` ``` count(DISTINCT expr[, expr...]) ``` | count(*) - Returns the total number of retrieved rows, including rows containing NULL values.count(expr) - Returns the number of rows for which the supplied expression is non-NULL.count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with [hive.optimize.distinct.rewrite]({{< ref "#hive-optimize-distinct-rewrite" >}}). | [GenericUDAFCount](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java) | +| **bigint** | ``` count(*) ``` ``` count(expr) ``` ``` count(DISTINCT expr[, expr...]) ``` | count(*) - Returns the total number of retrieved rows, including rows containing NULL values.count(expr) - Returns the number of rows for which the supplied expression is non-NULL.count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with [hive.optimize.distinct.rewrite]({{% ref "#hive-optimize-distinct-rewrite" %}}). | [GenericUDAFCount](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFCount.java) | | **double** | ``` sum(col), sum(DISTINCT col) ``` | Returns the sum of the elements in the group or the sum of the distinct values of the column in the group. 
| [GenericUDAFSum](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java) | | **double** | ``` avg(col), avg(DISTINCT col) ``` | Returns the average of the elements in the group or the average of the distinct values of the column in the group. | [GenericUDAFAverage](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java) | | **double** | ``` min(col) ``` | Returns the minimum of the column in the group. | [GenericUDAFMin](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFMin.java) | @@ -97,7 +97,7 @@ There is no good engine without string manipulation functions. Apache Hive has r | **int** | ``` character_length(string str) ``` | Returns the number of UTF-8 characters contained in str. The function char_length is shorthand for this function. | [GenericUDFCharacterLength](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCharacterLength.java) | | **string** | ``` chr(bigint\|double A) ``` | Returns the ASCII character having the binary equivalent to A. If A is larger than 256 the result is equivalent to chr(A % 256). Example: select chr(88); returns "X". | [UDFChr](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFChr.java) | | **string** | ``` concat(string\|binary A,string\|binary B...) ``` | Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings.
| [GenericUDFConcat](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcat.java) | -| **array\\>** | ``` context_ngrams(array>, array, int K, int pf) ``` | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | [GenericUDAFContextNGrams](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java) | +| **array\\>** | ``` context_ngrams(array>, array, int K, int pf) ``` | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{% ref "statisticsanddatamining" %}}) for more information. | [GenericUDAFContextNGrams](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFContextNGrams.java) | | **string** | ``` concat_ws(string SEP, string A, string B...) ``` | Like concat() above, but with custom separator SEP. | [GenericUDFConcatWS](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFConcatWS.java) | | **string** | ``` concat_ws(string SEP, array) ``` | Like concat_ws() above, but taking an array ofstrings. | | **string** | ``` decode(binary bin, string charset) ``` | Decodes the first argument into a string using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. | [GenericUDFDecode](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFDecode.java) | @@ -114,7 +114,7 @@ There is no good engine without string manipulation functions. 
Apache Hive has r | **string** | ``` lower(string A) lcase(string A) ``` | Returns the string resulting from converting all characters of B to lowercase. For example, lower('fOoBaR') results in 'foobar'. | [GenericUDFLower](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLower.java) | | **string** | ``` lpad(string str,int len,string pad) ``` | Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In the case of an empty padstring, the return value is null. | [GenericUDFLpad](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java) | | **string** | ``` ltrim(string A) ``` | Returns the string resulting from trimming spaces from the beginning(left-hand side) of A. For example, ltrim(' foobar ') results in 'foobar '. | [GenericUDFLTrim](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java) | -| **array\\>** | ``` ngrams(array>,int N,int K,int pf) ``` | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | [GenericUDAFnGrams](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java) | +| **array\\>** | ``` ngrams(array>,int N,int K,int pf) ``` | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{% ref "statisticsanddatamining" %}}) for more information. | [GenericUDAFnGrams](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFnGrams.java) | | **int** | ``` octet_length(string str) ``` | Returns the number of octets required to hold the string str in UTF-8 encoding.  
Note that octet_length(str) can be larger than character_length(str). | [GenericUDFOctetLength](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOctetLength.java) | | **string** | ``` parse_url(string urlString,string partToExtract [,string keyToExtract]) ``` | Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('', 'HOST') returns '[facebook.com](http://facebook.com)'. Also, a value of a particular key in QUERY can be extracted by providing the key as the third argument, for example, parse_url('', 'QUERY', 'k1') returns 'v1'. | [UDFParseUrl](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFParseUrl.java) | | **string** | ``` printf(string format, Obj... args) ``` | Returns the input formatted according to [printf-style](https://en.wikipedia.org/wiki/Printf) formatstrings. | [GenericUDFPrintf](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFPrintf.java) | @@ -303,7 +303,7 @@ In Hive several built-in functions do not belong to any categories above. These | **Return Type** | **Name(Signature)** | **Description** | **Source code** | | --- | --- | --- | --- | | **varies** | ``` java_method(class, method[, arg1[, arg2..]]) ``` | Synonym for `reflect`. | [GenericUDFReflect](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java) | -| **varies** | ``` reflect(class, method[, arg1[, arg2..]]) ``` | Calls a Java method by matching the argument signature, using reflection. See [Reflect (Generic) UDF]({{< ref "reflectudf" >}}) for examples. 
| [GenericUDFReflect](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java) | +| **varies** | ``` reflect(class, method[, arg1[, arg2..]]) ``` | Calls a Java method by matching the argument signature, using reflection. See [Reflect (Generic) UDF]({{% ref "reflectudf" %}}) for examples. | [GenericUDFReflect](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFReflect.java) | | **int** | ``` hash(a1[, a2...]) ``` | Returns a hash value of the arguments.  | [GenericUDFHash](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFHash.java) | | **string** | ``` current_user() ``` | Returns current user name from the configured authenticator manager. Could be the same as the user provided when connecting, but with some authentication managers (for example HadoopDefaultAuthenticator) it could be different. | [GenericUDFCurrentUser](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFCurrentUser.java) | | **string** | ``` logged_in_user() ``` | Returns the current user name from the session state. This is the username provided when connecting to Hive. | [GenericUDFLoggedInUser](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLoggedInUser.java) | diff --git a/content/docs/latest/language/hiveplugins.md b/content/docs/latest/language/hiveplugins.md index 4bbbeff4..22af9dbc 100644 --- a/content/docs/latest/language/hiveplugins.md +++ b/content/docs/latest/language/hiveplugins.md @@ -28,7 +28,7 @@ public final class Lower extends UDF { After compiling your code to a jar, you need to add this to the Hive classpath. See the section below on deploying jars. 
-Once Hive is started up with your jars in the classpath, the final step is to register your function as described in [Create Function]({{< ref "#create-function" >}}): +Once Hive is started up with your jars in the classpath, the final step is to register your function as described in [Create Function]({{% ref "#create-function" %}}): ``` create temporary function my_lower as 'com.example.hive.udf.Lower'; @@ -49,9 +49,9 @@ vp 7.0 ``` -For a more involved example, see [this page]({{< ref "genericudafcasestudy" >}}). +For a more involved example, see [this page]({{% ref "genericudafcasestudy" %}}). -As of [Hive 0.13](https://issues.apache.org/jira/browse/HIVE-6047), you can register your function as a permanent UDF either in the current database or in a specified database, as described in [Permanent Functions]({{< ref "#permanent-functions" >}}). For example: +As of [Hive 0.13](https://issues.apache.org/jira/browse/HIVE-6047), you can register your function as a permanent UDF either in the current database or in a specified database, as described in [Permanent Functions]({{% ref "#permanent-functions" %}}). For example: ``` create function my_db.my_lower as 'com.example.hive.udf.Lower'; @@ -83,9 +83,9 @@ my_jar.jar ``` -See [Hive CLI]({{< ref "#hive-cli" >}}) for full syntax and more examples. +See [Hive CLI]({{% ref "#hive-cli" %}}) for full syntax and more examples. 
-As of [Hive 0.13](https://issues.apache.org/jira/browse/HIVE-6380), UDFs also have the option of being able to specify required jars in the [CREATE FUNCTION]({{< ref "#create-function" >}}) statement: +As of [Hive 0.13](https://issues.apache.org/jira/browse/HIVE-6380), UDFs also have the option of being able to specify required jars in the [CREATE FUNCTION]({{% ref "#create-function" %}}) statement: ``` CREATE FUNCTION myfunc AS 'myclass' USING JAR 'hdfs:///path/to/jar'; diff --git a/content/docs/latest/language/hiveql.md b/content/docs/latest/language/hiveql.md index 45f37705..2288358d 100644 --- a/content/docs/latest/language/hiveql.md +++ b/content/docs/latest/language/hiveql.md @@ -7,7 +7,7 @@ date: 2024-12-12 **This page is deprecated** -Please see the [HiveQL Language Manual]({{< ref "languagemanual" >}}) +Please see the [HiveQL Language Manual]({{% ref "languagemanual" %}}) diff --git a/content/docs/latest/language/languagemanual-authorization.md b/content/docs/latest/language/languagemanual-authorization.md index 6a1609d7..65728c84 100644 --- a/content/docs/latest/language/languagemanual-authorization.md +++ b/content/docs/latest/language/languagemanual-authorization.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## Introduction -Note that this documentation is referring to Authorization which is verifying if a user has permission to perform a certain action, and not about Authentication (verifying the identity of the user). Strong authentication for tools like the [Hive command line]({{< ref "languagemanual-cli" >}}) is provided through the use of Kerberos. There are additional authentication options for users of [HiveServer2]({{< ref "setting-up-hiveserver2" >}}). +Note that this documentation is referring to Authorization which is verifying if a user has permission to perform a certain action, and not about Authentication (verifying the identity of the user). 
Strong authentication for tools like the [Hive command line]({{% ref "languagemanual-cli" %}}) is provided through the use of Kerberos. There are additional authentication options for users of [HiveServer2]({{% ref "setting-up-hiveserver2" %}}). ## Hive Authorization Options @@ -17,16 +17,16 @@ Three modes of Hive authorization are available to satisfy different use cases. It is useful to think of authorization in terms of two primary use cases of Hive.  -1. Hive as a table storage layer. This is the use case for Hive's [HCatalog]({{< ref "hcatalog-base" >}}) API users such as Apache Pig, MapReduce and some Massively Parallel Processing databases (Cloudera Impala, Facebook Presto, Spark SQL etc). In this case, Hive provides a table abstraction and metadata for files on storage (typically HDFS). These users have direct access to HDFS and the metastore server (which provides an API for metadata access). HDFS access is authorized through the use of [HDFS permissions](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html). Metadata access needs to be authorized using Hive configuration. +1. Hive as a table storage layer. This is the use case for Hive's [HCatalog]({{% ref "hcatalog-base" %}}) API users such as Apache Pig, MapReduce and some Massively Parallel Processing databases (Cloudera Impala, Facebook Presto, Spark SQL etc). In this case, Hive provides a table abstraction and metadata for files on storage (typically HDFS). These users have direct access to HDFS and the metastore server (which provides an API for metadata access). HDFS access is authorized through the use of [HDFS permissions](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html). Metadata access needs to be authorized using Hive configuration. 2. Hive as a SQL query engine. This is one of the most common use cases of Hive. This is the 'Hive view' of SQL users and BI tools. 
This use case has the following two subcategories: - 1. [Hive command line]({{< ref "#hive-command-line" >}}) users. These users have direct access to HDFS and the Hive metastore, which makes this use case similar to use case 1. *Note, that usage of Hive CLI will be officially [deprecated](https://issues.apache.org/jira/browse/HIVE-10304) soon in favor of Beeline.* + 1. [Hive command line]({{% ref "#hive-command-line" %}}) users. These users have direct access to HDFS and the Hive metastore, which makes this use case similar to use case 1. *Note, that usage of Hive CLI will be officially [deprecated](https://issues.apache.org/jira/browse/HIVE-10304) soon in favor of Beeline.* 2. ODBC/JDBC and other HiveServer2 API users (Beeline CLI is an example). These users have all data/metadata access happening through HiveServer2. They don't have direct access to HDFS or the metastore. ### Overview of Authorization Modes #### 1 Storage Based Authorization in the Metastore Server -In use cases 1 and 2a, the users have direct access to the data. Hive configurations don't control the data access. The HDFS permissions act as one source of truth for the table storage access. By enabling [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}}), you can use this single source for truth and have a consistent data and metadata authorization policy. To control metadata access on the metadata objects such as Databases, Tables and Partitions, it checks if you have permission on corresponding directories on the file system. You can also protect access through HiveServer2 (use case 2b above) by ensuring that the queries run as the end user ([hive.server2.enable.doAs]({{< ref "#hive-server2-enable-doas" >}}) option should be "true" in HiveServer2 configuration – this is a default value). +In use cases 1 and 2a, the users have direct access to the data. Hive configurations don't control the data access. 
The HDFS permissions act as one source of truth for the table storage access. By enabling [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}}), you can use this single source for truth and have a consistent data and metadata authorization policy. To control metadata access on the metadata objects such as Databases, Tables and Partitions, it checks if you have permission on corresponding directories on the file system. You can also protect access through HiveServer2 (use case 2b above) by ensuring that the queries run as the end user ([hive.server2.enable.doAs]({{% ref "#hive-server2-enable-doas" %}}) option should be "true" in HiveServer2 configuration – this is a default value). Note, that through the use of [HDFS ACL](http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#ACLs_Access_Control_Lists) (available in Hadoop 2.4 onwards) you have a lot of flexibility in controlling access to the file system, which in turn provides more flexibility with Storage Based Authorization. This functionality is available as of Hive 0.14 ([HIVE-7583](https://issues.apache.org/jira/browse/HIVE-7583)). @@ -65,7 +65,7 @@ FallbackHiveAuthorizerFactory will do the following to mitigate above mentioned Although Storage Based Authorization can provide access control at the level of Databases, Tables and Partitions, it can not control authorization at finer levels such as columns and views because the access control provided by the file system is at the level of directory and files. A prerequisite for fine grained access control is a data server that is able to provide just the columns and rows that a user needs (or has) access to. In the case of file system access, the whole file is served to the user. 
HiveServer2 satisfies this condition, as it has an API that understands rows and columns (through the use of SQL), and is able to serve just the columns and rows that your SQL query asked for. -[SQL Standards Based Authorization]({{< ref "sql-standard-based-hive-authorization" >}}) (introduced in Hive 0.13.0, [HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837)) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.  +[SQL Standards Based Authorization]({{% ref "sql-standard-based-hive-authorization" %}}) (introduced in Hive 0.13.0, [HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837)) can be used to enable fine grained access control. It is based on the SQL standard for authorization, and uses the familiar grant/revoke statements to control access. It needs to be enabled through HiveServer2 configuration.  Note that for use case 2a (Hive command line) SQL Standards Based Authorization is disabled. This is because secure access control is not possible for the Hive command line using an access control policy in Hive, because users have direct access to HDFS and so they can easily bypass the SQL standards based authorization checks or even disable it altogether. Disabling this avoids giving a false sense of security to users. @@ -79,7 +79,7 @@ You also get many advanced features using them. For example, with Ranger you can #### 4 Old default Hive Authorization (Legacy Mode) -[Hive Old Default Authorization]({{< ref "hive-deprecated-authorization-mode" >}}) (was default before Hive 2.0.0) is the authorization mode that has been available in earlier versions of Hive. However, this mode does not have a complete access control model, leaving many security gaps unaddressed. 
For example, the permissions needed to grant privileges for a user are not defined, and any user can grant themselves access to a table or database. +[Hive Old Default Authorization]({{% ref "hive-deprecated-authorization-mode" %}}) (was default before Hive 2.0.0) is the authorization mode that has been available in earlier versions of Hive. However, this mode does not have a complete access control model, leaving many security gaps unaddressed. For example, the permissions needed to grant privileges for a user are not defined, and any user can grant themselves access to a table or database. This model is similar to the SQL standards based authorization mode, in that it provides grant/revoke statement-based access control. However, the access control policy is different from SQL standards based authorization, and they are not compatible. Use of this mode is also supported for Hive command line users. However, for reasons mentioned under the discussion of SQL standards based authorization (above), it is not a secure mode of authorization for the Hive command line. @@ -93,17 +93,17 @@ That is, you can have storage based authorization enabled for metastore API call Version 0.14 — EXPLAIN AUTHORIZATION -Starting in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-5961), the HiveQL command [EXPLAIN AUTHORIZATION]({{< ref "languagemanual-explain" >}}) shows all entities that need to be authorized to execute a query, as well as any authorization failures. +Starting in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-5961), the HiveQL command [EXPLAIN AUTHORIZATION]({{% ref "languagemanual-explain" %}}) shows all entities that need to be authorized to execute a query, as well as any authorization failures. 
## More Information For detailed information about the Hive authorization modes, see: -* [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}})  - + also see [HCatalog Authorization]({{< ref "hcatalog-authorization" >}}) -* [SQL Standard Based Hive Authorization]({{< ref "sql-standard-based-hive-authorization" >}}) -* [Hive deprecated authorization mode / Legacy Mode]({{< ref "hive-deprecated-authorization-mode" >}}) - + also see the [design document]({{< ref "authdev" >}}) and [Security]({{< ref "security" >}}) +* [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}})  + + also see [HCatalog Authorization]({{% ref "hcatalog-authorization" %}}) +* [SQL Standard Based Hive Authorization]({{% ref "sql-standard-based-hive-authorization" %}}) +* [Hive deprecated authorization mode / Legacy Mode]({{% ref "hive-deprecated-authorization-mode" %}}) + + also see the [design document]({{% ref "authdev" %}}) and [Security]({{% ref "security" %}}) diff --git a/content/docs/latest/language/languagemanual-cli.md b/content/docs/latest/language/languagemanual-cli.md index 590ea9e7..5f83f7e2 100644 --- a/content/docs/latest/language/languagemanual-cli.md +++ b/content/docs/latest/language/languagemanual-cli.md @@ -9,9 +9,9 @@ $HIVE_HOME/bin/hive is a shell utility which can be used to run Hive queries in # Deprecation in favor of Beeline CLI -HiveServer2 (introduced in Hive 0.11) has its own CLI called [Beeline]({{< ref "#beeline" >}}), which is a JDBC client based on SQLLine.  Due to new development being focused on HiveServer2, [Hive CLI will soon be deprecated](https://issues.apache.org/jira/browse/HIVE-10304) in favor of Beeline ([HIVE-10511](https://issues.apache.org/jira/browse/HIVE-10511)). +HiveServer2 (introduced in Hive 0.11) has its own CLI called [Beeline]({{% ref "#beeline" %}}), which is a JDBC client based on SQLLine.  
Due to new development being focused on HiveServer2, [Hive CLI will soon be deprecated](https://issues.apache.org/jira/browse/HIVE-10304) in favor of Beeline ([HIVE-10511](https://issues.apache.org/jira/browse/HIVE-10511)). -See [Replacing the Implementation of Hive CLI Using Beeline]({{< ref "replacing-the-implementation-of-hive-cli-using-beeline" >}}) and [Beeline – New Command Line Shell]({{< ref "#beeline-–-new-command-line-shell" >}}) in the HiveServer2 documentation. +See [Replacing the Implementation of Hive CLI Using Beeline]({{% ref "replacing-the-implementation-of-hive-cli-using-beeline" %}}) and [Beeline – New Command Line Shell]({{% ref "#beeline-–-new-command-line-shell" %}}) in the HiveServer2 documentation. ## Hive Command Line Options @@ -50,7 +50,7 @@ Note: The variant "`-hiveconf`" is supported as well as "`--hiveconf`". ### Examples -See [Variable Substitution]({{< ref "languagemanual-variablesubstitution" >}}) for examples of using the `hiveconf` option. +See [Variable Substitution]({{% ref "languagemanual-variablesubstitution" %}}) for examples of using the `hiveconf` option. * Example of running a query from the command line @@ -101,11 +101,11 @@ It is often desirable to emit the logs to the standard output and/or change the `hive.root.logger` specifies the logging level as well as the log destination. Specifying `console` as the target sends the logs to the standard error (instead of the log file). -See [Hive Logging in Getting Started]({{< ref "#hive-logging-in-getting-started" >}}) for more information. +See [Hive Logging in Getting Started]({{% ref "#hive-logging-in-getting-started" %}}) for more information. 
### Tool to Clear Dangling Scratch Directories -See [Scratch Directory Management]({{< ref "#scratch-directory-management" >}}) in Setting Up HiveServer2 for information about scratch directories and a command-line [tool for removing dangling scratch directories]({{< ref "#tool-for-removing-dangling-scratch-directories" >}}) that can be used in the Hive CLI as well as HiveServer2. +See [Scratch Directory Management]({{% ref "#scratch-directory-management" %}}) in Setting Up HiveServer2 for information about scratch directories and a command-line [tool for removing dangling scratch directories]({{% ref "#tool-for-removing-dangling-scratch-directories" %}}) that can be used in the Hive CLI as well as HiveServer2. ## Hive Batch Mode Commands @@ -135,13 +135,13 @@ Use ";" (semicolon) to terminate commands. Comments in scripts can be specified | `set =` | Sets the value of a particular configuration variable (key). **Note:** If you misspell the variable name, the CLI will not show an error. | | `set` | Prints a list of configuration variables that are overridden by the user or Hive. | | `set -v` | Prints all Hadoop and Hive configuration variables. | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) below for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) below for more information. | | | | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. 
See [Hive Resources]({{< ref "#hive-resources" >}}) below for more information. | -| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) below for more information. | -| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{< ref "#hive-resources" >}}) below for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. See [Hive Resources]({{% ref "#hive-resources" %}}) below for more information. | +| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) below for more information. | +| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{% ref "#hive-resources" %}}) below for more information. | | `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | Removes the resource(s) from the distributed cache. | -| `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) below for more information. | +| `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. 
See [Hive Resources]({{% ref "#hive-resources" %}}) below for more information. | | `! ` | Executes a shell command from the Hive shell. | | `dfs ` | Executes a dfs command from the Hive shell. | | `` | Executes a Hive query and prints results to standard output. | @@ -173,7 +173,7 @@ Usage: ``` * FILE resources are just added to the distributed cache. Typically, this might be something like a transform script to be executed. -* JAR resources are also added to the Java classpath. This is required in order to reference objects they contain such as UDFs. See [Hive Plugins]({{< ref "hiveplugins" >}}) for more information about custom UDFs. +* JAR resources are also added to the Java classpath. This is required in order to reference objects they contain such as UDFs. See [Hive Plugins]({{% ref "hiveplugins" %}}) for more information about custom UDFs. * ARCHIVE resources are automatically unarchived as part of distributing them. Example: @@ -251,7 +251,7 @@ It is not neccessary to add files to the session if the files used in a transfor * `... MAP a.networkid USING '/home/nfsserv1/hadoopscripts/tt.py' ...` Here `tt.py` may be accessible via an NFS mount point that's configured identically on all the cluster nodes. -Note that Hive configuration parameters can also specify jars, files, and archives. See [Configuration Variables]({{< ref "#configuration-variables" >}}) for more information. +Note that Hive configuration parameters can also specify jars, files, and archives. See [Configuration Variables]({{% ref "#configuration-variables" %}}) for more information. # HCatalog CLI @@ -259,7 +259,7 @@ Version HCatalog is installed with Hive, starting with Hive release 0.11.0. -Many (but not all) `hcat` commands can be issued as `hive` commands, and vice versa. See the HCatalog [Command Line Interface]({{< ref "hcatalog-cli" >}}) document in the [HCatalog manual]({{< ref "hcatalog-base" >}}) for more information. 
+Many (but not all) `hcat` commands can be issued as `hive` commands, and vice versa. See the HCatalog [Command Line Interface]({{% ref "hcatalog-cli" %}}) document in the [HCatalog manual]({{% ref "hcatalog-base" %}}) for more information. diff --git a/content/docs/latest/language/languagemanual-commands.md b/content/docs/latest/language/languagemanual-commands.md index af63f546..66b7db53 100644 --- a/content/docs/latest/language/languagemanual-commands.md +++ b/content/docs/latest/language/languagemanual-commands.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : LanguageManual Commands -Commands are non-SQL statements such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the [CLI]({{< ref "languagemanual-cli" >}}) or [Beeline]({{< ref "#beeline" >}}). +Commands are non-SQL statements such as setting a property or adding a resource. They can be used in HiveQL scripts or directly in the [CLI]({{% ref "languagemanual-cli" %}}) or [Beeline]({{% ref "#beeline" %}}). | Command | Description | | --- | --- | @@ -14,12 +14,12 @@ Commands are non-SQL statements such as setting a property or adding a resource. | `set =` | Sets the value of a particular configuration variable (key). **Note:** If you misspell the variable name, the CLI will not show an error. | | `set` | Prints a list of configuration variables that are overridden by the user or Hive. | | `set -v` | Prints all Hadoop and Hive configuration variables. | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. 
| -| `add FILE[S] *` `add JAR[S]  *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | -| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | -| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `add FILE[S] *` `add JAR[S]  *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | | `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | Removes the resource(s) from the distributed cache. 
| -| `delete FILE[S] *`  `delete JAR[S] *`  `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | +| `delete FILE[S] *`  `delete JAR[S] *`  `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | | `! ` | Executes a shell command from the Hive shell. | | `dfs ` | Executes a dfs command from the Hive shell. | | `` | Executes a Hive query and prints results to standard output. | diff --git a/content/docs/latest/language/languagemanual-ddl-bucketedtables.md b/content/docs/latest/language/languagemanual-ddl-bucketedtables.md index 8596b0bb..dfe2a0f1 100644 --- a/content/docs/latest/language/languagemanual-ddl-bucketedtables.md +++ b/content/docs/latest/language/languagemanual-ddl-bucketedtables.md @@ -5,11 +5,11 @@ date: 2024-12-12 # Apache Hive : LanguageManual DDL BucketedTables -This is a brief example on creating and populating bucketed tables. (For another example, see [Bucketed Sorted Tables]({{< ref "#bucketed-sorted-tables" >}}).) +This is a brief example on creating and populating bucketed tables. (For another example, see [Bucketed Sorted Tables]({{% ref "#bucketed-sorted-tables" %}}).) -Bucketed tables are fantastic in that they allow much more efficient [sampling]({{< ref "languagemanual-sampling" >}}) than do non-bucketed tables, and they may later allow for time saving operations such as mapside joins. However, the bucketing specified at table creation is not enforced when the table is written to, and so it is possible for the table's metadata to advertise properties which are not upheld by the table's actual layout. This should obviously be avoided. 
Here's how to do it right. +Bucketed tables are fantastic in that they allow much more efficient [sampling]({{% ref "languagemanual-sampling" %}}) than do non-bucketed tables, and they may later allow for time saving operations such as mapside joins. However, the bucketing specified at table creation is not enforced when the table is written to, and so it is possible for the table's metadata to advertise properties which are not upheld by the table's actual layout. This should obviously be avoided. Here's how to do it right. -First, [table creation]({{< ref "#table-creation" >}}): +First, [table creation]({{% ref "#table-creation" %}}): ``` CREATE TABLE user_info_bucketed(user_id BIGINT, firstname STRING, lastname STRING) diff --git a/content/docs/latest/language/languagemanual-ddl.md b/content/docs/latest/language/languagemanual-ddl.md index 68a26d33..a63cab81 100644 --- a/content/docs/latest/language/languagemanual-ddl.md +++ b/content/docs/latest/language/languagemanual-ddl.md @@ -37,7 +37,7 @@ Version information REGEXP and RLIKE are non-reserved keywords prior to Hive 2.0.0 and reserved keywords starting in Hive 2.0.0 ([HIVE-11703](https://issues.apache.org/jira/browse/HIVE-11703)). -Reserved keywords are permitted as identifiers if you quote them as described in [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) (version 0.13.0 and later, see [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). Most of the keywords are reserved through [HIVE-6617](https://issues.apache.org/jira/browse/HIVE-6617) in order to reduce the ambiguity in grammar (version 1.2.0 and later). There are two ways if the user still would like to use those reserved keywords as identifiers: (1) use quoted identifiers, (2) set [hive.support.sql11.reserved.keywords]({{< ref "#hive-support-sql11-reserved-keywords" >}})=false. 
(version 2.1.0 and earlier)  +Reserved keywords are permitted as identifiers if you quote them as described in [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) (version 0.13.0 and later, see [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). Most of the keywords are reserved through [HIVE-6617](https://issues.apache.org/jira/browse/HIVE-6617) in order to reduce the ambiguity in grammar (version 1.2.0 and later). There are two ways if the user still would like to use those reserved keywords as identifiers: (1) use quoted identifiers, (2) set [hive.support.sql11.reserved.keywords]({{% ref "#hive-support-sql11-reserved-keywords" %}})=false. (version 2.1.0 and earlier)  ## Create/Drop/Alter/Use Database @@ -95,7 +95,7 @@ USE database_name; USE DEFAULT; ``` -USE sets the current database for all subsequent HiveQL statements. To revert to the default database, use the keyword "`default`" instead of a database name. To check which database is currently being used: `SELECT [current_database()]({{< ref "#current_database--" >}})` (as of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-4144)). +USE sets the current database for all subsequent HiveQL statements. To revert to the default database, use the keyword "`default`" instead of a database name. To check which database is currently being used: `SELECT [current_database()]({{% ref "#current_database--" %}})` (as of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-4144)). `USE database_name` was added in Hive 0.6 ([HIVE-675](https://issues.apache.org/jira/browse/HIVE-675)). @@ -152,21 +152,21 @@ The ALTER CONNECTOR ... 
SET OWNER changes the ownership of the connector object ## Create/Drop/Truncate Table -* [Create Table]({{< ref "#create-table" >}}) - + [Managed and External Tables]({{< ref "#managed-and-external-tables" >}}) - + [Storage Formats]({{< ref "#storage-formats" >}}) - + [Row Formats & SerDe]({{< ref "#row-formats--serde" >}}) - + [Partitioned Tables]({{< ref "#partitioned-tables" >}}) - + [External Tables]({{< ref "#external-tables" >}}) - + [Create Table As Select (CTAS)]({{< ref "#create-table-as-select-ctas" >}}) - + [Create Table Like]({{< ref "#create-table-like" >}}) - + [Bucketed Sorted Tables]({{< ref "#bucketed-sorted-tables" >}}) - + [Skewed Tables]({{< ref "#skewed-tables" >}}) - + [Temporary Tables]({{< ref "#temporary-tables" >}}) - + [Transactional Tables]({{< ref "#transactional-tables" >}}) - + [Constraints]({{< ref "#constraints" >}}) -* [Drop Table]({{< ref "#drop-table" >}}) -* [Truncate Table]({{< ref "#truncate-table" >}}) +* [Create Table]({{% ref "#create-table" %}}) + + [Managed and External Tables]({{% ref "#managed-and-external-tables" %}}) + + [Storage Formats]({{% ref "#storage-formats" %}}) + + [Row Formats & SerDe]({{% ref "#row-formats--serde" %}}) + + [Partitioned Tables]({{% ref "#partitioned-tables" %}}) + + [External Tables]({{% ref "#external-tables" %}}) + + [Create Table As Select (CTAS)]({{% ref "#create-table-as-select-ctas" %}}) + + [Create Table Like]({{% ref "#create-table-like" %}}) + + [Bucketed Sorted Tables]({{% ref "#bucketed-sorted-tables" %}}) + + [Skewed Tables]({{% ref "#skewed-tables" %}}) + + [Temporary Tables]({{% ref "#temporary-tables" %}}) + + [Transactional Tables]({{% ref "#transactional-tables" %}}) + + [Constraints]({{% ref "#constraints" %}}) +* [Drop Table]({{% ref "#drop-table" %}}) +* [Truncate Table]({{% ref "#truncate-table" %}}) ### Create Table @@ -264,67 +264,67 @@ CREATE TABLE creates a table with the given name. 
An error is thrown if a table * Table names and column names are case insensitive but SerDe and property names are case sensitive. + In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names. + In Hive 0.13 and later, column names can contain any [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) character (see [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)), however, dot (**.**) and colon (**:**) yield errors on querying, so they are disallowed in Hive 1.2.0 (see [HIVE-10120](https://issues.apache.org/jira/browse/HIVE-10120)). Any column name that is specified within backticks (```) is treated literally. Within a backtick string, use double backticks (````) to represent a backtick character. Backtick quotation also enables the use of reserved keywords for table and column identifiers. - + To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property `[hive.support.quoted.identifiers]({{< ref "#hive-support-quoted-identifiers" >}})` to `none`. In this configuration, backticked names are interpreted as regular expressions. For details, see [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html). + + To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property `[hive.support.quoted.identifiers]({{% ref "#hive-support-quoted-identifiers" %}})` to `none`. In this configuration, backticked names are interpreted as regular expressions. For details, see [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html). * Table and column comments are string literals (single-quoted). 
-* A table created without the [EXTERNAL clause]({{< ref "#external-clause" >}}) is called a *[managed table]({{< ref "#managed-table" >}})* because Hive manages its data. To find out if a table is managed or external, look for tableType in the output of [DESCRIBE EXTENDED table_name]({{< ref "#describe-extended-table_name" >}}). +* A table created without the [EXTERNAL clause]({{% ref "#external-clause" %}}) is called a *[managed table]({{% ref "#managed-table" %}})* because Hive manages its data. To find out if a table is managed or external, look for tableType in the output of [DESCRIBE EXTENDED table_name]({{% ref "#describe-extended-table_name" %}}). * The TBLPROPERTIES clause allows you to tag the table definition with your own metadata key/value pairs. Some predefined table properties also exist, such as last_modified_user and last_modified_time which are automatically added and managed by Hive. Other predefined table properties include: + TBLPROPERTIES ("comment"="*table_comment*") - + TBLPROPERTIES ("hbase.table.name"="*table_name*") – see [HBase Integration]({{< ref "#hbase-integration" >}}). - + TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ ([HIVE-6406](https://issues.apache.org/jira/browse/HIVE-6406)) – see [Inserting Data into Hive Tables from Queries]({{< ref "#inserting-data-into-hive-tables-from-queries" >}}). - + TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see [ORC Files]({{< ref "#orc-files" >}}). - + TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+, the default is "false" – see [Hive Transactions]({{< ref "#hive-transactions" >}}). - + TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"), the default is "false" – see [Hive Transactions]({{< ref "#hive-transactions" >}}). 
- + TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="*mapper_memory"*) – see [Hive Transactions]({{< ref "#hive-transactions" >}}). - + TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.num.threshold"="*threshold_num*") – see [Hive Transactions]({{< ref "#hive-transactions" >}}). - + TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.pct.threshold"="*threshold_pct*") – see [Hive Transactions]({{< ref "#hive-transactions" >}}). - + TBLPROPERTIES ("auto.purge"="true") or ("auto.purge"="false") in release 1.2.0+ ([HIVE-9118](https://issues.apache.org/jira/browse/HIVE-9118)) – see [Drop Table]({{< ref "#drop-table" >}}), [Drop Partitions]({{< ref "#drop-partitions" >}}), [Truncate Table]({{< ref "#truncate-table" >}}), and [Insert Overwrite]({{< ref "#insert-overwrite" >}}). + + TBLPROPERTIES ("hbase.table.name"="*table_name*") – see [HBase Integration]({{% ref "#hbase-integration" %}}). + + TBLPROPERTIES ("immutable"="true") or ("immutable"="false") in release 0.13.0+ ([HIVE-6406](https://issues.apache.org/jira/browse/HIVE-6406)) – see [Inserting Data into Hive Tables from Queries]({{% ref "#inserting-data-into-hive-tables-from-queries" %}}). + + TBLPROPERTIES ("orc.compress"="ZLIB") or ("orc.compress"="SNAPPY") or ("orc.compress"="NONE") and other ORC properties – see [ORC Files]({{% ref "#orc-files" %}}). + + TBLPROPERTIES ("transactional"="true") or ("transactional"="false") in release 0.14.0+, the default is "false" – see [Hive Transactions]({{% ref "#hive-transactions" %}}). + + TBLPROPERTIES ("NO_AUTO_COMPACTION"="true") or ("NO_AUTO_COMPACTION"="false"), the default is "false" – see [Hive Transactions]({{% ref "#hive-transactions" %}}). + + TBLPROPERTIES ("compactor.mapreduce.map.memory.mb"="*mapper_memory"*) – see [Hive Transactions]({{% ref "#hive-transactions" %}}). + + TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.num.threshold"="*threshold_num*") – see [Hive Transactions]({{% ref "#hive-transactions" %}}). 
+ + TBLPROPERTIES ("compactorthreshold.hive.compactor.delta.pct.threshold"="*threshold_pct*") – see [Hive Transactions]({{% ref "#hive-transactions" %}}). + + TBLPROPERTIES ("auto.purge"="true") or ("auto.purge"="false") in release 1.2.0+ ([HIVE-9118](https://issues.apache.org/jira/browse/HIVE-9118)) – see [Drop Table]({{% ref "#drop-table" %}}), [Drop Partitions]({{% ref "#drop-partitions" %}}), [Truncate Table]({{% ref "#truncate-table" %}}), and [Insert Overwrite]({{% ref "#insert-overwrite" %}}). + TBLPROPERTIES ("EXTERNAL"="TRUE") in release 0.6.0+ ([HIVE-1329](https://issues.apache.org/jira/browse/HIVE-1329)) – Change a managed table to an external table and vice versa for "FALSE". - As of Hive 2.4.0 ([HIVE-16324](https://issues.apache.org/jira/browse/HIVE-16324)) the value of the property 'EXTERNAL' is parsed as a boolean (case insensitive true or false) instead of a case sensitive string comparison. + TBLPROPERTIES ("external.table.purge"="true") in release 4.0.0+ ([HIVE-19981](https://issues.apache.org/jira/browse/HIVE-19981)) when set on external table would delete the data as well. -* To specify a database for the table, either issue the [USE database_name]({{< ref "#use-database_name" >}}) statement prior to the CREATE TABLE statement (in [Hive 0.6](https://issues.apache.org/jira/browse/HIVE-675) and later) or qualify the table name with a database name ("`database_name.table.name`" in [Hive 0.7](https://issues.apache.org/jira/browse/HIVE-1517) and later). +* To specify a database for the table, either issue the [USE database_name]({{% ref "#use-database_name" %}}) statement prior to the CREATE TABLE statement (in [Hive 0.6](https://issues.apache.org/jira/browse/HIVE-675) and later) or qualify the table name with a database name ("`database_name.table.name`" in [Hive 0.7](https://issues.apache.org/jira/browse/HIVE-1517) and later). The keyword "`default`" can be used for the default database. 
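
The table properties listed above can be sketched in HiveQL as follows. This is a minimal illustration, not taken from the manual: the table name `page_view_orc` and its columns are hypothetical, and it assumes a Hive release recent enough for each property used (for example 1.2.0+ for `auto.purge`).

```sql
-- Hypothetical table and columns, for illustration only.
-- Tags the table with two of the predefined properties listed above.
CREATE TABLE page_view_orc (viewTime INT, userid BIGINT)
COMMENT 'Illustrative table'
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY", "auto.purge"="true");

-- Change the managed table to an external table ("FALSE" changes it back).
ALTER TABLE page_view_orc SET TBLPROPERTIES ("EXTERNAL"="TRUE");

-- Look for tableType in the output to see whether the table is managed or external.
DESCRIBE EXTENDED page_view_orc;
```

Because SerDe and property names are case sensitive, the property keys should be written exactly as they appear in the list.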
See [Alter Table](/docs/latest/language/languagemanual-ddl#alter-table) below for more information about table comments, table properties, and SerDe properties. -See [Type System]({{< ref "#type-system" >}}) and [Hive Data Types]({{< ref "languagemanual-types" >}}) for details about the primitive and complex data types. +See [Type System]({{% ref "#type-system" %}}) and [Hive Data Types]({{% ref "languagemanual-types" %}}) for details about the primitive and complex data types. #### Managed and External Tables -By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. For details on the differences between managed and external table see [Managed vs. External Tables]({{< ref "managed-vs--external-tables" >}}). +By default Hive creates managed tables, where files, metadata and statistics are managed by internal Hive processes. For details on the differences between managed and external table see [Managed vs. External Tables]({{% ref "managed-vs--external-tables" %}}). #### Storage Formats -Hive supports built-in and custom-developed file formats. See [CompressedStorage]({{< ref "compressedstorage" >}}) for details on compressed table storage. +Hive supports built-in and custom-developed file formats. See [CompressedStorage]({{% ref "compressedstorage" %}}) for details on compressed table storage. The following are some of the formats built-in to Hive:   | Storage Format | Description | | --- | --- | -| STORED AS TEXTFILE | Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter [hive.default.fileformat]({{< ref "#hive-default-fileformat" >}}) has a different setting.Use the DELIMITED clause to read delimited files.Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\') Escaping is needed if you want to work with data that can contain these delimiter characters. 
A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').(Hive 4.0) All BINARY columns in the table are assumed to be base64 encoded.  To read the data as raw bytes:TBLPROPERTIES ("hive.serialization.decode.binary.as.base64"="false") | +| STORED AS TEXTFILE | Stored as plain text files. TEXTFILE is the default file format, unless the configuration parameter [hive.default.fileformat]({{% ref "#hive-default-fileformat" %}}) has a different setting.Use the DELIMITED clause to read delimited files.Enable escaping for the delimiter characters by using the 'ESCAPED BY' clause (such as ESCAPED BY '\') Escaping is needed if you want to work with data that can contain these delimiter characters. A custom NULL format can also be specified using the 'NULL DEFINED AS' clause (default is '\N').(Hive 4.0) All BINARY columns in the table are assumed to be base64 encoded.  To read the data as raw bytes:TBLPROPERTIES ("hive.serialization.decode.binary.as.base64"="false") | | STORED AS SEQUENCEFILE | Stored as compressed Sequence File. | -| STORED AS ORC | Stored as [ORC file format]({{< ref "#orc-file-format" >}}). Supports ACID Transactions & Cost-based Optimizer (CBO). Stores column-level metadata. | -| STORED AS PARQUET | Stored as Parquet format for the [Parquet]({{< ref "parquet" >}}) columnar storage format in [Hive 0.13.0 and later]({{< ref "#hive-0-13-0-and-later" >}}); Use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax ... in [Hive 0.10, 0.11, or 0.12]({{< ref "#hive-0-10,-0-11,-or-0-12" >}}). | -| STORED AS AVRO | Stored as Avro format in [Hive 0.14.0 and later](https://issues.apache.org/jira/browse/HIVE-6806) (see [Avro SerDe]({{< ref "avroserde" >}})). | +| STORED AS ORC | Stored as [ORC file format]({{% ref "#orc-file-format" %}}). Supports ACID Transactions & Cost-based Optimizer (CBO). Stores column-level metadata. 
| +| STORED AS PARQUET | Stored as Parquet format for the [Parquet]({{% ref "parquet" %}}) columnar storage format in [Hive 0.13.0 and later]({{% ref "#hive-0-13-0-and-later" %}}); Use ROW FORMAT SERDE ... STORED AS INPUTFORMAT ... OUTPUTFORMAT syntax ... in [Hive 0.10, 0.11, or 0.12]({{% ref "#hive-0-10,-0-11,-or-0-12" %}}). | +| STORED AS AVRO | Stored as Avro format in [Hive 0.14.0 and later](https://issues.apache.org/jira/browse/HIVE-6806) (see [Avro SerDe]({{% ref "avroserde" %}})). | | STORED AS RCFILE | Stored as [Record Columnar File](https://en.wikipedia.org/wiki/RCFile) format. | | STORED AS JSONFILE | Stored as Json file format in Hive 4.0.0 and later. | -| STORED BY | Stored by a non-native table format. To create or link to a non-native table, for example a table backed by [HBase]({{< ref "hbaseintegration" >}}) or [Druid]({{< ref "druid-integration" >}}) or [Accumulo]({{< ref "accumulointegration" >}}). See [StorageHandlers]({{< ref "storagehandlers" >}}) for more information on this option. | -| INPUTFORMAT and OUTPUTFORMAT | in the file_format to specify the name of a corresponding InputFormat and OutputFormat class as a string literal.For example, 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "[org.apache.hadoop.hive.ql.io](http://org.apache.hadoop.hive.ql.io/).HiveIgnoreKeyTextOutputFormat"' (see [LZO Compression]({{< ref "languagemanual-lzo" >}})). | +| STORED BY | Stored by a non-native table format. To create or link to a non-native table, for example a table backed by [HBase]({{% ref "hbaseintegration" %}}) or [Druid]({{% ref "druid-integration" %}}) or [Accumulo]({{% ref "accumulointegration" %}}). See [StorageHandlers]({{% ref "storagehandlers" %}}) for more information on this option. 
| +| INPUTFORMAT and OUTPUTFORMAT | in the file_format to specify the name of a corresponding InputFormat and OutputFormat class as a string literal.For example, 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'. For LZO compression, the values to use are 'INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat" OUTPUTFORMAT "[org.apache.hadoop.hive.ql.io](http://org.apache.hadoop.hive.ql.io/).HiveIgnoreKeyTextOutputFormat"' (see [LZO Compression]({{% ref "languagemanual-lzo" %}})). | #### Row Formats & SerDe You can create tables with a custom SerDe or using a native SerDe. A native SerDe is used if ROW FORMAT is not specified or ROW FORMAT DELIMITED is specified. Use the SERDE clause to create a table with a custom SerDe. For more information on SerDes see: -* [Hive SerDe]({{< ref "#hive-serde" >}}) -* [SerDe]({{< ref "serde" >}}) -* [HCatalog Storage Formats]({{< ref "hcatalog-storageformats" >}}) +* [Hive SerDe]({{% ref "#hive-serde" %}}) +* [SerDe]({{% ref "serde" %}}) +* [HCatalog Storage Formats]({{% ref "hcatalog-storageformats" %}}) -You must specify a list of columns for tables that use a native SerDe. Refer to the [Types]({{< ref "languagemanual-types" >}}) part of the User Guide for the allowable column types. +You must specify a list of columns for tables that use a native SerDe. Refer to the [Types]({{% ref "languagemanual-types" %}}) part of the User Guide for the allowable column types. A list of columns for tables that use a custom SerDe may be specified but Hive will query the SerDe to determine the actual list of columns for this table. For general information about SerDes, see [Hive SerDe](/community/resources/developerguide#hive-serde) in the Developer Guide. Also see [SerDe](/docs/latest/user/serde) for details about input and output processing. -To change a table's SerDe or SERDEPROPERTIES, use the ALTER TABLE statement as described below in [Add SerDe Properties]({{< ref "#add-serde-properties" >}}). 
+To change a table's SerDe or SERDEPROPERTIES, use the ALTER TABLE statement as described below in [Add SerDe Properties]({{% ref "#add-serde-properties" %}}). | Row Format | Description | | --- | --- | @@ -397,7 +397,7 @@ STORED AS SEQUENCEFILE; The above statement lets you create the same table as the previous table. -In the previous examples the data is stored in \/page_view. Specify a value for the key `[hive.metastore.warehouse.dir]({{< ref "#hive-metastore-warehouse-dir" >}})` in the Hive config file hive-site.xml. +In the previous examples the data is stored in \/page_view. Specify a value for the key `[hive.metastore.warehouse.dir]({{% ref "#hive-metastore-warehouse-dir" %}})` in the Hive config file hive-site.xml. #### External Tables @@ -408,7 +408,7 @@ Closed  ) setting table property external.table.purge=true, will also delete the data. -An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property `[hive.metastore.warehouse.dir]({{< ref "#hive-metastore-warehouse-dir" >}})`. +An EXTERNAL table points to any HDFS location for its storage, rather than being stored in a folder specified by the configuration property `[hive.metastore.warehouse.dir]({{% ref "#hive-metastore-warehouse-dir" %}})`. **Example:** @@ -426,13 +426,13 @@ CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT, You can use the above statement to create a page_view table which points to any HDFS location for its storage. But you still have to make sure that the data is delimited as specified in the CREATE statement above. -For another example of creating an external table, see [Loading Data]({{< ref "#loading-data" >}}) in the Tutorial. +For another example of creating an external table, see [Loading Data]({{% ref "#loading-data" %}}) in the Tutorial. #### Create Table As Select (CTAS) Tables can also be created and populated by the results of a query in one create-table-as-select (CTAS) statement. 
The table created by CTAS is atomic, meaning that the table is not seen by other users until all the query results are populated. So other users will either see the table with the complete results of the query or will not see the table at all. -There are two parts in CTAS, the SELECT part can be any [SELECT statement]({{< ref "languagemanual-select" >}}) supported by HiveQL. The CREATE part of the CTAS takes the resulting schema from the SELECT part and creates the target table with other table properties such as the SerDe and storage format. +There are two parts in CTAS, the SELECT part can be any [SELECT statement]({{% ref "languagemanual-select" %}}) supported by HiveQL. The CREATE part of the CTAS takes the resulting schema from the SELECT part and creates the target table with other table properties such as the SerDe and storage format. Starting with Hive 3.2.0, CTAS statements can define a partitioning specification for the target table ([HIVE-20241](https://issues.apache.org/jira/browse/HIVE-20241)). @@ -456,7 +456,7 @@ SORT BY new_key, key_value_pair; The above CTAS statement creates the target table new_key_value_store with the schema (new_key DOUBLE, key_value_pair STRING) derived from the results of the SELECT statement. If the SELECT statement does not specify column aliases, the column names will be automatically assigned to _col0, _col1, and _col2 etc. In addition, the new target table is created using a specific SerDe and a storage format independent of the source tables in the SELECT statement. -Starting with [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-1180), the SELECT statement can include one or more common table expressions (CTEs), as shown in the [SELECT syntax]({{< ref "#select-syntax" >}}). For an example, see [Common Table Expression]({{< ref "#common-table-expression" >}}). 
+Starting with [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-1180), the SELECT statement can include one or more common table expressions (CTEs), as shown in the [SELECT syntax]({{% ref "#select-syntax" %}}). For an example, see [Common Table Expression]({{% ref "#common-table-expression" %}}). Being able to select data from one table to another is one of the most powerful features of Hive. Hive handles the conversion of the data from the source format to the destination format as the query is being executed. @@ -495,7 +495,7 @@ In the example above, the page_view table is bucketed (clustered by) userid and The CLUSTERED BY and SORTED BY creation commands do not affect how data is inserted into a table – only how it is read. This means that users must be careful to insert data correctly by specifying the number of reducers to be equal to the number of buckets, and using CLUSTER BY and SORT BY commands in their query. -There is also an example of [creating and populating bucketed tables]({{< ref "languagemanual-ddl-bucketedtables" >}}). +There is also an example of [creating and populating bucketed tables]({{% ref "languagemanual-ddl-bucketedtables" %}}). #### Skewed Tables @@ -505,9 +505,9 @@ As of Hive 0.10.0 ([HIVE-3072](https://issues.apache.org/jira/browse/HIVE-3072) Design documents -Read the [Skewed Join Optimization]({{< ref "skewed-join-optimization" >}}) and [List Bucketing]({{< ref "listbucketing" >}}) design documents for more information. +Read the [Skewed Join Optimization]({{% ref "skewed-join-optimization" %}}) and [List Bucketing]({{% ref "listbucketing" %}}) design documents for more information. -This feature can be used to improve performance for tables where one or more columns have [skewed]({{< ref "skewed-join-optimization" >}}) values. 
By specifying the values that appear very often (heavy skew) Hive will split those out into separate files (or directories in case of [list bucketing]({{< ref "listbucketing" >}})) automatically and take this fact into account during queries so that it can skip or include the whole file (or directory in case of [list bucketing]({{< ref "listbucketing" >}})) if possible. +This feature can be used to improve performance for tables where one or more columns have [skewed]({{% ref "skewed-join-optimization" %}}) values. By specifying the values that appear very often (heavy skew) Hive will split those out into separate files (or directories in case of [list bucketing]({{% ref "listbucketing" %}})) automatically and take this fact into account during queries so that it can skip or include the whole file (or directory in case of [list bucketing]({{% ref "listbucketing" %}})) if possible. This can be specified on a per-table level during table creation. @@ -531,7 +531,7 @@ CREATE TABLE list_bucket_multiple (col1 STRING, col2 int, col3 STRING) ``` -For corresponding ALTER TABLE statements, see [Alter Table Skewed or Stored as Directories]({{< ref "#alter-table-skewed-or-stored-as-directories" >}}) below. +For corresponding ALTER TABLE statements, see [Alter Table Skewed or Stored as Directories]({{% ref "#alter-table-skewed-or-stored-as-directories" %}}) below. #### Temporary Tables @@ -548,7 +548,7 @@ Temporary tables have the following limitations: * Partition columns are not supported. * No support for creation of indexes. -Starting in [Hive 1.1.0](https://issues.apache.org/jira/browse/HIVE-7313) the storage policy for temporary tables can be set to `memory`, `ssd`, or `default` with the [hive.exec.temporary.table.storage]({{< ref "#hive-exec-temporary-table-storage" >}}) configuration parameter (see [HDFS Storage Types and Storage Policies](http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types_and_Storage_Policies)). 
+Starting in [Hive 1.1.0](https://issues.apache.org/jira/browse/HIVE-7313) the storage policy for temporary tables can be set to `memory`, `ssd`, or `default` with the [hive.exec.temporary.table.storage]({{% ref "#hive-exec-temporary-table-storage" %}}) configuration parameter (see [HDFS Storage Types and Storage Policies](http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types_and_Storage_Policies)). **Example:** @@ -636,11 +636,11 @@ Version information: PURGE The PURGE option is added in version 0.14.0 by [HIVE-7100](https://issues.apache.org/jira/browse/HIVE-7100). -If PURGE is specified, the table data does not go to the .Trash/Current directory and so cannot be retrieved in the event of a mistaken DROP. The purge option can also be specified with the table property auto.purge (see [TBLPROPERTIES]({{< ref "#tblproperties" >}}) above). +If PURGE is specified, the table data does not go to the .Trash/Current directory and so cannot be retrieved in the event of a mistaken DROP. The purge option can also be specified with the table property auto.purge (see [TBLPROPERTIES]({{% ref "#tblproperties" %}}) above). -In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +In Hive 0.7.0 or later, DROP returns an error if the table doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. -See the Alter Partition section below for how to [drop partitions]({{< ref "#drop-partitions" >}}). +See the Alter Partition section below for how to [drop partitions]({{% ref "#drop-partitions" %}}). ### Truncate Table @@ -658,49 +658,49 @@ partition_spec: Removes all rows from a table or partition(s). 
The rows will be trashed if the filesystem Trash is enabled, otherwise they are deleted (as of Hive 2.2.0 with [HIVE-14626](https://issues.apache.org/jira/browse/HIVE-14626)). Currently the target table should be native/managed table or an exception will be thrown. User can specify partial partition_spec for truncating multiple partitions at once and omitting partition_spec will truncate all partitions in the table. -Starting with HIVE 2.3.0 ([HIVE-15880](https://issues.apache.org/jira/browse/HIVE-15880)) if the table property "auto.purge" (see [TBLPROPERTIES]({{< ref "#tblproperties" >}}) above) is set to "true" the data of the table is not moved to Trash when a TRUNCATE TABLE command is issued against it and cannot be retrieved in the event of a mistaken TRUNCATE. This is applicable only for managed tables (see [managed tables]({{< ref "#managed-tables" >}})). This behavior can be turned off if the "auto.purge" property is unset or set to false for a managed table. +Starting with HIVE 2.3.0 ([HIVE-15880](https://issues.apache.org/jira/browse/HIVE-15880)) if the table property "auto.purge" (see [TBLPROPERTIES]({{% ref "#tblproperties" %}}) above) is set to "true" the data of the table is not moved to Trash when a TRUNCATE TABLE command is issued against it and cannot be retrieved in the event of a mistaken TRUNCATE. This is applicable only for managed tables (see [managed tables]({{% ref "#managed-tables" %}})). This behavior can be turned off if the "auto.purge" property is unset or set to false for a managed table. Starting with Hive 4.0 ([HIVE-23183](https://issues.apache.org/jira/browse/HIVE-23183)) the TABLE token is optional, previous versions required it. 
## Alter Table/Partition/Column -* [Alter Table]({{< ref "#alter-table" >}}) - + [Rename Table]({{< ref "#rename-table" >}}) - + [Alter Table Properties]({{< ref "#alter-table-properties" >}}) - - [Alter Table Comment]({{< ref "#alter-table-comment" >}}) - + [Add SerDe Properties]({{< ref "#add-serde-properties" >}}) - + [Remove SerDe Properties]({{< ref "#remove-serde-properties" >}}) - + [Alter Table Storage Properties]({{< ref "#alter-table-storage-properties" >}}) - + [Alter Table Skewed or Stored as Directories]({{< ref "#alter-table-skewed-or-stored-as-directories" >}}) - - [Alter Table Skewed]({{< ref "#alter-table-skewed" >}}) - - [Alter Table Not Skewed]({{< ref "#alter-table-not-skewed" >}}) - - [Alter Table Not Stored as Directories]({{< ref "#alter-table-not-stored-as-directories" >}}) - - [Alter Table Set Skewed Location]({{< ref "#alter-table-set-skewed-location" >}}) - + [Alter Table Constraints]({{< ref "#alter-table-constraints" >}}) - + [Additional Alter Table Statements]({{< ref "#additional-alter-table-statements" >}}) -* [Alter Partition]({{< ref "#alter-partition" >}}) - + [Add Partitions]({{< ref "#add-partitions" >}}) - - [Dynamic Partitions]({{< ref "#dynamic-partitions" >}}) - + [Rename Partition]({{< ref "#rename-partition" >}}) - + [Exchange Partition]({{< ref "#exchange-partition" >}}) - + [Discover Partitions]({{< ref "#discover-partitions" >}}) - + [Partition Retention]({{< ref "#partition-retention" >}}) - + [Recover Partitions (MSCK REPAIR TABLE)]({{< ref "#recover-partitions-msck-repair-table" >}}) - + [Drop Partitions]({{< ref "#drop-partitions" >}}) - + [(Un)Archive Partition]({{< ref "#unarchive-partition" >}}) -* [Alter Either Table or Partition]({{< ref "#alter-either-table-or-partition" >}}) - + [Alter Table/Partition File Format]({{< ref "#alter-tablepartition-file-format" >}}) - + [Alter Table/Partition Location]({{< ref "#alter-tablepartition-location" >}}) - + [Alter Table/Partition Touch]({{< ref 
"#alter-tablepartition-touch" >}}) - + [Alter Table/Partition Protections]({{< ref "#alter-tablepartition-protections" >}}) - + [Alter Table/Partition Compact]({{< ref "#alter-tablepartition-compact" >}}) - + [Alter Table/Partition Concatenate]({{< ref "#alter-tablepartition-concatenate" >}}) - + [Alter Table/Partition Update columns]({{< ref "#alter-tablepartition-update-columns" >}}) -* [Alter Column]({{< ref "#alter-column" >}}) - + [Rules for Column Names]({{< ref "#rules-for-column-names" >}}) - + [Change Column Name/Type/Position/Comment]({{< ref "#change-column-nametypepositioncomment" >}}) - + [Add/Replace Columns]({{< ref "#addreplace-columns" >}}) - + [Partial Partition Specification]({{< ref "#partial-partition-specification" >}}) +* [Alter Table]({{% ref "#alter-table" %}}) + + [Rename Table]({{% ref "#rename-table" %}}) + + [Alter Table Properties]({{% ref "#alter-table-properties" %}}) + - [Alter Table Comment]({{% ref "#alter-table-comment" %}}) + + [Add SerDe Properties]({{% ref "#add-serde-properties" %}}) + + [Remove SerDe Properties]({{% ref "#remove-serde-properties" %}}) + + [Alter Table Storage Properties]({{% ref "#alter-table-storage-properties" %}}) + + [Alter Table Skewed or Stored as Directories]({{% ref "#alter-table-skewed-or-stored-as-directories" %}}) + - [Alter Table Skewed]({{% ref "#alter-table-skewed" %}}) + - [Alter Table Not Skewed]({{% ref "#alter-table-not-skewed" %}}) + - [Alter Table Not Stored as Directories]({{% ref "#alter-table-not-stored-as-directories" %}}) + - [Alter Table Set Skewed Location]({{% ref "#alter-table-set-skewed-location" %}}) + + [Alter Table Constraints]({{% ref "#alter-table-constraints" %}}) + + [Additional Alter Table Statements]({{% ref "#additional-alter-table-statements" %}}) +* [Alter Partition]({{% ref "#alter-partition" %}}) + + [Add Partitions]({{% ref "#add-partitions" %}}) + - [Dynamic Partitions]({{% ref "#dynamic-partitions" %}}) + + [Rename Partition]({{% ref "#rename-partition" %}}) + + 
[Exchange Partition]({{% ref "#exchange-partition" %}}) + + [Discover Partitions]({{% ref "#discover-partitions" %}}) + + [Partition Retention]({{% ref "#partition-retention" %}}) + + [Recover Partitions (MSCK REPAIR TABLE)]({{% ref "#recover-partitions-msck-repair-table" %}}) + + [Drop Partitions]({{% ref "#drop-partitions" %}}) + + [(Un)Archive Partition]({{% ref "#unarchive-partition" %}}) +* [Alter Either Table or Partition]({{% ref "#alter-either-table-or-partition" %}}) + + [Alter Table/Partition File Format]({{% ref "#alter-tablepartition-file-format" %}}) + + [Alter Table/Partition Location]({{% ref "#alter-tablepartition-location" %}}) + + [Alter Table/Partition Touch]({{% ref "#alter-tablepartition-touch" %}}) + + [Alter Table/Partition Protections]({{% ref "#alter-tablepartition-protections" %}}) + + [Alter Table/Partition Compact]({{% ref "#alter-tablepartition-compact" %}}) + + [Alter Table/Partition Concatenate]({{% ref "#alter-tablepartition-concatenate" %}}) + + [Alter Table/Partition Update columns]({{% ref "#alter-tablepartition-update-columns" %}}) +* [Alter Column]({{% ref "#alter-column" %}}) + + [Rules for Column Names]({{% ref "#rules-for-column-names" %}}) + + [Change Column Name/Type/Position/Comment]({{% ref "#change-column-nametypepositioncomment" %}}) + + [Add/Replace Columns]({{% ref "#addreplace-columns" %}}) + + [Partial Partition Specification]({{% ref "#partial-partition-specification" %}}) Alter table statements enable you to change the structure of an existing table. You can add columns/partitions, change SerDe, add table and SerDe properties, or rename the table itself. Similarly, alter table partition statements allow you change the properties of a specific partition in the named table. @@ -715,7 +715,7 @@ ALTER TABLE table_name RENAME TO new_table_name; This statement lets you change the name of a table to a different name. -As of version 0.6, a rename on a [managed table]({{< ref "#managed-table" >}}) moves its HDFS location. 
Rename has been changed as of version 2.2.0 ([HIVE-14909](https://issues.apache.org/jira/browse/HIVE-14909)) so that a managed table's HDFS location is moved only if the table is created without a [LOCATION clause]({{< ref "#location-clause" >}}) and under its database directory. Hive versions prior to 0.6 just renamed the table in the metastore without moving the HDFS location. +As of version 0.6, a rename on a [managed table]({{% ref "#managed-table" %}}) moves its HDFS location. Rename has been changed as of version 2.2.0 ([HIVE-14909](https://issues.apache.org/jira/browse/HIVE-14909)) so that a managed table's HDFS location is moved only if the table is created without a [LOCATION clause]({{% ref "#location-clause" %}}) and under its database directory. Hive versions prior to 0.6 just renamed the table in the metastore without moving the HDFS location. #### Alter Table Properties @@ -729,7 +729,7 @@ table_properties: You can use this statement to add your own metadata to the tables. Currently last_modified_user, last_modified_time properties are automatically added and managed by Hive. Users can add their own properties to this list. You can do DESCRIBE EXTENDED TABLE to get this information. -For more information, see the [TBLPROPERTIES clause]({{< ref "#tblproperties-clause" >}}) in Create Table above. +For more information, see the [TBLPROPERTIES clause]({{% ref "#tblproperties-clause" %}}) in Create Table above. ##### Alter Table Comment @@ -754,7 +754,7 @@ serde_properties: These statements enable you to change a table's SerDe or add user-defined metadata to the table's SerDe object. -The SerDe properties are passed to the table's SerDe when it is being initialized by Hive to serialize and deserialize data. So users can store any information required for their custom SerDe here. 
Refer to the [SerDe documentation]({{< ref "serde" >}}) and [Hive SerDe]({{< ref "#hive-serde" >}}) in the Developer Guide for more information, and see [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}) above for details about setting a table's SerDe and SERDEPROPERTIES in a CREATE TABLE statement. +The SerDe properties are passed to the table's SerDe when it is being initialized by Hive to serialize and deserialize data. So users can store any information required for their custom SerDe here. Refer to the [SerDe documentation]({{% ref "serde" %}}) and [Hive SerDe]({{% ref "#hive-serde" %}}) in the Developer Guide for more information, and see [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}) above for details about setting a table's SerDe and SERDEPROPERTIES in a CREATE TABLE statement. Note that both `property_name` and `property_value` must be quoted. @@ -805,7 +805,7 @@ Version information As of Hive 0.10.0 ([HIVE-3072](https://issues.apache.org/jira/browse/HIVE-3072) and [HIVE-3649](https://issues.apache.org/jira/browse/HIVE-3649)). See [HIVE-3026](https://issues.apache.org/jira/browse/HIVE-3026) for additional JIRA tickets that implemented list bucketing in Hive 0.10.0 and 0.11.0. -A table's SKEWED and STORED AS DIRECTORIES options can be changed with ALTER TABLE statements. See [Skewed Tables]({{< ref "#skewed-tables" >}}) above for the corresponding CREATE TABLE syntax. +A table's SKEWED and STORED AS DIRECTORIES options can be changed with ALTER TABLE statements. See [Skewed Tables]({{% ref "#skewed-tables" %}}) above for the corresponding CREATE TABLE syntax. ##### Alter Table Skewed @@ -815,7 +815,7 @@ ALTER TABLE table_name SKEWED BY (col_name1, col_name2, ...) 
[STORED AS DIRECTORIES]; ``` -The STORED AS DIRECTORIES option determines whether a [skewed]({{< ref "skewed-join-optimization" >}}) table uses the [list bucketing]({{< ref "listbucketing" >}}) feature, which creates subdirectories for skewed values. +The STORED AS DIRECTORIES option determines whether a [skewed]({{% ref "skewed-join-optimization" %}}) table uses the [list bucketing]({{% ref "listbucketing" %}}) feature, which creates subdirectories for skewed values. ##### Alter Table Not Skewed @@ -862,15 +862,15 @@ ALTER TABLE table_name DROP CONSTRAINT constraint_name; #### Additional Alter Table Statements -See [Alter Either Table or Partition]({{< ref "#alter-either-table-or-partition" >}}) below for more DDL statements that alter tables. +See [Alter Either Table or Partition]({{% ref "#alter-either-table-or-partition" %}}) below for more DDL statements that alter tables. ### Alter Partition -Partitions can be added, renamed, exchanged (moved), dropped, or (un)archived by using the PARTITION clause in an ALTER TABLE statement, as described below. To make the metastore aware of partitions that were added directly to HDFS, you can use the metastore check command ([MSCK]({{< ref "#msck" >}})) or on Amazon EMR you can use the RECOVER PARTITIONS option of ALTER TABLE. See [Alter Either Table or Partition]({{< ref "#alter-either-table-or-partition" >}}) below for more ways to alter partitions. +Partitions can be added, renamed, exchanged (moved), dropped, or (un)archived by using the PARTITION clause in an ALTER TABLE statement, as described below. To make the metastore aware of partitions that were added directly to HDFS, you can use the metastore check command ([MSCK]({{% ref "#msck" %}})) or on Amazon EMR you can use the RECOVER PARTITIONS option of ALTER TABLE. See [Alter Either Table or Partition]({{% ref "#alter-either-table-or-partition" %}}) below for more ways to alter partitions. 
Version 1.2+ -As of Hive 1.2 ([HIVE-10307](https://issues.apache.org/jira/browse/HIVE-10307)), the partition values specified in partition specification are type checked, converted, and normalized to conform to their column types if the property [hive.typecheck.on.insert]({{< ref "#hive-typecheck-on-insert" >}}) is set to true (default). The values can be number literals. +As of Hive 1.2 ([HIVE-10307](https://issues.apache.org/jira/browse/HIVE-10307)), the partition values specified in partition specification are type checked, converted, and normalized to conform to their column types if the property [hive.typecheck.on.insert]({{% ref "#hive-typecheck-on-insert" %}}) is set to true (default). The values can be number literals. #### Add Partitions @@ -913,12 +913,12 @@ ALTER TABLE table_name ADD PARTITION (partCol = 'valueN') location 'locN'; Partitions can be added to a table dynamically, using a Hive INSERT statement (or a Pig STORE statement). See these documents for details and examples: -* [Design Document for Dynamic Partitions]({{< ref "dynamicpartitions" >}}) -* [Tutorial: Dynamic-Partition Insert]({{< ref "#tutorial:-dynamic-partition-insert" >}}) -* [Hive DML: Dynamic Partition Inserts]({{< ref "#hive-dml:-dynamic-partition-inserts" >}}) -* [HCatalog Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) - + [Usage with Pig]({{< ref "#usage-with-pig" >}}) - + [Usage from MapReduce]({{< ref "#usage-from-mapreduce" >}}) +* [Design Document for Dynamic Partitions]({{% ref "dynamicpartitions" %}}) +* [Tutorial: Dynamic-Partition Insert]({{% ref "#tutorial:-dynamic-partition-insert" %}}) +* [Hive DML: Dynamic Partition Inserts]({{% ref "#hive-dml:-dynamic-partition-inserts" %}}) +* [HCatalog Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) + + [Usage with Pig]({{% ref "#usage-with-pig" %}}) + + [Usage from MapReduce]({{% ref "#usage-from-mapreduce" %}}) #### Rename Partition @@ -931,7 +931,7 @@ ALTER TABLE table_name PARTITION 
partition_spec RENAME TO PARTITION partition_sp ``` -This statement lets you change the value of a partition column. One of use cases is that you can use this statement to normalize your legacy partition column value to conform to its type. In this case, the type conversion and normalization are not enabled for the column values in old *partition_spec* even with property [hive.typecheck.on.insert]({{< ref "#hive-typecheck-on-insert" >}}) set to true (default) which allows you to specify any legacy data in form of string in the old *partition_spec*. +This statement lets you change the value of a partition column. One of use cases is that you can use this statement to normalize your legacy partition column value to conform to its type. In this case, the type conversion and normalization are not enabled for the column values in old *partition_spec* even with property [hive.typecheck.on.insert]({{% ref "#hive-typecheck-on-insert" %}}) set to true (default) which allows you to specify any legacy data in form of string in the old *partition_spec*. #### Exchange Partition @@ -949,7 +949,7 @@ ALTER TABLE table_name_2 EXCHANGE PARTITION (partition_spec, partition_spec2, .. ``` This statement lets you move the data in a partition from a table to another table that has the same schema and does not already have that partition. -For further details on this feature, see [Exchange Partition]({{< ref "exchange-partition" >}}) and [HIVE-4095](https://issues.apache.org/jira/browse/HIVE-4095). +For further details on this feature, see [Exchange Partition]({{% ref "exchange-partition" %}}) and [HIVE-4095](https://issues.apache.org/jira/browse/HIVE-4095). #### Discover Partitions @@ -999,13 +999,13 @@ ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec[, PARTITION par ``` -You can use ALTER TABLE DROP PARTITION to drop a partition for a table. This removes the data and metadata for this partition. 
The data is actually moved to the .Trash/Current directory if Trash is configured, unless PURGE is specified, but the metadata is completely lost (see [Drop Table]({{< ref "#drop-table" >}}) above). +You can use ALTER TABLE DROP PARTITION to drop a partition for a table. This removes the data and metadata for this partition. The data is actually moved to the .Trash/Current directory if Trash is configured, unless PURGE is specified, but the metadata is completely lost (see [Drop Table]({{% ref "#drop-table" %}}) above). Version Information: PROTECTION -IGNORE PROTECTION is no longer available in versions 2.0.0 and later. This functionality is replaced by using one of the several security options available with Hive (see [SQL Standard Based Hive Authorization]({{< ref "sql-standard-based-hive-authorization" >}})). See [HIVE-11145](https://issues.apache.org/jira/browse/HIVE-11145) for details. +IGNORE PROTECTION is no longer available in versions 2.0.0 and later. This functionality is replaced by using one of the several security options available with Hive (see [SQL Standard Based Hive Authorization]({{% ref "sql-standard-based-hive-authorization" %}})). See [HIVE-11145](https://issues.apache.org/jira/browse/HIVE-11145) for details. 
-For tables that are protected by [NO_DROP CASCADE]({{< ref "#no_drop-cascade" >}}), you can use the predicate IGNORE PROTECTION to drop a specified partition or set of partitions (for example, when splitting a table between two Hadoop clusters): +For tables that are protected by [NO_DROP CASCADE]({{% ref "#no_drop-cascade" %}}), you can use the predicate IGNORE PROTECTION to drop a specified partition or set of partitions (for example, when splitting a table between two Hadoop clusters): ``` ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec IGNORE PROTECTION; @@ -1024,9 +1024,9 @@ If PURGE is specified, the partition data does not go to the .Trash/Current dire ALTER TABLE table_name DROP [IF EXISTS] PARTITION partition_spec PURGE; -- (Note: Hive 1.2.0 and later) ``` -The purge option can also be specified with the table property auto.purge (see [TBLPROPERTIES]({{< ref "#tblproperties" >}}) above). +The purge option can also be specified with the table property auto.purge (see [TBLPROPERTIES]({{% ref "#tblproperties" %}}) above). -In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +In Hive 0.7.0 or later, DROP returns an error if the partition doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. ``` ALTER TABLE page_view DROP PARTITION (dt='2008-08-08', country='us'); @@ -1041,7 +1041,7 @@ ALTER TABLE table_name UNARCHIVE PARTITION partition_spec; ``` -Archiving is a feature to moves a partition's files into a Hadoop Archive (HAR). Note that only the file count will be reduced; HAR does not provide any compression. 
See [LanguageManual Archiving]({{< ref "languagemanual-archiving" >}}) for more information +Archiving is a feature to moves a partition's files into a Hadoop Archive (HAR). Note that only the file count will be reduced; HAR does not provide any compression. See [LanguageManual Archiving]({{% ref "languagemanual-archiving" %}}) for more information ### Alter Either Table or Partition @@ -1052,7 +1052,7 @@ ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format; ``` -This statement changes the table's (or partition's) file format. For available file_format options, see the section above on [CREATE TABLE]({{< ref "#create-table" >}}). The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive. +This statement changes the table's (or partition's) file format. For available file_format options, see the section above on [CREATE TABLE]({{% ref "#create-table" %}}). The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive. #### Alter Table/Partition Location @@ -1072,7 +1072,7 @@ TOUCH reads the metadata, and writes it back. This has the effect of causing the Also, it may be useful later if we incorporate reliable last modified times. Then touch would update that time as well. -Note that TOUCH doesn't create a table or partition if it doesn't already exist. (See [Create Table]({{< ref "#create-table" >}}).) +Note that TOUCH doesn't create a table or partition if it doesn't already exist. (See [Create Table]({{% ref "#create-table" %}}).) #### Alter Table/Partition Protections @@ -1080,7 +1080,7 @@ Version information As of Hive 0.7.0 ([HIVE-1413](https://issues.apache.org/jira/browse/HIVE-1413)). The CASCADE clause for NO_DROP was added in HIVE 0.8.0 ([HIVE-2605](https://issues.apache.org/jira/browse/HIVE-2605)). -This functionality was removed in Hive 2.0.0. 
This functionality is replaced by using one of the several security options available with Hive (see [SQL Standard Based Hive Authorization]({{< ref "sql-standard-based-hive-authorization" >}})). See [HIVE-11145](https://issues.apache.org/jira/browse/HIVE-11145) for details. +This functionality was removed in Hive 2.0.0. This functionality is replaced by using one of the several security options available with Hive (see [SQL Standard Based Hive Authorization]({{% ref "sql-standard-based-hive-authorization" %}})). See [HIVE-11145](https://issues.apache.org/jira/browse/HIVE-11145) for details. ``` ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE NO_DROP [CASCADE]; @@ -1089,18 +1089,18 @@ ALTER TABLE table_name [PARTITION partition_spec] ENABLE|DISABLE OFFLINE; ``` -Protection on data can be set at either the table or partition level. Enabling NO_DROP prevents a table from being [dropped]({{< ref "#dropped" >}}). Enabling OFFLINE prevents the data in a table or partition from being queried, but the metadata can still be accessed. +Protection on data can be set at either the table or partition level. Enabling NO_DROP prevents a table from being [dropped]({{% ref "#dropped" %}}). Enabling OFFLINE prevents the data in a table or partition from being queried, but the metadata can still be accessed. -If any partition in a table has NO_DROP enabled, the table cannot be dropped either. Conversely, if a table has NO_DROP enabled then partitions may be dropped, but with NO_DROP CASCADE partitions cannot be dropped either unless the [drop partition command]({{< ref "#drop-partition-command" >}}) specifies IGNORE PROTECTION. +If any partition in a table has NO_DROP enabled, the table cannot be dropped either. Conversely, if a table has NO_DROP enabled then partitions may be dropped, but with NO_DROP CASCADE partitions cannot be dropped either unless the [drop partition command]({{% ref "#drop-partition-command" %}}) specifies IGNORE PROTECTION. 
#### Alter Table/Partition Compact Version information -In Hive release [0.13.0](https://issues.apache.org/jira/browse/HIVE-5317) and later when [transactions]({{< ref "hive-transactions" >}}) are being used, the ALTER TABLE statement can request [compaction]({{< ref "#compaction" >}}) of a table or partition. -As of Hive release [1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354) when [transactions]({{< ref "hive-transactions" >}}) are being used, the ALTER TABLE ... COMPACT statement can include a [TBLPROPERTIES](/docs/latest/user/hive-transactions#table-properties) clause that is either to change compaction MapReduce job properties or to overwrite any other Hive table properties. More details can be found [here](/docs/latest/user/hive-transactions#table-properties). -As of Hive release [4.0.0-alpha-2](https://issues.apache.org/jira/browse/HIVE-27056?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-2) [compaction pooling]({{< ref "compaction-pooling" >}}) is available. -As of Hive release [4.0.0](https://issues.apache.org/jira/browse/HIVE-27094?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0) [rebalance compaction]({{< ref "rebalance-compaction" >}}) is available. +In Hive release [0.13.0](https://issues.apache.org/jira/browse/HIVE-5317) and later when [transactions]({{% ref "hive-transactions" %}}) are being used, the ALTER TABLE statement can request [compaction]({{% ref "#compaction" %}}) of a table or partition. +As of Hive release [1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354) when [transactions]({{% ref "hive-transactions" %}}) are being used, the ALTER TABLE ... COMPACT statement can include a [TBLPROPERTIES](/docs/latest/user/hive-transactions#table-properties) clause that is either to change compaction MapReduce job properties or to overwrite any other Hive table properties. More details can be found [here](/docs/latest/user/hive-transactions#table-properties). 
+As of Hive release [4.0.0-alpha-2](https://issues.apache.org/jira/browse/HIVE-27056?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0-alpha-2) [compaction pooling]({{% ref "compaction-pooling" %}}) is available. +As of Hive release [4.0.0](https://issues.apache.org/jira/browse/HIVE-27094?jql=project%20%3D%20HIVE%20AND%20fixVersion%20%3D%204.0.0) [rebalance compaction]({{% ref "rebalance-compaction" %}}) is available. ``` ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])] @@ -1111,13 +1111,13 @@ ALTER TABLE table_name [PARTITION (partition_key = 'partition_value' [, ...])]   [WITH OVERWRITE TBLPROPERTIES ("property"="value" [, ...])]; ``` -In general you do not need to request compactions when [Hive transactions]({{< ref "hive-transactions" >}}) are being used, because the system will detect the need for them and initiate the compaction. However, if compaction is turned off for a table or you want to compact the table at a time the system would not choose to, ALTER TABLE can initiate the compaction. By default the statement will enqueue a request for compaction and return. To watch the progress of the compaction, use [SHOW COMPACTIONS]({{< ref "#show-compactions" >}}). As of Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15920) "AND WAIT" may be specified to have the operation block until compaction completes. +In general you do not need to request compactions when [Hive transactions]({{% ref "hive-transactions" %}}) are being used, because the system will detect the need for them and initiate the compaction. However, if compaction is turned off for a table or you want to compact the table at a time the system would not choose to, ALTER TABLE can initiate the compaction. By default the statement will enqueue a request for compaction and return. To watch the progress of the compaction, use [SHOW COMPACTIONS]({{% ref "#show-compactions" %}}). 
As of Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15920) "AND WAIT" may be specified to have the operation block until compaction completes. -The compaction_type can be MAJOR, MINOR or REBALANCE. See the Basic Design section in [Hive Transactions]({{< ref "#hive-transactions" >}}) for more information. +The compaction_type can be MAJOR, MINOR or REBALANCE. See the Basic Design section in [Hive Transactions]({{% ref "#hive-transactions" %}}) for more information. -More in formation on compaction pooling can be found here: [Compaction pooling]({{< ref "compaction-pooling" >}}) +More in formation on compaction pooling can be found here: [Compaction pooling]({{% ref "compaction-pooling" %}}) -More in formation on rebalance compaction pooling can be found here: [Rebalance Compaction]({{< ref "rebalance-compaction" >}}) +More in formation on rebalance compaction pooling can be found here: [Rebalance Compaction]({{% ref "rebalance-compaction" %}}) The [CLUSTERED INTO n BUCKETS] and [ORDER BY col_list] clauses are only supported for REBALANCE compaction. @@ -1159,7 +1159,7 @@ Version information In Hive release 0.12.0 and earlier, column names can only contain alphanumeric and underscore characters. -In Hive release 0.13.0 and later, by default column names can be specified within backticks (```) and contain any [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) character ([HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)), however, dot (**.**) and colon (**:**) yield errors on querying. Within a string delimited by backticks, all characters are treated literally except that double backticks (````) represent one backtick character. The pre-0.13.0 behavior can be used by setting `[hive.support.quoted.identifiers]({{< ref "#hive-support-quoted-identifiers" >}})` to `none`, in which case backticked names are interpreted as regular expressions. 
See [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) for details. +In Hive release 0.13.0 and later, by default column names can be specified within backticks (```) and contain any [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) character ([HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)), however, dot (**.**) and colon (**:**) yield errors on querying. Within a string delimited by backticks, all characters are treated literally except that double backticks (````) represent one backtick character. The pre-0.13.0 behavior can be used by setting `[hive.support.quoted.identifiers]({{% ref "#hive-support-quoted-identifiers" %}})` to `none`, in which case backticked names are interpreted as regular expressions. See [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) for details. Backtick quotation enables the use of reserved keywords for column names, as well as table names. @@ -1171,11 +1171,11 @@ ALTER TABLE table_name [PARTITION partition_spec] CHANGE [COLUMN] col_old_name c ``` -This command will allow users to change a column's name, [data type]({{< ref "languagemanual-types" >}}), comment, or position, or an arbitrary combination of them. The PARTITION clause is available in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7971) and later; see [Upgrading Pre-Hive 0.13.0 Decimal Columns]({{< ref "#upgrading-pre-hive-0-13-0-decimal-columns" >}}) for usage. A patch for Hive 0.13 is also available (see [HIVE-7971](https://issues.apache.org/jira/browse/HIVE-7971)). +This command will allow users to change a column's name, [data type]({{% ref "languagemanual-types" %}}), comment, or position, or an arbitrary combination of them. 
The PARTITION clause is available in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7971) and later; see [Upgrading Pre-Hive 0.13.0 Decimal Columns]({{% ref "#upgrading-pre-hive-0-13-0-decimal-columns" %}}) for usage. A patch for Hive 0.13 is also available (see [HIVE-7971](https://issues.apache.org/jira/browse/HIVE-7971)). The CASCADE|RESTRICT clause is available in [Hive 1.1.0](https://issues.apache.org/jira/browse/HIVE-8839). ALTER TABLE CHANGE COLUMN with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column change only to table metadata. -ALTER TABLE CHANGE COLUMN CASCADE clause will override the table partition's column metadata regardless of the table or partition's [protection mode]({{< ref "#protection-mode" >}}). Use with discretion. +ALTER TABLE CHANGE COLUMN CASCADE clause will override the table partition's column metadata regardless of the table or partition's [protection mode]({{% ref "#protection-mode" %}}). Use with discretion. The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition. @@ -1213,13 +1213,13 @@ ALTER TABLE table_name  ADD COLUMNS lets you add new columns to the end of the existing columns but before the partition columns. This is supported for Avro backed tables as well, for [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-7446) and later. -REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to [Hive SerDe]({{< ref "#hive-serde" >}}) for more information. REPLACE COLUMNS can also be used to drop columns. 
For example, "`ALTER TABLE test_change REPLACE COLUMNS (a int, b int);`" will remove column 'c' from test_change's schema. +REPLACE COLUMNS removes all existing columns and adds the new set of columns. This can be done only for tables with a native SerDe (DynamicSerDe, MetadataTypedColumnsetSerDe, LazySimpleSerDe and ColumnarSerDe). Refer to [Hive SerDe]({{% ref "#hive-serde" %}}) for more information. REPLACE COLUMNS can also be used to drop columns. For example, "`ALTER TABLE test_change REPLACE COLUMNS (a int, b int);`" will remove column 'c' from test_change's schema. -The PARTITION clause is available in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7971) and later; see [Upgrading Pre-Hive 0.13.0 Decimal Columns]({{< ref "#upgrading-pre-hive-0-13-0-decimal-columns" >}}) for usage. +The PARTITION clause is available in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7971) and later; see [Upgrading Pre-Hive 0.13.0 Decimal Columns]({{% ref "#upgrading-pre-hive-0-13-0-decimal-columns" %}}) for usage. The CASCADE|RESTRICT clause is available in [Hive 1.1.0](https://issues.apache.org/jira/browse/HIVE-8839). ALTER TABLE ADD|REPLACE COLUMNS with CASCADE command changes the columns of a table's metadata, and cascades the same change to all the partition metadata. RESTRICT is the default, limiting column changes only to table metadata. -ALTER TABLE ADD or REPLACE COLUMNS CASCADE will override the table partition's column metadata regardless of the table or partition's [protection mode]({{< ref "#protection-mode" >}}). Use with discretion.The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition. +ALTER TABLE ADD or REPLACE COLUMNS CASCADE will override the table partition's column metadata regardless of the table or partition's [protection mode]({{% ref "#protection-mode" %}}). 
Use with discretion.The column change command will only modify Hive's metadata, and will not modify data. Users should make sure the actual data layout of the table/partition conforms with the metadata definition. #### Partial Partition Specification @@ -1247,7 +1247,7 @@ ALTER TABLE foo PARTITION (ds, hr) CHANGE COLUMN dec_column_name dec_column_name -Similar to dynamic partitioning, [hive.exec.dynamic.partition]({{< ref "#hive-exec-dynamic-partition" >}}) must be set to true to enable use of partial partition specs during ALTER PARTITION. This is supported for the following operations: +Similar to dynamic partitioning, [hive.exec.dynamic.partition]({{% ref "#hive-exec-dynamic-partition" %}}) must be set to true to enable use of partial partition specs during ALTER PARTITION. This is supported for the following operations: * Change column * Add column @@ -1257,10 +1257,10 @@ Similar to dynamic partitioning, [hive.exec.dynamic.partition]({{< ref "#hive-ex ## Create/Drop/Alter View -* [Create View]({{< ref "#create-view" >}}) -* [Drop View]({{< ref "#drop-view" >}}) -* [Alter View Properties]({{< ref "#alter-view-properties" >}}) -* [Alter View As Select]({{< ref "#alter-view-as-select" >}}) +* [Create View]({{% ref "#create-view" %}}) +* [Drop View]({{% ref "#drop-view" %}}) +* [Alter View Properties]({{% ref "#alter-view-properties" %}}) +* [Alter View As Select]({{% ref "#alter-view-as-select" %}}) Version information @@ -1304,11 +1304,11 @@ CREATE VIEW onion_referrers(url COMMENT 'URL of Referring page') ``` -Use [SHOW CREATE TABLE]({{< ref "#show-create-table" >}}) to display the CREATE VIEW statement that created a view. As of Hive 2.2.0, [SHOW VIEWS]({{< ref "#show-views" >}}) displays a list of views in a database. +Use [SHOW CREATE TABLE]({{% ref "#show-create-table" %}}) to display the CREATE VIEW statement that created a view. As of Hive 2.2.0, [SHOW VIEWS]({{% ref "#show-views" %}}) displays a list of views in a database. 
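As a brief illustration of the two statements mentioned above, the following sketch inspects the `onion_referrers` view from the Create View example. It assumes that view exists in the current database, and that the Hive release is 2.2.0 or later for SHOW VIEWS.

```sql
-- Print the CREATE VIEW statement that defined the view.
-- Assumes the onion_referrers view from the example above has been created.
SHOW CREATE TABLE onion_referrers;

-- List the views in the current database (Hive 2.2.0 and later).
SHOW VIEWS;
```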
Version Information -Originally, the file format for views was hard coded as SequenceFile. Hive 2.1.0 ([HIVE-13736](https://issues.apache.org/jira/browse/HIVE-13736)) made views follow the same defaults as tables and indexes using the [hive.default.fileformat]({{< ref "#hive-default-fileformat" >}})and [hive.default.fileformat.managed]({{< ref "#hive-default-fileformat-managed" >}}) properties. +Originally, the file format for views was hard coded as SequenceFile. Hive 2.1.0 ([HIVE-13736](https://issues.apache.org/jira/browse/HIVE-13736)) made views follow the same defaults as tables and indexes using the [hive.default.fileformat]({{% ref "#hive-default-fileformat" %}})and [hive.default.fileformat.managed]({{% ref "#hive-default-fileformat-managed" %}}) properties. ### Drop View @@ -1321,7 +1321,7 @@ DROP VIEW removes metadata for the specified view. (It is illegal to use DROP TA When dropping a view referenced by other views, no warning is given (the dependent views are left dangling as invalid and must be dropped or recreated by the user). -In Hive 0.7.0 or later, DROP returns an error if the view doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +In Hive 0.7.0 or later, DROP returns an error if the view doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. 
**Example:** @@ -1359,15 +1359,15 @@ Note: The view must already exist, and if the view has partitions, it could not ## Create/Drop/Alter Materialized View -* [Create Materialized View]({{< ref "#create-materialized-view" >}}) -* [Drop Materialized View]({{< ref "#drop-materialized-view" >}}) -* [Alter Materialized View]({{< ref "#alter-materialized-view" >}}) +* [Create Materialized View]({{% ref "#create-materialized-view" %}}) +* [Drop Materialized View]({{% ref "#drop-materialized-view" %}}) +* [Alter Materialized View]({{% ref "#alter-materialized-view" %}}) Version information Materialized view support is only available in Hive 3.0 and later. -This section provides an introduction to Hive materialized views syntax. More information about materialized view support and usage in Hive can be found [here]({{< ref "materialized-views" >}}). +This section provides an introduction to Hive materialized views syntax. More information about materialized view support and usage in Hive can be found [here]({{% ref "materialized-views" %}}). ### Create Materialized View @@ -1430,12 +1430,12 @@ Version information As of Hive 0.7. -Indexing Is Removed since 3.0! See [Indexes design document]({{< ref "indexdev" >}}) +Indexing Is Removed since 3.0! See [Indexes design document]({{% ref "indexdev" %}}) This section provides a brief introduction to Hive indexes, which are documented more fully here: -* [Overview of Hive Indexes]({{< ref "languagemanual-indexing" >}}) -* [Indexes design document]({{< ref "indexdev" >}}) +* [Overview of Hive Indexes]({{% ref "languagemanual-indexing" %}}) +* [Indexes design document]({{% ref "indexdev" %}}) In Hive 0.12.0 and earlier releases, the index name is case-sensitive for CREATE INDEX and DROP INDEX statements. However, ALTER INDEX requires an index name that was created with lowercase letters (see [HIVE-2752](https://issues.apache.org/jira/browse/HIVE-2752)). 
This bug is fixed in [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-2752) by making index names case-insensitive for all HiveQL statements. For releases prior to 0.13.0, the best practice is to use lowercase letters for all index names. @@ -1458,7 +1458,7 @@ CREATE INDEX index_name ``` -CREATE INDEX creates an index on a table using the given list of columns as keys. See CREATE INDEX in the [Indexes]({{< ref "#indexes" >}}) design document. +CREATE INDEX creates an index on a table using the given list of columns as keys. See CREATE INDEX in the [Indexes]({{% ref "#indexes" %}}) design document. ### Drop Index @@ -1469,7 +1469,7 @@ DROP INDEX [IF EXISTS] index_name ON table_name; DROP INDEX drops the index, as well as deleting the index table. -In Hive 0.7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +In Hive 0.7.0 or later, DROP returns an error if the index doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. ### Alter Index @@ -1530,9 +1530,9 @@ CREATE TEMPORARY FUNCTION function_name AS class_name; ``` -This statement lets you create a function that is implemented by the class_name. You can use this function in Hive queries as long as the session lasts. You can use any class that is in the class path of Hive. You can add jars to class path by executing 'ADD JAR' statements. Please refer to the CLI section [Hive Interactive Shell Commands]({{< ref "#hive-interactive-shell-commands" >}}), including [Hive Resources]({{< ref "#hive-resources" >}}), for more information on how to add/delete files from the Hive classpath. Using this, you can register User Defined Functions (UDF's). +This statement lets you create a function that is implemented by the class_name. 
You can use this function in Hive queries as long as the session lasts. You can use any class that is in the class path of Hive. You can add jars to class path by executing 'ADD JAR' statements. Please refer to the CLI section [Hive Interactive Shell Commands]({{% ref "#hive-interactive-shell-commands" %}}), including [Hive Resources]({{% ref "#hive-resources" %}}), for more information on how to add/delete files from the Hive classpath. Using this, you can register User Defined Functions (UDF's). -Also see [Hive Plugins]({{< ref "hiveplugins" >}}) for general information about creating custom UDFs. +Also see [Hive Plugins]({{% ref "hiveplugins" %}}) for general information about creating custom UDFs. #### Drop Temporary Function @@ -1543,7 +1543,7 @@ DROP TEMPORARY FUNCTION [IF EXISTS] function_name; ``` -In Hive 0.7.0 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +In Hive 0.7.0 or later, DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. ### Permanent Functions @@ -1560,7 +1560,7 @@ CREATE FUNCTION [db_name.]function_name AS class_name [USING JAR|FILE|ARCHIVE 'file_uri' [, JAR|FILE|ARCHIVE 'file_uri'] ]; ``` -This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if [ADD JAR/FILE]({{< ref "#add-jar/file" >}}) had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location. 
+This statement lets you create a function that is implemented by the class_name. Jars, files, or archives which need to be added to the environment can be specified with the USING clause; when the function is referenced for the first time by a Hive session, these resources will be added to the environment as if [ADD JAR/FILE]({{% ref "#add-jar/file" %}}) had been issued. If Hive is not in local mode, then the resource location must be a non-local URI such as an HDFS location. The function will be added to the database specified, or to the current database at the time that the function was created. The function can be referenced by fully qualifying the function name (db_name.function_name), or can be referenced without qualification if the function is in the current database. @@ -1574,7 +1574,7 @@ As of Hive 0.13.0 ([HIVE-6047](https://issues.apache.org/jira/browse/HIVE-6047)) DROP FUNCTION [IF EXISTS] function_name; ``` -DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{< ref "#hive-exec-drop-ignorenonexistent" >}}) is set to true. +DROP returns an error if the function doesn't exist, unless IF EXISTS is specified or the configuration variable [hive.exec.drop.ignorenonexistent]({{% ref "#hive-exec-drop-ignorenonexistent" %}}) is set to true. 
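A minimal sketch of the permanent function lifecycle described above. The function name `to_upper`, the class `com.example.hive.udf.ToUpper`, and the jar location are hypothetical placeholders, not names taken from Hive itself; substitute a UDF class and an HDFS path that exist in your environment.

```sql
-- Register a permanent function in the current database.
-- The class name and jar URI are hypothetical; the jar must contain the class.
CREATE FUNCTION to_upper AS 'com.example.hive.udf.ToUpper'
  USING JAR 'hdfs:///path/to/udfs.jar';

-- The jar is added to the session the first time the function is referenced.
SELECT to_upper('hive');

-- IF EXISTS suppresses the error when the function is not present.
DROP FUNCTION IF EXISTS to_upper;
```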
#### Reload Function @@ -1590,54 +1590,54 @@ As of [HIVE-2573](https://issues.apache.org/jira/browse/HIVE-2573), creating per ## Create/Drop/Grant/Revoke Roles and Privileges -[Hive deprecated authorization mode / Legacy Mode]({{< ref "hive-deprecated-authorization-mode" >}}) has information about these DDL statements: +[Hive deprecated authorization mode / Legacy Mode]({{% ref "hive-deprecated-authorization-mode" %}}) has information about these DDL statements: -* [CREATE ROLE]({{< ref "#create-role" >}}) -* [GRANT ROLE]({{< ref "#grant-role" >}}) -* [REVOKE ROLE]({{< ref "#revoke-role" >}}) -* [GRANT privilege_type]({{< ref "#grant-privilege_type" >}}) -* [REVOKE privilege_type]({{< ref "#revoke-privilege_type" >}}) -* [DROP ROLE]({{< ref "#drop-role" >}}) -* [SHOW ROLE GRANT]({{< ref "#show-role-grant" >}}) -* [SHOW GRANT]({{< ref "#show-grant" >}}) +* [CREATE ROLE]({{% ref "#create-role" %}}) +* [GRANT ROLE]({{% ref "#grant-role" %}}) +* [REVOKE ROLE]({{% ref "#revoke-role" %}}) +* [GRANT privilege_type]({{% ref "#grant-privilege_type" %}}) +* [REVOKE privilege_type]({{% ref "#revoke-privilege_type" %}}) +* [DROP ROLE]({{% ref "#drop-role" %}}) +* [SHOW ROLE GRANT]({{% ref "#show-role-grant" %}}) +* [SHOW GRANT]({{% ref "#show-grant" %}}) -For [SQL standard based authorization]({{< ref "sql-standard-based-hive-authorization" >}}) in Hive 0.13.0 and later releases, see these DDL statements: +For [SQL standard based authorization]({{% ref "sql-standard-based-hive-authorization" %}}) in Hive 0.13.0 and later releases, see these DDL statements: * Role Management Commands - + [CREATE ROLE]({{< ref "#create-role" >}}) - + [GRANT ROLE]({{< ref "#grant-role" >}}) - + [REVOKE ROLE]({{< ref "#revoke-role" >}}) - + [DROP ROLE]({{< ref "#drop-role" >}}) - + [SHOW ROLES]({{< ref "#show-roles" >}}) - + [SHOW ROLE GRANT]({{< ref "#show-role-grant" >}}) - + [SHOW CURRENT ROLES]({{< ref "#show-current-roles" >}}) - + [SET ROLE]({{< ref "#set-role" >}}) - + [SHOW PRINCIPALS]({{< 
ref "#show-principals" >}}) + + [CREATE ROLE]({{% ref "#create-role" %}}) + + [GRANT ROLE]({{% ref "#grant-role" %}}) + + [REVOKE ROLE]({{% ref "#revoke-role" %}}) + + [DROP ROLE]({{% ref "#drop-role" %}}) + + [SHOW ROLES]({{% ref "#show-roles" %}}) + + [SHOW ROLE GRANT]({{% ref "#show-role-grant" %}}) + + [SHOW CURRENT ROLES]({{% ref "#show-current-roles" %}}) + + [SET ROLE]({{% ref "#set-role" %}}) + + [SHOW PRINCIPALS]({{% ref "#show-principals" %}}) * Object Privilege Commands - + [GRANT privilege_type]({{< ref "#grant-privilege_type" >}}) - + [REVOKE privilege_type]({{< ref "#revoke-privilege_type" >}}) - + [SHOW GRANT]({{< ref "#show-grant" >}}) + + [GRANT privilege_type]({{% ref "#grant-privilege_type" %}}) + + [REVOKE privilege_type]({{% ref "#revoke-privilege_type" %}}) + + [SHOW GRANT]({{% ref "#show-grant" %}}) ## Show -* [Show Databases]({{< ref "#show-databases" >}}) -* [Show Connectors]({{< ref "#show-connectors" >}}) -* [Show Tables/Views/Materialized Views/Partitions/Indexes]({{< ref "#show-tablesviewsmaterialized-viewspartitionsindexes" >}}) - + [Show Tables]({{< ref "#show-tables" >}}) - + [Show Views]({{< ref "#show-views" >}}) - + [Show Materialized Views]({{< ref "#show-materialized-views" >}}) - + [Show Partitions]({{< ref "#show-partitions" >}}) - + [Show Table/Partition Extended]({{< ref "#show-tablepartition-extended" >}}) - + [Show Table Properties]({{< ref "#show-table-properties" >}}) - + [Show Create Table]({{< ref "#show-create-table" >}}) - + [Show Indexes]({{< ref "#show-indexes" >}}) -* [Show Columns]({{< ref "#show-columns" >}}) -* [Show Functions]({{< ref "#show-functions" >}}) -* [Show Granted Roles and Privileges]({{< ref "#show-granted-roles-and-privileges" >}}) -* [Show Locks]({{< ref "#show-locks" >}}) -* [Show Conf]({{< ref "#show-conf" >}}) -* [Show Transactions]({{< ref "#show-transactions" >}}) -* [Show Compactions]({{< ref "#show-compactions" >}}) +* [Show Databases]({{% ref "#show-databases" %}}) +* [Show 
Connectors]({{% ref "#show-connectors" %}}) +* [Show Tables/Views/Materialized Views/Partitions/Indexes]({{% ref "#show-tablesviewsmaterialized-viewspartitionsindexes" %}}) + + [Show Tables]({{% ref "#show-tables" %}}) + + [Show Views]({{% ref "#show-views" %}}) + + [Show Materialized Views]({{% ref "#show-materialized-views" %}}) + + [Show Partitions]({{% ref "#show-partitions" %}}) + + [Show Table/Partition Extended]({{% ref "#show-tablepartition-extended" %}}) + + [Show Table Properties]({{% ref "#show-table-properties" %}}) + + [Show Create Table]({{% ref "#show-create-table" %}}) + + [Show Indexes]({{% ref "#show-indexes" %}}) +* [Show Columns]({{% ref "#show-columns" %}}) +* [Show Functions]({{% ref "#show-functions" %}}) +* [Show Granted Roles and Privileges]({{% ref "#show-granted-roles-and-privileges" %}}) +* [Show Locks]({{% ref "#show-locks" %}}) +* [Show Conf]({{% ref "#show-conf" %}}) +* [Show Transactions]({{% ref "#show-transactions" %}}) +* [Show Compactions]({{% ref "#show-compactions" %}}) These statements provide a way to query the Hive metastore for existing data and metadata accessible to this Hive system. @@ -1818,7 +1818,7 @@ SHOW TBLPROPERTIES tblname("foo"); The first form lists all of the table properties for the table in question, one per row separated by tabs. The second form of the command prints only the value for the property that's being asked for. -For more information, see the [TBLPROPERTIES clause]({{< ref "#tblproperties-clause" >}}) in Create Table above. +For more information, see the [TBLPROPERTIES clause]({{% ref "#tblproperties-clause" %}}) in Create Table above. #### Show Create Table @@ -1839,7 +1839,7 @@ Version information As of Hive 0.7. -Indexing Is Removed since 3.0! See [Indexes design document]({{< ref "indexdev" >}}) +Indexing Is Removed since 3.0! 
See [Indexes design document]({{% ref "indexdev" %}}) ``` SHOW [FORMATTED] (INDEX|INDEXES) ON table_with_index [(FROM|IN) db_name]; @@ -1909,18 +1909,18 @@ SHOW FUNCTIONS lists all the user defined and builtin functions, filtered by the ### Show Granted Roles and Privileges -[Hive deprecated authorization mode / Legacy Mode]({{< ref "hive-deprecated-authorization-mode" >}}) has information about these SHOW statements: +[Hive deprecated authorization mode / Legacy Mode]({{% ref "hive-deprecated-authorization-mode" %}}) has information about these SHOW statements: -* [SHOW ROLE GRANT]({{< ref "#show-role-grant" >}}) -* [SHOW GRANT]({{< ref "#show-grant" >}}) +* [SHOW ROLE GRANT]({{% ref "#show-role-grant" %}}) +* [SHOW GRANT]({{% ref "#show-grant" %}}) -In Hive 0.13.0 and later releases, [SQL standard based authorization]({{< ref "sql-standard-based-hive-authorization" >}}) has these SHOW statements: +In Hive 0.13.0 and later releases, [SQL standard based authorization]({{% ref "sql-standard-based-hive-authorization" %}}) has these SHOW statements: -* [SHOW ROLE GRANT]({{< ref "#show-role-grant" >}}) -* [SHOW GRANT]({{< ref "#show-grant" >}}) -* [SHOW CURRENT ROLES]({{< ref "#show-current-roles" >}}) -* [SHOW ROLES]({{< ref "#show-roles" >}}) -* [SHOW PRINCIPALS]({{< ref "#show-principals" >}}) +* [SHOW ROLE GRANT]({{% ref "#show-role-grant" %}}) +* [SHOW GRANT]({{% ref "#show-grant" %}}) +* [SHOW CURRENT ROLES]({{% ref "#show-current-roles" %}}) +* [SHOW ROLES]({{% ref "#show-roles" %}}) +* [SHOW PRINCIPALS]({{% ref "#show-principals" %}}) ### Show Locks @@ -1932,11 +1932,11 @@ SHOW LOCKS PARTITION () EXTENDED; SHOW LOCKS (DATABASE|SCHEMA) database_name; -- (Note: Hive 0.13.0 and later; SCHEMA added in Hive 0.14.0) ``` -SHOW LOCKS displays the locks on a table or partition. See [Hive Concurrency Model]({{< ref "locking" >}}) for information about locks. +SHOW LOCKS displays the locks on a table or partition. 
See [Hive Concurrency Model]({{% ref "locking" %}}) for information about locks. SHOW LOCKS (DATABASE|SCHEMA) is supported from Hive 0.13 for DATABASE (see [HIVE-2093](https://issues.apache.org/jira/browse/HIVE-2093)) and Hive 0.14 for SCHEMA (see [HIVE-6601](https://issues.apache.org/jira/browse/HIVE-6601)). SCHEMA and DATABASE are interchangeable – they mean the same thing. -When [Hive transactions]({{< ref "hive-transactions" >}}) are being used, SHOW LOCKS returns this information (see [HIVE-6460](https://issues.apache.org/jira/browse/HIVE-6460)): +When [Hive transactions]({{% ref "hive-transactions" %}}) are being used, SHOW LOCKS returns this information (see [HIVE-6460](https://issues.apache.org/jira/browse/HIVE-6460)): * database name * table name @@ -1967,25 +1967,25 @@ As of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-6037). SHOW CONF ; ``` -SHOW CONF returns a description of the specified [configuration property]({{< ref "configuration-properties" >}}). +SHOW CONF returns a description of the specified [configuration property]({{% ref "configuration-properties" %}}). * default value * required type * description -Note that SHOW CONF does not show the *current value* of a configuration property. For current property settings, use the "set" command in the CLI or a HiveQL script (see [Commands]({{< ref "languagemanual-commands" >}})) or in Beeline (see [Beeline Hive Commands]({{< ref "#beeline-hive-commands" >}})). +Note that SHOW CONF does not show the *current value* of a configuration property. For current property settings, use the "set" command in the CLI or a HiveQL script (see [Commands]({{% ref "languagemanual-commands" %}})) or in Beeline (see [Beeline Hive Commands]({{% ref "#beeline-hive-commands" %}})). ### Show Transactions Version information -As of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6460) (see [Hive Transactions]({{< ref "hive-transactions" >}})). 
+As of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6460) (see [Hive Transactions]({{% ref "hive-transactions" %}})). ``` SHOW TRANSACTIONS; ``` -SHOW TRANSACTIONS is for use by administrators when [Hive transactions]({{< ref "hive-transactions" >}}) are being used. It returns a list of all currently open and aborted transactions in the system, including this information: +SHOW TRANSACTIONS is for use by administrators when [Hive transactions]({{% ref "hive-transactions" %}}) are being used. It returns a list of all currently open and aborted transactions in the system, including this information: * transaction ID * transaction state @@ -1998,13 +1998,13 @@ SHOW TRANSACTIONS is for use by administrators when [Hive transactions]({{< ref Version information -As of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6460) (see [Hive Transactions]({{< ref "#hive-transactions" >}})). +As of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6460) (see [Hive Transactions]({{% ref "#hive-transactions" %}})). 
``` SHOW COMPACTIONS [DATABASE.][TABLE] [PARTITION ()] [POOL_NAME] [TYPE] [STATE] [ORDER BY `start` DESC] [LIMIT 10]; ``` -[SHOW COMPACTIONS](/docs/latest/user/hive-transactions#show-compactions) returns a list of all compaction requests currently being [processed]({{< ref "#processed" >}}) or scheduled, including this information: +[SHOW COMPACTIONS](/docs/latest/user/hive-transactions#show-compactions) returns a list of all compaction requests currently being [processed]({{% ref "#processed" %}}) or scheduled, including this information: * "CompactionId" - unique internal id (As of [Hive 3.0](https://issues.apache.org/jira/browse/HIVE-16084)) * "Database" - Hive database name @@ -2053,16 +2053,16 @@ SHOW COMPACTIONS db1.tbl0 PARTITION (p=101,day='Monday') POOL 'pool0' TYPE 'mino — show all compactions from specific database/table filtered based on pool name/type.state/status and ordered with given clause ``` -Compactions are initiated automatically, but can also be initiated manually with an [ALTER TABLE COMPACT statement]({{< ref "#alter-table-compact-statement" >}}). +Compactions are initiated automatically, but can also be initiated manually with an [ALTER TABLE COMPACT statement]({{% ref "#alter-table-compact-statement" %}}). 
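For example, a manual compaction can be requested and then monitored as sketched below. The table `web_logs` and its partition column `ds` are hypothetical, and the table is assumed to be transactional.

```sql
-- Enqueue a major compaction for one partition; the statement returns
-- once the request has been queued (hypothetical table and partition).
ALTER TABLE web_logs PARTITION (ds = '2024-01-01') COMPACT 'major';

-- Check the state of queued and running compaction requests.
SHOW COMPACTIONS;
```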
## Describe -* [Describe Database]({{< ref "#describe-database" >}}) -* [Describe Dataconnector]({{< ref "#describe-dataconnector" >}}) -* [Describe Table/View/Materialized View/Column]({{< ref "#describe-tableviewmaterialized-viewcolumn" >}}) - + [Display Column Statistics]({{< ref "#display-column-statistics" >}}) -* [Describe Partition]({{< ref "#describe-partition" >}}) -* [Hive 2.0+: Syntax Change]({{< ref "#hive-20-syntax-change" >}}) +* [Describe Database]({{% ref "#describe-database" %}}) +* [Describe Dataconnector]({{% ref "#describe-dataconnector" %}}) +* [Describe Table/View/Materialized View/Column]({{% ref "#describe-tableviewmaterialized-viewcolumn" %}}) + + [Display Column Statistics]({{% ref "#display-column-statistics" %}}) +* [Describe Partition]({{% ref "#describe-partition" %}}) +* [Hive 2.0+: Syntax Change]({{% ref "#hive-20-syntax-change" %}}) ### Describe Database @@ -2077,7 +2077,7 @@ DESCRIBE SCHEMA [EXTENDED] db_name; -- (Note: Hive 1.1.0 and later) DESCRIBE DATABASE shows the name of the database, its comment (if one has been set), and its root location on the filesystem. The uses of SCHEMA and DATABASE are interchangeable – they mean the same thing. DESCRIBE SCHEMA is added in Hive 1.1.0 ([HIVE-8803](https://issues.apache.org/jira/browse/HIVE-8803)). -EXTENDED also shows the [database properties]({{< ref "#database-properties" >}}). +EXTENDED also shows the [database properties]({{% ref "#database-properties" %}}). ### Describe Dataconnector @@ -2113,7 +2113,7 @@ DESCRIBE [EXTENDED|FORMATTED] DESCRIBE shows the list of columns including partition columns for the given table. If the EXTENDED keyword is specified then it will show all the metadata for the table in Thrift serialized form. This is generally only useful for debugging and not for general use. If the FORMATTED keyword is specified, then it will show the metadata in a tabular format. 
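The three forms can be compared side by side, as in this sketch. The table name `page_view` is a hypothetical placeholder for any existing table.

```sql
-- Columns only, including partition columns.
DESCRIBE page_view;

-- Adds the full table metadata in Thrift serialized form (mainly for debugging).
DESCRIBE EXTENDED page_view;

-- Shows the same metadata in a tabular layout.
DESCRIBE FORMATTED page_view;
```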
-Note: DESCRIBE EXTENDED shows the number of rows only if statistics were gathered when the data was loaded (see [Newly Created Tables]({{< ref "#newly-created-tables" >}})), and if the Hive CLI is used instead of a Thrift client or Beeline. [HIVE-6285](https://issues.apache.org/jira/browse/HIVE-6285) will address this issue. Although ANALYZE TABLE gathers statistics after the data has been loaded (see [Existing Tables]({{< ref "#existing-tables" >}})), it does not currently provide information about the number of rows. +Note: DESCRIBE EXTENDED shows the number of rows only if statistics were gathered when the data was loaded (see [Newly Created Tables]({{% ref "#newly-created-tables" %}})), and if the Hive CLI is used instead of a Thrift client or Beeline. [HIVE-6285](https://issues.apache.org/jira/browse/HIVE-6285) will address this issue. Although ANALYZE TABLE gathers statistics after the data has been loaded (see [Existing Tables]({{% ref "#existing-tables" %}})), it does not currently provide information about the number of rows. If a table has a complex column then you can examine the attributes of this column by specifying table_name.complex_col_name (and field_name for an element of a struct, '$elem$' for array element, '$key$' for map key, and '$value$' for map value). You can specify this recursively to explore the complex column type. @@ -2125,7 +2125,7 @@ Version information — partition & non-partition columns In Hive 0.10.0 and earlier, no distinction is made between partition columns and non-partition columns while displaying columns for DESCRIBE TABLE. From Hive 0.12.0 onwards, they are displayed separately. -In Hive 0.13.0 and later, the configuration parameter [hive.display.partition.cols.separately]({{< ref "#hive-display-partition-cols-separately" >}}) lets you use the old behavior, if desired ([HIVE-6689](https://issues.apache.org/jira/browse/HIVE-6689)). 
For an example, see the test case in the [patch for HIVE-6689](https://issues.apache.org/jira/secure/attachment/12635956/HIVE-6689.2.patch). +In Hive 0.13.0 and later, the configuration parameter [hive.display.partition.cols.separately]({{% ref "#hive-display-partition-cols-separately" %}}) lets you use the old behavior, if desired ([HIVE-6689](https://issues.apache.org/jira/browse/HIVE-6689)). For an example, see the test case in the [patch for HIVE-6689](https://issues.apache.org/jira/secure/attachment/12635956/HIVE-6689.2.patch). Bug fixed in Hive 0.10.0 — database qualifiers @@ -2133,7 +2133,7 @@ Database qualifiers for table names were introduced in Hive 0.7.0, but they were Bug fixed in Hive 0.13.0 — quoted identifiers -Prior to Hive 0.13.0 DESCRIBE did not accept backticks (`) surrounding table identifiers, so DESCRIBE could not be used for tables with names that matched reserved keywords ([HIVE-2949](https://issues.apache.org/jira/browse/HIVE-2949) and [HIVE-6187](https://issues.apache.org/jira/browse/HIVE-6187)). As of 0.13.0, all identifiers specified within backticks are treated literally when the configuration parameter [hive.support.quoted.identifiers]({{< ref "#hive-support-quoted-identifiers" >}}) has its default value of "`column`" ([HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). The only exception is that double backticks (``) represent a single backtick character. +Prior to Hive 0.13.0 DESCRIBE did not accept backticks (`) surrounding table identifiers, so DESCRIBE could not be used for tables with names that matched reserved keywords ([HIVE-2949](https://issues.apache.org/jira/browse/HIVE-2949) and [HIVE-6187](https://issues.apache.org/jira/browse/HIVE-6187)). 
As of 0.13.0, all identifiers specified within backticks are treated literally when the configuration parameter [hive.support.quoted.identifiers]({{% ref "#hive-support-quoted-identifiers" %}}) has its default value of "`column`" ([HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). The only exception is that double backticks (``) represent a single backtick character. #### Display Column Statistics @@ -2149,7 +2149,7 @@ DESCRIBE FORMATTED [db_name.]table_name column_name PARTITION (partition_spec); -- (see "Hive 2.0+: New Syntax" below) ``` -See [Statistics in Hive: Existing Tables]({{< ref "#statistics-in-hive:-existing-tables" >}}) for more information about the ANALYZE TABLE command. +See [Statistics in Hive: Existing Tables]({{% ref "#statistics-in-hive:-existing-tables" %}}) for more information about the ANALYZE TABLE command. ### Describe Partition @@ -2169,7 +2169,7 @@ DESCRIBE [EXTENDED|FORMATTED] [db_name.]table_name [column_name] PARTITION parti -- (Note: Hive 1.x.x and 0.x.x only. See "Hive 2.0+: New Syntax" below) ``` -This statement lists metadata for a given partition. The output is similar to that of DESCRIBE table_name. Presently, the column information associated with a particular partition is not used while preparing plans. As of Hive 1.2 ([HIVE-10307](https://issues.apache.org/jira/browse/HIVE-10307)), the partition column values specified in *partition_spec* are type validated, converted and normalized to their column types when [hive.typecheck.on.insert]({{< ref "#hive-typecheck-on-insert" >}}) is set to true (default). These values can be number literals. +This statement lists metadata for a given partition. The output is similar to that of DESCRIBE table_name. Presently, the column information associated with a particular partition is not used while preparing plans. 
As of Hive 1.2 ([HIVE-10307](https://issues.apache.org/jira/browse/HIVE-10307)), the partition column values specified in *partition_spec* are type validated, converted and normalized to their column types when [hive.typecheck.on.insert]({{% ref "#hive-typecheck-on-insert" %}}) is set to true (default). These values can be number literals. **Example:** @@ -2258,13 +2258,13 @@ DESCRIBE default.src_thrift lintString.$elem$.myint; ## Abort -* [Abort Transactions]({{< ref "#abort-transactions" >}}) +* [Abort Transactions]({{% ref "#abort-transactions" %}}) ### Abort Transactions Version information -As of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-12634) (see [Hive Transactions]({{< ref "#hive-transactions" >}})). +As of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-12634) (see [Hive Transactions]({{% ref "#hive-transactions" %}})). ``` ABORT TRANSACTIONS transactionID [ transactionID ...]; @@ -2279,20 +2279,20 @@ ABORT TRANSACTIONS cleans up the specified transaction IDs from the Hive metasto ABORT TRANSACTIONS 0000007 0000008 0000010 0000015; ``` -This command can be used together with [SHOW TRANSACTIONS]({{< ref "#show-transactions" >}}). The latter can help figure out the candidate transaction IDs to be cleaned up. +This command can be used together with [SHOW TRANSACTIONS]({{% ref "#show-transactions" %}}). The latter can help figure out the candidate transaction IDs to be cleaned up. # Scheduled queries -Documentation is available on the [Scheduled Queries]({{< ref "scheduled-queries" >}}) page. +Documentation is available on the [Scheduled Queries]({{% ref "scheduled-queries" %}}) page. 
# Datasketches integration -Documentation is available on the [Datasketches Integration]({{< ref "datasketches-integration" >}}) page +Documentation is available on the [Datasketches Integration]({{% ref "datasketches-integration" %}}) page # HCatalog and WebHCat DDL For information about DDL in HCatalog and WebHCat, see: -* [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}) in the [HCatalog manual]({{< ref "hcatalog-base" >}}) -* [WebHCat DDL Resources]({{< ref "webhcat-reference-allddl" >}}) in the [WebHCat manual]({{< ref "webhcat-base" >}}) +* [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}) in the [HCatalog manual]({{% ref "hcatalog-base" %}}) +* [WebHCat DDL Resources]({{% ref "webhcat-reference-allddl" %}}) in the [WebHCat manual]({{% ref "webhcat-base" %}}) diff --git a/content/docs/latest/language/languagemanual-dml.md b/content/docs/latest/language/languagemanual-dml.md index 42dba68c..85f403c2 100644 --- a/content/docs/latest/language/languagemanual-dml.md +++ b/content/docs/latest/language/languagemanual-dml.md @@ -69,7 +69,7 @@ The uncompressed data should look like this: * If the keyword LOCAL is not given, *filepath* must refer to files within the same filesystem as the table's (or partition's) location. * Hive does some minimal checks to make sure that the files being loaded match the target table. Currently it checks that if the table is stored in sequencefile format, the files being loaded are also sequencefiles, and vice versa. * A bug that prevented loading a file when its name includes the "+" character is fixed in release 0.13.0 ([HIVE-6048](https://issues.apache.org/jira/browse/HIVE-6048)). -* Please read [CompressedStorage]({{< ref "compressedstorage" >}}) if your datafile is compressed. +* Please read [CompressedStorage]({{% ref "compressedstorage" %}}) if your datafile is compressed. ### Inserting data into Hive Tables from queries @@ -101,16 +101,16 @@ INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) 
sel * INSERT OVERWRITE will overwrite any existing data in the table or partition + unless `IF NOT EXISTS` is provided for a partition (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2612)). - + As of Hive 2.3.0 ([HIVE-15880](https://issues.apache.org/jira/browse/HIVE-15880)), if the table has `TBLPROPERTIES ("auto.purge"="true")` the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see [managed tables]({{< ref "#managed-tables" >}})) and is turned off when "auto.purge" property is unset or set to false. + + As of Hive 2.3.0 ([HIVE-15880](https://issues.apache.org/jira/browse/HIVE-15880)), if the table has `TBLPROPERTIES ("auto.purge"="true")` the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see [managed tables]({{% ref "#managed-tables" %}})) and is turned off when "auto.purge" property is unset or set to false. * INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8.) + As of Hive [0.13.0](https://issues.apache.org/jira/browse/HIVE-6406), a table can be made ***immutable*** by creating it with `TBLPROPERTIES ("immutable"="true")`. The default is "immutable"="false". INSERT INTO behavior into an immutable table is disallowed if any data is already present, although INSERT INTO still works if the immutable table is empty. The behavior of INSERT OVERWRITE is not affected by the "immutable" table property. An immutable table is protected against accidental updates due to a script loading data into it being run multiple times by mistake. 
The first insert into an immutable table succeeds and successive inserts fail, resulting in only one set of data in the table, instead of silently succeeding with multiple copies of the data in the table. -* Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns. If [hive.typecheck.on.insert]({{< ref "#hive-typecheck-on-insert" >}}) is set to true, these values are validated, converted and normalized to conform to their column types (Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-5297) onward). +* Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns. If [hive.typecheck.on.insert]({{% ref "#hive-typecheck-on-insert" %}}) is set to true, these values are validated, converted and normalized to conform to their column types (Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-5297) onward). * Multiple insert clauses (also known as *Multi Table Insert*) can be specified in the same query. * The output of each of the select statements is written to the chosen table (or partition). Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table or partition are replaced with the output of corresponding select statement. * The output format and serialization class is determined by the table's metadata (as specified via DDL commands on the table). -* As of [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317), if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a [transaction]({{< ref "hive-transactions" >}}) manager that implements ACID, then INSERT OVERWRITE will be disabled for that table.  This is to avoid users unintentionally overwriting transaction history.  
The same functionality can be achieved by using [TRUNCATE TABLE]({{< ref "#truncate-table" >}}) (for non-partitioned tables) or [DROP PARTITION]({{< ref "#drop-partition" >}}) followed by INSERT INTO. +* As of [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317), if a table has an OutputFormat that implements AcidOutputFormat and the system is configured to use a [transaction]({{% ref "hive-transactions" %}}) manager that implements ACID, then INSERT OVERWRITE will be disabled for that table.  This is to avoid users unintentionally overwriting transaction history.  The same functionality can be achieved by using [TRUNCATE TABLE]({{% ref "#truncate-table" %}}) (for non-partitioned tables) or [DROP PARTITION]({{% ref "#drop-partition" %}}) followed by INSERT INTO. * As of Hive [1.1.0](https://issues.apache.org/jira/browse/HIVE-9353) the TABLE keyword is optional. * As of Hive [1.2.0](https://issues.apache.org/jira/browse/HIVE-9481) each INSERT INTO T can take a column list like INSERT INTO T (z, x, c1).  See Description of [HIVE-9481](https://issues.apache.org/jira/browse/HIVE-9481) for examples. * As of Hive [3.1.0](https://issues.apache.org/jira/browse/HIVE-19908) INSERT OVERWRITE from a source with UNION ALL on full CRUD ACID tables is not allowed. 
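The "immutable" behavior described in the notes above can be sketched as follows. The table names `page_views_daily` and `page_views_staging` and their columns are hypothetical and exist only for this illustration.

```sql
-- Hypothetical target table, created as immutable.
CREATE TABLE page_views_daily (view_date STRING, views BIGINT)
TBLPROPERTIES ("immutable"="true");

-- Allowed: the immutable table is still empty.
INSERT INTO TABLE page_views_daily
SELECT view_date, views FROM page_views_staging;

-- Expected to fail: data is already present, so a repeated
-- INSERT INTO (for example, a script run twice) is rejected.
INSERT INTO TABLE page_views_daily
SELECT view_date, views FROM page_views_staging;

-- Still allowed: INSERT OVERWRITE is not affected by "immutable".
INSERT OVERWRITE TABLE page_views_daily
SELECT view_date, views FROM page_views_staging;
```

The second statement is the case the property is meant to catch: without it, the table would silently end up with two copies of the data.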
@@ -152,13 +152,13 @@ Here the `country` partition will be dynamically created by the last column from ###### Additional Documentation -* [Design Document]({{< ref "dynamicpartitions" >}}) +* [Design Document]({{% ref "dynamicpartitions" %}}) + [Original design doc](https://issues.apache.org/jira/secure/attachment/12437909/dp_design.txt) + [HIVE-936](https://issues.apache.org/jira/browse/HIVE-936) -* [Tutorial: Dynamic-Partition Insert]({{< ref "#tutorial:-dynamic-partition-insert" >}}) -* [HCatalog Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}}) - + [Usage with Pig]({{< ref "#usage-with-pig" >}}) - + [Usage from MapReduce]({{< ref "#usage-from-mapreduce" >}}) +* [Tutorial: Dynamic-Partition Insert]({{% ref "#tutorial:-dynamic-partition-insert" %}}) +* [HCatalog Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}}) + + [Usage with Pig]({{% ref "#usage-with-pig" %}}) + + [Usage from MapReduce]({{% ref "#usage-from-mapreduce" %}}) ### Writing data into the filesystem from queries @@ -196,7 +196,7 @@ row_format * INSERT OVERWRITE statements to HDFS filesystem directories are the best way to extract large amounts of data from Hive. Hive can write to HDFS directories in parallel from within a map-reduce job. * The directory is, as you would expect, OVERWRITten; in other words, if the specified path exists, it is clobbered and replaced with the output. * As of Hive [0.11.0](https://issues.apache.org/jira/browse/HIVE-3682) the separator used can be specified; in earlier versions it was always the ^A character (\001). However, custom separators are only supported for LOCAL writes in Hive versions 0.11.0 to 1.1.0 – this bug is fixed in version 1.2.0 (see [HIVE-5672](https://issues.apache.org/jira/browse/HIVE-5672)). -* In [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317), inserts into [ACID]({{< ref "hive-transactions" >}}) compliant tables will deactivate vectorization for the duration of the select and insert.  
This will be done automatically.  ACID tables that have data inserted into them can still be queried using vectorization. +* In [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317), inserts into [ACID]({{% ref "hive-transactions" %}}) compliant tables will deactivate vectorization for the duration of the select and insert.  This will be done automatically.  ACID tables that have data inserted into them can still be queried using vectorization. ### Inserting values into tables from SQL @@ -222,7 +222,7 @@ where a value is either null or any valid SQL literal * Each row listed in the VALUES clause is inserted into table *tablename*. * Values must be provided for every column in the table. The standard SQL syntax that allows the user to insert values into only some columns is not yet supported. To mimic the standard SQL, nulls can be provided for columns the user does not wish to assign a value to. * Dynamic partitioning is supported in the same way as for [INSERT...SELECT](/docs/latest/language/languagemanual-dml#dynamic-partition-inserts). -* If the table being inserted into supports [ACID]({{< ref "hive-transactions" >}}) and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion. +* If the table being inserted into supports [ACID]({{% ref "hive-transactions" %}}) and a transaction manager that supports ACID is in use, this operation will be auto-committed upon successful completion. * Hive does not support literals for complex types (array, map, struct, union), so it is not possible to use them in INSERT INTO...VALUES clauses. This means that the user cannot insert data into a complex datatype column using the INSERT INTO...VALUES clause. ##### Examples @@ -253,7 +253,7 @@ Version Information UPDATE is available starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317). -Updates can only be performed on tables that support ACID. 
See [Hive Transactions]({{< ref "hive-transactions" >}}) for details. +Updates can only be performed on tables that support ACID. See [Hive Transactions]({{% ref "hive-transactions" %}}) for details. ##### Syntax @@ -274,7 +274,7 @@ UPDATE tablename SET column = value [, column = value ...] [WHERE expression] ##### Notes * Vectorization will be turned off for update operations.  This is automatic and requires no action on the part of the user.  Non-update operations are not affected.  Updated tables can still be queried using vectorization. -* In version 0.14 it is recommended that you set [hive.optimize.sort.dynamic.partition]({{< ref "#hive-optimize-sort-dynamic-partition" >}})=false when doing updates, as this produces more efficient execution plans. +* In version 0.14 it is recommended that you set [hive.optimize.sort.dynamic.partition]({{% ref "#hive-optimize-sort-dynamic-partition" %}})=false when doing updates, as this produces more efficient execution plans. ### Delete @@ -282,7 +282,7 @@ Version Information DELETE is available starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317). -Deletes can only be performed on tables that support ACID. See [Hive Transactions]({{< ref "hive-transactions" >}}) for details. +Deletes can only be performed on tables that support ACID. See [Hive Transactions]({{% ref "hive-transactions" %}}) for details. ##### Syntax @@ -299,7 +299,7 @@ DELETE FROM tablename [WHERE expression] ##### Notes * Vectorization will be turned off for delete operations.  This is automatic and requires no action on the part of the user.  Non-delete operations are not affected.  Tables with deleted data can still be queried using vectorization. -* In version 0.14 it is recommended that you set [hive.optimize.sort.dynamic.partition]({{< ref "#hive-optimize-sort-dynamic-partition" >}})=false when doing deletes, as this produces more efficient execution plans. 
+* In version 0.14 it is recommended that you set [hive.optimize.sort.dynamic.partition]({{% ref "#hive-optimize-sort-dynamic-partition" %}})=false when doing deletes, as this produces more efficient execution plans. ### Merge @@ -307,7 +307,7 @@ Version Information MERGE is available starting in [Hive 2.2](https://issues.apache.org/jira/browse/HIVE-10924). -Merge can only be performed on tables that support ACID. See [Hive Transactions]({{< ref "hive-transactions" >}}) for details. +Merge can only be performed on tables that support ACID. See [Hive Transactions]({{% ref "hive-transactions" %}}) for details. ##### Syntax @@ -327,7 +327,7 @@ WHEN NOT MATCHED [AND ] THEN INSERT VALUES ##### Performance Note -SQL Standard requires that an error is raised if the ON clause is such that more than 1 row in source matches a row in target.  This check is computationally expensive and may affect the overall runtime of a MERGE statement significantly.  [hive.merge.cardinality.check]({{< ref "#hive-merge-cardinality-check" >}})=false may be used to disable the check at your own risk.  If the check is disabled, but the statement has such a cross join effect, it may lead to data corruption. +SQL Standard requires that an error is raised if the ON clause is such that more than 1 row in source matches a row in target.  This check is computationally expensive and may affect the overall runtime of a MERGE statement significantly.  [hive.merge.cardinality.check]({{% ref "#hive-merge-cardinality-check" %}})=false may be used to disable the check at your own risk.  If the check is disabled, but the statement has such a cross join effect, it may lead to data corruption. 
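Before disabling the cardinality check, it may be worth confirming that the source cannot match a target row more than once. The sketch below assumes a hypothetical ACID table `customer(id, name)`, a hypothetical staging table `customer_updates(id, name)`, and an ON clause that compares `id` only; none of these names come from the Hive documentation.

```sql
-- Find source keys that occur more than once. With an ON clause of
-- t.id = s.id, each such key would give one target row several matches.
SELECT id, COUNT(*) AS cnt
FROM customer_updates
GROUP BY id
HAVING COUNT(*) > 1;

-- Only if the query above returns no rows: skip the expensive check.
SET hive.merge.cardinality.check=false;

MERGE INTO customer AS t USING customer_updates AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET name = s.name
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.name);
```

The pre-check is itself an aggregation over the source, so this trade is most useful when the source is much smaller than the target or when its uniqueness is already guaranteed upstream.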
##### Notes diff --git a/content/docs/latest/language/languagemanual-explain.md b/content/docs/latest/language/languagemanual-explain.md index 4ffca5b6..09cb6781 100644 --- a/content/docs/latest/language/languagemanual-explain.md +++ b/content/docs/latest/language/languagemanual-explain.md @@ -475,13 +475,13 @@ For the below tablescan; the estimation was 500 rows; but actually the scan only ### User-level Explain Output -Since [HIVE-8600](https://issues.apache.org/jira/browse/HIVE-8600) in Hive 1.1.0, we support a user-level explain extended output for any query at the log4j INFO level after **set [**h**ive.log.explain.output]({{< ref "#**h**ive-log-explain-output" >}})****=true** (default is **false**). +Since [HIVE-8600](https://issues.apache.org/jira/browse/HIVE-8600) in Hive 1.1.0, we support a user-level explain extended output for any query at the log4j INFO level after **set [**h**ive.log.explain.output]({{% ref "#**h**ive-log-explain-output" %}})****=true** (default is **false**). -Since [HIVE-18469](https://issues.apache.org/jira/browse/HIVE-18469) in Hive 3.1.0, the user-level explain extended output for any query will be shown in the WebUI / Drilldown / Query Plan after **set [hive.server2.webui.explain.output]({{< ref "#hive-server2-webui-explain-output" >}})=true** (default is **false**). +Since [HIVE-18469](https://issues.apache.org/jira/browse/HIVE-18469) in Hive 3.1.0, the user-level explain extended output for any query will be shown in the WebUI / Drilldown / Query Plan after **set [hive.server2.webui.explain.output]({{% ref "#hive-server2-webui-explain-output" %}})=true** (default is **false**). -Since [HIVE-9780](https://issues.apache.org/jira/browse/HIVE-9780) in Hive 1.2.0, we support a user-level explain for Hive on Tez users. After **set [hive.explain.user]({{< ref "#hive-explain-user" >}})=true** (default is **false**) if the following query is sent, the user can see a much more clearly readable tree of operations. 
+Since [HIVE-9780](https://issues.apache.org/jira/browse/HIVE-9780) in Hive 1.2.0, we support a user-level explain for Hive on Tez users. After **set [hive.explain.user]({{% ref "#hive-explain-user" %}})=true** (default is **false**) if the following query is sent, the user can see a much more clearly readable tree of operations. -Since [HIVE-11133](https://issues.apache.org/jira/browse/HIVE-11133) in Hive 3.0.0, we support a user-level explain for Hive on Spark users. A separate configuration is used for Hive-on-Spark, **[hive.spark.explain.user]({{< ref "#hive-spark-explain-user" >}})** which is set to false by default. +Since [HIVE-11133](https://issues.apache.org/jira/browse/HIVE-11133) in Hive 3.0.0, we support a user-level explain for Hive on Spark users. A separate configuration is used for Hive-on-Spark, **[hive.spark.explain.user]({{% ref "#hive-spark-explain-user" %}})** which is set to false by default. ``` EXPLAIN select sum(hash(key)), sum(hash(value)) from src_orc_merge_test_part where ds='2012-01-03' and ts='2012-01-03+14:46:31' diff --git a/content/docs/latest/language/languagemanual-groupby.md b/content/docs/latest/language/languagemanual-groupby.md index ae0774c5..6eb10cb3 100644 --- a/content/docs/latest/language/languagemanual-groupby.md +++ b/content/docs/latest/language/languagemanual-groupby.md @@ -18,8 +18,8 @@ groupByQuery: SELECT expression (, expression)* FROM src groupByClause? In `groupByExpression` columns are specified by name, not by position number. However in [Hive 0.11.0](https://issues.apache.org/jira/browse/HIVE-581) and later, columns can be specified by position when configured as follows: -* For Hive 0.11.0 through 2.1.x, set [hive.groupby.orderby.position.alias]({{< ref "#hive-groupby-orderby-position-alias" >}}) to true (the default is false). -* For Hive 2.2.0 and later, set [hive.groupby.position.alias]({{< ref "#hive-groupby-position-alias" >}}) to true (the default is false). 
+* For Hive 0.11.0 through 2.1.x, set [hive.groupby.orderby.position.alias]({{% ref "#hive-groupby-orderby-position-alias" %}}) to true (the default is false). +* For Hive 2.2.0 and later, set [hive.groupby.position.alias]({{% ref "#hive-groupby-position-alias" %}}) to true (the default is false). ### Simple Examples @@ -151,7 +151,7 @@ Version Grouping sets, CUBE and ROLLUP operators, and the GROUPING__ID function were added in Hive release 0.10.0. -See [Enhanced Aggregation, Cube, Grouping and Rollup]({{< ref "enhanced-aggregation-cube-grouping-and-rollup" >}}) for information about these aggregation operators. +See [Enhanced Aggregation, Cube, Grouping and Rollup]({{% ref "enhanced-aggregation-cube-grouping-and-rollup" %}}) for information about these aggregation operators. Also see the JIRAs: diff --git a/content/docs/latest/language/languagemanual-importexport.md b/content/docs/latest/language/languagemanual-importexport.md index 16f733f5..ca0f3b8d 100644 --- a/content/docs/latest/language/languagemanual-importexport.md +++ b/content/docs/latest/language/languagemanual-importexport.md @@ -9,7 +9,7 @@ date: 2024-12-12 The `EXPORT` and `IMPORT` commands were added in Hive 0.8.0 (see [HIVE-1918](https://issues.apache.org/jira/browse/HIVE-1918)). -Replication extensions to the `EXPORT` and `IMPORT` commands were added in Hive 1.2.0 (see [HIVE-7973](https://issues.apache.org/jira/browse/HIVE-7973) and [Hive Replication Development]({{< ref "hivereplicationdevelopment" >}})). +Replication extensions to the `EXPORT` and `IMPORT` commands were added in Hive 1.2.0 (see [HIVE-7973](https://issues.apache.org/jira/browse/HIVE-7973) and [Hive Replication Development]({{% ref "hivereplicationdevelopment" %}})). 
### Overview diff --git a/content/docs/latest/language/languagemanual-indexing.md b/content/docs/latest/language/languagemanual-indexing.md index b9e9ff98..bd1f4bcf 100644 --- a/content/docs/latest/language/languagemanual-indexing.md +++ b/content/docs/latest/language/languagemanual-indexing.md @@ -10,7 +10,7 @@ date: 2024-12-12 There are alternate options which might work similarly to indexing: * Materialized views with automatic rewriting can result in very similar results.  [Hive 2.3.0](https://issues.apache.org/jira/browse/HIVE-14249) adds support for materialized views. -* Using columnar file formats ([Parquet]({{< ref "parquet" >}}), [ORC](https://orc.apache.org/docs/indexes.html)) – they can do selective scanning; they may even skip entire files/blocks. +* Using columnar file formats ([Parquet]({{% ref "parquet" %}}), [ORC](https://orc.apache.org/docs/indexes.html)) – they can do selective scanning; they may even skip entire files/blocks. Indexing has been **removed** in version 3.0 ([HIVE-18448](https://issues.apache.org/jira/browse/HIVE-18448)). 
@@ -29,7 +29,7 @@ Hive indexing was added in version 0.7.0, and bitmap indexing was added in versi Documentation and examples of how to use Hive indexes can be found here: * [Indexes](/development/desingdocs/indexdev) – design document (lists indexing JIRAs with current status, starting with [HIVE-417](https://issues.apache.org/jira/browse/HIVE-417)) -* [Create/Drop/Alter Index]({{< ref "#create/drop/alter-index" >}}) – [HiveQL Language Manual DDL](/docs/latest/language/languagemanual-ddl) +* [Create/Drop/Alter Index]({{% ref "#create/drop/alter-index" %}}) – [HiveQL Language Manual DDL](/docs/latest/language/languagemanual-ddl) * [Show Indexes](/docs/latest/language/languagemanual-ddl#show-indexes) – [HiveQL Language Manual DDL](/docs/latest/language/languagemanual-ddl) * [Bitmap indexes](/development/desingdocs/indexdev-bitmap) – added in Hive version 0.8.0 ([HIVE-1803](https://issues.apache.org/jira/browse/HIVE-1803)) * [Indexed Hive](http://www.slideshare.net/NikhilDeshpande/indexed-hive) – overview and examples by Prafulla Tekawade and Nikhil Deshpande, October 2010 @@ -37,7 +37,7 @@ Documentation and examples of how to use Hive indexes can be found here: ### Configuration Parameters for Hive Indexes -The [Configuration Properties]({{< ref "#configuration-properties" >}}) document describes parameters that configure Hive indexes. +The [Configuration Properties]({{% ref "#configuration-properties" %}}) document describes parameters that configure Hive indexes. 
## Simple Examples diff --git a/content/docs/latest/language/languagemanual-joinoptimization.md b/content/docs/latest/language/languagemanual-joinoptimization.md index 8aeecdbf..ae69a029 100644 --- a/content/docs/latest/language/languagemanual-joinoptimization.md +++ b/content/docs/latest/language/languagemanual-joinoptimization.md @@ -64,7 +64,7 @@ store_sales join time_dim on (ss_sold_time_sk = t_time_sk) ``` -The default value for [hive.auto.convert.join]({{< ref "#hive-auto-convert-join" >}}) was false in Hive 0.10.0.  Hive 0.11.0 changed the default to true ([HIVE-3297](https://issues.apache.org/jira/browse/HIVE-3297)). Note that [hive-default.xml.template]({{< ref "#hive-default-xml-template" >}}) incorrectly gives the default as false in Hive 0.11.0 through 0.13.1. +The default value for [hive.auto.convert.join]({{% ref "#hive-auto-convert-join" %}}) was false in Hive 0.10.0.  Hive 0.11.0 changed the default to true ([HIVE-3297](https://issues.apache.org/jira/browse/HIVE-3297)). Note that [hive-default.xml.template]({{% ref "#hive-default-xml-template" %}}) incorrectly gives the default as false in Hive 0.11.0 through 0.13.1. MAPJOINs are processed by loading the smaller table into an in-memory hash map and matching keys with the larger table as they are streamed through. The prior implementation has this division of labor: @@ -122,7 +122,7 @@ It is likely, though, that for small dimension tables the parts of both tables n 2. Merge MJ->MJ into a single MJ when possible. 3. Merge MJ* patterns into a single Map stage as a chain of MJ operators. (Not yet implemented.) -If `[hive.auto.convert.join]({{< ref "#hive-auto-convert-join" >}})` is set to true the optimizer not only converts joins to mapjoins but also merges MJ* patterns as much as possible. +If `[hive.auto.convert.join]({{% ref "#hive-auto-convert-join" %}})` is set to true the optimizer not only converts joins to mapjoins but also merges MJ* patterns as much as possible. 
#### Optimize Auto Join Conversion @@ -134,9 +134,9 @@ set hive.auto.convert.join.noconditionaltask.size = 10000000; ``` -The default for `[hive.auto.convert.join.noconditionaltask]({{< ref "#hive-auto-convert-join-noconditionaltask" >}})` is true which means auto conversion is enabled. (Originally the default was false – see [HIVE-3784](https://issues.apache.org/jira/browse/HIVE-3784) – but it was changed to true by [HIVE-4146](https://issues.apache.org/jira/browse/HIVE-4146) before Hive 0.11.0 was released.) +The default for `[hive.auto.convert.join.noconditionaltask]({{% ref "#hive-auto-convert-join-noconditionaltask" %}})` is true which means auto conversion is enabled. (Originally the default was false – see [HIVE-3784](https://issues.apache.org/jira/browse/HIVE-3784) – but it was changed to true by [HIVE-4146](https://issues.apache.org/jira/browse/HIVE-4146) before Hive 0.11.0 was released.) -The [size configuration]({{< ref "#size-configuration" >}}) enables the user to control what size table can fit in memory. This value represents the sum of the sizes of tables that can be converted to hashmaps that fit in memory. Currently, n-1 tables of the join have to fit in memory for the map-join optimization to take effect. There is no check to see if the table is a compressed one or not and what the potential size of the table can be. The effect of this assumption on the results is discussed in the next section. +The [size configuration]({{% ref "#size-configuration" %}}) enables the user to control what size table can fit in memory. This value represents the sum of the sizes of tables that can be converted to hashmaps that fit in memory. Currently, n-1 tables of the join have to fit in memory for the map-join optimization to take effect. There is no check to see if the table is a compressed one or not and what the potential size of the table can be. The effect of this assumption on the results is discussed in the next section. 
For example, the previous query just becomes: @@ -157,7 +157,7 @@ Auto join conversion also affects the sort-merge-bucket joins. Version 0.13.0 and later -Hive 0.13.0 introduced `[hive.auto.convert.join.use.nonstaged]({{< ref "#hive-auto-convert-join-use-nonstaged" >}})` with a default of false ([HIVE-6144](https://issues.apache.org/jira/browse/HIVE-6144)). +Hive 0.13.0 introduced `[hive.auto.convert.join.use.nonstaged]({{% ref "#hive-auto-convert-join-use-nonstaged" %}})` with a default of false ([HIVE-6144](https://issues.apache.org/jira/browse/HIVE-6144)). For conditional joins, if the input stream from a small alias can be directly applied to the join operator without filtering or projection, then it does not need to be pre-staged in the distributed cache via a MapReduce local task. Setting `hive.auto.convert.join.use.nonstaged` to true avoids pre-staging in those cases. diff --git a/content/docs/latest/language/languagemanual-joins.md b/content/docs/latest/language/languagemanual-joins.md index af65bf2a..1abbe0ed 100644 --- a/content/docs/latest/language/languagemanual-joins.md +++ b/content/docs/latest/language/languagemanual-joins.md @@ -30,7 +30,7 @@ join_condition: ``` -See [Select Syntax]({{< ref "#select-syntax" >}}) for the context of this join syntax. +See [Select Syntax]({{% ref "#select-syntax" %}}) for the context of this join syntax. Version 0.13.0+: Implicit join notation @@ -169,7 +169,7 @@ will join a on b, producing a list of a.val and b.val. The WHERE clause, however ...first joins a on b, throwing away everything in a or b that does not have a corresponding key in the other table. The reduced table is then joined on c. This provides unintuitive results if there is a key that exists in both a and c but not b: The whole row (including a.val1, a.val2, and a.key) is dropped in the "a JOIN b" step because it is not in b. 
The result does not have a.key in it, so when it is LEFT OUTER JOINed with c, c.val does not make it in because there is no c.key that matches an a.key (because that row from a was removed). Similarly, if this were a RIGHT OUTER JOIN (instead of LEFT), we would end up with an even weirder effect: NULL, NULL, NULL, c.val, because even though we specified a.key=c.key as the join key, we dropped all rows of a that did not match the first JOIN. To achieve the more intuitive effect, we should instead do FROM c LEFT OUTER JOIN a ON (c.key = a.key) LEFT OUTER JOIN b ON (c.key = b.key). -* LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using [subqueries]({{< ref "languagemanual-subqueries" >}}) so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN are that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc. +* LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way. As of Hive 0.13 the IN/NOT IN/EXISTS/NOT EXISTS operators are supported using [subqueries]({{% ref "languagemanual-subqueries" %}}) so most of these JOINs don't have to be performed manually anymore. The restrictions of using LEFT SEMI JOIN are that the right-hand-side table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses etc. ``` SELECT a.key, a.value @@ -268,7 +268,7 @@ The above query is not supported. Without the mapjoin hint, the above query woul ### Predicate Pushdown in Outer Joins -See [Hive Outer Join Behavior]({{< ref "outerjoinbehavior" >}}) for information about predicate pushdown in outer joins. +See [Hive Outer Join Behavior]({{% ref "outerjoinbehavior" %}}) for information about predicate pushdown in outer joins. 
### Enhancements in Hive Version 0.11 diff --git a/content/docs/latest/language/languagemanual-lateralview.md b/content/docs/latest/language/languagemanual-lateralview.md index ed4ad42e..45362049 100644 --- a/content/docs/latest/language/languagemanual-lateralview.md +++ b/content/docs/latest/language/languagemanual-lateralview.md @@ -15,7 +15,7 @@ fromClause: FROM baseTable (lateralView)* ## Description -Lateral view is used in conjunction with user-defined table generating functions such as `explode()`. As mentioned in [Built-in Table-Generating Functions]({{< ref "#built-in-table-generating-functions" >}}), a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias. +Lateral view is used in conjunction with user-defined table generating functions such as `explode()`. As mentioned in [Built-in Table-Generating Functions]({{% ref "#built-in-table-generating-functions" %}}), a UDTF generates zero or more output rows for each input row. A lateral view first applies the UDTF to each row of base table and then joins resulting output rows to the input rows to form a virtual table having the supplied table alias. Version @@ -43,7 +43,7 @@ An example table with two rows: and the user would like to count the total number of times an ad appears across all pages. 
-A lateral view with [explode()]({{< ref "#explode--" >}}) can be used to convert `adid_list` into separate rows using the query: +A lateral view with [explode()]({{% ref "#explode--" %}}) can be used to convert `adid_list` into separate rows using the query: ``` SELECT pageid, adid diff --git a/content/docs/latest/language/languagemanual-lzo.md b/content/docs/latest/language/languagemanual-lzo.md index 182fb7ae..ffd0a540 100644 --- a/content/docs/latest/language/languagemanual-lzo.md +++ b/content/docs/latest/language/languagemanual-lzo.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## General LZO Concepts -LZO is a lossless data compression library that favors speed over compression ratio. See and for general information about LZO and see [Compressed Data Storage]({{< ref "compressedstorage" >}}) for information about compression in Hive. +LZO is a lossless data compression library that favors speed over compression ratio. See and for general information about LZO and see [Compressed Data Storage]({{% ref "compressedstorage" %}}) for information about compression in Hive. Imagine a simple data file that has three columns @@ -80,7 +80,7 @@ hive -e "CREATE EXTERNAL TABLE IF NOT EXISTS hive_table_name (column_1 datatype Note: The double quotes have to be escaped so that the '`hive -e`' command works correctly. -See [CREATE TABLE]({{< ref "#create-table" >}}) and [Hive CLI]({{< ref "languagemanual-cli" >}}) for information about command syntax. +See [CREATE TABLE]({{% ref "#create-table" %}}) and [Hive CLI]({{% ref "languagemanual-cli" %}}) for information about command syntax. 
## Hive Queries diff --git a/content/docs/latest/language/languagemanual-orc.md b/content/docs/latest/language/languagemanual-orc.md index 4b7f5919..639a79d0 100644 --- a/content/docs/latest/language/languagemanual-orc.md +++ b/content/docs/latest/language/languagemanual-orc.md @@ -54,7 +54,7 @@ Having relatively frequent row index entries enables row-skipping within a strip With the ability to skip large sets of rows based on filter predicates, you can sort a table on its secondary keys to achieve a big reduction in execution time. For example, if the primary partition is transaction date, the table can be sorted on state, zip code, and last name. Then looking for records in one state will skip the records of all other states. -A complete specification of the format is given in the [ORC specification]({{< ref "#orc-specification" >}}). +A complete specification of the format is given in the [ORC specification]({{% ref "#orc-specification" %}}). ## HiveQL Syntax @@ -64,7 +64,7 @@ File formats are specified at the table (or partition) level. You can specify th * `ALTER TABLE ... [PARTITION partition_spec] SET FILEFORMAT ORC` * `SET hive.default.fileformat=Orc` -The parameters are all placed in the TBLPROPERTIES (see [Create Table]({{< ref "#create-table" >}})). They are: +The parameters are all placed in the TBLPROPERTIES (see [Create Table]({{% ref "#create-table" %}})). They are: | Key | Default | Notes | | --- | --- | --- | @@ -91,7 +91,7 @@ create table Addresses ( Version 0.14.0+: CONCATENATE -`[ALTER TABLE table_name [PARTITION partition_spec] CONCATENATE]({{< ref "#alter-table-table_name-[partition-partition_spec]-concatenate" >}})` can be used to merge small ORC files into a larger file, starting in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7509). The merge happens at the stripe level, which avoids decompressing and decoding the data. 
+`[ALTER TABLE table_name [PARTITION partition_spec] CONCATENATE]({{% ref "#alter-table-table_name-[partition-partition_spec]-concatenate" %}})` can be used to merge small ORC files into a larger file, starting in [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7509). The merge happens at the stripe level, which avoids decompressing and decoding the data. ## Serialization and Compression @@ -163,7 +163,7 @@ hive --orcfiledump [-j] [-p] [-d] [-t] [--rowindex ] [--recover] [--ski Specifying `-d` in the command will cause it to dump the ORC file data rather than the metadata (Hive [1.1.0](https://issues.apache.org/jira/browse/HIVE-7896) and later). -Specifying `--rowindex` with a comma separated list of column ids will cause it to print [row indexes]({{< ref "#row-indexes" >}}) for the specified columns, where 0 is the top level struct containing all of the columns and 1 is the first column id (Hive [1.1.0](https://issues.apache.org/jira/browse/HIVE-7896) and later). +Specifying `--rowindex` with a comma separated list of column ids will cause it to print [row indexes]({{% ref "#row-indexes" %}}) for the specified columns, where 0 is the top level struct containing all of the columns and 1 is the first column id (Hive [1.1.0](https://issues.apache.org/jira/browse/HIVE-7896) and later). Specifying `-t` in the command will print the timezone id of the writer. @@ -181,7 +181,7 @@ Specifying `--backup-path` with a *new-path* will let the recovery tool move ## ORC Configuration Parameters -The ORC configuration parameters are described in [Hive Configuration Properties – ORC File Format]({{< ref "#hive-configuration-properties –-orc-file-format" >}}). +The ORC configuration parameters are described in [Hive Configuration Properties – ORC File Format]({{% ref "#hive-configuration-properties –-orc-file-format" %}}). 
# ORC Format Specification diff --git a/content/docs/latest/language/languagemanual-sampling.md b/content/docs/latest/language/languagemanual-sampling.md index 2c9e5823..02805a1d 100644 --- a/content/docs/latest/language/languagemanual-sampling.md +++ b/content/docs/latest/language/languagemanual-sampling.md @@ -50,7 +50,7 @@ On the other hand the tablesample clause would pick out half of the 3rd cluster as each bucket would be composed of (32/64)=1/2 of a cluster. -For information about creating bucketed tables with the CLUSTERED BY clause, see [Create Table]({{< ref "#create-table" >}}) (especially [Bucketed Sorted Tables]({{< ref "#bucketed-sorted-tables" >}})) and [Bucketed Tables]({{< ref "languagemanual-ddl-bucketedtables" >}}). +For information about creating bucketed tables with the CLUSTERED BY clause, see [Create Table]({{% ref "#create-table" %}}) (especially [Bucketed Sorted Tables]({{% ref "#bucketed-sorted-tables" %}})) and [Bucketed Tables]({{% ref "languagemanual-ddl-bucketedtables" %}}). ### Block Sampling diff --git a/content/docs/latest/language/languagemanual-select.md b/content/docs/latest/language/languagemanual-select.md index 3a35f0da..eee50dad 100644 --- a/content/docs/latest/language/languagemanual-select.md +++ b/content/docs/latest/language/languagemanual-select.md @@ -21,12 +21,12 @@ SELECT [ALL | DISTINCT] select_expr, select_expr, ... ``` -* A SELECT statement can be part of a [union]({{< ref "languagemanual-union" >}}) query or a [subquery]({{< ref "languagemanual-subqueries" >}}) of another query. -* `table_reference` indicates the input to the query. It can be a regular table, [a view]({{< ref "#a-view" >}}), a [join construct]({{< ref "languagemanual-joins" >}}) or a [subquery]({{< ref "languagemanual-subqueries" >}}). +* A SELECT statement can be part of a [union]({{% ref "languagemanual-union" %}}) query or a [subquery]({{% ref "languagemanual-subqueries" %}}) of another query. 
+* `table_reference` indicates the input to the query. It can be a regular table, [a view]({{% ref "#a-view" %}}), a [join construct]({{% ref "languagemanual-joins" %}}) or a [subquery]({{% ref "languagemanual-subqueries" %}}). * Table names and column names are case insensitive. + In Hive 0.12 and earlier, only alphanumeric and underscore characters are allowed in table and column names. + In Hive 0.13 and later, column names can contain any [Unicode](http://en.wikipedia.org/wiki/List_of_Unicode_characters) character (see [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). Any column name that is specified within backticks (```) is treated literally. Within a backtick string, use double backticks (````) to represent a backtick character. - + To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property `[hive.support.quoted.identifiers]({{< ref "#hive-support-quoted-identifiers" >}})` to `none`. In this configuration, backticked names are interpreted as regular expressions. For details, see [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) (attached to [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). Also see [REGEX Column Specification]({{< ref "#regex-column-specification" >}}) below. + + To revert to pre-0.13.0 behavior and restrict column names to alphanumeric and underscore characters, set the configuration property `[hive.support.quoted.identifiers]({{% ref "#hive-support-quoted-identifiers" %}})` to `none`. In this configuration, backticked names are interpreted as regular expressions. For details, see [Supporting Quoted Identifiers in Column Names](https://issues.apache.org/jira/secure/attachment/12618321/QuotedIdentifier.html) (attached to [HIVE-6013](https://issues.apache.org/jira/browse/HIVE-6013)). 
Also see [REGEX Column Specification]({{% ref "#regex-column-specification" %}}) below. * Simple query. For example, the following query retrieves all columns and all rows from table t1. ``` @@ -36,12 +36,12 @@ SELECT * FROM t1  Note As of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-4144), FROM is optional (for example, `SELECT 1+1`). -* To get the current database (as of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-4144)), use the [current_database() function]({{< ref "#current_database---function" >}}): +* To get the current database (as of [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-4144)), use the [current_database() function]({{% ref "#current_database---function" %}}): ``` SELECT current_database() ``` -* To specify a database, either qualify the table names with database names ("`db_name.table_name`" starting in [Hive 0.7](https://issues.apache.org/jira/browse/HIVE-1517)) or issue the [USE statement]({{< ref "#use-statement" >}}) before the query statement (starting in [Hive 0.6](https://issues.apache.org/jira/browse/HIVE-675)). +* To specify a database, either qualify the table names with database names ("`db_name.table_name`" starting in [Hive 0.7](https://issues.apache.org/jira/browse/HIVE-1517)) or issue the [USE statement]({{% ref "#use-statement" %}}) before the query statement (starting in [Hive 0.6](https://issues.apache.org/jira/browse/HIVE-675)). "`db_name.table_name`" allows a query to access tables in different databases. @@ -55,14 +55,14 @@ USE default; ### WHERE Clause -The WHERE condition is a [boolean]({{< ref "languagemanual-types" >}}) expression. For example, the following query returns only those sales records which have an amount greater than 10 from the US region. Hive supports a number of [operators and UDFs]({{< ref "languagemanual-udf" >}}) in the WHERE clause: +The WHERE condition is a [boolean]({{% ref "languagemanual-types" %}}) expression. 
For example, the following query returns only those sales records which have an amount greater than 10 from the US region. Hive supports a number of [operators and UDFs]({{% ref "languagemanual-udf" %}}) in the WHERE clause: ``` SELECT * FROM sales WHERE amount > 10 AND region = "US" ``` -As of Hive 0.13 some types of [subqueries]({{< ref "languagemanual-subqueries" >}}) are supported in the WHERE clause. +As of Hive 0.13 some types of [subqueries]({{% ref "languagemanual-subqueries" %}}) are supported in the WHERE clause. ### ALL and DISTINCT Clauses @@ -84,11 +84,11 @@ hive> SELECT DISTINCT col1 FROM t1 ``` -ALL and DISTINCT can also be used in a UNION clause – see [Union Syntax]({{< ref "#union-syntax" >}}) for more information. +ALL and DISTINCT can also be used in a UNION clause – see [Union Syntax]({{% ref "#union-syntax" %}}) for more information. ### Partition Based Queries -In general, a SELECT query scans the entire table (other than for [sampling]({{< ref "languagemanual-sampling" >}})). If a table created using the [PARTITIONED BY]({{< ref "#partitioned-by" >}}) clause, a query can do **partition pruning** and scan only a fraction of the table relevant to the partitions specified by the query. Hive currently does partition pruning if the partition predicates are specified in the WHERE clause or the ON clause in a JOIN. For example, if table page_views is partitioned on column date, the following query retrieves rows for just days between 2008-03-01 and 2008-03-31. +In general, a SELECT query scans the entire table (other than for [sampling]({{% ref "languagemanual-sampling" %}})). If a table created using the [PARTITIONED BY]({{% ref "#partitioned-by" %}}) clause, a query can do **partition pruning** and scan only a fraction of the table relevant to the partitions specified by the query. Hive currently does partition pruning if the partition predicates are specified in the WHERE clause or the ON clause in a JOIN. 
For example, if table page_views is partitioned on column date, the following query retrieves rows for just days between 2008-03-01 and 2008-03-31. ``` SELECT page_views.* @@ -106,9 +106,9 @@ If a table page_views is joined with another table dim_users, you can specify a ``` -* See also [Partition Filter Syntax]({{< ref "partition-filter-syntax" >}}). -* See also [Group By]({{< ref "languagemanual-groupby" >}}). -* See also [Sort By / Cluster By / Distribute By / Order By]({{< ref "languagemanual-sortby" >}}). +* See also [Partition Filter Syntax]({{% ref "partition-filter-syntax" %}}). +* See also [Group By]({{% ref "languagemanual-groupby" %}}). +* See also [Sort By / Cluster By / Distribute By / Order By]({{% ref "languagemanual-sortby" %}}). ### HAVING Clause @@ -161,7 +161,7 @@ SELECT * FROM customers ORDER BY create_date LIMIT 2,5 ### REGEX Column Specification -A SELECT statement can take regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property `[hive.support.quoted.identifiers]({{< ref "#hive-support-quoted-identifiers" >}})` is set to `none`.  +A SELECT statement can take regex-based column specification in Hive releases prior to 0.13.0, or in 0.13.0 and later releases if the configuration property `[hive.support.quoted.identifiers]({{% ref "#hive-support-quoted-identifiers" %}})` is set to `none`.  * We use Java regex syntax. Try for testing purposes. * The following query selects all columns except ds and hr. 
@@ -175,20 +175,20 @@ SELECT `(ds|hr)?+.+` FROM sales See the following documents for additional syntax and features of SELECT statements: -* [GROUP BY]({{< ref "languagemanual-groupby" >}}) -* [SORT/ORDER/CLUSTER/DISTRIBUTE BY]({{< ref "languagemanual-sortby" >}}) -* [JOIN]({{< ref "languagemanual-joins" >}}) - + [Hive Joins]({{< ref "languagemanual-joins" >}}) - + [Join Optimization]({{< ref "languagemanual-joinoptimization" >}}) - + [Outer Join Behavior]({{< ref "outerjoinbehavior" >}}) -* [UNION]({{< ref "languagemanual-union" >}}) -* [TABLESAMPLE]({{< ref "languagemanual-sampling" >}}) -* [Subqueries]({{< ref "languagemanual-subqueries" >}}) -* [Virtual Columns]({{< ref "languagemanual-virtualcolumns" >}}) -* [Operators and UDFs]({{< ref "languagemanual-udf" >}}) -* [LATERAL VIEW]({{< ref "languagemanual-lateralview" >}}) -* [Windowing, OVER, and Analytics]({{< ref "languagemanual-windowingandanalytics" >}}) -* [Common Table Expressions]({{< ref "common-table-expression" >}}) +* [GROUP BY]({{% ref "languagemanual-groupby" %}}) +* [SORT/ORDER/CLUSTER/DISTRIBUTE BY]({{% ref "languagemanual-sortby" %}}) +* [JOIN]({{% ref "languagemanual-joins" %}}) + + [Hive Joins]({{% ref "languagemanual-joins" %}}) + + [Join Optimization]({{% ref "languagemanual-joinoptimization" %}}) + + [Outer Join Behavior]({{% ref "outerjoinbehavior" %}}) +* [UNION]({{% ref "languagemanual-union" %}}) +* [TABLESAMPLE]({{% ref "languagemanual-sampling" %}}) +* [Subqueries]({{% ref "languagemanual-subqueries" %}}) +* [Virtual Columns]({{% ref "languagemanual-virtualcolumns" %}}) +* [Operators and UDFs]({{% ref "languagemanual-udf" %}}) +* [LATERAL VIEW]({{% ref "languagemanual-lateralview" %}}) +* [Windowing, OVER, and Analytics]({{% ref "languagemanual-windowingandanalytics" %}}) +* [Common Table Expressions]({{% ref "common-table-expression" %}})   diff --git a/content/docs/latest/language/languagemanual-sortby.md b/content/docs/latest/language/languagemanual-sortby.md index 
863b3514..1ad425e4 100644 --- a/content/docs/latest/language/languagemanual-sortby.md +++ b/content/docs/latest/language/languagemanual-sortby.md @@ -7,7 +7,7 @@ date: 2024-12-12 # Order, Sort, Cluster, and Distribute By -This describes the syntax of SELECT clauses ORDER BY, SORT BY, CLUSTER BY, and DISTRIBUTE BY.  See [Select Syntax]({{< ref "#select-syntax" >}}) for general information. +This describes the syntax of SELECT clauses ORDER BY, SORT BY, CLUSTER BY, and DISTRIBUTE BY.  See [Select Syntax]({{% ref "#select-syntax" %}}) for general information. ## Syntax of Order By @@ -21,7 +21,7 @@ query: SELECT expression (',' expression)* FROM src orderBy ``` -There are some limitations in the "order by" clause. In the strict mode (i.e., [hive.mapred.mode]({{< ref "#hive-mapred-mode" >}})=strict), the order by clause has to be followed by a "limit" clause. The limit clause is not necessary if you set hive.mapred.mode to nonstrict. The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output. If the number of rows in the output is too large, the single reducer could take a very long time to finish. +There are some limitations in the "order by" clause. In the strict mode (i.e., [hive.mapred.mode]({{% ref "#hive-mapred-mode" %}})=strict), the order by clause has to be followed by a "limit" clause. The limit clause is not necessary if you set hive.mapred.mode to nonstrict. The reason is that in order to impose total order of all results, there has to be one reducer to sort the final output. If the number of rows in the output is too large, the single reducer could take a very long time to finish. Note that columns are specified by name, not by position number. However in [Hive 0.11.0](https://issues.apache.org/jira/browse/HIVE-581) and later, columns can be specified by position when configured as follows: @@ -32,7 +32,7 @@ The default sorting order is ascending (ASC). 
In [Hive 2.1.0](https://issues.apache.org/jira/browse/HIVE-12994) and later, specifying the null sorting order for each of the columns in the "order by" clause is supported. The default null sorting order for ASC order is NULLS FIRST, while the default null sorting order for DESC order is NULLS LAST. -In [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-6348) and later, order by without limit in [subqueries]({{< ref "languagemanual-subqueries" >}}) and [views]({{< ref "#views" >}}) will be removed by the optimizer. To disable it, set [hive.remove.orderby.in.subquery]({{< ref "#hive-remove-orderby-in-subquery" >}}) to false. +In [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-6348) and later, order by without limit in [subqueries]({{% ref "languagemanual-subqueries" %}}) and [views]({{% ref "#views" %}}) will be removed by the optimizer. To disable it, set [hive.remove.orderby.in.subquery]({{% ref "#hive-remove-orderby-in-subquery" %}}) to false. ## Syntax of Sort By @@ -47,7 +47,7 @@ query: SELECT expression (',' expression)* FROM src sortBy Hive uses the columns in *SORT BY* to sort the rows before feeding the rows to a reducer. The sort order will be dependent on the column types. If the column is of numeric type, then the sort order is also in numeric order. If the column is of string type, then the sort order will be lexicographical order. -In [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-6348) and later, sort by without limit in [subqueries]({{< ref "languagemanual-subqueries" >}}) and [views]({{< ref "#views" >}}) will be removed by the optimizer. To disable it, set [hive.remove.orderby.in.subquery]({{< ref "#hive-remove-orderby-in-subquery" >}}) to false. +In [Hive 3.0.0](https://issues.apache.org/jira/browse/HIVE-6348) and later, sort by without limit in [subqueries]({{% ref "languagemanual-subqueries" %}}) and [views]({{% ref "#views" %}}) will be removed by the optimizer. 
To disable it, set [hive.remove.orderby.in.subquery]({{% ref "#hive-remove-orderby-in-subquery" %}}) to false. ### Difference between Sort By and Order By @@ -99,7 +99,7 @@ AS whatever ## Syntax of Cluster By and Distribute By -*Cluster By* and *Distribute By* are used mainly with the [Transform/Map-Reduce Scripts]({{< ref "languagemanual-transform" >}}). But, it is sometimes useful in SELECT statements if there is a need to partition and sort the output of a query for subsequent queries. +*Cluster By* and *Distribute By* are used mainly with the [Transform/Map-Reduce Scripts]({{% ref "languagemanual-transform" %}}). But, it is sometimes useful in SELECT statements if there is a need to partition and sort the output of a query for subsequent queries. *Cluster By* is a short-cut for both *Distribute By* and *Sort By*. diff --git a/content/docs/latest/language/languagemanual-transform.md b/content/docs/latest/language/languagemanual-transform.md index a751009d..0610fbf4 100644 --- a/content/docs/latest/language/languagemanual-transform.md +++ b/content/docs/latest/language/languagemanual-transform.md @@ -15,13 +15,13 @@ In windows, use "cmd /c your_script" instead of just "your_script" Warning -It is your responsibility to sanitize any STRING columns prior to transformation. If your STRING column contains tabs, an identity transformer will not give you back what you started with! To help with this, see [REGEXP_REPLACE]({{< ref "#regexp_replace" >}}) and replace the tabs with some other character on their way into the TRANSFORM() call. +It is your responsibility to sanitize any STRING columns prior to transformation. If your STRING column contains tabs, an identity transformer will not give you back what you started with! To help with this, see [REGEXP_REPLACE]({{% ref "#regexp_replace" %}}) and replace the tabs with some other character on their way into the TRANSFORM() call. 
Warning Formally, *MAP ...* and *REDUCE ...* are syntactic transformations of *SELECT TRANSFORM ( ... )*. In other words, they serve as comments or notes to the reader of the query. BEWARE: Use of these keywords may be **dangerous** as (e.g.) typing "REDUCE" does not force a reduce phase to occur and typing "MAP" does not force a new map phase! -Please also see [Sort By / Cluster By / Distribute By]({{< ref "languagemanual-sortby" >}}) and Larry Ogrodnek's [blog post](http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html). +Please also see [Sort By / Cluster By / Distribute By]({{% ref "languagemanual-sortby" %}}) and Larry Ogrodnek's [blog post](http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html). ``` clusterBy: CLUSTER BY colName (',' colName)* @@ -79,7 +79,7 @@ query: #### SQL Standard Based Authorization Disallows TRANSFORM -The TRANSFORM clause is disallowed when [SQL standard based authorization]({{< ref "sql-standard-based-hive-authorization" >}}) is configured in Hive 0.13.0 and later releases ([HIVE-6415](https://issues.apache.org/jira/browse/HIVE-6415)). +The TRANSFORM clause is disallowed when [SQL standard based authorization]({{% ref "sql-standard-based-hive-authorization" %}}) is configured in Hive 0.13.0 and later releases ([HIVE-6415](https://issues.apache.org/jira/browse/HIVE-6415)). #### TRANSFORM Examples diff --git a/content/docs/latest/language/languagemanual-types.md b/content/docs/latest/language/languagemanual-types.md index d4c053d1..40c61357 100644 --- a/content/docs/latest/language/languagemanual-types.md +++ b/content/docs/latest/language/languagemanual-types.md @@ -7,13 +7,13 @@ date: 2024-12-12 ## Overview -This lists all supported data types in Hive. See [Type System]({{< ref "#type-system" >}}) in the [Tutorial]({{< ref "tutorial" >}}) for additional information. +This lists all supported data types in Hive. See [Type System]({{% ref "#type-system" %}}) in the [Tutorial]({{% ref "tutorial" %}}) for additional information. 
For data types supported by HCatalog, see: -* [HCatLoader Data Types]({{< ref "#hcatloader-data-types" >}}) -* [HCatStorer Data Types]({{< ref "#hcatstorer-data-types" >}}) -* [HCatRecord Data Types]({{< ref "#hcatrecord-data-types" >}}) +* [HCatLoader Data Types]({{% ref "#hcatloader-data-types" %}}) +* [HCatStorer Data Types]({{% ref "#hcatstorer-data-types" %}}) +* [HCatRecord Data Types]({{% ref "#hcatrecord-data-types" %}}) ### Numeric Types @@ -115,14 +115,14 @@ Supported conversions: * Floating point numeric types: Interpreted as UNIX timestamp in seconds with decimal precision * Strings: JDBC compliant java.sql.Timestamp format "`YYYY-MM-DD HH:MM:SS.fffffffff`" (9 decimal place precision) -Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch. Convenience [UDFs]({{< ref "#udfs" >}}) for conversion to and from timezones are provided (`to_utc_timestamp`, `from_utc_timestamp`). -All existing datetime [UDFs]({{< ref "#udfs" >}}) (month, day, year, hour, etc.) work with the `TIMESTAMP` data type. +Timestamps are interpreted to be timezoneless and stored as an offset from the UNIX epoch. Convenience [UDFs]({{% ref "#udfs" %}}) for conversion to and from timezones are provided (`to_utc_timestamp`, `from_utc_timestamp`). +All existing datetime [UDFs]({{% ref "#udfs" %}}) (month, day, year, hour, etc.) work with the `TIMESTAMP` data type. Timestamps in text files have to use the format `yyyy-mm-dd hh:mm:ss[.f...]`. If they are in another format, declare them as the appropriate type (INT, FLOAT, STRING, etc.) and use a UDF to convert them to timestamps. Timestamps in Parquet files may be stored as int64 (as opposed to int96) by setting `hive.parquet.write.int64.timestamp=true` and `hive.parquet.timestamp.time.unit` to a default storage time unit. (`"nanos", "micros",` `"millis"`; default: `"micros"`). 
Note that because only 64 bits are stored, int64 timestamps stored as `"nanos"` will be stored as NULL if outside the range of 1677-09-21T00:12:43.15 and 2262-04-11T23:47:16.8. -On the table level, alternative timestamp formats can be supported by providing the format to the [SerDe property]({{< ref "#serde-property" >}}) "timestamp.formats" (as of release 1.2.0 with [HIVE-9298](https://issues.apache.org/jira/browse/HIVE-9298)). For example, `yyyy-MM-dd'T'HH:mm:ss.SSS,yyyy-MM-dd'T'HH:mm:ss.` +On the table level, alternative timestamp formats can be supported by providing the format to the [SerDe property]({{% ref "#serde-property" %}}) "timestamp.formats" (as of release 1.2.0 with [HIVE-9298](https://issues.apache.org/jira/browse/HIVE-9298)). For example, `yyyy-MM-dd'T'HH:mm:ss.SSS,yyyy-MM-dd'T'HH:mm:ss.` Version @@ -138,7 +138,7 @@ Dates were introduced in Hive 0.12.0 ([HIVE-4055](https://issues.apache.org/jira #### Casting Dates -Date types can only be converted to/from Date, Timestamp, or String types. Casting with user-specified formats is documented [here]({{< ref "cast-format-with-sql2016-datetime-formats" >}}). +Date types can only be converted to/from Date, Timestamp, or String types. Casting with user-specified formats is documented [here]({{% ref "cast-format-with-sql2016-datetime-formats" %}}). | Valid casts to/from Date type | Result | | --- | --- | @@ -199,7 +199,7 @@ With the changes in the Decimal data type in Hive 0.13.0, the pre-Hive 0.13.0 co If the user was on Hive 0.12.0 or earlier and created tables with decimal columns, they should perform the following steps on these tables **after** upgrading to Hive 0.13.0 or later. 1. Determine what precision/scale you would like to set for the decimal column in the table. -2. For each decimal column in the table, update the column definition to the desired precision/scale using the [ALTER TABLE]({{< ref "#alter-table" >}}) command: +2. 
For each decimal column in the table, update the column definition to the desired precision/scale using the [ALTER TABLE]({{% ref "#alter-table" %}}) command: ``` ALTER TABLE foo CHANGE COLUMN dec_column_name dec_column_name DECIMAL(38,18); @@ -217,7 +217,7 @@ ds=2008-04-08/hr=12 ``` 4. Each existing partition in the table must also have its DECIMAL column changed to add the desired precision/scale. -This can be done with a single [ALTER TABLE CHANGE COLUMN]({{< ref "#alter-table-change-column" >}}) by using dynamic partitioning (available for ALTER TABLE CHANGE COLUMN in Hive 0.14 or later, with [HIVE-8411](https://issues.apache.org/jira/browse/HIVE-8411)): +This can be done with a single [ALTER TABLE CHANGE COLUMN]({{% ref "#alter-table-change-column" %}}) by using dynamic partitioning (available for ALTER TABLE CHANGE COLUMN in Hive 0.14 or later, with [HIVE-8411](https://issues.apache.org/jira/browse/HIVE-8411)): ``` SET hive.exec.dynamic.partition = true; @@ -328,7 +328,7 @@ select cast(t as boolean) from decimal_2; ##### Mathematical UDFs -Decimal also supports many [arithmetic operators]({{< ref "#arithmetic-operators" >}}), [mathematical UDFs]({{< ref "#mathematical-udfs" >}}) and [UDAFs]({{< ref "#udafs" >}}) with the same syntax as used in the case of DOUBLE. +Decimal also supports many [arithmetic operators]({{% ref "#arithmetic-operators" %}}), [mathematical UDFs]({{% ref "#mathematical-udfs" %}}) and [UDAFs]({{% ref "#udafs" %}}) with the same syntax as used in the case of DOUBLE. Basic mathematical operations that can use decimal types include: @@ -384,7 +384,7 @@ Missing values are represented by the special value NULL. To import data with NU ## Change Types -When [hive.metastore.disallow.incompatible.col.type.changes]({{< ref "#hive-metastore-disallow-incompatible-col-type-changes" >}}) is set to false, the types of columns in Metastore can be changed from any type to any other type. 
After such a type change, if the data can be shown correctly with the new type, the data will be displayed. Otherwise, the data will be displayed as NULL. +When [hive.metastore.disallow.incompatible.col.type.changes]({{% ref "#hive-metastore-disallow-incompatible-col-type-changes" %}}) is set to false, the types of columns in Metastore can be changed from any type to any other type. After such a type change, if the data can be shown correctly with the new type, the data will be displayed. Otherwise, the data will be displayed as NULL. ## Allowed Implicit Conversions diff --git a/content/docs/latest/language/languagemanual-udf.md b/content/docs/latest/language/languagemanual-udf.md index b6247878..cc9fcd5f 100644 --- a/content/docs/latest/language/languagemanual-udf.md +++ b/content/docs/latest/language/languagemanual-udf.md @@ -9,7 +9,7 @@ date: 2024-12-12 All Hive keywords are case-insensitive, including the names of Hive operators and functions. -In [Beeline]({{< ref "#beeline" >}}) or the [CLI]({{< ref "languagemanual-cli" >}}), use the commands below to show the latest documentation: +In [Beeline]({{% ref "#beeline" %}}) or the [CLI]({{% ref "languagemanual-cli" %}}), use the commands below to show the latest documentation: ``` SHOW FUNCTIONS; @@ -20,7 +20,7 @@ DESCRIBE FUNCTION EXTENDED ; Bug for expression caching when UDF nested in UDF or function -When [hive.cache.expr.evaluation]({{< ref "#hive-cache-expr-evaluation" >}}) is set to true (which is the default) a UDF can give incorrect results if it is nested in another UDF or a Hive function. This bug affects releases 0.12.0, 0.13.0, and 0.13.1. Release 0.14.0 fixed the bug ([HIVE-7314](https://issues.apache.org/jira/browse/HIVE-7314)). +When [hive.cache.expr.evaluation]({{% ref "#hive-cache-expr-evaluation" %}}) is set to true (which is the default) a UDF can give incorrect results if it is nested in another UDF or a Hive function. This bug affects releases 0.12.0, 0.13.0, and 0.13.1. 
Release 0.14.0 fixed the bug ([HIVE-7314](https://issues.apache.org/jira/browse/HIVE-7314)). The problem relates to the UDF's implementation of the getDisplayString method, as [discussed](http://mail-archives.apache.org/mod_mbox/hive-user/201407.mbox/%3cCAEWg7THU-Pr1Dfv_A8VS3Uz5t3ZyJvL0f-bebg4Zb3hXkK-CGQ@mail.gmail.com%3e) in the Hive user mailing list. @@ -72,7 +72,7 @@ The following operators support various common arithmetic operations on the oper | A + B | All number types | Gives the result of adding A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. For example since every integer is a float, therefore float is a containing type of integer so the + operator on a float and an int will result in a float. | | A - B | All number types | Gives the result of subtracting B from A. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. | | A * B | All number types | Gives the result of multiplying A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. Note that if the multiplication causing overflow, you will have to cast one of the operators to a type higher in the type hierarchy. | -| A / B | All number types | Gives the result of dividing A by B. The result is a double type in most cases. When A and B are both integers, the result is a double type except when the [hive.compat]({{< ref "#hive-compat" >}}) configuration parameter is set to "0.13" or "latest" in which case the result is a decimal type. | +| A / B | All number types | Gives the result of dividing A by B. The result is a double type in most cases. When A and B are both integers, the result is a double type except when the [hive.compat]({{% ref "#hive-compat" %}}) configuration parameter is set to "0.13" or "latest" in which case the result is a decimal type. 
| | A DIV B | Integer types | Gives the integer part resulting from dividing A by B. E.g 17 div 3 results in 5. | | A % B | All number types | Gives the reminder resulting from dividing A by B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. | | A & B | All number types | Gives the result of bitwise AND of A and B. The type of the result is the same as the common parent(in the type hierarchy) of the types of the operands. | @@ -90,9 +90,9 @@ The following operators provide support for creating logical expressions. All of | A OR B | boolean | TRUE if either A or B or both are TRUE, FALSE OR NULL is NULL, otherwise FALSE. | | NOT A | boolean | TRUE if A is FALSE or NULL if A is NULL. Otherwise FALSE. | | ! A | boolean | Same as NOT A. | -| A IN (val1, val2, ...) | boolean | TRUE if A is equal to any of the values. As of Hive 0.13 [subqueries]({{< ref "languagemanual-subqueries" >}}) are supported in IN statements. | -| A NOT IN (val1, val2, ...) | boolean | TRUE if A is not equal to any of the values. As of Hive 0.13 [subqueries]({{< ref "languagemanual-subqueries" >}}) are supported in NOT IN statements. | -| [NOT] EXISTS (subquery) | | TRUE if the the subquery returns at least one row. Supported as of [Hive 0.13]({{< ref "languagemanual-subqueries" >}}). | +| A IN (val1, val2, ...) | boolean | TRUE if A is equal to any of the values. As of Hive 0.13 [subqueries]({{% ref "languagemanual-subqueries" %}}) are supported in IN statements. | +| A NOT IN (val1, val2, ...) | boolean | TRUE if A is not equal to any of the values. As of Hive 0.13 [subqueries]({{% ref "languagemanual-subqueries" %}}) are supported in NOT IN statements. | +| [NOT] EXISTS (subquery) | | TRUE if the the subquery returns at least one row. Supported as of [Hive 0.13]({{% ref "languagemanual-subqueries" %}}). 
| ### String Operators @@ -272,7 +272,7 @@ The following built-in String functions are supported in Hive: | int | character_length(string str) | Returns the number of UTF-8 characters contained in str (as of Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15979)). The function char_length is shorthand for this function. | | string | chr(bigint|double A) | Returns the ASCII character having the binary equivalent to A (as of Hive [1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13063)). If A is larger than 256 the result is equivalent to chr(A % 256). Example: select chr(88); returns "X". | | string | concat(string|binary A, string|binary B...) | Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings. | -| array\\> | context_ngrams(array\\>, array\, int K, int pf) | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | +| array\\> | context_ngrams(array\\>, array\, int K, int pf) | Returns the top-k contextual N-grams from a set of tokenized sentences, given a string of "context". See [StatisticsAndDataMining]({{% ref "statisticsanddatamining" %}}) for more information. | | string | concat_ws(string SEP, string A, string B...) | Like concat() above, but with custom separator SEP. | | string | concat_ws(string SEP, array\) | Like concat_ws() above, but taking an array of strings. (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2203)) | | string | decode(binary bin, string charset) | Decodes the first argument into a String using the provided character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null. 
(As of Hive [0.12.0](https://issues.apache.org/jira/browse/HIVE-2482).) | @@ -289,7 +289,7 @@ The following built-in String functions are supported in Hive: | string | lower(string A) lcase(string A) | Returns the string resulting from converting all characters of B to lower case. For example, lower('fOoBaR') results in 'foobar'. | | string | lpad(string str, int len, string pad) | Returns str, left-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters. In case of empty pad string, the return value is null. | | string | ltrim(string A) | Returns the string resulting from trimming spaces from the beginning(left hand side) of A. For example, ltrim(' foobar ') results in 'foobar '. | -| array\\> | ngrams(array\\>, int N, int K, int pf) | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{< ref "statisticsanddatamining" >}}) for more information. | +| array\\> | ngrams(array\\>, int N, int K, int pf) | Returns the top-k N-grams from a set of tokenized sentences, such as those returned by the sentences() UDAF. See [StatisticsAndDataMining]({{% ref "statisticsanddatamining" %}}) for more information. | | int | octet_length(string str) | Returns the number of octets required to hold the string str in UTF-8 encoding (since Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-15979)). Note that octet_length(str) can be larger than character_length(str). | | string | parse_url(string urlString, string partToExtract [, string keyToExtract]) | Returns the specified part from the URL. Valid values for partToExtract include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO. For example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST') returns 'facebook.com'. 
Also a value of a particular key in QUERY can be extracted by providing the key as the third argument, for example, parse_url('http://facebook.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1') returns 'v1'. | | string | printf(String format, Obj... args) | Returns the input formatted according do printf-style format strings (as of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-2695)). | @@ -344,7 +344,7 @@ The following built-in data masking functions are supported in Hive: | **Return Type** | **Name(Signature)** | **Description** | | --- | --- | --- | | varies | java_method(class, method[, arg1[, arg2..]]) | Synonym for `reflect`. (As of Hive [0.9.0](https://issues.apache.org/jira/browse/HIVE-1877).) | -| varies | reflect(class, method[, arg1[, arg2..]]) | Calls a Java method by matching the argument signature, using reflection. (As of Hive [0.7.0](https://issues.apache.org/jira/browse/HIVE-471).) See [Reflect (Generic) UDF]({{< ref "reflectudf" >}}) for examples. | +| varies | reflect(class, method[, arg1[, arg2..]]) | Calls a Java method by matching the argument signature, using reflection. (As of Hive [0.7.0](https://issues.apache.org/jira/browse/HIVE-471).) See [Reflect (Generic) UDF]({{% ref "reflectudf" %}}) for examples. | | int | hash(a1[, a2...]) | Returns a hash value of the arguments. (As of Hive 0.4.) | | string | current_user() | Returns current user name from the configured authenticator manager (as of Hive [1.2.0](https://issues.apache.org/jira/browse/HIVE-9143)). Could be the same as the user provided when connecting, but with some authentication managers (for example HadoopDefaultAuthenticator) it could be different. | | string | logged_in_user() | Returns current user name from the session state (as of Hive [2.2.0](https://issues.apache.org/jira/browse/HIVE-14100)). This is the username provided when connecting to Hive. 
| @@ -360,7 +360,7 @@ The following built-in data masking functions are supported in Hive: #### xpath -The following functions are described in [LanguageManual XPathUDF]({{< ref "languagemanual-xpathudf" >}}): +The following functions are described in [LanguageManual XPathUDF]({{% ref "languagemanual-xpathudf" %}}): * xpath, xpath_short, xpath_int, xpath_long, xpath_float, xpath_double, xpath_number, xpath_string @@ -420,7 +420,7 @@ The following built-in aggregate functions are supported in Hive: | **Return Type** | **Name(Signature)** | **Description** | | --- | --- | --- | -| BIGINT | count(*), count(expr), count(DISTINCT expr[, expr...]) | count(*) - Returns the total number of retrieved rows, including rows containing NULL values.count(expr) - Returns the number of rows for which the supplied expression is non-NULL.count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with [hive.optimize.distinct.rewrite]({{< ref "#hive-optimize-distinct-rewrite" >}}). | +| BIGINT | count(*), count(expr), count(DISTINCT expr[, expr...]) | count(*) - Returns the total number of retrieved rows, including rows containing NULL values.count(expr) - Returns the number of rows for which the supplied expression is non-NULL.count(DISTINCT expr[, expr]) - Returns the number of rows for which the supplied expression(s) are unique and non-NULL. Execution of this can be optimized with [hive.optimize.distinct.rewrite]({{% ref "#hive-optimize-distinct-rewrite" %}}). | | DOUBLE | sum(col), sum(DISTINCT col) | Returns the sum of the elements in the group or the sum of the distinct values of the column in the group. | | DOUBLE | avg(col), avg(DISTINCT col) | Returns the average of the elements in the group or the average of the distinct values of the column in the group. | | DOUBLE | min(col) | Returns the minimum of the column in the group. 
| @@ -566,9 +566,9 @@ Using the syntax "SELECT udtf(col) AS colAlias..." has a few limitations: * GROUP BY / CLUSTER BY / DISTRIBUTE BY / SORT BY is not supported + SELECT explode(adid_list) AS myCol ... GROUP BY myCol is not supported -Please see [LanguageManual LateralView]({{< ref "languagemanual-lateralview" >}}) for an alternative syntax that does not have these limitations. +Please see [LanguageManual LateralView]({{% ref "languagemanual-lateralview" %}}) for an alternative syntax that does not have these limitations. -Also see [Writing UDTFs]({{< ref "developerguide-udtf" >}}) if you want to create a custom UDTF. +Also see [Writing UDTFs]({{% ref "developerguide-udtf" %}}) if you want to create a custom UDTF. ### explode @@ -640,7 +640,7 @@ will produce: ### json_tuple -A new json_tuple() UDTF is introduced in Hive 0.7. It takes a set of names (keys) and a JSON string, and returns a tuple of values using one function. This is much more efficient than calling GET_JSON_OBJECT to retrieve more than one key from a single JSON string. In any case where a single JSON string would be parsed more than once, your query will be more efficient if you parse it once, which is what JSON_TUPLE is for. As JSON_TUPLE is a UDTF, you will need to use the [LATERAL VIEW]({{< ref "languagemanual-lateralview" >}}) syntax in order to achieve the same goal. +A new json_tuple() UDTF is introduced in Hive 0.7. It takes a set of names (keys) and a JSON string, and returns a tuple of values using one function. This is much more efficient than calling GET_JSON_OBJECT to retrieve more than one key from a single JSON string. In any case where a single JSON string would be parsed more than once, your query will be more efficient if you parse it once, which is what JSON_TUPLE is for. As JSON_TUPLE is a UDTF, you will need to use the [LATERAL VIEW]({{% ref "languagemanual-lateralview" %}}) syntax in order to achieve the same goal. 
For example, @@ -719,7 +719,7 @@ SELECT length(string_col) FROM table_name; would evaluate the length of each of the string_col's values in the map portion of the job. The side effect of the UDF being evaluated on the map-side is that you can't control the order of rows which get sent to the mapper. It is the same order in which the file split sent to the mapper gets deserialized. Any reduce side operation (such as SORT BY, ORDER BY, regular JOIN, etc.) would apply to the UDFs output as if it is just another column of the table. This is fine since the context of the UDF's evaluate method is meant to be one row at a time. -If you would like to control which rows get sent to the same UDF (and possibly in what order), you will have the urge to make the UDF evaluate during the reduce phase. This is achievable by making use of [DISTRIBUTE BY, DISTRIBUTE BY + SORT BY, CLUSTER BY]({{< ref "languagemanual-sortby" >}}). An example query would be: +If you would like to control which rows get sent to the same UDF (and possibly in what order), you will have the urge to make the UDF evaluate during the reduce phase. This is achievable by making use of [DISTRIBUTE BY, DISTRIBUTE BY + SORT BY, CLUSTER BY]({{% ref "languagemanual-sortby" %}}). An example query would be: ``` SELECT reducer_udf(my_col, distribute_col, sort_col) FROM @@ -727,11 +727,11 @@ SELECT reducer_udf(my_col, distribute_col, sort_col) FROM ``` -However, one could argue that the very premise of your requirement to control the set of rows sent to the same UDF is to do aggregation in that UDF. In such a case, using a User Defined Aggregate Function (UDAF) is a better choice. You can read more about writing a UDAF [here]({{< ref "genericudafcasestudy" >}}). Alternatively, you can user a custom reduce script to accomplish the same using [Hive's Transform functionality]({{< ref "languagemanual-transform" >}}). Both of these options would do aggregations on the reduce side. 
+However, one could argue that the very premise of your requirement to control the set of rows sent to the same UDF is to do aggregation in that UDF. In such a case, using a User Defined Aggregate Function (UDAF) is a better choice. You can read more about writing a UDAF [here]({{% ref "genericudafcasestudy" %}}). Alternatively, you can user a custom reduce script to accomplish the same using [Hive's Transform functionality]({{% ref "languagemanual-transform" %}}). Both of these options would do aggregations on the reduce side. ## Creating Custom UDFs -For information about how to create a custom UDF, see [Hive Plugins]({{< ref "hiveplugins" >}}) and [Create Function]({{< ref "#create-function" >}}). +For information about how to create a custom UDF, see [Hive Plugins]({{% ref "hiveplugins" %}}) and [Create Function]({{% ref "#create-function" %}}). diff --git a/content/docs/latest/language/languagemanual-union.md b/content/docs/latest/language/languagemanual-union.md index fe9716eb..d64762d7 100644 --- a/content/docs/latest/language/languagemanual-union.md +++ b/content/docs/latest/language/languagemanual-union.md @@ -53,7 +53,7 @@ For example, if we suppose there are two different tables that track which user #### Unions in DDL and Insert Statements -Unions can be used in views, inserts, and CTAS (create table as select) statements. A query can contain multiple UNION clauses, as shown in the [syntax]({{< ref "#syntax" >}}) above. +Unions can be used in views, inserts, and CTAS (create table as select) statements. A query can contain multiple UNION clauses, as shown in the [syntax]({{% ref "#syntax" %}}) above. 
#### Applying Subclauses diff --git a/content/docs/latest/language/languagemanual-variablesubstitution.md b/content/docs/latest/language/languagemanual-variablesubstitution.md index 13af1fe4..3c859427 100644 --- a/content/docs/latest/language/languagemanual-variablesubstitution.md +++ b/content/docs/latest/language/languagemanual-variablesubstitution.md @@ -45,7 +45,7 @@ Time taken: 0.754 seconds ``` -For general information about Hive command line options, see [Hive CLI]({{< ref "languagemanual-cli" >}}). +For general information about Hive command line options, see [Hive CLI]({{% ref "languagemanual-cli" %}}). Version information @@ -123,7 +123,7 @@ Hive substitutes the value for a variable when a query is constructed with the v # Disabling Variable Substitution -Variable substitution is on by default ([hive.variable.substitute]({{< ref "#hive-variable-substitute" >}})=true). If this causes an issue with an already existing script, disable it using the following command: +Variable substitution is on by default ([hive.variable.substitute]({{% ref "#hive-variable-substitute" %}})=true). If this causes an issue with an already existing script, disable it using the following command: ``` set hive.variable.substitute=false; diff --git a/content/docs/latest/language/languagemanual.md b/content/docs/latest/language/languagemanual.md index cf35e5b3..d5b04287 100644 --- a/content/docs/latest/language/languagemanual.md +++ b/content/docs/latest/language/languagemanual.md @@ -5,57 +5,57 @@ date: 2024-12-12 # Apache Hive : LanguageManual -This is the Hive Language Manual.  For other Hive documentation, see the Hive wiki's [Home page]({{< ref "#home-page" >}}). +This is the Hive Language Manual.  For other Hive documentation, see the Hive wiki's [Home page]({{% ref "#home-page" %}}). 
* Commands and CLIs - + [Commands]({{< ref "languagemanual-commands" >}}) - + [Hive CLI]({{< ref "languagemanual-cli" >}}) (old) - + [Beeline CLI]({{< ref "hiveserver2-clients" >}}) (new) - + [Variable Substitution]({{< ref "languagemanual-variablesubstitution" >}}) - + [HCatalog CLI]({{< ref "hcatalog-cli" >}}) + + [Commands]({{% ref "languagemanual-commands" %}}) + + [Hive CLI]({{% ref "languagemanual-cli" %}}) (old) + + [Beeline CLI]({{% ref "hiveserver2-clients" %}}) (new) + + [Variable Substitution]({{% ref "languagemanual-variablesubstitution" %}}) + + [HCatalog CLI]({{% ref "hcatalog-cli" %}}) * File Formats - + [Avro Files]({{< ref "avroserde" >}}) - + [ORC Files]({{< ref "languagemanual-orc" >}}) - + [Parquet]({{< ref "parquet" >}}) - + [Compressed Data Storage]({{< ref "compressedstorage" >}}) - + [LZO Compression]({{< ref "languagemanual-lzo" >}}) -* [Data Types]({{< ref "languagemanual-types" >}}) + + [Avro Files]({{% ref "avroserde" %}}) + + [ORC Files]({{% ref "languagemanual-orc" %}}) + + [Parquet]({{% ref "parquet" %}}) + + [Compressed Data Storage]({{% ref "compressedstorage" %}}) + + [LZO Compression]({{% ref "languagemanual-lzo" %}}) +* [Data Types]({{% ref "languagemanual-types" %}}) * Data Definition Statements - + [DDL Statements]({{< ref "languagemanual-ddl" >}}) - - [Bucketed Tables]({{< ref "languagemanual-ddl-bucketedtables" >}}) - - [Write Ordering (Type-Native & Z-Order)]({{< ref "writeordering" >}}) - + [Statistics (Analyze and Describe)]({{< ref "statsdev" >}}) - + [Indexes]({{< ref "languagemanual-indexing" >}}) - + [Archiving]({{< ref "languagemanual-archiving" >}}) + + [DDL Statements]({{% ref "languagemanual-ddl" %}}) + - [Bucketed Tables]({{% ref "languagemanual-ddl-bucketedtables" %}}) + - [Write Ordering (Type-Native & Z-Order)]({{% ref "writeordering" %}}) + + [Statistics (Analyze and Describe)]({{% ref "statsdev" %}}) + + [Indexes]({{% ref "languagemanual-indexing" %}}) + + [Archiving]({{% ref "languagemanual-archiving" %}}) * 
Data Manipulation Statements - + [DML: Load, Insert, Update, Delete]({{< ref "languagemanual-dml" >}}) - + [Import/Export]({{< ref "languagemanual-importexport" >}}) + + [DML: Load, Insert, Update, Delete]({{% ref "languagemanual-dml" %}}) + + [Import/Export]({{% ref "languagemanual-importexport" %}}) * Data Retrieval: Queries - + [Select]({{< ref "languagemanual-select" >}}) - - [Group By]({{< ref "languagemanual-groupby" >}}) - - [Sort/Distribute/Cluster/Order By]({{< ref "languagemanual-sortby" >}}) - - [Transform and Map-Reduce Scripts]({{< ref "languagemanual-transform" >}}) - - [Operators and User-Defined Functions (UDFs)]({{< ref "languagemanual-udf" >}}) - - [XPath-specific Functions]({{< ref "languagemanual-xpathudf" >}}) - - [Joins]({{< ref "languagemanual-joins" >}}) - - [Join Optimization]({{< ref "languagemanual-joinoptimization" >}}) - - [Union]({{< ref "languagemanual-union" >}}) - - [Lateral View]({{< ref "languagemanual-lateralview" >}}) - + [Sub Queries]({{< ref "languagemanual-subqueries" >}}) - + [Sampling]({{< ref "languagemanual-sampling" >}}) - + [Virtual Columns]({{< ref "languagemanual-virtualcolumns" >}}) - + [Windowing and Analytics Functions]({{< ref "languagemanual-windowingandanalytics" >}}) - + [Enhanced Aggregation, Cube, Grouping and Rollup]({{< ref "enhanced-aggregation-cube-grouping-and-rollup" >}}) + + [Select]({{% ref "languagemanual-select" %}}) + - [Group By]({{% ref "languagemanual-groupby" %}}) + - [Sort/Distribute/Cluster/Order By]({{% ref "languagemanual-sortby" %}}) + - [Transform and Map-Reduce Scripts]({{% ref "languagemanual-transform" %}}) + - [Operators and User-Defined Functions (UDFs)]({{% ref "languagemanual-udf" %}}) + - [XPath-specific Functions]({{% ref "languagemanual-xpathudf" %}}) + - [Joins]({{% ref "languagemanual-joins" %}}) + - [Join Optimization]({{% ref "languagemanual-joinoptimization" %}}) + - [Union]({{% ref "languagemanual-union" %}}) + - [Lateral View]({{% ref "languagemanual-lateralview" %}}) + + 
[Sub Queries]({{% ref "languagemanual-subqueries" %}}) + + [Sampling]({{% ref "languagemanual-sampling" %}}) + + [Virtual Columns]({{% ref "languagemanual-virtualcolumns" %}}) + + [Windowing and Analytics Functions]({{% ref "languagemanual-windowingandanalytics" %}}) + + [Enhanced Aggregation, Cube, Grouping and Rollup]({{% ref "enhanced-aggregation-cube-grouping-and-rollup" %}}) + Procedural Language:  [Hive HPL/SQL](/docs/latest/user/hive-hpl-sql) - + [Explain Execution Plan]({{< ref "languagemanual-explain" >}}) -* [Locks]({{< ref "locking" >}}) -* [Authorization]({{< ref "languagemanual-authorization" >}}) - + [Storage Based Authorization]({{< ref "storage-based-authorization-in-the-metastore-server" >}}) - + [SQL Standard Based Authorization]({{< ref "sql-standard-based-hive-authorization" >}}) - + [Hive deprecated authorization mode / Legacy Mode]({{< ref "hive-deprecated-authorization-mode" >}}) -* [Configuration Properties]({{< ref "configuration-properties" >}}) + + [Explain Execution Plan]({{% ref "languagemanual-explain" %}}) +* [Locks]({{% ref "locking" %}}) +* [Authorization]({{% ref "languagemanual-authorization" %}}) + + [Storage Based Authorization]({{% ref "storage-based-authorization-in-the-metastore-server" %}}) + + [SQL Standard Based Authorization]({{% ref "sql-standard-based-hive-authorization" %}}) + + [Hive deprecated authorization mode / Legacy Mode]({{% ref "hive-deprecated-authorization-mode" %}}) +* [Configuration Properties]({{% ref "configuration-properties" %}}) diff --git a/content/docs/latest/language/managed-vs--external-tables.md b/content/docs/latest/language/managed-vs--external-tables.md index 4001a588..09898bb5 100644 --- a/content/docs/latest/language/managed-vs--external-tables.md +++ b/content/docs/latest/language/managed-vs--external-tables.md @@ -18,9 +18,9 @@ Another consequence is that data is attached to the Hive entities. So, whenever For external tables Hive assumes that it does *not* manage the data. 
-Managed or external tables can be identified using the [DESCRIBE FORMATTED table_name]({{< ref "#describe-formatted-table_name" >}}) command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type. +Managed or external tables can be identified using the [DESCRIBE FORMATTED table_name]({{% ref "#describe-formatted-table_name" %}}) command, which will display either MANAGED_TABLE or EXTERNAL_TABLE depending on table type. -[Statistics]({{< ref "statsdev" >}}) can be managed on internal and external tables and partitions for query optimization.  +[Statistics]({{% ref "statsdev" %}}) can be managed on internal and external tables and partitions for query optimization.  ## Feature comparison @@ -35,13 +35,13 @@ This means that there are lots of features which are only available for one of t ## Managed tables -A managed table is stored under the [hive.metastore.warehouse.dir]({{< ref "#hive-metastore-warehouse-dir" >}}) path property, by default in a folder path similar to `/user/hive/warehouse/databasename.db/tablename/`. The default location can be overridden by the `location` property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration. +A managed table is stored under the [hive.metastore.warehouse.dir]({{% ref "#hive-metastore-warehouse-dir" %}}) path property, by default in a folder path similar to `/user/hive/warehouse/databasename.db/tablename/`. The default location can be overridden by the `location` property during table creation. If a managed table or partition is dropped, the data and metadata associated with that table or partition are deleted. If the PURGE option is not specified, the data is moved to a trash folder for a defined duration. Use managed tables when Hive should manage the lifecycle of the table, or when generating temporary tables. 
## External tables -An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an [MSCK REPAIR TABLE table_name]({{< ref "#msck-repair-table-table_name" >}}) statement can be used to refresh metadata information. +An external table describes the metadata / schema on external files. External table files can be accessed and managed by processes outside of Hive. External tables can access data stored in sources such as Azure Storage Volumes (ASV) or remote HDFS locations. If the structure or partitioning of an external table is changed, an [MSCK REPAIR TABLE table_name]({{% ref "#msck-repair-table-table_name" %}}) statement can be used to refresh metadata information. Use external tables when files are already present or in remote locations, and the files should remain even if the table is dropped. 
diff --git a/content/docs/latest/language/operatorsandfunctions.md b/content/docs/latest/language/operatorsandfunctions.md index ec6c8f19..eea9ec55 100644 --- a/content/docs/latest/language/operatorsandfunctions.md +++ b/content/docs/latest/language/operatorsandfunctions.md @@ -7,12 +7,12 @@ date: 2024-12-12 ## Hive Operators and Functions -* [Hive Plug-in Interfaces - User-Defined Functions and SerDes]({{< ref "hiveplugins" >}}) -* [Guide to Hive Operators and Functions]({{< ref "languagemanual-udf" >}}) +* [Hive Plug-in Interfaces - User-Defined Functions and SerDes]({{% ref "hiveplugins" %}}) +* [Guide to Hive Operators and Functions]({{% ref "languagemanual-udf" %}}) - + [Reflect UDF]({{< ref "reflectudf" >}}) - + [Generic UDAF Case Study]({{< ref "genericudafcasestudy" >}}) - + [Functions for Statistics and Data Mining]({{< ref "statisticsanddatamining" >}}) + + [Reflect UDF]({{% ref "reflectudf" %}}) + + [Generic UDAF Case Study]({{% ref "genericudafcasestudy" %}}) + + [Functions for Statistics and Data Mining]({{% ref "statisticsanddatamining" %}}) diff --git a/content/docs/latest/language/reflectudf.md b/content/docs/latest/language/reflectudf.md index 6341294a..8ce5981b 100644 --- a/content/docs/latest/language/reflectudf.md +++ b/content/docs/latest/language/reflectudf.md @@ -25,7 +25,7 @@ FROM src LIMIT 1; Version information -As of Hive 0.9.0, java_method() is a synonym for reflect(). See [Misc. Functions]({{< ref "#misc--functions" >}}) in Hive Operators and UDFs. +As of Hive 0.9.0, java_method() is a synonym for reflect(). See [Misc. Functions]({{% ref "#misc--functions" %}}) in Hive Operators and UDFs. Note that Reflect UDF is non-deterministic since there is no guarantee what a specific method will return given the same parameters. So be cautious when using Reflect on the WHERE clause because that may invalidate Predicate Pushdown optimization. 
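Note on the change itself: every hunk in this patch makes the same substitution. The angle-bracket form of the `ref` shortcode, `{{< ref "target" >}}`, becomes the percent form, `{{% ref "target" %}}`, and the quoted target is left unchanged. A minimal Python sketch of that rewrite is given below. It assumes each shortcode sits on a single line and uses double quotes, as in the hunks above. The name `convert_ref_shortcodes` is hypothetical, and nothing in the patch indicates how the change was actually produced.

```python
import re

# Matches the angle-bracket form of the ref shortcode, e.g. {{< ref "statsdev" >}},
# and captures the quoted target so it can be reused unchanged.
# Assumption: double-quoted target, whole shortcode on one line.
ANGLE_REF = re.compile(r'\{\{<\s*ref\s+"([^"]*)"\s*>\}\}')


def convert_ref_shortcodes(text: str) -> str:
    """Rewrite {{< ref "target" >}} as {{% ref "target" %}}; all other text is untouched."""
    return ANGLE_REF.sub(r'{{% ref "\1" %}}', text)


if __name__ == "__main__":
    sample = 'See [Hive CLI]({{< ref "languagemanual-cli" >}}) and [UDFs]({{< ref "#udfs" >}}).'
    print(convert_ref_shortcodes(sample))
```

Text that already uses the percent form does not match the pattern, so running the function twice over the same file gives the same result as running it once.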
diff --git a/content/docs/latest/language/sql-standard-based-hive-authorization.md b/content/docs/latest/language/sql-standard-based-hive-authorization.md index 9dfa86e6..a17bdbee 100644 --- a/content/docs/latest/language/sql-standard-based-hive-authorization.md +++ b/content/docs/latest/language/sql-standard-based-hive-authorization.md @@ -7,9 +7,9 @@ date: 2024-12-12 # Status of Hive Authorization before Hive 0.13 -The [default authorization in Hive]({{< ref "#default-authorization-in-hive" >}}) is not designed with the intent to protect against malicious users accessing data they should not be accessing. It only helps in preventing users from accidentally doing operations they are not supposed to do. It is also incomplete because it does not have authorization checks for many operations including the grant statement. The authorization checks happen during Hive query compilation. But as the user is allowed to execute dfs commands, user-defined functions and shell commands, it is possible to bypass the client security checks. +The [default authorization in Hive]({{% ref "#default-authorization-in-hive" %}}) is not designed with the intent to protect against malicious users accessing data they should not be accessing. It only helps in preventing users from accidentally doing operations they are not supposed to do. It is also incomplete because it does not have authorization checks for many operations including the grant statement. The authorization checks happen during Hive query compilation. But as the user is allowed to execute dfs commands, user-defined functions and shell commands, it is possible to bypass the client security checks. -Hive also has support for storage based authorization, which is commonly used to add authorization to metastore server API calls (see [Storage Based Authorization in the Metastore Server]({{< ref "#storage-based-authorization-in-the-metastore-server" >}})). As of Hive 0.12.0 it can be used on the client side as well. 
While it can protect the metastore against changes by malicious users, it does not support fine grained access control (column or row level). +Hive also has support for storage based authorization, which is commonly used to add authorization to metastore server API calls (see [Storage Based Authorization in the Metastore Server]({{% ref "#storage-based-authorization-in-the-metastore-server" %}})). As of Hive 0.12.0 it can be used on the client side as well. While it can protect the metastore against changes by malicious users, it does not support fine grained access control (column or row level). The default authorization model in Hive can be used to provide fine grained access control by creating views and granting access to views instead of the underlying tables. @@ -23,7 +23,7 @@ This authorization mode can be used in conjunction with storage based authorizat The goal of this work has been to comply with the SQL standard as far as possible, but there are deviations from the standard in the implementation. Some deviations were made to make it easier for existing Hive users to migrate to this authorization model, and some were made considering ease of use (in such cases we also looked at what many widely used databases do). -Under this authorization model, users who have access to the Hive CLI, HDFS commands, Pig command line, 'hadoop jar' command, etc., are considered privileged users. In an organization, it is typically only the teams that work on [ETL](http://en.wikipedia.org/wiki/Extract,_transform,_load) workloads that need such access. These tools don't access the data through HiveServer2, and as a result their access is not authorized through this model. For Hive CLI, Pig, and MapReduce users access to Hive tables can be controlled using [storage based authorization]({{< ref "hcatalog-authorization" >}}) enabled on the metastore server. 
+Under this authorization model, users who have access to the Hive CLI, HDFS commands, Pig command line, 'hadoop jar' command, etc., are considered privileged users. In an organization, it is typically only the teams that work on [ETL](http://en.wikipedia.org/wiki/Extract,_transform,_load) workloads that need such access. These tools don't access the data through HiveServer2, and as a result their access is not authorized through this model. For Hive CLI, Pig, and MapReduce users access to Hive tables can be controlled using [storage based authorization]({{% ref "hcatalog-authorization" %}}) enabled on the metastore server. Most users such as business analysts tend to use SQL and ODBC/JDBC through HiveServer2 and their access can be controlled using this authorization model. @@ -31,13 +31,13 @@ Most users such as business analysts tend to use SQL and ODBC/JDBC through HiveS Commands such as dfs, add, delete, compile, and reset are disabled when this authorization is enabled. -The set commands used to change Hive configuration are restricted to a smaller safe set. This is controlled using the [hive.security.authorization.sqlstd.confwhitelist]({{< ref "#hive-security-authorization-sqlstd-confwhitelist" >}}) configuration parameter. If this set needs to be customized, the HiveServer2 administrator can set a value for this configuration parameter in its hive-site.xml. +The set commands used to change Hive configuration are restricted to a smaller safe set. This is controlled using the [hive.security.authorization.sqlstd.confwhitelist]({{% ref "#hive-security-authorization-sqlstd-confwhitelist" %}}) configuration parameter. If this set needs to be customized, the HiveServer2 administrator can set a value for this configuration parameter in its hive-site.xml. Privileges to add or drop functions and macros are restricted to the **admin** role. 
To enable users to use functions, the ability to create [permanent functions](/docs/latest/language/languagemanual-ddl#create-function) has been added. A user in the **admin** role can run commands to create these functions, which all users can then use. -The Hive [transform clause]({{< ref "languagemanual-transform" >}}) is also disabled when this authorization is enabled. +The Hive [transform clause]({{% ref "languagemanual-transform" %}}) is also disabled when this authorization is enabled. ## Privileges @@ -63,7 +63,7 @@ For certain actions, the ownership of the object (table/view/database) determine The user who creates the table, view or database becomes its owner. In the case of tables and views, the owner gets all the privileges with grant option. -A role can also be the owner of a database. The "`[alter database]({{< ref "#alter-database" >}})`" command can be used to set the owner of a database to a role. +A role can also be the owner of a database. The "`[alter database]({{% ref "#alter-database" %}})`" command can be used to set the owner of a database to a role. ## Users and Roles @@ -118,7 +118,7 @@ Drops the given role. Only the **admin** role has privilege for this. SHOW CURRENT ROLES; ``` -Shows the list of the user's [current roles]({{< ref "#current-roles" >}}). All actions of the user are authorized by looking at the privileges of the user and all current roles of the user. +Shows the list of the user's [current roles]({{% ref "#current-roles" %}}). All actions of the user are authorized by looking at the privileges of the user and all current roles of the user. The default current roles has all roles for the user except for the **admin** role (even if the user belongs to the **admin** role as well). @@ -402,7 +402,7 @@ As of Hive 3.0.0 ([HIVE-12408](https://issues.apache.org/jira/browse/HIVE-12408) **Set the following in hive-site.xml:** * hive.server2.enable.doAs to false. 
-* hive.users.in.admin.role to the list of comma-separated users who need to be added to **admin** role. Note that a user who belongs to the **admin** role needs to run the "`[set role]({{< ref "#set-role" >}})`" command before getting the privileges of the **admin** role, as this role is not in current roles by default. +* hive.users.in.admin.role to the list of comma-separated users who need to be added to **admin** role. Note that a user who belongs to the **admin** role needs to run the "`[set role]({{% ref "#set-role" %}})`" command before getting the privileges of the **admin** role, as this role is not in current roles by default. **Start HiveServer2 with the following additional command-line options:** @@ -416,7 +416,7 @@ As of Hive 3.0.0 ([HIVE-12408](https://issues.apache.org/jira/browse/HIVE-12408) **Set the following in hive-site.xml:** * **hive.server2.enable.doAs** to false. -* **hive.users.in.admin.role** to the list of comma-separated users who need to be added to **admin** role. Note that a user who belongs to the **admin** role needs to run the "`[set role]({{< ref "#set-role" >}})`" command before getting the privileges of the **admin** role, as this role is not in current roles by default. +* **hive.users.in.admin.role** to the list of comma-separated users who need to be added to **admin** role. Note that a user who belongs to the **admin** role needs to run the "`[set role]({{% ref "#set-role" %}})`" command before getting the privileges of the **admin** role, as this role is not in current roles by default. * Add org.apache.hadoop.hive.ql.security.authorization.MetaStoreAuthzAPIAuthorizerEmbedOnly to **hive.security.metastore.authorization.manager**. (It takes a comma separated list, so you can add it along with StorageBasedAuthorization parameter, if you want to enable that as well). This setting disallows any of the authorization api calls to be invoked in a remote metastore. 
HiveServer2 can be configured to use embedded metastore, and that will allow it to invoke metastore authorization api. Hive cli and any other remote metastore users would be denied authorization when they try to make authorization api calls. This restricts the authorization api to privileged HiveServer2 process. You should also ensure that the metastore rdbms access is restricted to the metastore server and hiverserver2. * **hive.security.authorization.manager** to org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdConfOnlyAuthorizerFactory. This will ensure that any table or views created by hive-cli have default privileges granted for the owner. @@ -457,9 +457,9 @@ For information on the SQL standard for security see: ***Problem:***  My user name is in hive.users.in.admin.role in hive-site.xml, but I still get the error that user is not an **admin**. What could be wrong? -***Do This:***  Ensure that you have restarted HiveServer2 after a configuration change and that you have used the HiveServer2 command line options as described in [Configuration]({{< ref "#configuration" >}}) above. +***Do This:***  Ensure that you have restarted HiveServer2 after a configuration change and that you have used the HiveServer2 command line options as described in [Configuration]({{% ref "#configuration" %}}) above. -***Do This:***  Ensure that you have run a '`[set role]({{< ref "#set-role" >}}) admin;`' command to get the **admin** role. +***Do This:***  Ensure that you have run a '`[set role]({{% ref "#set-role" %}}) admin;`' command to get the **admin** role. 
## Attachments: diff --git a/content/docs/latest/overview-of-major-changes.md b/content/docs/latest/overview-of-major-changes.md index 546eba9f..b2076c9f 100644 --- a/content/docs/latest/overview-of-major-changes.md +++ b/content/docs/latest/overview-of-major-changes.md @@ -41,7 +41,7 @@ date: 2024-12-12 + API optimization (performance) + Dynamic leader election - + [External data sources support]({{< ref "data-connectors-in-hive" >}}) + + [External data sources support]({{% ref "data-connectors-in-hive" %}}) + HMS support for [Thrift over HTTP](https://issues.apache.org/jira/browse/HIVE-21456) + [JWT authentication](https://issues.apache.org/jira/browse/HIVE-26071) for Thrift over HTTP + [HMS metadata summary](https://issues.apache.org/jira/browse/HIVE-26435) @@ -97,7 +97,7 @@ date: 2024-12-12 + Support Hadoop-3.3.6 + Supports Tez 0.10.3 + Works with Aarch64 (ARM) - + New UDFs ([Hive UDFs]({{< ref "hive-udfs" >}})) + + New UDFs ([Hive UDFs]({{% ref "hive-udfs" %}})) + Deprecated Hive on MR & Removed Hive on Spark + Deprecated Hive CLI diff --git a/content/docs/latest/user/accumulointegration.md b/content/docs/latest/user/accumulointegration.md index 2d583aa3..1291ba21 100644 --- a/content/docs/latest/user/accumulointegration.md +++ b/content/docs/latest/user/accumulointegration.md @@ -101,7 +101,7 @@ Using index tables greatly improve performance of non-rowId predicate queries by | --- | --- | | **accumulo.indextable.name** | **(Required) The name of the index table in Accumulo.** | | **accumulo.indexed.columns** | (Optional) A comma separated list of hive columns to index, or * which indexes all columns (default: *) | -| **accumulo.index.rows.max** | (Optional) The maximum number of predicate values to scan from the index for each search predicate (default: 20000) *[See this note about this value]({{< ref "#see-this-note-about-this-value" >}})* | +| **accumulo.index.rows.max** | (Optional) The maximum number of predicate values to scan from the index for each 
search predicate (default: 20000) *[See this note about this value]({{% ref "#see-this-note-about-this-value" %}})* | | **accumulo.index.scanner** | (Optional) The index scanner implementation. (default: org.apache.hadoop.hive.accumulo.AccumuloDefaultIndexScanner) | The indexes are stored in the index table using the following format: diff --git a/content/docs/latest/user/authdev.md b/content/docs/latest/user/authdev.md index b70f90b2..4f243751 100644 --- a/content/docs/latest/user/authdev.md +++ b/content/docs/latest/user/authdev.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : AuthDev -This is the design document for the [original Hive authorization mode]({{< ref "hive-deprecated-authorization-mode" >}}). See [Authorization]({{< ref "languagemanual-authorization" >}}) for an overview of authorization modes, which include [storage based authorization]({{< ref "storage-based-authorization-in-the-metastore-server" >}}) and [SQL standards based authorization]({{< ref "sql-standard-based-hive-authorization" >}}). +This is the design document for the [original Hive authorization mode]({{% ref "hive-deprecated-authorization-mode" %}}). See [Authorization]({{% ref "languagemanual-authorization" %}}) for an overview of authorization modes, which include [storage based authorization]({{% ref "storage-based-authorization-in-the-metastore-server" %}}) and [SQL standards based authorization]({{% ref "sql-standard-based-hive-authorization" %}}). # 1. Privilege diff --git a/content/docs/latest/user/avroserde.md b/content/docs/latest/user/avroserde.md index e93df9e4..f67d488e 100644 --- a/content/docs/latest/user/avroserde.md +++ b/content/docs/latest/user/avroserde.md @@ -25,7 +25,7 @@ The AvroSerde allows users to read or write [Avro data](http://avro.apache.org/) * Has worked reliably against our most convoluted Avro schemas in our ETL process. 
* Starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-7446), columns can be added to an Avro backed Hive table using the [Alter Table](/docs/latest/language/languagemanual-ddl#addreplace-columns) statement. -For general information about SerDes, see [Hive SerDe]({{< ref "#hive-serde" >}}) in the Developer Guide. Also see [SerDe]({{< ref "serde" >}}) for details about input and output processing. +For general information about SerDes, see [Hive SerDe]({{% ref "#hive-serde" %}}) in the Developer Guide. Also see [SerDe]({{% ref "serde" %}}) for details about input and output processing. ### Requirements @@ -81,7 +81,7 @@ CREATE TABLE kst In this example we're pulling the source-of-truth reader schema from a webserver. Other options for providing the schema are described below. -Add the Avro files to the database (or create an external table) using standard Hive operations ([http://wiki.apache.org/hadoop/Hive/LanguageManual/DML]({{< ref "languagemanual-dml" >}})). +Add the Avro files to the database (or create an external table) using standard Hive operations ([http://wiki.apache.org/hadoop/Hive/LanguageManual/DML]({{% ref "languagemanual-dml" %}})). This table might result in a description as below: @@ -353,7 +353,7 @@ Hive does not provide an easy way to unset or remove a property. If you wish to ### HBase Integration -Hive 0.14.0 onward supports storing and querying Avro objects in HBase columns by making them visible as structs to Hive. This allows Hive to perform ad hoc analysis of HBase data which can be deeply structured. Prior to 0.14.0, the HBase Hive integration only supported querying primitive data types in columns. See [Avro Data Stored in HBase Columns]({{< ref "#avro-data-stored-in-hbase-columns" >}}) for details. +Hive 0.14.0 onward supports storing and querying Avro objects in HBase columns by making them visible as structs to Hive. This allows Hive to perform ad hoc analysis of HBase data which can be deeply structured. 
Prior to 0.14.0, the HBase Hive integration only supported querying primitive data types in columns. See [Avro Data Stored in HBase Columns]({{% ref "#avro-data-stored-in-hbase-columns" %}}) for details. ### If something goes wrong diff --git a/content/docs/latest/user/capture-lineage-info.md b/content/docs/latest/user/capture-lineage-info.md index a8ba7e86..141eee84 100644 --- a/content/docs/latest/user/capture-lineage-info.md +++ b/content/docs/latest/user/capture-lineage-info.md @@ -13,7 +13,7 @@ In Hive, lineage information is captured in the form of `LineageInfo` object. Th - org.apache.hadoop.hive.ql.hooks.LineageLogger - org.apache.atlas.hive.hook.HiveHook -To facilitate the capture of lineage information in a custom hook or in a use case where the [existing hooks]({{< ref "#existing-hooks" >}}) are not set in `hive.exec.post.hooks`, a new configuration `hive.lineage.hook.info.enabled` was introduced in [HIVE-24051](https://issues.apache.org/jira/browse/HIVE-24051). This configuration is set to `false` by default. +To facilitate the capture of lineage information in a custom hook or in a use case where the [existing hooks]({{% ref "#existing-hooks" %}}) are not set in `hive.exec.post.hooks`, a new configuration `hive.lineage.hook.info.enabled` was introduced in [HIVE-24051](https://issues.apache.org/jira/browse/HIVE-24051). This configuration is set to `false` by default. To provide filtering capability on query type in the lineage information, a new configuration `hive.lineage.hook.info.query.type` was introduced in [HIVE-28409](https://issues.apache.org/jira/browse/HIVE-28409), with default value as "_ALL_". Users can tune the configuration accordingly to capture lineage information only for the required query types. In [HIVE-28409](https://issues.apache.org/jira/browse/HIVE-28409), the previously introduced configuration `hive.lineage.hook.info.enabled` was marked as deprecated. 
@@ -28,18 +28,18 @@ hive.lineage.hook.info.query.type=NONE -- will not ```` Previously, to capture lineage information, users has 2 ways: -1. Set any of the above mentioned [existing hooks]({{< ref "#existing-hooks" >}}) in `hive.exec.post.hooks` configuration. +1. Set any of the above mentioned [existing hooks]({{% ref "#existing-hooks" %}}) in `hive.exec.post.hooks` configuration. 2. Set `hive.lineage.hook.info.enabled` as true in cluster and restart HiveServer2 service. (Valid since Hive-4.0.0 release). -**NOTE**: Just by enabling `hive.lineage.hook.info.enabled`, lineage information for "Create View" query type won't be captured, user has to set the [existing hooks]({{< ref "#existing-hooks" >}}) in `hive.exec.post.hooks` along with their custom hook class name. +**NOTE**: Just by enabling `hive.lineage.hook.info.enabled`, lineage information for "Create View" query type won't be captured, user has to set the [existing hooks]({{% ref "#existing-hooks" %}}) in `hive.exec.post.hooks` along with their custom hook class name. ## Changes done in [HIVE-28768](https://issues.apache.org/jira/browse/HIVE-28768) -The hardcoded values of the [existing hooks]({{< ref "#existing-hooks" >}}) that capture lineage information in `SemanticAnalyzer` and `Optimizer` code has been removed and to determine, whether lineage information should be captured or not, the value of `hive.lineage.hook.info.query.type` configuration is checked. **The default value of `hive.lineage.hook.info.query.type` has been set to "_NONE_".** +The hardcoded values of the [existing hooks]({{% ref "#existing-hooks" %}}) that capture lineage information in `SemanticAnalyzer` and `Optimizer` code has been removed and to determine, whether lineage information should be captured or not, the value of `hive.lineage.hook.info.query.type` configuration is checked. 
**The default value of `hive.lineage.hook.info.query.type` has been set to "_NONE_".** ## Implications of [HIVE-28768](https://issues.apache.org/jira/browse/HIVE-28768) on users -1. Users migrating directly from Hive-3.x to HIVE-4.1.0 **will observe breaking changes** in the way lineage information is captured. Setting `hive.exec.post.hooks` to any of the [existing hooks]({{< ref "#existing-hooks" >}}) will not capture lineage information anymore. Users will have to make use of `hive.lineage.hook.info.query.type` configuration to capture lineage information. +1. Users migrating directly from Hive-3.x to HIVE-4.1.0 **will observe breaking changes** in the way lineage information is captured. Setting `hive.exec.post.hooks` to any of the [existing hooks]({{% ref "#existing-hooks" %}}) will not capture lineage information anymore. Users will have to make use of `hive.lineage.hook.info.query.type` configuration to capture lineage information. 2. Users migrating from Hive-4.0.x to Hive-4.1.0 who don't have `hive.lineage.hook.info.enabled` set to true, **will also observe breaking changes** in the way lineage information is captured. *** diff --git a/content/docs/latest/user/compressedstorage.md b/content/docs/latest/user/compressedstorage.md index f72f873a..15022d83 100644 --- a/content/docs/latest/user/compressedstorage.md +++ b/content/docs/latest/user/compressedstorage.md @@ -42,7 +42,7 @@ The value for io.seqfile.compression.type determines how the compression is perf ### LZO Compression -See [LZO Compression]({{< ref "languagemanual-lzo" >}}) for information about using LZO with Hive. +See [LZO Compression]({{% ref "languagemanual-lzo" %}}) for information about using LZO with Hive. 
diff --git a/content/docs/latest/user/configuration-properties.md b/content/docs/latest/user/configuration-properties.md index b31f8272..67b63be3 100644 --- a/content/docs/latest/user/configuration-properties.md +++ b/content/docs/latest/user/configuration-properties.md @@ -9,11 +9,11 @@ This document describes the Hive user configuration properties (sometimes called The canonical list of configuration properties is managed in the `HiveConf` Java class, so refer to the `HiveConf.java` file for a complete list of configuration properties available in your Hive release. -For information about how to use these configuration properties, see [Configuring Hive]({{< ref "#configuring-hive" >}}). That document also describes administrative configuration properties for setting up Hive in the [Configuration Variables]({{< ref "#configuration-variables" >}}) section. [Hive Metastore Administration](/docs/latest/admin/adminmanual-metastore-administration) describes additional configuration properties for the metastore. +For information about how to use these configuration properties, see [Configuring Hive]({{% ref "#configuring-hive" %}}). That document also describes administrative configuration properties for setting up Hive in the [Configuration Variables]({{% ref "#configuration-variables" %}}) section. [Hive Metastore Administration](/docs/latest/admin/adminmanual-metastore-administration) describes additional configuration properties for the metastore. Version information -As of Hive 0.14.0 ( [HIVE-7211](https://issues.apache.org/jira/browse/HIVE-7211) ), a configuration name that starts with "hive." is regarded as a Hive system property. With the [hive.conf.validation]({{< ref "#hiveconfvalidation" >}}) option true (default), any attempts to set a configuration property that starts with "hive." which is not registered to the Hive system will throw an exception. 
+As of Hive 0.14.0 ( [HIVE-7211](https://issues.apache.org/jira/browse/HIVE-7211) ), a configuration name that starts with "hive." is regarded as a Hive system property. With the [hive.conf.validation]({{% ref "#hiveconfvalidation" %}}) option true (default), any attempts to set a configuration property that starts with "hive." which is not registered to the Hive system will throw an exception. ## Query and DDL Execution @@ -22,11 +22,11 @@ As of Hive 0.14.0 ( [HIVE-7211](https://issues.apache.org/jira/browse/HIVE-7211) * Default Value: `mr` (deprecated in Hive 2.0.0 – see below) * Added In: Hive 0.13.0 with [HIVE-6103](https://issues.apache.org/jira/browse/HIVE-6103) and [HIVE-6098](https://issues.apache.org/jira/browse/HIVE-6098) -Chooses execution engine. Options are: `mr` (Map Reduce, default), `tez` ([Tez]({{< ref "hive-on-tez" >}}) execution, for Hadoop 2 only), or `spark` ([Spark]({{< ref "hive-on-spark" >}}) execution, for Hive 1.1.0 onward). +Chooses execution engine. Options are: `mr` (Map Reduce, default), `tez` ([Tez]({{% ref "hive-on-tez" %}}) execution, for Hadoop 2 only), or `spark` ([Spark]({{% ref "hive-on-spark" %}}) execution, for Hive 1.1.0 onward). While `mr` remains the default engine for historical reasons, it is itself a historical engine and is deprecated in the Hive 2 line ([HIVE-12300](https://issues.apache.org/jira/browse/HIVE-12300)). It may be removed without further warning. -See [Hive on Tez]({{< ref "hive-on-tez" >}}) and [Hive on Spark]({{< ref "hive-on-spark-getting-started" >}}) for more information, and see the [Tez section]({{< ref "#tez-section" >}}) and the [Spark section]({{< ref "#spark-section" >}}) below for their configuration properties. +See [Hive on Tez]({{% ref "hive-on-tez" %}}) and [Hive on Spark]({{% ref "hive-on-spark-getting-started" %}}) for more information, and see the [Tez section]({{% ref "#tez-section" %}}) and the [Spark section]({{% ref "#spark-section" %}}) below for their configuration properties. 
##### hive.execution.mode @@ -56,7 +56,7 @@ Size per reducer. The default in Hive 0.14.0 and earlier is 1 GB, that is, if th * Default Value: `999` prior to Hive 0.14.0; `1009`  in Hive 0.14.0 and later * Added In: Hive 0.2.0; default changed in 0.14.0 with [HIVE-7158](https://issues.apache.org/jira/browse/HIVE-7158) (and [HIVE-7917](https://issues.apache.org/jira/browse/HIVE-7917)) -Maximum number of reducers that will be used. If the one specified in the configuration property **[mapred.reduce.tasks]({{< ref "#mapredreducetasks" >}})** is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers. +Maximum number of reducers that will be used. If the one specified in the configuration property **[mapred.reduce.tasks]({{% ref "#mapredreducetasks" %}})** is negative, Hive will use this as the maximum number of reducers when automatically determining the number of reducers. ##### hive.jar.path @@ -77,7 +77,7 @@ The location of the plugin jars that contain implementations of user defined fu * Default Value: (empty) * Added In: Hive 0.14.0 with [HIVE-7553](https://issues.apache.org/jira/browse/HIVE-7553) -The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed, or updated) by executing the [Beeline reload command]({{< ref "#beeline-reload-command" >}}) without having to restart HiveServer2. These jars can be  used just like the auxiliary classes in [**hive.aux.jars.path**]({{< ref "#**hive-aux-jars-path**" >}}) for creating UDFs or SerDes. +The locations of the plugin jars, which can be comma-separated folders or jars. They can be renewed (added, removed, or updated) by executing the [Beeline reload command]({{% ref "#beeline-reload-command" %}}) without having to restart HiveServer2. These jars can be  used just like the auxiliary classes in [**hive.aux.jars.path**]({{% ref "#**hive-aux-jars-path**" %}}) for creating UDFs or SerDes. 
##### hive.exec.scratchdir @@ -86,23 +86,23 @@ The locations of the plugin jars, which can be comma-separated folders or jars. Scratch space for Hive jobs. This directory is used by Hive to store the plans for different map/reduce stages for the query as well as to stored the intermediate outputs of these stages. -*Hive 0.14.0 and later:*  HDFS root scratch directory for Hive jobs, which gets created with write all (733) permission.  For each connecting user, an HDFS scratch directory ${**hive.exec.scratchdir**}/\ is created  with ${ **[hive.scratch.dir.permission]({{< ref "#hivescratchdirpermission" >}})** }. +*Hive 0.14.0 and later:*  HDFS root scratch directory for Hive jobs, which gets created with write all (733) permission.  For each connecting user, an HDFS scratch directory ${**hive.exec.scratchdir**}/\ is created  with ${ **[hive.scratch.dir.permission]({{% ref "#hivescratchdirpermission" %}})** }. -Also see  [**hive.start.cleanup.scratchdir**]({{< ref "#**hive-start-cleanup-scratchdir**" >}}) and **[hive.scratchdir.lock]({{< ref "#hivescratchdirlock" >}})** .  When running Hive in local mode, see  [**hive.exec.local.scratchdir**]({{< ref "#**hive-exec-local-scratchdir**" >}}). +Also see  [**hive.start.cleanup.scratchdir**]({{% ref "#**hive-start-cleanup-scratchdir**" %}}) and **[hive.scratchdir.lock]({{% ref "#hivescratchdirlock" %}})** .  When running Hive in local mode, see  [**hive.exec.local.scratchdir**]({{% ref "#**hive-exec-local-scratchdir**" %}}). ##### hive.scratch.dir.permission * Default Value: `700` * Added In: Hive 0.12.0 with [HIVE-4487](https://issues.apache.org/jira/browse/HIVE-4487) -The permission for the user-specific scratch directories that get created in the root scratch directory. (See [**hive.exec.scratchdir**]({{< ref "#**hive-exec-scratchdir**" >}}).) +The permission for the user-specific scratch directories that get created in the root scratch directory. (See [**hive.exec.scratchdir**]({{% ref "#**hive-exec-scratchdir**" %}}).) 
##### hive.exec.local.scratchdir * Default Value: `/tmp/${user.name`} * Added In: Hive 0.10.0 with [HIVE-1577](https://issues.apache.org/jira/browse/HIVE-1577) -Scratch space for Hive jobs when Hive runs in local mode.  Also see [**hive.exec.scratchdir**]({{< ref "#**hive-exec-scratchdir**" >}}). +Scratch space for Hive jobs when Hive runs in local mode.  Also see [**hive.exec.scratchdir**]({{% ref "#**hive-exec-scratchdir**" %}}). ##### hive.hadoop.supports.splittable.combineinputformat @@ -197,7 +197,7 @@ Whether to remove an extra join with sq_count_check UDF for scalar subqueries wi * Default Value: `false` * Added In: Hive 0.8.0 with [HIVE-2056](https://issues.apache.org/jira/browse/HIVE-2056) -* Removed In: Hive 0.9.0 by [HIVE-2621](https://issues.apache.org/jira/browse/HIVE-2621) (see **[hive.multigroupby.singlereducer]({{< ref "#hivemultigroupbysinglereducer" >}})** ) +* Removed In: Hive 0.9.0 by [HIVE-2621](https://issues.apache.org/jira/browse/HIVE-2621) (see **[hive.multigroupby.singlereducer]({{% ref "#hivemultigroupbysinglereducer" %}})** ) Whether to optimize multi group by query to generate a single M/R job plan. If the multi group by query has common group by keys, it will be optimized to generate a single M/R job. (This configuration property was removed in release 0.9.0.) @@ -223,7 +223,7 @@ Whether to enable column pruner. (This configuration property was removed in rel Whether to enable automatic use of indexes. -Note:  See [Indexing]({{< ref "#indexing" >}}) for more configuration properties related to Hive indexes. +Note:  See [Indexing]({{% ref "#indexing" %}}) for more configuration properties related to Hive indexes. ##### hive.optimize.ppd @@ -232,7 +232,7 @@ Note:  See [Indexing]({{< ref "#indexing" >}}) for more configuration propertie Whether to enable predicate pushdown (PPD).  -Note: Turn on  **[hive.optimize.index.filter]({{< ref "#hiveoptimizeindexfilter" >}})** as well to use file format specific indexes with PPD. 
+Note: Turn on  **[hive.optimize.index.filter]({{% ref "#hiveoptimizeindexfilter" %}})** as well to use file format specific indexes with PPD. ##### hive.optimize.ppd.storage @@ -274,7 +274,7 @@ should be cached in memory. ##### hive.mapjoin.bucket.cache.size * Default Value: `100` -* Added In: Hive 0.5.0 (replaced by  **[hive.smbjoin.cache.rows]({{< ref "#hivesmbjoincacherows" >}})** in Hive 0.12.0) +* Added In: Hive 0.5.0 (replaced by  **[hive.smbjoin.cache.rows]({{% ref "#hivesmbjoincacherows" %}})** in Hive 0.12.0) How many values in each key in the map-joined table should be cached in memory. @@ -324,7 +324,7 @@ Whether Hive ignores the mapjoin hint. ##### hive.smbjoin.cache.rows * Default Value: `10000` -* Added In: Hive 0.12.0 with [HIVE-4440](https://issues.apache.org/jira/browse/HIVE-4440) (replaces **[hive.mapjoin.bucket.cache.size]({{< ref "#hivemapjoinbucketcachesize" >}})** ) +* Added In: Hive 0.12.0 with [HIVE-4440](https://issues.apache.org/jira/browse/HIVE-4440) (replaces **[hive.mapjoin.bucket.cache.size]({{% ref "#hivemapjoinbucketcachesize" %}})** ) How many rows with the same key value should be cached in memory per sort-merge-bucket joined table. @@ -341,14 +341,14 @@ Whether a MapJoin hashtable should use optimized (size-wise) keys, allowing the * Default Value: `true` * Added In: Hive 0.14.0 with [HIVE-6430](https://issues.apache.org/jira/browse/HIVE-6430) -Whether Hive should use a memory-optimized hash table for MapJoin. Only works on [Tez]({{< ref "#tez" >}}) and [Spark]({{< ref "#spark" >}}), because memory-optimized hash table cannot be serialized. (Spark is supported starting from Hive 1.3.0, with [HIVE-11180](https://issues.apache.org/jira/browse/HIVE-11180).) +Whether Hive should use a memory-optimized hash table for MapJoin. Only works on [Tez]({{% ref "#tez" %}}) and [Spark]({{% ref "#spark" %}}), because memory-optimized hash table cannot be serialized. 
(Spark is supported starting from Hive 1.3.0, with [HIVE-11180](https://issues.apache.org/jira/browse/HIVE-11180).) ##### hive.mapjoin.optimized.hashtable.wbsize * Default Value: `10485760 (10 * 1024 * 1024)` * Added In: Hive 0.14.0 with [HIVE-6430](https://issues.apache.org/jira/browse/HIVE-6430) -Optimized hashtable (see **[hive.mapjoin.optimized.hashtable]({{< ref "#hivemapjoinoptimizedhashtable" >}})** ) uses a chain of buffers to store data. This is one buffer size. Hashtable may be slightly faster if this is larger, but for small joins unnecessary memory will be allocated and then trimmed. +Optimized hashtable (see **[hive.mapjoin.optimized.hashtable]({{% ref "#hivemapjoinoptimizedhashtable" %}})** ) uses a chain of buffers to store data. This is one buffer size. Hashtable may be slightly faster if this is larger, but for small joins unnecessary memory will be allocated and then trimmed. ##### hive.mapjoin.lazy.hashtable @@ -363,14 +363,14 @@ Whether a MapJoin hashtable should deserialize values on demand. Depending on ho * Default Value: `100000` * Added In: Hive 0.7.0 with [HIVE-1642](https://issues.apache.org/jira/browse/HIVE-1642) -Initial capacity of mapjoin hashtable if statistics are absent, or if **[hive.hashtable.key.count.adjustment]({{< ref "#hivehashtablekeycountadjustment" >}})** is set to 0. +Initial capacity of mapjoin hashtable if statistics are absent, or if **[hive.hashtable.key.count.adjustment]({{% ref "#hivehashtablekeycountadjustment" %}})** is set to 0. ##### hive.hashtable.key.count.adjustment * Default Value: `1.0` * Added In: Hive 0.14.0 with [HIVE-7616](https://issues.apache.org/jira/browse/HIVE-7616) -Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate  of the number of keys is divided by this value. If the value is 0, statistics are not used  and **[hive.hashtable.initialCapacity]({{< ref "#hivehashtableinitialcapacity" >}})** is used instead. 
+Adjustment to mapjoin hashtable size derived from table and column statistics; the estimate  of the number of keys is divided by this value. If the value is 0, statistics are not used  and **[hive.hashtable.initialCapacity]({{% ref "#hivehashtableinitialcapacity" %}})** is used instead. ##### hive.hashtable.loadfactor @@ -395,7 +395,7 @@ In the process of Mapjoin, the key/value will be held in the hashtable. This val * Default Value: `false` * Added In: Hive 0.6.0 -Whether to enable skew join optimization.  (Also see **[hive.optimize.skewjoin.compiletime]({{< ref "#hiveoptimizeskewjoincompiletime" >}})** .) +Whether to enable skew join optimization.  (Also see **[hive.optimize.skewjoin.compiletime]({{% ref "#hiveoptimizeskewjoincompiletime" %}})** .) ##### hive.skewjoin.key @@ -409,14 +409,14 @@ Determine if we get a skew key in join. If we see more than the specified number * Default Value: `10000` * Added In: Hive 0.6.0 -Determine the number of map task used in the follow up map join job for a skew join. It should be used together with **[hive.skewjoin.mapjoin.min.split]({{< ref "#hiveskewjoinmapjoinminsplit" >}})** to perform a fine grained control. +Determine the number of map task used in the follow up map join job for a skew join. It should be used together with **[hive.skewjoin.mapjoin.min.split]({{% ref "#hiveskewjoinmapjoinminsplit" %}})** to perform a fine grained control. ##### hive.skewjoin.mapjoin.min.split * Default Value: `33554432` * Added In: Hive 0.6.0 -Determine the number of map task at most used in the follow up map join job for a skew join by specifying the minimum split size. It should be used together with **[hive.skewjoin.mapjoin.map.tasks]({{< ref "#hiveskewjoinmapjoinmaptasks" >}})** to perform a fine grained control. +Determine the number of map task at most used in the follow up map join job for a skew join by specifying the minimum split size. 
It should be used together with **[hive.skewjoin.mapjoin.map.tasks]({{% ref "#hiveskewjoinmapjoinmaptasks" %}})** to perform a fine grained control. ##### hive.optimize.skewjoin.compiletime @@ -425,19 +425,19 @@ Determine the number of map task at most used in the follow up map join job for Whether to create a separate plan for skewed keys for the tables in the join. This is based on the skewed keys stored in the metadata. At compile time, the plan is broken into different joins: one for the skewed keys, and the other for the remaining keys. And then, a union is performed for the two joins generated above. So unless the same skewed key is present in both the joined tables, the join for the skewed key will be performed as a map-side join. -The main difference between this paramater and **[hive.optimize.skewjoin]({{< ref "#hiveoptimizeskewjoin" >}})** is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect. -Both **hive.optimize.skewjoin.compiletime** and **[hive.optimize.skewjoin]({{< ref "#hiveoptimizeskewjoin" >}})** should be set to true. (Ideally, **[hive.optimize.skewjoin]({{< ref "#hiveoptimizeskewjoin" >}})** should be renamed as ***hive.optimize.skewjoin.runtime*** , but for backward compatibility that has not been done.) +The main difference between this paramater and **[hive.optimize.skewjoin]({{% ref "#hiveoptimizeskewjoin" %}})** is that this parameter uses the skew information stored in the metastore to optimize the plan at compile time itself. If there is no skew information in the metadata, this parameter will not have any effect. +Both **hive.optimize.skewjoin.compiletime** and **[hive.optimize.skewjoin]({{% ref "#hiveoptimizeskewjoin" %}})** should be set to true. 
(Ideally, **[hive.optimize.skewjoin]({{% ref "#hiveoptimizeskewjoin" %}})** should be renamed as ***hive.optimize.skewjoin.runtime*** , but for backward compatibility that has not been done.) -If the skew information is correctly stored in the metadata, **hive.optimize.skewjoin.compiletime** will change the query plan to take care of it, and **[hive.optimize.skewjoin]({{< ref "#hiveoptimizeskewjoin" >}})** will be a no-op. +If the skew information is correctly stored in the metadata, **hive.optimize.skewjoin.compiletime** will change the query plan to take care of it, and **[hive.optimize.skewjoin]({{% ref "#hiveoptimizeskewjoin" %}})** will be a no-op. ##### hive.optimize.union.remove * Default Value: `false` * Added In: Hive 0.10.0 with [HIVE-3276](https://issues.apache.org/jira/browse/HIVE-3276) -Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when **[hive.optimize.skewjoin.compiletime]({{< ref "#hiveoptimizeskewjoincompiletime" >}})** is set to true, since an extra union is inserted. +Whether to remove the union and push the operators between union and the filesink above union. This avoids an extra scan of the output by union. This is independently useful for union queries, and especially useful when **[hive.optimize.skewjoin.compiletime]({{% ref "#hiveoptimizeskewjoincompiletime" %}})** is set to true, since an extra union is inserted. -The merge is triggered if either of **[hive.merge.mapfiles]({{< ref "#hivemergemapfiles" >}})** or **[hive.merge.mapredfiles]({{< ref "#hivemergemapredfiles" >}})** is set to true. If the user has set **[hive.merge.mapfiles]({{< ref "#hivemergemapfiles" >}})** to true and **[hive.merge.mapredfiles]({{< ref "#hivemergemapredfiles" >}})** to false, the idea was that the number of reducers are few, so the number of files anyway is small. 
However, with this optimization, we are increasing the number of files possibly by a big margin. So, we merge aggresively. +The merge is triggered if either of **[hive.merge.mapfiles]({{% ref "#hivemergemapfiles" %}})** or **[hive.merge.mapredfiles]({{% ref "#hivemergemapredfiles" %}})** is set to true. If the user has set **[hive.merge.mapfiles]({{% ref "#hivemergemapfiles" %}})** to true and **[hive.merge.mapredfiles]({{% ref "#hivemergemapredfiles" %}})** to false, the idea was that the number of reducers are few, so the number of files anyway is small. However, with this optimization, we are increasing the number of files possibly by a big margin. So, we merge aggresively. ##### hive.mapred.supports.subdirectories @@ -454,7 +454,7 @@ Whether the version of Hadoop which is running supports sub-directories for tabl + Hive 2.x: `strict` ([HIVE-12413](https://issues.apache.org/jira/browse/HIVE-12413)) * Added In: Hive 0.3.0 -The mode in which the Hive operations are being performed. In `strict` mode, some risky queries are not allowed to run. For example, full table scans are prevented (see [HIVE-10454](https://issues.apache.org/jira/browse/HIVE-10454)) and [ORDER BY]({{< ref "#order-by" >}}) requires a LIMIT clause. +The mode in which the Hive operations are being performed. In `strict` mode, some risky queries are not allowed to run. For example, full table scans are prevented (see [HIVE-10454](https://issues.apache.org/jira/browse/HIVE-10454)) and [ORDER BY]({{% ref "#order-by" %}}) requires a LIMIT clause. ##### hive.exec.script.maxerrsize @@ -489,11 +489,11 @@ Name of the environment variable that holds the unique script operator ID in the * Default Value: `hive.txn.valid.txns,hive.script.operator.env.blacklist` * Added In: Hive 0.14.0 with [HIVE-8341](https://issues.apache.org/jira/browse/HIVE-8341) -By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' 
(dot) converted to '_' (underscore)) and set as part of the script operator's environment.  However, some values can grow large or are not amenable to translation to environment variables.  This value gives a comma separated list of configuration values that will not be set in the environment when calling a script operator.  By default the valid [transaction]({{< ref "hive-transactions" >}}) list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable. +By default all values in the HiveConf object are converted to environment variables of the same name as the key (with '.' (dot) converted to '_' (underscore)) and set as part of the script operator's environment.  However, some values can grow large or are not amenable to translation to environment variables.  This value gives a comma separated list of configuration values that will not be set in the environment when calling a script operator.  By default the valid [transaction]({{% ref "hive-transactions" %}}) list is excluded, as it can grow large and is sometimes compressed, which does not translate well to an environment variable. ##### Also see: -* **[SerDes]({{< ref "#serdes" >}})** for more **hive.script.*** configuration properties +* **[SerDes]({{% ref "#serdes" %}})** for more **hive.script.*** configuration properties ##### hive.exec.compress.output * Default Value: `false` @@ -535,7 +535,7 @@ Whether to provide the row offset virtual column. * Added In: Hive 0.5.0 * Removed In: Hive 0.13.0 with [HIVE-4518](https://issues.apache.org/jira/browse/HIVE-4518) -Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with **[hive.exec.dynamic.partition]({{< ref "#hiveexecdynamicpartition" >}})** set to true. 
(This configuration property was removed in release 0.13.0.) +Whether Hive should periodically update task progress counters during execution. Enabling this allows task progress to be monitored more closely in the job tracker, but may impose a performance penalty. This flag is automatically set to true for jobs with **[hive.exec.dynamic.partition]({{% ref "#hiveexecdynamicpartition" %}})** set to true. (This configuration property was removed in release 0.13.0.) ##### hive.counters.group.name @@ -672,9 +672,9 @@ The interval with which to poll the JobTracker for the counters the running job. + Hive 2.x: removed, which effectively makes it always true ([HIVE-12331](https://issues.apache.org/jira/browse/HIVE-12331)) * Added In: Hive 0.6.0 -Whether [bucketing]({{< ref "languagemanual-ddl-bucketedtables" >}}) is enforced. If `true`, while inserting into the table, bucketing is enforced. +Whether [bucketing]({{% ref "languagemanual-ddl-bucketedtables" %}}) is enforced. If `true`, while inserting into the table, bucketing is enforced. -Set to `true` to support [INSERT ... VALUES, UPDATE, and DELETE]({{< ref "hive-transactions" >}}) transactions in Hive 0.14.0 and 1.x.x. For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +Set to `true` to support [INSERT ... VALUES, UPDATE, and DELETE]({{% ref "hive-transactions" %}}) transactions in Hive 0.14.0 and 1.x.x. For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . ##### hive.enforce.sorting @@ -691,11 +691,11 @@ Whether sorting is enforced. 
If true, while inserting into the table, sorting is * Default Value: `true` * Added In: Hive 0.11.0 with [HIVE-4240](https://issues.apache.org/jira/browse/HIVE-4240) -If **[hive.enforce.bucketing]({{< ref "#hiveenforcebucketing" >}})** or **[hive.enforce.sorting]({{< ref "#hiveenforcesorting" >}})** is true, don't create a reducer for enforcing bucketing/sorting for queries of the form: +If **[hive.enforce.bucketing]({{% ref "#hiveenforcebucketing" %}})** or **[hive.enforce.sorting]({{% ref "#hiveenforcesorting" %}})** is true, don't create a reducer for enforcing bucketing/sorting for queries of the form: `insert overwrite table T2 select * from T1;` -where T1 and T2 are bucketed/sorted by the same keys into the same number of buckets. (In Hive 2.0.0 and later, this parameter does not depend on **[hive.enforce.bucketing]({{< ref "#hiveenforcebucketing" >}})**  or  **[hive.enforce.sorting]({{< ref "#hiveenforcesorting" >}})** .) +where T1 and T2 are bucketed/sorted by the same keys into the same number of buckets. (In Hive 2.0.0 and later, this parameter does not depend on **[hive.enforce.bucketing]({{% ref "#hiveenforcebucketing" %}})**  or  **[hive.enforce.sorting]({{% ref "#hiveenforcesorting" %}})** .) ##### hive.optimize.reducededuplication @@ -716,28 +716,28 @@ Reduce deduplication merges two RSs (*reduce sink operators*) by moving key/part * Default Value: `false` * Added In: Hive 0.12.0 with [HIVE-2206](https://issues.apache.org/jira/browse/HIVE-2206) -Exploit intra-query correlations. For details see the [Correlation Optimizer]({{< ref "correlation-optimizer" >}}) design document. +Exploit intra-query correlations. For details see the [Correlation Optimizer]({{% ref "correlation-optimizer" %}}) design document. 
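As a rough illustration of the kind of query this property targets, the sketch below joins a table with an aggregate of itself on the same column. The table `t1` and its columns are hypothetical and not taken from the design document. Because the inner `GROUP BY` and the outer `JOIN` both shuffle on `key`, the optimizer may be able to evaluate them in fewer MapReduce jobs when the property is on; whether it does depends on the Hive version and the plan.

```sql
-- Hypothetical table: t1(key STRING, value STRING)
SET hive.optimize.correlation=true;   -- default is false

-- The aggregation and the join are both partitioned on `key`,
-- which is the intra-query correlation the optimizer looks for.
SELECT a.key, a.value, b.cnt
FROM t1 a
JOIN (SELECT key, count(*) AS cnt FROM t1 GROUP BY key) b
  ON (a.key = b.key);
```

Comparing `EXPLAIN` output with the property set to `true` and to `false` is one way to check whether the plan actually changed.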
##### hive.optimize.limittranspose * Default Value: `false` * Added In: Hive 2.0.0 with [HIVE-11684](https://issues.apache.org/jira/browse/HIVE-11684), modified by [HIVE-11775](https://issues.apache.org/jira/browse/HIVE-11775) -Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in [**hive.optimize.limittranspose.reductionpercentage**]({{< ref "#**hive-optimize-limittranspose-reductionpercentage**" >}}) and [**hive.optimize.limittranspose.reductiontuples**]({{< ref "#**hive-optimize-limittranspose-reductiontuples**" >}})), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too. +Whether to push a limit through left/right outer join or union. If the value is true and the size of the outer input is reduced enough (as specified in [**hive.optimize.limittranspose.reductionpercentage**]({{% ref "#**hive-optimize-limittranspose-reductionpercentage**" %}}) and [**hive.optimize.limittranspose.reductiontuples**]({{% ref "#**hive-optimize-limittranspose-reductiontuples**" %}})), the limit is pushed to the outer input or union; to remain semantically correct, the limit is kept on top of the join or the union too. ##### hive.optimize.limittranspose.reductionpercentage * Default Value: `1.0` * Added In: Hive 2.0.0 with [HIVE-11684](https://issues.apache.org/jira/browse/HIVE-11684), modified by [HIVE-11775](https://issues.apache.org/jira/browse/HIVE-11775) -When [**hive.optimize.limittranspose**]({{< ref "#**hive-optimize-limittranspose**" >}}) is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. 
+When [**hive.optimize.limittranspose**]({{% ref "#**hive-optimize-limittranspose**" %}}) is true, this variable specifies the minimal percentage (fractional) reduction of the size of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. ##### hive.optimize.limittranspose.reductiontuples * Default Value: `0` * Added In: Hive 2.0.0 with [HIVE-11684](https://issues.apache.org/jira/browse/HIVE-11684), modified by [HIVE-11775](https://issues.apache.org/jira/browse/HIVE-11775) -When [**hive.optimize.limittranspose**]({{< ref "#**hive-optimize-limittranspose**" >}}) is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. +When [**hive.optimize.limittranspose**]({{% ref "#**hive-optimize-limittranspose**" %}}) is true, this variable specifies the minimal reduction in the number of tuples of the outer input of the join or input of the union that the optimizer should get in order to apply the rule. ##### hive.optimize.filter.stats.reduction @@ -757,10 +757,10 @@ When enabled, dynamic partitioning column will be globally sorted. This way we c ##### hive.cbo.enable -* Default Value: `false` in Hive 0.14.*; `true` in Hive [1.1.0]({{< ref "#1-1-0" >}}) and later ( [HIVE-8395](https://issues.apache.org/jira/browse/HIVE-8395) ) +* Default Value: `false` in Hive 0.14.*; `true` in Hive [1.1.0]({{% ref "#1-1-0" %}}) and later ( [HIVE-8395](https://issues.apache.org/jira/browse/HIVE-8395) ) * Added In: Hive 0.14.0 with [HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775) and [HIVE-7946](https://issues.apache.org/jira/browse/HIVE-7946) -When true, the [cost based optimizer]({{< ref "cost-based-optimization-in-hive" >}}), which uses the Calcite framework, will be enabled. 
+When true, the [cost based optimizer]({{% ref "cost-based-optimization-in-hive" %}}), which uses the Calcite framework, will be enabled. ##### hive.cbo.fallback.strategy @@ -800,30 +800,30 @@ When true, this optimization will try to not scan any rows from tables which can * Default Value: `false` prior to Hive 0.9.0; `true` in Hive 0.9.0 and later ([HIVE-2835](https://issues.apache.org/jira/browse/HIVE-2835)) * Added In: Hive 0.6.0 -Whether or not to allow [dynamic partitions]({{< ref "#dynamic-partitions" >}}) in DML/DDL. +Whether or not to allow [dynamic partitions]({{% ref "#dynamic-partitions" %}}) in DML/DDL. ##### hive.exec.dynamic.partition.mode * Default Value: `strict` * Added In: Hive 0.6.0 -In `strict` mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions. In `nonstrict` mode all partitions are allowed to be [dynamic]({{< ref "#dynamic" >}}). +In `strict` mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions. In `nonstrict` mode all partitions are allowed to be [dynamic]({{% ref "#dynamic" %}}). -Set to `nonstrict` to support [INSERT ... VALUES, UPDATE, and DELETE]({{< ref "hive-transactions" >}}) transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +Set to `nonstrict` to support [INSERT ... VALUES, UPDATE, and DELETE]({{% ref "hive-transactions" %}}) transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . ##### hive.exec.max.dynamic.partitions * Default Value: `1000` * Added In: Hive 0.6.0 -Maximum number of [dynamic partitions]({{< ref "#dynamic-partitions" >}}) allowed to be created in total. 
+Maximum number of [dynamic partitions]({{% ref "#dynamic-partitions" %}}) allowed to be created in total. ##### hive.exec.max.dynamic.partitions.pernode * Default Value: `100` * Added In: Hive 0.6.0 -Maximum number of [dynamic partitions]({{< ref "#dynamic-partitions" >}}) allowed to be created in each mapper/reducer node. +Maximum number of [dynamic partitions]({{% ref "#dynamic-partitions" %}}) allowed to be created in each mapper/reducer node. ##### hive.exec.max.created.files @@ -858,7 +858,7 @@ Lets Hive determine whether to run in local mode automatically. * Default Value: `134217728` * Added In: Hive 0.7.0 with [HIVE-1408](https://issues.apache.org/jira/browse/HIVE-1408) -When **[hive.exec.mode.local.auto]({{< ref "#hiveexecmodelocalauto" >}})** is true, input bytes should be less than this for local mode. +When **[hive.exec.mode.local.auto]({{% ref "#hiveexecmodelocalauto" %}})** is true, input bytes should be less than this for local mode. ##### hive.exec.mode.local.auto.tasks.max @@ -866,21 +866,21 @@ When **[hive.exec.mode.local.auto]({{< ref "#hiveexecmodelocalauto" >}})** is * Added In: Hive 0.7.0 with [HIVE-1408](https://issues.apache.org/jira/browse/HIVE-1408) * Removed In: Hive 0.9.0 with [HIVE-2651](https://issues.apache.org/jira/browse/HIVE-2651) -When **[hive.exec.mode.local.auto]({{< ref "#hiveexecmodelocalauto" >}})** is true, the number of tasks should be less than this for local mode. Replaced in Hive 0.9.0 by  **[hive.exec.mode.local.auto.input.files.max]({{< ref "#hive-exec-mode-local-auto-input-files-max" >}}).** +When **[hive.exec.mode.local.auto]({{% ref "#hiveexecmodelocalauto" %}})** is true, the number of tasks should be less than this for local mode. 
Replaced in Hive 0.9.0 by  **[hive.exec.mode.local.auto.input.files.max]({{% ref "#hive-exec-mode-local-auto-input-files-max" %}}).** ##### hive.exec.mode.local **.auto.input.files.max** * Default Value: `4` * Added In: Hive 0.9.0 with [HIVE-2651](https://issues.apache.org/jira/browse/HIVE-2651) -When **[hive.exec.mode.local.auto]({{< ref "#hiveexecmodelocalauto" >}})** is true, the number of tasks should be less than this for local mode. +When **[hive.exec.mode.local.auto]({{% ref "#hiveexecmodelocalauto" %}})** is true, the number of tasks should be less than this for local mode. ##### hive.exec.drop.ignorenonexistent * Default Value: `true` * Added In: Hive 0.7.0 with [HIVE-1856](https://issues.apache.org/jira/browse/HIVE-1856) and [HIVE-1858](https://issues.apache.org/jira/browse/HIVE-1858) -Do not report an error if DROP TABLE/VIEW/PARTITION/INDEX/TEMPORARY FUNCTION specifies a non-existent table/view. Also applies to [permanent functions]({{< ref "#permanent-functions" >}}) as of Hive 0.13.0. +Do not report an error if DROP TABLE/VIEW/PARTITION/INDEX/TEMPORARY FUNCTION specifies a non-existent table/view. Also applies to [permanent functions]({{% ref "#permanent-functions" %}}) as of Hive 0.13.0. ##### hive.exec.show.job.failure.debug.info @@ -908,14 +908,14 @@ Default property values for newly created tables. * Default Value: `true` * Added In: Hive 0.7.0 -This enables [substitution]({{< ref "languagemanual-variablesubstitution" >}}) using syntax like `${var`} `${system:var`} and `${env:var`}. +This enables [substitution]({{% ref "languagemanual-variablesubstitution" %}}) using syntax like `${var`} `${system:var`} and `${env:var`}. ##### hive.error.on.empty.partition * Default Value: `false` * Added In: Hive 0.7.0 -Whether to throw an exception if [dynamic partition insert]({{< ref "#dynamic-partition-insert" >}}) generates empty results. 
+Whether to throw an exception if [dynamic partition insert]({{% ref "#dynamic-partition-insert" %}}) generates empty results. ##### hive.exim.uri.scheme.whitelist @@ -993,16 +993,16 @@ The class responsible logging client side performance metrics. Must be a subclas * Added In: Hive 0.8.1 with [HIVE-2181](https://issues.apache.org/jira/browse/HIVE-2181) * Fixed In:  Hive 1.3.0 with [HIVE-10415](https://issues.apache.org/jira/browse/HIVE-10415) -To clean up the Hive [scratch directory]({{< ref "#scratch-directory" >}}) while starting the Hive server (or HiveServer2). This is not an option for a multi-user environment since it will accidentally remove the scratch directory in use. +To clean up the Hive [scratch directory]({{% ref "#scratch-directory" %}}) while starting the Hive server (or HiveServer2). This is not an option for a multi-user environment since it will accidentally remove the scratch directory in use. ##### hive.scratchdir.lock * Default Value: `false` * Added In: Hive 1.3.0 and 2.1.0 (but not 2.0.x) with [HIVE-13429](https://issues.apache.org/jira/browse/HIVE-13429) -When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the [cleardanglingscratchdir tool]({{< ref "#cleardanglingscratchdir-tool" >}}) will remove it. +When true, holds a lock file in the scratch directory. If a Hive process dies and accidentally leaves a dangling scratchdir behind, the [cleardanglingscratchdir tool]({{% ref "#cleardanglingscratchdir-tool" %}}) will remove it. -When false, does not create a lock file and therefore the [cleardanglingscratchdir tool]({{< ref "#cleardanglingscratchdir-tool" >}}) cannot remove any dangling scratch directories. +When false, does not create a lock file and therefore the [cleardanglingscratchdir tool]({{% ref "#cleardanglingscratchdir-tool" %}}) cannot remove any dangling scratch directories. 
##### hive.output.file.extension @@ -1071,21 +1071,21 @@ If the bucketing/sorting properties of the table exactly match the grouping key, * Added In: Hive 0.11.0 with [HIVE-581](https://issues.apache.org/jira/browse/HIVE-581) * Deprecated In: Hive 2.2.0 with [HIVE-15797](https://issues.apache.org/jira/browse/HIVE-15797) -Whether to enable using Column Position Alias in [GROUP BY]({{< ref "languagemanual-groupby" >}}) and [ORDER BY]({{< ref "#order-by" >}}) clauses of queries (deprecated as of Hive 2.2.0; use [hive.groupby.position.alias]({{< ref "#hivegroupbypositionalias" >}}) and [hive.orderby.position.alias]({{< ref "#hiveorderbypositionalias" >}}) instead). +Whether to enable using Column Position Alias in [GROUP BY]({{% ref "languagemanual-groupby" %}}) and [ORDER BY]({{% ref "#order-by" %}}) clauses of queries (deprecated as of Hive 2.2.0; use [hive.groupby.position.alias]({{% ref "#hivegroupbypositionalias" %}}) and [hive.orderby.position.alias]({{% ref "#hiveorderbypositionalias" %}}) instead). ##### hive.groupby.position.alias * Default Value: `false` * Added In: Hive 2.2.0 with [HIVE-15797](https://issues.apache.org/jira/browse/HIVE-15797) -Whether to enable using Column Position Alias in [GROUP BY]({{< ref "languagemanual-groupby" >}}). +Whether to enable using Column Position Alias in [GROUP BY]({{% ref "languagemanual-groupby" %}}). ##### hive.orderby.position.alias * Default Value: `true` * Added In: Hive 2.2.0 with [HIVE-15797](https://issues.apache.org/jira/browse/HIVE-15797) -Whether to enable using Column Position Alias in [ORDER BY]({{< ref "#order-by" >}}). +Whether to enable using Column Position Alias in [ORDER BY]({{% ref "#order-by" %}}). 
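A minimal sketch of column position aliases, assuming Hive 2.2.0 or later and a hypothetical table `sales(region STRING, amount DOUBLE)`. The integers in `GROUP BY` and `ORDER BY` refer to positions in the select list rather than to column names.

```sql
-- Hypothetical table: sales(region STRING, amount DOUBLE)
SET hive.groupby.position.alias=true;   -- default is false
SET hive.orderby.position.alias=true;   -- default is true

-- Position 1 is the first select expression (region);
-- position 2 is the second (the aggregated total).
SELECT region, sum(amount) AS total
FROM sales
GROUP BY 1
ORDER BY 2 DESC;
```

With both properties enabled, this is intended to be equivalent to writing `GROUP BY region ORDER BY total DESC`.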
##### hive.fetch.task.aggr @@ -1099,7 +1099,7 @@ Aggregation queries with no group-by clause (for example, `select count(*) from * Default Value: ``-1`` in Hive 0.13.0 and 0.13.1,  `1073741824` (1 GB) in Hive 0.14.0 and later * Added In: Hive 0.13.0 with [HIVE-3990](https://issues.apache.org/jira/browse/HIVE-3990); default changed in Hive 0.14.0 with [HIVE-7397](https://issues.apache.org/jira/browse/HIVE-7397) -Input threshold (in bytes) for applying [**hive.fetch.task.conversion**]({{< ref "#**hive-fetch-task-conversion**" >}}). If target table is native, input length is calculated by summation of file lengths. If it's not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means  [**hive.fetch.task.conversion**]({{< ref "#**hive-fetch-task-conversion**" >}}) is applied without any input length threshold. +Input threshold (in bytes) for applying [**hive.fetch.task.conversion**]({{% ref "#**hive-fetch-task-conversion**" %}}). If target table is native, input length is calculated by summation of file lengths. If it's not native, the storage handler for the table can optionally implement the org.apache.hadoop.hive.ql.metadata.InputEstimator interface. A negative threshold means  [**hive.fetch.task.conversion**]({{% ref "#**hive-fetch-task-conversion**" %}}) is applied without any input length threshold. ##### hive.limit.pushdown.memory.usage @@ -1153,13 +1153,13 @@ Check if a query plan contains a cross product. If there is one, output a warnin * Default Value: `true` * Added In: Hive 0.13.0 with [HIVE-6689](https://issues.apache.org/jira/browse/HIVE-6689) -In older Hive versions (0.10 and earlier) no distinction was made between partition columns or non-partition columns while displaying columns in [DESCRIBE TABLE]({{< ref "#describe-table" >}}). From version 0.12 onwards, they are displayed separately. This flag will let you get the old behavior, if desired. 
See test-case in [patch for HIVE-6689](https://issues.apache.org/jira/secure/attachment/12635956/HIVE-6689.2.patch). +In older Hive versions (0.10 and earlier) no distinction was made between partition columns or non-partition columns while displaying columns in [DESCRIBE TABLE]({{% ref "#describe-table" %}}). From version 0.12 onwards, they are displayed separately. This flag will let you get the old behavior, if desired. See test-case in [patch for HIVE-6689](https://issues.apache.org/jira/secure/attachment/12635956/HIVE-6689.2.patch). ##### hive.limit.query.max.table.partition * Default Value: `-1` * Added In: Hive 0.13.0 with [HIVE-6492](https://issues.apache.org/jira/browse/HIVE-6492) -* Deprecated In: Hive 2.2.0 with [HIVE-13884](https://issues.apache.org/jira/browse/HIVE-13884) (See **[hive.metastore.limit.partition.request]({{< ref "#hivemetastorelimitpartitionrequest" >}})** .) +* Deprecated In: Hive 2.2.0 with [HIVE-13884](https://issues.apache.org/jira/browse/HIVE-13884) (See **[hive.metastore.limit.partition.request]({{% ref "#hivemetastorelimitpartitionrequest" %}})** .) * Removed In: Hive 3.0.0 with [HIVE-17965](https://issues.apache.org/jira/browse/HIVE-17965) To protect the cluster, this controls how many partitions can be scanned for each partitioned table. The default value "-1" means no limit. The limit on partitions does not affect metadata-only queries. 
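The following sketch shows how the partition limit might be used in a session on a release that still has this property (it is deprecated in Hive 2.2.0 and removed in 3.0.0, as noted above). The table `logs`, its partition column `dt`, and the limit value are hypothetical.

```sql
-- Hypothetical table: logs(msg STRING) PARTITIONED BY (dt STRING)
SET hive.limit.query.max.table.partition=7;   -- default -1 means no limit

-- Partition filter selects a single partition, which is within the limit.
SELECT msg FROM logs WHERE dt = '2016-01-01' AND msg LIKE '%error%';

-- No partition filter: if logs has more than 7 partitions, this scan
-- exceeds the limit and is expected to be rejected.
SELECT msg FROM logs WHERE msg LIKE '%error%';
```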
@@ -1168,7 +1168,7 @@ To protect the cluster, this controls how many partitions can be scanned for eac * Default Value: `0002` * Added In: (none, but temporarily in patches for [HIVE-2504](https://issues.apache.org/jira/browse/HIVE-2504) before release 0.9.0) -* Removed In: Hive 0.9.0 ([HIVE-2504-1.patch](https://issues.apache.org/jira/secure/attachment/12521986/HIVE-2504-1.patch)), replaced by  **[hive.warehouse.subdir.inherit.perms]({{< ref "#hivewarehousesubdirinheritperms" >}})** +* Removed In: Hive 0.9.0 ([HIVE-2504-1.patch](https://issues.apache.org/jira/secure/attachment/12521986/HIVE-2504-1.patch)), replaced by  **[hive.warehouse.subdir.inherit.perms]({{% ref "#hivewarehousesubdirinheritperms" %}})** Obsolete:  The `dfs.umask` value for the Hive-created folders. @@ -1217,7 +1217,7 @@ Whether to enable the [constant propagation](http://en.wikipedia.org/wiki/Const ##### **hive.entity.capture.transform** * Default Value: `false` -* Added In: Hive [1.1.0]({{< ref "#1-1-0" >}}) with [HIVE-8938](https://issues.apache.org/jira/browse/HIVE-8938) +* Added In: Hive [1.1.0]({{% ref "#1-1-0" %}}) with [HIVE-8938](https://issues.apache.org/jira/browse/HIVE-8938) Enable capturing compiler read entity of transform URI which can be introspected in the semantic and exec hooks. @@ -1226,23 +1226,23 @@ Enable capturing compiler read entity of transform URI which can be introspected * Default Value: `true` * Added In: Hive 1.2.0 with [HIVE-6617](https://issues.apache.org/jira/browse/HIVE-6617) -Whether to  enable support for SQL2011 reserved keywords.  When enabled, will support (part of) SQL2011 [reserved keywords]({{< ref "#reserved-keywords" >}}). +Whether to  enable support for SQL2011 reserved keywords.  When enabled, will support (part of) SQL2011 [reserved keywords]({{% ref "#reserved-keywords" %}}). 
##### **hive.log.explain.output** * Default Value: `false` -* Added In: [1.1.0]({{< ref "#1-1-0" >}}) with [HIVE-8600](https://issues.apache.org/jira/browse/HIVE-8600) +* Added In: [1.1.0]({{% ref "#1-1-0" %}}) with [HIVE-8600](https://issues.apache.org/jira/browse/HIVE-8600) -When enabled, will log [EXPLAIN EXTENDED]({{< ref "#explain-extended" >}}) output for the query at log4j INFO level and in HiveServer2 WebUI / Drilldown / Query Plan. +When enabled, will log [EXPLAIN EXTENDED]({{% ref "#explain-extended" %}}) output for the query at log4j INFO level and in HiveServer2 WebUI / Drilldown / Query Plan. -From [Hive 3.1.0](https://issues.apache.org/jira/browse/HIVE-18469) onwards, this configuration property only logs to the log4j INFO. T o log the [EXPLAIN EXTENDED]({{< ref "#explain-extended" >}}) output in WebUI / Drilldown / Query Plan from Hive 3.1.0 onwards, use **[hive.server2.webui.explain.output]({{< ref "#hive-server2-webui-explain-output" >}})** .   +From [Hive 3.1.0](https://issues.apache.org/jira/browse/HIVE-18469) onwards, this configuration property only logs to the log4j INFO. T o log the [EXPLAIN EXTENDED]({{% ref "#explain-extended" %}}) output in WebUI / Drilldown / Query Plan from Hive 3.1.0 onwards, use **[hive.server2.webui.explain.output]({{% ref "#hive-server2-webui-explain-output" %}})** .   ##### **hive.explain.user** * Default Value: `false` * Added In: Hive 1.2.0 with [HIVE-9780](https://issues.apache.org/jira/browse/HIVE-9780) -Whether to [show explain result at user level]({{< ref "#show-explain-result-at-user-level" >}}).  When enabled, will log EXPLAIN output for the query at user level. (Tez only.  For Spark, see [**hive.spark.explain.user**]({{< ref "#hivesparkexplainuser" >}}).) +Whether to [show explain result at user level]({{% ref "#show-explain-result-at-user-level" %}}).  When enabled, will log EXPLAIN output for the query at user level. (Tez only.  
For Spark, see [**hive.spark.explain.user**]({{% ref "#hivesparkexplainuser" %}}).) ##### **hive.typecheck.on.insert** @@ -1259,14 +1259,14 @@ Whether to check, convert, and normalize partition value specified in partition Expects one of [`memory`, `ssd`, `default`]. -Define the storage policy for [temporary tables]({{< ref "#temporary-tables" >}}). Choices between memory, ssd and default. See [HDFS Storage Types and Storage Policies](http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types_and_Storage_Policies). +Define the storage policy for [temporary tables]({{% ref "#temporary-tables" %}}). Choices between memory, ssd and default. See [HDFS Storage Types and Storage Policies](http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html#Storage_Types_and_Storage_Policies). ##### **hive.optimize.distinct.rewrite** * Default Value: `true` * Added In: Hive 1.2.0 with [HIVE-10568](https://issues.apache.org/jira/browse/HIVE-10568) -When applicable, this optimization rewrites [distinct aggregates]({{< ref "#distinct-aggregates" >}})  from a single-stage to multi-stage aggregation. This may not be optimal in all cases. Ideally, whether to trigger it or not should be a cost-based decision. Until Hive formalizes the cost model for this, this is config driven. +When applicable, this optimization rewrites [distinct aggregates]({{% ref "#distinct-aggregates" %}})  from a single-stage to multi-stage aggregation. This may not be optimal in all cases. Ideally, whether to trigger it or not should be a cost-based decision. Until Hive formalizes the cost model for this, this is config driven. ##### **hive.optimize.point.lookup** @@ -1303,7 +1303,7 @@ Refer to  for benefit * Default Value: 0 * Added In: Hive 2.2.0 with [HIVE-12077](https://issues.apache.org/jira/browse/HIVE-12077) -To run the [MSCK REPAIR TABLE]({{< ref "#msck-repair-table" >}}) command batch-wise. 
If there is a large number of untracked partitions, by configuring a value to the property it will execute in batches internally. The default value of the property is zero, which means it will execute all the partitions at once. +To run the [MSCK REPAIR TABLE]({{% ref "#msck-repair-table" %}}) command batch-wise. If there is a large number of untracked partitions, by configuring a value to the property it will execute in batches internally. The default value of the property is zero, which means it will execute all the partitions at once. ##### ****hive.exec.copyfile.maxnumfiles**** @@ -1409,14 +1409,14 @@ The default record writer for writing data to the user scripts. The default SerDe Hive will use for storage formats that do not specify a SerDe.  Storage formats that currently do not specify a SerDe include 'TextFile, RcFile'.   -See [Registration of Native SerDes]({{< ref "#registration-of-native-serdes" >}}) for more information for storage formats and SerDes. +See [Registration of Native SerDes]({{% ref "#registration-of-native-serdes" %}}) for more information for storage formats and SerDes. ##### hive.lazysimple.extended_boolean_literal * Default Value: `false` * Added in: Hive 0.14 with [HIVE-3635](https://issues.apache.org/jira/browse/HIVE-3635) - [LazySimpleSerDe]({{< ref "#lazysimpleserde" >}}) uses this property to determine if it treats 'T', 't', 'F', 'f',  '1', and '0' as extended, legal boolean literals, in addition to 'TRUE' and 'FALSE'.  The default is `false`, which means only 'TRUE' and 'FALSE' are treated as legal  boolean literals. + [LazySimpleSerDe]({{% ref "#lazysimpleserde" %}}) uses this property to determine if it treats 'T', 't', 'F', 'f',  '1', and '0' as extended, legal boolean literals, in addition to 'TRUE' and 'FALSE'.  The default is `false`, which means only 'TRUE' and 'FALSE' are treated as legal  boolean literals. #### I/O @@ -1436,7 +1436,7 @@ The default input format. 
Set this to HiveInputFormat if you encounter problems ##### Also see: -* **[hive.tez.input.format]({{< ref "#hivetezinputformat" >}})** +* **[hive.tez.input.format]({{% ref "#hivetezinputformat" %}})** ### File Formats ##### hive.default.fileformat @@ -1446,14 +1446,14 @@ The default input format. Set this to HiveInputFormat if you encounter problems Default file format for CREATE TABLE statement. Options are TextFile, SequenceFile, RCfile, ORC, and Parquet. -Users can explicitly say [CREATE TABLE]({{< ref "#create-table" >}})... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC|AVRO|INPUTFORMAT...OUTPUTFORMAT... to override. (RCFILE was added in Hive 0.6.0, ORC in 0.11.0, AVRO in 0.14.0, and Parquet in 2.3.0) See [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}) for details. +Users can explicitly say [CREATE TABLE]({{% ref "#create-table" %}})... STORED AS TEXTFILE|SEQUENCEFILE|RCFILE|ORC|AVRO|INPUTFORMAT...OUTPUTFORMAT... to override. (RCFILE was added in Hive 0.6.0, ORC in 0.11.0, AVRO in 0.14.0, and Parquet in 2.3.0) See [Row Format, Storage Format, and SerDe]({{% ref "#row-format,-storage-format,-and-serde" %}}) for details. ##### hive.default.fileformat.managed * Default Value: `none` * Added In: Hive 1.2.0 with [HIVE-9915](https://issues.apache.org/jira/browse/HIVE-9915) -Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by [hive.default.fileformat]({{< ref "#hivedefaultfileformat" >}}). Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the [StorageHandlers]({{< ref "#storagehandlers" >}}) section for more information on managed/external and native/non-native terminology). 
+Default file format for CREATE TABLE statement applied to managed tables only. External tables will be created with format specified by [hive.default.fileformat]({{% ref "#hivedefaultfileformat" %}}). Options are none, TextFile, SequenceFile, RCfile, ORC, and Parquet (as of Hive 2.3.0). Leaving this null will result in using hive.default.fileformat for all native tables. For non-native tables the file format is determined by the storage handler, as shown below (see the [StorageHandlers]({{% ref "#storagehandlers" %}}) section for more information on managed/external and native/non-native terminology). | | Native | Non-Native | | --- | --- | --- | @@ -1500,12 +1500,12 @@ File format to use for a query's intermediate results. Options are TextFile, Seq #### ORC File Format -The ORC file format was introduced in [Hive 0.11.0](https://issues.apache.org/jira/browse/HIVE-3874). See [ORC Files]({{< ref "languagemanual-orc" >}}) for details. +The ORC file format was introduced in [Hive 0.11.0](https://issues.apache.org/jira/browse/HIVE-3874). See [ORC Files]({{% ref "languagemanual-orc" %}}) for details. Besides the configuration properties listed in this section, some properties in other sections are also related to ORC: -* [hive.default.fileformat]({{< ref "#hivedefaultfileformat" >}}) -* [hive.stats.gather.num.threads]({{< ref "#hivestatsgathernumthreads" >}}) +* [hive.default.fileformat]({{% ref "#hivedefaultfileformat" %}}) +* [hive.stats.gather.num.threads]({{% ref "#hivestatsgathernumthreads" %}}) ##### hive.exec.orc.memory.pool @@ -1599,14 +1599,14 @@ Define the encoding strategy to use while writing data. Changing this will only * Default Value: `false` * Added In: Hive 0.13.0 with [HIVE-6125](https://issues.apache.org/jira/browse/HIVE-6125) and [HIVE-6128](https://issues.apache.org/jira/browse/HIVE-6128) -If turned on, splits generated by [ORC]({{< ref "languagemanual-orc" >}}) will include metadata about the stripes in the file. 
This data is read remotely (from the client or HiveServer2 machine) and sent to all the tasks. +If turned on, splits generated by [ORC]({{% ref "languagemanual-orc" %}}) will include metadata about the stripes in the file. This data is read remotely (from the client or HiveServer2 machine) and sent to all the tasks. ##### hive.orc.cache.stripe.details.size * Default Value: `10000` * Added In: Hive 0.13.0 with [HIVE-6125](https://issues.apache.org/jira/browse/HIVE-6125) and [HIVE-6128](https://issues.apache.org/jira/browse/HIVE-6128) -Cache size for keeping meta information about [ORC]({{< ref "languagemanual-orc" >}}) splits cached in the client. +Cache size for keeping meta information about [ORC]({{% ref "languagemanual-orc" %}}) splits cached in the client. ##### hive.orc.cache.use.soft.references @@ -1627,14 +1627,14 @@ The maximum weight allowed for the SearchArgument Cache, in megabytes. By defaul * Default Value: `10` * Added In: Hive 0.13.0 with [HIVE-6125](https://issues.apache.org/jira/browse/HIVE-6125) and [HIVE-6128](https://issues.apache.org/jira/browse/HIVE-6128) -How many threads [ORC]({{< ref "languagemanual-orc" >}}) should use to create splits in parallel. +How many threads [ORC]({{% ref "languagemanual-orc" %}}) should use to create splits in parallel. ##### hive.exec.orc.split.strategy * Default Value: HYBRID * Added In: Hive 1.2.0 with [HIVE-10114](https://issues.apache.org/jira/browse/HIVE-10114) -What strategy [ORC]({{< ref "languagemanual-orc" >}}) should use to create splits for execution. The available options are "BI", "ETL" and "HYBRID". +What strategy [ORC]({{% ref "languagemanual-orc" %}}) should use to create splits for execution. The available options are "BI", "ETL" and "HYBRID". The HYBRID mode reads the footers for all files if there are fewer files than expected mapper count, switching over to generating 1 split per file if the average file sizes are smaller than the default HDFS blocksize. 
ETL strategy always reads the ORC footers before generating splits, while the BI strategy generates per-file splits fast without reading any data from HDFS. @@ -1657,7 +1657,7 @@ Use zerocopy reads with ORC. (This requires Hadoop 2.3 or later.) * Default Value: `true` * Added In: Hive 0.14.0 with [HIVE-7509](https://issues.apache.org/jira/browse/HIVE-7509) -When **[hive.merge.mapfiles]({{< ref "#hivemergemapfiles" >}})** , **[hive.merge.mapredfiles]({{< ref "#hivemergemapredfiles" >}})** or **[hive.merge.tezfiles]({{< ref "#hivemergetezfiles" >}})** is enabled while writing a table with ORC file format, enabling this configuration property will do stripe-level fast merge for small ORC files. Note that enabling this configuration property will not honor the padding tolerance configuration ( **[hive.exec.orc.block.padding.tolerance]({{< ref "#hiveexecorcblockpaddingtolerance" >}})** ). +When **[hive.merge.mapfiles]({{% ref "#hivemergemapfiles" %}})** , **[hive.merge.mapredfiles]({{% ref "#hivemergemapredfiles" %}})** or **[hive.merge.tezfiles]({{% ref "#hivemergetezfiles" %}})** is enabled while writing a table with ORC file format, enabling this configuration property will do stripe-level fast merge for small ORC files. Note that enabling this configuration property will not honor the padding tolerance configuration ( **[hive.exec.orc.block.padding.tolerance]({{% ref "#hiveexecorcblockpaddingtolerance" %}})** ). ##### hive.orc.row.index.stride.dictionary.check @@ -1677,7 +1677,7 @@ Value can be `SPEED` or `COMPRESSION`. #### Parquet -Parquet is supported by a plugin in Hive 0.10, 0.11, and 0.12 and natively in Hive 0.13 and later. See [Parquet]({{< ref "parquet" >}}) for details. +Parquet is supported by a plugin in Hive 0.10, 0.11, and 0.12 and natively in Hive 0.13 and later. See [Parquet]({{% ref "parquet" %}}) for details. 
##### hive.parquet.timestamp.skip.conversion @@ -1702,7 +1702,7 @@ NOTE: This property will influence how HBase files using the AvroSerDe and times ### Vectorization -Hive added vectorized query execution in release 0.13.0 ([HIVE-4160](https://issues.apache.org/jira/browse/HIVE-4160), [HIVE-5283](https://issues.apache.org/jira/browse/HIVE-5283)). For more information see the design document [Vectorized Query Execution]({{< ref "vectorized-query-execution" >}}) . +Hive added vectorized query execution in release 0.13.0 ([HIVE-4160](https://issues.apache.org/jira/browse/HIVE-4160), [HIVE-5283](https://issues.apache.org/jira/browse/HIVE-5283)). For more information see the design document [Vectorized Query Execution]({{% ref "vectorized-query-execution" %}}) . ##### hive.vectorized.execution.enabled @@ -1808,19 +1808,19 @@ This flag should be set to true to enable vectorizing using row deserialize. The * Default Value: (empty) * Added in: Hive 2.4.0 with [HIVE-17534](https://issues.apache.org/jira/browse/HIVE-17534) -This flag should be used to provide a comma separated list of fully qualified classnames to exclude certain FileInputFormats from vectorized execution using the vectorized file inputformat. Note that vectorized execution could still occur for that input format based on whether **[hive.vectorized.use.vector.serde.deserialize]({{< ref "#hivevectorizedusevectorserdedeserialize" >}})** or **[hive.vectorized.use.row.serde.deserialize]({{< ref "#hivevectorizeduserowserdedeserialize" >}})** is enabled or not.   +This flag should be used to provide a comma separated list of fully qualified classnames to exclude certain FileInputFormats from vectorized execution using the vectorized file inputformat. 
Note that vectorized execution could still occur for that input format based on whether **[hive.vectorized.use.vector.serde.deserialize]({{% ref "#hivevectorizedusevectorserdedeserialize" %}})** or **[hive.vectorized.use.row.serde.deserialize]({{% ref "#hivevectorizeduserowserdedeserialize" %}})** is enabled or not.   ## MetaStore In addition to the Hive metastore properties listed in this section, some properties are listed in other sections: -* [Hive Metastore Security]({{< ref "#hive-metastore-security" >}}) - + **[hive.metastore.pre.event.listeners]({{< ref "#hivemetastorepreeventlisteners" >}})** - + **[hive.security.metastore.authorization.manager]({{< ref "#hivesecuritymetastoreauthorizationmanager" >}})** - + **[hive.security.metastore.authenticator.manager]({{< ref "#hivesecuritymetastoreauthenticatormanager" >}})** - + **[hive.security.metastore.authorization.auth.reads]({{< ref "#hivesecuritymetastoreauthorizationauthreads" >}})** -* [Metrics]({{< ref "#metrics" >}}) - + [**hive.metastore.metrics.enabled**]({{< ref "#**hive-metastore-metrics-enabled**" >}}) +* [Hive Metastore Security]({{% ref "#hive-metastore-security" %}}) + + **[hive.metastore.pre.event.listeners]({{% ref "#hivemetastorepreeventlisteners" %}})** + + **[hive.security.metastore.authorization.manager]({{% ref "#hivesecuritymetastoreauthorizationmanager" %}})** + + **[hive.security.metastore.authenticator.manager]({{% ref "#hivesecuritymetastoreauthenticatormanager" %}})** + + **[hive.security.metastore.authorization.auth.reads]({{% ref "#hivesecuritymetastoreauthorizationauthreads" %}})** +* [Metrics]({{% ref "#metrics" %}}) + + [**hive.metastore.metrics.enabled**]({{% ref "#**hive-metastore-metrics-enabled**" %}}) ##### hive.metastore.local @@ -1828,7 +1828,7 @@ In addition to the Hive metastore properties listed in this section, some proper * Added In: Hive 0.8.1 * Removed In: Hive 0.10 with [HIVE-2585](https://issues.apache.org/jira/browse/HIVE-2585) -Controls whether to connect to 
remote metastore server or open a new metastore server in Hive Client JVM. As of Hive 0.10 this is no longer used. Instead if **`[hive.metastore.uris]({{< ref "#hive-metastore-uris" >}})`** is set then `remote` mode is assumed otherwise `local`. +Controls whether to connect to remote metastore server or open a new metastore server in Hive Client JVM. As of Hive 0.10 this is no longer used. Instead if **`[hive.metastore.uris]({{% ref "#hive-metastore-uris" %}})`** is set then `remote` mode is assumed otherwise `local`. ##### hive.metastore.uri.selection @@ -1886,7 +1886,7 @@ Username to use against metastore database. Password to use against metastore database. -For an alternative configuration, see [Removing Hive Metastore Password from Hive Configuration]({{< ref "#removing-hive-metastore-password-from-hive-configuration" >}}). +For an alternative configuration, see [Removing Hive Metastore Password from Hive Configuration]({{% ref "#removing-hive-metastore-password-from-hive-configuration" %}}). ##### javax.jdo.option.Multithreaded @@ -1925,14 +1925,14 @@ configured with embedded metastore. To get optimal performance, set config to me * Default Value: `false` * Added In: Hive 0.7.0 -* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateTables]({{< ref "#datanucleusschemavalidatetables" >}})** +* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateTables]({{% ref "#datanucleusschemavalidatetables" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. 
##### datanucleus.schema.validateTables * Default Value: `false` -* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateTables]({{< ref "#datanucleusvalidatetables" >}})** +* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateTables]({{% ref "#datanucleusvalidatetables" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. @@ -1940,14 +1940,14 @@ Validates existing schema against code. Turn this on if you want to verify exist * Default Value: `false` * Added In: Hive 0.7.0 -* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateColumns]({{< ref "#datanucleusschemavalidatecolumns" >}})** +* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateColumns]({{% ref "#datanucleusschemavalidatecolumns" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. ##### datanucleus.schema.validateColumns * Default Value: `false` -* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateColumns]({{< ref "#datanucleusvalidatecolumns" >}})** +* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateColumns]({{% ref "#datanucleusvalidatecolumns" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. @@ -1955,14 +1955,14 @@ Validates existing schema against code. 
Turn this on if you want to verify exist * Default Value: `false` * Added In: Hive 0.7.0 -* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateConstraints]({{< ref "#datanucleusschemavalidateconstraints" >}})** +* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.validateConstraints]({{% ref "#datanucleusschemavalidateconstraints" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. ##### datanucleus.schema.validateConstraints * Default Value: `false` -* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateConstraints]({{< ref "#datanucleusvalidateconstraints" >}})** +* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.validateConstraints]({{% ref "#datanucleusvalidateconstraints" %}})** Validates existing schema against code. Turn this on if you want to verify existing schema. @@ -1987,20 +1987,20 @@ Dictates whether to allow updates to schema or not. * Default Value: `true` * Added In: Hive 0.7.0 -* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.autoCreateAll]({{< ref "#datanucleusschemaautocreateall" >}})** +* Removed In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaced by **[datanucleus.schema.autoCreateAll]({{% ref "#datanucleusschemaautocreateall" %}})** Creates necessary schema on a startup if one does not exist. Set this to false, after creating it once. -In Hive 0.12.0 and later releases, **datanucleus.autoCreateSchema** is disabled if **[hive.metastore.schema.verification]({{< ref "#hivemetastoreschemaverification" >}})**  is `true`. 
+In Hive 0.12.0 and later releases, **datanucleus.autoCreateSchema** is disabled if **[hive.metastore.schema.verification]({{% ref "#hivemetastoreschemaverification" %}})**  is `true`. ##### datanucleus.schema.autoCreateAll * Default Value: `false` -* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.autoCreateSchema]({{< ref "#datanucleusautocreateschema" >}})** (with different default value) +* Added In: Hive 2.0.0 with [HIVE-6113](https://issues.apache.org/jira/browse/HIVE-6113), replaces **[datanucleus.autoCreateSchema]({{% ref "#datanucleusautocreateschema" %}})** (with different default value) Creates necessary schema on a startup if one does not exist. Reset this to false, after creating it once. -**datanucleus.schema.autoCreateAll** is disabled if **[hive.metastore.schema.verification]({{< ref "#hivemetastoreschemaverification" >}})**  is `true`. +**datanucleus.schema.autoCreateAll** is disabled if **[hive.metastore.schema.verification]({{% ref "#hivemetastoreschemaverification" %}})**  is `true`. ##### datanucleus.autoStartMechanismMode @@ -2022,7 +2022,7 @@ Default transaction isolation level for identity generation. * Added In: Hive 0.7.0 This parameter does nothing. -*Warning note:* For most installations, Hive should not enable the DataNucleus L2 cache, since this can cause correctness issues. Thus, some people set this parameter to false assuming that this disables the cache – unfortunately, it does not. To actually disable the cache, set **[datanucleus.cache.level2.type]({{< ref "#datanucleuscachelevel2type" >}})**  to "none". +*Warning note:* For most installations, Hive should not enable the DataNucleus L2 cache, since this can cause correctness issues. Thus, some people set this parameter to false assuming that this disables the cache – unfortunately, it does not. 
To actually disable the cache, set **[datanucleus.cache.level2.type]({{% ref "#datanucleuscachelevel2type" %}})**  to "none". ##### datanucleus.cache.level2.type @@ -2059,9 +2059,9 @@ Location of default database for the warehouse. * Added In: Hive 0.9.0 with [HIVE-2504](https://issues.apache.org/jira/browse/HIVE-2504). * Removed In: Hive 3.0.0 with [HIVE-16392](https://issues.apache.org/jira/browse/HIVE-16392) -Set this to true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from dfs umask. (This configuration property replaced **[hive.files.umask.value]({{< ref "#hivefilesumaskvalue" >}})** before Hive 0.9.0 was released) (This configuration property was removed in release 3.0.0, more details in [Permission Inheritance in Hive]({{< ref "permission-inheritance-in-hive" >}})) +Set this to true if table directories should inherit the permissions of the warehouse or database directory instead of being created with permissions derived from dfs umask. (This configuration property replaced **[hive.files.umask.value]({{% ref "#hivefilesumaskvalue" %}})** before Hive 0.9.0 was released) (This configuration property was removed in release 3.0.0, more details in [Permission Inheritance in Hive]({{% ref "permission-inheritance-in-hive" %}})) -Behavior of the flag is changed with Hive-0.14.0 in [HIVE-6892](https://issues.apache.org/jira/browse/HIVE-6892) and sub-JIRA's. More details in [Permission Inheritance in Hive]({{< ref "permission-inheritance-in-hive" >}}). +Behavior of the flag is changed with Hive-0.14.0 in [HIVE-6892](https://issues.apache.org/jira/browse/HIVE-6892) and sub-JIRA's. More details in [Permission Inheritance in Hive]({{% ref "permission-inheritance-in-hive" %}}). 
##### hive.metastore.execute.setugi @@ -2260,9 +2260,9 @@ Note: This principal is used by the metastore process for authentication with ot * Default Value: (empty) * Added In: Hive 2.2.1, 2.4.0 ([HIVE-17489](https://issues.apache.org/jira/browse/HIVE-17489)) -The client-facing Kerberos service principal for the Hive metastore. If unset, it defaults to the value set for [**hive.metastore.kerberos.principal**]({{< ref "#**hive-metastore-kerberos-principal**" >}}), for backward compatibility. +The client-facing Kerberos service principal for the Hive metastore. If unset, it defaults to the value set for [**hive.metastore.kerberos.principal**]({{% ref "#**hive-metastore-kerberos-principal**" %}}), for backward compatibility. -Also see  [**hive.server2.authentication.client.kerberos.principal**]({{< ref "#**hive-server2-authentication-client-kerberos-principal**" >}}). +Also see  [**hive.server2.authentication.client.kerberos.principal**]({{% ref "#**hive-server2-authentication-client-kerberos-principal**" %}}). ##### hive.metastore.client.cache.v2.enabled @@ -2312,10 +2312,10 @@ If true, the metastore Thrift interface will use TFramedTransport. When false (d * Added In: Hive 0.12.0 with [HIVE-3764](https://issues.apache.org/jira/browse/HIVE-3764) Enforce metastore schema version consistency. -True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic schema migration attempt (see **[datanucleus.autoCreateSchema]({{< ref "#datanucleusautocreateschema" >}})** and **[datanucleus.schema.autoCreateAll]({{< ref "#datanucleusschemaautocreateall" >}})** ). Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. +True: Verify that version information stored in metastore matches with one from Hive jars. 
Also disable automatic schema migration attempt (see **[datanucleus.autoCreateSchema]({{% ref "#datanucleusautocreateschema" %}})** and **[datanucleus.schema.autoCreateAll]({{% ref "#datanucleusschemaautocreateall" %}})** ). Users are required to manually migrate schema after Hive upgrade which ensures proper metastore schema migration. False: Warn if the version information stored in metastore doesn't match with one from Hive jars. -For more information, see [Metastore Schema Consistency and Upgrades]({{< ref "#metastore-schema-consistency-and-upgrades" >}}). +For more information, see [Metastore Schema Consistency and Upgrades]({{% ref "#metastore-schema-consistency-and-upgrades" %}}). ##### hive.metastore.disallow.incompatible.col.type.changes @@ -2335,7 +2335,7 @@ See [HIVE-4409](https://issues.apache.org/jira/browse/HIVE-4409) for more detail * Default Value: `false` * Added In: Hive 0.13.0 with [HIVE-6052](https://issues.apache.org/jira/browse/HIVE-6052) -Allow JDO query pushdown for integral partition columns in metastore. Off by default. This improves metastore performance for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (for example, if they have leading zeroes like 0012). If metastore direct SQL is enabled and works ( **[hive.metastore.try.direct.sql]({{< ref "#hivemetastoretrydirectsql" >}})** ), this optimization is also irrelevant. +Allow JDO query pushdown for integral partition columns in metastore. Off by default. This improves metastore performance for integral columns, especially if there's a large number of partitions. However, it doesn't work correctly with integral values that are not normalized (for example, if they have leading zeroes like 0012). If metastore direct SQL is enabled and works ( **[hive.metastore.try.direct.sql]({{% ref "#hivemetastoretrydirectsql" %}})** ), this optimization is also irrelevant. 
##### hive.metastore.try.direct.sql @@ -2351,7 +2351,7 @@ This can be configured on a per client basis by using the `set metaconf:hive.met * Default Value: `true` * Added In: Hive 0.13.0 with [HIVE-5626](https://issues.apache.org/jira/browse/HIVE-5626) -Same as **[hive.metastore.try.direct.sql]({{< ref "#hivemetastoretrydirectsql" >}})** , for read statements within a transaction that modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL select query has incorrect syntax or something similar inside a transaction, the entire transaction will fail and fall-back to DataNucleus will not be possible. You should disable the usage of direct SQL inside [transactions]({{< ref "hive-transactions" >}}) if that happens in your case. +Same as **[hive.metastore.try.direct.sql]({{% ref "#hivemetastoretrydirectsql" %}})** , for read statements within a transaction that modifies metastore data. Due to non-standard behavior in Postgres, if a direct SQL select query has incorrect syntax or something similar inside a transaction, the entire transaction will fail and fall-back to DataNucleus will not be possible. You should disable the usage of direct SQL inside [transactions]({{% ref "hive-transactions" %}}) if that happens in your case. This can be configured on a per client basis by using the `set metaconf:hive.metastore.try.direct.sql.ddl=` command, starting with Hive 0.14.0 ( [HIVE-7532](https://issues.apache.org/jira/browse/HIVE-7532)). @@ -2402,7 +2402,7 @@ Enable a metadata count at metastore startup for metrics. * Default value: -1 * Added In: Hive 2.2.0 with [HIVE-13884](https://issues.apache.org/jira/browse/HIVE-13884) -This limits the number of partitions that can be requested from the Metastore for a given table. A query will not be executed if it attempts to fetch more partitions per table than the limit configured. A value of "-1" means unlimited. 
This parameter is preferred over  **[hive.limit.query.max.table.partition]({{< ref "#hivelimitquerymaxtablepartition" >}})**  (deprecated; removed in 3.0.0). +This limits the number of partitions that can be requested from the Metastore for a given table. A query will not be executed if it attempts to fetch more partitions per table than the limit configured. A value of "-1" means unlimited. This parameter is preferred over  **[hive.limit.query.max.table.partition]({{% ref "#hivelimitquerymaxtablepartition" %}})**  (deprecated; removed in 3.0.0). ##### **hive.metastore.fastpath** @@ -2426,7 +2426,7 @@ Added in: Hive 3.0.0 with [HIVE-17318](https://issues.apache.org/jira/browse/HI ### Hive Metastore HBase -Development of an [HBase metastore]({{< ref "hbasemetastoredevelopmentguide" >}}) for Hive started in release 2.0.0 ([HIVE-9452](https://issues.apache.org/jira/browse/HIVE-9452)) but the work has been stopped and the code was removed from Hive in release 3.0.0 ([HIVE-17234](https://issues.apache.org/jira/browse/HIVE-17234)). +Development of an [HBase metastore]({{% ref "hbasemetastoredevelopmentguide" %}}) for Hive started in release 2.0.0 ([HIVE-9452](https://issues.apache.org/jira/browse/HIVE-9452)) but the work has been stopped and the code was removed from Hive in release 3.0.0 ([HIVE-17234](https://issues.apache.org/jira/browse/HIVE-17234)). Many more configuration properties were created for the HBase metastore in releases 2.x.x – they are not documented here.  For a full list, see the [doc note on HIVE-17234](https://issues.apache.org/jira/browse/HIVE-17234?focusedCommentId=16117879&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16117879). @@ -2451,13 +2451,13 @@ Number of threads to use to read file metadata in background to cache it. ## HiveServer2 -HiveServer2 was added in Hive 0.11.0 with [HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935).  
For more information see [HiveServer2 Overview]({{< ref "hiveserver2-overview" >}}), [Setting Up HiveServer2]({{< ref "setting-up-hiveserver2" >}}), and [HiveServer2 Clients]({{< ref "hiveserver2-clients" >}}). +HiveServer2 was added in Hive 0.11.0 with [HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935).  For more information see [HiveServer2 Overview]({{% ref "hiveserver2-overview" %}}), [Setting Up HiveServer2]({{% ref "setting-up-hiveserver2" %}}), and [HiveServer2 Clients]({{% ref "hiveserver2-clients" %}}). Besides the configuration properties listed in this section, some HiveServer2 properties are listed in other sections: -* **[hive.server2.builtin.udf.whitelist]({{< ref "#hiveserver2builtinudfwhitelist" >}})** -* **[hive.server2.builtin.udf.blacklist]({{< ref "#hiveserver2builtinudfblacklist" >}})** -* **[hive.server2.metrics.enabled]({{< ref "#hiveserver2metricsenabled" >}})** +* **[hive.server2.builtin.udf.whitelist]({{% ref "#hiveserver2builtinudfwhitelist" %}})** +* **[hive.server2.builtin.udf.blacklist]({{% ref "#hiveserver2builtinudfblacklist" %}})** +* **[hive.server2.metrics.enabled]({{% ref "#hiveserver2metricsenabled" %}})** ##### hive.server2.support.dynamic.service.discovery @@ -2518,7 +2518,7 @@ Client authentication types. NONE: no authentication check – plain SASL transport LDAP: LDAP/AD based authentication KERBEROS: Kerberos/GSSAPI authentication -CUSTOM: Custom authentication provider (use with property **[hive.server2.custom.authentication.class]({{< ref "#hiveserver2customauthenticationclass" >}})** ) +CUSTOM: Custom authentication provider (use with property **[hive.server2.custom.authentication.class]({{% ref "#hiveserver2customauthenticationclass" %}})** ) PAM: Pluggable authentication module (added in Hive 0.13.0 with [HIVE-6466](https://issues.apache.org/jira/browse/HIVE-6466)) NOSASL:  Raw transport (added in Hive 0.13.0)  @@ -2541,14 +2541,14 @@ Kerberos server principal. 
* Default Value: (empty) * Added In: Hive 2.1.1, 2.4.0 with [HIVE-17489](https://issues.apache.org/jira/browse/HIVE-17489) -Kerberos server principal used by the HA HiveServer2. Also see [**hive.metastore.client.kerberos.principal**]({{< ref "#**hive-metastore-client-kerberos-principal**" >}}). +Kerberos server principal used by the HA HiveServer2. Also see [**hive.metastore.client.kerberos.principal**]({{% ref "#**hive-metastore-client-kerberos-principal**" %}}). ##### hive.server2.custom.authentication.class * Default Value: (empty) * Added In: Hive 0.11.0 with [HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935) -Custom authentication class. Used when property  **[hive.server2.authentication]({{< ref "#hiveserver2authentication" >}})**  is set to 'CUSTOM'. Provided class must be a proper implementation of the interface org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2 will call its Authenticate(user, passed) method to authenticate requests. The implementation may optionally extend Hadoop's org.apache.hadoop.conf.Configured class to grab Hive's Configuration object. +Custom authentication class. Used when property  **[hive.server2.authentication]({{% ref "#hiveserver2authentication" %}})**  is set to 'CUSTOM'. Provided class must be a proper implementation of the interface org.apache.hive.service.auth.PasswdAuthenticationProvider. HiveServer2 will call its Authenticate(user, passed) method to authenticate requests. The implementation may optionally extend Hadoop's org.apache.hadoop.conf.Configured class to grab Hive's Configuration object. ##### hive.server2.enable.doAs @@ -2590,7 +2590,7 @@ LDAP domain. * Default Value: (empty) * Added In: Hive 1.3 with [HIVE-7193](https://issues.apache.org/jira/browse/HIVE-7193) -A COLON-separated list of string patterns to represent the base DNs for LDAP Groups. Use "%s" where the actual group name is to be plugged in. See [Group Membership]({{< ref "#group-membership" >}}) for details. 
+A COLON-separated list of string patterns to represent the base DNs for LDAP Groups. Use "%s" where the actual group name is to be plugged in. See [Group Membership]({{% ref "#group-membership" %}}) for details. Example of one string pattern: *uid=%s,OU=Groups,DC=apache,DC=org* @@ -2599,7 +2599,7 @@ Example of one string pattern: *uid=%s,OU=Groups,DC=apache,DC=org* * Default Value: (empty) * Added In: Hive 1.3 with [HIVE-7193](https://issues.apache.org/jira/browse/HIVE-7193) -A COMMA-separated list of group names that the users should belong to (at least one of the groups) for authentication to succeed. See [Group Membership]({{< ref "#group-membership" >}}) for details. +A COMMA-separated list of group names that the users should belong to (at least one of the groups) for authentication to succeed. See [Group Membership]({{% ref "#group-membership" %}}) for details. ##### hive.server2.authentication.ldap.groupMembershipKey @@ -2611,9 +2611,9 @@ LDAP attribute name on the group object that contains the list of distinguished This property is used in LDAP search queries when finding LDAP group names that a particular user belongs to. The value of the LDAP attribute, indicated by this property, should be a full DN for the user or the short username or userid. For example, a group entry for "fooGroup" containing "member : uid=fooUser,ou=Users,dc=domain,dc=com" will help determine that  "fooUser" belongs to LDAP group "fooGroup". -See [Group Membership]({{< ref "#group-membership" >}}) for a detailed example. +See [Group Membership]({{% ref "#group-membership" %}}) for a detailed example. -This property can also be used to find the users if a custom-configured LDAP query returns a group instead of a user (as of [Hive 2.1.1](https://issues.apache.org/jira/browse/HIVE-14513)). For details, see [Support for Groups in Custom LDAP Query]({{< ref "#support-for-groups-in-custom-ldap-query" >}}). 
+This property can also be used to find the users if a custom-configured LDAP query returns a group instead of a user (as of [Hive 2.1.1](https://issues.apache.org/jira/browse/HIVE-14513)). For details, see [Support for Groups in Custom LDAP Query]({{% ref "#support-for-groups-in-custom-ldap-query" %}}). ##### hive.server2.authentication.ldap.userMembershipKey @@ -2629,14 +2629,14 @@ LDAP attribute name on the user object that contains groups of which the user is This property is used in LDAP search queries for finding LDAP group names a user belongs to. The value of this property is used to construct LDAP group search query and is used to indicate what a group's objectClass is. Every LDAP group has certain objectClass. For example: group, groupOfNames, groupOfUniqueNames etc. -See [Group Membership]({{< ref "#group-membership" >}}) for a detailed example. +See [Group Membership]({{% ref "#group-membership" %}}) for a detailed example. ##### hive.server2.authentication.ldap.userDNPattern * Default Value: (empty) * Added In: Hive 1.3 with [HIVE-7193](https://issues.apache.org/jira/browse/HIVE-7193) -A COLON-separated list of string patterns to represent the base DNs for LDAP Users. Use "%s" where the actual username is to be plugged in. See [User Search List]({{< ref "#user-search-list" >}}) for details. +A COLON-separated list of string patterns to represent the base DNs for LDAP Users. Use "%s" where the actual username is to be plugged in. See [User Search List]({{% ref "#user-search-list" %}}) for details. Example of one string pattern: *uid=%s,OU=Users,DC=apache,DC=org* @@ -2645,14 +2645,14 @@ Example of one string pattern: *uid=%s,OU=Users,DC=apache,DC=org* * Default Value: (empty) * Added In: Hive 1.3 with [HIVE-7193](https://issues.apache.org/jira/browse/HIVE-7193) -A COMMA-separated list of usernames for whom authentication will succeed if the user is found in LDAP. See [User Search List]({{< ref "#user-search-list" >}}) for details. 
+A COMMA-separated list of usernames for whom authentication will succeed if the user is found in LDAP. See [User Search List]({{% ref "#user-search-list" %}}) for details. ##### hive.server2.authentication.ldap.customLDAPQuery * Default Value: (empty) * Added In: Hive 1.3 with [HIVE-7193](https://issues.apache.org/jira/browse/HIVE-7193) -A user-specified custom LDAP query that will be used to grant/deny an authentication request. If the user is part of the query's result set, authentication succeeds. See [Custom Query String]({{< ref "#custom-query-string" >}}) for details. +A user-specified custom LDAP query that will be used to grant/deny an authentication request. If the user is part of the query's result set, authentication succeeds. See [Custom Query String]({{% ref "#custom-query-string" %}}) for details. ##### hive.server2.authentication.ldap.binddn @@ -2681,7 +2681,7 @@ Either the location of a HiveServer2 global init file or a directory containing * Default Value: `binary` * Added In: Hive 0.12.0 -* Deprecated In: Hive 0.14.0 with [HIVE-6972](https://issues.apache.org/jira/browse/HIVE-6972) (see [Connection URL When HiveServer2 Is Running in HTTP Mode]({{< ref "#connection-url-when-hiveserver2-is-running-in-http-mode" >}})) but only for clients. This setting is still in use and not deprecated for the HiveServer2 itself. +* Deprecated In: Hive 0.14.0 with [HIVE-6972](https://issues.apache.org/jira/browse/HIVE-6972) (see [Connection URL When HiveServer2 Is Running in HTTP Mode]({{% ref "#connection-url-when-hiveserver2-is-running-in-http-mode" %}})) but only for clients. This setting is still in use and not deprecated for the HiveServer2 itself. Server transport mode. Value can be "binary" or "http". @@ -2696,7 +2696,7 @@ Port number when in HTTP mode. 
* Default Value: `cliservice` * Added In: Hive 0.12.0 -* Deprecated In: Hive 0.14.0 with [HIVE-6972](https://issues.apache.org/jira/browse/HIVE-6972) (see [Connection URL When HiveServer2 Is Running in HTTP Mode]({{< ref "#connection-url-when-hiveserver2-is-running-in-http-mode" >}})) +* Deprecated In: Hive 0.14.0 with [HIVE-6972](https://issues.apache.org/jira/browse/HIVE-6972) (see [Connection URL When HiveServer2 Is Running in HTTP Mode]({{% ref "#connection-url-when-hiveserver2-is-running-in-http-mode" %}})) Path component of URL endpoint when in HTTP mode. @@ -2830,7 +2830,7 @@ SPNEGO service principal, optional. A typical value would look like `HTTP/_HOST * Default Value: (empty) * Added In: Hive 0.13.0 with [HIVE-6466](https://issues.apache.org/jira/browse/HIVE-6466) -List of the underlying PAM services that should be used when **[hive.server2.authentication]({{< ref "#hiveserver2authentication" >}})**  type is PAM. A file with the same name must exist in /etc/pam.d. +List of the underlying PAM services that should be used when **[hive.server2.authentication]({{% ref "#hiveserver2authentication" %}})**  type is PAM. A file with the same name must exist in /etc/pam.d. ##### hive.server2.use.SSL @@ -2865,7 +2865,7 @@ A list of comma separated values corresponding to YARN queues of the same name. * Default Value: `1` * Added In: Hive 0.13.0 with [HIVE-6325](https://issues.apache.org/jira/browse/HIVE-6325) -A positive integer that determines the number of Tez sessions that should be launched on each of the queues specified by **[hive.server2.tez.default.queues]({{< ref "#hiveserver2tezdefaultqueues" >}})** . Determines the parallelism on each queue. +A positive integer that determines the number of Tez sessions that should be launched on each of the queues specified by **[hive.server2.tez.default.queues]({{% ref "#hiveserver2tezdefaultqueues" %}})** . Determines the parallelism on each queue. 
##### hive.server2.tez.initialize.default.sessions @@ -2891,14 +2891,14 @@ The check interval for session/operation timeout, which can be disabled by setti + Hive 1.2.1+, 1.3+, 2.x+: 7d ([HIVE-9842](https://issues.apache.org/jira/browse/HIVE-9842)) * Added In: Hive 0.14.0 with [HIVE-5799](https://issues.apache.org/jira/browse/HIVE-5799) -With  [**hive.server2.session.check.interval**]({{< ref "#**hive-server2-session-check-interval**" >}}) set to a positive time value, session will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero or negative value. +With  [**hive.server2.session.check.interval**]({{% ref "#**hive-server2-session-check-interval**" %}}) set to a positive time value, session will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero or negative value. ##### hive.server2.idle.operation.timeout * Default Value: 0ms * Added In: Hive 0.14.0 with [HIVE-5799](https://issues.apache.org/jira/browse/HIVE-5799) -With  [**hive.server2.session.check.interval**]({{< ref "#**hive-server2-session-check-interval**" >}}) set to a positive time value, operation will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero value. +With  [**hive.server2.session.check.interval**]({{% ref "#**hive-server2-session-check-interval**" %}}) set to a positive time value, operation will be closed when it's not accessed for this duration of time, which can be disabled by setting to zero value. With positive value, it's checked for operations in terminal state only (FINISHED, CANCELED, CLOSED, ERROR). With negative value, it's checked for all of the operations regardless of state. 
@@ -2923,7 +2923,7 @@ Top level directory where operation logs are stored if logging functionality is * Added In: Hive 0.14.0 with [HIVE-8785](https://issues.apache.org/jira/browse/HIVE-8785) * Removed In: Hive 1.2.0 with [HIVE-10119](https://issues.apache.org/jira/browse/HIVE-10119) -When `true`, HiveServer2 operation logs available for clients will be verbose. Replaced in Hive 1.2.0 by [**hive.server2.logging.operation.level**]({{< ref "#**hive-server2-logging-operation-level**" >}}). +When `true`, HiveServer2 operation logs available for clients will be verbose. Replaced in Hive 1.2.0 by [**hive.server2.logging.operation.level**]({{% ref "#**hive-server2-logging-operation-level**" %}}). ##### hive.server2.logging.operation.level @@ -2932,7 +2932,7 @@ When `true`, HiveServer2 operation logs available for clients will be verbose. R HiveServer2 operation logging mode available to clients to be set at session level. -For this to work, **[hive.server2.logging.operation.enabled]({{< ref "#hiveserver2loggingoperationenabled" >}})** should be set to true. The allowed values are: +For this to work, **[hive.server2.logging.operation.enabled]({{% ref "#hiveserver2loggingoperationenabled" %}})** should be set to true. The allowed values are: * NONE: Ignore any logging. * EXECUTION: Log completion of tasks. @@ -3007,7 +3007,7 @@ This configuration property enables the user to provide a comma-separated list o * Default Value: `true` * Added In: Hive 2.2.0 with [HIVE-15473](https://issues.apache.org/jira/browse/HIVE-15473) -Allows HiveServer2 to send progress bar update information. This is currently available only if the [execution engine]({{< ref "#execution-engine" >}}) is **tez.** +Allows HiveServer2 to send progress bar update information. 
This is currently available only if the [execution engine]({{% ref "#execution-engine" %}}) is **tez.** ##### hive.hadoop.classpath @@ -3018,7 +3018,7 @@ For the Windows operating system, Hive needs to pass the HIVE_HADOOP_CLASSPATH J ### HiveServer2 Web UI -A web interface for HiveServer2 is introduced in release 2.0.0 (see [Web UI for HiveServer2]({{< ref "#web-ui-for-hiveserver2" >}})). +A web interface for HiveServer2 is introduced in release 2.0.0 (see [Web UI for HiveServer2]({{% ref "#web-ui-for-hiveserver2" %}})). ##### hive.server2.webui.host @@ -3088,47 +3088,47 @@ The path to the Kerberos Keytab file containing the HiveServer2 WebUI SPNEGO ser * Default Value: `HTTP/_HOST@EXAMPLE.COM` * Added In: Hive 2.0.0 with [HIVE-12485](https://issues.apache.org/jira/browse/HIVE-12485) -The HiveServer2 WebUI SPNEGO service principal. The special string _HOST will be replaced automatically with the value of **[hive.server2.webui.host]({{< ref "#hiveserver2webuihost" >}})** or the correct host name. +The HiveServer2 WebUI SPNEGO service principal. The special string _HOST will be replaced automatically with the value of **[hive.server2.webui.host]({{% ref "#hiveserver2webuihost" %}})** or the correct host name. #####   hive.server2.webui. explain.out put * Default Value: `false` * Added in: Hive 3.1.0 with [HIVE-18469](https://issues.apache.org/jira/browse/HIVE-18469) -The [EXPLAIN EXTENDED]({{< ref "#explain-extended" >}}) output for the query will be shown in the WebUI / Drilldown / Query Plan tab when this configuration property is set to true. +The [EXPLAIN EXTENDED]({{% ref "#explain-extended" %}}) output for the query will be shown in the WebUI / Drilldown / Query Plan tab when this configuration property is set to true. -Prior to Hive 3.1.0, you can use **[hive.log.explain.output]({{< ref "#hive-log-explain-output" >}})** instead of this configuration property. 
+Prior to Hive 3.1.0, you can use **[hive.log.explain.output]({{% ref "#hive-log-explain-output" %}})** instead of this configuration property. ##### hive.server2.webui.show.graph * Default Value: `false` * Added in: Hive 4.0.0 with [HIVE-17300](https://issues.apache.org/jira/browse/HIVE-17300) -Set this to true to to display query plan as a graph instead of text in the WebUI. Only works with  **[hive.server2.webui.explain.output]({{< ref "#hive-server2-webui-explain-output" >}})**  set to true. +Set this to true to to display query plan as a graph instead of text in the WebUI. Only works with  **[hive.server2.webui.explain.output]({{% ref "#hive-server2-webui-explain-output" %}})**  set to true. ##### hive.server2.webui.max.graph.size * Default Value: `25` * Added in: Hive 4.0.0 with [HIVE-17300](https://issues.apache.org/jira/browse/HIVE-17300) -Max number of stages graph can display. If number of stages exceeds this, no query plan will be shown. Only works when  **[hive.server2.webui.show.graph]({{< ref "#hiveserver2webuishowgraph" >}})**  and  **[hive.server2.webui.explain.output]({{< ref "#hive-server2-webui-explain-output" >}})**  set to true. +Max number of stages graph can display. If number of stages exceeds this, no query plan will be shown. Only works when  **[hive.server2.webui.show.graph]({{% ref "#hiveserver2webuishowgraph" %}})**  and  **[hive.server2.webui.explain.output]({{% ref "#hive-server2-webui-explain-output" %}})**  set to true. ##### hive.server2.webui.show.stats * Default Value: `false` * Added in: Hive 4.0.0 with [HIVE-17300](https://issues.apache.org/jira/browse/HIVE-17300) -Set this to true to to display statistics and log file for MapReduce tasks in the WebUI. Only works when  **[hive.server2.webui.show.graph]({{< ref "#hiveserver2webuishowgraph" >}})**  and  **[hive.server2.webui.explain.output]({{< ref "#hive-server2-webui-explain-output" >}})**  set to true. 
+Set this to true to to display statistics and log file for MapReduce tasks in the WebUI. Only works when  **[hive.server2.webui.show.graph]({{% ref "#hiveserver2webuishowgraph" %}})**  and  **[hive.server2.webui.explain.output]({{% ref "#hive-server2-webui-explain-output" %}})**  set to true. ## Spark -[Apache Spark](http://spark.apache.org/) was added in Hive [1.1.0]({{< ref "#1-1-0" >}}) ([HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292) and the merge-to-trunk JIRA's [HIVE-9257](https://issues.apache.org/jira/browse/HIVE-9257), [9352](https://issues.apache.org/jira/browse/HIVE-9352), [9448](https://issues.apache.org/jira/browse/HIVE-9448)). For information see the design document [Hive on Spark]({{< ref "hive-on-spark" >}}) and [Hive on Spark: Getting Started.]({{< ref "hive-on-spark-getting-started" >}}) +[Apache Spark](http://spark.apache.org/) was added in Hive [1.1.0]({{% ref "#1-1-0" %}}) ([HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292) and the merge-to-trunk JIRA's [HIVE-9257](https://issues.apache.org/jira/browse/HIVE-9257), [9352](https://issues.apache.org/jira/browse/HIVE-9352), [9448](https://issues.apache.org/jira/browse/HIVE-9448)). 
For information see the design document [Hive on Spark]({{% ref "hive-on-spark" %}}) and [Hive on Spark: Getting Started.]({{% ref "hive-on-spark-getting-started" %}}) To configure Hive execution to Spark, set the following property to "`spark`": -* [hive.execution.engine]({{< ref "#hiveexecutionengine" >}}) +* [hive.execution.engine]({{% ref "#hiveexecutionengine" %}}) Besides the configuration properties listed in this section, some properties in other sections are also related to Spark: @@ -3140,7 +3140,7 @@ Besides the configuration properties listed in this section, some properties in hive.spark.job.monitor.timeout * Default Value: `60` seconds -* Added In: Hive [1.1.0]({{< ref "#1-1-0" >}}) with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive [1.1.0]({{% ref "#1-1-0" %}}) with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Timeout for job monitor to get Spark job state. @@ -3176,14 +3176,14 @@ Updates Spark job execution progress in-place in the terminal. * Default Value: `false` * Added In: Hive 2.3.0 with [HIVE-15489](https://issues.apache.org/jira/browse/HIVE-15489) -* Removed In: Hive 3.0.0 with [HIVE-16336](https://issues.apache.org/jira/browse/HIVE-16336), replaced by  **[hive.spark.use.ts.stats.for.mapjoin]({{< ref "#hivesparkusetsstatsformapjoin" >}})** +* Removed In: Hive 3.0.0 with [HIVE-16336](https://issues.apache.org/jira/browse/HIVE-16336), replaced by  **[hive.spark.use.ts.stats.for.mapjoin]({{% ref "#hivesparkusetsstatsformapjoin" %}})** If this is set to true, mapjoin optimization in Hive/Spark will use source file sizes associated with the TableScan operator on the root of the operator tree, instead of using operator statistics. 
##### hive.spark.use.ts.stats.for.mapjoin * Default Value: `false` -* Added In: Hive 3.0.0 with [HIVE-16336](https://issues.apache.org/jira/browse/HIVE-16336), replaces  **[hive.spark.use.file.size.for.mapjoin]({{< ref "#hivesparkusefilesizeformapjoin" >}})** +* Added In: Hive 3.0.0 with [HIVE-16336](https://issues.apache.org/jira/browse/HIVE-16336), replaces  **[hive.spark.use.file.size.for.mapjoin]({{% ref "#hivesparkusefilesizeformapjoin" %}})** If this is set to true, mapjoin optimization in Hive/Spark will use statistics from TableScan operators at the root of the operator tree, instead of parent ReduceSink operators of the Join operator. @@ -3192,19 +3192,19 @@ If this is set to true, mapjoin optimization in Hive/Spark will use statistics f * Default Value: `false` * Added In: Hive 3.0.0 with [HIVE-11133](https://issues.apache.org/jira/browse/HIVE-11133) -Whether to [show explain result at user level]({{< ref "#show-explain-result-at-user-level" >}}) for Hive-on-Spark queries. When enabled, will log EXPLAIN output for the query at user level. +Whether to [show explain result at user level]({{% ref "#show-explain-result-at-user-level" %}}) for Hive-on-Spark queries. When enabled, will log EXPLAIN output for the query at user level. ##### hive.prewarm.spark.timeout * Default Value: 5000ms * Added In: Hive 3.0.0 with [HIVE-17362](https://issues.apache.org/jira/browse/HIVE-17362) -Time to wait to finish prewarming Spark executors when  **[hive.prewarm.enabled]({{< ref "#hiveprewarmenabled" >}})**  is true. +Time to wait to finish prewarming Spark executors when  **[hive.prewarm.enabled]({{% ref "#hiveprewarmenabled" %}})**  is true. 
-Note:  These configuration properties for Hive on Spark are documented in the  [Tez section]({{< ref "#tez-section" >}})  because they can also affect Tez: +Note:  These configuration properties for Hive on Spark are documented in the  [Tez section]({{% ref "#tez-section" %}})  because they can also affect Tez: -* **[hive.prewarm.enabled]({{< ref "#hiveprewarmenabled" >}})** -* **[hive.prewarm.numcontainers]({{< ref "#hiveprewarmnumcontainers" >}})** +* **[hive.prewarm.enabled]({{% ref "#hiveprewarmenabled" %}})** +* **[hive.prewarm.numcontainers]({{% ref "#hiveprewarmnumcontainers" %}})** ##### hive.spark.optimize.shuffle.serde @@ -3271,28 +3271,28 @@ The remote Spark driver is the application launched in the Spark cluster, that s ##### hive.spark.client.future.timeout * Default Value: `60` seconds -* Added In: Hive [1.1.0]({{< ref "#1-1-0" >}}) with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive [1.1.0]({{% ref "#1-1-0" %}}) with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Timeout for requests from Hive client to remote Spark driver. ##### hive.spark.client.connect.timeout * Default Value: `1000` miliseconds -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Timeout for remote Spark driver in connecting back to Hive client. 
##### hive.spark.client.server.connect.timeout * Default Value: `90000` miliseconds -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337), default changed in same release with [HIVE-9519](https://issues.apache.org/jira/browse/HIVE-9519) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337), default changed in same release with [HIVE-9519](https://issues.apache.org/jira/browse/HIVE-9519) Timeout for handshake between Hive client and remote Spark driver. Checked by both processes. ##### hive.spark.client.secret.bits * Default Value: `256` -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Number of bits of randomness in the generated secret for communication between Hive client and remote Spark driver. Rounded down to nearest multiple of 8. @@ -3306,62 +3306,62 @@ The server address of HiverServer2 host to be used for communication between Hiv ##### hive.spark.client.rpc.threads * Default Value: `8` -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Maximum number of threads for remote Spark driver's RPC event loop. ##### hive.spark.client.rpc.max.size * Default Value: `52,428,800`(50 * 1024 * 1024, or 50 MB) -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Maximum message size in bytes for communication between Hive client and remote Spark driver. Default is 50 MB. 
##### hive.spark.client.channel.log.level * Default Value: (empty) -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-9337](https://issues.apache.org/jira/browse/HIVE-9337) Channel logging level for remote Spark driver. One of DEBUG, ERROR, INFO, TRACE, WARN. If unset, TRACE is chosen. ## Tez -[Apache Tez](http://incubator.apache.org/projects/tez.html) was added in Hive 0.13.0 ([HIVE-4660](https://issues.apache.org/jira/browse/HIVE-4660) and [HIVE-6098](https://issues.apache.org/jira/browse/HIVE-6098)).  For information see the design document [Hive on Tez]({{< ref "hive-on-tez" >}}), especially the [Installation and Configuration]({{< ref "#installation-and-configuration" >}}) section. +[Apache Tez](http://incubator.apache.org/projects/tez.html) was added in Hive 0.13.0 ([HIVE-4660](https://issues.apache.org/jira/browse/HIVE-4660) and [HIVE-6098](https://issues.apache.org/jira/browse/HIVE-6098)).  For information see the design document [Hive on Tez]({{% ref "hive-on-tez" %}}), especially the [Installation and Configuration]({{% ref "#installation-and-configuration" %}}) section. 
Besides the configuration properties listed in this section, some properties in other sections are also related to Tez: -* **[hive.execution.engine]({{< ref "#hiveexecutionengine" >}})** -* ##### [hive.mapjoin.optimized.hashtable]({{< ref "#hivemapjoinoptimizedhashtable" >}}) -* ##### [hive.mapjoin.optimized.hashtable.wbsize]({{< ref "#hivemapjoinoptimizedhashtablewbsize" >}}) -* **[hive.server2.tez.default.queues]({{< ref "#hiveserver2tezdefaultqueues" >}})** -* **[hive.server2.tez.sessions.per.default.queue]({{< ref "#hiveserver2tezsessionsperdefaultqueue" >}})** -* **[hive.server2.tez.initialize.default.sessions]({{< ref "#hiveserver2tezinitializedefaultsessions" >}})** -* **[hive.stats.max.variable.length]({{< ref "#hivestatsmaxvariablelength" >}})** -* **[hive.stats.list.num.entries]({{< ref "#hivestatslistnumentries" >}})** -* **[hive.stats.map.num.entries]({{< ref "#hivestatsmapnumentries" >}})** -* **[hive.stats.map.parallelism]({{< ref "#hivestatsmapparallelism" >}})**  (Hive 0.13 only; removed in Hive 0.14) -* **[hive.stats.join.factor]({{< ref "#hivestatsjoinfactor" >}})** -* **[hive.stats.deserialization.factor]({{< ref "#hivestatsdeserializationfactor" >}})** -* **[hive.tez.dynamic.semijoin.reduction]({{< ref "#hivetezdynamicsemijoinreduction" >}})** -* **[hive.tez.min.bloom.filter.entries]({{< ref "#hivetezminbloomfilterentries" >}})** -* **[hive.tez.max.bloom.filter.entries]({{< ref "#hivetezmaxbloomfilterentries" >}})** -* **[hive.tez.bloom.filter.factor]({{< ref "#hivetezbloomfilterfactor" >}})** -* **[hive.tez.bigtable.minsize.semijoin.reduction]({{< ref "#hivetezbigtableminsizesemijoinreduction" >}})** -* **[hive.explain.user]({{< ref "#hive-explain-user" >}})** +* **[hive.execution.engine]({{% ref "#hiveexecutionengine" %}})** +* ##### [hive.mapjoin.optimized.hashtable]({{% ref "#hivemapjoinoptimizedhashtable" %}}) +* ##### [hive.mapjoin.optimized.hashtable.wbsize]({{% ref "#hivemapjoinoptimizedhashtablewbsize" %}}) +* 
**[hive.server2.tez.default.queues]({{% ref "#hiveserver2tezdefaultqueues" %}})** +* **[hive.server2.tez.sessions.per.default.queue]({{% ref "#hiveserver2tezsessionsperdefaultqueue" %}})** +* **[hive.server2.tez.initialize.default.sessions]({{% ref "#hiveserver2tezinitializedefaultsessions" %}})** +* **[hive.stats.max.variable.length]({{% ref "#hivestatsmaxvariablelength" %}})** +* **[hive.stats.list.num.entries]({{% ref "#hivestatslistnumentries" %}})** +* **[hive.stats.map.num.entries]({{% ref "#hivestatsmapnumentries" %}})** +* **[hive.stats.map.parallelism]({{% ref "#hivestatsmapparallelism" %}})**  (Hive 0.13 only; removed in Hive 0.14) +* **[hive.stats.join.factor]({{% ref "#hivestatsjoinfactor" %}})** +* **[hive.stats.deserialization.factor]({{% ref "#hivestatsdeserializationfactor" %}})** +* **[hive.tez.dynamic.semijoin.reduction]({{% ref "#hivetezdynamicsemijoinreduction" %}})** +* **[hive.tez.min.bloom.filter.entries]({{% ref "#hivetezminbloomfilterentries" %}})** +* **[hive.tez.max.bloom.filter.entries]({{% ref "#hivetezmaxbloomfilterentries" %}})** +* **[hive.tez.bloom.filter.factor]({{% ref "#hivetezbloomfilterfactor" %}})** +* **[hive.tez.bigtable.minsize.semijoin.reduction]({{% ref "#hivetezbigtableminsizesemijoinreduction" %}})** +* **[hive.explain.user]({{% ref "#hive-explain-user" %}})** ##### hive.jar.directory * Default Value: `null` * Added In: Hive 0.13.0 with [HIVE-5003](https://issues.apache.org/jira/browse/HIVE-5003) and [HIVE-6098](https://issues.apache.org/jira/browse/HIVE-6098), default changed in [HIVE-6636](https://issues.apache.org/jira/browse/HIVE-6636) -This is the location that Hive in Tez mode will look for to find a site-wide installed Hive instance.  See  **[hive.user.install.directory]({{< ref "#hiveuserinstalldirectory" >}})** for the default behavior. +This is the location that Hive in Tez mode will look for to find a site-wide installed Hive instance.  
See  **[hive.user.install.directory]({{% ref "#hiveuserinstalldirectory" %}})** for the default behavior. ##### hive.user.install.directory * Default Value: `hdfs:///user/` * Added In: Hive 0.13.0 with [HIVE-5003](https://issues.apache.org/jira/browse/HIVE-5003) and [HIVE-6098](https://issues.apache.org/jira/browse/HIVE-6098) -If Hive (in Tez mode only) cannot find a usable Hive jar in **[hive.jar.directory]({{< ref "#hivejardirectory" >}})** , it will upload the Hive jar to <**hive.user.install.directory**>/\<*user_name*\> and use it to run queries. +If Hive (in Tez mode only) cannot find a usable Hive jar in **[hive.jar.directory]({{% ref "#hivejardirectory" %}})** , it will upload the Hive jar to <**hive.user.install.directory**>/\<*user_name*\> and use it to run queries. ##### [hive.compute.splits.in.am](http://hive.compute.splits.in.am) @@ -3435,14 +3435,14 @@ By default Tez will use the Java options from map tasks. This can be used to ove * Default Value: `false` * Added In: Hive 0.13.0 with [HIVE-6447](https://issues.apache.org/jira/browse/HIVE-6447) -Whether joins can be automatically converted to bucket map joins in Hive when Tez is used as the execution engine ( **[hive.execution.engine]({{< ref "#hiveexecutionengine" >}})** is set to "`tez`"). +Whether joins can be automatically converted to bucket map joins in Hive when Tez is used as the execution engine ( **[hive.execution.engine]({{% ref "#hiveexecutionengine" %}})** is set to "`tez`"). ##### hive.tez.log.level * Default Value: `INFO` * Added In: Hive 0.13.0 with [HIVE-6743](https://issues.apache.org/jira/browse/HIVE-6743) -The log level to use for tasks executing as part of the DAG. Used only if **[hive.tez.java.opts]({{< ref "#hivetezjavaopts" >}})** is used to configure Java options. +The log level to use for tasks executing as part of the DAG. Used only if **[hive.tez.java.opts]({{% ref "#hivetezjavaopts" %}})** is used to configure Java options. 
##### hive.localize.resource.wait.interval @@ -3484,32 +3484,32 @@ Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still est * Default Value: `2` * Added In: Hive 0.14.0 with [HIVE-7158](https://issues.apache.org/jira/browse/HIVE-7158) -When [auto reducer parallelism]({{< ref "#auto-reducer-parallelism" >}}) is enabled this factor will be used to over-partition data in shuffle  edges. +When [auto reducer parallelism]({{% ref "#auto-reducer-parallelism" %}}) is enabled this factor will be used to over-partition data in shuffle  edges. ##### hive.tez.min.partition.factor * Default Value: `0.25` * Added In: Hive 0.14.0 with [HIVE-7158](https://issues.apache.org/jira/browse/HIVE-7158) -When [auto reducer parallelism]({{< ref "#auto-reducer-parallelism" >}}) is enabled this factor will be used to put a lower limit to the number  of reducers that Tez specifies. +When [auto reducer parallelism]({{% ref "#auto-reducer-parallelism" %}}) is enabled this factor will be used to put a lower limit to the number  of reducers that Tez specifies. ##### hive.tez.exec.print.summary * Default Value: `false` * Added In: Hive 0.14.0 with [HIVE-8495](https://issues.apache.org/jira/browse/HIVE-8495) -If true, displays breakdown of execution steps for every query executed on [Hive CLI]({{< ref "languagemanual-cli" >}}) or [Beeline]({{< ref "#beeline" >}}) client. +If true, displays breakdown of execution steps for every query executed on [Hive CLI]({{% ref "languagemanual-cli" %}}) or [Beeline]({{% ref "#beeline" %}}) client. ##### hive.tez.exec.inplace.progress * Default Value: `true` * Added In: Hive 0.14.0 with [HIVE-8495](https://issues.apache.org/jira/browse/HIVE-8495) -Updates Tez job execution progress in-place in the terminal when [Hive CLI]({{< ref "languagemanual-cli" >}}) is used. +Updates Tez job execution progress in-place in the terminal when [Hive CLI]({{% ref "languagemanual-cli" %}}) is used. 
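A minimal sketch of how the auto reducer parallelism settings described above might look in `hive-site.xml`. The two partition factors use the default values quoted above (`2` and `0.25`). The enabling property name `hive.tez.auto.reducer.parallelism` is taken from the wider Hive configuration reference rather than from this excerpt, and placing the settings in `hive-site.xml` is an assumption about the deployment.

```xml
<configuration>
  <!-- Assumed name of the switch for Tez auto reducer parallelism -->
  <property>
    <name>hive.tez.auto.reducer.parallelism</name>
    <value>true</value>
  </property>
  <!-- Over-partitioning factor for shuffle edges (documented default: 2) -->
  <property>
    <name>hive.tez.max.partition.factor</name>
    <value>2</value>
  </property>
  <!-- Lower limit factor for the number of reducers (documented default: 0.25) -->
  <property>
    <name>hive.tez.min.partition.factor</name>
    <value>0.25</value>
  </property>
</configuration>
```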
## LLAP -Live Long and Process (LLAP) functionality was added in Hive 2.0 ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926) and associated tasks). For details see [LLAP in Hive]({{< ref "llap" >}}). +Live Long and Process (LLAP) functionality was added in Hive 2.0 ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926) and associated tasks). For details see [LLAP in Hive]({{% ref "llap" %}}). LLAP adds the following configuration properties.  @@ -3838,20 +3838,20 @@ LLAP delegation token lifetime, in seconds if specified without a unit. ## Transactions and Compactor -Hive transactions with row-level ACID functionality were added in Hive 0.13.0 ([HIVE-5317](https://issues.apache.org/jira/browse/HIVE-5317) and its subtasks). For details see [ACID and Transactions in Hive]({{< ref "hive-transactions" >}}). +Hive transactions with row-level ACID functionality were added in Hive 0.13.0 ([HIVE-5317](https://issues.apache.org/jira/browse/HIVE-5317) and its subtasks). For details see [ACID and Transactions in Hive]({{% ref "hive-transactions" %}}). 
To turn on Hive transactions, change the values of these parameters from their defaults, as described below: -* **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** -* **[hive.compactor.initiator.on]({{< ref "#hivecompactorinitiatoron" >}})** -* **[hive.compactor.cleaner.on]({{< ref "#hivecompactorcleaneron" >}})** -* **[hive.compactor.worker.threads]({{< ref "#hivecompactorworkerthreads" >}})** +* **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** +* **[hive.compactor.initiator.on]({{% ref "#hivecompactorinitiatoron" %}})** +* **[hive.compactor.cleaner.on]({{% ref "#hivecompactorcleaneron" %}})** +* **[hive.compactor.worker.threads]({{% ref "#hivecompactorworkerthreads" %}})** These parameters must also have non-default values to turn on Hive transactions: -* **[hive.support.concurrency]({{< ref "#hivesupportconcurrency" >}})** -* **[hive.enforce.bucketing]({{< ref "#hiveenforcebucketing" >}})** (Hive 0.x and 1.x only) -* **[hive.exec.dynamic.partition.mode]({{< ref "#hiveexecdynamicpartitionmode" >}})** +* **[hive.support.concurrency]({{% ref "#hivesupportconcurrency" %}})** +* **[hive.enforce.bucketing]({{% ref "#hiveenforcebucketing" %}})** (Hive 0.x and 1.x only) +* **[hive.exec.dynamic.partition.mode]({{% ref "#hiveexecdynamicpartitionmode" %}})** ### Transactions @@ -3863,7 +3863,7 @@ These parameters must also have non-default values to turn on Hive transactions: Set this to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager as part of turning on Hive transactions. The default DummyTxnManager replicates pre-Hive-0.13 behavior and provides no transactions. 
-Turning on Hive transactions also requires appropriate settings for ****[hive.compactor.initiator.on]({{< ref "#hivecompactorinitiatoron" >}})**** , **[hive.compactor.cleaner.on]({{< ref "#hivecompactorcleaneron" >}}),** ******[hive.compactor.worker.threads]({{< ref "#hivecompactorworkerthreads" >}})****** , ********[hive.support.concurrency]({{< ref "#hivesupportconcurrency" >}})******** , ************[hive.enforce.bucketing]({{< ref "#hiveenforcebucketing" >}})************  (Hive 0.x and 1.x only), and ********[hive.exec.dynamic.partition.mode]({{< ref "#hiveexecdynamicpartitionmode" >}})******** . +Turning on Hive transactions also requires appropriate settings for ****[hive.compactor.initiator.on]({{% ref "#hivecompactorinitiatoron" %}})**** , **[hive.compactor.cleaner.on]({{% ref "#hivecompactorcleaneron" %}}),** ******[hive.compactor.worker.threads]({{% ref "#hivecompactorworkerthreads" %}})****** , ********[hive.support.concurrency]({{% ref "#hivesupportconcurrency" %}})******** , ************[hive.enforce.bucketing]({{% ref "#hiveenforcebucketing" %}})************  (Hive 0.x and 1.x only), and ********[hive.exec.dynamic.partition.mode]({{% ref "#hiveexecdynamicpartitionmode" %}})******** . ##### hive.txn.strict.locking.mode @@ -3914,7 +3914,7 @@ Frequency of WriteSet reaper runs. Maximum number of transactions that can be fetched in one call to open_txns(). -This controls how many transactions streaming agents such as [Flume](http://flume.apache.org/) or [Storm](https://storm.incubator.apache.org/) open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Thus increasing this value decreases the number of [delta files]({{< ref "#delta-files" >}}) created by streaming agents. But it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance. 
+This controls how many transactions streaming agents such as [Flume](http://flume.apache.org/) or [Storm](https://storm.incubator.apache.org/) open simultaneously. The streaming agent then writes that number of entries into a single file (per Flume agent or Storm bolt). Thus increasing this value decreases the number of [delta files]({{% ref "#delta-files" %}}) created by streaming agents. But it also increases the number of open transactions that Hive has to track at any given time, which may negatively affect read performance. ##### hive.max.open.txns @@ -3951,7 +3951,7 @@ ex.getMessage() + " (SQLState=" + ex.getSQLState() + ", ErrorCode=" + ex.getErro * Hive Transactions Value: `true` (for exactly one instance of the Thrift metastore service) * Added In: Hive 0.13.0 with [HIVE-5843](https://issues.apache.org/jira/browse/HIVE-5843) -Whether to run the initiator thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +Whether to run the initiator thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . Before [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) it's critical that this is enabled on exactly one metastore service instance. As of  [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388)  this property may be enabled on any number of standalone metastore instances. 
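The transaction-related parameters described above could be collected in `hive-site.xml` roughly as follows. This is a sketch under assumptions, not a complete setup: the worker thread count of `1` is an illustrative value, `nonstrict` is the non-default value of `hive.exec.dynamic.partition.mode` given in the Hive transactions documentation rather than in this section, and the compactor properties are assumed to be set on the metastore instance that should run those threads.

```xml
<configuration>
  <!-- Transaction manager required for Hive transactions -->
  <property>
    <name>hive.txn.manager</name>
    <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
  </property>
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
  <!-- Assumed non-default value; see the Hive transactions documentation -->
  <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
  </property>
  <!-- Metastore side: run the initiator thread on this instance -->
  <property>
    <name>hive.compactor.initiator.on</name>
    <value>true</value>
  </property>
  <!-- Metastore side: any value greater than 0; 1 is only an example -->
  <property>
    <name>hive.compactor.worker.threads</name>
    <value>1</value>
  </property>
</configuration>
```

`hive.enforce.bucketing` (Hive 0.x and 1.x only) and `hive.compactor.cleaner.on` (Hive 4.0.0 and later) are left out of the sketch because they apply to different release lines.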
@@ -3961,7 +3961,7 @@ Before [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) it's crit * Hive Transactions Value: `true` (for exactly one instance of the Thrift metastore service) * Added In: Hive 4.0.0 with [HIVE-26908](https://issues.apache.org/jira/browse/HIVE-26908) -Whether to run the Cleaner thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +Whether to run the Cleaner thread on this metastore instance. Set this to true on one instance of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . Before Hive 4.0.0 Cleaner thread can be started/stopped with config hive.compactor.initiator.on. This config helps to enable/disable initiator/cleaner threads independently @@ -3971,7 +3971,7 @@ Before Hive 4.0.0 Cleaner thread can be started/stopped with config hive.compact * Hive Transactions Value: greater than `0` on at least one instance of the Thrift metastore service * Added In: Hive 0.13.0 with [HIVE-5843](https://issues.apache.org/jira/browse/HIVE-5843) -How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service as part of turning on Hive transactions. For a complete list of parameters required for turning on transactions, see  **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +How many compactor worker threads to run on this metastore instance. Set this to a positive number on one or more instances of the Thrift metastore service as part of turning on Hive transactions. 
For a complete list of parameters required for turning on transactions, see  **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . Worker threads spawn MapReduce jobs to do compactions. They do not do the compactions themselves. Increasing the number of worker threads will decrease the time it takes tables or partitions to be compacted once they are determined to need compaction. It will also increase the background load on the Hadoop cluster as more MapReduce jobs will be running in the background. @@ -4067,11 +4067,11 @@ Controls how often the process to purge historical record of compactions runs. * Default Value: `2` * Added In: Hive 1.3.0 and 2.0.0 with [HIVE-12353](https://issues.apache.org/jira/browse/HIVE-12353) -Number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use [ALTER TABLE]({{< ref "#alter-table" >}}) to initiate compaction. Once a manually-initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than **[hive.compactor.history.retention.failed]({{< ref "#hivecompactorhistoryretentionfailed" >}})** . +Number of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use [ALTER TABLE]({{% ref "#alter-table" %}}) to initiate compaction. Once a manually-initiated compaction succeeds, auto-initiated compactions will resume. Note that this must be less than **[hive.compactor.history.retention.failed]({{% ref "#hivecompactorhistoryretentionfailed" %}})** . ## Indexing -Indexing was added in Hive 0.7.0 with [HIVE-417](https://issues.apache.org/jira/browse/HIVE-417), and bitmap indexing was added in Hive 0.8.0 with [HIVE-1803](https://issues.apache.org/jira/browse/HIVE-1803). For more information see [Indexing]({{< ref "languagemanual-indexing" >}}). 
+Indexing was added in Hive 0.7.0 with [HIVE-417](https://issues.apache.org/jira/browse/HIVE-417), and bitmap indexing was added in Hive 0.8.0 with [HIVE-1803](https://issues.apache.org/jira/browse/HIVE-1803). For more information see [Indexing]({{% ref "languagemanual-indexing" %}}). ##### hive.index.compact.file.ignore.hdfs @@ -4144,7 +4144,7 @@ Whether or not to use a binary search to find the entries in an index table that ## Statistics -See [Statistics in Hive]({{< ref "statsdev" >}}) for information about how to collect and use Hive table, partition, and column statistics. +See [Statistics in Hive]({{% ref "statsdev" %}}) for information about how to collect and use Hive table, partition, and column statistics. ##### hive.stats.dbclass @@ -4161,7 +4161,7 @@ Hive 0.13 and later:  The storage that stores temporary Hive statistics. In fi * Default Value: `true` * Added In: Hive 0.7 with [HIVE-1361](https://issues.apache.org/jira/browse/HIVE-1361) -This flag enables gathering and updating statistics automatically during Hive [DML]({{< ref "languagemanual-dml" >}}) operations. +This flag enables gathering and updating statistics automatically during Hive [DML]({{% ref "languagemanual-dml" %}}) operations. Statistics are not gathered for `LOAD DATA` statements. @@ -4184,14 +4184,14 @@ The default connection string for the database that stores temporary Hive statis * Default Value: (empty) * Added In: Hive 0.7 with [HIVE-1923](https://issues.apache.org/jira/browse/HIVE-1923) -The Java class (implementing the StatsPublisher interface) that is used by default if **[hive.stats.dbclass]({{< ref "#hivestatsdbclass" >}})**  is not JDBC or HBase (Hive 0.12.0 and earlier), or if **[hive.stats.dbclass]({{< ref "#hivestatsdbclass" >}})** is a custom type (Hive 0.13.0 and later:  [HIVE-4632](https://issues.apache.org/jira/browse/HIVE-4632)). 
+The Java class (implementing the StatsPublisher interface) that is used by default if **[hive.stats.dbclass]({{% ref "#hivestatsdbclass" %}})**  is not JDBC or HBase (Hive 0.12.0 and earlier), or if **[hive.stats.dbclass]({{% ref "#hivestatsdbclass" %}})** is a custom type (Hive 0.13.0 and later:  [HIVE-4632](https://issues.apache.org/jira/browse/HIVE-4632)). ##### hive.stats.default.aggregator * Default Value: (empty) * Added In: Hive 0.7 with [HIVE-1923](https://issues.apache.org/jira/browse/HIVE-1923) -The Java class (implementing the StatsAggregator interface) that is used by default if **[hive.stats.dbclass]({{< ref "#hivestatsdbclass" >}})**  is not JDBC or HBase (Hive 0.12.0 and earlier), or if **[hive.stats.dbclass]({{< ref "#hivestatsdbclass" >}})**  is a custom type (Hive 0.13.0 and later:  [HIVE-4632](https://issues.apache.org/jira/browse/HIVE-4632)). +The Java class (implementing the StatsAggregator interface) that is used by default if **[hive.stats.dbclass]({{% ref "#hivestatsdbclass" %}})**  is not JDBC or HBase (Hive 0.12.0 and earlier), or if **[hive.stats.dbclass]({{% ref "#hivestatsdbclass" %}})**  is a custom type (Hive 0.13.0 and later:  [HIVE-4632](https://issues.apache.org/jira/browse/HIVE-4632)). ##### hive.stats.jdbc.timeout @@ -4240,7 +4240,7 @@ Comma-separated list of statistics publishers to be invoked on counters on each * Default Value: (empty) * Added In: Hive 0.8.1 with [HIVE-2446](https://issues.apache.org/jira/browse/HIVE-2446) ([patch 2](https://issues.apache.org/jira/secure/attachment/12494853/HIVE-2446.2.patch)) -Subset of counters that should be of interest for **[hive.client.stats.publishers]({{< ref "configuration-properties" >}})** (when one wants to limit their publishing). Non-display names should be used. +Subset of counters that should be of interest for **[hive.client.stats.publishers]({{% ref "configuration-properties" %}})** (when one wants to limit their publishing). Non-display names should be used. 
##### hive.stats.reliable @@ -4287,7 +4287,7 @@ Determines if, when the prefix of the key used for intermediate statistics colle * Default Value: `24` * Added In: Hive 0.13 with [HIVE-6229](https://issues.apache.org/jira/browse/HIVE-6229) -Reserved length for postfix of statistics key. Currently only meaningful for counter type statistics which should keep the length of the full statistics key smaller than the maximum length configured by **[hive.stats.key.prefix.max.length]({{< ref "#hivestatskeyprefixmaxlength" >}})** . For counter type statistics, it should be bigger than the length of LB spec if exists. +Reserved length for postfix of statistics key. Currently only meaningful for counter type statistics which should keep the length of the full statistics key smaller than the maximum length configured by **[hive.stats.key.prefix.max.length]({{% ref "#hivestatskeyprefixmaxlength" %}})** . For counter type statistics, it should be bigger than the length of LB spec if exists. ##### hive.stats.max.variable.length @@ -4369,14 +4369,14 @@ In the absence of table/partition statistics, average row size will be used to  * Default Value: `false` * Added In: Hive 0.13.0 with [HIVE-5483](https://issues.apache.org/jira/browse/HIVE-5483) -When set to true Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the metastore. For basic statistics collection, set the configuration property **[hive.stats.autogather]({{< ref "#hivestatsautogather" >}})** to true. For more advanced statistics collection, run ANALYZE TABLE queries. +When set to true Hive will answer a few queries like min, max, and count(1) purely using statistics stored in the metastore. For basic statistics collection, set the configuration property **[hive.stats.autogather]({{% ref "#hivestatsautogather" %}})** to true. For more advanced statistics collection, run ANALYZE TABLE queries. 
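As a hedged illustration, the two properties mentioned above might be enabled together in `hive-site.xml` like this. Placing them in `hive-site.xml` is an assumption about the deployment, and more advanced statistics would still need ANALYZE TABLE queries.

```xml
<configuration>
  <!-- Answer queries such as min, max, and count(1) from metastore statistics -->
  <property>
    <name>hive.compute.query.using.stats</name>
    <value>true</value>
  </property>
  <!-- Gather basic statistics automatically during DML operations -->
  <property>
    <name>hive.stats.autogather</name>
    <value>true</value>
  </property>
</configuration>
```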
##### hive.stats.gather.num.threads * Default Value: `10` * Added In: Hive 0.13.0 with [HIVE-6578](https://issues.apache.org/jira/browse/HIVE-6578) -Number of threads used by partialscan/noscan analyze command for partitioned tables. This is applicable only for file formats that implement the StatsProvidingRecordReader interface (like [ORC]({{< ref "#orc" >}})). +Number of threads used by partialscan/noscan analyze command for partitioned tables. This is applicable only for file formats that implement the StatsProvidingRecordReader interface (like [ORC]({{% ref "#orc" %}})). ##### hive.stats.fetch.bitvector @@ -4414,12 +4414,12 @@ Whether Hive fetches bitvector when computing number of distinct values (ndv). K ## Authentication and Authorization -* [Restricted/Hidden/Internal List and Whitelist]({{< ref "#restrictedhiddeninternal-list-and-whitelist" >}}) -* [Hive Client Security]({{< ref "#hive-client-security" >}}) -* [Hive Metastore Security]({{< ref "#hive-metastore-security" >}}) -* [SQL Standard Based Authorization]({{< ref "#sql-standard-based-authorization" >}}) +* [Restricted/Hidden/Internal List and Whitelist]({{% ref "#restrictedhiddeninternal-list-and-whitelist" %}}) +* [Hive Client Security]({{% ref "#hive-client-security" %}}) +* [Hive Metastore Security]({{% ref "#hive-metastore-security" %}}) +* [SQL Standard Based Authorization]({{% ref "#sql-standard-based-authorization" %}}) -For an overview of authorization modes, see [Hive Authorization]({{< ref "languagemanual-authorization" >}}). +For an overview of authorization modes, see [Hive Authorization]({{% ref "languagemanual-authorization" %}}). 
### Restricted/Hidden/Internal List and Whitelist @@ -4435,7 +4435,7 @@ For an overview of authorization modes, see [Hive Authorization]({{< ref "langua + Hive 3.0.0: *all of the above, plus these:* `hive.spark.client.connect.timeout` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.server.connect.timeout` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.channel.log.level` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.rpc.max.size` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.rpc.threads` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.secret.bits` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.rpc.server.address` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hive.spark.client.rpc.server.port` ([HIVE-16876](https://issues.apache.org/jira/browse/HIVE-16876)), `hikari.*` ([HIVE-17318](https://issues.apache.org/jira/browse/HIVE-17318)), `dbcp.*` ([HIVE-17319](https://issues.apache.org/jira/browse/HIVE-17319)), hadoop.bin.path ([HIVE-18248](https://issues.apache.org/jira/browse/HIVE-18248)), yarn.bin.path ([HIVE-18248](https://issues.apache.org/jira/browse/HIVE-18248)) * Added In: Hive 0.11.0 with [HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935) -Comma separated list of configuration properties which are immutable at runtime. For example, if **[hive.security.authorization.enabled]({{< ref "#hivesecurityauthorizationenabled" >}})** is set to true, it should be included in this list to prevent a client from changing it to false at runtime. +Comma separated list of configuration properties which are immutable at runtime. 
For example, if **[hive.security.authorization.enabled]({{% ref "#hivesecurityauthorizationenabled" %}})** is set to true, it should be included in this list to prevent a client from changing it to false at runtime. ##### hive.conf.hidden.list @@ -4467,7 +4467,7 @@ Comma separated list of non-SQL Hive commands that users are authorized to execu #### Whitelist for SQL Standard Based Hive Authorization -See  **[hive.security.authorization.sqlstd.confwhitelist]({{< ref "#hivesecurityauthorizationsqlstdconfwhitelist" >}})** below for information about the whitelist property that authorizes set commands in SQL standard based authorization. +See  **[hive.security.authorization.sqlstd.confwhitelist]({{% ref "#hivesecurityauthorizationsqlstdconfwhitelist" %}})** below for information about the whitelist property that authorizes set commands in SQL standard based authorization. ### Hive Client Security @@ -4522,9 +4522,9 @@ The privileges automatically granted to the owner whenever a table gets created. ### Hive Metastore Security -Metastore-side security was added in Hive 0.10.0 ([HIVE-3705](https://issues.apache.org/jira/browse/HIVE-3705)).  For more information, see the [overview in Authorization]({{< ref "#overview-in authorization" >}}) and details in [Storage Based Authorization in the Metastore Server]({{< ref "storage-based-authorization-in-the-metastore-server" >}}). +Metastore-side security was added in Hive 0.10.0 ([HIVE-3705](https://issues.apache.org/jira/browse/HIVE-3705)).  For more information, see the [overview in Authorization]({{% ref "#overview-in authorization" %}}) and details in [Storage Based Authorization in the Metastore Server]({{% ref "storage-based-authorization-in-the-metastore-server" %}}). -For general metastore configuration properties, see [MetaStore]({{< ref "#metastore" >}}). +For general metastore configuration properties, see [MetaStore]({{% ref "#metastore" %}}). 
##### hive.metastore.pre.event.listeners @@ -4560,7 +4560,7 @@ The authenticator manager class name to be used in the metastore for authenticat * Default Value: `true` * Added In: Hive 0.14.0 with [HIVE-8221](https://issues.apache.org/jira/browse/HIVE-8221) -If this is true, the metastore authorizer authorizes read actions on database and table. See [Storage Based Authorization]({{< ref "#storage-based-authorization" >}}). +If this is true, the metastore authorizer authorizes read actions on database and table. See [Storage Based Authorization]({{% ref "#storage-based-authorization" %}}). ##### hive.metastore.token.signature @@ -4573,7 +4573,7 @@ The delegation token service name to match when selecting a token from the curre Version -Hive 0.13.0 introduces fine-grained authorization based on the [SQL standard authorization]({{< ref "sql-standard-based-hive-authorization" >}}) model. See [HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837) for the functional specification and list of subtasks. +Hive 0.13.0 introduces fine-grained authorization based on the [SQL standard authorization]({{% ref "sql-standard-based-hive-authorization" %}}) model. See [HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837) for the functional specification and list of subtasks. ##### hive.users.in.admin.role @@ -4587,7 +4587,7 @@ A comma separated list of users which will be added to the ADMIN role when the m * Default Value: (empty, but includes list shown below implicitly) * Added In: Hive 0.13.0 with [HIVE-6846](https://issues.apache.org/jira/browse/HIVE-6846); updated in Hive 0.14.0 with [HIVE-8534](https://issues.apache.org/jira/browse/HIVE-8534) and in subsequent releases with several JIRA issues -A Java regex. Configuration properties that match this regex can be modified by user when [SQL standard base authorization]({{< ref "sql-standard-based-hive-authorization" >}}) is used. +A Java regex. 
Configuration properties that match this regex can be modified by user when [SQL standard base authorization]({{% ref "sql-standard-based-hive-authorization" %}}) is used. If this parameter is not set, the default list is added by the SQL standard authorizer. To display the default list for the current release, use the command '`set hive.security.authorization.sqlstd.confwhitelist`'. @@ -4609,41 +4609,41 @@ Hive 3.0.0 fixes a parameter added in 1.2.1, changing mapred.job.queuename to ma Some parameters are added automatically when they match one of the regex specifications for the white list in HiveConf.java (for example, **hive.log.trace.id** in Hive 2.0.0 –  see [HIVE-12419](https://issues.apache.org/jira/browse/HIVE-12419) ). -Note that the **[hive.conf.restricted.list]({{< ref "#hiveconfrestrictedlist" >}})** checks are still enforced after the white list check. +Note that the **[hive.conf.restricted.list]({{% ref "#hiveconfrestrictedlist" %}})** checks are still enforced after the white list check. ##### hive.security.authorization.sqlstd.confwhitelist.append * Default Value: (empty) * Added In: Hive 0.14.0  with [HIVE-8534](https://issues.apache.org/jira/browse/HIVE-8534) -Second Java regex that the whitelist of configuration properties would match in addition to [**hive.security.authorization.sqlstd.confwhitelist**]({{< ref "#**hive-security-authorization-sqlstd-confwhitelist**" >}}). Do not include a starting `|` in the value. +Second Java regex that the whitelist of configuration properties would match in addition to [**hive.security.authorization.sqlstd.confwhitelist**]({{% ref "#**hive-security-authorization-sqlstd-confwhitelist**" %}}). Do not include a starting `|` in the value. 
-Using this regex instead of updating the original regex for  [**hive.security.authorization.sqlstd.confwhitelist**]({{< ref "#**hive-security-authorization-sqlstd-confwhitelist**" >}}) means that you can append to the default that is set by SQL standard authorization instead of replacing it entirely. +Using this regex instead of updating the original regex for  [**hive.security.authorization.sqlstd.confwhitelist**]({{% ref "#**hive-security-authorization-sqlstd-confwhitelist**" %}}) means that you can append to the default that is set by SQL standard authorization instead of replacing it entirely. ##### hive.server2.builtin.udf.whitelist * Default Value: (empty, treated as not set – all UDFs allowed) -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-8893](https://issues.apache.org/jira/browse/HIVE-8893) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-8893](https://issues.apache.org/jira/browse/HIVE-8893) A comma separated list of builtin UDFs that are allowed to be executed. A UDF that is not included in the list will return an error if invoked from a query. If set to empty, then treated as wildcard character – all UDFs will be allowed. Note that this configuration is read at the startup time by HiveServer2 and changing this using a 'set' command in a session won't change the behavior. ##### hive.server2.builtin.udf.blacklist * Default Value: (empty) -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-8893](https://issues.apache.org/jira/browse/HIVE-8893) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-8893](https://issues.apache.org/jira/browse/HIVE-8893) A comma separated list of builtin UDFs that are not allowed to be executed. A UDF that is included in the list will return an error if invoked from a query.  Note that this configuration is read at the startup time by HiveServer2 and changing this using a 'set' command in a session won't change the behavior. 
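Because the UDF whitelist and blacklist are read when HiveServer2 starts, they would normally be set in its configuration file rather than with a 'set' command. A minimal sketch, assuming `hive-site.xml` is the file HiveServer2 reads; the UDF names `reflect` and `java_method` are illustrative choices, not a recommended list.

```xml
<configuration>
  <!-- Illustrative only: block two builtin UDFs for every session -->
  <property>
    <name>hive.server2.builtin.udf.blacklist</name>
    <value>reflect,java_method</value>
  </property>
</configuration>
```

With the whitelist left empty, all other builtin UDFs remain allowed, as described above.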
##### hive.security.authorization.task.factory * Default Value: `org.apache.hadoop.hive.ql.parse.authorization.HiveAuthorizationTaskFactoryImpl` -* Added In: Hive  [1.1.0]({{< ref "#1-1-0" >}})  with [HIVE-8611](https://issues.apache.org/jira/browse/HIVE-8611) +* Added In: Hive  [1.1.0]({{% ref "#1-1-0" %}})  with [HIVE-8611](https://issues.apache.org/jira/browse/HIVE-8611) To override the default authorization DDL handling, set **hive.security.authorization.task.factory** to a class that implements the org.apache.hadoop.hive.ql.parse.authorization.HiveAuthorizationTaskFactory interface. ## Archiving -See [Archiving for File Count Reduction]({{< ref "languagemanual-archiving" >}}) for general information about Hive support for [Hadoop archives](http://hadoop.apache.org/docs/stable1/hadoop_archives.html). +See [Archiving for File Count Reduction]({{% ref "languagemanual-archiving" %}}) for general information about Hive support for [Hadoop archives](http://hadoop.apache.org/docs/stable1/hadoop_archives.html). ##### fs.har.impl @@ -4669,7 +4669,7 @@ In new Hadoop versions, the parent directory must be set while creating a HAR. B ## Locking -See [Hive Concurrency Model]({{< ref "locking" >}}) for general information about locking. +See [Hive Concurrency Model]({{% ref "locking" %}}) for general information about locking. ##### hive.support.concurrency @@ -4678,14 +4678,14 @@ See [Hive Concurrency Model]({{< ref "locking" >}}) for general information abou Whether Hive supports concurrency or not. A [ZooKeeper](https://zookeeper.apache.org) instance must be up and running for the default Hive lock manager to support read-write locks. -Set to `true` to support [INSERT ... VALUES, UPDATE, and DELETE]({{< ref "hive-transactions" >}}) transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{< ref "#hivetxnmanager" >}})** . +Set to `true` to support [INSERT ... 
VALUES, UPDATE, and DELETE]({{% ref "hive-transactions" %}}) transactions (Hive 0.14.0 and later). For a complete list of parameters required for turning on Hive transactions, see  **[hive.txn.manager]({{% ref "#hivetxnmanager" %}})** . ##### hive.lock.manager * Default Value: `org.apache.hadoop.hive.ql.lockmgr.zookeeper.ZooKeeperHiveLockManager` * Added In: Hive 0.7.0 with [HIVE-1293](https://issues.apache.org/jira/browse/HIVE-1293) -The lock manager to use when [**hive.support.concurrency**]({{< ref "#**hive-support-concurrency**" >}}) is set to `true`. +The lock manager to use when [**hive.support.concurrency**]({{% ref "#**hive-support-concurrency**" %}}) is set to `true`. ##### hive.lock.mapred.only.operation @@ -4766,18 +4766,18 @@ Clean extra nodes at the end of the session. * Default Value: `__HIVE_DEFAULT_ZOOKEEPER_PARTITION__` * Added In: Hive 0.7.0 with [HIVE-1293](https://issues.apache.org/jira/browse/HIVE-1293) -The default partition name when ZooKeeperHiveLockManager is the  [**hive lock manager**]({{< ref "#**hive-lock-manager**" >}}) . +The default partition name when ZooKeeperHiveLockManager is the  [**hive lock manager**]({{% ref "#**hive-lock-manager**" %}}) . ## Metrics -The metrics that Hive collects can be viewed in the [HiveServer2 Web UI](https://hive.apache.org/docs/latest/admin/setting-up-hiveserver2#web-ui-for-hiveserver2). For more information, see [Hive Metrics]({{< ref "hive-metrics" >}}). +The metrics that Hive collects can be viewed in the [HiveServer2 Web UI](https://hive.apache.org/docs/latest/admin/setting-up-hiveserver2#web-ui-for-hiveserver2). For more information, see [Hive Metrics]({{% ref "hive-metrics" %}}). ##### hive.metastore.metrics.enabled * Default Value: `false` * Added in: Hive 1.3.0 and 2.0.0 with [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761) -Enable metrics on the Hive Metastore Service. 
(For other metastore configuration properties, see the [Metastore]({{< ref "#metastore" >}}) and [Hive Metastore Security]({{< ref "#hive-metastore-security" >}}) sections.) +Enable metrics on the Hive Metastore Service. (For other metastore configuration properties, see the [Metastore]({{% ref "#metastore" %}}) and [Hive Metastore Security]({{% ref "#hive-metastore-security" %}}) sections.) ##### hive.metastore.acidmetrics.thread.on @@ -4791,7 +4791,7 @@ Whether to run acid related metrics collection on this metastore instance. * Default Value: `false` * Added in: Hive 1.3.0 and 2.0.0 with [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761) -Enable metrics on HiveServer2. (For other HiveServer2 configuration properties, see the [HiveServer2]({{< ref "#hiveserver2" >}}) section.) +Enable metrics on HiveServer2. (For other HiveServer2 configuration properties, see the [HiveServer2]({{% ref "#hiveserver2" %}}) section.) ##### hive.service.metrics.class @@ -4822,28 +4822,28 @@ Comma separated list of reporter implementation classes for metric class org.apa * Default Value:  "`/tmp/report.json`" * Added in: Hive 1.3.0 and 2.0.0 with [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761) -For [**hive.service.metrics.class**]({{< ref "#**hive-service-metrics-class**" >}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and [**hive.service.metrics.reporter**]({{< ref "#**hive-service-metrics-reporter**" >}}) JSON_FILE, this is the location of the local JSON metrics file dump. This file will get overwritten at every interval of [**hive.service.metrics.file.frequency**]({{< ref "#**hive-service-metrics-file-frequency**" >}}). +For [**hive.service.metrics.class**]({{% ref "#**hive-service-metrics-class**" %}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and [**hive.service.metrics.reporter**]({{% ref "#**hive-service-metrics-reporter**" %}}) JSON_FILE, this is the location of the local JSON metrics file dump. 
This file will get overwritten at every interval of [**hive.service.metrics.file.frequency**]({{% ref "#**hive-service-metrics-file-frequency**" %}}). ##### hive.service.metrics.file.frequency * Default Value:  5 seconds * Added in: Hive 1.3.0 and 2.0.0 with [HIVE-10761](https://issues.apache.org/jira/browse/HIVE-10761) -For [**hive.service.metrics.class**]({{< ref "#**hive-service-metrics-class**" >}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and [**hive.service.metrics.reporter**]({{< ref "#**hive-service-metrics-reporter**" >}}) JSON_FILE, this is the frequency of updating the JSON metrics file. +For [**hive.service.metrics.class**]({{% ref "#**hive-service-metrics-class**" %}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and [**hive.service.metrics.reporter**]({{% ref "#**hive-service-metrics-reporter**" %}}) JSON_FILE, this is the frequency of updating the JSON metrics file. ##### hive.service.metrics.hadoop2.component * Default Value:  "`hive`" * Added in: Hive 2.1.0 with [HIVE-13480](https://issues.apache.org/jira/browse/HIVE-13480) -For  [**hive.service.metrics.class**]({{< ref "#**hive-service-metrics-class**" >}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and  [**hive.service.metrics.reporter**]({{< ref "#**hive-service-metrics-reporter**" >}}) HADOOP2, this is the component name to provide to the HADOOP2 metrics system. Ideally 'hivemetastore' for the MetaStore and 'hiveserver2' for HiveServer2. The metrics will be updated at every interval of  [**hive.service.metrics.hadoop2.frequency**]({{< ref "#**hive-service-metrics-hadoop2-frequency**" >}}). +For  [**hive.service.metrics.class**]({{% ref "#**hive-service-metrics-class**" %}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and  [**hive.service.metrics.reporter**]({{% ref "#**hive-service-metrics-reporter**" %}}) HADOOP2, this is the component name to provide to the HADOOP2 metrics system. 
Ideally 'hivemetastore' for the MetaStore and 'hiveserver2' for HiveServer2. The metrics will be updated at every interval of  [**hive.service.metrics.hadoop2.frequency**]({{% ref "#**hive-service-metrics-hadoop2-frequency**" %}}). ##### hive.service.metrics.hadoop2.frequency * Default Value:  30 seconds * Added in: Hive 2.1.0 with [HIVE-13480](https://issues.apache.org/jira/browse/HIVE-13480) -For  [**hive.service.metrics.class**]({{< ref "#**hive-service-metrics-class**" >}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and  [**hive.service.metrics.reporter**]({{< ref "#**hive-service-metrics-reporter**" >}}) HADOOP2, this is the frequency of updating the HADOOP2 metrics system. +For  [**hive.service.metrics.class**]({{% ref "#**hive-service-metrics-class**" %}}) org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics and  [**hive.service.metrics.reporter**]({{% ref "#**hive-service-metrics-reporter**" %}}) HADOOP2, this is the frequency of updating the HADOOP2 metrics system. ## Clustering @@ -4983,7 +4983,7 @@ Indicates whether the REPL DUMP command dumps only metadata information (`true`) * Default Value: `false` * Added in: Hive 3.0.0 with [HIVE-18352](https://issues.apache.org/jira/browse/HIVE-18352) -Indicates whether replication dump should include information about ACID tables. It should be used in conjunction with [**hive.repl.dump.metadata.only**]({{< ref "#**hive-repl-dump-metadata-only**" >}}) to enable copying of metadata for ACID tables which do not require the corresponding transaction semantics to be applied on target. This can be removed when ACID table replication is supported. +Indicates whether replication dump should include information about ACID tables. It should be used in conjunction with [**hive.repl.dump.metadata.only**]({{% ref "#**hive-repl-dump-metadata-only**" %}}) to enable copying of metadata for ACID tables which do not require the corresponding transaction semantics to be applied on target. 
This can be removed when ACID table replication is supported. ##### hive.repl.add.raw.reserved.namespace @@ -5009,7 +5009,7 @@ List of supported blobstore schemes that Hive uses to apply special read/write p * Added In: Hive 2.2.0 with [HIVE-15121](https://issues.apache.org/jira/browse/HIVE-15121) This parameter is a global variable that enables a number of optimizations when running on blobstores. -Some of the optimizations, such as **[hive.blobstore.use.blobstore.as.scratchdir]({{< ref "#hiveblobstoreuseblobstoreasscratchdir" >}})** , won't be used if this variable is set to false. +Some of the optimizations, such as **[hive.blobstore.use.blobstore.as.scratchdir]({{% ref "#hiveblobstoreuseblobstoreasscratchdir" %}})** , won't be used if this variable is set to false. ##### hive.blobstore.use.blobstore.as.scratchdir @@ -5027,7 +5027,7 @@ Set this to a maximum number of threads that Hive will use to list file informat ## Test Properties -Note:  This is an incomplete list of configuration properties used by developers when running Hive tests. For other test properties, search for "hive.test." in [hive-default.xml.template or HiveConf.java]({{< ref "#hive-default-xml-template or-hiveconf-java" >}}). Also see [Beeline Query Unit Test]({{< ref "#beeline-query-unit-test" >}}). +Note:  This is an incomplete list of configuration properties used by developers when running Hive tests. For other test properties, search for "hive.test." in [hive-default.xml.template or HiveConf.java]({{% ref "#hive-default-xml-template or-hiveconf-java" %}}). Also see [Beeline Query Unit Test]({{% ref "#beeline-query-unit-test" %}}). ##### hive.test.mode @@ -5066,11 +5066,11 @@ Determines whether local tasks (typically mapjoin hashtable generation phase) ru # HCatalog Configuration Properties -Starting in Hive release 0.11.0, HCatalog is installed and configured with Hive. The HCatalog server is the same as the Hive metastore. 
See [Hive Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}) for metastore configuration properties. For Hive releases prior to 0.11.0, see the "Thrift Server Setup" section in the HCatalog 0.5.0 document [Installation from Tarball](http://hive.apache.org/docs/hcat_r0.5.0/install.html) for information about setting the Hive metastore configuration properties. +Starting in Hive release 0.11.0, HCatalog is installed and configured with Hive. The HCatalog server is the same as the Hive metastore. See [Hive Metastore Administration]({{% ref "adminmanual-metastore-administration" %}}) for metastore configuration properties. For Hive releases prior to 0.11.0, see the "Thrift Server Setup" section in the HCatalog 0.5.0 document [Installation from Tarball](http://hive.apache.org/docs/hcat_r0.5.0/install.html) for information about setting the Hive metastore configuration properties. -Jobs submitted to HCatalog can specify configuration properties that affect storage, error tolerance, and other kinds of behavior during the job.  See [HCatalog Configuration Properties]({{< ref "hcatalog-configuration-properties" >}}) for details. +Jobs submitted to HCatalog can specify configuration properties that affect storage, error tolerance, and other kinds of behavior during the job.  See [HCatalog Configuration Properties]({{% ref "hcatalog-configuration-properties" %}}) for details. # WebHCat Configuration Properties -For WebHCat configuration, see [Configuration Variables]({{< ref "#configuration-variables" >}}) in the WebHCat manual. +For WebHCat configuration, see [Configuration Variables]({{% ref "#configuration-variables" %}}) in the WebHCat manual. 
diff --git a/content/docs/latest/user/cost-based-optimization-in-hive.md b/content/docs/latest/user/cost-based-optimization-in-hive.md index 8114c8fb..f0736eef 100644 --- a/content/docs/latest/user/cost-based-optimization-in-hive.md +++ b/content/docs/latest/user/cost-based-optimization-in-hive.md @@ -307,7 +307,7 @@ Cost Based Optimizations: ## Configuration -The configuration parameter [hive.cbo.enable]({{< ref "#hive-cbo-enable" >}}) determines whether cost-based optimization is enabled or not. +The configuration parameter [hive.cbo.enable]({{% ref "#hive-cbo-enable" %}}) determines whether cost-based optimization is enabled or not. ## Proposed Cost Model diff --git a/content/docs/latest/user/csv-serde.md b/content/docs/latest/user/csv-serde.md index f1297d2c..6a66392c 100644 --- a/content/docs/latest/user/csv-serde.md +++ b/content/docs/latest/user/csv-serde.md @@ -48,7 +48,7 @@ DEFAULT_QUOTE_CHARACTER " DEFAULT_SEPARATOR , ``` -For general information about SerDes, see [Hive SerDe]({{< ref "#hive-serde" >}}) in the Developer Guide. Also see [SerDe]({{< ref "serde" >}}) for details about input and output processing. +For general information about SerDes, see [Hive SerDe]({{% ref "#hive-serde" %}}) in the Developer Guide. Also see [SerDe]({{% ref "serde" %}}) for details about input and output processing. ### Versions diff --git a/content/docs/latest/user/druid-integration.md b/content/docs/latest/user/druid-integration.md index 278f71b7..ffb9a295 100644 --- a/content/docs/latest/user/druid-integration.md +++ b/content/docs/latest/user/druid-integration.md @@ -30,7 +30,7 @@ Druid is an open-source analytics data store designed for business intelligence ## Storage Handlers -You can find an overview of Hive Storage Handlers [here]({{< ref "storagehandlers" >}}); the integration of Druid with Hive depends upon that framework. 
+You can find an overview of Hive Storage Handlers [here]({{% ref "storagehandlers" %}}); the integration of Druid with Hive depends upon that framework. # Usage diff --git a/content/docs/latest/user/fileformats.md b/content/docs/latest/user/fileformats.md index 277f15a1..2d0db689 100644 --- a/content/docs/latest/user/fileformats.md +++ b/content/docs/latest/user/fileformats.md @@ -13,20 +13,20 @@ Hive supports several file formats: * Text File * SequenceFile -* [RCFile]({{< ref "rcfile" >}}) -* [Avro Files]({{< ref "avroserde" >}}) -* [ORC Files]({{< ref "languagemanual-orc" >}}) -* [Parquet]({{< ref "parquet" >}}) +* [RCFile]({{% ref "rcfile" %}}) +* [Avro Files]({{% ref "avroserde" %}}) +* [ORC Files]({{% ref "languagemanual-orc" %}}) +* [Parquet]({{% ref "parquet" %}}) * Custom INPUTFORMAT and OUTPUTFORMAT -The [hive.default.fileformat]({{< ref "#hive-default-fileformat" >}}) configuration parameter determines the format to use if it is not specified in a [CREATE TABLE]({{< ref "#create-table" >}}) or [ALTER TABLE]({{< ref "#alter-table" >}}) statement.  Text file is the parameter's default value. +The [hive.default.fileformat]({{% ref "#hive-default-fileformat" %}}) configuration parameter determines the format to use if it is not specified in a [CREATE TABLE]({{% ref "#create-table" %}}) or [ALTER TABLE]({{% ref "#alter-table" %}}) statement.  Text file is the parameter's default value. -For more information, see the sections [Storage Formats]({{< ref "#storage-formats" >}}) and [Row Formats & SerDe]({{< ref "#row-formats-&-serde" >}}) on the DDL page. +For more information, see the sections [Storage Formats]({{% ref "#storage-formats" %}}) and [Row Formats & SerDe]({{% ref "#row-formats-&-serde" %}}) on the DDL page. 
### File Compression -* [Compressed Data Storage]({{< ref "compressedstorage" >}}) -* [LZO Compression]({{< ref "languagemanual-lzo" >}}) +* [Compressed Data Storage]({{% ref "compressedstorage" %}}) +* [LZO Compression]({{% ref "languagemanual-lzo" %}}) diff --git a/content/docs/latest/user/hbaseintegration.md b/content/docs/latest/user/hbaseintegration.md index b6480724..0ac3e85e 100644 --- a/content/docs/latest/user/hbaseintegration.md +++ b/content/docs/latest/user/hbaseintegration.md @@ -23,7 +23,7 @@ Hive 1.x will remain compatible with HBase 0.98.x and lower versions. Hive 2.x w ## Storage Handlers -Before proceeding, please read [StorageHandlers]({{< ref "storagehandlers" >}}) for an overview of the generic storage handler framework on which HBase integration depends. +Before proceeding, please read [StorageHandlers]({{% ref "storagehandlers" %}}) for an overview of the generic storage handler framework on which HBase integration depends. ## Usage @@ -80,7 +80,7 @@ ROW COLUMN+CELL Notice that even though a column name "val" is specified in the mapping, only the column family name "cf1" appears in the DESCRIBE output in the HBase shell. This is because in HBase, only column families (not columns) are known in the table-level metadata; column names within a column family are only present at the per-row level. 
-Here's how to move data from Hive into the HBase table (see [GettingStarted]({{< ref "gettingstarted-latest" >}}) for how to create the example table `pokes` in Hive first): +Here's how to move data from Hive into the HBase table (see [GettingStarted]({{% ref "gettingstarted-latest" %}}) for how to create the example table `pokes` in Hive first): ``` INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98; @@ -145,7 +145,7 @@ The column mapping support currently available is somewhat cumbersome and restri + If no type specification is given the value from `hbase.table.default.storage.type` will be used + Any prefixes of the valid values are valid too (i.e. `#b` instead of `#binary`) + If you specify a column as `binary` the bytes in the corresponding HBase cells are expected to be of the form that HBase's `Bytes` class yields. -* there must be exactly one `:key` mapping (this can be mapped either to a string or struct column–see [Simple Composite Keys]({{< ref "#simple-composite-keys" >}}) and [Complex Composite Keys]({{< ref "#complex-composite-keys" >}})) +* there must be exactly one `:key` mapping (this can be mapped either to a string or struct column–see [Simple Composite Keys]({{% ref "#simple-composite-keys" %}}) and [Complex Composite Keys]({{% ref "#complex-composite-keys" %}})) * (note that before [HIVE-1228](https://issues.apache.org/jira/browse/HIVE-1228) in Hive 0.6, `:key` was not supported, and the first Hive column implicitly mapped to the key; as of Hive 0.6, it is now strongly recommended that you always specify the key explicitly; we will drop support for implicit key mapping in the future) * if no column-name is given, then the Hive column will map to all columns in the corresponding HBase column family, and the Hive MAP datatype must be used to allow access to these (possibly sparse) columns * Since HBase 1.1 ([HBASE-2828](https://issues.apache.org/jira/browse/HIVE-2828)) there is a way to access the HBase timestamp attribute 
using the special `:timestamp` mapping. It needs to be either `bigint` or `timestamp`. @@ -465,7 +465,7 @@ There are a number of areas where Hive/HBase integration could definitely use mo * more flexible column mapping (HIVE-806, HIVE-1245) * default column mapping in cases where no mapping spec is given -* filter pushdown and indexing (see [FilterPushdownDev]({{< ref "filterpushdowndev" >}}) and [IndexDev]({{< ref "indexdev" >}})) +* filter pushdown and indexing (see [FilterPushdownDev]({{% ref "filterpushdowndev" %}}) and [IndexDev]({{% ref "indexdev" %}})) * expose timestamp attribute, possibly also with support for treating it as a partition key * allow per-table hbase.master configuration * run profiler and minimize any per-row overhead in column mapping @@ -495,7 +495,7 @@ An Eclipse launch template remains to be defined. ## Links -* For information on how to bulk load data from Hive into HBase, see [HBaseBulkLoad]({{< ref "hbasebulkload" >}}). +* For information on how to bulk load data from Hive into HBase, see [HBaseBulkLoad]({{% ref "hbasebulkload" %}}). * For another project which adds SQL-like query language support on top of HBase, see [HBQL](http://www.hbql.com) (unrelated to Hive). ## Acknowledgements diff --git a/content/docs/latest/user/hive-deprecated-authorization-mode.md b/content/docs/latest/user/hive-deprecated-authorization-mode.md index 5b6a6e84..03d21415 100644 --- a/content/docs/latest/user/hive-deprecated-authorization-mode.md +++ b/content/docs/latest/user/hive-deprecated-authorization-mode.md @@ -5,11 +5,11 @@ date: 2024-12-12 # Apache Hive : Hive deprecated authorization mode / Legacy Mode -This document describes Hive security using the basic authorization scheme, which regulates access to Hive metadata on the client side. This was the default authorization mode used when authorization was enabled. 
The default was changed to [SQL Standard authorization]({{< ref "sql-standard-based-hive-authorization" >}}) in Hive 2.0 ([HIVE-12429](https://issues.apache.org/jira/browse/HIVE-12429)). +This document describes Hive security using the basic authorization scheme, which regulates access to Hive metadata on the client side. This was the default authorization mode used when authorization was enabled. The default was changed to [SQL Standard authorization]({{% ref "sql-standard-based-hive-authorization" %}}) in Hive 2.0 ([HIVE-12429](https://issues.apache.org/jira/browse/HIVE-12429)). ### Disclaimer -Hive authorization is not completely secure. The basic authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.  See the [Hive authorization main page]({{< ref "languagemanual-authorization" >}}) for the secure options. +Hive authorization is not completely secure. The basic authorization scheme is intended primarily to prevent good users from accidentally doing bad things, but makes no promises about preventing malicious users from doing malicious things.  See the [Hive authorization main page]({{% ref "languagemanual-authorization" %}}) for the secure options. ### Prerequisites @@ -31,7 +31,7 @@ In order to use Hive authorization, there are two parameters that should be set ``` -Note that, by default, the [hive.security.authorization.createtable.owner.grants]({{< ref "#hive-security-authorization-createtable-owner-grants" >}}) are set to null, which would result in the creator of a table having no access to the table. +Note that, by default, the [hive.security.authorization.createtable.owner.grants]({{% ref "#hive-security-authorization-createtable-owner-grants" %}}) are set to null, which would result in the creator of a table having no access to the table. 
### Users, Groups, and Roles diff --git a/content/docs/latest/user/hive-metrics.md b/content/docs/latest/user/hive-metrics.md index cba7509a..99d9ac64 100644 --- a/content/docs/latest/user/hive-metrics.md +++ b/content/docs/latest/user/hive-metrics.md @@ -7,7 +7,7 @@ date: 2024-12-12 -The metrics that Hive collects can be viewed in the [HiveServer2 Web UI]({{< ref "#hiveserver2-web-ui" >}}) by using the "Metrics Dump" tab. +The metrics that Hive collects can be viewed in the [HiveServer2 Web UI]({{% ref "#hiveserver2-web-ui" %}}) by using the "Metrics Dump" tab. The metrics dump will display any metric available over JMX encoded in JSON:  @@ -147,9 +147,9 @@ These metrics include: * compaction_oldest_cleaning_age_in_sec ([Hive 4.0.0](https://issues.apache.org/jira/browse/HIVE-25737)) * compaction_num_obsolete_deltas ([Hive 4.0.0](https://issues.apache.org/jira/browse/HIVE-24974)) -Configuration properties for metrics can be found here:  [Metrics]({{< ref "#metrics" >}}). +Configuration properties for metrics can be found here:  [Metrics]({{% ref "#metrics" %}}). -See [HiveServer2 Overview]({{< ref "hiveserver2-overview" >}}) for more information about HiveServer2. +See [HiveServer2 Overview]({{% ref "hiveserver2-overview" %}}) for more information about HiveServer2. ## Attachments: diff --git a/content/docs/latest/user/hive-on-spark.md b/content/docs/latest/user/hive-on-spark.md index db454805..a598466d 100644 --- a/content/docs/latest/user/hive-on-spark.md +++ b/content/docs/latest/user/hive-on-spark.md @@ -156,7 +156,7 @@ Finally, it seems that Spark community is in the process of improving/changing t It’s rather complicated in implementing join in MapReduce world, as manifested in Hive. Hive has reduce-side join as well as map-side join (including map-side hash lookup and map-side sorted merge). We will keep Hive’s join implementations. 
However, extra attention needs to be paid to the shuffle behavior (key generation, partitioning, sorting, etc.), since Hive extensively uses MapReduce’s shuffling in implementing reduce-side join. It’s expected that Spark is, or will be, able to provide flexible control over the shuffling, as pointed out in the previous section ([Shuffle, Group, and Sort](https://docs.google.com/a/cloudera.com/document/d/11xXVgma6UPa32cW_W64BwBssnA0d6RX4jUsATOzgkcw/edit#heading=h.rnmzzu57lmuh)). -See: [Hive on Spark: Join Design Master]({{< ref "hive-on-spark-join-design-master" >}}) for detailed design. +See: [Hive on Spark: Join Design Master]({{% ref "hive-on-spark-join-design-master" %}}) for detailed design. ### Number of Tasks diff --git a/content/docs/latest/user/hive-transactions.md b/content/docs/latest/user/hive-transactions.md index 2a368ab1..4a52ffdb 100644 --- a/content/docs/latest/user/hive-transactions.md +++ b/content/docs/latest/user/hive-transactions.md @@ -25,12 +25,12 @@ Transactions with ACID semantics have been added to Hive to address the followin ## Limitations * *BEGIN*, *COMMIT*, and *ROLLBACK* are not yet supported.  All language operations are auto-commit.  The plan is to support these in a future release. -* Only [ORC file format]({{< ref "languagemanual-orc" >}}) is supported in this first release.  The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC. -* By default transactions are configured to be off.  See the [Configuration]({{< ref "#configuration" >}}) section below for a discussion of which values need to be set to configure it. -* Tables must be [bucketed]({{< ref "languagemanual-ddl-bucketedtables" >}}) to make use of these features.  Tables in the same system not using transactions and ACID do not need to be bucketed. 
External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor ([HIVE-13175](https://issues.apache.org/jira/browse/HIVE-13175)). +* Only [ORC file format]({{% ref "languagemanual-orc" %}}) is supported in this first release.  The feature has been built such that transactions can be used by any storage format that can determine how updates or deletes apply to base records (basically, that has an explicit or implicit row id), but so far the integration work has only been done for ORC. +* By default transactions are configured to be off.  See the [Configuration]({{% ref "#configuration" %}}) section below for a discussion of which values need to be set to configure it. +* Tables must be [bucketed]({{% ref "languagemanual-ddl-bucketedtables" %}}) to make use of these features.  Tables in the same system not using transactions and ACID do not need to be bucketed. External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor ([HIVE-13175](https://issues.apache.org/jira/browse/HIVE-13175)). * Reading/writing to an ACID table from a non-ACID session is not allowed. In other words, the Hive transaction manager must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager in order to work with ACID tables. * At this time only snapshot level isolation is supported.  When a given query starts it will be provided with a consistent snapshot of the data.  There is no support for dirty read, read committed, repeatable read, or serializable.  With the introduction of BEGIN the intention is to support snapshot isolation for the duration of transaction rather than just a single query.  Other isolation levels may be added depending on user requests. -* The existing ZooKeeper and in-memory lock managers are not compatible with transactions.  There is no intention to address this issue.  
See [Basic Design]({{< ref "#basic-design" >}}) below for a discussion of how locks are stored for transactions. +* The existing ZooKeeper and in-memory lock managers are not compatible with transactions.  There is no intention to address this issue.  See [Basic Design]({{% ref "#basic-design" %}}) below for a discussion of how locks are stored for transactions. * ~~Schema changes using ALTER TABLE is NOT supported for ACID tables. [HIVE-11421](https://issues.apache.org/jira/browse/HIVE-11421) is tracking it.~~  Fixed in 1.3.0/2.0.0. * Using Oracle as the Metastore DB and "datanucleus.connectionPoolingType=BONECP" may generate intermittent "No such lock.." and "No such transaction..." errors.  Setting "datanucleus.connectionPoolingType=DBCP" is recommended in this case. * [LOAD DATA...](/docs/latest/language/languagemanual-dml#loading-files-into-tables) statement is not supported with transactional tables.  (This was not properly enforced until [HIVE-16732](https://issues.apache.org/jira/browse/HIVE-16732)) @@ -39,25 +39,25 @@ Transactions with ACID semantics have been added to Hive to address the followin Hive offers APIs for streaming data ingest and streaming mutation: -* [Hive HCatalog Streaming API]({{< ref "streaming-data-ingest" >}}) +* [Hive HCatalog Streaming API]({{% ref "streaming-data-ingest" %}}) * [Hive Streaming API](/docs/latest/user/streaming-data-ingest-v2) (Since Hive 3) -* [HCatalog Streaming Mutation API]({{< ref "hcatalog-streaming-mutation-api" >}}) (available in Hive 2.0.0 and later) +* [HCatalog Streaming Mutation API]({{% ref "hcatalog-streaming-mutation-api" %}}) (available in Hive 2.0.0 and later) -A comparison of these two APIs is available in the [Background]({{< ref "#background" >}}) section of the Streaming Mutation document. +A comparison of these two APIs is available in the [Background]({{% ref "#background" %}}) section of the Streaming Mutation document. 
## Grammar Changes -*INSERT...VALUES, UPDATE*, and *DELETE* have been added to the SQL grammar, starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317).  See [LanguageManual DML]({{< ref "languagemanual-dml" >}}) for details. +*INSERT...VALUES, UPDATE*, and *DELETE* have been added to the SQL grammar, starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-5317).  See [LanguageManual DML]({{% ref "languagemanual-dml" %}}) for details. Several new commands have been added to Hive's DDL in support of ACID and transactions, plus some existing DDL has been modified.   -A new command *SHOW TRANSACTIONS* has been added, see [Show Transactions]({{< ref "#show-transactions" >}}) for details. +A new command *SHOW TRANSACTIONS* has been added, see [Show Transactions]({{% ref "#show-transactions" %}}) for details. -A new command *SHOW COMPACTIONS* has been added, see [Show Compactions]({{< ref "#show-compactions" >}}) for details. +A new command *SHOW COMPACTIONS* has been added, see [Show Compactions]({{% ref "#show-compactions" %}}) for details. -The *SHOW LOCKS* command has been altered to provide information about the new locks associated with transactions.  If you are using the ZooKeeper or in-memory lock managers you will notice no difference in the output of this command.  See [Show Locks]({{< ref "#show-locks" >}}) for details. +The *SHOW LOCKS* command has been altered to provide information about the new locks associated with transactions.  If you are using the ZooKeeper or in-memory lock managers you will notice no difference in the output of this command.  See [Show Locks]({{% ref "#show-locks" %}}) for details. -A new option has been added to *ALTER TABLE* to request a compaction of a table or partition.  In general users do not need to request compactions, as the system will detect the need for them and initiate the compaction.  
However, if [compaction is turned off]({{< ref "#compaction-is-turned-off" >}}) for a table or a user wants to compact the table at a time the system would not choose to, *ALTER TABLE* can be used to initiate the compaction.  See [Alter Table/Partition Compact]({{< ref "#alter-table/partition-compact" >}}) for details.  This will enqueue a request for compaction and return.  To watch the progress of the compaction the user can use *SHOW COMPACTIONS*.
+A new option has been added to *ALTER TABLE* to request a compaction of a table or partition.  In general users do not need to request compactions, as the system will detect the need for them and initiate the compaction.  However, if [compaction is turned off]({{% ref "#compaction-is-turned-off" %}}) for a table or a user wants to compact the table at a time the system would not choose to, *ALTER TABLE* can be used to initiate the compaction.  See [Alter Table/Partition Compact]({{% ref "#alter-table/partition-compact" %}}) for details.  This will enqueue a request for compaction and return.  To watch the progress of the compaction the user can use *SHOW COMPACTIONS*.

 A new command *ABORT TRANSACTIONS* has been added, see [Abort Transactions](/docs/latest/language/languagemanual-ddl#abort-transactions) for details.

@@ -91,17 +91,17 @@ As operations modify the table more and more delta files are created and need to

 * Minor compaction takes a set of existing delta files and rewrites them to a single delta file per bucket.
 * Major compaction takes one or more delta files and the base file for the bucket and rewrites them into a new base file per bucket.  Major compaction is more expensive but is more effective.
-* More information about rebalance compaction can be found here: [Rebalance compaction]({{< ref "rebalance-compaction" >}})
+* More information about rebalance compaction can be found here: [Rebalance compaction]({{% ref "rebalance-compaction" %}})

 All compactions are done in the background.
Minor and major compactions do not prevent concurrent reads and writes of the data. Rebalance compaction uses exclusive write lock, therefore it prevents concurrent writes. After a compaction the system waits until all readers of the old files have finished and then removes the old files.

 #### Initiator

-This module is responsible for discovering which tables or partitions are due for compaction.  This should be enabled in a Metastore using [hive.compactor.initiator.on]({{< ref "#hive-compactor-initiator-on" >}}).  There are several properties of the form *.threshold in "New Configuration Parameters for Transactions" table below that control when a compaction task is created and which type of compaction is performed.  Each compaction task handles 1 partition (or whole table if the table is unpartitioned).  If the number of consecutive compaction failures for a given partition exceeds hive.compactor.initiator.failed.compacts.threshold, automatic compaction scheduling will stop for this partition.  See Configuration Parameters table for more info.
+This module is responsible for discovering which tables or partitions are due for compaction.  This should be enabled in a Metastore using [hive.compactor.initiator.on]({{% ref "#hive-compactor-initiator-on" %}}).  There are several properties of the form *.threshold in "New Configuration Parameters for Transactions" table below that control when a compaction task is created and which type of compaction is performed.  Each compaction task handles 1 partition (or whole table if the table is unpartitioned).  If the number of consecutive compaction failures for a given partition exceeds hive.compactor.initiator.failed.compacts.threshold, automatic compaction scheduling will stop for this partition.  See Configuration Parameters table for more info.

 #### Worker

-Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: \-compactor-\.\.\.
Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions.
+Each Worker handles a single compaction task.  A compaction is a MapReduce job with name in the following form: \-compactor-\.\.\.  Each worker submits the job to the cluster (via [hive.compactor.job.queue]({{% ref "#hive-compactor-job-queue" %}}) if defined) and waits for the job to finish.  [hive.compactor.worker.threads]({{% ref "#hive-compactor-worker-threads" %}}) determines the number of Workers in each Metastore.  The total number of Workers in the Hive Warehouse determines the maximum number of concurrent compactions.

 #### Cleaner

@@ -109,13 +109,13 @@ This process is a process that deletes delta files after compaction and after it

 #### AcidHouseKeeperService

-This process looks for transactions that have not heartbeated in [hive.txn.timeout]({{< ref "#hive-txn-timeout" >}}) time and aborts them.  The system assumes that a client that initiated a transaction stopped heartbeating crashed and the resources it locked should be released.
+This process looks for transactions that have not heartbeated in [hive.txn.timeout]({{% ref "#hive-txn-timeout" %}}) time and aborts them.  The system assumes that a client that initiated a transaction stopped heartbeating crashed and the resources it locked should be released.

 #### SHOW COMPACTIONS

 This commands displays information about currently running compaction and recent history (configurable retention period) of compactions.  This history display is available since [HIVE-12353](https://issues.apache.org/jira/browse/HIVE-12353).

-Also see [LanguageManual DDL#ShowCompactions]({{< ref "#languagemanual-ddl#showcompactions" >}}) for more information on the output of this command and [NewConfigurationParametersforTransactions]({{< ref "#newconfigurationparametersfortransactions" >}})/Compaction History for configuration properties affecting the output of this command.  The system retains the last N entries of each type: failed, succeeded, attempted (where N is configurable for each type).
+Also see [LanguageManual DDL#ShowCompactions]({{% ref "#languagemanual-ddl#showcompactions" %}}) for more information on the output of this command and [NewConfigurationParametersforTransactions]({{% ref "#newconfigurationparametersfortransactions" %}})/Compaction History for configuration properties affecting the output of this command.  The system retains the last N entries of each type: failed, succeeded, attempted (where N is configurable for each type).

@@ -135,18 +135,18 @@ Minimally, these configuration parameters must be set appropriately to turn on t

 Client Side

-* [hive.support.concurrency]({{< ref "#hive-support-concurrency" >}}) – true
-* [hive.enforce.bucketing]({{< ref "#hive-enforce-bucketing" >}}) – true (Not required as of [Hive 2.0](https://issues.apache.org/jira/browse/HIVE-12331))
-* [hive.exec.dynamic.partition.mode]({{< ref "#hive-exec-dynamic-partition-mode" >}}) – nonstrict
-* [hive.txn.manager]({{< ref "#hive-txn-manager" >}}) – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
+* [hive.support.concurrency]({{% ref "#hive-support-concurrency" %}}) – true
+* [hive.enforce.bucketing]({{% ref "#hive-enforce-bucketing" %}}) – true (Not required as of [Hive 2.0](https://issues.apache.org/jira/browse/HIVE-12331))
+* [hive.exec.dynamic.partition.mode]({{% ref "#hive-exec-dynamic-partition-mode" %}}) – nonstrict
+* [hive.txn.manager]({{% ref "#hive-txn-manager" %}}) – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

 Server Side (Metastore)

-* [hive.compactor.initiator.on]({{< ref "#hive-compactor-initiator-on" >}}) – true (See table below for more details)
-* [hive.compactor.cleaner.on]({{< ref "#hive-compactor-cleaner-on" >}}) – true (See table below for more details)
-* [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) – a positive number on at least one instance of the Thrift metastore service
+* [hive.compactor.initiator.on]({{% ref "#hive-compactor-initiator-on" %}}) – true (See table below for more details)
+* [hive.compactor.cleaner.on]({{% ref "#hive-compactor-cleaner-on" %}}) – true (See table below for more details)
+* [hive.compactor.worker.threads]({{% ref "#hive-compactor-worker-threads" %}}) – a positive number on at least one instance of the Thrift metastore service

-The following sections list all of the configuration parameters that affect Hive transactions and compaction.  Also see [Limitations]({{< ref "#limitations" >}}) above and [Table Properties]({{< ref "#table-properties" >}}) below.
+The following sections list all of the configuration parameters that affect Hive transactions and compaction.  Also see [Limitations]({{% ref "#limitations" %}}) above and [Table Properties]({{% ref "#table-properties" %}}) below.

 ### New Configuration Parameters for Transactions

@@ -154,21 +154,21 @@ A number of new configuration parameters have been added to the system to suppor

 | **Configuration key** | **Values** | **Location** | **Notes** |
 | --- | --- | --- | --- |
-| [hive.txn.manager]({{< ref "#hive-txn-manager" >}})  | *Default:* org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager*Value required for transactions:* org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/HiveServer2 | DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions. |
-| [hive.txn.strict.locking.mode]({{< ref "#hive-txn-strict-locking-mode" >}}) | *Default:* true | Client/ HiveServer2 | In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock.
In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15774)). |
-| [hive.txn.timeout]({{< ref "#hive-txn-timeout" >}}) deprecated. Use metastore.txn.timeout instead | *Default:* 300 | Client/HiveServer2/Metastore  | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5 |
+| [hive.txn.manager]({{% ref "#hive-txn-manager" %}})  | *Default:* org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager*Value required for transactions:* org.apache.hadoop.hive.ql.lockmgr.DbTxnManager | Client/HiveServer2 | DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions. |
+| [hive.txn.strict.locking.mode]({{% ref "#hive-txn-strict-locking-mode" %}}) | *Default:* true | Client/ HiveServer2 | In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock. In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of [Hive 2.2.0](https://issues.apache.org/jira/browse/HIVE-15774)). |
+| [hive.txn.timeout]({{% ref "#hive-txn-timeout" %}}) deprecated. Use metastore.txn.timeout instead | *Default:* 300 | Client/HiveServer2/Metastore  | Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds.
It's critical that this property has the same value for all components/services.5 |
 | [hive.txn.heartbeat.threadpool.size](/docs/latest/user/configuration-properties#hivetxnheartbeatthreadpoolsize) deprecated - but still in use | *Default:* 5 | Client/HiveServer2 | The number of threads to use for heartbeating (as of [Hive 1.3.0 and 2.0.0](https://issues.apache.org/jira/browse/HIVE-12366)). |
-| [hive.timedout.txn.reaper.start]({{< ref "#hive-timedout-txn-reaper-start" >}}) deprecated | *Default:* 100s | Metastore | Time delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11317)). Controls AcidHouseKeeperServcie above. |
-| [hive.timedout.txn.reaper.interval]({{< ref "#hive-timedout-txn-reaper-interval" >}}) deprecated | *Default:* 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11317)). Controls AcidHouseKeeperServcie above. |
-| [hive.txn.max.open.batch]({{< ref "#hive-txn-max-open-batch" >}}) deprecated. Use metastore.txn.max.open.batch instead | *Default:* 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().1 |
-| [hive.max.open.txns]({{< ref "#hive-max-open-txns" >}}) deprecated. Use metastore.max.open.txns instead. | *Default:* 100000 | HiveServer2/ Metastore | Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit. (As of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13249).) |
-| [hive.count.open.txns.interval]({{< ref "#hive-count-open-txns-interval" >}}) deprecated. Use metastore.count.open.txns.interval instead.
| *Default:* 1s | HiveServer2/ Metastore | Time in seconds between checks to count open transactions (as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13249)). |
-| [hive.txn.retryable.sqlex.regex]({{< ref "#hive-txn-retryable-sqlex-regex" >}}) deprecated. Use metastore.txn.retryable.sqlex.regex instead. | *Default:* "" (empty string) | HiveServer2/ Metastore | Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-12637)).For an example, see [Configuration Properties]({{< ref "#configuration-properties" >}}). |
+| [hive.timedout.txn.reaper.start]({{% ref "#hive-timedout-txn-reaper-start" %}}) deprecated | *Default:* 100s | Metastore | Time delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11317)). Controls AcidHouseKeeperServcie above. |
+| [hive.timedout.txn.reaper.interval]({{% ref "#hive-timedout-txn-reaper-interval" %}}) deprecated | *Default:* 180s | Metastore | Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11317)). Controls AcidHouseKeeperServcie above. |
+| [hive.txn.max.open.batch]({{% ref "#hive-txn-max-open-batch" %}}) deprecated. Use metastore.txn.max.open.batch instead | *Default:* 1000 | Client | Maximum number of transactions that can be fetched in one call to open_txns().1 |
+| [hive.max.open.txns]({{% ref "#hive-max-open-txns" %}}) deprecated. Use metastore.max.open.txns instead. | *Default:* 100000 | HiveServer2/ Metastore | Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit.
(As of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13249).) |
+| [hive.count.open.txns.interval]({{% ref "#hive-count-open-txns-interval" %}}) deprecated. Use metastore.count.open.txns.interval instead. | *Default:* 1s | HiveServer2/ Metastore | Time in seconds between checks to count open transactions (as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13249)). |
+| [hive.txn.retryable.sqlex.regex]({{% ref "#hive-txn-retryable-sqlex-regex" %}}) deprecated. Use metastore.txn.retryable.sqlex.regex instead. | *Default:* "" (empty string) | HiveServer2/ Metastore | Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-12637)).For an example, see [Configuration Properties]({{% ref "#configuration-properties" %}}). |
 | hive.compaction.merge.enabled | *Default:* false | HiveServer2 | Enables merge-based compaction which is a compaction optimization when few ORC delta files are present |
 | hive.compactor.initiator.duration.update.interval | *Default:* 60s | HiveServer2 | Time in seconds that drives the update interval of compaction_initiator_duration metric.Smaller value results in a fine grained metric update.This updater can be turned off if its value less than or equals to zero.In this case the above metric will be update only after the initiator completed one cycle.The hive.compactor.initiator.on must be turned on (true) in-order to enable the Initiator,otherwise this setting has no effect. |
-| [hive.compactor.initiator.on]({{< ref "#hive-compactor-initiator-on" >}}) deprecated. Use metastore.compactor.initiator.on instead. | *Default:* false*Value required for transactions:* true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator thread on this metastore instance.
Prior to [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet).As of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) this property may be enabled on any number of standalone metastore instances. |
+| [hive.compactor.initiator.on]({{% ref "#hive-compactor-initiator-on" %}}) deprecated. Use metastore.compactor.initiator.on instead. | *Default:* false*Value required for transactions:* true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the initiator thread on this metastore instance. Prior to [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet).As of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11388) this property may be enabled on any number of standalone metastore instances. |
 | hive.compactor.cleaner.duration.update.interval | *Default:* 60s | HiveServer2 | Time in seconds that drives the update interval of compaction_cleaner_duration metric.Smaller value results in a fine grained metric update.This updater can be turned off if its value less than or equals to zero.In this case the above metric will be update only after the cleaner completed one cycle. |
-| [hive.compactor.cleaner.on]({{< ref "#hive-compactor-cleaner-on" >}}) deprecated. Use metastore.compactor.cleaner.on instead. | *Default:* false*Value required for transactions:* true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the cleaner thread on this metastore instance. Before **Hive 4.0.0** Cleaner thread can be started/stopped with config hive.compactor.initiator.on. This config helps to enable/disable initiator/cleaner threads independently |
+| [hive.compactor.cleaner.on]({{% ref "#hive-compactor-cleaner-on" %}}) deprecated.
Use metastore.compactor.cleaner.on instead. | *Default:* false*Value required for transactions:* true (for exactly one instance of the Thrift metastore service) | Metastore | Whether to run the cleaner thread on this metastore instance. Before **Hive 4.0.0** Cleaner thread can be started/stopped with config hive.compactor.initiator.on. This config helps to enable/disable initiator/cleaner threads independently |
 | hive.compactor.cleaner.threads.num | *Default:* 1 | HiveServer2 | Enables parallelization of the cleaning directories after compaction, that includes many file related checks and may be expensive |
 | hive.compactor.compact.insert.only | *Default:* true | HiveServer2 | Whether the compactor should compact insert-only tables. A safety switch. |
 | hive.compactor.crud.query.based | *Default*: false | HiveServer2 | Means compaction on full CRUD tables is done via queries. Compactions on insert-only tables will always run via queries regardless of the value of this configuration. |
@@ -178,16 +178,16 @@ A number of new configuration parameters have been added to the system to suppor
 | metastore.compactor.long.running.initiator.threshold.error | *Default:* 12h | Metastore | Initiator cycle duration after which an error will be logged. Default time unit is: hours |
 | hive.compactor.worker.sleep.time | *Default:*10800ms | HiveServer2 | Time in milliseconds for which a worker threads goes into sleep before starting another iteration in case of no launched job or error |
 | hive.compactor.worker.max.sleep.time | *Default:* 320000ms | HiveServer2 | Max time in milliseconds for which a worker threads goes into sleep before starting another iteration used for backoff in case of no launched job or error |
-| [hive.compactor.worker.threads]({{< ref "#hive-compactor-worker-threads" >}}) deprecated. Use metastore.compactor.worker.threads instead.
| *Default:* 0*Value required for transactions:* \> 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 |
-| [hive.compactor.worker.timeout]({{< ref "#hive-compactor-worker-timeout" >}}) | *Default:* 86400s | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. |
-| [hive.compactor.cleaner.run.interval]({{< ref "#hive-compactor-cleaner-run-interval" >}}) | *Default*: 5000ms | Metastore | Time in milliseconds between runs of the cleaner thread. ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8258) and later.) |
-| [hive.compactor.check.interval]({{< ref "#hive-compactor-check-interval" >}}) | *Default:* 300s | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3 |
-| [hive.compactor.delta.num.threshold]({{< ref "#hive-compactor-delta-num-threshold" >}}) | *Default:* 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction. |
-| [hive.compactor.delta.pct.threshold]({{< ref "#hive-compactor-delta-pct-threshold" >}}) | *Default:* 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. |
-| [hive.compactor.abortedtxn.threshold]({{< ref "#hive-compactor-abortedtxn-threshold" >}}) | *Default:* 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction. |
-| [hive.compactor.aborted.txn.time.threshold]({{< ref "#hive-compactor-aborted-txn-time-threshold" >}}) | *Default*: 12h | Metastore | Age of table/partition's oldest aborted transaction when compaction will be triggered. Default time unit is: hours. Set to a negative number to disable.
|
-| [hive.compactor.max.num.delta]({{< ref "#hive-compactor-max-num-delta" >}}) | Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11540)).4 |
-| [hive.compactor.job.queue]({{< ref "#hive-compactor-job-queue" >}}) | *Default*: "" (empty string) |  Metastore |  Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11997)). |
+| [hive.compactor.worker.threads]({{% ref "#hive-compactor-worker-threads" %}}) deprecated. Use metastore.compactor.worker.threads instead. | *Default:* 0*Value required for transactions:* \> 0 on at least one instance of the Thrift metastore service | Metastore | How many compactor worker threads to run on this metastore instance.2 |
+| [hive.compactor.worker.timeout]({{% ref "#hive-compactor-worker-timeout" %}}) | *Default:* 86400s | Metastore | Time in seconds after which a compaction job will be declared failed and the compaction re-queued. |
+| [hive.compactor.cleaner.run.interval]({{% ref "#hive-compactor-cleaner-run-interval" %}}) | *Default*: 5000ms | Metastore | Time in milliseconds between runs of the cleaner thread. ([Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8258) and later.) |
+| [hive.compactor.check.interval]({{% ref "#hive-compactor-check-interval" %}}) | *Default:* 300s | Metastore | Time in seconds between checks to see if any tables or partitions need to be compacted.3 |
+| [hive.compactor.delta.num.threshold]({{% ref "#hive-compactor-delta-num-threshold" %}}) | *Default:* 10 | Metastore | Number of delta directories in a table or partition that will trigger a minor compaction.
|
+| [hive.compactor.delta.pct.threshold]({{% ref "#hive-compactor-delta-pct-threshold" %}}) | *Default:* 0.1 | Metastore | Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%. |
+| [hive.compactor.abortedtxn.threshold]({{% ref "#hive-compactor-abortedtxn-threshold" %}}) | *Default:* 1000 | Metastore | Number of aborted transactions involving a given table or partition that will trigger a major compaction. |
+| [hive.compactor.aborted.txn.time.threshold]({{% ref "#hive-compactor-aborted-txn-time-threshold" %}}) | *Default*: 12h | Metastore | Age of table/partition's oldest aborted transaction when compaction will be triggered. Default time unit is: hours. Set to a negative number to disable. |
+| [hive.compactor.max.num.delta]({{% ref "#hive-compactor-max-num-delta" %}}) | Default: 500 | Metastore | Maximum number of delta files that the compactor will attempt to handle in a single job (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11540)).4 |
+| [hive.compactor.job.queue]({{% ref "#hive-compactor-job-queue" %}}) | *Default*: "" (empty string) |  Metastore |  Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of [Hive 1.3.0](https://issues.apache.org/jira/browse/HIVE-11997)). |
 | hive.compactor.request.queue | *Default:* 1 | HiveServer2 | Enables parallelization of the checkForCompaction operation, that includes many file metadata checksand may be expensive |
 | hive.split.grouping.mode | *Default:* query (Allowed values: query, compactor) | HiveServer2 | This is set to compactor from within the query based compactor. This enables the Tez SplitGrouper to group splits based on their bucket number, so that all rows from different bucket files  for the same bucket number can end up in the same bucket file after the compaction.
|
 | hive.txn.xlock.iow | Default: true | HiveServer2 | Ensures commands with OVERWRITE (such as INSERT OVERWRITE) acquire Exclusive locks fortransactional tables. This ensures that inserts (w/o overwrite) running concurrentlyare not hidden by the INSERT OVERWRITE. |
@@ -224,9 +224,9 @@ In addition to the new parameters listed above, some existing parameters need to

 | Configuration key | Must be set to |
 | --- | --- |
-| [hive.support.concurrency]({{< ref "#hive-support-concurrency" >}}) | true (default is false) |
-| [hive.enforce.bucketing]({{< ref "#hive-enforce-bucketing" >}}) | true (default is false) (Not required as of [Hive 2.0](https://issues.apache.org/jira/browse/HIVE-12331)) |
-| [hive.exec.dynamic.partition.mode]({{< ref "#hive-exec-dynamic-partition-mode" >}}) | nonstrict (default is strict) |
+| [hive.support.concurrency]({{% ref "#hive-support-concurrency" %}}) | true (default is false) |
+| [hive.enforce.bucketing]({{% ref "#hive-enforce-bucketing" %}}) | true (default is false) (Not required as of [Hive 2.0](https://issues.apache.org/jira/browse/HIVE-12331)) |
+| [hive.exec.dynamic.partition.mode]({{% ref "#hive-exec-dynamic-partition-mode" %}}) | nonstrict (default is strict) |

 ### Configuration Values to Set for Compaction

@@ -238,11 +238,11 @@ More in formation on compaction pooling can be found here: [Compaction pooling](

 ## Table Properties

-If a table is to be used in ACID writes (insert, update, delete) then the table property "`transactional=true`" must be set on that table, starting with [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8290). Note, once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table, i.e., changing TBLPROPERTIES ("transactional"="false") is not allowed.
Also, [hive.txn.manager]({{< ref "#hive-txn-manager" >}}) must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager either in hive-site.xml or in the beginning of the session before any query is run. Without those, inserts will be done in the old style; updates and deletes will be prohibited prior to HIVE-11716.  Since HIVE-11716 operations on ACID tables without DbTxnManager are not allowed.  However, this does not apply to Hive 0.13.0.
+If a table is to be used in ACID writes (insert, update, delete) then the table property "`transactional=true`" must be set on that table, starting with [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-8290). Note, once a table has been defined as an ACID table via TBLPROPERTIES ("transactional"="true"), it cannot be converted back to a non-ACID table, i.e., changing TBLPROPERTIES ("transactional"="false") is not allowed. Also, [hive.txn.manager]({{% ref "#hive-txn-manager" %}}) must be set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager either in hive-site.xml or in the beginning of the session before any query is run. Without those, inserts will be done in the old style; updates and deletes will be prohibited prior to HIVE-11716.  Since HIVE-11716 operations on ACID tables without DbTxnManager are not allowed.  However, this does not apply to Hive 0.13.0.

-If a table owner does not wish the system to automatically determine when to compact, then the table property "`NO_AUTO_COMPACTION`" can be set.  This will prevent all automatic compactions.  Manual compactions can still be done with [Alter Table/Partition Compact]({{< ref "#alter-table/partition-compact" >}}) statements.
+If a table owner does not wish the system to automatically determine when to compact, then the table property "`NO_AUTO_COMPACTION`" can be set.  This will prevent all automatic compactions.  Manual compactions can still be done with [Alter Table/Partition Compact]({{% ref "#alter-table/partition-compact" %}}) statements.

-Table properties are set with the TBLPROPERTIES clause when a table is created or altered, as described in the [Create Table]({{< ref "#create-table" >}}) and [Alter Table Properties]({{< ref "#alter-table-properties" >}}) sections of Hive Data Definition Language. The "`transactional`" and "`NO_AUTO_COMPACTION`" table properties are case-sensitive in Hive releases 0.x and 1.0, but they are case-insensitive starting with release 1.1.0 ([HIVE-8308](https://issues.apache.org/jira/browse/HIVE-8308)).
+Table properties are set with the TBLPROPERTIES clause when a table is created or altered, as described in the [Create Table]({{% ref "#create-table" %}}) and [Alter Table Properties]({{% ref "#alter-table-properties" %}}) sections of Hive Data Definition Language. The "`transactional`" and "`NO_AUTO_COMPACTION`" table properties are case-sensitive in Hive releases 0.x and 1.0, but they are case-insensitive starting with release 1.1.0 ([HIVE-8308](https://issues.apache.org/jira/browse/HIVE-8308)).

 More compaction related options can be set via TBLPROPERTIES as of [Hive 1.3.0 and 2.1.0](https://issues.apache.org/jira/browse/HIVE-13354). They can be set at both table-level via [CREATE TABLE](/docs/latest/language/languagemanual-ddl#createdroptruncate-table), and on request-level via [ALTER TABLE/PARTITION COMPACT](/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact).  These are used to override the Warehouse/table wide settings.  For example, to override an MR property to affect a compaction job, one can add "compactor.\=\" in either CREATE TABLE statement or when launching a compaction explicitly via ALTER TABLE.  The "\=\" will be set on JobConf of the compaction MR job.   Similarly, "tblprops.\=\" can be used to set/override any table property which is interpreted by the code running on the cluster.
Finally, "compactorthreshold.\=\" can be used to override properties from the "New Configuration Parameters for Transactions" table above that end with ".threshold" and control when compactions are triggered by the system.  Examples: diff --git a/content/docs/latest/user/hiveaws-hivings3nremotely.md b/content/docs/latest/user/hiveaws-hivings3nremotely.md index c1e04f7a..acdce9df 100644 --- a/content/docs/latest/user/hiveaws-hivings3nremotely.md +++ b/content/docs/latest/user/hiveaws-hivings3nremotely.md @@ -45,7 +45,7 @@ hive> set fs.s3n.awsAccessKeyId=1B5JYHPQCXW13GWKHAG2 ``` -**The values assigned to s3n keys are just an example and need to be filled in by the user as per their account details.** Explanation for the rest of the values can be found in [Configuration Guide]({{< ref "#configuration-guide" >}}) section below. +**The values assigned to s3n keys are just an example and need to be filled in by the user as per their account details.** Explanation for the rest of the values can be found in [Configuration Guide]({{% ref "#configuration-guide" %}}) section below. Instead of specifying these command lines each time the CLI is bought up - we can store these persistently within `hive-site.xml` in the `conf/` directory of the Hive installation (from where they will be picked up each time the CLI is launched. @@ -100,7 +100,7 @@ hive> select * from kv limit 10; `select *` queries with limit clauses can be performed locally on the Hive CLI itself. If you are doing this - please note that: -* `fs.default.name` should be set to `[file:///![](images/icons/linkext7.gif)]({{< ref "file:///" >}})` in case CLI is not configured to use a working Hadoop cluster +* `fs.default.name` should be set to `[file:///![](images/icons/linkext7.gif)]({{% ref "file:///" %}})` in case CLI is not configured to use a working Hadoop cluster * **Please Please do not select all the rows from large data sets**. 
This will cause large amount of data to be downloaded from S3 to outside AWS and incur charges on the host account for these data sets!
 Of course - the real fun is in doing some non-trivial queries using map-reduce. For this we will need a Hadoop cluster (finally!):
diff --git a/content/docs/latest/user/hiveclient.md b/content/docs/latest/user/hiveclient.md
index f591966c..b287f3d9 100644
--- a/content/docs/latest/user/hiveclient.md
+++ b/content/docs/latest/user/hiveclient.md
@@ -7,19 +7,19 @@ date: 2024-12-12
 This page describes the different clients supported by Hive. The command line client currently only supports an embedded server. The JDBC and Thrift-Java clients support both embedded and standalone servers. Clients in other languages only support standalone servers.
-For details about the standalone server see [Hive Server]({{< ref "hiveserver" >}}) or [HiveServer2]({{< ref "setting-up-hiveserver2" >}}).
+For details about the standalone server see [Hive Server]({{% ref "hiveserver" %}}) or [HiveServer2]({{% ref "setting-up-hiveserver2" %}}).
 # Command Line
-Operates in embedded mode only, that is, it needs to have access to the Hive libraries. For more details see [Getting Started]({{< ref "gettingstarted-latest" >}}) and [Hive CLI]({{< ref "languagemanual-cli" >}}).
+Operates in embedded mode only, that is, it needs to have access to the Hive libraries. For more details see [Getting Started]({{% ref "gettingstarted-latest" %}}) and [Hive CLI]({{% ref "languagemanual-cli" %}}).
 # JDBC
-**This document describes the JDBC client for the original [Hive Server]({{< ref "hiveserver" >}}) (sometimes called *Thrift server* or *HiveServer1*). For information about the HiveServer2 JDBC client, see [JDBC in the HiveServer2 Clients document]({{< ref "#jdbc-in-the-hiveserver2-clients-document" >}}).
HiveServer2 use is recommended; the original HiveServer has several concurrency issues and lacks several features available in HiveServer2.**
+**This document describes the JDBC client for the original [Hive Server]({{% ref "hiveserver" %}}) (sometimes called *Thrift server* or *HiveServer1*). For information about the HiveServer2 JDBC client, see [JDBC in the HiveServer2 Clients document]({{% ref "#jdbc-in-the-hiveserver2-clients-document" %}}). HiveServer2 use is recommended; the original HiveServer has several concurrency issues and lacks several features available in HiveServer2.**
 Version information
-The original [Hive Server]({{< ref "hiveserver" >}}) was removed from Hive releases starting in [version 1.0.0]({{< ref "#version-1-0-0" >}}). See [HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977).
+The original [Hive Server]({{% ref "hiveserver" %}}) was removed from Hive releases starting in [version 1.0.0]({{% ref "#version-1-0-0" %}}). See [HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977).
 For embedded mode, uri is just "jdbc:hive://". For standalone server, uri is "jdbc:hive://host:port/dbname" where host and port are determined by where the Hive server is run. For example, "jdbc:hive://localhost:10000/default". Currently, the only dbname supported is "default".
@@ -219,7 +219,7 @@ $transport->close();
 # ODBC
-Operates only on a standalone server. The Hive ODBC client provides a set of C-compatible library functions to interact with Hive Server in a pattern similar to those dictated by the ODBC specification. See [Hive ODBC Driver]({{< ref "hiveodbc" >}}).
+Operates only on a standalone server. The Hive ODBC client provides a set of C-compatible library functions to interact with Hive Server in a pattern similar to those dictated by the ODBC specification. See [Hive ODBC Driver]({{% ref "hiveodbc" %}}).
 # Thrift
diff --git a/content/docs/latest/user/hiveserver2-clients.md b/content/docs/latest/user/hiveserver2-clients.md
index bd283c6d..879c194f 100644
--- a/content/docs/latest/user/hiveserver2-clients.md
+++ b/content/docs/latest/user/hiveserver2-clients.md
@@ -5,7 +5,7 @@ date: 2024-12-12
 # Apache Hive : HiveServer2 Clients
-This page describes the different clients supported by [HiveServer2]({{< ref "setting-up-hiveserver2" >}}).
+This page describes the different clients supported by [HiveServer2]({{% ref "setting-up-hiveserver2" %}}).
 # Version information
@@ -15,9 +15,9 @@ Introduced in Hive version 0.11. See [HIVE-2935](https://issues.apache.org/jira/
 HiveServer2 supports a command shell Beeline that works with HiveServer2. It's a JDBC client that is based on the SQLLine CLI (). There’s detailed [documentation](http://sqlline.sourceforge.net/#manual) of SQLLine which is applicable to Beeline as well.
-[Replacing the Implementation of Hive CLI Using Beeline]({{< ref "replacing-the-implementation-of-hive-cli-using-beeline" >}})
+[Replacing the Implementation of Hive CLI Using Beeline]({{% ref "replacing-the-implementation-of-hive-cli-using-beeline" %}})
-The Beeline shell works in both embedded mode as well as remote mode. In the embedded mode, it runs an embedded Hive (similar to [Hive CLI]({{< ref "languagemanual-cli" >}})) whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. Starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-7615), when Beeline is used with HiveServer2, it also prints the log messages from HiveServer2 for queries it executes to STDERR. Remote HiveServer2 mode is recommended for production use, as it is more secure and doesn't require direct HDFS/metastore access to be granted for users.
+The Beeline shell works in both embedded mode as well as remote mode.
In the embedded mode, it runs an embedded Hive (similar to [Hive CLI]({{% ref "languagemanual-cli" %}})) whereas remote mode is for connecting to a separate HiveServer2 process over Thrift. Starting in [Hive 0.14](https://issues.apache.org/jira/browse/HIVE-7615), when Beeline is used with HiveServer2, it also prints the log messages from HiveServer2 for queries it executes to STDERR. Remote HiveServer2 mode is recommended for production use, as it is more secure and doesn't require direct HDFS/metastore access to be granted for users. In remote mode HiveServer2 only accepts valid Thrift calls – even in HTTP mode, the message body contains Thrift payloads. @@ -89,7 +89,7 @@ Version: 4.0.0 ([HIVE-22853](https://issues.apache.org/jira/browse/HIVE-22853)) ### Beeline Hive Commands -Hive specific commands (same as [Hive CLI commands]({{< ref "#hive-cli-commands" >}})) can be run from Beeline, when the Hive JDBC driver is used. +Hive specific commands (same as [Hive CLI commands]({{% ref "#hive-cli-commands" %}})) can be run from Beeline, when the Hive JDBC driver is used. Use "`;`" (semicolon) to terminate commands. Comments in scripts can be specified using the "`--`" prefix. @@ -100,13 +100,13 @@ Use "`;`" (semicolon) to terminate commands. Comments in scripts can be specifie | `set =` | Sets the value of a particular configuration variable (key). **Note:** If you misspell the variable name, Beeline will not show an error. | | `set` | Prints a list of configuration variables that are overridden by the user or Hive. | | `set -v` | Prints all Hadoop and Hive configuration variables. | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. 
| -| `add FILE[S] *`  `add JAR[S]  *`  `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | -| list FILE[S] list JAR[S] list ARCHIVE[S] | Lists the resources already added to the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. (As of Hive 0.14.0: [HIVE-7592](https://issues.apache.org/jira/browse/HIVE-7592)). | -| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `add FILE[S] *`  `add JAR[S]  *`  `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form ivy://group:module:version?query_string. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| list FILE[S] list JAR[S] list ARCHIVE[S] | Lists the resources already added to the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. (As of Hive 0.14.0: [HIVE-7592](https://issues.apache.org/jira/browse/HIVE-7592)). | +| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. 
| | `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | Removes the resource(s) from the distributed cache. | -| `delete FILE[S] *`  `delete JAR[S] *`  `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | -| `reload` | As of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7553), makes HiveServer2 aware of any jar changes in the path specified by the configuration parameter [hive.reloadable.aux.jars.path]({{< ref "#hive-reloadable-aux-jars-path" >}}) (without needing to restart HiveServer2). The changes can be adding, removing, or updating jar files. | +| `delete FILE[S] *`  `delete JAR[S] *`  `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `reload` | As of [Hive 0.14.0](https://issues.apache.org/jira/browse/HIVE-7553), makes HiveServer2 aware of any jar changes in the path specified by the configuration parameter [hive.reloadable.aux.jars.path]({{% ref "#hive-reloadable-aux-jars-path" %}}) (without needing to restart HiveServer2). The changes can be adding, removing, or updating jar files. | | `dfs ` | Executes a dfs command. | | `` | Executes a Hive query and prints results to standard output. | @@ -117,7 +117,7 @@ The Beeline CLI supports these command line options: | Option | Description | | --- | --- | | `-u ` | The JDBC URL to connect to. 
Special characters in parameter values should be encoded with URL encoding if needed.Usage: `beeline -u` *db_URL*  | -| `-r` | [Reconnect]({{< ref "#reconnect" >}}) to last used URL (if a user has previously used `!connect` to a URL and used `!save` to a beeline.properties file).Usage: `beeline -r`Version: 2.1.0 ([HIVE-13670](https://issues.apache.org/jira/browse/HIVE-13670)) | +| `-r` | [Reconnect]({{% ref "#reconnect" %}}) to last used URL (if a user has previously used `!connect` to a URL and used `!save` to a beeline.properties file).Usage: `beeline -r`Version: 2.1.0 ([HIVE-13670](https://issues.apache.org/jira/browse/HIVE-13670)) | | `-n ` | The username to connect as.Usage: `beeline -n` valid_user | | `-p ` | The password to connect as.Usage: `beeline -p` valid_passwordOptional password mode:Starting Hive 2.2.0 ([HIVE-13589](https://issues.apache.org/jira/browse/HIVE-13589)) the argument for -p option is optional.Usage : beeline -p [valid_password]If the password is not provided after -p Beeline will prompt for the password while initiating the connection. When password is provided Beeline uses it initiate the connection without prompting. | | `-d ` | The driver class to use.Usage: `beeline -d` driver_class | @@ -127,7 +127,7 @@ The Beeline CLI supports these command line options: | `-w` (or) `--password-file ` | The password file to read password from.Version: 1.2.0 ([HIVE-7175](https://issues.apache.org/jira/browse/HIVE-7175)) | | `-a` (or) `--authType ` | The authentication type passed to the jdbc as an auth propertyVersion: 0.13.0 ([HIVE-5155](https://issues.apache.org/jira/browse/HIVE-5155)) | | `--property-file ` | File to read configuration properties fromUsage: `beeline --property-file /tmp/a`Version: 2.2.0 ([HIVE-13964](https://issues.apache.org/jira/browse/HIVE-13964)) | -| `--hiveconf property=value` | Use value for the given configuration property. 
Properties that are listed in hive.conf.restricted.list cannot be reset with hiveconf (see [Restricted List and Whitelist]({{< ref "#restricted-list-and-whitelist" >}})).Usage: `beeline --hiveconf` prop1`=`value1Version: 0.13.0 ([HIVE-6173](https://issues.apache.org/jira/browse/HIVE-6173)) | +| `--hiveconf property=value` | Use value for the given configuration property. Properties that are listed in hive.conf.restricted.list cannot be reset with hiveconf (see [Restricted List and Whitelist]({{% ref "#restricted-list-and-whitelist" %}})).Usage: `beeline --hiveconf` prop1`=`value1Version: 0.13.0 ([HIVE-6173](https://issues.apache.org/jira/browse/HIVE-6173)) | | `--hivevar name=value` | Hive variable name and value. This is a Hive-specific setting in which variables can be set at the session level and referenced in Hive commands or queries.Usage: `beeline --hivevar` var1`=`value1 | | `--color=[true/false]` | Control whether color is used for display. Default is false.Usage: `beeline --color=true`(Not supported for Separated-Value Output formats. See [HIVE-9770](https://issues.apache.org/jira/browse/HIVE-9770)) | | `--showHeader=[true/false]` | Show column names in query results (true) or not (false). Default is true.Usage: `beeline --showHeader=false` | @@ -144,7 +144,7 @@ The Beeline CLI supports these command line options: | `--maxColumnWidth=MAXCOLWIDTH` | The maximum column width, in characters, when outputformat is table. Default is 50 in Hive version 2.2.0+ (see [HIVE-14135](https://issues.apache.org/jira/browse/HIVE-14135)) or 15 in earlier versions.Usage: `beeline --maxColumnWidth=25` | | `--silent=[true/false]` | Reduce the amount of informational messages displayed (true) or not (false). It also stops displaying the log messages for the query from HiveServer2 ([Hive 0.14](https://issues.apache.org/jira/browse/HIVE-7615) and later) and the HiveQL commands ([Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-10202) and later). 
Default is false.Usage: `beeline --silent=true` | | `--autosave=[true/false]` | Automatically save preferences (true) or do not autosave (false). Default is false.Usage: `beeline --autosave=true` | -| `--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]` | Format mode for result display. Default is table. See [Separated-Value Output Formats]({{< ref "#separated-value-output-formats" >}}) below for description of recommended sv options.Usage: `beeline --outputformat=tsv`Version: dsv/csv2/tsv2 added in 0.14.0 ([HIVE-8615](https://issues.apache.org/jira/browse/HIVE-8615)) | +| `--outputformat=[table/vertical/csv/tsv/dsv/csv2/tsv2]` | Format mode for result display. Default is table. See [Separated-Value Output Formats]({{% ref "#separated-value-output-formats" %}}) below for description of recommended sv options.Usage: `beeline --outputformat=tsv`Version: dsv/csv2/tsv2 added in 0.14.0 ([HIVE-8615](https://issues.apache.org/jira/browse/HIVE-8615)) | | `--truncateTable=[true/false] ` | If true, truncates table column in the console when it exceeds console length.Version: 0.14.0 ([HIVE-6928](https://issues.apache.org/jira/browse/HIVE-6928)) | | `--delimiterForDSV= DELIMITER` | The delimiter for delimiter-separated values output format. Default is '|' character.Version: 0.14.0 ([HIVE-7390](https://issues.apache.org/jira/browse/HIVE-7390)) | | `--isolation=LEVEL` | Set the transaction isolation level to TRANSACTION_READ_COMMITTED or TRANSACTION_SERIALIZABLE. See the "Field Detail" section in the Java [Connection](http://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html) documentation.Usage: `beeline --isolation=TRANSACTION_SERIALIZABLE` | @@ -162,13 +162,13 @@ In Beeline, the result can be displayed in different formats. 
The format mode ca The following output formats are supported: -* [table]({{< ref "#table" >}}) -* [vertical]({{< ref "#vertical" >}}) -* [xmlattr]({{< ref "#xmlattr" >}}) -* [xmlelements]({{< ref "#xmlelements" >}}) -* [HiveServer2 Clients#json]({{< ref "#hiveserver2-clients#json" >}}) -* [HiveServer2 Clients#jsonfile]({{< ref "#hiveserver2-clients#jsonfile" >}}) -* [separated-value formats]({{< ref "#separated-value-formats" >}}) (csv, tsv, csv2, tsv2, dsv) +* [table]({{% ref "#table" %}}) +* [vertical]({{% ref "#vertical" %}}) +* [xmlattr]({{% ref "#xmlattr" %}}) +* [xmlelements]({{% ref "#xmlelements" %}}) +* [HiveServer2 Clients#json]({{% ref "#hiveserver2-clients#json" %}}) +* [HiveServer2 Clients#jsonfile]({{% ref "#hiveserver2-clients#jsonfile" %}}) +* [separated-value formats]({{% ref "#separated-value-formats" %}}) (csv, tsv, csv2, tsv2, dsv) #### table @@ -372,16 +372,16 @@ tsv Starting with Hive 0.14.0, HiveServer2 operation logs are available for Beeline clients. These parameters configure logging: -* [hive.server2.logging.operation.enabled]({{< ref "#hive-server2-logging-operation-enabled" >}}) -* [hive.server2.logging.operation.log.location]({{< ref "#hive-server2-logging-operation-log-location" >}}) -* [hive.server2.logging.operation.verbose]({{< ref "#hive-server2-logging-operation-verbose" >}}) (Hive 0.14 to 1.1) -* [hive.server2.logging.operation.level]({{< ref "#hive-server2-logging-operation-level" >}}) (Hive 1.2 onward) +* [hive.server2.logging.operation.enabled]({{% ref "#hive-server2-logging-operation-enabled" %}}) +* [hive.server2.logging.operation.log.location]({{% ref "#hive-server2-logging-operation-log-location" %}}) +* [hive.server2.logging.operation.verbose]({{% ref "#hive-server2-logging-operation-verbose" %}}) (Hive 0.14 to 1.1) +* [hive.server2.logging.operation.level]({{% ref "#hive-server2-logging-operation-level" %}}) (Hive 1.2 onward)  [HIVE-11488](https://issues.apache.org/jira/browse/HIVE-11488) (Hive 2.0.0) adds the support 
of logging queryId and sessionId to HiveServer2 log file. To enable that, edit/add %X{queryId} and %X{sessionId} to the pattern format string of the logging configuration file. ### Cancelling the Query -When a user enters `CTRL+C` on the Beeline shell, if there is a query which is running at the same time then Beeline attempts to cancel the query while closing the socket connection to HiveServer2. This behavior is enabled only when `[hive.server2.close.session.on.disconnect]({{< ref "#hive-server2-close-session-on-disconnect" >}})` is set to `true`. Starting from Hive 2.2.0 ([HIVE-15626](https://issues.apache.org/jira/browse/HIVE-15626)) Beeline does not exit the command line shell when the running query is being cancelled as a user enters `CTRL+C`. If the user wishes to exit the shell they can enter `CTRL+C` for the second time while the query is being cancelled. However, if there is no query currently running, the first `CTRL+C` will exit the Beeline shell. This behavior is similar to how the Hive CLI handles `CTRL+C`. +When a user enters `CTRL+C` on the Beeline shell, if there is a query which is running at the same time then Beeline attempts to cancel the query while closing the socket connection to HiveServer2. This behavior is enabled only when `[hive.server2.close.session.on.disconnect]({{% ref "#hive-server2-close-session-on-disconnect" %}})` is set to `true`. Starting from Hive 2.2.0 ([HIVE-15626](https://issues.apache.org/jira/browse/HIVE-15626)) Beeline does not exit the command line shell when the running query is being cancelled as a user enters `CTRL+C`. If the user wishes to exit the shell they can enter `CTRL+C` for the second time while the query is being cancelled. However, if there is no query currently running, the first `CTRL+C` will exit the Beeline shell. This behavior is similar to how the Hive CLI handles `CTRL+C`. `!quit` is the recommended command to exit the Beeline shell. 
@@ -428,7 +428,7 @@ Special characters in *`sess_var_list, hive_conf_list, hive_var_list`*paramete ### Connection URL for Remote or Embedded Mode -The JDBC connection URL format has the prefix `jdbc:hive2://` and the Driver class is `org.apache.hive.jdbc.HiveDriver`. Note that this is different from the old [HiveServer]({{< ref "hiveserver" >}}). +The JDBC connection URL format has the prefix `jdbc:hive2://` and the Driver class is `org.apache.hive.jdbc.HiveDriver`. Note that this is different from the old [HiveServer]({{% ref "hiveserver" %}}). * For a remote server, the URL format is `jdbc:hive2://:/;initFile=` (default port for HiveServer2 is 10000). * For an embedded server, the URL format is `jdbc:hive2:///;initFile=`(no host or port). @@ -439,12 +439,12 @@ The `initFile` option is available in [Hive 2.2.0](https://issues.apache.org/jir JDBC connection URL:  `jdbc:hive2://:/;transportMode=http;httpPath=`, where: -* `` is the corresponding HTTP endpoint configured in [hive-site.xml]({{< ref "#hive-site-xml" >}}). Default value is `cliservice`. +* `` is the corresponding HTTP endpoint configured in [hive-site.xml]({{% ref "#hive-site-xml" %}}). Default value is `cliservice`. * Default port for HTTP transport mode is 10001. Versions earlier than 0.14 -In versions earlier than [0.14](https://issues.apache.org/jira/browse/HIVE-6972) these parameters used to be called `[hive.server2.transport.mode]({{< ref "#hive-server2-transport-mode" >}})` and `[hive.server2.thrift.http.path]({{< ref "#hive-server2-thrift-http-path" >}})` respectively and were part of the *hive_conf_list*. These versions have been deprecated in favour of the new versions (which are part of the *sess_var_list*) but continue to work for now. 
+In versions earlier than [0.14](https://issues.apache.org/jira/browse/HIVE-6972) these parameters used to be called `[hive.server2.transport.mode]({{% ref "#hive-server2-transport-mode" %}})` and `[hive.server2.thrift.http.path]({{% ref "#hive-server2-thrift-http-path" %}})` respectively and were part of the *hive_conf_list*. These versions have been deprecated in favour of the new versions (which are part of the *sess_var_list*) but continue to work for now. ### Connection URL When SSL Is Enabled in HiveServer2 @@ -455,11 +455,11 @@ JDBC connection URL:  `jdbc:hive2://:/;ssl=true;sslTrustStore=< In HTTP mode:  `jdbc:hive2://:/;ssl=true;sslTrustStore=;trustStorePassword=;transportMode=http;httpPath=`. -For versions earlier than 0.14, see the [version note]({{< ref "#version-note" >}}) above. +For versions earlier than 0.14, see the [version note]({{% ref "#version-note" %}}) above. ### Connection URL When ZooKeeper Service Discovery Is Enabled -ZooKeeper-based service discovery introduced in Hive 0.14.0 ([HIVE-7935](https://issues.apache.org/jira/browse/HIVE-7395)) enables high availability and rolling upgrade for HiveServer2. A JDBC URL that specifies `` needs to be used to make use of these features. That is, at least in `hive-site.xml` or other configuration files for HiveServer2, `hive.server2.support.dynamic.service.discovery` should be set to `true`, and `hive.zookeeper.quorum` should be defined to point to several started Zookeeper Servers. Reference [Configuration Properties]({{< ref "configuration-properties" >}}) . +ZooKeeper-based service discovery introduced in Hive 0.14.0 ([HIVE-7935](https://issues.apache.org/jira/browse/HIVE-7395)) enables high availability and rolling upgrade for HiveServer2. A JDBC URL that specifies `` needs to be used to make use of these features. 
That is, at least in `hive-site.xml` or other configuration files for HiveServer2, `hive.server2.support.dynamic.service.discovery` should be set to `true`, and `hive.zookeeper.quorum` should be defined to point to several started Zookeeper Servers. Reference [Configuration Properties]({{% ref "configuration-properties" %}}) . The minimal configuration example is as follows. @@ -846,7 +846,7 @@ To use sasl.qop, add the following to the sessionconf part of your Hive JDBC hi ``` jdbc:hive://hostname/dbname;sasl.qop=auth-int ``` -For more information, see [Setting Up HiveServer2]({{< ref "setting-up-hiveserver2" >}}). +For more information, see [Setting Up HiveServer2]({{% ref "setting-up-hiveserver2" %}}). ### Multi-User Scenarios and Programmatic Login to Kerberos KDC @@ -861,7 +861,7 @@ Proxy user privileges in the Hadoop ecosystem are associated with both user name If you are only making a JDBC connection as a privileged user from a single blessed machine, then direct proxy access is the simpler approach. You can just pass the user you need to impersonate in the JDBC URL by using the hive.server2.proxy.user=*\* parameter. See examples in [ProxyAuthTest.java](https://github.com/apache/hive/blob/master/beeline/src/test/org/apache/hive/beeline/ProxyAuthTest.java). -Support for delegation tokens with HiveServer2 binary transport mode [hive.server2.transport.mode]({{< ref "#hive-server2-transport-mode" >}}) has been available starting 0.13.0; support for this feature with HTTP transport mode was added in [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169), which should be part of Hive 2.1.0. +Support for delegation tokens with HiveServer2 binary transport mode [hive.server2.transport.mode]({{% ref "#hive-server2-transport-mode" %}}) has been available starting 0.13.0; support for this feature with HTTP transport mode was added in [HIVE-13169](https://issues.apache.org/jira/browse/HIVE-13169), which should be part of Hive 2.1.0. 
The other way is to use a pre-authenticated Kerberos Subject (see [HIVE-6486](https://issues.apache.org/jira/browse/HIVE-6486)). In this method, starting with Hive 0.13.0 the Hive JDBC client can use a pre-authenticated subject to authenticate to HiveServer2. This enables a middleware system to run queries as the user running the client. @@ -923,7 +923,7 @@ The default JDBC fetch size value may be overwritten, per statement, with the JD + If the fetch size value is absent from the JDBC connection string, the server's preferred fetch size is used as the default value # Python Client -A Python client driver is available on [github](https://github.com/BradRuderman/pyhs2). For installation instructions, see [Setting Up HiveServer2: Python Client Driver]({{< ref "#setting-up-hiveserver2:-python-client-driver" >}}). +A Python client driver is available on [github](https://github.com/BradRuderman/pyhs2). For installation instructions, see [Setting Up HiveServer2: Python Client Driver]({{% ref "#setting-up-hiveserver2:-python-client-driver" %}}). # Ruby Client @@ -1019,7 +1019,7 @@ JDBC connection URL: * *\* is the path where the client's keystore file lives. This is a mandatory non-empty field. * *\* is the password to access the keystore. -For versions earlier than 0.14, see the [version note]({{< ref "#version-note" >}}) above. +For versions earlier than 0.14, see the [version note]({{% ref "#version-note" %}}) above. In the environment where exposing `trustStorePassword` and `keyStorePassword` in the connection URL is a security concern, a new option s`torePasswordPath` is introduced with [HIVE-27308](https://issues.apache.org/jira/browse/HIVE-27308) that can be used in URL instead of `trustStorePassword` and `keyStorePassword`. s`torePasswordPath` value hold the path to the local keystore file storing the `trustStorePassword` and `keyStorePassword` aliases. 
When the existing `trustStorePassword` or `keyStorePassword` is present in URL along with s`torePasswordPath`, respective password is directly obtained from password option.  Otherwise, fetches the particular alias from local keystore file(i.e., existing password options are preferred over s`torePasswordPath`). @@ -1047,7 +1047,7 @@ JDBC connection URL: When the above URL is specified, Beeline will call underlying requests to add an HTTP header set to *\* and *\* and another HTTP header set to *\* and *\*. This is helpful when the end user needs to send identity in an HTTP header down to intermediate servers such as Knox via Beeline for authentication, for example `http.header.USERNAME=\;http.header.PASSWORD=\`. -For versions earlier than 0.14, see the [version note]({{< ref "#version-note" >}}) above.  +For versions earlier than 0.14, see the [version note]({{% ref "#version-note" %}}) above.  ### Passing Custom HTTP Cookie Key/Value Pairs via JDBC Driver diff --git a/content/docs/latest/user/hiveserver2-overview.md b/content/docs/latest/user/hiveserver2-overview.md index b840e3af..56732b4f 100644 --- a/content/docs/latest/user/hiveserver2-overview.md +++ b/content/docs/latest/user/hiveserver2-overview.md @@ -25,7 +25,7 @@ The TThreadPoolServer allocates one worker thread per TCP connection. Each threa ## Transport -HTTP mode is required when a proxy is needed between the client and server (for example, for load balancing or security reasons). That is why it is supported, as well as TCP mode. You can specify the transport mode of the Thrift service through the Hive configuration property [hive.server2.transport.mode]({{< ref "#hive-server2-transport-mode" >}}). +HTTP mode is required when a proxy is needed between the client and server (for example, for load balancing or security reasons). That is why it is supported, as well as TCP mode. 
You can specify the transport mode of the Thrift service through the Hive configuration property [hive.server2.transport.mode]({{% ref "#hive-server2-transport-mode" %}}).
 ## Protocol
@@ -37,12 +37,12 @@ Process implementation is the application logic to handle requests. For example,
 # Dependencies of HS2
-* [Metastore]({{< ref "adminmanual-metastore-administration" >}})
+* [Metastore]({{% ref "adminmanual-metastore-administration" %}})
 The metastore can be configured as embedded (in the same process as HS2) or as a remote server (which is a Thrift-based service as well). HS2 talks to the metastore for the metadata required for query compilation.
 * Hadoop cluster
 HS2 prepares physical execution plans for various execution engines (MapReduce/Tez/Spark) and submits jobs to the Hadoop cluster for execution.
-You can find a diagram of the interactions between HS2 and its dependencies [here]({{< ref "#here" >}}).
+You can find a diagram of the interactions between HS2 and its dependencies [here]({{% ref "#here" %}}).
 # JDBC Client
@@ -67,7 +67,7 @@ The following sections help you locate some basic components of HiveServer2 in t
 * **org.apache.hive.service.cli.thrift.EmbeddedThriftBinaryCLIService class:** Embedded mode for HS2. Don't get confused with embedded metastore, which is a different service (although the embedded mode concept is similar).
 * **org.apache.hive.service.cli.session.HiveSessionImpl class**: Instances of this class are created on the server side and managed by an *org.apache.accumulo.tserver.TabletServer.SessionManager* instance*.*
 * **org.apache.hive.service.cli.operation.Operation class**: Defines an operation (e.g., a query). Instances of this class are created on the server and managed by an *org.apache.hive.service.cli.operation.OperationManager* instance*.*
-* **org.apache.hive.service.auth.HiveAuthFactory class**: A helper used by both HTTP and TCP mode for authentication.
Refer to [Setting Up HiveServer2]({{< ref "setting-up-hiveserver2" >}}) for various authentication options, in particular [Authentication/Security Configuration]({{< ref "#authentication/security-configuration" >}}) and [Cookie Based Authentication]({{< ref "#cookie-based-authentication" >}}). +* **org.apache.hive.service.auth.HiveAuthFactory class**: A helper used by both HTTP and TCP mode for authentication. Refer to [Setting Up HiveServer2]({{% ref "setting-up-hiveserver2" %}}) for various authentication options, in particular [Authentication/Security Configuration]({{% ref "#authentication/security-configuration" %}}) and [Cookie Based Authentication]({{% ref "#cookie-based-authentication" %}}). ## Client Side @@ -82,13 +82,13 @@ The following sections help you locate some basic components of HiveServer2 in t # Resources -How to set up HS2: [Setting Up HiveServer2]({{< ref "setting-up-hiveserver2" >}}) +How to set up HS2: [Setting Up HiveServer2]({{% ref "setting-up-hiveserver2" %}}) -HS2 clients: [HiveServer2 Clients]({{< ref "hiveserver2-clients" >}}) +HS2 clients: [HiveServer2 Clients]({{% ref "hiveserver2-clients" %}}) -User interface:  [Web UI for HiveServer2]({{< ref "#web-ui-for-hiveserver2" >}}) +User interface:  [Web UI for HiveServer2]({{% ref "#web-ui-for-hiveserver2" %}}) -Metrics:  [Hive Metrics]({{< ref "hive-metrics" >}}) +Metrics:  [Hive Metrics]({{% ref "hive-metrics" %}}) Cloudera blog on HS2:  diff --git a/content/docs/latest/user/parquet.md b/content/docs/latest/user/parquet.md index 699ae095..8896d0a1 100644 --- a/content/docs/latest/user/parquet.md +++ b/content/docs/latest/user/parquet.md @@ -47,11 +47,11 @@ To use Parquet with Hive 0.10-0.12 you must [download](http://search.maven.org/ ### Hive 0.13 -Native Parquet support was added ([HIVE-5783)](https://issues.apache.org/jira/browse/HIVE-5783). 
Please note that not all Parquet data types are supported in this version (see [Versions and Limitations]({{< ref "#versions-and-limitations" >}}) below). +Native Parquet support was added ([HIVE-5783)](https://issues.apache.org/jira/browse/HIVE-5783). Please note that not all Parquet data types are supported in this version (see [Versions and Limitations]({{% ref "#versions-and-limitations" %}}) below). ## **HiveQL Syntax** -A [CREATE TABLE]({{< ref "#create-table" >}}) statement can specify the Parquet storage format with syntax that depends on the Hive version. +A [CREATE TABLE]({{% ref "#create-table" %}}) statement can specify the Parquet storage format with syntax that depends on the Hive version. ### Hive 0.10 - 0.12 diff --git a/content/docs/latest/user/rcfile.md b/content/docs/latest/user/rcfile.md index d4500f57..427d3d6b 100644 --- a/content/docs/latest/user/rcfile.md +++ b/content/docs/latest/user/rcfile.md @@ -14,7 +14,7 @@ RCFile combines the advantages of both row-store and column-store to satisfy th * As row-store, RCFile guarantees that data in the same row are located in the same node. * As column-store, RCFile can exploit column-wise data compression and skip unnecessary column reads. -A shell utility is available for reading RCFile data and metadata: see [RCFileCat]({{< ref "rcfilecat" >}}). +A shell utility is available for reading RCFile data and metadata: see [RCFileCat]({{% ref "rcfilecat" %}}). For details about the RCFile format, see: diff --git a/content/docs/latest/user/rcfilecat.md b/content/docs/latest/user/rcfilecat.md index 8087c1df..e816113c 100644 --- a/content/docs/latest/user/rcfilecat.md +++ b/content/docs/latest/user/rcfilecat.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : RCFileCat -$HIVE_HOME/bin/hive --rcfilecat is a shell utility which can be used to print data or metadata from [RC files]({{< ref "rcfile" >}}). 
+$HIVE_HOME/bin/hive --rcfilecat is a shell utility which can be used to print data or metadata from [RC files]({{% ref "rcfile" %}}). ## Data diff --git a/content/docs/latest/user/replacing-the-implementation-of-hive-cli-using-beeline.md b/content/docs/latest/user/replacing-the-implementation-of-hive-cli-using-beeline.md index 8618c38b..61ba8884 100644 --- a/content/docs/latest/user/replacing-the-implementation-of-hive-cli-using-beeline.md +++ b/content/docs/latest/user/replacing-the-implementation-of-hive-cli-using-beeline.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## Why Replace the Existing Hive CLI? -[Hive CLI]({{< ref "#hive-cli" >}}) is a legacy tool which had two main use cases. The first is that it served as a thick client for SQL on Hadoop and the second is that it served as a command line tool for Hive Server (the original Hive server, now often referred to as "HiveServer1"). Hive Server has been deprecated and removed from the Hive code base as of Hive 1.0.0 ([HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977)) and replaced with HiveServer2 ([HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935)), so the second use case no longer applies. For the first use case, [Beeline]({{< ref "#beeline" >}}) provides or is supposed to provide equal functionality, yet is implemented differently from Hive CLI. +[Hive CLI]({{% ref "#hive-cli" %}}) is a legacy tool which had two main use cases. The first is that it served as a thick client for SQL on Hadoop and the second is that it served as a command line tool for Hive Server (the original Hive server, now often referred to as "HiveServer1"). Hive Server has been deprecated and removed from the Hive code base as of Hive 1.0.0 ([HIVE-6977](https://issues.apache.org/jira/browse/HIVE-6977)) and replaced with HiveServer2 ([HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935)), so the second use case no longer applies. 
For the first use case, [Beeline]({{% ref "#beeline" %}}) provides or is supposed to provide equal functionality, yet is implemented differently from Hive CLI. Ideally, Hive CLI should be deprecated as the Hive community has long recommended using the Beeline plus HiveServer2 configuration; however, because of the wide use of Hive CLI, we instead are replacing Hive CLI's implementation with a new Hive CLI on top of Beeline plus embedded HiveServer2 ([HIVE-10511](https://issues.apache.org/jira/browse/HIVE-10511)) so that the Hive community only needs to maintain a single code path. In this way, the new Hive CLI is just an alias to Beeline at both the shell script level and the high code level. The goal is that no or minimal changes are required from existing user scripts using Hive CLI. @@ -83,12 +83,12 @@ Use ";" (semicolon) to terminate commands. Comments in scripts can be specified | `set =` | Sets the value of a particular configuration variable (key). **Note:** If you misspell the variable name, the CLI will not show an error. | | `set` | Prints a list of configuration variables that are overridden by the user or Hive. | | `set -v` | Prints all Hadoop and Hive configuration variables. | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | -| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form [ivy://group:module:version?query_string]({{< ref "ivy://groupmoduleversion?query_string" >}}). See [Hive Resources]({{< ref "#hive-resources" >}})  for more information. | -| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. 
See [Hive Resources]({{< ref "#hive-resources" >}})  for more information. | -| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{< ref "#hive-resources" >}})  for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | Adds one or more files, jars, or archives to the list of resources in the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | +| `add FILE[S] *` `add JAR[S] *` `add ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), adds one or more files, jars or archives to the list of resources in the distributed cache using an [Ivy](http://ant.apache.org/ivy/) URL of the form [ivy://group:module:version?query_string]({{% ref "ivy://groupmoduleversion?query_string" %}}). See [Hive Resources]({{% ref "#hive-resources" %}})  for more information. | +| `list FILE[S]` `list JAR[S]` `list ARCHIVE[S]` | Lists the resources already added to the distributed cache. See [Hive Resources]({{% ref "#hive-resources" %}})  for more information. | +| `list FILE[S] *` `list JAR[S] *` `list ARCHIVE[S] *` | Checks whether the given resources are already added to the distributed cache or not. See [Hive Resources]({{% ref "#hive-resources" %}})  for more information. | | `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | Removes the resource(s) from the distributed cache. | -| `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. See [Hive Resources]({{< ref "#hive-resources" >}}) for more information. | +| `delete FILE[S] *` `delete JAR[S] *` `delete ARCHIVE[S] *` | As of [Hive 1.2.0](https://issues.apache.org/jira/browse/HIVE-9664), removes the resource(s) which were added using the \ from the distributed cache. 
See [Hive Resources]({{% ref "#hive-resources" %}}) for more information. | | `! ` | Executes a shell command from the Hive shell. | | `dfs ` | Executes a dfs command from the Hive shell. | | `` | Executes a Hive query and prints results to standard output. | diff --git a/content/docs/latest/user/serde.md b/content/docs/latest/user/serde.md index 3f1bb7f4..6d7fd073 100644 --- a/content/docs/latest/user/serde.md +++ b/content/docs/latest/user/serde.md @@ -11,7 +11,7 @@ SerDe is short for Serializer/Deserializer. Hive uses the SerDe interface for I A SerDe allows Hive to read in data from a table, and write it back out to HDFS in any custom format. Anyone can write their own SerDe for their own data formats. -See [Hive SerDe]({{< ref "#hive-serde" >}}) for an introduction to SerDes. +See [Hive SerDe]({{% ref "#hive-serde" %}}) for an introduction to SerDes. ## Built-in and Custom SerDes @@ -19,23 +19,23 @@ The Hive SerDe library is in org.apache.hadoop.hive.serde2. (The old SerDe lib ### Built-in SerDes -* [Avro]({{< ref "avroserde" >}}) (Hive 0.9.1 and later) -* [ORC]({{< ref "languagemanual-orc" >}}) (Hive 0.11 and later) -* [RegEx]({{< ref "#regex" >}}) +* [Avro]({{% ref "avroserde" %}}) (Hive 0.9.1 and later) +* [ORC]({{% ref "languagemanual-orc" %}}) (Hive 0.11 and later) +* [RegEx]({{% ref "#regex" %}}) * [Thrift](http://thrift.apache.org/) -* [Parquet]({{< ref "parquet" >}}) (Hive 0.13 and later) +* [Parquet]({{% ref "parquet" %}}) (Hive 0.13 and later) * [CSV](/docs/latest/user/csv-serde) (Hive 0.14 and later) -* [JsonSerDe]({{< ref "#jsonserde" >}}) (Hive 0.12 and later in [hcatalog-core](https://github.com/apache/hive/blob/master/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java)) +* [JsonSerDe]({{% ref "#jsonserde" %}}) (Hive 0.12 and later in [hcatalog-core](https://github.com/apache/hive/blob/master/hcatalog/core/src/main/java/org/apache/hive/hcatalog/data/JsonSerDe.java)) Note: For Hive releases prior to 0.12, Amazon provides a 
JSON SerDe available at `s3://elasticmapreduce/samples/hive-ads/libs/jsonserde.jar`. ### Custom SerDes -For information about custom SerDes, see [How to Write Your Own SerDe]({{< ref "#how-to-write-your-own-serde" >}}) in the [Developer Guide]({{< ref "developerguide" >}}). +For information about custom SerDes, see [How to Write Your Own SerDe]({{% ref "#how-to-write-your-own-serde" %}}) in the [Developer Guide]({{% ref "developerguide" %}}). ## HiveQL for SerDes -For the HiveQL statements that specify SerDes and their properties, see [Create Table]({{< ref "#create-table" >}}) (particularly [Row Formats & SerDe]({{< ref "#row-formats-&-serde" >}})) and [Alter Table]({{< ref "#alter-table" >}}) ([Add SerDe Properties]({{< ref "#add-serde-properties" >}})). +For the HiveQL statements that specify SerDes and their properties, see [Create Table]({{% ref "#create-table" %}}) (particularly [Row Formats & SerDe]({{% ref "#row-formats-&-serde" %}})) and [Alter Table]({{% ref "#alter-table" %}}) ([Add SerDe Properties]({{% ref "#add-serde-properties" %}})). # Input Processing diff --git a/content/docs/latest/user/storage-based-authorization-in-the-metastore-server.md b/content/docs/latest/user/storage-based-authorization-in-the-metastore-server.md index ac28c079..83fc7478 100644 --- a/content/docs/latest/user/storage-based-authorization-in-the-metastore-server.md +++ b/content/docs/latest/user/storage-based-authorization-in-the-metastore-server.md @@ -5,12 +5,12 @@ date: 2024-12-12 # Apache Hive : Storage Based Authorization in the Metastore Server -The metastore server security feature with storage based authorization was added to Hive in release 0.10. This feature was introduced previously in [HCatalog]({{< ref "hcatalog-authorization" >}}). +The metastore server security feature with storage based authorization was added to Hive in release 0.10. This feature was introduced previously in [HCatalog]({{% ref "hcatalog-authorization" %}}). 
[HIVE-3705](https://issues.apache.org/jira/browse/HIVE-3705) added metastore server security to Hive in release 0.10.0. -* For additional information about storage based authorization in the metastore server, see the HCatalog document [Storage Based Authorization]({{< ref "hcatalog-authorization" >}}). -* For an overview of Hive authorization models and other security options, see the [Authorization]({{< ref "languagemanual-authorization" >}}) document. +* For additional information about storage based authorization in the metastore server, see the HCatalog document [Storage Based Authorization]({{% ref "hcatalog-authorization" %}}). +* For an overview of Hive authorization models and other security options, see the [Authorization]({{% ref "languagemanual-authorization" %}}) document. ## The Need for Metastore Server Security @@ -22,7 +22,7 @@ Also, when a Hive metastore server uses Thrift to communicate with clients and h When metastore server security is configured to use Storage Based Authorization, it uses the file system permissions for folders corresponding to the different metadata objects as the source of truth for the authorization policy. Use of Storage Based Authorization in metastore is recommended. -See details in the HCatalog [Storage Based Authorization document]({{< ref "hcatalog-authorization" >}}). +See details in the HCatalog [Storage Based Authorization document]({{% ref "hcatalog-authorization" %}}). Starting in Hive 0.14, storage based authorization also authorizes read privileges on databases and tables. The `get_database` API call needs read privilege on the database directory. The `get_table_*` calls that fetch table information and the `get_partition_*` calls that list the partitions of a table require read privilege on the table directory. This read check is enabled by default with storage based authorization. See `hive.security.metastore.authorization.auth.reads` in the next section on configuration.
diff --git a/content/docs/latest/user/streaming-data-ingest-v2.md b/content/docs/latest/user/streaming-data-ingest-v2.md index 38e3c4e8..1152130c 100644 --- a/content/docs/latest/user/streaming-data-ingest-v2.md +++ b/content/docs/latest/user/streaming-data-ingest-v2.md @@ -5,7 +5,7 @@ date: 2024-12-12 # Apache Hive : Streaming Data Ingest V2 -Starting in release Hive 3.0.0, [Streaming Data Ingest]({{< ref "streaming-data-ingest" >}}) is deprecated and is replaced by newer V2 API ([HIVE-19205](https://issues.apache.org/jira/browse/HIVE-19205)).  +Starting in release Hive 3.0.0, [Streaming Data Ingest]({{% ref "streaming-data-ingest" %}}) is deprecated and is replaced by newer V2 API ([HIVE-19205](https://issues.apache.org/jira/browse/HIVE-19205)).  # Hive Streaming API @@ -13,7 +13,7 @@ Traditionally adding new data into Hive requires gathering a large amount of dat Hive Streaming API allows data to be pumped continuously into Hive. The incoming data can be continuously committed in small batches of records into an existing Hive partition or table. Once data is committed it becomes immediately visible to all Hive queries initiated subsequently. -This API is intended for streaming clients such as [NiFi](https://nifi.apache.org/), [Flume](http://flume.apache.org/) and [Storm](https://storm.incubator.apache.org/), which continuously generate data. Streaming support is built on top of ACID based insert/update support in Hive (see [Hive Transactions]({{< ref "hive-transactions" >}})). +This API is intended for streaming clients such as [NiFi](https://nifi.apache.org/), [Flume](http://flume.apache.org/) and [Storm](https://storm.incubator.apache.org/), which continuously generate data. Streaming support is built on top of ACID based insert/update support in Hive (see [Hive Transactions]({{% ref "hive-transactions" %}})). The Classes and interfaces part of the Hive streaming API are broadly categorized into two sets. 
The first set provides support for connection and transaction management while the second set provides I/O support. Transactions are managed by the metastore. Writes are performed directly to the destination filesystem defined by the table (HDFS, S3A, etc.). @@ -34,7 +34,7 @@ A few things are required to use streaming. 2. **hive.compactor.initiator.on = true** (See more important details [here](/docs/latest/user/hive-transactions#new-configuration-parameters-for-transactions)) 3. **hive.compactor.cleaner.on = true** (From Hive 4.0.0 onwards. See more important details [here](/docs/latest/user/hive-transactions#new-configuration-parameters-for-transactions)) 4. **hive.compactor.worker.threads** > **0** -2. *“stored as orc”* must be specified during [table creation]({{< ref "#table-creation" >}}). Only [ORC storage format]({{< ref "languagemanual-orc" >}}) is supported currently. +2. *“stored as orc”* must be specified during [table creation]({{% ref "#table-creation" %}}). Only [ORC storage format]({{% ref "languagemanual-orc" %}}) is supported currently. 3. tblproperties("transactional"="true") must be set on the table during creation. 4. The user of the client streaming process must have the necessary permissions to write to the table or partition and create partitions in the table. @@ -89,7 +89,7 @@ The HiveStreamingConnection is highly optimized for write throughput ([Delta Str ### Notes about the HiveConf Object -HiveStreamingConnect builder API accepts a HiveConf argument. This can either be set to null, or a pre-created HiveConf object can be provided. If this is null, a HiveConf object will be created internally and used for the connection. When a HiveConf object is instantiated, if the directory containing the hive-site.xml is part of the java classpath, then the HiveConf object will be initialized with values from it. If no hive-site.xml is found, then the object will be initialized with defaults.
Pre-creating this object and reusing it across multiple connections may have a noticeable impact on performance if connections are being opened very frequently (for example several times a second). Secure connection relies on '[metastore.kerberos.principal]({{< ref "#metastore-kerberos-principal" >}})' being set correctly in the HiveConf object. +HiveStreamingConnect builder API accepts a HiveConf argument. This can either be set to null, or a pre-created HiveConf object can be provided. If this is null, a HiveConf object will be created internally and used for the connection. When a HiveConf object is instantiated, if the directory containing the hive-site.xml is part of the java classpath, then the HiveConf object will be initialized with values from it. If no hive-site.xml is found, then the object will be initialized with defaults. Pre-creating this object and reusing it across multiple connections may have a noticeable impact on performance if connections are being opened very frequently (for example several times a second). Secure connection relies on '[metastore.kerberos.principal]({{% ref "#metastore-kerberos-principal" %}})' being set correctly in the HiveConf object. Regardless of what values are set in hive-site.xml or custom HiveConf, the API will internally override some settings in it to ensure correct streaming behavior. The below is the list of settings that are overridden: @@ -111,7 +111,7 @@ RecordWriter is the base interface implemented by all Writers. A Writer is respo A RecordWriter's primary functions are: 1. Modify input record: This may involve dropping fields from input data if they don’t have corresponding table columns, adding nulls in case of missing fields for certain columns, and adding __HIVE_DEFAULT_PARTITION__ if partition column value is null or empty. Dynamically creating partitions requires understanding of incoming data format to extract last columns to extract partition values. -2. 
Encode modified record: The encoding involves serialization using an appropriate [Hive SerDe]({{< ref "serde" >}}). +2. Encode modified record: The encoding involves serialization using an appropriate [Hive SerDe]({{% ref "serde" %}}). 3. For bucketed tables, extract bucket column values from the record to identify the bucket where the record belongs. 4. For partitioned tables, in dynamic partitioning mode, extract the partition column values from last N columns (where N is number of partitions) of the record to identify the partition where the record belongs. 5. Write encoded record to Hive using the [AcidOutputFormat](https://hive.apache.org/javadocs/r1.2.1/api/org/apache/hadoop/hive/ql/io/AcidOutputFormat.html)'s record updater for the appropriate bucket. diff --git a/content/docs/latest/user/streaming-data-ingest.md b/content/docs/latest/user/streaming-data-ingest.md index 4147f8d2..539862c8 100644 --- a/content/docs/latest/user/streaming-data-ingest.md +++ b/content/docs/latest/user/streaming-data-ingest.md @@ -13,7 +13,7 @@ date: 2024-12-12 Traditionally adding new data into Hive requires gathering a large amount of data onto HDFS and then periodically adding a new partition. This is essentially a “batch insertion”. Insertion of new data into an existing partition is not permitted. Hive Streaming API allows data to be pumped continuously into Hive. The incoming data can be continuously committed in small batches of records into an existing Hive partition or table. Once data is committed it becomes immediately visible to all Hive queries initiated subsequently. -This API is intended for streaming clients such as [Flume](http://flume.apache.org/) and [Storm](https://storm.incubator.apache.org/), which continuously generate data. Streaming support is built on top of ACID based insert/update support in Hive (see [Hive Transactions]({{< ref "hive-transactions" >}})). 
+This API is intended for streaming clients such as [Flume](http://flume.apache.org/) and [Storm](https://storm.incubator.apache.org/), which continuously generate data. Streaming support is built on top of ACID based insert/update support in Hive (see [Hive Transactions]({{% ref "hive-transactions" %}})). The Classes and interfaces part of the Hive streaming API are broadly categorized into two sets. The first set provides support for connection and transaction management while the second set provides I/O support. Transactions are managed by the metastore. Writes are performed directly to HDFS. @@ -23,7 +23,7 @@ Streaming to **unpartitioned** tables is also supported. The API supports Kerb ### Streaming Mutation API -Starting in release 2.0.0, Hive offers another API for mutating (insert/update/delete) records into transactional tables using Hive’s ACID feature. See [HCatalog Streaming Mutation API]({{< ref "hcatalog-streaming-mutation-api" >}}) for details and a comparison with the streaming data ingest API that is described in this document. +Starting in release 2.0.0, Hive offers another API for mutating (insert/update/delete) records into transactional tables using Hive’s ACID feature. See [HCatalog Streaming Mutation API]({{% ref "hcatalog-streaming-mutation-api" %}}) for details and a comparison with the streaming data ingest API that is described in this document. # Streaming Requirements @@ -34,9 +34,9 @@ A few things are required to use streaming. 2. **hive.compactor.initiator.on = true**(See more important details [here](/docs/latest/user/hive-transactions#new-configuration-parameters-for-transactions)) 3. **hive.compactor.cleaner.on = true** (From Hive 4.0.0 onwards. See more important details [here](/docs/latest/user/hive-transactions#new-configuration-parameters-for-transactions)) 4. **hive.compactor.worker.threads** > **0** -2. *“stored as orc”* must be specified during [table creation]({{< ref "#table-creation" >}}). 
Only [ORC storage format]({{< ref "languagemanual-orc" >}}) is supported currently. +2. *“stored as orc”* must be specified during [table creation]({{% ref "#table-creation" %}}). Only [ORC storage format]({{% ref "languagemanual-orc" %}}) is supported currently. 3. tblproperties("transactional"="true") must be set on the table during creation. -4. The Hive table must be [bucketed]({{< ref "languagemanual-ddl-bucketedtables" >}}), but not sorted. So something like “clustered by (colName) into *10* buckets” must be specified during table creation. The number of buckets is ideally the same as the number of streaming writers. +4. The Hive table must be [bucketed]({{% ref "languagemanual-ddl-bucketedtables" %}}), but not sorted. So something like “clustered by (colName) into *10* buckets” must be specified during table creation. The number of buckets is ideally the same as the number of streaming writers. 5. User of the client streaming process must have the necessary permissions to write to the table or partition and create partitions in the table. 6. (Temporary requirements) **When issuing queries** on streaming tables, the client needs to set 1. **hive.vectorized.execution.enabled**  to  **false** (for Hive version < 0.14.0) @@ -58,7 +58,7 @@ The class HiveEndPoint describes a Hive End Point to connect to. This describes It is very likely that in a setup where data is being streamed continuously the data is added into new partitions periodically. Either the Hive admin can pre-create the necessary partitions or the streaming clients can create them as needed. HiveEndPoint.newConnection() accepts a boolean argument to indicate whether the partition should be auto created. Partition creation being an atomic action, multiple clients can race to create the partition, but only one will succeed, so streaming clients do not have to synchronize when creating a partition. -Transactions are implemented slightly differently than traditional database systems. 
Each transaction has an id and multiple transactions are grouped into a “Transaction Batch”. This helps grouping records from multiple transactions into fewer files (rather than 1 file per transaction). After connection, a streaming client first requests a new batch of transactions. In response it receives a set of Transaction Ids that are part of the transaction batch. Subsequently the client proceeds to consume one transaction id at a time by initiating new Transactions. The client will write() one or more records per transaction and either commits or aborts the current transaction before switching to the next one. Each TransactionBatch.write() invocation automatically associates the I/O attempt with the current Txn ID. The user of the streaming client process, needs to have write permissions to the partition or table. Kerberos based authentication is required to acquire connections as a specific user. See [secure streaming example]({{< ref "#secure-streaming-example" >}}) below. +Transactions are implemented slightly differently than traditional database systems. Each transaction has an id and multiple transactions are grouped into a “Transaction Batch”. This helps grouping records from multiple transactions into fewer files (rather than 1 file per transaction). After connection, a streaming client first requests a new batch of transactions. In response it receives a set of Transaction Ids that are part of the transaction batch. Subsequently the client proceeds to consume one transaction id at a time by initiating new Transactions. The client will write() one or more records per transaction and either commits or aborts the current transaction before switching to the next one. Each TransactionBatch.write() invocation automatically associates the I/O attempt with the current Txn ID. The user of the streaming client process, needs to have write permissions to the partition or table. 
Kerberos based authentication is required to acquire connections as a specific user. See [secure streaming example]({{% ref "#secure-streaming-example" %}}) below.
 **Concurrency Note:** I/O can be performed on multiple TransactionBatches concurrently. However the transactions within a transaction batch must be consumed sequentially.
@@ -84,7 +84,7 @@ Generally, the more events are included in each transaction the more throughput
 ### Notes about the HiveConf Object
-HiveEndPoint.newConnection() accepts a HiveConf argument. This can either be set to null, or a pre-created HiveConf object can be provided. If this is null, a HiveConf object will be created internally and used for the connection. When a HiveConf object is instantiated, if the directory containing the hive-site.xml is part of the java classpath, then the HiveConf object will be initialized with values from it. If no hive-site.xml is found, then the object will be initialized with defaults. Pre-creating this object and reusing it across multiple connections may have a noticeable impact on performance if connections are being opened very frequently (for example several times a second). Secure connection relies on '[hive.metastore.kerberos.principal]({{< ref "#hive-metastore-kerberos-principal" >}})' being set correctly in the HiveConf object.
+HiveEndPoint.newConnection() accepts a HiveConf argument. This can either be set to null, or a pre-created HiveConf object can be provided. If this is null, a HiveConf object will be created internally and used for the connection. When a HiveConf object is instantiated, if the directory containing the hive-site.xml is part of the java classpath, then the HiveConf object will be initialized with values from it. If no hive-site.xml is found, then the object will be initialized with defaults. Pre-creating this object and reusing it across multiple connections may have a noticeable impact on performance if connections are being opened very frequently (for example several times a second). Secure connection relies on '[hive.metastore.kerberos.principal]({{% ref "#hive-metastore-kerberos-principal" %}})' being set correctly in the HiveConf object.
 Regardless of what values are set in hive-site.xml or custom HiveConf, the API will internally override some settings in it to ensure correct streaming behavior. The below is the list of settings that are overridden:
@@ -104,7 +104,7 @@ RecordWriter is the base interface implemented by all Writers. A Writer is respo
 A RecordWriter's primary functions are:
 1. Modify input record: This may involve dropping fields from input data if they don’t have corresponding table columns, adding nulls in case of missing fields for certain columns, and changing the order of incoming fields to match the order of fields in the table. This task requires understanding of incoming data format. Not all formats (for example JSON, which includes field names in the data) need this step.
-2. Encode modified record: The encoding involves serialization using an appropriate [Hive SerDe]({{< ref "serde" >}}).
+2. Encode modified record: The encoding involves serialization using an appropriate [Hive SerDe]({{% ref "serde" %}}).
 3. Identify the bucket to which the record belongs
 4. Write encoded record to Hive using the [AcidOutputFormat](https://hive.apache.org/javadocs/r1.2.1/api/org/apache/hadoop/hive/ql/io/AcidOutputFormat.html)'s record updater for the appropriate bucket.
diff --git a/content/docs/latest/user/tutorial.md b/content/docs/latest/user/tutorial.md
index e03d3488..72353270 100644
--- a/content/docs/latest/user/tutorial.md
+++ b/content/docs/latest/user/tutorial.md
@@ -19,9 +19,9 @@ Hive is not designed for online transaction processing.  It is best used for tr
 ## Getting Started
-For details on setting up Hive, HiveServer2, and Beeline, please refer to the [GettingStarted]({{< ref "gettingstarted-latest" >}}) guide.
+For details on setting up Hive, HiveServer2, and Beeline, please refer to the [GettingStarted]({{% ref "gettingstarted-latest" %}}) guide.
-[Books about Hive]({{< ref "books-about-hive" >}}) lists some books that may also be helpful for getting started with Hive.
+[Books about Hive]({{% ref "books-about-hive" %}}) lists some books that may also be helpful for getting started with Hive.
 In the following sections we provide a tutorial on the capabilities of the system. We start by describing the concepts of data types, tables, and partitions (which are very similar to what you would find in a traditional relational DBMS) and then illustrate the capabilities of Hive with the help of some examples.
@@ -43,7 +43,7 @@ Note that it is not necessary for tables to be partitioned or bucketed, but thes
 ## Type System
-Hive supports primitive and complex data types, as described below. See [Hive Data Types]({{< ref "languagemanual-types" >}}) for additional information.
+Hive supports primitive and complex data types, as described below. See [Hive Data Types]({{% ref "languagemanual-types" %}}) for additional information.
 ### Primitive Types
@@ -156,7 +156,7 @@ Other definitions:
 ## Built In Operators and Functions
-The operators and functions listed below are not necessarily up to date. ([Hive Operators and UDFs]({{< ref "languagemanual-udf" >}}) has more current information.) In [Beeline]({{< ref "#beeline" >}}) or the Hive [CLI]({{< ref "languagemanual-cli" >}}), use these commands to show the latest documentation:
+The operators and functions listed below are not necessarily up to date. ([Hive Operators and UDFs]({{% ref "languagemanual-udf" %}}) has more current information.)
In [Beeline]({{% ref "#beeline" %}}) or the Hive [CLI]({{% ref "languagemanual-cli" %}}), use these commands to show the latest documentation: ``` SHOW FUNCTIONS; @@ -264,7 +264,7 @@ All Hive keywords are case-insensitive, including the names of Hive operators an ## Language Capabilities -[Hive's SQL]({{< ref "languagemanual" >}}) provides the basic SQL operations. These operations work on tables or partitions. These operations are: +[Hive's SQL]({{% ref "languagemanual" %}}) provides the basic SQL operations. These operations work on tables or partitions. These operations are: * Ability to filter rows from a table using a WHERE clause. * Ability to select certain columns from the table using a SELECT clause. @@ -278,17 +278,17 @@ All Hive keywords are case-insensitive, including the names of Hive operators an # Usage and Examples -**NOTE: Many of the following examples are out of date.  More up to date information can be found in the [LanguageManual]({{< ref "languagemanual" >}}).** +**NOTE: Many of the following examples are out of date.  More up to date information can be found in the [LanguageManual]({{% ref "languagemanual" %}}).** The following examples highlight some salient features of the system. A detailed set of query test cases can be found at [Hive Query Test Cases](http://svn.apache.org/viewvc/hive/trunk/ql/src/test/queries/clientpositive/) and the corresponding results can be found at [Query Test Case Results](http://svn.apache.org/viewvc/hive/trunk/ql/src/test/results/clientpositive/). 
-* [Creating, Showing, Altering, and Dropping Tables]({{< ref "#creating-showing-altering-and-dropping-tables" >}})
-* [Loading Data]({{< ref "#loading-data" >}})
-* [Querying and Inserting Data]({{< ref "#querying-and-inserting-data" >}})
+* [Creating, Showing, Altering, and Dropping Tables]({{% ref "#creating-showing-altering-and-dropping-tables" %}})
+* [Loading Data]({{% ref "#loading-data" %}})
+* [Querying and Inserting Data]({{% ref "#querying-and-inserting-data" %}})
 ## Creating, Showing, Altering, and Dropping Tables
-See [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}) for detailed information about creating, showing, altering, and dropping tables.
+See [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}) for detailed information about creating, showing, altering, and dropping tables.
 ### Creating Tables
@@ -513,25 +513,25 @@ In the case that the input file /tmp/pv_2008-06-08_us.txt is very large, the use
 It is assumed that the array and map fields in the input.txt files are null fields for these examples.
-See [Hive Data Manipulation Language]({{< ref "languagemanual-dml" >}}) for more information about loading data into Hive tables, and see [External Tables]({{< ref "#external-tables" >}}) for another example of creating an external table.
+See [Hive Data Manipulation Language]({{% ref "languagemanual-dml" %}}) for more information about loading data into Hive tables, and see [External Tables]({{% ref "#external-tables" %}}) for another example of creating an external table.
 ## Querying and Inserting Data
-* [Simple Query]({{< ref "#simple-query" >}})
-* [Partition Based Query]({{< ref "#partition-based-query" >}})
-* [Joins]({{< ref "#joins" >}})
-* [Aggregations]({{< ref "#aggregations" >}})
-* [Multi Table/File Inserts]({{< ref "#multi-tablefile-inserts" >}})
-* [Dynamic-Partition Insert]({{< ref "#dynamic-partition-insert" >}})
-* [Inserting into Local Files]({{< ref "#inserting-into-local-files" >}})
-* [Sampling]({{< ref "#sampling" >}})
-* [Union All]({{< ref "#union-all" >}})
-* [Array Operations]({{< ref "#array-operations" >}})
-* [Map (Associative Arrays) Operations]({{< ref "#map-associative-arrays-operations" >}})
-* [Custom Map/Reduce Scripts]({{< ref "#custom-mapreduce-scripts" >}})
-* [Co-Groups]({{< ref "#co-groups" >}})
-
-The Hive query operations are documented in [Select]({{< ref "languagemanual-select" >}}), and the insert operations are documented in [Inserting data into Hive Tables from queries]({{< ref "#inserting-data-into-hive-tables-from-queries" >}}) and [Writing data into the filesystem from queries]({{< ref "#writing-data-into-the-filesystem-from-queries" >}}).
+* [Simple Query]({{% ref "#simple-query" %}}) +* [Partition Based Query]({{% ref "#partition-based-query" %}}) +* [Joins]({{% ref "#joins" %}}) +* [Aggregations]({{% ref "#aggregations" %}}) +* [Multi Table/File Inserts]({{% ref "#multi-tablefile-inserts" %}}) +* [Dynamic-Partition Insert]({{% ref "#dynamic-partition-insert" %}}) +* [Inserting into Local Files]({{% ref "#inserting-into-local-files" %}}) +* [Sampling]({{% ref "#sampling" %}}) +* [Union All]({{% ref "#union-all" %}}) +* [Array Operations]({{% ref "#array-operations" %}}) +* [Map (Associative Arrays) Operations]({{% ref "#map-associative-arrays-operations" %}}) +* [Custom Map/Reduce Scripts]({{% ref "#custom-mapreduce-scripts" %}}) +* [Co-Groups]({{% ref "#co-groups" %}}) + +The Hive query operations are documented in [Select]({{% ref "languagemanual-select" %}}), and the insert operations are documented in [Inserting data into Hive Tables from queries]({{% ref "#inserting-data-into-hive-tables-from-queries" %}}) and [Writing data into the filesystem from queries]({{% ref "#writing-data-into-the-filesystem-from-queries" %}}). ### Simple Query @@ -545,7 +545,7 @@ For all the active users, one can use the query of the following form: ``` -Note that unlike SQL, we always insert the results into a table. We will illustrate later how the user can inspect these results and even dump them to a local file. You can also run the following query in [Beeline]({{< ref "#beeline" >}}) or the Hive [CLI]({{< ref "languagemanual-cli" >}}): +Note that unlike SQL, we always insert the results into a table. We will illustrate later how the user can inspect these results and even dump them to a local file. 
You can also run the following query in [Beeline]({{% ref "#beeline" %}}) or the Hive [CLI]({{% ref "languagemanual-cli" %}}): ``` SELECT user.* @@ -680,7 +680,7 @@ In the previous examples, the user has to know which partition to insert into an In order to load data into all country partitions in a particular day, you have to add an insert statement for each country in the input data. This is very inconvenient since you have to have the priori knowledge of the list of countries exist in the input data and create the partitions beforehand. If the list changed for another day, you have to modify your insert DML as well as the partition creation DDLs. It is also inefficient since each insert statement may be turned into a MapReduce Job. -*[Dynamic-partition insert]({{< ref "#dynamic-partition-insert" >}})* (or multi-partition insert) is designed to solve this problem by dynamically determining which partitions should be created and populated while scanning the input table. This is a newly added feature that is only available from version 0.6.0. In the dynamic partition insert, the input column values are evaluated to determine which partition this row should be inserted into. If that partition has not been created, it will create that partition automatically. Using this feature you need only one insert statement to create and populate all necessary partitions. In addition, since there is only one insert statement, there is only one corresponding MapReduce job. This significantly improves performance and reduce the Hadoop cluster workload comparing to the multiple insert case. +*[Dynamic-partition insert]({{% ref "#dynamic-partition-insert" %}})* (or multi-partition insert) is designed to solve this problem by dynamically determining which partitions should be created and populated while scanning the input table. This is a newly added feature that is only available from version 0.6.0. 
In the dynamic partition insert, the input column values are evaluated to determine which partition this row should be inserted into. If that partition has not been created, it will create that partition automatically. Using this feature you need only one insert statement to create and populate all necessary partitions. In addition, since there is only one insert statement, there is only one corresponding MapReduce job. This significantly improves performance and reduce the Hadoop cluster workload comparing to the multiple insert case.
 Below is an example of loading data to all country partitions using one insert statement:
@@ -743,13 +743,13 @@ This query will generate a MapReduce job rather than Map-only job. The SELECT-cl
 Additional documentation:
-* [Design Document for Dynamic Partitions]({{< ref "dynamicpartitions" >}})
+* [Design Document for Dynamic Partitions]({{% ref "dynamicpartitions" %}})
  + [Original design doc](https://issues.apache.org/jira/secure/attachment/12437909/dp_design.txt)
  + [HIVE-936](https://issues.apache.org/jira/browse/HIVE-936)
-* [Hive DML: Dynamic Partition Inserts]({{< ref "#hive-dml:-dynamic-partition-inserts" >}})
-* [HCatalog Dynamic Partitioning]({{< ref "hcatalog-dynamicpartitions" >}})
- + [Usage with Pig]({{< ref "#usage-with-pig" >}})
- + [Usage from MapReduce]({{< ref "#usage-from-mapreduce" >}})
+* [Hive DML: Dynamic Partition Inserts]({{% ref "#hive-dml:-dynamic-partition-inserts" %}})
+* [HCatalog Dynamic Partitioning]({{% ref "hcatalog-dynamicpartitions" %}})
+ + [Usage with Pig]({{% ref "#usage-with-pig" %}})
+ + [Usage from MapReduce]({{% ref "#usage-from-mapreduce" %}})
 ### Inserting into Local Files
diff --git a/content/docs/latest/webhcat/webhcat-base.md b/content/docs/latest/webhcat/webhcat-base.md
index e0815f4d..ab51aade 100644
--- a/content/docs/latest/webhcat/webhcat-base.md
+++ b/content/docs/latest/webhcat/webhcat-base.md
@@ -7,15 +7,15 @@ date: 2024-12-12
 This is the manual for WebHCat, previously known as Templeton. WebHCat is the REST API for HCatalog, a table and storage management layer for Hadoop.
-* [Using WebHCat]({{< ref "webhcat-usingwebhcat" >}})
-* [Installation]({{< ref "webhcat-installwebhcat" >}})
-* [Configuration]({{< ref "webhcat-configure" >}})
-* [Reference]({{< ref "webhcat-reference" >}})
+* [Using WebHCat]({{% ref "webhcat-usingwebhcat" %}})
+* [Installation]({{% ref "webhcat-installwebhcat" %}})
+* [Configuration]({{% ref "webhcat-configure" %}})
+* [Reference]({{% ref "webhcat-reference" %}})
-See the [HCatalog Manual]({{< ref "hcatalog-base" >}}) for general HCatalog documentation.
+See the [HCatalog Manual]({{% ref "hcatalog-base" %}}) for general HCatalog documentation.
 **Navigation Links**
-Next: [Using WebHCat]({{< ref "webhcat-usingwebhcat" >}})
+Next: [Using WebHCat]({{% ref "webhcat-usingwebhcat" %}})
diff --git a/content/docs/latest/webhcat/webhcat-configure.md b/content/docs/latest/webhcat/webhcat-configure.md
index 831b68e8..3d7c38e5 100644
--- a/content/docs/latest/webhcat/webhcat-configure.md
+++ b/content/docs/latest/webhcat/webhcat-configure.md
@@ -96,10 +96,10 @@ Default values prior to Hive 0.11 are listed in the HCatalog 0.5.0 documentation
 **Navigation Links**
-Previous: [Installation]({{< ref "webhcat-installwebhcat" >}})
- Next: [Reference]({{< ref "webhcat-reference" >}})
+Previous: [Installation]({{% ref "webhcat-installwebhcat" %}})
+ Next: [Reference]({{% ref "webhcat-reference" %}})
-Hive configuration: [Configuring Hive]({{< ref "adminmanual-configuration" >}}), [Hive Configuration Properties](/docs/latest/user/configuration-properties), [Thrift Server Setup]({{< ref "#thrift-server-setup" >}})
+Hive configuration: [Configuring Hive]({{% ref "adminmanual-configuration" %}}), [Hive Configuration Properties](/docs/latest/user/configuration-properties), [Thrift Server Setup]({{% ref "#thrift-server-setup" %}})
diff --git a/content/docs/latest/webhcat/webhcat-installwebhcat.md b/content/docs/latest/webhcat/webhcat-installwebhcat.md
index 3053ddca..6e9c269b 100644
--- a/content/docs/latest/webhcat/webhcat-installwebhcat.md
+++ b/content/docs/latest/webhcat/webhcat-installwebhcat.md
@@ -11,17 +11,17 @@ WebHCat and HCatalog are installed with Hive, starting with Hive release 0.11.0.
 If you install Hive from the binary tarball, the WebHCat server command `webhcat_server.sh` is in the `hcatalog/sbin` directory.
-Hive installation is documented [here]({{< ref "adminmanual-installation" >}}).
+Hive installation is documented [here]({{% ref "adminmanual-installation" %}}).
 ## WebHCat Installation Procedure
 **Note:** WebHCat was originally called Templeton. For backward compatibility the name still appears in URLs, log file names, variable names, etc.
-1. Ensure that the [required related installations]({{< ref "#required-related-installations" >}}) are in place, and place required files into the [Hadoop distributed cache]({{< ref "#hadoop-distributed-cache" >}}).
+1. Ensure that the [required related installations]({{% ref "#required-related-installations" %}}) are in place, and place required files into the [Hadoop distributed cache]({{% ref "#hadoop-distributed-cache" %}}).
 2. Download and unpack the HCatalog distribution.
 3. Set the `TEMPLETON_HOME` environment variable to the base of the HCatalog REST server installation. This will usually be same as `HCATALOG_HOME`. This is used to find the WebHCat (Templeton) configuration.
 4. Set `JAVA_HOME`, `HADOOP_PREFIX`, and `HIVE_HOME` environment variables.
-5. Review the [configuration]({{< ref "webhcat-configure" >}}) and update or create `webhcat-site.xml` as required. Ensure that site-specific component installation locations are accurate, especially the Hadoop configuration path. Configuration variables that use a filesystem path try to have reasonable defaults, but it's always safe to specify a full and complete path.
+5. Review the [configuration]({{% ref "webhcat-configure" %}}) and update or create `webhcat-site.xml` as required. Ensure that site-specific component installation locations are accurate, especially the Hadoop configuration path. Configuration variables that use a filesystem path try to have reasonable defaults, but it's always safe to specify a full and complete path.
 6. Verify that HCatalog is installed and that the `hcat` executable is in the `PATH`.
 7. Build HCatalog using the command `ant jar` from the top level HCatalog directory.
 8. Start the REST server with the command "`hcatalog/sbin/webhcat_server.sh start`" for Hive 0.11.0 releases and later, or "`sbin/webhcat_server.sh start`" for installations prior to HCatalog merging with Hive.
@@ -51,11 +51,11 @@ Server: Jetty(7.6.0.v20120127)
 * [Ant](http://ant.apache.org/), version 1.8 or higher
 * [Hadoop](http://hadoop.apache.org/), version 1.0.3 or higher
-* [ZooKeeper](http://zookeeper.apache.org/) is required if you are using the ZooKeeper storage class. (Be sure to review and update the ZooKeeper-related [WebHCat configuration]({{< ref "#webhcat-configuration" >}}).)
-* HCatalog, version 0.5.0 or higher. The `hcat` executable must be both in the `PATH` and properly configured in the [WebHCat configuration]({{< ref "#webhcat-configuration" >}}).
+* [ZooKeeper](http://zookeeper.apache.org/) is required if you are using the ZooKeeper storage class. (Be sure to review and update the ZooKeeper-related [WebHCat configuration]({{% ref "#webhcat-configuration" %}}).)
+* HCatalog, version 0.5.0 or higher. The `hcat` executable must be both in the `PATH` and properly configured in the [WebHCat configuration]({{% ref "#webhcat-configuration" %}}).
 * Permissions must be given to the user running the server. (See below.)
 * If running a secure cluster, Kerberos keys and principals must be created. (See below.)
-* [Hadoop Distributed Cache]({{< ref "#hadoop-distributed-cache" >}}).
To use [Hive](http://hive.apache.org/), [Pig](http://pig.apache.org/), or [Hadoop Streaming](http://hadoop.apache.org/docs/stable/streaming.html) resources, see instructions below for placing the required files in the Hadoop Distributed Cache. +* [Hadoop Distributed Cache]({{% ref "#hadoop-distributed-cache" %}}). To use [Hive](http://hive.apache.org/), [Pig](http://pig.apache.org/), or [Hadoop Streaming](http://hadoop.apache.org/docs/stable/streaming.html) resources, see instructions below for placing the required files in the Hadoop Distributed Cache. ## Hadoop Distributed Cache @@ -101,7 +101,7 @@ hadoop fs -put ugi.jar /apps/templeton/ugi.jar ``` -The location of these files in the cache, and the location of the installations inside the archives, can be specified using the following WebHCat configuration variables. (See the [Configuration]({{< ref "webhcat-configure" >}}) documentation for more information on changing WebHCat configuration parameters.) Some default values vary depending on release number; defaults shown below are for the version of WebHCat that is included in Hive release 0.11.0. Defaults for the previous release are shown in the [HCatalog 0.5.0 documentation](http://hive.apache.org/docs/hcat_r0.5.0/rest_server_install.html#Hadoop+Distributed+Cache). +The location of these files in the cache, and the location of the installations inside the archives, can be specified using the following WebHCat configuration variables. (See the [Configuration]({{% ref "webhcat-configure" %}}) documentation for more information on changing WebHCat configuration parameters.) Some default values vary depending on release number; defaults shown below are for the version of WebHCat that is included in Hive release 0.11.0. Defaults for the previous release are shown in the [HCatalog 0.5.0 documentation](http://hive.apache.org/docs/hcat_r0.5.0/rest_server_install.html#Hadoop+Distributed+Cache). 
 | Name | Default (Hive 0.11.0) | Description |
 | --- | --- | --- |
@@ -116,7 +116,7 @@ The location of these files in the cache, and the location of the installations
 Permission must be given for the user running the WebHCat executable to run jobs for other users. That is, the WebHCat server will impersonate users on the Hadoop cluster.
-Create (or assign) a Unix user who will run the WebHCat server. Call this USER. See the [Secure Cluster]({{< ref "#secure-cluster" >}}) section below for choosing a user on a Kerberos cluster.
+Create (or assign) a Unix user who will run the WebHCat server. Call this USER. See the [Secure Cluster]({{% ref "#secure-cluster" %}}) section below for choosing a user on a Kerberos cluster.
 Modify the Hadoop core-site.xml file and set these properties:
@@ -127,7 +127,7 @@ Modify the Hadoop core-site.xml file and set these properties:
 ## Secure Cluster
-To run WebHCat on a secure cluster follow the [Permissions]({{< ref "#permissions" >}}) instructions above but create a Kerberos principal for the WebHCat server with the name `USER/host@realm`.
+To run WebHCat on a secure cluster follow the [Permissions]({{% ref "#permissions" %}}) instructions above but create a Kerberos principal for the WebHCat server with the name `USER/host@realm`.
 Also, set the WebHCat configuration variables `templeton.kerberos.principal` and `templeton.kerberos.keytab`.
@@ -163,11 +163,11 @@ In core-site.xml, make sure the following are also set:
 | hadoop.proxyuser.hcat.hosts | A comma-separated list of the hosts which are allowed to submit requests by 'hcat'. |
 **Navigation Links**
-Previous: [Using WebHCat]({{< ref "webhcat-usingwebhcat" >}})
- Next: [Configuration]({{< ref "webhcat-configure" >}})
+Previous: [Using WebHCat]({{% ref "webhcat-usingwebhcat" %}})
+ Next: [Configuration]({{% ref "webhcat-configure" %}})
-Hive installation: [Installing Hive]({{< ref "adminmanual-installation" >}})
- HCatalog installation: [Installation from Tarball]({{< ref "hcatalog-installhcat" >}})
+Hive installation: [Installing Hive]({{% ref "adminmanual-installation" %}})
+ HCatalog installation: [Installation from Tarball]({{% ref "hcatalog-installhcat" %}})
diff --git a/content/docs/latest/webhcat/webhcat-reference-allddl.md b/content/docs/latest/webhcat/webhcat-reference-allddl.md
index 6d201139..121b275a 100644
--- a/content/docs/latest/webhcat/webhcat-reference-allddl.md
+++ b/content/docs/latest/webhcat/webhcat-reference-allddl.md
@@ -7,41 +7,41 @@ date: 2024-12-12
 # WebHCat Reference: DDL Resources
-This is an overview page for the WebHCat DDL resources. The full list of WebHCat resources is on [this overview page]({{< ref "webhcat-reference" >}}).
+This is an overview page for the WebHCat DDL resources. The full list of WebHCat resources is on [this overview page]({{% ref "webhcat-reference" %}}).
-* For information about HCatalog DDL commands, see [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}).
-* For information about Hive DDL commands, see [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}).
+* For information about HCatalog DDL commands, see [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}).
+* For information about Hive DDL commands, see [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}).
 | Object | Resource (Type) | Description |
 | --- | --- | --- |
-| DDL Command | [ddl (POST)]({{< ref "webhcat-reference-ddl" >}}) | Perform an HCatalog DDL command. |
-| Database | [ddl/database (GET)]({{< ref "webhcat-reference-getdbs" >}}) | List HCatalog databases. |
-|   | [ddl/database/:db (GET)]({{< ref "webhcat-reference-getdb" >}}) | Describe an HCatalog database. |
-|   | [ddl/database/:db (PUT)]({{< ref "webhcat-reference-putdb" >}}) | Create an HCatalog database. |
-|   | [ddl/database/:db (DELETE)]({{< ref "webhcat-reference-deletedb" >}}) | Delete (drop) an HCatalog database. |
-| Table | [ddl/database/:db/table (GET)]({{< ref "webhcat-reference-gettables" >}}) | List the tables in an HCatalog database. |
-|   | [ddl/database/:db/table/:table (GET)]({{< ref "webhcat-reference-gettable" >}}) | Describe an HCatalog table. |
-|   | [ddl/database/:db/table/:table (PUT)]({{< ref "webhcat-reference-puttable" >}}) | Create a new HCatalog table. |
-|   | [ddl/database/:db/table/:table (POST)]({{< ref "webhcat-reference-posttable" >}}) | Rename an HCatalog table. |
-|   | [ddl/database/:db/table/:table (DELETE)]({{< ref "webhcat-reference-deletetable" >}}) | Delete (drop) an HCatalog table. |
-|   | [ddl/database/:db/table/:existingtable/like/:newtable (PUT)]({{< ref "webhcat-reference-puttablelike" >}}) | Create a new HCatalog table like an existing one. |
-| Partition | [ddl/database/:db/table/:table/partition (GET)]({{< ref "webhcat-reference-getpartitions" >}}) | List all partitions in an HCatalog table. |
-|   | [ddl/database/:db/table/:table/partition/:partition (GET)]({{< ref "webhcat-reference-getpartition" >}}) | Describe a single partition in an HCatalog table. |
-|   | [ddl/database/:db/table/:table/partition/:partition (PUT)]({{< ref "webhcat-reference-putpartition" >}}) | Create a partition in an HCatalog table. |
-|   | [ddl/database/:db/table/:table/partition/:partition (DELETE)]({{< ref "webhcat-reference-deletepartition" >}}) | Delete (drop) a partition in an HCatalog table. |
-| Column | [ddl/database/:db/table/:table/column (GET)]({{< ref "webhcat-reference-getcolumns" >}}) | List the columns in an HCatalog table. |
-|   | [ddl/database/:db/table/:table/column/:column (GET)]({{< ref "webhcat-reference-getcolumn" >}}) | Describe a single column in an HCatalog table. |
-|   | [ddl/database/:db/table/:table/column/:column (PUT)]({{< ref "webhcat-reference-putcolumn" >}}) | Create a column in an HCatalog table. |
-| Property | [ddl/database/:db/table/:table/property (GET)]({{< ref "webhcat-reference-getproperties" >}}) | List table properties. |
-|   | [ddl/database/:db/table/:table/property/:property (GET)]({{< ref "webhcat-reference-getproperty" >}}) | Return the value of a single table property. |
-|   | [ddl/database/:db/table/:table/property/:property (PUT)]({{< ref "webhcat-reference-putproperty" >}}) | Set a table property. |
+| DDL Command | [ddl (POST)]({{% ref "webhcat-reference-ddl" %}}) | Perform an HCatalog DDL command. |
+| Database | [ddl/database (GET)]({{% ref "webhcat-reference-getdbs" %}}) | List HCatalog databases. |
+|   | [ddl/database/:db (GET)]({{% ref "webhcat-reference-getdb" %}}) | Describe an HCatalog database. |
+|   | [ddl/database/:db (PUT)]({{% ref "webhcat-reference-putdb" %}}) | Create an HCatalog database. |
+|   | [ddl/database/:db (DELETE)]({{% ref "webhcat-reference-deletedb" %}}) | Delete (drop) an HCatalog database. |
+| Table | [ddl/database/:db/table (GET)]({{% ref "webhcat-reference-gettables" %}}) | List the tables in an HCatalog database. |
+|   | [ddl/database/:db/table/:table (GET)]({{% ref "webhcat-reference-gettable" %}}) | Describe an HCatalog table. |
+|   | [ddl/database/:db/table/:table (PUT)]({{% ref "webhcat-reference-puttable" %}}) | Create a new HCatalog table. |
+|   | [ddl/database/:db/table/:table (POST)]({{% ref "webhcat-reference-posttable" %}}) | Rename an HCatalog table. |
+|   | [ddl/database/:db/table/:table (DELETE)]({{% ref "webhcat-reference-deletetable" %}}) | Delete (drop) an HCatalog table. |
+|   | [ddl/database/:db/table/:existingtable/like/:newtable (PUT)]({{% ref "webhcat-reference-puttablelike" %}}) | Create a new HCatalog table like an existing one. |
+| Partition | [ddl/database/:db/table/:table/partition (GET)]({{% ref "webhcat-reference-getpartitions" %}}) | List all partitions in an HCatalog table. |
+|   | [ddl/database/:db/table/:table/partition/:partition (GET)]({{% ref "webhcat-reference-getpartition" %}}) | Describe a single partition in an HCatalog table. |
+|   | [ddl/database/:db/table/:table/partition/:partition (PUT)]({{% ref "webhcat-reference-putpartition" %}}) | Create a partition in an HCatalog table. |
+|   | [ddl/database/:db/table/:table/partition/:partition (DELETE)]({{% ref "webhcat-reference-deletepartition" %}}) | Delete (drop) a partition in an HCatalog table. |
+| Column | [ddl/database/:db/table/:table/column (GET)]({{% ref "webhcat-reference-getcolumns" %}}) | List the columns in an HCatalog table. |
+|   | [ddl/database/:db/table/:table/column/:column (GET)]({{% ref "webhcat-reference-getcolumn" %}}) | Describe a single column in an HCatalog table. |
+|   | [ddl/database/:db/table/:table/column/:column (PUT)]({{% ref "webhcat-reference-putcolumn" %}}) | Create a column in an HCatalog table. |
+| Property | [ddl/database/:db/table/:table/property (GET)]({{% ref "webhcat-reference-getproperties" %}}) | List table properties. |
+|   | [ddl/database/:db/table/:table/property/:property (GET)]({{% ref "webhcat-reference-getproperty" %}}) | Return the value of a single table property. |
+|   | [ddl/database/:db/table/:table/property/:property (PUT)]({{% ref "webhcat-reference-putproperty" %}}) | Set a table property. |
 **Navigation Links**
-Previous: [GET version]({{< ref "webhcat-reference-version" >}}) Next: [POST ddl]({{< ref "webhcat-reference-ddl" >}})
+Previous: [GET version]({{% ref "webhcat-reference-version" %}}) Next: [POST ddl]({{% ref "webhcat-reference-ddl" %}})
-HCatalog DDL commands: [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}) Hive DDL commands: [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}})
+HCatalog DDL commands: [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}) Hive DDL commands: [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}})
diff --git a/content/docs/latest/webhcat/webhcat-reference-ddl.md b/content/docs/latest/webhcat/webhcat-reference-ddl.md
index ccdaa406..7e3853ea 100644
--- a/content/docs/latest/webhcat/webhcat-reference-ddl.md
+++ b/content/docs/latest/webhcat/webhcat-reference-ddl.md
@@ -7,7 +7,7 @@ date: 2024-12-12
 ## Description
-Performs an [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}) command. The command is executed immediately upon request. Responses are limited to 1 MB. For requests which may return longer results consider using the [Hive resource]({{< ref "webhcat-reference-hive" >}}) as an alternative.
+Performs an [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}) command. The command is executed immediately upon request. Responses are limited to 1 MB. For requests which may return longer results consider using the [Hive resource]({{% ref "webhcat-reference-hive" %}}) as an alternative.
 ## URL
@@ -21,7 +21,7 @@ Performs an [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}) command. The command is
 | **group** | The user group to use when creating a table | Optional | None |
 | **permissions** | The permissions string to use when creating a table. The format is "`rwxrw-r-x`". | Optional | None |
-The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported.
+The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported.
## Results @@ -83,8 +83,8 @@ In [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6576) onward, user.n ``` **Navigation Links** -Previous: [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop" >}}) - Next: [GET ddl/database]({{< ref "webhcat-reference-getdbs" >}}) +Previous: [GET version/hadoop]({{% ref "webhcat-reference-versionhadoop" %}}) + Next: [GET ddl/database]({{% ref "webhcat-reference-getdbs" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-deletedb.md b/content/docs/latest/webhcat/webhcat-reference-deletedb.md index 4e26da7c..2d575ffb 100644 --- a/content/docs/latest/webhcat/webhcat-reference-deletedb.md +++ b/content/docs/latest/webhcat/webhcat-reference-deletedb.md @@ -23,7 +23,7 @@ Delete a database. | **group** | The user group to use | Optional | None | | **permissions** | The permissions string to use. The format is "`rwxrw-r-x`". | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -67,7 +67,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [PUT ddl/database/:db]({{< ref "webhcat-reference-putdb" >}}) Next: [GET ddl/database/:db/table]({{< ref "webhcat-reference-gettables" >}}) +Previous: [PUT ddl/database/:db]({{% ref "webhcat-reference-putdb" %}}) Next: [GET ddl/database/:db/table]({{% ref "webhcat-reference-gettables" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-deletejob.md b/content/docs/latest/webhcat/webhcat-reference-deletejob.md index 1013038a..2860880c 100644 --- a/content/docs/latest/webhcat/webhcat-reference-deletejob.md +++ b/content/docs/latest/webhcat/webhcat-reference-deletejob.md @@ -12,12 +12,12 @@ Kill a job given its job ID. 
Substitute ":jobid" with the job ID received when t Version: Deprecated in 0.12.0 `DELETE queue/:jobid` is deprecated starting in Hive release 0.12.0. Users are encouraged to use `DELETE jobs/:jobid` instead. (See [HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443).) -`DELETE queue/:jobid` is equivalent to `DELETE jobs/:jobid` – check `[DELETE jobs/:jobid]({{< ref "webhcat-reference-deletejobid" >}})` for documentation. +`DELETE queue/:jobid` is equivalent to `DELETE jobs/:jobid` – check `[DELETE jobs/:jobid]({{% ref "webhcat-reference-deletejobid" %}})` for documentation. Version: Obsolete in 0.14.0 `DELETE queue/:jobid` will be removed in Hive release 0.14.0. (See [HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432).) -Use `[DELETE jobs/:jobid]({{< ref "webhcat-reference-deletejobid" >}})` instead. +Use `[DELETE jobs/:jobid]({{% ref "webhcat-reference-deletejobid" %}})` instead. ## URL @@ -29,7 +29,7 @@ Use `[DELETE jobs/:jobid]({{< ref "webhcat-reference-deletejobid" >}})` instead. | --- | --- | --- | --- | | **:jobid** | The job ID to delete. This is the ID received when the job was created. | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -99,15 +99,15 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported Note -The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use [GET queue/:jobid]({{< ref "webhcat-reference-jobinfo" >}}) to monitor the job and confirm that it is eventually deleted. +The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use [GET queue/:jobid]({{% ref "webhcat-reference-jobinfo" %}}) to monitor the job and confirm that it is eventually deleted. 
**Navigation Links** -Previous: [GET queue/:jobid]({{< ref "webhcat-reference-jobinfo" >}}) - Next: [GET jobs]({{< ref "webhcat-reference-jobs" >}}) +Previous: [GET queue/:jobid]({{% ref "webhcat-reference-jobinfo" %}}) + Next: [GET jobs]({{% ref "webhcat-reference-jobs" %}}) -Replaced in Hive 0.12.0 by: [DELETE jobs/:jobid]({{< ref "webhcat-reference-deletejobid" >}}) +Replaced in Hive 0.12.0 by: [DELETE jobs/:jobid]({{% ref "webhcat-reference-deletejobid" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-deletejobid.md b/content/docs/latest/webhcat/webhcat-reference-deletejobid.md index 7bf181b6..fdfc47e5 100644 --- a/content/docs/latest/webhcat/webhcat-reference-deletejobid.md +++ b/content/docs/latest/webhcat/webhcat-reference-deletejobid.md @@ -11,7 +11,7 @@ Kill a job given its job ID. Substitute ":jobid" with the job ID received when t Version: Hive 0.12.0 and later -`DELETE jobs/:jobid` is introduced in Hive release 0.12.0. It is equivalent to `[DELETE queue/:jobid]({{< ref "webhcat-reference-deletejob" >}})` in prior releases. +`DELETE jobs/:jobid` is introduced in Hive release 0.12.0. It is equivalent to `[DELETE queue/:jobid]({{% ref "webhcat-reference-deletejob" %}})` in prior releases. `DELETE queue/:jobid` is now deprecated ([HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443)) and will be removed in Hive 0.14.0 ([HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432)). ## URL @@ -24,7 +24,7 @@ Version: Hive 0.12.0 and later | --- | --- | --- | --- | | **:jobid** | The job ID to delete. This is the ID received when the job was created. | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -94,14 +94,14 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported Note -The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use [GET jobs/:jobid]({{< ref "webhcat-reference-job" >}}) to monitor the job and confirm that it is eventually deleted. +The job is not immediately deleted, therefore the information returned may not reflect deletion, as in our example. Use [GET jobs/:jobid]({{% ref "webhcat-reference-job" %}}) to monitor the job and confirm that it is eventually deleted. **Navigation Links** -Previous: [GET jobs/:jobid]({{< ref "webhcat-reference-job" >}}) +Previous: [GET jobs/:jobid]({{% ref "webhcat-reference-job" %}}) -Replaces deprecated resource: [DELETE queue/:jobid]({{< ref "webhcat-reference-deletejob" >}}) +Replaces deprecated resource: [DELETE queue/:jobid]({{% ref "webhcat-reference-deletejob" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-deletepartition.md b/content/docs/latest/webhcat/webhcat-reference-deletepartition.md index 1971a7d0..5e9e0ced 100644 --- a/content/docs/latest/webhcat/webhcat-reference-deletepartition.md +++ b/content/docs/latest/webhcat/webhcat-reference-deletepartition.md @@ -24,7 +24,7 @@ Delete (drop) a partition in an HCatalog table. | **group** | The user group to use | Optional | None | | **permissions** | The permissions string to use. The format is "`rwxrw-r-x`". | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -58,7 +58,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [PUT ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-putpartition" >}}) Next: [GET ddl/database/:db/table/:table/column]({{< ref "webhcat-reference-getcolumns" >}}) +Previous: [PUT ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-putpartition" %}}) Next: [GET ddl/database/:db/table/:table/column]({{% ref "webhcat-reference-getcolumns" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-deletetable.md b/content/docs/latest/webhcat/webhcat-reference-deletetable.md index b1005049..27c318f9 100644 --- a/content/docs/latest/webhcat/webhcat-reference-deletetable.md +++ b/content/docs/latest/webhcat/webhcat-reference-deletetable.md @@ -23,7 +23,7 @@ Delete (drop) an HCatalog table. | **group** | The user group to use | Optional | None | | **permissions** | The permissions string to use. The format is "`rwxrw-r-x`". | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -54,7 +54,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [POST ddl/database/:db/table/:table]({{< ref "webhcat-reference-posttable" >}}) Next: [PUT ddl/database/:db/table/:existingtable/like/:newtable]({{< ref "webhcat-reference-puttablelike" >}}) +Previous: [POST ddl/database/:db/table/:table]({{% ref "webhcat-reference-posttable" %}}) Next: [PUT ddl/database/:db/table/:existingtable/like/:newtable]({{% ref "webhcat-reference-puttablelike" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getcolumn.md b/content/docs/latest/webhcat/webhcat-reference-getcolumn.md index 8c9a5f3e..3aab9ca4 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getcolumn.md +++ b/content/docs/latest/webhcat/webhcat-reference-getcolumn.md @@ -21,7 +21,7 @@ Describe a single column in an HCatalog table. | **:table** | The table name | Required | None | | **:column** | The column name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -58,7 +58,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/column]({{< ref "webhcat-reference-getcolumns" >}}) Next: [PUT ddl/database/:db/table/:table/column/:column]({{< ref "webhcat-reference-putcolumn" >}}) +Previous: [GET ddl/database/:db/table/:table/column]({{% ref "webhcat-reference-getcolumns" %}}) Next: [PUT ddl/database/:db/table/:table/column/:column]({{% ref "webhcat-reference-putcolumn" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getcolumns.md b/content/docs/latest/webhcat/webhcat-reference-getcolumns.md index 918ecc13..45d0b573 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getcolumns.md +++ b/content/docs/latest/webhcat/webhcat-reference-getcolumns.md @@ -20,7 +20,7 @@ List the columns in an HCatalog table. | **:db** | The database name | Required | None | | **:table** | The table name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -71,7 +71,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [DELETE ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-deletepartition" >}}) Next: [GET ddl/database/:db/table/:table/column/:column]({{< ref "webhcat-reference-getcolumn" >}}) +Previous: [DELETE ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-deletepartition" %}}) Next: [GET ddl/database/:db/table/:table/column/:column]({{% ref "webhcat-reference-getcolumn" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getdb.md b/content/docs/latest/webhcat/webhcat-reference-getdb.md index 6b114582..5665a260 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getdb.md +++ b/content/docs/latest/webhcat/webhcat-reference-getdb.md @@ -19,7 +19,7 @@ Describe a database. (Note: This resource has a "format=extended" parameter howe | --- | --- | --- | --- | | **:db** | The database name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -64,7 +64,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database]({{< ref "webhcat-reference-getdbs" >}}) Next: [PUT ddl/database/:db]({{< ref "webhcat-reference-putdb" >}}) +Previous: [GET ddl/database]({{% ref "webhcat-reference-getdbs" %}}) Next: [PUT ddl/database/:db]({{% ref "webhcat-reference-putdb" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getdbs.md b/content/docs/latest/webhcat/webhcat-reference-getdbs.md index daa3d35f..75710c2d 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getdbs.md +++ b/content/docs/latest/webhcat/webhcat-reference-getdbs.md @@ -19,7 +19,7 @@ List the databases in HCatalog. 
| --- | --- | --- | --- | | **like** | List only databases whose names match the specified pattern. | Optional | "*" (List all) | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -51,8 +51,8 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [POST ddl]({{< ref "webhcat-reference-ddl" >}}) - Next: [GET ddl/database/:db]({{< ref "webhcat-reference-getdb" >}}) +Previous: [POST ddl]({{% ref "webhcat-reference-ddl" %}}) + Next: [GET ddl/database/:db]({{% ref "webhcat-reference-getdb" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getpartition.md b/content/docs/latest/webhcat/webhcat-reference-getpartition.md index c4f54674..24492b70 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getpartition.md +++ b/content/docs/latest/webhcat/webhcat-reference-getpartition.md @@ -21,7 +21,7 @@ Describe a single partition in an HCatalog table. | **:table** | The table name | Required | None | | **:partition** | The partition name, col_name='value' list. Be careful to properly encode the quote for http, for example, country=%27algeria%27. | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -88,7 +88,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/partition]({{< ref "webhcat-reference-getpartitions" >}}) Next: [PUT ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-putpartition" >}}) +Previous: [GET ddl/database/:db/table/:table/partition]({{% ref "webhcat-reference-getpartitions" %}}) Next: [PUT ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-putpartition" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getpartitions.md b/content/docs/latest/webhcat/webhcat-reference-getpartitions.md index d517aed4..1c660e9d 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getpartitions.md +++ b/content/docs/latest/webhcat/webhcat-reference-getpartitions.md @@ -20,7 +20,7 @@ List all the partitions in an HCatalog table. | **:db** | The database name | Required | None | | **:table** | The table name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -67,7 +67,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [PUT ddl/database/:db/table/:existingtable/like/:newtable]({{< ref "webhcat-reference-puttablelike" >}}) Next: [GET ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-getpartition" >}}) +Previous: [PUT ddl/database/:db/table/:existingtable/like/:newtable]({{% ref "webhcat-reference-puttablelike" %}}) Next: [GET ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-getpartition" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getproperties.md b/content/docs/latest/webhcat/webhcat-reference-getproperties.md index 720b0b71..91114c9f 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getproperties.md +++ b/content/docs/latest/webhcat/webhcat-reference-getproperties.md @@ -20,7 +20,7 @@ List all the properties of an HCatalog table. | **:db** | The database name | Required | None | | **:table** | The table name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -63,7 +63,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [PUT ddl/database/:db/table/:table/column/:column]({{< ref "webhcat-reference-putcolumn" >}}) Next: [GET ddl/database/:db/table/:table/property/:property]({{< ref "webhcat-reference-getproperty" >}}) +Previous: [PUT ddl/database/:db/table/:table/column/:column]({{% ref "webhcat-reference-putcolumn" %}}) Next: [GET ddl/database/:db/table/:table/property/:property]({{% ref "webhcat-reference-getproperty" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-getproperty.md b/content/docs/latest/webhcat/webhcat-reference-getproperty.md index 82a67783..d01d1ed8 100644 --- a/content/docs/latest/webhcat/webhcat-reference-getproperty.md +++ b/content/docs/latest/webhcat/webhcat-reference-getproperty.md @@ -21,7 +21,7 @@ Return the value of a single table property. | **:table** | The table name | Required | None | | **:property** | The property name | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -68,7 +68,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/property]({{< ref "webhcat-reference-getproperties" >}}) Next: [PUT ddl/database/:db/table/:table/property/:property]({{< ref "webhcat-reference-putproperty" >}}) +Previous: [GET ddl/database/:db/table/:table/property]({{% ref "webhcat-reference-getproperties" %}}) Next: [PUT ddl/database/:db/table/:table/property/:property]({{% ref "webhcat-reference-putproperty" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-gettable.md b/content/docs/latest/webhcat/webhcat-reference-gettable.md index e75271aa..c48a864b 100644 --- a/content/docs/latest/webhcat/webhcat-reference-gettable.md +++ b/content/docs/latest/webhcat/webhcat-reference-gettable.md @@ -23,7 +23,7 @@ Describe an HCatalog table. Normally returns a simple list of columns (using "de | **:table** | The table name | Required | None | | **format** | Set "`format=extended`" to see additional information (using "show table extended like") | Optional | Not extended | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -131,7 +131,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table]({{< ref "webhcat-reference-gettables" >}}) Next: [PUT ddl/database/:db/table/:table]({{< ref "webhcat-reference-puttable" >}}) +Previous: [GET ddl/database/:db/table]({{% ref "webhcat-reference-gettables" %}}) Next: [PUT ddl/database/:db/table/:table]({{% ref "webhcat-reference-puttable" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-gettables.md b/content/docs/latest/webhcat/webhcat-reference-gettables.md index 3593b426..f680998b 100644 --- a/content/docs/latest/webhcat/webhcat-reference-gettables.md +++ b/content/docs/latest/webhcat/webhcat-reference-gettables.md @@ -20,7 +20,7 @@ List the tables in an HCatalog database. | **:db** | The database name | Required | None | | **like** | List only tables whose names match the specified pattern | Optional | "*" (List all tables) | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -85,7 +85,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [DELETE ddl/database/:db]({{< ref "webhcat-reference-deletedb" >}}) Next: [GET ddl/database/:db/table/:table]({{< ref "webhcat-reference-gettable" >}}) +Previous: [DELETE ddl/database/:db]({{% ref "webhcat-reference-deletedb" %}}) Next: [GET ddl/database/:db/table/:table]({{% ref "webhcat-reference-gettable" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-hive.md b/content/docs/latest/webhcat/webhcat-reference-hive.md index f88808e0..ff049176 100644 --- a/content/docs/latest/webhcat/webhcat-reference-hive.md +++ b/content/docs/latest/webhcat/webhcat-reference-hive.md @@ -11,7 +11,7 @@ Runs a [Hive](http://hive.apache.org/) query or set of commands. 
Version: Hive 0.13.0 and later -As of Hive 0.13.0, [GET version/hive]({{< ref "webhcat-reference-versionhive" >}}) displays the Hive version used for the query or commands. +As of Hive 0.13.0, [GET version/hive]({{% ref "webhcat-reference-versionhive" %}}) displays the Hive version used for the query or commands. ## URL @@ -30,7 +30,7 @@ As of Hive 0.13.0, [GET version/hive]({{< ref "webhcat-reference-versionhive" > | **enablelog** | If **statusdir** is set and **enablelog** is "true", collect Hadoop job configuration and logs into a directory named `$statusdir/logs` after the job finishes. Both completed and failed attempts are logged. The layout of subdirectories in `$statusdir/logs` is: `logs/$job_id` *(directory for $job_id)* `logs/$job_id/job.xml.html` `logs/$job_id/$attempt_id` *(directory for $attempt_id)* `logs/$job_id/$attempt_id/stderr` `logs/$job_id/$attempt_id/stdout` `logs/$job_id/$attempt_id/syslog` This parameter was introduced in Hive 0.12.0. (See [HIVE-4531](https://issues.apache.org/jira/browse/HIVE-4531).) | Optional in Hive 0.12.0+ | None | | **callback** | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using `$jobId`. This tag will be replaced in the callback URL with this job's job ID. | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -87,8 +87,8 @@ Found 2 items ``` **Navigation Links** -Previous: [POST pig]({{< ref "webhcat-reference-pig" >}}) - Next: [GET queue]({{< ref "webhcat-reference-jobids" >}}) +Previous: [POST pig]({{% ref "webhcat-reference-pig" %}}) + Next: [GET queue]({{% ref "webhcat-reference-jobids" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-job.md b/content/docs/latest/webhcat/webhcat-reference-job.md index 78adb517..fbc22401 100644 --- a/content/docs/latest/webhcat/webhcat-reference-job.md +++ b/content/docs/latest/webhcat/webhcat-reference-job.md @@ -11,7 +11,7 @@ Check the status of a job and get related job information given its job ID. Subs Version: Hive 0.12.0 and later -`GET jobs/:jobid` is introduced in Hive release 0.12.0. It is equivalent to `[GET queue/:jobid]({{< ref "webhcat-reference-jobinfo" >}})` in prior releases. +`GET jobs/:jobid` is introduced in Hive release 0.12.0. It is equivalent to `[GET queue/:jobid]({{% ref "webhcat-reference-jobinfo" %}})` in prior releases. `GET queue/:jobid` is now deprecated ([HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443)) and will be removed in Hive 0.14.0 ([HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432)). ## URL @@ -24,7 +24,7 @@ Version: Hive 0.12.0 and later | --- | --- | --- | --- | | **:jobid** | The job ID to check. This is the ID received when the job was created. | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -103,12 +103,12 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported ``` **Navigation Links** -Previous: [GET jobs]({{< ref "webhcat-reference-jobs" >}}) - Next: [DELETE jobs/:jobid]({{< ref "webhcat-reference-deletejobid" >}}) +Previous: [GET jobs]({{% ref "webhcat-reference-jobs" %}}) + Next: [DELETE jobs/:jobid]({{% ref "webhcat-reference-deletejobid" %}}) -Replaces deprecated resource: [GET queue/:jobid]({{< ref "webhcat-reference-jobinfo" >}}) +Replaces deprecated resource: [GET queue/:jobid]({{% ref "webhcat-reference-jobinfo" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-jobids.md b/content/docs/latest/webhcat/webhcat-reference-jobids.md index 8ce6a385..9d097074 100644 --- a/content/docs/latest/webhcat/webhcat-reference-jobids.md +++ b/content/docs/latest/webhcat/webhcat-reference-jobids.md @@ -11,12 +11,12 @@ Return a list of all job IDs. Version: Deprecated in 0.12.0 -`GET queue` is deprecated starting in Hive release 0.12.0. (See [HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443).) Users are encouraged to use `[GET jobs]({{< ref "webhcat-reference-jobs" >}})` instead. +`GET queue` is deprecated starting in Hive release 0.12.0. (See [HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443).) Users are encouraged to use `[GET jobs]({{% ref "webhcat-reference-jobs" %}})` instead. Version: Obsolete in 0.14.0 `GET queue` will be removed in Hive release 0.14.0. (See [HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432).) -Use `[GET jobs]({{< ref "webhcat-reference-jobs" >}})` instead. +Use `[GET jobs]({{% ref "webhcat-reference-jobs" %}})` instead. ## URL @@ -28,7 +28,7 @@ Use `[GET jobs]({{< ref "webhcat-reference-jobs" >}})` instead. | --- | --- | --- | --- | | **showall** | If **showall** is set to "true", then the request will return all jobs the user has permission to view, not only the jobs belonging to the user. 
This parameter is not available in releases prior to Hive 0.12.0. (See [HIVE-4442](https://issues.apache.org/jira/browse/HIVE-4442).) | Optional in Hive 0.12.0+ | false | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also accepted. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also accepted. ## Results @@ -56,12 +56,12 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also accepted. ``` **Navigation Links** -Previous: [POST hive]({{< ref "webhcat-reference-hive" >}}) - Next: [GET queue/:jobid]({{< ref "webhcat-reference-jobinfo" >}}) +Previous: [POST hive]({{% ref "webhcat-reference-hive" %}}) + Next: [GET queue/:jobid]({{% ref "webhcat-reference-jobinfo" %}}) -Replaced in Hive 0.12.0 by: [GET jobs]({{< ref "webhcat-reference-jobs" >}}) +Replaced in Hive 0.12.0 by: [GET jobs]({{% ref "webhcat-reference-jobs" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-jobinfo.md b/content/docs/latest/webhcat/webhcat-reference-jobinfo.md index 3f336864..2a92aeaa 100644 --- a/content/docs/latest/webhcat/webhcat-reference-jobinfo.md +++ b/content/docs/latest/webhcat/webhcat-reference-jobinfo.md @@ -12,12 +12,12 @@ Check the status of a job and get related job information given its job ID. Subs Version: Deprecated in 0.12.0 `GET queue/:jobid` is deprecated starting in Hive release 0.12.0. Users are encouraged to use `GET jobs/:jobid` instead. (See [HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443).) -`GET queue/:jobid` is equivalent to `GET jobs/:jobid` – check `[GET jobs/:jobid]({{< ref "webhcat-reference-job" >}})` for documentation. +`GET queue/:jobid` is equivalent to `GET jobs/:jobid` – check `[GET jobs/:jobid]({{% ref "webhcat-reference-job" %}})` for documentation. Version: Obsolete in 0.14.0 `GET queue/:jobid` will be removed in Hive release 0.14.0. (See [HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432).) -Use `[GET jobs/:jobid]({{< ref "webhcat-reference-job" >}})` instead. 
+Use `[GET jobs/:jobid]({{% ref "webhcat-reference-job" %}})` instead. ## URL @@ -29,7 +29,7 @@ Use `[GET jobs/:jobid]({{< ref "webhcat-reference-job" >}})` instead. | --- | --- | --- | --- | | **:jobid** | The job ID to check. This is the ID received when the job was created. | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -153,12 +153,12 @@ Starting in Hive release 0.12.0, `GET queue/:jobid` returns user arguments as we ``` **Navigation Links** -Previous: [GET queue]({{< ref "webhcat-reference-jobids" >}}) - Next: [DELETE queue/:jobid]({{< ref "webhcat-reference-deletejob" >}}) +Previous: [GET queue]({{% ref "webhcat-reference-jobids" %}}) + Next: [DELETE queue/:jobid]({{% ref "webhcat-reference-deletejob" %}}) -Replaced in Hive 0.12.0 by: [GET jobs/:jobid]({{< ref "webhcat-reference-job" >}}) +Replaced in Hive 0.12.0 by: [GET jobs/:jobid]({{% ref "webhcat-reference-job" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-jobs.md b/content/docs/latest/webhcat/webhcat-reference-jobs.md index 5290c5b5..c61e3fc2 100644 --- a/content/docs/latest/webhcat/webhcat-reference-jobs.md +++ b/content/docs/latest/webhcat/webhcat-reference-jobs.md @@ -11,7 +11,7 @@ Return a list of all job IDs. Version: Hive 0.12.0 and later -`GET jobs` is introduced in Hive release 0.12.0. It is equivalent to `[GET queue]({{< ref "webhcat-reference-jobids" >}})` in prior releases. +`GET jobs` is introduced in Hive release 0.12.0. It is equivalent to `[GET queue]({{% ref "webhcat-reference-jobids" %}})` in prior releases. `GET queue` is now deprecated ([HIVE-4443](https://issues.apache.org/jira/browse/HIVE-4443)) and will be removed in Hive 0.14.0 ([HIVE-6432](https://issues.apache.org/jira/browse/HIVE-6432)). 
## URL @@ -27,7 +27,7 @@ Version: Hive 0.12.0 and later | **jobid** | If **jobid** is present, only the records whose job ID is lexicographically greater than **jobid** are returned. For example, if **jobid** = "job_201312091733_0001", the jobs whose job ID is greater than "job_201312091733_0001" are returned. The number of records returned depends on the value of **numrecords**.This parameter is not available in releases prior to Hive 0.13.0. (See [HIVE-5519](https://issues.apache.org/jira/browse/HIVE-5519).) | Optional in Hive 0.13.0+ | None | | **numrecords** | If the **jobid** and **numrecords** parameters are present, the top *numrecords* records appearing after **jobid** will be returned after sorting the job ID list lexicographically. If the **jobid** parameter is missing and **numrecords** is present, the top *numrecords* will be returned after lexicographically sorting the job ID list. If the **jobid** parameter is present and **numrecords** is missing, all the records whose job ID is greater than **jobid** are returned.This parameter is not available in releases prior to Hive 0.13.0. (See [HIVE-5519](https://issues.apache.org/jira/browse/HIVE-5519).) | Optional in Hive 0.13.0+ | All | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also accepted. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also accepted. ## Results @@ -38,7 +38,7 @@ Every element inside the array includes: | Name | Description | | --- | --- | | **id** | Job ID. | -| **detail** | Job details if **showall** is set to "true"; otherwise "null". For more information about what details it contains, check `[GET jobs/:jobid]({{< ref "webhcat-reference-job" >}})`. | +| **detail** | Job details if **showall** is set to "true"; otherwise "null". For more information about what details it contains, check `[GET jobs/:jobid]({{% ref "webhcat-reference-job" %}})`. 
| ## Examples @@ -140,12 +140,12 @@ In release 0.12.0 the first line of JSON output for the fields parameter gives t   **Navigation Links** -Previous: [DELETE queue/:jobid]({{< ref "webhcat-reference-deletejob" >}}) - Next: [GET jobs/:jobid]({{< ref "webhcat-reference-job" >}}) +Previous: [DELETE queue/:jobid]({{% ref "webhcat-reference-deletejob" %}}) + Next: [GET jobs/:jobid]({{% ref "webhcat-reference-job" %}}) -Replaces deprecated resource: [GET queue]({{< ref "webhcat-reference-jobids" >}}) +Replaces deprecated resource: [GET queue]({{% ref "webhcat-reference-jobids" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-mapreducejar.md b/content/docs/latest/webhcat/webhcat-reference-mapreducejar.md index bf17b632..5c243f17 100644 --- a/content/docs/latest/webhcat/webhcat-reference-mapreducejar.md +++ b/content/docs/latest/webhcat/webhcat-reference-mapreducejar.md @@ -11,7 +11,7 @@ Creates and queues a standard [Hadoop MapReduce](http://hadoop.apache.org/docs/s Version: Hive 0.13.0 and later -As of Hive 0.13.0, [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop" >}}) displays the Hadoop version used for the MapReduce job. +As of Hive 0.13.0, [GET version/hadoop]({{% ref "webhcat-reference-versionhadoop" %}}) displays the Hadoop version used for the MapReduce job. ## URL @@ -32,7 +32,7 @@ As of Hive 0.13.0, [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop | **callback** | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using `$jobId`. This tag will be replaced in the callback URL with this job's job ID. | Optional | None | | **usehcatalog** | Specify that the submitted job uses HCatalog and therefore needs to access the metastore, which requires additional steps for WebHCat to perform in a secure cluster. (See [HIVE-5133](https://issues.apache.org/jira/browse/HIVE-5133).) This parameter will be introduced in Hive 0.13.0. 
Also, if webhcat-site.xml defines the parameters `templeton.hive.archive`, `templeton.hive.home` and `templeton.hcat.home` then WebHCat will ship the Hive tar to the target node where the job runs. (See [HIVE-5547](https://issues.apache.org/jira/browse/HIVE-5547).) This means that Hive doesn't need to be installed on every node in the Hadoop cluster. This is independent of security, but improves manageability. The webhcat-site.xml parameters are documented in webhcat-default.xml. | Optional in Hive 0.13.0+ | false | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -90,8 +90,8 @@ In [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6576) onward, user.n ``` **Navigation Links** -Previous: [POST mapreduce/streaming]({{< ref "webhcat-reference-mapreducestream" >}}) - Next: [POST pig]({{< ref "webhcat-reference-pig" >}}) +Previous: [POST mapreduce/streaming]({{% ref "webhcat-reference-mapreducestream" %}}) + Next: [POST pig]({{% ref "webhcat-reference-pig" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-mapreducestream.md b/content/docs/latest/webhcat/webhcat-reference-mapreducestream.md index a6c2cb45..af783884 100644 --- a/content/docs/latest/webhcat/webhcat-reference-mapreducestream.md +++ b/content/docs/latest/webhcat/webhcat-reference-mapreducestream.md @@ -11,7 +11,7 @@ Create and queue a [Hadoop streaming MapReduce](http://hadoop.apache.org/docs/st Version: Hive 0.13.0 and later -As of Hive 0.13.0, [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop" >}}) displays the Hadoop version used for the MapReduce job. +As of Hive 0.13.0, [GET version/hadoop]({{% ref "webhcat-reference-versionhadoop" %}}) displays the Hadoop version used for the MapReduce job. ## URL @@ -22,7 +22,7 @@ As of Hive 0.13.0, [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop | Name | Description | Required? 
| Default | | --- | --- | --- | --- | | **input** | Location of the input data in Hadoop. | Required | None | -| **output** | Location in which to store the output data. If not specified, WebHCat will store the output in a location that can be discovered using the [queue]({{< ref "webhcat-reference-jobinfo" >}}) resource. | Optional | See description | +| **output** | Location in which to store the output data. If not specified, WebHCat will store the output in a location that can be discovered using the [queue]({{% ref "webhcat-reference-jobinfo" %}}) resource. | Optional | See description | | **mapper** | Location of the mapper program in Hadoop. | Required | None | | **reducer** | Location of the reducer program in Hadoop. | Required | None | | **file** | Add an HDFS file to the distributed cache. | Optional | None | @@ -33,7 +33,7 @@ As of Hive 0.13.0, [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop | **enablelog** | If **statusdir** is set and **enablelog** is "true", collect Hadoop job configuration and logs into a directory named `$statusdir/logs` after the job finishes. Both completed and failed attempts are logged. The layout of subdirectories in `$statusdir/logs` is: `logs/$job_id` *(directory for $job_id)* `logs/$job_id/job.xml.html` `logs/$job_id/$attempt_id` *(directory for $attempt_id)* `logs/$job_id/$attempt_id/stderr` `logs/$job_id/$attempt_id/stdout` `logs/$job_id/$attempt_id/syslog` This parameter was introduced in Hive 0.12.0. (See [HIVE-4531](https://issues.apache.org/jira/browse/HIVE-4531).) | Optional in Hive 0.12.0+ | None | | **callback** | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using `$jobId`. This tag will be replaced in the callback URL with this job's job ID. | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -110,8 +110,8 @@ drwxr-xr-x - ctdean supergroup 0 2011-11-11 13:26 /user/ctdean/mycoun ``` **Navigation Links** -Previous: [PUT ddl/database/:db/table/:table/property/:property]({{< ref "webhcat-reference-putproperty" >}}) - Next: [POST mapreduce/jar]({{< ref "webhcat-reference-mapreducejar" >}}) +Previous: [PUT ddl/database/:db/table/:table/property/:property]({{% ref "webhcat-reference-putproperty" %}}) + Next: [POST mapreduce/jar]({{% ref "webhcat-reference-mapreducejar" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-pig.md b/content/docs/latest/webhcat/webhcat-reference-pig.md index a714aa39..47fce9d0 100644 --- a/content/docs/latest/webhcat/webhcat-reference-pig.md +++ b/content/docs/latest/webhcat/webhcat-reference-pig.md @@ -26,7 +26,7 @@ Create and queue a [Pig](http://pig.apache.org/) job. | **callback** | Define a URL to be called upon job completion. You may embed a specific job ID into this URL using `$jobId`. This tag will be replaced in the callback URL with this job's job ID. | Optional | None | | **usehcatalog** | Specify that the submitted job uses HCatalog and therefore needs to access the metastore, which requires additional steps for WebHCat to perform in a secure cluster. (See [HIVE-5133](https://issues.apache.org/jira/browse/HIVE-5133).) This parameter will be introduced in Hive 0.13.0. It can also be set to "true" by including `-useHCatalog` in the **arg** parameter. Also, if webhcat-site.xml defines the parameters `templeton.hive.archive`, `templeton.hive.home` and `templeton.hcat.home` then WebHCat will ship the Hive tar to the target node where the job runs. (See [HIVE-5547](https://issues.apache.org/jira/browse/HIVE-5547).) This means that Hive doesn't need to be installed on every node in the Hadoop cluster. It does not ensure that Pig is installed on the target node in the cluster. This is independent of security, but improves manageability. 
The webhcat-site.xml parameters are documented in webhcat-default.xml. | Optional in Hive 0.13.0+ | false | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -87,8 +87,8 @@ In [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6576) onward, user.n ``` **Navigation Links** -Previous: [POST mapreduce/jar]({{< ref "webhcat-reference-mapreducejar" >}}) - Next: [POST hive]({{< ref "webhcat-reference-hive" >}}) +Previous: [POST mapreduce/jar]({{% ref "webhcat-reference-mapreducejar" %}}) + Next: [POST hive]({{% ref "webhcat-reference-hive" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-posttable.md b/content/docs/latest/webhcat/webhcat-reference-posttable.md index fa2062d1..58380668 100644 --- a/content/docs/latest/webhcat/webhcat-reference-posttable.md +++ b/content/docs/latest/webhcat/webhcat-reference-posttable.md @@ -23,7 +23,7 @@ Rename an HCatalog table. | **group** | The user group to use | Optional | None | | **permissions** | The permissions string to use. The format is "`rwxrw-r-x`". | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -71,8 +71,8 @@ In [Hive 0.13.0](https://issues.apache.org/jira/browse/HIVE-6576) onward, user.n ``` **Navigation Links** -Previous: [PUT ddl/database/:db/table/:table]({{< ref "webhcat-reference-puttable" >}}) - Next: [DELETE ddl/database/:db/table/:table]({{< ref "webhcat-reference-deletetable" >}}) +Previous: [PUT ddl/database/:db/table/:table]({{% ref "webhcat-reference-puttable" %}}) + Next: [DELETE ddl/database/:db/table/:table]({{% ref "webhcat-reference-deletetable" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-putcolumn.md b/content/docs/latest/webhcat/webhcat-reference-putcolumn.md index 473874e0..fffbd5c1 100644 --- a/content/docs/latest/webhcat/webhcat-reference-putcolumn.md +++ b/content/docs/latest/webhcat/webhcat-reference-putcolumn.md @@ -25,7 +25,7 @@ Create a column in an HCatalog table. | **type** | The type of column to add, like "string" or "int" | Required | None | | **comment** | The column comment, like a description | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -61,7 +61,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/column/:column]({{< ref "webhcat-reference-getcolumn" >}}) Next: [GET ddl/database/:db/table/:table/property]({{< ref "webhcat-reference-getproperties" >}}) +Previous: [GET ddl/database/:db/table/:table/column/:column]({{% ref "webhcat-reference-getcolumn" %}}) Next: [GET ddl/database/:db/table/:table/property]({{% ref "webhcat-reference-getproperties" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-putdb.md b/content/docs/latest/webhcat/webhcat-reference-putdb.md index 74e7f0f9..de17d28d 100644 --- a/content/docs/latest/webhcat/webhcat-reference-putdb.md +++ b/content/docs/latest/webhcat/webhcat-reference-putdb.md @@ -24,7 +24,7 @@ Create a database. | **comment** | A comment for the database, like a description | Optional | None | | **properties** | The database properties | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -57,7 +57,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db]({{< ref "webhcat-reference-getdb" >}}) Next: [DELETE ddl/database/:db]({{< ref "webhcat-reference-deletedb" >}}) +Previous: [GET ddl/database/:db]({{% ref "webhcat-reference-getdb" %}}) Next: [DELETE ddl/database/:db]({{% ref "webhcat-reference-deletedb" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-putpartition.md b/content/docs/latest/webhcat/webhcat-reference-putpartition.md index bab7f047..f3f82261 100644 --- a/content/docs/latest/webhcat/webhcat-reference-putpartition.md +++ b/content/docs/latest/webhcat/webhcat-reference-putpartition.md @@ -25,7 +25,7 @@ Create a partition in an HCatalog table. 
| **location** | The location for partition creation | Required | None | | **ifNotExists** | If true, return an error if the partition already exists. | Optional | False | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -59,7 +59,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-getpartition" >}}) Next: [DELETE ddl/database/:db/table/:table/partition/:partition]({{< ref "webhcat-reference-deletepartition" >}}) +Previous: [GET ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-getpartition" %}}) Next: [DELETE ddl/database/:db/table/:table/partition/:partition]({{% ref "webhcat-reference-deletepartition" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-putproperty.md b/content/docs/latest/webhcat/webhcat-reference-putproperty.md index 62e50b93..af6e2101 100644 --- a/content/docs/latest/webhcat/webhcat-reference-putproperty.md +++ b/content/docs/latest/webhcat/webhcat-reference-putproperty.md @@ -24,7 +24,7 @@ Add a single property on an HCatalog table. This will also reset an existing pro | **permissions** | The permissions string to use | Optional | None | | **value** | The property value | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -58,7 +58,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table/property/:property]({{< ref "webhcat-reference-getproperty" >}}) Next: [POST mapreduce/streaming]({{< ref "webhcat-reference-mapreducestream" >}}) +Previous: [GET ddl/database/:db/table/:table/property/:property]({{% ref "webhcat-reference-getproperty" %}}) Next: [POST mapreduce/streaming]({{% ref "webhcat-reference-mapreducestream" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-puttable.md b/content/docs/latest/webhcat/webhcat-reference-puttable.md index e4510348..edacf68e 100644 --- a/content/docs/latest/webhcat/webhcat-reference-puttable.md +++ b/content/docs/latest/webhcat/webhcat-reference-puttable.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## Description -Create a new HCatalog table. For more information, please refer to the Hive documentation for [CREATE TABLE]({{< ref "#create-table" >}}). +Create a new HCatalog table. For more information, please refer to the Hive documentation for [CREATE TABLE]({{% ref "#create-table" %}}). ## URL @@ -26,12 +26,12 @@ Create a new HCatalog table. For more information, please refer to the Hive docu | **comment** | Comment for the table. | Optional | None | | **columns** | A list of column descriptions, including name, type, and an optional comment. | Optional | None | | **partitionedBy** | A list of column descriptions used to partition the table. Like the **columns** parameter this is a list of name, type, and comment fields. | Optional | None | -| **clusteredBy** | An object describing how to cluster the table including the parameters columnNames, sortedBy, and numberOfBuckets. The sortedBy parameter includes the parameters columnName and order (ASC for ascending or DESC for descending). For further information please refer to the examples below or to the [Hive documentation]({{< ref "#hive-documentation" >}}). 
| Optional | None | -| **format** | Storage format description including parameters for rowFormat, storedAs, and storedBy. For further information please refer to the examples below or to the [Hive documentation]({{< ref "#hive-documentation" >}}). | Optional | None | +| **clusteredBy** | An object describing how to cluster the table including the parameters columnNames, sortedBy, and numberOfBuckets. The sortedBy parameter includes the parameters columnName and order (ASC for ascending or DESC for descending). For further information please refer to the examples below or to the [Hive documentation]({{% ref "#hive-documentation" %}}). | Optional | None | +| **format** | Storage format description including parameters for rowFormat, storedAs, and storedBy. For further information please refer to the examples below or to the [Hive documentation]({{% ref "#hive-documentation" %}}). | Optional | None | | **location** | The HDFS path. | Optional | None | | **tableProperties** | A list of table property names and values (key/value pairs). | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -121,7 +121,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [GET ddl/database/:db/table/:table]({{< ref "webhcat-reference-gettable" >}}) Next: [POST ddl/database/:db/table/:table]({{< ref "webhcat-reference-posttable" >}}) +Previous: [GET ddl/database/:db/table/:table]({{% ref "webhcat-reference-gettable" %}}) Next: [POST ddl/database/:db/table/:table]({{% ref "webhcat-reference-posttable" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-puttablelike.md b/content/docs/latest/webhcat/webhcat-reference-puttablelike.md index 006c4e34..43e7136f 100644 --- a/content/docs/latest/webhcat/webhcat-reference-puttablelike.md +++ b/content/docs/latest/webhcat/webhcat-reference-puttablelike.md @@ -26,7 +26,7 @@ Create a new HCatalog table like an existing one. | **ifNotExists** | If true, you will not receive an error if the table already exists. | Optional | false | | **location** | The HDFS path | Optional | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. 
## Results @@ -58,7 +58,7 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [DELETE ddl/database/:db/table/:table]({{< ref "webhcat-reference-deletetable" >}}) Next: [GET ddl/database/:db/table/:table/partition]({{< ref "webhcat-reference-getpartitions" >}}) +Previous: [DELETE ddl/database/:db/table/:table]({{% ref "webhcat-reference-deletetable" %}}) Next: [GET ddl/database/:db/table/:table/partition]({{% ref "webhcat-reference-getpartitions" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-responsetypes.md b/content/docs/latest/webhcat/webhcat-reference-responsetypes.md index 7f7f41e2..406e716a 100644 --- a/content/docs/latest/webhcat/webhcat-reference-responsetypes.md +++ b/content/docs/latest/webhcat/webhcat-reference-responsetypes.md @@ -19,7 +19,7 @@ Returns a list of the response types supported by WebHCat (Templeton). | --- | --- | --- | --- | | **:version** | The WebHCat version number. (Currently this must be "v1".) | Required | None | -The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported. +The [standard parameters]({{% ref "#standard-parameters" %}}) are also supported. ## Results @@ -59,8 +59,8 @@ The [standard parameters]({{< ref "#standard-parameters" >}}) are also supported **Navigation Links** -Previous: [Reference: WebHCat Resources]({{< ref "webhcat-reference" >}}) -Next: [GET status]({{< ref "webhcat-reference-status" >}}) +Previous: [Reference: WebHCat Resources]({{% ref "webhcat-reference" %}}) +Next: [GET status]({{% ref "webhcat-reference-status" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-status.md b/content/docs/latest/webhcat/webhcat-reference-status.md index f15115e9..9fd8bd36 100644 --- a/content/docs/latest/webhcat/webhcat-reference-status.md +++ b/content/docs/latest/webhcat/webhcat-reference-status.md @@ -15,7 +15,7 @@ Returns the current status of the WebHCat (Templeton) server. 
Useful for heartbe ## Parameters -Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. +Only the [standard parameters]({{% ref "#standard-parameters" %}}) are accepted. ## Results @@ -46,7 +46,7 @@ Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. **Navigation Links** -Previous: [Response Types (GET :version)]({{< ref "webhcat-reference-responsetypes" >}})Next: [GET version]({{< ref "webhcat-reference-version" >}}) +Previous: [Response Types (GET :version)]({{% ref "webhcat-reference-responsetypes" %}})Next: [GET version]({{% ref "webhcat-reference-version" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-version.md b/content/docs/latest/webhcat/webhcat-reference-version.md index 4b3ee5de..64c48c6c 100644 --- a/content/docs/latest/webhcat/webhcat-reference-version.md +++ b/content/docs/latest/webhcat/webhcat-reference-version.md @@ -15,7 +15,7 @@ Returns a list of supported versions and the current version. ## Parameters -Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. +Only the [standard parameters]({{% ref "#standard-parameters" %}}) are accepted. ## Results @@ -46,8 +46,8 @@ Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. 
``` **Navigation Links** -Previous: [GET status]({{< ref "webhcat-reference-status" >}}) - Next: [GET version/hive]({{< ref "webhcat-reference-versionhive" >}}) +Previous: [GET status]({{% ref "webhcat-reference-status" %}}) + Next: [GET version/hive]({{% ref "webhcat-reference-versionhive" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-versionhadoop.md b/content/docs/latest/webhcat/webhcat-reference-versionhadoop.md index 5db30358..9f5ccdd2 100644 --- a/content/docs/latest/webhcat/webhcat-reference-versionhadoop.md +++ b/content/docs/latest/webhcat/webhcat-reference-versionhadoop.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## Description -Return the version of Hadoop being run when WebHCat creates a MapReduce job ([POST mapreduce/jar]({{< ref "webhcat-reference-mapreducejar" >}}) or [mapreduce/streaming]({{< ref "webhcat-reference-mapreducestream" >}})). +Return the version of Hadoop being run when WebHCat creates a MapReduce job ([POST mapreduce/jar]({{% ref "webhcat-reference-mapreducejar" %}}) or [mapreduce/streaming]({{% ref "webhcat-reference-mapreducestream" %}})). Version: Hive 0.13.0 and later @@ -19,7 +19,7 @@ Version: Hive 0.13.0 and later ## Parameters -Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. +Only the [standard parameters]({{% ref "#standard-parameters" %}}) are accepted. ## Results @@ -50,12 +50,12 @@ Returns the Hadoop version.   
**Navigation Links** -Previous: [GET version/hive]({{< ref "webhcat-reference-versionhive" >}}) -Next: [POST ddl]({{< ref "webhcat-reference-ddl" >}}) +Previous: [GET version/hive]({{% ref "webhcat-reference-versionhive" %}}) +Next: [POST ddl]({{% ref "webhcat-reference-ddl" %}}) -Replaces deprecated resource: [GET queue]({{< ref "webhcat-reference-jobids" >}}) +Replaces deprecated resource: [GET queue]({{% ref "webhcat-reference-jobids" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference-versionhive.md b/content/docs/latest/webhcat/webhcat-reference-versionhive.md index 130bf18d..b9c6fe0f 100644 --- a/content/docs/latest/webhcat/webhcat-reference-versionhive.md +++ b/content/docs/latest/webhcat/webhcat-reference-versionhive.md @@ -7,7 +7,7 @@ date: 2024-12-12 ## Description -Return the version of Hive being run when WebHCat issues Hive queries or commands ([POST hive]({{< ref "webhcat-reference-hive" >}})). +Return the version of Hive being run when WebHCat issues Hive queries or commands ([POST hive]({{% ref "webhcat-reference-hive" %}})). Version: Hive 0.13.0 and later @@ -19,7 +19,7 @@ Version: Hive 0.13.0 and later ## Parameters -Only the [standard parameters]({{< ref "#standard-parameters" >}}) are accepted. +Only the [standard parameters]({{% ref "#standard-parameters" %}}) are accepted. ## Results @@ -50,12 +50,12 @@ Returns the Hive version.   
**Navigation Links** -Previous: [GET version]({{< ref "webhcat-reference-version" >}}) -Next: [GET version/hadoop]({{< ref "webhcat-reference-versionhadoop" >}}) +Previous: [GET version]({{% ref "webhcat-reference-version" %}}) +Next: [GET version/hadoop]({{% ref "webhcat-reference-versionhadoop" %}}) -Replaces deprecated resource: [GET queue]({{< ref "webhcat-reference-jobids" >}}) +Replaces deprecated resource: [GET queue]({{% ref "webhcat-reference-jobids" %}}) diff --git a/content/docs/latest/webhcat/webhcat-reference.md b/content/docs/latest/webhcat/webhcat-reference.md index 54db28c2..710565fc 100644 --- a/content/docs/latest/webhcat/webhcat-reference.md +++ b/content/docs/latest/webhcat/webhcat-reference.md @@ -7,56 +7,56 @@ date: 2024-12-12 # Reference: WebHCat Resources -This overview page lists all of the WebHCat resources. (DDL resources are listed here and on another [overview page]({{< ref "webhcat-reference-allddl" >}}). For information about HCatalog DDL commands, see [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}). For information about Hive DDL commands, see [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}}).) +This overview page lists all of the WebHCat resources. (DDL resources are listed here and on another [overview page]({{% ref "webhcat-reference-allddl" %}}). For information about HCatalog DDL commands, see [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}). For information about Hive DDL commands, see [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}}).)   | Category | Resource (Type) | Description | | --- | --- | --- | -| General | [:version (GET)]({{< ref "webhcat-reference-responsetypes" >}}) | Return a list of supported response types. | -|   | [status (GET)]({{< ref "webhcat-reference-status" >}}) | Return the WebHCat server status. | -|   | [version (GET)]({{< ref "webhcat-reference-version" >}}) | Return a list of supported versions and the current version. 
| -| | [version/hive (GET)]({{< ref "webhcat-reference-versionhive" >}}) | Return the Hive version being run. (Added in Hive 0.13.0.) | -| | [version/hadoop (GET)]({{< ref "webhcat-reference-versionhadoop" >}}) | Return the Hadoop version being run. (Added in Hive 0.13.0.) | -| [DDL]({{< ref "webhcat-reference-allddl" >}}) | [ddl (POST)]({{< ref "webhcat-reference-ddl" >}}) | Perform an HCatalog DDL command. | -|   | [ddl/database (GET)]({{< ref "webhcat-reference-getdbs" >}}) | List HCatalog databases. | -|   | [ddl/database/:db (GET)]({{< ref "webhcat-reference-getdb" >}}) | Describe an HCatalog database. | -|   | [ddl/database/:db (PUT)]({{< ref "webhcat-reference-putdb" >}}) | Create an HCatalog database. | -|   | [ddl/database/:db (DELETE)]({{< ref "webhcat-reference-deletedb" >}}) | Delete (drop) an HCatalog database. | -|   | [ddl/database/:db/table (GET)]({{< ref "webhcat-reference-gettables" >}}) | List the tables in an HCatalog database. | -|   | [ddl/database/:db/table/:table (GET)]({{< ref "webhcat-reference-gettable" >}}) | Describe an HCatalog table. | -|   | [ddl/database/:db/table/:table (PUT)]({{< ref "webhcat-reference-puttable" >}}) | Create a new HCatalog table. | -|   | [ddl/database/:db/table/:table (POST)]({{< ref "webhcat-reference-posttable" >}}) | Rename an HCatalog table. | -|   | [ddl/database/:db/table/:table (DELETE)]({{< ref "webhcat-reference-deletetable" >}}) | Delete (drop) an HCatalog table. | -|   | [ddl/database/:db/table/:existingtable/like/:newtable (PUT)]({{< ref "webhcat-reference-puttablelike" >}}) | Create a new HCatalog table like an existing one. | -|   | [ddl/database/:db/table/:table/partition (GET)]({{< ref "webhcat-reference-getpartitions" >}}) | List all partitions in an HCatalog table. | -|   | [ddl/database/:db/table/:table/partition/:partition (GET)]({{< ref "webhcat-reference-getpartition" >}}) | Describe a single partition in an HCatalog table. 
| -|   | [ddl/database/:db/table/:table/partition/:partition (PUT)]({{< ref "webhcat-reference-putpartition" >}}) | Create a partition in an HCatalog table. | -|   | [ddl/database/:db/table/:table/partition/:partition (DELETE)]({{< ref "webhcat-reference-deletepartition" >}}) | Delete (drop) a partition in an HCatalog table. | -|   | [ddl/database/:db/table/:table/column (GET)]({{< ref "webhcat-reference-getcolumns" >}}) | List the columns in an HCatalog table. | -|   | [ddl/database/:db/table/:table/column/:column (GET)]({{< ref "webhcat-reference-getcolumn" >}}) | Describe a single column in an HCatalog table. | -|   | [ddl/database/:db/table/:table/column/:column (PUT)]({{< ref "webhcat-reference-putcolumn" >}}) | Create a column in an HCatalog table. | -|   | [ddl/database/:db/table/:table/property (GET)]({{< ref "webhcat-reference-getproperties" >}}) | List table properties. | -|   | [ddl/database/:db/table/:table/property/:property (GET)]({{< ref "webhcat-reference-getproperty" >}}) | Return the value of a single table property. | -|   | [ddl/database/:db/table/:table/property/:property (PUT)]({{< ref "webhcat-reference-putproperty" >}}) | Set a table property. | -| MapReduce | [mapreduce/streaming (POST)]({{< ref "webhcat-reference-mapreducestream" >}}) | Create and queue Hadoop streaming MapReduce jobs. | -|   | [mapreduce/jar (POST)]({{< ref "webhcat-reference-mapreducejar" >}}) | Create and queue standard Hadoop MapReduce jobs. | -| Pig | [pig (POST)]({{< ref "webhcat-reference-pig" >}}) | Create and queue Pig jobs. | -| Hive | [hive (POST)]({{< ref "webhcat-reference-hive" >}}) | Run Hive queries and commands. | -| Queue(deprecated in Hive 0.12,removed in Hive 0.14) | [queue (GET)]({{< ref "webhcat-reference-jobids" >}}) | Return a list of all job IDs. (Removed in Hive 0.14.0.) | -|   | [queue/:jobid (GET)]({{< ref "webhcat-reference-jobinfo" >}}) | Return the status of a job given its ID. (Removed in Hive 0.14.0.) 
| -|   | [queue/:jobid (DELETE)]({{< ref "webhcat-reference-deletejob" >}}) | Kill a job given its ID. (Removed in Hive 0.14.0.) | -| Jobs(Hive 0.12 and later) | [jobs (GET)]({{< ref "webhcat-reference-jobs" >}}) | Return a list of all job IDs. | -|   | [jobs/:jobid (GET)]({{< ref "webhcat-reference-job" >}}) | Return the status of a job given its ID. | -|   | [jobs/:jobid (DELETE)]({{< ref "webhcat-reference-deletejobid" >}}) | Kill a job given its ID. | +| General | [:version (GET)]({{% ref "webhcat-reference-responsetypes" %}}) | Return a list of supported response types. | +|   | [status (GET)]({{% ref "webhcat-reference-status" %}}) | Return the WebHCat server status. | +|   | [version (GET)]({{% ref "webhcat-reference-version" %}}) | Return a list of supported versions and the current version. | +| | [version/hive (GET)]({{% ref "webhcat-reference-versionhive" %}}) | Return the Hive version being run. (Added in Hive 0.13.0.) | +| | [version/hadoop (GET)]({{% ref "webhcat-reference-versionhadoop" %}}) | Return the Hadoop version being run. (Added in Hive 0.13.0.) | +| [DDL]({{% ref "webhcat-reference-allddl" %}}) | [ddl (POST)]({{% ref "webhcat-reference-ddl" %}}) | Perform an HCatalog DDL command. | +|   | [ddl/database (GET)]({{% ref "webhcat-reference-getdbs" %}}) | List HCatalog databases. | +|   | [ddl/database/:db (GET)]({{% ref "webhcat-reference-getdb" %}}) | Describe an HCatalog database. | +|   | [ddl/database/:db (PUT)]({{% ref "webhcat-reference-putdb" %}}) | Create an HCatalog database. | +|   | [ddl/database/:db (DELETE)]({{% ref "webhcat-reference-deletedb" %}}) | Delete (drop) an HCatalog database. | +|   | [ddl/database/:db/table (GET)]({{% ref "webhcat-reference-gettables" %}}) | List the tables in an HCatalog database. | +|   | [ddl/database/:db/table/:table (GET)]({{% ref "webhcat-reference-gettable" %}}) | Describe an HCatalog table. 
| +|   | [ddl/database/:db/table/:table (PUT)]({{% ref "webhcat-reference-puttable" %}}) | Create a new HCatalog table. | +|   | [ddl/database/:db/table/:table (POST)]({{% ref "webhcat-reference-posttable" %}}) | Rename an HCatalog table. | +|   | [ddl/database/:db/table/:table (DELETE)]({{% ref "webhcat-reference-deletetable" %}}) | Delete (drop) an HCatalog table. | +|   | [ddl/database/:db/table/:existingtable/like/:newtable (PUT)]({{% ref "webhcat-reference-puttablelike" %}}) | Create a new HCatalog table like an existing one. | +|   | [ddl/database/:db/table/:table/partition (GET)]({{% ref "webhcat-reference-getpartitions" %}}) | List all partitions in an HCatalog table. | +|   | [ddl/database/:db/table/:table/partition/:partition (GET)]({{% ref "webhcat-reference-getpartition" %}}) | Describe a single partition in an HCatalog table. | +|   | [ddl/database/:db/table/:table/partition/:partition (PUT)]({{% ref "webhcat-reference-putpartition" %}}) | Create a partition in an HCatalog table. | +|   | [ddl/database/:db/table/:table/partition/:partition (DELETE)]({{% ref "webhcat-reference-deletepartition" %}}) | Delete (drop) a partition in an HCatalog table. | +|   | [ddl/database/:db/table/:table/column (GET)]({{% ref "webhcat-reference-getcolumns" %}}) | List the columns in an HCatalog table. | +|   | [ddl/database/:db/table/:table/column/:column (GET)]({{% ref "webhcat-reference-getcolumn" %}}) | Describe a single column in an HCatalog table. | +|   | [ddl/database/:db/table/:table/column/:column (PUT)]({{% ref "webhcat-reference-putcolumn" %}}) | Create a column in an HCatalog table. | +|   | [ddl/database/:db/table/:table/property (GET)]({{% ref "webhcat-reference-getproperties" %}}) | List table properties. | +|   | [ddl/database/:db/table/:table/property/:property (GET)]({{% ref "webhcat-reference-getproperty" %}}) | Return the value of a single table property. 
|
+|   | [ddl/database/:db/table/:table/property/:property (PUT)]({{% ref "webhcat-reference-putproperty" %}}) | Set a table property. |
+| MapReduce | [mapreduce/streaming (POST)]({{% ref "webhcat-reference-mapreducestream" %}}) | Create and queue Hadoop streaming MapReduce jobs. |
+|   | [mapreduce/jar (POST)]({{% ref "webhcat-reference-mapreducejar" %}}) | Create and queue standard Hadoop MapReduce jobs. |
+| Pig | [pig (POST)]({{% ref "webhcat-reference-pig" %}}) | Create and queue Pig jobs. |
+| Hive | [hive (POST)]({{% ref "webhcat-reference-hive" %}}) | Run Hive queries and commands. |
+| Queue(deprecated in Hive 0.12,removed in Hive 0.14) | [queue (GET)]({{% ref "webhcat-reference-jobids" %}}) | Return a list of all job IDs. (Removed in Hive 0.14.0.) |
+|   | [queue/:jobid (GET)]({{% ref "webhcat-reference-jobinfo" %}}) | Return the status of a job given its ID. (Removed in Hive 0.14.0.) |
+|   | [queue/:jobid (DELETE)]({{% ref "webhcat-reference-deletejob" %}}) | Kill a job given its ID. (Removed in Hive 0.14.0.) |
+| Jobs(Hive 0.12 and later) | [jobs (GET)]({{% ref "webhcat-reference-jobs" %}}) | Return a list of all job IDs. |
+|   | [jobs/:jobid (GET)]({{% ref "webhcat-reference-job" %}}) | Return the status of a job given its ID. |
+|   | [jobs/:jobid (DELETE)]({{% ref "webhcat-reference-deletejobid" %}}) | Kill a job given its ID. |

 **Navigation Links**

-Previous: [Configuration]({{< ref "webhcat-configure" >}})
- Next: [GET :version]({{< ref "webhcat-reference-responsetypes" >}})
+Previous: [Configuration]({{% ref "webhcat-configure" %}})
+ Next: [GET :version]({{% ref "webhcat-reference-responsetypes" %}})

-Overview of DDL resources: [WebHCat Reference: DDL]({{< ref "webhcat-reference-allddl" >}})
- HCatalog DDL commands: [HCatalog DDL]({{< ref "#hcatalog-ddl" >}})
- Hive DDL commands: [Hive Data Definition Language]({{< ref "languagemanual-ddl" >}})
+Overview of DDL resources: [WebHCat Reference: DDL]({{% ref "webhcat-reference-allddl" %}})
+ HCatalog DDL commands: [HCatalog DDL]({{% ref "#hcatalog-ddl" %}})
+ Hive DDL commands: [Hive Data Definition Language]({{% ref "languagemanual-ddl" %}})
diff --git a/content/docs/latest/webhcat/webhcat-usingwebhcat.md b/content/docs/latest/webhcat/webhcat-usingwebhcat.md
index de3f155a..359233fa 100644
--- a/content/docs/latest/webhcat/webhcat-usingwebhcat.md
+++ b/content/docs/latest/webhcat/webhcat-usingwebhcat.md
@@ -14,13 +14,13 @@ The HCatalog project graduated from the Apache incubator and merged with the Hiv

 This document describes the HCatalog REST API, *WebHCat*, which was previously called *Templeton*.

-As shown in the figure below, developers make HTTP requests to access [Hadoop](http://hadoop.apache.org/) [MapReduce](http://hadoop.apache.org/docs/stable/mapred_tutorial.html) (or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)), [Pig](http://pig.apache.org/), [Hive](http://hive.apache.org/), and [HCatalog DDL]({{< ref "#hcatalog-ddl" >}}) from within applications. Data and code used by this API are maintained in [HDFS](http://hadoop.apache.org/docs/stable/hdfs_user_guide.html). HCatalog DDL commands are executed directly when requested. MapReduce, Pig, and Hive jobs are placed in queue by WebHCat (Templeton) servers and can be monitored for progress or stopped as required. Developers specify a location in HDFS into which Pig, Hive, and MapReduce results should be placed.
+As shown in the figure below, developers make HTTP requests to access [Hadoop](http://hadoop.apache.org/) [MapReduce](http://hadoop.apache.org/docs/stable/mapred_tutorial.html) (or [YARN](http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)), [Pig](http://pig.apache.org/), [Hive](http://hive.apache.org/), and [HCatalog DDL]({{% ref "#hcatalog-ddl" %}}) from within applications. Data and code used by this API are maintained in [HDFS](http://hadoop.apache.org/docs/stable/hdfs_user_guide.html). HCatalog DDL commands are executed directly when requested. MapReduce, Pig, and Hive jobs are placed in queue by WebHCat (Templeton) servers and can be monitored for progress or stopped as required. Developers specify a location in HDFS into which Pig, Hive, and MapReduce results should be placed.

 ![](/attachments/34015492/34177184.jpg)

 WebHCat or Templeton?

-For backward compatibility, the original name Templeton is still used for WebHCat in some contexts. See [#Project Name]({{< ref "##project-name" >}}) below.
+For backward compatibility, the original name Templeton is still used for WebHCat in some contexts. See [#Project Name]({{% ref "##project-name" %}}) below.

 ## URL Format

@@ -34,7 +34,7 @@ For example, to check if the server is running you could access the following UR

 `ht``tp://www.myserver.com/templeton/v1/status`

-See [Reference: WebHCat Resources]({{< ref "webhcat-reference" >}}) for information about the individual REST resources.
+See [Reference: WebHCat Resources]({{% ref "webhcat-reference" %}}) for information about the individual REST resources.

 ## Security

@@ -108,7 +108,7 @@ The server creates three log files when in operation:

 In the tempelton-log4j.properties file you can set the location of these logs using the variable templeton.log.dir. This log4j.properties file is set in the server startup script.

-Hive log files are described in the [Hive Logging]({{< ref "#hive-logging" >}}) section of [Getting Started]({{< ref "gettingstarted-latest" >}}).
+Hive log files are described in the [Hive Logging]({{% ref "#hive-logging" %}}) section of [Getting Started]({{% ref "gettingstarted-latest" %}}).

 ## Project Name

@@ -117,7 +117,7 @@ The original work to add REST APIs to HCatalog was called Templeton. For backwar
  
 **Navigation Links**

-Next: [WebHCat Installation]({{< ref "webhcat-installwebhcat" >}})
+Next: [WebHCat Installation]({{% ref "webhcat-installwebhcat" %}})