Hive Table Statistics
Minimum Required Role: Cluster Administrator (also provided by Full Administrator)
Statistics for Hive can include the number of rows in tables or partitions and histograms of interesting columns. The query optimizer's cost functions use these statistics to choose efficient query plans.
If your cluster has Impala, you can use the Impala implementation to compute statistics. The Impala implementation of COMPUTE STATS is available in CDH 5.0.0 or higher and in Impala 1.2.2 or higher. It requires no setup steps and is preferred over the Hive implementation. See Overview of Table Statistics. If you are running an older version of Impala, you can collect statistics on a Hive table by running the following commands from a Beeline client connected to HiveServer2:
analyze table <table name> compute statistics;
analyze table <table name> compute statistics for columns <all columns of a table>;
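The following is a minimal sketch of both approaches, filled in for a hypothetical table. The table names `web_logs` and `web_logs_by_day`, their columns, and the partition value are assumptions for illustration only. Exact syntax support (for example, column statistics on partitioned tables) varies by Hive version, so check it against the release in your cluster. The first part runs in Beeline against HiveServer2; the second part shows the Impala equivalent, run from `impala-shell`.

```sql
-- ===== Hive (Beeline connected to HiveServer2) =====
-- Hypothetical unpartitioned table: web_logs(user_id, url, status).

-- Basic table statistics: row count, number of files, total size.
ANALYZE TABLE web_logs COMPUTE STATISTICS;

-- Column statistics for an explicit list of columns.
ANALYZE TABLE web_logs COMPUTE STATISTICS FOR COLUMNS user_id, url, status;

-- Hypothetical partitioned table: restrict the work to a single partition.
ANALYZE TABLE web_logs_by_day PARTITION (ds='2017-08-01') COMPUTE STATISTICS;

-- Check the results: basic statistics (numRows, totalSize, ...) appear
-- under "Table Parameters"; naming a column shows that column's statistics.
DESCRIBE FORMATTED web_logs;
DESCRIBE FORMATTED web_logs status;

-- ===== Impala (impala-shell), CDH 5.0.0+ / Impala 1.2.2+ =====
-- A single statement gathers both table and column statistics.
COMPUTE STATS web_logs;

-- Inspect what was collected.
SHOW TABLE STATS web_logs;
SHOW COLUMN STATS web_logs;
```

The Impala form needs one statement where Hive needs two. This is one reason the Impala implementation is preferred when it is available.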
Page generated August 14, 2017.
©2016 Cloudera, Inc. All rights reserved.