Merged
20 changes: 10 additions & 10 deletions config.toml
@@ -30,15 +30,15 @@ theme = 'hive'
poweredby = "/general/poweredby/"
javaDocs = "/docs/javadocs/"
latest = "/docs/latest/"
-languageManual = "https://hive.apache.org/docs/latest/language/languagemanual"
+languageManual = "/docs/latest/language/languagemanual"
license2 = "https://www.apache.org/licenses/LICENSE-2.0.html"
privacyPolicy = "/general/privacypolicy/"
designDocs = "/development/desingdocs/"
hiveJira = "https://issues.apache.org/jira/projects/HIVE/issues"
-faq = "https://hive.apache.org/community/resources/hivedeveloperfaq"
+faq = "/community/resources/hivedeveloperfaq"
vcs = "/development/versioncontrol/"
committer = "/community/becomingcommitter/"
-contribute = "https://hive.apache.org/community/resources/howtocontribute"
+contribute = "/community/resources/howtocontribute"
resourcesForDev = "/community/resources/"
meetings = "/community/meetings/"
mailinglists = "/community/mailinglists/"
@@ -52,13 +52,13 @@ theme = 'hive'
announcements = "/general/downloads/#23-november-2025--release-420-available"

[params.features]
-acidTxn = "https://hive.apache.org/docs/latest/user/hive-transactions"
-hs2 = "https://hive.apache.org/docs/latest/user/hiveserver2-overview"
-hms = "https://hive.apache.org/development/desingdocs/design"
-compactions = "https://hive.apache.org/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact"
-repl = "https://hive.apache.org/docs/latest/admin/replication"
-cbo = "https://hive.apache.org/docs/latest/user/cost-based-optimization-in-hive"
-llap = "https://hive.apache.org/development/desingdocs/llap"
+acidTxn = "/docs/latest/user/hive-transactions"
+hs2 = "/docs/latest/user/hiveserver2-overview"
+hms = "/development/desingdocs/design"
+compactions = "/docs/latest/language/languagemanual-ddl#alter-tablepartition-compact"
+repl = "/docs/latest/admin/replication"
+cbo = "/docs/latest/user/cost-based-optimization-in-hive"
+llap = "/development/desingdocs/llap"
iceberg = "https://iceberg.apache.org/docs/latest/hive/"

[outputs]
10 changes: 5 additions & 5 deletions content/Development/desingdocs/design.md
@@ -48,15 +48,15 @@ The Metastore provides two important but often overlooked features of a data war

Metastore is an object store with a database or file backed store. The database backed store is implemented using an object-relational mapping (ORM) solution called the [DataNucleus](http://www.datanucleus.org/). The prime motivation for storing this in a relational database is queriability of metadata. Some disadvantages of using a separate data store for metadata instead of using HDFS are synchronization and scalability issues. Additionally there is no clear way to implement an object store on top of HDFS due to lack of random updates to files. This, coupled with the advantages of queriability of a relational store, made our approach a sensible one.

-The metastore can be configured to be used in a couple of ways: remote and embedded. In remote mode, the metastore is a [Thrift](https://thrift.apache.org/) service. This mode is useful for non-Java clients. In embedded mode, the Hive client directly connects to an underlying metastore using JDBC. This mode is useful because it avoids another system that needs to be maintained and monitored. Both of these modes can co-exist. (Update: Local metastore is a third possibility. See [Hive Metastore Administration](https://hive.apache.org/docs/latest/admin/adminmanual-metastore-administration) for details.)
+The metastore can be configured to be used in a couple of ways: remote and embedded. In remote mode, the metastore is a [Thrift](https://thrift.apache.org/) service. This mode is useful for non-Java clients. In embedded mode, the Hive client directly connects to an underlying metastore using JDBC. This mode is useful because it avoids another system that needs to be maintained and monitored. Both of these modes can co-exist. (Update: Local metastore is a third possibility. See [Hive Metastore Administration](/docs/latest/admin/adminmanual-metastore-administration) for details.)
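The remote-versus-embedded choice described in this passage comes down to a single configuration property. A minimal hive-site.xml sketch — host, port, and database name are placeholders:

```xml
<!-- Remote mode: clients reach a standalone metastore service over Thrift -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>

<!-- Embedded mode: leave hive.metastore.uris unset and point the client
     directly at the backing database over JDBC -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
```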

### Metastore Interface

Metastore provides a [Thrift interface](https://thrift.apache.org/docs/idl) to manipulate and query Hive metadata. Thrift provides bindings in many popular languages. Third party tools can use this interface to integrate Hive metadata into other business metadata repositories.

## Hive Query Language

-HiveQL is an SQL-like query language for Hive. It mostly mimics SQL syntax for creation of tables, loading data into tables and querying the tables. HiveQL also allows users to embed their custom map-reduce scripts. These scripts can be written in any language using a simple row-based streaming interface – read rows from standard input and write out rows to standard output. This flexibility comes at a cost of a performance hit caused by converting rows from and to strings. However, we have seen that users do not mind this given that they can implement their scripts in the language of their choice. Another feature unique to HiveQL is multi-table insert. In this construct, users can perform multiple queries on the same input data using a single HiveQL query. Hive optimizes these queries to share the scan of the input data, thus increasing the throughput of these queries several orders of magnitude. We omit more details due to lack of space. For a more complete description of the HiveQL language see the [language manual](https://hive.apache.org/docs/latest/language/languagemanual).
+HiveQL is an SQL-like query language for Hive. It mostly mimics SQL syntax for creation of tables, loading data into tables and querying the tables. HiveQL also allows users to embed their custom map-reduce scripts. These scripts can be written in any language using a simple row-based streaming interface – read rows from standard input and write out rows to standard output. This flexibility comes at a cost of a performance hit caused by converting rows from and to strings. However, we have seen that users do not mind this given that they can implement their scripts in the language of their choice. Another feature unique to HiveQL is multi-table insert. In this construct, users can perform multiple queries on the same input data using a single HiveQL query. Hive optimizes these queries to share the scan of the input data, thus increasing the throughput of these queries several orders of magnitude. We omit more details due to lack of space. For a more complete description of the HiveQL language see the [language manual](/docs/latest/language/languagemanual).
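The multi-table insert construct mentioned here shares one scan of the source table across several inserts. A minimal HiveQL sketch — table and column names are illustrative:

```sql
FROM page_views pv
INSERT OVERWRITE TABLE page_counts
  SELECT pv.page, COUNT(*)
  GROUP BY pv.page
INSERT OVERWRITE TABLE active_users
  SELECT DISTINCT pv.user_id;
```

Both destination tables are populated from a single pass over `page_views`, rather than one scan per `INSERT` statement.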

## Compiler

@@ -67,11 +67,11 @@ HiveQL is an SQL-like query language for Hive. It mostly mimics SQL syntax for c

## Optimizer

-More plan transformations are performed by the optimizer. The optimizer is an evolving component. As of 2011, it was rule-based and performed the following: column pruning and predicate pushdown. However, the infrastructure was in place, and there was work under progress to include other optimizations like map-side join. (Hive 0.11 added several [join optimizations](https://hive.apache.org/docs/latest/language/languagemanual-joinoptimization).)
+More plan transformations are performed by the optimizer. The optimizer is an evolving component. As of 2011, it was rule-based and performed the following: column pruning and predicate pushdown. However, the infrastructure was in place, and there was work under progress to include other optimizations like map-side join. (Hive 0.11 added several [join optimizations](/docs/latest/language/languagemanual-joinoptimization).)

-The optimizer can be enhanced to be cost-based (see [Cost-based optimization in Hive](https://hive.apache.org/docs/latest/user/cost-based-optimization-in-hive) and [HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775)). The sorted nature of output tables can also be preserved and used later on to generate better plans. The query can be performed on a small sample of data to guess the data distribution, which can be used to generate a better plan.
+The optimizer can be enhanced to be cost-based (see [Cost-based optimization in Hive](/docs/latest/user/cost-based-optimization-in-hive) and [HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775)). The sorted nature of output tables can also be preserved and used later on to generate better plans. The query can be performed on a small sample of data to guess the data distribution, which can be used to generate a better plan.
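Cost-based optimization later landed behind a configuration flag. A hedged sketch of enabling it and supplying the statistics the planner needs — the table name is illustrative:

```sql
SET hive.cbo.enable=true;
SET hive.stats.fetch.column.stats=true;

-- The cost-based planner can only cost plans for tables it has statistics on
ANALYZE TABLE page_views COMPUTE STATISTICS;
ANALYZE TABLE page_views COMPUTE STATISTICS FOR COLUMNS;
```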

-A [correlation optimizer](https://hive.apache.org/development/desingdocs/correlation-optimizer) was added in Hive 0.12.
+A [correlation optimizer](/development/desingdocs/correlation-optimizer) was added in Hive 0.12.

The plan is a generic operator tree, and can be easily manipulated.

12 changes: 6 additions & 6 deletions content/Development/desingdocs/designdocs.md
@@ -23,30 +23,30 @@ Proposals that appear in the "Completed" and "In Progress" sections should inclu
* [Table-level Statistics]({{< ref "statsdev" >}}) ([HIVE-1361](https://issues.apache.org/jira/browse/HIVE-1361))
* [Dynamic Partitions]({{< ref "dynamicpartitions" >}})
* [Binary Data Type]({{< ref "binary-datatype-proposal" >}}) ([HIVE-2380](https://issues.apache.org/jira/browse/HIVE-2380))
-* [Decimal Precision and Scale Support](https://hive.apache.org/attachments/27362075/34177489.pdf)
+* [Decimal Precision and Scale Support](/attachments/27362075/34177489.pdf)
* [HCatalog]({{< ref "hcatalog-base" >}}) (formerly [Howl]({{< ref "howl" >}}))
* [HiveServer2]({{< ref "hiveserver2-thrift-api" >}}) ([HIVE-2935](https://issues.apache.org/jira/browse/HIVE-2935))
* [Column Statistics in Hive]({{< ref "column-statistics-in-hive" >}}) ([HIVE-1362](https://issues.apache.org/jira/browse/HIVE-1362))
* [List Bucketing]({{< ref "listbucketing" >}}) ([HIVE-3026](https://issues.apache.org/jira/browse/HIVE-3026))
* [Group By With Rollup]({{< ref "groupbywithrollup" >}}) ([HIVE-2397](https://issues.apache.org/jira/browse/HIVE-2397))
-* [Enhanced Aggregation, Cube, Grouping and Rollup](https://hive.apache.org/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup) ([HIVE-3433](https://issues.apache.org/jira/browse/HIVE-3433))
+* [Enhanced Aggregation, Cube, Grouping and Rollup](/docs/latest/language/enhanced-aggregation-cube-grouping-and-rollup) ([HIVE-3433](https://issues.apache.org/jira/browse/HIVE-3433))
* [Optimizing Skewed Joins]({{< ref "skewed-join-optimization" >}}) ([HIVE-3086](https://issues.apache.org/jira/browse/HIVE-3086))
* [Correlation Optimizer]({{< ref "correlation-optimizer" >}}) ([HIVE-2206](https://issues.apache.org/jira/browse/HIVE-2206))
* [Hive on Tez]({{< ref "hive-on-tez" >}}) ([HIVE-4660](https://issues.apache.org/jira/browse/HIVE-4660))
+ [Hive-Tez Compatibility]({{< ref "hive-tez-compatibility" >}})
* [Vectorized Query Execution]({{< ref "vectorized-query-execution" >}}) ([HIVE-4160](https://issues.apache.org/jira/browse/HIVE-4160))
-* [Cost Based Optimizer in Hive](https://hive.apache.org/docs/latest/user/cost-based-optimization-in-hive) ([HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775))
+* [Cost Based Optimizer in Hive](/docs/latest/user/cost-based-optimization-in-hive) ([HIVE-5775](https://issues.apache.org/jira/browse/HIVE-5775))
* [Atomic Insert/Update/Delete](https://issues.apache.org/jira/browse/HIVE-5317) ([HIVE-5317](https://issues.apache.org/jira/browse/HIVE-5317))
* [Transaction Manager](https://issues.apache.org/jira/browse/HIVE-5843) ([HIVE-5843](https://issues.apache.org/jira/browse/HIVE-5843))
-* [SQL Standard based secure authorization](https://hive.apache.org/attachments/27362075/35193122.pdf) ([HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837))
+* [SQL Standard based secure authorization](/attachments/27362075/35193122.pdf) ([HIVE-5837](https://issues.apache.org/jira/browse/HIVE-5837))
* [Hybrid Hybrid Grace Hash Join]({{< ref "hybrid-grace-hash-join-v1-0" >}}) ([HIVE-9277](https://issues.apache.org/jira/browse/HIVE-9277))
* [LLAP Daemons]({{< ref "llap" >}}) ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926))
* [Support for Hive Replication]({{< ref "hivereplicationdevelopment" >}}) ([HIVE-7973](https://issues.apache.org/jira/browse/HIVE-7973))

## In Progress

* [Column Level Top K Statistics]({{< ref "top-k-stats" >}}) ([HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421))
-* [Hive on Spark](https://hive.apache.org/docs/latest/user/hive-on-spark) ([HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292))
+* [Hive on Spark](/docs/latest/user/hive-on-spark) ([HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292))
* [Hive on Spark: Join Design (HIVE-7613)]({{< ref "hive-on-spark-join-design-master" >}})
* [Improve ACID Performance](https://issues.apache.org/jira/secure/attachment/12823582/Design.Document.Improving%20ACID%20performance%20in%20Hive.02.docx) – download docx file ([HIVE-14035](https://issues.apache.org/jira/browse/HIVE-14035), [HIVE-14199](https://issues.apache.org/jira/browse/HIVE-14199), [HIVE-14233](https://issues.apache.org/jira/browse/HIVE-14233))
* [Query Results Caching]({{< ref "query-results-caching" >}}) ([HIVE-18513](https://issues.apache.org/jira/browse/HIVE-18513))
@@ -69,7 +69,7 @@ Proposals that appear in the "Completed" and "In Progress" sections should inclu
* [Updatable Views]({{< ref "updatableviews" >}}) ([HIVE-1143](https://issues.apache.org/jira/browse/HIVE-1143))
* [Phase 2 of Replication Development]({{< ref "hivereplicationv2development" >}}) ([HIVE-14841](https://issues.apache.org/jira/browse/HIVE-14841))
* [Subqueries in SELECT]({{< ref "subqueries-in-select" >}}) ([HIVE-16091](https://issues.apache.org/jira/browse/HIVE-16091))
-* [DEFAULT keyword](https://hive.apache.org/development/desingdocs/default-keyword) [(HIVE-19059)](https://issues.apache.org/jira/browse/HIVE-19059)
+* [DEFAULT keyword](/development/desingdocs/default-keyword) [(HIVE-19059)](https://issues.apache.org/jira/browse/HIVE-19059)
* [Hive remote databases/tables]({{< ref "hive-remote-databases-tables" >}})

## Incomplete
@@ -10,7 +10,7 @@ Cameron Moberg (Google), Zhou Fang (Google), Feng Lu (Google), Thejas Nair (Clo

# Objective

-* To modernize [Hive Metastore’s](https://hive.apache.org/docs/latest/admin/adminmanual-metastore-3-0-administration) interface with a state-of-the-art serving layer based on gRPC while also keeping it backwards compatible with Thrift for minimal upgrade toil;
+* To modernize [Hive Metastore’s](/docs/latest/admin/adminmanual-metastore-3-0-administration) interface with a state-of-the-art serving layer based on gRPC while also keeping it backwards compatible with Thrift for minimal upgrade toil;
* To achieve this the proposed design is to add support for a proxy-layer between the Thrift interface and a new gRPC interface that allows for in-memory request/response translation in-between;
* To expand the Hive client to work with Hive Metastore server in both gRPC and Thrift mode.

@@ -171,7 +171,7 @@ As of [Hive 2.0.0](https://issues.apache.org/jira/browse/HIVE-11306), a cheap Bl
# References

* Hybrid Hybrid Grace Hash Join presentation by Mostafa
-* MapJoinOptimization <https://hive.apache.org/development/desingdocs/mapjoinoptimization>
+* [MapJoinOptimization](/development/desingdocs/mapjoinoptimization)
* [HIVE-1641](https://issues.apache.org/jira/browse/HIVE-1641) add map joined table to distributed cache
* [HIVE-1642](https://issues.apache.org/jira/browse/HIVE-1642) Convert join queries to map-join based on size of table/row
* Database Management Systems, 3rd ed
2 changes: 1 addition & 1 deletion content/Development/desingdocs/llap.md
@@ -186,7 +186,7 @@ The watch and running nodes options were added in release 2.2.0 with [HIVE-15217

[LLAP Design Document](https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf)

-[Hive Contributor Meetup Presentation](https://hive.apache.org/attachments/27362054/LLAP-Meetup-Nov.ppsx)
+[Hive Contributor Meetup Presentation](/attachments/27362054/LLAP-Meetup-Nov.ppsx)

## Attachments:

4 changes: 2 additions & 2 deletions content/Development/desingdocs/storagehandlers.md
@@ -19,7 +19,7 @@ date: 2024-12-12

This page documents the storage handler support being added to Hive as part of work on [HBaseIntegration]({{< ref "hbaseintegration" >}}). The motivation is to make it possible to allow Hive to access data stored and managed by other systems in a modular, extensible fashion.

-Besides HBase, a storage handler implementation is also available for [Hypertable](http://code.google.com/p/hypertable/wiki/HiveExtension), and others are being developed for [Cassandra](https://issues.apache.org/jira/browse/HIVE-1434), [Azure Table](https://blogs.msdn.microsoft.com/mostlytrue/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight/), [JDBC](https://hive.apache.org/docs/latest/user/jdbc-storage-handler) (MySQL and others), [MongoDB](https://github.com/yc-huang/Hive-mongo), [ElasticSearch](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html), [Phoenix HBase](https://phoenix.apache.org/hive_storage_handler.html?platform=hootsuite), [VoltDB](https://issues.voltdb.com/browse/ENG-10736?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel) and [Google Spreadsheets](https://github.com/balshor/gdata-storagehandler).  A [Kafka handler](https://github.com/HiveKa/HiveKa) demo is available.
+Besides HBase, a storage handler implementation is also available for [Hypertable](http://code.google.com/p/hypertable/wiki/HiveExtension), and others are being developed for [Cassandra](https://issues.apache.org/jira/browse/HIVE-1434), [Azure Table](https://blogs.msdn.microsoft.com/mostlytrue/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight/), [JDBC](/docs/latest/user/jdbc-storage-handler) (MySQL and others), [MongoDB](https://github.com/yc-huang/Hive-mongo), [ElasticSearch](https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html), [Phoenix HBase](https://phoenix.apache.org/hive_storage_handler.html?platform=hootsuite), [VoltDB](https://issues.voltdb.com/browse/ENG-10736?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel) and [Google Spreadsheets](https://github.com/balshor/gdata-storagehandler).  A [Kafka handler](https://github.com/HiveKa/HiveKa) demo is available.

Hive storage handler support builds on existing extensibility features in both Hadoop and Hive:

@@ -63,7 +63,7 @@ CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name

```

-When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS cannot be specified, however starting from [Hive 4.0](https://hive.apache.org/docs/latest/user/hive-iceberg-integration), they can coexist to create the Iceberg table, this is the only exception. Optional SERDEPROPERTIES can be specified as part of the STORED BY clause and will be passed to the serde provided by the storage handler.
+When STORED BY is specified, then row_format (DELIMITED or SERDE) and STORED AS cannot be specified, however starting from [Hive 4.0](/docs/latest/user/hive-iceberg-integration), they can coexist to create the Iceberg table, this is the only exception. Optional SERDEPROPERTIES can be specified as part of the STORED BY clause and will be passed to the serde provided by the storage handler.

See [CREATE TABLE]({{< ref "#create-table" >}}) and [Row Format, Storage Format, and SerDe]({{< ref "#row-format,-storage-format,-and-serde" >}}) for more information.
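The canonical use of the STORED BY clause described above is the HBase storage handler. A minimal HiveQL sketch matching that syntax — the table, column-family, and HBase table names are illustrative:

```sql
CREATE TABLE hbase_table_1 (key INT, value STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
-- SERDEPROPERTIES are passed through to the handler's serde
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val")
TBLPROPERTIES ("hbase.table.name" = "xyz");
```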
