Merged
2 changes: 0 additions & 2 deletions content/Development/desingdocs/column-statistics-in-hive.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Column Statistics in Hive

{{< toc >}}

### **Introduction**

This document describes changes to a) HiveQL, b) metastore schema, and c) metastore Thrift API to support column level statistics in Hive. Please note that the document doesn’t describe the changes needed to persist histograms in the metastore yet.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/design.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This page contains details about the Hive design and architecture. A brief technical report about Hive is available at [hive.pdf]({{< ref "#hive-pdf" >}}).

{{< toc >}}

## Hive Architecture

Figure 1
2 changes: 0 additions & 2 deletions content/Development/desingdocs/dynamicpartitions.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : DynamicPartitions

{{< toc >}}

## Documentation

This is the design document for dynamic partitions in Hive. Usage information is also available:
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Enabling gRPC in Hive/Hive Metastore (Proposal)

{{< toc >}}

## Contacts
Cameron Moberg (Google), Zhou Fang (Google), Feng Lu (Google), Thejas Nair (Cloudera), Vihang Karajgaonkar (Cloudera), Naveen Gangam (Cloudera)

2 changes: 0 additions & 2 deletions content/Development/desingdocs/filterpushdowndev.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This document explains how we are planning to add support in Hive's optimizer for pushing filters down into physical access methods. This is an important optimization for minimizing the amount of data scanned and processed by an access method (e.g. for an indexed key lookup), as well as reducing the amount of data passed into Hive for further query evaluation.

{{< toc >}}

## Use Cases

Below are the main use cases we are targeting.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/groupbywithrollup.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Group By With Rollup

{{< toc >}}

## Terminology

* (No) Map Aggr: Shorthand for whether the configuration variable hive.map.aggr is set to true or false, meaning mapside aggregation is allowed or not respectively.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/hbasebulkload.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This page explains how to use Hive to bulk load data into a new (empty) HBase table per [HIVE-1295](https://issues.apache.org/jira/browse/HIVE-1295). (If you're not using a build which contains this functionality yet, you'll need to build from source and make sure this patch and HIVE-1321 are both applied.)

{{< toc >}}

## Overview

Ideally, bulk load from Hive into HBase would be part of [HBaseIntegration]({{< ref "hbaseintegration" >}}), making it as simple as this:
@@ -11,8 +11,6 @@ Guide for contributors to the metastore on hbase development work.  Umbrella JIR

This work is discontinued and the code is removed in release 3.0.0 ([HIVE-17234](https://issues.apache.org/jira/browse/HIVE-17234)).

{{< toc >}}

# Building

You will need to download the source for Tephra and build it from the develop branch.  You need Tephra 0.5.1-SNAPSHOT.  You can get Tephra from [Cask's github](https://github.com/caskdata/tephra).  Switch to the branch develop and doing 'mvn install' will build the version you need.
@@ -7,8 +7,6 @@ date: 2024-12-12

This project has been abandoned. We're leaving the design doc here in case someone decides to attempt this project in the future.

{{< toc >}}

## Use Cases

Inside facebook, we are running out of power inside a data center (physical cluster), and we have a need to have a bigger cluster.
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Hive on Spark: Join Design Master

{{< toc >}}

## Purpose and Prerequisites

The purpose of this document is to summarize the findings of all the research of different joins and describe a unified design to attack the problem in Spark.  It will identify the optimization processors will be involved and their responsibilities.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/hive-on-tez.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Hive on Tez

{{< toc >}}

# Overview

[Tez](http://tez.apache.org/) is a new application framework built on Hadoop Yarn that can execute complex directed acyclic graphs of general data processing tasks. In many ways it can be thought of as a more flexible and powerful successor of the map-reduce framework.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/hivereplicationdevelopment.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : HiveReplicationDevelopment

{{< toc >}}

# Introduction

Replication in the context of databases and warehouses is the process of duplication of entities from one warehouse to another. This can be at the broader level of an entire database, or at a smaller level such as a table or partition. The goal of replication is to have a replica which changes whenever the base entity changes.
@@ -9,8 +9,6 @@ This document describes the second version of Hive Replication. Please refer to

This work is under development and interfaces are subject to change. This has been designed for use in conjunction with external orchestration tools, which would be responsible for co-ordinating the right sequence of commands between source and target clusters, fault tolerance/failure handling, and also providing correct configuration options that are necessary to be able to do cross cluster replication.

{{< toc >}}

# Version information

As of Hive 3.0.0 release : only managed table replication where Hive user owns the table contents is supported. External tables, ACID tables, statistics and constraint replication are not supported.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/hybrid-grace-hash-join-v1-0.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Hybrid Hybrid Grace Hash Join, v1.0

{{< toc >}}

# Overview

We are proposing an enhanced hash join algorithm called “hybrid hybrid grace hash join”. We can benefit from this feature as illustrated below:
2 changes: 0 additions & 2 deletions content/Development/desingdocs/indexdev-bitmap.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Bitmap Indexing

{{< toc >}}

## Introduction

This document explains the proposed design for adding a bitmap index handler (<https://issues.apache.org/jira/browse/HIVE-1803>).
2 changes: 0 additions & 2 deletions content/Development/desingdocs/indexdev.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Indexes

{{< toc >}}

## Indexing Is Removed since 3.0

There are alternate options which might work similarily to indexing:
2 changes: 0 additions & 2 deletions content/Development/desingdocs/listbucketing.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : ListBucketing

{{< toc >}}

# Goal

The top level problem is as follows:
2 changes: 0 additions & 2 deletions content/Development/desingdocs/llap.md
@@ -8,8 +8,6 @@ date: 2024-12-12
Live Long And Process (LLAP) functionality was added in Hive 2.0 ([HIVE-7926](https://issues.apache.org/jira/browse/HIVE-7926) and associated tasks). [HIVE-9850](https://issues.apache.org/jira/browse/HIVE-9850) links documentation, features, and issues for this enhancement.
For configuration of LLAP, see the LLAP Section of [Configuration Properties]({{< ref "#configuration-properties" >}}).

{{< toc >}}

## Overview

Hive has become significantly faster thanks to various features and improvements that were built by the community in recent years, including [Tez]({{< ref "hive-on-tez" >}}) and [Cost-based-optimization]({{< ref "cost-based-optimization-in-hive" >}}). The following were needed to take Hive to the next level:
2 changes: 0 additions & 2 deletions content/Development/desingdocs/locking.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Locking

{{< toc >}}

# Hive Concurrency Model

## Use Cases
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : MapJoin and Partition Pruning

{{< toc >}}

# Overview

In Hive, Map-Join is a technique that materializes data for all tables involved in the join except for the largest table and then large table is streamed over the materialized data from small tables. Map-Join is often a good join approach for star-schema joins where the fact table will be streamed over materialized dimension tables.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/mapjoinoptimization.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : MapJoinOptimization

{{< toc >}}

# 1. Map Join Optimization

## 1.1 Using Distributed Cache to Propagate Hashtable File
2 changes: 0 additions & 2 deletions content/Development/desingdocs/outerjoinbehavior.md
@@ -7,8 +7,6 @@ date: 2024-12-12

# Hive Outer Join Behavior

{{< toc >}}

This document is based on a writeup of [DB2 Outer Join Behavior](http://www.ibm.com/developerworks/data/library/techarticle/purcell/0112purcell.html). The original HTML can be found [here](/attachments/OuterJoinBehavior.html).

## Definitions
2 changes: 0 additions & 2 deletions content/Development/desingdocs/partitionedviews.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This is a followup to [ViewDev]({{< ref "viewdev" >}}) for adding partition-awareness to views.

{{< toc >}}

# Use Cases

1. An administrator wants to create a set of views as a table/column renaming layer on top of an existing set of base tables, without breaking any existing dependencies on those tables. To read-only users, the views should behave exactly the same as the underlying tables in every way. Among other things, this means users should be able to browse available partitions.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/statsdev.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This document describes the support of statistics for Hive tables (see [HIVE-33](http://issues.apache.org/jira/browse/HIVE-33)).

{{< toc >}}

## Motivation

Statistics such as the number of rows of a table or partition and the histograms of a particular interesting column are important in many ways. One of the key use cases of statistics is query optimization. Statistics serve as the input to the cost functions of the optimizer so that it can compare different plans and choose among them. Statistics may sometimes meet the purpose of the users' queries. Users can quickly get the answers for some of their queries by only querying stored statistics rather than firing long-running execution plans. Some examples are getting the quantile of the users' age distribution, the top 10 apps that are used by people, and the number of distinct sessions.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/theta-join.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Theta Join

{{< toc >}}

## Preliminaries

### Overview
2 changes: 0 additions & 2 deletions content/Development/desingdocs/top-k-stats.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This document is an addition to [Statistics in Hive](https://hive.apache.org/development/desingdocs/statsdev). It describes the support of collecting column level top K values for Hive tables (see [HIVE-3421](https://issues.apache.org/jira/browse/HIVE-3421)).

{{< toc >}}

## Scope

In addition to the partition statistics, column level top K values can also be estimated for Hive tables.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/vectorized-query-execution.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Vectorized Query Execution

{{< toc >}}

# Introduction

Vectorized query execution is a Hive feature that greatly reduces the CPU usage for typical query operations like scans, filters, aggregates, and joins. A standard query execution system processes one row at a time. This involves long code paths and significant metadata interpretation in the inner loop of execution. Vectorized query execution streamlines operations by processing a block of 1024 rows at a time. Within the block, each column is stored as a vector (an array of a primitive data type). Simple operations like arithmetic and comparisons are done by quickly iterating through the vectors in a tight loop, with no or very few function calls or conditional branches inside the loop. These loops compile in a streamlined way that uses relatively few instructions and finishes each instruction in fewer clock cycles, on average, by effectively using the processor pipeline and cache memory. A detailed design document is attached to the vectorized query execution JIRA, at <https://issues.apache.org/jira/browse/HIVE-4160>.
2 changes: 0 additions & 2 deletions content/Development/desingdocs/viewdev.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Views

{{< toc >}}

## Use Cases

Views (<http://issues.apache.org/jira/browse/HIVE-972>) are a standard DBMS feature and their uses are well understood. A typical use case might be to create an interface layer with a consistent entity/attribute naming scheme on top of an existing set of inconsistently named tables, without having to cause disruption due to direct modification of the tables. More advanced use cases would involve predefined filters, joins, aggregations, etc for simplifying query construction by end users, as well as sharing common definitions within ETL pipelines.
2 changes: 0 additions & 2 deletions content/Development/gettingstarted-latest.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : GettingStarted

{{< toc >}}

## Installation and Configuration

You can install a stable release of Hive by downloading a tarball, or you can download the source code and build Hive from that.
2 changes: 0 additions & 2 deletions content/Development/qtest.md
@@ -26,8 +26,6 @@ draft: false

Query File Test is a JUnit-based integration test suite for Apache Hive. Developers write any SQL; the testing framework runs it and verifies the result and output.

{{< toc >}}

## Tutorial: How to run a specific test case

### Preparation
2 changes: 0 additions & 2 deletions content/community/bylaws.md
@@ -11,8 +11,6 @@ Hive is a project of the [Apache Software Foundation](http://www.apache.org/foun

Hive is typical of Apache projects in that it operates under a set of principles, known collectively as the 'Apache Way'. If you are new to Apache development, please refer to the [Incubator Project](http://incubator.apache.org/) for more information on how Apache projects operate.

{{< toc >}}

## Roles and Responsibilities

Apache projects define a set of roles with associated rights and responsibilities. These roles govern what tasks an individual may perform within the project. The roles are defined in the following sections.
2 changes: 0 additions & 2 deletions content/community/resources/developerguide.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : DeveloperGuide

{{< toc >}}

## Code Organization and a Brief Architecture

### Introduction
2 changes: 0 additions & 2 deletions content/community/resources/hive-apis-overview.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This page aims to catalogue and describe the various public facing APIs exposed by Hive in order to inform developers wishing to integrate their applications and frameworks with the Hive ecosystem. To date the following APIs have been identified in the Hive project that are either considered public, or widely used in the public domain:

{{< toc >}}

# API categories

The APIs can be segmented into two conceptual categories: operation based APIs and query based APIs.
2 changes: 0 additions & 2 deletions content/community/resources/hivedeveloperfaq.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : HiveDeveloperFAQ

{{< toc >}}

## Developing

### How do I move some files?
2 changes: 0 additions & 2 deletions content/community/resources/howtocommit.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This page contains guidelines for committers of the Apache Hive project. (If you're currently a contributor, and are interested in how we add new committers, read [BecomingACommitter]({{< ref "/community/becomingcommitter" >}}))

{{< toc >}}

## New committers

New committers are encouraged to first read Apache's generic committer documentation:
2 changes: 0 additions & 2 deletions content/community/resources/howtocontribute.md
@@ -7,8 +7,6 @@ date: 2024-12-12

This page describes the mechanics of *how* to contribute software to Apache Hive. For ideas about *what* you might contribute, please see open tickets in [Jira](https://issues.apache.org/jira/browse/HIVE).

{{< toc >}}

## Getting the Source Code

First of all, you need the Hive source code.
2 changes: 0 additions & 2 deletions content/community/resources/howtorelease.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : HowToRelease

{{< toc >}}

## Introduction

This page is prepared for Hive committers. You need committer rights to create a new Hive release.
2 changes: 0 additions & 2 deletions content/community/resources/presentations.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Presentations

{{< toc >}}

# Hive Meetups

## January 2016 Hive User Group Meetup
2 changes: 0 additions & 2 deletions content/community/resources/unit-testing-hive-sql.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : Unit Testing Hive SQL

{{< toc >}}

# Motivations

Hive is widely applied as a solution to numerous distinct problem types in the domain of big data. Quite clearly it is often used for the ad hoc querying of large datasets. However it is also used to implement ETL type processes. Unlike ad hoc queries, the Hive SQL written for ETLs has some distinct attributes:
2 changes: 0 additions & 2 deletions content/docs/latest/admin/adminmanual-configuration.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : AdminManual Configuration

{{< toc >}}

## Configuring Hive

A number of configuration variables in Hive can be used by the administrator to change the behavior for their installations and user sessions. These variables can be configured in any of the following ways, shown in the order of preference:
2 changes: 0 additions & 2 deletions content/docs/latest/admin/adminmanual-installation.md
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : AdminManual Installation

{{< toc >}}

# Installing Hive

You can install a stable release of Hive by downloading and unpacking a tarball, or you can download the source code and build Hive using Maven (release 0.13 and later) or Ant (release 0.12 and earlier).
@@ -5,8 +5,6 @@ date: 2024-12-12

# Apache Hive : AdminManual Metastore 3.0 Administration

{{< toc >}}

## Version Note

**This document applies only to the Metastore in Hive 3.0 and later releases.**  For Hive 0, 1, and 2 releases please see the [Metastore Administration]({{< ref "adminmanual-metastore-administration" >}}) document.
@@ -7,8 +7,6 @@ date: 2024-12-12

This page only documents the MetaStore in Hive 2.x and earlier. For 3.x and later releases please see [AdminManual Metastore 3.0 Administration]({{< ref "adminmanual-metastore-3-0-administration" >}})

{{< toc >}}

### Introduction

All the metadata for Hive tables and partitions are accessed through the Hive Metastore. Metadata is persisted using [JPOX](http://www.datanucleus.org/) ORM solution (Data Nucleus) so any database that is supported by it can be used by Hive. Most of the commercial relational databases and many open source databases are supported. See the list of [supported databases]({{< ref "#supported-databases" >}}) in section below.
2 changes: 0 additions & 2 deletions content/docs/latest/admin/hive-on-spark-getting-started.md
@@ -12,8 +12,6 @@ set hive.execution.engine=spark;
```
Hive on Spark was added in [HIVE-7292](https://issues.apache.org/jira/browse/HIVE-7292).

{{< toc >}}

## Version Compatibility

Hive on Spark is only tested with a specific version of Spark, so a given version of Hive is only guaranteed to work with a specific version of Spark. Other versions of Spark may work with a given version of Hive, but that is not guaranteed. Below is a list of Hive versions and their corresponding compatible Spark versions.
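Every file in this change reports 0 additions and 2 deletions, and each hunk header shrinks from 8 old lines to 6 new ones around the `{{< toc >}}` Hugo shortcode. The scraped view no longer shows the `-` markers, so the following is an assumption: the two deleted lines are the shortcode itself and one adjacent blank line. Under that assumption, the edit can be reproduced mechanically. The script below is a hypothetical sketch, not the tooling used for this PR; `strip_toc`, `strip_toc_in_tree`, and the default `content` root are illustrative names.

```python
import sys
from pathlib import Path

# Hugo shortcode that this change removes from each page (assumed exact match).
TOC_SHORTCODE = "{{< toc >}}"


def strip_toc(text: str) -> str:
    """Drop every `{{< toc >}}` line plus the single blank line after it."""
    lines = text.split("\n")
    kept = []
    i = 0
    while i < len(lines):
        if lines[i].strip() == TOC_SHORTCODE:
            i += 1  # skip the shortcode line
            if i < len(lines) and lines[i].strip() == "":
                i += 1  # skip one following blank line so no double gap remains
            continue
        kept.append(lines[i])
        i += 1
    return "\n".join(kept)


def strip_toc_in_tree(root: Path) -> list[Path]:
    """Rewrite each Markdown file under `root` that contains the shortcode."""
    changed = []
    for path in sorted(root.rglob("*.md")):
        original = path.read_text(encoding="utf-8")
        updated = strip_toc(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical usage: python strip_toc.py content
    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("content")
    for p in strip_toc_in_tree(root):
        print(p)
```

Removing exactly one trailing blank line keeps the paragraph spacing between the page title and the first section intact, which matches the 8 to 6 line count in each hunk header.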