1 change: 1 addition & 0 deletions .gitignore
@@ -5,3 +5,4 @@ themes/hive/.DS_Store
themes/hive/static/.DS_Store
.hugo_build.lock
public
target
51 changes: 27 additions & 24 deletions README.md
@@ -1,20 +1,21 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License. -->

# Apache Hive Documentation Site

This repository contains the code for generating the Apache Hive web site.
@@ -25,13 +26,14 @@ It's built with Hugo and hosted at https://hive.apache.org.
* Clone this repository.
* Install [hugo] on macOS:

```brew install hugo```
* For other operating systems, refer to [hugo-install].
* To verify your installation:

```hugo version```

* To build and start the Hugo server run:

```
hugo server -D

Running in Fast Render Mode. For full rebuilds on change: hugo server --disableFastRender
Web Server is available at http://localhost:1313/ (bind address 127.0.0.1)
Press Ctrl+C to stop
```
* Navigate to `http://localhost:1313/` to view the site locally.


### To Add New Content

* To add a new markdown file:
`hugo new general/Downloads.md`

* Update `themes/hive/layouts/partials/menu.html` and `config.toml` to add a navigation link to the markdown page as needed.

### Pushing to site
Commit and push the changes to the main branch. The site is automatically deployed from the site directory.


[hugo]: https://gohugo.io/getting-started/quick-start/
[hugo-install]: https://gohugo.io/installation/

1 change: 1 addition & 0 deletions content/Development/_index.md
@@ -2,3 +2,4 @@
title: "Development"
date: 2025-07-24
---

1 change: 1 addition & 0 deletions content/Development/desingdocs/_index.md
@@ -2,3 +2,4 @@
title: "Design Documents"
date: 2025-07-24
---

23 changes: 8 additions & 15 deletions content/Development/desingdocs/accessserver-design-proposal.md
@@ -46,18 +46,15 @@ Hive has a powerful data model that allows users to map logical tables and parti

HCatalog's Storage Based Authorization model is explained in more detail in the [HCatalog documentation](http://hive.apache.org/docs/hcat_r0.5.0/authorization.html), but the following set of quotes provides a good high-level overview:

> ... when a file system is used for storage, there is a directory corresponding to a database or a table. With this authorization model, **the read/write permissions a user or group has for this directory determine the permissions a user has on the database or table**.
>
> ...
>
> For example, an alter table operation would check if the user has permissions on the table directory before allowing the operation, even if it might not change anything on the file system.
>
> ...
>
> When the database or table is backed by a file system that has a Unix/POSIX-style permissions model (like HDFS), there are read(r) and write(w) permissions you can set for the owner user, group and ‘other’. The file system’s logic for determining if a user has permission **on the directory or file** will be used by Hive.

There are several problems with this approach, the first of which is actually hinted at by the inconsistency highlighted in the preceding quote. To determine whether a particular user has read permission on table `foo`, HCatalog's [HdfsAuthorizationProvider class](http://svn.apache.org/repos/asf/hive/branches/branch-0.11/hcatalog/core/src/main/java/org/apache/hcatalog/security/HdfsAuthorizationProvider.java) checks to see if the user has read permission on the corresponding HDFS directory `/hive/warehouse/foo` that contains the table's data. However, in HDFS having [read permission on a directory](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html) only implies that you have the ability to list the contents of the directory – it doesn't have any effect on your ability to read the files contained in the directory.
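The mismatch can be made concrete with a small model. This is illustrative Python, not HCatalog code; the paths, permission strings, and function names are invented for the sketch. It shows a storage-based check granting table read because the user can read the warehouse *directory*, even though the data *files* inside are unreadable.

```python
# Toy model of HDFS-style POSIX permissions; layout and names are invented
# for illustration and do not mirror HdfsAuthorizationProvider internals.
fs = {
    "/hive/warehouse/foo":        {"kind": "dir",  "perms": "r-x"},  # user may list
    "/hive/warehouse/foo/part-0": {"kind": "file", "perms": "---"},  # ...but not read data
}

def can_read(path):
    return "r" in fs[path]["perms"]

def storage_based_table_read_check(table_dir):
    # The proxy check: directory read permission stands in for table
    # read permission.
    return can_read(table_dir)

def actually_readable(data_file):
    # What matters for reading the table's rows: the files themselves.
    return can_read(data_file)

# The authorization check says "yes", but the data is unreadable:
assert storage_based_table_read_check("/hive/warehouse/foo") is True
assert actually_readable("/hive/warehouse/foo/part-0") is False
```

The divergence is exactly the one the document describes: directory read permission governs listing, not the readability of the contained files.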

@@ -100,7 +97,3 @@ Finally, red is used in the preceding diagram to highlight HCatalog components w

16 changes: 6 additions & 10 deletions content/Development/desingdocs/binary-datatype-proposal.md
@@ -21,9 +21,9 @@ create table binary_table (a string, b binary);

### How is 'binary' represented internally in Hive

Binary type in Hive will map to the 'binary' data type in Thrift.

The primitive Java object for the 'binary' type is ByteArrayRef.

The PrimitiveWritableObject for the 'binary' type is BytesWritable.

@@ -41,13 +41,13 @@ As with other types, binary data will be sent to the transform script in String form
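A transform script therefore sees a binary column as just another tab-separated string field. The sketch below is a hypothetical TRANSFORM-style script, not Hive code; how Hive escapes the raw bytes is not specified here, so the script simply treats the field as an opaque string.

```python
# Hypothetical TRANSFORM-style script: each row is a tab-separated line,
# and the binary column arrives "in String form" per the proposal.
# The escaping of the bytes is an assumption; the field is treated opaquely.
def process(line: str) -> str:
    key, payload = line.rstrip("\n").split("\t", 1)
    # Report the key and the length of the binary payload as received.
    return f"{key}\t{len(payload)}"

# A row whose second column is a binary value rendered as a string:
sample = "row1\t\x00\x01\xffdata"
print(process(sample))  # -> row1\t7
```

In a real pipeline the function body would run inside the loop `for line in sys.stdin: print(process(line))`, as Hive streams rows to the script over stdin.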

### Supported Serde:

ColumnarSerde

BinarySortableSerde

LazyBinaryColumnarSerde

LazyBinarySerde

LazySimpleSerde

@@ -57,7 +57,3 @@ Group-by and unions will be supported on columns with 'binary' type

<https://issues.apache.org/jira/browse/HIVE-2380>





151 changes: 76 additions & 75 deletions content/Development/desingdocs/column-statistics-in-hive.md
@@ -30,59 +30,60 @@ To view column stats:
```
describe formatted [table_name] [column_name];
```

### **Metastore Schema**

To persist column-level statistics, we propose to add the following new tables:

```
CREATE TABLE TAB_COL_STATS
(
 CS_ID NUMBER NOT NULL,
 TBL_ID NUMBER NOT NULL,
 COLUMN_NAME VARCHAR(128) NOT NULL,
 COLUMN_TYPE VARCHAR(128) NOT NULL,
 TABLE_NAME VARCHAR(128) NOT NULL,
 DB_NAME VARCHAR(128) NOT NULL,
 LOW_VALUE RAW,
 HIGH_VALUE RAW,
 NUM_NULLS BIGINT,
 NUM_DISTINCTS BIGINT,
 BIT_VECTOR BLOB, /* introduced in HIVE-16997 in Hive 3.0.0 */
 AVG_COL_LEN DOUBLE,
 MAX_COL_LEN BIGINT,
 NUM_TRUES BIGINT,
 NUM_FALSES BIGINT,
 LAST_ANALYZED BIGINT NOT NULL)

ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_PK PRIMARY KEY (CS_ID);

ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_FK1 FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED;

CREATE TABLE PART_COL_STATS
(
 CS_ID NUMBER NOT NULL,
 PART_ID NUMBER NOT NULL,
 DB_NAME VARCHAR(128) NOT NULL,
 COLUMN_NAME VARCHAR(128) NOT NULL,
 COLUMN_TYPE VARCHAR(128) NOT NULL,
 TABLE_NAME VARCHAR(128) NOT NULL,
 PART_NAME VARCHAR(128) NOT NULL,
 LOW_VALUE RAW,
 HIGH_VALUE RAW,
 NUM_NULLS BIGINT,
 NUM_DISTINCTS BIGINT,
 BIT_VECTOR BLOB, /* introduced in HIVE-16997 in Hive 3.0.0 */
 AVG_COL_LEN DOUBLE,
 MAX_COL_LEN BIGINT,
 NUM_TRUES BIGINT,
 NUM_FALSES BIGINT,
 LAST_ANALYZED BIGINT NOT NULL)

ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_PK PRIMARY KEY (CS_ID);
```
@@ -93,44 +94,44 @@ ALTER TABLE COLUMN_STATISTICS ADD CONSTRAINT COLUMN_STATISTICS_FK1 FOREIGN KEY (
We propose to add the following Thrift structs to transport column statistics:

```
struct BooleanColumnStatsData {
 1: required i64 numTrues,
 2: required i64 numFalses,
 3: required i64 numNulls
}

struct DoubleColumnStatsData {
 1: required double lowValue,
 2: required double highValue,
 3: required i64 numNulls,
 4: required i64 numDVs,
 5: optional string bitVectors
}

struct LongColumnStatsData {
 1: required i64 lowValue,
 2: required i64 highValue,
 3: required i64 numNulls,
 4: required i64 numDVs,
 5: optional string bitVectors
}

struct StringColumnStatsData {
 1: required i64 maxColLen,
 2: required double avgColLen,
 3: required i64 numNulls,
 4: required i64 numDVs,
 5: optional string bitVectors
}

struct BinaryColumnStatsData {
 1: required i64 maxColLen,
 2: required double avgColLen,
 3: required i64 numNulls
}
```

struct Decimal {
1: required binary unscaled,
@@ -168,43 +169,43 @@ union ColumnStatisticsData {
}

```
struct ColumnStatisticsObj {
 1: required string colName,
 2: required string colType,
 3: required ColumnStatisticsData statsData
}

struct ColumnStatisticsDesc {
 1: required bool isTblLevel,
 2: required string dbName,
 3: required string tableName,
 4: optional string partName,
 5: optional i64 lastAnalyzed
}

struct ColumnStatistics {
 1: required ColumnStatisticsDesc statsDesc,
 2: required list<ColumnStatisticsObj> statsObj;
}
```

We propose to add the following Thrift APIs to persist, retrieve and delete column statistics:

```
bool update_table_column_statistics(1:ColumnStatistics stats_obj) throws (1:NoSuchObjectException o1,
 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)
bool update_partition_column_statistics(1:ColumnStatistics stats_obj) throws (1:NoSuchObjectException o1,
 2:InvalidObjectException o2, 3:MetaException o3, 4:InvalidInputException o4)

ColumnStatistics get_table_column_statistics(1:string db_name, 2:string tbl_name, 3:string col_name) throws
 (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidInputException o3, 4:InvalidObjectException o4)
ColumnStatistics get_partition_column_statistics(1:string db_name, 2:string tbl_name, 3:string part_name,
 4:string col_name) throws (1:NoSuchObjectException o1, 2:MetaException o2,
 3:InvalidInputException o3, 4:InvalidObjectException o4)

bool delete_partition_column_statistics(1:string db_name, 2:string tbl_name, 3:string part_name, 4:string col_name) throws
 (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3,
 4:InvalidInputException o4)
bool delete_table_column_statistics(1:string db_name, 2:string tbl_name, 3:string col_name) throws
 (1:NoSuchObjectException o1, 2:MetaException o2, 3:InvalidObjectException o3,
 4:InvalidInputException o4)
```

Note that delete_column_statistics is needed to remove the entries from the metastore when a table is dropped. Also note that currently Hive doesn’t support drop column.
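The optional `bitVectors` field in the structs (and the BIT_VECTOR column from HIVE-16997) exists so that distinct-value counts can be merged across partitions: two `numDVs` values cannot be summed, but two sketches can be combined. The sketch below illustrates the idea with simple linear counting over a fixed-width bit vector; Hive's actual implementation uses HyperLogLog, and the vector width and hashing here are assumptions for illustration.

```python
import hashlib
import math

M = 1 << 12  # bit-vector width; an assumption for this sketch

def bit_vector(values):
    """Set one bit per hashed value; a stand-in for Hive's NDV sketches."""
    bits = 0
    for v in values:
        h = int.from_bytes(hashlib.sha1(str(v).encode()).digest()[:8], "big")
        bits |= 1 << (h % M)
    return bits

def estimate_ndv(bits):
    """Linear-counting estimate: M * ln(M / zero_bits)."""
    zeros = M - bin(bits).count("1")
    return round(M * math.log(M / zeros))

# Merging two partitions' sketches is just bitwise OR. This is why storing
# a bit vector (not only numDVs) lets a table-level NDV be derived from
# partition-level stats without rescanning the data.
part1 = bit_vector(range(0, 100))
part2 = bit_vector(range(50, 150))  # overlaps part1 on 50..99
merged_ndv = estimate_ndv(part1 | part2)  # close to the true NDV of 150
```

Summing the per-partition estimates would give roughly 200 here; OR-ing the vectors first keeps the overlap from being double-counted.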
