---
sidebar_label: Impala
---

# Impala

Starting with version **4.2.0**, Apache Impala provides full support for querying data stored in Apache Ozone. To utilize this functionality, ensure that your Ozone version is **1.4.0** or later.

## Supported Access Protocols

Impala supports the following protocols for accessing Ozone data:

* `ofs`
* `s3a`

> **Note:**
> The `o3fs` protocol is **NOT** supported by Impala.

## Supported Replication Types

Impala is compatible with Ozone buckets configured with either:

* **RATIS** (Replication)
* **Erasure Coding**

## Querying Ozone Data with Impala

Impala provides two approaches to interact with Ozone:

1. Managed Tables
2. External Tables

### Managed Tables

If the Hive Warehouse Directory is located in Ozone, you can execute Impala queries without any changes, treating the Ozone file system like HDFS.

**Example:**

```sql
CREATE DATABASE d1;

CREATE TABLE t1 (x INT, s STRING);
```

The data will be stored under the Hive Warehouse Directory path in Ozone.
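To confirm where Impala placed a managed table's files, you can inspect the table metadata. The `Location:` row of the output should be a path in Ozone under the warehouse directory. This is a minimal check, assuming the `t1` table from the example above is in the current database:

```sql
-- Show detailed table metadata, including the storage location of the data.
DESCRIBE FORMATTED t1;

-- Alternatively, print the DDL that re-creates the table;
-- the output includes its LOCATION clause.
SHOW CREATE TABLE t1;
```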

#### Specifying a Custom Ozone Path

You can create managed databases, tables, or partitions at a specific Ozone path using the `LOCATION` clause.

**Example:**

```sql
CREATE DATABASE d1 LOCATION 'ofs://ozone1/vol1/bucket1/d1.db';

CREATE TABLE t1 (x INT, s STRING) LOCATION 'ofs://ozone1/vol1/bucket1/table1';
```
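The `LOCATION` clause also applies to individual partitions. The sketch below uses a hypothetical partitioned table `t2` and a hypothetical Ozone path; adjust the names to your environment.

```sql
-- Hypothetical partitioned table; its default location comes from the database.
CREATE TABLE t2 (x INT, s STRING) PARTITIONED BY (year INT);

-- Place one partition at an explicit Ozone path instead of the default location.
ALTER TABLE t2 ADD PARTITION (year = 2024)
  LOCATION 'ofs://ozone1/vol1/bucket1/t2/year=2024';
```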

### External Tables

You can create an external table in Impala to query Ozone data.

**Example:**

```sql
CREATE EXTERNAL TABLE external_table (
  id INT,
  name STRING
) LOCATION 'ofs://ozone1/vol1/bucket1/table1';
```

With external tables:

* The data is expected to be created and managed by another tool.
* Impala queries the data as-is.
* The metadata is stored under the external warehouse directory.
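Because another tool writes the files, Impala may not see data files added directly to storage until the table's file metadata is reloaded. This is a minimal sketch, assuming another job has just written new files under the `LOCATION` of `external_table`:

```sql
-- Reload the file listing for the table so newly written files become visible.
REFRESH external_table;

-- Subsequent queries read the current set of files.
SELECT COUNT(*) FROM external_table;
```

If the table itself was defined outside Impala, for example through Hive, `INVALIDATE METADATA external_table;` may be needed instead, depending on how your catalog is configured.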

> **Note:**
> Dropping an external table in Impala does not delete the associated data.

### Using the S3A Protocol

In addition to `ofs`, Impala can access Ozone via the S3 Gateway using the S3A file system. For more details, refer to:

* [The S3 Protocol](../01-client-interfaces/03-s3.md)
* The [Hadoop S3A documentation](https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html)
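As an illustration, an external table can reference data through S3A instead of `ofs`. The sketch assumes a hypothetical bucket `bucket1` that is exposed by the S3 Gateway (for example, a bucket in the `s3v` volume). It also assumes that the cluster's S3A settings, such as `fs.s3a.endpoint`, already point at the Ozone S3 Gateway.

```sql
-- Hypothetical bucket and path; S3A must already be configured for the Ozone S3 Gateway.
CREATE EXTERNAL TABLE external_table_s3a (
  id INT,
  name STRING
) LOCATION 's3a://bucket1/table1';
```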

For additional information, consult the Apache Impala User Documentation on [Using Impala with Apache Ozone Storage](https://impala.apache.org/docs/build/html/topics/impala_ozone.html).