Commit dc1875e

Address comment
1 parent 2fadc6d commit dc1875e

2 files changed

+12
-19
lines changed

content/en/blog/features/variant.md

Lines changed: 12 additions & 19 deletions
Original file line number | Diff line number | Diff line change
@@ -112,11 +112,7 @@ The query engine decides which fields to shred based on access patterns and work
112112

113113
### Examples of Shredded Parquet Schemas
114114

115-
If a field's value matches the shredded type, it is stored in the typed column `typed_value`. If a field's value has a different type, it remains in the `value` binary column using standard Variant encoding.
116-
117-
#### Example 1: Shredding to String Type
118-
119-
The Variant values are shredded to string type.
115+
The following example shows shredding of non-nested Variants. In this case, the writer chose to shred String values into the `typed_value` column. Rows that do not contain strings are stored in the `value` column, using the binary Variant encoding.
120116

121117
```parquet
122118
optional group SIMPLE_DATA (VARIANT(1)) = 1 {
@@ -126,7 +122,7 @@ optional group SIMPLE_DATA (VARIANT(1)) = 1 {
126122
}
127123
```
128124

129-
**Encoding Table:**
125+
The series of Variant values `"Jim"`, `100`, and `{"name": "Jim"}` is encoded as:
130126

131127
| Variant Value | `value` | `typed_value` |
132128
|---------------|---------|---------------|
@@ -136,35 +132,32 @@ optional group SIMPLE_DATA (VARIANT(1)) = 1 {
136132

137133
---
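
As a rough illustration of the rule described above, the following Python sketch models how a writer might split values between `value` and `typed_value` when shredding to a string type. It is not a Parquet writer: `shred_string` is a hypothetical helper, and plain Python objects stand in for the binary Variant encoding.

```python
from typing import Any, Optional, Tuple


def shred_string(variant: Any) -> Tuple[Optional[Any], Optional[str]]:
    """Return (value, typed_value) for a column shredded as a string.

    Hypothetical helper: Python objects stand in for Variant-encoded bytes.
    """
    if isinstance(variant, str):
        # The value matches the shredded type, so it goes to typed_value.
        return None, variant
    # Any other type stays in the Variant-encoded value column.
    return variant, None


# The three values used in the example above.
for row in ["Jim", 100, {"name": "Jim"}]:
    print(shred_string(row))
```

Under these assumptions, only `"Jim"` lands in `typed_value`; the integer and the object both fall back to `value`.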
138134

139-
#### Example 2: Shredding to Object with Typed Fields
140-
141-
The Variant values are shredded to an object with `user_id` field of integer type and `type` field of string type.
142-
135+
Shredding nested Variants is similar, with the shredding applied recursively, as shown in the following example. In this case, the `userId` field is shredded as an integer and stored as two columns: in `typed_value.userId.typed_value` when the value is an integer, and as a Variant in `typed_value.userId.value` otherwise. Similarly, the `eType` field is shredded as a string and stored in `typed_value.eType.typed_value` and `typed_value.eType.value`.
143136
```parquet
144137
optional group EVENT_DATA (VARIANT(1)) = 1 {
145138
required binary metadata; # variant metadata
146139
optional binary value; # non-shredded value
147140
optional group typed_value {
148-
required group user_id { # user_id field
141+
required group userId { # userId field
149142
optional binary value; # non-shredded value
150143
optional int32 typed_value; # the shredded value
151144
}
152-
required group type { # type field
145+
required group eType { # eType field
153146
optional binary value; # non-shredded value
154147
optional binary typed_value (STRING); # the shredded value
155148
}
156149
}
157150
}
158151
```
159152

160-
**Encoding Table:**
153+
**The table below illustrates how the data is stored:**
161154

162-
| Variant Value | `value` | `typed_value` | `.user_id.value` | `.user_id.typed_value` | `.type.value` | `.type.typed_value` |
163-
|-------------------------------------|------------------|---------------|------------------|------------------------|---------------|---------------------|
164-
| `{"user_id": 100, "type": "login"}` | `null` | | `null` | `100` | `null` | `"login"` |
165-
| `100` | `100` | `null` | | | | |
166-
| `{"user_id": "Jim"}` | `null` | | `"Jim"` | `null` | `null` | `null` |
167-
| `{"user_id": 200, "amount": 99}` | `{"amount": 99}` | | `null` | `200` | `null` | `null` |
155+
| Variant | `value` | `typed_value.userId.value` | `typed_value.userId.typed_value` | `typed_value.eType.value` | `typed_value.eType.typed_value` |
156+
|-------------------------------------|------------------|----------------------------|----------------------------------|---------------------------|---------------------|
157+
| `{"userId": 100, "eType": "login"}` | `null` | `null` | `100` | `null` | `"login"` |
158+
| `100` | `100` | | | | |
159+
| `{"userId": "Jim"}` | `null` | `"Jim"` | `null` | `null` | `null` |
160+
| `{"userId": 200, "amount": 99}` | `{"amount": 99}` | `null` | `200` | `null` | `null` |
168161

169162
---
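
The per-field behaviour of the `EVENT_DATA` schema can be sketched in Python as well. This is a simplified model under stated assumptions, not the actual writer logic: `SCHEMA` and `shred_object` are hypothetical names, Python types stand in for the Parquet types, and dictionaries stand in for Variant-encoded objects.

```python
from typing import Any, Dict, Optional

# Hypothetical stand-in for the shredded fields of EVENT_DATA.
SCHEMA = {"userId": int, "eType": str}


def shred_object(variant: Any) -> Dict[str, Optional[Any]]:
    """Model how one Variant is split across the shredded columns."""
    if not isinstance(variant, dict):
        # Non-objects stay whole in the top-level value column.
        return {"value": variant, "typed_value": None}

    typed: Dict[str, Dict[str, Optional[Any]]] = {}
    for name, py_type in SCHEMA.items():
        if name not in variant:
            # Missing field: both columns are null.
            typed[name] = {"value": None, "typed_value": None}
        elif type(variant[name]) is py_type:
            # Type matches: store in the field's typed_value column.
            typed[name] = {"value": None, "typed_value": variant[name]}
        else:
            # Type mismatch: keep the field as a Variant in its value column.
            typed[name] = {"value": variant[name], "typed_value": None}

    # Fields outside the schema remain in the top-level value column.
    rest = {k: v for k, v in variant.items() if k not in SCHEMA}
    return {"value": rest or None, "typed_value": typed}


print(shred_object({"userId": 200, "amount": 99}))
```

In this model, the unshredded `amount` field stays in the top-level `value` as a partial object, while `userId` is routed to its typed column and the missing `eType` is null in both of its columns.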
170163
