Weird panic on write #337

@laskoviymishka

Description

Describe the bug, including details regarding any error messages, version, and platform.

I'm implementing a streaming sink (transferia/iceberg#3) from Kafka-like sources using arrow-go, and I use iceberg to create Parquet files. I'm experiencing a weird panic in the pqarrow FileWriter.Write method (a simplified sketch of my write path follows the stack trace below):

panic: runtime error: invalid memory address or nil pointer dereference
  [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x3e2f7b4]
  
  goroutine 415 [running]:
  github.com/apache/arrow-go/v18/arrow/memory.(*Buffer).Bytes(...)
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/arrow/memory/buffer.go:106
  github.com/apache/arrow-go/v18/parquet/file.(*page).Data(...)
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/parquet/file/page_reader.go:90
  github.com/apache/arrow-go/v18/parquet/file.(*columnWriter).TotalBytesWritten(...)
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/parquet/file/column_writer.go:203
  github.com/apache/arrow-go/v18/parquet/file.(*rowGroupWriter).Close(0xc0015267e0)
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/parquet/file/row_group_writer.go:237 +0x8b
  github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).Close(0xc000612690)
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/parquet/pqarrow/file_writer.go:303 +0x38
  github.com/apache/arrow-go/v18/parquet/pqarrow.(*FileWriter).Write(0xc000612690, {0x699b5c0, 0xc001928360})
  	/home/runner/go/pkg/mod/github.com/apache/arrow-go/v18@v18.2.0/parquet/pqarrow/file_writer.go:243 +0x2a5
  github.com/transferia/iceberg.writeFile({0xc00170bab0?, 0x61?}, 0xc00060c2a0, {0xc001553b00, 0x1, 0x1})
  	/home/runner/work/iceberg/iceberg/s3_writer.go:59 +0x43b
  github.com/transferia/iceberg.(*SinkStreaming).writeBatch(0xc00184e1b0, 0xc00060c2a0, {0xc001553b00, 0x1, 0x1})
  	/home/runner/work/iceberg/iceberg/sink_streaming.go:173 +0xde
  github.com/transferia/iceberg.(*SinkStreaming).writeDataToTable(...)
  	/home/runner/work/iceberg/iceberg/sink_streaming.go:162
  github.com/transferia/iceberg.(*SinkStreaming).processTable(0xc00184e1b0, {0xc001553b00, 0x1, 0x1})
  	/home/runner/work/iceberg/iceberg/sink_streaming.go:105 +0x195
  github.com/transferia/iceberg.(*SinkStreaming).Push(0xc00184e1b0, {0xc0015539e0, 0x1, 0x4aeb8e?})
  	/home/runner/work/iceberg/iceberg/sink_streaming.go:80 +0x327
  github.com/transferia/transferia/pkg/middlewares.(*errorTracker).Push(0xc0010837d0, {0xc0015539e0?, 0xc000af16f0?, 0xc0015539e0?})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/error_tracker.go:35 +0x29
  github.com/transferia/transferia/pkg/middlewares.(*outputDataMetering).Push(0xc000567810?, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/metering.go:65 +0x2a
  github.com/transferia/transferia/pkg/middlewares.(*statistician).Push(0xc001ce63c0, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/statistician.go:58 +0x92
  github.com/transferia/transferia/pkg/middlewares.(*filter).Push(0xc001852810, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/filter.go:76 +0xa3
  github.com/transferia/transferia/pkg/middlewares.(*nonRowSeparator).Push(0xc000583e30, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/nonrow_separator.go:50 +0x37f
  github.com/transferia/transferia/pkg/middlewares.(*inputDataMetering).Push(0xc000af18f0?, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/metering.go:43 +0x2a
  github.com/transferia/transferia/pkg/middlewares/async.(*synchronizer).AsyncPush(0xc001852840, {0xc0015539e0?, 0x1, 0xc0001bc060?})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/async/synchronizer.go:61 +0xe5
  github.com/transferia/transferia/pkg/middlewares/async.(*measurer).AsyncPush(0xc000c8b820, {0xc0015539e0, 0x1, 0x1})
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/middlewares/async/measurer.go:59 +0x146
  github.com/transferia/transferia/pkg/parsequeue.(*ParseQueue[...]).pushLoop(0x69c73e0)
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/parsequeue/parsequeue.go:88 +0x13a
  created by github.com/transferia/transferia/pkg/parsequeue.New[...] in goroutine 405
  	/home/runner/go/pkg/mod/github.com/transferia/transferia@v0.0.2/pkg/parsequeue/parsequeue.go:161 +0x1f7
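For context, here is a simplified sketch of the write path. This is not the exact writeFile from s3_writer.go (that one writes to S3); the local file and function name below are just for illustration, but the pqarrow calls are the ones hit in the trace:

package sink

import (
	"os"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/parquet"
	"github.com/apache/arrow-go/v18/parquet/pqarrow"
)

// writeRecord writes a single Arrow record to a Parquet file.
// Illustrative stand-in for writeFile in s3_writer.go, which writes to S3.
func writeRecord(path string, rec arrow.Record) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	props := parquet.NewWriterProperties()
	arrProps := pqarrow.DefaultWriterProps()

	w, err := pqarrow.NewFileWriter(rec.Schema(), f, props, arrProps)
	if err != nil {
		return err
	}
	defer w.Close()

	// The panic surfaces here: Write -> rowGroupWriter.Close ->
	// columnWriter.TotalBytesWritten -> page.Data -> Buffer.Bytes (nil buffer).
	return w.Write(rec)
}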

Here is some diagnostic output that I collected (a sketch of how the record is built follows it):

Converting 1 items to Arrow Record with schema: schema:
  fields: 8
    - id: type=int32, nullable
    - level: type=utf8, nullable
    - caller: type=utf8, nullable
    - msg: type=utf8, nullable
    - _timestamp: type=timestamp[us, tz=UTC]
    - _partition: type=binary
    - _offset: type=int64
    - _idx: type=int64
Processing field 0: id (type: int32)
Item 0, Field id: Value type is int32
Processing field 1: level (type: utf8)
Item 0, Field level: Value type is string
Processing field 2: caller (type: utf8)
Item 0, Field caller: Value type is string
Processing field 3: msg (type: utf8)
Item 0, Field msg: Value type is string
Processing field 4: _timestamp (type: timestamp[us, tz=UTC])
Item 0, Field _timestamp: Value type is time.Time
Processing field 5: _partition (type: binary)
Item 0, Field _partition: Value type is string
Processing field 6: _offset (type: int64)
Item 0, Field _offset: Value type is uint64
Processing field 7: _idx (type: int64)
Item 0, Field _idx: Value type is uint32
Writing record with 1 rows and 8 columns to s3://warehouse/streaming/topic1/data/00000-0-7711175e-7cbe-48a0-a534-4142d4bacede-0-00003.parquet
Recovered from panic in Write: runtime error: invalid memory address or nil pointer dereference
Record details:
  NumRows: 1
  NumCols: 8
  Column 0: id (type: int32)
  Column 1: level (type: utf8)
  Column 2: caller (type: utf8)
  Column 3: msg (type: utf8)
  Column 4: _timestamp (type: timestamp[us, tz=UTC])
  Column 5: _partition (type: binary)
  Column 6: _offset (type: int64)
  Column 7: _idx (type: int64)
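For reference, a sketch of roughly how the record gets built with array.NewRecordBuilder. The item struct and field mapping below are illustrative, not my actual conversion code. Note from the diagnostics above that _offset and _idx arrive as unsigned Go values (uint64/uint32) while the schema columns are int64, and _partition arrives as a string for a binary column, so the conversion has to cast explicitly:

package sink

import (
	"time"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

// item is an illustrative stand-in for one parsed change item from the Kafka-like source.
type item struct {
	ID        int32
	Level     string
	Caller    string
	Msg       string
	Timestamp time.Time
	Partition string
	Offset    uint64
	Idx       uint32
}

// buildRecord appends items to a RecordBuilder matching the schema logged above.
func buildRecord(schema *arrow.Schema, items []item) arrow.Record {
	b := array.NewRecordBuilder(memory.DefaultAllocator, schema)
	defer b.Release()

	for _, it := range items {
		b.Field(0).(*array.Int32Builder).Append(it.ID)
		b.Field(1).(*array.StringBuilder).Append(it.Level)
		b.Field(2).(*array.StringBuilder).Append(it.Caller)
		b.Field(3).(*array.StringBuilder).Append(it.Msg)
		// _timestamp is timestamp[us, tz=UTC]; the incoming value is a time.Time.
		b.Field(4).(*array.TimestampBuilder).Append(arrow.Timestamp(it.Timestamp.UnixMicro()))
		// _partition is binary; the incoming value is a Go string.
		b.Field(5).(*array.BinaryBuilder).Append([]byte(it.Partition))
		// _offset and _idx arrive as uint64/uint32 but the columns are int64.
		b.Field(6).(*array.Int64Builder).Append(int64(it.Offset))
		b.Field(7).(*array.Int64Builder).Append(int64(it.Idx))
	}
	return b.NewRecord()
}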

Component(s)

Parquet
