[Bug]: Update Beam containers to numpy 2.x #33639

@tvalentyn

Description

What happened?

Currently, users who pip install apache-beam[gcp] get a newer version of numpy at job submission than the one installed in the container images. This version mismatch breaks jobs for DataFrame API users. From: https://lists.apache.org/thread/3k3rpnoh1tjf7d9rhvl88lmrn04fr9cn

One of the jobs I ran (Java multi-lang that uses Python Dataframe) failed with the following error.

ModuleNotFoundError: No module named 'numpy._core.numeric'

Indeed in

we have numpy 1.x. We should try to upgrade the containers to use numpy 2.x for Python versions that support it. We should also investigate which dependency is preventing Python 3.10+ containers from picking up numpy 2.x.
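
A minimal sketch of a submission-side check for this kind of mismatch. It assumes the container's numpy version is already known and passed in as a string. The helper names are hypothetical and not part of Beam. The error above is consistent with objects pickled under numpy 2.x (which refers to numpy._core) being loaded under numpy 1.x, so the sketch only compares major versions:

```python
from importlib import metadata
from typing import Optional


def major_version(version: str) -> int:
    """Return the leading major component of a version string like '2.1.0'."""
    return int(version.split(".")[0])


def numpy_major_mismatch(submission_version: str, container_version: str) -> bool:
    """True if the two numpy versions differ in major version (e.g. 2.x vs 1.x)."""
    return major_version(submission_version) != major_version(container_version)


def installed_numpy_version() -> Optional[str]:
    """Return the numpy version installed locally, or None if numpy is absent."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Hypothetical container version; the real value would come from the
    # container image's requirements, which this sketch does not read.
    assumed_container_version = "1.26.4"
    local = installed_numpy_version()
    if local is not None and numpy_major_mismatch(local, assumed_container_version):
        print(f"numpy major version mismatch: local {local}, "
              f"container {assumed_container_version}")
```

Until the containers are upgraded, a check like this could at least surface the mismatch at submission time instead of as a worker-side import failure.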

Issue Priority

Priority: 1 (data loss / total loss of function)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
