
Uploading with AsyncWriter can lead to OOM errors #172

@Kaldie

Description

Currently, when using the AsyncWriter, it is possible to hit an OOM error because its internal queue can grow without bound.

For instance, this snippet fills the queue faster than it can be sent via HTTPS to HDFS:

import csv
import random
import string

import hdfs

client = hdfs.InsecureClient(<valid arguments>)

with client.write("filename", encoding="utf-8") as file_handle:
  writer = csv.writer(file_handle)

  # write 25 batches of 25 pseudo-random 100-character strings of CSV junk
  for batch in [["".join(random.choice(string.ascii_letters) for _ in range(100)) for _ in range(25)] for _ in range(25)]:
    writer.writerows(batch)

This leads to unmanageably large memory usage.

Would it be possible to put a limit on the queue size when creating a file_handle?
If so, I would be happy to open a PR with a possible solution.
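For reference, a minimal sketch of the idea using Python's standard library: a `queue.Queue` with a `maxsize` makes the producing thread block on `put()` instead of letting the queue grow unboundedly. (Whether AsyncWriter exposes such a parameter, and what it would be called, is an assumption; this only illustrates the backpressure mechanism.)

```python
import queue
import threading

# Hypothetical bounded buffer between the writer (producer) and the
# upload thread (consumer). maxsize=10 caps memory: put() blocks once
# 10 chunks are queued, so the producer can never outrun the consumer
# by more than the buffer size.
q = queue.Queue(maxsize=10)
consumed = []

def consumer():
    # Drain chunks until the sentinel (None) signals the writer closed.
    while True:
        chunk = q.get()
        if chunk is None:
            break
        consumed.append(chunk)
        q.task_done()

t = threading.Thread(target=consumer)
t.start()

# Producer side: blocks transparently whenever the buffer is full.
for i in range(100):
    q.put(i)
q.put(None)  # sentinel: no more data
t.join()

print(len(consumed))
```

With an unbounded queue (the current behaviour), the same producer would enqueue all 100 chunks immediately regardless of consumer speed; with `maxsize` set, peak memory is bounded by the buffer size.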
