This repository was archived by the owner on Jul 16, 2021. It is now read-only.

Get dask-gateway scheduler address #55

@AlbertDeFusco


When connecting through dask-gateway, client.scheduler.address is the gateway proxy address, not the scheduler's actual address:

>>> client.scheduler.address
'gateway://dask.training.anaconda.com:8786/4fd53916f0214703934701aa7a7eaf85'
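This address has a `gateway://` scheme and a trailing cluster-ID path, so code that expects a plain `<protocol>://<host>:<port>` address cannot use it as-is. Here is a minimal standard-library sketch that splits such addresses and flags proxied ones. The helper `describe_address` is hypothetical, and treating a `gateway` scheme as "proxied" is an assumption drawn from the example above.

```python
from urllib.parse import urlsplit


def describe_address(address: str) -> dict:
    """Split a Dask-style address into its parts (hypothetical helper)."""
    parts = urlsplit(address)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "path": parts.path.lstrip("/"),
        # Assumption: a gateway:// scheme means the client is talking to the
        # dask-gateway proxy rather than directly to the scheduler.
        "proxied": parts.scheme == "gateway",
    }


info = describe_address(
    "gateway://dask.training.anaconda.com:8786/4fd53916f0214703934701aa7a7eaf85"
)
print(info["scheme"], info["host"], info["port"], info["path"])
```

A direct address such as `tcp://10.0.0.5:8786` yields an empty path and `proxied == False`.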

I was able to work around this in core._train by using client.scheduler_info()['address'] instead:

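    # Start the XGBoost tracker on the Dask scheduler.
    # Use the address the scheduler reports for itself; behind dask-gateway,
    # client.scheduler.address is the gateway proxy, not the scheduler.
    host, port = parse_host_port(client.scheduler_info()['address'])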
    env = yield client._run_on_scheduler(
        start_tracker, host.strip("/:"), len(worker_map)
    )
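The workaround relies on the scheduler's self-reported address, which, per the observation above, is the real scheduler address rather than the proxy. The parsing step can be kept independent of Dask. Below is a sketch that pulls out the bare host with `urllib.parse` rather than `host.strip("/:")`. It assumes the input dict has the shape returned by `Client.scheduler_info()` and reads only its `'address'` key. The name `tracker_host` is hypothetical.

```python
from urllib.parse import urlsplit


def tracker_host(scheduler_info: dict) -> str:
    """Return the bare host from a scheduler_info()-style dict (hypothetical helper).

    Assumption: only the "address" key is used, and it has the form
    "<protocol>://<host>:<port>", e.g. "tls://10.0.0.5:8786".
    """
    address = scheduler_info["address"]
    host = urlsplit(address).hostname
    if host is None:
        raise ValueError(f"cannot extract a host from {address!r}")
    return host


print(tracker_host({"address": "tls://10.0.0.5:8786"}))
```

Using `urlsplit` also handles bracketed IPv6 hosts such as `tcp://[::1]:8786`, which a character strip would not.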

However, I get the following warning.

>>> from dask_xgboost import XGBRegressor
>>> xgb = XGBRegressor()
>>> xgb.fit(X, y)

/Users/adefusco/Applications/miniconda3/envs/xgb/lib/python3.7/site-packages/distributed/client.py:3299: RuntimeWarning: coroutine 'Client._update_scheduler_info' was never awaited
  self.sync(self._update_scheduler_info)
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
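The warning is generic Python behavior: calling an `async` function only creates a coroutine object, and if that object is never awaited, Python reports it when the object is garbage-collected. The traceback suggests that `scheduler_info()` calls `self.sync(self._update_scheduler_info)`. When this runs from code already on the event loop (as `_train` appears to, given its `yield`), that coroutine seems to be created without ever running. Below is a plain-asyncio sketch of the pattern and its awaited alternative. It does not use the Dask API, and `update_info`, `sync_wrapper_wrong`, and `caller_right` are hypothetical names.

```python
import asyncio


async def update_info(state: dict) -> None:
    # Hypothetical async-only refresh step (stand-in for an internal update).
    await asyncio.sleep(0)
    state["updated"] = True


def sync_wrapper_wrong(state: dict) -> dict:
    # Calling the async function only creates a coroutine object. It is never
    # awaited, so it never runs, and Python emits
    # "RuntimeWarning: coroutine 'update_info' was never awaited"
    # when the discarded object is garbage-collected.
    update_info(state)
    return state


async def caller_right(state: dict) -> dict:
    # Inside a coroutine, await the async step so that it actually executes.
    await update_info(state)
    return state


print(asyncio.run(caller_right({})))
```

In the wrong version, `state` is returned unchanged, which matches the issue: any result comes from previously cached data, not from the refresh that was skipped.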

I have verified that this change works correctly on a 9M-row training set and scales linearly from 4 to 8 workers (2 cores/worker). Is this the correct approach to get the actual scheduler address?
