This repository was archived by the owner on Jul 30, 2019. It is now read-only.

[REVIEW] Dask RF Classifier #56

Open
oyilmaz-nvidia wants to merge 4 commits into rapidsai:branch-0.9 from oyilmaz-nvidia:fea-randomforest

@oyilmaz-nvidia

This PR adds the Dask random forest (RF) classifier. To run the code, install cuML from the following branch:

https://github.com/oyilmaz-nvidia/cuml/tree/fea-update-predict

Once this PR is approved, another PR will be created for the updates made in the fea-update-predict branch of cuML.

@cjnolet (Member) left a comment

I know this PR was just meant for a quick overview, but I noticed it does not include any automated pytests.

Also, it's going to be important that the dataframe passed in contains only one partition per worker. Originally I was thinking of writing some custom code to do this, but I believe it's better to use Dask's repartition() function for this first iteration.
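To make "one partition per worker" concrete, here is a minimal pure-Python sketch of what reducing the partition count does: contiguous input partitions are merged until exactly `n_workers` remain. `plan_repartition` is a hypothetical helper and plain lists stand in for dataframe partitions; in Dask itself the corresponding call would be `ddf.repartition(npartitions=n_workers)`, with `n_workers` taken from something like `len(client.scheduler_info()["workers"])`. Dask's exact boundary choice may differ from the even split used here.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def plan_repartition(partitions: Sequence[List[T]], n_workers: int) -> List[List[T]]:
    """Merge contiguous partitions so that exactly n_workers partitions remain.

    Hypothetical helper: it only models the effect of repartitioning a
    Dask DataFrame down to one partition per worker.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    if len(partitions) < n_workers:
        raise ValueError("cannot merge into more partitions than exist")

    n = len(partitions)
    merged: List[List[T]] = []
    for w in range(n_workers):
        # Integer boundaries spread input partitions as evenly as possible;
        # since n >= n_workers, every group gets at least one partition.
        start = w * n // n_workers
        stop = (w + 1) * n // n_workers
        group: List[T] = []
        for part in partitions[start:stop]:
            group.extend(part)
        merged.append(group)
    return merged


parts = [[1, 2], [3], [4, 5], [6], [7]]
print(plan_repartition(parts, 2))  # [[1, 2, 3], [4, 5, 6, 7]]
```

Note that repartitioning only fixes the partition count; which worker ends up holding each partition is still decided by the scheduler.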

import math

@gen.coroutine
def _extract_ddf_partitions(ddf):

This is actually a utility function inside cuml.dask.dask_df_utils in the new comms PR.
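The body of `_extract_ddf_partitions` is not shown here. Assuming its role is to pair each computed partition with the worker that holds it, the core bookkeeping can be sketched in plain Python. The dictionary below only mimics the key-to-worker-addresses mapping that dask.distributed's `Client.who_has()` reports; `pair_partitions_with_workers`, `one_partition_per_worker`, and the addresses are hypothetical and for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def pair_partitions_with_workers(
    who_has: Dict[str, Sequence[str]],
) -> List[Tuple[str, str]]:
    """Return (worker_address, partition_key) pairs.

    `who_has` mimics a key -> worker-addresses mapping; plain strings
    stand in for real Dask keys and futures.
    """
    pairs = []
    for key, workers in who_has.items():
        if not workers:
            raise ValueError(f"partition {key!r} is not held by any worker")
        # If a partition is replicated, take the first worker holding it.
        pairs.append((workers[0], key))
    return pairs


def one_partition_per_worker(pairs: List[Tuple[str, str]]) -> bool:
    """True if no worker in `pairs` holds more than one partition.

    Workers holding zero partitions do not appear in `pairs` and are not checked.
    """
    counts = Counter(worker for worker, _ in pairs)
    return all(n == 1 for n in counts.values())


who_has = {
    "part-0": ["tcp://10.0.0.1:40001"],
    "part-1": ["tcp://10.0.0.2:40001"],
    "part-2": ["tcp://10.0.0.1:40001"],
}
pairs = pair_partitions_with_workers(who_has)
print(pairs[0])  # ('tcp://10.0.0.1:40001', 'part-0')
print(one_partition_per_worker(pairs))  # False: 10.0.0.1 holds two partitions
```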

"""
c = default_client()

X_futures = c.sync(_extract_ddf_partitions, X)

Call the version from cuml.dask.common.dask_df_utils instead.
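The `c.sync(_extract_ddf_partitions, X)` line above runs a coroutine from ordinary synchronous code and blocks until it returns. As a rough analogue only (this is not how Dask implements `Client.sync`), the same pattern with the standard-library `asyncio` looks like this; `extract_partitions` and `run_sync` are hypothetical stand-ins.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def extract_partitions(n_parts: int) -> List[str]:
    """Hypothetical stand-in for an async helper like _extract_ddf_partitions."""
    # Yield to the event loop, as a real helper would while waiting on remote results.
    await asyncio.sleep(0)
    return [f"part-{i}" for i in range(n_parts)]


def run_sync(coro_fn: Callable[..., Awaitable[Any]], *args: Any) -> Any:
    """Run an async function to completion from synchronous code."""
    return asyncio.run(coro_fn(*args))


futures = run_sync(extract_partitions, 3)
print(futures)  # ['part-0', 'part-1', 'part-2']
```

`asyncio.run` creates a fresh event loop and cannot be called while a loop is already running in the same thread; Dask's client instead schedules the coroutine on the event loop it already manages.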
