Skip to content

This function is used to bucketize a column as per your need and add the bucketized values as additional columns to your existing Spark dataframe.Also you don't have to worry about the number the columns you want to bucketize, this function can bucketize as many columns as you want.

Notifications You must be signed in to change notification settings

Debarchan1994/Bucketizer_dynamic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

This function is used to bucketize a column as per your need and add the bucketized values as additional columns to your existing Spark dataframe.Also you don't have to worry about the number the columns you want to bucketize, this function can bucketize as many columns as you want, you just have to pass the names of the columns with the bucketization ranges into the inputCols paramter as a dictionary with the name of the input columns as key and the necessary ranges in the form of tuple or list for the bucket size as the value. For example inputCols = {'my_column' : (0,100,10)} or {'my_column' : [0,10,100,200,500,1000]}

    Parameters :: \n
                1) df : type(pyspark dataframe)--> The base dataframe
                2) inputCols : type(dictionary)--> Key : column names that need to be bucketized , value : Bucketization range or list
                3) outputCols : type(list)--> Desired names of the final output columns
                4) cat_alternative : type(boolean)--> Defines whether 2 columns should be created of each input column with second column being the indicator for the last value of a particular bucket (example,  main column:  10 ... 20 , second alternate column : <20) or just 1 main column
                                                    (example, main column:  10 ... 20). 
                                                    The default value is False, but if True is passed as parameter then only the function will create 2 columns for each input column. 

About

This function is used to bucketize a column as per your need and add the bucketized values as additional columns to your existing Spark dataframe.Also you don't have to worry about the number the columns you want to bucketize, this function can bucketize as many columns as you want.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages