RandomForestExample

Random Forest on 30M records

Task: Predict the fare amount (inclusive of tolls) for a taxi ride in New York City, given the pickup and dropoff locations.

The following steps build the model using a Random Forest Regressor in Python:

1> Load the train and test data sets. The full data set has 55M records; we work with 30M of them to avoid overloading the Python kernel.

2> Exploratory Data Analysis (EDA): This shows how the independent variables correlate with each other and with the target variable. We also use EDA to check for outliers.

3> Feature Engineering: This step converts variables to suitable data types (here, Pickup_datetime) and splits the timestamp into month, day, hour, weekdayname, weekday and year. We also use the Haversine formula to calculate the distance between the given pickup and dropoff latitudes and longitudes.

4> Model Training: We train a Random Forest Regressor on the engineered features.

5> Prediction
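
The feature engineering, training and prediction steps above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the dictionary keys (`pickup_datetime`, `pickup_latitude`, and so on), the timestamp format, the helper names, and the model settings are all assumptions chosen for the example. The feature helpers use only the standard library; for 30M rows a vectorized version (for example with NumPy) would likely be needed.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def split_pickup_datetime(text, fmt="%Y-%m-%d %H:%M:%S UTC"):
    """Split a timestamp string into calendar parts.

    The default format is an assumption about how the timestamps are stored.
    """
    dt = datetime.strptime(text, fmt)
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),            # Monday = 0
        "weekdayname": dt.strftime("%A"),   # e.g. "Monday"
    }


def build_features(ride):
    """Turn one ride (a dict with hypothetical keys) into a numeric feature row."""
    parts = split_pickup_datetime(ride["pickup_datetime"])
    distance = haversine_km(
        ride["pickup_latitude"], ride["pickup_longitude"],
        ride["dropoff_latitude"], ride["dropoff_longitude"],
    )
    return [distance, parts["year"], parts["month"],
            parts["day"], parts["hour"], parts["weekday"]]


def train_and_predict(train_rows, train_fares, test_rows):
    """Fit a random forest on feature rows and predict fares for test rows."""
    # Imported here so the feature helpers work without scikit-learn installed.
    from sklearn.ensemble import RandomForestRegressor

    # Hyperparameters are illustrative defaults, not tuned values.
    model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=0)
    model.fit(train_rows, train_fares)
    return model.predict(test_rows)
```

If the data is loaded with pandas, `read_csv` accepts an `nrows` argument that limits how many rows are read, which is one way to take 30M of the 55M records in step 1.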