gotcha: only the training data prior to the specific day of interest can be used #6

@greenteawarrior

from the Kaggle bikeshare competition data page:

You must predict the total count of bikes rented during each hour covered 
by the test set, using only information available prior to the rental period. 

apparently the reasoning is that if you use data from the "relative" future (rows timestamped after the rental period you're predicting), the model sees information it wouldn't have at prediction time, so the predictions aren't legit...
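to make the rule concrete, here's a minimal sketch of the filtering step, stdlib only. The row layout (`datetime` and `count` keys) and the helper name `rows_before` are assumptions for illustration, not something from the competition starter code:

```python
from datetime import datetime

def rows_before(rows, cutoff):
    """Return only the rows whose timestamp is strictly earlier than cutoff."""
    return [r for r in rows if r["datetime"] < cutoff]

# Hypothetical toy rows; the real training data has more columns.
train = [
    {"datetime": datetime(2011, 1, 5, 8), "count": 30},
    {"datetime": datetime(2011, 2, 5, 8), "count": 45},
    {"datetime": datetime(2011, 3, 5, 8), "count": 60},
]

# Predicting a rental period that starts on 2011-02-20:
# the March row is in the "relative" future and must be dropped.
usable = rows_before(train, datetime(2011, 2, 20))
```

whatever model we end up with, it should only ever be fit on something like `usable` for the period being predicted.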

ideas regarding what to do next (each has pros/cons):

  • train 24 different models (one for each month of data), each fit only on the data before its month - not ideal b/c it's silly to maintain so many models for a single solution
  • code up a single model that takes the amount of input data (e.g. a cutoff date) as a parameter??
  • look at how well the model(s) perform at different timepoints (the first few results will probably be terrible because they only have one or a few months of data)
  • discussion to be continued
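
rough sketch of the "parameterize the amount of input data" idea, so one piece of code can stand in for the 24 separate models. Everything here is an assumption for illustration: the function names, the `lookback_days` parameter, the row layout, and the hour-of-day average, which is only a placeholder for whatever real model we pick:

```python
from datetime import datetime, timedelta
from statistics import mean

def train_mean_by_hour(rows):
    """Placeholder 'model': average count for each hour of the day."""
    by_hour = {}
    for r in rows:
        by_hour.setdefault(r["datetime"].hour, []).append(r["count"])
    return {h: mean(v) for h, v in by_hour.items()}

def predict_period(train_rows, period_start, hours, lookback_days=None):
    """Predict counts for `hours` using only rows before period_start.

    lookback_days=None uses all earlier data; an int keeps only the
    most recent window of that many days.
    """
    usable = [r for r in train_rows if r["datetime"] < period_start]
    if lookback_days is not None:
        earliest = period_start - timedelta(days=lookback_days)
        usable = [r for r in usable if r["datetime"] >= earliest]
    if not usable:
        return None  # nothing available before this period
    model = train_mean_by_hour(usable)
    overall = mean(r["count"] for r in usable)  # fallback for unseen hours
    return [model.get(h.hour, overall) for h in hours]

# Hypothetical toy data, one row per month at 08:00.
train = [
    {"datetime": datetime(2011, 1, 5, 8), "count": 10},
    {"datetime": datetime(2011, 2, 5, 8), "count": 30},
    {"datetime": datetime(2011, 3, 5, 8), "count": 50},
]
target = [datetime(2011, 2, 20, 8)]
all_prior = predict_period(train, datetime(2011, 2, 20), target)
recent_only = predict_period(train, datetime(2011, 2, 20), target, lookback_days=30)
```

calling `predict_period` once per test period with that period's start date would also give us the per-timepoint results from the third bullet, so we could check how bad the early months are.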
