from the bikeshare kaggle competition data page:
You must predict the total count of bikes rented during each hour covered
by the test set, using only information available prior to the rental period.
apparently the reasoning is that if you use data from the "relative" future to predict something on a given day, you're leaking information the model wouldn't actually have had at prediction time, so the results wouldn't be legit...
ideas regarding what to do next (each has pros/cons):
- train 24 different models (one for each month of the two-year span...) - not ideal b/c it's silly to maintain that many models for a single solution
- code up a single model/pipeline that parameterizes how much historical data is used for training?
- look at how well the model(s) perform at different timepoints (the first few months will probably be terrible because they only have 1-few months of training data behind them)
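the "only use the past" constraint plus the parameterized-training-data idea basically amounts to an expanding-window evaluation. a minimal sketch below, with made-up monthly numbers (NOT the kaggle data) and the training-window mean as a placeholder for a real model; `min_train` is the knob for how much history is required before making a prediction:

```python
import statistics

def expanding_window_eval(series, min_train=3):
    """Walk forward through per-period values: at each step, train on
    everything strictly before the current period, then predict it.
    Here the "model" is just the mean of the training window, a
    stand-in for whatever regressor we actually use."""
    errors = []
    for t in range(min_train, len(series)):
        train, actual = series[:t], series[t]
        pred = statistics.mean(train)  # placeholder model fit on past only
        errors.append(abs(pred - actual))
    return errors

# hypothetical monthly rental counts, purely illustrative
monthly = [120, 150, 170, 160, 210, 240, 260, 250, 220, 180, 150, 130]
errs = expanding_window_eval(monthly, min_train=3)
print(len(errs))  # one error per evaluated month: 9
```

this also gives the per-timepoint performance curve from the last bullet for free: early entries of `errs` come from tiny training windows, so we'd expect them to look worse than later ones.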
- discussion to be continued