For the first commit of each "coding session", we cannot tell how much time was spent on it based on the timestamps alone. Currently, the algorithm simply uses a constant to estimate the work for those commits. I believe better estimates can be easily made.
Proposed algorithm:
- Group commits into commit sessions, separating where time between commits > 2h (same as current algorithm)
- Estimate the average time to edit a line of code:
  - let known_work be the set of commits with known hours of work (i.e., all but the first commit in each session; the work time of each is the gap since the previous commit)
  - average time to edit a line of code = total time spent in known_work / total lines edited in known_work
- Estimate the total time spent:
  - For the first commit in each session, multiply the number of lines changed in that commit by the average time to edit a line of code
  - For other commits, assume the entire duration since the previous commit was spent working (same as current algorithm)
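A minimal Python sketch of the proposed algorithm, assuming each commit has already been reduced to a Unix timestamp and a count of changed lines (e.g. insertions plus deletions). The `Commit` class, the function names, and the fallback constant used when no session contains more than one commit are all hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass

SESSION_GAP_SECONDS = 2 * 60 * 60  # gaps > 2h start a new session (same as current algorithm)


@dataclass
class Commit:
    timestamp: int      # Unix time in seconds
    lines_changed: int  # e.g. insertions + deletions


def group_sessions(commits):
    """Split commits (sorted by timestamp) into sessions wherever the gap exceeds 2h."""
    sessions = []
    for commit in commits:
        if sessions and commit.timestamp - sessions[-1][-1].timestamp <= SESSION_GAP_SECONDS:
            sessions[-1].append(commit)
        else:
            sessions.append([commit])
    return sessions


def seconds_per_line(sessions):
    """Average seconds per changed line, learned only from non-first commits (known_work)."""
    known_seconds = 0
    known_lines = 0
    for session in sessions:
        for prev, cur in zip(session, session[1:]):
            known_seconds += cur.timestamp - prev.timestamp
            known_lines += cur.lines_changed
    if known_lines == 0:
        return None  # no known work to learn from
    return known_seconds / known_lines


def estimate_seconds(commits, fallback_first_commit_seconds=7200):
    """Total estimated seconds of work. The fallback constant is a hypothetical default."""
    sessions = group_sessions(sorted(commits, key=lambda c: c.timestamp))
    rate = seconds_per_line(sessions)
    total = 0.0
    for session in sessions:
        # First commit of the session: lines changed * average time per line.
        if rate is None:
            total += fallback_first_commit_seconds
        else:
            total += session[0].lines_changed * rate
        # Remaining commits: the whole gap since the previous commit counts as work.
        for prev, cur in zip(session, session[1:]):
            total += cur.timestamp - prev.timestamp
    return total


commits = [Commit(0, 10), Commit(3600, 20), Commit(5400, 10), Commit(100000, 30)]
print(estimate_seconds(commits) / 3600)  # 3.5
```

In this example the known work is 5400 s over 30 lines (180 s per line), so the two first commits are estimated at 1800 s and 5400 s, and the total is 12600 s, or 3.5 h.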
Other considerations and alternatives:
- Instead of time per line, we could use time per character or another metric.
- Instead of estimating hours of work based on time per line, we could assume that the time spent on the first commit in a session is similar to the average time spent on the other commits.
- For first commits, the estimated time could be capped by the duration since the previous commit.
- If I understand correctly, the current algorithm assumes there is a single author. It would be nice to compute time per author. Not only would this be a more detailed metric, it would also be more accurate, as two people committing at the same time shouldn't be counted as a single session.
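The per-author variant and the cap on first commits could be sketched as follows, assuming hypothetical `Commit` records that also carry an author name. Sessions are computed separately for each author, so interleaved commits from two people no longer merge into one session, and each first-commit estimate is optionally capped by the gap since that author's previous commit. When an author has no known work to learn a rate from, this sketch estimates their first commits as zero; that is an arbitrary illustrative choice.

```python
from collections import defaultdict
from dataclasses import dataclass

SESSION_GAP_SECONDS = 2 * 60 * 60  # gaps > 2h start a new session


@dataclass
class Commit:
    author: str
    timestamp: int      # Unix time in seconds
    lines_changed: int  # e.g. insertions + deletions


def estimate_author_seconds(commits, cap_first_commits=True):
    """Estimate seconds of work for a single author's commits."""
    commits = sorted(commits, key=lambda c: c.timestamp)
    # Learn seconds per line from in-session gaps (the author's known work).
    known_seconds = known_lines = 0
    for prev, cur in zip(commits, commits[1:]):
        gap = cur.timestamp - prev.timestamp
        if gap <= SESSION_GAP_SECONDS:
            known_seconds += gap
            known_lines += cur.lines_changed
    rate = known_seconds / known_lines if known_lines else 0.0  # 0.0: arbitrary fallback

    total = 0.0
    prev = None
    for cur in commits:
        gap = None if prev is None else cur.timestamp - prev.timestamp
        if gap is not None and gap <= SESSION_GAP_SECONDS:
            total += gap  # continuing session: whole gap counts as work
        else:
            estimate = cur.lines_changed * rate  # first commit of a session
            if cap_first_commits and gap is not None:
                estimate = min(estimate, gap)  # never exceed time since previous commit
            total += estimate
        prev = cur
    return total


def estimate_per_author(commits):
    """Group commits by author and estimate each author's time independently."""
    by_author = defaultdict(list)
    for commit in commits:
        by_author[commit.author].append(commit)
    return {author: estimate_author_seconds(cs) for author, cs in by_author.items()}


commits = [
    Commit("alice", 0, 10), Commit("bob", 1800, 40),
    Commit("alice", 3600, 20), Commit("bob", 5400, 30),
]
print(estimate_per_author(commits))  # {'alice': 5400.0, 'bob': 8400.0}
```

Treated as a single author, these four commits would form one session; split per author, each person's rate is learned from their own commits (180 s per line for alice, 120 s per line for bob).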