enh(Agent): Allow for periodic historical tracking of X state #231
nicola-bastianello merged 16 commits into team-decent:main
Conversation
Pull request overview
This PR adds configurable periodic history tracking for agent states to reduce memory usage when agent states are large. The key enhancement allows users to specify a history_period parameter (e.g., 10) so that agent states are only recorded every N iterations instead of at every iteration. Additionally, the PR includes several metric calculation improvements: customizable table formatting, computational cost-based x-axis for plots, performance optimizations for table metric aggregation, and progress bars for metric calculations.
Key Changes
- Modified Agent class to use dict-based history storage with configurable recording periods
- Added ComputationalCost dataclass for plot metrics to display computational cost instead of iterations on x-axis
- Refactored table metric calculation to compute data once and apply multiple statistics, improving performance
- Added MetricProgressBar for user feedback during metric calculations
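To make the history change concrete, here is a minimal sketch of the dict-based, periodic recording idea (the names `history_period`, `x_history`, and `x_updates` follow the file summaries below; the rest of the class is assumed for illustration):

```python
import numpy as np


class Agent:
    """Illustrative agent that records its state every `history_period` iterations."""

    def __init__(self, x0: np.ndarray, history_period: int = 1):
        self.x = x0
        self.history_period = history_period
        self.x_updates = 0  # total number of state updates so far
        self.x_history: dict[int, np.ndarray] = {0: x0.copy()}

    def update(self, new_x: np.ndarray) -> None:
        self.x = new_x
        self.x_updates += 1
        # Record only every `history_period`-th iteration to bound memory usage.
        if self.x_updates % self.history_period == 0:
            self.x_history[self.x_updates] = new_x.copy()
```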
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 14 comments.
Summary per file:
| File | Description |
|---|---|
| decent_bench/agents.py | Refactored Agent to use dict-based x_history with configurable history_period; added x_updates counter to AgentMetricsView |
| decent_bench/benchmark_problem.py | Added agent_history_period field to BenchmarkProblem and create_regression_problem |
| decent_bench/networks.py | Updated Agent instantiation to pass history_period parameter |
| decent_bench/benchmark.py | Added computational_cost parameter to benchmark function; removed Status wrappers for table/plot generation (now use progress bars) |
| decent_bench/metrics/metric_utils.py | Added MetricProgressBar class; updated x_mean and x_error to work with dict-based history |
| decent_bench/metrics/table_metrics.py | Added customizable fmt parameter for value formatting; refactored to calculate data once per metric; added progress bar |
| decent_bench/metrics/plot_metrics.py | Added ComputationalCost dataclass; updated to handle sparse history and scale x-axis by computational cost; added progress bar |
| test/test_agents.py | Added comprehensive test for in-place operations with history tracking across multiple frameworks |
| docs/source/user.rst | Added example usage of agent_history_period parameter |
| docs/source/developer.rst | Removed trailing whitespace |
| docs/source/api/decent_bench.metrics.metric_utils.rst | Added exclude-members directive to hide MetricProgressBar from API docs |
Would it be of use to be able to plot over iterations and over costs at the same time? That is, for the default plot metrics you would get 4 plots instead of 2?
I'm not sure it would be good to have it as the default, it might be confusing to casual users. but it definitely sounds interesting to have this as a possibility for "advanced" users; so maybe a boolean toggle?

this makes me think about the layout of the plots: currently, the plots are arranged in at most two columns, each with as many elements as needed. however, since all plots share the same x-axis, it might be better to have a single column that is as long as needed (maybe with a warning about poor readability when more than 4-5 plots need to be plotted). this would also allow having a single x-label, and customizing only the y-labels. with this new layout, your suggestion would just require creating two columns, one with iterations on the x-axis and the other with computational cost.
also, currently a legend is printed for each separate subplot; this is redundant information. as part of the layout redesign, we could print only a single legend. and I would actually place it before the first subplot (like figure 1 in https://arxiv.org/abs/2501.13516)
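As an illustration of that layout, a single column of subplots with a shared x-axis and one figure-level legend placed above the first subplot could look like this (a sketch only; the metric names are made up, and the "outside" legend location requires matplotlib >= 3.7 with constrained layout):

```python
import matplotlib.pyplot as plt

metric_names = ["regret", "gradient norm"]  # placeholder metric names
fig, axes = plt.subplots(len(metric_names), 1, sharex=True, layout="constrained")
for ax, name in zip(axes, metric_names):
    ax.plot(range(100), label="algorithm A")  # dummy data
    ax.set_ylabel(name)
axes[-1].set_xlabel("iterations")  # single x-label, since the x-axis is shared
handles, labels = axes[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="outside upper center")  # one legend for the whole figure
plt.show()
```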
decent-bench allows you to benchmark decentralized optimization algorithms under various communication constraints,
providing realistic algorithm comparisons in a user-friendly and highly configurable setting.

Report any bugs you *may find* to `GitHub <https://github.com/team-decent/decent-bench/issues>`_.
could you expand "report any bugs ..." by adding "contributions are welcome, see the developer guide on how to get started. Please contact Dr. Nicola Bastianello (with link https://bastianello.me/) for discussions"? and with this change we can consider #162 closed
I have pushed the majority of the fixes. Will post the rest later this evening. You can now test the new plots.
The new plots look great, thanks!
Will all plot metrics have the same x-label? If not, it might be strange to only show the label for one?
yes, I think all plot metrics will have the same x-label (either "iterations" or "computational cost" -- or whatever names we decide to use). so it will be ok to only show one (the x-label of the plot at the bottom). we should use the
yes, this is exactly what I had in mind, looks great! for the x-label of the first column, I would maybe use "time (computational cost units)" and for the second column "iterations". yes, let's remove the x-label property from plot metrics. it might be interesting to allow users to customize the x-labels of the two columns, but maybe we can think about that later instead. another thing: when we have two columns, the y-label of the plots in each row is going to be the same. in that case we could keep it only for the first column, and also use
Should we maybe have the grid on too? I have turned it on and I think it helps with readability, let me know what you think.
MyPy is failing because it is using networks.py from the main branch for some reason (it refers to line 521, which doesn't exist in this PR). This issue is mentioned by Elias in some issue; it passes mypy on my machine. I don't have access to modify the checks so I cannot fix this, nor do I have a lot of experience with GitHub checks. From what I can find, the workflow should be updated to add: after the
yes, that's a good idea, let's have it on by default. but maybe we can also have a toggle to disable it.
this is very annoying... Elias is not available to fix this until mid-January I think, and I don't know how to do it. does syncing your branch with the current main work as a temporary fix?
def common_sorted_iterations(agents: Sequence[AgentMetricsView]) -> list[int]:
    """
    Get a sorted list of all common iterations reached by agents in *agents*.
I would maybe add a bit more context, like: "since the agents can sample periodically, and potentially at different times, this function can be used to find the iterations where all agents have recorded their states, which can then be used to compute the metrics"
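With the dict-based histories, one plausible implementation is to intersect the recorded iteration numbers (a sketch, assuming each `AgentMetricsView` exposes its `x_history` dict):

```python
from collections.abc import Sequence


def common_sorted_iterations(agents: Sequence) -> list[int]:
    """Sorted iterations at which every agent has a recorded state."""
    common = set(agents[0].x_history)   # iteration numbers of the first agent
    for agent in agents[1:]:
        common &= set(agent.x_history)  # keep only iterations all agents share
    return sorted(common)
```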
subplot.legend()

if len(metrics) > 4:
    LOGGER.warning(
let's open an issue for the following addition: creating several figures, each with 4 plots, when more than 4 metrics are plotted. alternatively, we could provide a boolean toggle to plot each metric in a separate figure. in any case, when we implement storing the results, users will be able to create their own custom plots
Currently we have a sequence of plot metrics to specify what you want to plot; what if we changed it to be nested lists? Then the outer elements would be figures and the inner lists would be the metrics in each figure. Like

plot_metrics=[
    [Regret, GradientNorm],
    [PlotMetric3, PlotMetric4, PlotMetric5],
]

would yield 2 figures, with the first having Regret and GradientNorm and the second figure having three plots.
I think I would prefer that the user still only specifies a list, and the framework takes care of the plotting details. otherwise it would be a little less user-friendly.
but again, let's discuss in the future, there is no rush to change this
The reason why I would like the user to specify what is contained in one figure is that a user might want certain subplots together in one figure. If the framework decides, then they would need to run multiple plotting runs (when we save the final state).
Could make it so that both work. Maybe have an int specifying the maximum number of plots per figure, and have lists of lists also work to manually split them?
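If both input styles were supported, a small normalization step could reduce them to the nested form (purely hypothetical, since this API is only being discussed here; `max_per_figure` is an invented parameter):

```python
def normalize_plot_metrics(plot_metrics, max_per_figure=4):
    """Return a list of figures, each of which is a list of metrics."""
    if plot_metrics and isinstance(plot_metrics[0], list):
        return plot_metrics  # already nested: the user chose the figures
    # Flat list: split automatically into figures of at most max_per_figure plots.
    return [
        plot_metrics[i : i + max_per_figure]
        for i in range(0, len(plot_metrics), max_per_figure)
    ]
```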
I agree that it's good to leave the users more freedom to decide what to do with the plots. however, I have some doubts about having so many options (and their docs), because I think it might get confusing quite quickly and put off some users. in a sense, if we provide this plotting functionality we are providing a non-trivial interface to matplotlib, which is harder to maintain in the long run. I say non-trivial because it's not just passing the same kwargs on to matplotlib. this is the same discussion as networkx in #233
we also need to consider whether users would use these plots directly in papers or if they would want to customize them. I personally would likely look at the plots coming out of decent-bench to decide what to display in papers, and then create my own using the data stored after the benchmark has executed. this is because I often need to control the font size, add a reference number like [4] to legend labels, etc. but if we want to provide this level of customizability of the plots, then essentially we are reproducing matplotlib or providing a complex interface to it, which is not a good idea. and we have just seen that creating plots is hard and subplots are especially messy
so to conclude: I'm not against having some additional options to customize the plots, but I don't want them to be too complicated, since I expect users will create their own plots afterwards. I also don't want to have to maintain complex plotting functionality in the code-base. so for now I would keep the warning, and open an issue copy-pasting this discussion
once my colleagues (hopefully) start using the code-base, I can also ask them what they would like on the plotting side
Maybe then, instead of having customizable plots, we could export the data used to create the plots so that users can customize their own plots? Creating the data for the plots is not trivial as the structure is rather nested; having the pre-calculated data would be much faster, and we could just export it as a csv. We should probably have a logs folder anyway where we store all of the important console outputs and such from each run
yes, I think this is a very good idea. let's aim for two things: 1) a simple plot provided by us at the end of the benchmark execution, with limited customization options (I would keep the option of having iterations and computational cost), 2) the data is also exported in a log/results folder, so that users can create their own plots; in my experience storing data as ndarrays in an npz file works well, but csv also works, although we likely need to use pandas to import it.
but 2) requires a lot of work, so I would open an issue and leave it for the future (also related to #217)
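For reference, the npz route mentioned above is only a few lines with numpy (a sketch with made-up array names):

```python
import numpy as np

iterations = np.array([0, 10, 20])
regret = np.array([1.0, 0.4, 0.1])

np.savez("results.npz", iterations=iterations, regret=regret)  # store named arrays

data = np.load("results.npz")
print(data["iterations"], data["regret"])  # arrays come back under their names
```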
I think we can include this in the metric union (table + plot) issue. Exporting to csv is more general and easier to work with than numpy data imo. A csv also allows users to quickly inspect it; there are easy (I think built-in) ways to import csv files as dictionaries in python using the csv module, but pandas is much easier to work with and allows for very easy plotting.
The logs folder could include the full console log, a file with the table metrics in latex format, and the plot data + image
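And the csv-based alternative, using only the standard library (again a sketch with made-up column names):

```python
import csv

rows = [
    {"iteration": 0, "regret": 1.0, "gradient_norm": 0.5},
    {"iteration": 10, "regret": 0.4, "gradient_norm": 0.2},
]

with open("plot_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["iteration", "regret", "gradient_norm"])
    writer.writeheader()
    writer.writerows(rows)

with open("plot_data.csv", newline="") as f:
    data = list(csv.DictReader(f))  # each row comes back as a dict of strings
```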
good points! sounds good to go for csv files then. and yes, let's include this into #220
It will solve it, but there really isn't a point unless there are major merge conflicts; I've already made sure it passes mypy before committing.
great that the y-label is fixed. I have a 1920x1080 screen resolution, and I just sent you the script by email
ok, then for now let's solve it this way, and we can talk about it with Elias in January
I have hopefully solved it now; I made a lot of changes to hopefully make it more stable. The plots might be very big on lower resolutions if more than 2 plot rows are used, but I kept this because I wanted the plots to keep the same shape and size independent of how many plots you show. I can make the individual plots smaller, but that would require using a smaller font size and some other minor modifications. Let me know what you think. Also moved away from
That is still not how it looks on my end. I spent a lot of time making sure that the plots looked the same no matter if you had 1 plot or 4, and that the label box never clipped into the plots. I copied the logic from the matplotlib source code that calculates the height of certain parts, and did empirical testing for some things that were too complex to find anything on. Are you using any kind of zoom/scaling in Windows?
no, I'm not using any zoom/scaling (at least as far as I was able to verify in Windows' messy settings). one thing to notice is that I'm still on Windows 10 on the KTH laptop. anyway, this is a very messy problem and I don't think we can easily solve it ourselves. I was looking a bit more at matplotlib options, and it looks like there might be a way to specify the legend position with
I think I will remove everything that has to do with trying to make the plots have the same size regardless of how many plots you show, and instead just have a fixed window size (like how matplotlib does it by default) and fit the plots into that. If these plots are meant more as a quick overview of how well an algorithm works, then they don't have to be too pretty. I will also take a look at that "outside upper center". I have not seen that before, and if it exists they haven't updated their loc error message, because it prints all the possible values and it is not in there. Might have been because I was using tight_layout, and it seems like those locations are new for the constrained layout.
sounds good, thank you! I also find that the new loc option is not widely documented, but it seems to be in the stable version... let's see if it works
Let's hope this works, let me know! Also, interestingly, the outside location is not in their error message, see:
I confirm that this works as intended, well done! last thing: could you please update the example plot in the user guide? then I'll merge.






Adds functionality to customize the period at which agent states are saved, which is useful when the agent state is large, to avoid filling up the user's memory. The following is a list of further minor enhancements made:
- …x variable with minimal performance degradation (especially when the history period is greater than 1); see "Allowing __add__ on agent.x" (#212)

closes #197, closes #199, closes #188, closes #198, closes #212, closes #230, closes #162, closes #194
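A hedged usage sketch of the new parameter (the exact import path and the remaining arguments of `create_regression_problem` are assumptions based on the file summaries above):

```python
from decent_bench import benchmark_problem  # assumed import path

# Record each agent's state only every 10 iterations instead of at every one.
problem = benchmark_problem.create_regression_problem(agent_history_period=10)
```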