enh(Agent): Allow for periodic historical tracking of X state #231
nicola-bastianello merged 16 commits into team-decent:main
Conversation
Pull request overview
This PR adds configurable periodic history tracking for agent states to reduce memory usage when agent states are large. The key enhancement allows users to specify a history_period parameter (e.g., 10) so that agent states are only recorded every N iterations instead of at every iteration. Additionally, the PR includes several metric calculation improvements: customizable table formatting, computational cost-based x-axis for plots, performance optimizations for table metric aggregation, and progress bars for metric calculations.
Key Changes
- Modified Agent class to use dict-based history storage with configurable recording periods
- Added ComputationalCost dataclass for plot metrics to display computational cost instead of iterations on x-axis
- Refactored table metric calculation to compute data once and apply multiple statistics, improving performance
- Added MetricProgressBar for user feedback during metric calculations
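To make the history change concrete, here is a minimal sketch of the dict-based, periodic recording idea (the names `history_period`, `x_history`, and `x_updates` follow the file summaries below; the rest of the class is assumed for illustration):

```python
import numpy as np


class Agent:
    """Illustrative agent that records its state every `history_period` iterations."""

    def __init__(self, x0: np.ndarray, history_period: int = 1):
        self.x = x0
        self.history_period = history_period
        self.x_updates = 0  # total number of state updates so far
        self.x_history: dict[int, np.ndarray] = {0: x0.copy()}

    def update(self, new_x: np.ndarray) -> None:
        self.x = new_x
        self.x_updates += 1
        # Record only every `history_period`-th iteration to bound memory usage.
        if self.x_updates % self.history_period == 0:
            self.x_history[self.x_updates] = new_x.copy()
```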
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 14 comments.
Summary per file:
| File | Description |
|---|---|
| decent_bench/agents.py | Refactored Agent to use dict-based x_history with configurable history_period; added x_updates counter to AgentMetricsView |
| decent_bench/benchmark_problem.py | Added agent_history_period field to BenchmarkProblem and create_regression_problem |
| decent_bench/networks.py | Updated Agent instantiation to pass history_period parameter |
| decent_bench/benchmark.py | Added computational_cost parameter to benchmark function; removed Status wrappers for table/plot generation (now use progress bars) |
| decent_bench/metrics/metric_utils.py | Added MetricProgressBar class; updated x_mean and x_error to work with dict-based history |
| decent_bench/metrics/table_metrics.py | Added customizable fmt parameter for value formatting; refactored to calculate data once per metric; added progress bar |
| decent_bench/metrics/plot_metrics.py | Added ComputationalCost dataclass; updated to handle sparse history and scale x-axis by computational cost; added progress bar |
| test/test_agents.py | Added comprehensive test for in-place operations with history tracking across multiple frameworks |
| docs/source/user.rst | Added example usage of agent_history_period parameter |
| docs/source/developer.rst | Removed trailing whitespace |
| docs/source/api/decent_bench.metrics.metric_utils.rst | Added exclude-members directive to hide MetricProgressBar from API docs |
Would it be of use to be able to plot over iterations and over costs at the same time? That is, for the default plot metrics you would get 4 plots instead of 2?
I'm not sure it would be good to have it as the default, it might be confusing to casual users. but it definitely sounds interesting to have this as a possibility for "advanced" users; so maybe a boolean toggle?

this makes me think about the layout of the plots: currently, the plots are arranged in at most two columns, each with as many elements as needed. however, since all plots share the same x-axis, it might be better to have a single column that is as long as needed (maybe with a warning about poor readability when more than 4-5 plots need to be plotted). this would also allow having a single x-label, and customizing only the y-labels. with this new layout, your suggestion would just require creating two columns, one with iterations on the x-axis and the other with computational cost.
also, currently a legend is printed for each separate subplot; this is redundant information. as part of the layout redesign, we could print only a single legend. and I would actually place it before the first subplot (like figure 1 in https://arxiv.org/abs/2501.13516)
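As an illustration of that layout, a single column of subplots with a shared x-axis and one figure-level legend placed above the first subplot could look like this (a sketch only; the metric names are made up, and the "outside" legend location requires matplotlib >= 3.7 with constrained layout):

```python
import matplotlib.pyplot as plt

metric_names = ["regret", "gradient norm"]  # placeholder metric names
fig, axes = plt.subplots(len(metric_names), 1, sharex=True, layout="constrained")
for ax, name in zip(axes, metric_names):
    ax.plot(range(100), label="algorithm A")  # dummy data
    ax.set_ylabel(name)
axes[-1].set_xlabel("iterations")  # single x-label, since the x-axis is shared
handles, labels = axes[0].get_legend_handles_labels()
fig.legend(handles, labels, loc="outside upper center")  # one legend for the whole figure
plt.show()
```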
decent-bench allows you to benchmark decentralized optimization algorithms under various communication constraints,
providing realistic algorithm comparisons in a user-friendly and highly configurable setting.

Report any bugs you *may find* to `GitHub <https://github.com/team-decent/decent-bench/issues>`_.
could you expand "report any bugs ..." by adding "contributions are welcome, see the developer guide on how to get started. Please contact Dr. Nicola Bastianello (with link https://bastianello.me/) for discussions"? and with this change we can consider #162 closed
I have pushed the majority of the fixes. Will post the rest later this evening. You can now test the new plots.
The new plots look great, thanks!
Will all plot metrics have the same x-label? If not, it might be strange to only show the label for one?
yes, I think all plot metrics will have the same x-label (either "iterations" or "computational cost" -- or whatever names we decide to use). so it will be ok to only show one (the x-label of the plot at the bottom). we should use the
yes, this is exactly what I had in mind, looks great! for the x-label of the first column, I would maybe use "time (computational cost units)" and for the second column "iterations". yes, let's remove the x-label property from plot metrics. it might be interesting to allow users to customize the x-labels of the two columns, but maybe we can think about that later instead. another thing: when we have two columns, the y-label of the plots in each row is going to be the same. in that case we could keep it only for the first column, and also use
Should we maybe have the grid on too? I have turned it on and I think it helps with readability, let me know what you think.
MyPy is failing because it is using networks.py from the main branch for some reason (it refers to line 521, which doesn't exist in this PR). This issue is mentioned by Elias in some issue; it passes mypy on my machine. I don't have access to modify the checks so I cannot fix this, nor do I have a lot of experience with GitHub checks. From what I can find, the workflow should be updated to add: after the
yes, that's a good idea, let's have it on by default. but maybe we can also have a toggle to disable it.
this is very annoying... Elias is not available to fix this until mid-January I think, and I don't know how to do it. does syncing your branch with the current main work as a temporary fix?
def common_sorted_iterations(agents: Sequence[AgentMetricsView]) -> list[int]:
    """
    Get a sorted list of all common iterations reached by agents in *agents*.
I would maybe add a bit more context, like: "since the agents can sample periodically, and potentially at different times, this function can be used to find the iterations where all agents have recorded their states, which can then be used to compute the metrics"
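With the dict-based histories, one plausible implementation is to intersect the recorded iteration numbers (a sketch, assuming each `AgentMetricsView` exposes its `x_history` dict):

```python
from collections.abc import Sequence


def common_sorted_iterations(agents: Sequence) -> list[int]:
    """Sorted iterations at which every agent has a recorded state."""
    common = set(agents[0].x_history)   # iteration numbers of the first agent
    for agent in agents[1:]:
        common &= set(agent.x_history)  # keep only iterations all agents share
    return sorted(common)
```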
subplot.legend()

if len(metrics) > 4:
    LOGGER.warning(
let's open an issue for the following addition: creating several figures, each with 4 plots, when more than 4 metrics are plotted. alternatively, we could provide a boolean toggle to plot each metric in a separate figure. in any case, when we implement storing the results, users will be able to create their own custom plots
Currently we have a sequence of plot metrics to specify what you want to plot; what if we changed it to be nested lists? Then the outer elements would be figures and the inner lists would be the metrics in each figure. Like

plot_metrics=[
    [Regret, GradientNorm],
    [PlotMetric3, PlotMetric4, PlotMetric5],
]

would yield 2 figures, with the first having Regret and GradientNorm and the second figure having three plots.
I think I would prefer that the user still only specifies a list, and the framework takes care of the plotting details. otherwise it would be a little less user-friendly.
but again, let's discuss in the future, there is no rush to change this
The reason why I would like the user to specify what is contained in one figure is that a user might want certain subplots together in one figure. If the framework decides, then they would need to run multiple plotting runs (when we save the final state).
Could make it so that both work. Maybe have an int specifying the maximum number of plots per figure, and have lists of lists also work to manually split them?
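If both input styles were supported, a small normalization step could reduce them to the nested form (purely hypothetical, since this API is only being discussed here; `max_per_figure` is an invented parameter):

```python
def normalize_plot_metrics(plot_metrics, max_per_figure=4):
    """Return a list of figures, each of which is a list of metrics."""
    if plot_metrics and isinstance(plot_metrics[0], list):
        return plot_metrics  # already nested: the user chose the figures
    # Flat list: split automatically into figures of at most max_per_figure plots.
    return [
        plot_metrics[i : i + max_per_figure]
        for i in range(0, len(plot_metrics), max_per_figure)
    ]
```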
I agree that it's good to leave the users more freedom to decide what to do with the plots. however, I have some doubts about having so many options (and their docs), because I think it might get confusing quite quickly and put off some users. in a sense, if we provide this plotting functionality we are providing a non-trivial interface to matplotlib, which is harder to maintain in the long run. I say non-trivial because it's not just passing the same kwargs on to matplotlib. this is the same discussion as networkx in #233
we also need to consider whether users would use these plots directly in papers or if they would want to customize them. I personally would likely look at the plots coming out of decent-bench to decide what to display in papers, and then create my own using the data stored after the benchmark has executed. this is because I often need to control the font size, add a reference number like [4] to legend labels, etc. but if we want to provide this level of customizability of the plots, then essentially we are reproducing matplotlib or providing a complex interface to it, which is not a good idea. and we have just seen that creating plots is hard and subplots are especially messy
so to conclude: I'm not against having some additional options to customize the plots, but I don't want them to be too complicated, since I expect users will create their own plots afterwards. I also don't want to have to maintain complex plotting functionality in the code-base. so for now I would keep the warning, and open an issue copy-pasting this discussion
once my colleagues (hopefully) start using the code-base, I can also ask them what they would like on the plotting side
Maybe then, instead of having customizable plots, we could export the data used to create the plots so that users can customize their own plots? Creating the data for the plots is not trivial as the structure is rather nested; having the pre-calculated data would be much faster, and we could just export it as a csv. We should probably have a logs folder anyway where we store all of the important console outputs and such from each run
yes, I think this is a very good idea. let's aim for two things: 1) a simple plot provided by us at the end of the benchmark execution, with limited customization options (I would keep the option of having iterations and computational cost), 2) the data is also exported in a log/results folder, so that users can create their own plots; in my experience storing data as ndarrays in an npz file works well, but csv also works, although we likely need to use pandas to import it.
but 2) requires a lot of work, so I would open an issue and leave it for the future (also related to #217)
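For reference, the npz route mentioned above is only a few lines with numpy (a sketch with made-up array names):

```python
import numpy as np

iterations = np.array([0, 10, 20])
regret = np.array([1.0, 0.4, 0.1])

np.savez("results.npz", iterations=iterations, regret=regret)  # store named arrays

data = np.load("results.npz")
print(data["iterations"], data["regret"])  # arrays come back under their names
```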
I think we can include this in the metric union (table + plot) issue. Exporting to csv is more general and easier to work with than numpy data imo. A csv also allows users to quickly inspect it; there are easy (I think built-in) ways to import csv files as dictionaries in python using the csv module, but pandas is much easier to work with and allows for very easy plotting.
The logs folder could include the full console log, a file with the table metrics in latex format, and the plot data + image
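And the csv-based alternative, using only the standard library (again a sketch with made-up column names):

```python
import csv

rows = [
    {"iteration": 0, "regret": 1.0, "gradient_norm": 0.5},
    {"iteration": 10, "regret": 0.4, "gradient_norm": 0.2},
]

with open("plot_data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["iteration", "regret", "gradient_norm"])
    writer.writeheader()
    writer.writerows(rows)

with open("plot_data.csv", newline="") as f:
    data = list(csv.DictReader(f))  # each row comes back as a dict of strings
```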
good points! sounds good to go for csv files then. and yes, let's include this into #220
It will solve it, but there really isn't a point unless there are major merge conflicts; I've already made sure it passes mypy before committing.
great that the y-label is fixed. I have a 1920x1080 screen resolution, and I just sent you the script by email
ok, then for now let's solve it this way, and we can talk about it with Elias in January
I have hopefully solved it now; I made a lot of changes to hopefully make it more stable. The plots might be very big on lower resolutions if more than 2 plot rows are used, but I kept this because I wanted the plots to keep the same shape and size independent of how many plots you show. I can make the individual plots smaller, but that would require using a smaller font size and some other minor modifications. Let me know what you think. Also moved away from
That is still not how it looks on my end. I spent a lot of time making sure that the plots looked the same no matter if you had 1 plot or 4, and that the label box never clipped into the plots. I copied the logic from the matplotlib source code that calculates the height of certain parts, and did empirical testing for some things that were too complex to find anything on. Are you using any kind of zoom/scaling in Windows?
no, I'm not using any zoom/scaling (at least as far as I was able to verify in Windows' messy settings). one thing to notice is that I'm still on Windows 10 on the KTH laptop. anyway, this is a very messy problem and I don't think we can easily solve it ourselves. I was looking a bit more at matplotlib options, and it looks like there might be a way to specify the legend position with
I think I will remove everything that has to do with trying to make the plots have the same size regardless of how many plots you show, and instead just have a fixed window size (like how matplotlib does it by default) and fit the plots into that. If these plots are meant more as a quick overview of how well an algorithm works, then they don't have to be too pretty. I will also take a look at that "outside upper center". I have not seen that before, and if it exists they haven't updated their loc error message, because it prints all the possible values and it is not in there. Might have been because I was using tight_layout, and it seems like those locations are new for the constrained layout.
sounds good, thank you! I also find that the new loc option is not widely documented, but it seems to be in the stable version... let's see if it works
Let's hope this works, let me know! Also, interestingly, the outside location is not in their error message, see:
I confirm that this works as intended, well done! last thing: could you please update the example plot in the user guide? then I'll merge.






Adds functionality to customize the period at which agent states are saved, which is useful when the agent state is large, to avoid filling up the user's memory. The following is a list of further minor enhancements made:
- …x variable with minimal performance degradation (especially when the history period is greater than 1); see "Allowing __add__ on agent.x" (#212)

closes #197, closes #199, closes #188, closes #198, closes #212, closes #230, closes #162, closes #194
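A hedged usage sketch of the new parameter (the exact import path and the remaining arguments of `create_regression_problem` are assumptions based on the file summaries above):

```python
from decent_bench import benchmark_problem  # assumed import path

# Record each agent's state only every 10 iterations instead of at every one.
problem = benchmark_problem.create_regression_problem(agent_history_period=10)
```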