Skip to content

Conversation

@marvinKaster
Copy link
Collaborator

Introducing nested timers. You can now create timed sections within other timed sections. The spent time will be outputted in a hierarchical format.

Code example

PE(first_timer);
//...
PE(second_timer);
//...
PL(second_timer);
//...
PL(first_timer);

Example output of the ring benchmark

REGION                         CALLS       WALL    WALLCHILDS    WALLCAPTURED%    THREAD        %    
arbor                              2    285.048       285.129            100.0    35.631    100.0    
  run                              0    279.246       273.738             98.0    34.906     98.0    
    update                       199    273.264       218.751             80.1    34.158     95.9    
      advance                  19999    218.640       218.333             99.9    27.330     76.7    
        integrate              19999    218.067       135.328             62.1    27.258     76.5    
          cable              3999999     55.818         0.000              0.0     6.977     19.6    
          current_pas        3999999     24.780         0.000              0.0     3.098      8.7    
          ionupdate          3999999      6.131         0.000              0.0     0.766      2.2    
          zero               3999999      5.964         0.000              0.0     0.745      2.1    
          state_hh           3999999      5.259         0.000              0.0     0.657      1.8    
          threshold          3999999      5.157         0.000              0.0     0.645      1.8    
          samples            3999999      4.836         0.000              0.0     0.604      1.7    
          stimuli            3999999      4.567         0.000              0.0     0.571      1.6    
          current_expsyn     3999999      4.297         0.000              0.0     0.537      1.5    
          current_hh         3999999      4.189         0.000              0.0     0.524      1.5    
          state_expsyn       3999999      4.132         0.000              0.0     0.517      1.4    
          state_pas          3999999      3.698         0.000              0.0     0.462      1.3    
          post               3999999      3.518         0.000              0.0     0.440      1.2    
          setup                19999      2.981         0.000              0.0     0.373      1.0    
          event_expsyn            93      0.000         0.000              0.0     0.000      0.0    
        clear                  19999      0.134         0.000              0.0     0.017      0.0    
        sampledeliver          19999      0.099         0.000              0.0     0.012      0.0    
        samplesetup            19999      0.034         0.000              0.0     0.004      0.0    
      spikes                   19999      0.111         0.000              0.0     0.014      0.0    
    enqueue                    19999      0.458         0.354             77.3     0.057      0.2    
      merge_cell_events        19999      0.306         0.123             40.2     0.038      0.1    
        reserve                19999      0.031         0.000              0.0     0.004      0.0    
        tree                   19999      0.029         0.000              0.0     0.004      0.0    
        setup                  19999      0.022         0.000              0.0     0.003      0.0    
        generators             19999      0.022         0.000              0.0     0.003      0.0    
        push_back              19999      0.019         0.000              0.0     0.002      0.0    
      sort                     19999      0.026         0.000              0.0     0.003      0.0    
      clear                    19999      0.022         0.000              0.0     0.003      0.0    
    exchange                     199      0.016         0.013             81.4     0.002      0.0    
      exchange                   199      0.008         0.006             79.5     0.001      0.0    
        gather                   199      0.005         0.000              0.0     0.001      0.0    
        sort                     199      0.001         0.000              0.0     0.000      0.0    
        remote                   199      0.000         0.000              0.0     0.000      0.0    
        post_process             199      0.000         0.000              0.0     0.000      0.0    
      walkspikes                 199      0.003         0.000              0.0     0.000      0.0    
      gatherlocal                199      0.001         0.000              0.0     0.000      0.0    
      spikeio                    199      0.001         0.000              0.0     0.000      0.0    
  init                             0      5.882         5.883            100.0     0.735      2.1    
    group                          0      5.881         5.716             97.2     0.735      2.1    
      factory                     99      5.715         0.000              0.0     0.714      2.0    
      targets_and_sources         99      0.001         0.000              0.0     0.000      0.0    
    sources                        0      0.001         0.000              0.0     0.000      0.0    
    targets                        0      0.001         0.000              0.0     0.000      0.0    
  update                           0      0.000         0.000             74.0     0.000      0.0    
    update_conns                   0      0.000         0.001            367.1     0.000      0.0    
      local                        0      0.000         0.000              0.0     0.000      0.0    
      sort                         0      0.000         0.000              0.0     0.000      0.0    
      remote                       0      0.000         0.000             43.6     0.000      0.0    
        connections                0      0.000         0.000              0.0     0.000      0.0    
        destructure                0      0.000         0.000              0.0     0.000      0.0    
        sort                       0      0.000         0.000              0.0     0.000      0.0    
      collect_gids                 0      0.000         0.000              0.0     0.000      0.0    
      destructure                  0      0.000         0.000              0.0     0.000      0.0    
      index                        0      0.000         0.000              0.0     0.000      0.0    
      raw                          0      0.000         0.000              0.0     0.000      0.0    
      generated                    0      0.000         0.000              0.0     0.000      0.0    
      partition                    0      0.000         0.000              0.0     0.000      0.0    
    generators                     0      0.000         0.000              0.0     0.000      0.0  

@thorstenhater
Copy link
Contributor

Hi @marvinKaster,

you are running into a CI issue that has been fixed in main. Please rebase and it should be gone.

@thorstenhater
Copy link
Contributor

Hi @marvinKaster,

#2490 has a fix for the last failing test. Could you give it a review?

Best,
TH

Copy link
Contributor

@thorstenhater thorstenhater left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @marvinKaster,

thanks, that will make life a bit easier. I've left a few suggestions.

bool running = false;
};

struct TimerStackHash {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use the internal FNV hasher, which is used in label resolution, too.

current_timer_stack.push_back(index);
auto& cur_acc = accumulators_[current_timer_stack];
if (cur_acc.running) {
throw std::runtime_error("recorder::enter you entered the timer twice "+names[index]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are throwing here anyhow, how about adding more info? The full stack of ids maybe or even more?! See also later.

// recorder implementation

const std::vector<profile_accumulator>& recorder::accumulators() const {
const std::unordered_map<timer_stack, profile_accumulator, TimerStackHash>& recorder::accumulators() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give that type a name please ;)

arb::profile::profiler_enter(region_id_); \
}

// leave a profling region
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly a note for later updates. We could also offer a RAII macro here now, in the style of:

struct AutoProf {
 AutoProf(const auto& label): lbl(label) { enter(lbl); }
 ~AutoProf() { leave(lbl); }
  
  std::string lbl;
};

Even some macro magick could be used to generate unique names based on __FILE__ and __LINE__.

// This is where tasks of the task_group are actually executed.
void operator()() {
auto prev_timer_stack = arb::profile::get_current_timer_stack();
arb::profile::thread_started(timer_stack);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

task_started / _stopped might be more appropriate. Threads are running all the time and are woken from yield/sleep when tasks are pushed.

#define PL arb::profile::profiler_leave
#define PL(name) \
{ \
static std::size_t region_id_ = arb::profile::profiler_region_id(#name); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible extension: we could record FILE LINE FUNCTION when we set up the region.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants