-
Notifications
You must be signed in to change notification settings - Fork 65
More uniform sampling #2440
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
More uniform sampling #2440
Conversation
|
Hi Thorsten, I strongly believe throwing away the genericity of the sample_record is a bad idea; not everything we have now is best represented by a vector of doubles, and certainly not everything we might conceivably represent in the future. It also prevents us from using the type system to ensure that the concordance of probe types and sample consumers. What is the motivation for discarding this? The simple sampler etc. and cable cell-specific sampling data representations can be changed if there's a need, but reducing a relatively small amount of complexity in representation through the removal of a large amount of expressive power is a bad trade-off. |
|
Hi @halfflat, I have been doing a lot of model building / collaboration lately and terms of practical end-user experience, There are four types of meta data (mlocation, [mcable], point_info, [point_info]) and two types of payload (double, [double]). In C++ we therefore need to either know the type of probe or rely on RTTI in some capacity. This is made The Python interface is disjoint from the C++ API. Here we deal with tables of samples at the end of The way interpolation works has bitten me and others before. Overall, my experience is that sampling is quite complex, more powerful than needed in the cases I have come across,
Net effect would be
thus significantly less casting in (C++) user code and only ever the need to deal with one table per probe in Python. Next, I want to allow switching off interpolation and add a few canned samplers in addition to So far, I am just playing with these ideas to see how the effect on production code looks like. |
|
Also, since I am now done with the usage code in C++, you could take a look at |
|
Just on a few of the points raised in description and your notes above, before I look more thoroughly at the probe-demo code.
I think there are two issues that are at the root of the irritations:
On the second point, I think the generic use case, where user code really doesn't care about the semantic content of the probed data but just wants to present it in a generic way, is the less import case when compared to situation where the user code expects particular sorts of probed data for the purposes of further analysis, where a generic representation is less safe than a type-checked representation. But that doesn't mean the generic case shouldn't be better supported. A possible approach is to include in the probe metadata a (possibly empty) function that maps the type-abstracted sample records to e.g. a JSON or flatbuffer (plus flatbuffer schema) representation that in-turn can be parsed and processed in a generic way by the user code. For the first point, I have been wondering if we could adopt a Python async approach: the python lib sets up callbacks that push sample data into queues which are drained asynchronously and which block forward simulation progress until the Python user side code pulls out enough sample data. It would avoid, potentially, the issue of keeping all the sample data in memory before any processing or I/O can deal with it. |
|
The performance / memory concerns of I will think through your points more carefully, but I want to stress the central point that I hid among the details: This is about the user interface, because an elegant design doesn't help if it's annoying to use. The Rust folks use 'ergonomic', which fits the idea really well. I am also not the first person to think about sampling from this point of view, see #1929 |
|
I guess I can't really see how the current design (in broad strokes; details of course can be improved and simple_sampler also is a mess) is less frustrating than the alternative. Squishing everything into the one data representation throws away flexibility and type-safety — we're swapping one problem for another. If you can verify the type, you know the data structure and can work with it. If you have to infer the structure and meaning of the data from a generic representation, you have to do a lot more runtime work to avoid semantic errors. But also, and this is something I really don't see, what is so awful about calling |
|
Just to expand on the first part of the reply: if we represent a set of |
Sure, that isn't up to debate and was not relaxed. The core change is to acknowledge that most often things come in bunches, so instead of one value, there's now a row of values. What I wanted to point out is that there's an injective
Agreed, that wasn't really changed, except me overeagerly flattening locations and cables, which I am unhappy with, see above. Expect that to be reverted. Metadata is still defined by the probe type. I am less sure about the value type though. I am not really fond of building for generality without the need and currently none of our probes do return more than a value or an array thereof.
I fail to see how a user could and should know that and rather protect them from such details
Me too, but I am lacking the time and use case.
True, I am still in favour of that idea. However, let's make it easy on our users (easier at least). |
|
So, now things are looking a lot closer to what I want.
TODO: documentation is outdated now. |
This reverts commit 015db62.
|
Fixed all open points and cleaned up documentation. @halfflat if you're still interested in this topic, please take a look, else I'll open it up for review. |
Intro
This PR explores a few directions for simplifying the sampling interface from the user perspective.
It's not the only way to approach the problems and possibly not the best.
Grievances
There are several stumbling blocks from that perspective that make the use of samplers unergonomic and hard to
communicate
Metadata
Metadata is delivered without type information as
std::anyand must be cast. Missing aconsthas consequences.The type must be determined by knowing the metadata type associatated the probe set in the recipe. Samplers and
probes aren't connected directly to each other, but via a handler that doesn't carry (or allows retrieval of) more
information. There are currently four types:
mlocation,[mcable],point_info, and[point_info].Scalar and Vector probes
Full cell probes generate samples with a vector payload
[double]and vector metadata while point probes generate what's essentially a vector of scalar (payload/metadata) probes.More
anyEach entry in the payload needs to cast from
anyto a scalar or vector (of doubles). Again, missingconstis fatal.Summary Effect
The user is required to distinguish four metadata types and apply the correct casts, while also applying matching casts
to the payload. The information is disaggregated over simulation and recipe code, thus using existing recipes comes at
an extra cost. Moreover, simulations are somewhat brittle towards changes in the probe definitions, which require a full
review of the attached samplers. Despite the obvious implication of metadata type on payload type, the user is required
to track this information.
In Python this might seem less problematic due to dynamic typing, yet differentiating between lists and scalars leads to
different code, which make simulation scripts reliant on dynamic type info (
isinstanceand ilk).Possible Remedies and Status.
A list of ideas to reduce churn and verbosity.
anycasts. Seemed too little of an effect to implement.anycast(per entry!) completely. I also removed the differentiation between location and cables, which I am less fond of. There's also the question whether[any=mlocation|mcable|point_info]orany=[mlocation]|...is preferable, neither is really expression what's going on, namely a homogeneous vector of unknown type.(gid, 'tag')(akacell_address_type). Decided against it, as it doesn't help with the core (in my view) problems; users are still needed to track probe <> sampler relations.anytovariant<...>. Didn't add this -- for now -- since it requires one header to have all the metadata types. Although the set of possible metadata isn't really open, it's possible to change one type without rebuilding all of Arbor. On the positive side, it's more typesafe and transparent to the user. Doesn't help with Python, though.As implemented, C++ and Python code for sampling is now significantly less busy, see
probe-demo.cpp. The backend code is also somewhat more straightforward. Some conceptual feedback is needed before continuing.EDIT: As sensible compromise might be this (not literally, but conceptually, as these are function arguments in Arbor.)