Conversation
|
I think I have an even better solution than parallelizing - eliminating the section entirely and just keeping track of the zone linkage volumes during the zone creation instead of going through everything a second time |
|
So I have an update. It's a bit messy, so I'm not intending to open a pull request to merge it in here; it's meant as an alternative implementation for you to look at. Here is a possible update on its own branch: https://github.com/DESI-UR/VAST/tree/zonelink_redesign_2
The main trick is this: as we build the zones, we have to go through every galaxy/cell anyway, so why not look at their linkage volumes as well and keep track of the zone linkage volumes then? This relies on a key trick: sometimes we will be adding the current galaxy to a zone while its neighbors have not been added to a zone yet, but later, when those neighbors come up as the focus of adding to a zone, we can add the linkage then, and so we can do the 'redundant' thing like:
This eliminates the need for the entire
Something that seemed natural to me is that the maximum linkage volume for a pair of zones could easily be stored in a hashmap/dictionary using the key as a 2-tuple
For clarity, I rewrote the zone creation section using a dictionary where the key is a Zone_ID, and the value has natural-language fields describing the properties of the zone. This makes things clearer in my opinion, but I had a difficult time figuring out what to do with the information in the next section
If you have a better understanding than me about what needs to happen in Voids/prevoids creation, I think I can take another swing at optimizing.
Before I moved to eliminating the zlinks loop entirely, I had started a two-pass section where first we get all the zone linkages, and then we could have had a parallel section calculating the zone linkage volumes. I left that section in there commented out, with a comment heading like "Teriary implementation", if you're wondering what that was for.
Lastly, I haven't addressed the visualization bits. I saw what you added in terms of normal vectors and face stuff, and I think it can be added here, but I was just trying to solve the core speed issue first.
So take a look and let me know what you think. |
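A minimal sketch of the single-pass idea described above, not the code on the branch. The function name, the field names in the zone dictionary, and the rule that the linkage volume of a boundary pair is the smaller of the two cell volumes are all assumptions made for illustration. Each cross-zone pair is recorded exactly once, when the later-processed of the two cells comes up.

```python
def build_zones_and_links(volumes, neighbors):
    """Assign cells to zones and record zone-pair linkage volumes in one pass.

    volumes   : list of cell volumes, one per galaxy (hypothetical input)
    neighbors : neighbors[i] is the list of cell indices adjacent to cell i
    """
    # Visit cells from largest to smallest volume (lowest to highest density).
    order = sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)

    zone_of = [None] * len(volumes)  # None = not yet assigned to a zone
    zones = {}                       # zone ID -> descriptive fields
    links = {}                       # (low ID, high ID) -> max linkage volume

    for i in order:
        biggest = max(neighbors[i], key=lambda j: volumes[j], default=None)
        if biggest is not None and volumes[biggest] > volumes[i]:
            # A strictly larger neighbor was processed earlier: join its zone.
            zone_of[i] = zone_of[biggest]
            zones[zone_of[i]]["cells"].append(i)
        else:
            # Local volume maximum: this cell is the core of a new zone.
            zone_of[i] = len(zones)
            zones[zone_of[i]] = {"core_cell": i, "core_volume": volumes[i], "cells": [i]}

        for j in neighbors[i]:
            zj = zone_of[j]
            if zj is None or zj == zone_of[i]:
                continue  # unprocessed neighbors are handled when their turn comes
            key = (min(zone_of[i], zj), max(zone_of[i], zj))
            link_volume = min(volumes[i], volumes[j])  # assumed linkage rule
            links[key] = max(links.get(key, 0.0), link_volume)

    return zone_of, zones, links
```

Sorting the two zone IDs inside the key means `(a, b)` and `(b, a)` land in the same dictionary entry, so no second pass over the cells is needed to deduplicate.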
|
@QuiteAFoxtrot Thanks for the work on this! I've taken a look at the new code to familiarize myself with it. The Voids class docstring links to the original ZOBOV paper, and my understanding is that the class is adapting section 2.3 of the paper, reproduced here for convenience:
2.3 From zones to voids
Zones are joined as follows. Imagine a 2D density field (represented as height) in a water tank. For each zone z, the water level is set to z’s minimum density, and then raised gradually. Water may flow, along lines joining Voronoi neighbours, into adjacent zones, adding them to the void defined around the zone z. The process stops when water flows into a deeper zone (with a lower minimum than z’s), or if z is the deepest void, when water floods the whole field. The final void corresponding to z is defined as the set of zones containing water just before this happens. The minimum-density (core) particle of the original zone is also the minimum-density particle of the zone’s void. Many low-significance zones fail to annex surrounding zones as they attempt to grow; a zone in this situation has a void equal to itself. The density (water level) at which water flows into a deeper zone is recorded as ρ_l(z) (l stands for ‘link’ to a deeper zone).
With the above analogy, the Zones class goes as far as calculating the heights of the saddle points separating the zones through which water will flow. The Voids class then takes over for the rest of the steps. The VIDE and ZOBOV pruning make use of
I also wonder if the Voids logic could be absorbed into the zone-building loop, similar to what you did for the zone-linking. That would eliminate the need for a separate Voids class. Let me know your thoughts |
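The water-tank description quoted above can be sketched roughly as follows. This is a simplified illustration and not the Voids class: `grow_void`, `zone_min_density`, and `link_density` are hypothetical names, the inputs are densities as in the paper (VAST works with cell volumes), and the choice to count a zone as part of the void once water has entered it is one reading of the quoted text.

```python
import heapq


def grow_void(z, zone_min_density, link_density):
    """Flood outward from zone z until water reaches a deeper zone.

    zone_min_density : zone ID -> minimum density inside that zone
    link_density     : (zone a, zone b) -> saddle-point density between them
    Returns (set of zones in the void, water level at the link or None).
    """
    # Build an adjacency list from the pair-keyed dictionary.
    adjacent = {}
    for (a, b), rho in link_density.items():
        adjacent.setdefault(a, []).append((rho, b))
        adjacent.setdefault(b, []).append((rho, a))

    void = {z}
    level = zone_min_density[z]            # current water level
    frontier = list(adjacent.get(z, []))   # saddles on the void boundary
    heapq.heapify(frontier)

    while frontier:
        rho, other = heapq.heappop(frontier)  # lowest saddle floods first
        if other in void:
            continue
        level = max(level, rho)
        if zone_min_density[other] < zone_min_density[z]:
            return void, level             # water reached a deeper zone
        void.add(other)
        for item in adjacent.get(other, []):
            if item[1] not in void:
                heapq.heappush(frontier, item)

    return void, None                      # deepest zone: the whole field floods
```

For three zones with minimum densities `{0: 1.0, 1: 2.0, 2: 0.5}` and saddles `{(0, 1): 3.0, (0, 2): 4.0}`, zone 0 annexes zone 1 and then links to the deeper zone 2 at a water level of 4.0, while zone 1 fails to annex anything and has a void equal to itself.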
|
An update: I've run some tests and The |
|
I've created a new branch zonelink_redesign_3 that converts the new data structures from zonelink_redesign_2 into a usable algorithm output. I'm noticing that when I run the vsquared example script on both branches, zonelink_redesign_3 produces 619 zones, whereas zonelink_redesign produces 620 zones (same as the main branch). I've verified that the difference originates directly after the zones are created in the zone-building stage and not later in the code. So there seems to be some difference that results in one fewer zone in the new code
|
Maybe something to do with the fact that the old version uses
Whereas in zonelink_redesign2/3 I separated out a new list |
|
That seems to be it. Further comparing the catalogs, this change propagates into which galaxies are considered edge galaxies. The 0-volume cells are no longer being flagged as edge galaxies, since the elist array is initialized to all zeros (non-edge galaxies) and is no longer being updated for these galaxies (in fact, these were the only galaxies that were categorized as "edge galaxies" to begin with, so without them, the edge column output is all set to 0). For consistency with past versions, we may want to flag them as edge galaxies. I'm finding that between zonelink_redesign2/3, the order of a single pair of the voids in the voids output table has been swapped. I'm not sure of the reason for the swap, but that is something to keep in mind for the unit tests |
|
I've updated the edge galaxy calculation in
This is coming directly from the multivoro output, so it might be an issue with multivoro (it happens regardless of whether I set num_cpus to 1 or more than 1). But the multivoro voids have previously been shown to resemble the scipy tessellation voids in slice plots, so this mismatch doesn't appear to be significant to the output, at least for the example script.
I also discovered a potential bug where two saddle-point Voronoi cells could get treated as the same cell if they have the exact same volume, but the chance of this happening seems negligible to me, so I've left my (slower) version of the code that would fix the bug commented out (to clarify, this potential bug is also present on the main branch). |
|
@QuiteAFoxtrot Let me know if you were planning to take another go at optimizing the branch. Apart from the in in |
|
I've figured out what the
Each entry in
The above voids list corresponds to the below arrangement of zones. The zone ID is the number in each box. The size of the box is proportional to the largest Voronoi cell within it (used to calculate the "lowest water depth" for a given saddle point). The saddle points correspond to the letters A through E, with the order of the letters corresponding to the order that the saddle points flood as the water rises (if 3+ zones were to meet at a saddle point, then that letter would appear twice in the below list, but that isn't the case for this example).
Here is a script that generates the above
And here is a version where I add an extra zone so that three zones (0, 1, 5) meet at a single saddle point.
Now as for why the voids need to be stored this way for VIDE to work, I'm still not sure about that |
|
Taking a look at your updates - just some notes:
~line 1496 in classes.py: I like your breakdown & usage of the new data structure here, but I do have a suspicion and a question - if you look at the docs for dict.fromkeys() here: https://docs.python.org/3.10/library/stdtypes.html - it says "fromkeys() is a class method that returns a new dictionary. value defaults to None. All of the values refer to just a single instance, so it generally doesn’t make sense for value to be a mutable object such as an empty list. To get distinct values, use a dict comprehension instead."
Basically, you create a dictionary something like
This way there'd be no mucking about with multiple references to the same underlying list object and we can just use .append() instead of list addition (and then also avoid creating and discarding a bunch of new list objects). |
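A small standalone illustration of the `dict.fromkeys()` behavior quoted above, using made-up zone IDs and string values rather than anything from classes.py. It shows the shared-list pitfall, the list-addition workaround that rebinds a new list on every update, and the dict comprehension that gives each key its own list.

```python
zone_ids = [0, 1, 2]

# Pitfall: every key refers to the SAME list object.
shared = dict.fromkeys(zone_ids, [])
shared[0].append("cell_a")
# shared[1] and shared[2] now also contain "cell_a".

# Workaround with list addition: correct, but each update builds a new list
# and discards the old one.
rebound = dict.fromkeys(zone_ids, [])
rebound[0] = rebound[0] + ["cell_a"]

# Dict comprehension: one distinct list per key, so .append() is safe.
distinct = {zone_id: [] for zone_id in zone_ids}
distinct[0].append("cell_a")
```

With the comprehension, appending to one zone's list leaves the other zones' lists empty, which is the behavior the zone dictionary needs.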
|
I also think this would be a very appropriate time to address the saddle point uniqueness bug, and possibly this zone adjacency bug that you found, though I'm not sure I have a full grasp on them just yet.
On the zone-adjacency side, it looks like Galaxy 5098 may be a 0-volume cell? In the second part of the output you posted under "converted to zones" it has a zone label of -1 instead of zone 152. If it is a zero-volume cell, maybe we should add a zero-volume filter pass before the main cell linking to ensure that these galaxies don't end up in a zone in either direction? I know we do currently check for zero-volume cells to add the -1 label during the zone creation loop, but maybe we missed a detail. Might be worth asking Mark about. My gut tells me that a zero-volume cell is degenerate and should be excluded, but maybe it's still important to know something about which zone(s) it is a neighbor to later in the code (or for visualization?) Edit: see next comment in this thread
On the saddle point bug, is the idea that there might be a breakpoint with a value of, say, 150, and that there are multiple zones which are completely spatially separate? With a goofy 1-D diagram: |
|
I was chatting with Kelly, and I think we might need to do a more robust job in the Tessellation of tracking which galaxies are edge cells (i.e., possibly feeding into this adjacency issue). Right now, we create an |
I've implemented the suggested changes here in |
|
Good catch on galaxy 5098 appearing with a different zone ID label! I checked, and galaxy 5098 has a cell volume of 62.31. Galaxy 11721 has a cell volume of 79.82. So it isn't either of them having 0-volume cells that's causing this bug. Something else is going on.
For the saddle point bug, yes, that 1D diagram illustrates what the problem is. To my understanding, the algorithm would treat zones 1, 2, 3, and 4 as if they all meet at a single saddle point, and it will combine them all into one "contiguous" void during that iteration, despite the fact that the two pairs of zones could be, for example, at completely opposite corners of a survey (edit: this isn't a problem, since the zones are only connected into a single void in the program loop if they flow into one another). I'll set up a demo script to test if the bug works the way I believe it does, or if it's not actually an issue.
I'll look into adding an |
|
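I've updated zonelink_redesign_3 to use the edge_cells array we discussed. I'm not sure if it really makes a difference to use it instead of checking for zero volume cells (if two galaxies were on top of each other as you suggested, I'd think that multivoro would throw an error, but I haven't checked).
I also discovered why galaxy 5098 appeared with two different zone labels. It's because every galaxy's zone label was being set to -1 before they were processed, so if a galaxy's neighbor hadn't been processed yet, the galaxy's zone was being fictitiously linked to the -1 zone. I've changed the behavior so that this no longer happens by initializing the galaxy zone IDs to -2 (and handling neighbors with zone_ID = -2 appropriately) before updating their zones to -1 or a number >=0 when they are processed. Coincidentally, this also removes the redundant "double calculation" of the cell triangle areas that was happening before, so this change will save time.
I think we're about ready to merge zonelink_redesign_3 into main, but let me know if you have other things you'd like to do with the branch first. It would still be nice if we could parallelize the zone-building if possible, but that would be more involved and might be best to start on another branch. Let me know what you think @QuiteAFoxtrot
*It may not be possible to parallelize the main loop, since the order in which the galaxies are iterated is intentionally tied to their cell volumes, but it may still be possible to parallelize the time-intensive triangle calculations by splitting them off into another process. Nothing else in the loop is dependent on the information calculated in the visualization code, so it may be possible to isolate it. |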
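A toy illustration of why a separate "unprocessed" sentinel avoids the fictitious -1 link described above. The helper name and the two-cell example are hypothetical; only the -2 / -1 / >=0 labeling convention comes from the comment.

```python
UNPROCESSED = -2  # label before a galaxy has been visited
EDGE = -1         # label assigned to edge cells once they are processed


def neighbor_zone_labels(i, zone_of, neighbors, unprocessed=None):
    """Return the zone labels adjacent to cell i, other than its own.

    Neighbors whose label equals `unprocessed` are skipped; they record the
    link themselves when their turn comes.
    """
    return {
        zone_of[j]
        for j in neighbors[i]
        if zone_of[j] != unprocessed and zone_of[j] != zone_of[i]
    }


neighbors = [[1], [0]]

# Old behavior: labels start at -1, so cell 0 appears to border the -1 zone
# even though cell 1 simply has not been processed yet.
old_labels = [0, EDGE]
spurious = neighbor_zone_labels(0, old_labels, neighbors)

# New behavior: labels start at -2 and unprocessed neighbors are ignored.
new_labels = [0, UNPROCESSED]
clean = neighbor_zone_labels(0, new_labels, neighbors, unprocessed=UNPROCESSED)
```

Here `spurious` contains -1 while `clean` is empty, which is the difference between the two initializations for a neighbor that has not been visited yet.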
|
Hey Hernan,
Nice work, will try to take a closer look tomorrow morning but for now I
agree, let's merge this into main, then we can start a new issue/branch/pr
for zone building parallelization
Best,
Steve
…On Fri, Feb 6, 2026, 5:24 PM Hernan Rincon ***@***.***> wrote:
*hbrincon* left a comment (DESI-UR/VAST#144)
<#144 (comment)>
I've updated zonelink_redesign_3 to use the edge_cells array we
discussed. I'm not sure if it really makes a difference to use it instead
of checking for zero volume cells (if two galaxies were on top of each
other as you suggested, I'd think that multivoro would throw an error, but
I haven't checked).
I also discovered why galaxy 5098 appeared with two different zone labels.
It's because every galaxy's zone label was being set to -1 before they were
processed, so if a galaxy's neighbor hadn't been processed yet, the
galaxy's zone was being fictitiously linked to the -1 zone. I've changed
the behavior so that this no longer happens by initializing the galaxy
zones IDs to -2 (and handling neighbors with zone_ID = -2 appropriately)
before updating their zones to -1 or a number >=0 when they are processed.
Coincidentally, this also removes the redundant "double calculation" of the
cell triangle areas that was happening before, so this change will save
time.
I think we're about ready to merge zonelink_redesign_3 into main, but let
me know if you have other things you'd like to do with the branch first. It
would still be nice if we could parallelize the zone-building if possible,
but that would be more involved and might be best to start on another
branch. Let me know what you think @QuiteAFoxtrot
<https://github.com/QuiteAFoxtrot>
|
Merge changes from `zonelink_redesign_3` into `zonelink_redesign`
|
I've merged |


A redesign of the zone link calculation to use dictionaries instead of nested lists. Also includes a bug fix for the V2 ellipticity calculation in the catalog class and the addition of optional galaxy weighting for V2.