-
Notifications
You must be signed in to change notification settings - Fork 15
Internally cache network data instead of networks to reduce redundancy in network construction #282
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bockthom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job @maxloeffler!
I really appreciate your changes. Hope that it really boosts the performance of multi-network creation. And even if this is not the case, this PR contains numerous changes I am in favor of.
bockthom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
|
Before I can finish the |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #282 +/- ##
======================================
Coverage ? 82.31%
======================================
Files ? 16
Lines ? 5367
Branches ? 0
======================================
Hits ? 4418
Misses ? 949
Partials ? 0 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
@maxloeffler Could you please figure out why the upload of the coverage report has failed? After the first failing upload (when Codecov started to report a project coverage of 0.00%) I've re-run the upload and it failed again. Thereafter, I've re-run the Build (latest) in order to generate a new coverage report, but even uploading the new report to codecov failed again – I have no idea why. Codecov only provides us with the following hint:
As I have no idea what to do here, I'll leave that up to you to investigate this problem... (but it has a low priority; the other tasks you are currently working on are more important). |
bockthom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes regarding directedness. I have few comments on that - merely regarding tests and documentation.
26aa527 to
a61e6ef
Compare
|
Sorry for the many pushes, im just discovering typos ...... |
"such as example multi networks" is also a typo? |
bockthom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approved. 🥳
But we need to rerun the CI pipeline as soon as R 4.0 is removed from the pipeline so that we can figure out whether codecov complains about changes in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR enhances network building performance by caching internal network data rather than repeatedly constructing full network objects. It updates configuration to support new edge attributes and adds directedness enforcement tests.
- Update allowed edge attributes in
NetworkConf - Add tests to enforce directedness rules in author, artifact, and multi-networks
- Refresh changelog to document caching and directedness behavior changes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| util-conf.R | Add 2025 copyright; adjust allowed edge attributes |
| tests/test-networks.R | Add directedness enforcement tests for partial & multi-networks |
| NEWS.md | Update changelog entries for caching and directedness changes |
Comments suppressed due to low confidence (3)
tests/test-networks.R:1174
- There are directedness tests for author and artifact networks but none for commit networks. Consider adding similar tests for get.commit.network() to ensure commit-network directedness is correctly enforced.
test_that("Enforcement of directedness in sub-networks", {
NEWS.md:14
- [nitpick] The changelog entry references a long list of commit hashes which makes it hard to read. Consider condensing the commit list by referring to the pull request number or grouping hashes, or splitting into multiple bullet points.
- Reduce the amount of redundantly built networks by caching network data internally. This should improve the performance of building multi-networks, especially, when parts of the multi-networks have been built before (#119, PR #282, 64ac42aa743e7f3a724a66bcd551e5b477e30293, ...)
util-conf.R:887
- The update to allowed edge attributes removed 'author.name' and 'author.mail', which are needed when building author-related networks. Please include these attributes back to ensure author metadata isn't dropped.
"event.date", "event.name", "event.info.1", "event.info.2"
This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This function works analogous to 'convert.edge.attributes.to.list' but takes an edge list as input instead of a network. This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This helper function creates a network data object from vertex data and edge data while correctly initializing empty input data. This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
'get.author.network', 'get.artifact.netork', and 'get.commit.network' now receive network data instead of networks from the internal network creation functions. Then these functions merges the received data instead of merging networks (like it used to be). This approach should improve performance as it removes the redundancy of decomposing networks in network data to merge them. This works towards fixing se-sic#119. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
A NULL default for 'edge.data' or 'allowed.edge.attributes' does not correctly represent the default use-case of 'construct.network.data'. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
'author.name' and 'author.mail' are redundant and can be removed. While we currently do not build networks that have 'event.info.1' or 'event.info.2' edge attributes these attributes are present on the source data and should be allowed edge attributes. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
The directedness of the edge construction algorithm should always align with the directedness of the constructed network. Further, when one sub-network of a multi-network requires (un)directed edge construction, all sub-networks of that multi-network must inherit this directedness. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
After a change in Codecov's coverage report processing, uploaded reports must now contain a listing of all files that could be relevant for the report, in order to be valid. This listing is generated by 'codecov-action' from the files in the current working directory which implies that the current directory is the repository. Therefore, we must checkout the repository before triggering the 'codecov-action'. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
|
The coverage upload seems to work now again, thanks for the fix @maxloeffler! Codecov reports 11 lines in |
|
@bockthom coronet does not have tests explicitly focused on caching for any network types. Caching is being capitalized on by tests without notice. My opinion is that if you want tests to cover the caching lines for commit interaction networks then we need more regular tests regarding these networks (there are indeed not too many) but this does not concern this PR here. |
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Prerequisites
showcase.Rwith respect to my changes.dev.Description
Changelog
This works towards fixing #119.
Make internal network creation functions return network data instead of networks. This should improve performance as it removes the redundancy in the
get.author.network,get.artifact.network, andget.commit.networkwhen merging the returns of the internal network creation functions into a single network because merging networks requires decomposition in network data. This decomposition is not necessary anymore. As a consequence, this also removes the necessity to first compose networks before decomposing them directly after.Added
convert.edge.list.attributes.to.listthat converts edge attributes to lists similar toconvert.edge.attributes.to.listbut that takes an edge list as input instead of a networkconstruct.network.datathat takes vertex-, and edge-data and constructs a network data object while correctly initializing empty input dataChanged / Improved