@maxloeffler
Contributor

@maxloeffler maxloeffler commented Apr 7, 2025

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This also includes, e.g., referencing relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

Changelog

This works towards fixing #119.

Make internal network creation functions return network data instead of networks. This should improve performance because it removes a redundancy in get.author.network, get.artifact.network, and get.commit.network: when these functions merge the return values of the internal network creation functions into a single network, merging networks requires decomposing them into network data first. This decomposition is no longer necessary. As a consequence, networks no longer need to be composed only to be decomposed again directly afterwards.

Added

  • Add helper function convert.edge.list.attributes.to.list that converts edge attributes to lists, similar to convert.edge.attributes.to.list, but takes an edge list as input instead of a network
  • Add helper function construct.network.data that takes vertex data and edge data and constructs a network data object while correctly initializing empty input data

Changed / Improved

  • Internal network creation functions now return network data instead of networks
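To illustrate the idea behind the description above, here is a minimal, language-neutral sketch in Python. It is not the coronet implementation (which is written in R): the dictionary layout, the function names, and the vertex deduplication by name are assumptions made for illustration only. It shows why returning plain vertex and edge tables makes merging cheap: the tables can be concatenated directly, without first decomposing graph objects.

```python
from typing import Any

# Hypothetical stand-in for "network data": plain vertex and edge tables
# instead of a graph object. All names and the layout are illustrative.
NetworkData = dict[str, list[dict[str, Any]]]


def construct_network_data(vertices=None, edges=None) -> NetworkData:
    """Bundle vertex and edge records, turning missing input into empty tables."""
    return {"vertices": list(vertices or []), "edges": list(edges or [])}


def convert_edge_list_attributes_to_list(edges, keep=("from", "to")):
    """Wrap every edge attribute except the endpoints in a one-element list.

    Assumed behaviour, chosen for illustration: list-valued attributes can be
    concatenated later when parallel edges are combined.
    """
    return [
        {key: (value if key in keep else [value]) for key, value in edge.items()}
        for edge in edges
    ]


def merge_network_data(parts: list[NetworkData]) -> NetworkData:
    """Merge several network-data objects without building any graph."""
    seen = set()
    vertices = []
    edges = []
    for part in parts:
        for vertex in part["vertices"]:
            if vertex["name"] not in seen:  # deduplicate vertices by name
                seen.add(vertex["name"])
                vertices.append(vertex)
        edges.extend(part["edges"])  # edges are simply concatenated
    return construct_network_data(vertices, edges)


cochange = construct_network_data(
    vertices=[{"name": "A"}, {"name": "B"}],
    edges=[{"from": "A", "to": "B", "relation": "cochange"}],
)
mail = construct_network_data(
    vertices=[{"name": "B"}, {"name": "C"}],
    edges=[{"from": "B", "to": "C", "relation": "mail"}],
)
empty = construct_network_data()  # no vertices, no edges

merged = merge_network_data([cochange, mail, empty])
```

In this sketch a graph object would be built only once, from `merged`, after all parts have been combined.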

@maxloeffler maxloeffler changed the title Internally cache network data instead of networks to reduce redundancy in network constructionq Internally cache network data instead of networks to reduce redundancy in network construction Apr 7, 2025
Collaborator

@bockthom bockthom left a comment

Good job @maxloeffler!

I really appreciate your changes. Hope that it really boosts the performance of multi-network creation. And even if this is not the case, this PR contains numerous changes I am in favor of.

bockthom previously approved these changes Apr 16, 2025
Collaborator

@bockthom bockthom left a comment

Looks good to me.

@maxloeffler
Contributor Author

Before I can finish the NEWS.md, I noticed that I forgot the email in the signed-off tag of the first few commit messages. I will add it, so don't be surprised by updates to all the commits 😄.

codecov bot commented Apr 16, 2025

Codecov Report

Attention: Patch coverage is 96.35762% with 11 lines in your changes missing coverage. Please review.

Please upload report for BASE (dev@6d4d726).
Report is 25 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| util-networks.R | 96.35% | 11 Missing ⚠️ |
Additional details and impacted files
@@          Coverage Diff           @@
##             dev     #282   +/-   ##
======================================
  Coverage       ?   82.31%           
======================================
  Files          ?       16           
  Lines          ?     5367           
  Branches       ?        0           
======================================
  Hits           ?     4418           
  Misses         ?      949           
  Partials       ?        0           

@bockthom
Collaborator

Project coverage is 0.00%.

@maxloeffler Could you please figure out why the upload of the coverage report has failed? After the first failed upload (when Codecov started to report a project coverage of 0.00%), I re-ran the upload and it failed again. I then re-ran the Build (latest) in order to generate a new coverage report, but uploading the new report to Codecov failed as well, and I have no idea why. Codecov only provides us with the following hint:

Unusable report due to issues such as source code unavailability, path mismatch, empty report, or incorrect data format. Please visit our troubleshooting document for assistance.

As I have no idea what to do here, I'll leave it up to you to investigate this problem... (but it has a low priority; the other tasks you are currently working on are more important).

Collaborator

@bockthom bockthom left a comment

Thanks for the fixes regarding directedness. I have a few comments on that, merely regarding tests and documentation.

@maxloeffler maxloeffler force-pushed the dev branch 4 times, most recently from 26aa527 to a61e6ef Compare May 27, 2025 14:43
@maxloeffler
Contributor Author

Sorry for the many pushes, I'm just discovering typos ...

@bockthom
Collaborator

Sorry for the many pushes, I'm just discovering typos ...

"such as example multi networks" is also a typo?

@bockthom bockthom added this to the v5.1 milestone May 30, 2025
bockthom previously approved these changes May 30, 2025
Collaborator

@bockthom bockthom left a comment

Approved. 🥳
But we need to rerun the CI pipeline as soon as R 4.0 is removed from the pipeline so that we can figure out whether codecov complains about changes in this PR.

@bockthom bockthom requested a review from Copilot June 11, 2025 04:37

Copilot AI left a comment

Pull Request Overview

This PR enhances network building performance by caching internal network data rather than repeatedly constructing full network objects. It updates configuration to support new edge attributes and adds directedness enforcement tests.

  • Update allowed edge attributes in NetworkConf
  • Add tests to enforce directedness rules in author, artifact, and multi-networks
  • Refresh changelog to document caching and directedness behavior changes

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| util-conf.R | Add 2025 copyright; adjust allowed edge attributes |
| tests/test-networks.R | Add directedness enforcement tests for partial & multi-networks |
| NEWS.md | Update changelog entries for caching and directedness changes |
Comments suppressed due to low confidence (3)

tests/test-networks.R:1174

  • There are directedness tests for author and artifact networks but none for commit networks. Consider adding similar tests for get.commit.network() to ensure commit-network directedness is correctly enforced.
test_that("Enforcement of directedness in sub-networks", {

NEWS.md:14

  • [nitpick] The changelog entry references a long list of commit hashes which makes it hard to read. Consider condensing the commit list by referring to the pull request number or grouping hashes, or splitting into multiple bullet points.
- Reduce the amount of redundantly built networks by caching network data internally. This should improve the performance of building multi-networks, especially, when parts of the multi-networks have been built before (#119, PR #282, 64ac42aa743e7f3a724a66bcd551e5b477e30293, ...)

util-conf.R:887

  • The update to allowed edge attributes removed 'author.name' and 'author.mail', which are needed when building author-related networks. Please include these attributes back to ensure author metadata isn't dropped.
"event.date", "event.name", "event.info.1", "event.info.2"

This works towards fixing se-sic#119.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

This function works analogously to 'convert.edge.attributes.to.list' but
takes an edge list as input instead of a network.

This works towards fixing se-sic#119.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

This helper function creates a network data object from vertex data and
edge data while correctly initializing empty input data.

This works towards fixing se-sic#119.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

'get.author.network', 'get.artifact.network', and 'get.commit.network'
now receive network data instead of networks from the internal network
creation functions. These functions then merge the received data
instead of merging networks (as they used to).

This approach should improve performance as it removes the redundancy of
decomposing networks into network data in order to merge them.

This works towards fixing se-sic#119.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

A NULL default for 'edge.data' or 'allowed.edge.attributes' does not
correctly represent the default use-case of 'construct.network.data'.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

'author.name' and 'author.mail' are redundant and can be removed.

While we currently do not build networks that have 'event.info.1' or
'event.info.2' edge attributes, these attributes are present in the
source data and should be allowed edge attributes.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

The directedness of the edge construction algorithm should always align
with the directedness of the constructed network. Further, when one
sub-network of a multi-network requires (un)directed edge construction,
all sub-networks of that multi-network must inherit this directedness.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
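The directedness rule stated in the commit message above can be sketched as follows. This is a hypothetical, language-neutral Python illustration and not coronet's R code: the function names, the `None` value for "no requirement", the undirected default, and the decision to raise an error on conflicting requirements are all assumptions made for this example.

```python
from typing import Optional


def resolve_directedness(
    requirements: dict[str, Optional[bool]], default: bool = False
) -> bool:
    """Return the single directedness shared by all sub-networks.

    `requirements` maps a sub-network name to True (needs directed edges),
    False (needs undirected edges), or None (no requirement).
    """
    stated = {value for value in requirements.values() if value is not None}
    if len(stated) > 1:
        # Assumption: conflicting requirements are treated as an error here.
        raise ValueError("sub-networks require conflicting directedness")
    return stated.pop() if stated else default


def build_sub_network(name: str, directed: bool) -> dict:
    """Build a placeholder sub-network whose edge construction and graph agree."""
    return {
        "name": name,
        "edge_construction_directed": directed,
        "network_directed": directed,
    }


# One sub-network requires directed edge construction, the other has no
# requirement, so both inherit directedness.
requirements = {"author": True, "artifact": None}
directed = resolve_directedness(requirements)
multi = [build_sub_network(name, directed) for name in requirements]
```

Resolving the flag once and passing the same value to every sub-network keeps edge construction and network directedness aligned by construction.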
After a change in Codecov's coverage report processing, uploaded reports
must now contain a listing of all files that could be relevant for the
report in order to be valid.
This listing is generated by 'codecov-action' from the files in the
current working directory, which assumes that the current directory is
the repository. Therefore, we must check out the repository before
triggering the 'codecov-action'.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>

@bockthom
Collaborator

bockthom commented Jun 27, 2025

The coverage upload seems to work now again, thanks for the fix @maxloeffler!

Codecov reports 11 lines in util-networks.R that are not covered. Could you please have a look at them?
The first two lines reported can be ignored: Codecov complains about two lines of comments not being covered 🙈. The remaining 9 affect either callgraph or commit-interaction networks. Callgraph can also be ignored.
For the commit-interaction networks, @hechtlC and I are not sure whether these lines should be covered by tests or not.
What is your opinion, @maxloeffler? Maybe we need to enhance existing tests that deal with network caching by also testing for cached commit-interaction networks.

@maxloeffler
Contributor Author

@bockthom coronet does not have tests explicitly focused on caching for any network type. Tests make use of caching only implicitly. In my opinion, if you want tests to cover the caching lines for commit-interaction networks, then we need more regular tests for these networks (there are indeed not many), but that is outside the scope of this PR.

Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
@bockthom bockthom merged commit 6b9a597 into se-sic:dev Jul 1, 2025
9 checks passed