[Spike] [MVP] Package maintenance predictive model #444

@mayaCostantini

Problem statement

While most approaches focus on guaranteeing the provenance of software components, provenance is only one side of sustainable software development. Another side is the attention given to the software components that are critical to the success of the whole software system, including its development, delivery, and operation.

cc @goern

As a Python developer, I would like to be able to predict whether some of my dependencies will become unmaintained over time.

The idea would be to develop a learning model able to predict when a given package will fall below an acceptable level of maintenance. That level could be defined by the user or set directly in the model, in an arbitrary way.
A PoC for this model could use project maintenance data as provided by the OpenSSF Security Scorecards, provided that the upstream project implements Scorecard checks per package version instead of updating the Scorecard checks only for the last commit SHA of the project repository.
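To make the notion of "falling below an acceptable level of maintenance" concrete, here is a minimal sketch. It assumes a hypothetical per-version record holding one maintenance score on a 0-10 scale (the range Scorecard checks use); the `VersionScore` type, the `first_version_below` helper, and the sample values are all made up for illustration and are not part of Scorecards.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VersionScore:
    """Hypothetical record: one maintenance score per released version."""
    version: str
    maintained_score: float  # assumed 0-10 scale


def first_version_below(
    history: Sequence[VersionScore], threshold: float
) -> Optional[str]:
    """Return the first version, in release order, scoring below the threshold."""
    for record in history:
        if record.maintained_score < threshold:
            return record.version
    return None  # the package never dropped below the threshold


# Illustrative values only, not real Scorecard data.
history = [
    VersionScore("1.0.0", 9.0),
    VersionScore("1.1.0", 7.0),
    VersionScore("1.2.0", 4.0),
    VersionScore("2.0.0", 2.0),
]
print(first_version_below(history, threshold=5.0))
```

The threshold is a plain argument here, which matches the idea of letting the user pick it; a model-defined threshold would replace that argument with a learned or configured value.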

Proposal description

  1. Provide a PoC of a model trained on the Scorecards dataset (with Scorecard checks per package version) capable of predicting from which version a package is likely to fall below a predefined level of maintenance. A good candidate for this task could be Multiple Linear Regression (MLR), provided that the MLR assumptions (linear relationship between predictor and response variables, predictors not too strongly correlated, etc.) hold. Other supervised learning models could also be considered.
  • Select features for prediction according to the model chosen
  • Aggregate and process data for training
  • Train and validate the model, and examine coherence of the results
  • Experiment with different models and document a benchmark
  2. Find relevant integrations for the model

Think about ways to provide this model as a service, and where in a Python project lifecycle it would be most relevant for developers to predict the maintenance duration of their dependencies.
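As a rough illustration of the MLR candidate mentioned in the first item, the sketch below fits an ordinary least squares model with the normal equations, using only the standard library. The two features (`days_since_last_commit`, `open_issue_ratio`) and all numbers are assumptions invented for this example; a real PoC would select features from the per-version Scorecard data and would more likely rely on an established library. The MLR assumptions are not checked here, apart from a failure on perfectly collinear features.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_mlr(
    features: Sequence[Sequence[float]], target: Sequence[float]
) -> List[float]:
    """Fit OLS via the normal equations; returns [intercept, b1, b2, ...]."""
    design = [[1.0, *row] for row in features]  # prepend the intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, target)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients: Sequence[float], row: Sequence[float]) -> float:
    """Evaluate the fitted linear model on one feature row."""
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], row))


# Synthetic data, one row per package version (illustrative only):
# [days_since_last_commit, open_issue_ratio] -> maintenance score
X = [[10, 0.1], [30, 0.2], [60, 0.2], [90, 0.5], [150, 0.4]]
y = [8.5, 7.8, 7.2, 5.7, 4.8]

coefficients = fit_mlr(X, y)
threshold = 5.0  # user-defined acceptable maintenance level
predicted = predict(coefficients, [200, 0.6])
print(predicted, predicted < threshold)
```

Because the synthetic scores were generated from an exact linear rule, the fit recovers that rule; real data would be noisy, which is where validation and the benchmark against other supervised models come in.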

Acceptance Criteria

To be defined.

Labels: kind/feature, needs-triage, priority/important-longterm, sig/stack-guidance
