
Improve analysis of Python-based projects #7964

@netomi

I was testing ORT on a couple of projects and noticed that it was rather slow to analyze a Poetry-based Python project (https://github.com/netomi/otterdog). The project has a lock file, so I assumed that dependency resolution would be fast.

After some debugging, it turned out that ORT calls python-inspector on the requirements file exported from Poetry:

13:42:35.163 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'python-inspector --python-version 311 --operating-system linux --json-pdt /tmp/ort-PythonInspector5612839292643848503/python-inspector5965944598659711246.json --analyze-setup-py-insecurely --requirement /tmp/ort-Poetry1934851817638935981/requirements.txt16944187604928218322.tmp --verbose' in '/tmp/ort-Poetry1934851817638935981'...
13:43:10.194 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.plugins.packagemanagers.python.Poetry - Generating 'requirements.txt7902172682984572843.tmp' file in '/home/tn/workspace/eclipse/otterdog' directory...

As the log timestamps show, python-inspector takes about 35 seconds to resolve the dependencies, even though they are all pinned by the lock file.
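To reproduce the timing outside of ORT, the same invocation can be run and timed directly. The following is a minimal sketch, assuming `python-inspector` is on the `PATH`. The flags are copied from the log line above; the helper names `build_command` and `time_run` and the file paths are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def build_command(requirements: Path, output_json: Path) -> list[str]:
    """Build the python-inspector call with the flags seen in the ORT log."""
    return [
        "python-inspector",
        "--python-version", "311",
        "--operating-system", "linux",
        "--json-pdt", str(output_json),
        "--analyze-setup-py-insecurely",
        "--requirement", str(requirements),
        "--verbose",
    ]


def time_run(command: list[str]) -> float:
    """Run the command and return the wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start
```

Calling `time_run(build_command(Path("requirements.txt"), Path("out.json")))` once with the released python-inspector and once with the patched version should give comparable numbers, provided both runs use the same exported requirements file.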

Digging into the code of python-inspector, I found several performance improvements that could be applied relatively easily. I created a PR at aboutcode-org/python-inspector#163.

Running this version of python-inspector with ORT on the same project gives far better results (about 11 seconds, see the timestamps):

13:45:06.460 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'python-inspector --python-version 311 --operating-system linux --json-pdt /tmp/ort-PythonInspector13905845346354005296/python-inspector9105449330839367212.json --analyze-setup-py-insecurely --requirement /tmp/ort-Poetry17813648676616128042/requirements.txt17803415363049372639.tmp --verbose' in '/tmp/ort-Poetry17813648676616128042'...
13:45:17.402 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.plugins.packagemanagers.python.Poetry - Generating 'requirements.txt13258899092414060651.tmp' file in '/home/tn/workspace/eclipse/otterdog' directory...

The output is exactly the same. When your project defines multiple scopes (mine has 5), this improvement really adds up: I could bring the analysis time down from 2 min to 30 s.
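The per-run durations can be read off the two log excerpts. This is a small sketch that treats the timestamp of the following Poetry log line as the approximate end of each python-inspector run (an assumption, since the logs do not show an explicit completion message); `elapsed_seconds` is a hypothetical helper.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Return the seconds between two HH:MM:SS.mmm timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the log excerpts in this issue.
before = elapsed_seconds("13:42:35.163", "13:43:10.194")
after = elapsed_seconds("13:45:06.460", "13:45:17.402")

print(f"before: {before:.3f}s, after: {after:.3f}s, speedup: {before / after:.1f}x")
# prints "before: 35.031s, after: 10.942s, speedup: 3.2x"
```

This covers a single scope only; with several scopes the saving applies to each python-inspector call.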

I would appreciate feedback on the PR so that we can get it into ORT as soon as possible. I am currently investigating running ORT license checks automatically via GitHub Actions, and anything that speeds up the analysis is greatly appreciated.

Labels: analyzer (About the analyzer tool)