-
Notifications
You must be signed in to change notification settings - Fork 27
bug extracting readthedocs badges with regexp. Fixes #860, Fixes #857 #863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| The following metadata fields can be extracted from a readme.md file. | ||
| Unlike others files formats (pom, cargo, cabal...), README documents do not follow a formal specification. They are free‑form text files, usually written in markdown or restructuredtext, and their structure varies widely across projects. SOMEF applies heuristics to identify common sections (e.g., Title, Description, Installation, Usage, License...) and extracts metadata accordingly. | ||
|
|
||
| | Software metadata category | SOMEF metadata JSON path | README.MD metadata file field | | ||
| |--------------------------------|----------------------------------------|----------------------------------------| | ||
| | acknowledgement | acknowledgement[i].result.value | hearders with acknowledgement | | ||
| | citation | citation[i].result.value | headers with citation, reference, cite. Extract bibtext **(1)** | | ||
| | contact | contact[i].result.value | headers with contact | | ||
| | contributing_guidelines | contributing_guidelines[i].result.value | headers with contributing | | ||
| | contributors | contributors[i].result.value | headers with contributor | | ||
| | description | description[i].result.value | headers with description, introduction, basics, initiation, overview | | ||
| | documentation | documentation[i].result.value | github or gitlab url documentation **(2)**, headers with documentation, readthedocs same name project, readthedocs in badges, wiki links in badges and text | | ||
| | download | download[i].result.value | headers with download | | ||
| | executable_example | executable_example[i].result.value | extracts Binder from badgets **(3)** | | ||
| | faq | faq[i].result.value | headers with faq, errors, problems | | ||
| | full_title | full_title[i].result.value | extract full title **(4)** | | ||
| | homepage | homepage[i].result.value | homepage from badgets **(5)** | | ||
| | identifier | idenfier[i].result.value | extract from badgets directly or get from zenodo with latest doi **(6)**, swh identifiers **(7)** | | ||
| | images | images[i].result.value | other images in the README apart from the logo | | ||
| | installation | installation[i].result.value | headers with installation, install, setup, prepare, preparation, manual, guide | | ||
| | license | license[i].result.value | headers with license | | ||
| | logo | logo[i].result.value | look images in badges and text **(8)** | | ||
| | package_distribution | package_distribution[i].result.value | Pypi or latest Pypi version in badges **(9)** | | ||
| | related_documentation | dorelated_documentationumentation[i].result.value | readthedocs diferent name project | | ||
| | run | run[i].result.value | headers with run, execute | | ||
| | readme_url | readme_url[i].result.value | url in raw githubuser content **(10)** | | ||
| | related_papers | related_papers[i].result.value | look for arXiv reference in all the text **(11)** | | ||
| | repository_status | repository_status[i].result.value | badges with Project status **(12)** | | ||
| | requirements | requirements[i].result.value | headers with requirement, prerequisite, dependency, dependent | | ||
| | support | support[i].result.value | headers with support, help, report | | ||
| | support_channels | support_channels[i].result.value | extract information of gitter, reddit and discord in badges and text **(13)** | | ||
| | usage | usage[i].result.value | headers with usage, example, implement, implementation, demo, tutorial, start, started | | ||
|
|
||
|
|
||
| ------ | ||
|
|
||
| **(1)** | ||
| - Example: | ||
| ```bib | ||
| @inproceedings{garijo2017widoco, | ||
| title={WIDOCO: a wizard for documenting ontologies}, | ||
| author={Garijo, Daniel}, | ||
| booktitle={International Semantic Web Conference}, | ||
| pages={94--102}, | ||
| year={2017}, | ||
| organization={Springer, Cham}, | ||
| doi = {10.1007/978-3-319-68204-4_9}, | ||
| funding = {USNSF ICER-1541029, NIH 1R01GM117097-01}, | ||
| url={http://dgarijo.com/papers/widoco-iswc2017.pdf} | ||
| } | ||
| ``` | ||
| - Result: | ||
| ``` | ||
| { | ||
| "result": { | ||
| "value": "@inproceedings{garijo2017widoco,\n url = {http://dgarijo.com/papers/widoco-iswc2017.pdf},\n funding = {USNSF ICER-1541029, NIH 1R01GM117097-01},\n doi = {10.1007/978-3-319-68204-4_9},\n organization = {Springer, Cham},\n year = {2017},\n pages = {94--102},\n booktitle = {International Semantic Web Conference},\n author = {Garijo, Daniel},\n title = {WIDOCO: a wizard for documenting ontologies},\n}", | ||
| "type": "Text_excerpt", | ||
| "format": "bibtex", | ||
| "doi": "10.1007/978-3-319-68204-4_9", | ||
| "title": "WIDOCO: a wizard for documenting ontologies", | ||
| "author": "Garijo, Daniel", | ||
| "url": "http://dgarijo.com/papers/widoco-iswc2017.pdf" | ||
| }, | ||
| } | ||
| ``` | ||
|
|
||
|
|
||
| **(2)** | ||
| - Example if github: | ||
| ``` | ||
| f"https://github.com/{owner}/{repo_name}/tree/{urllib.parse.quote(repo_default_branch)}/{docs_path}" | ||
| ``` | ||
| - Example if gitlab: | ||
| ``` | ||
| f"https://{domain_gitlab}/{owner}/{repo_name}/-/tree/{urllib.parse.quote(repo_default_branch)}/{docs_path}" | ||
| ``` | ||
|
|
||
| **(3)** | ||
| - Example: `[](https://mybinder.org/v2/gh/user/repo/HEAD)` | ||
| - Result: `"value": "https://mybinder.org/v2/gh/user/repo/HEAD"` | ||
|
|
||
| **(4)** | ||
| - Example: `# WIzard for DOCumenting Ontologies (WIDOCO)` | ||
| - Result: | ||
| ``` | ||
| "full_title": [ | ||
| { | ||
| "result": { | ||
| "type": "String", | ||
| "value": "WIzard for DOCumenting Ontologies (WIDOCO)" | ||
| }, | ||
| "confidence": 1, | ||
| "technique": "regular_expression", | ||
| "source": "https://raw.githubusercontent.com/dgarijo/Widoco/master/README.md" | ||
| } | ||
| ] | ||
| ``` | ||
|
|
||
| **(5)** | ||
| - Example: `[](https://myproject.org)` | ||
| - Result: `"value": "https://myproject.org"` | ||
|
|
||
|
|
||
| **(6)** | ||
| - Example: `[](https://doi.org/10.5281/zenodo.11093793)` | ||
| - Result: `"value": "https://doi.org/10.5281/zenodo.11093793"` | ||
|
|
||
| **(7)** | ||
| - Example: `[](https://archive.softwareheritage.org/swh:1:dir:40d462bbecefc3a9c3e810567d1f0d7606e0fae7;origin=...)` | ||
| - Result: ` "value": "https://archive.softwareheritage.org/swh:1:dir:40d462bbecefc3a9c3e810567d1f0d7606e0fae7",` | ||
|
|
||
|
|
||
| **(8)** | ||
| - Example: `` | ||
| - Result: `"value": "https://raw.githubusercontent.com/dgarijo/Widoco/master/src/main/resources/logo/logo2.png"`` | ||
|
|
||
| **(9)** | ||
| - Example: `[](https://badge.fury.io/py/somef) ` | ||
| - Result: `"value": "https://pypi.org/project/somef"` | ||
|
|
||
|
|
||
| **(10)** | ||
| - Example: | ||
| ``` | ||
| [Yulun Zhang](http://yulunzhang.com/), [Yapeng Tian](http://yapengtian.org/), [Yu Kong](http://www1.ece.neu.edu/~yukong/), [Bineng Zhong](https://scholar.google.de/citations?user=hvRBydsAAAAJ&hl=en), and [Yun Fu](http://www1.ece.neu.edu/~yunfu/), "Residual Dense Network for Image Super-Resolution", CVPR 2018 (spotlight), [[arXiv]](https://arxiv.org/abs/1802.08797) | ||
| ``` | ||
| - Result: `"value": "https://arxiv.org/abs/1802.08797"` | ||
|
|
||
|
|
||
| **(11)** | ||
| - Example: | ||
| ``` | ||
| f"https://raw.githubusercontent.com/{owner}/{repo_name}/{repo_ref}/{urllib.parse.quote(partial)}" | ||
| ``` | ||
|
|
||
| **(12)** | ||
| - Example: | ||
| ``` | ||
| [](https://www.repostatus.org/#active) | ||
| ``` | ||
| - Result: | ||
| ``` | ||
| "value": "https://www.repostatus.org/#active", | ||
| "description": "Active \u2013 The project has reached a stable, usable state and is being actively developed." | ||
| ``` | ||
|
|
||
| **(13)** | ||
| - Example: | ||
| ``` | ||
| [](https://gitter.im/myproject/community) | ||
| [Reddit](https://www.reddit.com/r/myproject) | ||
| [Discord](https://discord.com/invite/xyz789) | ||
| ``` | ||
| - Result: | ||
| ``` | ||
| "value": "https://gitter.im/myproject/community" | ||
| .... | ||
| "value": "https://www.reddit.com/r/myproject" | ||
| ..... | ||
| "value": "https://discord.com/invite/xyz789" | ||
| ``` | ||
|
|
||
|
|
File renamed without changes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,33 @@ | ||
|
|
||
| SOMEF recognizes the programming languages used in a software repository by inspecting | ||
| well-known configuration files, dependency descriptors and executable artifacts. | ||
| To know more about the extraction details for each type of file, click on it. | ||
|
|
||
|
|
||
|
|
||
| | Language | Supported Files | | ||
| |-----------|----------------------------| | ||
| | Haskell | [`*.cabal`](./cabal.md) | | ||
| | Java | [`pom.xml`](./pom.md) | | ||
| | JavaScript | [`package.json`](./packagejson.md), [`bower.json`](./bower.md) | | ||
| | Julia | [`Project.toml`](./julia.md) | | ||
| | PHP | [`composer.json`](./composer.md) | | ||
| | Python | [`setup.py`](./setuppy.md), [`pyproject.toml`](./pyprojecttoml.md), [`requirements.txt`](./requirementstxt.md) | | ||
| | R | [`DESCRIPTION`](./description.md) | | ||
| | Ruby | [`*.gemspec`](./gemspec.md) | | ||
| | Rust | [`Cargo.toml`](./cargo.md) | | ||
|
|
||
| --- | ||
|
|
||
| SoMEF also detects the following files to recognize build instructions, workflows or executable examples: | ||
|
|
||
|
|
||
| | Language | Supported Files | Software metadata category | | ||
| |-----------|------------------------------------|-----------------------------| | ||
| | Docker | `Dockerfile`, `docker-compose.yml` | has_built_file | ||
| | Jupyter Notebook | `*.ipynb` | executable_example | | ||
| | Ontologies | `*.ttl`, `*.owl`, `*.nt`, `*.xml`, `*.jsonld` | ontologies | | ||
| | Shell | `*.sh` | has_script_file | | ||
| | YAML | `*.yml`, `*.yaml` | continuous_integration, workflows | ||
|
|
||
|
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this file? Shouldn't it be requirements.txt?