Merged
3 changes: 2 additions & 1 deletion .claude/settings.local.json
@@ -7,7 +7,8 @@
"Bash(gh issue view:*)",
"Bash(pytest:*)",
"Bash(pip search:*)",
- "Bash(psql:*)"
+ "Bash(psql:*)",
+ "Bash(OPTIMAP_LOGGING_LEVEL=WARNING python manage.py test tests.test_work_landing_page.PublicationStatusVisibilityTest)"
],
"deny": [],
"ask": []
19 changes: 17 additions & 2 deletions .claude/temp.md
@@ -1,6 +1,21 @@
# OPTIMAP
add a button to the work landing page for the logged in admin that takes the user directly to the editing view in the Django backend.

for the article http://127.0.0.1:8000/work/10.1007/s11368-020-02742-9/ with the internal ID 949

the editing page is http://127.0.0.1:8000/admin/publications/publication/949/change/

# geoextent
--


expand all harvesting to identify an existing OpenAlex record based on the available unique identifier and store the OpenAlex ID together with the record; if there is no perfect match then the property of the record should be set to None and a separate field should indicate which partial match(es) were found and what kind of match it was (e.g. DOI match, title+author match, etc.);

expand all harvesting to include the messages that led to a warning log also in the email that is sent after the harvesting run, so that the user can see what went wrong without having to check the logs;

--


add feed-based harvesting support (RSS/Atom) for EarthArxiv;

all articles from EarthArxiv are available via https://eartharxiv.org/repository/list/

there is a feed at https://eartharxiv.org/feed/ but it is unclear how many articles it contains
143 changes: 17 additions & 126 deletions CHANGELOG.md
@@ -4,135 +4,26 @@

### Added

- **RSS/Atom feed harvesting support** (`publications/tasks.py`)
- `parse_rss_feed_and_save_publications()` function for parsing RSS/Atom feeds
- `harvest_rss_endpoint()` function for complete RSS harvesting workflow
- Support for RDF-based RSS feeds (Scientific Data journal)
- DOI extraction from multiple feed fields (prism:doi, dc:identifier)
- Duplicate detection by DOI and URL
- Abstract/description extraction from feed content
- feedparser library integration (v6.0.12)
- Added to requirements.txt for RSS/Atom feed parsing
- Supports RSS 1.0/2.0, Atom, and RDF feeds
- Django management command `harvest_journals` enhanced for RSS/Atom feeds
- Added Scientific Data journal with RSS feed support
- Support for both OAI-PMH and RSS/Atom feed types
- Automatic feed type detection based on journal configuration
- Now supports 4 journals: ESSD, AGILE-GISS, GEO-LEO (OAI-PMH), Scientific Data (RSS)
- Comprehensive RSS harvesting tests (`RSSFeedHarvestingTests`)
- 7 test cases covering RSS parsing, duplicate detection, error handling
- Test fixture with sample RDF/RSS feed (`tests/harvesting/rss_feed_sample.xml`)
- Tests for max_records limit, invalid feeds, and HTTP errors
- Django management command `harvest_journals` for harvesting real journal sources
- Command-line options for journal selection, record limits, and source creation
- Detailed progress reporting with colored output
- Statistics for spatial/temporal metadata extraction
- Integration tests for real journal harvesting (`tests/test_real_harvesting.py`)
- 6 tests covering ESSD, AGILE-GISS, GEO-LEO, and EssOAr
- Tests skipped by default (use `SKIP_REAL_HARVESTING=0` to enable)
- Max records parameter to limit harvesting for testing
- Comprehensive error handling tests for OAI-PMH harvesting (`HarvestingErrorTests`)
- 10 test cases covering malformed XML, missing metadata, HTTP errors, network timeouts
- Test fixtures for various error conditions in `tests/harvesting/error_cases/`
- Verification of graceful error handling and logging
- pytest configuration with custom markers (`pytest.ini`)
- `real_harvesting` marker for integration tests
- Configuration for Django test discovery
- **Temporal extent contribution** - Users can now contribute temporal extent (start/end dates) in addition to spatial extent. Works can be published with either spatial, temporal, or both extents. Supports flexible date formats: YYYY, YYYY-MM, YYYY-MM-DD.
- **Complete status workflow documentation** - Documented all 6 publication statuses (Draft, Harvested, Contributed, Published, Testing, Withdrawn) with workflow transitions and visibility rules in README.md.
- **Map popup enhancement** - Added "View Publication Details" button to map popups linking to work landing pages.
- **Admin unpublish functionality** - Admins can unpublish works, changing status from Published to Draft.
- **RSS/Atom feed harvesting support** - Added support for harvesting publications from RSS/Atom feeds in addition to OAI-PMH.
- **Django management command `harvest_journals`** - Command-line tool for harvesting from real journal sources with progress reporting and statistics.
- **Comprehensive test coverage** - Added 40+ new tests covering temporal contribution, status workflow, RSS harvesting, error handling, and real journal harvesting.

### Changed

- Fixed OAI-PMH harvesting test failures by updating response format parameters
- Changed from invalid 'structured'/'raw' to valid 'geojson'/'wkt'/'wkb' formats
- Updated test assertions to expect GeoJSON FeatureCollection
- Fixed syntax errors in `publications/tasks.py`
- Fixed import statement typo
- Fixed indentation in `extract_timeperiod_from_html` function
- Fixed misplaced return statement in `regenerate_geopackage_cache` function
- Fixed test setup method in `tests/test_harvesting.py`
- Removed incorrect `@classmethod` decorator from `setUp` method
- Fixed `test_regular_harvesting.py` to include `max_records` parameter in mock function
- Updated README.md with comprehensive documentation for:
- Integration test execution
- `harvest_journals` management command usage
- Journal harvesting workflows
- **Unified contribution workflow** - Single "Submit contribution" button for both spatial and temporal extent. Users can submit either or both in one action.
- **Unified admin control panel** - Consolidated admin status display, publish/unpublish buttons, and "Edit in Admin" link into single highlighted box at top of work landing page.
- **Improved text wrapping** - Page titles and abstract text now properly wrap on narrow windows instead of overflowing.
- **Unified URL structure** - Changed ID-based URLs from `/publication/<id>/` to `/work/<id>/` for consistency with DOI-based URLs.
- **Refactored views_geometry.py** - Eliminated code duplication by making DOI-based functions wrap ID-based functions. Reduced from 375 to 240 lines (~36% reduction).
- **Renamed "Locate" to "Contribute"** - URL, page title, and navigation updated for clarity about crowdsourcing purpose.
- **Contribute page layout refactored** - Fixed text overflow issues with proper CSS containment strategy.
- **Flexible publishing requirements** - Harvested publications with geometry can be published directly without requiring user contribution.
- **Contribute page login button improved** - Changed to informational disabled button with clear text: "Please log in to contribute (user menu at top right)".

### Fixed

- Docker build for geoextent installation (added git dependency to Dockerfile)
- 18 geoextent API test failures due to invalid response format values
- 8 test setup errors in OAI-PMH harvesting tests
- Test harvesting function signature mismatch

### Deprecated

- None.

### Removed

- None.

### Security

- None.

## [0.2.0] - 2025-10-09

### Added

- Work landing page improvements:
- Clickable DOI links to https://doi.org resolver
- Clickable source links to journal homepages
- Link to raw JSON API response
- Publication title and DOI in HTML `<title>` tag
- Map enhancements on work landing page:
- Fullscreen control using Leaflet Fullscreen plugin
- Custom "Zoom to All Features" button
- Scroll wheel zoom enabled
- Comprehensive test suite for work landing page (9 tests)
- Comprehensive test suite for geoextent API (24 tests)

### Changed

- None.

### Fixed

- None.

### Deprecated

- None.

### Removed

- None.

### Security

- External links (DOI, source, API) now use `target="_blank"` with `rel="noopener"` for security

## [0.1.0] - 2025-04-16

### Added

- Changelog

### Changed

- None.

### Fixed

- None.

### Deprecated

- None.

### Removed

- None.

### Security

- None.
- **JavaScript scope error** - Fixed "drawnItems is not defined" error in contribution form by declaring variable in outer scope.
50 changes: 50 additions & 0 deletions README.md
@@ -12,6 +12,56 @@ The OPTIMAP has the following features:
- Start page with a full screen map (showing geometries and metadata) and a time line of the areas and time periods of interest for scientific publications
- Passwordless login via email
- RESTful API at `/api`
- **Crowdsourced metadata contribution**: Logged-in users can contribute spatial and temporal extent data for publications
- **Publication workflow**: Harvested → Contributed → Published status transitions with full provenance tracking
- **Admin controls**: Publish/unpublish functionality with audit trails

## Publication Status Workflow

Publications in OPTIMAP follow a status-based workflow with six possible states:

### Status Definitions

- **Draft** (`d`): Internal draft state. Not visible to public. Can be edited by admins. Created when unpublishing a published work.
- **Harvested** (`h`): Automatically harvested from OAI-PMH or RSS feeds. May or may not have spatial/temporal extent. Not publicly visible.
- **Contributed** (`c`): User has contributed spatial and/or temporal extent. Awaits admin review. Not publicly visible.
- **Published** (`p`): Public-facing works visible to all users via website, map, API, and feeds.
- **Testing** (`t`): Reserved for testing purposes. Not publicly visible. Admin access only.
- **Withdrawn** (`w`): Publication has been withdrawn or retracted. Not publicly visible.

### Workflow Transitions

**Harvesting → Publishing:**

1. Publication harvested from external source → Status: **Harvested** (`h`)
2. User contributes spatial/temporal extent → Status: **Contributed** (`c`)
3. Admin reviews and approves → Status: **Published** (`p`)
4. If needed, admin can unpublish → Status: **Draft** (`d`)

**Direct Publishing (Skip Contribution):**

- Harvested publications with **at least one extent type** (spatial OR temporal) can be published directly by admins without user contribution

**Contribution Requirements:**

- Users can only contribute to publications with **Harvested** (`h`) status
- Harvested publications **without any extent** require user contribution before publishing
- Contributed publications can always be published after admin review
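
The transition and publishing rules above can be sketched as a few small functions. This is only an illustration under stated assumptions: `Work`, `can_contribute`, `can_publish`, `contribute`, `publish`, and `unpublish` are hypothetical names and not the actual OPTIMAP models or views, and the handling of statuses not mentioned in the rules is a guess.

```python
from dataclasses import dataclass

# Status codes as listed under "Status Definitions".
DRAFT, HARVESTED, CONTRIBUTED, PUBLISHED = "d", "h", "c", "p"


@dataclass
class Work:
    """Hypothetical stand-in for a publication record, not the Django model."""

    status: str
    has_spatial_extent: bool = False
    has_temporal_extent: bool = False


def can_contribute(work: Work) -> bool:
    """Users can only contribute to Harvested publications."""
    return work.status == HARVESTED


def can_publish(work: Work) -> bool:
    """Contributed works can be published; Harvested works need an extent."""
    if work.status == CONTRIBUTED:
        return True
    if work.status == HARVESTED:
        return work.has_spatial_extent or work.has_temporal_extent
    # Assumption: the rules above do not cover other statuses,
    # so this sketch refuses to publish them.
    return False


def contribute(work: Work, spatial: bool = False, temporal: bool = False) -> Work:
    """Record a user contribution and move Harvested -> Contributed."""
    if not can_contribute(work):
        raise ValueError(f"cannot contribute to status {work.status!r}")
    if not (spatial or temporal):
        raise ValueError("a contribution needs a spatial or temporal extent")
    return Work(
        CONTRIBUTED,
        work.has_spatial_extent or spatial,
        work.has_temporal_extent or temporal,
    )


def publish(work: Work) -> Work:
    """Admin action: move an eligible work to Published."""
    if not can_publish(work):
        raise ValueError(f"cannot publish work with status {work.status!r}")
    return Work(PUBLISHED, work.has_spatial_extent, work.has_temporal_extent)


def unpublish(work: Work) -> Work:
    """Admin action: move a Published work back to Draft."""
    if work.status != PUBLISHED:
        raise ValueError("only published works can be unpublished")
    return Work(DRAFT, work.has_spatial_extent, work.has_temporal_extent)


# Usage: a harvested work without any extent needs a contribution first.
harvested = Work(HARVESTED)
published = publish(contribute(harvested, temporal=True))
```

Returning a new `Work` from each transition keeps the sketch free of side effects; a real implementation would more likely update the model instance and append to its provenance field.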

**Visibility Rules:**

- Only **Published** (`p`) status is visible to non-admin users
- All other statuses require admin privileges to view
- Published works appear in: main map, work list, API responses, RSS/Atom feeds
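
As a minimal sketch of the visibility rules above, the check reduces to a single predicate. The function names `is_visible` and `visible_statuses` are hypothetical and chosen for illustration; the real project presumably enforces this in its views and querysets.

```python
from typing import Iterable, List

PUBLISHED = "p"


def is_visible(status: str, is_admin: bool) -> bool:
    """Admins see every status; everyone else sees only Published works."""
    return is_admin or status == PUBLISHED


def visible_statuses(statuses: Iterable[str], is_admin: bool) -> List[str]:
    """Filter a sequence of status codes down to what the viewer may see."""
    return [s for s in statuses if is_visible(s, is_admin)]
```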

**Extent Contribution:**

- Users can contribute **spatial extent** (geographic location) via interactive map with drawing tools
- Users can contribute **temporal extent** (time period) via date form (formats: YYYY, YYYY-MM, YYYY-MM-DD)
- Both extent types can be contributed separately or together in a single submission
- Publications without a DOI are supported via ID-based URLs (`/work/<id>/`)
- All contributions are tracked with full provenance (user, timestamp, changes)
- Contribute page lists publications missing either spatial OR temporal extent
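
The three accepted date formats for temporal extent (YYYY, YYYY-MM, YYYY-MM-DD) can be checked with a short validator. The sketch below is an assumption about how such a check could look, not the project's actual form validation; `is_valid_extent_date` is a hypothetical name.

```python
import re
from datetime import date

# Matches YYYY, YYYY-MM, or YYYY-MM-DD; month and day are optional groups.
_EXTENT_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_extent_date(value: str) -> bool:
    """Return True if value is a real calendar date in one of the three formats."""
    match = _EXTENT_DATE.fullmatch(value.strip())
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day defaults to 1, so only the given parts are checked.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```

Building a `datetime.date` from the captured parts rejects values such as month 13 or 30 February, which a pattern match alone would accept.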

OPTIMAP is based on [Django](https://www.djangoproject.com/) (with [GeoDjango](https://docs.djangoproject.com/en/4.1/ref/contrib/gis/) and [Django REST framework](https://www.django-rest-framework.org/)) with a [PostgreSQL](https://www.postgresql.org/)/[PostGIS](https://postgis.net/) database backend.

76 changes: 76 additions & 0 deletions fixtures/test_data_optimap.json
@@ -72,5 +72,81 @@
"timeperiod_enddate": "[\"2024\"]",
"provenance": "Manually added from file test_data.json using the Django management script."
}
},
{
"model": "publications.publication",
"pk": 903,
"fields": {
"status": "c",
"title": "Contributed Paper - Hamburg Harbor Study",
"abstract": "This paper has been contributed by a user with geolocation data. It studies shipping traffic in Hamburg harbor.",
"publicationDate": "2022-05-15",
"doi": "10.5555/contrib1",
"url": "http://paper.url/contrib1",
"geometry": "SRID=4326;GEOMETRYCOLLECTION(POINT (9.9937 53.5511))",
"creationDate": "2023-01-15T10:20:30.086Z",
"lastUpdate": "2023-01-16T14:35:22.086Z",
"source": 9,
"timeperiod_startdate": "[\"2020\"]",
"timeperiod_enddate": "[\"2021\"]",
"provenance": "Harvested via OAI-PMH on 2023-01-15T10:20:30Z.\n\nGeometry contributed by user test_user@example.com on 2023-01-16T14:35:22Z. Changed geometry from empty to Point. Status changed from Harvested to Contributed."
}
},
{
"model": "publications.publication",
"pk": 904,
"fields": {
"status": "c",
"title": "Contributed Paper - Bavarian Alps Research",
"abstract": "User-contributed geolocation for a study about alpine ecosystems in Bavaria.",
"publicationDate": "2023-03-20",
"doi": "10.5555/contrib2",
"url": "http://paper.url/contrib2",
"geometry": "SRID=4326;GEOMETRYCOLLECTION(POLYGON ((10.5 47.3, 10.5 47.7, 11.2 47.7, 11.2 47.3, 10.5 47.3)))",
"creationDate": "2023-06-10T08:15:45.086Z",
"lastUpdate": "2023-06-11T16:22:10.086Z",
"source": 9,
"timeperiod_startdate": "[\"2022-06\"]",
"timeperiod_enddate": "[\"2023-06\"]",
"provenance": "Harvested via OAI-PMH on 2023-06-10T08:15:45Z.\n\nGeometry contributed by user scientist@example.org on 2023-06-11T16:22:10Z. Changed geometry from empty to Polygon. Status changed from Harvested to Contributed."
}
},
{
"model": "publications.publication",
"pk": 905,
"fields": {
"status": "h",
"title": "Harvested Paper Without DOI - Frankfurt Study",
"abstract": "This harvested paper has no DOI but has a URL identifier. It needs geolocation contribution.",
"publicationDate": "2022-08-10",
"doi": null,
"url": "http://repository.example.org/id/12345",
"geometry": "SRID=4326;GEOMETRYCOLLECTION EMPTY",
"creationDate": "2023-08-01T09:30:00.086Z",
"lastUpdate": "2023-08-01T09:30:00.086Z",
"source": 9,
"timeperiod_startdate": "[\"2021\"]",
"timeperiod_enddate": "[\"2022\"]",
"provenance": "Harvested via RSS feed on 2023-08-01T09:30:00Z from OPTIMAP Test Journal."
}
},
{
"model": "publications.publication",
"pk": 906,
"fields": {
"status": "c",
"title": "Contributed Paper Without DOI - Stuttgart Research",
"abstract": "This paper was harvested without a DOI, but a user contributed geolocation data using the URL identifier.",
"publicationDate": "2023-02-14",
"doi": null,
"url": "http://repository.example.org/id/67890",
"geometry": "SRID=4326;GEOMETRYCOLLECTION(POINT (9.1829 48.7758))",
"creationDate": "2023-09-05T11:45:00.086Z",
"lastUpdate": "2023-09-06T15:20:30.086Z",
"source": 9,
"timeperiod_startdate": "[\"2022\"]",
"timeperiod_enddate": "[\"2023\"]",
"provenance": "Harvested via RSS feed on 2023-09-05T11:45:00Z from OPTIMAP Test Journal.\n\nGeometry contributed by user researcher@example.com on 2023-09-06T15:20:30Z. Changed geometry from empty to Point. Status changed from Harvested to Contributed."
}
}
]
2 changes: 1 addition & 1 deletion optimap/__init__.py
@@ -1,2 +1,2 @@
- __version__ = "0.3.0"
+ __version__ = "0.4.0"
VERSION = __version__
1 change: 1 addition & 0 deletions publications/models.py
@@ -23,6 +23,7 @@
("t", "Testing"),
("w", "Withdrawn"),
("h", "Harvested"),
("c", "Contributed"),
)

EMAIL_STATUS_CHOICES = [