@adelavega
Contributor

Implemented a Taylor & Francis source parser for ACE with the following components:

Implementation Summary

1. Configuration File (ace/sources/TaylorAndFrancis.json)

  • Multiple publisher detection patterns (domain, meta tags, JavaScript object)
  • Custom entity mappings for HTML entities
  • Rate limiting configuration
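
The three kinds of detection signal listed above (domain, meta tags, JavaScript object) can be sketched as a set of regular expressions checked against the raw HTML. This is only an illustration: the pattern strings, the `DETECTION_PATTERNS` name, and the `matched_signals` helper are assumptions and are not taken from TaylorAndFrancis.json.

```python
import re

# Hypothetical detection patterns, one per signal type named in the summary.
# The actual patterns in TaylorAndFrancis.json may differ.
DETECTION_PATTERNS = {
    "domain": re.compile(r"tandfonline\.com", re.I),
    "meta_tag": re.compile(
        r'<meta[^>]+content="[^"]*Taylor\s*(?:&amp;|&|and)\s*Francis', re.I
    ),
    "js_object": re.compile(r"tandf\.tfviewerdata"),
}


def matched_signals(html: str) -> list[str]:
    """Return the names of all detection signals found in the HTML."""
    return [name for name, pattern in DETECTION_PATTERNS.items()
            if pattern.search(html)]


def looks_like_taylor_and_francis(html: str) -> bool:
    """Treat the page as Taylor & Francis if any single signal matches."""
    return bool(matched_signals(html))
```

Checking several independent signals means a page can still be identified when one of them is missing, for example when the HTML was saved without its original URL.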

2. Source Parser (TaylorAndFrancisSource in ace/sources.py)

Key Features:

  • Hybrid extraction approach: Extracts tables from JavaScript objects embedded in HTML, with a CSV download fallback
  • Primary method: Parses tandf.tfviewerdata JavaScript object containing table HTML
  • Fallback method: CSV download endpoints (placeholder implementation)
  • Metadata extraction: Table numbers, labels, and captions from JSON structure
  • Critical fix: Extracts JavaScript data BEFORE parent class removes script tags
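
The ordering issue behind the critical fix can be illustrated with a minimal, standard-library-only sketch. It assumes the page assigns a JSON-compatible object literal to `tandf.tfviewerdata`; the field names (`tables`, `label`, `caption`, `content`), the sample HTML, and both helper functions are hypothetical and do not reproduce the code in ace/sources.py.

```python
import json
import re

# Assumed form of the assignment; the real page may embed the object differently.
_ASSIGNMENT = re.compile(r"tandf\.tfviewerdata\s*=\s*")


def extract_viewer_data(html: str):
    """Return the object assigned to tandf.tfviewerdata, or None if absent."""
    match = _ASSIGNMENT.search(html)
    if match is None:
        return None
    try:
        # raw_decode parses a single JSON value starting at the given index
        # and ignores whatever follows it (e.g. ';</script>').
        data, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data


def strip_scripts(html: str) -> str:
    """Stand-in for the parent-class cleanup that removes <script> tags."""
    return re.sub(r"<script\b.*?</script>", "", html, flags=re.S | re.I)


# Hypothetical sample page with one embedded table.
SAMPLE = (
    '<html><script>tandf.tfviewerdata = {"tables": [{"label": "Table 1", '
    '"caption": "Peak coordinates", '
    '"content": "<table><tr><td>1</td></tr></table>"}]};</script>'
    "<body></body></html>"
)

before = extract_viewer_data(SAMPLE)                # data is still present
after = extract_viewer_data(strip_scripts(SAMPLE))  # data is gone
```

In this sketch `before` holds the table metadata while `after` is `None`, which is why the extraction has to run before any step that discards script tags.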

Technical Implementation:

3. Test Suite (test_taylor_and_francis_source in ace/tests/test_ace.py)

  • Validates source identification
  • Tests table extraction from JavaScript data
  • Verifies metadata extraction (table numbers, labels, captions)
  • Confirms activation parsing from table content

4. Additional Fixes

  • Fixed BeautifulSoup parser warnings in ace/utils.py by specifying "lxml" parser
  • All existing tests continue to pass

Test Results

✅ Test passed successfully - extracts 2 tables from Taylor & Francis HTML with correct metadata and activations

The parser is now ready to handle Taylor & Francis publications and can be extended to support the CSV download fallback method for cases where JavaScript extraction fails.

@adelavega adelavega merged commit f652fbc into master Oct 15, 2025
1 check passed
@jdkent jdkent added this to Planning Dec 19, 2025
@jdkent jdkent moved this to Done in Planning Dec 19, 2025