Feature Request: Support for University Subscriptions to Access Paywalled Journals #194
Description
Hi 👋
First of all, thank you for the great work on AutoResearchClaw. It's an incredibly useful tool for automating literature discovery.
I’d like to propose a feature that could significantly improve access to academic resources: integration with university or institutional subscriptions.
Currently, ResearchClaw effectively gathers information from open-access sources like arXiv, Semantic Scholar, and OpenAlex, and can extract content from publicly accessible PDFs. This is crucial for novelty assessment and citation verification.
However, a significant portion of academic literature is behind paywalls or requires institutional subscriptions. For users affiliated with universities or research institutions, access to these papers is often available through proxies, VPNs, or single sign-on (SSO) mechanisms provided by their institution.
This feature request proposes exploring and implementing mechanisms that let ResearchClaw use a user's existing university or institutional access to reach subscription-based content. Access to a much broader range of academic papers would make its literature reviews more comprehensive and its research deeper.
Proposed Approach (to be investigated):
- Proxy Configuration: Allow users to configure institutional proxy settings (HTTP/HTTPS proxy) that ResearchClaw uses for all outbound web requests, particularly when downloading PDFs or accessing journal websites.
- SSO/Authentication Integration (Complex): Investigate if there are feasible ways to integrate with common university SSO systems (e.g., Shibboleth, OpenAthens) or credential managers to authenticate and gain access to paywalled content. This is likely more complex but would offer a more robust solution.
- Cookies/Session Management: Explore whether cookies or session tokens obtained manually by the user could be passed to the WebCrawler (researchclaw/web/crawler.py) or PDFExtractor (researchclaw/web/pdf_extractor.py) to maintain an authenticated session with journal publishers.
- Integration with Existing Tools: Research existing open-source tools or libraries that handle institutional access to academic resources, and evaluate their potential for integration into ResearchClaw.
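To make the proxy and cookie ideas above more concrete, here is a minimal sketch using only the Python standard library. I don't know which HTTP client the WebCrawler and PDFExtractor use internally, so `build_institutional_opener`, `make_cookie`, and all parameter names are hypothetical, and the proxy URL and cookie values in the usage note are placeholders.

```python
import urllib.request
from http.cookiejar import Cookie, CookieJar
from typing import Mapping, Optional


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Wrap a user-supplied name/value pair as a session cookie for one domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def build_institutional_opener(
    proxy_url: Optional[str] = None,
    cookies: Optional[Mapping[str, str]] = None,
    cookie_domain: str = "",
) -> urllib.request.OpenerDirector:
    """Hypothetical helper: an opener that routes requests through an
    institutional proxy and sends manually exported session cookies."""
    handlers = []
    if proxy_url:
        # Route both plain and TLS traffic through the configured proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    jar = CookieJar()
    for name, value in (cookies or {}).items():
        jar.set_cookie(make_cookie(name, value, cookie_domain))
    # The cookie processor attaches matching cookies to each outgoing request.
    handlers.append(urllib.request.HTTPCookieProcessor(jar))
    return urllib.request.build_opener(*handlers)
```

Usage would look something like `opener = build_institutional_opener("http://proxy.example.edu:8080", {"session": "abc"}, ".publisher.example")` followed by `opener.open(pdf_url)`. Both settings are optional, so users who only have a proxy (or only cookies) can configure just that part.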
Benefits:
- Expanded Literature Coverage: Significantly broaden the scope of papers accessible to ResearchClaw beyond open-access sources.
- Enhanced Research Quality: Allow for more thorough and complete literature reviews, leading to higher-quality research outputs.
- Increased User Value: Make ResearchClaw more valuable for academic users who have institutional access but currently cannot utilize it within the system.
This would be a significant step towards making AutoResearchClaw an even more powerful tool for comprehensive academic research.