feat: add Chinese academic & policy database adapters #243

Open
Muuuun wants to merge 4 commits into jackwener:main from Muuuun:feat/chinese-academic-policy-adapters

Muuuun commented Mar 22, 2026

Summary

  • Add 7 new site adapters for Chinese academic databases and government policy/law databases
  • Academic: baidu-scholar/search, wanfang/search, google-scholar/search
  • Policy & Law: gov-law/search, gov-law/recent, gov-policy/search, gov-policy/recent

Details

| Command | Strategy | Description |
| --- | --- | --- |
| baidu-scholar search <query> | cookie + DOM | Baidu Scholar paper search (title, authors, journal, year, citations) |
| wanfang search <query> | cookie + DOM | Wanfang Data paper search (title, authors, source, year, type, citations) |
| google-scholar search <query> | cookie + DOM | Google Scholar academic search (title, authors, source, year, citations) |
| gov-law search <query> | cookie + Vue Router | National Laws and Regulations Database search (title, status, date, type, department) |
| gov-law recent | cookie + Vue Router | List of the latest laws and regulations |
| gov-policy search <query> | cookie + DOM | Chinese government portal (gov.cn) policy document search (title, description, date, url) |
| gov-policy recent | cookie + DOM | Latest State Council policy documents |

Notable techniques

  • gov-law: The site is a Vue 3 SPA with Element UI. The adapter first injects the search term into the input through the native value setter so that Vue's reactivity picks it up, then triggers the search by calling the Vue Router's push() with a searchWord query param.
  • gov-policy: The search URL was discovered from inline JS source: sousuo.www.gov.cn/sousuo/search.shtml (not the obvious sousuo.gov.cn, which doesn't resolve).
  • wanfang: The site uses obfuscated CSS class names, but span.title, span.authors, and span.essay-type remain stable selectors.
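
The gov-law technique can be sketched roughly as follows. This is a minimal illustration, not the adapter's actual code: the `RouterLike` type, the function names, and the `path` argument are assumptions made for the example. Only the `searchWord` query param and the use of the native value setter come from the description above.

```typescript
// Minimal structural type so the sketch compiles without Vue typings (hypothetical).
interface RouterLike {
  push(location: { path: string; query: Record<string, string> }): unknown;
}

// Find the `value` setter declared on the prototype chain rather than on the
// instance, so an instance-level override cannot swallow the assignment.
function findPrototypeValueSetter(target: object): ((v: string) => void) | undefined {
  let proto: object | null = Object.getPrototypeOf(target);
  while (proto) {
    const desc = Object.getOwnPropertyDescriptor(proto, 'value');
    if (desc?.set) return desc.set as (v: string) => void;
    proto = Object.getPrototypeOf(proto);
  }
  return undefined;
}

// Write the term through the prototype setter, then dispatch an `input` event
// so listeners bound to the field observe the change.
function injectSearchTerm(input: EventTarget, term: string): void {
  const setter = findPrototypeValueSetter(input);
  if (!setter) throw new Error('input has no prototype-level value setter');
  setter.call(input, term);
  input.dispatchEvent(new Event('input', { bubbles: true }));
}

// Inject the term, then navigate with a `searchWord` query param.
// `path` is supplied by the caller; the real route is not given in this PR text.
function searchViaRouter(
  router: RouterLike,
  input: EventTarget,
  term: string,
  path: string,
): unknown {
  injectSearchTerm(input, term);
  return router.push({ path, query: { searchWord: term } });
}
```

In a browser context, `input` would be the page's search `HTMLInputElement` and `router` the instance read from the mounted Vue app.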

Test plan

  • npx tsc --noEmit — type check passed
  • npx vitest run src/ — 306 tests passed
  • opencli validate — 86 CLI definitions validated, 0 errors
  • Manual testing of all 7 commands with real data

🤖 Generated with Claude Code

Add 7 new adapters for Chinese academic and government databases:

Academic:
- baidu-scholar/search: Baidu Scholar paper search (cookie + DOM extraction)
- wanfang/search: Wanfang Data paper search (cookie + DOM extraction)
- google-scholar/search: Google Scholar academic search (cookie + DOM extraction)

Policy & Law:
- gov-law/search: National Laws and Regulations Database search (cookie + Vue Router injection)
- gov-law/recent: Latest laws and regulations (cookie + Vue Router)
- gov-policy/search: Chinese government portal (gov.cn) policy document search (cookie + DOM extraction)
- gov-policy/recent: Latest State Council policy documents (cookie + DOM extraction)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Astro-Han left a comment

Great coverage of academic and government data sources — the gov-law Vue Router technique is clever. A few things I noticed:

gov-policy/recent.ts — missing navigateBefore: false

The other 6 adapters all set navigateBefore: false, but this one doesn't. With Strategy.COOKIE + domain, the framework will auto-navigate to www.gov.cn before func runs, then func navigates again to the target URL — double navigation adding 2-4s overhead.

Strategy — COOKIE vs PUBLIC

These sites all serve public data without requiring login. Strategy.COOKIE forces users to go through the browser extension flow, while Strategy.PUBLIC (with browser: true if DOM extraction is needed) would be lighter. See google/search.ts for a similar pattern.

baidu-scholar/search.ts:43 — duplicate condition

if (t.startsWith('《') || t.startsWith('《'))

Both sides are the same character (U+300A). The second branch is always redundant.
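
A quick way to confirm the redundancy, together with one possible simplification. The helper name is made up for illustration; if a second bracket variant was intended, it would have to be a different character than the one shown.

```typescript
// Both literals from the original condition.
const first = '《';
const second = '《';

// codePointAt(0) returns the Unicode code point of the first character;
// identical code points mean the two startsWith checks are the same test.
const sameCodePoint = first.codePointAt(0) === second.codePointAt(0);

// With identical operands, `a || a` reduces to `a`.
function startsWithTitleBracket(t: string): boolean {
  return t.startsWith('《');
}
```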

gov-law — Vue Router fallback

app.__vue_app__.config.globalProperties.$router is a Vue 3 internal — if the site upgrades or restructures, this silently returns nothing. A null guard with a descriptive CliError would help users understand why the command stopped working.
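
A minimal sketch of such a guard, under stated assumptions: the `CliError` class below is a local stand-in (the project's real constructor signature is not shown in this thread), and `requireVueRouter`, `RouterLike`, and `VueMountPoint` are hypothetical names.

```typescript
// Local stand-in for the project's CliError (assumed shape: message only).
class CliError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'CliError';
  }
}

interface RouterLike {
  push(location: unknown): unknown;
}

// The part of the mount element this sketch relies on; every level is optional
// because the internal layout may change when the site is rebuilt.
interface VueMountPoint {
  __vue_app__?: { config?: { globalProperties?: { $router?: RouterLike } } };
}

// Return the router, or fail loudly instead of silently returning nothing.
function requireVueRouter(mount: VueMountPoint | null | undefined): RouterLike {
  const router = mount?.__vue_app__?.config?.globalProperties?.$router;
  if (!router || typeof router.push !== 'function') {
    throw new CliError(
      'Vue Router not found on the app root; the site may have been upgraded or restructured.',
    );
  }
  return router;
}
```

Throwing at the lookup site means callers never need to interpret a sentinel return value.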

Tests & docs

No E2E tests or documentation updates (README, docs/adapters/, SKILL.md, vitepress sidebar). Per TESTING.md, browser commands should have entries in browser-public.test.ts (or browser-auth.test.ts).

Mu Qiao and others added 2 commits March 22, 2026 16:13
- Change Strategy.COOKIE to Strategy.PUBLIC + browser:true for all
  adapters (these sites serve public data without login)
- Add navigateBefore:false to gov-policy/recent.ts (was missing,
  causing double navigation)
- Fix duplicate condition in baidu-scholar/search.ts (both sides
  of || were identical U+300A)
- Add Vue Router null guards with CliError to gov-law/search.ts
  and gov-law/recent.ts for graceful failure if site restructures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add 7 browser-public E2E tests covering all new adapters:
- baidu-scholar/search
- google-scholar/search
- wanfang/search
- gov-law/recent, gov-law/search
- gov-policy/recent, gov-policy/search

Tests use tryBrowserCommand + expectDataOrSkip pattern (warn+pass
on geo-blocking/bot-detection, per TESTING.md conventions).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
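
The warn-and-pass idea behind these tests can be illustrated as below. This is only a sketch of the pattern: `CommandResult` and `checkDataOrSkip` are hypothetical names, and the project's actual `tryBrowserCommand` and `expectDataOrSkip` helpers may have different signatures and behavior.

```typescript
// Hypothetical result shape: either rows came back or the command failed
// for an environmental reason (geo-blocking, bot detection, and so on).
type CommandResult =
  | { ok: true; rows: unknown[] }
  | { ok: false; reason: string };

// Treat a failed or empty result as a skip with a warning rather than a test
// failure, so CI stays green when a site blocks the runner.
function checkDataOrSkip(
  result: CommandResult,
  warn: (msg: string) => void = console.warn,
): 'passed' | 'skipped' {
  if (!result.ok || result.rows.length === 0) {
    warn(`skipping: ${result.ok ? 'empty result' : result.reason}`);
    return 'skipped';
  }
  return 'passed';
}
```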

Astro-Han left a comment

Thanks for addressing the feedback! Strategy, tests, duplicate condition, and Vue Router guard all look good now. LGTM from my side.

Minor remaining nits (non-blocking):

  • gov-law/recent.ts and search.ts share ~15 lines of identical DOM extraction — could be a shared helper
  • The 'no_router' return from evaluate is not consumed; the location.href check works but is indirect
  • No doc updates (README, docs/adapters/, SKILL.md) — deferring to maintainer on whether that's needed in this PR

- Extract navigateViaVueRouter() and extractLawResults() into
  gov-law/shared.ts — eliminates ~15 lines of duplication
- CliError is now thrown directly in shared helper (no unconsumed
  'no_router' return value)
- search.ts and recent.ts simplified to ~20 lines each

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>