
fix(seo): apply critical and high-priority SEO fixes#248

Open
pbrissaud wants to merge 2 commits into main from seo/critical-and-high-fixes

Conversation

@pbrissaud
Member

Summary

  • Add robots.txt (was missing) — no /_next/* block, AI crawler rules for GPTBot/ClaudeBot/PerplexityBot
  • Add public/llms.txt for AI search readiness (ChatGPT, Perplexity, Claude)
  • Fix relative URLs in generateLearningResourceSchema and generateCourseSchema — now absolute
  • Fix generateMetadata returning {} on 404 in challenges/[slug] and themes/[slug] (now returns noIndex: true)
  • Add noindex metadata to login page
  • Guard ReactQueryDevtools behind NODE_ENV=development (~150KB JS saved in prod)
  • Fix schema spec violations in lib/seo.ts: SearchAction.target, applicationCategory, SoftwareApplication.url, hasCourseInstance, remove invalid timeRequired from BlogPosting, add @id to Organization and WebSite
  • Move SEO audit docs to docs/seo/
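For illustration, a robots.txt matching the description above might look like the following sketch. The exact Allow/Disallow rules and the sitemap URL are assumptions, not the file as committed:

```text
# Default rules for all crawlers — note: no Disallow: /_next/*
User-agent: *
Allow: /

# AI crawler rules (GPTBot / ClaudeBot / PerplexityBot)
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

Sitemap: https://kubeasy.dev/sitemap.xml
```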

Test plan

  • Verify https://kubeasy.dev/robots.txt is accessible and doesn't block /_next/
  • Verify https://kubeasy.dev/llms.txt is accessible
  • Check Google Rich Results Test on a challenge page — LearningResource URL should be absolute
  • Check Google Rich Results Test on a theme page — Course URL should be absolute
  • Confirm /login has noindex in page source
  • Confirm ReactQueryDevtools is absent from production bundle
  • Validate schemas via schema.org validator

🤖 Generated with Claude Code

- Add robots.txt (was missing) without /_next/* block; include AI crawler rules
- Add public/llms.txt for AI search readiness (ChatGPT, Perplexity, Claude)
- Fix relative URLs in generateLearningResourceSchema and generateCourseSchema
- Fix generateMetadata returning {} on 404 in challenges/[slug] and themes/[slug]
- Add noindex metadata to login page
- Guard ReactQueryDevtools behind NODE_ENV=development (~150KB JS saved in prod)
- Fix schema spec violations in lib/seo.ts:
  - SearchAction.target: remove EntryPoint wrapper, use plain string
  - applicationCategory: "DeveloperApplication" → "Developer Tools"
  - Add url to SoftwareApplication schema
  - Add hasCourseInstance to Course schema
  - Remove invalid timeRequired from BlogPosting schema
  - Add @id to Organization and WebSite schemas
- Move SEO audit docs to docs/seo/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vercel

vercel bot commented Mar 13, 2026

The latest updates on your projects.

  • Project: website | Deployment: Ready | Actions: Preview, Comment | Updated (UTC): Mar 13, 2026 6:54pm


@claude

claude bot commented Mar 13, 2026

Claude finished @pbrissaud's task in 2m 11s


PR Review: fix(seo): apply critical and high-priority SEO fixes

Tasks

  • Gather context and understand the PR
  • Read changed files
  • Analyze code quality, bugs, security, performance
  • Post review feedback

Overall Assessment

Good set of targeted SEO fixes. The absolute URL fix for structured data schemas and the noIndex on 404s are the most impactful changes. A few issues worth addressing before merging.


Issues Found

🔴 applicationCategory regression — lib/seo.ts:121

The PR changes applicationCategory from "DeveloperApplication" to "Developer Tools". This is actually a step backward: "DeveloperApplication" maps to the schema:DeveloperApplication enumeration recognized by structured data validators, while "Developer Tools" is an arbitrary string. Google's Rich Results documentation recommends using the schema.org enum URL (https://schema.org/DeveloperApplication) or its short form. The original value was closer to spec.
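As a sketch, the recommended shape would be the following (the name value is an assumption for illustration):

```typescript
// Sketch of a SoftwareApplication schema using the schema.org enum value.
const softwareApplicationSchema = {
  "@context": "https://schema.org",
  "@type": "SoftwareApplication",
  name: "Kubeasy",
  url: "https://kubeasy.dev",
  // Enum value recognized by validators — not the free-form "Developer Tools"
  applicationCategory: "DeveloperApplication",
};

console.log(softwareApplicationSchema.applicationCategory);
```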



🟡 robots.txt — Host: directive is Yandex-only — public/robots.txt:20

Host: https://kubeasy.dev

The Host: directive is only understood by Yandex (and even there it's legacy). Google, Bing, and all other major crawlers ignore it. It also incorrectly includes the https:// prefix — Yandex's spec expects just the hostname (kubeasy.dev). Safe to remove.



🟡 robots.txt — inconsistent User-agent casing — public/robots.txt

The default block uses User-agent: (lowercase a), while GPTBot, OAI-SearchBot, ClaudeBot and PerplexityBot blocks use User-Agent: (uppercase A). The spec is case-insensitive, but consistency makes the file easier to read and maintain.


🟡 robots.txt — /login not disallowed despite being noindex — public/robots.txt

The login page is now noindex, but it's not in robots.txt. The noindex meta tag is sufficient for Google, but best practice is to keep both in sync. Consider adding:

Disallow: /login

🟡 ReactQueryDevtools — static import may not be fully eliminated from the prod bundle — app/layout.tsx:16

import { ReactQueryDevtools } from "@tanstack/react-query-devtools";
// ...
{process.env.NODE_ENV === "development" && <ReactQueryDevtools initialIsOpen={false} />}

Next.js/webpack does perform dead code elimination on process.env.NODE_ENV constants, so this likely works correctly in practice. However, a static import at module level means the bundler must analyze the module before it can tree-shake it. For a guaranteed ~150KB saving, a lazy import is more reliable:

// Lazy-load only in development; next/dynamic splits the devtools into
// a separate chunk that is never requested in production
import dynamic from "next/dynamic";

const ReactQueryDevtools =
  process.env.NODE_ENV === "development"
    ? dynamic(() =>
        import("@tanstack/react-query-devtools").then(
          (m) => m.ReactQueryDevtools
        )
      )
    : () => null;

Or simply wrap in a dedicated DevTools component with a dynamic import. The current approach is acceptable if CI build validation confirms the production bundle doesn't include it.


🟡 hasCourseInstance.courseWorkload hardcoded — lib/seo.ts:201-206

hasCourseInstance: {
  "@type": "CourseInstance",
  courseMode: "online",
  courseWorkload: "PT30M",  // Hardcoded 30 min for ALL themes
  inLanguage: "en",
},

courseWorkload: "PT30M" is applied to every theme regardless of its actual challenge count. A theme with 15 challenges will still show 30 min. This will fail Google's Rich Results Test with incorrect data. Consider either passing courseWorkload as a parameter to generateCourseSchema, calculating it from the total estimated time, or omitting the field entirely until accurate data is available.
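One way to derive an accurate value instead of hardcoding it, sketched below under the assumption that each challenge carries an estimated duration in minutes (the helper name and the estimatedMinutes field are hypothetical, not part of the codebase):

```typescript
// Hypothetical helper: build an ISO 8601 duration from a theme's challenges
// so courseWorkload reflects the actual workload rather than a fixed PT30M.
function toIsoDuration(totalMinutes: number): string {
  const hours = Math.floor(totalMinutes / 60);
  const minutes = totalMinutes % 60;
  if (hours === 0) return `PT${minutes}M`;
  if (minutes === 0) return `PT${hours}H`;
  return `PT${hours}H${minutes}M`;
}

const challenges = [
  { estimatedMinutes: 30 },
  { estimatedMinutes: 45 },
  { estimatedMinutes: 15 },
];
const courseWorkload = toIsoDuration(
  challenges.reduce((sum, c) => sum + c.estimatedMinutes, 0)
);
console.log(courseWorkload); // "PT1H30M"
```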


🟢 llms.txt — hardcoded URLs may go stale — public/llms.txt

The blog URLs and challenge URLs are hardcoded static content. This is fine for now, but if posts are removed or challenge slugs change, the file will contain broken links. No action required immediately, but worth tracking as the content library grows.


Positive Changes ✅

  • Absolute URLs in structured data — Critical fix. Relative URLs in @type: LearningResource and @type: Course would fail validation.
  • noIndex: true on 404 generateMetadata — Previously returning {} meant Next.js would use inherited/default metadata; now correctly suppresses indexing of 404 pages.
  • noIndex on login page — Correct; login has no SEO value and should not appear in search results.
  • @id on Organization and WebSite — Enables proper entity linking in Google's Knowledge Graph.
  • SearchAction.target as plain string — Correct per Google's current Sitelinks Searchbox spec (EntryPoint object was deprecated).
  • aggregateRating: undefined removal — Eliminates an invalid null-ish property from the schema output.
  • hasCourseInstance addition — Makes Course schema more complete (minus the hardcoded workload noted above).
  • robots.txt AI crawler rules — Forward-looking addition for GPTBot/ClaudeBot/PerplexityBot.
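The 404 metadata fix noted above can be sketched as a pure function; the names here are hypothetical stand-ins for the generateMetadata logic in challenges/[slug], not the actual implementation:

```typescript
// Hypothetical stand-in for the generateMetadata 404 branch:
// a missing entity now yields noindex robots metadata instead of {}.
type PageMeta = {
  title?: string;
  robots?: { index: boolean; follow: boolean };
};

function metadataFor(challenge: { title: string } | null): PageMeta {
  if (challenge === null) {
    // Previously {} was returned here, so 404 pages inherited
    // the default (indexable) metadata.
    return { robots: { index: false, follow: false } };
  }
  return { title: challenge.title };
}

console.log(metadataFor(null)); // { robots: { index: false, follow: false } }
```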

- Revert applicationCategory to "DeveloperApplication" (valid schema.org enum)
- Use dynamic import for ReactQueryDevtools — guaranteed exclusion from prod bundle
- Remove courseWorkload from hasCourseInstance (was hardcoded 30min for all themes)
- Fix robots.txt: remove Yandex-only Host: directive, normalize User-agent casing,
  add Disallow: /login to match noindex meta tag

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@claude

claude bot commented Mar 13, 2026

Claude finished @pbrissaud's task in 1m 50s


PR Review: fix(seo): apply critical and high-priority SEO fixes

Tasks

  • Gather context and understand the PR
  • Read changed files and diff
  • Analyze code quality, bugs, security, performance
  • Post review feedback

Follow-up Review (after review feedback commit)

All 6 issues raised in the initial review have been addressed. Here's a verification and the one remaining item I spotted.


Previous Issues — All Resolved ✅

  • applicationCategory: "Developer Tools" (schema regression) → ✅ Reverted to "DeveloperApplication" (lib/seo.ts:120)
  • Host: https://kubeasy.dev (Yandex-only directive) → ✅ Removed from public/robots.txt
  • Inconsistent User-agent casing → ✅ All entries now use User-agent: (lowercase a)
  • /login not in robots.txt Disallow → ✅ Added at line 7
  • ReactQueryDevtools static import (prod bundle risk) → ✅ Replaced with next/dynamic (guaranteed code splitting)
  • Hardcoded courseWorkload: "PT30M" on all themes → ✅ Field removed from hasCourseInstance

New Observation

🟡 generateBlogPostSchema is dead code — lib/seo.ts:266-321

The generateBlogPostSchema function is defined in lib/seo.ts but is no longer imported or called anywhere in the codebase (the blog page only imports generateMetadata and stringifyJsonLd). This leaves ~55 lines of unused code. It should either be removed or re-wired to the blog post page to actually emit the BlogPosting JSON-LD schema (which would be a meaningful SEO improvement for blog posts).
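If re-wiring is the chosen route, the shape would be roughly as follows. The function bodies are simplified stand-ins for the lib/seo.ts exports, and the "<" escaping is a common precaution for inlined JSON-LD rather than confirmed behavior of stringifyJsonLd:

```typescript
// Simplified stand-ins for the lib/seo.ts exports, to show the wiring shape.
function generateBlogPostSchema(post: { title: string; url: string }) {
  return {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    url: post.url,
  };
}

function stringifyJsonLd(schema: object): string {
  // Escape "<" so inlined JSON-LD cannot close the <script> tag early.
  return JSON.stringify(schema).replace(/</g, "\\u003c");
}

// The blog post page would feed this into a <script type="application/ld+json"> tag.
const jsonLd = stringifyJsonLd(
  generateBlogPostSchema({ title: "Hello", url: "https://kubeasy.dev/blog/hello" })
);
console.log(jsonLd.includes('"@type":"BlogPosting"')); // true
```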



Overall Assessment

The PR is in good shape. The absolute URL fix for structured data, noIndex on 404s, dynamic ReactQueryDevtools, and the corrected robots.txt are all solid improvements. The dead code in generateBlogPostSchema is the only remaining item — whether to remove or re-wire it is a judgement call. Ready to merge once that's decided.

