Docs: Overhaul GSoC25_H/README.md to reflect current architecture #31

Open
mallasiddharthreddy wants to merge 3 commits into dbpedia:main from mallasiddharthreddy:docs-readme-overhaul

Conversation

mallasiddharthreddy commented Feb 8, 2026

Context

The previous GSoC25_H/README.md contained weekly personal logs and outdated "TODO" notes from the 2024/2025 cycle, which made it difficult for new contributors to understand the current folder structure and what each component does.

Changes

I have refactored the README into standard technical documentation. It now includes:

  • Directory Structure: A clear tree view of the IndIE, llm_IE, and ReAct modules.
  • Component Documentation: Concise explanations of the extraction pipelines and benchmarks.
  • Installation & Usage: Updated setup instructions based on the current requirements.txt.
  • Benchmarks: Consolidated performance metrics for the Gemma-4B models tested in the llm_IE module.

Impact

This documentation update improves the onboarding experience for future GSoC candidates and maintainers.

Summary by CodeRabbit

  • Documentation
    • Project renamed to "DBpedia Hindi Extraction Framework" and README completely revamped with a modular overview and component taxonomy.
    • Added installation, CLI and Streamlit demo usage, model download/setup instructions, extraction-mode and prompting configuration examples, and end-to-end pipeline flow.
    • Included data/resource guidance, performance benchmarks, licensing/credits, known issues, and references for DBpedia-related utilities.

coderabbitai bot commented Feb 8, 2026

📝 Walkthrough

The README was fully rewritten to replace the "Neural Hindi Wiki Triple Extraction Pipeline" with a comprehensive "DBpedia Hindi Extraction Framework" overview, adding modular component descriptions, installation and usage instructions, configuration examples, data/model resources, benchmarks, and licensing/credits.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation Overhaul: `GSoC25_H/README.md` | Complete rewrite and expansion of the README: renamed project, introduced modular component taxonomy (IndIE, llm_IE, ReAct, link_prediction, src, models, ontology_input, assets), added directory mapping, entry points, extraction-mode and LLM prompting configurations, CLI/Streamlit usage, environment and model-download instructions, datasets/resources, benchmark results, known issues, license and credits. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • mommi84

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: a comprehensive documentation overhaul of the README to reflect the current project architecture, replacing outdated personal logs and TODO notes with standard technical documentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@GSoC25_H/README.md`:
- Around line 5-15: The README shows conflicting project cycles: the GSoC badge
reads “GSoC-2026_Candidate_Work” while the Overview paragraph mentions “GSoC
2024/2025 work”; pick the correct cycle and make them consistent by editing the
badge text or the Overview line in README.md so both reference the same year
(e.g., change the badge label or update the sentence in the Overview from “GSoC
2024/2025 work” to “GSoC 2026 work” or vice versa) and ensure the Note timestamp
(Feb 2026) aligns with the chosen cycle.
- Around line 20-80: Change the Markdown code fence for the directory tree in
README.md to specify a language identifier by replacing the opening ``` with
```text so the block starts with ```text and ends with ```; this satisfies MD040
and improves rendering for the tree listing.

coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@GSoC25_H/README.md`:
- Around line 340-348: README contains incomplete paths; update the three
referenced locations to the full consistent directory hierarchy used in this
repo (use either repo-root-relative or file-relative consistently).
Specifically, replace `hindi-benchie/hindi_benchie_gold.txt` with
`GSoC25_H/IndIE/hindi-benchie/hindi_benchie_gold.txt` (or
`./IndIE/hindi-benchie/hindi_benchie_gold.txt`), replace
`link_prediction/data/*.ttl.bz2` with `GSoC25_H/link_prediction/data/*.ttl.bz2`
(or `./link_prediction/data/*.ttl.bz2`), and replace
`ontology_input/ontology--DEV_type=parsed.ttl` with
`GSoC25_H/ontology_input/ontology--DEV_type=parsed.ttl` (or
`./ontology_input/ontology--DEV_type=parsed.ttl`); ensure the chosen convention
is applied uniformly across the README.
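
One way to keep documented paths from drifting is to check them against the working tree. The sketch below assumes the repo-root-relative convention (the first option in the comment above); `missing_paths` and `README_PATHS` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Repo-root-relative paths taken from the review comment above.
README_PATHS = [
    "GSoC25_H/IndIE/hindi-benchie/hindi_benchie_gold.txt",
    "GSoC25_H/link_prediction/data/*.ttl.bz2",
    "GSoC25_H/ontology_input/ontology--DEV_type=parsed.ttl",
]

def missing_paths(repo_root, patterns):
    """Return the patterns that match no file or directory under repo_root."""
    root = Path(repo_root)
    # Path.glob handles both literal relative paths and wildcard patterns.
    return [pattern for pattern in patterns if not any(root.glob(pattern))]

if __name__ == "__main__":
    # Run from the repository root; prints one line per path that cannot be found.
    for pattern in missing_paths(".", README_PATHS):
        print(f"not found: {pattern}")
```

If the file-relative convention (`./IndIE/...`) is chosen instead, the same check works by passing the `GSoC25_H` directory as `repo_root` and dropping the `GSoC25_H/` prefix from each pattern.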
🧹 Nitpick comments (3)
GSoC25_H/README.md (3)

207-207: Replace the placeholder with actual repository URL or clearer instruction.

The <repository-url> placeholder should be replaced with either the actual GitHub repository URL or a more descriptive instruction such as https://github.com/dbpedia/neural-extraction-framework (based on the PR URL context).

📝 Suggested improvement
-git clone <repository-url>
+git clone https://github.com/dbpedia/neural-extraction-framework
 cd GSoC25_H

Or if the URL should remain flexible:

-git clone <repository-url>
+git clone https://github.com/dbpedia/neural-extraction-framework.git
+# Or clone from your fork: git clone https://github.com/YOUR_USERNAME/neural-extraction-framework.git
 cd GSoC25_H

282-283: Avoid hardcoded line numbers in documentation.

Referencing specific line numbers ("lines 45-50") creates a maintenance burden, as these become outdated when code changes. Consider describing the configuration location more generically or referencing a specific variable/section name.

📝 Suggested alternative
-# Configure extraction mode in main.py (lines 45-50)
-# Set: use_llm, llm_fallback, llm_enhancement, llm_filter_mode
+# Configure extraction mode in main.py
+# Locate the hyper_params dictionary and set: use_llm, llm_fallback, llm_enhancement, llm_filter_mode
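
A name-based reference ages better than line numbers because both readers and scripts can look the setting up by key. In the sketch below, the four key names come from the suggestion above; the values and the `missing_keys` helper are hypothetical, since this thread does not show the actual contents of main.py.

```python
# Key names are from the review suggestion; the values are placeholders, not the project's defaults.
hyper_params = {
    "use_llm": True,
    "llm_fallback": False,
    "llm_enhancement": False,
    "llm_filter_mode": None,
}

EXPECTED_KEYS = ("use_llm", "llm_fallback", "llm_enhancement", "llm_filter_mode")

def missing_keys(params, expected=EXPECTED_KEYS):
    """Return the expected extraction-mode keys that are absent from params."""
    return [key for key in expected if key not in params]

print(missing_keys(hyper_params))
```

A reader searching main.py for `hyper_params` will find the dictionary wherever it moves, which is the point of the suggested wording.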

448-448: Replace IP address link with official ACL Anthology URL.

The link uses an IP address (http://103.25.231.59:80), which is unreliable and unsustainable. Replace it with the official ACL Anthology entry:

- **IndIE Paper:** [IJCNLP-AACL 2023 Findings](https://aclanthology.org/2023.findings-ijcnlp.28/)

Alternatively, use the DOI reference: 10.18653/v1/2023.findings-ijcnlp.28
