Docs: Overhaul GSoC25_H/README.md to reflect current architecture #31
mallasiddharthreddy wants to merge 3 commits into dbpedia:main
Walkthrough
The README was fully rewritten to replace the "Neural Hindi Wiki Triple Extraction Pipeline" with a comprehensive "DBpedia Hindi Extraction Framework" overview, adding modular component descriptions, installation and usage instructions, configuration examples, data/model resources, benchmarks, and licensing/credits.
Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 3 passed
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@GSoC25_H/README.md`:
- Around line 5-15: The README shows conflicting project cycles: the GSoC badge
reads “GSoC-2026_Candidate_Work” while the Overview paragraph mentions “GSoC
2024/2025 work”; pick the correct cycle and make them consistent by editing the
badge text or the Overview line in README.md so both reference the same year
(e.g., change the badge label or update the sentence in the Overview from “GSoC
2024/2025 work” to “GSoC 2026 work” or vice versa) and ensure the Note timestamp
(Feb 2026) aligns with the chosen cycle.
- Around line 20-80: Change the Markdown code fence for the directory tree in
README.md to specify a language identifier by replacing the opening ``` with
```text so the block starts with ```text and ends with ```; this satisfies MD040
and improves rendering for the tree listing.
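The MD040 point above can be checked mechanically. The following is a minimal sketch, not the linter's actual implementation: it assumes fences are plain triple backticks (no `~~~` or longer fences), and the function name `fences_missing_language` is hypothetical.

```python
FENCE = "`" * 3  # triple backtick, built this way to keep the example readable


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    missing = []
    inside_block = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside_block:
            # This fence closes the current block; nothing to check.
            inside_block = False
        else:
            inside_block = True
            if stripped == FENCE:
                # Opening fence with no identifier such as "text".
                missing.append(number)
    return missing


# Illustrative input: one untagged block followed by one tagged block.
sample = "\n".join(
    ["intro", FENCE, "GSoC25_H/", FENCE, FENCE + "text", "GSoC25_H/", FENCE]
)
untagged = fences_missing_language(sample)
```

Running it over the README text before committing would flag the directory-tree block until its opening fence carries `text`.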
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@GSoC25_H/README.md`:
- Around line 340-348: README contains incomplete paths; update the three
referenced locations to the full consistent directory hierarchy used in this
repo (use either repo-root-relative or file-relative consistently).
Specifically, replace `hindi-benchie/hindi_benchie_gold.txt` with
`GSoC25_H/IndIE/hindi-benchie/hindi_benchie_gold.txt` (or
`./IndIE/hindi-benchie/hindi_benchie_gold.txt`), replace
`link_prediction/data/*.ttl.bz2` with `GSoC25_H/link_prediction/data/*.ttl.bz2`
(or `./link_prediction/data/*.ttl.bz2`), and replace
`ontology_input/ontology--DEV_type=parsed.ttl` with
`GSoC25_H/ontology_input/ontology--DEV_type=parsed.ttl` (or
`./ontology_input/ontology--DEV_type=parsed.ttl`); ensure the chosen convention
is applied uniformly across the README.
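One way to confirm that the corrected paths resolve is to test them against a checkout. This is a sketch under assumptions: it uses the repo-root-relative convention, the helper name `missing_paths` is hypothetical, and the repository layout is taken from the comment above rather than verified.

```python
from pathlib import Path

# Repo-root-relative paths as proposed in the review comment.
README_PATHS = [
    "GSoC25_H/IndIE/hindi-benchie/hindi_benchie_gold.txt",
    "GSoC25_H/link_prediction/data/*.ttl.bz2",
    "GSoC25_H/ontology_input/ontology--DEV_type=parsed.ttl",
]


def missing_paths(repo_root: Path, patterns: list[str]) -> list[str]:
    """Return every pattern that matches no file or directory under repo_root."""
    # Path.glob accepts literal relative paths as well as wildcard patterns,
    # so one code path covers both the exact files and the *.ttl.bz2 entry.
    return [pattern for pattern in patterns if not any(repo_root.glob(pattern))]
```

Calling `missing_paths(Path("."), README_PATHS)` from the repository root returns an empty list when all three references resolve.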
🧹 Nitpick comments (3)
GSoC25_H/README.md (3)
207-207: Replace the placeholder with actual repository URL or clearer instruction.
The `<repository-url>` placeholder should be replaced with either the actual GitHub repository URL or a more descriptive instruction such as `https://github.com/dbpedia/neural-extraction-framework` (based on the PR URL context).
📝 Suggested improvement
    -git clone <repository-url>
    +git clone https://github.com/dbpedia/neural-extraction-framework
     cd GSoC25_H
Or if the URL should remain flexible:
    -git clone <repository-url>
    +git clone https://github.com/dbpedia/neural-extraction-framework.git
    +# Or clone from your fork: git clone https://github.com/YOUR_USERNAME/neural-extraction-framework.git
     cd GSoC25_H
282-283: Avoid hardcoded line numbers in documentation.
Referencing specific line numbers ("lines 45-50") creates a maintenance burden, as these become outdated when code changes. Consider describing the configuration location more generically or referencing a specific variable/section name.
📝 Suggested alternative
    -# Configure extraction mode in main.py (lines 45-50)
    -# Set: use_llm, llm_fallback, llm_enhancement, llm_filter_mode
    +# Configure extraction mode in main.py
    +# Locate the hyper_params dictionary and set: use_llm, llm_fallback, llm_enhancement, llm_filter_mode
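References of this kind can be caught before review with a simple text scan. The sketch below is an illustration only: the pattern and the name `find_line_references` are assumptions, and it will also flag harmless phrases such as "line 3 of the output", so matches need a human look.

```python
import re

# Matches "line 12", "lines 45-50", "Lines 45 - 50" and the en-dash variant.
LINE_REFERENCE = re.compile(r"\blines?\s+\d+(?:\s*[-\u2013]\s*\d+)?", re.IGNORECASE)


def find_line_references(text: str) -> list[str]:
    """Return every hardcoded line-number reference found in text."""
    return LINE_REFERENCE.findall(text)


before = "# Configure extraction mode in main.py (lines 45-50)"
after = "# Locate the hyper_params dictionary and set: use_llm, llm_fallback"
flagged = find_line_references(before)
clean = find_line_references(after)
```

The original README comment is flagged, while the suggested wording, which points at the `hyper_params` dictionary by name, passes.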
448-448: Replace IP address link with official ACL Anthology URL.
The link uses an IP address (`http://103.25.231.59:80`), which is unreliable and unsustainable. Replace it with the official ACL Anthology entry:
    - **IndIE Paper:** [IJCNLP-AACL 2023 Findings](https://aclanthology.org/2023.findings-ijcnlp.28/)
Alternatively, use the DOI reference: `10.18653/v1/2023.findings-ijcnlp.28`
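Links that point at a raw IPv4 address can be located with a short scan of the README text. This is a rough sketch: the regular expression and the name `find_ip_links` are assumptions, it does not validate octet ranges, and it ignores IPv6 hosts.

```python
import re

# http(s) URLs whose host is four dot-separated number groups, optional port.
IP_LINK = re.compile(r"https?://(?:\d{1,3}\.){3}\d{1,3}(?::\d+)?")


def find_ip_links(markdown: str) -> list[str]:
    """Return the scheme, host and port of every link that uses a bare IPv4 host."""
    return IP_LINK.findall(markdown)


readme_excerpt = (
    "- **IndIE Paper:** [link](http://103.25.231.59:80)\n"
    "- **IndIE Paper:** [IJCNLP-AACL 2023 Findings]"
    "(https://aclanthology.org/2023.findings-ijcnlp.28/)\n"
)
ip_links = find_ip_links(readme_excerpt)
```

Only the first entry is reported; the ACL Anthology URL uses a hostname and is left alone.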
Context
The existing `GSoC25_H/README.md` contained weekly personal logs and outdated "TODO" notes from the 2024/2025 cycle. This made it difficult for new contributors to understand the current folder structure and component capabilities.
Changes
I have refactored the README into standard technical documentation. It now includes:
- … `IndIE`, `llm_IE`, and `ReAct` modules.
- … `requirements.txt`.
- … `llm_IE` module.
Impact
This documentation update improves the onboarding experience for future GSoC candidates and maintainers.