This folder implements two core reliability components for MicroTraitLLM:
- Article Validation – scores and filters retrieved papers before they are passed to the LLM.
- Citation Accuracy Checking – post-processes model outputs to detect and correct hallucinated or mismatched citations.
These components correspond to the Article Validation and Citation Accuracy tasks in the overall MicroTraitLLM project.
This stage ensures that only high-quality, relevant, and accessible articles are used as context in the RAG pipeline.
Inputs:
- `query` – user query string.
- `articles` – list of `Article` objects, each containing: `pmcid`, `doi`, `title`, `abstract`, `journal`, `year`, `citation_count`, `is_peer_reviewed`, `is_retracted`.
Outputs:
- The same list of `Article` objects, each annotated with:
  - `validation_score` ∈ [0, 1]
  - `confidence_label` ∈ {`high`, `medium`, `low`, `invalid_id`, `unknown`}
By default, low-confidence articles are filtered out.
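The score-to-label mapping and default filtering can be sketched as follows; the 0.75/0.5 cutoffs here are illustrative assumptions, not the module's actual thresholds:

```python
def label_for_score(score: float) -> str:
    """Map a composite validation_score in [0, 1] to a confidence label.

    The cutoffs below are placeholders for the module's real thresholds.
    """
    if score >= 0.75:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def filter_low_confidence(articles: list[dict]) -> list[dict]:
    """Drop articles labeled 'low' (the default filtering behavior)."""
    return [a for a in articles
            if label_for_score(a["validation_score"]) != "low"]
```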
The composite score is a weighted combination of:
- Recency – newer articles score higher.
- Source reputation – tiered by journal and peer-review status.
- Citation count – log-scaled and normalized.
- Topic relevance – cosine similarity between query and article text embeddings.
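The four components above might be combined roughly as in the sketch below; the weight values and normalization constants (`WEIGHTS`, `horizon`, `cap`) are illustrative assumptions, since the real weighting is a configuration point of the module:

```python
import math
from datetime import date

# Illustrative weights only -- the module's actual weighting is configurable.
WEIGHTS = {"recency": 0.25, "reputation": 0.25, "citations": 0.2, "relevance": 0.3}


def recency_score(year: int, horizon: int = 20) -> float:
    """Newer articles score higher; articles older than `horizon` years get 0."""
    age = max(0, date.today().year - year)
    return max(0.0, 1.0 - age / horizon)


def citation_score(count: int, cap: int = 10_000) -> float:
    """Log-scale the raw citation count and normalize to [0, 1]."""
    return min(1.0, math.log1p(count) / math.log1p(cap))


def composite_score(recency: float, reputation: float,
                    citations: float, relevance: float) -> float:
    """Weighted combination of the four component scores, each in [0, 1]."""
    parts = {"recency": recency, "reputation": reputation,
             "citations": citations, "relevance": relevance}
    return sum(WEIGHTS[k] * parts[k] for k in WEIGHTS)
```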
The implementation is exposed via:

```python
from validation import validate_articles

filtered_articles = validate_articles(query, retrieved_articles)
```

Configuration points:
- Journal tiers (`JOURNAL_TIERS`)
- Weighting of score components
- Identifier accessibility check (`check_identifier_accessibility`)
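A tier table used for the source-reputation component might look like the sketch below; the journal names, tier assignments, and the `reputation_score` helper are illustrative, not the module's actual `JOURNAL_TIERS` data:

```python
# Illustrative tier table -- the real mapping lives in JOURNAL_TIERS.
JOURNAL_TIERS = {
    "Nature Microbiology": 1,
    "The ISME Journal": 1,
    "Applied and Environmental Microbiology": 2,
}
TIER_SCORES = {1: 1.0, 2: 0.7, 3: 0.4}  # unknown journals fall into tier 3


def reputation_score(journal: str, is_peer_reviewed: bool) -> float:
    """Tiered journal score, penalized when the venue is not peer reviewed."""
    base = TIER_SCORES[JOURNAL_TIERS.get(journal, 3)]
    return base if is_peer_reviewed else base * 0.5
```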
This stage reduces citation hallucinations by verifying that:
- Each citation in the answer corresponds to a real article in the retrieved corpus.
- The cited passage is semantically consistent with the referenced article.
Inputs:
- `answer_body` – main answer text (with inline numeric citations like `[1]`).
- `ref_text` – reference list generated by the model (e.g., lines starting with `[1]`, `[2]`, etc.).
- `retrieved_articles` – the same `Article` objects used in retrieval.
Outputs:
- `cleaned_answer_body` – answer text with hallucinated citations removed.
- `report_list` – per-citation diagnostics:
  - `raw_citation` (e.g. `[1]`)
  - `identifier` (PMCID/DOI or `None`)
  - `status` – `valid`, `mismatch`, or `not_found`
  - `similarity` – embedding similarity score
Checking steps:
- Extract numeric citations from the answer (e.g. `[1]`, `[2]`).
- Parse the reference list and map each number to a PMCID or DOI.
- Match each identifier to a retrieved article.
- Extract a local context window around the citation token.
- Compute embedding similarity between the context and the article's title + abstract.
- Flag citations as:
  - `valid` (similarity above threshold)
  - `mismatch` (article found but content does not align)
  - `not_found` (no article with that identifier in the corpus)
Usage:

```python
from validation import check_citations

cleaned_answer, report = check_citations(
    answer_body=answer_body,
    ref_text=ref_section,
    retrieved_articles=filtered_articles,
    similarity_threshold=0.6,
)
```

These subsystems plug into the MicroTraitLLM RAG pipeline as follows:
- Retrieval – given a query, retrieve candidate articles.
- Article Validation – call `validate_articles` to score and filter the candidates.
- LLM Generation – pass validated articles as context to the model.
- Citation Accuracy – call `check_citations` on the model's answer and reference list.
- Final Output – return the cleaned answer plus an optional citation report.
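Gluing the stages together might look like the following; `retrieve`, `generate_answer`, and `split_answer` are hypothetical stand-ins, and the validate/check functions here are simplified mocks of the real API, kept only so the sketch is self-contained:

```python
def retrieve(query):  # stand-in for the retrieval stage
    return [{"pmcid": "PMC1", "title": "Example", "validation_score": 0.9}]

def validate_articles(query, articles):  # simplified mock of the real API
    return [a for a in articles if a["validation_score"] >= 0.5]

def generate_answer(query, articles):  # stand-in for the LLM call
    return "Microbes do X [1].\n\nReferences:\n[1] Example. PMC1"

def split_answer(raw):  # assumed helper separating body from reference list
    body, _, refs = raw.partition("\nReferences:\n")
    return body, refs

def check_citations(answer_body, ref_text, retrieved_articles):
    """Simplified mock: mark a reference valid if its trailing token is a
    known identifier in the retrieved corpus."""
    known = {a["pmcid"] for a in retrieved_articles}
    report = [{"identifier": line.rsplit(" ", 1)[-1],
               "status": "valid" if line.rsplit(" ", 1)[-1] in known
                         else "not_found"}
              for line in ref_text.splitlines() if line]
    return answer_body, report

def answer_query(query):
    candidates = retrieve(query)                      # Retrieval
    validated = validate_articles(query, candidates)  # Article Validation
    raw = generate_answer(query, validated)           # LLM Generation
    body, refs = split_answer(raw)
    return check_citations(body, refs, validated)     # Citation Accuracy
```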
- Replace the `embed_text` stub with the production embedding model.
- Implement real PMCID/DOI checks in `check_identifier_accessibility` using NCBI/PMC APIs or a local metadata cache.
- Allow scoring weights and journal tiers to be configured via YAML/JSON.
- Add unit tests and benchmark evaluation:
  - Precision/recall for valid article retention.
  - Precision/recall/F1 for citation validation.
- Extend citation parsing to handle additional formats (e.g., inline PMCIDs, author-year styles).