Skip to content

Conversation

@Rossi-Luciano
Copy link
Collaborator

O que esse PR faz?

Este PR implementa validação de valores de data_availability_status extraídos do XML, preservando valores inválidos para análise posterior. Quando o XML contém um valor que não está na lista de choices válidos, o sistema:

  • Armazena o valor original em um novo campo invalid_data_availability_status
  • Define o campo data_availability_status como "invalid"
  • Evita perda de dados e permite identificar valores não reconhecidos

Problema solucionado: XMLs estão chegando com valores de specific-use que não correspondem aos choices esperados, causando perda de informação.

Onde a revisão poderia começar?

  1. article/choices.py - Verificar as novas constantes e lista de validação
  2. article/models.py - Verificar o novo campo invalid_data_availability_status
  3. article/sources/xmlsps.py - Verificar a lógica de validação na função add_data_availability_status
  4. article/migrations/0045_article_invalid_data_availability_status_and_more.py - Verificar a migration gerada

Como este poderia ser testado manualmente?

Teste via Django Shell:

from lxml import etree
from article.sources.xmlsps import add_data_availability_status
from article.models import Article
from django.contrib.auth import get_user_model

User = get_user_model()
user = User.objects.first()

# Teste 1: Valor VÁLIDO
xml_valido = """<article>
<body>
<sec sec-type="data-availability" specific-use="data-available">
<p xml:lang="en">Data is available</p>
</sec>
</body>
</article>"""

tree = etree.fromstring(xml_valido.encode('utf-8'))
article = Article.objects.create()
errors = []
add_data_availability_status(tree, errors, article, user)

# Resultado esperado:
# article.data_availability_status == "data-available"
# article.invalid_data_availability_status == None

# Teste 2: Valor INVÁLIDO
xml_invalido = """<article>
<body>
<sec sec-type="data-availability" specific-use="valor-nao-reconhecido">
<p xml:lang="en">Some text</p>
</sec>
</body>
</article>"""

tree = etree.fromstring(xml_invalido.encode('utf-8'))
article = Article.objects.create()
errors = []
add_data_availability_status(tree, errors, article, user)

# Resultado esperado:
# article.data_availability_status == "invalid"
# article.invalid_data_availability_status == "valor-nao-reconhecido"

Verificar no banco:

SELECT data_availability_status, invalid_data_availability_status 
FROM article_article 
WHERE data_availability_status = 'invalid';

Algum cenário de contexto que queira dar?

N.A.

Quais são tickets relevantes?

TK #1249

Referências

N.A.

Cria constante DATA_AVAILABILITY_STATUS_INVALID e lista
DATA_AVAILABILITY_STATUS_VALID_VALUES para validar valores vindos
do XML, preservando informações inválidas para análise posterior.
Cria campo invalid_data_availability_status para preservar valores
inválidos vindos do XML quando não estiverem na lista de choices.
Valida se o valor extraído do XML está na lista de choices válidos.
Valores inválidos são preservados em invalid_data_availability_status
e o status é marcado como "invalid".
Cria campo no modelo Article para armazenar valores inválidos
de data availability recebidos do XML.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements validation for data_availability_status values extracted from XML files, preserving invalid values for analysis. When an XML contains a value that doesn't match the expected choices, the system stores the original value in a new field and marks the status as "invalid" to prevent data loss.

Key changes:

  • Added validation logic to detect and handle invalid data_availability_status values from XML
  • Introduced a new database field to preserve unrecognized values for later analysis
  • Updated the choices list to include an "invalid" status option

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
article/choices.py Added new constant DATA_AVAILABILITY_STATUS_INVALID and created validation list DATA_AVAILABILITY_STATUS_VALID_VALUES
article/models.py Added invalid_data_availability_status field to store unrecognized values from XML; whitespace cleanup
article/sources/xmlsps.py Implemented validation logic in add_data_availability_status() to check values against valid choices and preserve invalid ones; whitespace cleanup
article/migrations/0045_article_invalid_data_availability_status_and_more.py Database migration adding the new field and updating choices for data_availability_status

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

article.invalid_data_availability_status = status
article.data_availability_status = choices.DATA_AVAILABILITY_STATUS_INVALID
else:
article.invalid_data_availability_status = None
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation logic has a subtle issue: when the status from XML is None (absent), it correctly sets the status to DATA_AVAILABILITY_STATUS_ABSENT, but it also sets invalid_data_availability_status to None. However, when a valid status is provided from the XML, invalid_data_availability_status is also set to None. This means that if an article previously had an invalid status and is then updated with a valid status or no status, the original invalid value will be lost. Consider whether the invalid_data_availability_status field should only be cleared when explicitly appropriate, or if historical tracking of invalid values is needed.

Suggested change
article.invalid_data_availability_status = None
# Não sobrescreve `invalid_data_availability_status` quando o status é válido ou ausente,
# preservando qualquer valor inválido previamente armazenado.

Copilot uses AI. Check for mistakes.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

)

# Lista com valores válidos para validação
DATA_AVAILABILITY_STATUS_VALID_VALUES = [status[0] for status in DATA_AVAILABILITY_STATUS]
Copy link
Member

@robertatakenaka robertatakenaka Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano os valores válidos estão na documentação do SPS. Esta lista é a lista de controle e não necessariamente dos valores aceitáveis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

items.append({"language": lang, "text": text})

article.data_availability_status = status or choices.DATA_AVAILABILITY_STATUS_ABSENT
# Valida se o status está na lista de valores aceitos
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

choices.DATA_AVAILABILITY_STATUS_ABSENT é usado para identificar que não há no XML este dado, pois a orientação da existência deste elemento no XML é mais recente. Então, ajuste o código para que isso seja considerado.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Copy link
Member

@robertatakenaka robertatakenaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano realizar as correções

Corrige preservação de histórico em invalid_data_availability_status
Corrige validação de data availability para valores SPS oficiais
if status is None:
# Valor ausente no XML (orientação mais recente do SPS)
# Não altera invalid_data_availability_status (preserva histórico)
article.data_availability_status = choices.DATA_AVAILABILITY_STATUS_ABSENT
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano acho que é melhor adicionar article.invalid_data_availability_status = None

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 361 to 373
# Valida o status extraído do XML
if status is None:
# Valor ausente no XML (orientação mais recente do SPS)
# Não altera invalid_data_availability_status (preserva histórico)
article.data_availability_status = choices.DATA_AVAILABILITY_STATUS_ABSENT
elif status not in choices.DATA_AVAILABILITY_STATUS_VALID_VALUES:
# Valor inválido encontrado no XML
article.invalid_data_availability_status = status
article.data_availability_status = choices.DATA_AVAILABILITY_STATUS_INVALID
else:
# Valor válido explícito presente
article.invalid_data_availability_status = None
article.data_availability_status = status
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic for data availability status values lacks test coverage. Consider adding tests to verify the three validation paths: (1) invalid values are stored in invalid_data_availability_status and status is set to "invalid", (2) valid values clear invalid_data_availability_status and set the correct status, and (3) absent values preserve the existing invalid_data_availability_status without changes. This is particularly important since this logic handles data preservation and could impact data integrity if it behaves incorrectly.

Copilot uses AI. Check for mistakes.
# Valida o status extraído do XML
if status is None:
# Valor ausente no XML (orientação mais recente do SPS)
# Não altera invalid_data_availability_status (preserva histórico)
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic preserves the invalid_data_availability_status field when status is absent (None), which creates potential inconsistency. If an article is reprocessed with a new XML that lacks a data availability section, the status becomes "absent" but the old invalid value remains in invalid_data_availability_status. This means the fields can show conflicting information (status="absent" with invalid_data_availability_status="some-old-value"). Consider clearing invalid_data_availability_status when status is None to maintain consistency, or document why this historical preservation is desired despite the inconsistency.

Suggested change
# Não altera invalid_data_availability_status (preserva histórico)
# Limpa invalid_data_availability_status para evitar estado inconsistente
article.invalid_data_availability_status = None

Copilot uses AI. Check for mistakes.
Copy link
Member

@robertatakenaka robertatakenaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Rossi-Luciano verificar o último comentário e corrigir

@robertatakenaka robertatakenaka merged commit f377436 into scieloorg:main Jan 8, 2026
3 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants