Data Quality Control #480

@PaulHeggarty

Miscellaneous Quality Control Issues with Slavic

  • One word is not playing (‘write!’); again, this applies to most Slavic recordings so there must be a general technical issue here that is hopefully easy to fix.
  • Greater Poland: Some words say “soon”; we need to check whether the sound files actually exist.
  • Mazovian: A few words show “soon” or have a transcription but no sound.
  • All Kashubian: A few words show a black play button or have a transcription but no sound.
  • Sorbian U Cath: Some of the alternative pronunciations are not playing. ‘dead’ has a transcription but no sound.

Languages with many sound files failing to show -- missing from the server?

There are cases where languages that should have a full set of sound files do not. This is most likely caused by those files failing to load properly to the server (or having somehow been deleted since). Please check these and fix them. Please also use the diagnostic tools to identify and fix all other such cases. Here are some first examples.
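A rough way to check for files missing from the server is sketched below. It is only a sketch under assumptions: the folder layout (one sub-folder per language ID), the file naming pattern `<language ID>_<word>.mp3`, and the function name `find_missing_sound_files` are all hypothetical and would need adapting to the real server structure.

```python
from pathlib import Path


def find_missing_sound_files(expected, root, ext=".mp3"):
    """Map each language ID to the words whose sound file is absent or empty.

    expected: dict of language ID -> list of word labels (hypothetical input).
    root: directory ASSUMED to hold one sub-folder per language ID.
    """
    root = Path(root)
    missing = {}
    for lang, words in expected.items():
        absent = []
        for word in words:
            # ASSUMED naming pattern; adjust to the real one.
            path = root / lang / f"{lang}_{word}{ext}"
            # Zero-byte files are counted as missing, since a failed
            # transfer may leave an empty file behind.
            if not path.is_file() or path.stat().st_size == 0:
                absent.append(word)
        if absent:
            missing[lang] = absent
    return missing
```

Calling it with the expected word list per language and the local copy of the sound file folder returns only the languages that have gaps, which gives a list to work through by hand.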


Transcriptions do not match sound files

For Germanic study:

For Malakula study:
In some Malakula languages, there's a mismatch that affects many words in that language. Listen to 'woman' in the languages below, for example: the transcription is clearly for another word, and does not match the sound file. (Try a few more examples and you'll hear/see the other mismatches.)

There are two possible causes:

  1. There is a problem with the Excel grid, which does not sync correctly with the big sound file for this language, so the segmentation and export went wrong.
  2. When the transcriptions were pasted in from Aviva's own Excel file into the SndComp transcriptions template, there was a mismatch between word and transcription that started at some point in the transcriptions list. For 'woman', for example, the transcription should be the one in the next cell down.

We need to (manually) identify all the languages affected, and go back to fix whatever is the cause in each case, and either re-segment the sound files, or re-create the SQL transcription files.
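For the second possible cause, a small check could help narrow down where the shift starts, assuming both the original transcription column and the pasted template column can be exported as plain lists in word order. This is a hedged sketch, not an existing SndComp tool: `detect_shift` and the placeholder values are made up for illustration.

```python
def detect_shift(reference, pasted, max_shift=3):
    """Find the first row where the pasted column departs from the reference.

    Returns None if the columns agree, (row, shift) if the pasted value is
    found `shift` cells away in the reference, or (row, None) if the first
    mismatch is not a simple shift. Rows are 0-based list indices.
    """
    for i, (ref, got) in enumerate(zip(reference, pasted)):
        if ref == got:
            continue
        for shift in range(-max_shift, max_shift + 1):
            j = i + shift
            if shift != 0 and 0 <= j < len(reference) and reference[j] == got:
                return i, shift
        return i, None
    return None


# Placeholder values, not real transcriptions.
reference = ["t1", "t2", "t3", "t4", "t5"]
pasted = ["t1", "t2", "t4", "t5", ""]  # one cell lost at row 2
print(detect_shift(reference, pasted))  # (2, 1)
```

Identical transcriptions in neighbouring rows can produce a false match, so the reported row is a place to start listening, not a verdict.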


Transcriptions too long and strange


Transcriptions missing on the website


Both sound files and transcriptions missing on the website


Transcription of lex2 online, but the sound is missing


no master TextGrid found

Region 1

  • Oce_Van_Mal_Nth_WSth_VenenTaute_Unmet_Dl
    we both incl, below --> sound file is missing
  • Oce_Van_Mal_Nth_WSth_VenenTaute_Hamax_Dl
    seven, egg, thunder --> sound file is missing

Region 2

  • Oce_Van_Mal_Nth_ESth_Tirax_Bethel_Dl
    small, hair, breast, swim, fear, shoot, what, who --> sound file is missing
    neck --> wrong transcription

Region 6

  • Oce_Van_Mal_East_NCtr_Neverver_Sakan_Dl is online, but in the Rgn6-Excel it is called either
    Oce_Van_Mal_East_NCtr_Neverver_LingarakhSakan_Dl (new) or wl_0041_Neverver_02 (old), and in Aviva's compiled transcriptions sheet it is called "15-41 Neverver_Sakan". I could only find Oce_Van_Mal_East_NCtr_Neverver_LingarakhSakan_Dl on OC, but that recording contains just a few words, far fewer than there are online --> left the language for later DDA
  • Oce_Van_Mal_East_SCtr_Tesmbol_Usus_Dl

Languages in the Transcription Excel file but not on the website

  • Oce_Van_Mal_East_Sth_Avok_Shiaf_Dl
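Lists like this one, and the online-only category, could be produced mechanically once the language IDs from the three sources are exported as plain lists. The sketch below assumes the IDs are already spelled identically across sources, which the Neverver case shows is not always true; the function name and the example IDs are hypothetical.

```python
def compare_language_sets(website, transcriptions, region_tables):
    """Report language IDs that appear in some sources but not others."""
    website = set(website)
    transcriptions = set(transcriptions)
    region_tables = set(region_tables)
    return {
        # In the transcription Excel file but not on the website.
        "in_transcriptions_not_online": sorted(transcriptions - website),
        # Online, but in neither the transcriptions nor the region tables.
        "online_only": sorted(website - transcriptions - region_tables),
    }


# Hypothetical IDs for illustration only.
report = compare_language_sets(
    website=["Lang_A", "Lang_B", "Lang_C"],
    transcriptions=["Lang_A", "Lang_D"],
    region_tables=["Lang_A", "Lang_B"],
)
print(report)
```

IDs that differ only in naming will show up in both lists, so the output still needs a manual pass.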

Language is NEITHER in Aviva's compiled transcriptions NOR in Laura's Rgn-tables BUT online


empty pages
