Json Transcript Format

tim-contexta edited this page Apr 16, 2019 · 23 revisions

Transcripts returned by the /transcripts/:transcriptId service have the following format:

{
  "name": "02e0dd49e6dec6d360ef1718fe374ee7",     // Internal file name

  "overall_conf": "0.94",                         // Average confidence score*
                        
  "path": "02e0dd49e6dec6d360ef1718fe374ee7.wav", // Internal file name (with extension)

  "word_count": "107",                            // Total number of words in the 
                                                  // conversation 

  "SpeakerList": [                                // An array of speakers found in 
                                                  // the conversation

    {
      "spkid": "spk1"                             // Speaker id referenced by the 
                                                  // speech segments
    },
    {
      "spkid": "spk2"
    }
  ],
  "SegmentList": [                                // An array of speech segments 
                                                  // (sentences). Each segment
                                                  // should be punctuated after
                                                  // the last word.

    {
      "spkid": "spk1",                            // The speaker id for this speech 
                                                  // segment.
      
      "question": "False",                        // Attribute to signal if this 
                                                  // speech segment contains a 
                                                  // question. Segments with this
                                                  // attribute = True should be
                                                  // punctuated with a ? instead of
                                                  // a .
      

      "words": [                                  // An array of words said in this 
                                                  // speech segment
        {
          "text": "goedemorgen",                  // The text of the word.

          "conf": "1.00",                         // Confidence score*

          "dur": "0.75",                          // Duration of the word in seconds

          "stime": "2.23"                         // Start time of the word in seconds
                                                  // since the beginning of the audio.
        },
        {
          "text": "inpakt",
          "conf": "0.50",
          "dur": "0.45",
          "stime": "3.10"
        },
        {
          "text": "energie",
          "conf": "0.77",
          "dur": "0.39",
          "stime": "3.55"
        }
      ]
    },
    {
      "spkid": "spk2",
      "question": "True",
      "words": [
        {
          "text": "goedemiddag",
          "conf": "0.97",
          "dur": "0.54",
          "stime": "7.24"
        },
        {
          "text": "waar",
          "conf": "1.00",
          "dur": "0.18",
          "stime": "7.84"
        }
      ]
    }
  ]
}
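
As a minimal sketch of how a client might consume this format, the Python snippet below rebuilds punctuated sentences from SegmentList. It assumes the response has already been parsed into a dict (the // comments in the example above are annotations and are not valid JSON). The helper name render_segments is hypothetical and not part of the service. Note that every value, including question, is serialized as a string.

```python
def render_segments(transcript):
    """Return one 'speaker: sentence' line per speech segment."""
    lines = []
    for segment in transcript["SegmentList"]:
        sentence = " ".join(word["text"] for word in segment["words"])
        # "question" is the string "True" or "False", not a JSON boolean.
        mark = "?" if segment.get("question") == "True" else "."
        lines.append(f'{segment["spkid"]}: {sentence}{mark}')
    return lines


# Trimmed-down copy of the example above (timing and confidence omitted).
example = {
    "SegmentList": [
        {"spkid": "spk1", "question": "False",
         "words": [{"text": "goedemorgen"}, {"text": "inpakt"}, {"text": "energie"}]},
        {"spkid": "spk2", "question": "True",
         "words": [{"text": "goedemiddag"}, {"text": "waar"}]},
    ]
}

for line in render_segments(example):
    print(line)
# spk1: goedemorgen inpakt energie.
# spk2: goedemiddag waar?
```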

*More info about confidence scores
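
Because conf, dur and stime are serialized as strings, they need converting to numbers before any arithmetic. The sketch below lists words that may deserve manual review, together with their start and end times in seconds. The helper low_confidence_words is hypothetical, and the 0.8 threshold is an arbitrary value chosen for illustration, not a recommendation from the service.

```python
def low_confidence_words(segment, threshold=0.8):
    """Return (text, start, end) for words whose confidence is below threshold."""
    flagged = []
    for word in segment["words"]:
        if float(word["conf"]) < threshold:
            start = float(word["stime"])
            # End time is the start time plus the word's duration.
            end = round(start + float(word["dur"]), 2)
            flagged.append((word["text"], start, end))
    return flagged


# First segment of the example transcript above.
first_segment = {
    "spkid": "spk1",
    "question": "False",
    "words": [
        {"text": "goedemorgen", "conf": "1.00", "dur": "0.75", "stime": "2.23"},
        {"text": "inpakt", "conf": "0.50", "dur": "0.45", "stime": "3.10"},
        {"text": "energie", "conf": "0.77", "dur": "0.39", "stime": "3.55"},
    ],
}

print(low_confidence_words(first_segment))
# [('inpakt', 3.1, 3.55), ('energie', 3.55, 3.94)]
```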

Contexta Analyzer

A subscription to Contexta Analyzer will also make metadata fields visible.