Digital Toolbox


Project URL

Cost and Development Philosophy

Free, open source

Local Installation or Hosted Solution

Local installation


TranscriberAG is designed for assisting the manual annotation of speech signals. It provides a user-friendly graphical user interface (GUI) for segmenting long duration speech recordings, transcribing them, labeling speech turns, topic changes and acoustic conditions.


TranscriberAG is geared toward the needs of the speech research community, but its features might be found useful for other applications. It uses the Annotation Graph format as native format but can read a number of other annotation formats. (source)


“This new version is based on Annotation Graphs (AG), the default format of the new annotation files: the associated XML-based format is named tag, and file names are suffixed .tag.” (source)

Use on SpokenWeb Project

To date we have not used TranscriberAG on the SpokenWeb project.


While the tool could simply be used as is, several features are of interest from a development perspective (i.e. to be incorporated in the SpokenWeb site):

  • enhanced signal output (tempo/pitch variation)
  • transcription and annotation for multiple speakers
  • support for named entities
  • easy creation and modification of temporal anchors
  • multilingual support

TranscriberAG provides support for “annotation events” (source):

  • “Noise inserts foreground noise event (for example breathing, general, conversation, etc).”
  • “Pronounciation annotates pronunciation problems (eg. unintelligible).”
  • “Language annotates short scope language changes.”
  • “Normalization inserts term normalisation (dates, numbers, etc).”
  • “Disfluency annotates discourse disfluencies (hesitation, revision, etc).”

TranscriberAG provides support for “named entities” (source):

  • “Person annotates a person identification (human, fictional character, other, etc)”
  • “Location annotates a geographical location name (town, country, region, other, etc).”
  • “Time annotates a time definition (date, hour, other).”
  • “Amount annotates an amount (currency, other).”
  • “Organization annotates an organization name (non profit, educative, commercial, other).”
  • “Geo Socio Politic annotates a geo-socio-political entity.”


Annotating audio files at this level of detail raises interesting possibilities for connecting the audio and visual realms. For example, if the audio recording has been marked up in this detailed way, playback of the audio on a web site (or at a re-creation of a poetry reading) could trigger the display of appropriate images. So, for instance, one speaker will trigger the display of an image (or projection) of that speaker and a change in speaker would result in a change in image. Similarly for locations, organizations, and other entities that are referenced in a poetry reading.
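As a rough illustration of the idea above, the following Python sketch maps a playback position to the image that should be displayed. It does not read TranscriberAG files; the Turn record, the sample turns and the image paths are all hypothetical, and it assumes speaker turns have already been exported as time-ordered (start, end, speaker) values.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    """One speaker turn; times are seconds from the start of the recording."""
    start: float
    end: float
    speaker: str


# Hypothetical data for illustration only; not taken from a real recording.
TURNS = [
    Turn(0.0, 12.5, "host"),
    Turn(12.5, 95.0, "poet"),
    Turn(95.0, 110.0, "host"),
]
IMAGES = {"host": "images/host.jpg", "poet": "images/poet.jpg"}


def image_at(turns, images, t):
    """Return the image for the speaker active at time t, or None.

    Assumes turns are sorted by start time and do not overlap.
    """
    starts = [turn.start for turn in turns]
    i = bisect_right(starts, t) - 1  # index of the last turn starting at or before t
    if i < 0:
        return None  # t is before the first turn
    turn = turns[i]
    if t >= turn.end:
        return None  # t falls in a gap or after the last turn
    return images.get(turn.speaker)


def image_changes(turns, images):
    """List the (time, image) pairs at which the displayed image changes."""
    changes = []
    current = None
    for turn in turns:
        image = images.get(turn.speaker)
        if image != current:
            changes.append((turn.start, image))
            current = image
    return changes
```

A web audio player could call something like image_at on each time update, or precompute the list from image_changes and schedule the swaps. The same lookup would work for locations or organizations if those annotations carry start and end times.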


Further, annotating in this way also facilitates interesting possibilities for searching and browsing the audio and text. For example, if audience laughter or speaker hesitations have been annotated then a user could search a poetry archive for, say, all the occurrences of laughter or hesitations in a poet’s readings.
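The kind of search described above could be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Event record, its field names and the sample data are hypothetical, and the annotations are assumed to have been extracted from the annotation files into simple records beforehand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One annotated event in a recording; times are in seconds."""
    recording: str
    poet: str
    kind: str      # e.g. "laughter" or "hesitation" (hypothetical labels)
    start: float
    end: float


# Hypothetical annotations for illustration only.
EVENTS = [
    Event("reading-01", "Poet A", "laughter", 31.0, 34.2),
    Event("reading-01", "Poet A", "hesitation", 58.4, 59.1),
    Event("reading-02", "Poet B", "laughter", 12.0, 13.5),
    Event("reading-03", "Poet A", "laughter", 140.7, 143.0),
]


def find_events(events, kind, poet=None):
    """Return events of the given kind, optionally limited to one poet."""
    return [
        e for e in events
        if e.kind == kind and (poet is None or e.poet == poet)
    ]


def count_by_recording(events):
    """Count events per recording, preserving first-seen order."""
    counts = {}
    for e in events:
        counts[e.recording] = counts.get(e.recording, 0) + 1
    return counts
```

For example, find_events(EVENTS, "laughter", poet="Poet A") returns the two laughter events from that poet's readings, each with the recording name and the time span needed to jump to that point in the audio.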


TranscriberAG seems to be a good tool with excellent documentation. On the downside, the software currently does not work on any version of Mac OS X later than 10.5.





We are not aware of any poetry projects that use TranscriberAG.

Future Directions

It is not clear what the future direction of TranscriberAG is.