
Glossary


Term Definition


Annotation Linguistic information added to a speech signal or text. It may include information at any level of language, e.g. part-of-speech labels, morphological analysis, or markup of sentence boundaries, speaker turns and overlaps. A transcription can also be considered a form of annotation in relation to its audio source.


Collection In linguistics, a more or less structured assemblage of digitised texts with some common properties, e.g. a particular type of discourse. Also called a text archive.


Corpus A large, structured collection of authentic (written or spoken) texts that have been compiled in electronic form according to a specific set of criteria, to represent a language or language variety, and particular speakers/writers, at a specified date or period.

Data cleansing The process of correcting data errors to raise data quality to a level acceptable for the needs of AusNC information consumers.
Data model A representation of the data describing objects and the relationships between the objects, independent of any associated process. A data model may include a set of diagrams for each view along with the metadata defining each object in the model. A complete data model may also include state transition diagrams depicting each major entity lifecycle and value chain analysis linking the data model to processes, roles, organizations, goals, applications and projects.
Hapax A word or form appearing only once in the Australian National Corpus.
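
As an illustration of the hapax definition above, the following is a minimal sketch of how words occurring only once might be found in a set of texts. The function name `find_hapaxes`, the simple tokenisation rule and the two sample sentences are assumptions made for this example; they are not taken from the Australian National Corpus or its tools.

```python
import re
from collections import Counter


def find_hapaxes(texts):
    """Return the sorted word forms that occur exactly once across all texts."""
    counts = Counter()
    for text in texts:
        # Assumed tokenisation: lowercase, keep runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return sorted(word for word, n in counts.items() if n == 1)


# Hypothetical two-sentence "corpus" used only for illustration.
sample_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]
hapaxes = find_hapaxes(sample_corpus)
print(hapaxes)
```

In a real corpus the result depends heavily on how word forms are tokenised and normalised (case, punctuation, spelling variants), so the rule above should be treated as a placeholder.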


Ingest A verb used in computational engineering to refer to capturing or transferring video, audio, and metadata from one media storage system to another.


Item A digitised record of a linguistic event, as defined in the structure of the corpus or collection. It can include a sample text, audio file, video file, or any combination of these.

Language database A classified collection of linguistic elements, digitised individually, not as continuous text.


Linguistics The study of human language, which may be undertaken from many different aspects, for example sounds (phonetics), the structure of words (morphology) or meanings (semantics), as well as text-types and their structure, the mediums of communication, and the interaction between participants in conversation.


Metadata Information used to describe items and groups of items, i.e. data about data at different levels. At the collection level, it is information about a whole collection/corpus. At the item level, it is information about features of the individual texts within a collection/corpus (e.g. participant characteristics, time, text-type).
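
The two levels of metadata described above can be pictured as a pair of record types. This is only a sketch: the class names, field names and sample values below are hypothetical and do not reflect the actual AusNC metadata schema.

```python
from dataclasses import dataclass, field


@dataclass
class ItemMetadata:
    """Item-level metadata: features of one text within a collection."""
    identifier: str
    text_type: str
    recorded: str  # date or period, kept as free text in this sketch
    participants: list = field(default_factory=list)


@dataclass
class CollectionMetadata:
    """Collection-level metadata: information about the whole collection."""
    title: str
    description: str
    items: list = field(default_factory=list)

    def items_of_type(self, text_type):
        """Return the items whose text-type matches the one requested."""
        return [item for item in self.items if item.text_type == text_type]


# Hypothetical sample values, for illustration only.
collection = CollectionMetadata(
    title="Example collection",
    description="A small set of sample recordings.",
    items=[
        ItemMetadata("item-001", "interview", "1995", ["speaker A", "speaker B"]),
        ItemMetadata("item-002", "monologue", "1996", ["speaker C"]),
    ],
)
interviews = collection.items_of_type("interview")
print([item.identifier for item in interviews])
```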

Parsing In computer science and linguistics, the process of analyzing a text, made of a sequence of tokens (for example, words), to determine its grammatical structure with respect to a given (more or less) formal grammar.
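
To make the idea of parsing concrete, here is a minimal sketch of a recursive-descent parser for a toy grammar. The grammar (S → NP VP, NP → Det N, VP → V NP | V), the five-word lexicon and all function names are assumptions invented for this example; real parsers used in corpus work are considerably more elaborate.

```python
# Toy lexicon mapping each known word to a part-of-speech label (assumed).
LEXICON = {
    "the": "Det", "a": "Det",
    "dog": "N", "cat": "N",
    "sees": "V", "sleeps": "V",
}


def tokenize(text):
    """Split a sentence into lowercase word tokens, dropping a final full stop."""
    return text.lower().rstrip(".").split()


def parse_np(tokens, pos):
    """NP -> Det N. Return (tree, next position) or (None, pos) on failure."""
    if (pos + 1 < len(tokens)
            and LEXICON.get(tokens[pos]) == "Det"
            and LEXICON.get(tokens[pos + 1]) == "N"):
        return ("NP", ("Det", tokens[pos]), ("N", tokens[pos + 1])), pos + 2
    return None, pos


def parse_vp(tokens, pos):
    """VP -> V NP | V. Return (tree, next position) or (None, pos) on failure."""
    if pos < len(tokens) and LEXICON.get(tokens[pos]) == "V":
        verb = ("V", tokens[pos])
        np, next_pos = parse_np(tokens, pos + 1)
        if np is not None:
            return ("VP", verb, np), next_pos
        return ("VP", verb), pos + 1
    return None, pos


def parse_sentence(text):
    """S -> NP VP. Return a nested-tuple tree, or None if the text does not fit."""
    tokens = tokenize(text)
    np, pos = parse_np(tokens, 0)
    if np is None:
        return None
    vp, pos = parse_vp(tokens, pos)
    if vp is None or pos != len(tokens):
        return None
    return ("S", np, vp)


tree = parse_sentence("The dog sees a cat.")
print(tree)
```

A sentence that does not match the toy grammar, such as "Dog the sees.", yields `None` rather than a tree.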


Sociolinguistics The study of how language expresses social identity, e.g. the age, gender, education and socio-economic status of the individual.


Transcription A written representation of the text of an audio or video file. This can be considered an item in its own right, or a form of annotation.

*Some definitions were adapted from the Wikipedia free encyclopedia and the online Merriam-Webster Dictionary.