Michael McTear, in Human-Centric Interfaces for Ambient Intelligence, 2010

9.3.1 Input Interpretation

As indicated in Figure 9.1, the first stage of analysis is recognition of the user's input. This involves returning a sequence of words by matching a set of models, acquired in a prior training phase, with the incoming speech signal that constitutes the user's utterance. ASR is a probabilistic pattern-matching process whose output is a set of word hypotheses, often referred to as an n-best list or word graph. In some dialogue systems the first-best hypothesis is chosen and passed to SLU for further analysis. However, given that this hypothesis may not be correct, there is merit in maintaining multiple recognition hypotheses so that alternatives can be considered at a later processing stage. For a detailed account of speech recognition, particularly in noisy ambient environments, see.

Given a string of words from the ASR component, the SLU component analyzes it to determine its meaning. SLU is a complex process that can be carried out in a variety of ways. The traditional approach involves two stages: syntactic analysis, to determine the constituent structure of the recognized string, and semantic analysis, to determine the meaning of the constituents. This approach is motivated by research in theoretical and computational linguistics, and it provides a deeper level of understanding by capturing fine-grained distinctions that might be missed by alternative approaches that aim to extract the meaning directly, without recourse to syntactic analysis. Meaning is often represented using logical formulae or, alternatively, in terms of dialogue acts that represent the user's intent, that is, whether the utterance was meant as a question, a command, a promise, or a threat.

In many spoken dialogue systems the meaning of the utterance is derived directly from the recognized string using a semantic grammar. A semantic grammar uses phrase structure rules as a syntactic grammar does, but its constituents are classified in terms of function or meaning rather than syntactic categories. Generally, semantic grammars for spoken dialogue and other natural language systems are domain specific. So, for example, a system involving flights will have categories relevant to the flight domain, such as airline, departure airport, and flight number, whereas a system involving banking will have categories such as account, balance, and transfer. The output of parsing an input string with a semantic grammar is usually a set of keywords representing the main concepts expressed. A domain-specific dialogue act may also be the output of this analysis, accompanied by values extracted from the input string that fill slots in a frame. For example, a flight from Belfast to Malaga might be represented as a frame whose slots are filled with the values Belfast and Malaga.

Both approaches described involve grammars constructed by a developer who creates the grammar rules, often on the basis of a corpus of texts from a relevant domain. An alternative approach is to derive statistical models of a language automatically from such texts by learning mappings between input strings and output structures.
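
As a rough illustration of the hand-written semantic grammar approach described in this section, the following Python sketch takes an n-best list from a recognizer and returns the frame for the most likely hypothesis that the grammar can parse. It is a minimal sketch under stated assumptions, not the chapter's own implementation: the city list, the slot names `origin` and `destination`, the `flight_query` act label, and the confidence scores are all hypothetical, and regular expressions stand in for full phrase structure rules.

```python
import re

# Hypothetical city list; a real system would draw this from a domain database.
CITIES = ["belfast", "malaga", "london", "dublin"]
CITY = "(?:" + "|".join(CITIES) + ")"

# Semantic grammar rules (assumed for illustration): constituents are named by
# meaning (origin, destination) rather than by syntactic category.
RULES = {
    "origin": re.compile(r"\bfrom (" + CITY + r")\b"),
    "destination": re.compile(r"\bto (" + CITY + r")\b"),
}


def parse(utterance):
    """Return a frame of filled slots, or None if any slot cannot be filled."""
    text = utterance.lower()
    frame = {"act": "flight_query"}  # hypothetical domain-specific dialogue act
    for slot, pattern in RULES.items():
        match = pattern.search(text)
        if match is None:
            return None
        frame[slot] = match.group(1).title()
    return frame


def interpret(n_best):
    """Try hypotheses from most to least likely; return the first that parses."""
    for words, _score in sorted(n_best, key=lambda h: h[1], reverse=True):
        frame = parse(words)
        if frame is not None:
            return frame
    return None


# Hypothetical n-best list: (word string, recognizer confidence).
n_best = [
    ("a flight from belfast to my lager", 0.52),
    ("a flight from belfast to malaga", 0.41),
    ("a fight from belfast to malaga", 0.07),
]
result = interpret(n_best)
print(result)
```

Here the first-best hypothesis contains a misrecognition and fills no destination slot, so the second hypothesis supplies the frame, with `origin` set to Belfast and `destination` set to Malaga. This is one simple way of using alternatives from the n-best list at a later processing stage; a system that passed only the first-best string to SLU would have failed to interpret this input.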