vxi-discuss




Suggestions/comments wanted!



Hi,
 
I'm a graduate student and my current project involves suggesting (and
possibly implementing) changes to the FIA (Form Interpretation
Algorithm) in VoiceXML. What I'd like from you folks is a quick
evaluation of a suggestion I plan to make. Unfortunately no
prizes/money are being offered ;).
 
I have incomplete knowledge of what is being done in the spoken
dialogue domain, but from what I can make out, most currently deployed
dialogue systems (as opposed to experimental/research systems) seem to
be merely speechified touch-tone menus. Really good, complete natural
language handling in dialogues is almost non-existent outside a few
research systems such as Rochester's TRAINS project or MIT's
START/GALAXY projects. There are few (zero?) general or standardised
frameworks for the development of true NLU-based dialogue systems. This
state of affairs is actually reasonable when one considers the
requirements for what I call 'true NLU': things like knowledge
representation and reliable extraction of semantics from spoken input
(there are a few interesting papers on this that I can point you to, if
you'd like). I attribute it to the general immaturity of the NLP field.
However, this is beside the point; it still means that most VoiceXML
applications today will involve dialogue systems that use finite state
machines to keep track of what is happening and what needs to be done.
 
One of the biggest problems I see with finite state dialogue design is
the tendency of error-correction (or merely confirmational) nodes to
swamp the dialogue design. Even the simplest 'pizza ordering' variety
of VoiceXML demo program requires some method of confirming the user's
input. To ease the load on the dialogue designer I'm proposing the
addition of something like a 'confirm' attribute for both forms and
fields. This will possibly become clearer when you read the examples,
but for now please read on.
 
I can think of four confirmation strategies (assuming we limit the
input to one-word utterances for now):
 
Explicit field by field confirmation: [EXPLICIT]
C: What is A?
H: X.
C: A is X?
H: 'yes, A is X' or 'No, A is Y' or just 'Yes/No'
 
Block Confirmation: [BLOCK]
C: What is A?
H: X
C: What is B?
H: Y
C: What is C?
H: Z
C: A is X, B is Y, C is Z?
H: 'No', and corrects the parts that are wrong.
 
Implicit confirmation of previous field: [IMPLICIT]
C: What is A?
H: X
C: A is X; What is B?
H: can either answer for B, therefore implicitly confirming A or reject
A (optionally providing a correction).
 
Dynamic implicit confirmation: [SMART]
C: What is A,B?
H: A is X. B is Y
C: A is X and B is Y. What is C?
H: C is Z ..
[This example probably needs a bit of explanation. The dialogue manager
(think of it as the FIA) selects 'n' out of the 'm' possible fields and
asks for them. Once it has received an answer it attempts to implicitly
confirm as many of the previous fields as possible while asking for
another 'r' of the remaining 'm-n' fields. The choice of which 'n'
fields to pick is rather hard, but in general the goal would be to
minimise the number of dialogue moves while maintaining any
interdependencies between the fields.]
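To make the [SMART] move planning a bit more concrete, here is a
minimal Python sketch of the idea. Everything in it (the `plan_moves`
function, the fixed batch size `n`, and the move dictionaries) is my
own invention for illustration; in particular it punts entirely on the
hard part, i.e. choosing *which* fields to group together.

```python
def plan_moves(fields, n=2):
    """Group the 'm' fields into asks of size n; each move after the
    first implicitly confirms the values gathered by the previous move."""
    moves = []
    confirmed = []
    for i in range(0, len(fields), n):
        ask = fields[i:i + n]
        moves.append({"confirm": list(confirmed), "ask": ask})
        confirmed = ask
    # Final move: nothing left to ask, just confirm the last batch.
    if confirmed:
        moves.append({"confirm": confirmed, "ask": []})
    return moves

for move in plan_moves(["A", "B", "C"], n=2):
    print(move)
```

Run on fields A, B, C with n=2 this reproduces the dialogue above: ask
A and B; then confirm A and B while asking C; then confirm C.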
 
 
All of these confirmation strategies can be implemented in pure
VoiceXML 1.0 (and tons of JavaScript -- I know, because I have working
examples). However it still requires considerable effort on the part of
the dialogue designer. Hence - drum roll - here's what I propose:
 
<form confirm='explicit | block | implicit | smart'>
            <field name='A'> </field>
            <field name='B'> </field>
            <field name='C'> </field>
            ...
            <field name='N'> </field>
</form>
 
or at the field level:
 
<field name='A' confirm='yes | no | smart'>
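As a rough picture of what confirm='explicit' might mean operationally,
here is a toy Python rendering of the loop the interpreter would run
for each field. The names `run_form`, `ask` and `verify` are made-up
stand-ins for the platform's prompt/recognition machinery, not anything
in VoiceXML 1.0.

```python
def run_form(fields, ask, verify):
    """Fill each field in order, re-asking until the caller confirms."""
    values = {}
    for name in fields:
        while True:
            value = ask(name)        # "What is A?"
            if verify(name, value):  # "A is X?" -> yes/no
                values[name] = value
                break
    return values

# Scripted user: answers 'x' for everything, but rejects the first
# recognised value for field B (forcing one re-ask).
rejections = {"B": 1}

def scripted_verify(name, value):
    if rejections.get(name, 0) > 0:
        rejections[name] -= 1
        return False
    return True

result = run_form(["A", "B"], ask=lambda name: "x", verify=scripted_verify)
print(result)  # {'A': 'x', 'B': 'x'} after one re-ask for B
```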
 
 
I haven't really looked at the exact syntax requirements; I'm still
thinking about exactly what I want to offer in functional terms. For
instance, it's rather clear that fully automatic prompt generation
would be quite difficult and would require an exponential number of
prompts to be generated, hence some form of template would need to be
provided for. [If anyone is interested: David Toney of Rhetorical
Systems has done some work on automatic prompt
generation/aggregation.] Similarly, most of the above schemes would
require automatic generation of correction grammars (or, again, a
system for specifying correction grammars), such as:
<field name='n'>
  <grammar src='expr1' type='ask'/>
  <grammar src='expr2' type='confirm'/>
</field>
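On the prompt-template side, even something trivial would go a long
way. A hypothetical sketch (the template string and the
`confirm_prompt` helper are my own invention) of how a platform could
build a confirmation prompt from whichever fields were just collected,
rather than pre-generating every combination:

```python
# One template covers any subset of filled fields.
TEMPLATE = "{pairs}. Is that correct?"

def confirm_prompt(filled):
    """Render a confirmation prompt from a dict of field -> value."""
    pairs = ", ".join(f"{name} is {value}" for name, value in filled.items())
    return TEMPLATE.format(pairs=pairs)

print(confirm_prompt({"A": "X", "B": "Y"}))
# -> A is X, B is Y. Is that correct?
```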
 
My report for this project is going to be publicly available at no cost.
Furthermore any demo implementations will probably be built around
HTK/OpenVXI and will thus be entirely open-source. 
 
------------------------------------------------------------------------------
 
What I'd like from you is answers to these questions:
 
1. Do you think your current dialogues are swamped by dialogue
correction/repair? (yes/no)
 
2. As a dialogue designer, how useful do you think this extension would
be to you? 
(scale 1-3, where 1 - could not care less, 2 - moderately useful, 3 -
very useful)
 
3. Do you feel that a dialogue which uses the implicit/smart
confirmation strategy would be more natural than say a dialogue that
uses explicit confirmation?
(yes/no)
 
4. If the platform provided support for it, do you feel a confirmation
system that leveraged N-BEST results would provide additional benefit?
(yes/no)
 
5. How do you rate VoiceXML in terms of design complexity?  
(on a scale of 1-5, 5 is good)
  5.1 -- For simple applications such as speechifying touch tone
systems:
 
  5.2 -- For slightly more complex apps such as say a large banking
system:
 
  5.3 -- For complex dialog modelling:
           (With anaphora resolution, meta-dialogues, things like
spontaneous questions such as these: How many more questions? What have
I asked for as yet? Do you think Saturday would be more appropriate? Do
I have to take fries with that?)
 
6. How long have you been coding Voice applications?
 
7. Your proficiency with VoiceXML (scale of 1-5, 5 is expert)
 
8. Any other comments:
 
 
Thanx for your time.
 
 
Vishal Doshi.
Speech, Vision and Robotics Group,
Department of Engineering,
University of Cambridge.
 
 
PS: If you want more information on any of the above ideas please feel
free to contact me either by e-mail or telephone at: 44 1223 528480

PPS: I'm expecting a response from at least the regulars; however,
lurkers -- it really won't take that much time -- go on, reply!




This page is maintained by Alan W Black (awb@cs.cmu.edu)
speechinfo.org is hosted on a machine donated by VA Linux Systems