Luks.fe.uni-lj.si

Melita Hajdinjak and France Mihelič
University of Ljubljana, Faculty of Electrical Engineering, Slovenia
{melita.hajdinjak,france.mihelic}@fe.uni-lj.si and http://luks/

Keywords: natural-language dialogue systems, Wizard-of-Oz experiment, dialogue-manager evaluation, PARADISE evaluation framework

Human-human and human-computer dialogues differ in such an important way that the data from human interaction becomes an unreliable source of information for some important aspects of designing natural-language dialogue systems. Therefore, we began the process of developing a natural-language, weather-information-providing dialogue system by conducting the Wizard-of-Oz (WOZ) experiment. In WOZ experiments subjects are told to interact with a computer system, though in fact they are not since the system is partly simulated by a human, the wizard. During the development of the weather-information-providing dialogue system this experiment was used twice. While the aim of the first WOZ experiment was, first of all, to gather human-computer data, the aim of the second WOZ experiment was to evaluate the newly-implemented dialogue-manager component. The evaluation was carried out using the PARADISE evaluation framework, which maintains that the system's primary objective is to maximize user satisfaction, and it derives a combined performance metric for a dialogue system as a weighted linear combination of task-success measures and dialogue costs.

In a nutshell, a dialogue system or a voice interface enables users to interact with some application using spoken language. The application in question, for example, can be a piece of hardware (command & control systems) or a kind of database (interactive voice response, information-providing dialogue systems, problem-solving dialogue systems). A detailed overview is given by Krahmer [1]. In this article, we will focus on information-providing, natural-language dialogue systems, which have already been developed for different domains, for instance, restaurant information [2], theatre information [3], train travel information [4, 5], air travel information [6, 7], [...]

It is generally acknowledged that developing a successful computational model of natural-language dialogues requires extensive analysis of sample dialogues, but the question that arises is whether these sample dialogues should be human dialogues. On the one hand, it has often been argued that human dialogues should be regarded as a guidance and a norm for the design of natural-language dialogue systems, i.e., that a natural dialogue between a person and a computer should resemble a dialogue between humans as much as possible. On the other hand, a computer is not a person. Consequently, human-human and human-computer dialogues differ in such an important way that the data from human interaction becomes an unreliable source of information for some important aspects of designing natural-language dialogue systems, in particular the style and complexity of interaction [...] language dialogue systems are influenced by the system's language [11], i.e., they often adapt their behaviour to the expected language abilities of the counterpart. Therefore, instead of gathering human-human data, we started the process of designing the Slovenian and Croatian spoken, weather-information-providing dialogue system [12] by conducting the Wizard-of-Oz (WOZ) experiment [10, 13], which is a more accurate predictor of actual human-computer interaction [9].
This is because in WOZ studies subjects are told to interact with a computer system, though in fact they are not. The system is at least partly simulated by a human, the wizard, with the consequence that the subjects can be given more freedom of expression or be constrained in more systematic ways than is the case in already existing [...] information-providing dialogue system the WOZ experiment was used twice. While the aim of the first WOZ experiment (section 2) was, first of all, to gather human-computer data, the aim of the second WOZ experiment (section 3) was to evaluate the newly-implemented dialogue-manager component [14]. Consequently, while in the first WOZ experiment dialogue management was still one of the tasks of the wizard, in the second WOZ experiment it was performed by the newly-implemented dialogue-manager component. The differences in the data from both WOZ experiments therefore reflect the dialogue manager's performance. This data was evaluated with the PARADISE evaluation framework [15], i.e., a potential general methodology for evaluating and comparing the performance of spoken dialogue agents, which maintains that the system's primary objective is to maximize user satisfaction, and it derives a combined performance metric for a dialogue system as a weighted linear combination of task-success measures and dialogue costs.

The aim of the first WOZ experiment [13] was to gather data that would serve as the basis for the construction of the dialogue manager and the speech-understanding component within the developing Slovenian and Croatian spoken dialogue system for weather-information retrieval [12]. However, the first WOZ system consisted [...] wizard's graphical interface [13], designed as an internet application, which included facilities for the playback of predefined spoken responses as well as forms, image fields and [...] Slovenian text-to-speech synthesis [16].

Hence, the task of the wizard in the first WOZ experiment was to simulate Slovenian speech understanding (speech recognition and natural-language understanding) and dialogue management. Croatian speech understanding was not performed since only Slovene users were involved in the experiment. During the experiment, the wizard sat behind the graphical interface, listened to users' queries and tried to mediate an appropriate response, which was then followed by the natural-language-generation process and the text-to-speech process.

A total of 76 Slovene users (38 female, 38 male) were chosen to take part in the first WOZ experiment. The statistical distributions of the users' ages, educations, dialects, the telephone units and the background environments from where the telephone calls were made were chosen to simulate the actual scenarios. The users were given verbal instructions about the general functionality of the system and a sheet of paper containing a description of the tasks they were supposed to complete. They had two scenarios to enact. The first task was to obtain a particular piece of weather-forecast information, such as the temperature in Ljubljana or the weather forecast for Slovenia tomorrow, and the second task was a given situation, such as "You are planning a trip to. What are you interested in?", the aim of which was to stimulate the user to ask context-specific questions. After these two scenarios, users were given the freedom to ask additional [...]

In order to evaluate user satisfaction, users were given the user-satisfaction survey [17] used within the PARADISE framework (section 4), which asks to specify the degree to which one agrees with several questions about the behaviour or the performance of the system (TTS Performance, [...] Pace, User Expertise, System Response, Expected Behaviour, Future Use). The answers to the questions were based on a five-class ranking scale from 1, indicating strong disagreement, to 5, indicating strong agreement. All the mean values are given in table 1. A comprehensive User Satisfaction was then computed by summing each question's score, and thus ranged in value from a low of 8 to a high of 40. In the first WOZ experiment, the mean User Satisfaction value was 34.08, with a standard deviation [...]

Table 1: [...] (WOZ1) and the second (WOZ2) WOZ experiment [...]

The spontaneous speech data, named Slovenian Spontaneous Speech Queries (SSSQ), that was collected during the first WOZ experiment, was transcribed with the Transcriber tool [18]. The transcription was labelled for turns and utterances, and special labels for dialectal words and non-speech sounds were added. An example [...]

Hello. The dialogue system for weather-forecast information speaking. Can I help you?
Wait a moment, please. [the wizard is choosing her answer] In Štajerska today - the visibility will be more than 10 km. Is there something else?
For which location are you asking for?
Wait a moment, please. [the wizard is choosing her answer] No, in Štajerska today - the sky will clear up. Is there something else?
What about the weather in Poland in the next few days?
Wait a moment, please. [the wizard is choosing her answer] In Varšava, Poland - it is cloudy, the air temperature is -6 degrees Celsius. Is there something else?
I do not offer this information. Do you have any other question?
Thank you for your cooperation. Goodbye.

Table 2: The Slovene-English translation of an example dialogue between a user (U) and the WOZ system (S), recorded during the first WOZ experiment.

The second WOZ experiment was carried out in order to evaluate the performance of the newly-implemented dialogue manager [14], built on the basis of the data collected during the first WOZ experiment. Therefore, all the other components of the system remained the same. Hence, in comparison with the first WOZ experiment, the task of the wizard in the second WOZ experiment was only to simulate Slovenian speech understanding. The wizard sat behind the dialogue manager's interface and entered the meaning representation [...]

A total of 68 Slovene users (29 female, 39 male) were chosen to take part in the second WOZ experiment. They were given the same instructions and the same user-satisfaction survey as the users in the first experiment. All the mean user values, which were slightly worse than the values from the first WOZ experiment, are given in table 1. The mean User Satisfaction value was this time 31.96, with a standard deviation of 4.99. Note that the difference between the mean User Satisfaction values in both experiments is expected since the wizard with her human-level intelligence should have been able to manage the dialogue better than the implemented dialogue-manager component.

The Slovenian spontaneous speech data collected during the second WOZ experiment was named Slovenian Spontaneous Speech Queries 2 [...]

In agreement with previous studies [9, 10, 11], we observed that in both experiments the users adapted their behaviour to the expected language abilities of the natural-language-spoken WOZ system. In several dialogues the first question was much longer than the following ones and, in case of repetitions requested by the system, the speech mode became more articulated, slower and/or louder. Moreover, while the wizard was mediating her response some users made fun of the system: they made comments like "What a voice - terribly", "It is thinking", "It is searching in the computer", and they laughed. But such side remarks certainly would be rather strange in a natural information-providing task because, in both experiments, subjects were basically role playing. They were not real users with real information requirements or real time constraints and telephone [...]

The dialogue-manager component [14] was evaluated using the PARADISE framework [15], which maintains that the system's primary objective is to maximize user satisfaction, and it derives a combined performance metric for a dialogue system as a weighted linear combination of task-success measures and dialogue costs (i.e., dialogue-efficiency costs and dialogue-quality costs). The PARADISE model of performance posits that a performance function can then be derived by applying multivariate linear regression (MLR) with user satisfaction as the dependent variable and
task-success measures, dialogue-efficiency costs, and dialogue-quality costs as the independent variables. Here, user satisfaction, which has been frequently used in the literature as an external indicator of the usability of a dialogue system, is calculated with the survey [17], used in our WOZ [...]

In order to model the performance of both WOZ systems, we selected 17 regression parameters. First, we computed the task-success measure Kappa coefficient (κ) [19], reflecting the wizard's typing errors, and the dialogue-efficiency costs Mean Elapsed Time (MET), i.e., the mean elapsed time for the completion of the tasks that occurred within the interaction, and Number of User Turns (NUT). Second, the following dialogue-quality costs were selected: Task Completion (Comp), i.e., the user's perception of completing the given task; Mean Words per Turn (MWT), i.e., the mean number of words per user's turns; Mean Response Time (MRT), i.e., the mean system-response time; Max Response Time (MaxRT), i.e., the maximum system-response time; Rejection Ratio (RR), i.e., the ratio of system moves asking for a repetition of the last utterance; Help-Message Ratio (HMR), i.e., the ratio of system help moves; Check Ratio (CR) and Number of Check moves (NC), i.e., the ratio and the number of system moves checking some information regarding past dialogue events; Non-Provided Information Ratio (NPR), i.e., the ratio of user-initiating moves that do not result in relevant information being provided; No-Data [...] Responses (NNR), i.e., the ratio and the number of system moves stating that the requested information is not available; Relevant-Data Ratio (RDR), i.e., the ratio of system moves directing the user to select relevant, available data; Unsuitable-Initiative Ratio (UIR), i.e., the ratio of user-initiating moves that are out of context; Non-Initiating Ratio (NIR), i.e., the ratio of [...]

[...] the first WOZ experiment to derive a performance [...] Ratio, Non-Provided-Information Ratio, Task [...] that significantly contributed to user satisfaction. On the other hand, the most significant parameters in the second WOZ experiment were [...]

Walker et al. [17] found in their experiments that Task Completion, rather than Kappa, was a significant factor in predicting user satisfaction, and argued that this was because the user's perceptions of task completion sometimes varied from Kappa. In our experiments, Kappa only referred to the wizard and Task Completion was related only to the first task, which could be the reasons why we did not come to the same conclusion. On the one hand, in these experiments, Kappa and Task Completion were uncorrelated, but on the other hand, in the second WOZ experiment, Kappa was an even more significant predictor of user satisfaction.

[...] Message Ratio is a consequence of the user's behaviour during the conversation, which is, on the other hand, influenced by the system's level of user-friendliness and cooperation. A user-friendly and cooperative dialogue system should not only play an active role in directing the dialogue flow toward a successful conclusion for the user, it should also be able to take the initiative and to instruct the user if he/she asks for help. However, because some novice users of a dialogue system who are not able to adapt quickly are likely to need instructions provided by the system, Help-Message Ratio is expected to reflect user satisfaction. Furthermore, because Check Ratio is in a way related to the speech-understanding process, which is usually the most problematic part of a dialogue-system's performance, it is inappropriate to try to decrease it at any price. Consequently, user satisfaction can be remarkably improved only by decreasing [...] be done by preventing the dialogue manager from giving no information before first checking that there is no other available data that might be relevant to the user's request, i.e., the dialogue manager should be as flexible as possible in directing the user to select relevant, available [...]

[...] conducted WOZ experiments, the aim of which was to gather human-computer data and to evaluate the dialogue-manager component of the developing Slovenian and Croatian spoken dialogue system [...]

The results of applying PARADISE to the data from both WOZ experiments have been given. These have shown that user satisfaction is significantly correlated with the percentage of those user initiatives that did not result in relevant information [...] the ability to direct the user to select relevant, available data is of great importance, and, consequently, that a dialogue system should give no information only if there is no other available data that might be relevant to the user's request.

[1] Krahmer, E.J. (2001) The Science and Art of Voice Interfaces, Philips research report, [...]

[2] Jurafsky, D., Wooters, C., Tajchman, G., Segal, J., Stolcke, A., Fosler, E., and Morgan, N. (1994) The Berkeley Restaurant Project, Proc. of the 3rd International Conference on Spoken Language Processing, Acoustical Society of Japan, Yokohama, Japan, pp. 2139–[...]

[3] van der Hoeven, G., Andernach, J., van der Burgt, S., Kruijff, J., Nijholt, A., Schaake, J., and de Jong, F. (1995) A Natural Language Accessible Theatre Information and Booking System, Proc. of the 1st International Workshop on Applications of Natural Language to Data Bases, AFCET, Versailles, France, pp. [...]

[4] Eckert, W., Kuhn, T., Niemann, H., Rieck, S., Scheuer, A., and Schukat-Talamazzini, [...]quiries, Proc. of the 3rd European Conference on Speech Communication and Technology, ISCA, Berlin, Germany, pp. 1871–1874.

[5] Allen, J.F., Schubert, L.K., Ferguson, G., Heeman, P., Hwang, C.-H., Kato, T., Light, [...] Project: A Case Study in Building a Conversational Planning Agent, Journal of Experimental and Theoretical AI, Taylor and Francis Ltd, pp. 7 7–48.

[6] Ipšić, I., Mihelič, F., Dobrišek, S., Gros, J., and Pavešić, N. (1999) A Slovenian Spoken Dialogue System for Air Flight Inquires, Proc. of the 6th European Conference on Speech Communication and Technology, ISCA, Budapest, Hungary, pp. 2659–2662.

[7] Stallard, D. (2000) Talk'n'Travel: A Conversational System for Air Travel Planning, Proc. of the 6th Applied Natural Language Processing Conference, Association for Computational Linguistics, Seattle, USA, pp. 68–[...]

[8] Zue, V., Seneff, S., Glass, J., Polifroni, J., Pao, C., Hazen, T.J., and Hetherington, L. [...]versational Interface for Weather Information, IEEE Transactions on Speech and Audio Processing, IEEE, pp. 8(1) 85–96.

[9] Fraser, N.M. and Gilbert, G.N. (1991) Simulating Speech Systems, Computer, Speech and Language, Academic Press, pp. 5(1) 81–[...]

[10] Dahlbäck, N., Jönsson, A., and Ahrenberg, [...] How, Proc. of the International Workshop on Intelligent User Interfaces, ACM Press, Or[...]

[11] Zoltan-Ford, E. (1991) How to Get People [...]derstand, Journal of Man-Machine Studies, [...]

[12] Žibert, J., Martinčić-Ipšić, S., Hajdinjak, M., Ipšić, I., and Mihelič, F. (2003) Development of a Bilingual Spoken Dialogue System for Weather Information Retrieval, Proc. of the 8th European Conference on Speech Communication and Technology, ISCA, Geneva, [...]

[13] Hajdinjak, M. and Mihelič, F. (2003) The [...]tion Retrieval, Lecture Notes in Artificial Intelligence 2807: Text, Speech and Dialogue, pp. 400–405. Matoušek, V. and Mautner, P. [...]

[14] Hajdinjak, M. and Mihelič, F. (2004) [...]ment, Lecture Notes in Artificial Intelligence 3206: Text, Speech and Dialogue, pp. 595–602. Sojka, P., Kopecek, I. and Pala, K. [...]

[15] Walker, M.A., Litman, D., Kamm, C.A., and [...] Agents, Proc. of the 35th Annual Meeting of the Association of Computational Linguistics, Association for Computational Linguistics, [...]

[16] Gros, J., Pavešić, N., and Mihelič, F. (1997) Text-to-Speech Synthesis: a Complete System for the Slovenian Language, Journal of Computing and Information Technology, University Computing Centre Zagreb, pp. [...]

[17] Walker, M.A., Litman, D.A., Kamm, C.A., and Abella, A. (1998) Evaluating Spoken Dialogue [...] Studies, Computer, Speech and Language, Academic Press, pp. 12(3) 317–347.

[18] Barras, C., Geoffrois, E., Wu, Z., and Liber[...] and Use of a Tool for Assisting Speech Corpora Production, Speech Communication: Special Issue on Speech Annotation and Corpus Tools, Elsevier Science, pp. 33(1) 5–22.

[19] Di Eugenio, B. and Glass, M. (2004) The Kappa Statistic: A Second Look, Computational Linguistics, The MIT Press, pp. 30(1) [...]
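
The evaluation procedure described in the text (summing the survey answers into a User Satisfaction score, then applying multivariate linear regression with User Satisfaction as the dependent variable) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of only two predictors, the Z-score normalisation step (which PARADISE uses so that the fitted weights are comparable across measures) and every number in the toy data are assumptions made up for illustration.

```python
from statistics import mean, pstdev


def user_satisfaction(answers):
    """Sum the survey answers (each on the 1..5 scale) into one score."""
    if any(a < 1 or a > 5 for a in answers):
        raise ValueError("survey answers must lie between 1 and 5")
    return sum(answers)


def z_normalise(column):
    """Z-score normalisation of one measure across all dialogues."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        raise ValueError("a constant measure cannot be normalised")
    return [(x - mu) / sigma for x in column]


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_performance(predictors, satisfaction):
    """Least-squares fit: satisfaction ~ intercept + sum(w_i * N(measure_i))."""
    names = list(predictors)
    cols = [[1.0] * len(satisfaction)]
    cols += [z_normalise(predictors[name]) for name in names]
    # Normal equations (X^T X) w = X^T y, with the columns of X in `cols`.
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * y for p, y in zip(ci, satisfaction)) for ci in cols]
    return dict(zip(["intercept"] + names, solve(xtx, xty)))


if __name__ == "__main__":
    # Invented toy data: five dialogues, eight survey answers each.
    surveys = [
        [3, 3, 2, 3, 3, 2, 3, 3],
        [4, 4, 4, 5, 4, 4, 4, 4],
        [4, 3, 4, 3, 4, 3, 4, 3],
        [5, 5, 5, 5, 4, 5, 5, 5],
        [5, 4, 4, 5, 4, 4, 5, 4],
    ]
    scores = [user_satisfaction(s) for s in surveys]
    measures = {
        "kappa": [0.2, 0.4, 0.6, 0.8, 1.0],  # task-success measure
        "npr": [0.5, 0.1, 0.4, 0.0, 0.2],    # Non-Provided Information Ratio
    }
    print(fit_performance(measures, scores))
```

A positive fitted weight means that the measure raises predicted User Satisfaction and a negative weight means that it acts as a cost. With the 17 parameters named in the text, the same fit would simply receive 17 entries in `measures`; deciding which of them contribute significantly requires significance testing, which this sketch omits.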

Source: http://luks.fe.uni-lj.si/en/research/publications/papers/melitah-informatica04.pdf
