A Virtual Reality-based Framework for Experiments on Perception of Manual Gestures

Sebastian Ullrich¹, Jakob T. Valvoda¹, Marc Wolter¹, Gisela Fehrmann², Isa Werth², Ludwig Jäger², and Torsten Kuhlen¹

¹ Virtual Reality Group, RWTH Aachen University, Germany
² Deaf & Sign Language Research Team Aachen, RWTH Aachen University, Germany
[email protected]

Abstract. This work contributes an integrated and flexible approach to sign language processing in virtual environments that allows for interactive experimental evaluations with high ecological validity. Initial steps deal with real-time tracking and processing of manual gestures. Motion data is stereoscopically rendered in immersive virtual environments with varying spatial and representational configurations. Besides flexibility, the most important aspect is the seamless integration with a VR-based neuropsychological experiment software. Ongoing studies facilitated with this system contribute to the understanding of the cognition of sign language. The system benefits experimenters by providing a controlled, immersive three-dimensional environment that enables experiments on visual depth perception, which cannot be achieved with video presentations.

1 Introduction

Experimental evaluation of sign language is of major interest in the areas of linguistics, media theory, psycholinguistics, and psychology. However, in most cases manual gestures and facial expressions are presented conventionally by video or mirrors. Virtual Reality (VR) is one of the most sophisticated interactive media capable of closely resembling reality and has therefore been used increasingly for neuropsychological experiments. The methods proposed here are used in current projects focusing on spatial cognition of manual gestures in hearing and deaf people [1]. In these experiments we concentrate on the fact that, in contrast to hearing speakers, who receive constant auditory feedback of their own voice while speaking, the visual perception of a manual sign differs for the signer and the addressee. Even though the described toolkit is the enabling technology for our experiments, its application is not limited to one specific paradigm.

The remainder of this paper is structured as follows. Section 2 presents current computer-based systems for acquisition, processing, and presentation of sign language and discusses open issues. The proposed approach is detailed in Section 3 and technical details are described in Section 4. Its applications for gesture acquisition and experimental evaluations are presented in Section 5. Section 6 provides a conclusion and an outlook on future work.

2 Related Work

There are many contributions to the simulation of sign language gestures. We focus on systems that deal with 3D motion data and visual representation of sign language by means of virtual humanoids. To create a corpus of manual gestures, there are three common approaches: manual modeling and animation, semi-automatic generation, and motion capturing. The software of Vcom3D (http://www.vcom3d.com) is the "gold standard" for producing American Sign Language animations. Although it provides a very sophisticated GUI and features 3D characters, the output can only be rendered into 2D movie clips. Synthetic generation is often based on semantic input from specialized description languages and merges basic components into complete gestures [2, 3]. Motion capturing approaches deal with calibration issues of data gloves [4] and retargeting of motion data to different models.

Some recent systems focus on real-time communication with embodied conversational agents. One of the first such systems is TESSA [5]. It has been developed to translate speech to sign language for transactions in a post office. Motion capture techniques have been used to create a corpus of typical phrases. The research project ViSiCAST focuses on automatic translation into sign language [6]. Its main contribution is an animation synthesizer that combines input sequences in HamNoSys (a gesture description language) with items from a static hand shape database to generate animation data. Most recently, the research project HuGEx introduced expressiveness to virtual gesture communication [7]. From a motion database, significant features are extracted, labeled with semantics, and applied with different executional styles to synthesize varying expressiveness.

The aforementioned systems all show promising results. However, although 3D computer graphics are employed, VR techniques to immerse the user (such as stereoscopic displays and head tracking), which are essential for depth perception, are not utilized. In addition, these systems do not provide controlled virtual environments dedicated to experiments.

3 Flexible architecture

The requirements for interactive experimental evaluations of sign language processing cover a broad field of different disciplines. Four distinct tasks have been identified (data input, model, representation, and experimental platform) and integrated into a shared framework (cf. Fig. 1). The first component consists of an arbitrary hardware or software source producing data input (e.g., output from modeling, motion capturing, kinematic algorithms, or synthesized gestures). This includes automatic or manual post-processing and mapping of the data to a model. The model consists of a complete description of a time-varying hierarchical structure based on human anatomy. It combines the data from an arbitrary number of data inputs to provide a consistent and complete model.

Fig. 1. Flexible pipeline with extensible components: data input (hardware/software sources, post-processing), model (hierarchical skeleton, mapping, motion data, geometry, database), representation (visualization), and experimental platform (experiment paradigm, sessions/blocks/trials, user).

For the representation to a human user, the structural model must be visualized in a stereoscopic 3D environment. Several techniques can be chosen to vary the degree of realism or to change the appearance. Thus, the same structural model can be visualized differently, e.g., as puristic dots, as a stick figure, or as a realistic human model. The experimental platform adds the semantic context for sign language experiments. Here, the representations are integrated into interactive virtual environments according to a user-defined experimental design. This includes the specification of a chain of events structured into sessions, blocks, and trials, of analysis parameters, and of the interaction possibilities of the human user. The last step of the workflow is the user, who is able to interact with the experimental platform (e.g., by reacting to stimuli) or with the data inputs directly.
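
As an illustration of how the four stages and the session/block/trial structure could fit together, the following Python sketch outlines the pipeline; all class and function names are hypothetical and do not reflect the toolkit's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Trial:
    stimulus: str            # item from the motion database (hypothetical identifier)
    perspective: str         # e.g. "egocentric" or "addressee"

@dataclass
class Block:
    trials: List[Trial]

@dataclass
class Session:
    blocks: List[Block]

@dataclass
class Pipeline:
    data_input: Callable[[str], dict]       # glove/tracker samples or synthesized motion
    to_model: Callable[[dict], dict]        # map raw data onto the hierarchical model
    represent: Callable[[dict, str], None]  # visualize (dots, stick figure, realistic)

def run_experiment(pipeline: Pipeline, sessions: List[Session],
                   collect_response: Callable[[], str]) -> List[str]:
    """Experimental platform: walk sessions, blocks, and trials and collect responses."""
    responses = []
    for session in sessions:
        for block in session.blocks:
            for trial in block.trials:
                model = pipeline.to_model(pipeline.data_input(trial.stimulus))
                pipeline.represent(model, trial.perspective)
                responses.append(collect_response())   # the user reacts to the stimulus
    return responses
```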

4 Implementation

In this section we describe technical details of the components introduced in the previous section.

The data input covers a broad range of sources. Different types of gloves are supported, each with device-specific calibration techniques and mapping functions, which are responsible for producing data compliant with our model. The calibration can be modified on the fly with a Lua script to adjust for difficult hand postures. In addition, motion capturing of body parts or of the full body is realized with either electro-magnetic tracking or optical markers. This motion data is processed and mapped to an articulated structure (the model). Retargeting allows motion data to be transferred from one structural model to another. Motion sequences can be cut, merged, or blended together. Because the synthesized motion data uses pointers and time markers to refer to the original data, these operations are performed interactively in real time. Also, modeling of movement is supported through forward and inverse kinematics. Forward kinematics enables direct joint manipulation and affects all subsequent joints; combined with a picking metaphor, it is used to create specific postures interactively. The implementation of inverse kinematics includes analytical solutions for human limbs and algorithms based on the Jacobian matrix for longer kinematic chains. Our toolkit contains multiple readers and writers; e.g., the H-Anim format for articulated humanoids and the BVH format for motion data are supported.
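
To make the forward kinematics step concrete, here is a minimal sketch, with hypothetical data structures rather than the toolkit's actual ones, of how a rotation applied to one joint propagates to all subsequent joints of an H-Anim-style chain with local quaternions:

```python
import numpy as np
from dataclasses import dataclass, field

def quat_to_matrix(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

@dataclass
class Joint:
    """H-Anim-style joint: local orientation (quaternion) and rotation center."""
    name: str
    center: np.ndarray                       # rotation center in parent coordinates
    local_quat: tuple = (1.0, 0.0, 0.0, 0.0)
    children: list = field(default_factory=list)

def forward_kinematics(joint, parent_rot=np.eye(3), parent_pos=np.zeros(3)):
    """Propagate rotations down the chain; returns {joint name: global position}."""
    pos = parent_pos + parent_rot @ joint.center
    rot = parent_rot @ quat_to_matrix(joint.local_quat)
    result = {joint.name: pos}
    for child in joint.children:             # a change at this joint affects all children
        result.update(forward_kinematics(child, rot, pos))
    return result

# Minimal arm chain (shoulder -> elbow -> wrist); segment lengths are illustrative
wrist = Joint("r_wrist", np.array([0.25, 0.0, 0.0]))
elbow = Joint("r_elbow", np.array([0.30, 0.0, 0.0]), children=[wrist])
shoulder = Joint("r_shoulder", np.array([0.0, 1.4, 0.0]), children=[elbow])

# Rotate the elbow 90 degrees about the z-axis; the wrist position follows accordingly
elbow.local_quat = (np.cos(np.pi/4), 0.0, 0.0, np.sin(np.pi/4))
print(forward_kinematics(shoulder))
```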

The model consists of several extendible systems that group functional anatomic data structures [8]. An articulated hierarchical joint structure (based on the H-Anim standard) is defined within the skeletal system and is mainly used for motion data. A joint has, among other properties, a local quaternion, a rotation center, and optional references to segments and sites. Additionally, realistic degrees of freedom are enforced with joint limits. Motion is stored as discrete samples containing quaternions that represent the local orientations of the joints. Another important system is the integumentary system, which represents the outermost layer of the human body. It contains the representational data of the virtual humanoid, e.g., skin geometries with vertex weights that are linked to the skeletal system.

Our implementation of the representational components places emphasis on different interactive visualizations (cf. Fig. 2). Thus, the joints, segments, and sites of the skeletal system can be visualized, as well as the skin from the integumentary system. In particular, state-of-the-art methods for rigid skin visualization, stitching, blendshapes (i.e., for facial animation), and vertex blending are integrated. GPU optimizations have been applied to ensure interactive frame rates for skin visualizations with complex geometries. Through the use of the Model-View-Controller design pattern, the representational algorithms can be flexibly exchanged or extended without changes to the data input or model. This enables the creation of multiple views of one humanoid that can be shown concurrently or switched instantly [9].

ReactorMan serves as the experiment platform to define and conduct experiments [10]. Recorded or synthesized gestures are visualized in ReactorMan by applying the representational component of the pipeline. The experimental paradigm is defined by the sign language linguist. The ReactorMan software is also used to record several experiment-related variables, for example user movement, user interaction, and reaction times. In particular, ReactorMan provides special hardware for accurate (i.e., error < 1 ms), software-independent reaction time measurements. All recorded values are integrated into a single consistent timeline and can be evaluated with common tools (e.g., SPSS or MATLAB).
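
The integration of recorded values into one timeline could look roughly like the following sketch; the channel names and the CSV export are illustrative assumptions, not the actual ReactorMan output format:

```python
import csv
from heapq import merge

# Hypothetical recorded streams (timestamp in ms, channel, value) -- illustrative only.
movement    = [(0, "head_pos", "0.00,1.70,0.00"), (20, "head_pos", "0.01,1.70,0.00")]
interaction = [(350, "key", "yes")]
reaction_hw = [(349, "rt_trigger", "trial_001")]

# Integrate all recorded values into one consistent, time-sorted timeline.
timeline = list(merge(movement, interaction, reaction_hw, key=lambda event: event[0]))

# Export as CSV so the timeline can be analyzed with common tools (SPSS, MATLAB, ...).
with open("session_timeline.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t_ms", "channel", "value"])
    writer.writerows(timeline)
```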

5 Results

The framework and components described in the previous two sections have already been used successfully in different setups to build a motion database for perception experiments. In addition, first experiments on spatial perception of manual gestures have already been conducted with it. A CyberGlove is used in conjunction with an electromagnetically tracked sensor to capture hand posture and trajectory. Thanks to the pipeline, real-time visualization is possible during recording, which allows calibration errors to be adjusted instantly. In a post-processing session, the recorded items are edited and cut. Inverse kinematics (IK) is used to animate the upper limb from the shoulder to the hand. The signing space is adjusted interactively within our software by moving the hand position of the looping gestures while the arm aligns automatically. After these steps the items are stored in the database for subsequent experiments, although they may be used for other purposes as well.
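
The interactive signing-space adjustment relies on an analytical limb IK. A minimal planar sketch of that idea (with illustrative segment lengths, not the toolkit's actual solver) is:

```python
import math

def two_bone_ik(target_x, target_y, l1=0.30, l2=0.25):
    """Analytical planar IK for an upper limb: returns (shoulder, elbow) angles in radians.

    l1/l2 are illustrative upper-arm/forearm lengths; the target is the wrist position
    relative to the shoulder. Raises ValueError if the target is out of reach.
    """
    d2 = target_x**2 + target_y**2
    cos_elbow = (d2 - l1**2 - l2**2) / (2 * l1 * l2)
    if not -1.0 <= cos_elbow <= 1.0:
        raise ValueError("target out of reach")
    elbow = math.acos(cos_elbow)                      # elbow flexion
    shoulder = math.atan2(target_y, target_x) - math.atan2(l2 * math.sin(elbow),
                                                           l1 + l2 * math.cos(elbow))
    return shoulder, elbow

def wrist_position(shoulder, elbow, l1=0.30, l2=0.25):
    """Forward check: wrist position for the given joint angles."""
    return (l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow),
            l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow))

# Move the hand within the signing space; the arm configuration follows automatically.
s, e = two_bone_ik(0.35, 0.20)
print(wrist_position(s, e))   # ~(0.35, 0.20)
```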

Fig. 2. Different degrees of realism (joints only, robot and realistic hand) and variation of perspective (egocentric, addressee) for intuitive interaction and experiments in VR.

Over 100 items from German Sign Language (GSL) have been recorded, categorized, and adjusted. In addition, over 100 non-signs have been conceived and created accordingly (by new recordings as well as by editing GSL signs). Non-signs are phonologically possible but non-occurring GSL signs, analogous to German nonsense words like "Rakane" and "mieren" or English nonsense words like "raner" and "marg". That is, non-signs were recombined out of separate existing phonological components (i.e., handshape and movement). All items are balanced in length and manner of movement. The recording sessions took about two weeks with two deaf signers performing the signs. The post-processing was done by a deaf signer iteratively, with review sessions by deaf signers and linguists in between to evaluate and categorize the items. The average adjustment time for one item was about 10 to 30 minutes.

In a first experiment, subjects perform a lexical decision task: they decide whether the presented item is a lexical sign from German Sign Language or not. Participants view randomized meaningful signs and meaningless non-signs displayed in 3D on a stereoscopic desktop-VR screen from varied perspectives (lateral, egocentric, and addressee). They are instructed to react as quickly and as accurately as possible by pressing designated keys for 'Yes' or 'No'. Responses are scored for the number of correct items and for reaction time. This experiment design has been implemented in the scripting language of ReactorMan.

The system is interactive, i.e., it allows for on-line recording and immediate representation of hand gestures in virtual environments. It combines several processing stages in one framework. Its ease of use is demonstrated by the fact that the deaf signers did most of the post-processing on their own and were pleased with the direct manipulation offered by the tools. In particular, the seamless integration into a neuropsychological experiment platform in combination with VR closes a gap in current approaches.
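
The experiment itself is scripted in ReactorMan's own scripting language, which is not reproduced here; the following Python sketch only mirrors the structure of the lexical decision task (randomized signs and non-signs, varied perspectives, yes/no responses scored for accuracy and reaction time), using placeholder item lists and I/O stand-ins:

```python
import random, time

# Hypothetical item lists and I/O helpers -- these only mirror the task structure.
SIGNS = ["sign_%03d" % i for i in range(100)]          # lexical GSL items
NON_SIGNS = ["nonsign_%03d" % i for i in range(100)]   # phonologically possible non-signs
PERSPECTIVES = ["lateral", "egocentric", "addressee"]

def present_item(item, perspective):
    """Placeholder for stereoscopic playback of the recorded motion item."""
    print(f"presenting {item} from the {perspective} perspective")

def wait_for_key():
    """Placeholder for the response collection; returns ('yes'|'no', reaction time in s)."""
    t0 = time.time()
    answer = random.choice(["yes", "no"])    # stand-in for the participant's key press
    return answer, time.time() - t0

trials = [(item, item in SIGNS, random.choice(PERSPECTIVES))
          for item in SIGNS + NON_SIGNS]
random.shuffle(trials)

results = []
for item, is_sign, perspective in trials:
    present_item(item, perspective)
    answer, rt = wait_for_key()
    correct = (answer == "yes") == is_sign   # lexical decision: is it a GSL sign?
    results.append({"item": item, "correct": correct, "rt": rt})

accuracy = sum(r["correct"] for r in results) / len(results)
print(f"accuracy: {accuracy:.2%}")
```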

6 Conclusion & Future Work

The presented system enables the experimental evaluation of sign language processing in virtual environments. A pipeline has been developed to achieve flexibility in terms of data input during the acquisition of signs. It allows for multimodal and immersive representation of the signs and adds an integrated platform for controlled experiments. Possible scenarios include interactive studies on online monitoring (i.e., monitoring of kinesthetic feedback processes), studies of monitoring processes with delayed visual feedback, and studies on the morphing of discrete hand forms to distinguish spatial and linguistic boundaries. The system has proven its capability in first studies and enables a large variety of experiments in VR that will contribute to the understanding of the cognition of sign language.

Acknowledgment
This work was funded by the German Research Foundation (DFG) SFB/FK 427. The authors would like to thank Babak Modjtabavi, Michael König and Horst Sieprath for their support during acquisition of hand gestures.

References
1. Fehrmann, G., Jäger, L.: Sprachbewegung und Raumerinnerung. Zur topographischen Medialität der Gebärdensprachen. In: Kunst der Bewegung. Kinästhetische Wahrnehmung und Probehandeln in virtuellen Welten. Volume 8 of Publikationen zur Zeitschrift für Germanistik. Peter Lang, Bern (2004) 311–341
2. Lee, J., Kunii, T.L.: Visual translation: from native language to sign language. In: Proc. of IEEE Workshop on Visual Languages, USA (1992) 103–109
3. Lebourque, T., Gibet, S.: High level specification and control of communication gestures: the GESSYCA system. In: Proc. of IEEE Computer Animation, Switzerland (1999) 24–35
4. Heloir, A., Gibet, S., Multon, F., Courty, N.: Captured Motion Data Processing for Real Time Synthesis of Sign Language. In: Gesture in Human-Computer Interaction and Simulation, 6th International Gesture Workshop, GW 2005, Revised Selected Papers (2006) 168–171
5. Cox, S., Lincoln, M., Tryggvason, J., Nakisa, M., Wells, M., Tutt, M., Abbott, S.: TESSA, a system to aid communication with deaf people. In: Proc. of ASSETS 2002, Fifth International ACM SIGCAPH Conference on Assistive Technologies, Scotland (2002) 205–212
6. Kennaway, R.: Synthetic Animation of Deaf Signing Gestures. In: GW '01: Revised Papers from the International Gesture Workshop on Gesture and Sign Languages in Human-Computer Interaction (2002) 146–157
7. Rezzoug, N., Gorce, P., Heloir, A., Gibet, S., Courty, N., Kamp, J.F., Multon, F., Pelachaud, C.: Virtual humanoids endowed with expressive communication gestures: The HuGEx project. In: Proc. of IEEE International Conference on Systems, Man, and Cybernetics, Taiwan (2006)
8. Ullrich, S., Valvoda, J.T., Prescher, A., Kuhlen, T.: Comprehensive Architecture for Simulation of the Human Body based on Functional Anatomy. In: Proc. of BVM 2007, Germany (2007)
9. Valvoda, J.T., Kuhlen, T., Bischof, C.: Interactive Virtual Humanoids for Virtual Environments. In: Short Paper Proc. of Eurographics Symposium on Virtual Environments, Portugal (2006) 9–12
10. Valvoda, J.T., Kuhlen, T., Wolter, M., et al.: NeuroMan: A Comprehensive Software System for Neuropsychological Experiments. CyberPsychology & Behavior 8(4) (2005) 366–367