INTUITION review on evaluation methods
From EuroVR Knowledge Base
The following is an excerpt from INTUITION deliverable D???
The advantages and disadvantages of some of the main evaluation methods and approaches are reported here. INTUITION partners provided feedback on the methods they have used and on their personal experience of those methods in terms of ease of use and usefulness. This feedback shows that the most commonly evaluated measures are usability, and user performance and outcomes. The university partners have broader experience of methods than the industrial and research and development (R&D) partners; however, the industrial and R&D partners generally found the methods more useful than the academic partners did. This is most likely because industrial and R&D partners tend to evaluate specific aspects of a VR/VE system design that has been implemented in the workplace, and the outcomes of the evaluation are often used to develop the system further or to demonstrate that the system, and the user experience it provides, works as intended. It follows that collaborative work between academic partners and industrial and R&D partners would be a good way for universities and research institutes to share information about evaluation methods unfamiliar to industry, and for industry to describe the types of systems and VEs they work with and their requirements for evaluation. Such collaboration would allow different methods to be tested with different systems and applications according to user requirements.
In addition, the report describes the design methods and approaches considered, together with the most important characteristics of each. INTUITION partners provided feedback on the design methods they use and on their personal experience of these methods with regard to the final application and the design philosophy. Although no standard methodology is normally followed, current design methodologies can be grouped into four classes: user-centred, interaction-centred, task-centred and content-centred, the first two being much more common among our sample of designers.
Evaluation of VR/VE
It is important that evaluation work is conducted in the laboratory to further our understanding of how people interact with VR/VEs and to compare different types of technology and VE designs in terms of usability, likeability, usefulness and utility. It is equally important that evaluation work is carried out in end-user companies in order to examine the varied needs of actual end-users and to monitor the success of transferring VR technology into real industrial applications. This allows us to identify key usability issues and further user requirements, from which we can work to bridge any gaps between company expectations, the actual performance of the VR/VE system and any technical constraints.
Evaluating part or all of a VR/VE system will indicate whether it performs as intended and whether participant performance is at an adequate or expected level. For example, evaluation can show whether a VE designed to train participants to complete a specific task actually achieves its goals. Performance on such a task could be compared with that achieved through traditional training methods to show whether VR/VE adds any value. Sometimes there is no difference between traditional and VR/VE training; however, participants may still find one training method more acceptable than the other. User tests are also often conducted to examine which factors are important for performing a particular task and whether certain display systems meet these needs better than others.
Evaluation provides feedback not only for the evaluator or the company in question but also for the participant, in terms of their progress on a particular task; this can help to improve task performance and increase participant motivation. Evaluation can highlight the strengths and weaknesses of a system, for example in terms of the usability of the VE, devices, interface or interaction technique. It can also assess the suitability of VE participation for particular users (e.g. some users may consistently report sickness symptoms and fail to habituate to VE use), as well as the health and safety of a particular VR/VE system. Evaluation also allows comparison across different systems or components of a VR/VE system and can assess whether any changes made to a system produce the required outcome. End-user behaviour can be observed and assessed against predetermined criteria, and any negative symptoms of VR/VE use can be monitored.
During evaluation, industrial VR/VE users may consider a number of different factors, such as the cost-effectiveness, precision, functionality and quality of the technology they are using. For example, when VR/VE is used for design purposes, a number of performance measures could indicate how successful the technology has been in achieving the company’s overall aims, e.g. reducing the number of physical prototypes built (cost) and accelerating the design process (time). For many users it is important to confirm that the technology actually meets, and is acceptable for, the application needs of the industrial or academic user. Some measurable performance indicators in VEs, for example usability and presence, are difficult to define. Researchers often view usability and presence as incorporating different elements; thus there is a range of tools available, which inevitably measure different proposed elements of these concepts.
Selection of appropriate evaluation methods: issues to consider
Evaluation methods differ in the type of user required (representative users or usability expert(s)). They differ in terms of the context in which they are applied and this will have an impact on the applicability and generalisability of the results. Findings from evaluations carried out in a generic context will be more readily generalisable to other situations than findings from evaluations conducted in an application-specific context. Evaluation methods also differ in the type of results they produce - quantitative or qualitative (Bowman, Gabbard and Hix, 2002).
It is important to consider the goals of the evaluation, the type of outcomes that are necessary or desired and how the findings will be used. The available resources should also be considered, such as access to the VR/VE technology, time available to conduct the evaluation and analyse the results, availability of adequately trained evaluators and data analysers, the types of users the evaluator will have access to and any time constraints these users may have, and financial constraints. In addition to this, the nature of a particular VR/VE system may constrain the types of evaluation methods which are appropriate to use. Some methods may be more cost-effective than others depending on the type of results required and how these will be used.
It is particularly important that user needs are considered when designing an evaluation programme, so that the methods used will assess whether the specific requirements are met by the VR/VE system; only in this way can we draw conclusions about the utility of the VE application for real working use. Too often, individual and organisational user needs are not adequately considered when assessing cost benefit. Appropriate methods can be selected only when these needs are clearly understood before evaluation begins, so the evaluator needs a good understanding of the VR/VE system, user and organisational needs, and any constraints. The specific demands of the tasks within a VE also need to be considered before choosing evaluation methods so that the VR/VE system is appropriately assessed. For example, the nature of a menu/interface and/or tasks within a VE will make specific demands on interaction (e.g. in the wrist movements required when using hand-held interaction devices); evaluation methods therefore need to capture this information adequately (it will often be specific to a given application).
An evaluation process will usually include the following steps: determining the purpose of the evaluation and establishing evaluation objectives, designing the experiment(s), recruiting appropriate participants, running the experiment(s) and collecting data, analysing the data, and reporting the results in an appropriate way to interested parties.
Methods usually need to be tailored for the evaluation of a specific VR/VE system as there is much variability between different systems (in terms of hardware, software, interaction). For example, it is currently unfeasible to create a standardised usability questionnaire for use with all systems. VR/VE technology is constantly evolving and so the evaluation tools used to assess this technology must also evolve and adapt.
Constraints
The organisation should be clear on what type of results they would like to achieve from the evaluation process; this will have an impact on the types of evaluation methods chosen and/or on the type of analysis carried out on the data collected during evaluation.
However, the organisation may have financial constraints which may impact on the type of evaluation methods they can use and the time available for evaluation. Other constraints are: the physical space required for a particular evaluation set-up, access to participants, availability of equipment and required expertise (e.g. some methods such as neurophysiological methods and eye tracking are complicated to set up and the analysis of data requires specific expertise). All constraints need to be considered when designing an evaluation programme.
Difficulties in evaluating VR/VE
Physical environment issues (see Bowman, Gabbard and Hix, 2002)
The nature of the VR/VE system impacts on the evaluation of the system, and involves a consideration of issues which are not usually relevant in traditional 2-D interface/system evaluation. For example, there are no standardised interaction devices or output devices used in VR/VE and participants may interact with the VE in a standing or sitting position, or they may be moving around a large or confined physical space. In addition, there may be multiple users interacting in a collaborative VE, possibly in more than one physical location with multiple interaction or output devices. Evaluation in such cases may involve several evaluators to record the behaviour of each participant.
The use of a head-mounted display (HMD) obscures the actual physical environment, so care must be taken that participants stay within the designated physical interaction space (for tracking purposes) and do not collide with walls or trip over cables. Similarly, participants in projection system environments such as a CAVE may bump into walls during immersion. It is important to take these issues into consideration as they may affect performance (e.g. on timed tasks).
Evaluating VEs displayed on HMDs is complicated by the fact that the display allows only one viewer. The video signal could be split and sent to both the HMD and a monitor for the evaluator; however, multiple evaluators may then be necessary, one to observe the VE on the monitor and another to observe participant behaviour. Recording a synchronised video of both user and interface (Hix and Hartson, 1993) is a technique used to establish the cause and effect of certain participant behaviours. This may be difficult because participants may move around whilst interacting with the VE; a tracking camera or additional evaluators could be used to counter this problem. Recording stereoscopic images also presents problems.
Evaluator issues (see Bowman, Gabbard and Hix, 2002)
In many VEs it is important that the participant feels a sense of presence, i.e. feeling that they are actually in the virtual world. If evaluation of a VE involves assessing the level of presence experienced by the participant or if presence may affect task performance then it is important that the evaluator remain out of sight and refrain from talking (i.e. avoid doing anything which may disrupt the participant’s sense of presence) whilst observing participant behaviour/performance. Thus the evaluator must ensure that the participant clearly understands the task to be performed during VE immersion prior to beginning the study so that there are no later unnecessary interruptions.
Due to the sometimes complex nature of VEs, multiple evaluators may be necessary to set up and monitor any technical equipment as well as to record the different types of data (e.g. observation of behaviour, timing tasks, recording errors, processing multi-modal input) (Bowman, Gabbard and Hix, 2002). An additional problem for the evaluator is that due to the nature of some VR/VE systems it is sometimes difficult to observe what the user is doing in the VE as their movements may be obscured from view, for example, in a CAVE the user may have their back to the evaluator.
User issues (see Bowman, Gabbard and Hix, 2002)
Many evaluation studies examining the usability of VEs, interfaces, interaction devices and interaction techniques have been carried out with university students, who may not be representative of the likely end-users of VR/VE technology, making it difficult to generalise research findings. There are very few expert users of VEs; most participants in experiments are novice users. Some participants may have limited previous experience with VR, and it is difficult to know whether they will perform better than participants who have never used VR. Factors such as age, gender, technical experience, spatial ability and physical characteristics may potentially affect performance in a particular VE or in all VEs. Hand-span size is one example of how physical characteristics may affect performance. Some hand-held interaction devices suit users with large hands, and a data glove may keep twisting if worn by a user with small hands, displacing the sensors and interfering with task performance. Conversely, on a hand-held device with small buttons positioned close together, a user with small hands may find it easier to press and distinguish between the buttons than a user with larger hands (Patel et al., in press).
As many participants are novice VE users, the results of an evaluation may show high variability and individual differences. To account for this a large participant sample may be required for reliable statistical results. The importance of conducting tests with real end-users cannot be stressed enough. It is only within a real working context that we can develop an understanding of actual user needs and usability concerns.
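To illustrate why high variability calls for larger samples, the sketch below applies the standard normal-approximation formula for comparing the means of two independent groups, n = 2((z(1 - α/2) + z(1 - β))σ/δ)² participants per group. It is a rough planning aid under stated assumptions, not a method prescribed here: the standard deviation σ and the smallest difference worth detecting δ are hypothetical values that would in practice be estimated from pilot data, the function name `n_per_group` is illustrative, and a within-subjects design or a t-based calculation would give somewhat different numbers.

```python
import math
from statistics import NormalDist


def n_per_group(sd: float, diff: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants per group for a two-sided comparison of
    two independent group means (normal approximation).

    sd:   assumed between-participant standard deviation of the measure
    diff: smallest difference between conditions worth detecting
    """
    z = NormalDist()                     # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / diff) ** 2)


if __name__ == "__main__":
    # Hypothetical task-completion times: detect a 5 s difference between conditions.
    for sd in (10.0, 20.0):
        print(f"sd = {sd:.0f} s -> {n_per_group(sd, diff=5.0)} participants per group")
```

With a hypothetical 5 s difference to detect, doubling the between-participant standard deviation from 10 s to 20 s raises the requirement from 63 to 252 participants per group, because n grows with the square of σ.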
It is important to bear in mind that some VE participants may experience side effects of VR/VE use and this should be taken into account when participants are required to use a VE for long periods (e.g. over 30 minutes without a break).
Technology issues
In contrast to the average desktop office working environment, there is no typical or average VR set-up. Different organisations using VR/VE usually have different requirements and may use different types of projection technology, software, interaction devices and interfaces. This fact is largely responsible for the range of methods used to evaluate VR/VE technology and for the lack of a standardised evaluation toolkit. Typically, methods are tailored to evaluate specific VR/VE applications; they may be based on the technology, display systems and devices being used as well as on the user requirements and desired end-goals of the application. Different VR/VEs may raise different usability issues, for example because of the different field of view afforded by a head-mounted display compared with a CAVE. The technology being used can also affect participant performance; for example, system fidelity may have an impact on measures such as presence. Evaluation methods must take these differences into account. Where VEs have been evaluated, for example to assess usability, comparison across studies is very difficult because of the range of evaluation techniques in use and the differing requirements of VR/VE applications. Equipment unreliability may also have an impact on the evaluation methods used.
Human factors issues to consider in evaluation programme
There are a number of important human factors issues which should normally be assessed during the VR/VE evaluation process:
- The effectiveness of the system (e.g. does the VE offer the functionality required by the user; system fidelity)
- Participant performance, e.g. can the user perform the task effectively (this could be in terms of ability to design effectively, or time taken to complete a task, number of errors made, or successfully learning how to do something)
- Usability of the VE and associated devices (e.g. ease of use, intuitiveness of interaction, appropriate object behaviour, adequate feedback on performed actions etc.)
- Health and safety issues – psychological and physiological effects (e.g. sickness, stress, eye discomfort, any postural problems)
- User attitudes towards VR/VE system; user experience/likeability/acceptance
- Utility of the system, i.e. is it useful for the company – does it meet the needs of the company; is there a transfer of outcomes of VE use into the real world?
- Effects of VR/VE implementation on the organisation (e.g. how will/have work processes changed with the introduction of VR; how can the use of VR be successfully integrated into the current work process; are there any special training needs?)
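Several of the participant performance measures listed above, such as time taken to complete a task, number of errors and successful completion, are straightforward to summarise once trials are logged. The following is a minimal sketch assuming a hypothetical `Trial` record per participant and condition; the field names and the `summarise` helper are illustrative only, and averaging completion time over completed trials only is one of several defensible choices.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    participant: str
    condition: str   # e.g. the interaction device or display under test
    time_s: float    # task completion time in seconds
    errors: int      # number of errors made during the task
    completed: bool  # whether the task was completed successfully


def summarise(trials):
    """Per-condition mean completion time, mean errors and completion rate."""
    by_condition = defaultdict(list)
    for trial in trials:
        by_condition[trial.condition].append(trial)
    summary = {}
    for condition, group in by_condition.items():
        done = [t for t in group if t.completed]
        summary[condition] = {
            # time is averaged over completed trials only (None if there are none)
            "mean_time_s": mean(t.time_s for t in done) if done else None,
            "mean_errors": mean(t.errors for t in group),
            "completion_rate": len(done) / len(group),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical log of two participants using two devices.
    log = [
        Trial("p1", "wand", 40.0, 2, True),
        Trial("p2", "wand", 60.0, 4, True),
        Trial("p1", "glove", 50.0, 1, False),
        Trial("p2", "glove", 30.0, 3, True),
    ]
    for condition, stats in summarise(log).items():
        print(condition, stats)
```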
The exact methods used to assess the human factors issues outlined above will depend on company constraints (e.g. time and resources available) and also on the specific VR/VE system/application. The main tasks common to most VEs are object selection, object manipulation, navigation and orientation. How these tasks are accomplished will depend on the interaction device, interaction technique and display system used. The design of VEs/devices/interaction techniques should ideally improve and support participant interaction, minimise sickness, in addition to facilitating the bidirectional transfer of knowledge from the VE to the real world. Depending on the goals of the evaluation study, some or all of these issues may be assessed using a number of different methods. A number of different evaluation approaches and methods are described (and specific tools cited where appropriate) in Table 1 in Section 2.
Advantages and disadvantages of some evaluation methods
In this section there will be a discussion of the relative advantages and disadvantages of using some of the main types of evaluation methods (expert based/heuristic evaluation, sequential evaluation, traditional experimental design, testbed evaluation, ethnographic approach, self-report measures/questionnaires, verbal protocol/think-aloud, interviews/focus groups, observation, and psychophysiological/neurophysiological measures).
Expert based/heuristic evaluation
Using expert-based/heuristic evaluation early in the evaluation process has the advantage of highlighting and addressing many obvious usability problems before end-users try the system, thus reducing the time and cost of conducting many user tests. However, it is difficult to conduct stand-alone evaluations by usability experts based on heuristics, as there is a lack of standardised design guidelines for VR/VEs (though see Bowman, 2002; Kaur, 1998; Mills and Noyes, 1999 for published guidelines, and Tromp and Nichols, 2001 for an example of an inspection tool based on heuristic evaluation and cognitive walkthrough). Because interaction devices and display systems are not standardised, and the design space for VE interaction techniques and interfaces is large, it is difficult to produce generalisable heuristics that can be used to evaluate a range of different systems.
Sequential evaluation
A sequential evaluation approach is a useful and cost-effective method for addressing the design and evaluation of VE interfaces (see Gabbard, Hix and Swan, 1999; Bowman, Gabbard and Hix, 2002). It proceeds through user task analysis, heuristic (expert guidelines-based) evaluation, formative user-centred evaluation and summative comparative evaluation. The user task analysis is particularly important because the representative user tasks emerge from this phase, and these determine the tasks users are asked to perform in the formative and summative evaluations. Heuristic evaluation is typically carried out before any formative evaluation so that obvious usability issues can be addressed before representative users are involved. A sequential evaluation approach is typically adopted early in the design process and, through an iterative process of design and evaluation, results in a usable interface for a specific application. End-user involvement in the process should ensure that the design meets user needs and should also increase user acceptance of the VE, ultimately having a positive impact on productivity and quality of work. However, although this is an effective approach, it can be quite demanding in terms of the time and resources required to conduct evaluations (Bowman, Gabbard and Hix, 2002).
Traditional experimental design
A traditional experimental design evaluation approach can produce both quantitative and qualitative data which can be analysed to provide information on which interface or device performs better in terms of, for example, accuracy of task performance or usability. It is important to adopt a rigorous approach to design in order to infer cause and effect. Possible extraneous factors should be controlled, order effects should be taken into account (there should be counter-balancing of participant exposure to different conditions in a within-subjects design), and there should be random assignment of participants to different conditions. It is often difficult to control all possible extraneous factors. The results from this type of study are less generalisable across different systems, tasks and users compared to testbed evaluation.
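Counter-balancing and random assignment can be combined as in the sketch below, which assumes a within-subjects design in which every participant experiences every condition. It builds a balanced Latin square (each condition appears once in each serial position and, for an even number of conditions, immediately follows every other condition exactly once; for an odd number the mirrored orders are added so that each ordered pair still occurs equally often) and then randomly allocates participants to those orders. The function names and the example device conditions are hypothetical.

```python
import random


def balanced_latin_square(n: int) -> list[list[int]]:
    """Counterbalanced orders of conditions 0..n-1.

    Each condition appears once in every serial position; first-order
    carry-over is balanced (mirrored rows are appended when n is odd).
    """
    # First row follows the pattern 0, 1, n-1, 2, n-2, 3, ...
    first = [0 if j == 0 else (j + 1) // 2 if j % 2 else n - j // 2 for j in range(n)]
    rows = [[(c + r) % n for c in first] for r in range(n)]
    if n % 2:
        rows += [row[::-1] for row in rows]
    return rows


def assign_orders(participants, conditions, seed=None):
    """Randomly allocate participants to the counterbalanced condition orders."""
    orders = balanced_latin_square(len(conditions))
    people = list(participants)
    random.Random(seed).shuffle(people)  # random assignment to sequences
    return {
        p: [conditions[i] for i in orders[k % len(orders)]]
        for k, p in enumerate(people)
    }


if __name__ == "__main__":
    devices = ["wand", "data glove", "gamepad"]  # hypothetical conditions
    plan = assign_orders([f"P{i:02d}" for i in range(1, 13)], devices, seed=42)
    for participant, order in sorted(plan.items()):
        print(participant, " -> ".join(order))
```

For three devices the square has six orders, so recruiting participants in multiples of six keeps every order equally represented.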
Testbed evaluation
Testbed evaluation involves investigating the effect of a primary independent variable (e.g. a particular interaction device) while also varying as many other potentially influential factors as possible (e.g. system, task, environment and user characteristics). Evaluation of interaction is often conducted outside the context of specific applications and usually involves large numbers of participants (see Bowman and Hodges, 1999). Testbed evaluation is conducted when quantitative, statistical results are desired. It is a more complex and time-consuming approach than examining the primary independent variable in a specific setting whilst trying to hold constant as many as possible of the other factors that may influence results.
Ethnographic approach
Ethnographic approaches may be used by VR researchers in the future to gain a detailed understanding of how the technology is actually used. This approach involves extensively studying an organisation’s culture: the evaluator becomes an integrated, active participant within the organisation, recording behaviours in context using observational and interview techniques. Currently, this approach is not widely used in the area of VR, although researchers have used it to understand feelings of presence in a VE (Spagnolli, Varotto and Mantovani, 2003). Its advantage is that it allows a greater and more detailed understanding of the working culture of a particular organisation than other evaluation methods. Its disadvantages include that it is organisation-specific: it only reports current work practice, so it can be difficult to generalise the findings. Collecting data and analysing observational notes, video recordings and interview transcripts can be very time-consuming. The analysis is often conducted entirely by the ethnographer who recorded the data, as he/she has a detailed understanding of the situation; interpretation of the findings therefore also relies on the ethnographer and may be subject to bias.
Self-report measures/Questionnaires
The advantages of using questionnaires as an evaluation method are that they are cost-effective, easy to administer and generally quick to complete, and the time taken to organise and analyse the data is relatively short. This is important when there is only limited access to industrial users and/or certain technology. In addition, industrial users can complete questionnaires without needing additional explanation. Administering questionnaires does not usually require any special expertise on the evaluator’s part (though analysing the data may require specific statistical or qualitative analysis skills). Questionnaires are also useful in circumstances in which participant anonymity is important. Many questionnaires could potentially be administered as automated tools, with responses entered directly via a keyboard, eliminating the need for manual data entry.
However, there are disadvantages to using questionnaires. The information they provide is often not as rich as that collected through observation or interview methods. In addition, responder bias may affect the way questionnaires are completed. For example, some participants may tend to select the extreme options on rating-scale questions, whereas others may be biased towards options near the centre of the scale. Another type of bias is that participants may report symptoms on sickness questionnaires even when they are not experiencing any, because they assume they are required to report symptoms. Ensuring participant privacy may reduce prestige bias (participants answering questions in a way that puts them in a favourable light). The type of question used (open or closed) also needs to be considered in relation to the information required from the participant, participant motivation (open-ended questions tend to require high participant motivation) and the possible introduction of bias (if closed questions are not designed carefully, i.e. if not all possible answers to a question were anticipated). Closed questions are quick and easy to answer; open-ended questions, however, can be more useful when exploring a new area of research or the use of a new technology (though data from open-ended questions can be difficult to analyse and report, and participants may sometimes not answer the questions accurately). The use of questionnaires avoids the evaluator bias that can occur with methods such as unstructured interviews or unstructured observation.
Questionnaires are effective in collecting information on a range of different factors in a relatively short period of time. However, it is important to keep in mind the type of information that is required from a particular evaluation and to consider data collation of the total set of questionnaires, and the different analyses that will be required on different types of data (some may be more time consuming than others).
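Responder bias of the kind described above can at least be made visible when rating-scale data are collated. The sketch below assumes a 1 to 7 scale and uses a hypothetical `response_style` helper to report, for each participant, the share of answers at the scale endpoints and at the exact midpoint. It does not correct the bias; it only flags participants whose use of the scale an analyst may want to examine or interpret with caution.

```python
from statistics import mean


def response_style(responses, scale_min=1, scale_max=7):
    """Summarise how one participant used a Likert-type rating scale.

    'extreme' is the share of answers at either end of the scale and
    'central' the share at the exact midpoint (always 0 on scales with an
    even number of points, which have no midpoint option).
    """
    if not responses:
        raise ValueError("no responses")
    midpoint = (scale_min + scale_max) / 2
    n = len(responses)
    return {
        "extreme": sum(r in (scale_min, scale_max) for r in responses) / n,
        "central": sum(r == midpoint for r in responses) / n,
        "mean": mean(responses),
    }


if __name__ == "__main__":
    # Hypothetical answers from two participants to the same six items.
    answers = {"P01": [1, 7, 7, 4, 1, 2], "P02": [4, 4, 5, 4, 3, 4]}
    for participant, values in answers.items():
        print(participant, response_style(values))
```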
Verbal protocol/Think-aloud
The think-aloud protocol involves the participant vocalising their thoughts, goals, perceptions, opinions and feelings, and talking about their actions, whilst performing a task or generally interacting with the VE. This allows the evaluator to understand how the participant approaches the task and reasons when interacting with the interface, and to gain a better understanding of the participant’s mental model. The technique highlights any differences between the participant’s expectations of how to conduct a specific task and the actual sequence of steps required to complete it. It is a cost-effective way of collecting good qualitative feedback from the participant; however, depending on the number of participants, analysing this data may be more time-consuming than analysing questionnaire data. In addition, this technique may be difficult to use in some VEs, e.g. when speech recognition is employed. Think-aloud protocol may also be inappropriate in VEs where it is important to facilitate a sense of presence; post-immersion interviews may have to be used instead to elicit the required information. The key to choosing any evaluation technique is to consider the goal of the evaluation, the type of information required and what it will be used for.
Interviews/focus groups
Interviews conducted following VR immersion, for example on usability, presence and likeability, can elicit more detailed information than a questionnaire, probe deeper into the participant’s VR experience and provide insights into the participant’s thought processes. Structured interviews consist of a specific, defined set of questions and responses. Open-ended interviews are more of a conversation between interviewer and participant, allowing the interviewer to ask broad questions or follow up on topics that arise during the interview, and allowing the interviewee to provide additional information. Experienced evaluators may be required for open-ended interviews to ensure a thorough exploration of relevant issues. This type of interview makes greater demands on the evaluator to collect the required information, can be subject to evaluator bias, and is more time-consuming than administering self-report measures.
Focus groups could be useful for eliciting participants’ requirements for a VR/VE system or their attitudes and opinions about a particular VR/VE system. This method gives a group of users the opportunity to discuss their requirements for a VR/VE system during the design phase, or to compare their experiences of using a VR/VE system after implementation. The interaction between participants could lead to insights that would not ordinarily emerge from individual interviews. Participants could be the same type of end-user (e.g. designers) or different stakeholders from a company. This method produces a large amount of information in a relatively short time (e.g. 2 hours) but requires a good moderator to keep the discussion focused. However, the data collected can be time-consuming to transcribe (the discussion is usually recorded) and analyse. It is recommended that focus groups are used in combination with other methods when examining the usability of a VR/VE system, because what users say about using a system may not reflect how they actually use it (see Morgan and Krueger, 1993 for more information on the use of focus groups).
Observation
Data from observation methods and participant comments may provide more detailed information on participants’ VR experience than, for example, questionnaire data. Some researchers have found this data more useful for identifying specific usability problems, and its analysis has provided more constructive information for designers to use in improving user interfaces and devices (Patel et al., in press). Observing how participants interact with the VR/VE system may also highlight areas in which training could improve participant performance and their VR experience.
A number of factors can influence data collection when observation is used as an evaluation method. The expectations of the observer (e.g. if they are aware of the experimental hypothesis) may bias them towards attending to and recording certain types of data. Often more than one evaluator records and analyses the data in order to increase the accuracy of data collection and analysis. Participants may behave differently when they are being observed, and any interaction between observer and participant may change how the task would normally be conducted. In addition, observers may gradually shift how they apply the pre-defined categories for recording behaviour during the observation period. It is therefore important that the observer(s) are clear about the definition of each category of behaviour and anticipate any areas of potential confusion before the observation period. The frequency of occurrence of a target behaviour may also cause observer error: for example, if the target behaviour occurs frequently, the observer may fail to record some instances of it because of a heavy workload or lapses in attention.
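Where two observers code the same session into pre-defined behaviour categories, their agreement can be checked with a statistic such as Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. The text above does not prescribe a particular statistic; the sketch below is one common choice, and the function name and category labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two observers who each assigned one category per observed event.

    Hypothetical helper: coder_a[i] and coder_b[i] are the categories the two
    observers recorded for the same event i.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both observers must code the same non-empty set of events")
    n = len(coder_a)
    # Observed agreement: share of events both observers coded identically.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over categories of the product of each observer's
    # marginal proportion for that category.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1:
        # Both observers used one identical category throughout; kappa is
        # undefined here, so treat it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes from two observers watching the same six interactions.
observer_1 = ["navigate", "navigate", "select", "error", "navigate", "select"]
observer_2 = ["navigate", "select", "select", "error", "navigate", "select"]
print(round(cohens_kappa(observer_1, observer_2), 3))
```

Values close to 1 indicate that the observers apply the categories consistently; a low value is a signal to revisit the category definitions before further observation sessions.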
Psychophysiological/neurophysiological measures
The use of physiological and neurophysiological methods of evaluation can in principle measure factors such as mental workload, side and after effects of VR use (e.g. stress, strain), emotion, alertness and attention objectively, without user or experimenter bias, and can provide information not easily available from performance measures or subjective reports. These measures are taken concurrently with user participation in the VE, and can thus show exactly when changes occur during immersion, indicating aspects of the VE or task which could be the cause of these changes (Nichols et al., 2000b). In combination with self-report methods (e.g. questionnaires) measuring, for example, sickness/stress, these neuro- and psychophysiological methods could be used to provide a more detailed assessment of the user’s state during VR/VE immersion and to verify some of the self-report measures (e.g. state of alertness).
One area in which researchers often recommend physiological measures is the monitoring of presence, as these measures are continuous and therefore allow the time-varying qualities of presence to be assessed. The underlying assumption is that as the sense of presence in a VE increases, the physiological responses to the environment become increasingly similar to those exhibited in a similar real environment.
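Because these measures are recorded concurrently with immersion, one simple way to relate a change to an aspect of the VE is to align the physiological samples with a timestamped log of VE events and report which event most recently preceded each elevated reading. The sketch below assumes a hypothetical log format of `(time_s, label)` pairs sorted by time and a hypothetical fixed threshold; in practice the threshold would more likely be defined relative to each participant's baseline.

```python
from bisect import bisect_right


def events_preceding_changes(samples, events, threshold):
    """Pair each above-threshold physiological sample with the latest VE event before it.

    Hypothetical helper.
    samples: list of (time_s, value) physiological readings.
    events:  list of (time_s, label) VE events, sorted by time.
    Returns a list of (sample_time, value, event_label_or_None).
    """
    event_times = [t for t, _ in events]
    flagged = []
    for t, value in samples:
        if value <= threshold:
            continue
        # Number of events logged at or before time t.
        i = bisect_right(event_times, t)
        label = events[i - 1][1] if i > 0 else None
        flagged.append((t, value, label))
    return flagged


# Hypothetical heart-rate samples (bpm) and VE event log.
log = [(0.0, "enter_room"), (30.0, "obstacle_appears"), (60.0, "task_end")]
heart_rate = [(10.0, 72), (35.0, 95), (40.0, 98), (65.0, 74)]
for row in events_preceding_changes(heart_rate, log, threshold=90):
    print(row)
```

Temporal proximity only suggests a candidate cause; as discussed in this section, the same physiological change can reflect several different underlying states.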
When using physiological measures it is important to appreciate that there are many interacting dependent variables (e.g. vigilance, arousal, stress, workload, mood) and that many physiological measures may have an ambiguous meaning because they are related to different concepts (for example, heart rate can reflect mental workload, effort, attention and emotion, and the appearance of theta/delta rhythms may indicate either vigilance degradation or intense cognitive processing; Mager et al., 2003).
The cost of adopting such measures, in terms of both time and money, is greater than that of self-report methods, so the potential gains of using such techniques must be carefully considered. Some of the methods are complicated to set up and require specific experimenter expertise. Other methods, such as measuring endocrine response (e.g. cortisol or catecholamines), require laboratory analysis of blood, urine or saliva samples. Analysis of saliva samples (e.g. to measure cortisol levels) is cheaper than analysis of blood or urine samples (which are necessary for the measurement of catecholamines).
Analysis and interpretation of this data may be a complex and time-consuming task because of the sheer volume of data these methods generate. In addition, these measures are sensitive to changes in both the internal state (e.g. current physical state) and the external environment (e.g. noise, temperature), so as many of these extraneous factors as possible must be controlled. Whilst these measures are in principle objective, there are likely to be individual differences in the data produced (baseline data is usually required for each individual), and the evaluator’s own biases may come into play when interpreting this data and making generalisations (see Mager et al., 2003; Karaseitanidis et al., in press for further information on the use of these techniques).
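One common way to handle individual differences is to express each participant's readings during immersion relative to their own resting baseline, for example as a percentage change from the baseline mean. The sketch below assumes this percentage-change normalisation (other schemes, such as z-scores against the baseline, are equally possible); the function names and the sample values are hypothetical.

```python
from statistics import mean


def percent_change_from_baseline(baseline, immersion):
    """Express each immersion sample as % change from the participant's own baseline mean.

    Hypothetical helper: baseline and immersion are lists of readings of the
    same measure (e.g. heart rate in bpm) for one participant.
    """
    if not baseline:
        raise ValueError("a baseline recording is required for each participant")
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero")
    return [100.0 * (x - base) / base for x in immersion]


def normalise_group(recordings):
    """Apply per-participant baseline correction.

    recordings: {participant_id: (baseline_samples, immersion_samples)}
    """
    return {pid: percent_change_from_baseline(b, v) for pid, (b, v) in recordings.items()}


# Hypothetical heart-rate data: P1 and P2 differ in absolute level.
data = {
    "P1": ([58, 62], [60, 66, 72]),
    "P2": ([80, 80], [88, 96]),
}
print(normalise_group(data))
```

In this hypothetical data the two participants have very different absolute heart rates, but after correction both show rises of 10% and 20%, which makes their responses comparable.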
Design of VR/VE
The development of a VR application can become a very complex process, and for that reason it should be carried out by experts and technicians with the required knowledge and experience. However, there is growing interest in tools that allow non-experts either to develop these applications or to participate in their development. One advantage of doing so is that domain experts can better communicate the requirements of the application to the VR experts, reducing the number of development cycles. In addition, the popularity that Virtual Reality gained through the media created a demand for tools that allow average users to create their own virtual worlds [Molina 2005].
Nowadays, designers have visual tools that make their work much easier, thanks to the development of better desktop application interfaces. Previously these interfaces had to be coded by programmers, but now rapid prototyping tools allow potential users to design the layout and appearance of their future application. In addition, the design and development of VR/VE applications is not a one-person task but a complex multistage process with many participants [Kanev 1998] [Östman 1998].
If we consider that a VR application consists of the interface plus the logic behind it, we could imagine similar prototyping tools for VR. However, there are important differences. The tools used for prototyping desktop interfaces do not differ much from a 2D drawing tool, as the designer simply positions interface elements (chosen from a given list) in the different windows that make up the application. This is straightforward because the designer assumes that the user will interact with the application using a keyboard and mouse, and the techniques for such interaction are well known. By contrast, a VR application can involve input and presentation devices for which there are no standard interaction techniques. Moreover, even for desktop applications, drawing the interface with these tools is not enough, as the program logic must be added later [Molina 2005].
Design methodologies should provide a sequential process that guides designers and facilitates communication among the members of the project team and users, and this characteristic should be maintained through every stage of the design process. Most processes start with a conceptual design phase and then go through several iterations of development before arriving at a robust solution. This step-by-step process is especially important for the design of virtual environments, where software engineers work side by side with, for example, artists or psychologists to create meaningful places for people. A truly user-centered design process for virtual worlds must be concerned with “who” and “why” before determining what the world is and what it does [Tanney 1997].
The first part of this deliverable will report on general issues associated with the design of VR/VE systems, in particular on how to determine which design methods are most suitable for a given design programme, type of application or design philosophy. The stages to consider in the design process and the tools that facilitate the designer’s work are also studied. A bibliographical review was carried out for this purpose and is presented in section 2, which describes a collection of methods used in the design of VR/VE, along with the advantages and disadvantages of these design methods and approaches. A simple questionnaire, shown in appendix I, was built and sent to all INTUITION partners to gather feedback about their design methods. Section 3 presents a detailed analysis of the responses to this questionnaire, and section 4 discusses the results. Although many design methodologies are applied to VR/VE, we note that there is no standard design methodology, which can result in a lack of rigour and in ambiguity in the design process. However, based on our analysis, we propose in section 5 a possible categorization of the design methodologies used by our sample of designers.

