Standardization of user-centric evaluation

Impossible standardization

Standardization of user-centric research is difficult because the procedures, methods, and metrics used are highly context-dependent. In other words, two user-centric research projects rarely use exactly the same system, a similar set of users, or the same evaluation metric. More importantly, because both usability and user experience are multi-dimensional, two projects rarely even share the same goal. Rigidly standardized evaluation metrics are thus not feasible in user-centric research. However, early attempts have been made to integrate user-centric research findings under a common framework.

Generic frameworks

The concept of usability can be traced back to the cognitive psychological concepts of perception, attention, and memory. An early conceptualization of usability is provided by Don Norman in his seminal work The Design of Everyday Things. Norman describes the interaction between users and systems as consisting of two gulfs: the gulf of execution and the gulf of evaluation. In the gulf of execution, the user, who has a goal, has to formulate an intention, translate it into the correct action sequence, and then perform this action sequence using the system's interface. In the gulf of evaluation, the user has to perceive the state of the system, interpret this state, and then evaluate it in relation to the original goal. Interaction is thus a perpetual bridging of the gulfs of execution and evaluation. To bridge the gulfs effectively, users create an internal mental model (the Use model) of the system, which represents their beliefs about how the system works. The creation of such a Use model is aided by the feedforward and feedback provided by the system. The more closely the Use model resembles the way the system actually works (the System model), the better the usability of the system.

Jakob Nielsen provides a classification of 10 usability heuristics, which he uses as guidelines for his Heuristic Evaluation method. The heuristics can also be used to categorize usability problems.
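
As a purely illustrative sketch (the stage names below paraphrase Norman's action cycle; none of this is code from the cited works), the two gulfs can be written down as an ordered sequence of user activities:

    from enum import Enum

    class Gulf(Enum):
        EXECUTION = "execution"
        EVALUATION = "evaluation"

    # Norman's action cycle, paraphrased: each stage belongs to one gulf;
    # the goal itself precedes both.
    ACTION_CYCLE = [
        ("form the goal", None),
        ("form the intention", Gulf.EXECUTION),
        ("specify the action sequence", Gulf.EXECUTION),
        ("execute the action sequence", Gulf.EXECUTION),
        ("perceive the system state", Gulf.EVALUATION),
        ("interpret the system state", Gulf.EVALUATION),
        ("evaluate the state against the goal", Gulf.EVALUATION),
    ]

    for stage, gulf in ACTION_CYCLE:
        print(f"{gulf.value if gulf else 'goal':<11} {stage}")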

The concept of user experience can be traced back to the social psychological concepts of attitude, intention, and behavior. The most influential model in this respect is the Theory of Planned Behavior (TPB) (and its predecessor, the Theory of Reasoned Action, TRA) by Icek Ajzen and Martin Fishbein. This model claims that our behavioral intentions are based on attitudinal and normative evaluations, and that these intentions, given sufficient behavioral control, lead to actual behavior. The attitudinal part of this model has been adopted in the Technology Acceptance Model (TAM); the normative part of TPB has been adopted in the Unified Theory of Acceptance and Use of Technology (UTAUT).
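
As a very rough sketch (the weighted-sum form and the weights are illustrative assumptions, not Ajzen and Fishbein's exact formulation), the model's core claim can be stated as: intention is a weighted combination of attitudinal and normative evaluations, with TPB adding perceived behavioral control on top of TRA.

    # Illustrative only: TRA/TPB reduced to weighted sums. The weights
    # below are made up; in practice they would be estimated empirically.
    def behavioral_intention(attitude, subjective_norm,
                             perceived_control=None,
                             w_att=0.5, w_norm=0.3, w_pbc=0.2):
        intention = w_att * attitude + w_norm * subjective_norm
        if perceived_control is not None:  # the TPB extension over TRA
            intention += w_pbc * perceived_control
        return intention

    # E.g., with scores on 1-7 Likert scales:
    print(behavioral_intention(attitude=6, subjective_norm=4, perceived_control=5))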

A descriptive framework

Bo Xiao and Izak Benbasat have made an extensive effort to integrate existing work on user-centric recommender system evaluation into a single framework in their 2007 paper "e-Commerce Product Recommendation Agents: Use, Characteristics, and Impacts". Although the paper focuses mainly on the body of research available in the field of Information Systems, it also includes some work from the Human-Computer Interaction and Recommender Systems fields.

An integrative framework

Bart Knijnenburg et al. provide an evaluation framework that can be used as a guideline for conducting and analyzing quantitative user-centric research on recommender systems. Their approach focuses on quantitative user experiments or field trials and includes subjective as well as objective evaluation measures. Specifically, it reasons that objective system aspects, as perceived by the user (subjective system aspects), influence the user's attitude (user experience) and behavior (interaction). Attitude and behavior are also influenced by personal and situational characteristics. The framework by Knijnenburg et al. is not meant as a standardized evaluation metric that provides an "experience score" for a single recommender system, but as a guideline for controlled experiments that compare two or more systems that systematically differ in one or more objective system aspects. Using the framework, researchers can measure and explain the influence of these system aspects on the user experience.
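
Analyses of this kind are typically done with structural equation models; as a minimal sketch of the causal chain, assuming a hypothetical data file (experiment.csv) and hypothetical column names (algorithm, perceived_quality, satisfaction, n_clicks, domain_knowledge), the path from objective system aspects to interaction could be approximated with a series of regressions:

    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("experiment.csv")  # hypothetical per-participant data

    # OSA -> SSA: does the manipulated algorithm change perceived quality?
    osa_to_ssa = smf.ols("perceived_quality ~ C(algorithm)", data=df).fit()

    # SSA -> EXP: does perceived quality explain satisfaction beyond the
    # manipulation itself (i.e., does perception mediate the effect)?
    ssa_to_exp = smf.ols("satisfaction ~ perceived_quality + C(algorithm)",
                         data=df).fit()

    # EXP -> INT: does satisfaction predict behavior, with a personal
    # characteristic (domain knowledge) as covariate?
    exp_to_int = smf.ols("n_clicks ~ satisfaction + domain_knowledge",
                         data=df).fit()

    for model in (osa_to_ssa, ssa_to_exp, exp_to_int):
        print(model.summary())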

A multi-dimensional metric

Pearl Pu and Li Chen also provide a user-centric evaluation framework for recommender systems. In contrast to Knijnenburg et al., they provide a specific list of questionnaire items that can be used as a standardized, multi-dimensional evaluation of a recommender system. The framework breaks down into perceived system qualities (recommendation quality, interaction adequacy, and interface adequacy; similar to Knijnenburg et al.'s subjective system aspects), beliefs (perceived ease of use, perceived usefulness, and control and transparency; in Knijnenburg et al.'s framework these are divided over subjective system aspects and experience), attitudes (a more generic evaluation of the user experience in terms of satisfaction and trust; similar to Knijnenburg et al.'s experience), and behavioral intentions (the intention to use and return to the system; partly similar to Knijnenburg et al.'s interaction).
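
A minimal sketch of how such a standardized questionnaire could be scored, assuming hypothetical item names (q1 ... q8) and an illustrative construct-to-item mapping rather than Pu and Chen's actual item list: each construct score is simply the mean of its Likert-scale items.

    import pandas as pd

    # Illustrative mapping only; the real framework specifies validated
    # questionnaire items for each construct.
    CONSTRUCTS = {
        "recommendation_quality": ["q1", "q2"],
        "interaction_adequacy":   ["q3"],
        "perceived_usefulness":   ["q4", "q5"],
        "satisfaction":           ["q6"],
        "intention_to_return":    ["q7", "q8"],
    }

    responses = pd.read_csv("questionnaire.csv")  # hypothetical responses

    # One score per construct and participant: the mean of the item ratings.
    scores = pd.DataFrame({name: responses[items].mean(axis=1)
                           for name, items in CONSTRUCTS.items()})
    print(scores.describe())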

Benefits of using a framework

A recommender systems evaluation framework can be used to integrate existing and new research under a common denominator. Standardized terms make it possible to compare research findings and to uncover gaps or inconsistencies in existing work. If a framework is adequately validated, it also allows for more robust measurement of subjective concepts and opens up the possibility of simplifying evaluation.