Assessing the Equivalence of the Paper and On-line Formats of the QUIS 5.5

Laura Slaughter, Ben Harper, and Kent Norman
Laboratory for Automation Psychology
University of Maryland, College Park

This investigation compared responses from paper and on-line formats of the Questionnaire for User Interaction Satisfaction (QUIS 5.5), a tool for assessing users' subjective satisfaction with specific aspects of the human/computer interface. The majority of studies assessing equivalence between computerized and paper forms of tests have found no differences. In light of past research, we expected to find equivalence between the two forms of the QUIS. Twenty subjects evaluated WordPerfect using both the paper and on-line formats of the QUIS 5.5. Each administration was preceded by a practice session to refamiliarize the subject with the interface. As expected, the format of the questionnaire did not affect users' ratings. However, subjects wrote more in the comment sections when using the on-line format than when using the paper format. The comments made with the on-line format described problems and strengths more clearly and often included examples. These results indicate that the on-line QUIS format provides more information to developers, researchers and human factors experts than the paper-pencil format.

This study was conducted to assess the equivalence between the on-line and paper-pencil versions of the Questionnaire for User Interaction Satisfaction (QUIS 5.5). The on-line version, QUIS 5.5, was designed to substitute for the conventional format and also to save the experimenter both time and effort (Harper & Norman, 1993). Finding equivalence, or determining what factors cause different results, is important for anyone who administers the on-line questionnaire and is concerned with the validity of the QUIS. The paper format of the QUIS has proven reliability and validity across many types of interfaces (Chin, Diehl & Norman, 1988). Knowing that computer and paper-pencil administered questionnaires yield comparable results is valuable for researchers and organizational practitioners faced with the option of giving their questionnaires under one or both administration modes (Booth-Kewley, Edwards & Rosenfeld, 1992).

Although much of the current research finds equivalence between paper-pencil and computer-based formats, equivalence should not always be assumed. The QUIS is a questionnaire that evaluates the human-computer interface. For those using the computer-based version, this raises the concern that the interface of the questionnaire itself could influence the raters' responses. State-dependent memory theory (Eich, 1980) asserts that retrieval is enhanced when it is performed in the same context in which the memory was first formed. This suggests that people will remember more about the interface when they complete the questionnaire on a computer than on paper. Furthermore, computer administration could affect responses by making the QUIS seem more applicable to what it is trying to determine, that is, satisfaction with a particular interface.

Previous studies have shown that respondents using a computer found the questionnaire to be more important, felt more aware of their thoughts and feelings, and generally regarded the computer-based version more favorably (Booth-Kewley, Edwards & Rosenfeld, 1992). This could result in an increase in the quality and quantity of comments made by raters. Respondents who are more aware of their thoughts when using the computer-based version of the QUIS would then write more comments, and more useful and specific ones, than the more general comments made when completing the paper-pencil version. Viewing the questionnaire more favorably when completing the on-line version might cause the respondent to rate aspects more positively. Furthermore, viewing the questionnaire as important might cause the respondent to answer each question more thoughtfully. Both positive and negative effects are therefore possible when the conventional version of the questionnaire is moved to an on-line format.

The on-line QUIS generally takes longer to complete than the paper-pencil version, and this difference, along with overall ease of completion, might affect results between the two versions. Research on other questionnaires has shown that computer-administered versions are faster for the respondent to complete than the paper-pencil versions. For example, speed of completion has been implicated as a factor in increasing the reliabilities and decreasing the dispersion of the scores obtained in the computer-based mode (Vansickle, Kimmel & Kapes, 1989). Although these results were reported for an on-line questionnaire that was faster for the respondent to complete, the slower computer-based QUIS might also produce similar inequivalencies. The screen layout of the on-line questionnaire also differs slightly from the paper-pencil version.

The majority of studies that assess equivalence between computerized and paper formats of tests find no differences (Booth-Kewley, Edwards & Rosenfeld, 1992; Holden & Hickman, 1987; Huba, 1988; Kapes & Vansickle, 1992; Wilson, Genco & Yager, 1985). In light of past research, we expected to find equivalence between the paper-pencil and on-line versions of the Questionnaire for User Interaction Satisfaction.

METHODS

Subjects
Twenty subjects (9 males, 11 females) participated in this experiment. Eleven were undergraduates, seven were graduate students and two were non-student volunteers. All subjects were required to have experience using a Macintosh computer as a prerequisite for participation. An equal number of novice users (less than one day) and more experienced users (over six months) of WordPerfect participated in this study.

Instrumentation
The Questionnaire for User Interaction Satisfaction (QUIS) is a measurement tool designed to evaluate a computer user's subjective satisfaction with the human-computer interface (Chin, Diehl & Norman, 1988). The questionnaire is arranged in a hierarchical format and contains: (1) a demographic questionnaire, (2) six scales that measure overall reaction ratings of the system, and (3) four measures of specific interface factors: screen factors, terminology and system feedback, learning factors and system capabilities. Each of the four specific interface factors has a main component question followed by related subcomponent questions. Each item is rated on a scale from 1 to 9 with positive adjectives anchoring the right end and negative on the left. In addition, "not applicable" is listed as a choice. Additional space which allows the rater to make comments is also included within the questionnaire.

Computer System
The QUIS version 5.5 for Macintosh was administered using a Macintosh LCII in Spinnaker PLUS. The comment box option was used in the experiment, which added comment screens between sections of the questionnaire. A color monitor (8 bit), standard keyboard and Apple ergonomic mouse were used when administering the on-line QUIS.

Design
A within-subjects, test-retest design was used in which the sessions were separated by a period that ranged from two to eight days. The administration order of the two versions of the questionnaire was counter-balanced. Subjects were asked to complete one version of the QUIS (on-line or paper-pencil) in the first week of experimentation and returned after a minimum of two days to complete the other version.

Subjects rated WordPerfect on the Macintosh using the QUIS. Prior to each evaluation, subjects completed tasks that were designed to help them recall specific aspects of the interface. Two similar sets of tasks were used, one for each week, and their order was also counter-balanced. Both sets involved the use of different fonts, inserting a page break, copying and pasting, changing the style of writing, inserting a header or footer, and completing a spell check. These were chosen to give the rater a chance to review a few aspects of the WordPerfect interface before completing the QUIS. Each subject had 10-15 minutes to finish the set of tasks and was then asked to complete the QUIS.
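The counter-balancing described above can be illustrated with a short sketch. It assumes a fully crossed scheme in which format order and task-set order are rotated across subjects; the paper does not state the actual assignment procedure, and the labels and the names `FORMAT_ORDERS`, `TASK_ORDERS` and `assign_conditions` are hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical labels; the paper only says that both factors were counter-balanced.
FORMAT_ORDERS = [("on-line", "paper"), ("paper", "on-line")]
TASK_ORDERS = [("task set A", "task set B"), ("task set B", "task set A")]


def assign_conditions(n_subjects):
    """Rotate subjects through every format-order x task-order combination."""
    cells = list(product(FORMAT_ORDERS, TASK_ORDERS))  # 4 combinations
    schedule = []
    for i in range(n_subjects):
        formats, tasks = cells[i % len(cells)]
        schedule.append({"subject": i + 1, "formats": formats, "tasks": tasks})
    return schedule


schedule = assign_conditions(20)
# Count how many subjects start with each questionnaire format.
first_format_counts = Counter(row["formats"][0] for row in schedule)
```

With twenty subjects and four cells, this rotation gives each format an equal share of first sessions. A real assignment would probably also balance the novice and experienced groups across cells, which this sketch does not attempt.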

Procedure
Subjects were informed that this experiment included a practice session to allow them to review different aspects of WordPerfect on the Macintosh. They were told that they would practice changing fonts, copying and pasting items, and would complete several other tasks. Subjects were given ten to fifteen minutes to complete the set of tasks. The experimenter opened a new page in WordPerfect and from that point forward no help was given to the subjects. Subjects were encouraged to use on-line help if they did not know how to complete a task. Following the set of tasks, they completed either the on-line or paper-pencil version of the QUIS.

In their second session, subjects were told that they would be completing tasks very similar to the ones in their first session. Subjects were again given ten to fifteen minutes to complete the set of tasks. Depending on the version of the QUIS used in the previous session, subjects completed either the on-line or paper-pencil version of the questionnaire. They were told that they were not being tested on how well they could remember their responses from the last session and that they should instead complete the questionnaire while thinking about rating different aspects of WordPerfect.

RESULTS

Figure 1 shows the comparison of ratings between the on-line and paper versions of the QUIS 5.5. As shown in the figure, the profile of mean ratings across items was very similar in both versions.

Figure 1

A power analysis was performed to ensure our ability to detect any effect of survey format. With a power of .95 at α = .01, the critical η² = 0.31. Figure 2 shows the resulting power curve. A one-point mean difference between the on-line and paper survey results was taken as the lower bound for our interest, and yields η² = 0.46. With N = 20, this study should be able to detect response differences as small as 0.77.
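Because the effect sizes above are stated as eta squared (η²), it can help to see them on another common scale. The sketch below converts η² to Cohen's f using the standard relation f = sqrt(η² / (1 - η²)). It is only a unit conversion and does not reproduce the authors' power analysis; the function name `eta_squared_to_cohens_f` is hypothetical.

```python
import math


def eta_squared_to_cohens_f(eta_sq):
    """Convert eta squared (proportion of variance explained) to Cohen's f."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


# Values reported in the text: the critical effect size and the effect size
# corresponding to a one-point mean difference.
f_critical = eta_squared_to_cohens_f(0.31)
f_one_point = eta_squared_to_cohens_f(0.46)
print(f"critical f = {f_critical:.2f}, one-point f = {f_one_point:.2f}")
```

On this scale both values are well above the conventional benchmark of f = 0.40 for a large effect, so the study was powered to detect large format effects rather than subtle ones.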

Figure 2

As previous research has suggested, no effect of survey format was detected, F(1, 6) = 4.160, p > 0.02. The interaction between survey format and individual question was also not significant, F(25, 150) = 1.077, p > 0.02. Alpha has been divided among the analyses according to their importance, maintaining an experiment-wise error rate of 5%.

Figure 1. Means for both survey formats on each question with marginal means and standard deviations inserted.
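Dividing alpha among analyses can be sketched as an unequal Bonferroni-style split. The allocation below is an assumption inferred from the thresholds quoted in the text (.02 for each of the two F tests, with the remaining .01 assigned to one further analysis); the paper does not list the exact split, and all names are hypothetical.

```python
FAMILYWISE_ALPHA = 0.05

# Assumed allocation, not taken from the paper.
alpha_by_analysis = {
    "format main effect": 0.02,
    "format x question interaction": 0.02,
    "further planned analysis": 0.01,
}


def allocation_is_valid(allocation, familywise_alpha, tol=1e-12):
    """Per-test alphas that sum to at most the family-wise alpha keep the
    overall Type I error rate at or below that level (Bonferroni inequality)."""
    return sum(allocation.values()) <= familywise_alpha + tol


def is_significant(p_value, analysis):
    """Compare a p value against the alpha assigned to that analysis."""
    return p_value < alpha_by_analysis[analysis]
```

Under such a split, the same p value can be significant for one analysis and not for another, which is why each test has to be read against its own threshold.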

As hypothesized, there was a difference in the comments generated between the two versions. Subjects using the on-line survey (M = 153.5, SD = 178.8) entered significantly more words in the comment areas than when using the paper version (M = 28.7, SD = 50.0), paired t(19) = 3.693, p
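The paired comparison of comment length can be sketched as follows. The per-subject word counts in the example are made up for illustration (the raw data are not given in the paper), and the function names are hypothetical; only the two means, the t value and N in the last call come from the text.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


def implied_sd_of_differences(mean_diff, t_value, n):
    """Back out the SD of the paired differences from a reported t statistic."""
    return mean_diff * math.sqrt(n) / t_value


# Made-up word counts for three subjects, for illustration only.
online_words = [10, 20, 30]
paper_words = [5, 10, 15]
t_stat, df = paired_t(online_words, paper_words)

# Reported summary values: means 153.5 and 28.7, t(19) = 3.693, N = 20.
sd_diff = implied_sd_of_differences(153.5 - 28.7, 3.693, 20)
```

Working backwards from the reported values, the implied standard deviation of the per-subject differences is roughly 151 words, so individual subjects varied widely in how much more they wrote on-line.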

DISCUSSION

This study supports the growing idea that computer-based surveys are appropriate alternatives to traditional paper-pencil questionnaires. It even suggests that computer administered surveys may have some advantages over their conventional counterparts.

Vansickle, Kimmel & Kapes (1989) wrote that there is reason to conduct further research with many different surveys because, if findings can be replicated, users of computer-based instruments will benefit from the statistical confidence previously associated only with paper-pencil administration. Our finding of equivalence between the two versions on all question aspects gives users of the QUIS added confidence in the on-line version.

Comments written in the paper version of the questionnaire identified problems, but gave little explanation or clarification of what the actual difficulties were. Comments in the computer-based version, however, described problems more fully and often included examples. For instance, on paper, the following comment was written about the computer's ability to keep the user informed about what it is doing: "Instruments were OK. However, user does not have much control over feedback". For the same question, the subject wrote using the on-line version: "Does not really keep you informed. On cut and paste, I had mistakenly pasted before I cut and found that someone before me had pasted something to the clipboard. You can imagine my surprise when I saw three lines of extra characters that I did not type appear on the screen. The software did not inform me that I had pasted anything so I had to deduce that I had pasted before cutting or copying. Luckily, I am familiar with computers and this was not a problem but a less experienced user might have been confused." The same types of response differences between the on-line and paper-pencil versions were seen for the majority of the subjects' comments. This suggests that the quality of the comments increased in the on-line version of the QUIS.

Differences found in the quality of comments have particular importance to software developers and researchers who use the QUIS. The QUIS is most often used to spot problems with new or proposed computer software. Comments are at least as important as the QUIS ratings because they contain specific information. The explicit and illustrative types of comments produced by the computer-based version of the QUIS are more likely to inform the developer than the terse and unexplained comments of the paper-pencil version. The computer version also tended to elicit comments of satisfaction in addition to the problems. This information can be just as valuable.

Booth-Kewley, Edwards & Rosenfeld (1992) suggested that researchers should attempt to identify the boundary and contextual conditions that produce differences between computer-based and paper-pencil responses. We have postulated a few differences between the two versions that may account for the discrepancy in the content and quantity of comments. The layout of the on-line questionnaire offers more screens on which the respondent can write comments. Also, a sentence at the top of each on-line comment screen suggests what can be commented on in that space. The on-line version is more forceful in collecting comments because the comment screen cannot be ignored by the subject: it is included in the series of cards and is not simply an "optional" box at the bottom of a page. Furthermore, the subject might feel obliged to fill in comments because the cards containing the actual questions of the QUIS require that the subject complete all of them before moving to the next section of the questionnaire.

Overall, we have shown that ratings do not differ between versions of the questionnaire, which makes it safe to use either or both versions in practice or research. The increase in the number and quality of comments is an added benefit for those who use the on-line version, as richer comments increase the utility of the questionnaire.

REFERENCES

Booth-Kewley, S., Edwards, J.E., & Rosenfeld, P. (1992). Impression management, social desirability, and computer administration of attitude questionnaires: Does the computer make a difference? Journal of Applied Psychology, 77(4), 562-566.

Chin, J., Diehl, V., & Norman, K. (1988). Development of an instrument measuring user satisfaction of the human-computer interface. In CHI '88 Conference Proceedings: Human Factors in Computing Systems (pp. 213-218). New York: Association for Computing Machinery.

Eich, J.E. (1980). The cue-dependent nature of state dependent retrieval. Memory & Cognition, 8, 157-173.

Harper, B.D., & Norman, K.L. (1993). Improving user satisfaction: The Questionnaire for User Interaction Satisfaction version 5.5. In Proceedings of the First Mid-Atlantic Human Factors Conference (pp. 224-228). Virginia Beach, Virginia.

Holden, R.R., & Hickman, D. (1987). Computerized versus standard administration of the Jenkins Activity Survey (Form T). Journal of Human Stress, 13(4), 175-179.

Huba, G.J. (1988). Comparability of traditional and computer Western Personnel Test (WPT) versions. Educational and Psychological Measurement, 48, 957-959.

Kapes, J.T., & Vansickle, T.R. (1992). Comparing paper-pencil and computer-based versions of the Harrington-O'Shea Career Decision Making System. Measurement and Evaluation in Counseling and Development, 25(1), 5-13.

Vansickle, T.R., Kimmel, C., & Kapes, J.T. (1989). Test-retest equivalency of the computer-based and paper-pencil versions of the Strong-Campbell Interest Inventory. Measurement and Evaluation in Counseling and Development, 22(2), 88-93.

Wilson, R., Genco, K., & Yager, G. (1985). Assessing the equivalence of paper-and-pencil vs. computerized tests: Demonstration of a promising methodology. Special Issue: Computer assessment and interpretation: Prospects, promise and pitfalls. Computers in Human Behavior, 1(3-4), 265-275.