Vasja Vehovar, Zenel Batagelj

THE METHODOLOGICAL ISSUES IN WWW SURVEYS

paper presented at CASIC '96, San Antonio

ABSTRACT:

The paper describes the methodology and the results of three simultaneous Internet surveys in Slovenia: a telephone survey of the Internet users, a mail survey of companies using the Internet and a WWW survey of (self-selected) respondents. Specific features of the WWW respondents are emphasized and comparisons are made with other Internet users. It is confirmed that socio-demographics of WWW users across different countries are surprisingly close.
The WWW survey methodology enables specific experiments related to the length of the questionnaire, exact timing of each question, tracking of the respondents and the lay-out of the questionnaire. The paper presents the key findings based on experiments incorporated in the above mentioned WWW survey.
Also presented is the questionnaire markup language (QML) and the software for automatic transformation from QML to CATI, CASI, CADI format (e.g. Blaise, Interv) and to the format which can be directly applied for WWW surveys (HTML forms, Perl scripts, Java).
Finally, the overall advantages and disadvantages of the WWW surveys are briefly discussed. It is argued that WWW surveys can be very effective for surveying a specific population, for mixed mode surveys and for administrative data collection. A practical application of an integrated computer assisted data collection approach is also demonstrated: the membership survey of the national Statistical Association.

1. Introduction

1.1 WWW users (and their activities) are an extremely interesting area to explore from a substantive point of view. On the other hand, the WWW is also becoming a valuable data collection tool. Both aspects were extensively investigated in the Research on Internet in Slovenia (RIS) project (http://www.ris.org).

The following two graphs give an impression of the global trends in the rapid growth of the Internet.

Graph 1: Population (15+) ever used Internet


Graph 2: Frequency of the Internet usage (April 1996)


1.2 The RIS project simultaneously targeted three different populations:

a) Companies: all large and medium companies were surveyed, as well as 7% of small companies (fewer than 50 employees). The two-page mail questionnaire was followed by two mail follow-ups and a third follow-up conducted by telephone. The overall response rate was 65%, with 2300 responses obtained.

b) General population: a telephone screening of 7000 households was performed, followed by a telephone survey of 500 Internet users. The response rate was 75%.

c) WWW survey: the survey was announced to the public through login messages, WWW announcements and news items in the classic media. More than 3000 users visited the home page, over 1800 followed the link to the questionnaire and 1200 respondents completed the whole questionnaire.

Conducting all three surveys simultaneously in spring 1996 gave an opportunity to study Internet-related behavior in different groups and, in particular, across different modes of survey data collection.

1.3 Many surveys have been conducted on Internet usage. However, international comparisons are often difficult, especially when companies are analyzed. There, the difficulties arise from differences in methodology and from the specific national environments.

On the other hand, comparison is easiest in the case of WWW surveys. Not only is the methodology relatively standardized, but the results of the surveys (at least the socio-demographics) are also relatively close. The respondents are, basically, young males with similar patterns of Internet usage. This holds for all general WWW surveys, despite the fact that such surveys are not based on probability samples. Of course, this describes the situation in 1996, when less than 10% of the population uses the Internet regularly. There are some signs, however, that the socio-demographics of Internet users are slowly changing.

Graph 4: The gender (percentage of female users) of the WWW users


Graph 5: The age structure of the WWW users


2. The WWW Questionnaire in RIS survey

2.1 The WWW questionnaire asked standard Internet-usage questions (frequency, areas, background). Altogether there were 20 questions, which took on average 7 minutes of the respondent's time.

The WWW survey enables - similar to the CASI (Computer Assisted Self-Interviewing) surveys - an exact timing of each question and also an exact measurement of the length of the whole interview. Thus, the technology is extremely suitable for testing questionnaires.

In the WWW survey described above, the average time and the coefficient of variation were calculated for each question. Questions that take unusually long, or that show large variability (a high coefficient of variation) in the time needed to answer them, indicate that attention is needed. Some additional work may then be performed and, if needed, the wording changed, the question split, etc.
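The per-question screening described above can be sketched in a few lines of Python. This is only an illustration: the timings, the function names and the two thresholds (`max_mean`, `max_cv`) are assumptions made for the example and do not come from the RIS survey.

```python
from statistics import mean, stdev


def timing_summary(times_by_question):
    """Return {question: (mean_seconds, coefficient_of_variation)}."""
    summary = {}
    for question, times in times_by_question.items():
        avg = mean(times)
        # CV = sample standard deviation divided by the mean
        cv = stdev(times) / avg if len(times) > 1 and avg > 0 else 0.0
        summary[question] = (avg, cv)
    return summary


def flag_questions(summary, max_mean=30.0, max_cv=1.0):
    """List questions whose mean time or CV exceeds the given thresholds.

    Both thresholds are hypothetical; suitable values depend on the survey.
    """
    return [q for q, (avg, cv) in summary.items()
            if avg > max_mean or cv > max_cv]


# Hypothetical timings in seconds (not data from the RIS survey)
example = {
    "q1": [10, 12, 11, 9],    # short and stable
    "q2": [40, 45, 50, 42],   # long on average
    "q3": [5, 60, 8, 7],      # short on average but highly variable
}
summary = timing_summary(example)
flagged = flag_questions(summary)
for question in flagged:
    avg, cv = summary[question]
    print(f"{question}: mean {avg:.1f} s, CV {cv:.2f}")
```

In this made-up example q2 is flagged for its length and q3 for its variability, the two warning signs mentioned in the text.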

2.2 The survey also tested two different layouts. The first was the commonly used one: a single long scrolling page for the whole questionnaire. The alternative was a layout where each question block was put on its own page; the next page appeared only when the previous one was finished. The software automatically (randomly) allocated one of the two layouts to each respondent.
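A random allocation of this kind can be sketched as follows. The layout labels and the function name are illustrative assumptions; the paper does not describe how the RIS software actually made the draw.

```python
import random

# Hypothetical labels for the two layouts compared in the experiment
LAYOUTS = ("one_page", "multiple_page")


def assign_layout(rng=random):
    """Pick one of the two layouts with equal probability."""
    return rng.choice(LAYOUTS)


# A seeded generator makes the example reproducible
rng = random.Random(1996)
assignments = [assign_layout(rng) for _ in range(1000)]
share_multiple = assignments.count("multiple_page") / len(assignments)
print(f"share assigned to multiple page layout: {share_multiple:.2f}")
```

In practice the assigned layout would be stored with the respondent's record so that completion rates and timings can later be compared by layout.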

The second layout has several advantages over the first. With the second layout it is very simple to implement conditions (skips) in the questionnaire; with a single page, Java applets need to be used, which results in unnecessary waiting while the questionnaire downloads. In addition, with the second layout the time used for each question block is measured automatically.

Also tested was the impact of the layout on the completion rate and on the length of the interview. Here, the completion rate is understood as the ratio between the number of completed interviews/questionnaires and the number of all attempted/started questionnaires. The results were calculated separately for text browsers (Lynx) and graphical browsers.
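The completion rate as defined here (completed divided by started questionnaires, per browser type and layout) can be computed with a short sketch like the one below. The record format and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict


def completion_rates(records):
    """Compute completed / started for each (browser, layout) group.

    records: iterable of (browser, layout, completed) tuples, one per
    started questionnaire; `completed` is True if it was finished.
    """
    started = defaultdict(int)
    completed = defaultdict(int)
    for browser, layout, done in records:
        key = (browser, layout)
        started[key] += 1
        if done:
            completed[key] += 1
    return {key: completed[key] / started[key] for key in started}


# Hypothetical records (not data from the RIS survey)
records = [
    ("text", "one page", True),
    ("text", "one page", False),
    ("graphical", "multiple page", True),
    ("graphical", "multiple page", True),
    ("graphical", "multiple page", False),
    ("graphical", "multiple page", True),
]
rates = completion_rates(records)
for (browser, layout), rate in rates.items():
    print(f"{browser:10s} {layout:14s} {rate:.1%}")
```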

The results are as follows:

Table 1: Mean of the responding time, CV and the completion rate

Browser     Layout          Mean (sec.)   Completion rate
text        multiple page   553           76.1%
text        one page        468           76.9%
graphical   multiple page   466           83.5%
graphical   one page        368           85.4%

It is evident that the completion rate is strongly influenced by the type of browser used. The average length of the interview is higher when text browsers are used and also higher when the multiple page layout is used. The most important result is that there are no statistically significant differences in completion rates between the two layouts; in particular, text browsers with the multiple page layout did not differ significantly from the single page layout. Therefore, given the benefits of the multiple page layout discussed above, there is no reason not to use it.
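One common way to check whether two completion rates differ significantly is a two-proportion z-test. The paper does not say which test was used and does not report the number of started questionnaires per group, so the sketch below is an assumption throughout: the counts are hypothetical and the function name is made up for the example.

```python
import math


def two_proportion_z_test(done_a, started_a, done_b, started_b):
    """Two-sided z-test for the difference between two completion rates.

    Returns (z, p_value), using the pooled estimate of the proportion.
    """
    p_a = done_a / started_a
    p_b = done_b / started_b
    pooled = (done_a + done_b) / (started_a + started_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / started_a + 1 / started_b))
    if se == 0:
        # All or none completed in both groups: no difference to test
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts chosen only to resemble rates of about 76% and 77%
z, p_value = two_proportion_z_test(76, 100, 77, 100)
print(f"z = {z:.2f}, p = {p_value:.2f}")
```

With group sizes of this order, a difference of about one percentage point is far from significant at the usual 5% level.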

2.3 As mentioned, in WWW surveys the exact timing of each interview can be easily measured. This enables not only the analysis of the timing of each question and experiments with different layouts, but also makes it possible to obtain important information on technical aspects.

The RIS survey described above thus clearly showed which browser and which operating system dominate in the population of WWW users, without asking those questions. It is also possible to track the e-mail addresses of respondents, which can certainly raise ethical issues.
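One plausible way to obtain such information without asking is to inspect the User-Agent header that browsers send with each HTTP request. Whether the RIS software did exactly this is an assumption, and the matching rules and sample header strings below are simplified illustrations, not a complete classifier.

```python
from collections import Counter


def classify_user_agent(user_agent):
    """Return a rough (browser, operating_system) guess from a User-Agent string."""
    ua = user_agent.lower()
    if "lynx" in ua:
        browser = "Lynx"
    elif "msie" in ua:
        browser = "Internet Explorer"
    elif ua.startswith("mozilla"):
        browser = "Netscape"
    else:
        browser = "other"
    if "win" in ua:
        system = "Windows"
    elif "mac" in ua:
        system = "Macintosh"
    elif "x11" in ua or "linux" in ua or "sunos" in ua:
        system = "Unix"
    else:
        system = "unknown"
    return browser, system


# Illustrative header values (hypothetical log entries)
logged = [
    "Mozilla/2.0 (Win95; I)",
    "Mozilla/2.0 (compatible; MSIE 3.0; Windows 95)",
    "Mozilla/3.0 (X11; I; SunOS 5.5 sun4u)",
    "Lynx/2.5 libwww-FM/2.14",
    "Mozilla/2.0 (Win95; I)",
]
tally = Counter(classify_user_agent(ua) for ua in logged)
for (browser, system), count in tally.most_common():
    print(f"{browser:18s} {system:10s} {count}")
```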

2.4 At the end of the questionnaire the respondents were asked for comments and, if they wished to receive the results, for their e-mail address. Surprisingly, 70% of the respondents left their e-mail address.

When the results (in fact, the WWW address of the results) were e-mailed to respondents, the exact duration of each interview was cited (for example: "Thank you for 5 minutes and 35 seconds of your time you spent responding to our survey...").
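Producing such a personalized line from the measured duration is straightforward. The function below is a minimal sketch; its name and the assumption that the duration is stored in seconds are illustrative.

```python
def thank_you_line(seconds):
    """Format a measured interview duration as the thank-you sentence."""
    minutes, secs = divmod(int(seconds), 60)
    return (f"Thank you for {minutes} minutes and {secs} seconds of your "
            "time you spent responding to our survey...")


# 335 seconds corresponds to the 5 minutes and 35 seconds quoted above
message = thank_you_line(335)
print(message)
```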

The explicit quotation of the responding time impresses the respondents and reminds them of the possibilities of WWW surveys. The intention was to prepare the respondents to participate in next year's survey as well.

Regardless of whether this very specific information is sent to respondents, the key advantage of WWW surveys is the simple fact that it is very easy, and almost free of cost, to perform follow-up communication. That also includes the possibility of distributing the results of the survey.

3. The technology

All surveys mentioned above were created with software that supports Integrated Computer-Assisted Data Collection (ICDC). The software enables easy transformation of the questionnaire into surveys of different modes. The basic idea is to create the questionnaire once and let the software create the final layouts for all modes of data collection.

All questionnaires in the RIS project were thus written in the questionnaire markup language (QML), which is based on SGML, an international standard for data description. The software then enables automatic conversion to CATI, CASI and CADI formats, which can be directly applied for interviewing and data entry.

The questionnaire can also be printed in a form that can be used directly for an e-mail survey. Of course, some additional design work on the layout may be needed.

The questionnaire can also be converted directly to the specific format of standard software such as Blaise and InterV. This may be extremely useful when cooperation is needed between agencies or companies using different software. A CATI / CADI / CASI software interpreter for QML has also been developed.

Another important output is the format for WWW surveys: HTML, Perl and Java scripts. The software automatically creates and designs a standardized layout of the WWW page. Importantly, QML also supports HTML source code, which means that multimedia WWW surveys are possible as well.
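The idea of defining a questionnaire once and generating several outputs from it can be illustrated with a small Python sketch. It does not use QML, whose syntax is not shown in this paper; the `Question` class, the renderer names and the form `action` URL are all hypothetical and stand in for the single source definition and two of its target formats.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Question:
    name: str                                     # variable name in the data file
    text: str                                     # question wording
    options: list = field(default_factory=list)   # empty list = open question


def render_html(questions, action="/cgi-bin/survey"):
    """Render the questionnaire as an HTML form (WWW mode)."""
    parts = [f'<form method="post" action="{escape(action)}">']
    for q in questions:
        parts.append(f"<p>{escape(q.text)}</p>")
        if q.options:
            for code, option in enumerate(q.options, start=1):
                parts.append(
                    f'<label><input type="radio" name="{escape(q.name)}" '
                    f'value="{code}"> {escape(option)}</label><br>')
        else:
            parts.append(f'<input type="text" name="{escape(q.name)}">')
    parts.append('<input type="submit" value="Send">')
    parts.append("</form>")
    return "\n".join(parts)


def render_text(questions):
    """Render the same questionnaire as plain text (mail or e-mail mode)."""
    lines = []
    for number, q in enumerate(questions, start=1):
        lines.append(f"{number}. {q.text}")
        for code, option in enumerate(q.options, start=1):
            lines.append(f"   [ ] {code} {option}")
        if not q.options:
            lines.append("   ____________________")
        lines.append("")
    return "\n".join(lines)


# One definition, two outputs (question wording is made up for the example)
questionnaire = [
    Question("freq", "How often do you use the Internet?",
             ["Daily", "Weekly", "Less often"]),
    Question("comment", "Any comments?"),
]
html_form = render_html(questionnaire)
text_form = render_text(questionnaire)
print(text_form)
```

Further renderers (for example for a CATI package) would be added alongside these two without changing the questionnaire definition itself.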

The above described software can be schematically expressed in the following scheme:


WWW survey development.

4. Integrated Computer Assisted Data Collection

4.1 The technology described above enables considerable flexibility in designing surveys. In general, the survey process can be split into two parts: the contact with the respondent and the interviewing process itself. Both components can, of course, be performed together; however, there are situations when the two are separated.




The above scheme gives great flexibility in selecting the most suitable mode / technique to perform the survey. Specifically, the integrated computer assisted data collection approach strongly supports mixed mode surveys. The advantage there is to give the respondent the comfort of selecting the preferred mode and also the time of the interview. Of course, this is reasonable only in the case of a relatively motivated target population.

In the majority of surveys with less salient topics, the aggressive approach based on face-to-face or telephone contact, immediately followed by the interview, may still be the preferred option and should by no means be neglected when a mixed mode survey is offered.

On the other hand, the approach described above is especially suitable not only for mixed mode surveys but also for the collection of administrative data.

5. Statistical Association Membership Survey

Described below is an example: a survey of the members of the Slovenian Statistical Association. It was assumed that statisticians have access to the Internet, although exact data were not known.

The survey proceeded in the following steps:

a) First week: the members received a letter by (snail) mail inviting them to visit the home page and answer the WWW survey;

b) Second week: the members received another letter, which served as both a thank-you and a reminder. This time the non-respondents were invited not only to the WWW survey but also to call a given telephone number and participate in a CATI interview. Trained interviewers were available there, which enabled the cooperation of respondents without access to the Internet. (As an option, automated data collection with voice recognition can also be used.)

c) Third week: the telephone calls were made to the non-respondents.
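The three-week contact schedule above can be written down as a small decision rule. The sketch below is only an illustration of the steps as listed; the function name and the wording of the action labels are assumptions.

```python
def contact_actions(week, has_responded):
    """Return the contact actions for one member in a given week.

    Follows the three steps listed above; weeks outside 1-3 yield no action.
    """
    if week == 1:
        return ["letter: invitation to the WWW survey"]
    if week == 2:
        actions = ["letter: thanks and reminder"]
        if not has_responded:
            # Non-respondents may now choose between two modes
            actions += ["offer: WWW survey", "offer: call-in CATI interview"]
        return actions
    if week == 3 and not has_responded:
        return ["telephone call to the member"]
    return []


for week in (1, 2, 3):
    print(week, contact_actions(week, has_responded=False))
```

The rule makes the separation of contact and interviewing explicit: the contact mode changes from week to week, while the respondent chooses the interview mode.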

The possibility of a mail survey was also discussed. It was concluded, however, that it would only add complexity that is not needed in this survey. Even without this option, the survey enabled the respondents to select the responding time and the survey mode.