ABSTRACT:
The paper describes the methodology and the results
of three simultaneous Internet surveys in Slovenia: a telephone
survey of the Internet users, a mail survey of companies using
the Internet and a WWW survey of (self-selected) respondents.
Specific features of the WWW respondents are emphasized and comparisons
are made with other Internet users. It is confirmed that socio-demographics
of WWW users across different countries are surprisingly close.
The WWW survey methodology enables specific experiments related
to the length of the questionnaire, exact timing of each question,
tracking of the respondents and the lay-out of the questionnaire.
The paper presents the key findings based on experiments incorporated
in the above mentioned WWW survey.
Also presented is the questionnaire markup language (QML) and
the software for automatic transformation from QML to CATI, CASI,
CADI format (e.g. Blaise, Interv) and to the format which can
be directly applied for WWW surveys (HTML forms, Perl scripts,
Java).
Finally, the overall advantages and disadvantages of the WWW surveys
are briefly discussed. It is argued that WWW surveys can be very
effective for surveying a specific population, for mixed mode
surveys and for administrative data collection. A practical application
of an integrated computer assisted data collection approach is
also demonstrated: the membership survey of the national Statistical
Association.
1. Introduction
1.1 WWW users (and their activities) are an extremely interesting area to explore from a substantive point of view. On the other hand, the WWW is also becoming a valuable data collection tool. Both aspects were extensively investigated in the Research on Internet in Slovenia (http://www.ris.org) project.
The following two diagrams give an impression of the global trends
in the rapid growth of the Internet.
Graph 1: Population (15+) ever used Internet
Graph 2: Frequency of the Internet usage (April 1996)
1.2 The RIS project simultaneously targeted three different populations:
a) Companies: all large and medium companies were surveyed, as well as 7% of small companies (fewer than 50 employees). The two-page mail questionnaire was followed by two mail follow-ups and a third follow-up conducted by telephone. The overall response rate was 65%, with 2300 responses obtained.
b) General population: the telephone screening of 7000 households was performed, followed by a telephone survey of 500 Internet users. The response rate was 75%.
c) WWW survey: the WWW survey was announced to the public through login messages, WWW announcements and news in the classic media. More than 3000 users visited the home page, over 1800 users followed the link to the questionnaire and 1200 respondents answered the complete questionnaire.
The simultaneous performance of all three surveys in Spring 1996
gave an opportunity to study the Internet related behavior in
different groups and, particularly, in different modes of survey
data collection.
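The counts quoted for the WWW survey in c) can be turned into rough step-by-step rates. The short sketch below does only that arithmetic; the text gives two of the counts as "more than" and "over", so the inputs are approximations and the resulting shares are indicative only. The stage labels and the function name are illustrative.

```python
# Approximate counts quoted in the text ("more than 3000", "over 1800", 1200).
funnel = [
    ("visited home page", 3000),
    ("opened questionnaire", 1800),
    ("completed questionnaire", 1200),
]

def step_rates(stages):
    """Return (label, share of the previous stage) for each stage after the first."""
    rates = []
    for (_, previous_count), (label, count) in zip(stages, stages[1:]):
        rates.append((label, count / previous_count))
    return rates

for label, rate in step_rates(funnel):
    print(f"{label}: {rate:.0%} of previous stage")
```

With these rounded inputs, roughly three in five visitors opened the questionnaire and about two in three of those completed it.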
1.3 Many different surveys have been conducted on Internet usage. However, international comparisons are often difficult, especially when companies are analyzed. There, the difficulties arise from differences in methodology and from the specific national environments.
On the other hand, the easiest comparison can be performed in
the case of WWW surveys. Not only is the methodology relatively
standardized, but the results (at least the socio-demographics) of
the surveys are relatively close. The respondents are, basically,
young males with a similar pattern of Internet usage. This holds
for all general WWW surveys, despite the fact that general WWW
surveys are not based on probability samples. Of course, this
describes the 1996 situation, when less than 10% of the population
uses the Internet regularly. There are some signs, however, that the
socio-demographics of Internet users are slowly changing.
Graph 4: The gender (percentage of female users) of the WWW users
Graph 5: The age structure of the WWW users
2. The WWW Questionnaire in RIS survey
2.1 The WWW questionnaire asked standard Internet-usage questions (frequency, areas, background). Altogether, there were 20 questions which, on average, took 7 minutes of the respondent's time.
The WWW survey enables - similar to the CASI (Computer Assisted Self-Interviewing) surveys - an exact timing of each question and also an exact measurement of the length of the whole interview. Thus, the technology is extremely suitable for testing questionnaires.
In the WWW survey described above, the average length and the coefficient
of variation were calculated for all questions. Unusually lengthy
questions and questions with large variability (coefficient
of variation) in the time needed to answer them in the self-administered
interview indicate that attention is needed. Some additional work
may be required and, if needed, the wording might be changed,
the question split, etc.
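The screening of questions by average length and coefficient of variation can be sketched as follows. This is a minimal illustration and not the software used in the RIS survey: the timings are invented, and the two thresholds (max_mean, max_cv) are assumptions that would have to be chosen for each questionnaire.

```python
from statistics import mean, stdev

# Hypothetical timings in seconds per question; not data from the RIS survey.
timings = {
    "q1": [10, 12, 11, 9, 13],
    "q2": [20, 60, 15, 90, 25],
    "q3": [40, 42, 45, 38, 41],
}

def timing_summary(samples):
    """Return (mean, coefficient of variation) for one question's timings."""
    m = mean(samples)
    cv = stdev(samples) / m  # sample standard deviation divided by the mean
    return m, cv

def flag_questions(timings, max_mean=30.0, max_cv=0.5):
    """List questions whose mean time or CV exceeds the assumed thresholds."""
    flagged = []
    for name, samples in timings.items():
        m, cv = timing_summary(samples)
        if m > max_mean or cv > max_cv:
            flagged.append(name)
    return flagged

print(flag_questions(timings))
```

In this invented data, q2 is flagged for both its length and its variability and q3 for its length only; such questions would be the candidates for rewording or splitting.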
2.2 The survey also tested two different layouts. The first was the commonly used one: one long scrolling page for the whole questionnaire. The alternative was a layout where each question block was put on its own page; the next page appeared only when the previous one was finished. The software automatically (randomly) allocated one of the two layouts to each respondent.
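A random allocation of the layout can be sketched in a few lines. The paper does not describe how the allocation was implemented, so the equal probabilities, the layout labels and the function name below are assumptions made for illustration.

```python
import random

LAYOUTS = ("one_page", "multiple_page")  # labels are illustrative

def assign_layout(rng=random):
    """Pick one of the two layouts with equal probability."""
    return rng.choice(LAYOUTS)

rng = random.Random(1996)  # seeded only to make the example repeatable
assignments = [assign_layout(rng) for _ in range(1000)]
print({layout: assignments.count(layout) for layout in LAYOUTS})
```

Allocating at random, rather than by arrival order or browser type, is what allows differences in completion rate and interview length to be attributed to the layout.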
The second layout has many advantages over the first one. With the second layout it is very simple to implement conditions in the questionnaire; with a single page, Java applets need to be used, which results in unnecessary waiting for the questionnaire to download. With the second layout, the time used for each question block is also measured automatically.
Also tested was the impact of the layout on the completion rate
and the length of the interview. Here, the completion rate means
the ratio between the number of complete interviews/questionnaires
and the number of all attempted/started questionnaires. The results
were calculated separately for text browsers (Lynx) and graphical
browsers.
The results are as follows:
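Table 1: Mean of the responding time, CV and the completion rate
| Browser | Layout | Mean responding time | CV | Completion rate |
| text | multiple page | | | |
| text | one page | | | |
| graphical | multiple page | | | |
| graphical | one page | | | |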
It is evident that the completion rate is highly influenced by
the type of browser used. The average length of the interview is higher
when text browsers are used and also higher when the multiple page
layout is used. The most important result is that there are no
statistically significant differences in completion rates between
the layouts; that is, no significant differences occurred when text browsers
were used with the multiple page layout compared to the single page layout.
Therefore, given the benefits of the multiple page layout discussed
before, there are no limitations for using it.
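The paper does not state which significance test was applied to the completion rates. One common choice for comparing two proportions is a two-sided z-test with a pooled standard error, sketched below with invented counts. Both the choice of test and the numbers are assumptions for illustration, not the RIS results.

```python
from math import erf, sqrt

def two_proportion_z_test(done_a, started_a, done_b, started_b):
    """Two-sided z-test for the equality of two completion rates."""
    p_a = done_a / started_a
    p_b = done_b / started_b
    pooled = (done_a + done_b) / (started_a + started_b)
    se = sqrt(pooled * (1 - pooled) * (1 / started_a + 1 / started_b))
    z = (p_a - p_b) / se
    # Standard normal CDF evaluated at |z| via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts: completed and started questionnaires for each layout.
z, p = two_proportion_z_test(300, 450, 290, 450)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value well above 0.05, as with these invented counts, would correspond to the statement that the layouts do not differ significantly in completion rate.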
2.3 As mentioned, in WWW surveys the exact timing of each interview
can be easily measured. This enables not only analysis of the
timing of each question and experiments with different layouts;
it is also possible to obtain some important information on
technical aspects.
The RIS survey described above thus clearly showed which browser
and which operating system dominate in the population of WWW users,
without asking those questions. It is also possible to track the
e-mail address of respondents, which definitely raises some ethical
issues.
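The paper does not say how the browser and operating system were identified. One standard source of such information is the User-Agent header that browsers send with each HTTP request. The sketch below classifies a few example header values; the substring rules and the example strings are illustrative assumptions, not the rules used in the RIS survey.

```python
def classify_user_agent(user_agent):
    """Guess browser type and operating system from a User-Agent string.

    The substring rules are illustrative assumptions, not the RIS rules.
    """
    ua = user_agent.lower()
    browser = "text" if "lynx" in ua else "graphical"
    if "win" in ua:
        system = "Windows"
    elif "mac" in ua:
        system = "Macintosh"
    elif "x11" in ua or "linux" in ua or "sunos" in ua:
        system = "Unix"
    else:
        system = "unknown"
    return browser, system

print(classify_user_agent("Mozilla/2.0 (Win95; I)"))
print(classify_user_agent("Lynx/2.4.2 libwww/2.14"))
```

Because this information arrives with every request, the respondent never sees it being collected, which is part of the ethical concern raised above.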
2.4 At the end of the questionnaire the respondents were asked for comments and for their e-mail address if they wished to receive the results. Surprisingly, 70% of the respondents left their e-mail address.
When the results (in fact, the WWW address with the results) were e-mailed to the respondents, the exact duration of each interview was cited (for example: "Thank you for 5 minutes and 35 seconds of your time you spent responding to our survey...").
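Producing such a personalized line from the recorded start and end of an interview is straightforward. The sketch below assumes the two timestamps are available in seconds; the function names are hypothetical and the wording follows the example quoted above.

```python
def plural(n, word):
    """Return e.g. '1 minute' or '5 minutes'."""
    return f"{n} {word}" if n == 1 else f"{n} {word}s"

def format_duration(total_seconds):
    """Express a duration in whole minutes and seconds."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{plural(minutes, 'minute')} and {plural(seconds, 'second')}"

def thank_you_line(started_at, finished_at):
    """Build the opening line of the follow-up e-mail from two timestamps in seconds."""
    spent = format_duration(finished_at - started_at)
    return f"Thank you for {spent} of your time you spent responding to our survey..."

print(thank_you_line(started_at=0, finished_at=335))
```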
The explicit quotation of the responding time impresses the respondents and makes them remember the possibilities of WWW surveys. The intention behind this idea was to prepare the respondents to participate in next year's survey as well.
Regardless of whether this very specific information is sent to respondents,
the key advantage of WWW surveys is the simple fact that it is very
easy - and almost without cost - to perform follow-up communication.
That also includes the possibility of distributing the results
of the survey.
3. The technology
All surveys mentioned above were created with software that supports Integrated Computer-Assisted Data Collection (ICDC). The software enables easy transformation of the questionnaire into surveys of different modes. The basic idea is to create the questionnaire once and let the software create the final layouts for all modes of data collection.
All questionnaires in the RIS project were thus written in the questionnaire markup language (QML), which is based on SGML - an international standard for data description. The software then enables automatic conversion to CATI, CASI and CADI formats which can be directly applied for interviewing and data entry.
The questionnaire can also be printed in a form that can be used directly for an e-mail survey. Of course, some additional design work may be needed on the layout.
The questionnaire can also be converted directly to the specific format of standard software such as Blaise and InterV. This may be extremely useful when cooperation is needed between agencies / companies using different software. A CATI / CADI / CASI software interpreter for QML has also been developed.
Another important output is the format for WWW surveys: HTML,
Perl and Java scripts. The software automatically creates and
designs a standardized layout of the page on the WWW. An important
point is that QML also supports HTML source code, which means
that multimedia WWW surveys are also possible.
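The idea of writing a question once and generating several outputs from it can be illustrated with a small sketch. The QML syntax is not shown in this paper, so the example does not attempt to reproduce it: the Question structure, its field names and the two rendering functions are hypothetical stand-ins for a single questionnaire definition and two of its target formats.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Question:
    """Hypothetical single-source definition of one closed question."""
    name: str
    text: str
    options: list

def to_html(question):
    """Render the question as radio buttons for a WWW form."""
    lines = [f"<p>{escape(question.text)}</p>"]
    for i, option in enumerate(question.options, start=1):
        lines.append(
            f'<label><input type="radio" name="{escape(question.name)}" '
            f'value="{i}"> {escape(option)}</label><br>'
        )
    return "\n".join(lines)

def to_text(question):
    """Render the same question as plain text for a printed or e-mail form."""
    lines = [question.text]
    for i, option in enumerate(question.options, start=1):
        lines.append(f"  ( ) {i}. {option}")
    return "\n".join(lines)

q = Question("freq", "How often do you use the Internet?",
             ["Daily", "Weekly", "Less often"])
print(to_html(q))
print(to_text(q))
```

Keeping the wording and the answer codes in one place means that every mode presents the same question with the same coding, which is what makes data from different modes directly comparable.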
The software described above is summarized in
the following scheme:
WWW survey development.
4. Integrated Computer Assisted Data Collection
4.1 The technology described above enables considerable flexibility
in designing surveys. In general, the survey process can be
split into two parts: the contact with the respondent and the interviewing
process itself. Both components can, of course, be performed together;
however, there are situations when the two components are separated.
The above scheme enables great flexibility in selecting the most suitable mode / technique to perform the survey. Specifically, the integrated computer assisted data collection approach strongly supports mixed mode surveys. The advantage there is to give the respondent the comfort of selecting the preferred mode and also the time of the interview. Of course, this is reasonable only in the case of a relatively motivated target population.
In the majority of surveys with less salient topics, the aggressive approach based on face-to-face or telephone contact, immediately followed by the interview, may still be the preferred option and should by no means be abandoned in favor of a mixed mode survey.
On the other hand, the approach described above is especially
suitable not only for mixed mode surveys but also for the collection
of administrative data.
5. Statistical Association Membership Survey
Described below is an example: the survey of the members of the Slovenian Statistical Association. It was assumed that statisticians have access to the Internet, although exact data were not known.
The survey proceeded by the following steps:
a) First week: the members received the (snail) mail with the invitation to visit the home page and to answer the WWW survey;
b) Second week: the members received another letter which expressed both thanks and a reminder. This time the non-respondents were, besides the WWW survey, also invited to call a given telephone number and participate in a CATI interview. Trained interviewers were available there, which enabled the cooperation of respondents without access to the Internet. (As an option, computer automated data collection with voice recognition can also be used.)
c) Third week: the telephone calls were made to the non-respondents.
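The three steps above amount to a simple rule for which modes are offered to a member in a given week. The sketch below encodes that rule; the function name and mode labels are hypothetical, and the assumption that only the interviewer-initiated call applies from the third week onwards is made for simplicity, since the paper does not say whether the WWW questionnaire remained open.

```python
def modes_offered(week, has_responded):
    """Return the survey modes offered to a member in a given week."""
    if has_responded:
        return []  # respondents are not contacted again
    if week == 1:
        return ["WWW"]
    if week == 2:
        return ["WWW", "CATI (member calls in)"]
    # Assumption: from week 3 on, only the interviewer-initiated call is used.
    return ["CATI (interviewer calls)"]

for week in (1, 2, 3):
    print(week, modes_offered(week, has_responded=False))
```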
The possibility of a mail survey was also discussed. However, it was established that it would only add complexity that is not needed in this survey. Even without this option, the survey enabled the respondents to select the responding time and the survey mode.