Abraham, Manja D., Hendrien L. Kaal, & Peter D.A. Cohen (2002), Licit and illicit drug use in the Netherlands 2001. Amsterdam: CEDRO/Mets en Schilt. Pp. 35-59.
© Copyright 2002 CEDRO Centrum voor Drugsonderzoek.
Licit and illicit drug use in the Netherlands 2001
Chapter 1: Methodology
This report presents the results of the national drug use prevalence survey amongst the population of the Netherlands aged 12 years and over in 2001. This survey is called the National (drug use) Prevalence Survey 2001 (NPO 2001). In this chapter the history and methodology will be described. The first paragraph gives an overview of the history of the NPO and preceding reports. The remainder of the chapter outlines the methodology with regard to the following subjects: sample, method, fieldwork and adjustments in terms of time and weighting. The final paragraph contains some remarks on statistical issues.
The tradition of drug use surveys amongst the population in the Netherlands dates back to 1970. Between 1970 and 1991, six national population surveys were conducted (Korf 1995). Their results are limited for three reasons. First, given the relatively small sample sizes involved (varying from 910 to 1,123 persons), confidence intervals are wide and the estimates are consequently less reliable. Secondly, rarely used drugs (such as heroin or inhalants) require a larger sample to obtain reasonably reliable estimates (see e.g. Sandwijk 1995). Thirdly, strong conclusions about the development of drug use over time cannot be drawn because of the differences in sampling, age ranges and measurement methods between these surveys.
In 1987, a large size survey on drug use was conducted in Amsterdam. A second, third and fourth Amsterdam survey were conducted in 1990, 1994 and 1997. A parallel study was conducted in Utrecht and Tilburg in 1996. Each of these surveys consisted of approximately 4,000 respondents of 12 years and older sampled from the population registry. The data was collected by face-to-face interviews first with paper and pencil, later by CAPI. In 1997, these surveys were expanded to a national level (NPO 1997). The total 1997 national sample, which included the Amsterdam, as well as Rotterdam, The Hague and Utrecht samples, consisted of almost 22,000 successful responses. Although some fine adjustments have been made over time, from a methodological perspective these survey results can be validly compared from 1987 on.
In 2001, a national follow-up was performed: NPO 2001. The sample protocol was similar to the one used in 1997 and yielded approximately 18,000 respondents. Regretfully, due to low response in general and a lack of interviewers, it was not possible to approach and interview all respondents in the same way (CAPI) as in 1997. For comparison purposes a subset of 3,000 respondents was interviewed face-to-face (CAPI, as in NPO 1997); the remaining 15,000 participants could choose their own preferred method of supplying their data (Multi Method). Details regarding these methodological changes and their consequences will be given below. The conclusion was that results of the early Amsterdam and the later NPO surveys are comparable. Therefore, insight can be given into the dynamics of drug use in the Netherlands from 1997, and in Amsterdam from 1987.
NPO is a nationally representative survey, covering the population of the Netherlands aged 12 and above. The Municipal Population Registry (GBA: Gemeentelijke Basisadministratie Persoonsgegevens), as maintained by all municipalities, served as the sampling frame for the selection of participants, as was the case in NPO 1997. A two-stage stratified sample was drawn from all persons 12 years of age and older. Because the survey was conducted amongst registered persons, homeless persons and illegal residents were not included in the sample. Between 0.5 per cent and 1.0 per cent of all Dutch residents are illegal (CBS 2002). However, high-school drop-outs, omitted in many of the popular school surveys, are included in the sample frame. The sampling was done by Statistics Netherlands, with the exception of the Amsterdam sample, which was drawn by the Municipal Registry itself.
The sample design is generally the same as in 1997 apart from the following differences. First, in the 1997 survey the cities Amsterdam, Rotterdam, The Hague and Utrecht were oversampled. In the 2001 survey only Amsterdam and Rotterdam were assigned their own sub-sample and stratum. Secondly, in the 1997 sample all sub-samples were of equal size, whereas in the 2001 sample the size (and thus allocation) of the sub-sample is proportional to the number of inhabitants (for strata 3-7). Thirdly, in 2001 the age group 12-19 was oversampled, whilst in 1997 this was the age group 12-18. The reason for also oversampling those aged 19 is that reliable estimates can then be given per 2-year age group: 12-13; 14-15; 16-17; 18-19.
A stratified two-stage random probability design was used to select participants. The sample design for NPO 2001 is stratified, meaning that before drawing any sample, the sample frame is classified into seven non-overlapping strata. All municipalities were classified (or stratified) on the basis of their address density as defined by Statistics Netherlands. For a given address, the number of addresses within a radius of 1 kilometre is counted. This is done for all addresses within a given municipality. Finally, these counts are summed and divided by the number of addresses in that municipality (Statline 2002 http://statline.cbs.nl/). To be able to provide detailed coverage of the Amsterdam and Rotterdam population, these cities were both assigned their own sub-sample. This process resulted in seven strata:
As shown by the previous NPO 1997 survey (Abraham et al. 1999) drug use is strongly related to address density. The NPO 2001 sample is organised according to the same stratification criteria in order to see if drug use was still related to density, and if density determined unequal dynamics of drug use over time. Furthermore, stratification increases accuracy of the estimates by reducing the variance of the estimator, an important advantage of stratification.
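The address density measure used for stratification can be sketched as follows. This is a minimal illustration, not the Statistics Netherlands implementation: the function name, the planar coordinates in metres, the exclusion of an address from its own count, and the restriction to addresses inside one municipality are all assumptions made for the example.

```python
import math


def address_density(addresses, radius_m=1000.0):
    """Mean number of other addresses within radius_m of each address.

    addresses: list of (x, y) coordinates in metres for one municipality
    (hypothetical input format). Brute force, O(n^2), fine for a sketch.
    """
    if not addresses:
        return 0.0
    total = 0
    for i, (xi, yi) in enumerate(addresses):
        for j, (xj, yj) in enumerate(addresses):
            # Assumption: an address does not count itself.
            if i != j and math.hypot(xi - xj, yi - yj) <= radius_m:
                total += 1
    # Sum of the per-address counts divided by the number of addresses.
    return total / len(addresses)
```

A municipality would then be assigned to a stratum by comparing this average with the density class boundaries.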
The required number of persons in each sub-sample was calculated to be around 2,500. The size of the pilot samples was determined by the response rates of the 1997 survey (expecting that response rates would not fluctuate that much) but the size of subsequent samples was based on the response rates of the pilot surveys. The sample design and size was chosen to enable analysis of a wide range of demographic variables, down to at least address density level, per age group. Below that level, numbers tend to be too small for statistical purposes.
The selection of persons included in the sample was performed in two stages. At stage one of the selection process, a random sample of municipalities was drawn within each stratum. The total number of municipalities in the Netherlands in 2001 was 504; the sample included 449 of these municipalities. Municipalities in strata 1 to 5 were self-selecting, meaning that all municipalities in those strata were automatically included in the sample. Stratum 1 contained 1 municipality (Amsterdam), stratum 2 contained 1 municipality (Rotterdam), stratum 3 contained 10 municipalities, stratum 4 contained 53 municipalities, and stratum 5 contained 93 municipalities. Address density stratum 6 contained 176 municipalities, which were represented by a random sample of 110 municipalities. Finally, address density stratum 7 contained 170 municipalities, which were represented by 83 municipalities in the sample. Thus, the actual 'two-stage' sampling only applied to strata 6 and 7.
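Stage one can be sketched in a few lines. The sketch below assumes a simple random draw of municipalities within each sampled stratum; the actual draw also took COROP regions and population size into account, which is not modelled here. The function and parameter names are hypothetical.

```python
import random


def draw_municipalities(strata, n_to_draw, seed=None):
    """Stage one: select municipalities per stratum.

    strata: dict mapping stratum number -> list of municipality names.
    n_to_draw: dict mapping stratum number -> number of municipalities
        to draw. Strata missing from this dict are treated as
        self-selecting, i.e. every municipality is included.
    """
    rng = random.Random(seed)
    selected = {}
    for stratum, municipalities in strata.items():
        k = n_to_draw.get(stratum)
        if k is None or k >= len(municipalities):
            # Self-selecting stratum: take all municipalities.
            selected[stratum] = list(municipalities)
        else:
            # Assumption: simple random sample without replacement.
            selected[stratum] = rng.sample(municipalities, k)
    return selected
```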
In order to obtain an even representation of all regions in the sample, the stratification of municipalities followed the division of the country into 40 COROP regions. Each province consists of one or several of these regions. The distribution of the sample over the COROP regions is correlated to the total population size of the municipalities within a COROP region in each specific density stratum.
The number of municipalities in each stratum and the threshold of number of inhabitants beyond which a municipality is automatically drawn depended on the total number of persons to be drawn in each stratum and on the sample size in each of the drawn municipalities (m). In this design the minimum sample size (m) of each municipality is set at 20 persons. The required number of persons in each sub-sample was calculated to be around 2,500. The precise number was proportional to the number of inhabitants of this stratum. In general, it can be said that the precision of the outcomes decreases due to cluster effects. This is the case when the chosen sample size (m) is large, and therefore the number of drawn municipalities is low. Because a relatively large share of municipalities represented those strata where selection of municipalities did take place, there should be limited or almost no cluster effect or design effect.
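The relation between cluster size and precision mentioned above is commonly summarised by the textbook design effect approximation for equally sized clusters. The report does not state a formula, so the expression below is an illustration under that assumption, with $m$ the number of persons drawn per municipality and $\rho$ the intra-cluster correlation of the outcome.

```latex
\mathrm{deff} \approx 1 + (m - 1)\,\rho
```

With the minimum cluster size of $m = 20$ and a purely hypothetical $\rho = 0.02$, this gives $1 + 19 \times 0.02 = 1.38$, i.e. a variance 38 per cent larger than under simple random sampling. A smaller $m$ spread over more municipalities pushes the design effect back towards 1.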
At stage two people were drawn from the selected municipalities, which resulted in one sub-sample for each stratum. To be able to make precise drug use estimates for the age group 12-19 (2-year age groups per stratum), this group was oversampled with a double probability of being selected.
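A minimal sketch of the stage-two draw, assuming the double selection probability is implemented by applying twice the sampling fraction to the 12-19 age group. The report does not describe the mechanism, so the function name, the input format and this implementation choice are assumptions.

```python
import random


def draw_persons(persons, base_fraction, seed=None):
    """Stage two: draw persons, oversampling ages 12-19 by a factor of 2.

    persons: list of (person_id, age) tuples from the registry
        (hypothetical format); persons under 12 are ignored.
    base_fraction: sampling fraction applied to persons aged 20 and over.
    """
    rng = random.Random(seed)
    young = [p for p in persons if 12 <= p[1] <= 19]
    older = [p for p in persons if p[1] >= 20]
    # Twice the base fraction for the young group, capped at 100 per cent.
    n_young = round(len(young) * min(1.0, 2 * base_fraction))
    n_older = round(len(older) * base_fraction)
    return rng.sample(young, n_young) + rng.sample(older, n_older)
```

Because the young group is overrepresented by design, estimates for the whole population need weights that undo this factor.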
All 22,000 respondents in the NPO 1997 were questioned using computer-assisted personal interview (CAPI). With this method, the interviewer questions respondents at their homes, using a laptop computer. Regretfully it was not feasible to repeat the CAPI method for the entire NPO 2001. This had the following reasons:
As an alternative to CAPI a new approach was developed: Multi Method (MM). The basic principle of MM is that respondents get the opportunity to choose how and when they want to participate in the survey. The respondent indicates whether they want to answer the questions on a paper questionnaire (non-assisted paper interview), on their own computer via the Internet, or on a computer disk (floppy disk sent by mail). A non-responding person is re-approached and reminded, with the offer to be interviewed by phone (CATI; computer-assisted telephone interview) or, if the phone number is not listed, is sent a reminder consisting of a questionnaire and diskette by mail. During the pilot surveys, the options CAPI and CATI could be chosen as well. However, only a very small number of people indicated they preferred to be interviewed that way. These choices were later omitted.
Given the strenuous attempts that were always made to keep the data as comparable as possible, it might seem contradictory to approach the 2001 sample using MM. Applied to the same samples, different procedures would be expected to result in different estimates. On top of this, methods are self-selecting; that is, persons who completed the interview using a specific mode were not selected randomly (Aquilino and Lo Sciuto 1990). Therefore any mode effects might be prevalent in some groups and absent in others. Because of self-selection, it is less straightforward to compare data collection methods for mode effects.
To guarantee comparability with the results of NPO 1997 it was considered to be necessary to investigate the impact of effects caused by methodological differences. Therefore, the gross sample is split in two parts (see table 1.1). Approximately 8,000 persons were approached following CAPI procedure and approximately 32,500 persons were addressed according to the MM protocol (protocols are given below in paragraph 1.5). Differences between CAPI and MM results turned out to be small. A more detailed description of the differences between CAPI and MM can be found in chapter 4.
A disadvantage of MM is that its practical realisation is complicated. It puts a lot of pressure on the fieldwork organisation and requires great management skills. Furthermore, with CAPI, all interviews were conducted with the use of a computer. The use of computers minimises routing errors and instantly flags inconsistencies in the answers given. Because MM offers the choice of using a paper questionnaire, routing errors and inconsistencies do occur. Indeed, a large share of the MM interviews consisted of paper questionnaires (64.8 per cent of the MM interviews), creating the data quality problems normally associated with such interviewing.
All interview methods used identical questionnaires that contained questions about the use of various licit and illicit drugs as well as respondent background and lifestyle characteristics. The questionnaire of NPO 2001 was similar to the one used in NPO 1997. Detailed questions were asked about the subject's use of particular drugs, the frequency and intensity of use and age of first use. These questions were asked for a range of substances: tobacco, alcohol, sedatives, hypnotics, cannabis, cocaine, ecstasy, amphetamines, hallucinogens, mushrooms, opiates, inhalants, performance-enhancing substances and so called smart drugs.
Although one of the most economic ways to assess the nature and extent of drug use is by completing confidential national surveys, this method also has its limitations. For example, the data is self-reported. Asking people to report private or personal information, and specifically about drug use, can lead to overestimation, underestimation and/or selective non-responses (Harrison et al. 1995).
Interviewing was organised and carried out by staff of NIPO, the organisation that also carried out the fieldwork for the 1997 national survey and the 1994 Amsterdam survey. Interviewing took place in the first half of 2001, while the pilot interviews started as early as May 2000. Altogether 17,655 people were successfully interviewed out of a gross sample of 40,573.
Because there was no previous experience with the MM approach, a pilot study was performed in order to arrive at the final protocol. After the pilot study it was decided to test a renewed protocol in a second pilot. This means that the exact protocol describing the steps taken to approach the respondent depended on the stage of the fieldwork. The fieldwork is described in three sequential parts: first, the pilot survey, covering the Amsterdam sample; then, an extra pilot survey, covering a supplementary sample in Amsterdam and stratum 5 (all municipalities with average address densities between 1,000 and 1,500); and finally, the main survey, covering the remainder of the sample (strata 2 to 4, 6 and 7). Table 1.2 tabulates the adjustments in the fieldwork for each period.
Pilot 1 Amsterdam
The pilot study was designed to test the completely new fieldwork protocols. Performing a pilot survey has important practical benefits: inconsistencies in the protocol are revealed and can be corrected, the fieldwork organisation can get used to the new way of interviewing, and in this case, the size of the gross sample can be recalculated on the basis of the response rates. Amsterdam was chosen as the site of the pilot because it is the most difficult area, due to low willingness to respond and interview problems (NPO 1997).
The target was to collect data on 1,000 respondents with CAPI and 3,000 persons with MM in Amsterdam. This number of observations would be sufficient to make a sound methodological comparison between CAPI 2001 and MM 2001 results in Amsterdam and between CAPI 1997 and CAPI 2001 results in Amsterdam. By interviewing 1,000 persons CAPI, the Amsterdam trend-data series was safeguarded, even if it would turn out that MM did not deliver satisfactory results. In Amsterdam a sample of roughly 3,200 persons was approached following the CAPI protocol, approximately 8,200 persons following the MM protocol. Both gross samples were randomly drawn from the Amsterdam Municipal Registry.
CAPI protocol - Respondents selected in the CAPI sample were approached in the same way as in NPO 1997. They received a letter from the University of Amsterdam, sent by the fieldwork organisation, that requested their co-operation in the survey and announced the visit of an interviewer to their home address. The interviewer questioned the respondent, guided by a laptop computer. If the selected person was not found at home, the interviewer revisited the address at least three times.
MM protocol - The first MM protocol, used in the first Amsterdam pilot, was based on the idea that respondents should be able to choose their preferred way of participation in the survey. Therefore the maximum number of options was offered as follows:
First MM protocol first pilot
Persons in the MM sample received a letter from the University of Amsterdam inviting them to participate in the survey. Respondents could indicate their preferred mode of being interviewed by
Respondents in the first pilot were offered an incentive of NLG 10.- (€ 4.54) in return for participation.
Respondents could choose from the following five interview methods:
After two weeks, employees of the fieldwork organisation would reply to respondents' requests; paper questionnaires and floppy disks were sent out, phone interviews were completed and appointments were made for the personal interviews. Respondents who received a paper questionnaire or a floppy disk were asked to return the completed questionnaire within another two weeks.
Passive persons, i.e. those who did not take action (returning the prepaid address card or making a free phone call to the fieldwork organisation) within two weeks of receiving the invitation letter, were approached by phone if their phone number was listed. In cases where the telephone number was not found, they were reminded by mail. Active persons, i.e. those who did seek contact by indicating how they wanted to participate, but whose contact did not result in a completed interview within two weeks after receipt of their mode preference, were also approached by phone or mail. Persons who explicitly refused to participate were not contacted again.
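The reminder rules for passive and active persons can be summarised as a small decision function. This is a sketch of the rules as described above, not fieldwork software; the status labels, parameter names and return values are hypothetical.

```python
def follow_up_action(status, days_waiting, phone_listed, wait_days=14):
    """Return the reminder channel for one sampled person, or None.

    status: 'passive' (no reaction to the invitation), 'active'
        (mode preference given, no completed interview yet),
        'completed' or 'refused'.
    days_waiting: days since the invitation (passive) or since receipt
        of the mode preference (active).
    phone_listed: True if a phone number could be found.
    """
    if status in ("refused", "completed"):
        # Explicit refusers and finished interviews are left alone.
        return None
    if days_waiting < wait_days:
        # Still within the two-week waiting period.
        return None
    return "phone" if phone_listed else "mail"
```

Setting `wait_days=21` would reflect the three-week interval used later in the fieldwork.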
A slightly different protocol was used for Moroccan and Turkish persons. People with Moroccan ethnicity form 7.8 per cent of the Amsterdam population and those with Turkish ethnicity form 4.8 per cent, together 12.5 per cent (January 1st 2001 (O&S, 2001 http://www.onderzoek-en-statistiek.amsterdam.nl/)). The NPO 1997 survey showed that Moroccan and Turkish people were less willing to participate than other Amsterdam citizens. To increase the probability that Moroccan and Turkish people were represented in the survey, the letter of invitation was written in Dutch and in Arabic or Turkish. Furthermore, the reminder was not given by phone or in writing; instead, an interviewer visited the home to conduct a face-to-face interview. A Moroccan or Turkish interviewer revisited respondents if language or cultural problems occurred during the interview.
Evaluating the pilot study, it was found that:
After the first pilot, the response rate was found to be low (36 per cent). Response rates possibly could be improved by increasing the incentive, by simplifying the method, and by clarifying the invitation letter (with regard to contents and/or text).
To gain insight into the initial perception of the invitation letter and the value of the incentive, a small-scale qualitative survey was conducted by means of focus interviews. It was hypothesised that respondents were overwhelmed by the number of possibilities and therefore were not able to choose. To investigate this 'embarras du choix', 18 persons were randomly picked from the street and invited to the offices of the fieldwork organisation for a fee of NLG 100.- (€ 45.38) per person. In a semi-structured open interview, questions were asked on the following subjects: How do persons choose and prioritise handling their mail? Does the envelope of the invitation letter look inviting enough to open? How do they perceive the letter of invitation? Should there be a financial incentive and how high should this incentive be?
These interviews were held in a closed, video-monitored room. All 18 discussions were observed via video by two members of the CEDRO staff and two members of the fieldwork organisation staff. Specialised focus group personnel conducted the 18 interviews.
It was found that the number of choices did not confuse the respondents. However, the tone of writing and the lay-out of the invitation required some adjustment. For example, the word 'incentive' and the value of this incentive needed to be visible at the moment the letter is opened. Therefore this paragraph of text was placed on the top half of the A4 letter. The evaluation group considered a reward encouraging. With few exceptions, they were willing to participate for an amount of NLG 25.- (€ 11.34), which they considered to be a fair reward for a 20-minute questionnaire. The respondents in this small survey were invited to comment on a higher incentive, for instance NLG 50.-. It became clear from most answers that incentives higher than NLG 25.- were looked at with some suspicion. The sender of the invitation letter, the University of Amsterdam, was perceived to be serious enough to consider participation.
Another method of evaluating the value of the incentive was embedded in an ongoing NIPO panel survey titled capi@home (Foekema 2001). This study confirmed NLG 25.- as the optimum level of the incentive.
Pilot 2 Supplementary pilot Amsterdam and stratum 5
The first pilot study had to be supplemented by a second one, since the initial pilot did not result in a sufficient number of responses. A supplementary sample was drawn in Amsterdam to obtain the required number of interviews in order to be able to compare the CAPI approach to the MM approach and to evaluate the improvements made since the first pilot. Based on the knowledge of the NPO 1997 CAPI survey, it could be expected that response rates would be higher in less urban areas than in more urban areas. However, it was not certain that the MM protocol would give the same response differentiation. So, to collect knowledge about the improved MM protocol, it was decided to enrich the results of the second pilot by including stratum 5, consisting of all municipalities with address densities between 1,000 and 1,500. Doing so would show how the MM protocol was received in another address density area. Moreover, incorporating another address density stratum would make it possible to measure the response rate and adjust, if needed, the gross sample size.
The purpose of the supplementary pilot study was also to create an experiment on incentive perception. The aim was to interview 200 persons with CAPI and 800 persons with MM in Amsterdam and 350 persons with CAPI and 2,000 persons with MM in stratum 5. A split-sample test was conducted to find out whether the extra responses outweigh the extra costs involved. Half the sample was offered an incentive of NLG 10.-, the other half an incentive of NLG 25.-. A sample of roughly 1,600 persons was approached following the CAPI protocol, and approximately 3,800 persons following the MM protocol.
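A split-sample test of this kind can be sketched as a random division of the sample into two incentive groups, followed by a comparison of their response rates. The report does not say which statistic was used, so the two-proportion z statistic below is an assumption, as are all function and parameter names.

```python
import math
import random


def split_sample(person_ids, seed=None):
    """Randomly divide the sample into two halves (one per incentive)."""
    rng = random.Random(seed)
    shuffled = list(person_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def compare_response(resp_low, n_low, resp_high, n_high):
    """Return (difference in response rate, two-proportion z statistic).

    resp_*: number of completed interviews in each incentive group.
    n_*: number of persons approached in each group.
    Assumes the pooled rate is strictly between 0 and 1.
    """
    p_low, p_high = resp_low / n_low, resp_high / n_high
    pooled = (resp_low + resp_high) / (n_low + n_high)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_low + 1 / n_high))
    return p_high - p_low, (p_high - p_low) / se
```

Whether a higher response rate justifies the higher incentive would additionally require the cost per completed interview in each group, which is not modelled here.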
CAPI - The CAPI guidelines remained unchanged, as is described for pilot 1.
MM protocol - The second MM approach was almost identical to the protocol in pilot 1, with the following adjustments: the incentive was NLG 10.- for half of the MM sample and NLG 25.- for the other half; the period of time between first invitation and reminder was extended to three weeks; respondents could no longer phone the fieldwork organisation and complete the questionnaire instantly. Instead, there was the possibility to be called at a preferred time (which could be made known by phone or mail).
Based on the results of the second pilot study, the following observations and adjustments could be made.
After running the pilots in the year 2000, the rest of the survey was conducted in 2001 through September. Selected persons living in strata 2 to 4, 6, or 7 were assigned to the CAPI or MM sample and approached according to the following guidelines.
CAPI - The CAPI guidelines remained unchanged, as is described in pilot 1.
MM - The final MM protocol for the major part of the survey (strata 2 to 4, 6 and 7) was as follows: Persons in the MM sample received an invitation to participate in the survey by mail. Future active respondents could participate in their preferred mode and favoured point of time by:
Passive persons, who did not undertake action within three weeks, were re-approached by phone provided their phone number was listed or otherwise by mail. Persons who explicitly refused to participate were not contacted again.
Evaluating the main survey, it appeared that:
It can be concluded that the MM approach as it is in its final form, is suitable for a survey of this size. Therefore it is a feasible, although extremely complex, alternative where the conventional CAPI approach does not seem to work.
1.6 Data weighting
To provide national estimates it is necessary to adjust response data for differences in the selection probabilities of the stratified two-stage sample design. Weights were produced to balance the oversampling of two groups of persons: 1) persons aged 12 to 19; and 2) persons living in Amsterdam and Rotterdam. Weights were calculated without making the distinction between the CAPI and MM approach. According to the methodological analysis it was decided that CAPI and MM samples could together be treated as one comprehensive sample in the weighting procedure (see chapter 4).
This weighting procedure was based on post-stratification on the level of address density, age, gender and marital status, according to the population registry (GBA). Variables were combined: within each address density level, persons were classified in age groups, and within these age groups persons were classified according to a combination of gender and marital status. Each individual weight was calculated with the aim of achieving complete correspondence between the response and the population in the distributions of the characteristics mentioned. An important additional advantage is that results are corrected for non-response errors. The drawback of this approach is that it becomes less straightforward to perform statistical tests, because weighting and stratification affect the computation and interpretation of statistical significance. Weights can be found in Appendix D.
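The principle of post-stratification can be sketched as follows: each respondent receives the ratio of the population share of their cell to the sample share of that cell. This is a minimal illustration assuming every population cell contains at least one respondent; the cell keys and function name are hypothetical, and the actual weights are those listed in Appendix D.

```python
from collections import Counter


def poststratification_weights(respondent_cells, population_counts):
    """Compute one weight per post-stratification cell.

    respondent_cells: list with one cell key per respondent, e.g. a tuple
        (address density level, age group, gender and marital status).
    population_counts: dict mapping cell key -> registry population count.
    Returns a dict mapping cell key -> weight; weights average 1 over the
    respondents when every population cell has respondents.
    """
    n = len(respondent_cells)
    population_total = sum(population_counts.values())
    sample_counts = Counter(respondent_cells)
    weights = {}
    for cell, n_cell in sample_counts.items():
        population_share = population_counts[cell] / population_total
        sample_share = n_cell / n
        # Over-represented cells get weights below 1, and vice versa.
        weights[cell] = population_share / sample_share
    return weights
```

Applied to an oversampled group such as the 12-19 year olds, this yields weights below 1, which restores the group's population share in national estimates.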
1.7 Statistical notes
The correctness or accuracy of the measurement can be classified in two groups:
Statistical tests quantify the reliability and not the validity of the measurement. Note that reliability is a necessary but not sufficient condition for validity.
The validity of the NPO 2001 outcomes is possibly affected by non-response. The response group might be self-selected: for example, people who use drugs may be more interested in a drug use survey and therefore more likely to participate. Chapter 3 deals with this issue of non-response. Furthermore, validity is possibly affected by the use of different kinds of data collection. As it was found that different kinds of data collection led to different answer patterns (such as item missing), one cannot assume that the answers are equally distributed in the total sample. The use of questionnaires also generates validity effects. Another example of a systematic error is 'social desirability' (Swanborn 1994). However, it seems unlikely that social desirability is an important issue in the NPO surveys (Winter et al. 1999; Abraham and Jol 2001).
The term reliability embraces all kinds of coincidental factors that influence the outcomes of a survey, such as sampling, the mood and condition of the respondent, the presence of a third person during the interview, the patience of the interviewer and interviewee, copious last month drug use due to a party, and countless other factors. In the case of CAPI, the cluster effect caused by interviewers is another example. Unlike validity, reliability is commonly expressed in terms of statistical tests and their p-values. The p-value is the probability of finding a result at least as strong as the one observed in the sample if no such relation existed in the population. For example, a p-value of .05 indicates that, if there were no relation between the variables in the population, a relation at least as strong as the one found would arise by coincidence in 5 per cent of samples.
However, one has to be cautious when interpreting significance levels. First, it is important to emphasise that significance has nothing to do with validity. Results can be highly reliable regardless of their validity. Secondly, the larger the sample, the higher the statistical power of the test. In very large samples, such as the NPO sample, even very small relationships between variables, or small differences between the same variables over time, will be significant. Therefore, if significance is found, this does not have to mean that the differences are interesting in reality. Thirdly, as said before, statistical tests are more complex to apply due to the sampling and weighting procedure. Weighting influences the significance in two ways: it reduces the distortion (bias), but sometimes leads to higher variance and therefore decreases the reliability. Generally speaking, the larger the impact of the weights (i.e. the more the weights deviate from 1), the less reliable the results. In practice, the total effect is often an increased reliability of the estimates (Kish 1965). In most tables the number of unweighted cases (n) and the estimates in (weighted) percentages are given. The unweighted (n) serves as an indicator of the reliability and allows the reader to better relate to the data. The larger the number of cases (n), the more reliable the estimate. And fourthly, the sample is not a simple random sample and hence does not meet the assumptions of simple tests.
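The statement that weights deviating from 1 reduce reliability can be made concrete with Kish's approximation of the effective sample size. The report does not say that this approximation was used, so the sketch below is only an illustration; the function names are hypothetical.

```python
def kish_effective_n(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


def weighting_design_effect(weights):
    """Variance inflation due to unequal weights: n / effective n."""
    return len(weights) / kish_effective_n(weights)
```

With equal weights the effective sample size equals the number of cases. For 100 cases of which half carry weight 0.5 and half carry weight 1.5, it drops to 80, a design effect of 1.25.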
The 95 per cent confidence interval for the drug use and corresponding population estimates was calculated on the basis of the logit transformation. Because the drug use proportions in the survey are often small, the logit transformation has been used for this report to yield asymmetric interval boundaries. These asymmetric intervals are more balanced with respect to the probability that the interval is above or below the true population value than is the case for standard symmetric confidence intervals. This method of computing confidence intervals is applied in, amongst others, NPO 1997 and the United States NHSDA survey (SAMHSA 1997). Intervals given for the composed samples (e.g. the total of all samples used to give national estimates) are weighted averages of the seven separate confidence intervals.
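A minimal sketch of a logit-based confidence interval, assuming the standard construction in which the standard error of the logit is SE(p) / (p(1 - p)). The standard error of p is taken as an input, since under this design it depends on the weighting and stratification; the function name is hypothetical.

```python
import math


def logit_confidence_interval(p, se_p, z=1.96):
    """Asymmetric confidence interval for a proportion via the logit.

    p: estimated proportion, strictly between 0 and 1.
    se_p: standard error of p (design-based, supplied by the caller).
    z: normal quantile, 1.96 for a 95 per cent interval.
    """
    logit = math.log(p / (1 - p))
    se_logit = se_p / (p * (1 - p))
    lower_logit = logit - z * se_logit
    upper_logit = logit + z * se_logit
    # Transform the interval bounds back to the proportion scale.
    lower = 1 / (1 + math.exp(-lower_logit))
    upper = 1 / (1 + math.exp(-upper_logit))
    return lower, upper
```

For p = 0.05 with a standard error of 0.01 this gives roughly 0.034 to 0.074: the upper bound lies further from the estimate than the lower bound, and the interval can never extend below 0.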
The logit transformation of the 95 per cent interval of the proportion p (p lower, p upper) was calculated as follows.
First, the 95 per cent logit interval was calculated, given by the logit transformation of p (L), and the standard error of L:
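The formula for this step is not reproduced here. Under the assumption that the standard logit construction was followed, with $SE(p)$ the standard error of the estimated proportion and $(A, B)$ introduced as labels for the bounds of the logit interval, it would read:

```latex
L = \ln\!\left(\frac{p}{1-p}\right), \qquad
SE(L) = \frac{SE(p)}{p\,(1-p)}, \qquad
(A,\; B) = \bigl(L - 1.96\,SE(L),\;\; L + 1.96\,SE(L)\bigr)
```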
Then, the 95 per cent confidence interval was calculated for the proportion p as:
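The formula for this step is not reproduced here either. Assuming the standard back-transformation, and writing $A = L - 1.96\,SE(L)$ and $B = L + 1.96\,SE(L)$ for the lower and upper bounds of the logit interval, with $L = \ln\bigl(p/(1-p)\bigr)$, it would read:

```latex
p_{\text{lower}} = \frac{1}{1 + e^{-A}}, \qquad
p_{\text{upper}} = \frac{1}{1 + e^{-B}}
```

Because the inverse logit is non-linear, the resulting bounds are asymmetric around $p$ whenever $p \neq 0.5$.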
In most tables the number of unweighted cases (n) and the estimates in (weighted) percentages are given. The unweighted n serves as an indicator of the design-effect; it shows on how many (or few) observations the estimate is based and thus allows the reader to better relate to the data.
Due to the small number of persons that use particular substances (e.g. heroin), results cannot always be generalised for the population. By drawing large samples, this problem is minimised but not solved. The following rule of thumb is applied: an estimate is considered to be unreliable if the sub-sample group is smaller than 50. In tables these are noted with a hyphen (-).
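The rule of thumb can be expressed as a small formatting helper for table cells. This is a sketch only; the function name and the one-decimal formatting are assumptions.

```python
def table_cell(estimate_pct, n_unweighted, min_n=50):
    """Format a weighted percentage for a table.

    Estimates based on fewer than min_n unweighted cases are considered
    unreliable and are replaced by a hyphen.
    """
    if n_unweighted < min_n:
        return "-"
    return f"{estimate_pct:.1f}"
```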
The following symbols are used in the tables:
. Data not available
Last update: May 25, 2016