The HUNT4 study: the validity of questionnaire-based diagnoses

Background Questionnaire-based headache diagnoses should be validated against diagnoses made by the gold standard, which is personal interview by a headache expert. The diagnostic algorithm with the best diagnostic accuracy should be used when later analysing the data. Methods The Nord-Trøndelag Health Study (HUNT4) was performed between 2017 and 2019. Among HUNT4 participants, a total of 232 (19.3%) out of 1201 randomly invited were interviewed by a headache expert to assess the sensitivity, specificity and kappa value of the questionnaire-based headache diagnoses. Results The median interval between answering the headache questions and the validation interview was 60 days (95% CI 56–62 days). The best agreements were found for self-reported lifetime migraine (sensitivity of 59%, specificity of 99%, and a kappa statistic of 0.65, 95% CI 0.55–0.75), self-reported active migraine (sensitivity of 50%, specificity of 97%, and a kappa statistic of 0.55, 95% 0.39–0.71), liberal criteria of migraine (sensitivity of 64%, specificity of 93%, and a kappa statistic of 0.58, 95% CI 0.43–0.73) and ICDH3-based migraine ≥1 days/month (sensitivity of 50%, specificity of 94%, and a kappa statistic of 0.49, 95% CI 0.30–0.68). For headache suffering ≥1 days/month a sensitivity of 90%, specificity 80%, and a kappa statistic of 0.55, 95% CI 0.41–0-69 were found. For tension-type headache (TTH) ≥ 1 days/month the agreement was 0.33 (95% CI 0.17–0.49). Conclusion The HUNT4 questionnaire is a valid tool for identifying persons with lifetime migraine, self-reported active migraine and active migraine applying liberal modified criteria. The agreement for TTH was fair.


Introduction
A careful history taken by a face-to-face interview by a headache expert is the "gold standard" for making a valid headache diagnosis [1]. However, a self-administrated headache questionnaire is less time-consuming and costly, and commonly used in large-scale populationbased studies. With questionnaire-based diagnoses, it is necessary to validate against the gold standard method [1]. Accordingly, during the last decade many validation studies have been published e.g. [2][3][4][5][6][7][8][9][10][11][12].
The headache part of the fourth Nord-Trøndelag Health Study (HUNT4) performed in 2017-2019 is mainly a replication of cross-sectional surveys performed in1995-1997 (HUNT2) and 2006-2008 [13,14]. Ideally, questionnaire-based headache diagnoses should be validated in a random selected sub-sample of participants during the period the survey is performed [1]. Even if a questionnaire can be shown to be valid at a particular time and in a particular area, it is not necessarily valid in other regions or at different time. Thus, in order to validate questionnaire-based headache diagnoses, a new question about lifetime migraine included, a clinical interview performed by headache experts was performed in a sub-sample of participants in HUNT4 [15]. The aim of the present study was to assess the sensitivity and specificity of questionnaire-based headache diagnoses using a personal interview as a gold standard.

Materials and methods
The fourth Nord-Trøndelag health survey (HUNT4) All inhabitants aged 20 years or more in Nord-Trøndelag county of Norway were invited to participate in in the fourth Nord-Trøndelag Health Survey (HUNT4) in the period between August 20th 2017 and February 28th 2019. The survey included a wide number of medical topics in two extensive questionnaires (Q1 and Q2). Based on preliminary data from HUNT4, 103, 800 adults received a written invitation, whereof 56,000 (54%) individuals participated and answered Q1, and 42, 600 (41%) answered also Q2 (Fig. 1).

Study population of the validation study
The present validation study is part of a subproject of HUNT4 mainly focusing on sleep disorders including an invitation to ambulatory polysomnography (PSG) and neurophysiological measurements [15]. The validation study was performed in Stjørdal community in the last week of November 2017, but some interviews were carried out in January 2018. A random sample of adults living in Stjørdal who had participated in HUNT4 and answered Q1 and Q2 received a written invitation. The invitation letter informed about an initial interview focusing on sleep and pain, and headache was not mentioned in the invitation letter [15].

Sample size calculation
Headache diagnoses were not a part of the sample size calculation. The main goal of the sleep disorder subproject of HUNT4 was to perform at least 200 PSGs. In a previous HUNT3 PSG study, less than 17% of invited participated [16]. Thus, based on this experience, 1201 postal written invitations were sent randomly to HUNT4 participants (Fig. 1). No stratification by sex or age was performed. However, based on the HUNT3 PSG study [16], we anticipated that the participation would be higher among women than men, highest in the age group 60-69, and lowest in the age group 20-29. Based on previous headache prevalence data [13][14][15], a sample size of 200 was considered acceptable for the evaluation of migraine and tension-type headache (TTH) [17], but suboptimal considering headache > 15 days per month.

Headache questionnaire in HUNT4
The Q1 included a question about lifetime migraine ( Table 1). The Q2 included a total of 12 headache questions ( Table 1) that were designed to determine whether the person suffered from headache and fulfilled the ICHD-3 criteria [18] for migraine or tension-type headache (TTH). The diagnoses were mutually exclusive and allowed only one diagnosis to be made in each participant. The screening question was "Have you suffered from headache during the last year?", and only individuals who answered "yes" were asked to fill in the other headache questions. Headache sufferers were further asked to report how their headaches usually were regarding pain intensity, attack duration, and accompanying symptoms ( Table 1). As to attack-duration the participants were not instructed to report the duration of "untreated attacks", partly because some individuals always use attack medication for their headaches. In another part of the first questionnaire, the individuals were also asked to state the consumption of over-the-counter (OCT) drugs (painkillers) because of headache during the last month, with four response options: Seldom or never, 1-3 times per week, 4-6 times per week, or daily. The participants' responses to the questionnaire in HUNT4 were unknown to the interviewers at the validation study.
The main objective of the present study was to evaluate the validity of questionnaire-based information.
The diagnosis of migraine in the HUNT4 questionnaire was made according to three different sets of criteria listed in Table 2. The restrictive migraine criteria set was based on ICHD-3 criteria [18], except that duration less than 4 h was accepted because it was not specifically asked for untreated headache attacks in Q2. We have previously reported that asking whether individuals had suffered from headache during the last year yielded high positive predictive value and high specificity for identifying individuals with migraine > 1 day/month [19]. Thus, because this restrictive screening question was used, the validity of migraine > 1 day/month was evaluated. For migraine with aura (MA) only visual disturbance was included in the questionnaire criteria set. Selfreported diagnosis of migraine was also considered separately because high specificity and positive predictive value of this statement were found in HUNT2 and HUNT3 [20,21]. In accordance with the HUNT2 and HUNT3 study, self-reported migraine was integrated in the liberal migraine criteria set also used in HUNT3. The questionnaire-based diagnosis of TTH was based on the ICHD-3 criteria [18]. We have previously reported that very few subjects with infrequent TTH consider themselves as headache sufferers [19], and that a high positive predictive value and high specificity for identifying individuals with TTH > 1 day/month were found among headache sufferers [19]. Because of these findings, the validity of TTH > 1 day/month was evaluated.
To fulfil the questionnaire-based diagnosis of medication-overuse headache (MOH) the participants had to report headache > 15 days per month (i.e. chronic  Fulfilled the restrictive migraine criteria, and reported visual disturbance prior to or during headache III. Liberal migraine criteria (definite and probable migraine) Self-reported migraine, or fulfilled the restrictive migraine criteria IV. Lifetime migraine Answered "yes" to the question "Have you ever had migraine?" a ) Headache duration < 4 h also accepted because the participant were not asked for duration of untreated attacks in second questionnaire headache) and use of analgesics 4 times per week or more during the last month.

Validation interview
A semi-structured interview was performed by five medical doctors (three neurologists) with special interest and competence in headache. Initially, all subjects were asked the questions "Have you ever had migraine?". Those who answered "yes" were asked for age of onset. Later in the interview, we repeated the question regarding age of onset for each type of headache. In the interview, we focused on those who answered "yes" to the question "Have you had a headache during the last 12 months?". We also asked an additional question identical to the screening question in Q2: "Have you suffered from headache during the last 12 months?" Individuals who reported headache during the past year were asked about frequency (average number of days per month during the last year), time span since last headache, intensity, location, aura symptoms, other migraine and cluster headache features, and use of medication. All diagnoses were based on ICHD-3 [18]. Up to three different headache types were diagnosed in those with active headache. Subjects with medication overuse headache (MOH) were also categorized according to their primary headache diagnosis.

Ethics
This study was approved by the Regional Committee for Ethics in Medical Research (#2018/2422/Rek Midt). The participants have given written informed consent. The HUNT4 project was also approved by the Norwegian Data Inspectorate.

Statistics
Sensitivity, specificity and Cohen's kappa statistics with 95% CI were calculated for different headache diagnoses based on information in the questionnaires using headache experts' headache diagnoses as a gold standard. A kappa value of ≤0.20 is considered as poor, between 0.21-0.40 as fair, between 0.41-0.60 as moderate, between 0.61-0.80 as good, and between 0.81-1.00 as very good [22].

Participation rate in the validation interview
Among the 1201 invited participants, 239 (19.9%) agreed to participate (Fig. 1). Among these 239, seven did not attend to the interview because they were out of town, had a sick husband, were busy at work, or they had forgotten the invitation. Thus, among 1201 invited, 232 persons (19.3%) participated in the headache interview, more women (n = 152) than men (n = 80). The mean age was 58.4 years (range 22-89), and the majority (70%) of participants were in the age group 50-79 years.
Response rate to the headache questionnaire A total of 223 (96%) had answered the question about lifetime prevalence of migraine in Q1, whereas 224 (97%) had answered the first screening headache question in Q2.

Time interval between response on questionnaire and interview
The mean interval between answering the Q2 and the validation interview was 60 days (95% CI 56-62 days; median 60 days, range 14-110 days).

Validity of headache diagnoses
The sensitivity and specificity of different questionnairebased headache diagnosis and agreement with kappa statistics between questionnaire and interview are shown in Table 3. Several diagnostic subtypes were evaluated, and the highest figures were found for lifetime migraine (kappa 0.65, 95% CI 0.55-0.75) and migraine during last year using liberal criteria (kappa 0.58, 95% CI 0.43-0.73).
Overall, the questionnaire-based diagnoses of active migraine using restrictive criteria (MA or MO or both) had a sensitivity of 48%, a specificity of 93%, and a kappa statistic of 0.45. Considering those with migraine > 1 days per month, the figures changed to 50%, 94%, and 0.49. Correspondingly, the sensitivity, specificity, and kappa statistics of headache suffering > 1 days per month was 90%, 80% and 0.55 (95% CI 0.41-0.69), and for TTH > 1 days per month 45%, 85%, and 0.33 (Table 3).

Discussion
The agreement between validation interview and questionnaire-based diagnoses for lifetime migraine, selfreported active migraine and liberal criteria of migraine was good, whereas the agreement for TTH was fair [22].

Comparison with other population-based studies
In previous studies comparing questionnaire-based and interview-based diagnosis, highly variable sensitivity and specificity for migraine and TH have been reported [12,23]. However, because different methodological designs have been used in previous published validation studies, direct comparison should be done with caution. One important difference between studies is the wording of the initial screening question, which may vary between a neutral screening question ("Have you had a headache during the last year"), to a more restricted screening question "Have you suffered from headache during the last year" as used in the HUNT surveys [19]. Furthermore, the total number of headache questions included may vary widely. In the present HUNT4 study, a large number of other health related questions were included, allowing only 12 headache questions in the second questionnaire. In comparison, more than 65 headache questions are included in the HARDSHIP questionnaire, which is used in many recently published epidemiological studies [23]. For lifetime self-reported migraine, we found a good agreement between interview and questionnaire-based diagnoses (kappa value 0.65). Correspondingly, good agreement for lifetime migraine has previously been reported in two studies from Denmark (kappa values of 0.62 and 0.77, respectively) [24,25].
In the validation of questionnaire-based diagnoses of active migraine different methodological strategies have been used of recruiting participants. Typically, better agreements have been reported in validation studies on migraine patients recruited from specialist practice (e.g. [5][6][7][8][9][10]26]) than in studies recruiting participants from the general population e.g. [2-4, 20, 21, 27]. Furthermore, a better agreement may be found when the interview is performed directly after filling in the questionnaire e.g. [5,6,[8][9][10] than when there is a time interval of several weeks or month e.g. [2,4,20,21]. In general, with very short time interval the participants may recall their answers in the questionnaire, whereas with a long-time interval the headache condition could have changed during this period, reducing the agreement between responses in the questionnaire and clinical interview. In the guidelines it is stated that reinterviewing should be no more than 1 month after the questionnaire diagnosis [1]. In the present study, the mean time interval between questionnaire response and clinical interview was 2 months. This was not optimal, but mainly caused by an administrative delay, since the invitation letter was sent by postal mail performed by the HUNT administration instead of by telephone as done in HUNT3 [21]. The vast majority of participants were interviewed in the last week of November 2017, but 27 participants who could not attend this week had an interview approximately three weeks later. Among these, it is even more likely that the headache condition could have changed during this period. On the other hand, during the interview none of the participants reported changes in headache treatment during the last three months.
We have previously performed a similar validation study among HUNT3 participants [21]. In the present study a lower agreement between the questionnaire and clinical interview was found with respect the status  [21]. The better results in HUNT3 could at least partly be explained by the fact that the time span between the questionnaire and the validation study was shorter in HUNT3 (median of 45 days) [21] than in the HUNT4 study (median 60 days). In a reliability study from UK it was clearly demonstrated that the agreement for number of headache days was low after 1 month [28].
In the present study the agreement was lower for TTH than for migraine, and similar tendency has also been reported in other population-based studies [2][3][4][5]27]. A main problem of studies based on selfadministrated questionnaires is to correctly diagnose patients with the co-existence of two or more headaches, usually migraine and TTH. Some respondents with migraine may not keep the different subtypes apart when answering the questions. It may also be that many patients have somewhat atypical migraine (probable migraine), and in these patients the distinction between migraine and TTH will be difficult. Questions using the term "usually" regarding features of headache may not be ideal if one tries to make the respondent differentiate between different subtypes of headache.
Thus, the questionnaire was not optimal for estimating prevalence of TTH. On the other hand, high specificity (> 90) for the questionnaire-based diagnosis of migraine makes the questionnaire a valid tool for estimating migraine prevalence and to identify a population of individuals with migraine suitable for genetic studies.

Strengths and limitations of the study
The major strength of this study was the populationbased design inviting a random sample to a face-to-face interview performed by headache experts. The invitation letter did not mention that a detailed headache interview would be performed, hence a selective participation of headache patients is less likely. The previous validation study in HUNT3 was mainly performed in a different study area. Most likely, none of the participants in the present study had participated in the previous validation studies [20,21]. Several study limitations should also be considered. The present study is a minor subproject of HUNT4 that mainly focused on sleep disorders, and participants with sleep problems were more likely to participate. Therefore, the present participants had much more insomnia than previously reported in an unselected general population (33.2% fulfilled the DSM-V diagnosis of insomnia vs. 7.9% in HUNT3 [16,29]. The low participation rate of 19.3% could possibly partly be explained by the supplementary invitation to time-consuming ambulatory PSG and neurophysiological measurements, although the PSG-study was optional. Furthermore, most full-time workers could not attend interviews during daytime and we regrettably had no time for afternoon appointments. Finally, the low participation rate may also partly be explained by the fact that the invitation was sent by postal mail instead of an invitation directly by telephone as done in HUNT3 with an overall participation rate of 53% [21]. It should also be highlighted that 66% of the participants were women, and 70% of participants were in the age group between 50 and 79 years. The impact of the high proportion of women and elderly is difficult to interpret. However, the agreement in the present study may have been reduced because older individuals may be more likely to have change in headache condition during time compared to the younger participants.

Conclusion
The HUNT4 questionnaire is a valid tool for diagnosing patients with lifetime migraine and active migraine applying the liberal criteria. The fair agreement for TTH makes the questionnaire-based diagnoses suboptimal for determining the prevalence of TTH in the population. However, the high specificity of the questionnaire-based diagnosis of several headache types, in particular for migraine with aura, makes the questionnaire a valid tool for diagnosing patients with migraine for genetic studies.