The applicability of web-based solutions in headache epidemiology research

Background: Epidemiological research on headache is a vital but resource-consuming prerequisite for evidence-based development in the field. The rapid evolution of information technology may provide new opportunities for population-based surveys. The aim of this study was to evaluate the applicability of web-based solutions in epidemiological studies of primary headaches.
Methods: An online survey was conducted among 20-64-year-old Estonian citizens using a previously validated headache questionnaire. Participants were reached through the most popular portals and e-mail domains to achieve maximum coverage of the Estonian digital community. The resulting one-year headache prevalences were compared to those acquired in parallel from a population-based cross-sectional person-to-person study in Estonia.
Results: A total of 5708 entries were made by 5347 participants in the online study. Of the participants, 3896 (72.9%) had no headache, 1436 (26.8%) had only one and 15 (0.3%) had more than one type of headache. The study sample demographics were statistically significantly different from those of the Estonian population, and the prevalences were adjusted by age, gender, education and habitat. The proportion of headache sufferers was smaller in the online study sample (23.1% vs 41.0% in the population-based parallel person-to-person study). Among the headache sufferers, the proportions of different headache diagnoses were similar across the two studies, with the exceptions of episodic migraine and episodic tension-type headache. There were fewer migraine and more tension-type headache sufferers in the online study sample.
Discussion: This is the first study addressing the applicability of web-based solutions in large headache-related epidemiological studies. The online approach offers much faster data collection and larger samples, has mechanisms for avoiding data contamination, and distinguishes the proportions of most primary headache disorders among headache sufferers. However, the present online survey was significantly biased towards people without headache, leading to underestimation of headache prevalence. This stems from shortcomings related to the methods of sampling, access and engagement.
Conclusion: Online headache epidemiology research could be a resource-saving alternative to person-to-person studies. However, further research is needed to overcome the problems related to the methods of sampling, access and engagement.


Introduction
Epidemiological studies are important for acquiring information about disease patterns and aetiology as well as for creating the basis for the assessment of disease burden, cost and need for health services in society [1,2]. However, large population-based epidemiological studies are usually resource- and time-consuming [2]. In the face of rapid digital evolution, it would be beneficial to search for new methods for epidemiological surveys that exploit the fast development of information technology. This is particularly relevant in headache epidemiology, given that most primary headaches can be diagnosed based on history and do not require additional instrumental investigations. Nevertheless, online research poses several possible obstacles, mainly concerning sampling-related biases, which probably discourage researchers from testing online methodology in large-scale nationwide studies. As a result, in the context of epidemiological studies, the degree of bias involved in web-based methods has not been studied in an evidence-based, comparative manner.
Estonia is a North-Eastern European country with a population of 1.3 million [3]. It is one of the leading countries in the world regarding the usage of internet and web-based solutions per household, estimated at 86.2% among the population of 16-74-year-olds in 2016 [4]. In 2019 Estonia ranked 8th out of the 28 EU Member States in the European Commission Digital Economy and Society Index, showing that the use of internet services remains consistently high in this country [5]. This sets up potentially promising conditions for using e-technology in performing representative studies in headache epidemiology.
The aim of this study was to evaluate the applicability of a web-based approach in epidemiological studies of primary headaches by comparing the results of a web-based survey to those of a population-based epidemiological study in Estonia, the results of which have previously been published [6].

Surveys
Two parallel surveys were conducted, both from January 2016 to May 2017.
One of the surveys consisted of a population-based random sample of 2162 subjects who were interviewed by telephone or face-to-face using a previously validated questionnaire. The description of the methods and the results of this study have been previously published [6].
The other was a web-based survey. The participants were Estonian citizens aged 20-64 who were recruited via the internet through an online recruitment campaign. Advertisements for the same questionnaire were placed on different online portals and 150,000 e-mails were sent to addresses in the six most popular e-mail domains in Estonia. The portals and e-mail addresses were chosen by an advertisement company with the aim of maximum coverage of the Estonian digital community. The advertisements and e-mails consisted of a short informative description of a health survey, avoiding any mention that this was a headache survey in order to minimize participation bias, and contained a link to the headache questionnaire. The questionnaire was hosted on a Tartu University Hospital server, which provides highly secure storage of participants' data. To reach the questionnaire, the participants had to log in with their unique personal Estonian ID cards so that double entries could be traced and managed appropriately. This also ensured that only Estonian citizens of the appropriate age were included, since the ID card data include the participant's date of birth. At the end of the questionnaire there was a more thorough description of the purpose of the study, explaining that it was a headache epidemiology study. Thus participants, once fully informed, could leave the site without saving their data if they decided not to give their consent. Otherwise, they saved their data by pushing the button "Finished".
In order to encourage participation, a lottery was announced in the advertisements and e-mails. The lottery draw was performed at the end of the study and two kinds of prizes were awarded to 11 random participants: 10 sports-club memberships and 1 tablet computer.
As multiple entries by single participants were expected, the following protocol was developed to manage them. If the results of multiple questionnaire entries were identical, only the first entry was retained in the study. Thus, multiple entries made by mistake, or made to enhance the chances of winning a prize by filling in the questionnaire several times, were eliminated. If the results of the entries were not identical, the following four options were possible.
Firstly, if the age reported by the participant did not match the age by ID (the Estonian ID includes the date of birth), the entries were excluded, as the participant was filling in the questionnaire under a false identity. Secondly, if one entry resulted in a headache diagnosis and another in no headache, the "headache" entry was accepted and the "no headache" entries were excluded, because it is most probable that the "no headache" entries were completed in order to enhance the chances of winning the prize. Thirdly, if different entries by a participant resulted in different headache diagnoses that did not exclude each other, they were all accepted, as different headaches may occur in one person. The fourth option was the case when different entries by a participant resulted in headache diagnoses that excluded one another, for example when the participant had diagnoses of both a chronic and an episodic form of the same headache, or both the probable and the definite diagnosis of the same headache. In these cases, the chronic form was accepted and the episodic omitted, or the definite diagnosis accepted and the probable omitted, respectively.
When multiple entries were included from the same participant, s/he was still counted as a single participant having multiple headache cases. In other words, the total number of participants in the sample did not increase, but the number of respective headache cases did.
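The multiple-entry protocol described above can be sketched in code. The following Python fragment is only an illustration under stated assumptions, not the study's actual implementation: the `Entry` record, its field names and the function `resolve_entries` are hypothetical, a diagnosis is reduced to a headache type plus chronic/episodic and definite/probable flags, and the precedence of a chronic probable diagnosis over an episodic definite one is an assumption that the protocol text does not settle.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entry:
    """One questionnaire submission (hypothetical, simplified record)."""
    participant_id: str
    reported_age: int
    id_age: int                  # age derived from the ID card date of birth
    kind: Optional[str]          # headache type; None means "no headache"
    chronic: bool = False        # False means episodic
    definite: bool = True        # False means probable


def resolve_entries(entries: List[Entry]) -> List[Entry]:
    """Return the entries retained for ONE participant, in submission order."""
    # Identical entries: keep only the first occurrence.
    unique = list(dict.fromkeys(entries))
    if len(unique) == 1:
        return unique

    # Option 1: drop entries whose reported age contradicts the ID card.
    unique = [e for e in unique if e.reported_age == e.id_age]

    # Option 2: "headache" entries take precedence over "no headache" entries.
    with_headache = [e for e in unique if e.kind is not None]
    if not with_headache:
        return unique[:1]

    # Options 3 and 4: different headache types are all kept; within one type
    # the chronic form beats the episodic one and definite beats probable.
    best = {}
    for e in with_headache:
        current = best.get(e.kind)
        if current is None or (e.chronic, e.definite) > (current.chronic, current.definite):
            best[e.kind] = e
    return list(best.values())
```

With this sketch, a participant who submitted one "no headache" entry and one episodic migraine entry would be counted once, with a single migraine case, which matches the counting rule that multiple retained entries add headache cases but not participants.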

Questionnaire
We used a structured questionnaire in Estonian that was developed by our study group and whose sensitivity, specificity, and positive and negative predictive values had been estimated for most primary headache disorders [7]. The same questionnaire was used in the aforementioned population-based epidemiological person-to-person study of primary headaches in Estonia [6]. In the online study, the questionnaire was self-administered, as in the original validation process [7].
The questionnaire consisted of four parts: 1) demographic data, 2) the headache diagnostic questionnaire, 3) headache-related burden and associated factors, and 4) enquiry on socioeconomic status and willingness to pay for effective headache treatment. The results of the enquiries in the third and fourth parts will be published elsewhere.
At the beginning of the diagnostic section the respondent was asked a screening question for headaches: "During the last year, have you had repeated headaches that were not caused by an acute infection, medication side effects, medical procedures, or consumption of toxic substances including alcohol?" If the respondent answered "yes", s/he was introduced to questions targeting different aspects of the person's headache (localization, laterality, character, intensity, preceding and accompanying symptoms, duration, frequency, response to indomethacin, association with certain situations/activities, precipitating factors, drug consumption, and history of head trauma) [7].
The participants were required to complete all the questions of the demographic and diagnostic parts in order to finish the questionnaire, thus avoiding missing data.
After the headache questionnaire had been completed, an ICHD-3 beta based diagnostic algorithm [6][7][8] was applied and the respondent was assigned one of the following diagnoses: no headache, episodic or chronic migraine, episodic or chronic tension-type headache, one of the trigeminal autonomic cephalalgias, one of the other primary headaches (except for primary thunderclap headache and external-pressure headache), or, if the described headache did not meet the criteria of any of the aforementioned entities, unidentified headache [6,7]. The headache had to fit either the definite or the probable criteria of ICHD-3 beta to be considered a case [8].

Statistical analyses
The main outcomes of the online study were the one-year prevalences of primary headache disorders in the study sample. These prevalences were compared to the one-year prevalences of primary headaches in the Estonian population acquired from the population-based person-to-person study published earlier [6]. The statistical methods used in both studies were identical. Data analysis was performed using R [9]. Sample weights were calculated using the ANES (American National Election Studies [10]) raking algorithm implemented in the R package anesrake [11], a standard approach in situations where data need to be weighted simultaneously for multiple demographic criteria. Sample proportions were compared using the two-sample test for equality of proportions (with continuity correction).
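The idea behind raking (iterative proportional fitting) can be illustrated with a short sketch. The Python code below is a plain raking loop written for illustration only; it is not the anesrake implementation used in the study, which offers further features. The function names `rake` and `weighted_share`, the toy respondents and the target shares are all hypothetical.

```python
def rake(respondents, targets, max_iter=100, tol=1e-8):
    """Plain raking: adjust weights until weighted margins match the targets.

    respondents: list of dicts, e.g. {"gender": "F", "habitat": "urban"}
    targets: {variable: {category: population share}}, shares sum to 1
    Assumes every target category occurs at least once in the sample.
    """
    weights = [1.0] * len(respondents)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            total = sum(weights)
            # Current weighted total of each category of this variable.
            current = {}
            for w, r in zip(weights, respondents):
                current[r[var]] = current.get(r[var], 0.0) + w
            # Scale each weight so this variable's margin matches its target.
            for i, r in enumerate(respondents):
                factor = target[r[var]] * total / current[r[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]  # normalise to a mean weight of 1


def weighted_share(respondents, weights, var, category):
    """Weighted proportion of respondents falling in one category."""
    hit = sum(w for w, r in zip(weights, respondents) if r[var] == category)
    return hit / sum(weights)


# Toy sample that over-represents urban women (hypothetical data).
sample = (
    [{"gender": "F", "habitat": "urban"}] * 3
    + [{"gender": "F", "habitat": "rural"}]
    + [{"gender": "M", "habitat": "urban"}]
    + [{"gender": "M", "habitat": "rural"}]
)
# Hypothetical population margins, not figures from Statistics Estonia.
population = {
    "gender": {"F": 0.5, "M": 0.5},
    "habitat": {"urban": 0.7, "rural": 0.3},
}
sample_weights = rake(sample, population)
```

After raking, the weighted shares of women and of urban residents in the toy sample agree with the chosen targets, and a weighted prevalence can then be computed as the weighted share of respondents with a given diagnosis.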

Results
During the period from January 2016 to May 2017, 5708 entries were made by 5347 individual participants. A total of 5032 participants filled in the questionnaire only once, 250 participants made multiple entries which resulted in identical diagnoses, and 65 participants made multiple entries with differing diagnoses. After addressing the multiple entries according to the protocol, 5363 entries were included and 340 entries were excluded from the final analysis. Of the 5347 participants, 3896 (72.9%) had no headache, 1436 (26.8%) had only one type of headache and 15 (0.3%) had more than one type of headache (Fig. 1).
The demographic data of the study sample are depicted in Table 1 alongside the data for Estonian population (data derived from Statistics Estonia, 01.01.2016 [12]).
The study sample demographics were statistically significantly different from those of the Estonian population. Compared to the general population, the study sample had a higher proportion of women, younger participants, more married people, a higher education level and a higher proportion of people living in urban areas. Hence the study sample was adjusted to match the population demographically by weighting by age, gender, marital status, habitat and education. The adjusted prevalences of primary headaches in the study sample are depicted in Table 2.
The comparison between the adjusted prevalences of primary headaches in the Estonian population-based person-to-person study sample [6] and in the online study sample (weighted by age, gender, marital status, habitat and education) is depicted in Table 3.
The percentage of headache sufferers in general was considerably smaller in the online study sample. However, among the participants who had headaches, the proportions of different headache diagnoses were similar in the two studies (Table 4 and Fig. 2), with only the proportions of episodic migraine and episodic tension-type headache being statistically different. There were proportionally fewer migraine and more tension-type headache sufferers in the online study sample compared to the population-based person-to-person study sample in Estonia [6].
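For readers unfamiliar with the two-sample test for equality of proportions with continuity correction named in the statistical analyses, a minimal Python sketch follows. It computes a chi-squared statistic with Yates' correction and is analogous to, but not a copy of, the routine used in R. The function name `two_proportion_test` is hypothetical. The online count (1451 of 5347 participants with at least one headache type) is the unweighted figure from the Results, whereas the count of 886 for the person-to-person sample is only an illustrative approximation derived from 41.0% of 2162, so the output does not reproduce the weighted comparisons in the tables.

```python
import math


def two_proportion_test(x1, n1, x2, n2):
    """Chi-squared test (1 df) for equal proportions with Yates' correction.

    x1, x2: numbers of cases; n1, n2: sample sizes.
    Assumes the pooled proportion is strictly between 0 and 1.
    Returns (statistic, p_value).
    """
    pooled = (x1 + x2) / (n1 + n2)
    # Expected counts of cases and non-cases in both samples under H0.
    expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]
    # In a 2x2 table |observed - expected| is the same in all four cells.
    deviation = abs(x1 - n1 * pooled)
    corrected = deviation - min(0.5, deviation)
    statistic = sum(corrected ** 2 / e for e in expected)
    # Survival function of the chi-squared distribution with 1 df.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Unweighted online count versus an illustrative person-to-person count.
statistic, p_value = two_proportion_test(1451, 5347, 886, 2162)
```

The correction subtracts at most 0.5 from the absolute deviation before squaring, which makes the test slightly more conservative than the uncorrected version.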

Discussion
To the best of the authors' knowledge, this is the first study to address empirically whether a web-based approach to epidemiological studies of primary headache disorders is useful and what pitfalls may be expected. A comparison between online and person-to-person survey methods is most accurate and informative only when both surveys are performed within the same population during the same time period. Online solutions have been used in headache research previously [13][14][15], but there have been no attempts to conduct an online survey for primary headache epidemiology on such a large scale, involving a whole country.
One of the considerations in favour of an online approach to epidemiological studies is time. Although the headache questionnaire was available for 15 months, most of the entries were made in close temporal connection to the launches of the online advertisement and e-mail campaigns, with most of the entries (n = 4082, 76% of total) made within 3 months of the release of the campaign. This means that, compared to traditional methods of epidemiological studies, the online approach presents a much faster and more cost-effective means of data collection. Additionally, the online study sample was considerably larger than the sample obtained in the offline person-to-person study that was carried out in parallel [6]. Hence the power of the study is also greater and statistical corrections can be made with smaller error. This, we believe, is one of the biggest benefits of online surveys: the acquisition of large samples with less consumption of time and resources.
The prevalence of all headache in the study sample after adjusting it to the general population by age, gender, marital status, habitat and education was only 23.5%, almost two times lower than the prevalence of all headache in the population-based random sample person-to-person survey performed in parallel [6]. This is likely to mean that the online survey was significantly biased towards people without headache. We speculate that one of the reasons for this was the selection bias created by the lottery that was originally intended to enhance participation. Since the potential reward was a gym membership, physically more active individuals may have been more likely to participate, reducing the prevalence of headaches [16]. It is also possible that a proportion of the participants did not take the trouble to fill in the questionnaire truthfully even if they had had a headache during the previous year, and simply took the easy way out by saying they had not in order to be able to participate in the aforementioned lottery (we propose that this could be called "convenience bias"). Although participants not admitting to having had a headache cannot be totally excluded in the person-to-person survey either, this is more likely to have occurred in the web-based approach.
There is an evident imbalance between genders among the responders (71.5% were females). The gender differences in internet use in Estonia are not considerable. For example, 86% of women and 87% of men used the internet for sending and receiving e-mails in Estonia in 2017, whereas men used the internet more than women for reading the news and for bank transactions (75% vs 70% and 63% vs 60%, respectively) and social networks were used slightly more by women than by men (67% vs 63%) [17]. We believe that the gender imbalance in this study might be explained by the tendency of women to be more concerned about health issues than men and therefore to take part in health surveys more readily. However, since the data in this study are adjusted for gender among other characteristics, this imbalance should not influence the prevalence rates considerably.
Table 3 Comparison of weighted one-year prevalences of primary headaches in Estonia [6] and in the online sample (columns: primary headaches; weighted one-year prevalences (%) with 95% CIs in the Estonian population aged 20-64; weighted one-year prevalences (%) with 95% CIs in the online study sample)
When the participants without headache are omitted, the proportions of primary headache diagnoses among those who reported headache in the online study are surprisingly similar to those found in the population-based person-to-person random sample survey in Estonia [6] (Fig. 2, Table 4). The only statistically different proportions were those of episodic migraine and episodic tension-type headache, whereas the proportions of chronic migraine, chronic tension-type headache, trigeminal autonomic cephalalgias, other primary headaches and even unidentified headaches were almost identical. Furthermore, even the statistically different proportions of episodic tension-type headache and episodic migraine resemble their counterparts in the population-based person-to-person random sample study in that these remain the largest and most prevalent diagnoses of primary headaches in both samples: episodic tension-type headache and episodic migraine together comprise 83% and 85% of the primary headaches in the two studies, respectively. The proportion of episodic tension-type headache is larger and the proportion of episodic migraine is correspondingly smaller in the online study compared to the person-to-person study. One of the reasons for this discrepancy might be the fact that in the population-based person-to-person study the questionnaire was administered face-to-face or by telephone interview [6], whereas in the online study the questionnaire was completely self-administered.
Since a migraine diagnosis requires more detail (presence of accompanying symptoms etc.), these nuances might be missed when the questionnaire is self-administered, as opposed to the situation where the participant can ask the interviewer for clarification. This is an important issue regarding the reliability of future online studies and must be taken into account when designing them. On the other hand, most headache prevalence studies so far have demonstrated that tension-type headache is usually more prevalent than migraine in any given population [18,19]. The prevalence of episodic tension-type headache in the population-based person-to-person random sample study was found to be 18.0% [6], which is exceptionally low when compared to other countries in the same North-Eastern European region. The reasons for this possible underestimation of episodic tension-type headache are discussed elsewhere [6]. However, this raises the question of whether the online study could have reflected the proportions of primary headache disorders in the population even more truthfully. Nevertheless, it is apparent that underestimation of headache prevalence would be one of the most troubling issues of online prevalence studies.
Another important factor underlined by our study is the necessity of having the participants identify themselves by some unique ID method. The 671 multiple entries by the same participants in the online study (about 12% of all the entries) show that in such web-based surveys it is vital to have an identification method that provides the means to manage this situation, especially where multiple entries increase a participant's chances of winning a prize. This again provides evidence that if lotteries and other similar "stimulating packages" are to be used to boost participation, they must be applied with utmost care to minimize the inevitable bias.
The main limitations of our study are related to the sampling methods. Valid conclusions about the population of interest (in our case the Estonian population of 20-64 years of age) require probability sampling, where all members of the population have a known, non-zero probability of being selected into the study sample [2]. In our case this means that, since about 87% of 16-74-year-olds in Estonia use the internet on a daily basis [4], about 13% of the population would have no possibility of being invited to a study conducted online. However, there is no information about the age distribution of the non-users within this 13%. It is highly probable that most non-users are in the older age group. Since the upper age limit of our study sample is 64 years, it is quite possible that the actual percentage of internet users within the targeted age range is even higher than 87%, but this remains speculative owing to the lack of respective data.
We tested the hypothesis that high internet coverage among the general population in Estonia would be sufficient for obtaining a representative sample by the chosen method of access and engagement. The analysis of the demographic data of the sample refuted this hypothesis. Our sample of 5347 participants was statistically significantly different from the general population of Estonia: the sample consisted of a younger, more educated and more urban group of people, and there were more women than men among the participants. The smallest difference, although statistically significant, was in the marital status of the participants compared to the general population: there were more married people in the study sample. This shows that simply addressing the digital community through the most popular sites and domains does not guarantee a representative sample of the general population, even in countries with highly developed information technology. In order to obtain representative samples in future online epidemiological studies, the methods of sampling, access and engagement must be more conservative [2]. There are several possible solutions: the targeted invitation to the study could be linked to banking systems, e-health registries or e-mail addresses from national population registries in countries where the adult population uses such solutions extensively.
The evidence provided by our study should be considered when planning further research and generating guidelines for using web-based approaches in headache epidemiology.

Conclusion
Our study shows that, in the face of an already extensive and rapidly increasing usage of internet and IT solutions among the general population, online headache epidemiology research could be a time- and resource-efficient alternative in technologically developed countries. In addition to the possibility of obtaining larger study samples in relatively short time periods, IT solutions are capable of providing participant identification methods that make it possible to avoid data contamination. However, further research is needed to find more reliable methods of online access and engagement in order to gain representative samples and overcome the pitfalls of bias and, most probably, underestimation of headache prevalence.