Survey Design for Top Level Domains
Survey Design and Implementation for New Top Level Domains (TLDs)
Abstract
Is the new Top Level Domain (TLD) program a success, or are new top level domain names being purchased primarily by industry insiders? Are industry outsiders aware of the new TLDs and, if so, would they consider registering a new top level domain name? Industry outsiders have not yet been surveyed to determine their awareness and acceptance of new domain names. Providing an easily accessible survey link to respondents will help attain the desired number of responses. However, to ensure reliable results, a field test will be conducted to confirm the questions are understood as intended and follow a logical flow. To ensure accurate results, missing data will not be imputed, most questions will include a ‘does not apply’ option, embedded data will be collected, and tests for consistency and logic will be incorporated. The resulting survey data will be coded for use in dashboards that visually display the results.
Survey Design and Implementation
Does achieving 25 million new Top Level Domain (TLD) registrations in less than two years constitute a success (greenSec GmbH, 2016)? While it is a large number of domain names, the new TLD program is relatively new and a portion of the general public is unaware these new domain name extensions exist and are available for registration. Some argue the program is a failure, with only industry insiders possessing the knowledge or awareness of its existence. To gain insight into this argument, a survey on new domain name registrations will be conducted focusing on a cohort outside the TLD industry.
“Spending time making sure your survey is rooted in the realities of respondents’ lives” is the basis of this question, as the Internet is becoming part of our daily lives and only continues to grow (Snap Surveys, 2015). As such, more people and businesses are creating websites and buying domain names. While most people are familiar with legacy top level domains such as .com, .org, and .info, or country code domains such as .us or .ca, the degree to which people understand and accept new top level domains is not known.
This paper will examine the development of a survey surrounding how Internet users understand and accept new Top Level Domains. A proposed sampling plan, survey instrument and mode of delivery, implementation plan and approach to data management and integration will be provided. This will be rounded out with a draft dashboard for reporting on the resulting data.
Survey Research Topic
The introduction of new generic Top Level Domains (gTLDs) began in January 2015 and has continued on a rolling basis. Two years later, the debate over whether the gTLD program was a success or simply an influx of money to the Internet Corporation for Assigned Names and Numbers (ICANN) has yet to be settled. Some argue that the 25 million registrations originate mostly from insiders, and that the general public is unaware of, or simply does not care about, new TLDs and will instead continue to register legacy TLDs such as .com, .net, and .info. Thus, the focus of this survey is on gathering information around these questions.
The new gTLD program was developed with the intent to “promote competition in the domain name market while ensuring Internet security and stability” (Internet Corporation for Assigned Names and Numbers, 2015). As part of the evaluation process, committees were created to both develop and evaluate a set of metrics. There were initially 70 metrics which were later reduced to an evaluation of 65 different metrics focused on four objectives:
- Ensuring accountability, transparency and interests of global Internet Users
- Preserving security, stability and resiliency of the Domain Name System
- Promoting competition, consumer trust and consumer choice
- Whois policy (Competition, Consumer Trust & Consumer Choice, 2016)
The final recommendation on the above metrics is readily available and can augment the above-mentioned survey and its focus. Other information, such as registration statistics, is available on a number of websites, but there is no readily available method to differentiate registrations made by people inside the industry from those made by non-insiders. Further complicating matters is the number of TLD owners who register their premium names to set up a parking page. In this case, the domain name is registered, which allows it to resolve to a website. On a parked page, information on purchasing the domain name and/or its pricing is immediately available to a potential buyer, referred to as a registrant.
The survey seeks to collect data that will address:
- Are new TLDs a success or is growth based primarily on industry insiders?
- Are people outside the industry familiar with new top level domains and if so, do they believe they are beneficial and would they consider registering a new top level domain name?
These questions are key to determining acceptance and awareness, and the survey includes a variety of items to gauge whether the respondent understands the fundamentals of a domain name. The survey questions are provided in Appendix A; an online version is also available. Naturally, a survey needs “to be accurate and valuable…and the information gathered needs to be representative of the whole population” to ensure meaningful results (Research LifeLine, 2012). To accomplish this, a carefully developed sampling plan is required.
Sampling Plan
To meet the objective of the survey, a representative sample needs to be collected, handled, analyzed and reported upon. The sample will be a simple random sample drawn from Internet users who own or are considering registering a domain name, and/or who already have a website or are considering creating one. This group of people is most likely to provide the necessary insight to answer our questions.
With over 75% of domain registrations occurring in North America or Europe, and with the United States having the highest number of registered domain names, the sampling plan needs to reflect this reality (Graham & De Sabbata, 2013). With a population size in the hundreds of millions and Europe having 24 official languages, the survey will be translated into five widely spoken languages: English, Spanish, German, French and Chinese. A confidence level of 95% with a 5% margin of error, and the most conservative response proportion (p = 0.5), is used to calculate the sample size as follows:
Sample Size = Z² × p × (1 − p) / (Margin of Error)²
= (1.96)² × 0.5 × (1 − 0.5) / (0.05)²
≈ 385 Respondents Required
To attain a sample size of 385 respondents based on a response rate of 25%, the survey needs to be sent to at least 1,540 different individuals and/or businesses that meet the desired criteria as described below (FluidSurveys Team, 2014). The sample of respondents needs to be carefully chosen to avoid duplication, for instance a respondent who may be considering a domain name for both their personal use and their business needs. If 385 responses are not attained, then additional participants will be sought to ensure the minimum number is achieved and the desired level of accuracy and reliability are attained.
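The sample size and invitation calculations above can be sketched in a few lines; the 25% response rate is the assumption cited from FluidSurveys, and the function names are illustrative.

```python
import math

def sample_size(z: float, p: float, margin_of_error: float) -> int:
    """Minimum sample size for estimating a proportion p at a given confidence level."""
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

def invitations_needed(n: int, response_rate: float) -> int:
    """Number of surveys to send so that expected responses reach n."""
    return math.ceil(n / response_rate)

# 95% confidence (z = 1.96), 5% margin of error, most conservative p = 0.5
n = sample_size(z=1.96, p=0.5, margin_of_error=0.05)
print(n)                             # 385 respondents required
print(invitations_needed(n, 0.25))   # 1540 invitations at a 25% response rate
```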
Respondents can be acquired from a number of sources, including email, social networking services, new business resources and communities, website creation tools, and training resources. They can also be reached through cooperation with one or more registrars (companies that register domain names on behalf of their customers), or through the purchase of suitable email lists and paid online survey panels.
Survey Instrument and Mode of Delivery
A self-administered survey will be provided to respondents, who can use the provided link at their convenience from either their computer or their mobile device. The online questionnaire will comprise both open-ended and multiple choice questions and will be provided in multiple languages, matching the diversity of the worldwide web’s audience and the users to whom the survey applies. With an online survey, additional clarification and information can be provided to respondents as needed by clicking the “?” symbol beside each question. The online version of Appendix A shows the use of such a field; for example, on the question “Would you consider purchasing a new domain name extension?” the additional hint reads “Would you consider purchasing a domain name ending with say, .club, .school, .surf, etc.”. Additionally, the “method used to contact sample units may not be the same method used to collect the data” (Groves, et al., 2009, p. 150). For example, the request to fill out the questionnaire could be initiated by mail but conducted online.
While there are advantages to mixed-mode survey designs, including “a reduction of cost/increased efficiency, the establishment of credibility and trust with the respondent and an improvement in the degree of privacy offered to the respondent”, a web-based survey will be our chosen method of data collection (de Leeuw, 2016). Additionally, given the sampling frame and the respondents’ requisite knowledge of domain names and websites, it is a fitting method. This is particularly so because many of the disadvantages of web-based surveys are mitigated by the topical knowledge required to respond adequately. Examples of such disadvantages include web browser issues, the need for a reliable Internet connection, computer literacy, and understanding the difference between a domain name and a website. In addition, online surveys allow anyone around the world to respond, are quick to administer, and yield responses quickly. Furthermore, the resulting data can be easily exported or combined with other data.
Implementation Plan
Lead-up information on the questionnaire, survey context, and an easily accessible survey link or code with a clear deadline are essential to a successful implementation. However, if the right questions are not asked, or respondents misunderstand them or cannot provide an answer easily, the process fails and the results become meaningless (Groves, et al., 2009, p. 286). Questions need to be reviewed for spelling, flow, terminology and ease of understanding.
Conducting a field test with a small sample of potential respondents is the best approach to ensure reliable results. A small group of participants should be provided with the survey and asked to offer feedback on the survey itself. The clarity of the questions, the logical flow of the survey, the use of familiar terms, ease of access, and time taken to complete the survey, along with any other issues that might impact its quality and reliability, need to be assessed by this test group. Also, responses need to be validated against the intended response types. For instance, how well do answers to open-ended questions requesting a number match expectations, and how consistent are the answers with prior responses?
Once the field test is complete and any necessary adjustments made, the survey can be sent to the sampling frame, where responses will be collected and reported upon. The initial timeframe will be three months, or until a maximum of 4,160 responses are received, which corresponds to a 99% confidence level with a 2% margin of error for added precision. Any subsequent surveys will be conducted for the same length of time with identical questions and tracking of embedded data to ensure the data is comparable and reliable over time.
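The 4,160 figure above can be reproduced with the same sample-size formula, using the rounded critical value z ≈ 2.58 for 99% confidence and treating the 2% as the margin of error:

```python
# 99% confidence (z ≈ 2.58, rounded), 2% margin of error, most conservative p = 0.5
z, p, e = 2.58, 0.5, 0.02
n = (z ** 2) * p * (1 - p) / e ** 2
print(round(n))  # ≈ 4160 responses
```

Using the more precise critical value 2.576 gives roughly 4,147; the 4,160 target reflects the commonly tabulated rounded value.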
Data Management and Integration Plan
According to research, there are three types of non-response: failure to deliver the survey, unwillingness to participate, and inability to participate (Groves, et al., 2009, p. 192). Missing or poor quality data should not be dismissed out of hand; it should be carefully reviewed and considered. Respondents may skip questions intentionally or accidentally. In some studies, missing data is imputed with replacement values as though a response had been received, but the missing data is too important in this study, so only the provided responses will be included in the summary. If required, additional respondents will be pursued to meet the required sample size. To help minimize the impact of missing data, critical questions can be marked as mandatory, requiring the respondent to provide an answer before moving to the next question. Additionally, to prevent a respondent from selecting an arbitrary answer for the sole purpose of moving on, which would skew results, a ‘does not apply’ choice will be included for most questions. This option is especially important for critical/mandatory questions.
Besides dealing with missing data, conditional logic can also be used in the flow of questions. Collapsed questions would be expanded if a preselected answer were chosen, such as ‘if yes, then…’ triggering a related question to appear. Conversely, logic that ends the survey would be triggered if the respondent lacks sufficient knowledge of the topic (for example, not knowing what the Internet is) or does not meet minimum requirements, such as age. Additional information that is collected without directly asking the respondent is known as embedded data. Embedded data can comprise date and time components, the length of time taken to complete the survey, location, country, referral information (such as Facebook, email, or Survey Monkey), language, email address, or a response ID that can be used for tracking paid responses (Qualtrics LLC, 2016).
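The branching and termination logic described above can be sketched as follows; the question and section names are illustrative, not drawn from an actual survey platform.

```python
# Sketch of survey flow logic: screening, termination, and a collapsed follow-up.
# Field names such as "owns_domain" and "domain_count" are hypothetical.
def next_question(answers: dict) -> str:
    # Screening: end the survey if minimum requirements are not met.
    if answers.get("age", 0) < 18:
        return "END_SURVEY"
    # End the survey if the respondent lacks basic topical knowledge.
    if answers.get("knows_what_internet_is") == "no":
        return "END_SURVEY"
    # Collapsed question: only expand the follow-up when 'yes' was chosen.
    if answers.get("owns_domain") == "yes" and "domain_count" not in answers:
        return "ASK_domain_count"
    return "NEXT_SECTION"

print(next_question({"age": 17}))                        # END_SURVEY
print(next_question({"age": 30, "owns_domain": "yes"}))  # ASK_domain_count
```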
To help audit data quality, a few overlapping questions for consistency and logic checks will be incorporated. For example, a respondent is unlikely to own a website without owning a domain name, or to indicate they own a domain name but answer zero when asked how many. Additionally, open-ended questions, or those where a respondent chooses ‘other’, will require a minimum number of characters to encourage a proper response.
Dashboard
After the surveys are complete, the data needs to be coded, entered into files, edited, adjusted for missing data, weighted, and a sampling variance estimate calculated (Groves, et al., 2009, p. 331). While not all of these activities are required for this survey, as noted earlier regarding imputation, the data needs to be coded before it can be used. While most of the questions in Appendix A can easily be assigned a unique number for each response, text responses will require extra attention to ensure they are coded correctly and to avoid producing statistical errors (Groves, et al., 2009, p. 332).
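Coding a closed-ended question amounts to mapping each answer to a numeric code via a codebook; the sketch below is illustrative (the codebook values and the convention of 9 for ‘does not apply’ are assumptions, not part of the survey in Appendix A).

```python
# Minimal coding sketch: map categorical answers to numeric codes for tabulation.
CODEBOOK = {
    "owns_domain": {"yes": 1, "no": 2, "does not apply": 9},
    "aware_of_new_tlds": {"yes": 1, "no": 2, "does not apply": 9},
}

def code_response(question: str, answer: str) -> int:
    # -1 flags an unrecognized answer for manual review rather than guessing.
    return CODEBOOK[question].get(answer.strip().lower(), -1)

print(code_response("owns_domain", "Yes"))  # 1
```

Free-text responses would instead be reviewed and assigned codes by hand, since automatic matching is where the statistical errors noted by Groves et al. tend to arise.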
The dashboard needs to provide an effective way to convey the results of the survey. Multiple choice questions lend themselves to visual displays such as histograms, while bar charts are well suited to displaying the percentage of users who chose each response. Based on the survey questions provided in Appendix A, a draft dashboard has been created and is provided in Appendix B.
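A percentage bar chart of the kind described above can be prototyped from coded responses with only the standard library; the sample answers below are fabricated for illustration, and a production dashboard would use a proper charting tool.

```python
# Text-mode sketch of a response-percentage bar chart.
from collections import Counter

responses = ["yes", "no", "yes", "does not apply", "yes", "no"]  # sample data
counts = Counter(responses)
total = len(responses)
for answer, count in counts.most_common():
    pct = 100 * count / total
    print(f"{answer:15s} {'#' * count} {pct:.0f}%")
```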
Besides displaying the results in a dashboard, patterns in responses should also be reviewed for additional insight. For example, suppose 30% of respondents already own a domain name and 70% of these respondents believe new TLDs are trustworthy; or 60% of respondents indicate that cost is a major factor in their decision and 80% of these respondents prefer to buy a legacy domain name (legacy domain names are typically cheaper than new TLD names). These types of patterns provide insight into industry outsiders’ knowledge of new Top Level Domains along with their purchasing preferences.
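Conditional percentages like those above are computed by filtering on one answer and tallying another; the records below are fabricated for illustration only.

```python
# Sketch of a conditional-percentage pattern check over fabricated records.
records = [
    {"owns_domain": "yes", "trusts_new_tlds": "yes"},
    {"owns_domain": "yes", "trusts_new_tlds": "no"},
    {"owns_domain": "no",  "trusts_new_tlds": "no"},
    {"owns_domain": "yes", "trusts_new_tlds": "yes"},
]

owners = [r for r in records if r["owns_domain"] == "yes"]
pct_owners = 100 * len(owners) / len(records)
pct_trusting_owners = 100 * sum(r["trusts_new_tlds"] == "yes" for r in owners) / len(owners)
print(f"{pct_owners:.0f}% own a domain; of those, {pct_trusting_owners:.0f}% trust new TLDs")
```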
Conclusion
With millions of new top level domain registrations, it is critical for the Internet industry to understand public sentiment to better grow and develop the market. To gauge the success of the program, two key questions must be answered. First, are new top-level domain names being purchased primarily by industry insiders or by outsiders? And second, are people outside the industry aware of these new choices, and would they consider registering a new top level domain name?
To answer these questions, a sampling frame of random Internet users who own or would consider purchasing a domain name and/or website has been identified as the target population. A sample size of 385 respondents is required to attain a 95% confidence level with a 5% margin of error. Approximately 1,540 surveys must be distributed to attain the requisite number of respondents, based on a 25% response rate.
The survey would be self-administered via a web link which the respondent could complete at their convenience. This is not only cost effective but fitting, as the survey requires basic Internet knowledge. However, before the survey is implemented, a field test is needed to ensure its quality and reliability. Participants in the field test should assess the terminology, logical flow of questions, ease of access, and time required to complete the survey. Once the field test is concluded and the necessary adjustments made, the link can be shared with the designated sampling frame. Embedded data, including the time taken, referral, language, etc., will also be collected as a basis of comparison for future iterations of the survey. Finally, a proposed dashboard has been provided as a way to visualize the data and present the results.
References
Competition, Consumer Trust & Consumer Choice. (2016, September 11). Global Registrant Survey Final Phase Results Available. Retrieved from ICANN: https://www.icann.org/news/announcement-2-2016-09-15-en
de Leeuw, E. D. (2016, May 24). Data Quality in Mail, Telephone, and Face to Face Surveys. Retrieved from Market Research: http://www.marketingresearch.org/issues-policies/best-practice/mixed-mode-surveys
FluidSurveys Team. (2014, October 8). Response Rate Statistics for Online Surveys – What Numbers Should You be Aiming For? Retrieved from FluidSurveys University: http://fluidsurveys.com/university/response-rate-statistics-online-surveys-aiming/
Graham, M., & De Sabbata, S. (2013). Geography of Top-Level Domain Names. Retrieved from Information Geographies at the Oxford Internet Institute: http://geography.oii.ox.ac.uk/?page=geography-of-top-level-domain-names
greenSec GmbH. (2016, November 15). New gTLD Overview. Retrieved from nTLDStats: https://ntldstats.com/tld
Groves, R. M., Couper, M. P., Fowler, F. J., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2009). Survey Methodology (2nd ed.). New Jersey: John Wiley & Sons, Inc.
Internet Corporation for Assigned Names and Numbers. (2015). Frequently Asked Questions. Retrieved from New Generic Top-Level Domains: https://newgtlds.icann.org/en/applicants/global-support/faqs/faqs-en
Qualtrics LLC. (2016). Embedded Data. Retrieved from Qualtrics: https://www.qualtrics.com/support/survey-platform/survey-module/survey-flow/standard-elements/embedded-data/
Research LifeLine. (2012). Survey Process White Paper Series: Five Steps in Creating a Survey Sampling Plan. Retrieved from Hubspot: http://cdn2.hubspot.net/hub/58820/docs/rl_process_wp_five_step_sampling.pdf
Snap Surveys. (2015, October 8). Engage with Respondents by Making the Survey Relevant. Retrieved from Snap Surveys: http://www.snapsurveys.com/blog/engage-respondents-making-survey-relevant/