What measures are best to address potential biases in the selection of your data sources?


What is a Data Source?

In data analysis and business intelligence, a data source is the component that provides raw data for analysis: a location or system that stores and manages data. It can take many different forms; from traditional databases and spreadsheets to cloud-based platforms and APIs, countless types of data sources are available to modern businesses.

Understanding the different types of data sources and their strengths and limitations is crucial for making informed decisions and deriving actionable insights from data. In this article, we will define what a data source is, examine data source types, and provide examples of how they can be used in different contexts.

In short, a data source refers to the physical or digital location where data is stored as a data table, data object, or another storage format. It's also where someone can access data for further use: analysis, processing, visualization, etc.

You often deal with data sources when you need to transform your data. Suppose you have an eCommerce website on Shopify and you want to analyze your sales to understand how to enhance your store's performance, and you have decided to use Tableau for data processing. Since Tableau is a standalone tool, you must somehow fetch the data you need from Shopify. Shopify therefore acts as a data source for your further data manipulations.
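As a concrete sketch of that fetch step, the snippet below builds an authenticated request against Shopify's Admin REST API using only the Python standard library. The store name, access token, and API version are placeholder assumptions for illustration, not real credentials:

```python
import urllib.request


def shopify_orders_request(store: str, token: str, api_version: str = "2024-01") -> urllib.request.Request:
    """Build an authenticated request for the Shopify Admin REST API orders endpoint."""
    url = f"https://{store}.myshopify.com/admin/api/{api_version}/orders.json?status=any"
    return urllib.request.Request(url, headers={"X-Shopify-Access-Token": token})


# "my-demo-store" and the token are placeholders; executing the request
# requires a real store and access token.
req = shopify_orders_request("my-demo-store", "shpat_xxx")
print(req.full_url)
# A real pipeline would call urllib.request.urlopen(req), parse the JSON body,
# and export it (e.g. to CSV) for a tool such as Tableau.
```

In practice you would page through the results and schedule the export, but the essential point stands: the analysis tool never owns the data; it reads from the data source.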

Bias, or systematic error, is the difference between what is actually measured and what is intended to be measured (Casal & Mateu, 2003). Unlike random error, systematic error is not compensated by increasing the sample size (Department of Statistics, Universidad Carlos III de Madrid). And although controlling bias is vital to the development of an investigation, no study is exempt from it; the essential thing is to know the biases in order to avoid, minimize or correct them (Beaglehole et al., 2008).



The risk of bias is intrinsic to clinical research, where it is particularly frequent because the variables involved span individual and population dimensions and are difficult to control. Biases also occur in the basic sciences, where experimental settings give them peculiar characteristics but make them less complex to minimize, since a large part of the variables can be controlled.

From a statistical perspective, when trying to measure a variable, it must be considered that the value obtained as a result of the measurement (XM) is made up of two parts: the true value (XV) and the measurement error (XE), so that XM = XV + XE. The measurement error is in turn composed of two parts: one random and the other systematic (bias), which can be a measurement, selection or confusion bias (Dawson-Saunders et al., 1994).
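The decomposition XM = XV + XE, and the earlier claim that a larger sample does not remove systematic error, can be illustrated with a small simulation. The true value, bias, and error spread below are invented numbers for illustration:

```python
import random
from statistics import mean

TRUE_VALUE = 120.0  # X_V: e.g. a subject's true systolic blood pressure
BIAS = 5.0          # systematic part of X_E: a cuff that consistently reads high


def measure() -> float:
    """One measurement X_M = X_V + X_E, where X_E has a random and a systematic part."""
    random_error = random.gauss(0, 10)
    return TRUE_VALUE + BIAS + random_error


random.seed(42)
for n in (10, 100, 10_000):
    estimate = mean(measure() for _ in range(n))
    print(f"n={n:>6}: mean measurement = {estimate:.2f}")

# As n grows, the random error averages out and the mean converges to
# TRUE_VALUE + BIAS = 125, not to the true value of 120: a larger sample
# does not compensate for systematic error.
```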

This explanation allows us to understand the fundamental characteristics of any measurement: accuracy (measurements close to the true value [not biased]); and precision (repeated measurements of a phenomenon with similar values) (Manterola, 2002).
The objective of this article is to describe the concepts that allow us to understand the importance of biases, the most frequent ones in clinical research, their association with the different types of research designs and the strategies that allow them to be minimized and controlled.

A simple way to understand the different possibilities of committing bias during research is to think about the three axes that dominate research: what will be observed or measured, that is, the variable under study; the one who will observe or measure, that is, the observer; and with what will be observed or measured, that is, the measuring instrument (Tables II and III) (Beaglehole et al.).

1. From the variable(s) under study

There is a series of bias possibilities associated with the variable under study, whether at the time of its observation, the measurement of its magnitude, or its subsequent classification (Manterola).

a) Periodicity: Corresponds to variability in the observation; that is, what is observed may follow a pattern over time, either distributed uniformly or concentrated in periods. Knowledge of this characteristic is essential in biological events that present known cycles, such as the circadian rhythm, electroencephalographic waves, etc.

b) Observation conditions: There are events that require special conditions for their occurrence to be possible, such as environmental humidity and temperature, respiratory and heart rates. These are non-controllable situations that, if not adequately considered, can generate bias; context more typical of basic sciences.

c) Nature of the measurement: Sometimes there may be difficulty in measuring the magnitude or value of a variable, whether qualitative or quantitative. This situation may occur because the magnitude of the values is small (hormonal determinations), or due to the nature of the phenomenon under study (quality of life).

d) Errors in the classification of certain events: They may occur as a result of modifications in the nomenclature used, a fact that must be noted by the researcher. For example, neoplasm classification codes, the operational definition of obesity, etc.

2. From the observer

The ability to observe an event of interest (EI) varies from one subject to another. Moreover, faced with the same stimulus, two individuals can have different perceptions. Therefore, homogenizing the observation, guaranteeing adequate conditions for its occurrence and an adequate observation methodology minimizes measurement errors.

This is how we know that the error is inherent to the observer, independent of the measuring instrument used. This is why in the different clinical research models, strict conditions are required to homogenize the measurements made by different observers; using clear operational definitions or verifying compliance with these requirements among the subjects incorporated into the study.

3. From the measurement instrument(s)

The measurement of biomedical phenomena using more than just the senses entails the participation of measurement instruments, which in turn may have technical limitations in measuring exactly what is desired.

The limitations of measurement instruments apply both to "hard" devices and technology and to population exploration instruments such as surveys, questionnaires, scales and others. Regarding the latter, verification of compliance with their technical attributes is usually left aside, even though they are, independent of any other consideration, "measuring instruments": they have been designed to measure the occurrence of an EI, and must therefore be subject to the same considerations as any measuring instrument (Manterola).

These restrictions easily apply to diagnostic tests, in which there is always the probability of overdiagnosing subjects (false positives) or underdiagnosing them (false negatives), committing errors of a different nature in each case. Frequently it is necessary to design data collection instruments whose purpose, like that of diagnostic tests, is to separate the population according to the presence of some EI.

Thus, if an instrument lacks adequate sensitivity, it will identify few of the subjects with the EI (true positives). Conversely, screening instruments with low specificity will decrease the probability of correctly identifying subjects without the EI (true negatives).

For example, a questionnaire intended for a prevalence study of gastroesophageal reflux may contain inappropriate items for detecting the problem in a certain group of subjects, reducing its sensitivity. The same instrument, with an excessive number of items of little relevance to the problem, may lack adequate specificity to measure the EI.
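These two attributes are simple proportions over the instrument's classification table. The sketch below computes them for the reflux-questionnaire scenario; the counts are invented for illustration:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of subjects WITH the event of interest that the instrument detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Proportion of subjects WITHOUT the event that the instrument correctly rules out."""
    return tn / (tn + fp)


# Invented counts for a reflux questionnaire applied to 100 subjects with
# the condition and 100 without it:
tp, fn = 60, 40  # cases detected / cases missed (false negatives)
tn, fp = 95, 5   # non-cases ruled out / non-cases overdiagnosed (false positives)

print(f"sensitivity = {sensitivity(tp, fn):.2f}")  # 0.60: 40% of true cases are missed
print(f"specificity = {specificity(tn, fp):.2f}")  # 0.95
```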


Table III. Most common types of bias in observational studies (Manterola, C. & Otzen, T. Biases in clinical research. Int. J. Morphol., 33(3):1156-1164, 2015).

                      Cohorts   Cases and controls   Cross section    Ecological studies
1. Selection bias     Low       High                 Medium           Not applicable
2. Recall bias        Low       High                 High             Not applicable
3. Confusion bias     Low       Medium               Medium           High
4. Follow-up losses   High      Low                  Not applicable   Not applicable
5. Time required      High      Medium               Medium           Low
6. Cost               High      Medium               Medium           Low

Another way of classifying biases relates to the frequency with which they occur and the stage of the study in which they originate. In clinical research, the most frequent biases that affect the validity of a study can be classified into three categories: selection biases (generated during the selection or monitoring of the study population), information biases (originating during the measurement processes in the study population) and confusion biases (occurring because the study groups cannot be made comparable).

1. Selection biases
This type of bias is particularly common in case-control studies (events that occurred in the past can influence the probability of a subject being selected for the study). It occurs when there is a systematic error in the procedures used to select the subjects of the study (Restrepo Sarmiento & Gómez-Restrepo, 2004), and it leads to an estimate of the effect that differs from the one obtainable for the target population.

It is due to systematic differences between the characteristics of the subjects selected for the study and those of the individuals who were not selected. For example: hospital cases versus those excluded from them, whether because the subject dies before arriving at the hospital due to the acute or more serious nature of their condition, because they are not sick enough to require admission to the hospital under study, because of the costs of admission, because of the distance of the healthcare center from the subject's home, etc.

They can occur in any type of study design; however, they occur most frequently in retrospective case series, case-control, cross-sectional, and survey studies. This type of bias prevents the extrapolation of conclusions from studies carried out with volunteers drawn from a population without the EI. An example of this situation is the so-called Berkson bias, also called Berkson's fallacy or paradox, or admission or diagnostic bias, defined as the set of selective factors that lead to systematic differences in a case-control study with hospital cases.

It occurs in situations in which the combination of an exposure and the EI under study increases the risk of admission to a hospital, which leads to a systematically higher exposure rate among hospital cases compared to controls (for example, a negative association between cancer and pulmonary tuberculosis, in which tuberculosis appeared to act as a protective factor for the development of cancer; this was explained by the low frequency of tuberculosis among those hospitalized for cancer, a fact that does not mean that the frequency of the disease among these subjects is lower).

Another subtype of selection bias is the so-called Neymann bias (prevalence or incidence bias), which occurs when the condition under study causes the premature death of the subjects affected by it. For example, if a group of 1000 subjects with high blood pressure (a risk factor for myocardial infarction) and 1000 non-hypertensive subjects are followed for 10 years, an intense association should be observed between arterial hypertension and myocardial infarction. However, no association may be found if the subjects who die of myocardial infarction during follow-up are not incorporated into the analysis.
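The arithmetic of that example can be sketched directly. All counts below are invented for illustration; the point is only that excluding fatal cases pulls the measured association toward the null:

```python
def risk_ratio(cases_exp: int, n_exp: int, cases_unexp: int, n_unexp: int) -> float:
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (cases_exp / n_exp) / (cases_unexp / n_unexp)


# Invented counts: 1000 hypertensive and 1000 normotensive subjects followed
# for 10 years; some infarctions are fatal.
mi_htn, fatal_htn = 200, 80    # infarctions / fatal infarctions, hypertensive group
mi_norm, fatal_norm = 50, 10   # same, normotensive group

full = risk_ratio(mi_htn, 1000, mi_norm, 1000)

# Neymann bias: subjects with fatal infarctions die before they can be
# analyzed, so only survivors and their non-fatal infarctions are counted.
observed = risk_ratio(mi_htn - fatal_htn, 1000 - fatal_htn,
                      mi_norm - fatal_norm, 1000 - fatal_norm)

print(f"true risk ratio:     {full:.2f}")      # 4.00
print(f"observed risk ratio: {observed:.2f}")  # attenuated toward 1
```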

Another subtype of selection bias is non-response bias (self-selection, or the volunteer effect), which occurs when the degree of motivation of a subject who voluntarily participates in research varies significantly from that of other subjects, which can lead to over- or under-reporting.

Another that should be mentioned is membership (or belonging) bias, which occurs when the subjects under study include subgroups of individuals who share a particular attribute related, positively or negatively, to the variable under study. For example, the habits and lifestyles of surgeons may differ significantly from those of the general population, so incorporating a large number of such subjects in a study may produce findings conditioned by this factor.

Another is selection-procedure bias, which occurs in some clinical trials (CTs) in which the random assignment of subjects to the study groups is not respected (Manterola & Otzen, 2015). Another type of selection bias is loss-to-follow-up bias, which can occur especially in cohort studies when subjects from one of the study cohorts are lost totally or partially (≥ 20%), so that the pre-established follow-up cannot be completed, thus generating a relevant alteration in the results (Lazcano-Ponce et al., 2000; Manterola et al., 2013).


2.  Measurement bias

This type of bias occurs when a defect in measuring the exposure or the outcome generates different information between the study groups being compared. It is due to errors made in obtaining the required information once the eligible subjects are part of the study sample (classification of subjects with and without the EI, or of exposed and non-exposed).

In practice, it can present itself as the incorrect classification of subjects, variables or attributes into a category different from the one to which they should have been assigned. When the probability of misclassification is the same in all groups under study, it is called "non-differential misclassification".
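A classic consequence of non-differential misclassification of a binary exposure is that the measured association is pulled toward the null. The sketch below assumes an imperfect exposure measurement (sensitivity 0.8, specificity 0.9, invented values) applied identically to cases and controls:

```python
def odds(p: float) -> float:
    """Odds corresponding to a proportion p."""
    return p / (1 - p)


def observed_prevalence(p: float, sens: float = 0.8, spec: float = 0.9) -> float:
    """Measured exposure prevalence when the same imperfect instrument
    (assumed sensitivity 0.8, specificity 0.9) is applied to every group."""
    return sens * p + (1 - spec) * (1 - p)


# Invented true exposure prevalences in cases and controls:
p_cases, p_controls = 0.6, 0.3

true_or = odds(p_cases) / odds(p_controls)
obs_or = odds(observed_prevalence(p_cases)) / odds(observed_prevalence(p_controls))

print(f"true odds ratio:     {true_or:.2f}")  # 3.50
print(f"observed odds ratio: {obs_or:.2f}")   # pulled toward 1 (the null)
```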

What is your strategy for minimizing non-response bias?





Have you ever looked at audience data and thought that it doesn’t seem completely real or accurate? It could be the result of bias in the data. Bias in the data generates results that are not fully representative of the audience you are researching. It can happen intentionally or unintentionally, and is something you should take into account in your planning and strategy.

Before we continue, you might want to read these two articles about how we use and enrich our data sources in Audiense, and about data restrictions and how they work in the real world.

An example of data bias can be found in demographic and socioeconomic data. India's population is made up of 52% men and 48% women. Turning to social data, Internet penetration in the population is 49%. And looking at India's population in Facebook Insights, we see that the gender split is 76% men and 24% women! So which data is correct?

This shows us that there is an imbalance between how many men and women there are on social media compared to the number of men and women in the country. Simply put, we know that not the entire adult population in the world is on social media, so we are aware that the data we are working with will only be representative of the existing population on social media.

If we want to go deeper, we must remember that people can create various social profiles, such as private accounts or fan pages, and this can differ depending on the online community you are analyzing.

If you’re to deliver an effective survey, you’ll need to identify what you want to measure, the audience you want to target and your choice of distribution method to reach that audience.

However, if after all this careful planning you find that your survey response rate is much lower than you expected, you need to be asking yourself, what could be the cause of this?

Well, one of the biggest factors could be nonresponse bias. Read on to find out more about this issue, its causes, why it can be problematic and ways to reduce nonresponse bias in your own surveys.

What is nonresponse bias?

Nonresponse bias occurs when survey participants are unwilling or unable to respond to a survey question or an entire survey. While the reasons for nonresponse can vary from person to person, when respondents refuse to participate it can be a major source of error in your survey data, which can harm its accuracy.

If it’s to be considered a form of bias, then a source of error must be systematic in nature. And nonresponse bias is not an exception to this rule.

So, if a survey method or design is created in a way that makes it more likely for certain groups of potential respondents to refuse to participate or be absent during a surveying period, then it has created a systematic bias.

Consider the following example, where you're asking respondents for sensitive information as part of a survey that is looking to measure tax payment compliance.

In this scenario, citizens who do not properly follow tax laws are likely to be the most uncomfortable filling out this type of survey and the most likely to refuse. Consequently, this will bias the data towards the more law-abiding net sample, rather than the original sample.
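The effect is easy to quantify. The population size, compliance split, and response rates below are invented assumptions; the sketch only shows how differential nonresponse drags the observed rate away from the true one:

```python
# Invented population: 1000 taxpayers, 30% of whom are genuinely non-compliant.
non_compliant, compliant = 300, 700

# Assumed response rates: non-compliant citizens are less willing to answer.
responding_nc = non_compliant * 0.40  # 120 respond
responding_c = compliant * 0.80       # 560 respond

true_rate = non_compliant / (non_compliant + compliant)
observed_rate = responding_nc / (responding_nc + responding_c)

print(f"true non-compliance rate:     {true_rate:.1%}")      # 30.0%
print(f"observed non-compliance rate: {observed_rate:.1%}")  # biased toward the law-abiding
```

The survey would report far fewer non-compliant taxpayers than actually exist, even though every individual answer collected was honest.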

This nonresponse bias in surveys when requesting legally sensitive information has been proven to be even more extreme if the survey explicitly states that a government or another organisation of authority is collecting that data.

What causes nonresponse bias?

Besides requests for sensitive information, there are many more issues that can cause nonresponse bias.

Here are some of the key ones.

Poor survey design

From the length and presentation of your survey to how easy it is to understand and answer, there are a lot of survey design issues that can cause respondents to drop out and fail to complete your survey.

Subsequently, you need to make sure your survey is as clear, concise and engaging as you can make it.

Be sure to follow some survey design best practices to make sure your next survey is as good as it can be.

Incorrect target audience

One of the first things you need to think about in your survey project is the audience you're targeting.

Make sure that audience is relevant to the survey you’re looking to send out.

For example, if you were issuing a survey to canvass views about a new flavour of dog food, you wouldn't want to accidentally include cat owners in your survey distribution list.

Failed deliveries

Unfortunately, when you send your surveys, some will always end up going directly into a spam folder. However, if you've not set up your sending options correctly, you may not even know that your survey wasn't received, and it will simply be recorded as a nonresponse.

To help with this, some distribution options such as email enable open tracking, letting you know whether your email was opened, how many survey click-throughs you got, and who responded to your survey, so your records can be more accurate.


Refusal to respond

There can be a lot going on in people's lives, so, no matter how good your survey is, there will always be some people who will just say 'no' to completing it.

It could be a bad day or time for them, or they may just not want to do it. However, bear in mind that just because they said “no” today, it doesn’t mean they won’t take one of your surveys another time.

Accidental omission

Sometimes some people will simply forget to complete your survey.

While it’s difficult to prevent this from happening, in most cases this will only affect a smaller number of your nonresponses.



Why is nonresponse bias a problem?

The problem with nonresponse bias is that it can lead to inconclusive results, which prevents your survey from meeting its objective, no matter what your survey’s goal.

For example, let’s say you wanted to gather data about a particular product feature, to find out whether or not it was still adding value to your product.

If an insufficient number of your sample completed your survey, you might not have sufficient data to make an informed decision on whether to keep the feature as it is, improve it, or go in another direction completely.

Survey data is only at its most informative and useful when you’re able to see the complete picture of something. So, limiting your nonresponse bias not only has an impact on your survey responses, but on your decision making too.

How to reduce nonresponse bias

Having got up to speed with nonresponse bias, its causes and why it's a problem, we're sure many of you will be keen to know how to keep it to an absolute minimum in your surveys.

Well, read on for some tips about how to reduce nonresponse bias.

Keep your surveys simple and concise

Short and simple is the key here.

In fact, studies show that longer surveys lose, on average, more than three times as many respondents as surveys that take less than five minutes to complete.

The problem with including too many survey questions is that your customer may not finish their responses, or may not want to begin your survey in the first place. Consider making your survey no more than five minutes long, with 10 questions at most.

Pre-test your survey

It is really important to ensure that your survey and your survey invites run smoothly on any medium or device your respondents might use. Respondents are more likely to ignore survey requests that have long loading times or questions that don't fit the screen size they're looking at.

Consequently, it’s prudent to consider all the possible communications software and devices your survey may be run on and pre-test your surveys on each of these, to try and ensure it runs as smoothly on as many of these as you can.

Set participant expectations

To help minimise nonresponse bias, it’s good practice to communicate to your customer what they should expect from your survey, either through an earlier email or in your survey introduction message.

You need to outline your survey's goal, the approximate time it will take to complete, and any details about anonymity or confidentiality that apply to your survey.

Re-examine your survey timing and distribution methods

Your survey distribution timing and method can also make a difference to your volume of nonresponses.

When it comes to the best times to send your survey, there are lots of factors that can influence your success, from whether you're targeting an internal or external audience to whether it's B2B or B2C. It can be helpful to test out a range of different days and times and see what works best for you.

Similarly, different survey distribution methods can work better with different audience groups, whether you use email, SMS or a weblink, or have more success through social media or QR codes.

Once again, it can also be helpful to test out different channels with your audience to see what generates the best response rate, while minimising your nonresponses.

Offer an incentive to complete your survey

Always try to communicate to respondents how they will benefit from taking your survey. It could be as simple as telling them how their feedback will be used and the pain points this will solve for them moving forward.

Alternatively, if you’re targeting a consumer audience, you might like to offer a monetary incentive for them to complete your survey.

For example, you might like to offer a discount on a future purchase they make with you, or an incentive for referring a friend.



Issue reminders

Busy customers can easily put your survey on their to-do list, but then forget to complete it. So, being able to send a few reminders can be really beneficial in boosting the number of completed responses you're able to gather.

Carefully make a note of when you send reminders and be mindful to space them out, so you don’t harass people on your contact lists, especially those who’ve already completed your survey.

Remember to close the feedback loop

Be sure to thank those that complete your survey, letting them know how much you appreciate their time and feedback. And depending on the nature of your survey, you may give a brief indication of what you hope to do with that information.

Ultimately, when a respondent feels that they have been heard and appreciated, then they’ll be more likely to complete another one of your surveys in the future.

Get better results when sending your surveys

We hope you found this blog interesting. Having provided an overview of nonresponse bias, its causes and ways to reduce it, we hope you will be able to incorporate some of this advice into your own surveys.

While the extra checks and tasks are likely to add a bit more time to your survey project, the boost to your survey response numbers and the quality of your data should make up for it.