What is your strategy for minimizing non-response bias?

Have you ever looked at audience data and thought that it doesn’t seem completely real or accurate? It could be the result of bias in the data. Bias in the data generates results that are not fully representative of the audience you are researching. It can happen intentionally or unintentionally, and is something you should take into account in your planning and strategy.

Before we continue, you might want to read these two articles: how we use and enrich our data sources in Audiense, and data restrictions and how they work in the real world.

An example of data bias can be found in demographic and socioeconomic data. India’s population is made up of 52% men and 48% women. If we talk about social data, to begin with, Internet penetration among the population is 49%. And looking at India’s population in Facebook Insights, we see that the gender split is 76% men and 24% women! So which data is correct?

This shows us that there is an imbalance between how many men and women there are on social media compared to the number of men and women in the country. Simply put, we know that not the entire adult population in the world is on social media, so we are aware that the data we are working with will only be representative of the existing population on social media.
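The imbalance above can be quantified with a quick back-of-the-envelope calculation. This is a minimal sketch using the figures quoted in the text; the re-weighting approach is illustrative, not Audiense’s own method:

```python
# Hypothetical figures from the text: India's census gender split vs. the
# split observed in a social platform's insights tool.
census = {"men": 0.52, "women": 0.48}      # share of the population
platform = {"men": 0.76, "women": 0.24}    # share seen on the platform

# Representation ratio > 1 means the group is over-represented online.
ratios = {g: platform[g] / census[g] for g in census}
print(ratios)

# Weights that would re-balance platform data back toward the census split.
weights = {g: census[g] / platform[g] for g in census}
print(weights)  # women carry weight 2.0: each online woman "counts double"
```

Here women are represented online at only half their census share, so any unweighted platform statistic will skew toward men.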

If we want to go deeper, we must remember that people can create various social profiles, such as private accounts or fan pages, and this can differ depending on the online community you are analyzing.

If you’re to deliver an effective survey, you’ll need to identify what you want to measure, the audience you want to target and your choice of distribution method to reach that audience.

However, if after all this careful planning you find that your survey response rate is much lower than you expected, you need to be asking yourself, what could be the cause of this?

Well, one of the biggest factors could be nonresponse bias. Read on to find out more about this issue, its causes, why it can be problematic and ways to reduce nonresponse bias in your own surveys.

What is nonresponse bias?

Nonresponse bias occurs when survey participants are unwilling or unable to respond to a survey question or an entire survey. While the reasons for nonresponse can vary from person to person, when respondents refuse to participate it can be a major source of error in your survey data, which can harm its accuracy.

To be considered a form of bias, a source of error must be systematic in nature, and nonresponse bias is no exception to this rule.

So, if a survey method or design is created in a way that makes it more likely for certain groups of potential respondents to refuse to participate or be absent during a surveying period, then it has created a systematic bias.

Consider the following example, in which a survey measuring tax payment compliance asks respondents for sensitive information.

In this scenario, citizens who do not properly follow tax laws are likely to be the most uncomfortable filling out this type of survey, and therefore the most likely to refuse. Consequently, this biases the data towards a more law-abiding net sample than the original sample.

This nonresponse bias in surveys when requesting legally sensitive information has been proven to be even more extreme if the survey explicitly states that a government or another organisation of authority is collecting that data.

What causes nonresponse bias?

Besides requests for sensitive information, there are many more issues that can cause nonresponse bias.

Here are some of the key ones.

Poor survey design

From the length and presentation of your survey to how easy it is to understand and answer, there are many design issues that can cause respondents to drop out and fail to complete your survey.

Consequently, you need to make sure your survey is as clear, concise and engaging as you can make it.

Be sure to follow and include some survey design best practices, to make sure your next survey is as good as it can be.

Incorrect target audience

One of the first things you need to think about in your survey project is the audience you’re targeting.

Make sure that audience is relevant to the survey you’re looking to send out.

For example, if you were issuing a survey to canvass views about a new flavour of dog food, you wouldn’t want to accidentally include cat owners in your survey distribution list.

Failed deliveries

Unfortunately, when you send your surveys, some will always end up going directly into a spam folder. However, if you haven’t set up your sending options correctly, you may not even know that your survey wasn’t received, and it will simply be recorded as a nonresponse.

To help with this, some distribution options, such as email, offer open tracking to let you know whether your email was opened, how many survey click-throughs you got, and who responded to your survey, so your records can be more accurate.

Refusals

There can be a lot going on in people’s lives, so, no matter how good your survey, there’s always likely to be some people who will just say ‘no’ to completing it.

It could be a bad day or time for them, or they may just not want to do it. However, bear in mind that just because they said “no” today, it doesn’t mean they won’t take one of your surveys another time.

Accidental omission

Sometimes some people will simply forget to complete your survey.

While it’s difficult to prevent this from happening, in most cases this will only affect a smaller number of your nonresponses.

Why is nonresponse bias a problem?

The problem with nonresponse bias is that it can lead to inconclusive results, which prevents your survey from meeting its objective, no matter what your survey’s goal.

For example, let’s say you wanted to gather data about a particular product feature, to find out whether or not it was still adding value to your product.

If an insufficient number of your sample completed your survey, you might not have sufficient data to make an informed decision on whether to keep the feature as it is, improve it, or go in another direction completely.

Survey data is only at its most informative and useful when you’re able to see the complete picture of something. So, limiting your nonresponse bias not only has an impact on your survey responses, but on your decision making too.

How to reduce nonresponse bias

Having got up to speed with nonresponse bias, its causes and why it’s a problem, we’re sure many of you will be keen to know how to keep it to an absolute minimum in your surveys.

Well, read on for some tips about how to reduce nonresponse bias.

Keep your surveys simple and concise

Short and simple is the key here.

In fact, studies show that longer surveys lose, on average, more than three times as many respondents as surveys that take less than five minutes to complete.

The problem with including too many survey questions is that your customer may not finish their responses, or may not want to begin your survey in the first place. Consider keeping your survey to no more than five minutes, with 10 questions at most.

Pre-test your survey

It is really important to ensure that your survey and your survey invites run smoothly on any medium or device your respondents might use. Respondents are more likely to ignore survey requests that have long loading times or questions that don’t fit the screen size they’re viewing.

Consequently, it’s prudent to consider all the possible communications software and devices your survey may run on, and to pre-test your survey on each of these so it runs smoothly on as many of them as possible.

Set participant expectations

To help minimise nonresponse bias, it’s good practice to communicate to your customer what they should expect from your survey, either through an earlier email or in your survey introduction message.

You need to outline your survey’s goal, the approximate time it will take to complete, and any details about anonymity or confidentiality that apply to your survey.

Re-examine your survey timing and distribution methods

Your survey distribution timing and method can also make a difference to your volume of nonresponses.

Whether you’re targeting an internal or external, B2B or B2C audience, there are lots of factors that can influence the best times to send your survey. It can be helpful to test a range of different days and times to see what works best for you.

Similarly to altering your timings, different survey distribution methods can work better with different audience groups, whether you use email, SMS, a weblink, social media or QR codes.

Once again, it can also be helpful to test out different channels with your audience to see what generates the best response rate, while minimising your nonresponses.

Offer an incentive to complete your survey

Always try to communicate to respondents how they will benefit from taking your survey. It could be as simple as telling them how their feedback will be used and the pain points this will solve for them moving forward.

Alternatively, if you’re targeting a consumer audience, you might like to offer a monetary incentive for them to complete your survey.

For example, you might like to offer a discount on a future purchase they make with you, or an incentive for referring a friend.

Issue reminders

Busy customers can easily put your survey on their to-do list, but then forget to complete it. So, being able to send a few reminders can be really beneficial in boosting the number of completed responses you’re able to gather.

Carefully make a note of when you send reminders and be mindful to space them out, so you don’t harass people on your contact lists, especially those who’ve already completed your survey.

Remember to close the feedback loop

Be sure to thank those that complete your survey, letting them know how much you appreciate their time and feedback. And depending on the nature of your survey, you may give a brief indication of what you hope to do with that information.

Ultimately, when a respondent feels that they have been heard and appreciated, then they’ll be more likely to complete another one of your surveys in the future.

Get better results when sending your surveys

We hope you found this blog interesting. Having provided an overview of nonresponse bias, its causes and ways to reduce it, we hope you will be able to incorporate some of this advice into your own surveys.

While the extra checks and tasks are likely to add a bit more time on to your survey project, the boost to your survey response numbers and the quality of your data should make up for this.

How to best address potential confounding variables in your data collection?

Data Collection

Data is a collection of facts, figures, objects, symbols, and events gathered from different sources. Organizations collect data with various data collection methods to make better decisions. Without data, it would be difficult for organizations to make appropriate decisions, so data is collected from different audiences at various points in time.

For instance, an organization must collect data on product demand, customer preferences, and competitors before launching a new product. If data is not collected beforehand, the organization’s newly launched product may fail for many reasons, such as less demand and inability to meet customer needs. 

Although data is a valuable asset for every organization, it does not serve any purpose until analyzed or processed to get the desired results.

Data collection methods are techniques and procedures used to gather information for research purposes. These methods can range from simple self-reported surveys to more complex experiments and can involve either quantitative or qualitative approaches to data gathering.

Some common data collection methods include surveys, interviews, observations, focus groups, experiments, and secondary data analysis. The data collected through these methods can then be analyzed and used to support or refute research hypotheses and draw conclusions about the study’s subject matter.

What is a confounding variable?

The concept of confounding is probably one of the most important in epidemiology. Firstly, because much of the work carried out in this field of science consists precisely of trying to prevent it when designing research studies, or of controlling its effect when it appears in the research work carried out. Secondly, and specifically with regard to health professionals, because an adequate understanding of this phenomenon determines whether they can interpret, critically and correctly, the results of the many studies published in the scientific literature.

This review aims to explain, in a didactic manner and with the help of several examples, the concept of confounding; then to do the same with another important concept, effect modification (interaction); and finally to describe the differences between the two.

Concept

Although some antecedents can be found in Francis Bacon, the first author to explicitly address the issue of confounding was the British philosopher and economist John Stuart Mill (1806-1873). When referring to the criteria necessary for establishing a causal relationship, Mill pointed out the need to ensure that no factor was present whose effects could be confused with those of the agent one wanted to study.

Before defining the confounding phenomenon, it is necessary to describe the counterfactual approach to causal models. Given data on 15 newborns with neural tube defects (NTD) in a sample of 10,000 women with folic acid deficiency, we could ask ourselves whether the incidence of malformations is due to the folic acid deficiency. The question is important because, if the answer is affirmative, we would have a simple solution to the problem in our hands: for example, the fortification of foods with folic acid.

To answer this question, it is necessary to compare this group of women with another that has normal folic acid values. If the hypothesis that the deficiency increases the incidence of NTD in newborns is true, it would be logical to find fewer newborns with these malformations in the comparison group: with, say, 5 NTD cases in another 10,000 women, we would obtain a relative risk of 3, which would be interpreted as folic acid deficiency tripling the risk of a newborn being born with a neural tube defect.
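A minimal sketch of this relative-risk calculation, using the hypothetical figures from the example:

```python
def relative_risk(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk in the exposed group divided by risk in the unexposed group
    (cross-multiplied to avoid intermediate rounding)."""
    return (cases_exposed * n_unexposed) / (cases_unexposed * n_exposed)

# 15 NTD cases per 10,000 women with folic acid deficiency vs.
# 5 NTD cases per 10,000 women with normal values -> RR = 3.
rr = relative_risk(15, 10_000, 5, 10_000)
print(rr)  # 3.0
```

An RR of 3 says the exposed group’s risk is three times the unexposed group’s, exactly as interpreted in the text.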

However, we cannot rule out that this second group of women, with normal folic acid values, also presents certain healthy characteristics, such as a better diet in general, a better genetic makeup or a low prevalence of risk factors such as tobacco or alcohol. It would therefore be reasonable to conclude that the lower incidence of NTD in these women may be due to two different phenomena: normal folic acid values, but also healthier habits and conditions, in which case the relative risk of 3 would be an overestimation of the harmful effect of low folic acid concentrations in pregnant women.

Intuitively, the most perfect procedure to determine the effect of folic acid deficiency would be to compare the first 10,000 women, who had insufficient folic acid values, with themselves under the assumption that they had normal folic acid concentrations. In this case, the two groups would differ only in their exposure to folic acid, and the measure of association would really be attributable to the deficiency. However, this comparison group is not possible in practice: it is not “feasible”, it goes against the facts (each woman either has or does not have an adequate folic acid value), and for this reason it has been called a “counterfactual group”.

If there is an association between folic acid deficiency and NTDs, among these “counterfactual” women with normal folic acid concentrations we would probably obtain a figure lower than 15, but greater than the hypothetical figure of 5 presented previously; we might obtain, for example, the hypothetical number of 10.

Identification

In general terms, we speak of confounding when there are important differences between the raw estimates of an association and those adjusted for possible confounding factors. These differences can be assessed following various criteria, although there is a certain consensus on the importance of assessing the effect that the adjustment has on the magnitude of the changes in the association measures. Thus, a factor can be considered a confounder when its adjustment is responsible for a change of at least 10% in the magnitude of the difference between the adjusted and raw estimates.
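The 10% change-in-estimate rule described above can be sketched in a few lines of code; the threshold and figures below are illustrative:

```python
def is_confounder(crude_rr, adjusted_rr, threshold=0.10):
    """Flag a variable as a confounder when adjusting for it changes
    the association measure by at least `threshold` (relative change)."""
    change = abs(crude_rr - adjusted_rr) / crude_rr
    return change >= threshold

# A crude RR of 3.0 that drops to 2.0 after adjustment is a ~33% change.
print(is_confounder(3.0, 2.0))  # True
# A drop from 3.0 to 2.9 is only a ~3% change, below the 10% criterion.
print(is_confounder(3.0, 2.9))  # False
```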

Before carrying out these comparisons, it is necessary to estimate the adjusted values. The most classic method for obtaining adjusted association measures is the one presented previously, which consists of recalculating new estimates within each stratum of the possibly confounding variable. It is easy to see that, when we want to assess several confounding factors simultaneously (e.g., age categorized into two groups, sex, and intake of a given food as a dichotomous variable), we quickly reach the situation where we lack enough subjects in each stratum to obtain a valid estimate (we would obtain 8 strata with the variables just listed).
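The stratum explosion described above can be made concrete; the variable categories below are the hypothetical ones just listed:

```python
from itertools import product

# Three dichotomous adjustment variables already define 2 * 2 * 2 = 8 strata,
# each of which needs enough subjects for a valid stratum-specific estimate.
age = ["<50", ">=50"]
sex = ["F", "M"]
intake = ["low", "high"]

strata = list(product(age, sex, intake))
print(len(strata))  # 8

# With a sample of 200 spread evenly, each stratum holds only ~25 subjects.
print(200 // len(strata))  # 25
```

Each additional dichotomous variable doubles the number of strata, which is exactly why the text turns to multivariate analysis next.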

A more efficient option for considering the confounding role of several variables simultaneously is multivariate analysis. This is a complex procedure, carried out more or less automatically by statistical programs, which consists of obtaining, from an initially large number of variables, the set of variables (called independent variables, covariates or predictor variables) most intensely associated with the outcome of interest (the dependent variable). This set of variables constitutes what we call the “multivariate statistical model.” From this model we obtain the measures of association for the different variables that make it up, with the additional advantage that each of these estimates is adjusted for the other variables in the model.

Depending on the scale of the variable that quantifies the outcome, different types of multivariate models are used: multiple linear regression (quantitative outcome), logistic regression (dichotomous outcome), Cox regression (survival function as the outcome of interest) or Poisson regression (outcomes in the form of rates).

The main advantage of multivariate analysis over stratified analysis is that multivariate models are more efficient. That is, given the same sample size used, more precise estimates are obtained and with a greater number of variables than would be admissible in a stratified analysis.

When estimating these models to identify confounding variables, it is recommended to choose any variable that, while meeting the general criteria for confounding variables (criteria summarized in acyclic diagrams), is responsible for a change of more than 10% between the crude measure of association (without that variable in the model) and the adjusted one (with that variable included in the model), and that presents a conservative level of significance (p value) of approximately less than 0.20.

Because confounding factors introduce, by definition, a bias in the measures of association, it is evident that an attempt should be made to prevent and control this effect before presenting the definitive results of an investigation. Confounding factors can be prevented in the design phase or eliminated in the analysis phase of an epidemiological study.

Why Are Confounding Variables Important?

A quantitative study can be an investment of significant time and money. You must have confidence that your study will be reliable, and its results will be valid. Studies should be constructed such that they could be repeated, with an expectation of the same result.

When a research study has low bias and a high level of repeatability and control, it has high internal validity — in other words, a study is internally valid if it does not bias a participant towards any specific answer or action.

If your research study has significant confounding variables, then the conclusions from that study may be wrong. Making decisions based on these misguided conclusions can result in significant loss of time and money for organizations.

Best Practices for Avoiding Confounding Variables

  • Use within-subject study designs when possible. Counterbalance or randomize the order in which participants are exposed to the different conditions in your study. For example, if participants are testing two designs, randomly decide which one each participant tests first. Within-subjects designs reduce sources of error and naturally counterbalance experimental conditions.
  • Randomly assign condition groups for between-subjects study designs. For example, randomly decide which design should be seen by a participant.
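The two practices above can be sketched as follows; the design names are hypothetical:

```python
import random

designs = ["design_A", "design_B"]

def within_subjects_order(rng=random):
    """Within-subjects: each participant sees both designs, in random order."""
    order = designs.copy()
    rng.shuffle(order)
    return order

def between_subjects_group(rng=random):
    """Between-subjects: each participant is randomly assigned one design."""
    return rng.choice(designs)

print(within_subjects_order())   # e.g. ['design_B', 'design_A']
print(between_subjects_group())  # e.g. 'design_A'
```

Randomizing order (or group) spreads any order effect or participant difference evenly across conditions, so it cannot masquerade as a design effect.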

What best steps to take to ensure the representative sample?

What is a representative sample?

A representative sample is a sample of relatively appropriate size, selected by random procedures, in which the observed characteristics correspond to those of the population from which it was drawn (Ras, 1980; Cochran, 1976; Scheaffer, Mendenhall and Ott, 1987). It is never possible to be certain of the degree of representativeness; rather, there is a reasonable probability of representativeness.

Representativeness

It is a function of several factors: representativeness depends not only on the randomness and size of the sample, but also on the sampling design (particular to each case), the use of key auxiliary information, and a useful, up-to-date sampling frame. The term “representative” is used as long as the sample faithfully represents the variable under study, which has a probabilistic distribution in the population; the frequency distribution in the sample must mirror, or be very similar to, that of the population.

This highlights how complex selecting a representative sample is. The following must be taken into account: how the sample is selected; the estimators to be proposed and their precision; and the determination of a sample size that accounts for the accuracy (margin of error) allowed, the confidence level of the estimate, and the variability of the variable on which the probabilistic inference is to be carried out.
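The interplay between margin of error, confidence level and variability can be sketched with the standard sample-size formula for a proportion. This is a simplified illustration that ignores finite-population corrections and design effects:

```python
import math

def sample_size(margin_of_error, confidence_z=1.96, p=0.5):
    """Minimum n to estimate a proportion p within the given margin of error.
    p = 0.5 is the most conservative (maximum variability) assumption;
    z = 1.96 corresponds to 95% confidence."""
    return math.ceil((confidence_z**2 * p * (1 - p)) / margin_of_error**2)

print(sample_size(0.05))  # 385 respondents for +/-5% at 95% confidence
print(sample_size(0.03))  # 1068 -- a tighter margin needs a larger sample
```

Note how halving the margin of error roughly quadruples the required sample, since n grows with 1/e².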

Likewise, attention must be paid to the available sampling frame and to the set of key auxiliary variables or covariates correlated with the variables of interest. These allow the sampling design to be improved: forming strata; selecting direct estimators, such as the Horvitz-Thompson estimator, and indirect ones (ratio, regression and difference); choosing a sample size appropriate to a given precision; selecting samples with probabilities proportional to a measure of size (PPS); and using calibrated estimates, where sampling weights are adjusted according to nonresponse and the auxiliary information available, especially in complex samples.

When designing a probabilistic sample, concessions must often be made, especially if the statistical population is asymmetric; at times, elements with probability one (1) of belonging to the sample are even included, and if this is not done the sample will not be sufficiently representative.

A probabilistic sample approaches a greater degree of what is called representativeness as the distance between the sample estimate and the value of the population parameter becomes smaller; this is known as accuracy in statistical inference.

We can be in the presence of a sufficiently representative sample when the selection process assigns each element, in advance, a probability of inclusion that is different from zero and known (though not necessarily equal for every element of the population), and, furthermore, when the sampling error is low, accuracy exists, and a random process is used in the selection.

The best definition of a sufficiently representative sample is one obtained with a probabilistic sampling strategy that allows the parameter to be estimated with accuracy, minimum bias, and the minimum standard error of the estimator of that parameter, or the minimum estimation error, which is a multiple of that standard error.

Importance of having a representative sample

Representative samples are known to produce results, knowledge and observations that can be relied upon as representative of the broader population being studied. Therefore, representative sampling is usually the best method for market research.

If we do not have representation, we will surely have data that will be of no use to us. Therefore, it is important that we guarantee that the characteristics that matter to us and need to be investigated are found in the sample that is going to be the object of study.

Let’s take into account that we will always be prone to falling into  sampling bias  because there will always be people who do not answer the survey because they are busy, or answer it incompletely, so we will not be able to obtain the data we require.

Regarding the size of the sample, the larger it is, the more likely it is to be representative of the population.
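A quick way to see why larger samples tend to be more representative is to compute the margin of error for an estimated proportion at different sample sizes. This is a simplified illustration assuming simple random sampling:

```python
import math

def margin_of_error(n, confidence_z=1.96, p=0.5):
    """95% margin of error for an estimated proportion p with sample size n."""
    return confidence_z * math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 3))
# The margin of error shrinks as n grows: quadrupling the sample halves it.
```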

A representative sample gives us greater certainty that the people included are the ones we need, and it also reduces possible bias. Therefore, if we want to avoid inaccuracy in our surveys, we must have a representative and balanced sample.

How to obtain a representative sample?

There are established sampling methods for obtaining a representative sample that have been tested and verified over time through academic, scientific and market research.

The  most common types of sampling  are probability or random sampling and non-probability sampling.

Probability sampling

If we are going to use probabilistic or random sampling, we must make sure we have updated information on the population from which we will draw the sample, and survey enough of it to ensure representativeness.

The sample will be chosen at random, which guarantees that each member of the population will have the same probability of selection and inclusion in the sample group.
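A minimal sketch of simple random selection from a hypothetical frame, where every member has the same inclusion probability:

```python
import random

# Hypothetical sampling frame of 1,000 population members.
population = [f"member_{i}" for i in range(1, 1001)]

# Draw 100 members at random, without replacement.
sample = random.sample(population, k=100)

print(len(sample))       # 100
print(len(set(sample)))  # 100 -> no duplicates
# Each member's inclusion probability is k / N = 100 / 1000 = 0.1.
```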

Non-probability sampling

In non-probabilistic sampling, the aim is to include different types of people to ensure a more balanced, representative sample.

Knowing the demographic characteristics of our group will undoubtedly help to limit the profile of the desired sample and define the variables that interest us, such as gender, age, place of residence, etc. 

By knowing these criteria, before obtaining the information, we can have the control to create a representative sample that is useful to us.
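A quota-style selection along known demographic criteria can be sketched like this; the frame and quotas are hypothetical:

```python
import random
from collections import Counter

# Hypothetical frame tagged with a demographic attribute (gender).
frame = [("F", f"person_{i}") for i in range(500)] + \
        [("M", f"person_{i}") for i in range(500, 1000)]
quotas = {"F": 50, "M": 50}  # target: a balanced 100-person sample

sample = []
for group, quota in quotas.items():
    # Fill each quota by drawing only from that group's members.
    members = [person for g, person in frame if g == group]
    sample.extend((group, person) for person in random.sample(members, quota))

print(Counter(group for group, _ in sample))  # F and M: 50 each
```

Fixing quotas up front is what gives this non-probability approach its control over the sample profile.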

We must avoid having a sample that does NOT reflect the target population; the ideal is to have data that is as accurate as possible for the success of our project.

Avoid making sampling errors

When a sample is not representative, we will have a sampling error. If we want a representative sample of 100 employees, we must choose a similar number of men and women; a sample biased towards one gender will produce a sampling error.

Sample size is very important, but it does not guarantee that the population we need is accurately represented. More than size, representativeness is related to the sampling frame, that is, to the list from which the people who will take part in, for example, a survey are selected.

Therefore, we must ensure that people from our  target audience  are included in that list to say that it is a representative sample.

 

Are you using best primary and secondary data, and why?

Primary and secondary data

To analyze the similarities and differences between primary and secondary data sources, it helps to know what a market study entails. As a Statista report explains, “a market study is an important business strategy that requires the collection of information about a target market for the company.”

That is, before launching a new line of products or services, growing your team or establishing a social media campaign, you will most likely need to conduct market research. Collecting primary and secondary data ensures that you can make informed decisions to save time and money.

On the one hand,  primary data is the information collected directly from the source of interest; in this case, the potential client . Normally, when a company needs primary marketing data, it does so to determine the viability of a product or service and analyze the buyer persona, market offers, investment risk, among other factors. Primary data collection methods include customer interviews, surveys,  focus groups , etc.

On the other hand, unlike primary data,  secondary data is based on information researched by other companies, institutions or platforms . Secondary data sources are usually public. Some of these include newspapers, government websites, media agencies, etc. In order for it to serve the company’s purposes, the collection of secondary data has to go through an analysis by the marketing team that selects the most convenient information.

Primary data types

To begin to understand the characteristics of primary and secondary data, we must start from their types. That way,  data analysis in marketing  makes sense and can help you plan market research. In the case of primary data, there are two types that you must take into account.

Quantitative primary data

As you might guess from the name, the importance of quantitative primary data lies in quantities and numbers. Those conducting primary data research focus on the mathematical data rather than on subjective signals from people’s behaviors or opinions.

The information from primary data allows us to understand a problem in the market and analyze its implication for the consumer. By obtaining this numerical information, a company’s marketing team and managers can make decisions based on clear and objective realities.

Qualitative primary data

Numbers and statistics are not necessary here. On the contrary, qualitative primary data offers information about the behaviors and emotions of potential customers of a product or service, drawn from audio, texts, videos or other formats for collecting opinions.

It is common for this type of data to be obtained from direct conversations with people of interest. Interviews are usually conducted with open-ended questions and other data collection techniques.

Primary data sources

In the process of obtaining primary and secondary data, locating the sources is the most important step. If you’re part of a marketing team with research underway, here’s a list of primary data sources to consider.

  • Surveys:  serve to gather direct information on the problems, needs and preferences of the potential client. Survey information thus makes it possible to predict consumer behavior during the sales phases.
  • Questionnaires:  this format includes open and closed questions to determine how a potential client rates a brand, company or product. They can be delivered in different formats, such as phone calls, SMS, email, etc.
  • Interviews:  these are conversations conducted by a marketing or market research specialist. They tend to be long, in-depth conversations with people interested in a brand or product. Given their character, they offer quality information from verbal responses and even body language.
  • Website:  web analytics for a company's website is also a primary data source. From the SEO and user-experience analysis carried out by the corresponding teams, valuable information can be obtained for the content marketing strategy, the creation of new products and the customer profile.
  • Social networks:  brand accounts on Instagram, Facebook, Twitter and other social networks are sources of a great deal of information for any marketing team. There, existing customers and interested parties voice their opinions, objections and ideas, which can be used to make decisions about advertising campaigns and organic content. In other words, primary data has a lot of relevance in a company's social media marketing.

Secondary data types

If you are wondering  how to do a complete market study , you also need to take into account the types of secondary data. These include the information that can be found in the company's internal files and that which comes from outside.

Internal secondary data

Although this concept seems to reflect a similarity between primary and secondary data, the truth is that internal secondary data is information that is not collected directly from the ideal client or the specific audience at their current moment. These data are obtained from the internal and archived records of a company, such as reports from past advertising campaigns, accounts of former clients, accounting data, etc.

External secondary data

This is the information commonly found  outside of company records and files . It has been prepared and published by external sources for objectives that do not directly relate to commercial research objectives. That is, as we mentioned before, secondary data comes from studies by competitors, publications of public organizations, etc.

Secondary data sources

Just as with primary data sources, secondary data also exists in different formats. The nature of each of these sources and the information collected must be analyzed to select what is useful and what is not.

  • Government studies and reports:  many governments around the world carry out in-depth studies on the behavior of their populations, demographics and the sociocultural changes that occur in them. This offers a lot of valuable secondary data for any business’s marketing research.
  • Private market research:  using the power of  data science , there are many companies that are dedicated to conducting user and consumer behavior studies. These usually sell this information to other companies that can leverage the value of this secondary data for their objectives.
  • Sales data:  the data obtained from sales processes is very important for any commercial organization. You can find useful secondary data in invoices, returned products, order documents, and delivery experience. 
  • Competitor platforms:  in a secondary data investigation you cannot forget the information found on competitor pages, applications and reports. In them, you can locate information about potential customers, unresolved pain points, and gaps in your value proposition.
  • Web search engines:  Search engines like Google also offer invaluable data. Not only because they list the competition’s organic results for their websites, but because they allow you to see an overview of the advertising spending that these other companies make in search of the same type of client.

How to best handle outliers or anomalous data points?

DATA

Data is a collection of facts, figures, objects, symbols, and events gathered from different sources. Organizations collect data with various data collection methods to make better decisions. Without data, it would be difficult for organizations to make appropriate decisions, so data is collected from different audiences at various points in time.

Outlier detection in the field of data mining (DM) and knowledge discovery from data (KDD) is of great interest in areas that require decision support systems, such as finance, where DM can detect fraud or find errors produced by users. It is therefore essential to evaluate the veracity of the information through methods that detect unusual behavior in the data.

This article proposes a method to detect values considered outliers in a database of nominal-type data. The method implements a global k-nearest-neighbors algorithm, the clustering algorithm  k-means  and the chi-square statistical method. These techniques were applied to a database of clients who have requested financial credit. The experiment was performed on a data set of 1180 tuples into which outliers were deliberately introduced; the results showed that the proposed method detects all of the introduced outliers.

Detecting outliers represents a challenge for data mining techniques. Outliers, also called anomalous values, have properties that set them apart from the general data: by the nature of their values, their behavior does not resemble that of the majority. Anomalous values are also susceptible to being introduced by malicious mechanisms ( Atkinson, 1981 ).

Mandhare and Idate (2017)  consider this type of data to be a threat and define it as irrelevant or malicious. Additionally, this data creates conflicts during the analysis process, resulting in unreliable and inconsistent information. However, although anomalous data are irrelevant for finding patterns in everyday data, they are useful as an object of study in cases where, through them, it is possible to identify problems such as financial fraud through an uncontrolled process.

What is anomaly detection?

Anomaly detection examines specific data points and detects unusual occurrences that appear suspicious because they are different from established patterns of behavior. Anomaly detection is not new, but as the volume of data increases, manual tracking is no longer practical.

Why is anomaly detection important?

Anomaly detection is especially important in industries such as finance, retail, and cybersecurity, but all businesses should consider implementing an anomaly detection solution. Such a solution provides an automated means to detect harmful outliers and protects data. For example, banking is a sector that benefits from anomaly detection. Thanks to it, banks can identify fraudulent activity and inconsistent patterns, and protect data. 

Data is the lifeline of your business, and compromising it can put your operation at risk. Without anomaly detection you could lose revenue and brand value that took years to cultivate, suffer security breaches and lose confidential customer information. If this happens, you risk losing a level of customer trust that may be irretrievable. 

The detection process using data mining techniques facilitates the search for anomalous values ( Arce, Lima, Orellana, Ortega and Sellers, 2018 ). Several studies show that most of this type of data originates in domains such as credit cards ( Bansal, Gaur, & Singh, 2016 ), security systems ( Khan, Pradhan, & Fatima, 2017 ) and electronic health information ( Zhang & Wang, 2018 ).

The detection process relies on tools based on unsupervised algorithms (Onan, 2017) and follows one of two approaches: local or global ( Monamo, Marivate, & Twala, 2017 ). Global approaches comprise techniques in which each anomaly is assigned a score relative to the entire data set. Local approaches, on the other hand, score anomalies with respect to their direct neighborhood, that is, the data that are close in terms of the similarity of their characteristics.

According to these concepts, the local approach detects outliers that are missed when a global approach is used, especially those in regions of variable density (Amer and Goldstein, 2012). Examples of such algorithms are those based on i) clustering and ii) nearest neighbors. The first category considers outliers to lie in sparse neighborhoods, far from the nearest neighbors, while the second operates on grouped (clustered) data ( Onan, 2017 ).

There are several approaches to outlier detection. In this context,  Hassanat, Abbadi, Altarawneh, and Alhasanat (2015)  carried out a survey summarizing the different outlier detection studies: the statistics-based approach, the distance-based approach and the density-based approach. The authors present a discussion of outliers and conclude that the k-means algorithm is the most popular for clustering a data set.

Furthermore, other studies (Dang et al., 2015; Ganji, 2012; Gu et al., 2017; Malini and Pushpa, 2017; Mandhare and Idate, 2017; Sumaiya Thaseen and Aswani Kumar, 2017; Yan et al., 2016) use data mining techniques, statistical methods or both. For outlier detection, nearest neighbor (KNN) techniques have commonly been applied along with others to find unusual patterns in the behavior of data or to improve process performance. Gu et al. (2017) present an efficient grid-based method for finding outlier data patterns in large data sets.

Similarly, Yan et al. (2016) propose an outlier detection method with KNN and data pruning, which takes successive samples of tuples and columns, and applies a KNN algorithm to reduce dimensionality without losing relevant information.

Classification of significant columns

To classify the significant columns, the chi-square statistic was used. Chi-square is a non-parametric test used to determine whether a distribution of observed frequencies differs from expected theoretical frequencies ( Gol and Abur, 2015 ). The weight of the input column (columns that determine the customer profile) is calculated in relation to the output column (credit amount). The higher the weight of a column corresponding to the input columns on a scale of zero to one, the more relevant it is considered.

That is, the closer the weight value is to one, the more important the relationship with the output column will be. The statistic can only be applied to nominal-type columns and was selected as the method to define relevance. Chi-square reports a level of significance for the associations or dependencies and was used as a hypothesis test on the weight or importance of each column with respect to the output column S. The resulting value is stored in a column called  weight , which together with the anomaly score is reported at the end of the process.
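As an illustration only, a chi-square weight in the zero-to-one range could be computed along these lines in Python. The column names and data are hypothetical, and Cramér's V is one standard rescaling; the article does not spell out its exact weight formula.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical nominal columns: a customer-profile attribute and the
# output column S (credit amount bucket); names are illustrative only
df = pd.DataFrame({
    "employment": ["salaried", "self-employed"] * 50,
    "S":          ["high", "low"] * 50,
})

# Contingency table of input column vs output column S
table = pd.crosstab(df["employment"], df["S"])
chi2, p, dof, _ = chi2_contingency(table)

# Rescale the association to [0, 1] with Cramer's V (an assumption,
# since the article does not give the exact weight formula)
n = table.to_numpy().sum()
weight = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(round(weight, 2), p < 0.001)
```

A weight near one, as here, signals a strong dependence between the profile column and the output column.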

Local nearest neighbor scoring

To obtain the values with suspected abnormality, the k-NN global anomaly score is used. KNN is based on the k-nearest-neighbor algorithm, which calculates the anomaly score of each data point relative to its neighborhood. Usually, outliers are far from their neighbors or their neighborhood is sparse. The first case is known as global anomaly detection and is identified with KNN; the second refers to an approach based on local density.

The score is, by default, the average of the distances to the nearest neighbors ( Amer and Goldstein, 2012 ). In k-nearest-neighbor classification, the output column  S  of the nearest neighbor in the training dataset is assigned to a new, unclassified data point during prediction, which implies a piecewise linear decision boundary.

To obtain a correct prediction, the value k (the number of neighbors considered around the analyzed value) must be carefully configured. A high value of k yields a poor prediction, while low values tend to generate noise ( Bhattacharyya, Jha, Tharakunnel, & Westland, 2011 ).

Frequently, the parameter k is chosen empirically and depends on each problem.  Hassanat, Abbadi, Altarawneh, and Alhasanat (2014)  propose testing different numbers of nearest neighbors until reaching the best precision, starting with values from k = 1 up to k = the square root of the number of tuples in the training dataset. A common rule of thumb is to set k to the square root of the number of tuples in dataset D.
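The global k-NN anomaly score described above, together with the square-root rule of thumb for k, can be sketched in a few lines of Python. The data set here is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data set: 200 ordinary tuples plus one injected outlier
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 3)),
               [[9.0, 9.0, 9.0]]])

# Rule of thumb from the text: k is the square root of the tuple count
k = int(np.sqrt(len(X)))  # 14 for 201 tuples

# Global k-NN anomaly score: average distance to the k nearest neighbors
d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
d.sort(axis=1)                 # column 0 is each point's distance to itself
scores = d[:, 1:k + 1].mean(axis=1)

print(int(np.argmax(scores)))  # the injected outlier gets the top score
```

The pairwise-distance matrix keeps the sketch dependency-free; for large data sets a spatial index (e.g. a k-d tree) would replace it.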

HOW CAN WE SOLVE THE PROBLEM OF ATYPICAL DATA?

If we have confirmed that these outliers are not due to an error in constructing the database or in measuring the variable,  eliminating them is not the solution . Removing or replacing such values can distort the inferences made from the data: it introduces bias, reduces the sample size, and can affect both the distribution and the variances.

Furthermore,  the treasure of our research lies in the variability of the data!

That is, variability (differences in the behavior of a phenomenon) must be explained, not eliminated. And if you still can’t explain it, you should at least be able to reduce the influence of these outliers on your data.

The best option is to reduce the weight of these atypical observations using  robust techniques .

Robust statistical methods are modern techniques that address these problems. They are similar to the classic ones but are less affected by the presence of outliers or small variations with respect to the models’ hypotheses.

ALTERNATIVES TO THE MEAN

If we calculate the  median (the central value of an ordered sample)  for the second data set, we get a value of 14 (the same as for the first data set). This centrality statistic is not disturbed by the presence of an extreme value and is therefore more robust.
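The effect is easy to reproduce with a small hypothetical sample (the article's own data sets are not shown here): one extreme value drags the mean but leaves the median untouched.

```python
import statistics

# Hypothetical samples: the second set equals the first
# except for one extreme value
first  = [12, 13, 14, 15, 16]
second = [12, 13, 14, 15, 60]

print(statistics.mean(first),  statistics.median(first))   # 14 14
print(statistics.mean(second), statistics.median(second))  # 22.8 14
```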

Let’s look at other alternatives…

The  trimmed mean (trimming)  “discards” extreme values: it removes a fraction of the extreme data (e.g. 20%) from the analysis and calculates the mean of the remaining data. In our case the trimmed mean would be 13.67.

The  winsorized mean  progressively replaces a percentage of the extreme values (e.g. 20%) with less extreme ones. In our case, the winsorized mean of the second sample would be 13.62.

We see that all of these robust estimates better represent the sample and are less affected by extreme data.
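Both estimators are available in SciPy. On a small hypothetical sample with one extreme value (again, not the article's own data), trimming and winsorizing at 20% pull the estimate back toward the bulk of the data:

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

# Hypothetical sample with one extreme value
data = np.array([12, 13, 14, 15, 60], dtype=float)

trimmed = trim_mean(data, 0.2)              # drop 20% of values at each end
wins = winsorize(data, limits=[0.2, 0.2])   # clip 20% at each end instead

print(trimmed, wins.mean())  # both far closer to 14 than the raw mean 22.8
```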

What role does technology play in your best data collection process?

Data Collection

Data collection is a systematic process of gathering observations or measurements. Whether you are performing research for business, governmental or academic purposes, data collection allows you to gain first-hand knowledge and original insights into your research problem.

While methods and aims may differ between fields, the overall process of data collection remains largely the same. Before you begin collecting data, you need to consider:

  • The aim of the research
  • The type of data that you will collect
  • The methods and procedures you will use to collect, store, and process the data

To collect high-quality data that is relevant to your purposes, follow these four steps.

Data collection is the process of gathering data for use in business decision-making, strategic planning, research and other purposes. It’s a crucial part of data analytics applications and research projects: Effective data collection provides the information that’s needed to answer questions, analyze business performance or other outcomes, and predict future trends, actions and scenarios.

In businesses, data collection happens on multiple levels. IT systems regularly collect data on customers, employees, sales and other aspects of business operations when transactions are processed and data is entered. Companies also conduct surveys and track social media to get feedback from customers. Data scientists, other analysts and business users then collect relevant data to analyze from internal systems, plus external data sources if needed. The latter task is the first step in data preparation, which involves gathering data and preparing it for use in business intelligence (BI) and analytics applications.

For research in science, medicine, higher education and other fields, data collection is often a more specialized process, in which researchers create and implement measures to collect specific sets of data. In both the business and research contexts, though, the collected data must be accurate to ensure that analytics findings and research results are valid.

What are information technologies?

Information technology is a process that uses a combination of means and methods of collecting, processing and transmitting data to obtain new, quality information about the state of an object, process or phenomenon. The purpose of information technology is the  production of information  for people to analyze, and on which to base decisions for performing an action.

Information technologies (IT)

The introduction of a personal computer in the information sphere and the application of telecommunications media have determined a new stage in the development of  information technology . Modern IT is an information technology with a “friendly” user interface using personal computers and telecommunication facilities. The new information technology is based on the following basic principles.

  1. Interactive (dialogue) mode of working with a computer.
  2. Integration with other software products.
  3. Flexibility in the process of changing data and task definitions.

As a set of  information technology tools , many types of computer programs are used: word processors, publishing systems, spreadsheets, database management systems, electronic calendars, functional purpose information systems.

Characteristics of information technologies:

  • User operation in  data manipulation mode  (without programming). The user does not need to know or memorize commands; they see (output devices) and act (input devices).
  • Transversal information support  at all stages of information transmission, backed by an integrated database that provides a single way to enter, search, display, update and  protect information .
  • Paperless document processing , in which only the final version of a document is committed to paper; intermediate versions and necessary data are delivered to the user on the PC display screen.
  • Interactive (dialogue) task-solving mode  with a wide range of possibilities for the user.
  • Collective production of a document  by a group of computers linked by means of communication.
  • Adaptive processing  of the form and modes of presentation of information in the problem-solving process.

Types of information technologies

The main  types of information technology  include the following.

  • Information technology for data processing is designed to solve well-structured problems whose solution algorithms are well known and for which all necessary input data exist. This technology is applied at the level of low-skilled personnel in order to automate routine, constantly repeated administrative operations.
  • Management information technology is intended for the information service of all company employees involved in making administrative decisions. Here the information usually takes the form of ordinary or special management reports and contains information about the past, present and possible future of the company.
  • Automated office information technology is designed to complement the company's existing staff communication system. Office automation supports communication processes both within the company and with the external environment on the basis of computer networks and other modern means of transferring and working with information.
  • Information technology for decision support is designed to develop a management decision through an iterative process involving a decision support system (the computer link and the object of management) and a person (the management link, who sets input data and evaluates the result).
  • Expert systems information technology is based on the use of  artificial intelligence . Expert systems allow managers to receive expert advice on any problem about which knowledge has been accumulated in these systems.

The use of modern technology is more economical than ever, and electronic tools now offer a cost-effective alternative to paper questionnaires for collecting high-quality data. To help you decide whether computer-assisted personal interviewing (CAPI) is for you, this blog reviews the potential benefits and challenges of using CAPI and shares a recent survey experience in Guyana, in which the free Survey Solutions software was used.

Paper questionnaires: the traditional way to collect data

Conducting surveys of this magnitude with paper questionnaires can be costly in economic, administrative and logistical terms while presenting a series of challenges: printing and transporting questionnaires to and from the field often carries a high cost, and corrections to questions can represent a significant burden in time and money. There is also the real risk that questionnaires will be lost in the field or damaged by weather or transportation before the data is systematized.

Even when all interviews have been conducted, responses must be manually entered into a digital file before the data can be analyzed. This process represents a lot of time and manual work and increases the margin of error. Data quality checks are limited, and errors are sometimes only recognized after the survey has ended, making them more difficult to correct.

However, there is an alternative to paper questionnaires: computer-assisted personal interview (CAPI). In recent years, CAPI has attracted more attention as it presents a more economical way to collect high-quality data.

CAPI: an increasingly popular tool

With the new processing speeds of today’s computers, the increasing global availability of Internet service and the falling prices of mobile devices, CAPI has become increasingly attractive. The CAPI tool creates the questionnaire using special software that can be downloaded directly to a mobile device (usually a smartphone or tablet), which the interviewer uses to administer and fill out the questionnaire. Information from these questionnaires is uploaded to a central server where it can be accessed and reviewed remotely.

Depending on the size of the survey sample, purchasing tablets to complete electronic surveys can become more affordable than printing paper questionnaires. The technical requirements for such devices are relatively low, and a large number of questionnaires can usually be saved on a device without danger of running out of storage. Additionally, once a questionnaire has been entered on a mobile device, it can be modified: if an error is detected in the early stages of the survey, it can be easily corrected without incurring additional printing costs.

How do you best standardize your data collection procedures?

Data Collection

Data collection is the process of collecting and analyzing information on relevant variables in a predetermined, methodical way so that one can respond to specific research questions, test hypotheses, and assess results.

For instance, an organization must collect data on product demand, customer preferences, and competitors before launching a new product. If data is not collected beforehand, the organization’s newly launched product may fail for many reasons, such as less demand and inability to meet customer needs. 

Although data is a valuable asset for every organization, it does not serve any purpose until analyzed or processed to get the desired results.

In a society in which information and data are the key to any activity or business, it is very important to know  how to standardize data  in order to get the most out of it.

With globalization and the information systems we use in our daily lives, the amount of information and data at our disposal is immense. The problem is  knowing how to manage such an amount of data: how to collect it, treat it, classify it and apply it . In this sense, data standardization can be of great help.

Aware of the importance of knowing how to standardize data, at  Ayuware ,  as  experts in big data,  we have prepared this post with relevant information about the advantages this procedure offers and the benefits it provides, so that you can know exactly what it involves.

What is standardizing data?

Data standardization is the data quality process of transforming data to fit a predefined and constrained set of values, relying on the power of uniformity to improve data efficiencies.

Data standardization, sometimes also known as  normalization , is the  process of adjusting or adapting certain characteristics  so that the data conforms to a common type, model or norm, with the aim of making  its treatment, access and use easier for the users who hold it .
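For instance, standardizing toward a predefined, constrained set of values can be as simple as mapping known variants of a field to canonical codes. The column, values and mapping below are purely hypothetical:

```python
# Hypothetical raw values for a "country" column, gathered from
# several sources with inconsistent spellings
raw = ["USA", "U.S.A.", "united states", "Mexico", "méxico", "Brasil"]

# The constrained, predefined set of values, via a map of known variants
CANONICAL = {
    "usa": "US", "u.s.a.": "US", "united states": "US",
    "mexico": "MX", "méxico": "MX",
    "brasil": "BR", "brazil": "BR",
}

# Unknown variants are flagged rather than guessed
standardized = [CANONICAL.get(v.strip().lower(), "UNKNOWN") for v in raw]
print(standardized)  # ['US', 'US', 'US', 'MX', 'MX', 'BR']
```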

The very concept of open data implies a search for standardization in the use of information in open format so that it can be used and reused by citizens.

Therefore, it could be said that standardization is the way in which all people can  compare and consult data and always find the information they need, being certain of uniformity in the way in which they will find it.

In statistics, normalization or standardization can have a wide range of meanings. In the simplest cases, standardization of indices involves adjusting values measured on different scales to a common scale.

On the other hand, in more complex cases, data normalization or standardization can refer to making more sophisticated adjustments where the objective is to obtain all those probability distributions that fit the determined values.

In short, it could be said that without standardization and, therefore,  without uniformity  in naming conventions and in the matching of macros and parameters,  only unreliable results can be obtained .
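For the simple case mentioned above, adjusting values measured on different scales to a common scale, the z-score transformation is the classic example. The columns below are hypothetical:

```python
import statistics

def z_scores(values):
    """Rescale values measured on one scale to a common scale
    (mean 0, standard deviation 1)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

ages = [25, 35, 45, 55]                      # years
incomes = [20_000, 40_000, 60_000, 80_000]   # currency units

# After standardization both columns live on the same, comparable scale
print(z_scores(ages))
print(z_scores(incomes))
```

Here the two columns, though measured in wildly different units, end up with identical standardized profiles and can be compared directly.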

Importance of data standardization

Before knowing how to standardize data, it is important that you know the  importance  of this process for the correct use and processing of information in an effective and functional manner.

Data standardization allows us  to ensure that we will have useful, easily linkable and usable data  at our disposal for any activity we require.

Data standardization not only  helps us organize sets of complex information , but it will also  facilitate its analysis  since it breaks down multiple dimensions and transforms the information into viable insights.

Committing to standardizing data is a way to  ensure we implement the uniformity necessary to maintain the effectiveness of our analyses . As we know, information today is power; therefore,  having standardized and optimized information is a great advantage  both on a personal and a business level.

In this sense, data standardization has gone from being a good practice or something recommended, to being  a necessity  for all those who want  to get the most out of the  data  and information they have at their disposal.

Now that we are getting closer to knowing the process on how to standardize data, it is important to keep in mind that, for it to be possible,  it is necessary for institutions to work together  to develop standards since, for a standard to exist, it must be used by a large majority for it to be considered implemented.

Furthermore, data standards and normalization imply  good practices  such as monitoring, control, keeping in mind at all times that the information is useful to anyone who wishes to use it, etc. Thanks to these good practices, the process of how to standardize data is effective and provides all the advantages mentioned above.

To know how to standardize data correctly, it is necessary  to pay attention to the planning done before collecting the information , and to plan for feedback on that information once the data is published openly and publicly.

Key moments in data standardization

To know how to standardize data, it is important to know that  there are three key moments  for the process, in which it is necessary to pay attention to carry out the process satisfactorily:

  • Standardization in data capture : For the data standardization procedure to be correct, it is necessary that the data be collected optimally, following guidelines that will greatly facilitate the following steps.
  • Standardization at the time of data storage : Once the information is collected, it is very important to pay attention to how the data is stored, to do so in an orderly manner and to provide all subsequent facilities for its recovery and consultation.
  • Standardization in the presentation of data : This is a fundamental moment since, after searching for the data we may need, it is very important that the results are displayed in a standardized way, so that we can make use of them in a useful and functional manner. If all the previous steps have been carried out correctly, this last moment should not present any problems.

What measures are in place to ensure the security of your data?

Data

A phrase attributed to Tim Berners-Lee, the creator of the World Wide Web, is that “data is precious and will last longer than the systems themselves.”

The computer scientist was referring to the fact that information is something highly coveted, one of the most valuable assets that companies have, so it must be protected. The loss of sensitive data can represent the bankruptcy of a company.

Faced with an increasing number of threats, it is necessary to implement measures to protect information in companies. Before doing so, it is necessary to classify the data you have and the risks to which you are exposed: the price list of the products you market is not as sensitive as the sales figure you plan to achieve for the year, or your customer database.

To talk about the cloud these days is to talk about a need for storage, flexibility, connectivity and decision making in real time. Information is a constantly growing asset and needs to be managed by work teams, and platforms such as  Claro Drive Negocio  offer, in addition to that storage space, collaboration tools to manage an organization’s data.

With cloud storage, instead of saving data on their own computers or hard drives, users save it in a remote location that can be accessed over the internet. There are several  providers  of these services that sell space on the network at different price points, but few offer true security and protection for that gold your company holds, called: data.

To give you context, more than a third of companies have consolidated  flexible and scalable cloud models  as an alternative to execute their workload and achieve their digital transformation, reducing costs. Hosted information management services allow IT to maintain control and administrators to monitor access and hierarchies by business units.

Five key security measures

Below are five security recommendations to protect information in companies:

  1. Make backup copies . Replicating or having a copy of the information outside the company’s facilities can save your operation in the event of an attack. In this case, options can be sought in the cloud or in data centers so that the protected information is available at any time. It is also important that the backup frequency be configurable, so that the most recent data is always backed up.
  2. Foster a culture of strong passwords . Kaspersky recommends that passwords be longer than eight characters, including uppercase, lowercase, numbers, and special characters. The manufacturer also suggests not including personal information or common words; using a different password for each service; changing them periodically; and not sharing them, writing them on paper, or storing them in the web browser. Every year, NordPass publishes a ranking of the 200 worst passwords used in the world. The worst four are “123456”, “123456789”, “picture1” and “password”.
  3. Protect email.  Now that most of the communication is done through this medium, it is advisable to have anti-spam filters and message encryption systems to protect and take care of the privacy of the data. Spam filters help control the receipt of unsolicited emails, which may be  infected with viruses  and potentially compromise the security of company data.
  4. Use antivirus.  This tool should provide protection against security threats such as zero-day attacks, ransomware, and cryptojacking. And it must also be installed on cell phones that contain company information.
  5. Control access to information.  One way to minimize the risk and consequent impact of errors on data security is to grant access to data according to each user’s profile. Under the principle of least privilege, if a person does not have access to certain vital company information, they cannot put it at risk.
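The password rules in point 2 are easy to make concrete. A minimal sketch, assuming the checks are exactly the ones listed (length over eight characters, plus uppercase, lowercase, digits and special characters):

```python
import string

def meets_policy(password: str) -> bool:
    """Check a password against the rules described above (illustrative only)."""
    return (len(password) > 8
            and any(c.isupper() for c in password)
            and any(c.islower() for c in password)
            and any(c.isdigit() for c in password)
            and any(c in string.punctuation for c in password))

print(meets_policy("123456"))        # False: one of the worst passwords
print(meets_policy("c0mpl3x!Pass"))  # True
```

A check like this can be run at account creation so weak passwords never enter the system in the first place.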


In security, nothing is too much

In summary, the National Cybersecurity Institute of Spain, INCIBE, recommends the following “basic security measures”:

  • Keep systems updated and free of viruses and vulnerabilities
  • Raise awareness among employees about the correct use of corporate systems
  • Use secure networks to communicate with customers, encrypting information when necessary
  • Include customer information in annual risk analyses, perform regular backups, and verify your restore procedures
  • Implement correct authentication mechanisms, communicate passwords to clients securely and store them encrypted, ensuring that only they can recover and change them

The first time a company or business faces the decision to automate a process, it can be somewhat intimidating. However, taking into account the following points makes it a simple task.

1.- Start with the easy processes

Many companies start  considering automation  because they have a large, inflexible process that they know takes up too much time and money, so they begin with their most complex problem and work backwards. This strategy is generally expensive and time-consuming. Instead, review your most basic processes and automate them first. For example, are you emailing a document with revisions when you should be building an automated workflow? There are probably dozens, if not hundreds, of these simple processes you can address and automate before taking on your “giant” process.

2.- Make sure your employees lose their fear of automation

Often, an employee who is not familiar with an automated process is afraid of it. Why? Generally, they fear that automation will eliminate their position. That’s why it’s important to build a supportive culture around automation and help your employees understand that just because  some of their work is now being assisted by an automated process , it doesn’t mean they are any less valuable.

How will you best store and manage your collected data?


Collected data

Collected data  is very important. Data collection is  the process of gathering and measuring information about specific variables in an established system, which then allows relevant questions to be answered and results to be evaluated. Data collection is a component of research in all fields of study, including the  physical  and  social sciences ,  humanities and business . While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal of all data collection is to capture quality evidence that allows analysis to lead to compelling and credible answers to the questions that have been posed.

What is meant by privacy?

The ‘right to privacy’ refers to being free from intrusions or disturbances in one’s private life or personal affairs. All research should outline strategies to protect the privacy of the subjects involved, as well as how the researcher will have access to the information.

The concepts of privacy and confidentiality are related but are not the same. Privacy refers to the individual or subject, while confidentiality refers to the actions of the researcher.

What does the management of stored information entail?

Manual data collection and analysis are time-consuming processes, so transforming data into insights is laborious and expensive without the support of automated tools.

The size and scope of the information analytics market is expanding at an increasing pace, from self-driving cars to security camera analytics and medical developments. In every industry, in every part of our lives, there is rapid change and  the speed at which transformations occur is increasing.

It is a constant  evolution based on data.  That information comes from all the new and old data collected and is used to  develop new types of knowledge.

The relevance that information management has acquired raises many questions about the requirements applicable to all data collected and information developed.

Data encryption

Data encryption is  not a new concept; history gives us the ciphers Julius Caesar used to send his orders, or the famous Enigma machine the Nazis used to encrypt communications in the Second World War.

Nowadays,  data encryption  is one of the most used security options to protect personal and business data.

Data encryption  works through mathematical algorithms that convert readable data into unreadable data. Decrypting it involves two keys: an internal key that only the person who encrypts the data knows, and an external key that the recipient of the data, or whoever is going to access it, must know.

Data encryption can be used to protect all types of documents, photos, videos, etc. It is a method with many advantages for information security.
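To make the idea tangible, here is a deliberately toy sketch of the core principle, that data is unreadable without the key. This is an XOR demonstration for illustration only; real systems should use a vetted library (for example, the `cryptography` package), never a hand-rolled cipher like this:

```python
import secrets

def encrypt(plaintext: bytes, key: bytes) -> bytes:
    # XOR each byte with the key stream: unreadable without the key
    return bytes(b ^ k for b, k in zip(plaintext, key))

decrypt = encrypt  # XOR is its own inverse

message = b"quarterly sales figures"
key = secrets.token_bytes(len(message))  # must be shared with the recipient

ciphertext = encrypt(message, key)
assert ciphertext != message             # unreadable in transit
assert decrypt(ciphertext, key) == message
```

The roundtrip shows the contract any encryption scheme provides: without the key, the ciphertext is useless; with it, the original data is recovered exactly.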

 


Advantages of data encryption

  • Useless data : if a storage device is lost or data is stolen by a cybercriminal, encryption renders the data useless to anyone who does not have the permissions and decryption key.
  • Improve reputation : companies that work with encrypted data offer both clients and suppliers a secure way to protect the confidentiality of their communications and data, displaying an image of professionalism and security.
  • Less exposure to sanctions : some companies or professionals are required by law to encrypt the data they handle, such as lawyers, data from police investigations, or data containing information on acts of gender violence. In short, all data that is by nature sensitive to exposure requires mandatory encryption, and sanctions may be imposed if it is not encrypted.

Data storage 

There are many advantages associated with good management of stored information. Among the benefits of adequately covering the requirements of the  Data Storage function  and  data management , the following two stand out:

  • Savings: the capacity of a server to  store data  is limited, so  storing data  without a structure, without a logical order and lacking guiding principles, represents an increase in cost that could be avoided. On the contrary, when data storage responds to a plan and the decisions made are aligned with the business strategy, advantages are achieved that extend to all functions of the organization.
  • Increased productivity:  when data has not been stored correctly, the system works more slowly. One strategy often used to avoid this is to  divide data into active and inactive . The latter is kept compressed and in a different place, so that the system remains agile, without the data becoming completely unreachable, since it may sometimes be necessary to access it again. Today, cloud services make it much easier to find the most appropriate data storage approach for each type of information.
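The active/inactive split above can be sketched in a few lines. The record layout and the cutoff date are illustrative assumptions:

```python
import gzip
import json

records = [
    {"id": 1, "last_used": "2024-05-01", "payload": "frequent report"},
    {"id": 2, "last_used": "2019-02-11", "payload": "old audit trail"},
]

CUTOFF = "2023-01-01"  # assumption: records untouched since then are "inactive"
active = [r for r in records if r["last_used"] >= CUTOFF]
inactive = [r for r in records if r["last_used"] < CUTOFF]

# Inactive data stays compressed and elsewhere, but remains recoverable
archived = gzip.compress(json.dumps(inactive).encode())
restored = json.loads(gzip.decompress(archived))

print(len(active), len(restored))  # 1 1
```

The working set stays small and fast, while the archive can still be decompressed on demand, exactly the trade-off the bullet describes.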

We must avoid each application deciding for itself  how to save the data ; to this end, the information management policy should be uniform for all applications and answer the following questions in each case:

  • How is the data stored?
  • When is the data saved?
  • What part of the data or information is collected?
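One hypothetical way to make such a policy uniform is to define it once and have every application read from it. All names and values below are assumptions for the sake of illustration:

```python
# A single shared policy answering the three questions above
STORAGE_POLICY = {
    "how":  {"format": "JSON", "compression": "gzip", "encrypted": True},
    "when": {"trigger": "on_commit", "backup_frequency": "daily"},
    "what": {"fields": ["id", "timestamp", "payload"], "drop": ["raw_debug"]},
}

def apply_what_policy(record: dict) -> dict:
    """Keep only the fields the policy says should be collected."""
    return {k: record[k] for k in STORAGE_POLICY["what"]["fields"] if k in record}

cleaned = apply_what_policy({"id": 7, "timestamp": "2024-06-01",
                             "payload": "x", "raw_debug": "verbose"})
print(cleaned)
```

Because the policy lives in one place rather than inside each application, Data Governance can change the standards without touching every silo.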

In short,  Data Governance  will designate a person in charge, who is responsible for defining the standards and the way information is stored, since not all silos can be used.

This is how this function supports the common objective, through procedures, planning, organization and control exercised transversally, always seeking  to enhance  the pragmatic side of the data .


Steps of data processing in research

Data processing in research has six steps. Let’s look at why they are an imperative component of  research design .

  • Research data collection

Data collection is the main stage of the research process. This process can be carried out through various online and offline research techniques and can be a mix of primary and secondary research methods.

The most used form of data collection is research surveys. However, with a  mature market research platform  , you can collect qualitative data through focus groups, discussion modules, etc.

  • Research data preparation

The second step in  research data management  is data preparation: eliminating inconsistencies, removing bad or incomplete survey data, and cleaning the data to maintain consistency.

This step is essential, since insufficient data can make research studies completely useless and a waste of time and effort.

  • Research data entry

The next step is to enter the cleaned data into a digitally readable format consistent with organizational policies, research needs, etc. This step is essential as the data is entered into online systems that support research data management.

  • Research data processing

Once the data is entered into the systems, it is essential to process it to make sense of it. The information is processed based on needs, the  types of data  collected, the time available to process the data and many other factors. This is one of the most critical components of the research process. 

  • Research data output

This stage of processing research data is where it becomes knowledge. This stage allows business owners, stakeholders, and other staff to view data in the form of graphs, charts, reports, and other easy-to-consume formats

  • Storage of processed research

The last stage of data processing steps is storage. It is essential to keep data in a format that can be indexed, searched, and create a single source of truth. Knowledge management platforms are the most used for storing processed research data.
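The six steps above can be sketched as one compact pipeline. The survey fields and the cleaning rule (dropping blank ratings) are illustrative assumptions:

```python
def collect():
    # 1. Collection: e.g. survey responses, some of them incomplete
    return [{"respondent": 1, "rating": "4"},
            {"respondent": 2, "rating": ""},
            {"respondent": 3, "rating": "5"}]

def prepare(raw):
    # 2. Preparation: drop bad or incomplete responses
    return [r for r in raw if r["rating"].strip()]

def enter(cleaned):
    # 3. Entry: convert to a digitally consistent format
    return [{"respondent": r["respondent"], "rating": int(r["rating"])}
            for r in cleaned]

def process(entered):
    # 4. Processing: turn records into a summary statistic
    ratings = [r["rating"] for r in entered]
    return {"n": len(ratings), "mean_rating": sum(ratings) / len(ratings)}

def output(summary):
    # 5. Output: an easy-to-consume report line
    return f"{summary['n']} responses, average rating {summary['mean_rating']:.1f}"

store = {}  # 6. Storage: an indexed single source of truth
store["q2_survey"] = process(enter(prepare(collect())))
print(output(store["q2_survey"]))  # 2 responses, average rating 4.5
```

Each function maps to one step, so a failure in any stage (say, all responses dropped during preparation) is visible before it contaminates the stored result.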


Benefits of data processing in research

Data processing can differentiate between actionable knowledge and its non-existence in the research process. However, the processing of research data has some specific advantages and benefits:

  • Streamlined processing and management

When research data is processed, there is a high probability that this data will be used for multiple purposes now and in the future. Accurate data processing helps streamline the handling and management of research data.

  • Better decision making

With accurate data processing, the likelihood of making sense of data to arrive at faster and better decisions becomes possible. Thus, decisions are made based on data that tells stories rather than on a whim.

  • Democratization of knowledge

Data processing allows raw data to be converted into a format that works for multiple teams and personnel. Easy-to-consume data enables the democratization of knowledge.

  • Cost reduction and high return on investment

Data-backed decisions help brands and organizations  make decisions based on data  backed by evidence from credible sources. This helps reduce costs as decisions are linked to data. The process also helps maintain a very high ROI on business decisions. 

  • Easy to store, report and distribute

Processed data is easier to store and manage since the raw data is structured. This data can be consulted and accessible in the future and can be called upon when necessary. 

Examples of data processing in research 

Now that you know the nuances of data processing in research, let’s look at concrete examples that will help you understand its importance.

  • Example in a global SaaS brand

Software as a Service (SaaS) brands have a global footprint and an abundance of customers, often both B2B and B2C. Each brand and each customer has different problems they hope to solve using the SaaS platform, and therefore different needs. 

By conducting  consumer research , the SaaS brand can understand customers’ expectations, purchasing behaviors, and more. This also helps in profiling customers, aligning product or service improvements, managing marketing spend and more, based on the processed research data. 

Other examples of this data processing include retail brands with a global footprint, with customers from various demographic groups, vehicle manufacturers and distributors with multiple dealerships, and more. Everyone who does market research needs to leverage data processing to make sense of it.  


What criteria are you using to determine the relevance of your data?


The relevance of your data

The relevance of your data is very important. The importance of relevant data spans all departments. Basing business decisions on data can be the difference between success or failure — for your entire organization.

Just having the metrics isn’t enough. First, the data you’re collecting needs to be relevant to your organization’s goals. It should indisputably report all pertinent information, positive or negative. Then, the metrics collected need to be actionable. When reporting insights, your team should be prepared to answer the questions, “Why does that matter?” and “What are we going to do with that information?”

When you can collect the information and then answer those questions, your organization is on its way to reporting relevant data. Here are several reasons why that is important:

Surely you have asked yourself the following question: what criteria should you take into account when searching for and selecting digital resources or content, whether to use them or to modify them?

The distinction between relevance and other dimensions of data quality is important because relevance ensures your data is actionable and aligned with business goals. If you use irrelevant data, you’ll generate inaccurate insights, make poor decisions, and damage your company’s reputation.

Relevant, actionable data is your “ace of spades”. But to play that card, you need to first have it in your deck. If your organization wants to make decisions based on facts, having actionable data on-hand empowers you to answer any “why?” questions.

To be crystal clear: The relevance of your data reported correctly is indisputable. Actionable analytics and insights remove the subjectiveness in business. Without the correct reporting in place, all your team has are instincts and opinions being thrown around, taking you in a million different directions. Take the time to set up reporting and present the relevant data. When the numbers are available and understood by everyone in the organization and the data supports your strategy, it becomes difficult (or impossible) for anyone to argue with your approach.

Relevance of your data creates strong strategies

Opinions can turn into great hypotheses, but only with the right reporting in place. And those hypotheses are just the first step in creating a strong strategy. It can look something like this:

“Based on X, I believe Y, which will result in Z.”

Once you have a hypothesis, you can create a strong, measurable strategy and put it to work! The structured criteria of a hypothesis, including data, is your lighthouse while executing the strategy. Compare results to the hypothesis regularly to ensure the campaign is going to plan. If it’s not, make adjustments to reach your numbers. Having the hypothesis, based on relevant data, allows your team to be proactive and achieve more goals. The alternative is being reactive, finding problems that you wish you caught sooner.

 Relevance of your data is necessary for optimization

How can your team optimize anything if you don’t have meaningful data to support making changes? You can’t. A lot of times, people confuse testing with optimizing. Testing is a part of optimizing, but they aren’t synonyms. Testing is measuring to check the quality, performance, or reliability of something. Optimizing takes those measurements a step further. It means to make the best of or most effective use of something. In order to optimize, you first need to test whatever it is you want to optimize (based on a measurable hypothesis, of course). Then, once you get significant results, your team can start optimizing. Consider starting with one of these aspects:

  • Email subject lines
  • Website pages
  • Ad images
  • Form fields
  • Pieces of content
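The test-then-optimize distinction above can be made concrete. Here is a hedged sketch of a two-proportion z-test on a hypothetical email subject-line A/B test; the conversion counts are made up for illustration:

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z-score and two-sided p-value for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)           # pooled conversion rate
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))  # standard error under H0
    z = (p_b - p_a) / se
    # two-sided p-value from the normal CDF, via the error function
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Subject line A: 120 opens of 1000 sends; subject line B: 160 of 1000
z, p = two_proportion_z(conv_a=120, n_a=1000, conv_b=160, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")  # optimize only if p < 0.05
```

Only once a result like this is significant does it make sense to roll the winning variant out, which is the "step further" that separates optimizing from merely testing.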

Relevance of your data builds better relationships with customers

Data can build better relationships with customers in a number of ways, but let’s focus on a few major ones for now.

Website personalizations

All customers are unique, so the more personalized the experience, the better. But we know that’s not always feasible. So, start with general personalization. Segment your audiences by location, job title, or referral source. Then, deliver relevant information to that audience segment.

Easy website navigation

Take a peek at your website data. What are visitors searching for most? What are the most common conversions? Where is the information your audience is looking for? You can answer all of these questions using Google Analytics. If there are significant results, it might be time to make some changes on your website so visitors can quickly and easily get what they need.

Email Preferences

How frequently do your customers like being contacted? Then, what day of the week and time of day do they prefer? Recognizing and implementing this is a win-win strategy. Your email metrics improve, and customers see you as a resource of information instead of a bother.

Knowing Customers’ Interests

If a customer has shown you (through data) that they are NOT interested in something, stop [virtually] shoving it in their faces. Even if you worked really hard on that whitepaper, the customer has closed your 3 popups advertising it multiple times. So, stop showing them the darn pop-up!

(Sorry. Kind of.)

The bottom line? It’s the little things that count. Using data you already have to make that extra effort to improve your customers’ experiences with your company can go a long way. It makes their lives a bit easier, validates their opinions, and makes them feel important.

 Relevance of your data quantifies the purpose of your work

The numbers don’t lie. Data can prove that the projects you’re working on are where your limited time is best spent. It can also support what not to work on. Say you spend 20 hours on each webinar your organization hosts, and you put on 2-4 per month thinking they’re driving leads. But once you look at the report, you realize webinars account for 25-50% of your time on the clock and only bring in 2-5% of leads. Turns out, your time might be best spent on a totally different lead generation campaign.
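The webinar arithmetic above is worth checking explicitly. The monthly working hours and lead counts below are illustrative assumptions consistent with the ranges in the paragraph:

```python
hours_per_webinar, webinars_per_month = 20, 3
work_hours_per_month = 160           # assumption: a full-time month
webinar_leads, total_leads = 4, 100  # assumption: illustrative lead counts

time_share = hours_per_webinar * webinars_per_month / work_hours_per_month
lead_share = webinar_leads / total_leads

print(f"{time_share:.0%} of time for {lead_share:.0%} of leads")
# 38% of time for 4% of leads
```

Seeing the two percentages side by side is exactly the comparison that makes the case for reallocating that time.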

Relevance of your data helps CYA (cover your…)

Our last reason why data is important to your organization is comical, but oh so true! Protect yourself and your work by collecting AND distributing relevant data. It’s important to make the collected information, good and bad, readily available to key stakeholders. Even if they don’t look at it, you did your part to present the analytics. Not only will you cover your, um…backside, but making the analytics easily accessible communicates transparency and can result in more trust or autonomy for future projects.

For guidance on how to set up your reports correctly, check out the Google Data Studio blog. To get the know-how on something more specific, you can read how to report social media ROI as well.

The fact is that we have a wide and varied range of sources; if we do not filter content by applying certain criteria, it will be difficult to ensure veracity, credibility, reliability and, of course, quality.

Some of the recommended criteria or indicators are: authority, content selection, updating, navigability, organization, readability, good online information sources, and types of licenses.

Authority:  

Refers to the person responsible for the site, whether an individual, a group of people, an association, a public institution, an educational institution, etc. This indicator is also used to evaluate resources such as books, magazines or other types of publications. The level of authority of the person responsible for the site reflects their legitimacy to give opinions, write, or work in a specific area. This indicator allows you to analyze the reliability of the information provided on the site or in the publication.

Content selection : This indicator serves to evaluate whether the selection of content and its treatment are appropriate. This indicator is essential, since it refers to the validity of the contents and information. To contrast this indicator, it is necessary to compare the information provided by a specific site with data from other sources.

  • Accuracy  –  Precision  –  Rigor

Update:  The level of update of a site refers to the periodic incorporation of new information; or the modification of existing data, according to theoretical and scientific advances. This indicator allows you to recognize sites that contain updated information, and sites that are still operational.

  • Creation date  –  Update date  –  Current and updated information  –  Existence of obsolete links  –  Existence of incorrect links

Navigability:  This indicator is particularly relevant if it is proposed that students navigate a certain site to search for information. The navigability of a web page refers to the ease with which a user can navigate through it. If a web page is clear, simple, and understandable, navigation will be autonomous and fast.

  • Design  –  Elegant, functional and attractive  –  Combination of colors, shapes and images  –  Homogeneity of style and format  –  Design compatible with different browser versions and screen resolutions

Online information sources:  that is, selecting information through academic search engines, libraries (databases, journal portals, catalogues) and digital books.

Know the conditions of use of all types of digital content before using it with students, assessing aspects such as the inclusion of advertising, the collection of information and personal data and the additional applications that are installed to complement that content.

Critically evaluate the suitability and reliability of sources and content.

Types of licenses:  Not everything on the Internet can be freely used. Intellectual property protects any original literary, artistic or scientific creation. More specifically, article 10 of the Intellectual Property Law indicates that works can be books, musical compositions, films, photographs, computer programs… Everything, including a work’s title, is protected, both in whole and in part. For example, in a song both the music and the lyrics are protected.

Consider the license, terms of use and possible restrictions on the use of digital content.

Teachers do not need the author’s authorization to use a work in their classes; they only need to comply simultaneously with the following conditions:

  1. The use of the work must be solely for illustrative purposes of its educational activities.
  2. The name of the author and the source must be cited.
  3. There should not be any type of commercial purpose.
  4. Additionally, teachers may also reproduce a work in their classes, for example they may photocopy and give a copy to their students. This reproduction is allowed as long as the following conditions are met:
    1. The length reproduced is not more than 10% of the total work (a chapter, an article…).
    2. It is only distributed among students for a specific activity.
      It is very important to keep in mind that this intellectual property exception is limited to “what happens in class.” With the rise of new technologies, it is common to make the mistake of uploading copyrighted material to a “class blog”; if the blog is open to anyone, it is no longer limited to “what happens in class” and violates the educational exception in intellectual property law.

What location or locations would be most suitable according to the resource?

EVAGD  allows, among other actions, to organize digital educational content and make it available to the educational community, considering it a safe environment as it is hosted on the CEUCD servers.
In its structure, it has different tools that enable the cataloging and sharing of many types of content.

It would be recommended for those resources in which, through their use, the student’s data protection could be compromised: for example, a questionnaire, a forum, or a videoconference.

G Suite Educational  is a package of Google tools and services for educational centers. It would be advisable when the ways the company tracks and uses the digital resource do not cause harm.

Aula Digital Canaria  is a comprehensive solution that provides the classroom with tools for students’ digital work and manages class information in real time. It is an application that makes the digitalized learning situations of the Brújula20 Program available to students and teachers in public centers in the Canary Islands, in an interactive virtual environment, and it facilitates greater control and a real possibility of responding to students’ individual needs.

Being customizable and flexible, it is recommended in the courses that are included in said program.

Institutional blogs  (Eco-school blog 2.0, multisites for creating blogs for institutional projects and digital magazines; EduBlogs, multisites for creating blogs for educational centers; EcoBlogs, multisites for creating teachers’ blogs; AulaBlog, multisites for creating classroom blogs, currently in a pilot phase).

They are recommended as living spaces for content management, publication of experiences, communication, dynamization and the exchange of knowledge and information.

 
