What criteria are best for determining the relevance of your data sources?

What is a Data Source

Data sources are very important. In data analysis and business intelligence, a data source is a vital component that provides the raw data for analysis. A data source is a location or system that stores and manages data, and it can take many different forms. From traditional databases and spreadsheets to cloud-based platforms and APIs, countless types of data sources are available to modern businesses.

Understanding the different types of data sources, with their strengths and limitations, is crucial for making informed decisions and deriving actionable insights from data. In this article, we will define what a data source is, examine data source types, and provide examples of how they can be used in different contexts.

Information

In today’s world, it is essential to master skills that allow us to manage information appropriately, according to our needs. Being competent in information management is a fundamental factor in the development of our academic, professional and even personal lives. A key factor, therefore, is our degree of autonomy in managing information.

The history of access to information has been one of universalization and progressive growth. In recent years, we have witnessed a true information explosion, in which the volume of information of all kinds (journalistic, economic, commercial, academic, scientific, etc.) has grown to unthinkable dimensions that are almost always difficult to manage.

Thanks to the development of ICT (information and communication technologies), our capacity to process, store and transmit information through computers and communications networks has multiplied, giving rise to the information and knowledge society in which we are immersed.


What are the sources of information?

An information source is understood as any instrument or, in a broader sense, resource, that can serve to satisfy an information need.

The objective of the information sources will be to facilitate the location and identification of documents, thus answering the question: where are we going to look for the information?

It is necessary to consider the type of information sources that will be consulted for class work. The student must select sources that provide information at a level appropriate to his or her needs.

1. Books:

We generally call a book a “scientific, literary or any other work of sufficient length to form a volume, which may appear in print or on another medium.”

Traditionally, the book was a printed document, but today we can find many in electronic format. Depending on the content and structure, various types of books can be established:

  • Manuals: These are works in which the most substantial aspects of a subject are gathered and synthesized. They compile basic data that is easy to consult, and are especially useful for getting started in the fundamentals of a discipline.
  • Monographs: These are studies on a specific topic that help us gain in-depth knowledge of an area. They can provide both basic and exhaustive information on the topic of the work. We can complement the information with articles from specialized magazines.
  • Encyclopedias and dictionaries: They offer synthetic and timely information on a topic for quick reference. There are general ones, for all topics, and specialized ones, for a specific subject. Encyclopedia entries are of medium length, while dictionaries contain short definitions.
  • Doctoral theses: These are research works carried out to obtain a doctorate degree. They are original works of research, not published commercially, with very complete information on a topic of study.

To locate books we will consult the library catalogue.

2. Magazines:

These are periodical publications that appear in successive issues. They are a fundamental source of current information, necessary for staying up to date on a topic.

Electronic publishing has had a great impact on magazines, and a large number of them are now published in digital format. To locate journal articles we will consult the bibliographic databases.

1. Library catalogs

Catalogs are databases that include descriptions of the documents held by a library. They include the publications that make up the fund or collection of a library: books and magazines, both printed and electronic, sound recordings, videos, etc. The libraries of the University of Valencia have a common catalog called Trobes.

What can we NOT find in the catalogue?

We cannot find MAGAZINE ARTICLES. Articles contained in magazines must be searched in bibliographic databases.

Through a search system, catalogs allow us to locate documents and find out their availability online. To find books and other resources available through the catalog we can search by different fields:

– Author: search by the last name and first name of an author, the name of a public or private organization

– Title: search by exact title

– Word: search for documents that contain said word in any of the record fields

– Subject: search for records of a specific subject or topic. In Trobes the subjects are in Valencian.

When we have identified the book we are looking for in the catalog, we have to locate it in the library. The catalog provides a call number for each copy and indicates where in the library (room, bookcase, shelf) we can find it.

The catalog also allows:

– Consult the documents in electronic version subscribed by the library: magazines and electronic books and databases

– Carry out certain procedures remotely: reservations, renewals, etc.

2. Databases available through the Library

In addition to the documents that we find in the library catalog, we may need to search for more information (press, scientific articles, statistics, legislation, jurisprudence, financial data…) on the topic of our work.

For this, the library has a series of databases.

What is a database?

A database is a collection of data (texts, figures and/or images) belonging to the same context, systematically selected and stored, and organized according to a search program that allows their location and automated retrieval.

The libraries of the University of Valencia subscribe to a wide range of databases where we can locate information. We can access them through the following link: http://biblioteca.uv.es/castellano/recursos_electronicos/bases_dades/acces.php

They are usually online, and we can access them through the university network or from home by setting up a virtual private network (VPN). The library also lists freely accessible databases.

There are different types of databases, depending on the information they contain: bibliographic, factual, press, etc. You can consult the main ones for your discipline in section 2.4, Sources of information in Social Sciences. Some of the most used are bibliographic databases, which contain references to documents, mainly journal articles, chapters, reports, conference communications, patents, etc. Sometimes they also provide a summary and/or access to the full text of the documents.

General characteristics:

  • They hold field-structured records: author, title, title of the source, type of document, etc.
  • They contain information extracted from primary sources (journals, monographs, conference proceedings…) and submitted to documentary analysis (indexing and abstracting).
  • They allow you to search by keywords.
  • They allow you to save information to print it, save it, send it to an email account or to a bibliography manager.
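
The characteristics listed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of records using the fields named in the list (author, title, source, document type) plus a summary; the `Record` class, the `search` function and the two placeholder records are invented for illustration and do not describe any real database or reference.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One field-structured bibliographic record (hypothetical layout)."""
    author: str
    title: str
    source: str
    doc_type: str
    summary: str = ""


def search(records, keyword):
    """Return the records in which any field contains the keyword.

    The match is a case-insensitive substring test, a deliberately
    simple stand-in for the keyword search a real database offers.
    """
    needle = keyword.lower()
    return [
        r for r in records
        if any(
            needle in value.lower()
            for value in (r.author, r.title, r.source, r.doc_type, r.summary)
        )
    ]


# Placeholder records, invented for illustration only.
RECORDS = [
    Record("Author A", "Selection bias in surveys", "Journal X", "article"),
    Record("Author B", "Managing data sources", "Proceedings Y", "conference paper"),
]

if __name__ == "__main__":
    for hit in search(RECORDS, "bias"):
        print(hit.author, "-", hit.title)
```

Real bibliographic databases add controlled vocabularies, field-restricted queries and export to bibliography managers; the sketch only shows why structured fields make keyword retrieval straightforward.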

Internet

The Internet provides access to a large and diverse amount of information and resources. However, unlike libraries, which select and evaluate information based on the quality and relevance of each resource, the Internet contains everything. No one is in charge of the content it hosts, since it is a medium in which anyone can self-publish.

It is a participatory environment where anyone can contribute information, and that is where the problem lies: not all the information is true or verified. Therefore, when using the Internet as a source of information, we must be critical and know how to tell which resources can help us. We must evaluate the information we find, especially if we want to use it for an assignment.

Google

One of the first impulses when you need information is to turn to Google. Although in some cases this resource is sufficient, keep in mind that not everything relevant is there, and not everything there is relevant: a lot of important information does not appear in conventional searches, and much of what does appear only adds noise and confusion.

How does Google work?

Google incorporates an automatic algorithm that evaluates the sites found, so that only the most relevant ones appear, taking into account the terms or keywords entered in the search. Once the results are obtained, these terms appear in bold, so that the user knows why those resources have been selected.

To evaluate the quality of the resources, Google uses the number of links that each page receives as a measure. In this way, each link from one page to another works as a “citation.” Not all links are valued equally, however: links, or citations, that come from pages that have themselves received more links from other pages are worth more. Through this “democratic” system, Google orders the list of results, placing the websites that receive the most links at the top.
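
The idea that a link counts for more when it comes from a well-linked page can be sketched with a simplified power iteration. This is an illustration of link-based ranking in general, not Google's actual algorithm: the function name `rank_pages`, the damping factor of 0.85, the number of iterations and the four-page link graph are all assumptions made for the example.

```python
def rank_pages(links, damping=0.85, iterations=100):
    """Score pages by repeatedly passing each page's score along its links.

    `links` maps a page to the list of pages it links to. Every page
    shares its current score equally among its outgoing links, so a
    link from a high-scoring page transfers more than a link from a
    low-scoring one.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Baseline score every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "c" is linked by three pages, "a" only by "c".
LINKS = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

if __name__ == "__main__":
    scores = rank_pages(LINKS)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

In this toy graph, page "a" receives a single link, but because it comes from the most-linked page it ends up scoring well above "b" and "d", which receive none.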

The main characteristic of academic search engines is that they index only websites linked to the academic world: journal portals, repositories, academic websites, databases, commercial publishers, scientific societies, online library catalogs, etc.


In the search process, we can come across a wide variety of information on our topic. However, not all of it will have the same value, so we must select the appropriate sources of information, taking different aspects into account.

What measures are best to address potential biases in the selection of your data sources?


What is a Data Source


In short, a data source is the physical or digital location where data is stored, as a data table, data object, or another storage format. It is also where someone can access data for further use: analysis, processing, visualization, etc.

You often deal with data sources when you need to transform your data. Suppose you have an eCommerce website on Shopify and want to analyze your sales to understand how to improve your store’s performance. You decide to use Tableau for data processing. As it is a standalone tool, you must somehow fetch the data you need from Shopify. Shopify thus acts as the data source for your further data manipulation.

Bias, or systematic error, can be understood as the difference between what is being assessed and what is believed to be assessed (Casal & Mateu, 2003). Unlike random error, systematic error is not compensated by increasing the sample size (Department of Statistics, Universidad Carlos III de Madrid). Although its importance in the development of an investigation is vital, no study is exempt from biases; the essential thing is to know them in order to avoid, minimize or correct them (Beaglehole et al., 2008).


Bias

The risk of bias is intrinsic to clinical research, where it is particularly frequent because such research works with variables that involve individual and population dimensions and are difficult to control. Biases also occur in the basic sciences, but there the experimental setting gives them particular characteristics and makes them less complex to minimize, since most of the variables can be controlled.

From a statistical perspective, the value obtained when measuring a variable (XM) is made up of two parts: the true value (XV) and the measurement error (XE), so that XM = XV + XE. The measurement error is in turn composed of two parts, one random and one systematic (bias), and the bias can be of measurement, selection or confounding (Dawson-Saunders et al., 1994).

This explanation allows us to understand the fundamental characteristics of any measurement: accuracy (measurements close to the true value, that is, not biased) and precision (repeated measurements of a phenomenon that yield similar values) (Manterola, 2002).

The objective of this article is to describe the concepts that allow us to understand the importance of biases, the most frequent ones in clinical research, their association with the different types of research designs and the strategies that allow them to be minimized and controlled.
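
The decomposition XM = XV + XE, with XE split into a random part and a systematic part, can be made concrete with a small simulation. This is a sketch under stated assumptions: the true value of 100, the bias of 2, the noise standard deviation of 5 and the function names are arbitrary choices for illustration, and the random part is assumed to be Gaussian.

```python
import random
import statistics


def measure(true_value, bias, noise_sd, n, seed=0):
    """Simulate n measurements XM = XV + XE.

    XE is the sum of a fixed systematic part (`bias`) and a random
    part drawn from a zero-mean Gaussian with standard deviation
    `noise_sd`.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


def mean_error(true_value, bias, noise_sd, n, seed=0):
    """Average measurement minus the true value for a simulated sample."""
    sample = measure(true_value, bias, noise_sd, n, seed)
    return statistics.fmean(sample) - true_value


if __name__ == "__main__":
    for n in (10, 1_000, 50_000):
        unbiased = mean_error(100.0, 0.0, 5.0, n)
        biased = mean_error(100.0, 2.0, 5.0, n)
        print(f"n={n}: error without bias {unbiased:+.3f}, with bias {biased:+.3f}")
```

As the sample grows, the random part averages out and the mean error of the unbiased instrument approaches zero, while the biased instrument's mean error settles near the bias itself. That is the sense in which a larger sample improves precision but cannot compensate for systematic error.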

POSSIBILITIES OF COMMITTING BIAS
A simple way to understand the different possibilities of committing bias during research is to think about the three axes that dominate research: what will be observed or measured, that is, the variable under study; the one who will observe or measure, that is, the observer; and with what will be observed or measured, that is, the measuring instrument (Tables II and III) (Beaglehole et al.).

1. From the variable(s) under study

A series of possible biases is associated with the variable under study, whether at the time of its observation, in the measurement of its magnitude or in its subsequent classification (Manterola).

a) Periodicity: Corresponds to the variability in the observation; that is, what is observed can follow an abnormal pattern over time, either because it is distributed uniformly over time or because it is concentrated in periods. Knowledge of this characteristic is essential in biological events that present known cycles, such as the circadian rhythm, electroencephalographic waves, etc.

b) Observation conditions: Some events require special conditions in order to occur, such as environmental humidity and temperature, or respiratory and heart rates. These are non-controllable situations that can generate bias if they are not adequately considered, a context more typical of the basic sciences.

c) Nature of the measurement: Sometimes it is difficult to measure the magnitude or value of a variable, whether qualitative or quantitative. This may occur because the magnitude of the values is small (hormonal determinations) or because of the nature of the phenomenon under study (quality of life).

d) Errors in the classification of certain events: These may occur as a result of changes in the nomenclature used, a fact that the researcher must note. Examples include neoplasm classification codes and the operational definition of obesity.

2. From the observer

The ability to observe an event of interest (EI) varies from one subject to another. Moreover, two individuals faced with the same stimulus may perceive it differently. Therefore, homogenizing the observation, by guaranteeing adequate conditions for it and an adequate observation methodology, helps minimize measurement errors.

Error is thus inherent to the observer, independent of the measuring instrument used. This is why the different clinical research models require strict conditions to homogenize the measurements made by different observers, such as using clear operational definitions and verifying compliance with these requirements among the subjects incorporated into the study.

3. From the measurement instrument(s)

Measuring biomedical phenomena with more than just the senses requires measurement instruments, which in turn may have technical limitations that prevent them from measuring exactly what is desired.

The limitations of measurement instruments apply both to “hard” devices and technology and to population exploration instruments such as surveys, questionnaires and scales. Verification of the technical attributes of the latter is often neglected, yet they are “measuring instruments” all the same, since they were designed to measure the occurrence of an EI; they must therefore be subject to the same considerations as any measuring instrument (Manterola).

These restrictions apply readily to diagnostic tests, in which there is always a probability of overdiagnosing subjects (false positives) or underdiagnosing them (false negatives), committing errors of a different nature in each case.

Frequently, it is necessary to design data collection instruments whose purpose, like that of diagnostic tests, is to separate the population according to the presence of some EI.

Thus, an instrument that lacks adequate sensitivity will identify a low proportion of the subjects with the EI (true positives). Conversely, screening instruments with low specificity will decrease the probability of correctly identifying subjects without the EI (true negatives).

For example, a questionnaire for a prevalence study of gastroesophageal reflux may include items that are inappropriate for detecting the problem in a certain group of subjects, altering its sensitivity. The same instrument, with an excessive number of items of little significance to the problem, may lack adequate specificity to measure the EI.
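
Sensitivity and specificity follow directly from the four possible outcomes of applying an instrument. The sketch below uses the standard definitions; the function names and the counts in the usage example are hypothetical and are not taken from any study.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of subjects with the EI that the instrument detects."""
    with_event = true_positives + false_negatives
    if with_event == 0:
        raise ValueError("no subjects with the event of interest")
    return true_positives / with_event


def specificity(true_negatives, false_positives):
    """Proportion of subjects without the EI that the instrument rules out."""
    without_event = true_negatives + false_positives
    if without_event == 0:
        raise ValueError("no subjects without the event of interest")
    return true_negatives / without_event


if __name__ == "__main__":
    # Hypothetical questionnaire results for 100 subjects with the EI
    # and 100 subjects without it.
    print("sensitivity:", sensitivity(true_positives=80, false_negatives=20))
    print("specificity:", specificity(true_negatives=90, false_positives=10))
```

A questionnaire with poorly targeted items would raise the false negatives and lower the first figure; one padded with items unrelated to the problem would raise the false positives and lower the second.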

Table III. Most common types of bias in observational studies.

Probability of | Cohorts | Case-control | Cross-sectional | Ecological studies
Selection bias | Low | High | Medium | Not applicable
Recall bias | Low | High | High | Not applicable
Confounding bias | Low | Medium | Medium | High
Follow-up losses | High | Low | Not applicable | Not applicable
Time required | High | Medium | Medium | Low
Cost | High | Medium | Medium | Low

MANTEROLA, C. & OTZEN, T. Biases in clinical research. Int. J. Morphol., 33(3):1156-1164, 2015.

Another way of classifying biases relates to the frequency with which they occur and the stage of the study in which they originate. In clinical research, the most frequent biases that affect the validity of a study can be classified into three categories: selection (generated during the selection or follow-up of the study population), information (originating during measurement processes in the study population) and confounding (occurring because the study groups cannot be compared).

1. Selection biases

This type of bias, particularly common in case-control studies (where events that occurred in the past can influence the probability of being selected for the study), occurs when there is a systematic error in the procedures used to select the study subjects (Restrepo Sarmiento & Gómez-Restrepo, 2004). It therefore leads to an estimate of the effect that differs from the one obtainable for the target population.

It is due to systematic differences between the characteristics of the subjects selected for the study and those of the individuals who were not selected. For example, hospital cases differ from those excluded because the subject died before arriving at the hospital owing to the acute or more serious nature of the condition, was not sick enough to require admission to the hospital under study, could not meet the cost of admission, or lived too far from the healthcare center.

They can occur in any type of study design, but they occur most frequently in retrospective case series, case-control, cross-sectional and survey studies. This type of bias prevents extrapolation of conclusions in studies carried out with volunteers drawn from a population without the EI. An example of this situation is the so-called Berkson bias, also called Berkson’s fallacy or paradox, or admission or diagnostic bias, which is defined as the set of selective factors that lead to systematic differences in a case-control study with hospital cases.

It occurs when the combination of an exposure and the EI under study increases the risk of admission to a hospital, which leads to a systematically higher exposure rate among hospital cases compared with controls. One example is the negative association between cancer and pulmonary tuberculosis, in which tuberculosis appeared to act as a protective factor against the development of cancer; this was explained by the low frequency of tuberculosis among those hospitalized for cancer, a fact that does not mean that the frequency of the disease is lower among these subjects.
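
How hospital admission alone can manufacture an association may be seen in a small simulation. This is a sketch of the classic Berkson construction under assumptions chosen for illustration: two conditions that are independent in the population, each of which can separately lead to admission. All probabilities, the population size and the function names are hypothetical.

```python
import random


def odds_ratio(pairs):
    """Odds ratio between two binary attributes in a list of (x, y) pairs.

    Assumes every cell of the 2x2 table is non-empty; a zero in the
    denominator would raise ZeroDivisionError.
    """
    both = sum(1 for x, y in pairs if x and y)
    only_x = sum(1 for x, y in pairs if x and not y)
    only_y = sum(1 for x, y in pairs if not x and y)
    neither = sum(1 for x, y in pairs if not x and not y)
    return (both * neither) / (only_x * only_y)


def simulate(n=200_000, p_x=0.1, p_y=0.1, p_admit_x=0.5,
             p_admit_y=0.5, p_admit_other=0.05, seed=0):
    """Return (population, hospital) samples of (x, y) pairs.

    x and y are drawn independently, so they are unrelated in the
    population. A subject is admitted if x leads to admission, if y
    leads to admission, or for some unrelated reason.
    """
    rng = random.Random(seed)
    population, hospital = [], []
    for _ in range(n):
        x = rng.random() < p_x
        y = rng.random() < p_y
        population.append((x, y))
        admitted = (
            (x and rng.random() < p_admit_x)
            or (y and rng.random() < p_admit_y)
            or rng.random() < p_admit_other
        )
        if admitted:
            hospital.append((x, y))
    return population, hospital


if __name__ == "__main__":
    population, hospital = simulate()
    print("odds ratio in the population:", round(odds_ratio(population), 2))
    print("odds ratio among hospital cases:", round(odds_ratio(hospital), 2))
```

In the whole population the odds ratio stays close to 1, as expected for independent conditions, while among admitted subjects it falls well below 1. The apparent "protective" effect is produced entirely by the way subjects were selected, much like the tuberculosis and cancer example.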

Another subtype of selection bias is the so-called Neyman bias (prevalence or incidence bias), which occurs when the condition under study causes the premature loss, through death, of the subjects affected by it. For example, in a group of 1000 subjects with high blood pressure (a risk factor for myocardial infarction) and 1000 non-hypertensive subjects followed for 10 years, a strong association may be observed between arterial hypertension and myocardial infarction. However, the association may fail to appear if subjects who die from myocardial infarction during follow-up are not included in the analysis.

Another subtype of selection bias is the so-called non-response bias (self-selection or volunteer effect), which occurs when the degree of motivation of a subject who voluntarily participates in research differs significantly from that of other subjects, leading to over- or under-reporting.

Another that should be mentioned is membership (or belonging) bias, which occurs when the subjects under study include subgroups of individuals who share a particular attribute that is positively or negatively related to the variable under study. For example, the habits and lifestyles of surgeons may differ significantly from those of the general population, so that including a large number of such subjects in a study may produce findings conditioned by this factor.

Another is selection-procedure bias, which occurs in some clinical trials (CT) in which the random assignment to the study groups is not respected (Manterola & Otzen, 2015). Another type of selection bias is loss to follow-up bias, which can occur especially in cohort studies when subjects from one of the study cohorts are totally or partially lost (≥ 20%) and the pre-established follow-up cannot be completed, thus generating a relevant alteration in the results (Lazcano-Ponce et al., 2000; Manterola et al., 2013).


2. Measurement bias

This type of bias occurs when a defect in measuring exposure or outcome generates different information between the study groups being compared (precision). It is therefore due to errors made in obtaining the required information once the eligible subjects are part of the study sample (classification of subjects with and without the EI, or as exposed and non-exposed).

In practice, it can appear as the incorrect classification of subjects, variables or attributes into a category different from the one to which they should have been assigned. The probabilities of misclassification can be the same in all groups under study, which is called “non-differential incorrect classification” (the degree of misclassification)