Statistics Tutorial

Statistical Techniques in the Data Library: A Tutorial

Statistical techniques are essential tools for analyzing large datasets, so this statistics tutorial covers the core skills that most Data Library users will need.

The Data Library is an online tool that provides access to a wealth of climate-related data through an easy-to-use interface. Ingrid, the programming language in which the Data Library was built, offers a variety of functions for manipulating data. These Ingrid functions are simple to apply because they are executed directly through the Data Library interface. The wide range of available functions serves novice and advanced users alike.

While the first Data Library tutorial, Navigating the Data Library, focuses mainly on introducing the Data Library to new users, this statistics tutorial helps users apply the Data Library's statistical functions. Although it addresses some advanced techniques, its sections also cover many basic skills.

The tutorial covers the following topics: measures of central tendency, measures of dispersion, climatologies, standardized anomalies, correlations, climate indices, distributions, singular value decomposition, and interpolation. The tutorial for each statistical function consists of an introduction and a practical example.

Measures of Central Tendency

One of the most common quantities used to summarize a set of data is its center. The center is a single value, chosen so that it gives a reasonable approximation of what is "normal" for the data.
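As a rough illustration of the idea, the sketch below computes two common measures of center, the mean and the median, with Python's standard library. The rainfall values and variable names are hypothetical, and the code is plain Python rather than Ingrid; it is not the Data Library's own implementation.

```python
from statistics import mean, median

# Hypothetical monthly rainfall totals (mm); the last value is a wet outlier.
rainfall_mm = [10.0, 12.0, 11.0, 13.0, 54.0]

# Arithmetic mean: pulled upward by the outlier.
center_mean = mean(rainfall_mm)

# Median: the middle value of the sorted data, less sensitive to the outlier.
center_median = median(rainfall_mm)

print(center_mean, center_median)
```

The gap between the two results shows why the choice of center matters when a dataset contains extreme values.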

Running and Weighted Averages

Both running and weighted averages are important filtering methods for statistical analysis.
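A minimal sketch of both filters follows, assuming a simple running mean computed only where a full window of values is available, and a weighted mean normalized by the sum of the weights. The function names are hypothetical and the code is plain Python, not Ingrid.

```python
def running_mean(values, window):
    """Average each run of `window` consecutive values.

    Returns len(values) - window + 1 averages, one per full window.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


def weighted_mean(values, weights):
    """Average of `values` in which each value counts in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


smoothed = running_mean([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
combined = weighted_mean([10.0, 20.0], weights=[1, 3])
print(smoothed, combined)
```

A longer window smooths more strongly but returns fewer values, since the ends of the series have no complete window.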

Climatologies and Standardized Anomalies

Climatology is commonly known as the study of our climate, yet the term has other important meanings. It also refers to the long-term average of a given variable, often over time periods of 20-30 years.
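The sketch below illustrates the second meaning: a climatology as a long-term average, with each value's departure from it expressed as a standardized anomaly (the departure divided by the standard deviation). The temperature record is hypothetical and kept far shorter than 20-30 years for readability. The use of the population standard deviation is an assumption made for this example.

```python
from statistics import mean, pstdev

# Hypothetical January mean temperatures (deg C), one value per year.
january_temps = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Climatology: the long-term average of the variable.
climatology = mean(january_temps)

# Spread of the record around the climatology (population standard deviation).
spread = pstdev(january_temps)

# Standardized anomaly: departure from the climatology in units of spread.
standardized_anomalies = [(t - climatology) / spread for t in january_temps]

print(climatology, spread, standardized_anomalies)
```

Because standardized anomalies are dimensionless, they allow variables with different units or different variability to be compared directly.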

Data Homogeneity

It is often important to determine if a set of data is homogeneous before any statistical technique is applied to it. Homogeneous data are drawn from a single population.

Stationarity

A random variable or random process is said to be stationary if all of its statistical parameters are independent of time. While most statistical techniques require that data be stationary, most atmospheric processes are visibly nonstationary.

Measures of Dispersion

While measures of central tendency are used to estimate "normal" values of a dataset, measures of dispersion are important for describing the spread of the data, or its variation around a central value.
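Three common measures of spread (the range, the variance, and the standard deviation) can be sketched as follows. The data values are hypothetical, and the sample forms of the variance and standard deviation (dividing by n - 1) are an assumption made for this example.

```python
from statistics import stdev, variance

# Hypothetical daily temperature readings (deg C).
readings = [1.0, 2.0, 3.0, 4.0, 5.0]

# Range: distance between the largest and smallest values.
data_range = max(readings) - min(readings)

# Sample variance: mean squared deviation from the mean, divided by n - 1.
sample_variance = variance(readings)

# Sample standard deviation: square root of the variance, in the data's units.
sample_std = stdev(readings)

print(data_range, sample_variance, sample_std)
```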

Correlation

The correlation is defined as the measure of linear association between two variables. A single value, commonly referred to as the correlation coefficient, is often needed to describe this association.
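As an illustration, the sketch below computes the Pearson correlation coefficient, the most common such measure, directly from its definition. The function name is hypothetical and the code is plain Python, not the Data Library's own correlation function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of products of paired deviations from the means.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Root sums of squared deviations for each variable.
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return co_deviation / (dev_x * dev_y)


r_positive = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r_negative = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(r_positive, r_negative)
```

The coefficient ranges from -1 (perfect inverse linear relationship) to +1 (perfect direct linear relationship), with values near 0 indicating little linear association.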

Climate Indices

Indices are diagnostic tools used to describe the state of a climate system. Climate indices are most often represented with a time series; each point in time corresponds to one index value.

Frequency Distributions

A frequency distribution is one of the most common graphical tools used to describe a single population. It is a tabulation of the frequencies of each value (or range of values).
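The tabulation step can be sketched as below, assuming equal-width bins that start at multiples of the bin width. The function name and data are hypothetical.

```python
from collections import Counter


def frequency_table(values, bin_width):
    """Count how many values fall in each half-open bin [lower, upper)."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    # Floor division maps each value to the index of the bin containing it.
    counts = Counter(int(v // bin_width) for v in values)
    return {
        (index * bin_width, (index + 1) * bin_width): counts[index]
        for index in sorted(counts)
    }


table = frequency_table([1, 3, 7, 8, 12, 14, 15], bin_width=5)
for (lower, upper), count in table.items():
    print(f"[{lower}, {upper}): {count}")
```

Plotting these counts as bars gives the familiar histogram. Bins with no values are omitted from the table in this sketch.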

Singular Value Decomposition

Singular value decomposition (SVD) is quite possibly the most widely used multivariate statistical technique in the atmospheric sciences. The technique was first introduced to meteorology in a 1956 paper by Edward Lorenz, in which he referred to the process as empirical orthogonal function (EOF) analysis. Today, it is also commonly known as principal-component analysis (PCA). All three names are still used, and they refer to the same set of procedures within the Data Library.

Interpolation Techniques

Interpolation is the process of using known data values to estimate unknown data values.
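The simplest case is linear interpolation between two known points, sketched below. The function name is hypothetical, and linear interpolation is only one of several possible techniques, chosen here as an assumption for illustration.

```python
def linear_interpolate(x0, y0, x1, y1, x):
    """Estimate y at x on the straight line through (x0, y0) and (x1, y1)."""
    if x1 == x0:
        raise ValueError("the two known points must have different x values")
    # Fraction of the way from x0 to x1 at which x lies.
    fraction = (x - x0) / (x1 - x0)
    return y0 + fraction * (y1 - y0)


# Known values of 10.0 at x = 0 and 20.0 at x = 10; estimate the value at x = 2.5.
estimate = linear_interpolate(0.0, 10.0, 10.0, 20.0, 2.5)
print(estimate)
```

The same function extrapolates if x lies outside the interval between x0 and x1, which is usually less reliable than interpolating within it.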