Display of Categorical Data

From Department of Mathematics at UTSA
Revision as of 21:31, 17 December 2021 by Khanh.

Categorical variable

In statistics, a categorical variable (also called a qualitative variable) is a variable that can take on one of a limited, and usually fixed, number of possible values, assigning each individual or other unit of observation to a particular group or nominal category on the basis of some qualitative property. In computer science and some branches of mathematics, categorical variables are referred to as enumerations or enumerated types. Commonly (though not in this article), each of the possible values of a categorical variable is referred to as a level. The probability distribution associated with a random categorical variable is called a categorical distribution.
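As a minimal sketch of two ideas from this paragraph, the Python below models blood type as an enumerated type and attaches a categorical distribution to it. The class name `BloodType`, the functions `pmf` and `sample`, and the probabilities in `PROBS` are hypothetical, chosen only for illustration; they are not real population frequencies.

```python
import random
from enum import Enum


class BloodType(Enum):
    """A categorical variable as an enumerated type: exactly four possible values."""
    A = "A"
    B = "B"
    AB = "AB"
    O = "O"


# Hypothetical probabilities for illustration only (not real population data).
# A categorical distribution assigns one probability to each category; they sum to 1.
PROBS = {BloodType.A: 0.3, BloodType.B: 0.2, BloodType.AB: 0.1, BloodType.O: 0.4}


def pmf(value: BloodType) -> float:
    """Probability that the categorical random variable takes the given value."""
    return PROBS[value]


def sample(n: int, seed: int = 0) -> list:
    """Draw n independent observations from the categorical distribution."""
    rng = random.Random(seed)  # seeded generator so the draws are reproducible
    categories = list(PROBS)
    weights = [PROBS[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)
```

For example, `pmf(BloodType.O)` returns `0.4`, and `sample(5)` returns five `BloodType` members drawn with those weights.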

Categorical data is the statistical data type consisting of categorical variables or of data that has been converted into that form, for example as grouped data. More specifically, categorical data may derive from observations made of qualitative data that are summarised as counts or cross tabulations, or from observations of quantitative data grouped within given intervals. Often, purely categorical data are summarised in the form of a contingency table. However, particularly when considering data analysis, it is common to use the term "categorical data" to apply to data sets that, while containing some categorical variables, may also contain non-categorical variables.
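The following Python sketch shows two ways categorical data can arise, as described above: cross-tabulating two categorical variables into a contingency table, and grouping quantitative observations into given intervals. The helper names and the half-open interval convention `[lo, hi)` are assumptions made for this example, and sorting the levels assumes they are mutually comparable (for instance, all strings).

```python
from collections import Counter


def contingency_table(rows, cols):
    """Cross-tabulate two categorical variables observed on the same units.

    Returns (row_levels, col_levels, counts), where counts[i][j] is the number
    of units whose first variable equals row_levels[i] and whose second
    variable equals col_levels[j].
    """
    if len(rows) != len(cols):
        raise ValueError("both variables must be observed on the same units")
    pair_counts = Counter(zip(rows, cols))  # count each (row, col) combination
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    counts = [[pair_counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, counts


def group_into_intervals(values, edges):
    """Turn quantitative data into categorical data by binning.

    Each value x is labelled "[lo, hi)" for the interval with lo <= x < hi.
    Values outside [edges[0], edges[-1]) raise ValueError.
    """
    labels = []
    for x in values:
        for lo, hi in zip(edges, edges[1:]):
            if lo <= x < hi:
                labels.append(f"[{lo}, {hi})")
                break
        else:  # no interval matched
            raise ValueError(f"{x} is outside the given intervals")
    return labels
```

With hypothetical data, `contingency_table(["F", "M", "F", "F", "M"], ["yes", "no", "no", "yes", "no"])` returns the levels `["F", "M"]` and `["no", "yes"]` with counts `[[1, 2], [2, 0]]`, and `group_into_intervals([3, 17, 42], [0, 10, 20, 50])` returns `["[0, 10)", "[10, 20)", "[20, 50)"]`.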

A categorical variable that can take on exactly two values is termed a binary variable or a dichotomous variable; an important special case is the Bernoulli variable. Categorical variables with more than two possible values are called polytomous variables; categorical variables are often assumed to be polytomous unless otherwise specified. Discretization is treating continuous data as if it were categorical. Dichotomization is treating continuous data or polytomous variables as if they were binary variables. Regression analysis often represents category membership by one or more quantitative dummy variables.
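A minimal Python sketch of dummy-variable coding and of dichotomization follows. It assumes reference coding, in which the first level gets no column, so a variable with k levels becomes k - 1 dummies; this is one common convention, not the only one. The function names and the `>= threshold` cut rule are illustrative choices.

```python
def dummy_encode(values, levels=None, drop_first=True):
    """Represent category membership with 0/1 dummy variables.

    With drop_first=True the first level is the reference category and gets
    no column, so k levels yield k - 1 dummies. This avoids perfectly
    collinear columns when the regression model also has an intercept.
    """
    if levels is None:
        levels = sorted(set(values))
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not among the levels: {sorted(unknown)}")
    columns = list(levels[1:]) if drop_first else list(levels)
    # One row per observation: 1 in the column of its level, 0 elsewhere.
    rows = [[1 if v == level else 0 for level in columns] for v in values]
    return columns, rows


def dichotomize(values, threshold):
    """Treat continuous data as binary: 1 if value >= threshold, else 0."""
    return [1 if x >= threshold else 0 for x in values]
```

For the rock types `["igneous", "sedimentary", "metamorphic", "igneous"]`, `dummy_encode` returns the columns `["metamorphic", "sedimentary"]` and the rows `[[0, 0], [0, 1], [1, 0], [0, 0]]`, with igneous as the reference category. `dichotomize([1.5, 2.0, 3.7], 2.0)` returns `[0, 1, 1]`.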

Examples of categorical variables

Examples of values that might be represented in a categorical variable:

  • The roll of a six-sided die: possible outcomes are 1, 2, 3, 4, 5, or 6.
  • Demographic information of a population: gender, disease status.
  • The blood type of a person: A, B, AB or O.
  • The political party that a voter might vote for, e.g. Green Party, Christian Democrat, Social Democrat, etc.
  • The type of a rock: igneous, sedimentary or metamorphic.
  • The identity of a particular word (e.g., in a language model): One of V possible choices, for a vocabulary of size V.
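For the last example, here is a minimal Python sketch of how word identity becomes a categorical variable: each distinct word is assigned an integer id from 0 to V - 1. Building the vocabulary from the input text itself is an assumption made to keep the example self-contained; `build_vocabulary` and `encode` are hypothetical helper names, not part of any particular language-modelling library.

```python
def build_vocabulary(tokens):
    """Assign each distinct word an integer id in 0..V-1, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # next unused id
    return vocab


def encode(tokens, vocab):
    """Replace each word by its id; each id is one of V = len(vocab) categories."""
    return [vocab[tok] for tok in tokens]
```

For the tokens of `"the cat sat on the mat"`, the vocabulary has V = 5 entries and `encode` returns `[0, 1, 2, 3, 0, 4]`.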

Licensing

Content obtained and/or adapted from: