Statistics And Probability

Statistics is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. This course is a comprehensive explanation of statistics, which is also crucial for data science.

⭐️ Table of Contents ⭐️

  1. ⌨️ (0:00) Lesson 1: Getting started with statistics
  2. ⌨️ (16:57) Lesson 2: Data Classification
  3. ⌨️ (40:32) Lesson 3: The process of statistical study
  4. ⌨️ (1:05:30) Lesson 4: Frequency distribution
  5. ⌨️ (1:28:48) Lesson 5: Graphical displays of data
  6. ⌨️ (2:05:34) Lesson 6: Analyzing graph
  7. ⌨️ (2:17:25) Lesson 7: Measures of Center
  8. ⌨️ (2:48:20) Lesson 8: Measures of Dispersion
  9. ⌨️ (3:19:27) Lesson 9: Measures of relative position
  10. ⌨️ (3:44:09) Lesson 10: Introduction to probability
  11. ⌨️ (4:02:15) Lesson 11: Addition rules for probability
  12. ⌨️ (4:16:07) Lesson 12: Multiplication rules for probability
  13. ⌨️ (4:33:18) Lesson 13: Combinations and permutations
  14. ⌨️ (4:46:11) Lesson 14: Combining probability and counting techniques
  15. ⌨️ (4:57:09) Lesson 15: Discrete distribution
  16. ⌨️ (5:21:08) Lesson 16: The binomial distribution
  17. ⌨️ (5:43:10) Lesson 17: The Poisson distribution
  18. ⌨️ (6:01:15) Lesson 18: The hypergeometric
  19. ⌨️ (6:21:10) Lesson 19: The uniform distribution
  20. ⌨️ (6:46:59) Lesson 20: The exponential distribution
  21. ⌨️ (7:02:01) Lesson 21: The normal distribution
  22. ⌨️ (7:21:06) Lesson 22: Approximating the binomial
  23. ⌨️ (7:42:36) Lesson 23: The central limit theorem
  24. ⌨️ (7:56:54) Lesson 24: The distribution of sample mean
  25. ⌨️ (8:22:03) Lesson 25: The distribution of sample proportion
  26. ⌨️ (8:41:50) Lesson 26: Confidence interval
  27. ⌨️ (9:09:32) Lesson 27: The theory of hypothesis testing
  28. ⌨️ (9:53:50) Lesson 28: Handling proportions
  29. ⌨️ (10:21:38) Lesson 29: Discrete distribution matching
  30. ⌨️ (10:50:05) Lesson 30: Categorical independence
  31. ⌨️ (11:11:53) Lesson 31: Analysis of variance

Topic 3 – Statistics and probability – the geometric and negative binomial distributions, unbiased estimators, statistical hypothesis testing and an introduction to bivariate distributions


 Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as “all people living in a country” or “every atom composing a crystal”. Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution’s central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a “false positive”) and Type II errors (null hypothesis fails to be rejected and an actual relationship between populations is missed giving a “false negative”). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.

The normal distribution, a very common probability density, useful because of the central limit theorem.

Scatter plots are used in descriptive statistics to show the observed relationships between different variables, here using the Iris flower data set.


Statistics is a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data; it is also described as a branch of mathematics. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as “all people living in a country” or “every atom composing a crystal”. Ideally, statisticians compile data about the entire population (an operation called a census). This may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data (like income), while frequency and percentage are more useful in terms of describing categorical data (like education).
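As a minimal sketch of these descriptors, the following Python snippet computes a mean and standard deviation for a continuous variable and frequencies and percentages for a categorical one. The income and education values are hypothetical and serve only as an illustration.

```python
import statistics
from collections import Counter

# Hypothetical sample: annual income (continuous) and education level (categorical).
incomes = [32_000, 45_000, 51_000, 38_000, 120_000]
education = ["secondary", "tertiary", "secondary", "primary", "tertiary"]

mean_income = statistics.mean(incomes)   # arithmetic mean
sd_income = statistics.stdev(incomes)    # sample standard deviation (n - 1 denominator)

# Frequency and percentage of each category.
counts = Counter(education)
percentages = {level: 100 * n / len(education) for level, n in counts.items()}

print("mean income:", mean_income)
print("standard deviation:", round(sd_income, 1))
print("frequencies:", dict(counts))
print("percentages:", percentages)
```

Note that the single large income pulls the mean well above most of the observations, which is one reason a measure of dispersion is usually reported alongside a measure of center.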

When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, drawing the sample contains an element of randomness; hence, the numerical descriptors from the sample are also prone to uncertainty. To draw meaningful conclusions about the entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences about the population represented while accounting for randomness. These inferences may take the form of answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction, and estimation of unobserved values either in or associated with the population being studied. It can include extrapolation and interpolation of time series or spatial data, and data mining.

Mathematical statistics

Mathematical statistics is the application of mathematics to statistics. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.


Gerolamo Cardano, a pioneer on the mathematics of probability.

The earliest European writing on statistics dates back to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt. Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its stat- etymology. The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.

The mathematical foundations of modern statistics were laid in the 17th century with the development of probability theory by Gerolamo Cardano, Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel. The method of least squares was first described by Adrien-Marie Legendre in 1805.

Karl Pearson, a founder of mathematical statistics.

The modern field of statistics emerged in the late 19th and early 20th century in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. Galton’s contributions included introducing the concepts of standard deviation, correlation, and regression analysis, and the application of these methods to the study of the variety of human characteristics, such as height, weight, and eyelash length. Pearson developed the Pearson product-moment correlation coefficient, defined as a product-moment, the method of moments for the fitting of distributions to samples, and the Pearson distribution, among many other things. Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biostatistics (then called biometry), and the latter founded the world’s first university statistics department at University College London.

Ronald Fisher coined the term null hypothesis during the Lady tasting tea experiment; the null hypothesis “is never proved or established, but is possibly disproved, in the course of experimentation”.

The second wave of the 1910s and 20s was initiated by William Sealy Gosset, and reached its culmination in the insights of Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher’s most important publications were his 1918 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance (which was the first to use the statistical term variance), his classic 1925 work Statistical Methods for Research Workers, and his 1935 The Design of Experiments, where he developed rigorous design of experiments models. He originated the concepts of sufficiency, ancillary statistics, Fisher’s linear discriminator and Fisher information. In his 1930 book The Genetical Theory of Natural Selection, he applied statistics to various biological concepts such as Fisher’s principle (which A. W. F. Edwards called “probably the most celebrated argument in evolutionary biology”) and Fisherian runaway, a concept in sexual selection about a positive feedback runaway effect found in evolution.

The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of “Type II” error, power of a test and confidence intervals. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.

Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.

Statistical data

Data collection


When full census data cannot be collected, statisticians collect sample data by developing specific experiment designs and survey samples. Statistics itself also provides tools for prediction and forecasting through statistical models.

To use a sample as a guide to an entire population, it is important that it truly represents the overall population. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent that the sample chosen is actually representative. Statistics offers methods to estimate and correct for any bias within the sample and data collection procedures. There are also methods of experimental design for experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Sampling theory is part of the mathematical discipline of probability theory. Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method. The difference in point of view between classic probability theory and sampling theory is, roughly, that probability theory starts from the given parameters of a total population to deduce probabilities that pertain to samples. Statistical inference, however, moves in the opposite direction, inductively inferring from samples to the parameters of a larger or total population.

Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, such as natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.


The basic steps of a statistical experiment are:

  1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.
  2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that will guide the performance of the experiment and which specifies the primary analysis of the experimental data.
  3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.
  4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
  5. Documenting and presenting the results of the study.
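
The design step can be illustrated with a short Python sketch of randomized assignment within blocks. The function name `assign_treatments`, the block labels, and the subject identifiers are hypothetical, and the sketch assumes each block size is a multiple of the number of treatments so that the assignment is balanced.

```python
import random

def assign_treatments(blocks, treatments, seed=0):
    """Randomly assign treatments to subjects within each block.

    blocks: dict mapping a block label to a list of subject ids.
    treatments: list of treatment labels. Each block's size is assumed to be
    a multiple of len(treatments) so that every block is balanced.
    """
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    assignment = {}
    for label, subjects in blocks.items():
        if len(subjects) % len(treatments) != 0:
            raise ValueError(f"block {label!r} cannot be balanced")
        # Repeat the treatment labels to fill the block, then shuffle them.
        labels = treatments * (len(subjects) // len(treatments))
        rng.shuffle(labels)
        for subject, treatment in zip(subjects, labels):
            assignment[subject] = treatment
    return assignment

# Hypothetical blocks formed on a suspected confounding variable (age group).
blocks = {
    "under 40": ["s1", "s2", "s3", "s4"],
    "40 and over": ["s5", "s6", "s7", "s8"],
}
plan = assign_treatments(blocks, ["control", "new"])
for subject, treatment in sorted(plan.items()):
    print(subject, treatment)
```

Shuffling inside each block, rather than across all subjects at once, keeps the treatment groups comparable on the blocking variable while leaving the assignment of any individual subject to chance.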

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and of blinding. The Hawthorne effect refers to finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.

Observational study 

An example of an observational study is one that explores the association between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a cohort study, and then look for the number of cases of lung cancer in each group. A case-control study is another type of observational study in which people with and without the outcome of interest (e.g. lung cancer) are invited to participate and their exposure histories are collected.

Types of data

Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens defined nominal, ordinal, interval, and ratio scales. Nominal measurements do not have meaningful rank order among values, and permit any one-to-one (injective) transformation. Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values, and permit any order-preserving transformation. Interval measurements have meaningful distances between measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in Celsius or Fahrenheit), and permit any linear transformation. Ratio measurements have both a meaningful zero value and the distances between different measurements defined, and permit any rescaling transformation.
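
The difference between interval and ratio scales can be checked numerically. The sketch below, using three arbitrarily chosen temperatures, shows that converting Celsius to Fahrenheit changes the ratio of two values but preserves the ratio of two differences, which is the sense in which only distances are meaningful on an interval scale.

```python
def c_to_f(celsius):
    # Celsius to Fahrenheit is a linear transformation with a shifted zero.
    return celsius * 9 / 5 + 32

# Three arbitrary temperatures in Celsius and their Fahrenheit equivalents.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratios of values depend on the unit, so "twice as hot" is not meaningful here.
ratio_c = b_c / a_c
ratio_f = b_f / a_f

# Ratios of differences do not depend on the unit.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

print("value ratios:", ratio_c, ratio_f)
print("difference ratios:", diff_ratio_c, diff_ratio_f)
```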

Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as  categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous, due to their numerical nature. Such distinctions can often be loosely correlated with data type in computer science, in that dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating point computation. But the mapping of computer science data types to statistical data types depends on which categorization of the latter is being implemented.

Other categorizations have been proposed. For example, Mosteller and Tukey (1977) distinguished grades, ranks, counted fractions, counts, amounts, and balances. Nelder (1990) described continuous counts, continuous ratios, count ratios, and categorical modes of data. (See also: Chrisman (1998), van den Berg (1991).)

The issue of whether or not it is appropriate to apply different kinds of statistical methods to data obtained from different kinds of measurement procedures is complicated by issues concerning the transformation of variables and the precise interpretation of research questions. “The relationship between the data and what they describe merely reflects the fact that certain kinds of statistical statements may have truth values which are not invariant under some transformations. Whether or not a transformation is sensible to contemplate depends on the question one is trying to answer.”


Descriptive statistics

A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features of a collection of information,[44] while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.

Inferential statistics

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.[45] Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population.

Terminology and theory of inferential statistics

Statistics, estimators and pivotal quantities

Consider independent identically distributed (IID) random variables with a given probability distribution: standard statistical inference and estimation theory defines a random sample as the random vector given by the column vector of these IID variables.  The population being examined is described by a probability distribution that may have unknown parameters.

A statistic is a random variable that is a function of the random sample, but not a function of unknown parameters. The probability distribution of the statistic, though, may have unknown parameters. Consider now a function of the unknown parameter: an estimator is a statistic used to estimate such function. Commonly used estimators include sample mean, unbiased sample variance and sample covariance.
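
As an illustration, two of these estimators can be computed with Python's standard `statistics` module. The sample values are hypothetical; the point of the sketch is only the distinction between the unbiased sample variance, which divides by n - 1, and the variance formula that divides by n.

```python
import statistics

sample = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical observations

sample_mean = statistics.fmean(sample)

# statistics.variance divides the sum of squared deviations by n - 1
# (the unbiased sample variance); statistics.pvariance divides by n.
unbiased_var = statistics.variance(sample)
divide_by_n_var = statistics.pvariance(sample)

print("sample mean:", sample_mean)
print("unbiased sample variance:", unbiased_var)
print("variance with n denominator:", divide_by_n_var)
```

Each of these quantities is a function of the sample alone, with no unknown parameter in its formula, which is what qualifies it as a statistic.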

A random variable that is a function of the random sample and of the unknown parameter, but whose probability distribution does not depend on the unknown parameter is called a pivotal quantity or pivot. Widely used pivots include the z-score, the chi square statistic and Student’s t-value.

Between two estimators of a given parameter, the one with lower mean squared error is said to be more efficient. Furthermore, an estimator is said to be unbiased if its expected value is equal to the true value of the unknown parameter being estimated, and asymptotically unbiased if its expected value converges at the limit to the true value of such parameter.
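
The link between mean squared error and bias can be written out explicitly. The notation here is an assumption introduced for illustration: θ denotes the unknown parameter and θ̂ an estimator of it. The standard decomposition is:

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \operatorname{Var}(\hat{\theta}) + \big(\operatorname{Bias}(\hat{\theta})\big)^2,
\qquad
\operatorname{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta .
```

For an unbiased estimator the bias term is zero, so comparing two unbiased estimators by mean squared error amounts to comparing their variances.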

Other desirable properties for estimators include: UMVUE estimators, which have the lowest variance for all possible values of the parameter to be estimated (this is usually an easier property to verify than efficiency), and consistent estimators, which converge in probability to the true value of such parameter.

This still leaves the question of how to obtain estimators in a given situation and carry out the computation. Several methods have been proposed: the method of moments, the maximum likelihood method, the least squares method and the more recent method of estimating equations.

Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis which is usually (but not necessarily) that no relationship exists among variables or that no change occurred over time.[47][48]

The best illustration for a novice is the predicament encountered in a criminal trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of the guilt. The H0 (status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence “beyond a reasonable doubt”. However, “failure to reject H0” in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot “prove” a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.


Working from a null hypothesis, two broad categories of error are recognized:

  • Type I errors where the null hypothesis is falsely rejected, giving a “false positive”.
  • Type II errors where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a “false negative”.
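
Both error rates can be estimated by simulation. The sketch below is an illustration under stated assumptions rather than a prescribed procedure: it uses a two-sided z-test with known standard deviation, a significance level of 0.05, samples of size 25, and a hypothetical true mean of 0.5 for the case where the null hypothesis is false. The function names are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_rejects(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0, with known standard deviation sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) > critical

def rejection_rate(true_mean, mu0=0.0, sigma=1.0, n=25, trials=2000, seed=1):
    """Fraction of simulated samples for which H0: mean == mu0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += z_test_rejects(sample, mu0, sigma)
    return rejections / trials

# H0 is true: every rejection is a Type I error, so the rate should be near alpha.
type_1_rate = rejection_rate(true_mean=0.0)

# H0 is false: every failure to reject is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=0.5)

print("estimated Type I error rate:", type_1_rate)
print("estimated Type II error rate:", type_2_rate)
```

Lowering the significance level reduces the Type I rate but, for a fixed sample size, raises the Type II rate; increasing the sample size is the usual way to reduce both.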

Standard deviation refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean, while standard error refers to an estimate of the difference between the sample mean and the population mean.
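
For the sample mean, the standard error is commonly estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch with hypothetical measurements:

```python
import math
import statistics

sample = [4.1, 5.0, 5.6, 4.7, 5.3, 4.9, 5.2, 5.2]  # hypothetical measurements

sd = statistics.stdev(sample)      # spread of the individual observations
se = sd / math.sqrt(len(sample))   # estimated standard error of the sample mean

print("standard deviation:", round(sd, 3))
print("standard error of the mean:", round(se, 3))
```

The standard deviation stays roughly the same as more observations are collected, whereas the standard error shrinks, because the sample mean becomes a more precise estimate of the population mean.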

A statistical error is the amount by which an observation differs from its expected value. A residual is the amount an observation differs from the value the estimator of the expected value assumes on a given sample (also called prediction).

Mean squared error is used for obtaining efficient estimators, a widely used class of estimators. Root mean square error is simply the square root of mean squared error.


A least squares fit: in red the points to be fitted, in blue the fitted line.

Many statistical methods seek to minimize the residual sum of squares, and these are called “methods of least squares” in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. Residual sum of squares is also differentiable, which provides a handy property for doing regression. Least squares applied to linear regression is called the ordinary least squares method and least squares applied to nonlinear regression is called non-linear least squares. Also, in a linear regression model the non-deterministic part of the model is called the error term, disturbance or more simply noise. Both linear regression and non-linear regression are addressed in polynomial least squares, which also describes the variance in a prediction of the dependent variable (y axis) as a function of the independent variable (x axis) and the deviations (errors, noise, disturbances) from the estimated (fitted) curve.
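
For a straight-line model, the ordinary least squares estimates have a closed form. The following Python sketch fits a line to hypothetical data points and computes the residuals and the residual sum of squares; the function name `ols_fit` is an assumption made for this illustration.

```python
def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residual_sum_of_squares(xs, ys, intercept, slope):
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical observations

intercept, slope = ols_fit(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
rss = residual_sum_of_squares(xs, ys, intercept, slope)

print("intercept:", round(intercept, 4), "slope:", round(slope, 4))
print("residual sum of squares:", round(rss, 4))
```

Any other choice of intercept or slope gives a larger residual sum of squares on these points, and with an intercept in the model the residuals sum to zero up to rounding.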

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.[49]

Interval estimation


Confidence intervals: the red line is the true value for the mean in this example, the blue lines are random confidence intervals for 100 realizations.

Most studies only sample part of a population, so results don’t fully represent the whole population. Any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. This does not imply that the probability that the true value is in the confidence interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable. Either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by “probability”, that is as a Bayesian probability.
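The repeated-sampling interpretation can be checked by simulation. The sketch below is an illustration under stated assumptions: the population is normal with a known standard deviation, so the interval is the textbook z-interval, and the true mean, sample size, and number of repetitions are arbitrary choices for this example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 10.0   # assumed population mean (known only because we simulate)
SIGMA = 2.0        # assumed known population standard deviation
N = 25             # observations per simulated study
REPEATS = 2000     # number of simulated studies

# Two-sided 95% critical value of the standard normal distribution (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * SIGMA / math.sqrt(N)

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    m = statistics.mean(sample)
    # Each study produces its own interval; the true mean stays fixed.
    if m - half_width <= TRUE_MEAN <= m + half_width:
        covered += 1

coverage = covered / REPEATS
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

The printed fraction should be close to 0.95. Note that the randomness sits in the intervals, not in the true mean, which matches the frequentist reading described above.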

In principle confidence intervals can be symmetrical or asymmetrical. An interval can be asymmetrical because it works as a lower or upper bound for a parameter (left-sided interval or right-sided interval), but it can also be asymmetrical because the two-sided interval is built violating symmetry around the estimate. Sometimes the bounds for a confidence interval are reached asymptotically and these are used to approximate the true bounds.


Statistics rarely give a simple Yes/No type answer to the question under analysis. Interpretation often comes down to the level of statistical significance applied to the numbers, and often refers to the p-value: the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the one obtained.


In this graph the black line is the probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic) and the p-value is represented by the green area.

The standard approach[46] is to test a null hypothesis against an alternative hypothesis. A critical region is the set of values of the estimator that leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance) and the probability of type II error is the probability that the estimator doesn’t belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.

Referring to statistical significance does not necessarily mean that the overall result is significant in real world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.

Although in principle the acceptable level of statistical significance may be subject to debate, the significance level is the largest p-value that allows the test to reject the null hypothesis. This test is logically equivalent to saying that the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. Therefore, the smaller the significance level, the lower the probability of committing type I error.
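As a worked example of the p-value definition, consider a hypothetical experiment that is not part of the text: a coin is flipped 20 times and lands heads 15 times, and the null hypothesis is that the coin is fair. The sketch below computes the exact one-sided p-value from the binomial distribution and compares it with two common significance levels.

```python
from math import comb

FLIPS = 20        # hypothetical number of coin flips
HEADS = 15        # hypothetical observed number of heads
P_NULL = 0.5      # null hypothesis: the coin is fair

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# One-sided p-value: probability, under the null hypothesis, of a result
# at least as extreme as the one observed (15 or more heads).
p_value = sum(binomial_pmf(k, FLIPS, P_NULL) for k in range(HEADS, FLIPS + 1))

for alpha in (0.05, 0.01):
    decision = "reject" if p_value <= alpha else "do not reject"
    print(f"alpha = {alpha}: p-value = {p_value:.4f}, {decision} the null hypothesis")
```

Here the p-value is 21700/1048576, about 0.0207, so the null hypothesis is rejected at the 0.05 level but not at the 0.01 level. A smaller significance level makes a type I error less likely, at the cost of rejecting less often.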

Some problems are usually associated with this framework (See criticism of hypothesis testing):

  • A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this. One response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.
  • Fallacy of the transposed conditional, aka prosecutor’s fallacy: criticisms arise because the hypothesis testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not probability of the null hypothesis given the observed result. An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.
  • Rejecting the null hypothesis does not automatically prove the alternative hypothesis.
  • As with everything in inferential statistics, it relies on sample size, and therefore under fat tails p-values may be seriously mis-computed.

Some well-known statistical tests and procedures are:

Exploratory data analysis

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.


Misuse of statistics can produce subtle but serious errors in description and interpretation—subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data—which measures the extent to which a trend could be caused by random variation in the sample—may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

There is a general perception that statistical knowledge is all too frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter. A mistrust and misunderstanding of statistics is associated with the quotation, “There are three kinds of lies: lies, damned lies, and statistics”. Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics, by Darrell Huff, outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).

Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias. Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs. Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[54] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole. According to Huff, “The dependability of a sample can be destroyed by [bias]… allow yourself some degree of skepticism.”

To assist in the understanding of statistics Huff proposed a series of questions to be asked in each case:

  • Who says so? (Does he/she have an axe to grind?)
  • How does he/she know? (Does he/she have the resources to know the facts?)
  • What’s missing? (Does he/she give us a complete picture?)
  • Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
  • Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.

Misinterpretation: correlation


The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables.
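A small simulation shows how a confounding variable produces correlation without causation. This is a sketch under assumptions made for the example: Z is standard normal, and X and Y are each Z plus independent noise, so neither X nor Y influences the other.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

n = 5000
z = [random.gauss(0, 1) for _ in range(n)]   # the confounding variable
x = [zi + random.gauss(0, 1) for zi in z]    # X depends on Z, not on Y
y = [zi + random.gauss(0, 1) for zi in z]    # Y depends on Z, not on X

corr_xy = pearson(x, y)

# Remove the contribution of Z (known exactly here because we simulated it).
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
corr_rest = pearson(x_rest, y_rest)

print(f"correlation of X and Y:           {corr_xy:.3f}")
print(f"correlation after removing Z:     {corr_rest:.3f}")
```

X and Y show a clear positive correlation, which disappears once the shared dependence on Z is removed. In real data the confounder is usually not known this precisely, which is why the inference is hard.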


Applied statistics, theoretical statistics and mathematical statistics

Applied statistics, sometimes referred to as Statistical science, comprises descriptive statistics and the application of inferential statistics.[58][59] Theoretical statistics concerns the logical arguments underlying justification of approaches to statistical inference, as well as encompassing mathematical statistics. Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.

Statistical consultants can help organizations and companies that don’t have in-house expertise relevant to their particular questions.

Machine learning and data mining

Machine learning models are statistical and probabilistic models that capture patterns in the data through use of computational algorithms.

Statistics in academia 

Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Business statistics applies statistical methods in econometrics, auditing, and production and operations, including services improvement and marketing research. A study of two journals in tropical biology found that the 12 most frequent statistical tests are: Analysis of Variance (ANOVA), Chi-Square Test, Student’s T Test, Linear Regression, Pearson’s Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon’s Diversity Index, Tukey’s Test, Cluster Analysis, Spearman’s Rank Correlation Test, and Principal Component Analysis.

A typical statistics course covers descriptive statistics, probability, binomial and normal distributions, test of hypotheses and confidence intervals, linear regression, and correlation. Modern fundamental statistical courses for undergraduate students focus on correct test selection, results interpretation, and use of free statistics software.

Statistical computing

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.

Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of Bayesian models more feasible. The computer revolution has implications for the future of statistics with a new emphasis on “experimental” and “empirical” statistics. A large number of both general and special purpose statistical software packages are now available. Examples of available software capable of complex statistical computation include programs such as Mathematica, SAS, SPSS, and R.
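As an illustration of a resampling method, the following sketch runs a permutation test for a difference in means between two small groups. The data are hypothetical. Because the groups are tiny, the sketch enumerates every possible regrouping instead of drawing random permutations, which is an assumption of this example; larger data sets would normally use random resampling.

```python
from itertools import combinations

# Hypothetical measurements for two groups of four subjects each.
group_a = [12, 14, 15, 17]
group_b = [8, 9, 10, 13]

def mean(values):
    return sum(values) / len(values)

observed = mean(group_a) - mean(group_b)

pooled = group_a + group_b
size_a = len(group_a)

count = 0   # regroupings with a difference at least as large as observed
total = 0   # all possible regroupings
for indices in combinations(range(len(pooled)), size_a):
    chosen = set(indices)
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    total += 1
    # Small tolerance guards against floating-point rounding in the comparison.
    if mean(a) - mean(b) >= observed - 1e-12:
        count += 1

# One-sided p-value: share of regroupings as extreme as the observed split.
p_value = count / total
print(f"observed difference = {observed}, p-value = {count}/{total} = {p_value:.4f}")
```

Under the null hypothesis that group labels are interchangeable, only 2 of the 70 possible splits give a difference as large as the observed one, so the one-sided p-value is 2/70.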

Business statistics

In business, “statistics” is a widely used management and decision support tool. It is particularly applied in financial management, marketing management, and production, services and operations management. Statistics is also heavily used in management accounting and auditing. The discipline of Management Science formalizes the use of statistics, and other mathematics, in business. (Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships.)

A typical “Business Statistics” course is intended for business majors, and covers[65] descriptive statistics (collection, description, analysis, and summary of data), probability (typically the binomial and normal distributions), test of hypotheses and confidence intervals, linear regression, and correlation; follow-on courses may include forecasting, time series, decision trees, multiple linear regression, and other topics from business analytics more generally. See also Business mathematics § University level. Professional certification programs, such as the CFA, often include topics in statistics.

Statistics applied to mathematics or the arts

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was “required learning” in most sciences. This tradition has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.

  • In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
  • Predictive methods of statistics in forecasting combining chaos theory and fractal geometry can be used to create video works.[66]
  • The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed.[67] With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.
  • Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
  • Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.

Specialized disciplines

Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research. Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:

In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:

Statistics forms a key basic tool in business and manufacturing as well. It is used to understand measurement systems variability, to control processes (as in statistical process control or SPC), to summarize data, and to make data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.


Topic 3 – Statistics and probability – the geometric and negative binomial distributions, unbiased estimators, statistical hypothesis testing and an introduction to bivariate distributions


A linear form is a linear map from a vector space V over a field F to the field of scalars F, viewed as a vector space over itself. Equipped with pointwise addition and multiplication by a scalar, the linear forms form a vector space, called the dual space of V, and usually denoted V*[16] or V′.[17][18]

If v_1, …, v_n is a basis of V (this implies that V is finite-dimensional), then one can define, for i = 1, …, n, a linear map v_i* such that v_i*(v_i) = 1 and v_i*(v_j) = 0 if j ≠ i. These linear maps form a basis of V*, called the dual basis of v_1, …, v_n. (If V is not finite-dimensional, the v_i* may be defined similarly; they are linearly independent, but do not form a basis.)

For v in V, the map

{\displaystyle f\to f(\mathbf {v} )}

is a linear form on V*. This defines the canonical linear map from V into (V*)*, the dual of V*, called the bidual of V. This canonical map is an isomorphism if V is finite-dimensional, and this allows identifying V with its bidual. (In the infinite dimensional case, the canonical map is injective, but not surjective.)

There is thus a complete symmetry between a finite-dimensional vector space and its dual. This motivates the frequent use, in this context, of the bra–ket notation

{\displaystyle \langle f,\mathbf {x} \rangle }

for denoting f(x).

Dual map


Let

{\displaystyle f:V\to W}

be a linear map. For every linear form h on W, the composite function h ∘ f is a linear form on V. This defines a linear map

{\displaystyle f^{*}:W^{*}\to V^{*}}

between the dual spaces, which is called the dual or the transpose of f.

If V and W are finite dimensional, and M is the matrix of f in terms of some ordered bases, then the matrix of f* over the dual bases is the transpose M^T of M, obtained by exchanging rows and columns.

If elements of vector spaces and their duals are represented by column vectors, this duality may be expressed in bra–ket notation by

{\displaystyle \langle h^{\mathsf {T}},M\mathbf {v} \rangle =\langle h^{\mathsf {T}}M,\mathbf {v} \rangle .}

For highlighting this symmetry, the two members of this equality are sometimes written

{\displaystyle \langle h^{\mathsf {T}}\mid M\mid \mathbf {v} \rangle .}
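The duality identity above can be checked numerically. The sketch below uses a hypothetical 3×2 real matrix M (a map from V = R^2 to W = R^3), a vector v in V, and a linear form h on W written as a list of coefficients; the helper names are chosen for this example.

```python
def mat_vec(matrix, vector):
    """Matrix times column vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Exchange rows and columns."""
    return [list(column) for column in zip(*matrix)]

def pairing(form, vector):
    """Evaluate a linear form, given by its coefficients, on a vector."""
    return sum(a * x for a, x in zip(form, vector))

# Hypothetical example: M maps R^2 to R^3.
M = [[1, 2],
     [3, 4],
     [5, 6]]
v = [1, -1]       # vector in V = R^2
h = [2, 0, 1]     # linear form on W = R^3

left = pairing(h, mat_vec(M, v))               # <h^T, M v>
right = pairing(mat_vec(transpose(M), h), v)   # <h^T M, v>, using M^T h

print(left, right)
```

Both sides evaluate to -3, so applying M to the vector first or applying the transpose to the linear form first gives the same scalar.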

Inner-product spaces

Besides these basic concepts, linear algebra also studies vector spaces with additional structure, such as an inner product. The inner product is an example of a bilinear form, and it gives the vector space a geometric structure by allowing for the definition of length and angles. Formally, an inner product is a map

{\displaystyle \langle \cdot ,\cdot \rangle :V\times V\to F}

that satisfies the following three axioms for all vectors u, v, w in V and all scalars a in F:[19][20]

  • Conjugate symmetry:
    {\displaystyle \langle \mathbf {u} ,\mathbf {v} \rangle ={\overline {\langle \mathbf {v} ,\mathbf {u} \rangle }}.}
    In ℝ, it is symmetric.
  • Linearity in the first argument:
    {\displaystyle {\begin{aligned}\langle a\mathbf {u} ,\mathbf {v} \rangle &=a\langle \mathbf {u} ,\mathbf {v} \rangle .\\\langle \mathbf {u} +\mathbf {v} ,\mathbf {w} \rangle &=\langle \mathbf {u} ,\mathbf {w} \rangle +\langle \mathbf {v} ,\mathbf {w} \rangle .\end{aligned}}}
  • Positive-definiteness:
    {\displaystyle \langle \mathbf {v} ,\mathbf {v} \rangle \geq 0}
    with equality only for v = 0.

We can define the length of a vector v in V by

{\displaystyle \|\mathbf {v} \|^{2}=\langle \mathbf {v} ,\mathbf {v} \rangle ,}

and we can prove the Cauchy–Schwarz inequality:

{\displaystyle |\langle \mathbf {u} ,\mathbf {v} \rangle |\leq \|\mathbf {u} \|\cdot \|\mathbf {v} \|.}

In particular, the quantity

{\displaystyle {\frac {|\langle \mathbf {u} ,\mathbf {v} \rangle |}{\|\mathbf {u} \|\cdot \|\mathbf {v} \|}}\leq 1,}

and so we can call this quantity the cosine of the angle between the two vectors.

Two vectors are orthogonal if ⟨u, v⟩ = 0. An orthonormal basis is a basis where all basis vectors have length 1 and are orthogonal to each other. Given any finite-dimensional vector space, an orthonormal basis can be found by the Gram–Schmidt procedure. Orthonormal bases are particularly easy to deal with, since if v = a_1 v_1 + ⋯ + a_n v_n, then

{\displaystyle a_{i}=\langle \mathbf {v} ,\mathbf {v} _{i}\rangle .}
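The Gram–Schmidt procedure and the coefficient formula can be sketched in a few lines. This example assumes real vectors with the usual dot product as the inner product, and a hypothetical starting basis of R^3; it is an illustration, not a numerically robust implementation.

```python
import math

def dot(u, v):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal basis."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the components along the basis vectors found so far.
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

# Hypothetical starting basis of R^3.
start = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
basis = gram_schmidt(start)

# Coefficients of a vector in the orthonormal basis: a_i = <v, v_i>.
v = [2.0, -1.0, 3.0]
coefficients = [dot(v, q) for q in basis]
rebuilt = [sum(a * q[k] for a, q in zip(coefficients, basis)) for k in range(3)]

print(coefficients)
print(rebuilt)
```

Summing the basis vectors weighted by these coefficients reproduces v up to rounding, which is what makes orthonormal bases convenient: no linear system has to be solved to find the coordinates.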

The inner product facilitates the construction of many useful concepts. For instance, given a transform T, we can define its Hermitian conjugate T* as the linear transform satisfying

{\displaystyle \langle T\mathbf {u} ,\mathbf {v} \rangle =\langle \mathbf {u} ,T^{*}\mathbf {v} \rangle .}

If T satisfies TT* = T*T, we call T normal. It turns out that normal matrices are precisely the matrices that have an orthonormal system of eigenvectors that span V.

Relationship with geometry

There is a strong relationship between linear algebra and geometry, which started with the introduction by René Descartes, in 1637, of Cartesian coordinates. In this new (at that time) geometry, now called Cartesian geometry, points are represented by Cartesian coordinates, which are sequences of three real numbers (in the case of the usual three-dimensional space). The basic objects of geometry, which are lines and planes, are represented by linear equations. Thus, computing intersections of lines and planes amounts to solving systems of linear equations. This was one of the main motivations for developing linear algebra.
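To make the link between intersections and linear systems concrete, here is a minimal sketch for the two-dimensional analogue: two lines in the plane, each written as a x + b y = c, intersected by solving the 2×2 system with Cramer's rule. The function name and the example lines are assumptions for this illustration.

```python
def intersect_lines(a1, b1, c1, a2, b2, c2):
    """Intersection of the lines a1*x + b1*y = c1 and a2*x + b2*y = c2.

    Returns (x, y), or None when the lines are parallel or identical.
    """
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y

# Hypothetical example: x + y = 3 and x - y = 1.
point = intersect_lines(1, 1, 3, 1, -1, 1)
print(point)

# Parallel lines have no single intersection point.
print(intersect_lines(1, 1, 3, 2, 2, 5))
```

The first call returns the point (2, 1). The same idea extends to planes in three-dimensional space, where the system has three unknowns.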

Most geometric transformations, such as translations, rotations, reflections, rigid motions, isometries, and projections, transform lines into lines. It follows that they can be defined, specified and studied in terms of linear maps. This is also the case of homographies and Möbius transformations, when considered as transformations of a projective space.

Until the end of the 19th century, geometric spaces were defined by axioms relating points, lines and planes (synthetic geometry). Around this date, it appeared that one may also define geometric spaces by constructions involving vector spaces (see, for example, Projective space and Affine space). It has been shown that the two approaches are essentially equivalent. In classical geometry, the involved vector spaces are vector spaces over the reals, but the constructions may be extended to vector spaces over any field, which allows geometry to be considered over arbitrary fields, including finite fields.

Presently, most textbooks introduce geometric spaces from linear algebra, and geometry is often presented, at the elementary level, as a subfield of linear algebra.

Usage and applications

Linear algebra is used in almost all areas of mathematics, thus making it relevant in almost all scientific domains that use mathematics. These applications may be divided into several wide categories.

Geometry of ambient space

The modeling of ambient space is based on geometry. Sciences concerned with this space use geometry widely. This is the case with mechanics and robotics, for describing rigid body dynamics; geodesy, for describing Earth shape; perspectivity, computer vision, and computer graphics, for describing the relationship between a scene and its plane representation; and many other scientific domains.

In all these applications, synthetic geometry is often used for general descriptions and a qualitative approach, but for the study of explicit situations, one must compute with coordinates. This requires the heavy use of linear algebra.

Functional analysis

Functional analysis studies function spaces. These are vector spaces with additional structure, such as Hilbert spaces. Linear algebra is thus a fundamental part of functional analysis and its applications, which include, in particular, quantum mechanics (wave functions).

Study of complex systems

Most physical phenomena are modeled by partial differential equations. To solve them, one usually decomposes the space in which the solutions are searched into small, mutually interacting cells. For linear systems this interaction involves linear functions. For nonlinear systems, this interaction is often approximated by linear functions.[b] In both cases, very large matrices are generally involved. Weather forecasting is a typical example, where the whole of Earth’s atmosphere is divided into cells of, say, 100 km of width and 100 m of height.

Scientific computation

Nearly all scientific computations involve linear algebra. Consequently, linear algebra algorithms have been highly optimized. BLAS and LAPACK are the best known implementations. For improving efficiency, some of them configure the algorithms automatically, at run time, for adapting them to the specificities of the computer (cache size, number of available cores, …).

Some processors, typically graphics processing units (GPU), are designed with a matrix structure, for optimizing the operations of linear algebra.

Extensions and generalizations

This section presents several related topics that do not appear generally in elementary textbooks on linear algebra, but are commonly considered, in advanced mathematics, as parts of linear algebra.

Module theory

The existence of multiplicative inverses in fields is not involved in the axioms defining a vector space. One may thus replace the field of scalars by a ring R, and this gives a structure called module over R, or R-module.

The concepts of linear independence, span, basis, and linear maps (also called module homomorphisms) are defined for modules exactly as for vector spaces, with the essential difference that, if R is not a field, there are modules that do not have any basis. The modules that have a basis are the free modules, and those that are spanned by a finite set are the finitely generated modules. Module homomorphisms between finitely generated free modules may be represented by matrices. The theory of matrices over a ring is similar to that of matrices over a field, except that determinants exist only if the ring is commutative, and that a square matrix over a commutative ring is invertible only if its determinant has a multiplicative inverse in the ring.

Vector spaces are completely characterized by their dimension (up to an isomorphism). In general, there is not such a complete classification for modules, even if one restricts oneself to finitely generated modules. However, every module is a cokernel of a homomorphism of free modules.

Modules over the integers can be identified with abelian groups, since multiplication by an integer may be identified with repeated addition. Most of the theory of abelian groups may be extended to modules over a principal ideal domain. In particular, over a principal ideal domain, every submodule of a free module is free, and the fundamental theorem of finitely generated abelian groups may be extended straightforwardly to finitely generated modules over a principal ring.

There are many rings for which there are algorithms for solving linear equations and systems of linear equations. However, these algorithms have generally a computational complexity that is much higher than the similar algorithms over a field. For more details, see Linear equation over a ring.

Multilinear algebra and tensors

In multilinear algebra, one considers multivariable linear transformations, that is, mappings that are linear in each of a number of different variables. This line of inquiry naturally leads to the idea of the dual space, the vector space V* consisting of linear maps f : V → F where F is the field of scalars. Multilinear maps T : V^n → F can be described via tensor products of elements of V*.

If, in addition to vector addition and scalar multiplication, there is a bilinear vector product V × V → V, the vector space is called an algebra; for instance, associative algebras are algebras with an associative vector product (like the algebra of square matrices, or the algebra of polynomials).

Topological vector spaces

Vector spaces that are not finite dimensional often require additional structure to be tractable. A normed vector space is a vector space along with a function called a norm, which measures the “size” of elements. The norm induces a metric, which measures the distance between elements, and induces a topology, which allows for a definition of continuous maps. The metric also allows for a definition of limits and completeness; a normed vector space that is complete with respect to this metric is known as a Banach space. A complete normed vector space whose norm is induced by an inner product (a conjugate symmetric sesquilinear form) is known as a Hilbert space, which is in some sense a particularly well-behaved Banach space. Functional analysis applies the methods of linear algebra alongside those of mathematical analysis to study various function spaces; the central objects of study in functional analysis are L^p spaces, which are Banach spaces, and especially the L^2 space of square integrable functions, which is the only Hilbert space among them. Functional analysis is of particular importance to quantum mechanics, the theory of partial differential equations, digital signal processing, and electrical engineering. It also provides the foundation and theoretical framework that underlies the Fourier transform and related methods.

Homological algebra


Topic 5 – Calculus – infinite sequences and series, limits, improper integrals and various first-order ordinary differential equations

Calculus, originally called infinitesimal calculus or “the calculus of infinitesimals”, is the mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations.

It has two major branches, differential calculus and integral calculus; differential calculus concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of convergence of infinite sequences and infinite series to a well-defined limit.
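The relationship between the two branches can be seen numerically. The sketch below is an illustration under assumptions chosen for the example: the function f(x) = x^2 with antiderivative F(x) = x^3 / 3, a central difference as the approximate derivative, and the trapezoidal rule as the approximate integral.

```python
def derivative(func, x, h=1e-5):
    """Central difference approximation of the slope of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

def integral(func, a, b, steps=1000):
    """Trapezoidal rule approximation of the area under func on [a, b]."""
    width = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * width)
    return total * width

def f(x):
    return x ** 2

def F(x):
    return x ** 3 / 3   # an antiderivative of f

# Accumulation: the area under f on [0, 3] matches F(3) - F(0).
area = integral(f, 0.0, 3.0)
print(area, F(3.0) - F(0.0))

# Rate of change: the slope of F at x = 2 matches f(2).
slope = derivative(F, 2.0)
print(slope, f(2.0))
```

Both printed pairs agree to several decimal places: integrating f recovers the change in F, and differentiating F recovers f, which is the content of the fundamental theorem of calculus for this example.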

Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz. Later work, including codifying the idea of limits, put these developments on a more solid conceptual footing. Today, calculus has widespread uses in science, engineering, and social science.

In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. The word calculus is Latin for “small pebble” (the diminutive of calx, meaning “stone”). Because such pebbles were used for counting out distances, tallying votes, and doing abacus arithmetic, the word came to mean a method of computation. In this sense, it was used in English at least as early as 1672, several years prior to the publications of Leibniz and Newton. (The older meaning still persists in medicine.) In addition to the differential calculus and integral calculus, the term is also used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.


Modern calculus was developed in 17th-century Europe by Isaac Newton and Gottfried Wilhelm Leibniz (independently of each other, first publishing around the same time) but elements of it appeared in ancient Greece, then in China and the Middle East, and still later again in medieval Europe and in India.

Ancient precursors


Calculations of volume and area, one goal of integral calculus, can be found in the Egyptian Moscow papyrus (c. 1820 BC), but the formulae are simple instructions, with no indication as to how they were obtained.


Archimedes used the method of exhaustion to calculate the area under a parabola.

Laying the foundations for integral calculus and foreshadowing the concept of the limit, ancient Greek mathematician Eudoxus of Cnidus (c. 390 – 337 BCE) developed the method of exhaustion to prove the formulas for cone and pyramid volumes.

During the Hellenistic period, this method was further developed by Archimedes, who combined it with a concept of the indivisibles—a precursor to infinitesimals—allowing him to solve several problems now treated by integral calculus. These problems include, for example, calculating the center of gravity of a solid hemisphere, the center of gravity of a frustum of a circular paraboloid, and the area of a region bounded by a parabola and one of its secant lines.


The method of exhaustion was later discovered independently in China by Liu Hui in the 3rd century AD in order to find the area of a circle.

In the 5th century AD, Zu Gengzhi, son of Zu Chongzhi, established a method[13] that would later be called Cavalieri’s principle to find the volume of a sphere.


Middle East

Alhazen, 11th-century Arab mathematician and physicist.

In the Middle East, Hasan Ibn al-Haytham, Latinized as Alhazen (c. 965 – c. 1040 CE), derived a formula for the sum of fourth powers. He used the results to carry out what would now be called an integration of this function, where the formulae for the sums of integral squares and fourth powers allowed him to calculate the volume of a paraboloid.


In the 14th century, Indian mathematicians gave a non-rigorous method, resembling differentiation, applicable to some trigonometric functions. Madhava of Sangamagrama and the Kerala School of Astronomy and Mathematics thereby stated components of calculus. A complete theory encompassing these components is now well known in the Western world. However, they were not able to “combine many differing ideas under the two unifying themes of the derivative and the integral, show the connection between the two, and turn calculus into the great problem-solving tool we have today”.


The calculus was the first achievement of modern mathematics and it is difficult to overestimate its importance. I think it defines more unequivocally than anything else the inception of modern mathematics, and the system of mathematical analysis, which is its logical development, still constitutes the greatest technical advance in exact thinking.

— John von Neumann 

Johannes Kepler’s work Stereometrica Doliorum formed the basis of integral calculus.[18] Kepler developed a method to calculate the area of an ellipse by adding up the lengths of many radii drawn from a focus of the ellipse.

A significant work was a treatise, the origin being Kepler’s methods,[19] written by Bonaventura Cavalieri, who argued that volumes and areas should be computed as the sums of the volumes and areas of infinitesimally thin cross-sections. The ideas were similar to Archimedes’ in The Method, but this treatise is believed to have been lost in the 13th century, and was only rediscovered in the early 20th century, and so would have been unknown to Cavalieri. Cavalieri’s work was not well respected since his methods could lead to erroneous results, and the infinitesimal quantities he introduced were disreputable at first.

The formal study of calculus brought together Cavalieri’s infinitesimals with the calculus of finite differences developed in Europe at around the same time. Pierre de Fermat, claiming that he borrowed from Diophantus, introduced the concept of adequality, which represented equality up to an infinitesimal error term. The combination was achieved by John Wallis, Isaac Barrow, and James Gregory, the latter two proving predecessors to the second fundamental theorem of calculus around 1670.

Isaac Newton developed the use of calculus in his laws of motion and gravitation.

The product rule and chain rule, the notions of higher derivatives and Taylor series, and of analytic functions were used by Isaac Newton in an idiosyncratic notation which he applied to solve problems of mathematical physics. In his works, Newton rephrased his ideas to suit the mathematical idiom of the time, replacing calculations with infinitesimals by equivalent geometrical arguments which were considered beyond reproach. He used the methods of calculus to solve the problem of planetary motion, the shape of the surface of a rotating fluid, the oblateness of the earth, the motion of a weight sliding on a cycloid, and many other problems discussed in his Principia Mathematica (1687). In other work, he developed series expansions for functions, including fractional and irrational powers, and it was clear that he understood the principles of the Taylor series. He did not publish all these discoveries, and at this time infinitesimal methods were still considered disreputable.

Gottfried Wilhelm Leibniz was the first to state clearly the rules of calculus.

These ideas were arranged into a true calculus of infinitesimals by  Gottfried Wilhelm Leibniz, who was originally accused of  plagiarism by Newton. He is now regarded as an  independent inventor of and contributor to calculus. His contribution was to provide a clear set of rules for working with infinitesimal quantities, allowing the computation of second and higher derivatives, and providing the product rule and chain rule, in their differential and integral forms. Unlike Newton, Leibniz put painstaking effort into his choices of notation.

Today, Leibniz and Newton are usually both given credit for independently inventing and developing calculus. Newton was the first to apply calculus to general physics and Leibniz developed much of the notation used in calculus today. The basic insights that both Newton and Leibniz provided were the laws of differentiation and integration, second and higher derivatives, and the notion of an approximating polynomial series.

When Newton and Leibniz first published their results, there was great controversy over which mathematician (and therefore which country) deserved credit. Newton derived his results first (later to be published in his Method of Fluxions), but Leibniz published his “Nova Methodus pro Maximis et Minimis” first. Newton claimed Leibniz stole ideas from his unpublished notes, which Newton had shared with a few members of the Royal Society. This controversy divided English-speaking mathematicians from continental European mathematicians for many years, to the detriment of English mathematics. A careful examination of the papers of Leibniz and Newton shows that they arrived at their results independently, with Leibniz starting first with integration and Newton with differentiation. It is Leibniz, however, who gave the new discipline its name. Newton called his calculus “the science of fluxions”, a term that endured in English schools into the 19th century. The first complete treatise on calculus to be written in English and use the Leibniz notation was not published until 1815.

Since the time of Leibniz and Newton, many mathematicians have contributed to the continuing development of calculus. One of the first and most complete works on both infinitesimal and integral calculus was written in 1748 by Maria Gaetana Agnesi.

Maria Gaetana Agnesi



In calculus, foundations refers to rigorous development of the subject from axioms and definitions. In early calculus the use of infinitesimal quantities was thought unrigorous, and was fiercely criticized by a number of authors, most notably Michel Rolle and Bishop Berkeley. Berkeley famously described infinitesimals as the ghosts of departed quantities in his book The Analyst in 1734. Working out a rigorous foundation for calculus occupied mathematicians for much of the century following Newton and Leibniz, and is still to some extent an active area of research today. Several mathematicians, including Maclaurin, tried to prove the soundness of using infinitesimals, but it would not be until 150 years later when, due to the work of Cauchy and Weierstrass, a way was finally found to avoid mere “notions” of infinitely small quantities. The foundations of differential and integral calculus had been laid. In Cauchy’s Cours d’Analyse, we find a broad range of foundational approaches, including a definition of continuity in terms of infinitesimals, and a (somewhat imprecise) prototype of an (ε, δ)-definition of limit in the definition of differentiation.[36] In his work Weierstrass formalized the concept of limit and eliminated infinitesimals (although his definition can actually validate nilsquare infinitesimals). Following the work of Weierstrass, it eventually became common to base calculus on limits instead of infinitesimal quantities, though the subject is still occasionally called “infinitesimal calculus”. Bernhard Riemann used these ideas to give a precise definition of the integral. It was also during this period that the ideas of calculus were generalized to the complex plane with the development of complex analysis.

In modern mathematics, the foundations of calculus are included in the field of real analysis, which contains full definitions and proofs of the theorems of calculus. The reach of calculus has also been greatly extended. Henri Lebesgue invented measure theory, based on earlier developments by Émile Borel, and used it to define integrals of all but the most pathological functions. Laurent Schwartz introduced distributions, which can be used to take the derivative of any function whatsoever.

Limits are not the only rigorous approach to the foundation of calculus. Another way is to use Abraham Robinson’s non-standard analysis. Robinson’s approach, developed in the 1960s, uses technical machinery from mathematical logic to augment the real number system with infinitesimal and infinite numbers, as in the original Newton-Leibniz conception. The resulting numbers are called hyperreal numbers, and they can be used to give a Leibniz-like development of the usual rules of calculus. There is also smooth infinitesimal analysis, which differs from non-standard analysis in that it mandates neglecting higher-power infinitesimals during derivations.


While many of the ideas of calculus had been developed earlier in Greece, China, India, Iraq, Persia, and Japan, the use of calculus began in Europe, during the 17th century, when Isaac Newton and Gottfried Wilhelm Leibniz built on the work of earlier mathematicians to introduce its basic principles. The development of calculus was built on earlier concepts of instantaneous motion and area underneath curves.

Applications of differential calculus include computations involving velocity and acceleration, the slope of a curve, and optimization. Applications of integral calculus include computations involving area, volume, arc length, center of mass, work, and pressure. More advanced applications include power series and Fourier series.

Calculus is also used to gain a more precise understanding of the nature of space, time, and motion. For centuries, mathematicians and philosophers wrestled with paradoxes involving division by zero or sums of infinitely many numbers. These questions arise in the study of motion and area. The ancient Greek philosopher Zeno of Elea gave several famous examples of such paradoxes. Calculus provides tools, especially the limit and the infinite series, that resolve the paradoxes.


Limits and infinitesimals

Calculus is usually developed by working with very small quantities. Historically, the first method of doing so was by infinitesimals. These are objects which can be treated like real numbers but which are, in some sense, “infinitely small”. For example, an infinitesimal number could be greater than 0, but less than any number in the sequence 1, 1/2, 1/3, … and thus less than any positive real number. From this point of view, calculus is a collection of techniques for manipulating infinitesimals. The symbols dx and dy were taken to be infinitesimal, and the derivative dy/dx was simply their ratio.

The infinitesimal approach fell out of favor in the 19th century because it was difficult to make the notion of an infinitesimal precise. In the late 19th century, infinitesimals were replaced within academia by the epsilon, delta approach to limits. Limits describe the behavior of a function at a certain input in terms of its values at nearby inputs. They capture small-scale behavior using the intrinsic structure of the real number system (as a metric space with the least-upper-bound property). In this treatment, calculus is a collection of techniques for manipulating certain limits. Infinitesimals get replaced by sequences of smaller and smaller numbers, and the infinitely small behavior of a function is found by taking the limiting behavior for these sequences. Limits were thought to provide a more rigorous foundation for calculus, and for this reason they became the standard approach during the 20th century. However, the infinitesimal concept was revived in the 20th century with the introduction of non-standard analysis and smooth infinitesimal analysis, which provided solid foundations for the manipulation of infinitesimals.

Differential calculus

Tangent line at (x_0, f(x_0)). The derivative f′(x) of a curve at a point is the slope (rise over run) of the line tangent to that curve at that point.

Differential calculus is the study of the definition, properties, and applications of the derivative of a function. The process of finding the derivative is called differentiation. Given a function and a point in the domain, the derivative at that point is a way of encoding the small-scale behavior of the function near that point. By finding the derivative of a function at every point in its domain, it is possible to produce a new function, called the derivative function or just the derivative of the original function. In formal terms, the derivative is a linear operator which takes a function as its input and produces a second function as its output. This is more abstract than many of the processes studied in elementary algebra, where functions usually input a number and output another number. For example, if the doubling function is given the input three, then it outputs six, and if the squaring function is given the input three, then it outputs nine. The derivative, however, can take the squaring function as an input. This means that the derivative takes all the information of the squaring function (such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on) and uses this information to produce another function. The function produced by differentiating the squaring function turns out to be the doubling function.

In more explicit terms, the “doubling function” may be denoted by g(x) = 2x and the “squaring function” by f(x) = x^2. The “derivative” now takes the function f(x), defined by the expression “x^2”, as an input, that is, all the information (such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on), and uses this information to output another function, the function g(x) = 2x, as will turn out.

In Lagrange’s notation, the symbol for a derivative is an apostrophe-like mark called a prime. Thus, the derivative of a function called f is denoted by f′, pronounced “f prime”. For instance, if f(x) = x^2 is the squaring function, then f′(x) = 2x is its derivative (the doubling function g from above).

If the input of the function represents time, then the derivative represents change with respect to time. For example, if f is a function that takes a time as input and gives the position of a ball at that time as output, then the derivative of f is how the position is changing in time, that is, it is the velocity of the ball.

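If a function is linear (that is, if the graph of the function is a straight line), then the function can be written as y = mx + b, where x is the independent variable, y is the dependent variable, b is the y-intercept, and:

$$m = \frac{\text{rise}}{\text{run}} = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}.$$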

This gives an exact value for the slope of a straight line. If the graph of the function is not a straight line, however, then the change in y divided by the change in x varies. Derivatives give an exact meaning to the notion of change in output with respect to change in input. To be concrete, let f be a function, and fix a point a in the domain of f. (a, f(a)) is a point on the graph of the function. If h is a number close to zero, then a + h is a number close to a. Therefore, (a + h, f(a + h)) is close to (a, f(a)). The slope between these two points is

$$m = \frac{f(a + h) - f(a)}{(a + h) - a} = \frac{f(a + h) - f(a)}{h}.$$

This expression is called a difference quotient. A line through two points on a curve is called a secant line, so m is the slope of the secant line between (a, f(a)) and (a + h, f(a + h)). The secant line is only an approximation to the behavior of the function at the point a because it does not account for what happens between a and a + h. It is not possible to discover the behavior at a by setting h to zero because this would require dividing by zero, which is undefined. The derivative is defined by taking the limit as h tends to zero, meaning that it considers the behavior of f for all small values of h and extracts a consistent value for the case when h equals zero:

$$f'(a) = \lim_{h \to 0} \frac{f(a + h) - f(a)}{h}.$$

Geometrically, the derivative is the slope of the tangent line to the graph of f at a. The tangent line is a limit of secant lines just as the derivative is a limit of difference quotients. For this reason, the derivative is sometimes called the slope of the function f.

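Here is a particular example, the derivative of the squaring function at the input 3. Let f(x) = x^2 be the squaring function. Then

$$f'(3) = \lim_{h \to 0} \frac{(3 + h)^2 - 3^2}{h} = \lim_{h \to 0} \frac{9 + 6h + h^2 - 9}{h} = \lim_{h \to 0} \frac{6h + h^2}{h} = \lim_{h \to 0} (6 + h) = 6.$$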

The derivative f′(x) of a curve at a point is the slope of the line tangent to that curve at that point. This slope is determined by considering the limiting value of the slopes of secant lines. Here the function involved (drawn in red) is f(x) = x^3 − x. The tangent line (in green) which passes through the point (−3/2, −15/8) has a slope of 23/4. Note that the vertical and horizontal scales in this image are different.

The slope of the tangent line to the squaring function at the point (3, 9) is 6, that is to say, it is going up six times as fast as it is going to the right. The limit process just described can be performed for any point in the domain of the squaring function. This defines the derivative function of the squaring function or just the derivative of the squaring function for short. A computation similar to the one above shows that the derivative of the squaring function is the doubling function.
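The limit process can also be watched numerically. The following is a minimal sketch, not a method from the text itself: the helper names `difference_quotient` and `square` are chosen here for illustration. It evaluates the slope of the secant line for the squaring function at the input 3 with shrinking values of h.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def square(x):
    """The squaring function f(x) = x^2."""
    return x * x


# As h shrinks, the secant slopes at a = 3 settle near the tangent slope 6.
for h in (1.0, 0.1, 0.01, 0.001):
    print(h, difference_quotient(square, 3.0, h))
```

Algebraically the quotient equals 6 + h for this function, so the printed slopes approach 6 from above, up to floating-point rounding.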

Leibniz notation

A common notation, introduced by Leibniz, for the derivative in the example above is

$$y = x^2, \qquad \frac{dy}{dx} = 2x.$$

In an approach based on limits, the symbol dy/dx is to be interpreted not as the quotient of two numbers but as a shorthand for the limit computed above. Leibniz, however, did intend it to represent the quotient of two infinitesimally small numbers, dy being the infinitesimally small change in y caused by an infinitesimally small change dx applied to x. We can also think of d/dx as a differentiation operator, which takes a function as an input and gives another function, the derivative, as the output. For example:

$$\frac{d}{dx}\left(x^2\right) = 2x.$$

In this usage, the dx in the denominator is read as “with respect to x”. Another example of correct notation could be:

$$g(t) = t^2 + 2t + 4, \qquad \frac{d}{dt} g(t) = 2t + 2.$$

Even when calculus is developed using limits rather than infinitesimals, it is common to manipulate symbols like dx and dy as if they were real numbers; although it is possible to avoid such manipulations, they are sometimes notationally convenient in expressing operations such as the total derivative.

Integral calculus

Integral calculus is the study of the definitions, properties, and applications of two related concepts, the indefinite integral and the definite integral. The process of finding the value of an integral is called integration. In technical language, integral calculus studies two related linear operators.

The indefinite integral, also known as the antiderivative, is the inverse operation to the derivative. F is an indefinite integral of f when f is a derivative of F. (This use of lower- and upper-case letters for a function and its indefinite integral is common in calculus.)

The definite integral inputs a function and outputs a number, which gives the algebraic sum of areas between the graph of the input and the x-axis. The technical definition of the definite integral involves the limit of a sum of areas of rectangles, called a Riemann sum.

A motivating example is the distance traveled in a given time. If the speed is constant, only multiplication is needed:

$$\text{Distance} = \text{Speed} \cdot \text{Time}.$$

But if the speed changes, a more powerful method of finding the distance is necessary. One such method is to approximate the distance traveled by breaking up the time into many short intervals of time, then multiplying the time elapsed in each interval by one of the speeds in that interval, and then taking the sum (a Riemann sum) of the approximate distance traveled in each interval. The basic idea is that if only a short time elapses, then the speed will stay more or less the same. However, a Riemann sum only gives an approximation of the distance traveled. We must take the limit of all such Riemann sums to find the exact distance traveled.

Constant velocity


Integration can be thought of as measuring the area under a curve, defined by f(x), between two points (here a and b).

When velocity is constant, the total distance traveled over the given time interval can be computed by multiplying velocity and time. For example, travelling a steady 50 mph for 3 hours results in a total distance of 150 miles. In the diagram on the left, when constant velocity and time are graphed, these two values form a rectangle with height equal to the velocity and width equal to the time elapsed. Therefore, the product of velocity and time also calculates the rectangular area under the (constant) velocity curve. This connection between the area under a curve and distance traveled can be extended to any irregularly shaped region exhibiting a fluctuating velocity over a given time period. If f(x) in the diagram on the right represents speed as it varies over time, the distance traveled (between the times represented by a and b) is the area of the shaded region s.

To approximate that area, an intuitive method would be to divide up the distance between a and b into a number of equal segments, the length of each segment represented by the symbol Δx. For each small segment, we can choose one value of the function f(x). Call that value h. Then the area of the rectangle with base Δx and height h gives the distance (time Δx multiplied by speed h) traveled in that segment. Associated with each segment is the average value of the function above it, f(x) = h. The sum of all such rectangles gives an approximation of the area between the axis and the curve, which is an approximation of the total distance traveled. A smaller value for Δx will give more rectangles and in most cases a better approximation, but for an exact answer we need to take a limit as Δx approaches zero.
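The rectangle construction above can be sketched in a few lines of code. This is an illustrative sketch only: it picks the left endpoint of each segment as the sample point, and the speed function `20 * t` is a made-up example rather than anything from the text.

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] using n equal segments."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + i * dx      # sample point chosen in the i-th segment
        total += f(x) * dx  # rectangle with height f(x) and base dx
    return total


# Constant speed: a steady 50 mph for 3 hours gives about 150 miles.
print(riemann_sum(lambda t: 50.0, 0.0, 3.0, 10))

# Hypothetical varying speed v(t) = 20 t: more segments give a better estimate
# of the exact area, which is 90 for this triangle-shaped region.
for n in (10, 100, 1000):
    print(n, riemann_sum(lambda t: 20.0 * t, 0.0, 3.0, n))
```

For the varying speed, the left-endpoint sum underestimates the distance because the speed is increasing, and the shortfall shrinks as the number of segments grows.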

The symbol of integration is ∫, an elongated S (the S stands for “sum”). The definite integral is written as:

$$\int_a^b f(x)\,dx$$

and is read “the integral from a to b of f-of-x with respect to x.” The Leibniz notation dx is intended to suggest dividing the area under the curve into an infinite number of rectangles, so that their width Δx becomes the infinitesimally small dx. In a formulation of the calculus based on limits, the notation

$$\int_a^b \cdots \, dx$$

is to be understood as an operator that takes a function as an input and gives a number, the area, as an output. The terminating differential, dx, is not a number, and is not being multiplied by f(x), although, serving as a reminder of the Δx limit definition, it can be treated as such in symbolic manipulations of the integral. Formally, the differential indicates the variable over which the function is integrated and serves as a closing bracket for the integration operator.

The indefinite integral, or antiderivative, is written:

$$\int f(x)\,dx.$$

Functions differing by only a constant have the same derivative, and it can be shown that the antiderivative of a given function is actually a family of functions differing only by a constant. Since the derivative of the function y = x^2 + C, where C is any constant, is y′ = 2x, the antiderivative of the latter is given by:

$$\int 2x\,dx = x^2 + C.$$

The unspecified constant C present in the indefinite integral or antiderivative is known as the constant of integration.

Fundamental theorem

The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of antiderivatives  to definite integrals. Because it is usually easier to compute an antiderivative than to apply the definition of a definite integral, the fundamental theorem of calculus provides a practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration.

The fundamental theorem of calculus states: If a function f is continuous on the interval [a, b] and if F is a function whose derivative is f on the interval (a, b), then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Furthermore, for every x in the interval (a, b),

$$\frac{d}{dx} \int_a^x f(t)\,dt = f(x).$$

This realization, made by both Newton and Leibniz, was key to the proliferation of analytic results after their work became known. (The extent to which Newton and Leibniz were influenced by immediate predecessors, and particularly what Leibniz may have learned from the work of Isaac Barrow, is difficult to determine thanks to the priority dispute between them.) The fundamental theorem provides an algebraic method of computing many definite integrals—without performing limit processes—by finding formulae for antiderivatives. It is also a prototype solution of a differential equation. Differential equations relate an unknown function to its derivatives, and are ubiquitous in the sciences.
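Both parts of the theorem can be checked numerically on a simple case. The sketch below is an illustration under stated assumptions: it uses f = cos with antiderivative F = sin on [0, π/2] (an example chosen here, not taken from the text), and a midpoint Riemann sum named `midpoint_sum` stands in for the definite integral.

```python
import math


def midpoint_sum(f, a, b, n=1000):
    """Midpoint Riemann sum of f over [a, b] with n equal segments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


a, b = 0.0, math.pi / 2

# First part: the integral of f over [a, b] should match F(b) - F(a).
numeric_area = midpoint_sum(math.cos, a, b)
antiderivative_change = math.sin(b) - math.sin(a)
print(numeric_area, antiderivative_change)


def area_up_to(x):
    """Approximate area under cos between a and x."""
    return midpoint_sum(math.cos, a, x)


# Second part: differentiating the area function should give back f.
h = 1e-4
x = 1.0
slope_of_area = (area_up_to(x + h) - area_up_to(x - h)) / (2 * h)
print(slope_of_area, math.cos(x))
```

The two printed pairs agree to several decimal places, which is what the theorem predicts once the approximation error of the sum is allowed for.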


The logarithmic spiral of the Nautilus shell is a classical image used to depict the growth and change related to calculus.

Calculus is used in every branch of the physical sciences, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, and in other fields wherever a problem can be mathematically modeled and an optimal solution is desired. It allows one to go from (non-constant) rates of change to the total change or vice versa, and many times in studying a problem we know one and are trying to find the other. Calculus can be used in conjunction with other mathematical disciplines. For example, it can be used with linear algebra to find the “best fit” linear approximation for a set of points in a domain. Or, it can be used in probability theory to determine the expectation value of a continuous random variable given a probability density function. In analytic geometry, the study of graphs of functions, calculus is used to find high points and low points (maxima and minima), slope, concavity and inflection points. Calculus is also used to find approximate solutions to equations; in practice it is the standard way to solve differential equations and do root finding in most applications. Examples are methods such as Newton’s method, fixed point iteration, and linear approximation. For instance, spacecraft use a variation of the Euler method to approximate curved courses within zero gravity environments.
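As an illustration of root finding with derivatives, here is a minimal sketch of Newton’s method. The function name `newton`, its tolerance and iteration limit, and the test equation x^2 − 2 = 0 are all assumptions made for this example; they are not taken from the text.

```python
def newton(f, f_prime, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f by repeatedly following the tangent line."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("derivative vanished; Newton step undefined")
        step = f(x) / slope  # where the tangent line at x crosses zero
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


# Solve x^2 - 2 = 0 starting from x0 = 1; the positive root is sqrt(2).
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Each step replaces the curve by its tangent line, which is the linear approximation mentioned above, so convergence depends on a reasonable starting guess and a nonzero derivative near the root.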

Physics makes particular use of calculus; all concepts in classical mechanics  and electromagnetism are related through calculus. The mass of an object of known  density, the  moment of inertia of objects, and the potential energies due to gravitational and electromagnetic forces can all be found by the use of calculus. An example of the use of calculus in mechanics is Newton’s second law of motion, which states that the derivative of an object’s momentum with respect to time equals the net force upon it. Alternatively, Newton’s second law can be expressed by saying that the net force is equal to the object’s mass times its acceleration, which is the time derivative of velocity and thus the second time derivative of spatial position. Starting from knowing how an object is accelerating, we use calculus to derive its path.
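Going from a known acceleration to a path can be sketched with the basic Euler method. This is a simplified illustration only: the function name `integrate_motion`, the constant acceleration of −9.81 m/s² and the 2-second duration are assumed values chosen here, and the explicit Euler scheme is one of several possible step rules.

```python
def integrate_motion(acceleration, x0, v0, t_end, n_steps):
    """Explicit Euler integration of position and velocity from acceleration."""
    dt = t_end / n_steps
    t, x, v = 0.0, x0, v0
    for _ in range(n_steps):
        a = acceleration(t)
        x += v * dt  # position changes at the current velocity
        v += a * dt  # velocity changes at the current acceleration
        t += dt
    return x, v


# Assumed example: an object released from rest under constant acceleration.
g = -9.81
x_final, v_final = integrate_motion(lambda t: g, x0=0.0, v0=0.0,
                                    t_end=2.0, n_steps=10000)

# Exact values from integrating twice: v = g t and x = g t^2 / 2.
print(x_final, 0.5 * g * 2.0 ** 2)
print(v_final, g * 2.0)
```

With a constant acceleration the exact answer is known, so the printed pairs show how closely the step-by-step sum tracks it; smaller steps reduce the gap in position.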

Maxwell’s theory of electromagnetism and Einstein’s theory of general relativity are also expressed in the language of differential calculus. Chemistry also uses calculus in determining reaction rates and in studying radioactive decay. In biology, population dynamics starts with reproduction and death rates to model population changes.

Green’s theorem, which gives the relationship between a line integral around a simple closed curve C and a double integral over the plane region D bounded by C, is applied in an instrument known as a planimeter, which is used to calculate the area of a flat surface on a drawing.  For example, it can be used to calculate the amount of area taken up by an irregularly shaped flower bed or swimming pool when designing the layout of a piece of property.

In the realm of medicine, calculus can be used to find the optimal branching angle of a blood vessel so as to maximize flow. Calculus can be applied to understand how quickly a drug is eliminated from a body or how quickly a  cancerous  tumour grows.

In economics, calculus allows for the determination of maximal profit by providing a way to easily calculate both marginal cost and marginal revenue.


Over the years, many reformulations of calculus have been investigated for different purposes.

Non-standard calculus

Imprecise calculations with infinitesimals were widely replaced with the rigorous  (ε, δ)-definition of limit starting in the 1870s. Meanwhile, calculations with infinitesimals persisted and often led to correct results. This led Abraham Robinson to investigate if it were possible to develop a number system with infinitesimal quantities over which the theorems of calculus were still valid. In 1960, building upon the work of Edwin Hewitt and Jerzy Łoś, he succeeded in developing non-standard analysis. The theory of non-standard analysis is rich enough to be applied in many branches of mathematics. As such, books and articles dedicated solely to the traditional theorems of calculus often go by the title non-standard calculus.

Smooth infinitesimal analysis

This is another reformulation of the calculus in terms of infinitesimals. Based on the ideas of F. W. Lawvere and employing the methods of category theory, it views all functions as being continuous and incapable of being expressed in terms of discrete entities. One aspect of this formulation is that the law of excluded middle does not hold.

Constructive analysis

Constructive mathematics is a branch of mathematics that insists that proofs of the existence of a number, function, or other mathematical object should give a construction of the object. As such constructive mathematics also rejects the law of excluded middle. Reformulations of calculus in a constructive framework are generally part of the subject of constructive analysis.[34]

Topic 6 – Discrete mathematics – complete mathematical induction, linear Diophantine equations, Fermat’s little theorem, route inspection problem and recurrence relations

Discrete mathematics is the study of mathematical structures that can be considered “discrete” (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than “continuous” (analogously to continuous functions). Objects studied in discrete mathematics include integers, graphs, and statements in logic.[1][2][3][4] By contrast, discrete mathematics excludes topics in “continuous mathematics” such as real numbers, calculus or Euclidean geometry. Discrete objects can often be enumerated by integers; more formally, discrete mathematics has been characterized as the branch of mathematics dealing with countable sets[5] (finite sets or sets with the same cardinality as the natural numbers). However, there is no exact definition of the term “discrete mathematics”.[6]

The set of objects studied in discrete mathematics can be finite or infinite. The term finite mathematics is sometimes applied to parts of the field of discrete mathematics that deal with finite sets, particularly those areas relevant to business.

Research in discrete mathematics increased in the latter half of the twentieth century, partly due to the development of digital computers, which operate in “discrete” steps and store data in “discrete” bits. Concepts and notations from discrete mathematics are useful in studying and describing objects and problems in branches of computer science, such as computer algorithms, programming languages, cryptography, automated theorem proving, and software development. Conversely, computer implementations are significant in applying ideas from discrete mathematics to real-world problems, such as in operations research.

Although the main objects of study in discrete mathematics are discrete objects, analytic methods from “continuous” mathematics are often employed as well.

In university curricula, “Discrete Mathematics” appeared in the 1980s, initially as a computer science support course; its contents were somewhat haphazard at the time. The curriculum has thereafter developed in conjunction with efforts by ACM and MAA into a course that is basically intended to develop mathematical maturity in first-year students; therefore, it is nowadays a prerequisite for mathematics majors in some universities as well.[7][8] Some high-school-level discrete mathematics textbooks have appeared as well.[9] At this level, discrete mathematics is sometimes seen as a preparatory course, not unlike precalculus in this respect.[10]

The Fulkerson Prize is awarded for outstanding papers in discrete mathematics.

Grand challenges, past and present

Much research in graph theory was motivated by attempts to prove that all maps can be colored using only four colors so that no areas of the same color share an edge. Kenneth Appel and Wolfgang Haken proved this in 1976.[11]

The history of discrete mathematics has involved a number of challenging problems which have focused attention within areas of the field. In graph theory, much research was motivated by attempts to prove the four color theorem, first stated in 1852, but not proved until 1976 (by Kenneth Appel and Wolfgang Haken, using substantial computer assistance).[11]

In logic, the second problem on David Hilbert’s list of open problems presented in 1900 was to prove that the axioms of arithmetic are consistent. Gödel’s second incompleteness theorem, proved in 1931, showed that this was not possible – at least not within arithmetic itself. Hilbert’s tenth problem was to determine whether a given polynomial Diophantine equation with integer coefficients has an integer solution. In 1970, Yuri Matiyasevich proved that this could not be done.

The need to break German codes in World War II led to advances in cryptography and theoretical computer science, with the first programmable digital electronic computer being developed at England’s Bletchley Park with the guidance of Alan Turing and his seminal work, On Computable Numbers.[12] At the same time, military requirements motivated advances in operations research. The Cold War meant that cryptography remained important, with fundamental advances such as public-key cryptography being developed in the following decades. Operations research remained important as a tool in business and project management, with the critical path method being developed in the 1950s. The telecommunication industry has also motivated advances in discrete mathematics, particularly in graph theory and information theory. Formal verification of statements in logic has been necessary for software development of safety-critical systems, and advances in automated theorem proving have been driven by this need.

Computational geometry has been an important part of the computer graphics incorporated into modern video games and computer-aided design tools.

Several fields of discrete mathematics, particularly theoretical computer science, graph theory, and combinatorics, are important in addressing the challenging bioinformatics problems associated with understanding the tree of life.[13]

Currently, one of the most famous open problems in theoretical computer science is the P = NP problem, which involves the relationship between the complexity classes P and NP. The Clay Mathematics Institute has offered a $1 million USD prize for the first correct proof, along with prizes for six other mathematical problems.[14]

Topics in discrete mathematics

Theoretical computer science

Complexity studies the time taken by algorithms, such as sorting routines.

Theoretical computer science includes areas of discrete mathematics relevant to computing. It draws heavily on graph theory and mathematical logic. Included within theoretical computer science is the study of algorithms and data structures. Computability studies what can be computed in principle, and has close ties to logic, while complexity studies the time, space, and other resources taken by computations. Automata theory and formal language theory are closely related to computability. Petri nets and process algebras are used to model computer systems, and methods from discrete mathematics are used in analyzing VLSI electronic circuits. Computational geometry applies algorithms to geometrical problems, while computer image analysis applies them to representations of images. Theoretical computer science also includes the study of various continuous computational topics.
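As a rough illustration of what complexity measures, the sketch below counts the comparisons made by insertion sort. The choice of insertion sort and the function name `insertion_sort_with_count` are assumptions made for this example and are not taken from the text above.

```python
def insertion_sort_with_count(items):
    """Sort a copy of items and return (sorted_list, number_of_comparisons)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1      # one comparison of key against a[j]
            if a[j] <= key:
                break             # key is already in place relative to a[j]
            a[j + 1] = a[j]       # shift the larger element one slot right
            j -= 1
        a[j + 1] = key
    return a, comparisons


# Already sorted input needs n - 1 comparisons; reversed input needs n(n - 1)/2.
best_case = insertion_sort_with_count([1, 2, 3, 4, 5])
worst_case = insertion_sort_with_count([5, 4, 3, 2, 1])
```

On five elements the sorted input costs 4 comparisons and the reversed input costs 10, which is the linear versus quadratic gap that complexity analysis describes.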

Information theory

The ASCII codes for the word “Wikipedia”, written in binary, provide a way of representing the word in information theory, as well as for information-processing algorithms.

Information theory involves the quantification of information. Closely related is coding theory, which is used to design efficient and reliable data transmission and storage methods. Information theory also includes continuous topics such as analog signals, analog coding, and analog encryption.
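A minimal sketch of both ideas, assuming plain ASCII input: `ascii_bits` writes each character as an 8-bit binary code, and `shannon_entropy` computes the empirical entropy of the character frequencies in bits per symbol. Both function names are hypothetical, and Shannon entropy is used here as one standard way to quantify information; the text above does not name a specific measure.

```python
import math
from collections import Counter


def ascii_bits(text):
    """Return the 8-bit binary code of each character (assumes ASCII text)."""
    return [format(ord(ch), "08b") for ch in text]


def shannon_entropy(text):
    """Empirical entropy of the character distribution, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


codes = ascii_bits("Wikipedia")          # nine 8-bit strings, one per letter
entropy = shannon_entropy("Wikipedia")   # lower than 8 bits per symbol
```

The gap between the fixed 8 bits per character and the lower entropy value is what coding theory exploits when designing more compact encodings.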


Logic

Logic is the study of the principles of valid reasoning and inference, as well as of consistency, soundness, and completeness. For example, in most systems of logic (but not in intuitionistic logic) Peirce’s law (((P→Q)→P)→P) is a theorem. For classical logic, it can be easily verified with a truth table. The study of mathematical proof is particularly important in logic, and has applications to automated theorem proving and formal verification of software.
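The truth-table check for Peirce’s law, ((P→Q)→P)→P, can be sketched in a few lines. The helper names `implies` and `peirce` are chosen for this example only; the check covers classical two-valued logic and says nothing about intuitionistic logic.

```python
from itertools import product


def implies(a, b):
    """Classical material implication: a -> b."""
    return (not a) or b


def peirce(p, q):
    """Peirce's law: ((p -> q) -> p) -> p."""
    return implies(implies(implies(p, q), p), p)


# Enumerate all four rows of the truth table.
truth_table = {(p, q): peirce(p, q) for p, q in product([False, True], repeat=2)}
is_tautology = all(truth_table.values())
```

Every row evaluates to true, so the formula is a classical tautology.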

Logical formulas are discrete structures, as are proofs, which form finite trees[15] or, more generally, directed acyclic graph structures[16][17] (with each inference step combining one or more premise branches to give a single conclusion). The truth values of logical formulas usually form a finite set, generally restricted to two values: true and false, but logic can also be continuous-valued, e.g., fuzzy logic. Concepts such as infinite proof trees or infinite derivation trees have also been studied,[18] e.g. infinitary logic.

Set theory

Set theory is the branch of mathematics that studies sets, which are collections of objects, such as {blue, white, red} or the (infinite) set of all prime numbers. Partially ordered sets and sets with other relations have applications in several areas.

In discrete mathematics, countable sets (including finite sets) are the main focus. The beginning of set theory as a branch of mathematics is usually marked by Georg Cantor’s work distinguishing between different kinds of infinite set, motivated by the study of trigonometric series, and further development of the theory of infinite sets is outside the scope of discrete mathematics. Indeed, contemporary work in descriptive set theory makes extensive use of traditional continuous mathematics.


Combinatorics

Combinatorics studies the way in which discrete structures can be combined or arranged. Enumerative combinatorics concentrates on counting the number of certain combinatorial objects – e.g. the twelvefold way provides a unified framework for counting permutations, combinations and partitions. Analytic combinatorics concerns the enumeration (i.e., determining the number) of combinatorial structures using tools from complex analysis and probability theory. In contrast with enumerative combinatorics, which uses explicit combinatorial formulae and generating functions to describe the results, analytic combinatorics aims at obtaining asymptotic formulae. Design theory is a study of combinatorial designs, which are collections of subsets with certain intersection properties. Partition theory studies various enumeration and asymptotic problems related to integer partitions, and is closely related to q-series, special functions and orthogonal polynomials. Originally a part of number theory and analysis, partition theory is now considered a part of combinatorics or an independent field. Order theory is the study of partially ordered sets, both finite and infinite.
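The three counting problems mentioned above (permutations, combinations and integer partitions) can be illustrated with a short sketch. It uses the standard-library functions `math.perm` and `math.comb` (Python 3.8 or later); `count_partitions` is a hypothetical helper written for this example, using a simple dynamic-programming recurrence.

```python
import math
from itertools import combinations, permutations


def count_partitions(n):
    """Number of ways to write n as a sum of positive integers, order ignored."""
    ways = [1] + [0] * n              # ways[t]: partitions of t using parts seen so far
    for part in range(1, n + 1):
        for total in range(part, n + 1):
            ways[total] += ways[total - part]
    return ways[n]


# Closed formulas agree with brute-force enumeration for small cases.
ordered_pairs = math.perm(5, 2)       # ordered selections of 2 items out of 5
unordered_pairs = math.comb(5, 2)     # unordered selections of 2 items out of 5
enumerated_ordered = len(list(permutations(range(5), 2)))
enumerated_unordered = len(list(combinations(range(5), 2)))
partitions_of_five = count_partitions(5)
```

For these inputs there are 20 ordered pairs, 10 unordered pairs, and 7 partitions of 5.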

Graph theory

Graph theory has close links to group theory. For example, the truncated tetrahedron graph is related to the alternating group A₄.

Graph theory, the study of graphs and networks, is often considered part of combinatorics, but has grown large enough and distinct enough, with its own kind of problems, to be regarded as a subject in its own right.[19] Graphs are one of the prime objects of study in discrete mathematics. They are among the most ubiquitous models of both natural and human-made structures. They can model many types of relations and process dynamics in physical, biological and social systems. In computer science, they can represent networks of communication, data organization, computational devices, the flow of computation, etc. In mathematics, they are useful in geometry and certain parts of topology, e.g. knot theory. Algebraic graph theory has close links with group theory. There are also continuous graphs; however, for the most part, research in graph theory falls within the domain of discrete mathematics.
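One way to see how a graph can represent a communication network is to store it as an adjacency list and run a breadth-first search. The network below and the function name `hop_counts` are invented for illustration; they do not come from the text above.

```python
from collections import deque


def hop_counts(graph, start):
    """Breadth-first search: fewest edges from start to every reachable node."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distance:   # first visit gives the shortest hop count
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Hypothetical undirected network given as an adjacency list.
network = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hops_from_a = hop_counts(network, "A")
```

Here node E is three hops from A, since every route has to pass through D.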


Probability

Discrete probability theory deals with events that occur in countable sample spaces. For example, count observations such as the numbers of birds in flocks comprise only natural number values {0, 1, 2, …}. On the other hand, continuous observations such as the weights of birds comprise real number values and would typically be modeled by a continuous probability distribution such as the normal. Discrete probability distributions can be used to approximate continuous ones and vice versa. For highly constrained situations such as throwing dice or experiments with decks of cards, calculating the probability of events is basically enumerative combinatorics.
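The remark that dice probabilities reduce to counting can be made concrete with a short sketch. It assumes two fair six-sided dice, so that all 36 outcomes are equally likely; the helper name `probability` is chosen for this example.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """P(event) for equally likely outcomes: favourable count over total count."""
    outcomes = list(sample_space)
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))


# Sample space for two fair dice: all ordered pairs (first die, second die).
two_dice = list(product(range(1, 7), repeat=2))

p_sum_is_seven = probability(lambda roll: sum(roll) == 7, two_dice)
p_at_least_one_six = probability(lambda roll: 6 in roll, two_dice)
```

Counting gives 6 favourable outcomes out of 36 for a sum of seven (probability 1/6) and 11 out of 36 for at least one six.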

Number theory

The Ulam spiral of numbers, in which the prime numbers are marked, hints at patterns in the distribution of prime numbers.

Number theory is concerned with the properties of numbers in general, particularly integers. It has applications to cryptography and cryptanalysis, particularly with regard to modular arithmetic, Diophantine equations, linear and quadratic congruences, prime numbers and primality testing. Other discrete aspects of number theory include geometry of numbers. In analytic number theory, techniques from continuous mathematics are also used. Topics that go beyond discrete objects include transcendental numbers, Diophantine approximation, p-adic analysis and function fields.
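Two of the topics named for this unit, linear Diophantine equations and Fermat’s little theorem, lend themselves to a short sketch. The function names are hypothetical, and the solver returns only one solution of a·x + b·y = c rather than the full family of solutions.

```python
def extended_gcd(a, b):
    """Return (g, x, y) such that a*x + b*y == g, where g divides both a and b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """One integer solution (x, y) of a*x + b*y == c, or None if none exists."""
    g, x, y = extended_gcd(a, b)
    if g == 0:                      # a == b == 0: solvable only when c == 0
        return (0, 0) if c == 0 else None
    if c % g != 0:                  # solvable exactly when gcd(a, b) divides c
        return None
    scale = c // g
    return x * scale, y * scale


solution = solve_linear_diophantine(6, 15, 9)      # gcd(6, 15) = 3 divides 9
no_solution = solve_linear_diophantine(6, 15, 10)  # 3 does not divide 10

# Fermat's little theorem: a**(p-1) is congruent to 1 mod p for prime p, p not dividing a.
p = 13
fermat_holds = all(pow(a, p - 1, p) == 1 for a in range(1, p))
```

The three-argument form of `pow` performs modular exponentiation directly, which is the same operation used in many primality tests and cryptographic schemes.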

Algebraic structures

Algebraic structures occur as both discrete examples and continuous examples. Discrete algebras include: Boolean algebra used in logic gates and programming; relational algebra used in databases; discrete and finite versions of groups, rings and fields are important in algebraic coding theory; discrete semigroups and monoids appear in the theory of formal languages.

Calculus of finite differences, discrete calculus or discrete analysis

A function defined on an interval of the integers is usually called a sequence. A sequence could be a finite sequence from a data source or an infinite sequence from a discrete dynamical system. Such a discrete function could be defined explicitly by a list (if its domain is finite), or by a formula for its general term, or it could be given implicitly by a recurrence relation or difference equation. Difference equations are similar to differential equations, but replace differentiation by taking the difference between adjacent terms; they can be used to approximate differential equations or (more often) studied in their own right. Many questions and methods concerning differential equations have counterparts for difference equations. For instance, where there are integral transforms in harmonic analysis for studying continuous functions or analogue signals, there are discrete transforms for discrete functions or digital signals. As well as the discrete metric there are more general discrete or finite metric spaces and finite topological spaces.
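The two ways of specifying a sequence, and the idea of taking differences between adjacent terms, can be sketched as follows. The function names and the sample recurrence x(k+1) = 2·x(k) + 1 are assumptions made for illustration.

```python
def forward_difference(sequence):
    """Difference between adjacent terms: d[k] = s[k+1] - s[k]."""
    return [later - earlier for earlier, later in zip(sequence, sequence[1:])]


def iterate_recurrence(step, initial, n_terms):
    """First n_terms (n_terms >= 1) of the sequence x[k+1] = step(x[k])."""
    terms = [initial]
    while len(terms) < n_terms:
        terms.append(step(terms[-1]))
    return terms


# Explicit formula: s[n] = n**2. Its second difference is constant,
# mirroring the constant second derivative of x**2.
squares = [n * n for n in range(6)]
first_difference = forward_difference(squares)
second_difference = forward_difference(first_difference)

# Implicit definition by a recurrence relation, starting from x[0] = 0.
doubling_plus_one = iterate_recurrence(lambda x: 2 * x + 1, 0, 5)
```

The first differences of the squares are the odd numbers 1, 3, 5, 7, 9 and the second differences are all 2, while the recurrence produces 0, 1, 3, 7, 15.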


Geometry

Computational geometry applies computer algorithms to representations of geometrical objects.

Discrete geometry and combinatorial geometry are about combinatorial properties of discrete collections of geometrical objects. A long-standing topic in discrete geometry is tiling of the plane. Computational geometry applies algorithms to geometrical problems.


Topology

Although topology is the field of mathematics that formalizes and generalizes the intuitive notion of “continuous deformation” of objects, it gives rise to many discrete topics; this can be attributed in part to the focus on topological invariants, which themselves usually take discrete values. See combinatorial topology, topological graph theory, topological combinatorics, computational topology, discrete topological space, finite topological space, topology (chemistry).

Operations research

PERT charts provide a project management technique based on graph theory.

Operations research provides techniques for solving practical problems in engineering, business, and other fields, such as allocating resources to maximize profit and scheduling project activities to minimize risk. Operations research techniques include linear programming and other areas of optimization, queuing theory, scheduling theory, and network theory. Operations research also includes continuous topics such as continuous-time Markov processes, continuous-time martingales, process optimization, and continuous and hybrid control theory.
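As a small illustration of scheduling project activities on a graph, the sketch below computes the minimum project duration as the longest chain of dependent tasks, which is the idea behind critical-path analysis. The task names, durations and the function name `project_duration` are all hypothetical, and the dependency graph is assumed to be acyclic.

```python
def project_duration(durations, prerequisites):
    """Length of the longest dependency chain in an acyclic task graph."""
    finish = {}   # memo: earliest possible finish time of each task

    def earliest_finish(task):
        if task not in finish:
            # A task can start only when all of its prerequisites have finished.
            start = max(
                (earliest_finish(p) for p in prerequisites.get(task, [])),
                default=0,
            )
            finish[task] = start + durations[task]
        return finish[task]

    return max(earliest_finish(task) for task in durations)


# Hypothetical project: durations in days, and what each task waits for.
durations = {"design": 3, "build": 5, "test": 2, "docs": 1}
prerequisites = {"build": ["design"], "test": ["build"], "docs": ["design"]}
total_days = project_duration(durations, prerequisites)
```

In this example the chain design, build, test takes 10 days, so the documentation task has slack and does not affect the finish date.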

Game theory, decision theory, utility theory, social choice theory

            Cooperate   Defect
Cooperate   -1, -1      -10, 0
Defect      0, -10      -5, -5

Payoff matrix for the Prisoner’s dilemma, a common example in game theory. One player chooses a row, the other a column; the resulting pair gives their payoffs.

Decision theory is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision.

Utility theory is about measures of the relative economic satisfaction from, or desirability of, consumption of various goods and services.

Social choice theory is about voting. A more puzzle-based approach to voting is ballot theory.

Game theory deals with situations where success depends on the choices of others, which makes choosing the best course of action more complex. There are even continuous games; see differential game. Topics include auction theory and fair division.
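The Prisoner’s dilemma payoffs given above are small enough to analyse by brute force. The sketch below searches for pure-strategy Nash equilibria, that is, pairs of choices from which neither player gains by changing only their own choice. The equilibrium concept and the function name `pure_nash_equilibria` are brought in for this example; the text above does not name a solution concept.

```python
ACTIONS = ["Cooperate", "Defect"]

# (row choice, column choice) -> (row player's payoff, column player's payoff)
PAYOFFS = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10),
    ("Defect", "Defect"): (-5, -5),
}


def pure_nash_equilibria(actions, payoffs):
    """All action pairs where neither player can gain by deviating alone."""
    equilibria = []
    for row in actions:
        for col in actions:
            row_payoff, col_payoff = payoffs[(row, col)]
            row_is_best = all(payoffs[(r, col)][0] <= row_payoff for r in actions)
            col_is_best = all(payoffs[(row, c)][1] <= col_payoff for c in actions)
            if row_is_best and col_is_best:
                equilibria.append((row, col))
    return equilibria


equilibria = pure_nash_equilibria(ACTIONS, PAYOFFS)
```

The only equilibrium is mutual defection, even though both players would be better off if both cooperated, which is what makes the game a dilemma.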


Discretization

Discretization concerns the process of transferring continuous models and equations into discrete counterparts, often for the purposes of making calculations easier by using approximations. Numerical analysis provides an important example.
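A standard example from numerical analysis is the forward Euler method, which replaces a differential equation by a difference equation on a grid of time steps. The sketch below applies it to dy/dt = −y with y(0) = 1, whose exact solution is e^(−t). The choice of method, test equation and the function name `euler` are assumptions made for illustration.

```python
import math


def euler(f, y0, t_end, n_steps):
    """Approximate y(t_end) for dy/dt = f(t, y), y(0) = y0, with n_steps equal steps."""
    h = t_end / n_steps          # step size of the discrete grid
    t, y = 0.0, y0
    for _ in range(n_steps):
        y += h * f(t, y)         # difference equation: y[k+1] = y[k] + h * f(t[k], y[k])
        t += h
    return y


def decay(t, y):
    """Right-hand side of the test equation dy/dt = -y."""
    return -y


exact = math.exp(-1.0)
coarse_error = abs(euler(decay, 1.0, 1.0, 10) - exact)
fine_error = abs(euler(decay, 1.0, 1.0, 1000) - exact)
```

Refining the grid from 10 to 1000 steps shrinks the error, which is the usual trade-off in discretization between accuracy and the amount of computation.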

Discrete analogues of continuous mathematics

There are many concepts in continuous mathematics which have discrete versions, such as discrete calculus, discrete probability distributions, discrete Fourier transforms, discrete geometry, discrete logarithms, discrete differential geometry, discrete exterior calculus, discrete Morse theory, difference equations, discrete dynamical systems, and discrete vector measures.

In applied mathematics, discrete modelling is the discrete analogue of continuous modelling. In discrete modelling, discrete formulae are fit to data. A common method in this form of modelling is to use recurrence relations.

In algebraic geometry, the concept of a curve can be extended to discrete geometries by taking the spectra of polynomial rings over finite fields to be models of the affine spaces over that field, and letting subvarieties or spectra of other rings provide the curves that lie in that space. Although the space in which the curves appear has a finite number of points, the curves are not so much sets of points as analogues of curves in continuous settings. For example, every point of the form $V(x-c)\subset \operatorname{Spec} K[x]=\mathbb{A}^{1}$ for $K$ a field can be studied either as $\operatorname{Spec} K[x]/(x-c)\cong \operatorname{Spec} K$, a point, or as the spectrum $\operatorname{Spec} K[x]_{(x-c)}$ of the local ring at $(x-c)$, a point together with a neighborhood around it. Algebraic varieties also have a well-defined notion of tangent space called the Zariski tangent space, making many features of calculus applicable even in finite settings.

Hybrid discrete and continuous mathematics

The time scale calculus is a unification of the theory of difference equations with that of differential equations, which has applications to fields requiring simultaneous modelling of discrete and continuous data. Another way of modeling such a situation is the notion of hybrid dynamical systems.
