# Higher Mathematics

Mathematical analysis in one hour: a cheat sheet for first-year students of higher mathematics.

1. 0:00 Introduction
2. 1:10 Limits
3. 15:50 Derivative
4. 23:52 L'Hôpital's rule
5. 32:00 Taylor series
6. 42:00 Plotting a function
7. 54:22 Quintessence

### What is higher mathematics?

1. 0:00 Introduction
2. 1:32 A bit of history
3. 2:38 Mathematics 16th century
4. 4:15 Mathematics 17th century
5. 5:12 Higher Mathematics Distinction
6. 6:08 Branches of higher mathematics
7. 9:12 Paradoxes of higher mathematics
8. 13:22 The uses of higher mathematics
9. 14:08 Summary

## Linear algebra

Linear algebra is the branch of mathematics concerning linear equations such as

$$a_1 x_1 + \cdots + a_n x_n = b,$$

linear maps such as

$$(x_1, \ldots, x_n) \mapsto a_1 x_1 + \cdots + a_n x_n,$$

and their representations in vector spaces and through matrices.

Linear algebra is central to almost all areas of mathematics. For instance, linear algebra is fundamental in modern presentations of geometry, including for defining basic objects such as lines, planes and rotations. Also, functional analysis, a branch of mathematical analysis, may be viewed as the application of linear algebra to spaces of functions.

Linear algebra is also used in most sciences and fields of engineering, because it allows modeling many natural phenomena, and computing efficiently with such models. For nonlinear systems, which cannot be modeled with linear algebra, it is often used for dealing with first-order approximations, using the fact that the differential of a multivariate function at a point is the linear map that best approximates the function near that point.
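The first-order approximation mentioned above can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical function `f(x, y) = (x²y, x + y²)` chosen only for illustration; its Jacobian matrix is worked out by hand and plays the role of the differential at a point.

```python
def f(x, y):
    # A hypothetical nonlinear function from R^2 to R^2, used only for illustration.
    return (x * x * y, x + y * y)

def jacobian(x, y):
    # Partial derivatives of f, computed by hand for this particular f.
    return [[2 * x * y, x * x],
            [1, 2 * y]]

def linearized(p, h):
    """First-order approximation f(p) + J(p) h of f(p + h)."""
    fp = f(*p)
    J = jacobian(*p)
    return tuple(fp[i] + J[i][0] * h[0] + J[i][1] * h[1] for i in range(2))

p = (1.0, 2.0)       # base point
h = (0.01, -0.02)    # small displacement
exact = f(p[0] + h[0], p[1] + h[1])
approx = linearized(p, h)
error = max(abs(a - b) for a, b in zip(exact, approx))
print(exact, approx, error)
```

The error is of the order of the squared size of the displacement `h`, which is what "best linear approximation" means in practice.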

## History

The procedure (using counting rods) for solving simultaneous linear equations now called Gaussian elimination appears in the ancient Chinese mathematical text Chapter Eight: Rectangular Arrays of The Nine Chapters on the Mathematical Art. Its use is illustrated in eighteen problems, with two to five equations.

Systems of linear equations arose in Europe with the introduction in 1637 by René Descartes of coordinates in geometry. In fact, in this new geometry, now called Cartesian geometry, lines and planes are represented by linear equations, and computing their intersections amounts to solving systems of linear equations.

The first systematic methods for solving linear systems used determinants, first considered by Leibniz in 1693. In 1750, Gabriel Cramer used them for giving explicit solutions of linear systems, now called Cramer’s rule. Later, Gauss further described the method of elimination, which was initially listed as an advancement in geodesy.

In 1844, Hermann Grassmann published his “Theory of Extension”, which included foundational new topics of what is today called linear algebra. In 1848, James Joseph Sylvester introduced the term matrix, which is Latin for womb.

Linear algebra grew with ideas noted in the complex plane. For instance, two numbers w and z in ℂ have a difference w − z, and the line segments wz and 0(w − z) are of the same length and direction. The segments are equipollent. The four-dimensional system of quaternions was started in 1843. The term vector was introduced as v = xi + yj + zk representing a point in space. The quaternion difference p − q also produces a segment equipollent to pq. Other hypercomplex number systems also used the idea of a linear space with a basis.

Arthur Cayley introduced matrix multiplication and the inverse matrix in 1856, making possible the general linear group. The mechanism of group representation became available for describing complex and hypercomplex numbers. Crucially, Cayley used a single letter to denote a matrix, thus treating a matrix as an aggregate object. He also realized the connection between matrices and determinants, and wrote “There would be many things to say about this theory of matrices which should, it seems to me, precede the theory of determinants”.

Benjamin Peirce published his Linear Associative Algebra (1872), and his son Charles Sanders Peirce extended the work later.

The telegraph required an explanatory system, and the 1873 publication of A Treatise on Electricity and Magnetism instituted a field theory of forces and required differential geometry for expression. Linear algebra is flat differential geometry and serves in tangent spaces to manifolds. Electromagnetic symmetries of spacetime are expressed by the Lorentz transformations, and much of the history of linear algebra is the history of Lorentz transformations.

The first modern and more precise definition of a vector space was introduced by Peano in 1888; by 1900, a theory of linear transformations of finite-dimensional vector spaces had emerged. Linear algebra took its modern form in the first half of the twentieth century, when many ideas and methods of previous centuries were generalized as abstract algebra. The development of computers led to increased research in efficient algorithms for Gaussian elimination and matrix decompositions, and linear algebra became an essential tool for modelling and simulations.

## Vector spaces

Until the 19th century, linear algebra was introduced through systems of linear equations and matrices. In modern mathematics, the presentation through vector spaces is generally preferred, since it is more synthetic, more general (not limited to the finite-dimensional case), and conceptually simpler, although more abstract.

A vector space over a field F (often the field of the real numbers) is a set V equipped with two binary operations satisfying the following axioms. Elements of V are called vectors, and elements of F are called scalars. The first operation, vector addition, takes any two vectors v and w and outputs a third vector v + w. The second operation, scalar multiplication, takes any scalar a and any vector v and outputs a new vector av. The axioms that addition and scalar multiplication must satisfy are the following. (In the list below, u, v and w are arbitrary elements of V, and a and b are arbitrary scalars in the field F.)

| Axiom | Signification |
| --- | --- |
| Associativity of addition | u + (v + w) = (u + v) + w |
| Commutativity of addition | u + v = v + u |
| Identity element of addition | There exists an element 0 in V, called the zero vector (or simply zero), such that v + 0 = v for all v in V. |
| Inverse elements of addition | For every v in V, there exists an element −v in V, called the additive inverse of v, such that v + (−v) = 0. |
| Distributivity of scalar multiplication with respect to vector addition | a(u + v) = au + av |
| Distributivity of scalar multiplication with respect to field addition | (a + b)v = av + bv |
| Compatibility of scalar multiplication with field multiplication | a(bv) = (ab)v |
| Identity element of scalar multiplication | 1v = v, where 1 denotes the multiplicative identity of F. |

The first four axioms mean that V is an abelian group under addition.

The elements of a specific vector space may be of various kinds; for example, an element could be a sequence, a function, a polynomial or a matrix. Linear algebra is concerned with those properties of such objects that are common to all vector spaces.

### Linear maps

Linear maps are mappings between vector spaces that preserve the vector-space structure. Given two vector spaces V and W over a field F, a linear map (also called, in some contexts, linear transformation or linear mapping) is a map

$$T : V \to W$$

that is compatible with addition and scalar multiplication, that is,

$$T(u + v) = T(u) + T(v), \qquad T(av) = aT(v)$$

for any vectors u, v in V and any scalar a in F.

This implies that for any vectors u, v in V and scalars a, b in F, one has

$$T(au + bv) = aT(u) + bT(v).$$

When V = W are the same vector space, a linear map T : V → V is also known as a linear operator on V.

A bijective linear map between two vector spaces (that is, every vector from the second space is associated with exactly one in the first) is an isomorphism. Because an isomorphism preserves linear structure, two isomorphic vector spaces are “essentially the same” from the linear algebra point of view, in the sense that they cannot be distinguished by using vector space properties. An essential question in linear algebra is testing whether a linear map is an isomorphism or not, and, if it is not an isomorphism, finding its range (or image) and the set of elements that are mapped to the zero vector, called the kernel of the map. All these questions can be solved by using Gaussian elimination or some variant of this algorithm.

### Subspaces, span, and basis

The study of those subsets of vector spaces that are in themselves vector spaces under the induced operations is fundamental, as it is for many mathematical structures. These subsets are called linear subspaces. More precisely, a linear subspace of a vector space V over a field F is a subset W of V such that u + v and au are in W, for every u, v in W, and every a in F. (These conditions suffice for implying that W is a vector space.)

For example, given a linear map T : V → W, the image T(V) of V, and the inverse image T⁻¹(0) of 0 (called kernel or null space), are linear subspaces of W and V, respectively.

Another important way of forming a subspace is to consider linear combinations of a set S of vectors: the set of all sums

$$a_1 v_1 + a_2 v_2 + \cdots + a_k v_k,$$

where v1, v2, …, vk are in S, and a1, a2, …, ak are in F, forms a linear subspace called the span of S. The span of S is also the intersection of all linear subspaces containing S. In other words, it is the smallest (for the inclusion relation) linear subspace containing S.

A set of vectors is linearly independent if none is in the span of the others. Equivalently, a set S of vectors is linearly independent if the only way to express the zero vector as a linear combination of elements of S is to take zero for every coefficient ai.
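Linear independence of finitely many coordinate vectors can be tested by computing a rank with Gaussian elimination. The following is a minimal sketch, assuming vectors given as tuples of rational numbers; the helper names `rank` and `is_linearly_independent` are hypothetical and chosen for this illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Eliminate this column from every row below the pivot row.
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    # The vectors are independent exactly when the rank equals their number.
    return rank(vectors) == len(vectors)

print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (1, 1, 1)]))  # True
print(is_linearly_independent([(1, 2, 3), (2, 4, 6), (0, 1, 0)]))  # False
```

In the second call the vector (2, 4, 6) is twice (1, 2, 3), so it lies in the span of the others and the set is dependent.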

A set of vectors that spans a vector space is called a spanning set or generating set. If a spanning set S is linearly dependent (that is, not linearly independent), then some element w of S is in the span of the other elements of S, and the span would remain the same if one removes w from S. One may continue to remove elements of S until getting a linearly independent spanning set. Such a linearly independent set that spans a vector space V is called a basis of V. The importance of bases lies in the fact that they are simultaneously minimal generating sets and maximal independent sets. More precisely, if S is a linearly independent set, and T is a spanning set such that S ⊆ T, then there is a basis B such that S ⊆ B ⊆ T.

Any two bases of a vector space V have the same cardinality, which is called the dimension of V; this is the dimension theorem for vector spaces. Moreover, two vector spaces over the same field F are  isomorphic  if and only if they have the same dimension.

If any basis of V (and therefore every basis) has a finite number of elements, V is a finite-dimensional vector space. If U is a subspace of V, then dim U ≤ dim V. In the case where V is finite-dimensional, the equality of the dimensions implies U = V.

If U1 and U2 are subspaces of V, then

$$\dim(U_1 + U_2) = \dim U_1 + \dim U_2 - \dim(U_1 \cap U_2),$$

where U1 + U2 denotes the span of U1 ∪ U2.

## Matrices

Matrices allow explicit manipulation of finite-dimensional vector spaces and linear maps. Their theory is thus an essential part of linear algebra. Let V be a finite-dimensional vector space over a field F, and (v1, v2, …, vm) be a basis of V (thus m is the dimension of V). By definition of a basis, the map

$$(a_1, \ldots, a_m) \mapsto a_1 v_1 + \cdots + a_m v_m$$

is a bijection from F^m, the set of the sequences of m elements of F, onto V. This is an isomorphism of vector spaces, if F^m is equipped with its standard structure of vector space, where vector addition and scalar multiplication are done component by component.

This isomorphism allows representing a vector by its inverse image under this isomorphism, that is, by the coordinate vector (a1, …, am) or by the column matrix

$$\begin{bmatrix} a_1 \\ \vdots \\ a_m \end{bmatrix}.$$

If W is another finite dimensional vector space (possibly the same), with a basis (w1, …, wn), a linear map f from W to V is well defined by its values on the basis elements, that is (f(w1), …, f(wn)). Thus, f is well represented by the list of the corresponding column matrices.

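That is, if

$$f(w_j) = a_{1,j} v_1 + \cdots + a_{m,j} v_m$$

for j = 1, …, n, then f is represented by the matrix

$$\begin{bmatrix} a_{1,1} & \cdots & a_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} & \cdots & a_{m,n} \end{bmatrix},$$

with m rows and n columns.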

Matrix multiplication is defined in such a way that the product of two matrices is the matrix of the composition of the corresponding linear maps, and the product of a matrix and a column matrix is the column matrix representing the result of applying the represented linear map to the represented vector. It follows that the theory of finite-dimensional vector spaces and the theory of matrices are two different languages for expressing exactly the same concepts.
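The correspondence between matrix products and composition of linear maps can be checked on a small case. This is a minimal sketch with matrices stored as lists of rows; the helper names `mat_mul` and `apply` and the sample matrices are assumptions made for this illustration.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x p matrix B (lists of rows)."""
    assert len(A[0]) == len(B), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def apply(A, v):
    """Apply the linear map represented by A to the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2], [0, 1], [3, 0]]   # 3 x 2 matrix: a map from F^2 to F^3
B = [[2, 0, 1], [1, 1, 0]]     # 2 x 3 matrix: a map from F^3 to F^2
v = [1, 2, 3]

composed = apply(A, apply(B, v))    # apply B first, then A
product = apply(mat_mul(A, B), v)   # apply the single matrix AB
print(composed, product)            # both are [11, 3, 15]
```

Both computations agree because the matrix AB is, by construction, the matrix of the composite map.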

Two matrices that encode the same linear map from W to V in different bases are called equivalent. It can be proved that two matrices are equivalent if and only if one can transform one into the other by elementary row and column operations. For a matrix representing a linear map from W to V, the row operations correspond to change of bases in V and the column operations correspond to change of bases in W. Every matrix is equivalent to an identity matrix possibly bordered by zero rows and zero columns. In terms of vector spaces, this means that, for any linear map from W to V, there are bases such that a part of the basis of W is mapped bijectively on a part of the basis of V, and that the remaining basis elements of W, if any, are mapped to zero. Gaussian elimination is the basic algorithm for finding these elementary operations, and proving these results.

## Linear systems

A finite set of linear equations in a finite set of variables, for example, x1, x2, …, xn, or x, y, …, z, is called a system of linear equations or a linear system.

Systems of linear equations form a fundamental part of linear algebra. Historically, linear algebra and matrix theory have been developed for solving such systems. In the modern presentation of linear algebra through vector spaces and matrices, many problems may be interpreted in terms of linear systems.

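For example, let

$$\begin{aligned} 2x + y - z &= 8 \\ -3x - y + 2z &= -11 \\ -2x + y + 2z &= -3 \end{aligned} \qquad \text{(S)}$$

be a linear system.

To such a system, one may associate its matrix

$$M = \begin{bmatrix} 2 & 1 & -1 \\ -3 & -1 & 2 \\ -2 & 1 & 2 \end{bmatrix}$$

and its right member vector

$$\mathbf{v} = \begin{bmatrix} 8 \\ -11 \\ -3 \end{bmatrix}.$$

Let T be the linear transformation associated to the matrix M. A solution of the system (S) is a vector

$$\mathbf{X} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$$

such that

$$T(\mathbf{X}) = \mathbf{v},$$

that is, an element of the preimage of v by T.

Let (S′) be the associated homogeneous system, where the right-hand sides of the equations are put to zero:

$$\begin{aligned} 2x + y - z &= 0 \\ -3x - y + 2z &= 0 \\ -2x + y + 2z &= 0 \end{aligned} \qquad \text{(S′)}$$

The solutions of (S′) are exactly the elements of the kernel of T or, equivalently, M.

Gaussian elimination consists of performing elementary row operations on the augmented matrix

$$\left[\begin{array}{ccc|c} 2 & 1 & -1 & 8 \\ -3 & -1 & 2 & -11 \\ -2 & 1 & 2 & -3 \end{array}\right]$$

for putting it in reduced row echelon form. These row operations do not change the set of solutions of the system of equations. In the example, the reduced echelon form is

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & 3 \\ 0 & 0 & 1 & -1 \end{array}\right],$$

showing that the system (S) has the unique solution

$$x = 2, \quad y = 3, \quad z = -1.$$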

It follows from this matrix interpretation of linear systems that the same methods can be applied for solving linear systems and for many operations on matrices and linear transformations, which include the computation of ranks, kernels, and matrix inverses.
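Reduction to reduced row echelon form can be sketched in a few lines. The example below is a minimal illustration, not an optimized routine: it uses exact rational arithmetic and an illustrative system of three equations (2x + y − z = 8, −3x − y + 2z = −11, −2x + y + 2z = −3); the function name `rref` is an assumption of this sketch.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    pivot_row = 0
    for col in range(len(rows[0])):
        if pivot_row == len(rows):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((i for i in range(pivot_row, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Scale the pivot row so that the pivot entry becomes 1.
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear the pivot column in every other row.
        for i in range(len(rows)):
            if i != pivot_row and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[pivot_row])]
        pivot_row += 1
    return rows

# Augmented matrix [M | v] of the illustrative system.
augmented = [[2, 1, -1, 8],
             [-3, -1, 2, -11],
             [-2, 1, 2, -3]]
reduced = rref(augmented)
solution = [row[-1] for row in reduced]
print(solution)  # the values of x, y, z: 2, 3, -1 (as Fractions)
```

Reading the last column as the solution is only valid here because the left block reduces to the identity matrix, which is the case when the system has a unique solution.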

## Endomorphisms and square matrices

A linear endomorphism is a linear map that maps a vector space V to itself. If V has a basis of n elements, such an endomorphism is represented by a square matrix of size n.

Compared with general linear maps, linear endomorphisms and square matrices have some specific properties that make their study an important part of linear algebra, which is used in many parts of mathematics, including geometric transformations, coordinate changes, and quadratic forms.

### Determinant

The determinant of a square matrix A is defined to be

$$\sum_{\sigma \in S_n} (-1)^{\sigma} a_{1\sigma(1)} \cdots a_{n\sigma(n)},$$

where S_n is the group of all permutations of n elements, σ is a permutation, and (−1)^σ is the parity of the permutation. A matrix is invertible if and only if the determinant is invertible (i.e., nonzero if the scalars belong to a field).

Cramer’s rule is a closed-form expression, in terms of determinants, of the solution of a system of n linear equations in n unknowns. Cramer’s rule is useful for reasoning about the solution, but, except for n = 2 or 3, it is rarely used for computing a solution, since Gaussian elimination is a faster algorithm.

The determinant of an endomorphism is the determinant of the matrix representing the endomorphism in terms of some ordered basis. This definition makes sense, since this determinant is independent of the choice of the basis.
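The permutation-sum definition of the determinant can be transcribed directly. This is a minimal sketch for small matrices only, since the sum has n! terms; the function names `sign` and `det_leibniz` and the sample matrix are assumptions of this illustration.

```python
from itertools import permutations

def sign(perm):
    """Parity of a permutation given as a tuple: +1 if even, -1 if odd."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Determinant of a square matrix as a signed sum over all permutations."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= A[i][perm[i]]  # one entry from each row and each column
        total += term
    return total

M = [[2, 1, -1],
     [-3, -1, 2],
     [-2, 1, 2]]
print(det_leibniz(M))                 # -1, nonzero, so M is invertible
print(det_leibniz([[1, 2], [2, 4]]))  # 0, so this matrix is singular
```

For anything beyond small n, elimination-based methods are the practical choice, in line with the remark on Cramer's rule above.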

### Eigenvalues and eigenvectors

If f is a linear endomorphism of a vector space V over a field F, an eigenvector of f is a nonzero vector v of V such that f(v) = av for some scalar a in F. This scalar a is an eigenvalue of f.

If the dimension of V is finite, and a basis has been chosen, f and v may be represented, respectively, by a square matrix M and a column matrix z; the equation defining eigenvectors and eigenvalues becomes

$$Mz = az.$$

Using the identity matrix I, whose entries are all zero, except those of the main diagonal, which are equal to one, this may be rewritten

$$(M - aI)z = 0.$$

As z is supposed to be nonzero, this means that M − aI is a singular matrix, and thus that its determinant det(M − aI) equals zero. The eigenvalues are thus the roots of the polynomial

$$\det(xI - M).$$

If V is of dimension n, this is a monic polynomial of degree n, called the characteristic polynomial of the matrix (or of the endomorphism), and there are, at most, n eigenvalues.

If a basis exists that consists only of eigenvectors, the matrix of f on this basis has a very simple structure: it is a diagonal matrix such that the entries on the main diagonal are eigenvalues, and the other entries are zero. In this case, the endomorphism and the matrix are said to be diagonalizable. More generally, an endomorphism and a matrix are also said to be diagonalizable if they become diagonalizable after extending the field of scalars. In this extended sense, if the characteristic polynomial is square-free, then the matrix is diagonalizable.

A real symmetric matrix is always diagonalizable. There are non-diagonalizable matrices, the simplest being

$$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

(it cannot be diagonalizable since its square is the zero matrix, and the square of a nonzero diagonal matrix is never zero).

When an endomorphism is not diagonalizable, there are bases on which it has a simple form, although not as simple as the diagonal form. The Frobenius normal form does not require extending the field of scalars and makes the characteristic polynomial immediately readable on the matrix. The Jordan normal form requires extending the field of scalars so that it contains all eigenvalues, and differs from the diagonal form only by some entries that are just above the main diagonal and are equal to 1.
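For a 2 × 2 matrix, the characteristic polynomial is x² − tr(M)x + det(M), so its real roots can be found with the quadratic formula. The following is a minimal sketch under that assumption; the function name `eigenvalues_2x2` is hypothetical, and floating-point arithmetic is used, so results are approximate in general.

```python
import math

def eigenvalues_2x2(M):
    """Real eigenvalues of a 2x2 matrix, as roots of x^2 - tr(M) x + det(M)."""
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []  # no real eigenvalues; they exist only over the complex numbers
    root = math.sqrt(disc)
    # A set removes the duplicate when the polynomial has a double root.
    return sorted({(trace - root) / 2, (trace + root) / 2})

print(eigenvalues_2x2([[2, 1], [1, 2]]))   # [1.0, 3.0]: symmetric, two distinct eigenvalues
print(eigenvalues_2x2([[0, 1], [0, 0]]))   # [0.0]: a double root
print(eigenvalues_2x2([[0, -1], [1, 0]]))  # []: a rotation by a quarter turn
```

The second matrix has the single eigenvalue 0 with a one-dimensional eigenspace, which is why no basis of eigenvectors exists for it.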

## Duality

A linear form is a linear map from a vector space V over a field F to the field of scalars F, viewed as a vector space over itself. Equipped with pointwise addition and multiplication by a scalar, the linear forms form a vector space, called the dual space of V, and usually denoted V*[16] or V′.[17][18]

If v1, …, vn is a basis of V (this implies that V is finite-dimensional), then one can define, for i = 1, …, n, a linear map vi* such that vi*(vi) = 1 and vi*(vj) = 0 if j ≠ i. These linear maps form a basis of V*, called the dual basis of v1, …, vn. (If V is not finite-dimensional, the vi* may be defined similarly; they are linearly independent, but do not form a basis.)

For v in V, the map

$$f \mapsto f(v)$$

is a linear form on V*. This defines the canonical linear map from V into (V*)*, the dual of V*, called the bidual of V. This canonical map is an isomorphism if V is finite-dimensional, and this allows identifying V with its bidual. (In the infinite-dimensional case, the canonical map is injective, but not surjective.)

There is thus a complete symmetry between a finite-dimensional vector space and its dual. This motivates the frequent use, in this context, of the bra–ket notation

$$\langle f, x \rangle$$

for denoting f(x).

### Dual map

Let

$$f : V \to W$$

be a linear map. For every linear form h on W, the composite function h ∘ f is a linear form on V. This defines a linear map

$$f^* : W^* \to V^*$$

between the dual spaces, which is called the dual or the transpose of f.

If V and W are finite-dimensional, and M is the matrix of f in terms of some ordered bases, then the matrix of f* over the dual bases is the transpose Mᵀ of M, obtained by exchanging rows and columns.
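The statement that the dual map is represented by the transpose can be checked numerically: applying a linear form h after the map gives the same number as applying the transposed matrix to h first. This is a minimal sketch with coordinate lists; the helper names and the sample values are assumptions of this illustration.

```python
def transpose(M):
    """Exchange rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def apply(M, v):
    """Apply the linear map represented by M to the coordinate vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in M]

def dot(x, y):
    """Evaluate a linear form with coordinates x on a vector with coordinates y."""
    return sum(a * b for a, b in zip(x, y))

M = [[1, 2, 0],
     [0, 1, 3]]      # matrix of a map f from F^3 to F^2
v = [1, -1, 2]       # a vector of F^3
h = [4, 5]           # a linear form on F^2, in dual coordinates

lhs = dot(h, apply(M, v))             # h(f(v))
rhs = dot(apply(transpose(M), h), v)  # (f*(h))(v), computed with the transpose
print(lhs, rhs)                       # 21 21
```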

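If elements of vector spaces and their duals are represented by column vectors, this duality may be expressed in bra–ket notation by

$$\langle h^{\mathsf T}, M \mathbf{v} \rangle = \langle h^{\mathsf T} M, \mathbf{v} \rangle.$$

For highlighting this symmetry, the two members of this equality are sometimes written

$$\langle h^{\mathsf T} \mid M \mid \mathbf{v} \rangle.$$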

### Inner-product spaces

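Besides these basic concepts, linear algebra also studies vector spaces with additional structure, such as an inner product. The inner product is an example of a bilinear form, and it gives the vector space a geometric structure by allowing for the definition of length and angles. Formally, an inner product is a map

$$\langle \cdot, \cdot \rangle : V \times V \to F$$

that satisfies the following three axioms for all vectors u, v, w in V and all scalars a in F:[19][20]

- Conjugate symmetry: $\langle u, v \rangle = \overline{\langle v, u \rangle}$. In ℝ, it is symmetric.
- Linearity in the first argument: $\langle au, v \rangle = a \langle u, v \rangle$ and $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$.
- Positive-definiteness: $\langle v, v \rangle \geq 0$, with equality only for v = 0.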

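We can define the length of a vector v in V by

$$\|v\|^2 = \langle v, v \rangle,$$

and we can prove the Cauchy–Schwarz inequality:

$$|\langle u, v \rangle| \leq \|u\| \cdot \|v\|.$$

In particular, the quantity

$$\frac{|\langle u, v \rangle|}{\|u\| \cdot \|v\|} \leq 1,$$

and so we can call this quantity the cosine of the angle between the two vectors.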

Two vectors are orthogonal if ⟨u, v⟩ = 0. An orthonormal basis is a basis where all basis vectors have length 1 and are orthogonal to each other. Given any finite-dimensional vector space, an orthonormal basis can be found by the Gram–Schmidt procedure. Orthonormal bases are particularly easy to deal with, since if v = a1 v1 + ⋯ + an vn, then

$$a_i = \langle v, v_i \rangle.$$

The inner product facilitates the construction of many useful concepts. For instance, given a transform T, we can define its Hermitian conjugate T* as the linear transform satisfying

$$\langle Tu, v \rangle = \langle u, T^* v \rangle.$$

If T satisfies TT* = T*T, we call T normal. It turns out that normal matrices are precisely the matrices that have an orthonormal system of eigenvectors that span V.
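The Gram–Schmidt procedure mentioned in this section can be sketched as follows for real coordinate vectors with the standard dot product. This is a minimal illustration in floating-point arithmetic, using the "modified" ordering of the subtractions for better numerical behaviour; the function names and the tolerance `tol` are assumptions of this sketch.

```python
import math

def dot(u, v):
    """Standard inner product of two real coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormal basis of the span of the given vectors."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        # Remove the component of w along each basis vector found so far.
        for e in basis:
            coeff = dot(w, e)
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        # A (numerically) zero remainder means v was already in the span.
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

basis = gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)])
for i, e in enumerate(basis):
    for j, g in enumerate(basis):
        # Prints values close to 1 on the diagonal and close to 0 elsewhere.
        print(i, j, round(dot(e, g), 10))
```

With an orthonormal basis in hand, the coordinates of any vector are obtained by taking its inner product with each basis vector, as stated above.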

## Relationship with geometry

There is a strong relationship between linear algebra and geometry, which started with the introduction by René Descartes, in 1637, of Cartesian coordinates. In this new (at that time) geometry, now called Cartesian geometry, points are represented by Cartesian coordinates, which are sequences of three real numbers (in the case of the usual three-dimensional space). The basic objects of geometry, which are lines and planes, are represented by linear equations. Thus, computing intersections of lines and planes amounts to solving systems of linear equations. This was one of the main motivations for developing linear algebra.

Most geometric transformations, such as translations, rotations, reflections, rigid motions, isometries, and projections, transform lines into lines. It follows that they can be defined, specified and studied in terms of linear maps. This is also the case of homographies and Möbius transformations, when considered as transformations of a projective space.

Until the end of the 19th century, geometric spaces were defined by axioms relating points, lines and planes (synthetic geometry). Around this date, it appeared that one may also define geometric spaces by constructions involving vector spaces (see, for example, Projective space and Affine space). It has been shown that the two approaches are essentially equivalent.[21] In classical geometry, the involved vector spaces are vector spaces over the reals, but the constructions may be extended to vector spaces over any field, which allows geometry to be considered over arbitrary fields, including finite fields.

Presently, most textbooks introduce geometric spaces from linear algebra, and geometry is often presented, at the elementary level, as a subfield of linear algebra.

## Usage and applications

Linear algebra is used in almost all areas of mathematics, thus making it relevant in almost all scientific domains that use mathematics. These applications may be divided into several wide categories.

### Geometry of ambient space

The modeling of ambient space is based on geometry. Sciences concerned with this space use geometry widely. This is the case with mechanics and robotics, for describing rigid body dynamics; geodesy, for describing Earth shape; perspectivity, computer vision, and computer graphics, for describing the relationship between a scene and its plane representation; and many other scientific domains.

In all these applications, synthetic geometry is often used for general descriptions and a qualitative approach, but for the study of explicit situations, one must compute with coordinates. This requires the heavy use of linear algebra.

### Functional analysis

Functional analysis studies function spaces. These are vector spaces with additional structure, such as Hilbert spaces. Linear algebra is thus a fundamental part of functional analysis and its applications, which include, in particular, quantum mechanics (wave functions).

### Study of complex systems

Most physical phenomena are modeled by partial differential equations. To solve them, one usually decomposes the space in which the solutions are searched into small, mutually interacting cells. For linear systems this interaction involves linear functions. For nonlinear systems, this interaction is often approximated by linear functions.[b] In both cases, very large matrices are generally involved. Weather forecasting is a typical example, where the whole Earth atmosphere is divided in cells of, say, 100 km of width and 100 m of height.

### Scientific computation

Nearly all scientific computations involve linear algebra. Consequently, linear algebra algorithms have been highly optimized. BLAS and LAPACK are the best known implementations. For improving efficiency, some of them configure the algorithms automatically, at run time, for adapting them to the specificities of the computer (cache size, number of available cores, …).

Some processors, typically graphics processing units (GPU), are designed with a matrix structure, for optimizing the operations of linear algebra.

## Extensions and generalizations

This section presents several related topics that do not appear generally in elementary textbooks on linear algebra, but are commonly considered, in advanced mathematics, as parts of linear algebra.

### Module theory

The existence of multiplicative inverses in fields is not involved in the axioms defining a vector space. One may thus replace the field of scalars by a ring R, and this gives a structure called module over R, or R-module.

The concepts of linear independence, span, basis, and linear maps (also called module homomorphisms) are defined for modules exactly as for vector spaces, with the essential difference that, if R is not a field, there are modules that do not have any basis. The modules that have a basis are the free modules, and those that are spanned by a finite set are the finitely generated modules. Module homomorphisms between finitely generated free modules may be represented by matrices. The theory of matrices over a ring is similar to that of matrices over a field, except that determinants exist only if the ring is commutative, and that a square matrix over a commutative ring is invertible only if its determinant has a multiplicative inverse in the ring.

Vector spaces are completely characterized by their dimension (up to an isomorphism). In general, there is not such a complete classification for modules, even if one restricts oneself to finitely generated modules. However, every module is a cokernel of a homomorphism of free modules.

Modules over the integers can be identified with abelian groups, since the multiplication by an integer may be identified with a repeated addition. Most of the theory of abelian groups may be extended to modules over a principal ideal domain. In particular, over a principal ideal domain, every submodule of a free module is free, and the fundamental theorem of finitely generated abelian groups may be extended straightforwardly to finitely generated modules over a principal ideal domain.

There are many rings for which there are algorithms for solving linear equations and systems of linear equations. However, these algorithms have generally a computational complexity that is much higher than the similar algorithms over a field. For more details, see Linear equation over a ring.

### Multilinear algebra and tensors

In multilinear algebra, one considers multivariable linear transformations, that is, mappings that are linear in each of a number of different variables. This line of inquiry naturally leads to the idea of the dual space, the vector space V* consisting of linear maps f : V → F where F is the field of scalars. Multilinear maps T : V^n → F can be described via tensor products of elements of V*.

If, in addition to vector addition and scalar multiplication, there is a bilinear vector product V × V → V, the vector space is called an algebra; for instance, associative algebras are algebras with an associative vector product (like the algebra of square matrices, or the algebra of polynomials).

### Topological vector spaces

Vector spaces that are not finite-dimensional often require additional structure to be tractable. A normed vector space is a vector space along with a function called a norm, which measures the “size” of elements. The norm induces a metric, which measures the distance between elements, and induces a topology, which allows for a definition of continuous maps. The metric also allows for a definition of limits and completeness; a normed vector space that is complete is known as a Banach space. A complete normed space whose norm comes from an inner product (a conjugate symmetric sesquilinear form) is known as a Hilbert space, which is in some sense a particularly well-behaved Banach space. Functional analysis applies the methods of linear algebra alongside those of mathematical analysis to study various function spaces; the central objects of study in functional analysis are Lp spaces, which are Banach spaces, and especially the L2 space of square-integrable functions, which is the only Hilbert space among them. Functional analysis is of particular importance to quantum mechanics, the theory of partial differential equations, digital signal processing, and electrical engineering. It also provides the foundation and theoretical framework that underlies the Fourier transform and related methods.

### Homological algebra


An illustration of Desargues’ theorem, a result in Euclidean and projective geometry

Geometry is, with arithmetic, one of the oldest branches of mathematics. It is concerned with properties of space that are related to distance, shape, size, and relative position of figures.[1] A mathematician who works in the field of geometry is called a geometer.

Until the 19th century, geometry was almost exclusively devoted to Euclidean geometry, which includes the notions of point, line, plane, distance, angle, surface, and curve as fundamental concepts.

During the 19th century, several discoveries dramatically enlarged the scope of geometry. One of the oldest such discoveries is Gauss’s Theorema Egregium (“remarkable theorem”), which asserts roughly that the Gaussian curvature of a surface is independent of any specific embedding in a Euclidean space. This implies that surfaces can be studied intrinsically, that is, as stand-alone spaces, and has been expanded into the theory of manifolds and Riemannian geometry.

Later in the 19th century, it appeared that geometries without the parallel postulate (non-Euclidean geometries) can be developed without introducing any contradiction. The geometry that underlies general relativity is a famous application of non-Euclidean geometry.

Since then, the scope of geometry has been greatly expanded, and the field has been split into many subfields. Some depend on the underlying methods: differential geometry, algebraic geometry, computational geometry, algebraic topology, discrete geometry (also known as combinatorial geometry), etc. Others depend on the properties of Euclidean spaces that are disregarded: projective geometry, which considers only alignment of points but not distance and parallelism; affine geometry, which omits the concepts of angle and distance; finite geometry, which omits continuity; and others.

Originally developed to model the physical world, geometry has applications in almost all sciences, and also in art, architecture, and other activities that are related to graphics. Geometry also has applications in areas of mathematics that are apparently unrelated. For example, methods of algebraic geometry are fundamental in Wiles’s proof of Fermat’s Last Theorem, a problem that was stated in terms of elementary arithmetic, and remained unsolved for several centuries.

Timecodes:

1. 0:05 Greetings to everyone
2. 4:15 Stream topic: recalling everything in 3 hours
3. 10:35 Problem 1
4. 12:04 Problem 2
5. 14:30 Problem 4
6. 20:07 Problem 9
7. 31:46 Problem 5
8. 35:40 Problem 10
9. 37:37 Problem 11
10. 46:10 Problems 6 and 9 (trigonometry)
11. 56:45 Problem 10 (trigonometry)
12. 58:10 Problem 6 (plane geometry)
13. 1:42:48 Problem 8
14. 1:51:45 Problem 3
15. 1:52:27 Problem 7
16. 2:04:55 Problem 12
17. 2:11:48 Break
18. 2:18:13 Problem 13
19. 2:41:55 Problem 14
20. 3:02:33 Problem 15
21. 3:08:30 Problem 16
22. 3:24:20 Problem 17
23. 3:30:15 Problem 18
24. 3:36:25 Problem 19
25. 3:50:20 Wrap-up and parting advice

Further Mathematics is the title given to a number of advanced secondary mathematics courses. The term “Higher and Further Mathematics”, and the term “Advanced Level Mathematics”, may also refer to any of several advanced mathematics courses at many institutions.

Topics studied in Further Mathematics included:

## Statistics

Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as “all people living in a country” or “every atom composing a crystal”. Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution’s central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

A standard statistical procedure involves the collection of data leading to a test of the relationship between two statistical data sets, or a data set and synthetic data drawn from an idealized model. A hypothesis is proposed for the statistical relationship between the two data sets, and this is compared as an alternative to an idealized null hypothesis of no relationship between the two data sets. Rejecting or disproving the null hypothesis is done using statistical tests that quantify the sense in which the null can be proven false, given the data that are used in the test. Working from a null hypothesis, two basic forms of error are recognized: Type I errors (null hypothesis is falsely rejected giving a “false positive”) and Type II errors (null hypothesis fails to be rejected and an actual relationship between populations is missed giving a “false negative”). Multiple problems have come to be associated with this framework, ranging from obtaining a sufficient sample size to specifying an adequate null hypothesis.

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also occur. The presence of missing data or censoring may result in biased estimates, and specific techniques have been developed to address these problems.

The normal distribution, a very common probability density, useful because of the central limit theorem.

Scatter plots are used in descriptive statistics to show the observed relationships between different variables, here using the Iris flower data set.

## Introduction

Statistics is described either as a mathematical body of science that pertains to the collection, analysis, interpretation or explanation, and presentation of data, or as a branch of mathematics. Some consider statistics to be a distinct mathematical science rather than a branch of mathematics. While many scientific investigations make use of data, statistics is concerned with the use of data in the context of uncertainty and decision making in the face of uncertainty. In applying statistics to a problem, it is common practice to start with a population or process to be studied. Populations can be diverse topics such as “all people living in a country” or “every atom composing a crystal”. Ideally, statisticians compile data about the entire population (an operation called a census). This may be organized by governmental statistical institutes. Descriptive statistics can be used to summarize the population data. Numerical descriptors include mean and standard deviation for continuous data (like income), while frequency and percentage are more useful in terms of describing categorical data (like education).

When a census is not feasible, a chosen subset of the population called a sample is studied. Once a sample that is representative of the population is determined, data is collected for the sample members in an observational or experimental setting. Again, descriptive statistics can be used to summarize the sample data. However, drawing the sample contains an element of randomness; hence, the numerical descriptors from the sample are also prone to uncertainty. To draw meaningful conclusions about the entire population, inferential statistics is needed. It uses patterns in the sample data to draw inferences about the population represented while accounting for randomness. These inferences may take the form of answering yes/no questions about the data (hypothesis testing), estimating numerical characteristics of the data (estimation), describing associations within the data (correlation), and modeling relationships within the data (for example, using regression analysis). Inference can extend to forecasting, prediction, and estimation of unobserved values either in or associated with the population being studied. It can include extrapolation and interpolation of time series or spatial data, and data mining.

### Mathematical statistics

Mathematical statistics is the application of mathematics to statistics. Mathematical techniques used for this include mathematical analysis, linear algebra, stochastic analysis, differential equations, and measure-theoretic probability theory.

## History

Gerolamo Cardano, a pioneer on the mathematics of probability.

The earliest European writing on statistics dates back to 1663, with the publication of Natural and Political Observations upon the Bills of Mortality by John Graunt. Early applications of statistical thinking revolved around the needs of states to base policy on demographic and economic data, hence its  stat- etymology. The scope of the discipline of statistics broadened in the early 19th century to include the collection and analysis of data in general. Today, statistics is widely employed in government, business, and natural and social sciences.

The mathematical foundations of modern statistics were laid in the 17th century with the development of probability theory by Gerolamo Cardano, Blaise Pascal and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, although the concept of probability was already examined in medieval law and by philosophers such as Juan Caramuel. The method of least squares was first described by Adrien-Marie Legendre in 1805.

Karl Pearson, a founder of mathematical statistics.

The modern field of statistics emerged in the late 19th and early 20th century in three stages. The first wave, at the turn of the century, was led by the work of Francis Galton and Karl Pearson, who transformed statistics into a rigorous mathematical discipline used for analysis, not just in science, but in industry and politics as well. Galton’s contributions included introducing the concepts of standard deviation, correlation, and regression analysis, and the application of these methods to the study of the variety of human characteristics, such as height, weight, and eyelash length, among others. Pearson developed the Pearson product-moment correlation coefficient, defined as a product-moment, the method of moments for the fitting of distributions to samples, and the Pearson distribution, among many other things. Galton and Pearson founded Biometrika as the first journal of mathematical statistics and biostatistics (then called biometry), and the latter founded the world’s first university statistics department at University College London.

Ronald Fisher coined the term  null hypothesis  during the  Lady tasting tea experiment, which “is never proved or established, but is possibly disproved, in the course of experimentation”.

The second wave of the 1910s and 20s was initiated by William Sealy Gosset, and reached its culmination in the insights of Ronald Fisher, who wrote the textbooks that were to define the academic discipline in universities around the world. Fisher’s most important publications were his 1918 seminal paper The Correlation between Relatives on the Supposition of Mendelian Inheritance (which was the first to use the statistical term variance), his classic 1925 work Statistical Methods for Research Workers, and his 1935 The Design of Experiments, where he developed rigorous design of experiments models. He originated the concepts of sufficiency, ancillary statistics, Fisher’s linear discriminator and Fisher information. In his 1930 book The Genetical Theory of Natural Selection, he applied statistics to various biological concepts such as Fisher’s principle (which A. W. F. Edwards called “probably the most celebrated argument in evolutionary biology”) and Fisherian runaway, a concept in sexual selection about a positive feedback runaway effect found in evolution.

The final wave, which mainly saw the refinement and expansion of earlier developments, emerged from the collaborative work between Egon Pearson and Jerzy Neyman in the 1930s. They introduced the concepts of “Type II” error, power of a test and confidence intervals. In 1934, Jerzy Neyman showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.

Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations and has also made possible new methods that are impractical to perform manually. Statistics continues to be an area of active research, for example on the problem of how to analyze big data.

## Statistical data

### Data collection

#### Sampling

When full census data cannot be collected, statisticians collect sample data by developing specific  experiment designs and  survey samples. Statistics itself also provides tools for prediction and forecasting through  statistical models.

To use a sample as a guide to an entire population, it is important that it truly represents the overall population. Representative  sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. A major problem lies in determining the extent that the sample chosen is actually representative. Statistics offers methods to estimate and correct for any bias within the sample and data collection procedures. There are also methods of experimental design for experiments that can lessen these issues at the outset of a study, strengthening its capability to discern truths about the population.

Sampling theory is part of the mathematical discipline of probability theory. Probability is used in mathematical statistics to study the sampling distributions of sample statistics and, more generally, the properties of statistical procedures. The use of any statistical method is valid when the system or population under consideration satisfies the assumptions of the method. The difference in point of view between classic probability theory and sampling theory is, roughly, that probability theory starts from the given parameters of a total population to deduce probabilities that pertain to samples. Statistical inference, however, moves in the opposite direction: it infers inductively from samples to the parameters of a larger or total population.
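The idea of a sampling distribution can be made concrete with a small simulation. The following is a minimal sketch, assuming a made-up population of exponentially distributed values; the function name `sample_means` and all numbers are hypothetical. It draws many samples from a known population (the "probability theory" direction) and looks at how the sample mean varies from sample to sample.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw repeated simple random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical population: 10,000 values from a skewed (exponential) distribution.
rng = random.Random(42)
population = [rng.expovariate(1.0) for _ in range(10_000)]

means = sample_means(population, sample_size=50, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

print("population mean:      ", round(pop_mean, 3))
print("mean of sample means: ", round(statistics.fmean(means), 3))
print("sd of sample means:   ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):      ", round(pop_sd / 50 ** 0.5, 3))
```

The last two printed values should be close to each other: the spread of the sample mean shrinks roughly like the population standard deviation divided by the square root of the sample size.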

#### Experimental and observational studies

A common goal for a statistical research project is to investigate causality, and in particular to draw a conclusion on the effect of changes in the values of predictors or independent variables on dependent variables. There are two major types of causal statistical studies: experimental studies and observational studies. In both types of studies, the effect of differences of an independent variable (or variables) on the behavior of the dependent variable is observed. The difference between the two types lies in how the study is actually conducted. Each can be very effective. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation. Instead, data are gathered and correlations between predictors and response are investigated. While the tools of data analysis work best on data from randomized studies, they are also applied to other kinds of data, like natural experiments and observational studies, for which a statistician would use a modified, more structured estimation method (e.g., difference in differences estimation and instrumental variables, among many others) that produces consistent estimators.

##### Experiments

The basic steps of a statistical experiment are:

1. Planning the research, including finding the number of replicates of the study, using the following information: preliminary estimates regarding the size of treatment effects, alternative hypotheses, and the estimated experimental variability. Consideration of the selection of experimental subjects and the ethics of research is necessary. Statisticians recommend that experiments compare (at least) one new treatment with a standard treatment or control, to allow an unbiased estimate of the difference in treatment effects.
2. Design of experiments, using blocking to reduce the influence of confounding variables, and randomized assignment of treatments to subjects to allow unbiased estimates of treatment effects and experimental error. At this stage, the experimenters and statisticians write the experimental protocol that will guide the performance of the experiment and which specifies the primary analysis of the experimental data.
3. Performing the experiment following the experimental protocol and analyzing the data following the experimental protocol.
4. Further examining the data set in secondary analyses, to suggest new hypotheses for future study.
5. Documenting and presenting the results of the study.
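The design step above (blocking plus randomized assignment) can be sketched in a few lines. This is only an illustration under stated assumptions: the function `randomized_block_assignment`, the block labels and the treatment names are all hypothetical, and real protocols usually add further constraints.

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, treatments, seed=0):
    """Assign treatments at random within each block.

    subjects:   list of subject ids
    block_of:   dict mapping subject id -> block label (e.g. an age group)
    treatments: list of treatment labels, cycled so each block stays balanced
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of[s]].append(s)

    assignment = {}
    for label in sorted(blocks):
        members = blocks[label][:]
        rng.shuffle(members)  # random order within the block
        for i, s in enumerate(members):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Hypothetical study: 12 subjects in two blocks, one control and one new treatment.
subjects = list(range(12))
block_of = {s: ("young" if s < 6 else "old") for s in subjects}
assignment = randomized_block_assignment(subjects, block_of, ["control", "new"])

for s in subjects:
    print(s, block_of[s], assignment[s])
```

Because the shuffle happens inside each block, every block receives the same number of subjects per treatment, so a difference between blocks cannot masquerade as a treatment effect.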

Experiments on human behavior have special concerns. The famous Hawthorne study examined changes to the working environment at the Hawthorne plant of the Western Electric Company. The researchers were interested in determining whether increased illumination would increase the productivity of the assembly line workers. The researchers first measured the productivity in the plant, then modified the illumination in an area of the plant and checked if the changes in illumination affected productivity. It turned out that productivity indeed improved (under the experimental conditions). However, the study is heavily criticized today for errors in experimental procedures, specifically for the lack of a control group and of blinding. The Hawthorne effect refers to finding that an outcome (in this case, worker productivity) changed due to observation itself. Those in the Hawthorne study became more productive not because the lighting was changed but because they were being observed.

##### Observational study

An example of an observational study is one that explores the association between smoking and lung cancer. This type of study typically uses a survey to collect observations about the area of interest and then performs statistical analysis. In this case, the researchers would collect observations of both smokers and non-smokers, perhaps through a cohort study, and then look for the number of cases of lung cancer in each group.  A  case-control study is another type of observational study in which people with and without the outcome of interest (e.g. lung cancer) are invited to participate and their exposure histories are collected.

### Types of data

Various attempts have been made to produce a taxonomy of levels of measurement. The psychophysicist Stanley Smith Stevens defined nominal, ordinal, interval, and ratio scales. Nominal measurements do not have meaningful rank order among values, and permit any one-to-one (injective) transformation. Ordinal measurements have imprecise differences between consecutive values, but have a meaningful order to those values, and permit any order-preserving transformation. Interval measurements have meaningful distances between measurements defined, but the zero value is arbitrary (as in the case with longitude and temperature measurements in Celsius or Fahrenheit), and permit any linear transformation. Ratio measurements have both a meaningful zero value and the distances between different measurements defined, and permit any rescaling transformation.
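The difference between interval and ratio scales shows up in which comparisons survive a permitted transformation. A minimal sketch, with hypothetical temperature and length values: on an interval scale such as Celsius, ratios of values change under conversion to Fahrenheit, while ratios of differences do not; on a ratio scale such as length, ratios of values are preserved by a change of unit.

```python
def c_to_f(celsius):
    """Linear change of scale with a shifted zero, permitted for interval data."""
    return celsius * 9 / 5 + 32

def m_to_cm(metres):
    """Pure rescaling, permitted for ratio data."""
    return metres * 100

# Interval scale: hypothetical temperatures in degrees Celsius.
t1, t2, t3 = 10.0, 20.0, 40.0
ratio_c = t2 / t1                          # 2.0
ratio_f = c_to_f(t2) / c_to_f(t1)          # 68 / 50 = 1.36, the ratio is not preserved
diff_ratio_c = (t3 - t2) / (t2 - t1)       # 2.0
diff_ratio_f = (c_to_f(t3) - c_to_f(t2)) / (c_to_f(t2) - c_to_f(t1))  # 36 / 18 = 2.0

# Ratio scale: hypothetical lengths in metres.
h1, h2 = 1.5, 3.0
ratio_m = h2 / h1                          # 2.0
ratio_cm = m_to_cm(h2) / m_to_cm(h1)       # 300 / 150 = 2.0, the ratio is preserved

print(ratio_c, ratio_f, diff_ratio_c, diff_ratio_f, ratio_m, ratio_cm)
```

This is why "twice as hot" is not a meaningful statement in Celsius, whereas "twice as long" is meaningful in any unit of length.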

Because variables conforming only to nominal or ordinal measurements cannot be reasonably measured numerically, sometimes they are grouped together as  categorical variables, whereas ratio and interval measurements are grouped together as quantitative variables, which can be either discrete or continuous, due to their numerical nature. Such distinctions can often be loosely correlated with data type in computer science, in that dichotomous categorical variables may be represented with the Boolean data type, polytomous categorical variables with arbitrarily assigned integers in the integral data type, and continuous variables with the real data type involving floating point computation. But the mapping of computer science data types to statistical data types depends on which categorization of the latter is being implemented.

Other categorizations have been proposed. For example, Mosteller and Tukey (1977) distinguished grades, ranks, counted fractions, counts, amounts, and balances. Nelder (1990) described continuous counts, continuous ratios, count ratios, and categorical modes of data. (See also: Chrisman (1998), van den Berg (1991).)

The issue of whether or not it is appropriate to apply different kinds of statistical methods to data obtained from different kinds of measurement procedures is complicated by issues concerning the transformation of variables and the precise interpretation of research questions. “The relationship between the data and what they describe merely reflects the fact that certain kinds of statistical statements may have truth values which are not invariant under some transformations. Whether or not a transformation is sensible to contemplate depends on the question one is trying to answer.”

## Methods

### Descriptive statistics

A descriptive statistic (in the count noun sense) is a summary statistic that quantitatively describes or summarizes features of a collection of information,[44] while descriptive statistics in the mass noun sense is the process of using and analyzing those statistics. Descriptive statistics is distinguished from inferential statistics (or inductive statistics), in that descriptive statistics aims to summarize a sample, rather than use the data to learn about the population that the sample of data is thought to represent.
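As a small illustration, the sketch below summarizes a made-up sample of ten exam scores with a few common descriptive statistics from Python's standard `statistics` module. The data and the choice of summaries are assumptions for the example only.

```python
import statistics

# Hypothetical sample: exam scores of ten students.
scores = [52, 61, 67, 70, 70, 74, 78, 81, 88, 95]

summary = {
    "n": len(scores),
    "mean": statistics.fmean(scores),      # central tendency
    "median": statistics.median(scores),   # central tendency, less sensitive to outliers
    "stdev": statistics.stdev(scores),     # dispersion (sample standard deviation)
    "range": max(scores) - min(scores),    # dispersion
}

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

Nothing here makes a claim about students outside the sample; that step would belong to inferential statistics.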

### Inferential statistics

Statistical inference is the process of using data analysis to deduce properties of an underlying probability distribution.[45] Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. Inferential statistics can be contrasted with descriptive statistics. Descriptive statistics is solely concerned with properties of the observed data, and it does not rest on the assumption that the data come from a larger population.

#### Terminology and theory of inferential statistics

##### Statistics, estimators and pivotal quantities

Consider independent identically distributed (IID) random variables with a given probability distribution: standard statistical inference and estimation theory defines a random sample as the random vector given by the column vector of these IID variables.  The population being examined is described by a probability distribution that may have unknown parameters.

A statistic is a random variable that is a function of the random sample, but not a function of unknown parameters. The probability distribution of the statistic, though, may have unknown parameters. Consider now a function of the unknown parameter: an estimator is a statistic used to estimate such function. Commonly used estimators include sample mean, unbiased sample variance and sample covariance.
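The three estimators named above can be written directly from their definitions. This is a plain sketch with hypothetical function names and toy data; each function depends only on the sample, not on any unknown parameter, which is what makes it a statistic.

```python
def sample_mean(xs):
    """Estimator of the population mean."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Unbiased sample variance: divides by n - 1 rather than n."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def sample_covariance(xs, ys):
    """Sample covariance of paired observations, also with divisor n - 1."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Toy paired data, chosen so the results are easy to check by hand.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

print(sample_mean(xs))            # 3.0
print(sample_variance(xs))        # 2.5
print(sample_covariance(xs, ys))  # 5.0
```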

A random variable that is a function of the random sample and of the unknown parameter, but whose probability distribution does not depend on the unknown parameter is called a pivotal quantity or pivot. Widely used pivots include the z-score, the chi square statistic and Student’s t-value.
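Two of the pivots mentioned above can be sketched as follows, assuming the sample comes from a normal distribution. Unlike a statistic, each function takes the unknown mean `mu` as an argument; the function names and the data are hypothetical.

```python
import math
import statistics

def z_score(sample, mu, sigma):
    """Pivot when the population standard deviation sigma is known.

    Under normal sampling its distribution is standard normal, whatever mu is.
    """
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))

def t_value(sample, mu):
    """Pivot when sigma is unknown and replaced by the sample standard deviation.

    Under normal sampling it follows Student's t with n - 1 degrees of freedom.
    """
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))

sample = [4.8, 5.1, 5.3, 4.9, 5.4]   # hypothetical measurements

print(z_score(sample, mu=5.0, sigma=0.25))
print(t_value(sample, mu=5.0))
```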

Between two estimators of a given parameter, the one with lower mean squared error is said to be more efficient. Furthermore, an estimator is said to be unbiased if its expected value is equal to the true value of the unknown parameter being estimated, and asymptotically unbiased if its expected value converges at the limit to the true value of such parameter.
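The two notions in the paragraph above, bias and mean squared error, are linked by a standard identity. As a sketch, write $\hat\theta$ for an estimator of a parameter $\theta$ (both symbols are notation assumed here, not taken from the text):

$$
\operatorname{MSE}(\hat\theta) = \mathbb{E}\big[(\hat\theta - \theta)^2\big] = \operatorname{Var}(\hat\theta) + \big(\operatorname{Bias}(\hat\theta)\big)^2,
\qquad
\operatorname{Bias}(\hat\theta) = \mathbb{E}[\hat\theta] - \theta .
$$

For an unbiased estimator the bias term vanishes, so its mean squared error equals its variance, and comparing two unbiased estimators by efficiency amounts to comparing their variances.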

Other desirable properties for estimators include: UMVUE estimators, which have the lowest variance for all possible values of the parameter to be estimated (this is usually an easier property to verify than efficiency), and consistent estimators, which converge in probability to the true value of such parameter.

This still leaves the question of how to obtain estimators in a given situation and carry out the computation. Several methods have been proposed: the method of moments, the maximum likelihood method, the least squares method and the more recent method of estimating equations.

##### Null hypothesis and alternative hypothesis

Interpretation of statistical information can often involve the development of a null hypothesis which is usually (but not necessarily) that no relationship exists among variables or that no change occurred over time.[47][48]

The best illustration for a novice is the predicament encountered in a criminal trial. The null hypothesis, H0, asserts that the defendant is innocent, whereas the alternative hypothesis, H1, asserts that the defendant is guilty. The indictment comes because of suspicion of guilt. H0 (the status quo) stands in opposition to H1 and is maintained unless H1 is supported by evidence “beyond a reasonable doubt”. However, “failure to reject H0” in this case does not imply innocence, but merely that the evidence was insufficient to convict. So the jury does not necessarily accept H0 but fails to reject H0. While one cannot “prove” a null hypothesis, one can test how close it is to being true with a power test, which tests for type II errors.

What statisticians call an alternative hypothesis is simply a hypothesis that contradicts the null hypothesis.

##### Error

Working from a null hypothesis, two broad categories of error are recognized:

• Type I errors where the null hypothesis is falsely rejected, giving a “false positive”.
• Type II errors where the null hypothesis fails to be rejected and an actual difference between populations is missed, giving a “false negative”.

Standard deviation refers to the extent to which individual observations in a sample differ from a central value, such as the sample or population mean, while standard error refers to an estimate of the difference between the sample mean and the population mean.

A statistical error is the amount by which an observation differs from its expected value. A residual is the amount by which an observation differs from the value that the estimator of the expected value assumes on a given sample (also called the prediction).
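A small numeric sketch may help separate these four quantities. The sample and the "known" population mean below are hypothetical, chosen only for illustration; in practice the population mean is unknown, which is why errors cannot usually be computed while residuals can.

```python
from math import sqrt
from statistics import mean, stdev

population_mean = 6.5                 # assumed known here, purely for illustration
sample = [4.0, 7.0, 6.0, 5.0, 8.0]    # hypothetical observations

sample_mean = mean(sample)
standard_deviation = stdev(sample)                        # spread of individual observations
standard_error = standard_deviation / sqrt(len(sample))  # estimated spread of the sample mean

errors = [x - population_mean for x in sample]   # need the (usually unknown) true mean
residuals = [x - sample_mean for x in sample]    # computable from the sample alone

print(standard_deviation, standard_error)
print(errors, residuals)
```

Residuals around the sample mean always sum to zero, whereas errors generally do not.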

Mean squared error is used for obtaining efficient estimators, a widely used class of estimators. Root mean square error is simply the square root of mean squared error.

A least squares fit: in red the points to be fitted, in blue the fitted line.

Many statistical methods seek to minimize the residual sum of squares, and these are called “methods of least squares”, in contrast to least absolute deviations. The latter gives equal weight to small and big errors, while the former gives more weight to large errors. The residual sum of squares is also differentiable, which is a handy property for doing regression. Least squares applied to linear regression is called the ordinary least squares method, and least squares applied to nonlinear regression is called non-linear least squares. Also, in a linear regression model the non-deterministic part of the model is called the error term, disturbance or, more simply, noise. Both linear regression and non-linear regression are addressed in polynomial least squares, which also describes the variance in a prediction of the dependent variable (y axis) as a function of the independent variable (x axis) and the deviations (errors, noise, disturbances) from the estimated (fitted) curve.
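For a straight line, minimizing the residual sum of squares has a closed-form solution. The following is a minimal sketch of ordinary least squares for one predictor; the data points and the function names `ols_fit` and `residual_sum_of_squares` are made up for illustration.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the residual sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residual_sum_of_squares(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the data and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical data
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

intercept, slope = ols_fit(xs, ys)
rss = residual_sum_of_squares(xs, ys, intercept, slope)
print(intercept, slope, rss)
```

Any other line, for example one with a slightly shifted intercept or slope, gives a larger residual sum of squares on the same data.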

Measurement processes that generate statistical data are also subject to error. Many of these errors are classified as random (noise) or systematic (bias), but other types of errors (e.g., blunder, such as when an analyst reports incorrect units) can also be important. The presence of missing data or censoring may result in biased estimates and specific techniques have been developed to address these problems.[49]

##### Interval estimation

Confidence intervals: the red line is true value for the mean in this example, the blue lines are random confidence intervals for 100 realizations.

Most studies only sample part of a population, so results don’t fully represent the whole population. Any estimates obtained from the sample only approximate the population value. Confidence intervals allow statisticians to express how closely the sample estimate matches the true value in the whole population. Often they are expressed as 95% confidence intervals. Formally, a 95% confidence interval for a value is a range where, if the sampling and analysis were repeated under the same conditions (yielding a different dataset), the interval would include the true (population) value in 95% of all possible cases. This does not imply that the probability that the true value is in the confidence interval is 95%. From the frequentist perspective, such a claim does not even make sense, as the true value is not a random variable. Either the true value is or is not within the given interval. However, it is true that, before any data are sampled and given a plan for how to construct the confidence interval, the probability is 95% that the yet-to-be-calculated interval will cover the true value: at this point, the limits of the interval are yet-to-be-observed random variables. One approach that does yield an interval that can be interpreted as having a given probability of containing the true value is to use a credible interval from Bayesian statistics: this approach depends on a different way of interpreting what is meant by “probability”, that is as a Bayesian probability.
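The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes normally distributed data with a known standard deviation, so that the interval is the sample mean plus or minus 1.96 standard errors; the true mean, sample size and seed are arbitrary choices, not values from the text.

```python
import random
from math import sqrt
from statistics import mean

def coverage(true_mean, sigma, n, repetitions, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / sqrt(n)     # known-sigma z-interval
    hits = 0
    for _ in range(repetitions):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        centre = mean(sample)
        if centre - half_width <= true_mean <= centre + half_width:
            hits += 1
    return hits / repetitions

observed_coverage = coverage(true_mean=10.0, sigma=2.0, n=25, repetitions=2000)
print(observed_coverage)
```

Each individual interval either contains the true mean or does not; the 95% describes the long-run fraction of intervals that do.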

In principle confidence intervals can be symmetrical or asymmetrical. An interval can be asymmetrical because it works as lower or upper bound for a parameter (left-sided interval or right sided interval), but it can also be asymmetrical because the two sided interval is built violating symmetry around the estimate. Sometimes the bounds for a confidence interval are reached asymptotically and these are used to approximate the true bounds.

##### Significance

Statistics rarely give a simple Yes/No type answer to the question under analysis. Interpretation often comes down to the level of statistical significance applied to the numbers and often refers to the probability of a value accurately rejecting the null hypothesis (sometimes referred to as the p-value).

In this graph the black line is probability distribution for the test statistic, the critical region is the set of values to the right of the observed data point (observed value of the test statistic) and the p-value is represented by the green area.

The standard approach[46] is to test a null hypothesis against an alternative hypothesis. A critical region is the set of values of the estimator that leads to refuting the null hypothesis. The probability of type I error is therefore the probability that the estimator belongs to the critical region given that the null hypothesis is true (statistical significance), and the probability of type II error is the probability that the estimator does not belong to the critical region given that the alternative hypothesis is true. The statistical power of a test is the probability that it correctly rejects the null hypothesis when the null hypothesis is false.
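As a worked instance of these definitions, consider a one-sided test on a normal mean with known standard deviation, where the null hypothesis is rejected when the sample mean exceeds a critical value. This setup and all numbers below are assumptions made for illustration.

```python
from math import erf, sqrt

def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def error_rates(critical_value, mu0, mu1, sigma, n):
    """Return (type I probability, type II probability, power).

    The critical region is {sample mean > critical_value}; mu0 is the mean
    under H0 and mu1 the mean under one specific alternative.
    """
    se = sigma / sqrt(n)
    alpha = 1.0 - normal_cdf((critical_value - mu0) / se)  # reject although H0 is true
    beta = normal_cdf((critical_value - mu1) / se)         # fail to reject although H1 is true
    return alpha, beta, 1.0 - beta

alpha, beta, power = error_rates(critical_value=0.329, mu0=0.0, mu1=0.5, sigma=1.0, n=25)
print(alpha, beta, power)
```

Moving the critical value to the right lowers the type I probability but raises the type II probability, so the two error rates trade off against each other for a fixed sample size.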

Referring to statistical significance does not necessarily mean that the overall result is significant in real world terms. For example, in a large study of a drug it may be shown that the drug has a statistically significant but very small beneficial effect, such that the drug is unlikely to help the patient noticeably.

Although in principle the acceptable level of statistical significance may be subject to debate, the significance level is the largest p-value that allows the test to reject the null hypothesis. This test is logically equivalent to saying that the p-value is the probability, assuming the null hypothesis is true, of observing a result at least as extreme as the test statistic. Therefore, the smaller the significance level, the lower the probability of committing type I error.
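The relationship between the p-value and the significance level can be sketched for a test statistic that is standard normal under the null hypothesis, with large values counting as extreme. That distributional assumption and the function names are illustrative.

```python
from math import erfc, sqrt

def one_sided_p_value(z):
    """P(Z >= z) for a standard normal test statistic Z under H0."""
    return 0.5 * erfc(z / sqrt(2.0))

def reject_null(z, significance_level):
    """Reject H0 exactly when the p-value does not exceed the significance level."""
    return one_sided_p_value(z) <= significance_level

p = one_sided_p_value(2.0)
print(p, reject_null(2.0, 0.05), reject_null(2.0, 0.01))
```

The same observed statistic leads to rejection at the 5% level but not at the 1% level, which is why the chosen level should be fixed before looking at the data.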

Some problems are usually associated with this framework (See criticism of hypothesis testing):

• A difference that is highly statistically significant can still be of no practical significance, but it is possible to properly formulate tests to account for this. One response involves going beyond reporting only the significance level to include the p-value when reporting whether a hypothesis is rejected or accepted. The p-value, however, does not indicate the size or importance of the observed effect and can also seem to exaggerate the importance of minor differences in large studies. A better and increasingly common approach is to report confidence intervals. Although these are produced from the same calculations as those of hypothesis tests or p-values, they describe both the size of the effect and the uncertainty surrounding it.
• Fallacy of the transposed conditional, aka prosecutor’s fallacy: criticisms arise because the hypothesis testing approach forces one hypothesis (the null hypothesis) to be favored, since what is being evaluated is the probability of the observed result given the null hypothesis and not probability of the null hypothesis given the observed result. An alternative to this approach is offered by Bayesian inference, although it requires establishing a prior probability.
• Rejecting the null hypothesis does not automatically prove the alternative hypothesis.
• Like everything in inferential statistics, it relies on sample size, and therefore under fat tails p-values may be seriously miscomputed.

##### Examples

Some well-known statistical tests and procedures are:

### Exploratory data analysis

Exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.

## Misuse

Misuse of statistics can produce subtle but serious errors in description and interpretation—subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data—which measures the extent to which a trend could be caused by random variation in the sample—may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

There is a general perception that statistical knowledge is all too frequently intentionally misused by finding ways to interpret only the data that are favorable to the presenter. A mistrust and misunderstanding of statistics is associated with the quotation, “There are three kinds of lies: lies, damned lies, and statistics”. Misuse of statistics can be both inadvertent and intentional, and the book How to Lie with Statistics, by Darrell Huff, outlines a range of considerations. In an attempt to shed light on the use and misuse of statistics, reviews of statistical techniques used in particular fields are conducted (e.g. Warne, Lazo, Ramos, and Ritter (2012)).

Ways to avoid misuse of statistics include using proper diagrams and avoiding bias. Misuse can occur when conclusions are overgeneralized and claimed to be representative of more than they really are, often by either deliberately or unconsciously overlooking sampling bias. Bar graphs are arguably the easiest diagrams to use and understand, and they can be made either by hand or with simple computer programs. Unfortunately, most people do not look for bias or errors, so they are not noticed. Thus, people may often believe that something is true even if it is not well represented.[54] To make data gathered from statistics believable and accurate, the sample taken must be representative of the whole. According to Huff, “The dependability of a sample can be destroyed by [bias]… allow yourself some degree of skepticism.”

To assist in the understanding of statistics Huff proposed a series of questions to be asked in each case:

• Who says so? (Does he/she have an axe to grind?)
• How does he/she know? (Does he/she have the resources to know the facts?)
• What’s missing? (Does he/she give us a complete picture?)
• Did someone change the subject? (Does he/she offer us the right answer to the wrong problem?)
• Does it make sense? (Is his/her conclusion logical and consistent with what we already know?)

The confounding variable problem: X and Y may be correlated, not because there is a causal relationship between them, but because both depend on a third variable Z. Z is called a confounding factor.

### Misinterpretation: correlation

The concept of correlation is particularly noteworthy for the potential confusion it can cause. Statistical analysis of a data set often reveals that two variables (properties) of the population under consideration tend to vary together, as if they were connected. For example, a study of annual income that also looks at age of death might find that poor people tend to have shorter lives than affluent people. The two variables are said to be correlated; however, they may or may not be the cause of one another. The correlation could be caused by a third, previously unconsidered phenomenon, called a lurking variable or confounding variable. For this reason, there is no way to immediately infer the existence of a causal relationship between the two variables.
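A short simulation shows how a lurking variable can produce correlation without causation. In this sketch, which is an artificial setup and not data from any study, X and Y never influence each other; both are a common variable Z plus independent noise.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(1)                         # fixed seed for reproducibility
z = [rng.gauss(0, 1) for _ in range(5000)]     # confounding variable
x = [zi + rng.gauss(0, 1) for zi in z]         # depends on Z only
y = [zi + rng.gauss(0, 1) for zi in z]         # depends on Z only

r_xy = pearson(x, y)
# Removing the contribution of Z leaves only the independent noise terms.
r_after_removing_z = pearson([xi - zi for xi, zi in zip(x, z)],
                             [yi - zi for yi, zi in zip(y, z)])
print(r_xy, r_after_removing_z)
```

The raw correlation is clearly positive (about 0.5 in theory for this setup), while the correlation after accounting for Z is close to zero.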

## Applications

### Applied statistics, theoretical statistics and mathematical statistics

Applied statistics, sometimes referred to as Statistical science, comprises descriptive statistics and the application of inferential statistics.[58][59] Theoretical statistics concerns the logical arguments underlying justification of approaches to statistical inference, as well as encompassing mathematical statistics. Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.

Statistical consultants can help organizations and companies that don’t have in-house expertise relevant to their particular questions.

### Machine learning and data mining

Machine learning models are statistical and probabilistic models that capture patterns in the data through use of computational algorithms.

Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Business statistics applies statistical methods in econometrics, auditing, and production and operations, including services improvement and marketing research. A study of two journals in tropical biology found that the 12 most frequent statistical tests are: Analysis of Variance (ANOVA), Chi-Square Test, Student’s T Test, Linear Regression, Pearson’s Correlation Coefficient, Mann-Whitney U Test, Kruskal-Wallis Test, Shannon’s Diversity Index, Tukey’s Test, Cluster Analysis, Spearman’s Rank Correlation Test and Principal Component Analysis.

A typical statistics course covers descriptive statistics, probability, binomial and normal distributions, test of hypotheses and confidence intervals, linear regression, and correlation. Modern fundamental statistical courses for undergraduate students focus on correct test selection, results interpretation, and use of free statistics software.

### Statistical computing

The rapid and sustained increases in computing power starting from the second half of the 20th century have had a substantial impact on the practice of statistical science. Early statistical models were almost always from the class of linear models, but powerful computers, coupled with suitable numerical algorithms, caused an increased interest in nonlinear models (such as neural networks) as well as the creation of new types, such as generalized linear models and multilevel models.

Increased computing power has also led to the growing popularity of computationally intensive methods based on resampling, such as permutation tests and the bootstrap, while techniques such as Gibbs sampling have made use of Bayesian models more feasible. The computer revolution has implications for the future of statistics with a new emphasis on “experimental” and “empirical” statistics. A large number of both general and special-purpose statistical software packages are now available. Examples of available software capable of complex statistical computation include programs such as Mathematica, SAS, SPSS, and R.
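To give a flavour of resampling, here is a minimal sketch of an exact two-sample permutation test on the absolute difference in means. It enumerates every relabeling, which is only feasible for very small groups; the integer data and the function name are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation p-value for the difference in means.

    Assumes integer observations so that exact fractions can be used.
    """
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    total_sum = sum(pooled)

    def abs_mean_difference(sum_a):
        mean_a = Fraction(sum_a, k)
        mean_b = Fraction(total_sum - sum_a, len(pooled) - k)
        return abs(mean_a - mean_b)

    observed = abs_mean_difference(sum(group_a))
    # Every way of choosing which pooled observations are labeled "group A".
    relabelings = list(combinations(pooled, k))
    extreme = sum(1 for chosen in relabelings
                  if abs_mean_difference(sum(chosen)) >= observed)
    return Fraction(extreme, len(relabelings))

p_value = permutation_p_value([1, 2, 3], [4, 5, 6])
print(p_value)
```

With larger samples the number of relabelings grows too quickly to enumerate, so in practice a random subset of relabelings is usually drawn instead.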

In business, “statistics” is a widely used management and decision-support tool. It is particularly applied in financial management, marketing management, and production, services and operations management. Statistics is also heavily used in management accounting and auditing. The discipline of Management Science formalizes the use of statistics, and other mathematics, in business. (Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships.)

A typical “Business Statistics” course is intended for business majors, and covers[65] descriptive statistics (collection, description, analysis, and summary of data), probability (typically the binomial and normal distributions), test of hypotheses and confidence intervals, linear regression, and correlation; follow-on courses may include forecasting, time series, decision trees, multiple linear regression, and other topics from business analytics more generally. See also Business mathematics § University level. Professional certification programs, such as the CFA, often include topics in statistics.

### Statistics applied to mathematics or the arts

Traditionally, statistics was concerned with drawing inferences using a semi-standardized methodology that was “required learning” in most sciences. This tradition has changed with the use of statistics in non-inferential contexts. What was once considered a dry subject, taken in many fields as a degree requirement, is now viewed enthusiastically. Initially derided by some mathematical purists, it is now considered essential methodology in certain areas.

• In number theory, scatter plots of data generated by a distribution function may be transformed with familiar tools used in statistics to reveal underlying patterns, which may then lead to hypotheses.
• Predictive methods of statistics in forecasting combining chaos theory and fractal geometry can be used to create video works.[66]
• The process art of Jackson Pollock relied on artistic experiments whereby underlying distributions in nature were artistically revealed.[67] With the advent of computers, statistical methods were applied to formalize such distribution-driven natural processes to make and analyze moving video art.
• Methods of statistics may be used predictively in performance art, as in a card trick based on a Markov process that only works some of the time, the occasion of which can be predicted using statistical methodology.
• Statistics can be used to predictively create art, as in the statistical or stochastic music invented by Iannis Xenakis, where the music is performance-specific. Though this type of artistry does not always come out as expected, it does behave in ways that are predictable and tunable using statistics.

## Specialized disciplines

Statistical techniques are used in a wide range of types of scientific and social research, including: biostatistics, computational biology, computational sociology, network biology, social science, sociology and social research. Some fields of inquiry use applied statistics so extensively that they have specialized terminology. These disciplines include:

In addition, there are particular types of statistical analysis that have also developed their own specialised terminology and methodology:

Statistics is also a key tool in business and manufacturing. It is used to understand measurement-system variability, to control processes (as in statistical process control, or SPC), to summarize data, and to make data-driven decisions. In these roles, it is a key tool, and perhaps the only reliable tool.


## Duality

A linear form is a linear map from a vector space V over a field F to the field of scalars F, viewed as a vector space over itself. Equipped with pointwise addition and multiplication by a scalar, the linear forms form a vector space, called the dual space of V, and usually denoted V*[16] or V′.[17][18]

If v_1, …, v_n is a basis of V (this implies that V is finite-dimensional), then one can define, for i = 1, …, n, a linear map v_i* such that v_i*(v_i) = 1 and v_i*(v_j) = 0 if j ≠ i. These linear maps form a basis of V*, called the dual basis of v_1, …, v_n. (If V is not finite-dimensional, the v_i* may be defined similarly; they are linearly independent, but do not form a basis.)
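In coordinates, the dual basis can be read off from a matrix inverse: if the basis vectors are the columns of a matrix, the dual basis forms are the rows of its inverse. The sketch below does this for a two-dimensional space with integer or rational coordinates; the function names and the example basis are illustrative.

```python
from fractions import Fraction

def dual_basis_2d(v1, v2):
    """Return the dual basis (v1*, v2*) as coefficient rows.

    The rows are those of the inverse of the matrix whose columns are v1, v2.
    """
    det = v1[0] * v2[1] - v2[0] * v1[1]
    if det == 0:
        raise ValueError("v1 and v2 do not form a basis")
    d = Fraction(det)
    v1_star = (v2[1] / d, -v2[0] / d)
    v2_star = (-v1[1] / d, v1[0] / d)
    return v1_star, v2_star

def apply_form(form, vector):
    """Evaluate a linear form, given by its coefficients, on a vector."""
    return form[0] * vector[0] + form[1] * vector[1]

v1, v2 = (2, 1), (1, 1)               # hypothetical basis of a 2-dimensional space
v1_star, v2_star = dual_basis_2d(v1, v2)
print(v1_star, v2_star)
```

Evaluating each form on each basis vector gives 1 when the indices match and 0 otherwise, as in the definition above.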

For v in V, the map

{\displaystyle f\to f(\mathbf {v} )}

is a linear form on V*. This defines the canonical linear map from V into (V*)*, the dual of V*, called the bidual of V. This canonical map is an isomorphism if V is finite-dimensional, and this allows identifying V with its bidual. (In the infinite dimensional case, the canonical map is injective, but not surjective.)

There is thus a complete symmetry between a finite-dimensional vector space and its dual. This motivates the frequent use, in this context, of the bra–ket notation

{\displaystyle \langle f,\mathbf {x} \rangle }

for denoting f(x).

### Dual map

Let

{\displaystyle f:V\to W}

be a linear map. For every linear form h on W, the composite function h ∘ f is a linear form on V. This defines a linear map

{\displaystyle f^{*}:W^{*}\to V^{*}}

between the dual spaces, which is called the dual or the transpose of f.

If V and W are finite-dimensional, and M is the matrix of f in terms of some ordered bases, then the matrix of f* over the dual bases is the transpose M^T of M, obtained by exchanging rows and columns.

If elements of vector spaces and their duals are represented by column vectors, this duality may be expressed in bra–ket notation by

{\displaystyle \langle h^{\mathsf {T}},M\mathbf {v} \rangle =\langle h^{\mathsf {T}}M,\mathbf {v} \rangle .}

For highlighting this symmetry, the two members of this equality are sometimes written

{\displaystyle \langle h^{\mathsf {T}}\mid M\mid \mathbf {v} \rangle .}
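The identity above can be checked numerically. In this sketch a linear form h on W and a vector v in V are stored as lists of coordinates, and the matrix M and the numbers are arbitrary examples.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Exchange rows and columns."""
    return [list(column) for column in zip(*matrix)]

def pairing(form, vector):
    """Value of a linear form on a vector, both given by coordinates."""
    return sum(a * x for a, x in zip(form, vector))

M = [[1, 2, 0],
     [0, 1, 3]]          # matrix of some f : V -> W with dim V = 3, dim W = 2
v = [1, -1, 2]           # vector in V
h = [4, 5]               # linear form on W

left = pairing(h, mat_vec(M, v))               # h applied to f(v)
right = pairing(mat_vec(transpose(M), h), v)   # f*(h) applied to v
print(left, right)
```

Both sides evaluate to the same number, which is what it means for the transpose to represent the dual map.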

### Inner-product spaces

Besides these basic concepts, linear algebra also studies vector spaces with additional structure, such as an inner product. The inner product is an example of a bilinear form, and it gives the vector space a geometric structure by allowing for the definition of length and angles. Formally, an inner product is a map

{\displaystyle \langle \cdot ,\cdot \rangle :V\times V\to F}

that satisfies the following three axioms for all vectors u, v, w in V and all scalars a in F:[19][20]

• Conjugate symmetry:
{\displaystyle \langle \mathbf {u} ,\mathbf {v} \rangle ={\overline {\langle \mathbf {v} ,\mathbf {u} \rangle }}.}
In ℝ, it is symmetric.
• Linearity in the first argument:
{\displaystyle {\begin{aligned}\langle a\mathbf {u} ,\mathbf {v} \rangle &=a\langle \mathbf {u} ,\mathbf {v} \rangle .\\\langle \mathbf {u} +\mathbf {v} ,\mathbf {w} \rangle &=\langle \mathbf {u} ,\mathbf {w} \rangle +\langle \mathbf {v} ,\mathbf {w} \rangle .\end{aligned}}}
• Positive-definiteness:
{\displaystyle \langle \mathbf {v} ,\mathbf {v} \rangle \geq 0}
with equality only for v = 0.

We can define the length of a vector v in V by

{\displaystyle \|\mathbf {v} \|^{2}=\langle \mathbf {v} ,\mathbf {v} \rangle ,}

and we can prove the Cauchy–Schwarz inequality:

{\displaystyle |\langle \mathbf {u} ,\mathbf {v} \rangle |\leq \|\mathbf {u} \|\cdot \|\mathbf {v} \|.}

In particular, the quantity

{\displaystyle {\frac {|\langle \mathbf {u} ,\mathbf {v} \rangle |}{\|\mathbf {u} \|\cdot \|\mathbf {v} \|}}\leq 1,}

and so we can call this quantity the cosine of the angle between the two vectors.

Two vectors are orthogonal if ⟨u, v⟩ = 0. An orthonormal basis is a basis where all basis vectors have length 1 and are orthogonal to each other. Given any finite-dimensional vector space, an orthonormal basis can be found by the Gram–Schmidt procedure. Orthonormal bases are particularly easy to deal with, since if v = a_1 v_1 + ⋯ + a_n v_n, then

{\displaystyle a_{i}=\langle \mathbf {v} ,\mathbf {v} _{i}\rangle .}
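The Gram–Schmidt procedure and the coefficient formula can be sketched as follows. This version assumes real vectors with the usual dot product as the inner product, and it subtracts projections one at a time (the so-called modified variant, chosen here for numerical stability); the input vectors are arbitrary.

```python
from math import sqrt

def dot(u, v):
    """Standard inner product on real coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Turn linearly independent vectors into an orthonormal list."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(w, e)                        # component of w along e
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

basis = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

v = [2.0, -1.0, 3.0]
coefficients = [dot(v, e) for e in basis]        # a_i = <v, v_i>
reconstructed = [sum(a * e[k] for a, e in zip(coefficients, basis))
                 for k in range(3)]
print(coefficients, reconstructed)
```

Because the basis is orthonormal, the coefficients come from inner products alone, with no linear system to solve.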

The inner product facilitates the construction of many useful concepts. For instance, given a transform T, we can define its Hermitian conjugate T* as the linear transform satisfying

{\displaystyle \langle T\mathbf {u} ,\mathbf {v} \rangle =\langle \mathbf {u} ,T^{*}\mathbf {v} \rangle .}

If T satisfies TT* = T*T, we call T normal. It turns out that normal matrices are precisely the matrices that have an orthonormal system of eigenvectors that span V.
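For matrices with the standard inner product, the Hermitian conjugate is the conjugate transpose, so normality can be tested directly. The sketch below uses exact comparison, which is only appropriate for small integer or Gaussian-integer entries like these; floating-point matrices would need a tolerance. The example matrices are illustrative.

```python
def conjugate_transpose(a):
    """Hermitian conjugate of a matrix stored as a list of rows."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def is_normal(t):
    """Check T T* == T* T exactly."""
    t_star = conjugate_transpose(t)
    return matmul(t, t_star) == matmul(t_star, t)

rotation = [[0, -1], [1, 0]]        # normal, although not symmetric
hermitian = [[2, 1j], [-1j, 3]]     # Hermitian matrices are always normal
shear = [[1, 1], [0, 1]]            # not normal

print(is_normal(rotation), is_normal(hermitian), is_normal(shear))
```

The shear matrix fails the test, which is consistent with it having only one independent eigenvector.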

## Relationship with geometry

There is a strong relationship between linear algebra and geometry, which started with the introduction by René Descartes, in 1637, of Cartesian coordinates. In this new (at that time) geometry, now called Cartesian geometry, points are represented by Cartesian coordinates, which are sequences of three real numbers (in the case of the usual three-dimensional space). The basic objects of geometry, which are lines and planes, are represented by linear equations. Thus, computing intersections of lines and planes amounts to solving systems of linear equations. This was one of the main motivations for developing linear algebra.
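The simplest instance of this idea is the intersection of two lines in the plane, which is a system of two linear equations in two unknowns. The sketch below solves it with Cramer's rule; it is restricted to two dimensions and integer coefficients for brevity, and the function name and example lines are hypothetical.

```python
from fractions import Fraction

def intersect_lines(line1, line2):
    """Intersect two lines given as (a, b, c), meaning a*x + b*y = c.

    Returns the point (x, y), or None when the lines are parallel or equal.
    Assumes integer coefficients so that exact fractions can be used.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

point = intersect_lines((1, 1, 3), (1, -1, 1))   # x + y = 3 and x - y = 1
print(point)
```

A zero determinant corresponds to the geometric case of parallel or coincident lines, where there is no unique intersection point.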

Most geometric transformations, such as translations, rotations, reflections, rigid motions, isometries, and projections, transform lines into lines. It follows that they can be defined, specified and studied in terms of linear maps. This is also the case of homographies and Möbius transformations, when considered as transformations of a projective space.

Until the end of the 19th century, geometric spaces were defined by axioms relating points, lines and planes (synthetic geometry). Around this date, it appeared that one may also define geometric spaces by constructions involving vector spaces (see, for example, Projective space and Affine space). It has been shown that the two approaches are essentially equivalent. In classical geometry, the involved vector spaces are vector spaces over the reals, but the constructions may be extended to vector spaces over any field, which allows geometry to be considered over arbitrary fields, including finite fields.

Presently, most textbooks introduce geometric spaces from linear algebra, and geometry is often presented, at the elementary level, as a subfield of linear algebra.

## Usage and applications

Linear algebra is used in almost all areas of mathematics, thus making it relevant in almost all scientific domains that use mathematics. These applications may be divided into several wide categories.

### Geometry of ambient space

The modeling of ambient space is based on geometry. Sciences concerned with this space use geometry widely. This is the case with mechanics and robotics, for describing rigid body dynamics; geodesy, for describing Earth shape; perspectivity, computer vision, and computer graphics, for describing the relationship between a scene and its plane representation; and many other scientific domains.

In all these applications, synthetic geometry is often used for general descriptions and a qualitative approach, but for the study of explicit situations, one must compute with coordinates. This requires the heavy use of linear algebra.

### Functional analysis

Functional analysis studies function spaces. These are vector spaces with additional structure, such as Hilbert spaces. Linear algebra is thus a fundamental part of functional analysis and its applications, which include, in particular, quantum mechanics (wave functions).

### Study of complex systems

Most physical phenomena are modeled by partial differential equations. To solve them, one usually decomposes the space in which the solutions are searched into small, mutually interacting cells. For linear systems this interaction involves linear functions. For nonlinear systems, this interaction is often approximated by linear functions.[b] In both cases, very large matrices are generally involved. Weather forecasting is a typical example, where the whole Earth atmosphere is divided in cells of, say, 100 km of width and 100 m of height.
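A one-dimensional toy problem shows how cutting a domain into cells turns a differential equation into a matrix problem. The sketch below is an assumption-laden miniature, not a weather model: it discretizes -u'' = f on the interval (0, 1) with zero boundary values using finite differences, which gives a tridiagonal linear system, and solves it by Gaussian elimination specialized to that structure.

```python
def solve_poisson_1d(f, n):
    """Approximate the solution of -u'' = f on (0, 1), u(0) = u(1) = 0.

    Uses n interior grid points. Each equation couples a point only with
    its two neighbours: (-u[i-1] + 2*u[i] - u[i+1]) / h**2 = f(x_i).
    """
    h = 1.0 / (n + 1)
    diag = [2.0] * n                                   # main diagonal
    rhs = [h * h * f((i + 1) * h) for i in range(n)]   # right-hand side
    # Forward elimination; all off-diagonal entries equal -1.
    for i in range(1, n):
        factor = -1.0 / diag[i - 1]
        diag[i] -= factor * (-1.0)
        rhs[i] -= factor * rhs[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u

u = solve_poisson_1d(lambda x: 2.0, 9)   # exact solution is u(x) = x * (1 - x)
print(u)
```

With n cells the matrix has n rows, so refining the grid in two or three dimensions quickly leads to the very large matrices mentioned above.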

### Scientific computation

Nearly all scientific computations involve linear algebra. Consequently, linear algebra algorithms have been highly optimized. BLAS and LAPACK are the best known implementations. For improving efficiency, some of them configure the algorithms automatically, at run time, for adapting them to the specificities of the computer (cache size, number of available cores, …).

Some processors, typically graphics processing units (GPU), are designed with a matrix structure, for optimizing the operations of linear algebra.

## Extensions and generalizations

This section presents several related topics that do not appear generally in elementary textbooks on linear algebra, but are commonly considered, in advanced mathematics, as parts of linear algebra.

### Module theory

The existence of multiplicative inverses in fields is not involved in the axioms defining a vector space. One may thus replace the field of scalars by a ring R, and this gives a structure called a module over R, or R-module.

The concepts of linear independence, span, basis, and linear maps (also called module homomorphisms) are defined for modules exactly as for vector spaces, with the essential difference that, if R is not a field, there are modules that do not have any basis. The modules that have a basis are the free modules, and those that are spanned by a finite set are the finitely generated modules. Module homomorphisms between finitely generated free modules may be represented by matrices. The theory of matrices over a ring is similar to that of matrices over a field, except that determinants exist only if the ring is commutative, and that a square matrix over a commutative ring is invertible only if its determinant has a multiplicative inverse in the ring.
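The invertibility criterion is easy to see over the ring of integers, whose only elements with a multiplicative inverse are 1 and -1. The following sketch handles 2×2 integer matrices only; the function names are illustrative.

```python
def det2(m):
    """Determinant of a 2x2 matrix stored as a list of rows."""
    (a, b), (c, d) = m
    return a * d - b * c

def inverse_over_integers(m):
    """Inverse of a 2x2 integer matrix within the integers, or None.

    An inverse with integer entries exists only when the determinant is a
    unit of the ring, that is, 1 or -1.
    """
    (a, b), (c, d) = m
    det = det2(m)
    if det not in (1, -1):
        return None
    # Adjugate divided by det; dividing by 1 or -1 equals multiplying by it.
    return [[d * det, -b * det], [-c * det, a * det]]

print(inverse_over_integers([[2, 1], [1, 1]]))   # determinant 1
print(inverse_over_integers([[2, 0], [0, 1]]))   # determinant 2
```

The second matrix is invertible over the rationals, but its inverse contains 1/2, so it has no inverse among integer matrices.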

Vector spaces are completely characterized by their dimension (up to an isomorphism). In general, there is not such a complete classification for modules, even if one restricts oneself to finitely generated modules. However, every module is a cokernel of a homomorphism of free modules.

Modules over the integers can be identified with abelian groups, since multiplication by an integer may be identified with repeated addition. Most of the theory of abelian groups may be extended to modules over a principal ideal domain. In particular, over a principal ideal domain, every submodule of a free module is free, and the fundamental theorem of finitely generated abelian groups may be extended straightforwardly to finitely generated modules over a principal ring.

There are many rings for which there are algorithms for solving linear equations and systems of linear equations. However, these algorithms have generally a computational complexity that is much higher than the similar algorithms over a field. For more details, see Linear equation over a ring.

### Multilinear algebra and tensors

In multilinear algebra, one considers multivariable linear transformations, that is, mappings that are linear in each of a number of different variables. This line of inquiry naturally leads to the idea of the dual space, the vector space V* consisting of linear maps f : V → F where F is the field of scalars. Multilinear maps T : V^n → F can be described via tensor products of elements of V*.

If, in addition to vector addition and scalar multiplication, there is a bilinear vector product V × V → V, the vector space is called an algebra; for instance, associative algebras are algebras with an associative vector product (like the algebra of square matrices, or the algebra of polynomials).

### Topological vector spaces

Vector spaces that are not finite dimensional often require additional structure to be tractable. A normed vector space is a vector space along with a function called a norm, which measures the “size” of elements. The norm induces a metric, which measures the distance between elements, and induces a topology, which allows for a definition of continuous maps. The metric also allows for a definition of limits and completeness; a normed vector space that is complete with respect to this metric is known as a Banach space. A complete normed vector space whose norm comes from an inner product (a conjugate symmetric sesquilinear form) is known as a Hilbert space, which is in some sense a particularly well-behaved Banach space. Functional analysis applies the methods of linear algebra alongside those of mathematical analysis to study various function spaces; the central objects of study in functional analysis are $L^p$ spaces, which are Banach spaces, and especially the $L^2$ space of square integrable functions, which is the only Hilbert space among them. Functional analysis is of particular importance to quantum mechanics, the theory of partial differential equations, digital signal processing, and electrical engineering. It also provides the foundation and theoretical framework that underlies the Fourier transform and related methods.

### Homological algebra

#### Topic 5 – Calculus – infinite sequences and series, limits, improper integrals and various first-order ordinary differential equations

Calculus, originally called infinitesimal calculus or “the calculus of infinitesimals”, is the mathematical study of continuous change, in the same way that geometry is the study of shape, and algebra is the study of generalizations of arithmetic operations.

It has two major branches,  differential calculus  and  integral calculus; differential calculus concerns instantaneous rates of change, and the slopes of curves, while integral calculus concerns accumulation of quantities, and areas under or between curves. These two branches are related to each other by the fundamental theorem of calculus, and they make use of the fundamental notions of  convergence of  infinite sequences and  infinite series to a well-defined  limit.

Infinitesimal calculus was developed independently in the late 17th century by Isaac Newton and Gottfried Wilhelm Leibniz. Later work, including codifying the idea of limits, put these developments on a more solid conceptual footing. Today, calculus has widespread uses in science, engineering, and social science.

In mathematics education, calculus denotes courses of elementary mathematical analysis, which are mainly devoted to the study of functions and limits. The word calculus is Latin for “small pebble” (the diminutive of calx, meaning “stone”). Because such pebbles were used for counting out distances, tallying votes, and doing abacus arithmetic, the word came to mean a method of computation. In this sense, it was used in English at least as early as 1672, several years prior to the publications of Leibniz and Newton. (The older meaning still persists in medicine.) In addition to the differential calculus and integral calculus, the term is also used for naming specific methods of calculation and related theories, such as propositional calculus, Ricci calculus, calculus of variations, lambda calculus, and process calculus.

## History

Modern calculus was developed in 17th-century Europe by Isaac Newton and Gottfried Wilhelm Leibniz (independently of each other, first publishing around the same time) but elements of it appeared in ancient Greece, then in China and the Middle East, and still later again in medieval Europe and in India.

### Ancient precursors

#### Egypt

Calculations of volume and area, one goal of integral calculus, can be found in the Egyptian Moscow papyrus (c. 1820 BC), but the formulae are simple instructions, with no indication as to how they were obtained.

#### Greece

Archimedes used the method of exhaustion to calculate the area under a parabola.

Laying the foundations for integral calculus and foreshadowing the concept of the limit, ancient Greek mathematician Eudoxus of Cnidus (c. 390 – 337 BCE) developed the method of exhaustion to prove the formulas for cone and pyramid volumes.

During the Hellenistic period, this method was further developed by Archimedes, who combined it with a concept of the indivisibles—a precursor to infinitesimals—allowing him to solve several problems now treated by integral calculus. These problems include, for example, calculating the center of gravity of a solid hemisphere, the center of gravity of a frustum of a circular paraboloid, and the area of a region bounded by a parabola and one of its secant lines.

#### China

The method of exhaustion was later discovered independently in China by Liu Hui in the 3rd century AD in order to find the area of a circle.

In the 5th century AD, Zu Gengzhi, son of Zu Chongzhi, established a method[13] that would later be called Cavalieri’s principle to find the volume of a sphere.

### Medieval

#### Middle East

In the Middle East, Hasan Ibn al-Haytham, Latinized as Alhazen (c. 965 – c. 1040 CE), an 11th-century Arab mathematician and physicist, derived a formula for the sum of fourth powers. He used the results to carry out what would now be called an integration of this function, where the formulae for the sums of integral squares and fourth powers allowed him to calculate the volume of a paraboloid.

#### India

In the 14th century, Indian mathematicians gave a non-rigorous method, resembling differentiation, applicable to some trigonometric functions. Madhava of Sangamagrama and the Kerala School of Astronomy and Mathematics thereby stated components of calculus. A complete theory encompassing these components is now well known in the Western world. However, they were not able to “combine many differing ideas under the two unifying themes of the derivative and the integral, show the connection between the two, and turn calculus into the great problem-solving tool we have today”.

### Modern

The calculus was the first achievement of modern mathematics and it is difficult to overestimate its importance. I think it defines more unequivocally than anything else the inception of modern mathematics, and the system of mathematical analysis, which is its logical development, still constitutes the greatest technical advance in exact thinking.

— John von Neumann

Johannes Kepler’s work Stereometrica Doliorum formed the basis of integral calculus.[18] Kepler developed a method to calculate the area of an ellipse by adding up the lengths of many radii drawn from a focus of the ellipse.

A significant work was a treatise, the origin being Kepler’s methods,[19] written by Bonaventura Cavalieri, who argued that volumes and areas should be computed as the sums of the volumes and areas of infinitesimally thin cross-sections. The ideas were similar to Archimedes’ in The Method, but this treatise is believed to have been lost in the 13th century, and was only rediscovered in the early 20th century, and so would have been unknown to Cavalieri. Cavalieri’s work was not well respected since his methods could lead to erroneous results, and the infinitesimal quantities he introduced were disreputable at first.

The formal study of calculus brought together Cavalieri’s infinitesimals with the calculus of finite differences developed in Europe at around the same time. Pierre de Fermat, claiming that he borrowed from Diophantus, introduced the concept of adequality, which represented equality up to an infinitesimal error term. The combination was achieved by John Wallis, Isaac Barrow, and James Gregory, the latter two proving predecessors to the second fundamental theorem of calculus around 1670.

Isaac Newton developed the use of calculus in his laws of motion and gravitation.

The product rule and chain rule, the notions of higher derivatives and Taylor series, and of analytic functions were used by Isaac Newton in an idiosyncratic notation which he applied to solve problems of mathematical physics. In his works, Newton rephrased his ideas to suit the mathematical idiom of the time, replacing calculations with infinitesimals by equivalent geometrical arguments which were considered beyond reproach. He used the methods of calculus to solve the problem of planetary motion, the shape of the surface of a rotating fluid, the oblateness of the earth, the motion of a weight sliding on a cycloid, and many other problems discussed in his Principia Mathematica (1687). In other work, he developed series expansions for functions, including fractional and irrational powers, and it was clear that he understood the principles of the Taylor series. He did not publish all these discoveries, and at this time infinitesimal methods were still considered disreputable.

Gottfried Wilhelm Leibniz was the first to state clearly the rules of calculus.

These ideas were arranged into a true calculus of infinitesimals by  Gottfried Wilhelm Leibniz, who was originally accused of  plagiarism by Newton. He is now regarded as an  independent inventor of and contributor to calculus. His contribution was to provide a clear set of rules for working with infinitesimal quantities, allowing the computation of second and higher derivatives, and providing the product rule and chain rule, in their differential and integral forms. Unlike Newton, Leibniz put painstaking effort into his choices of notation.

Today, Leibniz and Newton are usually both given credit for independently inventing and developing calculus. Newton was the first to apply calculus to general physics and Leibniz developed much of the notation used in calculus today. The basic insights that both Newton and Leibniz provided were the laws of differentiation and integration, second and higher derivatives, and the notion of an approximating polynomial series.

When Newton and Leibniz first published their results, there was great controversy over which mathematician (and therefore which country) deserved credit. Newton derived his results first (later to be published in his Method of Fluxions), but Leibniz published his “Nova Methodus pro Maximis et Minimis” first. Newton claimed Leibniz stole ideas from his unpublished notes, which Newton had shared with a few members of the Royal Society. This controversy divided English-speaking mathematicians from continental European mathematicians for many years, to the detriment of English mathematics. A careful examination of the papers of Leibniz and Newton shows that they arrived at their results independently, with Leibniz starting first with integration and Newton with differentiation. It is Leibniz, however, who gave the new discipline its name. Newton called his calculus “the science of fluxions”, a term that endured in English schools into the 19th century. The first complete treatise on calculus to be written in English and use the Leibniz notation was not published until 1815.

Since the time of Leibniz and Newton, many mathematicians have contributed to the continuing development of calculus. One of the first and most complete works on both infinitesimal and integral calculus was written in 1748 by Maria Gaetana Agnesi.

### Foundations

In calculus, foundations refers to rigorous development of the subject from axioms and definitions. In early calculus the use of infinitesimal quantities was thought unrigorous, and was fiercely criticized by a number of authors, most notably Michel Rolle and Bishop Berkeley. Berkeley famously described infinitesimals as the ghosts of departed quantities in his book The Analyst in 1734. Working out a rigorous foundation for calculus occupied mathematicians for much of the century following Newton and Leibniz, and is still to some extent an active area of research today.

Several mathematicians, including Maclaurin, tried to prove the soundness of using infinitesimals, but it would not be until 150 years later when, due to the work of Cauchy and Weierstrass, a way was finally found to avoid mere “notions” of infinitely small quantities. The foundations of differential and integral calculus had been laid. In Cauchy’s Cours d’Analyse, we find a broad range of foundational approaches, including a definition of continuity in terms of infinitesimals, and a (somewhat imprecise) prototype of an (ε, δ)-definition of limit in the definition of differentiation.[36] In his work Weierstrass formalized the concept of limit and eliminated infinitesimals (although his definition can actually validate nilsquare infinitesimals). Following the work of Weierstrass, it eventually became common to base calculus on limits instead of infinitesimal quantities, though the subject is still occasionally called “infinitesimal calculus”. Bernhard Riemann used these ideas to give a precise definition of the integral. It was also during this period that the ideas of calculus were generalized to the complex plane with the development of complex analysis.

In modern mathematics, the foundations of calculus are included in the field of real analysis, which contains full definitions and proofs of the theorems of calculus. The reach of calculus has also been greatly extended. Henri Lebesgue invented measure theory, based on earlier developments by Émile Borel, and used it to define integrals of all but the most pathological functions. Laurent Schwartz introduced distributions, which can be used to take the derivative of any function whatsoever.

Limits are not the only rigorous approach to the foundation of calculus. Another way is to use Abraham Robinson’s non-standard analysis. Robinson’s approach, developed in the 1960s, uses technical machinery from mathematical logic to augment the real number system with infinitesimal and infinite numbers, as in the original Newton-Leibniz conception. The resulting numbers are called hyperreal numbers, and they can be used to give a Leibniz-like development of the usual rules of calculus. There is also smooth infinitesimal analysis, which differs from non-standard analysis in that it mandates neglecting higher-power infinitesimals during derivations.

### Significance

While many of the ideas of calculus had been developed earlier in Greece, China, India, Iraq, Persia, and Japan, the use of calculus began in Europe, during the 17th century, when Isaac Newton and Gottfried Wilhelm Leibniz built on the work of earlier mathematicians to introduce its basic principles. The development of calculus was built on earlier concepts of instantaneous motion and area underneath curves.

Applications of differential calculus include computations involving velocity and acceleration, the slope of a curve, and optimization. Applications of integral calculus include computations involving area, volume, arc length, center of mass, work, and pressure. More advanced applications include power series and Fourier series.

Calculus is also used to gain a more precise understanding of the nature of space, time, and motion. For centuries, mathematicians and philosophers wrestled with paradoxes involving division by zero or sums of infinitely many numbers. These questions arise in the study of motion and area. The ancient Greek philosopher Zeno of Elea gave several famous examples of such paradoxes. Calculus provides tools, especially the limit and the infinite series, that resolve the paradoxes.

## Principles

### Limits and infinitesimals

Calculus is usually developed by working with very small quantities. Historically, the first method of doing so was by infinitesimals. These are objects which can be treated like real numbers but which are, in some sense, “infinitely small”. For example, an infinitesimal number could be greater than 0, but less than any number in the sequence 1, 1/2, 1/3, … and thus less than any positive real number. From this point of view, calculus is a collection of techniques for manipulating infinitesimals. The symbols $dx$ and $dy$ were taken to be infinitesimal, and the derivative $dy/dx$ was simply their ratio.

The infinitesimal approach fell out of favor in the 19th century because it was difficult to make the notion of an infinitesimal precise. In the late 19th century, infinitesimals were replaced within academia by the epsilon, delta approach to limits. Limits describe the behavior of a function at a certain input in terms of its values at nearby inputs. They capture small-scale behavior using the intrinsic structure of the real number system (as a metric space with the least-upper-bound property). In this treatment, calculus is a collection of techniques for manipulating certain limits. Infinitesimals get replaced by sequences of smaller and smaller numbers, and the infinitely small behavior of a function is found by taking the limiting behavior for these sequences. Limits were thought to provide a more rigorous foundation for calculus, and for this reason they became the standard approach during the 20th century. However, the infinitesimal concept was revived in the 20th century with the introduction of non-standard analysis and smooth infinitesimal analysis, which provided solid foundations for the manipulation of infinitesimals.

### Differential calculus

Tangent line at $(x_0, f(x_0))$. The derivative $f'(x)$ of a curve at a point is the slope (rise over run) of the line tangent to that curve at that point.

Differential calculus is the study of the definition, properties, and applications of the derivative of a function. The process of finding the derivative is called differentiation. Given a function and a point in the domain, the derivative at that point is a way of encoding the small-scale behavior of the function near that point. By finding the derivative of a function at every point in its domain, it is possible to produce a new function, called the derivative function or just the derivative of the original function. In formal terms, the derivative is a linear operator which takes a function as its input and produces a second function as its output. This is more abstract than many of the processes studied in elementary algebra, where functions usually input a number and output another number. For example, if the doubling function is given the input three, then it outputs six, and if the squaring function is given the input three, then it outputs nine. The derivative, however, can take the squaring function as an input. This means that the derivative takes all the information of the squaring function (such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on) and uses this information to produce another function. The function produced by differentiating the squaring function turns out to be the doubling function.

In more explicit terms the “doubling function” may be denoted by $g(x) = 2x$ and the “squaring function” by $f(x) = x^2$. The “derivative” now takes the function $f(x)$, defined by the expression “$x^2$”, as an input, that is, all the information (such as that two is sent to four, three is sent to nine, four is sent to sixteen, and so on), and uses this information to output another function, the function $g(x) = 2x$, as will turn out.

In Lagrange’s notation, the symbol for a derivative is an apostrophe-like mark called a prime. Thus, the derivative of a function called f is denoted by f′, pronounced “f prime”. For instance, if $f(x) = x^2$ is the squaring function, then $f'(x) = 2x$ is its derivative (the doubling function g from above).

If the input of the function represents time, then the derivative represents change with respect to time. For example, if f is a function that takes a time as input and gives the position of a ball at that time as output, then the derivative of f is how the position is changing in time, that is, it is the velocity of the ball.

If a function is linear (that is, if the graph of the function is a straight line), then the function can be written as $y = mx + b$, where $x$ is the independent variable, $y$ is the dependent variable, $b$ is the $y$-intercept, and:

$$m = \frac{\text{rise}}{\text{run}} = \frac{\text{change in } y}{\text{change in } x} = \frac{\Delta y}{\Delta x}.$$

This gives an exact value for the slope of a straight line. If the graph of the function is not a straight line, however, then the change in y divided by the change in x varies. Derivatives give an exact meaning to the notion of change in output with respect to change in input. To be concrete, let f be a function, and fix a point a in the domain of f. Then $(a, f(a))$ is a point on the graph of the function. If h is a number close to zero, then a + h is a number close to a. Therefore, $(a + h, f(a + h))$ is close to $(a, f(a))$. The slope between these two points is

$$m = \frac{f(a+h) - f(a)}{(a+h) - a} = \frac{f(a+h) - f(a)}{h}.$$

This expression is called a difference quotient. A line through two points on a curve is called a secant line, so m is the slope of the secant line between $(a, f(a))$ and $(a + h, f(a + h))$. The secant line is only an approximation to the behavior of the function at the point a because it does not account for what happens between a and a + h. It is not possible to discover the behavior at a by setting h to zero because this would require dividing by zero, which is undefined. The derivative is defined by taking the limit as h tends to zero, meaning that it considers the behavior of f for all small values of h and extracts a consistent value for the case when h equals zero:

$$f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h}.$$

Geometrically, the derivative is the slope of the tangent line to the graph of f at a. The tangent line is a limit of secant lines just as the derivative is a limit of difference quotients. For this reason, the derivative is sometimes called the slope of the function f.

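Here is a particular example, the derivative of the squaring function at the input 3. Let $f(x) = x^2$ be the squaring function.

$$\begin{aligned}
f'(3) &= \lim_{h \to 0} \frac{(3+h)^2 - 3^2}{h} \\
&= \lim_{h \to 0} \frac{9 + 6h + h^2 - 9}{h} \\
&= \lim_{h \to 0} \frac{6h + h^2}{h} \\
&= \lim_{h \to 0} (6 + h) \\
&= 6.
\end{aligned}$$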

The derivative $f'(x)$ of a curve at a point is the slope of the line tangent to that curve at that point. This slope is determined by considering the limiting value of the slopes of secant lines. Here the function involved (drawn in red) is $f(x) = x^3 - x$. The tangent line (in green) which passes through the point (−3/2, −15/8) has a slope of 23/4. Note that the vertical and horizontal scales in this image are different.

The slope of the tangent line to the squaring function at the point (3, 9) is 6, that is to say, it is going up six times as fast as it is going to the right. The limit process just described can be performed for any point in the domain of the squaring function. This defines the derivative function of the squaring function or just the derivative of the squaring function for short. A computation similar to the one above shows that the derivative of the squaring function is the doubling function.
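The limit process described in this section can be watched numerically. The sketch below evaluates the difference quotient of the squaring function at the input 3 for shrinking values of h; the function names are illustrative only, and floating-point arithmetic means the printed values are approximations rather than exact results.

```python
def difference_quotient(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h


def square(x):
    """The squaring function f(x) = x^2."""
    return x * x


if __name__ == "__main__":
    # Algebraically the quotient equals 6 + h, so it approaches 6 as h shrinks.
    for h in (1.0, 0.1, 0.01, 0.001):
        print(h, difference_quotient(square, 3.0, h))
```

Setting h to exactly zero would raise a `ZeroDivisionError`, which mirrors the remark that the secant slope cannot be evaluated at h = 0 and a limit is needed instead.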

### Leibniz notation

A common notation, introduced by Leibniz, for the derivative in the example above is

$$y = x^2, \qquad \frac{dy}{dx} = 2x.$$

In an approach based on limits, the symbol dy/dx is to be interpreted not as the quotient of two numbers but as a shorthand for the limit computed above. Leibniz, however, did intend it to represent the quotient of two infinitesimally small numbers, dy being the infinitesimally small change in y caused by an infinitesimally small change dx applied to x. We can also think of d/dx as a differentiation operator, which takes a function as an input and gives another function, the derivative, as the output. For example:

$$\frac{d}{dx}\left(x^2\right) = 2x.$$

In this usage, the dx in the denominator is read as “with respect to x”. Another example of correct notation could be:

$$g(t) = t^2 + 2t + 4, \qquad \frac{d}{dt}\, g(t) = 2t + 2.$$

Even when calculus is developed using limits rather than infinitesimals, it is common to manipulate symbols like dx and dy as if they were real numbers; although it is possible to avoid such manipulations, they are sometimes notationally convenient in expressing operations such as the total derivative.

### Integral calculus

Integral calculus is the study of the definitions, properties, and applications of two related concepts, the indefinite integral and the definite integral. The process of finding the value of an integral is called integration. In technical language, integral calculus studies two related linear operators.

The indefinite integral, also known as the antiderivative, is the inverse operation to the derivative. F is an indefinite integral of f when f is a derivative of F. (This use of lower- and upper-case letters for a function and its indefinite integral is common in calculus.)

The definite integral inputs a function and outputs a number, which gives the algebraic sum of areas between the graph of the input and the x-axis. The technical definition of the definite integral involves the limit of a sum of areas of rectangles, called a Riemann sum.

A motivating example is the distance traveled in a given time. If the speed is constant, only multiplication is needed:

$$\text{Distance} = \text{Speed} \cdot \text{Time}.$$

But if the speed changes, a more powerful method of finding the distance is necessary. One such method is to approximate the distance traveled by breaking up the time into many short intervals of time, then multiplying the time elapsed in each interval by one of the speeds in that interval, and then taking the sum (a Riemann sum) of the approximate distance traveled in each interval. The basic idea is that if only a short time elapses, then the speed will stay more or less the same. However, a Riemann sum only gives an approximation of the distance traveled. We must take the limit of all such Riemann sums to find the exact distance traveled.

Constant velocity

Integration can be thought of as measuring the area under a curve, defined by f(x), between two points (here a and b).

When velocity is constant, the total distance traveled over the given time interval can be computed by multiplying velocity and time. For example, travelling a steady 50 mph for 3 hours results in a total distance of 150 miles. In the diagram on the left, when constant velocity and time are graphed, these two values form a rectangle with height equal to the velocity and width equal to the time elapsed. Therefore, the product of velocity and time also calculates the rectangular area under the (constant) velocity curve. This connection between the area under a curve and distance traveled can be extended to any irregularly shaped region exhibiting a fluctuating velocity over a given time period. If f(x) in the diagram on the right represents speed as it varies over time, the distance traveled (between the times represented by a and b) is the area of the shaded region s.

To approximate that area, an intuitive method would be to divide up the distance between a and b into a number of equal segments, the length of each segment represented by the symbol Δx. For each small segment, we can choose one value of the function f(x). Call that value h. Then the area of the rectangle with base Δx and height h gives the distance (time Δx multiplied by speed h) traveled in that segment. Associated with each segment is the average value of the function above it, f(x) = h. The sum of all such rectangles gives an approximation of the area between the axis and the curve, which is an approximation of the total distance traveled. A smaller value for Δx will give more rectangles and in most cases a better approximation, but for an exact answer we need to take a limit as Δx approaches zero.
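The rectangle construction just described can be sketched in a few lines. This is an illustration under stated assumptions: the height of each rectangle is taken at the left endpoint of its segment (one of several valid choices), and the two speed functions are made-up examples rather than anything from the text beyond the 50 mph case.

```python
def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f over [a, b] using n equal segments."""
    dx = (b - a) / n  # width of each segment
    # Each term is (height chosen in the segment) * (width of the segment).
    return sum(f(a + i * dx) * dx for i in range(n))


def constant_speed(t):
    """A steady 50 mph, as in the constant-velocity example."""
    return 50.0


def increasing_speed(t):
    """Hypothetical speed that grows linearly with time: 2t."""
    return 2.0 * t


if __name__ == "__main__":
    # Constant speed: any number of rectangles gives the same rectangle area.
    print(riemann_sum(constant_speed, 0.0, 3.0, 3))
    # Varying speed: the estimates approach the exact area 9 as n grows.
    for n in (10, 100, 1000):
        print(n, riemann_sum(increasing_speed, 0.0, 3.0, n))
```

For the linearly increasing speed, the left-endpoint sums always fall slightly short of the exact area, which is why a limit as Δx approaches zero is needed for the exact answer.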

The symbol of integration is $\int$, an elongated S (the S stands for “sum”). The definite integral is written as:

$$\int_a^b f(x)\,dx$$

and is read “the integral from a to b of f-of-x with respect to x.” The Leibniz notation dx is intended to suggest dividing the area under the curve into an infinite number of rectangles, so that their width Δx becomes the infinitesimally small dx. In a formulation of the calculus based on limits, the notation

$$\int_a^b \cdots \, dx$$

is to be understood as an operator that takes a function as an input and gives a number, the area, as an output. The terminating differential, dx, is not a number, and is not being multiplied by f(x), although, serving as a reminder of the Δx limit definition, it can be treated as such in symbolic manipulations of the integral. Formally, the differential indicates the variable over which the function is integrated and serves as a closing bracket for the integration operator.

The indefinite integral, or antiderivative, is written:

$$\int f(x)\,dx.$$

Functions differing by only a constant have the same derivative, and it can be shown that the antiderivative of a given function is actually a family of functions differing only by a constant. Since the derivative of the function $y = x^2 + C$, where C is any constant, is $y' = 2x$, the antiderivative of the latter is given by:

$$\int 2x\,dx = x^2 + C.$$

The unspecified constant C present in the indefinite integral or antiderivative is known as the constant of integration.

### Fundamental theorem

The fundamental theorem of calculus states that differentiation and integration are inverse operations. More precisely, it relates the values of antiderivatives  to definite integrals. Because it is usually easier to compute an antiderivative than to apply the definition of a definite integral, the fundamental theorem of calculus provides a practical way of computing definite integrals. It can also be interpreted as a precise statement of the fact that differentiation is the inverse of integration.

The fundamental theorem of calculus states: If a function f is continuous on the interval [a, b] and if F is a function whose derivative is f on the interval (a, b), then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

Furthermore, for every x in the interval (a, b),

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$

This realization, made by both Newton and Leibniz, was key to the proliferation of analytic results after their work became known. (The extent to which Newton and Leibniz were influenced by immediate predecessors, and particularly what Leibniz may have learned from the work of Isaac Barrow, is difficult to determine thanks to the priority dispute between them.) The fundamental theorem provides an algebraic method of computing many definite integrals—without performing limit processes—by finding formulae for antiderivatives. It is also a prototype solution of a differential equation. Differential equations relate an unknown function to its derivatives, and are ubiquitous in the sciences.
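Both statements of the theorem can be checked numerically on a simple case. The sketch below assumes the example pair f(x) = 3x² and F(x) = x³ (chosen here for illustration, not taken from the text), approximates the definite integral with a midpoint Riemann sum, and approximates the derivative of the accumulated area with a central difference.

```python
def midpoint_sum(f, a, b, n=1000):
    """Midpoint Riemann sum of f over [a, b] with n equal segments."""
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx


def f(x):
    """Integrand: f(x) = 3x^2."""
    return 3.0 * x * x


def F(x):
    """An antiderivative of f: F(x) = x^3."""
    return x ** 3


def accumulated_area(x, a=1.0):
    """Numerical stand-in for the integral of f from a to x."""
    return midpoint_sum(f, a, x)


def central_difference(g, x, h=1e-3):
    """Symmetric difference quotient approximating g'(x)."""
    return (g(x + h) - g(x - h)) / (2.0 * h)


if __name__ == "__main__":
    # First statement: the integral from 1 to 2 should be close to F(2) - F(1) = 7.
    print(midpoint_sum(f, 1.0, 2.0), F(2.0) - F(1.0))
    # Second statement: differentiating the accumulated area should recover f.
    print(central_difference(accumulated_area, 1.5), f(1.5))
```

The antiderivative gives the value 7 with two function evaluations, while the Riemann sum needs a thousand, which is the practical advantage the theorem describes.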

## Applications

The logarithmic spiral of the Nautilus shell is a classical image used to depict the growth and change related to calculus.

Calculus is used in every branch of the physical sciences, actuarial science, computer science, statistics, engineering, economics, business, medicine, demography, and in other fields wherever a problem can be mathematically modeled and an optimal solution is desired. It allows one to go from (non-constant) rates of change to the total change or vice versa, and many times in studying a problem we know one and are trying to find the other. Calculus can be used in conjunction with other mathematical disciplines. For example, it can be used with linear algebra to find the “best fit” linear approximation for a set of points in a domain. Or, it can be used in probability theory to determine the expectation value of a continuous random variable given a probability density function. In analytic geometry, the study of graphs of functions, calculus is used to find high points and low points (maxima and minima), slope, concavity and inflection points. Calculus is also used to find approximate solutions to equations; in practice it is the standard way to solve differential equations and do root finding in most applications. Examples are methods such as Newton’s method, fixed point iteration, and linear approximation. For instance, spacecraft use a variation of the Euler method to approximate curved courses within zero gravity environments.
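As one illustration of root finding with derivatives, Newton's method repeatedly replaces a function by its tangent line and jumps to where that line crosses zero. The following is a minimal sketch: the function name `newton`, the stopping rule, and the example equation x² − 2 = 0 are assumptions made here for illustration.

```python
def newton(f, f_prime, x0, tolerance=1e-12, max_iterations=50):
    """Approximate a root of f starting from x0, using the derivative f_prime.

    Each step moves to the zero of the tangent line at the current point:
    x_next = x - f(x) / f'(x).
    """
    x = x0
    for _ in range(max_iterations):
        step = f(x) / f_prime(x)
        x -= step
        if abs(step) < tolerance:
            return x
    raise RuntimeError("Newton's method did not converge")


def g(x):
    """Example function whose positive root is the square root of 2."""
    return x * x - 2.0


def g_prime(x):
    """Derivative of g."""
    return 2.0 * x


if __name__ == "__main__":
    print(newton(g, g_prime, 1.0))
```

The sketch does not guard against a zero derivative or a poor starting point; production code would normally rely on a tested library routine.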

Physics makes particular use of calculus; all concepts in classical mechanics  and electromagnetism are related through calculus. The mass of an object of known  density, the  moment of inertia of objects, and the potential energies due to gravitational and electromagnetic forces can all be found by the use of calculus. An example of the use of calculus in mechanics is Newton’s second law of motion, which states that the derivative of an object’s momentum with respect to time equals the net force upon it. Alternatively, Newton’s second law can be expressed by saying that the net force is equal to the object’s mass times its acceleration, which is the time derivative of velocity and thus the second time derivative of spatial position. Starting from knowing how an object is accelerating, we use calculus to derive its path.
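Going from a known acceleration to a path can be sketched with the forward Euler method, the simplest member of the family of step-by-step integration schemes. The example below assumes one-dimensional motion, a constant acceleration of −9.81 m/s² (free fall near the Earth's surface), and a hypothetical helper named `simulate`; it is an approximation whose error shrinks as the number of steps grows.

```python
def simulate(acceleration, position, velocity, duration, steps):
    """Forward Euler integration of one-dimensional motion.

    acceleration: function of time returning the acceleration at that time.
    Returns the approximate (position, velocity) after `duration`.
    """
    dt = duration / steps
    t = 0.0
    for _ in range(steps):
        # Over a short time dt, treat velocity and acceleration as constant.
        position += velocity * dt
        velocity += acceleration(t) * dt
        t += dt
    return position, velocity


def gravity(t):
    """Constant downward acceleration in m/s^2 (assumed value)."""
    return -9.81


if __name__ == "__main__":
    # Drop from rest for one second; the exact answers are -4.905 m and -9.81 m/s.
    print(simulate(gravity, 0.0, 0.0, 1.0, 10000))
```

Integrating the acceleration once gives velocity and integrating again gives position, matching the description of acceleration as the second time derivative of position.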

Maxwell’s theory of electromagnetism and Einstein’s theory of general relativity are also expressed in the language of differential calculus. Chemistry also uses calculus in determining reaction rates and in studying radioactive decay. In biology, population dynamics starts with reproduction and death rates to model population changes.

Green’s theorem, which gives the relationship between a line integral around a simple closed curve C and a double integral over the plane region D bounded by C, is applied in an instrument known as a planimeter, which is used to calculate the area of a flat surface on a drawing.  For example, it can be used to calculate the amount of area taken up by an irregularly shaped flower bed or swimming pool when designing the layout of a piece of property.
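For a polygonal boundary, the line integral in Green’s theorem reduces to a finite sum, often called the shoelace formula. The sketch below assumes the region is a simple polygon given by its vertices in order; the flower-bed coordinates are made up for illustration.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its boundary vertices.

    This is the discrete form of (1/2) * integral of (x dy - y dx)
    around the boundary, which Green's theorem equates to the area.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Hypothetical flower bed: a 4 x 3 rectangle with a triangular cap.
flower_bed = [(0, 0), (4, 0), (4, 3), (2, 5), (0, 3)]
print(polygon_area(flower_bed))  # 16.0
```

The computation only visits the boundary, just as a planimeter only traces the outline of the region.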

In the realm of medicine, calculus can be used to find the optimal branching angle of a blood vessel so as to maximize flow. Calculus can be applied to understand how quickly a drug is eliminated from a body or how quickly a  cancerous  tumour grows.

In economics, calculus allows for the determination of maximal profit by providing a way to easily calculate both marginal cost and marginal revenue.
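A small numerical sketch of this idea: profit is largest where marginal revenue equals marginal cost, so one can search for the quantity at which their difference is zero. The revenue and cost functions below are hypothetical, and the derivative is estimated with a central difference.

```python
def derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def revenue(q):
    return 50 * q - q ** 2      # hypothetical revenue curve


def cost(q):
    return 100 + 10 * q         # hypothetical cost curve


def marginal_profit(q):
    # Marginal revenue minus marginal cost.
    return derivative(revenue, q) - derivative(cost, q)


def bisect(g, lo, hi, tol=1e-9):
    """Find a sign change of g in [lo, hi]; assumes g(lo), g(hi) differ in sign."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) > 0) == (g(mid) > 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


q_star = bisect(marginal_profit, 0.0, 40.0)
print(q_star)  # approximately 20 for these made-up curves
```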

## Varieties

Over the years, many reformulations of calculus have been investigated for different purposes.

### Non-standard calculus

Imprecise calculations with infinitesimals were widely replaced with the rigorous  (ε, δ)-definition of limit starting in the 1870s. Meanwhile, calculations with infinitesimals persisted and often led to correct results. This led Abraham Robinson to investigate if it were possible to develop a number system with infinitesimal quantities over which the theorems of calculus were still valid. In 1960, building upon the work of Edwin Hewitt and Jerzy Łoś, he succeeded in developing non-standard analysis. The theory of non-standard analysis is rich enough to be applied in many branches of mathematics. As such, books and articles dedicated solely to the traditional theorems of calculus often go by the title non-standard calculus.

### Smooth infinitesimal analysis

This is another reformulation of the calculus in terms of infinitesimals. Based on the ideas of F. W. Lawvere and employing the methods of category theory, it views all functions as being continuous and incapable of being expressed in terms of discrete entities. One consequence is that the law of excluded middle does not hold in this formulation.

### Constructive analysis

Constructive mathematics is a branch of mathematics that insists that proofs of the existence of a number, function, or other mathematical object should give a construction of the object. As such constructive mathematics also rejects the law of excluded middle. Reformulations of calculus in a constructive framework are generally part of the subject of constructive analysis.[34]

### Other related topics

#### Topic 6 – Discrete mathematics – complete mathematical induction, linear Diophantine equations, Fermat’s little theorem, route inspection problem and recurrence relations

Discrete mathematics is the study of mathematical structures that can be considered “discrete” (in a way analogous to discrete variables, having a bijection with the set of natural numbers) rather than “continuous” (analogously to continuous functions). Objects studied in discrete mathematics include integers, graphs, and statements in logic.[1][2][3][4] By contrast, discrete mathematics excludes topics in “continuous mathematics” such as real numbers, calculus or Euclidean geometry. Discrete objects can often be enumerated by integers; more formally, discrete mathematics has been characterized as the branch of mathematics dealing with countable sets[5] (finite sets or sets with the same cardinality as the natural numbers). However, there is no exact definition of the term “discrete mathematics”.[6]

The set of objects studied in discrete mathematics can be finite or infinite. The term finite mathematics is sometimes applied to parts of the field of discrete mathematics that deal with finite sets, particularly those areas relevant to business.

Research in discrete mathematics increased in the latter half of the twentieth century, partly due to the development of digital computers, which operate in “discrete” steps and store data in “discrete” bits. Concepts and notations from discrete mathematics are useful in studying and describing objects and problems in branches of computer science, such as computer algorithms, programming languages, cryptography, automated theorem proving, and software development. Conversely, computer implementations are significant in applying ideas from discrete mathematics to real-world problems, such as in operations research.

Although the main objects of study in discrete mathematics are discrete objects, analytic methods from “continuous” mathematics are often employed as well.

In university curricula, “Discrete Mathematics” appeared in the 1980s, initially as a computer science support course; its contents were somewhat haphazard at the time. The curriculum has thereafter developed in conjunction with efforts by ACM and MAA into a course that is basically intended to develop mathematical maturity in first-year students; therefore, it is nowadays a prerequisite for mathematics majors in some universities as well.[7][8] Some high-school-level discrete mathematics textbooks have appeared as well.[9] At this level, discrete mathematics is sometimes seen as a preparatory course, not unlike precalculus in this respect.[10]

The Fulkerson Prize is awarded for outstanding papers in discrete mathematics.

## Grand challenges, past and present


The history of discrete mathematics has involved a number of challenging problems which have focused attention within areas of the field. In graph theory, much research was motivated by attempts to prove the four color theorem, first stated in 1852, but not proved until 1976 (by Kenneth Appel and Wolfgang Haken, using substantial computer assistance).[11]

In logic, the second problem on David Hilbert’s list of open problems presented in 1900 was to prove that the axioms of arithmetic are consistent. Gödel’s second incompleteness theorem, proved in 1931, showed that this was not possible – at least not within arithmetic itself. Hilbert’s tenth problem was to determine whether a given polynomial Diophantine equation with integer coefficients has an integer solution. In 1970, Yuri Matiyasevich proved that this could not be done.

The need to break German codes in World War II led to advances in cryptography and theoretical computer science, with the first programmable digital electronic computer being developed at England’s Bletchley Park with the guidance of Alan Turing and his seminal work, On Computable Numbers.[12] At the same time, military requirements motivated advances in operations research. The Cold War meant that cryptography remained important, with fundamental advances such as public-key cryptography being developed in the following decades. Operations research remained important as a tool in business and project management, with the critical path method being developed in the 1950s. The telecommunication industry has also motivated advances in discrete mathematics, particularly in graph theory and information theory. Formal verification of statements in logic has been necessary for software development of safety-critical systems, and advances in automated theorem proving have been driven by this need.

Computational geometry has been an important part of the computer graphics incorporated into modern video games and computer-aided design tools.

Several fields of discrete mathematics, particularly theoretical computer science, graph theory, and combinatorics, are important in addressing the challenging bioinformatics problems associated with understanding the tree of life.[13]

Currently, one of the most famous open problems in theoretical computer science is the P = NP problem, which involves the relationship between the complexity classes P and NP. The Clay Mathematics Institute has offered a \$1 million USD prize for the first correct proof, along with prizes for six other mathematical problems.[14]

## Topics in discrete mathematics

### Theoretical computer science


Theoretical computer science includes areas of discrete mathematics relevant to computing. It draws heavily on graph theory and mathematical logic. Included within theoretical computer science is the study of algorithms and data structures. Computability studies what can be computed in principle, and has close ties to logic, while complexity studies the time, space, and other resources taken by computations. Automata theory and formal language theory are closely related to computability. Petri nets and process algebras are used to model computer systems, and methods from discrete mathematics are used in analyzing VLSI electronic circuits. Computational geometry applies algorithms to geometrical problems, while computer image analysis applies them to representations of images. Theoretical computer science also includes the study of various continuous computational topics.
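To make the idea of measuring the resources used by a computation concrete, the sketch below counts the comparisons performed by insertion sort. The choice of insertion sort and of comparisons as the cost measure is an assumption made for illustration.

```python
def insertion_sort(items):
    """Return a sorted copy of items and the number of comparisons made."""
    data = list(items)
    comparisons = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if data[j] <= key:
                break
            data[j + 1] = data[j]   # shift the larger element right
            j -= 1
        data[j + 1] = key
    return data, comparisons


# Reverse-sorted input forces every possible comparison: n(n-1)/2 of them.
_, worst = insertion_sort(range(10, 0, -1))
# Already-sorted input needs only one comparison per element after the first.
_, best = insertion_sort(range(10))
print(worst, best)  # 45 9
```

The gap between the two counts shows why complexity is usually stated for a specified class of inputs, such as the worst case.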

### Information theory

The ASCII codes for the word “Wikipedia”, written in binary, provide a way of representing the word in information theory, as well as for information-processing algorithms.
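The binary ASCII codes of a word can be listed with a short sketch; the helper name `ascii_bits` and the 8-bit padding are assumptions for the example.

```python
def ascii_bits(text):
    """Return the 8-bit binary ASCII code of each character in text."""
    return [format(byte, "08b") for byte in text.encode("ascii")]


codes = ascii_bits("Wikipedia")
print(" ".join(codes))
```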

Information theory involves the quantification of information. Closely related is coding theory, which is used to design efficient and reliable data transmission and storage methods. Information theory also includes continuous topics such as analog signals, analog coding, and analog encryption.

### Logic

Logic is the study of the principles of valid reasoning and inference, as well as of consistency, soundness, and completeness. For example, in most systems of logic (but not in intuitionistic logic) Peirce’s law (((P→Q)→P)→P) is a theorem. For classical logic, it can be easily verified with a truth table. The study of mathematical proof is particularly important in logic, and has applications to automated theorem proving and formal verification of software.
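The truth-table check of Peirce’s law, ((P→Q)→P)→P, in classical logic can be sketched by enumerating all four assignments of truth values. The helper names are illustrative.

```python
from itertools import product


def implies(a, b):
    """Classical material implication: a -> b."""
    return (not a) or b


def peirce(p, q):
    """Peirce's law: ((p -> q) -> p) -> p."""
    return implies(implies(implies(p, q), p), p)


# A formula is a classical tautology if it is true in every row of its truth table.
for p, q in product([False, True], repeat=2):
    print(p, q, peirce(p, q))

is_tautology = all(peirce(p, q) for p, q in product([False, True], repeat=2))
```

This only establishes the law for two-valued classical logic; it says nothing about intuitionistic logic, where truth tables are not a valid proof method.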

Logical formulas are discrete structures, as are proofs, which form finite trees[15] or, more generally, directed acyclic graph structures[16][17] (with each inference step combining one or more premise branches to give a single conclusion). The truth values of logical formulas usually form a finite set, generally restricted to two values: true and false, but logic can also be continuous-valued, e.g., fuzzy logic. Concepts such as infinite proof trees or infinite derivation trees have also been studied,[18] e.g. infinitary logic.

### Set theory

Set theory is the branch of mathematics that studies sets, which are collections of objects, such as {blue, white, red} or the (infinite) set of all prime numbers. Partially ordered sets and sets with other relations have applications in several areas.

In discrete mathematics, countable sets (including finite sets) are the main focus. The beginning of set theory as a branch of mathematics is usually marked by Georg Cantor’s work distinguishing between different kinds of infinite set, motivated by the study of trigonometric series, and further development of the theory of infinite sets is outside the scope of discrete mathematics. Indeed, contemporary work in descriptive set theory makes extensive use of traditional continuous mathematics.

### Combinatorics

Combinatorics studies the way in which discrete structures can be combined or arranged. Enumerative combinatorics concentrates on counting the number of certain combinatorial objects – e.g. the twelvefold way provides a unified framework for counting permutations, combinations and partitions. Analytic combinatorics concerns the enumeration (i.e., determining the number) of combinatorial structures using tools from complex analysis and probability theory. In contrast with enumerative combinatorics, which uses explicit combinatorial formulae and generating functions to describe the results, analytic combinatorics aims at obtaining asymptotic formulae. Design theory is a study of combinatorial designs, which are collections of subsets with certain intersection properties. Partition theory studies various enumeration and asymptotic problems related to integer partitions, and is closely related to q-series, special functions and orthogonal polynomials. Originally a part of number theory and analysis, partition theory is now considered a part of combinatorics or an independent field. Order theory is the study of partially ordered sets, both finite and infinite.

### Graph theory

Graph theory has close links to group theory; for example, the truncated tetrahedron graph is related to the alternating group A₄.

Graph theory, the study of graphs and networks, is often considered part of combinatorics, but has grown large enough and distinct enough, with its own kind of problems, to be regarded as a subject in its own right.[19] Graphs are one of the prime objects of study in discrete mathematics. They are among the most ubiquitous models of both natural and human-made structures. They can model many types of relations and process dynamics in physical, biological and social systems. In computer science, they can represent networks of communication, data organization, computational devices, the flow of computation, etc. In mathematics, they are useful in geometry and certain parts of topology, e.g. knot theory. Algebraic graph theory has close links with group theory. There are also continuous graphs; however, for the most part, research in graph theory falls within the domain of discrete mathematics.

### Probability

Discrete probability theory deals with events that occur in countable sample spaces. For example, count observations such as the numbers of birds in flocks comprise only natural number values {0, 1, 2, …}. On the other hand, continuous observations such as the weights of birds comprise real number values and would typically be modeled by a continuous probability distribution such as the normal. Discrete probability distributions can be used to approximate continuous ones and vice versa. For highly constrained situations such as throwing dice or experiments with decks of cards, calculating the probability of events is basically enumerative combinatorics.
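For a dice experiment, computing a probability by enumeration can be sketched as follows. The helper `probability` and the event “two dice sum to 7” are illustrative choices, and the dice are assumed to be fair.

```python
from fractions import Fraction
from itertools import product


def probability(event, faces=6, dice=2):
    """Probability of an event over all equally likely rolls of fair dice."""
    outcomes = list(product(range(1, faces + 1), repeat=dice))
    favourable = sum(1 for roll in outcomes if event(roll))
    return Fraction(favourable, len(outcomes))


p_seven = probability(lambda roll: sum(roll) == 7)
print(p_seven)  # 1/6
```

Counting favourable outcomes against all outcomes is exactly the enumerative step; exact fractions avoid rounding.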

### Number theory

In the Ulam spiral of numbers, marking the prime numbers (for example as black pixels) hints at patterns in the distribution of prime numbers.

Number theory is concerned with the properties of numbers in general, particularly integers. It has applications to cryptography and cryptanalysis, particularly with regard to modular arithmetic, diophantine equations, linear and quadratic congruences, prime numbers and primality testing. Other discrete aspects of number theory include geometry of numbers. In analytic number theory, techniques from continuous mathematics are also used. Topics that go beyond discrete objects include transcendental numbers, diophantine approximation, p-adic analysis and function fields.
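Modular arithmetic and primality testing meet in Fermat’s little theorem: if p is prime and a is not divisible by p, then a^(p−1) ≡ 1 (mod p). The sketch below turns this into a probable-prime test. The function name and the fixed list of bases are assumptions, and a composite number can in rare cases pass, so a `True` result means “probably prime” only.

```python
def fermat_probable_prime(n, bases=(2, 3, 5, 7)):
    """Fermat test: False means n is certainly composite (or < 2);
    True means n passed for every base and is only probably prime."""
    if n < 2:
        return False
    for a in bases:
        if n == a:
            return True
        if n % a == 0:
            return False
        # Three-argument pow computes a**(n-1) modulo n efficiently.
        if pow(a, n - 1, n) != 1:
            return False
    return True


print(fermat_probable_prime(97))   # True: 97 is prime
print(fermat_probable_prime(341))  # False: 341 = 11 * 31 fails for base 3
```

The number 341 passes the test for base 2 alone, which is why several bases are tried.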

### Algebraic structures

Algebraic structures occur as both discrete examples and continuous examples. Discrete algebras include: boolean algebra used in logic gates and programming; relational algebra used in databases; discrete and finite versions of groups, rings and fields, which are important in algebraic coding theory; and discrete semigroups and monoids, which appear in the theory of formal languages.

### Calculus of finite differences, discrete calculus or discrete analysis

A function defined on an interval of the integers is usually called a sequence. A sequence could be a finite sequence from a data source or an infinite sequence from a discrete dynamical system. Such a discrete function could be defined explicitly by a list (if its domain is finite), or by a formula for its general term, or it could be given implicitly by a recurrence relation or difference equation. Difference equations are similar to differential equations, but replace differentiation by taking the difference between adjacent terms; they can be used to approximate differential equations or (more often) studied in their own right. Many questions and methods concerning differential equations have counterparts for difference equations. For instance, where there are integral transforms in harmonic analysis for studying continuous functions or analogue signals, there are discrete transforms for discrete functions or digital signals. As well as the discrete metric there are more general discrete or finite metric spaces and finite topological spaces.
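Both ideas, differences of adjacent terms and sequences defined by a recurrence, can be sketched briefly. The helper names and the example sequences (squares and Fibonacci numbers) are chosen for illustration.

```python
def forward_difference(seq):
    """Differences between adjacent terms: the discrete analogue of a derivative."""
    return [b - a for a, b in zip(seq, seq[1:])]


squares = [n * n for n in range(6)]          # [0, 1, 4, 9, 16, 25]
first = forward_difference(squares)          # [1, 3, 5, 7, 9]
second = forward_difference(first)           # [2, 2, 2, 2]


def solve_recurrence(step, initial, count):
    """Generate the first `count` terms of a sequence given implicitly
    by a recurrence relation and its initial terms."""
    terms = list(initial)
    while len(terms) < count:
        terms.append(step(terms))
    return terms[:count]


# The recurrence F(n) = F(n-1) + F(n-2) with F(0) = 0, F(1) = 1.
fib = solve_recurrence(lambda t: t[-1] + t[-2], [0, 1], 10)
print(first, second, fib)
```

The constant second difference of n² mirrors the constant second derivative of x².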

### Geometry


Discrete geometry and combinatorial geometry are about combinatorial properties of discrete collections of geometrical objects. A long-standing topic in discrete geometry is tiling of the plane. Computational geometry applies algorithms to geometrical problems.

### Topology

Although topology is the field of mathematics that formalizes and generalizes the intuitive notion of “continuous deformation” of objects, it gives rise to many discrete topics; this can be attributed in part to the focus on topological invariants, which themselves usually take discrete values. See combinatorial topology, topological graph theory, topological combinatorics, computational topology, discrete topological space, finite topological space, and topology (chemistry).

### Operations research

PERT charts provide a project management technique based on graph theory.

Operations research provides techniques for solving practical problems in engineering, business, and other fields, such as allocating resources to maximize profit and scheduling project activities to minimize risk. Operations research techniques include linear programming and other areas of optimization, queuing theory, scheduling theory, and network theory. Operations research also includes continuous topics such as continuous-time Markov processes, continuous-time martingales, process optimization, and continuous and hybrid control theory.

### Game theory, decision theory, utility theory, social choice theory

| | Cooperate | Defect |
|---|---|---|
| Cooperate | −1, −1 | −10, 0 |
| Defect | 0, −10 | −5, −5 |

Payoff matrix for the Prisoner’s dilemma, a common example in game theory. One player chooses a row, the other a column; the resulting pair gives their payoffs.

Decision theory is concerned with identifying the values, uncertainties and other issues relevant in a given decision, its rationality, and the resulting optimal decision.

Utility theory is about measures of the relative economic satisfaction from, or desirability of, consumption of various goods and services.

Social choice theory is about voting. A more puzzle-based approach to voting is ballot theory.

Game theory deals with situations where success depends on the choices of others, which makes choosing the best course of action more complex. There are even continuous games, see differential game. Topics include auction theory and fair division.
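Using the Prisoner’s dilemma payoffs given in this section, the dependence of each player’s best choice on the other’s can be checked mechanically. The sketch below searches for pure-strategy Nash equilibria, that is, pairs of choices from which neither player gains by changing only their own choice. The equilibrium concept and all names in the code are additions for illustration.

```python
ACTIONS = ("Cooperate", "Defect")

# (row choice, column choice) -> (row player's payoff, column player's payoff)
PAYOFFS = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"): (-10, 0),
    ("Defect", "Cooperate"): (0, -10),
    ("Defect", "Defect"): (-5, -5),
}


def pure_nash_equilibria(payoffs, actions):
    """Return all action pairs where neither player can improve alone."""
    equilibria = []
    for r in actions:
        for c in actions:
            row_pay, col_pay = payoffs[(r, c)]
            row_ok = all(payoffs[(r2, c)][0] <= row_pay for r2 in actions)
            col_ok = all(payoffs[(r, c2)][1] <= col_pay for c2 in actions)
            if row_ok and col_ok:
                equilibria.append((r, c))
    return equilibria


print(pure_nash_equilibria(PAYOFFS, ACTIONS))  # [('Defect', 'Defect')]
```

Both players defecting is the only stable pair, even though both would be better off cooperating.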

### Discretization

Discretization concerns the process of transferring continuous models and equations into discrete counterparts, often for the purposes of making calculations easier by using approximations. Numerical analysis provides an important example.
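As one example from numerical analysis, an integral can be discretized by sampling the integrand at finitely many points. The sketch below uses the trapezoidal rule; the integrand and the number of subintervals are arbitrary choices for illustration.

```python
import math


def trapezoid(f, a, b, n):
    """Approximate the integral of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


# The exact integral of sin over [0, pi] is 2.
approx = trapezoid(math.sin, 0.0, math.pi, 1000)
print(approx)
```

Increasing `n` refines the discrete model and brings the approximation closer to the continuous value.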

### Discrete analogues of continuous mathematics

There are many concepts in continuous mathematics which have discrete versions, such as discrete calculus, discrete probability distributions, discrete Fourier transforms, discrete geometry, discrete logarithms, discrete differential geometry, discrete exterior calculus, discrete Morse theory, difference equations, discrete dynamical systems, and discrete vector measures.

In applied mathematics, discrete modelling is the discrete analogue of continuous modelling. In discrete modelling, discrete formulae are fit to data. A common method in this form of modelling is to use recurrence relations.

In algebraic geometry, the concept of a curve can be extended to discrete geometries by taking the spectra of polynomial rings over finite fields to be models of the affine spaces over that field, and letting subvarieties or spectra of other rings provide the curves that lie in that space. Although the space in which the curves appear has a finite number of points, the curves are not so much sets of points as analogues of curves in continuous settings. For example, every point of the form $V(x-c)\subset \operatorname{Spec} K[x]=\mathbb{A}^{1}$ for $K$ a field can be studied either as $\operatorname{Spec} K[x]/(x-c)\cong \operatorname{Spec} K$, a point, or as the spectrum $\operatorname{Spec} K[x]_{(x-c)}$ of the local ring at $(x-c)$, a point together with a neighborhood around it. Algebraic varieties also have a well-defined notion of tangent space called the Zariski tangent space, making many features of calculus applicable even in finite settings.

### Hybrid discrete and continuous mathematics

The time scale calculus is a unification of the theory of difference equations with that of differential equations, which has applications to fields requiring simultaneous modelling of discrete and continuous data. Another way of modeling such a situation is the notion of hybrid dynamical systems.