Summary
A Mathematical Primer for Social Statistics covers many often ignored yet important topics in mathematics and mathematical statistics. This text provides readers with the foundation on which an understanding of applied statistics rests. Key Features: covers matrices, linear algebra, and vector geometry; discusses basic differential and integral calculus; focuses on probability and statistical estimation; and develops by way of illustration the seminal statistical method of linear least-squares regression. This book is ideal for advanced undergraduates, graduate students, and researchers in the social sciences who need to understand and use relatively advanced statistical methods but whose mathematical preparation for this work is insufficient.
An Introduction to Calculus
What is now called calculus deals with two basic types of problems: finding the slopes of tangent lines to curves (differential calculus) and evaluating areas under curves (integral calculus). In the 17th century, the English physicist and mathematician Sir Isaac Newton (1643–1727) and the German philosopher and mathematician Gottfried Wilhelm Leibniz (1646–1716) independently demonstrated the relationship between these two kinds of problems, consolidating and extending previous work in mathematics dating to the classical period. Newton and Leibniz are generally acknowledged as the cofounders of calculus.1 In the 19th century, the great French mathematician Augustin Louis Cauchy (1789–1857), among others, employed the concept of the limit of a function to provide a rigorous logical foundation for calculus.
After a review of some elementary mathematics—numbers, equations of lines and planes, polynomial functions, logarithms, exponentials, and basic trigonometric functions—I will briefly take up the following seminal topics in calculus, emphasizing basic concepts: Section 2.2, limits of functions; Section 2.3, the derivative of a function; Section 2.4, the application of derivatives to optimization problems; Section 2.5, partial derivatives of functions of several variables, constrained optimization, and differential calculus in matrix form; Section 2.6, Taylor series expansions and approximations; and Section 2.7, the essential ideas of integral calculus.
Although a thorough and rigorous treatment is well beyond the scope of this brief book, one can get a lot of mileage out of an intuitive grounding in the basic ideas of calculus.
2.1 Review
2.1.1 Numbers
The definition of various sets of numbers is a relatively deep topic in mathematics, but the following rough distinctions will be sufficient for our purposes:
1. Newton's claim that Leibniz had appropriated his work touched off one of the most famous priority disputes in the history of science.
- The natural numbers include 0 and the positive whole numbers: 0, 1, 2, 3, ….2
- The integers include all negative and positive whole numbers and 0: …, −3, −2, −1, 0, 1, 2, 3, …
- The rational numbers consist of all numbers that can be written as a ratio m/n of two integers, with n ≠ 0, including all of the integers and nonintegers such as −½.
- The real numbers include all of the rational numbers along with the irrational numbers, such as √2 and the mathematical constants π ≈ 3.14159 and e ≈ 2.71828, which cannot be written precisely as the ratio of two integers. The real numbers can be mapped into distances along a continuous line from −∞ to +∞.
- The complex numbers are of the form a + bi, where a and b are real numbers, and where i ≡ √−1. The complex numbers can be thought of as points in a plane: The real component of the number a gives the horizontal coordinate of the point, and the coefficient b of the ‘imaginary’ component bi gives the vertical coordinate. The complex numbers include the real numbers (for which b = 0).
2.1.2 Lines and Planes
A straight line has the equation

y = a + bx

where a and b are constants. The constant a is the y-intercept of the line, that is, the value of y associated with x = 0; and b is the slope of the line, that is, the change in y when x is increased by 1. See Figure 2.1, which shows straight lines in the two-dimensional coordinate space with axes x and y; in each case, the line extends infinitely to the left and right beyond the line-segment shown in the graph. When the slope is positive, b > 0, the line runs from lower left to upper right; when the slope is negative, b < 0, the line runs from upper left to lower right; and when b = 0, the line is horizontal.
Similarly, the linear equation

y = a + b1x1 + b2x2
represents a flat plane in the three-dimensional space with axes x1, x2, and y, as illustrated in the three-dimensional graph in Figure 2.2; the axes are at right angles to each other, so think of the x2 axis as extending directly into the page. The plane extends infinitely in all directions beyond the lines on its surface shown in the graph. The intercept of the plane, a, is the value of y when both x1 and x2 are 0; b1 represents the slope of the plane in the direction of x1 for a fixed value of x2; and b2 represents the slope of the plane in the direction of x2 for a fixed value of x1.
2. In some areas of mathematics, the natural numbers include only the positive whole numbers.
Figure 2.1 The graph of a straight line, y = a + bx, for (a) b > 0, (b) b < 0, and (c) b = 0.
Figure 2.2 The equation of a plane, y = a + b1x1 + b2x2. Here, both slopes, b1 and b2, are positive.
Figure 2.3 ‘Typical’ first-order (linear), second-order (quadratic), and third-order (cubic) polynomials.
The equation of a straight line can be written in other forms, including the general form

c1x + c2y = c0

which (provided that c2 ≠ 0) can be transformed into slope-intercept form as

y = c0/c2 − (c1/c2)x

Likewise, the equation

c1x1 + c2x2 + c3y = c0

(with c3 ≠ 0) represents a plane.
2.1.3 Polynomials
Polynomials are functions of the form

y = a0 + a1x + a2x^2 + ··· + apx^p

where a0, a1, a2, …, ap are constants, some of which (with the exception of ap) may be 0. The largest exponent, p, is called the order of the polynomial. In particular, and as illustrated in Figure 2.3, a first-order polynomial is a straight line,

y = a0 + a1x

a second-order polynomial is a quadratic equation,

y = a0 + a1x + a2x^2

and a third-order polynomial is a cubic equation,

y = a0 + a1x + a2x^2 + a3x^3
A polynomial equation of order p can have up to p − 1 ‘bends’ in it, such as the single bend (change of direction) in the quadratic function in Figure 2.3(b) and the two bends in the cubic in Figure 2.3(c).
2.1.4 Logarithms and Exponentials
Logarithms (‘logs’) are exponents: The expression

log_b x = y

which is read as ‘the log of x to the base b is y,’ means that

x = b^y

where b > 0 and b ≠ 1. Thus, for example,

log_10 100 = 2, because 10^2 = 100
log_10 0.01 = −2, because 10^−2 = 0.01

and, similarly,

log_2 8 = 3, because 2^3 = 8
log_2 (1/8) = −3, because 2^−3 = 1/8

Indeed, the log of 1 to any base is 0, because b^0 = 1 for any number b ≠ 0. Logs are defined only for positive numbers x. The most commonly used base for logarithms in mathematics is the base e ≈ 2.718; logs to the base e are called natural logs.3 (For a justification of this terminology, see Section 2.3.4.)
Figure 2.4 Graph of the log function y = log_b x.
A ‘typical’ log function is graphed in Figure 2.4. As the graph implies, log functions have the same basic shape regardless of the base, and converting from one base, say b, to another, say a, simply involves multiplication by a constant:

log_a x = log_a b × log_b x

For example,

log_10 x = log_10 e × log_e x ≈ 0.4343 × log_e x
Logs inherit their properties from the properties of exponents: Because b^x1 × b^x2 = b^(x1+x2), it follows that

log_b(x1x2) = log_b x1 + log_b x2

Similarly, because b^x1 / b^x2 = b^(x1−x2),

log_b(x1/x2) = log_b x1 − log_b x2
3. Although I prefer always to show the base of the log function explicitly, as in log10 or loge (unless the base is irrelevant, in which case log will do), many authors use unsubscripted log or ln to represent natural logs.
Figure 2.5 Graph of the exponential function y = e^x.
and because b^(ax) = (b^x)^a,

log_b(x^a) = a × log_b x
At one time, the conversion of multiplication into addition, division into subtraction, and exponentiation into multiplication simplified laborious computations. Although this motivation has faded, logs still play a prominent role in mathematics and statistics.
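These identities are easy to check numerically. The following Python snippet (a quick sketch, with arbitrarily chosen numbers) verifies the product, quotient, power, and change-of-base rules:

```python
import math

x1, x2 = 100.0, 8.0

# product rule: log_b(x1 * x2) = log_b(x1) + log_b(x2)
product_ok = math.isclose(math.log10(x1 * x2), math.log10(x1) + math.log10(x2))

# quotient rule: log_b(x1 / x2) = log_b(x1) - log_b(x2)
quotient_ok = math.isclose(math.log10(x1 / x2), math.log10(x1) - math.log10(x2))

# power rule: log_b(x^a) = a * log_b(x)
power_ok = math.isclose(math.log10(x1 ** 3), 3 * math.log10(x1))

# change of base: log_10(x) = log_10(e) * log_e(x)
base_ok = math.isclose(math.log10(x2), math.log10(math.e) * math.log(x2))

print(product_ok, quotient_ok, power_ok, base_ok)  # True True True True
```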
An exponential function is a function of the form

y = a^x

where a is a constant. The most common exponential, y = exp(x) = e^x, is graphed in Figure 2.5. The log and exponential functions are inverses of each other, in the sense that log_a(a^x) = x and a^(log_a x) = x.
2.1.5 Basic Trigonometric Functions
Figure 2.6 shows a unit circle—that is, a circle of radius 1 centered at the origin. The angle x produces a right triangle inscribed in the circle; notice that the angle is measured in a counterclockwise direction from the horizontal axis. The cosine of the angle x, denoted cos x, is the signed length of the side of the triangle adjacent to the angle (i.e., ‘adjacent/hypotenuse,’ where the hypotenuse is 1 because it is a radius of the unit circle); the sine of the angle x, denoted sin x, is the signed length of the side of the triangle opposite the angle (i.e., ‘opposite/hypotenuse’); and the tangent of x, tan x = sin x/cos x, is the ratio of the signed length of the side opposite the angle to that of the side adjacent to it (‘opposite’/‘adjacent’). The cosine, sine, and tangent functions for angles between −360° and 360° are shown in Figure 2.7; negative angles represent clockwise rotations. Notice that the tangent function approaches ±∞ at angles of ±90° and ±270°, and that the sine and cosine functions have the same shape, with sin(x) = cos(x − 90°).
Figure 2.6 A unit circle, showing the angle x and its cosine and sine.
It is sometimes mathematically convenient to measure angles in radians rather than in degrees, with 2π radians corresponding to 360 degrees. The circumference of the unit circle in Figure 2.6 is also 2π, and therefore the radian measure of an angle represents the length of the arc along the unit circle subtended by the angle.
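A short numerical illustration of the degree–radian correspondence (Python's math functions expect radians; the 30° angle is an arbitrary choice):

```python
import math

deg = 30.0
rad = math.radians(deg)                # 360 degrees correspond to 2*pi radians
assert math.isclose(rad, deg * math.pi / 180)

# tan x = sin x / cos x, and sin x = cos(x - 90 degrees)
assert math.isclose(math.tan(rad), math.sin(rad) / math.cos(rad))
assert math.isclose(math.sin(rad), math.cos(rad - math.pi / 2))
print(round(math.sin(rad), 3))         # sin 30 degrees = 0.5
```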
2.2 Limits
Calculus deals with functions of the form y = f(x). I will consider the case where both the domain (values of the independent variable x) and range (values of the dependent variable y) of the function are real numbers. The limit of a function concerns its behavior when x is near, but not necessarily equal to, a specific value. This is often a useful idea, especially when a function is undefined at a particular value of x.
Figure 2.7 The cosine, sine, and tangent functions for angles between x = −360° and x = 360°.
2.2.1 The ‘Epsilon-Delta’ Definition of a Limit
A function y = f(x) has a limit L at x = x0 (i.e., a particular value of x) if for any positive tolerance ε, no matter how small, there exists a positive number δ such that the distance between f(x) and L is less than the tolerance as long as the distance between x and x0 is smaller than δ—that is, as long as x is confined to a sufficiently small neighborhood of width 2δ around x0. In symbols:

|f(x) − L| < ε for all 0 < |x − x0| < δ
This possibly cryptic definition is clarified by Figure 2.8. Note that f(x0) need not equal L. Indeed, limits are often most useful when f(x) does not exist at x = x0. For L to be the limit of f(x) at x = x0, the function must approach this value as x approaches x0both from the left and from the right.
Figure 2.8 The limit of the function f(x) as x approaches x0 is L. The gap in the curve above x0 is meant to suggest that the function is undefined at x = x0.
The following notation is used:

lim_{x→x0} f(x) = L
We read this expression as ‘The limit of the function f(x) as x approaches x0 is L.’
2.2.2 Finding a Limit: An Example
Find the limit of

y = f(x) = (x^2 − 1)/(x − 1)

at x0 = 1:
Notice that f(1) = 0/0 is undefined. Nevertheless, as long as x is not exactly equal to 1, even if it is very close to it, we can divide by x − 1:

(x^2 − 1)/(x − 1) = (x + 1)(x − 1)/(x − 1) = x + 1

Figure 2.9 lim_{x→1} (x^2 − 1)/(x − 1) = 2, even though the function is undefined at x = 1.
Moreover, because x0 + 1 = 1 + 1 = 2,

lim_{x→1} (x^2 − 1)/(x − 1) = lim_{x→1} (x + 1) = 2
This limit is graphed in Figure 2.9.
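Numerically, one can watch the limit emerge by evaluating a function of this kind at x-values ever closer to 1; a quick Python sketch, using f(x) = (x^2 − 1)/(x − 1):

```python
def f(x):
    return (x**2 - 1) / (x - 1)   # undefined at x = 1 itself

# approach x0 = 1 from the left and from the right
for x in (0.9, 0.99, 0.999, 1.001, 1.01, 1.1):
    print(x, f(x))                # the values approach 2 from both sides
```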
2.2.3 Rules for Manipulating Limits
Suppose that we have two functions f(x) and g(x) of an independent variable x, and that each function has a limit at x = x0:

lim_{x→x0} f(x) = a    lim_{x→x0} g(x) = b

Then the limits of functions composed from f(x) and g(x) by the arithmetic operations of addition, subtraction, multiplication, and division are straightforward:

lim_{x→x0} [f(x) + g(x)] = a + b
lim_{x→x0} [f(x) − g(x)] = a − b
lim_{x→x0} [f(x)g(x)] = ab
lim_{x→x0} [f(x)/g(x)] = a/b
The last result holds as long as the denominator b ≠ 0.
Similarly, if c and n are constants and lim_{x→x0} f(x) = a, then

lim_{x→x0} c = c
lim_{x→x0} [c f(x)] = ca
lim_{x→x0} [f(x)]^n = a^n
Finally, it is (I hope) obvious that

lim_{x→x0} x = x0
2.3 The Derivative of a Function
Now consider a function y = f(x) evaluated at two values of x:

y1 = f(x1) at x = x1
y2 = f(x2) at x = x2

The difference quotient is defined as the change in y divided by the change in x, as we move from the point (x1, y1) to the point (x2, y2):

Δy/Δx = (y2 − y1)/(x2 − x1) = [f(x2) − f(x1)]/(x2 − x1)

where Δ (‘Delta’) is a shorthand denoting ‘change.’ As illustrated in Figure 2.10, the difference quotient is the slope of the secant line connecting the points (x1, y1) and (x2, y2).
The derivative of the function f(x) at x = x1 (so named because it is derived from the original function) is the limit of the difference quotient Δy/Δx as x2 approaches x1 (i.e., as Δx → 0):

dy/dx = lim_{Δx→0} Δy/Δx = lim_{Δx→0} [f(x1 + Δx) − f(x1)]/Δx
Figure 2.10 The difference quotient Δy/Δx is the slope of the secant line connecting (x1, y1) and (x2, y2).
Figure 2.11 The derivative is the slope of the tangent line at f(x1). As x2 → x1, the secant line approaches the tangent line.
The derivative is therefore the slope of the tangent line to the curve f(x) at x = x1, as shown in Figure 2.11.
The following alternative notation is often used for the derivative:

dy/dx = df(x)/dx = (d/dx) f(x) = f′(x)
The last form, f′(x), emphasizes that the derivative is itself a function of x, but the notation employing the differentials dy and dx, which may be thought of as infinitesimally small values that are nevertheless nonzero, can be productive: In many circumstances the differentials can be manipulated as if they were numbers. (See, e.g., the ‘chain rule’ for differentiation, introduced in Section 2.3.3.) The operation of finding the derivative of a function is called differentiation.
2.3.1 The Derivative as the Limit of the Difference Quotient: An Example
Given the function y = f(x) = x^2, find the derivative f′(x) for any value of x:
Applying the definition of the derivative as the limit of the difference quotient,

f′(x) = lim_{Δx→0} [f(x + Δx) − f(x)]/Δx
      = lim_{Δx→0} [(x + Δx)^2 − x^2]/Δx
      = lim_{Δx→0} (x^2 + 2xΔx + Δx^2 − x^2)/Δx
      = lim_{Δx→0} (2x + Δx)
      = 2x

Notice that division by Δx is justified here, because although Δx approaches 0 in the limit, it never is exactly equal to 0. For example, the slope of the curve y = f(x) = x^2 at x = 3 is f′(3) = 2 × 3 = 6.
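The convergence of the difference quotient to the derivative can be seen directly; a Python sketch for f(x) = x^2 at x = 3:

```python
def f(x):
    return x**2

def difference_quotient(x, dx):
    return (f(x + dx) - f(x)) / dx    # slope of the secant line

# as dx -> 0, the secant slope approaches the tangent slope f'(3) = 6
for dx in (1.0, 0.1, 0.01, 0.001):
    print(dx, difference_quotient(3.0, dx))
```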
2.3.2 Derivatives of Powers
More generally, by similar reasoning, the derivative of

y = f(x) = ax^n

is

dy/dx = anx^(n−1)

For example, the derivative of the function

y = 3x^4

is

dy/dx = 3 × 4 × x^3 = 12x^3

Moreover, this rule applies as well to negative powers and to fractional powers. For example, the derivative of the function

y = 1/x = x^(−1)

is

dy/dx = −x^(−2) = −1/x^2

and the derivative of the function

y = √x = x^(1/2)

is

dy/dx = ½x^(−1/2) = 1/(2√x)
2.3.3 Rules for Manipulating Derivatives
Suppose that a function is the sum of two other functions:

h(x) = f(x) + g(x)

The addition rule for derivatives follows from the addition rule for limits:

h′(x) = f′(x) + g′(x)

For example,

d(2x^2 + 3x + 4)/dx = 4x + 3
Notice that the derivative of a constant—the constant 4 in the last example—is 0, because the constant can be expressed as

c = cx^0

and, therefore, dc/dx = c × 0 × x^(−1) = 0.
This result makes sense geometrically: A constant is represented as a horizontal line in the {x, y} plane, and a horizontal line has a slope of 0.
The addition rule, therefore, along with the result that d(ax^n)/dx = anx^(n−1), serves to differentiate any polynomial function.
Multiplication and division are more complex. The multiplication rule for derivatives: If h(x) = f(x)g(x), then

h′(x) = f(x)g′(x) + f′(x)g(x)
The division rule for derivatives: If h(x) = f(x)/g(x), then

h′(x) = [f′(x)g(x) − f(x)g′(x)] / [g(x)]^2
For example, the derivative of the function

h(x) = x^2(x + 1)

is

h′(x) = x^2 × 1 + 2x × (x + 1) = 3x^2 + 2x

and the derivative of the function

h(x) = x/(x + 1)

is

h′(x) = [1 × (x + 1) − x × 1]/(x + 1)^2 = 1/(x + 1)^2
The chain rule: If y = f(z) and z = g(x), then y is indirectly a function of x:

y = f[g(x)]

The derivative of y with respect to x is

dy/dx = dy/dz × dz/dx

as if the differential dz in the numerator and the denominator can be cancelled.4
For example, given the function

y = (x^2 + 3x + 6)^5

find the derivative dy/dx of y with respect to x:
This problem could be solved by expanding the power—that is, by multiplying the expression in parentheses by itself five times—but that would be tedious in the extreme. It is much easier to find the derivative by using the chain rule, introducing a new variable, z, to represent the expression inside the parentheses. Let

z = g(x) = x^2 + 3x + 6

Then

y = f(z) = z^5

Differentiating y with respect to z, and z with respect to x, produces

dy/dz = 5z^4
dz/dx = 2x + 3
4. The differentials are not ordinary numbers, so thinking of the chain rule as simultaneously dividing and multiplying by the differential dz is a heuristic device, illustrating how the notation for the derivative using differentials proves to be productive.
Applying the chain rule,

dy/dx = dy/dz × dz/dx = 5z^4(2x + 3)

Finally, substituting for z,

dy/dx = 5(x^2 + 3x + 6)^4(2x + 3)
The use of the chain rule in this example is typical, introducing an ‘artificial’ variable (z) to simplify the structure of the problem.
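The chain-rule result can be checked against a numerical derivative; the sketch below uses y = (x^2 + 3x + 6)^5 as the illustrative function:

```python
def y(x):
    return (x**2 + 3*x + 6) ** 5

def dydx(x):
    # chain rule: dy/dz * dz/dx, with z = x^2 + 3x + 6
    return 5 * (x**2 + 3*x + 6) ** 4 * (2*x + 3)

# central-difference approximation of the derivative at x = 1
x, h = 1.0, 1e-6
numeric = (y(x + h) - y(x - h)) / (2 * h)
print(dydx(x), round(numeric))   # both are approximately 250000
```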
2.3.4 Derivatives of Logs and Exponentials
Logarithms and exponentials often occur in statistical applications, and so it is useful to know how to differentiate these functions.
The derivative of the log function y = loge(x) is

d loge(x)/dx = 1/x
Recall that loge is the natural-log function, that is, log to the base e ≈ 2.718. Indeed, the simplicity of its derivative is one of the reasons that it is ‘natural’ to use the base e for the natural logs.
The derivative of the exponential function y = e^x is

d(e^x)/dx = e^x
The derivative of the exponential function y = a^x for any constant a (i.e., not necessarily e) is

d(a^x)/dx = a^x loge(a)
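These three derivatives can be spot-checked with central finite differences (a Python sketch; the x-values are arbitrary):

```python
import math

def numeric_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    assert math.isclose(numeric_deriv(math.log, x), 1 / x, rel_tol=1e-6)        # d loge(x)/dx = 1/x
    assert math.isclose(numeric_deriv(math.exp, x), math.exp(x), rel_tol=1e-6)  # d e^x/dx = e^x

a = 2.0
f = lambda x: a ** x
assert math.isclose(numeric_deriv(f, 1.0), f(1.0) * math.log(a), rel_tol=1e-6)  # d a^x/dx = a^x loge(a)
print("derivative rules confirmed")
```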
2.3.5 Derivatives of the Basic Trigonometric Functions
The derivatives of the basic trigonometric functions are as follows, with the angle x measured in radians:

d sin(x)/dx = cos(x)
d cos(x)/dx = −sin(x)
d tan(x)/dx = 1/cos^2(x)
Note that cos^2(x) ≡ (cos x)^2.
2.3.6 Second-Order and Higher-Order Derivatives
Because derivatives are themselves functions, they can be differentiated. The second derivative of the function y = f(x) is therefore defined as

f″(x) ≡ d^2y/dx^2 = d(dy/dx)/dx
Notice the alternative notation.
Likewise, the third derivative of the function y = f(x) is the derivative of the second derivative,

f‴(x) ≡ d^3y/dx^3 = d(d^2y/dx^2)/dx
and so on for higher-order derivatives.
For example, the function

y = f(x) = x^4

has the derivatives

f′(x) = 4x^3
f″(x) = 12x^2
f‴(x) = 24x
f⁗(x) = 24
f⁽⁵⁾(x) = 0

All derivatives beyond the fifth order are also 0.
2.4 Optimization
An important application of derivatives, both in statistics and more generally, is to maximization and minimization problems: that is, finding maximum and minimum values of functions (e.g., maximum-likelihood estimation; least-squares estimation). Such problems are collectively called optimization.
Figure 2.12 The derivative (i.e., the slope) of the function is 0 where the function f(x) is at a minimum or maximum.
As illustrated in Figure 2.12, when a function is at a relative (local) maximum or relative minimum (i.e., a value higher than or lower than surrounding values) or at an absolute or global maximum or minimum (a value at least as high or low as all other values of the function), the tangent line to the function is flat, and hence the function has a derivative of 0 at that point. A function can also have a 0 derivative, however, at a point that is neither a minimum nor a maximum, such as at a point of inflection—that is, a point where the direction of curvature changes, as in Figure 2.13. Points at which the derivative is 0 are called stationary points.
To distinguish among the three cases—minimum, maximum, or neither—we can appeal to the value of the second derivative (see Figure 2.14).
- At a minimum, the first derivative ƒ′(x) is changing from negative, through 0, to positive—that is, the first derivative is increasing, and therefore the second derivative ƒ″(x) is positive: The second derivative indicates change in the first derivative just as the first derivative indicates change in the original function.
- At a maximum, the first derivative ƒ′(x) is changing from positive, through 0, to negative—the first derivative is decreasing, and therefore the second derivative ƒ″(x) is negative.
- At a point of inflection, ƒ″(x) = 0.
Figure 2.13 The derivative is also 0 at a point of inflection in f(x).
Figure 2.14 The first derivative (the slope of the function) is increasing where the function f(x) is at a minimum and decreasing at a maximum.
The relationships among the original function, the first derivative, and the second derivative are illustrated in Figure 2.15: The first derivative dy/dx is 0 at the two minima and at the (relative) maximum of f(x); the second derivative d2y/dx2 is positive at the two minima, and negative at the maximum of f(x).
2.4.1 Optimization: An Example
Find the extrema (minima and maxima) of the function

f(x) = 2x^3 − 9x^2 + 12x + 6
Figure 2.15 An example of a function and its first and second derivatives.
The function is shown in Figure 2.16. By the way, locating stationary points and determining whether they are minima or maxima (or neither) is helpful in graphing functions.
Figure 2.16 Finding the extrema of the function f(x) = 2x^3 − 9x^2 + 12x + 6.
The first and second derivatives of the function are

f′(x) = 6x^2 − 18x + 12
f″(x) = 12x − 18

Setting the first derivative to 0, and solving for the values of x that satisfy the resulting equation, produces the following results:

6x^2 − 18x + 12 = 0
x^2 − 3x + 2 = 0
(x − 2)(x − 1) = 0

The two roots, at which f′(x) is 0, are therefore x = 2 and x = 1.
- For x = 2,

f″(2) = 12 × 2 − 18 = 6

Because f″(2) is positive, the point (2, 10) represents a (relative) minimum.
- Likewise, for x = 1,

f″(1) = 12 × 1 − 18 = −6

Because f″(1) is negative, the point (1, 11) represents a (relative) maximum.
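This example is easy to confirm in code. The Python sketch below uses the cubic consistent with the stationary points (2, 10) and (1, 11), f(x) = 2x^3 − 9x^2 + 12x + 6:

```python
def f(x):
    return 2*x**3 - 9*x**2 + 12*x + 6

def f1(x):
    return 6*x**2 - 18*x + 12     # first derivative

def f2(x):
    return 12*x - 18              # second derivative

# both roots of f'(x) = 6(x - 1)(x - 2) = 0 are stationary points
assert f1(1.0) == 0 and f1(2.0) == 0
# second-derivative test: maximum at (1, 11), minimum at (2, 10)
assert f2(1.0) < 0 and f(1.0) == 11.0
assert f2(2.0) > 0 and f(2.0) == 10.0
print("maximum at (1, 11); minimum at (2, 10)")
```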
2.5 Multivariable and Matrix Differential Calculus
Multivariable differential calculus—the topic of this section—finds frequent application in statistics. The essential ideas of multivariable calculus are straightforward extensions of calculus of a single independent variable, but the topic is frequently omitted from introductory treatments of calculus.
2.5.1 Partial Derivatives
Consider a function y = f(x1, x2, …, xn) of several independent variables. The partial derivative of y with respect to a particular xi is the derivative of f(x1, x2, …, xn) treating the other xs as constants. To distinguish it from the ordinary derivative dy/dx, the standard notation for the partial derivative uses ‘curly ds’ in place of ds: ∂y/∂xi.
For example, for the function

y = f(x1, x2) = x1^2 + x1x2 + x2^2 + 10

the partial derivatives with respect to x1 and x2 are

∂y/∂x1 = 2x1 + x2
∂y/∂x2 = x1 + 2x2
The ‘trick’ in partial differentiation with respect to xi is to remember to treat all of the other xs as constants (i.e., literally to hold the other xs constant). Thus, when we differentiate with respect to x1, terms such as x2^2 are constants.
The partial derivative ∂f(x1, x2, …, xn)/∂x1 gives the slope of the tangent hyperplane to the function f(x1, x2, …, xn) in the direction of x1.5 For example, the tangent plane to the function

y = f(x1, x2) = x1^2 + x1x2 + x2^2 + 10

above the pair of values x1 = 1, x2 = 2 is shown in Figure 2.17.
At a local or global minimum or maximum, the slope of the tangent hyperplane is 0 in all directions. Consequently, to minimize or maximize a function of several variables, we have to differentiate the function with respect to each variable, set the partial derivatives to 0, and solve the resulting set of simultaneous equations. I will explain in Section 2.5.3 how to distinguish maxima from minima.
5. A hyperplane is the generalization of a linear (i.e., flat) surface to a space of more than three dimensions. The dimension of the hyperplane is one less than that of the enclosing space, just as a plane is a two-dimensional object embedded in a three-dimensional space.
Figure 2.17 The function y = f(x1, x2) = x1^2 + x1x2 + x2^2 + 10, showing the tangent plane at x1 = 1, x2 = 2.
Let us, for example, find the values of x1 and x2 that minimize the function

y = f(x1, x2) = x1^2 + x1x2 + x2^2 + 10

Differentiating,

∂y/∂x1 = 2x1 + x2
∂y/∂x2 = x1 + 2x2

Setting these partial derivatives to 0 produces the unique solution x1 = 0, x2 = 0. In this case, the solution is particularly simple because the partial derivatives are linear functions of x1 and x2. The value of the function at its minimum is

y = 0^2 + 0 × 0 + 0^2 + 10 = 10
The slopes of the tangent plane above the pair of values x1 = 1, x2 = 2, illustrated in Figure 2.17, are

∂y/∂x1 = 2 × 1 + 2 = 4
∂y/∂x2 = 1 + 2 × 2 = 5
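Partial derivatives can likewise be approximated numerically by varying one variable while literally holding the other fixed; a Python sketch for the function above:

```python
def f(x1, x2):
    return x1**2 + x1*x2 + x2**2 + 10

h = 1e-6
x1, x2 = 1.0, 2.0
d1 = (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)   # x2 held constant
d2 = (f(x1, x2 + h) - f(x1, x2 - h)) / (2 * h)   # x1 held constant

assert abs(d1 - (2*x1 + x2)) < 1e-6   # analytic: 2*1 + 2 = 4
assert abs(d2 - (x1 + 2*x2)) < 1e-6   # analytic: 1 + 2*2 = 5
print(round(d1), round(d2))           # 4 5
```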
2.5.2 Lagrange Multipliers for Constrained Optimization
The method of Lagrange multipliers (named after the 18th-century French mathematician Joseph-Louis Lagrange) permits us to optimize a function of the form y = f(x1, x2, …, xn) subject to a constraint of the form g(x1, x2, …, xn) = 0. The method, in effect, incorporates the constraint into the set of partial derivatives.
Here is a simple example: Minimize

y = f(x1, x2) = x1^2 + x2^2

subject to the restriction that x1 + x2 = 1. (In the absence of this restriction, it is obvious that x1 = x2 = 0 minimizes the function.) To solve this constrained minimization problem:
- Rewrite the constraint in the required form, g(x1, x2, …, xn) = 0. That is, x1 + x2 − 1 = 0.
- Construct a new function incorporating the constraint. In the general case, this function takes the form6

h(x1, x2, …, xn, λ) ≡ f(x1, x2, …, xn) − λ × g(x1, x2, …, xn)

The new independent variable λ is called a Lagrange multiplier. For the example,

h(x1, x2, λ) = x1^2 + x2^2 − λ(x1 + x2 − 1)

- Find the values of x1, x2, …, xn that (along with λ) optimize the function h(x1, x2, …, xn, λ). That is, differentiate h(x1, x2, …, xn, λ) with respect to each of x1, x2, …, xn and λ; set the n + 1 partial derivatives to 0; and solve the resulting system of simultaneous equations for x1, x2, …, xn and λ. For the example,

∂h/∂x1 = 2x1 − λ
∂h/∂x2 = 2x2 − λ
∂h/∂λ = −(x1 + x2 − 1)
6. Some authors prefer to add, rather than subtract, the constraint,
h(x1, x2, …, xn, λ) ≡ f(x1, x2, …, xn) + λ × g(x1, x2, …, xn)
but, except for a change in the sign of λ, the two approaches are equivalent.
Notice that the partial derivative with respect to λ, when equated to 0, reproduces the constraint x1 + x2 = 1. Consequently, whatever solutions satisfy the equations produced by setting the partial derivatives to 0, necessarily satisfy the constraint. In this case, there is only one solution: x1 = x2 = 0.5 (and λ = 1).
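A small Python check that this solution satisfies all three first-order conditions, and that no other point on the constraint does better:

```python
x1 = x2 = 0.5
lam = 1.0

# first-order conditions for h(x1, x2, lambda) = x1^2 + x2^2 - lambda*(x1 + x2 - 1)
assert 2*x1 - lam == 0            # dh/dx1 = 0
assert 2*x2 - lam == 0            # dh/dx2 = 0
assert x1 + x2 - 1 == 0           # dh/dlambda = 0 reproduces the constraint

# every other point satisfying x1 + x2 = 1 gives a larger value of x1^2 + x2^2
best = x1**2 + x2**2              # = 0.5
for t in (0.0, 0.1, 0.25, 0.75, 0.9, 1.0):
    assert t**2 + (1 - t)**2 > best
print("constrained minimum:", best)
```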
The method of Lagrange multipliers easily extends to handle several restrictions, by introducing a separate Lagrange multiplier for each restriction.
2.5.3 Differential Calculus in Matrix Form
The function y = f(x1, x2, …, xn) of the independent variables x1, x2, …, xn can be written as the function y = f(x) of the vector x = [x1, x2, …, xn]′. The vector partial derivative (or the gradient) of y with respect to x is defined as the column vector of partial derivatives of y with respect to each of the entries of x:

∂y/∂x = [∂y/∂x1, ∂y/∂x2, …, ∂y/∂xn]′
If, therefore, y is a linear function of x,

y = a′x = a1x1 + a2x2 + ··· + anxn

then ∂y/∂xi = ai, and ∂y/∂x = a. For example, for

y = x1 + 3x2 − 5x3

the vector partial derivative is

∂y/∂x = [1, 3, −5]′
Alternatively, suppose that y is a quadratic form in x (see Section 1.6),

y = x′Ax

where the matrix A is symmetric. Expanding the matrix product gives us

y = Σ_i Σ_j aij xi xj

and, thus,

∂y/∂xi = 2Σ_j aij xj = 2a′i x

where a′i represents the ith row of A. Placing these partial derivatives in a vector produces ∂y/∂x = 2Ax. The vector partial derivatives of linear and quadratic functions are strikingly similar to the analogous scalar derivatives of functions of one variable: d(ax)/dx = a and d(ax^2)/dx = 2ax.
For example, for the quadratic form in two variables with symmetric matrix A having rows [1, 2] and [2, 5],

y = x′Ax = x1^2 + 4x1x2 + 5x2^2

the partial derivatives are

∂y/∂x1 = 2x1 + 4x2
∂y/∂x2 = 4x1 + 10x2

and the vector partial derivative is

∂y/∂x = [2x1 + 4x2, 4x1 + 10x2]′ = 2Ax
The so-called Hessian matrix of second-order partial derivatives of the function y = f(x) is defined in the following manner: ∂^2y/∂x ∂x′ is the n × n matrix whose i, jth entry is ∂^2y/∂xi ∂xj.
For instance, ∂^2(x′Ax)/∂x ∂x′ = 2A, for a symmetric matrix A. The Hessian is named after the 19th-century German mathematician Ludwig Otto Hesse.
To minimize a function y = f(x) of several variables, we can set the vector partial derivative to 0, ∂y/∂x = 0, and solve the resulting set of simultaneous equations for x, obtaining a solution x∗. This solution represents a (local) minimum of the function in question if the Hessian matrix evaluated at x = x∗ is positive definite. The solution represents a maximum if the Hessian is negative definite.7 Again, there is a strong parallel with the scalar results for a single x: Recall that the second derivative d2y/dx2 is positive at a minimum and negative at a maximum.
I showed earlier that the function

y = f(x1, x2) = x1^2 + x1x2 + x2^2 + 10

has a stationary point (i.e., a point at which the partial derivatives are 0) at x1 = x2 = 0. The second-order partial derivatives of this function are

∂^2y/∂x1^2 = 2
∂^2y/∂x2^2 = 2
∂^2y/∂x1 ∂x2 = ∂^2y/∂x2 ∂x1 = 1

The Hessian evaluated at x1 = x2 = 0 (or, indeed, at any point) is, therefore, the matrix with rows [2, 1] and [1, 2].
7. The square matrix H (here, the Hessian) is positive definite if x′Hx > 0 for any nonzero vector x. (See Section 1.6.) A positive-definite Hessian is a sufficient but not necessary condition for a minimum. Likewise, the square matrix H is negative definite if x′Hx < 0 for any nonzero vector x; a negative-definite Hessian is a sufficient but not necessary condition for a maximum.
This matrix is clearly positive definite, verifying that the value y = 10 at x1 = x2 = 0 is a minimum of f(x1, x2).
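Positive definiteness of this 2 × 2 Hessian is easy to verify directly, both from the leading principal minors and from the definition x′Hx > 0 (a Python sketch; the test vectors are arbitrary):

```python
H = [[2, 1],
     [1, 2]]

# leading principal minors are positive: 2 > 0 and det(H) = 3 > 0
assert H[0][0] > 0
det = H[0][0]*H[1][1] - H[0][1]*H[1][0]
assert det == 3

# spot-check the quadratic form x'Hx for a few nonzero vectors
for x in ((1, 0), (0, 1), (1, -1), (-2, 3)):
    q = H[0][0]*x[0]**2 + 2*H[0][1]*x[0]*x[1] + H[1][1]*x[1]**2
    assert q > 0
print("Hessian is positive definite")
```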
2.6 Taylor Series
If a function f(x) has infinitely many derivatives (most of which may, however, be zero) near the value x = x0, then the function can be decomposed into the Taylor series

f(x) = f(x0) + f′(x0)(x − x0) + [f″(x0)/2!](x − x0)^2 + ··· + [f^(n)(x0)/n!](x − x0)^n + ···   (2.1)

where f^(n) represents the nth-order derivative of f, and n! is the factorial of n.8 Taylor series are named after the 18th-century British mathematician Brook Taylor.
As long as x is sufficiently close to x0, and as long as the function f(·) is sufficiently well behaved, f(x) may be approximated by taking only the first few terms of the Taylor series. For example, if the function is nearly quadratic between x and x0, then f(x) can be approximated by the first three terms of the Taylor expansion, because the remaining derivatives will be small; similarly, if the function is nearly linear between x and x0, then f(x) can be approximated by the first two terms.
To illustrate the application of Taylor series, consider the cubic function

f(x) = x^3

Then

f′(x) = 3x^2
f″(x) = 6x
f‴(x) = 6
f^(n)(x) = 0 for n ≥ 4
8. The factorial of a non-negative integer n is defined as n! ≡ n(n − 1)(n − 2) … (2)(1); by convention, 0! and 1! are both taken to be 1.
Let us take x0 = 2; evaluating the function and its derivatives at this value of x,

f(2) = 8, f′(2) = 12, f″(2) = 12, f‴(2) = 6

Finally, let us evaluate f(x) at x = 4 using the Taylor-series expansion of the function around x0 = 2:

f(4) = f(2) + f′(2)(4 − 2) + [f″(2)/2!](4 − 2)^2 + [f‴(2)/3!](4 − 2)^3
     = 8 + 12 × 2 + (12/2) × 4 + (6/6) × 8
     = 8 + 24 + 24 + 8 = 64

Checking by evaluating the function directly,

f(4) = 4^3 = 64
In this case, using fewer than all four terms would produce a poor approximation (because, of course, the function is cubic).
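Summing the Taylor terms programmatically makes the same point; this sketch assumes the illustrative cubic f(x) = x^3 expanded around x0 = 2:

```python
from math import factorial

# f(x) = x^3 and its derivatives f', f'', f'''
derivs = [
    lambda x: x**3,      # f
    lambda x: 3*x**2,    # f'
    lambda x: 6*x,       # f''
    lambda x: 6.0,       # f''' (all higher derivatives are 0)
]

x0, x = 2.0, 4.0
terms = [derivs[n](x0) / factorial(n) * (x - x0)**n for n in range(4)]
print(terms)                # [8.0, 24.0, 24.0, 8.0]
assert sum(terms) == x**3   # the four terms reproduce f(4) = 64 exactly
```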
Taylor-series expansions and approximations generalize to functions of several variables, most simply when the function is scalar-valued and when we can use a first- or second-order approximation. Suppose that y = f(x1, x2, …, xn) = f(x), and that we want to approximate f(x) near the value x = x0. Then the second-order Taylor-series approximation of f(x) is

f(x) ≈ f(x0) + g(x0)′(x − x0) + ½(x − x0)′H(x0)(x − x0)

where g(x) ≡ ∂y/∂x and H(x) ≡ ∂^2y/∂x ∂x′ are, respectively, the gradient and Hessian of f(x), both evaluated at x0. Notice the strong analogy to the first three terms of the scalar Taylor expansion, given in Equation 2.1.
Figure 2.18 The area A under a function f(x) between x0 and x1.
Figure 2.19 Approximating the area under a curve by summing the areas of rectangles.
2.7 Essential Ideas of Integral Calculus
2.7.1 Areas: Definite Integrals
Consider the area A under a curve f(x) between two horizontal coordinates, x0 and x1, as illustrated in Figure 2.18. This area can be approximated by dividing the line segment between x0 and x1 into n small intervals, each of width Δx, and constructing a series of rectangles just touching the curve, as shown in Figure 2.19. The x-values defining the rectangles are

x0, x0 + Δx, x0 + 2Δx, …, x0 + nΔx = x1
Consequently, the combined area of the rectangles is

Σ_{i=0}^{n−1} f(x0 + iΔx) Δx
Figure 2.20 The integral ∫_{a}^{b} f(x)dx is negative because the y values are negative between the limits of integration a and b.
The approximation grows better as the number of rectangles n increases (and Δx grows smaller). In the limit,9

A = lim_{n→∞} Σ_{i=0}^{n−1} f(x0 + iΔx) Δx

The following notation is used for this limit, which is called the definite integral of f(x) from x = x0 to x1:

A = ∫_{x0}^{x1} f(x) dx
Here, x0 and x1 give the limits of integration, while the differential dx is the infinitesimal remnant of the width of the rectangles Δx. The symbol for the integral, ∫, is an elongated ‘S,’ indicative of the interpretation of the definite integral as the continuous analog of a sum.
The definite integral defines a signed area, which may be negative if (some) values of y are less than 0, as illustrated in Figure 2.20.
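The rectangle construction translates directly into code. The sketch below uses f(x) = x^2 on [0, 1] (an arbitrary example) and shows the sum approaching the true area of 1/3 as n grows:

```python
def f(x):
    return x**2          # an arbitrary example curve

x0, x1 = 0.0, 1.0
for n in (10, 100, 1000, 10000):
    dx = (x1 - x0) / n
    # left-endpoint rectangles, as in Figure 2.19
    area = sum(f(x0 + i*dx) for i in range(n)) * dx
    print(n, area)       # approaches 1/3 as n increases and dx shrinks
```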
2.7.2 Indefinite Integrals
Suppose that for the function f(x), there exists a function F(x) such that

dF(x)/dx = f(x)

That is, f(x) is the derivative of F(x). Then F(x) is called an antiderivative or indefinite integral of f(x).
9. This approach, called the method of exhaustion (though not the formal notion of a limit), was known to the ancient Greeks.
The indefinite integral of a function is not unique, for if F(x) is an antiderivative of f(x), then so is G(x) = F(x) + c, where c is an arbitrary constant (i.e., not a function of x). Conversely, if F(x) and G(x) are both antiderivatives of f(x), then for some constant c, G(x) = F(x) + c.
For example, for f(x) = x^3, the function ¼x^4 + 10 is an antiderivative of f(x), as are ¼x^4 − 10 and ¼x^4. Indeed, any function of the form F(x) = ¼x^4 + c will do.
The following notation is used for the indefinite integral: If

dF(x)/dx = f(x)

then we write

F(x) = ∫ f(x) dx
where the integral sign appears without limits of integration. That the same symbol is employed for both areas and antiderivatives (i.e., for definite and indefinite integrals), and that both of these operations are called ‘integration,’ are explained in the following section. Notice that while a definite integral—an area—is a particular number, an indefinite integral is a function.
2.7.3 The Fundamental Theorem of Calculus
Newton and Leibniz figured out the connection between antiderivatives and areas under curves. The relationship that they discovered between indefinite and definite integrals is called the fundamental theorem of calculus:

∫_{x0}^{x1} f(x) dx = F(x1) − F(x0)
where F(·) is any antiderivative of f(·).
Here is a nonrigorous proof of this theorem: Consider the area A(x) under the curve f(x) between some fixed value x0 and another (moveable) value x, as shown in Figure 2.21. The notation A(x) emphasizes that the area is a function of x: As we move x left or right, the area A(x) changes. In Figure 2.21, x + Δx is a value slightly to the right of x, and ΔA is the area under the curve between x and x + Δx. A rectangular approximation to this small area is

ΔA ≈ f(x) Δx
The area ΔA is also

ΔA = A(x + Δx) − A(x)
Figure 2.21 The area A(x) under the curve between the fixed value x0 and another value x.
Taking the derivative of A,

dA(x)/dx = lim_{Δx→0} ΔA/Δx = lim_{Δx→0} f(x)Δx/Δx = f(x)
Consequently,

A(x) = ∫ f(x) dx

is a specific, but as yet unknown, indefinite integral of f(x). Let F(x) be some other specific, arbitrary, indefinite integral of f(x). Then A(x) = F(x) + c, for some c (because, as we previously discovered, two indefinite integrals of the same function differ by a constant). We know that A(x0) = 0, because A(x) is the area under the curve between x0 and any x, and the area under the curve between x0 and x0 is 0. Thus,

A(x0) = F(x0) + c = 0
c = −F(x0)

and so A(x) = F(x) − F(x0)
and, for a particular value of x = x1,

A(x1) = ∫_{x0}^{x1} f(x) dx = F(x1) − F(x0)

where (recall) F(·) is an arbitrary antiderivative of f(·).
For example, let us find the area (evaluate the definite integral)

A = ∫_{1}^{3} (x^2 + 3) dx
Figure 2.22 The area A = ∫_{1}^{3} (x^2 + 3)dx.
This area is graphed in Figure 2.22. It is convenient to use10

F(x) = x^3/3 + 3x

Then

A = F(3) − F(1) = (27/3 + 9) − (1/3 + 3) = 18 − 10/3 = 44/3 ≈ 14.67
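The fundamental theorem can be checked numerically by comparing the antiderivative evaluation with a brute-force rectangle sum (a Python sketch):

```python
def f(x):
    return x**2 + 3

def F(x):
    return x**3 / 3 + 3*x    # an antiderivative of f

a, b = 1.0, 3.0
exact = F(b) - F(a)          # 18 - 10/3 = 44/3

# rectangle (Riemann-sum) approximation of the same area
n = 100_000
dx = (b - a) / n
approx = sum(f(a + i*dx) for i in range(n)) * dx

print(round(exact, 4), round(approx, 4))
assert abs(exact - 44/3) < 1e-9
assert abs(approx - exact) < 1e-3
```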
2.8 Recommended Reading
There is an almost incredible profusion of introductory calculus texts, and I cannot claim to have read more than a few of them. Of these, my favorite is Thompson and Gardner (1998). For an extensive treatment of calculus of several variables with a social science (specifically, economic) focus, see Binmore and Davies (2001).
10. Reader: Verify that F(x) is an antiderivative of f(x) = x2 + 3. More generally, one can find antiderivatives of polynomial functions by working the rule for differentiating powers in reverse.