II. Geometry of Functions

Linear Methods of Applied Mathematics

Evans M. Harrell II and James V. Herod*

*(c) Copyright 1994-2000 by Evans M. Harrell II and James V. Herod. All rights reserved.


version of 12 January 2000


Here are a Mathematica notebook and a Maple worksheet performing some of the calculations of this chapter.



II. The geometry of functions.

The other familiar vector operation we shall use, besides sums and scalar multiples, is the dot product, which we abstractly call an inner product. (We won't be concerned with analogues of the cross product, although you may see these in other courses.) You probably learned about the dot product as connected with trigonometry: the dot product of two vectors is the product of their lengths times the cosine of the angle between them:

v.w = |v| |w| cos(angle(v,w))                                         (2.1)

Later you learned that this could be conveniently calculated from the coordinates of the two vectors, by multiplying corresponding components together and summing: if the components of the vector v are v_1, ..., v_n and those of w are w_1, ..., w_n, then

v.w = sum over k of v_k (complex conj of w_k)                               (2.2)

From the abstract point of view it is best not to begin with angle, since we don't have a good intuitive picture of the angle between two functions or matrices. Instead, we will make sense of the "angle" between functions from the connection between these formulae.

In the abstract setting, we shall denote the inner product <v,w> rather than v.w. Given an inner product, we may speak about the length, defined by

    ||v|| = Sqrt(<v,v>).

We shall usually call this the norm rather than the length.

An abstract inner product will share most of the properties you learned about for the ordinary dot product:


1. The inner product is a mapping taking pairs of vectors and producing scalars.

2. Linearity :

    <a_1 w_1 + a_2 w_2, v> = a_1 <w_1, v> + a_2 <w_2, v>,

where the a's are scalars and v, w_1, w_2 are vectors.

Note: some texts define an inner product to be linear on the right side rather than the left side. This makes no practical difference, but if there are complex quantities, you should be careful to be consistent with whichever convention you follow.

3. Symmetry :

    <v,w> = complex conj of <w,v>

(The bar denotes complex conjugate, in case these are complex numbers.)

4. Positivity :

    <v,v> >= 0, and <v,v> = 0 only if v is the zero vector

5. The Cauchy, Schwarz, or Buniakovskii inequality  (depending on your nationality):

    |<v,w>| <= ||v|| ||w||                               (2.3)

6. The triangle inequality :

    ||v+w|| <= ||v|| + ||w||                               (2.4)


We need Property 5 if an inner product is to correlate with Eq. (2.1), and we need Property 6 if the length

    ||v|| = Sqrt(<v,v>)

is to make sense geometrically. We will shortly define an inner product for functions. The reason we can speak of the geometry of functions is that properties 5 and 6 follow automatically from Properties 1-4. Since this is not at all obvious, I shall prove it below.

Definition II.1. Given an abstract vector space, an inner product is any mapping satisfying properties 1-4 above. A vector space with an inner product is called an inner product space.

Examples II.2.

1. The usual dot product, as defined in (2.2).

2. A modified dot product on 2-vectors. A modification of the usual dot product can also produce an inner product defined by

<v,w>_A = sum over j,k from 1 to 2 of v_j A_jk (complex conj of w_k)

where

A = {{2, 1}, {1, 2}},  a 2x2 matrix

Actually, we could consider the vector space of n-vectors and let A be any positive n by n matrix. (By definition, a positive matrix is one for which the inner product so defined satisfies Property 4 of the inner product.)

3. The standard inner product for functions. Consider the set of functions which are continuous on an interval a <= x <= b. Then the standard inner product on them is the integral:
    <f,g> = integral from a to b of f(x) (complex conj of g(x)) dx
Another name for this is the L2 inner product. We can use it for functions with discontinuities and even singularities, so long as the singularities are not too strong. The most general set of functions, defined on a set Omega, for which this inner product is defined is known as the set of square-integrable functions, denoted L2(Omega).
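As a quick numerical illustration (a Python sketch using only the standard library; the midpoint-rule integrator and the name `inner` are inventions of this example, not part of the text's software), the standard inner product can be approximated by discretizing the integral:

```python
def inner(f, g, a, b, n=2000):
    """Approximate <f,g> = integral_a^b f(x) g(x) dx by the midpoint rule.
    (Real-valued functions here, so the complex conjugate is a no-op.)"""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

# <1, x> on [0, 2]: the exact value is 2^2/2 = 2
print(inner(lambda x: 1.0, lambda x: x, 0.0, 2.0))  # -> 2.0 (up to rounding)
```

The same routine works for any pair of continuous real functions on [a,b]; only the step count n controls the accuracy.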

Some fancier examples of inner products.

Theorem II.3. If V is an inner product space, then the CSB inequality (Property 5) and the triangle inequality (Property 6) hold.

Proof (don't worry - it won't hurt you!)

The most familiar inner product space is R^n, especially with n = 2 or 3, and a brief review of some vector operations is available as a Mathematica notebook.

Having the CSB inequality in hand, we may now define a strange but useful idea - the angle between two functions. Consider, for example, the interval 0<=x<=L and the functions f(x) = 1 and g(x) = x. With the standard inner product, we first calculate their "L2" norms:

    ||1|| = Sqrt(L)   and   ||x|| = L^(3/2)/Sqrt(3)

Since their inner product is

    <1,x>= L^2 /2,

the cosine of the angle between the two functions must be

    Sqrt(3) /2.

Thus the "angle" between the functions 1 and x is pi/6 radians. The most useful angle to deal with, however, is a right angle:
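This computation is easy to check numerically (a Python sketch; the midpoint-rule integrator is a homemade helper written for this example):

```python
import math

def inner(f, g, a, b, n=4000):
    # midpoint-rule approximation to <f,g> on [a,b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

L = 2.0  # the cosine of the angle does not depend on L
one, ident = (lambda x: 1.0), (lambda x: x)
cos_angle = inner(one, ident, 0, L) / math.sqrt(inner(one, one, 0, L)
                                                * inner(ident, ident, 0, L))
print(cos_angle)  # close to sqrt(3)/2 = 0.866..., i.e. an angle of pi/6
```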

Definition II.4. Two functions f and g are said to be orthogonal if <f,g> = 0. A set of functions {f_j} is orthogonal if <f_j,f_k> = 0 whenever j differs from k. The set is said to be orthonormal if it is orthogonal and ||f_j|| = 1 for all j.

With the Kronecker delta symbol, delta_jk = 0 when j differs from k, and delta_kk = 1, orthonormality can be expressed as <f_j,f_k> = delta_jk.

Examples II.5.

1. Integral tables, mathematical software, integration by parts (twice), substitution with the cosine angle-sum rule, and rewriting trigonometric functions as complex exponentials can all be used to evaluate integrals such as

    integral from 0 to L of sin(m Pi x/L) sin(n Pi x/L) dx.

Any or all of these methods will lead to the same conclusion, viz.:

    integral from 0 to L of sin(m Pi x/L) sin(n Pi x/L) dx = (L/2) delta_mn                               (2.5)

The set of functions

    sin(m Pi x/L)

is orthogonal on the interval [0,L], and to turn it into an orthonormal set, we normalize the functions by multiplying by the appropriate constant, Sqrt(2/L):

    Sqrt(2/L) sin(m Pi x/L)                               (2.6)
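A numerical spot-check of the orthonormality relation (again a Python sketch with a homemade midpoint-rule integrator; nothing here is part of the text's own software):

```python
import math

def inner(f, g, a, b, n=4000):
    # midpoint-rule approximation to <f,g> on [a,b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

L = 1.0
def s(m):  # the normalized sine Sqrt(2/L) sin(m Pi x/L)
    return lambda x: math.sqrt(2 / L) * math.sin(m * math.pi * x / L)

# <s_m, s_n> should be the Kronecker delta
for m in (1, 2, 3):
    print([round(inner(s(m), s(n), 0, L), 6) for n in (1, 2, 3)])
```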

2. Similarly,

    Sqrt(2/L) cos(m Pi x/L)                               (2.7)

is orthonormal on the interval [0,L], and we can even include another function, the constant:

    1/Sqrt(L) along with the cosines (but not the sines).

3. We can mix the previous two sets to have both sines and cosines, as long as we leave out all of the odd frequencies (keeping only cos(2 m Pi x/L) and sin(2 m Pi x/L)):

    1/Sqrt(L),   Sqrt(2/L) cos(2 m Pi x/L),   Sqrt(2/L) sin(2 m Pi x/L)                 (2.8)

is also an orthonormal set. This one is the basis of the usual Fourier series, and is perhaps the most important of all our orthonormal sets. By the way, we do not claim that the functions in (2.6), (2.7), and (2.8) are orthogonal to functions in the other sets, but only separately among themselves. For instance,

sin(Pi x /L) is not orthogonal to 1/Sqrt(L).

4. Recall that by Euler's formula, exp(i alpha) := e^(i alpha) := cos(alpha) + i sin(alpha). The complex trigonometric functions

    exp(2 Pi i n x/L)

have many useful properties, including

    |exp(2 Pi i n x/L)| = 1 for all x

and

    (1/Sqrt(L)) exp(2 Pi i n x/L),   -infinity < n < infinity                     (2.9)

is an orthonormal set on [0,L].
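Orthonormality of the complex exponentials can be checked with a complex version of the same numerical inner product (a Python sketch; note that the conjugate now matters, in keeping with Property 3):

```python
import cmath, math

def inner_c(f, g, a, b, n=4000):
    # complex L2 inner product: integral of f(x) * conj(g(x)), midpoint rule
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h).conjugate()
               for k in range(n)) * h

L = 1.0
def e(m):  # (1/Sqrt(L)) exp(2 Pi i m x/L)
    return lambda x: cmath.exp(2j * math.pi * m * x / L) / math.sqrt(L)

print(abs(inner_c(e(2), e(2), 0, L)))   # should be close to 1
print(abs(inner_c(e(2), e(-1), 0, L)))  # should be close to 0
```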

For later purposes, we observe here that the sets of functions in (2.6)-(2.9) are each orthogonal on any interval [a,b] with L = b-a.

Before finishing this section we need two more notions about vectors and functions, thought of as abstract vectors. The first is distance. With the standard inner product, we would like to define the distance between two functions f and g as the L2 norm of f - g:

    ||f-g||^2 := integral from a to b of |f(x) - g(x)|^2 dx

This turns out to be essentially a familiar quantity in data analysis, the root-mean-square, or r.m.s., deviation (which is this norm divided by the square root of the length of the interval). It is a convenient way of specifying how large the error is when the true function f is replaced by a mathematical approximation or experimentally measured function g. It is always positive unless f = g almost everywhere.

Definition II.6. Almost everywhere is a technical phrase meaning that f and g differ for sufficiently few values of x that all integrals involving f have the same values as those involving g. Such a negligible set is called a null set. For most practical purposes we may regard f and g as the same functions, and write:

    f = g a.e.

A typical example is where a function is discontinuous at a point and some arbitrary decision is made about the value of the function at the discontinuity. Different choices are still equal a.e. Much more exotic possibilities than this typical example can occur, but will not arise in this course.

The second notion we generalize from ordinary vectors is that of projection. Suppose that we have a position vector in three dimensions, such as v = (3, 4, 5).

Model Problem II.7. We wish to find the vector in the plane spanned by (1,1,1) and (1,-1,0), which is the closest to v = (3,4, 5) .

Solution. We solve this problem with a projection. The projection of a vector v onto v1 is given by the formula:

    P_{v_1} (v) = (<v,v_1>/||v_1||^2) v_1                               (2.10)

Notice that this points in the direction of v1 but has a length equal to ||v|| cos(theta), where theta is the angle between v and v1. The length of v1 has nothing to do with the result of this projection - if we were being very careful, we would say that we were projecting v onto the direction determined by v1, or onto the line through v1. For similar reasons we notice that the vector v1 could be normalized to have length 1, so that the denominator can be ignored - it is 1. In our example,

    the proj of (3,4,5) in the direction (1,1,1) is (4,4,4).

(Here we write the vectors as column vectors because the projection operator is equivalent to a 3 by 3 matrix multiplying them.)


If the basis for the plane consists of orthogonal vectors v1 and v2, as in our example, then the projection into the plane is just the sum of the projections onto the two vectors:

    P_{v_1,v_2}(v) = sum over n of (<v,v_n>/||v_n||^2) v_n                               (2.11)

In our example,

    the result is (7/2,9/2,4).
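The two projections can be verified with a few lines of code (a Python sketch; the helper names `dot` and `proj` are inventions of this example, following formulas (2.10) and (2.11)):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def proj(v, w):
    # projection of v onto the direction of w, as in (2.10)
    c = dot(v, w) / dot(w, w)
    return [c * x for x in w]

v = (3, 4, 5)
p1 = proj(v, (1, 1, 1))    # the projection onto (1,1,1): (4, 4, 4)
p2 = proj(v, (1, -1, 0))   # the projection onto (1,-1,0)
print([a + b for a, b in zip(p1, p2)])  # -> [3.5, 4.5, 4.0], i.e. (7/2, 9/2, 4)
```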

Calculations like these are easily done with software; see the Mathematica notebook or the Maple worksheet.

Model Problem II.8. We wish to find a) the vector in the plane spanned by (1,1,1) and (1,2,3) which is the closest to v = (3, 4, 5), and b) the vector in the plane closest to (1,-1,0).

Solution. This is similar to the previous problem, except that the vectors defining the plane are not orthogonal. We need to replace them with a different pair of vectors, which are linear combinations of the first, but which are orthogonal. (We'll do this later systematically, with the Gram-Schmidt method.) The formula (2.11) is definitely wrong if the vectors vn are not orthogonal (problem II.8). After finding the new pair of vectors, however, the solution will be as before - just sum the projections onto the orthogonal basis vectors.

There is more than one good choice for the pair of orthogonal basis vectors. If we solve the vector equation

    (1,1,1) dot ( (1,2,3) - alpha (1,1,1)) = 0

for alpha we get alpha = 2, so a suitable second vector which is orthogonal to (1,1,1) is (-1,0,1).

The projection of (3,4,5) into the plane is the sum of its projections onto these vectors, i.e.,

    (3,4,5).

Perhaps it looks strange to see that the projection of the vector (3,4,5) is itself, but the interpretation is simply that the vector was already in the plane before it was projected. In general a vector will be moved (and shortened) when it is projected into a plane, and we can see this when we project (1,-1,0):

    the result is (1/2, 0, -1/2).

Now that we have a vector in the plane, if we project again, it won't move. Algebraically, projections satisfy the equation P^2 = P.
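The idempotence of a projection is easy to confirm with an explicit matrix (a Python sketch; the matrix is assembled from the outer products w w^T/||w||^2 of the orthogonal basis vectors, which reproduces the sum of projections formula):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def proj_matrix(basis):
    # Matrix of the projection onto span(basis); the basis vectors
    # must already be mutually orthogonal.
    n = len(basis[0])
    return [[sum(w[i] * w[j] / dot(w, w) for w in basis)
             for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

P = proj_matrix([(1, 1, 1), (1, -1, 0)])
P2 = matmul(P, P)
print(all(abs(P[i][j] - P2[i][j]) < 1e-12 for i in range(3) for j in range(3)))  # True
```

Applying P to (3, 4, 5) recovers the vector (7/2, 9/2, 4) found above.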


We shall make these same calculations in function space to find the best mean-square fit of a function f(x) by a nicer expression, such as the functions in (2.8) or polynomials. This can be automated with Mathematica (see notebook) or Maple (see worksheet). The formula simply replaces the dot product with the standard inner product:

    Projf[f_, g_] := g[x] Integrate[f[t] g[t], {t, a, b}] /
         Integrate[g[t]^2, {t, a, b}]

For example, if we wish to find the multiple of sin(3x) which is closest to the function x on the interval 0 < x < pi, we find:

    (2/3) sin(3 x).
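We can reproduce this coefficient numerically (a Python sketch with the same kind of homemade midpoint-rule integrator as above):

```python
import math

def inner(f, g, a, b, n=20000):
    # midpoint-rule approximation to <f,g> on [a,b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) * g(a + (k + 0.5) * h) for k in range(n)) * h

f, g = (lambda x: x), (lambda x: math.sin(3 * x))
c = inner(f, g, 0, math.pi) / inner(g, g, 0, math.pi)
print(c)  # close to 2/3
```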

The projection of a vector/function onto its own direction will be the same as the vector/function itself.

Now consider what we mean when we project a function onto the constant function 1. This should be the best approximation to f(x) consisting of a single number. What could this be but the average of f? Indeed, the projection formula gives us

    the integral of f divided by (b-a),

which is familiar as the average of a function.

Model Problem II.9. Consider the set of functions on the interval -1 <= x <= 1. We wish to find the function in the span of 1, x, and x^2 which is the closest to f(x) = cos(pi x/2). In other words, find the best quadratic fit to the function f in the mean-square sense. In Exercise II.6 you are asked to compare a similar global polynomial approximation to this function with the Taylor series.

Solution. The calculations can be done with Mathematica (see notebook) or Maple (see worksheet) if you prefer. Conceptually, the calculations are closely analogous to those of Model Problem II.8.

First let us ask whether the three functions are orthogonal. Actually, no. The function x is orthogonal to each of the other two, but 1 and x^2 are not orthogonal. We can see this immediately because x is odd, while 1 and x^2 are both even, and both positive a.e.

A more suitable function than x^2 would be x^2 minus its projection onto the direction of 1, that is, x^2 minus its average, which is easily found to be 1/3. The set of functions {1, x, x^2 - 1/3} is an orthogonal set, as you can check.

Then we can project cos(pi x/2) onto the span of the three functions 1, x, and x^2 - 1/3:

    2/pi,   0,   and   (15 (pi^2 - 12)/pi^3) (x^2 - 1/3)

The best quadratic approximation to cos(pi x/2) on the interval -1 < x < 1 is the sum of these three functions. Here is a graph showing the original function and its approximation:

    (graph)

For comparison, here is a plot which shows the Taylor approximation as well as the original and the best quadratic:

    (graph)


The general algorithm for finding the best approximation to a function is as follows. Suppose that we want to find the best approximation to f(x), a <= x <= b, of the form

    a_1 g_1(x) + a_2 g_2(x) + ... + a_n g_n(x),

where g_1, ..., g_n are some functions with nice properties: they may oscillate with definite frequencies, have simple shapes, etc. They could be chosen to capture the important features of f(x), while possibly simplifying its form or filtering out some noise.

Step 1. Replace g_1, ..., g_n by an orthogonal set with the same span. (A systematic way to do this, the Gram-Schmidt procedure, is described below.) Let's call the orthogonal set h_1, ..., h_n.
Step 2. Project f onto each of the h_k.
Step 3. Sum the projections. If P denotes the span of g_1, ..., g_n, then

    Proj_P(f) = sum over k of (<f,h_k>/||h_k||^2) h_k(x)                     (2.12)

Perhaps the most important functional approximation uses the Fourier functions (2.8), as we shall learn to call them. The coefficients of these functions in the projection are called Fourier coefficients, and the approximation is as follows:

    f(x) is approximately  a_0 + sum over m of a_m cos(2 Pi m x/L) + sum over n of b_n sin(2 Pi n x/L).

The right side should be the projection of f(x) on the span of the sines and cosines (including the constant) on the right. To get the coefficients we use the analogue of the formula (2.10). For example,

    (<f, cos(2 Pi m x/L)>/||cos(2 Pi m x/L)||^2) cos(2 Pi m x/L),   where ||cos(2 Pi m x/L)||^2 = L/2.

In other words, the formula for the coefficients am must be:

    a_m = (2/L) integral from 0 to L of f(x) cos(2 Pi m x/L) dx.                     (2.13)

    a_0 = (1/L) integral from 0 to L of f(x) dx.                               (2.14)

As mentioned above, the projection of f onto a constant function is its average.

Finally, the analogous calculation for the sines gives:

    b_n = (2/L) integral from 0 to L of f(x) sin(2 Pi n x/L) dx.                     (2.15)

Formulae (2.13)-(2.15) will be the basis of the Fourier series in the next section.
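As a concrete illustration of (2.13)-(2.15) (a Python sketch; the choice f(x) = x on [0,1] is this example's, not the text's), the coefficients can be computed by numerical integration. For this f, integration by parts gives a_0 = 1/2, a_m = 0, and b_n = -1/(n pi):

```python
import math

def integral(f, a, b, n=20000):
    # midpoint-rule approximation to the integral of f over [a,b]
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

L = 1.0
f = lambda x: x
a0 = (1 / L) * integral(f, 0, L)                                               # (2.14)
a1 = (2 / L) * integral(lambda x: f(x) * math.cos(2 * math.pi * x / L), 0, L)  # (2.13)
b1 = (2 / L) * integral(lambda x: f(x) * math.sin(2 * math.pi * x / L), 0, L)  # (2.15)
print(a0, a1, b1)  # approximately 0.5, 0, and -1/pi
```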

The notions of best approximation and projection in function space are at the heart of filtering theory. A filtered signal is nothing other than a function that has been projected in some way to remove certain frequencies or some undesired characteristics such as noise. (Some further remarks about filtering and signal processing.)

Constructing orthonormal sets. It is often convenient to have orthonormal, or at least orthogonal sets. These are analogous to the usual basis vectors for the plane or for 3-space (denoted R^3), but you may recall that there are many choices of orthogonal bases. For instance, you may have unit vectors oriented along the x, y, and z axes, but someone else may have unit vectors oriented along a rotated set of axes. Although we shall first concentrate on the set (2.8) as a basis for a vector space of functions, the other sets of orthonormal functions (2.6), (2.7), and (2.9) will be useful later for the same purpose. The choice among these sets is analogous to the choice of different bases for R^3.

But how do we come up with a basis in the first place? Suppose you are given several vectors, such as v_1 = (1,0,0), v_2 = (1,1,0), and v_3 = (1,1,1), and you want to recombine them to get an orthonormal, or at least orthogonal, set. You can do this by subtracting from each vector its projections onto the others. The systematic way of doing this is called the Gram-Schmidt procedure, and it depends a great deal on the order in which it is done.

Model Problem II.10. Find an orthonormal set with the same span as v_1 = (1,0,0), v_2 = (1,1,0), and v_3 = (1,1,1), beginning with w_1 = v_1 = (1,0,0). (We rename it because it is the first vector in a new set of recombined vectors.)

Solution. The next vector v2 is not orthogonal to v1, so we subtract off the projection of v2 onto the direction of v1:

    w_2 = v_2 - (projection of v_2 onto w_1) = (0,1,0).

For w_3, we begin with v_3 and project away the parts in the plane spanned by v_1 and v_2, which is the same as the plane spanned by w_1 and w_2. We find the standard basis vector

    w_3 =  (0,0,1).
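The procedure is short enough to code directly (a Python sketch of Gram-Schmidt in its "modified" form, where each new vector is reduced by its projection onto the previously constructed ones; for this particular input the orthogonal set that comes out is already orthonormal, so no normalization step is shown):

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def gram_schmidt(vectors):
    ortho = []
    for v in vectors:
        w = [float(x) for x in v]
        for u in ortho:
            c = dot(w, u) / dot(u, u)          # subtract the projection onto u
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

print(gram_schmidt([(1, 0, 0), (1, 1, 0), (1, 1, 1)]))
# -> [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```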


A computer-based solution, using either a Mathematica notebook or a Maple worksheet to do the work of orthogonalizing for us is also available.

Notice that the results are different if we take the same vectors in a different order:

Model Problem II.11. Find an orthonormal set with the same span as v_1 = (1,0,0), v_2 = (1,1,0), and v_3 = (1,1,1), beginning with v_3.

Solution. First let's turn v_3 into a unit vector:

    w_1-tilde = 1/Sqrt(3) (1,1,1)

For the second unit vector we could take v_2 and project away the part pointing along v_3, getting

    v_2 - (2/3)(1,1,1) = (1/3)(1, 1, -2),

which we can normalize as

    w_2-tilde = 1/Sqrt(6) (1,1,-2)

Finally, taking v1, projecting out the components in these directions, and normalizing, gives us the vector

    w_3-tilde = 1/Sqrt(2) (1,-1,0)


Again, a computer-based solution using either a Mathematica notebook or a Maple worksheet is available.
Guess what's coming? The same procedure with sets of functions!

Model Problem II.12. Construction of the Legendre polynomials. (See the Mathematica notebook or Maple worksheet for a computer-assisted discussion.) Let us consider the interval -1 <= x <= 1, and find a set of orthonormal functions which are more mundane than the trigonometric functions, namely the polynomials. We begin with the power functions 1, x, x^2, x^3, .... Some of these are orthogonal because some are even functions and others are odd, but they are not all orthogonal to one another. For instance,

    <1, x^2> = 2/3.

Let us denote the set of orthogonal polynomials we get using the Gram-Schmidt procedure on the power functions (in this order) p_0, p_1, p_2, .... These are important in approximation theory and differential equations, and are known as the normalized Legendre polynomials. Beginning with the function x^0 = 1, after normalization we find

    p_0 (x) = Sqrt(1/2).

The next power is x^1 = x. Since x is already orthogonal to any constant function, all we do is normalize it:

    p_1 (x) = Sqrt(3/2) x.

To make x^2 orthogonal to p_0, we need to subtract a constant: x^2 - 1/3. Because of symmetry, it is already orthogonal to p_1, so we don't worry about p_1 yet, and just normalize:

    p_2(x) = Sqrt(5/2) (3 x^2 /2 - 1/2).

Similarly, when orthogonalizing x^3 we need to project out x but not 1 or x^2. We find x^3 - 3x/5, or, when normalized:

    p_3(x) = Sqrt(7/2) (5 x^3 /2 - 3 x/2).

etc.
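The same Gram-Schmidt loop can be run exactly on polynomials (a Python sketch using exact rational arithmetic; representing a polynomial by its coefficient list is a convention chosen for this example). The inner product uses the fact that the integral of x^k over [-1,1] is 0 for odd k and 2/(k+1) for even k:

```python
from fractions import Fraction

def poly_inner(p, q):
    # <p, q> = integral_{-1}^{1} p(x) q(x) dx; p, q are coefficient
    # lists [c0, c1, ...] standing for c0 + c1 x + c2 x^2 + ...
    prod = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            prod[i + j] += Fraction(a) * Fraction(b)
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(prod) if k % 2 == 0)

def orthogonalize(polys):
    ortho = []
    for p in polys:
        w = [Fraction(c) for c in p]
        for u in ortho:
            c = poly_inner(w, u) / poly_inner(u, u)
            u_padded = u + [Fraction(0)] * (len(w) - len(u))
            w = [wi - c * ui for wi, ui in zip(w, u_padded)]
        ortho.append(w)
    return ortho

# the powers 1, x, x^2, x^3 as coefficient lists
q = orthogonalize([[1], [0, 1], [0, 0, 1], [0, 0, 0, 1]])
print(q[2])  # coefficients of x^2 - 1/3
print(q[3])  # coefficients of x^3 - (3/5) x
```

Normalizing each q_k by its computed norm would reproduce the p_k above.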

By the way, Legendre polynomials are traditionally not normalized as we have done, but rather are denoted P_k(x) and scaled so that P_k(1) = 1. The normalization for Legendre polynomials of arbitrary index is such that

    p_n(x) = (n + 1/2)^(1/2) P_n(x).

Most special functions are known to Mathematica and Maple, so calculations with them are no more difficult than calculations with sines and cosines. (See the Mathematica notebook or Maple worksheet.)

Sometimes it is not possible to define an inner product, but there is still some useful geometry which depends only on the norm, and sometimes on a slightly more complicated version of the CSB inequality, known as Hölder's inequality. (Further discussion.)


Exercises.

II.1. Find the norms of and "angles" between the following functions defined for -1<=x<=1. (In the case of complex quantities, the "angle" is not so meaningful, but you can still define the cosines of the angles.) Use the standard inner product:

    1, x, x^2, cos(pi x), exp(i pi x)

II.2. Verify the inner product of Example II.2.5 for 2x2 matrices. Find all matrices orthogonal to {{0,1},{1,0}}.

Do they form a subspace of the 2 by 2 matrices? If so, of what dimension?

(Selected solutions by Mathematica are available in Mathematica format or in rtf format, which is readable by MS Word).

II.3. It was remarked above that the projection operator on ordinary 3-vectors is equivalent to a matrix. What is the matrix for the projection onto (1,2,-3)? What is the matrix for the projection onto the plane through the origin determined by the vectors (1,2,-3) and (-2,0,0)?

II.4. Use the Gram-Schmidt procedure to construct other orthonormal sets from the basis vectors v_1, v_2, v_3 given in the text, taken in other orders. How many distinct basis sets are there?

II.5. Find the first 8 normalized Legendre polynomials.

II.6. a) Calculate the Taylor series for cos(pi x/2) about the point x=0, up to the term with x^4. (Go to x^6 if you choose to use software to do this problem.)

b) Use the Legendre polynomials to find the polynomial of degree 4 which is the best approximation to cos(pi x/2) for -1<=x<=1.

c) On a single graph, sketch cos(pi x/2) and these two approximations.

II.7. In each case below, find a) the multiple of g which is closest in the mean-square sense to f, and b) a function of the form f - alpha g which is orthogonal to f. If for some reason this is not possible, interpret the situation geometrically.

(i)    f(x) = sin(pi x), g(x) = x, 0 <= x <= 1

(ii)   f(x) = sin(pi x), g(x) = x, -1 <= x <= 1

(iii)    f(x) = cos(pi x), g(x) = x, -1 <= x <= 1

(iv) f(x) = x^2 - 1, g(x) = x^2 + 1, -1 <= x <= 1

(Selected solutions by Mathematica are available in Mathematica format or in rtf format, which is readable by MS Word).

II.8. Show with explicit examples that the formula (2.11) is definitely wrong if the vectors vn are not orthogonal.

