Chapter 4 Solving

4.1 Functions vs equations

Much of the content of high-school algebra involves “solving.” In the typical situation, you have an equation, say \[ 3 x + 2 = y\] and you are asked to “solve” the equation for \(x\). This involves rearranging the symbols of the equation in the familiar ways, e.g., moving the \(2\) to the right hand side and dividing by the \(3\). These steps, originally termed “balancing” and “reduction” are summarized in the original meaning of the arabic word “al-jabr” (that is, used by Muhammad ibn Musa al-Khowarizmi (c. 780-850) in his “Compendious Book on Calculation by Completion and Balancing”. This is where our word “algebra” originates.

High school students are also taught a variety of ad hoc techniques for solving in particular situations. For example, the quadratic equation \(a x^2 + b x + c = 0\) can be solved by application of the procedures of “factoring,” or “completing the square,” or use of the quadratic formula: \[x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} .\] Parts of this formula can be traced back to at least the year 628 in the writings of Brahmagupta, an Indian mathematician, but the complete formula seems to date from Simon Stevin in Europe in 1594, and was published by René Descartes in 1637.

For some problems, students are taught named operations that involve the inverse of functions. For instance, to solve \(\sin(x) = y\), one simply writes down \(x = \arcsin(y)\) without any detail on how to find \(\arcsin\) beyond “use a calculator” or, in the old days, “use a table from a book.”

4.1.1 From Equations to Zeros of Functions

With all of this emphasis on procedures such as factoring and moving symbols back and forth around an \(=\) sign, students naturally ask, “How do I solve equations in R?”

The answer is surprisingly simple, but to understand it, you need to have a different perspective on what it means to “solve” and where the concept of “equation” comes in.

The general form of the problem that is typically used in numerical calculations on the computer is that the equation to be solved is really a function to be inverted. That is, for numerical computation, the problem should be stated like this:

You have a function \(f(x)\). You happen to know the form of the function \(f\) and the value of the output \(y\) for some unknown input value \(x\). Your problem is to find the input \(x\) given the function \(f\) and the output value \(y\).

One way to solve such problems is to find the inverse of \(f\). This is often written \(f^{\ -1}\) (which many students understandably but mistakenly take to mean \(1/f(x)\)). But finding the inverse of \(f\) can be very difficult and is overkill. Instead, the problem can be handled by finding the zeros of \(f\).

If you can plot out the function \(f(x)\) for a range of \(x\), you can easily find the zeros. Just find where the \(x\) where the function crosses the \(y\)-axis. This works for any function, even ones that are so complicated that there aren’t algebraic procedures for finding a solution.
To illustrate, consider the function \(g()\)

g <- makeFun(sin(x^2)*cos(sqrt(x^4 + 3 )-x^2) - x + 1 ~ x)
graphFun(g(x) ~ x, xlim = range(-3, 3)) %>%
  gf_hline(yintercept  = 0, color = "red")

You can see easily enough that the function crosses the \(y\) axis somewhere between \(x=1\) and \(x=2\). You can get more detail by zooming in around the approximate solution:

graphFun(g(x) ~ x, xlim = range(1, 2)) %>%
  gf_hline(yintercept = 0, color = "red")

The crossing is at roughly \(x \approx 1.6\). You could, of course, zoom in further to get a better approximation. Or, you can let the software do this for you:

findZeros(g(x) ~ x, xlim = range(1, 2))
##        x
## 1 1.5576

The syntax is very much like graphFun( ), but now the argument xlim is used to state where to look for a solution. (Due to a software bug, it’s always called xlim even if you use a variable other than x in your expression.)

You need only have a rough idea of where the solution is. For example:

findZeros(g(x) ~ x, xlim = range(-1000,  1000))
##        x
## 1 1.5576

findZeros() will only look inside the interval you give it. It will do a more precise job if you can state the interval in a narrow way.

4.1.2 Multiple Solutions

The findZeros( ) function will try to find multiple solutions if they exist. For instance, the equation \(\sin x = 0.35\) has an infinite number of solutions. Here are some of them:

findZeros( sin(x) - 0.35 ~ x, xlim=range(-20,20) )
##           x
## 1  -12.2088
## 2   -9.7823
## 3   -5.9256
## 4   -3.4991
## 5    0.3576
## 6    2.7840
## 7    6.6407
## 8    9.0672
## 9   12.9239
## 10  15.3504

Note that the equation \(\sin x = 0.35\) was turned into a function sin(3) - 0.35.

4.1.3 Setting up a Problem

As the name suggests, findZeros( ) finds the zeros of functions. You can set up any solution problem in this form. For example, suppose you want to solve \(4 + e^{k t} = 2^{b t}\) for \(b\), letting the parameter \(k\) be \(k=0.00035\). You may, of course, remember how to do this problem using logarithms. But here’s the set up for findZeros( ):

g <- makeFun(4 + exp(k*t) - 2^(b*t) ~ b, k=0.00035, t=1)
findZeros( g(b) ~ b , xlim=range(-1000, 1000) )
##       b
## 1 2.322

Note that numerical values for both \(b\) and \(t\) were given. But in the original problem, there was no statement of the value of \(t\). This shows one of the advantages of the algebraic techniques. If you solve the problem algebraically, you’ll quickly see that the \(t\) cancels out on both sides of the equation. The numerical findZeros( ) function doesn’t know the rules of algebra, so it can’t figure this out. Of course, you can try other values of \(t\) to make sure that \(t\) doesn’t matter.

findZeros( g(b, t=2) ~ b, xlim=range(-1000,1000) )
##        b
## 1 1.1611

4.1.4 Exercises

4.1.4.1 Exercise 1

Solve the equation \(\sin(\cos(x^2) - x) - x = 0.5\) for \(x\). {0.0000,0.1328,0.2098,0.3654,0.4217}

ANSWER:

findZeros( sin(cos(x^2) - x) -x - 0.5 ~ x, xlim=range(-10,10))
##        x
## 1 0.2098

4.1.4.2 Exercise 2

Find any zeros of the function \(3 e^{- t/5} \sin(\frac{2\pi}{2} t)\) that are between \(t=1\) and \(t=10\).

  1. There aren’t any zeros in that interval.}
  2. There aren’t any zeros at all!}
  3. $ 2, 4, 6, 8$}
  4. $ 1, 3, 5, 7, 9$}
  5. {\(1, 2, 3, 4, 5, 6, 7, 8, 9\)}

ANSWER:

findZeros( 3*exp(-t/5)*sin(pi*t) ~ t, xlim=range(1,10))
##    t
## 1  0
## 2  1
## 3  2
## 4  3
## 5  4
## 6  5
## 7  6
## 8  7
## 9  8
## 10 9

4.1.4.3 Exercise 3

Use findZeros() to find the zeros of each of these polynomials:

  1. \(3 x^2 +7 x - 10\)

Where are the zeros? a. \(x=-3.33\) or \(1\) b. \(x=3.33\) or \(1\) c. \(x=-3.33\) or \(-1\) d. \(x=3.33\) or \(-1\) e. No zeros

ANSWER:

findZeros( 3*x^2 + 7*x - 10 ~ x, xlim=range(-100,100))
##         x
## 1 -3.3334
## 2  1.0000
  1. \(4 x^2 -2 x + 20\)

Where are the zeros? a. \(x=-3.33\) or \(1\)} b. \(x=3.33\) or \(1\)} c. \(x=-3.33\) or \(-1\)} d. \(x=3.33\) or \(-1\)} e. No zeros

  1. \(2 x^3 - 4x^2 - 3x - 10\)

Which one of these is a zero? {-1.0627,0,1.5432,1.8011,2.1223,3.0363,none}

ANSWER:

findZeros(2*x^3 - 4*x^2 - 3*x - 10 ~ x, xlim=c(-10,10))
##        x
## 1 3.0363
  1. \(7x^4 - 2 x^3 - 4x^2 - 3x - 10\)

Which one of these is a zero? {-1.0627,0,1.5432,1.8011,2.1223,3.0363,none}

ANSWER:

findZeros( 7*x^4 -2*x^3 - 4*x^2 - 3*x - 10 ~ x, xlim=c(-10,10))
##         x
## 1 -1.0628
## 2  1.4123
  1. \(6 x^5 -7x^4 - 2 x^3 - 4x^2 - 3x - 10\)

Which one of these is a zero? {-1.0627,0,1.5432,1.8012,2.1223,3.0363,none}

ANSWER:

findZeros( 6*x^5-7*x^4 -2*x^3 - 4*x^2 - 3*x - 10 ~ x, xlim=c(-10,10))
##        x
## 1 1.8012

4.2 Linear algebra

The computations for performing linear algebra operations are among the most important in science. It’s so important that the unit used to measure computer performance for scientific computation is called a “flop”, standing for “floating point operation” and is defined in terms of a linear algebra calculation.

For you, the issue with using the computer to perform linear algebra is mainly how to set up the problem so that the computer can solve it. The notation that we will use has been chosen specifically to relate to the kinds of problems for which you will be using linear algebra: fitting models to data. This means that the notation will be very compact.

The basic linear algebra operations of importance are:

  • Project a single vector onto the space defined by a set of vectors.
  • Make a linear combination of vectors.

In performing these operations, you will use two main functions, project( ) and mat( ), along with the ordinary multiplication * and addition + operations. There is also a new sort of operation that provides a compact description for taking a linear combination: “matrix multiplication,” written %*%.

By the end of this ;ession, you should feel comfortable with those two functions and the new form of multiplication %*%.

To start, consider the sort of linear algebra problem often presented in textbooks in the form of simultaneous linear equations. For example: \[\begin{array}{rcrcr} x & + & 5 y & = &1\\ 2x & + & -2 y & = &1\\ 4x & + & 0 y & = & 1\\ \end{array} .\]

Thinking in terms of vectors, this equation can be re-written as \[ x \left(\begin{array}{r}1\\2\\4\end{array}\right) + y \left(\begin{array}{r}5\\-2\\0\end{array}\right) = \left(\begin{array}{r}1\\1\\1\end{array}\right) .\]

Solving this vector equation involves projecting the vector \(\vec{b} = \left(\begin{array}{r}1\\1\\1\end{array}\right)\) onto the space defined by the two vectors \(\vec{v}_1 = \left(\begin{array}{r}1\\2\\4\end{array}\right)\) and
\(\vec{v}_2 = \left(\begin{array}{r}5\\-2\\0\end{array}\right)\). The solution, \(x\) and \(y\) will be the number of multiples of their respective vectors needed to reach the projected vectors.

When setting this up with the R notation that you will be using, you need to create each of the vectors \(\vec{b}, \vec{v}_1\), and \(\vec{v}_2\). Here’s how:

b <- c(1,1,1)
v1 <- c(1,2,4)
v2 <- c(5,-2,0)

The projection is accomplished using the project() function:

project(b ~ v1 + v2)
##         v1         v2 
## 0.32894737 0.09210526

Read this as "project \(\vec{b}\) onto the subspace defined by \(\vec{v}_1\) and \(\vec{v}_1\).

The answer is given in the form of the multiplier on \(\vec{v}_1\) and \(\vec{v}_2\), that is, the values of \(x\) and \(y\) in the original problem. This answer is the “best” in the sense that these particular values for \(x\) and \(y\) are the ones that come the closest to \(\vec{b}\), that is, the linear combination that give the projection of \(\vec{b}\) onto the subspace defined by \(\vec{v}_1\) and \(\vec{v}_2\).

If you want to see what that projection is, just multiply the coefficients by the vectors and add them up. In other words, take the linear combination

0.32894737*v1 + 0.09210526*v2
## [1] 0.7894737 0.4736842 1.3157895

When there are lots of vectors involved in the linear combination, it’s much easier to be able to refer to all of them by a single object name. The mat( ) function takes the vectors and packages them together into a matrix. It works just like project( ), but doesn’t involve the vector that’s being projected onto the subspace. Like this:

A <- mat( ~ v1 + v2)
A
##      v1 v2
## [1,]  1  5
## [2,]  2 -2
## [3,]  4  0

Notice that \(A\) doesn’t have any new information; it’s just the two vectors \(\vec{v}_1\) and \(\vec{v}_2\) placed side by side.

Let’s do the projection again:

z <- project( b ~ v1 + v2)
z
##         v1         v2 
## 0.32894737 0.09210526

To get the linear combination of the vectors in \(A\), you matrix-multiply the matrix \(A\) times the solution \(z\):

A %*% z
##           [,1]
## [1,] 0.7894737
## [2,] 0.4736842
## [3,] 1.3157895

Notice, it’s the same answer you got when you did the multiplication “by hand.”

When working with data, statisticians almost always include another vector called the intercept which is simply a vector of all 1s. You can denote the intercept vector with a plain 1 in the mat() or project() function, like this:

A <- mat( ~ v1 + v2 + 1)
A
##      (Intercept) v1 v2
## [1,]           1  1  5
## [2,]           1  2 -2
## [3,]           1  4  0
z <- project( b ~ A)
z
## A(Intercept)          Av1          Av2 
## 1.000000e+00 0.000000e+00 2.775558e-17
A %*% z
##      [,1]
## [1,]    1
## [2,]    1
## [3,]    1

Notice that the matrix A has a third vector: the intercept vector. The solution consequently has three coefficients. Notice as well that the linear combination of the three vectors exactly reaches the vector \(\vec{b}\). That’s because now there are three vectors that define the subspace: \(\vec{v}_1\), \(\vec{v}_2\), and the intercept vectors of all ones.

4.2.1 Example: Atomic bomb data.

The data file blastdata.csv contains measurements of the radius of the fireball from an atomic bomb (in meters) versus time (in seconds). In the analysis of these data, it’s appropriate to look for a power-law relationship between radius and time. This will show up as a linear relationship between log-radius and log-time. In other words, we want to find \(m\) and \(b\) in the relationship log-radius \(= m\) log-time \(+ b\). This amounts to the projection

bomb = read.csv("http://www.mosaic-web.org/go/datasets/blastdata.csv")
project(log(radius) ~ log(time) + 1, data=bomb)
## (Intercept)   log(time) 
##   6.2946893   0.3866425

The parameter \(m\) is the coefficient on log-time, found to be 0.3866.

4.2.2 Exercises

4.2.2.1 Exercise 1

Remember all those “find the line that goes through the points problems” from algebra class. They can be a bit simpler with the proper linear-algebra tools.

Example: “Find the line that goes through the points \((2,3)\) and \((7,-8)\).”

One way to interpret this is that we are looking for a relationship between \(x\) and \(y\) such that \(y = mx + b\). In vector terms, this means that the \(x\)-coordinates of the two points, \(2\) and \(7\), made into a vector \(\left(\begin{array}{c}2\\7\end{array}\right)\) will be scaled by \(m\), and an intercept vector \(\left(\begin{array}{c}1\\1\end{array}\right)\) will be scaled by \(b\).

x <- c(2, 7)
y <- c(3, -8)
project( y ~ x + 1) 
## (Intercept)           x 
##         7.4        -2.2

Now you know \(m\) and \(b\).

YOUR TASKS: For each of the following, find the line that goes through the two Cartesian points using the project( ) function. Remember, the vectors involved in the projection will have the form \[\vec{x}=\left(\begin{array}{r}x_1\\x_2\end{array}\right) \mbox{and} \ \ \vec{y}=\left(\begin{array}{r}y_1\\y_2\end{array}\right)\] and

  1. Find the line that goes through the two points \((x_1=9, y_1=1)\) and \((x_2=3, y_2=7)\).
  1. \(y = x + 2\)
  2. \(y = -x + 10\)
  3. \(y=x + 0\)
  4. \(y = -x + 0\)
  5. \(y = x - 2\)

ANSWER:

project( c(1,7) ~ c(9,3) +  1)
## (Intercept)     c(9, 3) 
##          10          -1
  1. Find the line that goes through the origin \((x_1=0, y_1=0)\) and \((x_2=2,y_2=-2)\).
  1. \(y = x + 2\)
  2. \(y = -x + 10\)
  3. \(y=x + 0\)
  4. \(y = -x + 0\)
  5. \(y = x - 2\)

ANSWER:

project( c(0,-2) ~ c(0,2) + 1)
##   (Intercept)       c(0, 2) 
## -5.551115e-17 -1.000000e+00
  1. Find the line that goes through \((x_1=1, y_1=3)\) and \((x_2=7, y_2=9)\)
  1. \(y = x + 2\)
  2. \(y = -x + 10\)
  3. \(y=x + 0\)
  4. \(y = -x + 0\)
  5. \(y = x - 2\)

ANSWER:

project( c(3,9) ~ c(1,7) + 1)
## (Intercept)     c(1, 7) 
##           2           1

4.2.2.2 Exercise 2

  1. Find \(x\), \(y\), and \(z\) that solve the following: \[ x \left(\begin{array}{r}1\\2\\4\end{array}\right) + y \left(\begin{array}{r}5\\-2\\0\end{array}\right) + z \left(\begin{array}{r}1\\-2\\3\end{array}\right) = \left(\begin{array}{r}1\\1\\1\end{array}\right) .\]

What’s the value of \(x\)?: {-0.2353,0.1617,0.4265,1.3235,1.5739}

ANSWER:

project( c(1,1,1) ~ c(1,2,4) + c(5,-2,0) + c(1,-2,3)-1)
##  c(1, 2, 4) c(5, -2, 0) c(1, -2, 3) 
##   0.4264706   0.1617647  -0.2352941
  1. Find \(x\), \(y\), and \(z\) that solve the following: \[ x \left(\begin{array}{r}1\\2\\4\end{array}\right) + y \left(\begin{array}{r}5\\-2\\0\end{array}\right) + z \left(\begin{array}{r}1\\-2\\3\end{array}\right) = \left(\begin{array}{r}1\\4\\3\end{array}\right) .\]

What’s the value of \(x\)? {-0.2353,0.1617,0.4264,1.3235,1.5739}

ANSWER:

project(c(1,4,3) ~ c(1,2,4) + c(5,-2,0)  + c(1,-2,3))
##  c(1, 2, 4) c(5, -2, 0) c(1, -2, 3) 
##  1.32352941  0.08823529 -0.76470588

4.2.2.3 Exercise 3

Using project( ), solve these sets of simultaneous linear equations for \(x\), \(y\), and \(z\):

Two equations in two unknowns: \[\begin{array}{rcrcr} x & + & 2 y & = &1\\ 3 x & + & 2 y & = &7\\ \end{array}\]

  1. \(x=3\) and \(y=-1\)
  2. \(x=1\) and \(y=3\)
  3. \(x=3\) and \(y=3\)

ANSWER:

b <- c(1,7)
x <- c(1,3)
y <- c(2,2)
project(b ~ x + y)
##  x  y 
##  3 -1

Three equations in three unknowns: \[\begin{array}{rcrcrcr} x & + & 2 y & + & 7 z & = &1\\ 3 x & + & 2 y & + &2 z&= &7\\ -2 x & + & 3 y & + & z&= &7\\ \end{array} \]

  1. \(x=3.1644\), \(y=-0.8767\), \(z=0.8082\)
  2. \(x=-0.8767\),\(y=0.8082\), \(z=3.1644\)
  3. \(x=0.8082\), \(y=3.1644\), \(z=-0.8767\)

ANSWER:

b <- c(1,7,7)
x <- c(1,3,-2)
y <- c(2,2,3)
z <- c(7,2,1)
project(b ~ x + y + z)
##          x          y          z 
##  0.8082192  3.1643836 -0.8767123

Four equations in four unknowns: \[\begin{array}{rcrcrcrcr} x & + & 2 y & + & 7 z & +& 8 w& = &1\\ 3 x & + & 2 y & + &2 z& +& 2 w& = &7\\ -2 x & + & 3 y & + & z&+& w&= &7\\ x & + & 5 y & + &3 z&+& w&= &3\\ \end{array} \]

\begin{MultipleChoice} a. \(x=5.500\), \(y=-7.356\), \(z=3.6918\), \(w=1.1096\) #. \(x=1.1096\), \(y=3.6918\), \(z=-7.356\), \(w=5.500\) #. \(x=5.500\), \(y=-7.356\), \(z=1.1096\), \(w=3.6918\) #. \(x=1.1096\), \(y=-7.356\), \(z=5.500\), \(w=3.6918\)

ANSWER:

x <- c(1,3,-2,1)
y <- c(2,2,3,5)
z <- c(7,2,1,3)
w <- c(8,2,1,1)
b <- c(1,7,7,3)
project(b ~ x + y + z + w)
##         x         y         z         w 
##  1.109589  3.691781 -7.356164  5.500000

Three equations in four unknowns: \[\begin{array}{rcrcrcrcr} x & + & 2 y & + & 7 z & +& 8 w& = &1\\ 3 x & + & 2 y & + &2 z& +& 2 w& = &7\\ -2 x & + & 3 y & + & z&+& w&= &7\\ \end{array} \]

  1. There is no solution.
  2. There is a solution.

ANSWER:

x <- c(1,3,-2)
y <- c(2,2,3)
z <- c(7,2,1)
w <- c(8,2,1)
b <- c(1,7,7)
answer <- project(b ~ x + y + z + w)
mat( ~ x + y + z + w) %*% answer
##      [,1]
## [1,]    1
## [2,]    7
## [3,]    7

You may hear it said that there is no solution to a problem of three equations in four unknowns. But a more precise statement is that there are many solutions, an infinity of them. Mathematicians tend to use “a solution” to stand for “a unique, exact solution.” In applied work, neither “unique” nor “exact” mean very much.