1 Space exploration

Chapters 11 and 12 introduced vectors in two-dimensional space, where simple tasks can be performed intuitively. For example, to measure the length of a vector, use a ruler! To find out whether two vectors are parallel, just look!

Now we want to extend our capabilities into higher-dimensional space. The motivation for this is, for example, making sense of complex data or constructing a model of language.

We start with three-dimensional space, where the geometry can be conveyed by moving the viewing perspective. This will let us explore how pairs of vectors can be used to specify a unique ☞ plane ☜. This leads to a new operation, ☞ projection ☜, which casts the shadow of a vector onto another vector or onto a plane.

The projection operation is key to working with data, for instance, building statistical models or identifying patterns in the data. But projection is especially powerful when working in high-dimensional spaces.

In four- and higher-dimensional spaces, it is practically impossible to draw a meaningful graphic. But the operations we can see in three dimensions can be calculated in any dimension using arithmetic. This includes vector length, angles between vectors, and projections onto planes or higher-dimensional extensions of planes.

To build intuition, let’s return to 3-dimensional space. Consider Figure 1, which shows two vectors, one in blue and the other gray. They are drawn on a green background. At a glance, one can see that the angle between the vectors is obtuse, perhaps about 130 degrees. Also, the gray vector is shorter than the blue vector. Easy!

In fact, each vector is 3-dimensional. And the green “background” is actually a plane that contains both the vectors. Try rotating the scene to see this more clearly.

glX 
  3 
Figure 1: Two vectors: blue and gray. You can position the cursor over the scene and press/drag to rotate the vectors in three dimensions.

As you change the viewing perspective in Figure 1, the apparent lengths of the vectors as well as the angle between them can change. And, you’ll see that the green “background” is actually a plane, with both vectors contained in the plane.

Indeed, if you change the perspective to view the plane edge-on, the blue and gray vectors will appear to lie on the same line. And depending on which angle you take (keeping the plane edge on), the two vectors will be arranged variously in opposite directions or in the same direction.

To measure the proper length of a vector, you need to lay down a ruler on top of the vector, parallel to it. Similarly, the measure the proper angle between the two vectors, you need to lay a protractor in the green plane that contains both vectors.

We can represent the situation arithmetically. Since the vectors live in three-dimensional space, the tip of each vector can be written in (x,y,z) coordinates. (For simplicity, the base of each vector will always be at the origin, that is, (0,0,0).) We could have put tick marks along the x-, y-, and z-axes to let you measure the coordinates of each vector, but for convenience, here they are.

\[ \color{blue}{\vec{v} \equiv \left(\begin{array}{c}7\\4\\0\end{array}\right)}\ \ \ \text{and}\ \ \ \color{magenta}{\vec{w}\equiv\left(\begin{array}{c}1.5\\-4\\0\end{array}\right)}\] Note that, for future reference, we have given names, \(\vec{v}\) and \(\vec{w}\) to the two vectors.

With the arithmetic representation, we don’t need a ruler to measure the ☞ length of a vector ☜; the Pythagorean formula does the job.

\[\color{blue}{\sqrt{7^2 + 4^2 + 0^2} = 8.06}\ \ \text{and}\ \ \ \color{magenta}{\sqrt{1.5^2 + 4^2 + 0^2} = 4.27}\ . \tag{1}\]

We won’t do very much arithmetic here. But for what we will do, it will help to have a much more concise notation. Here are two parts of that notation:

  • A function, called the ☞ dot product ☜ which involves simple multiplication and addition. Here’s an example, the dot product between \(\vec{v}\) and \(\vec{w}\): \[\vec{v} \odot \vec{w} = 1.5 \times 7\ + \ -4 \times 4\ +\ 0 \times 0 = -5.5 .\] The dot product takes two vectors as input and produces an ordinary number as output.

Sometimes it is helpful to dot a vector with itself. For instance:

\(\vec{v} \odot \vec{v} = 7^2 + 4^2 + 0^2 = 65\ \ \ \) and \(\ \ \ \vec{w} \odot \vec{w} = 1.5^2 + (-4)^2 + 0^2 = 18.25\).

This use of the dot product captures a large part of the Pythagorean formula for vector length (Math expression 1)

  • The length of \(\vec{v}\) is written \(\|\vec{v}\|\). There is a simple formula for the length of a vector that uses the dot product:

\[\|\vec{v}\| = \sqrt{\ \strut\vec{v} \odot \vec{v}\ }\ \ \ \text{and}\ \ \ \|\vec{w}\| = \sqrt{\ \strut\vec{w} \odot \vec{w}\ }\ .\] This dot-produce notation pays off with slightly more elaborate calculations. For instance,

  • The ☞ alignment between two vectors ☜ is calculated like this: \[\text{alignment}(\vec{v}, \vec{w}) \equiv \frac{\ \ \ \ \vec{v} \odot \vec{w}}{\sqrt{\strut\ \|\vec{v}\|\ \|\vec{w}\|\ }} \tag{2}\]
NoteSpaces with dimension greater than 3

Understandably, people have difficulty visualizing scenes in 4- or higher dimensions, or even imagining that such scenes might exist. Our cognitive abilities are tuned to perception in the familiar 3-D space of everyday life.

With the arithmetic representation of vectors, however, it’s easy to see how to construct vectors in 4-space, 5-space, or any-dimensional space. A vector in such a space is merely a column of numbers. Here are a few examples.

\[\color{tomato}{\vec{a} \equiv \left(\begin{array}{r}1.5\\3.0\end{array}\right)}\ \ \ \ \ \color{darkgreen}{\vec{b}\equiv\left(\begin{array}{r}4.2\\-7.1\\2.3\end{array}\right)} \ \ \ \ \ \color{dodgerblue}{\vec{c}\equiv\left(\begin{array}{r}0.3\\4.1\\1.9\\-3.2\end{array}\right)} \ \ \ \ \ \color{purple}{\vec{d}\equiv\left(\begin{array}{r}1.0\\ -7.0\\ 2.9\\ 1.2\\ -4.1\end{array}\right)} \ \ \ \ \ \color{darkgreen}{\vec{g}\equiv\left(\begin{array}{r}-0.1\\ -8.8\\ -4.6\\ 6.5\\ 5.2\\ -4.2\end{array}\right)} \ \ \ \ \ \]

The count of components in each vector tells the dimension of the space where the vector exists or “lives in.”

The reader may object. “Of course, you can make a column of numbers with as many components as you like. What does that have to do with ‘space?’”

What makes these simple columns of numbers into an “object in space” is the calculations we can do with them, for instance, the “length” of a vector or the alignment between two vectors. Both of these calculations are rooted in the dot product, whose formula is a simple extension of that already shown. For instance, here’s a dot product in 5-dimensional space:

\[\color{olive}{\left(\begin{array}{r}1.0\\ -7.0\\ 2.9\\ 1.2\\ -4.1\end{array}\right)} \odot \color{black}{\left(\begin{array}{r}2.6\\8.3\\6.4\\1.6\\2.7\end{array}\right)}= \text{ }1.0 \times 2.6 \ -\ 7.0 \times 8.3\ +\ 2.9 \times 6.4 \ +\ 1.2 \times 1.6 \ -\ 4.1\times 2.7 = -46.09\]

2 Alignment, projection, and decomposition

Three important spatial operations in modern technology are

  1. Finding the alignment between two vectors.

  2. Defining a ☞ subspace ☜ of a larger space. As you will see, subspaces are a way of encoding information about patterns.

  3. ☞ Projecting ☜ a vector in the larger space down into a subspace. Since subspaces encode patterns, projection determines to what extent a vector displays those patterns.

The alignment quantifies how similar the two vectors are, that is, whether they have information in common. An alignment of 1 indicates that the vectors share all information. An alignment of zero means that they have no information in common. (An alignment of -1 means that the vectors share information completely, but that one vector has been reversed in direction.) The reader who has studied statistics may recall the “correlation coefficient.” This is essentially the same as the alignment.

In a two-dimension space, a subspace is a line: an infinite set of points that all lie within the bigger space. The line is a one-dimensional subspace. In three-dimensional space, there are two different kinds of subspaces: a line (1-dimensional) and a plane (2-dimensional). In higher-dimensional spaces, say one with \(n\) dimensions, there can be 3-, 4-, and up to \((n-1)\)-dimensional subspaces.

A one-dimensional subspace is defined by a single vector. A two-dimensional subspace is defined by two vectors.1 Similarly, an \((n-1)\)-dimensional subspace is defined by \(n-1\) vectors.

Figure 2 shows a two-dimensional subspace (that is, a plane) of the three-dimensional space. The two vectors drawn in the same color as the subspace have been used to define the subspace. We have colored the subspace-defining vectors to be inconspicuous, so the reader can focus attention on the subspace itself.

glX 
  5 
Figure 2: A two-dimensional subspace of a three-dimensional space. The black vector is not in the subspace.

You can verify that the two green vectors are in the subspace by rotating the scene. Whatever way you rotate it, you will never see the tips of the green vectors outside of the subspace. The black vector, however, does not lie in the subspace. (Of course, the green vectors must be in the subspace. We defined the subspace using the two green vectors.)

Projection of a vector onto a subspace means to find a new vector in the subspace that is a close as possible to the vector being projected. Figure 3 shows the projection of the black vector onto the green subspace. We have drawn in gray the “new” vector, that is, the one that lies in the subspace and is as close as possible to the black vector.

glX 
  7 
Figure 3: The black vector consists of a part entirely within the subspace and a part entirely outside the subspace.

To confirm that the gray vector really is the one in the subspace that is as close as possible to the black vector, observe these facts by rotating the scene:

  1. The gray vector is in the green subspace. No matter how you rotate the scene, the gray vector never emerges from the subspace.

  2. The tip of the black vector is directly above the tip of the gray vector. Admittedly, it can be hard to be sure of this just by observing the co-movement of the black and gray vectors.

To allow greater precision of observation, we have added a red that is perpendicular to the green subspace. You will want to confirm that this statement is accurate. Do so by rotating the space until you are looking at the subspace edge on. Whichever such perspective you take, the red vector will be perpendicular to the vestigial line of the subspace.

Now rotate the space so you are looking straight down the red vector. When you have this right, the red vector will look like a single dot. You will also see that the tip of the gray vector is exactly covered by the tip of the black vector.

Go back to looking at the subspace edge on. You should be able to see the black, red, and gray vectors. Notice that if the red vector were moved so that its base is at the tip of the gray vector, it would reach directly to the tip of the black vector. That is,

\[{\color{darkgray}{\text{gray}}} + {\color{red}{\text{red}}} = \text{black}\ .\]

Writing a vector as a sum of two other vectors is a form of decomposition. This decomposition of the black vector involves a part directly in the subspace (gray) and a part entirely outside the subspace (red).

In three-dimensional space, this is just fun and games. But in higher-dimensional space, where we need to perform the projection using arithmetic, it is the core technique of statistical modeling, a method that underlies many published scientific findings and that gives a more nuanced view of relationships than the simple rates described in Chapter 3.

3 Example: Learning to identify different species

We are going to take on a ☞ learning ☜ task: figuring out how to classify the species of a penguin. For people, such learning comes from experience. Spend enough time with penguins, and you’ll figure out how to discern one type of penguin from another.

Consider now the problem of getting a computer to discern the different species from one another using physical measurements. Our motivation here is to demonstrate a process of learning. I don’t know that there is a practical, ecological motivation behind this problem.

In this learning task, we will start by sending human experts out to collect data from penguins: the species, but also quantitative data such as the penguin’s mass, beak length, and width, and so on. The data collected constitute a ☞ training data set ☜ that we will use to establish quantitative rules for identifying a penguin’s species.

Figure 4 shows training data from 344 penguins across three species: Adélie, Chinstrap, and Gentoo. Each penguin is one dot in the picture. The judgement of the human experts about the species is indicated by the color of the dot: Adelie is blue, Chinstrap green, and Gentoo red. Five different quantitative measurements were taken for each penguin, so the dots rightfully belong in a five-dimensional space. But we can only show three dimensions graphically, so we work with those.

glX 
  9 
Figure 4: Representing the penguin data as points in space.

From the perspective shown initially, it’s easy to distinguish the Chinstrap penguins from the other two species. But the Adelie and Gentoo species are all mixed up; we can’t divide the space into distinct regions that separate one species from another.

Now rotate the scene until each of the three species occupies a distinct region. By rotating, you are determining how to combine the various measurements to obtain the cleanest results from the training data. Once that learning has been accomplished, you’ll know how to classify any new penguin that comes along without a species label.

Think about how you learned the pattern. You rotated the space more or less at random, keeping track of which perspective best separates the groups of different-colored dots. Such a perspective optimization can also be automated, without human involvement. In fact, that is how the subspace shown in Figure 4 was determined. The information about species corresponds to the pattern encoded by the subspace.

This was a pretty easy learning task since it can be done in three dimensions. The next example demonstrates a more impressive feat performed in a space of more than 100 dimensions. What’s more, there is no expert providing the training data; the method will figure out how to perform classification without knowing the goal!

4 Example: Voting patterns

In high-dimensional space, random vectors tend to have an alignment close to zero. This is an important clue to help us spot patterns in data. As described previously, patterns are encoded by a choice of subspace. To search for patterns, we look for subspaces where projecting onto the space tends to produce vectors that are more closely aligned.

To avoid unnecessary abstraction and the mystique that can arise from leaving the familiar realm of three dimensions, we have selected an easy-to-understand setting: votes by legislators in a council. In particular, we will examine a series of 773 votes by 135 lawmakers in the Scottish Parliament in 1999-2000.2 This might seem an obscure topic, but the advantage for us is that few readers will have direct knowledge of the parliament’s political structure, for instance, the number and make-up of political parties. Similarly, the reader will not know how the 773 recorded ballots are related to one another.

This is an example of ☞ unsupervised learning ☜. In supervised learning, as with the penguin species assignment, we have training data about the patterns we seek to uncover: the different colored points in Figure 4 that enable us to find an advantageous projection onto a low-dimensional subspace. In unsupervised we have no such knowledge. The problem is then to find a low-dimensional subspace without knowing in advance what we are looking for.

To orient the reader, consider some of the data for one of the 135 members of parliament, Bruce Crawford. Mr. Crawford voted “For” 235 ballots, “Against” 264, and didn’t register a vote in 274. For instance, Bill S1M-786 was entitled “Ethical Standards in Public Life.” Mr. Crawford was in favor, as were 98 of his colleagues. (It is likely that most members of parliament are in favor of ethical standards in public life, but evidently, 17 members objected to something or another in the bill. There were 2 abstentions and 16 absences.)

A diligent researcher can read through the minutes of Parliament, which record which party each member belongs to and the content of each bill. But, in the spirit of unsupervised learning, we will not use this information.

Figure 5 shows the votes and absences for each of the 134 members of parliament on each of the 773 bills, giving somewhat over 100,000 data points. Each point is one member voting for or against one bill (or abstaining or being absent). The point’s color shows whether the vote was for or against.

Figure 5: Votes by 134 members of the Scottish Parliament on 773 bills considered in the 1999-2000 session. The “Ethical Standards” bill is column 183. Mr. Crawford is row 100.

Our goal is to find and interpret patterns in the data. There are many ways to define “pattern.” For instance, the reader will, possibly, see the overall pattern in Figure 5 as a plaid, a common fabric pattern. (Satisfyingly, plaid is culturally associated with Scotland, although that has nothing to do with the matter at hand.) Here, however, we will stick with the sorts of patterns that can be seen through the lens of quantitative reasoning, particularly the alignment of vectors and the detection of low-dimensional subspaces that provide a good approximation to the raw data.

We start with alignment, a useful measure that can be calculated using the formula in Math expression 2, which involves simple arithmetic. We start by measuring the alignment among the various members of parliament. For each of the 133 members of parliament, we have a vector of votes on the 773 bills. For example, Bruce Crawford’s voting record is represented by the vector we have named \(\vec{M}_{100}\) in Math expression 3. We also show vectors for a randomly selected few of his colleagues: Colin Campbell (\(\vec{M}_{99}\)), Paul Martin \(\vec{M}_{57}\), and Scott Barrie (\(\vec{M}_{25}\)). Each of the vectors has 773 components; we’re leaving out most of them for human readability.

\[\vec{M}_{100} = \left(\begin{array}{r}1\\-1\\-1\\1\\1\\1\\1\\-1\\\vdots\\-1\\-1\\1 \end{array}\right)\ \ \ \ \vec{M}_{99} = \left(\begin{array}{r}-1\\1\\1\\-1\\-1\\-1\\-1\\1\\\vdots\\1\\1\\0 \end{array}\right)\ \ \ \ \vec{M}_{57} = \left(\begin{array}{r}1\\-1\\-1\\-1\\1\\1\\1\\-1\\\vdots\\0\\0\\0 \end{array}\right)\ \ \ \ \vec{M}_{25} = \left(\begin{array}{r}0\\0\\0\\0\\0\\-1\\1\\1\\\vdots\\0\\1\\0 \end{array}\right)\ \ \ \ \tag{3}\]

Note that the first component of all the vectors corresponds to the vote on the same bill, S1M-1, and similarly for all the other bills up to S1M-4064.

Observe that \(\vec{M}_{100}\) is almost entirely the opposite voting pattern of \(\vec{M}_{99}\). To find the quantitative alignment, first take the dot product of the two vectors, \[ (-1) \times 1 + (-1)\times 1 + 1 \times (-1) + 1 \times (-1) + \cdots = - 8\] then divide by the square root of the lengths of the two vectors, \(\sqrt{11}\) and \(\sqrt{10}\) respectively. This arithmetic operation gives an alignment of -76%.

The point here is not to emphasize this particular calculation, but to show that the alignment is a straightforward statistic to compute. And, in any event, we should take into account all 773 bills when computing the alignment between two members of parliament. Table 1 shows the result.

Table 1: Alignments among the voting patterns of four members of parliament.
  Scott Paul Colin Bruce
Scott 1 -0.33 0.68 -0.31
Paul -0.33 1 -0.39 0.85
Colin 0.68 -0.39 1 -0.37
Bruce -0.31 0.85 -0.37 1

Notice that (of course!) each MP is perfectly aligned with themself. And, the alignments are symmetrical: the alignment between Paul and Scott is the same as the alignment between Scott and Paul. Such patterns tell us nothing about voting patterns. They are merely an inevitable consequence of the symmetry in the dot-product calculation.

The alignment array, however, shows some patterns. Notice that Scott and Colin are strongly aligned. Paul and Bruce are even more so. But Scott and Paul are somewhat negatively aligned, as are Colin and Bruce.

Re-arranging the names to place strongly aligned ones near each other and negatively aligned ones far away gives the following order:

Scott, Colin, Bruce, Paul

Table 2 re-arranges Table 1 into this order. The re-arrangement suggests that among these four MPs there are two voting blocks, blue and green.

Table 2: Re-ordering the members of parliament to put those closely aligned near each other shows two voting blocks.
  Scott Colin Bruce Paul
Scott 1 0.68 -0.31 -0.33
Colin 0.68 1 -0.37 -0.39
Bruce -0.31 -0.37 1 0.85
Paul -0.33 -0.39 0.85 1

Another way to look at the voting data is as 773 vectors, each having 134 components. That is, we make 134 measurements on each of the 773 bills. Looking at the alignments among the 773 vectors lets us organized the bills into blocks.

Figure 6 shows the entire data set from Figure 5 with both the rows and columns re-arranged to enhance the “blockiness” of the data.

Figure 6: The same data as shown in Figure 5, but the rows and the columns have been reordered so that closely aligned vectors are neighbors.

The strong block structure in Figure 6 is evident. Looking across the columns, there are four factions. The largest, by far, includes the members numbered 60 and higher. The second largest faction consists of members numbered 1 through 36. This second faction is distinguished from the first by voting “for” bills 1-230 and against bills 580+. They also have many absences for bills 450-550.

Having identified two major factions, we can compare the voting patterns of members of parliament who are not in either faction. A third faction, contrasting with the first two, occupies 37-55. Meanwhile, a small fourth “faction,” rows 56 to 59, is a mix of opinions and absences.

To facilitate further discussion, let’s call the three substantial factions the “Top,” “Middle,” and “Bottom.”3

With these names for the horizontal bands in Figure 6, we can discuss the vertical bands: the types of bills considered by Parliament. Bills 1 to 230 have the Top faction voting “for” and the Bottom faction voting “against.” Bills 231 to 325 are opposed by both factions. Bills numbered 600+ are opposed by the Bottom faction but supported by the Top faction. Bills 326 to 450 receive support from both factions.

This is a lot to find out about a session of Parliament from a set of 100,000 “for” and “against” votes. It is not, of course, everything to know about the session. For instance, although we have identified four types of bills, we would need to read the actual bills or the debate record to determine which social, economic, or administrative features characterize each type. Here, modern quantitative reasoning provides a tool that can inform other types of description and argument.

Footnotes

  1. The vectors must not be exactly aligned. Otherwise, the subspace is one-dimensional.↩︎

  2. This example was provided soon after the session of Parliament by then-undergraduate Caroline Ettinger and Professor Andrew Beveridge at Macalester College.↩︎

  3. By referring to the known party registration of each MP, the Top faction consists of a mix of Scottish Labour and Scottish Liberal Democrats, the Bottom faction is the Scottish National Party, and the Middle faction is the Scottish Liberal Democrats. The party registration information, however, played no role in the construction of Figure 6.↩︎