Lesson Tuesday

This lesson explores join statements in more depth. You do not need to know how to write JOIN statements for this section's independent project. However, it is important to understand how data from multiple tables can be gathered in a single query statement using matching keys on related tables.

SQL JOIN Statements

Let's look at an example where a sponsor has many athletes. When we have a one-to-many relationship, it's straightforward to select all of the "many" that belongs to the "one." Here's some sample data:

table:  athletes

 id | name              | sponsor_id
  1 | Michael Jordan    | 1
  2 | Maria Sharapova   | 1
  3 | David Beckham     | 2

table: sponsors

 id | organization      
  1 | Nike
  2 | Adidas

If we want to get all of the athletes that Nike sponsors, we can write the following SQL statement:

SELECT * FROM athletes WHERE sponsor_id = 1;

There's another way to get this data: using a SQL join statement. A join statement is different from a join table but performs a similar job. It allows us to join information from multiple tables. Here's how it works:

SELECT * FROM sponsors
JOIN athletes ON (athletes.sponsor_id = sponsors.id)
WHERE sponsors.id = 1;

Let's walk through this statement line by line: We SELECT all ( * ) of the records FROM the sponsors table. We JOIN the athletes table to the sponsors table ON the key that is the same for and links both tables: the sponsorid (which is athletes.sponsorid on the athletes table and sponsors.id on the sponsors table). We limit the JOIN to only those records WHERE the sponsors.id is 1.

So, in plain English, we're joining the athletes and sponsors tables together wherever the sponsor_id column of the athletes table is equal to the id column of the sponsors table. Then, we're narrowing to just the records where the sponsors table has an ID of 1.

The join statement is used to simplify queries across multiple tables. Again, it has nothing inherently to do with join tables. But, as we're about to see, you can use it to join together tables that include a join table.

Let's move to a more complex example. The spotlight on some athletes has grown, and now some athletes have many sponsors, and an organization can continue to sponsor many athletes. The data now looks like this:


 id | name
  1 | Michael Jordan
  2 | Maria Sharapova   
  3 | David Beckham


 id |     name      
  1 | Nike
  2 | Adidas

endorsements (join table)

 id | athlete_id | sponsor_id
  1 |         1 |          1
  2 |         2 |          1
  3 |         1 |          2

If we want to figure out who all of Nike's athletes are without using a join statement, we have to go through two steps. First, we select all of the entries in the join table, endorsements:

SELECT athlete_id FROM endorsements WHERE sponsor_id = 1;

From this statement, we get the athlete_ids 1 and 2 back. Now, we have to loop through each of these athlete_ids and select the corresponding athlete, leading to an n+1 query:

SELECT * FROM athletes WHERE id = 1;
SELECT * FROM athletes WHERE id = 2;

To do this more efficiently, we can use a 3-table join:

SELECT athletes.* FROM sponsors
JOIN endorsements ON (sponsors.id = endorsements.sponsor_id)
JOIN athletes ON (endorsements.athlete_id = athletes.id)
WHERE sponsors.id = 1;

We're taking all of the sponsors with id 1, joining all of the endorsements to that sponsor, and then joining all of the athletes to those endorsements, and finally selecting everything from those athletes.

Another Example

Let's look at another join statement using a different example.


 id | name    | phone
 1  | Mike     | 9165551212
 2  | Courtney | 3235551212
 3  | Jake      | 4155551212


 id | city     | state
 1  | Chicago  | IL
 2  | New York | NY
 3  | Oakland  | CA
 4  | Portland | OR

cities_friends (join table)

 id | city_id | friend_id
 1  | 1       | 1
 2  | 1       | 2
 3  | 2       | 3
 4  | 2       | 1

If we want to figure out all of the cities that Mike has visited, we can use this join statement:

SELECT cities.* FROM friends
JOIN cities_friends ON (friends.id = cities_friends.friend_id)
JOIN cities ON (cities_friends.city_id = cities.id)
WHERE friends.id = 1;

Sometimes it is easiest to look at join statements backwards.

We want our results to be centered around whatever friend has an id = 1. So we are asking the database to look at the friends table and the cities_friends table together. We want it to return any entries where friends.id is the same as cities_friends.friend_id. In our case the friend_id value in the cities_friends table is 1.

Just by looking at the cities_friends table, we can see that there are two entries where friend_id is 1. Then we are asking the database to look at whatever the city_id is for those two entries (in this case 1 and 2). Then the database will find those same numbers in the cities table (Chicago and New York) and return all (*) the information about those cities. In this case, that means the name of the city and the name of the state it's in.

Think of it like slowly filtering down results of our queries. We start with a lot of database entries, we filter the join table first for the correct friend, then we look for the IDs of the other table that match with the result of our first filtering. Then we ask the database to return everything about the entries on the other table that matched.

Lesson 23 of 29
Last updated August 7, 2022