Lesson Tuesday

This lesson offers an optional exploration into something called a join statement. Join statements allow us to associate objects in a slightly different manner than seen in the previous lesson: By using specialized SQL statements.

You are not required to use join statements, but it is important to understand how data from multiple tables can be gathered in a single query statement using matching keys on related tables.

Demonstration

Database Relationship Without Join Statements

When you have a one-to-many relationship - say, a sponsor has many athletes - it's straightforward to select all of a sponsor's athletes. Here's some sample data:

table:  athletes

 id | name              | sponsor_id
----+-----------------------+----------------
  1 | Michael Jordan    | 1
  2 | Maria Sharapova   | 1
  3 | David Beckham     | 2


table: sponsors

 id | organization      
----+-----------------
  1 | Nike
  2 | Adidas

If we want to get all of the athletes that Nike sponsors, we can write the following SQL statement:

SELECT * FROM athletes WHERE sponsor_id = 1;

Database Relationships With Join Statements

There's another way to get this data: using the SQL join statement. A join statement is different from a join table but performs a similar job: Joining information from multiple tables. Here's how it works:

SELECT * FROM sponsors
JOIN athletes ON (athletes.sponsor_id = sponsors.id)
WHERE sponsors.id = 1;

Let's walk through this statement line by line:

  • We SELECT all ( * ) of the records FROM the sponsors table.

  • We JOIN the athletes table to the sponsors table ON the key that is the same for and links both tables: The sponsor_id (which is athletes.sponsor_id on the athletes table and sponsors.id on the sponsors table).

  • We limit the JOIN to only those records WHERE the sponsors.id is 1.

So, in plain English, we're joining the athletes and sponsors tables together wherever the sponsor_id column of the athletes table is equal to the id column of the sponsors table. Then, we're narrowing to just the records where the sponsors table has an ID of 1.

The join statement is used to simplify queries across multiple tables. Again, it has nothing inherently to do with join tables. But, as we're about to see, you can use it to join together tables that include a join table.

Join Statements with Join Tables

Let's move to a more complex example. The spotlight on some athletes has grown, and now some athletes have many sponsors, and an organization can continue to sponsor many athletes. The data now looks like this:

athletes
---------

 id | name
----+--------
  1 | Michael Jordan
  2 | Maria Sharapova   
  3 | David Beckham


sponsors
--------

 id |     name      
----+-----------------
  1 | Nike
  2 | Adidas

endorsements (join table)
--------

 id | athlete_id | sponsor_id
----+-----------+-----------
  1 |         1 |          1
  2 |         2 |          1
  3 |         1 |          2

If we want to figure out who all of Nike's athletes are without using a join statement, we have to go through two steps. First, we select all of the entries in the join table, endorsements:

SELECT athlete_id FROM endorsements WHERE sponsor_id = 1;

From this statement, we get the athlete_ids 1 and 2 back. Now, we have to loop through each of these athlete_ids and select the corresponding athlete:

SELECT * FROM athletes WHERE id = 1;
SELECT * FROM athletes WHERE id = 2;

To do this more efficiently, we can use a 3-table join:

SELECT athletes.* FROM sponsors
JOIN endorsements ON (sponsors.id = endorsements.sponsor_id)
JOIN athletes ON (endorsements.athlete_id = athletes.id)
WHERE sponsors.id = 1;

We're taking all of the sponsors with ID 1, then joining all of the endorsements to that sponsor, and then joining all of the athletes to those endorsements, and finally selecting everything from those athletes.

Let's look at another join statement using an example we have seen before.

friends
-------

 id | name    | phone
 ---+---------+-----------
 1  | Chris   | 9165551212
 2  | Jessica | 3235551212
 3  | Jake    | 4155551212


cities
------

 id | city     | state
 ---+----------+------
 1  | Chicago  | IL
 2  | New York | NY
 3  | Oakland  | CA
 4  | Portland | OR


cities_friends (join table)
--------------

 id | city_id | friend_id
 ---+---------+----------
 1  | 1       | 1
 2  | 1       | 2
 3  | 2       | 3
 4  | 2       | 1

If we want to figure out all of the cities that Chris has visited, we can use this join statement:

SELECT cities.* FROM friends
JOIN cities_friends ON (friends.id = cities_friends.friend_id)
JOIN cities ON (cities_friends.city_id = cities.id)
WHERE friends.id = 1;

Sometimes it is easiest to look at join statements backwards. We want our results to be centered around whatever friend has an id = 1. So, we ask the database to look at the friends table and the cities_friends table together. Then, to return any entries where friends.id is the same as cities_friends.friend_id, or in our case, where the friend_id value in the cities_friends table is 1.

Just by looking at the cities_friends table, we can see that there are two entries where friend_id is 1. Then we ask the database to use those two entries, look at whatever their city_id is (in this case 1 and 2), then find those same numbers in the cities table (Chicago and New York) and return all (*) information about the cities.

Think of it like slowly filtering down results of our queries. We start with a lot of database entries, we filter the join table first for the the correct friend, then we look for the IDs of the other table that match with the result of our first filtering. Then we ask the database to return everything about the entries on the other table that matched.