Lesson Monday

When you have a one-to-many relationship - say, a circus trainer has many animals - it's straightforward to select all of a trainer's animals. Here's some sample data:

 id | animals | trainer_id
----+---------+-----------
  1 | tiger   | 1
  2 | lion    | 1
  3 | gorilla | 2


 id |     name      
----+-----------------
  1 | harry houdini
  2 | missy mysterious

If we want to get all of Harry's animals, we simply:

SELECT * FROM animals WHERE trainer_id = 1;

There's another way to get this data: with a join statement. A join statement is totally different from a join table - don't get them confused! A join table holds data about the relationship between two tables, a join statement queries the join table and other tables to select relationship specific data. Here's how it works:

SELECT * FROM
animals JOIN trainers ON (animals.trainer_id = trainers.id)
WHERE trainers.id = 1;

This honestly is more confusing than the first way, but I want you to take a moment to understand it so that when we get to a more complex example, you have this as a foundation. We're joining the animals and trainers tables together wherever the trainer_id column of the animals table is equal to the id column of the trainers table. Then, we're narrowing down to just the records where the trainers table has an ID of 1.

The join statement is used to simplify queries across multiple tables. Again, it has nothing inherently to do with join tables. But, as we're about to see, you can use it to join together tables that include a join table.

So, let's move to a more meaningful example. The circus has grown, and an animal can have many trainers, and a trainer can work with many animals. The data now looks like this:

animals
---------

 id | species
----+--------
  1 | tiger  
  2 | lion   
  3 | gorilla


trainers 
--------

 id |     name      
----+-----------------
  1 | harry houdini
  2 | missy mysterious

lessons (join table)
--------

 id | animal_id | trainer_id
----+-----------+-----------
  1 |         1 |          1
  2 |         2 |          1
  3 |         1 |          2

If we want to figure out who all of Harry's animals are without using a join statement, we have to go through two steps. First, we select all of the entries in the join table, lessons:

SELECT animal_id FROM lessons WHERE trainer_id = 1;

From this statement, we get the animal_ids 1 and 2 back. Now, we have to loop through each of these animal_ids and select the corresponding animal:

SELECT * FROM animals WHERE id = 1;
SELECT * FROM animals WHERE id = 2;

To do this more efficiently, we can use a 3-table join:

SELECT animals.* FROM
trainers JOIN lessons ON (trainers.id = lessons.trainer_id)
         JOIN animals ON (lessons.animal_id = animals.id)
WHERE trainers.id = 1;

Take a moment to digest this. It might be helpful to picture the tables left-to-right:

     trainers                         lessons                  animals
     --------                         --------                 --------

 id |     name              id | animal_id | trainer_id      id | species 
----+------------------    ----+-----------+------------    ----+-------- 
  1 | harry houdini          1 |         1 |          1       1 | tiger
  2 | missy mysterious       2 |         2 |          1       2 | lion
                             3 |         1 |          2       3 | gorilla

We're taking all of the trainers with ID 1, then joining all of the lessons to that trainer, and then joining all of the animals to those lessons, and finally selecting everything from those animals.

Let's look at another join statement using an example we have seen before.

friends
-------

 id | name    | phone
 ---+---------+-----------
 1  | Chris   | 9165551212
 2  | Jessica | 3235551212
 3  | Jake    | 4155551212


cities
------

 id | city     | state
 ---+----------+------
 1  | Chicago  | IL
 2  | New York | NY
 3  | Oakland  | CA
 4  | Portland | OR


cities_friends (join table)
--------------

 id | city_id | friend_id
 ---+---------+----------
 1  | 1       | 1
 2  | 1       | 2
 3  | 2       | 3
 4  | 2       | 1

If we want to figure out all of the cities that Chris has visited, we can use this join statement:

SELECT cities.* FROM
friends JOIN cities_friends ON (friends.id = cities_friends.friend_id)
        JOIN cities ON (cities_friends.city_id = cities.id)
WHERE friends.id = 1;

Sometimes it is easiest to look at join statements backwards. We want our results to be centered around whatever friend has an id = 1. So we are asking the database to look at the friends table and the cities_friends table together and return any entries where friends.id is the same as cities_friends.friend_id, or in our case, where the friend_id value in the cities_friends table is 1. Just by looking at the cities_friends table, we can see that there are two entries where friend_id is 1. Then we are asking the database to use those two entries, look at whatever the city_id is for those two entries (in this case 1 and 2), and then find those same numbers in the cities table (Chicago and New York) and return all (*) information about those cities.

I know that this can be a little confusing, but you can think of it like slowly filtering down results of your queries. You start with a lot of database entries, you filter the join table first for the the correct friend, then you look for the IDs of the other table that match with the result of your first filtering. Then you ask the database to return everything about the entries on the other table that matched.

Here are some blogs for you to look at that could make the concept of join statements a little easier to understand: