What is a SQL join?
A SQL join is a Structured Query Language (SQL) instruction to combine data from two sets of data (e.g. two tables). Before we dive into the details of a SQL join, let’s briefly discuss what SQL is, and why someone would want to perform a SQL join.
SQL is a special-purpose programming language designed for managing information in a relational database management system (RDBMS). The word relational here is key; it specifies that the database management system is organized in such a way that there are clear relations defined between different sets of data.
Typically, you need to extract, transform, and load data into your RDBMS before you’re able to manage it using SQL, which you can accomplish by using a tool like Stitch.
Relational Database Example
Imagine you’re running a store and would like to record information about your customers and their orders. By using a relational database, you can save this information as two tables that represent two distinct entities: customers and orders.
|1||George||Washingtonemail@example.com||3200 Mt Vernon Hwy||Mount Vernon||VA||22121|
|2||John||Adamsfirstname.lastname@example.org||1250 Hancock St||Quincy||MA||02169|
|3||Thomas||Jeffersonemail@example.com||931 Thomas Jefferson Pkwy||Charlottesville||VA||22902|
|4||James||Madisonfirstname.lastname@example.org||11350 Constitution Hwy||Orange||VA||22960|
|5||James||Monroeemail@example.com||2050 James Monroe Parkway||Charlottesville||VA||22902|
Here, information about each customer is stored in its own row, with columns specifying different bits of information, like their first name, last name, and email address. Additionally, we associate a unique customer number, or primary key, with each customer record.
Again, each row contains information about a specific order. Each order has its own unique identification key
order_id assigned to it as well.
You’ve probably noticed that these two examples share similar information. You can see these simple relations diagrammed below:
Note that the orders table contains two keys: one for the order and one for the customer who placed that order. In scenarios when there are multiple keys in a table, the key that refers to the entity being described in that table is called the Primary Key (PK) and other key is called a Foreign Key (FK).
In our example,
order_id is a primary key in the orders table, while
customer_id is both a primary key in the customers table, and a foreign key in the orders table. Primary and foreign keys are essential to describing relations between the tables, and in performing SQL joins.
SQL Join Example
Let’s say we want to find all orders placed by a particular customer. We can do this by joining the customers and orders tables together using the relationship established by the
Here, we’re joining the two tables using the
join keyword, and specifying what key to use when joining the tables in the
on customers.customer_id = orders.customer_id line following the join statement. Here is the result of the above SQL query, which includes two orders placed by Thomas Jefferson (customer_id = 3):
This particular join is an example of an “inner” join. Depending on the kind of analysis you’d like to perform, you may want to use a different method. There are actually a number of different ways to join the two tables together, depending on your application. The next section will explain inner, left, right, and full joins, and provide examples using the data tables used above.