While doing joins within a specific data source can be helpful, joining data from multiple sources is the first step towards high-level analysis. For example, combining your marketing data from Adwords and your transaction data from Square can uncover actionable insights that allow companies to start moving the needle. 

However, performing joins across data sources is difficult; you first need to consolidate your data sources into one place, preferably a data warehouse like Amazon Redshift. You can extract the data from each source manually, or use an ETL service like Stitch to connect your data sources and start flowing information into Redshift in minutes.

 
 

Once your data is in a data warehouse, you can join across whatever sources you want. Redshift in particular allows for fast querying and seamless connections to business intelligence tools -- like Looker, Mode, and Wagon -- that allow you to query the data and visualize it using charts and reports.

The whole system looks something like this:

ETL process overview

ETL process overview

The point here is that it’s impossible to use SQL to join your data without a data warehouse, and the easiest way to connect your many sources is through an ETL tool. Stitch connects to the data sources your business is already using, like MySQL, PostreSQL, MongoDB, Salesforce, Zendesk, Google Analytics, and many more, then it streams all that data to Amazon Redshift.

Once you have your data in a data warehouse, you can use the links on the next page to understand how each of your data sources structures their data, so you can start performing joins as soon as possible.