Digging Deeper Into SQL Temp Tables
Categories
This article helps you unlock the power of SQL temporary or temp tables. Improve your query efficiency, data analysis, and data transformation with them.
SQL temporary or temp tables. Ever heard of them? We bet you did! But how often do you use them? No, don’t tell us – no matter how much, it’s probably too little.
They are a practical SQL tool that can make many SQL tasks easier. We’ll talk about what they are, how, and in what cases to use them.
Understanding Temporary or Temp Tables in SQL
Temporary tables are named temporary storage of the interim results. You can query, edit, and delete them – just like you would with a regular table.
Do these explanations sound awfully familiar? Yes, it’s almost a carbon copy of a CTE definition. It’s no surprise. SQL CTEs and temp tables are indeed very similar, and people often use CTEs as a temp table.
The main difference between the two is that the CTE result is not stored anywhere, while the temp tables are. Also, the CTE results can be accessed only if you run the CTE query each time. Temp tables are more practical: they can be accessed as long your secession is active, you can easily see the data with a simple SELECT statement, and all the changes to the temp table can be made without creating the temp table over and over again.
One difference is that the temp tables can be explicitly dropped. But even if not, once the session that created them is closed, the temp tables are deleted from the database.
This is the furthest we’ll go in touching on CTEs – the rest of the article is dedicated solely to the temp tables.
Comparing Temporary and Permanent Tables in SQL
The SQL temp tables are even better understood when compared to the permanent tables. Here’s an overview of their differences.
Benefits and Use Cases of SQL Temp Tables
We already mentioned two SQL temp table use cases. Let’s extend this list and show you all the temp tables' main benefits and use cases.
1. Intermediate Storage
Benefit: Temporary tables can be an efficient storage for data that's only needed for a short period. This saves the developers and database administrators from creating permanent tables for transient tasks.
Use Case: If you’re migrating data, you’ll need to temporarily store data that is moved from one to another table. This is to make sure no data is duplicated or lost. You don’t need to create a permanent table for that – you can easily set up a SQL temp table, use it during the migration process, and it will be discarded after.
2. Query Simplification
Benefit: Complex queries can be broken down into several simpler ones. You can save their results in a temporary table. Doing that’ll make your queries more readable, maintainable, and debuggable.
Use Case: An analyst wants to produce a complex report that involves multiple joins, aggregations, filters, and other operations. It could all be written in one complex, hard-to-read-and-understand SQL query. Instead, the analyst would be better off if they used SQL temp tables to handle each reporting step separately. That way, it will be much clearer what each step/query does. It’s easier to check the report and correct the errors that way, as only one step is being changed.
3. Performance Improvements
Benefit: If the frequently accessed or processed data is stored in temporary tables, you can avoid tasking the system with redundant computations. Naturally, this will lead to faster query execution.
Use Case: A system performs several end-of-day calculations on trading data. The primary trading table is huge as it holds all the transactions from that day and historical data. Instead of querying it repeatedly, the daily trading data is loaded into a temporary table. The system then uses this much smaller and more accessible table, which improves the performance.
4. Data Isolation
Benefit: If you manipulate data within temp tables, this ensures that the data in the main tables remains untouched. This protects the main tables’ integrity, which is much appreciated when experimenting with data or performing risky tasks.
Use Case: Say you’re a database administrator who wants to test a new data-cleaning algorithm. If you do that on the primary dataset, you’re risking changing the main data. Instead of risking this, a clever and experienced database administrator would copy data to a temp table and apply the algorithm there. If anything goes wrong, the main dataset remains unscratched.
5. Easier Data Transformation
Benefit: The process of data transformation is simplified by using temp tables as a staging area, where you dump the data before its insertion into the final tables.
Use Case: In the ETL process, the raw data has to be transformed before being loaded into the main tables. The transformation includes cleaning, mapping, and enriching data. All these tasks can be done in temp tables.
6. Reducing Lock Contention
Benefit: Main tables could be locked by frequent or lengthy operations on main tables. This may slow down and delay other operations. SQL temp tables can help reduce the periods of the main tables being locked. They do that by allowing the bulk of the operations to be done on the temp table and only the final results to be moved to the main table.
Use Case: In peak hours, a retail database faces lock contention when updating inventory levels. By using temp tables, inventory adjustments can be done first in the temp table, and then the result can easily and quickly be merged with the main inventory table. This minimizes lock durations.
7. Ad-hoc Analysis and Reporting
Benefit: In reporting, there are always plenty of ad-hoc analyses and reports. Do it once and never again. Creating a permanent table for that might be an overkill. Instead, you could use temp tables.
Use Case: Your boss wants a special quarterly report that includes some data not shown in the standard quarterly report. To add that data, you need to use other datasets. You don’t need to change the existing tables or create new permanent ones. Simply use the temporary table and make this one-off report. You can only hope this ad-hoc report will not become a regular one.
SQL Temp Table Types
Generally, there are two types of SQL temp tables: local temp table and global temp table. This is important for all of you MS SQL Server and Oracle users, as they recognize the concept of a global temporary table.
This topic is not that important for MySQL and PostgreSQL users, as there are only local temporary tables.
We’ll talk about implementing SQL temp tables in the next section. However, we will focus on the local temp tables, as PostgreSQL doesn’t allow global temp tables. If your SQL dialect does, implementing them is not very different from the local ones.
In SQL Server, you use the CREATE TABLE command with a hashtag (#) prefix before the table name. If you want a global table, just put a double hashtag (##) prefix. Here’s a detailed explanation from Microsoft.
In Oracle, it’s the same syntax as in PostgreSQL (see the next section). You only write GLOBAL or PRIVATE (as Oracle calls the local temp tables) between CREATE and TABLE keywords. Here’s a detailed explanation from Oracle.
Now, let’s talk about the differences and start with the below overview.
Temporary Table Scope & Visibility
Local Temporary Tables: The local temp table is automatically deleted after you end the session in which you created the table. Of course, you can also explicitly drop it drop it before exiting the session.
Also, they are only visible in the session where they are created. Other sessions can’t see it or access it.
Global Temporary Tables: The global temp table is alive as long the session that created it is active. But, since they can be accessed from other sessions, the global temp table is not deleted while any other sessions referencing it are active. Once they’re ended, too, the global temporary table is automatically dropped.
All the global tables are visible in all sessions. After they are created, you can query and modify them in any session.
Global And Local Temp Table Use Cases
Local Temporary Tables: They are ideal for tasks specific to a single-user session. They are frequently used in stored procedures and scripts where the interim results shouldn't interfere with other operations.
Global Temporary Tables: They are useful when multiple sessions share the same temporary data, e.g., in collaborative tasks. The global temp tables are often used for debugging purposes in a multi-user environment, e.g., a developer can create a global temp table to see its effects across various sessions.
Global and Local Temp Table Security
Local Temporary Tables: There’s an isolation level since the local temp tables are session-specific. This ensures data security and integrity, as other users or sessions can’t access them.
Global Temporary Tables: There’s no isolation from other users, so there’s a risk of data being accessed and modified by unauthorized users.
Global and Local Temp Table Performance
Both types of temp tables are stored in tempdb system database, so there’s generally no performance difference between them.
But please note that there might be contention or concurrency issues with global temp tables when multiple sessions are accessing and modifying them simultaneously.
Creating and Managing SQL Temp Tables
Now, it’s time that we deal with the actual SQL code. We’ll work with PostgreSQL, so there might be some differences compared to the other SQL flavors.
Creating SQL Temp Table
In PostgreSQL, the syntax for creating a temporary table looks like this.
CREATE TEMPORARY TABLE temp_table_name (
column1 datatype1 [DEFAULT value1],
column2 datatype2 [DEFAULT value2],
...
);
If you’re familiar with CREATE TABLE query, then there is practically nothing new here. The only difference is that you write TEMP or TEMPORARY to specify you want to create a temporary table.
For those who don’t create tables often, let’s explain the syntax line by line:
- CREATE TEMPORARY TABLE or CREATE TEMP TABLE: Used to create a temporary table.
- temp_table_name: This is the name of your temporary table.
- column1, column2, ...: The names of the columns that you want to create in the temporary table.
- datatype1, datatype2, ...: These are the types of data that the respective columns will accept, like INTEGER, TEXT, DATE, etc.
- [DEFAULT value1]: This is an optional parameter. It sets a default value for the column if no value is specified during an INSERT operation.
In more practical terms, here’s how to create a temp table named employee and insert some data into it.
CREATE TEMP TABLE employee (
id INT PRIMARY KEY,
first_name VARCHAR (120),
last_name VARCHAR (120),
date_of_birth DATE,
salary DECIMAL (10,2)
);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (1, 'Sam', 'Richards', '2000-05-08', 2479.54);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (2, 'Margarethe', 'Ziegler', '1974-12-01', 5412.74);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (3, 'Bob', 'Fink', '1982-11-27', 8500.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (4, 'Samantha', 'Morris', '1994-10-23', 8000.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (5, 'Friedrich', 'Zeeman', '1961-01-01', 5847.12);
Querying SQL Temp Table
The temp tables are queried the same way as the regular tables, provided you stay in the same session.
For instance, this query
SELECT *
FROM employee;
will return the following output.
Dropping and Deleting SQL Temp Table
There are several options for dropping and deleting the temporary tables in SQL.
1. Automatic Dropping: The temporary tables are dropped automatically at the end of the session. So, the only thing you need to do is simply exit the current session, and, poof, your temporary table is gone. No code for that.
2. Dropping at the End of Transaction: If you want for a temp table to survive only the duration of the transaction, you can give this instruction when creating the table.
CREATE TEMP TABLE temp_table_name ... ON COMMIT DROP;
In our example, we can create the table, insert data into it, and select the data, all as a part of one transaction. Once we commit the transaction, the temp table will be dropped.
CREATE TEMP TABLE employee (
id INT PRIMARY KEY,
first_name VARCHAR (120),
last_name VARCHAR (120),
date_of_birth DATE,
salary DECIMAL (10,2)
) ON COMMIT DROP;
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (1, 'Sam', 'Richards', '2000-05-08', 2479.54);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (2, 'Margarethe', 'Ziegler', '1974-12-01', 5412.74);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (3, 'Bob', 'Fink', '1982-11-27', 8500.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (4, 'Samantha', 'Morris', '1994-10-23', 8000.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (5, 'Friedrich', 'Zeeman', '1961-01-01', 5847.12);
SELECT *
FROM employee;
If you try to query the temporary table separately after this transaction, you’ll get an error that proves the temp table was really dropped after the transaction.
If you haven’t specified the ON COMMIT DROP instruction, you can explicitly drop the temp table like any other table.
DROP TABLE table_name;
Here’s the code to drop our example temp table.
DROP TABLE employee;
Note that we needed to write TEMP or TEMPORARY when we created the temp table. When dropping, you do it without specifying it’s the temporary table.
3. Deleting All the Rows but Keeping the Structure: It’s also possible to delete the rows from the temporary table at the end of the transaction, but for the structure to remain for the rest of the session.
The syntax is shown below.
CREATE TEMP TABLE temp_table_name ... ON COMMIT DELETE ROWS;
Let’s apply this to our employee temp table.
CREATE TEMP TABLE employee (
id INT PRIMARY KEY,
first_name VARCHAR (120),
last_name VARCHAR (120),
date_of_birth DATE,
salary DECIMAL (10,2)
) ON COMMIT DELETE ROWS;
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (1, 'Sam', 'Richards', '2000-05-08', 2479.54);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (2, 'Margarethe', 'Ziegler', '1974-12-01', 5412.74);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (3, 'Bob', 'Fink', '1982-11-27', 8500.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (4, 'Samantha', 'Morris', '1994-10-23', 8000.00);
INSERT INTO employee (id, first_name, last_name, date_of_birth, salary) VALUES (5, 'Friedrich', 'Zeeman', '1961-01-01', 5847.12);
SELECT *
FROM employee;
Again, if try to query the table after this transaction, you will get the column names without the data.
Even if you don’t specify this deletion when creating the table, you can simply do it afterward using the TRUNCATE or DELETE command.
TRUNCATE table_name;
or
DELETE FROM table_name;
For our example table, it’s
TRUNCATE employee;
or
DELETE FROM employee
You can also use DELETE to remove the specific rows. For example, to remove all the employees with a salary equal to or greater than 8,000, you would write this query.
DELETE FROM employee
WHERE salary >= 8000;
After this, our SQL temp table looks like this.
Now that we learned how to create them and manage, let’s use the temp tables to solve some real-world problems.
Real-World Examples: Solving Problems With Temp Tables
We’ll go through four examples that show when SQL temp tables can be useful.
Analyzing Data Subsets Using Temporary Tables
This temp table use is helpful when you have a large dataset and want to focus on a specific subset. Instead of constantly filtering the main table, you can store the subset in a temp table for further analysis.
To show you how to do that, let’s solve this question by Airbnb.
Find how many reviews exist for each review score given to 'Hotel Arena'. Output the hotel name ('Hotel Arena'), each review score, and the number of reviews for that score. Ensure the results only include 'Hotel Arena.'
Link to the question: https://platform.stratascratch.com/coding/10166-reviews-of-hotel-arena
It wants us to output the hotel name, review score, and the number of that score occurrences. The output should only include Hotel Arena.
The question provides us with the table hotel_reviews.
hotel_address | additional_number_of_scoring | review_date | average_score | hotel_name | reviewer_nationality | negative_review | review_total_negative_word_counts | total_number_of_reviews | positive_review | review_total_positive_word_counts | total_number_of_reviews_reviewer_has_given | reviewer_score | tags | days_since_review | lat | lng |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7 Western Gateway Royal Victoria Dock Newham London E16 1AA United Kingdom | 359 | 2017-07-05 | 8.5 | Novotel London Excel | United Kingdom | coffee and tea at breakfast were not particularly hot Otherwise everything else was fine | 16 | 1158 | we were allocated the newly refurbished rooms and so everything was fresh and the bed was very comfortable the hotel is ideally situated near City Airport although eventually we travelled by train | 34 | 2 | 10 | [' Leisure trip ', ' Family with young children ', ' Standard Double Room with Two Single Beds ', ' Stayed 2 nights ', ' Submitted from a mobile device '] | 29 days | 51.51 | 0.02 |
35 Charles Street Mayfair Westminster Borough London W1J 5EB United Kingdom | 252 | 2015-08-29 | 9.1 | The Chesterfield Mayfair | Israel | No Negative | 0 | 1166 | We liked everything The hotel is simply a boutique the staff were all polite and helpfull The room was clean and been serviced daily Wifi was completely free Breakfast was simply great I so much want to get back | 41 | 8 | 10 | [' Leisure trip ', ' Couple ', ' Classic Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device '] | 705 day | 51.51 | -0.15 |
14 Rue Stanislas 6th arr 75006 Paris France | 40 | 2017-05-23 | 9.1 | Hotel Le Six | United States of America | There is currently utility construction taking place on the street in front of the hotel so a little noisy at times and barriers in place | 27 | 177 | Neat boutique hotel Some of the most comfortable hotel beds I have ever come across Staff was wonderful Loved the location Not too touristy Luxembourg gardens close by and a great place for a morning run walk | 39 | 3 | 9.2 | [' Leisure trip ', ' Family with young children ', ' Deluxe Double Room ', ' Stayed 4 nights ', ' Submitted from a mobile device '] | 72 days | 48.84 | 2.33 |
Gran V a De Les Corts Catalanes 570 Eixample 08011 Barcelona Spain | 325 | 2016-08-25 | 8.2 | Sunotel Central | United Kingdom | Coffee at breakfast could be better When you spend this amount in a hotel I expect better coffee in the morning | 22 | 3870 | Great bed nice to have a coffee machine in the room love the air conditioning and basically loved the attitude of the staff Really great | 26 | 2 | 9.2 | [' Leisure trip ', ' Group ', ' Comfort Double or Twin Room ', ' Stayed 1 night ', ' Submitted from a mobile device '] | 343 day | 41.38 | 2.16 |
Rathausstra e 17 01 Innere Stadt 1010 Vienna Austria | 195 | 2015-09-17 | 8.5 | Austria Trend Hotel Rathauspark Wien | United Kingdom | A bit out of the way location wise | 9 | 1884 | Clean modern rooms and bathroom well equipped | 9 | 2 | 7.5 | [' Leisure trip ', ' Couple ', ' Comfort Room ', ' Stayed 2 nights ', ' Submitted from a mobile device '] | 686 day | 48.21 | 16.36 |
It’s a big table that has far too many rows and columns than we need. We’ll subset the table into a temporary table.
CREATE TEMP TABLE temp_hotel_arena_reviews AS
SELECT hotel_name, reviewer_score
FROM hotel_reviews
WHERE hotel_name = 'Hotel Arena';
Now our dataset in the temp table has only two columns and contains only data for Hotel Arena. Have a look below.
hotel_name | reviewer_score |
---|---|
Hotel Arena | 7.5 |
Hotel Arena | 9.6 |
Hotel Arena | 4.6 |
Hotel Arena | 3.8 |
Hotel Arena | 4.2 |
After doing this, we can do our analysis on a subset and not the whole dataset. We select the columns from the temp table.
Then, we use COUNT(*) to find the number of the score occurrences. To show the count by hotel and score, we group the output by those columns.
Finally, we order the results by the score descendingly to make it more intuitive.
SELECT hotel_name,
reviewer_score,
COUNT(*) AS review_count
FROM temp_hotel_arena_reviews
GROUP BY hotel_name,
reviewer_score
ORDER BY reviewer_score DESC;
And here’s the output.
hotel_name | reviewer_score | count |
---|---|---|
Hotel Arena | 6.3 | 1 |
Hotel Arena | 5.4 | 1 |
Hotel Arena | 7.1 | 1 |
Hotel Arena | 4.2 | 1 |
Hotel Arena | 5.8 | 2 |
Working With Temporary Tables in Complex Queries
Temporary tables are ideal for breaking down complex queries into several logical blocks. This is especially helpful when your queries involve multiple JOIN operations. In such cases, temp tables can act as a staging platform to break down the process.
We’ll show this by solving the question from Meta/Facebook.
Interview Question Date: April 2020
Calculate the running total (i.e., cumulative sum) energy consumption of the Meta/Facebook data centers in all 3 continents by the date. Output the date, running total energy consumption, and running total percentage rounded to the nearest whole number.
Link to the question: https://platform.stratascratch.com/coding/10084-cum-sum-energy-consumption
The question gives us the tables that have data on the energy consumption on three continents.
The first table is fb_eu_energy.
date | consumption |
---|---|
2020-01-01 | 400 |
2020-01-02 | 350 |
2020-01-03 | 500 |
2020-01-04 | 500 |
2020-01-07 | 600 |
The next one is fb_na_energy.
date | consumption |
---|---|
2020-01-01 | 250 |
2020-01-02 | 375 |
2020-01-03 | 600 |
2020-01-06 | 500 |
2020-01-07 | 250 |
The third table is fb_asia_energy.
date | consumption |
---|---|
2020-01-01 | 400 |
2020-01-02 | 400 |
2020-01-04 | 675 |
2020-01-05 | 1200 |
2020-01-06 | 750 |
We need to use these tables to find the date, cumulative total energy consumption, and cumulative total percentage rounded to the nearest whole number.
Instead of writing one complex query, let’s write several simpler ones that will use the temp tables.
The first step is to create a temporary table that will have all energy consumption data in one place. We do that by using UNION ALL.
CREATE TEMP TABLE total_energy AS
SELECT *
FROM fb_eu_energy
UNION ALL
SELECT *
FROM fb_asia_energy
UNION ALL
SELECT *
FROM fb_na_energy;
Here’s what the temp table looks like.
date | consumption |
---|---|
2020-01-01 | 400 |
2020-01-02 | 350 |
2020-01-03 | 500 |
2020-01-04 | 500 |
2020-01-07 | 600 |
Here’s the article if you want to know the difference between UNION vs UNION ALL in SQL.
Next, we will use this temp table to sum up the energy consumption by date and save it in another temporary table. We’ll sort the data ascendingly by date.
CREATE TEMP TABLE energy_by_date AS
SELECT date,
SUM(consumption) AS total_energy
FROM total_energy
GROUP BY date
ORDER BY date ASC;
The data in the second temp table looks like this.
date | total_energy |
---|---|
2020-01-01 | 1050 |
2020-01-02 | 1125 |
2020-01-03 | 1100 |
2020-01-04 | 1175 |
2020-01-05 | 1200 |
We’re now at the final step. Only the final calculations remain.
This is where we calculate the energy cumulative using the SUM() window function.
Also, we need a cumulative total percentage. We get it by dividing the cum sum by the total energy consumption, multiplying it by 100, and rounding it to the nearest whole number.
SELECT date,
SUM(total_energy) OVER (ORDER BY date ASC) AS cumulative_total_energy,
ROUND(SUM(total_energy) OVER (ORDER BY date ASC) * 100 / (SELECT SUM(total_energy) FROM energy_by_date), 0) AS percentage_of_total_energy
FROM energy_by_date;
If you compare this with the official solution on the platform, you’ll see that we simply took two CTEs and turned them into temp table creating queries.
But now, the code is easier to debug, and we have a result of each CTE stored in a separate table that we can access without running CTE every time.
This solves the question and returns the desired output.
date | cumulative_total_energy | percentage_of_total_energy |
---|---|---|
2020-01-01 | 1050 | 13 |
2020-01-02 | 2175 | 27 |
2020-01-03 | 3275 | 40 |
2020-01-04 | 4450 | 55 |
2020-01-05 | 5650 | 69 |
Improving Query Performance With Temporary Tables
Frequently used datasets or complex computed values can be cached in a temp table. In doing so, the data fetching times can be greatly reduced.
Let’s take a look at this example by Yelp. We need to find reviews that contain the words 'food', 'pizza', 'sandwich', or 'burger'. Then, we should list the businesses with such reviews, their addresses, and states.
Find Yelp food reviews containing any of the keywords: 'food', 'pizza', 'sandwich', or 'burger'. List the business name, address, and the state which satisfies the requirement.
Link to the question: https://platform.stratascratch.com/coding/9612-keywords-from-yelp-reviews
The question works with the tables yelp_business
business_id | name | neighborhood | address | city | state | postal_code | latitude | longitude | stars | review_count | is_open | categories |
---|---|---|---|---|---|---|---|---|---|---|---|---|
G5ERFWvPfHy7IDAUYlWL2A | All Colors Mobile Bumper Repair | 7137 N 28th Ave | Phoenix | AZ | 85051 | 33.45 | -112.07 | 1 | 4 | 1 | Auto Detailing;Automotive | |
0jDvRJS-z9zdMgOUXgr6rA | Sunfare | 811 W Deer Valley Rd | Phoenix | AZ | 85027 | 33.68 | -112.08 | 5 | 27 | 1 | Personal Chefs;Food;Gluten-Free;Food Delivery Services;Event Planning & Services;Restaurants | |
6HmDqeNNZtHMK0t2glF_gg | Dry Clean Vegas | Southeast | 2550 Windmill Ln, Ste 100 | Las Vegas | NV | 89123 | 36.04 | -115.12 | 1 | 4 | 1 | Dry Cleaning & Laundry;Laundry Services;Local Services;Dry Cleaning |
pbt3SBcEmxCfZPdnmU9tNA | The Cuyahoga Room | 740 Munroe Falls Ave | Cuyahoga Falls | OH | 44221 | 41.14 | -81.47 | 1 | 3 | 0 | Wedding Planning;Caterers;Event Planning & Services;Venues & Event Spaces | |
CX8pfLn7Bk9o2-8yDMp_2w | The UPS Store | 4815 E Carefree Hwy, Ste 108 | Cave Creek | AZ | 85331 | 33.8 | -111.98 | 1.5 | 5 | 1 | Notaries;Printing Services;Local Services;Shipping Centers |
and yelp_reviews.
business_name | review_id | user_id | stars | review_date | review_text | funny | useful | cool |
---|---|---|---|---|---|---|---|---|
AutohausAZ | C4TSIEcazRay0qIRPeMAFg | jlbPUcCRiXlMtarzi9sW5w | 5 | 2011-06-27 | Autohaus is my main source for parts for an old Mercedes that has been in the family since new that I am restoring. The old beater is truly a labor of | 1 | 2 | 1 |
Citizen Public House | ZZ0paqUsSX-VJbfodTp1cQ | EeCWSGwMAPzwe_c1Aumd1w | 4 | 2013-03-18 | First time in PHX. Friend recommended. Friendly waitstaff. They were understaffed and extremely busy, but we never knew. The short ribs were tende | 0 | 0 | 0 |
Otto Pizza & Pastry | pF6W5JOPBK6kOXTB58cYrw | JG1Gd2mN2Qk7UpCqAUI-BQ | 5 | 2013-03-14 | LOVE THIS PIZZA! This is now one of my favorite pizza places in phoenix area. My husband i walked into this cute family owned business and were greet | 0 | 0 | 0 |
Giant Hamburgers | QBddRcflAcXwE2qhsLVv7w | T90ybanuLhAr0_s99GDeeg | 3 | 2009-03-27 | ok, so I tried this place out based on other reviews. I had the cheeseburger with everything, some fries, and a chocolate shake. The burger was okay, | 0 | 1 | 1 |
Tammie Coe Cakes | Y8UMm_Ng9oEpJbIygoGbZQ | MWt24-6bfv_OHLKhwMQ0Tw | 3 | 2008-08-25 | Overrated. The treats are tasty but certainly not the best I have ever had. I would have rated this a two star but the cakes and cookies are REALLY pr | 1 | 3 | 2 |
These tables are huge, and new reviews are constantly added. So, the businesses with any reviews change. Also, with each added review, new businesses will possibly satisfy the criteria specified in the question.
So, what we can do to improve the query performance is to create a table with distinct businesses that have been reviewed with specific terms.
We use WHERE and the STRPOS() function to find the keywords in reviews.
CREATE TEMP TABLE temp_reviewed_businesses AS
SELECT DISTINCT business_name
FROM yelp_reviews
WHERE
STRPOS(review_text, 'food') <> 0 OR
STRPOS(review_text, 'sandwich') <> 0 OR
STRPOS(review_text, 'pizza') <> 0 OR
STRPOS(review_text, 'burger') <> 0;
The data in the temp table is shown below.
business_name |
---|
Flancer's Cafe |
Otto Pizza & Pastry |
Los Dos Molinos |
Sleepy Dog Brewpub |
Athens Gyros |
Now, we can join the temporary table with yelp_business to get the required business details.
SELECT trb.business_name,
bus.address,
bus.state
FROM temp_reviewed_businesses trb
JOIN yelp_business bus
ON bus.name = trb.business_name;
Staging and Transforming Data Using Temporary Tables
SQL temp tables can be used as a staging area to preprocess data before it gets moved to its final destination.
We’ll show you how this can work on this question by the City of San Francisco.
Find the average total compensation based on employee titles and gender. Total compensation is calculated by adding both the salary and bonus of each employee. However, not every employee receives a bonus so disregard employees without bonuses in your calculation. Employee can receive more than one bonus. Output the employee title, gender (i.e., sex), along with the average total compensation.
Link to the question: https://platform.stratascratch.com/coding/10077-income-by-title-and-gender
We’ll rephrase it to suit our scenario. Imagine you’re tasked with importing data from an external source into a primary employee_compensation table. The external needs some preprocessing before it’s inserted:
- Calculate the total compensation
- Convert all the employee emails to lowercase
- Exclude any employees without the living address
Let’s see what the code for each step looks like.
1. Calculate the total compensation
In this first step, we consider the question’s tables as external tables, and some of this data has to enter our imagined main table, employee_compensation.
The first external table we have is sf_employee.
id | first_name | last_name | age | sex | employee_title | department | salary | target | city | address | manager_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Max | George | 26 | M | Sales | Sales | 1300 | 200 | Max@company.com | California | 2638 Richards Avenue | 1 |
13 | Katty | Bond | 56 | F | Manager | Management | 150000 | 0 | Katty@company.com | Arizona | 1 | |
11 | Richerd | Gear | 57 | M | Manager | Management | 250000 | 0 | Richerd@company.com | Alabama | 1 | |
10 | Jennifer | Dion | 34 | F | Sales | Sales | 1000 | 200 | Jennifer@company.com | Alabama | 13 | |
19 | George | Joe | 50 | M | Manager | Management | 100000 | 0 | George@company.com | Florida | 1003 Wyatt Street | 1 |
The second table we need to get data from is sf_bonus
worker_ref_id | bonus |
---|---|
1 | 5000 |
2 | 3000 |
3 | 4000 |
1 | 4500 |
2 | 3500 |
Along with getting some data from the two tables, we’ll also calculate the total compensation for each employee by summing the columns salary and bonus. This new column is called total_compensation.
All data will be stored in the temporary table temp_employee_bonus_info.
CREATE TEMP TABLE temp_employee_bonus_info AS
SELECT
e.id,
e.first_name,
e.last_name,
(e.salary + b.bonus) AS total_compensation,
e.employee_title,
e.department,
e.address,
e.city,
e.email
FROM
sf_employee e
JOIN
sf_bonus b ON e.id = b.worker_ref_id;
This is our data from the temp table.
id | first_name | last_name | salary | bonus | total_compensation | employee_title | department | address | city | |
---|---|---|---|---|---|---|---|---|---|---|
1 | Allen | Wang | 200000 | 5000 | 205000 | Manager | Management | 1069 Ventura Drive | California | Allen@company.com |
2 | Joe | Jack | 1000 | 3000 | 4000 | Sales | Sales | 995 Jim Rosa Lane | California | Joe@company.com |
3 | Henry | Ted | 2000 | 4000 | 6000 | Senior Sales | Sales | 1609 Ford Street | California | Henry@company.com |
1 | Allen | Wang | 200000 | 4500 | 204500 | Manager | Management | 1069 Ventura Drive | California | Allen@company.com |
2 | Joe | Jack | 1000 | 3500 | 4500 | Sales | Sales | 995 Jim Rosa Lane | California | Joe@company.com |
2. Convert all the employee emails to lowercase
For some reason, the email addresses start with a capital letter. It shouldn’t be like that; all the letters should be lowercase.
We use the UPDATE command and the LOWER() function to update the temporary table we created.
UPDATE temp_employee_bonus_info
SET email = LOWER(email);
Now, the data in the temp table is changed.
id | first_name | last_name | salary | bonus | total_compensation | employee_title | department | address | city | |
---|---|---|---|---|---|---|---|---|---|---|
1 | Allen | Wang | 200000 | 5000 | 205000 | Manager | Management | 1069 Ventura Drive | California | allen@company.com |
2 | Joe | Jack | 1000 | 3000 | 4000 | Sales | Sales | 995 Jim Rosa Lane | California | joe@company.com |
3 | Henry | Ted | 2000 | 4000 | 6000 | Senior Sales | Sales | 1609 Ford Street | California | henry@company.com |
1 | Allen | Wang | 200000 | 4500 | 204500 | Manager | Management | 1069 Ventura Drive | California | allen@company.com |
2 | Joe | Jack | 1000 | 3500 | 4500 | Sales | Sales | 995 Jim Rosa Lane | California | joe@company.com |
3. Exclude any employees without a living address and insert the data into a main table named employee_compensation
We use the INSERT INTO command to list all the columns we want to insert into the employee_compensation table.
Then, we select all these columns from the temp_employee_bonus_info table.
Along the way, we exclude all the employees without the address. We do that by specifying the filtering criteria in WHERE. The first condition excludes NULLs, while the second condition uses the TRIM() function to get rid of the blanks.
INSERT INTO employee_compensation (
id,
first_name,
last_name,
salary,
bonus,
employee_title,
department,
address,
city,
email
)
SELECT
id,
first_name,
last_name,
salary,
bonus,
employee_title,
department,
address,
city,
email
FROM temp_employee_bonus_info
WHERE address IS NOT NULL AND TRIM(address) <> '';
Now the data in the employee_compensation table looks like this. And this solves our data engineering problem.
id | first_name | last_name | salary | bonus | total_compensation | employee_title | department | address | city | |
---|---|---|---|---|---|---|---|---|---|---|
1 | Allen | Wang | 200000 | 5000 | 205000 | Manager | Management | 1069 Ventura Drive | California | allen@company.com |
2 | Joe | Jack | 1000 | 3000 | 4000 | Sales | Sales | 995 Jim Rosa Lane | California | joe@company.com |
3 | Henry | Ted | 2000 | 4000 | 6000 | Senior Sales | Sales | 1609 Ford Street | California | henry@company.com |
1 | Allen | Wang | 200000 | 4500 | 204500 | Manager | Management | 1069 Ventura Drive | California | allen@company.com |
2 | Joe | Jack | 1000 | 3500 | 4500 | Sales | Sales | 995 Jim Rosa Lane | California | joe@company.com |
30 | Mark | Jon | 1200 | 500 | 1700 | Sales | Sales | 2522 George Avenue | Alabama | mark@company.com |
Best Practices for Using SQL Temp Tables
Here are some useful tips that will make your using of the SQL temp tables smoother.
1. Name Uniqueness: Use unique, meaningful, and descriptive names to avoid conflicts with other session's temp tables.
2. Drop When Done: Dropping temporary tables once you're done should become your habit. It will free up resources.
3. Use Indexes Sparingly: The indexes consume additional memory and I/O resources. This can sometimes offset the benefit of improving query performance, so consider whether you need indexes.
4. Size Awareness: If your temp table is too large, it might spill to disk, degrading performance.
Conclusion: Leveraging the Potential of SQL Temp Tables
SQL temp tables offer a dynamic way to manipulate, stage, and transform data, especially during intensive or complex querying. They can help in data engineering tasks, and data analysis, can simplify complex queries, and improve their performance.
Their transient nature makes them ideal for interim operations without the risk of cluttering the database. By understanding their strengths and limitations, you can effectively harness the power of SQL temp tables to improve data analysis, processing, and querying.
StrataScratch can help you with that with its vast choice of coding questions. Working with SQL temp tables requires a strong knowledge of many other SQL concepts. By practicing them, you also improve your ability to leverage the SQL temp tables for your work.
Our blog provides a good source for learning about all those SQL concepts and SQL interview questions.