Data Type Conversion in SQL: A Closer Look at CAST Function
Categories
Create a waterfall chart to illustrate the step-by-step revenue changes throughout the year for a retail company, using 'tomato' for increases and 'navy' for decreases.
Casting data types in SQL allows you to change data types according to your needs. This article discusses this essential skill for data professionals.
If programming languages were humans, SQL would be described as a narcissist – it does manipulate (data) and brags about being the best at it.
In this case, manipulation (of data) is positive, as it gives you more flexibility in data handling. Imagine being stuck with data as it is. What a world that would be! No ability to change a data type, no nothing; just query data and wish really hard that the data looks exactly as you want. What are the odds for that? Slim. Slimmer than finding a good movie on Netflix.
Luckily, we have SQL. (Not for streaming movies, though.)
With SQL and its ability to convert data from one type to another, you get access to customizing data to your needs.
It’s an important topic for anyone working with data. So, let’s take it step by step and first explain data type conversion. Then, we’ll look into different ways to convert data.
What Is Data Type Conversion in SQL?
In SQL, data type refers to the data type (e.g., numerical, textual, date…) a column in the database can accept.
Data type conversion is also known as data type casting. It means changing data’s data type from one to another.
In SQL or, to be more specific, Relational Database Management System (RDBMS), two sorts of data type conversion exist: implicit and explicit.
The implicit data type conversion is done automatically by RDBMS. For example, if you’re comparing a number written as a string to integer data, the database will automatically convert the string to an integer before comparing two values. The database automatically decides which data type to convert to.
The explicit data type conversion happens when a database user explicitly ‘orders’ the database to convert data from one data type to another. This is done using dedicated SQL functions for data conversion.
What is the CAST() Function in SQL?
CAST() is a standard SQL function for explicit data type conversion. By allowing you to specify the output data type, it puts you in control of data shaping.
Syntax of the CAST() Function in SQL
The CAST() syntax is very straightforward.
CAST(expression AS data_type);
In this syntax, expression is the value or column to be converted, while data_type is the target data type you want to convert to.
A short example:
CAST('25' AS INT);
This code will convert a string '25' to an integer. In other words, it converts textual data to numerical data.
Types of Conversions Possible With Cast() in SQL
While CAST() offers conversion flexibility, it doesn’t mean you can convert any data type to any data type. Each database has its restrictions, so for detailed information about type conversions, consult the database’s documentation.
But, in general, we can talk about these most common conversions.
- String to date/time
- String to numeric
- Date/time to string
- Date/time to numeric
- Numeric to string
- Numeric to date/time
There’s also an ‘introverted conversion’ (yes, I just came up with this term) when the data type family stays the same, but the data type itself is cast. For example, you can convert an integer to a decimal. You changed the data type, but you’re still operating within the realm of numeric data types.
All these conversions are possible with CAST(). However, it doesn’t mean they are necessary. What I mean by that is some of these conversions are already implicitly executed by a database. So, using CAST() would be unnecessary. Again, you’ll need to consult a database documentation for more information.
Practical Applications and Examples of SQL CAST Function
Let’s now look at several interview questions from our platform that show you the practical applications of CAST().
Example 1: From INT to DECIMAL
It’s best we kick off this example section with one easy interview question by Google.
Interview Question Date: October 2021
The marketing manager wants you to evaluate how well the previously ran advertising campaigns are working.
Particularly, they are interested in the promotion IDs from the online_promotions
table.
Find the percentage of orders with promotion IDs from the online_promotions
table applied.
Link to the question: https://platform.stratascratch.com/coding/2069-sales-with-valid-promotion
This question asks you to output the percentage of orders with promotion IDs.
For that, you have tables online_promotions
promotion_id |
---|
1 |
2 |
and online_orders.
product_id | promotion_id | cost_in_dollars | customer_id | date | units_sold |
---|---|---|---|---|---|
1 | 1 | 2 | 1 | 2022-04-01 | 4 |
3 | 3 | 6 | 3 | 2022-05-24 | 6 |
1 | 2 | 2 | 10 | 2022-05-01 | 3 |
1 | 2 | 3 | 2 | 2022-05-01 | 9 |
2 | 2 | 10 | 2 | 2022-05-01 | 1 |
The mathematics behind calculating the percentage is dividing the number of promotion order IDs by the total number of orders and multiplying everything by 100.
So, we first use COUNT() to count the promotion_id column, meaning we count the orders with applied promotion.
Then, we use COUNT(*) to count the total number of orders.
Now, we need to divide the two results. However, we need to convert the division result to the DECIMAL data type. Why? Because both the numerator and denominator are integers. We’re looking for a percentage (part of the total number), so the numerator will usually be lower than the denominator. In the integer world, this division will result in 0; the decimal places will be omitted, as the result of the integer division is still an integer.
SELECT ((COUNT(p.promotion_id) / CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
But, if you convert it to a decimal data type, you’ll get 0.75. Multiply that by 100, and you get 75%.
percentage |
---|
75 |
Example 2: From FLOAT to TEXT
Another example is the City of San Francisco question that requires casting data from FLOAT to TEXT.
Verify that the first 4 digits are equal to 1415 for all phone numbers. Output the number of businesses with a phone number that does not start with 1415.
Link to the question: https://platform.stratascratch.com/coding/9737-verify-that-the-first-4-digits-are-equal-to-1415-for-all-phone-numbers
The question gives you the table sf_restaurant_health_violations.
business_id | business_name | business_address | business_city | business_state | business_postal_code | business_latitude | business_longitude | business_location | business_phone_number | inspection_id | inspection_date | inspection_score | inspection_type | violation_id | violation_description | risk_category |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5800 | John Chin Elementary School | 350 Broadway St | San Francisco | CA | 94133 | 37.8 | -122.4 | {'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 5800_20171017 | 2017-10-17 | 98 | Routine - Unscheduled | 5800_20171017_103149 | Wiping cloths not clean or properly stored or inadequate sanitizer | Low Risk | |
64236 | Sutter Pub and Restaurant | 700 Sutter St | San Francisco | CA | 94102 | 37.79 | -122.41 | {'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 64236_20170725 | 2017-07-25 | 88 | Routine - Unscheduled | 64236_20170725_103133 | Foods not protected from contamination | Moderate Risk | |
1991 | SRI THAI CUISINE | 4621 LINCOLN Way | San Francisco | CA | 94122 | 37.76 | -122.51 | {'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 1991_20171129 | 2017-11-29 | 86 | Routine - Unscheduled | 1991_20171129_103139 | Improper food storage | Low Risk | |
3816 | Washington Bakery & Restaurant | 733 Washington St | San Francisco | CA | 94108 | 37.8 | -122.41 | {'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 3816_20160728 | 2016-07-28 | 67 | Routine - Unscheduled | 3816_20160728_103108 | Contaminated or adulterated food | High Risk | |
39119 | Brothers Restaurant | 4128 GEARY Blvd | San Francisco | CA | 94118 | 37.78 | -122.46 | {'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'} | 39119_20160718 | 2016-07-18 | 79 | Routine - Unscheduled | 39119_20160718_103133 | Foods not protected from contamination | Moderate Risk |
The task here is to output the number of businesses whose phone numbers don’t start at 1415.
First, you need to use COUNT(*) to count all those businesses you’ll filter in the WHERE clause.
Speaking of WHERE the first condition there (I feel poetic today!) is that the phone number is not NULL. That is to ensure that we don’t count the businesses without a phone number if there are such.
The second condition is that the phone number starts with 1415. You would, naturally, think of using LEFT() to search for a substring from the left. However, the column business_phone_number is FLOAT, a numeric data type, so LEFT(), as a text function, won’t work. Luckily, we have CAST() to convert FLOAT to TEXT, and now LEFT() works!
SELECT COUNT(*) AS n_of_businesses
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL and LEFT(business_phone_number :: TEXT, 4) <> '1415';
The question notes that 0 is expected as a result, and it indeed is.
n_of_businesses |
---|
1 |
Example 3: From INTERVAL to TIME
The final example is a hard interview question by Meta.
Interview Question Date: July 2018
Meta/Facebook's web logs capture every action from users starting from page loading to page scrolling. Find the user with the least amount of time between a page load and their scroll down. Your output should include the user id, page load time, scroll down time, and time between the two events in seconds.
Link to the question: https://platform.stratascratch.com/coding/9784-time-between-two-events
The question asks you to query Meta’s weblogs to find the user with the least time between a page load and their scroll down. In the output, we will show the user ID, page load time, scroll-down time, and time between the two events in seconds.
Here’s the dataset named facebook_web_log.
user_id | timestamp | action |
---|---|---|
0 | 2019-04-25 13:30:15 | page_load |
0 | 2019-04-25 13:30:18 | page_load |
0 | 2019-04-25 13:30:40 | scroll_down |
0 | 2019-04-25 13:30:45 | scroll_up |
0 | 2019-04-25 13:31:10 | scroll_down |
Let me explain the code one CTE by one.
The first CTE self-joins the web log table. By doing that, we can use the timestamp column from the ‘both’ tables to show the load and scroll time by user ID. Additionally, we subtract those two times to get the time between the two events. Additionally, we use WHERE to keep only rows where the page load is followed by scroll down, and the scroll down happens after the page load.
WITH initial_data AS (
SELECT t1.user_id,
t1.timestamp AS load_time,
t2.timestamp AS scroll_time,
t2.timestamp - t1.timestamp AS duration
FROM facebook_web_log t1
JOIN facebook_web_log t2 ON t1.user_id = t2.user_id
WHERE t1.action = 'page_load' AND t2.action = 'scroll_down' AND t2.timestamp > t1.timestamp
)
Here’s the snapshot of this CTE’s output.
user_id | load_time | scroll_time | duration |
---|---|---|---|
0 | 2019-04-25 13:30:15 | 2019-04-25 13:30:40 | 25 |
0 | 2019-04-25 13:30:15 | 2019-04-25 13:31:10 | 55 |
0 | 2019-04-25 13:30:15 | 2019-04-25 13:31:25 | 70 |
0 | 2019-04-25 13:30:15 | 2019-04-28 13:30:40 | 259225 |
0 | 2019-04-25 13:30:18 | 2019-04-25 13:30:40 | 22 |
0 | 2019-04-25 13:30:18 | 2019-04-25 13:31:10 | 52 |
0 | 2019-04-25 13:30:18 | 2019-04-25 13:31:25 | 67 |
0 | 2019-04-25 13:30:18 | 2019-04-28 13:30:40 | 259222 |
1 | 2019-04-25 13:40:00 | 2019-04-25 13:40:10 | 10 |
1 | 2019-04-25 13:40:00 | 2019-04-25 13:40:15 | 15 |
1 | 2019-04-25 13:40:00 | 2019-04-25 13:40:20 | 20 |
1 | 2019-04-25 13:40:00 | 2019-04-25 13:40:25 | 25 |
1 | 2019-04-25 13:40:00 | 2019-04-25 13:40:30 | 30 |
1 | 2019-04-25 13:40:00 | 2019-04-26 11:15:10 | 77710 |
1 | 2019-04-25 13:40:00 | 2019-04-26 11:15:20 | 77720 |
2 | 2019-04-25 13:41:21 | 2019-04-25 13:41:30 | 9 |
2 | 2019-04-25 13:41:21 | 2019-04-25 13:41:35 | 14 |
1 | 2019-04-26 11:15:00 | 2019-04-26 11:15:10 | 10 |
1 | 2019-04-26 11:15:00 | 2019-04-26 11:15:20 | 20 |
We got duration, cool, but what does it mean? 25 what? Seconds, minutes, hours? We don’t know, as this is shown as an INTERVAL data type. The question asks us to show the duration in seconds. So it’s best that we take this data and convert it to TIME. This is exactly what the following CTE does by referencing the first one
formatted_data AS (
SELECT user_id,
load_time,
scroll_time,
CAST(duration AS TIME) AS duration
FROM initial_data
)
The next query references the previous one to rank the duration using the RANK() window function. It ranks the users from the shortest to the longest time between loading and scrolling.
ranked_data AS (
SELECT *,
RANK() OVER (ORDER BY duration) AS rnk
FROM formatted_data
)
We can now put all this together and add another SELECT that queries the ranked data and returns only the best-ranked user.
WITH initial_data AS (
SELECT t1.user_id,
t1.timestamp AS load_time,
t2.timestamp AS scroll_time,
t2.timestamp - t1.timestamp AS duration
FROM facebook_web_log t1
JOIN facebook_web_log t2 ON t1.user_id = t2.user_id
WHERE t1.action = 'page_load' AND t2.action = 'scroll_down' AND t2.timestamp > t1.timestamp
),
formatted_data AS (
SELECT user_id,
load_time,
scroll_time,
CAST(duration AS TIME) AS duration
FROM initial_data
),
ranked_data AS (
SELECT *,
RANK() OVER (ORDER BY duration) AS rnk
FROM formatted_data
)
SELECT user_id,
load_time,
scroll_time,
duration
FROM ranked_data
WHERE rnk = 1;
Here’s that one user.
user_id | load_time | scroll_time | duration |
---|---|---|---|
2 | 2019-04-25 13:41:21 | 2019-04-25 13:41:30 | 00:00:09 |
Alternatives to CAST() in SQL
CAST() is not the only way to convert data types. There are also alternatives in each database. These alternatives are often similar but are sometimes also database-specific.
PostgreSQL Alternative to CAST()
In PostgreSQL, you can use the type cast operator '::' instead of CAST(). This operator is just a shortcut for CAST() unique to PostgreSQL.
Its syntax is shown below.
:: AS data_type;
Take a look at this code.
SELECT ((COUNT(p.promotion_id) / CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
Instead, I could write it like this using the type cast operator.
SELECT ((COUNT(p.promotion_id) / COUNT(*)::DECIMAL) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
MySQL Alternative to CAST()
In MySQL, there’s another function with the same purpose as CAST(), and is called CONVERT().
Its syntax is:
CONVERT(expression, data_type);
So, the previous code example could be written like this in MySQL.
SELECT ((COUNT(p.promotion_id) / CONVERT(COUNT(*), DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
SQL Server Alternatives to CAST()
SQL Server has several alternatives to CAST():
- CONVERT()
- TRY_CAST()
- PARSE()
- TRY_PARSE()
1. CONVERT()
The SQL Server’s CONVERT() function is the same as MySQL’s in terms of its use.
However, the syntax’s not the same. In SQL Server, it goes like this.
CONVERT(data_type, expression);
So, the previous code should be rewritten like this.
SELECT ((COUNT(p.promotion_id) / CONVERT(DECIMAL, COUNT(*))) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
2. TRY_CAST()
This is the CAST() variation that adds the layer of error handling, as it returns NULL (instead of error) if the conversion is not possible.
The syntax is the same as CAST().
TRY_CAST(expression AS data_type);
Applied to the above example, the code looks like this.
SELECT ((COUNT(p.promotion_id) / TRY_CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;
3. PARSE()
Another alternative to CAST() is PARSE(). However, it’s only a partial alternative, as this function is used for converting string values to date, time, and number values. This function can take the culture parameter for culture-specific formatting.
Here’s the syntax.
PARSE(string AS data_type);
This code, for example…
SELECT PARSE('15 April 2023' AS DATE USING 'en-US');
… would convert the string to date looking like this: 2023-04-15.
4. TRY_PARSE()
This is the same as what TRY_CAST() is to CAST(): it returns NULL if conversion is not possible.
Again, it’s the same syntax.
TRY_PARSE(string AS data_type);
You can write the above code using this function, too.
SELECT TRY_PARSE('15 April 2023' AS DATE USING 'en-US');
Oracle Alternatives to CAST()
Oracle has three alternatives to CAST(), but they are in the form of type-specific functions.
- TO_DATE()
- TO_CHAR()
- TO_NUMBER()
1. TO_DATE()
This is a function for converting strings to numeric data types.
Here’s the syntax.
TO_DATE(string, [format])
So, writing this code…
SELECT TO_DATE('15 April 2023', 'DD MONTH YYYY') FROM dual
…returns this: 2023-04-15 00:00:00.
The format argument must reflect the format of the string.
2. TO_CHAR()
This function converts a date or a number to a string.
The function’s syntax is shown below.
TO_CHAR(value)
So, this code…
SELECT COUNT(*) AS n_of_businesses
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
AND SUBSTR(CAST(business_phone_number AS VARCHAR2(100)), 1, 4) <> '1415'
…can be rewritten like this.
SELECT COUNT(*)
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
AND SUBSTR(TO_CHAR(business_phone_number), 1, 4) <> '1415'
3. TO_NUMBER()
And, finally, the TO_NUMBER(). This function converts strings to numbers.
The syntax is:
TO_NUMBER(string, format)
If your numbers were written as text you would convert them to numbers like this.
SELECT TO_NUMBER('123.45', '999.99') FROM dual
Same as with TO_DATE(); the format argument must reflect the string format.
Frequent Errors SQL Users Make With Data Type Conversion and CAST()
Users make three typical errors when casting data types.
1. Loss of Precision
In SQL, different data types have different levels of precision. The loss of precision is when significant digits of accuracy are lost when converting data. This happens when you convert from high-precision data to lower-precision data.
So, besides knowing general data types, you also need to know how they show data and behave in mathematical operations. For example, how many decimal places will there be, are they rounded up or down, etc. Sometimes, these differences can significantly affect your data.
Take a look at this example.
SELECT CAST(158.45 AS INT) AS integer_value;
If you do that, you convert 158.68 to 156. This might be the change you don’t want!
2. Incompatible Conversions
Another common error is when you’re trying to convert one data type to a data type that it’s not convertible to.
For example, you could try to convert DATE to INT.
SELECT CAST('2022-04-15' AS INT) AS integer_value;
This won’t work; you’ll get an error.
3. Performance Issues
Converting data slows down a query. This is not noticeable in the examples we’ve shown. But, if you’re type-casting on a large dataset and in complex queries, the excessive use of CAST() and other conversion functions can seriously degrade performance.
To avoid this, you should use CAST() only when necessary. (This is where your knowledge of implicit conversions comes into the game.) You could also optimize your queries or try to find a way to use appropriate data types from the outset.
Conclusion
SQL CAST() is a rather simple function that can do wonders with your data. The examples I’ve shown you are just three of many data type conversion interview questions we have on StrataScratch. Practicing more similar questions will help you learn how different data types behave; converting data types should be like second nature to you. The same goes for many other SQL concepts you can practice on our platform or read in our blog.
This is especially important if you’re preparing for SQL interview questions.