Data Type Conversion in SQL: A Closer Look at CAST Function

CAST AS SQL Function




Casting data types in SQL allows you to change data types according to your needs. This article discusses this essential skill for data professionals.

If programming languages were humans, SQL would be described as a narcissist – it does manipulate (data) and brags about being the best at it.

In this case, manipulation (of data) is positive, as it gives you more flexibility in data handling. Imagine being stuck with data as it is. What a world that would be! No ability to change a data type, no nothing; just query data and wish really hard that the data looks exactly as you want. What are the odds for that? Slim. Slimmer than finding a good movie on Netflix.

Luckily, we have SQL. (Not for streaming movies, though.)

With SQL and its ability to convert data from one type to another, you get access to customizing data to your needs.

It’s an important topic for anyone working with data. So, let’s take it step by step and first explain data type conversion. Then, we’ll look into different ways to convert data.

What Is Data Type Conversion in SQL?

In SQL, data type refers to the data type (e.g., numerical, textual, date…) a column in the database can accept.

Data type conversion is also known as data type casting. It means changing data’s data type from one to another.

In SQL or, to be more specific, Relational Database Management System (RDBMS), two sorts of data type conversion exist: implicit and explicit.

The implicit data type conversion is done automatically by RDBMS. For example, if you’re comparing a number written as a string to integer data, the database will automatically convert the string to an integer before comparing two values. The database automatically decides which data type to convert to.

The explicit data type conversion happens when a database user explicitly ‘orders’ the database to convert data from one data type to another. This is done using dedicated SQL functions for data conversion.

What is the CAST() Function in SQL?

Ninja with headphones

CAST() is a standard SQL function for explicit data type conversion. By allowing you to specify the output data type, it puts you in control of data shaping.

Syntax of the CAST() Function in SQL

The CAST() syntax is very straightforward.

CAST(expression AS data_type);

In this syntax, expression is the value or column to be converted, while data_type is the target data type you want to convert to.

A short example:

CAST('25' AS INT);

This code will convert a string '25' to an integer. In other words, it converts textual data to numerical data.

Types of Conversions Possible With Cast() in SQL

While CAST() offers conversion flexibility, it doesn’t mean you can convert any data type to any data type. Each database has its restrictions, so for detailed information about type conversions, consult the database’s documentation.

But, in general, we can talk about these most common conversions.

  1. String to date/time
  2. String to numeric
  3. Date/time to string
  4. Date/time to numeric
  5. Numeric to string
  6. Numeric to date/time

There’s also an ‘introverted conversion’ (yes, I just came up with this term) when the data type family stays the same, but the data type itself is cast. For example, you can convert an integer to a decimal. You changed the data type, but you’re still operating within the realm of numeric data types.

All these conversions are possible with CAST(). However, it doesn’t mean they are necessary. What I mean by that is some of these conversions are already implicitly executed by a database. So, using CAST() would be unnecessary. Again, you’ll need to consult a database documentation for more information.

Practical Applications and Examples of SQL CAST Function

Let’s now look at several interview questions from our platform that show you the practical applications of CAST().

Example 1: From INT to DECIMAL

It’s best we kick off this example section with one easy interview question by Google.


Tables: online_promotions, online_orders


Link to the question: https://platform.stratascratch.com/coding/2069-sales-with-valid-promotion

This question asks you to output the percentage of orders with promotion IDs.

For that, you have tables online_promotions

Table: online_promotions
promotion_id
1
2


and online_orders.

Table: online_orders
product_idpromotion_idcost_in_dollarscustomer_iddateunits_sold
11212022-04-014
33632022-05-246
122102022-05-013
12322022-05-019
221022022-05-011


The mathematics behind calculating the percentage is dividing the number of promotion order IDs by the total number of orders and multiplying everything by 100.

So, we first use COUNT() to count the promotion_id column, meaning we count the orders with applied promotion.

Then, we use COUNT(*) to count the total number of orders.

Now, we need to divide the two results. However, we need to convert the division result to the DECIMAL data type. Why? Because both the numerator and denominator are integers. We’re looking for a percentage (part of the total number), so the numerator will usually be lower than the denominator. In the integer world, this division will result in 0; the decimal places will be omitted, as the result of the integer division is still an integer.

SELECT ((COUNT(p.promotion_id) / CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;


But, if you convert it to a decimal data type, you’ll get 0.75. Multiply that by 100, and you get 75%.

All required columns and the first 5 rows of the solution are shown

percentage
75

Example 2: From FLOAT to TEXT

Another example is the City of San Francisco question that requires casting data from FLOAT to TEXT.


Table: sf_restaurant_health_violations

Link to the question: https://platform.stratascratch.com/coding/9737-verify-that-the-first-4-digits-are-equal-to-1415-for-all-phone-numbers

The question gives you the table sf_restaurant_health_violations.

Table: sf_restaurant_health_violations
business_idbusiness_namebusiness_addressbusiness_citybusiness_statebusiness_postal_codebusiness_latitudebusiness_longitudebusiness_locationbusiness_phone_numberinspection_idinspection_dateinspection_scoreinspection_typeviolation_idviolation_descriptionrisk_category
5800John Chin Elementary School350 Broadway StSan FranciscoCA9413337.8-122.4{'longitude': '-122.403154', 'needs_recoding': False, 'latitude': '37.798358', 'human_address': '{"address":"","city":"","state":"","zip":""}'}5800_201710172017-10-1798Routine - Unscheduled5800_20171017_103149Wiping cloths not clean or properly stored or inadequate sanitizerLow Risk
64236Sutter Pub and Restaurant700 Sutter StSan FranciscoCA9410237.79-122.41{'longitude': '-122.41188', 'needs_recoding': False, 'latitude': '37.78881', 'human_address': '{"address":"","city":"","state":"","zip":""}'}64236_201707252017-07-2588Routine - Unscheduled64236_20170725_103133Foods not protected from contaminationModerate Risk
1991SRI THAI CUISINE4621 LINCOLN WaySan FranciscoCA9412237.76-122.51{'longitude': '-122.507779', 'needs_recoding': False, 'latitude': '37.764073', 'human_address': '{"address":"","city":"","state":"","zip":""}'}1991_201711292017-11-2986Routine - Unscheduled1991_20171129_103139Improper food storageLow Risk
3816Washington Bakery & Restaurant733 Washington StSan FranciscoCA9410837.8-122.41{'longitude': '-122.405845', 'needs_recoding': False, 'latitude': '37.795174', 'human_address': '{"address":"","city":"","state":"","zip":""}'}3816_201607282016-07-2867Routine - Unscheduled3816_20160728_103108Contaminated or adulterated foodHigh Risk
39119Brothers Restaurant4128 GEARY BlvdSan FranciscoCA9411837.78-122.46{'longitude': '-122.463762', 'needs_recoding': False, 'latitude': '37.781148', 'human_address': '{"address":"","city":"","state":"","zip":""}'}39119_201607182016-07-1879Routine - Unscheduled39119_20160718_103133Foods not protected from contaminationModerate Risk

The task here is to output the number of businesses whose phone numbers don’t start at 1415.

First, you need to use COUNT(*) to count all those businesses you’ll filter in the WHERE clause.

Speaking of WHERE the first condition there (I feel poetic today!) is that the phone number is not NULL. That is to ensure that we don’t count the businesses without a phone number if there are such.

The second condition is that the phone number starts with 1415. You would, naturally, think of using LEFT() to search for a substring from the left. However, the column business_phone_number is FLOAT, a numeric data type, so LEFT(), as a text function, won’t work. Luckily, we have CAST() to convert FLOAT to TEXT, and now LEFT() works!

SELECT COUNT(*) AS n_of_businesses
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL and LEFT(business_phone_number :: TEXT, 4) <> '1415';

The question notes that 0 is expected as a result, and it indeed is.

All required columns and the first 5 rows of the solution are shown

n_of_businesses
1

Example 3: From INTERVAL to TIME

The final example is a hard interview question by Meta.


Table: facebook_web_log

Link to the question: https://platform.stratascratch.com/coding/9784-time-between-two-events

The question asks you to query Meta’s weblogs to find the user with the least time between a page load and their scroll down. In the output, we will show the user ID, page load time, scroll-down time, and time between the two events in seconds.

Here’s the dataset named facebook_web_log.

Table: facebook_web_log
user_idtimestampaction
02019-04-25 13:30:15page_load
02019-04-25 13:30:18page_load
02019-04-25 13:30:40scroll_down
02019-04-25 13:30:45scroll_up
02019-04-25 13:31:10scroll_down

Let me explain the code one CTE by one.

The first CTE self-joins the web log table. By doing that, we can use the timestamp column from the ‘both’ tables to show the load and scroll time by user ID. Additionally, we subtract those two times to get the time between the two events. Additionally, we use WHERE to keep only rows where the page load is followed by scroll down, and the scroll down happens after the page load.

WITH initial_data AS (
  SELECT t1.user_id,
         t1.timestamp AS load_time,
         t2.timestamp AS scroll_time,
         t2.timestamp - t1.timestamp AS duration
  FROM facebook_web_log t1
  JOIN facebook_web_log t2 ON t1.user_id = t2.user_id
  WHERE t1.action = 'page_load' AND t2.action = 'scroll_down' AND t2.timestamp > t1.timestamp
)


Here’s the snapshot of this CTE’s output.

All required columns and the first 5 rows of the solution are shown

user_idload_timescroll_timeduration
02019-04-25 13:30:152019-04-25 13:30:4025
02019-04-25 13:30:152019-04-25 13:31:1055
02019-04-25 13:30:152019-04-25 13:31:2570
02019-04-25 13:30:152019-04-28 13:30:40259225
02019-04-25 13:30:182019-04-25 13:30:4022
02019-04-25 13:30:182019-04-25 13:31:1052
02019-04-25 13:30:182019-04-25 13:31:2567
02019-04-25 13:30:182019-04-28 13:30:40259222
12019-04-25 13:40:002019-04-25 13:40:1010
12019-04-25 13:40:002019-04-25 13:40:1515
12019-04-25 13:40:002019-04-25 13:40:2020
12019-04-25 13:40:002019-04-25 13:40:2525
12019-04-25 13:40:002019-04-25 13:40:3030
12019-04-25 13:40:002019-04-26 11:15:1077710
12019-04-25 13:40:002019-04-26 11:15:2077720
22019-04-25 13:41:212019-04-25 13:41:309
22019-04-25 13:41:212019-04-25 13:41:3514
12019-04-26 11:15:002019-04-26 11:15:1010
12019-04-26 11:15:002019-04-26 11:15:2020


We got duration, cool, but what does it mean? 25 what? Seconds, minutes, hours? We don’t know, as this is shown as an INTERVAL data type. The question asks us to show the duration in seconds. So it’s best that we take this data and convert it to TIME. This is exactly what the following CTE does by referencing the first one

formatted_data AS (
  SELECT user_id,
         load_time,
         scroll_time,
         CAST(duration AS TIME) AS duration
  FROM initial_data
)


The next query references the previous one to rank the duration using the RANK() window function. It ranks the users from the shortest to the longest time between loading and scrolling.

ranked_data AS (
  SELECT *,
         RANK() OVER (ORDER BY duration) AS rnk
  FROM formatted_data
)


We can now put all this together and add another SELECT that queries the ranked data and returns only the best-ranked user.

WITH initial_data AS (
  SELECT t1.user_id,
         t1.timestamp AS load_time,
         t2.timestamp AS scroll_time,
         t2.timestamp - t1.timestamp AS duration
  FROM facebook_web_log t1
  JOIN facebook_web_log t2 ON t1.user_id = t2.user_id
  WHERE t1.action = 'page_load' AND t2.action = 'scroll_down' AND t2.timestamp > t1.timestamp
),

formatted_data AS (
  SELECT user_id,
         load_time,
         scroll_time,
         CAST(duration AS TIME) AS duration
  FROM initial_data
),

ranked_data AS (
  SELECT *,
         RANK() OVER (ORDER BY duration) AS rnk
  FROM formatted_data
)

SELECT user_id,
       load_time,
       scroll_time,
       duration
FROM ranked_data
WHERE rnk = 1;


Here’s that one user.

All required columns and the first 5 rows of the solution are shown

user_idload_timescroll_timeduration
22019-04-25 13:41:212019-04-25 13:41:3000:00:09

Alternatives to CAST() in SQL

CAST() is not the only way to convert data types. There are also alternatives in each database. These alternatives are often similar but are sometimes also database-specific.

Alternatives cast chart

PostgreSQL Alternative to CAST()

In PostgreSQL, you can use the type cast operator '::' instead of CAST(). This operator is just a shortcut for CAST() unique to PostgreSQL.

Its syntax is shown below.

:: AS data_type;


Take a look at this code.

SELECT ((COUNT(p.promotion_id) / CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;


Instead, I could write it like this using the type cast operator.

SELECT ((COUNT(p.promotion_id) / COUNT(*)::DECIMAL) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;

MySQL Alternative to CAST()

In MySQL, there’s another function with the same purpose as CAST(), and is called CONVERT().

Its syntax is:

CONVERT(expression, data_type);


So, the previous code example could be written like this in MySQL.

SELECT ((COUNT(p.promotion_id) / CONVERT(COUNT(*), DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;

SQL Server Alternatives to CAST()

SQL Server has several alternatives to CAST():

  1. CONVERT()
  2. TRY_CAST()
  3. PARSE()
  4. TRY_PARSE()

1. CONVERT()

The SQL Server’s CONVERT() function is the same as MySQL’s in terms of its use.

However, the syntax’s not the same. In SQL Server, it goes like this.

CONVERT(data_type, expression);

So, the previous code should be rewritten like this.

SELECT ((COUNT(p.promotion_id) / CONVERT(DECIMAL, COUNT(*))) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;

2. TRY_CAST()

This is the CAST() variation that adds the layer of error handling, as it returns NULL (instead of error) if the conversion is not possible.

The syntax is the same as CAST().

TRY_CAST(expression AS data_type);


Applied to the above example, the code looks like this.

SELECT ((COUNT(p.promotion_id) / TRY_CAST(COUNT(*) AS DECIMAL)) * 100.0) AS percentage
FROM online_orders o
LEFT JOIN online_promotions p ON o.promotion_id = p.promotion_id;

3. PARSE()

Another alternative to CAST() is PARSE(). However, it’s only a partial alternative, as this function is used for converting string values to date, time, and number values. This function can take the culture parameter for culture-specific formatting.

Here’s the syntax.

PARSE(string AS data_type);


This code, for example…

SELECT PARSE('15 April 2023' AS DATE USING 'en-US');


… would convert the string to date looking like this: 2023-04-15.

4. TRY_PARSE()

This is the same as what TRY_CAST() is to CAST(): it returns NULL if conversion is not possible.

Again, it’s the same syntax.

TRY_PARSE(string AS data_type);

You can write the above code using this function, too.

SELECT TRY_PARSE('15 April 2023' AS DATE USING 'en-US');

Oracle Alternatives to CAST()

Oracle has three alternatives to CAST(), but they are in the form of type-specific functions.

  1. TO_DATE()
  2. TO_CHAR()
  3. TO_NUMBER()

1. TO_DATE()

This is a function for converting strings to numeric data types.

Here’s the syntax.

TO_DATE(string, [format])

So, writing this code…

SELECT TO_DATE('15 April 2023', 'DD MONTH YYYY') FROM dual

…returns this: 2023-04-15 00:00:00.

The format argument must reflect the format of the string.

2. TO_CHAR()

This function converts a date or a number to a string.

The function’s syntax is shown below.

TO_CHAR(value)

So, this code…

SELECT COUNT(*) AS n_of_businesses
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
AND SUBSTR(CAST(business_phone_number AS VARCHAR2(100)), 1, 4) <> '1415'

…can be rewritten like this.

SELECT COUNT(*)
FROM sf_restaurant_health_violations
WHERE business_phone_number IS NOT NULL
AND SUBSTR(TO_CHAR(business_phone_number), 1, 4) <> '1415'

3. TO_NUMBER()

And, finally, the TO_NUMBER(). This function converts strings to numbers.

The syntax is:

TO_NUMBER(string, format)

If your numbers were written as text you would convert them to numbers like this.

SELECT TO_NUMBER('123.45', '999.99') FROM dual

Same as with TO_DATE(); the format argument must reflect the string format.

Frequent Errors SQL Users Make With Data Type Conversion and CAST()

Users make three typical errors when casting data types.

Errors in Data Conversion

1. Loss of Precision

In SQL, different data types have different levels of precision. The loss of precision is when significant digits of accuracy are lost when converting data. This happens when you convert from high-precision data to lower-precision data.

So, besides knowing general data types, you also need to know how they show data and behave in mathematical operations. For example, how many decimal places will there be, are they rounded up or down, etc. Sometimes, these differences can significantly affect your data.

Take a look at this example.

SELECT CAST(158.45 AS INT) AS integer_value;


If you do that, you convert 158.68 to 156. This might be the change you don’t want!

2. Incompatible Conversions

Another common error is when you’re trying to convert one data type to a data type that it’s not convertible to.

For example, you could try to convert DATE to INT.

SELECT CAST('2022-04-15' AS INT) AS integer_value;

This won’t work; you’ll get an error.

3. Performance Issues

Converting data slows down a query. This is not noticeable in the examples we’ve shown. But, if you’re type-casting on a large dataset and in complex queries, the excessive use of CAST() and other conversion functions can seriously degrade performance.

To avoid this, you should use CAST() only when necessary. (This is where your knowledge of implicit conversions comes into the game.) You could also optimize your queries or try to find a way to use appropriate data types from the outset.

Conclusion

SQL CAST() is a rather simple function that can do wonders with your data. The examples I’ve shown you are just three of many data type conversion interview questions we have on StrataScratch. Practicing more similar questions will help you learn how different data types behave; converting data types should be like second nature to you. The same goes for many other SQL concepts you can practice on our platform or read in our blog.

This is especially important if you’re preparing for SQL interview questions.

CAST AS SQL Function


Become a data expert. Subscribe to our newsletter.