Postgres vs MySQL: Which is Better for Analytics?
Categories
Postgres vs MySQL: Looking into differences (and some similarities) between PostgreSQL and MySQL to help you decide which one suits your needs best.
There are three main similarities between PostgreSQL and MySQL. They are highly popular, they’re open-source, and they both use their own dialects to communicate with the databases.
They are for years among the most popular RDBMSes (or databases, as we’re going to call them colloquially from now on). Data from Statista shows both databases in the four most popular databases as of January 2022.
Of those four, only PostgreSQL and MySQL are open-source databases. Being open-source means these two are completely free, which also has a significant (positive!) impact on database popularity. It’s not the only benefit. The possibility that anyone can modify the software leads to flexibility and a community providing quality support. It’s only multiplied by the popularity of the two databases and the sheer number of its users and contributors.
All this means both PostgreSQL and MySQL dialects are almost unavoidable in the data science world.
But how do you decide which one you should use? Nobody has ever decided between two choices based on their similarities. The differences are what make, well, the difference. As a data scientist, you’re probably not that concerned about database architecture, performance, integrity, and all other stuff RDBMS is for.
What will make you give a pass (or not) to a certain database are the possibilities of the SQL dialect and how well it serves you to do your job.
That’s why we’re going to look into the syntax differences between PostgreSQL and MySQL and their possibilities.
Here’s the overview of the differences we’ll cover, with detailed explanations following.
Postgres vs MySQL: Data Types
Speaking of possibilities, when working with data, one of the starting points to consider should be data types. Knowing which data types a database allows give you information on how easy you’ll be able to complete your project. Or if you’ll be able!
Difference Overview
MySQL Data Types
MySQL has five data type categories:
- Numeric Data Types
- Date and Time Data Types
- String Data Types
- Spatial Data Types
- The JSON Data Types
PostgreSQL Data Types
MySQL’s five data types seem rather basic compared to PostgreSQL:
- Numeric Data Types
- Monetary Data Types
- Character Data Types
- Binary Data Types
- Date and Time Data Types
- Boolean Data Type
- Enumerated Data Types
- Geometric Data Types
- Network Address Data Types
- Bit String Data Types
- Text Search Data Types
- UUID Data Types
- XML Data Types
- JSON Data Types
- Arrays
- Composite Data Types
- Range Data Types
- Object Identifier Data Types
- pg_lsn Data Type
- Pseudo-Types
Verdict: PostgreSQL Wins!
Basically, it doesn’t matter which dialect you use if you’re working with basic data types, such as numeric, date-and-time, or text data. However, if you require a wider range of data types, some atypical data types such as geometric data types, or working with unstructured data, learning PostgreSQL might be a go-to. It’s also interesting to note that MySQL, unlike PostgreSQL, doesn’t have the Boolean data type. Instead, the TINYINT(1) is used for such data.
Postgres vs MySQL: Case Sensitivity
Case sensitivity influences the ease of SQL code writing. If the dialect is case-sensitive, it means you have to be very careful about getting this right. If the code is case-insensitive, you can relax and write whatever you want in regards to the upper or lower case.
Difference Overview
MySQL Case Sensitivity
MySQL is case insensitive, which means you don’t have to think about the upper and lower case when writing a query.
Let’s take this interview question by the City of Los Angeles to showcase this:
Find all inspections which are part of an inactive program.
Link to the question: https://platform.stratascratch.com/coding/10277-find-all-inspections-which-are-part-of-an-inactive-program
For the sake of readability, we’ll only select two columns in the query otherwise the same as the official solution. If your MySQL code looks like this, you’d get a correct answer.
SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'inactive'
Don't worry if you check this solution and find the answer is not correct. It's only because we selected two instead of all columns, like in the official solution.
Here are the first several rows of the output:
Even though the values in the column program_status are written as ‘INACTIVE’, you could write ‘inactive’ in the WHERE clause. In MySQL, this doesn’t matter. You could write ‘INACTIVE’, ‘inactive’ or even go wild and write ‘iNaCtIvE’, the output will contain all the inspections that are part of an inactive program, no matter the case.
The above code would cover all possibilities, even if the table was named Los_Angeles_restaurant_health_inspections.
PostgreSQL Case Sensitivity
PostgreSQL is case sensitive. Here’s what that means in practical terms. To get the correct answer to the above question, your code must be written like shown below.
SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'INACTIVE'
What if you write it this way?
SELECT serial_number,
program_status
FROM los_angeles_restaurant_health_inspections
WHERE program_status = 'inactive'
The output is here. Sort of.
There’s nothing because your string in the WHERE clause doesn’t match the case of data in the column program_status.
PostgreSQL is also case-sensitive when it comes to column and table names. Or, to be more specific, case-sensitivity becomes important if your column and/or table name includes upper case. If the above table was named Los_Angeles_restaurant_health_inspections and the columns are Serial_Number and Program_Status, your PostgreSQL code should be:
SELECT “Serial_Number”,
“Program_Status”
FROM Los_Angeles_restaurant_health_inspections
WHERE “Program_Status” = 'INACTIVE'
Names with the uppercase should be double-quoted and match the table or column case.
Some workarounds to make PostgreSQL case-insensitive include using citext extension, ILIKE operator, or LOWER() function.
When the names are all in lowercase, and you write your query in uppercase, this query would still work:
SELECT SERIAL_NUMBER,
PROGRAM_STATUS
FROM LOS_ANGELES_RESTAURANT_HEALTH_INSPECTIONS
WHERE PROGRAM_STATUS = 'INACTIVE'
Verdict: MySQL Wins!
It’s simply more practical to write a code when you don’t have to think about the upper or lower case. Working on any serious project would become painstakingly difficult otherwise. To get everything right, you would have to know how exactly every table or column is named in terms of cases. The same goes for the data in tables.
Postgres vs MySQL: Converting Character Sets and Strings to UTF-8
UTF-8 is an encoding system that allows storing every possible character and symbol. Within this system, you’re not limited to standard Latin letters. You could store any letter from any alphabet and various other symbols. This even includes emojis.
Difference Overview
MySQL Conversion to UTF-8
Depending on the version you’re working on, you might be required to convert to UTF-8.
PostgreSQL Conversion to UTF-8
PostgreSQL works completely the opposite. There’s no need to convert character sets and strings to UTF-8. Not only that but the UTF-8 syntax is not allowed at all.
Verdict: PostgreSQL Wins!
Both dialects support the UTF-8 encoding system, which is great. This means you can work with all data outside the typical Latin letters.
But, there’s a but! To do that in PostgreSQL, you don’t have to do anything. You simply use the characters you want, and the database does the rest. In MySQL, you have one additional step, which is a conversion. That’s why PostgreSQL is a winner here.
Postgres vs MySQL: Conditional Statement
The IF-THEN-ELSE or a conditional statement is common in programming languages. It is used to compare data, and based on that comparison, the query returns (different) results. In other words, the output depends on whether the set condition is met or not.
Difference Overview
MySQL Conditional Statement
MySQL allows you to use the IF() and NULLIF() functions when writing a conditional statement. The IF() function is a classic IF-THEN-ELSE statement whose arguments specify the condition, the output if the condition is true, and the output if the condition is false.
The question by ESPN will be our showcase here:
“Add a column to each row which will classify Olympics that athlete is competing in as 'European' or 'NonEuropean' based on the city it was hosted. Output all details along with the corresponding city classification.
European cities are Athina, Berlin, London, Paris, and Lillehammer.”
Link to the question: https://platform.stratascratch.com/coding/10185-european-and-non-european-olympics
If you want to solve it in MySQL, you’d use the IF() function:
SELECT *,
IF (city = 'Athina'
OR city = 'Berlin'
OR city = 'London'
OR city = 'Paris'
OR city = 'Lillehammer',
'European',
'NonEuropean') AS city_classification
FROM olympics_athletes_events
PostgreSQL Conditional Statement
In PostgreSQL, the IF() and NULLIF() functions don’t exist. However, that doesn’t mean you can’t write conditional statements. To do that, you’ll need to use the SQL CASE WHEN statements.
Here’s how it works in a code:
SELECT *,
(CASE
WHEN city IN ('Athina',
'Berlin',
'London',
'Paris',
'Lillehammer', 'Albertville') THEN 'European'
ELSE 'NonEuropean'
END) AS city_classification
FROM olympics_athletes_events
Both codes return the same result (which we’ll show only partially).
Verdict: PostgreSQL Wins!
The PostgreSQL wins here, too. While in the above example, both IF() function and the CASE WHEN statement do the same thing in an equally easy way, conditions can get much more complicated. When they do, the CASE WHEN then becomes much easier to write and read. It allows you to set out all the conditions in a very tidy way. In the IF() statement, you’d probably have to nest multiple IF() functions, which can get quite messy and is prone to errors.
Additionally, the CASE WHEN statement is standard SQL, so it will work on any other SQL-standard database.
Postgres vs MySQL: Deleting Data and Tables
As you would probably expect, both SQL dialects support DELETE, DROP, and TRUNCATE commands. Still, there are some differences in the allowed extensions of these commands.
Difference Overview
MySQL DROP and TRUNCATE
When you want to DROP a table together with its dependent objects, such as other tables and views, using the CASCADE keyword along with the DROP command would be very useful. Unfortunately, MySQL doesn’t support it but is probably to be introduced in some future versions.
MySQL supports temporary tables. It also has the TEMPORARY keyword used in the DROP command to delete only the temporary tables.
Regarding TRUNCATE, MySQL is really basic here. It allows you to TRUNCATE the table, and that’s it. No additional possibilities like in PostgreSQL.
PostgreSQL DROP and TRUNCATE
Here, you can use DROP CASCADE to drop a table and its dependent objects.
While PostgreSQL supports temporary tables, it doesn’t have a special keyword used for deleting them. To do that, you’ll use the DROP TABLE statement and specify the temporary table you want to delete as you would do with any other table.
However, when it comes to the TRUNCATE statement, PostgreSQL offers much more flexibility. It has features such as CASCADE (you can truncate dependent objects), RESTART IDENTITY (automatically restarts sequences associated with the truncated table’s columns), CONTINUE IDENTITY (the default argument that doesn’t change the values of sequences), and RESTRICT (the default argument not allowing truncate if any tables are referenced by the other tables’ foreign key).
Verdict: PostgreSQL Wins!
The only MySQL benefit here is having the DROP TEMPORARY TABLE command. PostgreSQL allows you to DROP CASCADE and offers several very useful options to go with TRUNCATE.
Postgres vs MySQL: Joining Tables and Combining Data
Being a data scientist, your work heavily depends on the dialect’s ability to join two (or multiple) tables and combine outputs of different queries into one dataset. Let’s see how MySQL and PostgreSQL fare in this area!
Difference Overview
MySQL Joining and Combining Data
In regards to JOINs, MySQL supports four types:
- INNER JOIN
- LEFT JOIN
- RIGHT JOIN
- CROSS JOIN
When it comes to combining data from two or more SELECT statements, it supports UNION and UNION ALL. However, unlike in PostgreSQL, INTERSECT and EXCEPT operators are not supported.
PostgreSQL Joining and Combining Data
It offers all MySQL’s joins, with the additional one: a FULL (OUTER) JOIN.
To find out how all these joins work in practice, we prepared a guide with the SQL JOIN interview question examples.
PostgreSQL, too, supports UNION and UNION ALL. On top of that, there are also INTERSECT and EXCEPT operators.
The INTERSECT operator returns all the rows appearing in both output data sets. The below image shows a visualization of what this operator does.
The EXCEPT operator outputs rows from the first query that are not found in the second query. The following picture shows how it works.
Verdict: PostgreSQL Wins!
With one more JOIN than MySQL and two more options for merging data from two or more SELECT statements, no wonder PostgreSQL wins again.
Postgres vs MySQL: Window Functions
Window functions or analytical functions are used to achieve a new level of data analysis. While they are similar to the SQL aggregate functions in that they also aggregate data, the window functions do not collapse individual rows. This translates to being able to show aggregate along with non-aggregate data.
Difference Overview
MySQL Window Functions
MySQL offers the following window or analytical functions:
- ROW_NUMBER() – numbers the rows
- RANK() – ranks the rows, skips the rank sequence in case of ties
- DENSE_RANK() – ranks the rows, doesn’t skip the rank sequence in case of ties
- PERCENT_RANK() – returns the relative rank of a row
- CUME_DIST() – returns the cumulative distribution
- NTILE() – divides rows into groups
- LAG() – returns data from the previous number of rows
- LEAD() – returns data from the following number of rows
- FIRST_VALUE() – returns the first value
- LAST_VALUE() – returns the last value
- NTH_VALUE() – returns the nth value
The aggregate functions can also be used as window functions. These are the ones MySQL offers, and the majority of them can be used as window functions.
PostgreSQL Window Functions
While PostgreSQL offers the same window functions as MySQL, it offers much more aggregate functions. Not only that, but it allows all of them to be used as window functions, unlike MySQL.
To easier deal with the window functions in both SQL dialects, we prepared the ultimate guide to the window functions in our blog. If you’re interested only in ranking window functions, you should look at this introduction to the SQL Rank Functions talking about RANK() and DENSE_RANK().
Verdict: PostgreSQL Wins!
There’s a win for PostgreSQL in this last category, too. Yes, there are the same window functions in both MySQL and PostgreSQL. However, by offering more aggregate functions and allowing them all to be used as window functions, PostgreSQL simply offers more possibilities when it comes to data analysis.
Should You Choose PostgreSQL or MySQL?
The answer is: choose PostgreSQL!
As you saw from our analysis, the only area where MySQL takes the win is its case insensitivity. Other than that, PostgreSQL is much more suitable for the demands of a data scientist’s job. It’s no wonder since PostgreSQL was built with analytics in mind.
While PostgreSQL is much more suitable for data science, it doesn’t mean you must use it. There are some additional criteria you should consider.
Postgres vs MySQL: Syntax
Both PostgreSQL and MySQL are SQL dialects. It’s no wonder that learning one dialect will allow you to use the other dialect to some degree.
What makes these two dialects’ syntax differ is the level of compliance with the SQL standards. PostgreSQL has a much higher compliance level (more than 150 SQL standards), while MySQL is partially compliant.
If you learned standard SQL or simply want to stay closer to the standard (and maybe use some other SQL standard compliant databases), it’s better you learn PostgreSQL. Also, if there’s a chance that you’ll be working with other SQL standard compliant databases, PostgreSQL is a better choice.
In case you’re just starting, MySQL could be a better choice due to its fewer features, making it easier to learn. However, due to its lower SQL standard compliance, you could be tied to this dialect, making it harder to work with other databases that are more true to the SQL standard.
Postgres vs MySQL: Ease of Use
MySQL is more straightforward and user-friendly, making it especially beneficial for beginners. The learning curve is not very steep, which makes it easier to start building your data science projects much faster as you learn.
PostgreSQL is very robust and features-heavy. The learning curve could be pretty steep for beginners. If your project is heavy and requires some sophisticated calculations, PostgreSQL with its features could be a better choice.
Of course, popularity also plays its part. MySQL is much more ubiquitous, so there’s a better chance that your employer will require MySQL, not PostgreSQL. So what your desired employers use could also influence your decision between two dialects.
Postgres vs MySQL: Performance
Generally, MySQL is faster when dealing with read-only commands. If most of your projects are of such kind and MySQL satisfies all your requirements, then there really isn’t necessary to learn PostgreSQL. You might be better off with MySQL.
However, if your projects include (or you want them to include in the future!) working with big data sets and complex queries that are both read and write-intensive, PostgreSQL performs better. Therefore, it’s a better choice.
Conclusion
Learning any of these two SQL dialects is a wise decision. It could become even wiser by choosing which suits you best.
As you saw here, there are a lot of similarities but also plenty of differences between PostgreSQL and MySQL.
Some of the main differences are data types, case sensitivity, need for a conversion to UTF-8, functions used for IF-THEN-ELSE statements, deleting and joining tables and data, and the range of the window functions offered.
In general, MySQL is more suitable for users who want to learn fast, are mainly interested in read-only intensive projects, and want to start those projects as soon as possible.
PostgreSQL is better for data scientists because they often work on big projects requiring complex (read and write) queries, have to invest more time to learn how to handle such projects, and are or want to be familiar with the standard SQL.
But it’s not always that simple. So you should carefully go through your current and possible future needs, compare them to all the details we talked about here and only then make a choice. If you do that, chances are the choice will be right.