Query Relational Data in Azure SQL, MySQL, MariaDB, and PostgreSQL
While most RDBMSs implementations of SQL use the same core functionality, there are some subtle differences. The following sections explore the syntax used to query data in Azure SQL, Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL and highlights some of the key differences.
Querying Azure SQL with T-SQL
The first set of queries discussed are constructed using T-SQL. As mentioned previously, T-SQL is the Microsoft extension of ANSI SQL used to communicate with a SQL Server–based relational database. All the examples in this section can be used to query tables in the AdventureWorksDW2019 database. Use the following link to download a backup of the database: https://docs.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver15&tabs=ssms#download-backup-files. The link also provides instructions on how to restore the database to an instance of SQL Server or Azure SQL.
Retrieving data from a relational database all starts with a select command. The following is an example of a select statement that returns all data from every column in a single table:
SELECT *
FROM [dbo].[FactInternetSales]
The asterisk (*) symbol is a wildcard character that indicates “all.” In this case, the * is used to represent every column in the FactInternetSales table. While this saves users time when writing queries with long column lists, it can result in poor query performance by returning more data than what is required. Also, applications that use SELECT * statements are liable to break when new columns are added to the table or view being queried. For these reasons, it’s always better to explicitly list the columns needed in a select statement.
Queries are often written to return a filtered list of data. The following is an example that returns sales information only for products that cost more than $1,000. The result set is also sorted by sales amount in descending order.
SELECT ProductKey
,ProductStandardCost
,TotalProductCost
,SalesAmount
,OrderDate
FROM [dbo].[FactInternetSales]
WHERE ProductStandardCost> 1000
ORDER BY SalesAmount DESC
Rarely do applications and reports use data from only one table. Instead, applications querying relational databases will often build result sets from two or more tables with the same select statement. Queries can do this by using a join operation. Join operations leverage the logical relationships between tables to build rows of data with columns from different tables. There are different types of join operations available that can return different combinations of data. The following is a list of the four most common join types and their T-SQL implementations:
- Inner joins retrieve data from both tables that meets the join condition. Inner joins can be defined with the INNER JOIN or JOIN expressions.
- Left outer joins retrieve all data from the table on the left-hand side of the join condition and data from the right table that meets the join condition. Left outer joins can be defined with the LEFT OUTER JOIN or LEFT JOIN expressions.
- Right outer joins retrieve all data from the table on the right-hand side of the join condition, and data from the left table that meets the join condition. Right outer joins can be defined with the RIGHT OUTER JOIN or RIGHT JOIN expressions.
- Full outer joins retrieve all data from both the left and right tables. Full outer joins can be defined with the FULL OUTER JOIN or FULL JOIN expressions.
Figure 2.21 illustrates how the different join types retrieve data from two tables (represented as table A and table B).
FIGURE 2.21 Types of SQL joins
In addition to these join types, SQL Server–based database engines also enable users to develop queries using cross joins. Cross joins are special types of joins that return the Cartesian product of rows from both tables. Cross joins can be defined with the CROSS JOIN expression.
The following query builds on the previous example with added data from the DimProduct table. Since it uses a JOIN command without any additional adjectives, the query will perform an inner join, only returning data from both tables that meet the join condition defined in the ON clause.
SELECT P.EnglishProductName
,FIS.ProductKey
,FIS.ProductStandardCost
,FIS.TotalProductCost
,FIS.SalesAmount
,FIS.OrderDate
FROM [dbo].[FactInternetSales] AS FIS
JOIN [dbo].[DimProduct] AS P
ON FIS.ProductKey = P.ProductKey
WHERE ProductStandardCost> 1000
ORDER BY SalesAmount DESC
Note that the query uses the AS command in the FROM clause to give each table a short form alias. This alias can be used to specify which tables the selected columns are in and how the join condition is defined. Aliases can also be given to columns, allowing users to give names to columns that are the result of aggregations.
SQL provides several built-in functions that can be used to infer insights out of relational data. Built-in functions can be categorized based on the actions they perform on data. For example, aggregate functions such as SUM(), MAX(), and MIN() perform calculations on a set of values and return a single value. They can be used in combination with the GROUP BY clause to calculate aggregations on categories of rows.
The following query revises the previous one so that it returns the total quantity sold and the total sales dollars for products that cost more than $1,000. It also groups the sales totals by product name and monthly sales per year.
SELECT P.EnglishProductName
,SUM(FIS.OrderQuantity) AS TotalQuantity
,SUM(FIS.SalesAmount) AS TotalSales
,MONTH(FIS.OrderDate) AS [Month]
,YEAR(FIS.OrderDate) AS [Year]
FROM [dbo].[FactInternetSales] AS FIS
JOIN [dbo].[DimProduct] AS P
ON FIS.ProductKey = P.ProductKey
WHERE ProductStandardCost> 1000
AND YEAR(FIS.OrderDate)> 2010
GROUP BY P.EnglishProductName,
MONTH(FIS.OrderDate),
YEAR(FIS.OrderDate)
ORDER BY [Year], TotalSales DESC
You may be wondering why the WHERE and GROUP BY clauses are not using the column aliases that were defined at the beginning of the statement. This is due to the T-SQL logical processing order that was discussed previously in this chapter. Since the WHERE and GROUP BY clauses are processed by the database engine before the SELECT is, these clauses do not know how to resolve column aliases.
The final T-SQL example in this section describes how to limit the result set to the first 10 rows the query returns. This is one key difference between T-SQL and other versions of SQL, as T-SQL uses the TOP command and other versions use LIMIT. We will demonstrate how other relational database platforms implement the LIMIT command. SELECT TOP(10) P.EnglishProductName
,SUM(FIS.OrderQuantity) AS TotalQuantity
,SUM(FIS.SalesAmount) AS TotalSales
FROM [dbo].[FactInternetSales] AS FIS
JOIN [dbo].[DimProduct] AS P
ON FIS.ProductKey = P.ProductKey
WHERE ProductStandardCost> 1000
GROUP BY P.EnglishProductName
ORDER BY TotalSales DESC