
Querying Data – CREATE DATABASE dbName; GO

Data is not very useful without some way to look at it, search through it, and manipulate it; in other words, without some way to query it. You have seen many examples of managing and manipulating data from both structured and semi‐structured data sources. In this section, you’ll learn many ways to analyze the data in your data lake, data warehouse, or any other supported data store.

SQL, T‐SQL, and SparkSQL

What is it with all these SQL acronym variants? Structured Query Language (SQL) was originally called SEQUEL but was later renamed to SQL. So when you refer to or see a reference to SQL, it means the common structured query language for RDBMS products. Other variants of SQL, like T‐SQL, PL/SQL, and SparkSQL, to name a few, use the SQL base to extend the language. Transact‐SQL (T‐SQL) is found in the context of Microsoft SQL Server and Azure SQL. Procedural Language/Structured Query Language (PL/SQL) is an extension of the base SQL language for Oracle databases, and SparkSQL is the SQL implementation on Apache Spark. Take a look at the following snippet. Both lines return the first 10 rows of the READINGS table ordered by VALUE, but the first line uses the LIMIT clause found in dialects such as MySQL, PostgreSQL, and SparkSQL, while the second uses the TOP clause, which is specific to T‐SQL. In many cases the differences between variants come down to syntax like this.

SELECT * FROM READINGS ORDER BY VALUE LIMIT 10;
SELECT TOP (10) * FROM [READINGS] ORDER BY [VALUE];
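As a further illustration, here is a sketch of how the same "first 10 rows" request might be written in a few variants. It reuses the READINGS table and VALUE column from the snippet above; each statement targets a different engine, so they are not meant to run together on one system, and support for each form depends on the engine and its version.

```sql
-- ISO-standard row limiting, accepted by engines such as PostgreSQL
SELECT * FROM READINGS ORDER BY VALUE FETCH FIRST 10 ROWS ONLY;

-- T-SQL (SQL Server, Azure SQL): TOP with a parenthesized row count
SELECT TOP (10) * FROM [READINGS] ORDER BY [VALUE];

-- T-SQL alternative: OFFSET/FETCH, which requires an ORDER BY clause
SELECT * FROM [READINGS] ORDER BY [VALUE]
OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;

-- SparkSQL: LIMIT at the end of the query
SELECT * FROM READINGS ORDER BY VALUE LIMIT 10;
```

All four ask for the same thing; the differences are in where the row limit is written and in how identifiers are quoted.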

There are certainly numerous technical and implementation differences between the SQL variants as well. T‐SQL, PL/SQL, and SparkSQL each have tuning, feature, and management capabilities that are specific to the platform they target. To learn more about each of those SQL variants, visit

More on this topic is outside the scope of this book, but it is an important area to understand if you plan on working a lot with data. It wouldn’t be a stretch to assume your work could span many different types of data sources that use different SQL implementations. Being well‐versed in those implementations can help projects progress much faster.

Database Console Commands

Database Console Command (DBCC) statements are useful for maintaining and analyzing the performance and state of an Azure SQL or Azure Synapse Analytics SQL pool instance. These commands do not run on a Spark pool. If you are experiencing unexpected latency that you cannot attribute to a change in the velocity or variety of data flowing into your data solution, you might want to look at the database itself. These commands can help you gather insights into how the DBMS itself is performing and whether the data is optimally stored and structured. There are many commands, so the following is a summary of some of the most important ones.
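To give a sense of what DBCC statements look like, here is a minimal sketch. It is not the book's own list. The table name dbo.READINGS is borrowed from the earlier query example, and the statistics name stats_READINGS_VALUE is hypothetical. Which commands are available depends on the service, so check the documentation for your target before relying on any of them.

```sql
-- Azure SQL: check the logical and physical integrity of the current database
DBCC CHECKDB;

-- Show the distribution statistics the optimizer holds for one statistics object
-- (stats_READINGS_VALUE is a hypothetical statistics name)
DBCC SHOW_STATISTICS ('dbo.READINGS', stats_READINGS_VALUE);

-- Azure Synapse Analytics dedicated SQL pool: rows and space used by a table,
-- reported per distribution, which helps reveal data skew
DBCC PDW_SHOWSPACEUSED ("dbo.READINGS");
```

The first statement concerns the health of the database, the second the information behind query plans, and the third how evenly the data is stored.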
