To enable caching for a session on an Azure Synapse Analytics SQL pool, you would execute the following command. Caching is OFF by default.
SET RESULT_SET_CACHING ON
The first time a query is executed, the results are stored in cache. The next time the same query is run, instead of parsing through all the data […]
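The idea behind result set caching can be sketched in plain Python. This is a conceptual toy only, not how Azure Synapse implements the feature: the class `ToyResultCache`, the helper `slow_count`, and the query text are all hypothetical names introduced for illustration.

```python
# Conceptual sketch only: a toy result cache keyed by the exact query text.
# It does NOT reflect the internals of Azure Synapse result set caching.

class ToyResultCache:
    def __init__(self, run_query):
        self._run_query = run_query  # hypothetical callable doing the expensive work
        self._cache = {}
        self.enabled = False         # OFF by default, mirroring the text
        self.executions = 0          # how many times the expensive path ran

    def execute(self, sql):
        if self.enabled and sql in self._cache:
            return self._cache[sql]  # cache hit: skip the expensive scan
        self.executions += 1
        result = self._run_query(sql)
        if self.enabled:
            self._cache[sql] = result
        return result


def slow_count(sql):
    # Stand-in for parsing through all the data
    return sum(range(1000))


cache = ToyResultCache(slow_count)
cache.enabled = True  # loosely analogous to SET RESULT_SET_CACHING ON
first = cache.execute("SELECT COUNT(*) FROM readings")
second = cache.execute("SELECT COUNT(*) FROM readings")
```

Both calls return the same value, but `cache.executions` stays at 1 because the second call is answered from the cache.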
Spark Streaming – CREATE DATABASE dbName; GO
The previous chapter introduced you to both Azure Stream Analytics/Event Hubs and Apache Spark/Apache Kafka. Those products are what you use to implement a data streaming solution, as illustrated in Figure 2.20. Notice the various kinds of data producers that can feed into Kafka. Any device that has permission and that can send correctly formatted […]
DataFrame – CREATE DATABASE dbName; GO
Up to this point you have seen examples that created a DataFrame, typically named df, from a spark.read.* method: df = spark.read.csv('/tmp/output/brainjammer/reading.csv') Instead of passing the path of the data to load directly to the read.* method, you could first assign it to a variable, named data, for example: data = 'abfss://<uid>@<accountName>.dfs.core.windows.net/reading.csv' Once […]
Schema Drift – CREATE DATABASE dbName; GO
You have now read and learned about the many kinds of schemas and all the different kinds of tables. Those concepts help you as an Azure data engineer to better understand the kind of data structures you could be working with. You need to know something about the data to normalize it and query to […]
Static Schema – CREATE DATABASE dbName; GO
The word static has numerous meanings, and the one that applies is dependent on the context in which it is used. In the database context, the meaning is that once a schema is defined and created, it will not change. You find static schemas in relational (aka structured) databases. If you recall from the previous […]
Temporary Table – CREATE DATABASE dbName; GO
A temporary table is one that is intended to be used only for a given session. For example, if you create a normal table, you expect the table to remain persisted on the database until you purposefully remove it. Each time you log in, you expect that the table is available and queryable. This isn’t […]
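The session scoping described above can be tried locally. The following is a minimal sketch that assumes SQLite as a stand-in for the SQL pool, with each connection playing the role of a session; the table names `readings` and `scratch` are hypothetical, and Synapse temporary table syntax differs from what is shown here.

```python
import os
import sqlite3
import tempfile

# Assumption: SQLite stands in for the SQL pool; one connection == one session.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

session_a = sqlite3.connect(db_path)
# A normal table is persisted in the database file.
session_a.execute("CREATE TABLE readings (id INTEGER, value REAL)")
# A temporary table exists only for the connection that created it.
session_a.execute("CREATE TEMP TABLE scratch (id INTEGER, value REAL)")
session_a.execute("INSERT INTO scratch VALUES (1, 4.2)")
session_a.commit()


def table_exists(conn, name):
    """Return True if the table can be queried from this connection."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False


session_b = sqlite3.connect(db_path)  # a second, independent session
visible_in_a = table_exists(session_a, "scratch")
visible_in_b = table_exists(session_b, "scratch")
persisted_in_b = table_exists(session_b, "readings")

session_a.close()
session_b.close()
```

The temporary table is queryable from the session that created it and invisible to the second session, while the normal table is available to both.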
Feature Availability – CREATE DATABASE dbName; GO
Hadoop external tables, created using the previous SQL syntax, are only available when using dedicated SQL pools and support CSV, Parquet, and ORC file types. Notice in the following SQL syntax that there is no TYPE argument. Omitting TYPE is supported only on serverless SQL pools, with CSV and Parquet […]
Star Schema – CREATE DATABASE dbName; GO
A star schema is a fact table surrounded by multiple dimension tables. When visualized, the tables form a shape that resembles a star, similar to Figure 2.16.
FIGURE 2.16 A star schema example
Do you remember what table distribution type you use for a fact table and what type you use for a […]
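To make the shape concrete, here is a minimal sketch of a star schema, assuming an in-memory SQLite database rather than a Synapse SQL pool. The table and column names (`fact_reading`, `dim_scenario`, `dim_electrode`) and the sample rows are hypothetical, and distribution types are not modeled because SQLite has no such concept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the points of the star
CREATE TABLE dim_scenario  (scenario_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_electrode (electrode_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: the center of the star, holding measures plus keys to each dimension
CREATE TABLE fact_reading (
    reading_id   INTEGER PRIMARY KEY,
    scenario_id  INTEGER REFERENCES dim_scenario(scenario_id),
    electrode_id INTEGER REFERENCES dim_electrode(electrode_id),
    value        REAL
);

INSERT INTO dim_scenario  VALUES (1, 'Meditation'), (2, 'PlayingGuitar');
INSERT INTO dim_electrode VALUES (1, 'AF3'), (2, 'T7');
INSERT INTO fact_reading  VALUES (1, 1, 1, 10.0), (2, 1, 2, 20.0), (3, 2, 1, 30.0);
""")

# A typical star-schema query: join the fact table to its dimensions and aggregate.
rows = conn.execute("""
    SELECT s.name, e.name, SUM(f.value)
    FROM fact_reading AS f
    JOIN dim_scenario  AS s ON s.scenario_id  = f.scenario_id
    JOIN dim_electrode AS e ON e.electrode_id = f.electrode_id
    GROUP BY s.name, e.name
    ORDER BY s.name, e.name
""").fetchall()
conn.close()
```

Every join goes from the fact table directly to a dimension table, which is what gives the schema its star shape.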
Explode Arrays – CREATE DATABASE dbName; GO
The concept of exploding arrays is related to Apache Spark pools and the programming language PySpark. The command to explode an array resembles the following:
%%pyspark
from pyspark.sql.functions import explode
df = spark.read.json('abfss://<endpoint>/brainjammer.json')
dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
dfe.show(2, truncate=False, vertical=True)
The first line of the code snippet is what is referred to as a magic command. The magic command […]
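What exploding an array does to the rows can be illustrated without a Spark pool. The following is a plain Python stand-in, not PySpark: the function `explode_rows` and the sample records are hypothetical, and the output column name `col` is chosen to mirror the default name PySpark gives an exploded column.

```python
# Plain Python stand-in for the row-level effect of explode; this is not PySpark.

def explode_rows(rows, array_key, out_key="col"):
    """Yield one output row per element of each row's array field.

    Rows whose array is empty or None produce no output rows.
    """
    for row in rows:
        rest = {k: v for k, v in row.items() if k != array_key}
        for item in row[array_key] or []:
            yield {**rest, out_key: item}


# Hypothetical sample records shaped loosely like the brainjammer readings
sessions = [
    {"Scenario": "Meditation", "AF3": [1.5, 2.5]},
    {"Scenario": "PlayingGuitar", "AF3": [3.5]},
    {"Scenario": "Empty", "AF3": []},
]

exploded = list(explode_rows(sessions, "AF3"))
```

Three input records become three output rows: two for the first record, one for the second, and none for the record with an empty array.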
Data Management – CREATE DATABASE dbName; GO
Don’t confuse data management with database management, where the focus is on the mechanics of the DBMS. When you choose to run your database on the Azure platform and select a PaaS product, then the management of that database is no longer your or your company’s responsibility. Instead, the focus here is the management of […]