Feature Availability – Microsoft Azure Data Engineering Associate (DP-203) Exam

DBCC DROPRESULTSETCACHE – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 08/03/2024

To enable caching for a session on an Azure Synapse Analytics SQL pool, you would execute the following command. Caching is OFF by default:
SET RESULT_SET_CACHING ON
The first time a query is executed, the results are stored in cache. The next time the same query is run, instead of parsing through all the data […]
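The caching behavior described in this excerpt can be pictured with a toy model in plain Python. This is only a sketch of the idea, not how a SQL pool implements it: the class `ResultSetCache`, the `enabled` flag, the `drop` method, and the `fake_scan` function are all hypothetical names invented for the illustration.

```python
class ResultSetCache:
    """Toy model: results are stored per query text once caching is on."""

    def __init__(self, run_query):
        self._run_query = run_query   # the expensive "scan all the data" step
        self._cache = {}
        self.enabled = False          # mirrors "caching is OFF by default"

    def execute(self, sql):
        if not self.enabled:
            return self._run_query(sql)
        if sql not in self._cache:
            # First execution: run the query and keep the result
            self._cache[sql] = self._run_query(sql)
        # Later executions of the same text are served from the cache
        return self._cache[sql]

    def drop(self):
        """Discard every cached result (hypothetical helper)."""
        self._cache.clear()


calls = []

def fake_scan(sql):
    # Stand-in for real query execution; records how often it is invoked
    calls.append(sql)
    return [("row",)]

cache = ResultSetCache(fake_scan)
cache.enabled = True
first = cache.execute("SELECT 1")
second = cache.execute("SELECT 1")
```

In this sketch the second call returns the stored result without invoking `fake_scan` again, which is the saving the excerpt describes.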

Feature Availability, Microsoft DP-203

Spark Streaming – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 06/26/2024

The previous chapter introduced you to both Azure Stream Analytics/Event Hubs and Apache Spark/Apache Kafka. Those products are what you use to implement a data streaming solution, as illustrated in Figure 2.20. Notice the various kinds of data producers that can feed into Kafka. Any device that has permission and that can send correctly formatted […]

Create an Azure Cosmos DB, Feature Availability, Microsoft DP-203

DataFrame – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 04/11/2024

Up to this point you have seen examples that created a DataFrame, typically identified as df, from a spark.read.* method:
df = spark.read.csv('/tmp/output/brainjammer/reading.csv')
Instead of passing the data to load into a DataFrame as a path via the read.* method, you could load the data into an object, named data, for example:
data = 'abfss://<uid>@<accountName>.dfs.core.windows.net/reading.csv'
Once […]

Feature Availability, Microsoft DP-203

Schema Drift – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 10/11/2023

You have now read and learned about the many kinds of schemas and all the different kinds of tables. Those concepts help you as an Azure data engineer to better understand the kind of data structures you could be working with. You need to know something about the data to normalize it and query to […]

Feature Availability, Microsoft DP-203, Spark Streaming

Static Schema – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 09/19/2023

The word static has numerous meanings, and the one that applies depends on the context in which it is used. In the database context, it means that once a schema is defined and created, it will not change. You find static schemas in relational (aka structured) databases. If you recall from the previous […]

Feature Availability, Microsoft DP-203, Querying Data, Spark Streaming

Temporary Table – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 08/21/2023

A temporary table is one that is intended to be used only for a given session. For example, if you create a normal table, you expect the table to remain persisted on the database until you purposefully remove it. Each time you log in, you expect that the table is available and queryable. This isn’t […]
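The session scope described here can be tried out locally with Python's built-in sqlite3 module. SQLite is only a stand-in for a SQL pool in this sketch, and the table names `readings` and `scratch`, the two connection variables, and the helper `table_visible` are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file play the role of two sessions
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_a = sqlite3.connect(db_path)
session_b = sqlite3.connect(db_path)

# A normal table is persisted; a temporary table belongs to session_a only
session_a.execute("CREATE TABLE readings (id INTEGER, value REAL)")
session_a.execute("CREATE TEMP TABLE scratch (id INTEGER, value REAL)")
session_a.commit()

def table_visible(conn, name):
    """Return True if the given connection can query the table."""
    try:
        conn.execute(f"SELECT COUNT(*) FROM {name}")
        return True
    except sqlite3.OperationalError:
        return False

normal_seen_by_b = table_visible(session_b, "readings")
temp_seen_by_a = table_visible(session_a, "scratch")
temp_seen_by_b = table_visible(session_b, "scratch")
```

The normal table is queryable from both sessions, while the temporary table exists only for the session that created it and disappears when that connection closes.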

Create an Azure Cosmos DB, Distributed Tables, Feature Availability, Microsoft DP-203

Feature Availability – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 06/16/2023

Hadoop external tables, created using the previous SQL syntax, are available only on dedicated SQL pools and support the CSV, Parquet, and ORC file types. Notice in the following SQL syntax that there is no TYPE argument. Omitting TYPE is supported only on serverless SQL pools, with CSV and Parquet […]

Feature Availability, Microsoft DP-203, Querying Data

Star Schema – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 04/28/2023

A star schema is a fact table surrounded by multiple dimension tables. When visualized, the shape the tables make resembles a star, something similar to Figure 2.16.
FIGURE 2.16 A star schema example
Do you remember what table distribution type you use for a fact table and what type you use for a […]
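A small star schema can be sketched with Python's built-in sqlite3 module: one fact table whose rows point at two dimension tables. SQLite stands in for a SQL pool here, so table distribution is not modeled, and every table name, column name, and data value below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member
CREATE TABLE dim_scenario (scenario_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_electrode (electrode_id INTEGER PRIMARY KEY, name TEXT);

-- Fact table: measurements plus a key into each dimension
CREATE TABLE fact_reading (
    scenario_id INTEGER REFERENCES dim_scenario(scenario_id),
    electrode_id INTEGER REFERENCES dim_electrode(electrode_id),
    value REAL
);

INSERT INTO dim_scenario VALUES (1, 'Meditation'), (2, 'Work');
INSERT INTO dim_electrode VALUES (1, 'AF3'), (2, 'T7');
INSERT INTO fact_reading VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 1, 30.0);
""")

# A typical star-schema query: join the fact to its dimensions, then aggregate
rows = conn.execute("""
    SELECT s.name, e.name, SUM(f.value)
    FROM fact_reading AS f
    JOIN dim_scenario AS s ON s.scenario_id = f.scenario_id
    JOIN dim_electrode AS e ON e.electrode_id = f.electrode_id
    GROUP BY s.name, e.name
    ORDER BY s.name, e.name
""").fetchall()
```

The fact table sits at the center of every join, which is what gives the diagram its star shape.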

Distributed Tables, Feature Availability, Microsoft DP-203

Explode Arrays – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 02/20/2023

The concept of exploding arrays is related to Apache Spark pools and the programming language PySpark. The command to explode an array resembles the following:
%%pyspark
from pyspark.sql.functions import explode
df = spark.read.json('abfss://<endpoint>/brainjammer.json')
dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
dfe.show(2, truncate=False, vertical=True)
The first line of the code snippet is what is referred to as a magic command. The magic command […]
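To see what exploding does without a Spark pool, the sketch below mimics the idea in plain Python: each element of an array becomes its own output row, with the other fields repeated. The helper `explode_rows` and the sample record are hypothetical; only the default output column name, `col`, follows PySpark's explode.

```python
def explode_rows(rows, array_key, out_key="col"):
    """Yield one output row per element of row[array_key]."""
    for row in rows:
        # A missing, None, or empty array produces no output rows
        for item in row.get(array_key) or []:
            # Keep the other fields and swap the array for a single element
            exploded = {k: v for k, v in row.items() if k != array_key}
            exploded[out_key] = item
            yield exploded


# Hypothetical record: one scenario with three readings in an array
rows = [{"Scenario": "ClassicalMusic", "AF3": [1.5, 2.5, 3.5]}]
result = list(explode_rows(rows, "AF3"))

for r in result:
    print(r)
```

One input row with a three-element array yields three output rows, each carrying the same Scenario value.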

Feature Availability, Microsoft DP-203, Querying Data

Data Management – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 05/19/2022

Don’t confuse data management with database management, where the focus is on the mechanics of the DBMS. When you choose to run your database on the Azure platform and select a PaaS product, then the management of that database is no longer your or your company’s responsibility. Instead, the focus here is the management of […]