Microsoft Azure Data Engineering Associate (DP-203) Exam

DBCC DROPRESULTSETCACHE – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024 | 08/03/2024

To enable caching for a session on an Azure Synapse Analytics SQL pool, you would execute the following command (caching is OFF by default): SET RESULT_SET_CACHING ON; The first time a query is executed, the results are stored in cache. The next time the same query is run, instead of parsing through all the data […]
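The caching behavior described in this excerpt can be illustrated with a small plain-Python model. This is a conceptual sketch only, not how Azure Synapse Analytics implements result set caching; the class `ResultSetCache`, its `run` and `drop` methods, and the `executions` counter are hypothetical names introduced for illustration. The `drop` method plays the role that DBCC DROPRESULTSETCACHE plays in the post title.

```python
class ResultSetCache:
    """Toy model of a result set cache keyed by query text (hypothetical)."""

    def __init__(self, enabled=False):
        # Mirrors the excerpt: caching is off until it is switched on.
        self.enabled = enabled
        self._store = {}
        self.executions = 0  # counts how often the query really ran

    def run(self, query, execute):
        # Serve a repeated query from the cache instead of re-executing it.
        if self.enabled and query in self._store:
            return self._store[query]
        self.executions += 1
        result = execute(query)
        if self.enabled:
            self._store[query] = result
        return result

    def drop(self):
        # Discard every cached result, so the next run executes again.
        self._store.clear()
```

With caching enabled, running the same query text twice calls `execute` only once; after `drop()`, the next run executes the query again.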

Create an Azure Cosmos DB, Microsoft DP-203

Querying Data

Kadisha Cruickshank | Updated on 08/03/2024 | 07/09/2024

Data is not very useful without some way to look at it, search through it, and manipulate it—in other words, querying. You have seen many examples of managing and manipulating data from both structured and semi‐structured data sources. In this section, you’ll learn many ways to analyze the data in your data lake, data warehouse, […]

Feature Availability, Microsoft DP-203

Spark Streaming

Kadisha Cruickshank | Updated on 08/03/2024 | 06/26/2024

The previous chapter introduced you to both Azure Stream Analytics/Event Hubs and Apache Spark/Apache Kafka. Those products are what you use to implement a data streaming solution, as illustrated in Figure 2.20. Notice the various kinds of data producers that can feed into Kafka. Any device that has permission and that can send correctly formatted […]

Microsoft DP-203, Spark Streaming

createGlobalTempView()

Kadisha Cruickshank | Updated on 08/03/2024 | 05/05/2024

This method creates a temporary view whose lifetime is that of the Spark application. If a view with the same name already exists, an exception is thrown.

df.createGlobalTempView('Brainwaves')
df2 = spark.sql('SELECT Session.POWReading.AF3[0].THETA FROM global_temp.Brainwaves')

Notice that the argument following FROM is the name of the view created in the previous line of code, qualified with global_temp, the database in which Spark registers global temporary views.

createOrReplaceGlobalTempView(): This […]

Create an Azure Cosmos DB, Feature Availability, Microsoft DP-203

DataFrame

Kadisha Cruickshank | Updated on 08/03/2024 | 04/11/2024

Up to this point you have seen examples that created a DataFrame, typically identified as df, from a spark.read.* method:

df = spark.read.csv('/tmp/output/brainjammer/reading.csv')

Instead of passing the data to load into a DataFrame as a path via the read.* method, you could load the data into an object, named data, for example:

data = 'abfss://<uid>@<accountName>.dfs.core.windows.net/reading.csv'

Once […]

Create an Azure Cosmos DB, Distributed Tables, Microsoft DP-203

groupBy()

Kadisha Cruickshank | Updated on 08/03/2024 | 03/23/2024

This method provides the ability to run aggregation, which is the gathering, summary, and presentation of data in an easily consumable format. The groupBy() method provides several aggregate functions; here are the most common:

avg(): Returns the average of grouped columns
count(): Returns the number of rows in that identified group
max(): Returns the largest value in […]
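To make the idea of grouping and aggregating concrete, here is a minimal sketch in plain Python rather than PySpark, so it runs without a Spark session. It is a stand-in for the concept, not the groupBy() API itself; the helper `group_by`, the sample `readings`, and the column names `electrode` and `value` are all hypothetical.

```python
from collections import defaultdict


def group_by(rows, key, value):
    """Group rows by `key` and aggregate the `value` column per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    # Compute the three aggregates named in the excerpt for every group.
    return {
        name: {
            "avg": sum(values) / len(values),
            "count": len(values),
            "max": max(values),
        }
        for name, values in groups.items()
    }


# Hypothetical sample data: one reading per row.
readings = [
    {"electrode": "AF3", "value": 4.0},
    {"electrode": "AF3", "value": 6.0},
    {"electrode": "T7", "value": 3.0},
]

summary = group_by(readings, "electrode", "value")
```

Each key in `summary` is one group, and each group carries its own average, row count, and maximum, which is the shape of result an aggregation over grouped data produces.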

Microsoft DP-203, Querying Data

to_date() and to_timestamp()

Kadisha Cruickshank | Updated on 08/03/2024 | 02/27/2024

There can be many challenges when working with dates and datetimes. In many scenarios a date is stored as a string. That means if you want to perform any calculation with it, the date value stored in the string needs to be converted to the date data type. Additionally, the date format is often specific […]
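The string-to-date problem described here can be sketched with Python's standard library instead of the Spark functions, which keeps the example runnable anywhere. The helper name `to_date` is borrowed from the post title purely for illustration, and the sample strings and format patterns are assumptions.

```python
from datetime import date, datetime


def to_date(text, fmt):
    """Parse a date stored as a string into a real date value."""
    return datetime.strptime(text, fmt).date()


# The same kind of value arrives in two different, format-specific layouts.
start = to_date("2024-02-27", "%Y-%m-%d")  # year-month-day
end = to_date("08/03/2024", "%d/%m/%Y")    # day/month/year

# Date arithmetic only works once both values are dates, not strings.
elapsed_days = (end - start).days
```

The format pattern has to match how the string was written; reading "08/03/2024" as month/day/year instead would silently yield a different date.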

Distributed Tables, Microsoft DP-203

SDKs

Kadisha Cruickshank | Updated on 08/03/2024 | 01/12/2024

Coding, especially with C#, is not a big part of the DP‐203 exam, but questions about the available SDKs might come up. Table 2.7 provides an overview of the most relevant SDKs in the scope of the DP‐203 exam. A complete list of all Azure SDKs for .NET can be found at https://docs.microsoft.com/dotnet/azure/sdk/packages. TABLE 2.7 […]

Create an Azure Cosmos DB, Microsoft DP-203, Spark Streaming

Data Programming and Querying for Data Engineers

Kadisha Cruickshank | Updated on 08/03/2024 | 12/13/2023

To perform the duties of an Azure data engineer, you will need to write some code. Perhaps you will not need to have a great understanding of encapsulation, asynchronous patterns, or parallel LINQ queries, but some coding skill is necessary. Up to this point you have been exposed primarily to SQL syntax and PySpark, which […]

Distributed Tables, Microsoft DP-203, Querying Data

Data Skew

Kadisha Cruickshank | Updated on 08/03/2024 | 11/23/2023

When data is skewed, it means that one category is represented more often when compared to the other data categories in a given dataset. Take Figure 2.19, which represents a right/positive skew, no skew, and a left/negative skew for the BCI electrodes. You might notice that the graph in the middle, with no skew, is […]
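One way to put a number on right, left, and no skew is the moment coefficient of skewness: positive for a right skew, negative for a left skew, and zero for a symmetric dataset. The sketch below uses only Python's standard library; the function name `skewness` and the sample lists are hypothetical and are not the BCI electrode data from Figure 2.19.

```python
from statistics import fmean, pstdev


def skewness(values):
    """Moment coefficient of skewness: mean cubed deviation over sd cubed."""
    mean = fmean(values)
    sd = pstdev(values)
    if sd == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return fmean([(v - mean) ** 3 for v in values]) / sd ** 3


# Hypothetical samples: a long tail to the right, none, and one to the left.
right_skewed = [1, 1, 1, 2, 2, 3, 10]
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-v for v in right_skewed]
```

Calling `skewness` on each list gives a positive value, zero, and a negative value respectively, matching the three shapes the excerpt describes.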