Distributed Tables – Microsoft Azure Data Engineering Associate (DP-203) Exam

DBCC DROPRESULTSETCACHE – CREATE DATABASE dbName; GO

Kadisha Cruickshank | Updated on 08/03/2024

To enable caching for a session on an Azure Synapse Analytics SQL pool, execute the following command (caching is OFF by default):

    SET RESULT_SET_CACHING ON

The first time a query is executed, the results are stored in cache. The next time the same query is run, instead of parsing through all the data […]
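The store-then-reuse behavior described in this excerpt can be sketched as a toy model in plain Python. This is only an illustration of the idea, not how Azure Synapse Analytics implements result set caching; the `ResultSetCache` class, the `slow_scan` function, and the `dbo.Reading` table name are all hypothetical.

```python
class ResultSetCache:
    """Toy query-result cache keyed on the exact query text (hypothetical)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that does the expensive work
        self._cache = {}
        self.enabled = False         # mirrors "caching is OFF by default"

    def execute(self, sql):
        # Serve a stored result when caching is on and the same text was seen.
        if self.enabled and sql in self._cache:
            return self._cache[sql]
        result = self._run_query(sql)
        if self.enabled:
            self._cache[sql] = result
        return result


calls = []  # records every time the expensive scan actually runs


def slow_scan(sql):
    calls.append(sql)
    return [("rows for", sql)]


session = ResultSetCache(slow_scan)
session.enabled = True  # plays the role of SET RESULT_SET_CACHING ON
first = session.execute("SELECT COUNT(*) FROM dbo.Reading")
second = session.execute("SELECT COUNT(*) FROM dbo.Reading")
print(len(calls))  # 1: the second call was answered from the cache
```

The second execution returns the stored result without calling `slow_scan` again, which is the saving the excerpt describes.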

Create an Azure Cosmos DB, Distributed Tables, Microsoft DP-203

GROUPBY()

Kadisha Cruickshank | Updated on 08/03/2024 | 03/23/2024

This method provides the ability to run aggregation, which is the gathering, summary, and presentation of data in an easily consumable format. The groupBy() method provides several aggregate functions; here are the most common:

avg() Returns the average of grouped columns
count() Returns the number of rows in that identified group
max() Returns the largest value in […]
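What these aggregates compute can be illustrated with a minimal plain-Python sketch. It does not use the Spark API; the sample rows, the scenario names, and the helper functions `group_by` and `aggregate` are assumptions made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (scenario, reading value)
rows = [
    ("ClassicalMusic", 4.0),
    ("ClassicalMusic", 6.0),
    ("Meditation", 1.0),
    ("Meditation", 2.0),
    ("Meditation", 9.0),
]


def group_by(rows):
    """Collect the values that share the same key into one list per key."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def aggregate(rows):
    """Compute avg, count, and max for every group."""
    return {
        key: {"avg": mean(values), "count": len(values), "max": max(values)}
        for key, values in group_by(rows).items()
    }


summary = aggregate(rows)
print(summary["Meditation"])  # {'avg': 4.0, 'count': 3, 'max': 9.0}
```

In PySpark, the same grouping would be written against a DataFrame, for example `df.groupBy("Scenario").avg("Value")`, where the column names are again hypothetical.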

Distributed Tables, Microsoft DP-203

SDKs

Kadisha Cruickshank | Updated on 08/03/2024 | 01/12/2024

Coding, especially with C#, is not a big part of the DP-203 exam, but questions about the available SDKs might come up. Table 2.7 provides an overview of the most relevant SDKs in the scope of the DP-203 exam. A complete list of all Azure SDKs for .NET can be found at https://docs.microsoft.com/dotnet/azure/sdk/packages. TABLE 2.7 […]

Distributed Tables, Microsoft DP-203, Querying Data

Data Skew

Kadisha Cruickshank | Updated on 08/03/2024 | 11/23/2023

When data is skewed, one category is represented more often than the other data categories in a given dataset. Take Figure 2.19, which represents a right/positive skew, no skew, and a left/negative skew for the BCI electrodes. You might notice that the graph in the middle, with no skew, is […]
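The three cases (right/positive, none, left/negative) can be told apart numerically. The sketch below uses one common measure, the population Fisher-Pearson skewness coefficient, which is an assumption on my part and not necessarily what Figure 2.19 is based on. The sample lists are made up for illustration and are not BCI electrode readings.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return 0.0  # all values identical, so there is no skew to measure
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)


right_skewed = [1, 1, 1, 2, 2, 3, 10]      # long tail toward large values
symmetric = [1, 2, 3, 4, 5]                # mirror image around the mean
left_skewed = [-v for v in right_skewed]   # long tail toward small values

for name, data in [("right", right_skewed),
                   ("none", symmetric),
                   ("left", left_skewed)]:
    print(name, round(skewness(data), 3))
```

A positive result indicates a right skew, a result near zero indicates no skew, and a negative result indicates a left skew.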

Create an Azure Cosmos DB, Distributed Tables, Feature Availability, Microsoft DP-203

Feature Availability

Kadisha Cruickshank | Updated on 08/03/2024 | 06/16/2023

Hadoop external tables, created using the previous SQL syntax, are only available when using dedicated SQL pools and support the CSV, Parquet, and ORC file types. Notice in the following SQL syntax that there is no TYPE argument. Omitting TYPE is supported only on serverless SQL pools, with CSV and Parquet […]

Distributed Tables, Microsoft DP-203, Spark Streaming

Pruning

Kadisha Cruickshank | Updated on 08/03/2024 | 03/16/2023

If you already know what the term projection means, then you can use that as a basis for the meaning of pruning. You can also use the literal meaning of the word, which involves trimming branches of a tree or a bush. Also, many times there are some stems that simply come out of nowhere […]

Distributed Tables, Feature Availability, Microsoft DP-203

Explode Arrays

Kadisha Cruickshank | Updated on 08/03/2024 | 02/20/2023

The concept of exploding arrays is related to Apache Spark pools and PySpark, the Python API for Spark. The command to explode an array resembles the following:

    %%pyspark
    from pyspark.sql.functions import explode
    df = spark.read.json('abfss://<endpoint>/brainjammer.json')
    dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
    dfe.show(2, truncate=False, vertical=True)

The first line of the code snippet is what is referred to as a magic command. The magic command […]
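To make the effect of exploding an array concrete, here is a plain-Python sketch that runs without a Spark pool. It mimics the row-per-element behavior only; the local `explode` helper, the sample records, and the `THETA` values are hypothetical and are not taken from brainjammer.json.

```python
def explode(rows, array_field):
    """Yield one output row per element of each row's array_field.

    The element is placed in a column named "col", which is also the
    default output column name of Spark's explode().
    """
    for row in rows:
        for element in row[array_field]:
            exploded = {k: v for k, v in row.items() if k != array_field}
            exploded["col"] = element
            yield exploded


# Hypothetical records: one row per session, each holding an array of readings
sessions = [
    {"Scenario": "ClassicalMusic", "AF3": [{"THETA": 1.5}, {"THETA": 2.5}]},
    {"Scenario": "Meditation", "AF3": [{"THETA": 0.5}]},
]

exploded_rows = list(explode(sessions, "AF3"))
for row in exploded_rows:
    print(row)
```

Two input rows become three output rows, because the first session's array holds two readings and each reading gets its own row.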

Distributed Tables, Feature Availability, Microsoft DP-203, Querying Data

Table Categories

Kadisha Cruickshank | Updated on 08/03/2024 | 01/03/2022

You might be wondering which distribution model you should use. The answer depends on the category of the table you are creating; see Table 2.3.

TABLE 2.3 Table category distribution matrix

Category                  Distribution model
Staging                   ROUND_ROBIN
Fact                      HASH
Dimension (small table)   REPLICATED
Dimension (large table)   HASH

STAGING TABLE
A staging […]
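The difference between the ROUND_ROBIN and HASH models can be sketched in plain Python. This is a simplified simulation, not the placement logic of a dedicated SQL pool: `zlib.crc32` stands in for whatever hash function the service uses, and the `session_id` column and row data are hypothetical. The count of 60 reflects the fixed number of distributions in a dedicated SQL pool.

```python
import zlib
from collections import Counter
from itertools import count

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads table data over 60 distributions


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    counter = count()
    return [(next(counter) % DISTRIBUTIONS, row) for row in rows]


def hash_distribute(rows, key):
    """Assign each row by hashing one column (crc32 is a stand-in hash)."""
    return [
        (zlib.crc32(str(row[key]).encode()) % DISTRIBUTIONS, row)
        for row in rows
    ]


# Hypothetical rows with only 7 distinct values in the distribution column
rows = [{"session_id": i % 7, "value": i} for i in range(600)]

rr_counts = Counter(d for d, _ in round_robin(rows))
hash_counts = Counter(d for d, _ in hash_distribute(rows, "session_id"))

print(len(rr_counts), set(rr_counts.values()))  # 60 {10}
print(len(hash_counts))  # at most 7 distributions receive any rows
```

Round-robin spreads the rows evenly, which suits fast loading into staging tables. Hashing keeps rows with the same key together, which helps joins on that key, but a column with few distinct values leaves most distributions empty. A replicated table would instead keep a full copy available to every compute node.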

Distributed Tables, Microsoft DP-203, Querying Data

Unstructured

Kadisha Cruickshank | Updated on 08/03/2024 | 12/15/2021

This kind of data is typically media files like audio, video, or images. There is no available interface for developers to use to query the contents of media files. There are some advancements happening in the Azure Cognitive Services area, where some artificial intelligence (AI) algorithms are able to identify visual or sound patterns. Those […]

Distributed Tables, Microsoft DP-203, Spark Streaming

Symmetric Multiprocessing (SMP)

Kadisha Cruickshank | Updated on 08/03/2024 | 07/08/2021

You will find the Massively Parallel Processing (MPP) design in Azure Synapse Analytics and the Symmetric Multiprocessing (SMP) design in Azure SQL Database. In MPP, each processor (i.e., CPU) is allocated dedicated compute resources like memory, while SMP processors share those compute resources. Consider the fact that retrieving data stored in memory has lower latency than retrieving it from disk. […]