To enable result set caching for a session on an Azure Synapse Analytics dedicated SQL pool, execute the following command. Caching is OFF by default.

SET RESULT_SET_CACHING ON

The first time a query is executed, the results are stored in the cache. The next time the same query is run, instead of parsing through all the data […]
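The caching behavior described above can be pictured with a small, simplified Python model. This is a hypothetical sketch, not a Synapse API: the class `ResultSetCache`, its methods, and the `fake_execute` function are illustrative names. Results are keyed by the exact query text, the first execution runs the query and stores the result, and later identical queries are served from the cache. The real service also discards cached results when the underlying table data changes, which `invalidate()` only crudely stands in for.

```python
from typing import Callable, Dict, List


class ResultSetCache:
    """Hypothetical model of session-level result set caching; not a Synapse API."""

    def __init__(self, execute: Callable[[str], List[tuple]]) -> None:
        self.execute = execute                  # callable that actually runs a query
        self.enabled = False                    # caching is OFF by default
        self._cache: Dict[str, List[tuple]] = {}
        self.hits = 0
        self.misses = 0

    def set_result_set_caching(self, on: bool) -> None:
        """Stand-in for SET RESULT_SET_CACHING ON/OFF in the current session."""
        self.enabled = on

    def query(self, sql: str) -> List[tuple]:
        if not self.enabled:
            return self.execute(sql)            # caching off: always run the query
        if sql in self._cache:
            self.hits += 1                      # identical query text: reuse stored result
            return self._cache[sql]
        self.misses += 1
        result = self.execute(sql)              # first run: execute, then store the result
        self._cache[sql] = result
        return result

    def invalidate(self) -> None:
        """Simplified stand-in for invalidation when underlying table data changes."""
        self._cache.clear()


if __name__ == "__main__":
    executed = []

    def fake_execute(sql: str) -> List[tuple]:
        executed.append(sql)                    # record every real execution
        return [("Meditation", 42)]

    cache = ResultSetCache(fake_execute)
    cache.set_result_set_caching(True)
    cache.query("SELECT COUNT(*) FROM readings")    # miss: executed and cached
    cache.query("SELECT COUNT(*) FROM readings")    # hit: served from the cache
    print(len(executed), cache.hits, cache.misses)  # 1 1 1
```

Keying on the raw query text is the simplest choice for illustration; it means two queries that differ only in whitespace would not share a cache entry in this sketch.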
GROUPBY() – CREATE DATABASE dbName; GO
This method provides the ability to run aggregation, which is the gathering, summary, and presentation of data in an easily consumable format. The groupBy() method provides several aggregate functions; here are the most common:
avg(): Returns the average of grouped columns
count(): Returns the number of rows in that identified group
max(): Returns the largest value in […]
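To show what grouping followed by avg(), count(), and max() computes, here is a pure-Python sketch that does not use Spark. The helper name `group_by_agg`, the column names `Scenario` and `AF3`, and the sample values are hypothetical. In PySpark, a comparable result would come from something like `df.groupBy('Scenario').agg(avg('AF3'), count('AF3'), max('AF3'))`, with the functions imported from `pyspark.sql.functions`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_agg(rows: List[dict], key: str, value: str) -> Dict[str, Tuple[float, int, float]]:
    """Return {group: (avg, count, max)} of `value` for each distinct `key`."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for row in rows:                            # gather: bucket each row by its key
        groups[row[key]].append(row[value])
    return {
        k: (sum(vals) / len(vals), len(vals), max(vals))   # summarize each bucket
        for k, vals in groups.items()
    }


# Hypothetical sample readings; the values are made up for illustration
rows = [
    {"Scenario": "ClassicalMusic", "AF3": 4.0},
    {"Scenario": "ClassicalMusic", "AF3": 6.0},
    {"Scenario": "PlayingGuitar", "AF3": 3.0},
]
print(group_by_agg(rows, "Scenario", "AF3"))
# {'ClassicalMusic': (5.0, 2, 6.0), 'PlayingGuitar': (3.0, 1, 3.0)}
```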
SDKs
Coding, especially with C#, is not a big part of the DP‐203 exam, but knowing about the available SDKs might come up. Table 2.7 provides an overview of the most relevant SDKs in the scope of the DP‐203 exam. A complete list of all Azure SDKs for .NET can be found at https://docs.microsoft.com/dotnet/azure/sdk/packages. TABLE 2.7 […]
Data Skew
When data is skewed, one category is represented more often than the other categories in a given dataset. Consider Figure 2.19, which shows a right (positive) skew, no skew, and a left (negative) skew for the BCI electrodes. You might notice that the graph in the middle, with no skew, is […]
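One common way to put a number on the direction of skew is the Fisher-Pearson coefficient of skewness, g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. A positive value indicates a right (positive) skew, a negative value a left (negative) skew, and a value near zero no skew. The sketch below assumes population moments, and its sample lists are made up; they are not BCI electrode readings.

```python
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Fisher-Pearson coefficient of skewness: g1 = m3 / m2 ** 1.5 (population moments)."""
    n = len(values)
    if n == 0:
        raise ValueError("skewness needs at least one value")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5


# Hypothetical samples (not BCI readings)
right_tail = [1, 1, 2, 2, 3, 10]    # long tail toward large values
symmetric = [1, 2, 3, 4, 5]
left_tail = [-10, 1, 1, 2, 2, 3]    # long tail toward small values

for name, data in [("right", right_tail), ("none", symmetric), ("left", left_tail)]:
    print(name, round(skewness(data), 3))
```

Cubing the deviations keeps their sign, so a long tail on one side dominates m3 and sets the sign of the result.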
Feature Availability
Hadoop external tables, created using the previous SQL syntax, are available only when using dedicated SQL pools and support the CSV, Parquet, and ORC file types. Notice in the following SQL syntax that there is no TYPE argument. Omitting TYPE is supported only on serverless SQL pools, with CSV and Parquet […]
Pruning
If you already know what the term projection means, then you can use that as a basis for understanding pruning. You can also use the literal meaning of the word, which involves trimming branches from a tree or bush. Also, there are often some stems that simply come out of nowhere […]
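As a rough illustration of trimming away data a query does not need, the following toy sketch shows two common forms of pruning: skipping whole partitions that a filter rules out, and keeping only the requested (projected) columns. The excerpt does not say which form it goes on to cover, so treat this as an assumption. The partition layout, the `pruned_read` function, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical data laid out as one partition per scenario
partitions: Dict[str, List[dict]] = {
    "ClassicalMusic": [{"Scenario": "ClassicalMusic", "AF3": 4.1, "T7": 2.0}],
    "Meditation": [{"Scenario": "Meditation", "AF3": 3.2, "T7": 1.5}],
    "PlayingGuitar": [{"Scenario": "PlayingGuitar", "AF3": 5.0, "T7": 2.7}],
}


def pruned_read(parts: Dict[str, List[dict]], scenario: str,
                columns: List[str]) -> Tuple[List[dict], int]:
    """Return the projected rows for one scenario and how many partitions were scanned."""
    scanned = 0
    result = []
    for key, rows in parts.items():
        if key != scenario:          # partition pruning: skip partitions the filter rules out
            continue
        scanned += 1
        for row in rows:
            result.append({c: row[c] for c in columns})   # projection: keep only requested columns
    return result, scanned


rows, scanned = pruned_read(partitions, "Meditation", ["AF3"])
print(rows, scanned)
```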
Explode Arrays
The concept of exploding arrays is related to Apache Spark pools and the programming language PySpark. The command to explode an array resembles the following:

%%pyspark
from pyspark.sql.functions import explode
# Load the JSON file from the storage account into a DataFrame
df = spark.read.json('abfss://<endpoint>/brainjammer.json')
# Return one row per element of the AF3 array, alongside its scenario
dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
# Print the first two rows vertically without truncating values
dfe.show(2, truncate=False, vertical=True)

The first line of the code snippet is what is referred to as a magic command. The magic command […]
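What explode() does to the data can be mimicked without Spark: each element of the array column becomes its own output row, and the other selected column is repeated alongside it. This pure-Python sketch uses a hypothetical `explode` helper and a simplified, made-up stand-in for the session data in brainjammer.json; the real file's nesting is not reproduced here.

```python
from typing import Any, Dict, Iterator, List, Tuple


def explode(rows: List[Dict[str, Any]], keep: str, array_col: str) -> Iterator[Tuple[Any, Any]]:
    """Yield one (keep value, element) pair per element of each row's array column."""
    for row in rows:
        # Like Spark's explode(), a null or empty array produces no output rows
        for element in row.get(array_col) or []:
            yield (row[keep], element)


# Hypothetical, simplified stand-in for the session data in brainjammer.json
sessions = [
    {"Scenario": "ClassicalMusic", "AF3": [4.1, 3.9, 4.4]},
    {"Scenario": "Meditation", "AF3": []},
]
for scenario, af3 in explode(sessions, "Scenario", "AF3"):
    print(scenario, af3)
```

Dropping rows whose array is empty or null matches Spark's explode(); Spark's explode_outer() keeps such rows with a null element instead.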
Table Categories
You might be wondering which distribution model you should use. The answer has to do with the table category to which the table you are creating belongs; see Table 2.3.

TABLE 2.3 Table category distribution matrix

Category                   Distribution model
Staging                    ROUND_ROBIN
Fact                       HASH
Dimension (small table)    REPLICATED
Dimension (large table)    HASH

STAGING TABLE

A staging […]
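The difference between the two models that split rows across distributions can be sketched in Python. A dedicated SQL pool spreads each table across 60 distributions. Round-robin spreads rows evenly regardless of their content, which suits fast loads into staging tables. Hash places every row with the same distribution-column value in the same distribution, which keeps related fact rows together for joins and aggregations. Two simplifications are assumed here: the service assigns round-robin rows randomly rather than strictly in turn, and MD5 stands in for the pool's internal hash function. Replicated tables, which keep a full copy on each compute node, are not modeled.

```python
import hashlib
from collections import Counter
from typing import Hashable, List

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table across 60 distributions


def round_robin(num_rows: int) -> List[int]:
    """Give each row the next distribution in turn (the service assigns rows randomly but evenly)."""
    return [i % DISTRIBUTIONS for i in range(num_rows)]


def hash_distribution(value: Hashable) -> int:
    """Map a distribution-column value to a distribution; MD5 is a stand-in for the internal hash."""
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS


# Staging: rows land evenly, regardless of their content
print(Counter(round_robin(600)).most_common(2))

# Fact: every row with the same key lands in the same distribution
for key in ["Session-1", "Session-1", "Session-2"]:
    print(key, hash_distribution(key))
```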
Unstructured
This kind of data typically consists of media files such as audio, video, or images. There is no interface available for developers to query the contents of media files. There are some advancements happening in the Azure Cognitive Services area, where some artificial intelligence (AI) algorithms are able to identify visual or sound patterns. Those […]
Symmetric Multiprocessing (SMP)
You will find the Massively Parallel Processing (MPP) design in Azure Synapse Analytics and the Symmetric Multiprocessing (SMP) design in Azure SQL Database. In an MPP design, each processor (i.e., CPU) is allocated dedicated compute resources such as memory, while in an SMP design the processors share those compute resources. Consider the fact that retrieving data stored in memory has lower latency than retrieving it from disk. […]
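The shared-nothing idea behind massively parallel processing (MPP) can be illustrated with a small simulation: the data is split into disjoint shards, each worker aggregates only its own shard, and a coordinator combines the partial results. Threads stand in for compute nodes here, which is an assumption for illustration only; unlike real MPP nodes they are not separate machines with dedicated memory. The function names `shard` and `mpp_sum` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def shard(data: Sequence[int], nodes: int) -> List[List[int]]:
    """Split data into `nodes` disjoint shards, one per simulated compute node."""
    return [list(data[i::nodes]) for i in range(nodes)]


def mpp_sum(data: Sequence[int], nodes: int = 4) -> int:
    """Scatter shards to workers, sum each shard independently, then gather the partials."""
    shards = shard(data, nodes)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partials = list(pool.map(sum, shards))   # each worker touches only its own shard
    return sum(partials)                         # the coordinator combines partial results


print(mpp_sum(range(1, 101)))  # 5050
```

Because no worker reads another worker's shard, adding nodes adds capacity without contention over shared resources, which is the property the SMP comparison above turns on.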