DBCC DROPRESULTSETCACHE

To enable result set caching for a session on an Azure Synapse Analytics dedicated SQL pool, you would execute the following command; caching is OFF by default: SET RESULT_SET_CACHING ON. The first time a query is executed, the results are stored in the cache. The next time the same query is run, instead of parsing through all the data […]
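As a sketch of the workflow described above, enabling, checking, and clearing the result set cache on a dedicated SQL pool might look like the following (run SET RESULT_SET_CACHING in the session you want to affect):

```sql
-- Enable result set caching for the current session (OFF by default)
SET RESULT_SET_CACHING ON;

-- Check whether caching is enabled at the database level
SELECT name, is_result_set_caching_on
FROM sys.databases;

-- Remove all cached result sets from the dedicated SQL pool
DBCC DROPRESULTSETCACHE;
```

Caching can also be switched on for every session by default with ALTER DATABASE at the database level.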

Querying Data

Data is not very useful without some way to look at it, search through it, and manipulate it—in other words, querying. You have seen many examples of managing and manipulating data from both structured and semi‐structured data sources. In this section, you’ll learn many ways to analyze the data in your data lake, data warehouse, […]

DataFrame

Up to this point you have seen examples that created a DataFrame, typically identified as df, from a spark.read.* method: df = spark.read.csv('/tmp/output/brainjammer/reading.csv'). Instead of passing the path of the data to load directly to the read.* method, you could store it in an object, named data, for example: data = 'abfss://<uid>@<accountName>.dfs.core.windows.net/reading.csv' Once […]
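For readers without a Spark session handy, the same pattern — holding the source in a variable and loading it into a row-oriented structure — can be sketched with the Python standard library (the column names and values are hypothetical stand-ins for reading.csv):

```python
import csv
import io

# Hypothetical CSV content standing in for reading.csv
data = io.StringIO("SESSION_DATETIME,ELECTRODE,VALUE\n2021-07-30,AF3,4.2\n")

# Load the rows into a list of dictionaries, loosely analogous to a DataFrame
reader = csv.DictReader(data)
rows = list(reader)
print(rows[0]["ELECTRODE"])  # AF3
```

With PySpark the same separation applies: the path string is just data until a read.* method consumes it.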

GROUPBY()

This method provides the ability to run aggregations, which are the gathering, summarizing, and presentation of data in an easily consumable format. The groupBy() method supports several aggregate functions; here are the most common:

avg() Returns the average of the grouped columns
count() Returns the number of rows in the identified group
max() Returns the largest value in […]
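The semantics of these aggregate functions can be sketched in plain Python (the readings data is hypothetical), showing what grouping by a column and then aggregating each group produces:

```python
from collections import defaultdict

# Hypothetical readings: (electrode, value) pairs
readings = [("AF3", 4.0), ("AF3", 6.0), ("T7", 3.0)]

# Group the values by electrode
groups = defaultdict(list)
for electrode, value in readings:
    groups[electrode].append(value)

# Compute count, avg, and max per group, as groupBy().agg(...) would
summary = {
    e: {"count": len(v), "avg": sum(v) / len(v), "max": max(v)}
    for e, v in groups.items()
}
print(summary["AF3"])  # {'count': 2, 'avg': 5.0, 'max': 6.0}
```

In PySpark the equivalent is df.groupBy("ELECTRODE") followed by the chosen aggregate function.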

Data Programming and Querying for Data Engineers

To perform the duties of an Azure data engineer, you will need to write some code. Perhaps you will not need to have a great understanding of encapsulation, asynchronous patterns, or parallel LINQ queries, but some coding skill is necessary. Up to this point you have been exposed primarily to SQL syntax and PySpark, which […]

Feature Availability

Hadoop external tables, created using the previous SQL syntax, are available only when using dedicated SQL pools and support the CSV, Parquet, and ORC file formats. Notice in the following SQL syntax that there is no TYPE argument. Omitting the TYPE argument is supported only on serverless SQL pools, with CSV and Parquet […]
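As a sketch of the difference (the data source names and storage location are hypothetical), a Hadoop external data source on a dedicated SQL pool declares TYPE = HADOOP, while the serverless (native) variant simply omits the TYPE argument:

```sql
-- Dedicated SQL pool: Hadoop external data source (CSV, Parquet, ORC)
CREATE EXTERNAL DATA SOURCE BrainjammerHadoop
WITH (
    LOCATION = 'abfss://container@account.dfs.core.windows.net',
    TYPE = HADOOP
);

-- Serverless SQL pool: no TYPE argument (CSV and Parquet only)
CREATE EXTERNAL DATA SOURCE BrainjammerNative
WITH (
    LOCATION = 'abfss://container@account.dfs.core.windows.net'
);
```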

Data Sources

There are many locations from which you can retrieve data. In this section you will see how to read and write JSON, CSV, and Parquet files using PySpark. You have already been introduced to the DataFrame in some capacity. Reading and writing data can happen entirely within the context of a file, or the data can […]
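Before the PySpark examples, the basic read/write round trip can be sketched with the standard library for the JSON case (the file name and record contents are hypothetical):

```python
import json
import os
import tempfile

# Hypothetical reading record
record = {"electrode": "AF3", "value": 4.2}

# Write the record to a JSON file, then read it back
path = os.path.join(tempfile.gettempdir(), "reading.json")
with open(path, "w") as f:
    json.dump(record, f)

with open(path) as f:
    loaded = json.load(f)
print(loaded["electrode"])  # AF3
```

PySpark follows the same shape with spark.read.json(...) and df.write.json(...), only distributed across the cluster.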

HASH

This distribution model uses a hash function to perform the distribution, as shown in Figure 2.10. For large tables, this distribution model delivers the highest query performance. Consider the following snippet, which can be added to the script that creates the READING table: DISTRIBUTION = HASH([ELECTRODE_ID]) This results in the data being deterministically distributed across […]
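The deterministic behavior can be sketched in plain Python: a dedicated SQL pool spreads rows across 60 distributions, and hashing the distribution column always sends a given value to the same one. The function below is illustrative, not the engine's actual hash:

```python
NUM_DISTRIBUTIONS = 60  # dedicated SQL pools use 60 distributions

def distribution_for(electrode_id: int) -> int:
    # Illustrative stand-in for the engine's internal hash function
    return electrode_id % NUM_DISTRIBUTIONS

# The same ELECTRODE_ID always lands on the same distribution,
# so all rows for one electrode are colocated
assert distribution_for(3) == distribution_for(3)
print(distribution_for(3))  # 3
```

That determinism is what makes joins and aggregations on the hashed column fast: matching rows already live on the same distribution.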

Data Concepts

There are many concepts you must be aware of, comfortable with, and competent in to manage data efficiently. This section covers many data concepts that will not only help you pass the Data Engineering on Microsoft Azure exam, but also help you do the job in the real world. Keep in mind that when discussing relational structure […]

Create an Azure Cosmos DB

FIGURE 2.6 Azure Cosmos DB APIs
FIGURE 2.7 Azure Cosmos Data Explorer
FIGURE 2.8 Azure Cosmos Data Explorer SQL query

The first query returns the scenario from all the files in that container. The second query returns the first reading for a specific scenario.
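As a sketch of those two queries in Cosmos DB SQL API syntax (the property names and scenario value are hypothetical, since the actual document shape is shown only in the figures):

```sql
-- Return the scenario from every document in the container
SELECT c.Session.Scenario FROM c

-- Return the first reading for a specific scenario
SELECT TOP 1 * FROM c WHERE c.Session.Scenario = 'PlayingGuitar'
```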