Data Programming and Querying for Data Engineers – CREATE DATABASE dbName; GO

To perform the duties of an Azure data engineer, you will need to write some code. Perhaps you will not need to have a great understanding of encapsulation, asynchronous patterns, or parallel LINQ queries, but some coding skill is necessary. Up to this point you have been exposed primarily to SQL syntax and PySpark, which […]

Data Skew – CREATE DATABASE dbName; GO

When data is skewed, it means that one category is represented more often when compared to the other data categories in a given dataset. Take Figure 2.19, which represents a right/positive skew, no skew, and a left/negative skew for the BCI electrodes. You might notice that the graph in the middle, with no skew, is […]
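
As a rough illustration of positive, zero, and negative skew, the sketch below computes the moment coefficient of skewness for three small lists. The numbers are made up for the example and are not the BCI electrode readings from the figure; the function name `skewness` is also hypothetical.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population moment coefficient of skewness: E[(x - mean)^3] / stdev^3."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        # All values identical: no spread, so treat as no skew.
        return 0.0
    n = len(values)
    return sum((v - m) ** 3 for v in values) / (n * s ** 3)


# Illustrative samples only (assumed data, not from the source).
right_skewed = [1, 1, 1, 2, 2, 3, 10]      # long tail toward larger values
symmetric = [1, 2, 3, 4, 5]                # mirrored around the mean
left_skewed = [-v for v in right_skewed]   # long tail toward smaller values

for label, sample in [("right", right_skewed),
                      ("none", symmetric),
                      ("left", left_skewed)]:
    print(label, round(skewness(sample), 3))
```

A positive result corresponds to a right/positive skew, a result near zero to no skew, and a negative result to a left/negative skew.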

Schema Drift – CREATE DATABASE dbName; GO

You have now read and learned about the many kinds of schemas and all the different kinds of tables. Those concepts help you as an Azure data engineer to better understand the kind of data structures you could be working with. You need to know something about the data to normalize it and query to […]

Static Schema – CREATE DATABASE dbName; GO

The word static has numerous meanings, and the one that applies is dependent on the context in which it is used. In the database context, the meaning is that once a schema is defined and created, it will not change. You find static schemas in relational (aka structured) databases. If you recall from the previous […]

Temporary Table – CREATE DATABASE dbName; GO

A temporary table is one that is intended to be used only for a given session. For example, if you create a normal table, you expect the table to remain persisted on the database until you purposefully remove it. Each time you log in, you expect that the table is available and queryable. This isn’t […]
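
The session-scoped behavior can be sketched with Python's built-in `sqlite3` module. This is a stand-in chosen so the example runs anywhere; it is not the dedicated SQL pool syntax the post goes on to discuss, and the table names `reading` and `reading_staging` are hypothetical. Each connection plays the role of one session.

```python
import os
import sqlite3
import tempfile


def table_names(conn):
    """Return the names of persistent and temporary tables visible to conn."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "UNION ALL "
        "SELECT name FROM sqlite_temp_master WHERE type = 'table'"
    ).fetchall()
    return {row[0] for row in rows}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    # Session A creates one normal table and one temporary table.
    session_a = sqlite3.connect(path)
    session_a.execute("CREATE TABLE reading (id INTEGER, value REAL)")
    session_a.execute("CREATE TEMP TABLE reading_staging (id INTEGER, value REAL)")
    session_a.commit()
    seen_by_a = table_names(session_a)

    # Session B connects to the same database file.
    session_b = sqlite3.connect(path)
    seen_by_b = table_names(session_b)

    session_a.close()
    session_b.close()

print(sorted(seen_by_a))  # both tables are visible in the creating session
print(sorted(seen_by_b))  # only the persisted table is visible elsewhere
```

The normal table is visible from both sessions, while the temporary table exists only for the session that created it and disappears when that session closes.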

Feature Availability – CREATE DATABASE dbName; GO

Hadoop external tables, created using the previous SQL syntax, are available only when using dedicated SQL pools and support the CSV, Parquet, and ORC file types. Notice in the following SQL syntax that there is no TYPE argument. The result of not identifying a TYPE is supported only on serverless SQL pools, with CSV and Parquet […]

Unsupported PolyBase Data Types – CREATE DATABASE dbName; GO

When you’re working with external tables, the following data types are not supported:

Unsupported Table Features

Here is a list of unsupported Azure Synapse Analytics dedicated SQL pool features:

Schema

A schema is an organization feature found in a database. Imagine a large relational database where you have over a thousand tables. You would hope […]

Star Schema – CREATE DATABASE dbName; GO

A star schema is a fact table surrounded by multiple dimension tables. When visualized, the shape the tables make resembles a star, something similar to Figure 2.16.

FIGURE 2.16 A star schema example

Do you remember what table distribution type you use for a fact table and what type you use for a […]
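
To make the fact-and-dimension layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It shows only the shape of a star schema and a typical query over it; it does not cover table distribution types, which are specific to dedicated SQL pools. All table names, column names, scenario names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: small, descriptive lookups.
    CREATE TABLE dim_scenario  (scenario_id  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_electrode (electrode_id INTEGER PRIMARY KEY, name TEXT);

    -- Fact table: one row per measurement, with a key into each dimension.
    CREATE TABLE fact_reading (
        scenario_id  INTEGER REFERENCES dim_scenario (scenario_id),
        electrode_id INTEGER REFERENCES dim_electrode (electrode_id),
        value        REAL
    );

    INSERT INTO dim_scenario  VALUES (1, 'Meditation'), (2, 'PlayingGuitar');
    INSERT INTO dim_electrode VALUES (1, 'AF3'), (2, 'AF4');
    INSERT INTO fact_reading  VALUES (1, 1, 2.0), (1, 1, 4.0),
                                     (1, 2, 5.0), (2, 1, 10.0);
""")

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measure by the descriptive attributes.
rows = conn.execute("""
    SELECT s.name, e.name, AVG(f.value)
    FROM fact_reading AS f
    JOIN dim_scenario  AS s ON s.scenario_id  = f.scenario_id
    JOIN dim_electrode AS e ON e.electrode_id = f.electrode_id
    GROUP BY s.name, e.name
    ORDER BY s.name, e.name
""").fetchall()

for row in rows:
    print(row)

conn.close()
```

Every join runs from the central fact table out to one dimension table, which is what gives the diagram its star shape.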

Pruning – CREATE DATABASE dbName; GO

If you already know what the term projection means, then you can use that as a basis for the meaning of pruning. You can also use the literal meaning of the word, which involves trimming branches of a tree or a bush. Also, many times there are some stems that simply come out of nowhere […]

Explode Arrays – CREATE DATABASE dbName; GO

The concept of exploding arrays is related to Apache Spark pools and the programming language PySpark. The command to explode an array resembles the following:

%%pyspark
from pyspark.sql.functions import explode
df = spark.read.json('abfss://<endpoint>/brainjammer.json')
dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
dfe.show(2, truncate=False, vertical=True)

The first line of the code snippet is what is referred to as a magic command. The magic command […]
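
For readers without a Spark pool at hand, the row-per-element behavior of `explode` can be approximated in plain Python. This is only a sketch of the idea, not PySpark itself: the helper `explode_rows`, the flattened record layout, and the sample values are assumptions made for illustration. The output column is named `col` to mirror the default name PySpark gives an exploded array column.

```python
def explode_rows(rows, array_key, out_key="col"):
    """Return one output row per element of rows[i][array_key].

    Rows whose array is empty or None produce no output rows,
    which is also how PySpark's explode treats them.
    """
    exploded = []
    for row in rows:
        for item in row[array_key] or []:
            # Copy every other field, then attach the single array element.
            new_row = {k: v for k, v in row.items() if k != array_key}
            new_row[out_key] = item
            exploded.append(new_row)
    return exploded


# Hypothetical, flattened stand-in for the nested session JSON.
sessions = [
    {"Scenario": "ClassicalMusic", "AF3": [1.5, 2.5]},
    {"Scenario": "Meditation", "AF3": [3.0]},
    {"Scenario": "Empty", "AF3": []},
]

result = explode_rows(sessions, "AF3")
for row in result:
    print(row)
```

Two input rows with three array elements between them become three output rows, each carrying the `Scenario` value of the row it came from.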