To enable caching for a session on an Azure Synapse Analytics SQL pool, execute the following command (caching is OFF by default): SET RESULT_SET_CACHING ON. The first time a query is executed, the results are stored in the cache. The next time the same query is run, instead of parsing through all the data […]
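The behavior described above (first run stores the result, later identical runs reuse it) can be illustrated with a small conceptual sketch in plain Python. This is not how Synapse implements result set caching; the `ResultSetCache` class, the `slow_backend` function, and the `readings` table name are all hypothetical and exist only to show the idea.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[object, ...]


class ResultSetCache:
    """Toy per-session cache keyed by the exact query text (hypothetical helper)."""

    def __init__(self, run_query: Callable[[str], List[Row]]) -> None:
        self._run_query = run_query
        self._cache: Dict[str, List[Row]] = {}
        self.enabled = False  # mirrors "caching is OFF by default"
        self.hits = 0

    def execute(self, sql: str) -> List[Row]:
        # Serve the stored result when the same query text was seen before.
        if self.enabled and sql in self._cache:
            self.hits += 1
            return self._cache[sql]
        rows = self._run_query(sql)
        if self.enabled:
            self._cache[sql] = rows
        return rows


calls: List[str] = []


def slow_backend(sql: str) -> List[Row]:
    """Stand-in for the expensive scan; records every real execution."""
    calls.append(sql)
    return [(42,)]


session = ResultSetCache(slow_backend)
session.enabled = True  # the analog of switching caching ON for the session
first = session.execute("SELECT COUNT(*) FROM readings")
second = session.execute("SELECT COUNT(*) FROM readings")
```

After the two calls, the backend has been invoked only once; the second call is answered from the cache.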
TO_DATE() AND TO_TIMESTAMP() – CREATE DATABASE dbName; GO
There can be many challenges when working with dates and datetimes. In many scenarios a date is stored as a string. That means if you want to perform any calculation with it, the date value stored in the string needs to be converted to the date data type. Additionally, the date format is often specific […]
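As a rough illustration of why the conversion matters, here is a minimal sketch in plain Python rather than Spark SQL. The `to_date` helper below is a hypothetical stand-in built on `datetime.strptime`, and its format codes are Python's, not the pattern letters used by the SQL functions named in the title. The sample date strings are made up.

```python
from datetime import date, datetime


def to_date(text: str, fmt: str) -> date:
    """Parse a date stored as a string, using an explicit format."""
    return datetime.strptime(text, fmt).date()


# The same calendar day written in two locale-specific formats.
us = to_date("12/31/2022", "%m/%d/%Y")
eu = to_date("31.12.2022", "%d.%m.%Y")

# Date arithmetic only works once the strings are real date values.
days = (to_date("2023-01-15", "%Y-%m-%d") - us).days
```

Once parsed, both strings compare equal, and subtracting two dates yields a day count, which would not be possible on the raw strings.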
Data Skew
Data is skewed when one category is represented more often than the other categories in a given dataset. Take Figure 2.19, which represents a right/positive skew, no skew, and a left/negative skew for the BCI electrodes. You might notice that the graph in the middle, with no skew, is […]
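One common way to put a number on the direction of skew is the moment-based skewness coefficient: positive for a right skew, negative for a left skew, and zero for a symmetric distribution. The sketch below assumes that measure (the source does not say which one it uses), and the sample values are invented for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence


def skewness(values: Sequence[float]) -> float:
    """Moment-based skewness: mean cubed deviation divided by the cubed std dev."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        return 0.0  # all values identical, so there is no skew to measure
    n = len(values)
    return sum((v - m) ** 3 for v in values) / (n * s ** 3)


right_skewed = [1, 1, 1, 2, 2, 3, 10]      # long tail to the right
symmetric = [1, 2, 3, 4, 5]                # mirror image around the mean
left_skewed = [1, 8, 9, 9, 10, 10, 10]     # long tail to the left

right = skewness(right_skewed)
middle = skewness(symmetric)
left = skewness(left_skewed)
```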
Temporary Table
A temporary table is one that is intended to be used only for a given session. For example, if you create a normal table, you expect the table to remain persisted on the database until you purposefully remove it. Each time you log in, you expect that the table is available and queryable. This isn’t […]
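The session-scoped behavior can be demonstrated with SQLite from the Python standard library. This is only an analog under stated assumptions: SQLite's `CREATE TEMP TABLE` syntax differs from the syntax used in a Synapse SQL pool, and the table names `readings` and `scratch` are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file stand in for two sessions.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
session_a = sqlite3.connect(db_path)
session_b = sqlite3.connect(db_path)

# A normal table persists in the database; a temp table belongs to session A only.
session_a.execute("CREATE TABLE readings (id INTEGER, value REAL)")
session_a.execute("CREATE TEMP TABLE scratch (id INTEGER, value REAL)")
session_a.execute("INSERT INTO scratch VALUES (1, 4.2)")
session_a.commit()


def table_visible(conn: sqlite3.Connection, name: str) -> bool:
    """Return True if this connection can query the named table."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 1")
        return True
    except sqlite3.OperationalError:
        return False


visible = {
    "a_scratch": table_visible(session_a, "scratch"),
    "b_scratch": table_visible(session_b, "scratch"),
    "b_readings": table_visible(session_b, "readings"),
}

session_a.close()
session_b.close()
```

The second session can query the normal table but not the temporary one, and the temporary table disappears when the first session closes.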
Unsupported PolyBase Data Types
When you’re working with external tables, the following data types are not supported:
Unsupported Table Features
Here is a list of unsupported Azure Synapse Analytics dedicated SQL pool features:
Schema
A schema is an organization feature found in a database. Imagine a large relational database where you have over a thousand tables. You would hope […]
Star Schema
A star schema is a fact table surrounded by multiple dimension tables. When visualized, the shape the tables make resembles a star, similar to Figure 2.16.
FIGURE 2.16 A star schema example
Do you remember what table distribution type you use for a fact table and what type you use for a […]
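A minimal star schema can be sketched with an in-memory SQLite database: one fact table in the middle, with a foreign key pointing at each dimension table. All table names, column names, and values below are hypothetical, and SQLite has no notion of distribution types, so this only shows the shape of the schema and the typical join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_electrode (electrode_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_scenario  (scenario_id  INTEGER PRIMARY KEY, name TEXT);

-- The fact table holds the measurements plus one key per dimension.
CREATE TABLE fact_reading (
    electrode_id INTEGER REFERENCES dim_electrode(electrode_id),
    scenario_id  INTEGER REFERENCES dim_scenario(scenario_id),
    value        REAL
);

INSERT INTO dim_electrode VALUES (1, 'AF3'), (2, 'AF4');
INSERT INTO dim_scenario  VALUES (1, 'Meditation'), (2, 'PlayingGuitar');
INSERT INTO fact_reading  VALUES (1, 1, 2.0), (1, 2, 4.0), (2, 1, 6.0), (2, 2, 8.0);
""")

# A typical star-schema query: join the fact table out to each dimension.
rows = conn.execute("""
SELECT e.name, s.name, SUM(f.value)
FROM fact_reading AS f
JOIN dim_electrode AS e ON e.electrode_id = f.electrode_id
JOIN dim_scenario  AS s ON s.scenario_id  = f.scenario_id
GROUP BY e.name, s.name
ORDER BY e.name, s.name
""").fetchall()
conn.close()
```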
Data Sources
There are many locations from which you can retrieve data. In this section you will see how to read and write JSON, CSV, and Parquet files using PySpark. You have already been introduced to a DataFrame in some capacity. Reading and writing data can happen totally within the context of a file, or the data can […]
Data Management
Don’t confuse data management with database management, where the focus is on the mechanics of the DBMS. When you choose to run your database on the Azure platform and select a PaaS product, then the management of that database is no longer your or your company’s responsibility. Instead, the focus here is the management of […]
Table Categories
You might be wondering which distribution model you should use. The answer has to do with the table category to which the table you are creating belongs; see Table 2.3.
TABLE 2.3 Table category distribution matrix
Category                   Distribution model
Staging                    ROUND_ROBIN
Fact                       HASH
Dimension (small table)    REPLICATED
Dimension (large table)    HASH
STAGING TABLE
A staging […]
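The category-to-distribution mapping in Table 2.3 can be expressed as a small lookup that builds the `DISTRIBUTION` option for a `CREATE TABLE ... WITH (...)` statement. This is a sketch, not a definitive implementation: the category keys, the `distribution_clause` helper, and the `ELECTRODE_ID` column are hypothetical. Note that the table says REPLICATED while the T-SQL option is spelled `REPLICATE`.

```python
from typing import Optional

# Table 2.3 as a lookup (keys are hypothetical labels for the four categories).
DISTRIBUTION_BY_CATEGORY = {
    "staging": "ROUND_ROBIN",
    "fact": "HASH",
    "dimension_small": "REPLICATE",  # the T-SQL keyword for a replicated table
    "dimension_large": "HASH",
}


def distribution_clause(category: str, hash_column: Optional[str] = None) -> str:
    """Return the DISTRIBUTION option text for the given table category."""
    model = DISTRIBUTION_BY_CATEGORY[category]
    if model == "HASH":
        # HASH distribution needs a column to hash on.
        if not hash_column:
            raise ValueError("HASH distribution needs a distribution column")
        return f"DISTRIBUTION = HASH([{hash_column}])"
    return f"DISTRIBUTION = {model}"


staging = distribution_clause("staging")
fact = distribution_clause("fact", "ELECTRODE_ID")
small_dim = distribution_clause("dimension_small")
```

Keeping the mapping in one place makes the choice explicit for every table, and the `ValueError` catches a fact or large dimension table that was declared without a distribution column.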
Unstructured
This kind of data is typically media files like audio, video, or images. There is no available interface for developers to use to query the contents of media files. There are some advancements happening in the Azure Cognitive Services area, where some artificial intelligence (AI) algorithms are able to identify visual or sound patterns. Those […]