Symmetric Multiprocessing (SMP)

You will find the massively parallel processing (MPP) design in Azure Synapse Analytics and the symmetric multiprocessing (SMP) design in Azure SQL Database. In MPP, each processor (i.e., CPU) is allocated dedicated compute resources such as memory, while in SMP those compute resources are shared. Consider the fact that retrieving data stored in memory has lower latency than retrieving data stored on disk. In the shared-resource SMP architecture, it is possible for datasets to be cached in memory and reused, but with MPP this is not possible. You must find the best solution for your requirements.

Partition

Partitions are like shards, but instead of moving data to different databases on different nodes, the data is split within the same machine. Note that a machine and a node are the same thing. Recall from Exercise 2.2, where you created an Azure Cosmos DB, that part of the provisioning required a partition key. The partition key name was /pk, and its value came from Session.Scenario, for example, ClassicalMusic, FlipChart, or Meditation. The data is divided into logical, queryable datasets using the value in the identified column as the basis for the split. Splitting, or partitioning, the data into smaller sets improves performance. Consider a scenario where you have collected brainwaves for millions of sessions. Retrieving the session information that has the scenario of Meditation is faster if the query is executed on a partition that contains only that scenario. In contrast, if there were no partition, the query would have to parse the entire table of all sessions and pick out only the records that match the desired scenario.
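To make that benefit concrete, the following is a minimal pure-Python sketch of partition pruning. The session records and /pk values are hypothetical, not taken from the brainjammer dataset; the point is only that a query filtered on the partition key can go straight to one partition instead of scanning every row:

```python
from collections import defaultdict

# Hypothetical session documents; /pk holds the Session.Scenario value.
sessions = [
    {"pk": "ClassicalMusic", "session_id": 1},
    {"pk": "Meditation", "session_id": 2},
    {"pk": "FlipChart", "session_id": 3},
    {"pk": "Meditation", "session_id": 4},
]

# Group the rows into logical partitions keyed by the partition key value.
partitions = defaultdict(list)
for session in sessions:
    partitions[session["pk"]].append(session)

# A query filtered on the partition key reads only one partition...
meditation = partitions["Meditation"]

# ...instead of scanning every row and testing the predicate, as here.
full_scan = [s for s in sessions if s["pk"] == "Meditation"]

assert meditation == full_scan  # same result, far fewer rows examined
```

Both approaches return the same records; the difference is how many rows have to be examined to find them, which is where the performance gain comes from.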

The following is an example of how you might create a table named READING in an Azure Synapse Analytics SQL pool, partitioned on the SCENARIO_ID column:

CREATE TABLE [dbo].[READING]
(
    [READING_ID]   INT NOT NULL IDENTITY(1,1),
    [SCENARIO_ID]  INT NOT NULL,
    [ELECTRODE_ID] INT NOT NULL,
    [FREQUENCY_ID] INT NOT NULL,
    [VALUE]        DECIMAL(7,3) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX,
    DISTRIBUTION = HASH([SCENARIO_ID]),
    PARTITION ([SCENARIO_ID] RANGE RIGHT FOR VALUES (1, 2, 3, 4, 5, 6, 7, 8))
);

Both INDEX and DISTRIBUTION are discussed in the next sections. Since there are only eight captured scenarios, it is possible to list the exact values that identify them—for example, SCENARIO_ID = 1 is ClassicalMusic and SCENARIO_ID = 2 is FlipChart. You should know the values from the database you built in Exercise 2.1 and the SCENARIO table you loaded with data.

The PARTITION option accepts several arguments. In the previous example, you see RANGE RIGHT and FOR VALUES. RANGE LEFT is the default, which means that each boundary value belongs to the partition on its left (the lower values); in the example, the boundary values are (1, 2, 3, 4, 5, 6, 7, 8). The opposite is true for RANGE RIGHT, in that each boundary value belongs to the partition on its right (the higher values). These boundaries also determine onto which partition a row is placed when the table contains a value less than 1 or greater than 8. If you provide only a single boundary value, for example, a 4, two partitions are created: one for values lower than the boundary and one for values that are higher.

If the data changes and you need to adjust the partitions, you can do so using the partition splitting, merging, and switching functions. Splitting comes into play when, for example, a partition gets too big and warrants breaking it down into more partitions; perhaps you need to split the partition using the ELECTRODE_ID as well. Merging, on the other hand, is the opposite of splitting. Since the objective of partitioning is to improve performance and resource utilization, if some partitions take more time to return results than others, you can level out the workloads by combining two or more existing partitions into one. Lastly, switching changes the table that a partition targets. Perhaps a need arises to create a table named READING2 that is loaded with fresh data. It is possible to switch one or more partitions to point to the data on READING2 instead of the original READING table. Partitions that are not switched continue targeting the original data on the READING table. All three partition functions (splitting, merging, and switching) are executed using the ALTER TABLE SQL statement.
One additional option supported with the ALTER TABLE command is TRUNCATE_TARGET. When it is used, as seen here, the data on the target partition is overwritten with the new data:

ALTER TABLE [dbo].[READING_NEW] SWITCH PARTITION 2
TO [dbo].[READING] PARTITION 2 WITH (TRUNCATE_TARGET = ON);

It is possible to partition data when running on a Spark pool:

df = spark.read.json('abfss://*@*.dfs.core.windows.net/brainjammer.json')
df.write.partitionBy('Scenario').mode('overwrite').json('/tmp/brainwaves/scenario')

This would result in eight directories being created on ADLS, one per scenario, with the files inside each directory containing only that scenario's data. Alternatively, you can use df.repartition(8), which creates eight partitions in memory instead of writing the data to files and directories.
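Spark names each of those subdirectories after the partition column and its value (Scenario=ClassicalMusic, Scenario=Meditation, and so on). A small pure-Python sketch, using made-up records rather than real brainjammer data, shows the layout partitionBy produces without needing a Spark pool:

```python
# Hypothetical reading records; only the Scenario column matters here.
records = [
    {"Scenario": "ClassicalMusic", "VALUE": 2.137},
    {"Scenario": "Meditation", "VALUE": 1.004},
    {"Scenario": "Meditation", "VALUE": 0.912},
    {"Scenario": "FlipChart", "VALUE": 3.250},
]

# partitionBy('Scenario') writes one <column>=<value> subdirectory per
# distinct value, each holding only that scenario's rows.
directories = sorted({f"Scenario={r['Scenario']}" for r in records})
print(directories)
# ['Scenario=ClassicalMusic', 'Scenario=FlipChart', 'Scenario=Meditation']
```

With all eight brainjammer scenarios present in the source data, the same naming scheme yields the eight directories described above.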
