DBCC DROPRESULTSETCACHE – CREATE DATABASE dbName; GO

To enable caching for a session on an Azure Synapse Analytics SQL pool, you would execute the following command. Caching is OFF by default.

SET RESULT_SET_CACHING ON

The first time a query is executed, the results are stored in cache. The next time the same query is run, instead of parsing through all the data […]
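The caching behavior described in this excerpt can be pictured with a small plain-Python sketch. It is only a conceptual model, not how Azure Synapse implements result set caching: the ResultSetCache class, the fake_execute function, and the sample row are all hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]


class ResultSetCache:
    """Toy cache keyed by exact query text (hypothetical, for illustration only)."""

    def __init__(self, execute: Callable[[str], List[Row]]) -> None:
        self._execute = execute            # function that does the expensive work
        self._cache: Dict[str, List[Row]] = {}
        self.enabled = False               # mirrors "caching is OFF by default"

    def query(self, sql: str) -> List[Row]:
        if self.enabled and sql in self._cache:
            return self._cache[sql]        # cache hit: skip execution entirely
        rows = self._execute(sql)
        if self.enabled:
            self._cache[sql] = rows        # first run: store the result
        return rows

    def drop(self) -> None:
        """Empty the cache, loosely analogous to DBCC DROPRESULTSETCACHE."""
        self._cache.clear()


executions: List[str] = []


def fake_execute(sql: str) -> List[Row]:
    executions.append(sql)                 # record each real execution
    return [("THETA", 42)]                 # made-up sample row


cache = ResultSetCache(fake_execute)
cache.enabled = True                       # analogous to SET RESULT_SET_CACHING ON
first = cache.query("SELECT 1")
second = cache.query("SELECT 1")           # served from the cache, not re-executed
print(len(executions))
```

After the two identical queries, only one real execution has been recorded. Calling drop() forces the next identical query to execute again.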

Spark Streaming – CREATE DATABASE dbName; GO

The previous chapter introduced you to both Azure Stream Analytics/Event Hubs and Apache Spark/Apache Kafka. Those products are what you use to implement a data streaming solution, as illustrated in Figure 2.20. Notice the various kinds of data producers that can feed into Kafka. Any device that has permission and that can send correctly formatted […]

CREATEGLOBALTEMPVIEW() – CREATE DATABASE dbName; GO

This method creates a global temporary view, whose lifetime is tied to the Spark application. If a view with the same name already exists, then an exception is thrown.

df.createGlobalTempView('Brainwaves')
df2 = spark.sql('SELECT Session.POWReading.AF3[0].THETA FROM global_temp.Brainwaves')

Notice that the argument following FROM is the name of the view created in the previous line of code, qualified with global_temp, the system database in which Spark registers global temporary views.

CREATEORREPLACEGLOBALTEMPVIEW()

This […]
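The difference between creating a view and creating or replacing one can be sketched in plain Python, without a Spark session. This is a conceptual model only: ViewRegistry and its methods are hypothetical names, and ValueError stands in for the exception Spark actually raises when the view name is already taken.

```python
class ViewRegistry:
    """Toy model of a view catalog (hypothetical, not a Spark API)."""

    def __init__(self):
        self._views = {}

    def create(self, name, data):
        # Mirrors the create semantics: fail if the name is already in use.
        if name in self._views:
            raise ValueError(f"view '{name}' already exists")
        self._views[name] = data

    def create_or_replace(self, name, data):
        # Mirrors the create-or-replace semantics: overwrite silently.
        self._views[name] = data

    def get(self, name):
        return self._views[name]


registry = ViewRegistry()
registry.create("Brainwaves", [1, 2, 3])

try:
    registry.create("Brainwaves", [4, 5, 6])
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True              # second create with the same name fails

registry.create_or_replace("Brainwaves", [4, 5, 6])   # succeeds and replaces
```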

DataFrame – CREATE DATABASE dbName; GO

Up to this point you have seen examples that created a DataFrame, typically identified as df from a spark.read.* method:

df = spark.read.csv('/tmp/output/brainjammer/reading.csv')

Instead of passing the data to load into a DataFrame as a path via the read.* method, you could load the data into an object, named data, for example:

data = 'abfss://<uid>@<accountName>.dfs.core.windows.net/reading.csv'

Once […]

GROUPBY() – CREATE DATABASE dbName; GO

This method provides the ability to run aggregation, which is the gathering, summary, and presentation of data in an easily consumable format. The groupBy() method provides several aggregate functions; here are the most common:

avg() Returns the average of grouped columns
count() Returns the number of rows in that identified group
max() Returns the largest value in […]
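What avg(), count(), and max() compute for each group can be shown with a plain-Python sketch. It does not use PySpark; the readings list, the electrode names as group keys, and the helper functions are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows: (group key, reading value)
readings = [
    ("AF3", 10.0),
    ("AF3", 20.0),
    ("T7", 5.0),
    ("T7", 7.0),
    ("T7", 9.0),
]


def group_by(rows):
    """Collect the values for each key, as a grouping step would."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return groups


def aggregate(rows):
    """Compute the per-group average, row count, and largest value."""
    return {
        key: {"avg": mean(values), "count": len(values), "max": max(values)}
        for key, values in group_by(rows).items()
    }


summary = aggregate(readings)
print(summary)
```

Each key in summary maps to the three aggregates for that group, which is the shape of result a grouped aggregation produces: one output row per distinct group value.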

Data Programming and Querying for Data Engineers – CREATE DATABASE dbName; GO

To perform the duties of an Azure data engineer, you will need to write some code. Perhaps you will not need to have a great understanding of encapsulation, asynchronous patterns, or parallel LINQ queries, but some coding skill is necessary. Up to this point you have been exposed primarily to SQL syntax and PySpark, which […]