Explode Arrays – CREATE DATABASE dbName; GO

The concept of exploding arrays is related to Apache Spark pools and the programming language PySpark. The command to explode an array resembles the following:

%%pyspark
from pyspark.sql.functions import explode
df = spark.read.json('abfss://<endpoint>/brainjammer.json')
dfe = df.select('Session.Scenario', explode('Session.POWReading.AF3'))
dfe.show(2, truncate=False, vertical=True)

The first line of the code snippet is referred to as a magic command. A magic command is prefixed with two percent signs and identifies the language of the code that follows it. The languages are listed in Table 2.5.

TABLE 2.5 Spark pool magic commands

Magic command    Language
%%pyspark        Python
%%spark          Scala
%%sql            SparkSQL
%%csharp         C#

The next line of code imports the explode() function, and then a JSON file is loaded into a DataFrame named df. Next, a select query on the DataFrame retrieves the brain reading scenario and explodes the readings held within the AF3 electrode array, and the show() method displays two results. A snippet of the data that is loaded into the DataFrame, that is, the contents of the brainjammer.json file, is shown here. The snippet constitutes the first brain reading in that file, which is a ClassicalMusic session, for one of the five electrodes, in this case AF3. Each electrode captures five different brainwave frequencies.

{"Session": {"Scenario": "ClassicalMusic", "POWReading": [{"ReadingDate": "2021-09-12T09:00:18.492", "Counter": 0, "AF3": [{"THETA": 15.585, "ALPHA": 5.892, "BETA_L": 3.415, "BETA_H": 1.195, "GAMMA": 0.836}]}]}}

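The output of the exploded AF3 array resembles the following. Notice that Session.Scenario equals the expected value of ClassicalMusic. The second column is named col and contains a single reading of all five frequencies from the AF3 electrode. Comparing the first record with the JSON snippet shows that the values are listed in the order ALPHA, BETA_H, BETA_L, GAMMA, THETA rather than in the order of the source file, which is consistent with Spark sorting inferred JSON fields alphabetically. Because we passed a 2 to the show() method, two records are returned.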

-RECORD 0------------------------------------------------
 Session.Scenario | ClassicalMusic
 col              | [[5.892, 1.195, 3.415, 0.836, 15.585]]
-RECORD 1------------------------------------------------
 Session.Scenario | ClassicalMusic
 col              | [[5.871, 1.331, 3.56, 0.799, 26.864]]
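If you do not yet have a Spark pool at hand, the effect of explode() can be approximated in plain Python. The following is only a sketch that mimics the behavior on a hand-built dictionary; it is not Spark code. The first reading is taken from the JSON snippet above, and the second reading is reconstructed from the second output record on the assumption that its values follow the same field order as the first. The function name explode_af3 is made up for this illustration.

```python
# Plain-Python emulation of explode() on the AF3 array; not Spark code.
session = {
    "Session": {
        "Scenario": "ClassicalMusic",
        "POWReading": [
            # First reading, as shown in the brainjammer.json snippet.
            {"AF3": [{"THETA": 15.585, "ALPHA": 5.892, "BETA_L": 3.415,
                      "BETA_H": 1.195, "GAMMA": 0.836}]},
            # Second reading, reconstructed from the output above
            # (assumed field order: ALPHA, BETA_H, BETA_L, GAMMA, THETA).
            {"AF3": [{"THETA": 26.864, "ALPHA": 5.871, "BETA_L": 3.56,
                      "BETA_H": 1.331, "GAMMA": 0.799}]},
        ],
    }
}


def explode_af3(doc):
    """Return one (scenario, af3) row per element of POWReading."""
    scenario = doc["Session"]["Scenario"]
    return [(scenario, reading["AF3"])
            for reading in doc["Session"]["POWReading"]]


rows = explode_af3(session)
for scenario, af3 in rows:
    print(scenario, af3)
```

Each element of POWReading becomes its own row, and the scenario value is repeated on every row, which is the same shape as the two records displayed by show().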

Without using the explode() method, if you instead select the entire content of the AF3 electrode array as in the following snippet, you end up with two columns:

dfe = df.select('Session.Scenario', 'Session.POWReading.AF3')

The first column is the scenario, as expected, and the second is every AF3 reading contained within the brainjammer.json file. A snippet of the output is shown here. A given session lasts about two minutes and the number of readings in this file is close to 1,200, which is a lot of data—and remember, this is only for a single electrode.

Scenario | ClassicalMusic
AF3      | [[[5.892, 1.195, 3.415, 0.836, 15.585]], [[5.871, 1.331, 3.56,
0.799, 26.864]], [[6.969, 1.479, 3.61, 0.749, 47.282]], [[9.287, 1.624, 3.58,
0.7, 75.78]], [[12.231, 1.736, 3.5, 0.658, 104.216]], [[14.652, 1.792,
3.389, 0.621, 124.413]], [[15.983, 1.805, 3.259, 0.587, 131.456]], …

What explode() provides, then, is a way to reshape the data into one row per reading, which is more presentable and easier for a person to interpret. The output also serves as a basis for further queries, for example, finding the average, max, or min value for this specific electrode. When you create an Azure Synapse Analytics workspace, provision a Spark pool, and create a Notebook (described in Chapters 3, “The Sources and Ingestion of Data,” and 4, “The Storage of Data”), you will be able to execute this code and learn more about the data. The nature of gathering intelligence from data has a lot to do with the creativity and curiosity of the person performing the data analysis.
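As a rough illustration of the kind of follow-up query mentioned above, the sketch below computes the min, max, and average of one frequency in plain Python. It uses only the seven AF3 readings visible in the output shown earlier and assumes the values in each reading are ordered ALPHA, BETA_H, BETA_L, GAMMA, THETA, which matches the first reading when compared with the JSON snippet. The names FIELDS, af3_rows, and summarize are made up for this example.

```python
from statistics import mean

# Assumed field order of each reading, inferred from the first record.
FIELDS = ("ALPHA", "BETA_H", "BETA_L", "GAMMA", "THETA")

# The seven AF3 readings visible in the output above, one list per row.
af3_rows = [
    [5.892, 1.195, 3.415, 0.836, 15.585],
    [5.871, 1.331, 3.56, 0.799, 26.864],
    [6.969, 1.479, 3.61, 0.749, 47.282],
    [9.287, 1.624, 3.58, 0.7, 75.78],
    [12.231, 1.736, 3.5, 0.658, 104.216],
    [14.652, 1.792, 3.389, 0.621, 124.413],
    [15.983, 1.805, 3.259, 0.587, 131.456],
]


def summarize(rows, field):
    """Return the min, max, and average of one frequency across all rows."""
    idx = FIELDS.index(field)
    values = [row[idx] for row in rows]
    return {"min": min(values), "max": max(values), "avg": mean(values)}


theta = summarize(af3_rows, "THETA")
print(theta)
```

On a Spark pool, the equivalent would more likely be a DataFrame aggregation over the exploded column using the avg, min, and max functions from pyspark.sql.functions, run against all of the roughly 1,200 readings rather than a seven-row sample.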
