This kind of data is typically media files like audio, video, or images. There is no available interface for developers to use to query the contents of media files. There are some advancements happening in the Azure Cognitive Services area, where some artificial intelligence (AI) algorithms are able the identify visual or sound patterns. Those patterns can be stored as metadata and used to compare with known patterns to determine the contents of the media file. This is a new area you could pursue if you find it interesting.
Data Types and Management
It is important to know in depth the different kinds of data you will work with and store for analysis. Is the piece of data you use in a WHERE clause an integer, GUID, or string? It makes a difference in not only the column definitions, if used, but also the size of the value and the way in which you make reference to them. Notice the different ways in which you reference those different types in the following snippet:
WHERE count = 1
WHERE name = “Benjamin Perkins”
WHERE id = ‘081EBD8A-6032-4B99-976B-D59215B2BB2A’
You can also see how different the lengths of each are. The sizes of the values are significant when you are working with huge amounts of data. A GUID takes up a lot more space than the integer. Storage space is required to store the data on disk and then in memory when loaded from disk for manipulation or analysis.
Data Types
Table 2.2 contains a list of the most common data types.
TABLE 2.2 Common data types
Type | Example | Description |
binary | 1–8,000 | Values between 1 and 8,000 |
bit | 1 or 0 | Either a 1 or 0; nulls can also be stored |
blob | goodvibes.mp3 | An un/semi/structured file |
bool | true or false | Used to compare values |
array | {‘1’, ‘2’, ‘3’} | A group of elements |
char | a | A single number, letter, symbol, or space |
datetime | 2022‐05‐02 08:02:00 | A date and time value; YYYY‐MM‐DD HH:MM:SS |
decimal | 1.5 | ~ ±1.0 × 10−28 to ± 7.9 × 1028 with 28–29 digits |
double | 1.5 | ~ ±5.0 × 10−324 to ± 1.7 × 1038 with 15–16 digits |
float | 3.14 | ~ ±1.5 × 10−45 to ± 3.4 × 1038 with 7 digits |
image | brainmap.png | An image of some kind, such as PNG, BMP, GIF |
int | 100 | A whole number, ±2,147,483,648 |
json | { “brain” : “ben” } | Used to store JSON data |
money | €100 | Currency values |
nchar | Now is the time for… | Fixed length up to 4,000 characters |
nvarchar | To no one will we… | Variable length up to 4,000 characters |
nvarchar (max) | In the beginning… | Variable length up to maximum |
string | csharpguitar | An array of characters |
uniqueidentifier | 081EBD8A‐6032‐… | 16‐byte GUID |
varchar | 30‐65‐EC‐6F‐C4‐58 | Use for fixed‐length data, e.g., SSN or MAC |
xml | <brain>ben</brain> | Store XML data |
There are a few more than this, but the table represents the most common data types for a database. There are three important items you should note. First, you might notice that the value is the same for a decimal and a double. The difference between the two, however, is in the description. A decimal can contain ~29 digits to the right of the decimal, which makes it more precise than a double, which can contain ~16. Second, VARCHAR and NVARCHAR let you specify a specific length, such as VARCHAR(100) or NVARCHAR(125). You should do this if you know the maximum length of the data that can be added to that column. Specifying the length value will save space compared to using VARCHAR(MAX) or NVARCHAR(MAX) to save space on your database. Lastly, use VARCHAR unless there is a specific need for NVARCHAR that you need to store or work with foreign language special characters.