What is data?
Relational databases
- An entity is a thing about which information needs to be known or held.
- All data is tabular. Entities are modeled as tables, each instance of an entity is a row in the table, and each property is defined as a column.
- All rows in the same table have the same set of columns.
- A table can contain any number of rows.
- A primary key uniquely identifies each row in a table. No two rows can share the same primary key.
- A foreign key references rows in another, related table. For each value in the foreign key column, there should be a row with the same value in the corresponding primary key column in the other table.
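A minimal sketch of primary and foreign keys, using Python's built-in sqlite3 module (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled

# Each customer is a row; CustomerID is the primary key, so no two rows can share it.
conn.execute("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    )""")

# CustomerOrder.CustomerID is a foreign key: every value stored in it must
# match a CustomerID value in the Customer table.
conn.execute("""
    CREATE TABLE CustomerOrder (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        Total      REAL
    )""")

conn.execute("INSERT INTO Customer VALUES (1, 'Avery')")
conn.execute("INSERT INTO CustomerOrder VALUES (100, 1, 25.50)")      # OK: customer 1 exists
try:
    conn.execute("INSERT INTO CustomerOrder VALUES (101, 99, 9.99)")  # no customer 99
except sqlite3.IntegrityError as err:
    print("Rejected:", err)  # FOREIGN KEY constraint failed
```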
Non-relational systems
In a non-relational system, you store the information for entities in collections or containers rather than relational tables. Two entities in the same collection can have different sets of fields, rather than the regular set of columns found in a relational table. The lack of a fixed schema means that each entity must be self-describing; often this is achieved by labeling each field with the name of the data that it represents.
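For example, two documents in the same hypothetical collection, modeled here as Python dictionaries, can carry different fields because every field is labeled with its name:

```python
import json

# Two customers in the same collection with different sets of fields.
# Each field carries the name of the data it represents, so every
# entity is self-describing.
customers = [
    {"id": "1", "name": "Avery", "email": "avery@example.com"},
    {"id": "2", "name": "Kai", "telephone": "555-0100", "loyaltyTier": "gold"},
]

print(json.dumps(customers, indent=2))
```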
Levels of access
Analytical systems
Transaction processing systems
Visualization options to represent data
Neither the Single Database nor the Elastic Pool deployment option of Azure SQL Database supports linked servers.
- The query editor in the Azure portal
- The sqlcmd utility from the command line or the Azure Cloud Shell
- SQL Server Management Studio
- Azure Data Studio
- SQL Server Data Tools
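Beyond these interactive tools, an application can connect to Azure SQL Database directly. A minimal sketch using the pyodbc library, where the server, database, and credentials are placeholders and the ODBC driver version depends on what's installed locally:

```python
import pyodbc

# Placeholder connection details for an Azure SQL Database.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"
    "DATABASE=mydb;"
    "UID=myuser;"
    "PWD=mypassword;"
    "Encrypt=yes;"  # Azure SQL Database requires encrypted connections
)
cursor = conn.cursor()
cursor.execute("SELECT @@VERSION")
print(cursor.fetchone()[0])
conn.close()
```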
Non-relational data in Azure
Block blobs. A block blob is handled as a set of blocks. Each block can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB. The block is the smallest amount of data that can be read or written as an individual unit. Block blobs are best used to store discrete, large, binary objects that change infrequently.
Page blobs. A page blob is organized as a collection of fixed size 512-byte pages. A page blob is optimized to support random read and write operations; you can fetch and store data for a single page if necessary. A page blob can hold up to 8 TB of data. Azure uses page blobs to implement virtual disk storage for virtual machines.
Append blobs. An append blob is a block blob optimized to support append operations. You can only add blocks to the end of an append blob; updating or deleting existing blocks isn't supported. Each block can vary in size, up to 4 MB. The maximum size of an append blob is just over 195 GB.
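A short sketch of block and append blobs using the azure-storage-blob Python SDK (v12); the connection string, container, and blob names are placeholders:

```python
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string("<connection-string>")

# Block blob: suited to discrete, large binary objects that change infrequently.
block_client = service.get_blob_client(container="media", blob="video.mp4")
with open("video.mp4", "rb") as data:
    block_client.upload_blob(data)  # uploaded as a set of blocks

# Append blob: new blocks can only be added to the end, e.g. for logging.
append_client = service.get_blob_client(container="logs", blob="app.log")
append_client.create_append_blob()
append_client.append_block(b"2024-01-01 application started\n")
```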
- SQL API (Core) - JSON documents
- Table API - key/value data
- MongoDB API - JSON documents
- Cassandra API - column-oriented data
- Gremlin API - graph data
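A sketch of the SQL API through the azure-cosmos Python SDK; the endpoint, key, and database/container names are placeholders:

```python
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
database = client.create_database_if_not_exists("retail")
container = database.create_container_if_not_exists(
    id="customers",
    partition_key=PartitionKey(path="/customerId"),
)

# Items are stored as JSON documents.
container.upsert_item({"id": "1", "customerId": "c1", "name": "Avery"})

# The SQL API queries those documents with SQL-like syntax.
for item in container.query_items(
    query="SELECT c.name FROM c WHERE c.customerId = 'c1'",
    enable_cross_partition_query=True,
):
    print(item)
```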
|  | Azure Table Storage | Azure Blob Storage | Azure File Storage | Azure Cosmos DB | Azure Data Lake Store |
| --- | --- | --- | --- | --- | --- |
| Use case | NoSQL key/value model | Large binary objects, such as video and audio | File shares | Documents | Large-scale data |
| Size | 500 TB, limited by the storage account (an Azure storage account can hold up to 5 PB of data). An individual entity can be up to 1 MB with a maximum of 255 properties; the PartitionKey/RowKey strings can each be up to 1 KB. | Block blobs: 4.7 TB (blocks up to 100 MB each). Page blobs: 8 TB (512-byte pages). Append blobs: 195 GB (blocks up to 4 MB each). | 100 TB in a single storage account; a single file can be up to 1 TB. | Space is allocated automatically in a container for your partitions; each partition can grow up to 10 GB. |  |
| Availability |  |  |  | 99.99% |  |
| Performance tiers |  | Hot tier (the default), Cool tier, Archive tier | Standard tier (HDD), Premium tier (SSD); Azure aims to provide up to 300 MB/second of throughput for a single Standard file share | Less than 10-ms latency for both reads (indexed) and writes at the 99th percentile; one request unit per second (RU/s) supports an application that reads a single 1-KB document each second |  |
| Data replication | Replicated three times within an Azure region; you can also create tables in geo-redundant storage. |  | Replicated locally within a region, but can also be geo-replicated to a second region. | Replicated within a single region. |  |
| Data encryption at rest | Supported |  | Enabled by default |  |  |
| Data encryption in transit |  |  | Can be enabled |  |  |
| Tools |  |  | AzCopy utility, Azure File Sync service |  |  |
- The Azure portal.
- The Azure command-line interface (CLI).
- Azure PowerShell.
- Azure Resource Manager templates.
Provision Azure Cosmos DB
Azure Storage account
A role assignment consists of three elements: a security principal, a role definition, and a scope.
A security principal is an object that represents a user, group, service, or managed identity that is requesting access to Azure resources.
A role definition, often abbreviated to role, is a collection of permissions. A role definition lists the operations that can be performed, such as read, write, and delete. Roles can be given high-level names, like owner, or specific names, like virtual machine reader. Azure includes several built-in roles that you can use, including:
Owner - Has full access to all resources including the right to delegate access to others.
Contributor - Can create and manage all types of Azure resources but can't grant access to others.
Reader - Can view existing Azure resources.
User Access Administrator - Lets you manage user access to Azure resources.
You can also create your own custom roles. For detailed information, see Create or update Azure custom roles using the Azure portal on the Microsoft website.
A scope lists the set of resources that the access applies to. When you assign a role, you can further limit the actions allowed by defining a scope. This is helpful if, for example, you want to make someone a Website Contributor, but only for one resource group.
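As a sketch, the three elements fit together like this (a plain data structure for illustration, not an SDK call; the IDs and names are placeholders):

```python
# Hypothetical role assignment showing the three elements.
role_assignment = {
    # Security principal: the user, group, service, or managed identity
    # requesting access, identified by its object ID.
    "principal_id": "00000000-0000-0000-0000-000000000000",
    # Role definition: the collection of permissions being granted.
    "role": "Website Contributor",
    # Scope: the set of resources the access applies to. Scoping the
    # assignment to one resource group grants nothing outside it.
    "scope": "/subscriptions/<subscription-id>/resourceGroups/my-web-rg",
}
```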
Cosmos DB consistency levels (weakest to strongest):
- Eventual
- Consistent Prefix
- Session
- Bounded Staleness
- Strong
Options for importing data into Cosmos DB:
- Data Explorer
- Cosmos DB Data Migration tool
- Azure Data Factory
- A custom application that imports data using the Cosmos DB BulkExecutor library
- Your own application that uses the functions available through the Cosmos DB SQL API client library
- Hadoop MapReduce / Apache Spark
- Apache Hive provides interactive SQL-like facilities for querying, aggregating, and summarizing data.
- Apache Kafka is a clustered streaming service that can ingest data in real time. It's a highly scalable solution that offers publish and subscribe features (see the sketch after this list).
- Apache Storm is a scalable, fault tolerant platform for running real-time data processing applications.
- Data Lake Store
- Data Lake Analytics - U-SQL
- HDInsight
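A minimal sketch of Kafka's publish and subscribe model, using the third-party kafka-python library (the broker address and topic name are placeholders):

```python
from kafka import KafkaConsumer, KafkaProducer

# Publisher: send an event to a topic on the cluster.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("telemetry", b'{"deviceId": "d1", "reading": 21.5}')
producer.flush()

# Subscriber: independently consume events from the same topic.
consumer = KafkaConsumer(
    "telemetry",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)
for message in consumer:
    print(message.value)
    break
```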