Azure

What is difference between data lake and data warehouse?

Azure

Jovan Stosic January 17, 2023

A data lake contains all an organization’s data in a raw, unstructured form, and can store the data indefinitely — for immediate or future use. A data warehouse contains structured data that has been cleaned and processed, ready for strategic analysis based on predefined business needs.

Data lake – Wikipedia

Azure

Jovan Stosic January 17, 2023

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of data including raw copies of source system data, sensor data, social data etc., and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data lake can be established “on premises” (within an organization’s data centers) or “in the cloud” (using cloud services from vendors such as Amazon, Microsoft, or Google).
Source: Data lake – Wikipedia

Mount SMB Azure file share on Windows

Azure

Jovan Stosic June 5, 2022

Source: Mount SMB Azure file share on Windows | Microsoft Docs

Azure Storage Replication Explained

Azure

Jovan Stosic May 11, 2020

In the previous post about Storage Accounts, we talked about various types of accounts and their associated data services. Since that post, the Premium File storage option that was in preview has now gone into General Availability. (https://azure.microsoft.com/en-us/blog/announcing-the-general-availability-of-azure-premium-files/)

Alongside the data access tiers (Hot, Cool, and Archive) and data services (Blob, File, Queue, and Table), Azure also has options around the data redundancy. In this post, we are going to look at the four types of data replication Azure provides.
LRS (Locally redundant storage)

The most ubiquitously available option is Locally Redundant Storage (LRS); this is the default and only replication type available for all storage account types.

LRS ensure your data is replicated three times within a single data centre. These datastores are updated using synchronous writes to guarantee all three copies are kept up to date.

LRS does have downsides, predominately due to a single data centre in a single azure region containing the data replication. This issue exposes the data to a single point of failure if the Data Centre is entirely offline. Microsoft does commit to a 99.9% SLA for read and writes operations for data stored in LRS datastores. The SLA is not to be confused with the 11 9’s (99.999999999%) guarantee they offer for data durability which is just a commitment to ensure a level of data integrity against data loss or corruption.
ZRS (Zone redundant storage)

Introduced for General Purpose v2 storage accounts Zone Redundant Storage (ZRS) brings enhancements to the replication seen in LRS.

As with LRS, ZRS has synchronous writes between 3 copies of your data to ensure data integrity. Although still within the same Azure region, additional resilience is introduced due to the use of availability zones within the region. Two or three availability zones contain copies of the data. The increased resilience removes the issues of a single data centre outage causing data access issues. Although the data is spread across multiple availability zones, these zones are all within a single region, so data available is still susceptible to region-wide outages.

ZRS has the same SLA levels for read and write operations as LRS, but they increase the integrity of the durability of the data objects by an additional 9 to 99.9999999999% (12 9’s).
GRS (Geo-redundant storage)

Geo-redundant storage (GRS) brings additional redundancy to the data storage over both LRS or ZRS. Along with the three copies of your data stored within a single region, a further three copies are stored in the twinned Azure region. So using GRS means you get all the features of the LRS storage within your primary zone, but you also get a second LRS data storage in a neighbouring Azure region. This data is updated asynchronously, so there is a small lag between the 2 data sets, but for most cases this is acceptable.

Although using GRS means you are using two different datacentres in conjunction, there is a drawback of GRS, which is the secondary data storage is not accessible to read unless the storage account fails over. Due to all read and write operations still being managed by via a single data centre, Microsoft offers the same read and write SLA’s as ZRS and LRS datastores.
RA-GRS (Read-Access Geo-redundant)

The final replication type available is Read Access Geo-redundant storage, it has all the benefits and redundancy of the standard GRS replication but also allows for the secondary copy stored in the paired sister Azure region to be readable. This means you have multiple endpoints that are readable that your applications can use if they are configured to handle this configuration.

The additional read endpoint also means that for RA-GRS the read operation SLA is increased to 99.99% availability for Hot datastores. Write operations are left at the 99.9% SLA due to the single region still being in control of the write and update operations.

Both types of GRS replication do have a slight delay in the replication due to its asynchronous behaviour. This is where checks of the LastSyncTime come in useful to ensure you are reading the most up to date copy of the data. This asynchronous replication needs to be checked against your Recovery Point Objectives if you are planning to use GRS / RA-GRS as part of your DR planning.

Currently, Microsoft controls the failover of storage accounts from one Azure region to another region, so the scope for using the failover on your terms is limited. Microsoft has recently, however, brought into preview an option for a customer-initiated failover themselves. This is currently only available in the US West 2 / US West Central region pair and while in preview not suitable for production environments.

What is the difference between an Azure tenant and Azure subscription?

Azure

Jovan Stosic May 11, 2020

Basic understanding:

a tenant is associated with a single identity (person, company, or organization) and can own one or several subscriptions
a subscription is linked to a payment method
in every subscription, you can add virtual resources (VM, storage, network, …)

Additionally:

Every tenant is linked to a single Azure AD instance, which is shared with all tenant’s subscriptions
Resources from one subscription are isolated from resources in other subscriptions
An owner of a tenant can decide to have multiple subscriptions:
- when Subscriptions limits are reached
- to use different payment methods
- to isolate resources between different entities (departments)

Example 1:

Contoso decides to have a tenant with 2 subscriptions:

one subscription for the Prod department with Credit Card A
one subscription for the Dev department with Credit Card B
(but could also be the same Credit Card as the one of another subscription)

In this example, the two departments share the same Azure AD database. However, resources are isolated between departments, and budgets can be separated too.

Example 2:

A holding company decides to have 2 tenants:

one tenant for subsidiary Contoso with one subscription for Dev and Prod
one tenant for subsidiary Fabrikam with one subscription for Dev and another subscription for Prod

In this example, both companies have a different Azure AD database.

Example 3:

You have a tenant for your personal training.
In this tenant, you can have:

one free Azure subscription (linked to a credit card but not charged, and can be converted to a Pay-As-You-Go subscription after the free trial)
one or several Pay-As-You-Go subscriptions (linked to different credit cards)
one or several Azure Pass Sponsorship subscriptions, not linked to any credit card because these subscriptions are obtained during Microsoft trainings
one Visual Studio subscription (linked to a credit card) and with different quotas (of free resources) than the free subscription

Despite all those subscriptions have isolated resources (per subscription), and some are free while you have to pay for others, all subscriptions share the same Azure AD database.

https://stackoverflow.com/questions/47307368/what-is-the-difference-between-an-azure-tenant-and-azure-subscription

Disk Stripping – Optimize storage performance

Azure

Jovan Stosic May 11, 2020

Disks can be striped using a striping technology (such as Storage Spaces on Windows or mdadm on Linux) to increase the throughput and IOPS by spreading disk activity across multiple disks. Using disk striping allows you to really push the limits of performance for disks, and is often seen in high-performance database systems and other systems with intensive storage requirements.

https://docs.microsoft.com/en-us/learn/modules/design-for-performance-and-scalability-in-azure/4-optimize-storage-performance

Optimize storage performance – Premium Storage

Azure

Jovan Stosic May 11, 2020

Premium storage can attach only to specific virtual machine (VM) sizes. Premium storage capable sizes are designated with an “s” in the name, for example D2s_v3 or Standard_F2s_v2. Any virtual machine type (with or without an “s” in the name) can attach standard storage HDD or SSD drives.

https://docs.microsoft.com/en-us/learn/modules/design-for-performance-and-scalability-in-azure/4-optimize-storage-performance

AZ-103 or AZ-300 – AZURE

Azure

Jovan Stosic May 10, 2020

AZ-103 or AZ-300 from AZURE

Build a highly available architecture – Learn

Azure

Jovan Stosic May 9, 2020

Build a highly available architecture

20 minutes

High availability (HA) ensures your architecture can handle failures. Imagine you’re responsible for a system that must be always fully operational. Failures can and will happen, so how do you ensure that your system can remain online when something goes wrong? How do you perform maintenance without service interruption?

Here, you’ll learn the need for high availability, evaluate application high-availability requirements, and see how the Azure platform helps you meet your availability goals.

What is high availability?

A highly-available service is a service that absorbs fluctuations in availability, load, and temporary failures in dependent services and hardware. The application remains online and available (or maintains the appearance of it) while performing acceptably. This availability is often defined by business requirements, service-level objectives, or service-level agreements.

High availability is ultimately about the ability to handle the loss or severe degradation of a component of a system. This might be due to a virtual machine that’s hosting an application going offline because the host failed. It could be due to planned maintenance for a system upgrade. It could even be caused by the failure of a service in the cloud. Identifying the places where your system can fail, and building in the capabilities to handle those failures, will ensure that the services you offer to your customers can stay online.

High availability of a service typically requires high availability of the components that make up the service. Think of a website that offers an online marketplace to purchase items. The service that’s offered to your customers is the ability to list, buy, and sell items online. To provide this service, you’ll have multiple components: a database, web servers, application servers, and so on. Each of these components could fail, so you have to identify how and where your failure points are, and determine how to address these failure points in your architecture.

Evaluate high availability for your architecture

There are three steps to evaluate an application for high availability:

Determine the service-level agreement of your application
Evaluate the HA capabilities of the application
Evaluate the HA capabilities of dependent applications

Let’s explore these steps in detail.

Determine the service-level agreement of your application

A service-level agreement (SLA) is an agreement between a service provider and a service consumer in which the service provider commits to a standard of service based on measurable metrics and defined responsibilities. SLAs can be strict, legally bound, contractual agreements, or assumed expectations of availability by customers. Service metrics typically focus on service throughput, capacity, and availability, all of which can be measured in various ways. Regardless of the specific metrics that make up the SLA, failure to meet the SLA can have serious financial ramifications for the service provider. A common component of service agreements is guaranteed financial reimbursement for missed SLAs.

Service-level objectives (SLO) are the values of target metrics that are used to measure performance, reliability, or availability. These could be metrics defining the performance of request processing in milliseconds, the availability of services in minutes per month, or the number of requests processed per hour. By evaluating the metrics exposed by your application and understanding what customers use as a measure of quality, you can define the acceptable and unacceptable ranges for these SLOs. By defining these objectives, you clearly set goals and expectations with both the teams supporting the services and customers who are consuming these services. These SLOs will be used to determine if your overall SLA is being met.

The following table shows the potential cumulative downtime for various SLA levels.

TABLE 1
SLA	Downtime per week	Downtime per month	Downtime per year
99%	1.68 hours	7.2 hours	3.65 days
99.9%	10.1 minutes	43.2 minutes	8.76 hours
99.95%	5 minutes	21.6 minutes	4.38 hours
99.99%	1.01 minutes	4.32 minutes	52.56 minutes
99.999%	6 seconds	25.9 seconds	5.26 minutes

Of course, higher availability is better, everything else being equal. But as you strive for more 9s, the cost and complexity to achieve that level of availability grows. An uptime of 99.99% translates to about 5 minutes of total downtime per month. Is it worth the additional complexity and cost to reach five 9s? The answer depends on the business requirements.

Here are some other considerations when defining an SLA:

To achieve four 9’s (99.99%), you probably can’t rely on manual intervention to recover from failures. The application must be self-diagnosing and self-healing.
Beyond four 9’s, it is challenging to detect outages quickly enough to meet the SLA.
Think about the time window that your SLA is measured against. The smaller the window, the tighter the tolerances. It probably doesn’t make sense to define your SLA in terms of hourly or daily uptime.

Identifying SLAs is an important first step when determining the high availability capabilities that your architecture will require. These will help shape the methods you’ll use to make your application highly available.

Evaluate the HA capabilities of the application

To evaluate the HA capabilities of your application, perform a failure analysis. Focus on single points of failure and critical components that would have a large impact on the application if they were unreachable, misconfigured, or started behaving unexpectedly. For areas that do have redundancy, determine whether the application is capable of detecting error conditions and self-healing.

You’ll need to carefully evaluate all components of your application, including the pieces designed to provide HA functionality, such as load balancers. Single points of failure will either need to be modified to have HA capabilities integrated, or will need to be replaced with services that can provide HA capabilities.

Evaluate the HA capabilities of dependent applications

You’ll need to understand not only your application’s SLA requirements to your consumer, but also the provided SLAs of any resource that your application may depend on. If you are committing an uptime to your customers of 99.9%, but a service your application depends on only has an uptime commitment of 99%, this could put you at risk of not meeting your SLA to your customers. If a dependent service is unable to provide a sufficient SLA, you may need to modify your own SLA, replace the dependency with an alternative, or find ways to meet your SLA while the dependency is unavailable. Depending on the scenario and the nature of the dependency, failing dependencies can be temporarily worked around with solutions like caches and work queues.

Azure’s highly available platform

The Azure cloud platform has been designed to provide high availability throughout all its services. Like any system, applications may be affected by both hardware and software platform events. The need to design your application architecture to handle failures is critical, and the Azure cloud platform provides you with the tools and capabilities to make your application highly available. There are several core concepts when considering HA for your architecture on Azure:

Availability sets
Availability zones
Load balancing
Platform as a service (PaaS) HA capabilities

Availability sets

Availability sets are a way for you to inform Azure that VMs that belong to the same application workload should be distributed to prevent simultaneous impact from hardware failure and scheduled maintenance. Availability sets are made up of update domains and fault domains.

An illustration showing three availability sets. The first set has one update domain, the second has two update domains, and the third is without any update domain.

Update domains ensure that a subset of your application’s servers always remain running when the virtual machine hosts in an Azure datacenter require downtime for maintenance. Most updates can be performed with no impact to the VMs running on them, but there are times when this isn’t possible. To ensure that updates don’t happen to a whole datacenter at once, the Azure datacenter is logically sectioned into update domains (UD). When a maintenance event, such as a performance update and critical security patch that needs to be applied to the host, the update is sequenced through update domains. The use of sequencing updates using update domains ensures that the whole datacenter isn’t unavailable during platform updates and patching.

While update domains represent a logical section of the datacenter, fault domains (FD) represent physical sections of the datacenter and ensure rack diversity of servers in an availability set. Fault domains align to the physical separation of shared hardware in the datacenter. This includes power, cooling, and network hardware that supports the physical servers located in server racks. In the event the hardware that supports a server rack has become unavailable, only that rack of servers would be affected by the outage. By placing your VMs in an availability set, your VMs will be automatically spread across multiple FDs so that in the event of a hardware failure only part of your VMs will be impacted.

With availability sets, you can ensure your application remains online if a high-impact maintenance event is required or hardware failures occur.

Availability zones

Availability zones are independent physical datacenter locations within a region that include their own power, cooling, and networking. By taking availability zones into account when deploying resources, you can protect workloads from datacenter outages while retaining presence in a particular region. Services like virtual machines are zonal services and allow you to deploy them to specific zones within a region. Other services are zone-redundant services and will replicate across the availability zones in the specific Azure region. Both types ensure that within an Azure region there are no single points of failure.

An illustration showing three availability zones within an Azure region. Each availability zone has its own set of resources and are connected with each other for replication of zone-redundant services.

Supported regions contain a minimum of three availability zones. When creating zonal service resources in those regions, you’ll have the ability to select the zone in which the resource should be created. This will allow you to design your application to withstand a zonal outage and continue to operate in an Azure region before having to evacuate your application to another Azure region.

Availability zones are a newer high availability configuration service for Azure regions and are currently available for certain regions. It’s important to check the availability of this service in the region that you’re planning to deploy your application if you want to consider this functionality. Availability zones are supported when using virtual machines, as well as several PaaS services. Availability zones are mutually exclusive with availability sets. When using availability zones you no longer need to define an availability set for your systems. You’ll have diversity at the data center level, and updates will never be performed to multiple availability zones at the same time.

Load balancing

Load balancers manage how network traffic is distributed across an application. Load balancers are essential in keeping your application resilient to individual component failures and to ensure your application is available to process requests. For applications that don’t have service discovery built in, load balancing is required for both availability sets and availability zones.

Azure possesses three load balancing technology services that are distinct in their abilities to route network traffic:

Azure Traffic Manager provides global DNS load balancing. You would consider using Traffic Manager to provide load balancing of DNS endpoints within or across Azure regions. Traffic manager will distribute requests to available endpoints, and use endpoint monitoring to detect and remove failed endpoints from load.
Azure Application Gateway provides Layer 7 load-balancing capabilities, such as round-robin distribution of incoming traffic, cookie-based session affinity, URL path-based routing, and the ability to host multiple websites behind a single application gateway. Application Gateway by default monitors the health of all resources in its back-end pool and automatically removes any resource considered unhealthy from the pool. Application Gateway continues to monitor the unhealthy instances and adds them back to the healthy back-end pool once they become available and respond to health probes.
Azure Load Balancer is a layer 4 load balancer. You can configure public and internal load-balanced endpoints and define rules to map inbound connections to back-end pool destinations by using TCP and HTTP health-probing options to manage service availability.

One or a combination of all three Azure load-balancing technologies can ensure you have the necessary options available to architect a highly available solution to route network traffic through your application.

PaaS HA capabilities

PaaS services come with high availability built in. Services such as Azure SQL Database, Azure App Service, and Azure Service Bus include high availability features and ensure that failures of an individual component of the service will be seamless to your application. Using PaaS services is one of the best ways to ensure that your architecture is highly available.

When architecting for high availability, you’ll want to understand the SLA that you’re committing to your customers. Then evaluate both the HA capabilities that your application has, and the HA capabilities and SLAs of dependent systems. After those have been established, use Azure features, such as availability sets, availability zones, and various load-balancing technologies, to add HA capabilities to your application. Any PaaS services you should choose to use will have HA capabilities built in.

https://docs.microsoft.com/en-us/learn/modules/design-for-availability-and-recoverability-in-azure/2-high-availability

How to associate a Microsoft account to your organization’s Microsoft Partner Network ID

Azure

Jovan Stosic July 18, 2019

How to associate a Microsoft account to your organization’s Microsoft Partner Network ID in Partner Membership Center

Option 1: The primary program contact adds you to the organization

Instructions for the primary program contact:

Sign in to the Partner Membership Center.
From the Requirements & Assets dropdown menu, select Invite People to Associate.
At the top of the page, click Add New People.
Fill in the required fields and then click Send Invitation.

Once the primary program contact of your organization has completed the steps above, you will receive a notification email. Follow the instructions in the email to complete the association.

Option 2: Associate yourself to the organization

Open a new tab using an InPrivate browsing session and clear your cache.
Navigate to the Partner Membership Center and sign in with your Microsoft account. (You must have an @oultlook.com or @hotmail.com account if you associate yourself for the first time).
You will see the screen below. Here, you’ll have two options: Associate as an Individual or Enroll Organization. Do not choose Enroll Organization; that option is only for new partners creating a Microsoft Partner Network ID for the first time. Please choose Associate as an Individual.
After clicking Associate as an Individual, you will be redirected to the Find Your Organization page.
Here, you will need to complete the Organization/Location Name, the Country/Region, and the State/Province fields. If you are outside of the United States, please follow the prompts that are provided. Then click the Find My Organization button.
Microsoft’s system will generate a list of organizations that match the information you provided. Select your organization from the list and then click Associate to this Organization.
You will be redirected to a new screen, Organization Found – Submit E-mail. Type in your work email address and name, then click Submit.
You are now associated to your organization and will receive a confirmation email.

Source: How to associate a Microsoft account to your organization’s Microsoft Partner Network ID

How to use the SharePoint Migration Tool | Microsoft Docs

Azure

Jovan Stosic July 16, 2019

The SharePoint Migration Tool (SPMT) is a tool that migrates your files from SharePoint on-premises document libraries or regular file shares and easily moves them to your SharePoint Online tenant. It is available to all Office 365 users.

Source: How to use the SharePoint Migration Tool | Microsoft Docs

RIS

Robust Information Systems

Azure

What is difference between data lake and data warehouse?

Data lake – Wikipedia

Mount SMB Azure file share on Windows

Azure Storage Replication Explained

What is the difference between an Azure tenant and Azure subscription?

Basic understanding:

Additionally:

Example 1:

Example 2:

Example 3:

Disk Stripping – Optimize storage performance

Optimize storage performance – Premium Storage

AZ-103 or AZ-300 – AZURE

Build a highly available architecture – Learn

Build a highly available architecture

What is high availability?

Evaluate high availability for your architecture

Determine the service-level agreement of your application

Evaluate the HA capabilities of the application

Evaluate the HA capabilities of dependent applications

Azure’s highly available platform

Availability sets

Availability zones

Load balancing

PaaS HA capabilities

How to associate a Microsoft account to your organization’s Microsoft Partner Network ID

How to associate a Microsoft account to your organization’s Microsoft Partner Network ID in Partner Membership Center

How to use the SharePoint Migration Tool | Microsoft Docs