Welcome!
This post is a handbook where you will probably find an answer to the question "How do different things work in Azure Service Fabric?". Almost all of the information (around 90%) comes from docs.microsoft.com, GitHub, etc.; the rest consists of personal findings made during the development of CoherentSolutions.Extensions.ServiceFabric.Hosting.
Where possible the information is confirmed by a reliable source (docs, issues or posts).
Table of Contents
Interlude
Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices and containers. SF operates on a so-called cluster. A cluster is a network-connected set of physical or virtual machines into which microservices are deployed and managed. A physical or virtual machine that is part of a cluster is called a node. The workload is represented in the form of applications, where each application consists of services and each service is represented as one or more replicas deployed to nodes [*].
Object Model
All conceptual objects in the SF model are represented by a type and an instance. The type represents a uniquely named basic definition of something, i.e. the type can define attributes and properties that all instances of this type will share. The instance in turn represents an independent, uniquely named projection of the type initialized with a specific set of parameters.
The closest analogy to type and instance is a class and an instance of that class in an object-oriented language like C# or Java.
Author’s note
Besides type and instance SF defines one more concept – replica. A replica represents the service's code executing on a node.
NodeType & Node
NodeType is a uniquely identifiable definition of the hosting environment. This definition contains abstract information about hardware and software capabilities in the form of properties, as well as SF-specific information, i.e. communication ports.
Standalone Cluster
When configuring a standalone cluster the NodeType is defined in the ClusterConfig.json. Only a subset of this configuration can be updated at runtime.
"nodeTypes": [{
"name": "NodeType0",
"clientConnectionEndpointPort": "19000",
"clusterConnectionEndpointPort": "19001",
"leaseDriverEndpointPort": "19002"
"serviceConnectionEndpointPort": "19003",
"httpGatewayEndpointPort": "19080",
"reverseProxyEndpointPort": "19081",
"applicationPorts": {
"startPort": "20575",
"endPort": "20605"
},
"ephemeralPorts": {
"startPort": "20606",
"endPort": "20861"
},
"isPrimary": true
}]
Example: Configuring NodeType in ClusterConfig.json
The complete description of all nodeType properties in ClusterConfig.json can be found in the Microsoft.ServiceFabric.json schema (the referenced version is 2018.02.01).
Node is a uniquely named instance of a NodeType that represents a dedicated hosting environment where replicas are executed. It is usually represented by a physical or virtual machine (or an operating system process in the case of a local development cluster).
The relationship between NodeType and Node is shown in the picture below:

Standalone Cluster
When configuring a standalone cluster, nodes are configured either in ClusterConfig.json…
"nodes": [{
"nodeName": "vm0",
"iPAddress": "localhost",
"nodeTypeRef": "NodeType0",
"faultDomain": "fd:/dc1/r0",
"upgradeDomain": "UD0"
}]
Example: Configuring Node in ClusterConfig.json
… or added dynamically using PowerShell.
.\AddNode.ps1 `
-NodeName vm0 `
-NodeIPAddressorFQDN localhost `
-NodeType NodeType0 `
-FaultDomain fd:/dc1/r0 `
-UpgradeDomain UD0 `
-AcceptEULA
Example: Dynamically adding Node using PowerShell
ServiceType & Service
ServiceType is a uniquely identifiable, versioned definition of a service as a separate unit of application logic. The type is declared in ServiceManifest.xml, which includes the list of Code, Config and Data packages, resources and diagnostics configuration.
The complete list can be found in ServiceManifest.xml schema and documentation.
<ServiceManifest
Name="ApiServicePkg"
Version="1.0.0"
xmlns="http://schemas.microsoft.com/2011/01/fabric"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<ServiceTypes>
<StatefulServiceType
ServiceTypeName="ApiServiceType"
HasPersistedState="true" />
</ServiceTypes>
<CodePackage
Name="Code"
Version="1.0.0">
<EntryPoint>
<ExeHost IsExternalExecutable="true">
<Program>dotnet</Program>
<Arguments>Api.dll</Arguments>
<WorkingFolder>CodePackage</WorkingFolder>
</ExeHost>
</EntryPoint>
</CodePackage>
<ConfigPackage
Name="Config"
Version="1.0.0" />
<DataPackage
Name="Data"
Version="1.0.0" />
<Resources>
<Endpoints>
<Endpoint
Name="ServiceEndpoint"
Protocol="http"
UriScheme="http" />
<Endpoint Name="ReplicatorEndpoint" />
</Endpoints>
</Resources>
</ServiceManifest>
Example of ServiceManifest.xml
Each package defined in the ServiceManifest.xml has its own purpose [*]:
CodePackage
Contains binaries with the implementation of the logic of one or more service types. There can be multiple code packages defined in the same ServiceManifest.xml; when the replica is instantiated all code packages are activated by invoking their entry points. It is expected, though, that code packages will register the service types declared in the ServiceManifest.xml.
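For illustration, below is a minimal sketch of the registration a code package entry point is typically expected to perform, assuming the ApiServiceType name from the manifest above and a hypothetical ApiService class derived from StatefulService.

using System.Threading;
using Microsoft.ServiceFabric.Services.Runtime;

internal static class Program
{
    private static void Main()
    {
        // Register the service type declared in ServiceManifest.xml with the SF runtime;
        // "ApiServiceType" must match the ServiceTypeName from the manifest.
        // ApiService is a hypothetical StatefulService implementation.
        ServiceRuntime
            .RegisterServiceAsync(
                "ApiServiceType",
                context => new ApiService(context))
            .GetAwaiter()
            .GetResult();

        // Keep the host process alive; replicas are created by the runtime on demand.
        Thread.Sleep(Timeout.Infinite);
    }
}
Example: Registering a service type from a code package entry point (sketch)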
DataPackage
Declares a folder, named by the @Name attribute, that contains arbitrary static data to be consumed by the process at run time.
ConfigPackage
Declares a folder, named by the @Name attribute, that contains a Settings.xml file. The settings file contains sections of user-defined, key-value pair settings that the process reads back at run time.
<Settings
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://schemas.microsoft.com/2011/01/fabric">
<Section Name="Section">
<Parameter Name="KeyOne" Value="ValueOne" />
<Parameter Name="KeyTwo" Value="ValueTwo" />
</Section>
</Settings>
Example of Settings.xml
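To show how a replica typically reads these values back at run time, here is a minimal sketch. It assumes the config package is named Config (as in the ServiceManifest.xml above) and that context is the ServiceContext passed to the service implementation.

using System.Fabric;

// "context" is the ServiceContext (StatefulServiceContext / StatelessServiceContext)
// passed to the service implementation.
ConfigurationPackage config = context.CodePackageActivationContext
    .GetConfigurationPackageObject("Config");

// Reads "KeyOne" from the "Section" section of Settings.xml.
string valueOne = config.Settings.Sections["Section"]
    .Parameters["KeyOne"].Value;
Example: Reading Settings.xml values at run time using C# (sketch)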
In short, the service type is declared in the ServiceManifest.xml but its implementation is included in one of the code packages.

There can be multiple active versions of the same service type in the cluster.
Author’s note
Service is a uniquely identifiable, versioned instance of a service type initialized with a custom set of parameters. The service defines all replica-related aspects: placement in the cluster (constraints, policies, packing), partitioning schema and replica count.
All services can be managed independently using the SF management API, i.e. using PowerShell we can modify the replica count.
Author’s note
ApplicationType & Application
ApplicationType is a uniquely identifiable, versioned definition of a collection of service types.

The type is declared in the ApplicationManifest.xml and includes the list of application services (as references to ServiceManifest.xml), security and policy information, default configuration, etc.
The complete list can be found in ApplicationManifest.xml schema and documentation.
<ApplicationManifest
ApplicationTypeName="AppType"
ApplicationTypeVersion="1.0.0">
<Parameters>
<Parameter
Name="MinReplicaSetSize"
DefaultValue="3" />
<Parameter
Name="PartitionCount"
DefaultValue="1" />
<Parameter
Name="TargetReplicaSetSize"
DefaultValue="3" />
</Parameters>
<ServiceManifestImport>
<ServiceManifestRef
ServiceManifestName="ApiServicePkg"
ServiceManifestVersion="1.0.0" />
<ConfigOverrides />
<EnvironmentOverrides CodePackageRef="Code">
<EnvironmentVariable
Name="ASPNETCORE_ENVIRONMENT"
Value="Development" />
</EnvironmentOverrides>
</ServiceManifestImport>
<DefaultServices>
<Service
Name="ApiService"
ServicePackageActivationMode="ExclusiveProcess">
<StatefulService ServiceTypeName="ApiServiceType"
TargetReplicaSetSize="[TargetReplicaSetSize]"
MinReplicaSetSize="[MinReplicaSetSize]">
<NamedPartition>
<Partition Name="Left" />
<Partition Name="Right" />
</NamedPartition>
</StatefulService>
</Service>
</DefaultServices>
</ApplicationManifest>
Example of ApplicationManifest.xml
There can be multiple active versions of the same application type in the cluster.
Author’s note
Application is a uniquely identifiable, versioned instance of an application type initialized with a custom set of parameters. An application unites services into a single management unit and makes it possible to configure security, policies, diagnostics and service templates for quick service instantiation.
All applications can be managed independently using the SF management API, i.e. we can create or delete any service of the application without affecting the others.
Author’s note
Application & Service & Nodes
The picture below illustrates the relationships between types, instances and the cluster.
Nodes
There are three nodes: /left-node, /middle-node and /right-node of the same NodeType, which basically means they have the same hardware and software characteristics.
Applications & Services
There is a single application (fabric:/app-colors) with a single service (fabric:/app-colors/srv-blue) of the ApplicationType and ServiceType correspondingly. The application was initialized with the app-params parameters and the service with the blue-params parameters.
Replicas
Replicas of the service are activated on all of the nodes with code from the same ServiceType and have access to the DataPackage and ConfigPackage packages.
Cluster
A cluster is a set of interconnected nodes used to host applications and services. SF manages nodes and orchestrates applications in a way that ensures their durability, availability and scalability.
Orchestration logic is concentrated in the Cluster Manager Service (CM). CM utilizes different layout concepts (fault domains, upgrade domains), scalability strategies (partitioning) and load metrics in order to satisfy the above requirements.
Fault Domains
Fault Domains represent a layout concept of segregating resources by their physical implementation. Fault domains naturally form a hierarchy of components that are likely to fail together. An example of this is two servers connected to the same power supply: while each server can fail for different reasons, both of them would fail in case of a power supply outage, so they are both rooted in the same fault domain: /power-supply/server-1, /power-supply/server-2.
CM uses knowledge of fault domains when placing replicas to make sure they are correctly distributed across nodes from different fault domains. This reduces the effect of unexpected outages.
Azure Hosting Services
Fault Domains are defined automatically when using Azure Hosting Services; this information is picked up from Azure.
Standalone Cluster
When configuring a standalone cluster, fault domains are explicitly defined for each node in the ClusterConfig.json.
"nodes": [
{
"nodeName": "Node1",
"faultDomain": "fd:/datacenter1/rack1/blade1/v1",
// 'Node1' is in "datacenter #1 in rack #1 on blade #1"
},
{
"nodeName": "Node2",
"faultDomain": "fd:/datacenter1/rack2/blade1/v6",
// 'Node2' is in "datacenter #1 in rack #2 on blade #1"
}
]
Example: Defining Fault Domain in ClusterConfig.json
Upgrade Domains
Upgrade Domains represent a layout concept of splitting nodes into logical groups (domains) in a way where all services on the nodes within the same domain are upgraded simultaneously while the domains themselves are processed sequentially. An upgrade domain can consist of one or more nodes.
Azure Hosting Services
Upgrade Domains are defined automatically when using Azure Hosting Services; this information is picked up from Azure.
Standalone Cluster
When configuring a standalone cluster, upgrade domains are explicitly defined for each node in the ClusterConfig.json.
"nodes": [
{
"nodeName": "Node1",
"upgradeDomain": "up:/green",
// 'Node1' belongs to 'green' upgrade domain
},
{
"nodeName": "Node2",
"upgradeDomain": "ud:/orange",
// 'Node2' belongs to 'orange' upgrade domain
}
]
Example: Defining Upgrade Domain in ClusterConfig.json
The main justification for defining upgrade domains is quite obvious: we need to make sure that not all replicas are shut down during an upgrade. There is also a second justification: we need to make sure that the remaining nodes have enough capacity to handle the additional load.
It is important to understand that while fault domains are dictated by the underlying hardware infrastructure, upgrade domains are more like 'tags'.
Author’s note
Reliable Services
Stateful Service
Stateful Service represents a service that preserves state between requests. Its replicas are divided into two types: primary and secondary. The primary replica has the right to read and manipulate the state, while secondary replicas have read-only access to it.
The state is preserved within the service without involving a third-party agent. This is done by keeping a full copy of the state within each replica (Picture A).

State consistency between replicas is achieved by replicating all state modifications done by the primary replica to all secondary replicas (Picture B).

From the developer's standpoint the state looks like a sort of hash table where you can put and retrieve state objects. All interactions with the state are done using IReliableStateManager, while the values themselves are usually stored in special Reliable Collections.
There is also a way to introduce custom reliable state objects by implementing the IReliableState interface.
Author’s note
There are three available reliable collections: Reliable Dictionary, Reliable Queue and Reliable Concurrent Queue.
Reliable Dictionary
Represents a replicated, transactional, and asynchronous collection of key/value pairs.
Reliable Queue
Represents a replicated, transactional, and asynchronous strict first-in, first-out (FIFO) queue.
Reliable Concurrent Queue
Represents a replicated, transactional, and asynchronous best effort ordering queue for high throughput.
All reliable collections are: replicated, persisted, asynchronous and transactional.
Replicated
State changes are replicated for high availability.
Persisted
Data is persisted to disk for durability against large-scale outages (for example, a datacenter power outage).
Asynchronous
APIs are asynchronous to ensure that threads are not blocked when incurring IO.
Transactional
APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily.
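To make this concrete, here is a minimal sketch of how a stateful service typically reads and writes reliable state inside a transaction. The collection name "counters" and the increment logic are illustrative assumptions.

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;

// Inside a class derived from StatefulService:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // Get (or lazily create) a replicated dictionary by name.
    var counters = await this.StateManager
        .GetOrAddAsync<IReliableDictionary<string, long>>("counters");

    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();

        using (var tx = this.StateManager.CreateTransaction())
        {
            // Reads and writes are scoped to the transaction; nothing is
            // replicated to the secondary replicas until CommitAsync succeeds.
            await counters.AddOrUpdateAsync(tx, "requests", 1, (key, value) => value + 1);
            await tx.CommitAsync();
        }

        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
    }
}
Example: Working with a Reliable Dictionary inside a transaction (sketch)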
All replicas keep all the state both in memory and persisted to disk [*]. This is done to ensure the best performance when accessing state. When the state is persisted to disk all state values are serialized [*].
Life Cycle
A Stateful Service implementation isn't just an instance of a class derived from StatefulServiceBase. When a replica is built, its implementation object has to be created, initialized and correctly registered in the SF runtime.
There are four life-cycle routines: Startup, Shutdown, Promotion and Demotion, and Abort.
Pay attention that the described life cycles slightly contradict the documentation. Please see the issue on GitHub for details (why) and the gist for a code sample.
The implementation code is available on GitHub.
Author’s note
Startup
The startup routine is executed when the replica is created by SF.
When a replica is initialized as primary the following sequence of events is performed:
- Service's implementation object is created.
- Service's OpenAsync method is called and awaited.
- Service's CreateServiceReplicaListeners method is called and awaited.
- All ServiceReplicaListeners returned from CreateServiceReplicaListeners are used to create instances of ICommunicationListener and have their OpenAsync methods called. The methods are called and awaited in sequence.
- Service's ChangeRoleAsync method is called (with newRole = Primary) and awaited.
- Service's RunAsync method is called.
When a replica is initialized as secondary the following sequence of events is performed:
- Service's implementation object is created.
- Service's OpenAsync method is called and awaited.
- Service's ChangeRoleAsync method is called (with newRole = IdleSecondary) and awaited.
- Service's CreateServiceReplicaListeners method is called and awaited.
- All ServiceReplicaListeners where ServiceReplicaListener.ListenOnSecondary == true returned from CreateServiceReplicaListeners are used to create instances of ICommunicationListener and have their OpenAsync methods called. The methods are called and awaited in sequence.
- Service's ChangeRoleAsync method is called (with newRole = ActiveSecondary) and awaited.
The differences in the startup sequences can be explained by the replica roles and the purposes of the methods.
RunAsync isn't invoked on the secondary replica
The RunAsync method is designed to allow the replica to perform some sort of background job. Because in a Stateful Service only the primary replica has write access to the reliable state, it doesn't make sense to invoke this method on secondary replicas: primary and secondary replicas share the same implementation, which would expect write access.
ChangeRoleAsync is invoked twice on the secondary replica
A new secondary replica is always created in the IdleSecondary role because it has to receive a copy of the reliable state from the primary replica before it can be of any use. When the copy of the reliable state is received, the replica continues the same startup sequence as a primary replica, with the exceptions that the final replica role is set to ActiveSecondary and the RunAsync method isn't invoked.
The important moment to understand is when the startup routines are executed – when a primary or secondary replica is initialized from scratch. A secondary replica can be initialized from scratch at any time when CM decides. In contrast, a primary replica is initialized from scratch only when the partition is being initialized – in all other cases when CM needs to move the primary replica, either an existing secondary replica is promoted to primary or a new secondary replica is initialized from scratch and then promoted to primary.
The partition initialization sequence is illustrated in the picture below:
It is important to note that the primary replica doesn't call the RunAsync method until the secondary replicas are built (i.e. have a copy of the state). The number of secondary replicas to await before running RunAsync in the primary replica is determined by MinReplicaSetSize.
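These routines are easy to observe by overriding the corresponding methods and logging each call. Below is a rough sketch; note that the overridable members of StatefulServiceBase carry the On prefix (OnOpenAsync, OnChangeRoleAsync, OnCloseAsync, OnAbort), and the console logger is an assumption made for brevity.

using System;
using System.Collections.Generic;
using System.Fabric;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class LifeCycleService : StatefulService
{
    public LifeCycleService(StatefulServiceContext context) : base(context) { }

    // Simple console logger used for illustration only.
    private void Log(string message) =>
        Console.WriteLine($"[{this.Context.ReplicaId}] {message}");

    protected override Task OnOpenAsync(ReplicaOpenMode openMode, CancellationToken cancellationToken)
    {
        Log($"OnOpenAsync ({openMode})");
        return Task.CompletedTask;
    }

    protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners()
    {
        Log("CreateServiceReplicaListeners");
        return Enumerable.Empty<ServiceReplicaListener>();
    }

    protected override Task OnChangeRoleAsync(ReplicaRole newRole, CancellationToken cancellationToken)
    {
        Log($"OnChangeRoleAsync -> {newRole}");
        return Task.CompletedTask;
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        Log("RunAsync");
        // Cancellation of this token marks demotion, shutdown or abort.
        await Task.Delay(Timeout.Infinite, cancellationToken);
    }

    protected override Task OnCloseAsync(CancellationToken cancellationToken)
    {
        Log("OnCloseAsync");
        return Task.CompletedTask;
    }

    protected override void OnAbort() => Log("OnAbort");
}
Example: Observing the Stateful Service life cycle by logging (sketch)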
Promotion and Demotion
Promotion and demotion are a natural part of the Stateful Service partition life cycle. These processes can happen for various reasons: primary replica movement (manual or automatic during resource balancing), primary replica failure, etc.
When a secondary replica is promoted the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- All ServiceReplicaListeners returned from CreateServiceReplicaListeners (during secondary replica initialization) are used to create instances of ICommunicationListener and have their OpenAsync methods called and awaited. The methods are called and awaited in sequence.
- Service's ChangeRoleAsync method is called (with newRole = Primary) and awaited.
- Service's RunAsync method is called.
When a primary replica is demoted the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- The CancellationToken passed to the RunAsync method is canceled.
- The RunAsync method is awaited.
- All ServiceReplicaListeners returned from CreateServiceReplicaListeners (during primary replica initialization) where ServiceReplicaListener.ListenOnSecondary == true are used to create instances of ICommunicationListener and have their OpenAsync methods called and awaited. The methods are called and awaited in sequence.
- Service's ChangeRoleAsync method is called (with newRole = ActiveSecondary) and awaited.
Pay attention that the CreateServiceReplicaListeners method isn't called during the promotion-demotion routines. The code reuses the same ServiceReplicaListeners returned from the first initialization.
Author's note
If the promotion-demotion cycle is graceful (i.e. it was triggered by CM) then at first the primary replica is demoted and then the secondary replica is promoted. In case of primary replica failure SF tries (if possible) to perform a graceful shutdown of the primary replica and in parallel executes the promotion sequence on a secondary replica.
Shutdown
The shutdown routine can be triggered for various reasons, including replica restart or replica failure. It describes the so-called graceful shutdown (i.e. the process isn't crashed and SF initiates the shutdown).
When a primary replica is shut down the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- The CancellationToken passed to Service's RunAsync method is canceled.
- Service's RunAsync method is awaited.
- Service's OnCloseAsync method is called and awaited.
- Service's implementation object is destroyed.
When a secondary replica is shut down the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- Service's OnCloseAsync method is called and awaited.
- Service's implementation object is destroyed.
Abort
The abort routine is executed in cases when the replica has to be terminated very quickly, e.g. there is an internal failure on the node, the replica is being killed programmatically using Remove-ServiceFabricReplica, etc.
When a primary replica is aborted the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- The CancellationToken passed to Service's RunAsync method is canceled.
- Service's RunAsync method is awaited.
- Service's ChangeRoleAsync method is called (with newRole = None) and awaited.
- Service's Abort method is executed.
- Service's implementation object is destroyed.
When a secondary replica is aborted the following sequence of events is performed:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- Service's ChangeRoleAsync method is called (with newRole = None) and awaited.
- Service's Abort method is executed.
- Service's implementation object is destroyed.
Stateless Service
Stateless Service represents a service that doesn't preserve state between requests, or uses external storage to store and restore state on each request. All replicas are equal and there is no difference which replica processes the request.
Life Cycle
A Stateless Service implementation isn't just an instance of a class derived from StatelessService. When a replica is built, its implementation object is created and has to be initialized and correctly registered in the SF runtime.
Stateless Service has three life-cycle routines: Startup, Shutdown and Abort.
Pay attention that the described life cycles slightly contradict the documentation. Please see the issue on GitHub for details (why) and the gist for a code sample.
The implementation code is also available on GitHub.
Author’s note
Startup
The startup routine is executed when the replica is created by SF.
Here is the sequence of events:
- Service's implementation object is created.
- Service's CreateServiceInstanceListeners method is called and awaited.
  On error:
  ⮚ The service implementation object is recreated.
- All ServiceInstanceListeners returned from CreateServiceInstanceListeners are used to create instances of ICommunicationListener and have their OpenAsync methods called. The methods are called and awaited in sequence.
  On error:
  ⮚ All previously opened listeners have their Abort method called.
  ⮚ The service implementation object is recreated.
- Then in parallel:
  - Service's RunAsync method is called.
    On error:
    If the exception is OperationCanceledException and RunAsync was canceled:
    ⮚ The execution of RunAsync is treated as cancelled.
    ⮚ No additional actions are performed.
    If the exception is FabricException:
    ⮚ The partition health status is temporarily changed to Warning.
    ⮚ The service implementation object is recreated.
    If the exception is OperationCanceledException and RunAsync wasn't canceled, or the exception is of an unexpected type:
    ⮚ The partition health status is temporarily changed to Warning.
    ⮚ The executing process is terminated using the Environment.FailFast method.
  - Service's OpenAsync method is called.
- Service's OpenAsync method is awaited.
It is important to understand that while RunAsync and OpenAsync are called in parallel, from the SF perspective the replica isn't initialized until the OpenAsync method has finished.
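Below is a minimal stateless skeleton illustrating the pieces above. The background loop is illustrative and no communication listeners are registered; the point is that throwing OperationCanceledException after the token is canceled is treated as a normal cancellation rather than a failure.

using System;
using System.Collections.Generic;
using System.Fabric;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class WorkerService : StatelessService
{
    public WorkerService(StatelessServiceContext context) : base(context) { }

    protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners()
    {
        // No communication listeners in this sketch; a real service would
        // return listeners wrapping its HTTP / remoting endpoints here.
        return Enumerable.Empty<ServiceInstanceListener>();
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (true)
        {
            // Throws OperationCanceledException once the token is canceled,
            // which the runtime treats as cancellation, not as a failure.
            cancellationToken.ThrowIfCancellationRequested();

            // ... do a unit of background work ...
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}
Example: Minimal Stateless Service skeleton (sketch)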
Shutdown
The shutdown routine can be triggered for various reasons, including application upgrade, service scaling or replica failure. It describes the so-called graceful shutdown (i.e. the process isn't crashed and SF initiates the shutdown).
Here is the sequence of events:
- All previously created ICommunicationListeners have their CloseAsync method called and awaited. The methods are called and awaited in sequence.
- The CancellationToken passed to Service's RunAsync method is canceled.
- Service's RunAsync method is awaited.
- Service's OnCloseAsync method is called and awaited.
- Service's implementation object is destroyed.
Abort
The abort routine is executed in cases when the replica has to be terminated very quickly, e.g. there is an internal failure on the node, the replica is being killed programmatically using Remove-ServiceFabricReplica, etc.
Here is the sequence of events:
- The CancellationToken passed to Service's RunAsync method is canceled.
- All previously created ICommunicationListeners have their Abort method executed. The methods are executed in sequence.
- Service's Abort method is executed.
- Service's implementation object is destroyed.
Because Service's RunAsync method isn't awaited, the abort routine gives RunAsync a different amount of time to finish cancellation depending on the service hosting model:
ExclusiveProcess
The time is very limited and equals the time required to execute all ICommunicationListener.Abort methods and Service's Abort method. Right after that the hosting process is terminated.
SharedProcess
In normal circumstances, when there is no critical reason to terminate the host process, there is no time limit.
From the above description it can sound like the SharedProcess hosting model is better than ExclusiveProcess because it doesn't limit RunAsync in cancellation time. This behavior is just a side effect of how SharedProcess hosting is currently implemented and should not be relied on when writing RunAsync code.
Author's note
Partitioning
Partitioning is a cluster-agnostic concept used to achieve scalability by grouping service replicas based on domain requirements. Partitioning isn't related to cluster layout, i.e. when choosing a partitioning schema it shouldn't matter how many nodes are in the cluster and what configuration they have.
There are three supported partitioning schemes: singleton, ranged and named.
Singleton
When using singleton partitioning all replicas are assigned to a single group (no grouping).


Distribution of replicas across three- and six-node clusters when using the Singleton partitioning schema.
Ranged (UniformInt64)
When using uniform partitioning the specified range of values (e.g. LowKey = 0, HighKey = 29) is divided into the desired partition count (PartitionCount = 3) in a way that each partition becomes responsible for a particular sub-range of values (Partition 1: 0-9, Partition 2: 10-19, Partition 3: 20-29).


Distribution of replicas across three- and six-node clusters when using the Ranged (UniformInt64) partitioning schema.
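For example, a caller addressing the ranged service above might map a domain value onto the 0-29 key range like this. The mapping function itself is an assumption made for illustration; SF only requires a key within the LowKey..HighKey range and routes it to the partition that owns the corresponding sub-range.

using Microsoft.ServiceFabric.Services.Client;

// Maps an arbitrary id onto the LowKey..HighKey range from the example
// (LowKey = 0, HighKey = 29). SF routes the key to the partition that
// owns the corresponding sub-range.
static ServicePartitionKey ToPartitionKey(int id)
{
    const long lowKey = 0;
    const long highKey = 29;
    long key = lowKey + (id % (highKey - lowKey + 1));
    return new ServicePartitionKey(key);
}
Example: Mapping a domain value to a ranged partition key (sketch)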
Named
When using named partitioning each partition is addressable by a unique string id.


Distribution of replicas across three- and six-node clusters when using the Named partitioning schema.
Stateful Service
Partitioning has a special meaning for Stateful Service – each partition has its own isolated instance of the reliable state and therefore its own primary replica and its own set of secondary replicas.
When configuring the replica count Stateful Service uses two values – MinReplicaSetSize and TargetReplicaSetSize.
MinReplicaSetSize
MinReplicaSetSize is the minimum number of replicas required to preserve state and continue partition progress [*]. This value is used to calculate the minimum quorum size [*].
In case of quorum loss all future writes are disallowed.
TargetReplicaSetSize
TargetReplicaSetSize is the desired number of replicas in the cluster. Service Fabric will try to keep the replica count as close as possible to this value.
When the reliable state is updated by the primary replica the change request MUST be accepted by a quorum of replicas, otherwise the change request is rejected and the operation fails.
Author’s note
Both MinReplicaSetSize and TargetReplicaSetSize can be changed at runtime with C# …
var updateDescription =
new StatefulServiceUpdateDescription();
updateDescription.MinReplicaSetSize = 5;
updateDescription.TargetReplicaSetSize = 7;
await fabricClient.ServiceManager
.UpdateServiceAsync(
new Uri("fabric:/app/service"),
updateDescription);
Example: Updating MinReplicaSetSize and TargetReplicaSetSize at runtime using C#
… or PowerShell
Update-ServiceFabricService `
-Stateful fabric:/app/service `
-MinReplicaSetSize 5 `
-TargetReplicaSetSize 7
Example: Updating MinReplicaSetSize and TargetReplicaSetSize at runtime using PowerShell
Stateless Service
Partitioning has no special meaning for Stateless Service; however, it can still be used to achieve scalability by domain decomposition, i.e. each partition can have its own database instance and each replica will use the partition information to determine the appropriate database to use.
When configuring the replica count Stateless Service uses a single value – InstanceCount. The InstanceCount represents the absolute number of replicas required; SF also allows specifying InstanceCount = -1 in order to produce one replica for each node.
The InstanceCount can be changed at runtime with C# …
var updateDescription =
new StatelessServiceUpdateDescription();
updateDescription.InstanceCount = 5;
await fabricClient.ServiceManager
.UpdateServiceAsync(
new Uri("fabric:/app/service"),
updateDescription);
Example: Updating InstanceCount at runtime using C#
… or PowerShell
Update-ServiceFabricService `
-Stateless fabric:/app/service `
-InstanceCount 5
Example: Updating InstanceCount at runtime using PowerShell
Communication
Cross-service communication is a very important and complex topic in a rapidly changing environment. SF has rich support for implementing cross-service communication using custom, HTTP or Remoting protocols. Because replicas can be created, deleted or moved between nodes based on decisions made by CM, there is no way to statically preconfigure services with any kind of IP address or port to contact.
Replica address resolution is done using one of the SF infrastructure services – the Naming Service. A high-level overview of cross-service communication is illustrated in Picture A.

Contacting the Naming Service doesn't end up with the address of one particular replica. Instead, the Naming Service returns a special Address structure that contains all the endpoints (declared in ServiceManifest.xml) registered in the partition and fills each endpoint address value with the physical address of one of the replicas. The replica whose address is returned is chosen based on the internal load balancing algorithms of the Naming Service and the passed TargetReplicaSelector.
Default (PrimaryReplica, RandomInstance)
If the service partition is stateful then communication is established with the primary replica (PrimaryReplica), otherwise the RandomInstance replica selector is used.
RandomReplica
This selector indicates that communication can be established with any replica chosen at random, i.e. primary or secondary (Stateful Service).
RandomSecondaryReplica
This selector indicates that communication can be established with any secondary replica chosen at random (Stateful Service).
Picture B presents a detailed diagram of communication between two services.

The Address structure contains a value for each Endpoint declared in the ServiceManifest.xml. Depending on the type of the endpoint the value consists of IP and port plus custom metadata.
There are a few built-in ways to implement the communication:
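One of them is ServicePartitionResolver, which queries the Naming Service directly. Here is a minimal sketch of resolving an endpoint address; the service name and the named partition key ("Left") are taken from the earlier manifest example and are illustrative.

using System;
using System.Fabric;
using System.Threading;
using Microsoft.ServiceFabric.Services.Client;

var resolver = ServicePartitionResolver.GetDefault();

// Resolve the current Address structure for the "Left" named partition.
ResolvedServicePartition partition = await resolver.ResolveAsync(
    new Uri("fabric:/app/ApiService"),
    new ServicePartitionKey("Left"),
    CancellationToken.None);

// The Address structure contains one entry per Endpoint declared in
// ServiceManifest.xml; GetEndpoint() picks one of them.
string address = partition.GetEndpoint().Address;
Example: Resolving a service endpoint through the Naming Service (sketch)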
Replica Placement
Strategies
Fault and upgrade domain information is dictated by the hardware infrastructure and node capacity, and it significantly affects the way replicas are distributed across the cluster. While there are multiple factors that affect how replicas are distributed, SF supports three major strategies: Maximum Difference, Quorum Safe and Adaptive.
Maximum Difference
The main Maximum Difference strategy constraint can be formulated as follows:
For a given service partition there should never be a difference greater than one in the number of service objects (stateless service instances or stateful service replicas) between two domains.
docs.microsoft.com
Here is a quick example:
Imagine a six-node cluster with nodes distributed across five fault domains and five upgrade domains (Picture A).

When using the Maximum Difference strategy a service instance with five replicas in total would have them distributed as shown in Picture B.

Here CM distributed replicas between fault domains (to ensure maximum availability). The trick is that if we tried to distribute the replicas differently, we would always end up with two fault domains whose difference in assigned replica count is greater than or equal to two, which violates the constraint of the Maximum Difference strategy.
This example demonstrates the rule of thumb when planning your cluster: try to keep the fault domain hierarchy tree balanced. An unbalanced fault domain tree makes availability management harder and can lead to worse usage of the available resources.
Author’s note
Quorum Safe
The main Quorum Safe strategy constraint can be formulated as follows:
For a given service partition, replica distribution across domains should ensure that the partition does not suffer a quorum loss.
docs.microsoft.com
Here is a quick example:
Imagine a six-node cluster with nodes distributed across five fault domains and five upgrade domains (Picture A).

When using the Quorum Safe strategy a service instance with five replicas in total would have them distributed as shown in Picture B.

In contrast to the Maximum Difference strategy this is possible because the main constraint of Quorum Safe is satisfied, i.e. in case of an FD0 outage the majority of replicas – the quorum – remains safe.
The Quorum Safe strategy is more flexible than Maximum Difference and allows more variants of replica distribution across the cluster nodes by sacrificing some fault tolerance characteristics.
The term "quorum" exists only for Stateful Services; when the Quorum Safe strategy is used for Stateless Services it basically allows a more relaxed way of distributing replicas.
Author’s note
Adaptive
The Maximum Difference and Quorum Safe placement strategies have their advantages and disadvantages, which is why there is a third strategy – the Adaptive strategy.
The main Adaptive strategy constraint can be formulated as follows:
If the TargetReplicaSetSize is evenly divisible by the number of Fault Domains and the number of Upgrade Domains and the number of nodes is less than or equal to the (number of Fault Domains) x (the number of Upgrade Domains), the Cluster Resource Manager should utilize the “quorum based” logic for that service.
docs.microsoft.com
Generally speaking, the Adaptive strategy results in the usage of either the Maximum Difference or the Quorum Safe strategy based on the service instance configuration. The Maximum Difference strategy is used by default, but when the conditions above are met the Quorum Safe (quorum-based) logic is used instead.
Starting from SF v6.2 [*] the Adaptive strategy is the default strategy used by CM.
docs.microsoft.com
Here is a quick example:
Imagine an eight-node cluster with nodes distributed across five fault domains and five upgrade domains (Picture A).

When using the Adaptive strategy a service instance with TargetReplicaSetSize = 5 would have its replicas distributed as shown in Picture B.

The Quorum Safe approach is utilized here because the service instance meets all the conditions:
Condition #1
TargetReplicaSetSize is evenly divisible by the number of fault domains and the number of upgrade domains: 5 mod 5 = 0.
Condition #2
The number of nodes is less than or equal to the number of fault domains multiplied by the number of upgrade domains: 8 ≤ 5 × 5 = 25.
If we reduce TargetReplicaSetSize to 4 then CM will start to utilize the Maximum Difference strategy, as shown in Picture C.

The Maximum Difference approach is utilized here because the service instance doesn't meet all the conditions:
Condition #1
TargetReplicaSetSize isn't evenly divisible by the number of fault domains and the number of upgrade domains: 4 mod 5 ≠ 0.
Node Placement Properties
Placement Properties are a way of describing a Node's capabilities in terms of abstract properties. These properties can be anything – they can reflect the business purpose of the Node or its hardware or software specification. This information is required in order to satisfy the need of multi-tiered applications to host different services on different hardware, software or environments.
By default all Nodes have two predefined placement properties: NodeName and NodeType. These placement properties cannot be edited or changed. All other placement properties are defined as part of the NodeType definition and can be updated by updating the cluster configuration.
"nodeTypes": [
{
"name": "HostBackendPrimary",
"placementProperties": {
"IsBackend": true
}
}
]
Example of placement properties definition in ClusterConfig.json
Service Placement Constraints
Placement Constraints are the way of describing a service instance's requirements for node hardware, software or business purpose in terms of abstract properties. The combination of Placement Properties and Placement Constraints makes it possible to ensure that replicas of the service instance are hosted in the desired environment.
Placement Constraints are represented via a conditional expression that supports various types (string, int, enumeration) and operations like equality and comparison.
Service instance placement constraints can be modified at runtime. This could possibly result in massive replica movement around the cluster.
var updateDescription =
new StatelessServiceUpdateDescription();
updateDescription
.PlacementConstraints = "IsBackend == True";
await fabricClient.ServiceManager
.UpdateServiceAsync(
new Uri("fabric:/app/service"),
updateDescription);
Example: Updating service instance placement constraints
There is no naming convention for placement properties and placement constraints.
For example the following definition of NodeType is absolutely correct:
"nodeTypes": [
{
"name": "host-backend",
"placementProperties": {
"is-backend": true
}
}
]
The above definition of the placement property is-backend is correct and passes the validation done by SF, but if we specify it as a placement constraint …
var updateDescription =
new StatelessServiceUpdateDescription();
updateDescription
.PlacementConstraints = "is-backend == True";
await fabricClient.ServiceManager
.UpdateServiceAsync(
new Uri("fabric:/app/service"),
updateDescription);
… no service replicas will be created. There is an issue on GitHub with a detailed investigation regarding this point.
Service Placement Policy
Restricting replicas to certain nodes is managed using placement properties and placement constraints, but there are cases when this isn't enough. This can be related to the physical cluster layout, i.e. the nodes in a cluster span geographic / geopolitical boundaries, or to performance considerations, i.e. some workloads should always be collocated to reduce latency.
SF supports four placement policies driven by the knowledge of cluster layout: Invalid Domains, Required Domains, Preferred Primary Domain and Replica Packing.
Invalid Domain
The Invalid Domain placement policy is used to specify a fault domain to ignore during replica placement. Replicas won't be placed on nodes from the specified domain even if there are no other nodes available.

There can be multiple invalid domain policies set on the same service in order to restrict replica placement in more than one fault domain – this can be configured at runtime.
Summary
✓ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✓ Can be configured multiple times?
✓ Can be updated at runtime?
Required Domain
The Required Domain placement policy is used to specify a fault domain to consider during replica placement. Replicas will be placed only on nodes from the specified domain.

Summary
✓ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✓ Can be configured multiple times?
✓ Can be updated at runtime?
Preferred Domain
The Preferred Domain policy is used to specify the fault domain to use when placing the stateful service primary replica. The primary replica ends up in this domain when everything is healthy. When the domain isn't healthy, the Cluster Resource Manager moves the primary replica to another location and returns it back to the preferred domain as soon as possible [*].

There can be only one preferred domain policy set on the same service at a time. This policy can be configured at runtime.
The policy makes sense only for Stateful Service.
Summary
✗ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✗ Can be configured multiple times?
✓ Can be updated at runtime?
Packing
Normally replicas are distributed between fault and upgrade domains when the cluster is healthy. However, there are situations when multiple replicas of the same partition can be placed into a single domain. This process is called replica packing.
Imagine a healthy cluster with four fault domains and three replicas distributed across them (Picture A).

Now something has happened and two of the four fault domains went down. In Picture B you can see that when replica packing is enabled, CM will create a replica in one of the remaining fault domains.

However, if replica packing is disabled, CM won't build any new replicas until nodes from other fault domains become available (Picture C).

The need for replica packing is treated as a temporary situation, which is why it is enabled by default. This policy can be configured at runtime.
Summary
✓ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✗ Can be configured multiple times?
✓ Can be updated at runtime?
Service Affinity
All application services are expected to be independent of each other, but sometimes there are situations where it is necessary to ensure that all replicas of one service are collocated with replicas of another service [*].
Service Affinity is a directional relationship between two services where replicas of a dependent service / child service (A) are collocated with replicas of a principal service / parent service (B). The collocation process has several limitations:
- The collocation isn't done immediately. There can be notable periods when replicas won't be collocated because of cluster capacity limitations, replica failures or outages.
- The parent service can’t define partitioning schema.
In addition to the above limitations, service affinity doesn't support affinity chaining.
Instead of chaining, this kind of relationship should be expressed in the form of a star pointing to the topmost service, which has the same effect:
Besides the default affinity type – None – there are two affinity types: NonAlignedAffinity and AlignedAffinity.
NonAlignedAffinity
NonAlignedAffinity instructs CM to place replicas of the child service alongside replicas of the parent service. When using NonAlignedAffinity with a stateful service, CM doesn't differentiate between primary and secondary replicas.

This affinity scheme can be configured at runtime.
Summary
✓ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✓ Can be updated at runtime?
AlignedAffinity
AlignedAffinity instructs CM to place secondary replicas of the child service alongside secondary replicas of the parent service and the primary replica of the child service alongside the primary replica of the parent service.

The policy makes sense only for Stateful Service and can be configured at runtime.
Summary
✗ Can be used with Stateless Service?
✓ Can be used with Stateful Service?
✓ Can be updated at runtime?
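For completeness, here is a hedged sketch of how affinity could be configured from code when creating the child service, assuming the System.Fabric.Description types and illustrative service names.

using System;
using System.Fabric;
using System.Fabric.Description;

var fabricClient = new FabricClient();

// Create the child service with an affinity correlation to the parent service.
var description = new StatelessServiceDescription
{
    ApplicationName = new Uri("fabric:/app"),
    ServiceName = new Uri("fabric:/app/child"),
    ServiceTypeName = "ChildServiceType",
    InstanceCount = 3,
    PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
};

description.Correlations.Add(new ServiceCorrelationDescription
{
    ServiceName = new Uri("fabric:/app/parent"),
    Scheme = ServiceCorrelationScheme.NonAlignedAffinity
});

await fabricClient.ServiceManager.CreateServiceAsync(description);
Example: Creating a child service with NonAlignedAffinity using C# (sketch)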
Application Upgrade
Application Upgrade is the process of upgrading an existing Application to a new version of its ApplicationType. During the process SF checks the ApplicationType version and, if it has changed, compares the versions of all ServiceTypes; for the types that have changed it compares the versions of the Code, Config and Data packages.
NOTE
While the process is called Application Upgrade, it is important to understand that for SF there is no difference between upgrading an application from 1.0 to 2.0 and from 2.0 to 1.0.
The upgrade process sequentially iterates over all upgrade domains and simultaneously upgrades all applications within the upgrade domain. The application upgrade is a multi-stage process, so it is important to understand that at some point in time some of the services will already be upgraded to the new version while the rest of them will still run the previous version. This leads to a situation when replicas of the same service are running two different versions of code.
Because any of the Application components can be upgraded, an Application Upgrade doesn't always mean modification of service code. That is why when the upgrade process affects only Config or Data packages the replicas aren't restarted. Instead, the replica is notified using Code Package events.
NOTE
When replica restart is necessary, the application upgrade configuration allows specifying the Force Restart parameter.
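A hedged sketch of starting an upgrade with forced restart from C# (the application name and version are assumptions) could look like this:

using System;
using System.Fabric;
using System.Fabric.Description;

var fabricClient = new FabricClient();

var upgradeDescription = new ApplicationUpgradeDescription
{
    ApplicationName = new Uri("fabric:/app"),
    TargetApplicationTypeVersion = "2.0.0",
    UpgradePolicyDescription = new RollingUpgradePolicyDescription
    {
        UpgradeMode = RollingUpgradeMode.UnmonitoredAuto,
        // Forces replica restart even when only Config or Data packages changed.
        ForceRestart = true
    }
};

await fabricClient.ApplicationManager.UpgradeApplicationAsync(upgradeDescription);
Example: Starting an application upgrade with ForceRestart using C# (sketch)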
History
- 2023/02/21 – Introduced history section to track post updates.
- 2023/02/21 – Updated information in NodeType & Node section about mapping of node type to VMSS (Virtual Machine Scale Set) based on the tip in Tweet by Alexander Batishchev and the documentation.