Introduction

What is Fleet?

Fleet is a continuous delivery solution. It detects changes, renders the source into a deployable artifact, and deploys it to any matched clusters.

Fleet brings GitOps to Kubernetes clusters: the configuration of the deployed resources is maintained as the source of truth in a git repository, and changes are implemented by committing to that repository. Pull requests provide a gate and an audit trail for review and release.

Fleet is relatively new; it was introduced as a preview in Rancher v2.5.

[Diagram: GitOps workflow]

What can Fleet do?

Fleet’s primary function is to manage deployments from a git repository, turning its contents into Helm charts and providing control over how these are deployed to clusters.

Clusters are selected by labels/selectors, either individually or via cluster groups (which are themselves defined by label/selector).

With Rancher integration, centralized RBAC is provided, along with a management UI and visibility of the fleet-related workloads and objects.

Why use Fleet?

Fleet comes built-in with Rancher; there is no need to deploy anything. Downstream clusters are automatically registered.

It scales: Fleet supports a very high number of clusters.

Deployments are relatively self-documented. The currently deployed state is available in code with a history of changes. Deployments become consistent and repeatable.

Supports:

  • Kubernetes manifests
  • helm charts (inline, or an external helm repo)
  • kustomize

A combination of these can be used within the same repo.

How does Fleet work?

Fleet is essentially made up of standard Kubernetes primitives - controllers and CRDs.

The CRDs provide the spec to configure Fleet with objects, and the controllers provide the logic to perform the desired actions.

The git repo is monitored for changes by polling (by default); when a change occurs, a Bundle is created, followed by a BundleDeployment for each targeted cluster, which that cluster's fleet-agent pod retrieves and deploys.

As Helm is used as the deployment mechanism, standard Helm behavior should be expected when deployments are made to clusters.

Clusters register to the fleet manager cluster. This can be automated or manual.

Basics

Components

  • Fleet manager - the cluster that orchestrates the deployments, commonly the Rancher management cluster

  • Fleet controller - k8s controllers that work with the fleet CRD objects in the fleet manager

  • Fleet agent - a single agent runs on each downstream and local cluster and communicates with the fleet manager

Note: connectivity is always fleet-agent -> fleet manager, via the kube-apiserver on the fleet manager cluster. All management of fleet objects is done via the kube-apiserver on the fleet manager, including downstream fleet-agents connecting; there is no custom ingress or API.

One exception: with manager-initiated registration, connectivity can originate from the fleet manager.

Configuration

The two main areas of focus are the GitRepo object and the fleet.yaml file; Bundles and BundleDeployments are created by Fleet from these:

GitRepo - defines the target clusters, the repo and paths to watch, and what to do with a fleet.yaml if it’s there

fleet.yaml - describes what to do with the contents (helm/manifests etc.) of the directory it resides in, e.g. customize the namespace, Helm values, and overlays for manifests. It is a plain file rather than a CRD, which allows the contents to be used easily both inside and outside of Fleet.

Bundle - a unit (manifests, helm, kustomize, etc.) that is deployed to a cluster; multiple Bundles can be created from a single GitRepo

BundleDeployment - an instance of a Bundle for a specific cluster, containing the configuration for that cluster's fleet-agent to retrieve and deploy

Clusters and Groups

All clusters are automatically added when using Rancher. Manual registration or some automation is required when using fleet on its own

Users define groups and select clusters based on labels.

Cluster labels can be added during cluster creation, or added/updated later, to manage group membership.
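For illustration, a minimal ClusterGroup sketch that selects clusters by label (the group name, namespace, and label are examples, not defaults):

apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: production            # illustrative group name
  namespace: fleet-default    # workspace containing the downstream clusters
spec:
  selector:
    matchLabels:
      env: prod               # any cluster carrying this label becomes a group member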

Architecture

Multiple CRDs configure repositories, objects, and clusters that make up a fleet configuration.

Workspaces (the equivalent of k8s namespaces) are used to isolate areas of concern, and the following are always created:

  • fleet-default - default for downstream clusters
  • fleet-local - local cluster (typically Rancher management cluster)

The fleet clusters CRD (clusters.fleet.cattle.io) works with both cluster CRDs in Rancher v2.6 and mirrors configuration:

  • cluster.provisioning.cattle.io (rke2/k3s)
  • cluster.management.cattle.io (rkev1)

Basic Architecture

[Diagram: Fleet basic architecture]

Workflow

[Diagram: Fleet workflow]

Working with Fleet

Namespaces

The primary objects in the fleet manager cluster are namespaced, allowing logical grouping and security to be applied.

For example: GitRepos, Bundles, Clusters, ClusterGroups

Ideally, separate teams or clusters do not share the same GitRepo. It is generally recommended that these be maintained in separate namespaces as a safety measure, to avoid any unintended label or group selection.

Some built-in namespaces are created in the fleet manager:

  • cattle-fleet-local-system - special use for fleet-agent, also to bootstrap the fleet manager configuration on the fleet manager cluster
  • cattle-fleet-system - fleet-controller and fleet-agent are deployed here
  • cattle-fleet-clusters-system - holds secrets for the cluster registration process

A namespace is also created for each cluster that is registered, in the form cluster-${namespace}-${cluster}-${random}. All BundleDeployments for that cluster are placed in this namespace, and the downstream cluster is given access to watch and update its BundleDeployments in that namespace only.

GitRepos

Perhaps the most used fleet object.

Used to register a git repository; private repositories are supported by creating a secret containing an SSH keypair.

No particular structure is needed within the git repo itself. However, it is ideal to avoid committing large objects to a repository; at present, the contents of a repository must gzip to less than 1MB.
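As a minimal sketch (the repo URL, branch, and paths are illustrative), a GitRepo registering a repository could look like this; for a private repository, clientSecretName would reference the secret holding the SSH keypair:

apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: example-apps            # illustrative name
  namespace: fleet-default      # workspace for downstream clusters
spec:
  repo: https://github.com/example/fleet-examples    # illustrative repository
  branch: main
  paths:
    - apps/guestbook            # directory within the repo for fleet to process
  # clientSecretName: example-ssh-key   # uncomment for a private repo (secret containing the SSH keypair)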

GitRepo structure - https://fleet.rancher.io/gitrepo-content

Mapping GitRepo to clusters - https://fleet.rancher.io/gitrepo-targets#defining-targets

fleet.yaml (or fleet.yml)

An optional but important file used to customize the resources that are deployed. Multiple fleet.yaml files can co-exist in a GitRepo under separate folder structures.

Examples of customisation:

  • targetCustomizations - different values for targets, eg: set the replica count different between test and prod environments
  • defaultNamespace - when not specified, default to using this namespace for objects
  • rolloutStrategy - define a batch size for the rollout of the deployment to clusters (see the sketch after this list)
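A minimal fleet.yaml sketch illustrating the customisations above (the namespace, labels, and values are examples, not defaults):

defaultNamespace: guestbook          # used when the manifests do not specify a namespace
helm:
  values:
    replicas: 1                      # default value applied to all clusters
targetCustomizations:
  - name: prod                       # override values for clusters labelled env=prod
    clusterSelector:
      matchLabels:
        env: prod
    helm:
      values:
        replicas: 3
rolloutStrategy:
  maxUnavailable: 25%                # batch size for rolling the deployment out to clusters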

fleet.yaml example - https://fleet.rancher.io/ref-fleet-yaml

fleet.yaml - https://fleet.rancher.io/gitrepo-content#fleetyaml

Fleet webhook

By default, Fleet polls each GitRepo for changes (every 15 seconds). A webhook can be used instead; see the link below for the list of supported git solutions.

Fleet webhook - https://fleet.rancher.io/webhook

States

Clusters and Bundles have states representing the different phases.

Fleet states - https://fleet.rancher.io/cluster-bundles-state

Common Configurations

Initial configuration

Fleet creates a default cluster group in the fleet-local workspace; this is just a starting point.

The default cluster group is configured to match clusters with the label name: local. Adding this label to the local cluster will add it to the cluster group.
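As a sketch (assuming the local cluster object is named local), the label can be added to the Cluster object in the fleet-local workspace:

apiVersion: fleet.cattle.io/v1alpha1
kind: Cluster
metadata:
  name: local              # name of the local cluster object (assumed)
  namespace: fleet-local
  labels:
    name: local            # label matched by the default cluster group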

Customising deployments

Often each cluster is slightly different in some way: environment, size, use case, location, active/passive, etc.

Fleet exposes different levels and approaches to customize each cluster; some have been covered. For completeness, here’s a list of ways to customize what fleet does:

GitRepo object fields

  • paths - configures the particular path(s) in the repo for fleet to use in this GitRepo
  • targets - defines which clusters in the workspace/namespace will be selected for deployment

Multiple targets can be specified; as in fleet.yaml, these can be all clusters, a specific cluster, cluster groups, or a cluster label selector (see the sketch below).
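A sketch of the targets field on a GitRepo (the target names, group name, and label are illustrative):

spec:
  targets:
    - name: prod-group
      clusterGroup: production        # target a cluster group by name
    - name: edge
      clusterSelector:                # or target clusters directly by label
        matchLabels:
          location: edge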

fleet.yaml file fields

  • helm - version, chart, repo, etc. for an external helm repo. Charts are configurable with values, valuesFiles, valuesFrom (secrets/configmaps) - see the sketch after this list
  • kustomize - dir for a kustomization.yaml file
  • targetCustomizations - can be used to overlay raw manifest files for all clusters, a specific cluster, cluster groups, or using a cluster label selector
  • dependsOn - depend on another bundle
  • target in a GitRepo can be thought of as a constraint - what clusters will be selected.
  • targetCustomization in fleet.yaml can be thought of as a strategy - how the selected clusters will be configured (if needed)
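A sketch of these fleet.yaml fields (the chart, repo URL, directory, and bundle name are illustrative):

helm:
  repo: https://charts.example.com     # external helm repo
  chart: guestbook
  version: 1.2.3
  valuesFiles:
    - values-common.yaml               # values file stored alongside fleet.yaml
kustomize:
  dir: ./kustomize/base                # directory containing a kustomization.yaml
dependsOn:
  - name: example-infra                # wait for another bundle before deploying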

Git repo structure

  • alphanumeric - fleet processes the directory structure in name order
  • overlays - a directory structure providing replacement or patch files to alter raw manifests, used in tandem with fleet.yaml to define which overlay directory is used (see the layout sketch below)
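An illustrative layout (directory names are examples only), where manifests/ holds the raw manifests and overlays/ holds the per-environment patches referenced from fleet.yaml:

guestbook/
  fleet.yaml
  manifests/
    deployment.yaml
    service.yaml
  overlays/
    prod/
      deployment.yaml    # patch/replacement applied when the prod overlay is selected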

Single Cluster & Multi Cluster

Single Cluster

The cluster will run both the fleet manager and the fleet-agent. The cluster will communicate with a git repository to retrieve and deploy resources to the local cluster.

This is the most straightforward setup and valuable for dev/test and small-scale setups.

[Diagram: single cluster setup]

Multi Cluster

A very similar configuration to how Rancher works, with a Rancher management (local) cluster and downstream clusters.

The diagram below shows the fleet manager (local) cluster and downstream clusters with fleet-agent pods running, communicating back to the kube-apiserver of the local cluster.

A multi-cluster configuration is an evolution of the single cluster setup, with additional clusters registered.

[Diagram: multi-cluster setup]

Differences between v2.5 / v2.6

v2.5

Fleet was first introduced as an optional component. It is available under Global > Tools > Continuous Delivery, which launches Cluster Explorer to access the Fleet UI.

v2.6

Fleet is available in the Dashboard under Continuous Delivery, providing the same UI experience as above. Fleet is now a required component.

All other features should be relatively equal; fleet versions bundled with v2.5 / v2.6 can differ.

Fleet moved namespaces in v2.6.1 to avoid conflicts when Rancher manages a cluster that is itself running Rancher.

[Diagram: Fleet namespaces]

Common Issues

Redeploying an application using Fleet

Suppose an application like Longhorn is deployed via the Rancher catalog or directly using Helm, and is then redeployed using Fleet. The Helm ownership labels Fleet expects will be missing, and the Fleet deployment will fail on that cluster.

Example error:

rendered manifests contain a resource that already exists. Unable to continue with install: ClusterIssuer "letsencrypt-prod" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-auth-standard-cert-manager-auth"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"

Workaround: A) Uninstall the current application and let Fleet redeploy it. B) Manually add/edit the labels and annotations from the error message (see the sketch below). Note: You might need to script this out as there might be a lot of objects to update.
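As a sketch, the Helm ownership metadata referenced in the error looks like the following on each existing object; the release name and namespace placeholders must match the Fleet/Helm release:

metadata:
  labels:
    app.kubernetes.io/managed-by: Helm
  annotations:
    meta.helm.sh/release-name: <release-name>             # placeholder: the Fleet/Helm release name
    meta.helm.sh/release-namespace: <release-namespace>   # placeholder: the release namespace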

Self modifying objects

This is a known issue, GH-30696. When Fleet deploys an object on a downstream cluster, Fleet wants to own that object: if anything changes on it, Fleet will detect the difference and put it back. If the application has a Kubernetes operator that modifies the object after it is deployed, Fleet will revert the change, the operator will adjust the object back, and the two will keep fighting. Customers have reported that opa-gatekeeper and kube-prometheus-stack have this issue. Longhorn used to have a problem, but that was resolved under GH-189.

Workaround: None - This is working as designed. You should open a GH issue to see if Fleet can ignore the changes.

Fighting repos

Suppose you have the same object, such as a namespace, defined in multiple Git repos. The owner label that Fleet adds will differ between them, and you will run into a race condition with both deployments fighting to set the label.

Workaround: None - This is working as designed, and objects should only be defined in a single Git Repo.

Changing subpaths of a git repo breaks the owner labels

There is a known issue, GH-502: if you set up a repo with a subdirectory without a fleet.yaml and then later change the subdirectory, the owner labels will not match and the repo will become stuck.

Workaround: A) Specify the releaseName in fleet.yaml (see the sketch below) B) Create a fleet.yaml in each empty directory C) Avoid having any files in the git repo that are not required
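A sketch of option A, pinning the Helm release name in fleet.yaml (the name is illustrative):

helm:
  releaseName: guestbook     # fixed release name, independent of the repo path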

Repo file path issues

Fleet has several open issues related to having certain characters in the file path to the fleet config file. Note: There is an open enhancement GH-599 to address this.

Workaround: None - You must remove/change the path to meet these limitations.

Considerations and Limitations

Considerations

Separate areas of concern

As a best practice, separate clusters and GitRepos into their own workspaces (namespaces) to reduce blast radius. For example, if a cluster is accidentally matched/unmatched as a target, the related deployment(s) would also be installed/deleted.

Performance

By default, Fleet polls git repositories (pull-based, every 15 seconds); each GitRepo configured starts an independent poll. This can be an issue at scale.

Some approaches to mitigate this are:

  • Increasing the polling interval (e.g.: pollingInterval: 2m - see the sketch after this list)
  • Reuse GitRepo that might be split unnecessarily, e.g., a single GitRepo for all paths of the same repo
  • Moving to a webhook model (push-based)
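A sketch of the first two mitigations on a GitRepo (the repo URL and paths are illustrative):

spec:
  repo: https://github.com/example/fleet-examples
  pollingInterval: 2m          # poll every 2 minutes instead of the 15 second default
  paths:                       # one GitRepo covering several paths of the same repo
    - apps/frontend
    - apps/backend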

Resources

In a large environment where Fleet is used with Rancher, the nodes in the Rancher management cluster may need more CPU and memory. Additionally, nodes may need to be added to separate out roles (etcd, controlplane) to perform better at scale.

Limitations

Git repositories are the only source control solution supported.

Like a Rancher cluster-agent, the fleet-agent needs connectivity back to the fleet manager cluster. With this in mind, a load balancer and dedicated hostname are recommended for an HA solution.

The contents of a GitRepo must gzip to less than 1MB due to being stored as a Kubernetes object. Avoid large file sizes in repositories where possible.

Troubleshooting

fleet-controller logs

Check the fleet-controller pod logs (cattle-fleet-system namespace) on the fleet manager cluster (Rancher) for unexpected patterns:

kubectl logs -n cattle-fleet-system -l app=fleet-controller

fleet-agent logs

Check the fleet-agent pod logs on the cluster experiencing an issue; look for connectivity-related issues and deployment activity.

local cluster:

kubectl logs -n cattle-fleet-local-system -l app=fleet-agent

downstream cluster:

kubectl logs -n cattle-fleet-system -l app=fleet-agent

Object status

Since a GitRepo is an abstraction over one or more Bundles, check the status of the Bundle object(s) (under Advanced in the UI).

For connectivity issues, check the local and downstream fleet-agent pod logs.

Debug logs

As of Rancher v2.6.3 (fleet v0.3.8), debug logging is available. To enable it, go to Dashboard -> Local cluster -> Apps & Marketplace -> Installed Apps and upgrade the fleet chart with the value debug=true.

Optionally debugLevel=5 can be set too.
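A sketch of those chart values (value names as given above):

debug: true        # enables debug logging for the fleet components
debugLevel: 5      # optional, increases verbosity further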

Further reading

Fleet troubleshooting - https://fleet.rancher.io/troubleshooting

Bundle and Cluster states - https://fleet.rancher.io/cluster-bundles-state