Introduction
What is Fleet?
Fleet is a continuous delivery solution. It detects changes, renders the source into a deployable artifact, and deploys to any matched clusters.
Fleet brings GitOps to Kubernetes clusters: the configuration of the deployed resources is maintained as the source of truth in a git repository, and changes are implemented by committing to the repository. Pull requests provide a gate and an audit trail for review and release.
Fleet is relatively new; it was first available as a preview in Rancher v2.5.
What can Fleet do?
Fleet’s primary function is to manage deployments from a git repository, turning them into helm charts and providing control over how these are deployed to clusters.
Clusters are organized individually or in groups, both selected by labels/selectors.
With Rancher integration, centralized RBAC is provided, along with a management UI and visibility of the fleet-related workloads and objects.
Why
Fleet comes built-in with Rancher; there is no need to deploy anything. Downstream clusters are automatically registered.
Fleet scales well, supporting a very high number of clusters.
Deployments are relatively self-documented. The currently deployed state is available in code with a history of changes. Deployments become consistent and repeatable.
Supports:
- Kubernetes manifests
- helm charts (inline, or an external helm repo)
- kustomize
A combination of these can be used within the same repo.
How does Fleet work?
Fleet is essentially made up of standard Kubernetes primitives - controllers and CRDs
These provide the spec to configure fleet with objects and the logic to perform the desired actions.
The git repo is monitored for changes by polling (by default); when a change occurs, a Bundle is created, followed by a BundleDeployment for the fleet-agent pod in each selected cluster to retrieve and deploy.
As helm is used as the deployment mechanism, expect helm's usual behavior when deployments are made to clusters.
Clusters register to the fleet manager cluster. This can be automated or manual.
- Lifecycle of a fleet bundle - https://fleet.rancher.io/ref-bundle-stages
Basics
Components
- Fleet manager - the cluster that orchestrates the deployments, commonly the Rancher management cluster
- Fleet controller - k8s controllers that work with the fleet CRD objects in the fleet manager
- Fleet agent - one agent runs on each downstream and local cluster and communicates with the fleet manager
Note: connectivity is always fleet-agent -> fleet manager via the kube-apiserver on the fleet manager cluster. All management of fleet objects is done via the kube-apiserver on the fleet manager, including connections from downstream fleet-agents; there is no custom ingress or API.
One exception: with manager-initiated registration, connectivity can originate from the fleet manager.
Configuration
The two main areas of focus are:
GitRepo - defines and describes the target clusters, a repo and paths, and what to do with a fleet.yaml if present
fleet.yaml - describes what to do with the contents (helm/manifests etc.) of the directory in the repo where it resides. The fleet.yaml allows easy usage of the contents both inside and outside of fleet; it's not a CRD. E.g., customize the namespace, values for helm, and overlays for manifests.
Bundle - a unit (manifests, helm, kustomize, etc.) that is deployed to a cluster; multiple bundles can be created from a single GitRepo
BundleDeployment - an instance of a bundle that is deployed to a cluster, containing its configuration for the fleet-agent on a cluster to retrieve
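Tying these objects together, a minimal GitRepo sketch might look like the following (the repo URL, names, and paths are hypothetical examples, not from the source):

```yaml
# Minimal GitRepo sketch; repo URL, name, and paths are hypothetical
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: example-apps
  namespace: fleet-default      # workspace for downstream clusters
spec:
  repo: https://github.com/example/fleet-examples
  branch: main
  paths:
    - simple                    # directory in the repo; may contain a fleet.yaml
  targets:
    - name: all
      clusterSelector: {}       # empty selector matches all clusters in the workspace
```

Applying this would result in fleet creating Bundles from the repo contents and a BundleDeployment per matched cluster.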
Clusters and Groups
All clusters are automatically added when using Rancher. Manual registration or some automation is required when using fleet on its own.
Users define groups and select clusters based on labels.
Cluster labels can be added or updated after creation, or set during cluster creation, to manage group membership.
Architecture
Multiple CRDs configure repositories, objects, and clusters that make up a fleet configuration.
Workspaces (the equivalent of k8s namespaces) are used to isolate areas of concern, and the following are always created:
- fleet-default - default workspace for downstream clusters
- fleet-local - local cluster (typically the Rancher management cluster)
The fleet clusters CRD (clusters.fleet.cattle.io) works with both cluster CRDs in Rancher v2.6 and mirrors their configuration:
- cluster.provisioning.cattle.io (rke2/k3s)
- cluster.management.cattle.io (rkev1)
Basic Architecture
Workflow
Working with Fleet
Namespaces
The primary objects in the fleet manager cluster are namespaced, allowing logical grouping and security to be applied.
For example: GitRepos, Bundles, Clusters, ClusterGroups
Ideally, different teams or clusters do not share the same GitRepo. It is generally recommended that these be maintained in separate namespaces as a safety measure, to avoid any unintended label or group selection.
Some built-in namespaces are created in the fleet manager:
- cattle-fleet-local-system - special use for the fleet-agent, also used to bootstrap the fleet manager configuration on the fleet manager cluster
- cattle-fleet-system - fleet-controller and fleet-agent are deployed here
- cattle-fleet-clusters-system - holds secrets for the cluster registration process
A namespace is also created for each cluster that is registered, in the form cluster-${namespace}-${cluster}-${random}.
All BundleDeployments for that cluster are put into this namespace, and the downstream cluster is given access to watch and update its BundleDeployments in that namespace only.
GitRepos
Perhaps the most used fleet object.
Used to register a git repository; supports private repositories by creating a secret containing an SSH keypair.
No particular structure is needed within the git repo itself. However, it is ideal to avoid committing large objects to a repository. At present, the size of a repository must gzip to less than 1MB
GitRepo structure - https://fleet.rancher.io/gitrepo-content
Mapping GitRepo to clusters - https://fleet.rancher.io/gitrepo-targets#defining-targets
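For a private repository, the secret is created in the GitRepo's namespace and referenced via the GitRepo's clientSecretName field. A sketch, with the repo URL and secret name as hypothetical examples:

```yaml
# Hypothetical SSH-key secret for a private repo
apiVersion: v1
kind: Secret
metadata:
  name: ssh-key
  namespace: fleet-default
type: kubernetes.io/ssh-auth
data:
  ssh-privatekey: <base64-encoded private key>   # placeholder, not real data
---
# GitRepo referencing the secret above
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: private-repo
  namespace: fleet-default
spec:
  repo: git@github.com:example/private-fleet.git   # hypothetical repo
  clientSecretName: ssh-key
```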
fleet.yaml (or fleet.yml)
An optional but essential file to customize the resources that are deployed. Multiple fleet.yaml files can co-exist in a GitRepo under separate folder structures
Examples of customisation:
- targetCustomizations - different values for targets, e.g. set a different replica count between test and prod environments
- defaultNamespace - when a namespace is not specified, default to using this namespace for objects
- rolloutStrategy - define a batch size for the rollout of a deployment to clusters
fleet.yaml example - https://fleet.rancher.io/ref-fleet-yaml
fleet.yaml - https://fleet.rancher.io/gitrepo-content#fleetyaml
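A sketch of a fleet.yaml combining these fields (the chart, repo URL, values, and cluster labels are hypothetical examples; see the references above for the full schema):

```yaml
# Hypothetical fleet.yaml: chart, repo URL, values, and labels are examples
defaultNamespace: example-app
helm:
  repo: https://charts.example.com
  chart: example-app
  version: 1.2.3
  values:
    replicas: 1              # default for all targets
targetCustomizations:
  - name: prod
    clusterSelector:
      matchLabels:
        env: prod
    helm:
      values:
        replicas: 3          # override: higher replica count for prod clusters
```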
Fleet webhook
By default, fleet polls (default: 15 seconds) to pull from a GitRepo. A webhook can be used instead; see the link below for the list of supported git solutions.
Fleet webhook - https://fleet.rancher.io/webhook
States
Clusters and Bundles have states representing the different phases.
Fleet states - https://fleet.rancher.io/cluster-bundles-state
Common Configurations
Initial configuration
Fleet creates a default cluster group in the fleet-local workspace; this is just a starting point.
The default cluster group is configured to match clusters with the name: local cluster label. Adding this label to the local cluster will add it to the cluster group.
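A cluster group of this shape can be sketched as follows, mirroring the built-in group's selection on the name: local label:

```yaml
# Sketch of a ClusterGroup selecting clusters by label,
# matching the built-in "default" group's "name: local" selector
apiVersion: fleet.cattle.io/v1alpha1
kind: ClusterGroup
metadata:
  name: default
  namespace: fleet-local
spec:
  selector:
    matchLabels:
      name: local
```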
Customising deployments
Often each cluster is slightly different in some way: environment, size, use case, location, active/passive, etc.
Fleet exposes different levels and approaches to customize each cluster; some have been covered. For completeness, here’s a list of ways to customize what fleet does:
GitRepo object fields
- paths - configures the particular path(s) in the repo for fleet to use in this GitRepo
- targets - defines which clusters in the workspace/namespace will be selected for deployment. Multiple targets can be specified; as in fleet.yaml, these can be all clusters, a specific cluster, cluster groups, or a cluster label selector
fleet.yaml file fields
- helm - version, chart, repo, etc. for an external helm repo. Charts are configurable with values, valuesFiles, valuesFrom (secrets/configmaps)
- kustomize - dir for a kustomization.yaml file
- targetCustomizations - can be used to overlay raw manifest files for all clusters, a specific cluster, cluster groups, or using a cluster label selector
- dependsOn - depend on another bundle

A target in a GitRepo can be thought of as a constraint - which clusters will be selected. A targetCustomization in fleet.yaml can be thought of as a strategy - how the selected clusters will be configured (if needed).
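The constraint side can be sketched as a GitRepo spec fragment (the group name and labels below are hypothetical examples):

```yaml
# GitRepo spec fragment: targets constrain WHICH clusters are selected.
# Group name and labels are hypothetical.
targets:
  - name: prod                # all clusters in a cluster group
    clusterGroup: prod-group
  - name: eu-test             # clusters matched by a label selector
    clusterSelector:
      matchLabels:
        env: test
        region: eu
```

Matching targetCustomizations in fleet.yaml then decide how each selected cluster is configured, e.g. different helm values per target.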
Git repo structure
- alphanumeric, fleet will process the directory structure in name order
- overlays, a directory structure that provides replacement or patch files to alter raw manifests, used in tandem with fleet.yaml to define which overlay directory is used
Single Cluster & Multi Cluster
Single Cluster
The cluster will run both the fleet manager and the fleet-agent. The cluster will communicate with a git repository to retrieve and deploy resources to the local cluster.
This is the most straightforward setup and valuable for dev/test and small-scale setups.
Multi Cluster
A very similar configuration to how Rancher works, with a Rancher management (local) cluster and downstream clusters.
The fleet manager runs on the local cluster, and the downstream clusters run fleet-agent pods that communicate back to the kube-apiserver of the local cluster.
A multi-cluster configuration is an evolution of the single-cluster setup, with additional clusters registered.
Differences between v2.5 / v2.6
v2.5
It was first introduced as an optional component. Fleet is available under Global > Tools > Continuous Delivery, which launches Cluster Explorer to access the Fleet UI.
v2.6
Fleet is available in the Dashboard under Continuous Delivery. The UI experience is the same as above, but Fleet is now a required component.
All other features should be relatively equal; fleet versions bundled with v2.5 / v2.6 can differ.
Fleet moved namespaces in v2.6.1 to avoid conflicts when Rancher manages a downstream cluster that is itself running Rancher.
Common Issues
Redeploying an application using Fleet
Suppose an application like Longhorn is deployed via the Rancher catalog or directly using helm. The labels fleet expects will be missing, and the fleet deployment will fail on that cluster.
Example error:
rendered manifests contain a resource that already exists. Unable to continue with install: ClusterIssuer "letsencrypt-prod" in namespace "" exists and cannot be imported into the current release: invalid ownership metadata; label validation error: missing key "app.kubernetes.io/managed-by": must be set to "Helm"; annotation validation error: missing key "meta.helm.sh/release-name": must be set to "cert-manager-auth-standard-cert-manager-auth"; annotation validation error: missing key "meta.helm.sh/release-namespace": must be set to "default"
Workaround: A) Uninstall the current application and let Fleet redeploy it. B) Manually add/edit the labels and annotations from the error message. Note: you might need to script this, as there may be a lot of objects to update.
Self-modifying objects
There is a known issue (GH-30696) when Fleet deploys an object to a downstream cluster and the application then modifies that object after deployment, for example via a Kubernetes operator. Fleet wants to own the object: if anything changes about it, Fleet will detect the difference and revert it, which causes the operator to adjust the object back, and the two will keep fighting. Customers have reported that opa-gatekeeper and kube-prometheus-stack have this issue. Longhorn used to have a problem, but that was resolved under GH-189.
Workaround: None - This is working as designed. You should open a GH issue to see if Fleet can ignore the changes.
Fighting repos
Suppose you have the same object, such as a namespace, defined in multiple git repos. The owner label that Fleet adds will differ, and you will run into a race condition with both deployments fighting to set the label.
Workaround: None - This is working as designed, and objects should only be defined in a single Git Repo.
Changing subpaths of a git repo breaks the owner labels
There is a known issue (GH-502): if you set up a repo with a subdirectory without a fleet.yaml and later change the subdirectory, the owner labels will not match and the repo will become stuck.
Workaround: A) Specify the releaseName in fleet.yaml B) Create a fleet.yaml in each empty directory C) Avoid keeping files in the git repo that are not required
Repo file path issues
Fleet has several open issues related to having certain characters in the file path to the fleet config file. Note: There is an open enhancement GH-599 to address this.
- “.” is not allowed GH-273
- The path is too long if it is over 63 characters GH-432 (https://github.com/rancher/fleet/issues/432)
Workaround: None - You must remove/change the path to meet these limitations.
Considerations and Limitations
Considerations
Separate areas of concern
As a best practice, separate clusters and GitRepos into their own workspaces (namespaces) to reduce blast radius. For example, if a cluster is accidentally matched/unmatched as a target, the related deployment(s) would also be installed/deleted.
Performance
By default, fleet will poll (default: 15 seconds) git repositories (pull-based); each GitRepo configured will start an independent poll. This can be an issue at scale.
Some approaches to mitigate this are:
- Increasing the polling interval (e.g. pollingInterval: 2m)
- Reusing GitRepos that might be split unnecessarily, e.g. a single GitRepo for all paths of the same repo
- Moving to a webhook model (push-based)
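The first two mitigations can be sketched in a single GitRepo (repo URL and path names are hypothetical):

```yaml
# One GitRepo reused for multiple paths, with a longer polling interval
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: shared-apps
  namespace: fleet-default
spec:
  repo: https://github.com/example/fleet-examples   # hypothetical repo
  pollingInterval: 2m        # raised from the 15s default to reduce load
  paths:
    - apps/common
    - apps/monitoring        # multiple paths of the same repo share one poll
```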
Resources
In a large environment where fleet is used with Rancher, the nodes in the Rancher management cluster may need more CPU and memory. Additionally, nodes may need to be added for dedicated roles (etcd, controlplane) to perform better at scale.
Limitations
Git repositories are the only source control solution supported.
Like a Rancher cluster-agent, the fleet-agent needs connectivity back to the fleet manager cluster. With this in mind, a load balancer and dedicated hostname are recommended for an HA solution.
The contents of a GitRepo must gzip to less than 1MB due to being stored as a Kubernetes object. Avoid large file sizes in repositories where possible.
Troubleshooting
fleet-controller logs
Pod logs for the fleet-controller pod (cattle-fleet-system namespace) on the fleet manager cluster (Rancher); check for unexpected patterns.
kubectl logs -n cattle-fleet-system -l app=fleet-controller
fleet-agent logs
Check the fleet-agent pod logs on the cluster experiencing an issue; look for connectivity-related issues and deployment activity.
local cluster:
kubectl logs -n cattle-fleet-local-system -l app=fleet-agent
downstream cluster:
kubectl logs -n cattle-fleet-system -l app=fleet-agent
Object status
A GitRepo is an abstraction around a Bundle; check the status of the Bundle object (under Advanced in the UI).
For connectivity issues, check the local and downstream fleet-agent pod logs.
Debug logs
As of Rancher v2.6.3 (fleet v0.3.8), debug logging is available. To enable it, go to Dashboard -> Local cluster -> Apps & Marketplace -> Installed Apps and upgrade the fleet chart with the value debug=true. Optionally, debugLevel=5 can be set too.
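The chart upgrade amounts to setting these helm values on the fleet chart:

```yaml
# Helm values for the fleet chart to enable debug logging
debug: true
debugLevel: 5   # optional, increases verbosity
```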
Further reading
Fleet troubleshooting - https://fleet.rancher.io/troubleshooting
Bundle and Cluster states - https://fleet.rancher.io/cluster-bundles-state