Infrastructure Layer

Kubernetes

Gremlin allows targeting objects within your Kubernetes clusters. After selecting a cluster, you can filter the visible set of objects by selecting a namespace. Select any of your Deployments, ReplicaSets, StatefulSets, DaemonSets, or Pods. When one object is selected, all child objects will also be targeted. For example, when selecting a DaemonSet, all of the pods within will be selected.

Installation

The simplest way to install Gremlin on Kubernetes is with Helm. Check out Gremlin's Helm Chart Repository for full documentation and usage.

shell
# Add Gremlin's Helm chart repository
helm repo add gremlin https://helm.gremlin.com/
# Create a dedicated namespace for Gremlin
kubectl create namespace gremlin
# Install the chart, reading team credentials from the environment
helm install gremlin gremlin/gremlin --namespace gremlin \
  --set gremlin.hostPID=true \
  --set gremlin.container.driver=docker-runc \
  --set gremlin.secret.managed=true \
  --set gremlin.secret.type=secret \
  --set gremlin.secret.teamID="$GREMLIN_TEAM_ID" \
  --set gremlin.secret.clusterID="$GREMLIN_CLUSTER_ID" \
  --set gremlin.secret.teamSecret="$GREMLIN_TEAM_SECRET"
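
Before running the install, it can help to confirm that the three environment variables referenced above are set, since an empty value would otherwise be passed to Helm unnoticed. The following is a minimal sketch; the `missing_vars` helper is hypothetical and is not part of Gremlin or Helm.

```shell
# Hypothetical helper: print the name of every listed variable that is unset or empty.
missing_vars() {
  for name in "$@"; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

# Usage: stop before `helm install` if anything is missing.
# missing=$(missing_vars GREMLIN_TEAM_ID GREMLIN_CLUSTER_ID GREMLIN_TEAM_SECRET)
# [ -z "$missing" ] || { echo "unset: $missing" >&2; exit 1; }
```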

Some environments require more configuration. The resources below can help you find the best configuration for your environment.

Cri-O and Containerd

As of version 2.16.0, you can now install Gremlin on Kubernetes running Cri-O or Containerd. Follow this guide to get started.

OpenShift

As of version 2.16.0, you can now install Gremlin on OpenShift 3 and OpenShift 4 running Cri-O or Containerd.

Install manually

If the above sections are not what you're looking for, follow this guide to install Gremlin manually, from nothing but YAML files and a text editor.

Other considerations

Enabling Gremlin on the Kubernetes Master

Most Kubernetes deployments configure master nodes with the node-role.kubernetes.io/master:NoSchedule taint. You can run the following command to see if any of your nodes have this taint:

shell
kubectl get no -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

NAME      TAINTS
kube-01   [map[effect:NoSchedule key:node-role.kubernetes.io/master]]
kube-02   <none>
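
To pick out only the tainted nodes from that output, the listing can be filtered with `awk`. This is a sketch under the assumption that the TAINTS column is rendered as in the example above (the exact rendering may differ between kubectl versions); `master_tainted_nodes` is a hypothetical helper name.

```shell
# Hypothetical helper: read the custom-columns output on stdin and print the
# names of nodes whose TAINTS column contains the master NoSchedule taint.
master_tainted_nodes() {
  awk 'NR > 1 && /key:node-role\.kubernetes\.io\/master/ && /effect:NoSchedule/ { print $1 }'
}

# Usage (requires access to the cluster):
# kubectl get no -o=custom-columns=NAME:.metadata.name,TAINTS:.spec.taints | master_tainted_nodes
```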

If you wish to install Gremlin on a Kubernetes master that has been tainted, add a tolerations section to the PodSpec of the Gremlin Client Manifest.

yaml
tolerations:
  - key: node-role.kubernetes.io/master
    operator: Exists
    effect: NoSchedule

You will need to reapply the Gremlin client manifest after making this change.

AppArmor support

If your cluster has AppArmor enabled (e.g. Azure Kubernetes Service), add the following line to your Helm deployment. This allows the Gremlin container to run without a security profile:

shell
--set gremlin.apparmor=unconfined

Proxy configuration

Both Gremlin and Chao can be configured to use a proxy for outgoing HTTP traffic. The conventional https_proxy and no_proxy variables can be passed as environment variables for this purpose.

Proxy certificate authorities

When proxies support HTTPS communication, or are otherwise configured with a TLS certificate, it can be necessary to configure Gremlin to trust the proxy's certificate authority. This is done by passing the SSL_CERT_FILE environment variable where the value is a path on the file system to a PEM encoded file containing the certificate authority's certificate.
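
A quick sanity check on the file that SSL_CERT_FILE points to can catch a wrong path or a non-PEM file early. The sketch below only looks for the PEM certificate header and does not validate the certificate itself; `looks_like_pem_cert` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the file is readable and contains a PEM
# certificate header. This does not verify that the certificate is valid.
looks_like_pem_cert() {
  [ -r "$1" ] && grep -q -e '-----BEGIN CERTIFICATE-----' "$1"
}

# Usage:
# looks_like_pem_cert /etc/gremlin/ssl/proxy-ca.pem || echo "not a PEM certificate" >&2
```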

Configuring Gremlin

yaml
- name: https_proxy
  value: http://proxy.local:3128
# Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
- name: SSL_CERT_FILE
  value: /etc/gremlin/ssl/proxy-ca.pem

Configuring Chao

Because the Gremlin Kubernetes Client (Chao) communicates with the local Kubernetes API server in addition to the internet, it's important to bypass internet proxies for traffic bound for the API server:

yaml
- name: https_proxy
  value: http://proxy.local:3128
- name: no_proxy
  value: $(KUBERNETES_SERVICE_HOST):$(KUBERNETES_SERVICE_PORT)
# Pass SSL_CERT_FILE when the proxy requires trusting a TLS certificate
- name: SSL_CERT_FILE
  value: /etc/gremlin/ssl/proxy-ca.pem
# Pass SSL_CERT_DIR when SSL_CERT_FILE contains only the proxy certificate. This will ensure Chao trusts api.gremlin.com
# The value of SSL_CERT_DIR varies depending on the operating system on which the cluster hosts run
# See https://www.gremlin.com/docs/infrastructure-layer/kubernetes/#ssl_cert_dir
- name: SSL_CERT_DIR
  value: /etc/ssl/

SSL_CERT_DIR

Supplying SSL_CERT_DIR ensures Chao is still configured with the necessary certificate authorities to trust api.gremlin.com. However, it is not needed for most Gremlin installations because Chao trusts Gremlin servers by default. This variable is only required for Chao deployments when both of the following conditions are true:

  • Chao is configured with https_proxy and this proxy is configured to accept TLS connections
  • Chao is also configured with SSL_CERT_FILE, and the file it points to contains only the certificate authority for the https proxy

The value of SSL_CERT_DIR should point to the root of the certificate authority directory for the operating system on which Chao runs.

Path                               OS
/etc/ssl/certs/                    Debian/Ubuntu
/etc/pki/tls/                      Fedora/RHEL 6/OpenELEC
/etc/ssl/                          OpenSUSE / Alpine Linux
/etc/pki/ca-trust/extracted/pem/   CentOS/RHEL 7
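
The table above can be expressed as a small lookup. This is a sketch only: `ssl_cert_dir_for` is a hypothetical helper, and the lowercase OS identifiers it accepts are an assumption modelled on the `ID` field of `/etc/os-release`, not something Gremlin defines.

```shell
# Hypothetical helper: map an OS identifier (and, for RHEL, a major version)
# to the SSL_CERT_DIR value from the table above.
# Returns non-zero for anything the table does not cover.
ssl_cert_dir_for() {
  os_id=$1
  major=${2:-}
  case "$os_id" in
    debian|ubuntu)    echo /etc/ssl/certs/ ;;
    fedora|openelec)  echo /etc/pki/tls/ ;;
    opensuse*|alpine) echo /etc/ssl/ ;;
    centos)           echo /etc/pki/ca-trust/extracted/pem/ ;;
    rhel)
      case "$major" in
        6) echo /etc/pki/tls/ ;;
        7) echo /etc/pki/ca-trust/extracted/pem/ ;;
        *) return 1 ;;
      esac ;;
    *) return 1 ;;
  esac
}

# Usage:
# ssl_cert_dir_for rhel 7
```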

Using a PodSecurityPolicy

Gremlin does not support running within the restricted PodSecurityPolicy (PSP) that is configured by default on clusters that enable such policies. You can install a Gremlin PodSecurityPolicy to grant Chao and Gremlin everything they need, and nothing they don't.

When installing Gremlin with Helm, you can supply --set gremlin.podSecurity.podSecurityPolicy.create=true to install Gremlin's custom pod security policies. Check out Gremlin's Helm Chart Repository for full documentation and usage.

Without Helm, you can download Gremlin's PSP files and install them with kubectl:

shell
mkdir gremlin-psp
wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/chao-psp.yaml
wget -P gremlin-psp/ https://k8s.gremlin.com/resources/psp/v1/gremlin-psp.yaml
kubectl create -f gremlin-psp/

Using a custom Seccomp policy

All Gremlin behavior works under Docker's default seccomp policy. However, some environments default to more restrictive seccomp profiles that block Gremlin behavior.

Gremlin has a custom seccomp profile which can be automatically installed when you install with Helm and pass --set gremlin.podSecurity.seccomp.enabled=true. Check out Gremlin's Helm Chart Repository for full documentation and usage.

You can also download this seccomp policy in order to install it manually.

Gremlin container drivers

Gremlin currently has 4 different drivers for integrating with the underlying container runtime powering Kubernetes:

Driver            Requirements and file access                More info
docker            Connect: /var/run/docker.sock               No support for systemd cgroup driver
docker-runc       Connect: /var/run/docker.sock               Recommended for the Docker runtime
                  Write: /run/docker/runtime-runc/moby
                  Write: host's cgroup root
                  Minimum Docker version: 17.11.0
crio-runc         Connect: /run/crio/crio.sock                Used with the Cri-O container runtime
                  Write: /run/runc
                  Write: host's cgroup root
                  Access to the host's PID namespace
containerd-runc   Connect: /run/containerd/containerd.sock    Used with the Containerd container runtime
                  Write: /run/containerd/runc/k8s.io
                  Write: host's cgroup root
                  Access to the host's PID namespace

Gremlin automatically chooses one of the above container drivers when its requirements are met. Users installing with Helm can provide all requirements automatically by declaring the intended container driver:

shell
--set gremlin.container.driver=$driver
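
If you are unsure which value to use for `$driver`, one rough heuristic is to look at which runtime socket from the table above exists on a cluster node. The sketch below is an illustration only, not Gremlin's own detection logic; `suggest_container_driver` is a hypothetical helper. It checks the Docker socket first on the assumption that Docker hosts commonly expose a containerd socket as well.

```shell
# Hypothetical helper: suggest a container driver based on which runtime
# socket path exists. An optional root directory can be given as a prefix
# (it defaults to the real filesystem root).
suggest_container_driver() {
  root=${1:-}
  if [ -e "$root/var/run/docker.sock" ]; then
    echo docker-runc
  elif [ -e "$root/run/crio/crio.sock" ]; then
    echo crio-runc
  elif [ -e "$root/run/containerd/containerd.sock" ]; then
    echo containerd-runc
  else
    return 1
  fi
}

# Usage (run on a cluster node):
# driver=$(suggest_container_driver) && echo "--set gremlin.container.driver=$driver"
```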

Verify your installation

Finally, check that Gremlin is installed properly:

bash
kubectl get pods -n gremlin

This should list one Gremlin agent per node (each physical or virtual machine in your cluster), plus one pod for Chao.

Example

shell
kubectl get pods -n gremlin

NAME                    READY   STATUS    RESTARTS   AGE
chao-78bbc7cbf6-9hn7q   1/1     Running   0          5d20h
gremlin-9r4t7           1/1     Running   0          5d20h
gremlin-bwmtz           1/1     Running   1          126d
gremlin-bx6dn           1/1     Running   0          5d20h

Pending Pods

If any pods are pending, your installation is incomplete. Contact your cluster administrator to debug why Gremlin cannot run on those nodes.

shell
kubectl get pods -n gremlin

NAME                    READY   STATUS    RESTARTS   AGE
chao-78bbc7cbf6-9hn7q   1/1     Running   0          5d20h
gremlin-c25ld           0/1     Pending   0          112d
gremlin-n5gt7           0/1     Pending   0          112d
gremlin-zn4kq           1/1     Running   0          126d
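
On larger clusters it can be easier to filter the listing than to scan it by eye. The following sketch prints only the pods that are not in the Running state; `pods_not_running` is a hypothetical helper, and it assumes the default column layout shown above.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and print the
# names of pods whose STATUS column is anything other than "Running".
# Blank lines and the header row are skipped.
pods_not_running() {
  awk 'NF >= 3 && $1 != "NAME" && $3 != "Running" { print $1 }'
}

# Usage (requires access to the cluster):
# kubectl get pods -n gremlin | pods_not_running
# kubectl describe pod <pod-name> -n gremlin   # the Events section usually explains why a pod is pending
```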

Selecting Containers

For state and resource attack types, you can choose to target one, all, or specific containers within a selected pod. Once targets have been selected, all state and resource attack types will present this configuration. Selecting 'any' will target a single container within each pod at runtime. If you've selected more than one target (e.g. a Deployment), you can select from a list of containers common to all of these targets.

Running an attack

Once you select the Kubernetes objects to be targeted, select and configure your desired Gremlin attack. When the attack is run, the underlying containers within the objects selected will be impacted.

Namespace access control

With the Kubernetes client installed on your cluster, you can share individual namespaces with other Gremlin teams. Once it is installed, head to the Clients section to view all of the clusters installed across your company.

By sharing individual namespaces to teams across your company, you can provide access for users to run attacks only on relevant services while also limiting access to the hosts or nodes themselves.

Managing Cluster access

As the Team Manager on a team where a Kubernetes cluster is installed or as a Company Manager, you can click the gear icon to manage access. On the cluster view, to share a namespace with a team use the search box to filter down the list of available teams. Then use the search box on the team row and click on the namespace you'd like to share. Use the options menu to share all of the namespaces.

To remove a team's access to a namespace, click the x on the blue namespace bubble. Using the options menu, you can also remove all namespaces at once.

Requesting Namespace access

As a member of a different team of your company, you can view the list of clusters installed across your company. To request access to a namespace on one of these clusters not installed on your team, click the Request Access button. You can then check off the namespaces you'd like access to, or you can use the select all switch.

You can also request access to a namespace within a cluster when creating an attack. Once you've selected a cluster, the drop down list of namespaces will have an option to request access.

Approving access requests

As a Team Manager on the team where a cluster is installed, you'll receive an email when a user in your company requests access to a namespace. Open the cluster view to approve or deny the request.

Troubleshooting

Run Chao in debug mode

Chao supports the GODEBUG environment variable, which can be used to enable debug features such as verbose logging of HTTP activity. You can enable verbose HTTP logs by adding the following variable to the environment section of the Chao deployment.

NOTE: Verbose logging prints sensitive information like HTTP request and response bodies. This configuration is intended to be a troubleshooting measure only, and should be removed when unused.

yaml
- name: GODEBUG
  value: http2debug=2

Chao's logs will now contain verbose logs for HTTP requests.
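
As an alternative to editing the manifest by hand, the variable can be toggled with `kubectl set env`, which also makes it easy to remove again once troubleshooting is done. This is a sketch that assumes Chao runs as a Deployment named `chao` in the `gremlin` namespace; adjust both if your installation differs. The function names and the `KUBECTL` override are hypothetical conveniences.

```shell
# KUBECTL can be overridden (for example with `echo`) to preview the command
# instead of running it.
KUBECTL=${KUBECTL:-kubectl}

# Assumption: Chao is a Deployment named "chao" in the "gremlin" namespace.
enable_chao_http_debug() {
  "$KUBECTL" set env deployment/chao --namespace gremlin GODEBUG=http2debug=2
}

# A trailing dash after the variable name removes it from the Deployment.
disable_chao_http_debug() {
  "$KUBECTL" set env deployment/chao --namespace gremlin GODEBUG-
}

# Usage:
# enable_chao_http_debug    # reproduce the issue and collect logs
# disable_chao_http_debug   # remove the variable when finished
```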

Run Gremlin checks

Gremlin's check subcommand can be run on Kubernetes clusters to troubleshoot common issues with configuration or compatibility with the environment. The following is an example Job that can be run to get gremlin check output:

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gremlin-check
  namespace: gremlin
  labels:
    k8s-app: gremlin
    version: v1
spec:
  template:
    metadata:
      labels:
        app.kubernetes.io/name: gremlin-check
    spec:
      restartPolicy: Never
      containers:
        - name: gremlin
          image: gremlin/gremlin
          # You can also pass subcommands (like `proxy` to check only proxy information)
          args: [ "check" ]
          env:
            # Pass the same environment you would pass to the Gremlin DaemonSet, including secrets and proxy information
            - name: GREMLIN_TEAM_PRIVATE_KEY_OR_FILE
              value: file:///var/lib/gremlin/cert/gremlin.key
            - name: GREMLIN_TEAM_CERTIFICATE_OR_FILE
              value: file:///var/lib/gremlin/cert/gremlin.cert
            - name: GREMLIN_IDENTIFIER
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            # Example proxy configuration
            # - name: https_proxy
            #   value: http://my-proxy:3128
            # - name: SSL_CERT_FILE
            #   value: /etc/gremlin/ssl/proxy-ca.pem
            # - name: GREMLIN_TEAM_ID
            #   value: my-team-id
          volumeMounts:
            - name: docker-sock
              mountPath: /var/run/docker.sock
            - name: gremlin-state
              mountPath: /var/lib/gremlin
            - name: gremlin-logs
              mountPath: /var/log/gremlin
            - name: gremlin-cert
              mountPath: /var/lib/gremlin/cert
              readOnly: true
            # Example proxy configuration
            # - name: proxy-ca
            #   mountPath: /etc/gremlin/ssl
      volumes:
        - name: docker-sock
          hostPath:
            path: /var/run/docker.sock
        - name: gremlin-state
          hostPath:
            path: /var/lib/gremlin
        - name: gremlin-logs
          hostPath:
            path: /var/log/gremlin
        - name: gremlin-cert
          secret:
            secretName: gremlin-secret
        # Example proxy configuration
        # - name: proxy-ca
        #   configMap:
        #     name: proxy-ca
  backoffLimit: 4

Once deployed, you can get the output of gremlin check by pulling the logs of the Pod associated with the Job:

shell
kubectl logs --follow \
  --namespace gremlin \
  $(kubectl get pods --namespace gremlin --selector=job-name=gremlin-check --output=jsonpath='{.items[*].metadata.name}')

proxy
====================================================
https_proxy : http://proxy.local:3128
http_proxy : (unset)
SSL_CERT_FILE : /etc/gremlin/ssl/proxy-ca.pem
Service Ping : OK
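
When the check runs as part of a script, the result can be tested rather than read by eye. The sketch below assumes the output keeps the `Service Ping : OK` format shown above, which may change between Gremlin versions; `service_ping_ok` is a hypothetical helper name.

```shell
# Hypothetical helper: read `gremlin check` output on stdin and succeed only
# if the "Service Ping" line reports OK.
service_ping_ok() {
  grep -Eq '^Service Ping[[:space:]]*:[[:space:]]*OK[[:space:]]*$'
}

# Usage (requires access to the cluster):
# kubectl logs job/gremlin-check --namespace gremlin | service_ping_ok && echo "service ping succeeded"
```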