审计简介.md
审计简介
Kubernetes 审计(Auditing) 功能提供了与安全相关的、按时间顺序排列的记录集, 记录每个用户、使用 Kubernetes API 的应用以及控制面自身引发的活动(所有访问kube-apiserver服务的客户端)
审计功能使得集群管理员能够回答以下问题:
- 发生了什么?
- 什么时候发生的?
- 谁触发的?
- 活动发生在哪个(些)对象上?
- 在哪观察到的?
- 它从哪触发的?
- 活动的后续处理行为是什么?
这对平台管理者来说十分重要,能够回答一些在故障时候出现的问题。哪个用户,从哪个ip上,发起了什么请求。如删除了一个命名空间,导致这个命名空间下的所有pod被回收,对故障进行复盘。
审计配置
Kubernetes 审计功能由 kube-apiserver 服务提供。每个请求在不同执行阶段都会生成审计事件;这些审计事件会根据特定策略被预处理并写入后端。策略确定要记录的内容,当前的后端支持日志文件和 webhook
审计日志
通过配置 kube-apiserver 的下列参数开启审计日志
audit-log-path
:审计日志路径audit-log-maxage
:旧日志最长保留天数audit-log-maxbackup
:旧日志文件最多保留个数audit-log-maxsize
:日志文件最大大小(单位 MB),超过后自动做轮转(默认为 100MB)
每条审计记录包括两行
- 请求行包括:唯一 ID 和请求的元数据(如源 IP、用户名、请求资源等)
- 响应行包括:唯一 ID(与请求 ID 一致)和响应的元数据(如 HTTP 状态码)
比如,admin 用户查询默认 namespace 的 Pod 列表的审计日志格式为
2017-03-21T03:57:09.106841886-04:00 AUDIT: id="c939d2a7-1c37-4ef1-b2f7-4ba9b1e43b53" ip="127.0.0.1" method="GET" user="admin" groups="\"system:masters\",\"system:authenticated\""as="<self>"asgroups="<lookup>"namespace="default"uri="/api/v1/namespaces/default/pods"
2017-03-21T03:57:09.108403639-04:00 AUDIT: id="c939d2a7-1c37-4ef1-b2f7-4ba9b1e43b53" response="200"
审计策略
v1.7 + 支持实验性的高级审计特性,可以自定义审计策略(选择记录哪些事件)和审计存储后端(日志和 webhook)等。开启方法为
注意开启 AdvancedAuditing 后,日志的格式有一些修改,如新增了 stage 字段(包括 RequestReceived,ResponseStarted ,ResponseComplete,Panic 等)
审计策略选择记录哪些事件,设置方法为
每个请求都可被记录其相关的阶段(stage)。已定义的阶段有:
RequestReceived
- 此阶段对应审计处理器接收到请求后,并且在委托给处理器处理之前生成的事件ResponseStarted
- 在响应消息的头部发送后,响应消息体发送前生成的事件。 只有长时间运行的请求(例如 watch)才会生成这个阶段ResponseComplete
- 当响应消息体完成并且没有更多数据需要传输的时候Panic
- 当 panic 发生时生成
其中,审计策略的配置格式为
rules:
# Don't log watch requests by the"system:kube-proxy" on endpoints or services
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core API group
resources: ["endpoints", "services"]
# Don't log authenticated requests to certain non-resource URL paths.
- level: None
userGroups: ["system:authenticated"]
nonResourceURLs:
- "/api*" # Wildcard matching.
- "/version"
# Log the request body of configmap changes in kube-system.
- level: Request
resources:
- group: "" # core API group
resources: ["configmaps"]
# This rule only applies to resources in the "kube-system" namespace.
# The empty string "" can be used to select non-namespaced resources.
namespaces: ["kube-system"]
# Log configmap and secret changes in all other namespaces at the Metadata level.
- level: Metadata
resources:
- group: "" # core API group
resources: ["secrets", "configmaps"]
# Log all other resources in core and extensions at the Request level.
- level: Request
resources:
- group: "" # core API group
- group: "extensions" # Version of group should NOT be included.
# A catch-all rule to log all other requests at the Metadata level.
- level: Metadata
参考 GCE 审计策略配置:https://github.com/kubernetes/kubernetes/blob/v1.29.1/cluster/gce/gci/configure-helper.sh#L1113-L1273
# Write the config for the audit policy.
function create-master-audit-policy {
local -r path="${1}"
local -r policy="${2:-}"
if [[ -n "${policy}" ]]; then
echo "${policy}" > "${path}"
return
fi
# Known api groups
local -r known_apis='
- group: "" # core
- group: "admissionregistration.k8s.io"
- group: "apiextensions.k8s.io"
- group: "apiregistration.k8s.io"
- group: "apps"
- group: "authentication.k8s.io"
- group: "authorization.k8s.io"
- group: "autoscaling"
- group: "batch"
- group: "certificates.k8s.io"
- group: "extensions"
- group: "metrics.k8s.io"
- group: "networking.k8s.io"
- group: "node.k8s.io"
- group: "policy"
- group: "rbac.authorization.k8s.io"
- group: "scheduling.k8s.io"
- group: "storage.k8s.io"'
cat <<EOF >"${path}"
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
# The following requests were manually identified as high-volume and low-risk,
# so drop them.
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core
resources: ["endpoints", "services", "services/status"]
- level: None
# Ingress controller reads 'configmaps/ingress-uid' through the unsecured port.
# TODO(#46983): Change this to the ingress controller service account.
users: ["system:unsecured"]
namespaces: ["kube-system"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["configmaps"]
- level: None
users: ["kubelet"] # legacy kubelet identity
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
userGroups: ["system:nodes"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["nodes", "nodes/status"]
- level: None
users:
- system:kube-controller-manager
- system:cloud-controller-manager
- system:kube-scheduler
- system:serviceaccount:kube-system:endpoint-controller
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["endpoints"]
- level: None
users: ["system:apiserver"]
verbs: ["get"]
resources:
- group: "" # core
resources: ["namespaces", "namespaces/status", "namespaces/finalize"]
- level: None
users: ["cluster-autoscaler"]
verbs: ["get", "update"]
namespaces: ["kube-system"]
resources:
- group: "" # core
resources: ["configmaps", "endpoints"]
# Don't log HPA fetching metrics.
- level: None
users:
- system:kube-controller-manager
- system:cloud-controller-manager
verbs: ["get", "list"]
resources:
- group: "metrics.k8s.io"
# Don't log these read-only URLs.
- level: None
nonResourceURLs:
- /healthz*
- /version
- /swagger*
# Don't log events requests because of performance impact.
- level: None
resources:
- group: "" # core
resources: ["events"]
# node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
- level: Request
users: ["kubelet", "system:node-problem-detector", "system:serviceaccount:kube-system:node-problem-detector"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
- level: Request
userGroups: ["system:nodes"]
verbs: ["update","patch"]
resources:
- group: "" # core
resources: ["nodes/status", "pods/status"]
omitStages:
- "RequestReceived"
# deletecollection calls can be large, don't log responses for expected namespace deletions
- level: Request
users: ["system:serviceaccount:kube-system:namespace-controller"]
verbs: ["deletecollection"]
omitStages:
- "RequestReceived"
# Secrets, ConfigMaps, TokenRequest and TokenReviews can contain sensitive & binary data,
# so only log at the Metadata level.
- level: Metadata
resources:
- group: "" # core
resources: ["secrets", "configmaps", "serviceaccounts/token"]
- group: authentication.k8s.io
resources: ["tokenreviews"]
omitStages:
- "RequestReceived"
# Get responses can be large; skip them.
- level: Request
verbs: ["get", "list", "watch"]
resources: ${known_apis}
omitStages:
- "RequestReceived"
# Default level for known APIs
- level: RequestResponse
resources: ${known_apis}
omitStages:
- "RequestReceived"
# Default level for all other requests.
- level: Metadata
omitStages:
- "RequestReceived"
EOF
}
审计存储后端
审计存储后端支持两种方式
- 日志,配置
--audit-log-path
开启,格式为
2017-06-15T21:50:50.259470834Z AUDIT: id="591e9fde-6a98-46f6-b7bc-ec8ef575696d" stage="RequestReceived" ip="10.2.1.3" method="update" user="system:serviceaccount:kube-system:default" groups="\"system:serviceaccounts\",\"system:serviceaccounts:kube-system\",\"system:authenticated\""as="<self>"asgroups="<lookup>"namespace="kube-system"uri="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager"response="<deferred>"
2017-06-15T21:50:50.259470834Z AUDIT: id="591e9fde-6a98-46f6-b7bc-ec8ef575696d" stage="ResponseComplete" ip="10.2.1.3" method="update" user="system:serviceaccount:kube-system:default" groups="\"system:serviceaccounts\",\"system:serviceaccounts:kube-system\",\"system:authenticated\""as="<self>"asgroups="<lookup>"namespace="kube-system"uri="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager"response="200"
- webhook,配置
--audit-webhook-config-file=/etc/kubernetes/audit-webhook-kubeconfig --audit-webhook-mode=batch
开启,其中 audit-webhook-mode 支持 batch 和 blocking 两种格式,而 webhook 配置文件格式为
# clusters refers to the remote service.
clusters:
- name: name-of-remote-audit-service
cluster:
certificate-authority: /path/to/ca.pem # CA for verifying the remote service.
server: https://audit.example.com/audit # URL of remote service to query. Must use 'https'.
# users refers to the API server's webhook configuration.
users:
- name: name-of-api-server
user:
client-certificate: /path/to/cert.pem # cert for the webhook plugin to use
client-key: /path/to/key.pem # key matching the cert
# kubeconfig files require a context. Provide one for the API server.
current-context: webhook
contexts:
- context:
cluster: name-of-remote-audit-service
user: name-of-api-sever
name: webhook
所有的事件以 JSON 格式 POST 给 webhook server,如
{
"kind": "EventList",
"apiVersion": "audit.k8s.io/v1alpha1",
"items": [
{
"metadata": {
"creationTimestamp": null
},
"level": "Metadata",
"timestamp": "2017-06-15T23:07:40Z",
"auditID": "4faf711a-9094-400f-a876-d9188ceda548",
"stage": "ResponseComplete",
"requestURI": "/apis/rbac.authorization.k8s.io/v1beta1/namespaces/kube-public/rolebindings/system:controller:bootstrap-signer",
"verb": "get",
"user": {
"username": "system:apiserver",
"uid": "97a62906-e4d7-4048-8eda-4f0fb6ff8f1e",
"groups": [
"system:masters"
]
},
"sourceIPs": [
"127.0.0.1"
],
"objectRef": {
"resource": "rolebindings",
"namespace": "kube-public",
"name": "system:controller:bootstrap-signer",
"apiVersion": "rbac.authorization.k8s.io/v1beta1"
},
"responseStatus": {
"metadata": {},
"code": 200
}
}
]
}