Cilium Host Routing Mode

Cilium ConfigMap

The cilium-config ConfigMap is where Cilium's configuration lives:

root@node1:~# kubectl get cm -n kube-system | grep cilium
cilium-config                        34     7d21h
root@node1:~# kubectl edit cm -n kube-system cilium-config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  agent-health-port: "9879"
  auto-direct-node-routes: "False"
  bpf-ct-global-any-max: "262144"
  bpf-ct-global-tcp-max: "524288"
  bpf-map-dynamic-size-ratio: "0.0025"
  cgroup-root: /run/cilium/cgroupv2
  clean-cilium-bpf-state: "false"
  clean-cilium-state: "false"
  cluster-name: default
  custom-cni-conf: "false"
  debug: "True"
  disable-cnp-status-updates: "True"
  enable-bpf-clock-probe: "True"
  enable-bpf-masquerade: "True"
  enable-host-legacy-routing: "False"
  enable-ip-masq-agent: "False"
  enable-ipv4: "True"
  enable-ipv4-masquerade: "True"
  enable-ipv6: "False"
  enable-ipv6-masquerade: "True"
  enable-remote-node-identity: "True"
  enable-well-known-identities: "False"
  etcd-config: |-
    ---
    endpoints:
      - https://192.168.64.8:2379
      - https://192.168.64.9:2379
      - https://192.168.64.10:2379

    # In case you want to use TLS in etcd, uncomment the 'ca-file' line
    # and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    ca-file: "/etc/cilium/certs/ca_cert.crt"

    # In case you want client to server authentication, uncomment the following
    # lines and create a kubernetes secret by following the tutorial in
    # https://cilium.link/etcd-config
    key-file: "/etc/cilium/certs/key.pem"
    cert-file: "/etc/cilium/certs/cert.crt"
  identity-allocation-mode: kvstore
  ipam: kubernetes
  kube-proxy-replacement: probe
  kvstore: etcd
  kvstore-opt: '{"etcd.config": "/var/lib/etcd-config/etcd.config"}'
  monitor-aggregation: medium
  monitor-aggregation-flags: all
  operator-api-serve-addr: 127.0.0.1:9234
  preallocate-bpf-maps: "False"
  sidecar-istio-proxy-image: cilium/istio_proxy
  tunnel: vxlan
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"agent-health-port":"9879","auto-direct-node-routes":"False","bpf-ct-global-any-max":"262144","bpf-ct-global-tcp-max":"524288","bpf-map-dynamic-size-ratio":"0.0025","cgroup-root":"/run/cilium/cgroupv2","clean-cilium-bpf-state":"false","clean-cilium-state":"false","cluster-name":"default","custom-cni-conf":"false","debug":"False","disable-cnp-status-updates":"True","enable-bpf-clock-probe":"True","enable-bpf-masquerade":"False","enable-host-legacy-routing":"True","enable-ip-masq-agent":"False","enable-ipv4":"True","enable-ipv4-masquerade":"True","enable-ipv6":"False","enable-ipv6-masquerade":"True","enable-remote-node-identity":"True","enable-well-known-identities":"False","etcd-config":"---\nendpoints:\n  - https://192.168.64.8:2379\n  - https://192.168.64.9:2379\n  - https://192.168.64.10:2379\n\n# In case you want to use TLS in etcd, uncomment the 'ca-file' line\n# and create a kubernetes secret by following the tutorial in\n# https://cilium.link/etcd-config\nca-file: \"/etc/cilium/certs/ca_cert.crt\"\n\n# In case you want client to server authentication, uncomment the following\n# lines and create a kubernetes secret by following the tutorial in\n# https://cilium.link/etcd-config\nkey-file: \"/etc/cilium/certs/key.pem\"\ncert-file: \"/etc/cilium/certs/cert.crt\"","identity-allocation-mode":"kvstore","ipam":"kubernetes","kube-proxy-replacement":"probe","kvstore":"etcd","kvstore-opt":"{\"etcd.config\": \"/var/lib/etcd-config/etcd.config\"}","monitor-aggregation":"medium","monitor-aggregation-flags":"all","operator-api-serve-addr":"127.0.0.1:9234","preallocate-bpf-maps":"False","sidecar-istio-proxy-image":"cilium/istio_proxy","tunnel":"vxlan"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"cilium-config","namespace":"kube-system"}}
  creationTimestamp: "2023-07-08T12:42:52Z"
  name: cilium-config
  namespace: kube-system
  resourceVersion: "168401"
  uid: 0fbe4759-a8c4-457b-a06f-5fc6231272b9

For the meaning of each individual field, refer to: Cilium 配置详解 (Cilium configuration explained).

If anything needs to be modified, save and exit, then restart the cilium-agent DaemonSet for the change to take effect:

root@node1:~# kubectl rollout restart ds/cilium -n kube-system
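
To wait until every agent Pod has been recreated before checking the result, the standard rollout status command can be used (a quick sketch):

root@node1:~# kubectl rollout status ds/cilium -n kube-system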

After the restart, exec into any cilium-agent container:
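
For example, to get a shell in one of the agent Pods (the Pod name cilium-xxxxx below is a placeholder; pick any Pod returned by the first command):

root@node1:~# kubectl get pod -n kube-system -l k8s-app=cilium -o wide
root@node1:~# kubectl exec -it -n kube-system cilium-xxxxx -- bash

Once inside, cilium status shows the current datapath state: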

root@node1:/home/cilium# cilium status
KVStore:                 Ok   etcd: 3/3 connected, lease-ID=797e89450e0683a2, lock lease-ID=797e89450e0b2b6a, has-quorum=true: https://192.168.64.10:2379 - 3.5.4; https://192.168.64.8:2379 - 3.5.4 (Leader); https://192.168.64.9:2379 - 3.5.4
Kubernetes:              Ok   1.24 (v1.24.6) [linux/amd64]
Kubernetes APIs:         ["cilium/v2::CiliumClusterwideNetworkPolicy", "cilium/v2::CiliumNetworkPolicy", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "discovery/v1::EndpointSlice", "networking.k8s.io/v1::NetworkPolicy"]
KubeProxyReplacement:    Probe   [enp0s2 192.168.64.8]
Host firewall:           Disabled
CNI Chaining:            none
Cilium:                  Ok   1.12.1 (v1.12.1-4c9a630)
NodeMonitor:             Disabled
Cilium health daemon:    Ok   
IPAM:                    IPv4: 5/254 allocated from 10.233.64.0/24, 
BandwidthManager:        Disabled
Host Routing:            BPF # note this line: eBPF host-routing is active
Masquerading:            BPF   [enp0s2]   10.233.64.0/24 [IPv4: Enabled, IPv6: Disabled] # NAT (masquerading) is performed only on the enp0s2 interface. [There are two modes: BPF, or iptables when running in Legacy Mode]
Controller Status:       43/43 healthy
Proxy Status:            OK, ip 10.233.64.93, 0 redirects active on ports 10000-20000
Global Identity Range:   min 256, max 65535
Hubble:                  Disabled
Encryption:              Disabled
Cluster health:          3/3 reachable   (2023-07-16T10:01:41Z)

Host Routing

Even though Cilium uses eBPF to perform network routing, by default packets still traverse parts of the node's regular network stack. This ensures that every packet still passes through all of the iptables hooks in case you depend on them, but it adds significant overhead.
Cilium 1.9 introduced eBPF-based host-routing, which bypasses iptables and the upper host stack entirely and achieves a faster network-namespace switch compared to regular veth device operation. If your kernel supports it, this option is enabled automatically.
[Figure: datapath comparison, legacy host routing (left) vs. eBPF host-routing (right)]
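
A quick way to check whether a node can use (and is actually using) eBPF host-routing is to look at its kernel version and at the Host Routing line of cilium status. A minimal sketch, assuming the second command is run inside a cilium-agent container:

root@node1:~# uname -r
root@node1:/home/cilium# cilium status | grep -E 'Host Routing|Masquerading|KubeProxyReplacement'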

eBPF Host-Routing: Legacy Mode

Legacy is the traditional host-routing mode. The left side of the figure above shows the standard packet flow, in which iptables processes every packet.
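
If you ever need to fall back to this mode (for example, something in your environment still depends on the iptables hooks), the relevant ConfigMap key shown earlier can be flipped. A sketch using kubectl patch (the key name is the one from the cilium-config ConfigMap above):

root@node1:~# kubectl -n kube-system patch configmap cilium-config --type merge -p '{"data":{"enable-host-legacy-routing":"true"}}'
root@node1:~# kubectl -n kube-system rollout restart ds/cilium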

eBPF Host-Routing: BPF Mode

The right side of the figure shows eBPF host-routing enabled, bypassing iptables entirely. The prerequisites are (a sketch of switching the corresponding ConfigMap keys follows the list):

  • Kernel >= 5.10
  • Direct-routing or tunneling
  • eBPF-based kube-proxy replacement
  • eBPF-based masquerading
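
With these prerequisites satisfied, the mode is controlled by ConfigMap keys already shown above (enable-bpf-masquerade, enable-host-legacy-routing, kube-proxy-replacement, tunnel). A minimal sketch of switching them with kubectl patch, followed by the usual agent restart (adjust the values to your environment):

root@node1:~# kubectl -n kube-system patch configmap cilium-config --type merge -p '{"data":{"enable-bpf-masquerade":"True","enable-host-legacy-routing":"False","kube-proxy-replacement":"probe"}}'
root@node1:~# kubectl -n kube-system rollout restart ds/cilium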

Because this mode introduces two helper functions that underpin eBPF host-routing, the datapath differs in the following ways:

  • bpf_redirect_peer()
    In short:
    For Pod-to-Pod communication on the same node, a packet leaving Pod-A through the lxc (host-side) end of its veth pair is delivered directly to the eth0 end of Pod-B's veth pair, bypassing the lxc (host-side) end of Pod-B's veth pair.

[Figure: same-node Pod-to-Pod datapath with bpf_redirect_peer()]
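
One way to observe this behaviour is to compare packet captures on the host-side lxc device of the destination Pod and on eth0 inside that Pod: with bpf_redirect_peer() in use, the traffic shows up inside the Pod but not on its host-side lxc device. A sketch, assuming ICMP traffic between the two Pods, tcpdump available in the Pod image, and illustrative interface/Pod names:

root@node1:~# ip link | grep lxc                                 # find the host-side lxc devices (names are node-specific)
root@node1:~# tcpdump -ni lxc1234abcd icmp                       # Pod-B's host-side lxc: expect no packets
root@node1:~# kubectl exec -it pod-b -- tcpdump -ni eth0 icmp    # inside Pod-B: the same packets arrive directly on eth0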

  • bpf_redirect_neigh()
    In short:
    For Pod-to-Pod communication across nodes, a packet leaving Pod-A through the lxc end of its veth pair is first handled by the from-container program on its ingress side (VXLAN encapsulation or native routing), tail-calls into the VXLAN handling, and is then passed to bpf_redirect_neigh(), which forwards it out of the physical NIC to the node hosting Pod-B, where the reverse processing takes place. Note the order of operations here: from-container -> VXLAN encapsulation -> bpf_redirect_neigh().

[Figure: cross-node Pod-to-Pod datapath with bpf_redirect_neigh()]
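
To watch this path in practice, you can capture the VXLAN-encapsulated traffic on the physical NIC of the source node and follow the datapath trace events from the agent. A sketch, assuming the interface enp0s2 and Cilium's default VXLAN port 8472 as in the environment above:

root@node1:~# tcpdump -ni enp0s2 udp port 8472        # cross-node Pod traffic leaves the NIC already VXLAN-encapsulated
root@node1:/home/cilium# cilium monitor --type trace  # inside the cilium-agent container: per-packet datapath trace events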