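Version: Latest-3.5

Deploy StarRocks with Operator

This topic describes how to use StarRocks Operator to automate the deployment and management of a StarRocks cluster on a Kubernetes cluster.

NOTE

The StarRocks k8s operator is designed as a level 2 operator. See https://sdk.operatorframework.io/docs/overview/operator-capabilities/ for more information about the capabilities of a level 2 operator.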

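How it works

Before you begin

Create a Kubernetes cluster

You can use a cloud-managed Kubernetes service, such as Amazon Elastic Kubernetes Service (EKS) or Google Kubernetes Engine (GKE), or a self-managed Kubernetes cluster.

Create an Amazon EKS cluster
1. Check that the following command-line tools are installed in your environment:
  1. Install and configure the AWS command-line tool AWS CLI.
  2. Install the EKS cluster command-line tool eksctl.
  3. Install the Kubernetes cluster command-line tool kubectl.
2. Create an EKS cluster using one of the following methods:
  1. Use eksctl to quickly create an EKS cluster.
  2. Manually create an EKS cluster with the AWS console and AWS CLI.
Create a GKE cluster

Before you start creating a GKE cluster, make sure that you have completed all the prerequisites. Then follow the instructions provided in Create a GKE cluster.
Create a self-managed Kubernetes cluster

Follow the instructions provided in Bootstrapping clusters with kubeadm to create a self-managed Kubernetes cluster. Alternatively, you can use Minikube or Docker Desktop to create a single-node private Kubernetes cluster with minimal steps.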

Deploy StarRocks Operator

Add the custom resource StarRocksCluster.

kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/starrocks.com_starrocksclusters.yaml

Deploy StarRocks Operator. You can deploy it with either the default configuration file or a custom configuration file.
1. Deploy StarRocks Operator with the default configuration file.
```
kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
```
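  StarRocks Operator is deployed to the `starrocks` namespace and manages all StarRocks clusters in all namespaces.
2. Deploy StarRocks Operator with a custom configuration file.
  - Download the configuration file **operator.yaml**, which is used to deploy StarRocks Operator.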
    curl -O https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/deploy/operator.yaml
  - Modify the configuration file **operator.yaml** to suit your needs.
  - Deploy StarRocks Operator.
    kubectl apply -f operator.yaml

Check the running status of StarRocks Operator. If the pod is in the `Running` state and all containers inside the pod are `READY`, StarRocks Operator is running as expected.

$ kubectl -n starrocks get pods
NAME                                  READY   STATUS    RESTARTS   AGE
starrocks-controller-65bb8679-jkbtg   1/1     Running   0          5m6s
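
The readiness rule described above (status `Running` and every container `READY`) can be checked mechanically. The following is a minimal sketch, assuming you have captured the text output of `kubectl -n starrocks get pods`; the function name `unready_pods` is hypothetical and not part of any StarRocks or Kubernetes tooling.

```python
def unready_pods(kubectl_output: str) -> list[str]:
    """Return the names of pods that are not Running or not fully READY.

    Assumes the default column layout of `kubectl get pods`:
    NAME, READY (as n/m), STATUS, RESTARTS, AGE.
    """
    lines = [ln for ln in kubectl_output.strip().splitlines() if ln.strip()]
    problems = []
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


sample = """\
NAME                                  READY   STATUS    RESTARTS   AGE
starrocks-controller-65bb8679-jkbtg   1/1     Running   0          5m6s
"""
print(unready_pods(sample))  # an empty list means every pod is ready
```

An empty result corresponds to the "running as expected" condition; any returned name is a pod worth inspecting.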

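NOTE

If you customized the namespace in which StarRocks Operator is located, replace `starrocks` with the name of your customized namespace.

Deploy a StarRocks cluster

You can directly use the sample configuration files provided by StarRocks to deploy a StarRocks cluster (an object instantiated from the custom resource StarRocksCluster). For example, you can use **starrocks-fe-and-be.yaml** to deploy a StarRocks cluster that contains three FE nodes and three BE nodes.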

kubectl apply -f https://raw.githubusercontent.com/StarRocks/starrocks-kubernetes-operator/main/examples/starrocks/starrocks-fe-and-be.yaml

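The following table describes some important fields in the **starrocks-fe-and-be.yaml** file.

| Field | Description |
| --- | --- |
| `kind` | The resource type of the object. The value must be `StarRocksCluster`. |
| `metadata` | Metadata, in which the following sub-fields are nested. `name`: the name of the object. Each object name uniquely identifies an object of the same resource type. `namespace`: the namespace to which the object belongs. |
| `spec` | The expected state of the object. It holds the component specifications `starRocksFeSpec`, `starRocksBeSpec`, and `starRocksCnSpec`. |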

You can also deploy the StarRocks cluster with a modified configuration file. For the supported fields and detailed descriptions, see api.md.
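
Before applying a modified configuration file, it can help to sanity-check the fields described in this section. The sketch below is illustrative only: it works on a manifest that has already been loaded into a Python dictionary, the helper `check_manifest` is hypothetical, and it checks only the handful of fields named here, not the full schema documented in api.md.

```python
# Component specs named in this topic; the authoritative list is in api.md.
COMPONENT_SPECS = {"starRocksFeSpec", "starRocksBeSpec", "starRocksCnSpec"}


def check_manifest(manifest: dict) -> list[str]:
    """Return human-readable problems found in a StarRocksCluster manifest."""
    problems = []
    if manifest.get("kind") != "StarRocksCluster":
        problems.append("kind must be StarRocksCluster")
    metadata = manifest.get("metadata") or {}
    if not metadata.get("name"):
        problems.append("metadata.name is missing")
    spec = manifest.get("spec") or {}
    if not COMPONENT_SPECS.intersection(spec):
        problems.append("spec contains none of: " + ", ".join(sorted(COMPONENT_SPECS)))
    return problems


# Illustrative manifest fragment, not a complete configuration file.
sample_manifest = {
    "kind": "StarRocksCluster",
    "metadata": {"name": "starrockscluster-sample", "namespace": "starrocks"},
    "spec": {"starRocksFeSpec": {"replicas": 3}, "starRocksBeSpec": {"replicas": 3}},
}
print(check_manifest(sample_manifest))  # an empty list means no problems were found
```

The check leaves `metadata.namespace` optional because Kubernetes falls back to the current namespace when it is omitted.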

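Deploying the StarRocks cluster takes a while. During this period, you can run `kubectl -n starrocks get pods` to check the startup status of the StarRocks cluster. If all pods are in the `Running` state and all containers inside the pods are `READY`, the StarRocks cluster is running as expected.

NOTE

If you customized the namespace in which the StarRocks cluster is located, replace `starrocks` with the name of your customized namespace.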

$ kubectl -n starrocks get pods
NAME                                  READY   STATUS    RESTARTS   AGE
starrocks-controller-65bb8679-jkbtg   1/1     Running   0          22h
starrockscluster-sample-be-0          1/1     Running   0          23h
starrockscluster-sample-be-1          1/1     Running   0          23h
starrockscluster-sample-be-2          1/1     Running   0          22h
starrockscluster-sample-fe-0          1/1     Running   0          21h
starrockscluster-sample-fe-1          1/1     Running   0          21h
starrockscluster-sample-fe-2          1/1     Running   0          22h

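NOTE

If some pods cannot start after a long period of time, you can run `kubectl logs -n starrocks <pod_name>` to view the log information, or run `kubectl -n starrocks describe pod <pod_name>` to view the event information, in order to locate the problem.

Manage a StarRocks cluster

Access the StarRocks cluster

The components of the StarRocks cluster can be accessed through their associated Services, such as the FE Service. For detailed descriptions of Services and their access addresses, see api.md and Services.

NOTE

Only the FE Service is deployed by default. If you need to deploy the BE Service and CN Service, configure `starRocksBeSpec` and `starRocksCnSpec` in the StarRocks cluster configuration file.

The name of a Service is `<cluster name>-<component name>-service` by default, for example, `starrockscluster-sample-fe-service`. You can also specify the Service name in the spec of each component.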
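
The default naming rule above can be written as a small helper. This is a sketch for illustration: both function names are hypothetical, and the second function relies on the standard Kubernetes Service DNS form `<service>.<namespace>.svc.<cluster domain>`, assuming the common default cluster domain `cluster.local` (your cluster may use a different one).

```python
def default_service_name(cluster: str, component: str) -> str:
    """Default Service name: <cluster name>-<component name>-service."""
    return f"{cluster}-{component}-service"


def in_cluster_dns(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """DNS name of a Service as seen from inside the Kubernetes cluster.

    cluster_domain is an assumption; "cluster.local" is only the usual default.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


fe_service = default_service_name("starrockscluster-sample", "fe")
print(fe_service)
print(in_cluster_dns(fe_service, "starrocks"))
```

If you set a custom Service name in a component spec, the default rule no longer applies and the custom name should be used instead.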

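Access the StarRocks cluster from within the Kubernetes cluster

From within the Kubernetes cluster, the StarRocks cluster can be accessed through the ClusterIP of the FE Service.

Obtain the internal virtual IP address `CLUSTER-IP` and port `PORT(S)` of the FE Service.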

$ kubectl -n starrocks get svc 
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                               AGE
be-domain-search                     ClusterIP   None             <none>        9050/TCP                              23m
fe-domain-search                     ClusterIP   None             <none>        9030/TCP                              25m
starrockscluster-sample-fe-service   ClusterIP   10.100.162.xxx   <none>        8030/TCP,9020/TCP,9030/TCP,9010/TCP   25m

Access the StarRocks cluster with a MySQL client from within the Kubernetes cluster.
```
mysql -h 10.100.162.xxx -P 9030 -uroot
```

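Access the StarRocks cluster from outside the Kubernetes cluster

From outside the Kubernetes cluster, you can access the StarRocks cluster through the LoadBalancer or NodePort of the FE Service. This topic uses LoadBalancer as an example.

Run the command `kubectl -n starrocks edit src starrockscluster-sample` to update the StarRocks cluster configuration file, and change the Service type of `starRocksFeSpec` to `LoadBalancer`.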

starRocksFeSpec:
  image: starrocks/fe-ubuntu:3.0-latest
  replicas: 3
  requests:
    cpu: 4
    memory: 16Gi
  service:            
    type: LoadBalancer # specified as LoadBalancer

Obtain the externally exposed IP address `EXTERNAL-IP` and port `PORT(S)` of the FE Service.

$ kubectl -n starrocks get svc
NAME                                 TYPE           CLUSTER-IP       EXTERNAL-IP                                                              PORT(S)                                                       AGE
be-domain-search                     ClusterIP      None             <none>                                                                   9050/TCP                                                      127m
fe-domain-search                     ClusterIP      None             <none>                                                                   9030/TCP                                                      129m
starrockscluster-sample-fe-service   LoadBalancer   10.100.162.xxx   a7509284bf3784983a596c6eec7fc212-618xxxxxx.us-west-2.elb.amazonaws.com   8030:30629/TCP,9020:32544/TCP,9030:32244/TCP,9010:32024/TCP   129m

Log in to your host machine and access the StarRocks cluster with a MySQL client.

mysql -h a7509284bf3784983a596c6eec7fc212-618xxxxxx.us-west-2.elb.amazonaws.com -P9030 -uroot
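
This topic walks through LoadBalancer only. For the NodePort alternative, the `PORT(S)` column of `kubectl get svc` lists each entry as `port:nodePort/protocol`, so the node port that maps to the query port 9030 can be read from it. The sketch below parses that column; the function name `node_port_for` is hypothetical, and the sample string is copied from the output above.

```python
from typing import Optional


def node_port_for(ports: str, service_port: int) -> Optional[int]:
    """Find the node port mapped to service_port in a PORT(S) column value.

    Entries look like "9030:32244/TCP" when a node port is allocated,
    or "9030/TCP" when it is not (for example, a ClusterIP Service).
    """
    for entry in ports.split(","):
        mapping, _, _protocol = entry.partition("/")
        port, _, node_port = mapping.partition(":")
        if int(port) == service_port and node_port:
            return int(node_port)
    return None


ports_column = "8030:30629/TCP,9020:32544/TCP,9030:32244/TCP,9010:32024/TCP"
print(node_port_for(ports_column, 9030))
```

With a NodePort Service, a MySQL client would connect to the IP address of any Kubernetes node on the returned port, provided that the node is reachable from the client network.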

Upgrade the StarRocks cluster

Upgrade BE nodes

Run the following command to specify a new BE image, for example, `starrocks/be-ubuntu:latest`:

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"image":"starrocks/be-ubuntu:latest"}}}'

Upgrade FE nodes

Run the following command to specify a new FE image, for example, `starrocks/fe-ubuntu:latest`:

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksFeSpec":{"image":"starrocks/fe-ubuntu:latest"}}}'

The upgrade process takes a while. You can run the command `kubectl -n starrocks get pods` to view the upgrade progress.
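
The two upgrade commands differ only in the component spec and the image name, so the merge-patch payload can be generated rather than typed by hand. The following is a minimal sketch; `image_patch` and `kubectl_patch_argv` are hypothetical helper names, and the code only builds the argument list without running `kubectl`.

```python
import json


def image_patch(component_spec: str, image: str) -> str:
    """Build the JSON merge patch that sets the image of one component spec."""
    return json.dumps({"spec": {component_spec: {"image": image}}}, separators=(",", ":"))


def kubectl_patch_argv(namespace: str, cluster: str, patch: str) -> list[str]:
    """Argument list equivalent to the kubectl patch commands in this section."""
    return ["kubectl", "-n", namespace, "patch", "starrockscluster", cluster,
            "--type=merge", "-p", patch]


be_patch = image_patch("starRocksBeSpec", "starrocks/be-ubuntu:latest")
print(be_patch)
print(kubectl_patch_argv("starrocks", "starrockscluster-sample", be_patch))
```

Passing the argument list to `subprocess.run` avoids shell quoting issues around the JSON payload.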

Scale the StarRocks cluster

Scale out the BE cluster

Run the following command to scale out the BE cluster to 9 nodes:

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"replicas":9}}}'

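Scale in the BE cluster

When scaling in BE nodes, you need to remove one node at a time and wait for the tablets on that BE to be redistributed before you continue. If single-replica tables exist, taking a BE node offline may cause data loss if its tablets fail to be redistributed.

Run the following command to scale in a cluster with 10 BE nodes to 9 nodes: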

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksBeSpec":{"replicas":9}}}'

After scaling in, you must manually drop the nodes whose alive status is false.

Tablet redistribution takes some time. You can check the progress by running `SHOW PROC '/statistic';`.
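
Because BE nodes must be removed one at a time, a larger scale-in becomes a sequence of single-step patches. The sketch below only plans that sequence; the function names are hypothetical, and waiting for tablet redistribution between steps remains a manual check.

```python
import json


def scale_in_steps(current: int, target: int) -> list[int]:
    """Replica counts to apply in order, decreasing by one node per step."""
    if target < 0 or target > current:
        raise ValueError("target must be between 0 and the current replica count")
    return list(range(current - 1, target - 1, -1))


def replicas_patch(replicas: int) -> str:
    """JSON merge patch that sets the BE replica count."""
    return json.dumps({"spec": {"starRocksBeSpec": {"replicas": replicas}}}, separators=(",", ":"))


for step in scale_in_steps(10, 7):
    # Apply one patch, then wait until tablets are redistributed before the next.
    print(replicas_patch(step))
```

Each printed payload corresponds to one `kubectl patch` invocation of the form shown above.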

Scale out the FE cluster

Run the following command to scale out the FE cluster to 4 nodes:

kubectl -n starrocks patch starrockscluster starrockscluster-sample --type='merge' -p '{"spec":{"starRocksFeSpec":{"replicas":4}}}'

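The scaling process takes a while. You can run the command `kubectl -n starrocks get pods` to view the scaling progress.

Automatic scaling for the CN cluster

Run the command `kubectl -n starrocks edit src starrockscluster-sample` to configure the automatic scaling policy for the CN cluster. In the policy, you can specify the resource metrics for CNs (average CPU utilization and average memory usage), the elastic scaling threshold, and the upper and lower elastic scaling limits. The upper and lower limits specify the maximum and minimum numbers of CNs allowed for elastic scaling.

NOTE

If an automatic scaling policy is configured for the CN cluster, delete the `replicas` field from `starRocksCnSpec` in the StarRocks cluster configuration file.

Kubernetes also supports using `behavior` to customize scaling behaviors according to business scenarios, helping you achieve rapid or slow scaling or disable scaling. For more information about automatic scaling policies, see Horizontal Pod Scaling.

The following is a template provided by StarRocks to help you configure automatic scaling policies: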

  starRocksCnSpec:
    image: starrocks/cn-ubuntu:latest
    limits:
      cpu: 16
      memory: 64Gi
    requests:
      cpu: 16
      memory: 64Gi
    autoScalingPolicy: # Automatic scaling policy of the CN cluster.
      maxReplicas: 10 # The maximum number of CNs is set to 10.
      minReplicas: 1 # The minimum number of CNs is set to 1.
      # operator creates an HPA resource based on the following field.
      # see https://kubernetes.ac.cn/docs/tasks/run-application/horizontal-pod-autoscale/ for more information.
      hpaPolicy:
        metrics: # Resource metrics
          - type: Resource
            resource:
              name: memory  # The average memory usage of CNs is specified as a resource metric.
              target:
                # The elastic scaling threshold is 60%.
                # When the average memory utilization of CNs exceeds 60%, the number of CNs increases for scale-out.
                # When the average memory utilization of CNs is below 60%, the number of CNs decreases for scale-in.
                averageUtilization: 60
                type: Utilization
          - type: Resource
            resource:
              name: cpu # The average CPU utilization of CNs is specified as a resource metric.
              target:
                # The elastic scaling threshold is 60%.
                # When the average CPU utilization of CNs exceeds 60%, the number of CNs increases for scale-out.
                # When the average CPU utilization of CNs is below 60%, the number of CNs decreases for scale-in.
                averageUtilization: 60
                type: Utilization
        behavior: #  The scaling behavior is customized according to business scenarios, helping you achieve rapid or slow scaling or disable scaling.
          scaleUp:
            policies:
              - type: Pods
                value: 1
                periodSeconds: 10
          scaleDown:
            selectPolicy: Disabled

The following describes some important fields.

The upper and lower limits for elastic scaling.

maxReplicas: 10 # The maximum number of CNs is set to 10.
minReplicas: 1 # The minimum number of CNs is set to 1.

The threshold for elastic scaling.

# For example, the average CPU utilization of CNs is specified as a resource metric.
# The elastic scaling threshold is 60%.
# When the average CPU utilization of CNs exceeds 60%, the number of CNs increases for scale-out.
# When the average CPU utilization of CNs is below 60%, the number of CNs decreases for scale-in.
- type: Resource
  resource:
    name: cpu
    target:
      averageUtilization: 60
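
To see how the threshold and the limits interact, the sketch below follows the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: `ceil(currentReplicas * currentUtilization / targetUtilization)`, evaluated per metric, with the largest proposal taken and the result clamped to `minReplicas` and `maxReplicas`. It is a simplification under stated assumptions: the 10% tolerance is the Kubernetes default, the function name is hypothetical, and the rate limits under `behavior.scaleUp.policies` and stabilization windows are not modeled.

```python
import math


def desired_replicas(current, utilizations, target=60, min_replicas=1,
                     max_replicas=10, tolerance=0.1, scale_down_disabled=True):
    """Estimate the CN count an HPA would aim for, given per-metric utilization (%).

    scale_down_disabled mirrors `scaleDown.selectPolicy: Disabled` in the template.
    """
    if not utilizations:
        raise ValueError("at least one metric value is required")
    proposals = []
    for utilization in utilizations:
        ratio = utilization / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to the target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    desired = max(proposals)  # the metric demanding the most replicas wins
    if scale_down_disabled:
        desired = max(desired, current)
    return max(min_replicas, min(max_replicas, desired))


# 4 CNs, memory at 90% and CPU at 50% against a 60% threshold.
print(desired_replicas(4, [90, 50]))
```

With scale-down disabled, as in the template, low utilization never reduces the CN count; setting `scale_down_disabled=False` shows what the threshold alone would produce.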

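FAQ

**Issue description:** When the custom resource StarRocksCluster is installed using `kubectl apply -f xxx`, the error `The CustomResourceDefinition 'starrocksclusters.starrocks.com' is invalid: metadata.annotations: Too long: must have at most 262144 bytes` is returned.

**Cause analysis:** Each time a resource is created or updated with `kubectl apply -f xxx`, the metadata annotation `kubectl.kubernetes.io/last-applied-configuration` is added. This annotation is in JSON format and records the last applied configuration. `kubectl apply -f xxx` is suitable for most cases, but in rare situations, such as when the configuration file of the custom resource is too large, the size of the metadata annotation may exceed the limit.

**Solution:** If you are installing the custom resource StarRocksCluster for the first time, it is recommended to use `kubectl create -f xxx`. If the custom resource StarRocksCluster is already installed in the environment and you need to update its configuration, it is recommended to use `kubectl replace -f xxx`.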
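
The decision described above can be sketched as a rough pre-check. This is an estimate under assumptions, not how kubectl itself decides: the annotation size is approximated by serializing the manifest to compact JSON, the exact bytes that kubectl stores may differ, and both function names are hypothetical. The 262144-byte limit is taken from the error message.

```python
import json

ANNOTATION_LIMIT_BYTES = 262144  # limit quoted in the error message


def last_applied_size(manifest: dict) -> int:
    """Approximate size in bytes of the last-applied-configuration annotation."""
    return len(json.dumps(manifest, separators=(",", ":")).encode("utf-8"))


def suggested_command(manifest: dict, already_installed: bool, path: str = "xxx") -> str:
    """Pick apply, create, or replace following the guidance in this FAQ."""
    if last_applied_size(manifest) <= ANNOTATION_LIMIT_BYTES:
        return f"kubectl apply -f {path}"
    if already_installed:
        return f"kubectl replace -f {path}"
    return f"kubectl create -f {path}"


small = {"kind": "StarRocksCluster", "metadata": {"name": "starrockscluster-sample"}}
print(suggested_command(small, already_installed=False))
```

Because the estimate is approximate, a manifest close to the limit is safer handled with `kubectl create` or `kubectl replace` directly.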
