Near Realtime RAN Intelligent Controller / RIC-1041

Frequent Pod Eviction in Near-RT RIC Due to Ephemeral-Storage Resource Limitation


    • Type: Bug
    • Resolution: Done
    • Priority: Medium
    • None
    • None
    • ric-dep
    • None

      Hello, while testing the near-RT RIC deployment, I've observed frequent, recurring pod evictions caused by ephemeral-storage pressure. The issue appears to stem from the pods not requesting any ephemeral storage. When the node runs low on ephemeral storage, the kubelet first evicts pods whose usage exceeds their request, and with a request of 0, any usage at all qualifies. The corresponding log is pasted below:

      root@tianchang-Ubuntu:/home/tianchang/Desktop/proj/oran-sc/ric-dep/bin# kubectl get po -n ricplt
      NAME                                                         READY   STATUS    RESTARTS   AGE
      deployment-ricplt-a1mediator-7d5b85ff7d-mst5j                1/1     Running   0          2m51s
      deployment-ricplt-alarmmanager-6bd5fccfc8-lw5rg              1/1     Running   0          2m7s
      deployment-ricplt-appmgr-77986c9cbb-7lh2t                    1/1     Running   0          3m35s
      deployment-ricplt-e2mgr-78c987559f-vcjd2                     1/1     Running   0          3m14s
      deployment-ricplt-e2term-alpha-5dc768bcb7-brsz6              0/1     Pending   0          8s
      deployment-ricplt-e2term-alpha-5dc768bcb7-cn9p9              0/1     Evicted   0          3m3s
      deployment-ricplt-o1mediator-97fb6759b-trkdt                 1/1     Running   0          2m18s
      deployment-ricplt-rtmgr-78f768474-sv6b6                      1/1     Running   2          3m25s
      deployment-ricplt-submgr-56bb776b68-fwzdm                    1/1     Running   0          2m40s
      deployment-ricplt-vespamgr-84f7d87dfb-jxcms                  1/1     Running   0          2m29s
      r4-infrastructure-kong-7995f4679b-kx9gj                      0/2     Pending   0          13s
      r4-infrastructure-kong-7995f4679b-n7dj9                      0/2     Evicted   0          3m57s
      r4-infrastructure-prometheus-alertmanager-5798b78f48-7p9mq   0/2     Evicted   0          3m57s
      r4-infrastructure-prometheus-alertmanager-5798b78f48-qz588   0/2     Pending   0          2s
      r4-infrastructure-prometheus-server-c8ddcfdf5-ffkbn          0/1     Evicted   0          3m57s
      r4-infrastructure-prometheus-server-c8ddcfdf5-k6t5n          0/1     Evicted   0          19s
      r4-infrastructure-prometheus-server-c8ddcfdf5-r674b          0/1     Evicted   0          19s
      r4-infrastructure-prometheus-server-c8ddcfdf5-vhbxz          0/1     Pending   0          17s
      r4-infrastructure-prometheus-server-c8ddcfdf5-w74sr          0/1     Evicted   0          19s
      r4-infrastructure-prometheus-server-c8ddcfdf5-xsb42          0/1     Evicted   0          18s
      statefulset-ricplt-dbaas-server-0                            1/1     Running   0          3m46s
      root@tianchang-Ubuntu:/home/tianchang/Desktop/proj/oran-sc/ric-dep/bin# kubectl describe pod -n ricplt deployment-ricplt-e2term-alpha-5dc768bcb7-cn9p9
      Name:           deployment-ricplt-e2term-alpha-5dc768bcb7-cn9p9
      Namespace:      ricplt
      Priority:       0
      Node:           tianchang-ubuntu/
      Start Time:     Wed, 03 Jan 2024 11:07:59 -0500
      Labels:         app=ricplt-e2term-alpha
                      pod-template-hash=5dc768bcb7
                      release=r4-e2term
      Annotations:    <none>
      Status:         Failed
      Reason:         Evicted
      Message:        The node was low on resource: ephemeral-storage. Container container-ricplt-e2term was using 124Ki, which exceeds its request of 0. 
      IP:             
      IPs:            <none>
      Controlled By:  ReplicaSet/deployment-ricplt-e2term-alpha-5dc768bcb7
      Containers:
        container-ricplt-e2term:
          Image:       nexus3.o-ran-sc.org:10002/o-ran-sc/ric-plt-e2:6.0.4
          Ports:       4561/TCP, 38000/TCP, 36422/SCTP, 8088/TCP
          Host Ports:  0/TCP, 0/TCP, 0/SCTP, 0/TCP
          Liveness:    exec [/bin/sh -c ip=`hostname -i`;export RMR_SRC_ID=$ip;/opt/e2/rmr_probe -h $ip:38000] delay=10s timeout=1s period=10s #success=1 #failure=3
          Readiness:   exec [/bin/sh -c ip=`hostname -i`;export RMR_SRC_ID=$ip;/opt/e2/rmr_probe -h $ip:38000] delay=120s timeout=1s period=60s #success=1 #failure=3
          Environment Variables from:
            configmap-ricplt-e2term-env-alpha  ConfigMap  Optional: false
          Environment:
            SYSTEM_NAME:      SEP
            CONFIG_MAP_NAME:  /etc/config/log-level
            HOST_NAME:         (v1:spec.nodeName)
            SERVICE_NAME:     RIC_E2_TERM
            CONTAINER_NAME:   container-ricplt-e2term
            POD_NAME:         deployment-ricplt-e2term-alpha-5dc768bcb7-cn9p9 (v1:metadata.name)
          Mounts:
            /data/outgoing/ from vol-shared (rw)
            /etc/config from local-loglevel-file (rw)
            /opt/e2/router.txt from local-router-file (rw,path="router.txt")
            /tmp/rmr_verbose from local-router-file (rw,path="rmr_verbose")
            /var/run/secrets/kubernetes.io/serviceaccount from default-token-s87jw (ro)
      Volumes:
        local-router-file:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      configmap-ricplt-e2term-router-configmap
          Optional:  false
        local-loglevel-file:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      configmap-ricplt-e2term-loglevel-configmap
          Optional:  false
        vol-shared:
          Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName:  pvc-ricplt-e2term-alpha
          ReadOnly:   false
        default-token-s87jw:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  default-token-s87jw
          Optional:    false
      QoS Class:       BestEffort
      Node-Selectors:  <none>
      Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                       node.kubernetes.io/unreachable:NoExecute for 300s
      Events:
        Type     Reason     Age        From                       Message
        ----     ------     ----       ----                       -------
        Normal   Scheduled  <unknown>  default-scheduler          Successfully assigned ricplt/deployment-ricplt-e2term-alpha-5dc768bcb7-cn9p9 to tianchang-ubuntu
        Normal   Pulling    3m17s      kubelet, tianchang-ubuntu  Pulling image "nexus3.o-ran-sc.org:10002/o-ran-sc/ric-plt-e2:6.0.4"
        Normal   Pulled     2m4s       kubelet, tianchang-ubuntu  Successfully pulled image "nexus3.o-ran-sc.org:10002/o-ran-sc/ric-plt-e2:6.0.4"
        Normal   Created    2m3s       kubelet, tianchang-ubuntu  Created container container-ricplt-e2term
        Normal   Started    2m3s       kubelet, tianchang-ubuntu  Started container container-ricplt-e2term
        Warning  Evicted    23s        kubelet, tianchang-ubuntu  The node was low on resource: ephemeral-storage. Container container-ricplt-e2term was using 124Ki, which exceeds its request of 0.
        Normal   Killing    23s        kubelet, tianchang-ubuntu  Stopping container container-ricplt-e2term
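The eviction message above ("using 124Ki, which exceeds its request of 0") can be read against the kubelet's node-pressure eviction ranking. The Kubernetes documentation lists three criteria, in order: whether a pod's usage exceeds its request, the pod's priority, and its usage relative to its request. The Python sketch below is a simplified model of that ordering, not kubelet code. The `PodUsage` class, the `eviction_order` helper and the two extra pods are hypothetical. Only the e2term figures (124Ki used, request 0, priority 0) come from the log.

```python
from dataclasses import dataclass


@dataclass
class PodUsage:
    name: str
    priority: int       # spec.priority; the describe output above shows 0
    usage_bytes: int    # observed ephemeral-storage usage
    request_bytes: int  # declared ephemeral-storage request; 0 when unset


def eviction_order(pods):
    """Simplified model of kubelet ranking under ephemeral-storage pressure.

    1. Pods whose usage exceeds their request are ranked first.
    2. Among those, lower priority is ranked before higher priority.
    3. Then pods using more storage above their request come first.
    """
    return sorted(
        pods,
        key=lambda p: (
            p.usage_bytes <= p.request_bytes,   # False (over request) sorts first
            p.priority,                         # lower priority first
            -(p.usage_bytes - p.request_bytes), # largest overage first
        ),
    )


KI, MI, GI = 1024, 1024**2, 1024**3

pods = [
    # Hypothetical pod that declares a 1Gi request and stays under it.
    PodUsage("pod-with-request", 0, 200 * MI, 1 * GI),
    # Figures from the eviction message: 124Ki used, request of 0.
    PodUsage("deployment-ricplt-e2term-alpha", 0, 124 * KI, 0),
    # Hypothetical pod with no request and a larger footprint.
    PodUsage("pod-without-request", 0, 10 * MI, 0),
]

for p in eviction_order(pods):
    print(p.name)
```

Under this model, a declared request only changes which pods are ranked first. If the node stays under ephemeral-storage pressure, pods within their requests can still be evicted afterwards. Declaring requests also lets the scheduler account for ephemeral storage when placing pods.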
      

      It appears that the Helm charts in ric-dep/helm do not specify any ephemeral-storage resources, which would explain why the pods request none. This makes the near-RT RIC deployment unstable: critical pods are repeatedly evicted and replaced, which can disrupt service.
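As a stopgap until the charts are changed, a request (and optionally a limit) could be added to a running Deployment with `kubectl patch`. Its default patch type is a strategic merge patch, which matches containers by `name`. The Python sketch below only builds that patch body. The helper name and the `100Mi`/`1Gi` values are placeholders, not measured requirements. The Deployment name `deployment-ricplt-e2term-alpha` is inferred from the ReplicaSet `deployment-ricplt-e2term-alpha-5dc768bcb7` shown in the log.

```python
import json
from typing import Optional


def ephemeral_storage_patch(container: str, request: str, limit: Optional[str] = None) -> str:
    """Build a strategic-merge patch body (as JSON) for a Deployment.

    Containers are matched by name, so only the named container's
    resources are changed; other pod-template fields are left as they are.
    """
    resources = {"requests": {"ephemeral-storage": request}}
    if limit is not None:
        resources["limits"] = {"ephemeral-storage": limit}
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": container, "resources": resources}],
                }
            }
        }
    }
    return json.dumps(patch)


# Placeholder sizes (assumptions); they should be tuned from observed usage.
print(ephemeral_storage_patch("container-ricplt-e2term", "100Mi", "1Gi"))
```

The printed JSON could then be applied with something like `kubectl patch deployment deployment-ricplt-e2term-alpha -n ricplt -p '<printed JSON>'`. The next `helm upgrade` would likely revert a change made this way, so the durable fix would be a `resources` section in the chart templates under ric-dep/helm. Note also that the kubelet evicts a pod whose container exceeds its ephemeral-storage limit, so any limit should be sized generously.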

      For the deployment, I followed this installation guide: https://docs.o-ran-sc.org/projects/o-ran-sc-ric-plt-ric-dep/en/latest/installation-guides.html#

      I have attached the full logs below. Thanks for your attention; any insights or suggestions on the potential causes and mitigations would be appreciated!

       


            Assignee: Abhijit Gadgil (gabhijit)
            Reporter: Tianchang Yang (tchyang)
            Votes: 0
            Watchers: 2
