-
Epic
-
Resolution: Done
-
Highest
-
ZZZ_future, Dawn-R7, E, F, G
-
None
-
E2: check, validate, and define how the various overload and disconnect cases are handled, including checking the high-load results from Dawn.
For example, we want to make sure that an undetected SCTP link failure to E2 node 1 does not affect the handling of E2 node 2. We also need well-defined behavior when an SCTP association is overloading E2T: when do we start dropping messages, or do we throttle the peer via SCTP flow control? We also need to see clearly how RMR behaves on the other side.
We might have to define config parameters for socket buffer sizes.
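As a rough illustration of what such config parameters could look like, here is a minimal Python sketch. The parameter names (rcvbuf_bytes, sndbuf_bytes) and the default values are hypothetical and not taken from any existing E2T configuration. A TCP socket is used as a stand-in because SCTP support may not be available on the test host; SO_RCVBUF and SO_SNDBUF are socket-level options.

```python
import socket
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SocketBufferConfig:
    # Hypothetical parameter names and defaults, for illustration only.
    rcvbuf_bytes: int = 262144
    sndbuf_bytes: int = 262144


def apply_buffer_config(sock: socket.socket, cfg: SocketBufferConfig) -> Tuple[int, int]:
    """Request the configured buffer sizes and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, cfg.rcvbuf_bytes)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, cfg.sndbuf_bytes)
    # Read the values back: the kernel may adjust or clamp the request.
    granted_rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    granted_snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    return granted_rcv, granted_snd


if __name__ == "__main__":
    # TCP socket as a stand-in for an SCTP association socket.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        rcv, snd = apply_buffer_config(s, SocketBufferConfig())
        print("granted rcvbuf:", rcv, "granted sndbuf:", snd)
```

On Linux the granted value usually differs from the requested one (the kernel doubles it for bookkeeping and caps it at net.core.rmem_max / net.core.wmem_max), so logging the granted sizes at startup would make overload tests easier to interpret.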
Also check whether some of the bugs listed here are actually related: https://jira.o-ran-sc.org/issues/?jql=project%20%3D%20RIC%20AND%20type%20%3D%20Bug%20and%20status%20!%3D%20done%20and%20createdDate%20%3E%20%272020-06-24%2000%3A00%27%20ORDER%20BY%20createdDate%20DESC
A good starting point may be some kind of FMEA analysis around failure modes such as "slow sender/receiver, fast sender/receiver, bursty sender/receiver, stuck sender/receiver", covering the interplay of routing manager, E2T, E2M, and xApps and their failure cases.
Examples:
(A) xApp 1 gets stuck and no longer processes incoming RMR messages. How do we detect this, and how do RMR and E2T behave? The expectation is that E2T still serves everything else (ingress and egress of other xApps) normally.
(B) Same as (A), but with focus on indication messages served for a merged subscription in which xApp 1 and xApp 2 receive the same message.
(C) E2 node 1 gets stuck and the only way to detect this is SCTP timeouts. The E2T instance handling this E2 node must continue to serve other E2 instances.
(D) An xApp fails and no longer responds to the routing manager.
(E) Kubernetes interfaces related to failure detection and isolation are slow.
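
The isolation expectation in (A) and (C) can be illustrated with a minimal Python sketch, assuming one bounded queue per peer and a drop-newest policy once a peer's queue is full. This is not how E2T or RMR are implemented; the class name PerPeerQueues, the max_depth parameter, and the drop policy are assumptions chosen only to show that a stuck peer need not block traffic for the others.

```python
from collections import deque
from typing import Deque, Dict, List


class PerPeerQueues:
    """Hypothetical per-peer bounded queues; a full queue affects only its own peer."""

    def __init__(self, max_depth: int) -> None:
        self.max_depth = max_depth
        self.queues: Dict[str, Deque[bytes]] = {}
        self.dropped: Dict[str, int] = {}

    def enqueue(self, peer: str, msg: bytes) -> bool:
        """Queue msg for peer; return False and count a drop if that peer is backed up."""
        q = self.queues.setdefault(peer, deque())
        if len(q) >= self.max_depth:
            self.dropped[peer] = self.dropped.get(peer, 0) + 1
            return False
        q.append(msg)
        return True

    def drain(self, peer: str) -> List[bytes]:
        """Hand over everything queued for peer (called when the peer is able to receive)."""
        q = self.queues.get(peer)
        if not q:
            return []
        out = list(q)
        q.clear()
        return out


if __name__ == "__main__":
    queues = PerPeerQueues(max_depth=2)
    # xapp1 is stuck and never drains, so its queue fills and later messages are dropped.
    for i in range(4):
        queues.enqueue("xapp1", b"indication-%d" % i)
    # xapp2 is healthy and is still served.
    queues.enqueue("xapp2", b"indication-0")
    print("drops per peer:", queues.dropped)
    print("delivered to xapp2:", queues.drain("xapp2"))
```

The per-peer drop counter doubles as a simple detection signal: a peer whose counter keeps growing is a candidate for the "stuck receiver" failure mode. Whether to drop, or instead to throttle the sender through SCTP flow control, is exactly the policy question this epic asks to be defined.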
- relates to
-
RIC-916 new reconnect timer in E2 to reject new connect for x seconds
- Done