1. Introduction


SelfHeal is another feature implemented in the Test and Diagnostic component. It is a monitoring and recovery module.

SelfHeal periodically monitors the following:

  • CPU usage
  • Memory usage
  • Critical RDK-B processes

SelfHeal stores the Reset Count and the Reboot Count.
SelfHeal takes the required action based on predefined conditions, such as rebooting the device or restarting the affected process.
SelfHeal also runs a connectivity test.

2. Design Considerations

SelfHeal functionality is handled by a set of scripts. These scripts are available in the RDK-B RPI build by default and are customised to the RPI system specification, using actual devices as a reference.

Ensure that the following SelfHeal scripts are present on the device at the path "/usr/ccsp/tad":

  • resource_monitor.sh

  • task_health_monitor.sh

  • corrective_action.sh

  • self_heal_connectivity_test.sh
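
The presence check for the scripts listed above can be sketched in shell as follows. This is only an illustration: the function name missing_selfheal_scripts is hypothetical and is not part of the RDK-B build. It prints the name of every listed script that is not found in the given directory, and prints nothing when all four are present.

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDK-B scripts): list the SelfHeal
# scripts that are missing from a directory. The directory defaults to the
# path given in this document.
missing_selfheal_scripts() {
    dir="${1:-/usr/ccsp/tad}"
    for script in resource_monitor.sh task_health_monitor.sh \
                  corrective_action.sh self_heal_connectivity_test.sh; do
        # Print the script name only when the file does not exist.
        [ -f "$dir/$script" ] || echo "$script"
    done
}

# Check the default location; empty output means nothing is missing.
missing_selfheal_scripts
```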

2.1. Resource Monitoring

The "resource_monitor.sh" script monitors memory and CPU usage periodically (e.g. every 60 seconds). If "Average Memory Used" reaches the threshold value, a reboot action is executed.

Resource monitor sequence:

  i) From the first cycle onwards, the sleep time is calculated from the following parameter:

       Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_UsageComputeWindow * 60

     For example, the default value (RMInterval) is 1, so the sleep time is 60.
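
The sleep calculation above can be sketched in shell as follows. This is a minimal illustration and not the code from "resource_monitor.sh": the function name selfheal_sleep_seconds and the fallback to 1 when no value is passed are assumptions made for this example. On a device the window value would normally come from the data model; here it is passed as an argument so that the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of the RDK-B scripts): convert the
# UsageComputeWindow value into the sleep time used between cycles.
selfheal_sleep_seconds() {
    window="${1:-1}"            # assumed fallback: the documented default of 1
    echo $(( window * 60 ))
}

# With the default window of 1, this prints 60.
selfheal_sleep_seconds 1
```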

2.2. Process Monitoring

  1. The "task_health_monitor.sh" script monitors all RDK-B processes periodically (e.g. every 60 seconds), based on each process ID (PID).
  2. Depending on whether the PID is available, the required action is taken, such as restarting the process or rebooting the device.

Important points to remember:

  • Ccsp processes: if any of these processes crashes, SelfHeal restarts it.
  • "CcspCrSsp": if this process crashes, the device is rebooted.
  • "syseventd": if this process crashes, the device is rebooted.
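
The rules above can be sketched in shell as follows. This is an illustration under stated assumptions, not the logic of "task_health_monitor.sh": both function names are hypothetical, the use of pidof for the PID lookup is assumed, and the "none" result for names that match neither rule exists only because the text does not describe those cases. The functions print the action instead of performing it.

```shell
#!/bin/sh
# Hypothetical helper: map a crashed process name to the action described
# in the points above. Nothing is restarted or rebooted here.
selfheal_action_for() {
    case "$1" in
        CcspCrSsp|syseventd) echo "reboot" ;;   # crash of these reboots the device
        Ccsp*)               echo "restart" ;;  # other Ccsp processes are restarted
        *)                   echo "none" ;;     # assumption: not covered by the text
    esac
}

# Hypothetical helper: report "ok" when a PID is found for the process,
# otherwise report the action for a crashed process.
selfheal_check_process() {
    name="$1"
    if pidof "$name" >/dev/null 2>&1; then
        echo "ok"
    else
        selfheal_action_for "$name"
    fi
}

selfheal_action_for CcspCrSsp    # prints: reboot
```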


Task monitor sequence:

  i) From the first cycle onwards, the sleep time is calculated from the following parameter:

       Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_UsageComputeWindow * 60

     For example, the default value (RMInterval) is 1, so the sleep time is 60.


2.3. Connectivity Test

The "self_heal_connectivity_test.sh" script performs a ping test against a server IP/URI, which must be configured. If the server IP/URI is not configured, the ping test is not executed and no action is taken. If the server is configured and the ping test fails, a reboot action is executed.

Connectivity test sequence:

  i) Very first cycle after boot-up: the random sleep function is called.

  ii) From the second cycle onwards, the sleep time is calculated from the following parameter:

       Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_PingInterval * 60

      For example, the default PingInterval value is 60, so the sleep time is 3600.
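
The connectivity test behaviour described in this section can be sketched in shell as follows. This is a simplified illustration and not the code from "self_heal_connectivity_test.sh": the function names and the "skip", "none" and "reboot" result strings are hypothetical. The ping result is passed in as an exit status so that the sketch does not need network access, and the action is printed rather than performed.

```shell
#!/bin/sh
# Hypothetical helpers illustrating one connectivity test cycle.

# Sleep time for the second cycle onwards: PingInterval * 60.
conn_test_sleep_seconds() {
    interval="${1:-60}"         # assumed fallback: the documented default of 60
    echo $(( interval * 60 ))
}

# Decide what to do after one ping attempt.
#   $1 = configured server IP/URI (may be empty)
#   $2 = exit status of the ping command (0 means the ping succeeded)
conn_test_action() {
    server="$1"
    ping_status="$2"
    if [ -z "$server" ]; then
        echo "skip"             # no server configured: no ping, no action
    elif [ "$ping_status" -eq 0 ]; then
        echo "none"             # server reachable: nothing to do
    else
        echo "reboot"           # server configured and ping failed
    fi
}

conn_test_sleep_seconds 60      # prints: 3600
```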

2.4. SelfHeal Logs

SelfHeal logs are written to the following file:

    /rdklogs/logs/SelfHeal.txt.0

3. Architecture

3.1. Self Heal DataModel Flow 

                             


3.2. Process Monitor Flow 

                                                    

    

4. Data Model

4.1. List of SelfHeal supported DataModel commands


| S.No | Module | XML Mapper | DMCLI Command | Description |
|------|--------|------------|---------------|-------------|
| 1 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.X_RDKCENTRAL-COM_Enable | Enables or disables the SelfHeal functionality. |
| 2 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.X_RDKCENTRAL-COM_MaxRebootCount | Sets the maximum number of reboots performed on the RPI device once the CPU and memory threshold value (default 100) is reached. Default: 3. Once the count reaches 3, no further reboot is performed. The value can be increased if required. |
| 3 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.X_RDKCENTRAL-COM_MaxResetCount | Sets the maximum reset count for the connectivity test. For example, with a value of 3 the device reboots up to 3 times; after that, no further reboot is performed. The value can be increased if required. |
| 4 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable | Enables the ping function for the connectivity test. Default: TRUE. |
| 5 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.X_RDKCENTRAL-COM_DNS_URL | Sets the DNS URL used by the ping function for the connectivity test. Default: www.google.com. |
| 6 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_PingInterval | Sets the ping interval for the connectivity test. Default: 60. Range: 15 (minimum) to 1440 (maximum). |
| 7 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CorrectiveAction | Enables or disables corrective action for the SelfHeal scripts. Default: TRUE. |
| 8 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_UsageComputeWindow | Sets the resource monitor interval. Default: 1. |
| 9 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold | Sets the average CPU threshold value. Default: 100. |
| 10 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold | Sets the average memory threshold value. Default: 100. |
| 11 | TDM (TestAndDiagnostic) | TestAndDiagnostic.XML | Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_RebootInterval | Sets the reboot interval for the connectivity test. Default: 28800. If DNS or the WAN IP goes down, the device reboots after 28800. Before rebooting, the ping logic checks whether the difference between the current time and the last reboot time is greater than the reboot interval; only then does the device reboot. |
| 12 | PAM Module | TR181-USGv2.XML | Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason | Reports why the RPI device was rebooted. The value is the current reboot status. |
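
The reboot check described for RebootInterval, combined with the limit described for MaxResetCount, can be sketched in shell as follows. This is an illustration under assumptions, not the logic of the SelfHeal scripts: the function name reboot_allowed and its argument order are hypothetical, and the time values are assumed to be in the same unit as the interval. The function only returns a status and does not reboot anything.

```shell
#!/bin/sh
# Hypothetical helper: decide whether a connectivity-triggered reboot may
# proceed. Returns 0 (allowed) or 1 (not allowed).
#   $1 = current time          $2 = last reboot time
#   $3 = RebootInterval        $4 = reboots performed so far
#   $5 = MaxResetCount
reboot_allowed() {
    now="$1"; last="$2"; interval="$3"; count="$4"; max="$5"
    # Stop once the maximum count has been reached.
    [ "$count" -lt "$max" ] || return 1
    # Require the time since the last reboot to exceed the interval.
    [ $(( now - last )) -gt "$interval" ] || return 1
    return 0
}

# 30000 time units since the last reboot, default interval 28800,
# no reboots yet, limit of 3: the reboot is allowed.
if reboot_allowed 30000 0 28800 0 3; then
    echo "reboot"
else
    echo "hold"
fi
```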

5. Limitations

RPI does not support IPv6, so the IPv6 logic in "self_heal_connectivity_test.sh" and "task_health_monitor.sh" is skipped.

            
