Introduction

Self Heal is a monitoring and recovery module.

...

  • Resource monitoring: Monitors memory and CPU usage. If usage goes beyond the configured threshold, the device is rebooted.
  • Process monitoring: Periodically monitors the status of the critical processes.
    • Ccsp processes: If any of these processes crashes, Self Heal restarts it.
    • "CcspCrSsp": If this process crashes, the device is rebooted.
    • "syseventd": If syseventd crashes, the device is rebooted.
  • Connectivity test: If DNS or WAN_IP is down, the device stops the LAN functionality.
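The process-recovery rules in the list above can be summarized as a small decision function. This is a minimal sketch, assuming that every process whose name starts with "Ccsp" (other than CcspCrSsp) counts as a restartable Ccsp component. The function name recovery_action() and its return strings are hypothetical and are not taken from the Self Heal code.

```python
# Hypothetical sketch of the Self Heal process-recovery rules listed above.
# recovery_action() and its return strings are illustrative only.

# Processes whose crash triggers a device reboot instead of a restart.
REBOOT_ON_CRASH = {"CcspCrSsp", "syseventd"}


def recovery_action(process_name: str, running: bool) -> str:
    """Return "none", "restart" or "reboot" for one monitored process."""
    if running:
        return "none"
    if process_name in REBOOT_ON_CRASH:
        return "reboot"
    # Assumption: any other Ccsp* process is a restartable Ccsp component.
    if process_name.startswith("Ccsp"):
        return "restart"
    return "none"  # not monitored in this sketch


if __name__ == "__main__":
    status = {"CcspLMLite": False, "CcspCrSsp": True, "syseventd": False}
    for name, running in status.items():
        print(f"{name}: {recovery_action(name, running)}")
```

Checking the reboot list before the "Ccsp" prefix matters, because CcspCrSsp also starts with "Ccsp" but must lead to a reboot rather than a restart.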

Resource Monitor - Monitors CPU and Memory

...

                1.  By default, the AVG CPU threshold value is set to 100 and stored in the syscfg database. To change the default AVG CPU threshold value, refer to the attached screenshot and follow these steps:

...

RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
 RDKB_SELFHEAL : Total memory in system is 949444
 RDKB_SELFHEAL : Used memory in system is 148752
 RDKB_SELFHEAL : Free memory in system is 800792
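The AvgMemUsed value in the log above can be reproduced from the Total and Used lines: 148772 * 100 / 949444 is about 15.67, which the log reports as 15. The sketch below assumes integer (truncating) division, which is consistent with that logged value; the regular expression and the function names parse_memory() and used_percent() are our own, not Self Heal's. In this sample Used + Free (949540) does not equal Total (949444), so the sketch uses the Used line directly instead of deriving it from Free.

```python
import re

# Matches lines such as:
# "RDKB_SELFHEAL : Used memory in system is 148772 at timestamp ..."
MEM_RE = re.compile(r"RDKB_SELFHEAL : (Total|Used|Free) memory in system is (\d+)")


def parse_memory(log_lines):
    """Return {"total": ..., "used": ..., "free": ...} from the first matching lines."""
    values = {}
    for line in log_lines:
        match = MEM_RE.search(line)
        if match:
            # setdefault keeps the first sample if the block is logged twice.
            values.setdefault(match.group(1).lower(), int(match.group(2)))
    return values


def used_percent(used: int, total: int) -> int:
    """Memory usage in percent, truncated like the logged AvgMemUsed (assumption)."""
    return used * 100 // total


if __name__ == "__main__":
    sample = [
        "RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08",
        "RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08",
        "RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08",
    ]
    mem = parse_memory(sample)
    print("AvgMemUsed in %:", used_percent(mem["used"], mem["total"]))  # 15
```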

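The CPU side is compared against the documented default AVG CPU threshold of 100. This page does not describe how Self Heal averages CPU usage, so the sketch below swaps in a common Linux technique: take two samples of the aggregate "cpu" line in /proc/stat and compute the busy share of the elapsed jiffies. The function names and the "at or above the threshold" comparison are assumptions.

```python
DEFAULT_CPU_THRESHOLD = 100  # documented default; the real value is read from syscfg


def parse_cpu_line(line: str):
    """Return (idle, total) jiffies from the aggregate "cpu" line of /proc/stat."""
    fields = [int(value) for value in line.split()[1:]]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # guest time is already counted in user/nice, so only the first 8 are summed.
    counted = fields[:8]
    idle = counted[3] + (counted[4] if len(counted) > 4 else 0)  # idle + iowait
    return idle, sum(counted)


def cpu_usage_percent(first: str, second: str) -> float:
    """CPU usage in percent between two /proc/stat "cpu" samples."""
    idle1, total1 = parse_cpu_line(first)
    idle2, total2 = parse_cpu_line(second)
    elapsed = total2 - total1
    if elapsed <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle2 - idle1) / elapsed)


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as stat:
        return stat.readline()  # the first line is the aggregate "cpu" line


def cpu_threshold_reached(usage: float, threshold: float = DEFAULT_CPU_THRESHOLD) -> bool:
    # Assumption: the threshold counts as reached when usage is at or above it.
    return usage >= threshold
```

On the device, call read_cpu_line() twice, about a second apart, and pass both lines to cpu_usage_percent(). With the default threshold of 100, cpu_threshold_reached() only returns True when the CPU was fully busy over the whole sampling window.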
     

Process Monitor - Periodically Monitors Processes Based on Process IDs

...

If it detects that any monitored process is not running, it automatically restarts that component.

...

        4. After 60 seconds (default), Self Heal automatically restarts the process. Check the CcspLMLite PID again to confirm that the process is running.
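Step 4 can also be checked from a script: look up the CcspLMLite PID before stopping it and again after the restart interval, then confirm that a live PID is present. This sketch uses standard Linux techniques (scanning /proc/<pid>/comm, as pidof does, and sending signal 0 to test whether a PID exists). The helper names find_pids() and pid_alive() are hypothetical, and the sketch assumes the process name fits in the 15-character comm field, which CcspLMLite does.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (signal 0 delivers nothing)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_pids(name: str, proc_root: str = "/proc") -> list:
    """Return the PIDs whose <proc_root>/<pid>/comm equals name, like pidof."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "net"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as comm:
                if comm.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited between listdir() and open()
    return sorted(pids)
```

For example, if find_pids("CcspLMLite") returns a new PID after the restart interval and pid_alive() confirms it, Self Heal has restarted the component.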

Observed logs in /rdklogs/logs/SelfHeal.txt.0:

RDKB_SELFHEAL : <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM CcspLMLite process not running , restarting it

 RDKB_SELFHEAL : Resetting process CcspLMLite
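The RM lines written by Self Heal share a fixed header: a priority, CABLEMODEM[<model>], a numeric code, a timestamp, the MAC address and the architecture, followed by the message. The sketch below parses that header with a regular expression and extracts the name of the restarted process. The field names (pri, model, code, ts, mac, arch, msg) are our own labels, and the meaning of codes such as 99000006 and 99000007 is not documented on this page.

```python
import re
from datetime import datetime

# Header layout as seen in the sample logs, e.g.
# <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM ...
EVENT_RE = re.compile(
    r"<(?P<pri>\d+)>CABLEMODEM\[(?P<model>[^\]]*)\]:"
    r"<(?P<code>\d+)><(?P<ts>[\d:]+)><(?P<mac>[0-9A-Fa-f:]+)><(?P<arch>[^>]*)>\s*(?P<msg>.*)"
)
RESTART_RE = re.compile(r"RM (\S+) process not running")


def parse_event(line: str):
    """Return the header fields of an RM event line as a dict, or None if absent."""
    match = EVENT_RE.search(line)
    if not match:
        return None
    event = match.groupdict()
    event["ts"] = datetime.strptime(event["ts"], "%Y:%m:%d:%H:%M:%S")
    restart = RESTART_RE.search(event["msg"])
    event["restarted_process"] = restart.group(1) if restart else None
    return event
```

The same parser also accepts the "RM Memory threshold reached" and "RM CPU threshold reached" lines; for those, restarted_process is None.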

Connectivity Test - Ping Functionality

                     When WAN_IP/DNS goes down, the device stops the ping functionality.

The following logs were observed in /rdklogs/logs/SelfHeal.txt.0 (successful scenario):

190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com
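The DNS part of the connectivity test reports whether www.google.com resolves. A comparable manual check can be sketched with Python's standard resolver interface. This only approximates what Self Heal does, and dns_ok() is a hypothetical helper name.

```python
import socket


def dns_ok(hostname: str = "www.google.com") -> bool:
    """Return True if hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except (socket.gaierror, UnicodeError):
        return False  # resolution failed, or the name is not a valid hostname
```

Running dns_ok() on the device before and after bringing the WAN interface down should mirror the "Got success response" and "fail to resolve" DNS Response lines in SelfHeal.txt.0.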

                                    If the connectivity test fails, the device goes for a reboot.

        Validation: Use the following steps to validate the connectivity test:

                                   1. Unplug the Ethernet LAN cable, or bring the interface down with: ifconfig erouter0 down
                                   2. Check /rdklogs/logs/SelfHeal.txt.0 for the failure-scenario logs (see the Connectivity Test sample logs under Troubleshooting).
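When validating, the outcome can be read from the DNS Response lines and the corrective-action line in SelfHeal.txt.0. This is a minimal sketch, assuming the message texts copied from the sample logs on this page stay stable; connectivity_outcome() is a hypothetical helper.

```python
SUCCESS_MARKER = "DNS Response: Got success response"
FAILURE_MARKERS = ("DNS Response: fail to resolve", "Taking corrective action")


def connectivity_outcome(log_lines) -> str:
    """Classify a connectivity-test log excerpt as "failure", "success" or "unknown"."""
    lines = list(log_lines)
    # A failure marker wins: a later failure overrides an earlier success.
    if any(marker in line for line in lines for marker in FAILURE_MARKERS):
        return "failure"
    if any(SUCCESS_MARKER in line for line in lines):
        return "success"
    return "unknown"
```

On the device, pass an open file handle for /rdklogs/logs/SelfHeal.txt.0 (or only the lines written after the interface was brought down) to connectivity_outcome().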

Troubleshooting

                    1.  Use the Self Heal logs to troubleshoot run-time errors. The Self Heal logs are created in the following path:

...

                    2.  Resource Monitor sample logs:

                            Memory:

                                  RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
                                  RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
                                  RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
                                  RDKB_SELFHEAL : AvgMemUsed in % is 15
                                 190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
                                 RDKB_SELFHEAL : Total memory in system is 949444
                                 RDKB_SELFHEAL : Used memory in system is 148752
                                 RDKB_SELFHEAL : Free memory in system is 800792

                           CPU:

                                  190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM CPU threshold reached

                   3. Process Monitor sample logs:

                         LMLite process:

                                      RDKB_SELFHEAL : <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM CcspLMLite process not running , restarting it

                                      RDKB_SELFHEAL : Resetting process CcspLMLite

                 4. Connectivity Test sample logs:

                              Successful scenario:

                                        190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
                                        190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
                                        190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
                                        190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
                                        190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com

                             Failure scenario:

                                         191007-09:00:13.899713 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
                                         191007-09:00:13.909201 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.60.1
                                         191007-09:00:13.918684 [RDKB_SELFHEAL] : IPv6 GW Address is:
                                         191007-09:00:13.972966 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
                                         191007-09:00:14.119985 DNS Response: fail to resolve this URL www.google.com
                                          191007-09:00:14.152808 RDKB_SELFHEAL : Taking corrective action
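For a quick overview while troubleshooting, the marker strings from the sample logs above can be counted across the whole SelfHeal.txt.0 file. In the sketch below, the category names are our own, and the markers cover only the messages shown on this page, so other Self Heal messages are ignored.

```python
from collections import Counter

# Marker substrings copied from the sample logs; the category names are illustrative.
MARKERS = {
    "memory_threshold": "RM Memory threshold reached",
    "cpu_threshold": "RM CPU threshold reached",
    "process_restart": "process not running , restarting it",
    "corrective_action": "Taking corrective action",
}


def summarize(lines) -> Counter:
    """Count how many lines contain each known Self Heal marker."""
    counts = Counter()
    for line in lines:
        for category, marker in MARKERS.items():
            if marker in line:
                counts[category] += 1
    return counts


def summarize_file(path: str = "/rdklogs/logs/SelfHeal.txt.0") -> Counter:
    # errors="replace" keeps the scan going if the log contains stray bytes.
    with open(path, errors="replace") as log:
        return summarize(log)
```

A non-zero process_restart count, for example, points to the Process Monitor section above, where the affected component can be identified from the log line.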