This Page is under Development


Introduction

Self Heal is a monitoring and recovery module.

It continuously monitors system resources such as CPU and memory, and checks that the critical processes are running.

Self Heal also performs connectivity tests.

When it detects a problem, Self Heal takes a corrective action based on predefined conditions, such as rebooting the device or restarting the affected process.

Self Heal also stores the Reset Count and the Reboot Count.

Environment Setup

Self Heal functionality is handled by a set of scripts. These scripts are available in the RDK build by default.

Ensure that the following Self Heal scripts are present on the device under the path "/usr/ccsp/tad":

  • resource_monitor.sh

  • task_health_monitor.sh

  • corrective_action.sh

  • self_heal_connectivity.sh
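
The presence check above can be scripted. The following is a minimal sketch, assuming a POSIX shell on the device; the function name missing_selfheal_scripts is hypothetical and is not part of the Self Heal scripts.

```shell
#!/bin/sh
# Hypothetical helper: prints the name of each expected Self Heal script
# that is missing from the directory passed as the first argument.
missing_selfheal_scripts() {
    dir=$1
    for script in resource_monitor.sh task_health_monitor.sh \
                  corrective_action.sh self_heal_connectivity.sh; do
        [ -f "$dir/$script" ] || echo "$script"
    done
}

# Run the check only when a directory is passed on the command line.
if [ -n "$1" ]; then
    missing=$(missing_selfheal_scripts "$1")
    if [ -z "$missing" ]; then
        echo "All Self Heal scripts are present in $1"
    else
        echo "Missing from $1:"
        echo "$missing"
    fi
fi
```

Saved under any file name (check_selfheal_scripts.sh is used here only as an example), it can be run as "sh check_selfheal_scripts.sh /usr/ccsp/tad".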

Refer to the screenshot below to verify whether the Self Heal module is enabled.

[Screenshot: Self Heal module status]

Executing System

Self Heal is enabled by default and is active from boot-up.

It periodically performs the following actions:

  • Resource monitoring: Monitors memory and CPU usage. If usage goes beyond the threshold, Self Heal reboots the device.
  • Process monitoring: Periodically checks the status of the critical processes.
    • Ccsp processes: If any of these processes has crashed, Self Heal restarts it.
    • "CcspCrSsp": If this process has crashed, the device is rebooted.
    • "syseventd": If syseventd has crashed, the device is rebooted.
  • Connectivity test: If DNS or WAN_IP is down, the device stops the LAN functionality.
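
The process-monitoring rules in the list above can be read as a small lookup from process name to corrective action. The sketch below is illustrative only: the function name corrective_action_for and the returned strings are hypothetical and are not taken from the Self Heal scripts.

```shell
#!/bin/sh
# Hypothetical helper: maps a crashed process name to the corrective
# action described in the list above.
corrective_action_for() {
    case "$1" in
        # These two processes are listed as triggering a device reboot.
        CcspCrSsp|syseventd) echo "reboot" ;;
        # Any other Ccsp process is restarted by Self Heal.
        Ccsp*)               echo "restart" ;;
        # Processes outside the list are not covered here.
        *)                   echo "none" ;;
    esac
}

corrective_action_for CcspLMLite   # prints "restart"
corrective_action_for CcspCrSsp    # prints "reboot"
```

The more specific names are matched before the "Ccsp*" pattern, so CcspCrSsp maps to a reboot rather than a restart.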

Resource Monitor - Monitors CPU and Memory

  1. By default, the average CPU threshold is set to 100. This value is stored in the syscfg database. To change the default average CPU threshold, refer to the screenshot below and follow the steps shown.

[Screenshot: changing the average CPU threshold]

  2. By default, the average memory threshold is set to 100. This value is stored in the syscfg database. To change the default average memory threshold, refer to the screenshot below and follow the steps shown.

[Screenshot: changing the average memory threshold]

  3. Once usage reaches the threshold value, the device is rebooted automatically.

Observation in /rdklogs/logs/SelfHeal.txt.0:

RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
RDKB_SELFHEAL : Total memory in system is 949444
RDKB_SELFHEAL : Used memory in system is 148752
RDKB_SELFHEAL : Free memory in system is 800792
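
The AvgMemUsed value in the log is consistent with an integer percentage of used over total memory. The sketch below reproduces that arithmetic with the logged values; the function names mem_used_pct and threshold_reached are hypothetical, and the formula is an assumption inferred from the sample log rather than taken from resource_monitor.sh.

```shell
#!/bin/sh
# Hypothetical helper: integer percentage of used memory.
# Arguments: total memory, used memory (same unit).
mem_used_pct() {
    total=$1
    used=$2
    echo $(( used * 100 / total ))
}

# Hypothetical helper: succeeds when the usage percentage ($1) has
# reached the threshold ($2).
threshold_reached() {
    [ "$1" -ge "$2" ]
}

# Values from the sample log; 100 is the documented default threshold.
pct=$(mem_used_pct 949444 148772)
echo "AvgMemUsed in % is $pct"        # prints "AvgMemUsed in % is 15"
if threshold_reached "$pct" 100; then
    echo "threshold reached"
else
    echo "below threshold"            # this branch is taken for 15 vs 100
fi
```

In the sample log the threshold message appears at 15 percent, so the threshold in that run was presumably set below the default of 100.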

     

Process Monitor - Monitors the processes periodically, based on their process IDs

If it detects that any of the processes is not running, it automatically restarts that component.

Take the CcspLMLite component as an example:

  1. Run the ps command to verify that CcspLMLite is up and running, and note its process ID.

        ps aux | grep Ccsp

  2. Kill the CcspLMLite process using the command below.

        kill -9 <CcspLMLite PID>

  3. Verify that the CcspLMLite process was killed using the command below.

        ps aux | grep Ccsp

  4. After 60 seconds (the default), Self Heal automatically restarts the process. Check that CcspLMLite is running again with a different process ID.

[Screenshot: CcspLMLite restarted with a new process ID]
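
The four manual steps above can be combined into one script. This is a sketch under stated assumptions: the function names pid_of and was_restarted are hypothetical, the ps on the device is assumed to support the "-e" and "-o" options (a minimal BusyBox build may not), and the 90-second wait is an arbitrary value chosen to exceed the 60-second default.

```shell
#!/bin/sh
# Hypothetical helper: prints the PID of the first process whose command
# name is exactly $1. Assumes ps supports -e and -o.
pid_of() {
    ps -e -o pid= -o comm= | awk -v name="$1" '$2 == name { print $1; exit }'
}

# Hypothetical helper: succeeds when a new PID ($2) exists and differs
# from the old PID ($1).
was_restarted() {
    old_pid=$1
    new_pid=$2
    [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ]
}

# Run the kill-and-wait check only when a process name is passed.
if [ -n "$1" ]; then
    old=$(pid_of "$1")
    if [ -z "$old" ]; then
        echo "$1 is not running"
        exit 1
    fi
    kill -9 "$old"
    sleep "${2:-90}"    # wait longer than the 60-second default
    new=$(pid_of "$1")
    if was_restarted "$old" "$new"; then
        echo "$1 restarted: PID $old -> $new"
    else
        echo "$1 was not restarted"
    fi
fi
```

Saved under any file name (check_restart.sh is used here only as an example), it can be run as "sh check_restart.sh CcspLMLite". An optional second argument overrides the wait time in seconds.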

Connectivity Test - Ping Functionality

If the connectivity test fails, the device reboots.

Validation: To validate the connectivity test, unplug the Ethernet LAN cable or bring the interface down with the command below.

        ifconfig erouter0 down

Troubleshooting

  1. Use the Self Heal logs to troubleshoot run-time errors. The Self Heal logs are created at the path below:

        /rdklogs/logs/SelfHeal.txt.0

  2. Resource Monitor sample logs

     MEM:

        RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
        RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
        RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
        RDKB_SELFHEAL : AvgMemUsed in % is 15
        190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
        RDKB_SELFHEAL : Total memory in system is 949444
        RDKB_SELFHEAL : Used memory in system is 148752
        RDKB_SELFHEAL : Free memory in system is 800792

     CPU:

        190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM CPU threshold reached

  3. Process Monitor sample logs

     LMLite process:

        RDKB_SELFHEAL : <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM CcspLMLite process not running , restarting it
        RDKB_SELFHEAL : Resetting process CcspLMLite

  4. Connectivity Test sample logs

     Successful scenario:

        190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
        190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
        190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
        190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
        190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com

     Failure scenario:

        191007-09:00:13.899713 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
        191007-09:00:13.909201 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.60.1
        191007-09:00:13.918684 [RDKB_SELFHEAL] : IPv6 GW Address is:
        191007-09:00:13.972966 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
        191007-09:00:14.119985 DNS Response: fail to resolve this URL www.google.com
        191007-09:00:14.152808 RDKB_SELFHEAL : Taking corrective action




      

Support

Contact           Organization      Phone      Email      Role
<Contact Name>    <Organization>    <Phone>    <Email>