Self Heal is a monitoring and recovery module.
It continuously monitors system resources such as CPU and memory, as well as the critical processes running on the device.
Self Heal also performs connectivity tests.
When it detects a problem, Self Heal takes corrective action based on predefined conditions, such as rebooting the device or restarting the affected process.
Self Heal also stores the Reset Count and Reboot Count.
Self Heal functionality is handled by a set of scripts, which are included in the RDK build by default.
Ensure that the following Self Heal scripts are present on the device at the path "/usr/ccsp/tad":
resource_monitor.sh
task_health_monitor.sh
corrective_action.sh
self_heal_connectivity.sh
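The presence check described above can be scripted. The following is a minimal sketch, assuming a POSIX shell on the device; the function name check_selfheal_scripts is hypothetical and not part of RDK, and the directory can be passed as an argument if the scripts live elsewhere.

```shell
#!/bin/sh
# Sketch: report which Self Heal scripts are present in a directory.
# check_selfheal_scripts is a hypothetical helper name, not an RDK tool.
check_selfheal_scripts() {
    dir=${1:-/usr/ccsp/tad}   # default path taken from the text above
    missing=0
    for script in resource_monitor.sh task_health_monitor.sh \
                  corrective_action.sh self_heal_connectivity.sh; do
        if [ -f "$dir/$script" ]; then
            echo "found:   $dir/$script"
        else
            echo "missing: $dir/$script"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every script is present.
    [ "$missing" -eq 0 ]
}
```

Calling check_selfheal_scripts with no argument checks "/usr/ccsp/tad"; the exit status is non-zero if any script is missing.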
Refer to the screenshot below to verify whether the Self Heal module is enabled.
Self Heal is enabled by default and is active from boot-up.
It periodically performs the actions below.
1. By default, the AVG CPU threshold value is set to 100. This value is stored in the syscfg database. To change the default AVG CPU threshold value, refer to the attached screenshot and follow the steps shown.
2. By default, the AVG Memory threshold value is set to 100. This value is stored in the syscfg database. To change the default AVG Memory threshold value, refer to the attached screenshot and follow the steps shown.
3. Once usage reaches the threshold value, the device reboots automatically.
Observation in /rdklogs/logs/SelfHeal.txt.0:
RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
RDKB_SELFHEAL : Total memory in system is 949444
RDKB_SELFHEAL : Used memory in system is 148752
RDKB_SELFHEAL : Free memory in system is 800792
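The "AvgMemUsed in %" figure in the log above is consistent with an integer percentage of used over total memory. The sketch below reproduces that arithmetic; it is an illustration only, the helper names mem_used_pct and threshold_reached are hypothetical, and the "greater than or equal" comparison is an assumption about how the threshold is applied, not taken from resource_monitor.sh.

```shell
#!/bin/sh
# Sketch: integer percentage of used memory, as in "AvgMemUsed in % is 15".
# mem_used_pct is a hypothetical helper; the real script may compute this differently.
mem_used_pct() {
    total=$1
    used=$2
    echo $(( used * 100 / total ))
}

# Hypothetical helper: succeeds when usage has reached the threshold.
# Assumption: the threshold is treated as "greater than or equal".
threshold_reached() {
    pct=$1
    threshold=$2
    [ "$pct" -ge "$threshold" ]
}
```

With the values from the log, mem_used_pct 949444 148772 prints 15, which matches the logged percentage.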
If Self Heal detects that any of the monitored processes is not running, it automatically restarts that component.
Take the CcspLMLite component as an example:
1. Run the ps command to verify that CcspLMLite is running, and note its process ID (PID)
ps aux | grep Ccsp
2. Kill the CcspLMLite process using the command below
kill -9 <CcspLMLite PID>
3. Verify that the CcspLMLite process was killed using the command below
ps aux | grep Ccsp
4. After 60 seconds (the default), Self Heal restarts the process automatically. Check that CcspLMLite is running again with a different PID.
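The final check in the steps above can be automated by polling until the process reappears with a new PID. This is a minimal sketch, assuming pidof is available on the device; the helper names and the 90-second default timeout are assumptions chosen for illustration, not part of the Self Heal scripts.

```shell
#!/bin/sh
# Hypothetical helper: true when the process is back with a PID
# that differs from the one recorded before the kill.
restarted_with_new_pid() {
    old_pid=$1
    new_pid=$2
    [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ]
}

# Hypothetical helper: poll once per second until the named process
# reappears with a new PID, or give up after the timeout (in seconds).
wait_for_restart() {
    name=$1
    old_pid=$2
    timeout=${3:-90}   # assumption: a little longer than the 60-second default
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # pidof availability is assumed; an empty result means "not running".
        new_pid=$(pidof "$name" 2>/dev/null || true)
        if restarted_with_new_pid "$old_pid" "$new_pid"; then
            echo "$name restarted with PID $new_pid"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$name did not restart within ${timeout}s"
    return 1
}
```

A typical use would be to record the PID before step 2, kill the process, and then call wait_for_restart CcspLMLite with the recorded PID as the second argument.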
If the connectivity test fails, the device reboots.
Validation: use one of the steps below to validate the connectivity test.
Unplug the Ethernet LAN cable, or bring the interface down with: ifconfig erouter0 down
1. Use the SelfHeal logs to troubleshoot run-time errors. The SelfHeal log is created at the path below:
/rdklogs/logs/SelfHeal.txt.0
2. Resource Monitor sample logs:
MEM:
RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
RDKB_SELFHEAL : Total memory in system is 949444
RDKB_SELFHEAL : Used memory in system is 148752
RDKB_SELFHEAL : Free memory in system is 800792
CPU:
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM CPU threshold reached
3. Process Monitor sample logs:
LMLite process:
RDKB_SELFHEAL : <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM CcspLMLite process not running , restarting it
RDKB_SELFHEAL : Resetting process CcspLMLite
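To see which processes Self Heal has reset, the log can be filtered on the "Resetting process" message shown above. The following is a sketch only; list_reset_processes is a hypothetical helper, and it assumes every reset is logged in the same form as the sample line.

```shell
#!/bin/sh
# Hypothetical helper: list processes that Self Heal reset, with a count for each.
list_reset_processes() {
    log=${1:-/rdklogs/logs/SelfHeal.txt.0}
    # Matches lines such as "RDKB_SELFHEAL : Resetting process CcspLMLite"
    # and keeps only the process name.
    sed -n 's/.*Resetting process \([^ ]*\).*/\1/p' "$log" | sort | uniq -c
}
```

Run without arguments, it reads /rdklogs/logs/SelfHeal.txt.0; a different log file can be passed as the first argument.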
4. Connectivity Test sample logs:
Successful scenario:
190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com
Failure scenario:
191007-09:00:13.899713 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
191007-09:00:13.909201 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.60.1
191007-09:00:13.918684 [RDKB_SELFHEAL] : IPv6 GW Address is:
191007-09:00:13.972966 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
191007-09:00:14.119985 DNS Response: fail to resolve this URL www.google.com
191007-09:00:14.152808 RDKB_SELFHEAL : Taking corrective action
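The difference between the two scenarios above is the DNS failure line followed by the corrective action line. A quick way to spot these in a log is sketched below; connectivity_failures is a hypothetical helper, and the patterns are copied from the sample lines, so other failure messages would not be matched.

```shell
#!/bin/sh
# Hypothetical helper: print DNS failures and corrective actions from a SelfHeal log.
# Exits non-zero when no matching line is found (standard grep behaviour).
connectivity_failures() {
    log=${1:-/rdklogs/logs/SelfHeal.txt.0}
    # Patterns are copied from the sample log lines above.
    grep -E 'fail to resolve this URL|Taking corrective action' "$log"
}
```

An empty result with a non-zero exit status means the log contains neither message.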