Self Heal is a monitoring and recovery module.
It continuously monitors system resources such as CPU and memory, as well as the critical processes running on the device.
Self Heal also performs connectivity tests.
When it encounters a problem, Self Heal takes corrective action based on predefined conditions, such as rebooting the device or restarting the affected process.
Self Heal stores the Reset Count and Reboot Count.
Self Heal functionality is handled by a set of scripts, which are available in the RDK build by default.
Ensure that the following Self Heal scripts are present on the device under "/usr/ccsp/tad":
resource_monitor.sh
task_health_monitor.sh
corrective_action.sh
self_heal_connectivity.sh
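The presence check above could be scripted roughly as follows. This is a minimal sketch, not something shipped in the RDK build: the function name check_selfheal_scripts is hypothetical, and the directory is taken as an optional argument that defaults to the path given above.

```shell
#!/bin/sh
# Hypothetical helper (not part of RDK): report which Self Heal scripts
# are missing from a directory. Prints one "MISSING: <name>" line per
# absent script and returns non-zero if any script is absent.
check_selfheal_scripts() {
    dir=${1:-/usr/ccsp/tad}
    status=0
    for script in resource_monitor.sh task_health_monitor.sh \
                  corrective_action.sh self_heal_connectivity.sh; do
        if [ ! -f "$dir/$script" ]; then
            echo "MISSING: $script"
            status=1
        fi
    done
    return $status
}

# Usage on the device:
#   check_selfheal_scripts && echo "all Self Heal scripts present"
```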
Refer to the screenshot below to verify whether the Self Heal module is enabled.
Self Heal is enabled by default and is active at the time of boot up.
It periodically performs the actions below.
1. By default, the average CPU threshold is set to 100. This value is stored in the syscfg database. To change the default average CPU threshold, refer to the attached screenshot and follow the steps it shows.
2. By default, the average memory threshold is set to 100. This value is stored in the syscfg database. To change the default average memory threshold, refer to the attached screenshot and follow the steps it shows.
3. Once usage reaches the threshold value, the device reboots automatically.
Observation in /rdklogs/logs/SelfHeal.txt.0:
RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
RDKB_SELFHEAL : Total memory in system is 949444
RDKB_SELFHEAL : Used memory in system is 148752
RDKB_SELFHEAL : Free memory in system is 800792
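The percentage in the log can be reproduced from the total and used values with integer arithmetic. The sketch below is illustrative only: the helper names mem_used_percent and threshold_reached are hypothetical, and both the truncating division and the "greater than or equal" comparison are assumptions about how the monitoring script behaves, chosen because they match the logged value of 15.

```shell
#!/bin/sh
# Hypothetical helper: used memory as an integer percentage of total.
# Integer division truncates, so 148772 of 949444 gives 15 (not 15.67).
mem_used_percent() {
    total=$1
    used=$2
    echo $(( used * 100 / total ))
}

# Hypothetical helper: succeeds when usage ($1) has reached the
# configured threshold ($2). The ">=" comparison is an assumption.
threshold_reached() {
    [ "$1" -ge "$2" ]
}

# Usage with the values from the log above and an example threshold:
#   pct=$(mem_used_percent 949444 148772)
#   threshold_reached "$pct" 10 && echo "memory threshold reached"
```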
If Self Heal detects that a process is not running, it automatically restarts that component.
Take the CcspLMLite component as an example:
1. Run the ps command to verify that CcspLMLite is up and running, and note its process ID.
ps aux | grep Ccsp
2. Kill the CcspLMLite process using the command below.
kill -9 <CcspLMLite PID>
3. Verify that the CcspLMLite process was killed using the command below.
ps aux | grep Ccsp
4. After 60 seconds (the default), Self Heal automatically restarts the process. Check that CcspLMLite is running again with a different process ID.
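The final check in the steps above amounts to comparing the PID recorded before the kill with the PID seen after the wait. A minimal sketch of that comparison follows; the function name was_restarted is hypothetical, and the usage comment assumes the pidof utility is available on the device.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when the process is running again
# under a new PID.
#   $1 = PID recorded before the kill
#   $2 = PID observed after the wait (empty if the process is not running)
was_restarted() {
    old_pid=$1
    new_pid=$2
    [ -n "$new_pid" ] && [ "$new_pid" != "$old_pid" ]
}

# Usage on the device (assumes pidof exists there):
#   old=$(pidof CcspLMLite)
#   kill -9 "$old"
#   sleep 60
#   was_restarted "$old" "$(pidof CcspLMLite)" && echo "CcspLMLite restarted"
```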
If the connectivity test fails, the device reboots.
Validation: use the steps below to validate the connectivity test.
Unplug the Ethernet LAN cable, or run ifconfig erouter0 down.
1. Use the SelfHeal logs to troubleshoot run-time errors. The SelfHeal logs are created at the path below:
/rdklogs/logs/SelfHeal.txt.0
2. Resource Monitor sample logs:
MEM:
RDKB_SELFHEAL : Total memory in system is 949444 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Used memory in system is 148772 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : Free memory in system is 800768 at timestamp 2019:09:24:10:17:08
RDKB_SELFHEAL : AvgMemUsed in % is 15
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM Memory threshold reached
RDKB_SELFHEAL : Total memory in system is 949444
RDKB_SELFHEAL : Used memory in system is 148752
RDKB_SELFHEAL : Free memory in system is 800792
CPU:
190924-10:17:09.055074 <128>CABLEMODEM[Raspberry]:<99000006><2019:09:24:10:17:09><B8:27:EB:50:C1:CF><ARMv7> RM CPU threshold reached
3. Process Monitor sample logs:
LMLite process:
RDKB_SELFHEAL : <128>CABLEMODEM[Raspberry]:<99000007><2019:09:24:09:20:34><B8:27:EB:50:C1:CF><ARMv7> RM CcspLMLite process not running , restarting it
RDKB_SELFHEAL : Resetting process CcspLMLite
4. Connectivity Test sample logs:
Successful scenario:
190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com
Failure scenario:
191007-09:00:13.899713 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
191007-09:00:13.909201 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.60.1
191007-09:00:13.918684 [RDKB_SELFHEAL] : IPv6 GW Address is:
191007-09:00:13.972966 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
191007-09:00:14.119985 DNS Response: fail to resolve this URL www.google.com
191007-09:00:14.152808 RDKB_SELFHEAL : Taking corrective action
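When troubleshooting, sample logs like the ones above can be scanned with a couple of small helpers. This is a sketch under stated assumptions: the function names connectivity_status and list_reset_processes are hypothetical, and the matched strings are taken verbatim from the sample log lines in this section, so other firmware versions may word them differently.

```shell
#!/bin/sh
# Hypothetical helper: print "fail" if the log file ($1) shows a DNS
# resolution failure or a corrective action, otherwise print "ok".
# The match is case-sensitive, so the benign message
# "not taking any corrective actions" is not counted as a failure.
connectivity_status() {
    if grep -q -e "fail to resolve" -e "Taking corrective action" "$1"; then
        echo "fail"
    else
        echo "ok"
    fi
}

# Hypothetical helper: list the unique process names that Self Heal
# reported as reset in the log file ($1).
list_reset_processes() {
    sed -n 's/.*Resetting process \([^ ]*\).*/\1/p' "$1" | sort -u
}

# Usage on the device:
#   connectivity_status /rdklogs/logs/SelfHeal.txt.0
#   list_reset_processes /rdklogs/logs/SelfHeal.txt.0
```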