Self Heal is a monitoring and recovery module.
It continuously monitors system resources such as CPU and memory, and checks that the critical processes are running.
Self Heal also performs connectivity tests.
When a problem is detected, Self Heal takes corrective action based on predefined conditions, such as rebooting the device or restarting the affected process.
Self Heal stores the Reset Count and the Reboot Count.
Self Heal functionality is handled by a set of scripts. These scripts are available in the RDK build by default.
Ensure that the following Self Heal scripts are present on the device at the path "/usr/ccsp/tad":
resource_monitor.sh
task_health_monitor.sh
corrective_action.sh
self_heal_connectivity_test.sh
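The presence check above can be scripted. The following is a minimal sketch, not part of the RDK build: the function name check_selfheal_scripts is hypothetical, and the directory and script names are passed in as arguments rather than assumed.

```shell
#!/bin/sh
# Sketch only: report which expected Self Heal scripts are missing.
# check_selfheal_scripts is a hypothetical helper, not an RDK-provided tool.
# Usage: check_selfheal_scripts DIR SCRIPT...
check_selfheal_scripts() {
    dir="$1"
    shift
    missing=0
    for script in "$@"; do
        if [ ! -f "$dir/$script" ]; then
            echo "MISSING: $script"
            missing=1
        fi
    done
    # Non-zero return status when at least one script is absent.
    return "$missing"
}
```

On a device it could be called as `check_selfheal_scripts /usr/ccsp/tad resource_monitor.sh task_health_monitor.sh corrective_action.sh`, followed by the remaining script names from the list. No output and a zero exit status mean that every listed script was found.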
Use the commands below to verify that the Self Heal module is enabled by default and that its scripts are running:
root@Filogic-GW:/rdklogs/logs# dmcli eRT getv Device.SelfHeal.X_RDKCENTRAL-COM_Enable
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.X_RDKCENTRAL-COM_Enable type: bool, value: true
root@Filogic-GW:/rdklogs/logs#
root@Filogic-GW:/rdklogs/logs# ps -alx | grep reso
4 0 4531 1 20 0 3496 2744 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/resource_monitor.sh
0 0 16777 10009 20 0 2244 808 pipe_w S+ pts/0 0:00 grep reso
root@Filogic-GW:/rdklogs/logs# ps -alx | grep self
4 0 4528 1 20 0 3628 2836 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/self_heal_connectivity_test.sh
4 0 4539 1 20 0 4288 3468 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/selfheal_aggressive.sh
0 0 16816 10009 20 0 2244 824 pipe_w S+ pts/0 0:00 grep self
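For scripted checks, the value can be extracted from the dmcli output instead of being read by eye. This is a rough sketch under stated assumptions: dmcli_value and selfheal_enabled are hypothetical helper names, and the parsing relies only on the "value:" field seen in the output above.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical, not part of RDK.

# Read `dmcli eRT getv` output on stdin and print the text after "value:".
# Assumes the value contains no spaces, as in the bool/uint examples above.
dmcli_value() {
    sed -n 's/.*value: *\([^ ]*\).*/\1/p'
}

# Succeeds (exit status 0) when the Self Heal enable parameter reads "true".
selfheal_enabled() {
    [ "$(dmcli eRT getv Device.SelfHeal.X_RDKCENTRAL-COM_Enable | dmcli_value)" = "true" ]
}

# Example use on a device:
#   if selfheal_enabled; then echo "Self Heal is enabled"; fi
```

The parser prints one line per "value:" field it finds, so it is best used with a query for a single parameter.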
Self Heal is enabled by default and is active at the time of boot up.
It periodically performs the actions described below.
DM parameters
=============
root@Filogic-GW:/# dmcli eRT getv Device.SelfHeal.
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.X_RDKCENTRAL-COM_FreeMemThreshold type: uint, value: 0
Parameter 2 name: Device.SelfHeal.X_RDKCENTRAL-COM_MemFragThreshold type: uint, value: 0
Parameter 3 name: Device.SelfHeal.X_RDKCENTRAL-COM_CpuMemFragInterval type: uint, value: 0
Parameter 4 name: Device.SelfHeal.X_RDKCENTRAL-COM_Enable type: bool, value: true
Parameter 5 name: Device.SelfHeal.X_RDKCENTRAL-COM_MaxRebootCount type: uint, value: 3
Parameter 6 name: Device.SelfHeal.X_RDKCENTRAL-COM_MaxResetCount type: uint, value: 3
Parameter 7 name: Device.SelfHeal.X_RDKCENTRAL-COM_NoWaitLogSync type: bool, value: false
Parameter 8 name: Device.SelfHeal.X_RDKCENTRAL-COM_LogBackupThreshold type: uint, value: 0
Parameter 9 name: Device.SelfHeal.X_RDKCENTRAL-COM_DiagnosticMode type: bool, value: false
Parameter 10 name: Device.SelfHeal.X_RDKCENTRAL-COM_DiagMode_LogUploadFrequency type: uint, value: 1440
Parameter 11 name: Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable type: bool, value: false
Parameter 12 name: Device.SelfHeal.X_RDKCENTRAL-COM_DNS_URL type: string, value: www.google.com
Parameter 13 name: Device.SelfHeal.CpuMemFragNumberOfEntries type: uint, value: 2
Parameter 14 name: Device.SelfHeal.CpuMemFrag.1.DMA type: string, value:
Parameter 15 name: Device.SelfHeal.CpuMemFrag.1.DMA32 type: string, value:
Parameter 16 name: Device.SelfHeal.CpuMemFrag.1.Normal type: string, value:
Parameter 17 name: Device.SelfHeal.CpuMemFrag.1.Highmem type: string, value:
Parameter 18 name: Device.SelfHeal.CpuMemFrag.1.FragPercentage type: uint, value: 0
Parameter 19 name: Device.SelfHeal.CpuMemFrag.2.DMA type: string, value:
Parameter 20 name: Device.SelfHeal.CpuMemFrag.2.DMA32 type: string, value:
Parameter 21 name: Device.SelfHeal.CpuMemFrag.2.Normal type: string, value:
Parameter 22 name: Device.SelfHeal.CpuMemFrag.2.Highmem type: string, value:
Parameter 23 name: Device.SelfHeal.CpuMemFrag.2.FragPercentage type: uint, value: 0
Parameter 24 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_PingInterval type: uint, value: 60
Parameter 25 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_NumPingsPerServer type: uint, value: 3
Parameter 26 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_MinNumPingServer type: uint, value: 1
Parameter 27 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_PingRespWaitTime type: uint, value: 1000
Parameter 28 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CorrectiveAction type: bool, value: false
Parameter 29 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_LastReboot type: uint, value: 0
Parameter 30 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_RebootInterval type: int, value: 0
Parameter 31 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CurrentCount type: int, value: 0
Parameter 32 name: Device.SelfHeal.ConnectivityTest.PingServerList.IPv4PingServerTableNumberOfEntries type: uint, value: 0
Parameter 33 name: Device.SelfHeal.ConnectivityTest.PingServerList.IPv6PingServerTableNumberOfEntries type: uint, value: 0
Parameter 34 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_UsageComputeWindow type: uint, value: 15
Parameter 35 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold type: uint, value: 100
Parameter 36 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold type: uint, value: 100
Parameter 37 name: Device.SelfHeal.CPUProcAnalyzer.Enable type: bool, value: false
Parameter 38 name: Device.SelfHeal.CPUProcAnalyzer.SleepInterval type: uint, value: 60
Parameter 39 name: Device.SelfHeal.CPUProcAnalyzer.TimeToRun type: uint, value: 600
Parameter 40 name: Device.SelfHeal.CPUProcAnalyzer.DynamicProcess type: bool, value: false
Parameter 41 name: Device.SelfHeal.CPUProcAnalyzer.MonitorAllProcess type: bool, value: false
Parameter 42 name: Device.SelfHeal.CPUProcAnalyzer.MemoryLimit type: uint, value: 1536
Parameter 43 name: Device.SelfHeal.CPUProcAnalyzer.ProcessList type: string, value:
Parameter 44 name: Device.SelfHeal.CPUProcAnalyzer.SystemStatsToMonitor type: string, value: cpu,memory,fd,loadavg,cliconnected
Parameter 45 name: Device.SelfHeal.CPUProcAnalyzer.ProcessStatsToMonitor type: string, value: cpu,memory,fd,thread
Parameter 46 name: Device.SelfHeal.CPUProcAnalyzer.TelemetryOnly type: bool, value: false
root@Filogic-GW:/#
root@Filogic-GW:/usr/ccsp/tad# ls
CcspTandDSsp corrective_action.sh log_buddyinfo.sh self_heal_connectivity_test.sh selfheal_reset_counts.sh
TestAndDiagnostic.XML cpumemfrag_cron.sh resource_monitor.sh selfheal_aggressive.sh task_health_monitor.sh
root@Filogic-GW:/rdklogs/logs#
root@Filogic-GW:/rdklogs/logs# ps -alx | grep reso
4 0 4531 1 20 0 3496 2744 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/resource_monitor.sh
0 0 16777 10009 20 0 2244 808 pipe_w S+ pts/0 0:00 grep reso
root@Filogic-GW:/rdklogs/logs# ps -alx | grep self
4 0 4528 1 20 0 3628 2836 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/self_heal_connectivity_test.sh
4 0 4539 1 20 0 4288 3468 do_wai S ? 0:00 /bin/sh /usr/ccsp/tad/selfheal_aggressive.sh
0 0 16816 10009 20 0 2244 824 pipe_w S+ pts/0 0:00 grep self
1. By default, the average CPU threshold is set to 100. This value is stored in the syscfg database. To change the default average CPU threshold, use the commands below:
root@Filogic-GW:~# dmcli eRT getv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold type: uint, value: 100
root@Filogic-GW:/rdklogs/logs# dmcli eRT setv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold uint 200
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
root@Filogic-GW:~# dmcli eRT getv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgCPUThreshold type: uint, value: 200
2. By default, the average memory threshold is set to 100. This value is stored in the syscfg database. To change the default average memory threshold, use the commands below:
root@Filogic-GW:/# dmcli eRT getv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold type: uint, value: 100
root@Filogic-GW:/rdklogs/logs# dmcli eRT setv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold uint 200
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
root@Filogic-GW:/# dmcli eRT getv Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_AvgMemoryThreshold type: uint, value: 200
3. Once usage reaches the threshold value, the device is rebooted automatically.
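The set-then-read-back sequence from the threshold steps can be wrapped in a small helper. This is only a sketch: set_rm_threshold is a hypothetical name, and because it is not confirmed here whether dmcli's exit status reflects a failed set, the read-back comparison is what decides the result.

```shell
#!/bin/sh
# Sketch only: set_rm_threshold is a hypothetical helper, not part of RDK.
# Usage: set_rm_threshold AvgCPUThreshold 90
#        set_rm_threshold AvgMemoryThreshold 90
set_rm_threshold() {
    param="Device.SelfHeal.ResourceMonitor.X_RDKCENTRAL-COM_$1"
    want="$2"
    # Apply the new value (uint, as in the dmcli examples above).
    dmcli eRT setv "$param" uint "$want" >/dev/null || return 1
    # Read the parameter back and keep only the digits after "value:".
    got=$(dmcli eRT getv "$param" | sed -n 's/.*value: *\([0-9][0-9]*\).*/\1/p')
    # Succeed only when the stored value matches the requested one.
    [ "$got" = "$want" ]
}
```

Treat it with care on a live device: once usage reaches the configured threshold, Self Heal reboots the device.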
Observations in /rdklogs/logs/SelfHeal.txt.0:
For Memory Threshold
241106-10:22:51.210979 RDKB_SELFHEAL : Used memory in system is 153740 at timestamp 2024:11:06:10:22:51
241106-10:22:51.212450 RDKB_SELFHEAL : Free memory in system is 3754608 at timestamp 2024:11:06:10:22:51
241106-10:22:51.213869 RDKB_SELFHEAL : AvgMemUsed in % is 3
241106-10:22:51.267398 <128>CABLEMODEM[Mediatek]:<99000006><2024:11:06:10:22:51><ea:4f:a0:5d:06:99><BananapiBPI-R4> RM Memory threshold reached
241106-10:23:21.732742 RDKB_SELFHEAL : Today's reboot count is 1
241106-10:23:21.734210 RDKB_SELFHEAL : <128>CABLEMODEM[Mediatek]:<99000000><2024:11:06:10:23:21><ea:4f:a0:5d:06:99><BananapiBPI-R4> RM Rebooting device as part of corrective action
241106-10:23:21.735754 Setting Last reboot reason as MEM_THRESHOLD
241106-10:23:21.737264 Setting rebootReason to MEM_THRESHOLD and rebootCounter to 1
241106-10:23:21.789361 RDKB_REBOOT : Rebooting device due to MEM threshold reached
After reboot,
root@Filogic-GW:~# dmcli eRT getv Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason type: string, value: MEM_THRESHOLD
For CPU Threshold
241106-10:54:09.546821 RDKB_SELFHEAL : Today's reboot count is 3
241106-10:54:09.548312 RDKB_SELFHEAL : <128>CABLEMODEM[Mediatek]:<99000000><2024:11:06:10:54:09><d2:33:17:da:85:e4><BananapiBPI-R4> RM Rebooting device as part of corrective action
241106-10:54:09.549727 Setting Last reboot reason as CPU_THRESHOLD
241106-10:54:09.551169 Setting rebootReason to CPU_THRESHOLD and rebootCounter to 1
241106-10:54:09.603327 RDKB_REBOOT : Rebooting device due to CPU threshold reached
<128>CABLEMODEM[Mediatek]:<99000005><2024:11:06:10:53:39><d2:33:17:da:85:e4><BananapiBPI-R4> RM CPU threshold reached
[2024-11-06:10:53:39:083104] Setting Last reboot reason
After reboot,
root@Filogic-GW:~# dmcli eRT getv Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason type: string, value: CPU_THRESHOLD
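When a device has rebooted several times, the reboot reasons recorded in the Self Heal log can be tallied with a short filter. This is a minimal sketch: reboot_reasons is a hypothetical helper name, and it matches only the "Setting Last reboot reason as ..." lines shown in the observations above.

```shell
#!/bin/sh
# Sketch only: reboot_reasons is a hypothetical helper, not part of RDK.
# Usage: reboot_reasons /rdklogs/logs/SelfHeal.txt.0
# Prints one REASON=COUNT line per reboot reason found in the log file.
reboot_reasons() {
    sed -n 's/.*Setting Last reboot reason as \([A-Za-z_]*\).*/\1/p' "$1" |
        sort | uniq -c | awk '{ print $2 "=" $1 }'
}
```

Lines that mention a reboot reason without naming one, such as the bracketed "Setting Last reboot reason" entry, are ignored.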
If Self Heal detects that a monitored process is not running, it automatically restarts that component.
Let us take the CcspLMLite component as an example:
1. Run a ps command to verify that CcspLMLite is up and running, and note its process ID
ps aux | grep Ccsp
2. Kill the CcspLMLite process by using the below command
kill -9 <CcspLMLite PID>
3. Verify whether the CcspLMLite process was killed by using the below command
ps aux | grep Ccsp
4. After 60 seconds (default), Self Heal automatically restarts the process. Check that CcspLMLite is running again with a different PID.
root@Filogic-GW:~# ps -alx | grep CcspLM
5 950 4397 1 20 0 574768 7360 hrtime Ssl ? 0:00 /usr/bin/CcspLMLite -subsys eRT.
0 0 31853 9847 20 0 2244 820 pipe_w S+ pts/0 0:00 grep CcspLM
root@Filogic-GW:~# systemctl status CcspLMLite
● CcspLMLite.service - CcspLMLite service
Loaded: loaded (/lib/systemd/system/CcspLMLite.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2022-04-28 17:43:00 UTC; 2 years 6 months ago
Process: 4353 ExecStart=/usr/bin/CcspLMLite -subsys $Subsys (code=exited, status=0/SUCCESS)
Main PID: 4397 (CcspLMLite)
CGroup: /system.slice/CcspLMLite.service
└─ 4397 /usr/bin/CcspLMLite -subsys eRT.
2022 Apr 28 17:43:00 Filogic-GW systemd[1]: Starting CcspLMLite service...
2022 Apr 28 17:43:00 Filogic-GW systemd[1]: Started CcspLMLite service.
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: eRT.com.cisco.spvtg.ccsp.lmlite start to check eRT.com.cisco.spvtg.ccsp.psm status
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: eRT.com.cisco.spvtg.ccsp.psm is ready, eRT.com.cisco.spvtg.ccsp.lmlite continue
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: PSM module done.
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: Conf file /etc/debug.ini open success
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: rdk_dyn_log_initg_dl_socket = 3 __progname = CcspLMLite
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: rdk_logger_init /etc/debug.ini Already Stack Level Logging processed... not processing again.
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: mq == (mqd_t)-1: Invalid argument
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: mq == (mqd_t)-1: Invalid argument
root@Filogic-GW:~#
root@Filogic-GW:~# kill -9 4397
root@Filogic-GW:~# systemctl status CcspLMLite
× CcspLMLite.service - CcspLMLite service
Loaded: loaded (/lib/systemd/system/CcspLMLite.service; enabled; vendor preset: enabled)
Active: failed (Result: signal) since Wed 2024-11-06 09:22:16 UTC; 1s ago
Process: 4353 ExecStart=/usr/bin/CcspLMLite -subsys $Subsys (code=exited, status=0/SUCCESS)
Process: 32297 ExecStopPost=/bin/sh -c echo "`date`: Stopping/Restarting CcspLMLite" >> ${PROCESS_RESTART_LOG} (code=exited, status=0/SUCCESS)
Main PID: 4397 (code=killed, signal=KILL)
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: eRT.com.cisco.spvtg.ccsp.lmlite start to check eRT.com.cisco.spvtg.ccsp.psm status
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: eRT.com.cisco.spvtg.ccsp.psm is ready, eRT.com.cisco.spvtg.ccsp.lmlite continue
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: PSM module done.
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: Conf file /etc/debug.ini open success
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: rdk_dyn_log_initg_dl_socket = 3 __progname = CcspLMLite
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: rdk_logger_init /etc/debug.ini Already Stack Level Logging processed... not processing again.
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: mq == (mqd_t)-1: Invalid argument
2022 Apr 28 17:43:00 Filogic-GW CcspLMLite[4397]: mq == (mqd_t)-1: Invalid argument
2024 Nov 06 09:22:16 Filogic-GW systemd[1]: CcspLMLite.service: Main process exited, code=killed, status=9/KILL
2024 Nov 06 09:22:16 Filogic-GW systemd[1]: CcspLMLite.service: Failed with result 'signal'.
root@Filogic-GW:~# ps -alx | grep CcspLM
0 0 32501 9847 20 0 2244 820 pipe_w S+ pts/0 0:00 grep CcspLM
self heal logs :
241106-09:22:55.245084 RDKB_SELFHEAL : <128>Ethwan Gateway[Mediatek]:<99000007><2024:11:06:09:22:53><e6:72:eb:94:4f:2e><BananapiBPI-R4> RM CcspLMLite process not running , restarting it
241106-09:22:55.246875 RDKB_SELFHEAL : Resetting process CcspLMLite
root@Filogic-GW:~# ps -alx | grep CcspLM
5 950 33247 1 20 0 443700 7364 hrtime Ssl ? 0:00 /usr/bin/CcspLMLite -subsys eRT.
0 0 33796 9847 20 0 2244 804 pipe_w S+ pts/0 0:00 grep CcspLM
root@Filogic-GW:~#
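To see which processes Self Heal has restarted, the log can be filtered for the "process not running" message shown above. This is a rough sketch: restarted_processes is a hypothetical helper name, and it assumes other components are logged with the same wording as the CcspLMLite entry, which has not been verified here.

```shell
#!/bin/sh
# Sketch only: restarted_processes is a hypothetical helper, not part of RDK.
# Reads Self Heal log lines on stdin and prints each process name that was
# reported as "process not running", once per name.
restarted_processes() {
    sed -n 's/.* RM \([^ ]*\) process not running.*/\1/p' | sort -u
}

# Example use on a device:
#   restarted_processes < /rdklogs/logs/SelfHeal.txt.0
```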
If the connectivity test fails, the device reboots, provided that corrective action is enabled.
Validation: use the steps below to validate the connectivity test.
Unplug the Ethernet LAN cable or run ifconfig erouter0 down
Note: By default, the DNS ping test and corrective action are disabled.
By default DNS and corrective actions are disabled. Use the below commands to enable those parameters.
For DNS Testing,
root@Filogic-GW:/rdklogs/logs# dmcli eRT getv Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable type: bool, value: false
root@Filogic-GW:~# dmcli eRT getv Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CorrectiveAction
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CorrectiveAction type: bool, value: false
root@Filogic-GW:/rdklogs/logs#
root@Filogic-GW:/rdklogs/logs#
root@Filogic-GW:/rdklogs/logs# dmcli eRT setv Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable bool true
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
root@Filogic-GW:/rdklogs/logs# dmcli eRT getv Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.SelfHeal.X_RDKCENTRAL-COM_DNS_PINGTEST_Enable type: bool, value: true
root@Filogic-GW:/rdklogs/logs#
To enable corrective action,
dmcli eRT setv Device.SelfHeal.ConnectivityTest.X_RDKCENTRAL-COM_CorrectiveAction bool true
root@Filogic-GW:/rdklogs/logs# dmcli eRT getv Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason
CR component name is: eRT.com.cisco.spvtg.ccsp.CR
subsystem_prefix eRT.
Execution succeed.
Parameter 1 name: Device.DeviceInfo.X_RDKCENTRAL-COM_LastRebootReason type: string, value: PING_Connectivity_Test_Failure
root@Filogic-GW:/rdklogs/logs#
root@Filogic-GW:/rdklogs/logs# tail -f SelfHeal.txt.0
Failure :
======
241104-10:16:38.221777 RDKB_SELFHEAL : No IPv4 Gateway Address detected
241104-10:16:38.223798 RDKB_SELFHEAL : No IPv6 Gateway Address detected
241104-10:16:38.239881 RDKB_SELFHEAL : Taking corrective action
241104-10:16:39.782027 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
241104-10:16:39.821478 DNS Response: fail to resolve this URL www.google.com
241104-10:16:39.837748 RDKB_SELFHEAL : Taking corrective action
241104-10:14:35.589120 RDKB_SELFHEAL : <128>Ethwan Gateway[Mediatek]:<99000007><2024:11:04:10:14:34><ea:a2:ae:1a:b7:63><BananapiBPI-R4> RM PIt diff and last_reboot -960 and 28800 PING_LATENCY_GWIPv4:1.00,2.42,1.07
241104-10:14:59.740762 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
241104-10:14:59.742689 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.2.254
241104-10:14:59.744620 [RDKB_SELFHEAL] : IPv6 GW Address is:fe80::da3a:ddff:fe0d:b86c
241104-10:14:59.771527 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
241104-10:14:59.808810 DNS Response: fail to resolve this URL www.google.com
241104-10:14:59.824618 RDKB_SELFHEAL : Taking corrective action
241104-11:01:01.277374 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
241104-11:01:01.312725 DNS Response: fail to resolve this URL www.google.com
241104-11:01:01.328165 RDKB_SELFHEAL : Taking corrective action
241104-11:01:01.406788 RDKB_SELFHEAL : Total memory in system is 4023440
241104-11:01:01.408277 RDKB_SELFHEAL : Used memory in system is 143808
241104-11:01:01.409853 RDKB_SELFHEAL : Free memory in system is 3773984
241104-11:01:02.457602 RDKB_SELFHEAL : Current CPU load is 0
241104-11:01:02.459152 RDKB_SELFHEAL : Top 5 tasks running on device with resource usage are below
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 3735.6 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 94000 8100 5744 S 0.0 0.2 0:06.29 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
241104-11:01:02.716353 RDKB_SELFHEAL : 2.4GHz radio is operating on channel
241104-11:01:02.732741 RDKB_SELFHEAL : 5GHz radio is operating on channel
241104-11:01:02.734334 RDKB_SELFHEAL : MoCA stats are not available due to MoCA crash
241104-11:01:02.800183 RDKB_SELFHEAL : <128>Ethwan Gateway[Mediatek]:<99000007><2024:11:04:11:01:01><7e:22:72:d9:02:13><BananapiBPI-R4> RM PIt diff and last_reboot 55 and 50 Ping reset Router
241104-11:01:02.881393 RDKB_SELFHEAL : DNS Information :
Success:
=======
241106-11:23:00.787843 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
241106-11:23:00.789414 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.2.254
241106-11:23:00.791062 [RDKB_SELFHEAL] : IPv6 GW Address is:fe80::1dde:7669:fc2e:fe43 fe80::da3a:ddff:fe09:f505 fe80::532e:c128:f66b:79f1 fe80::f485:c03c:fcf3:c75
241106-11:23:00.817414 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
241106-11:23:01.021947 DNS Response: Got success response for this URL www.google.com
1. Use the Self Heal logs to troubleshoot run-time errors. The logs are written to the following path:
/rdklogs/logs/SelfHeal.txt.0
2. Resource Monitor sample Logs,
MEM :
241106-10:22:51.210979 RDKB_SELFHEAL : Used memory in system is 153740 at timestamp 2024:11:06:10:22:51
241106-10:22:51.212450 RDKB_SELFHEAL : Free memory in system is 3754608 at timestamp 2024:11:06:10:22:51
241106-10:22:51.213869 RDKB_SELFHEAL : AvgMemUsed in % is 3
241106-10:22:51.267398 <128>CABLEMODEM[Mediatek]:<99000006><2024:11:06:10:22:51><ea:4f:a0:5d:06:99><BananapiBPI-R4> RM Memory threshold reached
241106-10:23:21.732742 RDKB_SELFHEAL : Today's reboot count is 1
241106-10:23:21.734210 RDKB_SELFHEAL : <128>CABLEMODEM[Mediatek]:<99000000><2024:11:06:10:23:21><ea:4f:a0:5d:06:99><BananapiBPI-R4> RM Rebooting device as part of corrective action
241106-10:23:21.735754 Setting Last reboot reason as MEM_THRESHOLD
241106-10:23:21.737264 Setting rebootReason to MEM_THRESHOLD and rebootCounter to 1
241106-10:23:21.789361 RDKB_REBOOT : Rebooting device due to MEM threshold reached
CPU:
241106-10:54:09.546821 RDKB_SELFHEAL : Today's reboot count is 3
241106-10:54:09.548312 RDKB_SELFHEAL : <128>CABLEMODEM[Mediatek]:<99000000><2024:11:06:10:54:09><d2:33:17:da:85:e4><BananapiBPI-R4> RM Rebooting device as part of corrective action
241106-10:54:09.549727 Setting Last reboot reason as CPU_THRESHOLD
241106-10:54:09.551169 Setting rebootReason to CPU_THRESHOLD and rebootCounter to 1
241106-10:54:09.603327 RDKB_REBOOT : Rebooting device due to CPU threshold reached
<128>CABLEMODEM[Mediatek]:<99000005><2024:11:06:10:53:39><d2:33:17:da:85:e4><BananapiBPI-R4> RM CPU threshold reached
[2024-11-06:10:53:39:083104] Setting Last reboot reason
3. Process Monitor Sample Logs,
LMLite Process :
241106-09:22:55.245084 RDKB_SELFHEAL : <128>Ethwan Gateway[Mediatek]:<99000007><2024:11:06:09:22:53><e6:72:eb:94:4f:2e><BananapiBPI-R4> RM CcspLMLite process not running , restarting it
241106-09:22:55.246875 RDKB_SELFHEAL : Resetting process CcspLMLite
4. Connectivity Test Sample Logs ,
Successful Scenario :
190924-08:56:43.577621 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
190924-08:56:43.583217 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.30.1
190924-08:56:43.588370 [RDKB_SELFHEAL] : IPv6 GW Address is:
190924-08:56:43.622618 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
190924-08:56:43.730057 DNS Response: Got success response for this URL www.google.com
Failure Scenario :
191007-09:00:13.899713 [RDKB_SELFHEAL] : GW IP Connectivity Test Successfull
191007-09:00:13.909201 [RDKB_SELFHEAL] : IPv4 GW Address is:192.168.60.1
191007-09:00:13.918684 [RDKB_SELFHEAL] : IPv6 GW Address is:
191007-09:00:13.972966 RDKB_SELFHEAL : Ping server lists are empty , not taking any corrective actions
191007-09:00:14.119985 DNS Response: fail to resolve this URL www.google.com
191007-09:00:14.152808 RDKB_SELFHEAL : Taking corrective action
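The successful and failed DNS checks in a connectivity test log can be counted with a short filter. This is a minimal sketch: dns_summary is a hypothetical helper name, and it matches only the two "DNS Response:" messages that appear in the sample logs above.

```shell
#!/bin/sh
# Sketch only: dns_summary is a hypothetical helper, not part of RDK.
# Usage: dns_summary /rdklogs/logs/SelfHeal.txt.0
# Prints the number of successful and failed DNS checks found in the log.
dns_summary() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    ok=$(grep -c 'DNS Response: Got success response' "$1" || true)
    fail=$(grep -c 'DNS Response: fail to resolve' "$1" || true)
    echo "success=$ok fail=$fail"
}
```

A non-zero fail count, together with "Taking corrective action" entries, is a cue to check the DNS URL setting and WAN connectivity.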