Overview

Maintenance Manager is one of the WPEFramework Thunder plugins; it manages the daily maintenance activities on RDK devices.

Source Code

Terminology

Daily Maintenance: Set of activities (or tasks) that need to be executed daily without impacting the UX. They are executed when the user is not actively using the CPE.
Maintenance Activity: An activity (or task) that is part of Daily Maintenance. Maintenance activities are considered critical and wake the device up from the DEEPSLEEP power state in order to be executed.
DCM: Device Configuration Manager. It fetches and updates the configuration for the device.
RFC: RDK Runtime Feature Control. It is used for remotely controlling CPE software features on RDK platforms. Release Management uses RFC for staged rollout of new features.
Log Upload: Uploads all the logs at the time specified in DCMSettings.conf, or when the log size reaches a predefined limit.
FW Download (swupdate/ rdkfwupgrader): Checks whether a new firmware update is required and performs the update on the device if needed.

Types of Maintenance

  • Unsolicited Maintenance (Maintenance on Boot)
  • Solicited Maintenance (Scheduled Maintenance)

Unsolicited Maintenance

Also known as Maintenance on Bootup. RDK automatically performs maintenance as soon as the device boots up, and this happens on every bootup. The device looks for critical updates on boot and performs the critical maintenance activities without the application triggering maintenance. isRebootPending is set to false unless an RFC or firmware update is needed.

Solicited Maintenance

Also known as Scheduled Maintenance or Daily Maintenance. It is executed daily, triggered by the application based on the Maintenance Start Time. isRebootPending is set to true irrespective of the XConf configuration.

Note: Both Unsolicited and Solicited maintenance cannot be triggered at the same time. If a Solicited Maintenance is started while an Unsolicited Maintenance is in progress, then the Solicited Maintenance will be notified and postponed to the next day.

Functionality Observations

  • Maintenance Manager notifies the upper layer components when the next daily maintenance activity is required to take place (based on the start time scheduled by XConf for the specific device)
  • Maintenance Manager starts the daily maintenance upon request from the upper layer components
  • Capable of running the maintenance activity in the background or aborting or skipping a daily activity upon request of the upper layers
  • Capable of running the list of activities in time order, one at a time. If multiple activities have the same start time, they run one after the other, not concurrently.
  • A default Maintenance Start Time is maintained by Maintenance Manager on first initialization, with the initial status set to MAINTENANCE_IDLE
  • The Maintenance Start Time is updated by the Maintenance Plugin by calling the getMaintenanceStartTime JsonRPC call; refer to the Remote Procedure Calls (RPC) section for more information
  • Maintenance Manager Plugin interacts with the Network Plugin, polling it to get the network status (whether or not the device has an active internet connection). The polling happens 4 times with a retry delay of 30 seconds.
  • On an unsuccessful polling session, Maintenance Manager subscribes to the onInternetStatusChange event and sets the Maintenance Status to MAINTENANCE_ERROR. On successful reception of the event, Maintenance Manager triggers the critical tasks (startCriticalTasks())

WhoAmI

WhoAmI is used by the device to determine its operating context (partner, product/experience, Regional configService) and to load the right firmware, so that the service the user signed up for can be activated and used. It happens only in Unsolicited Maintenance. It is not enabled by default in RDKE yet (as of 25Q2). WhoAmI depends on components such as AuthService, which have to be refactored before it is suitable for use in community builds.

Maintenance Activity/ Task

Wrapper and task used for each Activity/Task in RDKV and RDKE:

  • RFC
    • RDKV Wrapper: /lib/rdk/Start_RFC.sh
    • RDKV Task: /lib/rdk/RFCbase.sh
    • RDKE Wrapper: /lib/rdk/StartMaintenanceTasks.sh RFC
    • RDKE Task: /usr/bin/rfcMgr (ENABLE_RFC_MANAGER flag Enabled) or /lib/rdk/RFCbase.sh (ENABLE_RFC_MANAGER flag Disabled)
  • SWUPDATE
    • RDKV Wrapper: /lib/rdk/swupdate_utility.sh
    • RDKV Task: /usr/bin/rdkvfwupgrader 0 1 (FWDOWNLOAD RFC Enabled) or /lib/rdk/deviceInitiatedFWDnld.sh (FWDOWNLOAD RFC Disabled)
    • RDKE Wrapper: /lib/rdk/StartMaintenanceTasks.sh SWUPDATE
    • RDKE Task: /usr/bin/rdkvfwupgrader 0 1
  • LOGUPLOAD
    • RDKV Wrapper: /lib/rdk/Start_uploadSTBLogs.sh
    • RDKV Task: /lib/rdk/uploadSTBLogs.sh
    • RDKE Wrapper: /lib/rdk/StartMaintenanceTasks.sh LOGUPLOAD
    • RDKE Task: /lib/rdk/uploadSTBLogs.sh

XConf Settings

The DCMScript.sh (in RDKV) or dcmd (in RDKE) parses the /tmp/DCMSettings.conf file to get the XConf data fetched by Telemetry. For example, urn:settings:CheckSchedule:cron is used to get the start_hr and start_min values, and urn:settings:TimeZoneMode is used to get the tz_mode value. The device is expected to receive the time based on the time zone configured in the device. The parsed data is then stored in the /opt/rdk_maintenance.conf file by DCM and can be used to calculate the Maintenance Start Time for the next scheduled (SOLICITED) maintenance. The Maintenance Start Time is calculated using /lib/rdk/getMaintenanceStartTime.sh (in RDKV) or internally using the CalculateStartTime() API in MaintenanceManager.
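As a rough illustration of how start_hr and start_min could be turned into an epoch start time, the sketch below finds the next occurrence of a local wall-clock time. It is not the logic of getMaintenanceStartTime.sh or CalculateStartTime(): the function name nextStartTimeEpoch and the fixed utc_offset_sec parameter are assumptions made for this example, and a fixed offset deliberately ignores daylight saving transitions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of MaintenanceManager): returns the next epoch
// time (UTC seconds) at which the local wall clock reads start_hr:start_min.
// utc_offset_sec is an assumed fixed offset, where local time = UTC + offset.
std::int64_t nextStartTimeEpoch(std::int64_t now_epoch, int start_hr,
                                int start_min, int utc_offset_sec) {
    assert(start_hr >= 0 && start_hr < 24);
    assert(start_min >= 0 && start_min < 60);
    const std::int64_t kDay = 24 * 60 * 60;

    // Shift "now" into local time and find the most recent local midnight.
    const std::int64_t local_now = now_epoch + utc_offset_sec;
    const std::int64_t since_midnight = ((local_now % kDay) + kDay) % kDay;
    const std::int64_t local_midnight = local_now - since_midnight;

    // Today's start time in local seconds; move to tomorrow if already passed.
    std::int64_t local_start =
        local_midnight + start_hr * 3600 + start_min * 60;
    if (local_start <= local_now) {
        local_start += kDay;
    }
    // Convert back from local seconds to UTC epoch seconds.
    return local_start - utc_offset_sec;
}
```

For example, with a start time of 02:30 and an assumed offset of +3600 seconds, a call made at epoch 0 yields 5400, which is 01:30 UTC and therefore 02:30 local time.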

Architecture Overview (HLA)

Activity Diagram

High Level Sequence Diagram

Maintenance Status Notifications

  • MAINTENANCE_IDLE - Initial state
  • MAINTENANCE_STARTED - Sent immediately when maintenance starts, whether scheduled or on boot
  • MAINTENANCE_ERROR - Sent after receiving an error notification while executing any of the maintenance activities
  • MAINTENANCE_COMPLETE - Sent after receiving a *_COMPLETE notification from all critical maintenance tasks.
  • MAINTENANCE_INCOMPLETE - Sent whenever the Maintenance service doesn't execute any of the tasks. MAINTENANCE_ERROR is returned even if only one task returns an error while the others weren't executed.
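The precedence between these notifications can be sketched as a small aggregation function. This is only one reading of the list above, not the plugin's implementation: the TaskResult and MaintStatus enums and the finalStatus function are hypothetical names, and the sketch assumes that an error from any task wins over tasks that were not executed.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-task outcome, invented for this sketch.
enum class TaskResult { NotExecuted, Complete, Error };

// Stand-in for the five maintenance statuses listed above.
enum class MaintStatus { Idle, Started, Error, Complete, Incomplete };

// Derives the final status of a maintenance cycle from its task outcomes.
// Assumption: one error is enough for Error, even if other tasks never ran.
MaintStatus finalStatus(const std::vector<TaskResult>& tasks) {
    bool anyNotExecuted = false;
    for (TaskResult result : tasks) {
        if (result == TaskResult::Error) {
            return MaintStatus::Error;
        }
        if (result == TaskResult::NotExecuted) {
            anyNotExecuted = true;
        }
    }
    // No errors: Complete only if every task reported completion.
    return anyNotExecuted ? MaintStatus::Incomplete : MaintStatus::Complete;
}
```

Under these assumptions, a cycle with one failed task and one task that never ran maps to Error rather than Incomplete.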

Distros and RFCs used in Maintenance Manager

  • Maintenance Manager feature is controlled by distro "enable_maintenance_manager"
  • stopMaintenance support is controlled by RFC "Device.DeviceInfo.X_RDKCENTRAL-COM_RFC.Feature.StopMaintenance.Enable" which is enabled by default
  • whoAmI feature is controlled by both distros "whoami_enabled" and "sec_manager_whoami_enabled" and a compile time flag "ENABLE_WHOAMI".

Notable features

  1. Network check - Before triggering maintenance tasks, Maintenance Manager first checks the device's internet connectivity. If the device is connected to the internet, it proceeds to execute the maintenance tasks; otherwise it waits for 2 minutes, and if there is still no internet connection, it exits maintenance with the status MAINTENANCE_ERROR
  2. If the device cannot acquire a network connection within 2 minutes, Maintenance Manager subscribes to the network event before exiting the maintenance activities. Once the device gets internet connectivity, Maintenance Manager is notified and triggers RFC and xconfImageCheck.sh
  3. StopMaintenance is allowed only while maintenance is in progress, and it sets the status to MAINTENANCE_ERROR
  4. StartMaintenance is allowed only when maintenance is not already in progress.
  5. Who Am I integration changes are done in Maintenance Manager to acquire the device initialization context
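The start/stop rules in items 3 and 4 amount to a simple guard on the current status. The sketch below shows that guard in isolation; the MaintenanceGate class and MaintStatus enum are hypothetical names for this example, and the real plugin also handles threads, tasks and RFC checks that are omitted here.

```cpp
#include <cassert>

// Stand-in for the maintenance statuses used by the plugin.
enum class MaintStatus { Idle, Started, Error, Complete, Incomplete };

// Hypothetical guard illustrating when start and stop requests are accepted.
class MaintenanceGate {
public:
    // Accepted only when maintenance is not already in progress.
    bool start() {
        if (status_ == MaintStatus::Started) {
            return false;
        }
        status_ = MaintStatus::Started;
        return true;
    }

    // Accepted only while maintenance is in progress; leaves status at Error.
    bool stop() {
        if (status_ != MaintStatus::Started) {
            return false;
        }
        status_ = MaintStatus::Error;
        return true;
    }

    MaintStatus status() const { return status_; }

private:
    MaintStatus status_ = MaintStatus::Idle;
};
```

A stop request on an idle gate is rejected, and a second start request while the first cycle is still running is rejected as well.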

Logs

Maintenance Manager logs can be found in wpeframework log. 

Grep string: grep -inr "MaintenanceManager.cpp" /opt/logs/maintenance.log* 

Child Pages

Methods, RPCs, Events and Functions

Methods

subscribeForInternetStatusEvent


Description: A MaintenanceManager method that subscribes to network state changes coming from the Network Plugin, using internetStatusChangeEventHandler() as the event handler
Prototype: bool MaintenanceManager::subscribeForInternetStatusEvent(string event)
Input Args: string event
Return:
  • true: returned if the Thunder Plugin handle is obtained and the Network Plugin event is subscribed with ERROR_NONE
  • false: returned if getting the Thunder Plugin handle fails or if the event subscription fails (with a status other than ERROR_NONE)
task_execution_thread
Description: Triggered by maintenance on bootup and by startMaintenance to execute all maintenance tasks in a specific order, with all preconditions checked (network status, reboot flags, critical update, etc.)
Prototype: void MaintenanceManager::task_execution_thread()
Input Args: None
Return: void
getThunderPluginHandle


Description: A plugin method that gets a plugin handle for the plugin with the given callsign. It creates a token at runtime to get the plugin handle
Prototype: WPEFramework::JSONRPC::LinkType<WPEFramework::Core::JSON::IElement>* MaintenanceManager::getThunderPluginHandle(const char* callsign)
Input Args: const char* callsign
Return: WPEFramework::JSONRPC::LinkType<WPEFramework::Core::JSON::IElement>*
internetStatusChangeEventHandler


Description: Event handler method, registered through subscribeForInternetStatusEvent(), that is triggered when there is a network state change from the Network Plugin
Prototype: void MaintenanceManager::internetStatusChangeEventHandler(const JsonObject& parameters)
Input Args: const JsonObject& parameters
Return: void
startCriticalTasks
Description: A Maintenance Manager method that triggers the critical tasks via 3 scripts:

  • StartDCM_maintenance.sh
  • RFCbase.sh
  • xconfImageCheck.sh

These are triggered when the network is reconnected after an Unsolicited Maintenance found no active network connection, the network retries failed and the Maintenance Status was set to MAINTENANCE_ERROR

Prototype: void MaintenanceManager::startCriticalTasks()
Input Args: NA
Return: void
checkNetwork


Description: Checks whether the Network Plugin is active and subscribes to network state changes. It uses a token created at runtime to communicate with Thunder
Prototype: bool MaintenanceManager::checkNetwork()
Input Args: None
Return:
  • false: returned when the plugin is not active or the Network Plugin connection check fails
  • true: returned in all other cases
isDeviceOnline


Description: Utility method that implements a network retry mechanism with MAX_NETWORK_RETRIES (4) retries and a retry delay of NETWORK_RETRY_INTERVAL (30 seconds). It uses checkNetwork() to get the network state from the Network Plugin
Prototype: bool MaintenanceManager::isDeviceOnline()
Input Args: None
Return:
  • true: returned if any try or retry of the network check gives true
  • false: returned if the network is not available even after 4 retries
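The retry behaviour described for isDeviceOnline can be pictured as a bounded polling loop. The sketch below is an assumption-laden illustration, not the plugin's code: isDeviceOnlineSketch is a hypothetical name, the network check and the sleep are injected as callbacks so the loop can be exercised without a device, and it assumes four checks in total with a 30 second wait after each failed check (about 2 minutes overall).

```cpp
#include <cassert>
#include <functional>

// Values taken from the description above; the constant names are local to
// this sketch.
constexpr int kMaxNetworkRetries = 4;
constexpr int kNetworkRetryIntervalSec = 30;

// Hypothetical stand-in for isDeviceOnline(): polls checkNetwork up to
// kMaxNetworkRetries times and waits between failed attempts.
bool isDeviceOnlineSketch(const std::function<bool()>& checkNetwork,
                          const std::function<void(int)>& sleepSeconds) {
    for (int attempt = 0; attempt < kMaxNetworkRetries; ++attempt) {
        if (checkNetwork()) {
            return true;  // online on this try or retry
        }
        sleepSeconds(kNetworkRetryIntervalSec);  // wait before the next check
    }
    return false;  // still offline after all attempts
}
```

On a device, sleepSeconds could wrap std::this_thread::sleep_for and checkNetwork could wrap the plugin's own checkNetwork(); injecting them keeps the loop itself easy to test.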
Initialize
Description: Initializes the Maintenance Manager service and instance, and also calls subscribeToDeviceInitializationEvent() in WhoAmI to subscribe to the onDeviceInitializationContextUpdate event from SecManager. It also initializes IARM through InitializeIARM() and returns an empty string
Prototype: const string MaintenanceManager::Initialize(PluginHost::IShell* service)
Input Args: PluginHost::IShell* service
Return: string
Deinitialize


Description: Deinitializes and releases the Maintenance Manager service. It triggers stopMaintenanceTasks() to exit maintenance gracefully and deinitializes IARM via DeinitializeIARM()
Prototype: void MaintenanceManager::Deinitialize(PluginHost::IShell* service)
Input Args: PluginHost::IShell* service
Return: void
InitializeIARM


Description: Checks whether IARM is initialized and registers the event handlers for the Maintenance Event Update event and the DCM Start Time event
Prototype: void MaintenanceManager::InitializeIARM()
Input Args: None
Return: void
maintenanceManagerOnBootup


Description: Maintenance API that sets the appropriate preconditions and flags and then triggers the maintenance task execution thread
Prototype: void MaintenanceManager::maintenanceManagerOnBootup()
Input Args: None
Return: void
_MaintenanceMgrEventHandler


Description: Event handler that forwards IARM events to iarmEventHandler(). It handles the events whose handlers are registered by InitializeIARM() and removed by DeinitializeIARM()
Prototype: void MaintenanceManager::_MaintenanceMgrEventHandler(const char *owner, IARM_EventId_t eventId, void *data, size_t len)
Input Args: const char *owner, IARM_EventId_t eventId, void *data, size_t len
Return: void
iarmEventHandler
Description: IARM-based event handler for Maintenance Manager, triggered by _MaintenanceMgrEventHandler. It sets the appropriate task completion or error status and, on completion or incompletion of all tasks, sets the Maintenance Status and joins the thread accordingly
Prototype: void MaintenanceManager::iarmEventHandler(const char *owner, IARM_EventId_t eventId, void *data, size_t len)
Input Args: const char *owner, IARM_EventId_t eventId, void *data, size_t len
Return: void
DeinitializeIARM


Description: Checks whether IARM is connected and removes the event handlers for the Maintenance Event Update event and the DCM Start Time event
Prototype: void MaintenanceManager::DeinitializeIARM()
Input Args: None
Return: void
stopMaintenanceTasks


Description: Utility method triggered via the stopMaintenance() RPC method to gracefully stop a running (MAINTENANCE_STARTED) maintenance cycle. It aborts all the tasks of the running maintenance using abortTask(). Once that is done, the threads used are joined and MAINTENANCE_ERROR is set as the Maintenance Status on exit
Prototype: bool MaintenanceManager::stopMaintenanceTasks()
Input Args: None
Return:
  • true: returned if all tasks are gracefully exited (aborted)
  • false: returned in all other cases
readRFC


Description: A utility method that reads the value set for the RFC parameter passed (Boolean RFCs only) and returns it
Prototype: bool MaintenanceManager::readRFC(const char *rfc)
Input Args: const char *rfc
Return:
  • true: if the stored RFC parameter has the value "true"
  • false: if the stored RFC parameter has the value "false" or the RFC parameter does not exist
abortTask


Description: A utility method used by Maintenance Manager. It uses getTaskPID() to get the PID of the process to be aborted and terminates it with the signal passed. It is triggered in case of Stop Maintenance
Prototype: int MaintenanceManager::abortTask(const char* taskname, int sig_to_send)
Input Args: const char* taskname, int sig_to_send
Return: int
getTaskPID


Description: A utility method used in Maintenance Manager to get the PID of a process. The PID is used to abort the task gracefully when Stop Maintenance is triggered
Prototype: pid_t MaintenanceManager::getTaskPID(const char* taskname)
Input Args: const char* taskname
Return: pid_t
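The relationship between getTaskPID and abortTask can be sketched with the POSIX kill() call: look up the PID for a task name, then send it the requested signal. This is a simplified illustration rather than the plugin's implementation; abortTaskSketch is a hypothetical name, and the PID lookup is passed in as a callback because how the plugin resolves task names to PIDs is not described here.

```cpp
#include <cassert>
#include <functional>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>  // getpid(), used in the usage example below

// Hypothetical stand-in for abortTask(): resolves the task's PID through the
// supplied lookup (playing the role of getTaskPID) and signals that process.
// Returns 0 on success and -1 on failure, mirroring kill().
int abortTaskSketch(const char* taskname, int sigToSend,
                    const std::function<pid_t(const char*)>& getTaskPid) {
    if (taskname == nullptr) {
        return -1;
    }
    const pid_t pid = getTaskPid(taskname);
    if (pid <= 0) {
        // Task not found. Also avoids kill(0, ...) and kill(-1, ...), which
        // would address whole process groups.
        return -1;
    }
    return kill(pid, sigToSend);
}
```

Passing signal 0 makes kill() perform only its existence and permission checks, so abortTaskSketch("self", 0, [](const char*) { return getpid(); }) is a harmless way to try the function out.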

Remote Procedure Calls (RPC)

getMaintenanceActivityStatus


Description: JsonRPC method to get the current Maintenance Status, or the Maintenance Status of the most recent maintenance activity (one of MAINTENANCE_IDLE/ MAINTENANCE_STARTED/ MAINTENANCE_ERROR/ MAINTENANCE_COMPLETE/ MAINTENANCE_INCOMPLETE), along with information such as isCriticalMaintenance
Prototype: uint32_t MaintenanceManager::getMaintenanceActivityStatus(const JsonObject& parameters,  JsonObject& response)
Parameters: const JsonObject& parameters,  JsonObject& response
RPC Call: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.getMaintenanceActivityStatus","params":{}}' http://127.0.0.1:9998/jsonrpc
Response: {"jsonrpc":"2.0","id":3,"result":{"maintenanceStatus":"MAINTENANCE_ERROR","LastSuccessfulCompletionTime":0,"isCriticalMaintenance":false,"isRebootPending":false,"success":true}}
Return: uint32_t
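All the RPC calls in this section post a JSON body of the same shape, differing only in the method name and the params object. The sketch below composes that body as a string; buildMaintenanceRequest is a hypothetical helper written for this example, and it performs no JSON escaping or validation of its inputs.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds the JSON-RPC request body used by the curl
// examples in this section. paramsJson must already be valid JSON text.
std::string buildMaintenanceRequest(const std::string& method,
                                    const std::string& paramsJson = "{}") {
    return std::string(
               R"({"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.)") +
           method + R"(","params":)" + paramsJson + "}";
}
```

Calling buildMaintenanceRequest("getMaintenanceActivityStatus") produces the same body that is passed to curl with -d in the example above.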
getMaintenanceStartTime


Description: Returns the start time of the future time window, in epoch or Unix time (UTC), of the critical activities in the maintenance schedule. The time at which the critical activities start is uniformly distributed by XConf across the entire population of CPEs (based on the PMI).
  1. The maintenance start time from XConf comes as local time, because in the past it was used to configure cron directly, and cron works on local time. This has limitations at the beginning and at the end of the daylight saving period, because maintenance may not run at all on a given day, or may run twice (e.g. if the start time is between 1am and 2am, maintenance won't run on the day the daylight saving period starts, when the clock moves forward, and it will run twice on the day the daylight saving period ends, when the clock moves backward)
  2. By contrast, the start time provided to the upper layers by the maintenance Thunder service must ensure that maintenance never fails to run and never runs twice on the same day. For this reason, the time reported back is always epoch or Unix time, expressed as the number of seconds since 1st Jan 1970 (UTC). Care must be taken when converting local time to epoch time so that maintenance always runs exactly once every day.
Prototype: uint32_t MaintenanceManager::getMaintenanceStartTime (const JsonObject& parameters, JsonObject& response)
Parameters: const JsonObject& parameters, JsonObject& response
RPC Call: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.getMaintenanceStartTime","params":{}}' http://127.0.0.1:9998/jsonrpc
Response:
  • Positive: {"jsonrpc":"2.0","id":3,"result":{"maintenanceStartTime":1640094912,"success":true}}
  • Negative: {"jsonrpc":"2.0","id":3,"result":{"maintenanceStartTime":-1,"success":true}}
Return: uint32_t
startMaintenance


Description: JsonRPC method to start a maintenance cycle (SOLICITED MAINTENANCE). It is triggered by the App (EPG/AS) to start scheduled maintenance based on the Maintenance Start Time. This RPC can be triggered only when a maintenance cycle is not already running. It gives a positive response when no maintenance is running and the trigger is successful, and a negative response if the trigger fails or if a maintenance cycle is already running.
Prototype: uint32_t MaintenanceManager::startMaintenance(const JsonObject& parameters, JsonObject& response)
Parameters: const JsonObject& parameters, JsonObject& response
RPC Call: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.startMaintenance","params":{}}' http://127.0.0.1:9998/jsonrpc
Response:
  • Positive: {"jsonrpc":"2.0","id":3,"result":{"success":true}}
  • Negative: {"jsonrpc":"2.0","id":3,"result":{"success":false}}
Return: uint32_t
stopMaintenance
Description: JsonRPC method to gracefully end all the currently running maintenance tasks, join all threads and exit maintenance with MAINTENANCE_ERROR status. The RPC call gives a response with success true if the running maintenance is stopped successfully, and with success false if the stopping process fails or if there is no running maintenance to stop. It is used by the App (EPG/AS) to stop the currently running maintenance. The Stop Maintenance feature is controlled via the RFC Device.DeviceInfo.X_RDKCENTRAL-COM_RFC.Feature.StopMaintenance.Enable (this defaults to true, i.e. enabled by default)
Prototype: uint32_t MaintenanceManager::stopMaintenance(const JsonObject& parameters, JsonObject& response)
Parameters: const JsonObject& parameters, JsonObject& response
RPC Call: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.stopMaintenance","params":{}}' http://127.0.0.1:9998/jsonrpc
Response:
  • Positive: {"jsonrpc":"2.0","id":3,"result":{"success":true}}
  • Negative: {"jsonrpc":"2.0","id":3,"result":{"success":false}}
Return: uint32_t
setMaintenanceMode
Description: JsonRPC method to set the Maintenance Mode.
  • FOREGROUND
    • This is a vanilla mode that lets Maintenance Manager run all the maintenance tasks
  • BACKGROUND
    1. Aborts activities that are currently running and cannot run in the background (if maintenance was already started in FOREGROUND mode). The only exception is a critical software download.
    2. Executes only tasks that do not impact the user experience, if the maintenance mode is set to BACKGROUND before calling startMaintenance
Prototype: uint32_t MaintenanceManager::setMaintenanceMode(const JsonObject& parameters, JsonObject& response)
Parameters: const JsonObject& parameters, JsonObject& response
RPC Call:
  • FOREGROUND: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.setMaintenanceMode","params":{"maintenanceMode":"FOREGROUND"}}' http://127.0.0.1:9998/jsonrpc
  • BACKGROUND: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.setMaintenanceMode","params":{"maintenanceMode":"BACKGROUND"}}' http://127.0.0.1:9998/jsonrpc
Response:
  • Positive: {"jsonrpc":"2.0","id":3,"result":{"success":true}}
  • Negative: {"jsonrpc":"2.0","id":3,"result":{"success":false}}
Return: uint32_t
getMaintenanceMode
Description: JsonRPC method to get the current Maintenance Manager mode (either FOREGROUND or BACKGROUND). The response also contains the OptOut value (NONE/ IGNORE_UPDATE/ BYPASS_OPTOUT/ ENFORCE_OPTOUT)
Prototype: uint32_t MaintenanceManager::getMaintenanceMode(const JsonObject& parameters, JsonObject& response)
Parameters: const JsonObject& parameters, JsonObject& response
RPC Call: curl -H "Authorization: Bearer `WPEFrameworkSecurityUtility | cut -d '"' -f 4`" --header "Content-Type: application/json" --request POST --silent -d '{"jsonrpc":"2.0","id":"3","method":"org.rdk.MaintenanceManager.1.getMaintenanceMode","params":{"maintenanceMode":"BACKGROUND"}}' http://127.0.0.1:9998/jsonrpc
Response:
  • Foreground: {"jsonrpc":"2.0","id":3,"result":{"maintenanceMode":"FOREGROUND","optOut":"NONE","success":true}}
  • Background: {"jsonrpc":"2.0","id":3,"result":{"maintenanceMode":"BACKGROUND","optOut":"NONE","success":true}}
Return: uint32_t

Events

onMaintenanceStatusChange


Description: A critical utility method that sends an IARM event whenever the Maintenance Status changes during the maintenance flow
Prototype: void MaintenanceManager::onMaintenanceStatusChange(Maint_notify_status_t status)
Input Args: Maint_notify_status_t status
Return: void

Functions

notifyStatusToString
Description: Utility function to get the string equivalent of the Maintenance Status passed
Prototype: string notifyStatusToString(Maint_notify_status_t &status);
Input Args: Maint_notify_status_t &status
Return: string. Returns the Maintenance Status string corresponding to the Maintenance Status passed. The possible Maintenance Status strings are:

  • MAINTENANCE_IDLE
  • MAINTENANCE_STARTED
  • MAINTENANCE_ERROR
  • MAINTENANCE_COMPLETE
  • MAINTENANCE_INCOMPLETE
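A status-to-string mapping of this kind is typically a switch over the enum. The sketch below shows the idea with stand-in names: the MaintStatus enum and statusToString function are assumptions for this example (the real type is Maint_notify_status_t, whose enumerators are not listed here), and the empty-string fallback is also an assumption.

```cpp
#include <cassert>
#include <string>

// Stand-in for Maint_notify_status_t; the enumerator names are hypothetical.
enum class MaintStatus { Idle, Started, Error, Complete, Incomplete };

// Maps each status to the string reported to the upper layers.
std::string statusToString(MaintStatus status) {
    switch (status) {
        case MaintStatus::Idle:       return "MAINTENANCE_IDLE";
        case MaintStatus::Started:    return "MAINTENANCE_STARTED";
        case MaintStatus::Error:      return "MAINTENANCE_ERROR";
        case MaintStatus::Complete:   return "MAINTENANCE_COMPLETE";
        case MaintStatus::Incomplete: return "MAINTENANCE_INCOMPLETE";
    }
    // Not reached for valid enumerators; the fallback value is an assumption.
    return "";
}
```

Keeping the switch without a default label lets most compilers warn when a new status is added to the enum but not to the mapping.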
checkValidOptOutModes
Description: Utility function that checks whether the OptOut mode passed is one of those honored by Maintenance Manager
Prototype: bool checkValidOptOutModes(string OptoutModes);
Input Args: string OptoutModes
Return:
  • true: returned when the OptOut mode passed is one of the following:
    • ENFORCE_OPTOUT
    • BYPASS_OPTOUT
    • IGNORE_UPDATE
    • NONE
  • false: returned when the OptOut mode is not any of the above
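The validity check for OptOut modes is a membership test against the four honored values. A minimal sketch, assuming an exact, case-sensitive comparison (the source does not say how case is handled) and using the hypothetical name isValidOptOutMode:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for checkValidOptOutModes(): true only for the four
// OptOut modes listed above. Comparison is case-sensitive by assumption.
bool isValidOptOutMode(const std::string& mode) {
    static const std::set<std::string> kValidModes = {
        "ENFORCE_OPTOUT", "BYPASS_OPTOUT", "IGNORE_UPDATE", "NONE"};
    return kValidModes.count(mode) > 0;
}
```

With this sketch, "BYPASS_OPTOUT" is accepted, while a misspelling such as "BYPASS_OUTPUT" or a lowercase "none" is rejected.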
moduleStatusToString
Description: Utility function to get the string equivalent of the Maintenance IARM activity status passed
Prototype: string moduleStatusToString(IARM_Maint_module_status_t &status);
Input Args: IARM_Maint_module_status_t &status
Return: string. Returns the status string corresponding to the Maintenance IARM status passed. The possible IARM status strings are:

  • MAINTENANCE_DCM_COMPLETE
  • MAINTENANCE_DCM_ERROR
  • MAINTENANCE_RFC_COMPLETE
  • MAINTENANCE_RFC_ERROR
  • MAINTENANCE_LOGUPLOAD_COMPLETE
  • MAINTENANCE_LOGUPLOAD_ERROR
  • MAINTENANCE_PINGTELEMETRY_COMPLETE
  • MAINTENANCE_PINGTELEMETRY_ERROR
  • MAINTENANCE_FWDOWNLOAD_COMPLETE
  • MAINTENANCE_FWDOWNLOAD_ERROR
  • MAINTENANCE_REBOOT_REQUIRED
  • MAINTENANCE_FWDOWNLOAD_ABORTED
  • MAINTENANCE_CRITICAL_UPDATE
  • MAINTENANCE_EMPTY

