All,

I've been using the Troubleshooting→Logs web page on RDK-B and was surprised that no logs were being displayed at any time frame (Today/Yesterday/Last week/Last month/Last 90 days) for the event logs even though the /nvram/log/event/eventlog file was present on my target and had many entries.

I found that if the number of entries in the eventlog exceeded 1250 that no logs ever get displayed on the web page.

I've tracked this down to the value g_uMaxParamInResponse being set to 5000 in the file https://code.rdkcentral.com/r/plugins/gitiles/rdkb/components/opensource/ccsp/CcspCommonLibrary/+/master/source/ccsp/components/common/MessageBusHelper/helper/messagebus_interface_helper.c.

The code in https://code.rdkcentral.com/r/plugins/gitiles/rdkb/components/opensource/ccsp/webui/+/master/source/Styles/xb3/code/actionHandler/ajax_troubleshooting_logs.php uses the following command to get the event logs:

exec("/usr/ccsp/ccsp_bus_client_tool eRT getv Device.X_CISCO_COM_Diagnostics.Eventlog.Entry. | grep 'type:' > /tmp/log_event.txt");

This command tries to get all the entries under the Eventlog at once.

Since each Device.X_CISCO_COM_Diagnostics.Eventlog.Entry has 4 parameters, if you have more than 1250 eventlog entries, you've gone over the max param in response value of 5000.

As a result, the return from the ccsp_bus_client_tool call is Execution fail(error code:UNKNOWN MESSAGE TYPE(-47)), which results in no values put into the /tmp/log_event.txt for the web page to display.


I think not displaying any events in this case is not correct; if increasing the number of acceptable parameters is not a good option (for memory purposes), then it seems the web page should display entries up to the maximum number allowed as defined by the time frame chosen.

Getting individual entries (e.g. Device.X_CISCO_COM_Diagnostics.Eventlog.Entry.1251, etc.) works since the number of parameters for an individual entry is only four, but changing the web page to do a for loop and make a call for each entry doesn't seem correct in terms of time or approach.

Any thoughts on how best to make the Troubleshooting Logs web page more accurately display the event logs and get around this max param problem would be appreciated, since I think this is something that would benefit all RDK-B users.  For example, is there a way to amend the ccsp_bus_client_tool call so it can get a range of entries instead of trying to get all entries, is the individual loop method the way to go, or is there a better solution than either one of these?

  • No labels

7 Comments

  1. Hi Matthew,

    Can you try to run this command manually from the console if possible and share the output?

    dmcli eRT getv Device.X_CISCO_COM_Diagnostics.Eventlog.Entry.  


    Also please share the PAM.log.* from /rdklogs/logs/ directory

    Additionally can you tell us
    1. If your build is based on the latest master?
    2. If a /tmp/troubleshooting_logs_event_***.txt was created when the query was made.


  2. PAMlog.txt.0

    Here's the output of the command:

    # dmcli eRT getv Device.X_CISCO_COM_Diagnostic

    CR component name is: eRT.com.cisco.spvtg.ccsp.CR
    subsystem_prefix eRT.
    getv from/to component(eRT.com.cisco.spvtg.ccsp.pam): Device.X_CISCO_COM_Diagnostics.Eventlog.Entry.
    Execution fail(error code:UNKNOWN MESSAGE TYPE(-47)).

    I've included the PAM.log.0 as an attachment.

    The build is based on the rdkb-20180527 build, but I don't think any major changes to this part of the rdk-b code have been made since then.

    A /tmp/troubleshooting_logs_event_***.txt is created (and is empty) when the query is made from the web page.

  3. Hi Matthew,

    We are unable to open the PandM logs that you have shared. Can you share it again?
    In addition, can you also share /nvram/log/event/eventlog?
    We suspect that the PandM might have crashed at some point of the query and to confirm this, can you also run

    #systemctl status CcspPandMSsp
    #journalctl

    and share the output with us?

  4. eventlog.zipPAMlog.txt.0


    Sujithra,

    I've included a new PAMlog.txt.0.  I could only see something in it when I opened it with Notepad++.  The first line is really long with a large number of NULL binary values, followed by:

    180910-19:29:05.351662 [mod=PAM, lvl=ERROR] [tid=3675] Total parameter count exceeds 5000 !!

    180910-19:29:05.351822 [mod=PAM, lvl=ERROR] [tid=3675] CcspCcMbi_GetParameterValues -- Something has screwed up. Failure status returned

    180910-19:29:05.351910 [mod=PAM, lvl=ERROR] [tid=3675] when get parameters: Device.X_CISCO_COM_Diagnostics.Eventlog.Entry. 
    180910-19:29:05.352310 [mod=PAM, lvl=WARN] [tid=3675] <CCSP_Message_Bus_Process_Thread> !!!!PSM mode switching happened. Send singal to check dbus connection


    I included the eventlog as a .zip attachment because it's so large.

    This is what I get from "systemctl status CcspPandMSsp":

    systemctl status CcspPandMSsp
    ? CcspPandMSsp.service - CcspPandMSsp service
    Loaded: loaded (/lib/systemd/system/CcspPandMSsp.service; enabled)
    Active: active (running) since Mon 2018-09-10 19:06:52 ; 52min ago
    Main PID: 3672 (CcspPandMSsp)
    Status: "CcspPandMSsp is Successfully Initialized"
    CGroup: /system.slice/CcspPandMSsp.service
    tq 3672 /usr/bin/CcspPandMSsp -subsys eRT.
    mq11426 ti_dhcp6c -i erouter0 -p /var/run/erouter_dhcp6c.pid -plugin /fss/gw/lib/libgw_dhcp6plg.so -d

    The journalctl is very long.  I've included the relevant parts around 19:29:05 in the journalctl.txt file attached here.

  5. Hi Matthew,

    Looking at the logs, it is evident that the logs are flooding from the OEM's side into the EventLogs under nvram, the common library message bus has a max limit to have a count of 5000 entries, beyond which this causes an issue and CcspPandM process crashes. Could you please check for the reason of flooding of logs? 

  6. Sujithra,

    The question I'm asking is not about the flooding of the logs.   The question I'm asking is 'if we have more than 5000 parameters, shouldn't there be a better way to handle the output on the web page other than just not displaying anything?'.

    You could get over 5000 parameters (which corresponds to 1250 log entries) in the GetParameterValues call without any flooding of the logs, given the requirement to be able to show the logs over the last 90 days.

    As I stated before, I think not displaying any events in this case is not correct; if increasing the number of acceptable parameters is not a good option (for memory purposes), then it seems the web page should display entries up to the maximum number allowed as defined by the time frame chosen, i.e. show the most recent 1250 log entries, instead of not showing anything.

    So, the answer I'm hoping you can provide is a generic RDK-B way of handling this error, so that instead of not displaying anything on the web page, the most recent events are still shown.  This is what I think would benefit all RDK-B users in case their log gets to the max number of parameters.

  7. Hi Matthew,

    Yes this is a valid error, the list of last 1250 entries should come up even if the number of entries is exceeding the max count. We will be resolving the issue, meanwhile if you have any fix with you, we will welcome your contributions.


    Thanks

    Sujithra