Grid member(s) go offline and are unable to rejoin the grid. The offline member shows a date in 2004, and its syslog contains messages indicating VPN certificate expiry.
The issue may affect grids that have run NIOS 4.x or 5.0rX in the past, even if they are currently running a later release.
NIOS versions 5.1r2 and earlier contained a CA certificate that expires this month (April 2014), which may cause grid members to drop off the grid and be unable to rejoin it.
The certificate is necessary to establish VPN connections between members and the grid master. In most cases, this certificate has been replaced with a newer certificate during the upgrade process. However, NIOS grid members that originally ran versions 4.x or 5.0rX may have retained the older certificate.
The ‘notAfter’ date in the syslog message below helps to confirm whether the member is offline due to certificate expiry:
2014-04-18T21:18:40+00:00 user (none) VPN_CACERT_DATES: notice notBefore=Apr 16 22:45:48 2004 GMT notAfter=Apr 14 22:45:48 2014 GMT
Infoblox strongly recommends applying a hotfix that replaces the expiring CA certificate to any grid that has run NIOS 5.0 or earlier at any point. This hotfix simply replaces the CA certificate with the newest version and is safe to apply to all grids.
1. How to identify whether the grid has members that contain expired CA certificates:
Apply the attached diagnostic hotfix Hotfix-NIOS-GENERIC-CHECK-VPNC-CACERT-DATES-4d27a1b49bd52246d0e9ed28d2ae9fed-Fri-Apr-18-13-19-25-2014.bin using the GUI functionality: Grid | Upgrade | Upload. No restart is needed afterwards. Check the syslog file of every member for a message that contains the following:
2014-04-18T21:18:40+00:00 user (none) VPN_CACERT_DATES: notice notBefore=Apr 16 22:45:48 2004 GMT notAfter=Apr 14 22:45:48 2014 GMT
If messages that contain “notAfter=Apr 14 22:45:48 2014 GMT” are observed, proceed with applying the remedial hotfix that matches the software release running on the grid.
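When many members have to be checked, the syslog search can be scripted against exported syslog files. The following Python sketch is an illustration only and is not part of the hotfix: it assumes the message has exactly the layout of the sample line above, and the names parse_cacert_dates and has_expiring_ca_cert are hypothetical.

```python
import re
from datetime import datetime, timezone

# Pattern modelled on the sample VPN_CACERT_DATES line quoted in this article.
CACERT_DATES = re.compile(
    r"VPN_CACERT_DATES: notice "
    r"notBefore=(?P<not_before>.+? GMT) notAfter=(?P<not_after>.+? GMT)"
)
# %b expects English month abbreviations (the default "C" locale).
DATE_FORMAT = "%b %d %H:%M:%S %Y GMT"
# notAfter value of the expiring CA certificate, as given in this article.
OLD_CA_NOT_AFTER = datetime(2014, 4, 14, 22, 45, 48, tzinfo=timezone.utc)


def parse_cacert_dates(line):
    """Return (not_before, not_after) as UTC datetimes, or None if no match."""
    match = CACERT_DATES.search(line)
    if match is None:
        return None
    return tuple(
        datetime.strptime(match.group(name), DATE_FORMAT).replace(tzinfo=timezone.utc)
        for name in ("not_before", "not_after")
    )


def has_expiring_ca_cert(line):
    """True if the line reports the CA certificate that expires on Apr 14 2014."""
    dates = parse_cacert_dates(line)
    return dates is not None and dates[1] == OLD_CA_NOT_AFTER


if __name__ == "__main__":
    sample = (
        "2014-04-18T21:18:40+00:00 user (none) VPN_CACERT_DATES: notice "
        "notBefore=Apr 16 22:45:48 2004 GMT notAfter=Apr 14 22:45:48 2014 GMT"
    )
    print(has_expiring_ca_cert(sample))
```

Any member whose syslog yields a positive result would need the remedial hotfix; a plain text search for the notAfter string gives the same answer.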
2. Which remedial hotfix to use:
Engineering has developed three hotfixes based on the major NIOS product lines (NIOS 4.x, NIOS 5.x, NIOS 6.x), all attached to this KB. Please use the one that corresponds to the software release your grid currently runs:
Customers running NIOS 4.x:
Hotfix-NIOS-4.X-GENERIC-J46112-VPN-CRT-TIME_ISSUES-SIMPLIFIED-344608670a439debd057aedc7fb866f5-Fri-Dec-20-19-44-58-2013.bin
Note: Restart the product on all members to make sure the new code and certificates are used.
Customers running NIOS 5.x releases prior to NIOS 5.1r3-0:
Hotfix-NIOS-5.X-PRE-5.1r3-0-J46113-VPN_CRT_TIME_ISSUES-SIMPLIFIED-fca7ecfe2a3a3fa868521157635e3dd8-Mon-Dec-23-19-25-01-2013.bin
Note: Restart the product on all members to make sure the new code and certificates are used.
Customers running NIOS 5.x from 5.1r3-0 onwards, or NIOS 6.x:
Hotfix-NIOS-POST-51r3-GENERIC-J47667-VPN-CRT-TIME_ISSUES-7e21306d286a1a5cce365529a861aad6-Thu-Apr-17-22-31-14-2014.bin
Note: Restart the product on all members to make sure the new code and certificates are used.
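The selection rule above can be written as a small lookup. This Python sketch is illustrative only: it assumes version strings of the form major.minor with an optional rN suffix (for example 5.1r3-0), treats a missing rN as r0, and uses a hypothetical function name, select_hotfix. Releases outside 4.x to 6.x are not covered by this article, so the sketch rejects them.

```python
import re

HOTFIX_4X = (
    "Hotfix-NIOS-4.X-GENERIC-J46112-VPN-CRT-TIME_ISSUES-SIMPLIFIED-"
    "344608670a439debd057aedc7fb866f5-Fri-Dec-20-19-44-58-2013.bin"
)
HOTFIX_PRE_51R3 = (
    "Hotfix-NIOS-5.X-PRE-5.1r3-0-J46113-VPN_CRT_TIME_ISSUES-SIMPLIFIED-"
    "fca7ecfe2a3a3fa868521157635e3dd8-Mon-Dec-23-19-25-01-2013.bin"
)
HOTFIX_POST_51R3 = (
    "Hotfix-NIOS-POST-51r3-GENERIC-J47667-VPN-CRT-TIME_ISSUES-"
    "7e21306d286a1a5cce365529a861aad6-Thu-Apr-17-22-31-14-2014.bin"
)

# Assumed version layout: <major>.<minor>[r<release>], anything after is ignored.
VERSION = re.compile(r"^(\d+)\.(\d+)(?:r(\d+))?")


def select_hotfix(version):
    """Return the remedial hotfix file name for a NIOS version string."""
    match = VERSION.match(version.strip())
    if match is None:
        raise ValueError(f"unrecognised NIOS version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    release = int(match.group(3) or 0)  # assumption: no rN suffix means r0
    if major == 4:
        return HOTFIX_4X
    if major == 5:
        # 5.x splits at 5.1r3: earlier releases take the PRE-5.1r3-0 hotfix.
        return HOTFIX_PRE_51R3 if (minor, release) < (1, 3) else HOTFIX_POST_51R3
    if major == 6:
        return HOTFIX_POST_51R3
    raise ValueError(f"NIOS {version!r} is not covered by this article")
```

For example, select_hotfix("5.1r2-0") returns the PRE-5.1r3-0 file name, while select_hotfix("5.1r3-0") returns the POST-51r3 one.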
Please log in through the GUI and go to Grid | Upgrade | Upload. Upload the hotfix, then restart the product on every grid member.
4.x releases: From the Grid perspective > Right-click the member node > Restart Product > OK
5.x and 6.x releases: Grid > Grid Manager > Members > Select the Member > Control (from the Toolbar) > Restart
Restart the product on the grid members first, before restarting the grid master. For an HA pair, restart the passive node and then the active node. A product restart is a complete NIOS software restart, and there will be an interruption of a few seconds during the restart.
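On a large grid it can help to write the restart sequence down before starting. The following Python sketch only orders a list of nodes according to the rules above (members before the master, passive before active within an HA pair); it does not talk to NIOS, and the Node fields and member names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    member: str                  # grid member name (hypothetical label)
    is_master: bool              # True for the grid master
    ha_role: str = "standalone"  # "passive", "active", or "standalone"


def restart_order(nodes):
    """Sort nodes: members before the master, passive before active per member."""
    return sorted(
        nodes,
        key=lambda n: (n.is_master, n.member, n.ha_role == "active"),
    )


if __name__ == "__main__":
    grid = [
        Node("gm", True, "active"),
        Node("member2", False),
        Node("member1", False, "active"),
        Node("gm", True, "passive"),
        Node("member1", False, "passive"),
    ]
    for step, node in enumerate(restart_order(grid), start=1):
        print(step, node.member, node.ha_role)
```

The sort key places every non-master node first and, for nodes sharing a member name, the passive node ahead of the active one.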