NSX Bytes: Important Bug in 6.2.4 to be Aware of
[UPDATE] In light of this post being quoted on The Register, I wanted to clarify a couple of things. First off, as mentioned, there is a fix for this issue (the KB should be rewritten to clearly state that). Secondly, if you read below, you will see that I did not state that just about anyone running NSX-v 6.2.4 will be impacted. Greenfield deployments are not impacted.
Here we go again… I thought maybe we were over these, but it looks like NSX-v 6.2.4 contains a fairly serious bug impacting VMs after vMotion operations. I had intended to write about this earlier in the week when I first became aware of the issue, however the last couple of days have gotten away from me. That said, please be aware of this issue as it will impact those who have upgraded NSX-v from 6.1.x to 6.2.4.
As the KB states, the issue appears if you have the Distributed Firewall enabled (it’s enabled and inline by default) and you have upgraded NSX-v from 6.1.x to 6.2.3 or above, though for most this will apply to 6.2.4 upgrades given all the issues in 6.2.3. If VMs are migrated between upgraded hosts, they will lose network connectivity and require a reboot to restore it.
If you check the vmkernel.log file you will see entries similar to those below.
2016-07-01T07:02:37.357Z cpu7:223405)WARNING: NetDVS: 547: portAlias is NULL
2016-07-01T07:02:37.357Z cpu7:223405)Net: 2312: connected VM eth0 to VM Network, portID 0x200000c
2016-07-01T07:02:37.362Z cpu7:223405)PFImportState: unsupported version: 0
2016-07-01T07:02:37.363Z cpu7:223405)vsip VSIPDVFRestoreState:1912: Failed to restore PF state : Limit exceeded
2016-07-01T07:02:37.363Z cpu7:223405)WARNING: NetPort: 1431: failed to enable port 0x200000c: Failure
2016-07-01T07:02:37.363Z cpu7:223405)NetPort: 1632: disabled port 0x200000c
2016-07-01T07:02:37.363Z cpu7:223405)WARNING: Net: vm 223391: 5353: cannot enable port 0x200000c: Failure
2016-07-01T07:02:37.383Z cpu7:223405)Net: 3354: disconnected client from port 0x200000c
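If you want to check a host for this signature without reading the log by eye, a small script can do the matching. The sketch below is a rough heuristic of my own, not anything from the KB or from VMware: the function names (`find_affected_ports`, `scan_file`) are made up for illustration, and the only strings it looks for are the ones visible in the excerpt above. It flags a port when a "failed to enable port" warning follows a PF state restore error.

```python
import re

# Message fragments copied from the log excerpt above. Treating them as the
# signature of this bug is an assumption of this sketch, not an official check.
RESTORE_ERRORS = (
    "PFImportState: unsupported version",
    "Failed to restore PF state",
)
PORT_FAILURE = re.compile(r"failed to enable port (0x[0-9a-fA-F]+)")


def find_affected_ports(lines):
    """Return port IDs whose enable failure follows a PF state restore error."""
    saw_restore_error = False
    ports = []
    for line in lines:
        if any(fragment in line for fragment in RESTORE_ERRORS):
            saw_restore_error = True
            continue
        match = PORT_FAILURE.search(line)
        if match and saw_restore_error:
            port = match.group(1)
            if port not in ports:
                ports.append(port)
            # Reset so an unrelated later port failure is not flagged.
            saw_restore_error = False
    return ports


def scan_file(path):
    """Scan a log file (hypothetical helper) and return the flagged port IDs."""
    with open(path, errors="replace") as handle:
        return find_affected_ports(handle)
```

Run against a copy of the log, something like `scan_file("vmkernel.log")` returns a list of port IDs such as `0x200000c` that you can then match back to the VM named in the nearby "connected" line. An empty list only means the signature was not found in that file, not that the host is unaffected.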
Cause
This issue occurs when the VSIP module at the kernel level does not handle the export_version deployed in NSX for vSphere 6.1.x correctly during the upgrade process.
There is no current resolution to the issue apart from the VM reboot, but there is a workaround in the form of a script that can be obtained via GSS if you reference KB2146171. Hopefully there will be a proper fix in future NSX releases.