Pff, back to this blog after weeks of R&S! While I was helping a friend with a lab scenario, we ran into several nice LWAPP AP error messages, using debug lwapp events enable and debug lwapp errors enable on the controller CLI. Of course, the APs were not joining. So here they are. The game is to try to guess what is going wrong:
First one, difficulty level 3 (scale of 5):
AP Associated. Base Radio MAC: f0:aa:bb:cc:dd:ee
AP Disassociated. Base Radio MAC:f0:aa:bb:cc:dd:ee
AP with MAC f0:aa:bb:cc:dd:ee (APf0aa.bbcc.ddee) is unknown
AP is unknown? What do you mean, unknown? Well, this AP model is not supported yet on this code version. Here we were trying to join a 1140 AP to a controller running 4.2.130.
Okay, next one, difficulty level 2:
AP cannot join because the maximum number of APs on interface 1 is reached.
Self-explanatory, right? A 4400 controller can take up to 48 APs per physical port without LAG, and up to the controller's maximum licensed capacity with LAG. Here we had 26 APs trying to join a 4402-25 controller, which is licensed for 25 APs.
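To make the arithmetic concrete, here is a minimal Python sketch of the limit described above. The function name and the idea of modelling the limit as a simple minimum are my own illustration, not the controller's actual logic; the 48-per-port figure and the 25-AP capacity come from the text.

```python
PER_PORT_LIMIT = 48  # per-port figure for a 4400 without LAG, as quoted above


def max_aps(licensed_aps: int, ports_in_use: int, lag: bool) -> int:
    """Hypothetical model of how many APs a controller will accept."""
    if lag:
        # With LAG, only the licensed capacity matters.
        return licensed_aps
    # Without LAG, each physical port carries at most PER_PORT_LIMIT APs,
    # and the license still caps the total.
    return min(licensed_aps, PER_PORT_LIMIT * ports_in_use)


limit = max_aps(licensed_aps=25, ports_in_use=1, lag=False)
print(limit)        # 25
print(26 <= limit)  # False: the 26th AP is refused
```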
Next one, difficulty level 1:
Register event for AP f0:aa:bb:cc:dd:ee slot 0
AP f0:aa:bb:cc:dd:ee: Country code is not configured (FR).
f0:aa:bb:cc:dd:ee Regulatory Domain Mismatch: AP 00:22:90:eb:66:50 not allowed to join. Regulatory Domain check failed.
Here again, quite easy. The controller is not configured for the country for which the AP is set.
Next one, difficulty 4:
...does not include valid certificate in CERTIFICATE_PAYLOAD from AP f0:aa:bb:cc:dd:ee.
Unable to free public key
Public key? Sounds like a certificate issue. In this case, the AP time is so far from the controller time (the controller time is set to 01/01/1980!) that the AP certificate is seen as invalid... debug pm pki enable tells the complete reason:
Current time outside AP cert validity interval: make sure the controller time is set.
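The check behind that message boils down to comparing the controller clock against the certificate validity window. Here is a small Python sketch of the idea; the function name and the validity dates are made up for illustration (I did not read them from a real AP certificate), and only the 1980 controller date comes from the lab.

```python
from datetime import datetime


def cert_time_valid(now: datetime, not_before: datetime, not_after: datetime) -> bool:
    """Return True if 'now' falls inside the certificate validity interval."""
    return not_before <= now <= not_after


# Hypothetical validity window for an AP certificate (assumed values).
not_before = datetime(2008, 6, 1)
not_after = datetime(2018, 6, 1)

# Controller clock left at its default date: the certificate looks not yet valid.
print(cert_time_valid(datetime(1980, 1, 1), not_before, not_after))   # False

# Controller clock set correctly (any date inside the window): the check passes.
print(cert_time_valid(datetime(2009, 3, 15), not_before, not_after))  # True
```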
Next one, difficulty level 2:
AP Authorization failure for f0:aa:bb:cc:dd:ee
Yep, as the warning says! There is an AP authorization list set on the controller (under security > Wireless policies), and as soon as it is enabled, any rebooting AP not in the list is refused by the controller. This one is easy when you see the message, a bit trickier in a real network. When you set up the authorization list, all the already connected APs stay happily on the controller. It is only when they disconnect and try to reconnect that the problem occurs.
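The delayed effect is easier to see in a toy model. The Python sketch below is purely illustrative: the Controller class and its methods are hypothetical names of mine, and the only behavior it encodes is the one described above, namely that the list is checked at join time and not against APs that are already joined.

```python
class Controller:
    """Toy model of a controller with an AP authorization list (hypothetical)."""

    def __init__(self):
        self.joined = set()
        self.auth_list_enabled = False
        self.auth_list = set()

    def join(self, mac: str) -> bool:
        # The authorization list is only consulted when an AP tries to join.
        if self.auth_list_enabled and mac not in self.auth_list:
            return False
        self.joined.add(mac)
        return True

    def enable_auth_list(self, macs):
        # Enabling the list does not kick out APs that are already joined.
        self.auth_list = set(macs)
        self.auth_list_enabled = True

    def reboot(self, mac: str) -> bool:
        # A rebooting AP drops off and has to join again.
        self.joined.discard(mac)
        return self.join(mac)


wlc = Controller()
wlc.join("f0:aa:bb:cc:dd:ee")
wlc.enable_auth_list([])                  # list enabled, AP not added to it
print("f0:aa:bb:cc:dd:ee" in wlc.joined)  # True: still happily joined
print(wlc.reboot("f0:aa:bb:cc:dd:ee"))    # False: refused after the reboot
```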
Next one, difficulty level 3:
Received a Discovery Request with subnet broadcast with wrong AP IP address (A.B.C.D)!
Well, if you are in the real lab, you'd better check your trunk to your controller. Here an AP is in a VLAN unknown to the controller (let's say VLAN 50, while the controller management interface is in VLAN 10; several dynamic interfaces exist on this controller port, but none in VLAN 50). As the trunk to the controller does not filter which VLANs are allowed, the AP broadcast in VLAN 50 is sent over the trunk, and the controller wonders what this message is supposed to be.
Time for me to beat a dead horse on this topic: in this scenario, VLAN 50 should be forbidden on the trunk to the controller, and routing should be used to communicate between the AP in VLAN 50 and the controller in VLAN 10. In other words, ip routing is enabled on the Layer 3 switch and, again, VLAN 50 is NOT allowed on the trunk to the controller.
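Here is a rough Python sketch of why the controller complains. It assumes the controller simply compares the source address of a broadcast discovery against the subnets of its own interfaces. That is my guess at the logic, not something I verified in the code, and all subnets and addresses below are invented for the example.

```python
import ipaddress

# Hypothetical controller interfaces and their subnets (assumed values).
controller_interfaces = {
    "management (VLAN 10)": ipaddress.ip_network("10.0.10.0/24"),
    "dynamic (VLAN 20)": ipaddress.ip_network("10.0.20.0/24"),
}


def broadcast_source_is_local(ap_ip: str) -> bool:
    """True if the AP address belongs to a subnet the controller knows about."""
    addr = ipaddress.ip_address(ap_ip)
    return any(addr in net for net in controller_interfaces.values())


print(broadcast_source_is_local("10.0.10.37"))  # True: AP in the management VLAN
print(broadcast_source_is_local("10.0.50.12"))  # False: AP in VLAN 50, "wrong AP IP address"
```

A subnet broadcast should never cross a router, so a broadcast arriving from an address outside every local subnet can only mean that a VLAN is leaking onto the trunk.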
Last one, level 5:
unable to free slot ID 0 for AP f0:aa:bb:cc:dd:ee.
This is a tough one! Let me give you some more clues: the AP is in a remote location, accessing the controller through a WAN link. The AP knows this controller and was manually configured to join it (from the AP CLI: lwapp ap controller ip address a.b.c.d)...
Another clue: the controller has plenty of room for this AP, so this is NOT really a slot availability problem...
Well, the issue was in fact... routing! The AP had a route to the controller, but the controller's router did not have a route to the AP subnet! As the AP was manually configured, it was sending a join request. The join reply never reached the AP, as there was no return path from the controller. So the AP kept trying, probably exhausting the controller slots, and this message appeared on the controller.
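A quick way to reason about this kind of problem is to check both directions separately. The Python sketch below does that with two tiny, invented route lists; the helper name, the prefixes and the addresses are all assumptions for illustration, not the real lab values.

```python
import ipaddress


def has_route(routes, destination: str) -> bool:
    """True if any prefix in 'routes' covers the destination address."""
    dest = ipaddress.ip_address(destination)
    return any(dest in ipaddress.ip_network(prefix) for prefix in routes)


# Hypothetical routing tables on each side of the WAN (assumed values).
ap_side_routes = ["0.0.0.0/0"]                              # default route toward the WAN
controller_side_routes = ["10.0.10.0/24", "10.0.20.0/24"]   # no entry for the AP subnet

ap_ip = "192.168.50.12"
controller_ip = "10.0.10.5"

forward_path = has_route(ap_side_routes, controller_ip)   # join request direction
return_path = has_route(controller_side_routes, ap_ip)    # join reply direction
print(forward_path, return_path)  # True False
```

The join request arrives, the join reply has nowhere to go, and the AP retries forever.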
Fun!