So this customer calls me, telling me he needs help troubleshooting an issue. The customer delivers WiFi in a multi-tenant setup. They own the facilities and deliver basic services such as power, water, and WiFi to the tenants. I really like the idea of not letting tenants bring their own WiFi. Otherwise the RF would look like Cisco Live’s World of Solutions in no time: 40+ APs all trying to send beacons at 1 Mbit/s. Not pretty.
The issue was that one tenant’s Samsung smartphones and tablets got kicked off the WiFi and had trouble associating again. No other tenant had complained.
The customer had recently upgraded their Cisco Wireless LAN Controllers from 8.0.something to 8.1.something.
Going onsite, the first thing we did was get hold of one of the clients that had issues. Sitting with the client in the IT department, everything worked fine. As this is a large facility and the clients move around a lot, we decided to test roaming, just by moving from one room to the next. This triggered the error: the client disconnected from the WiFi and didn’t seem to join again.
On the CLI of the WLC I started the debug client command to look for any clues.
*apfMsConnTask_0: Sep 22 10:41:08.798: re:mo:ve:d0:95:59 Processing assoc-req station:re:mo:ve:d0:95:59 AP:re:mo:ve:d1:83:d0-01 thread:151ae860
*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Reassociation received from mobile on BSSID re:mo:ve:d1:83:ce AP AP*****170a
*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Optimized Roaming : Client RSSI(-81) is lower than the association RSSI threshold(-74), reject the association request
*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Sending assoc-resp with status 34 station:re:mo:ve:d0:95:59 AP:re:mo:ve:d4:33:30-01 on apVapId 3 SSID****1
*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Sending Assoc Response to station on BSSID re:mo:ve:d1:83:dd (status 34) ApVapId 3 Slot 1
*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Scheduling deletion of Mobile Station: (callerId: 22) in 10 seconds
Once again, debug client comes to the rescue.
Optimized Roaming was clearly causing the issue. Optimized Roaming looks at the packets received from a client and uses the RSSI and data rate to “help” the client move to a better (optimized) access point. So basically the client tries to connect to an access point, but the infrastructure rejects it with association status code 34.
Status code 34: Association denied due to excessive frame loss rates and/or poor conditions on current operating channel.
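If you have a long debug capture to go through, a few lines of Python can pick out the rejections for you. This is only a minimal sketch: the regular expression is written against the exact wording of the debug line shown above, which may differ between WLC releases, and the function name find_rejections is made up for this example.

```python
import re

# Matches the Optimized Roaming rejection line as worded in the debug output above.
# Assumption: other WLC releases may phrase this line differently.
REJECT_RE = re.compile(
    r"(?P<mac>\S+) Optimized Roaming : Client RSSI\((?P<rssi>-?\d+)\) is lower than "
    r"the association RSSI threshold\((?P<threshold>-?\d+)\)"
)


def find_rejections(log_text):
    """Return (client, rssi_dbm, threshold_dbm) for each rejection found in log_text."""
    return [
        (m.group("mac"), int(m.group("rssi")), int(m.group("threshold")))
        for m in REJECT_RE.finditer(log_text)
    ]


# The rejection line from the capture above, split only to keep the source readable.
sample = (
    "*apfMsConnTask_0: Sep 22 10:41:08.799: re:mo:ve:d0:95:59 Optimized Roaming : "
    "Client RSSI(-81) is lower than the association RSSI threshold(-74), "
    "reject the association request"
)

for mac, rssi, threshold in find_rejections(sample):
    print(f"{mac}: RSSI {rssi} dBm is {threshold - rssi} dB below the threshold of {threshold} dBm")
```

For the line above, the client was 7 dB short of what the WLC wanted to see before accepting the association.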
My first question to the customer was: “Did you enable Optimized Roaming as part of the software upgrade you did recently?”
No, Optimized Roaming had been enabled for almost a year. So why did this error occur just now? And why are only the Samsung devices affected?
Looking in the release notes for the 8.1 software the customer was running, I found a bug that had been resolved in 8.1.
CSCuw95126 Optimized Roaming Rejection function does not work
Symptom: To prevent consecutive Optimized Roaming association and disassociations, the Cisco wireless controller should reject client association by Optimized_Roaming_Threshold + 6dB, but in production, this rejection function does not work.
Conditions: Software: 8.0.*
The customer enabled Optimized Roaming while running 8.0, but hit Cisco bug CSCuw95126. The WLC failed to reject clients in 8.0, so the clients stayed associated. With the upgrade to 8.1, Cisco had fixed that bug, so the rejects now work and kick clients off the network, just as Optimized Roaming is supposed to. Who said fixing a bug will get you good results?
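To make the before and after behaviour concrete, here is a rough sketch of the rejection rule as the bug description words it. This is not Cisco's implementation. The -80 dBm Optimized Roaming threshold is an assumption I derived from the -74 in the debug output minus the 6 dB margin, as the customer's configured value is not shown here, and the rejection_works flag is a made-up stand-in for "bug present (8.0)" versus "bug fixed (8.1)".

```python
REJECTION_MARGIN_DB = 6  # from the bug description: Optimized_Roaming_Threshold + 6dB

# Assumption: inferred from the association threshold of -74 dBm seen in the debug output.
ASSUMED_OPTIMIZED_ROAMING_THRESHOLD_DBM = -80


def association_threshold(optimized_roaming_threshold_dbm):
    """RSSI a client must reach before a (re)association is accepted."""
    return optimized_roaming_threshold_dbm + REJECTION_MARGIN_DB


def should_reject(client_rssi_dbm, optimized_roaming_threshold_dbm, rejection_works=True):
    """Return True if the association request would be rejected (status 34).

    rejection_works=False models the 8.0 behaviour described in the bug,
    where the rejection never happened. The flag is hypothetical.
    """
    if not rejection_works:
        return False
    return client_rssi_dbm < association_threshold(optimized_roaming_threshold_dbm)


client_rssi = -81  # from the debug output
print(association_threshold(ASSUMED_OPTIMIZED_ROAMING_THRESHOLD_DBM))
print(should_reject(client_rssi, ASSUMED_OPTIMIZED_ROAMING_THRESHOLD_DBM, rejection_works=False))
print(should_reject(client_rssi, ASSUMED_OPTIMIZED_ROAMING_THRESHOLD_DBM, rejection_works=True))
```

Under these assumptions the same client at -81 dBm is let in on 8.0 and rejected on 8.1, which matches what the tenant experienced after the upgrade.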
As only this type of Samsung device from this tenant had issues, they opened a support request with Samsung. Evidently this client’s software didn’t like status code 34 and couldn’t recover from it.
As a workaround, we disabled Optimized Roaming until the tenant could get updated software from Samsung.