Teams Network Roaming Policies

Roaming policies are a relatively new addition to Microsoft Teams. Their predecessor in Skype for Business was called Call Admission Control, which is an industry term for “should I admit this call onto the link it’s destined for, or will that overload it and cause issues?”.

The SfB configuration for CAC was (is?)… not fun. You needed to build out policies based on each site's connection to other sites and regions, you had to sort out per-user and per-site restrictions, and you had to make broad assumptions about how much bandwidth particular calls might consume. It did allow calls from one SfB user to another to be placed via the PSTN when the WAN link was at capacity (or down), and that configuration was fun. (Or maybe I was just tired?)

But Dear Reader, bandwidth consumption is highly dependent on payload and user activity! For example, a video of a user sitting at a webcam in front of a blank wall will consume different amounts of bandwidth depending on whether that user is wearing a plain shirt or a patterned one. Why? Video compression/encoding depends on the changes between frames, and a patterned shirt will result in more changes than an equivalent plain shirt. So I'm supposed to plan bandwidth consumption based on the shirts people wear? Well no, that's just one example. A screenshare of a static document will consume less than a video, or less than a screenshare of a web page with ads popping up all over the place. A voice call through an SBC may flip to G.711 and consume 64 kbps (plus overhead) across a WAN, whereas a voice call between two clients may use a "better" codec and consume 8 or 16 kbps.
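To put numbers on the "plus overhead" part, here's a minimal sketch of the usual per-packet arithmetic for one direction of a voice stream. It assumes 20 ms packetization and IPv4/UDP/RTP headers (40 bytes). It ignores SRTP authentication tags, FEC, and any VPN or layer-2 framing unless you pass them in. The codec bitrates are just the figures from the paragraph above, and the function name is hypothetical.

```python
"""Rough one-direction bandwidth estimate for a single voice stream."""

# Assumed per-packet headers in bytes: IPv4 (20) + UDP (8) + RTP (12).
# SRTP auth tags, FEC, and tunnel overhead are deliberately ignored.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def voice_bandwidth_kbps(codec_kbps: float, ptime_ms: int = 20,
                         l2_overhead_bytes: int = 0) -> float:
    """Approximate on-the-wire bitrate of one audio stream, in kbps.

    codec_kbps        -- codec payload bitrate (64 for G.711)
    ptime_ms          -- packetization interval; 20 ms is assumed by default
    l2_overhead_bytes -- optional link-layer bytes per packet (e.g. 18 for Ethernet)
    """
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packet_bytes = payload_bytes + IP_UDP_RTP_HEADER_BYTES + l2_overhead_bytes
    return packet_bytes * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for name, kbps in [("G.711", 64), ("16 kbps codec", 16), ("8 kbps codec", 8)]:
        print(f"{name}: ~{voice_bandwidth_kbps(kbps):.0f} kbps per direction")
```

Under these assumptions G.711 comes out around 80 kbps per direction and a 16 kbps codec around 32 kbps. At low bitrates the headers are a big share of the traffic, which is one more reason a single static per-call estimate was always a rough guess.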

A second consideration was that CAC couldn't account for backup WAN links or SDWAN capabilities; it only used the values you told it were available between the two sites. If you had a lower-bandwidth backup link and you were adventurous, you could try to restrict usage over it using DSCP markings. For example, you could block video traffic from the backup link while still allowing voice traffic through.
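Here's a minimal sketch of the kind of rule you'd put on the branch firewall or router (not in CAC itself). It prioritizes audio and drops video only when traffic is leaving over the backup link. The DSCP values (46 for audio, 34 for video) are commonly used markings assumed for illustration. The enum and function names are hypothetical, and real devices express this in their own policy syntax.

```python
"""Sketch of a backup-link rule keyed on DSCP markings."""
from enum import Enum

# Assumed markings: EF (46) for audio and AF41 (34) for video are common
# recommendations, but match whatever your own QoS design actually applies.
DSCP_AUDIO = 46
DSCP_VIDEO = 34


class Action(Enum):
    PRIORITY_QUEUE = "priority"
    DEFAULT_QUEUE = "default"
    DROP = "drop"


def dscp_from_tos(tos_byte: int) -> int:
    """DSCP is the upper six bits of the IPv4 ToS / IPv6 Traffic Class byte."""
    return (tos_byte >> 2) & 0x3F


def backup_link_action(dscp: int, on_backup_link: bool) -> Action:
    """How a hypothetical firewall/router might treat a packet on each link."""
    if dscp == DSCP_AUDIO:
        return Action.PRIORITY_QUEUE      # voice is always prioritized
    if dscp == DSCP_VIDEO and on_backup_link:
        return Action.DROP                # keep video off the thin backup link
    return Action.DEFAULT_QUEUE           # everything else is best effort here


print(backup_link_action(dscp_from_tos(0xB8), on_backup_link=True))   # EF-marked audio
print(backup_link_action(34, on_backup_link=True))                    # AF41-marked video
```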

The other complication with CAC is that it didn't take into consideration how much bandwidth was actually available on the link. You had to ballpark it: "this is a 100Mb WAN link, and we'll say that SfB can use 10Mb of that". Well, if 40Mb is actually available, why are we restricting SfB to 10Mb? There were thoughts and some options about software defined networks, with the network talking to the applications and making dynamic decisions, but that sounds like as much fun as dumping a couple dozen forks in your bed and trying to go to sleep.
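To make the 10Mb-versus-40Mb problem concrete, here's a deliberately simplified model of static admission control. Every call is assumed to cost the same fixed amount, and a call is admitted only if it fits under the configured limit. The 100 kbps per-call figure and the class name are hypothetical. Real SfB bandwidth policy profiles tracked audio and video limits separately. The point is only that the configured cap decides, not the link.

```python
"""Simplified model of static call admission control (CAC)."""
from dataclasses import dataclass


@dataclass
class StaticCacLink:
    configured_limit_kbps: int    # what you told CAC it may use, not what is free
    per_call_estimate_kbps: int   # one assumed cost for every call
    in_use_kbps: int = 0

    def admit(self) -> bool:
        """Admit a call only if the static budget still has room for it."""
        if self.in_use_kbps + self.per_call_estimate_kbps > self.configured_limit_kbps:
            return False
        self.in_use_kbps += self.per_call_estimate_kbps
        return True


def calls_admitted(limit_kbps: int, per_call_kbps: int, attempts: int) -> int:
    """Count how many of `attempts` simultaneous calls get admitted."""
    link = StaticCacLink(limit_kbps, per_call_kbps)
    return sum(link.admit() for _ in range(attempts))


# A 100Mb link where SfB was given 10Mb, but 40Mb is actually free.
print("admitted with a 10Mb cap:", calls_admitted(10_000, 100, 500))
print("could fit in 40Mb of headroom:", calls_admitted(40_000, 100, 500))
```

With these made-up numbers the static cap stops at 100 calls, even though the free capacity could have carried 400.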

When Teams launched, it had no equivalent to CAC until network roaming policies came along. Roaming policies greatly simplify things. You don't need to indicate the size of any links, which makes sense given that Teams could use WAN/SDWAN links or Internet connectivity, and that WAN/SDWAN links might be running as VPNs over the Internet link anyway. The net result is that, for each network site, you can restrict whether users in that site can use video, and the maximum media bitrate that each user can consume.

How this works is pretty simple. While the user is in the site, the parameters in this policy (applied to the site) override the parameters in the user's policy (either the global or per-user policy).
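Here's a minimal sketch of that override rule. It assumes, per Microsoft's documentation, that a network roaming policy carries the AllowIPVideo and MediaBitRateKb settings. The dictionaries stand in for policy objects, the helper name is hypothetical, and the values are purely illustrative. It leaves out the non-voice-user condition described next.

```python
"""Sketch: effective media settings for a user, with a site roaming policy."""
from typing import Optional

# Settings a network roaming policy carries; everything else in the user's
# meeting policy is left alone.
ROAMING_KEYS = ("AllowIPVideo", "MediaBitRateKb")


def effective_policy(user_policy: dict, site_roaming_policy: Optional[dict]) -> dict:
    """Return the user's settings with any site roaming values layered on top.

    site_roaming_policy is None when the user isn't in a network site that has
    a roaming policy assigned, in which case the user's own policy applies.
    """
    effective = dict(user_policy)          # never mutate the caller's policy
    if site_roaming_policy:
        for key in ROAMING_KEYS:
            if key in site_roaming_policy:
                effective[key] = site_roaming_policy[key]
    return effective


user = {"AllowIPVideo": True, "MediaBitRateKb": 50000, "AllowTranscription": False}
branch_site = {"AllowIPVideo": False, "MediaBitRateKb": 2000}

print(effective_policy(user, branch_site))  # in the branch: video off, 2000 kbps cap
print(effective_policy(user, None))         # elsewhere: the user's own policy
```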

And one more catch: There’s another policy setting that allows the Roaming Policy to take effect if the user isn’t voice enabled:

To enable the network roaming policy for users who are not enterprise voice enabled, you must also enable the AllowNetworkConfigurationSettingsLookup setting in TeamsMeetingPolicy. This setting is off by default.

From https://learn.microsoft.com/en-us/microsoftteams/network-roaming-policy

The same setting is documented as the AllowNetworkConfigurationSettingsLookup parameter at https://learn.microsoft.com/en-us/powershell/module/skype/set-csteamsmeetingpolicy

Note that the explanation in Teams Admin Center (when you hover your mouse over the "i" icon) doesn't make it clear that this setting only matters for non-voice users.


If this setting isn't enabled, the roaming policy won't kick in for non-voice users. If you deploy roaming policies, you will likely want to turn this setting on in the global meeting policy and in every per-user meeting policy.
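Putting the two conditions together, the rule for whether a site's roaming policy applies to a user can be sketched like this. The data class and function names are hypothetical. The only real setting referenced is AllowNetworkConfigurationSettingsLookup from TeamsMeetingPolicy, and the policy identities in the example are made up. In practice you'd check and change the real policies with Get-CsTeamsMeetingPolicy and Set-CsTeamsMeetingPolicy.

```python
"""Sketch: does a site's roaming policy apply, and which meeting policies need the flag?"""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TeamsUser:
    name: str
    enterprise_voice_enabled: bool
    lookup_enabled: bool               # AllowNetworkConfigurationSettingsLookup in the user's meeting policy
    in_site_with_roaming_policy: bool


def roaming_policy_applies(user: TeamsUser) -> bool:
    """Voice users get the site policy; non-voice users only if the lookup is enabled."""
    if not user.in_site_with_roaming_policy:
        return False
    return user.enterprise_voice_enabled or user.lookup_enabled


def policies_to_flip(lookup_by_policy: dict[str, bool]) -> list[str]:
    """Meeting policies (global and per-user) that still have the lookup turned off."""
    return sorted(name for name, enabled in lookup_by_policy.items() if not enabled)


users = [
    TeamsUser("voice user", True, False, True),
    TeamsUser("non-voice, lookup off", False, False, True),
    TeamsUser("non-voice, lookup on", False, True, True),
]
for u in users:
    print(f"{u.name}: roaming policy applies = {roaming_policy_applies(u)}")

print(policies_to_flip({"Global": False, "Tag:Sales": True, "Tag:Warehouse": False}))
```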

Branch Offices in Hybrid and Online Environments

In my previous two posts, I’ve covered branch office solutions including the SBA and the alternatives. No discussion on branch offices could be called complete without including hybrid environments.

The hybrid conversation is the same as on-prem when your user is homed on-prem and not online. As soon as your user is homed online, you have two separate considerations:

  1. User connectivity to the Cloud for all functions but PSTN
  2. Cloud connectivity to Cloud Connector Edition (CCE), Direct Routing (DR), or On-Premises Call Handling (OPCH)

For the first point, local Internet is what Microsoft recommends. Recall from earlier posts that you can use your router/firewall to direct O365 traffic out locally, and send all other traffic across a WAN, if that’s what your organization requires. If this Internet connection fails, your options are to route across the WAN, a 2nd Internet connection, mobile clients with LTE, or head elsewhere to work.
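The local-breakout idea, plus the fallback options above, can be sketched as a simple path-selection function. Everything here is illustrative. The prefixes are documentation-only ranges standing in for Microsoft's published Office 365 IP ranges, and the link names are hypothetical. The fallback order is an assumption. LTE and "work elsewhere" are user-side choices, so they show up only as the no-path case.

```python
"""Sketch: local Internet breakout for Office 365 traffic, with fallbacks."""
from __future__ import annotations

import ipaddress

# Placeholder prefixes (RFC 5737 documentation ranges) standing in for the
# Office 365 ranges Microsoft publishes; don't hard-code the real ones like this.
O365_PREFIXES = [ipaddress.ip_network(p) for p in ("198.51.100.0/24", "203.0.113.0/24")]

# Assumed preference order for Office 365 traffic leaving the branch.
O365_PATHS = ["local_internet", "wan", "second_internet"]


def choose_egress(destination: str, links_up: set[str]) -> str | None:
    """Pick the outbound link for a destination, or None if nothing is usable."""
    addr = ipaddress.ip_address(destination)
    if any(addr in net for net in O365_PREFIXES):
        for link in O365_PATHS:
            if link in links_up:
                return link
        return None   # no path left: mobile clients over LTE, or go work elsewhere
    # All other traffic goes across the WAN, if that's what the organization requires.
    return "wan" if "wan" in links_up else None


print(choose_egress("203.0.113.10", {"local_internet", "wan"}))  # O365, normal day
print(choose_egress("203.0.113.10", {"wan"}))                   # local Internet down
print(choose_egress("192.0.2.25", {"local_internet", "wan"}))   # non-O365 traffic
```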

For the second point, things can get a bit more complex. The routing and high availability of CCE, OPCH, and DR vary. Another factor is centralized PSTN breakout versus a more localized approach: you're more likely to have a business-class or redundant connection for a centralized service than for distributed services.

Cloud Branches

One common solution I see a lot is a hybrid scenario with branch office users hosted online and main office users hosted in the on-prem pools. This eliminates the cost and administrative overhead of running branch office solutions, while keeping some infrastructure around for financial, compliance, interoperability (with other services and devices), and other reasons. Cloud-homed branches are also a great stepping-off point when you're moving a larger organization to a pure online environment.

Pure Cloud Considerations

At this point it shouldn't surprise you that there really aren't any new or unique considerations for branch offices when your entire organization is cloud based. From a branch office perspective, there's no local infrastructure that differs from the hybrid scenarios.

Edge Cases and Wrap-up

In the past couple of posts, I've covered branch office considerations for high availability. These range from SBAs to redundant WANs, redundant Internet, full pools, and more. While comprehensive, I didn't cover every use case. When considering the solutions that best apply to you, draw up a simplified map of your environment, get a bunch of copies of it, and have at them with a red pen to indicate failure points. Work through these outages using your most important use cases to establish what works, what's limited or hobbled, and what's entirely broken. If a scenario doesn't work for your use cases, put it aside.

You'll now have two piles: works for me, and doesn't work for me. Next, review the scenarios that do work, and establish which one best fits your business needs, including pricing. If the mighty dollar sign knocks all of these scenarios out of contention, you'll need to go back through the "doesn't work" pile to find "the best of the worst": the scenario that does the best job of fitting your business needs and budget.
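If you'd rather keep the red-pen map in something other than paper, the same exercise can be sketched as a single-failure analysis. Knock out one component at a time and see which use cases lose every path. Then sort scenarios into the two piles and let the budget decide. The scenarios, components, costs, and the "cheapest working option wins" rule are all hypothetical simplifications; real decisions weigh more than monthly cost.

```python
"""Sketch of the red-pen exercise: single-failure analysis plus a budget pick."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    monthly_cost: float
    # For each use case, alternative sets of components; the use case works
    # as long as every component in at least one set is still up.
    paths: dict[str, list[set[str]]]

    def components(self) -> set[str]:
        return {c for alternatives in self.paths.values() for p in alternatives for c in p}

    def broken_under(self, failed: str) -> list[str]:
        """Use cases with no surviving path when `failed` goes down."""
        return [uc for uc, alts in self.paths.items() if all(failed in p for p in alts)]

    def failure_count(self) -> int:
        """Total (failure point, broken use case) pairs; 0 means 'works for me'."""
        return sum(len(self.broken_under(c)) for c in self.components())


def pick(scenarios: list[Scenario], budget: float) -> Scenario | None:
    affordable = [s for s in scenarios if s.monthly_cost <= budget]
    works = [s for s in affordable if s.failure_count() == 0]
    if works:
        return min(works, key=lambda s: s.monthly_cost)
    if affordable:  # "the best of the worst": fewest breakages, then cheapest
        return min(affordable, key=lambda s: (s.failure_count(), s.monthly_cost))
    return None


single_wan = Scenario("Single WAN", 500, {"voice": [{"wan"}], "meetings": [{"wan"}]})
backup_vpn = Scenario("WAN + backup VPN", 700, {
    "voice": [{"wan"}, {"internet", "vpn"}],
    "meetings": [{"wan"}, {"internet", "vpn"}],
})
mobile_lte = Scenario("WAN + mobile clients on LTE", 550, {
    "voice": [{"wan"}, {"lte"}],
    "meetings": [{"wan"}],
})

options = [single_wan, backup_vpn, mobile_lte]
print(pick(options, budget=1000).name)  # a working scenario fits the budget
print(pick(options, budget=600).name)   # fall back to the best of the worst
```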

Branch Office Options

In my earlier post, I covered the SBA and what I feel are some pretty significant downsides given the technology changes in the past 10 or so years. So what are the options?

Redundant WAN

The simplest option, from an SfB point of view, is to have redundant connectivity from your branch office to your main office. How you go about this can vary. You could get a 2nd line from the same carrier, but that doesn't help you if that carrier suffers an outage. A different carrier would guard against that, though watch out for the 2nd carrier simply using the first carrier for all or part of their services. Even with two different carriers, you could wind up with fibre in the same conduit, and you may suffer the dreaded backhoe fading.
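One way to keep yourself honest about diversity is to list what each link actually depends on, layer by layer, and look for overlap. This is a hypothetical sketch. The layers, carriers, and conduit names are made up, and the real data has to come from your carriers.

```python
"""Sketch: find shared points of failure between a primary and a backup link."""
from __future__ import annotations


def shared_dependencies(primary: dict[str, set[str]],
                        backup: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per layer (carrier, conduit, ...), the dependencies both links rely on."""
    shared: dict[str, set[str]] = {}
    for layer in primary.keys() & backup.keys():
        common = primary[layer] & backup[layer]
        if common:
            shared[layer] = common
    return shared


# Hypothetical inventory: the "second carrier" resells the first carrier's
# last mile, and both fibres run through the same conduit.
primary = {"carrier": {"Carrier A"}, "conduit": {"Main St conduit"}, "edge": {"router-1"}}
backup = {"carrier": {"Carrier B", "Carrier A"}, "conduit": {"Main St conduit"}, "edge": {"router-2"}}

risks = shared_dependencies(primary, backup)
print(risks or "no shared dependencies found")
```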

Backup VPN

A backup VPN might make more sense. An Internet connection is less expensive, and there's a good chance that it's not sharing much or any infrastructure with the WAN link. The first issue to watch out for with VPNs is that you may not have sufficient upstream bandwidth. The second is that you may not have sufficient bandwidth at all. If you are using a lower capacity link as a backup, you can use the DSCP markings that you applied to your SfB traffic for QoS (you did do QoS, right?) to help you out. Your firewall/VPN device can be set to prioritize voice traffic based on these markings, and potentially block video altogether.
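Before relying on a backup VPN, it's worth doing the rough math on upstream capacity. This sketch makes three assumptions: G.711 at about 80 kbps per call before tunnelling, a ballpark 60 bytes of IPsec tunnel overhead per packet, and 30% of upstream reserved for voice. Replace all three with your own numbers. The function names are hypothetical.

```python
"""Sketch: how many voice calls fit in a backup VPN's upstream bandwidth."""


def vpn_call_kbps(ip_layer_kbps: float, tunnel_overhead_bytes: int = 60,
                  packets_per_second: int = 50) -> float:
    """Per-call bandwidth once each packet also carries VPN tunnel overhead.

    ip_layer_kbps         -- per-call rate before tunnelling (roughly 80 for G.711 at 20 ms)
    tunnel_overhead_bytes -- assumed ballpark for IPsec tunnel mode; varies by cipher
    packets_per_second    -- 50 for 20 ms packetization
    """
    return ip_layer_kbps + tunnel_overhead_bytes * 8 * packets_per_second / 1000


def max_concurrent_calls(upstream_kbps: float, voice_share: float,
                         per_call_kbps: float) -> int:
    """Calls that fit in the slice of upstream bandwidth you reserve for voice."""
    return int(upstream_kbps * voice_share // per_call_kbps)


per_call = vpn_call_kbps(80)
for upstream in (10_000, 2_000):  # kbps of upstream on the backup Internet link
    calls = max_concurrent_calls(upstream, voice_share=0.3, per_call_kbps=per_call)
    print(f"{upstream} kbps upstream, 30% for voice: about {calls} G.711 calls")
```

With these assumptions a 10 Mbps upstream leaves room for roughly 28 concurrent G.711 calls, and a 2 Mbps upstream only about 5. That's before any video, which is why blocking video on the backup path is often the sensible default.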

Use SfB Server Standard Edition instead of an SBA

If redundant connections aren't feasible, using a Standard Edition server may be. This moves all of your users' functionality to their location, preventing the ugliness of limited functionality mode. However, you now have to license this server, and you're no longer dealing with an appliance, though you'll recall that some SBAs were just servers with PRI cards anyway.

More downsides here are that if a user homed on this server hosts a meeting with a large number of participants from outside their office, all of that traffic is going to hit the WAN. Also, if the users in this office work remotely a lot, all of that traffic transits the WAN to reach the Edge servers at the main office…. unless you deploy an edge in the branch office, and now it seems like we’re boiling the ocean and building rocket ships to guard against a branch office WAN failure.

Get Out of the Office

The last option to deal with a branch office outage would be to get out of the office, either virtually or physically.

SfB has excellent mobile clients. If your users are homed in a central office or a datacentre, they can use the mobile clients to connect to their pool over LTE. There may be some limits here, like not being able to be a member of a Response Group, but as a backup option this one is pretty simple, and your staff may all already have company phones or subsidized company phones.

Lastly, the users can find a different place to work. This could be home, it could be a co-working space, or a coffee shop.

What about Hybrid and Cloud?

Finally, if you’re in a hybrid or cloud deployment, I’ll provide some thoughts on how to handle branch offices in the next two posts.

 

Survivable Branch Telephony & The SBA

Skype for Business has long had a survivable branch office telephony solution, but it's no longer a good choice in 2018 and beyond. I'm not convinced that it was ever anything more than "the best of the worst".

Way back in the day, Cisco had a survivable remote telephony option. If your branch office WAN was down, a line card (analog, PRI, or even SIP) in your branch router could get you access to the PSTN. Office Communications Server 2007 R2, the first release in the LCS/OCS/Lync/SfB product line that could be considered a potential PBX replacement, didn't have a comparable solution. With Lync 2010, Microsoft introduced the Survivable Branch Appliance (SBA).

SBA Administrative Effort

The intention was that Microsoft partners would build appliances that host the SBA software. A company could ship one of these to a branch office and have an entry-level technical person rack and stack the device. IT techs with more voice experience would then connect to it remotely and finish the configuration. In reality, that was… challenging. Some initial firmware upgrades required a USB hub, flash drive, keyboard, and VGA monitor to be hooked up, so that the "entry level" technical resource could bang their way through a really ugly CLI. Not fun.

In terms of ongoing support, the vendor is responsible for releasing updates for the SBA, including cumulative updates (CUs). When Lync 2010 CU4 was released (it enabled the first mobile experience), it took more than 6 months for some vendors to roll out CU4 for their SBAs. Not a great experience, and a lot of my customers abandoned the SBA in order to give their users mobile connectivity.

Hardware

Depending on the vendor, a PSTN gateway was often included as a module in the SBA. Alternatively, the SBA was a module in the gateway. The first option offered a chance of decent performance. The second option typically offered terrible performance: it's simply not possible to shrink Wintel server horsepower down to an itty-bitty interface card, never mind power it. Often the specs were similar to those of entry-level laptops. This really limited performance and capacity, and the higher-end modules were expensive.

If you went for the full-sized SBA with the add-on gateway, you will likely open the box and recognize the device as a rebadged 1U or 2U server from one of the standard Wintel server vendors. Don't those need 4-post racks? Yup. And this is for a branch office; while some branches have 4-post racks, a lot are lucky if they've got an 8U wall-mount equipment rack.

Feature Loss

Other than being a difficult product to manage, the SBA has some significant drawbacks.

First, users only use the SBA as a SIP registrar and for backup telephony. Meetings, video, whiteboard sharing, response groups, and contact list functionality all live in the Standard Edition or Enterprise Edition pool that the SBA is associated with. If that SfB pool, or the connection to it, is lost, users are presented with "limited functionality mode", with no contact list and only voice capabilities. Even then, voice calls have to be placed by dialing the full number. No click-to-call, no buddy list, no extension dialing. Yuck. How many users would a) remember how to place a call in this mode, and b) actually want to do it, versus doing other things during the outage?

Next, your PSTN connection needs to terminate on the gateway located onsite with the SBA. Your three options are analog lines, PRIs, or SIP trunks.

Analog lines offer terrible call quality, and don't offer DIDs. The latter is okay if you only need outbound calling and don't need your DID to show as the caller ID; otherwise it's a pretty terrible experience.

PRIs offer good call quality and DID support, but they’re expensive. Fractional T1s offer just a tease of cost reduction. You can pay 20% less for 50% less capacity, if you’d like!
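The joke is really a per-channel price increase. Here's a quick sketch of the arithmetic, using the 20% and 50% figures from the sentence above (they're illustrative, not a real carrier's pricing). Paying 0.8 of the price for 0.5 of the channels works out to 0.8 / 0.5 = 1.6, so each channel costs about 60% more.

```python
"""Sketch: what "20% less for 50% less capacity" does to the price per channel."""


def per_channel_cost_change(price_discount: float, capacity_reduction: float) -> float:
    """Relative change in cost per channel versus the full circuit (0.6 = 60% more)."""
    relative_price = 1 - price_discount          # e.g. 0.8 of the full PRI price
    relative_capacity = 1 - capacity_reduction   # e.g. 0.5 of the channels
    return relative_price / relative_capacity - 1


change = per_channel_cost_change(price_discount=0.20, capacity_reduction=0.50)
print(f"Each channel costs {change:.0%} more on the fractional circuit")
```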

SIP trunks offer good call quality and DID support, and depending on your provider, offer some extra options that PRIs and analog lines can’t support, like redundancy and failover.

Remember that the SBA doesn't have any functionality other than the SIP registrar and backup calling. It's likely that when your WAN is down, your callers can't reach Auto Attendant, Response Groups, or voicemail. That's not a very good scenario, though there are a couple of workarounds, with huge tradeoffs.

Another downside is that a user homed on a full SfB pool typically has failover to a paired pool. With an SBA, this isn't possible: the user is homed on the SBA, and can only fail over to the full SfB pool associated with that SBA. If that SfB pool is lost, they're in limited functionality mode. If an SfB pool that's paired with another is lost, its users also fail over into limited functionality mode, but the administrator can quickly trigger a full failover that restores all modalities and functionality.

So what's an organization supposed to do?

On the plus side, the SBA didn't consume an SfB server license. That's one small plus for SBAs, against a sizable list of minuses. So, what are your options in a classic on-prem only environment, and also in hybrid, Cloud Connector Edition, Teams Direct Routing, and mixed SfB/Teams scenarios? Stay tuned!