If you develop products that use cellular technologies, then this post is for you. I’ve been a programmer for about 30 years and for the past five years I’ve worked with IoT products and other embedded systems. Cellular technologies have been one of the most opaque and risky aspects of these projects, bar none. But it doesn’t have to be that way.
Disclaimer: This post links to several products and companies. I don’t have any commercial relationship with any of them at the time of writing this post.
The problem is… cellular?
For anyone new to this field, “cellular” means the same sort of thing that your cell phone is using. Technologies like 2G/GSM, 3G, 4G/LTE, LTE-M, NB-IoT and so on. You usually have a SIM card of some sort, or a software-based version of one, and connect wirelessly to a cellular base station owned by a mobile network operator (MNO).
These days you can kinda ignore 2G/GSM and 3G as they are being phased out. Your cell phone typically uses 4G/LTE or 5G. When working with embedded systems you will also see 4G and 5G mentioned, but usually we don’t use exactly the same technologies as your cell phone. If an embedded device has a very large battery or is powered via cable then you might see devices use something like LTE Cat 1, which gives you high bandwidth.
But more commonly we use LTE-M or NB-IoT. To not go into too much detail, let’s just say that LTE-M is more suited for devices that tend to move around and NB-IoT for devices that stay put for the lifetime of the device. This is a simplification but it tends to match the use cases you see in the field. LTE-M and NB-IoT are part of both the 4G and 5G standards.
LTE-M and NB-IoT are low-bandwidth protocols that you can use to reach the Internet via IP. You can also use text messages (SMS) if you like. Sounds simple enough, doesn’t it?
You Can’t Leave Australia
We had developed a Linux-based controller with a cellular modem for a client and it was working quite well in Sweden. You connected it to the client’s hardware and it worked both as a controller and as a gateway to the client’s servers on the Internet. The client wanted their customers to see how much value they were getting from the product, to let them control some aspects of the product and also to push software updates. These systems were installed in places that were difficult to reach, so you couldn’t normally just walk up to one with a USB stick. On top of that, they of course got data on their whole fleet that would be valuable for attracting investors.
Now, the problem with cellular technology. The controller was working in Sweden. Does that mean it will work outside of Sweden? Not necessarily. If you are just using a SIM card from one of the operators whose names often start with a capital “T”, or from one of the virtual operators that are just reselling access to those operators… you are in trouble.
It worked in Sweden so our client (of course) decided to send a device to a customer in Australia, along with a technician who would install it at the customer’s site. Everything seemed to be going fine until the device just didn’t connect to the Internet. The cellular connection wasn’t working. We gave the technician instructions for how to get a shell on the device so that we could debug the problem remotely. You’d think having a shell on the device would help, but it didn’t. All it told us was essentially that the modem couldn’t get a connection.
But at the same time, it sort of seemed like we could do things with ModemManager that would somehow convince the modem to connect. Maybe, with some luck, we would stumble upon the right AT commands that would convince the cellular spirits to speak each other’s languages.
The technician grew weary and even started to move the antennas around. The reported RSSI (signal strength) looked a bit weak, so maybe if he moved the antennas to a different location? Would a different antenna work? None of us were there with him, but you can sort of imagine a guy walking around with an antenna and waiting to see if he can find a sweet spot where there’s coverage. “Is the picture better now, honey?”
We were off in our cozy little Sweden while this technician was trying everything imaginable and not imaginable. And he could not leave Australia until the modem connection was working! Delivery on time or the pizza guy bakes a new one in your kitchen!
Questioning Your Life Choices
Clients wanted to know: we’re using cellular and it works fine on my phone, why can’t you make it work?
One of my colleagues liked to say that the cell phone vendors have spent countless hours making their products work seamlessly on all networks. That is true, of course. Operators also spend time making sure that the popular cell phones work on their networks. Nobody is going to deploy a new cell network without checking that you can use it with an iPhone and the more popular Android phones. We who have less popular hardware are not so lucky.
The operator (one with a capital “T”) had a web site for customers, a portal. I have seen a few of these portals and in my experience there is a simple way to tell a good portal from a bad portal: bad portals focus on billing, good portals focus on real-time technical data. This operator’s portal was essentially useless for debugging the problem with the device in Australia. Sometimes there would be an entry saying that the modem had connected to a local operator. That’s it. Useless. The modem itself was reporting similarly useless information. “Registration denied,” essentially.
But the portal had once reported a working connection? It should be possible to make the modem work? The technician grasped on to this little pearl of hope and developed superstitions. Maybe he had actually been briefly successful when moving the antennas? Maybe if he kept moving them he would find that perfect position again and could make the flight back home?
This mixture of negative and occasional positive rewards combined with almost no information is the perfect way to make people (or almost any living being) develop a nervous disorder. I remember hearing about an experiment with rats that had two buttons: one gave them food and another one gave them an electroshock. The rats were well-adjusted and avoided the electroshock button. In a second experimental setup the buttons did random things: sometimes they dispensed food and sometimes the rats got shocked. The rats got neurotic.
Our technician was living in a chaotic universe where nothing made sense. Sometimes the portal gave a treat (successful connection) and other times it gave him an electroshock or nothing at all. To be well-adjusted beings, we rely on a universe that makes sense. If our actions result in random results then we get neurotic. Debugging turns into a nightmare when we have unpredictable results.
The problem is not cellular
Telephone networks will eat us all if we don’t stop them. But there are some simple tricks to surviving as a developer of cellular devices.
What Operators Don’t Tell You
First of all, you want an operator that has a good portal. Good meaning that you can actually get useful information out of it. Not simply information on how much each SIM is costing you, with buttons for disabling SIMs so you will not be billed, billing periods, and other things having to do with billing. Billing addresses, monthly billing, 20 MB quotas, topping up SIMs. The less the portal cares about billing, the better. Keep it simple.
All portals I have seen, with one exception, have had extremely brief network logs. They tell you that the modem connected at some time, along with what technology and local operator was used. If that’s all the debugging information they give you then they could have just sent you a magic 8-ball. It showed up at Telemalefic at 1.00am and then went quiet. Uh-huh.
Operators have a lot of information. Unfortunately most of them have built their networks in a way that they can’t give you that information. You have to ask them ahead of time to start monitoring a device and then maybe a week later you can get some packet dumps, if you are lucky. That is not good enough; you need the data immediately. They’ll claim it can’t be done.
There is another just wonderful piece of information that operators keep from you. Operators, even the ones with a capital “T”, rely on commercial agreements with other operators. Nobody has global coverage, nobody is everywhere. When the modem starts up it scans the bands to see which operators are available. It will prefer to connect to the SIM card’s home network, which it will likely not find, so it will try to roam via a local operator. (If you have a roaming SIM then you better pray that it doesn’t find its home network).
Did you know that the SIM card will try to pick the local operator which is most commercially beneficial to the operator who sold you the SIM card? Even if that means you get a really shitty connection? Even if there is another operator with much better RSSI?
Operators do not like to tell you which other operators they have commercial agreements with. They definitely don’t tell you the bulk rates on data, that is none of your business.
I know of only one operator that does not behave this way. More on them later.
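To make the selection behaviour above concrete, here is a minimal sketch in Python. It is a simplification under stated assumptions, not how any particular modem implements it: the real procedure (specified in 3GPP TS 23.122) has more steps, such as a user-controlled list and signal quality thresholds. The function name, the PLMN codes and the RSSI values are all made up for illustration.

```python
from typing import Dict, List, Optional


def pick_network(scan: Dict[str, int],
                 home_plmn: str,
                 operator_preferred: List[str]) -> Optional[str]:
    """Pick a network from scan results (PLMN code -> RSSI in dBm).

    Simplified order: home network first, then the operator's preferred
    list in priority order, and only then whatever has the best signal.
    """
    if not scan:
        return None
    if home_plmn in scan:
        return home_plmn
    # The preferred list comes from the SIM and reflects the operator's
    # commercial agreements. Signal strength is not consulted here.
    for plmn in operator_preferred:
        if plmn in scan:
            return plmn
    # Fallback: strongest signal among the remaining networks.
    return max(scan, key=lambda plmn: scan[plmn])


# Hypothetical scan: two roaming networks, one weak and one strong.
scan_results = {"999-01": -105, "999-02": -70}
chosen = pick_network(scan_results, home_plmn="998-01",
                      operator_preferred=["999-01", "999-02"])
print(chosen)
```

With these made-up values the weaker network wins, because it comes first in the operator's list. The better RSSI only matters once the preferred list has been exhausted.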
The Debuggable Setup
Let us open up this box of voodoo mystery toys that is cellular. There are two pieces of hardware that you may want to debug: the SIM card and the modem. If you have an MCU then you will also want to snoop on the serial line between the modem and the MCU, which you can do with a signal analyzer or the decode function in an oscilloscope. Mostly this is to verify that the communication between them is working, which is good to know at board bring-up time and in rare situations with modem stacks.
MCU ⟺ (SIM ⟺ Modem) ⟺ Network
You will also want to debug problems from the network side and I will show you how to do that as well.
The photo shows a debuggable setup (from top to bottom): a SIMtrace2 from Sysmocom, an nRF9160-DK from Nordic Semiconductor, and an Onomondo SIM card.
Big disclaimer on this photo: there are two things very wrong. The SIM card is upside down because it looks much better that way, and I didn’t have the right adapter to connect the SIMtrace2 to the SIM holder. It is 180° off. So the photo is completely wrong and doesn’t work. I could have gotten away with not saying anything, but here we are. You need to order the right adapter if you want to use the SIMtrace2. Be careful to check the orientation!
Debugging the SIM Card
The device in the top part of the image is a SIMtrace2 from Sysmocom. It lets you snoop on the data sent between the SIM card and the modem.
The item in the bottom part of the image is a holder for a full size SIM card (also sold by Sysmocom). The holder is not strictly needed, but it’s nice for easily swapping out SIM cards in the devices you are working with. These types of cables and holders can save a lot of time.
SIM cards are smart cards with a lot of logic embedded into them. There are programs you can use together with a USB smart card reader to interrogate the SIM card. Many business laptops have a built-in smart card reader and you can use it to read the files on the SIM card with something like cardpeek. But the SIMtrace2 gives you a live view of the communication between the modem and the SIM card.
The data can be viewed in Wireshark. SIMtrace2 is probably not the most useful tool you will have for day to day work, but you should have it in your toolkit.
One very interesting file on the SIM card is the FPLMN list (28539 or 6F7B in hex). This file contains a list of networks that the SIM will never connect to. Operators often don’t mention this file at all, so it is understandable if you haven’t heard of it.
As mentioned before, the modem scans for local operators. It looks for its home network first, but then falls back to roaming, i.e., using a network that is not the home network. So it will try to connect to the networks it hears. It can’t truly know ahead of time if it will be allowed on the network. Everything is automatically expensive in the telecom world so you can’t keep hammering a network with connection attempts. When the modem is denied registration on a network it adds that network to the FPLMN list.
So potentially you can’t use that network ever again. Oops.
Well, unless you know about the FPLMN list and clear it. Or if you are denied from so many networks that you overflow the list. This can happen if you travel to another country with the SIM card. Which is just the perfect storm of confusion. You can also put the SIM card in a smartphone and it will likely start working again. Because of course it will.
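If you want to look at the FPLMN list yourself, here is a rough sketch in Python of how the file contents can be decoded. It assumes your modem supports the standard AT+CRSM command (restricted SIM access, from 3GPP TS 27.007) and that each entry uses the usual 3-byte, nibble-swapped PLMN encoding with FF FF FF marking an empty slot. The function names are mine, the 12-byte length is only a common minimum (the file is often longer), and the sample PLMN values are just there to exercise the encoding. Check your modem's AT manual before sending the clear command to a real SIM.

```python
import re
from typing import List, Optional, Tuple

EF_FPLMN = 0x6F7B  # 28539 in decimal, the file ID of the FPLMN list


def decode_plmn(entry: bytes) -> Optional[str]:
    """Decode one 3-byte entry into 'MCC-MNC', or None for an empty slot."""
    if len(entry) != 3:
        raise ValueError("a PLMN entry is exactly 3 bytes")
    if entry == b"\xff\xff\xff":
        return None
    # Digits are stored as swapped nibbles (low nibble first).
    mcc = f"{entry[0] & 0x0F:X}{entry[0] >> 4:X}{entry[1] & 0x0F:X}"
    mnc = f"{entry[2] & 0x0F:X}{entry[2] >> 4:X}"
    third_mnc_digit = entry[1] >> 4
    if third_mnc_digit != 0xF:  # 0xF means the MNC has only two digits
        mnc += f"{third_mnc_digit:X}"
    return f"{mcc}-{mnc}"


def decode_fplmn(hex_data: str) -> List[str]:
    """Decode the hex contents of the file into a list of PLMN codes."""
    raw = bytes.fromhex(hex_data)
    plmns = []
    for i in range(0, len(raw) - len(raw) % 3, 3):
        plmn = decode_plmn(raw[i:i + 3])
        if plmn is not None:
            plmns.append(plmn)
    return plmns


def parse_crsm(line: str) -> Tuple[int, int, str]:
    """Split a '+CRSM: <sw1>,<sw2>[,<data>]' response into its parts."""
    match = re.match(r'\+CRSM:\s*(\d+),(\d+)(?:,"?([0-9A-Fa-f]*)"?)?',
                     line.strip())
    if match is None:
        raise ValueError("not a +CRSM response: " + line)
    return int(match.group(1)), int(match.group(2)), match.group(3) or ""


def read_command(length: int = 12) -> str:
    """AT command for READ BINARY (176) on the FPLMN file."""
    return f"AT+CRSM=176,{EF_FPLMN},0,0,{length}"


def clear_command(length: int = 12) -> str:
    """AT command for UPDATE BINARY (214) that fills the file with FF."""
    blank = "FF" * length
    return f'AT+CRSM=214,{EF_FPLMN},0,0,{length},"{blank}"'


# Example response as a modem might return it (sample data, not a real log).
sw1, sw2, data = parse_crsm('+CRSM: 144,0,"42F010130014FFFFFFFFFFFF"')
print(decode_fplmn(data))
```

A status of 144,0 is 0x90 0x00, which is the smart card way of saying the read succeeded. The length you pass has to match the size of the file on your SIM, so it is worth reading the file before you try to overwrite it.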
Most people are probably used to Internet protocols where you can try again a bit later. That makes sense to modern humans. We don’t expect our computer to have a file where it writes down every web server that ever rejected a connection so that it won’t try again. We don’t expect a WiFi network to be permanently disabled if something went wrong once. But cellular is not the Internet. The telecom mindset is kind of like there are still operators connecting everyone’s circuit by hand.
Debugging the Network Locally
The device in the middle is an nRF9160-DK from Nordic Semiconductor, which has the MCU and the modem integrated in the same chip. This means that there is no serial line to snoop on. On the other hand, Nordic’s custom version of Zephyr RTOS comes with drivers for the modem, and there is nothing the hardware designer can do to mess up the integration with the modem. So we can be pretty confident that no TX/RX lines are swapped, and so on.
The nRF9160 modem can be monitored in real-time. You can of course monitor the connection status itself, but you can also get packet logs on various levels. Not just for your own application. You can see the traffic between the modem and the cell station. Nordic doesn’t show you everything, but it should be enough to figure things out. You can also send them the raw file in case there is some problem and they can have a look at it.
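For the connection status part, the simplest thing to monitor is the registration state. Here is a small sketch in Python that decodes the response to the standard AT+CEREG? read command from 3GPP TS 27.007, assuming the usual "+CEREG: <n>,<stat>[,...]" layout. The function name is mine, and modems can report additional vendor-specific status values that are not in this table, so treat it as a starting point.

```python
import re
from typing import Tuple

# Registration status values defined in 3GPP TS 27.007 for +CEREG.
CEREG_STATUS = {
    0: "not registered, not searching",
    1: "registered, home network",
    2: "not registered, searching",
    3: "registration denied",
    4: "unknown (for example, out of coverage)",
    5: "registered, roaming",
}


def parse_cereg(line: str) -> Tuple[int, str]:
    """Decode a '+CEREG: <n>,<stat>[,...]' read response.

    Returns the numeric status and a human-readable description.
    Any fields after <stat> (tracking area, cell ID, ...) are ignored.
    """
    match = re.match(r"\+CEREG:\s*(\d+),(\d+)", line.strip())
    if match is None:
        raise ValueError("not a +CEREG read response: " + line)
    stat = int(match.group(2))
    return stat, CEREG_STATUS.get(stat, "other or vendor-specific status")


print(parse_cereg("+CEREG: 0,3"))
```

Status 3 is the "registration denied" mentioned earlier. Note that it tells you that the network said no, but not why, which is exactly the gap the packet logs fill.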
Are you completely out of luck if you don’t have the Nordic nRF9160? You might be, but you can ask your modem vendor. Many modems are based on hardware from Qualcomm and it may be possible to monitor the traffic. Here is one tool (which I have not tested personally) that lets you monitor the traffic on some modems: QCSuper.
But there is no reason to despair if you can’t monitor the traffic locally with your hardware, because you can monitor it remotely.
Debugging the Network Remotely(!)
The SIM holder in the above photo holds a SIM card from Onomondo, which is the operator with the best debugging capabilities in the world (in my opinion). They also have a nice business model where they actually let you select which networks you want to use. Network lists and costs are open and they pass through wholesale rates to you. They have gotten a lot of things right.
Onomondo provides live packet traces. You do not have to ask for them, you just log in to the portal and they are right there. There is even an API if you need it.
The online version of the traffic monitor is a bit limited, but you can download the files in PCAP format and open them in Wireshark.
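Wireshark is the right tool for digging into a trace, but if you end up with a pile of downloaded files it can be handy to summarize them from a script first. Below is a minimal sketch in Python that uses only the standard library. It assumes the file is in the classic pcap format with microsecond timestamps. I have not verified which variant the portal exports, so the function raises an error for anything else (pcapng, for example). The function name and the returned keys are my own.

```python
import struct
from typing import Dict


def summarize_pcap(data: bytes) -> Dict[str, int]:
    """Count packets and captured bytes in a classic pcap file."""
    if len(data) < 24:
        raise ValueError("too short to be a pcap file")
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian machine
    else:
        raise ValueError("not a classic pcap file (perhaps pcapng?)")

    # Global header: magic, version major/minor, timezone offset,
    # timestamp accuracy, snapshot length, link-layer type.
    _, major, minor, _, _, snaplen, linktype = struct.unpack(
        endian + "IHHiIII", data[:24])

    packets = 0
    captured_bytes = 0
    offset = 24
    # Each record: 16-byte header followed by incl_len bytes of packet data.
    while offset + 16 <= len(data):
        _ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        if offset + 16 + incl_len > len(data):
            break  # truncated final record, stop counting
        packets += 1
        captured_bytes += incl_len
        offset += 16 + incl_len

    return {"version_major": major, "version_minor": minor,
            "linktype": linktype, "packets": packets,
            "captured_bytes": captured_bytes}


# Build a tiny synthetic capture with two packets to exercise the parser.
sample = (struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
          + struct.pack("<IIII", 1, 0, 4, 4) + b"abcd"
          + struct.pack("<IIII", 2, 0, 2, 2) + b"xy")
summary = summarize_pcap(sample)
print(summary)
```

For a real file you would read it with open(path, "rb").read() and pass the bytes in. Anything beyond counting, such as decoding the packets themselves, is better left to Wireshark.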
But what if your modem is not able to connect to a network, like the case with the technician lost in Australia? There wouldn’t even be IP packets sent, so nothing to help you debug the problem, right? Wrong.
Onomondo also has live signalling logs. And it’s not a simple thing to achieve. They have spent years building a network that has this capability. As a network owner, you can’t wake up one day, decide that this is what you want to do, and then get it done in a short time.
If we had used Onomondo when that poor technician was standing upside down on the other side of the world, we could have gone to the page for the SIM card and looked at the signalling logs.
Notice the lines highlighted in blue in the log? That might be all the information you get when you look at another operator’s portal. The online viewer is again a bit limited, but you can open the file in Wireshark and dig down.
(Yeah, they really use SCTP instead of TCP or UDP).
I have used Onomondo’s traffic monitor to troubleshoot and fix problems that had at the time been lingering for years. There was just no way to troubleshoot them before. The traffic monitor showed exactly what was going wrong.
I have had a modem go catatonic, reporting the same CME ERROR for every command. Even for basic commands like trying to go into airplane mode. The signalling logs showed me that the modem had its registration denied by a local operator. This then made it go crazy. There was simply no way to get this information from the modem itself. I tried to force it to use another operator, but it wouldn’t even let me do that. So many potential problem causes were eliminated by seeing that one DIAMETER packet in Wireshark. If I had used another operator then I might have asked them for signalling logs, and sat on my hands for a week or so. If I could even have found an FAE who could get me logs.
My Spiel About Onomondo
Onomondo currently (in 2023) has a free 30 day trial. The debugging capabilities you get with this SIM are unmatched, and there are other benefits with this operator as well. The openness is incredible if you’re used to operators hiding everything and giving you a crummy portal that only has billing information.
I’ve seen operators that don’t even know if their SIM works with AT&T. You Had One Job. What networks can you use with Onomondo? You don’t even have to ask them. Just have a look at Onomondo’s coverage map.
How to Escape Australia
The technician in our earlier anecdote did finally get to leave Australia. Our client gave up and bought a commercial LTE gateway and hooked it up to the Ethernet port. We let Teltonika figure it out. I wish we had had the right tools at the time; we would have solved it. I have solved similar problems since then. Cellular is no longer voodoo.