Mobile device detection using the User Agent

One of the basic HTTP headers is User-Agent, which provides information about what type of browser or device is connecting to a web server, usually (but not always) along with other details such as version and compatibility.

Detecting and parsing the user agent string is a very interesting field. Since it is just a plain HTTP header string, anyone can tamper with it to fool servers and websites into thinking the client is something else, so the basic rule here is "take the UA for what it is: an insecure descriptor". In other words, plan accordingly and either let users deal with the errors caused by their own tampering, or query for more information.

Before digging into how to detect a mobile device using the user agent, I've prepared a small demo that displays a lot of information gathered by ASP.NET in the Request object:

For example, using the AcceptTypes array you can check the types of markup understood by the browser/device, auto-setup a default language based on the UserLanguages array, or check the Referer and forbid hotlinking of your images from other sites or forums. If you click on the "View Extended info" link you'll get a few fields as inspiration sources :)
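
As an illustration of the "auto-setup a default language" idea, here is a minimal sketch in Python that works directly on the raw Accept-Language header value instead of the ASP.NET Request object. The function name pick_language, the supported set and the "en" default are assumptions made up for this example; it only handles the common "tag;q=value" form.

```python
def pick_language(accept_language, supported, default="en"):
    """Return the best supported language for an Accept-Language header value.

    `supported` is expected to contain lowercase tags such as "en" or "es".
    """
    candidates = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        if quality <= 0:
            continue  # q=0 means "not acceptable"
        # Sort by descending quality, then by the order sent by the client.
        candidates.append((-quality, position, tag))

    for _, _, tag in sorted(candidates):
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "es-es" falls back to "es"
        if primary in supported:
            return primary
    return default
```

Like the UA, this header can be tampered with or simply missing, so the default value should always be something sensible.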

Back to the goal of this post: UA strings should follow a pattern (loosely defined in the HTTP RFC), so building a list and matching against it would be the first idea. There are huge UA lists, the most famous one being WURFL, because it also includes device capabilities.

WURFL is not bad; in fact it is great for some basic detection of mobile phones and some PDAs. But if you're going to use it for capability detection, beware of a huge design mistake that the WURFL creator made: if you only store a boolean value for a capability, you're introducing either false positives (fake true values) or false negatives (fake false values), because "nobody has checked this" cannot be told apart from a real answer.
The correct approach would have been to store a tri-state value (unknown/true/false), but WURFL doesn't have it, so you will probably want to create your own "extended WURFL" with some sort of validated or corrected capabilities (I've suffered both false positives and false negatives at work).
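
A minimal sketch of what such a tri-state capability could look like, in Python. This is not how WURFL stores its data: the TriState enum, the VALIDATED table and the device ids in it are all hypothetical, just to show how the "unknown" case forces the caller to choose a fallback explicitly.

```python
from enum import Enum


class TriState(Enum):
    UNKNOWN = 0
    FALSE = 1
    TRUE = 2


# Hypothetical validated values, keyed by (device id, capability name).
VALIDATED = {
    ("example_phone_a", "supports_cookies"): TriState.TRUE,
    ("example_phone_b", "supports_cookies"): TriState.FALSE,
}


def capability(device_id, name):
    """Return the validated value, or UNKNOWN when nobody has checked it."""
    return VALIDATED.get((device_id, name), TriState.UNKNOWN)


def should_use(device_id, name, assume_when_unknown=False):
    """Turn the tri-state into a decision, making the fallback explicit."""
    value = capability(device_id, name)
    if value is TriState.UNKNOWN:
        return assume_when_unknown
    return value is TriState.TRUE
```

With a plain boolean, the third device below would silently get a "false" that nobody ever verified; here the caller decides what an unknown value means:

```python
should_use("example_phone_c", "supports_cookies", assume_when_unknown=True)
```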

With this important limitation pointed out, and focusing on the UA itself, we have our UA database, but when testing with a variety of phones we see that some of them don't match. Why is this?

Well, probably because mobile operators and mobile phone manufacturers seem to enjoy ignoring specifications and doing "funny" things like sending your phone's serial number in the user agent, appending all kinds of dumb info (under Windows it is also quite frequent to get all kinds of crap added to the user agent) or having tons of revisions of a browser, all of them ruining an exact match.

A few examples of desktop browsers:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729)

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-GB; rv: Gecko/20100401 Firefox/3.6.3

Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/ Safari/532.5

And some mobile devices:

Mozilla/5.0 (Linux; U; Android 1.5; HTC Magic Build/PLAT-RC33) AppleWebKit/528.5+ (KHTML, like Gecko) Version/3.1.2 Mobile Safari/525.20.1

Mozilla/5.0 (iPhone; U; CPU iPhone OS 3_0 like Mac OS X; en-us) AppleWebKit/528.18 (KHTML, like Gecko) Version/4.0 Mobile/7A341 Safari/528.16

Vodafone/1.0/HTC_Diamond/ Opera/9.50 (Windows Nt 5.1; U; es_ES)

See a few problems in those sample UAs? Some have "Mobile Safari/xxx" while others have "Mobile/xxxx Safari/yyy". Some include a revision number or the language inside the parentheses, while others don't (or include a lot of different info). The HTC Diamond even lies, saying that it is a Windows NT 5.1 platform (a desktop Windows XP!).
And these are at least common ones that you can easily fix (searching for "HTC Diamond", for example, instead of focusing just on the operating system); others are really hard or contain way too much useless info.
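
As a rough sketch of that kind of fix, the two patterns below (in Python, with names invented for this example) tolerate both "Mobile Safari" variants and look for the device name instead of trusting the reported operating system. They are only tuned for the sample UAs listed above, so take them as a starting point and not as a complete detector.

```python
import re

# Matches "Mobile Safari/525.20.1" as well as "Mobile/7A341 Safari/528.16".
MOBILE_SAFARI = re.compile(r"\bMobile(?:/\S+)?\s+Safari/", re.IGNORECASE)

# Matches "HTC_Diamond", "HTC Diamond" or "HTC-Diamond", whatever the OS claims.
HTC_DIAMOND = re.compile(r"HTC[ _-]?Diamond", re.IGNORECASE)


def looks_mobile(user_agent):
    """Return True when the UA contains one of the known mobile markers."""
    return bool(MOBILE_SAFARI.search(user_agent) or HTC_DIAMOND.search(user_agent))
```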

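One way to improve the matching is using a string similarity algorithm, the most common one being the Levenshtein distance, which measures how different two strings are (as the minimum number of single-character insertions, deletions or substitutions needed to turn one into the other).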
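
For reference, a small sketch of the Levenshtein distance in Python, using the usual row-by-row dynamic programming approach, plus a helper that picks the closest entry from a list of known UAs. The closest helper and its parameter names are assumptions for this example; a real UA database would need something faster than comparing against every entry.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def closest(user_agent, known_user_agents):
    """Return the known UA with the smallest distance to user_agent.

    Raises ValueError if known_user_agents is empty.
    """
    return min(known_user_agents, key=lambda known: levenshtein(user_agent, known))
```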

But unless you first apply some "noise cleaning" to the user agent, it will only bring a limited improvement to the matching process.
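
What counts as noise is a judgment call, but as a hedged sketch in Python, the patterns below strip a few tokens seen in the sample UAs above (.NET CLR versions, SLCC tokens and locale codes) before any comparison takes place. The pattern list and the clean_user_agent name are assumptions for this example and far from exhaustive.

```python
import re

NOISE_PATTERNS = [
    re.compile(r";\s*\.NET CLR [\d.]+"),  # .NET runtime versions added on Windows
    re.compile(r";\s*SLCC\d+"),           # Windows add-on token
    # Locale codes such as "en-us", "en-GB" or "es_ES" at the end of a field.
    re.compile(r";\s*[A-Za-z]{2}[-_][A-Za-z]{2}(?=\s*[;)])"),
]


def clean_user_agent(user_agent):
    """Remove known noise tokens and collapse repeated whitespace."""
    for pattern in NOISE_PATTERNS:
        user_agent = pattern.sub("", user_agent)
    return " ".join(user_agent.split())
```

After cleaning, two UAs that only differ in installed .NET versions or in language become identical, so an edit distance has far fewer irrelevant characters to count.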

And here is where we stop and let you think about the best way to do or improve UA matching.

I, for example, am working on a small "Mobile Device Detector" .NET library that tries to detect mobile devices using only regular expressions, some UA cleaning and some logic, but no UA database at all (fast & light on resources, but at first not as varied and complete as a DB solution).
Right now it's in the early stages, but it already works and can be tested at this URL (recommended with a mobile device):

If you notice any problem or incorrect match, please let me know so I can correct the logic and improve similar cases.

The world of mobile development (even indirect, like mobile websites) is a bit chaotic sometimes, but if things were always easy, what fun would there be in the software development world?


Posted by Kartones on 2010-04-13