I used HTTP::BrowserDetect for a long time. Not because it's a piece of art or accurate, but perhaps out of laziness. When I had some free time, I thought about re-inventing the wheel, like I did several times before. The main reasons for re-inventing it are the source code and interface of the module (try to read it, you'll understand) and the lack of new releases. Also, it's not accurate.
There are two other alternatives though: HTML::ParseBrowser and HTTP::DetectUserAgent. The former is really good parser-wise, while the latter is actually a sniffer and does not give you a verbose result.
So, I wrote Parse::HTTP::UserAgent. It tries to be verbose and parse as much as possible from the junk named "User Agent String". It tries to identify the major browsers first and then falls back to minor/old ones with an extended probe. The parsed structure has many fields like:

    name              Browser name. You may need to check original_name() if faker (like Maxthon).
    version_raw       Browser version
    version           version(version_raw)->numify: The float version of the parsed version.
    original_name     The original name (i.e.: Maxthon)
    original_version  The original version (i.e.: 2.0 (Maxthon))
    os                Operating system. Windows names returned instead of versions
    lang              The "user interface" language of the browser
    toolkit           [tk_name, tk_version, version(tk_version)->numify]. Gecko, Trident, etc.
    dotnet            If it has .NET CLR version in the string, this'll have all versions
    mozilla           If a Mozilla browser, returns Moz version: [original, version(original)->numify]
    strength          Encryption strength (I guess this does not have much value today)
    robot             UA is a robot
    extras            Any non-parsable junk. Arrayref.
    parser            The name of the parser that returned the result set
    generic           Parsed by a generic parser? Bool.
    string            The original User Agent String
    unknown           User Agent String can not be parsed
    device            ***not implemented yet
    wap               ***not implemented yet
    mobile            ***not implemented yet

The module also has ->as_hash and ->dumper methods for debugging purposes.
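A minimal usage sketch of the fields and debugging methods described above. The accessor names follow the field list, summarize_ua() is a hypothetical helper, and the sample string is only an illustration. The module is loaded at run time so the script still reports something sensible if it is not installed.

```perl
use strict;
use warnings;

# Load at run time so a missing module is reported instead of
# aborting at compile time.
my $have_parser = eval { require Parse::HTTP::UserAgent; 1 };

# Hypothetical helper: a one-line summary of a User Agent string.
sub summarize_ua {
    my ($string) = @_;
    my $ua = Parse::HTTP::UserAgent->new($string);
    return 'unknown agent' if $ua->unknown;
    # Any of these fields may be undefined for a given string.
    return join ' / ',
        map { defined $_ ? $_ : '?' } $ua->name, $ua->version, $ua->os;
}

# Sample string chosen for illustration only.
my $sample = 'Opera/9.80 (Windows NT 6.0; U; en) Presto/2.2.15 Version/10.00';

if ($have_parser) {
    print summarize_ua($sample), "\n";
    # Full parsed structure, for debugging.
    print Parse::HTTP::UserAgent->new($sample)->dumper;
}
else {
    print "Parse::HTTP::UserAgent is not installed\n";
}
```

If a string was rewritten by a faker such as Maxthon, original_name and original_version would be the fields to look at next.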
The biggest difference is that it parses fakers like Maxthon accurately. It also extracts .NET versions and toolkit names and versions, and it identifies Opera 10 correctly (btw, Opera is the first thing I install on a new system).
The version numbers are converted to decimals to ease comparison (I dislike the major/minor stuff the others implement). The conversion also removes any junk string (like "gold") from the version number. While using the version module is good, as it handles all the nasty stuff, I got some regressions from the 5.6.2 smokers after releasing the module. It looks like they (5.6.2) have the pure-Perl version::vpp (I couldn't compile the XS version under 5.6.1 either), which has some kind of bug. I've opened a ticket about the issue, and also added a workaround to fool version::vpp (append '.0' if the version is three digits). I currently have no idea about 5.5.x, but 5.6.x seems to be fine at least (I also tested it myself with ActivePerl 5.6.1 on a virtual Windows XP).
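The decimal conversion can be sketched with the core version module. This is not the module's actual code: numify_version() is a hypothetical helper, the junk-stripping regex is an assumption about what "junk" looks like, and the version::vpp workaround is not reproduced here.

```perl
use strict;
use warnings;
use version;

# Hypothetical helper: keep the leading dotted-number part, drop
# trailing junk such as "gold", then let version.pm turn the rest
# into a decimal that compares with the plain numeric operators.
sub numify_version {
    my ($raw) = @_;
    return unless defined $raw;
    my ($clean) = $raw =~ m{ \A \s* ( \d+ (?: [.] \d+ )* ) }xms;
    return unless defined $clean;
    return version->parse($clean)->numify;
}

# Compare a few illustrative pairs numerically.
for my $pair ( [ '3.0.11', '3.0.9' ], [ '4.0gold', '3.5' ] ) {
    my ( $x, $y ) = map { numify_version($_) } @{$pair};
    printf "%-8s vs %-6s => %s\n", @{$pair},
        $x > $y ? 'newer' : 'not newer';
}
```

A plain string or float comparison would rank 3.0.9 above 3.0.11, which is the kind of nasty stuff the conversion avoids.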
The module also has some example programs in it for benchmarking. I'll give some figures below. The test system is: Windows Vista Home Premium SP2 32bit & P8600 @ 2.40GHz & ActivePerl 5.10.0.1004

    C:\>perl -Ilib eg\bench.pl -c 1000
    *** The data integrity is not checked in this run.
    *** This is a benchmark for parser speeds.
    *** Testing 161 User Agent strings on each module with 1000 iterations each.

    This may take a while. Please stand by ...

              Rate    HTML   HTML2 Browser   Parse  Parse2  Detect
    HTML    12.6/s      --     -2%    -63%    -75%    -82%    -90%
    HTML2   12.9/s      2%      --    -62%    -75%    -81%    -90%
    Browser 34.2/s    170%    166%      --    -33%    -51%    -73%
    Parse   51.1/s    304%    297%     50%      --    -26%    -59%
    Parse2  69.4/s    449%    439%    103%     36%      --    -44%
    Detect   125/s    888%    871%    266%    144%     80%      --

    The code took: 241.65 wallclock secs (228.21 usr + 0.08 sys = 228.29 CPU)

    ---------------------------------------------------------
    List of abbreviations:

    HTML      HTML::ParseBrowser v1
    HTML2     HTML::ParseBrowser v1 (re-use the object)
    Browser   HTTP::BrowserDetect v0.99
    Detect    HTTP::DetectUserAgent v0.01
    Parse     Parse::HTTP::UserAgent v0.16
    Parse2    Parse::HTTP::UserAgent v0.16 (without extended probe)

HTML::ParseBrowser is slow as hell. Even re-using the object as the doc suggests does not help. It's good that I wasn't aware of the module until now :p

HTTP::BrowserDetect is not a good performer either. But the interface is extensive and it's kinda the de facto standard in this area. It tries to match *anything* possible and this choice slows it down (who cares if $ua->win31 is true as of today, right?).

HTTP::DetectUserAgent is the speedy one here. It is 80% faster than Parse::HTTP::UserAgent even when the extended probe is disabled, and more than twice as fast when it is enabled. However, it gains this speed with several CAVEATs, as the version number suggests.
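The rate table above has the shape that the core Benchmark module's cmpthese() prints, so a comparison of this kind could be sketched as follows. This is not eg\bench.pl itself: the two parsers here are hypothetical regex stand-ins so the script runs without any of the CPAN modules, and only two sample strings are used instead of 161.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample strings for illustration only.
my @agents = (
    'Opera/9.80 (Windows NT 6.0; U; en) Presto/2.2.15 Version/10.00',
    'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3',
);

# Hypothetical stand-in for a verbose parser: extracts several fields.
sub parse_verbose {
    my ($string) = @_;
    my ( $name, $version ) = $string =~ m{ \A (\w+) / ([\d.]+) }xms;
    my @extras = $string =~ m{ [(] ([^)]*) [)] }xmsg;
    return { name => $name, version => $version, extras => \@extras };
}

# Hypothetical stand-in for a sniffer: answers one yes/no question.
sub sniff_opera {
    my ($string) = @_;
    return index( $string, 'Opera' ) == 0 ? 1 : 0;
}

# Run each candidate for at least one CPU second, then print the
# rate table with the relative differences.
cmpthese( -1, {
    Verbose => sub { parse_verbose($_) for @agents },
    Sniffer => sub { sniff_opera($_)   for @agents },
} );
```

The stand-ins also show why a sniffer tends to win such a race: it does far less work per string than a parser that builds a full result set.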

    C:\>perl -Ilib eg\accuracy.pl
    *** This is a test to compare the accuracy of the parsers.
    *** The data set is from the test suite. There are 161 UA strings
    *** Parse::HTTP::UserAgent will detect all of them
    *** A tiny fraction of the regressions can be related to wrong parsing.
    *** Equation tests are not performed. Tests are boolean.

    This may take a while. Please stand by ...

    ---------------------------------------------------------------------------------------
    | Parser                 | Name FAILS  | Version FAILS | Language FAILS | OS FAILS    |
    ---------------------------------------------------------------------------------------
    | HTTP::DetectUserAgent  | 27 - 16.77% | 37 - 23.27%   | 67 - 100.00%   | 35 - 24.31% |
    | HTTP::BrowserDetect    | 28 - 17.39% | 8 - 5.03%     | 67 - 100.00%   | 20 - 13.89% |
    | HTML::ParseBrowser     | 0 - 0.00%   | 3 - 1.89%     | 42 - 62.69%    | 19 - 13.19% |
    | Parse::HTTP::UserAgent | 0 - 0.00%   | 3 - 1.89%     | 3 - 4.48%      | 4 - 2.78%   |
    ---------------------------------------------------------------------------------------

Parse::HTTP::UserAgent is not perfect, but at least it seems to be close. HTML::ParseBrowser matches it on name/version accuracy, but falls well behind on language and OS detection. Speedy HTTP::DetectUserAgent seems to be the worst. However, there is one caveat: the test data is from the Parse::HTTP::UserAgent test suite, so the comparison favors it. Parse::HTTP::UserAgent is not actually that good yet, since there are some patterns it can not match.
Note: The module is already on CPAN, but you can get the latest code and non-CPAN content from the code repository. The repo also has an etc/Migration.pod for HTTP::BrowserDetect users.