• Parse Accept-Language to detect a user's language

    May 4 2008

    I'm an English-speaking Canadian living in Germany. Quite often I go to a website like Google or Kayak and find myself looking at a German version of the site.

    Okay, I do live in Germany, but why assume that everyone within Germany speaks German? What about visitors from other countries, or even people living here who would prefer to use another language?

    These sites must be taking my IP address, looking up its geographical location, and choosing the official language of that country. This may work most of the time, but there is an even easier way to choose a language.

    Most browsers send an Accept-Language header. For example, mine is set to:

    en-ca,en;q=0.8,en-us;q=0.6,de-de;q=0.4,de;q=0.2

    What this basically says is that I prefer (in decreasing order of preference) Canadian English, generic English, US English, German spoken in Germany, and lastly generic German. Any web site I visit is capable of looking at this list and deciding what language I would prefer.

    Of course, no matter what assumptions you make about a visitor, give them a chance to change their language if needed. For example, if you use an Internet cafe in Berlin, you shouldn't be stuck viewing websites in German!

    One really nice thing: I often see Google Ads and other geographically targeted ads in German, and this makes ignoring the ads much easier! :)

    Update: I was inspired to throw together a quick Accept-Language parser in PHP:

    $langs = array();
    
    if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        // break up string into pieces (languages and q factors)
        preg_match_all('/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'], $lang_parse);
    
        if (count($lang_parse[1])) {
            // create a list like "en" => 0.8
            $langs = array_combine($lang_parse[1], $lang_parse[4]);
        	
            // set default to 1 for any without q factor
            foreach ($langs as $lang => $val) {
                if ($val === '') $langs[$lang] = 1;
            }
    
            // sort list based on value	
            arsort($langs, SORT_NUMERIC);
        }
    }
    
    // look through sorted list and use first one that matches our languages
    foreach ($langs as $lang => $val) {
        if (strpos($lang, 'de') === 0) {
            // show German site
        } else if (strpos($lang, 'en') === 0) {
            // show English site
        }
    }
    
    // show default site or prompt for language
    

    This would produce the following structure for my Accept-Language string:

    Array
    (
        [en-ca] => 1
        [en] => 0.8
        [en-us] => 0.6
        [de-de] => 0.4
        [de] => 0.2
    )

  • Comments

    1. Geert at 4:16pm on May 4, 2008

    Good advice, indeed. Way simpler than looking for the geo location of an IP address.

    I am only wondering why they once picked that content negotiation format for HTTP headers like Accept. Referring to your example, how would one parse the header easily to know that en-ca has a quality factor of 0.8? Exploding it on “;” or “,” does not really help.

    2. Geert at 4:37pm on May 4, 2008

    Oh, wait! Now I see, en-ca does not have a quality factor of 0.8 but of 1 (by default) since a “q=” parameter has been omitted.

    For some reason I misunderstood this content negotiation syntax for a while. But reading and re-reading the specs cleared things up: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html

    So, sorry for the confusion. Exploding on “,” is the way to go.

    3. Jesse Skinner at 5:02pm on May 4, 2008

    @Geert - You inspired me to throw together a parse script in PHP that deals with the q factor. Feel free to use and rewrite this as much as you like. (See above.)

    4. Susie at 10:32pm on May 4, 2008

    Wow, that's exactly the way I feel too! I'm a native English speaker living in China and it drives me nuts when I go to some major websites and am automatically given the Chinese version. I can read a little Chinese, but sometimes it's even hard to find the link to go to the English-language version. It's enough to make me want to stop using the website!

    5. Matt at 7:19pm on May 6, 2008

    I started using a geotargeting feature in our ad server that picks up the user's language preference and targets ads based on that instead of geolocation. I saw that and thought pretty much like you did. If you speak Spanish and live in Texas, then Spanish ads make more sense than if you speak English but live in Spain.

    6. Kevin at 10:29am on May 25, 2008

    Thanks for that script, it's simple, effective and it works :)

    I didn't want to write the code myself, so I asked Google and found my way here, so thanks again!

    7. Mei at 9:21am on June 20, 2008

    Hey, great script. I speak Welsh (http://en.wikipedia.org/wiki/Welsh_language) and English, and so my browser is wired for Welsh content, but it's rare that a site would recognise it...

    Anyway, I hope to integrate this excellent piece of code into my future developments.

    Diolch!

    8. Denis at 1:51am on June 27, 2008

    Thank you for the nice Accept-Language parsing code, it saved me a lot of time!

    9. Nikita at 3:37am on July 5, 2008

    I've found many implementations of this parser but wasn't sure whether it's right to explode on ','. So I searched for the RFC and found this article. This code is much simpler than what I've seen before. Thanks for that and for the RFC link :)

    10. Sarah Lewis at 11:03pm on July 16, 2008

    Thanks for sharing this code! It's been very helpful for an application I'm coding.

    11. Wyrm at 12:00pm on July 29, 2008

    Thanks!

    12. Brian Cherne at 2:15am on August 7, 2008

    The W3C has a nice FAQ on when to use the Accept-Language header. http://www.w3.org/International/questions/qa-accept-lang-locales

    If all you had to know was language, I see this as being a great resource so long as you provide the user with an easy way to change languages - using an iconic or native language approach.

    However, some of the web sites I've worked on have locale-specific pricing/availability... and IP address is simply the most reliable/unobtrusive way to enable this. While it's bad to assume what language a user speaks, it may be impractical to provide translations for all locale-specific options (even for a handful of languages).

    13. Jesse Skinner at 5:23am on August 7, 2008

    @Brian - True, but you could also separate the localization (currency) stuff from the internationalization (language) stuff. So perhaps the pricing, currency and shipping could be based on the IP, whereas the language could be based on the Accept-Language header or whatnot.

    You may have to - how else would you want to handle separate pricing for USA versus Canada, for example? Would you have separate language files for each?

    14. Brian Cherne at 12:31pm on August 7, 2008

    @Jesse - I totally agree with you. Where it's possible, localization can (and should) be separated from internationalization.

    My point was, however, that assuming products A, B, C are available in the US and products D, E, F are available in Germany, it may not be possible (time/budget-wise) to write English descriptions for products D, E, F or German descriptions for A, B, C. This would be especially true if the return on investment was negligible (i.e., non-German speakers in Germany being <1%).

    All depends on how our clients do business. From what I've seen the major retailers have product availability based on region (North America, Europe, Asia/Pacific, ...). Then within each region there are (should be) two independent data sets: supported languages and supported locales (for currency/pricing, etc.). Unfortunately, I think too often language and locale are bundled together in an effort to keep things simple.

    15. Jesse Skinner at 1:13pm on August 10, 2008

    @Brian - yep that's totally true. Thanks for pointing that out.

    16. Naveed at 9:52am on September 12, 2008

    How can I get rid of this automatic language detection setting? Every time I go to google.com it redirects to google.ae (based on my location)... Is there any preference I can change?

    Please advise in plain English and not in Programming terms.

    regards,

    17. Jesse Skinner at 9:57am on September 13, 2008

    @Naveed - If you look at the bottom of google.ae you will see a link to "Google.com in English" - click that and you'll be transported to the English google.com with no redirect :)

    For anyone else's convenience, the link for all languages is:

    http://www.google.com/ncr

    18. Lummo at 9:30am on September 26, 2008

    Your code is really useful but it failed on some of the language settings I use. I traced the problem to the spec for Accept-Language:

    Accept-Language = "Accept-Language" ":"
                            1#( language-range [ ";" "q" "=" qvalue ] )
          language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )

    The language-range parameter can be from 1 to 8 alpha characters for both the primary-tag and the subtag.

    This can be accommodated by changing your regexp pattern to:

    preg_match_all('/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', $_SERVER['HTTP_ACCEPT_LANGUAGE'], $lang_parse);

    regards,

    Lummo

    19. Jesse Skinner at 11:56am on September 29, 2008

    @Lummo - Thanks for that! I've made the adjustment to allow 1-8 characters in the primary/sub-tags.

    20. Lummo at 5:54pm on October 7, 2008

    You are welcome. Don't forget the "/i" (for case insensitive) that I slipped in there too. Some of those strings are upper case too.

    21. Jesse Skinner at 3:11pm on October 11, 2008

    Thanks again, Lummo!

    22. TarquinWJ at 11:44am on November 18, 2008

    What happens if a user has da,en;q=0 which I think is valid, meaning "I want Danish, but whatever you do, don't give me English!"?

    A script that steps through the array (like yours) would see "en" and think it was OK to use it as a last resort, but q=0 means "give me anything except this" - so even Swahili would be better.

    The nicest approach I can think of for that is to build two arrays, one of the positive, and one of the negative.
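
    A minimal sketch of the two-array idea, assuming the header is split on "," rather than run through the regular expression from the article. The function name split_accept_language() is hypothetical, and treating every q of 0 as "rejected" is one possible reading of the spec.

    ```php
    // Hypothetical helper: split an Accept-Language header into accepted
    // languages (q > 0) and explicitly rejected ones (q = 0).
    function split_accept_language($header)
    {
        $accepted = array();
        $rejected = array();

        foreach (explode(',', $header) as $part) {
            $pieces = explode(';', trim($part));
            $lang = strtolower(trim($pieces[0]));
            if ($lang === '') {
                continue;
            }

            // q defaults to 1 when the parameter is omitted
            $q = 1.0;
            if (isset($pieces[1]) && preg_match('/^\s*q\s*=\s*([0-9.]+)\s*$/i', $pieces[1], $m)) {
                $q = (float) $m[1];
            }

            if ($q > 0) {
                $accepted[$lang] = $q;
            } else {
                $rejected[] = $lang;
            }
        }

        // most preferred language first
        arsort($accepted, SORT_NUMERIC);
        return array($accepted, $rejected);
    }

    list($accepted, $rejected) = split_accept_language('da,en;q=0');
    // $accepted holds 'da' => 1.0 and $rejected holds 'en'
    ```

    A site could then skip anything in $rejected when it falls back to a default language.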

    23. Lummo at 11:19am on November 20, 2008

    I think that the answer to that is "it depends what you want to do with the information". At the moment the code returns the q values so it's easy to skip over or delete any Accept-Languages where it is zero if that's what you want to do.

    If the user has q=0'ed all of the languages that you support then it might be best to offer up an apology before dropping back to some lingua franca. "sorry. We don't speak the same linguine!". The problem is what language to offer it in?

    regards,

    Lummo

    24. Malte Anglais at 2:31am on February 11, 2009

    Great article. Badly designed language redirects drive me insane. Especially when there's no obvious link in English to get back.

    25. Ries at 5:45pm on February 15, 2009

    I think not even search engines work to spec in these cases, but yeah... I live in Ecuador and often get a Spanish version while my browser sends en as preferred; not even Google does this correctly.

    From a search engine point of view, Google also doesn't understand this and cannot index a website in 3 languages when the content is the same (but shown in a different language).

    Some people break the spec here, including Google.

    Ries

    26. Cristian at 8:24am on March 18, 2009

    Hi, this is interesting, but can you help me? I'm not a programmer, and I want people who come to my site to be automatically redirected to their language (if he is from de, to German, and so on)... so I have no idea how to do that...

    I use a cms and the lang link looks like this:

    www.mysite.com/index.php?en_home  -for english

    www.mysite.com/index.php?de_home  -for german

    I think you got the idea... where do I find a script and instructions that redirect automatically to the visitor's language?

    Best Regards

    Cristian

    27. Lummo at 8:53am on March 18, 2009

    So, at the moment your index.php is expecting to receive a parameter that specifies the language and you want to redirect to a page in that language? Is that right?

    So index.php?en_home would end up at mysite.com/en_home.php?

    There are a couple of ways of achieving this but both involve a degree of programming.

    1) If your web server is Apache then you can use .htaccess redirect rules to do the redirection for you. You'll need to set up the .htaccess to match the parameter and redirect to the URL that you want to handle that language.

    2) You can have your index.php file gather the parameter and then do a redirect to the URL that you want to handle that language. The PHP redirect is done by calling the header() function something like this:

    header('Location: ' . $redirectTo, true); // Redirect to target

    where $redirectTo contains the target URL.

    Can I suggest that, if you can, you alter your URL parameter to be like this:

    http://www.mysite.com/index.php?lang=en

    That way the language is passed as a parameter value rather than the parameter being the value.

    Hope that helps.

    with best regards,

    Lummo
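
    A rough sketch of option 2, assuming the index.php?lang=en layout suggested above. The helper name language_redirect_target(), the list of supported languages, and the default are all made up for illustration; the requested value could come from a parsed Accept-Language list.

    ```php
    // Hypothetical helper: map a requested language code onto a supported one
    // and build the redirect target for the index.php?lang=xx layout.
    function language_redirect_target($requested, array $supported, $default)
    {
        // keep only the primary tag, so 'de-DE' becomes 'de'
        $lang = strtolower(substr((string) $requested, 0, 2));
        if (!in_array($lang, $supported, true)) {
            $lang = $default;
        }
        return 'http://www.mysite.com/index.php?lang=' . urlencode($lang);
    }

    $target = language_redirect_target('de-DE', array('en', 'de'), 'en');
    // $target is 'http://www.mysite.com/index.php?lang=de'

    // In index.php the redirect itself would then be:
    // header('Location: ' . $target, true, 302);
    // exit;
    ```

    Checking the value against a fixed list keeps arbitrary input out of the Location header.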

    28. Woody Gilk at 5:14pm on May 6, 2009

    Thanks for your article! Based on your analysis and code provided, Kohana v3.0 will have a Request::accept_lang($lang) method. :)

    29. Nicholas Shanks at 1:57pm on July 11, 2009

    I don't know where Lummo got those BNFs for the Accept-Language header, but they do not conform to BCP 47.

    See page 4 of RFC 4646: http://tools.ietf.org/html/rfc4646

    Something like (I do not write BNF for a living):

    2*4ALPHA (["-" 4ALPHA] ("-" 2ALPHA (["-x"] "-" 3*8ALPHA)))

    allowing tags like "sco-Latn-GB-x-lallans" or "en-oed":
    language-Script-COUNTRY-x-dialect
    the x prefix is for unofficial dialects (ebonics) and not needed for official ones (OED, Scouse)

    In short, the regex needs to be amended something like this:

    [a-zA-Z]{1,4}(-[A-Z][a-z]{1,3})?(-[a-zA-Z]{2})?(-x)?(-[a-zA-Z]{3,8})?

    and drop the /i from the end.

    4 character language codes are reserved and not currently used.
    I am ignoring grandfathered tags (i-) and such, read RFC4646 and RFC4647 for the full details.

    30. Lummo at 3:39pm on July 12, 2009

    Hello,

    You're right. I should have quoted the source. I believe that it was from RFC 2616 Hypertext Transfer Protocol -- HTTP/1.1, Section 14.4: (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.4).

    {quote}
    The Accept-Language request-header field is similar to Accept, but restricts the set of natural languages that are preferred as a response to the request. Language tags are defined in section 3.10.

          Accept-Language = "Accept-Language" ":"
                            1#( language-range [ ";" "q" "=" qvalue ] )
          language-range  = ( ( 1*8ALPHA *( "-" 1*8ALPHA ) ) | "*" )
    {/quote}

    Section 3.10 (http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10) defines the language tags.

    {quote}
    White space is not allowed within the tag and all tags are case- insensitive.
    {/quote}

    Looks like I was behind the times :-)

    Cheers,

    Lummo

    31. Lummo at 3:46pm on July 12, 2009

    Btw, RFC 4646 (page 5) says:

    {quote}
    The tags and their subtags, including private use and extensions, are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST NOT be taken to carry meaning.
    {/quote}

    so maybe the amendment could be:

    [a-z]{1,4}(-[a-z]{1,4})?(-[a-z]{2})?(-x)?(-[a-z]{3,8})?/i

    Cheers again,

    Lummo

    32. Nicholas Shanks at 5:59am on July 13, 2009

    The problem with that is that the different groups may mistakenly match the wrong part, e.g.

    zh-hant could be matched as $1 = zh, $2 = hant; or as $1 = zh, $5 = hant

    The last part of the pattern, the "dialect" as I call it, is problematic. No new dialects are allowed to be defined that are fewer than 5 chars long, so it could be {5,8} except that there are extant cases such as en-OED (oxford english dictionary spellings) where as few as 3 letters are used.

    we could either do [A-Z][a-z]{3} .../i for the script code, and allow three-char dialects, or [a-z]{4} for four char script codes, and [a-z]{5,8} for the dialect with a /i at the end.
    The script code is only ever 4 chars, so it does not need to be {1,4}
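
    A simplified sketch of the second option (a four-letter script code, case-insensitive matching), with the three-letter dialects such as en-OED still allowed. The function name parse_language_tag() is hypothetical, and the pattern assumes 2 or 3 letter language codes and ignores grandfathered tags, numeric region codes and extensions, so it is not a full BCP 47 parser. A four-letter dialect would still be read as a script code.

    ```php
    // Hypothetical helper: break a tag like language-Script-COUNTRY-x-dialect
    // into its parts, or return null when the tag does not fit the pattern.
    function parse_language_tag($tag)
    {
        $pattern = '/^([a-z]{2,3})(?:-([a-z]{4}))?(?:-([a-z]{2}))?(?:-x)?(?:-([a-z]{3,8}))?$/i';
        if (!preg_match($pattern, trim($tag), $m)) {
            return null;
        }

        // Unmatched trailing groups are missing from $m, so pad the array
        $m = array_pad($m, 5, '');

        return array(
            'language' => strtolower($m[1]),
            'script'   => $m[2],
            'region'   => $m[3],
            'variant'  => $m[4],
        );
    }

    $zh = parse_language_tag('zh-hant');  // script is 'hant', variant is ''
    $en = parse_language_tag('en-oed');   // script is '', variant is 'oed'
    ```

    Because the script group only accepts exactly four letters, zh-hant is read as a script and en-oed as a dialect.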

    33. lilly at 5:48pm on September 1, 2009

    Excuse me for being too "blond",
    but can I implement this in a plain HTML website, and how could I do it?
    I have a website in 9 different language versions and would like a user to be redirected to the appropriate language version of the file he has visited, depending on the language settings he has picked in his browser.
    Thank you in advance

    34. Lummo at 6:39pm on September 1, 2009

    It depends (of course!) on what facilities you have on the system that hosts the site(s). Apache? PHP? Ruby? Python? Tomcat? etc?

    The Apache web server has a scheme for having URLs vectored to language-specific pages depending on the user's Accept-Language HTTP request header. You can read more at http://httpd.apache.org/docs/1.3/content-negotiation.html. This doesn't work too well for SEO for links though. It seems that all the href contents have to be the same for each language.

    I am working on an easier and more flexible solution but this margin is too narrow to contain the details. I'd offer to talk with you offline but I don't know how to get in touch without posting an e-mail address or something.

    Any suggestions?

    Lummo

    35. tayfun at 10:59am on September 18, 2009

    Hi, I've added a "(,|$)" to your regular expression because I was getting some junk in the header for some reason. I mean something like this: "tr-TR,tr;q=0.7,chrome://global/locale/intl.properties;q=0.3" A comma or an end of string does help a little in not recognizing junk. Thanks for your post, it helped me understand the discussion.
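
    If "(,|$)" is simply appended to the pattern, the tail of a junk item can still match, because the pattern is not anchored at the start. One way to go a step further, sketched here as an assumption rather than as the code used above, is to split on "," first and validate each item with a pattern anchored at both ends. The function name parse_accept_language_strict() is hypothetical.

    ```php
    // Hypothetical helper: parse Accept-Language item by item and drop any
    // item that is not a clean "language-range [;q=value]" pair.
    function parse_accept_language_strict($header)
    {
        $langs = array();
        foreach (explode(',', $header) as $item) {
            // Anchored at both ends, so an item with stray characters is skipped entirely
            if (preg_match('/^\s*([a-z]{1,8}(?:-[a-z]{1,8})*)\s*(?:;\s*q\s*=\s*(1(?:\.0{1,3})?|0(?:\.[0-9]{1,3})?))?\s*$/i', $item, $m)) {
                // group 2 is absent when the item has no q parameter
                $langs[strtolower($m[1])] = isset($m[2]) ? (float) $m[2] : 1.0;
            }
        }
        arsort($langs, SORT_NUMERIC);
        return $langs;
    }

    $langs = parse_accept_language_strict('tr-TR,tr;q=0.7,chrome://global/locale/intl.properties;q=0.3');
    // $langs holds 'tr-tr' => 1.0 and 'tr' => 0.7; the chrome:// item is dropped
    ```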

    36. Taber at 3:47am on November 27, 2009

    Great article! I'm wondering how reliable/well-adopted across different browsers the Accept-Language header is. For example, IE5+, Firefox 1.x+, etc? I'm sure all modern browsers support it, but just wondering where the line is drawn, if any. Thanks.

    37. Nicholas Shanks at 6:42am on November 27, 2009

    @taber: Mosaic gained support for Accept-Language in version 2.4 (1992). Netscape/Firefox and IE both inherited this code and so support it since version 1.0 (1994-5). Opera gained support somewhere between versions 3.51 and 5.12 (ca. 2000). Konq/Safari/Chrome have supported it since their respective version 1.0s too (2000, 2003, 2008). Lynx has supported it for longer than I can find. I don't know about iCab, links or w3m. wget and telnet also support it if you remember to write the header yourself ;-)

    38. Taber at 10:55am on November 27, 2009

    Awesome, thanks Nicholas!

    39. cloved at 10:55pm on January 28, 2010

    How can I do this if I use JavaScript?

    40. Cyrill at 7:10pm on February 6, 2010

    Thanks for the script) It fits perfectly into the bootstrap class of ZendFramework)

    41. Manfred Kooistra at 9:03am on February 10, 2010

    What I don't understand is why you need to sort the languages by q value. It seems to me that they are already sorted by q value as they come from the browser.

    42. Manfred Kooistra at 9:21am on February 10, 2010

    Aha, I just read the Accept headers part in RFC 2616 at http://www.ietf.org/rfc/rfc2616.txt?number=2616 (section 14.1 starting on page 100), and the examples given are NOT ordered regarding q value. So it seems that, yes, you do need to sort languages.

    In the $_SERVER manual on php.net, an example of a regular expression to parse the Accept-Language header is given here: http://www.php.net/manual/en/reserved.variables.server.php#94237

    43. Manfred Kooistra at 10:27am on February 10, 2010

    Okay, one last thing. If I understand your code correctly, you test if either English or German are the MOST preferred language, not actually which one of both is MORE preferred.

    Let's say you are in the US and offer a site in two languages, English and German, with the default language being English. Now your site is visited by a user who speaks no English, only French, German and Italian, and who has set his browser to prefer French over German, thus: fr,de;q=0.8,it;q=0.2. With your code this person will receive the default language version of the website, English, because German is not in the first position of the list (de === 0). This is bad, because he does not know English but would have been happy with the German version.

    Instead of testing for either German or English being the FIRST language in the user's preferences, you should test which language comes before the other, no matter where in the list they appear.

    The code for that could look something like this:

    $sorted_languages = "";
    foreach ($langs as $lang => $val) {
        $sorted_languages .= $lang . "-";
    }

    // strpos() returns an integer offset or FALSE, never TRUE
    $de = strpos($sorted_languages, 'de');
    $en = strpos($sorted_languages, 'en');

    if ($de === FALSE && $en !== FALSE) {
        // show English site
    } elseif ($de !== FALSE && $en === FALSE) {
        // show German site
    } elseif ($de !== FALSE && $en !== FALSE) {
        if ($de < $en) {
            // show German site
        } else {
            // show English site
        }
    } else { // if both return FALSE
        // show default site
    }

    44. Jesse Skinner at 11:57am on February 10, 2010

    @Manfred - my code does see which supported language has a higher q value, by first sorting the languages by q value (with arsort) and then looping over them, checking for the languages the site supports. The first matching one must have a higher, or equal, q value to any of the others.

    Of course there are other techniques for working with the data; it all depends on what experience you want for your visitors.

    45. Manfred Kooistra at 2:13pm on February 10, 2010

    Jesse, I misunderstood your loop. I was thinking each "strpos" was reading the whole array. I didn't differentiate between "$langs" and "$lang", because my attention was focused on understanding the "if strpos" which I have encountered for the first time here (being only a PHP amateur).

    But there still appears to be a problem with your loop: it does not stop, when you find the preferred language but continues for all key-value pairs in your array. If you have both English and German in your preferences, or multiple instances of one language (en-ca, en, en-us), each of them results in a display of your website, one above the other. I'm surprised that no-one has found this in their resulting source code. Shouldn't you put an "exit()" in there? Like this:

    foreach ($langs as $lang => $val) {
        if (strpos($lang, 'de') === 0) {
            // show German site
            exit();
        } else if (strpos($lang, 'en') === 0) {
            // show English site
            exit();
        }
    }

    Because "if (strpos($lang, 'de') === 0)" is true for both "[de-de] => 0.4" and "[de] => 0.2", so your "foreach"-loop outputs the German website twice, because you have nothing to stop it. Same goes for your three instances of "en".

    I hope I could make myself clear. It's kind of difficult to explain this without drawing a nice graphic :-)

    46. Jesse Skinner at 3:07pm on February 10, 2010

    @Manfred - you're absolutely right, the code needs to break the loop, either using break, exit, die or return. I left it to the imagination how to display the site, and make sure it's only displayed once.
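
    One way to guarantee the site is only chosen once is to wrap the loop in a function and return on the first match. This is a minimal sketch; pick_language() and its parameters are hypothetical names, and $langs is assumed to be already sorted by q value as in the article.

    ```php
    // Hypothetical helper: return the first supported language found in a
    // list that is already sorted from most to least preferred.
    function pick_language(array $langs, array $supported, $default)
    {
        foreach ($langs as $lang => $q) {
            foreach ($supported as $code) {
                // prefix match, case-insensitive, so 'de-DE' matches 'de'
                if (stripos($lang, $code) === 0) {
                    return $code; // stop at the first (most preferred) match
                }
            }
        }
        return $default;
    }

    $langs = array('fr' => 1, 'de' => 0.8, 'en' => 0.2);
    $site = pick_language($langs, array('de', 'en'), 'en');
    // $site is 'de': French is unsupported, and German outranks English
    ```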

    47. Michael at 3:59pm on March 11, 2010

    Thank you for this perfect script :-)

    I am developing an international Dating Site and need to determine the users language.

    Currently it is only Danish. But I will implement other languages soon. (datingmatch.nu)

    Thanks Again

    48. Marty at 10:59am on April 9, 2010

    Hi, this code looks like almost exactly what I need, though the most important thing for me is to separate UK vs US visitors. Is it possible to identify en-uk vs en-us?

    thanks for sharing! marty

    49. Lummo at 11:46am on April 9, 2010

    Yes, but you need "en-gb" rather than "en-uk".

    50. marty at 1:56pm on April 14, 2010

    Fantastic, thanks Lummo, seems to work perfectly! in firefox en-gb vs en-us works but in most other browsers it needs to be en-GB or en-UK.

    thanks very much!

    51. marty at 3:00pm on April 14, 2010

    sorry i meant en-GB, en-US

    52. marty at 9:57am on April 17, 2010

    Did anyone notice that it's not working across all browsers?

    Well i thought it was working ok in all browsers after changing to uppercase, but that broke firefox.
    so i put two options for every country, en-us and en-US but that meant 20 lines for 10 countries which is kind of messy

    then i read up on php and replacing strpos with stripos seems to fix it, as stripos is case-insensitive.

    would it be simple to use 'case' instead of 'if else' for matching? i eventually want to have about 20 different regions and from what i read it is more efficient.

    thanks!
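
    For a long list of regions, an alternative to both 'if else' and 'case' is a lookup array keyed by lowercase tag, which keeps it to one line per region. The sketch below is only an illustration: the $regions table, its values and the function name region_for_language() are made up.

    ```php
    // Hypothetical region table; keys are lowercase language tags
    $regions = array(
        'en-gb' => 'uk',
        'en-us' => 'us',
        'de-de' => 'de',
    );

    // Hypothetical helper: look up the region for a tag, ignoring case
    function region_for_language($lang, array $regions, $default)
    {
        $key = strtolower(trim($lang)); // 'en-GB' and 'en-gb' become the same key
        return isset($regions[$key]) ? $regions[$key] : $default;
    }

    $region = region_for_language('en-GB', $regions, 'us');
    // $region is 'uk'
    ```

    Lowercasing the tag once has the same effect as switching from strpos to stripos in every comparison.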

    53. Michael at 9:23am on May 25, 2010

    I think the best way to test the Accept-Language header is using a tool that can modify headers, send them, and show the response.

    I use this free HTTP tool to test the header... enjoy
    http://soft-net.net/SendHTTPTool.aspx

    54. Richard Heider at 6:23am on June 1, 2010

    I just founded the Facebook group 'Facebook needs multi-language awareness' (http://www.facebook.com/group.php?gid=121786987860982) to push that issue there.

    55. Echo at 9:11pm on June 10, 2010

    Very nice !

    Thanks

    56. Keith at 11:21am on June 20, 2010

    One thing I don't get is  why list individual varieties of a language in the Accept-Language header?  I mean, I'm unaware of any variety of English I don't understand (well, Ebonics, perhaps), so I just specify the generic version.  Here's my Accept-Language header's value:
    eo,de;q=0.8,es;q=0.5,en;q=0.3
    [I'm not even close to fluent in German and Spanish, but I figure getting webpages in those languages is a good way to improve my reading skills in them.]

    About redirecting to a local server, like google.ae, I'm not seeing the problem.  I'm sure you can still see the content in any supported language on any server, so viewing it on a more local server only makes sense, from a networking viewpoint.

    57. Richard Heider at 11:54am on June 20, 2010

    @Keith. The content-negotiation mechanism is a general mechanism not just restricted to languages or the language a site is presented in.

    What the site does with that info depends on the site and the context, e.g. the site might supply form letters or use a spell checker. It would then use the spelling appropriate for your region to serve you.

    However I guess in practice you are right, that the generic language variant is sufficient.

    58. Felix at 5:49am on July 10, 2010

    This is very helpful! Things like that are not very complicated, but you saved at least one hour of my precious life by posting this one.

    Thank you so much :)

    59. Dinar Q. at 9:39am on July 12, 2010

    i also have(had?) made code for this:
    example code:
    http://qdb.tmf.org.ru/phpsinaw%28test%29-ici/accept-language.php
    how it works: http://qdb.tmf.org.ru/phpsinaw%28test%29/accept-language.php
    working in real site variant of similar code:
    http://qdb.tmf.org.ru/minyasaganprogramlar/kukmara.ru/chat2/index.php (works at kukmara.ru/chat2/ ).
    all these sites do not work at night nearly 23:20-7:00 gmt+4.

    60. goran at 6:38am on July 15, 2010

    The biggest problem here might be users' lack of knowledge...I'm afraid most people don't know how to set up their browser in order to send appropriate accept-language header.

    61. mati at 6:37pm on July 20, 2010

    tanks!!

    62. Crazywater at 7:26am on September 14, 2010

    Hello,
    your script is very short and elegant, do you license it under any particular license or is it just free to use for everyone?

    63. Jesse Skinner at 10:19am on September 14, 2010

    @Crazywater - there's no formal license. Feel free to use this and any other code from my articles in your projects, but only at your own risk, of course.

    64. Crazywater at 12:27pm on September 14, 2010

    Thank you very much! :)

    65. Rulatir at 12:31pm on October 22, 2010

    What about multiple languages with the same q value (like implicit q=1)? In this case their order of occurrence in the accept-language string encodes their relative preference. Unfortunately arsort() is *not* a stable sort, so the order will be lost. You should use a stable sort.
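
    A minimal sketch of a stable ordering, assuming the parsed list is an array of language => q pairs in header order. It remembers each entry's original position and uses it as a tie-breaker in usort(); sort_languages_stable() is a hypothetical name. (From PHP 8.0 onward the built-in sorts are stable, so this matters mainly on older versions.)

    ```php
    // Hypothetical helper: sort by q value, highest first, and keep the
    // original header order for entries with the same q value.
    function sort_languages_stable(array $langs)
    {
        $rows = array();
        $i = 0;
        foreach ($langs as $lang => $q) {
            // remember the original position alongside each entry
            $rows[] = array((string) $lang, (float) $q, $i++);
        }

        usort($rows, function ($a, $b) {
            if ($a[1] != $b[1]) {
                return ($a[1] > $b[1]) ? -1 : 1; // higher q first
            }
            return $a[2] - $b[2]; // equal q: keep header order
        });

        $sorted = array();
        foreach ($rows as $row) {
            $sorted[$row[0]] = $row[1];
        }
        return $sorted;
    }

    $sorted = sort_languages_stable(array('de-AT' => 1, 'en-US' => 1, 'fr' => 0.5, 'it' => 1));
    // keys come out as de-AT, en-US, it, fr
    ```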

    66. Mike at 10:03pm on November 10, 2010

    Thanks for posting this, very helpful :)

    67. Kathrin at 3:42am on January 22, 2011

    Two relatively minor issues I see:

    PHP has undefined results when sorting two equal values.  (see usort docs)

    Some clients do not specify q values, and trust the server to go with whatever was first.  As such, it makes sense to retain the index and maintain it in the event of a tie.  I am using usort to do this.

    Second, it is possible to specify 0 for a q value, for cases where one wishes to explicitly state "do not give me this language".  The regex doesn't handle this condition properly.  Using '/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+|0))?/i' and checking isset ($matches[4]) seems to work nicely.
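
    A short sketch of why the extra "|0" alternative matters, using the sample header da,en;q=0 from an earlier comment. Without it, the optional q group simply fails to match "q=0", so "en" looks like it has no q value and would be given the default of 1. The use of PREG_SET_ORDER and the variable names here are illustrative choices.

    ```php
    $header = 'da,en;q=0';
    $pattern = '/([a-z]{1,8}(-[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+|0))?/i';
    preg_match_all($pattern, $header, $matches, PREG_SET_ORDER);

    $langs = array();
    foreach ($matches as $match) {
        // With PREG_SET_ORDER, group 4 is absent when no q parameter was given
        $langs[$match[1]] = isset($match[4]) ? (float) $match[4] : 1.0;
    }
    // $langs holds 'da' => 1.0 and 'en' => 0.0
    ```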

    68. James Suvestor at 12:48pm on January 27, 2011

    Thanks

    69. David Van De Walle at 7:14pm on February 12, 2011

    Why not make things simple? One line of code:

    $lang = substr($_SERVER['HTTP_ACCEPT_LANGUAGE'], 0, 2);

    70. Rick McKnight at 2:08pm on March 1, 2011

    Great piece of code Jesse.
    I looked into many other alternatives, but this seems the best by far.

    Thanks a lot.

    71. Urahara san at 8:42am on March 21, 2011

    (sent my previous response before verifying the workings, sorry about that)

    @Kathrin - you can make sort() stable by applying a (semi) Schwartzian transform before sorting.

    Being more pragmatic I've done away with the need to verify the string format:


    function getBrowserLanguages()
    {
        if (!isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
            return array();
        }
        $langs = array();
        foreach (explode(',', $_SERVER['HTTP_ACCEPT_LANGUAGE']) as $k => $pref) {
            // split $pref again by ';q='
            // and decorate the language entries by inverted position
            if (false !== ($i = strpos($pref, ';q='))) {
                $langs[substr($pref, 0, $i)] = array((float)substr($pref, $i + 3), -$k);
            } else {
                $langs[$pref] = array(1, -$k);
            }
        }
        arsort($langs);

        // no need to undecorate, because we're only interested in the keys
        return array_keys($langs);
    }

    72. Giżycko, Mazury - forum at 7:59pm on June 29, 2011

    Very helpful, thank you.
    I'm about to create a site with 40 languages, so choosing the first language is crucial for me and my future visitors :)
    You can see my beginnings at ujagody.pl
    regards

    73. med at 8:53pm on August 7, 2011

    Hello,
    Is this script still supported? I'm stuck on using it. Or maybe it's not up to date as I'm not getting accurate results.

    74. med at 10:05pm on August 7, 2011

    OK I finally got an idea and it works well now

    75. Abhishek at 3:31am on March 13, 2012

    Thanks brother! The code works well for me.

    76. Mike at 12:25pm on March 19, 2012

    I use this code: https://github.com/zendframework/zf2/blob/master/library/Zend/Locale/Locale.php#L582

    77. chris at 6:26am on April 6, 2012

    Hi Jesse,
    thanks for the post. It gave me some guidance on how to do this in C#.

    78. Falk at 7:33am on August 24, 2012

    hey people I made this:
    if (isset($_SERVER['HTTP_ACCEPT_LANGUAGE'])) {
        $idiomes = array('es_ES', 'ca_ES');
        $langs = array();
        // hyphens are replaced by underscores first, so the subtag pattern must match '_'
        preg_match_all('/([a-z]{1,8}(_[a-z]{1,8})?)\s*(;\s*q\s*=\s*(1|0\.[0-9]+))?/i', str_replace('-', '_', $_SERVER['HTTP_ACCEPT_LANGUAGE']), $lang_parse);
        if (count($lang_parse[1])) {
            $langs = array_combine($lang_parse[1], $lang_parse[4]);
            foreach ($langs as $lang => $val) {
                if ($val === '') $langs[$lang] = 1;
            }
            arsort($langs, SORT_NUMERIC);
        }
        foreach ($langs as $lang => $val) {
            foreach ($idiomes as $idioma) {
                // exact match first, e.g. es_ES
                if (strtolower($lang) == strtolower($idioma)) {
                    return $idioma;
                }
                // otherwise match on the two-letter language only
                if (substr($lang, 0, 2) == substr($idioma, 0, 2)) {
                    return $idioma;
                }
            }
        }
    }

    In my case I'm in a function which checks first of all the $_GET variable (so the user can choose), then the $_SESSION, then the $_COOKIE, and at the end the browser.
    Thanks for the code, it's useful and easily understandable.
    Sorry for my English I'm a Spanish speaking German.

    79. Dan at 4:35pm on December 9, 2012

    Dear Jesse,

    thanks a lot for guiding me to the right direction. :-)

    I improved your code by my tuned foreach-loop for the q factors:

    <schnipp>
    $ctr = 0;
    foreach ($langs as $lang => $val) {
        if ($val === '') {
            $langs[$lang] = round(1-($ctr++/count($langs)), 1);
        }
    }
    <schnapp>

    This code builds its own priority from one to zero to handle those nasty language couples like
    'de-AT, en-US'
    which collide with each other but which I have found in real life. In this case the sequence matters.

    Maybe that helps somebody.

    Cheers
    Dan
