From 84d9e2d50ca9fbcf34e31cb74797fc182187c7b5 Mon Sep 17 00:00:00 2001 From: Jakub Narebski Date: Fri, 3 Feb 2012 13:44:54 +0100 Subject: [PATCH] gitweb: Allow UTF-8 encoded CGI query parameters and path_info MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Gitweb forgot to turn query parameters into UTF-8. This results in a bug that one cannot search for a string with characters outside US-ASCII. For example searching for "Michał Kiedrowicz" (containing letter 'ł' - LATIN SMALL LETTER L WITH STROKE, with Unicode codepoint U+0142, represented with 0xc5 0x82 bytes in UTF-8 and percent-encoded as %C5%82) result in the following incorrect data in search field MichaÅ\202 Kiedrowicz This is caused by CGI by default treating '0xc5 0x82' bytes as two characters in Perl legacy encoding latin-1 (iso-8859-1), because 's' query parameter is not processed explicitly as UTF-8 encoded string. The solution used here follows "Using Unicode in a Perl CGI script" article on http://www.lemoda.net/cgi/perl-unicode/index.html: use CGI; use Encode 'decode_utf8; my $value = params('input'); $value = decode_utf8($value); Decoding UTF-8 is done when filling %input_params hash and $path_info variable; the former requires to move from explicit $cgi->param(