Do your Java-based Web applications really support UTF-8?

Java natively supports Unicode, but are you sure the same holds for your Java-based Web applications? Out of the box it does not, because the default encoding at several layers is Latin-1 (ISO 8859-1). Take as an example an application built with the Spring framework, Tiles and JSP views, deployed in a Tomcat server that is reached through an Apache2 Web server.
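To see what the problem looks like in practice, here is a minimal sketch that encodes a string as UTF-8 and then decodes the bytes as ISO 8859-1, which is roughly what happens when one layer of the stack silently falls back to the Latin-1 default. The class and method names are hypothetical and are not part of the sample application.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, only meant to reproduce the symptom.
final class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes the bytes as ISO-8859-1,
    // mimicking a layer that wrongly assumes the Latin-1 default.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Reverses the mistake: recovers the original text from the garbled one.
    static String repair(String garbled) {
        byte[] rawBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";                 // "caf" followed by e-acute
        String garbled = decodeUtf8AsLatin1(original); // e-acute becomes two characters
        System.out.println(garbled.length());          // 5 instead of 4
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

Seeing two odd characters where one accented letter was expected is the typical sign that a layer decoded UTF-8 bytes as Latin-1.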

Configuring host locale

UTF-8 needs to be configured in the locale environment of the host, since both Apache2 and Tomcat rely on the associated environment variables. The variables to set are listed below:

export LANG=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
export LC_NUMERIC=en_US.UTF-8
export LC_TIME=en_US.UTF-8
export LC_COLLATE=en_US.UTF-8
export LC_MONETARY=en_US.UTF-8
export LC_MESSAGES=en_US.UTF-8
export LC_PAPER=en_US.UTF-8
export LC_NAME=en_US.UTF-8
export LC_ADDRESS=en_US.UTF-8
export LC_ALL=

A .bash_profile file can be added to the user's home directory to set them automatically. The current values of these variables can be displayed with the locale command, as shown below:

$ locale
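The same check can be made from the Java side. The sketch below prints the charset the JVM uses when no encoding is given explicitly, which on JDKs before 18 is derived from the locale variables above (from JDK 18 onwards the default is UTF-8 regardless of the locale). The class name is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, to be run with the same user and
// environment as the Tomcat process.
final class DefaultCharsetCheck {

    // Returns the name of the charset used when none is specified explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset = " + defaultCharsetName());
        // Printed only as diagnostics; either value may be null.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("LANG            = " + System.getenv("LANG"));
    }
}
```

If the first line does not report UTF-8, the environment seen by the JVM is probably not the one configured in .bash_profile.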

Configuring Apache2 Web server

Configuring UTF-8 in Apache2 is really simple. It is done with the AddDefaultCharset directive in the charset file within the /etc/apache2/conf.d folder, as shown below:

AddDefaultCharset UTF-8

Be sure to restart the server afterwards with /etc/init.d/apache2 restart.

Configuring mod_jk

There are no specific options for configuring UTF-8 in mod_jk. However, setting the right options in the jk.load file located in the /etc/apache2/mods-enabled folder avoids encoding problems, as shown below:

LoadModule jk_module /usr/lib/apache2/modules/

JkWorkersFile /etc/apache2/
JkLogFile /var/log/apache2/mod_jk.log
JkLogLevel debug
JkLogStampFormat "[%a %b %d %H:%M:%S %Y] "
JkOptions +ForwardURIEscaped

Configuring Tomcat

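Some attributes also need to be set on the AJP connector used by mod_jk, in the server.xml file located in the conf folder of the Tomcat distribution. URIEncoding sets the charset used to decode the request URI, and useBodyEncodingForURI makes Tomcat apply the request body encoding to query-string parameters: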

<Server port="9105" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="9109"
          enableLookups="false" redirectPort="9543" protocol="AJP/1.3"
          request.registerRequests="false" URIEncoding="UTF-8"
          useBodyEncodingForURI="true" compression="off"/>

    <Engine name="Catalina" defaultHost="localhost">
      <Host name="localhost" unpackWARs="true" autoDeploy="true"
            deployOnStartup="true" appBase="webapps"
            xmlValidation="false" xmlNamespaceAware="false"/>
    </Engine>
  </Service>
</Server>
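The effect of URIEncoding can be reproduced outside the container. The following sketch decodes the same percent-encoded query value once as UTF-8 and once as ISO 8859-1. It uses java.net.URLDecoder as a stand-in for Tomcat's internal URI decoding, which is an assumption made for illustration only, and the class name is hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Tomcat code.
final class UriDecodingDemo {

    // Decodes a percent-encoded value using the given charset.
    static String decode(String encoded, Charset charset) {
        try {
            return URLDecoder.decode(encoded, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Cannot happen: the name comes from an existing Charset instance.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for "caf" + e-acute when the page is UTF-8.
        String raw = "caf%C3%A9";
        System.out.println(decode(raw, StandardCharsets.UTF_8).length());      // 4
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1).length()); // 5
    }
}
```

With the Latin-1 charset the two bytes of the accented letter are read as two separate characters, which is the garbling a connector without URIEncoding="UTF-8" would be expected to produce.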

Web application configuration

At the Web application level, the UTF-8 charset must be set on both the request and the response, directly on the HttpServletRequest and HttpServletResponse objects. A filter is a good place for this, since it applies to the whole Web application. Such a filter is shown below:

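import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

public class Utf8EncodingFilter implements Filter {
    public void destroy() {}
    public void init(FilterConfig config) throws ServletException {}

    // Body missing from the original listing; assumed from the text above.
    private void doBeforeProcessing(ServletRequest request,
            ServletResponse response) throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
    }

    // Body missing from the original listing; assumed to be empty.
    private void doAfterProcessing(ServletRequest request,
            ServletResponse response) throws IOException, ServletException {
    }

    public void doFilter(ServletRequest request, ServletResponse response,
                        FilterChain chain) throws IOException, ServletException {
        doBeforeProcessing(request, response);
        Throwable problem = null;

        try {
            chain.doFilter(request, response);
        } catch (Throwable t) {
            problem = t;
        }

        doAfterProcessing(request, response);
        // Assumed ending: rethrow whatever the chain raised.
        if (problem instanceof IOException) throw (IOException) problem;
        if (problem instanceof ServletException) throw (ServletException) problem;
        if (problem != null) throw new ServletException(problem);
    }
}
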

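The Spring framework also provides such a filter out of the box through the CharacterEncodingFilter class (org.springframework.web.filter.CharacterEncodingFilter). It is declared in the web.xml file of your application like any other filter, with its encoding init parameter set to UTF-8 and, if the encoding must override one already set on the request, its forceEncoding init parameter set to true.
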

Also make sure that the returned Content-Type header always carries the UTF-8 charset hint. For example, for HTML content:

Content-Type: text/html; charset=UTF-8

Although this can be set programmatically on the HttpServletResponse object, in JSP pages it is configured with the page directive, as described in the next section.

JSP / Tiles views configuration

JSP pages specify their encoding at the top of the file with the page directive, as shown below:

<%@page language="java" contentType="text/html; charset=UTF-8"%>
<%@ taglib prefix="c" uri=""%>
<%@ taglib uri="" prefix="tiles"%>


Be sure that every JSP page involved in a Tiles view sets this directive.

Database driver

When creating a database, you can usually specify its encoding. In our case we obviously choose UTF-8, but that may not be enough, since some databases also require configuration at the driver level.

That’s the case for MySQL: some parameters related to UTF-8 need to be set on the JDBC URL even if the target database is already configured for UTF-8:

  • useUnicode – enables Unicode handling in the driver
  • characterEncoding – specifies the character encoding to use

On the JDBC URL, both are passed as query parameters after the database name, in the form ?useUnicode=true&characterEncoding=UTF-8.
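As an illustration, the sketch below appends the two parameters to a base MySQL JDBC URL. The host, port and database name are hypothetical, the helper is not part of any driver API, and no connection is opened.

```java
// Hypothetical helper that only builds the URL string.
final class MysqlUrlDemo {

    // Appends the two Unicode-related parameters to a base MySQL JDBC URL,
    // using "?" or "&" depending on whether parameters are already present.
    static String withUtf8(String baseUrl) {
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Hypothetical host, port and database name.
        System.out.println(withUtf8("jdbc:mysql://localhost:3306/mydb"));
    }
}
```

The resulting string is what would be passed to DriverManager.getConnection or set as the url property of the data source.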
