Reading special characters such as Umlauts from .dbf

Jan 24, 2012 at 8:26 AM

Sorry, I am new here and may have posted this at the wrong spot (Comments instead of Discussions). So here I go again.

I am having trouble with special characters such as Umlauts in .dbf meta-data. They display just fine when viewed with a .dbf reader, but when imported using Catfood, they are replaced with junk (e.g. "Baden-Württemberg" becomes "Baden-Wⁿrttemberg"). Is there a parameter (maybe in the connection string) I can tweak to correctly read these characters?

Jan 28, 2012 at 4:49 PM

1.40 supports using the ACE driver instead of JET. Could you try that and let me know if it makes a difference?

Jan 28, 2012 at 7:29 PM
The ACE driver is the first thing I tried after discovering the problem. Unfortunately, it has the same issue as the Jet driver.

Dr. Edgar R. Knapp, 3T Communications AG.
Sent from my android tablet.
Feb 26, 2012 at 10:38 PM

I haven't found anything that can be done with the connection string. There's a thread at http://forums.esri.com/Thread.asp?c=93&f=1170&t=197185 with some pointers but after some quick experimentation I can't get sensible names back from your file. Let me know if you come up with anything.

Feb 27, 2012 at 9:41 AM

I am not sure what to do with the code page parameters. What makes me sure that a solution is not that hard to come by is the fact that DBF Viewer 2000 (shareware), DBF Viewer (freeware), ArcGIS Explorer, etc. are able to interpret the metadata just fine and display the gamut of special characters in all their glory. Unfortunately, I can’t spend any more time on this.

BTW: Thanks for the 1.50 update.

Feb 29, 2012 at 3:38 PM

I have found an ugly work-around. Change the registry setting at

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase\DataCodePage

to "ANSI" instead of "OEM" and, voilà, Umlauts appear.

This ought to be settable through the Extended Properties of the Connection String, but I could not get it to work.
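
Before changing anything, it may help to check what the setting currently is. The following is a minimal read-only sketch, assuming that DataCodePage is a value under the Xbase key (as the path above suggests) and that the program runs on Windows. The class name XbaseCodePageCheck is made up for illustration. On 64-bit Windows a 32-bit and a 64-bit process may see different registry views, so the result can depend on how the program is built.

```csharp
using System;
using Microsoft.Win32;

// Hypothetical helper, not part of the library. Windows only.
public static class XbaseCodePageCheck
{
    // Assumption: "DataCodePage" is a value stored under this key.
    private const string KeyPath =
        @"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Xbase";

    // Returns the current setting (for example "OEM" or "ANSI"),
    // or null when the key or the value is not present.
    public static string ReadDataCodePage()
    {
        object value = Registry.GetValue(KeyPath, "DataCodePage", null);
        return value == null ? null : value.ToString();
    }
}
```

Calling XbaseCodePageCheck.ReadDataCodePage() before and after the registry edit shows whether the change is visible to the process that opens the .dbf file.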

Jun 26, 2014 at 7:53 PM
I have a patch that allows the Shapefile reader to handle accented characters.

It appears that the OLE DB dBase IV driver uses the IBM437 (the original IBM PC charset!) encoding by default when it reads the DBF files. Unfortunately, ESRI specifies ISO8859-1. All the patch does is convert the name/value strings from IBM437 encoding to ISO8859-1 encoding and voilà, I can use all the standard accented characters in place names, etc.

Here's the patch for Shapefile.cs

Around line 35, add the following lines:
        public const string ConnectionStringTemplateAce = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=dBase IV";

        // ESRI shapefiles use ISO-8859-1 as their character encoding (code page 28591), but OleDB seems to be using IBM437.
        // We translate the strings read by the DBase IV engine from one encoding to another.
        // Note: Some Shapefiles use other encodings.  For example, use code page 65001 for UTF-8.  You can find a list at
        // http://msdn.microsoft.com/en-ca/library/windows/desktop/dd317756%28v=vs.85%29.aspx
        // or google for "code page 28591 65001" or a bunch of others.
        private readonly Encoding OleDbEncoding = Encoding.GetEncoding(437);
        private readonly Encoding ShapefileEncoding = Encoding.GetEncoding(28591);

        private const string DbSelectStringTemplate = "SELECT * FROM [{0}]";
and around line 400:
                    for (int i = 0; i < _dbReader.FieldCount; i++)
                    {
                       metadata.Add(_dbReader.GetName(i),
                            _dbReader.GetValue(i).ToString());
                    }
becomes
                    for (int i = 0; i < _dbReader.FieldCount; i++)
                    {
                        String name = _dbReader.GetName(i);
                        String value = _dbReader.GetValue(i).ToString();

                        // OleDB reads strings in using the IBM437 (Original OEM!) encoding.  ESRI usually
                        // uses 8859-1.  Convert from IBM437 to the specified encoding.
                        name = ShapefileEncoding.GetString(OleDbEncoding.GetBytes(name));
                        value = ShapefileEncoding.GetString(OleDbEncoding.GetBytes(value));

                        metadata.Add(name,value);
                    }
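
The conversion in the patch above can be checked in isolation, without a .dbf file. The following is a minimal sketch, assuming a runtime where code page 437 is available: on .NET Core / .NET 5+ that means registering CodePagesEncodingProvider first, while on .NET Framework the registration line is unnecessary and can be removed. The class EncodingRoundTrip and its method names are made up for illustration and are not part of Shapefile.cs.

```csharp
using System;
using System.Text;

// Hypothetical helper for experimenting with the patch's conversion.
public static class EncodingRoundTrip
{
    private static readonly Encoding OleDbEncoding;      // IBM437 (OEM)
    private static readonly Encoding ShapefileEncoding;  // ISO-8859-1

    static EncodingRoundTrip()
    {
        // Needed on .NET Core / .NET 5+ only; remove on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        OleDbEncoding = Encoding.GetEncoding(437);
        ShapefileEncoding = Encoding.GetEncoding(28591);
    }

    // Simulates the misreading: ISO-8859-1 bytes decoded as IBM437.
    public static string Garble(string text)
    {
        return OleDbEncoding.GetString(ShapefileEncoding.GetBytes(text));
    }

    // Same conversion as the patch: recover the original bytes via IBM437,
    // then decode them as ISO-8859-1.
    public static string Repair(string text)
    {
        return ShapefileEncoding.GetString(OleDbEncoding.GetBytes(text));
    }
}
```

With this, EncodingRoundTrip.Garble("Baden-Württemberg") gives "Baden-Wⁿrttemberg", the junk from the first post, and EncodingRoundTrip.Repair turns it back into "Baden-Württemberg". The repair only makes sense for files whose bytes really are ISO-8859-1. A UTF-8 file would need a different target encoding.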
Jan 9, 2015 at 11:05 AM
I had the same problem with a UTF-8 encoded shapefile. After some searching I found a very simple solution: just change the connection string to match the original character encoding. Here's my change:
public const string ConnectionStringTemplateJet = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"dBase IV;CharacterSet=65001;\"";
Worked like a charm! Obviously this is a hack - the class interface should provide a means to set the original character set code.
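
One way the code page could be made a parameter instead of being hard-coded is sketched below. This is only an illustration, not the library's interface: the class DbfConnectionStrings and the method BuildJet are hypothetical, and the sketch assumes the CharacterSet extended property behaves as reported in this post (earlier posts in this thread could not get a connection string setting to work).

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not part of the library's actual interface.
public static class DbfConnectionStrings
{
    // Builds a Jet connection string for the directory containing the .dbf.
    // When codePage is given (e.g. 65001 for UTF-8, 28591 for ISO-8859-1),
    // it is passed through the CharacterSet extended property.
    public static string BuildJet(string directory, int? codePage = null)
    {
        string extended = codePage.HasValue
            ? string.Format(CultureInfo.InvariantCulture,
                "dBase IV;CharacterSet={0};", codePage.Value)
            : "dBase IV";

        return string.Format(CultureInfo.InvariantCulture,
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source={0};Extended Properties=\"{1}\"",
            directory, extended);
    }
}
```

For example, DbfConnectionStrings.BuildJet(@"C:\data", 65001) yields the same string as the hard-coded template above with C:\data as the data source. The sketch does not escape special characters in the directory path.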