Thursday, March 08, 2012

PostgreSQL dump/restore and client_encoding

I started to look into EnterpriseDB recently. Pretty pleased with it so far. At first I launched the beta version of their Postgres Plus Cloud Database product, but since this version is in the process of being decomissioned, I've had to transfer the database I had already created to a newly created cluster in their DBaaS model -- which basically means that the cluster manager is maintained by them, and the cluster member servers (1 master + N replicas) are part of your EC2 footprint.

In any case, I did a pg_dump of the database from the initial master, then I tried to load the dump via psql into a newly created database on the new master. However, the client_encoding parameter in postgresql.conf was SQL_ASCII on the first master, and UTF8 on the second. This posed a problem. The psql load operation failed with errors of the type:

ERROR:  invalid byte sequence for encoding "UTF8": 0xe92044
CONTEXT:  COPY table1, line 6606
ERROR:  invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT:  COPY table2, line 978
ERROR:  invalid byte sequence for encoding "UTF8": 0xd454
CONTEXT:  COPY table3, line 3295

Obviously the encodings didn't match. I first tried to re-run the pg_dump on the first master (which had client_encode = 'SQL_ASCII') and specified "--encoding utf8" on the pg_dump command line. This didn't do the trick. I had the same exact errors when loading the dump on the second master.

One solution suggested by EnterpriseDB was to set client_encoding to SQL_ASCII on the new master, restart Postgres and retry the load. I found another solution though in a blog post very aptly titled (for my purposes) 'PostgreSQL database migration, the SQL_ASCII to UTF8 problem'. What I ended up doing, following the advice in the post, was to install the GNU recode utility (I did a yum install recode), then run the initial dump through recode, converting it from ascii to utf8. Something like this:

cat dump.sql | recode iso-8859-1..u8 > utf8_withrecode.sql

Then I modified the line

SET client_encoding = 'SQL_ASCII';

and turned it into:

SET client_encoding = 'UTF8';

after which loading the dump into the new master with psql worked like a charm. for the fun part of doing some load testing against this Postgres cluster!

1 comment:

jaylen watkins said...

Thanks for this interesting one. It is quite helpful update.