Thursday, March 08, 2012

PostgreSQL dump/restore and client_encoding

I started to look into EnterpriseDB recently. Pretty pleased with it so far. At first I launched the beta version of their Postgres Plus Cloud Database product, but since this version is in the process of being decomissioned, I've had to transfer the database I had already created to a newly created cluster in their DBaaS model -- which basically means that the cluster manager is maintained by them, and the cluster member servers (1 master + N replicas) are part of your EC2 footprint.

In any case, I did a pg_dump of the database from the initial master, then I tried to load the dump via psql into a newly created database on the new master. However, the client_encoding parameter in postgresql.conf was SQL_ASCII on the first master, and UTF8 on the second. This posed a problem. The psql load operation failed with errors of the type:

ERROR:  invalid byte sequence for encoding "UTF8": 0xe92044
CONTEXT:  COPY table1, line 6606
ERROR:  invalid byte sequence for encoding "UTF8": 0xa0
CONTEXT:  COPY table2, line 978
ERROR:  invalid byte sequence for encoding "UTF8": 0xd454
CONTEXT:  COPY table3, line 3295

Obviously the encodings didn't match. I first tried to re-run the pg_dump on the first master (which had client_encode = 'SQL_ASCII') and specified "--encoding utf8" on the pg_dump command line. This didn't do the trick. I had the same exact errors when loading the dump on the second master.

One solution suggested by EnterpriseDB was to set client_encoding to SQL_ASCII on the new master, restart Postgres and retry the load. I found another solution though in a blog post very aptly titled (for my purposes) 'PostgreSQL database migration, the SQL_ASCII to UTF8 problem'. What I ended up doing, following the advice in the post, was to install the GNU recode utility (I did a yum install recode), then run the initial dump through recode, converting it from ascii to utf8. Something like this:

cat dump.sql | recode iso-8859-1..u8 > utf8_withrecode.sql

Then I modified the line

SET client_encoding = 'SQL_ASCII';

and turned it into:

SET client_encoding = 'UTF8';

after which loading the dump into the new master with psql worked like a charm. for the fun part of doing some load testing against this Postgres cluster!

1 comment:

jaylen watkins said...

Thanks for this interesting one. It is quite helpful update.


Modifying EC2 security groups via AWS Lambda functions

One task that comes up again and again is adding, removing or updating source CIDR blocks in various security groups in an EC2 infrastructur...