Never been to CodeSnippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world (or not, you can keep them private!)

Convert a db to UTF8 after upgrading to MySQL 4.1 (See related posts)

If you've ever used a UTF8 application on a pre-4.1 MySQL server, or never cared about encodings on a 4.1 setup even, you may have a non-UTF8 database containing UTF8 data. While this doesn't bother most applications (e.g. PHP weblogs), it's not clean and you can't sort properly with any non-Western characters. This procedure will fix it:

mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql
chgrep latin1 utf8 dump.sql
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The chgrep part is important, because the table definitions in your dump will likely have "latin1" preserved. If you don't have chgrep, you may use any search-and-replace capable editor, but remember that it must open and save UTF8 properly. Edit: Instead of 'chgrep', you can use 'sed' e.g.:

sed -i "" 's/latin1/utf8/g' dump.sql


Alternatively you may attach "--skip-create-options" to the mysqldump command, but that could omit some needed options (e.g. PACK_KEYS=1 etc.).

You may change the utf8_general_ci collation to whatever you need, e.g. utf8_czech_ci for my purposes.

Edit: Fixed the typo as per the comment below.

Comments on this post

vanweerd posts on Feb 20, 2006 at 14:10
Note the typo --default_character-set in first line should b --default-character-set (as it is in line 4)
xy77 posts on Jul 18, 2006 at 11:49
It worked for me to dump the db using the command
mysqldump --user=username --password=password --default-character-set=latin1 --skip-set-charset dbname > dump.sql

emptying the db:
mysql --user=username --password=password --execute="DROP DATABASE dbname; CREATE DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;"

and reimporting the undmodified dump.sql using
mysql --user=username --password=password --default-character-set=utf8 dbname < dump.sql


The sed command didn't work for me, I get an error:
sed: couldn`t read s/latin1/utf8/g : No such file or directory

- David
lokus posts on Jul 20, 2006 at 07:19
I believe the instruction and method here is more straight forward than the dump and import.
http://dev.mysql.com/doc/refman/4.1/en/charset-conversion.html
sirn posts on Aug 08, 2006 at 19:56
for those who can't use the provided sed command, use this instead:

sed -r 's/latin1/utf8/g' dump.sql > dump_utf.sql
UltraKnorrie posts on Aug 05, 2008 at 11:02
A Quick add-on to this useful post.

I was using PhPMyAdmin, sincee I got no direct access to the server (mysqldump was not possible).
Here's my solution: on the old server running MySQL 3.xx use the export function to grab your data & structure (do both separately).
Now, as meentioned earlier, the charset of early MySQL DBs was by default latin1_swedish_ci.
So, on your new server running MySQL 4.xx or 5.xx (and also a newer version of PhPMyAdmin), select "latin1_swedisch_ci" as connection collation in the main screen of PhPMyAdmin.
Next, create you DB and select utf8_unicode_ci as collation.
Next, import your structure using charset = latin1
Next, import your data using charset = latin1

A browse on your tables should show that all accents are correctly stored in your database now.

Best,
Sarge posts on Mar 25, 2009 at 20:23
You can also use sed like this:
sed -i 's/latin1/utf8/g' dump.sql

You need to create an account or log in to post comments to this site.