I thought I’d mention the sort of database-related work I’ve done on contract, in case any of you might be needing some help, or just so you can be awed by my mad skills. I haven’t done a lot of contract work, but there are a lot of blogs and CMS’s out in the world, and sometimes things go wrong.
I just did some work for my friend Kimli, partially because it seemed like a good challenge, but mostly because she was trying to manually type in 2449 blog entries (omfg! 2449!) from a Windows-based blogging software she’d used from 2001 to 2006 into WordPress. This is the sort of thing that makes me weep: why do anything tedious by hand that you can automate? I demanded that she stop and send me her backup file of blog posts, which turned out to be a “CSV” (comma-separated values) file with entries similar to this:
219, “It was a dark and stormy night. The captain of the ship yelled, “”Tell me a story!”" and the story goes like this:
It was a dark and stormy night. The captain of the ship yelled…”, “A silly recursive joke”, 2001-11-01 10:54:33
This is why I double-quoted CSV: this ain’t no proper CSV, because an entry can span more than one line. Not to mention, all the “” within the post text are not going to make MySQL happy when it reads it, since double-quotes are supposed to determine the start and end of the field.
There was also a Windows-to-not-Windows text issue, where some posts had content like so:
On tonightís menu: chicken strips with a sweet yet zesty red chili sauce, sautÈed garlic mushroom caps, roasted potatoes.
The “í” here is supposed to be a ‘ (single-quote), but instead it’s some weird-looking i. And that “È” should be an “é”, and that’s not just a case issue since the accent’s pointing the wrong way. This problem was everywhere, because Kimli seems to like using French words a lot. Damn her and her hoity-toity vocabulary. Unfortunately, the dos2unix utility did not help me here, and I’ve dealt with this problem before.
So, by the wonders of regular expressions (Everybody stand back!) I fixed all of this: all the in-the-middle-of-the-post line breaks became <br />, all the “” became \”, and all the improper characters were mapped back to what they were supposed to be (which required the most time at all, since I had to google what a “Caga TiÛ” might possibly be if it were spelled correctly). There were some other issues, too, but these were most of them.
So I ended up with a file that looked something like this:
219, “It was a dark and stormy night. The captain of the ship yelled, \”Tell me a story!\” and the story goes like this: <br />It was a dark and stormy night. The captain of the ship yelled…”, “A silly recursive joke”, 2001-11-01 10:54:33
Now I had to map these values to blog entries in the WordPress database schema. Time passes. Once I got that figured out, the command I ran to import them into the database was this:
load data infile "fixedbackupfile.csv" into table wp_posts
fields terminated by ","
optionally enclosed by "\""
escaped by "\\"
lines terminated by '\n'
ignore 1 lines
(@var1, post_content, post_title, post_date)
set post_excerpt = left(100, post_content), post_author = 1;
What this all means is irrelevant; doesn’t it look complicated? There was some other stuff to be done to make WordPress work right with the new data, and add categories based on the post year, but it wasn’t much. And what you have is delicious juice dot archives, as Kimli wanted.
Now, none of this is brilliant (though if you want to be impressed, feel free), but it’s the sort of work that requires DBA or coder skills plus a good knowledge of MySQL. And I set up a testing environment to do all this in before pushing it live (woo, look at me, my own home testing environment). And I’ve certainly had more difficult tasks in my full-time work, but they’re even more boring to read about, believe me.
So this is one of the things DBAs work on. It’s often referred to as magical (Kimli agrees). A friend jokes that I need a “DBA hat”, which he insists is like a wizard hat, but Harry Potter doesn’t wear his anymore (I guess because it doesn’t look cool) so I don’t see why I have to.