If you manage a Web site for more than a few months, you run into problems of content rot. You’ll be cruising through some old pages, and you’ll find stuff that’s…off, for one reason or another.
For instance, when this blog first started, I was anal-retentive about enclosing BLOCKQUOTEd text in quotes. It was a quote, after all. I would go through all the text I quoted, find double quotes, convert them to singles, then surround the entire thing in double-quotes before BLOCKQUOTEing the entire thing.
Now, this was very admirable of me, but when I started inviting others to blog with me, that whole concept broke down. Not everyone was doing it, and since it wasn’t consistent, I didn’t want to do it at all. However, there are still a thousand or so entries sitting out there with quotes around them.
Just recently, we started to standardize the code fragments we post by using the CODE tag and the SimpleCode script. There remain, however, a hundred or so posts with code hacked up in BLOCKQUOTEs or DIVs or God knows what.
These aren’t isolated cases – there are styles that we’ve since abandoned, double-dashes that haven’t been replaced with the en dash (–) entity, etc. I try to nail these things as entries hit the site, but I miss some. On top of all this, throw in link rot – links that just 404 over time – and comments. Ugh, comments…
I try to stay on top of comment spam, but I’m sure some gets through. Additionally, there are stupid comments that slip by (why do people insist on testing my comment form with ‘fgfgfgfgfgf’ all the time?), and comments that aren’t relevant any longer – people complaining about bad links that I’ve fixed or misspellings that I’ve corrected.
Categorization is another thing. I added the Temple of Mac category at about entry #1,600. However, I didn’t bother to go back through all the old entries and move all the Mac-related entries to the new category.
Mix all this together, and you have a site that doesn’t really age well. I’m sure if I tooled through 100 old entries, I’d have something that needed to be fixed or corrected in at least 40 of them. How do you handle this? Gadgetopia is hurtling toward entry number 3,000, and that’s a lot of volume.
I’ve often thought that I should create a script that just generated 10 random entries a day for me to review. Each morning, I’d get an email with 10 entries in it that I need to look over and touch up. But how do you make sure you get them all before you start getting duplicates? I suppose you could log them all in a table and then join the entries table against it to filter out entries that had already been covered. Like this:
SELECT e.entry_id FROM mt_entries e LEFT JOIN already_reviewed r ON e.entry_id = r.id WHERE r.id IS NULL ORDER BY RAND() LIMIT 10;
(I haven’t tested this SQL, mind you.) Wrap some PHP around this, schedule it for the middle of the night, and you’d have 10 entries every morning that you can tune up. Perhaps I’d send 10 to myself, and three or so to each of the rest of the authors.
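For anyone who wants to poke at the idea before wiring up a real mailer, here is a rough sketch of the pick-and-log step. It is written in Python against an in-memory SQLite database rather than PHP and MySQL, purely so it runs on its own; SQLite spells the random function RANDOM() instead of RAND(). The table layout, the entry_title column, and the function names pick_unreviewed and demo are all assumptions made up for illustration, and the emailing part is left out entirely.

```python
import sqlite3


def pick_unreviewed(conn, batch_size=10):
    """Return up to batch_size random entry ids that have not been reviewed,
    and record them in already_reviewed so they are not picked again."""
    rows = conn.execute(
        """
        SELECT e.entry_id
        FROM mt_entries e
        LEFT JOIN already_reviewed r ON e.entry_id = r.id
        WHERE r.id IS NULL
        ORDER BY RANDOM()
        LIMIT ?
        """,
        (batch_size,),
    ).fetchall()
    ids = [row[0] for row in rows]
    # Log the batch so tomorrow's run skips these entries.
    conn.executemany(
        "INSERT INTO already_reviewed (id) VALUES (?)",
        [(entry_id,) for entry_id in ids],
    )
    conn.commit()
    return ids


def demo():
    """Build a toy 25-entry database (hypothetical schema) and drain it in batches."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mt_entries (entry_id INTEGER PRIMARY KEY, entry_title TEXT)"
    )
    conn.execute("CREATE TABLE already_reviewed (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO mt_entries (entry_id, entry_title) VALUES (?, ?)",
        [(i, f"Entry {i}") for i in range(1, 26)],
    )
    batches = []
    while True:
        batch = pick_unreviewed(conn)
        if not batch:  # nothing left to review
            break
        batches.append(batch)
    return batches


if __name__ == "__main__":
    for day, batch in enumerate(demo(), start=1):
        print(f"Day {day}: {len(batch)} entries to review")
```

Because each batch is logged as soon as it is picked, no entry comes up twice, and the last batch is simply whatever is left over. Once the query comes back empty, you could clear already_reviewed and start another lap.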
I think, however, I’m going to try something different. I’m on the verge of putting another sidebar on the front page called “One Year Ago Today” that lists the things we were talking about a year ago (see the OnThisDay plugin). I’ll schedule an automatic rebuild of the front page every morning at 1:00 a.m., then check the year-old entries while I’m eating my Crunchy Corn Bran in the morning.
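The date logic behind a “One Year Ago Today” list is small enough to sketch. This is not how the OnThisDay plugin does it (I haven’t looked); it is just a Python illustration, and the names one_year_before and entries_from_a_year_ago, along with the (id, date) pairs, are made up for the example. The only wrinkle is February 29, which has no counterpart in the previous year, so this version falls back to February 28.

```python
from datetime import date


def one_year_before(today):
    """Return the same calendar day one year earlier."""
    try:
        return today.replace(year=today.year - 1)
    except ValueError:
        # Feb 29 does not exist in the previous year; use Feb 28 instead.
        return today.replace(year=today.year - 1, day=28)


def entries_from_a_year_ago(entries, today):
    """Filter (entry_id, posted_on) pairs down to those posted a year before today."""
    target = one_year_before(today)
    return [entry_id for entry_id, posted_on in entries if posted_on == target]


if __name__ == "__main__":
    # Hypothetical sample data: (entry_id, date posted)
    sample = [
        (101, date(2004, 6, 15)),
        (102, date(2004, 6, 16)),
        (103, date(2004, 6, 15)),
    ]
    print(entries_from_a_year_ago(sample, date(2005, 6, 15)))
```

Note that entries posted on February 28 would show up on both February 28 and February 29 of a leap year, which seems harmless for a review list.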
Maybe this will work, maybe it won’t. If someone wants to take a stab at the mailer script (or if you already have), please post a link. If anyone else has any thoughts about content rot, let’s hear them.