Ferodynamics Network

January 13, 2009

Hooray, a new project called Google Converters aims to produce simple converters to move your blog content between platforms. That’s great! Right?

I certainly applaud them for grasping the nettle, but I fear their efforts might further postpone the wider solution: a universal XML specification that all blog platforms use to import and export content.

At the moment each platform provides various means to move content from one place to another. Click on the WordPress tools menu, for example, and you will find tools to import from Blogger, Textpattern, Movable Type and TypePad, LiveJournal, Greymatter (no, me neither), DotClear, and Blogware. They don’t work across all versions, and it is up to the developers of each platform to find ways of extracting the data from the others and converting it.

The advantage of doing this centrally, via Google Converters, is that someone else does it for us. But I think exporting to a standardised format should be pretty high up the list of requirements for any platform, and it seems to me (by all means disagree) that this shouldn’t be that hard. Surely at the heart of all these systems is a blog, and blogs are blogs, right?

I have heard this mentioned time and time again and the point is always raised that there is little incentive for open source developers to focus their efforts on something like this. For that reason, unless there is already a better alternative out there that someone can point me to, I am going to step up to this. I don’t know anything about formal specifications, maybe an informal one will do, but I can write a WordPress plugin, and I can write a Habari plugin, and perhaps those two platforms will be enough to get us started.

Update: I have created a Google Code project and a Google Group for this:

http://groups.google.com/group/blog-content-interchange-format-group

http://code.google.com/p/blog-content-interchange-format/

Thursday, 8am
Andrew Rickmann

I must admit that I am more inclined to start with a blank sheet of XML, but I will investigate Atom and AtomPub further to see if it is easier to do it that way. What is in my mind is that the extensions will mean it is no longer compatible with existing importers, so I wonder how beneficial it would really be. Plugins would still need to be built.

Thursday, 8am
Andrew Rickmann

While I can see the benefits I am going to avoid feature creep. My key goal is to do something very very simple that just works for most ordinary users. If someone wants to write a forum plugin that pulls the data from the format then that is great, but I won't be addressing anything specific for forums.

Thursday, 6am
Ryan

Great idea. Setting up a system which is compatible with forums as well would be good. So you could convert your blog posts with comments to forum topics with replies and vice versa. In fact forums need a universal system more so than blogs IMO.

Could the existing WordPress system be adapted for this? Couldn't you just make a Habari plugin which does the same thing as the built in WordPress one?

Thursday, 5am
michaeltwofish

There may also be problems maintaining timestamps, because an AtomPub server can choose to change or ignore things like the client-supplied time for 'updated'.

Thursday, 5am
michaeltwofish

No, because Atom, by default, doesn't encode everything that can be expressed in each blogging platform. Owen gives a good example: Habari's metadata for posts and comments. But it does encode information about authors and content and categories (which Habari uses as tags) and much more. In theory, you could use an AtomPub client to pull down all your posts from your old blog (export) and simply POST them to your new blog (import). I don't know if there are clients that support that at the moment, however. (As an aside, PUT is for editing, POST is for creating.)
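
As a rough illustration of the export-then-POST idea, here is a minimal Python sketch that builds an Atom entry and prepares (but does not send) the POST request that would create it on the new blog. The function names, the collection URL and the sample post are all made up for this example; a real client would also need authentication, error handling, and a way to discover the collection URL from the server's service document.

```python
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialise Atom as the default namespace


def build_entry(entry_id, title, content, updated, author, tags):
    """Serialise one blog post as an Atom entry and return it as bytes."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    author_el = ET.SubElement(entry, f"{{{ATOM_NS}}}author")
    ET.SubElement(author_el, f"{{{ATOM_NS}}}name").text = author
    # Tags travel as atom:category elements, one per tag.
    for tag in tags:
        ET.SubElement(entry, f"{{{ATOM_NS}}}category", {"term": tag})
    content_el = ET.SubElement(entry, f"{{{ATOM_NS}}}content", {"type": "html"})
    content_el.text = content
    return ET.tostring(entry, encoding="utf-8")


def prepare_post(collection_url, entry_xml):
    """Build the POST request that would create the entry. Nothing is sent."""
    return urllib.request.Request(
        collection_url,
        data=entry_xml,
        headers={"Content-Type": "application/atom+xml;type=entry"},
        method="POST",  # POST creates; PUT to the entry's edit URL would update
    )


# Hypothetical post and collection URL, for illustration only.
entry_xml = build_entry(
    "urn:example:post-1", "Hello", "<p>Hi</p>",
    "2009-01-13T08:00:00Z", "Andrew", ["habari", "wordpress"],
)
request = prepare_post("http://example.com/atom/collection", entry_xml)
```

Passing `request` to `urllib.request.urlopen` would actually send it. As noted in the comment above this one in time order, the server is free to ignore or change client-supplied values such as `updated`.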

Wednesday, 6pm
Andrew Rickmann

Is there a reason why Atom isn't already used as a default format? If you can get information from a blog and put that information into another then isn't the work already done? Isn't that the universal format?

Wednesday, 3pm
ringmaster

I also don't understand WXR extending RSS. Extending Atom makes more sense because it's transactional — if you HTTP GET an Atom document, make changes to it and then POST it back to the same URL (with the proper credentials) it should cause the system to update with those changes. That's a nice feature. It uses actual HTTP verbs. Imagine an export format that also allowed for that, but could be used for author data and other site metadata. That's actually one thing that we were thinking of doing with Habari early on; building the admin as an Atom front-end that worked transactionally with any number of compatible back-ends.

Also a benefit of using an existing format is that there are a lot of people working on the standard, and extending it is more trivial than building something from scratch. Take a look at the work that went into the Atom spec to get an idea of what it takes to build something that will work just that well.

Simple would be nice, although in that case I'd emphasize the need to have someplace to collaborate on how extensions to that simple baseline work. You definitely don't want it to be difficult for developers from any platform to join the effort and exchange ideas.

Wednesday, 8am
Andrew Rickmann

I will take a look at it. My main requirement is that it just works and is really simple for normal people to understand. I don't really know anything about AtomPub as I have never had cause to use it.

Wednesday, 8am
Andrew Rickmann

Thanks Owen. There is much to think about there. My first thought was that it should just use plain XML. I never really understood why WXR should extend RSS instead of just being an XML file with the necessary data in it.

I think it is very possible to make it very complicated and most of all I want to keep it simple and achieve the basics. If it really needs extending beyond that it can be.

As for licensing, I plan to just let anyone do whatever they want with it.

Wednesday, 1am
Heiner

Just a little side note, the garden path evidently is the path to glory ;-)

Wednesday, 1am
michaeltwofish

I think AtomPub is a good place to start, as most (all?) platforms already have support for at least Atom syndication, it's an open, documented standard, the Atom community is helpful and open, it's extensible, and has been successfully extended. It may not be perfect, but starting again with something else is very much likely to end up with something a whole lot worse.

(The draft element allows the client to request that a resource be made publicly visible (the default) or not. I suspect you know this, however, so perhaps we'll discuss that out of band.)

Tuesday, 8pm
ringmaster

Well that's always been the problem with efforts like these before. Someone steps up and says they want interchange and selects one or two formats and is done.

I'm not suggesting that's the extent of your approach, but my major complaint in a discussion with one of the BlogML developers recently was that their .net world is so insular, they've not made any inroads with the likes of the 95% share OSS blog market. Did it never occur to them to publish in a clear, obvious, findable location the schema for their format? I guess not.

Besides that, it IS hard. Looking at the BlogML spec (when you're able to locate it), it's got a few things that make sense, and some stuff that could use revision. For example, in all of the blog platforms I know, comments and trackbacks are the same core thing, but of different types. Perhaps this is different for the .NET blog software. It seems to me that there is no need for first-order elements for both of these comment types. Also, the category system in the format wouldn't seem to support anything but the most rudimentary systems before needing to be completely replaced to house tag or taxonomy information.
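
To make the "same core thing, different types" point concrete, here is a small Python sketch in which comments and trackbacks share a single element and differ only in a `type` attribute. The `response` element and its child names are invented for this example and do not come from BlogML or any other existing format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Response:
    kind: str    # e.g. "comment", "trackback" or "pingback"
    author: str
    url: str
    body: str


def response_to_xml(response):
    """One element for every response; the kind is just an attribute."""
    el = ET.Element("response", {"type": response.kind})
    ET.SubElement(el, "author").text = response.author
    ET.SubElement(el, "url").text = response.url
    ET.SubElement(el, "body").text = response.body
    return el


def response_from_xml(el):
    """Read a response back without caring which kind it is."""
    return Response(
        kind=el.get("type"),
        author=el.findtext("author"),
        url=el.findtext("url"),
        body=el.findtext("body"),
    )


comment = Response("comment", "Ryan", "http://example.com/ryan", "Great idea.")
trackback = Response("trackback", "Some Blog", "http://example.org/post", "Linked here.")
restored = [response_from_xml(response_to_xml(r)) for r in (comment, trackback)]
```

An importer that has no notion of trackbacks can still read every `response` and decide per `type` whether to keep it, which is harder when each kind has its own first-order element.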

What would also be helpful for BlogML, as it would be for any universal format, is a document describing what to do if your blog supports data that is not covered by the format. Habari, for example, stores open-ended metadata about comments and users. This isn't standard among blog software. So how should that be accounted for in the export format? Is there a place to publish and collaborate on extensions to the base format? Perhaps by slightly tweaking what extension your platform requires, you can gain two other platforms in the bargain, rather than have two additional, incompatible export format extensions and a reason for Google Converters to exist at all.
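
One possible answer, sketched here in Python under stated assumptions, is to carry platform-specific metadata in its own XML namespace alongside the core elements, so that importers which don't know the extension can skip it. The extension namespace URI and the `meta` element are hypothetical and were invented for this example; only the Atom namespace is real.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Hypothetical extension namespace, invented for this example.
EXT_NS = "http://example.org/ns/blog-interchange"


def add_metadata(element, metadata):
    """Attach open-ended name/value pairs as namespaced child elements."""
    for name, value in metadata.items():
        meta = ET.SubElement(element, f"{{{EXT_NS}}}meta", {"name": name})
        meta.text = str(value)


def read_metadata(element):
    """What an importer that understands the extension would recover."""
    return {m.get("name"): m.text for m in element.findall(f"{{{EXT_NS}}}meta")}


def core_fields(entry):
    """What an importer unaware of the extension would see: Atom children only."""
    return {
        child.tag.split("}", 1)[1]: child.text
        for child in entry
        if child.tag.startswith(f"{{{ATOM_NS}}}")
    }


entry = ET.Element(f"{{{ATOM_NS}}}entry")
ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = "Hello"
add_metadata(entry, {"mood": "optimistic", "rating": 4})

# Round-trip through serialised XML, as an export followed by an import would.
imported = ET.fromstring(ET.tostring(entry))
```

Values come back as strings after the round trip, so a shared extension would also need to say how types are recorded, which is exactly the kind of detail worth agreeing on in one place.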

Being more open about the format would be useful, too. There were a couple of interchange formats that came and went without ever being used because the author wanted to retain some kind of rights over the process. If you're developing an interchange format between open source platforms, expect them to want it to be open, both in documentation and in compatible licensing. There's nothing like documenting a useful interchange format by writing a proprietary-licensed library to implement it that none of your target platforms can embed.

As far as the format itself goes, it should build on existing standards. This is one place where WXR tries and fails, in that it's kind of RSS, but it's not technically valid XML, which paradoxically RSS has to be. Atom might be a good place to start, although its usefulness in blogging has been somewhat abstracted out. That draft element still bugs me. What the heck does it MEAN? Maybe RDF? Maybe just valid XML would be good.
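
For what it's worth, a basic check that an export file at least parses as XML takes only a few lines of Python. This is a minimal sketch: it tests well-formedness only, not validity against any schema, and the sample strings are made up rather than taken from a real WXR export.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


good = "<rss><channel><title>Fish &amp; Chips</title></channel></rss>"
bad_ampersand = "<rss><channel><title>Fish & Chips</title></channel></rss>"
bad_nesting = "<rss><channel></rss>"

results = [is_well_formed(s) for s in (good, bad_ampersand, bad_nesting)]
```

Running an exporter's output through a check like this before offering it for download would catch the most basic problems, whatever base format is chosen.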