Django plans for spokenwikipedia.org

31 January 2009, 18:08

So I have owned spokenwikipedia.org for about a year and a half. I really like the Spoken articles WikiProject on Wikipedia, and I think it deserves a wider audience. The tiny speaker icon is very easy to lose amongst the clutter and decoration of a decent Wikipedia article, and even if you find it you may not know what to do with an Ogg file. So I bought the domain intending to “re-present” the material and highlight the recording aspects. And I love Python and have taken an interest in the Django framework. It has speedy development, great documentation and is very welcoming of newcomers, all of which are positive indicators for me. But… I never quite made the leap into starting a new project from scratch. There were a few too many things I didn’t know: one looming large was (and is) database stuff.

So I floundered about a bit and didn’t get very far. Then at LCA this month I went to Jacob Kaplan-Moss’s Django tutorial, which was pretty much exactly what I needed. So now this is what I think I need to do:

  1. Write a script to pull all the needed info from Wikipedia via the MediaWiki API. This should help me figure out precisely what data I will have, which will help with the next step…
  2. Design my database, aka my models. I had a brainwave about this while at the conference: I should just augment the appropriate bits of the MediaWiki database. Then I should never be able to have conflicts between my models and MediaWiki’s. I still have a bit to learn here. I plan to start with a SQLite database.
  3. Choose my views (URLs). That is possibly my favourite thing about Django. A website with clean URLs gives me great joy! That won’t be too hard.
  4. Put code in the views. This will go hand in hand with designing the templates in the next step.
  5. Design the templates (HTML and CSS). I have done a bit of work on this, based on the concrete theme from freecsstemplates.org (yay CC-BY designs). Hacking up HTML and CSS always takes me a long time, although with Firebug it is tons faster. I am quite finicky.
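
Step 1 might start out something like the sketch below. It only builds the request URL and parses a response, so the actual HTTP fetch is left out. The category name, the function names and the choice of the `categorymembers` list are assumptions made for illustration; the `continue` handling follows the current MediaWiki API response shape, which differs from the one the API used in 2009.

```python
import json
from urllib.parse import urlencode

# English Wikipedia endpoint; other languages would swap the subdomain.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_url(category, continue_token=None, limit=500):
    """Build a MediaWiki API URL that lists the pages in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    if continue_token:
        # Ask for the next batch when the previous response was truncated.
        params["cmcontinue"] = continue_token
    return API_URL + "?" + urlencode(params)


def parse_members(response_text):
    """Return ([(pageid, title), ...], continuation token or None)."""
    data = json.loads(response_text)
    members = [
        (m["pageid"], m["title"]) for m in data["query"]["categorymembers"]
    ]
    token = data.get("continue", {}).get("cmcontinue")
    return members, token
```

Calling `build_query_url("Category:Spoken articles")` gives a URL to fetch, and feeding the response body to `parse_members` shows exactly which fields come back, which is the point of doing this step before designing the models.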

I was worrying a bit about whether I should try to design everything I want to do from the start, or start simple and then expand later. Although it might be tricky, I’m going to start simple. If I don’t start simple I don’t think I will start at all. :)

Another issue is whether I should run each “XX.spokenwikipedia.org” language instance as its own installation, or have them all unified and controlled by a single Django project. I wasn’t able to think of a good way to tell the project which subdomain has been visited (I didn’t even know if it was possible), so I asked Jacob’s advice at the tutorial. He thought it should be possible by overriding some of the middleware which, by default, gets the locale from the user’s browser. A quick Google search reveals a couple of things that might give me a head start here too.
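
A rough sketch of the subdomain idea, under some loud assumptions: it is written in the newer callable middleware style (which did not exist in Django at the time of writing), the class name and the `subdomain_language` attribute are made up for illustration, and the bare domain and `www` are assumed to fall back to English. The only Django API it relies on is `request.get_host()`.

```python
# Assumption: visitors to the bare domain or "www" get English.
DEFAULT_LANGUAGE = "en"


def language_from_host(host):
    """Pick a language code from a host like 'de.spokenwikipedia.org'."""
    host = host.split(":")[0].lower()  # drop any port number
    labels = host.split(".")
    if len(labels) >= 3 and labels[0] != "www":
        return labels[0]
    return DEFAULT_LANGUAGE


class SubdomainLanguageMiddleware:
    """Hypothetical middleware that tags each request with its subdomain."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Views and templates can read this attribute to pick the language.
        request.subdomain_language = language_from_host(request.get_host())
        return self.get_response(request)
```

With something like this in place, a single project could serve every language instance and simply filter its queries by `request.subdomain_language`. A real version would probably also check the code against a list of supported languages.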

Another thing later will be putting the appropriate pages into version control. I think I will use bzr + Launchpad because it bills itself as being very easy. Seems to fit with the Python philosophy so I’ll stick with it. Plus, ooh!, distributed. ;)

One thing I am not sure about right now is: how to initially populate the DB. I was thinking I would write some Python to query the MediaWiki API, write the results to a file. Then to populate it, parse the file and put it into the DB via the models. But that is obviously a dumb duplication. I should either immediately dump it into the database, away from Django’s eyes, or use Django to query the MW API. For, say, weekly updates, that seems like a good idea, but not sure how good an idea it is for an initial query of over a thousand spoken articles. Hm… oh well, I have some more pressing concerns: I can work on the templates, and actually figuring out what bits of data I need, before I come to this.
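
For the "dump it straight into the database" option, a minimal sketch using Python's built-in `sqlite3` module could look like this. The table name, the two columns and the function name are placeholders rather than a real schema, and it bypasses Django's models entirely.

```python
import sqlite3


def store_articles(conn, articles):
    """Write (pageid, title) pairs straight into SQLite, with no file in between."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spoken_article ("
        "pageid INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    # INSERT OR REPLACE overwrites a row whose pageid already exists,
    # so re-running the import updates rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO spoken_article (pageid, title) VALUES (?, ?)",
        articles,
    )
    conn.commit()
```

Because repeated runs overwrite rather than duplicate, the same function could plausibly serve both the initial load of a thousand-odd articles and the weekly updates.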
