(15/7 Edited to add links to DjangoCon videos.)
Last week PyCon AU 2013 began and ended in Hobart, the fourth such conference, and a new high. I have attended it each year so far and it has really gone from strength to strength, with sustainable growth and a markedly friendly and welcoming atmosphere, thanks in major part to Christopher Neugebauer who has been the lead steward of the conference for its two year stint in Hobart.
Friday was miniconference day and I sat in on DjangoCon AU, which I got a lot more out of than I expected, considering I am only a casual user of Django at best.
Porting Django apps to Python 3 (Jacob Kaplan-Moss; slides) had a lot of excellent advice that was not at all Django-specific. In his words, one of the benefits of writing Python 3 is that Unicode now “fails early” – if you’re getting it wrong, you’ll find out when you write the code, not 6 months after it’s been deployed and you get your first customer whose name has a diacritic. He talked about how to go about a “single source” approach – one set of code that runs under both Python 2 and 3. Apparently it is possible, with the help of a library called six. This seems like a more sane approach than running 2to3 and ending up having to maintain 2 codebases that slowly diverge.
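The single-source idea can be sketched without six itself. This is a minimal, hand-rolled illustration of the kind of shim six provides — `text_type` and `ensure_text` here are my own names, modelled on six's helpers, not six's actual source:

```python
import sys

# Minimal sketch of a "single source" compatibility shim (not six's code).
PY2 = sys.version_info[0] == 2

# On Python 2 text is 'unicode', on Python 3 it is 'str'; the conditional
# only evaluates the branch that is taken, so no NameError on Python 3.
text_type = unicode if PY2 else str  # noqa: F821

def ensure_text(value, encoding="utf-8"):
    """Decode bytes to text immediately, so Unicode errors 'fail early'."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, text_type):
        return value
    raise TypeError("expected bytes or text, got %r" % type(value))

print(ensure_text(b"Zo\xc3\xab"))  # decoded at the boundary, not 6 months later
```

The same module runs unchanged under both interpreters, which is the whole point of the single-source approach.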
Secrets of the testing masters (Russell Keith-Magee; slides) advocated for the use of factory_boy, which looks like a good way of avoiding using fixtures (which are hard to maintain) and also avoiding boilerplate in tests. It was originally written to support Django test suites but it also supports SQLAlchemy-driven projects.
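The pattern factory_boy automates can be sketched by hand — declarative defaults, auto-incrementing sequences, and per-test overrides instead of brittle fixtures. `User` and `user_factory` below are illustrative stand-ins, not factory_boy's API (there you would declare a `UserFactory` class instead of writing this plumbing):

```python
import itertools

class User(object):
    """Stand-in for a model class you'd normally get from Django or SQLAlchemy."""
    def __init__(self, username, email, is_active):
        self.username = username
        self.email = email
        self.is_active = is_active

_counter = itertools.count()

def user_factory(**overrides):
    # Sequences keep generated objects unique; overrides let each test
    # state only the attributes it actually cares about.
    n = next(_counter)
    defaults = {
        "username": "user%d" % n,
        "email": "user%d@example.com" % n,
        "is_active": True,
    }
    defaults.update(overrides)
    return User(**defaults)

active = user_factory()                    # all defaults
inactive = user_factory(is_active=False)   # override just one attribute
```

Compared with fixtures, nothing here goes stale when the model grows a field: add one default and every test keeps working.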
Core Developer Panel – I recommend giving this a watch. They seem like a really thoughtful bunch of people; if I was looking for a web framework community to join I would feel really good about making it Django. I was struck by one of the speakers saying that although he was happy Django had brought people to Python, he was disappointed that some people apparently consider themselves “Django developers” rather than just Python developers, just as some companies advertise for “Django developers” rather than something more broad – I think it is a great point and an attitude that points to a healthy community.
On to the main conf. Nobody Expects the Python Packaging Authority (Nick Coghlan; src) was a bit of a history lesson but also, importantly, offered hope for the future. pip 1.4+ will support “wheels” as an alternative to binary eggs – the lack of binary support being, I think, the main reason pip has not yet completely displaced easy_install. An “official” guide for how to do packaging is in the works – watch this space. PEP 439 plans to add pip to the standard library – at least enough so that pip can bootstrap itself. Brilliant. Among the questions, tip of the week was to set the environment variable PIP_DOWNLOAD_CACHE so that if you install the same packages into multiple virtualenvs, you won’t need to download each one separately. Win!
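Concretely, that tip is a one-liner in your shell startup file (the cache path below is just an example — any writable directory works):

```shell
# In ~/.bashrc: have pip keep downloaded package archives here, so repeat
# installs into other virtualenvs skip the network fetch.
export PIP_DOWNLOAD_CACHE=~/.pip_download_cache
```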
Building secure web apps: Python vs the OWASP Top 10 (Jacob Kaplan-Moss; slides) was a super helpful reference comparing the top 10 security risks in web apps against Django, Flask and Pyramid. (Here’s an endorsement – “Flask is perfect for slide-driven development”)
Software Carpentry arrives Down Under! (Damien Irving) was about bringing software carpentry workshops to Australia. These are for scientists who need/want to improve their programming practices, like using version control, how to design their software and test it. Such an important idea.
Using Cython for distributed-multiprocess steganographic md5sum-collision generation. For… reasons. (Tom Eastman) was a fun look at how to solve an unimportant problem to confound his colleagues with a text adventure game in his company’s pastebin. Yep. Reasons. :)
Modern scientific computing and big data analytics in Python (Ed Schofield) – this was a tutorial so it is longer, but a comprehensive overview of libraries like numpy, scipy, scikit-learn, ipython parallel, map-reduce options and many more. Pandas looks like the thing to use for reading large CSV-like data sets. It’s a better option than numpy because you can index by a label/name rather than an integer. It also comes ready with .plot functions which produce matplotlib graphs – nice!!
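To illustrate the label-versus-integer point, here's a toy stand-in for a CSV data set (column names and values made up for the example):

```python
import io
import pandas as pd

# A tiny in-memory CSV standing in for a large data file.
csv_data = io.StringIO(u"city,temp,rainfall\nHobart,12.1,47\nMelbourne,15.3,36\n")
df = pd.read_csv(csv_data)

# Index by label rather than by integer position (numpy-style data[:, 1]):
temps = df["temp"]
print(temps.mean())  # 13.7

# Plotting comes built in: df.plot(x="city", y="rainfall") hands back a
# matplotlib axes object ready to show or save.
```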
Tinkering with Tkinter (Russell Keith-Magee, slides) was a bit more than the title suggested – as well as a new look at Tkinter, he introduced his proof-of-concept project cricket which is a GUI for running tests and exploring the results (initially only Django, but easy to add others). He argued that we developers are still using tools with a pretty crappy UX for no particularly good reason, despite decades of research showing the benefit of a good interface. I have to agree on the test running front – py.test output is better than average, simply by the use of colour (radical!) – but even so if I have more than about 3 test failures I often paste the output into a text file just so I can scroll through it and don’t miss anything. So I will definitely look at seeing if this can be adapted to py.test and adopting it.
My big gay adventure. Making, releasing and selling an indie game made in python. (Luke Miller) – the story of creating and selling the point-and-click adventure game My ex-boyfriend the space tyrant. If you only watch one talk from PyCon, make it this one. Excellent presenter, covered so many different perspectives in his short 25 minutes – now that’s how you give a talk. From the process of designing and writing a game to marketing it, this was funny, insightful and interesting. Despite not being the target audience in any way at all I think I’m going to sit down with a friend and play it because it looks like a bunch of fun.
Lightning talks on Sunday had two great ones – check out pip install python-nation by Jacob Kaplan-Moss (NB: watch this before running the command :P). The other one I liked was by Duckie introducing CompCon, a conference for Australian computing students, but what I really liked about it was the impromptu interrupt for a brief critique of conference presentation styles.
Whew! My to-watch list is still long:
Yeah, it’s a long list… aside from the first keynote, I didn’t miss any sessions. That just tells you how good the program was!
Virtualenv is a great Python tool for isolating dependencies when developing a new project. And virtualenvwrapper is the convenient shell script that should be part of virtualenv by default, IMO. virtualenvwrapper lets you do things like:

mkvirtualenv --no-site-packages newproject

… and you’re done. Anything you install via pip, after that, will be confined to your virtualenv. Then when deployment time comes, you can go pip freeze > requirements.txt, your users can go pip install -r requirements.txt, and all is neat and tidy with the world.
If you are writing a web app your users are probably your web server(s). Then Fabric comes into the mix. Fabric is designed to make deploying your web project a one-liner. It’s pretty thrilling to use too. Here is a typical fabric command:
"Specify a specific version to be made live"
env.version = version
with cd('%(path)s' % env):
    run('mv releases/current releases/previous')
    run('ln -s %(version)s releases/current' % env)
Nice, no? These slides are a nice intro to fabric as well.
The reason that ‘with cd’ context manager is needed is because Fabric doesn’t keep ‘state’: apparently each run command lives in more or less its own ssh session. This is a problem if you need to source files, as when using virtualenv. This is hinted at (here, here, here, here) but not really explained clearly anywhere.
To get it working, this is what I ended up doing:
(on my dev machine)

if [ $USER == blaugher ]; then
    export WORKON_HOME=$HOME/virtualenvs
    source "$(which virtualenvwrapper.sh)"
fi
"""
Setup a fresh virtualenv as well as a few useful directories, then run
a full deployment
"""
sudo('aptitude install -y python-setuptools apache2 libapache2-mod-wsgi')
sudo('pip install virtualenv')
sudo('pip install virtualenvwrapper')
run('mkdir -p %(workon_home)s' % env)
# just in case it already exists, let's ditch it
run('rmvirtualenv %(project_name)s' % env)
run('mkvirtualenv --no-site-packages %(project_name)s' % env)
# [... plus other useful stuff.]

"Install the required packages from the requirements file using pip"
with cd('%(path)s' % env):
    run('workon %(project_name)s && pip install -r ./releases/%(release)s/requirements.txt' % env)
So, that’s reasonably nice. The .bash_profile is so-written because of this bug – somehow one of the virtualenvwrapper files ends up being owned by root, which causes an IOError for non-root users. You could change the shell command for when you run sudo, but it would be pretty tedious.
sudo and run have an option of shell (boolean), and in env you can set the shell command to be used (by default it is /bin/bash -l -c), but there is no easy way to specify different shell commands for sudo and run separately.
Virtualenvwrapper recommends adding the ‘export’ and ‘source’ lines to your .bashrc. By adding them to .bash_profile instead, they will be executed for login shells – like our /bin/bash -l, i.e. for all fabric commands – and we don’t have to explicitly source any file in fabric. I thought this was a neat side-step of that problem. I’m not sure what the other implications of this change might be, though.
My next problem was a call like this:
run('mkvirtualenv %(project_name)s --no-site-packages' % env)
Fabric was complaining that it was getting back a return code of 1. This seemed odd as it looked like it was working. Even running it by hand still looked good:
blaugher@tardis:~$ mkvirtualenv qwerty --no-site-packages
New python executable in qwerty/bin/python
blaugher@tardis:~$ echo $?
1
I had joined the mailing list and was preparing to write up my problem when I spotted this earlier reply:
Try reversing the order of env3 and --no-site-packages on the command line. mkvirtualenv expects the environment name to be the last argument.
Sure enough –
blaugher@tardis:~$ mkvirtualenv --no-site-packages qwerty
New python executable in qwerty/bin/python
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/predeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postdeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/preactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/get_env_details
(qwerty)blaugher@tardis:~$
That’s a little too subtle for my liking. Although note that, as well as the extra user_scripts output, this time the virtualenv is actually activated, as it is supposed to be (the “(qwerty)” prefix tells you you are working in a virtualenv).
A couple of other points. If Django is in your requirements.txt file rather than manually installed on your server, you will want to make sure your Django-specific Python calls look something like
run('workon %(project_name)s && python manage.py syncdb' % env).
Making ‘workon’ into something more generically useful is obviously not too difficult. This example fabfile adds a method to do it. Apparently an upcoming version of fabric might have a command like prefix, which would work very well.
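The mechanics of such a prefix command can be sketched in plain Python — this is not Fabric's actual implementation, just an illustration of how a context manager could join a prefix onto every command, like the 'workon %(project_name)s && …' strings above do by hand:

```python
from contextlib import contextmanager

# Stack of active prefixes; nested 'with prefix(...)' blocks push and pop.
_prefixes = []

@contextmanager
def prefix(command):
    _prefixes.append(command)
    try:
        yield
    finally:
        _prefixes.pop()

def run(command):
    # A real run() would execute this over ssh; here we just build the string.
    return " && ".join(_prefixes + [command])

with prefix("workon myproject"):
    cmd = run("pip install -r requirements.txt")
print(cmd)  # workon myproject && pip install -r requirements.txt
```

Because each run is its own ssh session anyway, prepending the activation command to every invocation is exactly the right fit for Fabric's stateless model.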
Finally using virtualenvwrapper means all your virtualenvs are collected together in their own directory, separate to the projects you use them for. Docs on virtualenv alone tend to suggest nesting the project code within the virtualenv itself, but I prefer the approach of virtualenvwrapper.
The moral of the story is, all your problems are already solved, and all you need to do is locate your answers, out there, somewhere. :)
Last night I went to MPUG, the Melbourne Python users group. I have been on that mailing list for seemingly years, and it looks like now there will be an attempt to have regular meetings. Woot.
There was a very interesting talk by Martin Schweitzer called “Primetime Wordfinding” or “Elegant String Searches”. (The slides were posted to the mailing list.)
The basic problem is thus:
Given a set of letters and a dictionary, find all words that can be made from those letters.
The method that he outlines is wonderfully elegant, and will be especially appreciated by maths geeks. However, seeing dictionary = '/usr/share/dict/words' made me think, “Typical IR approach! Where’s the linguistics?”
It also made me wonder how many languages Linux ships word lists for. Apparently Ubuntu ships many varieties of English, Portuguese, Bulgarian, Catalan, Danish, Dutch, Finnish, Faroese, French, Galician, Italian, Norwegian, German, Polish, Spanish, Swedish and Ukrainian. So Europe has decent coverage, but the rest of the world, hmm…
So, how about this revised problem:
Given a set of letters and a language, find all words that can be made from those letters.
We don’t have a dictionary but we have a language, which means we have (whether we consciously realise or not) the rules for
- how alphabetic letters map to phonemes (sound units)
- how phonemes can be combined to form syllables (the main concern)
- how syllables can be combined to form words.
I did a bit of looking to see if I could try and find a ready-made solution, and while it seems that syllable ‘parsing’ is a well-studied problem, syllable ‘generation’ is another matter.
Now this is going to be relatively tricky, because English doesn’t have good one-to-one correspondences between letters and phonemes.
So let’s hack some stuff together… as a first approximation, I’ll grab all the written examples from the Wikipedia articles on English phonology, English orthography and the IPA chart for English dialects.
>>> onsets = list("pbtdckgjfvszwmlnryh") + ["ch","th","sh"]
>>> onsets += ["pl","bl","cl","gl","pr","br","tr","dr","cr","gr","tw","dw","gu","qu"]
>>> onsets += ["fl","sl","fr","thr","shr","sw","thw","wh"]
>>> onsets += ["sp","st","sk"]
>>> onsets += ["sm","sn"]
>>> onsets += ["sph"]
>>> onsets += ["spl","spr","str","scl","scr","squ", "sc"]
>>> nuclei = ["a","e","i","o","u","ow","ou","ou","ie","igh","oi","eer","air","ee","ai"]
>>> nuclei += ["au","ea","ou","ai","ey","ei","er","ear","ir","oo","ou","igh","ough"]
>>> codas = ["lp","lb","lt","ld","lk","rp","rb","rt","rd","rk","rgue","lf","lve","lth","lse"]
>>> final = ["s","ed"]
That’s pretty yuck. And I’m not too sure at all about where some of those “r”s should go. A bit of a brute-force solution for problem #1 above. I would like to clean this up and somehow make sure it is complete.
Also, that “final” bit is not a linguistic thing, but it seems to me my codas are not accounting for plural words too well.
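Before using those lists, the prime trick from the talk deserves a quick gloss: by unique prime factorisation, the product of a word's letter-primes records exactly how many of each letter it uses, so "can be spelt from these letters" becomes plain divisibility. A self-contained restatement (`can_spell` is my name for it, not from the talk):

```python
# Map each letter to a prime; a word's value is the product of its letters'
# primes. Word W can be spelt from the letters of M iff value(W) divides value(M).
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def value(word):
    total = 1
    for ch in word:
        total *= primes[ord(ch.lower()) - ord('a')]
    return total

def can_spell(word, letters):
    return value(letters) % value(word) == 0

print(can_spell("late", "chocolate"))   # True
print(can_spell("latte", "chocolate"))  # False - only one 't' to go around
```

That is why the code below only ever needs `magic % get_val(...) == 0` tests.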
>>> # thankyou, martin!
>>> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
>>> def prime_val(ch):
...     return primes[ord(ch.lower()) - ord('a')]
>>> def get_val(word):
...     total = 1
...     for ch in word:
...         total *= prime_val(ch)
...     return total
>>> magic = get_val("chocolate")
>>> nuclei_ok = [n for n in nuclei if magic % get_val(n) == 0]
>>> onsets_ok = [o for o in onsets if magic % get_val(o) == 0] + [""]
>>> codas_ok = [c for c in codas if magic % get_val(c) == 0] + [""]
>>> syllables = []
>>> for o in onsets_ok:
...     for n in nuclei_ok:
...         for c in codas_ok:
...             syllable = o + n + c
...             if magic % get_val(syllable) == 0:
...                 syllables.append(syllable)
>>> syllables
['talch', 'ta', 'telch', 'te', 'tolch', 'to', 'tealch', 'tea', 'toolch', 'too', 'toalch', 'toa', 'tol',
'calt', 'calth', 'calch', 'cact', 'calct', 'ca', 'celt', 'celth', 'celch', 'cect', 'celct', 'ce', 'colt',
'colth', 'colch', 'coct', 'colct', 'co', 'cealt', 'cealth', 'cealch', 'ceact', 'cealct', 'cea',
'coolt', 'coolth', 'coolch', 'cooct', 'coolct', 'coo', 'coalt', 'coalth', 'coalch', 'coact',
'coalct', 'coa', 'colct', 'col', 'lact', 'la', 'lect', 'le', 'loct', 'lo', 'leact', 'lea', 'looct', 'loo',
'loact', 'loa', 'halt', 'hact', 'halct', 'ha', 'helt', 'hect', 'helct', 'he', 'holt', 'hoct', 'holct',
'ho', 'healt', 'heact', 'healct', 'hea', 'hoolt', 'hooct', 'hoolct', 'hoo', 'hoalt', 'hoact',
'hoalct', 'hoa', 'holct', 'hol', 'chalt', 'chact', 'chalct', 'cha', 'chelt', 'chect', 'chelct',
'che', 'cholt', 'choct', 'cholct', 'cho', 'chealt', 'cheact', 'chealct', 'chea', 'choolt',
'chooct', 'choolct', 'choo', 'choalt', 'choact', 'choalct', 'choa', 'cholct', 'chol', 'tha',
'the', 'tho', 'thea', 'thoo', 'thoa', 'thol', 'clact', 'cla', 'clect', 'cle', 'cloct', 'clo', 'cleact',
'clea', 'clooct', 'cloo', 'cloact', 'cloa', 'alt', 'alth', 'alch', 'act', 'alct', 'a', 'elt', 'elth',
'elch', 'ect', 'elct', 'e', 'olt', 'olth', 'olch', 'oct', 'olct', 'o', 'ealt', 'ealth', 'ealch', 'eact',
'ealct', 'ea', 'oolt', 'oolth', 'oolch', 'ooct', 'oolct', 'oo', 'oalt', 'oalth', 'oalch', 'oact',
'oalct', 'oa', 'olct', 'ol']
Note that onsets and codas are optional, hence I add the empty string to those lists. (I forgot to factor in the “final” bit, although it doesn’t make any difference for the word “chocolate”.)
OK so now I have my syllables. You should find that these are basically all pronounceable in English, although they may not be the standard way of being written (for example, if “choct” was a valid word, I think it would be written as “chocked”. “ct” only seems to get to be a coda for a small number of words, like “tact”). And of course many of them are not valid as mono-syllabic words.
Now, how can we combine them into multi-syllabic words? Well, there are some word-level rules, but mostly they seem more relevant to pronunciation. So we should be reasonably safe with just concatenating syllables.
>>> syll2 = []
>>> for syllable in syllables:
...     remaining = syllables[:]
...     for r in remaining:
...         combined = syllable + r
...         if magic % get_val(combined) == 0:
...             syll2.append(combined)
And now…. wait for it…. the big moment has arrived!
>>> 'cocoa' in syll2
True
At the moment this is perhaps not markedly better than just generating every permutation of every length string of the letters in “chocolate”. But there you go… I call it “Dictionary-Free, Linguistically Motivated String Searches”. :)
I am going to ponder if there is a better way to implement this in Prolog. But for Python, is there any way you can use a regular expression for generation rather than parsing? A kind of “regular expression for production”?
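I don't know of a stock "regex in reverse", but for patterns that are just concatenations of alternations — like (onset)(nucleus)(coda) — the language can be enumerated directly with itertools.product. The toy lists here are trimmed-down stand-ins for the full ones above:

```python
import itertools

# Enumerate the 'language' of the pattern (onset)(nucleus)(coda) by taking
# the cartesian product of the alternatives - a regex run backwards.
onsets = ["", "t", "ch", "cl"]
nuclei = ["a", "o", "ea"]
codas = ["", "ct", "lt"]

words = ["".join(parts) for parts in itertools.product(onsets, nuclei, codas)]
print(len(words))  # 4 * 3 * 3 = 36 candidate strings
print("cho" in words)  # True: "ch" + "o" + ""
```

Filtering that stream with the divisibility test would reproduce the syllable generator above without the three nested for-loops.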