Virtualenv is a great Python tool for isolating dependencies when developing a new project. And virtualenvwrapper is the convenient shell script that should be part of virtualenv by default, IMO. virtualenvwrapper lets you do things like:
mkvirtualenv --no-site-packages newproject
workon newproject
… and you’re done. Anything you install via pip, after that, will be confined to your virtualenv. Then when deployment time comes, you can go pip freeze > requirements.txt, your users can go pip install -r requirements.txt and all is neat and tidy with the world.
If you are writing a web app your users are probably your web server(s). Then Fabric comes into the mix. Fabric is designed to make deploying your web project a one-liner. It’s pretty thrilling to use too. Here is a typical fabric command:
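from fabric.api import cd, env, require, run

def deploy_version(version):
    "Specify a specific version to be made live"
    require('path')
    env.version = version
    with cd('%(path)s' % env):
        run('rm releases/previous')
        run('mv releases/current releases/previous')
        # %(version)s is filled in from env; $(version) would be left for the shell
        run('ln -s %(version)s releases/current' % env)
    # restart_webserver is defined elsewhere in the fabfile
    restart_webserver()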
Nice, no? These slides are a nice intro to fabric as well.
The reason the ‘with cd’ context manager is needed is that Fabric doesn’t keep ‘state’. Apparently each run command lives in its own ssh session, just about. This is a problem if you need to source files, as when using virtualenv. This is hinted at (here, here, here, here) but not really explained clearly anywhere.
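You can see the same effect locally without Fabric at all. This is only a rough sketch: the run function below is my own stand-in built on subprocess, not Fabric's run, and it assumes a POSIX sh is available. Each call gets a fresh shell, so a cd in one call is forgotten by the next.

```python
import subprocess

def run(command):
    """Run one command in a fresh shell, roughly as each Fabric run() does."""
    return subprocess.check_output(["sh", "-c", command]).decode().strip()

run("cd /")                    # the cd happens, then that shell exits
print(run("pwd"))              # still wherever this script was started
print(run("cd / && pwd"))      # chained in one shell, so this prints /
```

Chaining with && inside a single command is the same trick the workon calls in the fabfile rely on.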
To get it working, this is what I ended up doing:
(on my dev machine)
.bash_profile
if [ "$USER" == blaugher ]; then
    export WORKON_HOME=/home/blaugher/virtualenvs
    source /usr/local/bin/virtualenvwrapper.sh
fi
fabfile.py
from fabric.api import cd, env, put, run, settings, sudo

def setup():
    """
    Setup a fresh virtualenv as well as a few useful directories, then run
    a full deployment
    """
    sudo('aptitude install -y python-setuptools apache2 libapache2-mod-wsgi')
    sudo('easy_install pip')
    sudo('pip install virtualenv')
    sudo('pip install virtualenvwrapper')
    put('.bash_profile', '~/.bash_profile')
    run('mkdir -p %(workon_home)s' % env)
    with settings(warn_only=True):
        # just in case it already exists, let's ditch it
        run('rmvirtualenv %(project_name)s' % env)
    run('mkvirtualenv --no-site-packages %(project_name)s' % env)
    # [... plus other useful stuff.]

def install_requirements():
    "Install the required packages from the requirements file using pip"
    with cd('%(path)s' % env):
        run('workon %(project_name)s && pip install -r ./releases/%(release)s/requirements.txt' % env)
So, that’s reasonably nice. The .bash_profile is so-written because of this bug – somehow one of the virtualenvwrapper files ends up being owned by root, which causes an IOError for non-root users. You could change the shell command for when you run sudo but it would be pretty tedious. sudo and run have an option of shell (boolean) and in the env you can set the shell command to be used (by default it is /bin/bash -l -c) but there is no easy way to specify different shell commands for run vs sudo commands.
Virtualenvwrapper recommends adding the ‘export’ and ‘source’ lines to your .bashrc. By adding them to .bash_profile instead, they will be executed for login shells, like our /bin/bash -l, i.e. for all fabric commands, and we don’t have to explicitly source any file in fabric. I thought this was a neat side-step of that problem. I’m not sure what the other implications of .bashrc vs .bash_profile are.
My next problem was a call like this:
run('mkvirtualenv %(project_name)s --no-site-packages' % env)
Fabric was complaining that it was getting back a return code of 1. This seemed odd as it looked like it was working. Even running it by hand still looked good:
blaugher@tardis:~$ mkvirtualenv qwerty --no-site-packages
New python executable in qwerty/bin/python
Installing distribute..................................
..........................................................
................................................................
.....................done.
But not:
blaugher@tardis:~$ echo $?
1
I had joined the mailing list and was preparing to write up my problem when I spotted this earlier reply:
Try reversing the order of env3 and --no-site-packages on the command line. mkvirtualenv expects the environment name to be the last argument.
!!!
Sure enough –
blaugher@tardis:~$ mkvirtualenv --no-site-packages qwerty
New python executable in qwerty/bin/python
Installing distribute........................................................
......................................................
...................................................................done.
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/predeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postdeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/preactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/get_env_details
(qwerty)blaugher@tardis:~$
That’s a little too subtle for my liking. Note, though, that as well as the extra user_scripts output, this time the virtualenv is actually activated, as it is supposed to be (the “(qwerty)” prefix tells you that you are working in a virtualenv).
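Since the ordering is so easy to get wrong, one option is to build the command string in a single place. A minimal sketch: mkvirtualenv_command is a hypothetical helper of my own, not something Fabric or virtualenvwrapper provides.

```python
def mkvirtualenv_command(name, *options):
    """Build a mkvirtualenv call with every option placed before the name."""
    return " ".join(["mkvirtualenv"] + list(options) + [name])

print(mkvirtualenv_command("qwerty", "--no-site-packages"))
# mkvirtualenv --no-site-packages qwerty
```

In the fabfile the result would simply be handed to run().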
A couple of other points. If Django is in your requirements.txt file rather than manually installed on your server, you will want to make sure your Django-specific Python calls look something like run('workon %(project_name)s && python manage.py syncdb' % env).
Making ‘workon’ into something more generically useful is obviously not too difficult. This example fabfile adds a method to do it. Apparently an upcoming version of fabric might have a command like prefix, which would work very well.
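For what it's worth, the simplest version I can think of is just string building. This is a sketch only, and in_virtualenv is a made-up name rather than anything from Fabric or from the example fabfile mentioned above.

```python
def in_virtualenv(project_name, command):
    """Prefix a shell command so it runs inside the named virtualenv."""
    return "workon %s && %s" % (project_name, command)

print(in_virtualenv("newproject", "python manage.py syncdb"))
# workon newproject && python manage.py syncdb
```

A fabfile task would then call something like run(in_virtualenv(env.project_name, 'python manage.py syncdb')), which keeps the workon prefix out of every individual command string.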
Finally using virtualenvwrapper means all your virtualenvs are collected together in their own directory, separate to the projects you use them for. Docs on virtualenv alone tend to suggest nesting the project code within the virtualenv itself, but I prefer the approach of virtualenvwrapper.
The moral of the story is, all your problems are already solved, and all you need to do is locate your answers, out there, somewhere. :)
Last night I went to MPUG, the Melbourne Python users group. I have been on that mailing list for seemingly years, and it looks like now there will be an attempt to have regular meetings. Woot.
There was a very interesting talk by Martin Schweitzer called “Primetime Wordfinding” or “Elegant String Searches”. (The slides were posted to the mailing list.)
The basic problem is thus:
Given a set of letters and a dictionary, find all words that can be made from those letters.
The method that he outlines is wonderfully elegant, and will be especially appreciated by maths geeks. However, seeing dictionary = '/usr/share/dict/words' made me think, “Typical IR approach! Where’s the linguistics?”
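For anyone who hasn't seen the slides, the trick (going by the prime-product code I borrow further down) is roughly this: give each letter its own prime, multiply the primes for a word together, and a word can be made from your letters exactly when its product divides the product of your letters. What follows is my own reconstruction rather than Martin's code, and the word list is a tiny made-up stand-in for /usr/share/dict/words.

```python
# First 26 primes, one per letter a-z.
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
          53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]

def get_val(word):
    """Product of the primes assigned to each letter in word."""
    total = 1
    for ch in word.lower():
        total *= primes[ord(ch) - ord('a')]
    return total

def find_words(letters, words):
    """Return the words that can be spelled using only the given letters."""
    magic = get_val(letters)
    return [w for w in words if magic % get_val(w) == 0]

# Tiny stand-in word list; a real run would read a dictionary file.
words = ["cocoa", "late", "chat", "hotel", "choose", "tea"]
print(find_words("chocolate", words))
# ['cocoa', 'late', 'chat', 'hotel', 'tea']
```

Because prime factorisations are unique, the divisibility test also handles repeated letters: "cocoa" needs two c's and two o's, and "chocolate" supplies exactly that.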
It also made me wonder how many languages Linux ships word lists for. Apparently Ubuntu ships many varieties of English, Portuguese, Bulgarian, Catalan, Danish, Dutch, Finnish, Faroese, French, Galician, Italian, Norwegian, German, Polish, Spanish, Swedish and Ukrainian. So Europe has decent coverage, but the rest of the world, hmm…
So, how about this revised problem:
Given a set of letters and a language, find all words that can be made from those letters.
We don’t have a dictionary but we have a language, which means we have (whether we consciously realise it or not) the rules for
- how alphabetic letters map to phonemes (sound units)
- how phonemes can be combined to form syllables (the main concern)
- how syllables can be combined to form words.
I did a bit of looking to see if I could try and find a ready-made solution, and while it seems that syllable ‘parsing’ is a well-studied problem, syllable ‘generation’ is another matter.
Now this is going to be relatively tricky, because English doesn’t have good one-to-one correspondences between letters and phonemes.
So let’s hack some stuff together… as a first approximation, I’ll grab all the written examples from the Wikipedia articles on English phonology, English orthography and the IPA chart for English dialects.
>>> onsets = list("pbtdckgjfvszwmlnryh") + ["ch","th","sh"]
>>> onsets += ["pl","bl","cl","gl","pr","br","tr","dr","cr","gr","tw","dw","gu","qu"]
>>> onsets += ["fl","sl","fr","thr","shr","sw","thw","wh"]
>>> onsets += ["sp","st","sk"]
>>> onsets += ["sm","sn"]
>>> onsets += ["sph"]
>>> onsets += ["spl","spr","str","scl","scr","squ", "sc"]
>>> nuclei = ["a","e","i","o","u","ow","ou","ou","ie","igh","oi","eer","air","ee","ai"]
>>> nuclei += ["au","ea","ou","ai","ey","ei","er","ear","ir","oo","ou","igh","ough",
...     "y","oy","oa","ou","ow","ol","ar","ere","are","ear","or","ar","ore","oar","our",
...     "oor","ure","uer"]
>>> codas = ["lp","lb","lt","ld","lk","rp","rb","rt","rd","rk","rgue","lf","lve","lth","lse",
...     "lsh","lch","lge","rf","rve","rth","rce","rsh","rch","rge","lm","ln","rm","rn",
...     "rl","mp","nt","nd","nk","mph","mth","nth","nce","nze","nch","nge","ngth",
...     "ft","sp","st","sk","fth","pt","ct","pth","pse","ghth","tz","dth","dze","x","lpt",
...     "lfth","ltz","lst","lct","lx","rmth","mth","rpt","rpse","rtz","rst","rct","mpt",
...     "mpse","ndth","nct","nx","ngth","xth","xt"]
>>> final = ["s","ed"]
That’s pretty yuck. And I’m not at all sure about where some of those “r”s should go. It is a bit of a brute-force solution to the first problem above (mapping letters to phonemes). I would like to clean this up and somehow make sure it is complete.
Also, that “final” bit is not a linguistic thing, but it seems to me my codas are not accounting for plural words too well.
>>> # thankyou, martin!
>>> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
...           53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
>>> def prime_val(ch):
...     return primes[ord(ch.lower()) - ord('a')]
...
>>> def get_val(word):
...     total = 1
...     for ch in word:
...         total *= prime_val(ch)
...     return total
...
>>> magic = get_val("chocolate")
>>> nuclei_ok = [n for n in nuclei if magic % get_val(n) == 0]
>>> onsets_ok = [o for o in onsets if magic % get_val(o) == 0] + [""]
>>> codas_ok = [c for c in codas if magic % get_val(c) == 0] + [""]
>>> syllables = []
>>> for o in onsets_ok:
...     for n in nuclei_ok:
...         for c in codas_ok:
...             syllable = o + n + c
...             if magic % get_val(syllable) == 0:
...                 syllables.append(syllable)
...
>>> len(syllables)
172
>>> syllables
['talch', 'ta', 'telch', 'te', 'tolch', 'to', 'tealch', 'tea', 'toolch', 'too', 'toalch', 'toa', 'tol',
'calt', 'calth', 'calch', 'cact', 'calct', 'ca', 'celt', 'celth', 'celch', 'cect', 'celct', 'ce', 'colt',
'colth', 'colch', 'coct', 'colct', 'co', 'cealt', 'cealth', 'cealch', 'ceact', 'cealct', 'cea',
'coolt', 'coolth', 'coolch', 'cooct', 'coolct', 'coo', 'coalt', 'coalth', 'coalch', 'coact',
'coalct', 'coa', 'colct', 'col', 'lact', 'la', 'lect', 'le', 'loct', 'lo', 'leact', 'lea', 'looct', 'loo',
'loact', 'loa', 'halt', 'hact', 'halct', 'ha', 'helt', 'hect', 'helct', 'he', 'holt', 'hoct', 'holct',
'ho', 'healt', 'heact', 'healct', 'hea', 'hoolt', 'hooct', 'hoolct', 'hoo', 'hoalt', 'hoact',
'hoalct', 'hoa', 'holct', 'hol', 'chalt', 'chact', 'chalct', 'cha', 'chelt', 'chect', 'chelct',
'che', 'cholt', 'choct', 'cholct', 'cho', 'chealt', 'cheact', 'chealct', 'chea', 'choolt',
'chooct', 'choolct', 'choo', 'choalt', 'choact', 'choalct', 'choa', 'cholct', 'chol', 'tha',
'the', 'tho', 'thea', 'thoo', 'thoa', 'thol', 'clact', 'cla', 'clect', 'cle', 'cloct', 'clo', 'cleact',
'clea', 'clooct', 'cloo', 'cloact', 'cloa', 'alt', 'alth', 'alch', 'act', 'alct', 'a', 'elt', 'elth',
'elch', 'ect', 'elct', 'e', 'olt', 'olth', 'olch', 'oct', 'olct', 'o', 'ealt', 'ealth', 'ealch', 'eact',
'ealct', 'ea', 'oolt', 'oolth', 'oolch', 'ooct', 'oolct', 'oo', 'oalt', 'oalth', 'oalch', 'oact',
'oalct', 'oa', 'olct', 'ol']
Note that onsets and codas are optional, hence I add the empty string to those lists. (I forgot to factor in the “final” bit, although it doesn’t make any difference for the word “chocolate”.)
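If I did want to factor in the “final” bit, something like the following would probably do. It is a sketch, not what I actually ran: to keep it short and standalone it swaps the prime products for collections.Counter, which asks the same question as magic % get_val(x) == 0 (does the candidate use no more of each letter than we have?), and with_finals and can_make are names invented for the example.

```python
from collections import Counter

def can_make(candidate, letters):
    """True if candidate uses no letter more often than letters supplies it."""
    need, have = Counter(candidate), Counter(letters)
    return all(have[ch] >= n for ch, n in need.items())

final = ["s", "ed"]

def with_finals(syllables, letters):
    """Add syllable + suffix for each suffix the letters can still cover."""
    extended = list(syllables)
    for syllable in syllables:
        for f in final:
            if can_make(syllable + f, letters):
                extended.append(syllable + f)
    return extended

print(with_finals(["cho", "lat"], "chocolate"))  # no "s" or "d", so unchanged
print(with_finals(["tea", "pot"], "teapots"))
# ['tea', 'pot', 'teas', 'pots']
```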
OK so now I have my syllables. You should find that these are basically all pronounceable in English, although they may not be written in the standard way (for example, if “choct” were a valid word, I think it would be written as “chocked”; “ct” only seems to be a coda in a small number of words, like “tact”). And of course many of them are not valid as monosyllabic words.
Now, how can we combine them into multi-syllabic words? Well, there are some word-level rules, but mostly they seem more relevant to pronunciation. So we should be reasonably safe with just concatenating syllables.
>>> syll2 = []
>>> for syllable in syllables:
...     remaining = syllables[:]
...     remaining.remove(syllable)
...     for r in remaining:
...         combined = syllable + r
...         if magic % get_val(combined) == 0:
...             syll2.append(combined)
...
>>> len(syll2)
3382
And now…. wait for it…. the big moment has arrived!
>>> 'cocoa' in syll2
True
At the moment this is perhaps not markedly better than just generating every permutation of every length string of the letters in “chocolate”. But there you go… I call it “Dictionary-Free, Linguistically Motivated String Searches”. :)
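For comparison, the brute-force baseline I have in mind looks something like this. It is a sketch with a deliberately tiny input, and all_strings is just a name I made up for it.

```python
from itertools import permutations

def all_strings(letters):
    """Every distinct string, of every length, that uses each letter at most once."""
    seen = set()
    for n in range(1, len(letters) + 1):
        for p in permutations(letters, n):
            seen.add("".join(p))
    return seen

print(len(all_strings("tea")))
# 15
```

The set is there because repeated letters (the two c's and two o's in “chocolate”) would otherwise produce the same string more than once.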
I am going to ponder if there is a better way to implement this in Prolog. But for Python, is there any way you can use a regular expression for generation rather than parsing? A kind of “regular expression for production”?
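The closest thing I can see in the standard library is not a regex at all but itertools.product: treat each group of alternatives as a “slot” and generate every combination. A rough sketch, where generate is my own name for it and the three small lists are cut-down examples rather than the full inventories above:

```python
from itertools import product

def generate(*slots):
    """Yield every string formed by picking one alternative from each slot."""
    for parts in product(*slots):
        yield "".join(parts)

# Roughly the "production" reading of (ch|c|)(o|oa)(l|t|)
onsets = ["ch", "c", ""]
nuclei = ["o", "oa"]
codas = ["l", "t", ""]

results = list(generate(onsets, nuclei, codas))
print(len(results))   # 3 * 2 * 3 = 18 candidates
print(results[:3])
# ['chol', 'chot', 'cho']
```

This only covers fixed sequences of alternatives, so it is a long way short of real regular expressions (no repetition, for a start), but it is the same shape as the nested loops I used for the syllables.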