Articles tagged: python

Getting virtualenv(wrapper) and Fabric to play nice

525 days ago

Virtualenv is a great Python tool for isolating dependencies when developing a new project. And virtualenvwrapper is the convenient shell script that should be part of virtualenv by default, IMO. virtualenvwrapper lets you do things like:

mkvirtualenv --no-site-packages newproject
workon newproject

… and you’re done. Anything you install via pip after that will be confined to your virtualenv. Then when deployment time comes, you can run pip freeze > requirements.txt, your users can run pip install -r requirements.txt, and all is neat and tidy with the world.

If you are writing a web app, your users are probably your web server(s), and that is where Fabric comes into the mix. Fabric is designed to make deploying your web project a one-liner, and it’s pretty thrilling to use, too. Here is a typical Fabric task:

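from fabric.api import cd, env, require, run

def deploy_version(version):
    "Specify a specific version to be made live"
    require('path')
    env.version = version
    with cd('%(path)s' % env):
        run('rm releases/previous')
        run('mv releases/current releases/previous')
        # %(version)s, not $(version): the % operator ignores $(...),
        # and the shell would then treat it as command substitution
        run('ln -s %(version)s releases/current' % env)
    restart_webserver()  # another task defined elsewhere in the fabfile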

Nice, no? These slides are a good intro to Fabric as well.

The ‘with cd’ context manager is needed because Fabric doesn’t keep state between commands. Apparently each run command executes in its own fresh shell, almost as if it had its own SSH session. This is a problem if you need to source files, as you do when using virtualenv. This is hinted at (here, here, here, here) but not really explained clearly anywhere.
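You can see the same effect locally without Fabric. The sketch below is only an analogy, assuming a POSIX sh is available: it uses Python’s subprocess module (not Fabric) to start a fresh shell per call. A variable exported in one invocation is gone in the next, but survives when the commands are chained with && inside a single invocation, which is the trick the fabfile below relies on. The variable name DEMO_VENV is made up for the demo.

```python
import os
import subprocess

def sh(command):
    """Run one command in a brand-new /bin/sh process and return its stripped stdout."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# Make sure the demo variable is not inherited from this Python process.
os.environ.pop("DEMO_VENV", None)

# Two separate invocations: the export dies with the first shell.
sh("export DEMO_VENV=qwerty")
separate = sh('echo "$DEMO_VENV"')

# One invocation chained with &&: the second command sees the first one's state.
chained = sh('export DEMO_VENV=qwerty && echo "$DEMO_VENV"')

print(repr(separate))  # ''
print(repr(chained))   # 'qwerty'
```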

To get it working, this is what I ended up doing:

(on my dev machine)

.bash_profile

if [ "$USER" = "blaugher" ]; then
    export WORKON_HOME=/home/blaugher/virtualenvs
    source /usr/local/bin/virtualenvwrapper.sh
fi

fabfile.py

from fabric.api import cd, env, put, run, settings, sudo

# Assumes env.path, env.workon_home, env.project_name and env.release
# are set elsewhere in the fabfile.

def setup():
    """
    Setup a fresh virtualenv as well as a few useful directories, then run
    a full deployment
    """
    sudo('aptitude install -y python-setuptools apache2 libapache2-mod-wsgi')
    sudo('easy_install pip')
    sudo('pip install virtualenv')
    sudo('pip install virtualenvwrapper')
    # this .bash_profile sets WORKON_HOME and sources virtualenvwrapper.sh
    put('.bash_profile', '~/.bash_profile')
    run('mkdir -p %(workon_home)s' % env)
    with settings(warn_only=True):
        # just in case it already exists, let's ditch it
        run('rmvirtualenv %(project_name)s' % env)
    # options first, environment name last
    run('mkvirtualenv --no-site-packages %(project_name)s' % env)
    # [... plus other useful stuff.]


def install_requirements():
    "Install the required packages from the requirements file using pip"
    with cd('%(path)s' % env):
        # chain with && so pip runs in the same shell that ran workon
        run('workon %(project_name)s && pip install -r ./releases/%(release)s/requirements.txt' % env)

So, that’s reasonably nice. The .bash_profile is written that way because of this bug: somehow one of the virtualenvwrapper files ends up owned by root, which causes an IOError for non-root users. The $USER check means the virtualenvwrapper lines are skipped when a command runs as root via sudo. You could instead change the shell command used for sudo, but that would be pretty tedious: sudo and run both take a boolean shell option, and in env you can set the shell command to be used (by default /bin/bash -l -c), but there is no easy way to specify different shell commands for run and sudo.

Virtualenvwrapper recommends adding the ‘export’ and ‘source’ lines to your .bashrc. Bash reads .bash_profile for login shells and .bashrc for interactive non-login shells, so by adding the lines to .bash_profile instead, they are executed for login shells like Fabric’s /bin/bash -l, i.e. for all Fabric commands, and we don’t have to explicitly source any file in the fabfile. I thought this was a neat side-step of that problem. I’m not sure what the other implications of .bashrc vs .bash_profile are.

My next problem was a call like this:

run('mkvirtualenv %(project_name)s --no-site-packages' % env)

Fabric was complaining that it was getting back a return code of 1. This seemed odd as it looked like it was working. Even running it by hand still looked good:

blaugher@tardis:~$ mkvirtualenv qwerty --no-site-packages
New python executable in qwerty/bin/python
Installing distribute..................................
..........................................................
................................................................
.....................done.

But the exit status said otherwise:

blaugher@tardis:~$ echo $?
1

I had joined the mailing list and was preparing to write up my problem when I spotted this earlier reply:

Try reversing the order of env3 and --no-site-packages on the command line. mkvirtualenv expects the environment name to be the last argument.

!!!

Sure enough:

blaugher@tardis:~$ mkvirtualenv --no-site-packages qwerty
New python executable in qwerty/bin/python
Installing distribute........................................................
......................................................
...................................................................done.
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/predeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postdeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/preactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/get_env_details
(qwerty)blaugher@tardis:~$ 

That’s a little too subtle for my liking. Note, though, that as well as the extra user_scripts output, this time the virtualenv is actually activated, as it is supposed to be (the “(qwerty)” prefix tells you that you are working in a virtualenv).
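One way to make the argument order impossible to get wrong is to build the command in one place. This is a sketch only: mkvirtualenv_command is a hypothetical helper, not part of virtualenvwrapper or Fabric. It puts every option first and the environment name last, and refuses a name that looks like an option (the easiest way to spot swapped arguments).

```python
import shlex

def mkvirtualenv_command(name, *options):
    """Build a mkvirtualenv command line: options first, environment name last.

    Hypothetical helper for illustration; not part of virtualenvwrapper or Fabric.
    """
    if not name or name.startswith("-"):
        # Catches the swapped-arguments mistake described above.
        raise ValueError("environment name must be non-empty and must not look like an option")
    parts = ["mkvirtualenv", *options, name]
    # Quote each part so odd characters in a name can't break the shell command.
    return " ".join(shlex.quote(part) for part in parts)

print(mkvirtualenv_command("qwerty", "--no-site-packages"))
# mkvirtualenv --no-site-packages qwerty
```

In a fabfile this would be used as run(mkvirtualenv_command(env.project_name, "--no-site-packages")).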

A couple of other points. If Django is in your requirements.txt file rather than manually installed on your server, you will want to make sure your Django-specific Python calls look something like run('workon %(project_name)s && python manage.py syncdb' % env).

Making ‘workon’ into something more generically useful is obviously not too difficult. This example fabfile adds a method to do it. Apparently an upcoming version of fabric might have a command like prefix, which would work very well.
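Here is a sketch of what such a generic helper might look like. The name in_virtualenv is hypothetical, and it only builds the command string, so it runs without Fabric installed; in a fabfile you would pass its result to run(). For what it’s worth, Fabric 1.x does provide a prefix context manager that prepends a command plus && to every run/sudo inside the with block; that form is shown in comments only.

```python
import shlex

def in_virtualenv(project_name, command):
    """Return a shell command that activates a virtualenvwrapper env, then runs `command`.

    Hypothetical helper. Assumes `workon` is defined in the remote login shell
    (e.g. via the .bash_profile above) and that `command` is already valid shell.
    """
    return "workon %s && %s" % (shlex.quote(project_name), command)

# In a Fabric 1.x fabfile (not executed here):
#   run(in_virtualenv(env.project_name, "python manage.py syncdb"))
# or with Fabric's prefix context manager:
#   with prefix("workon %(project_name)s" % env):
#       run("python manage.py syncdb")

print(in_virtualenv("newproject", "python manage.py syncdb"))
# workon newproject && python manage.py syncdb
```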

Finally, using virtualenvwrapper means all your virtualenvs are collected together in their own directory, separate from the projects you use them for. Docs on virtualenv alone tend to suggest nesting the project code within the virtualenv itself, but I prefer the virtualenvwrapper approach.

The moral of the story is, all your problems are already solved, and all you need to do is locate your answers, out there, somewhere. :)

---

Language Sleuthing HOWTO with NLTK

551 days ago

Wow. There are lots of things I could and should have blogged about, like Australia’s first PyCon, like Wikimania in Poland, like LCA papers… like SFD… like LUV elections… like… ! But for now I will just post the slides from a talk I gave at the Linux Users of Victoria August meeting, called “Language Sleuthing HOWTO: Discovering Interesting Things with Python’s Natural Language Tool Kit”.

Interesting things if you are a member of luv-main, anyway. :)

Slides:

I know they aren’t too comprehensible without context. Maybe I will work them into a few longer-form blog posts…?

---

Job-finding

792 days ago

Well, it looks like on Tuesday I will be officially leaving the ranks of the unemployed. As fun as it is to have brunch three times a week in all kinds of interesting cafes, get all those life administration tasks sorted, and swan about at conferences, the fear of potential nothingness is a stressful thing to have on your back. Even for someone in as good a position as I have been. The phrase “job security” seems to take on a certain literalness.

In total I spent about three months actively job-seeking, and five weeks unemployed, which seems to suggest you should start looking for a new job about a month before you realise you want to. As well as putting the IT job feeds from Seek etc. into my feed reader, I “activated my networks”, so to speak (that is, told my friends, acquaintances at meetups, and my micro/blog). In the end there were four options that I seriously considered. For someone who does not have “X years of technology Y”, Seek proved to be useless. But if you are a Java/.NET/COBOL or even a PHP or Visual Basic person, you will find plenty to keep you busy.

Option #1 was a small business in Carlton North with a small suite of web-based applications for a particular retail sector. I met their business analyst when I decided to attend The Hive for the first time that week, and he struck up a conversation with me. I mentioned that I was looking for work as a Python programmer and he did a double-take, then told me his company was looking for Python people. (Actually he told me they were looking for Java people to convert their Python applications to Java, which I light-heartedly protested about. They wanted to do that because of trouble finding Python people, though.) So it was a great chance meeting, and I was not too surprised to get a follow-up email later that week.

While they seemed like good people, I think the role would have stretched me a little more than I am ready for right now. I mean some stretching is good, but then there is biting off more than you can chew and setting yourself up for failure. So I had the novelty of my first experience of declining a job offer.

Option #2 arose via someone responding to my blog post, which is pretty sweet. Some weeks passed and eventually a phone-based technical interview was set up, which I kinda bombed. It was one of those “impossible to prepare for” type tests and indeed I felt unprepared. So I didn’t hear back from them but it’s not so bad. I would have had to move to Sydney anyway. :P

Option #3 came about via a good friend of mine. It was kind of like, “Help me brainstorm where I could work.” “What about my workplace?” Her experience of working there was certainly a ringing endorsement, and I think she did the same for me to them. While also a small business, I was impressed by their thoroughness in getting the basics right: interview, technical test, checking references (something Option #1 didn’t fare well at). I would probably have done C# and maybe some IronPython. Broadening my skills in a commercially recognised way would certainly be no bad move. And I would have been very happy to accept their offer, were it not for…

Option #4, in the public sector, which I actually found out about from a tech-usergroup acquaintance posting to some mailing lists I’m on. The main things that drew me to this position were the fact that it is Python and deals with language data. That’s pretty much my dream combo at the moment. The interview went pretty well, despite a bumpy start and (in hindsight) a completely wrong answer, nonetheless delivered with conviction and seemingly accepted in the same spirit. I noticed parallels with my previous work that I hadn’t seen before, and was able to ‘riff’ on those for a bit. I did feel ‘in my element’ enough to give off some confidence, and I’m sure that helped a lot.

And so, I start on Tuesday. :)

---

Finding COCOA in CHOCOLATE without a dictionary?

907 days ago

Last night I went to MPUG, the Melbourne Python users group. I have been on that mailing list for seemingly years, and it looks like now there will be an attempt to have regular meetings. Woot.

There was a very interesting talk by Martin Schweitzer called “Primetime Wordfinding” or “Elegant String Searches”. (The slides were posted to the mailing list.)

The basic problem is thus:

Given a set of letters and a dictionary, find all words that can be made from those letters.

The method that he outlines is wonderfully elegant, and will be especially appreciated by maths geeks. However, seeing dictionary = '/usr/share/dict/words' made me think, “Typical IR approach! Where’s the linguistics?”
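For readers who missed the talk, here is the prime-number trick as the code later in this post uses it. Give each letter its own prime and map a word to the product of its letters’ primes. Because factorisation into primes is unique, a word can be spelled from a pile of letters exactly when its product divides the pile’s product. A minimal sketch, assuming lowercase a–z input only, with a tiny hypothetical word list standing in for /usr/share/dict/words:

```python
from math import prod
from string import ascii_lowercase

# One distinct prime per letter a..z (the first 26 primes).
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
          53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]
LETTER_PRIME = dict(zip(ascii_lowercase, PRIMES))

def signature(word):
    """Product of the letters' primes; anagrams share a signature (unique factorisation)."""
    return prod(LETTER_PRIME[ch] for ch in word.lower())

def can_make(word, letters):
    """True if `word` uses each letter no more often than `letters` supplies it."""
    return signature(letters) % signature(word) == 0

# Hypothetical sample standing in for a real word list.
words = ["cocoa", "chocolate", "coat", "cloth", "latch", "cocoon", "teach"]
print([w for w in words if can_make(w, "chocolate")])
# ['cocoa', 'chocolate', 'coat', 'cloth', 'latch', 'teach']
```

"cocoon" is rejected because it needs three o’s (and an n), and the divisibility test notices that without any explicit letter counting.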

It also made me wonder how many languages Linux ships word lists for. Apparently Ubuntu ships many varieties of English, Portuguese, Bulgarian, Catalan, Danish, Dutch, Finnish, Faroese, French, Galician, Italian, Norwegian, German, Polish, Spanish, Swedish and Ukrainian. So Europe has decent coverage, but the rest of the world, hmm…

So, how about this revised problem:

Given a set of letters and a language, find all words that can be made from those letters.

We don’t have a dictionary but we have a language, which means we have (whether we consciously realise or not) the rules for

  1. how alphabetic letters map to phonemes (sound units)
  2. how phonemes can be combined to form syllables (the main concern)
  3. how syllables can be combined to form words.

I did a bit of looking to see if I could try and find a ready-made solution, and while it seems that syllable ‘parsing’ is a well-studied problem, syllable ‘generation’ is another matter.

Now this is going to be relatively tricky, because English doesn’t have good one-to-one correspondences between letters and phonemes.

So let’s hack some stuff together… as a first approximation, I’ll grab all the written examples from the Wikipedia articles on English phonology, English orthography and the IPA chart for English dialects.

>>> onsets = list("pbtdckgjfvszwmlnryh") + ["ch","th","sh"]
>>> onsets += ["pl","bl","cl","gl","pr","br","tr","dr","cr","gr","tw","dw","gu","qu"]
>>> onsets += ["fl","sl","fr","thr","shr","sw","thw","wh"]
>>> onsets += ["sp","st","sk"]
>>> onsets += ["sm","sn"]
>>> onsets += ["sph"]
>>> onsets += ["spl","spr","str","scl","scr","squ", "sc"]
>>> nuclei = ["a","e","i","o","u","ow","ou","ou","ie","igh","oi","eer","air","ee","ai"]
>>> nuclei += ["au","ea","ou","ai","ey","ei","er","ear","ir","oo","ou","igh","ough",
...            "y","oy","oa","ou","ow","ol","ar","ere","are","ear","or","ar","ore","oar","our",
...            "oor","ure","uer"]
>>> codas = ["lp","lb","lt","ld","lk","rp","rb","rt","rd","rk","rgue","lf","lve","lth","lse",
...          "lsh","lch","lge","rf","rve","rth","rce","rsh","rch","rge","lm","ln","rm","rn",
...          "rl","mp","nt","nd","nk","mph","mth","nth","nce","nze","nch","nge","ngth",
...          "ft","sp","st","sk","fth","pt","ct","pth","pse","ghth","tz","dth","dze","x","lpt",
...          "lfth","ltz","lst","lct","lx","rmth","mth","rpt","rpse","rtz","rst","rct","mpt",
...          "mpse","ndth","nct","nx","ngth","xth","xt"]
>>> final = ["s","ed"]

That’s pretty yuck. And I’m not at all sure where some of those “r”s should go. It is a bit of a brute-force solution for problem #1 above. I would like to clean this up and somehow make sure it is complete.

Also, that “final” bit is not a linguistic thing, but it seems to me my codas are not accounting for plural words too well.

>>> # thankyou, martin!
>>> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
...           53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
>>> def prime_val(ch):
...     return primes[ord(ch.lower()) - ord('a')]
...
>>> def get_val(word):
...     total = 1
...     for ch in word:
...         total *= prime_val(ch)
...     return total
...
>>> magic = get_val("chocolate")
>>> # keep only the pieces whose letters all fit inside "chocolate"
>>> nuclei_ok = [n for n in nuclei if magic % get_val(n) == 0]
>>> onsets_ok = [o for o in onsets if magic % get_val(o) == 0] + [""]
>>> codas_ok = [c for c in codas if magic % get_val(c) == 0] + [""]
>>> syllables = []
>>> for o in onsets_ok:
...     for n in nuclei_ok:
...         for c in codas_ok:
...             syllable = o + n + c
...             if magic % get_val(syllable) == 0:
...                 syllables.append(syllable)
...
>>> len(syllables)
172
>>> syllables
['talch', 'ta', 'telch', 'te', 'tolch', 'to', 'tealch', 'tea', 'toolch', 'too', 'toalch', 'toa', 'tol', 
'calt', 'calth', 'calch', 'cact', 'calct', 'ca', 'celt', 'celth', 'celch', 'cect', 'celct', 'ce', 'colt', 
'colth', 'colch', 'coct', 'colct', 'co', 'cealt', 'cealth', 'cealch', 'ceact', 'cealct', 'cea', 
'coolt', 'coolth', 'coolch', 'cooct', 'coolct', 'coo', 'coalt', 'coalth', 'coalch', 'coact', 
'coalct', 'coa', 'colct', 'col', 'lact', 'la', 'lect', 'le', 'loct', 'lo', 'leact', 'lea', 'looct', 'loo', 
'loact', 'loa', 'halt', 'hact', 'halct', 'ha', 'helt', 'hect', 'helct', 'he', 'holt', 'hoct', 'holct', 
'ho', 'healt', 'heact', 'healct', 'hea', 'hoolt', 'hooct', 'hoolct', 'hoo', 'hoalt', 'hoact', 
'hoalct', 'hoa', 'holct', 'hol', 'chalt', 'chact', 'chalct', 'cha', 'chelt', 'chect', 'chelct', 
'che', 'cholt', 'choct', 'cholct', 'cho', 'chealt', 'cheact', 'chealct', 'chea', 'choolt', 
'chooct', 'choolct', 'choo', 'choalt', 'choact', 'choalct', 'choa', 'cholct', 'chol', 'tha', 
'the', 'tho', 'thea', 'thoo', 'thoa', 'thol', 'clact', 'cla', 'clect', 'cle', 'cloct', 'clo', 'cleact', 
'clea', 'clooct', 'cloo', 'cloact', 'cloa', 'alt', 'alth', 'alch', 'act', 'alct', 'a', 'elt', 'elth', 
'elch', 'ect', 'elct', 'e', 'olt', 'olth', 'olch', 'oct', 'olct', 'o', 'ealt', 'ealth', 'ealch', 'eact', 
'ealct', 'ea', 'oolt', 'oolth', 'oolch', 'ooct', 'oolct', 'oo', 'oalt', 'oalth', 'oalch', 'oact', 
'oalct', 'oa', 'olct', 'ol']

Note that onsets and codas are optional, hence I add the empty string to those lists. (I forgot to factor in the “final” bit, although it doesn’t make any difference for the word “chocolate”.)
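On the cleanup front, one cheap fix: nuclei repeats several entries (“ou” five times, plus “ow”, “igh”, “ai”, “ear” and “ar” twice each), and the same syllable can be built in two ways (“colct” is both c+o+lct and c+ol+ct, so it shows up more than once in the output above). A sketch using dict.fromkeys, which drops repeats while keeping first-seen order (dicts preserve insertion order in Python 3.7+); dedupe is a hypothetical helper name:

```python
# Nucleus spellings as listed above, duplicates and all.
nuclei = ["a","e","i","o","u","ow","ou","ou","ie","igh","oi","eer","air","ee","ai"]
nuclei += ["au","ea","ou","ai","ey","ei","er","ear","ir","oo","ou","igh","ough",
           "y","oy","oa","ou","ow","ol","ar","ere","are","ear","or","ar","ore","oar","our",
           "oor","ure","uer"]

def dedupe(items):
    """Drop repeats, keeping the first occurrence of each item in its original order."""
    return list(dict.fromkeys(items))

unique_nuclei = dedupe(nuclei)
print(len(nuclei), len(unique_nuclei))  # 46 37

# The same idiom tidies the generated syllables:
print(dedupe(["co", "colct", "col", "colct"]))  # ['co', 'colct', 'col']
```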

OK, so now I have my syllables. You should find that these are basically all pronounceable in English, although they may not be spelled the standard way (for example, if “choct” were a valid word, I think it would be written as “chocked”; “ct” only seems to get to be a coda in a small number of words, like “tact”). And of course many of them are not valid as monosyllabic words.

Now, how can we combine them into multi-syllabic words? Well, there are some word-level rules, but mostly they seem more relevant to pronunciation. So we should be reasonably safe with just concatenating syllables.

>>> syll2 = []
>>> for syllable in syllables:
...     remaining = syllables[:]
...     remaining.remove(syllable)
...     for r in remaining:
...             combined = syllable + r
...             if magic % get_val(combined) == 0:
...                     syll2.append(combined)
...
>>> len(syll2)
3382

And now… wait for it… the big moment has arrived!

>>> 'cocoa' in syll2
True

At the moment this is perhaps not markedly better than just generating every permutation, of every length, of the letters in “chocolate”. But there you go… I call it “Dictionary-Free, Linguistically Motivated String Searches”. :)
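For comparison, the brute-force baseline is short to write. This sketch enumerates every distinct arrangement of up to five of the letters in “chocolate”. The cap of five is an arbitrary choice that keeps it quick and still reaches “cocoa”. itertools.permutations treats the two c’s and two o’s as different positions, so a set removes the repeats; arrangements is a hypothetical helper name.

```python
from itertools import permutations

def arrangements(letters, max_len):
    """All distinct strings of length 1..max_len using each supplied letter at most once."""
    found = set()
    for n in range(1, max_len + 1):
        for combo in permutations(letters, n):
            found.add("".join(combo))
    return found

candidates = arrangements("chocolate", 5)
print(len(candidates))        # distinct strings of length <= 5
print("cocoa" in candidates)  # True
```

Every one of these still needs filtering by something, whereas the syllable approach only ever builds strings that look pronounceable.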

I am going to ponder whether there is a better way to implement this in Prolog. But for Python, is there any way you can use a regular expression for generation rather than parsing? A kind of “regular expression for production”?
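A partial answer, for patterns without repetition operators (no * or +): such a “production” regex is just a Cartesian product over its alternatives, which is what the three nested loops above compute. A sketch with itertools.product in place of the loops. The small onset/nucleus/coda lists are hypothetical, and the empty string plays the role of the ? (optional) operator.

```python
from itertools import product

def expand(*slots):
    """Yield every string formed by picking one alternative from each slot, in order.

    Roughly the language of a regex like (ch|c)?(o|oa)(t|lt)?; no * or + support.
    """
    for parts in product(*slots):
        yield "".join(parts)

onsets = ["ch", "c", ""]  # "" makes the onset optional
nuclei = ["o", "oa"]
codas = ["t", "lt", ""]   # "" makes the coda optional

generated = list(expand(onsets, nuclei, codas))
print(len(generated))  # 18
print(sorted(generated))
```

The divisibility filter from earlier could then be applied to each generated string, exactly as the loop version does.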

---