Articles tagged: python

New project problems

1185 days ago

As I mentioned, I had the idea for CFFSW some six months ago. Part of what took me so long to make any progress was a giant amount of faffing about that can only come from knowing enough to be dangerous. In the end, I settled on Wordpress as a platform, knowing it was the one that I would be least tempted to divert my time from building the content to building the platform.

There are a few reasons for that. The first and most obvious being that Wordpress is extremely mature, thus making it far less likely that I should ever need to “go under the hood”. A second one is that it’s PHP, a language I have no desire to develop any competency in.

Something that made me lean away from Wordpress is my experience a couple of years ago, when Wordpress installs were getting exploited left right and centre (especially on Dreamhost). Trying to clean those up was a painful experience, but Dreamhost have made it a lot easier to upgrade Wordpress installs almost immediately (I either get an email with a link to click, or I get an email saying they’ve done it for me) which helps. I tidied up my Dreamhost practices a bit by having only one domain per user (here I mean unix user, I typically only deal with these when installing new things, thereafter most things have a web interface) and also enabling ‘Enhanced Security’ for each of those users (it boggled my mind to learn that this was not the case already). Although Dreamhost claim to enable this by default now, if you make a new user at the time of making a new domain (probably the most common use case) it’s not enabled. So much for that. I digress…

My first iteration of CFFSW actually involved a concerted effort in investigating various Python static-website generator options, and I even chose one (Nikola) and built a first draft. I was quite happy with it: it was feature-enough-ful, looked shiny and new enough (thanks Bootstrap), the author/mailing list was responsive, I could write entries in Textile markup (same as Textpattern, which this blog uses) and hey, static sites do not get hacked! I thought I was golden.

But then… I put it down for four months, and when I picked it back up, a huge amount of development had passed. I discovered it is in fact possible for a project to be too active (or maybe, not yet stable enough).

And I remembered how annoying it is to install stuff from source on every computer I use in order to update a blog, which turns out to be at least four. Web interfaces do have their convenience.

However, I have an idea (which I haven’t tried out yet). Github lets you edit/add new files via the web interface, and they do a formatting preview for Textile and other markups. So I can basically use Github as my web interface to updating my blog. I just need to set up some mechanism that rebuilds/reuploads the pages when new commits arrive. (I can accept that requiring a src clone, because it should rarely need updating.) NB: if anyone can point me to some scripts/projects along this line, please do!

Finally, why not more Textpattern? I am quite used to it, but there are several factors against it:

  • Security: Requires more work to update, which means I’m more likely to leave it out of date for longer (FTPing files, Wordpress has spoiled me)
  • Far harder to change themes (which is why this blog still looks how I felt c2008)
  • PHP

If I use a Python static site generator I can do a little platform-building when I feel the urge. It never goes away completely. :)


---

My First Patch

1315 days ago

This happened a little while ago but I didn’t get around to writing about it yet: I added a feature to py.test that is available as of release 2.4.0. :)

I have dabbled in dozens of open source projects, which might extend as far as filing bugs for a few handfuls. But I have rarely been motivated enough to dive in and figure out what was going on and add a new feature or fix a bug that was annoying me. I guess in the case of py.test I use it so heavily at work that I was “itchy” enough to really want to “scratch” it.

The problem – it is very easy to parametrize tests in py.test (feed different inputs into the same test), which is very useful for test isolation (ideally one assert statement per test) without heaps of repeated code. That’s all great, but there is no easy way to mix passing tests and xfail tests. Xfail means “expected to fail”, and this is a powerful way of writing “demonstration tests” for bugs that you are aware of but haven’t yet fixed. Yep, test driven development!

One way to do it could be to simply copy the test and have a version of it just for xfail cases. However if our test function’s contents are more complicated, this is obviously going to be bad repetition liable to fall out of date. With my patch you can now apply the marker directly to the tuple which has the parametrized values:
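A minimal sketch of what this looks like, assuming a recent pytest: the 2.4-era spelling wrapped the tuple itself, as in pytest.mark.xfail((6, 42)), while later releases attach the mark with pytest.param instead. The increment function and the test values are made up for illustration.

```python
import pytest


def increment(n):
    """Toy function under test (hypothetical, for illustration only)."""
    return n + 1


@pytest.mark.parametrize(
    ("value", "expected"),
    [
        (1, 2),
        (3, 4),
        # Deliberately wrong expectation, marked as an expected failure so the
        # passing cases and the "demonstration" case share one test body.
        pytest.param(6, 42, marks=pytest.mark.xfail(reason="demonstrates a known bug")),
    ],
)
def test_increment(value, expected):
    assert increment(value) == expected
```

Run under pytest, the first two cases should be reported as passes and the third as an expected failure, all from a single test function.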

Incidentally you can apply any marker, not just xfail. At work we use marks to link tests to issues in our issue tracker (essentially test metadata), and this would work here too.

As well as enjoying using py.test I like the dev community too. The founder Holger Krekel is undoubtedly a very clever guy (he founded and co-developed PyPy) and a good project leader, exactly what you would want in a BDFL. If you’re not using py.test for testing in Python – why not? :)


---

Using tricky key functions for sort/min/max in Python

1335 days ago

After reading another exhortation for developers to blog (they used to be way more common…) I was shamed enough into writing up something I have enjoyed working on lately – key functions for sort/min/max in Python. Doesn’t sound that exciting but it can be a powerful technique and using key functions encourages you to write better code than trawling through iterables yourself. Everything I have read about key functions is super basic. So maybe this is sorting 201. Have I missed some great resources that cover this kind of thing? Let me know.

I started writing a blog post and then it turned out to be way easier to write the whole thing as an IPython Notebook. So it’s committed in a Github gist, and you can easily view it online via the notebook viewer. I wish I could embed it here but there doesn’t seem to be a way to do that. So – go read this thing I wrote. Corrections etc welcome.
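For a small taste of the idea (a sketch that is not taken from the notebook; the data and names are made up), a key function can return a tuple to express "sort by this, then by that", and the same key works for min and max:

```python
people = [
    {"name": "Ada", "age": 36},
    {"name": "bob", "age": 25},
    {"name": "Cy", "age": 36},
]

# Oldest first; ties broken by case-insensitive name.
# Negating the age reverses just that part of the ordering.
ordered = sorted(people, key=lambda p: (-p["age"], p["name"].lower()))
names = [p["name"] for p in ordered]

# min() and max() take the same kind of key function.
youngest = min(people, key=lambda p: p["age"])

print(names)             # ['Ada', 'Cy', 'bob']
print(youngest["name"])  # bob
```

The point is that the comparison logic lives in one small function instead of a hand-written loop that tracks the "best so far".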


---

PyCon AU 2013 highlights

1409 days ago

(15/7 Edited to add links to DjangoCon videos.)

Last week PyCon AU 2013 began and ended in Hobart, the fourth such conference, and a new high. I have attended it each year so far and it has really gone from strength to strength, with sustainable growth and a markedly friendly and welcoming atmosphere, thanks in major part to Christopher Neugebauer who has been the lead steward of the conference for its two year stint in Hobart.

Friday was miniconference day and I sat in on DjangoCon AU, which I got a lot more out of than I expected, considering I am only a casual user of Django at best.

Porting Django apps to Python 3 (Jacob Kaplan-Moss; slides) had a lot of excellent advice that was not at all Django-specific. In his words, one of the benefits of writing Python 3 is that Unicode now “fails early” – if you’re getting it wrong, you’ll find out when you write the code, not 6 months after it’s been deployed and you get your first customer whose name has a diacritic. He talked about how to go about a “single source” approach – one set of code that runs under both Python 2 and 3. Apparently it is possible, with the help of a library called six. This seems like a more sane approach than running 2to3 and ending up having to maintain 2 codebases that slowly diverge.

Secrets of the testing masters (Russell Keith-Magee; slides) advocated for the use of factory_boy, which looks like a good way of avoiding using fixtures (which are hard to maintain) and also avoiding boilerplate in tests. It was originally written to support Django test suites but it also supports SQLAlchemy-driven projects.

Core Developer Panel – I recommend giving this a watch. They seem like a really thoughtful bunch of people; if I was looking for a web framework community to join I would feel really good about making it Django. I was struck by one of the speakers saying that although he was happy Django had brought people to Python, he was disappointed that some people apparently consider themselves “Django developers” rather than just Python developers, just as some companies advertise for “Django developers” rather than something more broad – I think it is a great point and an attitude that points to a healthy community.

On to the main conf. Nobody Expects the Python Packaging Authority (Nick Coghlan; src) was a bit of a history lesson but also, importantly, offered hope for the future. pip 1.4+ will support “wheels”, as an alternative to binary eggs, which is I think the main reason pip has not yet completely displaced easy_install. An “official” guide for how to do packaging is in the works – watch this space. PEP 439 plans to add pip to the standard library – at least enough so that pip can bootstrap itself. Brilliant. Among the questions, tip of the week was to set the environment variable PIP_DOWNLOAD_CACHE so that if you install the same packages into multiple virtualenvs, you won’t need to download each one separately. Win!

Building secure web apps: Python vs the OWASP Top 10 (Jacob Kaplan-Moss; slides) was a super helpful reference comparing the top 10 security risks in web apps against Django, Flask and Pyramid. (Here’s an endorsement – “Flask is perfect for slide-driven development”)

Software Carpentry arrives Down Under! (Damien Irving) was about bringing software carpentry workshops to Australia. These are for scientists who need/want to improve their programming practices, like using version control, how to design their software and test it. Such an important idea.

Using Cython for distributed-multiprocess steganographic md5sum-collision generation. For… reasons. (Tom Eastman) was a fun look at how to solve an unimportant problem to confound his colleagues with a text adventure game in his company’s pastebin. Yep. Reasons. :)

Modern scientific computing and big data analytics in Python (Ed Schofield) – this was a tutorial so it is longer, but a comprehensive overview of libraries like numpy, scipy, scikit-learn, ipython parallel, map-reduce options and many more. Pandas looks like the thing to use for reading large CSV-like data sets. It’s a better option than numpy because you can index by a label/name rather than an integer. It also comes ready with .plot functions which produce matplotlib graphs – nice!!

Tinkering with Tkinter (Russell Keith-Magee, slides) was a bit more than the title suggested – as well as a new look at Tkinter, he introduced his proof-of-concept project cricket which is a GUI for running tests and exploring the results (initially only Django, but easy to add others). He argued that we developers are still using tools with a pretty crappy UX for no particularly good reason, despite decades of research showing the benefit of a good interface. I have to agree on the test running front – py.test output is better than average, simply by the use of colour (radical!) – but even so if I have more than about 3 test failures I often paste the output into a text file just so I can scroll through it and don’t miss anything. So I will definitely look at seeing if this can be adapted to py.test and adopting it.

My big gay adventure. Making, releasing and selling an indie game made in python. (Luke Miller) – the story of creating and selling the point-and-click adventure game My ex-boyfriend the space tyrant. If you only watch one talk from PyCon, make it this one. Excellent presenter, covered so many different perspectives in his short 25 minutes – now that’s how you give a talk. From the process of designing and writing a game to marketing it, this was funny, insightful and interesting. Despite not being the target audience in any way at all I think I’m going to sit down with a friend and play it because it looks like a bunch of fun.

Lightning talks on Sunday had two great ones – check out pip install python-nation by Jacob Kaplan-Moss (NB: watch this before running the command :P). The other one I liked was by Duckie introducing CompCon, a conference for Australian computing students, but what I really liked about it was the impromptu interrupt for a brief critique of conference presentation styles.

Whew! My to-watch list is still long:

Yeah, it’s a long list… aside from the first keynote, I didn’t miss any sessions. That just tells you how good the program was!


---

OpenTechSchool Python tutorials begin in Melbourne

1544 days ago

Last Saturday I spent a warm afternoon at the Electron Workshop coworking space in North Melbourne, volunteering as a Python tutor for the first OpenTechSchool workshop in Melbourne. About 25 students came and there were about 10 tutors. It was a lot of fun!

Does this look familiar? Before the workshop started I reminisced with other tutors, fond memories of using Logo in primary school. I used it in grade 4, I’m pretty sure, in a Victorian primary school. That and “Where in the world is Carmen Sandiego?” are all I remember of computers before age 10.

Maybe it’s just nostalgia speaking, but it seems incredibly sad that kids at school today won’t have fond memories of using Logo. OK so Logo looks pretty old-school now. Why is Scratch not its replacement?

Anyway I digress. Python has a Turtle module that lets you relive those glory days in a whitespace sensitive context. My pro-tip from the weekend is: don’t try to use it with IDLE on a Mac, stick to running it from a Python shell in the plain terminal.

OpenTechSchool have their tutorial notes published on github under the CC-BY-SA license, which is pretty great, so you can feel free to learn by yourself at home – but that’s not as much fun as coming to a workshop, right? :)

It amused me to see that one of the OTS team is Duana Stanley, who I knew through Girl Geek Dinner / MXUG things when she was living in Melbourne and working at ThoughtWorks. Small world…

Talking to the attendees about why they wanted to learn programming and how they heard about the event was really interesting. One just found it as a randomly advertised event whilst browsing Meetup! If you know someone who would be interested in this, make sure they join the OpenTechSchool Melbourne Meetup group or the Melbourne Python users group mailing list to hear about future events.


---

The progressive speaking stack, in Python

1573 days ago

In the free software activism BoF at LCA this evening, Sky raised the idea of using a progressive speaking stack. This is something that came out of Occupy Wall Street:

Occupy Wall Street’s General Assembly operates under a revolutionary “progressive stack.” A normal “stack” means those who wish to speak get in line. A progressive stack encourages women and traditionally marginalized groups speak before men, especially white men. This is something that has been in place since the beginning, it is necessary, and it is important.

(I can’t help noting that this is a queue rather than a stack.) Anyway for lulz I decided to implement it in Python.

>>> from progressive import Speaker, ProgressiveStack
>>> sortMethod = lambda speaker: speaker.privilege()
>>> s = ProgressiveStack(sortMethod)
>>> A = Speaker('A')
>>> B = Speaker('B')
>>> C = Speaker('C')
>>> D = Speaker('D')
>>> s.add(A)
>>> s.add(B)
>>> print(s)
ProgressiveStack<[Speaker<A>,Speaker<B>]>
>>> speaker = s.next()
>>> speaker.speak()
'AAA!'
>>> print(s)
ProgressiveStack<[Speaker<B>]>
>>> s.add(C)
>>> speaker = s.next()
>>> speaker.speak()
'BBB!'
>>> s.add(A)  # this person obviously has a lot to say...
>>> print(s)
ProgressiveStack<[Speaker<C>,Speaker<A>]>
>>> s.add(D)
>>> print(s)
ProgressiveStack<[Speaker<C>,Speaker<D>,Speaker<A>]>

Note D has moved ahead of A, because A has already spoken.
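The progressive module itself is not shown here, so the following is only a guess at a minimal implementation that reproduces the session above under Python 3. It assumes "privilege" is simply the number of times a speaker has already spoken, and it relies on list.sort being stable so that equally privileged speakers keep their arrival order.

```python
class Speaker:
    def __init__(self, name):
        self.name = name
        self.times_spoken = 0  # assumption: privilege grows each time you speak

    def speak(self):
        self.times_spoken += 1
        return self.name * 3 + "!"

    def privilege(self):
        return self.times_spoken

    def __repr__(self):
        return "Speaker<%s>" % self.name


class ProgressiveStack:
    def __init__(self, sort_key):
        self.sort_key = sort_key
        self.waiting = []

    def add(self, speaker):
        self.waiting.append(speaker)
        # Stable sort: lower privilege first, arrival order preserved on ties.
        self.waiting.sort(key=self.sort_key)

    def next(self):
        # Take from the front, so this really is a (priority) queue.
        return self.waiting.pop(0)

    def __str__(self):
        return "ProgressiveStack<[%s]>" % ",".join(repr(s) for s in self.waiting)
```

Re-sorting on every add is wasteful for a long speaking list; a heap keyed on (privilege, arrival number) would be the usual alternative, but for a room full of people the simple version is easier to read.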

Enjoy. :)


---

My talk at PyCon AU 2012 - "Funcargs and other fun with pytest"

1733 days ago

(What’s six months between friends…)

This last weekend I went down to Hobart for the third Australian PyCon conference. The first two were in Sydney, and the next one will also be in Hobart. I had a ball! I will hopefully revive this blog a bit more to write about other aspects of the conference but first up: the talk I gave, which was about the testing library pytest.

Pytest is a mature and comprehensive testing suite for Python projects, but it can be a little intimidating for newcomers. Where do these mysterious funcargs come from, how do parametrised tests work, and where are my xUnit-style setUp and tearDown methods?

Pytest lives by “convention over configuration” – which is great once you know what the conventions are. This talk will look at real examples of pytest in use, emphasising the features that differentiate it from nose.

Video:

I had fun picking out the comics – they are from comically vintage.

Slides, code.


---

Getting virtualenv(wrapper) and Fabric to play nice

2457 days ago

Virtualenv is a great Python tool for isolating dependencies when developing a new project. And virtualenvwrapper is the convenient shell script that should be part of virtualenv by default, IMO. virtualenvwrapper lets you do things like:

mkvirtualenv --no-site-packages newproject
workon newproject

… and you’re done. Anything you install via pip, after that, will be confined to your virtualenv. Then when deployment time comes, you can go pip freeze > requirements.txt, your users can go pip install -r requirements.txt and all is neat and tidy with the world.

If you are writing a web app your users are probably your web server(s). Then Fabric comes into the mix. Fabric is designed to make deploying your web project a one-liner. It’s pretty thrilling to use too. Here is a typical fabric command:

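from fabric.api import cd, env, require, run  # Fabric 1.x API

def deploy_version(version):
    "Specify a specific version to be made live"
    require('path')
    env.version = version
    with cd('%(path)s' % env):
        run('rm releases/previous')
        run('mv releases/current releases/previous')
        # %(version)s is filled in from env, like %(path)s above
        run('ln -s %(version)s releases/current' % env)
    restart_webserver()  # assumed to be defined elsewhere in the fabfile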

Nice, no? These slides are a nice intro to fabric as well.

The ‘with cd’ context manager is needed because Fabric doesn’t keep ‘state’: apparently each run command lives in its own ssh session, just about. This is a problem if you need to source files, as when using virtualenv. This is hinted at (here, here, here, here) but not really explained clearly anywhere.
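The same effect can be seen locally without Fabric at all. This is a rough sketch that uses subprocess as a stand-in for Fabric's one-shell-per-command behaviour; the run helper here is a hypothetical local imitation, not Fabric's run, and it assumes a Unix-like system with /bin/sh.

```python
import subprocess


def run(command):
    """Run command in a brand-new shell, roughly like one Fabric run() call."""
    completed = subprocess.run(
        ["/bin/sh", "-c", command],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


run("cd /")                   # the directory change dies with this shell
separate = run("pwd")         # so this reports the original working directory
chained = run("cd / && pwd")  # chaining keeps both steps in one shell

print(separate)
print(chained)
```

As far as I understand it, Fabric's cd context manager works along the same lines as the chained call, by prefixing each wrapped command with cd <path> &&. Sourcing a file has the same problem as cd: it only affects the shell it ran in.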

To get it working, this is what I ended up doing:

(on my dev machine)

.bash_profile

if [ $USER == blaugher ]; then
    export WORKON_HOME=/home/blaugher/virtualenvs
    source /usr/local/bin/virtualenvwrapper.sh
fi

fabfile.py

from fabric.api import cd, env, put, run, settings, sudo  # Fabric 1.x API

def setup():
    """
    Setup a fresh virtualenv as well as a few useful directories, then run
    a full deployment
    """
    sudo('aptitude install -y python-setuptools apache2 libapache2-mod-wsgi')
    sudo('easy_install pip')
    sudo('pip install virtualenv')
    sudo('pip install virtualenvwrapper')
    put('.bash_profile', '~/.bash_profile')
    run('mkdir -p %(workon_home)s' % env)
    with settings(warn_only=True):
        # just in case it already exists, let's ditch it
        run('rmvirtualenv %(project_name)s' % env)
    run('mkvirtualenv --no-site-packages %(project_name)s' % env)
    # [... plus other useful stuff.]

def install_requirements():
    "Install the required packages from the requirements file using pip"
    with cd('%(path)s' % env):
        run('workon %(project_name)s && pip install -r ./releases/%(release)s/requirements.txt' % env)

So, that’s reasonably nice. The .bash_profile is so-written because of this bug – somehow one of the virtualenvwrapper files ends up being owned by root, which causes an IOError for non-root users. You could change the shell command for when you run sudo but it would be pretty tedious. sudo and run have an option of shell (boolean) and in the env you can set the shell command to be used (by default it is /bin/bash -l -c) but there is no easy way to specify different shell commands for run vs sudo commands.

Virtualenvwrapper recommends adding the ‘export’ and ‘source’ lines to your .bashrc. By adding them to .bash_profile instead they will be executed for login shells – like our /bin/bash -l, i.e. for all fabric commands, and we don’t have to explicitly source any file in fabric. I thought this was a neat side-step of that problem. I’m not sure what the other implications of .bashrc vs .bash_profile are.

My next problem was a call like this:

run('mkvirtualenv %(project_name)s --no-site-packages' % env)

Fabric was complaining that it was getting back a return code of 1. This seemed odd as it looked like it was working. Even running it by hand still looked good:

blaugher@tardis:~$ mkvirtualenv qwerty --no-site-packages
New python executable in qwerty/bin/python
Installing distribute..................................
..........................................................
................................................................
.....................done.

But not:

blaugher@tardis:~$ echo $?
1

I had joined the mailing list and was preparing to write up my problem when I spotted this earlier reply:

Try reversing the order of env3 and --no-site-packages on the command line. mkvirtualenv expects the environment name to be the last argument.

!!!

Sure enough –

blaugher@tardis:~$ mkvirtualenv --no-site-packages qwerty
New python executable in qwerty/bin/python
Installing distribute........................................................
......................................................
...................................................................done.
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/predeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postdeactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/preactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/postactivate
virtualenvwrapper.user_scripts Creating /home/blaugher/virtualenvs/qwerty/bin/get_env_details
(qwerty)blaugher@tardis:~$ 

That’s a little too subtle for my liking. Note, though, that as well as the extra user_scripts output, this time the virtualenv is actually activated, as it is supposed to be (the “(qwerty)” prefix tells you that you are working in a virtualenv).

A couple of other points. If Django is in your requirements.txt file rather than manually installed on your server, you will want to make sure your Django-specific Python calls look something like run('workon %(project_name)s && python manage.py syncdb' % env).

Making ‘workon’ into something more generically useful is obviously not too difficult. This example fabfile adds a method to do it. Apparently an upcoming version of fabric might have a command like prefix, which would work very well.
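One possible shape for such a helper, as a sketch only: in_virtualenv is a hypothetical name, not something from Fabric or from the linked fabfile, and it just builds the command string so that the workon and the real command end up in the same shell.

```python
def in_virtualenv(project_name, command):
    """Return one shell line that activates the virtualenv, then runs command."""
    return "workon %s && %s" % (project_name, command)


# In a fabfile this would be passed to run(), for example:
#     run(in_virtualenv(env.project_name, "python manage.py syncdb"))
example = in_virtualenv("qwerty", "pip install -r requirements.txt")
print(example)  # workon qwerty && pip install -r requirements.txt
```

Fabric 1.x did go on to include a prefix context manager in fabric.api, so on a recent enough 1.x release wrapping the calls in with prefix('workon ...'): is probably the tidier option.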

Finally using virtualenvwrapper means all your virtualenvs are collected together in their own directory, separate to the projects you use them for. Docs on virtualenv alone tend to suggest nesting the project code within the virtualenv itself, but I prefer the approach of virtualenvwrapper.

The moral of the story is, all your problems are already solved, and all you need to do is locate your answers, out there, somewhere. :)


---

Language Sleuthing HOWTO with NLTK

2483 days ago

Wow. There are lots of things I could and should have blogged about, like Australia’s first PyCon, like Wikimania in Poland, like LCA papers… like SFD… like LUV elections… like… ! But for now I will just post the slides from a talk I gave at the Linux Users of Victoria August meeting, called “Language Sleuthing HOWTO: Discovering Interesting Things with Python’s Natural Language Tool Kit”.

Interesting things if you are a member of luv-main, anyway. :)

Slides:

I know they aren’t too comprehensible without context. Maybe I will work them into a few longer-form blog posts…?


---

Job-finding

2724 days ago

Well, it looks like on Tuesday I will be officially leaving the ranks of the unemployed. As fun as it is to have brunch three times a week in all kinds of interesting cafes, get all those life administration tasks sorted, and swan about at conferences, the fear of potential nothingness is a stressful thing to have on your back. Even for someone in as good a position as I have been. The phrase “job security” seems to take on a certain literalness.

In total I spent about three months actively job-seeking, and five weeks unemployed. Which seems to suggest you should start looking for a new job about a month before you realise you want to. As well as putting the IT job feeds from Seek etc into my feed reader, I “activated my networks”, so to speak ( = told my friends, acquaintances at meetups, and my micro/blog). In the end there were 4 options that I seriously considered. For someone who does not have “X years of technology Y” Seek did prove to be useless. But if you are a Java/.NET/COBOL or even a PHP or Visual Basic person, you will find plenty to keep you busy.

Option #1 was a small business in Carlton North with a small suite of web-based applications for a particular retail sector. I met their business analyst when I decided to attend The Hive for the first time that week and he struck up a conversation with me. I mentioned that I was looking for work as a Python programmer and he did a double-take, then told me his company was looking for Python people. (Actually he told me they were looking for Java people to convert their Python applications to Java, which I light-heartedly protested about. They wanted to do that because of trouble finding Python people, though.) So it was a great chance meeting, and I was not too surprised to get a follow-up email later that week.

While they seemed like good people, I think the role would have stretched me a little more than I am ready for right now. I mean some stretching is good, but then there is biting off more than you can chew and setting yourself up for failure. So I had the novelty of my first experience of declining a job offer.

Option #2 arose via someone responding to my blog post, which is pretty sweet. Some weeks passed and eventually a phone-based technical interview was set up, which I kinda bombed. It was one of those “impossible to prepare for” type tests and indeed I felt unprepared. So I didn’t hear back from them but it’s not so bad. I would have had to move to Sydney anyway. :P

Option #3 came about via a good friend of mine. It was kind of like, “Help me brainstorm where I could work.” “What about my workplace?” Her experience of working there was certainly a ringing endorsement, and I think she did the same for me to them. While also a small business, I was impressed at their thoroughness at getting the basics right – interview, technical test, checking references (something Option #1 didn’t fare well at). I would probably have done C# and maybe some IronPython. Broadening my skills in a commercially recognised way would certainly be no bad move. And I would have been very happy to accept their offer, were it not for…

Option #4, in the public sector, and I actually found out about it from a tech-usergroup-acquaintance posting to some mailing lists I’m on. The main things that drew me to this position were the fact that it is Python and deals with language data. That’s pretty much my dream combo at the moment. The interview went pretty well, despite a bumpy start and (in hindsight) a completely wrong answer, nonetheless delivered with conviction and seemingly accepted in same. I realised parallels with my previous work that I hadn’t seen before, and was able to ‘riff’ on those for a bit. I did feel ‘in my element’ enough to give off some confidence and I’m sure that helped a lot.

And so, I start on Tuesday. :)


---

Finding COCOA in CHOCOLATE without a dictionary?

2840 days ago

Last night I went to MPUG, the Melbourne Python users group. I have been on that mailing list for seemingly years, and it looks like now there will be an attempt to have regular meetings. Woot.

There was a very interesting talk by Martin Schweitzer called “Primetime Wordfinding” or “Elegant String Searches”. (The slides were posted to the mailing list.)

The basic problem is thus:

Given a set of letters and a dictionary, find all words that can be made from those letters.

The method that he outlines is wonderfully elegant, and will be especially appreciated by maths geeks. However seeing dictionary = ’/usr/share/dict/words’ made me think, “Typical IR approach! Where’s the linguistics?”
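For anyone who missed the talk, here is a rough sketch of the prime trick as I understand it (not Martin's actual code). Each letter gets its own prime, a word's value is the product of its letters' primes, and a word can be built from the available letters exactly when its value divides theirs. The tiny word list stands in for /usr/share/dict/words, and the function names are made up for illustration.

```python
# One prime per letter, a..z.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41,
          43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101]


def word_value(word):
    """Product of the primes for each letter (assumes only a-z letters)."""
    total = 1
    for ch in word.lower():
        total *= PRIMES[ord(ch) - ord("a")]
    return total


def words_from(letters, dictionary):
    """Return the dictionary words that can be spelled from the given letters."""
    available = word_value(letters)
    # Unique factorisation means divisibility respects repeated letters too.
    return [w for w in dictionary if available % word_value(w) == 0]


tiny_dictionary = ["cocoa", "chalet", "late", "hotel", "cheese", "latch"]
found = words_from("chocolate", tiny_dictionary)
print(found)  # ['cocoa', 'chalet', 'late', 'hotel', 'latch']
```

"cheese" is rejected because "chocolate" has only one "e" and no "s", which the divisibility check catches without any explicit letter counting.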

It also made me wonder how many languages Linux ships word lists for. Apparently Ubuntu ships many varieties of English, Portuguese, Bulgarian, Catalan, Danish, Dutch, Finnish, Faroese, French, Galician, Italian, Norwegian, German, Polish, Spanish, Swedish and Ukrainian. So Europe has decent coverage, but the rest of the world, hmm…

So, how about this revised problem:

Given a set of letters and a language, find all words that can be made from those letters.

We don’t have a dictionary but we have a language, which means we have (whether we consciously realise or not) the rules for

  1. how alphabetic letters map to phonemes (sound units)
  2. how phonemes can be combined to form syllables (the main concern)
  3. how syllables can be combined to form words.

I did a bit of looking to see if I could try and find a ready-made solution, and while it seems that syllable ‘parsing’ is a well-studied problem, syllable ‘generation’ is another matter.

Now this is going to be relatively tricky, because English doesn’t have good one-to-one correspondences between letters and phonemes.

So let’s hack some stuff together… as a first approximation, I’ll grab all the written examples from the Wikipedia articles on English phonology, English orthography and the IPA chart for English dialects.

>>> onsets = list("pbtdckgjfvszwmlnryh") + ["ch","th","sh"]
>>> onsets += ["pl","bl","cl","gl","pr","br","tr","dr","cr","gr","tw","dw","gu","qu"]
>>> onsets += ["fl","sl","fr","thr","shr","sw","thw","wh"]
>>> onsets += ["sp","st","sk"]
>>> onsets += ["sm","sn"]
>>> onsets += ["sph"]
>>> onsets += ["spl","spr","str","scl","scr","squ", "sc"]
>>> nuclei = ["a","e","i","o","u","ow","ou","ou","ie","igh","oi","eer","air","ee","ai"]
>>> nuclei += ["au","ea","ou","ai","ey","ei","er","ear","ir","oo","ou","igh","ough",
"y","oy","oa","ou","ow","ol","ar","ere","are","ear","or","ar","ore","oar","our",
"oor","ure","uer"]
>>> codas = ["lp","lb","lt","ld","lk","rp","rb","rt","rd","rk","rgue","lf","lve","lth","lse",
"lsh","lch","lge","rf","rve","rth","rce","rsh","rch","rge","lm","ln","rm","rn",
"rl","mp","nt","nd","nk","mph","mth","nth","nce","nze","nch","nge","ngth",
"ft","sp","st","sk","fth","pt","ct","pth","pse","ghth","tz","dth","dze","x","lpt",
"lfth","ltz","lst","lct","lx","rmth","mth","rpt","rpse","rtz","rst","rct","mpt",
"mpse","ndth","nct","nx","ngth","xth","xt"]
>>> final = ["s","ed"]

That’s pretty yuck. And I’m not too sure at all about where some of those “r”s should go. A bit of a brute-force solution for problem #1 above. I would like to clean this up and somehow make sure it is complete.

Also, that “final” bit is not a linguistic thing, but it seems to me my codas are not accounting for plural words too well.

>>> # thankyou, martin!
>>> primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 
53, 59, 61, 67, 71, 73, 79, 83, 89, 97, 101, 103]
>>> def prime_val(ch):
...     return primes[ord(ch.lower()) - ord('a')]
...
>>> def get_val(word):
...     total = 1
...     for ch in word:
...             total *= prime_val(ch)
...     return total
...
>>> magic = get_val("chocolate")
>>> nuclei_ok = [n for n in nuclei if magic % get_val(n) == 0]
>>> onsets_ok = [o for o in onsets if magic % get_val(o) == 0] + [""]
>>> codas_ok = [c for c in codas if magic % get_val(c) == 0] + [""]
>>> syllables = []
>>> for o in onsets_ok:
...     for n in nuclei_ok:
...             for c in codas_ok:
...                     syllable = o + n + c
...                     if magic % get_val(syllable) == 0:
...                             syllables.append(syllable)
... 
>>> len(syllables)
172
>>> syllables
['talch', 'ta', 'telch', 'te', 'tolch', 'to', 'tealch', 'tea', 'toolch', 'too', 'toalch', 'toa', 'tol', 
'calt', 'calth', 'calch', 'cact', 'calct', 'ca', 'celt', 'celth', 'celch', 'cect', 'celct', 'ce', 'colt', 
'colth', 'colch', 'coct', 'colct', 'co', 'cealt', 'cealth', 'cealch', 'ceact', 'cealct', 'cea', 
'coolt', 'coolth', 'coolch', 'cooct', 'coolct', 'coo', 'coalt', 'coalth', 'coalch', 'coact', 
'coalct', 'coa', 'colct', 'col', 'lact', 'la', 'lect', 'le', 'loct', 'lo', 'leact', 'lea', 'looct', 'loo', 
'loact', 'loa', 'halt', 'hact', 'halct', 'ha', 'helt', 'hect', 'helct', 'he', 'holt', 'hoct', 'holct', 
'ho', 'healt', 'heact', 'healct', 'hea', 'hoolt', 'hooct', 'hoolct', 'hoo', 'hoalt', 'hoact', 
'hoalct', 'hoa', 'holct', 'hol', 'chalt', 'chact', 'chalct', 'cha', 'chelt', 'chect', 'chelct', 
'che', 'cholt', 'choct', 'cholct', 'cho', 'chealt', 'cheact', 'chealct', 'chea', 'choolt', 
'chooct', 'choolct', 'choo', 'choalt', 'choact', 'choalct', 'choa', 'cholct', 'chol', 'tha', 
'the', 'tho', 'thea', 'thoo', 'thoa', 'thol', 'clact', 'cla', 'clect', 'cle', 'cloct', 'clo', 'cleact', 
'clea', 'clooct', 'cloo', 'cloact', 'cloa', 'alt', 'alth', 'alch', 'act', 'alct', 'a', 'elt', 'elth', 
'elch', 'ect', 'elct', 'e', 'olt', 'olth', 'olch', 'oct', 'olct', 'o', 'ealt', 'ealth', 'ealch', 'eact', 
'ealct', 'ea', 'oolt', 'oolth', 'oolch', 'ooct', 'oolct', 'oo', 'oalt', 'oalth', 'oalch', 'oact', 
'oalct', 'oa', 'olct', 'ol']

Note that onsets and codas are optional, hence I add the empty string to those lists. (I forgot to factor in the “final” bit, although it doesn’t make any difference for the word “chocolate”.)

OK so now I have my syllables. You should find that these are basically all pronounceable in English, although they may not be the standard way of being written (for example, if “choct” was a valid word, I think it would be written as “chocked”. “ct” only seems to get to be a coda for a small number of words, like “tact”). And of course many of them are not valid as mono-syllabic words.

Now, how can we combine them into multi-syllabic words? Well, there are some word-level rules, but mostly they seem more relevant to pronunciation. So we should be reasonably safe with just concatenating syllables.

>>> syll2 = []
>>> for syllable in syllables:
...     remaining = syllables[:]
...     remaining.remove(syllable)
...     for r in remaining:
...             combined = syllable + r
...             if magic % get_val(combined) == 0:
...                     syll2.append(combined)
...
>>> len(syll2)
3382

And now…. wait for it…. the big moment has arrived!

>>> 'cocoa' in syll2
True

At the moment this is perhaps not markedly better than just generating every permutation of every length string of the letters in “chocolate”. But there you go… I call it “Dictionary-Free, Linguistically Motivated String Searches”. :)

I am going to ponder if there is a better way to implement this in Prolog. But for Python, is there any way you can use a regular expression for generation rather than parsing? A kind of “regular expression for production”?
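One way to read “regular expression for production”, sketched here under my own assumptions: treat a pattern such as (c|ch|)(o|oa)(|t) as a sequence of slots, each with a list of alternatives, and let itertools.product enumerate every combination. The generate function is a made-up name, and this covers only plain alternation and concatenation, not repetition.

```python
from itertools import product


def generate(*slots):
    """Yield every string formed by picking one alternative per slot, in order."""
    for parts in product(*slots):
        yield "".join(parts)


# Equivalent to producing from the pattern (c|ch|)(o|oa)(|t)
onsets = ["c", "ch", ""]
nuclei = ["o", "oa"]
codas = ["", "t"]

syllables = list(generate(onsets, nuclei, codas))
print(len(syllables))  # 12
print(syllables[:4])   # ['co', 'cot', 'coa', 'coat']
```

Because generate is lazy, the prime-divisibility filter could be applied to each candidate as it is produced rather than building the whole list first.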


---