Recently in Linguistics Category
The story of Christmas. In Jamaican Creole. Via John Wells's Phonetics Blog.
I went into a second-hand bookshop on the way home from uni yesterday. (Pickerings Books, 30 Buccleuch Street (just on the corner with Buccleuch Place). It's closing down, unfortunately.) I usually look for books on Philosophy, Religion, and Language, although I'm open to all really. Sometimes you can get real gems.
Of course, everyone's definition of a gem differs. While my interests in language are mainly theoretical and phonological, I do have a penchant for language descriptions and obscure languages, and the idea of descriptive fieldwork seems very romantic and adventurous to me. (You'd be like Indiana Jones, but a linguist instead of an archaeologist!) So I stumble upon David Watters' A Grammar of Kham, a language spoken by roughly 45,000 people in western Nepal. How much? £5! You can't say no at that price. Especially when it's in such good condition - hardback, no notable scuffs, paper practically as-new. I get home, and find out that Cambridge University Press, the publishers, expect £96 for the book, and that amazon.co.uk are charging £91.20.
Slightly less impressive, but still a good saving, I got Geoffrey Kimball's Koasati Dictionary (also known as Coushatta, a language spoken by around 400 people in Louisiana) for £4, when amazon.co.uk want £64, and the publishers $85.
Do I need these books? No. Do I want them? Yes. Will I use them, and find them useful? Probably. As I was explaining to my flatmate yesterday, I'm not buying them for any pragmatic value. I doubt I'll be stuck in western Nepal any time soon, or be asked to translate some Koasati myths. I got them because I find this stuff interesting. And because they were bargains!
Kimball, Geoffrey D. Koasati Dictionary. Nebraska: University of Nebraska Press, 1994.
Watters, David E. A Grammar of Kham. Cambridge: Cambridge University Press, 2002.
I spent a good two hours last night playing around with Praat, with some rather odd results. Praat is a phonetic analysis program, packed full of features, that I've been using for over two years now. So far, I've mainly been using it for spectrograms, annotation of files, and making measurements. However, I needed to actually manipulate some sound files, so I started playing around with the settings to see what there is on offer. There's a lot, most of it over my head - multidimensional scaling and combine to ParamCurve, anyone?
I ended up making this WAV file (right click, "save target as" to download), mostly accidentally. It was originally a recording of me saying brewed, but has been manipulated in several ways. A few of the sounds (particularly the distorted time parts) remind me of noises in a lot of breakcore music I listen to (for instance, at around 1:45 in this video of Venetian Snares' Szamár Madár (yes, it's meant to sound like that)).
The particular manipulation I wanted to do was to take a sound file, pick out a particular section of it (from, say, 25ms to 150ms), and change the duration of that section (from, say, 125ms to 100ms). Reading the manual, it seemed that this was indeed possible, but to do it with any great precision would require a script. So I figured, let's write this script, and since I'm going to the trouble of writing a script I might as well make it automate most of the process for me. As such, the script will manipulate the duration between the specified time points of the selected sound file, and, optionally, write the output to a .WAV file.
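The arithmetic behind that manipulation can be sketched in a few lines. This is not the Praat script itself, just a rough Python illustration; the function names are made up for the example, and it assumes the stretch is described the way I understand a Praat duration tier to work, as a list of (time, relative duration) points.

```python
def duration_tier_points(start, end, new_duration, eps=1e-5):
    """Return (time, relative_duration) points that rescale only [start, end].

    Times are in seconds. Outside the section the factor is 1.0 (unchanged);
    inside it the factor is new_duration / old_duration.
    The names and the eps padding are illustrative assumptions.
    """
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    if new_duration <= 0:
        raise ValueError("new_duration must be positive")
    factor = new_duration / (end - start)
    return [
        (start - eps, 1.0),   # just before the section: leave untouched
        (start, factor),      # section begins: apply the factor
        (end, factor),        # section ends: factor still applies
        (end + eps, 1.0),     # just after the section: back to normal
    ]


def new_total_duration(total, start, end, new_duration):
    """Length of the whole file once the section has been rescaled."""
    return total - (end - start) + new_duration


# The example from the text: the 25ms-150ms section (125ms) becomes 100ms.
points = duration_tier_points(0.025, 0.150, 0.100)
```

With those numbers the factor inside the section comes out at roughly 0.8, and a half-second file would end up about 25ms shorter.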
This is the first Praat script I've written from scratch (I've done a few others, mainly edits of other people's work to better suit whatever I'm working on), and if you want to see it, it's here. Hopefully the comments in the code make it fairly transparent, and I think its operation is relatively simple.
Okay, technical stuff now. Look away now! Don't say I didn't warn you! Beware! ... Now that I've scared away the non-geeks, let me talk a little about the extension possibilities of this script. Currently it's fairly limited - it only outputs to WAV, for example. It should be fairly easy to change what it outputs to (or even make a range of options appear on the form). In its current form, the user has to specify the start and the end of the area to be manipulated, by entering numbers. Another (probably easier) way to do this would be to take the information from a textgrid file associated with the sound file. Then all the user has to do is to make a textgrid file, add some points where they want the duration to be changed, and run the script. Coupled with a modification that allows for batch processing of files - modifying several files in one go - this would allow for fairly extensive modification at the click of a button.
I recently read Words Words Words, by David Crystal. It's a popular science book (or should that be "popular linguistics"?) inviting readers of all backgrounds to discover the magic of words, and lexical investigation in general. The book is split into six parts: the universe of words; the origins of words; the diversity of words; the evolution of words; the enjoyment of words; and becoming a word detective, with between four and seven chapters per part.
Crystal writes in an easy-going informal style, which makes it very easy to follow and read through. Crystal's deep passion for words, and the English language in particular, is evident in his prose, with the enthusiasm leaping off the page at you, in the form of diverse quotations from literature both classical and contemporary, pictures from around the world, and amusing anecdotes about words and word-usage. Crystal's passion is infectious. Although I study linguistics, I can't say I find the lexicon to be the most fascinating area of study, and I get quite annoyed when people think that I "study words". Yet this book has really opened my eyes to the vast panorama of lexical beauty available to us - to all of us. And with chapter headings like Wordsmithery, Wordmelodies, Worddeaths, and Wordworlds, who can say no?
Along the way, Crystal addresses the language critics, naysayers, and doom-mongers who "reflect gloomily on the present state of the language, make dire prophecies about its future, and wish things were like the earlier golden age they remember so well" (p156), noting that such comments are as old as the language itself. He notes the perfectly natural stages of semantic shift and of word death, while also pointing out that new coinages or borrowings can greatly enhance English's expressiveness.
If you're interested in words, and in the English language in particular, but don't want to have to deal with preachy, badly-researched books, nor wade through a dense academic text, this book is ideal. Crystal is filling a sorely-felt gap in the popular linguistics genre - books written by actual linguists! As influential as Melvyn Bragg or Lynne Truss may be, they have no formal linguistics training, and often serve only to give linguists a bad name.
One, very minor, criticism I have of this book is that the references are scattered throughout the text, and not collated at the end. Having said that, Crystal is very methodical in his reporting of sources, and encourages us to be likewise. You can read his blog here.
Crystal, David. Words Words Words. Oxford: Oxford University Press, 2007. (UK Paperback edition.)
Apologies for not blogging recently. The entry into 2008 has been quite a busy one for me. Anyway, a quick little post today on Sally Thomason's comments on Language Log the other day, titled Why I Don't Love the International Phonetic Alphabet. (Also cross-posted at Phonoloblog.)
I've never been a fan of Americanist transcription. In fact, that was one of my annoyances with Michael Kenstowicz's Phonology in Generative Grammar (as well as the fact that I find feature theory quite irritating).
Thomason notes that the IPA doesn't go in for diacritics much. That's for good reason! You need the diacritic space free so you can use all the IPA's real diacritics - such as those for voice, syllabicity, nasality, and so on. She notes that she much prefers the Americanist [š] over IPA [ʃ], because:
For linguists who got A’s in penmanship in grade school (if there’s anyone still alive who ever got grades in penmanship), this might work just fine when they’re transcribing data from speakers or from tape recordings. But I’m not one of those people, and there’d be a real risk that my [ʃ]’s would turn out looking like [s]’s and vice versa, and that’s a bad thing when you’re trying to figure out a language’s phonological system.
I've never encountered such a problem. Admittedly I've never done any fieldwork, but I write my <ʃ> as a large letter, taking up the entire vertical line space, elongated and curly, whereas my <s> is only half-height, short and fat.
Her other IPA gripes are the use of the letter <c>; affricate vs stop+fricative representation; and issues with the low central vowel. The last one I can agree on, that the IPA as it stands is inadequate to deal with these issues without elaborate notation. (However, as Eric Armstrong points out in the comments, [ɐ] really isn't that hard to write, it just takes practice.) With the first two, though, I have to disagree. IPA can cope with all the issues raised. I think Thomason has provided good reasoning for why strict 100% adherence to IPA standards is not always the best option, especially within a field of a particular group of languages. But for the majority of cases, the IPA works just fine - leaving only a small minority of times when you have to (or want to) specify what you're doing (e.g. "I'm using [č] instead of [tʃ] because ..."). Bottom line is, be specific, be clear, and say what you mean.
My feed reader has been playing up a little of late. Not really sure why, but it's annoying as I'm missing the latest instalments of my favourite bloggers. In other news, I'm not feeling too great or blogworthy at the moment, so this entry is just going to be a few links. So, without further ado, let's begin.
Wendi Momen recently returned from a trip to the Baha'i World Centre. The Led Zeppelin reference makes me smile.
Over at Language Log, Mark Liberman explains that authoritarian rationalism is not conservatism, especially in relation to linguistic prescriptivism. Wow, that sounds really boring. It's not, honest! Language Log frequently deals with the issue of prescriptivism, and what makes a certain grammar choice "right" or "wrong", and this article is both a good overview and a good discussion of some of the issues. Is "correctness" determined by populist rule, or, in Liberman's words, "the authority of a 'rule' invented by a self-appointed expert, who has concluded that the world would be a better place if it were to be run according to his prescriptions"? I hope you can determine from his tone on which side he (and the vast majority of today's linguists) stands.
Also at Language Log, Geoff Pullum ponders why the blog is banned in Iran. Being illegal in certain countries of the world does give the blog a certain punk-rock appeal that most contemporary linguistics lacks. The fact that Geoff Pullum spent a good number of years as a rock musician just makes it all the sweeter.
I recently discovered Vye Computers, and their new mini-v s37. I wants. (Maybe for extra geek points I should have said "do want", in a reference to badly-subtitled Chinese bootlegs of Revenge of the Sith?)
On the subject of small cool gadgets, the Nokia N800 looks pretty cool, especially because it runs Linux and is very customisable.
I'm not the only one who's been collecting links. Recently at Social Science++ there's a great post with a compilation of all you need to know on the race and IQ debate. (Philipe Copeland, over at Bahá'í Thought and Black America, has already given his thoughts on this.)
I also feel an urge to plug Desktop Tower Defence, a fun little tower defence flash game.
This is an interesting example of Chinese calligraphy, the character for horse drawn in such a way that it looks like, well, a horse. This is from Alice's Adventures in China, where the Islamic influence is noted (this is a pot of halál noodles).
There's an interesting article over at Linguistic Mystic, about creating new writing systems in order to conceal the meaning of what you're writing. This is something I've actually done from time to time, not necessarily to hide something I'm writing, but just as a linguistic game. It's also something that I think would be good to use in real cryptographic methods (although it would need some refinement).
Go and read the article, if you haven't already. Or skim it. In brief, you create a writing system (alphabet or whatever) or use an existing writing system, and write out the sounds of English (or whatever language you're writing in), rather than the letters. The advantage of writing out the sounds rather than the spelling is that it makes it more complex than a simple letter-substitution, and thus harder to decode. The article assumes that you'll be handwriting this stuff, which is how using an invented writing system is possible, and also gets around neatly the problem of computers doing statistical analysis on letter frequencies and so on.
Unfortunately, for practical application of this into a real encryption algorithm, there are a couple of hurdles to get over. First is the problem of phonemic transcription - we need to be able to take a word, break it into its phonemic form (the sounds), and then build it up again. While a human can do this okay most of the time, it's laborious. A machine can do it, but there are two problems - some words are spelled the same but pronounced differently (I can read vs He read the book), and some words are pronounced the same but spelt differently (to, too, two). The second hurdle is the problem that phonemic transcription is still open to frequency analysis - in fact, by some accounts, it makes it even easier.
Using this system, however, as a basic encryption method, before putting the text under more conventional algorithms, I feel would add an extra layer of security. Imagine someone trying to decrypt an email they've intercepted, and after finally decoding it, to discover that it's in Armenian! At least, it just looks like it's in Armenian - most people don't know any better. While the dedicated spy would seek out someone knowledgeable about Armenian, who would inform them that it's just weird English, this would take time, and that's really what cryptography is - delaying the transfer of sensitive information. Here's some English in an Armenian system (I realise not everyone has Armenian language support on their computer (the fools!), so I've included an image below of what it should look like, in a slightly larger font):
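Both hurdles are easy to see with a toy example. The sketch below is only an illustration: the tiny lexicon, its rough IPA forms, and the function names are all made up for the purpose, and a real system would need a full pronouncing dictionary.

```python
from collections import Counter

# Hypothetical toy lexicon: spelling -> possible phonemic forms (rough IPA).
TOY_LEXICON = {
    "i": ["aɪ"],
    "can": ["kæn"],
    "he": ["hiː"],
    "read": ["riːd", "rɛd"],   # same spelling, two pronunciations
    "the": ["ðə"],
    "book": ["bʊk"],
    "to": ["tuː"],             # three spellings, one pronunciation
    "too": ["tuː"],
    "two": ["tuː"],
}


def pronunciations(word):
    """All phonemic forms listed for a spelling in the toy lexicon."""
    return TOY_LEXICON[word.lower()]


def is_ambiguous(word):
    """True if a machine could not pick the pronunciation from spelling alone."""
    return len(pronunciations(word)) > 1


def substitute(phonemes, table):
    """Swap each sound symbol for a symbol from the invented script."""
    return "".join(table.get(ch, ch) for ch in phonemes)


def frequency_profile(text):
    """Sorted symbol counts, ignoring spaces: what a codebreaker would look at."""
    return sorted(Counter(text.replace(" ", "")).values())
```

The second hurdle shows up as soon as you compare `frequency_profile` before and after `substitute`: with a one-symbol-per-sound table the counts are identical, so the disguise does nothing to hide how often each sound occurs.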
Աի հօպ իտ ւարկս
However, this highlights another problem - that often there isn't a complete matching between English sounds and those of another language. For example, Armenian as it stands cannot handle English "th" sounds. What I wrote above was meant to be "I hope it works", but it might as well have been "I hope eet warks". Which looks a little like Dutch.
This is definitely an interesting area, and something I think has potential. It's especially useful as outlined in the original article, for use in a personal diary or journal. Your recipes will be safe! For the time being.
Of the past 36 hours, I've just spent 20 of them working on an assignment. Thirteen and a half of those hours were in the university library. I'd carved out quite a little den for myself. I was at a computer desk, surrounded by piles of books (at times I felt like I was in a fort), a bottle of orange juice (or cup of coffee) nearby. I'd be lounging with my shoes off (one needs to get comfortable), my fingers slowly drifting over the keys as I tried to make sense of the data before me.
The assignment was on English grammar, which admittedly isn't my favourite subject. It was in the form of three "short essay answers". There was no guideline as to length - I ended up with 14 pages (over 4,000 words), which I thought was more than enough. I suppose time will tell.
In the course of my research for the assignment, I was browsing through Bennett's Spatial and temporal uses of English prepositions and I found this piece of paper. It looks like a note that's been used as a bookmark. (Click the image to see a larger version.)
When I first saw it, and pulled it out to read it, I turned it upside down so I could read it, thinking for a brief second that it may be Greek. I quickly realised that this was not the case, and turned it around again, noticing that I could read some English words - ambiguous, vague, but most of everything else was very scrawly. It was at this point that I noticed that the 1, 2, and 3 of the list were on the right, not the left. A couple of the letterforms reminded me of something. And then it clicked - is this Hebrew?
I know very little of the Hebrew script, and even less of Hebrew itself, but it seems a reasonable reckoning. A little bit of internet searching for examples of handwritten Hebrew yields similar results.
The fluency of the handwriting makes me think it's a native speaker. Also, the English words that are there are written in a well-formed cursive script - a common feature of those who learn English as a second language. But here's the question - what does it actually say? Anyone know? Comments appreciated.
Bennett, D. C. (1975). Spatial and temporal uses of English prepositions - An essay in stratificational semantics. London: Longman.
Okay, my last couple of posts have involved both references and transcriptions. So it's time to lay down the law. In this post I hope to codify the manner in which I transcribe various non-Roman orthographies (writing systems), and also the way that I provide references in my posts.
Some of you may be questioning the worth of this. I just like things to be consistent, and I also like to give enough information from books that I talk about or quote from so that my readers can follow up the texts for themselves.
In general, when referencing books, I follow the widely-used MLA style. I am tempted to use the APA style, but frankly it's six of one and half a dozen of the other - the difference is minimal, but consistency is key. MLA style is as follows:
Surname, Forenames. Title. City of publication: Publisher, Year
For example:
Dobrovie-Sorin, Carmen. The Syntax of Romanian. New York: Mouton de Gruyter, 1994
However, I adopt a far simpler system when referencing Baha'i texts, usually with this format:
Author, Title.
As well as a page reference, I'll usually try to include a paragraph or section number too (e.g. Baha'u'llah, The Kitab-i-Aqdas. p34, K41).
There are several reasons for this practice. Firstly, many Baha'i texts have been published several times by different publishing houses, but the text is all the same. By simply stating which book I am talking about, and providing a page and section reference, my readers can follow up the reference in their own copy, which may be a different edition to mine.
Secondly is the issue of translation. Standard MLA would dictate that I specify the translator of the text in question. However, for the vast majority of translated Baha'i writings, there exists only one translation, which has usually been authorised by the Baha'i World Centre. As such, I will only specify the translation used when I use an unauthorised or provisional one.
Thirdly is the question of dates. Usually (although I'm not sure how widespread this practice is) when one is referencing a republication of an older book, the date of publication is listed, along with the original date of publication in brackets. For example:
Meillet, A. Les Dialectes Indo-Européens. Paris: La Société de Linguistique de Paris, 1950 [1908]
However, with the Baha'i writings this is cumbersome, partly because of the several published editions mentioned in my first point, and partly because many of the texts were never published in the first place. For example, Tablets of Baha'u'llah is a collection of letters Baha'u'llah wrote to various people, on various subjects, at various times in His life. So providing a date of original publication for a citation is almost impossible.
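For the programmatically minded, the two referencing formats described above can be sketched as a pair of small functions. This is just an illustration of the patterns, not a full implementation of MLA; the function and parameter names are invented for the example.

```python
def mla_reference(surname, forenames, title, city, publisher, year,
                  original_year=None):
    """Surname, Forenames. Title. City: Publisher, Year [Original year]."""
    # Drop a trailing full stop from initials so "A." does not become "A..".
    forenames = forenames.rstrip(".")
    ref = f"{surname}, {forenames}. {title}. {city}: {publisher}, {year}"
    if original_year is not None:
        ref += f" [{original_year}]"
    return ref


def bahai_reference(author, title, page=None, section=None):
    """Author, Title. followed by an optional page and section number."""
    ref = f"{author}, {title}."
    details = []
    if page is not None:
        details.append(f"p{page}")
    if section:
        details.append(section)
    if details:
        ref += " " + ", ".join(details)
    return ref


print(mla_reference("Meillet", "A.", "Les Dialectes Indo-Européens", "Paris",
                    "La Société de Linguistique de Paris", 1950, 1908))
print(bahai_reference("Baha'u'llah", "The Kitab-i-Aqdas", 34, "K41"))
```

Leaving the publisher, city and year out of the second function is deliberate: as explained above, the page and section numbers are what let a reader find the passage in whichever edition they own.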
Now that that's out the way, let's talk about transcription. Specifically of Arabic, the only language not written in the Latin (Roman) alphabet that I am vaguely competent in. Maybe in the future I'll have cause to write about Sanskrit or Chinese, but I'll cross that bridge if and when I come to it - they have more or less standard transcription systems anyway (IAST and Pinyin respectively).
I use a transcription system based on the Baha'i orthography, but with a couple of changes. There are a number of factors to consider when implementing a transcription. On the one hand, there is the idea that we want a one-to-one correspondence of symbols, so we can transfer from one orthography to another with no data loss. And on the other, there is the idea that we should be transcribing not just the spelling of the word, but the actual pronunciation. This second one is important, as many languages do not have direct "phonetic" spelling systems, and many that do are governed by rules that change pronunciation in particular contexts. For example, the Russian word друг ("friend") is pronounced /druk/, but if we wished to transcribe it letter-for-letter we'd get <drug>. (Yes, the Russian word for "friend" looks like the English word "drug". But it's pronounced like "droog", and was spelt as such in Burgess's novel A Clockwork Orange.) The reason that this word is spelt with the letter usually signifying /g/ is because when we inflect the word, for instance the genitive singular друга "of the friend", then it is pronounced like /g/. So we have two options here: transcribe the word in a way that is faithful to the phonetic reality, which gives us a confusing paradigm of druk - druga; or, transcribe the word in a way that is faithful to the underlying (supposed) mental reality, which gives us the simpler paradigm of drug - druga but requires us to expect readers to be familiar with the rule which says "devoice word-final obstruents" (which is the rule that makes what is originally a /g/ be pronounced as if it were a /k/).
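The "devoice word-final obstruents" rule is simple enough to sketch in code. What follows is a deliberately simplified illustration, assuming a one-symbol-per-sound transcription; it ignores consonant clusters and anything that happens across word boundaries, and the names are made up for the example.

```python
# Voiced obstruents and their voiceless counterparts (simplified inventory).
DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s", "ʒ": "ʃ"}


def surface_form(underlying):
    """Apply word-final devoicing to an underlying (letter-for-letter) form."""
    if underlying and underlying[-1] in DEVOICE:
        return underlying[:-1] + DEVOICE[underlying[-1]]
    return underlying


# The paradigm from the text: underlying drug - druga, surface druk - druga.
for form in ("drug", "druga"):
    print(form, "->", surface_form(form))
```

A reader who knows this one rule can get from the tidy underlying paradigm to the pronunciation, which is the trade-off the second option relies on.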
With that in mind, there is the reader/writer distinction. A writer will write something once to be read probably many times. The ratio should help determine what kind of transcription system to use. For example, the transcription system I outline below uses a good number of diacritics and special characters. These take a lot longer for me to type than regular letters. Were I wanting something less intense, I could use the Qalam Romanisation, which is a lot easier on the hands but harder on the eyes. It's plain ugly. I only use the Qalam romanisation in plain-text emails or instant messenger conversations, where the reader/writer ratio is relatively low. (And even then, I do my best to fully vocalise it, and I usually spell the long vowels as <uu, ii> etc rather than <uw iy>.)
Arabic presents some particular problems for transcription, especially if we wish to remain true to the original orthography. For example, a word ending in an /a/ sound can be written in 3 ways: the vowel could be unmarked (so basically not written); it could be written with a tá’ marbúṭa - ة -; or with an ’alif maqṣúra - ى -. These last two are involved in morphophonemic alterations in cases of inflection, where a /t/ appears, or the /a/ changes into an /i/. Do we transcribe these letters as different from regular plain /a/? In my post on Egypt, I transcribed the tá’ marbúṭa as <at>, but then how do we distinguish it from just an /a/ and a /t/ together with nothing to do with a tá’ marbúṭa? At the end of it all, we need to consider the purposes of this transcription. I'm not an Arabist. I don't intend to write long posts on Arabic philology or grammar. I'm willing to accept a degree of compromise and of ambiguity. Another factor influencing my transcription choice is aesthetics. Some people like to transcribe the hamza and ‘ayn with a 7 and 3 respectively. That looks ugly. I use two different types of apostrophe, which takes a while to get used to for some people (it did for me). It looks nice though, and the number of times you'll find a word whose meaning hinges on whether a letter is a hamza or an ‘ayn is minimal. So without further ado, here's a brief chart detailing the transcription of each letter:
| Letter | Transcription |
| ’alif ا | á |
| bá’ ب | b |
| tá’ ت | t |
| thá’ ث | th |
| jím ج | j |
| ḥá’ ح | ḥ |
| khá’ خ | kh |
| dál د | d |
| dhál ذ | dh |
| rá’ ر | r |
| zayn ز | z |
| sín س | s |
| shín ش | sh |
| ṣád ص | ṣ |
| ḍád ض | ḍ |
| ṭá’ ط | ṭ |
| ẓá’ ظ | ẓ |
| ‘ayn ع | ‘ |
| ghayn غ | gh |
| fá’ ف | f |
| qáf ق | q |
| káf ك | k |
| lám ل | l |
| mím م | m |
| nún ن | n |
| há’ ه | h |
| wáw و | w |
| yá’ ي | y |
| hamza ء | ’ |
The vowels I transcribe as <i, a, u, í, á, ú> (the acutes marking long vowels), and the diphthongs as <ay, aw>. The tá’ marbúṭa is transcribed with <at>, and the ’alif maqṣúra as simply <a>. This differs from the Baha'i Orthography I mentioned above in a number of ways. Firstly, digraphs are not underlined. This leads to an element of potential ambiguity, but makes the transcription easier to read, write, and copy from. Secondly, my system uses <w> and not <v> for the letter wáw.
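As a rough illustration, the chart can be turned into a lookup table. This is only a sketch under some loud assumptions: it maps letter by letter, so unwritten short vowels are simply missing and wáw and yá’ always come out as consonants; it takes yá’ to be the dotted letter (U+064A) and reserves the undotted one (U+0649) for ’alif maqṣúra; and the names `CHART` and `transliterate` are invented for the example. The letters are given as Unicode escapes to avoid right-to-left display trouble.

```python
# Arabic letter (by code point) -> transcription, following the chart above.
CHART = {
    "\u0627": "á",    # ’alif
    "\u0628": "b",    # bá’
    "\u062A": "t",    # tá’
    "\u062B": "th",   # thá’
    "\u062C": "j",    # jím
    "\u062D": "ḥ",    # ḥá’
    "\u062E": "kh",   # khá’
    "\u062F": "d",    # dál
    "\u0630": "dh",   # dhál
    "\u0631": "r",    # rá’
    "\u0632": "z",    # zayn
    "\u0633": "s",    # sín
    "\u0634": "sh",   # shín
    "\u0635": "ṣ",    # ṣád
    "\u0636": "ḍ",    # ḍád
    "\u0637": "ṭ",    # ṭá’
    "\u0638": "ẓ",    # ẓá’
    "\u0639": "‘",    # ‘ayn
    "\u063A": "gh",   # ghayn
    "\u0641": "f",    # fá’
    "\u0642": "q",    # qáf
    "\u0643": "k",    # káf
    "\u0644": "l",    # lám
    "\u0645": "m",    # mím
    "\u0646": "n",    # nún
    "\u0647": "h",    # há’
    "\u0648": "w",    # wáw
    "\u064A": "y",    # yá’ (assumed: dotted form)
    "\u0621": "’",    # hamza
    "\u0629": "at",   # tá’ marbúṭa
    "\u0649": "a",    # ’alif maqṣúra (assumed: undotted form)
}


def transliterate(arabic):
    """Letter-by-letter transcription; unknown characters pass through."""
    return "".join(CHART.get(ch, ch) for ch in arabic)
```

Feeding it the unvocalised spelling of kitáb gives only the skeleton <ktáb>, which shows why the short vowels still have to be supplied by someone who knows the word.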
A number of notable points concern the definite article al-. I transcribe this consistently as al-, even when preceding a "sun" letter - so al-shams, al-núr etc rather than ash-shams, an-núr. Also, when the vowel of al- disappears due to a preceding vowel, this elision will be marked with an apostrophe. This is how we get the name ‘Abdu’l-Bahá’, which in some other transcription systems is ‘Abd al-Bahá’ (or even 3abd al-bahaa7). Again, this causes a little ambiguity, but I feel it is minimal, and is consistent with the Baha'i Orthography system. Where possible, I eliminate the inconsistencies noted in Winters (1997) and Momen, both cited below. One inconsistency I don't deal with is the frozen nominative case ending in names like ‘Abdu’l-Bahá’: Winters (1997) reckons that it should inflect for the accusative and genitive cases (rather than being frozen in form), making ‘Abda’l-Bahá’ and ‘Abdi’l-Bahá’ respectively. However, to treat foreign words as English words and inflect them thusly is quite normal for English. This is why we speak of "paninis" (despite panini already being plural in Italian). This kind of attitude would likewise require us to say saunassa instead of "in the sauna" (because sauna is a Finnish word, right, so it should be inflected for the inessive case, surely?). While commendable for wishing to preserve the original linguistic integrity of the word(s), this kind of approach ultimately only acts as an impediment to communication. Which is why I keep this "inconsistency" in my transcription system.
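The two rules for al- can be sketched like so. It's a minimal illustration working on already-transcribed words; the function name and the vowel set are assumptions made for the example.

```python
# Vowel letters used in the transcription (short and long).
VOWELS = set("aiuáíú")


def join_with_article(previous, noun):
    """Attach al- to a noun, eliding its vowel after a word ending in a vowel.

    No sun-letter assimilation is applied: the article is always spelt
    with <l>, as described above.
    """
    if not previous:
        return f"al-{noun}"
    if previous[-1] in VOWELS:
        return f"{previous}’l-{noun}"   # elision marked with an apostrophe
    return f"{previous} al-{noun}"


print(join_with_article("", "shams"))        # no preceding word
print(join_with_article("‘Abdu", "Bahá’"))   # preceding word ends in a vowel
print(join_with_article("‘Abd", "Bahá’"))    # preceding word ends in a consonant
```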
Finally, when using common terms such as "Baha'i", "Baha'u'llah" and so on, I will leave off the diacritics and only use simple apostrophes (rather than right- or left-leaning ones), simply for the sake of convenience.
Momen, Moojan. Transliteration. http://www.northill.demon.co.uk/relstud/transliteration.htm
Winters, Jonah. Dying for God: Martyrdom in the Shii and Babi Religions. Unpublished MA Thesis, University of Toronto, 1997 (available online)


