The occasional ramblings of a freelance lexicographer

Wednesday, May 23, 2018

Entrepreneurial micro-business or ELT Deliveroo?

In my working life, I inhabit a number of different worlds.

There’s the general ELT world I mingle with at conferences, events and online that includes teachers, teacher trainers, publishing folk, academics and other freelancers. We talk about teaching, methodology, technology, language and yes, occasionally, the state of ELT.

Then there’s the ELT writing crowd, other freelance writers and editors who congregate face-to-face and online via groups like MaWSIG and ELT Freelancers, as well as through various Facebook groups. Like most colleagues, we largely enjoy a good moan … about our latest hassles and project nightmares, about the stresses of being a freelancer subject to the whims of the publishing industry and inevitably, about how things ‘ain’t like they used to be’. They are, however, also a very supportive bunch, always happy to offer encouragement, practical advice and, most importantly, a good laugh. And at times, it can be pretty inspiring to see the varied and exciting things we all get up to.

I also occasionally dip a toe into the world of local networking groups full of (largely female) entrepreneurs and small business owners who seem to spend a lot of time and energy (and money!) on branding and marketing and business plans and coaching and serious networking … the idea of an ‘elevator pitch’ or asking for ‘referrals’ at an ELT event would make most folk run a mile, but for these ladies, it’s all an essential part of the game. If that sounds a touch ‘sniffy’, it really isn’t meant to be. I don’t quite feel part of the ‘networking gang’ largely because my work doesn’t really fit their model. Most of them are customer-facing businesses (fitness instructors, therapists, consultants of some kind) who need to create a brand and market it to members of the public (and each other!). Some are small businesses with staff and premises and physical products to sell. And whilst a lot of the chat in these circles doesn’t really apply to me and my context, I do still meet some interesting people and I often pick up ideas that are tangentially useful or that I can adapt to be relevant.

How I see myself professionally varies enormously depending on who I’ve been hanging out with and how work’s going at any one time. I don’t quite feel like I’m a small business or an entrepreneur, but after a particularly inspiring networking event or talk, I err towards the idea of being a successful, funky little micro-business. After a successful talk at an ELT conference, discussing language and pedagogy with all kinds of different people, I can see myself as a budding ‘expert’ in my field with things to say and stuff to contribute. A lot of the time though, I’m just a slightly frustrated and disillusioned hack writer churning out ‘content’ in less-than-ideal conditions and barely scraping together a living (for the record, I earn considerably less than the average UK salary and my average yearly income has barely risen over 20 years of freelancing).

Earlier this week, a radio programme – The Digital Human on Radio 4 – made me stop and think again about work and my relationship with it. The programme explored how, as a society, we’ve become intent on finding ways to use technology to make our lives easier, more ‘frictionless’. It asked where all the time we’ve supposedly saved goes and it looked into how our work and home lives have increasingly merged, especially for those of us involved in the gig economy. One anecdote from anthropologist Jan English-Lueck really struck a chord with me:

“I remember talking to a woman who had a really bad problem with carpal tunnel and she’d given up camping, she’d given up reading books, she’d given up everything. And she held up her hands and said ‘I save these for my workplace’.” 

As many of you will know, I’ve been managing a chronic pain condition for nearly 20 years now. I’ve given up many things over the years, but in the past few months, my pains have been particularly troublesome and I’ve found myself giving up driving, giving up going to events that involve lots of standing around or sitting in one place (cinema, gigs, theatre) and increasingly, opting out of social events because at the end of the day, I’m so shattered, I just want to collapse into a fug of painkillers. Am I saving what strength and ability I have for work at the expense of other things in my life? Probably. Because I need to work to earn money and pay the bills. As a freelancer, my income is unstable: I can’t afford to turn down work or miss deadlines, and I don’t get sick pay or paid holidays.

The programme got me wondering whether I’m really an entrepreneurial micro-business with the freedom to choose what I work on and to fit my working hours around other things or whether it’s all just a kind of ELT Deliveroo without the perks of the reflective jacket?

I really don’t know the answer and I’ve flipped between the two poles – and all points in-between – just in the course of writing this post.  What’s your relationship with your working life? Do you see yourself as a business, a creative entrepreneur, an expert, as a gun for hire, a hack writer or a harmless drudge?


Wednesday, May 02, 2018

Corpus insider #2: Frequency & typicality

Corpora are really great for checking collocations: words that are typically used together. Collocation's a really important aspect of language and a vital part of language teaching if we want to help students avoid 'doing' obvious mistakes. As expert speakers, we generally have a feel for an individual word's most typical collocates, but when you're writing materials, it's easy to get a particular combination stuck in your head or to start doubting your intuitions - do we say get a bus or take a bus? The more you say it to yourself, the sillier each one starts to sound. A bit of outside evidence can be really helpful.
If you want to use a corpus to check out collocations though, it's important to understand a few basics about the statistics behind what the corpus tools are showing you and what type of collocations might be appropriate for the materials you're writing.

Frequent vs Typical

The most important distinction to get to grips with is the difference between frequent collocations and typical or significant or strong collocations. Most corpus tools will show you which words most commonly co-occur just based on raw frequency, but some tools will also have an option to rank collocates by strength of attraction, shown as a score. That is, the software will take into account not just how often two words occur together, but how likely that combination is based on the relative frequency of the two items. So the chances of two very frequent words occurring together are quite high and such combinations are therefore often fairly predictable and uninteresting. If you look, for example, at the raw frequencies for words which modify the noun car, you'll come across a whole load of very common adjectives - new car, old car, small car, first car, other cars, etc. That doesn't really tell you an awful lot about language. Most students could probably guess these combinations. But if you rearrange the collocates by significance, combinations like electric car, sports car, rental car and police car start rising to the top, along with some cars that aren't even cars, like cable car. They're clearly much more interesting from a linguistic perspective, much less predictable and much more what we think of when we talk about teaching collocation. See this Sketch Engine blog post for more about this and more examples (although, I kind of disagree with its conclusions re. language teaching!).

Figure: collocates ranked by frequency (the underlined number). Source: Sketch Engine, English Web 2013 corpus
Figure: collocates ranked by score (the number on the right). Source: Sketch Engine, English Web 2013 corpus
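
To make the difference between the two rankings concrete, here is a minimal sketch in Python. It assumes logDice as the score, a common association measure that Sketch Engine's documentation describes for its word sketches; other tools use other measures. All the counts are invented for illustration and are not real corpus figures.

```python
import math


def log_dice(pair_count, node_count, collocate_count):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_count / (node_count + collocate_count))


# Invented counts, for illustration only (not taken from any corpus).
CAR_COUNT = 1_000_000  # assumed total occurrences of the noun "car"

# collocate: (times it modifies "car", times it occurs anywhere)
COLLOCATES = {
    "new": (50_000, 20_000_000),
    "old": (20_000, 8_000_000),
    "small": (15_000, 6_000_000),
    "electric": (12_000, 400_000),
    "sports": (10_000, 900_000),
    "rental": (8_000, 60_000),
}

# Ranking 1: how often the pair occurs, nothing else.
by_frequency = sorted(
    COLLOCATES, key=lambda w: COLLOCATES[w][0], reverse=True
)

# Ranking 2: how strongly the two words attract each other,
# given how frequent each one is on its own.
by_score = sorted(
    COLLOCATES,
    key=lambda w: log_dice(COLLOCATES[w][0], CAR_COUNT, COLLOCATES[w][1]),
    reverse=True,
)

print("By raw frequency:", by_frequency)
print("By logDice score:", by_score)
```

With these made-up numbers, new, old and small lead the frequency ranking, while electric, rental and sports lead the score ranking: a very common adjective is penalised because it turns up next to almost everything.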

When you want typical

I started off using corpora as a lexicographer working on learner's dictionaries. In a dictionary, you want to show the range of a word and its usage, so looking at typical collocates is a great starting point for getting a feel for a word. It helps you to tease out different senses - like the AmE sense of car meaning carriage, as in rail car, train car, freight car, etc. - to identify possible compounds, phrases and idioms  - car park, car pool, get car sick - and to pick out some of the most significant collocates you might want to exemplify and perhaps highlight.

The less obvious but typical collocations are important in teaching materials too, especially when an unpredictable collocation is also very frequent, like catch a bus or board a plane, which score highly on both types of measure. The typical collocations aren't, however, always what we want to focus on.

When you want vanilla

Many dictionary entries, especially for more frequent words, will start with what's known as a 'vanilla' example. That is, a simple example that illustrates the basic meaning of the word in a context that's authentic but doesn't contain other elements that distract from the word being exemplified. Information about less obvious collocations, phrases or colligational patterns will come later. So the Cambridge Dictionaries entry for car has the following example sentences:

They don't have a car. (the 'vanilla' example - 'have' is actually one of the top collocating verbs by raw frequency, but it's unremarkable)
Where did you park the car? ('park' is a more interesting collocate)
It's quicker by car.
a car chase/accident/factory

The same principle holds for many other teaching contexts.

When you're introducing potentially new vocabulary items, you want students to focus on those new words. Of course, you want to present them in a realistic context with appropriate collocates, but you don't want to overwhelm the student with extra information and especially not with collocates that are well above the level of the original target word. So if I was, say, teaching car for the first time, I probably wouldn't throw in sports car or rental car, but it might be appropriate to add a bit of variety to the material with simple combinations like new car or small car. Only later when car was a familiar vocabulary item might I want to extend students' range to talk about other types of cars as appropriate contexts cropped up.

When frequent isn't necessarily obvious

A particularly tricky case in English is the set of 'delexical' verbs (make, do, take, get, have, put, give, etc.) which are all incredibly frequent, but for a learner of English, not at all obvious in terms of which to choose. If we go back to what we do with buses, by far the most frequent collocating verb is take. If you look at collocates by frequency, it's right at the top for most corpora. If you switch to order collocates by significance though, because it's a very common verb, it drops way down the order to be replaced by board, ride, catch, park and drive. Obviously, that doesn't mean that we don't need to teach take the bus because it'll be obvious to our students … because it won't!

Weighing up the numbers

So what does all this mean? Which statistics should we be looking at? Well, the answer is probably both. When I'm researching the collocates of a word, I'll flick between both types of ranking to get an overall picture of how the word works, then make my choices based on the teaching context.
  • If I'm looking for a natural example for a new vocab item, I'll probably look at raw frequencies to find a collocate that's common but not distracting.
  • If a collocate, like catch a bus, is high on both scores, it's probably worth teaching, and maybe highlighting, early on.
  • If I'm looking to extend students' range and get them to use familiar words in more varied ways, then I'll investigate the more interesting collocates that come up when ranked by score.

A note about data

Finally, as ever with corpora, it’s also important to know what data you’re looking at. As I mentioned in my last corpus insider post, most corpora are made up of predominantly written data and, of course, that’s going to affect the type of results you get back. So, going back to my query at the start of this post about get the bus vs. take the bus, most of the corpora I looked at listed take as a top collocate by frequency, but get, which felt more natural to me, was much further down the lists (both by score and raw frequency). When I looked at the Spoken BNC2014 (a corpus of contemporary spoken British English) though, suddenly get the bus rocketed to the top, suggesting it's something we say, but maybe write slightly less often.


Monday, April 23, 2018

IATEFL2018: Vocabulary lists: snog, marry, avoid?

At the recent IATEFL conference in Brighton, I gave a talk as part of the MaWSIG showcase about the way wordlists are used (and misused), especially in writing ELT materials, and some of the issues that writers need to be aware of.

Below is an overview of my key points and also links to some of the references and tools I mentioned. I've embedded links in the post, but also repeated them all at the end, so if you came to the talk and just want the links, feel free to scroll down.

What do I mean by a wordlist?
My talk was about the kind of standardized wordlists that have been put together according to some criteria (typically frequency and usefulness for learners) and then published with the aim of being used as a basis for deciding which vocabulary to prioritize in teaching. There are loads of wordlists out there, but I mentioned just a few of the most well-known:
Specialist lists: Academic Word List (AWL), Academic Vocabulary List (AVL), New AWL, discipline-specific lists (e.g. for Engineering, Medicine, etc.)
Vocabulary level tools: These approach the task from a slightly different perspective. Instead of providing a limited list of target vocab, they classify items from a learner's dictionary according to the level at which learners are most likely to start using/need each item. I'm especially familiar with English Vocabulary Profile, EVP (from Cambridge) and there's also the Global Scale of English, GSE, vocab tool (from Pearson). Both online tools allow you to look up an item and check its suggested level based on the CEFR scale (A1, A2, B1 etc.)

Why are wordlists popular?
Given the huge variety of English vocabulary, it's not surprising that anything that gives teachers and materials writers a starting point and a guide to which items might be most useful to teach first is popular. Wordlists provide a principled basis for planning a vocab syllabus, backing up our intuitions about which words are most frequent and saving us from reinventing the wheel by having to research the frequency of each word as we go along. For publishers, they also help to ensure a consistent approach to vocab across a coursebook series, across different titles or between a group of writers all working on the same project; they provide a single lexical hymn-sheet for everyone to sing from, if you like.

Why you need to understand your list:
Whilst wordlists have an obvious appeal, especially for writers, I think it's really important to understand any list you plan to use before you get started. Understanding how a list was put together, what the aims of the list compilers were, what criteria they used to select items and what data they used is vital. To take the Academic Word List (AWL) as an example:

  • It aims to identify general academic vocab, so it excludes items that only appear in specific disciplines, such as science or medicine, and focuses on words common across a range of disciplines. So if you're teaching ESP/ESAP, you'll need to supplement it with relevant subject-specific vocab.
  • It's based on data from published academic writing, not from student writing. That means it provides a good guide to the vocab students might need to know receptively (i.e. for reading), which might not necessarily be quite the same as what they need productively, for their own writing. See Durrant (2016) for an interesting look at what proportion of an academic wordlist student writers actually need.
  • The AWL excludes items on the General Service List (GSL) based on the premise that EAP students will have already 'learnt' this core general vocabulary. That doesn't, however, take into account any gaps in students' general vocab knowledge or the fact that many of those general words are absolutely vital for academic writing and are often used in ways that might not be entirely predictable and that students might not have already encountered. That's not necessarily a criticism of the list (you've got to draw the line somewhere), but it does mean that as a writer, you might want to include some of that off-list vocab in your syllabus.

And it's not just the AWL this applies to. All wordlists have their own quirks and limitations and unless you understand what these are, you're not going to get the best out of the list or understand what gaps you might need to fill. See the links at the bottom of this post for some places you can learn more about different lists.

User beware:

Issue 1: The nature of English
One issue with trying to chivvy words into a nice, neat list is that English is a messy beast and words are slippery little suckers! 

Multiple meanings: English is a highly polysemous language, that is, many words have multiple meanings. For example, a table can be a piece of furniture (very much an elementary word) or it can be a graphic representation of data in rows and columns (definitely a less frequent sense). Most lists don't differentiate between senses, leaving the user to guess which sense is the core one that should be taught and whether they should stretch to other senses or not. Lists such as EVP and GSE do give levels for different senses (so EVP has table=furniture as A1 and table=chart as B1), but if you put your text through a text-checking tool such as Text Inspector or VocabKitchen, it'll show the level for the basic, most frequent sense only. So in the phrase "the data in the table above", table would be highlighted as A1.

What is a word: Most lists deal in lemmas, that is, a single part of speech and its associated inflections (so speak, speaks, spoke, spoken, speaking is one lemma). Some lists, such as the AWL, take the word family as their basic unit, which takes in all the words from a single root, including different parts of speech and prefixes (develop, development, developing, developmental, underdeveloped, etc.). This makes sense in an EAP context where being able to switch between parts of speech is a key skill for student writers, but deciding which members of a word family to focus on also requires a bit of common sense. You might, for example, decide to skip disestablishment as part of the establish word family!

Chunks: Most frequency-based wordlists tend to focus on individual words, simply because even the most common phrases or formulaic expressions (at least, in the first place, etc.) just don't make it in on frequency criteria alone. However, language chunks make up somewhere between 30% and 50% of any text, so they're clearly a really important part of vocabulary learning. This has two implications for writers (and teachers). Firstly, you may want to supplement your wordlist with some useful chunks (such as those on the phrasal expressions list or just collocations to go with your key words). Secondly, you need to take chunks into account if you're using text-checkers - the chunk 'in the first place' will be shown as a sequence of A1 single words rather than being recognized as a fixed expression (ranked as B2 on EVP).
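
Here is a minimal sketch in Python of why that happens, assuming a checker that simply looks up each word on its own. The tiny level tables are hypothetical stand-ins: only the B2 level for 'in the first place' and the A1 levels for its individual words come from the description above, and the two functions are illustrations, not how Text Inspector, VocabKitchen or any other real tool is implemented.

```python
# Hypothetical mini level tables, for illustration only.
WORD_LEVELS = {"in": "A1", "the": "A1", "first": "A1", "place": "A1"}
CHUNK_LEVELS = {("in", "the", "first", "place"): "B2"}


def tag_words(tokens):
    """Word-by-word lookup: every token gets its single-word level."""
    return [(token, WORD_LEVELS.get(token, "?")) for token in tokens]


def tag_with_chunks(tokens):
    """Try the longest known chunk at each position before falling
    back to single-word lookup."""
    longest = max(len(chunk) for chunk in CHUNK_LEVELS)
    tagged = []
    i = 0
    while i < len(tokens):
        for size in range(longest, 1, -1):
            candidate = tuple(tokens[i:i + size])
            if candidate in CHUNK_LEVELS:
                tagged.append((" ".join(candidate), CHUNK_LEVELS[candidate]))
                i += size
                break
        else:
            # No chunk starts here, so treat the token as a single word.
            tagged.append((tokens[i], WORD_LEVELS.get(tokens[i], "?")))
            i += 1
    return tagged


tokens = "in the first place".split()
print(tag_words(tokens))        # four separate A1 words
print(tag_with_chunks(tokens))  # one B2 chunk
```

The same four tokens come out as four A1 items or as one B2 item depending purely on whether the lookup knows about chunks, which is why a text-checker's output still needs a human eye.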

Issue 2: The nature of language learning
Similarly, language learning is a messy, non-linear sort of process that isn't as simple as ticking words off a list and declaring them 'learnt'. Wordlists make it all too easy to fall into this trap though. Many's the time I've been told by an editor that I can't include a word in a vocab activity because it's already been 'covered' at a previous level ... and as Dorothy Zemach put it so brilliantly in her plenary, "We can't have a student see a word twice!". Most research agrees that vocab learning requires repeated exposures to a word. Of course, I understand where my editors are coming from and there are other ways of recycling vocabulary without having to have the same words pop up as the vocab focus time and again, but it's still an important factor to bear in mind.

There's also the issue of whether a word is going to be most useful for a student at any particular stage for receptive purposes (i.e. we just want them to recognize and understand it when they come across it) or whether we expect them to be able to use it productively. A lot of words will start off in a student's receptive vocab and then gradually shift into their productive repertoire. Some words will get stuck in reception even though we'd like them to move on. And others can quite happily stay as receptive only ... I know plenty of words that I understand but will probably never feel the need to use. Again, understanding whether a list is suggesting words for receptive or productive use at a particular level is vital. So, EVP, for example, aims to describe vocab that students are using productively at certain levels (based in large part on what students are writing in Cambridge exams). So if a word is labelled B1, then B1 students are already confident enough to use it in their exam writing. That means they probably became familiar with it receptively quite some time before. And if I want to include a word in a reading text in a B1 book, as receptive vocab, choosing an item marked as B2 will be entirely appropriate.

Issue 3: The nature of learners
Finally, learners don't form the single homogeneous audience, with an equal number and range of vocab learning gaps to be filled, that universal wordlists suggest they might be.

L1 plays an important role in vocab learning, with learners from L1s that share a history with English (Romance languages, Greek, Germanic languages) having a head start when it comes to certain words because they're close cognates in their first language. For example, a word like diurnal may seem 'difficult', but if you're an Italian, French, Spanish, Portuguese or Romanian student of animal behaviour, you'll probably recognize it right away. Whereas your German-speaking peer will probably have to look it up to find its translation (tagaktiv).

Age, interests, location and language needs will also play a role in exactly which vocab items are relevant to any given student. Yes, they'll probably all find a common core useful, but they'll want words to describe the things that are important or helpful to them and their context too. When I was learning French at school, I wanted to know all the cool, teenage slang; nowadays I'd be more likely to want vocab to describe my garden. Anyone using English in an ESP context is likely to need apparently low-frequency, specialist terms, sometimes quite early on in the language learning process.

Language level makes a difference too. Whilst most linguists agree on a common core of the most frequent couple of thousand words or so which might sustain a learner up to, say, intermediate level, beyond that, frequency statistics become less reliable and less useful. As you start to investigate lower frequency words, the range of similar-frequency items suddenly explodes and exactly which words you choose to teach will inevitably have to be guided more by usefulness for particular groups of learners than by simple frequency, making wordlists a much less reliable guide for higher level learners.

Wordlists: snog, marry, avoid?
So, if wordlists are so flawed, should we be bothering with them at all? Well, personally, I'm not going to be dumping them just yet because they are still undoubtedly an incredibly useful tool. But they're just that, a tool, to be used like any other reference resource we might turn to, as just one part of the mix, with full knowledge of their idiosyncratic quirks, taking into account all the factors I've mentioned here and always applying a solid dose of common sense.


Wordlists:
These are the most useful links I've found for each list. Most give the background to the list and the list itself.
General Service List (West, 1953)
New GSL (Browne et al., 2013)
Academic Word List (Coxhead, 2000)
Academic Vocabulary List (Gardner & Davies, 2013)
New AWL (2013)
Phrasal Expressions List (Martinez & Schmitt, 2012)
Phrasal Verbs List (Garnier & Schmitt, 2015)
Global Scale of English vocab tool (Pearson) - for background to the vocab tool, click on Developing the GSE Vocabulary on the Research & Expertise page 
See also Mura Nava's excellent list of wordlists for many more lists and links, including many of the specialist ESP lists. 

Text analysis tools:
Text Inspector - a paid tool with several analysis options (including EVP and AWL)
VocabKitchen - a free tool with CEFR and AWL options
Lextutor - a free tool with several analysis options, but not the most user-friendly interface  

Other references:

Durrant, P. (2016). To what extent is the Academic Vocabulary List relevant to university student writing? English for Specific Purposes, 43.
Working with wordlists - a blog post I wrote for the MaWSIG blog a couple of years ago
