• Kat Gupta’s research blog

    caution: may contain corpus linguistics, feminism, activism, LGB, queer and trans stuff, parrots, London

Corpus Linguistics 2011

I admit that I was feeling rather grumpy before CL2011. Extracting my data had proved tricky, I worried that the stuff I was working on wasn’t ready to present and I was feeling somewhat anti-social.

However, I ended up having a rather good conference. Part of it is just that corpus linguists tend to be nice people – as one first-time attendee noted to me, people were constructive and helpful when commenting on each other’s presentations. This is not always the case – these things can turn into an academic pissing contest – and she was pleasantly surprised. As Costas noted, it can feel a bit like a family reunion (the good kind, I hope). It was nice to catch up with friends, meet new people and extract others from the hilariously awkward situations they managed to create for themselves. I have a story about a red devil tattoo now.

The organisation was impeccable. This was the first conference I’ve been to that was in a dedicated conference centre rather than in a university. I’ve got to say, the food was much better than I’m used to at these things. I won’t name names, but some of us were rather enamoured with the little moussey-cakey things at lunch. The only problem seemed to be with workshop venues – there weren’t computing facilities so attendees were asked to bring their own laptops, and the room assigned to one workshop wasn’t suitable for an active, hands-on session.

The conference scheduling was thoughtfully done and I presented in the same session as others working on newspaper discourse, including Anna Marchi. It was interesting both for us and for the audience – we could make links between each other’s papers and also had the chance to talk afterwards.

I do wonder why corpus linguists haven’t really embraced twitter though. There was a presentation on it (which I livetweeted) but we weren’t told about hashtags and nobody organised a tweetup or similar. Having seen something of how my astrophysicist sister uses twitter at her conferences, I think we’re missing out – it looks like a good way of engaging with presentations and finding other conference attendees. Next time, eh?

“To the Editor of the Times…”

Apologies for the silence. I am trying to write a conference paper for, um, Thursday and my data is stubbornly refusing to organise itself into categories. In a way I’m quite pleased – I’m now working with two corpora and it’s interesting that they show this difference. One is the Suffrage corpus that I’ve been using until now, created by identifying all the articles in the Times Digital Archive containing suffrag* and pulling them out. The asterisk is a wildcard which means that I don’t need to specify an ending – because it’s got that wildcard in it, the search term will find suffrage, suffragism, suffragette, suffragettes, suffragist, suffragists and so on. It will also identify Suffragan, an ecclesiastical term and one that has nothing to do with the suffrage movement. So the script has an exception in it for that term.
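
As a rough illustration of the wildcard search with its exception (not the actual extraction script), here is a minimal Python sketch. The function name and the sample sentence are made up for the example, and it assumes the asterisk stands for any run of letters.

```python
import re

# suffrag* : "suffrag" followed by any run of letters, case-insensitive.
SUFFRAG = re.compile(r"\bsuffrag[a-z]*\b", re.IGNORECASE)

# The one exception mentioned above: an ecclesiastical term.
EXCEPTIONS = {"suffragan"}

def suffrage_hits(text):
    """Return the suffrag* words found in text, skipping the exceptions."""
    return [w for w in SUFFRAG.findall(text) if w.lower() not in EXCEPTIONS]

sample = "The Suffragan Bishop met two suffragists and a suffragette."
print(suffrage_hits(sample))  # ['suffragists', 'suffragette']
```

An article would then go into the corpus if this list is non-empty.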

The other corpus is composed of Letters to the Editor – the LttE corpus. This sounds very staid and genteel but actually contained heated exchanges between different factions of the suffrage movement, the Women’s Anti-Suffrage League, various anti-suffragist men and anyone else who felt compelled to stick their oar in. At times it reads more like a blogging flamewar! This corpus was extracted using suffrag* as a search term to get letters mentioning suffrage etc; to get the letters I looked at the header of each text. The header contains information like the file name, the date it was published in the Times, the title of the article and, crucially, what it’s classified as – News, Editorials, Leaders or, indeed, Letters to the Editor. So this time the script looked for suffrag* and Letters to the Editor in the header.
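
The two-part test could be sketched along these lines. This is only an illustration: the real header format isn't shown here, so the dictionary, its `category` key and the sample title are all hypothetical stand-ins.

```python
import re

SUFFRAG = re.compile(r"\bsuffrag[a-z]*\b", re.IGNORECASE)

def is_suffrage_letter(header, body):
    """True if the header classifies the text as a letter to the editor
    and the body contains a suffrag* word other than Suffragan."""
    # "category" is a hypothetical field name for the classification.
    is_letter = header.get("category") == "Letters to the Editor"
    mentions = any(w.lower() != "suffragan" for w in SUFFRAG.findall(body))
    return is_letter and mentions

# Hypothetical header, reduced to the fields this sketch needs.
header = {"title": "Woman Suffrage", "category": "Letters to the Editor"}
print(is_suffrage_letter(header, "Sir, the suffragists deserve a hearing."))  # True
```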

Both corpora are divided by year and month, so I have a folder for 1908, 1909, 1910 etc and within those, sub-folders for each month. So if I wanted to, I could compare texts from April 1909 to April 1910, or June 1913 to December 1913, or the first six months of 1911 to the first six months of 1912. I like organising corpora in a way that allows this flexibility.
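
To show why this layout is flexible, here is a small sketch that picks out every text between two months. It assumes year folders named like `1909`, month sub-folders named with two digits like `04`, and `.txt` files inside them; the actual naming may well differ.

```python
from pathlib import Path

def texts_between(root, start, end):
    """Yield text files whose (year, month) folder lies between
    start and end, inclusive. start and end are (year, month) tuples."""
    pattern = "[0-9][0-9][0-9][0-9]/[0-9][0-9]"  # e.g. 1909/04
    for month_dir in sorted(Path(root).glob(pattern)):
        if not month_dir.is_dir():
            continue
        key = (int(month_dir.parent.name), int(month_dir.name))
        if start <= key <= end:
            yield from sorted(month_dir.glob("*.txt"))

# Example call (hypothetical root folder):
# first_half_1911 = list(texts_between("suffrage_corpus", (1911, 1), (1911, 6)))
```

Because the tuples compare year first and month second, the same function covers a single month, a half-year or a span of several years.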

In Chapter Four, I investigated Mutual Information (MI) for suffragist, suffragists, suffragette and suffragettes in each year in the Suffrage corpus, then categorised the words it came up with. Mutual Information is a measure of how closely words are linked together. So, suffragist and banana aren’t linked at all, but as I found, suffragist and violence are linked. I then came up with categories for these words – direct action, gender, politics, law & prison and so on, and compared these categories across the different years.
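
For the curious, one common way to compute MI compares how often two words actually occur together with how often they would be expected to by chance. The sketch below uses that simple version; corpus tools often add refinements such as a collocation window, so treat it as an assumption rather than the exact formula behind these results. The figures are invented purely to show the arithmetic.

```python
import math

def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """log2(observed / expected) co-occurrence of two words.

    expected = node_count * collocate_count / corpus_size, i.e. how often
    the pair would turn up together if the words were unrelated."""
    if pair_count == 0:
        return float("-inf")  # never seen together
    expected = node_count * collocate_count / corpus_size
    return math.log2(pair_count / expected)

# Invented toy counts, not real corpus figures.
print(mutual_information(pair_count=8, node_count=100,
                         collocate_count=125, corpus_size=100_000))  # 6.0
```

A score around zero means the words co-occur about as often as chance predicts; higher scores mean a stronger link.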

I’ve now done the same for the LttE. What’s interesting is that there is not much overlap between the words associated with suffragist, suffragists, suffragette and suffragettes in the LttE corpus and the words associated with suffragist, suffragists, suffragette and suffragettes in the Suffrage corpus. Part of this is to do with the different functions of the texts; rather than reporting news, the Letters to the Editor try to argue, advocate and persuade. However, there are also words like inferior, educated and employed in the LttE data – words that seem to be more about the attributes of women or suffragist campaigners. This just doesn’t seem to be a feature in the Suffrage data.
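
Checking the overlap between two sets of collocates is straightforward once the MI word lists exist. A minimal sketch, with made-up word lists standing in for the real output:

```python
def collocate_overlap(words_a, words_b):
    """Return (shared, only_in_a, only_in_b, jaccard) for two word lists."""
    a, b = set(words_a), set(words_b)
    shared = a & b
    union = a | b
    # Jaccard index: proportion of all words that both lists share.
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, a - b, b - a, jaccard

# Illustrative lists only, not the actual MI results.
suffrage_words = ["violence", "police", "prison"]
ltte_words = ["inferior", "educated", "employed", "prison"]

shared, only_suffrage, only_ltte, jaccard = collocate_overlap(suffrage_words, ltte_words)
print(sorted(shared), round(jaccard, 2))
```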

Also interestingly, the categories I came up with don’t work for this corpus. While direct action was a prominent category for the Suffrage corpus, I don’t think I can find a single direct action term in the LttE MI data. Not even things like demonstration, which is pretty innocuous as far as direct action goes.

So what’s going on here? At least part of it is due to the different functions of news reports and what are essentially open letters. But I think there’s also a difference in who was writing the letters. Letters to the Editor offered both suffrage campaigners and anti-suffrage campaigners an opportunity to represent their views themselves, rather than being represented by or mediated through a reporter, editor and others engaged in the production of a news report. I don’t think it’s that strange that the language they use and avoid is different.

Live at Jodrell Bank

Been a bit quiet here, mainly because I’ve been writing and rewriting parts of chapters 3 and 5, fuelled for the most part by caffeine, discounted creme eggs and irregular sleeping patterns. It’s not been pretty.

However, I did manage to get to Live from Jodrell Bank, tickets purchased for my birthday by my favourite little sister. As the name suggests, it took place at Jodrell Bank and the stage was in the shadow of the Lovell Telescope itself. The Lovell Telescope is an impressive structure and seeing it surrounded by people and bathed in glorious sunshine was definitely a change!

Photo: Stage beside, and dwarfed by, the Lovell Telescope
Photo: Detail of the supporting structure of the dish

There were also interesting people wandering around. The chap in the photo below was dressed in black, wearing a large crow’s head and clutching two large white balloons. We saw him around all day but didn’t manage to work out what he was doing (apart from looking dramatic).

Photo: Man wearing a crow’s head over his head, holding two large white balloons

The new visitor centre was open and my friend Liz and I had fun playing with something that resembled an old charity money-spinner. It looked like a large black funnel, and you started rolling a ball at the top. The ball would spiral down the funnel until it dropped out of the hole at the bottom. You could then retrieve your ball and do it all over again – multiple times if you’re a small child fascinated by such things or, indeed, two twenty-somethings. It was also fun sending one ball clockwise and the other counter-clockwise and either trying to get them to crash into each other or avoid each other, but I’m almost positive this was not the point of the exercise! The point was that this simple model helps us, the general public, understand how black holes work in a fun and hands-on way. People of all ages could engage with it, although the baby we saw there was more interested in chucking the balls straight down the hole!

The map showing different telescopes around the world was striking and really illustrated how the Jodrell Bank Centre for Astrophysics (JBCA) is part of a global community of researchers. It also offered information on the different types of telescopes and showed how the facilities used by the JBCA, such as the e-Merlin network, fit into a wider context.

The line-up was Alice Gold, The Waves Machines, OK GO, British Sea Power and The Flaming Lips. I was especially looking forward to British Sea Power – there’s an endearingly wide-eyed wonder about the natural world in their music; they’ve written songs about a collapsing coastal Antarctic shelf and about light pollution. One of the songs they played was the rather appropriate Observe The Skies, which I think they dedicated to the Lovell Telescope. British Sea Power are known for decorating the stage with foliage and for the appearance of a large bear. This time there was a bear fighting a robot/microwave.

The Flaming Lips put on an entertaining show, incorporating giant hamster balls, lasers, balloons filled with confetti and close-ups of Wayne Coyne’s nostrils. They wanted to mess around with intros and the audience wanted to sing along so there was a lot of “wait for it, wait for it…wait for it…” going on. I’m far too sleepy to attempt to interpret their lyrics, but Race for the Prize might be about two scientists and Do You Realize?? informs us that we are floating in space.

I was standing just in front of Jen so was treated to the physicists arguing over how they did the lasers and what constellations and planets we could see. Astronomers are interesting people to know when you’re standing in a field at night!

Photo: Brightly coloured balloons floating in front of a brightly lit stage
Photo: Balloons floating in front of a brightly lit stage and lit so they seem to glow

As well as being an interesting and unusual music event, it was also an effective outreach event. PhD and postdoc researchers were on hand with science tricks, there were short talks throughout the afternoon by astronomers and Dr Tim O’Brien took to the stage to talk about the kind of research that takes place at Jodrell Bank. He played recordings of pulsars and people were cheering and clapping in time with them. It’s probably the first time I’ve heard people chanting SCIENCE SCIENCE SCIENCE at a gig. There was something joyfully and unapologetically geeky about it: there was an atmosphere of “we’re at a gig under a radio telescope, isn’t this amazing?” As Jen said in her talk, science is about having that sense of wonder and curiosity about the universe and trying to understand it better. There’s nothing uncool about that.

Of course it made me think about what linguistics could do. While communication is a fascinating thing, we collect our data from people, not the vastness of space. Unfortunately neither the hard drive where my corpora are stored nor a digital recorder is as dramatic as a telescope, especially one as iconic as the Lovell. So I reckon we should reverse it: instead of trying to find a suitably research-intensive location for the gig, we should make the gig research-intensive and record band-crowd interactions or something.

Space and stars also inspire people to write songs about them. While Massive Attack deserve a mention for “love, love is a verb; love is a doing word” in Teardrop and The Indelicates for “You know exactly how clever sounds, the soft consonants and rounded vowels” in Jerusalem, I’m kind of drawing a blank on linguistic songs. I did, however, find a paper on a corpus analysis of rock harmony so that’s something, right?

Any suggestions for linguistics songs? Are there any for your area? What would your dream outreach event be?

Photos by K. Gupta and E. Kedge