• Kat Gupta’s research blog

    caution: may contain corpus linguistics, feminism, activism, LGB, queer and trans stuff, parrots, London

A defence of political correctness

Trigger warning: this post contains slurs for race, sexuality, disability, neurodiversity and gender.

So I tweeted something the other night and was a bit surprised that it took off:

As a queer, Asian, female-assigned-at-birth person with an interesting medical history, I like political correctness. Political correctness is why it is generally considered unacceptable to loudly inform me that I am a “chink”, a “paki”, that I should “fuck off back to where [I] came from”, that I should “fuck off back to Santa’s grotto”, that I am a “fucking dyke”, that I am a “fucking lesbian”, that I am a “fucking dwarf” or that I’m an “it”. Obviously not everyone agrees, which is why all of these examples are taken from real life.

When I come across a written article that uses slurs, I am not inclined to read it. I have lots of things to read: my “to read” list is constantly full of books, journal articles, blog posts. Unless someone has contracted my services as a proofreader or copyeditor, I am not obliged to read anything – and I am not wasting my time on something that uses hurtful language. I am not obliged to “look past” those slurs when those slurs hurt me.

If someone who doesn’t have the right to reclaim the term uses the word “tranny” throughout an article, I also have to wonder how far their knowledge extends. As someone who is involved with trans* welfare, health and legal issues, I have to wonder what I can take from it. I read a lot of these articles because one of my academic interests is the media representation of minority groups and issues, but – please forgive me if this sounds arrogant – I tend not to find anything interesting, insightful or useful in them.

I love words. My degrees have basically been a love affair with words – how they’re used, what they mean, how they come with associations and connotations. I’ve also been accused of being “politically correct” and I’m well familiar with the argument that such political correctness stifles free expression and is a form of censorship. However, I think avoiding these slurs makes me a better, more thoughtful and more creative writer. For example, when I see the word “demented” being used, my mind flashes back to the dementia ward and day hospital where my mum worked and where my sister and I would accompany her if we were off sick from school. I think of my friend’s dad – my mum’s patient – and having to pretend to be my mum, because he couldn’t recognise that I was a different person and trying to explain to him that I wasn’t my mum would have been pointlessly upsetting. I think of the astonishing people my mum has treated – doctors and teachers and lecturers and footballers – and their families, and the aching loss of a mind, a history, a person.

I almost certainly don’t think what the writer wants me to think, which appears to be “isn’t this insane[1]/outrageous!”.

If I wrote something and there was so great a mismatch between what I wanted to say and what my readers took away from it, I’d consider that an unsuccessful effort. Not because I’d upset someone – I enjoy creating discomfort and disquiet in my creative work – but because I’d upset someone without intending to, because I’d used my words ineffectively, because it meant that I wasn’t doing my best as a writer.

Being politically correct has made me think about my language choices and consider carefully what I want to say. I’m reminded of these posters by Alison Rowan:
[Image: Alison Rowan’s “that’s so…” posters]

There are lots and lots of alternatives which often express something more precisely. Just look at what you could use instead of “gay”: silly, heinous, preposterous, contemptuous, hideous, hapless, uncouth, unfortunate, deplorable, trashy, ridiculous, atrocious, corrupt, foolish. Or “retarded”: childish, absurd, indiscreet, ignorant, uncool, pointless, careless, irrational, senseless, irresponsible, illogical, unnecessary, trivial, ill-considered, dull, fruitless, silly. Each of those has different shades of meaning. Instead of the scattershot of “retarded” or “gay”, your words can be like precision strikes, hurting only the people you intend to hurt.

If you want to hurt people, that is. How much worse it is if, in your casual and unthinking use of “gay” or “retarded” or “spaz”, you wound someone you never meant to wound, never realised you wounded.

So back to political correctness.

The term “political correctness” was popularised by its opponents; people who think that political correctness is often a good thing tend to call it something else, like “basic courtesy”. Political correctness means treating people with respect and courtesy, being mindful of what they do and do not want to be called and how they do or do not want to be addressed. It offers dignity to minority groups, who are already being shat on in so many ways without having to deal with a barrage of slurs.

Saying that you’re against political correctness is not radical or edgy or subversive; it affirms the status quo. It affirms society’s default as white, straight, cisgendered, neurotypical, non-disabled, male. It does not challenge or mock or destabilise power. What, precisely, is subversive about trotting out the same tired racist, misogynistic, homophobic, transphobic, ableist crap?


[1] And let us contemplate the wide variety of words used to stigmatise mental illness and neurodiversity.

swearing with Google ngrams

Back, after an unwelcome hiatus. I’ve learnt my lesson, though, and will be backing up my WordPress database. Regularly.

Anyway, being a linguist of the sweary variety, I was intrigued to see someone on Twitter use Google Labs’ Ngram Viewer to look at cunt and express surprise and delight that cunt was being used so frequently, and rather earlier than expected.
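If you want to poke at the same numbers yourself, the graph data can be pulled programmatically. Here’s a minimal sketch in Python against the unofficial JSON endpoint that the Ngram Viewer page itself calls – it’s undocumented and may change without notice, and the corpus name is an assumption on my part:

```python
# Minimal sketch: query the unofficial JSON endpoint behind the
# Ngram Viewer page. Endpoint and parameters are undocumented and
# may change without notice; "en-2019" is an assumed corpus name.
import requests

params = {
    "content": "cunt",
    "year_start": 1600,
    "year_end": 2000,
    "corpus": "en-2019",
    "smoothing": 0,
}
resp = requests.get("https://books.google.com/ngrams/json", params=params)
resp.raise_for_status()
for series in resp.json():
    # "timeseries" holds one relative frequency per year in the range
    print(series["ngram"], series["timeseries"][:5])
```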

I thought the graph looked interesting. The frequency of cunt was rather erratic: a big isolated peak around 1625–35; a smaller isolated peak around 1675; peaks around 1690 and 1705; a rather spiky presence between 1705 and 1800; then a fairly consistent low frequency until around 1950, when it increases again.

This seemed puzzling – rather than the word being present at a consistently low level, there were these huge spikes in the 17th century. I decided to have a look at the texts themselves. These turned out to be in Latin, and the following image rather neatly illustrates the two different meanings at work here:

The books themselves seem to be religious texts written in Latin, even if Google’s ever-helpful advertising algorithm interprets things rather differently. As you can see in the first image, I selected texts from the English corpus. It’s possible that the books are assigned to a corpus based on their place of publication, but it’s not very intuitive.

I took a closer look at the texts to try and work out what was going on. Some of the texts were in Latin, as in this example taken from De paradiso voluptatis quem scriptura sacra Genesis secundo et tertio capite:

However, this was not the only issue. I found at least one example of a musical score – this one taken from Liber primus motectorum quatuor vocibus:

Here, the full lexical item is benedicunt. In both of these examples, cunt is not a full lexical item; I can understand why the layout of the score might have led to it being parsed as a separate item, but I’m a bit confused as to why the same seems to have happened with dicunt.
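To see how score layout can produce this, here’s a toy sketch – the tokeniser is hypothetical, not Google’s actual pipeline, but it mimics the kind of naive whitespace splitting that would treat a sung syllable as a word in its own right:

```python
# Toy sketch: in a vocal score, "benedicunt" is split into syllables
# spaced out under the notes. A naive tokeniser that strips hyphens
# and splits on whitespace counts "cunt" as a standalone word.
# (Hypothetical tokeniser, not Google's actual pipeline.)
score_text = "be- ne- di- cunt om- nes gen- tes"

tokens = [t.strip("-") for t in score_text.split()]
print(tokens)            # ['be', 'ne', 'di', 'cunt', 'om', 'nes', 'gen', 'tes']
print("cunt" in tokens)  # True
```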

The high frequency of cunt can also be attributed to Optical Character Recognition (OCR). Basically, the text is scanned and a computer program tries to convert the images into text. This has varying degrees of accuracy – it can be very good, but things like the size and font of the print, the paper it was printed on and the age of the text all have an effect. The text obtained through OCR is then linked to the image.
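For the curious, the image-to-text step looks roughly like this. This sketch uses the open-source Tesseract engine via the pytesseract wrapper – Google Books uses its own internal pipeline, and ‘page.png’ is a stand-in for a scanned page:

```python
# Sketch of the OCR step described above, using the open-source
# Tesseract engine. Google Books' own pipeline is not public;
# 'page.png' is a hypothetical scanned page.
from PIL import Image
import pytesseract

image = Image.open("page.png")
text = pytesseract.image_to_string(image)
print(text)  # accuracy varies with font, print size, paper and age
```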

This example, taken from Incogniti clariss. olim theologi Michaelis Aygnani Carmelitarum Generalis, is probably familiar to those working with OCR-scanned texts. The text actually reads cont. but the OCR has read it as cunt. The search program can’t read the image files; all it has to go on is the OCR-scanned text. When that isn’t accurate, you get results like these.
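The slip is a single character. As a toy illustration – the confusion pairs here are made up for the example, not any real OCR engine’s error model – one o/u misread is enough to turn an innocent abbreviation into a hit:

```python
# Toy illustration of single-character OCR confusion: in worn early
# modern print, 'o' is easily misread as 'u'. These confusion pairs
# are illustrative only, not a real engine's error model.
CONFUSIONS = {"o": "u", "e": "c", "l": "i"}

def misreadings(word):
    """Yield one-character OCR misreads of a word."""
    for i, ch in enumerate(word):
        if ch in CONFUSIONS:
            yield word[:i] + CONFUSIONS[ch] + word[i + 1:]

print(list(misreadings("cont")))  # ['cunt'] - one slip inflates the count
```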

I think Google ngrams is interesting, but with some caveats. Corpora can be tiny – the researcher can have read every single text in their corpus and know it inside out. Corpora can be large and highly structured, like the British National Corpus. Corpora can be large without the researcher having read every single text in them, but through careful compilation the researcher knows where the texts have come from, where they were published and so on – for example, corpora assembled through LexisNexis. This is a bit different – it’s not really clear what’s even in the collection of texts, and the researcher has to trust that Google has put the right texts in the right language section. I’ve seen Google ngrams being used to gauge the relative frequencies of two or more phrases, but for now I think I’ll stick to more traditional corpora for most in-depth work.

Mark Davies also has a post comparing the Corpus of Historical American English with Google Books/Culturomics. His post is in-depth, interesting and systematic; I just swear a lot. You should probably read his.