Apologies for the silence. I am trying to write a conference paper for, um, Thursday and my data is stubbornly refusing to organise itself into categories. In a way I’m quite pleased – I’m now working with two corpora and it’s interesting that they show this difference. One is the Suffrage corpus that I’ve been using until now, created by identifying all the articles in the Times Digital Archive containing suffrag* and pulling them out. The asterisk is a wildcard which means that I don’t need to specify an ending – because it’s got that wildcard in it, the search term will find suffrage, suffragism, suffragette, suffragettes, suffragist, suffragists and so on. It will also identify Suffragan, an ecclesiastical term and one that has nothing to do with the suffrage movement. So the script has an exception in it for that term.
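A minimal sketch of that kind of wildcard search with a Suffragan exception, in Python. This is not the actual script; the pattern and the function name `mentions_suffrage` are assumptions made for illustration.

```python
import re

# Hypothetical pattern: "suffrag" followed by any word characters,
# except where the next letters are "an" (Suffragan, Suffragans).
SUFFRAG = re.compile(r"\bsuffrag(?!an)\w*", re.IGNORECASE)

def mentions_suffrage(text):
    """Return True if text contains a suffrag* word other than Suffragan."""
    return SUFFRAG.search(text) is not None
```

A text that mentions only the Suffragan Bishop is skipped, while one that mentions both a Suffragan and suffragists is still kept, because the search carries on past the excluded word.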
The other corpus is composed of Letters to the Editor – the LttE corpus. This sounds very staid and genteel but actually contains heated exchanges between different factions of the suffrage movement, the Women’s Anti-Suffrage League, various anti-suffragist men and anyone else who felt compelled to stick their oar in. At times it reads more like a blogging flamewar! This corpus was extracted using suffrag* as a search term to get texts mentioning suffrage etc; to pick out the letters among them, I looked at the header of each text. The header contains information like the file name, the date it was published in the Times, the title of the article and, crucially, what it’s classified as – News, Editorials, Leaders or, indeed, Letters to the Editor. So this time the script looked for suffrag* and Letters to the Editor in the header.
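The two-part filter could be sketched like this. The header layout is an assumption: here it is a plain dictionary with a hypothetical `category` field, and the suffrag* search is applied to the body of the text. The real files may store the header quite differently.

```python
import re

# Same hypothetical suffrag* pattern, with Suffragan excluded.
SUFFRAG = re.compile(r"\bsuffrag(?!an)\w*", re.IGNORECASE)

def is_suffrage_letter(header, body):
    """Keep a text only if it is classified as a letter and mentions suffrag*.

    header: dict of metadata; the "category" key is an assumed name.
    body:   the text of the article as a string.
    """
    is_letter = header.get("category") == "Letters to the Editor"
    return is_letter and SUFFRAG.search(body) is not None
```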
Both corpora are divided by year and month, so I have a folder for 1908, 1909, 1910 etc and within those, sub-folders for each month. So if I wanted to, I could compare texts from April 1909 to April 1910, or June 1913 to December 1913, or the first six months of 1911 to the first six months of 1912. I like organising corpora in a way that allows this flexibility.
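As a rough sketch of how that layout makes comparisons easy, here is one way to gather the files for a given year and set of months. The folder naming (a year folder such as `1911` containing two-digit month folders such as `04`), the `.txt` extension and the function name `texts_for` are all assumptions.

```python
from pathlib import Path

def texts_for(root, year, months):
    """List the text files for one year and a set of months.

    root:   top-level corpus folder
    year:   e.g. 1911
    months: iterable of month numbers, e.g. range(1, 7) for January to June
    """
    files = []
    for month in months:
        folder = Path(root) / str(year) / f"{month:02d}"
        if folder.is_dir():  # silently skip months with no folder
            files.extend(sorted(folder.glob("*.txt")))
    return files
```

Comparing the first six months of 1911 with the first six months of 1912 is then a matter of calling `texts_for(root, 1911, range(1, 7))` and `texts_for(root, 1912, range(1, 7))`.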
In Chapter Four, I investigated Mutual Information (MI) for suffragist, suffragists, suffragette and suffragettes in each year in the Suffrage corpus, then categorised the words it came up with. Mutual Information is a measure of how strongly two words are associated: roughly, how much more often they occur near each other than you would expect by chance. So, suffragist and banana aren’t linked at all, but as I found, suffragist and violence are linked. I then came up with categories for these words – direct action, gender, politics, law & prison and so on – and compared these categories across the different years.
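For the curious, one common way of calculating an MI score from raw counts is the base-2 log of observed co-occurrences divided by the number expected by chance. The sketch below assumes that formulation; corpus tools differ in the details, and the `span` adjustment and the function name `mi_score` are assumptions rather than a description of the software used here.

```python
import math

def mi_score(cooccurrences, node_freq, collocate_freq, corpus_size, span=1):
    """MI as log2(observed / expected) co-occurrence.

    cooccurrences:  times the collocate appears near the node word
    node_freq:      frequency of the node word (e.g. suffragist)
    collocate_freq: frequency of the collocate (e.g. violence)
    corpus_size:    total number of words in the corpus
    span:           size of the window around the node (assumed adjustment)
    """
    if cooccurrences == 0:
        return float("-inf")  # never seen together
    expected = node_freq * collocate_freq * span / corpus_size
    return math.log2(cooccurrences / expected)
```

A score of 0 means the two words co-occur exactly as often as chance predicts; the higher the score, the stronger the link.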
I’ve now done the same for the LttE. What’s interesting is that there is not much overlap between the words associated with suffragist, suffragists, suffragette and suffragettes in the LttE corpus and the words associated with suffragist, suffragists, suffragette and suffragettes in the Suffrage corpus. Part of this is to do with the different functions of the texts; rather than reporting news, the Letters to the Editor try to argue, advocate and persuade. However, there are also words like inferior, educated and employed in the LttE data – words that seem to be more about the attributes of women or suffragist campaigners. This just doesn’t seem to be a feature in the Suffrage data.
Also interestingly, the categories I came up with don’t work for this corpus. While direct action was a prominent category for the Suffrage corpus, I don’t think I can find a single direct action term in the LttE MI data. Not even things like demonstration, which is pretty innocuous as far as direct action goes.
So what’s going on here? At least part of it is due to the different functions of news reports and what are essentially open letters. But I think there’s also a difference in who was writing the letters. Letters to the Editor offered both suffrage campaigners and anti-suffrage campaigners an opportunity to represent their views themselves, rather than being represented by or mediated through a reporter, editor and others engaged in the production of a news report. I don’t think it’s that strange that the language they use and avoid is different.