OU blog

Patrick Andrews

E-language corpus

Edited by Patrick Andrews, Tuesday, 18 Mar 2014, 22:59

I attended a short talk by Ronald Carter at the OU in Milton Keynes on Wednesday about an e-language corpus that is being developed. 

The University of Nottingham and Cambridge University Press (CUP) are developing the Cambridge and Nottingham e-language corpus. So far, the corpus contains one million words, but it will be developed further. They are looking at Twitter, blogs, discussion boards, emails and SMS.

It was argued that e-language is very significant, with 10 billion emails and 300 million tweets sent per day. The nature of the language varies greatly: blogs are more writerly (a high density of nouns, adjectives, prepositions and articles), SMS is very like spoken English (high use of pronouns, adverbs, verbs and interjections), and Twitter is closer to the writerly end. He suggested (and this seems plausible to me) that blogs and Twitter are more formal because they are relatively public.
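As a rough illustration of the kind of comparison described above, the sketch below counts how much of a text falls into the "writerly" word classes and how much into the "spoken-like" ones. It is not the method used in the talk or by the corpus team. It assumes the text has already been tagged with coarse part-of-speech labels in the style of the Universal Dependencies tagset (where DET and ADP are broader than articles and prepositions), and the function name `pos_profile` and both sample sentences are invented for the example.

```python
from collections import Counter

# Assumption: each token carries a coarse, Universal Dependencies-style POS tag.
# DET and ADP stand in for "articles" and "prepositions" and are slightly broader.
WRITERLY = {"NOUN", "ADJ", "ADP", "DET"}
SPOKEN_LIKE = {"PRON", "ADV", "VERB", "INTJ"}


def pos_profile(tagged):
    """Return the share of tokens in each group for a list of (word, tag) pairs."""
    counts = Counter(tag for _, tag in tagged)
    total = sum(counts.values())
    if total == 0:
        return {"writerly": 0.0, "spoken_like": 0.0}
    return {
        "writerly": sum(counts[t] for t in WRITERLY) / total,
        "spoken_like": sum(counts[t] for t in SPOKEN_LIKE) / total,
    }


# Invented, hand-tagged samples (hypothetical data, not drawn from the corpus).
blog_like = [
    ("The", "DET"), ("new", "ADJ"), ("corpus", "NOUN"), ("of", "ADP"),
    ("online", "ADJ"), ("language", "NOUN"), ("grows", "VERB"),
]
sms_like = [
    ("wow", "INTJ"), ("I", "PRON"), ("really", "ADV"),
    ("love", "VERB"), ("it", "PRON"),
]

print(pos_profile(blog_like))
print(pos_profile(sms_like))
```

On real data the tags would come from an automatic tagger, and e-language (abbreviations, emoticons, hashtags) would probably need extra handling before the proportions meant much.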

There does not seem to be much published about this at the moment, although a quick Google search suggests that some publications are on their way.
