I attended a short talk by Ronald Carter at the OU in Milton Keynes on Wednesday about an e-language corpus that is being developed.
The University of Nottingham and Cambridge University Press (CUP) are developing the Cambridge and Nottingham e-language corpus. So far the corpus contains one million words, but it will be expanded further. It covers Twitter, blogs, discussion boards, emails and SMS.
It was argued that e-language is very significant, with 10 billion emails and 300 million tweets sent per day. The nature of the language varies greatly. Blogs are more writerly (a high density of nouns, adjectives, prepositions and articles), SMS is very like spoken English (high use of pronouns, adverbs, verbs and interjections), and Twitter sits towards the writerly end. He suggested (and this seems plausible to me) that blogs and Twitter are more formal because they are relatively public.
There does not seem to be much published about this at the moment, although a quick Google search suggests that some publications are on their way.