Where Do Emails Go When You're Gone?
The final project of the course was a presentation, not a paper, so it is harder for me to document this step in the journey. But here is a quick overview anyway.
I have to start with a Foucault quote on the one concept from his oeuvre — the idea of an episteme — that I have glommed on to:
In any given culture and at any given moment, there is always only one episteme that defines the conditions of possibility of all knowledge, whether expressed in a theory or silently invested in a practice.
I believe the advent of modern communication paradigms has led to us living in something I call the “networked episteme.” This phrase causes my friends and colleagues to immediately start edging away nervously, but it is what I believe, so I must share it.
A pervasive yet revolutionary communication paradigm is email (it is easy to forget that it has been in mass use for only about 30 years). Mine is the last generation to remember what it might have been like not to have an email address. Electronic mail represents a deep transformation in human communications, as I have written elsewhere.
After that slightly earth-shattering backdrop, I want to narrow the field of inquiry quickly. Why does this matter and what is the connection with archives?
The answer: we are facing archival implosion. There are many kinds of emails. I am specifically talking about personal emails — the modern-day equivalent of personal letters. Historians of any period starting with the year 2000 will face a vacuum if we don’t deal with personal emails (paper letters are long gone now…).
Perhaps you assumed your emails would magically stay around forever and equally magically get released to the world someday? Not so. Google, for instance, may delete your account after two years of inactivity. The incredible paradox is that your emails, for all that they are voluminous, comprehensive, and searchable, are far more ephemeral than the paper letters of prior generations.
Now if we do get organized, we’ll have to deal with an archival explosion. But it is easier to cut than it is to build, and we need to build, for the historians of the future.
Now to the course. My focus was on whether we are well-positioned to handle email archives.
I am hardly identifying a new problem here. There are now “born-digital” archives and archivists around the world. There is a related effort to create new tools to help process enormous email archives. One such tool is ePADD, an open-source software package especially suited to the mbox format (the format in which Gmail exports mail, for instance).
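To make the mbox format a little less abstract: it is a single text file with messages stored one after another, and Python's standard library can read it directly. The sketch below is only an illustration of that idea and has nothing to do with how ePADD works internally. The function names, the sample addresses, and the sample messages are all made up for the example.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_sample_mbox(path):
    """Write a tiny two-message mbox so the sketch runs without real data.

    The addresses, dates, and subjects are invented for illustration.
    """
    samples = [
        ("Mon, 01 Jan 2001 10:00:00 +0000", "Dinner on Friday?"),
        ("Tue, 02 Jan 2001 09:30:00 +0000", "Re: Dinner on Friday?"),
    ]
    box = mailbox.mbox(path)
    for date, subject in samples:
        msg = EmailMessage()
        msg["From"] = "alice@example.org"
        msg["To"] = "bob@example.org"
        msg["Date"] = date
        msg["Subject"] = subject
        msg.set_content("See you there.")
        box.add(msg)
    box.close()  # close() also flushes pending messages to disk


def summarize_mbox(path):
    """Return a (sender, date, subject) tuple for every message in the file."""
    box = mailbox.mbox(path)
    try:
        return [(msg["From"], msg["Date"], msg["Subject"]) for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample_path = os.path.join(folder, "sample.mbox")
        build_sample_mbox(sample_path)
        for sender, date, subject in summarize_mbox(sample_path):
            print(sender, "|", date, "|", subject)
```

A real personal archive would have tens of thousands of messages, attachments, and threads, which is where purpose-built tools earn their keep.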
ePADD has four modules, and their descriptions alone will give you a sense of the complications:
The acquisition stage, or “Appraisal”, is when a donor filters out anything they do not want to include.
At the processing stage, the chosen emails are received from the donor and put in the archive.
At the discovery stage, researchers can search through email archives at a granular level.
The delivery stage is the presentation of unredacted emails to the researcher.
There is a long list of thorny issues. Here I’ll touch on two from the acquisition stage, authenticity and documentation:
It is unclear whether we can verify, at this time, that “donated” emails were not tampered with before the donation. One could imagine email files being edited by a clever forger.
We will also be highly reliant on the documentation created during the acquisition. This establishes provenance at least from the moment of acquisition and provides context for a long-in-the-future researcher.
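One way to picture "provenance from the moment of acquisition" is a manifest written at intake: a cryptographic fingerprint of every message plus a few lines of documentation. This is a minimal sketch of that general idea, not a description of what ePADD or any particular archive does. The field names, file names, and donor label are hypothetical. Note the limit, which is the same one raised above: the fingerprints can show that a message changed after acquisition, but they say nothing about edits made before the donation.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(raw_message: bytes) -> str:
    """SHA-256 digest of the exact bytes received from the donor."""
    return hashlib.sha256(raw_message).hexdigest()


def build_manifest(messages, donor, notes):
    """Record who donated what, when, plus a digest for every message.

    `messages` maps a (hypothetical) file name to the raw message bytes.
    """
    return {
        "donor": donor,
        "acquired_at": datetime.now(timezone.utc).isoformat(),
        "notes": notes,
        "sha256": {name: fingerprint(raw) for name, raw in messages.items()},
    }


def changed_since_acquisition(messages, manifest):
    """Names that were altered, added, or removed since the manifest was made."""
    recorded = manifest["sha256"]
    names = set(recorded) | set(messages)
    return sorted(
        name
        for name in names
        if name not in recorded
        or name not in messages
        or fingerprint(messages[name]) != recorded[name]
    )


if __name__ == "__main__":
    donated = {
        "0001.eml": b"Subject: hello\r\n\r\nFirst message",
        "0002.eml": b"Subject: goodbye\r\n\r\nSecond message",
    }
    manifest = build_manifest(
        donated,
        donor="Hypothetical Donor",
        notes="Exported by the donor; drafts folder withheld.",
    )
    print(json.dumps(manifest, indent=2))
    print(changed_since_acquisition(donated, manifest))
```

The free-text notes field matters as much as the digests: it is where the context a future researcher needs would live.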
We also run the danger that the creation of these archives replicates, despite all the knowledge we now have, the unrepresentative selectivity that we have seen with physical archives. This raises very deep questions about who gets to choose, why they get to choose, how they choose, and the purpose of each archive.
"I've seen things you people wouldn't believe... All those moments will be lost in time, like tears in rain." — Roy Batty, Blade Runner
The two elements of this investigation that struck me as the most interesting go together: how emails change human behavior (and therefore humans, as I mentioned above), and the persistence of the substrate and formats.
The issue of the effect on human behavior is too complex to even sketch here. The issue of the substrate and formats is easily described: we forget, in the era of cloud computing and magical devices, that somewhere there is a physical substrate with magnetized bits representing information. Above that are layers upon layers of abstraction that we call formats.
None of these are “immortal.” Long-term preservation is going to require a lot of work.
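A single short email already shows how many layers sit between the stored bits and the words a person reads. In this sketch, which uses only Python's standard library, one line of text is wrapped in a character set, then a transfer encoding, then a message structure. The sample text, address, and function names are invented for the example, and the base64 encoding is chosen deliberately to make the layering visible.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_sample_message(text: str) -> bytes:
    """Serialize one line of text as an email, forcing base64 for illustration."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["Subject"] = "A note"
    msg.set_content(text, charset="utf-8", cte="base64")
    return msg.as_bytes()


def peel_layers(raw: bytes) -> dict:
    """Walk back up from stored bytes to readable text, naming each layer."""
    msg = message_from_bytes(raw, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    return {
        "stored_bytes": len(raw),
        "content_type": part.get_content_type(),
        "transfer_encoding": str(part["Content-Transfer-Encoding"]),
        "charset": part.get_content_charset(),
        "text": part.get_content(),
    }


if __name__ == "__main__":
    raw = build_sample_message("Café at noon?")
    print(raw.decode("ascii"))  # what is actually stored: headers plus base64
    for layer, value in peel_layers(raw).items():
        print(layer, "->", repr(value))
```

Each of those layers is a convention that some future reader, or future software, will need to still understand.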
Without that work, our lives, despite now being so profusely documented, will be like “tears in rain”, in the words of the doomed replicant Roy Batty.
PS: here are a few slides from my presentation that provide a bit more info: