According to National Security Agency (of the USA) whistleblower William Binney, the NSA probably has most of our email. See the video Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails. The question then is: what are they doing with it? He mentions that the email can be “put it into forms of graphing, which is building relationships or social networks for everybody, and then you watch it over time, you can build up knowledge about everyone in the country.” (see transcript on page). In other words, they could be (are) building a large social graph that they can use in various ways.
In the transcript of the longer video Binney talks about various programs developed to filter out all the information:
Well, it was called Thin Thread. I mean, Thin Thread was our—a test program that we set up to do that. By the way, I viewed it as we never had enough data, OK? We never got enough. It was never enough for us to work at, because I looked at velocity, variety and volume as all positive things. Volume meant you got more about your target. Velocity meant you got it faster. Variety meant you got more aspects. These were all positive things. All we had to do was to devise a way to use and utilize all of those inputs and be able to make sense of them, which is what we did.
Binney goes on to talk about the code named Stellar Wind program that Bush authorized and then was forced to change after a revolt of some sort in the Justice Department in 2004. Stories tell of senior Bush advisors trying to get Ashcroft to sign authorization papers for the program while he was in the hospital. As for Stellar Wind, it seems to be mostly about metadata – the date, to, and from of emails that you could use to build a diachronic social graph which is what Binney was talking about. Strictly speaking this would be social network analysis rather than text analysis, but they might have supplemented the system with some keyword capabilities. Another story from Time points out the problem with such analysis – that it generates too many vague false positives. “Leads from the Stellar Wind program were so vague and voluminous that field agents called them “Pizza Hut cases” — ostensibly suspicious calls that turned out to be takeout food orders.”
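To make the idea of a diachronic social graph concrete, here is a minimal Python sketch. It assumes nothing about how any real system works: the message tuples, the addresses, and the function name build_diachronic_graph are all made up for illustration. It only shows that date, from and to fields are enough to count who wrote to whom in each month.

```python
from collections import Counter, defaultdict
from datetime import date


def build_diachronic_graph(messages):
    """Count (sender, recipient) edges per calendar month.

    `messages` is an iterable of (date, sender, recipient) tuples,
    a hypothetical stand-in for email metadata.
    """
    graph = defaultdict(Counter)
    for sent_on, sender, recipient in messages:
        period = (sent_on.year, sent_on.month)  # one time slice per month
        graph[period][(sender, recipient)] += 1
    return graph


# Invented sample metadata: no message content is needed.
messages = [
    (date(2004, 1, 5), "alice@example.org", "bob@example.org"),
    (date(2004, 1, 9), "alice@example.org", "bob@example.org"),
    (date(2004, 2, 2), "bob@example.org", "carol@example.org"),
]

graph = build_diachronic_graph(messages)
for period in sorted(graph):
    print(period, dict(graph[period]))
```

Comparing the monthly slices shows how relationships appear, strengthen or fade over time, which is the "watch it over time" part of what Binney describes.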
Either way, these hints give us a tantalizing view into how text and network analysis is being experimented with. Are there any useful research applications?
I have been working for a while on archiving the Globalization Compendium which I worked on. Yesterday I got it archived in two Institutional Repositories:
In both cases there is a Zip of a BagIt bag with the XML files, code and other documentation from the site. My first major deposit.
Daniel sent the link to this YouTube video, A walk through The Waste Land, that shows an iPad edition of The Waste Land developed by Touch Press. The version has the text, audio readings by various people, a video of a performance, the manuscripts, notes and photos. I was struck by how this extends to the iPad the experiments of the late 1980s and 1990s that exploded with the availability of HyperCard, Macromedia Director and CD-ROM. The most active publisher was Voyager that remediated books and documentaries to create interactive works like Poetry in Motion (Vimeo demo of CD) or the expanded book series, but all sorts of educational materials were also being created that never got published. As a parent I was especially aware of the availability of titles as I was buying them for my kids (who, frankly, ignored them.) Dr. Seuss ABC was one of the more effective remediations. Kids (and parents) could click on anything on the screen and entertaining animations would reinforce the alphabet.
What happened to all that activity? What happened to all those titles? To some extent they never went away, it is just that attention turned to the web as a means of delivery. The web changed the economics which then changed the design. CD-ROMs could be sold and people (like me) were willing to pay for professional titles. But it was hard to sell access to web materials when there was so much free stuff out there and an expectation of free access. Thus companies changed what they sold when adapting to the web. Web sites were built that were free and promoted the print books like Seussville. These offered supplementary activities and in some cases monetized eyeballs with advertising, but they did not give away free interactive book experiences. Now the iPad (or, to be accurate, the App Store) has brought back a viable economic model where people can buy interactive books.
With Apple’s latest announcement of iBooks textbooks and iBooks Author, the attention is back on interactive books. Apple is clearly trying to change the economics of textbooks and how they are consumed. They want schools to move to iPads and kids to get interactive textbooks from publishers and authors who use iBooks Author to remediate books. Whether Apple sews up the market or we get a more open model, there is a lot to be said for (and against) moving away from print for textbooks.
To get a sense of what the new interactive books might look like there is an interesting demo in a TED talk by Mike Matas: A next-generation digital book. He demos Al Gore’s Our Choice published by Push Pop Press. From the demo this looks like a book with a bunch of video and info-graphics tacked on. I don’t see a compelling reason for getting the interactive version of the book. In the case of the iPad “The Waste Land” they have used multimedia to thoroughly enhance the poem with readings and scholarship that could actually change your perception of the poem. In this case it seems like a multimedia supplement that just reinforces the content. The TED talk ends with a hokey interactive where you blow into the iPad or iPhone to animate a graphic.
To be honest, I haven’t played with either one, just watched the demos. “Our Choice” could, as Al Gore says in his Guided Tour, use interactive infographics in ways that really let you understand the data differently. I also like the pinching and folding interaction they have pioneered for picking things up. The larger question is where are interactive books going? Will Apple convince schools and publishers to move to interactive textbooks? Will kids end up carrying around both heavy print texts and iPads or will the shift be complete at the expense of many texts? Personally I still buy print books of things I expect to want to consult over time even when there is an electronic version. Print books I only have to buy once (and then move in boxes forever). Electronic versions I have to buy again and again as media like CD-ROMs go out of fashion, operating systems change, and viewing devices morph. Books are designed to last a lifetime; electronic media are obsolete before you finish walking through the wasteland.
Susan pointed me to Leximancer which is a commercial text analysis tool that creates mind maps of your information. I’m struck by how compelling people find mind maps.
Leximancer enables you to navigate the complexity of text in a uniquely automated fashion. Our software identifies ‘Concepts’ within the text – not merely keywords but focused clusters of related, defining terms as conceptualised by the Author. Not according to a predefined dictionary or thesaurus.
The Concepts are presented in a compelling, interactive display so that you can clearly visualise and interrogate their inter-connectedness and co-occurrence – which is as important as the Concepts themselves – right down to the original text that spawned them.
The Guardian has a great series on the Battle for the internet. This includes a number of interventions by Tim Berners-Lee including Tim Berners-Lee urges government to stop the snooping bill and Tim Berners-Lee: demand your data from Google and Facebook. There is an article, Web freedom faces greatest threat ever, warns Google’s Sergey Brin, about the dangers of walled gardens like Facebook and Apple’s App Store. One might say the same about Google.
I just got a complimentary copy of La macchina nel tempo: Studi di informatica umanistica in onore di Tito Orlandi (The Time Machine: Studies in humanities computing in honour of Tito Orlandi) which I blogged about before. This got me wondering how much of Prof. Tito Orlandi’s writings are available online and what his legacy is. It turns out that Orlandi has put together a list of his publications with links to online versions where possible. There are even some in English like the excellent Is Humanities Computing a Discipline?
But how might one summarize Orlandi’s contribution? In his prefatory “Controcanto,” one of the editors of The Time Machine, Domenico Fiormonte, writes about first encountering Orlandi in a bunker where Fiormonte then spent a summer. During that summer he learned 3 things:
These three lessons seem about as good a starting place for the digital humanities as any. They also suggest some of what Tito Orlandi was interested in, namely formalization, redefinition, and interpretation. But surveying Orlandi’s writings, using the list of digital humanities publications from his personal site, you can see other themes. He believed that we needed to develop the theoretical foundations of humanities computing and that we should do that from the mathematical model of the computer, not how it works practically. (See Informatica, Formalizzazione e Discipline Umanistiche (in Italian.)) He believed that would help us understand how one can model culture on a computer. He discussed the importance of modelling before Willard McCarty did in Humanities Computing – something that should be recognized out of fairness to the pioneering work of Italian digital humanists since Busa.
Reading Orlandi and about Orlandi I also sense an impatience with those that follow him. This is what he writes in an unpublished talk given in London in 2000. He is talking about discussions by other scholars on the digital humanities.
I feel a sense of inadequateness, even disorder, in the overall change as presented by the same scholars. In fact, when they proceed to propose a definition of humanities computing, they tend to consider the products of computation, be they hardware (the Net) or software (applications like concordance programs or statistical packages), rather than the first principles of computing.
Orlandi wanted to ground the digital humanities in mathematics – a language common to informatics, science and potentially the digital humanities. That the digital humanities wandered off into hypertext, new media and so on seems to have annoyed him. He was also irritated that ideas he had been teaching and writing about for years were being ignored in the English-speaking world. Take a look at The Scholarly Environment of Humanities Computing: A Reaction to Willard McCarty’s talk on The computational transformation of the humanities. This web page discusses an outburst of his at a paper by McCarty with what Orlandi felt were ideas he had been discussing for a decade at least. It is instructive how he sets aside his pride to get at the issues that matter. He might be irritated, but he also wants to use this to reflect on more important issues.
Perilli and Fiormonte have done a great job bringing together a festschrift in honour of Orlandi. The Time Machine isn’t really about Orlandi’s thought so much as about his legacy in Italy. What we need now is for his foundational works to be translated and a retrospective interpretation of his contributions.
An article in the New York Times led me to the Google Art Project. This project doesn’t feel like a Google project, perhaps because it uses an off-black background and the interface is complex. The project brings together art work and virtual tours of many of the world’s important museums (but not all). You can browse by collection, artist (by first name), artworks, and user galleries. You can change the language of the interface (and it seems to change even when you don’t want it to in certain circumstances). When viewing a gallery you can get a wall of paintings or a street view virtual tour of the gallery. Above you see the “Museum View” of a room in the Uffizi with a barrier around a Filippino Lippi that is being treated for a woodworm infestation! In the Museum View you can pan around and move up to paintings much as you would in Google Maps in Street View. On the left is a floor plan that you can also use.
This site reminds me of what was one of the best multimedia CD-ROMs ever, the Musee d’Orsay: Virtual Visit. This used QuickTime VR to provide a virtual tour. It had the sliding walls of art. It also had special guides and some nice comparison tools that let you get a sense of the size of a work of art. The Google Art Project feels loosely copied from this right down to the colour scheme. It will be interesting to see if the Google Art Project subsumes individual museum sites or consortia like the Art Museum Image Consortium (Amico.)
I find it interesting how Google is developing specialized interfaces for more and more domains. The other day I was Googling for movies in Edmonton and found myself on a movies – Google Search page that arranges information conveniently. The standard search interface is adapting.
The MLA has issued a New Variorum Shakespeare Digital Challenge. They are looking for original and innovative ways of “representing, and exploring this data.” You can download the XML files and schema from Github here to experiment with. Submissions are due by Friday, 31st of August, 2012. The winner of the challenge will get $500.
A new digital humanities collection focusing on collaboration, Collaborative Research in the Digital Humanities, has been published by Ashgate. The collection is edited by Marilyn Deegan and Willard McCarty and was developed in honour of Harold Short who retired a few years ago from King’s College London where he set up the Humanities Computing Centre (now called the Department of Digital Humanities).
I contributed a chapter on crowdsourcing entitled, “Crowdsourcing the humanities: social research and collaboration”.
How Star Trek artists imagined the iPad… 23 years ago is an article in Ars Technica about the design of the iconic Star Trek interfaces from those of PADDs (Personal Access Display Devices) to the touch screens used on the bridge. It turns out that one of the reasons for the flat touch screen interfaces was that they were cheap (compared to panels with lots of switches as contemporary spacecraft had.)
What could be simpler to make than a flat surface with no knobs, buttons, switches, or other details? Okuda designed a user interface dominated by large type and sweeping, curved rectangles. The style was first employed in Star Trek IV: The Voyage Home for the Enterprise-A, and came to be referred to as “okudagrams.” The graphics could be created on transparent colored sheets very cheaply, though as ST:TNG progressed, control panels increasingly used video panels or added post-production animations.
From the photographs it looks like they didn’t just do the usual thing of showing screen shots and concept art as art, but they have sequences of screens titled “Advances in Mechanics” that show, for example, how jumping has changed in games over time. The exhibit also seems to have a historical bent:
The Art of Video Games is one of the first exhibitions to explore the forty-year evolution of video games as an artistic medium, with a focus on striking visual effects and the creative use of new technologies. It features some of the most influential artists and designers during five eras of game technology, from early pioneers to contemporary designers. The exhibition focuses on the interplay of graphics, technology and storytelling through some of the best games for twenty gaming systems ranging from the Atari VCS to the PlayStation 3. (from the exhibit site)
I was very pleased to join Dan Cohen, Tom Scheinfeldt, Mills Kelly, and (for the first time!) Amanda French for Episode 69 of Digital Campus: “Strange Bedfellows.”
I am very flattered by the number of congratulations I’ve received on and offline for the publication of my book, Reading Machines: Toward an Algorithmic Criticism. (University of Illinois Press). Many have even pre-ordered it, and I fully intend to reimburse anyone who does that by buying them beer. But I must point out that the book is not actually out. When I try to click through and order a copy, it tells me that it won’t be available until the first week of December.
I don’t know how accurate that is (it’s the first time I’ve read an actual release date). I can say that the book is very close to being ready for the printer.
But now that it has been revealed, I want to take this opportunity to point out how amazing the cover is. I didn’t have any brilliant suggestions to make about cover art (I’m kind of a text guy), and so I just left it to the design department at Illinois. They hired a brilliant designer named Alex de Armond (http://www.alexdearmond.com/) who created the image by layering blank pages from Google Books, thus creating a texture somewhat like the layers of an onion skin. The thumb, of course, is one of the infamous artifacts from the Google Books scanning process. I am so very, very glad that I left this decision to others. I think it’s fantastic.
And really, why not judge a book by its cover?
This past term I gave a graduate course, in collaboration with Jeff Trzeciak, on “Technologies of Communication” and tried something that feels a bit subversive: I didn’t assign any essays. Of course, I’m almost certainly not the first humanities professor to not assign essays in a graduate course (I haven’t bothered looking for other examples though one could check the syllabus finder), but it still goes against the grain of all my own experiences as a student and challenges what I take to be one of the most common practices of assessment in the humanities.
My reasoning was that students would probably only gain a slight incremental benefit from writing Yet Another Essay (even if we all benefit from every instance of writing and receiving feedback, no matter where we are in our careers). However, if I could formulate some assessment modules that encouraged students to express themselves with unconventional technologies – and, crucially, think about the process of formulating their ideas and arguments with the constraints and affordances of those technologies – then that experience would be much more valuable to them. Besides, I tend to like experimenting with pedagogy and don’t need much of an excuse to do so.
There are the four seasons – Winter, Spring, Summer, Fall – but there are also other annual phases that my academic physiology anticipates and experiences. I’m not talking about Oobleck, but rather things like the planning of the intense conference season in digital humanities (May-June), start of the academic year (September), preparation of my annual report (January), and results from the Standard Research Grants programme at the beginning of April. Happily, this season brought two bits of good news about successful proposals:
A few weeks ago, I realized that I no longer use graphical applications.
That’s right. I don’t do anything with GUI apps anymore, except surf the Web. And what’s interesting about that, is that I rarely use cloudy, AJAXy replacements for desktop applications. Just about everything I do, I do exclusively on the command line. And I do what everyone else does: manage email, write things, listen to music, manage my todo list, keep track of my schedule, and chat with people. I also do a few things that most people don’t do: including write software, analyze data, and keep track of students and their grades. But whatever the case, I do all of it on the lowly command line. I literally go for months without opening a single graphical desktop application. In fact, I don’t — strictly speaking — have a desktop on my computer.
I think this is a wonderful way to work. I won’t say that everything can be done on the command line, but most things can, and in general, I find the CLI to be faster, easier to understand, easier to integrate, more scalable, more portable, more sustainable, more consistent, and many, many times more flexible than even the most well-thought-out graphical apps.
I realize that’s a bold series of claims. I also realize that such matters are always open to the charge that it’s “just me” and the way I work, think, and view the world. That might be true, but I’ve seldom heard a usability expert end a discourse on human factors by acknowledging that graphical systems are only really the “best” solution for a certain group of people or a particular set of tasks. Most take the graphical desktop as ground truth — it’s just the way we do things.
I also don’t do this out of some perverse hipster desire for retro-computing. I have work to do. If my system didn’t work, I’d abandon it tomorrow. In a way, the CLI reminds me of a bike courier’s bicycle. Some might think there’s something “hardcore” and cool about a bike that has one gear, no logos, and looks like it flew with the Luftwaffe, but the bike is not that way for style. It’s that way because the bells and whistles (i.e. “features”) that make a bike attractive in the store get in the way when you have to use it for work. I find it interesting that after bike couriers started paring down their rides years ago, we soon after witnessed a revival of the fixed-gear, fat-tire, coaster-brake bike for adults. It’s tempting to say that that was a good thing because “people didn’t need” bikes inspired by lightweight racing bikes for what they wanted to do. But I think you could go further and say that lightweight racing bikes were getting in the way. Ironically, they were slowing people down.
I’ve spent plenty of time with graphical systems. I’m just barely old enough to remember computers without graphical desktops, and like most people, I spent years taking it for granted that for a computer to be usable, it had to have windows, and icons, and wallpapers, and toolbars, and dancing paper clips, and whatever else. Over the course of the last ten years, all of that has fallen away. Now, when I try to go back, I feel as if I’m swimming through a syrupy sea of eye candy in which all the fish speak in incommensurable metaphors.
I should say right away that I am talking about Linux/Unix. I don’t know that I could have made the change successfully on a different platform. It’s undoubtedly the case that what makes the CLI work is very much about the way Unix works. So perhaps this is a plea not for the CLI so much as for the CLI as it has been imagined by Unix and its descendants. So be it.
I’d like this to be the first of a series of short essays about my system. Essentially, I’d like to run through the things I (and most people) do, and show what it’s like to run your life on the command line.
First up . . .
I think most email programs really suck. And that’s a problem, because most people spend insane amounts of time in their email programs. Why, for starters, do they:
1. Take so long to load?
Unless you keep the app open all the time (I’m assuming you do that because you have the focus of a guided missile), this is a program that you open and close several times a day. So why, oh why, does it take so much time to load?
What? It’s only a few seconds? Brothers and sisters, this is a computer. It should open instantaneously. You should be able to flit in and out of it with no delay at all. Boom, it’s here. Boom, it’s gone. Not, “Switch to the workplace that has the Web browser running, open a new tab, go to gmail, and watch a company with more programming power than any other organization on planet earth give you a . . . progress bar.” And we won’t even discuss Apple Mail, Outlook, or (people . . .) Lotus Notes.
2. Integrate so poorly with the rest of the system?
We want to organize our email messages, and most apps do a passable job of that with folders and whatnot. But they suck when it comes to organizing the content of email messages within the larger organizational scheme of your system.
Some email messages contain things that other people want you to do. Some email messages have pictures that someone sent you from their vacation. Some email messages contain relevant information for performing some task. Some email messages have documents that need to be placed in particular project folders. Some messages are read-it-later.
Nearly every email app tries to help you with this, but they do so in an extremely inconsistent and inflexible manner. Gmail gives you “Tasks,” but it’s a threadbare parody of the kind of todo lists most people actually need. Apple Mail tries to integrate things with their Calendar app, but now you’re tied to that calendar. So people sign up for Evernote, or Remember the Milk, or they buy OmniFocus (maybe all three). Or they go add a bump to the forum for company X in the hope that they’ll write whatever glue is necessary to connect x email program with y task list manager.
I think that you should be able to use any app with any other app in the context of any organizational system. Go to any LifeHacker-style board and you’ll see the same conversation over and over: “I tried OmniOrgMe, but it just seemed too complicated. I love EternalTask, but it isn’t integrated with FragMail . . .” The idea that the “cloud” solves this is probably one of the bigger cons in modern computing.
Problem 1 is immediately solved when you switch to a console-based email program. Pick any one of them. Type pine or mutt (for example), and your mail is before your eyes in the time it takes a graphical user to move their mouse to the envelope icon. Type q, and it’s gone.
Such programs tend to integrate well with the general command-line ecosystem, but I will admit that I didn’t have problem 2 completely cracked until I switched to an email program that is now over twenty years old: nmh.
I’ve written elsewhere about nmh, so allow me to excerpt (a slightly modified) version of that:
The “n” in nmh stands for “new,” but there’s really nothing new about the program at all. In fact, it was originally developed at the RAND Corporation decades ago.
We’re talking old school. Type “inc” and it sends a numbered list of email subject lines to the screen, and returns you to the prompt. Type “show” and it will display the first message (in any editor you like). You could then refile the message (with “refile”) to another mailbox, or archive it, or forward it, and so on. There are thirty-nine separate commands in the nmh toolset, with names like “scan,” “show,” “mark,” “sort,” and “repl.” On a day-to-day basis, you use maybe three or four.
I’ve been using it for over a year. It is — hands down — the best email program I have ever used.
Why? Because the dead simple things you need to do with mail are dead simple. Because there is no mail client in the world that is as fast. Because it never takes over your life (every time you do something, you’re immediately back at the command prompt ready to do something else). Because everything — from the mailboxes to the mail itself — is just an ordinary plain text file ready to be munged. But most of all, because you can combine the nmh commands with ordinary UNIX commands to create things that would be difficult if not impossible to do with the GUI clients.
I now have a dozen little scripts that do nifty things with mail. I have scripts that archive old mail based on highly idiosyncratic aspects of my email usage. I have scripts that perform dynamic search queries based on analysis of past subject lines. I have scripts that mail todo list items and logs based on cron strings. I have scripts that save attachments to various places based on what’s in my build files. None of these things are “features” of nmh. They’re just little scripts that I hacked together with grep, sed, awk, and the shell. And every time I write one, I feel like a genius. The whole system just delights me. I want everything in my life to work like this program.
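For readers who want to see how little is involved, here is a rough sketch of the "mail is just plain text files" idea. It is not one of the scripts described above, and it swaps in Python's standard mailbox module for grep, sed, awk, and the shell. It builds a throwaway MH-format folder (the layout nmh uses, with one numbered file per message) and tallies senders. The addresses, subjects, and the helper make_message are invented for the example.

```python
import mailbox
import tempfile
from collections import Counter
from email.message import EmailMessage
from pathlib import Path


def make_message(sender, subject):
    """Build a tiny message; all values here are made-up sample data."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = "me@example.org"
    msg["Subject"] = subject
    msg.set_content("placeholder body")
    return msg


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "inbox"
    # mailbox.MH reads and writes MH-style folders: one numbered file per message.
    inbox = mailbox.MH(str(folder), create=True)
    inbox.add(make_message("alice@example.org", "Grades"))
    inbox.add(make_message("bob@example.org", "Meeting"))
    inbox.add(make_message("alice@example.org", "Re: Grades"))

    # Tally who writes to us most often.
    senders = Counter(msg["From"] for msg in inbox)

    # The messages really are ordinary files named 1, 2, 3 ...
    numbered_files = sorted(p.name for p in folder.iterdir() if p.name.isdigit())

print(senders.most_common())
print(numbered_files)
```

Because each message is an ordinary file, the same tally could just as well be done with a few lines of shell; the sketch only shows that nothing about the storage format stands in the way.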
Okay, I know what you’re thinking: “Scripting. Isn’t that, like, programming? I don’t want/know how to do that.” This objection is going to keep re-appearing, so let me say something about it right away.
The programming we’re talking about for this kind of thing is very simple — so simple, that the skills necessary to carry it off could easily be part of the ordinary skillset of anyone who uses a computer on a regular basis. An entire industry has risen up around the notion that no user should ever do anything that looks remotely like giving coded instructions to a machine. I think that’s another big con, and some day, I’ll prove it to you by writing a tutorial that will turn you into a fearsome shell hacker. You’ll be stunned at how easy it is.
For now, I just want to make the point that once you move to the command line, everything is trivially connected to everything else, and so you are mostly freed from being locked in to any particular type of tool. You can use a todo list program that makes Omnifocus look like Notepad. You can use one that makes Gmail Tasks look like the U.N. Charter. Once we’re in text land, the output of any program can in principle become the input to any other, and that changes everything.
In the next installment, I’ll demonstrate.
(This is a lightly edited version of a post from my DayOfDH blog. Next year, there are rumblings that we may be able to aggregate content from our existing blogs instead of self-plagiarizing.)
We now have three years of DayOfDH blogging archives – that’s a pretty rich record of how digital humanists describe their activities in a given day. It also constitutes an interesting corpus for practising what Geoffrey Rockwell and I have been calling rapid analysis – trying to see what one might usefully glean from a relatively quick look at digital texts using specialized tools. Our interest in this is to develop techniques that might be useful to a wide range of people in digital society – for instance, students doing preliminary research or journalists compiling materials for an article.
Building the corpus was relatively easy, made even easier by a few tweaks kindly done by the DayOfDH team. I downloaded a full RSS archive of each year like this (I did this on the command-line, but you can just open each quoted URL in the browser and save it if it seems more convenient):

$ curl "http://ra.tapor.ualberta.ca/~dayofdh/?wpmu-feed=posts" > 2009.xml
$ curl "http://ra.tapor.ualberta.ca/~dayofdh2010/?wpmu-feed=posts" > 2010.xml
$ curl "http://ra.tapor.ualberta.ca/~dayofdh2011/?wpmu-feed=fullfeed" > 2011.xml
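As a very rough illustration of a first pass at rapid analysis on feeds like these, the Python sketch below counts word frequencies across the items of an RSS document. It is not the specialized tooling mentioned above. It assumes each post's text sits in a plain description element, which may not match the real DayOfDH feeds (WordPress feeds often use a namespaced content element instead), and the sample feed and the function name word_counts are invented for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# A tiny made-up feed standing in for a downloaded archive such as 2009.xml.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <item><title>Morning</title><description>Encoding texts and answering email</description></item>
    <item><title>Afternoon</title><description>Meeting about texts, then more email</description></item>
  </channel>
</rss>"""


def word_counts(rss_text):
    """Return a Counter of lowercased words found in every item description."""
    root = ET.fromstring(rss_text)
    counts = Counter()
    for item in root.iter("item"):
        text = item.findtext("description", default="")
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


counts = word_counts(SAMPLE_FEED)
print(counts.most_common(3))
```

To try it on a saved archive, read the file's contents and pass them to word_counts in place of SAMPLE_FEED, adjusting the element name if the feed stores its text elsewhere.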
Today NEH hosts the 2010 Start-Up Grant project directors meeting, featuring lightning talks on 46 projects funded in the last round. It’s no secret that I’m a fan of lightning talks, which are short elevator pitches — they’ve been a part of THATCamp and we’ll be hosting a panel at the American Studies Association conference this fall that will include them (more details, soon). As Melissa Terras said in her DH2010 Plenary, we must “be prepared by having at the tip of our tongues what we do and why we matter and why we should be supported and why DH makes sense.” This is good practice.
Since each project will be limited to only two minutes and three slides, I plan to gloss over what was one of the more considerable parts of my grant proposal: a justification for APIs and funding of a workshop as a level-one startup. Instead, my pitch will address the basics: who, what, when, where, and why. I’ve also decided to share relevant text from my accepted proposal below for those that may be interested in learning more. Feel free to leave a comment or contact me directly.
Level 1 Start Up funding is requested to support a two-day workshop on Application Programming Interfaces (APIs), hosted by the Maryland Institute for Technology in the Humanities (MITH) at the University of Maryland. The workshop will gather 40-50 Digital Humanities scholars and developers who are using or interested in using APIs in their digital projects, industry leaders who will demonstrate their APIs, and practitioners who will help guide the group through the “working weekend.” The workshop will lay the groundwork for the integration of APIs into participant projects, and serve as a platform to develop future ideas for how to share and access humanities data through APIs. Presentations by workshop presenters will be video recorded, and recordings made available online through the workshop and MITH websites. Similarly, workshop activities will be archived on the workshop website, which will act as a clearinghouse and publication of all workshop-related content.
Enhancing the Humanities
Thanks largely to generous funding at both the national and global levels, there are now many large repositories of cultural and scholarly data freely available on the Web. The best of these repositories usually provide tools for searching, viewing, and manipulating their contents. Following the current conventions and values of web design, these tools are often designed according to an uncluttered, simple aesthetic: they make the most common use cases as intuitive and simple as possible, and return nicely formatted results presented as a web page (rather than, say, an XML file or Excel spreadsheet) to be examined within the web space of the archive. There is much to be said for this approach. By itself, however, it can prevent (or make unmanageably difficult) uses of the data that, while welcome and useful, may not have been imagined by the original designers. A scholar may, for instance, want to ask a question that an elegant but relatively simple search interface does not allow. Another may want to combine the data from two archives to create a visualization that illuminates previously unknown or unacknowledged connections. The growing number of scholars who have both content expertise and software programming ability increasingly want to access data programmatically (that is, within their own code) rather than through an intuitive though limited web interface. At the same time, there are often reasons (practical, political, and legal) why it is not always preferable or possible to simply allow users to download the entire dataset for use on their own machines. The most common and arguably best solution to this problem is an Application Programming Interface (API).
An API can be informally defined as a set of published commands that computer programmers can use in their own code to interact with code that they did not write and to which they often have only limited access. For example, an API is often provided to allow third-party programmers to retrieve data from a repository that they do not control. The emergence of APIs has facilitated the growth of “mashups”: combinations of data from different sources. Examples of mashups include plotting photographs from the photo sharing service Flickr on a Google Map, or dynamically displaying book covers from Amazon.com associated with articles returned from a ProQuest query. Dan Cohen, the Director of the Center for History & New Media, has written, “APIs hold great promise as a method for combining and manipulating various digital resources and tools in a free-form and potent way.” Indeed, the potential for APIs to be used in the humanities is significant.
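For readers who have not worked with an API, here is a minimal sketch in Python of the two ideas above: asking a repository for data from within your own code, and combining data from two sources into a simple mashup. Everything specific in it is an assumption made for illustration. The endpoint `archive.example.org`, the parameter names, and the record fields are hypothetical and do not describe any real archive's API, and the two "responses" are canned JSON strings rather than live requests.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration; no real archive is implied.
BASE_URL = "https://archive.example.org/api/items"


def build_query_url(base_url, **params):
    """Return the URL a client would request for the given search parameters."""
    # Sorting keeps the parameter order stable from run to run.
    return f"{base_url}?{urlencode(sorted(params.items()))}"


def mashup(letters_json, places_json):
    """Join letters from one source to coordinates from another by place name."""
    coords = {p["name"]: (p["lat"], p["lon"]) for p in json.loads(places_json)}
    combined = []
    for letter in json.loads(letters_json):
        # Letters whose place is missing from the second source are skipped.
        if letter["place"] in coords:
            lat, lon = coords[letter["place"]]
            combined.append({"title": letter["title"], "lat": lat, "lon": lon})
    return combined


# Canned JSON standing in for the responses of two different (hypothetical) APIs.
LETTERS = (
    '[{"title": "Letter to a publisher", "place": "London"},'
    ' {"title": "Undated note", "place": "Unknown"}]'
)
PLACES = '[{"name": "London", "lat": 51.5074, "lon": -0.1278}]'

if __name__ == "__main__":
    print(build_query_url(BASE_URL, q="letters", format="json"))
    print(mashup(LETTERS, PLACES))
```

The combined records could then be handed to a mapping library to plot each letter, which is the same pattern as the Flickr and Google Maps example: neither source anticipated the other, but published, machine-readable access makes the join possible.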
The most popular APIs have been produced for commercial products including Flickr, Google Maps, and Freebase. Few Digital Humanities scholars, however, have attempted to create APIs for their scholarly repositories. We are therefore organizing a meeting to bring together both Digital Humanities scholars and industry developers in order to study existing APIs and develop a set of recommendations and best practices for API development in the digital humanities. Sessions at the workshop will be videotaped and placed in an online archive, both to preserve the work of the sessions and to make it available to those who were unable to attend. Events such as the proposed API workshop not only help develop necessary skills within the Digital Humanities community, but also serve as a platform for planning, sharing, and innovating.
September – October 2010: The project team will develop and launch the workshop website. This website will house not only information regarding the event but will also serve as a clearinghouse for all materials relating to the workshop.
November 2010: An official call for participants will be published on the MITH website as well as the workshop website. The two-day workshop will be promoted on Digital Humanities email lists and social media sites (such as Twitter). Potential participants will have one month to submit a brief application that explains who they are and what they’d like to do at the event. From that list, MITH staff will choose a group of participants (should the number of interested parties exceed the 50 participants budgeted). Participants will be selected based on their programming ability and their connection to an existing or emergent digital humanities project which might benefit from an API.
December 2010: Participants will be notified through email about the status of their applications.
Late January 2011 or February 2011: The two-day workshop will be held at the University of Maryland in College Park. Each of the two days will have a similar structure: mornings will feature talks by representatives from data repositories with existing, exemplary APIs, followed by afternoon breakout sessions in which small groups of digital humanities practitioners will seek to implement the ideas shared in the morning sessions in their own code. Time at the end of each afternoon will accommodate brief presentations of ideas.
March – May 2011: Lester and a web developer will produce a web exhibition of video from the workshop. Lester will also produce a white paper and a set of guidelines for designing APIs for the humanities, which will be vetted by workshop participants.
Final Product and Dissemination
The workshop will be promoted through a website developed specifically for the event, through various social media including Twitter and Facebook, the Humanist listserv, and MITH’s community mailing list. Video recordings of the speakers will be made available on the website along with notes from participants. MITH will also publish on the site a set of guidelines for developing web-based APIs for digital humanities projects, promoted through the same channels as the workshop. The resulting website will act as a resource for those interested in the event, as well as for humanities researchers interested in leveraging APIs in their digital work.