From Slashdot I found this blog entry Ocracoke Island Journal: Nookd about how a Nook version of War and Peace had the word “kindle” replaced by “Nook”, as in “It was as if a light had been Nookd (kindled) in a carved and painted lantern…” It seems that the company that ported the Kindle version over to the Nook ran a search and replace on the word Kindle and replaced it with Nook.
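Here is a guess at what went wrong, expressed in code. If the converter did a case-insensitive substring replacement (an assumption, since we don't know what tool was used), then “kindled” comes out as “Nookd”, which is exactly the form in the blog title. Anchoring the pattern on word boundaries and keeping it case-sensitive avoids the problem. A minimal Python sketch using the standard `re` module:

```python
import re


def naive_replace(text: str) -> str:
    # Assumed behaviour of the converter: case-insensitive substring replacement,
    # so the "kindle" inside "kindled" is rewritten too.
    return re.sub("kindle", "Nook", text, flags=re.IGNORECASE)


def brand_only_replace(text: str) -> str:
    # \b anchors the match on word boundaries and matching is case-sensitive,
    # so only the whole capitalized word "Kindle" is rewritten.
    return re.sub(r"\bKindle\b", "Nook", text)


sentence = "It was as if a light had been kindled in a carved and painted lantern."
print(naive_replace(sentence))
print(brand_only_replace(sentence))
```

The second function leaves Tolstoy alone while still turning “Read it on your Kindle.” into “Read it on your Nook.”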
I think this sort of slip should be turned into a game. We should create an e-reader that plays with the text in various ways. We could adapt some of Steve Ramsay’s algorithmic ideas (reversing lines of poetry). Readers could score points by clicking on the words they think were replaced and guessing the correct one.
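A minimal sketch of how the scoring might work, using a hypothetical `ReplacementGame` class that does not come from any existing e-reader: a few randomly chosen words are swapped for substitutes, and a reader earns one point for spotting a swapped word and a second point for naming the original.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplacementGame:
    """Hypothetical e-reader game: a few words are swapped and readers hunt for them."""
    original: list[str]
    shown: list[str]
    swapped: dict[int, str] = field(default_factory=dict)  # position -> original word
    score: int = 0

    @classmethod
    def create(cls, text: str, substitutes: list[str], n: int, seed: int = 0) -> "ReplacementGame":
        words = text.split()
        rng = random.Random(seed)
        game = cls(original=words, shown=list(words))
        # Pick n distinct positions and replace each word with a different substitute.
        for pos in rng.sample(range(len(words)), n):
            choices = [s for s in substitutes if s != words[pos]]
            game.swapped[pos] = words[pos]
            game.shown[pos] = rng.choice(choices)
        return game

    def guess(self, position: int, word: str) -> int:
        """One point for spotting a swapped word, two if the original is also named."""
        if position not in self.swapped:
            return 0
        original_word = self.swapped.pop(position)  # each swap can be found only once
        points = 2 if word == original_word else 1
        self.score += points
        return points
```

The substitute list must contain at least one word different from whatever sits at each chosen position. A real game would also want substitutes that fit the sentence grammatically, which this sketch does not attempt.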
I’m sitting at Congress 2012 in the beer tent at Wilfrid Laurier. I’ve been writing a conference report of SDH/SEMI 2012. But in the beer tent they are talking about the ARG that Neil Randall (may have) started called Bonfire of the Humanities. Apparently the dean may have shut it down, but traces are left; see #bonfireofthehumanities. See also the YouTube video, Torch Institute Declares War Against University of Waterloo.
Lest anyone misunderstand, the Torch Institute is probably an Alternate Reality Game (ARG) satirizing the academy. With ARGs you never know what is real and what is not. The dean shutting things down, like the removal of the YouTube video above, may or may not be part of the game script. (You can see other Torch Institute videos here.) The guiding idea behind ARGs is TINAG (This Is Not A Game). ARGs are supposed to be games only insofar as you play with what may or may not be the game. Who knows about the Torch Institute.
CNN has a story on ‘The Demise of Guys’: How video games and porn are ruining a generation. The book is The Demise of Guys: Why Boys Are Struggling and What We Can Do About It and it is by retired Stanford psychologist Philip Zimbardo and Nikita Duncan. The book builds on a TED Talk that argues that:
“Boys’ brains are being digitally rewired for change, novelty, excitement and constant arousal. That means they’re totally out of sync in traditional classes, which are analog, static, interactively passive.” (Zimbardo)
Compare this to Hanna Rosin: New data on the rise of women, a talk in which Rosin argues that “the global economy is becoming a place where women are more successful than men.” She argues that the middle-class jobs men held have been hollowed out and replaced by service jobs that women do better. Could the shift from a manufacturing to a service economy be responsible for the “demise of guys”?
Bloomberg has a story that Warren Buffett Says Free News Unsustainable, May Add More Papers. The days of expecting news online to be free to access may be coming to an end. We may find more and more news behind paywalls of the sort the New York Times brought in where you only get so many free articles a month.
Buffett believes that local papers with a “community focus” can make a profit as they are often the only source for community news. There will always be free alternatives for national or international news, but community newspapers often don’t have free alternatives.
This bodes well for journalism, which has suffered recently. That decline has, I believe, created a democracy gap as the fourth estate loses its ability to monitor the others. Bloggers don’t reliably replace investigative journalism that profits from reporting on government and industry.
From Slashdot another story hinting at how government agencies are organizing to intercept and interpret Internet data. See FBI quietly forms secretive Net-surveillance unit.
My guess is that mining large amounts of data produces so many false positives that organizations like the NSA and FBI have to set up large units to follow up on the results. There is an interesting policy paper by Jeff Jonas and Jim Harper, Effective Counterterrorism and the Limited Role of Predictive Data Mining, that argues that predictive mining isn’t worth it. The cost of false positives is acceptable for industry when it uses predictive data mining (predicting who might buy your product). The cost of false positives in counterterrorism is prohibitive, because following them up takes trained agents away from better uses of their time. I doubt anyone in this climate is willing to give up on mining, which is why The NSA is Building the Country’s Biggest Spy Center.
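The base-rate problem behind this argument can be made concrete with a little arithmetic. The numbers below are hypothetical, chosen only for illustration, and are not taken from Jonas and Harper. Suppose a system catches 99% of real targets and wrongly flags only 0.1% of everyone else, and it is applied to 300 million people of whom 3,000 are targets. It still produces roughly 300,000 false alarms, so about 99 of every 100 flags are wrong.

```python
def screening_outcomes(population: int, true_targets: int,
                       sensitivity: float, false_positive_rate: float) -> tuple[float, float, float]:
    """Expected true hits, false alarms, and precision of a flagging system."""
    innocents = population - true_targets
    true_hits = true_targets * sensitivity          # real targets correctly flagged
    false_alarms = innocents * false_positive_rate  # innocent people wrongly flagged
    precision = true_hits / (true_hits + false_alarms)  # share of flags that are real
    return true_hits, false_alarms, precision


# Hypothetical numbers, for illustration only.
hits, alarms, precision = screening_outcomes(
    population=300_000_000, true_targets=3_000,
    sensitivity=0.99, false_positive_rate=0.001)
print(f"{hits:.0f} true hits, {alarms:,.0f} false alarms, precision {precision:.2%}")
```

Each of those false alarms is a lead that someone has to run down, which is the cost the policy paper is pointing at.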
I wonder if we will ever know if money spent on voice and text mining is useful in counterintelligence? Perhaps the rumour of the possibility of it working is enough?
Lindsay Thomas, the hard-working blogger for 4Humanities, has written an excellent piece, On Graduate Education in the Humanities, by a Graduate Student in the Humanities. She talks about how hard it is to finish quickly when you are making ends meet by TAing and teaching constantly. She talks about the “casualization” of academic labor.
I would add to her essay that we need to think about expanding outcomes for graduate students. We design graduate programs to produce junior faculty (or casual labor who hang on in hopes of getting full-time faculty jobs). What we don’t do is design programs that prepare people for knowledge work outside the academy. This is not rocket science; there are all sorts of ways to do it, and digital humanities programs could take the lead as our students acquire skills of broader relevance. But, as Lindsay points out, if you start changing or adding to graduate programs you can just extend the time to completion, and students might end up no better off.
The Globe and Mail had a very interesting article on how Twitter hands your data to the highest bidder, but not to you. The article talks about how Twitter is archiving your data and selling it, but not letting you access your old tweets. The article mentions that DataSift is one company that has been licensed to mine the Twitter archives. DataSift presents itself as “the world’s most powerful and scalable platform for managing large volumes of information from a variety of social data sources.” In effect they do real-time text analysis for industry. Here is what they say in What we do:
DataSift offers the most powerful and sophisticated tools for extracting value from Social Data. The amount of content that Internet users are creating and sharing through Social Media is exploding. DataSift offers the best tools for collecting, filtering and analyzing this data.
Social Data is more complicated to process and analyze because it is unstructured. DataSift’s platform has been built specifically to process large volumes of this unstructured data and derive value from it.
One thing that DataSift has is a curation language called CSDL (Curated Stream Definition Language) for querying the cloud of data they gather. They provide an example of what you can do with it:
Here’s an example, just for illustration, of a complex filter that you could build with only four lines of CSDL code: imagine that you want to look at information from Twitter that mentions the iPad. Suppose you want to include content written in English or Spanish but exclude any other languages, select only content written within 100 kilometers of New York City, and exclude Tweets that have been retweeted fewer than five times. You can write that in just four lines of CSDL!
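To see what those four conditions amount to, here is a minimal sketch of the same filter in plain Python. It is not CSDL, whose syntax is not reproduced here. The tweet records and their fields (`text`, `lang`, `lat`, `lon`, `retweet_count`) are hypothetical stand-ins, not DataSift's schema, and the New York coordinates are approximate.

```python
import math
from typing import Iterable, Iterator

NYC = (40.7128, -74.0060)  # approximate latitude/longitude of New York City


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def ipad_filter(tweets: Iterable[dict]) -> Iterator[dict]:
    """Yield tweets that pass all four conditions from the DataSift example."""
    for t in tweets:
        if ("ipad" in t["text"].lower()                           # mentions the iPad
                and t["lang"] in {"en", "es"}                      # English or Spanish only
                and haversine_km((t["lat"], t["lon"]), NYC) <= 100  # within 100 km of NYC
                and t["retweet_count"] >= 5):                      # retweeted at least five times
            yield t
```

Four conditions, four lines of logic: the point of the DataSift example is less the syntax than that the conditions compose as a chain of simple filters over a stream.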
It would be interesting to develop an academic alternative similar to Archive-It, but for real-time social media tracking.
The latest version of our Old Bailey Datawarehousing Interface is up. This was the Digging Into Data project that got TAPoR, Zotero and Old Bailey working together. One of the things we built was an advanced visualization environment for the Old Bailey. This was programmed by John Simpson following ideas from Joerg Sanders. Milena Radzikowska did the interface design work and I wrote emails.
One feature we have added is the broaDHcast widget that allows projects like Criminal Intent to share announcements. This was inspired partly by the challenge of keeping distributed projects like TAPoR, Zotero and Old Bailey informed.
The GRAND group has a work in the InSight: Visualizing Health Humanities show that starts tonight. We used Unity to create an FPS (First Person Shooter) type of game for medical communication. The game, called CatHETR, lets players move through a ward dealing with communicative situations. This project was supported by the GRAND Network of Centres of Excellence.
The Guardian has a nice short story by Keza MacDonald that asks Are gamers really sexist?. It doesn’t really answer the question or propose solutions, but it documents again how people who speak out against sexist language get harassed.
As I mentioned in my post on the GRAND conference, Ken Perlin showed a number of interesting Java apps that illustrated visual ideas. One was an Interactive Map of Pride and Prejudice. This interactive map is a rich prospect of the whole text which you can move around to see particular parts. You can search for words (or strings) and see where they appear in the text. You can also select some text to search for it. The interface is simple and intuitive. You can see how Perlin talks about it in his blog. I also recommend you look at his other experiments.
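This is not Perlin's code (his app is in Java). It is only a minimal Python sketch of the underlying idea of a prospect view. The sketch finds every hit for a search string, records it as a relative position in the whole text, and marks those positions on a one-line strip. The function names `term_positions` and `strip_map` are hypothetical.

```python
import re


def term_positions(text: str, term: str) -> list[float]:
    """Relative positions (0.0 = start, 1.0 = end) of each case-insensitive match."""
    if not text:
        return []
    n = len(text)
    return [m.start() / n for m in re.finditer(re.escape(term), text, flags=re.IGNORECASE)]


def strip_map(positions: list[float], width: int = 40) -> str:
    """Render positions as a one-line map of the whole text, '|' marking a hit."""
    cells = ["."] * width
    for p in positions:
        cells[min(int(p * width), width - 1)] = "|"
    return "".join(cells)
```

Calling `strip_map(term_positions(novel, "Darcy"))` on the full text of the novel would show at a glance where Darcy clusters, which is roughly what the interactive map lets you see and then zoom into.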
Last week I was at the GRAND 2012 conference. GRAND (Graphics, Animation, and New Media) is a Network of Centres of Excellence that brings together people across disciplines and across the country around gaming, new media and so on. You can see my GRAND 2012 conference notes here.
This year we had two of the best keynotes of any conference I have been to. Valerie Steeves talked about her research into parents and youth on the internet. The change in attitudes of both parents and youth to the internet between 2000 and today was dramatic. Ken Perlin was the closing keynote and he showed Java apps that he wrote as experiments. It made me want to learn to program in Java just to have as much fun as he was having.
From Humanist, a link to an article on online education, The X Factor (in the Brainstorm blog of the Chronicle of Higher Education). The post talks about how Harvard University has joined with MIT to create edX, an online education consortium. Harvard is now joining the MOOC (Massive Open Online Course) bandwagon pioneered by some Stanford profs who opened their courses to thousands. The author, Kevin Carey, points out that edX won’t compete with MIT or Harvard, but with other online providers and with less prestigious institutions.
I worry we are going to see a lessening of educational diversity. I worry that the star quality of MIT, Harvard and Stanford will drive out less prestigious players leaving us with a small number of online courses. Fewer instructors for more people will mean more standardization of education and less diversity.
The New York Times has a Room for Debate on this, Got a Computer? Get a Degree with different reactions to the news. Most seem positive, but few feel that certificates for taking MOOCs are comparable to real course credit.
I’ve been meaning to blog on the video circulating of Kurt Vonnegut talking about the Shape of Stories. He describes the curves followed by popular stories like “boy meets girl” and suggests computers could even understand such simple curves. In Lapham’s Quarterly you can read the text of this lecture with illustrations. See Kurt Vonnegut at the Blackboard. In this version he asks about the value of such systems, a question which could apply equally to computer-generated visualization,
The question is, does this system I’ve devised help us in the evaluation of literature? Perhaps a real masterpiece cannot be crucified on a cross of this design. How about Hamlet?
He concludes that the system doesn’t work because the truth is ambiguous. We simply don’t know in complex works (like Hamlet) if news is good or bad. Good literature is open to interpretation.
But there’s a reason we recognize Hamlet as a masterpiece: it’s that Shakespeare told us the truth, and people so rarely tell us the truth in this rise and fall here [indicates blackboard]. The truth is, we know so little about life, we don’t really know what the good news is and what the bad news is.
Many have noticed this amusing play on visualization, including an infographic on Visua.ly, Kurt Vonnegut on the Shapes of Stories:
Prism is the coolest idea I have come across in a long time. Coming from the University of Virginia Scholar’s Lab, Prism is a collaborative interpretation environment. Someone comes up with categories like “Rhetoric”, “Orientalism” and “Social Darwinism” for a text like Notes on the State of Virginia. Then people (with accounts, which you can get freely) go through and mark passages. This creates overlapping interpretative markup of the sort you used to get with COCOA in TACT, but unlike TACT, many people can do the interpretation – it can be crowdsourced.
They are planning some visualizations of the results, including what look like the kind of visualization TACT produced, where you can see how words are distributed over the tagged passages.
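This is not Prism's implementation, which may work quite differently. It is a hypothetical sketch of one way crowdsourced, overlapping markup could be pooled. Each highlight is assumed to be a (user, category, start word, end word) tuple, and for every category we count how many distinct users marked each word. Those counts are the raw material for a visualization of words distributed over tagged areas.

```python
from collections import defaultdict

# Hypothetical record: (user, category, start_word_index, end_word_index_exclusive)
Highlight = tuple[str, str, int, int]


def aggregate(highlights: list[Highlight], n_words: int) -> dict[str, list[int]]:
    """For each category, count how many distinct users marked each word."""
    marked: dict[str, list[set[str]]] = defaultdict(lambda: [set() for _ in range(n_words)])
    for user, category, start, end in highlights:
        for i in range(start, end):
            marked[category][i].add(user)  # a set, so repeat marks by one user count once
    return {cat: [len(users) for users in cells] for cat, cells in marked.items()}
```

Because each category is counted separately, the same word can carry “Rhetoric” and “Orientalism” at once, which is the overlapping markup that a single hierarchical tagging scheme struggles to represent.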
Bethany Nowviskie explains the background to the project in this Scholar’s Lab post.
Jeff sent me a link to the beta TED Ed site where you can see how they are turning TED videos (and other animations) into simple lessons that we can use. See TED-Ed: Lessons Worth Sharing. The idea is that an instructor can reuse (flip) a video with their own questions and commentary. You can also use the framework with YouTube videos. Neat.
A nice story from the New York Times by Michael Winerip, Robo-Readers Used to Grade Test Essays (April 22, 2012), talks about automated essay scoring (AES) software. The story first reports a study from the University of Akron that showed that AES software is comparable to human graders (see A Win for the Robo-Readers by Steve Kolowich from Inside Higher Ed). The NYT story then goes on to report how Les Perelman, a director of writing at MIT, has shown how you can game AES tools. Among other things, they don’t check facts or truth, so you can write all sorts of outrageous things and still get a good score from AES. The story discusses some of the patterns that get good scores, like lexical variety and long sentences. The story ends with the possibility that AES could be matched by essay-writing software,
Two former students who are computer science majors told him (Perelman) that they could design an Android app to generate essays that would receive 6’s from e-Rater. He says the nice thing about that is that smartphones would be able to submit essays directly to computer graders, and humans wouldn’t have to get involved.
Particularly interesting is an essay Perelman wrote to show how poor essays can game the system. I wish I could say that I never saw writing like this and that therefore there was no danger of AES systems rewarding the poor writing found in real essays,
In today’s society, college is ambiguous. We need it to live, but we also need it to love. Moreover, without college most of the world’s learning would be egregious. College, however, has myriad costs. One of the most important issues facing the world is how to reduce college costs. Some have argued that college costs are due to the luxuries students now expect. Others have argued that the costs are a result of athletics. In reality, high college costs are the result of excessive pay for teaching assistants.
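To see why an essay like this can do well, consider the kind of shallow features the story mentions. The sketch below is hypothetical and is not how e-Rater or any other commercial AES tool works. It computes word count, lexical variety (type-token ratio) and mean sentence length, none of which says anything about whether a claim is true.

```python
import re


def surface_features(essay: str) -> dict[str, float]:
    """Shallow features of the kind the story says AES rewards (sketch only)."""
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    words = re.findall(r"[A-Za-z']+", essay.lower())
    return {
        "words": float(len(words)),
        # Lexical variety: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        # Average number of words per sentence.
        "mean_sentence_length": len(words) / len(sentences) if sentences else 0.0,
    }
```

“The sun rises in the east.” and “The sun rises in the west.” get identical features, so a scorer built only on such measures cannot tell the true sentence from the false one.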
From Slashdot, a story about how the Faculty Advisory Council to the Library (of Harvard) sent around a Memorandum on Journal Pricing arguing that periodical subscriptions are not sustainable and that faculty should therefore publish in open-access journals.
The Faculty Advisory Council to the Library, representing university faculty in all schools and in consultation with the Harvard Library leadership, reached this conclusion: major periodical subscriptions, especially to electronic journals published by historically key providers, cannot be sustained: continuing these subscriptions on their current footing is financially untenable. Doing so would seriously erode collection efforts in many other areas, already compromised.
According to William Binney, a whistleblower from the US National Security Agency (NSA), the NSA probably has most of our email. See the video Whistleblower: The NSA is Lying–U.S. Government Has Copies of Most of Your Emails. The question, then, is what they are doing with it. He mentions that they can “put it into forms of graphing, which is building relationships or social networks for everybody, and then you watch it over time, you can build up knowledge about everyone in the country.” (See the transcript on the page.) In other words, they could be (or already are) building a large social graph that they can use in various ways.
In the transcript of the longer video Binney talks about various programs developed to filter out all the information:
Well, it was called Thin Thread. I mean, Thin Thread was our—a test program that we set up to do that. By the way, I viewed it as we never had enough data, OK? We never got enough. It was never enough for us to work at, because I looked at velocity, variety and volume as all positive things. Volume meant you got more about your target. Velocity meant you got it faster. Variety meant you got more aspects. These were all positive things. All we had to do was to devise a way to use and utilize all of those inputs and be able to make sense of them, which is what we did.
Binney goes on to talk about the code-named Stellar Wind program that Bush authorized and was then forced to change after a revolt of some sort in the Justice Department in 2004. Stories tell of senior Bush advisors trying to get Ashcroft to sign authorization papers for the program while he was in the hospital. Stellar Wind itself seems to have been mostly about metadata (the date, sender and recipient of emails), which you could use to build the kind of diachronic social graph Binney was talking about. Strictly speaking this would be social network analysis rather than text analysis, but they might have supplemented the system with some keyword capabilities. Another story from Time points out the problem with such analysis: it generates too many vague false positives. “Leads from the Stellar Wind program were so vague and voluminous that field agents called them ‘Pizza Hut cases’ — ostensibly suspicious calls that turned out to be takeout food orders.”
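To make “metadata only” concrete, here is a minimal sketch of a diachronic social graph built without reading any message content. It says nothing about what the NSA actually does. The records are hypothetical (date, sender, recipient) tuples, and we build one weighted contact graph per month and then ask which contacts are new from one month to the next.

```python
from collections import Counter
from datetime import date

# Hypothetical metadata record: (date, sender, recipient); no message content.
Record = tuple[date, str, str]


def graph_by_month(records: list[Record]) -> dict[tuple[int, int], Counter]:
    """Weighted, undirected contact graph for each (year, month)."""
    graphs: dict[tuple[int, int], Counter] = {}
    for day, sender, recipient in records:
        edge = tuple(sorted((sender, recipient)))  # undirected: a->b and b->a are one edge
        graphs.setdefault((day.year, day.month), Counter())[edge] += 1
    return graphs


def new_contacts(graphs: dict[tuple[int, int], Counter],
                 earlier: tuple[int, int], later: tuple[int, int]) -> set[tuple[str, ...]]:
    """Edges present in the later month but absent from the earlier one."""
    return set(graphs.get(later, Counter())) - set(graphs.get(earlier, Counter()))
```

Watching such graphs “over time”, in Binney's phrase, is just a comparison of successive snapshots. It also shows why the leads can be vague: a new edge says only that two addresses started talking, not why.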
Either way, these hints give us a tantalizing view into how text and network analysis are being experimented with. Are there any useful research applications?