Beginning to “DH” : A very introductory workshop on using introductory digital humanities tools
The first thing you need to decide is why you want to do DH. What is the benefit? What do you hope to achieve? These seem like simple questions, but they are at the foundation of figuring out what DH can offer you.
This workshop will take you through three tools: Wordle, Voyant, and Zotero. These will give you a taste of some of the fundamental concepts behind the digital humanities. You will need to find some digital texts to work with; if you don’t have one on hand, try exploring Project Gutenberg or The Internet Archive for public domain material to play around with.
There are as many kinds of Digital Humanities projects as there are kinds of writing: social networks, peer-sourced and peer-funded projects, journals, databases, bibliographic tools, visualization tools, art projects, mapping projects, mapping tools, annotation tools, word counting tools, pattern counting tools, and data-mining tools.
For your own research needs, you may want to simply begin by playing around with some of the open source tools available to you online or through the Athabasca University e-lab, like the e-lab portfolio tool or any number of the tools found in the e-lab Virtual Tool Cupboard.
Let’s start by picking an easy visualization tool: Wordle.
Wordle is a text toy that makes lovely pictures, or ‘word clouds,’ out of the text you have ‘cut and paste’ into the browser. The more often a word appears in the pasted text, the bigger it will appear in the cloud. We begin with Wordle because it is the easiest way to familiarize yourself with creating a word cloud; visualizing word frequency is a key trend in DH, as word counts are the basis of many DH tools.
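Under the hood, a word cloud is nothing more exotic than a frequency count. Here is a minimal sketch of that idea in Python; the function name and the sample sentence are my own illustration, not how Wordle itself is built:

```python
from collections import Counter
import re

def word_frequencies(text):
    # Lowercase the text, pull out the words, and count each one.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

sample = "The Queen said: off with her head! The Queen was furious."
freqs = word_frequencies(sample)
# The most frequent words are the ones a cloud would draw biggest.
print(freqs.most_common(3))
```

A word cloud tool simply maps each count to a font size and arranges the words on the page.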
Here is a word cloud I have created from one of my syllabi:
Wordle filters out common words, or “stop words”: words like and, the, in, and at that are so common they would skew the results. You can see that, based on word frequency, the class looks at topics aligned with information, the internet, digital, intelligence, and research. This is moderately interesting, especially if you are looking for a quick idea of what the class will concentrate on. For a more meaningful cloud, try looking at a whole book. Here is a word cloud based on the Project Gutenberg EBook of Lewis Carroll’s Alice in Wonderland, artfully rendered in Wordle.
You can see that although the text is about what happens in Wonderland, the text focuses heavily on Alice. She appears far more often than the White Rabbit or the Queen; based on word frequency, the book is also far more about that which is “Little” than that which is “Large,” and for as large as he looms, the “Caterpillar” is still smaller than the “Mouse.” You can also see that the word “Illustration” shows up. Here we find the first and best lesson of on-the-fly visual text analysis: clean your text. The best text analysis is done with a clean, unformatted, normalized digital text, and unless you want to study the paratext, you need to take out the introductory material, the bibliography, any tags or embedded links, and any other authorial or editorial interference such as [Place Image Here] or [Illustration]. If you don’t remove it, the tool will count it.
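That cleaning step can itself be automated. The sketch below, assuming editorial insertions always sit inside square brackets (as [Illustration] does in the Gutenberg Alice), strips them out and tidies the leftover spacing; the function name and sample line are mine:

```python
import re

def strip_markers(text):
    # Remove bracketed editorial insertions such as [Illustration]
    # or [Place Image Here], then collapse the leftover whitespace.
    no_markers = re.sub(r"\[[^\]]*\]", " ", text)
    return re.sub(r"\s+", " ", no_markers).strip()

page = "Alice fell down the hole. [Illustration] It was very deep."
clean = strip_markers(page)
print(clean)
```

Run over a whole ebook, this removes every bracketed marker in one pass, which is exactly what keeps “Illustration” out of your cloud.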
Now find a digital text you would like to explore and create a word cloud. You can save it to the public gallery, or crop a screen shot for your own uses. Take a moment to see if you can spot any trends in the cloud. Now let’s see what else we can do with your text.
A tool like Voyant can provide a few more ways of seeing into your text as you use the little grey arrows to open and close windows. Using the same kind of ‘cut and paste’ interface, it provides a number of different tools beyond the word cloud, all handily located within the same browser window. You will see not only word frequencies but also word counts. More importantly, you can advance your analysis beyond simply looking at size: you can read the text, select a word to see its count and see it in context, or double click a highlighted keyword in context and see where that sentence appears in your text. These kinds of moves are important because they allow you to see context.
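The “keyword in context” view Voyant offers is an old concordancing technique: show each hit with a slice of the text around it. A rough sketch of the idea, assuming a fixed character window on each side (function name, window size, and sample text are my own, not Voyant’s internals):

```python
def keyword_in_context(text, keyword, width=30):
    # Find each occurrence of the keyword (case-insensitive) and
    # return it with up to `width` characters of context on each side.
    lines = []
    lower, key = text.lower(), keyword.lower()
    start = lower.find(key)
    while start != -1:
        left = max(0, start - width)
        right = min(len(text), start + len(key) + width)
        lines.append(text[left:right])
        start = lower.find(key, start + 1)
    return lines

sample = ("Alice was beginning to get very tired of sitting by her "
          "sister. Alice thought the book had no pictures.")
for line in keyword_in_context(sample, "Alice", width=15):
    print(line)
```

Each printed line is one concordance row: the keyword plus just enough surrounding text to judge how it is being used.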
Find a sample text, and try it out in Voyant. The easiest way to learn to use this kind of tool is to simply click around until you get an idea of how the windows and tools work.
Voyant does not initially apply ‘stop words.’ If you would like to take those words out of your analysis, submit your text, open the “Words in the Entire Corpus” window, click on the gear icon, choose “Taporware (English)” from the “Stop Words List,” and click “Apply Stop Words Globally.”
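Applying a stop list is just filtering before counting. The tiny list below is my own illustration; real stop lists, like the Taporware list Voyant uses, run to hundreds of words:

```python
from collections import Counter

# A tiny illustrative stop list, not Voyant's actual Taporware list.
STOP_WORDS = {"the", "and", "in", "at", "of", "a", "to", "was", "it"}

def frequencies_without_stop_words(words):
    # Count only the words that are not on the stop list.
    return Counter(w for w in words if w not in STOP_WORDS)

words = "the queen and the king sat in the garden".split()
counts = frequencies_without_stop_words(words)
print(counts.most_common())
```

With “the,” “and,” and “in” filtered out, only the content words (queen, king, sat, garden) remain to be counted and displayed.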
At this stage you should have some ideas of what words appear frequently or infrequently in the text you are studying. You should be able to find them in context, and look at how those words all stack up. Word counting, seeing words in context (in sentences and the full text), and spotting “trends” in those frequencies, will get you started using DH tools to “see” into your texts.
The simplest way to clean your text (take out hidden tags, formatting, breaks, etc.) is to just cut and paste it into your internet browser’s search box and then take it back out. This works best for shorter blocks of text. There are also a few tools out there, like this one, that let you paste in your texts to clean them. Don’t forget that you will still need to go in and remove the paratext yourself.
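The search-box trick works because a plain-text field discards markup. The same idea in code, as a rough sketch: strip anything that looks like an HTML-style tag and collapse the line breaks (regex tag-stripping is crude and will miss edge cases, but it matches the spirit of quick cleaning; the function name and sample are mine):

```python
import re

def quick_clean(text):
    # Replace HTML-style tags with spaces, then collapse runs of
    # whitespace, much like pasting into a plain-text box.
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", no_tags).strip()

messy = ("<p>Down, down,\n\ndown.</p> <br/>Would the fall "
         "<i>never</i> come to an end?")
cleaned = quick_clean(messy)
print(cleaned)
```

For anything book-length, a small script like this beats the search box; either way, the bracketed paratext still has to come out by hand or by a marker-stripping pass.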
Finally, in order to begin working with DH you need to store your files somewhere. You can use your documents folder of course, but you may want to look at Zotero to help you out. It is a free application that lives in your web browser and allows you to create and share libraries and bibliographies. It is the easiest open source tool out there to manage your digital library.
Now that you have some idea of what kinds of things you can learn about the Digital Humanities in 15 minutes, go out and find more. It is fun stuff; how useful you find it is really up to you. Remember, however, for all the lazy tossing about of the term “analysis” in conjunction with “digital,” “algorithmic,” or “data,” computers only do what we tell them to do, and most open source tools simply count, display, and organize; what you do with the data is where the analysis really comes in.