Part of this course is exposing you to some of the tools and methodologies that many DHers use. Having at least a little bit of knowledge of these will help you better understand how DH projects or tools are built, what they can do, and what their limitations are. Some of this won’t be easy, and will involve some problem-solving and, once you get your feet wet, a little bit of playing around. But that, too, is part of DH: the experimenting, the play, the figuring out how to do things.
A lot of this you will necessarily have to do on your own, but remember, you’re also part of several communities. Got a question about how something works, or getting an error message? Google it. Chances are you’re not the only one to deal with this problem.
Because we’re building basic skills here, your completion of the following assignments is only being evaluated to the extent that you demonstrate that you have completed them. For each one, please submit a screenshot to the appropriate assignment on Canvas demonstrating that you have completed the assigned task(s). Don’t know what a screenshot is, or how to take one on your machine? If so, then figuring those out is your first task.
HTML and CSS are essential elements of just about every content management system. In general, HTML constitutes the basic instructions to your computer as to the content of a webpage, and CSS provides the design elements (colors, layout, fonts, and so forth). Complete the free (unlocked) lessons of Codecademy’s basic HTML & CSS course through Unit 6. Submit a screenshot of your completion of the Build a Resume segment.
Read the following:
Every computer program is written in a computer language, or “code.” We’re going to learn and use Python. There are a zillion languages, many of them used in digital humanities (especially Ruby and Java), but there are good reasons for starting with Python.
As Michelle Moravec has suggested, “invest your time in learning methods not tools.” By this, she doesn’t mean to eschew tools, but rather to focus on methodology, with the actual tool used being secondary. In other words, even for the exercise we’re doing now, the point is not only to learn Python for the sake of knowing Python. I would agree with many that having at least some sense of coding may not be completely indispensable, but it’s a pretty good thing to have.
We’re also doing this exercise to learn some methodology for a skill that might come in very handy sometime: how to scrape data from the web. Think of all those digitized files or websites out there that we can only get data from by downloading the pages. That’s fine for a few dozen or even a few score pages, but what about hundreds? Thousands? That’s where it makes sense to find some way to automate the ingestion of documents, get rid of all that pesky HTML and other formatting that gussies them up for the web, and end up with nice clean files from which we can extract data for analysis. That’s what we’ll be doing with Python.
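To give you a feel for what “getting rid of all that pesky HTML” can look like, here is a minimal sketch. It is not the code from the Programming Historian lessons; the function name strip_tags and the sample string are made up for illustration, and it assumes simple, well-formed tags with no stray angle brackets in the text itself.

```python
def strip_tags(page_contents):
    """Return page_contents with everything between '<' and '>' removed."""
    inside_tag = False
    text = []
    for char in page_contents:
        if char == '<':
            # A tag is opening: skip characters until it closes.
            inside_tag = True
        elif char == '>':
            # The tag has closed: go back to keeping characters.
            inside_tag = False
        elif not inside_tag:
            text.append(char)
    return ''.join(text)


# Hypothetical sample; a real page would be downloaded or read from a file.
sample = "<html><body><p>The <b>Old Bailey</b> record</p></body></html>"
print(strip_tags(sample))
```

Run on the sample, this prints only the words between the tags. Real web pages are messier than this, which is part of what the lessons below work through.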
Accordingly, here are the lessons we’ll be doing, from the Programming Historian. First, scroll down to “Introduction to Python.” For actually writing and running your Python programs, the editors suggest either using the command line or installing Komodo Edit. If either of these works for you, excellent. For the Mac, I much prefer TextWrangler, and Windows folk may like Notepad++ (both of these are free), as Komodo Edit is a lot more complex than our needs warrant for the purposes of this exercise. Another cool option, if you don’t want to install anything, is repl.it, which runs free, online programming environments, including Python, that operate on files on your computer. (Caveats: you must use the Python 2.7 interpreter, not Python 3, and I don’t know whether it will invoke the proper libraries; given that Python is already on Macs, I would only consider this option for Windows folk.)
Another thing to remember: the filename extension, that is, the suffix for Python program files, is “.py”. No matter what text editor you use, make sure that you save your files as .py files rather than as .txt files, which tends to be the default in text editors. Of course, you will be working with other files that will need to remain .txt or .html files, but the Python programs must have a .py extension.
Then, the lessons to complete:
- Python Introduction and Installation
- Understanding Web Pages and HTML
- Working with Text Files
- Code Reuse and Modularity
- Working with Web Pages
- Viewing HTML Files
- Manipulating Strings in Python
- From HTML to a List of Words (part 1)
- From HTML to a List of Words (part 2)
- Normalizing Data
- Counting Frequencies
- Creating and Viewing HTML Files with Python
- Output Data as an HTML File
- Keywords in Context (Using n-grams)
- Output Keywords in Context in HTML File
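To preview where the later lessons in this list are headed, here is a rough sketch of normalizing text, counting word frequencies, and pulling out keywords in context. It is only an illustration under my own assumptions: the function names, the sample sentence, and the two-word window are all hypothetical, and the obo.py module you build in the lessons is organized differently.

```python
import re


def normalize(text):
    """Lowercase the text and split it into a list of words."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(words):
    """Count how many times each word appears in the list."""
    freqs = {}
    for word in words:
        freqs[word] = freqs.get(word, 0) + 1
    return freqs


def keyword_in_context(words, keyword, window=2):
    """Return each occurrence of keyword with up to `window` words either side."""
    hits = []
    for i, word in enumerate(words):
        if word == keyword:
            start = max(0, i - window)
            hits.append(' '.join(words[start:i + window + 1]))
    return hits


# Hypothetical sample text, standing in for a cleaned-up Old Bailey record.
sample = "The prisoner was tried and the prisoner was found guilty."
words = normalize(sample)
print(word_frequencies(words)['prisoner'])
print(keyword_in_context(words, 'prisoner'))
print(keyword_in_context(words, 'pardon'))
```

The last two lines mirror the assignment: one keyword that appears in the text returns a list of hits, and one that does not returns an empty list.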
In Canvas, post your final obo.py file and the output .html files for two keywords from the “Output Keywords” exercise, run on an Old Bailey record other than the one used as the example: one for a keyword that appears in the file, the other for a keyword that does not. So, for example, I tried some keywords for the Ordinary’s Account of March 3rd, 1737, and here’s what I got for a hit:
I know, this seems like a lot. But trust me, some of this is review, and most of these lessons go by quickly (several of them under 10 minutes each). I guarantee that we’ll all be able to work through them within a week, with a minimum of pulling of hair and gnashing of teeth.
By the way, last time I did this, people using Windows had trouble with the command line window disappearing. Here’s how to deal with that.
Another interesting read, optional:
Some useful intro reading, beyond what we’ve read in Macroanalysis:
Then, complete the Getting Started with Topic Modeling and MALLET tutorial. Submit two screenshots. One, like Figure 8 in the tutorial, should show the output of a train-topics command on the sample data set discussed in the tutorial, but indicate that you generated 15 topics instead of the default 10. The other should, like Figure 10 in the tutorial, show the tutorial_composition.txt file generated by your 15-topic model opened in a spreadsheet (Excel, if you have it, is fine; I often use LibreOffice, which is free and open source).
Some useful intro reading:
Complete the following lessons:
Intro to Google Maps and Google Earth. Note: Google Maps Engine Lite has been re-branded Google My Maps. It is slightly different, but includes the same functionality.
Installing QGIS 2.0 and Adding Layers
Creating New Vector Layers in QGIS 2.0
In Canvas, submit no fewer than two distinct screenshots of your maps from the Creating New Vector Layers lesson, showing roads and lots.
Some useful intro reading:
Complete Miriam Posner, Getting Started with Palladio
On Canvas, upload a screenshot of each of the following:
- a map you’ve done with Miriam Posner’s Cushner data
- a network you’ve done with Miriam Posner’s Cushner data
- a network that you’ve done with Düring’s Neumann data
- a timeline that you’ve done with Düring’s Neumann data