Because I Don’t Want to Be a Prospector: Towards a Better Term than “Data Mining”


I have a confession. I hate the term “data mining.” Don’t get me wrong: I appreciate the concept of data mining and its importance in certain situations, but the term itself is problematic to me, especially when applied to humanities research.

I received my Autumn 2011 copy of Victorian Studies in the mail yesterday and read its articles on interpretation in a digital age with glee. The three articles all used data mining (or text mining) to look for larger patterns in term (word) usage in nineteenth-century texts. Their work uses tools such as Google Ngram Viewer to provide signals that ideally lead to an exploration of larger concepts. I think this is fascinating work that definitely has value and should be continued, especially since, as all the articles suggest, nineteenth-century texts form a large corpus that is ideal for this kind of investigation.

Though I re-emphasize that this is all very valuable and necessary work, the term “data mining” seems to connote a kind of forcefulness in analysis, akin to trying to make a puzzle piece fit into a space where it does not belong. When mining for this data, Heuser and Le-Khac warn against mistaking signal for data (81). However, the term “data mining” suggests that digital humanities scholars should prospect for data, drive stakes around the perimeter, make sure not to overlap into another prospector’s claim, dig, and hope for the best.

Maybe it is because I grew up in a mining town, where the word “mining” carries less-than-ethical associations, but I feel that we need a better term to describe the important work being done. I suggest something like data exegesis (too religious?) or perhaps text (term?) curating.

My main concern is that the use (or overuse) of the term “data mining” will open the field of digital humanities to critique from those who do not understand the work that digital humanists do. Data mining already carries such an unethical connotation, especially given the kind of information stripping for profit and advertising that happens on social media, that we as scholars need to be savvier about the terminology we choose to use in the field.

What do you propose we use instead? Do you think data mining is an appropriate term for the type of work being done? 

Work Cited
Heuser, Ryan and Long Le-Khac. “Learning to Read Data: Bringing out the Humanistic in the Digital Humanities.” Victorian Studies 54.1 (Autumn 2011): 79-86. Print.
