today’s forecast, cloudy with a chance of text analysis

This post focuses on the use of word clouds in modern textual analysis and on what I, and many others, believe to be their shortcomings. A few other blog posts listed in the module notes for our previous three sessions have presented valuable perspectives on this practice, and this post is primarily a reflection on those viewpoints.

I make no claims to originality with this opinion, and I will happily join the chorus of criticism against word clouds. Ultimately, my criticism centers on their superficiality and on their ease of access, which new users often misconstrue as utility. When tinkering with Voyant Tools, I used Moby Dick as source material, to fairly unsurprising results. The most common word was displayed to me and, who would have guessed, it was “whale”.
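For readers curious about what sits behind a result like that, here is a minimal sketch of the counting step a word cloud relies on. It is not Voyant Tools' implementation: the tokenizer, the tiny `STOPWORDS` set, the `top_words` helper, and the sample sentences are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real tools ship far longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was"}

def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens with their counts."""
    # Lowercase the text and keep runs of letters (and apostrophes) as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Made-up sample text, standing in for a full novel.
sample = (
    "The whale surfaced. The whale dived, and the ship followed the whale. "
    "Ahab watched the sea, and the sea gave back the whale."
)

print(top_words(sample, 3))
```

A word cloud is essentially this list with font size mapped to count, which is why it tends to confirm what a reader already expects from the text.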

To me and many others, this is the crux of the matter. Word clouds are wholly unimpressive uses of text mining functionality and offer little beyond what we would already expect. When we attempt to engage with this practice in the context of wider data analysis, it is clear that much more robust tools are required.

As mentioned at the start of this post, word clouds are superficial and often behave as design elements of a document more than they serve any meaningful analytical role. What may begin as an earnest excavation of the key points of a text ultimately ends at the topsoil unless more nuanced methods are added.

The value of word clouds comes from their ease of access. However, this is often conflated with the utility of the technology. As an introductory example of text mining, word clouds are fun for users because they give a sense of accomplishment in a field that may be new to them. In a blog post, Shelby Temple equates the use of a word cloud with the coder’s traditional first “hello world!” program. Both are more for their creator than for anyone hoping to glean meaningful insight from the creation.

All in all, the word cloud happily occupies its corner of beginner-level text analysis, but for professionals it is merely a first step into a much larger field.

what’s in a name?

One would not be blamed for the optimistic assumption that there is some sort of grand unifying principle behind the way the web, and more generally the infosphere as a whole, is constructed. With all the data in the known universe, one would also assume that, in order to even attempt an estimate of the quantity of that data, we would have developed some sort of praxis to do so. While I appreciate such tendencies towards optimism to an extent, I am afraid this post will likely shatter that illusion.

As is the case with all types of constructs, it is useful to establish naming conventions that allow the individual parts and resources to be differentiated from each other and, more importantly, to interact with each other on a system-wide level. The reigning champion in this domain is the Uniform Resource Identifier, or URI. This is the convention we currently have in place to name resources within the World Wide Web. Two common derivatives of this concept are the Uniform Resource Locator (URL) and the somewhat similar Uniform Resource Name (URN).

Most internet-enabled individuals will be very familiar with the URL as the string of characters in the address bar atop most web browsers, which, once entered, takes the user to the requested webpage. The URN, on the other hand, might be more common among academics or bibliophiles; a well-known example is the International Standard Book Number system, or ISBN. A key difference between the two is that a URL inherently provides information about how to access the resource, and with it a means of access, while a URN does not. An ISBN does not tell the user how to actually acquire the book they are interested in.
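The difference can be seen by taking one identifier of each kind apart. The sketch below uses `urlparse` from Python's standard library; the `describe` helper, the `example.org` address, and the ISBN shown are illustrative assumptions rather than references to real holdings.

```python
from urllib.parse import urlparse

def describe(uri):
    """Split a URI and report whether it names a network location."""
    parts = urlparse(uri)
    return {
        "scheme": parts.scheme,              # e.g. "https" or "urn"
        "has_location": bool(parts.netloc),  # True when a host is given
        "rest": parts.netloc + parts.path,   # whatever follows the scheme
    }

# A URL says where to go and, through its scheme, how to get there.
url = describe("https://example.org/books/moby-dick")

# A URN only names the thing; nothing in it says where to find a copy.
urn = describe("urn:isbn:0451450523")

print(url)
print(urn)
```

Both strings are valid URIs, but only the first carries a host to contact. The second would need some separate lookup service to turn the name into a location.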

Returning to my initial point, however, these conventions exist within a larger discourse and in no way operate as primordial pillars of our conceptualization of data. These are all tools that we have developed to help us understand and make sense of a system which, without constant adaptation on our part, quickly evolves into uncertainty. This fundamentally epistemological understanding of data and our relationship to it truly blurs the lines between computer science and philosophy more generally.

Given the context of our data-driven world, investigating something as base as a naming convention prompts even further scrutiny of our other assumed structures and orders. Now, before the veil is completely drawn and this post leaves you questioning the infinitesimal partitions of your reality, I’d suggest examining the principle at work here within the infosphere more specifically. What I see, and often struggle to explain, about the library and information science discipline is its scope. When we consider what is in a name, we are confronted with the duality of its significance. Shakespeare seemed to believe that despite what we call it, a rose smells as it smells. Is this the case for the URI? Or does the nature of this web we have spun for ourselves beget new foundations?

consequential data

This post ushers in a theme of reflection. Throughout the coming months, I will use this space as a sounding board for considerations that arise from our readings. This first post is a brief investigation into the 2016 US Presidential Election and how user data was harnessed to create curated, self-reflexive chambers that ultimately brought the legitimacy of the electoral process into question.

The contemporary online individual finds themselves trapped within cyclical discourses and echo chambers, which make bridging the gap between ideas more complicated. When considering a controversial topic, as seems to be the norm these days, individuals often allow bias and personal viewpoints to affect how credible they find the data they are given and how leniently they treat it. Significant work has gone into creating frameworks for understanding how individuals can escape the echo chambers they find themselves in and encounter alternative viewpoints. One such framework, titled Bias-Trust, is designed to minimize the signals that cause confirmation bias. That work aligns with Allcott and Gentzkow (2017) in recognizing social media as a meaningful source of political information.

While the notion that fake news leads to a less informed citizenry seems obvious, its importance is often understated. A knowledgeable public is critical to a well-functioning democracy. There were very real concerns that the 2016 US presidential election was significantly swayed by fake news. An example of this phenomenon was an article that became the “most clicked” story about three months ahead of the election. The article was titled “Pope Francis Shocks World, Endorses Donald Trump for President, Releases Statement”. The story was entirely fictitious. However, it amassed over 900,000 shares on Facebook (Bakir and McStay 2017). A related study concluded that, for the outcome of the election to have been swayed, a single fake news article would have had to be as persuasive as about 36 traditional TV ads (ibid). The fear surrounding this possibility in the immediate aftermath of the election led Mark Zuckerberg, the CEO of Facebook, to release a statement declaring that fake news on the platform did not influence the election.

Regardless of the platform’s stance on the matter, the reality of these fake news stories being engaged with on such a large scale is noteworthy. While some of the fake news stories that are spread can be easily identified by the public as misleading or fictitious (ibid), others do a better job of masking themselves as legitimate news sources or articles.

Allcott and Gentzkow (2017) cover the 2016 US Presidential Election specifically, while also discussing the challenge that echo chambers and filter bubbles pose to individuals and their capacity to engage critically with the content they view. They found that about 14% of American adults regard social media as their most important source of election news. When investigating the relationship between fake news and ideological polarization, it is helpful to explore the psychological background of ideological polarization over the past two decades. Sophr (2017) brings an empirical grounding to this discourse, as the political history of polarization and its increased prominence in the recent election are the focus of many case studies. The article evaluates the debate over whether polarization has increased due to our new systems of media consumption. The author examines the roles that specific social media companies play in shaping these ideological landscapes, and highlights the convergence of these companies as both technology firms and media companies. Sophr concludes that this convergence is at the heart of the shift from traditional news agencies and media to faster and more easily digestible news built on the back of Big Data and content curation.

