About the Co-founders
Nick Adams, Ph.D.
Nick Adams is TagWorks' inventor and co-creator. He came up with the initial design of the software when he faced the challenge of closely annotating over 8,000 news articles describing events of the Occupy movement.
Nick is a proponent of hybrid text analysis approaches, and he has taught, consulted, and mentored hundreds of researchers in a range of automated and human text analysis techniques. He founded the Computational Text Analysis Working Group at UC Berkeley's D-Lab, and the interdisciplinary Text Across Domains (Text XD) initiative at the Berkeley Institute for Data Science.
Nick earned his Ph.D. in sociology from Berkeley in 2015 and is now Co-Founder of Thusly, Inc. and the Founder and Director of Goodly Labs, a California non-profit dedicated to empowering the public with the data, tools, and insights of social science.
Nick is honored to sit on the SSRC's Digital Culture committee; to have helped form the Credibility Coalition; and to serve as a BITSS Catalyst (Berkeley Initiative for Transparency in the Social Sciences), promoting sustainable open science.
Norman Gilmore
Norman Gilmore is the Chief Technology Officer for Thusly, Inc. and co-creator of TagWorks. He has a long-term interest in software that supports citizen science, visualization for exploratory data analysis, and complex problem solving. So, he has loved building TagWorks, which brings all of those themes together.
Norman likes to lead happy teams, and believes Scrum is a good way to do that. He uses Node and React to build advanced interfaces and Django for web application development, and he deploys to AWS with Docker containers.
Norman has worked at big companies like Disney Interactive and Boeing, and at smaller companies you probably haven't heard of.
The TagWorks story
Sociologist Nick Adams had a problem.
It was not a new problem. And it was one that many researchers and data science teams still encounter today.
The problem was: how could he pull rich, intricate data from a very large corpus of documents without taking decades to do it?
Adams, a UC Berkeley political sociologist and data scientist, wanted to do a multilevel textual analysis of over 8,000 documents reporting on the 2011 Occupy movement. And his detailed conceptual scheme (itemizing all the potentially interesting information in those documents) included nearly 300 variables.
His review of similar research suggested that his goal was impossible. Past studies of contentious politics tended to be of two types. One type collected a few variables about a large number of movement events. The second type looked deeply into one or two events or movements, producing thick descriptions of everything that happened and every detail of how the movement interacted with the city and the police. In other words, there were shallow looks at a large number of documents (a large-N study) or deep examinations of a small number of documents (a small-N study).
Adams was not interested in doing a deep dive into one or two episodes or a shallow pass over all of them. “I wanted to have the richness of description you find in a small-N study and do it for the thousands of events led by 185 different Occupy encampments.
“I wanted to do something that's never been done before.”
It soon became clear why such a deep dive into a huge pile of documents hadn’t been completed. The task was incredibly daunting. This kind of textual analysis is not something computers can do. They can’t understand the nuance, ambiguities, and contradictions of natural language the way humans do. On the other hand, the cognitive load of identifying (any of) 300 different variables in a long passage of text like a news article is significant. It’s both extremely challenging and extremely boring. “My very smart undergrads felt it was an impossible task,” Adams recalls. “Each article would take hours to complete and once you proved that you could do it, you’re kind of like ‘that was terrible and I never want to do it again.’”
Other researchers had run into this issue before and had either simplified their approach or reduced their document set. A few had retained the size and depth of their studies and had persevered, but those studies took many years. Adams was determined to wring everything he could out of the historical opportunity the Occupy movement presented, but he couldn’t afford to spend a decade of his life training and managing multiple waves of undergraduates to hand-code all the documents.
Adams began to look into ways to break the job down into smaller assignments and then automate the work as much as possible. Existing tagging software required too much of annotators and still gathered too little data, so he decided to design a new solution. After much trial and error, Adams found that an assembly-line approach using the proper analytical units and tagging interfaces could efficiently break out large and complex content analysis jobs into brief, simple tasks that relatively untrained people could complete in series or in parallel.
Working with a team of software engineers in the San Francisco Bay Area, Adams created a prototype that made multilevel information extraction from a large corpus of documents possible. It wasn’t long before Adams realized that this solution could work for many other researchers, too.
Adams then teamed up with veteran software engineer and Thusly Co-Founder Norman Gilmore. Together with a team of programmers, they have now created TagWorks, web-based software that not only decomposes a monumental effort into a series of simple tasks but also trains and manages analysts and crowd workers directly within the application. TagWorks also automatically aggregates the work of crowd workers to validate results, formalizing and exploiting intersubjective epistemology so that principal investigators can feel confident about their results without having to check each tag. These unique features allow researchers to scale up their workforces by a factor of 100 (or more) while significantly scaling back their management role.
If you’ve dreamt of performing a rich examination of a very large collection of documents, but found yourself lacking appropriate tools and resources, don’t give up. Don’t settle. Don’t reduce the scope of your study or the richness and clarity of your variables.
Use TagWorks, and have it all.