What is TagWorks?
To use the popular ‘text mining’ metaphor: while other text analysis software provides its customers with shovels and pickaxes, TagWorks allows you to strip-mine entire mountains of documents to extract every bit of their informational value. To use a less distressing metaphor: TagWorks’ cloud-based software provides an assembly line of customizable interfaces that lets you enlist hundreds of Internet-based workers to read through your documents and label all the important information inside them. You can think of it as a factory designed to apply labels to important words and phrases in your documents or, to put it differently, as a ‘works’ designed to place ‘tags’ throughout your documents.
How does TagWorks save so much time and money?
TagWorks’ assembly line approach and customizable interfaces allow our clients to break very complex content analysis jobs into a series of brief tasks that Internet-based workers can complete without face-to-face training. As a result, our clients save thousands of hours they would otherwise spend training and managing analysts. Large text analysis projects that would have taken many years using traditional approaches can now be done in a matter of months with TagWorks.
How is TagWorks better than AI or NLP?
With TagWorks guiding a team of Internet workers, you can apply hundreds of thousands (or even millions) of tags to your documents. These tags are based on your own expertise and are applied accurately and flexibly by native speakers who will not be tripped up by metaphor, sarcasm, idioms, or other nuances of language. Computers are unable to apply your analytical expertise to documents with sufficient accuracy; humans are necessary for this work.
Don’t be confused by all the hype. Tech commentators are abuzz about the ability of artificial intelligence (AI) to beat humans at games like chess or Go. But human languages, which lack such strict rules and clear goals, pose far greater challenges to computers. Machines can produce grammatically correct speech and text, and they can answer precisely formed questions by querying sources like Wikipedia. (This is IBM Watson’s strength.) But computers are not able to understand language the way a human does. (Just try talking to Siri or Alexa for more than a couple of sentences.) Moreover, the greatest advances in natural language processing (NLP, i.e., computers’ techniques for analyzing human languages) have come when humans trained computers to look for linguistic features such as named entities or parts of speech. In all of these cases, humans trained computers using documents that had already been tagged by humans, a process called ‘supervised machine learning.’ So the best hope for an AI/NLP solution to your text analysis challenge requires you to first have humans tag thousands of your documents. TagWorks is the premier software for generating the large, rich sets of tags that make ideal machine learning training sets. To learn more about TagWorks and machine learning, see “How does TagWorks integrate with machine learning approaches?” below.
How is TagWorks different from other commercially available content analysis software?
TagWorks is one of a kind in its capacity to guide thousands of Internet workers through the completion of your large and complex content analysis projects. If we use the popular metaphor of text mining, there are a number of companies that can sell you a shovel, sometimes even a mechanized shovel. But only TagWorks provides full-service strip-mining allowing you to efficiently convert an entire mountain of documents into rich, inter-connected data yielding priceless insights. (Our apologies for the metaphor. No plant or animal life will be harmed by your use of TagWorks.)
We sometimes recommend that potential customers try other human-based content analysis tools suitable for smaller or less complex projects. If you are analyzing 500 two-page documents or fewer, you might try ‘qualitative data analysis software’ such as NVivo, MAXQDA, ATLAS.ti, DiscoverText, or Dedoose. The latter two packages also offer some team-based features and pricing that may make them suitable for projects analyzing up to 1,000 documents. But all of these tools were designed, in the first instance, to help expert researchers tag several dozen interview transcripts with up to 30 different tags. TagWorks’ assembly line approach and interfaces are designed to support projects applying hundreds of different tags to many thousands of documents (or more), without requiring our clients to directly train and manage the hundreds of analysts needed to complete such large projects in a timely manner.
Other software packages, like TagTog and LightTag, are designed for quick, single-layer tagging tasks. These tools are helpful if you already have your own team of analysts and need to add just a handful of tags to fewer than 1,000 documents. For instance, you might want to categorize each instance of a keyword as ‘relevant’ or ‘not relevant’ to your project. These lightweight tools are especially suitable (as is DiscoverText) for quick analyses of Twitter data. However, if you want to apply a rich set of tags to a larger set of longer documents, these tools would inefficiently require your analysts to read the same documents many times over. They also don’t let you access or easily work with thousands of Internet-based workers. TagWorks’ unique assembly line approach and interfaces are designed to let you go as deep as you want into a massive set of longer documents while making optimal use of the time and effort of the hundreds of qualified Internet-based analysts we bring to your project.
How does TagWorks integrate with machine learning approaches?
The term ‘machine learning’ can obscure the fact that such approaches require humans to teach computers (offering them definitions, constraints, and guidance about how to teach themselves) to recognize important relationships, patterns, or information across some set of data. This is especially true of human language, which is full of ambiguities, homonyms, synonyms, idioms, metaphors, sarcasm, irony, and poetics that machines built to compute mathematics were never designed to understand. The most successful efforts to teach computers about human language use what are called supervised machine learning approaches. These approaches mimic a human supervisor standing over the computer’s shoulder, watching it perform language understanding tasks (i.e., categorizing or tagging the meaning of words and phrases in documents), pointing out all of its mistakes, and praising it for everything it understands correctly. Such supervised machine learning systems require a set of documents that have already been accurately tagged by human analysts. These example documents and tags are called ‘training data,’ and they are the secret sauce of almost every AI that attempts to understand language. Since TagWorks provides the most efficient way to create large, rich sets of training data, it is becoming the secret behind every secret sauce.
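To make the idea of human-tagged training data concrete, here is a minimal Python sketch of supervised learning. It is not TagWorks code and not a description of how any particular AI system works; it uses a tiny Naive Bayes classifier (a classic, deliberately simple technique chosen only for illustration) that learns from a handful of human-tagged phrases and then suggests a tag for new text. The example phrases, the tag names ‘arrest’ and ‘march’, and the helper functions `tokenize`, `train`, and `predict` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical human-tagged training data: each example pairs a text span
# with the tag a human annotator applied to it.
TRAINING_DATA = [
    ("police arrested protesters downtown", "arrest"),
    ("officers detained dozens of marchers", "arrest"),
    ("crowd marched peacefully to city hall", "march"),
    ("thousands marched through the streets", "march"),
]


def tokenize(text):
    """Lowercase whitespace tokenization (deliberately simplistic)."""
    return text.lower().split()


def train(examples):
    """Count how often each tag occurs and how often each word occurs per tag."""
    tag_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, tag in examples:
        tag_counts[tag] += 1
        for word in tokenize(text):
            word_counts[tag][word] += 1
            vocab.add(word)
    return tag_counts, word_counts, vocab


def predict(model, text):
    """Return the tag with the highest add-one-smoothed log-probability."""
    tag_counts, word_counts, vocab = model
    total = sum(tag_counts.values())
    best_tag, best_score = None, -math.inf
    for tag, count in tag_counts.items():
        score = math.log(count / total)  # prior: how common the tag is
        denom = sum(word_counts[tag].values()) + len(vocab)
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing out the score.
            score += math.log((word_counts[tag][word] + 1) / denom)
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag


model = train(TRAINING_DATA)
print(predict(model, "police detained marchers"))  # → arrest
```

The model only “knows” what the human-tagged examples teach it: more numerous and more consistently tagged examples generally lead to better suggestions, which is why large, carefully tagged training sets matter.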
What kind of customers use TagWorks?
TagWorks is general-purpose content analysis software suitable for projects with documents of varying length and tag schemas of varying complexity. Some of our clients, like Columbia University’s History Lab, are applying a limited set of tags to a very large (million-document) archive of international diplomatic correspondence. TagWorks is helping prepare their archive for political science researchers, who may also use TagWorks to analyze the documents more deeply. Many of our clients have only a few thousand documents but are applying a multi-stage analysis to dig deeper into specific passages of text (something other software does not support). Yet another client, the non-profit Public Editor project, uses a highly complex assembly line of 10 separate tasks, stitched together to assess the credibility of news articles across dozens of categories of misinformation and then display the tags to alert news readers to misleading content. No other commercially available software comes close to managing a project of this size and complexity. However intricate your conceptual scheme, TagWorks is ready to apply it to your archive, no matter how large.
Our clients come from academia, industry, government, and the non-profit sector. If you’d like to learn more about the suitability of TagWorks for your project, don’t hesitate to schedule a free consultation. For some clients, we will even build custom interfaces and widgets to supplement TagWorks’ current features.
Who are the Internet-based workers using TagWorks?
TagWorks is designed to work with your own volunteer team or to recruit paid Internet-based workers from online labor pools like Amazon Mechanical Turk. We work with you to filter out unqualified workers and find those with appropriate language skills and expertise. TagWorks also requires workers (or volunteers) to complete qualification testing, including performing your own custom tasks, before they are allowed to contribute to your project. For many TagWorks projects, paying Internet-based workers will be the largest expense in the overall budget. But on medium and large projects, TagWorks customers can expect to save tens of thousands of dollars otherwise spent on management overhead and expert analysts’ compensation. Medium and large TagWorks projects are also typically completed in several months instead of several years.
Who created TagWorks and how?
TagWorks’ story may be a lot like yours. TagWorks’ inventor, Nick Adams, had over 8,000 documents to analyze and knew that any of 300 different categories of relevant information could be found in each document. He didn’t want to just read all the documents and write a story about them. He didn’t believe his audience should be expected to trust his insights and conclusions simply because of his expertise. He wanted all of his analysis and conclusions to be traceable back to the original documents. He wanted the documents themselves tagged with all those different categories of information. And he wanted some of those category tags linked to one another and to the entities described in the documents, like people, places, or events. That’s the sort of rich, interlinked data that can be used in statistical models to discover and explain complex phenomena, and the sort of data others can see, trust, and put to further use.
Like many of you, Adams hunted around for software that would allow him and his team of research assistants to systematically apply all 300 of his tags to all of his nearly 10,000 documents. There was nothing. The available tools couldn’t support applying more than 30 tags at a time, and they required close training and oversight of each analyst. Using those tools, his team would have had to read through all the documents 10 times, and the project could have taken a decade to complete.
After countless hours at the drawing board, and a few discarded designs, Adams realized how an assembly line approach using the proper analytical units and tagging interfaces could efficiently break large and complex content analysis jobs into brief, simple tasks that relatively untrained people can complete in series or in parallel. Best of all, the approach trains and manages analysts directly through the software, so senior researchers can scale up their workforces by a factor of 100 while significantly scaling back their management role. The idea of TagWorks was born.
That was 2012. Since then, Adams, a social scientist who earned his Ph.D. at UC Berkeley, has won funding and hired an engineer to help prototype the software, recruited a larger team of engineers to start building a viable product, met and hired the perfect CTO and co-founder (Norman Gilmore) to complete and battle-test the software, and landed the perfect strategic investor (SAGE Publishing, the premier global publisher of social science methods) to help bring TagWorks to you today. It’s been a long road, but we have been fueled all along by the vision of a world where researchers like you lead massive projects that reveal and explain complex patterns of social, political, economic, legal, and financial behavior. As customers like you uncover such behavioral patterns and webs of knowledge, we will all be supporting a growing community of knowledge workers who share in the prosperity traditionally reserved for elite experts.