Whether you are an established company or working to launch a new service, you can always leverage text data to validate, improve, and expand the functionalities of your product. The science of extracting meaning and learning from text data is an active topic of research called Natural Language Processing (NLP).

NLP is a very large field that produces new and exciting results on a daily basis. However, having worked with hundreds of companies, the Insight team has seen a few key practical applications come up much more frequently than any others.

While many NLP papers and tutorials exist online, we have found it hard to find guidelines and tips on how to approach these problems efficiently from the ground up.

After leading hundreds of projects a year and gaining advice from top teams all over the United States, we wrote this post to explain how to build Machine Learning solutions to solve problems like the ones mentioned above. We’ll begin with the simplest method that could work, and then move on to more nuanced solutions, such as feature engineering, word vectors, and deep learning.

After reading this article, you’ll know how to:

We wrote this post as a step-by-step guide; it can also serve as a high-level overview of highly effective standard approaches.

This post is accompanied by an interactive notebook demonstrating and applying all these techniques. Feel free to run the code and follow along!

Every Machine Learning problem starts with data, such as a list of emails, posts, or tweets. Common sources of textual information include:

“Disasters on Social Media” dataset

For this post, we will use a dataset generously provided by CrowdFlower, called “Disasters on Social Media”, where:

In the rest of this post, we will refer to tweets that are about disasters as “disaster”, and tweets about anything else as “irrelevant”.
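As a minimal sketch of the labeling convention above, the snippet below groups a handful of tweets under the two class names. The example tweets, the `LabeledTweet` type, and the `split_by_label` helper are all made up for illustration; they are not taken from the CrowdFlower dataset or from the accompanying notebook.

```python
from collections import defaultdict
from typing import NamedTuple

# The two class names used in this post.
DISASTER = "disaster"
IRRELEVANT = "irrelevant"


class LabeledTweet(NamedTuple):
    """Hypothetical container for one labeled example."""
    text: str
    label: str


def split_by_label(tweets):
    """Group tweet texts by label, rejecting labels outside the two classes."""
    groups = defaultdict(list)
    for tweet in tweets:
        if tweet.label not in (DISASTER, IRRELEVANT):
            raise ValueError(f"unexpected label: {tweet.label!r}")
        groups[tweet.label].append(tweet.text)
    return dict(groups)


# Made-up examples, NOT rows from the real dataset.
EXAMPLE_TWEETS = [
    LabeledTweet("Wildfire spreading near the highway, residents evacuating", DISASTER),
    LabeledTweet("This new album is fire", IRRELEVANT),
    LabeledTweet("Flood waters rising downtown after the storm", DISASTER),
]

if __name__ == "__main__":
    for label, texts in split_by_label(EXAMPLE_TWEETS).items():
        print(f"{label}: {len(texts)} tweet(s)")
```

Checking labels up front like this is one way to catch stray or misspelled class names before any model sees the data.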

thumbnail courtesy of insightdatascience.com