As someone who has worked with data scientists my whole career (and I’m old!), I have seen this frustration time and time again – you want to build the best models possible, but trying to understand and work with messy data can make it difficult to deliver the insights you need. And the problem isn’t just finding the right data sets. It’s also the amount of time you have to spend mining that data without knowing what value you’re going to get. Learning and understanding second- and third-party data, ensuring that it’s at the right level of granularity, and then joining it with all of your first-party data can easily take weeks.
Not only is that work tedious, but you’re also flying blind. It’s usually not until you’re done that you’re able to determine exactly how much value you’ll be able to extract. That means you can spend huge amounts of time feeling like you’re just guessing when you’d probably rather be delivering the kinds of insights that actually drive better business outcomes. It is enough to make you give up on using external datasets altogether, which is a shame because they can be so valuable!
Better, easier models with Project Sandman
To make third-party data easier to use, and help you build more powerful models, we’ve created a tool for data scientists on our Trusted Signals Network (TSN) as part of an Alpha program called Project Sandman. Our goal is to equip you with Signals to improve your models – quickly and painlessly.
Imagine your marketing team wants to increase cart size. So you decide to build a propensity model for an online shoe store chain that’s designed to predict which shoppers are most likely to accept an upsell offer to their shopping cart before checking out. If you were only using first-party data, your insights would be limited to those customers who’d actually done so before. As a result, you wouldn’t be able to predict which other consumers might also be interested in being upsold in this way, because you simply wouldn’t have the right data or enough data to do so.
With Project Sandman, you can easily augment your training/test sets with third-party industry Signals to figure out who else to upsell your products to. In our shoe store example, those Signals might come from the airline industry, where certain travelers regularly like to upgrade to premium economy. Or they could come from the cosmetics industry, where some customers are willing to pay a few extra dollars for adding samples to their cart at checkout. By incorporating third-party Signals like these into your models, you can dramatically increase your models’ predictive capabilities.
5 steps to using Project Sandman
Project Sandman is easy for data scientists to use and lets you de-risk your use of Signals at every step along the way:
1. Log into the UI, where you’ll find an ever-increasing number of Signals at your disposal. At that point, we’ll explain exactly what Signals are and why they’re useful, and help you find some that are relevant for your space.
2. We’ll ask you to share the algorithm type and key performance metrics (e.g., AUC, ROAUC, F1 Score, Recall, Technical Lift) for your latest model. Our team will use this information to assess your model and whether or not we can help.
3. If there’s a good fit, we’ll ask you to send us your existing training and test sets with outcomes, and a validation set without outcomes so our systems can build some models to show improvements.
4. We’ll send our best baseline model (same algorithm with the same training and test sets) and our best TSN model (same algorithm but with your training and test sets boosted with Signals) for you to review.
5. We’ll send you a Signal file so that you can verify the results for yourself.
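To make the baseline-versus-TSN comparison described above a bit more concrete, here is a minimal sketch of how you might check the two models against each other on your own test set. It is not Project Sandman code: the labels and scores are made up for illustration, and `roc_auc` is a hypothetical helper that computes AUC with the standard pairwise-ranking definition using only the Python standard library.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    example gets the higher score; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical test-set outcomes (1 = accepted the upsell) and model scores.
# These numbers are illustrative only and are not Project Sandman results.
labels = [1, 0, 1, 0, 1, 0]
baseline_scores = [0.6, 0.5, 0.4, 0.7, 0.8, 0.2]   # first-party features only
boosted_scores = [0.7, 0.4, 0.45, 0.5, 0.9, 0.1]   # same algorithm, plus Signals

baseline_auc = roc_auc(labels, baseline_scores)
boosted_auc = roc_auc(labels, boosted_scores)

# Absolute AUC gain from adding Signals, on the same test rows.
auc_gain = boosted_auc - baseline_auc
print(f"baseline AUC={baseline_auc:.3f}, boosted AUC={boosted_auc:.3f}, gain={auc_gain:.3f}")
```

The point of the sketch is the setup rather than the metric: both models are scored on the same rows with the same algorithm, so any difference can be attributed to the added Signals. In practice you would likely use your existing evaluation code and whichever metric you shared in the second step.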
Here’s where a big benefit of Project Sandman comes into play. Normally, joining third-party data sets is tedious and time-consuming work. But Project Sandman solves this problem. Whatever level of granularity your test data sets are at when you send them to us, we’ll send you back our Signal file at the exact same level of granularity. That way, you can join the data in minutes rather than days. And you don’t have to create any new artifacts. You can just take something from your existing model pipeline and use it in our system.
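As a rough illustration of why matching granularity makes the join quick, here is a small sketch that joins a test set to a Signal file on a shared key. Everything in it is an assumption for the example: the `customer_id` key, the column names, the CSV layout, and the `join_signals` helper are hypothetical and do not describe the actual Signal file format.

```python
import csv
import io

# Hypothetical file contents; a real Signal file may look different.
TEST_SET_CSV = """customer_id,past_upsells
101,0
102,2
103,1
"""

SIGNAL_CSV = """customer_id,signal_premium_upgrade
101,0.12
102,0.81
103,0.64
"""


def join_signals(rows_csv, signal_csv, key):
    """Join each test-set row to exactly one Signal row on `key`.

    Raises if the Signal file repeats a key or lacks one, which would mean
    the two files are not really at the same level of granularity.
    """
    signal_by_key = {}
    for row in csv.DictReader(io.StringIO(signal_csv)):
        if row[key] in signal_by_key:
            raise ValueError(f"duplicate key in Signal file: {row[key]}")
        signal_by_key[row[key]] = row

    joined = []
    for row in csv.DictReader(io.StringIO(rows_csv)):
        signal = signal_by_key.get(row[key])
        if signal is None:
            raise KeyError(f"no Signal row for key: {row[key]}")
        # Merge the two rows; the shared key column appears once.
        joined.append({**row, **signal})
    return joined


boosted_rows = join_signals(TEST_SET_CSV, SIGNAL_CSV, key="customer_id")
for row in boosted_rows:
    print(row)
```

Because both files have one row per key, the join is a plain one-to-one lookup with no aggregation or re-keying. If you work in pandas, `DataFrame.merge` with `validate="one_to_one"` performs the same uniqueness check in a single call.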
Come join our Alpha to become your organization’s data hero
I am leading an Alpha program right now for this very product and am looking for passionate data scientists who care about improving model performance to join me on this journey. We are focusing first on conversion and churn models. During the 30-day Alpha, you get to use the Signals for free and see the benefit for yourself. You also get to give me feedback, so that we can evolve the direction of the product to suit your needs, and be involved in some really cool science.
With Project Sandman, you can access relevant Signals that are not only easy to use, but will also increase the accuracy of your predictions. You don’t have to invest weeks to see what kind of value you’re going to get. You get to see the value as you go and test it out before you commit. Ultimately, that speeds time to value because you can get into production faster and drive better business outcomes. Finally, since Project Sandman and the TSN use leading privacy-enhancing methods to ensure that Signals are anonymous and that they protect customer and business privacy and confidentiality, your privacy team will love you.
Want to join Alpha? Click here to sign up.