How well is Hunch working so far?
We have generally been very happy with the press and blog coverage of Hunch over the past few weeks (for a sampling, see our press page). In the few disappointing reviews the reviewers have either:
1) misunderstood what Hunch is trying to do, e.g. thinking Hunch is a search engine, another “answers” site, or is only useful for people with chronic indecisiveness, or
2) tried a few topics which Hunch got wrong. They then inferred from that small sample that Hunch is wrong most of the time.
How well do we think Hunch is working? Pretty well, although we still have a ways to go. In addition to needing to grow the number of topics, results, and questions, we also have to improve the accuracy of our existing topics.
The best metric we have for the accuracy of Hunch’s results is what we call the “success rate,” which we define as the percentage of topic plays where a user clicks “Yes” to one of the top 3 results AND doesn’t click “No” to any of them. We think it’s safe to assume that in those cases the user was pretty happy with the experience.
This isn’t a perfect metric. For example, users leave feedback on only 40% of gameplays – so we don’t know whether they had a good experience the other 60% of the time. But it’s a pretty good metric – for example, it seems to be high in topics that, anecdotally, users seem to like and low in topics that they don’t.
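To make the definition concrete, here is a minimal sketch of how a success rate like this could be computed. It is not Hunch's actual code: the `TopicPlay` structure and the function names are hypothetical, and it assumes that plays with no feedback on the top 3 results are left out of the denominator, which the definition above doesn't spell out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TopicPlay:
    # Hypothetical record of one topic play: feedback on ranked results,
    # keyed by 1-based rank, with value "yes" or "no". Ranks the user
    # did not rate are simply absent.
    feedback: Dict[int, str] = field(default_factory=dict)


def is_success(play: TopicPlay, top_n: int = 3) -> bool:
    """True if the user said "yes" to at least one of the top results
    and did not say "no" to any of them."""
    top = [vote for rank, vote in play.feedback.items() if rank <= top_n]
    return "yes" in top and "no" not in top


def success_rate(plays: List[TopicPlay], top_n: int = 3) -> Optional[float]:
    """Fraction of rated plays that count as successes.

    Assumption: plays without any feedback on the top results are
    excluded, since we can't tell how those went.
    """
    rated = [p for p in plays if any(rank <= top_n for rank in p.feedback)]
    if not rated:
        return None
    return sum(is_success(p, top_n) for p in rated) / len(rated)


plays = [
    TopicPlay({1: "yes"}),            # success
    TopicPlay({1: "yes", 2: "no"}),   # a "no" in the top 3: not a success
    TopicPlay({}),                    # no feedback: excluded
    TopicPlay({3: "yes"}),            # success
    TopicPlay({2: "no"}),             # not a success
]
rate = success_rate(plays)
```

If unrated plays were counted as failures instead, the same data would give a much lower number, which is why the choice of denominator matters when comparing rates.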
When we first launched the preview site, the success rate was about 70% (note that we also only had about 500 topics that our staff and friends had created). Today it is about 81% (with about 3,500 topics). This is based on a total of 1,617,450 topic plays and about 1,343,654 feedbacks – enough data to give us confidence in the numbers.
Our long term goal is to get the overall success rate above 95%. How can we do this? A bunch of ways, some of which we’ve implemented, some of which are in progress, and some of which we haven’t figured out yet.
- Statistical learning. As Hunch gets more data, it is better able to predict results for you from the THAY (Teach Hunch About You) questions you’ve answered. We think we are starting to get enough data to make some really powerful, statistically significant predictions. We have had about 20 million THAY questions answered and 3.8M result feedbacks, or approximately 62 feedbacks per result in the system (there are currently 63,119 results).
- More accurate question importances. You may or may not have seen this feature, but every question has an importance level (High, Medium, Low) that determines how heavily the user’s response affects the results. Initially these importance levels are (optionally) set by the question creator. About a month ago we created the “Hunch Importance Bot,” which looks at user feedback and adjusts importances to get a higher success rate. Our long term goal is to develop a mechanism for personalizing importances per user.
- Prioritizing users’ preferences. One common source of error is in cases where Hunch is forced to make trade-offs between high-importance questions. In the “Which new car should I buy?” topic, suppose you ask for an SUV under $18,000. The problem is there aren’t any SUVs (at least in Hunch right now) that are under $18,000. So Hunch looks at past user feedback and decides whether to prioritize car type over price or vice versa. Just yesterday we released a new feature we hope will help with this problem. When you get to the end of a topic play where Hunch is forced to make a trade-off, it asks you which you care about more.
- User contributions. The most important way Hunch gets better is through user contributions. If you look at the activity feed, you’ll see this is happening at an impressive rate and quality level.
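To illustrate how question importances and the trade-off question could interact, here is a rough sketch using the SUV example. This is an illustration only, not how Hunch actually ranks results: the numeric weights, the `boost` factor, and all names below are assumptions made up for the example.

```python
from typing import Dict, List, Optional

# Hypothetical numeric weights for the three importance levels.
IMPORTANCE_WEIGHT = {"High": 3.0, "Medium": 2.0, "Low": 1.0}


def score(result_attrs: Dict[str, str],
          answers: Dict[str, str],
          importances: Dict[str, str],
          priority: Optional[str] = None,
          boost: float = 2.0) -> float:
    """Sum the weights of the questions whose answer the result matches.

    If the user named a priority question, its weight is multiplied by
    `boost` (an assumed value) so it wins a trade-off.
    """
    total = 0.0
    for question, answer in answers.items():
        if result_attrs.get(question) == answer:
            weight = IMPORTANCE_WEIGHT[importances[question]]
            if question == priority:
                weight *= boost
            total += weight
    return total


def rank(results: Dict[str, Dict[str, str]],
         answers: Dict[str, str],
         importances: Dict[str, str],
         priority: Optional[str] = None) -> List[str]:
    """Return result names ordered from best to worst score."""
    return sorted(
        results,
        key=lambda name: score(results[name], answers, importances, priority),
        reverse=True,
    )


# The user wants an SUV under $18,000, but no single result matches both.
answers = {"type": "SUV", "price": "under $18,000"}
importances = {"type": "High", "price": "High"}
results = {
    "pricier SUV": {"type": "SUV", "price": "over $18,000"},
    "cheap sedan": {"type": "sedan", "price": "under $18,000"},
}

prefers_type = rank(results, answers, importances, priority="type")
prefers_price = rank(results, answers, importances, priority="price")
```

Without a stated priority, both results tie at the same score, which is the situation where asking the user which they care about more becomes useful.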
Our ultimate goal is that, for any decision you are making, you get the same result in 2 minutes on Hunch as you would from hours of research on the web. We know we aren’t there yet, but we think we will get there and are heading in the right direction.
If you have any feedback about your experiences with Hunch’s accuracy or suggestions for improvements, here is a forum thread to discuss.