
Q&A Platforms Evaluated

Using Butler University Q&A Intelligence Index

A Study by the Butler Business Consulting Group


Executive Summary

A new study using the Butler University Q&A Intelligence Index measures how well various mobile Q&A platforms deliver quality, accurate answers in a timely manner across a broad variety of questions. Based on the results of our analysis, ChaCha led all Q&A platforms on mobile devices.

[Figure: BBA-QA-Study graphic]

*Mean accuracy of responses, originally graded on a 5-point scale.

Results of the study are based on a review of a large set of responses from each of the major Q&A platforms, coupled with a comparison of disparate Q&A platforms that serve answers in different ways. Our methodology included the creation of a new metric, the Butler University Q&A Intelligence Index, which measures the likelihood that a user will receive a correct, timely answer to any random question asked in natural language. We asked questions via mobile services and randomized the questions to cover both popular and long-tail knowledge requests.
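The full report details the grading methodology; the description above suggests the index combines answer coverage, accuracy, and timeliness into a single score. The sketch below is a minimal illustration of how such a score might be computed, not the study's actual formula: the Response type, the qa_intelligence_index function, and the normalization of the 5-point grades are assumptions, and the 3-minute cutoff is borrowed from the Quora discussion below.

    from dataclasses import dataclass

    # Illustrative sketch only: the study does not publish its exact formula.
    # Assumed definition: index = (fraction of questions answered in time)
    #                             x (mean normalized accuracy), scaled to 0-100.

    TIME_LIMIT_SECONDS = 180  # the 3-minute window discussed for mobile searches

    @dataclass
    class Response:
        grade: int                 # accuracy grade on the study's 5-point scale (1-5)
        seconds_to_answer: float   # elapsed time before the answer arrived
        answered: bool             # whether the platform returned any answer

    def qa_intelligence_index(responses: list[Response]) -> float:
        """Hypothetical index: answer coverage x mean accuracy, on a 0-100 scale."""
        if not responses:
            return 0.0
        timely = [r for r in responses
                  if r.answered and r.seconds_to_answer <= TIME_LIMIT_SECONDS]
        if not timely:
            return 0.0
        coverage = len(timely) / len(responses)
        # Map 1-5 grades onto 0-1 (1 = wrong, 5 = fully correct).
        accuracy = sum((r.grade - 1) / 4 for r in timely) / len(timely)
        return 100 * coverage * accuracy

    # Example: 6 of 10 questions answered promptly and graded 5/5 -> index of 60
    sample = [Response(5, 60, True)] * 6 + [Response(1, 600, False)] * 4
    print(qa_intelligence_index(sample))  # 60.0

Under this assumed reading, a platform that answers every question but is correct only half the time scores the same as one that answers half the questions but is always correct, which fits the index's stated goal of weighing coverage and correctness together.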

ChaCha delivered the highest-quality responses consistently across the largest group of categories and question types, though it had occasional issues with Objective|Temporal and Sports questions. Ask.com performed best in the single category of questions tagged Objective|Temporal.

Quora was proficient at answering difficult questions that require expert, extensive explanations, but it was generally unable to deliver answers within 3 minutes for most information searches on mobile devices. Quora answered only 24% of the questions, and even when it returned a match, the match often did not include a viable answer.

Siri did not perform nearly as well on this random sampling of popular and long-tail questions as it did in a recent Piper Jaffray study, where results indicated that Siri correctly answered 77% of questions (Elmer-DeWitt, 2012). Our study found that Siri accurately answered only 16% of the questions posed. The variance may be due to the types of questions asked and the testing conditions. Piper Jaffray notes that Siri's biggest strengths are in "local discovery and OS (operating system) commands," which were not highly represented in our study of more mainstream questions.

Google's response rate was 100%, but the first non-sponsored result on the search results page (which often was not fully visible as an organic result on the page presented on a mobile device) provided an accurate answer only about 50% of the time, according to the Butler University Q&A Intelligence Index. On a mobile phone, given the clutter of ads and the likelihood of extra clicks to reach the answer, counting an answer anywhere within the first non-sponsored search result is arguably generous. Again, this study differs from the Piper Jaffray study, but the differences are likely due to variations in methodology. For example, Piper Jaffray found that Google scores highest in terms of navigation and information (Elmer-DeWitt, 2012).

This study's results support the hypothesis that Q&A platforms cannot rely on algorithmic search results alone to deliver quality answers. Search Engine Results Pages (SERP) lack deep semantic understanding of "natural language" human questions, and, therefore, cannot account effectively for long-tail questions like those posed in this study (De Virgilio, Guerra, & Velegrakis, 2012).

To achieve a score above 50% or 60% on the Butler University Q&A Intelligence Index, it would appear that Q&A platforms must supplement algorithmic document indexing with either:

  • Utilization of structured data
  • Semantic understanding via artificial intelligence (AI) or real humans

In terms of handling structured data more effectively, Google is promoting direct answers using its new Knowledge Graph and Google Now technology, "which tap into the collective intelligence of the web and understand the world a bit more like people do" (Google, 2012). The limits of Google's algorithmic technologies are evident in the empirical results of this study and in users' actual experiences. Other Q&A platforms in this study are also incorporating similar algorithmic solutions.

Improved machine learning may eventually push past the algorithmic limitations of document analysis. The efforts of DeepQA (IBM's Watson on Jeopardy) proved that intensive semantic processing can help, albeit without a cost-efficient ability to scale using today's systems. While Ferrucci notes that "Computers cannot ground words to human experiences to derive meaning," Watson showed potential. The DeepQA project claimed 90% accuracy when answering 60% of questions and 70% accuracy when answering 100% of questions (Ferrucci, 2010). These results translate to Butler University Q&A Intelligence Index scores of 54 and 70, respectively.
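Assuming the index is simply accuracy multiplied by the fraction of questions answered, scaled to 100 (a reading that reproduces the reported scores), the conversion is straightforward:

    # DeepQA operating points quoted above, converted to index scores under the
    # assumption that index = accuracy x answer rate x 100.
    print(round(0.90 * 0.60 * 100))  # 54 -> index score of 54
    print(round(0.70 * 1.00 * 100))  # 70 -> index score of 70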

We conclude that, without large advances in semantic processing and better utilization of knowledge graphs, Q&A platforms can benefit from the timely injection of human semantic understanding into the Q&A experience. ChaCha's top score on the Butler University Q&A Intelligence Index is an indication that just-in-time human-assisted Q&A can outperform algorithm-only solutions.

Read the full report