So I read this paper on an experiment with video CAPTCHAs. In the experiment, a YouTube video is chosen at random, and the user is required to submit three words or tags that relate to the video. The purpose is to relieve the user of the burden of text-based CAPTCHAs. It seems like the authors of the study intend the video CAPTCHA to be an alternative to the text-based one. However, it needs to be just as secure if both are going to be offered side by side.
The experiment has a set of tags that it will accept from the user. There are several criteria for choosing which of the tags already available on YouTube are valid. The most important is how frequently a tag occurs on YouTube. If a tag meets a particular frequency threshold, it is removed from the list of acceptable tags because it would be too easy to guess in an automated attack. Clearly, this aspect of the experiment involves a trade-off. If the list of acceptable tags is very strict, the CAPTCHA is very secure against attack but also becomes very difficult for users to pass. The opposite is true if the list is less strict.
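To make the threshold idea concrete, here is a rough Python sketch of how the filtering could work. This is my own illustration, not the paper's code: the function names, the sample frequencies, the 0.01 threshold, and the rule that one matching tag out of three is enough to pass are all assumptions.

```python
def acceptable_tags(video_tags, tag_frequency, threshold):
    """Keep only the video's tags whose site-wide frequency is below the threshold.

    tag_frequency maps a lowercase tag to the (assumed) fraction of videos using it.
    Tags at or above the threshold are dropped as too easy to guess.
    """
    return {
        tag.lower()
        for tag in video_tags
        if tag_frequency.get(tag.lower(), 0.0) < threshold
    }


def passes_captcha(submitted, video_tags, tag_frequency, threshold):
    """Assumption: the user passes if any submitted tag is in the acceptable set."""
    allowed = acceptable_tags(video_tags, tag_frequency, threshold)
    return any(tag.strip().lower() in allowed for tag in submitted)


# Hypothetical data, made up for illustration only.
tag_frequency = {"music": 0.05, "funny": 0.04, "skateboard": 0.002, "kickflip": 0.0004}
video_tags = ["Music", "skateboard", "kickflip"]

print(acceptable_tags(video_tags, tag_frequency, threshold=0.01))
print(passes_captcha(["dog", "skateboard", "park"], video_tags, tag_frequency, 0.01))
print(passes_captcha(["music", "video", "funny"], video_tags, tag_frequency, 0.01))
```

Lowering the threshold shrinks the acceptable set, which is exactly the trade-off: fewer easy guesses for a bot, but also fewer ways for a human to get it right.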
The results of the experiment were interesting. About 80% of the test subjects found the new system less frustrating. Frustration is a frequent issue for me. Sometimes the current CAPTCHAs are so difficult that I quit out of frustration. The study also found attack success rates under 15%, which is beautiful compared to Microsoft's claim of 60% success with the current system. After adding some code to allow for more accurate string matching and using the most secure choice of acceptable tags, the attack success rates went under 3% while the human success rates were over 75%. That is a remarkable gap to achieve, but we need to take into consideration how the experiment was done.
This all sounds fine and dandy until you realize that the attacks were designed by the same people who built the CAPTCHA. This is where I see the flaw in their logic. They know too much about the specific implementation of the CAPTCHA. The attack they used was a frequency-based attack: it guessed the three most common tags that fall beneath the threshold for elimination. This is a very simplistic attack, as they note. The experiment should have had an independent party create attacks with more advanced methods, such as ones that use the video's audio to generate tags. It is very possible that these CAPTCHAs will be easily defeated. I don't think this new system has much promise, but more research should be done on the success of attacks to confirm that.
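For reference, the frequency-based attack described above is simple enough to sketch in a few lines of Python. Again, this is only my reading of it and not the authors' code: the function name, the sample frequencies, and the 0.01 threshold are hypothetical.

```python
def frequency_attack(tag_frequency, threshold, guesses=3):
    """Guess the most common tags that still fall below the elimination threshold.

    The attacker never looks at the video; it submits the same guesses every time.
    """
    candidates = {tag: freq for tag, freq in tag_frequency.items() if freq < threshold}
    # Sort surviving tags from most to least frequent and take the top few.
    return sorted(candidates, key=candidates.get, reverse=True)[:guesses]


# Hypothetical site-wide tag frequencies, made up for illustration only.
tag_frequency = {
    "music": 0.05,
    "funny": 0.04,
    "dance": 0.009,
    "guitar": 0.008,
    "skateboard": 0.002,
    "kickflip": 0.0004,
}

print(frequency_attack(tag_frequency, threshold=0.01))
```

Notice that the attack needs to know the threshold and the tag frequencies, which is part of why I would like to see what an outside attacker comes up with instead.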
--Keyon