Thursday, September 30, 2010

Which Words Does Google Instant Blacklist?

Some folks at the hacker publication 2600 decided to compile a list of words that are restricted by Google Instant.
Except in extreme and special cases, Google is known for anything but censorship. As we’ve said before, though, there are some terms the web giant’s new instant search feature won’t work with.
We understand Google’s intentions; the team over there is trying to make sure that no one sees pornographic or violent results they might find disturbing unless they really mean to search for them. When asked about this feature a few weeks ago, Google’s Johanna Wright said the restrictions are in place to protect children.
But Google has opened itself up to a potential PR problem, because some of these omissions will be at best bewildering and at worst offensive to particularly sensitive (or progressive) users who don’t understand how Google Instant actually works.
For example, “bisexual” and “lesbian” are among the restricted words. Type them into Google and the instant search will immediately stop delivering new results. You have to hit enter to confirm that, yes, you really do want to know about something in some way related to bisexuals or lesbians.

Why Did Google Block These Words?


You can still search for these terms. The issue is that when you type them, Google Instant stops reporting results on the fly, and you must hit “enter” to see results.
That happens because Google Instant doesn’t just use what you’ve typed to display results. It reads data collected over the years about previous users’ searches to predict what you’re going to type. It’s the same algorithm that handles auto-complete, or the Google Suggest pop-ups in the old, not-so-instant Google search. Results for the exact text that you’ve typed only display after you’ve hit enter.
When results fail to appear after you’ve typed “lesbian” or “butt,” it’s not because the results are being censored. Google is struggling to prevent the text of offensive searches users have made in the past (there have been other controversies on this subject before) from jumping up in front of you when you’re looking for something innocuous.
Since countless users may have followed the word lesbian with “porn,” generating results inappropriate for children, Google’s algorithm has decided not to immediately throw 20 links to lesbian porn sites in your face when you type “lesbian,” even if that’s the most common search based on the algorithmic data.
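To make the mechanism above concrete, here is a minimal sketch in Python of prediction followed by suppression. It is a deliberate simplification and not Google's implementation: the query log, its counts, the flagged-term set and the function names are all made up for illustration, and Google itself says the real system is more involved than a simple term list.

```python
from collections import Counter

# Hypothetical log of past searches with made-up frequencies.
PAST_QUERIES = Counter({
    "weather today": 50,
    "weather radar": 30,
    "lesbian porn": 40,
    "lesbian rights": 10,
})

# Hypothetical terms that make a predicted query unsuitable to show unprompted.
FLAGGED_TERMS = {"porn"}


def predict(prefix):
    """Return the most frequent past query that starts with prefix, or None."""
    matches = {q: n for q, n in PAST_QUERIES.items() if q.startswith(prefix)}
    if not matches:
        return None
    return max(matches, key=matches.get)


def instant_query(typed):
    """Return the query to show instant results for, or None to wait for Enter."""
    prediction = predict(typed)
    if prediction is None:
        return None
    if any(term in prediction.split() for term in FLAGGED_TERMS):
        # The top prediction is flagged, so show nothing until the user hits Enter.
        return None
    return prediction


print(instant_query("weath"))
print(instant_query("lesbian"))
```

In this toy version the typed word itself is never blocked. Instant results stop only because the most common completion in the log is flagged, which matches the behavior described above.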
When we contacted Google for comment, we received this statement from a spokesperson:
“There are a number of reasons you may not be seeing search queries for a particular topic. Among other things, we apply a narrow set of removal policies for pornography, violence, and hate speech. It’s important to note that removing queries from Autocomplete is a hard problem, and not as simple as blacklisting particular terms and phrases.
In search, we get more than one billion searches each day. Because of this, we take an algorithmic approach to removals, and just like our search algorithms, these are imperfect. We will continue to work to improve our approach to removals in Autocomplete, and are listening carefully to feedback from our users.
Our algorithms look not only at specific words, but compound queries based on those words, and across all languages. So, for example, if there’s a bad word in Russian, we may remove a compound word including the transliteration of the Russian word into English. We also look at the search results themselves for given queries. So, for example, if the results for a particular query seem pornographic, our algorithms may remove that query from Autocomplete, even if the query itself wouldn’t otherwise violate our policies. This system is neither perfect nor instantaneous, and we will continue to work to make it better.”
Google’s highly effective SafeSearch algorithm still applies to instant search results. SafeSearch can filter out potentially offensive search results quite effectively after a user has hit “enter” — the first page of results for “lesbian” with moderate safe search enabled is completely innocuous — and it works for searches in progress too.
Google’s current implementation is far from perfect — the company rep admitted that. If nothing else, we’d like to see Google manually re-enter safe suggestions for some common terms that have been restricted because they’re sometimes connected with sexual, violent or hateful results.
The rep told us that Google is working on improving the system, but wouldn’t give us any specifics about future changes. In the meantime, check out the complete list at 2600 if you’re curious.
[Via Nerve]
