from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/QKNnwLL991c" frameborder="0" allowfullscreen></iframe>')
Filters over-block and under-block (make Type I and II errors).
Population of pages matters. What's relevant?
Internet largely mediated by search engines.
(Pages users do find.)
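A minimal sketch of how under-blocking and over-blocking rates can be computed from a filter test on pages with known labels. The class name, field names, and counts are hypothetical and chosen only for illustration; the definitions assume that under-blocking is the share of adult pages a filter lets through and over-blocking is the share of clean pages it blocks.

```python
from dataclasses import dataclass


@dataclass
class FilterTest:
    """Counts from testing one filter on pages with known labels (hypothetical)."""
    adult_blocked: int   # adult pages the filter blocked (correct)
    adult_allowed: int   # adult pages the filter let through (under-blocking)
    clean_blocked: int   # clean pages the filter blocked (over-blocking)
    clean_allowed: int   # clean pages the filter let through (correct)

    def underblocking_rate(self) -> float:
        """Fraction of adult pages that were not blocked."""
        return self.adult_allowed / (self.adult_blocked + self.adult_allowed)

    def overblocking_rate(self) -> float:
        """Fraction of clean pages that were blocked."""
        return self.clean_blocked / (self.clean_blocked + self.clean_allowed)


# Made-up counts, for illustration only.
test = FilterTest(adult_blocked=90, adult_allowed=10,
                  clean_blocked=50, clean_allowed=950)
print(f"underblocking: {test.underblocking_rate():.1%}")
print(f"overblocking:  {test.overblocking_rate():.1%}")
```

Both rates are conditional on the true label, so they do not depend on how common adult pages are in the population; the mix of blocked pages a user actually sees does.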
Team at CRA International attempted to view and categorize
68,150 webpages of which 63,105 worked.
60,833 Category 1a: no reference to sex and no nudity.
1,382 Category 5f: adult entertainment.
890 in other categories, e.g., show genitalia in an artistic or
educational context.
I drew random samples of the Category 1a pages to test filters.
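The category counts above can be checked with a few lines of arithmetic. This is only a sanity check on the numbers quoted in the slides; the dictionary labels are shorthand, not official category names.

```python
attempted = 68_150
working = 63_105

# Counts of working pages by category, as quoted above.
counts = {
    "1a: no reference to sex, no nudity": 60_833,
    "5f: adult entertainment": 1_382,
    "other categories": 890,
}

# The three groups should account for every working page.
assert sum(counts.values()) == working

shares = {name: n / working for name, n in counts.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of working pages")
print(f"working pages: {working / attempted:.1%} of pages attempted")
```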
result | Google index | MSN index | AOL, MSN, Y! searches | Wordtracker searches |
---|---|---|---|---|
pages in sample | 11,100 | 39,999 | 22,405 | 206 million |
working pages in sample | 10,009 | 36,557 | 21,870 | 195 million |
queries in population | | | 1.3 billion | 20.6 million |
queries in sample | | | 2,345 | 20.6 million |
Source | Google index | MSN index | AOL, MSN, Y! searches | Wordtracker searches |
---|---|---|---|---|
adult webpages | 1.1% | 1.1% | 1.7% | 14.1% |
domestic adult webpages | 44.2% | 56.7% | 88.4% | 87.4% |
searches with adult results | | | 6.0% | 37.1% |
searches with domestic adult results | | | 5.7% | 37.0% |
bound | Google index | MSN index | AOL, MSN, Y! searches |
---|---|---|---|
adult | 1.0% | 1.0% | 2.5% |
domestic adult | 0.4% | 0.5% | 2.2% |
Filter | Underblocking, Google index | Underblocking, MSN index | Overblocking, Google index | Overblocking, MSN index |
---|---|---|---|---|
AOL Mature Teen | 8.9% | 8.6% | 22.6% | 23.6% |
MSN Pornography | 16.8% | 18.7% | 19.6% | 10.3% |
MSN Teen | 17.7% | 20.5% | 21.9% | 18.9% |
ContentProtect Default | 38.3% | 45.4% | 2.8% | 3.0% |
ContentProtect Custom | 28.3% | 46.7% | 1.4% | 0.7% |
CyberPatrol Custom | 31.0% | 33.5% | 1.4% | 0.9% |
CyberSitter Default | 12.7% | 16.5% | 3.6% | 4.1% |
CyberSitter Custom | 12.4% | 18.9% | 4.0% | 3.7% |
McAfee Young Teen | 16.1% | 26.0% | 12.4% | 13.2% |
Net Nanny Level 2 | 44.0% | 46.1% | 3.3% | 2.2% |
Norton Default | 60.2% | 54.9% | 1.4% | 0.7% |
Norton Custom | 58.4% | 54.2% | 0.9% | 0.4% |
Verizon | 41.8% | 40.3% | 9.4% | 5.7% |
8e6 | 18.3% | 23.0% | 9.4% | 7.5% |
SafeEyes | 16.2% | 15.2% | 3.3% | 3.2% |
Filter | underblocking, Google index | underblocking, MSN index | overblocking, Google index | overblocking, MSN index |
---|---|---|---|---|
AOL Mature Teen | 5.6% | 6.5% | 18.4% | 21.0% |
MSN Pornography | 12.1% | 15.7% | 15.8% | 8.5% |
MSN Teen | 12.8% | 17.4% | 17.8% | 16.6% |
ContentProtect Default | 31.3% | 41.3% | 1.5% | 2.1% |
ContentProtect Custom | 22.2% | 42.6% | 0.6% | 0.4% |
CyberPatrol Custom | 24.6% | 29.7% | 0.6% | 0.5% |
CyberSitter Default | 8.6% | 13.6% | 2.1% | 3.1% |
CyberSitter Custom | 8.4% | 15.9% | 2.4% | 2.7% |
McAfee Young Teen | 11.4% | 22.5% | 9.3% | 11.3% |
Net Nanny Level 2 | 36.8% | 41.9% | 1.9% | 1.5% |
Norton Default | 52.9% | 50.7% | 0.6% | 0.4% |
Norton Custom | 51.1% | 50.1% | 0.4% | 0.2% |
Verizon | 34.7% | 36.2% | 6.7% | 4.4% |
8e6 | 13.1% | 19.6% | 6.7% | 6.0% |
SafeEyes | 11.4% | 12.3% | 1.9% | 2.3% |
Filter | Google index | MSN index |
---|---|---|
AOL Mature Teen | 40.0% | 40.6% |
MSN Pornography | 31.6% | 42.9% |
MSN Teen | 40.0% | 37.7% |
ContentProtect Default | 39.0% | 45.8% |
ContentProtect Custom | 40.6% | 47.1% |
CyberPatrol Custom | 48.6% | 44.0% |
CyberSitter Default | 50.0% | 32.8% |
CyberSitter Custom | 57.1% | 36.2% |
McAfee Young Teen | 44.4% | 37.5% |
Net Nanny Level 2 | 41.7% | 48.1% |
Norton Default | 35.3% | 49.3% |
Norton Custom | 36.4% | 49.7% |
Verizon | 37.0% | 42.4% |
8e6 | 42.1% | 46.8% |
SafeEyes | 35.3% | 40.4% |
filter | underblocking results | overblocking results | domestic underblocking | underblocking queries | 95% CL |
---|---|---|---|---|---|
AOL Mature Teen | 6.2% | 12.5% | 57.0% | 15.6% | 5.3% |
MSN Pornography | 21.4% | 4.4% | 86.1% | 32.3% | 20.9% |
MSN Teen | 20.8% | 5.8% | 91.9% | 28.1% | 18.8% |
ContentProtect Default | 18.4% | 6.4% | 70.1% | 46.2% | 10.0% |
ContentProtect Custom | 20.4% | 0.0% | 62.1% | 42.2% | 25.4% |
CyberPatrol Custom | 34.6% | 0.4% | 94.9% | 65.6% | 24.4% |
CyberSitter Default | 11.2% | 4.6% | 33.8% | 23.2% | 11.2% |
CyberSitter Custom | 10.0% | 5.3% | 44.1% | 20.1% | 8.1% |
McAfee Young Teen | 14.2% | 20.7% | 80.7% | 30.9% | 10.4% |
Net Nanny Level 2 | 28.1% | 3.7% | 79.4% | 36.6% | 20.8% |
Norton Default | 42.1% | 0.8% | 85.3% | 51.6% | 49.3% |
Norton Custom | 43.4% | 0.0% | 85.6% | 56.1% | 54.3% |
Verizon | 23.1% | 1.3% | 80.9% | 41.6% | 31.4% |
8e6 | 7.3% | 7.5% | 78.0% | 23.4% | 11.7% |
SafeEyes | 13.7% | 1.9% | 87.8% | 29.8% | 14.9% |
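
The slides do not say how the "95% CL" column was computed. As a rough illustration of what a one-sided confidence limit for a proportion can look like, here is a sketch of an exact binomial lower bound under simple random sampling. The function names and the example counts are hypothetical, and the actual analysis may well have used a different method (for example, one that accounts for query weights).

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    below = 0.0
    for i in range(k):  # accumulate P(X <= k - 1)
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1)
                    - math.lgamma(n - i + 1) + i * log_p + (n - i) * log_q)
        below += math.exp(log_term)
    return max(0.0, 1.0 - below)


def lower_confidence_bound(k: int, n: int, alpha: float = 0.05) -> float:
    """Largest p for which seeing k or more successes in n trials has chance < alpha."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the upper tail increases with p
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical example: 30 of 100 sampled queries returned an unblocked adult result.
bound = lower_confidence_bound(30, 100)
print(f"95% lower confidence bound: {bound:.1%}")
```

The bound is always below the observed proportion (30% in the made-up example), and it moves closer to that proportion as the sample grows.
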
filter | underblocking results | overblocking results | domestic underblocking | underblocking queries |
---|---|---|---|---|
AOL Mature Teen | 1.3% | 19.6% | 69.2% | 4.3% |
MSN Pornography | 2.7% | 13.3% | 86.1% | 8.2% |
MSN Teen | 2.6% | 13.7% | 83.1% | 8.3% |
ContentProtect Default | 7.5% | 12.4% | 84.1% | 23.1% |
ContentProtect Custom | 8.1% | 7.8% | 84.9% | 25.3% |
CyberPatrol Custom | 3.9% | 9.2% | 86.4% | 10.1% |
CyberSitter Default | 1.4% | 19.9% | 69.3% | 5.1% |
CyberSitter Custom | 2.9% | 18.2% | 84.0% | 9.4% |
McAfee Young Teen | 2.8% | 32.8% | 70.7% | 9.3% |
Net Nanny Level 2 | 12.6% | 9.5% | 82.9% | 34.4% |
Norton Default | 9.9% | 4.8% | 79.4% | 25.2% |
Norton Custom | 10.2% | 2.9% | 79.4% | 25.9% |
Verizon | 4.4% | 16.1% | 67.9% | 15.0% |
8e6 | 3.4% | 25.1% | 93.0% | 10.3% |
SafeEyes | 2.0% | 16.5% | 96.6% | 6.4% |
of the clean webpages in the indexes.
in Google or MSN search index
Less restrictive filters blocked as little as 40% of the adult pages.
The most restrictive filter blocked about 94% of the adult pages among search results, but it also blocked about 13% of the clean search results.
On average, it blocked about 7.6 clean results for every adult result it blocked.
For the most popular queries, the most restrictive filter blocked over 98% of adult results, but it also blocked about 20% of clean results.
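The ratio of clean to adult results blocked follows from the prevalence of adult pages together with the two error rates. The sketch below plugs in rounded figures from the tables above: 1.7% adult pages among AOL, MSN, and Y! search results, and 6.2% underblocking with 12.5% overblocking for AOL Mature Teen. Pairing those particular rows with the "most restrictive filter" statement is an assumption, and the function name is hypothetical.

```python
def clean_blocked_per_adult_blocked(adult_share: float,
                                    underblocking: float,
                                    overblocking: float) -> float:
    """Expected number of clean results blocked per adult result blocked."""
    adult_blocked = adult_share * (1.0 - underblocking)   # adult and blocked
    clean_blocked = (1.0 - adult_share) * overblocking    # clean and blocked
    return clean_blocked / adult_blocked


# Rounded inputs taken from the tables; treat the pairing as an assumption.
ratio = clean_blocked_per_adult_blocked(adult_share=0.017,
                                        underblocking=0.062,
                                        overblocking=0.125)
print(f"clean results blocked per adult result blocked: {ratio:.1f}")
```

With these rounded inputs the ratio comes out at about 7.7, in the neighborhood of the 7.6 quoted; the gap may reflect rounding or weighting in the original calculation. Because adult pages are rare, even a modest overblocking rate means most blocked results are clean.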
Data Source | Percentage |
---|---|
Google index | 90.3% |
MSN index | 89.8% |
AOL, MSN & Y! queries | 88.2% |
Wordtracker queries | 95.9% |
Estimated percentage of nominally free adult foreign webpages that have commercial ties to the United States, based on data provided by CRA International. Estimates for query results take into account query weights.
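The slides do not spell out how query weights enter the estimates. One common form is a weighted average, where each sampled query contributes in proportion to how often it is issued. The sketch below assumes that form; the function name, weights, and per-query shares are all made up for illustration.

```python
def weighted_share(weights, shares):
    """Weighted average of per-query shares, with weights such as query frequency."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * s for w, s in zip(weights, shares)) / total


# Hypothetical data: how often each query is issued, and the share of its
# adult foreign results that have commercial ties to the United States.
query_weights = [1000, 200, 50]
share_with_us_ties = [0.9, 0.8, 1.0]

estimate = weighted_share(query_weights, share_with_us_ties)
unweighted = sum(share_with_us_ties) / len(share_with_us_ties)
print(f"weighted: {estimate:.1%}, unweighted: {unweighted:.1%}")
```

Weighting makes popular queries count for more, so the weighted and unweighted figures can differ when popular and rare queries return different kinds of pages.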
Reference | Year | Sample type | Quantitative | Source of pages |
---|---|---|---|---|
eTesting Labs | 2001 | convenience | yes | searches on Google |
eTesting Labs | 2002 | convenience | yes | searches on Google; DMOZ |
NetAlert | 2001 | quota | yes | unknown |
PC Magazine | 2004 | unknown | no | unknown |
Consumer Reports | 2005 | convenience | no | unknown |
Rulespace depo | 2006 | convenience | yes | unknown |
eTesting 1: Google search for "free adult sex."
eTesting 2: Added DMOZ; took sample of results.
NetAlert: at most 30 webpages.
2/3 of adult membership websites are in US
lycos.fr, lycos.co.uk, com.ar, com.au, com.br, co.hu, co.il, co.kr, com.mx, co.nz, com.pl, com.pt, com.tw, com.ua, co.uk, com.ve, co.yu, co.za
Surprising outcry: thought the suit enabled DOJ to get personal info. Of course,
well now good for you -- instead of teaching parents/caregivers of minors how to block unwanted porn sites you have given this administration an EXCUSE to peruse search engine data bases.
enough erosion of civil liberties
Dorothy Grimes
earthchildren@comcast.net
Heartwood Books Heartwood@cstone.net to stark, 1/20/06
Dear Professor Stark,
The Google user is an actual person, not just a statistic, and your attempt to expose my personal information (even buried in a large quantity of data) is at best short sighted on your part. It is also annoying. It is absolutely NONE OF YOUR BUSINESS what I search for on Google.
I am aware of the fact that some people (especially the young) seem to place no value on privacy. But this is not the case for everyone. Do you think for a minute that the government will be satisfied with "anonymous" data if it sees "suspicious" patterns? Using statistical methods to identify criminals has enormous potential for misuse. Look at the early use of genetics that produced eugenics. Before you accept your next consulting fee, stop and talk with someone about the ethics of your work.
Even if you do not value your personal privacy in this matter, ask yourself if you would want the public or the government examining all of your communication or internet use. When the government gains the right to watch our private non-criminal lives, this power will not exist only for the current well meaning Bush administration but will be available for the next Bush, Clinton or Nixon as well.
It is absolutely NONE OF YOUR BUSINESS what I search for on Google. It is none of my business whether the baseball cap just looks cute or is hiding thinning hair. Some things are private.
Paul Collinge
Heartwood Books 5 Elliewood Ave. Charlottesville, Va. 22903 434 295 7083