FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs

被引：0

作者：

Darer, Alexander ^{[1
]}

Farnan, Oliver ^{[1
]}

Wright, Joss ^{[2
]}

机构：

[1] Univ Oxford, Dept Comp Sci, Oxford, England

[2] Univ Oxford, Oxford Internet Inst, Oxford, England

来源：

TMA CONFERENCE 2017 - PROCEEDINGS OF THE 1ST NETWORK TRAFFIC MEASUREMENT AND ANALYSIS CONFERENCE | 2017年

基金：

英国工程与自然科学研究理事会;

关键词：

censorship; filtering; DNS; Chinese Internet; search; CHINA; CENSORSHIP;

D O I：

暂无

中图分类号：

TP301 [理论、方法];

学科分类号：

081202 ;

摘要：

Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge. We present a novel framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are used as the basis for web search queries. The resulting URLs of these searches are checked for evidence of filtering, and newly discovered blocked resources will be fed back into the system to detect further filtered content. Our implementation of this framework, applied to China as a case study, shows the approach is demonstrably effective at detecting significant numbers of previously unknown filtered web pages, making a significant contribution to the ongoing detection of internet filtering as it develops. When deployed, this system was used to discover 1355 poisoned domains within China as of Feb 2017-30 times more than in the most widely-used published filter list of the time. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and the search terms that were used to discover them, gives further insight into the nature of the content currently being blocked in China.

引用

页数：9