FilteredWeb: A Framework for the Automated Search-Based Discovery of Blocked URLs

Cited by: 0
Authors
Darer, Alexander [1]
Farnan, Oliver [1]
Wright, Joss [2]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford, England
[2] Univ Oxford, Oxford Internet Inst, Oxford, England
Source
TMA CONFERENCE 2017 - PROCEEDINGS OF THE 1ST NETWORK TRAFFIC MEASUREMENT AND ANALYSIS CONFERENCE | 2017
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
censorship; filtering; DNS; Chinese Internet; search; China
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Various methods have been proposed for creating and maintaining lists of potentially filtered URLs to allow for measurement of ongoing internet censorship around the world. Whilst testing a known resource for evidence of filtering can be relatively simple, given appropriate vantage points, discovering previously unknown filtered web resources remains an open challenge. We present a novel framework for automating the process of discovering filtered resources through the use of adaptive queries to well-known search engines. Our system applies information retrieval algorithms to isolate characteristic linguistic patterns in known filtered web pages; these are used as the basis for web search queries. The URLs returned by these searches are checked for evidence of filtering, and newly discovered blocked resources are fed back into the system to detect further filtered content. Our implementation of this framework, applied to China as a case study, shows that the approach is effective at detecting significant numbers of previously unknown filtered web pages, contributing to the ongoing detection of internet filtering as it develops. When deployed, this system was used to discover 1355 poisoned domains within China as of February 2017, 30 times more than in the most widely used published filter list of the time. Of these, 759 are outside of the Alexa Top 1000 domains list, demonstrating the capability of this framework to find more obscure filtered content. Further, our initial analysis of filtered URLs, and of the search terms that were used to discover them, gives additional insight into the nature of the content currently being blocked in China.
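The feedback loop described in the abstract (extract characteristic terms from known filtered pages, search for them, test the resulting URLs, and feed new hits back in) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not name the information retrieval algorithm, so TF-IDF is assumed here, and `search`, `fetch`, and `is_blocked` are hypothetical callables standing in for a search engine client, a page downloader, and a filtering test from a suitable vantage point.

```python
import math
import re
from collections import Counter, deque
from typing import Callable, Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms(page: str, background: List[str], k: int = 3) -> List[str]:
    """Return up to k terms of `page` with the highest TF-IDF score.

    TF-IDF is an assumption of this sketch; the abstract only says
    "information retrieval algorithms".
    """
    tf = Counter(tokenize(page))
    docs = [set(tokenize(d)) for d in background]
    scores = {}
    for term, count in tf.items():
        df = sum(term in d for d in docs)
        idf = math.log((1 + len(docs)) / (1 + df))
        if idf > 0:  # drop terms that appear in every background document
            scores[term] = count * idf
    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def discover(
    seeds: Dict[str, str],
    background: List[str],
    search: Callable[[str], Iterable[str]],   # hypothetical: query -> URLs
    fetch: Callable[[str], str],              # hypothetical: URL -> page text
    is_blocked: Callable[[str], bool],        # hypothetical: URL -> filtered?
    k: int = 3,
    max_queries: int = 10,
) -> List[str]:
    """Breadth-first discovery of blocked URLs starting from known ones."""
    queue = deque(seeds.items())
    seen = set(seeds)
    found: List[str] = []
    queries = 0
    while queue and queries < max_queries:
        _, text = queue.popleft()
        query = " ".join(top_terms(text, background, k))
        if not query:
            continue  # nothing characteristic to search for
        queries += 1
        for candidate in search(query):
            if candidate in seen:
                continue
            seen.add(candidate)
            if is_blocked(candidate):
                found.append(candidate)
                # Feed the newly discovered resource back into the loop.
                queue.append((candidate, fetch(candidate)))
    return found
```

The tokenizer above only handles ASCII text; applying the same idea to Chinese-language pages would require a word segmenter, which is outside the scope of this sketch.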
Pages: 9