Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text
Cited by: 2
Authors:
Srinath, Mukund [1]
Matheson, Lee [2]
Venkit, Pranav Narayanan [1]
Zanfir-Fortuna, Gabriela [2]
Schaub, Florian [3]
Giles, C. Lee [1]
Wilson, Shomir [1]
Affiliations:
[1] Penn State Univ, University Pk, PA 16802 USA
[2] Future Privacy Forum, Washington, DC USA
[3] Univ Michigan, Ann Arbor, MI 48109 USA
Source:
PROCEEDINGS OF THE 2023 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, DOCENG 2023 | 2023
Funding:
U.S. National Science Foundation;
Keywords:
privacy policy;
date extraction;
crawling;
DOI:
10.1145/3573128.3609342
Chinese Library Classification (CLC) number:
TP18 [Artificial intelligence theory];
Discipline classification codes:
081104 ;
0812 ;
0835 ;
1405 ;
Abstract:
The General Data Protection Regulation (GDPR) and other recent privacy laws require organizations to post their privacy policies, and place specific expectations on organizations' privacy practices. Privacy policies take the form of documents written in natural language, and one of the expectations placed upon them is that they remain up to date. To investigate legal compliance with this recency requirement at a large scale, we create a novel pipeline that includes crawling, regex-based extraction, candidate date classification, and date object creation to extract updated and effective dates from privacy policies written in English. We then analyze patterns in policy dates using four web crawls and find that only about 40% of privacy policies online contain a date, thereby making it difficult to assess their regulatory compliance. We also find that updates in privacy policies are temporally concentrated around the passage of laws regulating digital privacy (such as the GDPR), and that more popular domains are more likely both to have policy dates and to update their policies regularly.
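The abstract names three post-crawl stages: regex-based extraction, candidate date classification, and date object creation. The sketch below illustrates one way such stages could fit together. It is not the authors' implementation: the regular expression, the cue phrases, the keyword-based classifier, and the names `DATE_RE`, `classify`, and `extract_policy_dates` are all assumptions made for illustration, and the actual pipeline may use a different (for example, learned) classifier.

```python
import re
from datetime import date

# Month names mapped to month numbers (English-only, as in the abstract's scope).
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
MONTH_RE = "|".join(MONTHS)

# Hypothetical candidate pattern: "May 25, 2018" or "25 May 2018".
DATE_RE = re.compile(
    rf"\b(?:(?P<m1>{MONTH_RE})\s+(?P<d1>\d{{1,2}}),?\s+(?P<y1>\d{{4}})"
    rf"|(?P<d2>\d{{1,2}})\s+(?P<m2>{MONTH_RE})\s+(?P<y2>\d{{4}}))\b",
    re.IGNORECASE,
)

# Hypothetical cue phrases; a keyword rule stands in for the classifier.
UPDATED_CUES = ("last updated", "last modified", "last revised", "updated on")
EFFECTIVE_CUES = ("effective date", "effective as of", "effective on", "effective")


def classify(context):
    """Label a candidate by cues in the same sentence, or return None."""
    # Keep only the text after the last period or newline before the date.
    sentence = re.split(r"[.\n]", context)[-1].lower()
    if any(cue in sentence for cue in UPDATED_CUES):
        return "updated"
    if any(cue in sentence for cue in EFFECTIVE_CUES):
        return "effective"
    return None


def extract_policy_dates(text, window=40):
    """Return (label, date) pairs for candidates that carry a cue phrase."""
    results = []
    for m in DATE_RE.finditer(text):
        label = classify(text[max(0, m.start() - window):m.start()])
        if label is None:
            continue  # a date with no "updated"/"effective" cue nearby
        month = MONTHS[(m.group("m1") or m.group("m2")).lower()]
        day = int(m.group("d1") or m.group("d2"))
        year = int(m.group("y1") or m.group("y2"))
        try:
            results.append((label, date(year, month, day)))
        except ValueError:
            continue  # impossible calendar date such as "June 31, 2018"
    return results


sample = ("Privacy Policy. Last updated: May 25, 2018. "
          "This policy is effective as of 1 June 2018. "
          "Founded on March 3, 1999.")
print(extract_policy_dates(sample))
```

In this sketch the third date in `sample` is discarded because its sentence contains no cue phrase, which mirrors the idea of separating policy dates from other dates that happen to appear in the text. Numeric formats such as 05/25/2018 are deliberately left out, since their day and month order is ambiguous without further context.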