Privacy risk assessment method incorporating sensitivity and correlation with empirical study
DOI: https://doi.org/10.47989/ir31iConf64277
Keywords: Sensitivity, Correlation, Privacy risk assessment, Virtual academic community
Abstract
Introduction. User-generated content (UGC) has emerged as a prominent vector for privacy breaches, especially due to the context-dependence of data sensitivity and vulnerabilities introduced by data correlations. These challenges highlight the growing limitations of traditional assessment methods.
Method. This study proposes a privacy risk quantification method that integrates attribute sensitivity and inter-attribute association, and validates it experimentally on the ‘Friend Identification’ section of https://muchong.com. A BERT-BiLSTM-CRF deep learning model automatically identifies privacy attributes in unstructured text. Attribute sensitivity is quantified using a predefined privacy data lexicon, and pointwise mutual information (PMI) is introduced to measure associations between attributes. Together with a privacy subject identification factor, these elements yield a quantified privacy risk value, which is then mapped to a risk level.
Results. Ablation experiments and manual validation have confirmed the feasibility of the proposed scheme, demonstrating its capability to identify, assess, and classify privacy risks in unstructured textual data with broad applicability.
Conclusion(s). The study validates the proposed solution theoretically, technically, and empirically, overcoming the limitations of traditional isolated-field evaluation paradigms. The method can be extended to high-sensitivity domains such as healthcare and finance, providing a basis for dynamic, risk-informed classification policies.
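The scoring steps named in the Method paragraph (lexicon-based sensitivity, PMI between attributes, an identification factor, and level classification) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the sample posts, the sensitivity weights, the additive combination rule, the clipping of negative PMI to zero, and the level thresholds are hypothetical and are not taken from the paper, which does not state its formula in the abstract. Only the PMI definition itself, log2(p(a,b) / (p(a) p(b))), is standard.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of attribute types a tagger found in each post.
posts = [
    {"name", "school", "major"},
    {"name", "phone"},
    {"school", "major"},
    {"name", "school", "phone"},
]

# Hypothetical lexicon-style sensitivity weights in [0, 1].
sensitivity = {"name": 0.6, "phone": 0.9, "school": 0.3, "major": 0.2}


def pmi_table(posts):
    """PMI for every attribute pair that co-occurs in at least one post."""
    n = len(posts)
    single = Counter(a for post in posts for a in post)
    pair = Counter(
        frozenset(c) for post in posts for c in combinations(sorted(post), 2)
    )
    table = {}
    for key, count in pair.items():
        a, b = sorted(key)
        p_ab = count / n
        table[key] = math.log2(p_ab / ((single[a] / n) * (single[b] / n)))
    return table


def risk_score(post, pmi, sensitivity, identification_factor=1.0):
    """Assumed combination: factor * (sum of sensitivities + sum of positive PMI)."""
    base = sum(sensitivity[a] for a in post)
    assoc = sum(
        max(pmi.get(frozenset(c), 0.0), 0.0)  # assumption: ignore negative PMI
        for c in combinations(sorted(post), 2)
    )
    return identification_factor * (base + assoc)


def risk_level(score, thresholds=(1.0, 2.0)):
    """Map a score to a level using hypothetical cut-offs."""
    low, high = thresholds
    if score < low:
        return "low"
    return "medium" if score < high else "high"


pmi = pmi_table(posts)
for post in posts:
    score = risk_score(post, pmi, sensitivity)
    print(sorted(post), round(score, 3), risk_level(score))
```

In this sketch the identification factor is a plain multiplier, so a post whose subject cannot be identified could be scored with a factor below 1; whether the paper weights it this way is not stated in the abstract.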
License
Copyright (c) 2026 Ruili Geng, Tiantian Zhang, Sentao Li, Yishuai Xu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
