Privacy risk assessment method incorporating sensitivity and correlation with empirical study

Ruili Geng; Tiantian Zhang; Sentao Li; Yishuai Xu

doi:10.47989/ir31iConf64277

Authors

Ruili Geng ZhengZhou University
Tiantian Zhang ZhengZhou University
Sentao Li ZhengZhou University
Yishuai Xu Universiti Malaya

DOI:

https://doi.org/10.47989/ir31iConf64277

Keywords:

Sensitivity, Correlation, Privacy risk assessment, Virtual academic community

Abstract

Introduction. User-generated content (UGC) has emerged as a prominent vector for privacy breaches, largely because data sensitivity depends on context and because correlations between data items introduce additional vulnerabilities. These challenges expose the growing limitations of traditional assessment methods.

Method. This study proposes a privacy risk quantification method that integrates attribute sensitivity with inter-attribute association, validated experimentally on the ‘Friend Identification’ section of https://muchong.com. A BERT-BiLSTM-CRF deep learning model automatically identifies attributes in unstructured text. Attribute sensitivity is quantified using a predefined privacy data lexicon, and pointwise mutual information (PMI) measures the association between attributes. Combined with a privacy subject identification factor, these elements yield a privacy risk value, which is then mapped to a risk level.
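
As a rough illustration of the PMI step only, the following minimal sketch computes PMI(a, b) = log2(p(a, b) / (p(a) p(b))) from per-post attribute sets. The function name `pmi_table`, the attribute labels, and the toy posts are hypothetical and are not taken from the study; the paper's own probability estimates, logarithm base, and handling of pairs that never co-occur may differ.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(posts):
    """Return base-2 PMI for every attribute pair that co-occurs in at least one post.

    `posts` is a list of attribute-label lists, one list per post (hypothetical input
    format). Probabilities are estimated as the fraction of posts containing the
    attribute or the pair.
    """
    n = len(posts)
    single = Counter()  # number of posts containing each attribute
    pair = Counter()    # number of posts containing each attribute pair
    for attrs in posts:
        unique = sorted(set(attrs))  # count each attribute once per post
        single.update(unique)
        pair.update(combinations(unique, 2))

    table = {}
    for (a, b), joint in pair.items():
        p_ab = joint / n
        p_a = single[a] / n
        p_b = single[b] / n
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Toy data: each inner list holds the attribute types found in one post.
posts = [
    ["name", "university"],
    ["name", "university"],
    ["phone"],
    ["university", "phone"],
]
scores = pmi_table(posts)
for attr_pair, value in sorted(scores.items()):
    print(attr_pair, round(value, 3))
```

A positive score means two attributes appear together more often than independence would predict; a negative score means they appear together less often. Pairs that never co-occur are simply absent from the table in this sketch.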

Results. Ablation experiments and manual validation confirm the feasibility of the proposed scheme and demonstrate that it can identify, assess, and classify privacy risks in unstructured textual data, which suggests broad applicability.

Conclusion. The study validates the proposed solution theoretically, technically, and empirically, and overcomes the limitations of traditional evaluation paradigms that assess each field in isolation. The method can be extended to high-sensitivity domains such as healthcare and finance, providing a basis for dynamic, risk-informed classification policies.

