An authorship attribution model applied to pedophilia crime investigations




Keywords: Child and adolescent sexual abuse on the internet, Authorship attribution, Stylometry, Pedophilia, Police investigation


Objectives: To identify the current state of the art of scientific research on authorship attribution applied to investigations of sexual crimes against children and adolescents committed over the Internet and involving written material, and to propose a methodology for using authorship attribution to identify suspected authors of texts that encourage child and adolescent sexual abuse.
Methodology: This qualitative study uses a systematic literature review to identify works on authorship attribution techniques and to seek scientific evidence of their application to problems similar to the one addressed here.
Results: The study presents the current state of the art of scientific research relating authorship attribution techniques to texts on the Internet that encourage the sexual abuse of children and adolescents and, on that basis, proposes a methodology for identifying the authors of such texts.
Conclusions: Scientific research on this topic is scarce, which suggests that the field is open to further study. The proposed methodology also indicates that authorship attribution techniques can be applied to identify the probable authors of texts that aim to guide and encourage child and adolescent sexual abuse.

Author Biographies

Aurélio Julbert de Assis Ruprecht, Universidade Federal de Santa Catarina - UFSC

Master's degree in Information Science from the Universidade Federal de Santa Catarina - UFSC

Marcelo da Silva Moreira, Universidade Federal de Santa Catarina - UFSC

Master's degree in Information Science from the Universidade Federal de Santa Catarina - UFSC

Enrique Muriel-Torrado, Universidade Federal de Santa Catarina - UFSC

PhD in Scientific Information from the Universidade de Granada

Moisés Lima Dutra, Universidade Federal de Santa Catarina - UFSC

PhD in Computing from the Universidade de Lyon





How to Cite

Ruprecht, A. J. de A., Moreira, M. da S., Muriel-Torrado, E., & Dutra, M. L. (2022). An authorship attribution model applied to pedophilia crime investigations. Informação & Informação, 27(1), 381–404.


