Cleaning, correcting, and geocoding of big address databases using text mining

Keywords

Georeferencing
geocoding
text mining

How to Cite

Fredy Humberto, & Fernandez Rozas, N. E. (2021). Cleaning, correcting, and geocoding of big address databases using text mining. Universidad Ciencia Y Tecnología, 25(109), 80-87. https://doi.org/10.47460/uct.v25i109.451

Abstract

Georeferencing a large number of addresses first requires geocoding them through public or private systems. Geocoding is not an exact science: addresses are usually written and stored by people, which introduces precision problems in the records such as misspellings, unnecessary data, or missing essential data. To address this problem, this article describes a methodology that cleans and corrects addresses in order to optimize geocoding with existing systems. The methodology is built on the Knowledge Discovery in Text (KDT) process and is applied to a database of criminal event addresses provided by the criminal analysis unit of the Regional Prosecutor's Office of Biobío, Chile. The results show an increase in the number of addresses geocoded by the systems tested, with the size of the increase varying by system.
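
The abstract does not spell out the individual cleaning and correction steps, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a small reference list of valid street names and an abbreviation table (both hypothetical, as are all function names), normalizes each address, and then corrects the street part by approximate string matching with the Levenshtein edit distance before the address would be sent to a geocoding system.

```python
import re
import unicodedata

# Hypothetical reference list of valid street names (not taken from the article).
KNOWN_STREETS = ["avenida los carrera", "calle barros arana", "calle freire"]

# Hypothetical abbreviation table; a real one would be derived from the data.
ABBREVIATIONS = {"av": "avenida", "avda": "avenida", "psje": "pasaje"}


def normalize(address: str) -> str:
    """Lowercase, strip accents and punctuation, and expand abbreviations."""
    text = unicodedata.normalize("NFKD", address.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def correct_street(address: str, max_distance: int = 2) -> str:
    """Swap the street part for the closest known street if it is close enough."""
    cleaned = normalize(address)
    # Split off a trailing house number, if there is one.
    match = re.match(r"^(.*?)(\s+\d+)?$", cleaned)
    street, number = match.group(1), match.group(2) or ""
    best = min(KNOWN_STREETS, key=lambda s: edit_distance(street, s))
    if edit_distance(street, best) <= max_distance:
        street = best
    return street + number
```

With these assumptions, `correct_street("Av. Los Carera 123")` yields `avenida los carrera 123`, while an address whose street is far from every known name is returned normalized but otherwise unchanged. The threshold `max_distance` is an illustrative choice: a larger value corrects more misspellings but raises the risk of mapping an address to the wrong street.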

This work is licensed under a Creative Commons Attribution 4.0 International License.