Template-Type: ReDIF-Article 1.0
Author-Name: Mª Esther López Vizcaíno
Author-Name: Carlos L. Iglesias Patiño
Author-Name: Mª Esther Calvo Ocampo
Title: PROPUESTA METODOLÓGICA PARA LA GEORREFERENCIACIÓN DE LA POBLACIÓN Y PRIMERAS APLICACIONES EN GALICIA
Abstract: Summary: Statistics refer to a territorial base, and information with higher geographical resolution is increasingly needed to study the settlement of the population. The aim of this article is to present the methodology followed at the Galician Institute of Statistics to geolocate the population of Galicia recorded in the Municipal Population Register (Padrón Municipal de Habitantes), using various statistical techniques such as spline regression and principal component analysis. The georeferencing process was divided into two steps: the urban part was georeferenced on the one hand, and the rural part on the other. The methodology presented made it possible to georeference 99% of the population of Galicia. To complement the general objective, examples of use of the geolocated population are presented, which show the importance and potential of georeferencing data. Abstract: Statistics refer to a territorial base, and information with greater geographical resolution is increasingly needed to carry out studies on the settlement of the population. In the current administrative organization of the Kingdom of Spain, municipalities are the smallest administrative units into which the territory is divided and which have precise boundaries assigned. For this reason, studies that aim to locate the population tend to go down only to the municipal level. However, several authors (Reher, 1994; Rúa et al., 2003) have argued that, for studying the settlement of the population on the territory, this division is clearly insufficient and the geographical resolution of the analysis must be increased.
Along these lines, according to Goerlich and Cantarino (2012), the best way to represent the population is to georeference all the residential buildings of a country, determine the population that resides in them, count it and assign it to the corresponding pixel. Determining the location of the population on the territory, independently of administrative boundaries, is essential for many practical issues of social organization (Goerlich and Cantarino, 2012). Having georeferenced statistical units makes social, economic or environmental planning easier to design and monitor, and allows more appropriate territorial management that is less dependent on administrative subdivisions. It also makes decisions easier, both public and private, for for-profit and non-profit entities alike. This will reduce costs because, in the medium term, many of the ad hoc studies that stakeholders, possibly including private ones, commission to support their strategy will become unnecessary. The intention is not to replace decision makers but to support them. The aim of georeferencing, then, is to identify by geographic coordinates where the statistical units are located in the territory. The files used for georeferencing commonly contain a proportion of records that have incomplete information and cannot be assigned to a precise location in space. This situation is particularly frequent in rural areas, where addresses are not standardized and the scattering of buildings leads to a larger positioning error. Instead of discarding these records, as is usually done, this work imputes the missing coordinates using indirect techniques, depending on the available information.
Georeferenced information has many uses: computing the resident population by territorial area, transportation planning (Moreno and Prieto, 2003), planning services or facilities, and calculating the population within the influence zones of schools, senior residences, etc. The objective of this article is to present the methodology followed in the Galician Institute of Statistics (IGE) to geolocate the population available in the Municipal Population Register (PMH), using various statistical techniques, such as spline regression and principal component analysis. The georeferencing process was divided into two steps: the urban part was georeferenced on the one hand, and the rural part on the other. In the urban part it was decided to apply a regression model with splines to the coordinates of the specified path. The advantages of splines are that they adapt to non-rectilinear routes, which abound in Galicia because of its orography, and that they can be represented as mixed models and therefore benefit from the existing theory and software for this kind of model (Ruppert et al., 2003; Ugarte et al., 2009; Wood, 2006). In the rural part, the georeferencing is done at the nucleus/scattering level and the number of the state. The corresponding coordinates are assigned to a weighted average point of the nucleus or the corresponding scattering, because the centroid of the points (the vector of arithmetic means) was not satisfactory in some cases, owing to the dispersion of the population in Galician territory. The methodology presented allowed us to georeference about 99% of the Galician population. To complement this general objective, a later section presents three examples of integration between both types of information, statistical and geographical, which show the importance and potential of georeferencing.
The first example calculates the population of a geographical area that does not coincide with any of the administrative divisions of Galicia. The second and third examples concern the calculation of statistical indicators of service coverage by allocating inhabitants to influence polygons; in the third, this allocation serves as a preliminary step to a spatial analysis. The proposed methodology was implemented in R (R Core Team, 2016) and tested with the 2016 PMH data for Galicia. The goal is to use this methodological proposal, or its future improvements, to keep the population of Galicia georeferenced each year. At present, the MDAGE database of the Spanish Institute of Statistics (INE), together with the imputation techniques described in this work, allows us to obtain this valuable information. The imputation techniques used constitute one of the novel contributions of this work. The Catalonian Institute of Statistics (Idescat) (Suñé, 2014) has also used imputation techniques to obtain new locations, but in that case the population grid is combined with probabilistic methods. Finally, the presented methodology is easily replicable, whether with the data available in another NUTS2 region, at the national level, or even in other countries. As for extensions of this work, Goerlich and Cantarino (2010) commented that the resident population should be the main objective to geolocate, although not the only one. This study should be extended, for example, by changing the criterion for locating the population from place of residence to place of work (García-Palomares, 2017). Although such a study is complicated by the number of information sources that need to be used, we think that it can be undertaken in the future.
Another interesting extension is to incorporate a longitudinal perspective, which would offer new information on the life trajectories of the population.
Classification-JEL: R1
Keywords: Georreferenciación, Splines, Análisis de Componentes Principales, Censo de población, Geocoded, Principal Component Analysis, Aging population
Pages: 17-43
Volume: I
Year: 2020
File-URL: http://www.revistaestudiosregionales.com/documentos/articulos/pdf-articulo-2584.pdf
File-Format: Application/pdf
Handle: RePEc:rer:articu:v:I:y:2020:p:17-43