Introduction

Nowadays, spatial analysis of text is widely considered important for both researchers and users. In some fields, such as epidemiology, extracting spatial information from text is crucial, and it requires both resources and methods. Most spatial analysis pipelines rely on a gazetteer: a data source in which toponyms (place names) are associated with concepts and their geographic footprint. Unfortunately, most publicly available gazetteers are incomplete because of the purpose they were originally built for.
Hence, we propose Geodict, an integrated gazetteer that contains basic yet precise information (multilingual labels, administrative boundary polygons, etc.) and that can be customized.
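To make the definition of a gazetteer concrete, here is a toy in-memory version in Python. It is only a sketch under stated assumptions, not Geodict's data model: the `Place` class, its fields, the `build_index`/`lookup` helpers and the sample entries are hypothetical, and each footprint is reduced to a single point instead of a boundary polygon. It shows why a gazetteer matters for disambiguation: the same toponym can refer to several concepts.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """Hypothetical gazetteer entry: one concept, its names and its footprint."""
    concept_id: str
    label: str
    aliases: list = field(default_factory=list)
    # Footprint simplified to a single point (assumption); real gazetteers
    # such as Geodict may also store boundary polygons.
    lat: float = 0.0
    lon: float = 0.0


def build_index(places):
    """Map every label and alias (case-insensitive) to the entries using it."""
    index = {}
    for place in places:
        for name in [place.label, *place.aliases]:
            index.setdefault(name.lower(), []).append(place)
    return index


def lookup(index, toponym):
    """Return every entry whose label or alias matches the toponym."""
    return index.get(toponym.lower(), [])


# Hypothetical sample entries: two distinct concepts share the toponym "Paris".
places = [
    Place("hypothetical-1", "Paris", ["Paname"], lat=48.86, lon=2.35),
    Place("hypothetical-2", "Paris", ["Paris, Texas"], lat=33.66, lon=-95.56),
    Place("hypothetical-3", "France", ["FRA", "République française"], lat=47.0, lon=2.0),
]
index = build_index(places)

for match in lookup(index, "paris"):
    print(match.concept_id, match.lat, match.lon)
```

A lookup on "paris" returns two candidate concepts; choosing between them is the kind of task that needs the extra information (labels, aliases, footprints) a gazetteer stores.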

Download

The latest version of Geodict is available here:

Query Geodict with Python

Installation and Running

First, download the Elasticsearch version of Geodict, available here.
Second, install Python 3 (3.6 or later) and the dedicated API, Gazpy. To do so, run the following command:
sudo pip3 install git+https://github.com/Jacobe2169/gazpy.git

Finally, start Elasticsearch by running geodict_29_04/bin/elasticsearch.
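Before running the queries below, it can help to check that the Elasticsearch node is actually up. The sketch below uses only the Python standard library and assumes Elasticsearch listens on its default HTTP address, http://localhost:9200; the `elasticsearch_is_up` helper is hypothetical and not part of Gazpy.

```python
import json
import urllib.request


def elasticsearch_is_up(url="http://localhost:9200", timeout=2.0):
    """Return True if an Elasticsearch node answers at `url`.

    Hypothetical helper; the default URL assumes Elasticsearch's default port.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # The Elasticsearch root endpoint returns a small JSON document
            # describing the node, including a "version" object.
            info = json.loads(response.read().decode("utf-8"))
            return isinstance(info, dict) and "version" in info
    except (OSError, ValueError):
        # OSError covers connection refused, timeouts and HTTP errors
        # (urllib.error.URLError subclasses OSError); ValueError covers
        # an answer that is not valid JSON.
        return False
```

Once geodict_29_04/bin/elasticsearch has finished starting, `elasticsearch_is_up()` should return True; if it returns False, the Python queries will not be able to reach the index either.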

Query Geodict

To get an entity by its label, use the get_by_label() method.

import gazpy as ga
from elasticsearch import Elasticsearch

gd = ga.Geodict(Elasticsearch())
# get_by_label() returns a list of matching entities; keep the first one
fra = gd.get_by_label("France", lang="fr")[0]
print(fra.label.fr)
# France
print(fra.alias.fr)
# ['FR', 'République française', 'FRA', 'RF', "l'hexagone"]
print(fra.coord)
# {'lat': 47.0, 'lon': 2.0}
print(fra.other.inc_geoname)
# ['GD6850344', 'GD3086749', 'GD1357264']

You can also query entities by one of their aliases:
gd.get_by_alias("FRA",lang="fr")

Or by an identifier from another source (geoname, osm or wikidata):
gd.get_by_other_id("Q142",identifier="wikidata")

Or by searching within a given radius around a point:
gd.get_in_radius(lon=2,lat=47,unit="km",distance=10)
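A radius query keeps the entities whose coordinates lie within a given great-circle distance of a point. The sketch below illustrates that idea with the haversine formula on a plain dictionary of coordinates; it is not necessarily how Gazpy or Elasticsearch compute distances internally, and the `within_radius` helper and the sample points are hypothetical. It assumes a spherical Earth with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, lat, lon, distance_km):
    """Hypothetical helper: names of the points lying within distance_km of (lat, lon)."""
    return [
        name
        for name, (p_lat, p_lon) in points.items()
        if haversine_km(lat, lon, p_lat, p_lon) <= distance_km
    ]


# Hypothetical sample points around the centre used in the get_in_radius call.
points = {
    "centre": (47.0, 2.0),
    "near": (47.05, 2.05),   # roughly 7 km away
    "paris": (48.86, 2.35),  # roughly 210 km away
}

print(within_radius(points, lat=47, lon=2, distance_km=10))
# ['centre', 'near']
```

With a 10 km radius around (47, 2), only the two nearby points are kept, mirroring the parameters of the get_in_radius call (lon=2, lat=47, unit="km", distance=10).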

Source

  • Geodict: an integrated gazetteer
    Fize, Jacques and Shrivastava, Gaurav
    Proceedings of the Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017), 2017