
Introduction
Nowadays, spatial analysis of text is widely considered important for both researchers and users. In certain fields such as epidemiology, extracting spatial information from text is crucial, and both resources and methods are needed to do it. In most spatial analysis processes, a gazetteer is a commonly used resource. A gazetteer is a data source in which toponyms (place names) are associated with concepts and their geographic footprint. Unfortunately, most publicly available gazetteers are incomplete because of the purpose for which they were originally built.
Hence, we propose Geodict, an integrated gazetteer that contains basic yet precise information (multilingual labels, administrative boundary polygons, etc.) and that can be customized.
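To make the definition above concrete, here is a minimal sketch of a gazetteer as a list of concepts, each with multilingual labels and a footprint. It is illustrative only: the Place class, the lookup function and the concept ids are hypothetical and do not reflect Geodict's actual schema, and the footprint is reduced to a single point where a real gazetteer may store a polygon. It shows why a toponym alone is not enough: one name can map to several concepts.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Place:
    """A gazetteer concept: its names and its geographic footprint."""
    concept_id: str                  # hypothetical identifier
    labels: Dict[str, str]           # language code -> name of the place
    footprint: Tuple[float, float]   # (lat, lon) point; a real gazetteer may store a polygon


def lookup(gazetteer: List[Place], toponym: str) -> List[Place]:
    """Return every concept that has `toponym` as a label in any language."""
    return [place for place in gazetteer if toponym in place.labels.values()]


# Illustrative entries with approximate coordinates: one toponym, two concepts.
GAZETTEER = [
    Place("paris_france", {"en": "Paris", "fr": "Paris"}, (48.8566, 2.3522)),
    Place("paris_texas", {"en": "Paris"}, (33.6609, -95.5555)),
]

if __name__ == "__main__":
    for place in lookup(GAZETTEER, "Paris"):
        print(place.concept_id, place.footprint)
```

The lookup returns both candidates; the footprint attached to each concept is what lets a later step tell homonymous places apart.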
Download
The latest version of Geodict is available here:
Query Geodict with Python
Installation and Running
First, download the Elasticsearch version of Geodict available here.
Second, install Python 3 (3.6 or later) and the dedicated API, Gazpy. To do so, run the following command:
sudo pip3 install git+https://github.com/Jacobe2169/gazpy.git
Finally, start Elasticsearch by running geodict_29_04/bin/elasticsearch.
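Before querying, it can help to check that the Elasticsearch server is answering. The sketch below uses only the Python standard library; the function name elasticsearch_is_up is hypothetical, and it assumes Elasticsearch listens on its default HTTP port, 9200, on the local machine. Adjust the URL if your setup differs.

```python
import urllib.error
import urllib.request

# Assumption: Elasticsearch serves HTTP on its default port, 9200, on this machine.
DEFAULT_URL = "http://localhost:9200"


def elasticsearch_is_up(url: str = DEFAULT_URL, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server at `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, name resolution failure or an HTTP error status.
        return False


if __name__ == "__main__":
    print("Elasticsearch reachable:", elasticsearch_is_up())
```

If this prints False, the server is either not started yet or listening on another host or port.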
Query Geodict
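You can query entities by their label with the get_by_label() method:
import gazpy as ga
from elasticsearch import Elasticsearch
# Connect to the Elasticsearch instance started above
gd = ga.Geodict(Elasticsearch())
# get_by_label() returns a list of matching entities; keep the first one
fra = gd.get_by_label("France",lang="fr")[0]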
print(fra.label.fr)
# France
print(fra.alias.fr)
# ['FR', 'République française', 'FRA', 'RF', "l'hexagone"]
print(fra.coord)
# {'lat': 47.0, 'lon': 2.0}
print(fra.other.inc_geoname)
# ['GD6850344', 'GD3086749', 'GD1357264']
You can also query entities by one of their aliases:
gd.get_by_alias("FRA",lang="fr")
Or by an identifier from another source (GeoNames, OSM or Wikidata):
gd.get_by_other_id("Q142",identifier="wikidata")
Or by searching within a given radius of a point:
gd.get_in_radius(lon=2,lat=47,unit="km",distance=10)
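The alias, external-identifier and radius lookups above can be pictured with a small in-memory stand-in. This sketch is hypothetical: ToyGazetteer, the Entity fields and the entity ids are invented for illustration, the method names only mirror the Gazpy calls shown above, and it is not how Gazpy or Geodict implement them (against Elasticsearch, a radius filter can be expressed server-side with a geo_distance query). The radius search uses the haversine great-circle distance on a spherical Earth of radius 6371 km, and only "km" and "m" are accepted as units here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List

EARTH_RADIUS_KM = 6371.0               # mean Earth radius
UNIT_TO_KM = {"km": 1.0, "m": 0.001}   # units accepted by this sketch (an assumption)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


@dataclass
class Entity:
    entity_id: str                                               # hypothetical id
    lat: float
    lon: float
    aliases: Dict[str, List[str]] = field(default_factory=dict)  # language -> aliases
    other_ids: Dict[str, str] = field(default_factory=dict)      # "wikidata"/"geoname"/"osm" -> id


class ToyGazetteer:
    """In-memory stand-in for the lookups above; not Gazpy's implementation."""

    def __init__(self, entities: List[Entity]):
        self.entities = list(entities)

    def get_by_alias(self, alias: str, lang: str) -> List[Entity]:
        # Exact, case-sensitive match among the aliases of the requested language.
        return [e for e in self.entities if alias in e.aliases.get(lang, [])]

    def get_by_other_id(self, value: str, identifier: str) -> List[Entity]:
        # Match on an identifier from an external source such as Wikidata.
        return [e for e in self.entities if e.other_ids.get(identifier) == value]

    def get_in_radius(self, *, lon: float, lat: float, distance: float,
                      unit: str = "km") -> List[Entity]:
        # Keep entities whose point lies within `distance` of (lat, lon).
        limit_km = distance * UNIT_TO_KM[unit]
        return [e for e in self.entities
                if haversine_km(lat, lon, e.lat, e.lon) <= limit_km]


# France's aliases and coordinate are the ones printed above; Q142 and Q90 are the
# Wikidata ids of France and Paris. Paris's coordinate is approximate.
gaz = ToyGazetteer([
    Entity("france", 47.0, 2.0,
           aliases={"fr": ["FR", "République française", "FRA", "RF", "l'hexagone"]},
           other_ids={"wikidata": "Q142"}),
    Entity("paris", 48.8566, 2.3522, other_ids={"wikidata": "Q90"}),
])

if __name__ == "__main__":
    print([e.entity_id for e in gaz.get_by_alias("FRA", lang="fr")])                  # ['france']
    print([e.entity_id for e in gaz.get_by_other_id("Q142", identifier="wikidata")])  # ['france']
    print([e.entity_id for e in gaz.get_in_radius(lon=2, lat=47, unit="km", distance=10)])   # ['france']
    print([e.entity_id for e in gaz.get_in_radius(lon=2, lat=47, unit="km", distance=250)])  # ['france', 'paris']
```

Paris lies roughly 208 km from the point (47, 2), so it only appears once the radius is widened beyond that distance.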
Source
Proceedings of the Language, Ontology, Terminology and Knowledge Structures Workshop (LOTKS 2017), 2017