Intro to GeoInfo Generator#

Motivation#

Although Depositar provides a convenient interface that lets users select points, lines, or areas to describe the spatial extent of their dataset, drawing a complex boundary such as a city boundary can be challenging. The difficulty is greater for users who are not familiar with geographic information formats such as GeoJSON.

Below is the interactive map that Depositar uses:

[Interactive map]

Thus, we aim to develop a geographic information generator that helps users find the appropriate Wikidata keywords to describe their dataset.

Method#

In this section, we describe a two-step pipeline to achieve this goal:

../../_images/geo_pipeline.png

Fig. 3 A pipeline for the keyword generator project#

Step 1: NER#

After obtaining the input metadata for the current dataset, we apply Named Entity Recognition (NER) to it and extract only the words that could correspond to Wikidata Q-items.

For this step, we use the ckiplab/bert-base-chinese-ner model.

The ckiplab/bert-base-chinese-ner model is part of the CKIP Transformers project, which offers transformer models designed specifically for Traditional Chinese language processing.

We selected the ckiplab/bert-base-chinese-ner model because it achieves a higher F1 score on NER than the other models provided by CKIP Lab.
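The NER step could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the `ckip-transformers` package is installed, that its `"bert-base"` model alias resolves to ckiplab/bert-base-chinese-ner, and that place-like entities carry the OntoNotes-style labels `GPE`, `LOC`, or `FAC`. The helper names `run_ner` and `select_place_candidates` and the `PLACE_LABELS` set are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumption: labels that usually denote places in an OntoNotes-style tag set
# (geo-political entity, location, facility). Adjust to the labels you observe.
PLACE_LABELS = frozenset({"GPE", "LOC", "FAC"})


def run_ner(texts: List[str]) -> List[List[Tuple[str, str]]]:
    """Run CKIP NER and return (word, label) pairs for each input text."""
    # Imported lazily so the filtering helper below works without the package.
    from ckip_transformers.nlp import CkipNerChunker

    # Assumption: "bert-base" is the alias for ckiplab/bert-base-chinese-ner.
    chunker = CkipNerChunker(model="bert-base")
    return [[(tok.word, tok.ner) for tok in sentence] for sentence in chunker(texts)]


def select_place_candidates(
    entities: Iterable[Tuple[str, str]],
    labels: frozenset = PLACE_LABELS,
) -> List[str]:
    """Keep unique place-like entity words, preserving first-seen order."""
    seen = set()
    candidates = []
    for word, label in entities:
        if label in labels and word not in seen:
            seen.add(word)
            candidates.append(word)
    return candidates


# Hand-written (word, label) pairs standing in for real model output.
example_entities = [
    ("臺北市", "GPE"),
    ("2020年", "DATE"),
    ("淡水河", "LOC"),
    ("臺北市", "GPE"),
]
place_words = select_place_candidates(example_entities)
```

With real metadata, one would call `run_ner([...])` on the dataset's title and description and pass each returned list to `select_place_candidates`. Filtering by label before the Wikidata lookup keeps dates, quantities, and similar entities out of the candidate list, on the assumption that only place names are useful for describing spatial coverage.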