Project Result

Now let's combine the two parts of the project. The result is an interactive interface where users enter information about their dataset and receive suggested Wikidata keywords, along with candidate OpenStreetMap locations that best characterize it.

Tip

To obtain Wikidata keywords and geographic information for your dataset, follow these steps:

  1. Click the rocket icon in the top-right corner.

  2. Select Live Code from the menu.

  3. Once the environment has launched, you can run each code cell manually.

For hidden code cells, first click Show code cell source, then click run in that cell.

You can provide the following information:

  • title: the title of your dataset

  • description: a description of your dataset

  • resource_names: a list of the file names in your dataset

  • resource_descriptions: a list of descriptions of the files in your dataset

  • organization_title: the title of the affiliated organization

  • organization_description: a description of the affiliated organization

Warning

At least one field must be filled in; leaving all of them empty is not allowed.
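The rule can be expressed as a small standalone check. This is a minimal sketch rather than the notebook's own code: the helper names `normalize_field` and `has_any_input` are hypothetical, and treating whitespace-only text as empty is an assumption that is slightly stricter than the `not item` test used in the notebook.

```python
def normalize_field(value):
    """Turn a field into text; list fields (e.g. resource_names) are joined with spaces."""
    if isinstance(value, (list, tuple)):
        return " ".join(str(v) for v in value if v)
    return (value or "").strip()


def has_any_input(fields):
    """Return True if at least one field contains non-blank text (hypothetical helper)."""
    return any(normalize_field(v) for v in fields.values())


example = {
    "title": "中研院",
    "description": "",
    "resource_names": [],
    "resource_descriptions": [],
    "organization_title": "",
    "organization_description": "",
}
print(has_any_input(example))                   # True: only the title is set
print(has_any_input({k: "" for k in example}))  # False: every field is empty
```

Joining list fields into one string also keeps every input to the NER model a plain string, whatever the field type.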

Function Definitions

Expand the code cell below and click run.

# Packages Import ============================================================
import json

import requests

# NLP task model: CKIP named-entity recognizer (BERT base)
from ckip_transformers.nlp import CkipNerChunker
ner_driver = CkipNerChunker(model="bert-base")

# Nominatim's usage policy asks clients to identify themselves with a User-Agent.
# Placeholder value: replace it with your own application name and contact.
HEADERS = {"User-Agent": "dataset-keyword-demo/0.1 (your-email@example.com)"}

# Function Definitions =========================================================
def wiki_search(search_term):
    """Print the Wikidata entities that match search_term."""
    url = "https://www.wikidata.org/w/api.php"
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "search": search_term,  # requests URL-encodes the (Chinese) search term
        "language": "zh",
    }

    response = requests.get(url, params=params, headers=HEADERS)
    data = response.json()

    # organize the response
    if data.get("search"):
        for result in data["search"]:
            qid = result["id"]
            label = result["label"]
            description = result.get("description", "No description available")
            print(f"QID: {qid}, Label: {label}, Description: {description}")
    else:
        print("No results found.")

def search_osm_place(query):
    """Query Nominatim; return up to 7 matching places, or None on failure."""
    base_url = "https://nominatim.openstreetmap.org/search"
    params = {
        "q": query,
        "format": "json",
        "polygon_geojson": "1",  # Request GeoJSON geometries
        "limit": 7
    }

    response = requests.get(base_url, params=params, headers=HEADERS)

    if response.status_code == 200:
        return response.json()
    else:
        return None

def to_text(field):
    """Join list-valued fields (e.g. resource_names) into a single string."""
    if isinstance(field, (list, tuple)):
        return " ".join(str(item) for item in field)
    return field or ""

def make_keyword_map(input_list):
    # NER task: one list of entities per input string
    ner = ner_driver(input_list)

    # Build keyword_map to store potential words, skipping numeric/date entities
    avoid_class = ['QUANTITY', 'CARDINAL', 'DATE', 'ORDINAL']
    keyword_map = {}
    for sentence_ner in ner:
        for entity in sentence_ner:
            # entity[0] is the word, entity[1] its entity class
            if entity[1] in avoid_class:
                continue
            # record every entity that passed the filter
            keyword_map[entity[0]] = entity[1]

    return keyword_map

def gen_keyword(title, description, resource_names, resource_descriptions, organization_title, organization_description):
    input_list = [title, description, resource_names, resource_descriptions, organization_title, organization_description]

    if all(not item for item in input_list):
        return -1

    # Flatten list fields and drop empty ones so the NER model only sees strings
    texts = [to_text(item) for item in input_list if item]
    return make_keyword_map(texts)

def wiki_output(result):
    if result == -1:
        print("At least one of the fields should be completed. Leaving all of them empty is not permissible.")
        return
    if not result:
        print("No keywords were recognized.")
        return
    for item in result:
        print(item)
        wiki_search(item)
        print('-------------------------------------------')

def geoInfo_output(result):
    if result == -1:
        print("At least one of the fields should be completed. Leaving all of them empty is not permissible.")
        return
    for item in result:
        respond = search_osm_place(item)
        if respond:
            print(f"OSM result for {item} is:")
            for place in respond:
                print("📍", place["display_name"])
                # json.dumps produces valid, double-quoted GeoJSON ready to paste
                print(json.dumps(place["geojson"]))
            print('-------------------------------------------')
        else:
            print("No geoInfo provided.")
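To see what the keyword filter keeps without loading the CKIP model, the same filtering step can be run on hand-made entity tuples. This sketch assumes each entity behaves like a `(word, ner_class, idx)` tuple, matching how `make_keyword_map` indexes `entity[0]` and `entity[1]`; the `Entity` type, the `filter_entities` name, and the sample entities with their classes are invented for illustration and are not real CKIP output.

```python
from collections import namedtuple

# Hypothetical stand-in for the NER entity tuples (word, ner_class, idx).
Entity = namedtuple("Entity", ["word", "ner", "idx"])

AVOID_CLASS = {"QUANTITY", "CARDINAL", "DATE", "ORDINAL"}


def filter_entities(ner_results):
    """Keep one entry per entity word, skipping numeric and date classes."""
    keyword_map = {}
    for sentence_ner in ner_results:
        for entity in sentence_ner:
            if entity.ner in AVOID_CLASS:
                continue
            keyword_map[entity.word] = entity.ner
    return keyword_map


sample = [
    [Entity("中研院", "ORG", (0, 3)), Entity("2023年", "DATE", (3, 8))],
    [],  # an empty input string yields no entities and contributes nothing
    [Entity("臺北市", "GPE", (0, 3)), Entity("三", "CARDINAL", (4, 5))],
]
print(filter_entities(sample))  # {'中研院': 'ORG', '臺北市': 'GPE'}
```

The assignment has to sit inside the inner loop so that every entity surviving the filter is recorded; because the result is a dict, a word that appears in several fields is searched only once.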

Input

✨ Enter your own dataset's information here:

title = '中研院'
description = ''
resource_names = []
resource_descriptions = []
organization_title = ""
organization_description = ""

result = gen_keyword(title, description, resource_names, resource_descriptions, organization_title, organization_description)

Result

Wikidata Keyword Recommendation:

wiki_output(result)
中研院
QID: Q337266, Label: Academia Sinica, Description: national academy of Taiwan
QID: Q20872478, Label: Biodiversity Research Center, Academia Sinica, Description: organization
QID: Q20872470, Label: Institute of Information Science, Academia Sinica, Description: organization
QID: Q10875102, Label: Member of Academia Sinica, Description: academic award in Taiwan
QID: Q10899637, Label: Hung En Liu, Description: No description available
QID: Q93696545, Label: Institute of Astronomy & Astrophysics Llbrary, Academic Sinica, Description: No description available
QID: Q120579929, Label: Reasons for the Failure of the Program for the Foundation of a National Metrology System of Time in the Academia Sinica in 1930s, Description: No description available
-------------------------------------------
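Each line above comes from the `search` array of Wikidata's `wbsearchentities` response. The sketch below separates building the request URL from parsing the JSON, so the parsing can be checked offline. The function names are hypothetical, and `sample` is a trimmed, hand-written dictionary modeled on two of the results above, not a captured API response.

```python
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="zh"):
    """Build a wbsearchentities URL; urlencode percent-encodes Chinese search terms."""
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "search": term,
        "language": language,
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_search(data):
    """Return (qid, label, description) tuples from a wbsearchentities JSON dict."""
    return [
        (r["id"], r.get("label", ""), r.get("description", "No description available"))
        for r in data.get("search", [])
    ]


sample = {
    "search": [
        {"id": "Q337266", "label": "Academia Sinica",
         "description": "national academy of Taiwan"},
        {"id": "Q10899637", "label": "Hung En Liu"},
    ]
}
print(build_search_url("中研院"))
for qid, label, desc in parse_search(sample):
    print(f"QID: {qid}, Label: {label}, Description: {desc}")
```

Entries without a `description` fall back to "No description available", which is why some results above show that text.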

Geographic Information Recommendation:

geoInfo_output(result)
OSM result for 中研院 is:
📍 中央研究院, 128, 研究院路二段, 中研里, 南港區, 舊莊, 臺北市, 11529, 臺灣
{"type": "Polygon", "coordinates": [[[121.6098139, 25.0423357], [121.6099928, 25.0422723], [121.6101471, 25.0424388], [121.6102986, 25.0423416], [121.6114298, 25.042353], [121.611598, 25.0422362], [121.6111609, 25.0417884], [121.6110376, 25.0415609], [121.6108531, 25.0413988], [121.6108391, 25.0412591], [121.6108483, 25.0411745], [121.6108213, 25.0409236], [121.6109215, 25.0406827], [121.6109234, 25.0405608], [121.6111356, 25.0395975], [121.6112723, 25.0394586], [121.6119245, 25.0387853], [121.612849, 25.0395204], [121.6132347, 25.0391222], [121.6145719, 25.0394481], [121.6153276, 25.0386336], [121.6154951, 25.0386861], [121.6153522, 25.0390601], [121.6172567, 25.0388414], [121.6174233, 25.0400362], [121.6174569, 25.0403346], [121.6174815, 25.0406155], [121.6174912, 25.0407662], [121.6167943, 25.0416525], [121.6166898, 25.0419514], [121.6166698, 25.0420211], [121.6166611, 25.0420439], [121.616643, 25.0421047], [121.6166444, 25.0421754], [121.6167759, 25.0423978], [121.6167836, 25.042444], [121.6160597, 25.0438771], [121.6159596, 25.0440789], [121.6153441, 25.043857], [121.6149866, 25.0437405], [121.6146199, 25.0446663], [121.6145301, 25.0446373], [121.6141955, 25.0454896], [121.6137999, 25.0453621], [121.6124401, 25.0453351], [121.6124939, 25.0430158], [121.6123115, 25.0430126], [121.6122654, 25.043012], [121.6122424, 25.0429623], [121.6116596, 25.0429193], [121.6114791, 25.0429102], [121.6100696, 25.0428747], [121.6100116, 25.0425081], [121.6098547, 25.0424023], [121.6098139, 25.0423357]]]}
📍 中研院, 128, 研究院路二段, 中研里, 南港區, 舊莊, 臺北市, 11529, 臺灣
{"type": "Point", "coordinates": [121.6165385, 25.0428838]}
📍 中研院, 研究院路二段, 中研里, 南港區, 三重埔, 臺北市, 11564, 臺灣
{"type": "Point", "coordinates": [121.6165938, 25.043366]}
📍 中研院, 工农大路, 红旗街, 红旗街道, 朝阳区, 长春市, 吉林省, 130000, 中国
{"type": "Polygon", "coordinates": [[[125.2966103, 43.8635572], [125.2975544, 43.8628417], [125.2983966, 43.8634218], [125.297431, 43.8641219], [125.2966103, 43.8635572]]]}
-------------------------------------------
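To copy one of these results into the preview step, the place's `geojson` can be serialized with `json.dumps`, which produces valid, double-quoted JSON. The helper below is a hypothetical convenience: its name and its "prefer polygons over points" rule are assumptions for this example. It expects a list of place dictionaries shaped like the Nominatim results printed above, each with `display_name` and `geojson` keys; the sample data is trimmed from that output.

```python
import json


def pick_geojson(places, preferred_types=("Polygon", "MultiPolygon")):
    """Return (display_name, GeoJSON string) for the first area-like result.

    Falls back to the first result (e.g. a Point) if no polygon is present,
    and returns None for an empty or missing result list.
    """
    if not places:
        return None
    chosen = next(
        (p for p in places if p["geojson"]["type"] in preferred_types),
        places[0],
    )
    return chosen["display_name"], json.dumps(chosen["geojson"])


# Trimmed stand-in for the Nominatim output above (names and rings shortened).
places = [
    {"display_name": "中研院, 研究院路二段",
     "geojson": {"type": "Point", "coordinates": [121.6165385, 25.0428838]}},
    {"display_name": "中央研究院, 128, 研究院路二段",
     "geojson": {"type": "Polygon",
                 "coordinates": [[[121.6098139, 25.0423357],
                                  [121.6099928, 25.0422723],
                                  [121.6101471, 25.0424388],
                                  [121.6098139, 25.0423357]]]}},
]
name, geo_info = pick_geojson(places)
print(name)      # 中央研究院, 128, 研究院路二段
print(geo_info)  # a string suitable for the geoInfo variable
```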

Preview the Location in OSM

Select one of the locations above, copy its GeoJSON, and paste it into the cell below:

geoInfo = ''
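import json

import folium
from IPython.display import display

center_coords = [25.041415686746607, 121.61472689731077]  # Academia Sinica (lat, lon)
m = folium.Map(location=center_coords, zoom_start=12)

if len(geoInfo.strip()) == 0:
    print("please paste the geoJSON in the 'geoInfo' string.")
else:
    # json.loads parses the pasted GeoJSON without executing it (unlike eval)
    geojson = json.loads(geoInfo)
    folium.GeoJson(geojson).add_to(m)
    m.fit_bounds(m.get_bounds())  # zoom to the pasted geometry
    display(m)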
please paste the geoJSON in the 'geoInfo' string.
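GeoJSON stores positions as `[longitude, latitude]`, while folium's `location` and `fit_bounds` expect `[latitude, longitude]`. If you would rather center the map on the pasted geometry than on the fixed Academia Sinica coordinates, a small helper can compute the bounds with the axes swapped. This is a sketch assuming `geoInfo` holds a single geometry (Point, Polygon, or MultiPolygon) rather than a FeatureCollection; the function names are hypothetical.

```python
import json


def iter_positions(coords):
    """Yield every [lon, lat] pair from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def latlon_bounds(geo_info):
    """Return [[lat_min, lon_min], [lat_max, lon_max]] for a GeoJSON geometry string."""
    geometry = json.loads(geo_info)
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


def latlon_center(geo_info):
    """Midpoint of the bounding box, in folium's [lat, lon] order."""
    (lat_min, lon_min), (lat_max, lon_max) = latlon_bounds(geo_info)
    return [(lat_min + lat_max) / 2, (lon_min + lon_max) / 2]


point = '{"type": "Point", "coordinates": [121.6165385, 25.0428838]}'
print(latlon_bounds(point))  # [[25.0428838, 121.6165385], [25.0428838, 121.6165385]]
print(latlon_center(point))  # [25.0428838, 121.6165385]
```

The results could then be passed as `folium.Map(location=latlon_center(geoInfo), ...)` or as `m.fit_bounds(latlon_bounds(geoInfo))`.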