Project Result#
Now let's combine the two parts of the project: Wikidata keyword recommendation and geographic information lookup. The result is an interactive interface where you enter information about your dataset and receive suggested Wikidata keywords together with candidate locations that best describe the dataset.
Tip
To obtain Wikidata keywords and geographic information for your dataset, follow these steps:
1. Click the “rocket icon” in the top-right corner.
2. Select the option labeled Live Code from the menu.
3. Once the environment has launched, you can run each code cell manually.
4. For any hidden code cell, click Show code cell source and then click run in that cell.
Here is the information you may enter:
title: the title of your dataset
description: a description of your dataset
resource_names: the file names in your dataset
resource_descriptions: descriptions of the files in your dataset
organization_title: the title of the affiliated organization
organization_description: a description of the affiliated organization
Warning
At least one of the fields must be completed. Leaving all of them empty is not permitted.
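The rule above amounts to a truthiness check over the six fields: empty strings and empty lists both count as "not completed". A minimal sketch of that check follows; the helper name `has_any_input` is hypothetical and not part of the notebook.

```python
def has_any_input(*fields):
    """Return True if at least one field is a non-empty string or list."""
    return any(bool(field) for field in fields)


# All six fields empty: the input is rejected.
print(has_any_input("", "", [], [], "", ""))        # False

# Only the title is filled in: that is enough.
print(has_any_input("中研院", "", [], [], "", ""))  # True
```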
Function Definitions#
Expand the hidden code cell below and click run.
# Package imports =============================================================
import json

import requests

# NLP task model
from ckip_transformers.nlp import CkipNerChunker

ner_driver = CkipNerChunker(model="bert-base")


# Function definitions ========================================================
def wiki_search(search_term):
    """Search Wikidata for search_term and print the matching entities."""
    url = "https://www.wikidata.org/w/api.php"
    # Passing the query through `params` lets requests URL-encode the search term
    params = {
        "action": "wbsearchentities",
        "format": "json",
        "search": search_term,
        "language": "zh",
    }
    response = requests.get(url, params=params)
    data = response.json()
    # Organize the response
    if data.get("search"):
        for result in data["search"]:
            qid = result["id"]
            label = result["label"]
            description = result.get("description", "No description available")
            print(f"QID: {qid}, Label: {label}, Description: {description}")
    else:
        print("No results found.")


def search_osm_place(query):
    """Query Nominatim (OpenStreetMap); return the matching places, or None on failure."""
    base_url = "https://nominatim.openstreetmap.org/search"
    params = {
        "q": query,
        "format": "json",
        "polygon_geojson": "1",  # Request GeoJSON polygons
        "limit": 7,
    }
    response = requests.get(base_url, params=params)
    if response.status_code == 200:
        return response.json()
    return None


def make_keyword_map(input_list):
    """Run NER on the inputs and map each kept entity word to its NER class."""
    # NER task
    ner = ner_driver(input_list)
    # Build keyword_map to store potential keywords
    avoid_class = ['QUANTITY', 'CARDINAL', 'DATE', 'ORDINAL']
    keyword_map = {}
    for sentence_ner in ner:
        for entity in sentence_ner:
            # entity[0] is the word, entity[1] is its NER class
            if entity[1] in avoid_class:
                continue
            keyword_map[entity[0]] = entity[1]
    return keyword_map


def gen_keyword(title, description, resource_names, resource_descriptions,
                organization_title, organization_description):
    """Return the keyword map for the dataset fields, or -1 if every field is empty."""
    input_list = [title, description, resource_names, resource_descriptions,
                  organization_title, organization_description]
    if all(not item for item in input_list):
        return -1
    return make_keyword_map(input_list)


def wiki_output(result):
    """Print Wikidata candidates for every keyword in result."""
    if result == -1:
        print("At least one of the fields should be completed. Leaving all of them empty is not permissible.")
        return
    for item in result:
        print(item)
        wiki_search(item)
        print('-------------------------------------------')


def geoInfo_output(result):
    """Print OSM places and their GeoJSON geometry for every keyword in result."""
    if result == -1:
        print("At least one of the fields should be completed. Leaving all of them empty is not permissible.")
        return
    for item in result:
        respond = search_osm_place(item)
        if respond:
            print(f"OSM result for {item} is:")
            for place in respond:
                print("📍", place["display_name"])
                # Serialize as JSON so the output can be pasted back as valid GeoJSON
                print(json.dumps(place["geojson"]))
            print('-------------------------------------------')
        else:
            print("No geoInfo provided.")
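To see which entities the keyword map keeps and which it drops without loading the CKIP model, the filtering step can be sketched on hand-written entities. The sample pairs below are hypothetical stand-ins for the (word, NER class) pairs the NER driver returns, and `filter_entities` is an illustrative name; only the filtering logic mirrors the cell above.

```python
# Entity classes that rarely make useful keywords (same list as in the cell above).
AVOID_CLASSES = {"QUANTITY", "CARDINAL", "DATE", "ORDINAL"}


def filter_entities(sentences):
    """Map each kept entity word to its NER class, skipping the avoided classes."""
    keyword_map = {}
    for sentence in sentences:
        for word, ner_class in sentence:
            if ner_class in AVOID_CLASSES:
                continue
            keyword_map[word] = ner_class
    return keyword_map


# Hypothetical NER output for two input strings (not produced by the real model).
sample_ner = [
    [("中研院", "ORG"), ("2023年", "DATE")],
    [("臺北市", "GPE"), ("三", "CARDINAL")],
]

print(filter_entities(sample_ner))  # {'中研院': 'ORG', '臺北市': 'GPE'}
```

Because the result is a dictionary keyed by word, an entity that appears in several fields is searched only once.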
Input#
✨ You can enter the information for your own dataset here:
title = '中研院'
description = ''
resource_names = []
resource_descriptions = []
organization_title = ""
organization_description = ""
result = gen_keyword(title, description, resource_names, resource_descriptions, organization_title, organization_description)
Tokenization: 0%| | 0/6 [00:00<?, ?it/s]
Tokenization: 100%|██████████| 6/6 [00:00<00:00, 64362.72it/s]
Inference: 0%| | 0/1 [00:00<?, ?it/s]
Inference: 100%|██████████| 1/1 [00:00<00:00, 5.09it/s]
Inference: 100%|██████████| 1/1 [00:00<00:00, 5.06it/s]
Result#
Wikidata Keyword Recommendation:#
wiki_output(result)
中研院
QID: Q337266, Label: Academia Sinica, Description: national academy of Taiwan
QID: Q20872478, Label: Biodiversity Research Center, Academia Sinica, Description: organization
QID: Q20872470, Label: Institute of Information Science, Academia Sinica, Description: organization
QID: Q10875102, Label: Member of Academia Sinica, Description: academic award in Taiwan
QID: Q10899637, Label: Hung En Liu, Description: No description available
QID: Q93696545, Label: Institute of Astronomy & Astrophysics Llbrary, Academic Sinica, Description: No description available
QID: Q120579929, Label: Reasons for the Failure of the Program for the Foundation of a National Metrology System of Time in the Academia Sinica in 1930s, Description: No description available
-------------------------------------------
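The printed list is convenient to read, but returning the candidates as data makes it possible to filter them, for example to drop entries that have no description. The sketch below works on a hand-built dictionary shaped like a `wbsearchentities` response, using two of the entries shown above; `extract_candidates` is a hypothetical helper, not part of the notebook.

```python
NO_DESCRIPTION = "No description available"


def extract_candidates(data):
    """Return (qid, label, description) tuples from a wbsearchentities-style response."""
    return [
        (hit["id"], hit["label"], hit.get("description", NO_DESCRIPTION))
        for hit in data.get("search", [])
    ]


# Hand-built sample; the real response contains more fields per entry.
sample_response = {
    "search": [
        {"id": "Q337266", "label": "Academia Sinica",
         "description": "national academy of Taiwan"},
        {"id": "Q10899637", "label": "Hung En Liu"},
    ]
}

candidates = extract_candidates(sample_response)
# Keep only the candidates that come with a description.
described = [c for c in candidates if c[2] != NO_DESCRIPTION]
print(described)
```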
Geographic Information Recommendation:#
geoInfo_output(result)
OSM result for 中研院 is:
📍 中央研究院, 128, 研究院路二段, 中研里, 南港區, 舊莊, 臺北市, 11529, 臺灣
{"type": "Polygon", "coordinates": [[[121.6098139, 25.0423357], [121.6099928, 25.0422723], [121.6101471, 25.0424388], [121.6102986, 25.0423416], [121.6114298, 25.042353], [121.611598, 25.0422362], [121.6111609, 25.0417884], [121.6110376, 25.0415609], [121.6108531, 25.0413988], [121.6108391, 25.0412591], [121.6108483, 25.0411745], [121.6108213, 25.0409236], [121.6109215, 25.0406827], [121.6109234, 25.0405608], [121.6111356, 25.0395975], [121.6112723, 25.0394586], [121.6119245, 25.0387853], [121.612849, 25.0395204], [121.6132347, 25.0391222], [121.6145719, 25.0394481], [121.6153276, 25.0386336], [121.6154951, 25.0386861], [121.6153522, 25.0390601], [121.6172567, 25.0388414], [121.6174233, 25.0400362], [121.6174569, 25.0403346], [121.6174815, 25.0406155], [121.6174912, 25.0407662], [121.6167943, 25.0416525], [121.6166898, 25.0419514], [121.6166698, 25.0420211], [121.6166611, 25.0420439], [121.616643, 25.0421047], [121.6166444, 25.0421754], [121.6167759, 25.0423978], [121.6167836, 25.042444], [121.6160597, 25.0438771], [121.6159596, 25.0440789], [121.6153441, 25.043857], [121.6149866, 25.0437405], [121.6146199, 25.0446663], [121.6145301, 25.0446373], [121.6141955, 25.0454896], [121.6137999, 25.0453621], [121.6124401, 25.0453351], [121.6124939, 25.0430158], [121.6123115, 25.0430126], [121.6122654, 25.043012], [121.6122424, 25.0429623], [121.6116596, 25.0429193], [121.6114791, 25.0429102], [121.6100696, 25.0428747], [121.6100116, 25.0425081], [121.6098547, 25.0424023], [121.6098139, 25.0423357]]]}
📍 中研院, 128, 研究院路二段, 中研里, 南港區, 舊莊, 臺北市, 11529, 臺灣
{"type": "Point", "coordinates": [121.6165385, 25.0428838]}
📍 中研院, 研究院路二段, 中研里, 南港區, 三重埔, 臺北市, 11564, 臺灣
{"type": "Point", "coordinates": [121.6165938, 25.043366]}
📍 中研院, 工农大路, 红旗街, 红旗街道, 朝阳区, 长春市, 吉林省, 130000, 中国
{"type": "Polygon", "coordinates": [[[125.2966103, 43.8635572], [125.2975544, 43.8628417], [125.2983966, 43.8634218], [125.297431, 43.8641219], [125.2966103, 43.8635572]]]}
-------------------------------------------
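When several places match, a bounding box is a quick way to compare them before previewing one. The sketch below computes it for two of the geometries printed above; `iter_positions` and `bounding_box` are hypothetical helpers. Note that GeoJSON positions are ordered [longitude, latitude], whereas a folium map center is given as [latitude, longitude].

```python
def iter_positions(coords):
    """Yield every [lon, lat] pair from arbitrarily nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        yield coords
    else:
        for part in coords:
            yield from iter_positions(part)


def bounding_box(geometry):
    """Return (min_lon, min_lat, max_lon, max_lat) for a GeoJSON geometry."""
    positions = list(iter_positions(geometry["coordinates"]))
    lons = [p[0] for p in positions]
    lats = [p[1] for p in positions]
    return min(lons), min(lats), max(lons), max(lats)


# Two geometries copied from the output above.
polygon = {"type": "Polygon", "coordinates": [[
    [125.2966103, 43.8635572], [125.2975544, 43.8628417],
    [125.2983966, 43.8634218], [125.297431, 43.8641219],
    [125.2966103, 43.8635572],
]]}
point = {"type": "Point", "coordinates": [121.6165385, 25.0428838]}

print(bounding_box(polygon))  # (125.2966103, 43.8628417, 125.2983966, 43.8641219)
print(bounding_box(point))    # (121.6165385, 25.0428838, 121.6165385, 25.0428838)
```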
Preview the Location in OSM#
Select one of the locations above, copy its GeoJSON, and paste it into the cell below:
geoInfo = ''
import json

import folium

center_coords = [25.041415686746607, 121.61472689731077]  # Academia Sinica
m = folium.Map(location=center_coords, zoom_start=12)
if len(geoInfo) == 0:
    print("please paste the geoJSON in the 'geoInfo' string.")
else:
    # Parse the pasted text as JSON instead of evaluating it as Python code
    geojson = json.loads(geoInfo)
    folium.GeoJson(geojson).add_to(m)
    m.fit_bounds(m.get_bounds())
display(m)
please paste the geoJSON in the 'geoInfo' string.
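Text pasted by hand is easy to truncate, so it can help to check the string before handing it to folium. The following is a minimal sketch, assuming the pasted text is a single GeoJSON geometry such as the Point or Polygon objects printed earlier; `parse_geometry` is a hypothetical helper, not part of the notebook.

```python
import json


def parse_geometry(text):
    """Parse pasted text into a GeoJSON geometry dict, or raise ValueError."""
    try:
        geometry = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if (not isinstance(geometry, dict)
            or "type" not in geometry
            or "coordinates" not in geometry):
        raise ValueError("expected a GeoJSON geometry with 'type' and 'coordinates'")
    return geometry


# A geometry copied from the output above parses cleanly.
pasted = '{"type": "Point", "coordinates": [121.6165385, 25.0428838]}'
print(parse_geometry(pasted)["type"])  # Point

# A truncated paste is reported instead of failing later inside the map code.
try:
    parse_geometry('{"type": "Point", "coordinates": [121.61')
except ValueError as err:
    print("Rejected:", err)
```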