2017.5.2 Dr. Wu Dingming: Queries and Analysis Tasks on Semantically Rich Spatial Data

Academic Seminars

Published: 2017-05-02

Title: Queries and analysis tasks on semantically rich spatial data

Speaker: Dr. Wu Dingming (吴定明)

Host: Lin Xin (林欣)

Time: May 2, 2017, 13:30-14:30

Venue: Room B714, Science Building, Zhongbei Campus

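About the speaker:

Dr. Wu Dingming is an assistant professor at the College of Computer Science, Shenzhen University, and received a PhD from Aalborg University, Denmark, in 2011 under the supervision of Christian S. Jensen, a prominent scholar in the database field. Dr. Wu has published more than 10 papers in top international journals and conferences, including VLDB, TKDE, TODS, and ICDE, and was selected for the Shenzhen Peacock Talent Program. Current research interests: database systems, data models, query languages, query and update processing, and data indexing and mining algorithms.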

Abstract:

Semantically rich spatial data are big and ubiquitous, raising challenges with respect to their effective and efficient querying and analysis. In particular, traditional spatial analysis and querying methods are not readily applicable due to the increased data complexity. Toward addressing these challenges and supporting real-life applications that manage such data, three problems on the querying and analysis of (i) geo-social network data, (ii) spatio-textual data, and (iii) spatial RDF data will be covered in this talk.

First, I will introduce the problem of Density-based Clustering of Places in Geo-Social networks (DCPGS). Current spatial clustering models disregard information about the people who are related to the clustered places. We extend the density-based clustering paradigm to places in geo-social networks, considering both the spatial information between places and the social relationships between users who visit the places. After formally defining our model and the distance measure it relies on, we present efficient index-based algorithms for its implementation. We evaluate the effectiveness of our model via a case study and two quantitative measures, called social entropy and community score, which indicate that geo-social clusters have special properties and cannot be found by applying simple spatial clustering approaches. The efficiency of our algorithms is also evaluated experimentally.

Next, I will present the modeling and evaluation of a Spatio-Textual Skyline (STS) query, in which the skyline points are selected based not only on their distances to a set of query locations, but also on their relevance to a set of query keywords. STS is especially relevant to modern applications, where points of interest are typically augmented with textual descriptions. We investigate three models for integrating textual relevance into the spatial skyline. Among them, model STD, which combines spatial distance with textual relevance in a derived dimensional space, is the most effective. STD computes a skyline that satisfies the intent of STS and is small and easy to interpret. We propose an IR-tree-based algorithm for computing STD-based skylines. The effectiveness of our STD model and the efficiency of the algorithm are evaluated experimentally.

Finally, I will talk about the problem of top-k relevant Semantic Place retrieval (kSP) on spatial RDF data, which finds applications in domains such as journalism, health, business, and tourism. Traditionally, RDF data is accessed through structured query languages, e.g., SPARQL. This requires users to understand both the language and the RDF schema. Recent research on keyword search over RDF data aims at reducing such requirements, but still ignores the spatial dimension of RDF data. A kSP query seeks RDF subgraphs that are rooted at spatial entities close to the query location and that contain a set of query keywords. Compared to existing work, kSP queries are independent of structured query languages and are location-aware. We devise a basic method for processing kSP queries. Two pruning approaches and a preprocessing technique are proposed to further improve efficiency. Experiments on real datasets demonstrate the superior and robust performance of our proposals compared to the basic method.
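For readers unfamiliar with skyline queries, the idea behind the STS part of the abstract can be illustrated with a minimal Python sketch. It is not the STD model or the IR-tree-based algorithm from the talk: it naively treats textual irrelevance as one extra dimension next to the distances to each query location, and computes the skyline by brute-force dominance checks. The Jaccard-based relevance score, the data layout, and all names (`POIS`, `cost_vector`, `spatio_textual_skyline`) are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+

# Hypothetical points of interest: a location plus descriptive keywords.
POIS = [
    {"name": "A", "loc": (1.0, 0.0), "words": {"coffee", "wifi"}},
    {"name": "B", "loc": (1.0, 3.0), "words": {"coffee", "wifi"}},
    {"name": "C", "loc": (0.0, 0.5), "words": {"tea"}},
    {"name": "D", "loc": (1.0, 3.0), "words": {"tea"}},
]
QUERY_LOCS = [(0.0, 0.0), (2.0, 0.0)]
QUERY_WORDS = {"coffee", "wifi"}


def jaccard(a, b):
    """Keyword overlap in [0, 1]; an assumed stand-in for textual relevance."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cost_vector(poi, query_locs, query_words):
    """One distance per query location, plus textual irrelevance.

    Every component is a cost, so smaller is better in all dimensions.
    """
    distances = [dist(poi["loc"], q) for q in query_locs]
    return distances + [1.0 - jaccard(poi["words"], query_words)]


def dominates(u, v):
    """u dominates v if it is no worse everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


def spatio_textual_skyline(pois, query_locs, query_words):
    """Return the names of all points not dominated by any other point."""
    costs = {p["name"]: cost_vector(p, query_locs, query_words) for p in pois}
    return [
        p["name"]
        for p in pois
        if not any(dominates(costs[o["name"]], costs[p["name"]]) for o in pois)
    ]


print(spatio_textual_skyline(POIS, QUERY_LOCS, QUERY_WORDS))
```

In this toy data, A is close to both query locations and matches every keyword, so it dominates B and D. C survives despite matching no keyword because it is closer than A to the first query location. Keeping such textually irrelevant points is one plausible reason a naive extra dimension can be unsatisfying, though the abstract itself does not spell out how the three investigated models differ.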

Science Building, 3663 Zhongshan North Road, 200062