This is a selection of projects I have worked on over the past years.
i-EKBase - Intelligent Environmental Knowledgebase
The i-EKBase system is designed to monitor large farming areas using remote sensing data. The main source of input is data from the Landsat and MODIS satellites. i-EKBase integrates this remote sensing data with various other sources (e.g., Bureau of Meteorology weather observations and Australian Soil Data). Farmers are provided with information and guidance on local biodiversity, soil quality, water availability, irrigation, topography, and early pest and plant disease prevention to improve crop yield management.
Semantic Sensor Data
Semantic enrichment of sensor data addresses the problems of (re-)use, integration, and discovery. A critical issue is how to generate semantic sensor data from existing data sources. In this project, we developed an approach to semantically augment an existing sensor data infrastructure to re-publish the data as Linked Open Data. In our use case we show how semantic sensor data can help with the growing challenge of selecting sensors that are fit for purpose.
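As a rough illustration of what re-publishing a sensor observation as Linked Open Data can look like, the following sketch maps one JSON record to N-Triples. It is not the project's actual pipeline: the use of the W3C SOSA vocabulary, the `http://example.org/` base URI, the record layout, and the function name `observation_to_ntriples` are all assumptions made for this example.

```python
import json
from typing import List

# Assumed vocabulary and base URI; the project's real mapping may differ.
SOSA = "http://www.w3.org/ns/sosa/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/"  # hypothetical namespace for the published data


def observation_to_ntriples(record: dict) -> List[str]:
    """Map one (hypothetical) JSON observation record to N-Triples lines."""
    obs = f"<{BASE}observation/{record['id']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{SOSA}Observation> .",
        f"{obs} <{SOSA}madeBySensor> <{BASE}sensor/{record['sensor']}> .",
        f"{obs} <{SOSA}observedProperty> <{BASE}property/{record['property']}> .",
        f'{obs} <{SOSA}hasSimpleResult> "{record["value"]}"^^<{XSD}double> .',
        f'{obs} <{SOSA}resultTime> "{record["time"]}"^^<{XSD}dateTime> .',
    ]


# Made-up record standing in for the output of an existing RESTful sensor service.
raw = (
    '{"id": "42", "sensor": "rain-gauge-7", "property": "rainfall", '
    '"value": 1.5, "time": "2013-01-01T00:00:00Z"}'
)
triples = observation_to_ntriples(json.loads(raw))
print("\n".join(triples))
```

Once observations are available in this form, a query can select sensors by observed property or other metadata, which is the kind of fit-for-purpose selection mentioned above.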
Link My Data
One of the biggest obstacles to the reuse of third-party sensor data is a lack of knowledge about data properties (e.g., provenance and quality), which leads to a lack of trust in the data. Link My Data is a first step towards overcoming this problem. It provides a platform for data curation that allows users to share knowledge about individual sensors and sensor observations. The system supports annotation and transformation of sensor data on the Web to improve data quality and (re-)usability.
South Esk Hydrological Sensor Web
Limited freshwater resources in many parts of Australia have led to a highly regulated system of water allocation. Poor situation awareness can result in over-extraction of water from river systems, compromising river ecosystems. To increase situation awareness, we developed a continuous flow forecasting system based on the Open Geospatial Consortium Sensor Web Enablement standards. A prototype Hydrological Sensor Web has been established in the South Esk river catchment in north-eastern Tasmania. Observations from the aggregated sensor assets drive a rainfall-runoff model that predicts river flows at key monitoring points in the catchment.
The South Esk Hydrological Sensor Web was our test-bed for research on management and re-use of sensor observations and sensor metadata. I was involved in developing a provenance management system for the continuous flow forecasting system. The generation of predicted river flows involves complex interactions between instruments, simulation models, computational facilities and data providers. Correct interpretation of information produced at various stages of the information life-cycle requires detailed knowledge of data creation and transformation processes. Such provenance information allows hydrologists and decision-makers to make sound judgments about the trustworthiness of hydrological information.
This project won the Asia Pacific ICT (APICTA) Award for Sustainability and Green IT, 2012 and the Australian iAward for Green IT and Sustainability, 2011.
DBWiki
Both relational databases and wikis have strengths and weaknesses for use in collaborative data management and data curation. Relational databases offer many advantages such as scalability, query optimization and concurrency control, but are not easy to use and lack other features needed for collaboration. Wikis have proved enormously successful as a means to collaborate because they are easy to use, encourage sharing, and provide built-in support for archiving, history-tracking and annotation. However, wikis lack support for structured data, efficient querying at scale, and fine-grained data provenance. To achieve the best of both worlds, we implemented a general-purpose platform for collaborative data management, called DBWiki. Our system not only facilitates the collaborative creation of a structured database; it also provides features not usually provided by database technology such as versioning, provenance tracking, citability, and annotation.
XArch - XML Archiver
XArch is an archive management system that allows one to create, populate, and query archives of multiple database versions. XArch is based on a nested merge approach that efficiently stores multiple database versions in a compact archive. The system allows one to create new archives, to merge new versions of data into existing archives, and to execute both snapshot and temporal queries using a declarative query language. XArch has an extensible IO layer and is currently capable of archiving data in XML format as well as relational databases.
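The idea behind a nested merge can be sketched in a few lines of Python. This is a simplified illustration, not XArch's implementation: it assumes each version is a nested dictionary with scalar leaves, that dictionary keys identify elements across versions, and that versions are numbered with integers. The names `ArchiveNode`, `merge`, `snapshot`, and `history` are hypothetical, and the sample data is made up.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class ArchiveNode:
    """One keyed element of the archive, shared by all versions that contain it."""
    versions: Set[int] = field(default_factory=set)                   # versions in which the element exists
    children: Dict[str, "ArchiveNode"] = field(default_factory=dict)  # keyed sub-elements
    values: Dict[Any, Set[int]] = field(default_factory=dict)         # leaf value -> versions holding it


def merge(node: ArchiveNode, data: Any, version: int) -> None:
    """Merge one version of the data into the archive, in place."""
    node.versions.add(version)
    if isinstance(data, dict):
        for key, child in data.items():
            merge(node.children.setdefault(key, ArchiveNode()), child, version)
    else:
        # An unchanged value is stored once; only its version set grows.
        node.values.setdefault(data, set()).add(version)


def snapshot(node: ArchiveNode, version: int) -> Any:
    """Snapshot query: reconstruct the data as it was in the given version."""
    for value, versions in node.values.items():
        if version in versions:
            return value
    return {
        key: snapshot(child, version)
        for key, child in node.children.items()
        if version in child.versions
    }


def history(node: ArchiveNode, *path: str) -> Dict[Any, list]:
    """Temporal query: every value of one element with the versions holding it."""
    for key in path:
        node = node.children[key]
    return {value: sorted(versions) for value, versions in node.values.items()}


# Made-up sample data: two versions that differ in a single value.
archive = ArchiveNode()
merge(archive, {"Iceland": {"capital": "Reykjavik", "population": 300000}}, 1)
merge(archive, {"Iceland": {"capital": "Reykjavik", "population": 320000}}, 2)

print(snapshot(archive, 1))
print(history(archive, "Iceland", "population"))
```

Because elements that do not change between versions are stored only once, the archive grows with the amount of change rather than with the number of versions, which is what makes the representation compact.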
This is a list of selected publications.
A Use Case in Semantic Modelling and Ranking for the Sensor Web
International Semantic Web Conference (ISWC), 2014
Towards Content-Aware SPARQL Query Caching for Semantic Web Applications
Web Information Systems Engineering (WISE), 2013
From RESTful to SPARQL: A Case Study on Generating Semantic Sensor Data
SSN@ISWC 2013: 51-66
Link My Data: Community-based Curation of Environmental Sensor Data
Intl. Symposium on Spatial and Temporal Databases (SSTD), Demo Track, 2013
Discovering conditional inclusion dependencies
ACM Conf. on Information and Knowledge Management (CIKM), 2012
Improving data quality by source analysis
ACM J. Data and Information Quality, Vol. 2, Issue 4, March 2012
The Database Wiki Project: A General-Purpose Platform for Data Curation and Collaboration
SIGMOD Record, Vol. 40, No. 3, September 2011
Using Links to prototype a Database Wiki
Symposium on Database Programming Languages (DBPL), Seattle, WA, 2011
DBWiki: A Structured Wiki for Curated Data and Collaborative Data Management
ACM International Conference on Management of Data (SIGMOD), Demo Track, 2011
Detecting Inconsistencies in Distributed Data
IEEE International Conference on Data Engineering (ICDE), 2010
Curating the CIA World Factbook
International Journal of Digital Curation, Issue 3, Volume 4, 2009
XArch: Archiving Scientific and Reference Data
ACM International Conference on Management of Data (SIGMOD), Demo Track, 2008
Sorting Hierarchical Data in External Memory for Archiving
Proceedings of the VLDB Endowment, Volume 1, Issue 1, 2008
Describing Differences between Databases
ACM Conf. on Information and Knowledge Management (CIKM), 2006
Columba: An Integrated Database of Proteins, Structures, and Annotations
BMC Bioinformatics, 6(1):8, 2005
Mining for Patterns in Contradictory Data
ACM Workshop on Information Quality for Information Systems, 2004
COLUMBA: Multidimensional Data Integration of Protein Annotations
Workshop on Data Integration in Life Sciences (DILS), 2004
Data Quality in Genome Databases
Proceedings of the Conference on Information Quality (IQ), 2003
Problems, Methods, and Challenges in Comprehensive Data Cleansing
HUB-IB-164, Humboldt University Berlin, 2003