Here you can find resources that were either results of previous work or demonstrate it in one way or another. This page is updated occasionally.

=> datasets
=> applications
=> software & tools
=> vocabularies


Below you can find a list of publicly available datasets (mostly RDF/Linked Data) I have been involved with in the past. Additional resources can be found at

/ AFEL Data Catalog – provides a large collection of data in the area of online learning, aimed at facilitating research and development in areas such as learning analytics and web science. The data includes both behavioral/activity data of online users and resource-centric data and metadata, extracted and augmented from platforms such as Twitter, Bibsonomy, Wikipedia or general-purpose Web crawls. It has contributed to research within the AFEL project on “Analytics for Everyday Learning” and beyond.
[ Dataset ]

/ TweetsKB – is a public RDF corpus of anonymized data for a large collection of annotated tweets. The dataset currently contains data for more than 1.3 billion tweets, spanning a 4-year period (January 2013 – January 2017). Metadata information about the tweets as well as extracted entities, sentiments, hashtags and user mentions are exposed in RDF using established RDF/S vocabularies. For the sake of privacy, we encrypt the usernames and we do not provide the text of the tweets. However, through the tweet IDs, actual tweet content and further information can be fetched.
[ Dataset ]

/ LRMI Markup on the Web – provides a corpus of RDF statements involving LRMI terms, extracted from the Common Crawl (namely, the Web Data Commons). Data was extracted from the Web Data Commons markup corpus for three consecutive years (2013, 2014, 2015) by selecting all quads which co-occur in the same document as any LRMI term. The latest corpus (2015) contains more than 30 million RDF statements. As such, the corpus enables the investigation of LRMI usage on the Web, its distribution across domains and its evolution over time. An initial study on a preliminary and less comprehensive dataset is available here. The dataset is available as a dump and through a SPARQL endpoint (with a distinct graph for each year). The respective URIs and user credentials can be obtained on request via the email link below.
[ Dataset ]
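
The selection step described for the LRMI corpus (keep every quad from any document that contains at least one LRMI term) can be sketched roughly as follows. This is a minimal illustration, not the pipeline actually used: it assumes quads are already parsed into (subject, predicate, object, document URL) tuples, matches on the predicate's local name only, and uses a small, hypothetical subset of LRMI property names in `LRMI_TERMS`.

```python
from collections import defaultdict

# Illustrative subset of LRMI property names (assumption: the real
# extraction may use a longer list and may also match types/classes).
LRMI_TERMS = {
    "educationalAlignment", "educationalUse", "interactivityType",
    "learningResourceType", "timeRequired", "typicalAgeRange",
    "isBasedOnUrl", "useRightsUrl",
}


def local_name(uri):
    """Return the part of a URI after the last '/' or '#'."""
    return uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def select_lrmi_quads(quads):
    """Keep all quads of every document that uses at least one LRMI term.

    `quads` is an iterable of (subject, predicate, object, document_url).
    """
    # Group quads by the document (fourth element) they were extracted from.
    by_doc = defaultdict(list)
    for quad in quads:
        by_doc[quad[3]].append(quad)

    selected = []
    for doc_quads in by_doc.values():
        # A document qualifies if any of its predicates is an LRMI term.
        if any(local_name(p) in LRMI_TERMS for _, p, _, _ in doc_quads):
            selected.extend(doc_quads)
    return selected


if __name__ == "__main__":
    sample = [
        ("_:r1", "http://schema.org/name", "Intro to RDF", "http://a.example/page"),
        ("_:r1", "http://schema.org/learningResourceType", "lecture", "http://a.example/page"),
        ("_:r2", "http://schema.org/name", "Unrelated", "http://b.example/page"),
    ]
    for quad in select_lrmi_quads(sample):
        print(quad)
```

Grouping by document first keeps the non-LRMI statements that co-occur with LRMI markup, which is what makes it possible to study the context in which LRMI terms are used.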

/ – provides a unique collection of datasets from the architectural and smart cities domain, collected and generated as part of the DURAARK project. Datasets include 3D models of buildings and infrastructure (IFC models, point clouds/e57) obtained through laser scanning, as well as extracted geometric and semantic metadata about buildings, shapes/structures and their context. Data has been exposed through dedicated APIs and SPARQL endpoints. This unique collection has been nominated for the Dutch Data Prize 2016 in the technical/science category.
[ Data Catalog ]

/ Linked Education Catalog – provides additional metadata about educational datasets pulled from the Linked Education Cloud group at the DataHub. Metadata covers aspects such as type or topic coverage and is being expanded gradually as part of the LinkedUp project.
[ Data Catalog ]

/ L3S Research Datasets – provides a collection of research datasets and corpora currently hosted at L3S Research Center.
[ Data Catalog ]

/ TED Talks Linked Data – provides metadata and transcripts of all TED Talks according to Linked Data principles. The dataset is used by a number of applications.
[ Datahub Page ]

/ ACM Learning Analytics & Knowledge Corpus Linked Data – exposes structured metadata & full text from key research publications in the field of learning analytics and educational data mining. Data is available in a number of formats. ACM has granted special permission to use the metadata as well as the full text of the publications for research purposes.
[ Dataset Home ]

/ SmartLink Services Non-Functional Properties – exposes Linked Data about publicly available Web services and Web APIs. In particular, the SmartLink dataset extends existing service description schemas with a light-weight schema for non-functional properties.
[ Datahub Page ]

/ mEducator – Linked Educational Resources – exposes linked data about publicly available educational Web resources. A light-weight RDF schema is exploited together with clustering and enrichment techniques to gradually interlink educational resources with entities in established LOD datasets.
[ Datahub Page ]


/ Cite4Me
Cite4Me is a Web application that leverages Semantic Web technologies to provide a new perspective on search and retrieval of bibliographical data, in particular in connection with the LAK Dataset. The Web application focuses on: (i) semantic recommendation of papers; (ii) novel semantic search & retrieval of papers; (iii) data interlinking of bibliographical data with related data sources from LOD; (iv) innovative user interface design; and (v) sentiment analysis of extracted paper citations. For more information see our ISWC2013 demo paper, watch the video, or test the application.

[ Cite4Me ] [ Paper ]

/ DURAARK Workbench
The DURAARK Workbench is a tool suite for enriching and searching architectural and infrastructural data. Automatic means for analysing 3D models (such as point clouds) of buildings are deployed in order to enable enrichment, ingest and archival of 3D models. As part of the enrichment process, models are enriched with additional semantic metadata about buildings, detected shapes and the context of the respective structure, for instance, its legal, historical or geographical context. The workbench supports both the ingest of new models and the search and retrieval of previously ingested models, as described here. An online version is available here, while all source code and instructions for setting up local instances are shared on GitHub.


[ DURAARK Workbench ] [ Related Publications ]

/ SCS Connector
A key challenge of the Semantic Web lies in the creation of semantic links between Web resources. The creation of links serves as a means to semantically enrich Web resources, connecting disparate information sources and facilitating data reuse and sharing. As the amount of data on the Web is ever increasing, automated methods to unveil links between Web resources are required. In this paper, we introduce a tool, called SCS Connector, that assists users in uncovering links between entity pairs within and across datasets. SCS Connector provides a Web-based user interface and a RESTful API that enable users to interactively visualise and analyse paths between an entity pair (e_i, e_j) through known links that can reveal meaningful relationships between e_i and e_j according to a semantic connectivity score (SCS). The demo is available here and described in an ESWC2014 demo paper (Winner of Best ESWC2014 Demo Award) available here. Further details about the connectivity score can be found in our underlying ESWC2013 research paper.

[ Demo ] [ Paper ]
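
To give a feel for what a path-based connectivity score between an entity pair might look like, here is a rough Katz-style sketch. It is an assumption-laden illustration, not the SCS definition from the ESWC2013 paper: the damping factor `beta`, the maximum path length `max_len`, the normalisation `1 - 1/(1 + x)`, and both function names are hypothetical choices made for this example.

```python
def count_paths_by_length(graph, source, target, max_len):
    """Count simple paths from source to target, grouped by path length.

    `graph` maps each entity to an iterable of linked entities.
    Returns a dict {length: number_of_paths} for lengths 1..max_len.
    """
    counts = {}
    # Each stack entry: (current node, nodes visited so far, path length).
    stack = [(source, frozenset([source]), 0)]
    while stack:
        node, visited, length = stack.pop()
        if node == target and length > 0:
            counts[length] = counts.get(length, 0) + 1
            continue  # do not extend a path beyond the target
        if length == max_len:
            continue  # path budget exhausted
        for nxt in graph.get(node, ()):
            if nxt not in visited:
                stack.append((nxt, visited | {nxt}, length + 1))
    return counts


def connectivity_score(graph, source, target, max_len=3, beta=0.5):
    """Katz-style score: shorter paths and more paths give a higher value.

    The result lies in [0, 1); 0 means no path within max_len was found.
    """
    counts = count_paths_by_length(graph, source, target, max_len)
    weighted = sum(beta ** length * n for length, n in counts.items())
    return 1.0 - 1.0 / (1.0 + weighted)


if __name__ == "__main__":
    # Tiny undirected example graph, written as symmetric adjacency lists.
    links = {
        "a": ["b", "c"],
        "b": ["a", "c"],
        "c": ["a", "b"],
        "d": [],
    }
    print(connectivity_score(links, "a", "c"))
    print(connectivity_score(links, "a", "d"))
```

Damping longer paths with `beta ** length` reflects the intuition that a direct link says more about a relationship than a long chain of intermediate entities.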

/ MetaMorphosis+
Discovery of educational resources across the Web is a challenging task. The existing landscape of open educational resources consists of a multitude of distinct metadata schemas and interface mechanisms. MetaMorphosis+ is a social semantic network-based Web application, which exploits Linked Data principles to give tutors and learners access to a wide variety of educational resources from across the Web. A dedicated API on top of SmartLink is used to enable Web-wide search in repositories such as OpenLearn, while a dedicated Linked Educational Resources RDF store is used to provide enriched and well-described Linked Data about available resources. Educational data is automatically enriched with related knowledge from DBpedia and Bioportal, and clustering techniques are used to enable users to navigate educational content in an explorative way. Being a social network of educational resources and actors, MetaMorphosis+ also provides comprehensive social capabilities.

[ MetaMorphosis+ ] [ Paper ]

/ SugarTube
Searching for information, data, and multimedia resources that are semantically related is a key feature of the future Internet. Today’s path to achieving that vision is based on the Linked Open Data (LOD) cloud of RDF data. SugarTube (Semantics Used to Get Annotated video Recording) is a Web 3.0 application for searching RDF-annotated videos stored as part of the Open University Broadcast Unit’s learning material. The fundamental technology used to develop the application is Semantic Web Services. Users can search based on keywords, textual analysis of related documents, URLs, or geographical maps. Moreover, SugarTube gathers related data and knowledge from the LOD cloud to enrich the search results, such as related events, people, knowledge, websites, geo-location, maps, and additional video streams from YouTube, the BBC, and OpenLearn.

[ Website ] [ SugarTube ]

/ NoTube: semantics for distributed multimedia search on the Web
The following movie shows a prototypical application of our approach to similarity-based Semantic Web Service discovery and mediation. For further documentation, please have a look at the corresponding publications. The application uses our approach for similarity-based matchmaking of Web services to retrieve video content metadata and was produced within the context of EU IP NoTube.

[ Website ]

/ LUISA: Semantic Web Services-based composition of eLearning resources
The following two movies demonstrate our Semantic Web Service-based approach for context-aware provisioning of eLearning resources (movie.3) and one application for mobile and context-based information delivery (movie.4). The presented software prototypes were among the outcomes of the EU STREP LUISA. For further documentation, have a look at the corresponding publications.

[ Website ]


/ SmartLink
SmartLink is short for “SeMantic Annotation enviRonmenT for Linked Services”. Simply put, it is an easy-to-use Web application aiding users in the creation of Linked Services – semantic service annotations following Linked Data principles. Amongst other things, it provides an interface to populate and query the Linked Services repository iServe. SmartLink builds on existing technologies and standards to enable wide reach of its annotations. Users can annotate arbitrary services – whether RESTful or WSDL/SOAP-based – via a simple Web form. Annotations are stored in RDF following established service schemas, namely WSMO-Lite and the Minimal Service Model (MSM), which follow a light-weight approach to Semantic Web Services. Storage of annotations is spread across two public RDF stores: iServe handles all functional properties defined in the MSM schema, while an additional, SmartLink-specific SESAME repository hosts further non-functional service properties. SmartLink provides a unified interface to store and query annotations across both repositories.
[ Website ]

/ Internet Reasoning Service IRS-III
The Internet Reasoning Service (IRS) has the overall aim of supporting the automated or semi-automated construction of semantically enhanced systems over the Internet. Epistemologically, the IRS is based on the decomposition of a system’s expertise into tasks, methods, domains and applications, usually referred to as the TMDA framework. This framework, mainly influenced by extensive research on Problem-Solving Methods, was further extended in IRS-I to support the creation of knowledge-intensive systems structured according to the UPML framework. IRS-II continued this approach and integrated the UPML framework with Web Service technology so as to benefit from the reasoning infrastructure over the Web. Finally, the current version of the IRS, namely IRS-III, has incorporated and extended the WSMO ontology so that the implemented infrastructure allows the description, publication and execution of Semantic Web Services.
[ Website ]


Below you can find a list of (mostly RDF/OWL) vocabularies which are used in the above applications and are developed as a community process.

/ LRMI – Learning Resources Metadata Initiative – is an extension to for annotating learning resources, i.e. creative works such as books, movies or articles which are used in a learning context. The development currently takes place through a DCMI Task Force in which I participate. Further documentation can be found at and a preliminary study of LRMI adoption is available here.
[ spec ]

/ Vocabulary of Links – is a work-in-progress RDF schema for expressing metadata about links between resources & entities within and across RDF datasets. It is used as part of ongoing research to further describe the provenance and process behind automatically generated entity links and is intended to be used, for instance, to further describe VoID linksets. Further documentation can be found at
[ file ]

/ Linked Education Schema – is a work-in-progress RDF schema for cataloging and aligning educationally relevant Web resources and their schemas. It is used as part of the LinkedUp project to categorise Web resources and datasets.
[ file ]

/ SmartLink Services Non-Functional Properties – is a simple RDF schema used by SmartLink to capture non-functional properties (NfP) of services and Web APIs.
[ file ]

/ mEducator Educational Resources – was designed as an RDF schema to capture Linked Data-compliant metadata for educational Web resources and is used by a number of data stores and the MetaMorphosis+ application.
[ file ]

Please find below a set of ontologies produced with the Operational Conceptual Modelling Language (OCML), using IRS-III as the editing and reasoning environment. Alternatively, RDF versions are available on request.

/ Conceptual Spaces Ontology, CSO – enables the refinement of symbolic ontology concepts and instances following the Conceptual Spaces theory. Further documentation here.
[ file ]

/ Situation-Driven Processes Ontology, SDPO – enables the incorporation of Semantic Web Service orchestrations as situation-driven process sequences. Partially based on DOLCE D&S. Further documentation here and here.
[ file ]

/ Learning Process Modelling Ontology, LPMO – adapts Situation-Driven Processes (please see above) to the needs of the eLearning domain. As such, it provides the expressivity to describe learning contexts and processes which are aligned to Semantic Web Services in terms of WSMO. Further documentation here.
[ file ]
