My research concerns the use and development of artificial intelligence, knowledge extraction and knowledge discovery techniques in Web mining and data integration scenarios across a range of application domains. Below is a list of selected research & teaching activities I am or have been involved in.

=> current projects
=> previous projects
=> teaching


/ CURRENT PROJECTS

Established information retrieval approaches address the relevance of search results to an information need, whereas the actual learning scope of a user is usually disregarded. Recent research in the search-as-learning area has recognized the importance of learning scopes and focused on observing and detecting learning needs. However, it has often been restricted to limited and isolated feature sets or specific learning tasks. High-dimensional feature spaces, (audio-)visual information, and the generalizability of previous work to support various learning needs through retrieval, ranking, and recommendations have not yet been investigated. The interdisciplinary Leibniz project SALIENT – joining expertise from computer science and psychology – aims at closing this gap by researching methods to improve retrieval performance and individuals’ learning through (a) accurate detection and prediction of learning needs and knowledge gains during search by means of query logs, navigation logs, eye-tracking and thinking-aloud data, which serve as a basis for (b) supporting users in their learning tasks through an enhanced retrieval and ranking process, as well as for (c) suggesting appropriate and personalized recommendations including multimodal information (diagrams, slides, videos, etc.). Results will be evaluated in a variety of scenarios and will lead to generalizable models and methods.

[ Project Website ]
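
To give a flavour of the prediction task in (a), here is a minimal sketch of training a classifier that predicts whether a searcher achieved a knowledge gain from simple interaction signals. This is purely illustrative: the features, thresholds and synthetic data are invented and do not reflect SALIENT’s actual models or datasets.

```python
# Minimal, hypothetical sketch: predicting knowledge gain from search
# interaction signals. All features and data are invented for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 300
# Hypothetical per-session features: #queries, mean dwell time (s),
# mean fixation duration (ms), #documents opened
X = np.column_stack([
    rng.poisson(5, n),
    rng.normal(40, 15, n),
    rng.normal(250, 60, n),
    rng.poisson(8, n),
])
# Synthetic label: 1 = knowledge gain observed in a post-test, 0 = none
y = (0.02 * X[:, 1] + 0.05 * X[:, 3] + rng.normal(0, 1, n) > 2.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean().round(3))
```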

The rise of the Web has had a tremendous impact on how learning and knowledge acquisition take place, specifically in social environments such as LinkedIn or SlideShare, where the exchange of knowledge and resources is among the key motivations for interaction. Understanding the needs of all involved stakeholders, such as users, learners, job seekers, and training or resource providers, remains a challenging problem. This is due not least to the scale, diversity and heterogeneity of data on the Web, where the extraction and analysis of relevant information pose significant scientific challenges. The goal of the H2020 project AFEL (Analytics for Everyday Learning) is to develop methods and tools to understand informal/collective learning as it surfaces implicitly in online social environments. To this end, AFEL will develop solutions for retrieving, extracting, enriching and analysing data from the Web to shape the understanding of loosely defined learning activities in online social environments.

Urban mobility is influenced by a wide range of long-term trends, such as e-mobility and structural changes, as well as short- and mid-term factors, such as large events, weather, seasonal differences or construction sites. Capturing such complex dependencies in mobility models requires processing large amounts of heterogeneous data, for instance when supporting regional administrations and mobility service providers with decision support and accurate predictions, which are crucial for the cost-efficient planning of urban mobility infrastructures and services. Data4UrbanMobility will use state-of-the-art machine learning, information retrieval and big data technologies to provide models and tools for analysing large amounts of diverse mobility-related data and to develop precise models and predictions of urban mobility behavior. Tools and applications will be piloted in two regional scenarios (Wolfsburg, Hannover) together with partners from industry, government and public administrations.
[ Project Website ]
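
As a purely illustrative example of the data integration step involved, the following sketch joins hypothetical traffic counts with weather and event data into a single feature table of the kind one would feed into a prediction model; all column names and values are invented and are not Data4UrbanMobility data.

```python
# Illustrative sketch only: merging heterogeneous mobility-related sources
# (traffic counts, weather, events) into one feature table with pandas.
import pandas as pd

traffic = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-01", "2017-05-02", "2017-05-03"]),
    "district": ["Hannover-Mitte"] * 3,
    "trips": [12450, 11980, 15730],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-01", "2017-05-02", "2017-05-03"]),
    "rain_mm": [0.0, 4.2, 0.5],
})
events = pd.DataFrame({
    "date": pd.to_datetime(["2017-05-03"]),
    "large_event": [True],
})

# Left-join the auxiliary sources onto the traffic counts by date.
features = (traffic.merge(weather, on="date", how="left")
                   .merge(events, on="date", how="left")
                   .fillna({"large_event": False}))
print(features)
```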

Learning Analytics (LA) as a key aspect of Learning Process Management (LPM) supports the measurement, analysis and interpretation of data about learners and their contexts for purposes of understanding and optimizing learning and the environments in which it occurs. In this context, processing and analysis of large amounts of data about learners and learning activities is crucial and requires an interdisciplinary approach. The main objective of the ERASMUS+ project LA4S is to provide access to innovative tools and datasets for learning analytics researchers and practitioners, with the overall aim to facilitate the large-scale validation and improvement of learning analytics methods and tools.
[ Project Website ]

I am coordinating the EU FP7 Support Action LinkedUp, which aims to push forward the exploitation of the vast amounts of open data available on the Web, in particular by educational institutions and organizations. This will be achieved by identifying and supporting highly innovative large-scale Web information management applications through an open competition (the LinkedUp Challenge) and a dedicated evaluation framework. The vision of the LinkedUp Challenge is to realise personalised, university degree-level education of global impact based on open Web data and information. Drawing on the diversity of Web information relevant to education, ranging from Open Educational Resources metadata to the vast body of knowledge offered by the Linked Data approach, this aim requires overcoming substantial challenges of Web-scale data and information management involving Big Data, such as performance and scalability, interoperability, multilinguality and heterogeneity, in order to offer personalised and accessible education services.
[ Project Website ]

KEYSTONE (semantic KEYword-based Search on sTructured data sOurcEs) is a COST Action with the objective to launch and establish an international network of researchers, practitioners, and application domain specialists working in fields related to semantic data management, the Semantic Web, information retrieval, artificial intelligence, machine learning and natural language processing. KEYSTONE coordinates collaboration among them to enable research activity and technology transfer in the area of keyword-based search over structured data sources, such as Linked Data. Furthermore, it will exploit the structured nature of data sources in defining complex query execution plans by combining partial contributions from different sources.
[ Project Website ]


/ PREVIOUS PROJECTS

I am the coordinator of the EU FP7 STREP DURAARK (“Durable Architectural Knowledge”). Preservation of architectural building data is crucial to the interests of all stakeholders (e.g. architects, urban planners, engineers, building operators), for instance to preserve cultural heritage and to enable knowledge reuse of design and engineering solutions. In particular, the Web of Data offers a wealth of information about the context of buildings and structures, such as their legal, environmental, infrastructural or social context. The DURAARK project tackles these challenges by developing semantic enrichment and long-term preservation tools tailored specifically to the domain of architectural knowledge.
[ Project Website ]

Linked Education summarises a stream of activities aimed at further promoting the use of Linked Data for educational purposes. While sharing and reusing educational data across institutional and national boundaries is a general goal for both the public and the private education sector, there is only limited take-up of Linked Data principles in the educational field. Through platforms such as LinkedEducation.org, LinkedUniversities.org and the W3C Community Group on Linked Open Education, we aim at facilitating and promoting the Web-scale sharing of educational data & resources based on state-of-the-art Linked Data principles. These forums provide an environment for researchers as well as practitioners in the fields of Web-based education and semantic technologies to share and discuss related datasets, schemas, applications and events, and to identify best practices and successful patterns for sharing educational data on the Web.
[ W3C Linked Open Education Community Group ] [ LinkedEducation.org ]

SmartLink is short for “SeMantic Annotation enviRonmenT for Linked Services”. Simply put, it is an easy-to-use Web application aiding users in the creation of Linked Services – semantic service annotations following Linked Data principles. Amongst other things, it provides an interface to populate and query the Linked Services repository iServe. SmartLink builds on existing technologies and standards to enable a wide reach of its annotations. Users can annotate arbitrary services – whether RESTful or WSDL/SOAP-based – via a simple Web form. Annotations are stored in RDF following established service schemas, namely WSMO-Lite and the Minimal Service Model (MSM), which follow a lightweight approach to Semantic Web Services. Storage of annotations is spread across two public RDF stores: iServe handles all functional properties defined in the MSM schema, while an additional, SmartLink-specific Sesame repository hosts further non-functional service properties. SmartLink provides a unified interface to store and query annotations across both repositories.
[ Project Website ]
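
For illustration, the following is a minimal sketch of what an MSM-style service annotation could look like when constructed programmatically with rdflib; the example service, URIs and the exact namespace are placeholders rather than SmartLink’s actual data or code.

```python
# Hypothetical sketch of a Minimal Service Model (MSM) annotation in RDF.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

# The MSM namespace URI below is an assumption for illustration;
# the canonical namespace used by iServe may differ.
MSM = Namespace("http://iserve.kmi.open.ac.uk/ns/msm#")
EX = Namespace("http://example.org/services/")  # invented service URIs

g = Graph()
g.bind("msm", MSM)
g.bind("ex", EX)

service = EX["weather-service"]
operation = EX["weather-service-getForecast"]

# A service with one operation, both labelled for readability.
g.add((service, RDF.type, MSM.Service))
g.add((service, RDFS.label, Literal("Weather forecast service")))
g.add((service, MSM.hasOperation, operation))
g.add((operation, RDF.type, MSM.Operation))
g.add((operation, RDFS.label, Literal("getForecast")))

print(g.serialize(format="turtle"))
```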

Searching for information, data, and multimedia resources that are semantically related is a key feature of the future Internet. Today’s path to achieving that vision is based on the Linked Open Data (LOD) cloud of RDF data. SugarTube (Semantics Used to Get Annotated video Recording) is a Web 3.0 application for searching videos via RDF-based annotations of video recordings stored as part of the Open University Broadcast Unit’s learning material. The fundamental technology used to develop the application is Semantic Web Services. Users can search based on keywords, textual analysis of related documents, URLs, or geographical maps. Moreover, SugarTube gathers related data and knowledge from the LOD cloud to enrich the search results with related events, people, websites, geo-locations, maps, and additional video streams from YouTube, the BBC, and OpenLearn.
[ Project Website ]
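
As a hedged illustration of this kind of LOD-based enrichment (not SugarTube’s actual code), the sketch below retrieves an abstract and geo-coordinates for a topic from DBpedia’s public SPARQL endpoint; the chosen resource and properties are examples and may not be available for every topic.

```python
# Illustrative only: enriching a search topic with related facts from the
# LOD cloud via DBpedia's public SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?abstract ?lat ?long WHERE {
        dbr:Milton_Keynes dbo:abstract ?abstract ;
                          geo:lat ?lat ;
                          geo:long ?long .
        FILTER (lang(?abstract) = "en")
    } LIMIT 1
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["abstract"]["value"][:150], "|",
          row["lat"]["value"], row["long"]["value"])
```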

mEducator – Multi-type Content Repurposing and Sharing in Medical Education is an EC-funded eContentPlus Best Practice Network consisting of 13 European institutions. The aim of mEducator BPN is to implement and critically evaluate existing standards and reference models in the field of e-learning in order to enable specialized state-of-the-art medical educational content to be discovered, retrieved, shared and re-used across European higher academic institutions. Within mEducator, I am leading a work package aimed at implementing and evaluating a Semantic Web Services-oriented eLearning solution based on the results of the LUISA project (see below).
[ Project Website ]

I have been leading a work package on entity extraction and consolidation in the EU IP ARCOMEM, a consortium of 12 partners led by the University of Sheffield. ARCOMEM’s ultimate goal is to develop methods and tools for transforming digital archives into community memories. This is achieved by applying NLP and Semantic Web techniques to detect and represent events, entities and topics from all kinds of archived Web objects (from HTML pages to audio-visual material) and by leveraging the wisdom of the crowds (i.e. all sorts of social media) for content appraisal and selection.
[ Project Website ]
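
The following minimal sketch merely illustrates the kind of named entity extraction such a pipeline performs; it uses spaCy for brevity and is not ARCOMEM’s actual toolchain.

```python
# Minimal named entity extraction over a snippet of (archived) Web text.
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The European Parliament met in Strasbourg in March 2011 "
          "to debate the financial situation in Greece.")
for ent in doc.ents:
    print(ent.text, "->", ent.label_)
```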


I led a work package in the EU IP NoTube (“Networks and Ontologies for the Transformation and Unification of Broadcasting and the Internet”) dealing with all sorts of digital multimedia provisioning (e.g. IP-TV) based on Semantic Web (Services) technologies. Further information on the research goals and results of NoTube can be found at the project website.
[ Project Website ]

Service Web 3.0 provides the directives, support and coordination of ongoing research and development in the area of networked service architectures, specifically those that utilize semantic technologies, towards the realization of the Future Internet. It also provides the foundation for future community research and technological development in the area of large-scale service architectures and the further utility of semantic technologies.
[ Project Website ]

The Internet Reasoning Service (IRS) has the overall aim of supporting the automated or semi-automated construction of semantically enhanced systems over the Internet. Its epistemological basis is the decomposition of a system’s expertise into tasks, methods, domains and applications, usually referred to as the TMDA framework. This framework, mainly influenced by extensive research on Problem-Solving Methods, was further extended in IRS-I to support the creation of knowledge-intensive systems structured according to the UPML framework. IRS-II continued this approach and integrated the UPML framework with Web Service technology so as to benefit from the reasoning infrastructure over the Web. Finally, the current version of the IRS, namely IRS-III, has incorporated and extended the WSMO ontology so that the implemented infrastructure allows the description, publication and execution of Semantic Web Services.
[ Project Website ]

Service Oriented Architectures for All (SOA4All) is a Large-Scale Integrating Project funded by the European Seventh Framework Programme, under the Service and Software Architectures, Infrastructures and Engineering research area. SOA4All will help to realize a world where billions of parties are exposing and consuming services via advanced Web technology: the main objective of the project is to provide a comprehensive framework and infrastructure that integrates complementary and evolutionary technical advances (i.e., SOA, context management, Web principles, Web 2.0 and Semantic Web) into a coherent and domain-independent service delivery platform.
[ Project Website ]

I was work package leader in the EU STREP LUISA, applying the Semantic Web Services framework, architecture and tools developed within the DIP project, including IRS-III, to eLearning. More specifically, LUISA created a rich and flexible infrastructure supporting the development and reuse of learning materials for both learners and educators. The long-term vision behind LUISA is to shift the paradigm in eLearning from one based on learning objects to one based on Semantic Web Services.
[ Project Website ]


/ TEACHING

  • “Data and Knowledge Engineering” (lecture, SS2019, Heinrich-Heine-University Düsseldorf), schedule and details can be found on the course website.
    Understanding and interpreting heterogeneous data, in particular in distributed settings such as the Web, remains a challenging task. State-of-the-art Web applications such as Web search engines rely on a combination of approaches for making sense of data, involving both explicit knowledge, for instance through knowledge graphs such as Wikidata or the Google Knowledge Graph and semi-structured Web markup, as well as statistical and machine learning-based approaches.
    This course provides an introduction to data and knowledge engineering methods and principles, with a particular focus on the Web. This includes methods related to knowledge graphs and formal data & knowledge representation (RDF, OWL, Description Logics), data integration and linking, information extraction, Web data sharing practices (Linked Data, Semantic Web and affiliated W3C standards such as RDF, RDFa, Microdata), as well as emerging approaches in the context of distributional semantics, such as word and entity embeddings. Attention will also be paid to applications of taught techniques to facilitate data sharing and reuse on the Web.

  • “Advances in Data Science” (seminar, WS2018/2019, Heinrich-Heine-University Düsseldorf), schedule and details can be found via the course website.
    Learning from data in order to gain useful insights is an important task, generally covered under the data science umbrella. This involves a wide variety of fields such as statistics, artificial intelligence, effective visualization, as well as efficient (big) data engineering, processing and storage, where efficiency and scalability often play crucial roles in order to cater for the quantity and heterogeneity of data. The goal of this seminar is to deepen the understanding of data science & engineering techniques through studying and critically evaluating state-of-the-art literature in the field. Participants will be introduced to the critical assessment and discussion of recent scientific developments, thereby learning about emerging technologies as well as gaining the ability to evaluate and discuss focused scientific works. Participants will be given recent literature covering relevant data science areas. Each participant will independently review 1-2 publications and present and discuss their content and contributions with the entire group. After successful completion, students will have a deepened understanding of state-of-the-art methods and applications in the data science field and will have gained experience in critically assessing and summarising contemporary research publications.
  • “Introduction to Data Science” (lecture, WS2017/2018, Leibniz University Hannover), schedule and details can be found via the course website.
    Learning from data in order to obtain useful insights or predictions is an important task, generally covered under the data science umbrella. This involves skills and knowledge from a wide variety of fields such as statistics, artificial intelligence, effective visualization, as well as efficient (big) data engineering, processing and storage. While data arises from real-world phenomena, for instance, on the Web, data science investigates how to analyse the data to understand such phenomena. The course teaches critical concepts and practical skills in computer programming and statistical inference, in conjunction with hands-on analysis of datasets, involving issues such as data cleansing; sampling; data management for accessing big data efficiently; exploratory data analysis to generate and test hypotheses; prediction based on statistical methods such as regression and classification; and communication of results through visualization.
  • “KESW – Knowledge Engineering & the Semantic Web” (lecture, SS2017, Leibniz University Hannover), schedule & details via the course website.
    Abstract: This course provides an introduction to fundamental knowledge engineering principles as well as practical knowledge and insights into the use and application of state-of-the-art semantic technologies. Semantic (Web) technologies, based on established W3C standards such as RDF/OWL, Linked Data technologies and entity-centric markup (through RDFa and Microformats), enable the application of formal knowledge engineering principles on the Web and have emerged as de facto standards for sharing data or for annotating unstructured Web documents. The wider goal and purpose is to improve the understanding and interpretation of Web documents and data, for instance to facilitate Web search or data reuse. This course introduces key concepts of knowledge engineering and representation, their application specifically in the context of the (Semantic) Web and their contributions to tasks such as knowledge extraction or knowledge discovery.
  • “Foundations of Information Retrieval” (lecture, WS2016/2017, Leibniz University Hannover, guest lecturer):
    Abstract: The lecture gives an introduction to Web Information Retrieval with particular emphasis on the algorithms and technologies used in modern search engines. The module covers an introduction to traditional text IR, including Boolean retrieval, the vector space model and tolerant retrieval (a minimal vector space retrieval sketch is included after this list). Afterwards, the technical basics of Web IR are discussed, starting with Web size estimation and duplicate detection, followed by link analysis and crawling. This is followed by an introduction to IR evaluation methods and benchmarks. Finally, applications of classification and clustering in the IR domain are discussed. The theoretical basis is illustrated through examples of contemporary search systems, such as Google.
  • “KESW – Knowledge Engineering & the Semantic Web” (lecture, SS2016, Leibniz University Hannover), schedule & details via the course website.
    Abstract: identical to the SS2017 edition of this course above.
  • “Foundations of Information Retrieval” (lecture, WS2015/2016, Leibniz University Hannover, guest lecturer)
  • Supervision of student projects in “Labor Web Technologien” (since 2013, Leibniz University Hannover)
  • Supervision of PhD & MSc students (since 2006)
  • Guest lectures and invited talks (details here)
  • Tutorials and tutorial series at major conferences, specifically on knowledge discovery and semantic technologies (details here)
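
To complement the “Foundations of Information Retrieval” lectures above, here is a small, self-contained illustration of the vector space model with TF-IDF weighting and cosine ranking; the toy corpus and query are invented for demonstration.

```python
# Toy vector space retrieval: TF-IDF document vectors ranked by cosine
# similarity to a query vector.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "web search engines crawl and index web pages",
    "link analysis such as PageRank ranks web pages",
    "boolean retrieval matches documents against boolean queries",
]
query = ["ranking web pages with link analysis"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(query)

scores = cosine_similarity(query_vector, doc_vectors).ravel()
for rank, idx in enumerate(scores.argsort()[::-1], start=1):
    print(rank, round(float(scores[idx]), 3), docs[idx])
```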