NGI DAPSI: Interoperable Data Portability support for developing privacy-aware P2P applications (IDADEV-P2P)
The General Data Protection Regulation (GDPR) aims to give citizens effective control over their personal data. Achieving that aim, however, requires new frameworks, components and methods readily available to developers. In particular, data portability requires that applications interacting with data subjects can deal with openly shared semantics, and that they make data schemas (expressed in different languages, including ontologies) and the processing of personal information understandable both to developers (in the form of tools and patterns) and to end users (through integrated information and support tools).
This project aims to provide developers with the methods, tools, specific guidance and support they need to build (mainly mobile) applications that are (a) GDPR-aware from their inception, (b) prepared for easy data portability based on immutable shared semantics, and (c) supportive of end users with respect to their rights and control over their personal data.
The project starts from the sound framework provided by the combination of Decentralized Identifiers (DIDs), standardized verifiable credentials, and the mechanisms for verifying them and transmitting them over secure P2P communications. A typical infrastructure for this is Hyperledger Indy + Aries + Ursa, which should be complemented with (i) interoperability support (with explicit support for the FAIR principles) and (ii) user-facing support tools that ease the integration of functionalities for user data control. Building on that framework, the project tackles the main interdisciplinary research challenges involved in delivering cross-domain, semantically interoperable data portability, based on shared schemas that are transferred, via data conversion processes, to an open data model in a permissionless blockchain.
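Decentralized Identifiers, the first building block of that framework, follow a URI-like syntax defined in the W3C DID Core specification: the literal prefix `did:`, a lowercase method name, and a method-specific identifier. As a minimal sketch (illustrative only, not a project component), the Python function below validates a bare DID and splits it into those parts. DID URLs with paths, queries or fragments are intentionally not handled, and the `Did` type and `parse_did` name are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Character rules following the W3C DID Core ABNF:
#   did                = "did:" method-name ":" method-specific-id
#   method-name        = 1*method-char   (lowercase letters and digits)
#   method-specific-id = *( *idchar ":" ) 1*idchar
#   idchar             = ALPHA / DIGIT / "." / "-" / "_" / pct-encoded
_IDCHAR = r"(?:[A-Za-z0-9._-]|%[0-9A-Fa-f]{2})"
_DID_RE = re.compile(rf"^did:([a-z0-9]+):((?:{_IDCHAR}*:)*{_IDCHAR}+)$")


class Did(NamedTuple):
    """The two components of a bare DID (hypothetical helper type)."""
    method: str
    method_specific_id: str


def parse_did(value: str) -> Optional[Did]:
    """Return the method and method-specific id of a bare DID, or None if invalid."""
    match = _DID_RE.match(value)
    if match is None:
        return None
    return Did(match.group(1), match.group(2))
```

For example, `parse_did("did:web:example.com:user:alice")` yields method `web` and identifier `example.com:user:alice`, while `did:Example:1` is rejected because method names must be lowercase.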
The project aims to act as a catalyst by promoting best practices in the development and deployment of apps, accelerating consistent development, and ensuring that data portability is built in and interoperable across apps in different domains, which prevents those apps from becoming data silos and a source of frustration and uncertainty for citizens.
Data portability and users' control of their personal data have the potential to become key enablers of new applications and business models, provided that these can interchange data. The current lack of specific guidelines, tools and practices for doing so hampers both user control of personal data and opportunities for innovation and new business models. Here we address data interoperability and compatibility from the application developer's perspective, filling the gaps in several key areas we have identified:
- General guidance and best practices for development, closing the gap between average application development skills (focusing on mobile apps, but also applicable to Web or desktop) and the specialized technical, user experience and legal needs of data portability and privacy.
- Ready-to-use, out-of-the-box components for app developers that fill this gap and enable consistent and compatible applications, easing and speeding up “data portability compliant” app development. This includes R&I advances in decentralized architecture, data management and user support, the latter with an emphasis on informing and assisting users beyond the usual practice of user agreements, which are opaque and full of pitfalls, and which most users do not even read or understand.
- Interoperability, covering formatting and schemas but also semantics, leveraging the interoperability practices and infrastructure already deployed and combining them with best practices in GDPR-aware technologies such as the W3C Verifiable Credentials Data Model.
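To make the W3C Verifiable Credentials Data Model concrete, the Python sketch below assembles an unsigned credential (no proof attached) as a plain dictionary using the model's standard properties (`@context`, `type`, `issuer`, `issuanceDate`, `credentialSubject`, `credentialSchema`). It also references the credential's schema by a SHA-256 digest of the schema's canonical JSON, one possible way, assumed here purely for illustration, of giving credentials an immutable reference to a shared schema. The contact schema, the `urn:example:schema:sha256:` identifier prefix and the `HashPinnedSchema` type are hypothetical; a real deployment would use the identifiers of its own schema registry and would sign the credential.

```python
import hashlib
import json

# Hypothetical shared schema for a contact-details record (illustrative only).
CONTACT_SCHEMA = {
    "title": "ContactDetails",
    "type": "object",
    "properties": {
        "givenName": {"type": "string"},
        "familyName": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["givenName", "familyName"],
}

# Hypothetical identifier prefix for hash-pinned schemas.
SCHEMA_ID_PREFIX = "urn:example:schema:sha256:"


def schema_digest(schema: dict) -> str:
    """SHA-256 over a canonical JSON serialization (sorted keys, no whitespace)."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_credential(issuer_did: str, subject_did: str, claims: dict, schema: dict) -> dict:
    """Assemble an unsigned credential whose schema reference is pinned by digest."""
    return {
        "@context": ["https://www.w3.org/2018/credentials/v1"],
        "type": ["VerifiableCredential", "ContactDetailsCredential"],
        "issuer": issuer_did,
        "issuanceDate": "2021-01-01T00:00:00Z",
        "credentialSubject": {"id": subject_did, **claims},
        "credentialSchema": {
            "id": SCHEMA_ID_PREFIX + schema_digest(schema),
            "type": "HashPinnedSchema",  # hypothetical type name
        },
    }


def schema_matches(credential: dict, schema: dict) -> bool:
    """Check that the credential references exactly this version of the schema."""
    return credential["credentialSchema"]["id"] == SCHEMA_ID_PREFIX + schema_digest(schema)
```

Because the digest is computed over sorted keys, reordering the schema's properties leaves the reference unchanged, whereas any change to the schema's content produces a different reference, so `schema_matches` then returns `False`.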
The following Figure shows the tentative initial schema of the components to be developed in the project and their relation to existing technologies (details below).
The actual transfer of personal data takes place in secure P2P communication environments using Verifiable Credentials and a technology such as Hyperledger Aries. Public permissionless registries of DIDs for app providers are used both for user-initiated communication (via a secure storage device, such as one implemented in a mobile phone) and for the verification of credentials, a common pattern fostered by projects such as Hyperledger Indy. However, credentials by themselves lack references to shared schemas, and additional infrastructure is needed to provide them. This infrastructure is envisioned as a bridge between the Web of Data (upper right of the Figure, understood broadly as the published and curated schemas relevant for describing user data, not only those in RDF) and immutable replicas of those schemas in an open blockchain (Ethereum, for example), which provide the immutability needed for consistent and evolvable development. This would also require rethinking the FAIR metrics and practices used in the Web of Data and transferring them to the blockchain realm. Mediating between the three environments in turn requires dedicated services and components. Thus, the fundamental architecture is made up of:
- A development kit that includes interoperability libraries for data schema transformation, as well as conversational and UI components that are “savvy” about GDPR and data portability.
- A set of services that apply AI techniques for entity linking and mapping to reconcile schemas before publishing, together with services that track the Web of Data and how apps use those schemas, to aid the application developers who consume the results of these services.
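The schema reconciliation step can be illustrated with a deliberately simple stand-in. The services above are meant to use AI techniques for entity linking and mapping; the Python sketch below instead uses plain string similarity (`difflib.SequenceMatcher` from the standard library) to propose a one-to-one mapping between the field names of two schemas, leaving low-similarity fields unmatched for a developer to review. The function names, the normalization rule and the 0.8 threshold are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lower-case a field name and drop separators, so 'given_name' equals 'givenName'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_fields(source: list[str], target: list[str], threshold: float = 0.8) -> dict[str, str]:
    """Propose a source->target field mapping by string similarity.

    Each source field is greedily paired with its most similar unused target
    field if the similarity ratio reaches the threshold; each target field is
    used at most once. Fields below the threshold are left out of the mapping
    so that a developer can reconcile them manually.
    """
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for s in source:
        best, best_score = None, 0.0
        for t in target:
            if t in used:
                continue
            score = SequenceMatcher(None, normalize(s), normalize(t)).ratio()
            if score > best_score:
                best, best_score = t, score
        if best is not None and best_score >= threshold:
            mapping[s] = best
            used.add(best)
    return mapping
```

For instance, matching source fields `given_name`, `family_name`, `e_mail` and `phone` against target fields `givenName`, `familyName`, `email` and `address` maps the first three and leaves `phone` unmatched. A semantic matcher could replace the similarity score while keeping the same greedy one-to-one assignment and manual review of unmatched fields.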
These components require applied research and innovation to experiment with methods and evaluate alternative designs. All the components in the Figure rely on Open Source software; likewise, the project will commit fully to Open Source licenses to maximize impact and the transfer of innovation.
The key innovations will be exercised in the planned MVP, which will demonstrate the framework produced by the project in a use case of actual data portability: a user-initiated switch from one provider to another. The key points demonstrated will be:
- How the developers of the two applications can map and reconcile their different schemas and fuse personal data with consent data (which provides legal certainty to both sides of the communication), while hiding the complexity of the underlying schemas, vocabularies and languages to maximize adoption by average app developers.
- How the apps can inform the user through advanced methods, including a chatbot interface, about the applicable regulation and the specifics of the fields in the data model and of the domain, all leveraging semantic technologies in an integrated ontology model combined with the NLP techniques used in chatbot development.
- How the apps use W3C Verifiable Credentials combined with blockchain-based verification and secure P2P technologies for seamless transfer, repurposing and aggregation of data in app backends.
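The fusion of converted personal data with consent data in the MVP's provider switch can be sketched as follows. In this minimal Python illustration, a field mapping (for example, one proposed by a reconciliation service) renames the source record into the target schema, unmapped fields are reported rather than silently dropped, and a consent record naming the data subject, both providers and the purpose travels with the data. The `ConsentRecord` fields, the function names and the package layout are hypothetical; in the envisioned architecture such a package would be carried in a signed Verifiable Credential over a secure P2P channel, which this sketch does not implement.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class ConsentRecord:
    """Hypothetical minimal consent record accompanying the transferred data."""
    subject: str      # DID of the data subject
    source: str       # DID of the provider the data comes from
    recipient: str    # DID of the provider receiving the data
    purpose: str      # purpose the data subject consented to
    granted_at: str   # ISO 8601 timestamp (UTC)


def transform(record: dict, mapping: dict[str, str]) -> tuple[dict, list[str]]:
    """Rename fields per a source->target mapping; also return unmapped source fields."""
    converted = {mapping[k]: v for k, v in record.items() if k in mapping}
    unmapped = [k for k in record if k not in mapping]
    return converted, unmapped


def build_portability_package(record: dict, mapping: dict[str, str], subject: str,
                              source: str, recipient: str, purpose: str) -> dict:
    """Fuse the converted personal data with the consent under which it is transferred."""
    data, unmapped = transform(record, mapping)
    consent = ConsentRecord(
        subject=subject,
        source=source,
        recipient=recipient,
        purpose=purpose,
        granted_at=datetime.now(timezone.utc).isoformat(),
    )
    return {"data": data, "unmapped": unmapped, "consent": asdict(consent)}


if __name__ == "__main__":
    pkg = build_portability_package(
        {"given_name": "Alice", "e_mail": "alice@example.org", "phone": "555-0100"},
        {"given_name": "givenName", "e_mail": "email"},
        subject="did:example:alice",
        source="did:example:provider-a",
        recipient="did:example:provider-b",
        purpose="service migration",
    )
    print(pkg["data"])      # {'givenName': 'Alice', 'email': 'alice@example.org'}
    print(pkg["unmapped"])  # ['phone']
```

Keeping the unmapped fields in the package lets the receiving app, or the user, see explicitly what could not be carried over instead of losing it silently.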
The focus is thus on integrating technologies and preparing them for adoption by both developers and users. In that regard, the project aligns with a human-centric view of the Internet, empowering users and enabling developers to innovate in directions that take data subject rights as their starting point.