We live in an information paradigm. The standard model of physics is being challenged by a data-centric, probabilistic view of the universe. Viewed in the light of advances in quantum physics, fractal geometry and chaos theory, our deterministic material reality turns out to have been a subjective human reading of events.
Automated industrial and personal computing applications are now pervasive in the fabric of human society, in both the developed and developing world, and they are all built from blocks of data. We must evolve our concept of data to recognise that information technology is now a virtualization of information services and of networked devices of all kinds. It consists merely of bits of information, independent of the wires, chips and electronic components that, over the next decade, will probably be replaced by quantum computers of a very different physical design.
Recent heightened public awareness of data privacy, triggered by electronic security failures, is an opportunity to redefine the view that data is merely an electronic representation of information. Security incidents have raised the alarm; consequently, we must track the data lifecycle more effectively. The simplest solution may be to manage data holistically, from inception to end-of-life.
There is a clear requirement to standardise and categorise data so that our technology can continue to evolve to meet the challenges of global information dissemination for the exchange of scientific, humanitarian and world trade data. Advanced UML models and modelling technology are already being used for structured data terminologies. An identity management domain incorporating blockchain as a class model serves to demonstrate that a platform-independent model can easily extend any of the industry common information models to implement nexus identity management.
Data has a very specific meaning and lifecycle, in terms of its creation, context, transformation and the transportation of its representation from one location to another. This entails residency, ownership and, above all, accessibility. Over the past decades, legal frameworks around data custody, intellectual property, management responsibility and data-as-an-asset have been developed as part of an evolving set of data practices.
Yet we do not have a complete view of the data lifecycle. Today there is a patchwork of individual standards of governance, usage policies and structural definitions across international jurisdictions for what is now arguably the most valuable asset on the planet. At the same time, the misuse of data is now one of the most common crimes in every society. The key to facilitating an evolution of global data residency may be a revolutionary international approach to role-based data access control throughout the data lifecycle.
It seems possible that data itself requires an identity: a standard tagging of data elements with category, ownership, authorship, purpose, security classification and permitted jurisdiction and residency, that can be formed, encrypted, distributed, stored and updated under a secure practice, in place of user and application credentials. While this may seem like a complete shift, in reality it may not be so difficult with co-operation and collaboration amongst interested parties.
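As a sketch of what such tagging might look like, the hypothetical structure below attaches the categories listed above to a data element. All field names and values are illustrative assumptions, not drawn from any published standard:

```python
# Hypothetical "data identity" tag; field names are illustrative only.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class DataIdentityTag:
    category: str          # e.g. "personal.healthcare"
    owner: str             # accountable data owner
    author: str            # original creator
    purpose: str           # declared purpose of collection
    classification: str    # e.g. "sensitive", "classified"
    jurisdictions: tuple   # permitted residency, e.g. ("EU",)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Example element tagged at creation time (invented values).
tag = DataIdentityTag(
    category="personal.healthcare",
    owner="hospital-trust-records",
    author="clinician-0412",
    purpose="treatment",
    classification="sensitive",
    jurisdictions=("EU",),
)
```

A frozen dataclass is used so the tag itself is immutable once issued; in the scheme described above, any change would instead be recorded as a new, auditable transaction.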
There is a proliferation of fraudulent use of data, to the benefit of a few and the detriment of most. The question must be asked: do we require a consolidated global standard for the data lifecycle, with a transparent public audit trail of role-based data access, perhaps a blockchain approach to logging all data change transactions?
Timeline in Brief
As information technology went global in the 1980s, there were many initiatives to standardize data across industries such as telecommunications, manufacturing and health, some more successful than others. Currently there are few precise, agreed definitions of common terminologies and concepts for most fields of technology, science and industry, and there is large scope for misinterpretation.
With the advent of Web 2.0 at the end of the 1990s, the focus shifted to user-generated content, and with the consequent proliferation of applications for everything by everyone, the momentum for standardization was lost. The result is the heterogeneity, diversity and disparity of data terminologies that we experience today. While search capabilities have improved, and context is now an important element of data definition, the sheer volume of information collected by individuals, NGOs, corporations and governments means the signal is being lost in the white noise, for both structured and unstructured data elements and collections.
The security of data is failing. Given sufficient motivation, financial or political, a workaround can be found for the current generation of security measures protecting essential data. The efficacy of new security measures is ephemeral, effective only until research by well-funded state-sponsored actors and criminal organizations develops a new exploit. This is because we have not addressed the fundamental problem: the basic network and application protocols were designed without security in mind.
The success of fraudulent misuse of data has led to the current situation: a proliferation of security tools and technologies marketed as the answer to data protection, developed in response to security breaches and vulnerabilities. In the words of Symantec, ‘We are only one step ahead of the hackers’.
Identity and Access Management
The biggest weakness, the exploit vector bypassing data security, is identity fraud: counterfeiting either identity credentials or access tokens. As all systems of data accessibility depend on access privileges that can be compromised by persistent, carefully planned and patient interception attacks, the race against global fraud is in danger of becoming a lost cause. The success of these attacks is due not only to poor implementation of security standards, but also to the information protocols themselves, developed ad hoc and inherited from an electronic age where networking and hardware were essentially deployed in a hub-and-spoke configuration. Today interconnectivity is decentralized, with potentially global communications across partner organizations. The current generation of security technology is no match for attackers who are extremely well funded, with resources matching those that large corporations and governments spend on cyber defence.
Identity and Access Management still depends largely on good working practices by responsible people, establishing, maintaining and securing access privileges and making use of the excellent advances in cryptography. Given the paradigm of current work practices, where people work on and off site over network connections that are more or less secure, this is hardly sufficient. By analogy, pilot error is still the largest cause of aviation accidents, and the same is true for identity management. Human failings aside, the corrupting influence of the extraordinary profits from identity fraud is a considerable factor.
Currently security professionals acknowledge that there is no foolproof method of preventing security breaches, and that new variations of old attack methods are constantly surfacing. Even ‘Zero Trust’ measures, such as virtual variations on the physical isolation of servers, can be compromised over time by capturing identity details, understanding authorization mechanisms and spoofing the authentication credentials of people, roles and applications. At the network layer, forms of single-packet inspection used to identify communications are innovative and successful within bounds. However, these methods are only as secure as the systems that collect the encapsulated identity data, providing a single point of failure. Once authentication (identity verification) has taken place, there is plenty of evidence of systems being compromised by exploits such as ‘golden tickets’ and ‘golden tokens’ giving intruders administrator privileges for network and identity tokens.
If privileged access to highly sensitive or classified data is the basis for data security, violation of trust is bound to increase, as the incentives for malpractice grow. There are huge profits involved. Global political uncertainties and provocations, aligned with growing international tensions can only lead to increases in attacks on essential infrastructure.
Journey to Data Privacy
All industries hold personal data, and for those to which the European GDPR regulation applies, legal protection is required. The European Commission defines personal data as any information that relates to an identified or identifiable living individual. Different pieces of information which, collected together, can lead to the identification of someone also constitute personal data. This includes technology information such as an IP address or device identifier.
GDPR EU legislation applies to organizations either processing personal data in the EU or processing data relating to EU citizens, whether those organizations are inside or outside the EU. Non-compliant organizations may find it more difficult to do business in Europe. The GDPR became law in 2016, and on 25th May 2018 the stringent penalties for non-compliance came into force. There is a wide range of personally identifiable information, including personal demographic, employment, financial, healthcare and social data, that must now be adequately protected under European law.
A better way to provide data assurance and governance, rather than closing a vulnerability after the attack, may well be to develop a data security protocol that is secure by design from the outset, with a focus on protection of the data itself. Can we adopt a data identity standard that mandates practices, protocols and methods of non-repudiation focussed on the stored data representation? Currently the focus is on user applications, the Internet Protocol, and the various integration methods and protocols of network connectivity.
A new data identity protocol could address the entirety of the data lifecycle, including creation, acquisition, encryption, storage and disposal of the data set and its component elements. Standard cryptographic algorithms applied at the data source, distributed to a network of identity providers for non-repudiation, may be a cost-effective improvement in data protection. The current situation is a cycle of proliferating information security tools applied at every stage of application access, integration, network connection and data transportation. These tools and techniques have largely been developed in response to attack vectors that have already been exploited, and the cost of securing data has increased dramatically over the past decade.
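As an illustration of applying a standard cryptographic algorithm at the data source, the sketch below seals a record so that any later change is detectable. HMAC-SHA-256 is used here only for brevity; genuine non-repudiation would require an asymmetric signature, since anyone holding an HMAC key can both create and verify seals. The key and record contents are invented for the example:

```python
# Sketch: sealing a record at its source so tampering is detectable.
# HMAC-SHA-256 stands in for a real signature scheme; see lead-in.
import hashlib
import hmac
import json

SOURCE_KEY = b"demo-key-held-by-the-data-source"  # illustrative only

def seal(record: dict) -> str:
    # Canonicalise the record so the same content always hashes the same.
    canonical = json.dumps(record, sort_keys=True).encode()
    return hmac.new(SOURCE_KEY, canonical, hashlib.sha256).hexdigest()

def verify(record: dict, tag: str) -> bool:
    return hmac.compare_digest(seal(record), tag)

record = {"patient": "anon-77", "category": "healthcare"}
tag = seal(record)
assert verify(record, tag)        # untouched record verifies

record["patient"] = "anon-78"     # any change breaks the seal
assert not verify(record, tag)
```

The point of the sketch is that the protection travels with the data representation itself, rather than with the application or network session that happens to carry it.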
Data Identity Standard
It is time to rethink the paradigm of sensitive, classified data, to provide a distributed security context for the data itself, independent of the facilitating technology services. One innovation may be to provide collected information with an identity, a type of signature that records the registration, authorship, usage, persistence, access, update and disposal of data sets. This accompanying metadata could remain with the data throughout its lifecycle, from creation through operational use, protected by a distributed chain of transactions that requires the consensus of the network ownership for changes not only to the data but also to the accompanying metadata.
The technology is readily available, provided it is implemented and deployed as a well-designed public cloud collection and storage mechanism, with careful use of the currently available set of security mechanisms, cryptography and key management. It would be audited by logging and monitoring services that would be extremely difficult to corrupt, and virtually impossible if corroborated across more than one public cloud audit trail.
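One way to picture corroboration across audit trails: if each provider keeps a running hash chain over the same log, the chain heads agree only while the copies are identical, so tampering with one replica is immediately visible. This is a toy sketch under that assumption, not a production design:

```python
# Toy sketch: two providers hold replicas of the same audit log and
# each maintains a running hash chain over it. Comparing chain heads
# corroborates the trails; divergence reveals tampering in one copy.
import hashlib

def chain_head(entries: list) -> str:
    h = b"\x00" * 32                       # fixed genesis value
    for entry in entries:
        h = hashlib.sha256(h + entry.encode()).digest()
    return h.hex()

log_a = ["login alice", "read record-9", "logout alice"]
log_b = list(log_a)                        # replica at second provider
assert chain_head(log_a) == chain_head(log_b)   # trails corroborate

log_b[1] = "read record-9; delete record-9"     # tampered replica
assert chain_head(log_a) != chain_head(log_b)   # divergence detected
```

An attacker would have to corrupt every replica consistently, which is the sense in which corroboration across independent cloud audit trails makes corruption "virtually impossible".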
Industry-specific terminologies have been developed over the past decades; the telecommunications, health and energy industries have all developed common models. The Telecommunications Information Framework (SID) provides a reference model and common vocabulary for all the information required to deploy network operations for fixed-line and mobile service operations. In electric power transmission and distribution, the Common Information Model (CIM) is a standard developed by electric power utilities to allow application software to exchange information about energy networks. Network power and telecommunications data elements are very sensitive from the point of view of the security of public operations, and as such are obvious targets for disruption by hostile actors. OpenEHR is a specification that describes the management, storage, retrieval and exchange of electronic health records. Patient records contain some of the most important personal data to be protected from security vulnerabilities.
Starting with well-structured industry terminologies, data modelling standards groups could develop recommendations on the classification of data elements, to which a consistent protocol for securing data identity could be applied. This could draw on a range of standard measures including cryptography, content validation, and a blockchain of independent identity providers for non-repudiation, backed by an audit service with logging and monitoring capabilities replicated across public cloud providers.
Data Identity Technology
The most effective way to secure information is a combination of physical security, best-practice cryptography and multi-pass verification of identity credentials. Currently there are standards such as OAuth 2.0 and OpenID Connect applied to end users and applications for authorization and authentication. There is no real co-ordination of authentication across the network, transport and application layers, meaning that data integrity is only as good as the weakest security measure in the chain of protocols across networking endpoints, internet (TCP/IP) and applications (e.g. HTTP). End-to-end security is currently not secure by design, but rather the result of security measures patched onto an older paradigm of applications and data running on physical hardware and local area networks.
Blockchain was originally developed as a protocol to timestamp transactions for non-repudiation, and was adapted as the ledger underpinning the bitcoin currency in 2008. A blockchain is an append-only set of blocks, each linked to its predecessor by a cryptographic hash, stored as a distributed ledger. By 2018, private (permissioned) blockchains had been adapted as a technology for a variety of business uses. Once recorded, the data in any given block cannot be altered retroactively without the alteration of all subsequent blocks, which requires consensus from the members of the chain.
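A minimal sketch of such a hash-linked chain, with no consensus, networking or key management, shows why a recorded block cannot be altered without invalidating every subsequent block. The transaction strings are invented for the example:

```python
# Minimal hash-linked chain: each block embeds the hash of its
# predecessor, so a retroactive edit breaks every later link.
import hashlib
import json

def block_hash(block: dict) -> str:
    return hashlib.sha256(
        json.dumps(block, sort_keys=True).encode()).hexdigest()

def append(chain: list, data: str) -> None:
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "prev": prev, "data": data})

def valid(chain: list) -> bool:
    # Every block's "prev" must match the recomputed hash of the
    # block before it.
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain: list = []
for tx in ["grant role:auditor", "read dataset-42", "revoke role:auditor"]:
    append(chain, tx)
assert valid(chain)

chain[1]["data"] = "read dataset-99"   # retroactive edit...
assert not valid(chain)                # ...breaks every later link
```

In a real deployment the rewrite would additionally require the consensus of the member network, which is what makes retroactive alteration impractical rather than merely detectable.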
Because blockchains use cryptographic algorithms they are compute-intensive, and therefore best suited to low-volume transactions such as the storage of sensitive or classified data. Any data change transaction would require informed consent from the blockchain members. The distributed data store is composed of the databases maintained by the members of the network group, meaning that there is no centralized distribution point for the sharing and replication of data. A data set could be stored as a distributed record of transactions, broadcast simultaneously to all participants in the network, making misuse of stored data much more difficult.
Figure 1: Blockchain Identity Security Logical View
Such an initiative might be the only way to curb the proliferation of network security monitoring of devices over virtual wide area networks connecting data centers to public clouds. The catalogue of prevention and detection tools accompanying application security, from mobile application registration and login, through integration and application servers, to distributed databases and third-party services, could be rationalized. An organization-wide security review could address the problem that currently all of these measures have known flaws and weaknesses, with new vulnerabilities exposed even as existing threat vectors are addressed. Currently the attack surface is too large for security assurance to be realistic.
The current paradigm of data in the wild, protected by a patchwork of technology services, some secure, some inherently insecure, has no real future in addressing the global security of data. Security is only as strong as the weakest link in the existing chain of application and network measures used to protect information. The global regulatory environment is rich in process, and poor in compliance and therefore security effectiveness.
Data has gone global, yet the definitions of use and abuse of information differ completely from one society to the next. Personal data is misused in every industry, with machine learning algorithms applied to web behaviours, not only in marketing but also in finance, social security, defence, civil administration and national security.
Initiatives such as GDPR in the European context and PCI-DSS in the finance industry are a good start, although as yet we have not found an effective method of addressing the root of data misuse: networking and application technologies are inherently vulnerable, and even more so when linked together. The standards for accountability for data, while worthy, are not working in practice.
To continue to evolve the current generation of technology, a different paradigm is required to resolve this problem, as the level of financial misappropriation and vulnerability of essential infrastructure continues to grow.
All data originates from people in various roles: creator, author, publisher, distributor, manager, buyer, seller or end user. People engaged in data collection are as diverse as members of the public, small business operators, and employees and consultants in public and private organizations. A multitude of technology applications are proliferating around access to data, in the form of identity management and federation, and the authorization and authentication of data at rest and in motion. Personal, sensitive and classified data persists and proliferates across networks and databases, with cryptography of varying strength, and all too often in plain text.
While we have many partial solutions to the problem of global data security, residency and accessibility, most technologies have known or potential security vulnerabilities, and when linked together into an end-to-end business technology solution they are insecure by nature. This situation can only intensify given the accelerating trend to network data across traditional on-premises data center infrastructure and private and public clouds, using identity federation, while data is increasingly stored internationally.