Data Center.com

data integrity

By Stephen J. Bigelow

What is data integrity?

Data integrity is the assurance that digital information is uncorrupted and can only be accessed or modified by those authorized to do so.

Data integrity describes data that's kept complete, accurate, consistent and safe throughout its entire lifecycle.

Data integrity is a broad discipline that influences how data is collected, stored, accessed and used. The idea of integrity is a central element of many regulatory compliance frameworks, such as the General Data Protection Regulation (GDPR).

Data integrity isn't a single product, platform or tool. Instead, it's a comprehensive environment that's created through an array of applicable standards, rules, processes and procedures that are implemented across an organization's infrastructure and observed by its employees, partners and users.

Data corruption occurs when unwanted or unexpected changes to data take place during storage, access or processing. Any such change represents a failure or loss of data integrity. Data corruption can be caused by hardware failures, human error, malicious action or a failure of data security.

Why is data integrity important?

Where traditional businesses often focused on the construction and distribution of physical products, today's businesses typically prosper through the delivery of digital products and services. This transition demands an enormous amount of data, which has become the new raw material of the digital economy. The importance of that data, and of its integrity, manifests itself in three major ways:

  1. Business analytics. A traditional axiom of early computing was garbage in/garbage out. The axiom still holds for the modern analytics that drive business decision-making and product development. Data integrity is therefore critical to analytical results, as missing or inaccurate data might result in poor business decisions or product behaviors.
  2. Customer interactions. Businesses collect and use an enormous amount of customer data, including sensitive or personally identifiable data. Data integrity ensures that customers are treated correctly, such as receiving proper account crediting and reporting. Data security must keep that sensitive data safe from loss or theft.
  3. Compliance. Businesses are typically obligated to retain data for a period of time to ensure that business processes are followed in accordance with prevailing industry standards and government regulations. Data integrity is vital for complete, accurate and consistent reporting for all compliance purposes; otherwise, the business may be out of compliance and subject to fines and other legal remedies.

Consequently, data integrity fills the same essential role as any physical quality control effort needed within a traditional business, ensuring that the raw material is correct, secure and suited for its intended purpose.

Types of data integrity

Data integrity involves both physical and logical issues:

Physical integrity. This includes issues related to storing and retrieving data -- primarily the storage devices, memory components and any associated hardware. For example, if a hard drive or memory device is damaged, its stored data is unavoidably affected. Many threats can compromise the integrity of stored data or damage the physical storage hardware itself.

Organizations often adopt a variety of hardware devices and techniques to enhance data's physical integrity. These include redundant storage subsystems, such as RAID with battery-protected write cache; advanced error-correcting memory devices; clustered and distributed file systems; and error-detecting algorithms that detect data changes in transit.
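As one illustration of an error-detecting algorithm, the sketch below compares a SHA-256 digest computed before transmission with one computed on receipt. The function names and the sample payload are hypothetical, and the choice of SHA-256 is an assumption; real systems might use CRCs, keyed hashes or protocol-level checks instead.

```python
import hashlib


def sha256_digest(payload: bytes) -> str:
    """Return the hex SHA-256 digest of a byte payload."""
    return hashlib.sha256(payload).hexdigest()


def verify_payload(payload: bytes, expected_digest: str) -> bool:
    """Check a received payload against the digest computed by the sender."""
    return sha256_digest(payload) == expected_digest


# Sender side: compute the digest before transmitting (sample data is made up).
original = b"account=1042;balance=250.00"
digest = sha256_digest(original)

# Receiver side: an intact payload verifies, an altered one does not.
intact_ok = verify_payload(original, digest)
altered_ok = verify_payload(b"account=1042;balance=950.00", digest)
```

A plain digest like this detects accidental changes. It does not, on its own, protect against an attacker who can replace both the payload and the digest, which is a data security concern rather than a physical one.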

Logical integrity. Even when the hardware devices and infrastructure are working flawlessly, there are several considerations that affect the correctness or sensibility of data within its respective context. Does the data make sense, or has it changed unexpectedly? Logical integrity can be affected by poor software design and software bugs as well as human error and malfeasance. There are four principal types of logical integrity:

  1. Entity integrity. This ensures that no data element is repeated and that no critical data entry is blank or null. This is a common logical integrity consideration in relational database systems.
  2. Referential integrity. These rules govern how data is stored and used in a database and ensure that only authorized changes, additions or deletions can occur. In relational databases, they typically cover the relationships between tables, such as foreign keys. These rules prevent duplicate data, ensure data accuracy and eliminate inapplicable data.
  3. Domain integrity. This reflects the format, type, amount and value range or scope of acceptable data values within a database. For example, if data is supposed to be numerical, an alphanumeric data element may be rejected.
  4. User-defined integrity. These are additional rules and constraints that are implemented in accordance with the organization's specific needs and aren't otherwise covered by the first three integrity types.
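The four types above can be sketched as database constraints. The example below uses Python's built-in sqlite3 module with an in-memory database; the table names, column names and the 30% discount cap are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,      -- entity integrity: unique key
    email       TEXT NOT NULL UNIQUE      -- entity integrity: no blank or repeated entry
);
CREATE TABLE orders (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL
                 REFERENCES customers(customer_id),  -- referential integrity
    quantity     INTEGER NOT NULL
                 CHECK (typeof(quantity) = 'integer' AND quantity > 0),  -- domain integrity
    discount_pct REAL NOT NULL DEFAULT 0
                 CHECK (discount_pct BETWEEN 0 AND 30)  -- user-defined business rule
);
""")


def try_insert(sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


ADD_CUSTOMER = "INSERT INTO customers (customer_id, email) VALUES (?, ?)"
ADD_ORDER = ("INSERT INTO orders (customer_id, quantity, discount_pct) "
             "VALUES (?, ?, ?)")

results = {
    "valid customer": try_insert(ADD_CUSTOMER, (1, "a@example.com")),
    "duplicate key": try_insert(ADD_CUSTOMER, (1, "b@example.com")),
    "null email": try_insert(ADD_CUSTOMER, (2, None)),
    "unknown customer": try_insert(ADD_ORDER, (99, 1, 0.0)),
    "non-numeric quantity": try_insert(ADD_ORDER, (1, "abc", 0.0)),
    "discount over cap": try_insert(ADD_ORDER, (1, 1, 50.0)),
    "valid order": try_insert(ADD_ORDER, (1, 3, 10.0)),
}
```

In this sketch only the valid customer and the valid order are accepted; every other row is rejected by the database itself rather than by application code, which is the main appeal of declaring integrity rules as constraints.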

Physical and logical integrity are defined separately but are often related. For example, a null data stream might violate logical or entity integrity, but the cause of the null data may be traced to a failed internet of things (IoT) sensor.

What are data integrity risks?

Data integrity can be lost for a variety of reasons.

The consequences of data integrity loss can range from a minor annoyance to a major business catastrophe -- depending on the amount of loss and the nature of the data involved. Business and technology leaders invest considerable time and resources to understand and prevent data integrity loss.

How to ensure data integrity compliance

Data integrity isn't a straightforward concept, and it can't be ensured with any single software tool or regulation. Rather, data integrity is a broad field of endeavor that involves people, processes, rules and various tools to provide guardrails and support. While there's no single universal solution for data integrity, numerous tactics can help to build an environment that supports it.

Data integrity vs. data security vs. data quality

The terms integrity, security and quality are sometimes used interchangeably, but they aren't the same. Although the three ideas are closely related, each has attributes that distinguish it from the other two.

Data quality refers to the reliability of data. Good quality data must be accurate, complete, unique with no duplicates and timely enough to be useful.
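Some of these attributes can be checked mechanically. The sketch below counts records that are incomplete, duplicated or stale; the field names, the 30-day freshness threshold and the sample records are all hypothetical. Accuracy is deliberately left out, since checking it usually requires comparison against a trusted reference source.

```python
from datetime import datetime, timedelta, timezone

REQUIRED_FIELDS = ("id", "email", "updated_at")  # hypothetical schema
MAX_AGE = timedelta(days=30)                     # hypothetical freshness threshold


def quality_report(records, now):
    """Count records that are incomplete, duplicated or stale."""
    seen_ids = set()
    report = {"incomplete": 0, "duplicate": 0, "stale": 0}
    for record in records:
        # Completeness: every required field must be present and non-empty.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            report["incomplete"] += 1
            continue
        # Uniqueness: the same id must not appear twice.
        if record["id"] in seen_ids:
            report["duplicate"] += 1
            continue
        seen_ids.add(record["id"])
        # Timeliness: the record must have been updated recently enough.
        if now - record["updated_at"] > MAX_AGE:
            report["stale"] += 1
    return report


utc = timezone.utc
now = datetime(2024, 1, 31, tzinfo=utc)
records = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 20, tzinfo=utc)},
    {"id": 2, "email": "", "updated_at": datetime(2024, 1, 25, tzinfo=utc)},
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 20, tzinfo=utc)},
    {"id": 3, "email": "c@example.com", "updated_at": datetime(2023, 10, 1, tzinfo=utc)},
]
report = quality_report(records, now)
```

With the sample records above, the report flags one incomplete record (the empty email), one duplicate (the repeated id 1) and one stale record (last updated in October 2023).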

Data security is the infrastructure, tools and rules used to ensure that only authorized applications and users can access data; that the data is used in a business-compliant manner; and that data is preserved or backed up against loss, theft or malfeasance.

Data integrity then provides a broader umbrella that embraces aspects of data quality and security, ensuring proper retention, appropriate destruction, and adequate compliance with relevant industry and government regulations.

It's essential for a data quality strategy to be aligned to an organization's core business goals. Learn about the six dimensions of data quality and how organizations can benefit from a data quality strategy.

08 Sep 2022

Copyright 2000 - 2024, TechTarget. All rights reserved.