Cornerstone AI White Paper Titled Clean Data Without the Time and Headaches

How Cornerstone AI gets to cleaner data faster

Take a deeper dive into the technical and statistical details of how our AI assistant produces explainable, clinically relevant data cleaning rules by downloading our White Paper.

A clean clinical dataset in days

Cornerstone AI dramatically reduces the time to clean real-world clinical datasets while increasing data quality and providing explainable rules so you and downstream teams can have confidence in the output.

  • What You Have:

    Clinical datasets or pipelines that are costing your team time and effort to clean and prepare. EHR, registry, digital health, claims, clinical trial, and sensor data are all supported by our platform.

  • What We Do:

    Our AI assistant scans each table and data point, inferring structure, relationships, and validity. Then, it automatically creates data cleaning rules to organize the data, standardize values, and identify clinical and database errors.

  • What You Get:

    An instant data quality report highlighting the top issues in your data, automated or UI-based correction of those issues, easily understood explanations of findings, and an audit trail of all changes made throughout the process.

  • What We DON'T Do:

    We do not keep, aggregate, or resell your data. Your data is yours and is only used for you. We are HIPAA compliant and adhere to strict data use agreements.

Raw data is all you need to get started

Cornerstone AI takes in raw clinical datasets and runs them through structuring, standardization, and error detection algorithms to produce a clean, standardized, analysis-ready dataset. Here’s how it works.

Bring Raw Data

Import your raw dataset (or multiple datasets, if you want to combine them into a single dataset) via our user interface or API, and sit back as our algorithms get to work.

Structure, Standardize, and Detect Errors

After data import, the Cornerstone AI algorithms run across the dataset to understand and contextualize the data structure, standardize to dictionaries, and detect errors and anomalies in the data. High-dimensional AI/ML models learn the patterns across the entire dataset, allowing them to automatically generate unique, clinically relevant cleaning rules for each dataset and patient population.

Review & Annotate

After the algorithms finish, you can review the errors they found in the dataset and the text standardization they performed. Our goal is to surface the issues so you don’t have to dig for them. We explain why each item was flagged as an error, but you have the final say in what you do with the data. A human-in-the-loop workflow allows easy review of the AI-generated results in an intuitive web application for both data scientists and clinical experts.

Improve Algorithms & Track Changes

Any corrections you make feed directly into our algorithms, continuously improving them over time, to reduce the work for you and your team.

Anything changed throughout the process is logged to an audit trail so you and your teams have full traceability of data updates.

Export Clean Data

Your data is now harmonized, clean, and ready for analysis. Export the dataset, explainable rules, and audit trail and start finding answers in your data.

Leave the data cleaning to Cornerstone AI

Cornerstone algorithms automatically detect your dataset’s structure and profile its data quality so you get a high-level understanding of the usability of your data. Then, Cornerstone AI standardizes raw text and medical codes and constructs models of every data point in your dataset to detect clinical errors and, optionally, impute missing data.

Data Profiling

Automatic Structure Detection

The Cornerstone AI application intelligently parses all of the fields in a dataset and determines the relationships between them. It then determines whether each field represents a date, a unit, a code that maps to a medical dictionary, or anything in between.
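
To make field-type detection concrete, here is a minimal sketch that classifies a column from its values using simple heuristics. The function name `infer_field_type`, the date formats, the ICD-10-style pattern, and the 80% threshold are all assumptions chosen for illustration; they are not Cornerstone AI's actual method, which is described only at a high level above.

```python
import re
from datetime import datetime

# Assumed pattern for ICD-10-style codes: a letter, two characters, optional decimal part.
ICD10_LIKE = re.compile(r"^[A-Z]\d[0-9A-Z](\.[0-9A-Z]{1,4})?$")
# Assumed set of date formats to try; a real system would need many more.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _is_date(value: str) -> bool:
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


def _is_number(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_code(value: str) -> bool:
    return bool(ICD10_LIKE.match(value.upper()))


def infer_field_type(values, threshold=0.8):
    """Return 'empty', 'date', 'numeric', 'code', or 'text' for a column of strings."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "empty"
    for label, check in (("date", _is_date), ("numeric", _is_number), ("code", _is_code)):
        # Accept a type when at least `threshold` of non-empty values match it.
        share = sum(check(v) for v in cleaned) / len(cleaned)
        if share >= threshold:
            return label
    return "text"
```

Using a share threshold rather than requiring every value to match lets a column still be recognized as a date or code field when a few entries are malformed, which are exactly the entries a cleaning step would then flag.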

Multi-Source Harmonization

Looking to take multiple data sources, combine them, and find answers in the compiled dataset? No problem: the Cornerstone AI application automatically compares the datasets and stacks them together so you don’t have to complete any manual mapping.
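
The idea of comparing and stacking sources can be sketched as follows. In this simplified example a fixed alias table (`COLUMN_ALIASES`, a hypothetical name) stands in for the automated comparison described above, and the column names are invented for illustration. Columns with no known alias are dropped here, which a real system would not necessarily do.

```python
# Hypothetical alias table: lower-cased source column name -> canonical column name.
COLUMN_ALIASES = {
    "patient_id": "patient_id", "pt_id": "patient_id", "subject": "patient_id",
    "dob": "birth_date", "birth_date": "birth_date", "date_of_birth": "birth_date",
    "sex": "sex", "gender": "sex",
}


def harmonize(records, source_name):
    """Rename each record's columns to the canonical schema and tag its source."""
    out = []
    for rec in records:
        row = {"source": source_name}
        for col, value in rec.items():
            canonical = COLUMN_ALIASES.get(col.strip().lower())
            if canonical is not None:
                row[canonical] = value
        out.append(row)
    return out


def stack(*harmonized):
    """Stack harmonized sources into one table, filling absent columns with None."""
    columns = []
    for rows in harmonized:
        for row in rows:
            for col in row:
                if col not in columns:
                    columns.append(col)
    return [{col: row.get(col) for col in columns} for rows in harmonized for row in rows]


ehr = [{"PT_ID": "001", "DOB": "1980-02-01", "Gender": "F"}]
registry = [{"subject": "A7", "date_of_birth": "1975-11-30"}]
combined = stack(harmonize(ehr, "ehr"), harmonize(registry, "registry"))
```

Keeping a `source` column on every row preserves provenance after stacking, so downstream checks can still be run per source.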

Data Quality Score

From the start, the application will give you a visualization of how good the data is across the entire dataset, and then within individual tables and records. Focus your time on areas that have the most issues so you can clean the data more efficiently.
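
As a rough illustration of scoring quality per table, the sketch below uses completeness (the share of non-missing cells) as a stand-in metric and ranks tables from worst to best. The metric choice and the names `completeness` and `rank_tables` are assumptions; the score described above may combine many other signals.

```python
def completeness(rows, columns):
    """Share of non-missing cells in a table, from 0.0 to 1.0."""
    total = len(rows) * len(columns)
    if total == 0:
        return 0.0
    filled = sum(
        1 for row in rows for col in columns if row.get(col) not in (None, "")
    )
    return filled / total


def rank_tables(tables):
    """Return (table_name, score) pairs sorted so the weakest tables come first."""
    scores = {name: completeness(rows, cols) for name, (rows, cols) in tables.items()}
    return sorted(scores.items(), key=lambda item: item[1])


tables = {
    "labs": ([{"value": 5.1, "unit": ""}, {"value": None, "unit": "g/dL"}], ["value", "unit"]),
    "visits": ([{"date": "2021-01-01"}], ["date"]),
}
ranking = rank_tables(tables)
```

Sorting ascending puts the tables with the most gaps at the top of the list, matching the advice to spend review time where the issues are concentrated.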

Data Cleaning

Error Identification & Correction

Your data is mostly correct; the problem is that issues are hidden across the hundreds or thousands of fields. To automate identification of these issues, we model every data point to detect errors, surface issues, and make corrections when possible. Common errors include unit mismatches (inches instead of centimeters), date swaps, biologically implausible data, or inconsistencies across fields.
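
The unit-mismatch case mentioned above can be illustrated with a simple rule. This sketch uses a fixed plausibility range for adult height rather than a learned model, so it is a stand-in for the approach described here, not a reproduction of it. The range in `PLAUSIBLE_CM` and the function name `check_height` are assumptions.

```python
PLAUSIBLE_CM = (120.0, 230.0)  # assumed plausible adult height range, in centimeters
CM_PER_INCH = 2.54


def check_height(value_cm):
    """Classify a height recorded in a centimeter field and explain the decision."""
    low, high = PLAUSIBLE_CM
    if low <= value_cm <= high:
        return {"status": "ok", "value": value_cm, "reason": None}
    # If the value only makes sense as inches, propose the converted value.
    converted = round(value_cm * CM_PER_INCH, 1)
    if low <= converted <= high:
        return {
            "status": "suggest_fix",
            "value": converted,
            "reason": "value is plausible only if it was recorded in inches",
        }
    return {"status": "flag", "value": value_cm, "reason": "biologically implausible height"}
```

For example, `check_height(68)` proposes 172.7 cm with an explanation, while `check_height(12)` is flagged without a correction because no unit conversion makes it plausible. Returning a reason with every decision keeps the rule explainable to a reviewer.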

Text & Code Standardization

We all know some of the juiciest information is in unstandardized text format. Our Standardization Module automatically scans every field and detects which would benefit from standardization. We currently standardize diagnoses (ICD-10, SNOMED), procedures (CPT), labs (LOINC) and many more.

Missing Data Imputation

If desired, missing data can be intelligently filled in using imputation methods specialized by data type. Accuracy metrics allow you to balance your need for completeness with required precision. Data can be easily exported with and without these imputations.
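
One simple way to keep imputed and original values separable is to flag every filled cell. The sketch below uses median imputation for a numeric field as a deliberately basic stand-in for the type-specific methods mentioned above; the function name `impute_median` and the `_imputed` flag column are assumptions.

```python
from statistics import median


def impute_median(rows, column):
    """Fill missing values in `column` with the observed median and flag filled cells."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed) if observed else None
    out = []
    for r in rows:
        # Only mark a cell as imputed when a fill value is actually available.
        missing = r.get(column) is None and fill is not None
        new_row = dict(r)
        if missing:
            new_row[column] = fill
        new_row[f"{column}_imputed"] = missing
        out.append(new_row)
    return out


rows = [{"hgb": 13.0}, {"hgb": None}, {"hgb": 15.0}]
filled = impute_median(rows, "hgb")
```

Because each row carries an `hgb_imputed` flag, an export without imputations is just a filter on that flag, and the input rows are left unmodified.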

Data Integrity

HIPAA Compliance

Cornerstone AI focuses exclusively on healthcare data. We know and understand the stringent rules and regulations around data security and adhere strictly to HIPAA and other data use agreements.

Audit Trail

All actions by our algorithms, as well as any specifications made by users, are tracked in a regulatory-grade audit trail. Detailed change logs with explanations are available for export at any time. 
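
A change log of this kind can be pictured as an append-only list of records, each capturing what changed, who changed it, and why. The sketch below is illustrative only: the `ChangeRecord` fields, the `AuditTrail` class, and the JSON Lines export format are assumptions, not the product's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    record_id: str
    field_name: str
    old_value: object
    new_value: object
    actor: str        # e.g. "algorithm" or a user name
    reason: str       # human-readable explanation of the change
    timestamp: str    # UTC, ISO 8601


class AuditTrail:
    """Append-only log: entries can be added and exported, never edited."""

    def __init__(self):
        self._entries = []

    def log(self, record_id, field_name, old_value, new_value, actor, reason):
        entry = ChangeRecord(
            record_id, field_name, old_value, new_value, actor, reason,
            datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def export_jsonl(self):
        """Serialize the log as one JSON object per line."""
        return "\n".join(json.dumps(asdict(e)) for e in self._entries)


trail = AuditTrail()
trail.log("pt-001", "height_cm", 68, 172.7, "algorithm", "value recorded in inches")
```

Making each record immutable (`frozen=True`) and exposing no update method reflects the requirement that every change, whether made by an algorithm or a user, stays traceable.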

On Prem or Cornerstone Hosted

Healthcare data is sensitive and we understand the organizational rigor around protecting PHI. Cornerstone AI is a secure, cloud-first solution, but we can deploy in your own cloud environment if keeping data behind your firewall is a requirement.

Error Identification Use Cases

Cornerstone AI’s error identification algorithms build models of every data point and can detect a wide range of clinical errors within real-world datasets.

Want to dig into the details of the use cases?

Standardization Use Cases

Whether you call it standardization, harmonization, or normalization, clinical data is often unstandardized and inconsistent, which makes analysis difficult. Cornerstone AI standardizes raw text data into controlled clinical vocabularies including diagnoses (ICD-10, SNOMED), procedures (CPT), labs (LOINC) and many more. Below are some common standardization use cases.

Problem: No Coding System.

Solution: We standardize raw data into medical dictionaries (e.g., LOINC, ICD-10, CDISC).

Cornerstone AI automatically profiles datasets and identifies which records can be standardized to a medical dictionary. Once these are identified, the application uses dictionary matching and natural language processing (NLP) to assign accurate codes to the records.
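
Dictionary matching can be sketched with exact lookup, a synonym table, and a fuzzy fallback. The example below uses Python's standard `difflib` module and a two-entry ICD-10 dictionary; the `SYNONYMS` table, the 0.8 cutoff, and the function names are assumptions for illustration, and plain string similarity stands in for the NLP mentioned above.

```python
import difflib
import re

# Tiny illustrative dictionary; real vocabularies have tens of thousands of terms.
ICD10_TERMS = {
    "type 2 diabetes mellitus without complications": "E11.9",
    "essential (primary) hypertension": "I10",
}
# Hypothetical abbreviation and synonym table.
SYNONYMS = {
    "t2dm": "type 2 diabetes mellitus without complications",
    "htn": "essential (primary) hypertension",
    "high blood pressure": "essential (primary) hypertension",
}


def normalize(text):
    """Lower-case, drop punctuation other than parentheses, collapse whitespace."""
    text = re.sub(r"[^a-z0-9() ]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def assign_code(raw, cutoff=0.8):
    """Return (code, score); code is None when no term is similar enough."""
    term = normalize(raw)
    term = SYNONYMS.get(term, term)
    if term in ICD10_TERMS:
        return ICD10_TERMS[term], 1.0
    close = difflib.get_close_matches(term, list(ICD10_TERMS), n=1, cutoff=cutoff)
    if close:
        score = difflib.SequenceMatcher(None, term, close[0]).ratio()
        return ICD10_TERMS[close[0]], round(score, 2)
    return None, 0.0
```

Returning a score alongside the code lets low-confidence matches be routed to human review instead of being accepted silently, and returning `None` avoids forcing a code onto text that matches nothing.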

Problem: Multiple Coding Systems.

Solution: We provide crosswalks between various coding systems.

Cornerstone AI converts standardized terms between coding systems to ensure clinically similar information can be analyzed holistically. Common mappings include historical diagnoses in ICD-9 that need to be updated to ICD-10.
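
A crosswalk is, at its simplest, a lookup table from one coding system to another. The sketch below maps two ICD-9 diagnosis codes to ICD-10; the table `ICD9_TO_ICD10` is a two-row illustration, and the function name `to_icd10` is an assumption. Targets are stored as lists because a source code can map to more than one target code.

```python
# Illustrative excerpt only; a production crosswalk covers thousands of codes.
ICD9_TO_ICD10 = {
    "250.00": ["E11.9"],  # type 2 diabetes mellitus without complications
    "401.9": ["I10"],     # essential hypertension
}


def to_icd10(code, system):
    """Return the ICD-10 code(s) for a diagnosis recorded in ICD-9 or ICD-10."""
    key = system.upper().replace("-", "")
    if key == "ICD10":
        return [code]
    if key == "ICD9":
        # An empty list signals that no mapping is known for this code.
        return ICD9_TO_ICD10.get(code, [])
    raise ValueError(f"unsupported coding system: {system}")
```

After conversion, `to_icd10("250.00", "ICD-9")` and `to_icd10("E11.9", "ICD-10")` both yield `["E11.9"]`, so historical and current records for the same condition can be counted together.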

Problem: Detailed standardization creates too much noise.

Solution: We augment standardized terms with the relevant hierarchy for optimal grouping.

Coding systems are complex, and some commonly used systems have tens of thousands of unique terms. This allows precise terms to be selected at the point of care; however, that level of precision is often too granular for the questions that purchasers of EMR data want to answer.

Cornerstone maintains endpoints that use the hierarchical structure of dictionaries and other common disease classification systems to aggregate coded data into a manageable number of clinically meaningful categories.
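
For ICD-10, the simplest hierarchical roll-up is to group codes by their three-character category. The sketch below shows that one level of aggregation; the function names are assumptions, and real hierarchies (chapters, blocks, or other classification systems) offer coarser groupings that are not shown here.

```python
from collections import Counter


def icd10_category(code):
    """Return the three-character ICD-10 category for a full code."""
    return code.strip().upper()[:3]


def rollup(codes):
    """Count records per ICD-10 category instead of per detailed code."""
    return Counter(icd10_category(c) for c in codes)


# Three distinct type 2 diabetes codes and one hypertension code.
counts = rollup(["E11.9", "E11.65", "e11.21", "I10"])
```

Here three detailed diabetes codes collapse into the single category `E11`, which is usually the level at which a cohort question such as "how many patients have type 2 diabetes" is asked.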

Integration Options

UI

Review data reports, adjudicate discovered errors, and optimize algorithm configurations to best suit your needs through our web interface (deployed on-premises or hosted by us).

APIs

Need real-time data profiling, error detection, and standardization? We offer APIs that enable you to plug our algorithms directly into your data pipeline.

Frequently Asked Questions

  • Cornerstone is HIPAA compliant and performs penetration testing regularly to ensure data privacy and security.

    Our continually maintained security program documents and implements physical, administrative, and technical safeguards necessary to protect any confidential or patient-related information shared by customers.

  • The Cornerstone system supports clinical data related to any indication. The Cornerstone AI algorithms are self-learning and therefore indication-agnostic. Indications and cohorts we’ve worked with include rheumatoid arthritis, infertility, oncology, knee surgery recovery, COVID, Alzheimer's disease, and healthy controls.

    Our platform is built for and has demonstrated success with Real World Data (RWD) as well as data from patient registries and Phase II, III, and IV clinical trials.

    We’ve helped improve data quality in datasets ranging from fewer than 100 patients to more than 100,000 patients.

Get better data, faster.

Let’s chat about how Cornerstone can help your team reach its data goals.