Measuring Accuracy in Health Information Networks

How do you know that your scale at home is accurate? What about the accuracy of detecting contaminants in a glass of tap water? Or even, how accurate is a viral infection test result?

Well, that all depends on what you mean by accurate.

If you scored 90% on a test, it means that most of your responses matched the commonly accepted correct answers--and a few did not. In most scientific scenarios, accuracy is measured against a true value, one typically established by industry consensus.

For each of these scenarios, there are established ways to determine accuracy, regardless of how the measurement is produced. Precision weights verify the accuracy of a scale, while referential standards establish the sensitivity and specificity of results in both inorganic chemistry and infectious disease testing. Without these engineered materials, entire industries would struggle with basic problems like manufacturing equipment within tolerance or consistently meeting FDA guidelines.
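As a rough illustration of how a referential standard grounds those last two metrics, here is a minimal sketch (with made-up counts) of computing sensitivity and specificity by comparing test results against known true values:

```python
# Minimal sketch: sensitivity and specificity against a known reference standard.
# The sample values are hypothetical, purely for illustration.

reference = [True, True, True, False, False, False, False, True]   # known true values
measured  = [True, True, False, False, False, True, False, True]   # what the test reported

tp = sum(r and m for r, m in zip(reference, measured))              # true positives
tn = sum((not r) and (not m) for r, m in zip(reference, measured))  # true negatives
fn = sum(r and (not m) for r, m in zip(reference, measured))        # false negatives
fp = sum((not r) and m for r, m in zip(reference, measured))        # false positives

sensitivity = tp / (tp + fn)   # how often true positives are caught
specificity = tn / (tn + fp)   # how often true negatives are correctly ruled out

print(f"Sensitivity: {sensitivity:.2f}, Specificity: {specificity:.2f}")
```

Without the known reference values in the first list, neither number could be computed at all--which is exactly the gap in healthcare data exchange discussed below.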

So what about healthcare data?

Healthcare data remains on a unique journey down the expanding road of interoperability, yet it is still somewhat inaccessible for research because of the sensitive personal linkages it contains. This is good from a privacy perspective, but limiting at a time when the broader technology community is managing large data sets with ever greater efficiency and accuracy. That puts greater emphasis on Health IT professionals to draw on techniques from these other industries and apply them thoughtfully to healthcare data.

At the national level, TEFCA and its Qualified Health Information Networks (QHINs) are positioning to connect large networks and individual organizations. In California, a statewide data exchange framework seeks to connect regional HIOs and individual partners. Concurrently, more data types, such as social data elements, are being collected by vendors and HIOs/HDUs alike with varying levels of standardization. Yet despite the evergreen push for interoperability, there is no accepted model for assessing data quality as it passes through multiple networks. Fortunately, a similar problem is being solved in the machine learning industry, where large sets of data are used to train, test, and evaluate model outputs.

Some existing examples of large benchmark data sets are BIG-bench and Words-in-Context (WiC). These sets test arithmetic operations, assess rare word usages, and even probe semantic understanding. There are plenty of LinkedIn ‘experts’ who will describe this process in further detail; the takeaway is that even AI models have expected outputs against which they can be tuned and fitted for greater accuracy.
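The mechanics are simpler than the benchmarks themselves: compare model outputs against the benchmark's expected answers and compute a score. A minimal sketch, using hypothetical question/answer pairs and a placeholder model function rather than any real benchmark:

```python
# Minimal sketch of benchmark-style evaluation: compare model outputs to expected answers.
# The task items and the fake_model function are hypothetical stand-ins.

benchmark = [
    {"prompt": "2 + 2", "expected": "4"},
    {"prompt": "Is 'bank' used in the same sense in both sentences?", "expected": "no"},
    {"prompt": "17 * 3", "expected": "51"},
]

def fake_model(prompt: str) -> str:
    """Placeholder for a real model call; returns canned answers for illustration."""
    canned = {"2 + 2": "4", "17 * 3": "52"}
    return canned.get(prompt, "yes")

correct = sum(fake_model(item["prompt"]) == item["expected"] for item in benchmark)
accuracy = correct / len(benchmark)
print(f"Benchmark accuracy: {accuracy:.0%}")   # known expected outputs make the score objective
```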

Perhaps the largest data sets that aggregate and transit across several networks are those built for population health insights. But could you objectively assess the accuracy of disease data, immunization counts, or even cancer registries reported from different local health jurisdictions and transformed at various data intermediaries?

What is “accurate” for these scenarios?

Now, a few seasoned Health IT professionals might point to USCDI+ at ONC, which does have a quality component. But this frames a solution similar to that of the National Committee for Quality Assurance (NCQA)--code sets and data categories. While humans remain fascinated with classification systems, this endless pursuit of one standard to rule them all deserves heavy skepticism.

All code sets originate with some ambiguity and subjectivity--this is the fundamental flaw. Some users apply one broad code to entire populations, others follow organizational policies, and worse yet, some maintain ‘local’ codes that may not correspond directly to a LOINC/SNOMED/CPT code. Further, what is collected in a county or city does not always correspond to the categories used at the state level, and even less so at the federal level. Thus, higher-fidelity information gets mapped to broader categories and degrades as it moves through different databases and networks. An example might be ethnicity values collected at the local level but reduced to a few categories at the federal level. Other values, such as preferred language, religion, or functional limitations, may be dropped completely depending on the data intermediary.
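A small sketch of that degradation, using made-up local ethnicity values and a hypothetical crosswalk to broader categories; note what survives the mapping and what silently disappears:

```python
# Minimal sketch: higher-fidelity local values collapsed into broader categories.
# The local values and the crosswalk below are hypothetical, for illustration only.

crosswalk = {
    "Mexican American": "Hispanic or Latino",
    "Puerto Rican": "Hispanic or Latino",
    "Hmong": "Asian",
    "Local-code-042": None,   # local code with no corresponding standard category
}

local_records = ["Mexican American", "Puerto Rican", "Hmong", "Local-code-042", "Hmong"]

mapped = [crosswalk.get(value) for value in local_records]
retained = [m for m in mapped if m is not None]

print(f"Distinct local values: {len(set(local_records))}")   # 4
print(f"Distinct mapped values: {len(set(retained))}")       # 2 -- fidelity lost
print(f"Records dropped outright: {mapped.count(None)}")     # 1 -- unmapped local code
```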

Lastly, some may argue for 'direct reporting' to a large entity such as the CDC over a secure API. However, this thought contagion incubates the idea of building more systems and processes to circumvent existing networks, which leads to more siloed channels. Probably the greatest risk of these direct-reporting scenarios is that many organizations would still need to configure and maintain an API to push messages to other sites, which may exceed their resource capabilities.

So, what is the solution?

Well, first, let’s assume that we want to use the existing networks in the country and not build a highway next to another highway to fix the problem. I believe the solution to be in two parts:

The first part, which I alluded to in a previous article about data standards, is data provenance. This is a familiar concept within healthcare records and in HIOs/HDUs, but it starts to unravel when data sets are managed by multiple organizations. Organizations that transform and aggregate data should be assigned unique IDs so that the final receiver of a data set has visibility into the entire data supply chain, from the originators of the information to the intermediate handlers. A unique ID also reduces the computationally expensive metadata overhead that comes with traditional provenance headers. This concept is common in other technologies, such as blockchain, IP protocols, and insurance clearinghouses.
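A minimal sketch of what that could look like: each handler appends its unique ID to a lightweight provenance trail before forwarding, so the final receiver sees the whole data supply chain. The field names and organization IDs below are hypothetical, not drawn from any existing standard:

```python
# Minimal sketch: a lightweight provenance trail of unique organization IDs.
# Field names and org IDs are hypothetical illustrations, not an existing standard.

from datetime import datetime, timezone

def append_provenance(message: dict, org_id: str) -> dict:
    """Each intermediary appends its unique ID and a timestamp before forwarding."""
    message.setdefault("provenance", []).append({
        "org_id": org_id,
        "handled_at": datetime.now(timezone.utc).isoformat(),
    })
    return message

# Originating clinic -> regional HIO -> state intermediary -> final receiver
msg = {"payload": {"condition_code": "U07.1", "report_date": "2024-05-01"}}
msg = append_provenance(msg, "ORG-0001")   # originator of the record
msg = append_provenance(msg, "HIO-0042")   # regional HIO that aggregated it
msg = append_provenance(msg, "STATE-07")   # state-level intermediary that transformed it

# The final receiver can now see the entire data supply chain:
print([hop["org_id"] for hop in msg["provenance"]])   # ['ORG-0001', 'HIO-0042', 'STATE-07']
```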

The second part of the solution attempts to measure how data is distorted from its original source, rather than attempting to standardize every data category down to the lowest level. Report receivers at higher levels, such as the CDC, CMS, or state agencies, should distribute ‘referential data sets’ to public health authorities and health systems, which report them back in the same manner they would report live data. With this method, a receiver can definitively measure the accuracy of the information because the inputs--the true values--are known. This could be done in several ways: by region, by date range, or by a particular category. Additionally, the testing could be either time-based (quarterly) or event-based (a change in the data reporting process). The objective is not to call out data quality issues at other organizations, or even to compare them, but to have some benchmark for data quality as it transits large networks. Generally, a final receiving organization would be concerned with the percentage of data elements lost, any data enhancements or degradations, and the time it took to receive the data set.
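One way this could work in practice, sketched with hypothetical record structures and field names: the receiver distributes a reference set with known true values, the reporting organization pushes it back through its normal pipeline, and the receiver scores what arrives:

```python
# Minimal sketch: scoring a returned referential data set against the known true values.
# Record structures and field names are hypothetical, for illustration only.

reference = {
    "rec-001": {"condition": "A01.0", "race": "Asian", "language": "Hmong"},
    "rec-002": {"condition": "U07.1", "race": "White", "language": "Spanish"},
    "rec-003": {"condition": "B05.9", "race": "Black or African American", "language": "English"},
}

received = {   # what came back through the live reporting pipeline
    "rec-001": {"condition": "A01.0", "race": "Asian", "language": None},      # language dropped
    "rec-002": {"condition": "U07.1", "race": "Other", "language": "Spanish"}, # race degraded
    # rec-003 never arrived
}

total_elements = sum(len(fields) for fields in reference.values())
lost, altered = 0, 0
for rec_id, fields in reference.items():
    returned = received.get(rec_id)
    for name, true_value in fields.items():
        if returned is None or returned.get(name) is None:
            lost += 1                      # element (or whole record) never made it back
        elif returned[name] != true_value:
            altered += 1                   # element arrived but was transformed/degraded

print(f"Elements lost: {lost}/{total_elements}")       # 4/9
print(f"Elements altered: {altered}/{total_elements}") # 1/9
```

The same comparison could be extended to record transit time, since the receiver knows exactly when the referential set was distributed.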

When machines produce too much friction, engineers apply lubricants to reduce the heat; they don't try to circumvent thermodynamics. When data falls outside the expected parameters of a function, programmers add exception handling rather than redesigning around every rare scenario. We need to be just as practical when managing large health data sets and rule out the one-size-fits-all solution. No matter the standard, data that leaves an organization and passes through several HIOs/HDUs will not arrive in the same format, but it can still be accurate.
