We have entered the age of big data. In healthcare alone, the data being gathered grows in volume and complexity every day. This wealth of information has sparked the rise of open access databases for public and commercial use around the globe. All this freely available data can be a boon to innovation in life sciences. However, the risks these databases present, and the approaches regulatory agencies take toward those risks, are important considerations for your organization.
FDA has backed open access data as an important source of information for the life science industries. In 2014, the agency launched openFDA, which provides APIs and raw, downloadable datasets covering labeling, recalls, and adverse events. In 2015, it even issued a challenge for developers to analyze, model, and automate currently available information to evaluate and understand its impact.
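To make the openFDA APIs a little more concrete, here is a minimal Python sketch of how a team might build and send a query. The base URL and the `search` and `limit` parameters follow openFDA's public query format. The helper names `build_openfda_url` and `fetch_results` are our own, and the `reason_for_recall` field in the usage note is an example you should check against openFDA's field reference before relying on it.

```python
import json
import urllib.parse
import urllib.request

OPENFDA_BASE = "https://api.fda.gov"


def build_openfda_url(endpoint, search=None, limit=5):
    """Build a query URL for an openFDA endpoint such as 'drug/event'.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    params = {"limit": limit}
    if search:
        params["search"] = search
    # urlencode percent-escapes the query string for us.
    return f"{OPENFDA_BASE}/{endpoint}.json?{urllib.parse.urlencode(params)}"


def fetch_results(url, timeout=10):
    """Download a query and return its 'results' list (empty if absent)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload.get("results", [])
```

A call such as `fetch_results(build_openfda_url("food/enforcement", search="reason_for_recall:salmonella", limit=3))` would request up to three recall records. Anything you build on top of this should account for network failures and the API's rate limits.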
FDA has taken further steps to encourage industry to leverage open access resources outside the agency. In April 2018, FDA released final guidance detailing approaches to using agency-recognized public databases for genetic and genomic in vitro diagnostics. While this move is limited in scope, it represents a real shift in FDA’s position on, and reliance on, open access data in its regulatory activities. By encouraging greater use of these resources, FDA appears to be bolstering industry efforts to build and market original, evidence-based, and clinically valid products.
Open access data can be incredibly helpful for life science organizations making new and advanced products. However, it also presents a handful of challenges and risks for your teams to evaluate and control. To confirm your product’s safety and effectiveness when open access data is used, focus on two major risk areas: data integrity and accuracy, and built-in biases.
While some databases have controls in place to ensure the accuracy and integrity of submitted data, data quality remains a broad concern for life science organizations. Inaccurate data can skew results, guide development down inappropriate paths, and potentially expose your users and patients to undue hazards and harms.
Datasets are susceptible to human bias. Sometimes information is missing from a database because participants fear adverse consequences from reporting it; other times, information is over-reported. Unequal access to reporting channels and sampling bias can also distort a given dataset. While open access does allow more sources of data to be compiled in a centralized location, that does not mean the information is free of human bias.
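One simple way to look for sampling bias is to compare how groups are represented in a dataset against a reference population. The sketch below is illustrative only: the function name `representation_gaps`, the group labels, and the 5 percent tolerance are all assumptions, and the reference shares would have to come from a source your team trusts.

```python
def representation_gaps(sample_counts, reference_shares, tolerance=0.05):
    """Return groups whose share of the sample differs from the reference
    share by more than `tolerance` (values are sample share minus reference).

    Hypothetical helper: names and threshold are illustrative assumptions.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts is empty")
    gaps = {}
    for group, reference in reference_shares.items():
        share = sample_counts.get(group, 0) / total
        difference = share - reference
        if abs(difference) > tolerance:
            gaps[group] = round(difference, 3)
    return gaps
```

A negative value means the group is under-represented in the dataset relative to the reference, and a positive value means it is over-represented. A check like this flags possible bias. It does not explain the cause or correct for it.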
It’s a near-impossible task for your life science organization to control all the risks of open access data. Because you’re not authoring the data, its veracity and validity in the scope of your product development cannot be 100 percent guaranteed. However, for data you decide to incorporate, being able to identify these issues and implement controls can be valuable.
Linking raw information to demographic data, for example, can provide context and generate insights that can be used in your product’s development. Reviewing the database’s own data controls, and identifying measures your organization can take to verify information, can also be useful for risk control. Using FDA-recognized databases can further help support integrity and accuracy. By identifying and controlling these risks early on, you can save significant time in later stages.
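As a rough illustration of the two ideas above, the Python sketch below first screens records for basic integrity problems and then links the remaining records to demographic counts. Everything in it is hypothetical: the field names (`report_id`, `age_band`, `event`), the helper names, and the population figures in the usage note are assumptions, not the structure of any particular database.

```python
from collections import Counter

# Hypothetical required fields; adjust to the dataset you actually use.
REQUIRED_FIELDS = ("report_id", "age_band", "event")


def screen_records(records):
    """Split records into usable ones and rejected ones with a reason.

    A record is rejected if a required field is missing or empty, or if
    its report_id has already been seen.
    """
    seen_ids = set()
    usable, rejected = [], []
    for record in records:
        missing = [f for f in REQUIRED_FIELDS if not record.get(f)]
        if missing:
            rejected.append((record, "missing: " + ", ".join(missing)))
        elif record["report_id"] in seen_ids:
            rejected.append((record, "duplicate report_id"))
        else:
            seen_ids.add(record["report_id"])
            usable.append(record)
    return usable, rejected


def reports_per_100k(records, population_by_age_band):
    """Link report counts to population counts for each age band."""
    counts = Counter(record["age_band"] for record in records)
    return {
        band: counts.get(band, 0) * 100_000 / population
        for band, population in population_by_age_band.items()
        if population > 0
    }
```

For example, passing the usable records and a mapping such as `{"18-44": 200_000, "65+": 50_000}` to `reports_per_100k` puts raw report counts in the context of group size. Keep in mind that a reporting rate is not an incidence rate, since under-reporting and over-reporting still apply.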