Should You Use Open Access Data in Your Life Science Product Development?
We have entered the age of big data: the volume of data gathered in healthcare alone grows larger and more complex every day. This wealth of information has sparked the rise of open access databases for public and commercial use around the globe. All this freely available data can be a boon to innovation in life sciences. However, the risks these databases present, and the approaches regulatory agencies take toward those risks, are important considerations for your organization.
Regulatory Precedent for Open Access
FDA has backed open access data as an important source of information for life science industries. In 2014 they launched openFDA, which provides APIs and raw, downloadable datasets. These datasets include information related to labeling, recalls, and adverse events. In 2015, they even issued a challenge for developers to analyze, model, and automate currently available information to evaluate and understand its impact.
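As a rough illustration of how a team might start exploring the openFDA APIs, the Python sketch below builds a query URL and fetches a handful of records. The helper names (build_openfda_url, fetch_results) are hypothetical, and the device.generic_name search field is used only as an example; check endpoint and field names against the openFDA documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

OPENFDA_BASE = "https://api.fda.gov"


def build_openfda_url(endpoint, search="", limit=1):
    """Build a query URL for an openFDA endpoint such as 'device/event'.

    `search` uses openFDA's field:term syntax; the field names passed in
    are the caller's responsibility and are not validated here.
    """
    params = {}
    if search:
        params["search"] = search
    params["limit"] = limit
    return f"{OPENFDA_BASE}/{endpoint.strip('/')}.json?{urlencode(params)}"


def fetch_results(url, timeout=30):
    """Fetch a query URL and return the list under the 'results' key."""
    with urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    return payload.get("results", [])


if __name__ == "__main__":
    # Assumed example query: adverse event reports mentioning pumps.
    url = build_openfda_url("device/event", "device.generic_name:pump", limit=5)
    print(url)
    for record in fetch_results(url):
        print(sorted(record.keys())[:5])
```

Keeping URL construction separate from the network call makes the query logic easy to review and test without contacting the service.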
Apart from openFDA, the agency also has a listing of databases at the industry’s disposal. Many of these relate to regulatory data FDA collects and manages, but are nonetheless advantageous for organizations doing preliminary market research and working on their design and development timelines. Some of these databases include:
- Approvals for 510(k)s, PMAs, and De Novo requests
- Medical device reporting
- Post-approval studies
- Recognized consensus standards
- Product life cycle data
FDA has taken further steps to encourage industry to leverage open access resources outside the agency. In April 2018, FDA released final guidance detailing approaches to using agency-recognized, public databases for genetic and genomic in vitro diagnostics. While this move is limited in scope, it represents a real shift in FDA’s position on, and reliance on, open access data in its regulatory activities. By encouraging greater use of these resources, FDA appears to be bolstering industry efforts to build and market original, evidence-based, and clinically valid products.
Risks and Challenges
Open access data can be incredibly helpful for life science organizations making new and advanced products. However, it also presents a handful of challenges and risks for your teams to evaluate and control. In order to confirm your product’s safety and effectiveness when open access data is used, there are two big risk areas to focus on: data integrity and accuracy, and built-in biases.
Data Integrity & Accuracy
With open access, data is visible and available to everyone. In addition, information can be submitted to databases by just about anyone. ClinGen, for example, allows individual patients to upload their genetic and health information into its database. The accuracy and fidelity of this data is therefore subject to a number of human factors. Cybersecurity is a related risk: malicious actors can alter, delete, or block access to critical information within the database.
While some databases have controls in place for ensuring accuracy and integrity of submitted data, this is still an overall issue for life science organizations. Inaccurate data can skew results, guide development down inappropriate paths, and potentially expose your users and patients to undue hazards and harms.
Built-In Biases
Datasets are fairly susceptible to human biases. Sometimes information is missing from a database because participants fear adverse consequences from reporting it; other times, information is over-reported. There are also general concerns around access to reporting and sampling bias that can affect a given dataset. While open access does allow more sources of data to be compiled in a centralized location, that does not mean the information is free of human bias.
This can roll into an issue known as algorithmic discrimination: the risk of data analysis algorithms producing biased results. While this is more often discussed as a public policy concern about discrimination against certain groups of people, it can absolutely impact life science industries. Any algorithm built to analyze data that is biased toward certain segments of your product’s intended population can produce a skewed understanding of that population as a whole. That helps neither you, your users and patients, nor regulators.
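One simple way to look for this kind of skew is to compare how groups are represented in a dataset against their expected share of the product's intended population. The Python sketch below is a minimal illustration only: the function name, the group labels, and the 5 percent tolerance are all assumptions, not an established method or regulatory threshold.

```python
def representation_gaps(sample_counts, target_shares, tolerance=0.05):
    """Return groups whose share of the dataset differs from the target.

    sample_counts: mapping of group -> number of records in the dataset.
    target_shares: mapping of group -> expected share (0 to 1) in the
        intended population (assumed to be supplied by your own team).
    tolerance: absolute difference in share that is still acceptable.
    """
    total = sum(sample_counts.values())
    if total == 0:
        raise ValueError("sample_counts contains no records")

    gaps = {}
    for group, target in target_shares.items():
        observed = sample_counts.get(group, 0) / total
        difference = observed - target
        if abs(difference) > tolerance:
            # Negative means under-represented, positive over-represented.
            gaps[group] = round(difference, 3)
    return gaps


if __name__ == "__main__":
    # Hypothetical counts and targets, for illustration only.
    counts = {"female": 200, "male": 800}
    targets = {"female": 0.5, "male": 0.5}
    print(representation_gaps(counts, targets))
```

A check like this only flags imbalance in the categories you choose to measure; it does not detect under-reporting or other biases that leave no trace in the recorded fields.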
Controlling Open Access Risks
It’s a near-impossible task for your life science organization to control all the risks of open access data. Because you’re not authoring the data, its veracity and validity in the scope of your product development cannot be 100 percent guaranteed. However, for data you decide to incorporate, being able to identify these issues and implement controls can be valuable.
Linking raw information to demographic data, for example, can provide context and generate insights that can be used in your product’s development. Reviewing the database’s own data controls, and identifying measures your organization can take to verify information, is useful for risk control too. Using FDA-recognized databases also helps you ensure integrity and accuracy. By identifying and controlling these risks early on, you can save significant time in later stages.
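As one example of a verification measure an organization could apply on its own side, the Python sketch below records a SHA-256 digest for a downloaded dataset file and later confirms that the file still matches it. This is an assumed workflow for illustration: the function names are hypothetical, and a matching digest only shows the file has not changed since the digest was recorded, not that the data was accurate to begin with.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_digest(path, recorded_digest):
    """Return True if the file's digest equals the one recorded earlier."""
    return sha256_of_file(path) == recorded_digest.strip().lower()
```

Storing the recorded digest alongside the download date and source gives later reviewers a simple way to confirm which version of a dataset fed a given analysis.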
About Cognition Corporation
At Cognition, our goal is to provide medical device and pharmaceutical companies with collaborative solutions to the compliance problems they face every day, allowing the customer to focus on their products rather than the system used to create them. We know we are successful when our customers have seamlessly integrated a quality system, making day-to-day compliance effortless and freeing up resources to focus on product safety and efficacy.