Eight important concepts when scoring medical data

The Institutional Review Board (IRB), the Health Insurance Portability and Accountability Act (HIPAA) in the United States, and the General Data Protection Regulation (GDPR) in the European Union impose significant compliance standards on a High Speed Analytics Solution for medical applications that don’t exist for a non-medical analytics solution. Because of critical issues with efficacy, we recommend hiring an expert and using purpose-built tools that are more efficient than after-market or open-source tools.

A non-medical data company can secure data without the medical requirements. Failures, while embarrassing for the company, do not generally result in fines, industry censure or forced shutdown of the operation. On the other hand, the Medical Data Science standard is much higher and affects storage, use, security and distribution of the data.

All medical data science that involves patient data is subject to an IRB that reviews usage, verifies efficacy, and considers the security of patient data. Improper handling of data can shut down a project based on the IRB’s concerns about invasion of privacy and patient security. An IRB issue can halt an otherwise promising medical development simply because engineers are used to tools that encourage “ad-hoc” data usage, which raises efficacy and security concerns within the IRB. The outcome can be a failed FDA approval, ending both the analytics and the product that is the subject of the analytics. Getting the correct people with the correct tools is critical to success.

With a grasp of the following terms, you’re well on your way to understanding the complexities of managing and budgeting a medical research project:

  • Human Subjects
  • Animal Subjects
  • Hypothesis Approval
  • Personally Identifiable Information
  • De-Identified Information
  • Standard Operating Procedure
  • Data Chain of Custody
  • Platform Certification

Human Subject

Any medical research, even research as simple as a step-measuring wristwatch, involves a human subject. As soon as a company wishes to make a medically significant tool, a human subject is involved and FDA approval rules kick in. An IRB is formed for the project in a way that demonstrates the IRB’s independence from the financial and business concerns of the research. The IRB reviews the research, analytics and data to ensure that the risks to the human subjects do not outweigh the benefits of the research outcome.

Any company can make a step counter (or other medical device). Using the counter in a medical context requires following FDA, IRB and efficacy standards that affect how the data is accessed, how the reports are defined, how the data is stored and how a record of who sees the data is kept. These changes represent both a development and a management challenge. Traditional methods of data analytics will choose efficiency over efficacy, and access and analysis over security. These tradeoffs will not satisfy the IRB for human subjects in medical research.
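As a rough illustration of keeping "a record of who sees the data", the sketch below wraps a record store so that every read is logged before anything is returned. The class name AuditedStore, the field names and the purpose string are all hypothetical; this is a minimal sketch of the idea, not a description of any particular product or of what a given IRB requires.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical in-memory store that logs every read attempt."""

    def __init__(self, records):
        self._records = dict(records)
        self.access_log = []  # one entry per read attempt, in order

    def read(self, user, record_id, purpose):
        # Log first, so failed lookups are recorded as well.
        self.access_log.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "record_id": record_id,
            "purpose": purpose,
            "found": record_id in self._records,
        })
        return self._records.get(record_id)
```

A caller would read through the wrapper, for example store.read("alice", "r1", "protocol-7"), and the access_log list could then be handed to a reviewer.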

Medical Animal Subject

Animal subjects are being treated with more humanity every day, and many medical studies today involve animal subjects. An animal subject may be a precursor to a human subject, or it may be an end in itself for veterinary research. The standards for animal research are rapidly approaching those for humans. The main difference is that data from animal subjects is not required to comply with the security and privacy constraints of human subjects. Companies, however, generally expect the same security of data for animal subjects as for human subjects, as the data is still mission critical to the company’s success.

Hypothesis Approval

When medical research is involved, an engineer must generally get IRB approval before executing a query on patient data. This is because the efficacy of the data use (in terms of risk to the patient’s security and privacy) must be established before the query is conducted. This constraint is difficult for experienced non-medical data scientists to grasp. The industry standard is to (1) gather the data and then (2) perform ad-hoc queries on it to discover key details. Tools like Elasticsearch and Hadoop are “ad-hoc” query tools, designed to gather data and then query it.

The traditional method of compiling the data and then executing “ad-hoc” queries raises ethical issues related to invasion of patient privacy and security; patients are protected by HIPAA and GDPR from these approaches. Thus, analytical queries are defined in advance, approved by the IRB and must demonstrate medical efficacy.
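One way to picture "defined in advance, approved by the IRB" is a gate that refuses any query that is not on an approved list. The following is a minimal sketch under stated assumptions: the registry, the query ID, the hypothesis text and the field names are all invented for illustration, and a real system would tie the registry to the approved protocol rather than to a hard-coded dictionary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApprovedQuery:
    """A query whose purpose and fields were reviewed before data is touched."""
    query_id: str
    hypothesis: str
    allowed_fields: frozenset


class QueryNotApproved(Exception):
    pass


# Hypothetical registry of pre-approved queries.
APPROVED = {
    "steps-vs-recovery": ApprovedQuery(
        query_id="steps-vs-recovery",
        hypothesis="Daily step count correlates with recovery time",
        allowed_fields=frozenset({"subject_code", "daily_steps", "recovery_days"}),
    )
}


def run_query(query_id, requested_fields, records):
    """Return only approved fields; refuse anything not pre-approved."""
    approved = APPROVED.get(query_id)
    if approved is None:
        raise QueryNotApproved(f"query {query_id!r} has no approval on file")
    extra = set(requested_fields) - approved.allowed_fields
    if extra:
        raise QueryNotApproved(f"fields not covered by approval: {sorted(extra)}")
    return [{f: r[f] for f in requested_fields} for r in records]
```

The point of the design is that an ad-hoc request fails loudly instead of quietly returning data: both an unknown query ID and an approved query asking for an extra field raise QueryNotApproved.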

In general, analytic systems do not protect patient privacy. An efficient tool can use these restrictions to its advantage, as the Painted Streams product does; without the correct tools, projects can fail. Painted Streams, by Painted Intelligence, specifically handles data gathering for hypothesis testing by optimizing for the pre-defined nature of the data query. Painted Streams gives you instant information in an FMEA (Failure Mode and Effects Analysis), informing you hours or days before a traditional gather, process and store system would. Working with an efficient tool designed to solve your most difficult problems reduces labor and frustration and sometimes saves projects.

Personally Identifiable Information (PII)

Personally Identifiable Information is any piece of information that connects medical data to a human subject. Examples of PII are a name, a social security number (or other government ID) and an address. In some cases the medical data itself, like a fingerprint, body image or DNA record, qualifies as PII [1].

De-Identified Information

De-identified information is information that is organized in such a way that it cannot be associated with the human subject that originated it. It is important to remember that HIPAA standards are vague on de-identified information, but best practices generally require use of de-identified information only with the patient’s consent: even if the patient is no longer associated with the data, they still “own” it. Many organizations follow a Best Use policy that takes all international laws into account so that they don’t violate them as the company or product expands.
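To make the idea concrete, the sketch below drops direct identifiers from a record and replaces the subject ID with a keyed pseudonym. The field names, the identifier list and the 16-character code length are assumptions made for illustration. Strictly speaking this is pseudonymization; whether it counts as de-identified under a given policy depends on the remaining fields and on who holds the key.

```python
import hashlib
import hmac

# Assumed list of direct identifiers; a real list comes from the governing policy.
DIRECT_IDENTIFIERS = {"name", "ssn", "address"}


def pseudonym(subject_id: str, secret_key: bytes) -> str:
    """Keyed hash, so the code cannot be recomputed without the key."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the subject ID with a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "subject_id"
    }
    cleaned["subject_code"] = pseudonym(record["subject_id"], secret_key)
    return cleaned
```

A keyed hash is used here rather than a plain hash because subject IDs are often short and guessable, and an unkeyed hash of a guessable value can be reversed by trying every candidate.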

Standard Operating Procedures (SOPs)

A company’s best practices or best use are generally codified in a series of SOPs that clarify the company’s practice in specific situations. SOPs can be quite long and at times bureaucratic. Changing them can require review by all outstanding IRBs and other governance bodies. Working with experienced engineers who understand both the medical use certification process and the specific processes in your organization’s SOPs can be very important to a successful outcome.

Data Chain of Custody

A concept called “Chain of Custody” provides a structure for describing data on servers, networks and client machines. This structured review of data on and between systems is critical to examining the threats, exposure, encryption and security of the data, and it helps the IRB determine the patients’ security and privacy exposure at any given point in the system.

A chain of custody can be quite complicated. Take the simple step-tracking device mentioned earlier: Nokia makes a medically approved step-tracking device (Nokia Withings Watch). The watch produces data and sends it to a smartphone via Bluetooth, and the phone sends it to a Nokia server. A third party (a medical group) can then access that Nokia server, put the data on their own server, analyze it, and present it to a doctor or researcher and/or deposit it in a medical record on an additional server. Pictured, the flow of data through the chain of custody looks like this:

[Figure: data chain of custody]

In this simplified diagram, there are 7 platforms that the data generated by the watch can end up on. There are also 8 networks that the data goes through. For each of these, the IRB will need to know and approve the PII accompanying the data, the encryption associated with the data, and the protection methods and certification of each system and network the data is on. Given that a particular entity may be dealing with multiple devices, multiple collection systems, multiple data and analytic servers and the like, this can grow exponentially complicated.
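A chain of custody like this can be written down as data, which makes it easier to enumerate what a reviewer has to look at. The sketch below models only a few hops loosely based on the watch example in the text, not the full diagram; the hop names, the encryption flags and the PII flags are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One transfer of data from one platform to another over a network."""
    source: str
    destination: str
    network: str
    encrypted_in_transit: bool
    carries_pii: bool


# Hypothetical hops; every value here is illustrative, not measured.
CHAIN = [
    Hop("watch", "phone", "Bluetooth", True, True),
    Hop("phone", "vendor server", "internet", True, True),
    Hop("vendor server", "research server", "internet", True, False),
    Hop("research server", "medical record server", "internal network", False, False),
]


def platforms(chain):
    """Every distinct system the data rests on, in first-seen order."""
    seen = []
    for hop in chain:
        for name in (hop.source, hop.destination):
            if name not in seen:
                seen.append(name)
    return seen


def review_items(chain):
    """Hops a reviewer would likely question: unencrypted transfers."""
    return [hop for hop in chain if not hop.encrypted_in_transit]
```

With the chain in this form, platforms(CHAIN) lists each system that needs certification details, and review_items(CHAIN) picks out the transfers that need an explanation. Adding a device or server means adding hops, which shows how quickly the review surface grows.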

Having the right leader in place to communicate this architecture to an IRB and neatly outline all the complex aspects of the network, encryption, storage and the like is key to a project’s success through IRB management and ultimately FDA approval.

Platform Certification

Every platform that handles patient data requires certification. Some companies elect to go with an AWS-style cloud server and get their certification on their own. Other companies choose to use services like Aptible Enclave, which already comes with a number of certifications (or “passed audits”). In the Aptible Enclave case, it is certified with HIPAA, ISO 27001, SOC 2, FISMA and FERPA and has passed several industry audits from Genentech, Roche, Amgen and others.


The advanced requirements of Medical Data Analytics call for people and tools designed to meet them. For more information about managing the complex world of Medical Data Science and High Speed Analytics platforms, contact the experts at Telegraph Hill Software:

David Urry
Telegraph Hill Software
535 Mission St, San Francisco, CA 94105

[1] https://www.varonis.com/blog/is-dna-really-personally-identifiable-information-pii-no-maybe-yes/