Data Collection in Phase II-III Clinical Research – Is It Ripe for Disruption?

Current data collection practices in clinical research face many challenges, and there is broad recognition that problems persist in both the processes and the systems used.

Industry leaders such as Janet Woodcock, of the FDA’s Center for Drug Evaluation and Research, claim that “the clinical trials system is ‘broken’ and there needs to be new ways to collect and utilise patient data” (Woodcock, 2017).

In a survey of over 300 industry professionals assessing the barriers to the delivery of clinical trials, over two thirds of respondents identified challenges in the lack of data visibility and the inability to interrogate related issues in real time. Too many disparate data sources made it difficult to foresee issues, and the majority favoured the development of unified data platforms, such as interactive dashboards for analytics, and the automation of key performance indicators and alerts. All respondents identified data quality as their biggest challenge (Hublou, 2016).


The lack of generalizability (Moore, et al., 2000) of clinical research results to broader populations is a concern, and the belief prevails that the data collected is insufficient and not being used in the right way (Frieden, 2017).

There is growing recognition of the value of secondary use of real world data. Several real world data sources have emerged in recent years to complement electronic case report form (eCRF) data in phase II-III trials, such as electronic health records (EHRs), electronic patient reported outcomes (ePRO) and data from wearables. The FDA is actively encouraging the use of real-world data (FDA, 2018) from sources such as EHRs, claims and billing data and patient registries, as represented in figure 1.1: Sources of Real World Data (Clinical Research Corporation, 2017).

There have been successful deployments of EHR data for late phase trials, but EHR data is not yet commonly used in phase II-III trials as the primary research data. One of the main challenges is the standardisation of data. In the US alone, there are over 250 EHR system providers, and the inability to share EHR data across platforms has raised both security and safety concerns (Siemens, 2016).

In parallel, while commercially available electronic data capture (EDC) systems offer a secure, industry-recognised workflow (Shah, et al., 2010), there are challenges in loading third-party data into EDC systems and, on average, six separate systems are used to collect data in any one clinical trial (Tufts, 2018).

For EHR data to be useful in clinical research, there needs to be an understanding of what the data means. In addition, data variables or attributes need to be defined across the two domains of patient care and clinical research for the data to have real purpose in both (Sinaci & Erturkmen, 2013). Recommendations include narrowing the focus to a defined set of variables that are common to EHRs and clinical data collection systems. Focusing on an EHR variable such as mortality, which tends to be less open to interpretation, is therefore seen as the first step in the re-use of EHR data for phase II-III trials.
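
The idea of a shared variable set can be illustrated with a short Python sketch. It compares two variable catalogues and separates variables that are defined identically in both domains from those that would need harmonising first. The catalogue contents, variable names and the functions shared_variables and mismatched_variables are all hypothetical and for illustration only; they are not drawn from any real EHR or EDC product or from the cited recommendations.

```python
# Hypothetical variable catalogues: names, types and definitions are
# illustrative assumptions, not taken from any real EHR or EDC product.
EHR_VARIABLES = {
    "date_of_death": {"type": "date", "definition": "Date the patient died"},
    "systolic_bp": {"type": "integer", "definition": "Systolic blood pressure, mmHg"},
    "smoking_status": {"type": "text", "definition": "Free-text smoking history"},
}

EDC_VARIABLES = {
    "date_of_death": {"type": "date", "definition": "Date the patient died"},
    "systolic_bp": {"type": "integer", "definition": "Systolic blood pressure, mmHg"},
    "smoking_status": {"type": "coded", "definition": "Never / former / current smoker"},
    "randomisation_arm": {"type": "coded", "definition": "Assigned treatment arm"},
}


def shared_variables(ehr, edc):
    """Return names defined identically (same type and definition) in both catalogues."""
    return sorted(name for name in ehr.keys() & edc.keys() if ehr[name] == edc[name])


def mismatched_variables(ehr, edc):
    """Return names present in both catalogues whose definitions differ."""
    return sorted(name for name in ehr.keys() & edc.keys() if ehr[name] != edc[name])


if __name__ == "__main__":
    print("Reusable as-is:", shared_variables(EHR_VARIABLES, EDC_VARIABLES))
    print("Need harmonising:", mismatched_variables(EHR_VARIABLES, EDC_VARIABLES))
```

In this toy example a mortality-related variable is reusable as it stands, whereas smoking status is recorded differently in each domain and would need an agreed definition before it could be shared.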


The growing number of technology solutions (Byers, 2017), coupled with growing data volumes (Bhadani & Jothimani, 2016), is creating opportunities in how health data is used. Advances in technology are likely to transform the clinical trial data collection process. Applied machine learning and robotic process optimisation may improve overall trial feasibility processes by introducing efficiency in how site data is assessed and analysed (Anisingaraju, 2017). New technologies such as artificial intelligence and blockchain can make data sharing more efficient (Anisingaraju, 2017) and secure (Nugent, et al., 2016), but are still at the exploratory stage.

The clinical landscape is shifting, and the crossover from healthcare to clinical research will drive change. Institutions are currently reluctant to share EHR data, but the influence of the patient may change this practice. The voice of the patient as an educated and informed stakeholder is an important factor in driving the use of health data in clinical research (Cowie, et al., 2017). Initiatives in which patients were asked to opt in to share their data for research purposes have been successful in both Salford, UK, and Sweden (New, et al., 2018), which may allay concerns related to data privacy and consent, especially where patients are invited as equal parties to the research or have a vested interest in exploring new therapies.

The conservative nature of the clinical research industry means this change will happen by stealth. The process continues to be stifled by regulation, and there is a reluctance to change clinical trial data collection methods for fear of falling foul of the regulator.


There remains a need to collect data in a prescribed and controlled way for clinical research. In the intermediate term, it is unlikely that wearables, sensors and EHR data will replace existing EDC systems, but it is likely that EHR and other real world data sources will further complement and begin to supplement certain variables used in clinical research. In the longer term, the convergence of EHR and EDC systems is likely. Whether this means that EDC systems will absorb EHR technologies (or vice versa) is as yet unclear. The clinical research industry will not change itself and, despite efforts to transform it from within, its incumbents are likely to be displaced by new players in the market such as Google Health (Google, 2018).

In 1969, Greenes et al. stated that “increasing activity in the use of computers for acquisition, storage, and retrieval of medical information has been stimulated by the growing complexity of medical care, and the need for standardisation, quality control, and retrievability of clinical data” (Greenes, et al., 1969).

Fifty years later, this statement is still applicable, and it points to the slow pace of change in how health data is organised. It is likely that the pace of digitisation will continue to speed up, but the clinical research industry will continue to evolve cautiously. The clinical research industry is ripe for disruption.

Works Cited & Bibliography
Anisingaraju, S., 2017. Genetic Engineering & Biotechnology News. [Online]
Available at: https://www.genengnews.com/gen-exclusives/optimizing-clinical-trials/77900933
[Accessed 21 June 2018].
Bhadani, A. & Jothimani, D., 2016. Big Data: Challenges, Opportunities, and Realities. In: M. Singh & D. Kumar, eds. Effective Big Data Management and Opportunities for Implementation. Pennsylvania: UNK, pp. 1-24.
Byers, C., 2017. The Growing Importance of IT in Healthcare. [Online]
Available at: https://mytechdecisions.com/it-infrastructure/growing-importance-healthcare/
[Accessed 09 June 2018].
Clinical Research Corporation, 2017. Clinical Research Corporation. [Online]
Available at: http://crcaustralia.com/media-releases/real-world-data/
[Accessed 31 May 2018].
Cowie, M. et al., 2017. Electronic health records to facilitate clinical research. Springer, 106(Unk), pp. 1-9.
FDA, 2018. Real World Evidence. [Online]
Available at: https://www.fda.gov/ScienceResearch/SpecialTopics/RealWorldEvidence/default.htm
[Accessed 31 May 2018].
Frieden, T., 2017. Evidence for Health Decision Making — Beyond Randomized, Controlled Trials. The New England Journal of Medicine, 377(UNK), pp. 465-475.
Google, 2018. Google rolls out a new tool to help health providers solve the medical record mess. [Online]
Available at: https://www.cnbc.com/2018/03/05/google-cloud-healthcare-api-to-address-medical-reord-interoperability.html
[Accessed 4 May 2018].
Greenes, R. A., Pappalardo, N., Marble, C. W. & Barnett, G. O., 1969. Design and implementation of a clinical data management system. Computers and Biomedical Research, 2(5), pp. 469-485.
Hublou, R., 2016. Pharmaceutical Processing. [Online]
Available at: https://www.pharmpro.com/article/2016/09/biggest-challenges-delivering-clinical-trials-time-within-budget
[Accessed 22 November 2017].
Moore, D. et al., 2000. How generalizable are the results of large randomized controlled trials of antiretroviral therapy? HIV Medicine, 1(UNK), pp. 149-154.
New, J. et al., 2018. Putting patients in control of data from electronic health records. BMJ, 360(5554), p. UNK.
Nugent, T., Upton, D. & Cimpoesu, M., 2016. Improving data transparency in clinical trials using blockchain smart contracts. F1000 Res, 1(5), p. 2541.
Shah, J. et al., 2010. Electronic Data Capture for Registries and Clinical Trials in Orthopaedic Surgery: Open Source versus Commercial Systems. Clinical Orthopaedics and Related Research, 468(10), pp. 2664-2671.
Siemens, T., 2016. How Will Technology Drive Global Clinical Trial Change by 2025? Applied Clinical Trials, 24(12), p. 42.
Sinaci, A. & Erturkmen, G., 2013. A federated semantic metadata registry framework for enabling Interoperability across clinical research and care domains. Journal of Biomedical Informatics, 48(UNK), pp. 784-794.
Tufts, 2018. Tufts Centre for the Study of Drug Development. [Online]
Available at: http://csdd.tufts.edu/news/complete_story/pr_ir_jan_feb_2018
[Accessed 9 January 2018].
Woodcock, J., 2017. Workshop at the National Academies of Sciences, Engineering, and Medicine. Kansas, Endpoints News.