By Bishakha Mona
In this first part of a two-part article, the Alliance looks at a powerful new funding opportunity—open data platforms. We examine the potential and pitfalls of investing in these platforms and how they might influence finding cures at a faster pace.
Open data has long been a wilderness that many are now seeking to tame. The concurrent rise of Web 2.0 and social media created a “sea change in the evolution of open data as a research resource.” [1] The vastness of this data has created an informational treasure trove that multiplies and proliferates not only through diverse industries and sectors but also through the sciences. Traditionally, researchers communicated through face-to-face interaction, written letters, and conferences. While these allowed for some collaboration, geographical barriers and limited technologies restricted the pace of scientific discovery. As technologies evolved, however, they gave scientists newer avenues for dialogue among multiple researchers, accelerating advances in the sciences and unlocking even more data.
Science Philanthropy Alliance President France Cordova noted, “The sheer volume and complexity of data available holds immense potential to transform the landscape of scientific inquiry—from unravelling the mysteries of the cosmos to deciphering the intricacies of the human brain. AI, with its capability to sort through and discern patterns in vast amounts of seemingly disparate data, may be a formidable ally in our quest for knowledge.”
Open data platforms are online repositories that provide free access to scientific data, often in raw or minimally processed form and with some protections in place. These platforms encourage the global sharing and dissemination of scientific research data, allowing researchers to reuse and build upon existing data and further accelerating scientific discovery. As a result, the Alliance has identified open data platforms as an important emerging trend ripe for philanthropic investment.
The Possibilities in Unlocking Overlooked Data
Several funders have now realized the promise of open science and request or require that unused and negative data be shared openly. In some cases, surplus data must be shared early, either uploaded to public databases or published as preprints, before a paper is officially reviewed and accepted for publication. The research enterprise, which previously kept data away from the press and the public until key findings were peer-reviewed and officially published, has shifted its stance. In the past, most of the experimental data that did not directly support key findings was never shared at all. Unlocking overlooked data may lead to more independent analyses and more favorable outcomes.
Open Data Platforms: Powerful and Versatile
Open data platforms are powerful and versatile tools that provide, at the very least, a “parking space” to store data that can be accessed, processed, and analyzed by others. They range from simple parking spaces to active aggregator sites that make data discoverable and requestable on a web page. The push for open data was first driven by public and philanthropic funders, and open data is now mandated by the U.S. government for public funding institutions. Philanthropists have funded open data platforms to improve data “interoperability” for wider use and to support other open science endeavors. Interoperability is the key to open data platforms: shared formatting, protocols, and data standards make the data accessible to all stakeholders.
Wellcome is a pioneering organization for open science and has funded several initiatives, including the International COVID-19 Data Alliance (ICODA), the UK Biobank (a repository of both data and tissue samples), and the Human Cell Atlas. “Open data platforms enable researchers to take advantage of increased scale through digital advances by operating at a greater speed than they could manually reviewing individual files,” said Hannah Hope of Wellcome. “We believe open data platforms are important in expanding who has access to research data.”
Nowadays, scientists are happy to share their data on open science platforms, but doing so is no easy feat. In the biomedical field alone, the volume, velocity, and variety of data generated are massive. Data is produced by various sequencing approaches, imaging technologies, screening experiments, and, more recently, wearable devices. The abundance of available and unused data today has the potential to help us understand and solve some of humanity’s biggest questions on a scale that was previously impossible.
Even when data is shared, it sometimes cannot be used because the methods of collection and interpretation are unclear, because there is insufficient metadata describing the data, or because researchers lack access to the platform hosting it.
“Given that one can generate mountains of data today, one of the challenges everyone in the field has is how best to make it available. What is the curation plan?” said Kathryn Richmond, executive vice president at the Allen Institute who leads the Office of Science and Technology and the Paul G. Allen Frontiers Group. As one of the earliest philanthropies to fund both the creation of open data and open data platforms, the Frontiers Group witnessed the initial pushback for open science and now works and funds collaboratively with consortiums as more efforts are embracing open data. Richmond emphasized that creators and funders must constantly think “not only about how to make the data more easily available but also how do we update the data? How long does one support the platform?”
The Chan Zuckerberg Initiative (CZI) is another early funder of the Human Cell Atlas (HCA), a coordinated effort of scientists seeking to create reference maps of all cells in the human body. CZI's team of engineers, computational biologists, UX researchers, and grantmakers also saw an opportunity to make the data from cell atlases more accessible to researchers in the field. CZI has supported a data platform for the HCA community that serves as a mechanism for collecting diverse contributions of raw data. Furthermore, CZI developed and launched Chan Zuckerberg CELL by GENE (CZ CELLxGENE) a few short years ago, and today the platform is home to the world’s largest collection of standardized single-cell data, including datasets from many of the cell atlases CZI has funded. CELLxGENE collects and curates processed data generated by labs around the world and makes it easily accessible. “The generation of biologically important sources has given us a closer look at health and disease,” said Jonah Cool, Senior Science Program Officer at CZI. “Sharing these datasets openly with the research community has the potential to unlock new discoveries about human biology, especially when you consider recent advances in AI and large language learning models.”
Funders and philanthropists such as CZI, Wellcome, and the Paul G. Allen Frontiers Group are on the cutting edge of what’s possible. In our second article, we will look at how transforming data into knowledge and insights into action plans can create a future where open data platforms drive informed decisions and meaningful scientific discovery.
[1] “Going Big: The Story of Open Data,” Forbes, Nov 23, 2023.
In Part Two of our article next week, we will look at the creative approaches some organizations are taking in solving the challenges of open data, and the benefits that can ensue.