GovLab Blog website_feature

How Data Can Map and Make Racial Inequality More Visible (If Done Responsibly)

The piece is supplemented by a crowdsourced listing of Data-Driven Efforts to Address Racial Inequality.


  • The GovLab developed this living reflection document with diverse input from our network to help identify the opportunities, risks, challenges, and lessons about the use of data to make racial inequalities more visible and the ways it may be systematically and collaboratively countered.
  • The document also serves as our contribution to New York City’s Racial Inclusion & Equity Task Force Data/Research Subcommittee. We hope that it can provide value to the Subcommittee’s deliberations and agenda setting.
  • We share this document not as a finalized list of recommended priorities or practices but as a tool for deliberation on and assessment of data’s role in racial justice.
  • We have additionally assembled a list of data-driven organizations working on racial inequality-related issues here.
  • For any reactions, concerns, suggestions, and recommendations: contact Stefaan G. Verhulst, Co-Founder of The GovLab at sverhulst @


Institutions need to take meaningful action to address such demands. Though racism is not experienced in the same way by all communities of color, policymakers must respond to the anxieties and apprehensions of Black people as well as those of communities of color more generally. This work will require institutions and individuals to reflect on how they may be complicit in perpetuating structural and systematic inequalities and harm and to ask better questions about the inequities that exist in society (laid bare in both recent acts of violence and in racial disadvantages in health outcomes during the ongoing COVID-19 crisis). This work is necessary but unlikely to be easy. As Rashida Richardson, Director of Policy Research at the AI Now Institute at NYU, notes:

“Social and political stratifications also persist and worsen because they are embedded into our social and legal systems and structures. Thus, it is difficult for most people to see and understand how bias and inequalities have been automated or operationalized over time.”

We believe progress can be made, at least in part, through responsible data access and analysis, including increased availability of (disaggregated) data through data collaboration. Of course, data is only one part of the overall picture, and we make no claims that data alone can solve such deeply entrenched problems. Nonetheless, data can have an impact by making inequalities resulting from racism more quantifiable and inaction less excusable.

In seeking to reflect upon ways that data can make a difference, The GovLab used a rapid-research methodology to compile a list of topic areas below where data and data analysis could help illustrate where racial inequality exists in the United States and support evidence-based efforts to promote equity. Given recent events, it focuses mainly on how racism harms Black communities.

Resulting projects might use data to improve existing policies, identifying those that are reductive or unable to address systemic failures. By using data to improve situational awareness of a problem, identify causes and effects in racist incidents, and predict outcomes or assess policy impact, those committed to anti-racism can develop better solutions to the challenges Black communities face every day.

Needless to say, this rapid topic map is simply a scan of the issues and a basic overview of the situation. It is far from comprehensive. We realize racism is an enormous, odious, and deeply entrenched problem that has persisted in the United States since its founding. We also recognize that many in The GovLab operate from a position of power and privilege that requires us to listen to those who do not and amplify their views. Both communities of color and white allies can take action to advance racial justice.

Prioritizing any of these topics will also require increased community engagement and participatory agenda setting. Likewise, we are deeply conscious that data can have a negative as well as positive impact and that technology can perpetuate racism when designed and implemented without the input and participation of minority communities and organizations. While our report here focuses on the promise of data, we need to remain aware of the potential to weaponize data against vulnerable and already disenfranchised communities. In addition, (hidden) biases in data collected and used in AI algorithms, as well as in a host of other areas across the data life cycle, will only exacerbate racial inequalities if not addressed.

Topics where data could make racial injustices more visible and advance understanding of their depth and causes and the ways through which they can be systematically addressed

With these caveats in mind, we find that the following areas might be most amenable to improvements in how to leverage data to develop racially equitable solutions and approaches:

Criminal Justice Inequalities

  1. Resource Misallocation: Communities of color have less financial and institutional support for justice-related activities than their white counterparts. Fewer crimes are solved. Victims of crime receive less support. There are fewer programs that offer alternatives to incarceration. These inequalities are not new but the result of institutional divestment and policy choices driven by multiple factors, including radicalized cultural pathologies. While data offers no easy solution to this problem, driven by bad actors, it can provide tools to activists and media to expose those who facilitate racism and provide an evidentiary basis for groups to demand change in their cities, states, and country. Indeed, the Washington Post has compiled an original database of homicide arrest data from the United States’s 50 largest cities to demonstrate how little many cities invest in solving homicides with minority victims and the consequences of those decisions.
  2. Mass Incarceration and Criminalization: Countless studies indicate Black people suffer due to unequal arrest rates, plea deals, and sentencing alongside other forms of discrimination in the criminal justice system. However, the United States still lacks a comprehensive national racial demography of arrests and criminal records or nationwide data on the basic nature of the prison experience. This gap makes it difficult to evaluate the success of programs intended to address mass incarceration or poor prison conditions. In 2014, the Manhattan District Attorney worked with the Vera Institute and identified significant racial disparities in which defendants were more likely to be prosecuted. As part of the cooperation, the office agreed to pursue strategies that would reduce racial and ethnic inequalities.
  3. Police Violence: In recent years, the press and everyday people have recorded images of police using excessive force, often against Black persons. Yet, there is no official, reliable collection of civilian deaths and injuries caused by law enforcement. A partial database released by the FBI is considered by experts to be misleading. This lack of information makes it difficult for the public to exercise oversight over police and understand the full scope of police violence. Only with independent databases such as Fatal Encounters, Mapping Police Violence, and The Washington Post’s Fatal Force project have these episodes begun to be counted in a systematic fashion.

Economic Inequalities:

  1. Income Inequality: There are major racial disparities in family wealth resulting from the legacies of slavery and modern-day segregation, redlining, and other forms of discrimination. While many know of these policies which deprive families of color of equality of opportunity, a lack of data can hide or exacerbate their effects. Long and persistent undercounting of certain Black populations has led to bias and inaccuracies in how government funding is distributed, limiting the resources provided to communities of color. Efforts such as the Black Census Project and Data for Black Lives have attempted to improve collection and address inequalities.
  2. Educational and Training Achievement Gap: Systemic oppression has also led to significant gaps in educational outcomes. School districts serving majority minority populations receive significantly less funding than their majority white counterparts; Black and Hispanic students receive lower test results. Yet, while race is accepted as a defining factor in who graduates from schools in the United States, there is incomplete data on how this manifests. Recent studies have attempted to use available data in innovative ways to address this issue. Recently, the Center for the Analysis of Postsecondary Readiness used data analytics to create a multiple-measure placement algorithm that resulted in higher course completion levels for marginalized students.
  3. Access to Infrastructure: People of color, including Black workers, are less likely to own cars and have access to other means of transportation; in some metropolitan areas, research shows that nearly half of Americans without internet access are people of color. Using data, public institutions can better identify where these areas are and deploy resources to address long-perpetuated inequalities in access to critical infrastructure. The nonprofit EducationSuperHighway uses publicly available information from the federal government to publish information about available bandwidth in schools and identify districts they can support.

Inequalities in Health

  1. Reduced Quality of Care: In a 2005 report, the National Academy of Medicine noted Black patients were less likely than white patients to be given the appropriate care for certain conditions due to implicit and explicit bias. These conditions contribute to the fact that Black women are three to four times more likely to die in childbirth than white women. A related concern is the increasing role of algorithms and the possibility that biases in them might undermine care of minority patients. In 2019, a publication in Science reviewed a commercial healthcare algorithm used by doctors to recommend treatment. The study found the algorithm demonstrated significant racial bias, failing to note the complex health needs of Black patients relative to white patients. The study’s authors reported the findings to the company responsible and are working with them without salary to improve the algorithm.
  2. Exposure to Environmental Contaminants: High exposure to particulate matter, unclean water, and other pollutants can have serious health consequences, increasing the incidence of cancer, low birth weights, high blood pressure, asthma, and other health conditions. While there are significant anecdotal reports of increased health threats in communities of color, these stories are often ignored by policymakers until they reach a crisis point. Data can allow residents to prove their case and seek restitution. In the City of Zanesville, Ohio, Black residents used open data assets to demonstrate that African American homes were connected to contaminated water sources while white homes suffered none of the same issues. The analysis contributed to the residents’ victory in a court case and a $10.9 million settlement.
  3. Distrust and Historical Trauma: One consequence of a legacy of discrimination, exploitation, and mistreatment is that many Black people do not trust the public health (and other) establishments. This distrust often leads people not to seek out the help and services they need. Learning the impact of distrust on behavior and outcomes, as well as how to redesign public health services to increase trust and access, requires new data-driven initiatives.
  4. Mental Health: Significant differences by race exist in mental health care due to differences in access, quality, and cost between white patients and patients of color. Data could allow for ways to address these gaps, through the creation of new services or better identification of needs. In one recent study for the AMA Journal of Ethics, researchers explored whether data-driven artificial intelligence could help mental health care practitioners better identify those in need of support.

Social Justice and Rights Inequalities

  1. Access to Housing: Inequities in housing between white people and people of color are perpetuated through both laws on land use and more informal systems of discrimination. Questions remain regarding the optimal policy and social responses to these formal and informal barriers to equitable access to housing. Some organizations are experimenting with data-driven ways to understand and visualize phenomena like gentrification and segregation — including the MIT Media Lab’s Atlas of Inequality and those initiatives from Los Angeles and other cities compiled by Harvard.
  2. Hate Crimes and Hate Speech: People of color, especially Black persons, are more at risk to be the targets of hate crimes relative to non-Hispanic whites. The 2018 FBI Hate Crime Statistics notes that 59.6% of reported violent hate crimes were motivated by race and ethnicity. While these figures suggest a significant problem, sources such as the FBI are notorious for undercounting and underrepresenting hate crimes and do not include many instances of hate speech, which anecdotal reporting indicates is pervasive in online and offline settings. Large social media and technology companies are experimenting with AI-based detection systems to identify and remove hate speech that appears on their platforms, though many such methods remain unproven.
  3. Pandemic Surveillance and Privacy Rights: Community and advocacy groups are seeking ways to use data to reduce the impact of COVID-19 in communities of color while avoiding its weaponization through unchecked surveillance and inappropriate data access controls. However, there is little transparency about how institutions are using the data, which means communities have little input into what constitutes inappropriate use. Revealing these relationships could expose racial biases, as has become evident in uses of AI and facial recognition technologies, and develop toolkits to guard against tech-enabled racial inequities.
  4. Voting Rights and Representation: Many members of minority communities face barriers to participating in the political process due to photo ID laws, literacy tests, and other requirements. Groups such as the Black Census Project and Data for Black Lives are seeking data-driven strategies to improve data collection to avoid under-representation of marginalized communities. Data has also informed efforts by organizations such as the Brennan Center in revealing the flaws in state attempts to purge their voter rolls and the inequalities these activities produce.

Moving forward:

The above topics point to major manifestations of bias and injustice in the United States against Black people. While the existence of racism in many of these areas may not be news, many of the topics here can be addressed with policies informed by data or data analysis. With increased access to data, it is possible to advance understanding of the depth and causes of these inequalities and identify ways through which they can be systematically addressed. Data can also help decisionmakers identify their own biases and prejudices and understand how these have reproduced inequities.

In this conclusion, we summarize some of these avenues, describing how new data methods, increased access to data, improved data responsibility and hiring decisions can help policymakers and others chip away at the entrenched racism and bias evident in our society through data.

  • Revealing Hidden Inequalities: Sometimes racism is starkly apparent, but often it is more subtle and insidious. Data analysis can help policymakers make visible patterns and trends and take steps toward addressing them. Recent uses of data in metropolitan areas show that, despite efforts to promote integration and combat discrimination, many US cities remain deeply segregated. Experts such as Dayna Bowen Matthew at the Brookings Institution have sought to identify factors that contribute to this fact and recommend policies to address them.
  • Making The Data Life Cycle Less Racist: As described, finding hidden patterns within our society and economy is one important step toward addressing racism. Increasingly, though, policymakers also need to search for patterns of racism within data itself. Issues of algorithmic bias are often discussed in the context of a growing reliance on artificial intelligence, yet bias may exist across the data life cycle from collection to analysis to reporting and dissemination. As our discussion of healthcare and sentencing inequities above suggests, such problems are indeed prevalent. We need an end-to-end data life cycle approach to ensure data is used responsibly and ethically and does not exclude any part of the public. Already, Actionable Intelligence for Social Policy and the University of Pennsylvania have published a toolkit to help policymakers center racial equity in their use of data.
  • Outreach to Disenfranchised and Excluded Communities: Much as data can be used to identify lacunae and gaps that indicate racism, so data can be used in a more positive way by minority and historically excluded communities. In other words, we can use data to address patterns of racism, discrimination, and exclusion. A good example of this can be found in participatory budgeting, which engages marginalized groups and allows them to be directly involved in policy making. By soliciting the public’s input on the questions, issues, and indicators they care about, as The GovLab’s Data Assembly project does, researchers can ensure data use reflects public concerns, including the concerns of vulnerable and marginalized individuals. In the United Kingdom, the Citizens’ Biometrics Council has centered the perspectives of those traditionally marginalized in its discussions on biometric technology governance to ensure the technology does not perpetuate and amplify existing injustices.
  • Increasing Access to Disaggregated Data: As decision-making becomes increasingly data-led, so equality of access to data becomes a core issue. Policymakers need to take steps to ensure that minority groups have access to data sets and their resulting insights, for example by expanding the types of information available on open data platforms and ensuring police departments comply with requests for data. Importantly, in order for this access to be meaningful, minority groups also need training and skill-building from which they are often excluded.
  • Trusted Intermediary: Many data projects suffer from a lack of clarity regarding the entity positioned to act on data-driven insights. The lack of a clear demand can lead to valuable or transformative insights going unused. Communities of color might also distrust actors that could represent the demand for data — e.g. police departments and over-policing data. An independent body could be empowered to help steer the responsible and participatory use of data to help address issues of race. As a first step, stakeholders could create an international body to build an evidence base and governance model for such a trusted intermediary — potentially modeled on Data2X, the gender data institution housed at the UN Foundation.
  • Hiring Decisions: Finally, while not strictly a data solution, it is important to emphasize the key role that hiring decisions for data (and related) roles can play in addressing racism and prejudice. Placing minority and disenfranchised candidates in data roles can have a dramatic effect on reducing the extent of racism and discrimination embedded in datasets.
  • Supporting Organizations Leading this Work: Many of the issues discussed in this piece are the focus of various organizations. The GovLab, seeking to amplify and generate support for those voices, has compiled a listing of some of those organizations here.

The first iteration of this piece was developed by The GovLab at New York University Tandon School of Engineering with contributions from: Stefaan Verhulst, Andrew J. Zahuranec, Andrew Young, Danuta Egle, Mary Ann Badavi, Nadiya Safonova, Rashida Richardson, Beth Simone Noveck, Charlton McIlwain, Mona Sloane, Juliet McMurren, and Amen Ra Mashariki.


Select Committee on the Modernization of Congress Hosts Virtual Member Discussion

This release was originally written and distributed by The Select Committee on the Modernization of Congress on May 7, 2020.

Members were joined by two guest speakers to discuss committee continuity and remote operations

Washington, D.C. – The Select Committee on the Modernization of Congress (“Select Committee”) held a virtual discussion with two guest speakers to discuss the importance of committee continuity and how to continue working effectively on behalf of the American people during the ongoing global pandemic. The Select Committee met virtually with Marci Harris, CEO of PopVox, and Beth Simone Noveck, Director of The Governance Lab and Chief Innovation Officer for the State of New Jersey. The Members and guests discussed best practices for remote committee and Member operations, and ways other legislatures around the world are handling business.

“It seems appropriate and important to talk about how we can continue working as a committee,” Chair Derek Kilmer (D-WA) said in his opening remarks. “Our work on behalf of the American people cannot, and I think we all know, should not stop as a result of this pandemic,” said Vice Chair Tom Graves (R-GA).



Ms. Noveck shared best practices from state legislatures around the country, and innovative ways legislatures around the world have been operating remotely over the last few months. She highlighted a number of virtual platforms, like Microsoft Teams and Cisco WebEx, that have been used internationally in countries like Argentina, the United Kingdom, Ecuador, and Spain for legislative business. “The ongoing need to prepare for social distancing underscores the importance of ensuring that our legislative institutions are prepared to continue functioning through both the current challenges and those that lie ahead,” she said.

Over the last month, Ms. Harris has hosted mock remote hearings and mark-ups to test the capabilities these virtual platforms offer. She highlighted the usual technical issues that many Americans have experienced, such as needing to mute a line, or turn video capabilities on, but she also shared the positive feedback she received following each session. She highlighted the pros and cons for each session, and how to find workarounds for the in-person connection and interaction that many legislators rely on. “The main finding from our two mock hearings, as you’re demonstrating today, are not a question of technology, but of culture, process and will,” she said.

Since the U.S. Capitol closed to public visitors and guests, and the majority of congressional offices moved to a modified telework operating status, the Select Committee has continued to hold Member-level discussions on committee priorities and ways to continue effectively working ahead of the October 30, 2020 committee report deadline.


Launch: Responsible Data for Children (RD4C) Toolkit

RD4C Tools

The GovLab and UNICEF, as part of the Responsible Data for Children initiative (RD4C), are pleased to share a set of lightweight and user-friendly tools to support organizations and practitioners seeking to operationalize the RD4C Principles. These principles—Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle—are especially important in the current moment, as actors around the world are taking a data-driven approach to the fight against COVID-19.
The initial components of the RD4C Toolkit are:
  • The RD4C Data Ecosystem Mapping Tool is intended to help users identify the systems generating data about children and the key components of those systems. After using this tool, users will be positioned to understand the breadth of data they generate and hold about children; assess data systems’ redundancies or gaps; identify opportunities for responsible data use; and achieve other insights.
  • The RD4C Decision Provenance Mapping methodology provides a way for actors designing or assessing data investments for children to identify key decision points and determine which internal and external parties influence those decision points. This distillation can help users to pinpoint any gaps and develop strategies for improving decision-making processes and advancing more professionally accountable data practices.
  • The RD4C Opportunity and Risk Diagnostic provides organizations with a way to take stock of the RD4C principles and how they might be realized as an organization reviews a data project or system. The high-level questions and prompts below are intended to help users identify areas in need of attention and to strategize next steps for ensuring more responsible handling of data for and about children across their organization.
  • Finally, the Data for Children Collaborative with UNICEF developed an Ethical Assessment that “forms part of [their] safe data ecosystem, alongside data management and data protection policies and practices.” The tool reflects the RD4C Principles and aims to “provide an opportunity for project teams to reflect on the material consequences of their actions, and how their work will have real impacts on children’s lives.”
RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles. Last month we published the RD4C Case Studies, which analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. The case studies are: Romania’s The Aurora Project, Childline Kenya, and Afghanistan’s Nutrition Online Database.

To learn more about Responsible Data for Children, visit or contact rd4c [at] To join the RD4C conversation and be alerted to future releases, subscribe at this link.


Continuity in Legislatures Amid COVID-19: An Updated Snapshot

By Sam DeJohn, Anirudh Dinesh, and Dane Gambrell

As COVID-19 changes how we work, governments everywhere are experimenting with new ways to adapt and continue legislative operations under current physical restrictions. From city councils to state legislatures and national parliaments, more public servants are embracing and advocating for the use of new technologies to convene, deliberate, and vote.

On April 20th, GovLab published an initial overview of such efforts in the latest edition of the CrowdLaw Communique. As the United States Congress wrestles with the question of whether to allow remote voting, the GovLab has compiled an update on those international and state legislatures that are the furthest ahead with the use of new technology to continue operations.


In the US, on April 16, over 60 former members of Congress participated in a “Mock Remote Hearing” exercise to test the viability of online proceedings during the COVID-19 pandemic.

In Kentucky, when it last met on April 1, the state’s House of Representatives adopted new rules allowing lawmakers to vote remotely by sending photos of a ballot to designated managers on the House floor (WFPL). Lawmakers have also altered voting procedures to limit the number of lawmakers on the House floor. Members will vote in groups of 25 and may vote by paper ballot (NCSL).

New Jersey lawmakers made history on March 25 when members of the General Assembly called into a conference line to cast their votes remotely on several bills related to the coronavirus pandemic. NJ lawmakers moved 12 bills that day via remote voting.

On the west coast of the United States, the city council of Kirkland, Washington, recently held its first virtual city council meeting. Many cities and counties in California have also begun holding their meetings via Zoom.

As compiled by the National Conference of State Legislatures, states that have changed rules (many just in the past few weeks) to allow full committee action and/or remote voting include: Iowa, Kentucky, Minnesota, New Jersey, North Carolina, Utah, and Vermont. Other states have specifically said they are seriously considering allowing remote action, including New Hampshire, New Mexico, New York, and Wyoming.


In the European Union, Parliament is temporarily allowing remote participation to avoid spreading COVID-19 (Library of Congress). With regard to voting, all members, even those participating in person, will receive a ballot sent by email to their official email address. The ballot, which must contain the name and vote of the MP in a readable form and the MP’s signature, must be returned from their official email address to the committee or plenary services in order to be counted. The ballot must be received in the dedicated official European Parliament mailbox by the time the vote is closed.
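The conditions in that description can be read as a simple checklist. The following is a minimal sketch in Python, not the European Parliament's actual system: the `EmailBallot` type, its field names, and the `is_countable` function are hypothetical, and "readable form" is approximated here as a non-empty name and vote.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class EmailBallot:
    """One emailed ballot. All field names are hypothetical."""
    sender: str            # address the ballot was returned from
    member_name: str       # name written on the ballot
    vote: str              # vote written on the ballot
    signed: bool           # whether the ballot carries the member's signature
    received_at: datetime  # when it arrived in the dedicated mailbox


def is_countable(ballot, official_address, closes_at):
    """Return True only if every condition described above is met."""
    return (
        ballot.sender.lower() == official_address.lower()  # returned from the official address
        and bool(ballot.member_name.strip())                # name is present
        and bool(ballot.vote.strip())                       # vote is present
        and ballot.signed                                   # signature is present
        and ballot.received_at <= closes_at                 # received by the time the vote closed
    )


# Illustrative values only; the address and times are made up.
closes_at = datetime(2020, 1, 1, 12, 0)
ballot = EmailBallot("member@example.org", "A. Member", "for", True,
                     datetime(2020, 1, 1, 11, 30))
print(is_countable(ballot, "member@example.org", closes_at))
```

A real process would also need to verify the signature itself and handle duplicate submissions, which this sketch leaves out.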

In Spain, MPs have been casting votes using the Congress’s intranet system, which has been in place since 2012. Rather than taking place in real time, voting is typically open for a two-hour period before the session, for votes on the alternative or amendment proposals, and for a two-hour period following the session in which the proposals are debated, for votes on the final text.

The UK Parliament has developed an app for remote voting called Member Hub (Wired, April 23, 2020). Two tests of the system have taken place this week. One involved around 30 participants, and the second involved several hundred. Altogether, 430 people have tested the voting app, which is under development.

Also in the UK is the “Virtual House.” On April 21, a handful of lawmakers returned from their Easter break to approve the continuation of democracy via a “virtual Parliament,” a remarkable and unanimous vote to overturn the way things have been done there for over 700 years, and to keep on arguing — but at a proper distance. You can find an update on the first steps and what it will look like here.

In Wales, members of the Welsh assembly on April 1 used Zoom video conferencing for their weekly plenary session, the first for any parliament in the UK (The Hill).

In the Isle of Man, a self-governing crown dependency, members of its parliament will debate and vote on legislation over audio link (The Hill).


The Inter-Parliamentary Union reports that in Argentina, the President of the Chamber of Deputies has approved working remotely via Zoom and videoconference. The videoconferences are broadcast live on Diputados TV. Deputies have access to a digital signature through the Token system for submissions of projects. In the Senate, committee meetings take place via videoconference and are broadcast on the channel Senado TV. A new remote working platform, Senado Móvil, has also been set up, and can be accessed with a username and password. It allows access to the Intranet, institutional emails, the “Comdoc” administrative system and shared files so that coordinated tasks can be carried out in workgroups.

In Chile, recognizing that a majority of the parliament’s members are part of the at-risk population for COVID-19, the Senate has met remotely via Zoom to debate issues ranging from extending postnatal leave to forbidding the denial of basic services during the epidemic. During a deliberation, the Chair of the Committee operates a clock that shows the timing of the session and mutes and unmutes members. For voting, each member appears on the screen and states their vote. Legislation has already been passed using this method. For instance, legislation regulating access to unemployment insurance during exceptional circumstances was enacted in early April.

In Brazil, the Congresso Nacional do Brasil (National Congress) has passed a new resolution which enables the 594 Members of both chambers to work remotely. Both houses use the Zoom video conferencing software for deliberation. While members of the Chamber of Deputies (the lower house) have the option to participate in person, all participation in the Senate (the upper house) is now virtual. For electronic voting, the Chamber uses an application known as Infoleg that is available on smartphones and tablets. Lawmakers must register their device in advance and sign-in using a security code sent to their mobile device. Here is an intro to the system used in the Senate. Cristiano Ferri, Founder of the Hacker Lab in the Brazilian House of Representatives, describes how the Brazilian parliament is responding to COVID-19 in this video.

In Paraguay, the Senate has held its first virtual session involving 42 lawmakers (April 14, 2020).


In the Maldives, the 87 lawmakers of Parliament are convening online using Microsoft Teams video conferencing technology, instead of physically being present at the parliament house, known as the People’s Majlis, in the capital Malé. Sessions are also being broadcast on television in real time, as well as on social media.

On the other end of the spectrum, South Korea recently held the world’s first nationwide legislative elections since the outbreak. Each polling station was equipped with hand sanitizer and disposable gloves; voters, wearing masks and standing far apart, had their temperatures checked at the entrances.

This list is not exhaustive. If you know of any other examples you would like to share please contact [email protected]


From Idea to Reality: Why We Need an Open Data Policy Lab

The belief that we are living in a data age, one characterized by unprecedented amounts of data with unprecedented potential, has become mainstream. We regularly read phrases such as “data is the most valuable commodity in the global economy” or that data provides decision-makers with an “ever-swelling flood of information.”
Without doubt, there is truth in such statements. But they also leave out a major shortcoming—the fact that much of the most useful data continues to remain inaccessible, hidden in silos, behind digital walls, and in untapped “treasuries.”
For close to a decade, the technology and public interest community has pushed the idea of open data. At its core, open data represents a new paradigm of data availability and access. The movement borrows from the language of open source and is rooted in notions of a “knowledge commons,” a concept developed by, among others, Nobel Prize winner Elinor Ostrom.

Milestones and Limitations in Open Data

Significant milestones have been achieved in the short history of the open data movement. Around the world, an ever-increasing number of governments at the local, state and national levels now release large datasets for the public’s benefit. For example, New York City requires that all public data be published on a single web portal. The current portal site contains thousands of datasets that fuel projects on topics as diverse as school bullying, sanitation, and police conduct. In California, the Forest Practice Watershed Mapper allows users to track the impact of timber harvesting on aquatic life through the use of the state’s open data. Similarly, Denmark’s Building and Dwelling Register releases address data to the public free of charge, improving transparent property assessment for all interested parties.
A growing number of private companies have also initiated or engaged in “Data Collaborative” projects to leverage their private data toward the public interest. For example, Valassis, a direct-mail marketing company, shared its massive address database with community groups in New Orleans to visualize and track block-by-block repopulation rates after Hurricane Katrina. A number of data collaboratives are also currently being launched to respond to the COVID-19 pandemic. Through its COVID-19 Data Collaborative Program, the location-intelligence company Cuebiq is providing researchers access to the company’s data to study, for instance, the impacts of social distancing policies in Italy and New York City. The health technology company Kinsa Health’s US Health Weather initiative is likewise visualizing the rate of fever across the United States using data from its network of Smart Thermometers, thereby providing early indications regarding the location of likely COVID-19 outbreaks.
Yet despite such initiatives, many open data projects (and data collaboratives) remain fledgling—especially those at the state and local level.
Among other issues, the field has trouble scaling projects beyond initial pilots, and many potential stakeholders (private sector and government “owners” of data, as well as public beneficiaries) remain skeptical of open data’s value. In addition, terabytes of potentially transformative data remain inaccessible for re-use. It is imperative that we continue to make the case to all stakeholders for the importance of open data, and for moving it from an interesting idea to an impactful reality. To do this, we need a new resource: one that can inform the public and data owners, and that can guide decision makers on how to achieve open data in a responsible manner, without undermining privacy and other rights.

Purpose of the Open Data Policy Lab

Today, with the support of Microsoft and under the counsel of a global advisory board of open data leaders, The GovLab is launching an initiative designed precisely to build such a resource.
Our Open Data Policy Lab will draw on lessons and experiences from around the world to conduct analysis, provide guidance, build community, and take action to accelerate the responsible re-use and opening of data for the benefit of society and the equitable spread of economic opportunity. 
Toward that end we will identify and disseminate best practices; develop and curate guidelines, toolkits, and frameworks to support more effective data provision and use; and implement proof-of-concept initiatives to improve our understanding of how to harness the power of open data to solve key societal challenges. 
In addition, the Open Data Policy Lab will foster a community of data stewards, chief data officers, and other decision-makers within the public and private sectors to share knowledge, undertake collaborative work, and spur responsible data sharing. 
In launching this Lab, our goal is to disseminate information, and more generally to work toward making manifest the many potential benefits of open data. By providing a community and a set of demand-driven initiatives, we hope to bring about genuine social, economic, and political transformation.
We’d love to have you join us on this journey. If you are interested, there are three ways to engage:

  • If you create policy, manage or otherwise control data at an institution, join us as a data steward partner and connect with responsible data leaders seeking ways to create public value through data re-use and collaboration.
  • If you are studying open data or data re-use, join us as a research partner and help us build and act upon the Open Data Policy Lab research and policy agenda.
  • If you are part of an institution seeking to advance the field of open data, support the Open Data Policy Lab by becoming a funding partner.

Over the next few months we will release our first resources, followed by regular updates and targeted interventions based upon feedback from users and policy makers. Sign up here to receive updates and early releases from the Open Data Policy Lab.


LAUNCH: The Responsible Data for Children (RD4C) Case Studies

This week, as part of the Responsible Data for Children initiative (RD4C), the GovLab and UNICEF launched a new case study series to provide insights on promising practice as well as barriers to realizing responsible data for children.
Drawing upon field-based research and established good practice, RD4C aims to highlight and support responsible handling of data for and about children; identify challenges and develop practical tools to assist practitioners in evaluating and addressing them; and encourage a broader discussion on actionable principles, insights, and approaches for responsible data management.
RD4C launched in October 2019 with the release of the RD4C Synthesis Report, Selected Readings, and the RD4C Principles: Purpose-Driven, People-Centric, Participatory, Protective of Children’s Rights, Proportional, Professionally Accountable, and Prevention of Harms Across the Data Lifecycle.
The RD4C Case Studies analyze data systems deployed in diverse country environments, with a focus on their alignment with the RD4C Principles. This week’s release includes case studies arising from field missions to Romania, Kenya, and Afghanistan in 2019. The data systems examined are: 
Romania’s The Aurora Project
The Aurora Project is a child protection platform developed by UNICEF Romania in collaboration with NGO and government partners. The system enables social workers and community health care providers to diagnose and monitor vulnerabilities experienced by children and their families. Through the administration of a child protection questionnaire, the system supports the determination of a minimum package of services needed by children and their families. It also enables child protection evaluation and planning work at the national level. The Aurora Project reflects many of the RD4C Principles through its collection of data for clear and well-defined purposes and the various training and guidance materials provided to users. UNICEF Romania and counterparts in the Romanian Government are still working to address challenges related to sensitive group data and the potential for disproportionate data collection and retention.
Childline Kenya
Childline Kenya is a helpline offering services for children subjected to violence or neglect. Since it began operations in 2006, trained counselors have responded to calls, logged major components for reporting purposes, and redirected callers to relevant services. The organization emphasizes training and the rights of children while ensuring its data collection is proportional and purpose-driven. Given the sensitivity of its work, it faces some difficulties with duplicative and complex data.
Afghanistan’s Nutrition Online Database
Afghanistan’s Nutrition Online Database is a web-based information system providing access to aggregated nutrition data to inform planning and service delivery at the national, provincial, and zonal level. The Public Nutrition Department (PND) within the Afghanistan Ministry of Public Health (MoPH) leads database management, with UNICEF Afghanistan acting as the lead technical developer and providing ongoing technical support. The system exists because missed use of potentially valuable data is a common challenge across the children’s data ecosystem in Afghanistan. The Nutrition Online Database tries to spur the use of existing and newly developed nutrition data streams that otherwise might not inform potentially life-saving nutrition planning and service delivery. It is the product of a participatory development process with key stakeholders across sectors and actors within beneficiary communities. PND, UNICEF Afghanistan, and other stakeholders support professionally accountable data use through training efforts and working groups but remain challenged by the fragmentation of nutrition systems, mandates, formats, and indicators. These factors could contribute to challenges in tracking decision-making processes affecting data responsibility across the nutrition data ecosystem.

To learn more about Responsible Data for Children, visit or contact rd4c [at] To join the RD4C conversation and be alerted to future releases, subscribe at this link


Do You Have COVID-19 Questions? We Have Answers: Ask a Scientist Launches

This blog post was written by  and originally appeared in the Federation of American Scientists blog on March 18, 2020.

Today, the Federation of American Scientists, The Governance Lab at New York University Tandon School of Engineering, and the State of New Jersey Office of Innovation launched a free interactive tool to help answer the public’s questions on the COVID-19 virus in English and Spanish.
“Ask a Scientist,” located at offers answers to questions about the nature of the virus, public health data on the outbreak, guidance on how to protect against contracting the virus, and even information for travellers. All the content is sourced from WHO, CDC, and other reliable and verified sources, researched and edited for readability and clarity by a team of scientific experts.
“We are in the midst of what could become the greatest infectious disease outbreak of our time,” FAS President Ali Nouri says about the new collaboration. “The public deserves science-based information during this crisis and we’re proud of this partnership to provide that.”
To use the service, a person types in a question. If they don’t find the answer they need, they can click “Ask a Scientist” and receive a researched answer from a team of FAS researchers and a crowdsourced network of vetted science experts led by the National Science Policy Network. Every answer is sourced, cited and dated to ensure accuracy and timeliness. Answers are then added to the knowledge base for the benefit of others.
“We are getting all hands on deck, and engaging a global volunteer network of scientists, journalists and other experts to lend their know how to provide rapid and accurate information that will help slow the spread of this disease and mitigate its impact,” says Professor Beth Simone Noveck, Director of the Governance Lab at the NYU Tandon School of Engineering and Chief Innovation Officer for the State of New Jersey.
In addition to providing the public with key information on the COVID-19 virus, Ask a Scientist is also designed to dispel myths and disinformation about the coronavirus that are circulating online and on social media.
Ask a Scientist will also be live on the Amazon Alexa by the end of the week. Just say “Alexa, Ask a Scientist” followed by your COVID-19 questions to access the service by voice.
To visit Ask a Scientist, click here.



The future of work is having a midlife crisis

Policymakers should ask more questions and focus less on their favourite answers

This article was written by Jeffrey Brown, Head of Technology Policy at the Bertelsmann Foundation and Stefaan Verhulst, Co-Founder of The GovLab. It was originally published on Apolitical on March 16, 2020.

Workers around the world are starting to realise the myriad ways in which technology will disrupt their working lives over the coming decade.
The arrival of new technologies such as artificial intelligence, automation, and cobots is leading wage earners everywhere to ask: will this thing do my job by 2030?
This seemingly simple question is often asked tongue in cheek, but it belies a complex and mounting anxiety among workers that has catapulted the “future of work” into prime time and into the public’s consciousness.

The future of work is having a midlife crisis

Seven years after the release of Frey and Osborne’s bombshell study proclaiming that 47% of tasks in the U.S. could be automated by 2033, global commissions, presidential candidates, organised labour, and CEOs of Fortune 500 Companies have all taken up the mantle of future of work policymaking.
But just as the need for actionable policies draws nearer, the future of work is failing to live up to its hype. Instead, the amorphous concept has devolved into a Pandora’s box of wandering definitions, confused framing, and contradictory statistics.
For example, take the wildly varying estimates of the number of jobs that will be lost—or gained—due to technology in a given year. These narrow, one-dimensional statistics are often cherry picked and used as incontrovertible evidence for a singular and all-encompassing policy response that “solves” the future of work as if it were an algebraic equation scrawled on a chalkboard.

But, much like climate change, the future of work defies simple solutions. To date, future of work analysis and solutions have fanned uncertainty rather than providing a sustainable path forward. To be clear: We do not need a reframing of what the future of work is and isn’t. Rather, we need to acknowledge that the future of work has outgrown its rebellious, anything-goes ideas phase and is now facing a midlife crisis in which it must buckle down and appeal to a higher purpose.
In our view, the higher purpose should break from the past to ensure that policymakers are armed with the tools, questions, and frameworks to develop sustainable future of work policy that enables everyone to reap reward from new technologies.

Zooming out

But first, a reckoning is needed in which the field’s shortcomings are assessed. Consider, for instance, the current framing of the future of work debate, which is delivered through a steady drip of studies analysing the experiences (and anxieties) of different groups. Various studies find that women and workers of colour are more likely to be affected by automation, while others highlight the plight of white men or white-collar workers employed in financial and legal services.
While such analysis has done a great deal to highlight specific future of work challenges facing subsets of society, the reality is that technology and automation will affect everyone. Percentage estimates for groups of workers likely to be impacted will do little to advance sustainable policy that works for everyone. Rather, policymakers should focus on asking better questions and defining who they are creating policy for in the first place.

Poor framing and definitions have needlessly damaged the quality of public policy generated around the future of work. How did we get to this point? Taking stock, it is clear that we jumped to solutions (such as Universal Basic Income) without first taking a breath to ask if we are asking the right questions. We need a better way of identifying the issues that matter — both when it comes to the future of work, and more generally.

Valuing questions over answers

As any fourth-grade teacher would tell you, there is no such thing as a bad question. But in order for the future of work to emerge from its midlife crisis, we need a break from the cycle of prioritising moonshot solutions over all else. And it is up to policymakers to seek out and ask the right questions.
That is why The GovLab and the Bertelsmann Foundation are partnering to launch the future of work domain of the 100 Questions Initiative, in which we crowdsource the most critical questions for policymakers to address as they relate to the future of work. Our broader goal is to develop a new science of questions—one that could widen the conversation about which issues really matter, and that could help organisations harness the potential of the data age in making decisions about how to allocate finite funding, time, and other resources.
Framing questions in recognition of this fact can allow us to develop policies that consider the interests of everyone. The resentment many citizens express towards government today will not be solved overnight, nor are there any easy solutions to it. By defining questions well, though, we can unlock the potential of data and data science to begin addressing some of these concerns. We can, question by question, provide the public with answers that matter. If we continue at the current pace, at the very least, policymakers risk wasting precious resources to “solve” poorly defined problems that stem from the wrong questions. At the very worst, they risk compounding public policy liabilities that will be left for future generations to sort out.
If the 2010s were the decade of lofty buzzwords mixed with optimistic futurism, the 2020s should be the decade in which methodological rigour and data take over to guide the development of sustainable future of work policy. It is only by zooming out to ask the right questions that we can truly “solve” the future of work’s midlife crisis. - Jeffrey Brown and Stefaan Verhulst
Engage with us on the 100 Questions Project


Call for Action: Toward Building the Data Infrastructure and Ecosystem We Need to Tackle Pandemics and Other Dynamic Societal and Environmental Threats

Today, in response to the growing human tragedy caused by the spread of COVID-19, concerned individuals and organizations from around the world launched a call for action aimed at rapidly bolstering society’s ability to leverage data to respond to the current and future pandemics. 

Read and sign the Call for Action here

In particular, seven actions are called for to enable more systematic, sustainable and responsible data collaboration to support decision-making regarding pandemic prevention, monitoring, and response. They include:

  1. Developing and clarifying a governance framework to enable the trusted, transparent, and accountable reuse of privately held data in the public interest;
  2. Building capacity of organizations in the public and private sector to reuse and act on data through investments in training, education, and reskilling of relevant authorities;
  3. Establishing data stewards in organizations who can coordinate and collaborate with counterparts on using data in the public’s interest and acting on it;
  4. Building a network of data stewards to coordinate and streamline efforts while promoting greater transparency;
  5. Engaging citizens about how their data is being used so they can clearly articulate how they want their data to be responsibly used, shared, and protected;
  6. Unlocking funds from a variety of sources to ensure projects are sustainable and can operate long term;
  7. Promoting technological innovation through collaboration between funders (e.g. governments and foundations) and researchers (e.g. data scientists) to develop and deploy useful, privacy-preserving technologies.

The recommendations are based upon the European Expert Group on Business to Government Data Sharing’s Final Report and efforts undertaken by The GovLab, The World Economic Forum, GSMA’s AI for Impact Taskforce, SDSN TReNDS, Open Data Institute, and the Global Partnership for Sustainable Development Data, among others.
A set of concrete pathways to implement these actions immediately, in tandem with governments and other actors, is being developed and crowdsourced with the help of the signatories. When signing up, please indicate whether you want to be part of the next steps.
If you are interested in joining the effort either sign HERE or contact Stefaan Verhulst at stefaan @  (Co-Founder of the GovLab). 


Designing the 100 Questions for NYC: Panel Reflects on New Science of Questioning During Open Data Week

The panelists (from left to right): Brennan Lake (Cuebiq), Starling Childs (Citiesense), Adrienne Schmoeker (City of New York), Panthea Lee (Reboot), and Stefaan Verhulst (The GovLab)

New Yorkers face many problems in their daily lives. Whether it’s public health or public transit, residents have many concerns about the place they call home. The question many policymakers face, then, is how to identify the most urgent issues for the public in an open, participatory manner and mobilize the resources needed to solve them.

On Tuesday, March 3, 2020, The GovLab and Reaktor facilitated a discussion aimed at tackling this need. As part of New York City Open Data Week 2020, running through March 7th, The GovLab’s Co-Founder and Chief Research and Development Officer Stefaan Verhulst moderated a panel and exercise to identify a new science and practice of formulating questions answerable through data science. Informed by The GovLab’s 100 Questions Initiative, the group also hoped to identify the most important questions for New York City whose answers can be found in data and data science.

Stefaan was joined by panelists Brennan Lake (Cuebiq), Starling Childs (Citiesense), Adrienne Schmoeker (City of New York) and Panthea Lee (Reboot). As data stewards of organizations representing the supply and demand sides of data usage, Brennan, Starling, and Adrienne each provided a perspective on the need to develop a question-driven approach to open and private data use. Panthea provided her knowledge as a design-thinking and citizen engagement expert.

Toward a New Methodology

Stefaan opened the discussion by asking the panelists how they currently engage with users, acknowledging that a question-driven approach means being demand driven. Though working in different contexts with very different audiences, each panelist spoke about the need to empower local communities and promote broader usage of open and private data.

“It is really important for us to create impact locally. We started our data philanthropy effort because there are [data] use cases that touch everyone,” said Brennan Lake in reference to Cuebiq’s Data for Good program. “Still, most of the response we’ve gotten has come from researchers. That’s great but this data is coming from millions of users who need to have a say in how their data is used.”

Starling Childs, in his discussion of Citiesense’s neighborhood analytics offerings, echoed these points. “It’s important to respond to the contexts of the communities we serve. We’re providing the platform to help improve the quality of life for the people who live and work there.”

At the same time, panelists emphasized that a demand-driven approach isn’t the same as expecting the public to do all the work. Organizations need to think about what they can do and how they can coordinate mass action in ways that provide value.

“One of the biggest misconceptions around user-centered design is that we should ask the user what they want and ask the user to decide everything,” said Panthea, Reboot’s executive director. “When it comes to defining the right questions, I think about where we are already collecting residents’ needs and questions so we are not going back to the same places over and over. I think about what questions we are best equipped to solve and answer to make people’s lives better.”

Adrienne Schmoeker, New York City’s Deputy Chief Analytics Officer, made a similar point in reference to her work on the city’s open data platform. “Just as there’s no such thing as the perfect dataset, there aren’t always perfect questions either. It can often take us a month, two months to scope down the question we are trying to answer. […] We would like to compel New Yorkers to identify the questions that are most important to them, prioritize them, and find a productive channel to communicate back.”

The panel ended with a conversation about how the different organizations represented ensure their work with open and private-sector data feeds into actions that improve lives. Again, the participants noted the need to think about organizational interests and the interests of the audience.

“A good place to start is checking what your special interests are, what your motives are, to ensure that what you provide is going to add value and not just noise,” said Brennan.

“I think it can be really irresponsible to start projects around questions or problems that we do not have the ability to solve or really don’t want to solve,” noted Panthea. “Many projects get stuck in implementation because we haven’t told citizens what is within boundaries, what is in jurisdiction. […] There’s risk of getting people’s hopes up.”

Identifying 100 Questions for New York City

The GovLab’s Andrew Young leading one of the breakout sessions

Following this discussion and a brief segment for audience questions, the event moved into its second phase. The GovLab and Reaktor invited attendees to participate in a small-group ideation effort to develop a strategy for creating a version of The 100 Questions Initiative for New York City.

The groups discussed how topic domains could be identified and prioritized; how New Yorkers might be best engaged; and how any work done might move insight to action. There were many ideas raised through the exercise drawn from the participants’ diverse backgrounds and experiences working with open and private-sector data. Still, several ideas came up frequently.

In trying to identify domains to address, participants often called attention to both bottom-up and top-down mechanisms that New York City already has. They noted that existing channels, such as 311 (a prominent asset for data collection), could be connected with resources meant for question formulation. Many people also discussed the need to work in areas where there is a clear mandate.

On engaging New York City residents, individuals noted the need to engage those groups that coordinate individuals instead of trying to create new communities. They also noted the need to do specialized outreach to underrepresented groups — whether they be the elderly, the undocumented, or the impoverished — to ensure their concerns are not forgotten. Some of this work bringing people together could be done through special events, such as hackathons or citizen assemblies or through online platforms for closed-loop feedback.

Finally, on translating insight to action, participants spoke frequently about the need for awareness upfront of resource constraints. Organizations do not have infinite capacity and often face financial restrictions, have limited political capital, or are constrained by something as small as when a project falls in the calendar year. By understanding these facets, organizations can produce impactful work that helps the public.

Next Steps

As the discussion suggests, there is real interest in New York City around a new, question-driven methodology. Through its research on The 100 Questions Initiative, The GovLab will continue to develop this approach so the city can provide meaningful data solutions to the issues residents face.

Information on this work can be found at The 100 Questions Initiative website ( There, individuals can participate in prioritizing issues by voting on questions related to migration. They can also learn about the cohorts of bilinguals, people who have a clear understanding of a problem area as well as an understanding of what data means to it and how it can be used.

Individuals interested in doing more around The 100 Questions Initiative can contact Stefaan at [email protected].