On February 13, 2019, DevTech Systems attended the first-ever World Bank Data Day at the World Bank's headquarters in Washington, DC. DevTech participated in the Data Day Fair, and Senior Economist Anne Bernier and Data Analysts Evan Williams and CJ Tracey attended talks throughout the day. Here are some highlights from the day:
My two favorite talks were about using big data to measure economic concepts faster, and in more detail, than official government statistics. In one lightning talk, Juni Zhu (World Bank) demonstrated how LinkedIn data can answer questions about changing employment in the tech sector in lower- and middle-income countries, and about the skills mismatch between recent graduates and job openings. In another talk, Richard Record and Sam Fraiberger described how tweets and online news can be analyzed to create economic sentiment indices that are cheaper and faster than government-run confidence surveys. Sentiment indices can be very helpful leading indicators of GDP growth and employment.
The morning sessions of Data Day focused primarily on the use of Machine Learning and Big Data within the international development field.
All of the talks and discussions boiled down to one simple statement: machine learning has the potential to be a game-changer for international development. Satellite imagery has revolutionized the way we monitor electrification rates (and, by extension, priority areas) in Africa. GPS data from taxi and ridesharing services have identified bottlenecks and infrastructure deficiencies more accurately, cheaply, and efficiently than traditional surveys ever could. Web scraping provides economic data in countries where official statistics are not trustworthy. Network data can be used to spread health information and contain epidemics. Video data, which would be extremely expensive for humans to review, could be analyzed to monitor teacher absenteeism cheaply and with a high degree of accuracy.
However, while machine learning gives us many more tools to do good in the world, it can also be used in unethical ways. The two big elephants in the ML room are the hype around ML/AI and, more significantly, the issue of privacy. Hype encourages algorithms to be applied where they do not fit, and algorithms applied incorrectly can do more harm than good. A bigger issue may be that, particularly in countries with high levels of corruption, big data can become a tool of coercion and control. It’s important to consider the ethical application and use of data, and there is a growing need for international organizations to focus on ethical concerns in data and machine learning applications.
Although I found the entire session interesting, three speakers really stood out. First, the keynote speaker Roberta Gatti discussed the development of the Human Capital Index (HCI), a highly successful World Bank project that directly ties health and education to productivity. Hearing the thought process that led to the development of the index was insightful, and the team's focus on outcomes, coverage, salience, coherence, and ownership should be kept in mind when creating future indices. Additionally, the closing discussion between Ms. Gatti and Simeon Djankov provided further insight into the development and use of indices. Ms. Gatti again highlighted the struggle between simplicity and comprehensiveness when developing the index, and Mr. Djankov, referencing future plans for the index, stated that the World Bank should be careful not to overburden it and should stick to different tools for different tasks.
Second, Charles Fox from the Geospatial Operational Support Team talked about the team's development of a rapid healthcare access assessment for Yemen. Their objective was to estimate access to healthcare in Yemen, where the World Bank has been unable to send missions since 2015. They used open-source information and created an open-source algorithm (available on GitHub) that estimates access to healthcare services and can be adapted as new information becomes available. The project highlighted how new open-source technologies can produce a flexible and responsive tool where traditional methods are not possible.
Finally, Malar Veerappan from the Development Data Group spoke about maximizing the value and use of the World Bank’s data assets. She stated that there needs to be a large movement toward a data community built on reuse and integration in order to reduce the time spent on data preparation. Her talk is very applicable to DevTech’s work on creating a space for accessible data with the ESDB.