Zane Selvans

Vaquero de Datos, Catalyst Cooperative
I wrangle US open energy system data for researchers, activists, policymakers, and journalists working on climate & energy policy.

How to find and expose utilities that cheat the markets

Joe Daniel at the Union of Concerned Scientists (and formerly Sierra Club) has done a lot of great analysis identifying various ways that utilities cheat in the power markets in order to keep using uneconomic fossil fuel assets that should really be shut down. See e.g.:

However, much of this analysis has been fairly manual, has been updated only occasionally if at all, and has relied on proprietary datasets. It would be great to run these analyses continuously, and to see whether AI/ML systems can be trained to identify the cheaters so they can be targeted for legal / regulatory / legislative / advocacy campaigns. It would also be great if the whole thing could use open data and open source analytical tools.

A continuously updating dashboard of "Top 10 Most Wanted" utility market manipulators standing in the way of the transition to renewable energy would be great.

Data to support utility financial transition strategies

Catalyst has done a lot of work integrating financial data on electric utilities, to support new financing mechanisms and legislation that can get existing fossil generation offline faster. Especially in vertically integrated markets, existing capital investments are still a big impediment to the energy transition. If you want to learn more about the analyses that back this work, and the policy mechanisms involved, check out these whitepapers from Energy Innovation.

Thus far we've mostly worked on electric utility data, but we're moving into natural gas now, and need to integrate a bunch of new messy data to support the same kind of policy mechanisms and effective targeting of the most susceptible / vulnerable utilities. These will include the FERC Form 2 (a disaster of a FoxPro DBF database), EIA Form 176 (CSV files), and the PHMSA Annual Gas Reports (Excel spreadsheets). We are looking for more automated record linkage and entity resolution tools that can make this work more reliable and less labor intensive.
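As a rough illustration of what name-based record linkage across datasets like these involves, here is a minimal sketch using only the Python standard library. The company names, the suffix list, and the 0.85 threshold are all made up for illustration, and this is not Catalyst's actual pipeline. It shows the kind of hand-tuned, rule-based matching that more automated tools would ideally replace.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "co", "corp", "company", "corporation", "llc"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(left, right, threshold=0.85):
    """Map each name in `left` to its best match in `right`, or None.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    links = {}
    for name in left:
        best = max(right, key=lambda cand: similarity(name, cand), default=None)
        if best is not None and similarity(name, best) >= threshold:
            links[name] = best
        else:
            links[name] = None
    return links


# Made-up utility names standing in for two differently formatted sources.
source_a = ["Acme Gas Co.", "Northern Pipeline, Inc.", "Zebra Utilities"]
source_b = ["ACME GAS COMPANY", "Northern Pipe Line Inc", "Southwest Energy LLC"]

for name, match in link_records(source_a, source_b).items():
    print(f"{name!r} -> {match!r}")
```

Rules like the suffix list and the fixed threshold are where this approach gets fragile as the number of datasets grows, since each new source tends to bring its own naming quirks.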

Check out RMI's Utility Transition Hub to see what some of the downstream applications of our existing data look like.

Seeking Co-PI/collaborator for CCAI Innovation Grant

Hey all, I'm a member of Catalyst Cooperative, a little worker-owned data wrangling organization. We take publicly available but poorly curated energy system data in the US and clean it up for use in the public interest by researchers, activists, policymakers, and journalists. We are looking for someone at a university or research institution in the US who is eligible to be a PI on a Climate Change AI Innovation Grant and would like to work with us.

Our work mostly focuses on data cleaning and integration -- normalizing data into well structured relational DBs, doing record linkage between different government datasets, disambiguating records that refer to the same entities, identifying bad / outlying values, imputing missing values, etc. We're getting to where the number of datasets we're integrating makes doing this kind of work by hand or with rule-based systems pretty tedious and fragile, and we're interested in applying some recent work that has gone into tools like Holoclean, Tamr, and Snorkel to the datasets we work with -- potentially forking some of their academic / research oriented repos that have been abandoned as the tools have been commercialized. I wrote up some notes on the landscape of ML-based data wrangling tools in May in this post: Automated Data Wrangling.
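For readers less familiar with the outlier detection and imputation steps mentioned above, here is a minimal rule-based sketch in plain Python. It flags outliers with the modified z-score (based on the median absolute deviation, with the conventional 3.5 cutoff) and fills gaps with the median of the remaining values. The numbers are invented, and this is a generic textbook technique chosen for illustration, not a description of how Catalyst or any of the tools named above actually work.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return a list of booleans marking values with a large modified z-score.

    Missing values (None) are never flagged; they are handled by imputation.
    """
    present = [v for v in values if v is not None]
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        # No spread around the median, so nothing can be scored.
        return [False] * len(values)
    return [
        v is not None and abs(0.6745 * (v - med) / mad) > threshold
        for v in values
    ]


def impute_median(values, outliers):
    """Replace missing and flagged values with the median of the clean ones."""
    clean = [v for v, bad in zip(values, outliers) if v is not None and not bad]
    fill = median(clean)
    return [fill if (v is None or bad) else v for v, bad in zip(values, outliers)]


# Invented reported values with one gap and one implausible entry.
reported = [10.0, 11.0, None, 9.0, 10.5, 500.0]
flags = flag_outliers(reported)
cleaned = impute_median(reported, flags)
print(flags)
print(cleaned)
```

A single global cutoff like this ignores everything that makes a value plausible in context (plant size, fuel type, reporting year), which is part of why hand-written rules get brittle as more datasets are added.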

Access to high quality data that's processed in a repeatable way and kept up to date is a big source of information & power asymmetry in US energy policy, and we're trying to rectify it. Open source, actively maintained versions of these tools would also empower open data advocates in other domains.

We're open to looking at other (more traditional?) ML/AI projects working with the data we produce too, but the data wrangling stuff is really our core mission and the niche we've tried to carve out for ourselves.  Let me know if you're interested in chatting more!

Energy data wrangler seeking AI

Hey all, I'm a member of a small data wrangling cooperative that takes publicly available but poorly curated US energy data and processes it into well normalized databases and columnar formats. We do a lot of grungy work with record linkage, entity resolution, outlier detection and imputation, and hand labeling of energy data, so that researchers, activists, journalists, and policymakers can get right to doing new analysis instead. All our code and data are open source, and we'd love for more folks from the ML/AI community to make use of them. We also have a lot of experience with US electricity policy and utility finance, which are major roadblocks to getting off of fossil fuels in the US. We're starting to think about how we can make our data wrangling less tedious, fragile, and error prone by integrating more automated ML-based tools for record linkage, entity resolution, outlier detection, and missing value imputation. If anyone has experience with those kinds of applications we'd love to talk! Here's a roundup of recent tools, which I wrote up earlier this year.