Zane Selvans

Vaquero de Datos, Catalyst Cooperative
I wrangle US open energy system data for researchers, activists, policymakers, and journalists working on climate & energy policy.
Priya Donti: Reading through their methodology, the questions that come to mind:
* How well did the model do at recovering actual plant operations and emissions in places with hourly or better plant-level ground truth data? Does it seem odd that this metric isn't included?
* What frequency of sampling do they have in the satellite imagery? Is it spatially uniform? I think a lot of these satellites have nearly polar orbits, in which case they'll oversample high latitudes / undersample low latitudes. How diverse and uniformly distributed is the temporal coverage? Polar orbits designed to give global coverage often see the ground at the same time of day on every pass (e.g. 2pm). Is the coverage good enough to capture diurnal, weekly, and seasonal variations in plant operation?
* Did they look at how well the model recovered plant operations in high latitude locations when it was only given observations that would be reflective of what's available at low latitudes?
* How good is the national level data that they're using to estimate the overall fuel mix and convert the plant on/off signal into GHG emissions?
* What assumptions are they making about the heat rates / thermal efficiency of the plants?
* What assumptions are they making about the load factors of the plants while operating?
* How are they accounting for plants where several different independently operable generation units may be using the same smokestack? We see this frequently in the EPA CEMS/AMPD hourly data especially for natural gas combined cycle plants. They have several different operational modes where just the combustion turbines may operate as peakers, independent of the steam turbines. Or where single GT-ST pairs may operate independently of each other, even though they are all hooked up to the same smokestack / emissions monitoring equipment.
* What fraction of actually existing power plants are covered by the datasets they're using to identify the locations of plants to track? What's the lower size threshold of plants that are included in those datasets, and what fraction of the global power plant population (and emissions) would you expect to be missing?
* Do they actually believe that the differences between their prediction and the other reported dips in global generation in 2020 are statistically significant? I would guess that 3.4%, 3.3%, and 2.9% are all statistically indistinguishable, but they seem to think otherwise.
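To make the heat rate and load factor questions above concrete, here's a rough sketch of the arithmetic needed to turn an observed on/off signal into CO2 emissions. The function name and all of the numbers are my own illustrative assumptions (a hypothetical 500 MW gas plant, and roughly 53 kg CO2 per MMBtu for natural gas); this is not the methodology under discussion, just the generic chain of multiplications that any such estimate has to go through.

```python
def estimate_co2_tonnes(hours_on, capacity_mw, load_factor,
                        heat_rate_mmbtu_per_mwh, ef_kg_co2_per_mmbtu):
    """Rough CO2 estimate (metric tonnes) from hours observed 'on'.

    Every argument except hours_on is an assumption that cannot be
    seen from outside the plant.
    """
    # Electricity generated while the plant was observed running.
    generation_mwh = hours_on * capacity_mw * load_factor
    # Fuel energy burned to produce that electricity.
    fuel_mmbtu = generation_mwh * heat_rate_mmbtu_per_mwh
    # Convert kg of CO2 to metric tonnes.
    return fuel_mmbtu * ef_kg_co2_per_mmbtu / 1000.0


# Hypothetical plant: 500 MW, observed 'on' for 100 hours.
# Same observation, two plausible-looking sets of assumptions.
low = estimate_co2_tonnes(100, 500, load_factor=0.5,
                          heat_rate_mmbtu_per_mwh=6.5,
                          ef_kg_co2_per_mmbtu=53.0)
high = estimate_co2_tonnes(100, 500, load_factor=0.9,
                           heat_rate_mmbtu_per_mwh=9.0,
                           ef_kg_co2_per_mmbtu=53.0)

print(f"low:  {low:,.0f} t CO2")
print(f"high: {high:,.0f} t CO2")
print(f"ratio high/low: {high / low:.2f}")
```

With identical on/off observations, the two assumption sets differ by a factor of about 2.5, which is why the load factor and heat rate assumptions matter so much.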
All of the proprietary data that Joe is using in the US is ultimately derived from public data (much of which we are working to clean up and provide in analysis-ready formats for free). Also, a lot of it is about electricity production costs, power market prices, and the physical operational characteristics of the plants, so it's not something you can really observe externally.

Do ML folks really think that what Climate TRACE is trying to do is tractable? It seems like the ground truth data for training and validating the models is going to be totally nonexistent for most of the world, and I know it's an awful mess in the US, even though it's available. As an outsider that's familiar with the data but not the ML side of things, it sounds like a boondoggle.
I highly recommend Danny Cullenward & David Victor's book Making Climate Policy Work, which looks primarily at the political economy of carbon pricing and offsets, and makes a strong case that carbon pricing can't work well, even though it might be good economic policy, because the same attributes that make it good economically make it awful politically.

In a slightly different vein, Leah Stokes' book Short Circuiting Policy explores the mechanisms through which utilities capture regulators at the state level, and have been effective in some cases at rolling back climate policy or neutering it in the implementation phase.
These seem to be some of the main folks working on applying ML to the problem of structured / relational data cleaning. Anyone here know a post-doc or new professor to come out of one of their labs, who is also excited about working on climate issues and not just chasing the VC 💰💰?