By Grant Fisher, chief information officer, Pecan Street
10 years in, Pecan Street’s greatest technical accomplishment is the development of the largest database of real-world energy and water use on the planet. It’s used by thousands of researchers from more than 60 countries via Dataport. Its data has been cited in more than 100 peer-reviewed papers. And it’s grown from a rented server on the UT-Austin campus to a petabyte of storage at our lab that grows by more than a billion data points per day.
But we didn’t set out to create the mother of all energy and water databases. Honest.
When Pecan Street launched, our mission was simple but broad: we wanted to spur the development of more renewable energy through real-world research and testing. In our first few years, we pursued a number of avenues that all led toward that mission:
- We established a unique network of residential electricity customers whose detailed energy use is measured every minute of every day.
- We spurred the installation of rooftop solar so that we could measure its impact on the grid.
- We sparked a surge in EV purchases in the first neighborhood we connected – right here in Austin. At one time, we had more than 75 EV owners in less than one square mile.
- We ran trials that tested how price signals could change people’s energy use and how text messages could help reduce energy use during hot summer peak days.
All of those activities pointed us in an interesting new direction. If we were going to be leaders in energy and water research, we had to become leaders in data.
Our first database was created by the Texas Advanced Computing Center (TACC) at the University of Texas. TACC created and stored all of the initial data in a format that made things easy for Pecan Street’s internal staff. But as the data grew from 20 homes to 50 homes to over 100, we decided to host the data on-site at our lab, located in the Mueller neighborhood.
We had to learn how to manage naturally occurring data irregularities – such as how to distinguish missing data from measured readings of zero – and how to prioritize what data to collect. So we assembled a data advisory board of the country’s leading distributed energy and grid management researchers. The board has not only helped ensure the integrity of our data, but it has also guided Pecan Street’s expansion by helping identify and recommend the datasets we should develop.
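To illustrate the missing-versus-zero distinction, here is a minimal Python sketch. It is not Pecan Street's actual code: the function names, the sample values, and the choice of NaN as the "no measurement" marker are all assumptions made for the example.

```python
import math
from typing import Dict, List, Optional


def parse_reading(raw: Optional[str]) -> float:
    """Return the reading as a float, or NaN when no measurement exists.

    Assumption: an absent or empty field means the sensor did not report,
    while the text "0" means the sensor reported zero usage.
    """
    if raw is None or raw.strip() == "":
        return math.nan
    return float(raw)


def summarize(readings: List[float]) -> Dict[str, float]:
    """Count missing and zero readings separately; average only real values."""
    valid = [r for r in readings if not math.isnan(r)]
    return {
        "missing": len(readings) - len(valid),
        "zeros": sum(1 for r in valid if r == 0.0),
        "mean": sum(valid) / len(valid) if valid else math.nan,
    }


# Hypothetical minute-level readings (kW) for one home.
raw_minutes = ["1.2", "0", "", None, "0.8"]
readings = [parse_reading(r) for r in raw_minutes]
summary = summarize(readings)
```

If the two gaps were stored as zeros instead, the average would be computed over five values rather than three and would understate the home's real usage.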
It also became clear that we needed a simple and easy interface through which researchers from around the world could access our data. That meant opening up a part of the system that typically sits on the back end of a network directly to end users. And that meant instituting a number of security measures that would protect the Personally Identifiable Information (PII) of the families that participate in our research.
We accomplished this by making sure that we never have PII stored within any devices at participants’ homes. All of the data that is transferred to Pecan Street’s servers is just raw energy and water readings along with a unique home identifier we call a DataID. Our PII is not stored in our databases, but rather with Salesforce, a global customer management tool trusted by companies around the world to safeguard PII. Only a select few Pecan Street employees can access information about our participants (this is important, for example, when we need to reach them for system maintenance). Others, including all of the researchers using our data, only know homes by the randomly generated DataID.
Why is this important? Because we want to safeguard our participants’ digital presence online. Pecan Street couldn’t have collected all of our data without them, and we want to make sure their trust in us is well placed.
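The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ParticipantRegistry`, `make_reading`, the six-digit ID range, and the `is_authorized` flag are hypothetical names and choices, and the in-memory dictionary merely stands in for an external, access-controlled system.

```python
import secrets
from typing import Dict


class ParticipantRegistry:
    """Hypothetical stand-in for the restricted system that holds PII."""

    def __init__(self) -> None:
        self._pii_by_data_id: Dict[int, Dict[str, str]] = {}

    def enroll(self, name: str, address: str) -> int:
        """Store PII and return a randomly generated identifier."""
        data_id = secrets.randbelow(1_000_000)
        while data_id in self._pii_by_data_id:  # avoid reusing an ID
            data_id = secrets.randbelow(1_000_000)
        self._pii_by_data_id[data_id] = {"name": name, "address": address}
        return data_id

    def contact_info(self, data_id: int, is_authorized: bool) -> Dict[str, str]:
        """Only authorized staff may map an identifier back to a person."""
        if not is_authorized:
            raise PermissionError("PII lookup requires authorization")
        return self._pii_by_data_id[data_id]


def make_reading(data_id: int, timestamp: str, kw: float) -> Dict[str, object]:
    """A record as it would reach the research database: no PII fields."""
    return {"data_id": data_id, "timestamp": timestamp, "kw": kw}


registry = ParticipantRegistry()
home_id = registry.enroll("Example Participant", "123 Example St")
reading = make_reading(home_id, "2019-07-01T14:00:00", 1.5)
```

Because the identifier is random rather than derived from a name or address, someone holding only the readings has nothing to reverse back to a household.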
In addition to the databases’ security, we’ve also become a lot more sophisticated in how we extract the data from homes, transform it, and load it into our database so it can be used by researchers. Initially, we downloaded data from the devices daily. As the needs of our staff and researchers changed, we started bringing in more data, faster. Instead of downloading the data once per night from each house, we now constantly stream data between all of our participants’ homes and our servers.
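As a rough sketch of the extract, transform, and load steps, the Python below processes readings one at a time as they arrive. Everything specific here is an assumption for illustration: the comma-separated message format, the watts-to-kilowatts conversion, and the function names are not taken from Pecan Street's pipeline.

```python
from typing import Dict, Iterable, Iterator, List


def extract(messages: Iterable[str]) -> Iterator[str]:
    """Yield raw messages one at a time as they become available."""
    for message in messages:
        yield message


def transform(message: str) -> Dict[str, object]:
    """Parse a hypothetical 'data_id,timestamp,watts' message into a row."""
    data_id, timestamp, watts = message.split(",")
    return {
        "data_id": int(data_id),
        "timestamp": timestamp,
        "kw": float(watts) / 1000.0,  # assumed unit conversion
    }


def load(store: List[Dict[str, object]], row: Dict[str, object]) -> None:
    """Append the row to a store; a list stands in for the database."""
    store.append(row)


def run_pipeline(messages: Iterable[str], store: List[Dict[str, object]]) -> int:
    """Process each message as soon as it is read; return the row count."""
    count = 0
    for message in extract(messages):
        load(store, transform(message))
        count += 1
    return count


store: List[Dict[str, object]] = []
incoming = iter(["101,2019-07-01T14:00:00,1500", "101,2019-07-01T14:01:00,0"])
loaded = run_pipeline(incoming, store)
```

The same `run_pipeline` accepts either a list collected overnight or an iterator fed by a live connection, which is one way to move from nightly downloads to streaming without rewriting the transform step.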
It is important to know where you come from before you know where you can go. Dataport has come a long way since the initial files we created manually from our energy sensors years ago. We have listened to feedback from researchers around the world and now collect data both faster and across more types of readings. We added new types of data with water and gas, and we are in the final months of expanding our data collection network to both coasts of the United States.
Where will we go from here? YOU get to help point us to the next big dataset. So, send in your ideas. If you see us at a conference, flag us down. If you have an idea that needs data, send an email. We are here to expand the mother of all databases, and we know we can’t do it alone.