Build your own data warehouse for personal analytics with SQLite and Datasette
Start time | 16:00 |
---|---|
End time | 16:25 |
Internet companies collect staggering quantities of data about us - both with and without our knowledge. Thanks to Europe's GDPR legislation, most companies now let you export your data back out again. But what can you do with it? I'll show how to use the powerful combination of SQLite and Datasette to take control of your digital life and build a private data warehouse for your own personal analytics.
Many data enthusiasts dream of analyzing their own personal data, but few find time to build their own pipeline for it. This talk will show you how to get started with personal analytics with the highest possible return on your invested effort.
SQLite is the ideal tool for building a personal data analysis pipeline: it's free, fast and widely supported. Each database is a single file on disk, so you don't need to set up a database server to start using it. Tools that import data into SQLite can be written in any programming language, and its JSON support means it can even ingest data that may not fit neatly in a standard relational database table.
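As a rough illustration of the JSON point, here is a minimal sketch using Python's standard `sqlite3` module. The `checkins` table, the `raw` column and the sample records are hypothetical stand-ins for a real export, and the query assumes your SQLite build includes the JSON functions (most recent builds do).

```python
import json
import sqlite3

# Hypothetical records standing in for a real data export.
checkins = [
    {"id": 1, "venue": {"name": "Coffee Shop", "city": "San Francisco"}},
    {"id": 2, "venue": {"name": "Library", "city": "Oakland"}},
]


def load_checkins(conn, records):
    """Store each record as a JSON string, keyed by its id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS checkins (id INTEGER PRIMARY KEY, raw TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO checkins (id, raw) VALUES (?, ?)",
        [(r["id"], json.dumps(r)) for r in records],
    )
    conn.commit()


def venue_cities(conn):
    """Pull a nested field out of the stored JSON with json_extract()."""
    rows = conn.execute(
        "SELECT json_extract(raw, '$.venue.city') FROM checkins ORDER BY id"
    )
    return [row[0] for row in rows]


# ":memory:" keeps the sketch self-contained; pass a path such as
# "personal.db" instead to get a single database file on disk.
conn = sqlite3.connect(":memory:")
load_checkins(conn, checkins)
cities = venue_cities(conn)
print(cities)
```

Keeping the raw JSON alongside any extracted columns means nested or irregular fields stay queryable without designing a full schema up front.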
Datasette is a Python application that provides an interface over SQLite. It lets you bookmark queries in your browser and export the results as JSON and CSV. The Datasette plugin ecosystem has over 30 plugins that extend Datasette in different ways, adding visualization tools, alternative export formats and more.
I'll show how to combine SQLite, Datasette and some simple Python scripts to ingest personal data from multiple sources and build a personal data warehouse for your digital life. Data sources will include:
- Apple Photos
- Google (via Google Takeout)
- Foursquare / Swarm
- GitHub
- Apple Health
- 23andMe
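To give a flavour of why a single database helps, the sketch below loads two sources into one SQLite database and counts activity per day across both. The table names, columns and sample rows are hypothetical and do not reflect the format of any real export; a real script would parse each service's export files instead.

```python
import sqlite3

# Hypothetical sample rows standing in for parsed export files.
photos = [
    ("2020-01-05", "IMG_0001.jpg"),
    ("2020-01-05", "IMG_0002.jpg"),
    ("2020-01-07", "IMG_0003.jpg"),
]
commits = [
    ("2020-01-05", "Fix typo"),
    ("2020-01-06", "Add tests"),
]


def build_warehouse(conn):
    """Create one table per source inside the same database."""
    conn.execute("CREATE TABLE photos (day TEXT, filename TEXT)")
    conn.execute("CREATE TABLE commits (day TEXT, message TEXT)")
    conn.executemany("INSERT INTO photos VALUES (?, ?)", photos)
    conn.executemany("INSERT INTO commits VALUES (?, ?)", commits)
    conn.commit()


def activity_by_day(conn):
    """Count photos and commits per day across both sources."""
    sql = """
        SELECT day, SUM(photo_count), SUM(commit_count)
        FROM (
            SELECT day, COUNT(*) AS photo_count, 0 AS commit_count
            FROM photos GROUP BY day
            UNION ALL
            SELECT day, 0, COUNT(*)
            FROM commits GROUP BY day
        )
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
summary = activity_by_day(conn)
for day, photo_count, commit_count in summary:
    print(day, photo_count, commit_count)
```

Once every source lives in the same file, questions that span services become ordinary SQL queries.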
Techniques that work for an individual can work for organizations too. I'll finish by showing how this approach to working with data can scale up to solving professional problems in addition to personal analytics.
Simon is the creator of Datasette, an open source tool for exploring and publishing data.
Datasette is based on Simon's experiences working as a data journalist at the UK's Guardian newspaper.
Simon is also a co-creator of the Django web framework. He recently completed the JSK Fellowship program at Stanford.
https://simonwillison.net/ - @simonw