How Pinterest Scaled Their Database

Marty Weiner from Pinterest has written a nice post on how they scaled their DB

What I liked most about approach is the sheer simplicity of how they did it

Any DB Scaling needs to cater to few things

  • fast and easy lookups irrespective of the size of Data
    • one should be able to find needle in haystack without sweating
    • also means JOINs should be less & should never go across shards
  • Scale-out with minimal or no impact to services
  • High availability of the data

Key notes from the post are:

  • chose Mysql for it is a battle tested db for large datasets
    • => Sharding & Scaling are to be done outside the DB
  • Master is always for write as well as read
    • Avoids data sync issues
    • Slave is only for availability purpose, and not for load sharing
  • Use uuid (64 bit) to uniquely identify the two main Types of objects stored i.e., Pin-ID & board-id
    • this uuid is made of Shard-id, type (pin or board), and row-id in corresponding table in that shard
  • Shard-id range to actual shard mapping is stored in ZK
  • Pin-ID & all other tables related to that PIN-id will be on same shard (including pid-ID to board-id mapping)
  • Similarly board- and related data including board-id to pin-id mapping are on one shard
    • all pins on a board may or may not be on same shard
  • User login to shard mapping is maintained separately
  • Scaling can be done by either
    • New shard-id range addition to zk
    • Splitting shard-id ranges (will require data migration before new shard is made available)
    • Or of course, upgrading the machines 🙂

Leave a Reply

Required fields are marked *.

This site uses Akismet to reduce spam. Learn how your comment data is processed.