Slowing Down and Going Faster
It’s been an interesting few weeks here in the ol’ Postwire engineering den. If you don’t have time to read this incredibly captivating blog entry, let me just say that Postwire is now faster. Way faster.
Zen and the Art of Startup Software
Ask any engineer at a startup and they won’t hesitate to tell you that it’s different. Here at Postwire, you could call us “agile,” but that word is now loaded with so much marketing buzz that it’s hard to pin down exactly what it means. In a nutshell, we build software fast, because we’re developing as we discover our market and its needs. In order to build fast you’ve got to sacrifice something, and what that means here is that we don’t optimize until we absolutely have to. Up until a couple of weeks ago, that was easy to manage. So what changed?
You like us! You really like us!
A few weeks ago something awesome started to happen. Users started signing up in larger and larger numbers, and then they decided to stick around and use the application once they were there. Sure, we’d had steady growth up to that point, and we’d had a handle on timing our optimizations to keep pace with it. This time, however, it was impossible to keep up.
Starting in the first week of October, our average number of concurrent users increased by a factor of 10, almost overnight. We were super psyched by the incredible uptake in usage, and we were also instantly on deck trying to solve the performance problems caused by this sudden surge of users. Feature work was put aside as the whole team scrambled to profile our system and find those big, meaty, performance-draining portions of code that we could bite off to bring everyone back to acceptable performance. (By the way, if you aren’t using New Relic you’re making a mistake.)
We found that while we could make lots of small tweaks and add features like pagination, we just couldn’t get performance where we wanted it to be. With this many concurrent users, we had no choice but to start work on a major performance overhaul. This was no light task, and we’d known for months that we’d need to do it eventually. “Eventually” just came much sooner than we had expected.
The fix is in
We took great care and spent some long nights and weekends getting the performance to a level we thought was acceptable. But how did we do?
Fetch Page API Call for one of our most active users:
- Before Optimizations: 7.638 Seconds
- After Optimizations: 0.160 Seconds
Our most common request is now roughly 48x faster. So, yeah, we’re celebrating. But we aren’t done.
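For anyone who wants to check the math, here is a minimal sketch of how that speedup figure falls out of the two timings above. The function and variable names are made up for illustration; only the two timings come from our measurements.

```python
def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' timing is than the 'before' timing."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("timings must be positive")
    return before_s / after_s


# Timings from the Fetch Page API call listed above, in seconds.
before_s = 7.638
after_s = 0.160

factor = speedup(before_s, after_s)
# Share of the original response time that was eliminated.
reduction_pct = (1 - after_s / before_s) * 100

print(f"{factor:.1f}x faster, {reduction_pct:.1f}% less time")
# prints "47.7x faster, 97.9% less time"
```

The exact ratio is about 47.7, which rounds to the 48x quoted here.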
Even though we are thrilled that our users can get a great experience again, we aren’t satisfied. There are still some requests that take longer than we’d like, and we’re continuing to improve as we go. For now, though, we’re celebrating, and I hope you are too.