Scaling

what happens when your startup doesn't die

Aug 03, 2020

If your startup doesn't die, you will reach many points in your software development where you need to "scale".

Scaling is the process of taking something from a small scale to a big one.

When you are starting and you are small, you have a few lines of code, a few services to deploy, a few tests, a few members in your team, a few issues, and your projects have a few dependencies.

This article is based on my own experience launching products and services at Entria.

Scaling Issues

We define a unit of work as an issue; it could also be called a task.

An issue defines everything that needs to be done to fix the bug, to implement the improvement, or to release a new feature.

When we started our first project (Brand Lovers), we had 1 repo for backend, 1 repo for frontend, and 1 repo for the app.

We had issues in each one of these repositories, which made it hard to figure out what to work on and where to write a new issue: is this a backend or a frontend issue?

We fixed this by moving all issues to the server repo, so we had one centralized place to manage them. It became easy to find what to work on and where to create a new issue.

We also changed the way we thought about issues. Any issue was cross-functional, with no more backend or frontend separation.

An issue contained the design, if needed, plus both the backend and frontend tasks, which were mostly performed by the same developer. We are all full-stack developers here.

We also organized weekly sprints, using GitHub milestones to track what we are working on this week and next week.

Scaling Codebase

At Feedback House we have more than 10 different modules that help businesses manage their teams.

Using multiple repositories was making a lot of things hard for us. We had a lot of duplicated code, copied and pasted directly from another repository, because it was harder to keep publishing common packages to be consumed.

Each repository was a huge monolith, hard to split into many small units.

We moved each repository (backend, frontend, and app) to a monorepo structure.

We still do not have the infrastructure necessary to keep all of them in a single monorepo; I will explain these challenges in the next sections.

Using multiple repositories does not scale for product code, because we want to be able to make changes across many "repositories" at the same time when working on a feature.

Publishing many releases means some projects get updated while others go stale.

We want to move all code together, and a monorepo makes it easy to converge on code patterns.

If you have many repos, you have more trouble making sure all of them follow the same code conventions, like lint, prettier, and so on.

Moving to a monorepo let us split our monolith into many packages that can be our building blocks to help us build new features faster.

Let's see some real examples of packages in the backend after this change:

@app/graphql - package that makes it easy to build a GraphQL server using MongoDB and graphql-js
@app/i18n - manages all i18n translations strings to be shared among services
@app/modules - contains all models/collections to be shared among each GraphQL service
@app/roles - contains all permission logic needed to access resources in our platform

Let's see some packages in the frontend:

@app/ui - contains our design system
@app/form - contains all form fields
@app/hooks - contains all our shared custom React hooks

A monorepo lets us design and build our software as a set of building blocks.

Scaling Tests

As we saw in the article "Your QA won't get this bug", we need tests to catch second-order effects of changes that even your QA team won't get.

As our codebase grew, our number of tests also grew.

More tests also ensure we can move faster without breaking things.

Tests also help us to model business logic and make sure we are building the right thing.

We saw our CI get slower over time.

The first thing we did was to stop running tests serially and start running them in parallel, setting Jest's maxWorkers to the number of CPUs minus 1.

You can read more in the article "Parallel testing a GraphQL server with Jest and MongoDB".

We were using a Docker container with 8 CPUs, so we could run 7 workers, making our tests almost 7 times faster than before. This was a huge first step toward making all our tests run faster.
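A minimal sketch of that worker calculation, assuming a Jest config written in TypeScript. `maxWorkers` is Jest's option; the helper name `workerCount` and the `jestConfig` variable are hypothetical and only there for illustration.

```typescript
import { cpus } from "node:os";

// Hypothetical helper: use "number of CPUs minus 1" workers,
// but never fewer than one worker on a single-CPU machine.
function workerCount(cpuCount: number): number {
  return Math.max(1, cpuCount - 1);
}

// In a real jest.config.ts this object would be the default export.
const jestConfig = {
  maxWorkers: workerCount(cpus().length),
};

console.log(`Jest would run with ${jestConfig.maxWorkers} workers`);
```

On an 8-CPU container this yields 7 workers, which matches the setup described above.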

But as you grow, this solution is not enough, because we kept adding more tests (we reached more than 5k tests in the backend monorepo alone).

Then we started running tests in parallel across many containers using CircleCI test splitting.

We split the tests among 7 parallel Docker containers with 8 CPUs each, which made our tests run 7 times faster again.
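To make the idea of splitting concrete, here is a rough sketch of how test files could be balanced across containers using durations from a previous run. This is not CircleCI's actual algorithm, just a simple greedy approach; `TestFile`, `splitByTimings`, and the sample file names are all hypothetical.

```typescript
interface TestFile {
  path: string;
  seconds: number; // duration recorded in a previous run (assumed to be available)
}

// Greedy split: place the slowest remaining file into the container
// that currently has the smallest total duration.
function splitByTimings(files: TestFile[], containers: number): TestFile[][] {
  if (!Number.isInteger(containers) || containers < 1) {
    throw new Error("containers must be a positive integer");
  }
  const buckets: TestFile[][] = Array.from({ length: containers }, () => []);
  const totals: number[] = new Array<number>(containers).fill(0);
  const slowestFirst = [...files].sort((a, b) => b.seconds - a.seconds);

  for (const file of slowestFirst) {
    const lightest = totals.indexOf(Math.min(...totals));
    buckets[lightest].push(file);
    totals[lightest] += file.seconds;
  }
  return buckets;
}

// Usage with made-up timings: four files over two containers.
const sampleSplit = splitByTimings(
  [
    { path: "user.spec.ts", seconds: 40 },
    { path: "team.spec.ts", seconds: 30 },
    { path: "roles.spec.ts", seconds: 20 },
    { path: "i18n.spec.ts", seconds: 10 },
  ],
  2,
);
console.log(sampleSplit.map((bucket) => bucket.map((file) => file.path)));
```

Balancing by duration rather than by file count matters because the slowest container determines how long the whole CI run takes.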

However, this is not a cheap solution, as each new parallel container increases our CI costs. It is also not optimal, as we were testing everything every time, even when modifying a simple README file.

Our last improvement to make this better (to scale) was to only run tests affected by the files changed in a given pull request.

This makes sure we have fast feedback on the pull requests, as we only run the minimal set of tests.

And we keep running all the tests on the main branch to make sure we don't regress in other areas.
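One way to sketch this at the package level: map each changed file to its package, then follow the dependency graph backwards to find every package that could be affected. The `packages/<name>/` layout, the graph contents, and the function names below are assumptions for illustration, not a description of our actual tooling.

```typescript
// Package name -> internal packages it depends on (hypothetical graph).
type DependencyGraph = Record<string, string[]>;

// Assumes every package lives under packages/<name>/.
function packageOf(filePath: string): string | null {
  const match = /^packages\/([^/]+)\//.exec(filePath);
  return match ? match[1] : null;
}

// Returns the changed packages plus everything that depends on them,
// directly or transitively. Files outside any package affect nothing.
function affectedPackages(changedFiles: string[], graph: DependencyGraph): Set<string> {
  const affected = new Set<string>();
  const queue: string[] = [];

  for (const file of changedFiles) {
    const pkg = packageOf(file);
    if (pkg !== null && pkg in graph && !affected.has(pkg)) {
      affected.add(pkg);
      queue.push(pkg);
    }
  }

  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const [name, deps] of Object.entries(graph)) {
      if (deps.includes(current) && !affected.has(name)) {
        affected.add(name);
        queue.push(name);
      }
    }
  }
  return affected;
}

// Usage with a made-up graph.
const sampleGraph: DependencyGraph = {
  graphql: [],
  roles: ["graphql"],
  server: ["roles", "graphql"],
  ui: [],
  web: ["ui"],
};
const fromUiChange = affectedPackages(["packages/ui/src/Button.tsx", "README.md"], sampleGraph);
console.log(Array.from(fromUiChange));
```

With this graph, a change that only touches the README selects no packages at all. Jest also ships a `--changedSince` flag that runs only the tests related to files changed since a given branch or commit, which may be enough without custom code.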

Firefox went even further, using machine learning to decide which tests to run. Read more about it in "Testing Firefox more efficiently with machine learning".

Scaling Dependencies

When growing a product/service we add more and more dependencies to move faster without having to rebuild everything.

This is good at the beginning but it does have some cost.

As you add more dependencies, the chances of hitting a bug in one of them increase. You may also find that a dependency is missing a feature you need.

There are 3 ways to solve these 2 problems:

if the fix/feature is small, use patch-package and send a pull request to the package to upstream the fix
if the fix/feature is big, you can maintain a fork until the pull request lands in the package
if the fix/feature is big, you can vendor the package: bring it into your monorepo and try to keep it in sync with upstream.

Another problem with too many dependencies is that you always need to keep them up to date. You need to handle breaking changes and make sure you won't break any workflow or test.

If something is the core of your product, it is better to build it yourself.

Scaling Code Modifications

Cpojer explains in the article "Effective JavaScript Codemods" how they use codemods to evolve codebase patterns at Facebook.

As your codebase evolves, new patterns appear that are better than the old ones.

When your codebase is small you can easily patch all the files yourself.

However, when you have many lines of code this is impractical.

We wrote more than 20 codemods using jscodeshift that help keep our whole codebase up to date with our new code patterns.
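To give a feel for what a codemod does, here is a deliberately simplified, dependency-free sketch that rewrites import sources from an old relative path to a shared package. A real jscodeshift codemod works on the syntax tree instead of text; a regular expression is used here only to keep the example self-contained. The function names and the `../../i18n` path are hypothetical.

```typescript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Rewrites `from "<oldModule>"` to `from "<newModule>"`, keeping the
// original quote style. Only exact matches of oldModule are touched.
function rewriteImportSource(source: string, oldModule: string, newModule: string): string {
  const pattern = new RegExp(`(from\\s+)(['"])${escapeRegExp(oldModule)}\\2`, "g");
  return source.replace(
    pattern,
    (_match: string, fromKeyword: string, quote: string) =>
      `${fromKeyword}${quote}${newModule}${quote}`,
  );
}

// Usage: migrate a file from a relative import to the shared package.
const before = [
  'import { t } from "../../i18n";',
  "import other from './other';",
].join("\n");
const after = rewriteImportSource(before, "../../i18n", "@app/i18n");
console.log(after);
```

Running a transform like this over every file is what makes a pattern change practical in a large codebase. An AST-based tool such as jscodeshift additionally avoids false matches inside strings and comments.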

There are many more scaling problems not covered in this article, such as scaling teams, scaling design, and scaling sales teams.

What scaling problems have you already solved in your company?