How Bhinneka Reduced Hundreds of Lines of Code with JSON Schema Validation

Using JSON Schema validation to reduce your lines of code

Lines of Code (LOC) is one of the software metrics that can represent how big & complex your product is, or how inefficient your code is. More lines of code mean more effort to maintain them; conversely, the fewer lines of code, the happier you get. One way to deal with a big product is to split it into smaller products or services using a microservices architecture, but even that doesn't guarantee a minimal line count. Bhinneka itself has several microservices; the biggest service is 104k lines of code and the smallest is 4.6k.

One way to reduce this enormous line count is to take out any process or spec that can live independently, outside your source code, and one of those is validation. Validation is where your code checks user input, and both positive and negative cases must be covered. If the user input does not meet your validation rules/specs, it must return an error; otherwise, if all validation rules pass, it can go on to the next process, e.g. saving to the database.

Inline Validation

This is the most basic implementation, where you do the validation directly in the code and evaluate conditions one by one in if statements. Each statement represents one validation condition, for example whether a specific field in the request body is empty.

Sample code for inline validation in Python:

if body['paymentMethodName'] == '':
    return response(400, message="Payment method can't be blank")

Now, imagine you have dozens of fields and each field may have multiple validation conditions. How many lines of code will that produce?
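
As a hypothetical sketch (field names and rules invented for illustration), just three fields with two conditions each already cost six branches:

```python
def validate(body):
    # every field needs its own chain of checks, and the chain only grows
    if body.get('paymentMethodName', '') == '':
        return "Payment method can't be blank"
    if len(body['paymentMethodName']) > 50:
        return "Payment method is too long"
    if body.get('email', '') == '':
        return "Email can't be blank"
    if '@' not in body['email']:
        return "Email is invalid"
    if body.get('phone', '') == '':
        return "Phone can't be blank"
    if not body['phone'].isdigit():
        return "Phone must be numeric"
    return None  # all checks passed

print(validate({'paymentMethodName': 'VA', 'email': 'a@b.co', 'phone': '0123'}))
```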

Another disadvantage of this approach is that the validation rules are not reusable. If another file or module needs similar validation, you have to write it again. Congratulations: you're adding more lines of code to your project, and you have broken the DRY (Don't Repeat Yourself) principle with more and more duplication.

Language Validation

This approach is better than inline validation because you don't have to add many inline if statements. Instead, you define the validation rules on the field definitions; you can also put multiple conditions on one line, which definitely reduces lines of code. You can even create custom validations to get reusable rules. Nice!

Sample code for language validation at the struct level in Go, using the validator package:

type User struct {
	FirstName      string     `json:"fname"`
	LastName       string     `json:"lname"`
	Age            uint8      `validate:"gte=0,lte=130"`
	Email          string     `validate:"required,email"`
	FavouriteColor string     `validate:"hexcolor|rgb|rgba"`
}

However, this approach has a disadvantage: it's language dependent. If you ever want to switch to another programming language, you have to rewrite all the validation rules in the new language, which is time consuming. Besides that, it still contributes to your line count because it lives inside your source code, not outside it. We need an approach that can live independently.

JSON Schema Validation

So what is JSON Schema? From its official website https://json-schema.org/:

JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.

JSON Schema can be used to validate your API clients' JSON request bodies, validate API responses, or any other use case, depending on your needs. Compared with the two previous approaches, this one is a perfect fit for our requirements:

  1. Reduce lines of code: because it lives outside your source code, it isn't counted toward your lines of code. Remember, less code means less effort to maintain it.
  2. Reusable: you can create custom rules and even reference other schemas.
  3. Language independent: if you plan to switch to another programming language, you can use the same validation rules without rewriting them.

Sample JSON Schema file; let's name it user_schema_create_params.json:

{
    "id": "user_schema_create_params",
    "$schema": "http://json-schema.org/draft-04/schema#",
    "description": "schema to create a new user",
    "type": "object",
    "properties": {
        "name": {
            "type": "string",
            "maxLength": 70,
            "pattern": "^[a-zA-Z0-9\\,. \\/\\()-]+$"
        },
        "email": {
            "type": "string",
            "maxLength": 50,
            "format": "email"
        },
        "department": {
            "type": "string",
            "maxLength": 50,
            "pattern": "^[a-zA-Z0-9\\,. \\/\\()-]+$"
        },
        "phone": {
            "type": "string",
            "maxLength": 30,
            "pattern": "^[0-9\\()-]+$"
        }
    },
    "required": ["name", "email", "department", "phone"]
}
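
To illustrate point 2 above (reusability), a common pattern is to define a rule once under definitions and reference it with $ref. A minimal sketch, assuming Python's jsonschema library is installed (the schema and field names here are invented for illustration):

```python
import jsonschema

# Hypothetical schema: the "safeString" rule is written once under
# "definitions" and referenced by both fields via "$ref".
schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "definitions": {
        "safeString": {
            "type": "string",
            "maxLength": 70,
            "pattern": "^[a-zA-Z0-9\\,. \\/\\()-]+$"
        }
    },
    "properties": {
        "name": {"$ref": "#/definitions/safeString"},
        "department": {"$ref": "#/definitions/safeString"}
    },
    "required": ["name", "department"]
}

# Passes: both fields satisfy the shared rule
jsonschema.validate({"name": "John Doe", "department": "IT"}, schema)
```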

Now you can load the JSON Schema file and use a supporting library to validate against it.

Sample code for JSON Schema validation in Go using gojsonschema:

    schemaLoader := gojsonschema.NewReferenceLoader("file:///project/user_schema_create_params.json")
    documentLoader := gojsonschema.NewReferenceLoader("file:///project/document.json")

    result, err := gojsonschema.Validate(schemaLoader, documentLoader)
    if err != nil {
        panic(err.Error())
    }

    if result.Valid() {
        fmt.Printf("The document is valid\n")
    } else {
        fmt.Printf("The document is not valid. see errors :\n")
        for _, desc := range result.Errors() {
            fmt.Printf("- %s\n", desc)
        }
    }

Sample code for JSON Schema validation in Python using jsonschema:

import jsonschema
import simplejson as json

with open('user_schema_create_params.json', 'r') as f:
    schema_data = f.read()
schema = json.loads(schema_data)

json_obj = {"name": "John Doe", "email": "mail@example.com", "department": "IT", "phone": "0123456789"}
jsonschema.validate(json_obj, schema)
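
The Go sample above prints validation errors; the Python equivalent raises jsonschema.exceptions.ValidationError, which you can catch. A minimal sketch (the inline schema here is invented for brevity; in practice you would load user_schema_create_params.json as shown above):

```python
import jsonschema

# Hypothetical inline schema for brevity
schema = {
    "type": "object",
    "properties": {"email": {"type": "string", "maxLength": 50}},
    "required": ["email"],
}

def check(document):
    try:
        jsonschema.validate(document, schema)
        return "The document is valid"
    except jsonschema.exceptions.ValidationError as err:
        # err.message describes the first failing rule
        return "The document is not valid: " + err.message

print(check({"email": "mail@example.com"}))
print(check({"name": "John Doe"}))  # "email" is required but missing
```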

Build Performance Testing Tools That Focus on Concurrency & Speed

Performance Testing is a crucial step before a product is released to production. If you are a backend engineer, it's even more crucial, because the API is the engine of your product: if something bad happens in the API, the frontend (web or mobile app) will break, because everything is called through the API.

Performance Testing measures how well your API (and the infrastructure behind it) behaves under a certain number of simultaneous users, as during peak hours or peak season. For engineers it's very important, because it gives them a high-level picture of their code's performance in a "real" environment; their "local" environment is effectively "single user" mode. For example, you can discover problems like memory leaks during performance testing.

Why reinvent the wheel?

There are many performance testing tools available out there, with ab (Apache Benchmark) & JMeter being the most popular. So why create another one?

One of our main concerns is the execution speed of the performance tests, because it affects CI/CD build time. Faster is better.

Gubrak

Gubrak is written in Go, which is known for concurrency & speed through goroutines & channels, so there is no doubt about its performance: it's very fast.

Another cool feature of Gubrak is the ability to load configuration such as the request URL, headers & payload from a JSON file, so your command line isn't overwhelmed with configuration text. For example, let's create config.json:

{
  "url": "http://example.com",
  "headers": {
      "Content-Type": "application/json",
      "Accept": "application/json",
      "Authorization": "Basic YOUR_BASIC_AUTH"
  },
  "payload": ""
}

And then you can run it like this:

$ gubrak -m get -c config.json

Installation

Installing Gubrak is pretty easy; you can install it with Homebrew or from source.

With Homebrew:

$ brew install Bhinneka/tool/gubrak

From source:

$ go get github.com/Bhinneka/gubrak
$ go install github.com/Bhinneka/gubrak/cmd/gubrak

Benchmark

We said we wanted to build a performance testing tool focused on concurrency & speed, so let's benchmark how fast it is compared to another tool, in this case ab.

For the test scenario, we'll reuse the setup from our previous post, "Migrasi dari PHP (Lumen) ke Golang (Echo) – Part 1".

  • There will be 2 services: one with PHP Lumen, the other with Go Echo
  • Concurrency values are 10, 100, and 1000
  • The metric is Time taken for tests, in seconds

PHP Lumen

         c=10    c=100   c=1000
ab       0.277   2.668   N/A
Gubrak   0.238   2.496   N/A

Go Echo

         c=10    c=100   c=1000
ab       0.034   0.250   2.107
Gubrak   0.026   0.176   2.098

As you can see, Gubrak is slightly faster than ab; the maximum gap is about a hundred milliseconds. Not much, but it may become a significant factor if you run performance tests against dozens of endpoints.

Gubrak was initiated by our backend developer Wuriyanto. It's experimental & still under development, so you are welcome to give it a try; feedback is really needed to improve its quality. You can visit the Gubrak repository on GitHub.

Migration from PHP (Lumen) to Golang (Echo) – Part 1

Product Service Benchmark (PHP Lumen vs Go Echo) using JMeter

When I first joined Bhinneka.com in 2017, my first task was to build a microservice for the product catalog. This service is arguably very crucial, because many other services depend on it, so it needed to be a well-built system with high availability & performance.

However, given the limited time and resources we had at the time, we decided to use a language that is easy and widely understood, PHP, with a small framework (often called a microframework): Lumen. Lumen is a microframework made by the Laravel team, which is well known for its ease of use; the difference is that Lumen is very minimalist and built specifically for APIs.

For the database we decided on ArangoDB, a multi-model NoSQL database that supports graph, document-based, and key-value models. It was arguably a "bold" decision, because at the time ArangoDB had far fewer implementations than other NoSQL databases such as MongoDB or Neo4j. Thankfully, it has now been running in production for 2 years and we are sticking with ArangoDB, despite a few problems we have faced. Perhaps I will share more about ArangoDB next time.

Back to PHP Lumen: it has also been running in production for 2 years. Initially we had no trouble, because the product service was only used by internal users, so traffic was minimal. But over time, with new services appearing, such as the https://www.bhinneka.com website and https://b2b.id, plus other services that depend on this product service, its traffic kept growing. One mitigation we used was caching with Redis. So far there have been no serious problems, but we were fairly sure it was only a matter of time before a "disaster" happened.

Want to know how Bhinneka.com migrated from a Monolith to a Microservices architecture? Watch the video from one of our Senior Software Developers at Tokopedia Tech a Break #29.

Besides performance, we had also standardized on Go & Python as the languages for our backend services. Apart from PHP, we previously had a backend service written in Kotlin, but we successfully migrated (rewrote) it to Python, leaving one service still using a language outside the standard we had set.

Benchmark

One way to see the impact of a migration/rewrite is to run a benchmark.

The environment I used is as follows:

  • MacOS version 10.13.6
  • 2.6 GHz Intel Core i7 processor
  • 16 GB 2133 MHz LPDDR3 memory
  • Docker version 17.03
  • ArangoDB version 3.2.9

PHP (Lumen)

PHP version 7.2.10 and Lumen PHP Framework version 5.4

The service will run using PHP's built-in web server, with the following command:

$ php -S localhost:3010 -t ./public

Go (Echo)

Go version 1.11.4 and Echo Go Framework version 3.2.6

The service will run with the following command:

$ go run main.go

Tools

The tools used for the benchmark are ab (Apache Benchmark) and JMeter.

Metrics

The metrics measured are as follows:

  1. Average Response Time (in ms); lower is better.
  2. Requests Per Second (RPS); higher is better.
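
The two metrics are related views of the same measurement. As a sketch (the numbers here are hypothetical): with n requests completed in t seconds, RPS = n / t, and the average response time per request stream is roughly 1000 / RPS milliseconds.

```python
# RPS and average response time from one hypothetical run
n, t = 100, 2.5            # hypothetical: 100 requests finish in 2.5 s
rps = n / t                # 40.0 requests per second
avg_ms = 1000.0 / rps      # 25.0 ms average response time
print(rps, avg_ms)
```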

Scenarios

There will be 3 benchmark scenarios, each involving 2 variables:

  • n = total number of requests
  • c = number of concurrent requests, i.e. requests performed at the same time

The scenarios are as follows:

  1. Total requests (users) is 10, with each user accessing one after another (n=10, c=1)
  2. Total requests is 100 with 10 users, where those 10 users access at the same time (n=100, c=10)
  3. Total requests is 1000, with 100 users accessing at the same time (n=1000, c=100)
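
The n/c parameters above can be sketched as a toy load generator (hypothetical; the request function is a stand-in, no real HTTP is made):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    # stand-in for an HTTP call; a real benchmark would hit the service here
    time.sleep(0.001)
    return 200

def run(n, c):
    # issue n total requests through a pool of c concurrent workers,
    # mirroring ab's -n / -c parameters
    start = time.time()
    with ThreadPoolExecutor(max_workers=c) as pool:
        statuses = list(pool.map(fake_request, range(n)))
    return statuses, time.time() - start

statuses, elapsed = run(n=100, c=10)
print(len(statuses), elapsed)
```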

Scenario #1: (n=10, c=1)

            Average Response Time (in ms)   Requests Per Second (RPS)
PHP Lumen   25.581                          39.09
Go Echo     4.879                           204.98

In scenario #1, with no users accessing concurrently, Go Echo is 5x faster than PHP Lumen.

Scenario #2: (n=100, c=10)

            Average Response Time (in ms)   Requests Per Second (RPS)
PHP Lumen   24.308                          41.14
Go Echo     1.168                           856.17

In scenario #2, with 10 users accessing concurrently, Go Echo is 20x faster than PHP Lumen.

Scenario #3: (n=1000, c=100)

            Average Response Time (in ms)   Requests Per Second (RPS)
PHP Lumen   24.149                          41.41
Go Echo     0.965                           1035.91

In scenario #3, with 100 users accessing concurrently, Go Echo is 25x faster than PHP Lumen.

Conclusion

The higher the number of concurrent requests, the further Go with the Echo framework pulls ahead of PHP with the Lumen framework; in our benchmark it was up to 25x faster.

Limitations

These benchmark results may well differ from other benchmarks, because the product service we used is a finished product that has undergone modifications (such as libraries loaded at runtime) and other changes.

In addition, we ran this benchmark in a local environment, which differs from the production environment.

So how does it do in the production environment? Stay tuned for Part 2!