Elasticsearch Lua II

This post is about my GSoC project, which I worked on during the summer of 2016. I worked under the LabLua organization on adding a test suite and improving the documentation for elasticsearch-lua.

Introduction

Elasticsearch is a distributed, scalable, full-text search engine based on Lucene. It provides an HTTP web interface and works with JSON documents. It is presently ranked first in the ‘Search engines’ category.

elasticsearch-lua is an Elasticsearch client for the Lua programming language that wraps Elasticsearch's REST interface. I developed it as part of GSoC 2015 with my mentor, Pablo Musa.
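As a rough illustration of what wrapping the REST interface involves, the sketch below turns an API call's arguments into a target URL, the same kind of logic in which the unit tests later found bugs. It is written in Python rather than Lua purely for illustration; `build_url` and its parameters are hypothetical and not part of elasticsearch-lua, and the `/index/type/id` path layout is assumed from the Elasticsearch 2.x document API.

```python
from urllib.parse import quote, urlencode


def build_url(host, index=None, doc_type=None, doc_id=None, endpoint=None, params=None):
    """Hypothetical helper: assemble an Elasticsearch REST URL such as /index/type/id."""
    # Keep only the path segments that were supplied, percent-encoding each one
    # so that characters like '/' or ' ' cannot corrupt the path.
    segments = [quote(str(s), safe="") for s in (index, doc_type, doc_id) if s is not None]
    if endpoint:
        segments.append(endpoint)  # e.g. "_search" or "_bulk"
    url = host.rstrip("/") + "/" + "/".join(segments)
    if params:
        # Sort query parameters so the generated URL is deterministic (handy in tests).
        url += "?" + urlencode(sorted(params.items()))
    return url


print(build_url("http://localhost:9200", "github", "event", "42"))
# http://localhost:9200/github/event/42
print(build_url("http://localhost:9200", "github", endpoint="_search", params={"q": "type:PushEvent"}))
# http://localhost:9200/github/_search?q=type%3APushEvent
```

Percent-encoding each segment and sorting the query string keep the output predictable, which is what makes a function like this easy to pin down in a unit test.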

My GSoC project this year was entitled ‘Improve elasticsearch-lua tests and builds’ and was a continuation of the work I had done the previous year. Apart from adding a test suite to elasticsearch-lua and making it more robust, I also decided to work on its documentation.

Test suite for elasticsearch-lua

The tests are divided into unit, integration and stress tests. All of them run on Lua 5.1, 5.2, 5.3 and LuaJIT 2.0. Code coverage is measured for the unit and integration tests and tracked with Coveralls. At the time of writing, about 91% of the code is covered by tests.

Unit Tests

elasticsearch-lua consists of many modules, and each of them has a corresponding unit test in the tests/ directory. Care was taken to test all the endpoints extensively. Some key points to note:

Some modules were ‘mocked’ to intercept external calls.
Not only the return values (success or failure) but every internal parameter was ‘deep’ checked. A deep check compares each nested parameter recursively, which matters because a Lua table may contain other tables.
Travis CI was chosen for continuous integration. Every time code is pushed, a build is triggered on Travis and the unit tests are run. The success or failure status is reported back.
Running the tests uncovered a number of bugs, for example in generating the target URL for an endpoint and in listing the source files in the rockspec file. All of them were fixed.
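The mocking and deep-checking ideas above can be sketched as follows. This is a hypothetical Python illustration, not the project's Lua test code: `MockTransport` stands in for the HTTP layer and records every request instead of sending it, `index_document` is an invented client function, and `deep_equal` compares nested structures recursively. In Lua the recursion is essential, because `==` on two tables compares identity rather than contents; Python's `==` already compares dicts and lists deeply, so the helper is spelled out here only to show the technique.

```python
def deep_equal(a, b):
    """Recursively compare nested dicts/lists, like a deep check of Lua tables."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    # Leaf values must match in both type and value (so True does not equal 1).
    return type(a) == type(b) and a == b


class MockTransport:
    """Stands in for the HTTP layer: records each request and returns a canned response."""

    def __init__(self, response):
        self.response = response
        self.calls = []

    def request(self, method, path, params=None, body=None):
        self.calls.append({"method": method, "path": path, "params": params or {}, "body": body})
        return self.response


def index_document(transport, index, doc_type, doc_id, doc):
    """Hypothetical client function under test: index a document under a given id."""
    return transport.request("PUT", f"/{index}/{doc_type}/{doc_id}", body=doc)


# Usage: intercept the external call, then deep-check both the result and the request.
mock = MockTransport({"created": True})
doc = {"actor": {"login": "octocat"}, "payload": {"commits": [{"sha": "abc"}]}}
result = index_document(mock, "github", "event", "1", doc)
assert deep_equal(result, {"created": True})
assert deep_equal(mock.calls[0], {"method": "PUT", "path": "/github/event/1", "params": {}, "body": doc})
```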

The diff of changes due to unit tests can be seen here.

Integration Tests

Apart from testing every component individually, it is equally important to verify that the components work correctly together. To make elasticsearch-lua robust, it was necessary to add some integration tests.

Integration tests involve calling an API function in a real environment and checking the parameters at every step. Wrappers for some API functions were written to avoid repeated code.
We believe that using real data for integration tests is always good practice. The test dataset should also stress the system a bit, so it should not be too small. Therefore, we opted to use part of the data freely available from www.githubarchive.org. A mirror is maintained here. Because of its size, the dataset is not part of the main repository; it is downloaded on the fly when the tests run on Travis.
Common operations (such as search, index, get, delete and bulk) are tested in a single run, intermixed with one another.
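Since the githubarchive data consists of JSON event records, a natural way to feed many of them to Elasticsearch at once is the bulk API, whose body is newline-delimited JSON: an action line, followed by a source line for actions that carry a document, and a trailing newline at the end. The sketch below builds such a body with index and delete actions intermixed. It is a Python illustration only; `build_bulk_body`, the index and type names and the event fields are assumptions, not taken from the test suite.

```python
import json


def build_bulk_body(actions):
    """Serialize (action, metadata, source) tuples into the bulk API's NDJSON format."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))  # action/metadata line
        if action in ("index", "create", "update"):
            lines.append(json.dumps(source))  # document line; "delete" has none
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


events = [
    ("index", {"_index": "github", "_type": "event", "_id": "1"}, {"type": "PushEvent"}),
    ("delete", {"_index": "github", "_type": "event", "_id": "2"}, None),
    ("index", {"_index": "github", "_type": "event", "_id": "3"}, {"type": "ForkEvent"}),
]
print(build_bulk_body(events), end="")
```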

The diff of changes due to integration tests can be seen here.

Stress Tests

Stress tests probe the limits of elasticsearch-lua. With these tests in place, the client can demonstrate its stability under heavy load.

Because stress tests can take a few hours to finish, a separate framework was designed for them. In short, every successful build (unit + integration tests) triggers a new build that runs the stress tests, provided that no such build is already running.
The status of stress tests is reported through a separate badge in the README.
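The triggering rule can be pictured as a tiny state machine. The Python sketch below illustrates only the rule, not the actual Travis setup; `StressScheduler` and its methods are hypothetical, and in a real pipeline the "already running" check would query the CI service rather than a local flag.

```python
from dataclasses import dataclass


@dataclass
class StressScheduler:
    """Hypothetical model of when a finished CI build should start a stress-test build."""

    stress_running: bool = False

    def on_build_finished(self, unit_and_integration_passed: bool) -> bool:
        # Only a green build may trigger stress tests, and never two at once.
        if unit_and_integration_passed and not self.stress_running:
            self.stress_running = True
            return True
        return False

    def on_stress_finished(self) -> None:
        # Once the long-running stress build ends, the next green build may trigger again.
        self.stress_running = False


scheduler = StressScheduler()
print(scheduler.on_build_finished(True))   # True: a stress build starts
print(scheduler.on_build_finished(True))   # False: one is already running
scheduler.on_stress_finished()
print(scheduler.on_build_finished(False))  # False: failed builds never trigger
```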

The diff of changes due to stress tests can be seen here.

Documentation

Good documentation is very important for any library. It helps developers understand its functionality without having to read the code. Moreover, it aids adoption, since new developers can use it as a guide to get started. Although documentation was not initially a task of the GSoC project, once I realized its importance I decided to invest a lot of time in it and added it to the GSoC project timeline.

Guides

The guides consist of documents and tutorials that help developers to install, use and customize elasticsearch-lua. The guides explain the most frequently used functionalities along with some internals. These pages are hosted here.

API Documentation

The API Documentation lists all the functions provided by elasticsearch-lua, along with the parameters each function accepts. The API documentation is published here.

The diff of changes pertaining to documentation can be seen here.

Additional tasks (Not part of GSoC)

Apart from the tasks mentioned above, I worked on the following as well:

Luaver

While developing the test suite for elasticsearch-lua, I frequently had to switch between different Lua versions. Switching is not simple, and I often ran into the following issues:

Building each Lua version took effort: downloading the source, unpacking it, installing it and resolving any dependencies that came up. Also, the previous version had to be removed completely to avoid any ambiguity.
A LuaRocks installation depends on the Lua version, so switching Lua versions can break the installed rocks.
To get around these issues, I resorted to workarounds such as editing the source code of some installed rocks.
Sometimes these code changes broke the entire rock. In such cases, I had to remove all existing rocks, rebuild LuaRocks and then reinstall the rocks I needed.

As I was already familiar with Node.js and Ruby and understood how nvm and rvm address such problems, I decided to create a similar tool for Lua, and that is how luaver was born.
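Tools in the nvm/rvm family typically keep each version in its own directory and switch by rewriting `PATH`. The Python sketch below shows that idea under an assumed layout (`<root>/lua/<version>/bin` plus a separate `<root>/luarocks/<version>/bin` per Lua version); the layout and `switch_lua_version` are hypothetical and are not how luaver is implemented.

```python
import os


def switch_lua_version(path, root, current, target):
    """Return a PATH string with `target`'s bin directories replacing `current`'s."""

    def bins(version):
        # Hypothetical layout: one Lua build and one LuaRocks tree per Lua version.
        return [
            os.path.join(root, "lua", version, "bin"),
            os.path.join(root, "luarocks", version, "bin"),
        ]

    stale = set(bins(current)) if current else set()
    # Drop the old version's directories (and empty entries), keep everything else.
    entries = [p for p in path.split(os.pathsep) if p and p not in stale]
    # Prepend the new version so its `lua` and `luarocks` are found first.
    return os.pathsep.join(bins(target) + entries)


root = os.path.join(os.path.expanduser("~"), ".luaver")
new_path = switch_lua_version(os.environ.get("PATH", ""), root, None, "5.3.3")
print(new_path.split(os.pathsep)[:2])
```

Giving every Lua version its own LuaRocks tree is what avoids the broken-rocks problem described above: switching versions changes which tree is on `PATH` instead of reusing rocks built for another interpreter.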

I also wrote a separate blog post about luaver, and you can support the project here. Initially, I didn’t expect to spend much time on it and figured that I could work on GSoC and luaver simultaneously. However, at some point I got so involved in luaver that I fell one week behind the timeline I had proposed for GSoC. Nevertheless, I soon caught up.

Updating elasticsearch-lua

It is important that the client implements all the features provided by Elasticsearch. Elasticsearch is also evolving quickly and releasing at a fast pace, so clients need to stay up to date. Some features were missing, and the client was at version 1.6 while Elasticsearch was at 2.3. Therefore, I decided to update the existing features and implement some of the missing ones.

Benefits of working on the same project for two consecutive years

I had written the client myself, so the codebase was already at my fingertips. I could spend my time working on it rather than getting familiar with the code.
I wanted to consolidate the client further and make it stable. During the rest of the year I couldn't find much time to work on it full-time, so Google Summer of Code offered a nice incentive.
I had already worked with the Lua community. Being in a familiar environment, I was able to work and think freely. luaver was created to benefit the open source Lua community; if this had been my first time, I wouldn't even have thought about developing it.

Find me on GitHub and Twitter