Learning Equality

Learning Equality is committed to enabling every person in the world to realize their right to a quality education by enabling teaching and learning with technology, without the Internet.

Contributions to Kolibri for Google Summer of Code 2021

Learning Equality participated as a mentor in the 2021 Google Summer of Code program, a global initiative focused on bringing more student developers into open source software development.

This guest blog post, by GSoC student Vivek Agrawal, details the experience of participating in the program and the contributions made to Kolibri.

Overview: Supporting the Kolibri backend Tasks system

As part of the Google Summer of Code (GSoC) program, I worked on Learning Equality’s Kolibri Learning Platform this summer. I contributed to Kolibri’s backend Tasks system, an asynchronous task queue built in pure Python that processes time-consuming tasks via worker threads or processes.

The primary pain point of the existing Tasks system was that adding a new category of jobs to the Tasks backend required modifying the core Tasks API. My project aimed to address this exact pain point.

Here are all the pull requests that I created this summer to implement my project.

The Starting Point

I first applied to GSoC in 2020. The seed for Google Summer of Code was sown when one of my friends studying at National Institute of Technology, Raipur, India told me about the program.

I was in my first semester at the time and new to programming. I looked the program up online and found Google Summer of Code to be an opportunity that closely aligned with my career interests and development.

I decided to apply for GSoC 2020 and was unfortunately not selected, but I was motivated to continue building my developer skills and apply again the next year.

Then came March 2021, when the organizations list for GSoC 2021 was announced. In the intervening year, I had worked hard to build up my programming skills and a strong enough foundation to contribute successfully to the organizations that were listed.

I first shortlisted a few organizations like Accord, Chromium, Joplin and Wikimedia. But the lightning moment came when I looked at Learning Equality’s GSoC organization page. The values and vision of Learning Equality matched exactly with my passion for providing education to low-resource communities. I decided that if I were to apply for GSoC 2021, it would only be for Learning Equality.

Understanding the Problem Statement

The ideas list page outlined ideas for varying individual capacities. Since I was more interested in servers and databases, I chose the idea titled “Tasks API is messy and difficult to work with.”

I made it clear to myself that understanding the problem thoroughly was the most important part: if I misinterpreted the problem, it would be impossible to write a good solution.

I started by reading the Kolibri developer documentation and setting up the developer environment. I then continued with the developer documentation to get an overview of the Kolibri architecture.

I searched GitHub for issues related to the Tasks backend and found some low-hanging ones that I could help with. Writing code to solve those initial issues gave me a solid understanding of the Tasks backend.

I submitted these pull requests when I was trying to get my head around the Tasks backend:

After submitting these pull requests, there was still a remaining question: What problems were the developers actually facing with the Tasks backend?

Reaching out on the Learning Equality #GSoC Slack channel to discuss this question with the Co-Founder and Product Lead, Richard Tibbles, helped further my understanding of the challenges and where I could best contribute to the team.

Following our discussion, I wrote a detailed proposal and submitted it for review. I connected with Richard again via Slack to get feedback on the proposal and revised it before submitting a final version.

Getting Selected and the Community Bonding Period

After submitting the proposal, I didn’t settle. With one month remaining before the result day, I kept deepening my understanding of the Kolibri backend, and of the Tasks backend in particular.

I submitted these pull requests to fix issues related to the Tasks backend:

In the one month period before the results, I learned about software testing and I did a detailed study on threads, processes and concurrency at the operating system level.

The day that all of us GSoC participants were waiting for finally arrived: May 17, 2021. Around 11:30 PM, I received an email from Google stating that I had been selected as a Student Developer at Learning Equality for the summer!

Wow — what a great moment it was for me and for my family & friends as well! This marked the start of the Community Bonding Period. I logged into Slack and a meeting was scheduled for the next day with my mentors, Richard Tibbles and Jonathan Boiser.

In the meeting, I was introduced to my mentors and we then discussed a plan for refactoring the Tasks backend to address the developer pain points. The discussion gave a very clear perspective of what needed to be done.

In the remainder of the Community Bonding Period, I continued learning about topics that I had less experience with: Django Rest Framework and the concurrent.futures library. I also made a roadmap for myself for the GSoC work period.

Completed Work Items

1. Registering tasks

The first challenge was to enable Kolibri plugins to register tasks on the Tasks backend for later enqueuing and monitoring of those task functions. We decided to implement decorator-based registration inspired by Kolibri’s own @version_upgrade.

Kolibri plugins get added to the INSTALLED_APPS Django setting, so to register tasks on the Tasks backend, we loop through INSTALLED_APPS and import each app’s tasks.py module programmatically.

This, in effect, runs the tasks.py module, and hence the @register_task decorator gets to run. I implemented the @register_task decorator that, when run, creates an object of the RegisteredJob class, which gets stored in the JobRegistry for later querying.
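The discovery step can be sketched roughly as follows. This is a minimal illustration, not Kolibri's actual code: import_tasks_modules is a hypothetical helper name, and a plain list of app names stands in for Django's INSTALLED_APPS setting.

```python
import importlib


def import_tasks_modules(app_names):
    """Import `<app>.tasks` for every app that ships one.

    Returns the dotted names of the modules that were imported.
    `app_names` is a stand-in for Django's INSTALLED_APPS (assumption).
    """
    imported = []
    for app in app_names:
        module_name = f"{app}.tasks"
        try:
            # Importing the module executes it, so any decorators
            # at module level run as a side effect.
            importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            # Skip apps that simply have no tasks module, but surface
            # import errors raised from inside an existing tasks.py.
            if exc.name != module_name:
                raise
            continue
        imported.append(module_name)
    return imported


# The standard library's json package has no `tasks` submodule.
print(import_tasks_modules(["json"]))  # []
```

Checking exc.name keeps a genuinely broken tasks.py from being silently ignored, which would otherwise make a missing task hard to diagnose.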

For example, to register a function add, we would write:

@register_task
def add(x, y):
    return x + y

The enqueue, enqueue_at and enqueue_in methods of the RegisteredJob class are bound to the decorated function. So if a plugin developer wants to enqueue this function from Python backend code, they can write add.enqueue(4, 2), which enqueues the add function with arguments 4 and 2.
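To make the mechanics concrete, here is a simplified, self-contained sketch of how such a decorator could work. It is an assumption-laden toy, not the code from the pull request: the JobRegistry here is a plain dict, the queue is an in-memory deque named pending, and only enqueue is shown.

```python
from collections import deque

JobRegistry = {}   # toy stand-in: dotted path -> RegisteredJob
pending = deque()  # toy in-memory queue (assumption, not Kolibri's storage)


class RegisteredJob:
    def __init__(self, func, validator=None, priority="regular"):
        self.func = func
        self.validator = validator
        self.priority = priority

    def enqueue(self, *args, **kwargs):
        # A real backend would persist the job; here we just queue it.
        pending.append((self.func, args, kwargs))


def register_task(func=None, *, validator=None, priority="regular"):
    """Usable both as @register_task and @register_task(validator=...)."""
    def decorate(f):
        job = RegisteredJob(f, validator=validator, priority=priority)
        JobRegistry[f"{f.__module__}.{f.__name__}"] = job
        f.enqueue = job.enqueue  # bind the method onto the function
        return f

    if func is None:
        return decorate   # called with keyword arguments
    return decorate(func)  # used as a bare decorator


@register_task
def add(x, y):
    return x + y


add.enqueue(4, 2)
func, args, kwargs = pending.popleft()
print(func(*args, **kwargs))  # 6
```

Returning the original function (rather than a wrapper) means add can still be called directly, while add.enqueue defers the call.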

We can pass a validator function as a parameter to the decorator. When the frontend client sends an API request for enqueuing this task, the validator runs within the HTTP request / response cycle. If the validator doesn’t raise any exception, that means we are good to go with enqueuing the task.

For example, to register a function add and its validator add_validator, we would write:

def add_validator(request):
    assert isinstance(request.data["x"], int)
    assert isinstance(request.data["y"], int)


@register_task(validator=add_validator)
def add(x, y):
    return x + y

Pull request #8142 implemented this whole functionality.

2. Enqueuing any type of task via an API endpoint

With task registration in place, we needed an API endpoint for enqueuing tasks of any type. The POST /api/tasks/tasks/ API endpoint will be consumed by our frontend client to enqueue tasks.

The POST payload for this endpoint expects a task parameter that should contain a dotted path to the task function decorated with @register_task. Now, if a validator was assigned to the task, then that validator runs with request as its argument and the return value of the validator is then passed to the task function as keyword arguments.

If a validator was not assigned to the task, then we pass everything that came with the request payload, except task, as keyword arguments to the task function.
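The dispatch rule described above can be sketched without Django Rest Framework. Everything here is illustrative: the registry dict, the resolve_call helper, the demo.tasks.* paths, and the SimpleNamespace stand-in for the request object are all assumptions. The validator in this sketch returns a dict so that its return value can be used as keyword arguments.

```python
from types import SimpleNamespace

registry = {}  # hypothetical: dotted path -> (task function, validator or None)


def add_validator(request):
    x, y = request.data["x"], request.data["y"]
    if not (isinstance(x, int) and isinstance(y, int)):
        raise TypeError("x and y must be integers")
    # The returned dict becomes the task's keyword arguments.
    return {"x": x, "y": y}


def add(x, y):
    return x + y


registry["demo.tasks.add"] = (add, add_validator)
registry["demo.tasks.add_unchecked"] = (add, None)


def resolve_call(request):
    """Return (task function, keyword arguments) for an enqueue request."""
    data = dict(request.data)
    task_path = data.pop("task")
    func, validator = registry[task_path]
    if validator is not None:
        kwargs = validator(request)
    else:
        kwargs = data  # everything in the payload except "task"
    return func, kwargs


request = SimpleNamespace(data={"task": "demo.tasks.add", "x": 4, "y": 2})
func, kwargs = resolve_call(request)
print(func(**kwargs))  # 6
```

An unknown task path raises KeyError in this sketch; a real endpoint would presumably translate that into an HTTP error response.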

Pull request #8186 implemented this functionality.

3. Single worker pool

The existing task worker model was strictly tied to three queues. Each worker group would only look for tasks in its assigned queue. This model restricted plugin developers to assigning tasks to one of those three queues.

The @register_task decorator accepts priority as one of its parameters, which can be either regular or high. Tasks with regular priority can wait for some time before they start executing, while high priority is for tasks that should execute as soon as possible.

I implemented a priority-based single worker pool. We have two groups of workers: regular and high. The high workers come into action only when all regular workers are busy and we still have jobs with high priority.

If there are fewer workers running than there are regular workers, we look first for jobs with high priority. If we find one, we run it; otherwise, we look for jobs with regular priority and run those if found.

If all regular workers are busy, then the remaining workers only look for high priority jobs. If we find one, we run it. This algorithm makes sure high priority jobs don’t need to wait for their execution.
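The selection rule in the last two paragraphs can be expressed as a small function. This is a sketch under stated assumptions, not the worker pool from the pull request: pick_next_job, its parameters, and the job names are hypothetical, and jobs are plain (priority, name) tuples in arrival order.

```python
def pick_next_job(jobs, running, regular_workers, max_workers):
    """Pop and return the next job to run, or None if nothing should start.

    jobs: list of (priority, name) tuples in arrival order.
    running: number of workers currently busy.
    regular_workers: size of the regular group.
    max_workers: regular workers plus the reserved high-priority workers.
    """
    def pop_first(priority):
        for index, job in enumerate(jobs):
            if job[0] == priority:
                return jobs.pop(index)
        return None

    if running >= max_workers:
        return None  # every worker is busy
    if running < regular_workers:
        # A regular slot is free: prefer high priority, fall back to regular.
        return pop_first("high") or pop_first("regular")
    # Only the reserved high-priority workers are left.
    return pop_first("high")


jobs = [("regular", "export"), ("high", "sync"), ("regular", "import")]
print(pick_next_job(jobs, running=0, regular_workers=2, max_workers=3))  # ('high', 'sync')
print(pick_next_job(jobs, running=1, regular_workers=2, max_workers=3))  # ('regular', 'export')
print(pick_next_job(jobs, running=2, regular_workers=2, max_workers=3))  # None
```

In the last call a regular job is still waiting, but it is left in the queue because the only free worker is reserved for high priority jobs.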

Pull request #8299 implemented this functionality.

4. Refactoring the Tasks API endpoints

The new architecture, built around the single worker pool and decorator-based registration, enabled refactoring the Tasks management API endpoint methods like list, retrieve and restarttask.

The @register_task decorator accepts permission_classes as one of its parameters. The permission_classes parameter takes in Django Rest Framework’s permission classes.

When the user sends a request to POST /api/tasks/tasks/, we check whether the user is authorized to proceed with the help of permission_classes. All other task API endpoint methods have been refactored to use the same permission_classes to authorize the request.
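As a rough illustration of the authorization check, the sketch below uses stand-in classes that mirror the has_permission(request, view) method of Django Rest Framework permission classes, so it runs without DRF installed. The IsSuperuser class, the is_authorized helper, and the request objects are all hypothetical.

```python
from types import SimpleNamespace


class IsSuperuser:
    """Stand-in with the same has_permission(request, view) shape as DRF permissions."""

    def has_permission(self, request, view):
        return bool(request.user.is_superuser)


def is_authorized(permission_classes, request, view=None):
    # Every permission class attached to the task must grant access.
    return all(cls().has_permission(request, view) for cls in permission_classes)


admin = SimpleNamespace(user=SimpleNamespace(is_superuser=True))
learner = SimpleNamespace(user=SimpleNamespace(is_superuser=False))
print(is_authorized([IsSuperuser], admin))    # True
print(is_authorized([IsSuperuser], learner))  # False
```

Storing the classes on the registered task lets every endpoint method reuse the same check for that task.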

Pull request #8303 implemented this functionality.

Work In Progress Items

1. API documentation

The documentation for the new tasks backend architecture and consumable APIs is a work in progress.

The draft pull request #8336 aims to address this.

2. Decoupling the monolithic tasks module

The tasks in the monolithic kolibri.core.tasks.api module need to be moved into their respective Django apps.

Completing this work might take some time as it requires a lot of manual testing. I’ll continue to push commits to the draft pull request #8269 to get this done.

The Future

1. Going from Pickle to JSON

We have observed some issues in the past with using pickle as our binary serialization method. Job objects are serialized via pickle, persisted in the database, and deserialized when retrieved. This has security implications too.

So to future-proof our tasks backend, our long-term goal is to move away from pickle completely and instead store job data as JSON.
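A minimal sketch of what JSON-based job storage could look like is shown below. The field names (func, args, kwargs), the helper names, and the demo.tasks.add path are assumptions for illustration, not a format Kolibri has committed to.

```python
import json


def serialize_job(func_path, args, kwargs):
    """Store a job as JSON: a dotted function path plus its arguments."""
    return json.dumps({"func": func_path, "args": list(args), "kwargs": kwargs})


def deserialize_job(blob):
    data = json.loads(blob)
    return data["func"], tuple(data["args"]), data["kwargs"]


blob = serialize_job("demo.tasks.add", (4, 2), {})
print(blob)                   # {"func": "demo.tasks.add", "args": [4, 2], "kwargs": {}}
print(deserialize_job(blob))  # ('demo.tasks.add', (4, 2), {})
```

json.loads only builds basic types such as dicts, lists, strings and numbers, whereas unpickling untrusted data can execute arbitrary code. The trade-off is that job arguments must themselves be JSON-serializable.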

2. Extending the create API method to handle scheduled tasks

The POST /api/tasks/tasks/ API endpoint implementation currently does not handle scheduling of tasks for enqueuing at some future point in time.

We should extend the API to cover this use case. It should be fairly easy to implement, given the current architecture.

Closing Thoughts

First of all, I would like to thank Sir Richard Tibbles with all my heart for his amazing guidance throughout the program. Without his guidance and ideas, I would not have been able to implement this project.

Around three months back, when I started my Google Summer of Code 2021 journey, I had almost zero practical experience with backend development. These three months of work on Learning Equality’s Kolibri platform have made me a very confident backend developer.

When I look back and see myself, I feel proud :)

I would also like to thank the whole Learning Equality team for creating an atmosphere that kept me relaxed throughout the program. I enjoyed every moment of it.

Thank you Learning Equality ❤

Cheers!


Written by Vivek Agrawal

Founder @ QA Rangers. Helping software companies' CTOs sleep without worrying about critical bugs.
