My Experience Contributing to Open Source

July 14, 2022

Why I will choose to contribute to open source in the future

Contributing to open source is a fun way to learn and improve your coding skills, and you get to do so by helping others. Open source communities are often welcoming to newcomers and encourage learning, which makes contributing an enjoyable experience.

A good way to decide where to contribute is to pick open source software that you use daily. If you already know the software, you can easily identify where help is needed and where you could add new and interesting features.

In this article I describe the contributions I made and, along the way, give a rough idea of the workflow of contributing to open source. I will focus on the following projects:

scrapy: Python web crawling and scraping framework.
jrnl: Text journal application for the command line.
polybar: A fast and easy-to-use tool for creating a status bar.

Scrapy | Add a new extension to check settings names

Scrapy is a very customizable tool; one of the main ways to customize it is through settings. Scrapy settings allow you to customize the behaviour of all Scrapy components, including the core, extensions, pipelines and spiders themselves. The infrastructure of the settings provides a global namespace of key-value mappings that the code can use to pull configuration values from.

So, for example, if `DNSCACHE_ENABLED` is set to True, Scrapy enables the in-memory DNS cache. Scrapy has a LOT of configurable settings, and their names are prone to typos. A misspelled setting name did not produce enough feedback for new users to discover this trivial error.

This contribution tries to solve that problem by creating a new extension. Initially, a linter was proposed in the issue, but after some discussion the consensus was that a runtime approach would be better. The extension finds unused settings by reading an attribute (`has_been_read`) in the settings dictionary and, when possible, suggests a replacement. Another advantage of the runtime approach is that it also finds unused settings that are not misspelled.
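The suggestion step can be illustrated with Python's standard `difflib` module. This is only a minimal sketch and not the code from the PR: the function name `suggest_setting`, the small `KNOWN_SETTINGS` set, and the cutoff value are assumptions made for illustration.

```python
import difflib

# Hypothetical subset of valid setting names, for illustration only.
KNOWN_SETTINGS = {
    "DNSCACHE_ENABLED",
    "DOWNLOAD_DELAY",
    "CONCURRENT_REQUESTS",
    "USER_AGENT",
}


def suggest_setting(name, known=KNOWN_SETTINGS, cutoff=0.8):
    """Return the closest known setting name, or None if nothing is similar enough."""
    if name in known:
        return None  # already a valid name, nothing to suggest
    matches = difflib.get_close_matches(name, sorted(known), n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Pretend these two settings were defined by the user but never read.
    for unused in ("DNSCACHE_ENABLE", "MY_CUSTOM_FLAG"):
        hint = suggest_setting(unused)
        if hint:
            print(f"Unused setting {unused!r}: did you mean {hint!r}?")
        else:
            print(f"Unused setting {unused!r}: no similar setting found")
```

A high cutoff keeps the hint conservative: a custom setting that merely happens to be unused gets reported without a misleading suggestion.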

PR: https://github.com/scrapy/scrapy/pull/4828

Scrapy | Add failed and success count stats to feed storage backends

Scrapy allows users to specify how to export extracted data (.xml, .json, .csv, etc.) and also where to save it (the local filesystem, S3, Google Cloud, an FTP server, standard output). However, those storage backends did not record performance information at the end of the run. A good place to save this information is the statistics that Scrapy generates during the run. These stats can be used to measure how the run performed, as they contain memory usage, the start and finish timestamps, etc.

The idea of this PR is to add new stats that let users know whether a storage backend had a problem while saving. For example, if a spider saves to S3 and to the local filesystem but the S3 credentials are wrong, these stats will be presented to the user:

{…
 'elapsed_time_seconds': 11.61577,
 'feedexport/failed_count/S3FeedStorage': 2,
 'feedexport/success_count/FileFeedStorage': 2,
 'finish_reason': 'finished',
…}
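How such counters could be produced is sketched below. This is an assumption-laden illustration rather than Scrapy's implementation: the two storage classes are stand-ins that only borrow their names from the stat keys above, `store_and_count` is a hypothetical helper, and a plain `Counter` plays the role of the stats collected during a run.

```python
from collections import Counter


class FileFeedStorage:
    """Stand-in backend whose store() always succeeds (illustrative only)."""

    def store(self, data):
        return len(data)


class S3FeedStorage:
    """Stand-in backend that fails, as it might with wrong credentials."""

    def store(self, data):
        raise PermissionError("invalid credentials")


def store_and_count(storage, data, stats):
    """Try to store data and bump a counter named after the backend class."""
    backend = type(storage).__name__
    try:
        storage.store(data)
    except Exception:
        stats[f"feedexport/failed_count/{backend}"] += 1
    else:
        stats[f"feedexport/success_count/{backend}"] += 1


stats = Counter()  # stand-in for the stats gathered during a run
for storage in (FileFeedStorage(), S3FeedStorage()):
    for _ in range(2):  # two feeds per backend, matching the counts shown above
        store_and_count(storage, b"exported items", stats)

print(dict(stats))
```

Including the backend class name in the key is what lets a single run report a failure for one destination and a success for another.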

PR: https://github.com/scrapy/scrapy/pull/4850

jrnl | Add default display format option to config file

jrnl supports a wide variety of formats (Markdown, JSON, YAML, etc.); however, the usability of this feature was not the best. For example, to print the last eight entries in Markdown format you have to use this command:

jrnl -8 --export md

That can be tedious for users who want Markdown output every time, because they must keep adding the `--export md` option. To avoid that annoyance, this contribution adds a new option to the configuration file, `display_format`, which can be set to any of the exporters that jrnl has.
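The precedence between the command-line option and the new configuration key can be pictured roughly as follows. This is a sketch under assumptions, not jrnl's actual code: `resolve_display_format` and `DEFAULT_FORMAT` are hypothetical names, and the config is modelled as a plain dictionary.

```python
DEFAULT_FORMAT = "pretty"  # hypothetical built-in fallback, not necessarily jrnl's


def resolve_display_format(cli_export, config):
    """Pick the output format: explicit CLI option first, then config, then default."""
    if cli_export:
        return cli_export
    return config.get("display_format") or DEFAULT_FORMAT


if __name__ == "__main__":
    config = {"display_format": "md"}  # as if read from the user's config file
    print(resolve_display_format(None, config))    # no option on the command line
    print(resolve_display_format("json", config))  # explicit option still wins
```

Letting the explicit option override the configured value keeps one-off exports possible without editing the config file.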

PR: https://github.com/jrnl-org/jrnl/pull/1050

Polybar | Remove upper bound to get_volume

Polybar has a feature to show the system’s current volume. The problem is that it does not “recognize” when the volume goes beyond 100%. Sound systems such as PulseAudio make this possible, allowing you to raise the volume beyond 100%, up to 150%.

This minor feature only required no longer clamping the volume to the range [0, 100]. Maybe the “hardest” part was compiling the code and testing the changes.
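The effect of the clamp can be shown with a few lines of Python. Polybar itself is written in C++, so this is purely an illustration of the arithmetic and not the project's code; the function names are hypothetical, and keeping the lower bound at 0 is an assumption based on the PR title.

```python
def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(value, high))


def volume_clamped(raw_percent):
    """Behaviour described above: the reported volume never exceeds 100."""
    return clamp(raw_percent, 0, 100)


def volume_without_upper_bound(raw_percent):
    """Behaviour after the change: only the lower bound is enforced."""
    return max(0, raw_percent)


if __name__ == "__main__":
    for raw in (80, 100, 150):
        print(raw, volume_clamped(raw), volume_without_upper_bound(raw))
```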

PR: https://github.com/polybar/polybar/pull/2184

Conclusion

The knowledge gained from discussions with maintainers, the chance to get more involved with projects that interest me, and the welcoming communities are some of the major reasons I will keep contributing to open source in the future.

What are your thoughts on open source?

© 2022 Emptor.