GSoC 2017 - Checkstyle
Multi-thread mode for Java files processing

Overview


As noted in the Checkstyle GitHub repository, “Checkstyle is a development tool to help programmers write Java code that adheres to a coding standard”. It is highly extensible and customizable, allowing developers to write their own checks to improve code quality.

Project idea


During GSoC 2017 I have been working on implementing multi-thread support for the Checkstyle runner. The project consists of the following parts:

  1. Analyze which parts should be parallelized (you can see my analysis in this issue: #4354). According to this analysis, the most time-consuming part is the TreeWalker module.

  2. Implement CLI parameters to control the number of threads.

  3. Split all checks into three groups:

    1. Stateless checks. These checks can be used from separate threads at the same time.

    2. File-stateful checks. These checks contain file-related context and therefore cannot be shared between threads. Instead, they should be cloned for each thread, so that no resources are shared.

    3. Global-stateful checks. These checks must not be cloned, so the developer is responsible for making the check thread-safe.

  4. Implement the MT mode for the Checker module.

    The ST (single-thread) Checker processes all files in the same thread. In contrast, the MT (multi-thread) Checker processes files on a pool of threads.

  5. Implement the MT mode for the TreeWalker module.

    The MT Checker module works well when checking several files at once, but it does not help when a user wants to check a single file. For that case, the MT TreeWalker runs the TreeWalker checks on a thread pool.

  6. Prepare documentation for the MT Checkstyle mode.
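
The three check groups from step 3 can be illustrated with a minimal sketch. The marker annotations, check classes and the `instanceForThread` helper below are hypothetical names declared locally for illustration; they are not taken from the Checkstyle code base, and the real markers may look different.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class CheckGroupsSketch {

    // Hypothetical markers, declared locally for illustration only.
    @Retention(RetentionPolicy.RUNTIME) @interface Stateless { }
    @Retention(RetentionPolicy.RUNTIME) @interface FileStateful { }
    @Retention(RetentionPolicy.RUNTIME) @interface GlobalStateful { }

    interface Check { }

    /** No fields at all, so one instance can serve every thread. */
    @Stateless
    static class NoTabsCheck implements Check { }

    /** Keeps per-file context, so each thread needs its own copy. */
    @FileStateful
    static class LineCounterCheck implements Check {
        int linesSeen;
    }

    /** Shares state across files on purpose; its author must make it thread-safe. */
    @GlobalStateful
    static class TotalViolationsCheck implements Check {
        final AtomicInteger total = new AtomicInteger();
    }

    /** Returns the check a worker thread should use: a fresh copy or the shared one. */
    static Check instanceForThread(Check shared, Supplier<Check> factory) {
        if (shared.getClass().isAnnotationPresent(FileStateful.class)) {
            return factory.get();
        }
        return shared;
    }

    public static void main(String[] args) {
        Check counter = new LineCounterCheck();
        Check totals = new TotalViolationsCheck();
        // File-stateful checks are copied, global-stateful checks are shared.
        System.out.println(instanceForThread(counter, LineCounterCheck::new) != counter);
        System.out.println(instanceForThread(totals, TotalViolationsCheck::new) == totals);
    }
}
```

Only the file-stateful group is copied here; stateless and global-stateful checks are handed out as the same shared instance, which is why the author of a global-stateful check has to take care of synchronization.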

Project status


Merged Pull Requests

  1. Issue #4370, PR #4420, commit 036582

    This PR introduces new CLI parameters for enabling the MT mode.

  2. Issue #4472, PR #4595, commit a40685

    This issue blocked initial performance testing, so it was also resolved during GSoC.

  3. Issue #4700, PR #4710, commit 100309

    These changes were made to enable an IntelliJ inspection. Because they are related to concurrent modifications, they were also made as part of the GSoC project.

  4. Issue #4945, PR #4946, commit ff85e7

    This PR removes thread-unsafe context from the JavadocPackageCheck module. This is a part of redesigning checks for the MT mode.

  5. Issue #4927, PR #4928, commit 290b5b

    This PR removes thread-unsafe context from the SeverityLevelCounter module. This is a part of redesigning checks for the MT mode.

  6. Issue #4925, PR #4926, commit 29901f

    This PR removes thread-unsafe context from the AbstractJavadocCheck module. This is a part of redesigning checks for the MT mode.

  7. Issue #4932, PR #4933, commit 0ae5aa

    This PR removes thread-unsafe context from the AbstractJavadocCheck module. This is a part of redesigning checks for the MT mode.

Pull requests in review

  1. Issue #4883, PR #4898, commit 82eb7c

    These changes are closely related to all other components, because they introduce check markers that allow a developer to specify the module type: global-stateful module, file-stateful module or stateless module.

  2. Issue #4870, PR #4892, commit e40d87

    This PR splits the existing checks into three groups by marking each of them with one of the previously mentioned check module markers.

  3. Issue #4869, PR #4882, commit 35a324

    This PR introduces a class responsible for cloning checks for the Checker module. It is required because there are several file-stateful checks that cannot be used from separate threads at the same time. The CheckCloneService clones all file-stateful checks so that they can run on a pool of threads.

  4. Issue #4409, PR #4890, commit 95b10c

    This PR implements the MT mode for the Checker module. It is responsible for running checks on a pool of threads.

  5. Issue #4917, PR #4918, commit 8671e5

    This PR removes thread-unsafe context from the AbstractFileSetCheck module. This is a part of redesigning checks for the MT mode.

  6. Issue #4908, PR #4909, commit d1ad03

    This PR removes thread-unsafe context from the AbstractCheck module. This is a part of redesigning checks for the MT mode.
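
The idea behind the Checker-level pull requests above (clone file-stateful checks, then process files on a pool of threads) can be sketched as follows. This is only an illustration under stated assumptions: `LongLineCheck`, `copy` and `checkAll` are hypothetical names, files are represented as in-memory lists of lines, and the per-thread copy is obtained through a `ThreadLocal`, which is not necessarily how the CheckCloneService does it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MtCheckerSketch {

    /** Hypothetical file-stateful check: it keeps a per-file line counter. */
    static class LongLineCheck {
        private final int maxLength;
        private int lineNo; // per-file state, unsafe to share between threads

        LongLineCheck(int maxLength) {
            this.maxLength = maxLength;
        }

        /** Creates an independent copy with the same configuration. */
        LongLineCheck copy() {
            return new LongLineCheck(maxLength);
        }

        List<String> process(String fileName, List<String> lines) {
            lineNo = 0;
            List<String> violations = new ArrayList<>();
            for (String line : lines) {
                lineNo++;
                if (line.length() > maxLength) {
                    violations.add(fileName + ":" + lineNo + ": line too long");
                }
            }
            return violations;
        }
    }

    /** Checks every file on a fixed pool; each worker thread uses its own check copy. */
    static List<String> checkAll(Map<String, List<String>> files, int maxLength, int threads) {
        LongLineCheck prototype = new LongLineCheck(maxLength);
        ThreadLocal<LongLineCheck> perThread = ThreadLocal.withInitial(prototype::copy);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (Map.Entry<String, List<String>> file : files.entrySet()) {
                futures.add(pool.submit(
                        () -> perThread.get().process(file.getKey(), file.getValue())));
            }
            // Collect results in submission order so the report is deterministic.
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("A.java", Arrays.asList("short", "this line is far too long"));
        files.put("B.java", Arrays.asList("ok"));
        System.out.println(checkAll(files, 10, 2));
    }
}
```

Because every worker thread owns its copy of the check, the `lineNo` field is never touched by two threads at once, even though the check itself contains no synchronization.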

Pull requests in progress

  1. Issue #4957, PR #4958, commit f4fe52

    This PR implements the MT mode for the TreeWalker module.
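
For a single file, the parallelism moves from files to checks: several checks read the same syntax tree at the same time. The sketch below shows that idea under simplifying assumptions; `Node`, `count` and `walk` are hypothetical, the tree is immutable, and the checks are modeled as stateless functions, which is far simpler than the real TreeWalker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class MtTreeWalkerSketch {

    /** Hypothetical immutable AST node, safe to read from many threads. */
    static final class Node {
        final String type;
        final List<Node> children;

        Node(String type, Node... children) {
            this.type = type;
            this.children = Collections.unmodifiableList(Arrays.asList(children));
        }
    }

    /** Counts the nodes of the given type in the subtree rooted at node. */
    static int count(Node node, String type) {
        int total = node.type.equals(type) ? 1 : 0;
        for (Node child : node.children) {
            total += count(child, type);
        }
        return total;
    }

    /** Runs every check against the same tree on a thread pool. */
    static List<String> walk(Node root, List<Function<Node, String>> checks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (Function<Node, String> check : checks) {
                tasks.add(() -> check.apply(root));
            }
            // invokeAll returns the futures in the same order as the tasks.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Node root = new Node("CLASS", new Node("METHOD"), new Node("METHOD"), new Node("FIELD"));
        List<Function<Node, String>> checks = new ArrayList<>();
        checks.add(tree -> "methods=" + count(tree, "METHOD"));
        checks.add(tree -> "fields=" + count(tree, "FIELD"));
        System.out.println(walk(root, checks, 2));
    }
}
```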

Issues to be done

  1. Issue #4577 (Add multi thread mode to ANT task)

    The MT Ant properties were moved to a separate issue while resolving issue #4370. This issue has to be implemented once the main work is done.

  2. Issue #4547 (Add MT properties to documentation)

    Here is a documentation draft: Google Docs

  3. Issue #4894 (Prepare performance report for the MT mode)

    The initial performance report shows that the MT version is ~20% faster than the ST version. The complete performance report should be prepared after the project is finished.

  4. Issue #4896 (Add documentation on how to design checks in MT modes)

    Here is a documentation draft: Google Docs

What I have learned during GSoC


First of all, Checkstyle uses code coverage metrics to maintain code quality and, more importantly, requires 100% code coverage for all new code. I had not previously encountered projects with such strict requirements (projects usually require about 75-85% code coverage). Moreover, the Checkstyle project actively uses mutation-based code coverage, which was a new approach for me. Although I have used tests in my own projects, I had never written so many tests for a bug fix or a new feature, so during this summer I learned a lot about project testing.

Checkstyle also has very good code quality, because new code must pass all CI checks (Checkstyle, PMD, IDEA, code coverage and others) and a code review before it is merged into the master branch. In addition, you may not create a PR before opening an issue (otherwise CI fails), and the issue should describe the problem you are trying to resolve. This is a very good practice, because code history is very important: no one wants to see a commit named “hot fix” in the git history. Every Checkstyle commit must reference the GitHub issue it resolves and include a short but complete description. When building my own CI checks, I will definitely include this check in commit validation.

Acknowledgement


I would like to thank my mentors, Ilia Dubinin and Vladislav Lisetskii; they were very responsive, and I learned a lot while working with them. I would also like to thank Andrei Selkin for helping me resolve issues. And, of course, special thanks to Roman Ivanov, rnveach and all Checkstyle team members.