Benchmarking

You can use our benchmarking tools to measure the impact of your performance-related code changes or to ensure you don't introduce any performance regressions when making other changes.

Use:

  • rule benchmarking to measure the performance of an individual rule
  • system benchmarking to measure Stylelint's overall CLI and Node.js API performance

Rule benchmarking

A Stylelint rule can repeat its core logic many, many times (e.g. checking every value node of every declaration in a vast CSS codebase). So it's worth paying attention to performance and doing what we can to improve it!

Improving the performance of a rule is a great way to contribute if you want a quick little project. Try picking a rule and seeing if there's anything you can do to speed it up. Make sure you include benchmark measurements in your pull request!

Getting started

You can run a benchmark on any given rule with any valid config using:

npm run benchmark-rule -- ruleName ruleOptions [config]

For example:

npm run benchmark-rule -- value-keyword-case lower

If the ruleOptions or config arguments are anything other than a string or a boolean, they must be valid JSON wrapped in quotation marks. For example:

npm run benchmark-rule -- value-keyword-case '["lower", {"camelCaseSvgKeywords": true}]' '{"fix": true}'

Interpreting results

The command prints some simple stats like this:

Warnings: 1441
Mean: 74.17598357142856 ms
Deviation: 16.63969674310928 ms

Compare the results with those of:

  • a similar rule when writing new rules
  • the main branch when changing existing rules, as in the example below
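
For example, to measure a change to an existing rule against main (the branch name is a placeholder for your own):

# Benchmark on the main branch.
git checkout main
npm run benchmark-rule -- value-keyword-case lower

# Benchmark on your branch, then compare the reported means.
git checkout my-feature-branch
npm run benchmark-rule -- value-keyword-case lower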

Implementation details

The script loads Bootstrap's CSS (from its CDN) and runs it through the configured rule.
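
If you want to take a similar measurement yourself with the Node.js API, here is a minimal sketch of the same idea. It is not the benchmark script itself, and it assumes Node.js 18+ for the global fetch and an illustrative CDN URL:

import stylelint from "stylelint";

// Fetch a large, real-world stylesheet (URL is illustrative).
const response = await fetch("https://cdn.jsdelivr.net/npm/bootstrap/dist/css/bootstrap.css");
const css = await response.text();

// Time a single rule over the whole stylesheet.
const start = performance.now();
const { results } = await stylelint.lint({
  code: css,
  config: { rules: { "value-keyword-case": "lower" } },
});
const elapsed = performance.now() - start;

console.log(`Warnings: ${results[0].warnings.length}`);
console.log(`Time: ${elapsed.toFixed(2)} ms`);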

System benchmarking

The system benchmarking tool measures Stylelint's overall CLI and Node.js API performance across realistic workspaces of varying sizes.
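
Roughly speaking, the two modes correspond to the two ways of invoking Stylelint; the glob below is a placeholder for a generated workspace:

# CLI mode: run the stylelint binary as a separate process.
npx stylelint "workspace/**/*.css"

// API mode: call the Node.js API in the current process.
import stylelint from "stylelint";
await stylelint.lint({ files: "workspace/**/*.css" });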

Getting started

You can compare against a baseline using:

# Run benchmarks and save a baseline result.
npm run benchmark -- --save=baseline.json

# Make changes or switch branches, then compare.
npm run benchmark -- --compare=baseline.json

If you already have two benchmark files, you can compare them without running any benchmarks:

npm run benchmark -- --compare=baseline.json --compare-to=after.json

To view a saved benchmark file:

npm run benchmark -- --show=baseline.json

Options

────────────────────────────────────────────────────────────────────────────
Option                               Description
────────────────────────────────────────────────────────────────────────────
--sizes=small,medium,large,xlarge    Which workspace sizes to test.
--iterations=N                       Measured iterations per benchmark. Default is 10.
--warmup=N                           Warmup iterations to discard. Default is 2.
--modes=api,cli                      Comma-separated list of modes to test. Running both modes, which is the default, is intended to help identify performance regressions in either mode.
--save=FILE                          Save results to JSON.
--show=FILE                          Display results from a saved JSON file.
--compare=FILE                       Compare against baseline.
--compare-to=FILE                    Compare the --compare file against this file directly without running benchmarks.
--benchmark-only                     Skip workspace generation.
────────────────────────────────────────────────────────────────────────────
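
For example, a quicker comparison run against a saved baseline that only tests the two smaller workspaces:

npm run benchmark -- --sizes=small,medium --iterations=20 --compare=baseline.json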

Interpreting results

Summary

──────────────────────────────────────────────────────────────────────────────────────────
Size      Files    API time    ±CV      /file     CLI time    ±CV      /file
──────────────────────────────────────────────────────────────────────────────────────────
Small     20       32.96ms     32.1%    1.65ms    331.55ms    15.2%    16.58ms
Medium    100      191.33ms    10.1%    1.91ms    551.01ms    12.2%    5.51ms
Large     500      1.33s       3.7%     2.66ms    1.77s       7.1%     3.55ms
X-Large   1000     4.79s       9.4%     4.79ms    5.72s       3.4%     5.72ms
──────────────────────────────────────────────────────────────────────────────────────────
  • Time: mean with the top and bottom 10% of iterations removed for stability.
  • ±CV: coefficient of variation, which measures how consistent the results are. Lower is better.
  • /file: time per file, i.e. the total time divided by the file count (see the worked example below). If this increases significantly for larger workspaces, it may indicate a performance issue. Higher values at small sizes can be explained by fixed overheads, such as process startup time.
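
For example, the small CLI figure above works out to 331.55ms ÷ 20 files ≈ 16.58ms per file, while the large CLI figure is 1.77s ÷ 500 files ≈ 3.55ms per file, because the fixed startup cost is spread across more files.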

Comparisons

────────────────────────────────────────────────────────────────────────────
Size      Baseline    Current     Diff        Change    Status
────────────────────────────────────────────────────────────────────────────
Small     265.98ms    331.55ms    +65.57ms    +24.7%    ✗ Slower
Medium    563.36ms    551.01ms    -12.35ms    -2.2%     ≈ Same
Large     1.81s       1.77s       -39.93ms    -2.2%     ≈ Same
X-Large   5.63s       5.72s       +97.60ms    +1.7%     ≈ Same
────────────────────────────────────────────────────────────────────────────
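
Reading the first row: 331.55ms - 265.98ms = +65.57ms, which is +24.7% of the 265.98ms baseline, so the run is flagged as slower.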

The significance threshold is dynamic, based on the coefficient of variation (CV) of both the baseline and current runs. If the CV is high, indicating noisy measurements, the threshold is raised to avoid false positives.

  • ✓ Faster: improvement beyond the CV-based threshold
  • ✗ Slower: regression beyond the CV-based threshold
  • ≈ Same: within measurement noise
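
To make the CV-based threshold concrete, here is an illustrative sketch; the 2% floor and the exact relationship to CV are assumptions, not necessarily what the tool implements:

// Illustrative only: a change counts as significant when it exceeds a
// threshold that grows with the measurement noise (CV) of either run.
function classify(changePct, baselineCv, currentCv) {
  const noise = Math.max(baselineCv, currentCv); // e.g. 15 for a 15% CV
  const threshold = Math.max(2, noise);          // assumed 2% floor
  if (changePct > threshold) return "✗ Slower";
  if (changePct < -threshold) return "✓ Faster";
  return "≈ Same";
}

// classify(24.7, 15, 15) => "✗ Slower"; classify(-2.2, 12, 12) => "≈ Same"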

Tips

You can:

  • close other applications and avoid running on battery power for the most reliable results
  • increase --iterations or reduce system load if you see a CV above 15%, which indicates noisy measurements
  • use --sizes=small,medium,large for faster feedback (the xlarge size is useful for stress testing, but takes longer to run)

Implementation details

The tool:

  • discards warmup iterations to account for JIT compilation and cache warming
  • uses a trimmed mean, removing the top and bottom 10% of iterations, to reduce the impact of outliers (see the sketch below)
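
A 10% trimmed mean can be computed like this (a generic sketch, not the tool's own code):

// Sort the timings, drop the slowest and fastest 10%, and average the rest.
function trimmedMean(timings, trim = 0.1) {
  const sorted = [...timings].sort((a, b) => a - b);
  const drop = Math.floor(sorted.length * trim);
  const kept = sorted.slice(drop, sorted.length - drop);
  return kept.reduce((sum, t) => sum + t, 0) / kept.length;
}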

The tool also simulates the following workspace sizes:

────────────────────────────────────────────────────────────────────────────
Size      Files   Rules   Overrides   Plugins   Simulates
────────────────────────────────────────────────────────────────────────────
small     20      10      0           0         Personal site or small library
medium    100     25      10          2         Moderately-sized product or component library
large     500     50      50          5         Enterprise app or design system
xlarge    1000    80      200         8         Huge, sprawling monorepo
────────────────────────────────────────────────────────────────────────────