# safe-regex

Detect potentially
[catastrophic](http://regular-expressions.mobi/catastrophic.html)
[exponential-time](http://perlgeek.de/blog-en/perl-tips/in-search-of-an-exponetial-regexp.html)
regular expressions by limiting the
[star height](https://en.wikipedia.org/wiki/Star_height) to 1.

WARNING: This module has both false positives and false negatives.
Use [vuln-regex-detector](https://github.com/davisjam/vuln-regex-detector) for improved accuracy.

[![Build Status](https://travis-ci.org/davisjam/safe-regex.svg?branch=master)](https://travis-ci.org/davisjam/safe-regex)

## Example

Suppose you have a script named `safe.js`:

``` js
var safe = require('safe-regex');
var regex = process.argv.slice(2).join(' ');
console.log(safe(regex));
```

This is its behavior:

```
$ node safe.js '(x+x+)+y'
false
$ node safe.js '(beep|boop)*'
true
$ node safe.js '(a+){10}'
false
$ node safe.js '\blocation\s*:[^:\n]+\b(Oakland|San Francisco)\b'
true
```

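
The results above are consistent with a check on how deeply quantifiers are nested. The following is a minimal sketch of that idea, not this module's actual implementation: `quantifierNestingDepth` and `looksSafe` are hypothetical names, and the parser is deliberately simplified (it only understands groups, character classes, escapes, and the quantifiers `*`, `+`, `?`, and `{n,m}`).

``` js
// Hypothetical helper, not the module's implementation: returns the deepest
// nesting of quantifiers in `pattern`, using a deliberately simplified parser.
function quantifierNestingDepth(pattern) {
  // A quantifier is *, +, ?, {n}, {n,} or {n,m}, optionally followed by a lazy '?'.
  const QUANTIFIER = /^(?:[*+?]|\{\d+(?:,\d*)?\})\??/;
  let i = 0;

  // Parse atoms until ')' or end of input; return the deepest nesting seen.
  function parseSequence() {
    let deepest = 0;
    while (i < pattern.length && pattern[i] !== ')') {
      let depth = parseAtom();
      const quantifier = QUANTIFIER.exec(pattern.slice(i));
      if (quantifier) {
        i += quantifier[0].length;
        depth += 1; // a quantified atom sits one level deeper
      }
      deepest = Math.max(deepest, depth);
    }
    return deepest;
  }

  // Consume one atom (group, character class, escape, or single character)
  // and return the nesting depth found inside it.
  function parseAtom() {
    const ch = pattern[i];
    if (ch === '(') {
      i += 1; // consume '('
      const inner = parseSequence();
      i += 1; // consume ')'
      return inner;
    }
    if (ch === '[') {
      i += 1; // consume '['
      while (i < pattern.length && pattern[i] !== ']') {
        i += pattern[i] === '\\' ? 2 : 1; // skip escaped characters whole
      }
      i += 1; // consume ']'
      return 0;
    }
    i += ch === '\\' ? 2 : 1; // escape sequence or plain character
    return 0;
  }

  return parseSequence();
}

// Assumption for illustration: "safe" means no quantifier is nested in another.
function looksSafe(pattern) {
  return quantifierNestingDepth(pattern) <= 1;
}

console.log(looksSafe('(x+x+)+y'));     // false (depth 2)
console.log(looksSafe('(beep|boop)*')); // true  (depth 1)
console.log(looksSafe('(a+){10}'));     // false (depth 2)
```

A nesting check like this is only a heuristic, which is one way to understand the false positives and false negatives mentioned in the warning at the top.
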
## Methods

``` js
const safe = require('safe-regex')
```

### const ok = safe(re, opts={})

Returns a boolean `ok` indicating whether the regex `re` is considered safe,
i.e. not potentially catastrophic.

`re` can be a `RegExp` object or just a string.

If `re` is a string that is not a valid regex, `safe` returns `false`.

* `opts.limit` - maximum number of allowed repetitions in the entire regex.
Default: `25`.

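
One way to use the boolean result is to gate compilation of untrusted patterns. The sketch below is illustrative only: `compileIfSafe` is a hypothetical helper, and the checker is passed in as a parameter so the snippet runs without the package installed. In real use you would pass the `safe` function required above.

``` js
// Hypothetical helper: compile `source` only if the injected checker accepts it.
// `check` is assumed to behave like safe(re, opts) as described above.
function compileIfSafe(source, check, opts = {}) {
  if (!check(source, opts)) {
    throw new Error('Rejected potentially unsafe regex: ' + source);
  }
  return new RegExp(source);
}

// Intended use (assumes the package is installed):
//   const safe = require('safe-regex');
//   const re = compileIfSafe(userPattern, safe, { limit: 25 });
```

Because `safe` is documented to return `false` for strings that are not valid regexes, the `new RegExp(source)` call should only be reached for patterns that parse.
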
## Install

With [npm](https://npmjs.org) do:

```
npm install safe-regex
```

## Resources

### What should I do if my project has a super-linear regex?

1. Confirm that it is *reachable* by untrusted input.
2. If it is, consider whether you can prevent worst-case behavior by trimming the input, revising the regex, or replacing the regex with another approach such as string functions. For examples, see Table 5 in [this article](http://people.cs.vt.edu/davisjam/downloads/publications/DavisCoghlanServantLee-EcosystemREDOS-ESECFSE18.pdf).
3. If none of those solutions looks feasible, you might also consider changing regex engines. The [RE2 bindings](https://www.npmjs.com/package/re2) might work, though test carefully to confirm there are no [semantic portability problems](https://medium.com/@davisjam/why-arent-regexes-a-lingua-franca-esecfse19-a36348df3a2?source=friends_link&sk=d21be7f8f723e2080dc993385c6973d1).

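
Two of the options in step 2 can be sketched in a few lines. This is a rough illustration under stated assumptions, not a recommendation from this project: `testBounded`, `stripTrailingWhitespace`, and the `MAX_INPUT_LENGTH` value are hypothetical, and the right length cap depends on your inputs.

``` js
// Assumed cap for illustration; pick a limit that fits your inputs.
const MAX_INPUT_LENGTH = 1000;

// Trimming the input: refuse to run the regex on inputs longer than the cap.
function testBounded(regex, input, maxLength = MAX_INPUT_LENGTH) {
  if (input.length > maxLength) {
    return false; // too long to match safely; treated as "no match"
  }
  return regex.test(input);
}

// Replacing the regex with a string function:
// input.replace(/\s+$/, '') can take quadratic time in a backtracking engine
// when a long whitespace run is not at the end of the string, whereas
// trimEnd() strips trailing whitespace in a single pass.
function stripTrailingWhitespace(input) {
  return input.trimEnd();
}
```

Rejecting over-long input changes behavior for legitimate long inputs, so the cap is a trade-off rather than a free fix.
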
### Further reading

The following documents may be edifying:

- [Research brief on the extent of super-linear regexes in practice](https://medium.com/@davisjam/introduction-987fdc4c7b0?source=friends_link&sk=ceefa4a4ca9617e08ab782c3b1580aea)
- [Research brief on the variability of super-linear regex behavior across programming languages](https://medium.com/@davisjam/why-arent-regexes-a-lingua-franca-esecfse19-a36348df3a2?source=friends_link&sk=d21be7f8f723e2080dc993385c6973d1)
- [Comparing regex matching algorithms](https://swtch.com/~rsc/regexp/regexp1.html)

## Project policies

### Versioning

This project follows [Semantic Versioning 2.0 (semver)](https://semver.org/).

Here are the project-specific meanings of MAJOR, MINOR, and PATCH updates:

- MAJOR: "Incompatible" API changes were introduced. There are two types in this module:
  - Changes that modify the interface
  - Changes that cause any regexes to be marked as unsafe that were formerly marked as safe
- MINOR: Functionality was added in a backwards-compatible manner. There are two types in this module:
  - Refactoring the analyses but not changing their results
  - Modifying the analyses to reduce false positives, without affecting negatives (false or true)
- PATCH: I don't anticipate using PATCH for this module

### License

[MIT](https://github.com/davisjam/safe-regex/blob/master/LICENSE)