
@jhorsman
Last active March 29, 2024 05:25
Semantic versioning regex
@jwdonahue

Accepts 0001.0001.0001 which is not a valid semver version string.

@bartelsielski

@jwdonahue This regex fixes that:

^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?:\+[0-9A-Za-z-]+)?$
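A quick way to check the claim above is to run the pattern against the reported string. This is only a sketch using Python's `re` module; the helper name `accepts` is made up for illustration, and other engines may behave differently.

```python
import re

# Pattern copied verbatim from the comment above, compiled with default flags.
BARTELSIELSKI = re.compile(
    r"^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?:\+[0-9A-Za-z-]+)?$"
)


def accepts(version: str) -> bool:
    """Hypothetical helper: True if the whole string matches the pattern."""
    return BARTELSIELSKI.match(version) is not None


# The string reported by @jwdonahue is rejected, ordinary versions still pass.
leading_zero_rejected = not accepts("0001.0001.0001")
plain_accepted = accepts("1.2.3") and accepts("10.20.30")
prerelease_accepted = accepts("1.2.3-alpha.1+build")
```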

@Tschebbischeff

Tschebbischeff commented Apr 1, 2019

EDIT:
TL;DR: Use the official RegEx

As @rverst pointed out, there is now an officially recommended RegEx, which as of 23rd Sep 2019 is:
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$

Depending on the regex engine, \d may also match non-ASCII digits. I therefore recommend replacing every \d with [0-9], which leads to this expression:

^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$
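To illustrate the \d concern with one concrete engine: in Python 3, \d in a str pattern matches any Unicode decimal digit unless the `re.ASCII` flag is set. The sketch below compares the two expressions quoted above; the test string (an ASCII 1 followed by ARABIC-INDIC DIGIT TWO) is an example chosen for illustration.

```python
import re

# Official pattern as quoted above, using \d.
SEMVER_D = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(?:-((?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Same pattern with every \d replaced by [0-9].
SEMVER_ASCII = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Major version "1" followed by U+0662 (ARABIC-INDIC DIGIT TWO).
tricky = "1\u0662.0.0"

accepted_with_backslash_d = SEMVER_D.match(tricky) is not None
accepted_with_ascii_class = SEMVER_ASCII.match(tricky) is not None
# Alternative fix in Python: keep \d but compile with re.ASCII.
accepted_with_ascii_flag = re.compile(SEMVER_D.pattern, re.ASCII).match(tricky) is not None
```

In this engine the \d version accepts the string, while the [0-9] version and the `re.ASCII` variant reject it.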

As per the discussion below, the RegEx from my original comment makes the mistake of allowing leading zeroes in numeric pre-release identifiers, which is INVALID. Do NOT use the RegEx from the original comment below.


Original comment:

Just wanted to expand on the previous solutions with:

^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$

This RegEx adds support for multiple build metadata identifiers separated by dots (akin to the pre-release version part of the regex), as defined on semver.org
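The difference from the previous pattern can be seen with a version that carries dotted build metadata. A minimal sketch in Python; the sample version string is invented for illustration.

```python
import re

# Earlier pattern from this thread: build metadata is a single identifier.
SINGLE_META = re.compile(
    r"^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?(?:\+[0-9A-Za-z-]+)?$"
)

# Pattern from this comment: build metadata may be several dot-separated identifiers.
DOTTED_META = re.compile(
    r"^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)

version = "1.0.0+build.5.sha"

single_accepts = SINGLE_META.match(version) is not None
dotted_match = DOTTED_META.match(version)
# Group 5 of the newer pattern captures the build metadata without the "+".
build_identifiers = dotted_match.group(5).split(".") if dotted_match else []
```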

@almic

almic commented Apr 23, 2019

None of these seemed good enough, so here goes:
^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(-[a-zA-Z\d][-a-zA-Z.\d]*)?(\+[a-zA-Z\d][-a-zA-Z.\d]*)?$

Matches:
1 - Major
2 - Minor
3 - Patch
4 (optional) - Pre-release version info
5 (optional) - Metadata (build time, number, etc.)
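As a rough illustration of these five groups, here is the pattern run through Python's `re` module (an assumption; the comment does not say which engine it targets). Note that groups 4 and 5 include the leading - and + because those characters sit inside the capturing parentheses.

```python
import re

# Pattern copied from the comment above.
ALMIC = re.compile(
    r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)"
    r"(-[a-zA-Z\d][-a-zA-Z.\d]*)?(\+[a-zA-Z\d][-a-zA-Z.\d]*)?$"
)

full = ALMIC.match("1.2.3-beta.1+build.7")
major, minor, patch, pre, meta = full.groups()

# Optional groups come back as None when they are absent.
bare = ALMIC.match("1.2.3").groups()

# The character class treats "." like any other allowed character,
# so an empty identifier between two dots is not rejected by this pattern.
empty_identifier_accepted = ALMIC.match("1.2.3-a..b") is not None
```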

@whoisj

whoisj commented May 15, 2019

just found this thread. Let me try... as I understand it

  1. Major version is required, minor and patch version, and the meta version are supported but optional.
  2. Major, minor, and/or patch can be 0 but cannot be a non-zero value with a leading 0.
  3. The meta version can contain any Unicode letter (not restricted to Latin characters), _, ., and - characters.
  4. Minor version values must suffix to a major version.
  5. Patch version values must suffix to a minor version.
  6. Meta version values can suffix to a minor or patch version.
(0|(?:[1-9]\d*))(?:\.(0|(?:[1-9]\d*))(?:\.(0|(?:[1-9]\d*)))?(?:\-([\w][\w\.\-_]*))?)?

Match Groups:

  1. Major
  2. Minor (optional - requires major to match)
  3. Patch (optional - requires minor to match)
  4. Meta/pre-release (optional - requires minor to match)

You can add the ^ prefix and $ suffix to enforce a match on the whole string.
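A small sketch of how the rules and match groups listed above play out, with ^ and $ added and Python's `re` module standing in for whichever engine the comment was tested on. The sample strings are illustrative.

```python
import re

# Pattern from the comment above, wrapped in ^ ... $.
WHOISJ = re.compile(
    r"^(0|(?:[1-9]\d*))(?:\.(0|(?:[1-9]\d*))(?:\.(0|(?:[1-9]\d*)))?"
    r"(?:\-([\w][\w\.\-_]*))?)?$"
)


def groups_or_none(version: str):
    """Hypothetical helper: the four groups, or None when nothing matches."""
    m = WHOISJ.match(version)
    return m.groups() if m else None


major_only = groups_or_none("1")             # minor, patch, meta are optional
major_minor = groups_or_none("1.2")
with_meta = groups_or_none("1.2.3-rc_1")     # underscore allowed by \w
meta_after_minor = groups_or_none("1.2-rc")  # meta may follow a minor version
meta_after_major = groups_or_none("1-rc")    # but not a bare major version
leading_zero = groups_or_none("01")
```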

Regex in action

@johnwc

johnwc commented Jul 9, 2019

@gistofj Not sure where you read that Unicode was accepted, but the 2.0 reference on the site clearly states that identifiers MUST comprise only ASCII alphanumerics and hyphens, for both pre-release and meta. I also don't see it state that they can contain underscores.

(?<Major>0|(?:[1-9]\d*))(?:\.(?<Minor>0|(?:[1-9]\d*))(?:\.(?<Patch>0|(?:[1-9]\d*)))?(?:\-(?<PreRelease>[0-9A-Z\.-]+))?(?:\+(?<Meta>[0-9A-Z\.-]+))?)?
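For readers trying this outside .NET: Python's `re` module spells named groups as `(?P<Name>...)` rather than `(?<Name>...)`. The sketch below is a translation under two assumptions that the comment does not state: that the pattern is meant to be matched case-insensitively (the character classes only list A-Z), and that the whole string should match.

```python
import re

# Same pattern as above with .NET-style (?<Name>...) rewritten as (?P<Name>...).
# re.IGNORECASE is an assumption, since the classes only contain upper-case A-Z.
JOHNWC = re.compile(
    r"(?P<Major>0|(?:[1-9]\d*))(?:\.(?P<Minor>0|(?:[1-9]\d*))"
    r"(?:\.(?P<Patch>0|(?:[1-9]\d*)))?"
    r"(?:\-(?P<PreRelease>[0-9A-Z\.-]+))?(?:\+(?P<Meta>[0-9A-Z\.-]+))?)?",
    re.IGNORECASE,
)

# fullmatch stands in for the ^ and $ anchors the original leaves out.
m = JOHNWC.fullmatch("1.2.3-rc.1+build.9")
parts = m.groupdict() if m else {}
```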

Regex in action

@rverst

rverst commented Sep 17, 2019

None of these regexes seems to be correct. But there is actually one suggested on the SemVer homepage. Why not use it?

If you're looking for a bash version: https://gist.github.com/rverst/1f0b97da3cbeb7d93f4986df6e8e5695

@johnwc

johnwc commented Sep 23, 2019

@rverst All of those regexes work and have been tested against examples. You can use the links to test them. If they do not work for you, then you are using a different regex engine from the one they were written for, most likely one based on Perl's engine.

The regex example you point to in your link was only added to that page on Aug 23, after this entire thread and its comments were created. Also, it does not pass all of the tests you gave in the bash script. See the invalid matches here: Regex in action

The regex in your bash script should also be simplified down to: (\d+)\.(\d+)\.(\d+)-?([a-zA-Z-\d\.]*)\+?([a-zA-Z-\d\.]*) Test it here: Regex in action
You can also test the regex that I shared against your list of tests: Regex in action

@rverst

rverst commented Sep 23, 2019

@johnwc I think they are not. And I have tested them. And yes, against a perl based engine.

The example link provided by the thread starter runs against an ECMAScript engine. But the other examples seem to be tested against the .NET engine, so let's go on with that:

Can we agree that we need more test cases than those given in the different examples? Let's take the ones provided in the example on the SemVer homepage.

I split each test into valid and invalid test cases, because in some cases the permalink provided by regexstorm.net is too long.

Original regex from thread starter:

Regex from @bartelsielski:

Regex from @Tschebbischeff:

Regex from @almic:

Regex from @gistofj:

Regex from @johnwc:

The suggested "simplification" of the bash-regex is also incorrect. It matches some invalid SemVers.
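To make that last point concrete, here is a sketch comparing the suggested simplification with the [0-9] variant of the official pattern quoted earlier in this thread. It uses Python's `re` with `fullmatch` as a stand-in for anchoring; the three invalid strings are examples chosen for illustration, not the test list from the bash script.

```python
import re

# The "simplified" pattern suggested above, copied verbatim.
SIMPLIFIED = re.compile(
    r"(\d+)\.(\d+)\.(\d+)-?([a-zA-Z-\d\.]*)\+?([a-zA-Z-\d\.]*)"
)

# [0-9] variant of the official pattern, as quoted earlier in the thread.
OFFICIAL = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Invalid per the spec: leading zero in major, empty identifiers,
# leading zero in a numeric pre-release identifier.
invalid = ["01.2.3", "1.2.3...", "1.2.3-01"]

accepted_by_simplified = [v for v in invalid if SIMPLIFIED.fullmatch(v)]
accepted_by_official = [v for v in invalid if OFFICIAL.match(v)]
```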

The fact that the regex examples on the SemVer page were only added on Aug 23 does not make the regexes in the thread and comments correct.

It wasn't my intention to embarrass you or the other commenters. Regexes are complex, and many mistakes have been and will be made with them. I came across this thread while searching for a solution to my problem: checking as well as possible whether a string is a valid SemVer.
Since the solutions provided in this thread were not "good enough", I looked further and found that the originators of SemVer provide a regex on their page.
I simply thought it would be nice to mention it, in case other people with similar problems come across this thread.

@Tschebbischeff

@rverst
I don't know if you did this on all the links, but at least for mine you are missing the leading ^ and trailing $, which are important for matching on a single line (hence it doesn't fully match the patch version in your provided link).
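The effect of dropping the anchors can be reproduced with a short sketch. It uses Python's `re` module (not regexstorm.net or regex101), and the two-line input is made up for illustration.

```python
import re

# Body of the RegEx from the original comment, without ^ and $.
BODY = (
    r"([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
)

unanchored = re.compile(BODY)
# MULTILINE lets ^ and $ match at each line of a multi-line test input.
anchored = re.compile("^" + BODY + "$", re.MULTILINE)

text = "1.2.03\n1.2.3"

# Without anchors the engine is happy with a prefix of the invalid line.
partial = unanchored.search("1.2.03").group(0)

# With anchors only the fully valid line is reported.
whole_lines = [m.group(0) for m in anchored.finditer(text)]
```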

On your test set, using my actual RegEx, I cannot find any missing matches on valid strings.
On invalid strings I find significantly fewer errors with my actual RegEx than you did.
The only error is that I allow leading zeroes in the pre-release versions, which is indeed illegal and is correctly handled by the now existing official regex.

Here is a link to my test with the ^ and $ : https://regex101.com/r/JOKR70/1
(I have added 1.2.3-0123+meta and 1.2.3-123.01234 as invalid test cases)
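The two added cases can also be checked locally. A sketch in Python comparing the RegEx from the original comment (with ^ and $) against the [0-9] variant of the official one; the variable names are illustrative.

```python
import re

# RegEx from the original comment, with anchors.
ORIGINAL = re.compile(
    r"^([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)\.([0-9]|[1-9][0-9]*)"
    r"(?:-([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?"
    r"(?:\+([0-9A-Za-z-]+(?:\.[0-9A-Za-z-]+)*))?$"
)

# [0-9] variant of the official RegEx.
OFFICIAL = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Numeric pre-release identifiers with leading zeroes: invalid per the spec.
leading_zero_cases = ["1.2.3-0123+meta", "1.2.3-123.01234"]

original_accepts = [v for v in leading_zero_cases if ORIGINAL.match(v)]
official_accepts = [v for v in leading_zero_cases if OFFICIAL.match(v)]

# A valid example from the spec passes both.
both_accept_valid = bool(ORIGINAL.match("1.0.0-0.3.7") and OFFICIAL.match("1.0.0-0.3.7"))
```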

This is the test with the official RegEx: https://regex101.com/r/TefKLN/1

Semantically, I cannot find any other mistakes I made when comparing to the official RegEx.

I will also update my answer above with the suggestion to use the official RegEx, since people finding this thread may not be reading a big discussion.

@rverst

rverst commented Sep 23, 2019

@Tschebbischeff I did, for the tester at http://regexstorm.net - since it seemed not to work at all (at least for me) with the tokens for start- or end-of-line.
You're right, your regex matches the valid strings. Not on regexstorm.net but I wouldn't recommend this page for validating regex either.

I'm sorry if I misinterpreted some results of my (admittedly fast and simple) test, but I just couldn't leave @johnwc's statement:

"All of those regex work and have been tested to work with examples"

like that.

My main goal was to make people who find this thread aware of the official version of the regex. And the fact that sometimes it's just not enough to have only five test cases or so.

@jmatsushita

Here's a version including semver constraints https://regex101.com/r/Ly7O1x/196

@Dentrax

Dentrax commented May 7, 2020

@jmatsushita @jhorsman

1.0.0-alpha- passes the unit test, but it should not.

@Tschebbischeff

Tschebbischeff commented May 19, 2020

@Dentrax
1.0.0-alpha- is in fact a valid semantic version according to (9):

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. Identifiers MUST comprise only ASCII alphanumerics and hyphen [0-9A-Za-z-]. Identifiers MUST NOT be empty. Numeric identifiers MUST NOT include leading zeroes. Pre-release versions have a lower precedence than the associated normal version. A pre-release version indicates that the version is unstable and might not satisfy the intended compatibility requirements as denoted by its associated normal version. Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92.

Hyphens are allowed to be a part of the pre-release identifiers.
Only the first hyphen in a semantic version string denotes the beginning of the pre-release identifiers that follow.
The pre-release information is terminated either by the end of the string or a + (which denotes the beginning of build metadata).

I.e. your example resolves to these values:

major: 1
minor: 0
patch: 0
pre-release: ["alpha-"]
build-metadata: []
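The breakdown above can be reproduced with a small parser built on the [0-9] variant of the official RegEx. This is a sketch; the `parse` function and its dictionary keys are made up for illustration.

```python
import re

# [0-9] variant of the official RegEx quoted earlier in the thread.
OFFICIAL = re.compile(
    r"^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)"
    r"(?:-((?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9][0-9]*|[0-9]*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+([0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def parse(version: str):
    """Hypothetical helper: split a version into its parts, or return None."""
    m = OFFICIAL.match(version)
    if m is None:
        return None
    major, minor, patch, pre, build = m.groups()
    return {
        "major": int(major),
        "minor": int(minor),
        "patch": int(patch),
        # Identifiers are separated by dots; hyphens stay inside an identifier.
        "pre-release": pre.split(".") if pre else [],
        "build-metadata": build.split(".") if build else [],
    }


result = parse("1.0.0-alpha-")
```

The trailing hyphen ends up inside the single pre-release identifier, which matches the values listed above.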

(Also there is now an officially recommended RegEx in the FAQ section on semver.org that you can use :) )

@mathomp4

mathomp4 commented Feb 2, 2022

Here's a fun question for the gurus here: Does anyone have a good GitHub acceptable SemVer regex?

I got into the beta for protected tags and tried the official SemVer regexes but both seem to be too complex for GitHub regexer.

@Tschebbischeff

Tschebbischeff commented Feb 2, 2022

@mathomp4
It is impossible to create a pattern for semver strings with the glob-style pattern matching emmaviolet@github mentioned they use.
I'm sorry 😔

More detail if you want:

A semver string's major version part alone allows for a theoretically infinite string of numerical characters. Let's try to get the equivalent of the regex [0-9]* only.

* allows any string (excl. / maybe, then ** specifically would allow / inside).

It can be limited with a constant prefix, infix or suffix, but there is no way to limit the "type of character" it allows at the beginning and/or end of the string.

Now * and ** are the only two special characters that can match more than one character.
There is no way to make ? or [set] match against more than one character, and {a,b} (if even available here) is based on a and b being patterns. Those patterns might allow infinite strings by using * inside, but do not allow limiting the type of character for the same reason as above.

Hence, it's not possible to match "an infinite string comprised of only specific characters" in glob-style 🥲
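The limitation can be demonstrated with any glob-style matcher. The sketch below uses Python's `fnmatch` module as a stand-in; this is an assumption, since GitHub's own matcher may differ in details such as how / is treated. The tag names are invented for illustration.

```python
import fnmatch

# Closest glob attempt at "v followed by digits": one digit, then anything.
pattern = "v[0-9]*"

real_tag = fnmatch.fnmatchcase("v1.2.3", pattern)
# The * after the digit class is unconstrained, so non-version tags slip through.
junk_tag = fnmatch.fnmatchcase("v1-not-a-version", pattern)

# Spelling out three dotted parts does not help for the same reason.
dotted_pattern = "[0-9]*.[0-9]*.[0-9]*"
junk_dotted = fnmatch.fnmatchcase("1x.2y.3z", dotted_pattern)
```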

@mathomp4

mathomp4 commented Feb 2, 2022

@Tschebbischeff Yeah, I stared at it for a while but I figured "If I can't do an ls for that pattern, I can't do this". Oh well, at least I can block off v*...and hope no bosses want a v-tag! 😄
