Regex Course

Grouping and using ES6 features

Marcin Wanago
JavaScript, Regex

We have covered quite a few features of regex so far. There is a lot more, though. Today we will deal with more advanced concepts, like grouping, and cover more of the RegExp object features in JavaScript. We will also learn how to use some of the features that ES6 brought us. Let’s go!

exec

It is a method that executes a search for a match in a string, similar to the test method, but it returns a result array (or null if nothing matched). The result has additional properties, like index and input.

const string = 'fileName.png, fileName2.png, fileName3.png';
const regexp = /fileName[0-9]?\.png/g; // \. matches a literal dot

regexp.exec(string);

[
  0: "fileName.png",
  index: 0,
  input: "fileName.png, fileName2.png, fileName3.png"
]

The index is the position of the match, and input is the string that was searched. Note that I am using the global flag here, which was mentioned in the first part of the course. Thanks to it, we can look for more than one match in our string by calling exec multiple times. After each successful match, exec sets the lastIndex property of the RegExp object to the index right after the match, which is where the next search starts.

let resultArray;
while ((resultArray = regexp.exec(string)) !== null) {
  console.log(resultArray[0], regexp.lastIndex);
}

// fileName.png  12
// fileName2.png 27
// fileName3.png 42
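The loop above can be wrapped in a small helper. This is only a minimal sketch: the name findAll and the shape of the returned objects are assumptions made for illustration, not something from the course.

```javascript
// Hypothetical helper: collects every match of a global pattern with its position.
function findAll(pattern, text) {
  if (!pattern.global) {
    // Without the g flag, exec never advances lastIndex and the loop would not end.
    throw new Error('findAll expects a pattern with the g flag');
  }
  pattern.lastIndex = 0; // start from the beginning, whatever happened before
  const matches = [];
  let result;
  while ((result = pattern.exec(text)) !== null) {
    matches.push({ match: result[0], index: result.index });
    // An empty match does not move lastIndex, so move it by hand.
    if (result[0] === '') pattern.lastIndex++;
  }
  return matches;
}

const found = findAll(/fileName[0-9]?\.png/g, 'fileName.png, fileName2.png, fileName3.png');
console.log(found.map(({ match, index }) => `${match} at ${index}`));
// [ 'fileName.png at 0', 'fileName2.png at 14', 'fileName3.png at 29' ]
```

Resetting lastIndex at the start is a design choice: it makes the helper safe to call with a RegExp object that was already used elsewhere.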

Grouping in regex

With regular expressions, we can not only check the string for matches but also extract certain information while ignoring unnecessary characters. To do this, we will use grouping with round brackets.

function getDateFromString(dateString) {
  const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/;
  const result = regexp.exec(dateString);
  if (result) {
    return {
      day: result[1],
      month: result[2],
      year: result[3]
    };
  }
}

getDateFromString('14-05-2018');

{
  day: '14',
  month: '05',
  year: '2018'
}

In this case, we extracted three groups of characters and ignored the dashes. Just note that result[0] will be the full matched string.

There is a named capture groups proposal that has reached stage 4 (it is part of ES2018) and proves helpful in use cases such as the one above. It was nicely described in an article on the 2ality blog by Axel Rauschmayer.
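To give an idea of what named groups look like, here is a minimal sketch of the same date parser. The function name getDateWithNamedGroups is made up for this example, and it assumes an engine that supports ES2018.

```javascript
// Sketch of the date parser using named capture groups (ES2018).
function getDateWithNamedGroups(dateString) {
  const regexp = /(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})/;
  const result = regexp.exec(dateString);
  if (result) {
    // Named groups are exposed on result.groups, next to the numbered ones.
    const { day, month, year } = result.groups;
    return { day, month, year };
  }
  return null;
}

console.log(getDateWithNamedGroups('14-05-2018'));
// { day: '14', month: '05', year: '2018' }
```

The numbered entries (result[1], result[2], result[3]) are still available, so named groups can be introduced without breaking code that relies on indexes.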

Nested groups

You can actually nest groups:

function getYearFromString(dateString) {
  const regexp = /[0-9]{2}-[0-9]{2}-([0-9]{2}([0-9]{2}))/;
  const result = regexp.exec(dateString);
  if (result) {
    return {
      year: result[1],
      yearShort: result[2]
    };
  }
}

getYearFromString('14-05-2018');

{
  year: '2018',
  yearShort: '18'
}

Here, in the ([0-9]{2}([0-9]{2})) part of our pattern, we nest one group inside another. Thanks to that, we get both the long and the short form of the year.
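With nested groups it helps to know how the numbers are assigned. The sketch below uses variable names picked for illustration (numbered, skipping) and also shows the non-capturing form (?:...), which groups without taking up a number.

```javascript
// Groups are numbered by their opening parenthesis, counted from the left.
const numbered = /(([0-9]{2})-([0-9]{2}))-([0-9]{2}([0-9]{2}))/;
const parts = numbered.exec('14-05-2018');
console.log(parts.slice(1)); // [ '14-05', '14', '05', '2018', '18' ]

// (?:...) groups without capturing, so it does not take up a number.
const skipping = /(?:[0-9]{2}-){2}([0-9]{2}([0-9]{2}))/;
console.log(skipping.exec('14-05-2018').slice(1)); // [ '2018', '18' ]
```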

Conditional patterns

There is another useful feature: alternation, which works like an OR statement. We can use it with the pipe character:

function doYearsMatch(firstDateString, secondDateString) {
  const execResult = /[0-9]{2}-[0-9]{2}-([0-9]{4})/.exec(firstDateString);
  if (execResult) {
    const year = execResult[1];
    const yearShort = year.slice(2); // last two digits, e.g. '18'
    // $ keeps a short year such as 18 from matching the start of 1875
    return new RegExp(`[0-9]{2}-[0-9]{2}-(${year}|${yearShort})$`).test(secondDateString);
  }
  return false;
}

doYearsMatch('14-05-2018', '12-02-2018'); // true
doYearsMatch('14-05-2018', '24-04-18');   // true

In our pattern, (${year}|${yearShort}) will cause the years to match even if the second one is provided in a short form.
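The parentheses around the alternatives matter, because the pipe has the lowest precedence of all regex operators. A minimal sketch with made-up patterns (the names loose and grouped are only for illustration):

```javascript
// Without a group, the alternatives are "^cat" and "dog$".
const loose = /^cat|dog$/;
// With a group, both anchors apply to each alternative.
const grouped = /^(cat|dog)$/;

console.log(loose.test('catalog'));   // true, because it starts with "cat"
console.log(grouped.test('catalog')); // false
console.log(grouped.test('dog'));     // true
```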

Capture all

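While working with groups, there is a particular one that might come in handy: (.*). It captures any sequence of characters other than line breaks.

function getResolution(resolutionString) {
  // (.*?) is lazy, so it does not swallow the optional space before the x
  const execResult = /(.*?) ?x ?(.*)/.exec(resolutionString);
  if (execResult) {
    return {
      width: execResult[1],
      height: execResult[2]
    };
  }
}

getResolution('1024x768');

{
  width: '1024',
  height: '768'
}

Thanks to the ? operator after each space, it also works if there are additional spaces around the x. Note the ? right after the first .* as well: it makes that group lazy. A greedy (.*) would capture the space before the x too and return '1920 ' as the width.

getResolution('1920 x 1080');

{
  width: '1920',
  height: '1080'
}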
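How much (.*) captures depends on whether the quantifier is greedy or lazy. The sketch below compares both on the same input. The stricter digitsOnly pattern is an assumption about what a resolution string should look like, not part of the original example.

```javascript
const input = '1920 x 1080';

// Greedy: the first group grabs as much as it can, including the space before "x".
const greedy = /(.*) ?x ?(.*)/.exec(input);
console.log(JSON.stringify(greedy[1])); // "1920 "

// Lazy: the first group stops as early as possible, so " ?" gets the space.
const lazy = /(.*?) ?x ?(.*)/.exec(input);
console.log(JSON.stringify(lazy[1])); // "1920"

// Stricter alternative: only digits are accepted, and the whole string must match.
const digitsOnly = /^([0-9]+) ?x ?([0-9]+)$/;
console.log(digitsOnly.exec(input).slice(1)); // [ '1920', '1080' ]
console.log(digitsOnly.exec('wide x tall'));  // null
```

When the expected content is known, as with digits here, a specific character class is usually easier to reason about than (.*).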

Sticky flag

As you’ve already seen, the RegExp object has a property called lastIndex. It is used when the search is global (with the g flag) so that pattern matching continues in the right place. With the sticky flag, y, introduced in ES6, we can force the match to start exactly at a certain index.

function getDateFromString(dateString) {
  const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/y;
  regexp.lastIndex = 14;
  const result = regexp.exec(dateString);
  if (result) {
    return {
      day: result[1],
      month: result[2],
      year: result[3]
    };
  }
}

getDateFromString('Current date: 14-05-2018');

Remember that performing a check on a string (for example with exec) changes the lastIndex property, so if you would like it to stay the same between multiple sticky searches, don’t forget to set it. If the pattern matching fails, lastIndex is set to 0.
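A typical use of the sticky flag is reading a string token by token, where every match has to start exactly where the previous one ended. The tokenizer below is a hypothetical example written for this purpose: the function name tokenize and the tiny grammar (numbers and four operators) are assumptions.

```javascript
// Hypothetical tokenizer: reads numbers and operators one after another.
function tokenize(input) {
  // Optional whitespace, one token, optional whitespace. The y flag pins
  // every match to the current lastIndex.
  const token = /\s*([0-9]+|[-+*\/])\s*/y;
  const tokens = [];
  while (token.lastIndex < input.length) {
    const start = token.lastIndex;
    const result = token.exec(input);
    if (result === null) {
      // exec has reset lastIndex to 0, so report the position saved earlier.
      throw new SyntaxError(`Unexpected character at index ${start}`);
    }
    tokens.push(result[1]);
  }
  return tokens;
}

console.log(tokenize('12 + 7*3')); // [ '12', '+', '7', '*', '3' ]
```

With the g flag instead of y, the search would silently skip over characters it does not recognize. The sticky flag turns them into an error.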

It is a good time to note that you can check if the RegExp object has flags enabled.

const regexp = /([0-9]{2})-([0-9]{2})-([0-9]{4})/y;
regexp.lastIndex = 14;
console.log(regexp.sticky); // true

The same goes for the other flags. For more, visit the MDN web docs.
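As a short sketch of those other properties, the example below inspects a pattern that was made up for illustration (the variable name flagged is an assumption):

```javascript
const flagged = /[0-9]+/yig;

console.log(flagged.global);     // true
console.log(flagged.ignoreCase); // true
console.log(flagged.sticky);     // true
console.log(flagged.unicode);    // false
console.log(flagged.multiline);  // false

// flags lists all enabled flags in a fixed order, not the order they were written in.
console.log(flagged.flags);  // 'giy'
console.log(flagged.source); // '[0-9]+'
```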

Unicode flag

ES6 brought better support for Unicode, too. Adding the Unicode flag, u, enables additional features related to Unicode. Thanks to it, you can use \u{x} in your patterns, where x is the hexadecimal code point of the desired character.

/\u{24}/u.test('$'); // true

It won’t work without the u flag: without it, \u{24} is read as the letter u repeated 24 times. The flag affects more than that, though. Some more exotic Unicode characters can be matched even without it:

/😹/.test('😹'); // true

but it will fail us in more advanced cases:

/a.b/.test('a😹b');  // false
/a.b/u.test('a😹b'); // true

/😹{2}/.test('😹😹');  // false
/😹{2}/u.test('😹😹'); // true

/^[😹🐶]$/.test('😹');  // false
/^[😹🐶]$/u.test('😹'); // true

/^[^x]$/.test('😹');  // false
/[^x]/.test('😹');    // true
/^[^x]$/u.test('😹'); // true

/^[ab😹]$/.test('😹');  // false
/[ab😹]/.test('😹');    // true
/^[ab😹]$/u.test('😹'); // true

We can easily conclude that it is good practice to include the u flag in our patterns, especially if there is any chance that the input contains characters outside standard ASCII.
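The reason behind these results is that JavaScript strings are sequences of 16-bit code units, and a character such as 😹 takes two of them. Without the u flag, the pattern works on code units. With it, the pattern works on whole code points. A minimal sketch (the variable name cat is only for illustration):

```javascript
const cat = '😹';

console.log(cat.length);                      // 2, two UTF-16 code units
console.log([...cat].length);                 // 1, one code point
console.log(cat.codePointAt(0).toString(16)); // '1f639'

// Without u, the dot matches a single code unit, so one dot cannot cover the emoji.
console.log(/^.$/.test(cat));  // false
console.log(/^..$/.test(cat)); // true
// With u, the dot matches the whole code point.
console.log(/^.$/u.test(cat)); // true
console.log(/^\u{1F639}$/u.test(cat)); // true
```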

If you combine it with the ignore case flag, i, the pattern will match both lowercase and uppercase characters.

/\u{78}/ui.test('X'); // true

Interestingly, the pattern attribute of input elements in HTML has this flag enabled by default.

Summary

Today we learned more about the RegExp object in JavaScript and how we can use this knowledge with a great feature of regular expressions: grouping. We’ve also learned two new flags: sticky and Unicode. Hopefully, you now see more and more use cases for regular expressions. Until next time!