RegExp

西瓜皮儿 2021/12/6 Front-end essentials

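# Under the hood

JavaScript's regular expression syntax is modeled on Perl's; the matching itself is done by the regex engine built into each JavaScript engine.
For complex pattern matching, a regex is usually more concise than chaining string methods, though for simple substring checks methods such as includes or indexOf are typically at least as fast.
For details on pattern syntax, see the reading notes on 《正则表达式必知必会》.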

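# Constructor

Constructor

Purpose: creates a regular expression
Usage: new RegExp(pattern[, flags])
Parameters: String | RegExp [, String]
Returns: RegExp
tip: supported flags are g / i / m / s / u / y (plus d in engines that implement match indices)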

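Literal vs. constructor

There are two ways to create a regex: a literal or the constructor.
Differences:
1. A literal is compiled when the script is parsed, so it is usually the more efficient choice for a fixed pattern.
2. The constructor compiles the pattern at runtime, which is slower, but it accepts a string, so the pattern can be assembled dynamically.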

const str = 'www.music.qq.com/';
const reg1 = /qq\.com/g; // escape the dot so it only matches a literal "."
const reg2 = new RegExp(/qq\.com/, 'g');

console.log(reg1.test(str)); // true
console.log(reg2.test(str)); // true
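
To illustrate the runtime flexibility of the constructor, here is a minimal sketch that builds a regex from a string known only at runtime. `escapeRegExp` is a hypothetical helper (not part of these notes or of the language), and the sample strings are made up for the example.

```javascript
// Hypothetical helper: escapes characters that have special meaning in a pattern.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const keyword = 'qq.com'; // stands in for a value that arrives at runtime
const dynamicReg = new RegExp(escapeRegExp(keyword), 'i');

const matchesHost = dynamicReg.test('www.music.QQ.com/');
const matchesLookalike = dynamicReg.test('www.qqxcom/');

console.log(dynamicReg.source); // qq\.com
console.log(matchesHost);       // true
console.log(matchesLookalike);  // false
```

Without the escaping step the dot would match any character, so the lookalike host would also pass.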


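# flags syntax

flags

flag	Effect
g	Global match
i	Case-insensitive match
m	Multiline mode: ^ and $ match at the start and end of every line (lines are separated by \n or \r)
s	The dot matches every character, including line terminators
u	Unicode mode
y	Sticky match: matches only at lastIndex and does not search further along the string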
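
The effect of several flags in the table can be seen in a short sketch; the sample text and variable names are assumptions made for illustration.

```javascript
const text = 'first\nSecond';

// m: without it, ^ and $ only match at the start and end of the whole input
const withoutM = /^Second$/.test(text);
const withM = /^Second$/m.test(text);

// s: without it, the dot does not match a line feed
const withoutS = /first.Second/.test(text);
const withS = /first.Second/s.test(text);

// i: case-insensitive; g: collect every match instead of only the first
const withI = /second/i.test(text);
const allS = text.match(/s/gi);

console.log(withoutM, withM); // false true
console.log(withoutS, withS); // false true
console.log(withI);           // true
console.log(allS);            // [ 's', 'S' ]
```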

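# pattern

For details on pattern syntax, see the reading notes on 《正则表达式必知必会》.

# Basic syntax

Non-printing character	Meaning
\f	Form feed
\n	Line feed
\r	Carriage return
\t	Tab
\v	Vertical tab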

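# Advanced filtering

# Methods called directly

Rules

If a regex has both the sticky and global flags, exec() and test() ignore the global flag.
With sticky, a match is attempted only at lastIndex. A successful match moves lastIndex to the end of that match; a failed match resets lastIndex to 0.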
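
A minimal sketch of these rules, using a made-up input string: each sticky test only looks at the current lastIndex, and adding g does not make test() search ahead.

```javascript
const input = 'foofoo bar';
const sticky = /foo/y;

const first = sticky.test(input);     // matches at index 0
const afterFirst = sticky.lastIndex;  // end of that match
const second = sticky.test(input);    // matches at index 3
const afterSecond = sticky.lastIndex;
const third = sticky.test(input);     // index 6 is a space, and sticky does not search ahead
const afterThird = sticky.lastIndex;  // reset after the failed match

console.log(first, afterFirst);   // true 3
console.log(second, afterSecond); // true 6
console.log(third, afterThird);   // false 0

// With both g and y, test() behaves as sticky: "bar" exists later but not at index 0
const both = /bar/gy;
const bothResult = both.test(input);
const globalOnly = /bar/g.test(input);
console.log(bothResult, globalOnly); // false true
```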

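# exec

Purpose: walks through the matches one at a time (when the g or y flag is set)
Usage: reg.exec(str)
Parameters: String
Returns: Array | null

exec vs. match

Both exec and match can retrieve every match.
exec returns one match per call, with its index and capture groups, so matches can be processed one by one; with the g flag, match returns only an array of the matched strings.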
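
The difference can be sketched as follows; the input string and variable names are assumptions chosen for the example.

```javascript
const text = 'a1b22c333';
const digits = /\d+/g;

// exec: one match per call, each with its index; null ends the loop
const found = [];
let m;
while ((m = digits.exec(text)) !== null) {
  found.push({ value: m[0], index: m.index });
}

// match with g: only the matched strings
const strings = text.match(digits);

console.log(found);
// [ { value: '1', index: 1 }, { value: '22', index: 3 }, { value: '333', index: 6 } ]
console.log(strings); // [ '1', '22', '333' ]
```

The loop relies on the g flag: without it, exec would return the first match on every call and the loop would never end.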

# test

Purpose: checks whether the string contains a match
Usage: reg.test(str)
Parameters: String
Returns: Boolean
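
One thing worth keeping in mind is that test is stateful when the regex carries the g (or y) flag, because it advances lastIndex. A minimal sketch with made-up names:

```javascript
const plainReg = /\d/;
const plainResults = [plainReg.test('a1'), plainReg.test('a1')];

const globalReg = /\d/g;
// The first call matches at index 1 and moves lastIndex to 2;
// the second call starts at 2, finds nothing, and resets lastIndex to 0.
const globalResults = [globalReg.test('a1'), globalReg.test('a1')];

console.log(plainResults);  // [ true, true ]
console.log(globalResults); // [ true, false ]
```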

# toString

Purpose: returns the regex as a string
Usage: reg.toString()
Returns: String

const reg = /abc/gmi;
console.log(reg.toString()); // '/abc/gmi'


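# Examples

Some common regex usages

# Properties

# lastIndex

Purpose: in global or sticky mode, holds the index at which the next match starts, i.e. the end index of the previous match (its start index plus its length)
Usage: reg.lastIndex
Value: Number

Note: this property is only used in global (g) or sticky (y) mode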

const str = 'hello';
const reg = /o/g;

// match found at index 4, so lastIndex becomes 5
console.log(reg.exec(str)); // [ 'o', index: 4, input: 'hello', groups: undefined ]
console.log(reg.lastIndex); // 5

// no match, so lastIndex is reset to 0
console.log(reg.exec(str)); // null
console.log(reg.lastIndex); // 0

console.log(reg.exec(str)); // [ 'o', index: 4, input: 'hello', groups: undefined ]
console.log(reg.lastIndex); // 5


# source

Purpose: returns the pattern text of the regex (without the flags)
Usage: reg.source
Value: String

const reg = /aaa/gim;
console.log(reg.source); // aaa


# dotAll

Purpose: indicates whether the s flag is set
Usage: reg.dotAll
Value: Boolean

const reg1 = /./;
const reg2 = /./s;

console.log(reg1.dotAll); // false
console.log(reg2.dotAll); // true


# global

Purpose: indicates whether the g flag is set
Usage: reg.global
Value: Boolean

const reg1 = /./ig;
const reg2 = /./i;

console.log(reg1.global); // true
console.log(reg2.global); // false


# ignoreCase

Purpose: indicates whether the i flag is set
Usage: reg.ignoreCase
Value: Boolean

const reg1 = /i/g;
const reg2 = /i/ig;

console.log(reg1.ignoreCase); // false
console.log(reg2.ignoreCase); // true


# multiline

Purpose: indicates whether the m flag is set
Usage: reg.multiline
Value: Boolean

const reg1 = /m/m;
const reg2 = /m/;

console.log(reg1.multiline); // true
console.log(reg2.multiline); // false


# sticky

Purpose: indicates whether the y flag is set
Usage: reg.sticky
Value: Boolean

const reg1 = /aa/y;
const reg2 = /aa/;

console.log(reg1.sticky); // true
console.log(reg2.sticky); // false


# unicode

Purpose: indicates whether the u flag is set
Usage: reg.unicode
Value: Boolean

const reg1 = /a/u;
const reg2 = /a/;

console.log(reg1.unicode); // true
console.log(reg2.unicode); // false


# flags

flags

Purpose: returns all flags of the regex
Usage: reg.flags
Value: String
tip: the flags (g/i/s/m/u/y) are always returned in alphabetical order
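
A short sketch of the ordering behaviour, with made-up variable names: whatever order the flags are written in, the property reports them alphabetically.

```javascript
const shuffled = new RegExp('a', 'yig');
const literal = /a/msi;

console.log(shuffled.flags); // 'giy'
console.log(literal.flags);  // 'ims'
```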

# hasIndices

tip: requires a recent runtime
Purpose: indicates whether the d flag is set
Usage: reg.hasIndices
Value: Boolean

Environment	Supported since
chrome	90
node	16
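# Symbol methods

Methods keyed by a well-known Symbol are mostly called internally by the corresponding String.prototype.xxx method.
For example, RegExp.prototype[Symbol.match] is called internally by String.prototype.match().

# match

tip: this method is called internally by String.prototype.match()
Usage: str1.match(reg)
Parameters: RegExp
Returns: Array | null
tip: an argument that is not a RegExp is implicitly converted with new RegExp(reg)
tip: the shape of the returned array depends on whether the g flag is set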

let a = 'hello world!';
let b = /l/g;
let c = /l/;

// With g: an array of every matched string
console.log(a.match(b)); // ['l', 'l', 'l']

// Without g: only the first match, with extra details (same shape as exec)
console.log(a.match(c));
/**
 * {
 *  [0]: 'l',
 *  index: 2,
 *  input: 'hello world!',
 *  groups: undefined
 * }
 */


Proof that String.prototype.match() calls it internally

const reg = /1/g;
const str = '123';

console.log(str.match(reg)); // [ '1' ]

reg[Symbol.match] = (str) => ['嘿', '嘿嘿', '嘿嘿嘿']
console.log(str.match(reg)); // ['嘿', '嘿嘿', '嘿嘿嘿']


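# matchAll

tip: this method is called internally by String.prototype.matchAll()
Usage: str1.matchAll(reg)
Parameters: RegExp
Returns: iterator over all matches, each including its capture groups
tip: the regex must have the g flag, otherwise a TypeError is thrown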

let a = 'hello world!';
let b = /l/g;
let c = /l/;

console.log(a.matchAll(b)); // iterator
console.log(Array.from(a.matchAll(b)));
/**
 * [
 *    [ 'l', index: 2, input: 'hello world!', groups: undefined ],
 *    [ 'l', index: 3, input: 'hello world!', groups: undefined ],
 *    [ 'l', index: 9, input: 'hello world!', groups: undefined ]
 * ]
 */

// Without the g flag matchAll throws, so guard the call
try {
  a.matchAll(c);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
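
Since the iterator yields full match objects, capture groups are available for every match, which match with the g flag does not provide. A minimal sketch with a made-up key=value string:

```javascript
const settings = 'k1=v1;k2=v2';
const pairPattern = /(\w+)=(\w+)/g;

// Each item has the same shape as an exec result: [full, group1, group2], index, input
const pairs = [...settings.matchAll(pairPattern)].map((m) => ({
  key: m[1],
  value: m[2],
  index: m.index,
}));

console.log(pairs);
// [ { key: 'k1', value: 'v1', index: 0 }, { key: 'k2', value: 'v2', index: 6 } ]
```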


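# replace

tip: called internally by String.prototype.replace()
Purpose: replaces one or all occurrences of the matched text
Usage: str1.replace(reg, str2)
Parameter 1: String | RegExp
Parameter 2: String | (match[, p1, p2, ...], offset, input) => String
Returns: String
tip: str1 itself is not modified
tip: the second parameter can be a function; when the first parameter is a regex, the arguments it receives depend on whether the regex has capture groups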
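
A minimal sketch of the function form, using a made-up date string: the callback receives the full match, then one argument per capture group, then the offset of the match.

```javascript
const original = '2021-12-06 and 2022-01-15';
const datePattern = /(\d{4})-(\d{2})-(\d{2})/g;

const offsets = [];
const swapped = original.replace(datePattern, (match, year, month, day, offset) => {
  offsets.push(offset); // where this match starts in the input
  return `${day}/${month}/${year}`;
});

console.log(swapped);  // '06/12/2021 and 15/01/2022'
console.log(offsets);  // [ 0, 15 ]
console.log(original); // '2021-12-06 and 2022-01-15' (unchanged)
```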

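# search

tip: called internally by String.prototype.search()
Purpose: finds the index of the first match in the string
Usage: str.search(reg)
Parameters: RegExp
Returns: Number
tip: returns -1 when nothing matches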

var str = "hey JudE";
var re = /[A-Z]/g;
var re2 = /[.]/g;

console.log(str.search(re)); // 4
console.log(str.search(re2)); // -1


search vs. indexOf

search accepts a regex, but only returns the index of the first match
indexOf accepts a start index, but does not accept a regex

const str = '1231';
console.log(str.search(/1/, 1)); // 0, search ignores a second argument
console.log(str.indexOf('1', 1)); // 3


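# split

tip: called internally by String.prototype.split()
Usage: str1.split(str2|reg[, len])
Parameters: String | RegExp[, Number]
Returns: Array, the pieces of str1 separated by str2/reg
tip: the second parameter limits the length of the returned array
tip: the first parameter can be a regex; in that case String.prototype.split() calls this method internally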
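
A short sketch of splitting with a regex and with a length limit; the input strings are made up for the example.

```javascript
const list = 'a, b;c ,d';
const separator = /\s*[,;]\s*/; // a comma or semicolon with optional surrounding spaces

const parts = list.split(separator);
const firstTwo = list.split(separator, 2);

// A capture group in the separator keeps the separators in the result
const withSeparators = 'a1b2c'.split(/(\d)/);

console.log(parts);          // [ 'a', 'b', 'c', 'd' ]
console.log(firstTwo);       // [ 'a', 'b' ]
console.log(withSeparators); // [ 'a', '1', 'b', '2', 'c' ]
```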

# species

Same as Symbol.species

class MyRegExp extends RegExp {
  // Override the species of MyRegExp with the parent RegExp constructor
  static get [Symbol.species]() { return RegExp; }
}

