19-Nov-2017 03:13

Additionally, it will print out the byte offset where the invalid byte sequence occurred.
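The tool being described is not named in this excerpt. As a rough analogue only, Python's strict decoder reports the same kind of information through the `start` attribute of `UnicodeDecodeError`. The helper name `first_invalid_offset` is hypothetical and introduced just for this sketch.

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding first fails, or None if it succeeds."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as exc:
        return exc.start  # index of the first byte of the offending sequence
    return None


if __name__ == "__main__":
    print(first_invalid_offset(b"ab\xffcd"))
    print(first_invalid_offset("plain text".encode("utf-8")))
```

Note that the offset counts bytes, not characters, so it can be larger than the character position when multi-byte characters precede the error.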

Edit: The output encoding doesn't have to be specified; it will be assumed to be UTF-8.

A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules: For a 1-byte character, the first bit is 0, followed by its 7-bit Unicode code.

For an n-byte character, the first n bits are all ones and the (n+1)th bit is 0, followed by n-1 bytes whose two most significant bits are 10.
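To make these bit patterns concrete, here is a small Python sketch that prints the bytes of an encoded character in binary. The helper name `utf8_bits` is made up for illustration; it relies only on the standard `str.encode` and `format` built-ins.

```python
from typing import List


def utf8_bits(ch: str) -> List[str]:
    """Return each byte of the UTF-8 encoding of ch as an 8-digit binary string."""
    return [format(b, "08b") for b in ch.encode("utf-8")]


if __name__ == "__main__":
    for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
        # The lead byte shows the length prefix; every later byte starts with 10.
        print(f"U+{ord(ch):04X}", " ".join(utf8_bits(ch)))
```

For example, U+20AC (the euro sign) comes out as `11100010 10000010 10101100`: three leading ones and a zero in the first byte, then two continuation bytes beginning with `10`.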

But the second continuation byte does not start with 10, so it is invalid.

In general, text is more likely to contain shorter UTF-8 sequences than longer ones, so you might as well handle the shorter cases first to save a few CPU cycles. That makes the code more readable (and saves one pointless subtraction). Just AND with the bitmask to specify which bits you are interested in inspecting.
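A minimal sketch of that advice in Python, assuming the input is a `bytes` object. It is not the code under review, which is not shown in this excerpt, and the name `is_valid_utf8` is hypothetical. It checks only the structural rules stated above; it does not reject overlong encodings, surrogates, or code points beyond U+10FFFF.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check lead-byte and continuation-byte bit patterns, shortest sequences first."""
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        # AND with a mask that keeps only the prefix bits, then compare.
        if (lead & 0x80) == 0x00:      # 0xxxxxxx: 1-byte character
            extra = 0
        elif (lead & 0xE0) == 0xC0:    # 110xxxxx: 2-byte character
            extra = 1
        elif (lead & 0xF0) == 0xE0:    # 1110xxxx: 3-byte character
            extra = 2
        elif (lead & 0xF8) == 0xF0:    # 11110xxx: 4-byte character
            extra = 3
        else:
            return False               # stray continuation byte or invalid prefix
        if i + extra >= n:
            return False               # sequence is cut off by the end of the input
        for j in range(i + 1, i + extra + 1):
            if (data[j] & 0xC0) != 0x80:   # continuation bytes must be 10xxxxxx
                return False
        i += extra + 1
    return True


if __name__ == "__main__":
    print(is_valid_utf8("h\u00e9llo".encode("utf-8")))
    print(is_valid_utf8(b"\xe4\xb8\x41"))
```

Testing the 1-byte case first means plain ASCII input takes a single comparison per byte, and comparing the masked value directly avoids any arithmetic on the lead byte.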

Defaults to true. Returns boolean. It evaluates the address in two parts, first evaluating the host and, if that is legal, then evaluating the user name.