May 2013, rev Nov 2022
Maybe you've noticed the mysterious code inserted in a file in WordPress' root installation folder and
wondered what its purpose is. In short, it allows permalinks to work. If that's not an adequate
explanation, and you'd like to really understand what every character means and what its purpose
is, this article is for you. We're going to put this code under a big magnifying glass. In the end,
you'll understand everything there is to know about this block of code.
During installation, WordPress enters the following rules into the .htaccess file in its
installation root. This assumes you are using Apache server software, one of the most commonly
used web servers. Other server software uses different methods to achieve permalink functionality;
those methods are outside the scope of this article. These .htaccess rules are not inserted, however,
if you chose the "default" (none) permalink structure. Some themes or plugins may cause additional lines
to be entered. These are the rules inserted upon a fresh installation.
If there is not a .htaccess file in the installation folder, one is created:
# BEGIN WordPress
# The directives (lines) between "BEGIN WordPress" and "END WordPress" are
# dynamically generated, and should only be modified via WordPress filters.
# Any changes to the directives between these markers will be overwritten.
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
If your WordPress installation is not in your site's public html root, the RewriteBase and
final RewriteRule will include the path to your WordPress installation folder.
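As a rough illustration, assume WordPress lives in a folder named myblog directly under the public html root (the folder name is only an example). The generated block would then be expected to look something like the sketch below. Only the RewriteBase line and the final RewriteRule differ from the root installation shown above.

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
# The base now names the (assumed) installation folder
RewriteBase /myblog/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# The rewrite target includes the same folder path
RewriteRule . /myblog/index.php [L]
</IfModule>
# END WordPress
```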
The main purpose of this rule set is to allow arbitrary paths (permalinks) to be sent to the proper
WordPress script without causing file/path not found 404 errors. If your permalinks work but you see no
.htaccess file on your server, it is because the file is hidden. The Linux operating system treats
any filename beginning with a dot (.) as a hidden file. Your FTP client has a setting somewhere to enable viewing of
hidden files. Select that setting and .htaccess will be visible.
The first and last lines starting with # are comments so you know how these rules ended up in your
.htaccess file. WordPress core code uses these comments to locate the related code when it needs
to alter the rules for various reasons. They do nothing on their own and serve no other purpose. Any
line where the first character is a
# is a comment.
The <IfModule mod_rewrite.c> and </IfModule> lines work as a pair and
prevent errors if your server does not have the mod_rewrite.c module installed. If the module is missing,
the lines in between are ignored. Even if
you're sure your server will always have this module available, you may wish to leave these
lines in place so if the module becomes unavailable for any reason, your site will not totally break. However, without the module,
permalinks will fail to work. But a 404 not found error is better than a 500 this site is borked error.
The RewriteEngine On line switches on the rewrite engine, telling the server to process the
Rewrite* lines in this file.
You could set the engine to Off to not process any rules. During testing, this is much easier than
commenting out every rule.
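Here is a minimal sketch of that testing trick, assuming you are editing the rules by hand. Keep in mind that WordPress may regenerate anything between its BEGIN and END markers, so treat this as a temporary change only.

```apache
<IfModule mod_rewrite.c>
# Engine switched off: none of the Rewrite* lines below are applied,
# so permalinks return 404 until this is set back to On.
RewriteEngine Off
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
```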
The RewriteBase line defines the base from which all relative path references are taken. A lone / means the base is
your site's public html root. The actual value will be the URL path to the installation folder for
WordPress. If your blog index page is accessed by https://example.com/myblog/index.php, the
line would read RewriteBase /myblog/.
The rule RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}] assigns the value passed in the Authorization HTTP request header to the environment variable HTTP_AUTHORIZATION.
The HTTP_AUTHORIZATION value is in turn made available in PHP's $_SERVER superglobal array. We do this because in some cases
Apache will strip out the passed value before invoking any CGI scripts. This is intended to be a security measure, but without the value WordPress
cannot see the credentials sent with a request, so features that rely on HTTP authentication fail. If your PHP is not installed as a CGI script, or if you don't use HTTP authentication,
you likely do not need this rule, but it does no harm to leave it in place.
As this is the first rewrite rule, with no conditions above it, this rule is always processed. The rule basically
says for every request, assign the passed authorization value to the specified environment variable. Let's take a closer look.
All RewriteRule directives have two arguments and an optional flag setting. The first argument is a
regexp (regular expression) to attempt to match against the request. The second argument is what to rewrite the request to if there
is a match of the first argument. In our case, the portion within square brackets is an "E" flag, telling the
rewrite engine to assign a value to an environment variable. More on this later. The
second argument in our case is -, which is a special character telling the engine to not
actually rewrite anything, to only do what the flags indicate if there's a match.
The first argument .* is a regexp, where several characters have special
meaning. The regexp token . alone means
match any single character in the request exactly once. The *
token means repeat the previous token as many times as possible. In other words the regexp .* will match everything in any request.
The E= part of the flag is its identifier, telling the engine what to do with the remaining arguments. Nothing is assigned
to E; the = just tells the engine that more arguments follow. HTTP_AUTHORIZATION
is the environment variable we want to assign a value to. The : serves as a separator token between two arguments.
The %{...} syntax is used for all mod_rewrite variables. HTTP:Authorization names the variable, whose value is taken from the
HTTP Authorization request header. Notably, this assignment occurs before Apache might strip out the value prior to passing the request on to CGI.
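To pull the pieces together, here is the same rule again with each part labeled in comments. This is only an annotated restatement of the rule discussed above, not an alternative form.

```apache
RewriteEngine On
# Argument 1 (pattern):       .*   matches any request, even an empty one
# Argument 2 (substitution):  -    leave the request as it is
# Flag:                       E=   set an environment variable
#   name before the colon:    HTTP_AUTHORIZATION
#   value after the colon:    %{HTTP:Authorization}, the Authorization request header
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
```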
The RewriteRule ^index\.php$ - [L] rule prevents any request for index.php from being processed further down as a
permalink, because it's not one. This will make more sense in a while. This rewrite rule is always processed, as
it has no conditions above it (no RewriteCond lines) that would prevent processing. The rule basically
says if the text index.php matches the request portion after the base definition, pass the request
along unchanged and do not process any more rules. Let's take a closer look.
Again, we have two arguments and an optional flag argument. In our case, [L] is a "Last" flag telling the
rewrite engine to not process any more rules if there's a match to the first argument. The
second argument again is -, telling the engine to not
actually rewrite anything, to only do what the flags indicate if there's a match.
The first argument is ^index\.php$. In this case, the special regexp characters
are ^ \ . and $. All other characters are literal matches against the request portion after the base definition.
So the regexp index alone means
the request must have the corresponding text index somewhere in its content. The ^
character means the match must start at the beginning of the request. Since our base is / (root),
^index matches any file or folder in root whose name begins with index. So
indexfolder/file.php would match this example regexp fragment. So would indexfile.php.
However, sindex.php would not match, since the i is not the first character. In order
to match, index has to be the first 5 characters.
In a regexp, the dot . means match any character
exactly once. Since we want to only match an actual dot, not any character, we indicate so with a \
backslash preceding the dot. So the regexp ^index. (with a dot but without the \) will match any character after the x,
such as indexQmore.php, but ^index\. will only match something with an actual dot after
the x, like index.more.php. So now, ^index\.php means the request must start
with index.php, such as index.php.moretext. However, the $ means match the very
end of the request. So now we have the full ^index\.php$ where the only possible
match is index.php. Nothing else will match:
index.php.moretext will not match, and pre.index.php will not match.
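The step-by-step build-up of the pattern can be summarized as a small table. The comment lines below collect the example requests used above. Only the last line is an actual directive.

```apache
RewriteEngine On
RewriteBase /
# Pattern         Request               Result
# ^index          indexfolder/file.php  match
# ^index          sindex.php            no match
# ^index.         indexQmore.php        match
# ^index\.        indexQmore.php        no match
# ^index\.        index.more.php        match
# ^index\.php     index.php.moretext    match
# ^index\.php$    index.php.moretext    no match
# ^index\.php$    pre.index.php         no match
# ^index\.php$    index.php             match
RewriteRule ^index\.php$ - [L]
```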
RewriteCond means a condition must be true in order for the next RewriteRule to
be processed. %{REQUEST_FILENAME} is a variable set by the server to contain the full file system
path that the request maps to, not just a filename as it may appear. The -f flag without the ! means the
condition is true if the first
argument resolves to a valid file. The ! negates the flag, so now with the
full !-f the condition is true
if the first argument does NOT resolve to a valid file. Since permalinks should not point to any valid
system filename, a typical permalink will cause this condition to be true. A reference to a
valid server file will cause this condition to be false.
The second condition, RewriteCond %{REQUEST_FILENAME} !-d, is similar to the first, except now we're checking for valid directories
(the -d flag) instead
of files. Both conditions must be true for the next rewrite rule to be processed. The following
rule will only be processed if the request is not a valid file and it is not a valid folder either.
If the request is a valid file or a valid folder, the following rule is not processed
and the request is passed on unchanged, meaning the file is served normally as though there was
no rewrite rule in place.
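To make the two conditions concrete, the sketch below lists three example requests in comments. The paths are illustrative only, and the results assume the first two exist on the server while the third does not.

```apache
RewriteEngine On
RewriteBase /
# Consecutive RewriteCond lines are combined with AND, so the rule below
# runs only when both tests pass.
#   wp-content/uploads/photo.jpg  existing file:    !-f is false, served as is
#   wp-admin/                     existing folder:  !-d is false, served as is
#   2022/11/hello-world/          neither exists:   both true, sent to /index.php
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```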
The final rule, RewriteRule . /index.php [L], is essentially only processed if the request is some sort of permalink. Any other
valid file system path is passed on without change. Once again, there are two arguments and a flag.
The [L] flag once again means do not process any other rules after this one. There are
often no other rules after this, but it is placed here just to be safe. The dot . means match
any one character. The /index.php means replace the entire original request with /index.php.
If your WordPress installation is in a subfolder of your public html root, you will see the actual folder path here
as well.
This rule basically says send any permalink requests to index.php for further processing by
WordPress. WordPress gets the original request from a different variable, so it doesn't matter that
the request gets rewritten here; WordPress will still know what the original request was.
Though the request is sent to the new location (index.php), the user's browser continues to display
the original request in its address bar. This is by design for internal references. External
references (URLs that include the https://www.domain.com portion and point to a different host) result in a 302 server
response (temporary redirect) and the address bar changes.
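The difference can be sketched with two rules that are not part of the WordPress block. The page names and the example.org host are made up for illustration. The R flag is added here as an assumption so the redirect is explicit rather than implied by the target URL.

```apache
RewriteEngine On
RewriteBase /

# Internal rewrite: the server quietly serves /new-page.php while the
# address bar keeps showing the original old-page request.
RewriteRule ^old-page$ /new-page.php [L]

# External redirect: the browser receives a 302 response and then requests
# the new URL itself, so the address bar changes.
RewriteRule ^moved-page$ https://www.example.org/new-page [R,L]
```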
The / preceding index.php means this is not a relative link, so the initial
RewriteBase / directive does not affect the resulting path. If we omitted the /
and just had index.php, we would have a relative link, so the base would be applied before it.
Since the base is /, we end up with the same file path either way in this particular case.
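A minimal sketch of the relative form described above, assuming the root installation. It is shown only for comparison. The rule WordPress writes uses the leading /.

```apache
RewriteEngine On
RewriteBase /
# Relative target: the RewriteBase value (/) is placed in front of index.php,
# giving /index.php, the same path the absolute form names directly.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . index.php [L]
```

With a base of /myblog/ the same relative target would resolve to /myblog/index.php, which is why the base matters only for relative targets.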
Why do we match just one character with the dot (.) argument? Shouldn't we match the
entire request by using .*? [In regexp, * means repeat the previous match (with
the . meaning match anything) any number of times, including 0 times. Thus .* will
match the entire request, no matter what it is.] We could. But there's no point. Since the match
is not in parentheses, and there is no $1 style back reference in the /index.php
parameter to any such parentheses,
no part of the original request is carried forward. The entire request is discarded, to be replaced in its entirety
with the next argument, /index.php. It's a bit more efficient to match any one character
than all of them. Since we are required to provide some kind of regexp, a simple dot is the most expedient.
If the original request is always discarded, and previous RewriteCond lines determine if the rewrite is
performed, what is the point of requiring a regexp then? In this case, it serves no purpose, thus the dot .
is the most expedient regexp we can provide. But in many other situations, the regexp is used to
match particular portions of the request. Those portions can then be back referenced in the rewrite parameter
to feed useful information from the original request to the new rewritten page. The RewriteRule
functionality is much more powerful than what we need here, but it's getting off topic to discuss
in any more detail. That's for another day.
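For a quick taste of what a back reference looks like, here is a hypothetical rule that has nothing to do with WordPress. The archive path, the archive.php script, and the year parameter are invented for the example.

```apache
RewriteEngine On
RewriteBase /

# The parentheses capture a four-digit year, and $1 inserts the captured
# text into the new target. A request for archive/2022 is rewritten to
# /archive.php?year=2022
RewriteRule ^archive/([0-9]{4})$ /archive.php?year=$1 [L]
```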
Now that we see how all permalink requests get sent to index.php, the reason for the earlier line
RewriteRule ^index\.php$ - [L] can be more clearly seen. Rewritten requests
are also evaluated by .htaccess rules just as a fresh request would be. The second
RewriteRule line keeps the rewritten request (or a basic initial index.php request)
from being re-evaluated for possible permalink handling, as it can't possibly be a permalink.
Besides preventing an endless loop, it's just more efficient.
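The two passes can be traced in comments. The permalink below is made up, and the authorization rule is left out for brevity since it never changes the request.

```apache
# Pass 1: the browser requests /2022/11/hello-world/ (a made-up permalink)
#   ^index\.php$  does not match "2022/11/hello-world/", so processing continues
#   !-f and !-d   are both true, since no such file or folder exists
#   .             matches, the request becomes /index.php, and [L] ends this pass
# Pass 2: the rewritten request /index.php goes through the same rules again
#   ^index\.php$  matches "index.php", nothing is rewritten, and [L] stops processing
#   Apache runs index.php, and WordPress works out which post was requested
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
```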
If not for these rules, all permalinks would throw a 404 error before WordPress ever got a chance
to process the links. What was once mysterious gibberish should at least make some sense by now. As with
any computer script, once you understand what the various code bits mean and what is done with the
information, it's not all that complicated.
Comments, feedback, and questions are always welcome.
revised 8 Nov 2022 -- Add section for HTTP Authorization