What URL Encoding Means in Technical Terms
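RFC 3986 calls this mechanism percent-encoding. A URL may contain only a limited set of ASCII characters: “unreserved” characters, which can appear anywhere, and “reserved” characters, which act as delimiters. Any other character, and any reserved character used as literal data, must be converted before it can safely travel across the web.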
The transformation rule is straightforward:
% + two hexadecimal digits
The two hexadecimal digits give the numeric value of one byte of the original character. A character that occupies several bytes in UTF-8 produces one %XX triplet per byte.
Why This Transformation Is Necessary
If characters are not converted properly:
- Spaces interrupt the URL path
- Query strings may split incorrectly
- Special symbols may be misinterpreted
- Servers can respond with 400 or 404 errors
- Unencoded input can inject extra parameters or path segments, creating security risks
When applied correctly, the conversion ensures:
- Stable HTTP communication
- Accurate parsing of parameters
- Consistent cross-browser behavior
- Reliable decoding on the server side
How the Encoding Process Works Step by Step
Technically, each character is first converted to bytes (UTF-8 in modern systems, which matches ASCII for the first 128 characters). Every byte that is not allowed in its position is then written as a percent sign followed by its two-digit hexadecimal value.
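The steps above can be sketched as a small hand-written encoder. This is a minimal illustration, assuming the strict RFC 3986 rule that only unreserved characters (A–Z, a–z, 0–9, -, ., _, ~) stay literal; the function name `percentEncode` is hypothetical, and real code would normally rely on a built-in such as `encodeURIComponent`.

```javascript
// Hypothetical helper: percent-encodes every byte except RFC 3986 unreserved characters.
function percentEncode(text) {
  // Step 1: convert the string to UTF-8 bytes (identical to ASCII below code point 128).
  const bytes = new TextEncoder().encode(text);
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    // Step 2: unreserved ASCII characters pass through unchanged.
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch;
    } else {
      // Step 3: every other byte becomes "%" plus two uppercase hex digits.
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("new page")); // new%20page
console.log(percentEncode("50%"));      // 50%25
```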
Example 1: Space in a File Path
Original URL:
https://example.com/new page.html
A space is not permitted inside a path.
Character value of space:
- Decimal: 32
- Hexadecimal: 20
Corrected version:
https://example.com/new%20page.html
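For a complete address whose structure is already correct, JavaScript's built-in `encodeURI` performs this fix: it encodes the space but leaves structural characters such as `:` and `/` alone. A minimal sketch:

```javascript
// encodeURI escapes characters that are never valid in a URL (such as the space)
// but keeps reserved characters like ":", "/", "?" and "&" intact.
const original = "https://example.com/new page.html";
const corrected = encodeURI(original);

console.log(corrected); // https://example.com/new%20page.html
```

Because `encodeURI` deliberately leaves `&`, `=` and `?` untouched, it cannot protect those characters inside a parameter value; that case needs `encodeURIComponent`.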
Example 2: Special Symbols Inside Parameters
Original:
https://example.com/search?q=smart locker & indoor
If & is treated as data rather than a parameter separator, it must be encoded.
Proper version:
https://example.com/search?q=smart%20locker%20%26%20indoor
Converted elements:
- Space → %20
- & → %26
Important: The structural ampersand separating parameters must remain unchanged.
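In code, the safe pattern is to encode each parameter value separately and join the pieces with literal `&` and `=`. This sketch adds a hypothetical second parameter, `page`, purely to show the structural ampersand surviving:

```javascript
// Encode only the values; the "?", "=" and "&" that give the URL its structure stay literal.
const query = "smart locker & indoor";
const page = 2; // hypothetical extra parameter, for illustration only

const url =
  "https://example.com/search?q=" + encodeURIComponent(query) +
  "&page=" + encodeURIComponent(String(page));

console.log(url);
// https://example.com/search?q=smart%20locker%20%26%20indoor&page=2
```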
Categories of Characters in URLs
Recognizing character types helps avoid incorrect transformations.
Unreserved Characters
These characters are safe and can remain unchanged:
| Category | Characters |
|---|---|
| Letters | A–Z a–z |
| Numbers | 0–9 |
| Symbols | - _ . ~ |
Example:
https://example.com/product-123_A
No conversion is required.
Reserved Characters
The most common reserved characters define the structure of a URL:
| Character | Role |
|---|---|
| ? | Starts query string |
| & | Separates parameters |
| = | Assigns parameter values |
| # | Fragment reference |
| / | Path divider |
| : | Scheme separator |
If these symbols are encoded incorrectly, the address loses its logical structure.
Incorrect:
https://example.com/page%3Fid%3D10
Correct:
https://example.com/page?id=10
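Reserved characters are encoded only when they appear inside data. A minimal sketch using the WHATWG `URL` API (available in browsers and Node.js), which keeps `?` and `=` structural, followed by a made-up value, `a=b?c`, in which those same characters are data:

```javascript
// Build the correct structural URL: "?" and "=" are left literal by the URL API.
const url = new URL("https://example.com/page");
url.searchParams.set("id", "10");
console.log(url.href); // https://example.com/page?id=10

// Made-up value: here "=" and "?" are part of the data, so they must be encoded.
const value = "a=b?c";
console.log(encodeURIComponent(value)); // a%3Db%3Fc
```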
Characters That Must Be Converted
The following symbols should be transformed when used as literal data:
- Space
- "
- < >
- { }
- |
- \
- ^
- `
- %
Example with Percent Sign
Original:
https://example.com/50% discount
Safe version:
https://example.com/50%25%20discount
- % becomes %25
- Space becomes %20
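A short sketch of the same conversion with `encodeURIComponent`, plus what goes wrong when a raw `%` reaches a decoder: `decodeURIComponent` throws a `URIError` when `%` is not followed by two hexadecimal digits.

```javascript
// Encode the path segment: "%" becomes %25 and the space becomes %20.
const segment = "50% discount";
const safe = "https://example.com/" + encodeURIComponent(segment);
console.log(safe); // https://example.com/50%25%20discount

// A literal "%" followed by non-hex characters is not a valid escape sequence.
try {
  decodeURIComponent(segment);
} catch (err) {
  console.log(err instanceof URIError); // true
}
```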
URL Encoding in Web Applications
In practical development, when developers talk about URL-encoded data, they usually mean values that have already been converted into percent-encoded format before being sent in an HTTP request.
You will encounter such transformed values in:
- HTML form submissions
- AJAX calls
- Redirect parameters
- Tracking URLs
- REST API requests
Example: Form Data
User enters:
John Smith & Co.
Browser sends:
John+Smith+%26+Co.
Note: In the application/x-www-form-urlencoded format, spaces are replaced with + instead of %20.
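`URLSearchParams` produces the same application/x-www-form-urlencoded format in JavaScript. The field name `q` below is an assumption, since the example above shows only the value:

```javascript
// URLSearchParams serializes as application/x-www-form-urlencoded:
// spaces become "+", and "&" inside a value becomes %26.
// The field name "q" is assumed for illustration.
const body = new URLSearchParams({ q: "John Smith & Co." }).toString();
console.log(body); // q=John+Smith+%26+Co.

// Parsing reverses it, turning "+" back into a space.
const parsed = new URLSearchParams(body).get("q");
console.log(parsed); // John Smith & Co.

// decodeURIComponent does NOT treat "+" as a space, so it is the wrong decoder here.
console.log(decodeURIComponent("John+Smith")); // John+Smith
```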
Implementation in Code
Most programming languages provide built-in utilities.
JavaScript Example
Encoding a parameter:
encodeURIComponent("indoor locker & storage")
Result:
indoor%20locker%20%26%20storage
Decoding restores the original string:
decodeURIComponent("indoor%20locker%20%26%20storage")
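Two details are worth checking in code: decoding reverses encoding exactly, and `encodeURIComponent` leaves `!`, `'`, `(`, `)` and `*` unencoded even though RFC 3986 reserves them. The stricter wrapper below follows a widely used pattern for that case; `encodeRFC3986URIComponent` is a local helper, not a built-in.

```javascript
const input = "indoor locker & storage";

// Round trip: decoding the encoded form restores the exact original string.
const encoded = encodeURIComponent(input);
console.log(encoded);                               // indoor%20locker%20%26%20storage
console.log(decodeURIComponent(encoded) === input); // true

// Local helper (not a built-in): additionally escapes ! ' ( ) *,
// which encodeURIComponent leaves as-is.
function encodeRFC3986URIComponent(str) {
  return encodeURIComponent(str).replace(
    /[!'()*]/g,
    (c) => "%" + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("it's (50%)!"));        // it's%20(50%25)!
console.log(encodeRFC3986URIComponent("it's (50%)!")); // it%27s%20%2850%25%29%21
```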
Handling Unicode and Multilingual URLs
Modern systems rely on UTF-8. Non-ASCII characters are first converted into UTF-8 byte sequences and then expressed in percent format.
Example:
Original:
https://example.com/продукт
Converted:
https://example.com/%D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82
Each UTF-8 byte becomes %XX.
This mechanism enables proper handling of international content and multilingual websites.
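The two-stage process (UTF-8 bytes first, then %XX) can be observed directly with `TextEncoder`. A minimal sketch using the Cyrillic word продукт (“product”), whose percent-encoded form appears in the example above:

```javascript
const word = "продукт";

// Stage 1: each Cyrillic letter here becomes two UTF-8 bytes.
const bytes = Array.from(new TextEncoder().encode(word));
console.log(bytes.length); // 14

// Stage 2: every byte is written as "%" + two uppercase hex digits.
const manual = bytes
  .map((b) => "%" + b.toString(16).toUpperCase().padStart(2, "0"))
  .join("");

console.log(manual); // %D0%BF%D1%80%D0%BE%D0%B4%D1%83%D0%BA%D1%82
console.log(manual === encodeURIComponent(word)); // true
console.log(decodeURIComponent(manual) === word); // true
```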
Quick Reference Table
| Character | Decimal | Hex | Encoded |
|---|---|---|---|
| Space | 32 | 20 | %20 |
| ! | 33 | 21 | %21 |
| " | 34 | 22 | %22 |
| # | 35 | 23 | %23 |
| % | 37 | 25 | %25 |
| & | 38 | 26 | %26 |
| = | 61 | 3D | %3D |
| ? | 63 | 3F | %3F |
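Each row of the reference table can be derived mechanically from the character's code point. A short sketch that regenerates it; `tableRow` is a hypothetical helper name:

```javascript
// Hypothetical helper: for ASCII characters the code point equals the single
// UTF-8 byte, so decimal, hex and encoded form all follow from it.
function tableRow(ch) {
  const decimal = ch.charCodeAt(0);
  const hex = decimal.toString(16).toUpperCase().padStart(2, "0");
  return { char: ch, decimal, hex, encoded: "%" + hex };
}

const rows = [" ", "!", '"', "#", "%", "&", "=", "?"].map(tableRow);
console.table(rows);
```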
Frequent Implementation Mistakes
Double Conversion
Example:
%20 → %2520
This occurs when already transformed data is processed again.
Consequences:
- Broken redirects
- Corrupted parameters
- Complex debugging
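The problem is easy to reproduce. In this sketch a value is encoded once and then encoded again at another layer, so the `%` of `%20` itself gets escaped as `%25`:

```javascript
const once = encodeURIComponent("new page");
console.log(once);  // new%20page

// Encoding already-encoded data escapes the "%" of "%20" as "%25".
const twice = encodeURIComponent(once);
console.log(twice); // new%2520page

// A single decode on the server now yields "new%20page", not "new page".
console.log(decodeURIComponent(twice)); // new%20page
```

The usual remedy is to encode at exactly one boundary, typically where the URL is assembled, and to pass raw values everywhere else.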
Encoding the Entire Address
Incorrect:
encodeURIComponent("https://example.com/page?id=5")
Only parameter values should be transformed, not the structural components of the address.
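The sketch below shows what the incorrect call produces and two correct alternatives. The redirect URL `https://example.com/login?next=` is a hypothetical example of the one case where a whole address is legitimately encoded: when it is itself a parameter value.

```javascript
const target = "https://example.com/page?id=5";

// Wrong as a standalone address: every structural character is escaped.
console.log(encodeURIComponent(target));
// https%3A%2F%2Fexample.com%2Fpage%3Fid%3D5

// Right: build the address normally and encode only the value.
const pageUrl = "https://example.com/page?id=" + encodeURIComponent("5");
console.log(pageUrl); // https://example.com/page?id=5

// Also right: a full URL carried as a parameter value (hypothetical login redirect).
const loginUrl =
  "https://example.com/login?next=" + encodeURIComponent(target);
console.log(loginUrl);
// https://example.com/login?next=https%3A%2F%2Fexample.com%2Fpage%3Fid%3D5
```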
Inconsistent Canonical Handling
If equivalent encoded and unencoded forms of a page are both reachable (for example, /~user and /%7Euser), search engines may index them as duplicate URLs. URL generation logic should always emit a single, consistent form.
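One way to keep generated URLs consistent is to normalize percent-encoding before comparing or emitting them. This sketch applies two normalizations described in RFC 3986 (section 6.2.2): uppercase the hex digits in every %XX triplet, and decode triplets that stand for unreserved characters. The function name is hypothetical, and a real canonicalization layer would usually do more, such as redirecting alternate forms to the canonical URL.

```javascript
// Hypothetical helper: normalize percent-encoding per RFC 3986 section 6.2.2.
function normalizePercentEncoding(path) {
  return path.replace(/%([0-9A-Fa-f]{2})/g, (_match, hex) => {
    const ch = String.fromCharCode(parseInt(hex, 16));
    // Triplets for unreserved characters are decoded; all others get uppercase hex.
    return /^[A-Za-z0-9\-._~]$/.test(ch) ? ch : "%" + hex.toUpperCase();
  });
}

console.log(normalizePercentEncoding("/caf%c3%a9%7Euser")); // /caf%C3%A9~user

// The WHATWG URL parser already turns a raw space in the path into %20.
const pathname = new URL("https://example.com/new page.html").pathname;
console.log(normalizePercentEncoding(pathname)); // /new%20page.html
```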
Why Proper Implementation Matters
At its foundation, this mechanism ensures that characters unsafe for direct transmission are represented in a standardized, transport-friendly format. It keeps HTTP communication predictable and machine-readable.
It functions behind the scenes in:
- Web frameworks
- Routing systems
- E-commerce filters
- API endpoints
- Analytics tracking
When implemented correctly, it remains invisible to users. When misconfigured, it leads to malformed requests, indexing issues, and unstable application behavior.
A clear understanding of how character transformation works is essential for developers building reliable, scalable web systems.