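The problem: when a network request URL contains Chinese or other special characters, the request fails (this mostly affects GET requests, where the parameters travel in the URL). The fix is to percent-encode the URL before sending it.
URL encoding and decoding in Objective-C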
encode
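// Method in an NSString category: percent-encodes the receiver.
- (NSString *)urlEncode
{
    NSString *encode = [self stringByAddingPercentEncodingWithAllowedCharacters:[NSCharacterSet URLQueryAllowedCharacterSet]];
    // Fall back to the original string if encoding returned nil or an empty string.
    if (encode.length) {
        return encode;
    }
    return self;
}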
The encoding uses [NSCharacterSet URLQueryAllowedCharacterSet], which we look at in detail below.
decode
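// Method in an NSString category: removes percent-encoding from the receiver.
- (NSString *)urlDecode
{
    NSString *decode = [self stringByRemovingPercentEncoding];
    // Fall back to the original string if decoding returned nil or an empty string.
    if (decode.length) {
        return decode;
    }
    return self;
}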
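As a quick check of the two helpers, the sketch below calls the same two Foundation methods directly on a sample value and verifies the round trip. The sample strings are made up for illustration. The second one is an invalid escape sequence, for which stringByRemovingPercentEncoding returns nil; that is presumably why the category methods above fall back to returning self.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Hypothetical query value with Chinese text and a space.
        NSString *value = @"中国 BJ";

        NSString *encoded = [value stringByAddingPercentEncodingWithAllowedCharacters:
                                       [NSCharacterSet URLQueryAllowedCharacterSet]];
        NSString *decoded = [encoded stringByRemovingPercentEncoding];
        NSLog(@"encoded: %@", encoded);
        NSLog(@"round trip matches: %@", [decoded isEqualToString:value] ? @"YES" : @"NO");

        // "%" followed by non-hex characters is not a valid escape,
        // so decoding is expected to return nil here.
        NSString *broken = [@"100%zz" stringByRemovingPercentEncoding];
        NSLog(@"decoding an invalid sequence: %@", broken);
    }
    return 0;
}
```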
NSCharacterSet character sets
An NSCharacterSet object represents a set of Unicode-compliant characters. The API we use to encode a string is:
// Returns a new string made from the receiver by replacing all characters not in the allowedCharacters set with percent encoded characters. UTF-8 encoding is used to determine the correct percent encoded characters. Entire URL strings cannot be percent-encoded. This method is intended to percent-encode a URL component or subcomponent string, NOT the entire URL string. Any characters in allowedCharacters outside of the 7-bit ASCII range are ignored.
- (nullable NSString *)stringByAddingPercentEncodingWithAllowedCharacters:(NSCharacterSet *)allowedCharacters API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
In other words, the method works on the UTF-8 bytes of the string and returns a new string in which every character not in allowedCharacters is replaced by its percent-encoded form. It is intended for a single URL component or subcomponent, not for a whole URL string, so scheme, host, path, and query should each be treated separately. Any characters in allowedCharacters outside the 7-bit ASCII range are ignored.
The point to remember, and one that other blog posts tend to leave out: the characters that get encoded are the ones NOT in allowedCharacters.
allowedCharacters can be a set you build yourself or one of the class properties on NSCharacterSet.
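To make the "not in allowedCharacters" rule concrete, here is a minimal sketch with a hand-built allowed set. The set (ASCII letters and digits only) and the input string are assumptions chosen for illustration, not something the API requires.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Custom allowed set: only ASCII letters and digits are left untouched.
        NSCharacterSet *allowed = [NSCharacterSet characterSetWithCharactersInString:
            @"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"];

        // Hypothetical input mixing allowed and disallowed characters.
        NSString *input = @"a b/c?d=中";
        NSString *encoded = [input stringByAddingPercentEncodingWithAllowedCharacters:allowed];

        // Everything outside the allowed set is escaped:
        // a%20b%2Fc%3Fd%3D%E4%B8%AD
        NSLog(@"%@", encoded);
    }
    return 0;
}
```

The letters and digits pass through unchanged, while the space, "/", "?", "=", and the Chinese character are all replaced by the percent-encoded form of their UTF-8 bytes.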
Common character sets
NSCharacterSet class property API
@interface NSCharacterSet (NSURLUtilities)
// Predefined character sets for the six URL components and subcomponents which allow percent encoding. These character sets are passed to -stringByAddingPercentEncodingWithAllowedCharacters:.
// Returns a character set containing the characters allowed in a URL's user subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLUserAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
// Returns a character set containing the characters allowed in a URL's password subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLPasswordAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
// Returns a character set containing the characters allowed in a URL's host subcomponent.
@property (class, readonly, copy) NSCharacterSet *URLHostAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
// Returns a character set containing the characters allowed in a URL's path component. ';' is a legal path character, but it is recommended that it be percent-encoded for best compatibility with NSURL (-stringByAddingPercentEncodingWithAllowedCharacters: will percent-encode any ';' characters if you pass the URLPathAllowedCharacterSet).
@property (class, readonly, copy) NSCharacterSet *URLPathAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
// Returns a character set containing the characters allowed in a URL's query component.
@property (class, readonly, copy) NSCharacterSet *URLQueryAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
// Returns a character set containing the characters allowed in a URL's fragment component.
@property (class, readonly, copy) NSCharacterSet *URLFragmentAllowedCharacterSet API_AVAILABLE(macos(10.9), ios(7.0), watchos(2.0), tvos(9.0));
@end
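How do these class properties differ? The official documentation alone does not make the differences obvious, so let's run a simple test and encode https://小明:pwd123@192.168.1.1:80/app/home/list?name=中国&address=BJ&page=2&pageCount=&role=1#index with each of them.
URL structure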
                    hierarchical part
        ┌───────────────────┴─────────────────────┐
                    authority               path
        ┌───────────────┴───────────────┐┌───┴────┐
  abc://username:password@example.com:123/path/data?key=value&key2=value2#fragid1
  └┬┘   └───────┬───────┘ └────┬────┘ └┬┘           └─────────┬─────────┘ └──┬──┘
scheme  user information     host     port                  query         fragment

  urn:example:mammal:monotreme:echidna
  └┬┘ └──────────────┬───────────────┘
scheme              path
URL breakdown
scheme | host | path | query | port | user | password | fragment |
---|---|---|---|---|---|---|---|
https | 192.168.1.1 | /app/home/list | name=中国&address=BJ&page=2&pageCount=&role=1 | 80 | 小明 | pwd123 | index |
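Since the API is meant to be applied per component, one way to avoid hand-encoding altogether is to let NSURLComponents assemble the URL from the parts in the table above. This is only a sketch of that alternative, not the approach used in the rest of this post; the user and password parts are left out to keep it short.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        // Assign each component in its unencoded form;
        // NSURLComponents percent-encodes each one when building the URL.
        NSURLComponents *components = [[NSURLComponents alloc] init];
        components.scheme = @"https";
        components.host = @"192.168.1.1";
        components.port = @80;
        components.path = @"/app/home/list";
        components.queryItems = @[
            [NSURLQueryItem queryItemWithName:@"name" value:@"中国"],
            [NSURLQueryItem queryItemWithName:@"address" value:@"BJ"],
            [NSURLQueryItem queryItemWithName:@"page" value:@"2"],
        ];
        components.fragment = @"index";

        // Log the assembled, percent-encoded URL string.
        NSLog(@"%@", components.URL.absoluteString);
    }
    return 0;
}
```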
Encoding results
Class property | Encoded text |
---|---|
URLUserAllowedCharacterSet | https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
URLPasswordAllowedCharacterSet | https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
URLHostAllowedCharacterSet | https%3A%2F%2F%E5%B0%8F%E6%98%8E%3Apwd123%40192.168.1.1%3A80%2Fapp%2Fhome%2Flist%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
URLPathAllowedCharacterSet | https%3A//%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list%3Fname=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
URLQueryAllowedCharacterSet | https://%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list?name=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
URLFragmentAllowedCharacterSet | https://%E5%B0%8F%E6%98%8E:pwd123@192.168.1.1:80/app/home/list?name=%E4%B8%AD%E5%9B%BD&address=BJ&page=2&pageCount=&role=1%23index |
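A comparison like the one in this table can be produced with a short loop. The sketch below is an assumed reconstruction of such a test (the original test code is not shown in this post); it encodes the same URL with each of the six class properties and logs one line per set.

```objc
#import <Foundation/Foundation.h>

int main(void) {
    @autoreleasepool {
        NSString *url = @"https://小明:pwd123@192.168.1.1:80/app/home/list?name=中国&address=BJ&page=2&pageCount=&role=1#index";

        // Parallel arrays keep the output in a fixed order.
        NSArray<NSString *> *names = @[
            @"URLUserAllowedCharacterSet", @"URLPasswordAllowedCharacterSet",
            @"URLHostAllowedCharacterSet", @"URLPathAllowedCharacterSet",
            @"URLQueryAllowedCharacterSet", @"URLFragmentAllowedCharacterSet",
        ];
        NSArray<NSCharacterSet *> *sets = @[
            NSCharacterSet.URLUserAllowedCharacterSet, NSCharacterSet.URLPasswordAllowedCharacterSet,
            NSCharacterSet.URLHostAllowedCharacterSet, NSCharacterSet.URLPathAllowedCharacterSet,
            NSCharacterSet.URLQueryAllowedCharacterSet, NSCharacterSet.URLFragmentAllowedCharacterSet,
        ];

        for (NSUInteger i = 0; i < names.count; i++) {
            NSString *encoded = [url stringByAddingPercentEncodingWithAllowedCharacters:sets[i]];
            NSLog(@"%@ | %@", names[i], encoded);
        }
    }
    return 0;
}
```

Results may differ between OS versions, so it is worth running this on the systems you actually target.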
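The differences are hard to spot in the table above, but it does show that each set encodes different characters. The list most often passed around online of the characters each set encodes is: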
URLFragmentAllowedCharacterSet "#%<>[\]^`{|}
URLHostAllowedCharacterSet "#%/<>?@\^`{|}
URLPasswordAllowedCharacterSet "#%/:<>?@[\]^`{|}
URLPathAllowedCharacterSet "#%;<>?[\]^`{|}
URLQueryAllowedCharacterSet "#%<>[\]^`{|}
URLUserAllowedCharacterSet "#%/:<>?@[\]^`
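Is that list correct, and what is it based on? I could not find anything on Apple's site to confirm it, so let's run an experiment: encode the printable ASCII characters with each NSCharacterSet.
The string to encode is NSString *code = @" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"; which covers ASCII codes 32 through 126.
Encoding results
Class property | Encoded text | Characters that were encoded |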
---|---|---|
URLUserAllowedCharacterSet | %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % / : < > ? @ [ \ ] ^ ` { \| } |
URLPasswordAllowedCharacterSet | %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % / : < > ? @ [ \ ] ^ ` { \| } |
URLHostAllowedCharacterSet | %20!%22%23$%25&'()*+,-.%2F0123456789%3A;%3C=%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % / : < > ? @ [ \ ] ^ ` { \| } |
URLPathAllowedCharacterSet | %20!%22%23$%25&'()*+,-./0123456789:%3B%3C=%3E%3F@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % ; < > ? [ \ ] ^ ` { \| } |
URLQueryAllowedCharacterSet | %20!%22%23$%25&'()*+,-./0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % < > [ \ ] ^ ` { \| } |
URLFragmentAllowedCharacterSet | %20!%22%23$%25&'()*+,-./0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~ | space " # % < > [ \ ] ^ ` { \| } |
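The last column of such a table can be derived mechanically instead of by eye. The sketch below is one possible way to do it: it encodes the whole ASCII 32 to 126 string and collects every character that came back as a %XX escape. EncodedCharacters is a hypothetical helper written for this illustration, and it assumes "%" is never in the allowed set (true for the six predefined sets, as the results above show).

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the characters of `input` that were percent-encoded.
// Assumes '%' is not in `allowed`, so every '%' in the output starts an escape,
// and that `input` is pure ASCII, so each escape stands for exactly one character.
static NSString *EncodedCharacters(NSString *input, NSCharacterSet *allowed) {
    NSString *encoded = [input stringByAddingPercentEncodingWithAllowedCharacters:allowed];
    NSMutableString *result = [NSMutableString string];
    NSUInteger i = 0;
    while (i < encoded.length) {
        if ([encoded characterAtIndex:i] == '%' && i + 2 < encoded.length) {
            // Turn the two hex digits after '%' back into the original character.
            NSString *hex = [encoded substringWithRange:NSMakeRange(i + 1, 2)];
            unichar c = (unichar)strtoul(hex.UTF8String, NULL, 16);
            [result appendFormat:@"%C", c];
            i += 3;
        } else {
            i += 1;
        }
    }
    return result;
}

int main(void) {
    @autoreleasepool {
        // Build the test string from ASCII codes 32 through 126.
        NSMutableString *ascii = [NSMutableString string];
        for (unichar c = 32; c <= 126; c++) {
            [ascii appendFormat:@"%C", c];
        }

        NSArray<NSString *> *names = @[
            @"URLUserAllowedCharacterSet", @"URLPasswordAllowedCharacterSet",
            @"URLHostAllowedCharacterSet", @"URLPathAllowedCharacterSet",
            @"URLQueryAllowedCharacterSet", @"URLFragmentAllowedCharacterSet",
        ];
        NSArray<NSCharacterSet *> *sets = @[
            NSCharacterSet.URLUserAllowedCharacterSet, NSCharacterSet.URLPasswordAllowedCharacterSet,
            NSCharacterSet.URLHostAllowedCharacterSet, NSCharacterSet.URLPathAllowedCharacterSet,
            NSCharacterSet.URLQueryAllowedCharacterSet, NSCharacterSet.URLFragmentAllowedCharacterSet,
        ];

        for (NSUInteger i = 0; i < names.count; i++) {
            NSLog(@"%@ encodes: [%@]", names[i], EncodedCharacters(ascii, sets[i]));
        }
    }
    return 0;
}
```

Encoding the whole string at once, rather than one character at a time, keeps the result comparable with the table, because the outcome for a character can depend on where it sits in the string (the first ":" in the earlier URL test with URLPathAllowedCharacterSet is an example).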
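Conclusion: the list passed around online does not match the results I got by testing it myself. In day-to-day development the usual choice is URLQueryAllowedCharacterSet or URLFragmentAllowedCharacterSet (the two allow the same characters), because neither encodes the ?, /, and : that commonly appear in URLs.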
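Custom character sets
The analysis above gives us a working picture of the encoding. But URLQueryAllowedCharacterSet leaves characters such as '()*+,-. unencoded. What if that garbles data exchanged with other platforms? That is when a custom character set is needed.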
// Printable ASCII characters, codes 32 through 126.
NSString *code = @" !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";
// List the characters that should be encoded, then invert to get the allowed set.
NSCharacterSet *invertedSet = [[NSCharacterSet characterSetWithCharactersInString:@" \"#%<>[\\]^`{|}'()*+,-."] invertedSet];
NSString *encode = [code stringByAddingPercentEncodingWithAllowedCharacters:invertedSet];
// Encoded result: %20!%22%23$%25&%27%28%29%2A%2B%2C%2D%2E/0123456789:;%3C=%3E?@ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B%5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D~
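The string passed to characterSetWithCharactersInString: lists the characters to encode: a space followed by "#%<>[\]^`{|}'()*+,-.
Why call invertedSet on it? Because the set passed to stringByAddingPercentEncodingWithAllowedCharacters: is the set of characters that will NOT be encoded. Inverting the set built from our string means that exactly the characters we listed get encoded.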
End.