Punycode_idna
IDNA (Internationalized Domain Names in Applications) support.
This module provides ToASCII and ToUnicode operations as specified in RFC 5891 (IDNA 2008), using Punycode (RFC 3492) for encoding.
IDNA allows domain names to contain non-ASCII Unicode characters by encoding them using Punycode with an ACE prefix. This module handles the conversion between Unicode domain names and their ASCII-compatible encoding (ACE) form.
type error_reason =
  | Punycode_error of Punycode.error_reason
  | Invalid_label of string
      (* Label violates IDNA constraints. The string describes the violation. See RFC 5891 Section 4 for label validation requirements. *)
  | Domain_too_long of int
  | Normalization_failed
  | Verification_failed
      (* ToASCII/ToUnicode verification step failed (round-trip check). Per RFC 5891 Section 4.2, the result of encoding must decode back to the original input. *)
exception Error of error_reason
Exception raised for all IDNA processing errors.
val pp_error_reason : Format.formatter -> error_reason -> unit
pp_error_reason fmt e pretty-prints an error.
val error_reason_to_string : error_reason -> string
error_reason_to_string e converts an error to a human-readable string.
Maximum length of a domain name in bytes (253), per RFC 1035.
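As a minimal sketch of how such a limit might be applied, the check below compares the byte length of a name against 253. The names `limit` and `within_length_limit` are hypothetical and not part of this module, and the sketch assumes the name is already in its ASCII form with no trailing dot.

```ocaml
(* Hypothetical constant mirroring the 253-byte limit described above. *)
let limit = 253

(* Assumption: [domain] is already ASCII (ACE) and has no trailing dot,
   so its byte length is the length that counts against the limit. *)
let within_length_limit (domain : string) : bool =
  String.length domain <= limit

let () =
  Printf.printf "%b\n" (within_length_limit "example.com")
```

The limit is measured in bytes, so a Unicode name should be checked after conversion to ASCII rather than before.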
Converts an internationalized domain name to its ASCII-compatible encoding (ACE) form suitable for DNS lookup.
See RFC 5891 Section 4 for the complete ToASCII specification.
val to_ascii :
?check_hyphens:bool ->
?check_bidi:bool ->
?check_joiners:bool ->
?use_std3_rules:bool ->
?transitional:bool ->
string ->
string
to_ascii domain converts an internationalized domain name to ASCII.
Implements the ToASCII operation from RFC 5891 Section 4.1.
For each label in the domain:
1. If all ASCII, pass through (with optional STD3 validation)
2. Otherwise, normalize to NFC per Section 4.2.1 and Punycode-encode with ACE prefix
Optional parameters (per RFC 5891 Section 4 processing options):
check_hyphens: Validate hyphen placement per Section 4.2.3.1 (default: true)
check_bidi: Check bidirectional text rules per RFC 5893 (default: false, not implemented)
check_joiners: Check contextual joiner rules per RFC 5892 Appendix A.1 (default: false, not implemented)
use_std3_rules: Apply STD3 hostname rules per Section 4.2.3.2 (default: false)
transitional: Use IDNA 2003 transitional processing (default: false)
Example:
to_ascii "münchen.example.com"
(* = "xn--mnchen-3ya.example.com" *)
label_to_ascii label converts a single label to ASCII.
This implements the core ToASCII operation for one label, as described in RFC 5891 Section 4.1.
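The per-label steps listed for to_ascii can be sketched roughly as follows. This is an illustration only, not the module's implementation: `is_ascii` and `to_ascii_sketch` are hypothetical names, and `encode_label` is a stand-in parameter for NFC normalization followed by Punycode encoding, passed in so that the sketch compiles without any library. Validation options such as check_hyphens are left out.

```ocaml
(* True when every byte of [s] is below 0x80, i.e. the label is pure ASCII. *)
let is_ascii (s : string) : bool =
  let rec go i =
    i >= String.length s || (Char.code s.[i] < 0x80 && go (i + 1))
  in
  go 0

(* [encode_label] is a hypothetical stand-in for "normalize to NFC, then
   Punycode-encode"; it returns the encoded form without the ACE prefix. *)
let to_ascii_sketch ~(encode_label : string -> string) (domain : string) : string =
  String.split_on_char '.' domain
  |> List.map (fun label ->
         if is_ascii label then label      (* step 1: pass through *)
         else "xn--" ^ encode_label label) (* step 2: ACE prefix + encoded form *)
  |> String.concat "."

let () =
  (* Toy encoder that only knows one label; a real one would run Punycode. *)
  let toy_encode = function
    | "m\xc3\xbcnchen" -> "mnchen-3ya"
    | other -> other
  in
  print_endline
    (to_ascii_sketch ~encode_label:toy_encode "m\xc3\xbcnchen.example.com")
(* prints "xn--mnchen-3ya.example.com" *)
```

Passing the encoder as an argument keeps the label-splitting logic separate from the encoding itself, which is one plausible way to structure such code.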
Converts an ASCII-compatible encoded domain name back to Unicode.
See RFC 5891 Section 4.2 for the complete ToUnicode specification.
to_unicode domain converts an ACE domain name to Unicode.
Implements the ToUnicode operation from RFC 5891 Section 4.2.
For each label in the domain:
1. If it has the ACE prefix ("xn--"), Punycode-decode it per RFC 3492 Section 6.2
2. Otherwise, pass through unchanged
Example:
to_unicode "xn--mnchen-3ya.example.com"
(* = "münchen.example.com" *)
label_to_unicode label converts a single ACE label to Unicode.
This implements the core ToUnicode operation for one label, as described in RFC 5891 Section 4.2.
Functions that work with the domain-name library types.
These provide integration with the Domain_name module for applications that use that library for domain name handling.
val domain_to_ascii :
?check_hyphens:bool ->
?use_std3_rules:bool ->
[ `raw ] Domain_name.t ->
[ `raw ] Domain_name.t
domain_to_ascii domain converts a domain name to ASCII form.
Applies to_ascii to the string representation and returns the result as a Domain_name.t.
Example:
let d = Domain_name.of_string_exn "münchen.example.com" in
domain_to_ascii d
(* = Domain_name.of_string_exn "xn--mnchen-3ya.example.com" *)
val domain_to_unicode : [ `raw ] Domain_name.t -> [ `raw ] Domain_name.t
domain_to_unicode domain converts a domain name to Unicode form.
Applies to_unicode to the string representation and returns the result as a Domain_name.t.
is_idna_valid domain checks if a domain name is valid for IDNA processing.
Returns true if to_ascii would succeed on the domain.
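A check of this kind can be pictured as a thin wrapper that runs a raising converter and reports whether it succeeded. The sketch below is an assumption about the general shape, not the module's code: `Idna_error` is a local stand-in for the module's Error exception, and `succeeds` and `toy_convert` are hypothetical names.

```ocaml
(* Local stand-in for the module's [Error] exception, declared here so the
   sketch compiles on its own. *)
exception Idna_error of string

(* Run a raising converter and report only whether it succeeded. *)
let succeeds (convert : string -> string) (domain : string) : bool =
  match convert domain with
  | _ -> true
  | exception Idna_error _ -> false

(* Toy converter that rejects the empty string; purely illustrative. *)
let toy_convert (s : string) : string =
  if s = "" then raise (Idna_error "empty domain") else s

let () =
  Printf.printf "%b %b\n" (succeeds toy_convert "example.com") (succeeds toy_convert "")
(* prints "true false" *)
```

Only the stand-in exception is caught, so unrelated failures would still propagate to the caller.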
is_ace_label label is true if the label has the ACE prefix "xn--" (case-insensitive). This indicates the label is Punycode-encoded per RFC 3492 Section 5.
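A case-insensitive prefix test like the one described can be sketched in a few lines. The function name `is_ace_label_sketch` is hypothetical and the code is an illustration rather than the module's implementation.

```ocaml
(* True when [label] starts with "xn--" in any letter case. Labels shorter
   than four bytes cannot carry the prefix, so they are rejected first. *)
let is_ace_label_sketch (label : string) : bool =
  String.length label >= 4
  && String.lowercase_ascii (String.sub label 0 4) = "xn--"

let () =
  Printf.printf "%b %b %b\n"
    (is_ace_label_sketch "xn--mnchen-3ya")
    (is_ace_label_sketch "XN--MNCHEN-3YA")
    (is_ace_label_sketch "example")
(* prints "true true false" *)
```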
normalize_nfc s returns the NFC-normalized form of UTF-8 string s.
Per RFC 5891 Section 4.2.1, domain labels must be normalized to NFC (Unicode Normalization Form C) before encoding.
See Unicode Standard Annex #15 for details on Unicode normalization forms.
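To illustrate why normalization matters before encoding, the snippet below builds the same visible character "é" in two ways: as the single precomposed code point U+00E9, and as "e" followed by the combining acute accent U+0301. It performs no normalization itself and uses only the standard library. NFC maps the second form to the first, so without that step the two spellings would produce different encoded labels.

```ocaml
(* "é" as one precomposed code point, U+00E9, in UTF-8. *)
let precomposed = "\xc3\xa9"

(* "e" followed by U+0301 COMBINING ACUTE ACCENT, in UTF-8. *)
let decomposed = "e\xcc\x81"

let () =
  (* Same rendered text, different byte sequences. *)
  assert (precomposed <> decomposed);
  Printf.printf "%d bytes vs %d bytes\n"
    (String.length precomposed) (String.length decomposed)
(* prints "2 bytes vs 3 bytes" *)
```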