From 5ba5ba9203090f86b2e7fe3ddbb67bc6a3d1498f Mon Sep 17 00:00:00 2001
From: Addison Phillips
Date: Fri, 19 Aug 2022 10:23:22 -0700
Subject: [PATCH 1/3] Address [I18N-ACTION-1178] by merging useful bits of i18n-html-tech-lang

Merged definitions of intended audience and text-processing language with minor edits. Addition of a new example.
---
 index.html | 49 +++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index d9ea77c..fa7a2da 100644
--- a/index.html
+++ b/index.html
@@ -189,8 +189,53 @@

Languages and Language Tags

Specifications that define language tag matching MUST specify the matching algorithms available and the selection mechanism.

For example, JavaScript internationalization [[ECMA-402]] and [[CLDR]] provide a "best fit" algorithm which can be tailored by implementers.

+
+

Defining and using language tags

+ +

There are two common uses for language tags in document formats, protocols, and specifications. In some cases, language tags are used to provide metadata about intended audience for collections of content, such as at the record or document level. In other cases, language tags are used to identify the language of specific bits of text in order to facilitate text processing.

+ +
The language of the intended audience
- +

Metadata that describes the language of the intended audience is about the document as a whole. Such metadata may be used for searching, serving the right language version, classification, etc. Where there are language changes in a document, information about the language of the intended audience is not specific enough to support text-processing, that is to say, in a way that would be needed for the application of text-to-speech, styling, automatic font assignment, etc.

+ +

The language of the intended audience does not include every language used in a document. Many documents on the Web contain embedded fragments of content in different languages, whereas the page is clearly aimed at speakers of one particular language. For example, a German city-guide for Beijing may contain useful phrases in Chinese, but it is aimed at a German-speaking audience, not a Chinese one.

+ +

On the other hand, it is also possible to imagine a situation where a document contains the same or parallel content in more than one language. For example, a Web page may welcome Canadian readers with French content in the left column, and the same content in English in the right-hand column. Here the document is equally targeted at speakers of both languages, so there are two audience languages. This situation is not as common on the Web as in printed material since it is easy to link to separate pages on the Web for different audiences, but it does occur where there are multilingual communities. Another use case is a blog or a news page aimed at a multilingual community, where some articles on a page are in one language and some in another.

+ +

There are also pages where the navigational information, including the page title, is in one language but the real content of the page is in another. While this is not necessarily good practice, it doesn't change the fact that the language of the intended audience is usually that of the content, regardless of the language at the top of the document source.

+ +

Metadata about the language of the intended audience is usually best declared outside the document, such as in the HTTP Content-Language header.

+ +
+
The text-processing language
+ +

When specifying the text-processing language you are declaring the language in which a specific range of text is actually written, so that user agents or applications that manipulate the text, such as voice browsers, spell checkers, or style processors can effectively handle the text in question. So we are, by necessity, talking about associating a single language with a specific range of text.

+ +

This specificity distinguishes the declaration of the language for text-processing from that of the language of the intended audience.

+ +

The language for text-processing is usually best declared using attributes on elements, including setting a document-wide default.

+ + +
+
@@ -240,7 +285,7 @@

Locales and Internationalization

Since the adoption of the current [[BCP47]] identifier syntax, a number of locale models have adopted BCP47 directly or provided adaptation or mappings between proprietary models and language tags. Notably, the development and adoption of the open-source repository of locale data known as [[CLDR]] has led to wider general adoption of language tags as locale identifiers.

-

Common Locale Data Repository (or [[CLDR]]). The Common Locale Data Repository is a Unicode Consortium project that defines, collects, and curates sets of data needed to enable locales in systems or operating environments. CLDR data and its locale model are widely adopted, particularly in browsers.

+

Common Locale Data Repository (or [[CLDR]]). The Common Locale Data Repository is a Unicode Consortium project that defines, collects, and curates sets of data needed to enable locales in systems or operating environments. CLDR data and its locale model are widely adopted, particularly in browsers.

Unicode Locale Identifier or Unicode Locale. A language tag that follows the additional rules and restrictions on subtag choice defined in UTR#35 [[LDML]]. Any valid Unicode locale identifier is also a valid [[BCP47]] language tag, but a few valid language tags are not also valid Unicode locale identifiers.

From b9fb5652da3da841f954c41a9fd05e282770f3d9 Mon Sep 17 00:00:00 2001
From: Addison Phillips
Date: Thu, 25 Aug 2022 06:52:16 -0700
Subject: [PATCH 2/3] Address @r12a's comments

Change `kbd` to `span class=kw`
---
 index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index fa7a2da..ee76ac9 100644
--- a/index.html
+++ b/index.html
@@ -204,7 +204,7 @@

Defining and using language tags

There are also pages where the navigational information, including the page title, is in one language but the real content of the page is in another. While this is not necessarily good practice, it doesn't change the fact that the language of the intended audience is usually that of the content, regardless of the language at the top of the document source.

-

Metadata about the language of the intended audience is usually best declared outside the document, such as in the HTTP Content-Language header.

+

Metadata about the language of the intended audience is usually best declared outside the document, such as in the HTTP Content-Language header.

The text-processing language
@@ -216,7 +216,7 @@

Defining and using language tags

The language for text-processing is usually best declared using attributes on elements, including setting a document-wide default.

@@ -552,7 +508,59 @@

Locales and Internationalization

Users expect form fields and other data inputs to use a presentation for non-linguistic fields that is consistent with the document or application where the values appear. Users usually expect their input to match the document's context rather than that of the user agent or operating environment, and they expect input validation, prompting, and controls to be consistent with the content as well. This gives content authors the ability to create a wholly localized customer experience and is generally in keeping with customer expectations.

- + +
+

Choosing between metadata and text-processing language

+ +

There are two common uses for language tags in document formats, protocols, and specifications. In some cases, language tags are used to provide metadata about intended audience for collections of content, such as at the record or document level. In other cases, language tags are used to identify the language of specific bits of text in order to facilitate text processing.

+ +
+
The language of the intended audience
+ +

Metadata that describes the language of the intended audience is about the document as a whole. Such metadata may be used for searching, serving the right language version, classification, etc. Where there are language changes in a document, information about the language of the intended audience is not specific enough to support text-processing, that is to say, in a way that would be needed for the application of text-to-speech, styling, automatic font assignment, etc.

+ +

The language of the intended audience does not include every language used in a document. Many documents on the Web contain embedded fragments of content in different languages, whereas the page is clearly aimed at speakers of one particular language. For example, a German city-guide for Beijing may contain useful phrases in Chinese, but it is aimed at a German-speaking audience, not a Chinese one.

+ +

On the other hand, it is also possible to imagine a situation where a document contains the same or parallel content in more than one language. For example, a Web page may welcome Canadian readers with French content in the left column, and the same content in English in the right-hand column. Here the document is equally targeted at speakers of both languages, so there are two audience languages. This situation is not as common on the Web as in printed material since it is easy to link to separate pages on the Web for different audiences, but it does occur where there are multilingual communities. Another use case is a blog or a news page aimed at a multilingual community, where some articles on a page are in one language and some in another.

+ +

There are also pages where the navigational information, including the page title, is in one language but the real content of the page is in another. While this is not necessarily good practice, it doesn't change the fact that the language of the intended audience is usually that of the content, regardless of the language at the top of the document source.

+ +

Metadata about the language of the intended audience is usually best declared outside the document, such as in the HTTP Content-Language header.

+
+ +
+
The text-processing language
+ +

When specifying the text-processing language you are declaring the language in which a specific range of text is actually written, so that user agents or applications that manipulate the text (such as voice browsers, spell checkers, or style processors) can process the text in a language-appropriate manner. So we are, by necessity, talking about associating a single language with a specific range of text.

+ +

This specificity distinguishes the declaration of the language for text-processing from that of the language of the intended audience.

+ +

The language for text-processing is usually best declared using attributes on elements, including setting a document-wide default.
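
As an illustration only (not part of the patch), the German city guide described above might be marked up roughly as follows. The page content and the choice of the `zh` tag are assumptions made for this sketch. The root `lang` attribute sets the document-wide default, and a nested `lang` attribute overrides it for one range of text. The language of the intended audience would be declared separately, for example with an HTTP `Content-Language: de` header.

```html
<!DOCTYPE html>
<!-- Illustrative sketch only; the page content is invented for this example. -->
<html lang="de">
  <head>
    <meta charset="utf-8">
    <title>Reiseführer Peking</title>
  </head>
  <body>
    <!-- Inherits the document-wide default, lang="de", from the root element. -->
    <p>Nützliche Ausdrücke für Ihre Reise:</p>
    <!-- The nested lang attribute marks only the Chinese phrase as Chinese. -->
    <p><span lang="zh">你好</span> bedeutet „Hallo“.</p>
  </body>
</html>
```

Each range of text here has exactly one text-processing language, while the audience metadata still describes the page as a whole.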

+ + + +
+

Further Reading

diff --git a/local.css b/local.css
index bafbc19..921693c 100644
--- a/local.css
+++ b/local.css
@@ -77,6 +77,15 @@ kbd {
   text-align: start;
 }
 
+.kw {
+  font-family: Menlo, Consolas, "DejaVu Sans Mono", Monaco, monospace;
+  font-size: .95em;
+  color: blue;
+  page-break-inside: avoid;
+  hyphens: none;
+  text-transform: none;
+}
+
 .summary {
   padding: 1em;