soft hyphens

as handled by search engines.

Rolling dice to find out what will work often seems to be just as efficient as checking up on browsers' support for web standards. There is so much more than browsers to take into account, and support for standards is kind of hit and miss.

The way I interpret W3C standards on manual intro­duc­tion of word-break oppor­tuni­ties – “soft hyphens” etc. – the coding variants shown below are equivalent. The following four variants of a word should therefore be handled as they appear on screen – unbroken and without hidden characters – regardless of how they are written…

  • accessibility (control) written as: accessibility
  • ac­ces­si­bility written as: ac&shy;ces&shy;si&shy;bility
  • accessibility written as: ac<wbr>ces<wbr>si<wbr>bility
  • ac­ces­si­bility written as: ac&#173;ces&#173;si&#173;bility
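
To make the claimed equivalence concrete, here is a minimal Python sketch that reduces each source variant to the text a reader sees when no line break is needed. It assumes the second variant was written with the &shy; entity, and the helper name visible_word is hypothetical. It only handles the constructs discussed here, not HTML in general.

```python
import html
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, what &shy; and &#173; decode to


def visible_word(markup: str) -> str:
    """Return the word as it appears on screen when no line break is needed."""
    # <wbr> marks a break opportunity and contributes no character.
    without_wbr = re.sub(r"<wbr\s*/?>", "", markup, flags=re.IGNORECASE)
    # Decode character references such as &shy; and &#173;.
    decoded = html.unescape(without_wbr)
    # Drop soft hyphens, which stay invisible unless a line breaks there.
    return decoded.replace(SOFT_HYPHEN, "")


variants = [
    "accessibility",                     # control
    "ac&shy;ces&shy;si&shy;bility",      # assumed named entity
    "ac<wbr>ces<wbr>si<wbr>bility",
    "ac&#173;ces&#173;si&#173;bility",
]

for source in variants:
    print(f"{source!r} -> {visible_word(source)!r}")
```

All four sources come out as the same plain word, which is the behaviour the tests further down look for in search engines.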

Now to see how search engines handle these variants, bearing in mind what the W3C has said about the issue.

“For operations such as searching and sorting, the soft hyphen should always be ignored.” (W3C, Text: Hyphens)
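
Read literally, that recommendation means a search should compare text with the soft hyphens removed. A minimal Python sketch of the idea follows; normalize_for_search and find_matches are hypothetical names, the sample pages are made up for illustration, and nothing here describes how any real search engine is built.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD


def normalize_for_search(text: str) -> str:
    """Remove soft hyphens and fold case before comparing."""
    return text.replace(SOFT_HYPHEN, "").casefold()


def find_matches(query: str, documents: list[str]) -> list[str]:
    """Return the documents that contain the query, ignoring soft hyphens."""
    needle = normalize_for_search(query)
    return [doc for doc in documents if needle in normalize_for_search(doc)]


# Made-up sample pages: one with soft hyphens, one without, one unrelated.
pages = [
    "Notes on ac\u00adces\u00adsi\u00adbility for the web",
    "Notes on accessibility for the web",
    "Notes on typography",
]

# A query pasted with soft hyphens and a plain query find the same pages.
print(find_matches("ac\u00adces\u00adsi\u00adbility", pages))
print(find_matches("accessibility", pages))
```

Normalizing both the query and the stored text is the point: stripping only one side would still leave the two spellings unequal.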

A quick test with the four equivalent versions of “accessibility” as search-word in three different search engines gave the following results, listed in the same order as the four variants above…

  • Edge/Bing: good, fail, good, fail
  • Chrome/Google: good, good, good, good
  • Opera/DuckDuckGo: good, fail, good, fail
  • Opera/Bing: good, fail, good, fail
  • Opera/Google: good, good, good, good
  • Opera/Yahoo: good, fail, good, fail

The multi-test in Opera was done to show that it was the search engines that got tested, and not the browsers. Looks like Google has a good handle on things in this department, which isn't surprising.
Disappointing that Bing, Yahoo and DuckDuckGo* don't “strip out” and ignore these control characters like they are supposed to.
* (I use DuckDuckGo as default search engine in my browsers.)

For completeness, here are a few more tests…

  • Opera/Wikipedia: good, good, good, good
  • Opera/Amazon: good, fail, good, fail
  • Opera/ *: good, good, good, good

* (Search on is driven by Google Custom Search.)

Copying and pasting words with functional characters into a search field is one thing. A more important question is whether the search engines that fail this test are able to find words and sentences in files on the internet that have these and other functional characters in or between them.

I cannot properly check how well search engines handle soft hyphens on sites they crawl. I do however have a feeling that the tested engines are about as good, or as bad, at hand­ling func­tional and other­wise invisible characters any­where in all their operations, as they have shown in my tests for search-words.
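
One plausible way such failures could arise on the crawling side, offered as a guess and not as a description of any real engine: a tokenizer that splits text on non-word characters will cut a word at every soft hyphen. The Python sketch below shows that effect, and the fix of stripping U+00AD first. The names naive_tokens and safe_tokens are hypothetical.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD
WORD = re.compile(r"\w+")


def naive_tokens(text: str) -> list[str]:
    """Tokenize without any special handling of soft hyphens."""
    return WORD.findall(text.casefold())


def safe_tokens(text: str) -> list[str]:
    """Strip soft hyphens first, then tokenize."""
    return WORD.findall(text.replace(SOFT_HYPHEN, "").casefold())


page_text = "Web ac\u00adces\u00adsi\u00adbility matters"

print(naive_tokens(page_text))  # the hyphenated word is cut into fragments
print(safe_tokens(page_text))   # the word is kept whole
```

An index built from the naive tokens would hold no entry for the whole word, so a plain search for it could miss the page.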

other applications

I also took a tour around a few on-line dictionaries to see how they manage when served words containing soft hyphens. Without naming and listing any: out of seven visited, two dictionaries literally guessed the right word and did OK, and the other five failed.

Makes me wonder why these functional characters aren't handled the way they should be in all such applications. Why is the issue overlooked? When are those who have coded for failure going to fix their software?

to conclude

So, I have made these tests, and found some weaknesses in some major and minor search engines and other applications. The results are a little worse than I expected, but not sur­pris­ingly so.

I do use soft hyphens and other invisible characters when I write on/for the web, and see no reason to change a practice that works in all major browsers. That there still is software that cannot handle these characters properly isn't really my problem.

Since there are limits to what I can and/or want to spend time on checking, some of the more problematic but rare issues my articles and notes may run into may remain unknown to me until long after the problem-causing apps have lost access to the world wide web.

sincerely, Georg

Hageland 22.sep.2016
last rev: 23.sep.2016