soft hyphens
… as handled by search engines.
Rolling dice to find out what will work often seems to be just as efficient as checking up on browsers' support for web standards. There is so much more than browsers to take into account, and support for standards is kind of hit and miss.
The way I interpret W3C standards on manual introduction of word-break opportunities (“soft hyphens” etc.), the coding variants shown below are equivalent. The following four variants of a word should therefore be handled as they appear on screen, unbroken and without hidden characters, regardless of how they are written…
- accessibility (control) written as: accessibility
- accessibility written as: ac­ces­si­bility
- accessibility written as: ac<wbr>ces<wbr>si<wbr>bility
- accessibility written as: ac­ces­si­bility
Now to see how search engines handle these variants, bearing in mind what the W3C has said about the issue.
“For operations such as searching and sorting, the soft hyphen should always be ignored.” (W3C, Text: Hyphens)
A quick test with the four equivalent versions of “accessibility” as search-word in three different search engines gave the following results…
- Edge/Bing: good – fail – good – fail
- Chrome/Google: good – good – good – good
- Opera/DuckDuckGo: good – fail – good – fail
- Opera/Bing: good – fail – good – fail
- Opera/Google: good – good – good – good
- Opera/Yahoo: good – fail – good – fail
The multiple tests in Opera were done to show that it was the search engines that got tested, and not the browsers. Looks like Google has a good handle on things in this department, which isn't surprising.
It is disappointing that Bing, Yahoo and DuckDuckGo* don't “strip out” and ignore these control characters like they are supposed to.
* (I use DuckDuckGo as default search engine in my browsers.)
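What “strip out and ignore” could amount to for a search-word is sketched below in Python. This is only an illustration under my own assumptions, not how any of the tested engines work; `normalize_query` is a made-up name, and only the soft hyphen (U+00AD) is handled.

```python
SOFT_HYPHEN = "\u00ad"  # the invisible character behind the soft-hyphen variants


def normalize_query(text: str) -> str:
    """Return the text with soft hyphens removed (hypothetical helper).

    Meant for matching only; the stored or displayed text stays untouched.
    """
    return text.replace(SOFT_HYPHEN, "")


control = "accessibility"
hyphenated = "ac\u00adces\u00adsi\u00adbility"

print(control == hyphenated)                    # False
print(normalize_query(hyphenated) == control)   # True
```

The raw strings differ even though they look identical on screen, which would explain a “fail” when the pasted search-word is compared as-is.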
For completeness, here are a few more tests…
- Opera/Wikipedia: good – good – good – good
- Opera/Amazon: good – fail – good – fail
- Opera/gunlaug.com *: good – good – good – good
* (Search on gunlaug.com is driven by Google Custom Search.)
Copying and pasting words with functional characters into a search field is one thing. A more important question is whether search engines that fail this operation are able to find words and sentences that are written with these and other functional characters in or between them, in files on the internet.
I cannot properly check how well search engines handle soft hyphens on sites they crawl. I do however have a feeling that the tested engines are about as good, or as bad, at handling functional and otherwise invisible characters in all their operations as they have shown in my tests for search-words.
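For the crawling side, here is a rough Python sketch of how the text of a page could be reduced to what appears on screen before it is indexed. It is a guess at a reasonable approach, not a description of any real crawler; `TextExtractor` and `indexable_text` are names I made up, and the sketch only covers the variants tested above.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text content; tags such as <wbr> are simply skipped."""

    def __init__(self):
        # convert_charrefs=True decodes references like &shy; and &#173;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def indexable_text(markup: str) -> str:
    """Return the text of the markup with soft hyphens removed (hypothetical)."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Joining without a separator keeps words split by <wbr> in one piece.
    # A real indexer would also need to insert spaces at block-level tags.
    return "".join(parser.parts).replace("\u00ad", "")


for variant in (
    "accessibility",
    "ac&shy;ces&shy;si&shy;bility",
    "ac<wbr>ces<wbr>si<wbr>bility",
    "ac&#173;ces&#173;si&#173;bility",
):
    print(indexable_text(variant))
```

With the same treatment applied to both the crawled text and the search-word, all four variants would end up as the same indexed word.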
other applications
I also took a tour around a few on-line dictionaries to see how they manage when served words containing soft hyphens. Without naming and listing any: out of seven visited, two dictionaries literally guessed the right word and did OK, and the other five failed.
Makes me wonder why these functional characters aren't handled the way they should be in all such applications. Why is the issue overlooked? When are those who have coded for failure going to fix their software?
to conclude
So, I have made these tests, and found some weaknesses in some major and minor search engines and other applications. The results are a little worse than I expected, but not surprisingly so.
I do use soft hyphens and other invisible characters when I write on/for the web, and see no reason to change a practice that works in all major browsers. That there still is software that cannot handle these characters properly isn't really my problem.
Since there are limits to what I can and/or want to spend time on checking, some of the more problematic but rare issues my articles and notes may run into may remain unknown to me until long after the problem-causing apps have lost access to the world wide web.
sincerely
Hageland 22.sep.2016
23.sep.2016 - added local ref to article about my default search engine. Extended paragraphs in main article about "handling of soft hyphens while crawling web sites".
last rev: 23.sep.2016