Text Verification for Right-to-Left Languages
Summary
Arabic and Hebrew are both read from right to left. Keep this in mind when creating text validations for Synthetic Measurements, so that left-to-right and right-to-left text are not mixed up within the same string.
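One way to see whether a validation string mixes directions is to look at the Unicode bidirectional class of each character. The following is a minimal sketch, assuming Python 3 and the standard unicodedata module; the helper name directions is hypothetical and is not part of the Synthetic Agent.

```python
import unicodedata

# Strong right-to-left classes: Hebrew letters are "R", Arabic letters are "AL".
RTL_CLASSES = {"R", "AL"}

def directions(text):
    """Return the set of strong directions ("LTR", "RTL") found in text."""
    found = set()
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            found.add("LTR")
        elif bidi in RTL_CLASSES:
            found.add("RTL")
    return found

# "Next" followed by one Arabic word, written with escapes so the
# stored (logical) order is unambiguous.
mixed = "Next \u062a\u0633\u0648\u0642"
print(directions(mixed))
```

A string that reports both directions is displayed in a different order from the one in which it is stored, so it deserves a closer look before it is used for verification.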
Diacritical marks matter as well. For example, the Hebrew 'חֶסֶר' is the same word as 'חסר', but the first form carries diacritical marks that indicate the vowels and is therefore encoded differently. The two strings are not equal, so Text Verification against a web page succeeds only with the form whose diacritical marks match those on the page.
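The difference between the two Hebrew forms can be inspected in Python. The following is a minimal sketch, assuming Python 3 and the standard unicodedata module; strip_marks is a hypothetical helper used only for illustration and is not part of the Synthetic Agent.

```python
import unicodedata

def strip_marks(text):
    """Remove combining marks (such as Hebrew vowel points) from text."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# The two forms of the word, written with escapes so the code points are explicit.
pointed = "\u05d7\u05b6\u05e1\u05b6\u05e8"  # with vowel points (U+05B6)
plain = "\u05d7\u05e1\u05e8"                # letters only

print(pointed == plain)               # the raw strings are not equal
print(len(pointed), len(plain))       # the pointed form has two extra code points
print(strip_marks(pointed) == plain)  # equal once the marks are removed
```

Stripping marks like this is useful for diagnosing a mismatch. The text entered for verification still needs to match what the page actually serves.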
The following example comes from a recent case in which two Arabic strings were copied and pasted into the Python editor used by the Synthetic Agent. In the Python console, notice that the encoded forms of the two strings are different.
>>> verify_a = 'Next تسوق عبر الإنترنت لشراء منتجات الموضة والملابس'
>>> verify_b = 'Next السعودية | تسوق عبر الإنترنت لشراء منتجات الموضة والملابس'
>>> verify_a
'Next \xd8\xaa\xd8\xb3\xd9\x88\xd9\x82 \xd8\xb9\xd8\xa8\xd8\xb1 . . .'
>>> verify_b
'Next \xd8\xa7\xd9\x84\xd8\xb3\xd8\xb9\xd9\x88\xd8\xaf\xd9\x8a\xd8\xa9 . . .'
The output above shows that the two strings differ at the byte level. The format and encoding of copied text can change depending on where it is copied from, so it is recommended to take the text directly from the page source and paste that into the Verify Text field.
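The byte-level view can be reproduced as follows. This is a minimal sketch, assuming Python 3 and UTF-8. The console output in this article appears to come from Python 2, where a plain string literal is already a byte string, so in Python 3 an explicit encode step is needed. The names word_a, word_b, bytes_a and bytes_b are illustrative only.

```python
# First Arabic word of each string, written with escapes so the stored
# order of the characters is explicit.
word_a = "\u062a\u0633\u0648\u0642"                          # first Arabic word of verify_a
word_b = "\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629"  # first Arabic word of verify_b

# Encode to UTF-8 to see the bytes that are actually compared.
bytes_a = ("Next " + word_a).encode("utf-8")
bytes_b = ("Next " + word_b).encode("utf-8")

print(bytes_a)
print(bytes_b)
print(bytes_a == bytes_b)  # the byte sequences differ right after "Next "
```

Comparing the encoded bytes in this way makes it easier to spot where two strings that look alike on screen begin to diverge.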
After the ending characters are removed and both strings use the same encoding, they match. Note that Arabic characters that appear to be added at the 'end' of the displayed string are stored at the 'front' of the encoded string. This can affect how the Python interpreter evaluates string comparisons for the Verify Text functionality, depending on the order in which the characters are checked for presence on the page.
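The stored order can be checked directly. The following is a minimal sketch, assuming Python 3; the strings are shortened versions of the two examples and are built from parts (prefix, extra and tail are illustrative names) so that the stored order is unambiguous.

```python
prefix = "Next "
# The additional Arabic word plus the " | " separator found only in verify_b.
extra = "\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629 | "
# A shortened piece of the Arabic text shared by both strings.
tail = "\u062a\u0633\u0648\u0642 \u0639\u0628\u0631"

verify_a = prefix + tail
verify_b = prefix + extra + tail

# On screen the extra word is drawn at the far right of the Arabic run,
# but in memory it sits immediately after "Next ".
position = verify_b.index(extra)
print(position)  # 5

# Removing the extra characters makes the two strings equal.
cleaned = verify_b.replace(extra, "", 1)
print(cleaned == verify_a)  # True
```

Checking positions with index, rather than relying on how the text is displayed, shows which part of the string a comparison reaches first.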