Squish 6.5 Upcoming Feature: OCR Support

The Squish 6.3 release introduced Image-based recognition, allowing users to identify and automate application components that Squish’s object-recognition capabilities could not otherwise recognize. While Image-based recognition is useful, it makes platform-independent tests difficult to create, because the visual appearance of a component can vary across platforms for a number of reasons. This variability is particularly prominent for onscreen text because of the wide assortment of fonts, font sizes, decorations and rendering modes used on different platforms. Fuzzy image search, introduced in Squish 6.4, is generally unsuitable for finding text: the same text rendered with different parameters can look very different in a pixel-by-pixel comparison because of varying letter widths, different kerning or shifting line break positions. To allow for efficient text handling in such scenarios, the upcoming Squish 6.5 release includes support for Optical Character Recognition (OCR).

OCR implementation

Squish currently uses the free Tesseract OCR library for text recognition. In principle, any OCR engine could be substituted; please contact froglogic support if you are interested in using another OCR engine. Because of the size of the complete Tesseract OCR package, including all of the language files, it needs to be installed independently of Squish. These packages will be available on our download portal. Please note that Squish’s OCR functionality will remain inactive if the Tesseract OCR package is not installed.

Interacting with text

The Squish IDE supports recording OCR interaction and verification instructions. During script recording, you can select one of the OCR options from the Insert menu on the Squish control bar and then select an action. The OCR selection dialog then appears.

OCR selection dialog

The dialog presents the current desktop screenshot and the results of the OCR overlaid on top of it. There, you can select the text for interaction and modify the search text if necessary. In the screenshot above, we have fitted a rectangle around a selection of text for recognition. After accepting the dialog, one of the following lines will be added to the recorded script (depending on the action chosen on the control bar):

mouseClick(waitForOcrText("Start new game"));
doubleClick(waitForOcrText("Start new game"));
tapObject(waitForOcrText("Start new game"));
test.ocrTextPresent("Start new game");

On replay, the new Squish API functions will repeatedly perform OCR on the current screen contents. After each run, they search the result set for the specified text, and they continue until the specified phrase is found or the timeout expires.
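The replay behavior described above can be pictured as a simple polling loop. The following is only an illustrative sketch in plain JavaScript, not Squish's actual implementation: `pollForText`, `runOcr` and the injected `now` clock are hypothetical names introduced for this example, and a real implementation would presumably pause between OCR runs.

```javascript
// Hypothetical sketch: poll an OCR function until the phrase appears or time runs out.
// runOcr: function returning an array of recognized text strings (stand-in for the engine).
// now:    function returning the current time in milliseconds (injectable for testing).
function pollForText(runOcr, phrase, timeoutMs, now) {
    now = now || Date.now;
    var deadline = now() + timeoutMs;
    do {
        // Run OCR once and search the result set for the phrase.
        var results = runOcr();
        for (var i = 0; i < results.length; i++) {
            if (results[i].indexOf(phrase) !== -1) {
                return results[i];
            }
        }
    } while (now() < deadline);
    return null; // timeout expired without a match
}
```

The OCR pass always runs at least once, even with a timeout of zero, so a phrase that is already on screen is found immediately.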

Just like with the image search APIs, you can select further occurrences of the same phrase using the optional occurrence parameter. The Tesseract OCR engine also supports specifying the language of the text to be recognized. The language serves as a hint to the engine, influencing the character set and the dictionary used. If a non-default language is selected during recording, it will be saved as a part of the test script:

// Click on the third occurrence of the German text
mouseClick(waitForOcrText("Neues spiel", {language: "German", occurrence: 3}));
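To make the idea of an occurrence index concrete, here is a small plain-JavaScript sketch of how the n-th match could be picked from a set of OCR results. It is an assumption-laden illustration only: `selectOccurrence`, the `{ text, x, y }` result shape and the reading-order sort are all hypothetical, and Squish may order occurrences differently.

```javascript
// Hypothetical sketch: pick the n-th match of a phrase from a list of OCR results.
// Each result is assumed to look like { text: "...", x: 0, y: 0 }.
// The reading-order sort below is an assumption, not documented Squish behavior.
function selectOccurrence(results, phrase, occurrence) {
    var wanted = occurrence || 1; // 1-based, defaulting to the first match
    var matches = results.filter(function (r) {
        return r.text === phrase;
    });
    // Sort in reading order: top to bottom, then left to right.
    matches.sort(function (a, b) {
        return a.y !== b.y ? a.y - b.y : a.x - b.x;
    });
    return wanted <= matches.length ? matches[wanted - 1] : null;
}
```

Returning `null` when fewer matches exist than requested mirrors the "not found" case; an actual test would more likely fail with an error after the timeout.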

Screen area selection

OCR engines are typically tuned to scan pages of black text on a white background. User interface screenshots that contain interactive controls, icons, colored backgrounds, etc. tend to confuse the engine and produce some erroneous results, usually in the form of short, meaningless groups of letters. It is therefore rarely useful to grab the text from the entire desktop, even if the AUT window is maximized or spans the entire screen. To limit the area used for OCR, specify an object reference as the optional searchRegion parameter. To minimize both the number of OCR artifacts and the test run times, you should always try to keep the search region as small as possible.

// Search for text only on the specified window
waitForOcrText("Start new game", {}, waitForObject(":Main_Window"));
// Log the text on a custom rendered widget
test.log( getOcrText( {}, waitForObject(":Custom_Rendered_Widget")));

The search region can also be a ScreenRectangle object, so script-computed coordinates can serve as the search region as well:

// Search for text only in the specific rectangle
var region = new UiTypes.ScreenRectangle(100, 100, 500, 500); // x, y, width, height
waitForOcrText("Start new game", {}, region);
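As an illustration of script-computed coordinates, the sketch below derives a smaller, centered rectangle from a larger one in plain JavaScript. The `centerRegion` helper and the plain `{ x, y, width, height }` object are hypothetical and introduced only for this example; the commented line shows how the result might be handed to the ScreenRectangle constructor used above.

```javascript
// Hypothetical helper: compute a centered sub-rectangle covering `fraction`
// of the width and height of `bounds` ({ x, y, width, height } in screen pixels).
function centerRegion(bounds, fraction) {
    var width = Math.round(bounds.width * fraction);
    var height = Math.round(bounds.height * fraction);
    return {
        x: bounds.x + Math.round((bounds.width - width) / 2),
        y: bounds.y + Math.round((bounds.height - height) / 2),
        width: width,
        height: height
    };
}

var r = centerRegion({ x: 100, y: 100, width: 500, height: 500 }, 0.5);
// Inside a Squish test script, the result could then be passed on as:
// waitForOcrText("Start new game", {}, new UiTypes.ScreenRectangle(r.x, r.y, r.width, r.height));
```

Shrinking the region this way follows the advice to keep the OCR area small, at the cost of missing text that lies outside the computed rectangle.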

It is also possible to combine multiple search functions. As an example, we can capture an image of a button and make the central part that displays the label transparent. By combining the image search and OCR, we can get the content of the button’s label.

Button frame

var buttonPosition = waitForImage("button_frame");
test.log( getOcrText({}, buttonPosition));


Text rendering is one of the biggest reasons behind differences in the visual appearance of the same components across different platforms. Thanks to the newly added OCR support, it is possible to write platform-independent tests based solely on your AUT’s appearance. It is also possible to interact with controls outside of the tested application, such as web browser menus, desktop icons and system menus. Combined with Image-based recognition capabilities, OCR should allow verification of and interaction with virtually any application component.

