Kevin Price, host of The Price of Business on Business Talk 1110 AM KTEK (Bloomberg’s home in Houston), recently interviewed David Thede.
About the interviewee
David Thede, President, dtSearch Corp.
Tell me about your firm (number of employees, location, type of companies you work with, etc.).
The Smart Choice for Text Retrieval® since 1991, dtSearch offers enterprise and developer text retrieval along with document filters. Document filters are the current industry name for software that parses file formats, emails and other data.
dtSearch provides parsing, extraction, conversion and searching of a broad spectrum of data formats. Supported data types include databases, static and dynamic website data, popular “Office” formats, compression formats and emails (including the full text of nested attachments).
The dtSearch product line spans enterprise and developer text search products, meeting some of the largest-capacity text retrieval needs in the world.
What type and size of companies do you have as clients?
Typical dtSearch clients are large companies with lots and lots of text to search. For example, six of the seven largest Aerospace and Defense companies in the Fortune 500 are dtSearch Engine customers. But dtSearch also works with organizations of any size that need to search their databases, files, emails, etc., as well as with companies whose web-based data requires full-text searching.
What comes to mind when you see this topic (text retrieval/search solution tools)?
The expectation of “everywhere” search defines today’s enterprise search. Enterprise users expect instant, concurrent searching of all content-based data applications, and they expect that search to be comprehensive, covering all data repositories at once.
What are the best practices when it comes to this issue?
Best practices dictate that text search should dig deep. Suppose an email message has a ZIP or RAR attachment containing a PDF and an MS Word document, where the Word document embeds an Excel file, which in turn embeds a PowerPoint document. Search should reach the deepest levels of this recursively embedded structure.
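To make the idea of searching “the deepest levels” concrete, here is a minimal Python sketch. It assumes every file has already been parsed into a tree of items, each with its own extracted text and a list of embedded children (producing that tree is the job of the document filters, and is not shown). The `Item` class and the `walk` and `search` functions are hypothetical illustrations, not dtSearch APIs.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Item:
    """Hypothetical stand-in for a parsed file, attachment or embedded object."""
    name: str
    text: str = ""
    children: List["Item"] = field(default_factory=list)


def walk(item: Item, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for this item and, recursively, everything inside it."""
    here = f"{path}/{item.name}" if path else item.name
    yield here, item.text
    for child in item.children:
        yield from walk(child, here)


def search(root: Item, term: str) -> List[str]:
    """Return the paths of all items, at any depth, whose text contains term."""
    term = term.lower()
    return [p for p, text in walk(root) if term in text.lower()]


# The structure from the interview: email -> ZIP -> (PDF, Word -> Excel -> PowerPoint)
pptx = Item("deck.pptx", "Quarterly budget slides")
xlsx = Item("sheet.xlsx", "Totals", [pptx])
docx = Item("memo.docx", "Cover memo", [xlsx])
pdf = Item("report.pdf", "Annual report")
archive = Item("files.zip", "", [pdf, docx])
email = Item("message.eml", "Please see attached", [archive])

print(search(email, "budget"))
# ['message.eml/files.zip/memo.docx/sheet.xlsx/deck.pptx']
```

Returning the full path of each hit, rather than just the top-level email, lets a user see exactly which nested attachment matched.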
Best practices also require that indexed searches typically take less than a second, even across terabytes of data from a variety of online and offline sources. And best practices support fast multithreaded indexed searching, so that many users can search shared-access data repositories concurrently and still get instant results.
Best practices also require a wide range of search options for refining search requests. (dtSearch offers over 25 different search options.) And search should return results in an easily navigable format, with hits highlighted.
And one tip for end users: emails frequently contain a lot of misspellings. To sift through those misspellings, you’ll want to turn on the feature we call “fuzzy searching” and set it to a level of one or two.
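The interview doesn’t say how dtSearch computes its fuzziness levels. As a rough illustration only, the Python sketch below assumes a level means “tolerate up to that many single-character edits” and measures this with Levenshtein distance. That is a swapped-in, textbook technique and not necessarily dtSearch’s algorithm. The helpers `edit_distance` and `fuzzy_matches` are hypothetical.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, words: List[str], level: int) -> List[str]:
    """Return words within `level` edits of term (assumed meaning of 'level')."""
    term = term.lower()
    return [w for w in words if edit_distance(term, w.lower()) <= level]


words = "Pleese reveiw the atached contract before teh meeting".split()
print(fuzzy_matches("attached", words, 1))  # ['atached']  (one missing letter)
print(fuzzy_matches("review", words, 1))    # []          (swapped letters cost two edits)
print(fuzzy_matches("review", words, 2))    # ['reveiw']
```

Under this assumption, level one catches a dropped or extra letter, while level two also catches the common typo of two swapped letters, which is one reason a setting of one or two suits email.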