by Greg Leighton, PhD
As a matter of course, you expect your documents to be available to you whenever you need them. But what about those frequent periods of time between accesses, while they are “sleeping”? Is there more they can do for your organization during these idle periods?
Awareness is growing that documents and other types of unstructured data, such as emails, news feeds, multimedia clips, and social media content, represent a valuable source of information about an organization’s business processes. Various estimates suggest that as much as 80% of a modern organization’s data is unstructured. Despite this, most attention thus far has been paid to mining structured information — such as transactional data — that fits tidily into relational databases. This is partly because the rigid schemas of database tables represent less of a moving target, facilitating the development of focused mining strategies capable of exploiting structural knowledge during their search for interesting patterns.
A second contributing factor is that unstructured data has risen to prominence within the enterprise only gradually. While relational databases have consistently played a key role since their introduction in the 1970s, various trends over the intervening decades have raised the status of unstructured data. These include the introduction of document authoring software, the adoption of email and instant messaging as core business communication tools, and the recent use of social media platforms as a means of increasing brand awareness and obtaining immediate customer feedback. As a result, effective solutions for mining information from such sources are only now emerging.
Using the WebPal.net cloud infrastructure, we’re currently developing efficient and scalable methods for gleaning valuable business knowledge from documents as they “sleep”. While maintaining the levels of availability and responsiveness our users have grown accustomed to, we will be able to offer the many additional benefits of a robust document mining solution.