
We live in a world defined by globalization, tenuous supply chains, shifting alliances and hypercompetition, a world where competitive advantage has the lifespan of a mayfly, where even now someone in Hyderabad or Kazakhstan or Hanyang may be sitting down to eat your lunch. In this fiercely competitive reality, what you know about your competitive environment (and, more importantly, what you don’t know) is likely to determine whether your company survives or succumbs.
The value of good competitive intelligence is self-evident but most CI teams are buried beneath a mountain of data that is growing exponentially every year. They’re often so busy collecting and filtering data they have no time left for what’s most important—analysis and advice. It’s become impossible to do the job well without the right tools. Fortunately, the right tools are getting pretty smart.
Intelligent agents, also known as autonomous agents or smart software, are finding wide acceptance in the application of competitive intelligence.
What Makes Software Smart?
What makes one piece of software smarter than another? In broad terms, an intelligent agent must be autonomous, proactive, responsive, and sociable.
Once released, an autonomous agent carries out its mission without further instructions. Most software requires constant prodding from its user—a keystroke or a mouse click. Intelligent agents are strikingly independent. They succeed or fail at their task without human intervention.
Agents are driven by a single imperative—to accomplish their task. They never tire, never sleep, and never get surly. Their proactive, goal-oriented behavior has the tenacity of a Gila monster. They can be thwarted in their mission but never distracted or cajoled.
The more intelligent an agent, the more capably it can recognize and respond to changes in its environment, resolving challenges to the successful completion of its mission. Very bright agents can even learn from their environment.
Finally, agents need to be sociable. They may need to communicate with other agents. They’re expected to communicate to their employer what they’ve learned in the wild. And should a mission fail, an agent needs to report the cause of its failure so that the next mission can be better designed.
The average agent probably has just enough intelligence to get out of its own way but there are a few adept at navigating complex networks and the invisible web, negotiating security protocols, and even disguising their origins to avoid counter-measures.
The Trouble With Google
A single agent is of limited use but a hive of agents can be very powerful. Hundreds or thousands of agents working cooperatively can literally change the world. Look at Google. Google is essentially an army of agents dispatched to crawl the web. Individually they’re not especially intelligent but en masse they are a very effective brute force.
Google and other search engines are appropriate tools for simple competitive intelligence tasks. The problem is primarily relevance. Not only is the sheer mass of content returned by most search engine queries daunting, but much of it is also irrelevant.
Search engines are designed to index text. The algorithms used to rank the relevance of indexed text to a particular keyword or key phrase are vulnerable to search engine spam, the chronic attempts by shady characters to wheedle a higher position in search engine returns. Even when relevant content is returned, only the URL and the text immediately surrounding the keyword are extracted. Relevant content must then be manually identified, then cut and pasted into a database or spreadsheet. The task is repetitive and tedious: perfect for an agent, boring for a human.
A successful CI agent must improve on the relevance of a search engine, extracting all of the relevant data and not just the immediately surrounding text. Less talented agents attempt this using a technique called “physical pinpointing.” The agent’s programmer must first visit the target page and physically highlight the desired data, providing the agent something akin to a geographical reference. Even minor changes to the page layout or syntax can invalidate physical pinpointing and result in the mission’s failure.
Really clever agents use logical pinpointing, a combination of Boolean expressions and pattern matching. Logical pinpointing can accommodate changes in the underlying structure of a page and even newly discovered pages.
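As a rough illustration of the difference, the sketch below contrasts the two approaches in Python. The sample pages, the product name, and the price pattern are all invented for the example, and the regular expression stands in for whatever pattern language a real agent would use; it is not a description of how any particular product works.

```python
import re

# Two hypothetical versions of the same product page. The second has
# extra markup, as might happen after a minor redesign.
PAGE_V1 = '<td class="name">Widget</td><td class="price">$19.99</td>'
PAGE_V2 = ('<div><td class="name">Widget</td><span>Sale!</span>'
           '<td class="price">$17.49</td></div>')

def physical_pinpoint(page, start, end):
    """Physical pinpointing: return whatever sits at a fixed position."""
    return page[start:end]

# Logical pinpointing: describe the data by pattern, not by position.
PRICE_PATTERN = re.compile(r'class="price">\s*\$(\d+\.\d{2})')

def logical_pinpoint(page, product):
    """Return the price only if the product is named AND a price matches."""
    match = PRICE_PATTERN.search(page)
    if product in page and match:
        return match.group(1)
    return None

# The "highlighted" span is recorded against the original layout.
start = PAGE_V1.index("19.99")
end = start + len("19.99")

print(physical_pinpoint(PAGE_V1, start, end))  # the price, as highlighted
print(physical_pinpoint(PAGE_V2, start, end))  # unrelated markup after the redesign
print(logical_pinpoint(PAGE_V2, "Widget"))     # still finds the new price
```

The fixed offsets stop pointing at the price as soon as the layout shifts, while the pattern keeps working as long as the price is still labeled the same way.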
The Invisible Web
From a business perspective, much of the really valuable information on the web—product pricing, inventory, descriptions, schedules, tabular data, management profiles, patent requests, government mandated disclosures—is stored in databases and accessible only after submitting a form. A web form is a cul-de-sac for a search engine but an intelligent agent can populate and submit the form repeatedly until all of the desired data is extracted.
An example. A number of sites on the web offer unique information about particular genes, each identified by its gene sequence. Sequences are similar to reference numbers and are required as form data submitted to a database to retrieve each site’s unique information about that gene. Many of these sites also have calculators to compute things like thermodynamic properties. Calculators also use the gene sequence as form data.
At best, a search engine might point to the page where the form lives. The wealth of data behind the form remains inaccessible. Because of this, such hidden content is sometimes called the “invisible web.” Search engines are blind to it. An intelligent agent, however, can submit each sequence of interest to each site, extract the relevant data, and export it to a database or spreadsheet for further analysis.
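The repeated form submission can be sketched in a few lines of Python. The site URLs, the form field name "sequence", and the sample sequences below are all hypothetical; a real agent would first have to discover each site's actual form fields. The sketch only prepares the requests and does not send them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical form endpoints; real sites will differ.
FORM_URLS = [
    "https://genes.example.org/lookup",
    "https://calc.example.net/thermo",
]
# Hypothetical sequences of interest.
SEQUENCES = ["ATGCGTAC", "GGCTAAGT"]

def build_form_requests(form_urls, sequences, field="sequence"):
    """Prepare one POST request per (site, sequence) pair."""
    prepared = []
    for url in form_urls:
        for seq in sequences:
            # Encode the form data the way a browser would on submit.
            body = urlencode({field: seq}).encode("ascii")
            prepared.append(Request(url, data=body, method="POST"))
    return prepared

requests = build_form_requests(FORM_URLS, SEQUENCES)
for req in requests:
    print(req.get_method(), req.full_url, req.data)
```

Each prepared request could then be sent with urllib.request.urlopen and the response parsed and written to a spreadsheet; those steps are left out because they depend entirely on the individual site.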
Navigating Chaos
The web is one of the most useful tools humanity has invented but it’s useful primarily because it’s chaotic. It’s a complex, decentralized, continually changing network of connections. Navigating this coherent chaos is like answering a Zen koan: how do you step twice in the same stream? Understanding its nature requires a slightly technical digression, but only slightly.
Each request for a web page is a discrete and discontinuous event with no relationship to the page previously requested. There is no built-in continuity between successive actions; the web lacks what the techies call “session state.” In more human terms, it’s as if you lived entirely in the present with no memory of your previous actions. To act coherently at all, you’d have to read environmental clues or pin notes to your shirt about what you’ve done already. This is pretty much the current state of the web. Paradoxically, it’s both a weakness and a strength. The web may not recognize the past but its present is pretty much bulletproof.
There have been a number of clever inventions to circumvent this “sessionless state” between successive page requests and successive visits to the same site. One of these is the cookie, a few lines of text that the browser saves to the user’s hard drive at a web server’s request. Cookies can contain various bits of data but the most important is the one that uniquely identifies the user. Cookies persist over time but the browser returns them only to the site that set them. Cookies provide an elegant, economical way for a server to recognize a returning browser and reunite the browser with its history.
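For an agent, handling cookies means remembering what the server handed out and presenting it again on the next request. The Python sketch below uses the standard library's http.cookies module to do that for a single cookie; the cookie name "visitor_id" and its value are made up for illustration, and a production agent would more likely rely on a full cookie jar that also honors expiry and path rules.

```python
from http.cookies import SimpleCookie

def cookie_header_from_response(set_cookie_value):
    """Turn a server's Set-Cookie value into the Cookie header to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as
    # Path and Max-Age are instructions to the client and are not echoed.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())

# Hypothetical Set-Cookie value received on a first visit.
first_response = "visitor_id=abc123; Path=/; Max-Age=31536000"

# Header value the agent would attach to its next request to the same site.
print(cookie_header_from_response(first_response))
```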
Another means is a unique identifier assigned by a web server to each browser. The identifier is appended to the URL of any page requested by that browser (or carried in a hidden form field passed from page to page), providing continuity between page requests, but the unique identifier persists only as long as the browser is active on that site. There is no continuity between sessions. When the browser returns to the site, or if it tarries too long between page requests, it’s once again a stranger to the server.
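An agent that wants to keep such a session alive has to pick the identifier out of the first URL it is given and carry it along on every later request. A minimal Python sketch follows, assuming the identifier travels in a query parameter named "sid"; that name, the URLs, and the identifier value are hypothetical, and real sites use many different schemes.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def extract_session_id(url, param="sid"):
    """Read the session identifier from a URL's query string, if present."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else None

def with_session_id(url, session_id, param="sid"):
    """Return the URL with the session identifier added to its query string."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[param] = [session_id]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

# Hypothetical landing URL to which the server has attached an identifier.
landing = "https://shop.example.com/home?sid=XYZ789"
sid = extract_session_id(landing)

# Carry the same identifier to the next page the agent requests.
next_url = with_session_id("https://shop.example.com/prices?page=2", sid)
print(next_url)
```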
And, of course, a user may be required to identify themselves before being admitted. The challenge/response protocol of username/password is commonly used to guard proprietary or paid content on the web.
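One common form of this is HTTP Basic authentication, in which the client answers the server's challenge with an Authorization header containing the encoded username and password. The Python sketch below prepares such a request without sending it; the URL and credentials are placeholders, and many sites use a login form instead, which an agent would handle like any other form submission.

```python
import base64
from urllib.request import Request

def authenticated_request(url, username, password):
    """Prepare a request carrying HTTP Basic credentials."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})

# Placeholder URL and credentials for illustration only.
req = authenticated_request("https://reports.example.com/paid", "analyst", "s3cret")
print(req.get_header("Authorization"))
```

Basic credentials are only encoded, not encrypted, so they should be sent over HTTPS.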
Agents may need to replicate all of these navigational schemas (cookies, server session variables, username/password) in order to extract the desired data and accomplish their mission. The measure of a CI agent’s intelligence is largely its navigational ability.
Agents for Hire
QL2 Software is in the business of building agents for competitive and business intelligence applications. The feature set of our flagship product, WebQL, represents a good laundry list for evaluating a robust CI agent.
Chris Buckingham is CEO of QL2 Software, one of Washington State’s fastest growing private companies (Puget Sound Business Journal, 2005). QL2 Software develops intelligent agents for business, pricing and competitive intelligence.