How to Use External Datasets to Greatly Expand the Reach of Your Competitive Intelligence

Nomad Data
December 3, 2021

Your competition now has access to all types of data about your business. By understanding the volumes of new data coming to market you can maintain your competitive edge. Never before has such granular detail been available on your competition, your market and the customers you serve. In this article you’ll learn:

  • What visibility you can gain through external data
  • How to find the right data to match your business need
  • The categories of data that exist
  • The process for buying and procuring external data

How to find the right data to match your business need

The world of external datasets is growing by leaps and bounds. Sensors, microsatellites, smartphones and the Internet are blanketing the planet in an information absorption layer unlike anything that has ever existed. The information created through this ubiquitous layer of observation is transforming the art of competitive intelligence.

While a company’s internally generated data still has a strong place in competitive intelligence, in most cases its value diminishes when looking outside of a company’s four walls. External data allows companies to monitor the surrounding environment with striking precision.

Navigating this world of external data used to be complex and reserved only for those with the deepest technical expertise in information supply chains, data engineering and business analysis. As with most things in technology, recent innovations have driven down the cost and complexity of data discovery.

One such innovation in data search is Nomad Data’s discovery platform. It allows competitive intelligence professionals to define their business visibility need and be quickly matched to the best commercial data available. Using natural language processing, Nomad’s engine forwards the business use case to the data companies most likely to have data that can service it. Data companies respond with the specifics of how their data can inform the areas of client interest. This approach combines the best of human and machine. Locating the most relevant data used to require weeks of work, often with little to no success; it now requires mere minutes of a data searcher’s time.

What visibility you can gain through external data

The most common questions we get about external data are always, “Well, what can it be used for? What’s even possible?” Our data-searching customers often have preconceived notions about where data can provide value or add insight. More often than not, these notions are inaccurate in today’s data-saturated environment. Rather than list every possible use case for data, let’s focus on those specific to competitive intelligence. This list is nowhere near exhaustive, but some of the more popular uses we’ve seen in competitive intelligence include:

  • Lists of competitors in your market along with metrics around their scale (revenues, employees)
  • Which products are driving my competitors’ growth? What is the growth level?
  • How is my competition changing pricing in this inflationary environment?
  • What roles is my competition hiring in? Seeing attrition in?
  • How is my competition acquiring customers? Through which channels?
  • How far are customers driving to my competitors’ locations?
  • What is the overlap in customers between my business and my competition?
  • What industries is my competition hiring from?
  • What channels is my competition marketing through? Which messages are resonating for them?

These questions represent the tip of an unimaginably large iceberg hidden beneath the ocean of data. Nomad’s role is to connect these customer use cases to the best-of-breed datasets that can address them.

The categories of data that exist

Data is derived from too many places to list them all (Nomad has over 100 categories), but generally commercial data comes from the following high-level sources:

  • Web scrape data – A vendor scrapes web pages over time to produce a valuable dataset
  • Transaction data – Purchases can be tracked through payment processors, banks, credit cards, email receipts, point of sale systems
  • Geolocation data – GPS chips in cell phones and laptops report user locations as frequently as every minute.
  • Satellite data – Satellite constellations can track the earth using cameras, radio receivers and infrared detectors. The range of sensors being added to new satellites is rapidly expanding.
  • Exhaust data – This data is derived from equipment and software in the information supply chain including routers, DNS servers, software applications, sensor logs, etc.

Nomad Data removes the need for you to be an expert at each data category. You no longer have to understand the situations where each is most applicable.

The process for buying and procuring external data

Once you’ve identified several data vendors that can address your needs at a high level, there are additional factors which come into play when deciding which to move forward with.

One key factor is alignment of selling models. Some data sellers will license data on a one-off or bespoke basis, meaning you specify what slice of data you need, and they deliver it in a single data extract. Not all data companies are willing to license data in this fashion. Some sell data in an all-you-can-eat fashion, requiring you to pay for both the data you need and other data you don’t. They also require you to pay for the initial file as well as updates that may come at different time intervals over a quarter or a year. For people shopping for data that will need continuous updates over time, the market is slightly larger than it is for those looking to buy piecemeal. Ensure the data seller is willing to license in a manner that fits with the way you are looking to purchase.

Data testing is one of the first steps in the acquisition process. The goal is to ensure that the data does in fact provide the insight into your use case and that it does so accurately. Understanding data biases is also a critical piece of this step.

Data testing involves looking at sample data, or if the data is delivered within an interface, testing that interface and its features. The first key goal of this step is to ensure that you are seeing the granularity of data you need. For example, you may need to see your competitors’ sales of a specific product and a dataset may only be able to show you their overall sales or shopping basket, not the individual item sales. You may need to see product sales on a weekly basis so you can react quickly, but a dataset may only be updated quarterly.
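As a rough illustration of the granularity check described above, the sketch below inspects a vendor sample for the fields you need and for how often it is updated. The function name, the row layout and the `period_end` field are hypothetical; a real sample will have its own schema.

```python
from datetime import date
from statistics import median


def check_granularity(rows, required_fields, max_gap_days):
    """Return a list of granularity problems found in a vendor sample.

    Assumes each row is a dict with a `period_end` date (hypothetical
    field name); an empty list means no problems were found.
    """
    problems = []
    # Field granularity: every required field must appear in every row.
    for field in required_fields:
        if any(field not in row for row in rows):
            problems.append(f"missing field: {field}")
    # Time granularity: median gap between distinct observation dates.
    dates = sorted({row["period_end"] for row in rows})
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if gaps and median(gaps) > max_gap_days:
        problems.append(
            f"updated about every {median(gaps)} days, need {max_gap_days} or fewer"
        )
    return problems


# Made-up sample: quarterly totals with no product-level detail.
sample = [
    {"period_end": date(2021, 3, 31), "competitor": "A", "total_sales": 100.0},
    {"period_end": date(2021, 6, 30), "competitor": "A", "total_sales": 110.0},
    {"period_end": date(2021, 9, 30), "competitor": "A", "total_sales": 120.0},
]
for problem in check_granularity(sample, ["product_id"], max_gap_days=7):
    print(problem)
```

Here the sample fails on both counts: it has no product-level field and it arrives quarterly rather than weekly.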

The next step is to compare the data being shared from the data seller with something you know to be true. This data is referred to as ground truth data. One way to accomplish this is to compare the data seller’s reported data on your own business against your actual internal data. This will give you a measure of accuracy. Later it can also be used to calibrate the overall dataset to give you a clearer read on competition.
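A minimal sketch of this ground truth comparison might look like the following. It measures the vendor's error on your own business and derives a single scaling factor. All numbers are made up, the function names are hypothetical, and applying the factor to a competitor assumes the vendor captures the same share of their activity as of yours, which should be verified rather than taken for granted.

```python
def mean_abs_pct_error(vendor, actual):
    """Average absolute percentage gap between vendor figures and ground truth."""
    return sum(abs(v - a) / a for v, a in zip(vendor, actual)) / len(actual)


def calibration_factor(vendor, actual):
    """Ratio that scales vendor totals up (or down) to match ground truth."""
    return sum(actual) / sum(vendor)


# Made-up monthly sales for your own business.
vendor_view_of_us = [40.0, 50.0, 60.0]    # what the dataset reports
our_actual_sales = [80.0, 100.0, 120.0]   # internal ground truth

error = mean_abs_pct_error(vendor_view_of_us, our_actual_sales)
factor = calibration_factor(vendor_view_of_us, our_actual_sales)

# Assumption: the vendor's coverage of the competitor matches its coverage of us.
vendor_view_of_competitor = 70.0
calibrated_competitor_estimate = vendor_view_of_competitor * factor

print(f"error on our own data: {error:.0%}")
print(f"calibration factor: {factor:.2f}")
print(f"calibrated competitor estimate: {calibrated_competitor_estimate:.1f}")
```

A single ratio is the simplest possible calibration; if the gap varies by region, product or season, a separate factor per segment is likely to be more reliable.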

The last key step in testing is to understand the biases. Data biases are created as a byproduct of the way a dataset is collected. Understanding the data collection methodology will help you uncover areas where you’d expect the data to be of higher and lower accuracy. For example, if a dataset is derived from a mobile application for finding shopping deals, then conclusions drawn from it might be more representative of the behavior of cost-conscious individuals than of those who are price indiscriminate. Another example would be data collected from consumer review sites. People are far more likely to leave reviews if they are unhappy than in any other situation. As a result, consumer review data skews far more negative than actual opinions. Identifying and debiasing data is a complex task.
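One simple way to reason about such a skew, offered here only as an illustration and not as a complete debiasing method, is to reweight each customer segment by its share of the real population instead of its share of the sample. The segment names, spend figures and population shares below are all made up, and the sketch assumes you can obtain the true shares from some independent source.

```python
def weighted_mean(segment_means, shares):
    """Average of per-segment values, weighted by the given segment shares."""
    total = sum(shares.values())
    return sum(segment_means[name] * shares[name] for name in shares) / total


# Made-up average basket size per segment, as seen in a deal-finding app.
avg_spend = {"cost_conscious": 20.0, "price_indiscriminate": 60.0}

# The app over-represents deal seekers relative to the assumed real population.
sample_shares = {"cost_conscious": 0.8, "price_indiscriminate": 0.2}
population_shares = {"cost_conscious": 0.5, "price_indiscriminate": 0.5}

naive_estimate = weighted_mean(avg_spend, sample_shares)
reweighted_estimate = weighted_mean(avg_spend, population_shares)

print(f"naive estimate: {naive_estimate:.1f}")
print(f"reweighted estimate: {reweighted_estimate:.1f}")
```

Reweighting only corrects for differences between segments you can identify; behavior that differs within a segment, such as who chooses to leave a review, needs other approaches.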

One of the last major hurdles in working with external data is the contracting process. The data license will outline the acceptable terms of use for the data as well as representations and warranties. Companies with higher bars for compliance will find this process more painful. Oftentimes the process of agreeing to terms can take weeks to months. For firms with a much lower bar, contracts can be signed in a matter of hours to a few days.

The final step in working with external data is delivery. For many data sellers the process is fairly straightforward. The seller will deliver the information through an easy-to-navigate user interface or a well-defined file. For larger, more complex datasets, building a data ingestion pipeline to handle the data can be extremely time consuming. Luckily this is also an area where there has been much innovation. Many data sellers now provide their data directly within Amazon S3 buckets or Snowflake databases. This makes delivery near instantaneous.

These steps represent the beginning of the journey. Now you’re ready to do your actual analysis to begin adding value to your organization. Although the process of reaching this point can seem intimidating, each step is seeing massive innovation. Eventually we should be able to reach the analysis step on the same day we find the right data. This is where the future is heading.
