Talking Quality with Jimmy Becker

Mr Clean

Some specifics on NetProspex’s approach to quality:

I recently joined NetProspex to lead our strategy for continuing to enhance the quality and quantity of the content we offer. I wanted to use this opportunity to summarize our approach to quality – how we define it, how we measure it, and most importantly, how we ensure it is as high as it can reasonably be. “Reasonably” is an important qualifier because there is always a trade-off between maximizing quality and maximizing quantity.

For us, quality is straightforward:  

Can you connect to this person via the phone number and/or email address that we have provided in our business contact record?

If you can reach that person through email and/or phone, then we have provided an accurate record to you; if you cannot, either because the person has moved on or because the contact information is incorrect, then we have not provided you with a quality record. In our world, quality is perishable and dynamic. People change jobs and what was a good contact record yesterday may be a bad one today. We estimate that the accuracy of our database “decays” at a rate of 1.5% per month. That is, 1.5% of the records that were good last month are no longer accurate one month later. This is simply the dynamic nature of the information and why we constantly replenish and refresh the data that we have.

Before I describe how we measure our quality, I want to first discuss how we compile our database and what content is contained within it. We have a database of 24+ million records of B2B business contacts, where each record contains a full business card profile. This contact information includes a phone number, postal address and an email address with a business domain (i.e., no personal emails). For each person, we also have her/his job title and company affiliation.

Our content collection is a crowd-sourced model whereby anyone in the B2B sales and marketing community can trade her/his business contacts into our database in exchange for a like number of fresh contacts. This community continually trades in new records in return for refreshed contact lists.

To give you a sense of the scale, we receive millions of new records each month. However, we do not accept most of these traded records. Our quality control processes reject the majority of these trades because the records are incomplete, they contain personal emails, they duplicate existing records of ours, or the quality is not up to our standards. These rejected records never make it into our database. For the subset of records that make it through our initial screen – we call it CleneStep – we then put them through a second process of verifying the quality of the traded records. To do this, we take a sample of each trade and telephone-verify the contact information.

We also do multiple tests on the emails, including offering contacts an opportunity to opt out of our database. We assign an accuracy score to the entire trade based on the overall accuracy of the tested sample. For example, let’s assume we receive a trade of 25,000 records and 20,000 of them make it through the CleneStep process. We then send a random sample of those 20,000 records out for tele-verification. Let’s assume that for 75% of that sample we were able to verify that the person works at that company and is accessible through that phone number, and that for the other 25% the person was no longer there, the phone number was incorrect, etc. In this case, we would assign an accuracy score of 75% to the entire batch of 20,000 records and then add them to our database to be available to our entire community of users for purchase and/or trade.

(I’m oversimplifying slightly, so let me be more precise. Of the records we actually sampled, the 75% we verified are assigned an accuracy score of 100, and the 25% that failed verification are not included in the database, as we know they are bad records. The unsampled records are added to the database, all with a score of 75.)
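To make the scoring rule above concrete, here is a minimal Python sketch of it on a toy trade. The names `ScoredRecord` and `score_trade` are hypothetical and chosen only for illustration; this is not our production code, and the sample size and record ids are made up.

```python
from dataclasses import dataclass


@dataclass
class ScoredRecord:
    record_id: int
    accuracy: float  # accuracy score on a 0-100 scale


def score_trade(record_ids, sample_results):
    """Score one trade from its tele-verified sample (illustrative only).

    record_ids: ids of every record that passed the initial screen.
    sample_results: dict of record_id -> True (verified) or False (failed),
                    covering the sampled records only.
    """
    if not sample_results:
        raise ValueError("at least one record must be sampled")

    verified = sum(1 for ok in sample_results.values() if ok)
    batch_score = 100.0 * verified / len(sample_results)

    scored = []
    for rid in record_ids:
        if rid not in sample_results:
            # Unsampled records inherit the sample's accuracy rate.
            scored.append(ScoredRecord(rid, batch_score))
        elif sample_results[rid]:
            # Sampled and verified records are known to be good.
            scored.append(ScoredRecord(rid, 100.0))
        # Sampled records that failed verification are dropped.
    return batch_score, scored


# Toy trade: 20 records pass the screen, 8 are sampled, 6 of those verify.
record_ids = list(range(20))
sample_results = {0: True, 1: True, 2: True, 3: True, 4: True, 5: True,
                  6: False, 7: False}
batch_score, scored = score_trade(record_ids, sample_results)
print(batch_score, len(scored))
```

In this toy trade, six of the eight sampled records verify, so the twelve unsampled records each receive a score of 75, the six verified records receive 100, and the two failures never enter the database.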

Let me add one last factor to this explanation of how we measure quality – the dimension of time. As I mentioned above, contact information decays over time. And what is accurate today will not be quite as accurate next month. To accommodate this decay rate, we decrement the accuracy score for every contact by 1.5% each month. So, our accuracy scores are dynamic and automatically decrease each month for each record, unless we have obtained independent re-confirmation of a record’s accuracy.
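As a rough sketch of this monthly decrement, the Python below assumes the 1.5% is applied to the current score each month, so the decay compounds. That compounding is an assumption made for illustration (subtracting a flat 1.5 points per month is another possible reading), and the names `MONTHLY_DECAY` and `decayed_score` are hypothetical.

```python
MONTHLY_DECAY = 0.015  # 1.5% per month, as estimated above


def decayed_score(initial_score, months):
    """Score after `months` without re-confirmation.

    Assumes the decrement compounds: each month the score is multiplied
    by (1 - MONTHLY_DECAY). This is an illustrative assumption.
    """
    if months < 0:
        raise ValueError("months must be non-negative")
    return initial_score * (1 - MONTHLY_DECAY) ** months


one_month = decayed_score(100.0, 1)
one_year = decayed_score(100.0, 12)
print(round(one_month, 1), round(one_year, 1))
```

Under this assumption, a record verified at 100 today would carry a score of about 83 a year from now unless it is independently re-confirmed in the meantime.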

As I suggested in the beginning, we must recognize that there is a trade-off between the quality and the quantity of the entire database. We can always increase either one, but only at the cost of the other. For example, if we eliminate our records with the lowest accuracy scores, the average accuracy of the entire database will increase but, obviously, the database will shrink by the number of records we eliminated. This can be frustrating for customers who, understandably, want to maximize quality and maximize the number of available and usable contacts. However, we cannot simultaneously serve both masters.

Just as Heisenberg’s Uncertainty Principle demonstrated that in the world of particle physics one cannot simultaneously know with certainty both a particle’s location and momentum, an analogous phenomenon is true when dealing with data – it is simply not possible to maximize both quality and quantity. Our approach to this challenge of trading off quality and quantity is to not make an arbitrary cut-off at a particular level but instead allow users to make that choice at the level that is optimal for them.

We provide the necessary tools – primarily our accuracy score – to enable users to dial in the level of quality/quantity that best suits their needs.
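A small Python sketch shows how this dial works in practice. The scores below are made-up illustration data, and `filter_by_score` is a hypothetical helper, not a real NetProspex tool.

```python
def filter_by_score(scores, min_score):
    """Keep records at or above min_score; return (count, average score)."""
    kept = [s for s in scores if s >= min_score]
    if not kept:
        return 0, None
    return len(kept), sum(kept) / len(kept)


# Made-up accuracy scores for a six-record list.
scores = [100, 90, 75, 75, 60, 40]

all_count, all_avg = filter_by_score(scores, 0)
strict_count, strict_avg = filter_by_score(scores, 75)
print(all_count, round(all_avg, 1))
print(strict_count, strict_avg)
```

Raising the cut-off from 0 to 75 lifts the average score of this toy list from about 73 to 85, but leaves four records instead of six. Where to set the dial depends on what a bad contact costs the user compared with a missed one.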

Next, I want to address the question of how a user should interpret our quality score. Simply put, our quality score is best understood as a probability measure. That is, what is the likelihood (between zero and 100%) that this contact record is accurate? This is most easily understood when considering the quality of an entire list. Essentially, we are representing that based on our extensive statistical analysis, the connectivity results (email or phone) from using this list should be very close to the calculated average accuracy score. For any particular record, the results are unknowable – and tomorrow may be different from today – but the results from the entire list should be predictable.
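The sketch below illustrates why list-level results are predictable even though any single record is not. It treats each score as the probability that the record connects and assumes records succeed or fail independently of one another; that independence is a simplifying assumption for illustration, and the function names are hypothetical.

```python
import math


def expected_connections(scores):
    """Expected number of good records, reading each score as a probability."""
    return sum(s / 100.0 for s in scores)


def connection_std(scores):
    """Standard deviation of the number of good records.

    Assumes records are independent of one another (illustrative assumption).
    """
    return math.sqrt(sum((s / 100.0) * (1 - s / 100.0) for s in scores))


small_list = [75] * 100
large_list = [75] * 10_000

for scores in (small_list, large_list):
    mean = expected_connections(scores)
    std = connection_std(scores)
    # Relative spread shrinks as the list grows.
    print(len(scores), mean, round(std, 1), round(100 * std / mean, 2))
```

For a single record scored 75, the outcome is simply good or bad. Across a list, the typical deviation from the expected count shrinks relative to the list size as the list grows, which is why the average accuracy score is a useful predictor for a whole list.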

Finally, how does NetProspex stand behind our approach to quality? In short, completely. We believe that we have the most sophisticated, statistically valid, and transparent approach to quality in the industry. As such, we guarantee the connectability of any records that you buy from us. If any of the records you buy turn out to be inaccurate (the phone number does not connect or the email is not deliverable), let us know and we’ll replace them. It is that simple.

Jimmy Becker is the SVP of Content at NetProspex