How to Prepare Your Data for a Modeling Project



In the data-crazed world we live in, companies are more interested than ever in leveraging their data. On the surface, people tend to view their data as a treasure chest that simply needs a key to unlock the prize: insights. In reality, uncovering valuable insights through data modeling is often a bumpy process. Speed bumps typically arise during the ETL (extract, transform, load) process, and they are usually the result of clients passing along data that is in bad shape. More often than not, clients think their data is in good shape; in our experience, it usually isn't. To help you get the most out of your data, here are a few tips on preparing it for a modeling project.

Know where your data is

Depending on how your data collection and organization are structured, your data could live in a variety of locations. Knowing where it is saves time. More importantly, it is vital for uncovering the insights you need. For example, you may come to a data modeling provider and say, “Here’s the question I want answered at the end of the project.” Depending on that question, there may be key data that must be included to answer it. In short, don’t wait until a deadline is pressing to get a handle on where your data is.

Structure data consistently

It’s very important that your data is structured consistently across all observations. For example, make sure your sales data is reported in dollars, by state, by month, and by product line for every state, month, and product. Do you use common nomenclature across data sources (e.g., are SKUs consistent across sales, production, marketing, etc.)? Do you use the same field names across data sources?

It’s a matter of making sure your data is speaking the same language.
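If your data is in a machine-readable form, some of these consistency questions can be checked with a short script. The sketch below uses only the Python standard library. The sample rows, the field names (`state`, `month`, `sku`, `sales_usd`, and so on), and the helper functions are hypothetical and exist only for illustration. It flags three kinds of problems:

- state/month/SKU combinations that have no sales row;
- SKUs in one source that never appear in another;
- sources that use a different field name for the same concept.

```python
from itertools import product

# Hypothetical sample rows from three sources; field names are illustrative only.
sales = [
    {"state": "OH", "month": "2023-01", "sku": "A100", "sales_usd": 1200.0},
    {"state": "OH", "month": "2023-02", "sku": "A100", "sales_usd": 950.0},
    {"state": "TX", "month": "2023-01", "sku": "A100", "sales_usd": 2100.0},
    {"state": "TX", "month": "2023-01", "sku": "B200", "sales_usd": 400.0},
]
production = [
    {"sku": "A100", "units": 500},
    {"sku": "b-200", "units": 120},  # same product, different SKU nomenclature
]
marketing = [
    {"SKU": "A100", "spend_usd": 300.0},  # same concept, different field name
]


def missing_combinations(rows, keys):
    """Combinations of observed key values (e.g. state x month x SKU) with no row."""
    levels = [sorted({row[k] for row in rows}) for k in keys]
    present = {tuple(row[k] for k in keys) for row in rows}
    return [combo for combo in product(*levels) if combo not in present]


def unmatched_values(left, right, field):
    """Values of `field` in `left` that never appear in `right`."""
    return sorted({row[field] for row in left} - {row[field] for row in right})


def sources_missing_field(sources, field):
    """Names of sources where at least one row lacks `field`."""
    return sorted(name for name, rows in sources.items()
                  if any(field not in row for row in rows))


if __name__ == "__main__":
    # [('OH', '2023-01', 'B200'), ('OH', '2023-02', 'B200'),
    #  ('TX', '2023-02', 'A100'), ('TX', '2023-02', 'B200')]
    print(missing_combinations(sales, ["state", "month", "sku"]))
    # ['B200']  (production calls it 'b-200')
    print(unmatched_values(sales, production, "sku"))
    # ['marketing']  (marketing calls the field 'SKU')
    print(sources_missing_field(
        {"sales": sales, "production": production, "marketing": marketing}, "sku"))
```

`missing_combinations` only considers values that appear somewhere in the data. A state with no rows at all would not show up unless you compare against a reference list of states.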

Data documentation

Is your data well documented? For example, does each data source have a description of what it contains, acceptable ranges for each field, and notes on known anomalies?
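A data dictionary can be as simple as a mapping from each field name to a description and, where one applies, an acceptable range. Once the dictionary is in that form, the data can be checked against it automatically. The sketch below is a minimal Python illustration. The dictionary, its limits, the sample rows, and the helper functions are all hypothetical.

```python
# Hypothetical data dictionary: a description per field, plus an acceptable
# range where one applies. Field names and limits are illustrative only.
DATA_DICTIONARY = {
    "sku": {"description": "Product SKU, shared across sources"},
    "sales_usd": {"description": "Monthly sales in US dollars", "min": 0.0, "max": 1_000_000.0},
    "units": {"description": "Units produced in the month", "min": 0, "max": 50_000},
}


def out_of_range(rows, dictionary):
    """(row index, field, value) for each missing or out-of-range documented value."""
    problems = []
    for i, row in enumerate(rows):
        for field, spec in dictionary.items():
            # Skip fields this row doesn't carry and fields with no documented range.
            if field not in row or "min" not in spec:
                continue
            value = row[field]
            # A None value is flagged before any comparison is attempted.
            if value is None or not spec["min"] <= value <= spec["max"]:
                problems.append((i, field, value))
    return problems


def undocumented_fields(rows, dictionary):
    """Field names that appear in the data but have no dictionary entry."""
    return sorted({field for row in rows for field in row} - set(dictionary))


rows = [
    {"sku": "A100", "sales_usd": 1200.0},
    {"sku": "A100", "sales_usd": -50.0},              # negative sale: refund or error?
    {"sku": "B200", "sales_usd": None},               # missing value
    {"sku": "B200", "units": 60_000, "region": "W"},  # above range; undocumented field
]

if __name__ == "__main__":
    # [(1, 'sales_usd', -50.0), (2, 'sales_usd', None), (3, 'units', 60000)]
    print(out_of_range(rows, DATA_DICTIONARY))
    # ['region']
    print(undocumented_fields(rows, DATA_DICTIONARY))
```

Known anomalies, such as a month where a system outage undercounted sales, could be recorded as extra notes in the same dictionary. That keeps everything a modeler needs to know about a field in one place.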

Communication

In large organizations, data can be owned by many different departments and agencies. It’s important to identify the owners of the data and make sure they are aware of and engaged in the project. It can be a challenge to get people to hand over the data you need in a time crunch if they don’t know why you need it.

Quick takeaway

Make your data modeling project run much more smoothly by getting your data ready beforehand. Not only will it make your data modeling provider very happy, but it will also help you take ownership of your data’s organization and structure, enabling you to get the most out of your data.
