When working with data from multiple sources, data can be in different units.

One example can be found in the Kaggle FIFA 21 Raw Dataset.

This dataset has a lot of opportunities for cleaning up the data, but here we’re just going to look at the Height and Weight columns. We’ll look at 3 methods to clean up these columns and we’ll take each step by step.

The data in the height column is a mix of numbers in Metric system (cm.) as well as in English system (ft. and in.), while the data in the weight column is in kgs. The goal here is to convert data in the Height column to Feet with one decimal place and Weight from kgs. to lbs.

## Method 1: Basic Excel

Step 1: Load File

(Using Open Office)

Step 2: Add 2 Columns after Column N

Step 3: In Cell O2 Enter this formula: =RIGHT(M2,2)=”cm”

This will detect Metric or English units, returning TRUE for Metric.

Step 4: Copy this formula to all of column O.

Step 5: Sort the Worksheet by Column O.

At this point, we have the Metric units split out from the English units.

Step 6: Find the Split.

Since we’re looking to convert everything into English units, we’ll keep the values above this point. However, we have the extra item of wanting it in Feet with decimals rather than Feet and Inches in the original.

Step 7: In Cell O2 Enter: =VALUE(LEFT(M2,1))+VALUE(MID(M2,3,LEN(M2)-3))/12

(This works partly because players’ heights are all single digits (between 1 and 9). If heights did not all have the same number of digits we would need to use more logic.)

Step 8: In Cell P2 Enter: =VALUE(LEFT(N2,3))

Note that Column O has a repeating decimal at the end due x/12 for inches. We can control that either in formatting or add a rounding to the formula:

=ROUND(VALUE(LEFT(M2,1))+VALUE(MID(M2,3,LEN(M2)-3))/12,2)

Step 9: Copy the formulas in columns O and P down to the first TRUE in column O (row 41 in this example))

Step 10: Unless you know the conversion rates off the top of your head, we’ll have to look them up. Pull up a Google tab and enter “convert cm to ft”. This gives us 1cm=0.0328084 feet. We can also get the conversion rate for kg to lbs here: 1kg = 2.200462 lbs.

Step 11: In Cell O42 Enter: =ROUND(VALUE(LEFT(M42,3))*0.0328084,2)

(Again we can count on all players’ heights having the same number of digits (3). Also, rounding to 2 decimal places.)

Step 12: In Cell P42 Enter: =ROUND(VALUE(LEFT(N42,2))*2.200462,0)

(Rounding to 0 decimal places here to be consistent with the Pounds native values.)

Step 13: Copy these formulas down to the end.

Step 14: Select Columns O:P and Copy/Paste Values to replace the formulas with values.

Step 15: Delete Columns M:N.

All done. We now have columns which are consistently using the same units of our choice.

15 steps and ~20 minutes to make this transformation.

## Method 2: Advanced Excel (with or without leveraging ChatGPT)

We’ll start in about the same way, adding Columns N and O to work with in the spreadsheet.

Step 1: Add Columns O:P

We’ll now use formulas that detect the units and put the correct values in the cells. This lets us skip creating a flag, sorting the column and needing to use different formulas in different spots in the column.

Step 2: In Cell O2: Enter

=IF(RIGHT(M2,2)="cm",ROUND(VALUE(LEFT(M2,3))*0.0328084,2),ROUND(VALUE(LEFT(M2,1))+VALUE(MID(M2,3,LEN(M2)-3))/12,2))

Step 3: In Cell P3: Enter

=IF(RIGHT(N2,2)="kg",ROUND(VALUE(LEFT(N2,2))*2.200462,0),VALUE(LEFT(N2,3)))

If you look closely, you may see these formulas just combine the formulas used above for the flag and the conversions from the original units to the units and formats we want. If you’re good with Excel you may be able to create these yourself, but they’re long enough and untangling any error is tricky enough that it’s great to leverage the tools we now have with AI.

We asked ChatGPT to help us create the formula.

However, ChatGPT doesn’t always get it right the first time:

(It also provided a new formula for Column P, which was unnecessary).

Step 4: Copy these formulas down.

Step 5: Select Columns O:P and Copy/Paste Values

Step 6: Delete Columns M:N, which are no longer needed.

This is a significant improvement and saves time compared to the original process.

Now, let's look at what Querri can do with its combination of using the LLM for logic in the background and working natively in tabular data.

## Method 3: Using Querri

Step 1: Load the Data into a Project in Querri.

Step 2: Tell it to convert the data in plain English

Resulting Columns:

I didn’t tell it about the decimals for lbs, but converting that to integer is simple.

All done!

All the 3 tools were able to accomplish the task, the difference being that Excel needed some advanced knowledge to craft a set of formulae to accomplish the task. ChatGPT made that easy by crafting the formulae for us, but that too needed a fair knowledge of prompting skill to have it spit out the exact formula (though in 2 trials) to accomplish the same task.

With Querri, the conversion task was a breeze and the fastest of the 3 methods. It also needed no prior knowledge of any tool. All the user had to do was describe the ask in plain English.

Who doesn't like easy and fast!

Do you wish you could make data cleaning an enjoyable part of your job? Then Upload your data set into Querri and start experimenting for free today!