Life beyond the dreaded VLOOKUP

György Balázsi
3 min readFeb 10, 2016

--

VLOOKUP is the single most useful function of every spreadsheet software. It emulates the basic data structure called “hash table” or “dictionary” in many programming languages. Without this, you cannot imagine to do any useful calculation.

Said data structure works like this: you have a collection of data, from which you want to pick specific values by “handles”, or keys as they are called usually. Example #1: the series of revenue figures of a company summed up by months, from which you want to pick one specific value associated with one specific year and month. Example #2 (see later): you have the collection of countries and capitals of the world combined, from which you can pick any capital if you know the country it pertains to.

Yet, VLOOKUP is one of the least used and most dreaded functions in spreadsheets. I can hardly tell any casual spreadsheet user who is comfortable with it.

The reason is simple: there are more than one ways to screw it up, due to unintuitive behavior. Most notably, “relative reference” in spreadsheet softwares is sometimes useful, but here it does more harm than good, making it very easy to do an error when selecting the cell area from which you want to pick your values. Also, the mandatory zero as last parameter if you want your function to behave in the obvious way. Missing zero, comes confusion about what is happening. And there are more pitfalls to list, if this is not enough. (In the introductory lines of the MrExcel tip about VLOOKUP, the existence of many pitfalls is formulated by the understatement “fairly complex” — which is the most radical statement to be expected from people who are in the business of selling Excel tips.)

So why not have the real thing: beautiful, simple to use Python dictionaries?

In the examples posted here I show you a couple of examples how Python dictionaries can be used.

We start with a text based list of countires and capitals.

From the text based list we create a Python dictionary using the left column names (cities) as keys, right column names (countries) as values, using the tab character which divides the 2 columns as the basis fro splitting up the rows. One way to define a dictionary in Python is the dict(…) expression (there are others as well). The expression within the parantheses tell python that it has to build the dictionary in 2 steps: first, split the text based list into lines, second, split the lines along the tab character.

Unfortunately, the result is not the same as what we wanted, because it maps countries to capital cities, not vica versa, according to the contents of the original list. No problem, we can easily reverse the key-value order, using again the dict(…) defining expression. This time in it, at the beginning the (k,v) part tells Python that the dictionary must map the v values to the k values specified in the second part of the expression (… for …). The second part of the expression tells Python that it has to use the v,k pairs contained in the d.items() list. The reversal of of the capitals and countries happens by the reverted order of the k and v variables.

Finally, as an example how to screen list items in Python, I show you how to pick the capitals, for which the name of their countries begins with H.

For this purpose, we use a so called list comprehension expression, which is very similar to the dict(…) expression above. This time the […] parentheses tell Python that it has to construct a list (not eg a dictionary as above). The expression within the […] parentheses is very similar to the examples above. The main difference is that it contains a filtering condition, namely that the “key” element of the key-value pairs contained in the dictionary must begin with the “H” character.

Here is the beginning of the text based list used as a source for the dictionary
Here are the code lines representing the computations described above, and the result of the last one

--

--