How To Write Unmaintainable Code (Naming)

This is an adaptation of the original guide to unmaintainable code (http://mindprod.com/jgloss/unmain.html) for Python and the world of data science and ML/AI. This blog post covers only the naming section of the original article.

Introduction

Never ascribe to malice, that which can be explained by incompetence. - Napoleon

In the interests of creating employment opportunities in Python and data science, I am passing on these tips from the masters on how to write code that is so difficult to maintain, that the people who come after you will take years to make even the simplest changes. Further, if you follow all these rules religiously, you will even guarantee yourself a lifetime of employment, since no one but you has a hope in hell of maintaining the code. Then again, if you followed all these rules religiously, even you wouldn’t be able to maintain the code!

You don’t want to overdo this. Your code should not look hopelessly unmaintainable, just be that way. Otherwise, some meddlesome Machine Learning Engineer (MLE) might actually figure it out and rewrite or refactor it. The MLE is your arch-nemesis, dedicated to maintaining, productionizing, and ensuring your code and models don’t blow up. If you play your cards right, you’ll have the MLE questioning their life choices and teetering on the edge of sanity. That’s the dream, my friend.

General Principles

Quidquid latine dictum sit, altum sonatur. - Whatever is said in Latin sounds profound.

To foil the MLE, you have to understand how they think. They have your giant notebook, no time to read it all, much less understand it. They want to rapidly find the place to make their change, make it and get out and have no unexpected side effects from the change.

They view your code through a toilet paper tube, a tiny piece of your program at a time. You want to make sure they can never get at the big picture from doing that. You want to make it as hard as possible for them to find the code they are looking for. But even more important, you want to make it as awkward as possible for them to safely ignore anything.

Programmers are lulled into complacency by conventions. But every once in a while, by subtly violating convention, you force them to read every line of your code with a magnifying glass.

You might get the idea that every language feature makes code unmaintainable – not so, only if properly misused.

Naming

“When I use a word,” Humpty Dumpty said, in a rather scornful tone, “it means just what I choose it to mean - neither more nor less.”

  • Lewis Carroll – Through the Looking Glass, Chapter 6

Much of the skill in writing unmaintainable code is the art of naming variables and methods. They don’t matter at all to the interpreter. That gives you huge latitude to use them to befuddle the MLE.

Pandas column names are your friends

The best thing about Pandas is that it’s data, not code. No matter how good their IDE is, it can’t understand the data you’re processing and can’t help the MLE understand what your data looks like.

  • Your column names should be hard to type. Make sure they have spaces, and even better periods in them.
  • If you are using the same dataframe in multiple functions, make sure to change the spelling or case of column names in each one.
  • Randomly add extra columns when you process Pandas dataframes. Out of memory errors are the most annoying to troubleshoot.
  • Every column should be an object dtype. That way they are both less efficient, and can contain any arbitrary value.
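To appreciate how well the spelling trick works, here is a minimal sketch of the detective work it forces on the MLE. It uses plain Python rather than Pandas, and the helper names normalize and find_variants, along with the sample column names, are hypothetical.

```python
from collections import defaultdict


def normalize(name: str) -> str:
    """Collapse case, periods, and stray whitespace into one canonical form."""
    return "_".join(name.strip().lower().replace(".", " ").split())


def find_variants(columns: list) -> dict:
    """Group column names that differ only in spelling details."""
    groups = defaultdict(list)
    for column in columns:
        groups[normalize(column)].append(column)
    # Keep only canonical names that appear under more than one spelling.
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical column names collected from functions that share one dataframe.
seen = ["User ID", "user.id", "user_id ", "cost"]
variants = find_variants(seen)
print(variants)
```

Every spelling in the resulting group is a different key as far as the dataframe is concerned, so each function fails on the others' names.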

New uses for baby name books

Buy a copy of a baby naming book and you’ll never be at a loss for variable names. Fred is a wonderful name, and easy to type. If you’re looking for easy-to-type variable names, try asdf or aoeu if you type with a DSK keyboard.

Single letter variable names

If you call your variables a, b, c, then it will be impossible to search for instances of them using a simple text editor. Further, nobody will be able to guess what they are for. If anyone even hints at breaking the tradition honoured since FØRTRAN of using i, j, and k for indexing variables, namely replacing them with ii, jj and kk, warn them about what the Spanish Inquisition did to heretics. If you want to take this to the next level, check out underscores. The beauty is that _, __ and ___ are all valid variable names.

___ = []
for _ in csvs:
  __ = pd.read_csv(_)
  ___.append(__)
___ = pd.concat(___)

Creative miss-spelling

If you must use descriptive variable and function names, misspell them. By misspelling some function and variable names and spelling others correctly (such as raed_data_file and read_from_db), we effectively negate the use of grep or IDE search techniques. It works amazingly well. Add an international flavor by spelling tory or tori in different theatres/theaters. Did you know that there is no panda library? Just pandas. Your data science codebase needs a panda module which implements a random subset of pandas functionality in a “better” way. You can even import panda as pd, so no one can tell which one you are using.

Be abstract

In naming functions and variables, make heavy use of abstract words like it, data, handle, stuff, do, processing, and perform, plus digits, e.g. routine_x48, run_model_326, and process_data_333.

A.C.R.O.N.Y.M.S.

Use acronyms to keep the code terse. Never define them.

Thesaurus surrogatisation

Thesauruses are wonderful and keep things interesting. A method named data_cleaning is boring. Why not data_disinfecting? In fact, with a thesaurus, you can have multiple methods that must be called in order:

data = scrub_data(data)
data = wash_data(data)
data = polish_data(data)

When the MLE asks you why you don’t just consolidate those into one function, you can respond with outrage that scrubbing, washing, and polishing are dramatically different steps in the cleaning process.

Use plural forms from other languages

Esperanto, Klingon and Hobbitese qualify as languages for these purposes. In Klingon the -mey suffix is used for the plural form of inanimate objects or animals. You will write a script that keeps track of the return statusmey of all your jobsmey.

Reuse names

Whenever possible, re-use variable names. The goal is to force the MLE to carefully examine the scope of every instance. When you re-use variable names, the actual values should be similar but slightly different. For example, maybe you have a function called load_aws_costs which returns AWS cost by day. You should immediately re-define it in another scope to load AWS cost by month instead.
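A minimal sketch of the load_aws_costs idea, assuming a hypothetical build_report function as the second scope and made-up cost figures in place of any real AWS call:

```python
def load_aws_costs() -> dict:
    """Module-level version: AWS cost keyed by day (made-up numbers)."""
    return {"2024-01-01": 10.0, "2024-01-02": 12.5}


def build_report() -> dict:
    # Same name, different scope: this one returns cost keyed by month.
    def load_aws_costs() -> dict:
        return {"2024-01": 22.5}

    # Inside build_report the inner definition silently wins.
    return load_aws_costs()


daily = load_aws_costs()   # keyed by day
report = build_report()    # keyed by month, from a function with the same name
```

The two results look alike at a glance, which is the point: the MLE has to trace the scope before trusting either one.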

Unicode variable names

You can use a pretty large subset of Unicode as variable names.

>>> i = 2
>>> í = 1
>>> í + i
3

The second variable, í, is i-acute (i with an accent). Sprinkle these liberally. They are very hard to distinguish from their unaccented counterparts.
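For a sense of what the MLE is up against, here is a rough sketch of how the accented impostors could be unmasked with the standard unicodedata module. The helper name non_ascii_characters and the sample snippet are hypothetical.

```python
import unicodedata


def non_ascii_characters(source: str) -> list:
    """Return (character, Unicode name) pairs for every non-ASCII character."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in source
        if ord(ch) > 127
    ]


# "\u00ed" is the precomposed i-acute; written as an escape so it stays visible.
snippet = "i = 2\n\u00ed = 1\ntotal = \u00ed + i\n"
suspects = non_ascii_characters(snippet)
for character, name in suspects:
    print(repr(character), name)
```

Nothing in the interpreter complains about the original code, so a scan like this only happens once someone already suspects foul play.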

Mix languages

Randomly intersperse two languages (human or computer). If your boss insists you use his language, well, he’s xenophobic.

网站流量 = []
for d in dates:
  df = pd.read_csv(f"s3://my-bucket/{d.isoformat()}")
  网站流量.append(df)
网站流量 = pd.concat(网站流量)

网站流量 is Chinese for website traffic.

Names from other languages

Use foreign language dictionaries as a source for variable names. For example, use the German punkt for point. MLEs, without your firm grasp of German, will enjoy the multicultural experience of deciphering the meaning.

Names From mathematics

Choose variable names that masquerade as mathematical operators, e.g.:

open_paren = (slash + asterix) / equals

Bedazzling names

Choose variable names with irrelevant emotional connotation. e.g.:

marypoppins = (superman + starship) / god

This confuses the reader because they have difficulty disassociating the emotional connotations of the words from the logic they’re trying to think about.

When to use i

Never use i for the innermost loop variable. Use anything but i. Use i liberally for any other purpose especially for non-int variables. Similarly use n as a loop index.

Lower case l looks a lot like the digit 1

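Use lower case l to indicate long constants, e.g. 10l is more likely to be mistaken for 101 than 10L is. Python 3 no longer accepts the suffix (Python 2 did), so there you will have to settle for naming your variables l, O and I. Ban any fonts that clearly disambiguate uvw, wW, gq9, 2z, 5s, il17|!j, oO08, `'", ;,., m nn rn, and {[()]}. Be creative.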

Rename builtins

Python built-ins can be redefined. This is useful when you think the existing built-in is confusing, or lacks ambition. delattr is surprisingly limited in scope - it can only delete attributes of Python objects. Why not write a delattr function that removes columns from your production database instead?
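A minimal sketch of the delattr idea, assuming a plain dict named fake_database as a hypothetical stand-in for the production database and a throwaway Config class. The standard builtins module is what keeps the original reachable.

```python
import builtins

# Hypothetical stand-in for a production database: table name -> column names.
fake_database = {"users": ["user_id", "email", "hairy_goat"]}


def delattr(table: str, column: str) -> None:
    """Shadows the built-in: removes a column from the stand-in database."""
    fake_database[table].remove(column)


class Config:
    debug = True


delattr("users", "email")          # resolves to the module-level function above
builtins.delattr(Config, "debug")  # the original is still reachable via builtins
```

Any code in the same module that calls delattr(obj, "attr") expecting the built-in now gets the database version instead.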

Recycling

Use scoping as confusingly as possible by recycling variable names in contradictory ways. For example, suppose you have global variables A and B, and functions foo and bar. If you know that variable A will be regularly passed to foo and B to bar, make sure to define the functions as def foo(B) and def bar(A) so that inside the functions A will always be referred to as B and vice versa. With more functions and globals, you can create vast confusing webs of mutually contradictory uses of the same names.
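The swap described above, sketched in Python with made-up string values for the two globals:

```python
A = "daily AWS costs"
B = "monthly AWS costs"


def foo(B):
    # Always called with the global A, but known as B in here.
    return f"foo received {B}"


def bar(A):
    # Always called with the global B, but known as A in here.
    return f"bar received {A}"


foo_result = foo(A)
bar_result = bar(B)
print(foo_result)
print(bar_result)
```

Each parameter shadows the global of the same name, so reading either function body in isolation gives exactly the wrong idea of what it operates on.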

Cd wrttn wtht vwls s mch trsr

Use as many abbreviations or variations on the same word as possible. For example, use the British spelling colour, then the American color, and then dude-speak kulerz. Abbreviations are great - if you spell out names in full, there is only one possible way to spell each name. For example, you should refer to your database connection as db_conn, dbconn, dbc, and dbcxn, all in the same file. For giggles, occasionally name it something totally irrelevant, like hairy_goat.

Misleading names

Make sure that every method does a little bit more (or less) than its name suggests. As a simple example, a method named is_valid(x) should as a side effect drop a production database table.
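A minimal sketch of such a method, assuming a dict named production_tables as a hypothetical stand-in for the database so that nothing real gets dropped:

```python
# Hypothetical stand-in for production tables; no real database is involved.
production_tables = {"users": [1, 2, 3], "orders": [10, 20]}


def is_valid(x) -> bool:
    """Looks like a pure check, but also drops a table as a side effect."""
    production_tables.pop("orders", None)  # the part the name never mentions
    return x is not None


result = is_valid(42)
print(result, sorted(production_tables))
```

The return value is perfectly reasonable, which is what makes the missing table so hard to trace back to this call.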

Type hints and suffixes

A common practice in Python is to use type hints so that people know what type a variable will be at runtime. Great news! Python does not enforce type hints. Your type hints should always lie. Before type hints, people would (and still do) put types in variable names to be helpful; for example, user_id_str is a good way to refer to a user_id that is a str. Lucky for you, your database stores user_id as an integer.
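A minimal sketch of a hint and a suffix lying in unison. The function name get_user_id_str and the value it returns are hypothetical.

```python
def get_user_id_str() -> str:
    """Annotated as str, named as str, returns an int."""
    return 12345  # the interpreter does not check this against the annotation


user_id_str = get_user_id_str()
actual_type = type(user_id_str).__name__
declared_type = get_user_id_str.__annotations__["return"].__name__
print(f"declared {declared_type}, got {actual_type}")
```

A static checker such as mypy would flag the mismatch, so this works best in a codebase where nobody runs one.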

Obscure film references

Use constant names like LancelotsFavouriteColour instead of blue and assign it a hex value of 0x0204FB. The color looks identical to pure blue on the screen, and a maintenance programmer would have to work out 0204FB (or use some graphic tool) to know what it looks like. Only someone intimately familiar with Monty Python and the Holy Grail would know that Lancelot’s favorite color was blue. If a maintenance programmer can’t quote entire Monty Python movies from memory, he or she has no business being a programmer.

