ASL Coding Policy
When programming, our main goals are:
- To obtain correct results
- To minimize the development time
The present document outlines a policy aimed at attaining these goals. Although reflective programmers would arrive at these conclusions eventually on their own, they are presented here so that the reader learns how to program in a smart way without having to first suffer too much pain.
A typical image to keep in mind is captured by the following example, where two programmers are asked to separately carry out the same project. Suppose that, at the end, both projects work correctly. The programmers spent their time as follows:
- Programmer A: 3 hours coding, 7 hours debugging
- Programmer B: 3.5 hours coding, 1.5 hours debugging
Which programmer would you like to be? The policy described here is intended to help you code in the "smart way" so that you can be more like Programmer B.
We adhere to the general principles of good and healthy coding known to any experienced programmer. Those who do not abide by these principles will experience extremely long debugging times, will have difficulty collaborating with others, and will not be sure that their results are correct.
- DRY (don't repeat yourself) principle.
- The code must be modular, organized in classes and small functions. As a rule of thumb, a function should not exceed the size of one screen. These classes and functions should be as reusable and general as possible. However, this does not mean that you have to implement functionality that you do not need. You may design the interface of a function to be sufficiently general to accommodate future usage and, if the invoking code requires more general functionality than you implemented, raise a NotImplementedError. In this way, you can extend the function in the future without having to modify the outer code.
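A minimal sketch of this idea follows. The function name path_loss_db and the model argument are hypothetical and chosen only for illustration: the interface already admits several propagation models, but only the one needed today is implemented.

```python
import numpy as np


def path_loss_db(distance, wavelength, model="free_space"):
    """Returns the path loss in dB for the propagation model `model`.

    Only "free_space" is implemented so far.
    """
    if model == "free_space":
        return 20 * np.log10(4 * np.pi * distance / wavelength)
    # The interface is general, but the functionality is added only when needed
    raise NotImplementedError(f"path loss model '{model}' is not implemented")
```

Code that invokes path_loss_db(distance=30, wavelength=0.1) keeps working unchanged if more models are added later, and a request for a missing model raises an error instead of returning a wrong result.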
The following is the most important rule that we need to follow:
Our code should either return a correct result or raise an error.
Consider the following example:
import numpy as np

A = np.random.random((4, 5))
x = np.random.random((5,))

def residual(b):
    return np.linalg.norm(A @ x - b)

print(residual(np.ones((4, 1))))  # output = 3.3187837978733326
print(residual(np.ones((4,))))  # output = 1.6593918989366663

Here, the user of function residual obtains two different results. The first one is wrong, the second one is correct: in the first call, b has shape (4, 1), so A @ x - b is silently broadcast to a 4 x 4 matrix and the norm is computed over all 16 entries. Since it is very likely that the user may sooner or later invoke this function in the first (incorrect) way, the way function residual must be coded is, for example,
def residual(b):
    assert b.ndim == 1
    return np.linalg.norm(A @ x - b)

which raises an error if it is invoked incorrectly. A good practice in general is to write input checks in the functions that we code. This does not mean that we need to check everything all the time but, when debugging, if you find that an error was caused inside a function because the arguments were not passed properly, just add an input check to easily locate a similar error in the future and save debugging time.
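As a variation on the input check above, the sketch below raises a ValueError with an explicit message instead of using assert, since assert statements are skipped when Python runs with the -O flag. Passing the matrix and the vectors as arguments (named m_A, v_x, and v_b here) is an assumption made so that the example is self-contained.

```python
import numpy as np


def residual(m_A, v_x, v_b):
    """Returns the Euclidean norm of m_A @ v_x - v_b, where v_b is a vector."""
    if v_b.ndim != 1 or v_b.shape[0] != m_A.shape[0]:
        raise ValueError(
            f"v_b must have shape ({m_A.shape[0]},), got {v_b.shape}")
    return np.linalg.norm(m_A @ v_x - v_b)
```

With this check, residual(m_A, v_x, np.ones((4, 1))) fails immediately with a message that points at the offending argument.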
A common source of errors is when the user passes the parameters in an inappropriate order. Consider the function
def rx_power(tx_power, gain_tx, gain_rx, distance, wavelength):
    return tx_power + gain_tx + gain_rx - 20*np.log10(4*np.pi*distance/wavelength)

If, instead of typing rx_power(120, 1, 1, 30, 0.1), the user types rx_power(120, 1, 1, 0.1, 30), the result will be incorrect and no error will be raised. A safer way of invoking this function would therefore be rx_power(distance=30, tx_power=120, wavelength=0.1, gain_rx=1, gain_tx=1).
Another argument to advocate the syntax rx_power(distance=30, tx_power=120, wavelength=0.1, gain_rx=1, gain_tx=1) over rx_power(120, 1, 1, 30, 0.1) is that it is very likely that at some point we wish to assign default values to certain arguments. For example, one would replace the above function definition with
def rx_power(tx_power, distance, wavelength, gain_tx=0, gain_rx=0):
    return tx_power + gain_tx + gain_rx - 20*np.log10(4*np.pi*distance/wavelength)

Note that in order to assign default values to the arguments gain_tx and gain_rx, we needed to place them at the end. This is because, in a Python function definition, parameters without a default value must precede parameters with a default value. With this change, if one has used the syntax rx_power(120, 1, 1, 30, 0.1) at multiple places of the code, these invocations need to be changed. If we forget to change one of them, then our code will return wrong results without raising any error.
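One way to enforce the keyword syntax, sketched here as a suggestion rather than as part of the policy, is to declare the parameters as keyword-only by placing a bare * at the start of the parameter list. A positional call then raises a TypeError instead of silently returning a wrong result.

```python
import numpy as np


def rx_power(*, tx_power, distance, wavelength, gain_tx=0, gain_rx=0):
    """Returns the received power; all arguments must be passed by keyword."""
    return tx_power + gain_tx + gain_rx - 20 * np.log10(4 * np.pi * distance / wavelength)


# Keyword invocation: the order of the arguments does not matter
power = rx_power(distance=30, tx_power=120, wavelength=0.1, gain_rx=1, gain_tx=1)

# Positional invocation: rejected instead of silently mixing up the arguments
try:
    rx_power(120, 1, 1, 30, 0.1)
except TypeError as err:
    print("rejected:", err)
```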
Readable code is easy to
- understand
- debug
- read by other people, in particular collaborators
- maintain
Keep in mind that you will be reading functions and classes that you coded weeks or months before. If you need to spend several hours trying to find out what they do, your development time will unnecessarily increase. Besides, it is likely that you will not fully understand their functionality and details.
Several crucial aspects are outlined below:
When we are beginners, we usually jump into writing code before thinking much. At some point, we all said "one day, when I have time, I will document this code". But, as we know, that day never came for any, or at least most, of our projects.
The above view embodies what could be termed the "barbarian" style: we first code and then we write about what we coded. However, as common sense suggests, most of the time it is preferable to first think and then do. We first design a house, then we build it.
This indicates that, when we create a function, the first thing that we write should be the header. It does not need to be long, but it should indicate what the function will do. A good header:
- is as short as possible,
- specifies the input and output arguments, preferably their types and shapes if not obvious,
- briefly describes the relation between the input and the outputs, i.e., what the function does. How the function does it is not mentioned unless necessary.
The header need not say everything explicitly: since it is written for programmers, whatever is obvious from the context can be omitted. For example, the header of this function
def mat_argmax(m_A):
    """ returns tuple with indices of max entry of matrix m_A"""
    assert m_A.ndim == 2  # check before accessing m_A.shape[1]
    num_cols = m_A.shape[1]
    ind = np.argmax(m_A)
    row = ind // num_cols
    col = ind % num_cols
    return (row, col)

is totally fine. One need not say that m_A is an np.ndarray (the programmer knows it), what its shape is (not relevant), or what the number of dimensions is (since it is said to be a matrix, m_A.ndim must be 2).
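As an aside, NumPy's np.unravel_index performs the same conversion from a flat index to a (row, col) pair, so a shorter variant of the function above could look as follows. This is only a sketch of an alternative; the header stays the same because what the function does has not changed.

```python
import numpy as np


def mat_argmax(m_A):
    """Returns tuple with the indices of the max entry of matrix m_A."""
    assert m_A.ndim == 2
    # np.argmax returns an index into the flattened array;
    # np.unravel_index converts it into one index per dimension
    return np.unravel_index(np.argmax(m_A), m_A.shape)


m_A = np.array([[1, 7, 3],
                [4, 2, 9]])
row, col = mat_argmax(m_A)
```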
Sometimes, if what the function does can be directly understood by the naming of the function and arguments, no header may be needed. E.g.:
def dbm_to_watt(array):
    return 10**((array - 30) / 10)

Other times, the header is even longer than the function itself.
Sometimes, a header may not even be necessary. This often occurs with private methods or local functions if we are sure that they are invoked in only one place and that they will not be invoked from other places in the future. As a general rule, a function must have a header unless it is private (its name starts with a single underscore "_") or local.
Coding efficiently, meaning well and fast, is a skill that is acquired with experience. Writing a short header contributes towards this purpose for various reasons:
- It forces us to think and phrase what a function will do. This has several implications:
  - After writing the header, we have a clear direction in mind and our movements will be oriented in a single direction, i.e., we move in a straight line. If, instead, we think about what to do while coding, we tend to wander around and it will typically take longer to arrive at our destination.
  - If what the function does is difficult to describe, meaning that it takes more than one or two sentences, it is a symptom that we are not structuring our code well. What this function is supposed to do may perhaps be done by several smaller functions. This leads to healthier code.
- It serves as a "contract" between the user and the function. In practice, the word debugging means "locating an error". If a function that is involved in an error has such a contract, the set of possibilities reduces to two:
  - The function is correctly invoked: in this case, the error must be inside the function.
  - The function is incorrectly invoked: in this case, the error is outside the function.

  On the contrary, if no "contract" exists, so that it is unclear what the function is supposed to do, then the error can be anywhere.
- It allows the programmer to know what a function does without reading its code. It takes one minute to write a header when we code a function since we know, at that point in time, what the function does. However, finding out what a function does after several weeks may take much longer, which negatively affects efficiency, not to mention the implications for collaboration with other users.
A good programmer has a top-down perspective of the code. This structure should be reflected in the code organization inside a function. Thus, one organizes the code in "chunks", paragraphs, or even sections separated by an empty line. Each of them may start with a comment that quickly lets us see what is done there without having to read the code inside. Observe in the following example how clean the resulting code is:
def plot(self):
    """Adds a rectangle per side of the building to the current figure. """

    def high(pt):
        return np.array([pt[0], pt[1], self.height])

    def lateral_face(pt1, pt2):
        return np.array([
            [high(pt1), high(pt2)],
            [pt1, pt2],
        ])

    def plot_face(face):
        mlab.mesh(face[..., 0], face[..., 1], face[..., 2])
        mlab.mesh(face[..., 0],
                  face[..., 1],
                  face[..., 2],
                  representation='wireframe',
                  color=(0, 0, 0))

    # Top face
    face = np.array([
        [high(self.nw_corner), high(self.ne_corner)],
        [high(self.sw_corner), high(self.se_corner)],
    ])
    plot_face(face)

    # West face
    face = lateral_face(self.nw_corner, self.sw_corner)
    plot_face(face)

    # North face
    face = lateral_face(self.nw_corner, self.ne_corner)
    plot_face(face)

    # East face
    face = lateral_face(self.ne_corner, self.se_corner)
    plot_face(face)

    # South face
    face = lateral_face(self.sw_corner, self.se_corner)
    plot_face(face)

Proper variable naming saves plenty of trouble. For example, in rx_power above, the user may pass the arguments in natural units, while the function expects them to be in dB-units. This can lead to wrong results. A way of reducing the probability of these mistakes is by naming the variables in such a way that they accurately convey their meaning, for example:
def rx_power(tx_dbpower, dbgain_tx, dbgain_rx, distance, wavelength):
    return tx_dbpower + dbgain_tx + dbgain_rx - 20*np.log10(4*np.pi*distance/wavelength)

To speed up coding, a good naming approach should also be systematic. Observe that, in the above example, "tx" is sometimes at the beginning and other times at the end. Thus, a more appropriate naming would be as follows:
def rx_power(tx_dbpower, tx_dbgain, rx_dbgain, distance, wavelength):
    return tx_dbpower + tx_dbgain + rx_dbgain - 20*np.log10(4*np.pi*distance/wavelength)

When dealing with linear algebra, or in general with math that involves vectors and matrices, it is advisable to write a prefix that indicates whether a variable is a vector or a matrix. This avoids many typical bugs. For example, if one writes a + b and a is a matrix whereas b is a vector, then we are probably trying to do something different from what we are actually doing, but NumPy will, whenever the shapes are compatible, just broadcast the vector into a matrix and silently do the wrong thing. In turn, with proper naming, writing m_A + v_b will immediately bring this fact to our attention and we will realize what is going on at the moment of coding.
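The silent broadcasting mentioned above can be reproduced with a few lines. The values below are arbitrary and only serve as an illustration.

```python
import numpy as np

m_A = np.ones((3, 3))
v_b = np.array([1.0, 2.0, 3.0])

# No error is raised: v_b is added to every row of m_A
m_sum = m_A + v_b
print(m_sum.shape)  # (3, 3)
```

With the m_ and v_ prefixes, the expression m_A + v_b already looks suspicious when it is typed, before any wrong result is produced.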
Another example is as follows:
for i in range(n):
    process_measurements(data[n])

In the above example, there is a bug: the programmer wanted to write process_measurements(data[i]). However, this mistake is not clearly noticeable and, whenever data has more than n entries, it will not result in an error, yet the results would be wrong. A more sensible approach would be as follows:
for ind_sensor in range(num_sensors):
    process_measurements(data[ind_sensor])

Observe that the probability of making a mistake by adopting this naming is much lower.
The same example also allows us to illustrate one more practical consideration: it is likely that, in a certain paper, the total number of sensors is N and the authors use i to index sensors. However, naming variables according to the mathematical notation in a paper is not a good idea. It is typically the case that we change notation while writing a paper, meaning e.g. that what was N now is M, what was M now is L, and what was P now is N. If we come back to our code after a couple of days working on a paper and the code does not use variables with meaningful names, it will be almost unreadable for us. Thus, we prefer names such as num_sources, num_gridpoints, ind_gridpoint, etc., which do not change even when the notation in the paper is modified.
Needless to say, names such as aux, aux_aux, variable, etc. are strongly discouraged.
Similar considerations apply to functions. So names such as path_loss are more appropriate than f or l.
As researchers, we usually develop our own simulation code. Following the general programming principle of separating functionality from configuration, GSim is designed in such a way that the configuration part is in experiment functions in an experiment module inside the folder experiments, whereas the rest of the modules implement the simulation functionality.
Experiment functions perform the following actions:
- Set the parameters of the experiment, e.g., transmit power, number of receivers, number of samples, etc.
- Instantiate the appropriate objects and invoke the required simulation functions.
- Plot the results or create a table.
This means that to run an experiment, one should never ever have to modify something in the code outside the experiment function. We cannot go to the code of the simulator and change one parameter, comment one line, etc. All outside code must be the same regardless of which experiment we execute. Note that, otherwise, it is extremely easy to obtain wrong results and mess up somebody else's experiments.
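The separation between configuration and functionality can be sketched as follows. All names here (simulate_rx_dbpower, experiment_1001, and the parameter values) are hypothetical and are not taken from GSim; the point is only that every parameter is set inside the experiment function, while the simulation function is never edited to run a different experiment.

```python
import numpy as np


# Simulation functionality (hypothetical): never modified to run an experiment
def simulate_rx_dbpower(tx_dbpower, distance, wavelength, num_samples,
                        noise_std, rng):
    """Returns `num_samples` noisy received-power measurements in dB-units."""
    dbloss = 20 * np.log10(4 * np.pi * distance / wavelength)
    return tx_dbpower - dbloss + noise_std * rng.standard_normal(num_samples)


# Configuration (hypothetical experiment function)
def experiment_1001():
    """Mean received power vs. distance for a fixed transmitter."""
    # 1. Set the parameters of the experiment
    tx_dbpower = 30
    wavelength = 0.1
    num_samples = 1000
    v_distance = np.array([10.0, 30.0, 100.0])
    rng = np.random.default_rng(seed=0)

    # 2. Invoke the required simulation functions
    v_mean_rx_dbpower = np.array([
        np.mean(simulate_rx_dbpower(tx_dbpower=tx_dbpower,
                                    distance=distance,
                                    wavelength=wavelength,
                                    num_samples=num_samples,
                                    noise_std=1.0,
                                    rng=rng))
        for distance in v_distance
    ])

    # 3. Create a table with the results
    for distance, mean_rx_dbpower in zip(v_distance, v_mean_rx_dbpower):
        print(f"distance = {distance:6.1f}   mean rx power = {mean_rx_dbpower:7.2f}")
    return v_distance, v_mean_rx_dbpower
```

A second experiment would be another function with its own parameter values; simulate_rx_dbpower would stay untouched.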
Finally, the simulation code outside the experiment file must be properly organized using the standard principles of OOP. For example, note that whenever certain functionality is specific to a subclass, it should not be implemented in the parent class. Conversely, whenever certain functionality pertains to a parent class, it should not be implemented in its subclasses.
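A minimal sketch of this organization is shown next. The classes Channel and FreeSpaceChannel are hypothetical and used only for illustration: the computation that is common to all channels lives in the parent class, whereas the loss formula, which is specific to free-space propagation, lives in the subclass.

```python
import numpy as np


class Channel:
    """Parent class: functionality common to all channels."""

    def rx_dbpower(self, tx_dbpower, distance):
        # Common to every channel, so it is implemented in the parent class
        return tx_dbpower - self.dbloss(distance)

    def dbloss(self, distance):
        # Specific to each kind of channel, so subclasses implement it
        raise NotImplementedError


class FreeSpaceChannel(Channel):
    """Subclass: only what is specific to free-space propagation."""

    def __init__(self, wavelength):
        self.wavelength = wavelength

    def dbloss(self, distance):
        return 20 * np.log10(4 * np.pi * distance / self.wavelength)
```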
The points raised here are, for the most part, imposed by common sense. Just try to be smart when coding and also respectful to your collaborators.