On Tests
From years of programming language research and practical engineering, I have arrived at some truths about testing. Yet at every company I have worked for, I found that the vast majority of people do not understand these principles, and that many teams collectively take the wrong approach without knowing it. Many people treat testing as doctrine and dogma: they test excessively, unnecessarily, and unreliably, and they teach these mistaken practices to novices, creating a vicious circle. The original purpose was to improve code quality, but the result not only fails to achieve that goal; it lowers code quality, increases the workload, and badly delays the project.
I write tests too, but my way of testing is much smarter than that of the "test dogmatists". In my mind, the code itself ranks far above the tests. I don't ignore testing, but neither do I put the cart before the horse and overemphasize it. I don't advocate test-driven development (TDD). I know what to test and what not to test, when to write tests and when not to, when to defer testing, and when to skip testing entirely. Because of this, combined with strong programming skills, I have repeatedly finished, in a short time, tasks that others thought impossible, and produced code of very high quality.
The truth about tests
Now I will summarize what I have learned about testing, much of which is little known or widely misunderstood.
Don't think you can improve code quality by displaying an attitude of "caring about code quality" everywhere. There are always people who believe that knowing terms like "unit test" and "integration test" means they understand programming and are qualified to educate others. Unfortunately, attitudes and slogans alone solve no problems. You must also have practical skill, deep insight, and wisdom, and you must know exactly what to do. Code quality will not improve because you pay attention to it, nor because you have taken measures (such as testing or static analysis). You have to know when to write tests, when not to write tests, and, when tests are needed, what kind of tests to write. In fact, the only feasible way to improve code quality is not writing tests but repeatedly refining your own thinking and writing simple, clear code. If you want to genuinely improve code quality, my article "The Wisdom of Programming" is a good place to start.
True programming masters are not bound by tests. Yes, the person next to you who you think "doesn't care much about testing" is probably a better programmer than you. I like to compare programming to driving a race car, and tests to the guardrails (tire barriers) placed along the track to soften collisions...
Guardrails are sometimes very useful and can save lives. But a competent driver never focuses on the guardrails' protection, and that is exactly the status testing should have in programming. A good driver quickly sees an elegant, simple line, with just the right speed and timing, straight to the finish. Guardrails are placed only at the most dangerous spots, so that if you do have an accident, you won't die too badly. Guardrails don't make you a good driver, and they don't make you a champion. Most of the time your safety depends only on your own skill, not on the guardrails; you will always have some way to kill yourself. Testing plays the same role: even with a lot of tests, the safety of the code still rests only in your hands. You can always create new bugs that no test will catch...
Usually a competent driver never touches the barriers at all; he has a higher goal in mind: getting to the finish line quickly. An unqualified driver, by contrast, keeps crashing off the track, so in his heart the guardrail holds supreme status, and he forever preaches its importance to others. To keep himself from making mistakes, he has to line both sides of his path densely with guardrails, even placing them in the middle of the track to make sure his turning arc is correct. He stumbles along between the guardrails and barely makes it to the finish. People who advocate test-driven development are third-rate drivers of this kind. No matter how many tests they write, such people will never produce reliable code.
Don't write tests until the program structure and algorithms are finalized. TDD dogmatists like to tell you to write tests before you write the program. Why write tests before the code? It's just dogma. These people have not thought the question through with their own brains; they merely echo what others say, thinking it "cool" and in line with the trend, or thinking that doing so will make others take them for masters. In fact, you don't need to write tests until the program's framework is complete and its algorithms have settled. If you want to know whether the code is correct, running it by hand and looking at the results is enough.
If you find, in the early stage of programming, that there are so many properties to guarantee that you have no confidence without tests, then you should first find a way to improve your basic programming skills: do more exercises, simplify your code, make it more modular, and read something like my "The Wisdom of Programming" or SICP. Writing tests does not raise your level; on the contrary, writing tests prematurely ties your hands and feet and prevents you from freely modifying code and algorithms. If you cannot modify the code quickly and intuitively feel its changes and structure, but instead get stuck everywhere because of tests, your mind will never generate the so-called "flow", and you will not write elegant code. In the end you learn nothing. Only once the program no longer needs drastic changes is it time to gradually add tests.
Don't change a clear style of programming in order to write tests. To meet "coverage" requirements, to make certain modules testable, or to enable mocks, many people turn originally simple, clear code into something more complex and confusing, even resorting to heavy use of reflection. This actually lowers code quality. The code used to be so simple that you could tell at a glance whether it was correct; now everywhere you look there are adapters and sockets added for the convenience of testing, and you can no longer feel the code. These testing aids obstruct your intuitive thinking about the code, and if you cannot completely map the code's logic in your mind (and from that generate intuition), it will be difficult for you to write truly reliable code.
Some C# programmers add large numbers of interfaces and reflection just for testing, because that makes it easy to substitute a mock for a piece of code during tests. The result is that every class in the program has a matching interface, and a mock class must be written to implement each interface. Not only does the code become complicated and hard to understand, but Visual Studio's assistance is lost: you can no longer press one key (F12) to jump straight to a method's definition; you must jump to the interface method first and then hunt for the actual implementation. You can no longer move quickly through the code, and this loss of convenience drastically reduces the mind's chances of forming an overall understanding. As for mocking, every constructor call has to be replaced with a reflection-based construction, so the compiler's static type checking can no longer ensure the types are correct; the possibility of runtime errors increases, their messages are hard to decipher, and the costs outweigh the gains.
Don't test "implementation details" because that's equivalent to writing the code twice. The test should only describe the "basic properties" that the program needs to satisfy (such as sqrt(4) should be equal to 2), rather than describe the "implementation details" (such as the specific steps of the square root algorithm). Some people's tests are too detailed, and even test every implementation step of the code conscientiously: the first step must be done A, the second step must be done B, the third step must be done C... and some people like to write for the UI. Test, their tests often write like this: If you browse to this page, then you should see this line in the title bar...
If you think about it carefully, you will find that this approach essentially just writes the code (or the UI) a second time. The code already plainly says: do A first, then B, then C. The UI description file already plainly says: these contents go in the title bar. Why double-check all of it in a test? This adds no reliability at all: if you can make mistakes in the code, can't you make the same mistakes when you rewrite the same logic in another form?
It's like those clever people who always worry, on going out, that the door isn't locked. After closing it they push and pull it several times to make sure; a few steps later they suspect it again, go back, and push and pull some more, and still can't rest easy :P This approach not only fails to guarantee the correctness of the code; it also creates obstacles to modifying it. Since you wrote the same code twice, every time you want to modify the code, you have to modify it twice! Such tests press down on the code like a binding spell: every time the code changes, so many tests fail that they all have to be rewritten. In essence the code is being modified twice over, only more painfully.
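To make the distinction concrete, here is a minimal sketch in Python (illustrative only; my_sqrt and its Newton iteration are hypothetical stand-ins of my own): the test pins down the property the function must satisfy, not the steps by which it computes it.

def my_sqrt(x):
    # Newton's method; any correct algorithm could replace this body
    guess = x if x > 1 else 1.0
    for _ in range(60):
        guess = (guess + x / guess) / 2
    return guess

# Good: test the essential property, independent of the algorithm.
assert abs(my_sqrt(4) - 2) < 1e-9
assert abs(my_sqrt(9) - 3) < 1e-9

# Bad (writing the code twice): asserting the exact sequence of
# intermediate guesses would restate the implementation and break
# on any harmless rewrite, while adding no reliability.

Replace Newton's method with any other correct algorithm and the property tests still pass; the step-by-step kind would all have to be rewritten.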
Not every bug fix requires writing a test. A common dogma circulated in many companies is that every time you fix a bug, you must write a test for it to ensure the bug never happens again. Some people even demand that you fix bugs in this order: first write a test that reproduces the bug, then fix it, then make sure the test passes. This thinking is rote dogmatism; it seriously slows down the project while doing nothing for code quality. Before writing such a test, you should think carefully about one question: how likely is this bug to happen again in the same place? Many low-level mistakes, once seen, are very unlikely to reappear in the same spot. In such cases you can simply verify manually that the bug is gone.
Spending a lot of time on a bug that is unlikely to reappear, writing a reproducer and constructing elaborate data structures to verify it, all to ensure it never comes back, is superfluous. Even if the same low-level mistake happens again, it will most likely be somewhere else, so writing the test neither guarantees the problem won't recur nor repays the time it costs you. Such tests consume time on every build; every compilation takes minutes longer because of them, and the accumulated drag visibly slows the project. You should write a new test only when you find that the existing tests fail to capture an important property the program must satisfy. You shouldn't be writing a test for the bug, but for the nature of the code: the test's content should not merely prevent that bug from happening again, but guarantee the previously missing "property" that the bug revealed.
Avoid using mocks, especially multi-layer mocks. Many people pile up layer upon layer of mocks when writing tests, believing that only this way can they test modules buried deep in the call paths. In fact this is not only cumbersome and troublesome; multi-layer mocks often cannot generate sufficiently diverse inputs and cannot cover the boundary conditions. If you find that a test requires multiple layers of mocks, you should stop and think: perhaps what you need is not mocking but rewriting the code to make it more modular. If your code is modular enough, you shouldn't need layered mocks to test it. You just prepare some inputs (including edge cases) for each module and make sure its outputs meet the requirements; then you pipe these modules together into a larger module, test that it too satisfies its input and output requirements, and so on. A sketch of this style follows.
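A minimal sketch in Python (illustrative; tokenize and count_words are hypothetical modules of my own): each module is tested directly on its inputs and outputs, including edge cases, then the composition is tested the same way, with no mocks anywhere.

def tokenize(text):
    return text.split()

def count_words(tokens):
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    return counts

# Test each module on its own inputs and outputs, including edge cases.
assert tokenize("a b a") == ["a", "b", "a"]
assert tokenize("") == []
assert count_words(["a", "b", "a"]) == {"a": 2, "b": 1}
assert count_words([]) == {}

# Then pipe the modules together and test the composite the same way.
assert count_words(tokenize("a b a")) == {"a": 2, "b": 1}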
Don't pay too much attention to "test automation", manual testing is also testing. Writing tests, the word often implies the meaning of "automatic operation", that is, it is assumed that the test should be fully automated without manual operation. Type a command, and it will tell you what's wrong after a while. However, "human testing" is often overlooked. They don't realize that artificial experimentation and observation are also a kind of test. So you find this situation, because automatic testing is very difficult to construct in many cases (for example, if you want to test the response of a complex interactive GUI code), many people spend a lot of time, using various testing frameworks and tools, Even remotely controlling the WEB browser to do some automatic operations, it took too much time to find all kinds of unreliable and unable to measure many things.
In fact, with a change of mindset, a few minutes of manual observation would reveal many deep problems. The overemphasis on test automation usually rests on an unrealistic assumption: that errors will recur frequently, so automation will save human effort. In reality, once a bug is fixed, the chance of it recurring is not that great. Excessive demands for test automation not only delay the project, annoy the programmers, and reduce efficiency, but also lose the precision of manual testing.
Avoid writing tests that are too long and time-consuming. Many people's tests are a long, rambling list; reading one later, even its author no longer remembers what it was meant to test. Some people could verify the required property with a small input, yet always feed in a huge one, subconsciously feeling that this is more reliable. The result is a test that consumes a lot of build time on every run while achieving nothing that the tiny input would not.
Each test should test only one aspect; avoid repeated testing. Some people check many things in one test, and then every time that test fails, they can't figure out which part went wrong. Others, "to be safe", like to test some supposedly related component "in passing" inside many different tests; then every time that component has a problem, a crowd of tests is found failing. If each test covers only one aspect and no component is tested repeatedly, a failing test quickly points you to the faulty component and its location.
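A small sketch of this granularity, reusing the hypothetical tokenize and count_words modules from above (pytest-style, illustrative only): one test per aspect, so a failure names its culprit.

def test_tokenize_splits_on_spaces():
    assert tokenize("a b") == ["a", "b"]

def test_tokenize_handles_empty_input():
    assert tokenize("") == []

def test_count_words_counts_duplicates():
    assert count_words(["a", "a"]) == {"a": 2}

# Anti-pattern: one big test asserting tokenize, count_words, and their
# composition all at once; when it fails, you can't tell which part broke.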
Avoid testing by comparing strings. Many people like to print something out and then use string comparison to decide whether the output meets the requirements. A common practice is to dump the output as formatted JSON and compare the two texts; some skip JSON entirely and compare raw printf output. Such tests are very fragile, because the string format often changes in small ways, for example someone adds a space. Using such a string as the gold output for string comparison means trivial changes break tests en masse, forcing unnecessary modifications to many tests. The correct approach is structured comparison: if you want to save the expected result as JSON, you should first parse the JSON back into the object it represents, and then compare the structures. PySonar2's tests are done this way, and they are quite stable.
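A minimal sketch in Python: parse both sides and compare the resulting structures, so that formatting drift (whitespace, key order) cannot break the test.

import json

expected_text = '{\n  "name": "sqrt",\n  "args": [4]\n}'
actual_text = '{"args": [4], "name": "sqrt"}'  # same data, different formatting

# Fragile: raw string comparison fails on whitespace and key order.
assert expected_text != actual_text

# Robust: structured comparison of the parsed objects succeeds.
assert json.loads(expected_text) == json.loads(actual_text)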
The misunderstanding of "testing can help later generations". Whenever you point out the error of test dogmatism, someone will come out and say: "The test is not for yourself, but for people who come in after you leave and make no mistakes." First of all, this kind of person does not see clearly what I am saying What, because I've never objected to reasonable testing. Secondly, this kind of "testing can help later generations" is actually an untenable statement that has not been tested in practice. If your code is written in a mess, no matter how many tests you test, people will not be able to understand it later. Instead, they will be more confused by inexplicable test failures, not knowing whether they are wrong or the tests are wrong. I have already said that tests cannot completely guarantee that the code will not be corrected. In fact, their role in preventing code from being corrected is very weak. In any case, later people must understand the logic of the original code and know what it is doing, otherwise they will not be able to make correct changes, even if you have the most rigorous testing.
Let me give a personal example. After I built PySonar at Google, I left without writing a single test. When I returned to Google the second time, my boss Steve Yegge said to me: "After you left, I changed some of your code. It was so clear and easy to grasp. It was a joy to modify your code!" What does this show? I'm not saying you may always skip tests, but this example shows that the value of tests to those who come after is not as great as some people think. Writing clean code is the key to solving this problem.
The fear that people will leave suddenly and the code will become unmaintainable has led some companies to pay too much attention to testing, but testing cannot solve this problem. On the contrary, if testing is too burdensome and full of unnecessary work, employees grow dissatisfied and leave all the sooner, for companies with more insight in this regard. Some companies think that once they have tests, they can simply send people away. This thinking is very wrong. One thing you need to understand is that code always belongs to the person who wrote it, even with tests. No matter how many tests you have, if the key people are really about to leave, the solution is to keep them! A far-sighted company always solves this problem by other means: treating employees generously and with respect, and creating a good atmosphere, so that they don't want to leave so quickly. Beyond that, companies must pay attention to passing knowledge on, so that no piece of code is understood by only one person.
Case analysis
Some people may wonder: why am I entitled to tell others about these experiences, and what success stories do I have to back them up? So here are a few of the things I have done, along with failures I have watched test dogmatists produce.
Google
Many people may have heard of PySonar, which I built at Google. At the time, my Google teammates were trembling, saying it was almost impossible to build something so difficult and complicated from scratch. One teammate in particular pestered me to write tests from the very beginning, and kept arguing about it until the end, which annoyed me to death. Why were they so worried? Because type inference for Python is very difficult code: it requires quite complex data structures and algorithms, and a thorough command of Python's semantics.
As a rigorously trained professional, I ignored their bluster and did not believe their dogma. I organized the code in my own way, thinking, designing, and reasoning precisely, and within three months produced code that was elegant, correct, fast, and easy to maintain. PySonar is still the world's most advanced Python type inference and indexing system, adopted by multiple companies to process millions of lines of Python code.
Had I followed my Google teammates' demands, adopting the existing open-source code or writing tests prematurely, I could not have finished this within the three months of my internship; even years of struggle would not have sufficed.
Shape Security
A recent success of this way of thinking is an advanced JavaScript obfuscator, along with improvements to the cluster management system, built for Shape Security. Don't underestimate this JS obfuscator: its obfuscation is far stronger, and it runs faster, than open-source tools such as UglifyJS. It includes not only basic features like UglifyJS's variable renaming, but also transformations designed specifically to confound humans and compilers, so that no one can make out what the program is doing, and even the most advanced JS compilers cannot simplify it.
In fact this obfuscator is itself a compiler, one that translates JavaScript into an unreadable form. Because in this project a slight error can make a huge difference, I adopted a very rigorous testing method learned from the Chez Scheme compiler. For each compiler step (pass), I designed input code that exercises exactly that pass (containing, say, function definitions, for loops, try-catch). The code emitted by a pass is executed by a JavaScript interpreter, and the result compared with the result of executing the original program. For each test program, after each pass, the intermediate output is compared with the expected result; a mismatch indicates which pass has the problem, and the small failing program indicates roughly which part is wrong. Following the principles of smallness, no redundancy, and no repetition, I wrote only forty-odd very small JavaScript programs in total. Because these tests cover all of JavaScript's constructs almost without overlap, they pinpoint erroneous changes. In the end this JS obfuscator correctly transformed a project as large as AngularJS, preserving its semantics while leaving it completely unreadable and effectively preventing optimizers (such as Closure Compiler) from simplifying it.
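A minimal sketch of this differential scheme in Python (illustrative assumptions of mine: passes is a list of source-to-source transform functions, and run_js shells out to a JS engine such as node):

import subprocess

def run_js(source):
    # Run a JavaScript program and capture what it prints.
    result = subprocess.run(["node", "-e", source],
                            capture_output=True, text=True)
    return result.stdout

def check_passes(program, passes):
    # After every pass, the transformed program must behave exactly
    # like the original; the first mismatch names the faulty pass.
    expected = run_js(program)
    current = program
    for p in passes:
        current = p(current)
        if run_js(current) != expected:
            raise AssertionError("pass %s changed behavior" % p.__name__)

# Usage: many tiny programs, each exercising one construct.
def rename_pass(src):
    return src  # placeholder for a real transform
check_passes("var x = 1; console.log(x + 1);", [rename_pass])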
By contrast, the people who overhyped testing and reliability failed to produce an obfuscator of this quality. In fact, before I joined the team, two or three experts had already built an obfuscator, a project that lasted many months. That code was never released to customers, because its renaming component would still output wrong code in some cases, and continued to do so after many rounds of fixes. It was never 100% correct, which is unacceptable for a programming language translator. Renaming is just one step in my obfuscator, which contains about ten similar steps, each transforming the code in a different way.
When it came to implementing the renamer, my teammates asked me to take the renaming code they had written before and just fix its bugs. After reading it, however, I found the code beyond repair: it was built on the wrong idea, no amount of patching could make it 100% correct, and it was obviously inefficient, so I decided to rewrite it myself. Being familiar with the territory, I finished a correct renamer in a single afternoon; it fully respects JavaScript's semantics and its various exotic scoping rules, and its structure is very simple. At bottom, a renamer is just a kind of interpreter. A deep understanding of interpreters lets me easily write a renamer for any language.
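To give a flavor of this claim, here is a toy sketch in Python over a miniature lambda-calculus AST (my own illustration, nothing like the real JavaScript renamer): the renamer walks the program with an environment mapping old names to fresh ones, exactly the way an interpreter walks it with an environment mapping names to values.

import itertools

fresh_names = ("v%d" % i for i in itertools.count())

def rename(exp, env):
    # exp is ('var', name) | ('lam', param, body) | ('app', fun, arg)
    tag = exp[0]
    if tag == 'var':
        return ('var', env[exp[1]])    # look up, like an interpreter
    if tag == 'lam':
        new = next(fresh_names)        # bind, like an interpreter
        return ('lam', new, rename(exp[2], {**env, exp[1]: new}))
    if tag == 'app':
        return ('app', rename(exp[1], env), rename(exp[2], env))

# The two 'x' parameters live in different scopes and get distinct names.
prog = ('app', ('lam', 'x', ('var', 'x')), ('lam', 'x', ('var', 'x')))
print(rename(prog, {}))  # ('app', ('lam', 'v0', ...), ('lam', 'v1', ...))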
Unfortunately, history repeated itself ;) Hearing that I had spent one afternoon rewriting the renamer, my teammates got very nervous and blustered at me: "Do you know how many months our renamer took to build? Do you know how many tests we wrote to make sure it was correct? You made a new one in an afternoon, how could it possibly be correct?" I don't know how they had the nerve to say this, because the fact was that they had spent all those months and all that manpower and written all those tests, and the renamer they built still had bugs and didn't work. When I ran the tests I had written, and several larger open-source projects (AngularJS, Backbone, etc.), through my renamer, I got exactly correct code. Moreover, performance tests showed my renamer was four times faster. So, as Dijkstra said: "The most elegant programs are often the most efficient."
After finishing that project, I moved to another team (the cluster team). The people on this team were much better: low-key and humorous. Shape Security's product (Shape Shifter) includes a high-availability (HA) cluster management system, which elects a leader over the network to build a highly fault-tolerant parallel processing cluster. This cluster manager had always been considered very complicated within the company, yet it is the component with the highest reliability requirements: a failure could have catastrophic consequences. And indeed, at the time it was very reliable and had never had a problem. However, for historical reasons its code was overly complex and lacked modularity, making it difficult to extend to meet new customer needs. My task on entering the new team was to simplify it, modularize it, and extend it on a large scale so that it could meet the new requirements.
In this project, because the code changes were so large, and with the understanding, trust, and support of colleagues and department leaders, we decided to discard the existing tests outright and rely entirely on strict and timely code review, logical reasoning, deliberation and discussion, and hands-on experiment to ensure the correctness of the code. While I modified the code, a teammate more familiar with the existing code silently monitored my every change through git, judging from his experience whether a change deviated from the original semantics, and discussing it with me promptly. Thanks to this flexible yet rigorous approach, the project was completed in under two months. The improved code is not only more modular, more extensible, and adaptable to the new requirements, but still very reliable. Had the department leaders been "test dogmatists" who refused to let us abandon the existing tests, completing such a project on schedule would have been absolutely impossible. Yet in today's world, I'm afraid fewer than one leader in ten has this kind of vision.
Coverity
Finally, let me cite a case that failed very badly because of improper testing methods: Coverity's Java static analysis product. I admit that Coverity's C and C++ analyzers are probably very good; the Java analyzer, however, is another matter. By the time I joined Coverity, my colleagues had already endured a full year of management bullying and high pressure, working overtime to write a new product from scratch along with a great many tests. But the technical debt was so heavy that no amount of testing could make the product reliable.
My task was to use my deep PL knowledge to keep fixing the strange bugs left by my predecessors. Some bugs took more than twenty minutes of running before they appeared, and a single run told you almost nothing about what was going on, so fixing them was very time-consuming. Sometimes I had to lie back in front of the computer to rest, opening my eyes now and then to check the results. Coverity cared so much about testing that they required you to write a new test for every bug you fixed: the test had to faithfully reproduce the bug's behavior, and had to pass after the fix. This looks like a practice that cares about code quality, but it not only failed to make the product stable and reliable; it greatly slowed the project and left the employees exhausted and resentful.
Once they assigned me a bug: when analyzing a medium-sized project, the analyzer seemed to enter an infinite loop, not finishing for hours. Coverity's global static analysis is, in essence, a kind of graph traversal algorithm, and when the graph contains cycles you must be careful: recurse into it blindly and you may loop forever. The way to avoid the infinite loop is very simple: you construct a set (Set) of graph nodes and pass it to the function as a parameter. Whenever you visit a node, you first check whether it is already in the set; if so, you return immediately, otherwise you add the node to the set and then recursively process its children. The C++ code looks like this:
void traverse(Node node, std::set<Node> &visited)  // note the "&": visited is passed by reference
{
    if (visited.count(node)) {        // already visited: stop here, breaking any cycle
        return;
    } else {
        visited.insert(node);
        process_node(node, visited);  // recursively calls traverse on the children
    }
}
After inspecting the code, I found that it did not actually enter an "infinite loop"; it had entered a computation of exponential complexity, which is why it never finished. The person who wrote the function had been careless, or did not understand that C++ function parameters are passed by value (copied) by default rather than by reference, and had forgotten the "&". So each recursive call to traverse received not the original set but a fresh copy of visited; after the call returned, visited reverted to its previous state, as if the node had been automatically removed, and so the function would, under certain circumstances, visit the same node again. Such code does not loop forever, but on a certain special graph structure it takes exponential time (think about what kind of graph that is).
It was an obvious graph-algorithm problem, fixed by adding a single "&", and manual testing confirmed that the problem had disappeared. Nevertheless, Coverity's test dogmatists (including the author of the bug) clamored and solemnly ordered me to write a test: construct a data structure that triggers these consequences, and ensure this bug never appears again.
Isn't it hilarious to write a test for a mistake I would never make, one that cannot happen here again? Even with the test, there is no guarantee the same thing won't happen elsewhere: if someone accidentally omits another "&", the same problem will recur, but in a different place, and no test was written for that piece of code, so the test for this bug prevents nothing. It's like an unskilled racing driver who crashes where no one else would crash and then demands that the track install a tire barrier at that exact spot. Next time he will simply crash somewhere else, where no one else would...
No one with a little common sense about graph theory and a grasp of basic C++ concepts makes this kind of mistake. The only protections against this kind of problem are personal skill and experience, not tests. The best prevention is probably to hold a meeting and explain the problem clearly, so that everyone understands it and doesn't repeat it. Writing a test for this bug is therefore completely superfluous. I explained this reasoning to my teammates; they listened as though they had heard nothing and kept insisting: "But you still have to write this test, because that's our rule! Do you know how much it costs to send a sales engineer to a customer when a bug ships..." I was speechless.
It is this kind of test dogmatism that made progress on Coverity's Java analysis so painful and slow while the product remained full of bugs. Coverity's other problems include the ones I pointed out above: writing duplicated tests, testing too many things in one test, using string comparison in tests, and so on. You may find it hard to imagine that a company selling products meant to improve code quality maintains the quality of its own code like this :P
Conclusion
Because the vast majority of people misunderstand testing so deeply, and test dogmatism is so widespread, many excellent programmers have been mired in tedious test-driven development, unable to exercise their strengths. So that everyone can have a relaxed, smooth, and reliable working environment, I hope you will share this article widely and help change this industry's bad habits. I hope everyone will treat tests rationally in their projects, rather than writing tests blindly. Only then can projects be completed better and faster.