Homework 7: Static Analysis with Infer
Due Wednesday, April 3, 2024, 11:59PM AoE
In this assignment you will use a static analysis tool to automatically detect potential defects in three subject programs. The static analysis tool is Facebook’s Infer, which focuses on memory errors, leaks, race conditions, and API issues. Infer is open source.
You may work with a partner for this assignment. If you do, you must use the same partner for all sub-components of this assignment.
Warning: Infer Can Be Hard To Run
You should use the setup from HW0 to run Infer. Previous students have reported that Infer does not run on the Windows Subsystem for Linux (WSL) or similar shortcuts for using Ubuntu- or Linux-like interfaces. Headless Virtual Box configurations are reported to work well. Officially, however, the HW0 setup is the (only) supported configuration for the class.
It is your responsibility to download, compile, run and analyze the subject program and associated tools. Getting the code and tools to work in some manner is part of the assignment. You can post on the forum for help and compare notes bemoaning various architectures (e.g., Windows vs. Mac vs. Linux). Ultimately, however, it is your responsibility to read the documentation for these programs and tools and use some elbow grease to make them work.
The Static Analysis Tool: Infer
The Infer tool is a static analyzer — it detects bugs in programs without running them. The primary website is fbinfer.com.
Unfortunately, some versions of Infer can be obnoxious to build and/or install, despite their handy installation guide. We recommend that you use Infer’s latest binary release, which we have tested on the HW0 setup on the subject programs.
While times will vary, some students have reported that running Infer on jfreechart can take five hours. Start early!
You can find Infer’s output in the infer-out folder.
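With hundreds of reports, a per-category tally can be a useful first look. The following is a minimal sketch, assuming your Infer version writes a JSON list of issue records to `infer-out/report.json` and that each record has a `bug_type` field; check your own output folder, since the file name and field name are assumptions here. The function names `load_issues` and `count_by_type` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def load_issues(report_path):
    """Load the list of issue records from an Infer JSON report."""
    with open(report_path, encoding="utf-8") as handle:
        return json.load(handle)


def count_by_type(issues):
    """Count issues per bug type (e.g. NULL_DEREFERENCE)."""
    # Records without a bug_type field are counted as UNKNOWN.
    return Counter(issue.get("bug_type", "UNKNOWN") for issue in issues)


if __name__ == "__main__":
    # Assumed location of the JSON report; adjust to match your setup.
    report = Path("infer-out") / "report.json"
    if report.exists():
        for bug_type, count in count_by_type(load_issues(report)).most_common():
            print(f"{bug_type}: {count}")
    else:
        print(f"No report found at {report}")
```

Run it from the directory that contains infer-out. The tally should roughly match the "Summary of the reports" that Infer prints at the end of a run.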
First Subject Program: lighttpd Webserver
We will make use of the lighttpd webserver (pronounced "lighty"), version 1.4.17, as our first subject program for this homework. A local mirror copy of lighttpd-1.4.17.tar.gz is available, but you can also get it from the original website (but note that the specific version number is important: if you use a more recent version of lighttpd, you may struggle on some parts of the report). It is about 55,000 lines of code in about 90 files.
While not as large or popular as apache, at various points lighttpd has been used by YouTube, xkcd and Wikimedia. Much like apache, old versions of it have a number of known security vulnerabilities.
The Common Vulnerabilities and Exposures system is one approach for tracking security vulnerabilities. A CVE is basically a formal description, prepared by security experts, of a software bug that has security implications.
There are at least ten CVEs associated with lighttpd 1.4.17 tracked in various lists (such as cvedetails or mitre). For example, CVE-2014-2324 has the description "Multiple directory traversal vulnerabilities in (1) mod_evhost and (2) mod_simple_vhost in lighttpd before 1.4.35 allow remote attackers to read arbitrary files via a .. (dot dot) in the host name, related to request_check_hostname." You can dig into the information listed in, or linked from, a CVE (or just look at subsequent versions of the program where the bug is fixed!) to track down details. Continuing the above example, mod_evhost refers to source file mod_evhost.c, mod_simple_vhost refers to file mod_simple_vhost.c, and request_check_hostname is in file request.c. You will want such information when evaluating whether or not the tool finds these security bugs.
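One way to start checking whether any reports land near a CVE is to filter the issue list by the source files that the CVE names. The sketch below assumes each issue is a dictionary with a `file` key, which is how Infer's JSON report is assumed to be shaped here; `CVE_FILES` and `issues_in_files` are hypothetical names introduced for illustration.

```python
import os

# Source files named in the CVE-2014-2324 discussion above.
CVE_FILES = {"mod_evhost.c", "mod_simple_vhost.c", "request.c"}


def issues_in_files(issues, filenames):
    """Return the issues whose reported file has one of the given base names."""
    return [
        issue
        for issue in issues
        # Compare base names so "src/request.c" matches "request.c".
        if os.path.basename(issue.get("file", "")) in filenames
    ]
```

A report in the same file as a CVE is only a lead to read more closely, not evidence that the tool found the bug. You still need to compare the reported line and symptom against the CVE description or the later fix.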
Infer on lighttpd
Once you have Infer built or downloaded, applying it to lighttpd should be as simple as:
$ sudo apt install make
$ sudo apt install python2-minimal
$ sudo apt install zlib1g zlib1g-dev
$ cd lighttpd-1.4.17
$ sh configure
$ /path/to/infer/bin/infer run -- make
That should produce output similar to (but everything is fine if you get very different numbers):
Lighttpd Output Sample
make[1]: Leaving directory '/home/weimer/src/lighttpd-1.4.17'
Found 88 source files to analyze in /home/weimer/src/lighttpd-1.4.17/infer-out
Starting analysis...

legend:
  "F" analyzing a file
  "." analyzing a procedure

FFFFFFFFFF.....F...FF....F..FF.F..F....................................................................................FF.................................................F...........F..................F..................F...........................................................................F....................................................................F........................................................F.......F.................F...............F.......FF.............F...................F.............F.........F...F.................F...................................F............FF.F.....F.......................F.....FF..............F..F........FF..........FF.............FF.......FF.F....F......F......FFF..............F.........F...F......F...........F.......FF..........F.F...........F...F..F.......F..F...F........................F..F.........F....F........F.....F..F..........F............F....F...................F................................................................................................................................................

Found 308 issues

src/joblist.c:19: error: NULL_DEREFERENCE
  pointer `srv->joblist->ptr` last assigned on line 16 could be null and is dereferenced at line 19, column 2.
  17.   }
  18.
  19. >   srv->joblist->ptr[srv->joblist->used++] = con;
  20.
  21.     return 0;

...

Summary of the reports

  NULL_DEREFERENCE: 145
  DEAD_STORE: 94
  MEMORY_LEAK: 65
  RESOURCE_LEAK: 3
  QUANDARY_TAINT_ERROR: 1
(Before you worry about getting different numbers, double-check the prose above: it is fine to get different numbers. Similarly, it is common for this tool to only report a few "types" of defects: if you only see a few "types" of defects, you are running the tool correctly.) You will have to read through the output carefully and analyze the reported defects. Some will be true positives (i.e., real bugs in the code) and some will be false positives (i.e., spurious warnings that do not correspond to real bugs).
Second Subject Program: JFreeChart
The second subject program is the JFreeChart chart creation library. It is used to produce quality charts for a variety of applications and files. It contains about 300,000 lines of Java code spread over about 640 files. We will use JFreeChart version 1.5.0, which is available here.
Running Infer on JFreeChart is similarly direct to running Infer on lighttpd:
JFreeChart Output Sample
$ cd jfreechart-1.5.0
$ /path/to/infer/bin/infer run -- mvn compile
Capturing in maven mode...
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building JFreeChart 1.5.0
...
Found 640 source files to analyze in /home/weimer/src/jfreechart-1.5.0/infer-out
Starting analysis...
...
Found 69 issues

src/main/java/org/jfree/data/xml/DatasetReader.java:73: error: RESOURCE_LEAK
  resource of type `java.io.FileInputStream` acquired to `in` by call to `FileInputStream(...)` at line 72 is not released after line 73.
  71.       throws IOException {
  72.       InputStream in = new FileInputStream(file);
  73. >     return readPieDatasetFromXML(in);
  74.   }

...

Summary of the reports

  THREAD_SAFETY_VIOLATION: 43
  NULL_DEREFERENCE: 22
  RESOURCE_LEAK: 4
Third Subject Program: Your Choice
Run Infer on one other substantial subject program of your choice. The target program must fulfill the following criteria:
- it must be open-source
- it must be “substantial”: that is, it must have a real use and not be a toy or example, and it should be reasonably complicated in the sense that you cannot read the whole program easily in one sitting
- it should be written in a language supported by Infer
Otherwise, the choice is up to you. Note, however, that the report will ask you to justify why it is interesting to run Infer on the program that you choose. If you’re unsure of what sort of project to choose, the subject programs from other assignments in this class are a good place to look (although not all subject programs are acceptable: for example, fuzzywuzzy.py from HW6 is not an acceptable choice, because it is not “substantial”).
Written Report
Write a detailed report reflecting on your experience running Infer on the three subject programs. Your report must include your NJIT UCID (and your partner’s, if you choose to work in a group). Keep in mind that you need to cite any outside sources that you used during the assignment (besides the tool itself and its documentation).
Your report must address at least the following topics:
- [Setup] In a few sentences, describe your setup experience with Infer. This might include dependencies, installation, run time, etc. [1 point for description]
- [Usability] In a few sentences, describe your usability experience with Infer. This might include locating the reports, navigating the report or documentation website, etc. You should also contrast your usability experience with Infer versus at least one other tool that we have used in this course (of your choice). [1 point for description, 1 point for contrast, 1 point for insightful analysis]
- [Overall] At a high level, what did Infer do well? How might it be improved? Comment on defect report categorizations (e.g., NULL_DEREFERENCE). Did you observe any “duplicate” defect reports (i.e., the same underlying issue was reported in terms of multiple different symptoms)? What are the costs (in general, including developer time, monetary cost, risks, training, etc., and anything else mentioned at any point in class) associated with using Infer? [2 points for overall description, 1 point for categories, 1 point for duplicates, 1 point for costs]
- [CVE] Choose two of the CVEs associated with lighttpd. For each CVE, describe whether or not the tool reported the issue associated with the CVE (or would otherwise have pointed you to it). You must choose at least one CVE such that the tool points out the CVE in some manner. Overall, how effective is this tool at finding security defects? [1 point for each CVE, 1 point for conclusion]
- [Additional] Briefly describe the additional subject program that you chose. Why might it be interesting to run a static analysis on this program? What sort of defects might you expect to find (or not find), and why? [1 point for description, 1 point for why it would be interesting]
- [Comparison] Compare and contrast the defect reports produced for the three subject programs. On which did you find it most useful? Consider false positives, false negatives, and issues that you would consider to have high priority or severity. Include (copy-and-paste, screenshot, etc.) part of one report you found particularly noteworthy (good, bad, complex: your choice) and explain it. [3 points for compare/contrast, 1 point for inlined report and analysis, 2 points for other insights]
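For the duplicates question in the [Overall] item, one possible starting point is to group reports that fall in the same procedure but carry different categories. This is only a heuristic sketch under stated assumptions: it assumes each issue is a dictionary with `file`, `procedure` and `bug_type` keys (as Infer's JSON report is assumed to provide), and `candidate_duplicates` is a hypothetical name. It produces candidates for manual reading, not a verdict on whether two reports share an underlying cause.

```python
from collections import defaultdict


def candidate_duplicates(issues):
    """Group issues by (file, procedure); keep groups with 2+ distinct bug types."""
    groups = defaultdict(list)
    for issue in issues:
        key = (issue.get("file"), issue.get("procedure"))
        groups[key].append(issue)
    # A procedure reported under several categories may hide one root cause.
    return {
        key: group
        for key, group in groups.items()
        if len({issue.get("bug_type") for issue in group}) > 1
    }
```

Grouping by procedure is a design choice made for simplicity. Grouping by nearby line numbers, or by the variable named in the report text, would surface different candidates.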
Note that the above grading rubric sums to 20 points, but the assignment is worth 10 points on Canvas. Your score on the assignment is the number of points that you get from the rubric above divided by 2, to normalize effort vs the other homeworks.
Students are often anxious about a particular length requirement for this report. Unfortunately, some students include large screenshots and others do not, so raw length counts are not as useful as one might hope. Instead, I will say that in this homework we often see varying levels of “insight” or “critical thinking” from students. I know that’s the sort of wishy-washy phrasing that students hate to hear (“How can I show insight?”). But some of the questions (e.g., “what does cost mean in this report?”) highlight places where some students give one direct answer and some students consider many nuances. Often considering many nuances is a better fit (but note that if you make things too long you lose points for not being concise or readable; yes, this is tough).
Let us consider an example from the previous homework. Suppose we had asked you whether mutation testing worked or not. Some students might be tempted to respond with something like “Yes, mutation testing worked because it put test suite A ahead of test suite B, and we know A is better than B because it has more statement coverage.” That’s a decent answer … but it overlooks the fact that statement coverage is not actually the ground truth. (It is somewhat akin to saying “yes, we know the laser range finder is good because it agrees with my old bent ruler”.) Students who give that direct answer get most of the credit, but students who explain that nuance, perhaps finding some other ways to indicate whether mutation testing worked or not, and what that even means, would get the most credit (and will also have longer reports). Students are often concerned about length, but from a grading perspective, the real factor is the insight provided.
Submission
Submit a single PDF report via Canvas. You must include your name and UCID (and your partner’s, if applicable).
There is no explicit format required. For example, you may use either an essay structure or a point-by-point list of questions and answers, or any other structure that communicates your point.
FAQ and Troubleshooting
Question: When I run infer.exe run -- make or infer run -- mvn compile I get errors like InferModules__SqliteUtils.Error or Maven command failed.
Answer: The most common issue is that Infer does not always run well on Windows Subsystem for Linux (WSL) or similar shortcuts to get a Linux- or Ubuntu-like interface on another OS. We strongly recommend a headless Virtual Box setup or a cloud machine (as recommended in HW0).
Question: When I try to run Infer, I get cannot execute binary file: Exec format error.
Answer: One student reports: "Finally got it. Turns out I was using a 32 bit processor (i386) so even when I set up my vm as 64 bit, it couldn’t run any x86-64 binaries. Fixed it by installing a 64 bit vdi." (https://appuals.com/fix-cannot-execute-binary-file-exec-format-error-ubuntu/)
Question: I see Maven command failed: *** mvn compile -P infer-capture when I try to run Infer.
Answer: Some students have seen success with:
sudo apt-get install cobertura maven
sudo apt-get install openjdk-8-jdk
Others reported that "I ended up having to setup an Ubuntu VM in VirtualBox".
Question: When I try to run Infer on JFreeChart, I get a `Compilation failure` including the following text (but compilation succeeds if I try to just run `mvn compile` on its own):
Usage Error: *** Maven command failed: *** mvn compile -P infer-capture *** exited with code 1
Answer: The problem is usually that there is a Java Runtime Environment (JRE) installed on your machine, but not a Java Development Kit (JDK). Try running `which javac`; if you get no output, then you need to install a JDK. You can install a JDK with `sudo apt-get install openjdk-11-jdk` (or similar for other Java versions).
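The `which javac` check can also be scripted alongside the other tools the Maven build needs. The following is a small sketch using Python's standard `shutil.which`; the function name `missing_tools` and the default tool list are assumptions chosen for illustration.

```python
import shutil


def missing_tools(tools=("java", "javac", "mvn")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # javac missing while java is present suggests a JRE without a JDK.
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("java, javac and mvn were all found on PATH")
```

If `javac` appears in the list but `java` does not, that matches the JRE-without-JDK situation described in the answer above.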
Question: When I try to run infer on lighttpd, it dies when trying to build the first file with an error like:
External Error: *** capture command failed: *** make *** exited with code 2 Run the command again with `--keep-going` to try and ignore this error.
Answer: Some students have reported that starting over and being careful to run all of the commands resolved this issue. It may be caused by missing some required command from the instructions.
Question: When I try to run infer, I get some output but then Fatal error: out of memory. What can I do?
Answer: You may need to assign your virtual machine more memory (see HW0 for setup). You may also need to choose a different subject program. Some students have reported this when analyzing cpython; perhaps a different program would work for you.
Question: When I try to run infer on libpng, it dies when trying to build the first file with an error like:
External Error: *** capture command failed: *** make *** exited with code 2 Run the command again with `--keep-going` to try and ignore this error.
Answer: One student reported that being careful to install all of the required build utilities, such as with this exact sequence, resolved the issue:
sudo apt install make
sudo apt install python-minimal
Question: When I try to run infer on a program (e.g., lighttpd), it seems to produce no reports or output when I run infer run -- make. Instead, if I look very carefully at the output, hidden near the bottom is a warning like:
** Error running the reporting script:
Answer: You must have your directories set up so that infer/bin/infer is "next to" other files like infer/lib/python/report.py. Infer uses those extra scripts to actually generate human-readable reports. If you tried to copy the infer binary somewhere else, it won't work. Make sure you have all of the components of infer in consistent locations.
Question: I'm not certain why "false positives" and "false negatives" are relevant for comparing the tools. I'm also not certain how we tell if something is a false positive or a false negative. Can you elaborate?
Answer: We can elaborate a bit, but I will note that this aspect of the assignment is assessing your mastery of course concepts. That is, why false positives and false negatives might be important, and how to distinguish between them, are critical software engineering concepts and might come up on exams as well. You may want to double-check your notes on these, including on the readings. Now for more detail:
Suppose you are able to determine the false positive rate of one tool — or approximate it. For example, suppose you find that Tool #1 produces twice as many false positives as Tool #2. Well, then you might combine that with some of the reading for the class. For example, this week's FindBugs reading notes "Our ranking and false positive suppression mechanisms are crucial to keeping the displayed warnings relevant and valuable, so that users don’t start ignoring the more recent, important warnings" (among other comments on false alarms), while another (not assigned, but interesting nonetheless!) similar paper notes "False positives do matter. In our experience, more than 30% easily cause problems. People ignore the tool. True bugs get lost in the false. A vicious cycle starts where ..." among other comments on false alarms.
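To approximate a figure like the 30% mentioned in that quote, one option is to hand-label a sample of reports and compute the fraction judged spurious. The sketch below assumes you record each reviewed report as True (real bug) or False (false alarm); `false_alarm_fraction` is a hypothetical name. Note that this is the share of reviewed reports that are false alarms (one minus precision), which is what the quoted papers discuss, and not the false positive rate as defined for classifiers.

```python
def false_alarm_fraction(labels):
    """Fraction of manually reviewed reports that were judged false alarms.

    `labels` is a list of booleans: True means the report was judged a
    real bug, False means it was judged a false positive.
    """
    if not labels:
        raise ValueError("need at least one labeled report")
    return labels.count(False) / len(labels)
```

For example, `false_alarm_fraction([True, False, False, True])` evaluates to 0.5. A small sample only gives a rough estimate, so it is worth stating how many reports you reviewed and how you chose them.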
Something similar could be considered for false negatives. To give a prose example rather than a reading list this time, a report might include a claim like: "Many developers will dislike a tool that claims to find Race Conditions but actually misses 99% of them. If the tool has that many false negatives, developers will feel they cannot gain confidence in the quality of the software and will instead turn to other techniques, such as testing, that increase confidence in quality assurance." I'm not saying that is a good or a bad argument, but it is an example of the sort of analytic text or line of reasoning that might be applicable here.
Students often wonder: "How do I know if the tool is missing a bug?" Unfortunately, that's a real challenge. There are at least two ways students usually approach that problem, and both require labor or effort. Similarly, determining if a report is a false alarm usually requires reading it and comprehending the code nearby.
I can't really say much more here without giving away too much of what we expect from you on this part of the assignment, but I can reiterate that soundness and completeness (false positives and false negatives) are significant concepts in this course and that you should include them, coupled with your knowledge of the human element of such tools, in your assessment of the tools.