R2020a Lessons Learned

How Lessons Learned are Managed

GMAT lessons learned include things that we did well and should keep doing, and large-scale things we should be doing to improve the software or our project. The team discusses each lesson learned, and if we decide there is a real issue, we require a plan for improvement. To make sure we are handling lessons learned efficiently, here are some high-level guidelines for creating them.

What is a Lesson Learned

Lessons learned are issues that caused, or could have caused, significant problems, or cases where we did something that significantly improved the software or our project. Lessons learned require group discussion and probably a change in team habits, processes, or strategy.

Lessons learned satisfy one of the following criteria:

  • Issue that is putting the project at greater risk than necessary
  • Issue that is causing significant inefficiency
  • Issue that is significantly lowering quality

What is Not a Lesson Learned

A lesson learned is not a minor annoyance, a tweak to an existing process, or something that can be resolved between team members in the everyday process of getting work done. Team members should raise these types of issues at meetings or work them out directly with the people involved.

A minor issue (i.e., not a lesson learned) satisfies one of these criteria:

  • Tweak to an existing process
  • Minor annoyance or gripe
  • Can be resolved by picking up the phone, discussing via email, or raising at a weekly meeting
  • Does not require significant change in habits or processes

Things We Should Keep Doing

  • [ MES ] Release-focused CCB meetings separate from the Thursday team meeting worked well. It is better to start fresh on tickets than to start after an hour or more of another meeting, and it seemed to me that the CCB was much more thorough than it was for R2018a. It may be good to separate out the CCB during development as well, but the big payoff was during the release process.

Things We Should Change

Do Better

  • [ SPH ] We need to make sure that all development follows our process as documented in the SMP and in the software development tickets in JIRA. We found significant issues during late-stage development that would have been avoided if the checklists in those processes had been used during development.
    • [ MES ] This needs to be a no-exceptions rule. On the FOV work we had debugged code for TAT-C and assumed that it could simply be converted class-by-class into GmatBase-derived classes. However, that is where we missed things like variables in Hardware being used by Thruster in a completely different way from how the FOV computations would use them. The rework of Hardware/FieldOfView can be used to illustrate the documented process step by step.
  • We need to test the samples earlier and more frequently on all three platforms (perhaps weekly, starting right before the release process and continuing with the RCs). NOTE that the samples should be run on a clean build without (or before) running preparegmat, to find missing input data files.
  • After editing sample scripts, and before committing and pushing, run the entire relevant folder so that interdependency issues are caught. (Example: some scripts that were #Include scripts were deleted; if the related folder had been run, that would have been caught.) See the folder-run sketch after this list.
  • Some code is missing elements that should be added before being pushed to the main repository. Examples are the block at the top of the file identifying the copyright information, and comments on class parameters and methods. We need to do better at reviewing code before pushing it, because the coder is much less likely to make those changes later in the process.
  • Improve the regression testing process
    • [ SPH ] Mac tests need to be run regularly, with results emailed out just like on the other platforms. We got lucky here, but we could have found issues late in the release.
    • [ SPH ] We need to update the regression test process to include API tests. API tests cannot be run in the same RunDef as the rest of the GMAT regression tests because they require a different mode. We need to create a new master test runner file that sets up multiple run defs; see the multi-RunDef runner sketch after this list.
  • [ JM ] Use branches more often to reduce the development that happens directly on master.
  • [ SPH ] Update the business model to focus development on project needs.
  • [ SPH ] There are release processes that should be moved to development processes so they occur early and ensure nightly builds are as near release-ready as possible. Examples: requirements migration, RTTM mapping, wrap-up tests, and the XML and RST docs.
  • [ SPH ] Keep tests in the release configuration. There are too many config changes late in the release that hurt us. Cycle through configurations such as internal, public, and alpha during weekly or monthly regression tests so that we are always testing the different configurations.
  • [ MES ] Add/move tasks to be accomplished to attain the feature-complete milestone:
    1. Merge key branches into master, after first merging master into each branch and running tests (at the very least, smoke tests in the branch) before merging back to master
    2. Update the data files with the GMAT Python utility (now the first task on the Code Freeze list)
    3. Move all new test scripts that are complete at this point into gmat_test/test/script/input
    4. Move all new sample scripts that are complete at this point into gmat/application/samples
    5. Create an RC0 on each platform with new capabilities as they stand, with internal and alpha plugins all included (in short, everything included); then run the nightly build tests.

            The idea here is to have a "super-smoke test" at the beginning of release testing to identify issues early. I think we would have caught the API issues, the OFI issues, and at least the impact of Hardware changes on the Thruster class.

  • [ MES ] Acquire a new test Mac so we don't have to run on developer laptops. Even if we build GMAT on the test machine and unzip it onto developer machines, we still have to split into two sets of tests because the VPN craps out after 24 hours. We could explore whether we can get this limit increased to 48 hours, but I'm not optimistic about how easy it would be to waive IT's policies.
  • [ DJC ] Release more frequently.  That would have several benefits:
    • It is easy to forget how to perform some steps.  Two years between releases is too long.
    • This release is quite complex, with many new features.  More frequent releases would reduce that complexity.
    • External teams waiting for fixes or new features would be less inclined to move to other tools.

      Doing this would require:
    • A more streamlined release process:
      • Internal:  Perhaps make intermediate releases beta, but if so we must stick to the beta schedule.
      • External:  The release authority and other NASA approval chain entities would need to be prepped for this change.
    • Merging of new features as they occur.  (We should be doing this anyway.)
  • [ SPH ] Create a note-card commit/push checklist.
  • [ DSC ] Keep the Mac regression test system in good shape, e.g., update to a more recent MATLAB version.
  • [ SN ] Investigate the cost/benefit of including a GitKraken Pro license as a line item in the GMAT budget. Separately, consider voting on a uniform repo visualization tool for the whole team that works well and looks the same on all OS platforms.
  • [ RM ] Write-protect the master branch, and STRONGLY encourage development on feature branches.
    • This does NOT mean the master branch gatekeeper needs to review merges into master.
    • This DOES mean that the branch owner will have to ask the gatekeeper to merge their branch into master. This will nearly eliminate accidental merges into master.
    • This will not be a heavy task. The gatekeeper exists merely to ensure that branch owners don't accidentally merge into master.
    • Look into whether the gatekeeper can be an automated script of some kind; see the pre-receive hook sketch after this list.
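
To support the sample-folder testing item above, here is a minimal sketch of a script that runs every GMAT script in a folder before a commit is pushed. The GMAT executable name, the .script glob, and the --minimize/--run/--exit flags are assumptions to adapt to the local install, and GMAT's exit code may not reflect every script failure, so the output log should still be checked.

    #!/usr/bin/env python3
    # Run every GMAT script in a folder so that broken #Include
    # dependencies and other cross-script issues surface before a push.
    import pathlib
    import subprocess
    import sys

    GMAT_EXE = "GMAT"   # assumed to be on the PATH; adjust as needed
    folder = sys.argv[1] if len(sys.argv) > 1 else "."

    failures = []
    for script in sorted(pathlib.Path(folder).glob("*.script")):
        # --minimize keeps the GUI out of the way, --run executes the
        # script, and --exit closes GMAT so the loop can continue.
        result = subprocess.run(
            [GMAT_EXE, "--minimize", "--run", "--exit", str(script)])
        if result.returncode != 0:
            failures.append(script.name)

    if failures:
        sys.exit("Scripts that failed: " + ", ".join(failures))
    print("All scripts in", folder, "completed.")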
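
For the multi-RunDef item above, a master runner could simply invoke the test system once per run definition. This is a hypothetical sketch: the RunDef file names and the gmattest driver call are placeholders for the real test-system interface, and the matlab -batch option assumes MATLAB R2019a or later.

    #!/usr/bin/env python3
    # Execute the regression suite once per run definition so that the
    # API tests, which need a different mode, get their own RunDef.
    import subprocess
    import sys

    RUN_DEFS = [
        "RunDef_Scripts.m",   # placeholder: script-based regression tests
        "RunDef_API.m",       # placeholder: API tests in their own mode
    ]

    overall = 0
    for run_def in RUN_DEFS:
        print("=== Running tests defined by", run_def, "===")
        # Placeholder driver invocation; substitute the real call that
        # the test system uses to consume a run definition.
        rc = subprocess.run(
            ["matlab", "-batch", "gmattest('%s')" % run_def]).returncode
        overall = overall or rc

    sys.exit(overall)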
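
For the gatekeeper item above, here is a sketch of a server-side pre-receive hook that rejects direct pushes to master from anyone but the designated gatekeeper. The GL_USERNAME variable is how Gitolite exposes the pushing user; the gatekeeper account name is a placeholder, and a different hosting setup would expose the user differently.

    #!/usr/bin/env python3
    # Server-side pre-receive hook: git feeds one "old new ref" line per
    # updated ref on stdin; a nonzero exit rejects the whole push.
    import os
    import sys

    GATEKEEPERS = {"gatekeeper_userid"}   # placeholder account name(s)
    pusher = os.environ.get("GL_USERNAME", "")

    for line in sys.stdin:
        old_sha, new_sha, ref = line.split()
        if ref == "refs/heads/master" and pusher not in GATEKEEPERS:
            sys.exit("Direct pushes to master are restricted; "
                     "ask the gatekeeper to merge your branch.")

    sys.exit(0)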

Stop Doing

  • [ SPH ] The number of permutations of different packages is too large: zip vs. installer, public vs. internal, Mac, Linux, Windows. I propose we consider releasing only a zip or only an installer on Windows to reduce the number of permutations of tests to run.
    • [ MES ] An alternative to this is to run only smoke tests on either the zip or installer, with full tests on the other.
    • [ MES ] The Mac process currently runs full tests only on the internal version, on the theory that the public version is a subset of those capabilities. This approach may be reasonable on the other platforms, with the addendum that all delivered versions should at least be smoke tested.
    • [ DJC ] On Linux, perhaps move to a package-based environment (e.g., Docker or Kubernetes).