R2018a Lessons Learned

How Lessons Learned are Managed

GMAT lessons learned include things that we did well and should keep doing, and large-scale things we should be doing to improve the software or our project. Each lesson learned is discussed by the team, and if we decide there is a real issue, we require a plan for improvement. To make sure we handle lessons learned efficiently, here are some high-level guidelines for creating them.

What is a Lesson Learned

Lessons learned are issues that caused, or could have caused, significant problems, or cases where we did something that significantly improved the software or our project. Lessons learned require group discussion and likely a change in team habits, processes, or strategy.

Lessons learned satisfy one of the following criteria:

  • Issue that is putting the project at greater risk than necessary
  • Issue that is causing significant inefficiency
  • Issue that is significantly lowering quality

What is Not a Lesson Learned

A lesson learned is not a minor annoyance, a tweak to an existing process, or something that can be resolved between team members in the everyday process of getting work done. Team members should bring these types of issues up at meetings or work them out directly with the people involved.

A minor issue (i.e., not a lesson learned) satisfies one of these criteria:

  • Tweak to an existing process
  • Minor annoyance or gripe
  • Can be resolved by picking up the phone, discussing via email, or raising at a weekly meeting
  • Does not require significant change in habits or processes

Things We Should Keep Doing

Things We Should Change

Do Better

  • Linux script test system runs are currently performed from a folder named "linuxBin" in the test/script folder to avoid conflicts with the test/script/bin folder.  The scripts used to test on Linux should be merged into the bin folder to simplify repository updates.  (This has been tested and has no conflicts, but it's too close to release to do it until R2018a is out the door.)
  • There were still a lot of test configuration changes after QA Complete, such as not running nav tests that we should have been running. We need to formalize how we verify that the test system configuration is nailed down by QA Complete: which folders should be run, review of the profiles spreadsheet, etc. We made a lot of changes two days before Code Freeze.
  • We need to check the Mac builds on any more recent operating systems we can try them on EARLIER in the process, not at the very end of the release process.
  • Scrub the system for deprecated features and remove at least the low-hanging fruit, in both code and tests:
    • Save
    • LibCInterface
    • TrackingSystem
    • StatisticsAcceptFilter
    • StatisticsRejectFilter
  • Deprecated Files
    • Xerxes ?
    • ig_rz_origin.dat
    • ap_origin.dat
    • ig_rz_R2016A.dat
    • unused/obsolete NAIF kernels
  • Always build the system and run smoke tests before committing and pushing code. (This includes when merging code from one branch into another.) This is particularly critical near system freezes (QA Complete, Visual Freeze, and especially Code Freeze).
  • Check sample missions on all platforms
    • Currently, running samples is assigned to one person.  We should modify the release process to assign it to one person per platform.
  • Seriously consider adding the bundling shell scripts to the daily build script so that we create the zip/installer packages every day and test against those (see the bundling sketch after this list).
  • Include all of the plugin components when running smoke tests and system tests. We are looking for side effects as well as failures in the component being worked on, so we need to watch for those everywhere.
  • Raise and resolve test case failures as early as possible.  The recent issue with Ephem_GMAT_Code500_ACE_OpsPrototype_v13 was thought to be a config issue, but there were other signs of a problem that we neglected.
  • Log the exact Git commit and SVN revision used for daily builds and regression tests. It can be difficult to determine where in the commit graph the branch used for the build is, especially when multiple branches are being developed in parallel. (See the provenance-logging sketch after this list.)
  • Some pushes to the central repository will not compile. That makes it difficult to track down when a change occurred. It would be useful to include a note in the commit message stating that the code does not compile, and why.
  • We should run the Console version on Windows and the GUI version on Mac at Feature Complete to identify any problems; then we can fix issues by QA Complete.
  • Similarly, we should update data files at Feature Complete so that we can debug by QA Complete.
  • The Console/Windows, GUI/Mac, and data file updates should be repeated when testing RC0.
  • We need to decide on a realistic approach to updating the User Guide.
  • We need to look at rundef/startup file conflicts and how to avoid them; for example, turning alpha plugins off in the rundef can hide the error of not turning them off in the startup file.
  • We should track tickets by release and sprint, not just sprint.
  • We should default to putting the current release in the “Fix Version” field in JIRA, and remove it later if the work is to be deferred.
  • We should define Rnnnnx-Sprint-Final as the time between Feature Complete and Code Freeze, and number all sprints before Feature Complete.
  • We should have defined terminology for what should go in the “Fix Version” field for JIRA; all tickets should have one of the standard values. A notional set for discussion:
    • CCB incoming – new tickets for CCB disposition
    • Rnnnnx – assigned to a release
    • Near Term backlog – needs to go in next release
    • Long Term backlog – needs to go in some future release (I’ve seen “someday” in the Fix Version field; this is equivalent)
    • Backlog – tickets to review to see if they are overtaken by events or should be put in some other category.
  • (SPH) We need to complete complex features earlier in the development cycle and get them out to users for beta testing. We have too many bugs slip through because we assume that our system tests, which are quite rigorous, will catch everything. But that is not the case. A few months of user testing will find things that we missed.
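
A minimal sketch of the build-provenance logging suggested above, recording the exact Git commit and SVN revision used for a daily build. The checkout locations and log file name are assumptions for illustration, not part of the current build scripts; Python is used here only as an example language.

    # Sketch: record the exact Git commit and SVN revision used for a daily build.
    # Checkout locations and the log file name are placeholders for illustration.
    import subprocess
    import datetime

    GIT_CHECKOUT = "/path/to/gmat-git-checkout"   # hypothetical Git checkout
    SVN_CHECKOUT = "/path/to/gmat-svn-checkout"   # hypothetical SVN checkout
    LOG_FILE = "daily_build_provenance.log"       # hypothetical log file name

    def run(cmd, cwd):
        """Run a command and return its trimmed stdout."""
        return subprocess.check_output(cmd, cwd=cwd, text=True).strip()

    git_commit = run(["git", "rev-parse", "HEAD"], GIT_CHECKOUT)
    git_branch = run(["git", "rev-parse", "--abbrev-ref", "HEAD"], GIT_CHECKOUT)
    svn_revision = run(["svn", "info", "--show-item", "revision"], SVN_CHECKOUT)

    with open(LOG_FILE, "a") as log:
        log.write(f"{datetime.datetime.now().isoformat()}  "
                  f"branch={git_branch}  git={git_commit}  svn={svn_revision}\n")

A line like this in the daily build and regression logs makes it unambiguous which sources produced a given build, even when several branches are being developed in parallel.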

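For the daily-bundling suggestion above, here is a minimal sketch of how the daily build script might call the existing bundling shell scripts and produce a dated zip to test against. The script name and staging directory are assumptions for illustration only, not the real bundling scripts.

    # Sketch: run the bundling step from the daily build and archive the result
    # so testers always have a fresh, dated zip to test against.
    # The script name and staging directory are placeholders for illustration.
    import subprocess
    import shutil
    import datetime

    BUNDLE_SCRIPT = "./make_bundle.sh"      # hypothetical bundling shell script
    BUNDLE_STAGING_DIR = "bundle_output"    # hypothetical staging directory

    # Fail the daily build if the bundling step fails.
    subprocess.run([BUNDLE_SCRIPT], check=True)

    # Zip the staged bundle with a date stamp, e.g. gmat-daily-2018-03-30.zip
    stamp = datetime.date.today().isoformat()
    shutil.make_archive(f"gmat-daily-{stamp}", "zip", BUNDLE_STAGING_DIR)
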
Stop Doing