How Lessons Learned are Managed

GMAT lessons learned include things that we did well and should keep doing, and large-scale changes we should make to improve the software or our project. Each lesson learned is discussed by the team, and if we decide there is a real issue, we require a plan for improvement. To make sure we handle lessons learned efficiently, here are some high-level guidelines for creating them.

What is a Lesson Learned

Lessons learned are issues that caused, or could have caused, significant problems, or cases where we did something that significantly improved the software or our project. Lessons learned require group discussion and probably a change in team habits, processes, or strategy.

Lessons learned satisfy one of the following criteria:

Issue that is putting the project at greater risk than necessary
Issue that is causing significant inefficiency
Issue that is significantly lowering quality

What is Not a Lesson Learned

A lesson learned is not a minor annoyance, a tweak to an existing process, or something that can be resolved between team members in the everyday process of getting work done. Team members should bring these types of issues up at meetings, or work them out among the team members involved.

A minor issue (i.e., not a lesson learned) satisfies one of these criteria:

Tweak to an existing process
Minor annoyance or gripe
Can be resolved by picking up the phone, discussing via email, or raising it at the weekly meeting
Does not require significant change in habits or processes

Things We Should Keep Doing

[ JHE ] Assigning managers to oversee specific aspects of the release (code freeze, visual freeze, test freeze) helped the release make steady progress
[ JHE ] Continue using dashboards and tags to track the release progress and categorize the priority of different tickets towards the release
[ MES ] Start the paperwork process at the start of development for the next release; mainly internally, but stay in touch with lawyers and others about any issues we anticipate. The main thing is to write a draft of the release contents section and keep track of contributors as we go along. In our progress tracking, paperwork should be a path parallel to the technical work, in other words, not in the line of progress between the various freezes. (Although technical milestones do help us understand progress, I don't think the paperwork depends on the completion of any of them until we are staging for release and must have the final approval.)

Things We Should Change

Do Better

[ AHC ] Prepare the release announcement text earlier. Do not write it the same day we post it to SourceForge news.
[ JHE ] Review the demarcation between code freeze, test freeze, and app freeze, with particular emphasis on the boundary between the first two.
- [ MES ] Seconded. Another way of putting it: clearly state what "done" means for each phase.
[ PJC ] Assign freeze managers 6 months in advance of the release so they can review their duties and begin work as soon as possible. This will reduce the overall work that needs to be done. Perhaps have these positions filled immediately after the previous release.
- [ MES ] The first step is to streamline the process for each step and to pull the paperwork out into a separate, fully parallel process. If you want to teach people their duties, the first step is to clearly define what they are.
[ DJC ] Review release periodicity and streamline the process. Two years is a long interval between releases; smaller, more frequent releases might cost less time and might enable "greasing the skids" with the release authorization organizations.
- [ MES ] Some ways to streamline the process:
  - Keep the number of test failures in check throughout development – build bug-fixing time into schedule
  - Periodically perform some of the release tasks (e.g., removing compiler warnings) throughout development; see below
  - Create separate scripts and CMake configurations for building the public release candidate, internal release candidate, public build, and nightly build. On the Mac, going into R2022a, we had only a nightly build script and were following Wendy's step-by-step instructions for release candidate builds.
  - Decide how "from scratch" the creation of a release should be. The standard in Wendy's instructions was to delete the CMake cache; do we want to carry this further and create new repos for the release? This isn't strictly a streamlining thing, but a decision that should be considered.
  - Build an RC0 at feature complete that includes running configure.py and updating data files such as the EOP and leap second files, so this doesn't get left to the last minute.

Start Doing

[ MES ] Get processes clearly documented so that
- people can clearly understand their tasks
- we are protected against people leaving and can train new people
[ SES ] Consider developing and testing every bug and new feature in its own branch. Only push feature code and test cases to dev when complete and 100% passing. This way dev is always in a "ready to deliver" state.
- [ PJC ] If this is not feasible, ensure we stay on top of failing test cases. If a developer pushes a feature that breaks test cases, they are not finished with their task until all tests pass again.
- [ MES ] This is the way git is designed to be used.
- [ MES ] We should also periodically do some of the test activities during development (perhaps quarterly?). I'm thinking of things like:
  - Removing compiler warnings
  - Running static analysis tools
  - Possibly running a profiler as well, to pay more attention to performance
  - Updating the /data directories (EOP, leap seconds, density models,...)
  - Possibly deleting CMake cache and re-running configure.py – it may be worth continually showing the build process works and is properly documented.
    This is in addition to keeping up with failing test cases, which is more important than any of these individually. Collectively, these items should save effort later.
  - [ PJC ] Save mode tests

[ SES ] Testers to pay more attention to maintaining and updating the Ubuntu and Mac test results.
- [ MES ] We need a clear definition of what is to be stored on mesa-file. I think people have ideas, but nothing is written down anywhere.
- How can SES get access to Mac results? SES and AC to converse.
[ SES ] Tag, label, and categorize tickets more consistently to make it easier to identify what bugs were fixed and what bugs exist in any release
- [ MES ] We need instructions in the "New team member's" guide.
[ DSC ] With each release, or even outside the release process, consider creating a "Getting Started with GMAT" doc based on our existing "GMAT Stakeholder Overview-Training Resources" document. Link to this doc from both our Wiki and SourceForge.
- We often get questions like this from customers.
- Perhaps we can find a customer willing to fund new and updated GMAT training videos.
[ AHC ] GMAT would benefit from having a dedicated Test Architect/Manager
- Managing all 3 test systems
- Keeping track of all failing tests and updating the corresponding Jira tickets
- Setting up and overseeing a continuous integration pipeline
- They could also act as a build manager so whenever we needed a specific build on any system we know who's able to make it.
[ AHC ] Clean up the Jira ticket workflow
[ PJC ] Take notes on all issues discovered and fixed during the release process. These can be added to the notes section of the release process page and do not need to be in depth. Example: Item: Perform Static Analysis → Note: Performed static analysis, discovered 22 issues that were resolved via minor code updates over 2 weeks.

Stop Doing

[ MES ] Stop using SVN – I don't think it should be too hard to set up a git repo for test cases.
