Yes, I said in my original comment that regex can’t universally parse and validate every HTML document. But if they’re older pages that don’t do lots of crazy formatting, it’s not too hard to use regex as a first pass and then take a second pass through the results to weed out the odd stuff.
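To make the two-pass idea concrete, here is a rough sketch in Python. The sample HTML, the choice of pulling out link targets, and the names `first_pass`, `second_pass`, and `extract_links` are all made up for illustration. It assumes simple, old-style markup with quoted attribute values, and it is not a general HTML parser.

```python
import re

# Hypothetical sample of simple, old-style HTML (an assumption for this sketch).
SAMPLE = """
<a href="http://example.com/a.html">Page A</a>
<A HREF='/b.html'>Page B</A>
<a href="javascript:void(0)">Menu</a>
<a href="#top">Back to top</a>
<a name="anchor">No link here</a>
"""

# First pass: loosely grab any quoted href value inside an <a ...> tag.
# This deliberately over-matches; cleanup happens in the second pass.
HREF_RE = re.compile(
    r'<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def first_pass(html):
    """Return every href value the regex can find, junk included."""
    return HREF_RE.findall(html)


def second_pass(candidates):
    """Weed out the odd stuff: empty values, fragments, pseudo-links."""
    kept = []
    for href in candidates:
        href = href.strip()
        if not href or href.startswith("#"):
            continue
        if href.lower().startswith(("javascript:", "mailto:")):
            continue
        kept.append(href)
    return kept


def extract_links(html):
    """Run both passes in order."""
    return second_pass(first_pass(html))


if __name__ == "__main__":
    print(extract_links(SAMPLE))
```

The first pass stays dumb on purpose, and all the judgment calls live in the second pass, where they are easy to tweak as new oddities show up. Anything with unquoted attributes, commented-out markup, or heavy scripting is where this approach stops being worth it and a real parser makes more sense.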
I’ve started using Obsidian with a kanban plugin, though any decent kanban-style tool would work. I have a to-do column (aka backlog), an in-progress column, and a finished column. I add notes to the cards about what I did, and I never delete anything from the finished column, so I can review it if I need to re-open or re-do a task in the future.