After you've written a program, you might spend some time optimising it. You might unroll loops, or write parts of it in assembler. And so on.
But the algorithm that you use is massively more important than any optimisations you do afterwards.
The first example is from the antivirus I wrote. Someone once said to me, "I couldn't write a program that just reads all the files and runs as fast as your scanner." Yes, well. Back then, when we were talking about several thousand viruses, you didn't actually need to read the whole file to be certain that none of the known viruses were present. To achieve that, I had to analyse each virus to determine where in the file it had to be, if it was there at all. That meant I only needed to read part of each file.
I also used a three-stage process; the first stage scanned for just two bytes. That generated some false alarms, which the later stages weeded out, but 99% of files needed no further examination, which made things much faster.
And so on.
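The multi-stage idea can be sketched roughly like this. Everything here is invented for illustration (the signatures, offsets, and two-stage structure are my assumptions, not the actual scanner's format), but it shows why the cheap first stage pays off: most clean files fail the two-byte check and are never examined further.

```python
# Hypothetical illustration of multi-stage signature scanning.
# Each (invented) signature records the offset where the virus would
# have to sit, a cheap two-byte prefilter, and a fuller byte pattern
# used to confirm a match.

SIGNATURES = [
    # (offset, two-byte prefilter, full pattern) -- all made up
    (0x100, b"\xE9\x4C", b"\xE9\x4C\x01\x00\x00\x90\x90"),
    (0x040, b"\xB8\x21", b"\xB8\x21\x01\xCD\x21\x90"),
]

def scan(data: bytes) -> list[int]:
    """Return the indices of signatures that fully match `data`."""
    hits = []
    for i, (offset, prefilter, pattern) in enumerate(SIGNATURES):
        # Stage 1: look at only two bytes at the known offset.
        # The vast majority of clean files stop right here.
        if data[offset:offset + 2] != prefilter:
            continue
        # Stage 2: a two-byte match may be a false alarm, so confirm
        # against the full pattern before reporting an infection.
        if data[offset:offset + len(pattern)] == pattern:
            hits.append(i)
    return hits
```

Because each signature pins the virus to a known offset, the scanner never needs to search the whole file; it reads a few bytes at a handful of fixed positions.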
The second example is from a major database I run. Every six months or so, I have to run an indexing process, which usually takes several hours. When I ran it recently, it took nine hours. The problem is that the database has grown massively, and when I wrote the indexing program, I didn't think too hard about the algorithm.
Nine hours got me thinking, and I came up with a completely different algorithm to do the same job. I timed it: 45 seconds!