I had an idea last night about how to redo the MMD distance calculations. It's easy to say that I was...affected by chromatic's post to the list awhile back about the inefficencies of the MMD system.
The current system loops over the function signature string to create a type tuple array PMC. This tuple is used with the list of possible candidates to calculate manhattan distances. Once all the distances are calculated, the list is sorted according to this distance and the "best" match (with the smallest distance) is returned.
I had an idea that using a Levinshtein distance on the signature string directly would prevent us the need to create tuples and then calculate a manhattan distance. Actualy, we could probably use a simple Hamming Distance for most cases, since we don't want to match transposed arguments or cases of incorrect argument numbers.
If I have some free tuits, I might throw together a prototype subclass of the MultiSub PMC to prove the idea works (and hopefully prove that it's a performance gain).
Why not use a low watermark algorithm to get the "best" match? Even if you need to keep track of all methods with the current "best" distance to break ties in the end, it seems sorting is unnecessary.
Having never implemented a MMD myself, I am kind of curious how a string edit distance (Levenshtein or Hamming) could be used as a replacement to Manhattan distance. Could you provide a simple example?
Re:Why Sort?
Whiteknight on 2009-01-06T23:25:58
The Parrot MMD system used to sort the candidate list, but apparently (according to chromatic++) we don't anymore.
As to the ability of a Hamming distance to perform this duty, it's just a thought that I had. It may turn out to be untenable, but it's something I would like to try. As far as I understand the type tuple generation code, the tuple only contains the information that the signature string contains, with the addition of the PMC type.
For simple situations, we can take a string like "SS->I", and compare it to various candidates:
"PP->I" (distance = 2)
"PS->I" (distance = 1)
"NP->I" (distance = 2)Here, the "PS->I" candidate is the closest winner, and can easily handle an SS->I function call by autoboxing the first candidate to a String PMC.
Again, it's just an idle thought I had.