[GAP Forum] Lookup tables in gap
Arnaldo Mandel
am at ime.usp.br
Mon Jun 30 19:46:21 BST 2008
Alexander Hulpke wrote (on Jun 27, 2008):
> Dear Arnaldo Mandel, Dear Forum,
>
> > First, obvious solution: produced an ordered list L of all labels
> > (that was fast!). Although L is longer than N, it is true that the
> > label of N[i] is L[i], so, given a label s, the corresponding record
> > is N[Position(L,s)]. Also, since L is ordered, lookup should be fast.
> >
> > That seems fine, so I tried a traversal, which I know takes linear
> > time on the number of links. It took forever. I had the routine
> > print a progress report, and it was clearly crawling.
>
> from the description given, I understand that you have a list of
> strings, in which you are searching. The performance problems indicate
> that the strings are not immutable. In this case GAP cannot store that
> the list is sorted, but checks it every time.
> (The reason for this slightly disturbing behaviour is that it would be
> possible to change one of the strings (and thus making the list not
> sorted) without the list noticing. Section "Sorted Lists and Sets" in
> the manual has more details.
>
> A workaround is easy. Simply do
> for i in L do
> MakeImmutable(i);
> od;
>
> This should give you a substantial speedup.
Thanks for the tip! Although I had already worked around the problem,
I decided to test your suggestion. After all, I have known GAP
forever, but only recently I started to use it seriously. So, the
least I can get from this exercise is a better understanding of GAP.
I read about MakeImmutable, and it seemed to me that the simple
statement MakeImmutable(L); would accomplish the same as your loop
above. A little test confirmed it.
So, I tried this in my old traversal routine. Instead of using the
large list L, I used only the small one, N. Still, after a while it
was clear that it had not helped: the traversal still crawled. Just
to recall what I said before: on the same structure, with strings
mapped already to indices, the whole traversal took less than a
minute.
(A back of the envelope calculation: Position is called once in the
innermost loop of the traversal. Since Size(N) is just under a
million, execution of Position entails about 20 string comparisons;
this is probably 100 times longer than direct indexing, but this would
be still faster than what I observed, and would not account for the
gradual slowing down in tha algorithm)
Then I tried to see how efficient would be a record as an associative
array. In what follows I present an interaction so that it is made
clear what I did. The list N is called names, below:
gap> Size(names);
973438
gap> IsMutable(names);
false
gap> ForAny(names,IsMutable);
false
gap> ForAll(names,IsString);
true
gap> Maximum(List(names,Length));
141
# Note: more than half have length between 12 and 16.
gap> R:=rec();
rec( )
gap> for i in [1..Size(names)] do
> R.(names[i]):=i;
> if i mod 128 = 0 then
> Print("\r",i);
> fi;od;
So, I could see the progress along i. In the beginning, it looked
like a quick animation, one could barely read the numbers. At about
i=300000, things started to slow down visibly. It was taking 1s for
each block of 128 indices. Now it has reached 500000, and it takes
about 3s for each such block. I will let it go to see whether it
finishes or goes to exponential hell.
Cheers,
am
--
Arnaldo Mandel
Departamento de Ciência da Computação - Computer Science Department
Universidade de São Paulo, Bra[sz]il
am at ime.usp.br
Talvez você seja um Bright http://the-brights.net Maybe you are a Bright.
More information about the Forum
mailing list