[GAP Forum] Lookup tables in gap
Frank Lübeck
frank.luebeck at math.rwth-aachen.de
Mon Jun 30 21:20:57 BST 2008
Dear Arnaldo Mandel, Dear Forum,
> First, obvious solution: produced an ordered list L of all labels
> (that was fast!). Although L is longer than N, it is true that the
> label of N[i] is L[i], so, given a label s, the corresponding record
> is N[Position(L,s)]. Also, since L is ordered, lookup should be fast.
Short answer: Use N[PositionSorted(L,s)]. More generally, always use
'PositionSorted' instead of 'Position', when you know that your list
is sorted (but you are responsible to make sure that this is true).
And below is a longer answer as well. I'll try to discuss some aspects of
the mentioned problem in a commented GAP session. Maybe this is of more
general interest, since even experienced GAP users are sometimes running
into certain traps when dealing with long lists.
> Last chance: use a record as an associative array. That is, create a
> record R such that, for every label s, R.(s) = Position(L,s). Looking
No, this doesn't help. In the moment record components are stored unsorted
and so they are searched linearly. (We will probably change this with the
next release of GAP.)
With best regards,
Frank
gap> # As an example we produce a sorted list of strings without duplicates,
gap> # containing almost 10^6 entries:
gap> L := Set(List([1..1000000], i->
> ShallowCopy(String(Random(100000000,900000000)))));;
gap> Length(L);
999379
gap>
gap> # Now L is mutable and has mutable entries.
gap> # Using 'Position' on L uses linear search:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9037
gap>
gap> # Here GAP cannot remember that L is actually sorted, because the list L
gap> # doesn't know about changes of its entries (and changing an entry could
gap> # make the list unsorted). So, GAP cannot do better above.
gap>
gap> # Now, if we make the entries of L immutable, the above problem cannot
gap> # occur, it will be no longer possible to change the entries of L (of
gap> # course, this step may not be an option for you, if you do want to
gap> # change the entries of L, maybe later).
gap> for s in L do MakeImmutable(s); od;
gap>
gap> # ok, let's try again:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9016
gap>
gap> # Hm, we get the same as before? The problem now is that GAP didn't run
gap> # through the whole list to check if it is sorted. Let's look at the
gap> # information GAP has stored about L:
gap> TNUM_OBJ(L);
[ 20, "list (plain,dense)" ]
gap>
gap> # Ok, we make GAP learning more about L:
gap> IsSortedList(L);
true
gap> TNUM_OBJ(L);
[ 40, "list (plain,table,ssort)" ]
gap>
gap> # Since the entries of L are now immutable, GAP can remember the
gap> # sortedness. Another try for our loops:
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
0
gap>
gap> # So, we are lucky that GAP has a builtin feature to remember that L is
gap> # sorted in this case! It can now use a much faster binary search. We
gap> # can run much longer loops:
gap> for i in [1..Length(L)] do p := Position(L, L[i]); od; time;
700
gap>
gap> # But, there can still be a problem in a very similar setting. Let us
gap> # double the last entry of L:
gap> Add(L,L[Length(L)]);
gap>
gap> # Now L is still sorted, but entries are not unique. Nevertheless, to
gap> # find an entry in L we could still use a binary search. But see what
gap> # happens:
gap> IsSortedList(L);
true
gap> TNUM_OBJ(L);
[ 38, "list (plain,table,nsort)" ]
gap> for i in [1..100] do p := Position(L, L[i]); od; time;
0
gap> for i in [1..100] do p := Position(L, L[900000+i]); od; time;
9049
gap>
gap> # Unfortunately, GAP has only a hook for remembering that a list is
gap> # sorted without duplicates, but not just that it is sorted. So, now
gap> # we are unlucky, even having immutable entries in L doesn't help.
gap>
gap> # But, remember my very first hint, if you know that a list is sorted
gap> # use 'PositionSorted' instead of 'Position', then you don't need to
gap> # care about all these internals:
gap> for i in [1..Length(L)] do p := PositionSorted(L, L[i]); od; time;
1052
gap>
gap> # Finally, let me mention an effect (a trap), which becomes worse if the
gap> # entries of a long list are again lists or records, and their entries
gap> # ....
gap> # If the entries are mutable then GAP runs recursively through this
gap> # structure whenever it is used as an argument of an operation. (This
gap> # is to find the "type" of the object and to find the right method for
gap> # the operation.) This is particularly annoying if the operation is
gap> # doing something quite cheap and most information in the 'type' of
gap> # the list is not needed.
gap> N := List(L, s-> rec(label:=rec(string:=[s])));;
gap> # 'Size' for a list returns its 'Length':
gap> for i in [1..100] do l := Size(N); od; time;
3940
gap> # If you use 'Length' directly, GAP uses a kernel hook to avoid the
gap> # problem in this particular case:
gap> for i in [1..1000000] do l := Length(N); od; time;
64
gap> # Making entries immutable helps. But again, sometimes this may not
gap> # be a solution because you do want to change your objects later.
gap> for r in N do MakeImmutable(r); od;
gap> for i in [1..1000000] do l := Size(N); od; time;
280
More information about the Forum
mailing list