Discussion:
[erlang-patches] improved orddict performance
unknown
2013-09-30 07:38:52 UTC
Permalink
This patch improves the performance of some orddict functions by
reimplementing them using the lists module:

https://github.com/erlang/otp/pull/91

The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.

--steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130930/4ad85951/attachment.html>
unknown
2013-09-30 08:00:37 UTC
Permalink
Hello Steve,

Great patch, I added a few comments about some edge cases.

Regards,
Post by unknown
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes performance measurements, so please read it to know more about the changes.
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
unknown
2013-09-30 09:32:52 UTC
Permalink
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.

--steve
Post by unknown
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20130930/25dd7495/attachment.html>
unknown
2013-10-02 07:02:35 UTC
Permalink
I've added a new commit to this branch that addresses the concerns Anthony
raised. Please refetch.

--steve
Post by unknown
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.
--steve
Post by unknown
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131002/3f32b449/attachment.html>
unknown
2013-10-02 07:14:25 UTC
Permalink
This looks great, major improvement for larger orddicts, +1
Post by unknown
I've added a new commit to this branch that addresses the concerns Anthony
raised. Please refetch.
--steve
Post by unknown
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.
--steve
Post by unknown
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
/* Sincerely
--------------------------------------------------------------
Pedram Nimreezi - Chief Technology Officer */

// The hardest part of design ? is keeping features out. - Donald Norman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131002/f31ba3e8/attachment-0001.html>
unknown
2013-10-02 21:36:39 UTC
Permalink
I've added a third commit to this branch to address new concerns and
suggestions from Anthony Ramine and Richard Carlsson. I appreciate your
help, guys.

Please refetch.

--steve
Post by unknown
I've added a new commit to this branch that addresses the concerns Anthony
raised. Please refetch.
--steve
Post by unknown
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.
--steve
Post by unknown
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131002/9f6f10fd/attachment.html>
unknown
2013-10-04 00:58:52 UTC
Permalink
OK, Anthony brought up yet another issue with my previous modified orddict,
which is that if someone passes an improper list instead of an orddict, the
new version acts differently than the old. Since the goal is backwards
compatibility with improved performance, I've added a fourth commit that
deals with these issues.

Notably, the original orddict, if passed an improper list or a list with
elements that are not 2-tuples, will still work properly if the key being
dealt with occurs before the problematic part of the list. For example,
this works with the R16B02 orddict:

{ok,2} = orddict:find(1,[{1,2},{2,3,4}]).

I believe the latest commit addresses this and the previous comments, while
still providing an orddict with improved performance:

https://github.com/vinoski/otp/commit/a261aae36f617e39c5e53dc8a16fa186e4c5830c

Thanks again to Anthony and Richard for their excellent feedback.

--steve
Post by unknown
I've added a third commit to this branch to address new concerns and
suggestions from Anthony Ramine and Richard Carlsson. I appreciate your
help, guys.
Please refetch.
--steve
Post by unknown
I've added a new commit to this branch that addresses the concerns
Anthony raised. Please refetch.
--steve
Post by unknown
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.
--steve
Post by unknown
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131003/7355c005/attachment.html>
unknown
2013-10-04 03:27:42 UTC
Permalink
Post by unknown
OK, Anthony brought up yet another issue with my previous modified
orddict, which is that if someone passes an improper list instead of an
orddict, the new version acts differently than the old. Since the goal is
backwards compatibility with improved performance, I've added a fourth
commit that deals with these issues.
I'm not sure whether this is the backwards compatibility one wants to
preserve. Usually you want to preserve an API, but in this case you're
preserving code that manipulates the internal representation of an orddict.
Any code that uses the orddict API should not end up with an improper list.
Any code that passes an improper list to orddict should break.
--
Rich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131003/a836a909/attachment.html>
unknown
2013-10-04 07:34:18 UTC
Permalink
That's actually a pretty good point,
Backwards compatibility in Erlang is usually a slow gradual 3 version
process..
But it might be prudent to throw a warning when a non orddict is
introduced.
Post by unknown
Post by unknown
OK, Anthony brought up yet another issue with my previous modified
orddict, which is that if someone passes an improper list instead of an
orddict, the new version acts differently than the old. Since the goal is
backwards compatibility with improved performance, I've added a fourth
commit that deals with these issues.
I'm not sure whether this is the backwards compatibility one wants to
preserve. Usually you want to preserve an API, but in this case you're
preserving code that manipulates the internal representation of an orddict.
Any code that uses the orddict API should not end up with an improper list.
Any code that passes an improper list to orddict should break.
--
Rich
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
/* Sincerely
--------------------------------------------------------------
Pedram Nimreezi - Chief Technology Officer */

// The hardest part of design ? is keeping features out. - Donald Norman
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131004/7a3266c7/attachment.html>
unknown
2013-10-04 10:19:08 UTC
Permalink
Post by unknown
Post by unknown
OK, Anthony brought up yet another issue with my previous modified
orddict, which is that if someone passes an improper list instead of an
orddict, the new version acts differently than the old. Since the goal is
backwards compatibility with improved performance, I've added a fourth
commit that deals with these issues.
I'm not sure whether this is the backwards compatibility one wants to
preserve. Usually you want to preserve an API, but in this case you're
preserving code that manipulates the internal representation of an orddict.
Any code that uses the orddict API should not end up with an improper list.
Any code that passes an improper list to orddict should break.
But in the case of orddict, it is actually documented to have a data
representation being an ordered list of key-value tuples.

Therefore there is a possibility that someone has misused that
knowledge and uses an improper ordered list of key-value tuples,
which may not be violating the documentation...
Post by unknown
--
Rich
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
/ Raimo Niskanen, Erlang/OTP, Ericsson AB
unknown
2013-10-04 11:26:02 UTC
Permalink
On Fri, Oct 4, 2013 at 5:19 AM, Raimo Niskanen <
Post by unknown
Post by unknown
Any code that uses the orddict API should not end up with an improper
list.
Post by unknown
Any code that passes an improper list to orddict should break.
But in the case of orddict, it is actually documented to have a
data representation being an ordered list of key-value tuples.
Therefore there is a possibility that someone has misused that knowledge
and uses an improper ordered list of key-value tuples, which may not be
violating the documentation...
If the documentation exposes the implementation (which is not a good idea,
either) as an ordered list of key-value tuples, then users shouldn't
complain when it doesn't handle improper lists.
--
Rich
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131004/68b2406a/attachment.html>
unknown
2013-10-08 16:24:06 UTC
Permalink
Normally I agree with you but in this case we decided that we should have an orddict which completely specified the internal representation. The whole point of orddict is that the internal structure is documented and can be relied on. So you *can* use it as a proplist if you want, or as a list of key-value options to a function. Re proplists I feel that the orddict API is cleaner.

I personally wouldn't worry about compatibility if someone calls these functions with an improper list. But then again I have rarely worried about backwards compatibility. :-)

Robert

----- Original Message -----
From: "Rich Neswold" <rich.neswold>
To: "erlang patches" <erlang-patches>
Sent: Friday, 4 October, 2013 1:26:02 PM
Subject: Re: [erlang-patches] improved orddict performance
On Fri, Oct 4, 2013 at 5:19 AM, Raimo Niskanen <
Post by unknown
Post by unknown
Any code that uses the orddict API should not end up with an improper list.
Any code that passes an improper list to orddict should break.
But in the case of orddict, it is actually documented to have a data
representation being an ordered list of key-value tuples.
Therefore there is a possibility that someone has misused that knowledge
and
uses an improper ordered list of key-value tuples, which may not be
violating the documentation...
If the documentation exposes the implementation (which is not a good idea,
either) as an ordered list of key-value tuples, then users shouldn't
complain when it doesn't handle improper lists.
--
Rich
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131008/a4f093f9/attachment.html>
unknown
2013-10-04 10:23:41 UTC
Permalink
Hello Steve,

Btw did you measure a call to is_key/2 with a very large orddict not containing the given key? Wouldn't the short-circuiting due to ordering allow for better performances than using keyfind/3?

Also please note there is nearly no way to totally address backwards-compatibility with lists functions, due to the short-circuiting:

orddict:is_key(b, [{a,1},{c,3},not_a_pair]).
OK, Anthony brought up yet another issue with my previous modified orddict, which is that if someone passes an improper list instead of an orddict, the new version acts differently than the old. Since the goal is backwards compatibility with improved performance, I've added a fourth commit that deals with these issues.
{ok,2} = orddict:find(1,[{1,2},{2,3,4}]).
https://github.com/vinoski/otp/commit/a261aae36f617e39c5e53dc8a16fa186e4c5830c
Thanks again to Anthony and Richard for their excellent feedback.
--steve
I've added a third commit to this branch to address new concerns and suggestions from Anthony Ramine and Richard Carlsson. I appreciate your help, guys.
Please refetch.
--steve
I've added a new commit to this branch that addresses the concerns Anthony raised. Please refetch.
--steve
Great feedback, Anthony -- thanks very much. Clearly there are more backward compatibility issues I need to address.
--steve
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes performance measurements, so please read it to know more about the changes.
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
--
Anthony Ramine
unknown
2013-10-04 17:15:19 UTC
Permalink
Post by unknown
Hello Steve,
Btw did you measure a call to is_key/2 with a very large orddict not
containing the given key? Wouldn't the short-circuiting due to ordering
allow for better performances than using keyfind/3?
Yes, this is definitely a scenario for which the R16B02 orddict can be much
faster. Where the difference point lies depends on the terms involved. My
posted tests don't include this scenario but instead measure the time it
takes to find a key halfway through the list.

Perhaps instead of trying to reuse the lists module as it does now, which
has obviously proven to be tricky and can't take advantage of list order,
this patch should instead introduce either an ordered list find function
into the lists module as a BIF, or go all the way and introduce a BIF
ordered lists module and reimplement orddict on top of that. I'd be willing
to give either of the latter a try if the OTP team feels they would be
better approaches.
Post by unknown
Also please note there is nearly no way to totally address
orddict:is_key(b, [{a,1},{c,3},not_a_pair]).
To handle this my latest (amended) commit does a check for pairs only up
until it sees a key equal or greater than the key being requested.

--steve
Post by unknown
Post by unknown
OK, Anthony brought up yet another issue with my previous modified
orddict, which is that if someone passes an improper list instead of an
orddict, the new version acts differently than the old. Since the goal is
backwards compatibility with improved performance, I've added a fourth
commit that deals with these issues.
Post by unknown
Notably, the original orddict, if passed an improper list or a list with
elements that are not 2-tuples, will still work properly if the key being
dealt with occurs before the problematic part of the list. For example,
Post by unknown
{ok,2} = orddict:find(1,[{1,2},{2,3,4}]).
I believe the latest commit addresses this and the previous comments,
https://github.com/vinoski/otp/commit/a261aae36f617e39c5e53dc8a16fa186e4c5830c
Post by unknown
Thanks again to Anthony and Richard for their excellent feedback.
--steve
I've added a third commit to this branch to address new concerns and
suggestions from Anthony Ramine and Richard Carlsson. I appreciate your
help, guys.
Post by unknown
Please refetch.
--steve
I've added a new commit to this branch that addresses the concerns
Anthony raised. Please refetch.
Post by unknown
--steve
Great feedback, Anthony -- thanks very much. Clearly there are more
backward compatibility issues I need to address.
Post by unknown
--steve
On Mon, Sep 30, 2013 at 4:00 AM, Anthony Ramine <n.oxyde>
Hello Steve,
Great patch, I added a few comments about some edge cases.
Regards,
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
Post by unknown
Post by unknown
--steve
_______________________________________________
erlang-patches mailing list
erlang-patches
http://erlang.org/mailman/listinfo/erlang-patches
--
Anthony Ramine
--
Anthony Ramine
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131004/e8c38394/attachment.html>
unknown
2013-12-19 11:23:28 UTC
Permalink
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
We like optimizations and we like simple, elegant
code, so this patch has been a tricky one to review.

In the end, we reached the conclusion that we like
the improvements of orddict:from_list/1.

We reject the optimizations of the other functions
(is_key/2, fetch/2, find/2) for the following reasons:

The optimization depends on the lists:keyfind/3 function
being implemented as a BIF. That sort of
optimization is fine in a specific application if you
have measured and found that it really makes a difference;
I would avoid it in other circumstances.

All the compatibility code further destroys the
elegance. Because of the compatibility code find/2 and
is_key/2 do not get faster if the key is missing in
the dictionary; that is unfortunate since you use find/2
if you cannot be sure that the key is present and fetch/2
if you know that the key is present.

Because the internal representation of orddict is
intentionally documented, an application can use
lists:keyfind/3 directly if needed and it does not have
to worry about about compatibility.

Regarding the from_list/1 function, I am not sure that
it is necessary to force a function_clause error.
The original code could either crash in store/3 or in
lists:foldl/3 when given illegal input. Also, from_list/1
depends on lists:reverse/1 being implemented as a BIF; I
think it would be more elegant to check and reverse at
the same time:

reverse_pairs([{_,_}=H|T], Acc) ->
reverse_pairs(T, [H|Acc]);
reverse_pairs([], Acc) -> Acc.

/Bj?rn
--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131219/9053c464/attachment.html>
unknown
2013-12-19 13:12:08 UTC
Permalink
Post by unknown
The optimization depends on the lists:keyfind/3 function
being implemented as a BIF. That sort of
optimization is fine in a specific application if you
have measured and found that it really makes a difference;
I would avoid it in other circumstances.
I think that any future change that would make lists:keyfind/3
or lists:reverse/1 significantly *slower* than they are today, would
struggle for acceptance by the community. ;-)

In practice, I would say that a great deal of code out there depends
on these functions being highly efficient - whether or not they are
implemented as BIFs.

BR,
Ulf W

Ulf Wiger, Co-founder & Developer Advocate, Feuerlabs Inc.
http://feuerlabs.com



-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131219/4b97fd9b/attachment.html>
unknown
2013-12-19 14:41:54 UTC
Permalink
Hi Bj?rn, thanks for the explanation. Is there any more work I need to do
for this patch?

--steve
Post by unknown
Post by unknown
This patch improves the performance of some orddict functions by
https://github.com/erlang/otp/pull/91
The commit message at the link above is very detailed and includes
performance measurements, so please read it to know more about the changes.
We like optimizations and we like simple, elegant
code, so this patch has been a tricky one to review.
In the end, we reached the conclusion that we like
the improvements of orddict:from_list/1.
We reject the optimizations of the other functions
The optimization depends on the lists:keyfind/3 function
being implemented as a BIF. That sort of
optimization is fine in a specific application if you
have measured and found that it really makes a difference;
I would avoid it in other circumstances.
All the compatibility code further destroys the
elegance. Because of the compatibility code find/2 and
is_key/2 do not get faster if the key is missing in
the dictionary; that is unfortunate since you use find/2
if you cannot be sure that the key is present and fetch/2
if you know that the key is present.
Because the internal representation of orddict is
intentionally documented, an application can use
lists:keyfind/3 directly if needed and it does not have
to worry about about compatibility.
Regarding the from_list/1 function, I am not sure that
it is necessary to force a function_clause error.
The original code could either crash in store/3 or in
lists:foldl/3 when given illegal input. Also, from_list/1
depends on lists:reverse/1 being implemented as a BIF; I
think it would be more elegant to check and reverse at
reverse_pairs([{_,_}=H|T], Acc) ->
reverse_pairs(T, [H|Acc]);
reverse_pairs([], Acc) -> Acc.
/Bj?rn
--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131219/71f00c38/attachment.html>
unknown
2013-12-19 15:11:18 UTC
Permalink
Post by unknown
Hi Bj?rn, thanks for the explanation. Is there any more work I need to do
for this patch?
Yes, we are still in orddict:from_list/1, so it
would be nice if you could produce a single
commit with the update of from_list/1.

/Bj?rn
--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131219/2ab8c2f4/attachment-0001.html>
unknown
2013-12-20 01:05:28 UTC
Permalink
I've modified the branch as requested.

https://github.com/erlang/otp/pull/91

--steve
Post by unknown
Post by unknown
Hi Bj?rn, thanks for the explanation. Is there any more work I need to do
for this patch?
Yes, we are still in orddict:from_list/1, so it
would be nice if you could produce a single
commit with the update of from_list/1.
/Bj?rn
--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20131219/f548c771/attachment.html>
unknown
2014-01-08 14:21:12 UTC
Permalink
Post by unknown
I've modified the branch as requested.
https://github.com/erlang/otp/pull/91
Thanks! I have addet it to our daily builds for testing.

/Bj?rn
--
Bj?rn Gustavsson, Erlang/OTP, Ericsson AB
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://erlang.org/pipermail/erlang-patches/attachments/20140108/e7f0eff5/attachment.html>
Loading...