Facebook open sources ELF OpenGo #1311

kityanhem opened this issue May 2, 2018 · 416 comments

@kityanhem

kityanhem commented May 2, 2018

https://research.fb.com/facebook-open-sources-elf-opengo/

ELF OpenGo
ELF OpenGo is a reimplementation of AlphaGoZero / AlphaZero. It was trained on 2,000 GPUs over a two week period, and has achieved high performance.

"ELF OpenGo has been successful playing against both other open source bots and human Go players. We played and won 200 games against LeelaZero (158603eb, Apr. 25, 2018), the strongest publicly available bot, using its default settings and no pondering. We also achieved a 14 win, 0 loss record against four of the top 30 world-ranked human Go players."

"We thank the LeelaZero team for their high quality work, and our hope is that open-sourcing our bot can similarly benefit community initiatives like LeelaZero"

Github: https://github.com/pytorch/elf

@bochen2027

bochen2027 commented May 2, 2018

Google, it's your turn now!

We want the AGZ weights ASAP...

One benefit of pushing forward open superhuman bots is that it may force the hand of others as well. Now that Facebook has open sourced a bot they claim is stronger than the current LZ, maybe DeepMind will come back for a third round and get one last PR hurrah for Google by open sourcing their AGZ weights to spit in Facebook's face. Open competition is good for Go. This is truly the end of a human era.

pytorch/ELF#1

//////////
//////////

Possible hidden meaning?

[image]

and a day later:

https://www.theguardian.com/technology/2016/jan/28/go-playing-facebook-spoil-googles-ai-deepmind

I don't think "the" development is what people think it is referring to.

https://www.reddit.com/r/cbaduk/comments/81ri8b/so_many_strong_networksais_on_cgos/dv4pzo5/

I kinda predicted this a while ago actually:

[image]

//////////
//////////

So, overall, ELF is at least 400 Elo stronger than the current best LZ network, perhaps up to 800 Elo. But it appears to have some latent issues, such as being even more prone to ladder fallibility and other fragilities at less-than-superhigh playouts, which ironically are things that aren't as prevalent in the current LZ net architecture.

Using a hybrid mix of 50% LZ / 50% ELF self-play games, and continuing to train the LZ net while using ELF as a part-time strength-gaining game generator (at least until LZ catches up to ELF or even surpasses it), may patch the current weak spots in both net architectures and set up a framework that is repeatable in the future: if the AGZ weights get released, or the Facebook team releases a new and even stronger ELF OpenGo network sometime down the line, this can be repeated again. It also gives the LZ project the chance to figure out how to resolve some of the biggest issues of superhuman Go AI: ladders, large group deaths, and things like high handicap and variable komi. Since the current 15b net still has enough capacity left, adding ELF games should allow LZ to catch up enough to give the devs a way to figure out how to lower or remove gating.

For sure, the continuation of "things as usual" for the LZ project is no longer an option, if for no other reason than that the strength gap is too great for volunteer clients to still be willing to go the business-as-usual path; and as gcp stated, it's not an option to train the ELF network directly either (not to mention that switching directly to the ELF network would in essence kill off Leela Zero itself). So it looks like the roadmap going forward is to train LZ hybridized with ELF, and it seems there is a way to make good on the Facebook post about ELF helping projects like Leela Zero after all!

http://archive.is/JS3ET

@MartinDevelopment

MartinDevelopment commented May 2, 2018

Someone should run a test and see how strong that bot is compared to our own. They say they won 200 games, but we don't know out of how many.

@killerducky
Contributor

We played and won 200 games against LeelaZero

They won all the games.

@bochen2027

Quick, someone compile it and test how strong it is at a single playout. Then we'll have a better idea of its true strength.

@kityanhem
Author

[image]
20 blocks?

@WYVYW

WYVYW commented May 2, 2018

"A go bot that has attained professional level in two weeks." The will hopefully release some more info.
How much training games were needed for example...

@bochen2027

bochen2027 commented May 2, 2018

These tests apparently gave FB's bot about twice the number of playouts that LZ got. So this is like what, a 120 Elo advantage for FB0? Anyway, he responded with some more info, and apparently they also used V100s for the games. Curious to know the winrate against LZ with both using a single playout.
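
For anyone who wants the arithmetic behind that guess, a minimal sketch: it assumes a fixed Elo gain per doubling of playouts, and the 120 figure is the guess above, not a measured constant.

```python
import math

# Back-of-the-envelope Elo advantage from a playout ratio, assuming a
# fixed gain per doubling of playouts. ELO_PER_DOUBLING is an assumption.
ELO_PER_DOUBLING = 120

def elo_advantage(playout_ratio):
    return ELO_PER_DOUBLING * math.log2(playout_ratio)

print(elo_advantage(2.0))  # 2x playouts -> ~120 Elo
```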

@diadorak

diadorak commented May 2, 2018

Their network is 20x224. So 80,000 playouts would be equivalent to about 145,000 on LZ's 15x192 network?
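
That equivalence follows if per-playout cost scales roughly with blocks times filters squared, which is an assumption (convolution cost dominating, same board size for both nets). A quick check of the arithmetic:

```python
# Assume per-playout cost scales with blocks * filters^2.
def rel_cost(blocks, filters):
    return blocks * filters ** 2

elf = rel_cost(20, 224)  # ELF OpenGo: 20 blocks x 224 filters
lz = rel_cost(15, 192)   # LeelaZero:  15 blocks x 192 filters

print(round(80_000 * elf / lz))  # ~145,000 LZ-equivalent playouts
```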

@bochen2027

@diadorak Interesting. Yeah, in that case: he mentioned that in the match LZ was getting an average of 43,000 rollouts per move...

@bochen2027

bochen2027 commented May 2, 2018

Feels like they rushed this release out the door (they didn't have the 200 SGFs, couldn't provide training data, and many simple questions couldn't currently be answered), perhaps seeing that LZ had already gotten superhuman and it was basically "now or never"... Fingers crossed Google steps in before they/FB get a chance to refine theirs, and trumps them by dumping all the AGZ code, net, data, etc.

@odeint

odeint commented May 2, 2018

It was trained on 2,000 GPUs over a two week period.

That's really impressive. We've had about 200 clients on average over 5 months. A lot of these are weaker machines, but let's low-ball it and call that equivalent to 100 GPUs over 20 weeks, so about 2,000 GPU-weeks as well.

That means they didn't just throw compute at the problem but are even more efficient with their learning!
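
Spelling out the arithmetic (the 100-GPU equivalence is the low-ball assumption above; as the next comment points out, the ELF side actually comes to 4,000 GPU-weeks):

```python
# odeint's low-ball LZ estimate vs. ELF's published figures.
lz_gpu_equiv, lz_weeks = 100, 20   # ~200 mixed clients over ~5 months
elf_gpus, elf_weeks = 2000, 2      # "2,000 GPUs over a two week period"

print(lz_gpu_equiv * lz_weeks)  # 2000 GPU-weeks for LZ
print(elf_gpus * elf_weeks)     # 4000 GPU-weeks for ELF
```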

@Friday9i

Friday9i commented May 2, 2018

@odeint, they had 4,000 GPU-weeks vs 2,000 for LZ if your estimate is correct, so it's no surprise their network is much stronger: imagine LZ in 4 or 5 months at the current rhythm!
And I guess their GPUs are on average more than 2x more powerful than LZ contributors' (many have old and weak GPUs...); I'd say at least 4 or 5 times more powerful.
All in all, their computing power was much higher than LZ's up to now, IMHO.

@killerducky
Contributor

For playing LZ they tested with equal time for each bot, which is of course fair. But their software is much faster. An LZ implementation that used Nvidia cuDNN might be competitive again. The same goes for training: they could go faster thanks to Nvidia software.

@fishcu

fishcu commented May 2, 2018

Where are you guys pulling this info from? I can't even find a link to the model they trained. Did they really release it already? :D

@MartinDevelopment

MartinDevelopment commented May 2, 2018

@fishcu Here it is. https://github.com/pytorch/ELF/releases/download/pretrained-go-19x19-v0/pretrained-go-19x19-v0.bin

@Ka-zam

Ka-zam commented May 2, 2018

I'd say at least 4 or 5 times more powerful.

Seems reasonable. There's nothing inherently wrong with the LZ approach; on the contrary, it's much more general and robust.

They released it under a BSD-style license though. That's a big one.

@ghost

ghost commented May 2, 2018

For playing LZ they tested equal time for each bot, which is of course fair. But their software is much faster. An LZ implementation that used Nvidia CUDNN might be competitive again. Same for training, they could go faster due to using Nvidia software.

They tested with equal time, but I think they may have misconfigured it: AZ was only searching enough nodes on some moves for the equivalent of about 3 seconds, rather than using all of the available time.

@fell111

fell111 commented May 2, 2018

Did they release the weight file? Can we convert it to LZ's weight-file format and have a try?

@bochen2027

bochen2027 commented May 2, 2018

Has anyone gotten it to compile? What is its strength at 1 playout?

They tested in 14 games against pros on even, but Golaxy won 28 out of 30 games against pros while giving them one stone, and Golaxy was less than half a stone stronger than LZ. Yet they managed to win all 200 games against LZ? So what gives? Something is not adding up.

It's amazing that they claimed they didn't have enough infrastructure to host the training data when asked for it. Seriously? They're Facebook. Plus they said they would have to look for the 200 SGF games they played against LZ once a team member came back from vacation. For real?! I really hope to be proven wrong and that FB's OpenGo is very much stronger than anything we have publicly right now.

Were these the US-president-named bots on CGOS?

@john45678

The four Korean players it played against were Kim Jiseok (Goratings 3583), Shin Jinseo (3577), Park Yeonghun (3482) and Choi Cheolhan (3467). Going 14-0 against them is simply an amazing result.

@yuandong-tian

yuandong-tian commented May 2, 2018

Hi all,

I am the first author of ELF OpenGo. Thanks for your discussions here!

After seeing the comments posted here, we revisited our game logs and found that our assumption that LeelaZero used a fixed 50s-per-move schedule was incorrect. Under our settings LZ uses a variable-time schedule, which typically takes 50 seconds at the beginning of the game and then gradually decays to 17 seconds. So the conclusions are:

  1. LZ does not use a constant time of 50 seconds per move. As the game progresses, LZ generally uses less time per move, putting it at a disadvantage.

  2. The default setting of LZ is not 3200 rollouts per move. In the log we see that LZ uses a significantly larger number of rollouts.

  3. Our bot always uses a constant 80k rollouts per move, which takes about 50 seconds on a V100.

The exact configuration we used to run LeelaZero is:

Commit: 97c2f81

Command line:

leelaz --weights lz_weights -g --noponder --gpu 0

This is what we mean by “default setting without pondering” in our research blogs.

The weights are Apr. 25, 192x15, 158603eb (http://zero.sjeng.org/networks/158603eb61a1e5e9dcd1aee157d813063292ae68fbc8fcd24502ae7daf4d7948.gz). This is the most recent public weight file we could find by the time we did the experiments.

We would like your help in running a fair comparison as we originally intended. What settings could we use to ensure LZ uses a fixed time of 50s per move? We're happy to rerun the experiments. Thanks for all the interest!

@roy7
Collaborator

roy7 commented May 3, 2018

Someone speak up if I'm wrong, but if you're using gomill or a similar tool that can send a GTP command before the game starts, send LZ "time_settings 0 50 1". I believe this means a 50-second byo-yomi period in which to place 1 stone; thus, it should move within 50 seconds every move.
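
A minimal sketch of doing this from a script rather than gomill; the leelaz path and weights filename are placeholders:

```python
import subprocess

# Start leelaz in GTP mode and fix the clock with Canadian overtime:
# 0 main time, 50-second periods, 1 stone per period.
p = subprocess.Popen(
    ["./leelaz", "--weights", "lz_weights", "-g", "--noponder"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

p.stdin.write("time_settings 0 50 1\n")
p.stdin.flush()
print(p.stdout.readline())  # GTP replies with "=" on success
```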

@Marcin1960

Marcin1960 commented May 22, 2018

@marcocalignano "keep it's own mistakes until finally learn it anyway the ELF way?"

I do not think there is only one ELF way. ELF just has stronger hardware, so it is ahead, as AlphaGo is.

At least I hope so, otherwise LZ project could become quite boring. Finding your own way is interesting and exciting. Same applies to humans.

When LZ was at a low kyu level, would you also have proposed abandoning it?

@tapsika

tapsika commented May 22, 2018

If I could play against a much stronger player that also explained my mistakes to me, why shouldn't I want to do that more often?

But note that NN learning is more rigid than that (cf. training from human games). Using ELF games with ELF-based search means ELF evaluations are forced into the LZ network. And there ARE a few characteristic differences between programs, even Zero-style ones, and LZ will undoubtedly get closer to ELF.

I'm not saying this is bad, but it only makes sense if LZ can significantly surpass ELF in the end. Otherwise it would be better to leave the choice to the users, who could simply use ELF as-is.

@dzhurak

dzhurak commented May 22, 2018

If we train the LZ network only on games played by the ELF network, it will be just supervised learning: no reinforcement part at all, and no need to use a game window. So 50-50 is just fair, I guess.

@roy7
Collaborator

roy7 commented May 22, 2018

I don't fully trust the ELF weights in LZ. They're much stronger, yes, but they can't really play handicap well, and the value head is so sharp that I'm suspicious of it as well. :) I think GCP plans to cap ELF games at 50% of the training window, then scale them back or remove them once LZ is as strong as ELF. But the hope is that along the way, some ELF games mixed in should help LZ learn a bit faster.
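
A minimal sketch of the windowing idea as described here (the 500K window and the 50% cap come from this thread; the game lists and shuffling are illustrative):

```python
import random

WINDOW = 500_000
ELF_CAP = WINDOW // 2  # never let ELF data exceed half the window

def build_window(lz_games, elf_games):
    """Mix ELF games into the training window, capped at 50%."""
    elf_part = elf_games[:ELF_CAP]
    lz_part = lz_games[:WINDOW - len(elf_part)]
    window = lz_part + elf_part
    random.shuffle(window)
    return window
```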

@Marcin1960

Marcin1960 commented May 22, 2018

@tapsika "I'm not saying this is bad though, but only makes sense if LZ can significantly surpass ELF in the end. Otherwise it would be better to leave the choice to the users, who could simply use ELF as is."

It would be wonderful if the ELF people joined our client base for a couple of weeks with their 2,000 powerful machines.

I suspect LZ would skyrocket without using more ELF data.

@marcocalignano
Member

@roy7 But can you really say that the ELF games are 50% of the total games, or is it 50% for the faster clients only?

@roy7
Collaborator

roy7 commented May 22, 2018

Well, the window is 500K games as far as I recall, so once we've generated 250K ELF games we'll stop generating new ones.

As far as task allocation goes, I don't see how client speed matters. Some percentage (I don't recall exactly how much) of self-play tasks are given ELF weights, so that same percentage of the total games received from clients will be ELF games.

@ThorAvaTahr

ThorAvaTahr commented May 22, 2018

I support Roy in that I am suspicious of the ELF weights. ELF appears weaker at ladders and worse at handicap games. Also, we have not witnessed the ELF learning process, so it remains somewhat of a black box (though a magic one :) ); there could be some unknown caveats that we want to avoid in LZ. The sharp value head also gives reason to be cautious, though of course it could just mean that it is better at evaluating the board (only a small advantage would be evaluated strongly, assuming both players play strongly).

So in short: ELF, though a lot stronger, appears to be a little less robust. It could of course be that, in general, there is a trade-off between robustness and strength for a given network size.

In addition, if we're going to higher percentages of ELF games, why not reboot from ELF altogether?

@marcocalignano
Member

So, since the window shifts, it could happen that the percentage changes depending on where the ELF games are concentrated.

@gcp
Member

gcp commented May 22, 2018

In addition, if going to higher percentages of ELF games, why not reboot from ELF altogether?

If we have something like 250k ELF games, you might be able to train a super-ELF. The idea of having both in the training window is to get some higher-quality games to accelerate Leela faster, while maybe avoiding some ELF pitfalls by still having Leela games. It's also important to have the Leela games so that, if Leela is doing some kind of over-fitting or learning something exploitable, it can see its own mistakes and correct them in the next training pass, which the pure ELF data cannot offer.

Still, it's very much an ad hoc construction. But it seems OK so far? The rating graph is nicely steep.

once we've generated 250K ELF games we'll stop generating new ones.

I wouldn't allow more than 50% of the window, but it may be useful to generate more if someone wants to have a go at making a super-ELF. That would push the public state of the art.

@john45678

"ELF is getting trashed!"
ELF is pulling away a bit now, but it seems that the student has learned well.

@WYVYW

WYVYW commented Jun 3, 2018

It seems LZ needs many more ladder games to trash ELF. I watched two of the wins, and those wins were ladder games.
To improve LZ's ladder ability, we should let it play against ELF instead of relying on self-play or hardcoding.

@Marcin1960

@WYVYW

LZ learned the ladders before ELF. Are you saying LZ is forgetting them now?

@WYVYW

WYVYW commented Jun 3, 2018

No, I meant that LZ is better in ladder games, and could maybe become even better at ladders.

@kityanhem
Author

kityanhem commented Jun 4, 2018

Well, maybe after ten more new networks appear, the winrate of ELF will be around 75% :))

@l1t1

l1t1 commented Jun 4, 2018

Now: 348 : 56 (86.14%)
Last: 380 : 44 (89.62%)

@jillybob

jillybob commented Jun 4, 2018

Now: 95% Confidence Interval: 0.862 ± 0.0337 (0.828 to 0.896)
Last: 95% Confidence Interval: 0.896 ± 0.029 (0.867 to 0.925)

Therefore the improvement in score is not statistically significant.
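
These intervals follow from the usual normal approximation, p ± 1.96·sqrt(p(1-p)/n), applied to l1t1's raw scores:

```python
import math

def winrate_ci(wins, losses, z=1.96):
    """95% normal-approximation confidence interval for a winrate."""
    n = wins + losses
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, half

print(winrate_ci(348, 56))  # now:  ~(0.861, +/-0.034)
print(winrate_ci(380, 44))  # last: ~(0.896, +/-0.029)
```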

@Nazgand
Contributor

Nazgand commented Jun 10, 2018

As FB ELF games are being used to train Leela Zero, maybe something like pair-go games would be worthwhile. E.g., every move, White has probability p_w of playing a move generated by best_network and probability (1 - p_w) of playing a move generated by ELF; likewise, Black has probability p_b of playing a move generated by best_network and (1 - p_b) of playing a move generated by ELF.
Games could be played with various choices of (p_w, p_b), e.g. with p_w and p_b each chosen from {0, 1/4, 2/4, 3/4, 1}.
I am not aware of a convincing argument for why this should be done; it just seems like a more effective way to meld the two styles of play.
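
A minimal sketch of that selection rule; gen_best and gen_elf are hypothetical stand-ins for the two engines' move generators:

```python
import random

def pair_go_move(color, board, p_w, p_b, gen_best, gen_elf):
    """Each move, play best_network's move with probability p
    (p_w for White, p_b for Black), otherwise play ELF's move."""
    p = p_w if color == "white" else p_b
    return gen_best(board) if random.random() < p else gen_elf(board)
```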

@alreadydone
Contributor

alreadydone commented Jun 10, 2018

Just some more info about ELF that I somehow missed over a month ago.
"The project started in January and ran for about 4 months. Initially we implemented an AZ version, and the result wasn't very good; then we implemented AGZ, and it was just so-so; not until a bug was found and fixed in PyTorch did the AI surge in strength like a wild horse; quite spectacular. Later, for higher speed, we switched to AZ again. If not for this bug, the release could have been two months earlier."

It's not clear to me whether "higher speed" refers to lower playout counts costing less time, or to no-gating leading to faster progress. The AZ approach uses t=1 for the whole game, but @yuandong-tian's #1311 (comment) indicates that they only used t=1 for the first 30 moves (but maybe he's referring to the AGZ approach only?).

(The bug, pytorch/pytorch#5801, was that the gradient for KL divergence/cross entropy was incorrectly computed on GPUs; so if you train a VAE/GAN that uses KL divergence, take care!)
(The article also mentioned they tested half-precision feed-forward net. For our attempt see #1273.)
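
For reference, a minimal sketch of the kind of loss whose GPU gradient the bug corrupted, using the current PyTorch API (F.kl_div takes log-probabilities as input and probabilities as target):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)
log_q = F.log_softmax(logits, dim=1)      # input: log-probabilities
p = F.softmax(torch.randn(4, 10), dim=1)  # target: probabilities

loss = F.kl_div(log_q, p, reduction="batchmean")
loss.backward()  # this gradient was the part miscomputed on GPU
```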

@shubho

shubho commented Jun 10, 2018

  1. We use float16 for all inference (see the sketch below)

  2. The PyTorch bug is fixed in master and, I think, also in the latest release
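
A minimal sketch of item 1 in PyTorch; the layer shape and input planes are illustrative, not ELF's actual architecture, and it assumes a CUDA device:

```python
import torch

# Cast the network and inputs to half precision for GPU inference.
net = torch.nn.Conv2d(18, 224, kernel_size=3, padding=1).cuda().half()
x = torch.randn(1, 18, 19, 19).cuda().half()  # 19x19 board, 18 planes
with torch.no_grad():
    y = net(x)
print(y.dtype)  # torch.float16
```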

@petgo3

petgo3 commented Jun 11, 2018

Just for information: LZ (d0187996) just won 3 consecutive games on KGS against ELF. I know this is not significant, but if we take into account that ELF uses a GTX 1080 Ti while I use a GTX 1060, it's at least remarkable :-)

http://kgs.gosquares.net/index.rhtml.en?id=petgo3&id2=ELF&r=0
(last game not registered yet)

@MartinVanEs

Can anyone explain to me why we require a 55% winrate for promotion?
If a new network has a 52% winrate against the old network, then the new network is most probably stronger than the old network (even if that probability is less than 95% significant). So why keep the old, most probably weaker, network and not promote the new one?

@alreadydone
Contributor

@MartinVanEs See #1524 (comment) for discussions.

@petgo3

petgo3 commented Jun 15, 2018

With the fully tuned net 4634d903 (not promoted, 52.09%), LZ is catching up to ELF on KGS.
Started using the net 2018/06/14 21:13. So far 4:4; it won all of its games as White, and one as Black.
http://kgs.gosquares.net/index.rhtml.en?id=petgo3&id2=ELF&r=0

@l1t1

l1t1 commented Jun 15, 2018

@petgo3 what is the meaning of "DF2 version 1.0" in "ELF [-]: GTP Engine for ELF (white): DF2 version 1.0"?

@Ishinoshita

@yuandong-tian: regarding randomness in self-play games, are you using t=1 for the first 30 moves and then t=0, as in the AGZ paper? Or another scheme?
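
For context, a minimal sketch of the AGZ-paper scheme the question refers to (visit counts come from MCTS; the 30-move cutoff is the paper's choice):

```python
import numpy as np

def select_move(visit_counts, move_number, cutoff=30):
    """t=1: sample moves proportionally to visit counts for the first
    `cutoff` moves; t->0: play the most-visited move afterwards."""
    counts = np.asarray(visit_counts, dtype=float)
    if move_number < cutoff:
        return int(np.random.choice(len(counts), p=counts / counts.sum()))
    return int(counts.argmax())
```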

@l1t1

l1t1 commented Aug 30, 2018

500K self play games
