Facebook open sources ELF OpenGo #1311

kityanhem opened this issue May 2, 2018 · 416 comments

@kityanhem

kityanhem commented May 2, 2018

https://research.fb.com/facebook-open-sources-elf-opengo/

ELF OpenGo
ELF OpenGo is a reimplementation of AlphaGoZero / AlphaZero. It was trained on 2,000 GPUs over a two week period, and has achieved high performance.

"ELF OpenGo has been successful playing against both other open source bots and human Go players. We played and won 200 games against LeelaZero (158603eb, Apr. 25, 2018), the strongest publicly available bot, using its default settings and no pondering. We also achieved a 14 win, 0 loss record against four of the top 30 world-ranked human Go players."

"We thank the LeelaZero team for their high quality work, and our hope is that open-sourcing our bot can similarly benefit community initiatives like LeelaZero"

Github: https://github.com/pytorch/elf

@bochen2027

bochen2027 commented May 2, 2018

Google, it's your turn now!

We want the AGZ weights ASAP...

One benefit of pushing forward open superhuman bots is that it may force the hand of others as well. Now that Facebook has open sourced a bot they claim is stronger than the current LZ, maybe DeepMind will come back for a third round and get one last PR hurrah for Google by open sourcing their AGZ weights to spit in Facebook's face. Open competition is good for Go. This is truly the end of a human era.

pytorch/ELF#1

//////////
//////////

Possible hidden meaning?

[image]

and a day later:

https://www.theguardian.com/technology/2016/jan/28/go-playing-facebook-spoil-googles-ai-deepmind

I don't think "the" development is what people think it is referring to.

https://www.reddit.com/r/cbaduk/comments/81ri8b/so_many_strong_networksais_on_cgos/dv4pzo5/

I kinda predicted this a while ago actually:

[image]

//////////
//////////

So, overall, ELF is at least 400 Elo stronger than the current best LZ network, perhaps up to 800 Elo. But it appears to have some latent issues, such as being even more prone to ladder fallibility and other fragilities at less-than-superhigh playouts, which ironically are things that aren't as prevalent in the current LZ net architecture.

Using a hybrid mix of 50% LZ / 50% ELF self-play games, and continuing to train the LZ net while using ELF as a part-time strength-gaining game generator (at least until LZ catches up to ELF or even surpasses it), may patch the current weak spots in both net architectures and set up a framework that is repeatable in the future: if the AGZ weights get released, or the Facebook team releases a new and even stronger ELF OpenGo network sometime down the line, this can be repeated again. It also gives the LZ project the chance to figure out how to resolve some of the biggest issues of superhuman Go AI: ladders, large group deaths, and things like high handicap and variable komi. Since the current 15b net still has enough capacity left, adding ELF games should allow LZ to catch up enough to give the devs a way to figure out how to lower or remove gating.

For sure, the continuation of "things as usual" for the LZ project is no longer an option, if for no other reason than that the strength gap is too great for volunteer clients to still be willing to go the business-as-usual path; and as gcp stated, it's not an option to train the ELF network directly either (not to mention that switching directly to the ELF network would in essence kill off Leela Zero itself). So it looks like the roadmap going forward is to train LZ hybridized with ELF, and it seems there is a way to make good on the Facebook post about ELF helping projects like Leela Zero after all!

http://archive.is/JS3ET

@MartinDevelopment

MartinDevelopment commented May 2, 2018

Someone should run a test and see how strong that bot is compared to our own. They say they won 200 games, but we don't know out of how many.

@killerducky
Contributor

We played and won 200 games against LeelaZero

They won all the games.

@bochen2027

Quick, someone compile it and test how strong it is at a single playout. Then we'll have a better idea of its true strength.

@kityanhem
Author

[image]
20 blocks?

@WYVYW

WYVYW commented May 2, 2018

"A go bot that has attained professional level in two weeks." The will hopefully release some more info.
How much training games were needed for example...

@bochen2027

bochen2027 commented May 2, 2018

These tests apparently gave FB's bot about twice the number of playouts that LZ got. So this is like what, a 120 Elo advantage for FB0? Anyway, he responded with some more info, and apparently they also used V100s for the games. Curious to know the winrate against LZ with both using a single playout.
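
For anyone who wants the arithmetic behind that guess, a minimal sketch: it assumes a fixed Elo gain per doubling of playouts, and the 120 figure is the guess above, not a measured constant.

```python
import math

# Back-of-the-envelope Elo advantage from a playout ratio, assuming a
# fixed gain per doubling of playouts. ELO_PER_DOUBLING is an assumption.
ELO_PER_DOUBLING = 120

def elo_advantage(playout_ratio):
    return ELO_PER_DOUBLING * math.log2(playout_ratio)

print(elo_advantage(2.0))  # 2x playouts -> ~120 Elo
```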

@diadorak

diadorak commented May 2, 2018

Their network is 20x224. So 80,000 playouts would be equivalent to about 145,000 on LZ's 15x192 network?
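
That equivalence follows if per-playout cost scales roughly with blocks times filters squared, which is an assumption (convolution cost dominating, same board size for both nets). A quick check of the arithmetic:

```python
# Assume per-playout cost scales with blocks * filters^2.
def rel_cost(blocks, filters):
    return blocks * filters ** 2

elf = rel_cost(20, 224)  # ELF OpenGo: 20 blocks x 224 filters
lz = rel_cost(15, 192)   # LeelaZero:  15 blocks x 192 filters

print(round(80_000 * elf / lz))  # ~145,000 LZ-equivalent playouts
```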

@bochen2027

@diadorak Interesting. Yeah, in that case: he mentioned that in the match LZ was getting an average of 43,000 rollouts per move...

@bochen2027

bochen2027 commented May 2, 2018

Feels like they rushed this release out the door (they didn't have the 200 SGFs, couldn't provide training data, and many simple questions couldn't currently be answered), perhaps seeing that LZ had already gotten superhuman and it was basically "now or never"... Fingers crossed Google steps in before they/FB get a chance to refine theirs, and trumps them by dumping all the AGZ code, net, data, etc.

@odeint

odeint commented May 2, 2018

It was trained on 2,000 GPUs over a two week period.

That's really impressive. We've had about 200 clients on average over 5 months. A lot of these are weaker machines, but let's low-ball it and call that equivalent to 100 GPUs over 20 weeks, so about 2,000 GPU-weeks as well.

That means they didn't just throw compute at the problem but are even more efficient with their learning!
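
Spelling out the arithmetic (the 100-GPU equivalence is the low-ball assumption above; as the next comment points out, the ELF side actually comes to 4,000 GPU-weeks):

```python
# odeint's low-ball LZ estimate vs. ELF's published figures.
lz_gpu_equiv, lz_weeks = 100, 20   # ~200 mixed clients over ~5 months
elf_gpus, elf_weeks = 2000, 2      # "2,000 GPUs over a two week period"

print(lz_gpu_equiv * lz_weeks)  # 2000 GPU-weeks for LZ
print(elf_gpus * elf_weeks)     # 4000 GPU-weeks for ELF
```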

@Friday9i

Friday9i commented May 2, 2018

@odeint, they had 4,000 GPU-weeks vs 2,000 for LZ if your estimate is correct, so it's no surprise their network is much stronger: imagine LZ in 4 or 5 months at the current rhythm!
And I guess their GPUs are on average more than 2x more powerful than LZ contributors' (many have old and weak GPUs...); I'd say at least 4 or 5 times more powerful.
All in all, their computing power was much higher than LZ's up to now, IMHO.

@killerducky
Contributor

For playing LZ they tested with equal time for each bot, which is of course fair. But their software is much faster. An LZ implementation that used Nvidia cuDNN might be competitive again. The same goes for training: they could go faster thanks to Nvidia software.

@fishcu

fishcu commented May 2, 2018

Where are you guys pulling this info from? I can't even find a link to the model they trained. Did they really release it already? :D

@MartinDevelopment

MartinDevelopment commented May 2, 2018

@fishcu Here it is. https://github.com/pytorch/ELF/releases/download/pretrained-go-19x19-v0/pretrained-go-19x19-v0.bin

@Ka-zam

Ka-zam commented May 2, 2018

I'd say at least 4 or 5 times more powerful.

Seems reasonable. There's nothing inherently wrong with the LZ approach; on the contrary, it's much more general and robust.

They released it under a BSD-style license though. That's a big one.

@ghost

ghost commented May 2, 2018

For playing LZ they tested equal time for each bot, which is of course fair. But their software is much faster. An LZ implementation that used Nvidia CUDNN might be competitive again. Same for training, they could go faster due to using Nvidia software.

They tested with equal time, but I think they may have misconfigured it: AZ was only searching enough nodes on some moves for the equivalent of about 3 seconds, rather than using all of the available time.

@fell111

fell111 commented May 2, 2018

Did they release the weight file? Can we convert it to LZ's weight-file format and have a try?

@bochen2027

bochen2027 commented May 2, 2018

Has anyone gotten it to compile? What is its strength at 1 playout?

They tested in 14 games against pros on even, but Golaxy won 28 out of 30 games against pros while giving them one stone, and Golaxy was less than half a stone stronger than LZ. Yet they managed to win all 200 games against LZ? So what gives? Something is not adding up.

It's amazing that they claimed they didn't have enough infrastructure to host the training data when asked for it. Seriously? They're Facebook. Plus they said they would have to look for the 200 SGF games they played against LZ once a team member came back from vacation. For real?! I really hope to be proven wrong and that FB's OpenGo is very much stronger than anything we have publicly right now.

Were these the US-president-named bots on CGOS?

@john45678

The four Korean players it played against were Kim Jiseok (Goratings 3583), Shin Jinseo (3577), Park Yeonghun (3482) and Choi Cheolhan (3467). Going 14-0 against them is simply an amazing result.

@yuandong-tian

yuandong-tian commented May 2, 2018

Hi all,

I am the first author of ELF OpenGo. Thanks for your discussions here!

After seeing the comments posted here, we revisited our game logs and found that our assumption that LeelaZero used a fixed 50s-per-move schedule was incorrect. Under our settings LZ uses a variable-time schedule, which typically takes 50 seconds at the beginning of the game and then gradually decays to 17 seconds. So the conclusions are:

  1. LZ does not use a constant time of 50 seconds per move. As the game progresses, LZ generally uses less time per move, putting it at a disadvantage.

  2. The default setting of LZ is not 3200 rollouts per move. In the log we see that LZ uses a significantly larger number of rollouts.

  3. Our bot always uses a constant 80k rollouts per move, which takes about 50 seconds on a V100.

The exact configuration we used to run LeelaZero is:

Commit: 97c2f81

Command line:

leelaz --weights lz_weights -g --noponder --gpu 0

This is what we mean by “default setting without pondering” in our research blogs.

The weights are Apr. 25, 192x15, 158603eb (http://zero.sjeng.org/networks/158603eb61a1e5e9dcd1aee157d813063292ae68fbc8fcd24502ae7daf4d7948.gz). This is the most recent public weight file we could find by the time we did the experiments.

We would like your help in running a fair comparison as we originally intended. What settings could we use to ensure LZ uses a fixed time of 50s per move? We're happy to rerun the experiments. Thanks for all the interest!

@roy7
Collaborator

roy7 commented May 3, 2018

Someone speak up if I'm wrong, but if you're using gomill or a similar tool that can send a GTP command before the game starts, send LZ "time_settings 0 50 1". I believe this means a 50-second byo-yomi period in which to place 1 stone; thus, it should move within 50 seconds every move.
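
A minimal sketch of doing this from a script rather than gomill; the leelaz path and weights filename are placeholders:

```python
import subprocess

# Start leelaz in GTP mode and fix the clock with Canadian overtime:
# 0 main time, 50-second periods, 1 stone per period.
p = subprocess.Popen(
    ["./leelaz", "--weights", "lz_weights", "-g", "--noponder"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True)

p.stdin.write("time_settings 0 50 1\n")
p.stdin.flush()
print(p.stdout.readline())  # GTP replies with "=" on success
```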

@Marcin1960

Marcin1960 commented May 22, 2018

@marcocalignano "keep it's own mistakes until finally learn it anyway the ELF way?"

I do not think there is only one ELF way. ELF just has stronger hardware, so it is ahead, as AlphaGo is.

At least I hope so, otherwise LZ project could become quite boring. Finding your own way is interesting and exciting. Same applies to humans.

When LZ was at a low kyu level, would you also have proposed abandoning it?

@tapsika

tapsika commented May 22, 2018

If I could play against a much stronger player that also explained my mistakes to me, why shouldn't I want to do that more often?

But note that NN learning is more rigid than that (cf. training from human games). Using ELF games with ELF-based search means ELF evaluations are forced into the LZ network. And there ARE a few characteristic differences between programs, even Zero-style ones, and LZ will undoubtedly get closer to ELF.

I'm not saying this is bad, but it only makes sense if LZ can significantly surpass ELF in the end. Otherwise it would be better to leave the choice to the users, who could simply use ELF as-is.

@dzhurak

dzhurak commented May 22, 2018

If we train the LZ network only on games played by the ELF network, it will be just supervised learning: no reinforcement part at all, and no need to use a game window. So 50-50 is just fair, I guess.

@roy7
Collaborator

roy7 commented May 22, 2018

I don't fully trust the ELF weights in LZ. They're much stronger, yes, but they can't really play handicap well, and the value head is so sharp that I'm suspicious of it as well. :) I think GCP plans to cap ELF games at 50% of the training window, then scale them back or remove them once LZ is as strong as ELF. But the hope is that along the way, some ELF games mixed in should help LZ learn a bit faster.
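
A minimal sketch of the windowing idea as described here (the 500K window and the 50% cap come from this thread; the game lists and shuffling are illustrative):

```python
import random

WINDOW = 500_000
ELF_CAP = WINDOW // 2  # never let ELF data exceed half the window

def build_window(lz_games, elf_games):
    """Mix ELF games into the training window, capped at 50%."""
    elf_part = elf_games[:ELF_CAP]
    lz_part = lz_games[:WINDOW - len(elf_part)]
    window = lz_part + elf_part
    random.shuffle(window)
    return window
```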

@Marcin1960

Marcin1960 commented May 22, 2018

@tapsika "I'm not saying this is bad though, but only makes sense if LZ can significantly surpass ELF in the end. Otherwise it would be better to leave the choice to the users, who could simply use ELF as is."

It would be wonderful if the ELF people joined our client base for a couple of weeks with their 2,000 powerful machines.

I suspect LZ would skyrocket without using more ELF data.

@marcocalignano
Member

@roy7 But can you really say that the ELF games are 50% of the total games, or is it 50% for the faster clients only?

@roy7
Collaborator

roy7 commented May 22, 2018

Well, the window is 500K games as far as I recall, so once we've generated 250K ELF games we'll stop generating new ones.

As far as task allocation goes, I don't see how client speed matters. Some percentage (I don't recall exactly how much) of self-play tasks are given ELF weights, so that same percentage of the total games received from clients will be ELF games.

@ThorAvaTahr

ThorAvaTahr commented May 22, 2018

I support Roy in that I am suspicious of the ELF weights. ELF appears weaker at ladders and worse at handicap games. Also, we have not witnessed the ELF learning process, so it remains somewhat of a black box (though a magic one :) ); there could be some unknown caveats that we want to avoid in LZ. The sharp value head also gives reason to be cautious, though of course it could just mean that it is better at evaluating the board (only a small advantage would be evaluated strongly, assuming both players play strongly).

So in short: ELF, though a lot stronger, appears to be a little less robust. It could of course be that, in general, there is a trade-off between robustness and strength for a given network size.

In addition, if we're going to higher percentages of ELF games, why not reboot from ELF altogether?

@marcocalignano
Member

So, since the window shifts, it could happen that the percentage changes depending on where the ELF games are concentrated.

@gcp
Member

gcp commented May 22, 2018

In addition, if going to higher percentages of ELF games, why not reboot from ELF altogether?

If we have something like 250k ELF games, you might be able to train a super-ELF. The idea of having both in the training window is to get some higher-quality games to accelerate Leela faster, while maybe avoiding some ELF pitfalls by still having Leela games. It's also important to have the Leela games so that, if Leela is doing some kind of over-fitting or learning something exploitable, it can see its own mistakes and correct them in the next training pass, which the pure ELF data cannot offer.

Still, it's very much an ad hoc construction. But it seems OK so far? The rating graph is nicely steep.

once we've generated 250K ELF games we'll stop generating new ones.

I wouldn't allow more than 50% of the window, but it may be useful to generate more if someone wants to have a go at making a super-ELF. That would push the public state of the art.

@john45678

"ELF is getting trashed!"
ELF is pulling away a bit now, but it seems that the student has learned well.

@WYVYW

WYVYW commented Jun 3, 2018

It seems LZ needs many more ladder games to trash ELF. I watched two of the wins, and those wins were ladder games.
To improve LZ's ladder ability, we should let it play against ELF instead of relying on self-play or hardcoding.

@Marcin1960

@WYVYW

LZ learned the ladders before ELF. Are you saying LZ is forgetting them now?

@WYVYW

WYVYW commented Jun 3, 2018

No, I meant that LZ is better in ladder games, and could maybe become even better at ladders.

@kityanhem
Author

kityanhem commented Jun 4, 2018

Well, maybe after ten more new networks appear, the winrate of ELF will be around 75% :))

@l1t1

l1t1 commented Jun 4, 2018

Now: 348 : 56 (86.14%)
Last: 380 : 44 (89.62%)

@jillybob

jillybob commented Jun 4, 2018

Now: 95% Confidence Interval: 0.862 ± 0.0337 (0.828 to 0.896)
Last: 95% Confidence Interval: 0.896 ± 0.029 (0.867 to 0.925)

Therefore the improvement in score is not statistically significant.
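
These intervals follow from the usual normal approximation, p ± 1.96·sqrt(p(1-p)/n), applied to l1t1's raw scores:

```python
import math

def winrate_ci(wins, losses, z=1.96):
    """95% normal-approximation confidence interval for a winrate."""
    n = wins + losses
    p = wins / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, half

print(winrate_ci(348, 56))  # now:  ~(0.861, +/-0.034)
print(winrate_ci(380, 44))  # last: ~(0.896, +/-0.029)
```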

@Nazgand
Contributor

Nazgand commented Jun 10, 2018

As FB ELF games are being used to train Leela Zero, maybe something like pair-go games would be worthwhile. E.g., every move, White has probability p_w of playing a move generated by best_network and probability (1 - p_w) of playing a move generated by ELF; likewise, Black has probability p_b of playing a move generated by best_network and (1 - p_b) of playing a move generated by ELF.
Games could be played with various choices of (p_w, p_b), e.g. with p_w and p_b each chosen from {0, 1/4, 2/4, 3/4, 1}.
I am not aware of a convincing argument for why this should be done; it just seems like a more effective way to meld the two styles of play.
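
A minimal sketch of that selection rule; gen_best and gen_elf are hypothetical stand-ins for the two engines' move generators:

```python
import random

def pair_go_move(color, board, p_w, p_b, gen_best, gen_elf):
    """Each move, play best_network's move with probability p
    (p_w for White, p_b for Black), otherwise play ELF's move."""
    p = p_w if color == "white" else p_b
    return gen_best(board) if random.random() < p else gen_elf(board)
```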

@alreadydone
Contributor

alreadydone commented Jun 10, 2018

Just some more info about ELF that I somehow missed over a month ago.
"The project started in January and ran for about 4 months. Initially we implemented an AZ version, and the result wasn't very good; then we implemented AGZ, and it was just so-so; not until a bug was found and fixed in PyTorch did the AI surge in strength like a wild horse; quite spectacular. Later, for higher speed, we switched to AZ again. If not for this bug, the release could have been two months earlier."

It's not clear to me whether "higher speed" refers to lower playout counts costing less time, or to no-gating leading to faster progress. The AZ approach uses t=1 for the whole game, but @yuandong-tian's #1311 (comment) indicates that they only used t=1 for the first 30 moves (but maybe he's referring to the AGZ approach only?).

(The bug, pytorch/pytorch#5801, was that the gradient for KL divergence/cross entropy was incorrectly computed on GPUs; so if you train a VAE/GAN that uses KL divergence, take care!)
(The article also mentioned they tested half-precision feed-forward net. For our attempt see #1273.)
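
For reference, a minimal sketch of the kind of loss whose GPU gradient the bug corrupted, using the current PyTorch API (F.kl_div takes log-probabilities as input and probabilities as target):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10, requires_grad=True)
log_q = F.log_softmax(logits, dim=1)      # input: log-probabilities
p = F.softmax(torch.randn(4, 10), dim=1)  # target: probabilities

loss = F.kl_div(log_q, p, reduction="batchmean")
loss.backward()  # this gradient was the part miscomputed on GPU
```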

@shubho

shubho commented Jun 10, 2018

  1. We use float16 for all inference (see the sketch below)

  2. The PyTorch bug is fixed in master and, I think, also in the latest release
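
A minimal sketch of item 1 in PyTorch; the layer shape and input planes are illustrative, not ELF's actual architecture, and it assumes a CUDA device:

```python
import torch

# Cast the network and inputs to half precision for GPU inference.
net = torch.nn.Conv2d(18, 224, kernel_size=3, padding=1).cuda().half()
x = torch.randn(1, 18, 19, 19).cuda().half()  # 19x19 board, 18 planes
with torch.no_grad():
    y = net(x)
print(y.dtype)  # torch.float16
```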

@petgo3

petgo3 commented Jun 11, 2018

Just for information: LZ (d0187996) just won 3 consecutive games on KGS against ELF. I know this is not significant, but if we take into account that ELF uses a GTX 1080 Ti while I use a GTX 1060, it's at least remarkable :-)

http://kgs.gosquares.net/index.rhtml.en?id=petgo3&id2=ELF&r=0
(last game not registered yet)

@MartinVanEs

Can anyone explain to me why we require a 55% winrate for promotion?
If a new network has a 52% winrate against the old network, then the new network is most probably stronger than the old network (even if that probability is less than 95% significant). So why keep the old, most probably weaker, network and not promote the new one?

@alreadydone
Contributor

@MartinVanEs See #1524 (comment) for discussions.

@petgo3

petgo3 commented Jun 15, 2018

With the fully tuned net 4634d903 (not promoted, 52.09%), LZ is catching up to ELF on KGS.
Started using the net 2018/06/14 21:13. So far 4:4; it won all of its games as White, and one as Black.
http://kgs.gosquares.net/index.rhtml.en?id=petgo3&id2=ELF&r=0

@l1t1

l1t1 commented Jun 15, 2018

@petgo3 what is the meaning of "DF2 version 1.0" in "ELF [-]: GTP Engine for ELF (white): DF2 version 1.0"?

@Ishinoshita

@yuandong-tian: regarding randomness in self-play games, are you using t=1 for the first 30 moves and then t=0, as in the AGZ paper? Or another scheme?
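
For context, a minimal sketch of the AGZ-paper scheme the question refers to (visit counts come from MCTS; the 30-move cutoff is the paper's choice):

```python
import numpy as np

def select_move(visit_counts, move_number, cutoff=30):
    """t=1: sample moves proportionally to visit counts for the first
    `cutoff` moves; t->0: play the most-visited move afterwards."""
    counts = np.asarray(visit_counts, dtype=float)
    if move_number < cutoff:
        return int(np.random.choice(len(counts), p=counts / counts.sum()))
    return int(counts.argmax())
```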

@l1t1

l1t1 commented Aug 30, 2018

500K self play games
