@spencerduncan
Created November 5, 2016 18:03
Notes from SC2 AI panel - BlizzCon 2016
Basics of deep learning
 ______________
/ observations \
|              |
V              |
agent     environment
|              ^
|   actions    |
\______________/
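The loop in the diagram above can be written out in code. This is a generic, illustrative sketch (the one-step toy environment and random agent are invented, not from the panel): the agent sends actions to the environment and receives only observations and rewards back.

```python
import random

class CoinFlipEnv:
    """Toy one-step environment: reward 1 if the action matches a hidden bit."""
    def __init__(self):
        self.hidden = random.randint(0, 1)

    def reset(self):
        return 0  # a trivial observation

    def step(self, action):
        reward = 1 if action == self.hidden else 0
        done = True  # episode ends after one action
        return 0, reward, done


class RandomAgent:
    """The agent sees only observations and rewards; this one acts randomly."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, observation):
        return random.choice(self.actions)


env = CoinFlipEnv()
agent = RandomAgent(actions=[0, 1])

obs = env.reset()
done = False
total_reward = 0
while not done:
    action = agent.act(obs)               # agent -> actions -> environment
    obs, reward, done = env.step(action)  # environment -> observations, rewards -> agent
    total_reward += reward
print(total_reward)
```

A learning agent would replace `random.choice` with a policy updated from the rewards it collects.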
DQN
Inputs -> (convolutional layer -> activation functions)xN -> (fully connected layer -> activation functions)xN -> outputs
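A toy forward pass with that layer layout, sketched in plain NumPy. All shapes, kernel counts, and weights here are made up for illustration; this is not DeepMind's actual network.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid-mode 2D convolution (single channel), the core of a conv layer."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """A common activation function."""
    return np.maximum(x, 0)

def dqn_forward(screen, conv_kernels, fc_weights):
    """Inputs -> (conv -> activation) x N -> (fully connected -> activation) x N -> outputs."""
    x = screen
    for k in conv_kernels:          # (convolutional layer -> activation) x N
        x = relu(conv2d(x, k))
    x = x.flatten()
    for W in fc_weights[:-1]:       # (fully connected layer -> activation) x N
        x = relu(W @ x)
    return fc_weights[-1] @ x       # outputs: one Q-value per action

rng = np.random.default_rng(0)
screen = rng.random((8, 8))                                # toy "game screen"
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]  # N = 2 conv layers
# After two valid 3x3 convs, an 8x8 input becomes 4x4 = 16 features.
fcs = [rng.standard_normal((10, 16)), rng.standard_normal((4, 10))]
q_values = dqn_forward(screen, kernels, fcs)
print(q_values.shape)  # (4,) -> one Q-value per action
```

The agent would then pick the action with the highest Q-value (or explore randomly some fraction of the time).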
General problem solving through reinforcement learning
Agent receives only observations and rewards
Use of games as a research environment for machine learning
Atari 2600 games
TORCS (driving game)
Labyrinth (3D maze game)
We won at Go, so let's do StarCraft, because it's cool and difficult
Fog of War
Huge Action Space
Economy
Realtime
Long Pay-Off
3 Asymmetric Races
^
|
\_______Stuff that hurts AI brains
Even if you know you want to counter something, like mutalisks, there are many steps which seem trivial and obvious to a human but are not to an AI. I.e. you need to:
1) Mine the right resources
2) Get the right supporting tech
3) Actually make the counter unit
4) Perform all of the mouse clicks required to do the above
5) Plan ahead enough so that you have the counter before the mutalisks are murdering your workers
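The dependency chain in steps 1-5 is essentially a small planning problem. A minimal sketch, assuming a made-up tech tree (the building and tech names here are illustrative placeholders, not real SC2 data):

```python
# Hypothetical tech tree: each goal lists its prerequisites.
TECH_TREE = {
    "counter_unit": ["production_building", "supporting_tech", "gas"],
    "production_building": ["minerals"],
    "supporting_tech": ["production_building", "gas"],
    "gas": [],
    "minerals": [],
}

def build_order(goal, tree, done=None, order=None):
    """Depth-first walk: gather all prerequisites before the goal itself."""
    done = set() if done is None else done
    order = [] if order is None else order
    for dep in tree[goal]:
        if dep not in done:
            build_order(dep, tree, done, order)
    if goal not in done:
        done.add(goal)
        order.append(goal)
    return order

print(build_order("counter_unit", TECH_TREE))
# ['minerals', 'production_building', 'gas', 'supporting_tech', 'counter_unit']
```

A human does this resolution without thinking; an agent learning from scratch has to discover the whole chain from rewards.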
Real World Applications
1) AI for controlling cooling systems for a server farm (saves money by reducing cooling when not needed)
Unsolved AI problems
Memory
Planning
Imagination
*Cutting edge research* - Kevin Calderone
Types of AI-
Scripted AI
Deep learning AI
Most AI, like the current in-game AI, is scripted, but that's boring. The AIs can only be as clever as the designer, and generally cannot adapt.
Deep learning approach
Self learning
Uses raw images
Develops own strategy
<Deep learning visualization views>
<Feature based layers>, image representations of current game state
<Damage, team, health, vision>
Damage shows where units are taking damage
Health shows current unit health
Team shows unit owners
Vision shows where agent has vision
These provide observations for the agent
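A minimal sketch of what such feature layers might look like as arrays. The map size and unit data below are invented; only the four layer names (damage, team, health, vision) come from the panel.

```python
import numpy as np

MAP = (4, 4)  # tiny toy map
units = [
    {"pos": (0, 1), "owner": 1, "health": 45, "taking_damage": True},
    {"pos": (3, 2), "owner": 2, "health": 100, "taking_damage": False},
]
visible = {(0, 0), (0, 1), (1, 1), (3, 2)}  # tiles the agent can see

def feature_layers(units, visible):
    damage = np.zeros(MAP)   # where units are taking damage
    team = np.zeros(MAP)     # unit owners
    health = np.zeros(MAP)   # current unit health
    vision = np.zeros(MAP)   # where the agent has vision
    for u in units:
        r, c = u["pos"]
        damage[r, c] = 1.0 if u["taking_damage"] else 0.0
        team[r, c] = u["owner"]
        health[r, c] = u["health"]
    for r, c in visible:
        vision[r, c] = 1.0
    # Stack into a single (channels, height, width) observation tensor.
    return np.stack([damage, team, health, vision])

obs = feature_layers(units, visible)
print(obs.shape)  # (4, 4, 4): four layers over a 4x4 map
```

A convolutional network can consume this stack directly, the same way it would consume the color channels of an image.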
Replay analysis
Replay coaching tool for AI
Replays being pulled from ladder for teaching AIs
Official API includes
Image based AI API
Scripted AI API
Documentation
Example code
AI vs AI play
Target release Q1 2017
There will be a blogpost on SC2.com shortly after the panel
Looking ahead:
Coaching tools
Balance testing
Q&A Section: (Horribly paraphrased by me)
Artosis: "The AI that played the Atari games, they got to superhuman levels, how good did they really get?"
Ewalds: "We have one guy who plays all the Atari games, he's not pro but pretty good, and he plays all the games a few times to establish the baselines. We released a paper about our performance: 50% of the games we did better, 25% we did almost as good, and 25% we didn't do well. Games like Pong we could do 10x better than humans, games like Ms. Pac-Man we did 10x worse, and Montezuma's Revenge we got nowhere. Now all except one or two are better than humans, and Montezuma's Revenge is almost at human performance"
Artosis: "What makes these games so hard?"
Ewalds: "Pong is pretty simple: given the current and previous frame you can compute the entire state of the game, and that is enough for perfect play. For Pac-Man, there are a lot more states involved, a lot more that is hidden, and there is more planning involved so you don't get killed by a ghost. These are some different aspects you have to build into the agent"
Vinyals: "To add to that, to start learning a game, most agents start by performing random inputs, and will eventually get a reward, which gets them started. In some games, there are so many steps that need to be performed to get a reward. In games like Montezuma's Revenge, the reward is so far away and it's so easy to just die, that the AI learns to do nothing since it can't find a reward. (Montezuma's Revenge includes finding keys and opening doors)"
Artosis: "so it just stands there instead of going to die"
Vinyals: "Yes, because all it learns is that if it moves it dies. It doesn't see the progress due to a lack of intrinsic motivation"
Artosis: "Why go from Atari to SC2?"
Vinyals: "This sort of sparse reward structure is something key in SC2. If you set the reward to 'win the game', how long does it take and how much do you have to learn before you start to win? The partial observability of the map also adds challenge because the agent has to learn how to scout effectively. As a researcher this is a very interesting problem, and being able to have other researchers and hobbyists help makes this a very interesting environment"
Paul: "This is also a very stable game with a lot of replays so there's a lot of structure there to work off of"
Kevin: "It's a very popular game as well. If we were to pick a similarly difficult game with less players, it would be hard to learn if we were actually doing well"
Artosis: "What challenges have you faced?"
Paul: "How do we present the environment to the AI? There are a lot of intricacies. Imagine trying to explain SC2 to your grandmother. And Google keeps improving or finding bugs in their platform that we need to keep updating around"
Kevin: "We also keep finding SC2 bugs that we run into when speeding up the game for AI vs AI play, like a buffer overflow in the game duration counter, that only came up after running the game for hours"
Vinyals: "We were running a distributed SC2 binary on our servers, and the data center guy was hyped when he found out what we're doing"
Kevin: "It's funny, if you search something on Google, SC2 is running next to it"
Artosis: "What are some of the biggest challenges the AI will face?"
Vinyals: "Just learning from scratch is very difficult. When you as a player try to learn a game like StarCraft, you probably go and watch tutorials and other players, you don't just start randomly performing actions in the game, so that is certainly a challenge. So, one of the things we'll probably do is bootstrap from how other people play. This was a key aspect of AlphaGo: using a large dataset of previous games to get the process started. This is a twofold process. Let's say you have a game state, like you're in your base and making buildings. You can predict what the human is most likely to do, and this is called a policy. You can also predict how likely this move is to lead to winning the game, this is called a value. Also zerg best race. It was exciting to see this in Go. I don't understand Go at all, but I saw the probabilities for winning. And the commentators were saying that AlphaGo was making a mistake, but AlphaGo thought these moves were good, and often they were. So these AIs may even help for better commentary"
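The policy/value split Vinyals describes can be illustrated with a toy example. The action names and scores below are invented; in practice both the policy and the value come from a learned network, as in AlphaGo.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution over actions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

actions = ["expand", "build_army", "attack"]
raw_scores = [1.2, 0.4, -0.3]  # assumed logits for the current game state

policy = softmax(raw_scores)   # "what would a human most likely do here?"
value = 0.62                   # "how likely is this state to lead to a win?"

best = actions[policy.index(max(policy))]
print(best)   # expand
print(value)  # 0.62
```

The policy narrows the search to plausible moves; the value judges positions without playing them out, which is what let commentators compare AlphaGo's win probability against their own reading of the board.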
Artosis: "What about multitasking and spreading attention? what about limiting these to be like a human?"
Kevin: "Yes, for AI to be fun it needs to be fair. We're going to limit the APM, but we're also forcing the AI agent to work through the same UI. It won't have a mouse, but it will need to perform selections, make control groups, move the camera around, etc."
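One way such an APM cap could work is a simple action budget that refills with game time. This is purely a hypothetical sketch; the panel gave no implementation details, and the numbers here are invented.

```python
class ApmLimiter:
    """Hypothetical APM cap: the agent may act only while it has budget left."""
    def __init__(self, apm_limit):
        self.per_second = apm_limit / 60.0
        self.budget = 0.0

    def tick(self, dt_seconds):
        # Budget accrues with game time, capped at one second's worth
        # so the agent cannot hoard actions for a superhuman burst.
        self.budget = min(self.budget + self.per_second * dt_seconds,
                          self.per_second)

    def try_act(self):
        if self.budget >= 1.0:
            self.budget -= 1.0
            return True
        return False

limiter = ApmLimiter(apm_limit=180)   # 180 APM = 3 actions per second
performed = 0
for _ in range(100):                  # simulate 100 ticks of 0.1 s = 10 s
    limiter.tick(0.1)
    if limiter.try_act():
        performed += 1
print(performed)                      # roughly 30 actions in 10 seconds
```

The burst cap matters as much as the average: a human can't click 500 times in one second even if their minute-long average is modest.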
Ewalds: "The goal isn't to make a physical robot, and it's also not a goal to make a bot learn to read the number of minerals, so some things will be given to it more directly. We want to learn how to deal with planning, perception, etc. So we'll take basic liberties like giving it the number of minerals, but giving it actual unit IDs, maybe not. We may relax these some in training, just like you might look at a replay to see the full map afterwards, but only for training. And we'll rely on the Blizzard folks to make sure the game space is fair"
Paul: "Yeah, the agent will receive the information a human would, and the interface will be set up so it cannot cheat"
Vinyals: "Like in Ty vs Byun, there was a ship that got missed because of limited attention, and being able to model this in an AI will be a challenge"
Artosis: "To stay on that topic, it seems like the AI will be able to play against itself, is this something that's going to be happening a lot, are there going to be 1000 SC2 games running on the Google servers and how does that work?"
Ewalds: "At first it will only be against the built-in AI, but eventually that will be easy and it will start to play against itself, on the Google compute, probably hundreds or thousands of games at a time. We will probably remove the time dependency to start, but eventually it will need to play in real time, so if it can't compute fast enough it will need to skip moves"
Artosis: "What about learning from replays?"
Vinyals: "Yes we want to do that. One way we learn is imitation learning. We see someone do a build, and we try to do it ourselves and from that we learn how it works. You may not want to learn from the best players at first, but you would want to learn the basics like how to mine etc. So people at any level can contribute to this just by laddering and providing replays"
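The imitation-learning idea can be sketched in miniature: treat each (state, human action) pair from a replay as a supervised example and learn to predict the action. The replay data and the trivial majority-vote "model" below are invented for illustration; real systems learn from raw observations with a neural network.

```python
from collections import Counter, defaultdict

replay = [  # (observed state, action the human took)
    ("low_minerals", "mine"),
    ("low_minerals", "mine"),
    ("enough_minerals", "build_worker"),
    ("low_minerals", "mine"),
    ("enough_minerals", "build_worker"),
]

# "Training": for each state, remember the action humans take most often.
by_state = defaultdict(Counter)
for state, action in replay:
    by_state[state][action] += 1
policy = {s: c.most_common(1)[0][0] for s, c in by_state.items()}

print(policy["low_minerals"])      # mine
print(policy["enough_minerals"])   # build_worker
```

This is why replays from players of any level are useful: even average-level games teach the basics like mining and building, before the agent refines its play by self-play.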
Artosis: "What do you plan to see in the months after the API release?"
Paul: "In the months before, we want to see people on the forums provide feedback on what they want to see. The people who use Brood War can be a huge help and hopefully they'll switch over. This is a way for people to actually make an SC2 AI without being banned"
Kevin: "And we want to make it accessible, like we want high schoolers to be able to hop in and play with the AI. And there are other useful tools like replay analysis"
Vinyals: "We hope to see whatever we're not predicting ^^"
Artosis: "What do you want to get out of building this AI?"
Vinyals: "Putting my research hat on, I would like for this to be a benchmark where people can test algorithms for these problems that are not solved in AI. It's not always the algorithms that make advancements; good benchmarks let people demonstrate good results, so this will be a good additional environment to help advance the field"
Artosis: "Paul, why is this big for Videogames in general"
Paul: "I think it's big for the whole world. Machine learning is helping in so many fields. *names other technologies that did cool things: CD-ROMs, the internet, 3D GPUs* I think this is the next big thing, games that respond to people, that have more depth in the AI, I think this could be a *game changer*"
END