@spencerduncan
Created November 5, 2016 18:03
Notes from SC2 AI panel - BlizzCon 2016
Basics of deep learning
 ______________
/ observations \
|              |
V              |
agent     environment
|              ^
|   actions    |
\______________/
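The loop in the diagram above can be written out in code. This is a generic, illustrative sketch (the one-step toy environment and random agent are invented, not from the panel): the agent sends actions to the environment and receives only observations and rewards back.

```python
import random

class CoinFlipEnv:
    """Toy one-step environment: reward 1 if the action matches a hidden bit."""
    def __init__(self):
        self.hidden = random.randint(0, 1)

    def reset(self):
        return 0  # a trivial observation

    def step(self, action):
        reward = 1 if action == self.hidden else 0
        done = True  # episode ends after one action
        return 0, reward, done


class RandomAgent:
    """The agent sees only observations and rewards; this one acts randomly."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, observation):
        return random.choice(self.actions)


env = CoinFlipEnv()
agent = RandomAgent(actions=[0, 1])

obs = env.reset()
done = False
total_reward = 0
while not done:
    action = agent.act(obs)               # agent -> actions -> environment
    obs, reward, done = env.step(action)  # environment -> observations, rewards -> agent
    total_reward += reward
print(total_reward)
```

A learning agent would replace `random.choice` with a policy updated from the rewards it collects.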
DQN
Inputs -> (convolutional layer -> activation functions)xN -> (fully connected layer -> activation functions)xN -> outputs
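A toy forward pass with that layer layout, sketched in plain NumPy. All shapes, kernel counts, and weights here are made up for illustration; this is not DeepMind's actual network.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid-mode 2D convolution (single channel), the core of a conv layer."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """A common activation function."""
    return np.maximum(x, 0)

def dqn_forward(screen, conv_kernels, fc_weights):
    """Inputs -> (conv -> activation) x N -> (fully connected -> activation) x N -> outputs."""
    x = screen
    for k in conv_kernels:          # (convolutional layer -> activation) x N
        x = relu(conv2d(x, k))
    x = x.flatten()
    for W in fc_weights[:-1]:       # (fully connected layer -> activation) x N
        x = relu(W @ x)
    return fc_weights[-1] @ x       # outputs: one Q-value per action

rng = np.random.default_rng(0)
screen = rng.random((8, 8))                                # toy "game screen"
kernels = [rng.standard_normal((3, 3)) for _ in range(2)]  # N = 2 conv layers
# After two valid 3x3 convs, an 8x8 input becomes 4x4 = 16 features.
fcs = [rng.standard_normal((10, 16)), rng.standard_normal((4, 10))]
q_values = dqn_forward(screen, kernels, fcs)
print(q_values.shape)  # (4,) -> one Q-value per action
```

The agent would then pick the action with the highest Q-value (or explore randomly some fraction of the time).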
General problem solving through reinforcement learning
Agent receives only observations and rewards
Use of games as a research environment for machine learning
Atari 2600 games
TORCS (driving game)
Labyrinth (3D maze game)
We won at Go, so let's do StarCraft, because it's cool and difficult
Fog of War
Huge Action Space
Economy
Realtime
Long Pay-Off
3 Asymmetric Races
^
|
\_______Stuff that hurts AI brains
Even if you know you want to counter something, like mutalisks, there are many steps which seem trivial and obvious to a human but are not to an AI. I.e. you need to:
1) Mine the right resources
2) Get the right supporting tech
3) Actually make the counter unit
4) Perform all of the mouse clicks required to do the above
5) Plan ahead enough so that you have the counter before the mutalisks are murdering your workers
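The dependency chain in steps 1-5 is essentially a small planning problem. A minimal sketch, assuming a made-up tech tree (the building and tech names here are illustrative placeholders, not real SC2 data):

```python
# Hypothetical tech tree: each goal lists its prerequisites.
TECH_TREE = {
    "counter_unit": ["production_building", "supporting_tech", "gas"],
    "production_building": ["minerals"],
    "supporting_tech": ["production_building", "gas"],
    "gas": [],
    "minerals": [],
}

def build_order(goal, tree, done=None, order=None):
    """Depth-first walk: gather all prerequisites before the goal itself."""
    done = set() if done is None else done
    order = [] if order is None else order
    for dep in tree[goal]:
        if dep not in done:
            build_order(dep, tree, done, order)
    if goal not in done:
        done.add(goal)
        order.append(goal)
    return order

print(build_order("counter_unit", TECH_TREE))
# ['minerals', 'production_building', 'gas', 'supporting_tech', 'counter_unit']
```

A human does this resolution without thinking; an agent learning from scratch has to discover the whole chain from rewards.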
Real World Applications
1) AI for controlling cooling systems for a server farm (saves money by reducing cooling when not needed)
Unsolved AI problems
Memory
Planning
Imagination
*Cutting edge research* - Kevin Calderone
Types of AI-
Scripted AI
Deep learning AI
Most AI, like the current in-game AI, is scripted, but that's boring. The AIs can only be as clever as the designer, and generally cannot adapt.
Deep learning approach
Self learning
Uses raw images
Develops own strategy
<Deep learning visualization views>
<Feature based layers>, image representations of current game state
<Damage, team, health, vision>
Damage shows where units are taking damage
Health shows current unit health
Team shows unit owners
Vision shows where agent has vision
These provide observations for the agent
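A minimal sketch of what such feature layers might look like as arrays. The map size and unit data below are invented; only the four layer names (damage, team, health, vision) come from the panel.

```python
import numpy as np

MAP = (4, 4)  # tiny toy map
units = [
    {"pos": (0, 1), "owner": 1, "health": 45, "taking_damage": True},
    {"pos": (3, 2), "owner": 2, "health": 100, "taking_damage": False},
]
visible = {(0, 0), (0, 1), (1, 1), (3, 2)}  # tiles the agent can see

def feature_layers(units, visible):
    damage = np.zeros(MAP)   # where units are taking damage
    team = np.zeros(MAP)     # unit owners
    health = np.zeros(MAP)   # current unit health
    vision = np.zeros(MAP)   # where the agent has vision
    for u in units:
        r, c = u["pos"]
        damage[r, c] = 1.0 if u["taking_damage"] else 0.0
        team[r, c] = u["owner"]
        health[r, c] = u["health"]
    for r, c in visible:
        vision[r, c] = 1.0
    # Stack into a single (channels, height, width) observation tensor.
    return np.stack([damage, team, health, vision])

obs = feature_layers(units, visible)
print(obs.shape)  # (4, 4, 4): four layers over a 4x4 map
```

A convolutional network can consume this stack directly, the same way it would consume the color channels of an image.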
Replay analysis
Replay coaching tool for AI
Replays being pulled from ladder for teaching AIs
Official API includes
Image based AI API
Scripted AI API
Documentation
Example code
AI vs AI play
Target release Q1 2017
There will be a blogpost on SC2.com shortly after the panel
Looking ahead:
Coaching tools
Balance testing
Q&A Section: (Horribly paraphrased by me)
Artosis: "The AI that played the Atari games, they got to superhuman levels, how good did they really get?"
Ewalds: "We have one guy who plays all the Atari games, he's not pro but pretty good, and he plays all the games a few times to establish the baselines. We released a paper about our performance: 50% of the games we did better, 25% we did almost as good, and 25% we didn't do well. Games like Pong we could do 10x better than humans, games like Ms. Pac-Man we did 10x worse, and Montezuma's Revenge we got nowhere. Now all except one or two are better than humans, and Montezuma's Revenge is almost at human performance"
Artosis: "What makes these games so hard?"
Ewalds: "Pong is pretty simple: given the current and previous frame you can compute the entire state of the game, and that is enough for perfect play. For Pac-Man, there are a lot more states involved, a lot more that is hidden, and there is more planning involved so you don't get killed by a ghost. These are some different aspects you have to build into the agent"
Vinyals: "To add to that, to start learning a game, most agents start by performing random inputs, and will eventually get a reward, which gets them started. In some games, there are so many steps that need to be performed to get a reward. In games like Montezuma's Revenge, the reward is so far away and it's so easy to just die, that the AI learns to do nothing since it can't find a reward. (Montezuma's Revenge includes finding keys and opening doors)"
Artosis: "so it just stands there instead of going to die"
Vinyals: "Yes, because all it learns is that if it moves it dies. It doesn't see the progress due to a lack of intrinsic motivation"
Artosis: "Why go from Atari to SC2?"
Vinyals: "This sort of sparse reward structure is something key in SC2. If you set the reward to 'win the game', how long does it take and how much do you have to learn before you start to win? The partial observability of the map also adds challenge because the agent has to learn how to scout effectively. As a researcher this is a very interesting problem, and being able to have other researchers and hobbyists help makes this a very interesting environment"
Paul: "This is also a very stable game with a lot of replays so there's a lot of structure there to work off of"
Kevin: "It's a very popular game as well. If we were to pick a similarly difficult game with less players, it would be hard to learn if we were actually doing well"
Artosis: "What challenges have you faced?"
Paul: "How do we present the environment to the AI? There are a lot of intricacies. Imagine trying to explain SC2 to your grandmother. And Google keeps improving or finding bugs in their platform that we need to keep updating around"
Kevin: "We also keep finding SC2 bugs that we run into when speeding up the game for AI vs AI play, like a buffer overflow in the game duration counter, that only came up after running the game for hours"
Vinyals: "We were running a distributed SC2 binary on our servers, and the data center guy was hyped when he found out what we're doing"
Kevin: "It's funny, if you search something on Google, SC2 is running next to it"
Artosis: "What are some of the biggest challenges the AI will face?"
Vinyals: "Just learning from scratch is very difficult. When you as a player try to learn a game like StarCraft, you probably go and watch tutorials and other players, you don't just start randomly performing actions in the game, so that is certainly a challenge. So, one of the things we'll probably do is bootstrap from how other people play. This was a key aspect of AlphaGo: using a large dataset of previous games to get the process started. This is a twofold process. Let's say you have a game state, like you're in your base and making buildings. You can predict what the human is most likely to do, and this is called a policy. You can also predict how likely this move is to lead to winning the game, this is called a value. Also zerg best race. It was exciting to see this in Go. I don't understand Go at all, but I saw the probabilities for winning. And the commentators were saying that AlphaGo was making a mistake, but AlphaGo thought these moves were good, and often they were. So these AIs may even help for better commentary"
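The policy/value split Vinyals describes can be illustrated with a toy example. The action names and scores below are invented; in practice both the policy and the value come from a learned network, as in AlphaGo.

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution over actions."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

actions = ["expand", "build_army", "attack"]
raw_scores = [1.2, 0.4, -0.3]  # assumed logits for the current game state

policy = softmax(raw_scores)   # "what would a human most likely do here?"
value = 0.62                   # "how likely is this state to lead to a win?"

best = actions[policy.index(max(policy))]
print(best)   # expand
print(value)  # 0.62
```

The policy narrows the search to plausible moves; the value judges positions without playing them out, which is what let commentators compare AlphaGo's win probability against their own reading of the board.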
Artosis: "What about multitasking and spreading attention? what about limiting these to be like a human?"
Kevin: "Yes, for AI to be fun it needs to be fair. We're going to limit the APM, but we're also forcing the AI agent to work through the same UI. It won't have a mouse, but it will need to perform selections, make control groups, move the camera around, etc."
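One way such an APM cap could work is a simple action budget that refills with game time. This is purely a hypothetical sketch; the panel gave no implementation details, and the numbers here are invented.

```python
class ApmLimiter:
    """Hypothetical APM cap: the agent may act only while it has budget left."""
    def __init__(self, apm_limit):
        self.per_second = apm_limit / 60.0
        self.budget = 0.0

    def tick(self, dt_seconds):
        # Budget accrues with game time, capped at one second's worth
        # so the agent cannot hoard actions for a superhuman burst.
        self.budget = min(self.budget + self.per_second * dt_seconds,
                          self.per_second)

    def try_act(self):
        if self.budget >= 1.0:
            self.budget -= 1.0
            return True
        return False

limiter = ApmLimiter(apm_limit=180)   # 180 APM = 3 actions per second
performed = 0
for _ in range(100):                  # simulate 100 ticks of 0.1 s = 10 s
    limiter.tick(0.1)
    if limiter.try_act():
        performed += 1
print(performed)                      # roughly 30 actions in 10 seconds
```

The burst cap matters as much as the average: a human can't click 500 times in one second even if their minute-long average is modest.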
Ewalds: "The goal isn't to make a physical robot, and it's also not a goal to make a bot learn to read the number of minerals, so some things will be given to it more directly. We want to learn how to deal with planning, perception, etc. So we'll take basic liberties like giving it the number of minerals, but giving it actual unit IDs, maybe not. We may relax these some in training, just like you might look at a replay to see the full map afterwards, but only for training. And we'll rely on the Blizzard folks to make sure the game space is fair"
Paul: "Yeah, the agent will receive the information a human would, and the interface will be set up so it cannot cheat"
Vinyals: "Like in Ty vs Byun, there was a ship that got missed because of limited attention, and being able to model this in an AI will be a challenge"
Artosis: "To stay on that topic, it seems like the AI will be able to play against itself, is this something that's going to be happening a lot, are there going to be 1000 SC2 games running on the Google servers and how does that work?"
Ewalds: "At first it will only be against the built-in AI, but eventually that will be easy and it will start to play against itself, on the Google compute, probably hundreds or thousands of games at a time. We will probably remove the time dependency to start, but eventually it will need to play in real time, so if it can't compute fast enough it will need to skip moves"
Artosis: "What about learning from replays?"
Vinyals: "Yes we want to do that. One way we learn is imitation learning. We see someone do a build, and we try to do it ourselves and from that we learn how it works. You may not want to learn from the best players at first, but you would want to learn the basics like how to mine etc. So people at any level can contribute to this just by laddering and providing replays"
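The imitation-learning idea can be sketched in miniature: treat each (state, human action) pair from a replay as a supervised example and learn to predict the action. The replay data and the trivial majority-vote "model" below are invented for illustration; real systems learn from raw observations with a neural network.

```python
from collections import Counter, defaultdict

replay = [  # (observed state, action the human took)
    ("low_minerals", "mine"),
    ("low_minerals", "mine"),
    ("enough_minerals", "build_worker"),
    ("low_minerals", "mine"),
    ("enough_minerals", "build_worker"),
]

# "Training": for each state, remember the action humans take most often.
by_state = defaultdict(Counter)
for state, action in replay:
    by_state[state][action] += 1
policy = {s: c.most_common(1)[0][0] for s, c in by_state.items()}

print(policy["low_minerals"])      # mine
print(policy["enough_minerals"])   # build_worker
```

This is why replays from players of any level are useful: even average-level games teach the basics like mining and building, before the agent refines its play by self-play.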
Artosis: "What do you plan to see in the months after the API release?"
Paul: "In the months before, we want to see people on the forums provide feedback on what they want to see. The people who use Brood War can be a huge help and hopefully they'll switch over. This is a way for people to actually make an SC2 AI without being banned"
Kevin: "And we want to make it accessible, like we want high schoolers to be able to hop in and play with the AI. And there are other useful tools like replay analysis"
Vinyals: "We hope to see whatever we're not predicting ^^"
Artosis: "What do you want to get out of building this AI?"
Vinyals: "Putting my research hat on, I would like for this to be a benchmark where people can test algorithms for these problems that are not solved in AI. It's not always the algorithms that make advancements; good benchmarks let people demonstrate good results, so this will be a good additional environment to help advance the field"
Artosis: "Paul, why is this big for Videogames in general"
Paul: "I think it's big for the whole world. Machine learning is helping in so many fields. *names other technologies that did cool things: CD-ROMs, the internet, 3D GPUs* I think this is the next big thing, games that respond to people, that have more depth in the AI, I think this could be a *game changer*"
END