Tuesday, September 21, 2010

Google DevQuiz, the shiritori problem

 So, I mentioned I'd have a look at the dev quiz problems and I'm going to have a quick look over one of them.

 The problem in question is playing "Shiritori" against their server using a limited dictionary. First one to be unable to respond loses.
  As I'm sure most non-japanese readers are not familiar with the game, it is a simple word association game. The next word you respond with must begin with the same later the previous word ended with, so one might go:

Apple->elephant->tower->rat->...

No word may be repeated twice and if you can't respond, you lose.

The game is generally delineated by a subject such as sports or famous people or whatever... In the case of google, the limiting set would be a limited dictionary that you and the server share. Here is the set from the easy(trivial exercise).

psouvqk  <- start word
khbpmr
kozljf
krgauzfzlgm
kkfria
rdqbrwyvbf
rzsllm
fcdpksoc
feeesamqti
fmxfwtqa
fdqyteakd
cudyi
cfaxwprqywu
czeltzvu
iapjqat
isttjds
isexgjyllo
tejubs
tkjkxecsw
sfdezjjxj
shdlkdsjpb
slpbtiox
jvhgdyvb
jlumkmgl
jcsfqql
jsvknhxwbjh
bpddtv
bhtkvcuph
vzlocep
mdnjqve
elfduspou
utvptuuukqz
zzlrtwetw
wsoguy
wwgcgqwxo
yyfofqglzvj
ybiyodwxysl
llottukq
qsgmdelg
gucmqmkiarn
anmxhykbad
djyicuzu
ooxkbyzdxh
hiodhnmqox
xdisedp
pauopwwan

This one could actually be solved with a piece of paper as the branching is very limited. Most words are directly connected to the next one. Of course writing code from the start to solve it is reusable in the harder questions.

The next one is not nice if you try to trace the possible paths on paper.

cjunqesksk
kabeoppnt
gihcboqk
cjorrhkg
ntzptexc
dgmizn
mwsgd
trpiqowsm
kkqarpan
gkyzgbfd
cvhrlkmwpm
njprgqt
dxeheowsk
msmtvrg
tdrygvhxc
kbscviwmdpd
gzqwm
cddemqxfvgt
nrfavlpwock
djtswyfksg
mpescdobc
tocztzehxn

Less words! But much more branching... There was actually a trend of a single "end-game" that if succesfully entered you could work it in paper perhaps but the whole thing was begging for a read-ahead program.

I'm going to spare the detail, but by implementing a dictionary to store the remaining words, and then running a recursive algorithm to check whether each path is "alive", one can simply pick the live paths all the way until the opponent loses. With this word set it is still doable without any pattern analysis or serious optimisation. I still used C++ as I imagined the next stage could be more demanding and require expanding on my current work(it did).

Anyway, here's the core classes pulled from the header files:


class Game {
private:
 Dictionary legalWords;  /* list of legal words that haven't been used */
 vector<string> usedWords; /* list of words that have been used */

 int turn;     /* turn counter used to determine player or ai turn */

 void useWord(string word);

 void undoLast();
 int isComputerTurn();
 int isChoiceAlive(string choice, int steps);
public:
 Game();
 void startGame();
 void displayTurn();
 int selectNext(vector<string> choiceList);
};

class Dictionary {
private:
 list<string> words[ALPH_LIST_NO];
 int names;

 string firstWord;
public:
 /*
  Construct the dictionary from a file
 */
 Dictionary(char * filename);

 Dictionary() {
  names = 0;
 }

 /*
  Loads all names from a file, returning the number read,
  or 0 for a failure
 */
 int loadFromFile(char * filename);

 /*
  Display the contents of the dictionary
 */
 void displayContents();

 int addWord(string word);
 int removeWord(string word);

 /*
  Return the first word for empty strings
 */
 vector<string> getPossibleNext(string cur);
};


The important part of the code is really the bit about checking whether a choice is alive. The rest is just trivial implementation.

/**
 Recursively check whether a specific choice can be used safely
 Any case where there is a set of choices by the ai that can lead to
 game over means the choice is dead.
 The function simulates actual play, trying each path. 
 For player turns any subpath means the path is alive
 For ai turns any subpatch being dead means the path is dead
*/
int Game::isChoiceAlive(string choice, int steps) {
 if(steps <= 0) {
  return 1;
 }

 useWord(choice);
 steps--;


 vector<string> nextWords = legalWords.getPossibleNext(choice);
  
 int result = 1;
 if(isComputerTurn()) {
  if(nextWords.size() == 0) {
//   DEBUG("ran out of words at: " << choice);

   result = 1;
  } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') {
   // h leads to a dead loop
   
   result = 1;
  } else {

   vector<string>::iterator it;
   // check all words. on the computers turn if any choice is dead
   // we assume the computer will pick it and thus lead to a dead end
   for(it = nextWords.begin(); it != nextWords.end(); it++) {
    if(!isChoiceAlive(*it, steps)) {
     result = 0;
          
     break;
    }
   }
  }
 } else {
  if(nextWords.size() == 0) {
   // no player choices so game over
//   DEBUG("ran out of words at: " << choice << "\n");
   result = 0;
   DEBUG("Player ran out of words\n");
  } else {

   vector<string>::iterator it;
   // default to failure
   result = 0;

   for(it = nextWords.begin(); it != nextWords.end(); it++) {
    if(isChoiceAlive(*it, steps)) {
     result = 1;
     break;
    }
   }
  }
 }

 undoLast();
 return result;
}

One will note that this works by recursively calling itself until a path is determined to either by alive(1) or dead(0). This depends on whose turn it is. For the player choice so long as one sub-path is alive, the whole path is alive. On the other hand on the servers turn, one path that results in a dead player is a dead path(since we assume the server plays a perfect game, which it probably does). To restore the state of the dictionary, any items popped out of the dictionary(with useWord) are returned back to it using undoLast().

This results in an elegant solution to the problem. It certainly could use some optimizations but it returns a response to each server move very quickly so such are not necessary. I would be curious to benchmark the Visual Studio STL implementationfor this purpose.

The final question signficantly extends the number of words:

zjfurojiwt
tpenjyluge
txxkygwaqgv
ttloiso
exsosv
edmpo
ejvzpaf
vgaxvtbzo
vzgcsrihdof
vfqkkfefsz
orxbf
ocsbkecrutz
oiuvsvdzny
fmualz
fttay
fjlin
zxqry
zwlhuaavn
zbfbu
yzkhgqjwn
ylapbgu
yztpihmont
nmmeycwxu
nvchbixbft
nhube
uvokit
ulrsdme
ullkbwv
tqgbupfbx
xbxaj
xqkfbyg
xlsjah
euxgwijb
bygjlosw
buoqdwoeda
bltxp
vtvbiphxlk
kbucm
kvpbutfxnci
kpiidygnc
oqcfhx
fcnylfqb
zhptjfbk
ywhozx
nuceayubb
uornsk
jwkcw
jtebgzhijgm
jiyeziidgeg
jhdgofa
jkqvxudtkpi
jvdfqjh
jpsbykp
jymncyc
jeuawoyqfeq
jxwdyctpbr
jtkius
wyehnlfcewj
wvwnxm
wkwpoxug
wauwusmlca
wbvheei
wrvuppgxeh
wqpdckip
wuearboc
wpixxq
wgysr
waegpptsls
myrjybpaj
miksbvyrpw
mgbidg
mrzowglfa
mncqxmi
mqkxjtcnrhh
mymuttlap
muzynhc
mmgkyupq
mhvcustr
mgyxalfjoks
grfnkukj
grvpecw
glafmam
gkktjoquda
gxcumli
gnojpqlfwh
gcykxbcp
gfukbmc
grrnxq
gmadnr
gkeeyspumss
awwgeyfvj
axcvtmpsw
arcjpzm
azpqqvg
aakhi
acxurh
agiaariapp
ajzlurevxyc
akvsq
acnzqsglor
alvilkvzvs
ipsyrchhmwj
iqpgfhzzw
icypxogrm
ibcytxveblg
iatrdra
izgujgoph
iknbjgp
icewc
inbfvdmq
intkzlyeer
ibqnfshys
hkpuj
hspyw
hvkom
hsrewtg
hxxfa
huozi
htihddyp
hkumyc
hfaadvq
hkzmvdcxr
hvtfs
pesyj
pzqdepzw
ptybxgm
pahfbcdtg
ppnhzyspa
phfqjdxybi
psmkslfah
pagszdvdc
petdq
ptgevrrebr
pcdbqtls
cfzfmjhj
czczzhaxw
cgbpozgm
cwctwg
cyfamuchyaa
cxxqydweki
cwnzkiah
ccactp
cbylbq
chogthyr
cswakbs
qpslvrrgnj
qltanybow
qtqpmximzwm
qhkbwlig
qgmyza
qkvozmngii
qjdqvh
qfdntswgjfp
qujuqkczgrc
qjbrsxwcjhr
qkajthhs
rqgvj
rmbcwvenyw
rwiuwlulm
rplfg
rikfpsra
rrjfki
rkinlvh
raditmwbp
rabaxvlzoc
ryzzijueq
rxtwzcs
svosgyukelj
suvjdw
szfizfvlem
scwyaooag
syrfklha
scnyeai
slwhbymh
shsguslpxpp
sskxiec
srzgnsoooq
svmgiir
qasspaquyt
qzptilhpgyo
qauxwqly
rvwse
rjrnhyvdf
rjuvgruzpn
sptgdchfahv
sssuuafpiz
skhsmu

Trying to simply run through the massive number of choices is going to take a long, long time. I initially tried to solve this by compromising and figuring that the server may only be looking a number of steps ahead so optimised my own code and got it to look around 8-10 ahead(I can't remember at this time). This wasn't sufficient, but I noticed losing patterns would lead into a back and forth where the opponent keeps sending the same letter back to you.
On examination you can find there are one more words that end with certain letters than those that begin with them. Thus if one can force the computer to select one of these words, a path can be considered "alive". This reduces a huge amount of branching, and in fact is enough to allow a full simulation, and the final 6 points for the question. The edit to isChoiceAlive is quite simple, note the "if(choice[0] == 'k' || ... ) code.

int Game::isChoiceAlive(string choice, int steps) {
 if(steps <= 0) {
  return 1;
 }

 useWord(choice);
 steps--;


 vector<string> nextWords = legalWords.getPossibleNext(choice);
  
 int result = 1;
 if(isComputerTurn()) {
  if(nextWords.size() == 0) {
//   DEBUG("ran out of words at: " << choice);

   result = 1;
  } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') {
   // h leads to a dead loop
   
   result = 1;
  } else {

   vector<string>::iterator it;
   // check all words. on the computers turn if any choice is dead
   // we assume the computer will pick it and thus lead to a dead end
   for(it = nextWords.begin(); it != nextWords.end(); it++) {
    if(!isChoiceAlive(*it, steps)) {
     result = 0;
          
     break;
    }
   }
  }
 } else {
  if(nextWords.size() == 0) {
   // no player choices so game over
//   DEBUG("ran out of words at: " << choice << "\n");
   result = 0;
   DEBUG("Player ran out of words\n");
  } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') {
   // h leads to a dead loop
   
   result = 0;
  } else {

   vector<string>::iterator it;
   // default to failure
   result = 0;

   for(it = nextWords.begin(); it != nextWords.end(); it++) {
    if(isChoiceAlive(*it, steps)) {
     result = 1;
     break;
    }
   }
  }
 }

 undoLast();
 return result;

Sunday, September 5, 2010

Concrete mathematics

 Got the results in from the DevQuiz. All appears to be correct and I've got a place, woo! I'm in two minds about the difficulty of the questions. On one hand, they were challenging enough to make one think, but I don't think that they were overall "hard" even for the final questions, as I could still get away with fairly generic programs. On the other hand it did take a fair amount of time to do all the questions... I hope developer day is awesome and worth the time spent on the questions

 That aside, I picked up Concrete Mathematics a couple of days along with The Art of  Computer Programming.   I don't imagine I'll have the time to read the latter for a while, but I've started poking through Concrete Mathematics(yeah I'll get back to sicp eventually. Too many books, too little time). I really like how they've added in the student "graffiti". Some of it is very helpful in grasping a concept and other parts keep the text interesting even if they aren't directly useful. It's been a while since I've had to do some serious math (I guess a few of the questions in part 1 of sicp count), so it's got my brain going for a ride, but it's interesting stuff, and I still love to see a formula come together in an elegant manner. Not gotten far in yet, but definitely recommended for those interested.