NyTimes put up an article about the ongoing attempts to improve online privacy.
Welp. While in principle I think that privacy is important, and people have a right to it, I don't think people are going to like the alternative.
Google, yahoo, facebook, all those service people use for free and love, are supported by advertising. Targeted advertising even. And that only works because of some amount of user tracking.
So, what will happen if/when tracking becomes impossible thanks to mandatory "do not track" buttons? These companies providing the services need to switch to another revenue model. With tracking out of the window, monetising data is out of the question. As is targetted advertising... So we're left with primitive advertising(and more ad coverage on pages to cover for lower rates!?), or for pay services.
Is this helping anyone? I don't think so.
There are already ways to disable tracking. Disable your cookies, disable flash cookies/datastore. But this isn't really a place government should in any way be getting involved with. Increased consumer awareness could be good. Allow them to choose. People who are comfortable with being tracked in exchange for a service can continue, and people who would prefer to pay for the service can go ahead in peace. Simply removing tracking as an option altogether is just a good way to kill off free services.
Wednesday, November 10, 2010
Monday, November 1, 2010
Be part of the problem, or the solution
Well, I'm still alive.
I've just decided that unless I'm actually going to write an opinionated rant about something it's not going to be very interesting so I'm not really updating about whatever latest book I've read or am reading(though "The elements of computing systems" is awesome so far!)
Anyway, today, a little bit of a rant.
I *hate* Shift-JIS.
We're in the era of unicode now, or at least should be. It can represent any character you can think of, is fully ascii compatible, well supported in all development API's, and yet so many places are still using ancient artifacts from the past.
I work at a Japanese company so unfortunately, I have to deal with Shift-JIS and EUC-JP a lot. Generally in the context of turning them into UTF-8. Whether it's taking search queries and figuring out their encoding or dealing with buggy source code because someone thought it would be a good idea to mix encodings in varying source files, encodings other than unicode upset me a lot.
Apparently some big shops still haven't entered the modern era either. Try loading a UTF-8 encoded csv file with japanese in it into excel.
Garbage!
Thanks Microsoft. It's kind of sad because overall excel is a good product. Anyway, OpenOffice(or LibreOffice now that a lot of the devs ran from oracle) loads em great. And it gives you an option to select encoding if you do decide to use one of this dinosaur age encodings(I'm sure there's a setting for this somewhere but I don't use excel anymore and the people that reported the problem to me couldn't find one, which means neither will the vast majority of users).
So, when our software puts out CSV files for customers, how do we encode them?
In Shift-JIS. It makes me hurt. But what can we do when the vast majority of people use Excel? What does one do in a situation like this?
I've just decided that unless I'm actually going to write an opinionated rant about something it's not going to be very interesting so I'm not really updating about whatever latest book I've read or am reading(though "The elements of computing systems" is awesome so far!)
Anyway, today, a little bit of a rant.
I *hate* Shift-JIS.
We're in the era of unicode now, or at least should be. It can represent any character you can think of, is fully ascii compatible, well supported in all development API's, and yet so many places are still using ancient artifacts from the past.
I work at a Japanese company so unfortunately, I have to deal with Shift-JIS and EUC-JP a lot. Generally in the context of turning them into UTF-8. Whether it's taking search queries and figuring out their encoding or dealing with buggy source code because someone thought it would be a good idea to mix encodings in varying source files, encodings other than unicode upset me a lot.
Apparently some big shops still haven't entered the modern era either. Try loading a UTF-8 encoded csv file with japanese in it into excel.
Garbage!
Thanks Microsoft. It's kind of sad because overall excel is a good product. Anyway, OpenOffice(or LibreOffice now that a lot of the devs ran from oracle) loads em great. And it gives you an option to select encoding if you do decide to use one of this dinosaur age encodings(I'm sure there's a setting for this somewhere but I don't use excel anymore and the people that reported the problem to me couldn't find one, which means neither will the vast majority of users).
So, when our software puts out CSV files for customers, how do we encode them?
In Shift-JIS. It makes me hurt. But what can we do when the vast majority of people use Excel? What does one do in a situation like this?
Tuesday, September 21, 2010
Google DevQuiz, the shiritori problem
So, I mentioned I'd have a look at the dev quiz problems and I'm going to have a quick look over one of them.
The problem in question is playing "Shiritori" against their server using a limited dictionary. First one to be unable to respond loses.
As I'm sure most non-japanese readers are not familiar with the game, it is a simple word association game. The next word you respond with must begin with the same later the previous word ended with, so one might go:
Apple->elephant->tower->rat->...
No word may be repeated twice and if you can't respond, you lose.
The game is generally delineated by a subject such as sports or famous people or whatever... In the case of google, the limiting set would be a limited dictionary that you and the server share. Here is the set from the easy(trivial exercise).
This one could actually be solved with a piece of paper as the branching is very limited. Most words are directly connected to the next one. Of course writing code from the start to solve it is reusable in the harder questions.
The next one is not nice if you try to trace the possible paths on paper.
Less words! But much more branching... There was actually a trend of a single "end-game" that if succesfully entered you could work it in paper perhaps but the whole thing was begging for a read-ahead program.
I'm going to spare the detail, but by implementing a dictionary to store the remaining words, and then running a recursive algorithm to check whether each path is "alive", one can simply pick the live paths all the way until the opponent loses. With this word set it is still doable without any pattern analysis or serious optimisation. I still used C++ as I imagined the next stage could be more demanding and require expanding on my current work(it did).
Anyway, here's the core classes pulled from the header files:
The important part of the code is really the bit about checking whether a choice is alive. The rest is just trivial implementation.
One will note that this works by recursively calling itself until a path is determined to either by alive(1) or dead(0). This depends on whose turn it is. For the player choice so long as one sub-path is alive, the whole path is alive. On the other hand on the servers turn, one path that results in a dead player is a dead path(since we assume the server plays a perfect game, which it probably does). To restore the state of the dictionary, any items popped out of the dictionary(with useWord) are returned back to it using undoLast().
This results in an elegant solution to the problem. It certainly could use some optimizations but it returns a response to each server move very quickly so such are not necessary. I would be curious to benchmark the Visual Studio STL implementationfor this purpose.
The final question signficantly extends the number of words:
Trying to simply run through the massive number of choices is going to take a long, long time. I initially tried to solve this by compromising and figuring that the server may only be looking a number of steps ahead so optimised my own code and got it to look around 8-10 ahead(I can't remember at this time). This wasn't sufficient, but I noticed losing patterns would lead into a back and forth where the opponent keeps sending the same letter back to you.
On examination you can find there are one more words that end with certain letters than those that begin with them. Thus if one can force the computer to select one of these words, a path can be considered "alive". This reduces a huge amount of branching, and in fact is enough to allow a full simulation, and the final 6 points for the question. The edit to isChoiceAlive is quite simple, note the "if(choice[0] == 'k' || ... ) code.
The problem in question is playing "Shiritori" against their server using a limited dictionary. First one to be unable to respond loses.
As I'm sure most non-japanese readers are not familiar with the game, it is a simple word association game. The next word you respond with must begin with the same later the previous word ended with, so one might go:
Apple->elephant->tower->rat->...
No word may be repeated twice and if you can't respond, you lose.
The game is generally delineated by a subject such as sports or famous people or whatever... In the case of google, the limiting set would be a limited dictionary that you and the server share. Here is the set from the easy(trivial exercise).
psouvqk <- start word khbpmr kozljf krgauzfzlgm kkfria rdqbrwyvbf rzsllm fcdpksoc feeesamqti fmxfwtqa fdqyteakd cudyi cfaxwprqywu czeltzvu iapjqat isttjds isexgjyllo tejubs tkjkxecsw sfdezjjxj shdlkdsjpb slpbtiox jvhgdyvb jlumkmgl jcsfqql jsvknhxwbjh bpddtv bhtkvcuph vzlocep mdnjqve elfduspou utvptuuukqz zzlrtwetw wsoguy wwgcgqwxo yyfofqglzvj ybiyodwxysl llottukq qsgmdelg gucmqmkiarn anmxhykbad djyicuzu ooxkbyzdxh hiodhnmqox xdisedp pauopwwan
This one could actually be solved with a piece of paper as the branching is very limited. Most words are directly connected to the next one. Of course writing code from the start to solve it is reusable in the harder questions.
The next one is not nice if you try to trace the possible paths on paper.
cjunqesksk kabeoppnt gihcboqk cjorrhkg ntzptexc dgmizn mwsgd trpiqowsm kkqarpan gkyzgbfd cvhrlkmwpm njprgqt dxeheowsk msmtvrg tdrygvhxc kbscviwmdpd gzqwm cddemqxfvgt nrfavlpwock djtswyfksg mpescdobc tocztzehxn
Less words! But much more branching... There was actually a trend of a single "end-game" that if succesfully entered you could work it in paper perhaps but the whole thing was begging for a read-ahead program.
I'm going to spare the detail, but by implementing a dictionary to store the remaining words, and then running a recursive algorithm to check whether each path is "alive", one can simply pick the live paths all the way until the opponent loses. With this word set it is still doable without any pattern analysis or serious optimisation. I still used C++ as I imagined the next stage could be more demanding and require expanding on my current work(it did).
Anyway, here's the core classes pulled from the header files:
class Game { private: Dictionary legalWords; /* list of legal words that haven't been used */ vector<string> usedWords; /* list of words that have been used */ int turn; /* turn counter used to determine player or ai turn */ void useWord(string word); void undoLast(); int isComputerTurn(); int isChoiceAlive(string choice, int steps); public: Game(); void startGame(); void displayTurn(); int selectNext(vector<string> choiceList); }; class Dictionary { private: list<string> words[ALPH_LIST_NO]; int names; string firstWord; public: /* Construct the dictionary from a file */ Dictionary(char * filename); Dictionary() { names = 0; } /* Loads all names from a file, returning the number read, or 0 for a failure */ int loadFromFile(char * filename); /* Display the contents of the dictionary */ void displayContents(); int addWord(string word); int removeWord(string word); /* Return the first word for empty strings */ vector<string> getPossibleNext(string cur); };
The important part of the code is really the bit about checking whether a choice is alive. The rest is just trivial implementation.
/** Recursively check whether a specific choice can be used safely Any case where there is a set of choices by the ai that can lead to game over means the choice is dead. The function simulates actual play, trying each path. For player turns any subpath means the path is alive For ai turns any subpatch being dead means the path is dead */ int Game::isChoiceAlive(string choice, int steps) { if(steps <= 0) { return 1; } useWord(choice); steps--; vector<string> nextWords = legalWords.getPossibleNext(choice); int result = 1; if(isComputerTurn()) { if(nextWords.size() == 0) { // DEBUG("ran out of words at: " << choice); result = 1; } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') { // h leads to a dead loop result = 1; } else { vector<string>::iterator it; // check all words. on the computers turn if any choice is dead // we assume the computer will pick it and thus lead to a dead end for(it = nextWords.begin(); it != nextWords.end(); it++) { if(!isChoiceAlive(*it, steps)) { result = 0; break; } } } } else { if(nextWords.size() == 0) { // no player choices so game over // DEBUG("ran out of words at: " << choice << "\n"); result = 0; DEBUG("Player ran out of words\n"); } else { vector<string>::iterator it; // default to failure result = 0; for(it = nextWords.begin(); it != nextWords.end(); it++) { if(isChoiceAlive(*it, steps)) { result = 1; break; } } } } undoLast(); return result; }
One will note that this works by recursively calling itself until a path is determined to either by alive(1) or dead(0). This depends on whose turn it is. For the player choice so long as one sub-path is alive, the whole path is alive. On the other hand on the servers turn, one path that results in a dead player is a dead path(since we assume the server plays a perfect game, which it probably does). To restore the state of the dictionary, any items popped out of the dictionary(with useWord) are returned back to it using undoLast().
This results in an elegant solution to the problem. It certainly could use some optimizations but it returns a response to each server move very quickly so such are not necessary. I would be curious to benchmark the Visual Studio STL implementationfor this purpose.
The final question signficantly extends the number of words:
zjfurojiwt tpenjyluge txxkygwaqgv ttloiso exsosv edmpo ejvzpaf vgaxvtbzo vzgcsrihdof vfqkkfefsz orxbf ocsbkecrutz oiuvsvdzny fmualz fttay fjlin zxqry zwlhuaavn zbfbu yzkhgqjwn ylapbgu yztpihmont nmmeycwxu nvchbixbft nhube uvokit ulrsdme ullkbwv tqgbupfbx xbxaj xqkfbyg xlsjah euxgwijb bygjlosw buoqdwoeda bltxp vtvbiphxlk kbucm kvpbutfxnci kpiidygnc oqcfhx fcnylfqb zhptjfbk ywhozx nuceayubb uornsk jwkcw jtebgzhijgm jiyeziidgeg jhdgofa jkqvxudtkpi jvdfqjh jpsbykp jymncyc jeuawoyqfeq jxwdyctpbr jtkius wyehnlfcewj wvwnxm wkwpoxug wauwusmlca wbvheei wrvuppgxeh wqpdckip wuearboc wpixxq wgysr waegpptsls myrjybpaj miksbvyrpw mgbidg mrzowglfa mncqxmi mqkxjtcnrhh mymuttlap muzynhc mmgkyupq mhvcustr mgyxalfjoks grfnkukj grvpecw glafmam gkktjoquda gxcumli gnojpqlfwh gcykxbcp gfukbmc grrnxq gmadnr gkeeyspumss awwgeyfvj axcvtmpsw arcjpzm azpqqvg aakhi acxurh agiaariapp ajzlurevxyc akvsq acnzqsglor alvilkvzvs ipsyrchhmwj iqpgfhzzw icypxogrm ibcytxveblg iatrdra izgujgoph iknbjgp icewc inbfvdmq intkzlyeer ibqnfshys hkpuj hspyw hvkom hsrewtg hxxfa huozi htihddyp hkumyc hfaadvq hkzmvdcxr hvtfs pesyj pzqdepzw ptybxgm pahfbcdtg ppnhzyspa phfqjdxybi psmkslfah pagszdvdc petdq ptgevrrebr pcdbqtls cfzfmjhj czczzhaxw cgbpozgm cwctwg cyfamuchyaa cxxqydweki cwnzkiah ccactp cbylbq chogthyr cswakbs qpslvrrgnj qltanybow qtqpmximzwm qhkbwlig qgmyza qkvozmngii qjdqvh qfdntswgjfp qujuqkczgrc qjbrsxwcjhr qkajthhs rqgvj rmbcwvenyw rwiuwlulm rplfg rikfpsra rrjfki rkinlvh raditmwbp rabaxvlzoc ryzzijueq rxtwzcs svosgyukelj suvjdw szfizfvlem scwyaooag syrfklha scnyeai slwhbymh shsguslpxpp sskxiec srzgnsoooq svmgiir qasspaquyt qzptilhpgyo qauxwqly rvwse rjrnhyvdf rjuvgruzpn sptgdchfahv sssuuafpiz skhsmu
Trying to simply run through the massive number of choices is going to take a long, long time. I initially tried to solve this by compromising and figuring that the server may only be looking a number of steps ahead so optimised my own code and got it to look around 8-10 ahead(I can't remember at this time). This wasn't sufficient, but I noticed losing patterns would lead into a back and forth where the opponent keeps sending the same letter back to you.
On examination you can find there are one more words that end with certain letters than those that begin with them. Thus if one can force the computer to select one of these words, a path can be considered "alive". This reduces a huge amount of branching, and in fact is enough to allow a full simulation, and the final 6 points for the question. The edit to isChoiceAlive is quite simple, note the "if(choice[0] == 'k' || ... ) code.
int Game::isChoiceAlive(string choice, int steps) { if(steps <= 0) { return 1; } useWord(choice); steps--; vector<string> nextWords = legalWords.getPossibleNext(choice); int result = 1; if(isComputerTurn()) { if(nextWords.size() == 0) { // DEBUG("ran out of words at: " << choice); result = 1; } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') { // h leads to a dead loop result = 1; } else { vector<string>::iterator it; // check all words. on the computers turn if any choice is dead // we assume the computer will pick it and thus lead to a dead end for(it = nextWords.begin(); it != nextWords.end(); it++) { if(!isChoiceAlive(*it, steps)) { result = 0; break; } } } } else { if(nextWords.size() == 0) { // no player choices so game over // DEBUG("ran out of words at: " << choice << "\n"); result = 0; DEBUG("Player ran out of words\n"); } else if(choice[0] == 'x' || choice[0] == 'k' || choice[0] == 'b') { // h leads to a dead loop result = 0; } else { vector<string>::iterator it; // default to failure result = 0; for(it = nextWords.begin(); it != nextWords.end(); it++) { if(isChoiceAlive(*it, steps)) { result = 1; break; } } } } undoLast(); return result;
Sunday, September 5, 2010
Concrete mathematics
Got the results in from the DevQuiz. All appears to be correct and I've got a place, woo! I'm in two minds about the difficulty of the questions. On one hand, they were challenging enough to make one think, but I don't think that they were overall "hard" even for the final questions, as I could still get away with fairly generic programs. On the other hand it did take a fair amount of time to do all the questions... I hope developer day is awesome and worth the time spent on the questions
That aside, I picked up Concrete Mathematics a couple of days along with The Art of Computer Programming. I don't imagine I'll have the time to read the latter for a while, but I've started poking through Concrete Mathematics(yeah I'll get back to sicp eventually. Too many books, too little time). I really like how they've added in the student "graffiti". Some of it is very helpful in grasping a concept and other parts keep the text interesting even if they aren't directly useful. It's been a while since I've had to do some serious math (I guess a few of the questions in part 1 of sicp count), so it's got my brain going for a ride, but it's interesting stuff, and I still love to see a formula come together in an elegant manner. Not gotten far in yet, but definitely recommended for those interested.
That aside, I picked up Concrete Mathematics a couple of days along with The Art of Computer Programming. I don't imagine I'll have the time to read the latter for a while, but I've started poking through Concrete Mathematics(yeah I'll get back to sicp eventually. Too many books, too little time). I really like how they've added in the student "graffiti". Some of it is very helpful in grasping a concept and other parts keep the text interesting even if they aren't directly useful. It's been a while since I've had to do some serious math (I guess a few of the questions in part 1 of sicp count), so it's got my brain going for a ride, but it's interesting stuff, and I still love to see a formula come together in an elegant manner. Not gotten far in yet, but definitely recommended for those interested.
Thursday, August 26, 2010
Recent activity: Google DevQuiz and why you probably shouldn't use Visual C++
Google Developer Day Tokyo is on end of September.
After previous years where signup was first come first served, this year they're filtering people by using a "developer quiz". Seems a reasonable enough move to make. I've noticed a few people who are interested in going just for the prospect of a chance at a "free phone" since there is some speculation they'll be giving away an android handset again this year(If I'm not mistaken they already gave them out at this years I/O).
Despite starting early, and getting the first half done well before the deadline, I got distracted by other things for a while and ended up doing the remaining half in the last day. Got all my solutions in, and at least according to the automated grader I cleared all the problems successfully. I can only hope that I don't lose points for bad code or something(because my "test of concept" prototype ended up getting hacked into my final solution...). I'll probably throw up some of the code here later with light commentary.
Unfortunately I lost a fair bit of time to poor problem comprehension as they were stated only in Japanese. I'm quite comfortable with it, but considering there's a reasonable amount of English speakers participating, it would have been nice to have English version of the questions.
Since I was doing my work on a pc I don't normally use for development, I thought I'd give Visual C++ a go for old times sake(when I was doing a lot of DirectX/Win32 programming at university)... See how it was now.
Unfortunately I was rather disappointed. While IDE's like eclipse have come ahead leaps and bounds, Visual C++(or at least the express edition) is slow and clumsy. Often for apparently no good reason it would just slow down, and syntax checking and highlighting is terribly buggy. While it is the express edition with a cut-down feature set, I don't think that "adding bugs" is the same as feature cuts. I used to have a lot of appreciation for Visual Studio as one of the better Microsoft products, but right now it does little to please and much to annoy.
On a final note, I recently picked up "Lions' Commentary on UNIX 6th Edition". I've only just gotten started on it, but it's a real trip to the past. The old school(pre-ANSI) C is a real shocker, and having to learn PDP-11 assembly is a bit of an irritation, but what I've read so far has been interesting and educating.
After previous years where signup was first come first served, this year they're filtering people by using a "developer quiz". Seems a reasonable enough move to make. I've noticed a few people who are interested in going just for the prospect of a chance at a "free phone" since there is some speculation they'll be giving away an android handset again this year(If I'm not mistaken they already gave them out at this years I/O).
Despite starting early, and getting the first half done well before the deadline, I got distracted by other things for a while and ended up doing the remaining half in the last day. Got all my solutions in, and at least according to the automated grader I cleared all the problems successfully. I can only hope that I don't lose points for bad code or something(because my "test of concept" prototype ended up getting hacked into my final solution...). I'll probably throw up some of the code here later with light commentary.
Unfortunately I lost a fair bit of time to poor problem comprehension as they were stated only in Japanese. I'm quite comfortable with it, but considering there's a reasonable amount of English speakers participating, it would have been nice to have English version of the questions.
Since I was doing my work on a pc I don't normally use for development, I thought I'd give Visual C++ a go for old times sake(when I was doing a lot of DirectX/Win32 programming at university)... See how it was now.
Unfortunately I was rather disappointed. While IDE's like eclipse have come ahead leaps and bounds, Visual C++(or at least the express edition) is slow and clumsy. Often for apparently no good reason it would just slow down, and syntax checking and highlighting is terribly buggy. While it is the express edition with a cut-down feature set, I don't think that "adding bugs" is the same as feature cuts. I used to have a lot of appreciation for Visual Studio as one of the better Microsoft products, but right now it does little to please and much to annoy.
On a final note, I recently picked up "Lions' Commentary on UNIX 6th Edition". I've only just gotten started on it, but it's a real trip to the past. The old school(pre-ANSI) C is a real shocker, and having to learn PDP-11 assembly is a bit of an irritation, but what I've read so far has been interesting and educating.
Thursday, August 12, 2010
The Practice of Programming
Yep. I'm far from done with SICP, but the other books I feel compelled to order are also very interesting.
Recently I read a quote somewhere about programming style that was taken out of Kernighan and Pike's "The Practice of Programming". It was neat enough that I went to check it out on Amazon, and as one would, ended up buying it.
Having read through the first few chapters so far it's been a good refreshed on some basic algorithms, and some well thought out strategies. The CVS formatting example in chapter 4 is far from amazing, but it takes a good measured approach to a common problem that has far too many annoying little edge cases. More than that however, the Markov Chain work in chapter 3 was great. It was interesting to see the 150 line C program taken down to 20 lines in awk. Maybe one of these days I'll take a look at that(after haskell, clojure, ada and all the other languages with nifty features I've been wanting to look at). What was surprising though was how poor some of the STL implementations appeared to be.
A very compact book well worth the read for anyone serious about programming. If I have any criticism of it, it would likely lie in some of the rather cryptic examples that could've been a lot more clear with proper variable naming. They are always explained afterwards, and very clever, but many of them don't feel entirely necessary. Despite that minor niggle, definitely recommended and I look forward to the other half.
Recently I read a quote somewhere about programming style that was taken out of Kernighan and Pike's "The Practice of Programming". It was neat enough that I went to check it out on Amazon, and as one would, ended up buying it.
Having read through the first few chapters so far it's been a good refreshed on some basic algorithms, and some well thought out strategies. The CVS formatting example in chapter 4 is far from amazing, but it takes a good measured approach to a common problem that has far too many annoying little edge cases. More than that however, the Markov Chain work in chapter 3 was great. It was interesting to see the 150 line C program taken down to 20 lines in awk. Maybe one of these days I'll take a look at that(after haskell, clojure, ada and all the other languages with nifty features I've been wanting to look at). What was surprising though was how poor some of the STL implementations appeared to be.
A very compact book well worth the read for anyone serious about programming. If I have any criticism of it, it would likely lie in some of the rather cryptic examples that could've been a lot more clear with proper variable naming. They are always explained afterwards, and very clever, but many of them don't feel entirely necessary. Despite that minor niggle, definitely recommended and I look forward to the other half.
Tuesday, July 27, 2010
Huffman encoding tree. SICP 2.68 - 2.72
I can't help but notice that the further I get into sicp, the more of the blogs chronicling other peoples progress on it seem to just drop off, which is unfortunate. Honestly, if I was to post detailed solutions to everything, I doubt I would keep at it myself either.
Anyway, the huffman encoding trees stuff was pretty cool, so I'll post my solutions here.
First, some general implementation details for the trees(as always looking in the text for the explanations):
This is all straight out of sicp.
Also, as a "setup" sicp supplies a couple of functions for working with sets, to which I add one(element-of-set, which checks if a pair corresponding to a symbol is present)
SICP 2.68 encoding a message:
Idea here is to encode one symbol at a time from the tree. SICP supplies us with encode:
And here's my implementation of encode-symbol:
SICP 2.69 generating a huffman tree:
Here's the real meat, and a surprisingly simple case once you get to it. First as part of the problem definition we're given generate-huffman-tree and told to implement successive-merge.
SICP 2.70 a huffman song:
Not much to this question. Apply the routines to some sample data.
SICP 2.71 and 2.72 some theory
It should be quite apparent that if we form a tree in a pattern of 1,2,4,8,16... that each sub-tree formed would be merged one at a time with the next item, giving a tree of n-1 levels. The most frequent symbol would be a single access away, while the least would be n-1 levels away(since the final level has 2 leaves, not 1). All remaining items would be on the nth level.
The interesting thing though is, that as we're dealing with a sorted list, to get the first(most frequent) item, it is right at the end of the list. We thus end up going through every entry of (symbols tree) to find it, but only once, giving a complexity of n. The final entry on the other hand is at the very head of the list, and thus immediately found each time. However it has to go through n levels of the tree, also giving an order of growth n, though we can assume there would be more overhead from cons and whatnot.
Anyway, the huffman encoding trees stuff was pretty cool, so I'll post my solutions here.
First, some general implementation details for the trees(as always looking in the text for the explanations):
This is all straight out of sicp.
(define (make-leaf symbol weight) (list 'leaf symbol weight)) (define (make-leaf symbol weight) (list 'leaf symbol weight)) (define (leaf? object) (eq? (car object) 'leaf)) (define (symbol-leaf x) (cadr x)) (define (weight-leaf x) (caddr x)) (define (make-code-tree left right) (list left right (append (symbols left) (symbols right)) (+ (weight left) (weight right)))) (define (left-branch tree) (car tree)) (define (right-branch tree) (cadr tree)) (define (symbols tree) (if (leaf? tree) (list (symbol-leaf tree)) (caddr tree))) (define (weight tree) (if (leaf? tree) (weight-leaf tree) (cadddr tree))) (define (decode bits tree) (define (decode-1 bits current-branch) (if (null? bits) '() (let ((next-branch (choose-branch (car bits) current-branch))) (if (leaf? next-branch) (cons (symbol-leaf next-branch) (decode-1 (cdr bits) tree)) (decode-1 (cdr bits) next-branch))))) (decode-1 bits tree)) (define (choose-branch bit branch) (cond ((= bit 0) (left-branch branch)) ((= bit 1) (right-branch branch)) (else (error "bad bit -- CHOOSE-BRANCH" bit))))
Also, as a "setup" sicp supplies a couple of functions for working with sets, to which I add one(element-of-set, which checks if a pair corresponding to a symbol is present)
(define (adjoin-set x set) (cond ((null? set) (list x)) ((< (weight x) (weight (car set))) (cons x set)) (else (cons (car set) (adjoin-set x (cdr set)))))) (define (element-of-set? x set) (cond ((null? set) #f) ((eq? (symbol x) (symbol (car set))) #t) (else (element-of-set? x (cdr set))))) (define (make-leaf-set pairs) (if (null? pairs) '() (let ((pair (car pairs))) (adjoin-set (make-leaf (car pair) (cadr pair)) (make-leaf-set (cdr pairs))))))
SICP 2.68 encoding a message:
Idea here is to encode one symbol at a time from the tree. SICP supplies us with encode:
(define (encode message tree) (if (null? message) '() (append (encode-symbol (car message) tree) (encode (cdr message) tree))))
And here's my implementation of encode-symbol:
(define (encode-symbol sym tree) (define (get-encoding branch) (cond ((leaf? branch) '()) ((element-of-set? sym (symbols (left-branch branch))) (cons 0 (get-encoding (left-branch branch)))) ((element-of-set? sym (symbols (right-branch branch))) (cons 1 (get-encoding (right-branch branch)))))) (if (element-of-set? sym (symbols tree)) (get-encoding tree) (error "Symbol not part of encoding set")))I start out with a simple check that the symbol is part of the set. From there we check which branch the element is, and work down the tree, consing on the relevant bits until we hit a leaf which returns a null and completes the encoded symbol.
SICP 2.69 generating a huffman tree:
Here's the real meat, and a surprisingly simple case once you get to it. First as part of the problem definition we're given generate-huffman-tree and told to implement successive-merge.
(define (generate-huffman-tree pairs) (successive-merge (make-leaf-set pairs)))Here's my take on successive-merge:
(define (successive-merge pairs) (if (= (length pairs) 1) (car pairs) (successive-merge (adjoin-set (make-code-tree (cadr pairs) (car pairs)) (cddr pairs)))))Note that as the pairs are sorted, we need only grab the smallest(first two) pairs, join them together into a new sub-tree, and then place it in the correct position in the list(according to its weight. This is achieved with the modified adjoin-set detailed earlier).
SICP 2.70 a huffman song:
Not much to this question. Apply the routines to some sample data.
(define song-tree (generate-huffman-tree '((A 2) (BOOM 1) (GET 2) (JOB 2) (NA 16) (SHA 3) (YIP 9) (WAH 1)))) (length (encode '(get a job sha na na na na na na na na get a job sha na na na na na na na na wah yip yip yip yip yip yip yip yip yip sha boom) song-tree)) ;Value: 84As we can see it comes out to 84 bits. As a fixed-length code each word would be 3 bits, and with 36 words, we would get a grand total of 108 bits, which of course is longer.
SICP 2.71 and 2.72 some theory
It should be quite apparent that if we form a tree in a pattern of 1,2,4,8,16... that each sub-tree formed would be merged one at a time with the next item, giving a tree of n-1 levels. The most frequent symbol would be a single access away, while the least would be n-1 levels away(since the final level has 2 leaves, not 1). All remaining items would be on the nth level.
The interesting thing though is, that as we're dealing with a sorted list, to get the first(most frequent) item, it is right at the end of the list. We thus end up going through every entry of (symbols tree) to find it, but only once, giving a complexity of n. The final entry on the other hand is at the very head of the list, and thus immediately found each time. However it has to go through n levels of the tree, also giving an order of growth n, though we can assume there would be more overhead from cons and whatnot.
Sunday, July 18, 2010
In-fix derivation (sicp exercise 2.58 b)
In part 2.3.2 Symbolic Differentiation, we develop a system to calculate simple derivations of calculations in prefix form( such as: + 2 5).
In 2.58 the task is to change this to in-fix form (going with the last example 2 + 5). In addition, it should be able to handle several different operations in a single list(for example 2 * x * x + 5 * x + 3), and correctly prioritize the operations so as to give a correct derivative, without modifying the original deriv procedure.
After going through a lot of dead-end solutions(many of which involved playing around with make-sum and make-product to read ahead and see what else is in the list but were cumbersome and lacking in "elegance") I finally figured out an elegant solution that makes sense.
Here's the initial code that the new data must be matched to:
To achieve it, the "detector functions" (in this case sum? and product?) must be changed to figure if there is a corresponding operation anywhere in the top level.
With this
x * 5 + x * 4
Would be detected as a sum (it would also be detected as a product, but it is derivs job to make sure the operations are handled in the right order).
We want it to be correctly handled as
(x * 5) + (x * 4)
To this end, we redefine addend and augend as:
addend => everything before the +
augend => everything after the +
In the same way we handle mutiplier and multiplicand(reusing the overlapping code of course)
The make-product and make-sums are barely changed. The order of the outputted list is about all that changes:
The other stuff provided by the problem needed for the program to work:
As a test of all this, passing some derivations:
As to further improvements... Well there is still a lot of work that could be done on simplifying the outputs.
Overall though I am very happy with this solution and would be comfortable adding more operations to deriv(simply add the most "top-level" ops at the top, and they will then properly calculate their subs before operating on them)
In 2.58 the task is to change this to in-fix form (going with the last example 2 + 5). In addition, it should be able to handle several different operations in a single list(for example 2 * x * x + 5 * x + 3), and correctly prioritize the operations so as to give a correct derivative, without modifying the original deriv procedure.
After going through a lot of dead-end solutions(many of which involved playing around with make-sum and make-product to read ahead and see what else is in the list but were cumbersome and lacking in "elegance") I finally figured out an elegant solution that makes sense.
Here's the initial code that the new data must be matched to:
(define (deriv exp var) (cond ((number? exp) 0) ((variable? exp) (if (same-variable? exp var) 1 0)) ((sum? exp) (make-sum (deriv (addend exp) var) (deriv (augend exp) var))) ((product? exp) (make-sum (make-product (multiplier exp) (deriv (multiplicand exp) var)) (make-product (deriv (multiplier exp) var) (multiplicand exp)))) (else (error "unknown expression type -- DERIV" exp))))
To achieve it, the "detector functions" (in this case sum? and product?) must be changed to figure if there is a corresponding operation anywhere in the top level.
(define (has-operation? x target) (cond ((and (pair? x) (pair? (cdr x)) (eq? (cadr x) target)) #t) ((and (pair? x) (pair? (cdr x))) (sum? (cddr x))) (else #f))) (define (sum? x) (has-operation? x '+)) (define (product? x) (has-operation? x '*))
With this
x * 5 + x * 4
Would be detected as a sum (it would also be detected as a product, but it is derivs job to make sure the operations are handled in the right order).
We want it to be correctly handled as
(x * 5) + (x * 4)
To this end, we redefine addend and augend as:
addend => everything before the +
augend => everything after the +
(define (addend s) (before-op s '+)) (define (augend s) (after-op s '+)) (define (before-op s target) (define (list-before s target) (if (or (null? s) (eq? (car s) target)) '() (cons (car s) (list-before (cdr s) target)))) (clean (list-before s target))) (define (after-op s target) (define (list-after s target) (cond ((null? (cdddr s)) (caddr s)) ((eq? target (cadr s)) (cddr s)) (else (list-after (cddr s) target)))) (clean (list-after s target))) (define (clean target) (cond ((not (pair? target)) target) ((null? (cdr target)) (car target)) (else target)))
In the same way we handle mutiplier and multiplicand(reusing the overlapping code of course)
(define (multiplier p) (before-op p '*)) (define (multiplicand p) (after-op p '*))
The make-product and make-sums are barely changed. The order of the outputted list is about all that changes:
(define (make-sum a1 a2) (cond ((and (number? a1) (number? a2)) (+ a1 a2)) ((=number? a1 0) a2) ((=number? a2 0) a1) (else (list a1 '+ a2)))) (define (make-product m1 m2) (cond ((or (=number? m1 0) (=number? m2 0)) 0) ((=number? m1 1) m2) ((=number? m2 1) m1) ((and (number? m1) (number? m2)) (* m1 m2)) (else (list m1 '* m2))))
The other stuff provided by the problem needed for the program to work:
(define (variable? x) (symbol? x)) (define (=number? exp num) (and (number? exp) (= exp num))) (define (same-variable? v1 v2) (and (variable? v1) (variable? v2) (eq? v1 v2)))
As a test of all this, passing some derivations:
; x^2 + 2x + 5 (deriv '(x * x + (x + x + 5)) 'x) ;Value 120: ((x + x) + 2) ; 2x + 2 ;x^3 + 5x^2 - 3x + 8 (deriv '(x * x * x + x * x * 5 + x * -3 + 8) 'x) ;Value 121: (((x * (x + x)) + (x * x)) + (((x * 5) + (x * 5)) + -3)) ;3x^2 + 10x - 3
As to further improvements... Well there is still a lot of work that could be done on simplifying the outputs.
Overall though I am very happy with this solution and would be comfortable adding more operations to deriv(simply add the most "top-level" ops at the top, and they will then properly calculate their subs before operating on them)
Tuesday, July 13, 2010
Regarding the difficulty of sicp
Sicp(Structure and interpretation of computer programs) if anything has made me aware of weaknesses and strengths.
Getting around lambda notation has been one challenge. It is gradually coming more naturally, but after programming for years mainly in C++ getting used to functions as first order has put my brain for a twist. Several times.
Similarly, I have been really poor about data hiding for a long time. Nowadays I am certainly better about it, but I still have difficulty considering ideas such as flatmap as a black box. I try to think about what is going on inside it rather than "what does flatmap do?", making a relatively simple problem(the queens problem from my last post) a lot more mentally challenging.
On the other hand, working with the picture language described in part 2.2.4 came very naturally. I can only presume this was from working in computer graphics which I studied as part of my computer games development course at university.
Of course the natural thing to do would be to describe all the transformations in matrices...
That being said, despite being an "introductory" textbook, sicp really challenges a lot of preconceived notions and habits, and opens up new ways of thinking. It is certainly challenging, but it is definitely worth taking the time to read(at least so far).
I can't wait to get to the interpreter implementation. If only I could balance more time into it from all the other reading I'm doing now(Programming Pearls just arrived but it may be a while till I get around to that).
Getting around lambda notation has been one challenge. It is gradually coming more naturally, but after programming for years mainly in C++ getting used to functions as first order has put my brain for a twist. Several times.
Similarly, I have been really poor about data hiding for a long time. Nowadays I am certainly better about it, but I still have difficulty considering ideas such as flatmap as a black box. I try to think about what is going on inside it rather than "what does flatmap do?", making a relatively simple problem(the queens problem from my last post) a lot more mentally challenging.
On the other hand, working with the picture language described in part 2.2.4 came very naturally. I can only presume this was from working in computer graphics which I studied as part of my computer games development course at university.
Of course the natural thing to do would be to describe all the transformations in matrices...
That being said, despite being an "introductory" textbook, sicp really challenges a lot of preconceived notions and habits, and opens up new ways of thinking. It is certainly challenging, but it is definitely worth taking the time to read(at least so far).
I can't wait to get to the interpreter implementation. If only I could balance more time into it from all the other reading I'm doing now(Programming Pearls just arrived but it may be a while till I get around to that).
Sunday, July 11, 2010
Sicp exercise 2.42
Got bitten by a bug tonight... the bug to work a little bit more on sicp, so I went back to the 8 queens problem that had been giving me some hassle before and solved it. It's a fun problem and it seems like the solution I came up for the safety test was a bit different from the usual, so here we go.
For starters, the problem involves figuring out all the possible ways to set down n queens on an n*n chess board in such a manner that no queen is in check from another.
The solution to it involves breaking down the problem by working out the way to set down n-1 queens and trying to place a new queen in all possible positions in the next row, then filtering any out that are poorly placed. This recurses down to 0 queens placed from where it then builds up the queens.
The main routine is provided as part of the problem leaving the solver to work out how to represent boards, and how to check that a set of positions is good.
The code
Auxiliary functions
Here are the auxiliary routines(pretty much as they are in sicp):
Everything else should be in GNU/MIT scheme.
Alternate solutions
Some other solutions can be found here, though I found many of them long-winded and no more clear. I posted my own solution there for good measure.
For starters, the problem involves figuring out all the possible ways to set down n queens on an n*n chess board in such a manner that no queen is in check from another.
The solution to it involves breaking down the problem by working out the way to set down n-1 queens and trying to place a new queen in all possible positions in the next row, then filtering any out that are poorly placed. This recurses down to 0 queens placed from where it then builds up the queens.
The main routine is provided as part of the problem leaving the solver to work out how to represent boards, and how to check that a set of positions is good.
The code
;; Heres a solution with a recursive test based on the number of columns away the
;; queen being tested against is
;; if the distance horizontally is the same as the vertical difference it's not
;; valid
(define (queens board-size)
(define (queen-cols k)
(if (= k 0)
(list empty-board)
(filter
(lambda (positions) (safe? k positions))
(flatmap
(lambda (rest-of-queens)
(map (lambda (new-row)
(adjoin-position new-row k rest-of-queens))
(enumerate-interval 1 board-size)))
(queen-cols (- k 1))))))
(queen-cols board-size))
(define empty-board '())
(define (adjoin-position new-row k rest-of-queens)
(append (list new-row) rest-of-queens))
(define (safe? k positions)
(check-safe
(car positions)
1
(cdr positions)))
;; Check the queens column position against that of all other columns
;; One if vertical distance = horizontal distance it's a diagonal hit
(define (check-safe position distance cols)
(cond ((null? cols) #t)
((= (car cols) position) #f)
((= (- (car cols) distance) position) #f)
((= (+ (car cols) distance) position) #f)
(else (check-safe position (+ distance 1) (cdr cols)))))
Auxiliary functions
Here are the auxiliary routines(pretty much as they are in sicp):
(define (flatmap proc seq)
(accumulate append (list) (map proc seq)))
(define (accumulate op initial sequence)
(if (null? sequence)
initial
(op (car sequence)
(accumulate op initial (cdr sequence)))))
(define (enumerate-interval initial end)
(if (> initial end)
'()
(cons initial (enumerate-interval (+ initial 1) end))))
(define (filter predicate sequence)
(cond ((null? sequence) '())
((predicate (car sequence))
(cons (car sequence)
(filter predicate (cdr sequence))))
(else (filter predicate (cdr sequence)))))
Everything else should be in GNU/MIT scheme.
Alternate solutions
Some other solutions can be found here, though I found many of them long-winded and no more clear. I posted my own solution there for good measure.
Saturday, July 10, 2010
MovableType and Code Complete
Some objectives
I've set up this little whiteboard in the back of my room with a list of things I need to do in the near future. On it I have a few things written like:
I've set up this little whiteboard in the back of my room with a list of things I need to do in the near future. On it I have a few things written like:
- Find a creative writing class
- Set up movable type blog
- Write resume
- Take some time to play with the scheme debugger.
Today I decided to tackle the second one of those. I felt I should really move to movable type. I setup a hosting account, got a domain name, and installed movable type, but while it was nice, I came to the realization that ultimately for what I want to do blogger should be more than up to the job. The time I could've spent making my Movable Type pretty could have been better spent on other things like reading or programming.
Maintaining a blog, and a few static pages. Once I get that resume rewritten, I should have it up here. The advice from Andy Lester's "Find the tech job you love" should come in very useful. I'd heartily recommend it to anyone who wants to keep on top of their game even if they're not currently looking.
Code Complete
Code Complete(by Steve McConnell) is that book I wish I'd read when I started out programming. Steve McConnell(SMC) considers it the first book on code construction, and I think it's only fair to admit that's the case, and it should probably be required reading for any serious programmer.
The book as a whole is highly pragmatic. SMC avoids religious wars and accepts the validity of certain practices while warning strongly about their associated dangers and risks. He thoroughly references everything, covering it with statistics from past research. His advice is solid and can be applied immediately.
Some highlights for me:
- Chapter 9. The Pseudocode Programming Process. This chapter describes a way of writing code by starting out with a general description of what a routine does. From there it is fleshed out into code, and the final result is a well commented routine that makes sense. If you've ever gone back to a piece of code that you wrote a long time ago, trying to figure out what it does, this may well be what you need.
- Chapter 11. The Power of Variable Names. Unfortunately for myself, I don't take much time to think of variable names. I think of the first thing that kind of describes a variable and have no rules for how I lay them out. At least until now.
- Chapter 25. Code-Tuning Strategies. I'm pretty confident with optimization. Chapter 26 (Code-Tuning Techniques) was pretty much all familiar territory. But this chapter really kicked my ass. Until now, I have tended to optimize from the get go, and believed that makes me a superior programmer. An unfortunate attitude which SMC has succeeded in changing my outlook on. To summarize his approach: keep it simple and readable, find the bottlenecks, then fix them. Simple code is going to be easier to maintain in the first place, and optimizing it will certainly be less effort.
- Chapter 28. Managing Construction. This chapter did a good job of describing how I'd like to see software managers work. 28.6. Managing Your Manager was a good laugh, if unfortunate.
- Chapter 33. Personal Character. An excellent breakdown of qualities it takes to become a great programmer. I went on to read Djikstra's the humble programmer afterwards. In a field as complex as software development, accepting that complexity, and breaking down complex issues into simpler parts our simple brains can handle is an important task, and the ability to admit our own lacking there is important.
I don't think it can ever be too late to read this excellent book, and I will be sure to review it from time to time in the future.
Joining the ACM
I joined the ACM not too long ago and it has certainly been worth it so far. Access to a huge backlog of journal articles and back issues of Communications of the ACM is already great, in addition gaining access to a boatload of books and courses is the topping to the cake. The available book collections contain most of the books I have recently bought, which is making me feel just a bit silly. In fact, I left my copy of Code Complete at work after completing it, but in writing this simple opinion piece, I was simply able to refer to the copy I have access to via online.
Tuesday, June 15, 2010
The joy of reading
As it stands I have now worked my way through the pragmatic programmer and peopleware and am at various positions in a number of other books.
I believe neither of these books can be justified by a single reading and I'm realistically going to need to refer to them frequently.
DRY(don't repeat yourself) and orthogonal design, two of the principles of the pragmatic programmer are interestingly enough two significant changes that were coming in about in my coding before reading the book, though I wasn't entirely aware of them. I think a major factor in this was simply reading more of other peoples source, especially from getting involved in open source projects(at the moment I am lightly involved in the HBase project: unfortunately mostly asking questions, but I have made a minor contribution and hope to make more in the future as time affords).
There is a vast richness of other great principles in the book, but these two put to paper affected me most. Code I just knew was not going to be easy to maintain in the future came to mind immediately. If I don't fix it now, I will take a lot more time to fix it later(or even worse someone else will have to because I didn't). The temptation to take a piece of code that does A and copy paste it editing some bits so it can do B can be massive. But later down the line, you'll come back to it, edit A, forget B, C and D, and wonder just what went wrong.
Peopleware: It's really just brilliant. I don't know where to start. I'm quite frankly just not interested in working in management. Maybe in an ideal company it would be interesting. As is, I work primarily as a programmer and my semi-official secondary task is that of team "captain", whatever that may be. Ultimately there are things that need to be done, and if no-one else is going to do them... Well fine. I won't like it, but it's better than no-one doing it.
It's clear the job a of a great manager is really tough. A poor manager can think they're doing a great "job" by bullying people into ridiculous schedules and cutting costs by reducing working space. Peopleware points the inherent flaws in such practices.
For starters, cutting costs on development environments. In a world where maybe 5% of a developers salary will get them a poor work environment and mediocre productivity, investing double that for a quieter environment where work gets done can pay huge dividends. The studies quoted in Peopleware note 2.6* productivity gain in the top quartile as opposed to the bottom. Differences in salary are less than 10% on average, with high salary workers in both quartiles. The big differences lie in sense of privacy, feeling that a working environment is too loud, and simply not having enough space.
Schedules. I don't do much overtime as a matter of personal policy. I have no concerns of productivity due to it. I think I'm getting just as much done, if not more(at least when not interrupted by a constant flow of meetings and requests for this and that). My only concern is the effect on my team. I make significant effort to try and get my team members to just go home, but often find them having worked a couple of hours after I left. Peopleware would have us believe that the cost of this is undertime, and my experience of that is that it's generally accurate.
Another significant problem I have noted with overtime is that people focus entirely on the problem at hand but don't take the time improve their processes because they are working too much overtime.
Now the following is my own crazy idea with little of the actual evidence that peopleware has plenty of:
Even if overtime actually does give a net increase in work done (lets say you can work 10 hours at 90% instead of 8 hours at 100%), there is time wasted there. While Bob doing overtime is staying late to impress managers or some such nonsense, Steve could well be studying to improve his own methods. In fact a really good workplace would probably supply Steve and Bob this time. Anyway, assuming Steve has the spare time and does it as a result, he will be constantly improving, gradually improving his efficiency and speed. Bob will meanwhile be working in the same way as before and eventually Steve's 100% is the equivalent of Bobs 120% and he is flat out plain more productive despite working less.
Are all developers like Steve? Probably not. But the best ones love programming and do work on self-improvement. They are not people any organisation serious about software would want working for them. Denying those that would improve themselves by putting them through overtime through the use of idiotic schedules not only reduces their opinion of their employer, it also increases indifference, loss of enthusiasm and the likelihood of turnover.
I believe neither of these books can be justified by a single reading and I'm realistically going to need to refer to them frequently.
DRY(don't repeat yourself) and orthogonal design, two of the principles of the pragmatic programmer are interestingly enough two significant changes that were coming in about in my coding before reading the book, though I wasn't entirely aware of them. I think a major factor in this was simply reading more of other peoples source, especially from getting involved in open source projects(at the moment I am lightly involved in the HBase project: unfortunately mostly asking questions, but I have made a minor contribution and hope to make more in the future as time affords).
There is a vast richness of other great principles in the book, but these two put to paper affected me most. Code I just knew was not going to be easy to maintain in the future came to mind immediately. If I don't fix it now, I will take a lot more time to fix it later(or even worse someone else will have to because I didn't). The temptation to take a piece of code that does A and copy paste it editing some bits so it can do B can be massive. But later down the line, you'll come back to it, edit A, forget B, C and D, and wonder just what went wrong.
Peopleware: It's really just brilliant. I don't know where to start. I'm quite frankly just not interested in working in management. Maybe in an ideal company it would be interesting. As is, I work primarily as a programmer and my semi-official secondary task is that of team "captain", whatever that may be. Ultimately there are things that need to be done, and if no-one else is going to do them... Well fine. I won't like it, but it's better than no-one doing it.
It's clear the job a of a great manager is really tough. A poor manager can think they're doing a great "job" by bullying people into ridiculous schedules and cutting costs by reducing working space. Peopleware points the inherent flaws in such practices.
For starters, cutting costs on development environments. In a world where maybe 5% of a developers salary will get them a poor work environment and mediocre productivity, investing double that for a quieter environment where work gets done can pay huge dividends. The studies quoted in Peopleware note 2.6* productivity gain in the top quartile as opposed to the bottom. Differences in salary are less than 10% on average, with high salary workers in both quartiles. The big differences lie in sense of privacy, feeling that a working environment is too loud, and simply not having enough space.
Schedules. I don't do much overtime as a matter of personal policy. I have no concerns of productivity due to it. I think I'm getting just as much done, if not more(at least when not interrupted by a constant flow of meetings and requests for this and that). My only concern is the effect on my team. I make significant effort to try and get my team members to just go home, but often find them having worked a couple of hours after I left. Peopleware would have us believe that the cost of this is undertime, and my experience of that is that it's generally accurate.
Another significant problem I have noted with overtime is that people focus entirely on the problem at hand but don't take the time improve their processes because they are working too much overtime.
Now the following is my own crazy idea with little of the actual evidence that peopleware has plenty of:
Even if overtime actually does give a net increase in work done (lets say you can work 10 hours at 90% instead of 8 hours at 100%), there is time wasted there. While Bob doing overtime is staying late to impress managers or some such nonsense, Steve could well be studying to improve his own methods. In fact a really good workplace would probably supply Steve and Bob this time. Anyway, assuming Steve has the spare time and does it as a result, he will be constantly improving, gradually improving his efficiency and speed. Bob will meanwhile be working in the same way as before and eventually Steve's 100% is the equivalent of Bobs 120% and he is flat out plain more productive despite working less.
Are all developers like Steve? Probably not. But the best ones love programming and do work on self-improvement. They are not people any organisation serious about software would want working for them. Denying those that would improve themselves by putting them through overtime through the use of idiotic schedules not only reduces their opinion of their employer, it also increases indifference, loss of enthusiasm and the likelihood of turnover.
Monday, June 7, 2010
Change of plan
Been busy with other books since the arrival of an amazon shipment. Busier than I was with sicp alone.
I originally set deadlines to discipline myself, but have found that it's resulted in negative behavior such as:
* Not taking the time to look into concepts I don't fully pick up on
* Going for sometimes sub-par solutions to exercises.
* An inefficient study pattern.
So I'm not going to do them any more(nor have I been for the last week or so).
Sicp is still on the menu, but with a big pile of other books to go through it may take a while.
Here's what I got. I'll do some write-ups as I finish reading them.
* Land the tech job you love. I finished this one already. Great advice for interviewing and writing a cv, As well as pitfalls to avoid. I'll be putting together a new resume soon based on its advice and posting that up.
* The pragmatic programmer. Only just got started on this one, but so far it's brilliant. Many of the concepts I have been doing... Halfheartedly. I have been applying DRY("don't repeat yourself") to most of my work, but when deadlines come close or I just want to finish something up... Out comes the copy and paste. In the end it's never worth it. Formalising these ideas in my head is sure to be a great help.
* Code complete: next in line after the pragmatic programmer. I should have read this one years ago.
* Facts and fallacies of software engineering. Good train reading as each fact is pretty snappy and to the point. However the fact that half his references are coming from himself is a matter of some concern. 25% added complexity adding 100% workload is a concept that I have experienced time and time again, and having it stated clearly really helps to be more aware of it.
* How to win friends and influence people. This one's courtesy of Joel Spolsky's reading list. I'm pretty much a geek and so I often get frustrated with non-programmers and their inability to understand what seem like simple things. This attitude is counter-productive though so I'm working on improving it. As cheesy as this book may seem, it gives some very practical advice that should be obvious but isn't.
* Rapid development. Another frequently recommended McConnell tome. I look forward to getting to this too though for now it lies in my shelf awaiting its turn.
* Head first design patterns. Another one waiting its turn.
* Out of the Crisis Next in line for more "businessy" stuff after I get done with Drucker.
More are in the mail as we speak, including peopleware.
Refinding the joy of reading has been great so far, but there is a lot of material I've waited for too long to get around to. This should be interesting.
I originally set deadlines to discipline myself, but have found that it's resulted in negative behavior such as:
* Not taking the time to look into concepts I don't fully pick up on
* Going for sometimes sub-par solutions to exercises.
* An inefficient study pattern.
So I'm not going to do them any more(nor have I been for the last week or so).
Sicp is still on the menu, but with a big pile of other books to go through it may take a while.
Here's what I got. I'll do some write-ups as I finish reading them.
* Land the tech job you love. I finished this one already. Great advice for interviewing and writing a cv, As well as pitfalls to avoid. I'll be putting together a new resume soon based on its advice and posting that up.
* The pragmatic programmer. Only just got started on this one, but so far it's brilliant. Many of the concepts I have been doing... Halfheartedly. I have been applying DRY("don't repeat yourself") to most of my work, but when deadlines come close or I just want to finish something up... Out comes the copy and paste. In the end it's never worth it. Formalising these ideas in my head is sure to be a great help.
* Code complete: next in line after the pragmatic programmer. I should have read this one years ago.
* Facts and fallacies of software engineering. Good train reading as each fact is pretty snappy and to the point. However the fact that half his references are coming from himself is a matter of some concern. 25% added complexity adding 100% workload is a concept that I have experienced time and time again, and having it stated clearly really helps to be more aware of it.
* How to win friends and influence people. This one's courtesy of Joel Spolsky's reading list. I'm pretty much a geek and so I often get frustrated with non-programmers and their inability to understand what seem like simple things. This attitude is counter-productive though so I'm working on improving it. As cheesy as this book may seem, it gives some very practical advice that should be obvious but isn't.
* Rapid development. Another frequently recommended McConnell tome. I look forward to getting to this too though for now it lies in my shelf awaiting its turn.
* Head first design patterns. Another one waiting its turn.
* Out of the Crisis Next in line for more "businessy" stuff after I get done with Drucker.
More are in the mail as we speak, including peopleware.
Refinding the joy of reading has been great so far, but there is a lot of material I've waited for too long to get around to. This should be interesting.
Tuesday, May 25, 2010
Work, sicp, and management
Sometimes priorities have to kick in.
The reality of most workplaces is that management can be frustrating. So I'm putting sicp to the side for a week while I read up on management... In Japanese, courtesy of P.F.Druckers "Management"
Why in Japanese? One reason is that it meant I could pick it up straight away at the bookstore.
But more than that, by learning these concepts in Japanese, I hope to be able to also communicate them to coworkers and... managers.
This being said, I have no intent to stop at Drucker. I already have a preconceived notion against one of his principles, that of "management by objectives". While on paper it is a good and motivational idea, my experience of it as a software engineer is receiving objectives I have little to no ability to control(such as sales figures) or that are plainly delusional(timelines set by people with little experience of developing complex systems).
Should time be on my side, I hope to be able to put together notes of my thoughts in this blog for later perusing. After that I hope to work through Deming's Out of the crisis" which should include his 14 points.
Hopefully in a couple more weeks things at work can settle down and I can focus more on what I want to be doing rather than what I need to do to be able to do what I want to do in the future.
The reality of most workplaces is that management can be frustrating. So I'm putting sicp to the side for a week while I read up on management... In Japanese, courtesy of P.F.Druckers "Management"
Why in Japanese? One reason is that it meant I could pick it up straight away at the bookstore.
But more than that, by learning these concepts in Japanese, I hope to be able to also communicate them to coworkers and... managers.
This being said, I have no intent to stop at Drucker. I already have a preconceived notion against one of his principles, that of "management by objectives". While on paper it is a good and motivational idea, my experience of it as a software engineer is receiving objectives I have little to no ability to control(such as sales figures) or that are plainly delusional(timelines set by people with little experience of developing complex systems).
Should time be on my side, I hope to be able to put together notes of my thoughts in this blog for later perusing. After that I hope to work through Deming's Out of the crisis" which should include his 14 points.
Hopefully in a couple more weeks things at work can settle down and I can focus more on what I want to be doing rather than what I need to do to be able to do what I want to do in the future.
Sunday, May 16, 2010
Busy week
I really wanted to get a lot more done this week, but final result is that still not done with chapter 2.2
Getting started during a holiday at least helped get some speed... Now I need another to maintain it.
Here's 2.41 just for the hell of it
Also some definitions that might be needed:
Getting started during a holiday at least helped get some speed... Now I need another to maintain it.
Here's 2.41 just for the hell of it
(define (triples n)
(flatmap (lambda (i)
(map (lambda (j) (cons i j))
(unique-pairs (- i 1))))
(enumerate-interval 1 n)))
(define (s-triples target max)
(filter (lambda (x) (= target (accumulate + 0 x))) (triples max)))
Also some definitions that might be needed:
(define (unique-pairs n)
(flatmap (lambda (i)
(map (lambda (j) (list i j))
(enumerate-interval 1 (- i 1))))
(enumerate-interval 1 n)))
(define (enumerate-interval start n)
(define (enumerate x)
(if (< n x)
null
(cons x (enumerate (+ 1 x)))))
(enumerate start))
(define (filter predicate sequence)
(cond ((null? sequence) null)
((predicate (car sequence))
(cons (car sequence)
(filter predicate (cdr sequence))))
(else (filter predicate (cdr sequence)))))
Monday, May 10, 2010
Progress slowing down
A couple of hours and only managed to get 3 exercises done cleanly today... Bad day perhaps? Getting easily distracted. Solution to 2.35 just because it's really cool:
count-leaves as an accumulation:
And for reference, accumulate is:
Also an alternative map function from one of the other exercises:
Mind-bending, but fun stuff.
Edit: well, I was going to sleep, but the next exercise looked interesting too, so I figured it out much quicker(finally getting the hang of using lambda?!)
sicp exercise 3.36:
count-leaves as an accumulation:
(define (count-leaves t)
(accumulate + 0 (map (lambda (x)
(if (pair? x)
(count-leaves x)
1))
t)))
And for reference, accumulate is:
(define (accumulate op initial sequence)
(if (null? sequence)
initial
(op (car sequence)
(accumulate op initial (cdr sequence)))))
Also an alternative map function from one of the other exercises:
(define (map p sequence)
(accumulate (lambda (x y) (cons (p x) y)) nil sequence))
Mind-bending, but fun stuff.
Edit: well, I was going to sleep, but the next exercise looked interesting too, so I figured it out much quicker(finally getting the hang of using lambda?!)
sicp exercise 3.36:
(define (accumulate-n op init seqs)
(if (null? (car seqs))
null
(cons (accumulate op init (accumulate (lambda (x y) (cons (car x) y)) null seqs))
(accumulate-n op init (accumulate (lambda (x y) (cons (cdr x) y)) null seqs)))))
Sunday, May 9, 2010
Week 2 results
Finished up section 2.1 and working away through section 2.2, a long one.
I generally found section one tougher, probably just because the mathematical proofs which it's been a while since I did.
Next week there's no more golden week holidays so I'm just going to make sure I at least finish up 2.2, and if possible get some progress into 2.3
I generally found section one tougher, probably just because the mathematical proofs which it's been a while since I did.
Next week there's no more golden week holidays so I'm just going to make sure I at least finish up 2.2, and if possible get some progress into 2.3
Wednesday, May 5, 2010
2.13
Pardon the lazy formatting(time spent writing blog = time not spent figuring stuff out!)
I've been using Ken's answers to check mine but unfortunately it turned out his answer for 2.13 was wrong(upon checking against real numbers).
The resultant tolerance of two intervals is going to be
tab=(ta+tb)/(1+ta*tb)
Edit: as it turns out, the question is "Assuming small tolerances". Which means that ta*tb is close to 0 or negligible, which would of course mean that
tab = ta + tb
I still think the way I figured it should be easier to follow so I'm going to leave this up
You can check it by just making a couple of intervals using the procedure from exercise 2.12, multiplying them and getting the percent:
> (define a (make-center-percent 10 20))
> (define b (make-center-percent 25 5)
> a
(8 . 12)
> b
(23 3/4 . 26 1/4)
> (define ab (mul-interval a b))
> ab
(190 . 315)
> (percent ab)
24 76/101
so we have 20 and 5 -> 24 76/101
(0.2+0.5)/(1+0.2*0.05)
->0.24752475247524752475247524752475
Here's what I figured for exercise 2.13:
So long as percentage not over 100% All numbers stay +
int1 * int2
Result is:
lower: a-min * b-min
upper: a-max * b-max
tolerance1 = (a-max - a-min)/2 / (a-max - (a-max-a-min)/2)
lets call (a-max - a-min)/2 the "variance"
tolerance = variance/(int-max - variance)
in resultant interval ab the interval is
(a-min * b-min , a-max * b-max)
the variance is thus
(a-max*b-max - a-min*b-min)/2
and tolerance is
((a-max*b-max - a-min*b-min)/2) / (a-max*b-max - ((a-max*b-max - a-min*b-min)/2)))
We can also say
a-max = center-a * (1+tolerance-a)
a-min = center-b * (1-tolerance-a)
So tolerance a*b is:
calling tolerance-a=ta and tolerance-b=tb for simplicity
(removing center-a*center-b as everything shares it
((1+tolerance-a) * (1+tb) - (1-ta) * (1-tb) / 2) /
(1+ta) * (1+tb) - ((1+ta) * (1+tb) - (1-ta) * (1-tb) / 2))
just analyzing the top pert we get:
((1+ta) * (1+tb) - (1-ta) * (1-tb) / 2)
=> ((1 - 1) + (ta*tb - ta*tb) + ta + tb + ta + tb) /2
=> ta + tb
substituting back to full equation:
ta+tb / ((1+ta+tb+ta*tb) - (ta+tb))
=> ta+tb / (1+ta*tb)
Sicp exercise 2.11
I have no intention of posting all my solutions since I would prefer to spend the time figuring them out and studying over making them look pretty and understandable...
Anyone looking for solutions should check out one of the many other blogs that have solved this issue, such as Ken Dyck's
or (in progress) Bill the Lizard
That being said, my solution to exercise 2.11 is pretty convoluted, and I'm not too happy with Kens one either... If you have suggestion, comment!
Anyone looking for solutions should check out one of the many other blogs that have solved this issue, such as Ken Dyck's
or (in progress) Bill the Lizard
That being said, my solution to exercise 2.11 is pretty convoluted, and I'm not too happy with Kens one either... If you have suggestion, comment!
(define (int-type x)
(cond ((> (lower-bound x) 0) 1)
((< (upper-bound x) 0) -1)
(else 0)))
(define (mul-interval x y)
(let ((x-type (int-type x))
(y-type (int-type y)))
(cond ((and (= x-type 1) (= y-type 1))
(make-interval (* (lower-bound x) (lower-bound y))
(* (upper-bound x) (upper-bound y))))
((and (= x-type 1) (= y-type 0))
(make-interval (* (upper-bound x) (lower-bound y))
(* (upper-bound x) (upper-bound y))))
((and (= x-type 1) (= y-type -1))
(make-interval (* (upper-bound x) (lower-bound y))
(* (lower-bound x) (upper-bound y))))
((and (= x-type 0) (= y-type 1))
(make-interval (* (lower-bound x) (upper-bound y))
(* (upper-bound x) (upper-bound y))))
((and (= x-type 0) (= y-type 0))
(let ((p1 (* (lower-bound x) (lower-bound y)))
(p2 (* (lower-bound x) (upper-bound y)))
(p3 (* (upper-bound x) (lower-bound y)))
(p4 (* (upper-bound x) (upper-bound y))))
(make-interval (min p1 p2 p3 p4)
(max p1 p2 p3 p4))))
((and (= x-type 0) (= y-type -1))
(make-interval (* (upper-bound x) (lower-bound y))
(* (lower-bound x) (lower-bound y))))
((and (= x-type -1) (= y-type 1))
(make-interval (* (lower-bound x) (upper-bound y))
(* (upper-bound x) (lower-bound y))))
((and (= x-type -1) (= y-type 0))
(make-interval (* (lower-bound x) (upper-bound y))
(* (lower-bound x) (lower-bound y))))
((and (= x-type -1) (= y-type -1))
(make-interval (* (upper-bound x) (upper-bound y))
(* (lower-bound x) (lower-bound y))))
(else (error "This shouldn't happen!")))))
Monday, May 3, 2010
Week 2
Going to get cracking on section 2 sicp, but take my time so I can catch up on some of the lecture notes and videos that I had to pass just to get through the content of the book.
Completing 2.1 is definite, with spare time it'd be good to get 2.2 done as well
Completing 2.1 is definite, with spare time it'd be good to get 2.2 done as well
Sunday, May 2, 2010
Section 1: done
I certainly set a high bar for the first week and barely scraped through.
I also cheated slightly by skipping a couple of exercises (1.45 which I was just too tired to experiment with, and one of the math proofs)
Here's my programs for the final exercise
It's been enjoyable, but considering the quantity of exercises, I really had to barge through it. I'm going to take the next week easier so I can review some of the lectures.
Posting next sundays target tomorrow.
I also cheated slightly by skipping a couple of exercises (1.45 which I was just too tired to experiment with, and one of the math proofs)
Here's my programs for the final exercise
1.46:
(define (iterative-improve good-enough improve)
(lambda (guess)
(define (iter new-guess)
(if (good-enough new-guess)
new-guess
(iter (improve new-guess))))
(iter guess)))
(define (sqrt x)
(define (good-enough guess)
(display guess)
(newline)
(if (< (abs (- (square guess) x))
0.001)
true
false))
(define (improve guess)
(average guess (/ x guess)))
((iterative-improve good-enough improve) x))
for fixed point:
(define (fixed-point f)
(define tolerance 0.00001)
(define (good-enough guess)
(let ((new-guess (f guess)))
(if (< (abs (- new-guess guess)) tolerance)
true
false)))
(define (improve guess)
((average-damp f) guess))
(iterative-improve good-enough improve))
(define (average-damp f)
(lambda (x) (average x (f x))))
> ((fixed-point cos) 1.0)
0.7390885809390573
> (define (fixed-sqrt x)
((fixed-point (lambda (y) (/ x y)))
1.0))
> (fixed-sqrt 10)
3.162277665175675
It's been enjoyable, but considering the quantity of exercises, I really had to barge through it. I'm going to take the next week easier so I can review some of the lectures.
Posting next sundays target tomorrow.
Deadline approaching
Welp, I've cleared all the exercises up to and including 1.39
One subsection of 1.3 remains.
But before hitting that, I have to say I am impressed with scheme.
Procedures that return dynamically generated procedures!? That's just awesome. I've heard that MIT actually took scheme off the curriculum last year, which is a real pity. There's no doubt this stuff is difficult. Mapping out in my head how I'm going to put down an iterative process where each step depends on the previous step is interesting. The moment of realizing the solution is awesome.
In this particular case I'm referring to the iterative solution of 1.37: it involves implementing a function to calculate the k-term finite continued fraction and can be found here: http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-12.html#%_sec_1.3
My solution:
N and D are procedures that should return the nth operands.
start at the end(where we have Nk/Dk), work backwards.
(define (cont-frac-iter n d k)
(define (iter a result)
(if (< a 0)
result
(iter (- a 1) (/(n a) (+ (d a) result)))))
(iter k 0))
I don't think I'd use it for actual projects unless there are some far better tools and libraries out there. But while I was pretty confident in my programming abilities, this is making me think of all kind of fresh things. Awesome book.
Friday, April 30, 2010
2 am on friday night
2 am friday night and I'm finally towards the end of section 1.2.
Section 1.1 seemed "not too bad" so I thought section one in a week would be pretty ok.
It hasn't been.
But it has been most interesting. Despite having been excellent at math in high school, including all the pure math, so much of it is long forgotten. From the depths of my mind gradually proof by induction crawls back out. Ancient methods from 7 years ago(ha! ancient my ass).
Scheme itself is ok, apart from the fact that nothing ever works because of the strict parsing.
(+a b)? Ooops.
If someone is going to learn scheme, it should probably be their first language. A lot of stuff makes plenty of sense, it's just that to someone that normally programs c, it has some nasty gotchas. Just gotta do exercise 1.28 and then onward to section 1.3! So exciting!
Wednesday, April 28, 2010
Non-programming things to do in the near future
Take an introductory course in microeconomics.
Take a course in creative writing.
But where? That is something to answer once I'm done with sicp
Tuesday, April 27, 2010
Ackermann's function funtime
So this weeks assignment, to get through section 1, seems like a chunky enough piece of work.
Especially when doing the exercises.
First contact with scheme has not been friendly. Lisp is an ugly language, and all those brackets are throwing me for more of a spin than pointers ever did.
But that aside, I got horribly side-tracked by the MIT definition of the ackermann function and the wikipedia one.
edit: they don't give the same results, and thank god for that because my expensions seemed to be all wrong.
MIT:
(define (A x y)
(cond ((= y 0) 0)
((= x 0) (* 2 y))
((= y 1) 2)
(else (A (- x 1)
(A x (- y 1))))))
The wikipedia method(converted to lisp for comparison)
(define (A m n)
(cond ((= m 0) (+ n 1))
((= n 0) (A (- m 1) 1))
(else (A (- m 1)
(A m (- n 1)))))
Should someone happen to be reading this that can offer some insight on this, please do let me know what's changed
Monday, April 26, 2010
Week 1: Back to basics
I mentioned in my first post the perils of java schools as a source of inspiration. So really, what better way to get started on this than the book mentioned in it, "Structure and Interpretation of Computer Programs"
So without further ado, it is there I shall start, week 1, Section 1, see how that goes and work out future pacing from there.
A beginning.
I'm happy with my abilities as a programmer. I'm not arrogant to think of myself as one of the best programmers to grace this earth, but I think all things told, that I'm pretty good. I've been doing it as a hobby since I was young, and now I do it at work.
But that can't be enough. Why settle at that when there is far more to do? There's certainly no harm in trying to be as good as possible
So while this quest doesn't exactly begin here, here I shall begin my chronicle thereof. I'll be keeping track of my personal objectives, and posting musings.
This blog is primarily something I'm starting for myself, but should someone else be doing something similar, I'd love to hear their experiences and exchange stories. Perhaps the objectives I set for myself may give others ideas. Perhaps others have good ideas for objectives. Perhaps 5 years from now it will have grown into something entirely different.
A couple of rules that I intend to play by:
1) No work stuff. It's (mostly) proprietary... unfortunately.
2) No personal stuff. This is about programming.
3) No holy wars. Because really, most sides have some valid points.
Finally, the inspiration to start this blog came from http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html . I've learnt C, understand pointers and recursion. But there is so much I haven't learned. There are hundreds of good volumes out there, and there is a lot I hope to learn while writing this blog.
Subscribe to:
Posts (Atom)