Wednesday, 21 December 2016

Laws in Physics vs. in Computer Science

The laws of physics and nature do not change; we just have to discover them. Fully explaining and proving a law takes a lot of imagination and thorough study. A beautiful and rewarding consequence is that we can later use the laws to predict phenomena, or simply what will happen next. An example is Fermat's principle, which says that light always travels along the path that takes the least time. From it you can infer, for example, that because the atmosphere is denser than the vacuum of space, the sunlight reaching your eye is bent on its way. So when you watch a sunset, the Sun is in reality already below the horizon, even though it is still visible.
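Fermat's principle can be checked numerically in its simplest setting: light going from a point in one medium to a point in another across a flat boundary. The sketch below is my own illustration with made-up refractive indices and distances. It searches for the crossing point that minimizes travel time and then compares n1 * sin(theta1) with n2 * sin(theta2), which is Snell's law of refraction. The atmosphere has no sharp boundary (its density changes gradually), so there the light bends continuously rather than at a single point, but the same principle is at work.

```ruby
# Time for light to go from A = (0, h1) in medium n1 to B = (d, -h2) in
# medium n2 when it crosses the flat boundary y = 0 at (x, 0). The speed in a
# medium is c / n, so the time is proportional to n times the path length
# (the constant factor 1 / c is dropped).
def travel_time(x, n1, n2, h1, h2, d)
  n1 * Math.hypot(x, h1) + n2 * Math.hypot(d - x, h2)
end

# Ternary search for the crossing point with the least travel time; this
# works because the travel time is a convex function of x on [0, d].
def fastest_crossing(n1, n2, h1, h2, d, iterations = 200)
  lo = 0.0
  hi = d.to_f
  iterations.times do
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if travel_time(m1, n1, n2, h1, h2, d) < travel_time(m2, n1, n2, h1, h2, d)
      hi = m2
    else
      lo = m1
    end
  end
  (lo + hi) / 2
end

n1, n2 = 1.0, 1.33          # roughly air above, water below (illustrative values)
h1, h2, d = 1.0, 1.0, 2.0
x = fastest_crossing(n1, n2, h1, h2, d)
sin1 = x / Math.hypot(x, h1)           # sine of the angle to the normal, upper medium
sin2 = (d - x) / Math.hypot(d - x, h2) # sine of the angle to the normal, lower medium
puts format("n1*sin1 = %.6f, n2*sin2 = %.6f", n1 * sin1, n2 * sin2)
```

The two products agree, and the fastest crossing point lies past the midpoint (x > 1): the light takes a longer stretch through the faster upper medium, which is exactly the bending.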

In computer science, on the other hand, there are no true, unchanging laws. System design keeps changing, often in surprising ways, because every designer has different preferences and can overturn even commonly accepted "good design choices". One example is a mismatch between the OS page size and the DBMS page size, 4 KB and 8 KB respectively. The database may fetch 8 KB pages from random locations, for example during an index-based scan. The OS, however, sees the application (the DBMS in this case) read 2 pages in a row (two 4 KB OS pages make up the 8 KB page fetched by the DBMS), assumes that this must be a sequential scan, and prefetches another two 4 KB pages, 16 KB in total. The second 8 KB are fetched in vain: those 2 pages will not be read next, because the DBMS goes on to fetch its next 8 KB page from a totally different location.
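The scenario can be replayed with a toy readahead model. This is a sketch under my own assumptions, not how any real kernel decides to prefetch: the OS counts consecutive 4 KB page requests, and once the run reaches `threshold` pages it prefetches the next `prefetch` pages. The DBMS page numbers are made up to stand in for a random index scan.

```ruby
OS_PAGE_KB = 4
DB_PAGE_KB = 8
OS_PER_DB  = DB_PAGE_KB / OS_PAGE_KB  # each 8 KB DBMS page spans 2 OS pages

# Toy readahead model (an assumption, not a real kernel algorithm): once
# `threshold` consecutive OS pages have been requested, prefetch the next
# `prefetch` pages. Returns the number of prefetched OS pages never requested.
def wasted_prefetch(db_pages, threshold, prefetch = 2)
  # Translate every DBMS page read into the OS page numbers it touches.
  requests = db_pages.flat_map { |p| (0...OS_PER_DB).map { |i| p * OS_PER_DB + i } }
  prefetched = []
  last = nil
  run = 0
  requests.each do |page|
    run = (last && page == last + 1) ? run + 1 : 1  # length of the current sequential run
    last = page
    prefetched.concat((1..prefetch).map { |i| page + i }) if run == threshold
  end
  (prefetched - requests).size
end

index_scan = [10, 3, 42, 17]  # hypothetical random DBMS page numbers
[2, 3].each do |t|
  waste = wasted_prefetch(index_scan, t)
  puts "threshold #{t}: #{waste} pages (#{waste * OS_PAGE_KB} KB) prefetched in vain"
end
```

With a threshold of 2 every 8 KB read looks like the start of a sequential scan, so each one drags in another 8 KB that is never used (32 KB wasted for 32 KB actually read). With a threshold of 3 the random reads never trigger readahead, while a genuinely sequential scan still does, just one page later.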

Systems could have a built-in ability to adjust themselves (probably through built-in machine learning components) that would mimic nature and improve automatically over time, for example by raising the number of consecutive pages that makes the OS assume a sequential scan from 2 to 3. Computer environments should imitate the natural environment better.
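A very simple form of such self-adjustment, far short of machine learning, is a feedback rule: watch how many prefetched pages turn out to be wasted and raise the threshold when most of them are. The sketch below uses a toy readahead model (my own assumption, not a real OS mechanism), and `tune_threshold`, its 50% cut-off, and the workloads are all hypothetical.

```ruby
OS_PER_DB = 2  # an 8 KB DBMS page covers two 4 KB OS pages

# Toy readahead model (assumed, not a real kernel algorithm). Returns
# [pages_prefetched, pages_wasted] for one batch of DBMS page reads.
def simulate(db_pages, threshold, prefetch = 2)
  requests = db_pages.flat_map { |p| (0...OS_PER_DB).map { |i| p * OS_PER_DB + i } }
  prefetched = []
  last = nil
  run = 0
  requests.each do |page|
    run = (last && page == last + 1) ? run + 1 : 1
    last = page
    prefetched.concat((1..prefetch).map { |i| page + i }) if run == threshold
  end
  [prefetched.size, (prefetched - requests).size]
end

# Hypothetical feedback rule: after each batch, if more than half of the
# prefetched pages were never read, require one more sequential page.
def tune_threshold(batches, threshold = 2, max_threshold = 8)
  batches.each do |db_pages|
    issued, wasted = simulate(db_pages, threshold)
    threshold += 1 if issued > 0 && wasted * 2 > issued && threshold < max_threshold
  end
  threshold
end

random_batches = [[10, 3, 42, 17], [5, 90, 33], [61, 7, 28]]  # made-up index-scan reads
puts tune_threshold(random_batches)
```

For this random workload the rule settles on 3 after the first batch, while a purely sequential workload leaves the threshold at 2 because its prefetched pages are all used.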

Thursday, 27 October 2016

$ fortune

$ fortune
Real software engineers work from 9 to 5, because that is the way the job is
described in the formal spec.  Working late would feel like using an
undocumented external procedure.

Monday, 10 October 2016

Thought of the day from Quora

"Get a good education, find someone you love, do something you're good at and don't stop until you're the greatest at it. That way, you don't care what people think of your life or your pictures, you can just live your life the way you've always dreamt." Feynman: "What do you care what other people think?"

Thursday, 22 September 2016

A PhD student has to write blog posts regularly, so here you go.

"PhD is a career like running your own business." I swung by an event for new incoming students at UChicago and heard this sentence. I was determined to learn a few more things from my fellow students about how I should live my PhD life. I had read a couple of pages in a book on how to write academic papers, and one piece of advice was that you should call your supervisor your advisor. The same point was repeated in the talk, so it seems that in this matter the nomenclature really is important. Of course, it is not only your advisor who can teach you new things but also the other students around you. The only requirement is that they are interested in a topic similar to yours. The database group at UChicago is growing rapidly, and we hope that our new chair will be willing to help with this endeavor. It would be too cumbersome to weave the thoughts I heard today into a story, so let me just enumerate them and comment briefly on each.

  1. You get what people think you should get. This is very true. We have to master the skill of communication, especially about what we work on, what our research is about, and what is in it for you. Giving a talk is always a challenge for me, but I love it. I'm always thrilled and sometimes even a little bit nervous; however, once I am on the stage, knowing that my material is well prepared, and start communicating with the audience, there is no better time in my life, seriously.
  2. Don't go off track. Communicate with your advisor frequently and make sure that you both agree on the next steps. I certainly don't want to frustrate my advisor. This is more about how you want to run your business and what your vision is for an excellent new approach/technique/method/algorithm/theory/experiment. Just make sure that your advisor agrees with you, and remember to calibrate your research trajectory.
  3. Spend some time improving yourself. I know that I must work more on math and English. It is math that helps me stay sane in this unstructured world of words, thoughts, and opinions, and keeps my own thinking on the right track. At the same time, you must speak, read, and write English, and do it perfectly; otherwise you lose a lot while living in the US or other English-speaking countries.
  4. Write a technical blog. I'll write more technical stuff, I promise. It'll be on compression, Spark, Parquet (and other data formats), data migration, and some other databases (MonetDB), and I will comment on some papers.
  5. This blog has to be keyword-searchable; I'm glad it is.
  6. Revise your blog posts from time to time.
  7. Endurance and self-motivation.
  8. Take ownership of your life and your research.
  9. Manage yourself as if you were a superstar.
  10. Act thoughtfully.
  11. Act intelligently.
  12. Act as a leader.
That's all for now.

Friday, 8 July 2016

San Francisco from another perspective

If you are going to San Francisco, bring a smile with you. This city simply asks you to admire its beautiful, sunny days. This vibrant place lets you experience an amazing mixture of tastes, from the posh Financial District to sad places brimming with homeless people. You can do traditional city sightseeing or visit special places, such as the California Academy of Sciences or the de Young Museum in Golden Gate Park. The crucial point is this: it's all about people. Ask them what the best place to visit is, and you'll get a basket of tips and tricks :)

Transamerica Pyramid - tall, landmark building shaped like a narrow pyramid (view from Hyatt Regency in the Financial District).

The Sun penetrating the Financial District.
Cable car tracks with a steep hill in the background.
Manually operated cable car (the icon of the city and the last such relic in the world).
This is not the Golden Gate Bridge but the San Francisco–Oakland Bay Bridge. It also looks marvelous.
A homeless person at one of the BART stations.

Gorgeous butterfly in Golden Gate Park (California Academy of Sciences).

Check your paper for IEEE compliance through the IEEE PDF eXpress site

I found that the easiest way to make your PDF compliant with the IEEE requirements is simply to print it to another PDF! This works like a charm. I ran into a few problems, which are listed in the table below. I tried removing the URLs, and I even tried several hacks. However, all you have to do is take the document generated by LaTeX, go to File->Print...->Print to file (choose another location/name), and voila, it works! :)

If the report says:
    Possible cause(s):

Bookmarks found in document
    (1) User inserted bookmarks into the PDF.
Document contains security
    (2a) User applied security to some or all elements of the PDF.
    (2b) If source is Microsoft Word: PDFMaker is set to apply security to all PDFs.
Font ### is not embedded
    (3a) The font file does not exist on the system that created the PDF.
    (3b) The font is not being found by Distiller.
    (3c) The font is not embeddable.
    (3d) Using Office 2003 or 2007.
Font ### is not subsetted
    (4a) PDF conversion options are not set correctly.
    (4b) Headers/footers were added using an Acrobat 6 function (or another application).
    (4c) Using Office 2003 or 2007.
Document contains link annotations
    (5) User has applied a link to text in the document.
Document contains form fields
    (6) User has used form fields in the document.
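To check the two font rows locally before uploading, one option (separate from the print-to-file trick, and not something PDF eXpress itself prescribes) is poppler's `pdffonts` command, which prints one line per font with `emb` and `sub` columns saying whether the font is embedded and subsetted. The helper below is a sketch that assumes the column layout of recent poppler versions and reads the columns from the right, because font types such as `Type 1` contain spaces. `font_problems` is a hypothetical name, and the sample text is illustrative, not output from a real paper.

```ruby
# Illustrative sample in the pdffonts layout (not output from a real paper).
SAMPLE = <<-OUT
name                                 type              encoding         emb sub uni object ID
------------------------------------ ----------------- ---------------- --- --- --- ---------
ABCDEF+CMR10                         Type 1            Builtin          yes yes no       4  0
Helvetica                            Type 1            Custom           no  no  no      12  0
OUT

# Collects the fonts that pdffonts reports as not embedded or not subsetted.
# Columns are read from the right (emb sub uni object-number generation),
# because the type column can contain spaces, e.g. "Type 1".
def font_problems(pdffonts_output)
  result = { :not_embedded => [], :not_subsetted => [] }
  pdffonts_output.lines.drop(2).each do |row|   # skip the header and the dashed rule
    tokens = row.split
    next if tokens.size < 6
    name = tokens.first
    result[:not_embedded] << name if tokens[-5] == "no"
    result[:not_subsetted] << name if tokens[-4] == "no"
  end
  result
end

p font_problems(SAMPLE)
```

For a real paper, pass in the captured command output, e.g. `font_problems(%x(pdffonts paper.pdf))`, where paper.pdf is a placeholder name.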

Thursday, 7 April 2016

Thought of the day

Friday, 15 January 2016

Another Ruby lesson

adam@gaia:~$ ruby hello.rb
hello.rb:2:in `<main>': undefined method `wait' for Thread:Class (NoMethodError)
adam@gaia:~$ ruby hello.rb
In another thread
adam@gaia:~$ ruby hello.rb
In thread 0In thread 5In thread 1In thread 9
In thread 4In thread 2In thread 8In thread 3
In thread 7In thread 6

adam@gaia:~$ ruby hello.rb
In thread 0
In thread 3
In thread 4
In thread 6
In thread 1
In thread 2
In thread 9
In thread 5
In thread 7
In thread 8
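The first error says that the `Thread` class has no method `wait`; the usual way to wait for a thread is the instance method `Thread#join`, and `Thread#value` joins the thread and also returns the block's result. Threads run in no guaranteed order, which is why the runs above print the thread numbers shuffled and, in one run, several messages on the same line. hello.rb itself is not shown, so the following is my own minimal sketch of the idea rather than the original file. It sidesteps the interleaving by having each thread return its message and printing everything from the main thread.

```ruby
# Start `count` threads; each one returns its message instead of printing it.
def run_threads(count)
  threads = (0...count).map { |i| Thread.new(i) { |n| "In thread #{n}" } }
  # Thread#value waits for the thread to finish (like #join) and returns the
  # block's result, so the array comes back in creation order.
  threads.map(&:value)
end

puts run_threads(10)
```

The threads may still finish in any order, but the output is always "In thread 0" through "In thread 9", one per line.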
adam@gaia:~$ irb -r ./hello.rb
irb(main):001:0> pt_1
NameError: undefined local variable or method `pt_1' for main:Object
    from (irb):1
    from /usr/bin/irb:12:in `<main>'
irb(main):002:0> load('hello.rb')
=> true
irb(main):003:0> pt_1
NameError: undefined local variable or method `pt_1' for main:Object
    from (irb):3
    from /usr/bin/irb:12:in `<main>'
irb(main):004:0> class Point
irb(main):005:1>   attr_accessor :x, :y
irb(main):006:1> end
=> nil
irb(main):008:0* class Line
irb(main):009:1>   attr_accessor :pt_1, :pt_2
irb(main):010:1> end
=> nil
irb(main):012:0* pt_1 =
=> #<Point:0x000000014ea100>
irb(main):013:0> pt_1.x = 10
=> 10
irb(main):014:0> pt_1.y = 20
=> 20
irb(main):016:0* pt_2 =
=> #<Point:0x000000014a38e0>
irb(main):017:0> pt_2 = 50
=> 50
irb(main):018:0> pt_2 = 100
=> 100
irb(main):020:0* line =
=> #<Line:0x00000001496550>
irb(main):021:0> line.pt_1 = pt_1
=> #<Point:0x000000014ea100 @x=10, @y=20>
irb(main):022:0> line.pt_2 = pt_2
=> 100
irb(main):023:0> pt_1
=> #<Point:0x000000014ea100 @x=10, @y=20>
irb(main):024:0> line
=> #<Line:0x00000001496550 @pt_1=#<Point:0x000000014ea100 @x=10, @y=20>, @pt_2=100>
irb(main):025:0> line.pt_1
=> #<Point:0x000000014ea100 @x=10, @y=20>
irb(main):026:0> line.pt_1.x
=> 10
irb(main):027:0> line.pt_1.y
=> 20
irb(main):028:0> line.pt_2.c
NoMethodError: undefined method `c' for 100:Fixnum
    from (irb):28
    from /usr/bin/irb:12:in `<main>'
irb(main):029:0> line.pt_2.x
NoMethodError: undefined method `x' for 100:Fixnum
    from (irb):29
    from /usr/bin/irb:12:in `<main>'
irb(main):030:0> line.pt_1.y
=> 20
irb(main):031:0> class Point
irb(main):032:1>   attr_accessor :x
irb(main):034:1*   def y
irb(main):035:2>     x + 10
irb(main):036:2>   end
irb(main):037:1> end
=> nil
irb(main):039:0* class Line
irb(main):040:1>   attr_accessor :pt_1, :pt_2
irb(main):041:1> end
=> nil
irb(main):043:0* pt_1 =
=> #<Point:0x0000000123e000>
irb(main):044:0> pt_1.x = 10
=> 10
irb(main):045:0> pt_1.y = 20
=> 20
irb(main):047:0* pt_2 =
=> #<Point:0x0000000122cee0>
irb(main):048:0> pt_2 = 50
=> 50
irb(main):049:0> pt_2 = 100
=> 100
irb(main):051:0* line =
=> #<Line:0x000000017189e0>
irb(main):052:0> line.pt_1 = pt_1
=> #<Point:0x0000000123e000 @x=10, @y=20>
irb(main):053:0> line.pt_2 = pt_2
=> 100
irb(main):054:0> line.pt_1.y
=> 20
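Two things in this session are easy to miss. First, `pt_2 = 50` and `pt_2 = 100` do not set coordinates; they rebind the local variable `pt_2` to an Integer, which is why `line.pt_2` ends up as `100` and `line.pt_2.x` raises NoMethodError for Fixnum. Setting an attribute needs an explicit receiver, as in `pt_2.x = 50`. Second, local variables assigned at the top level of a file pulled in with `irb -r` or `load` stay local to that file, so irb cannot see them; hello.rb is not shown, but if it created `pt_1` as a local variable, that would explain the NameError at the start. The session as a corrected script, assuming 50 and 100 were meant as the x and y of the second point:

```ruby
class Point
  attr_accessor :x, :y   # defines the readers x, y and the writers x=, y=
end

class Line
  attr_accessor :pt_1, :pt_2
end

pt_1 = Point.new
pt_1.x = 10
pt_1.y = 20

pt_2 = Point.new
pt_2.x = 50    # explicit receiver: calls Point#x=
pt_2.y = 100   # a bare `pt_2 = 100` would replace the Point with an Integer

line = Line.new
line.pt_1 = pt_1
line.pt_2 = pt_2

p line.pt_2.x  # prints 50 instead of raising NoMethodError
```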
adam@gaia:~$ irb -r ./hello.rb
ArgumentError: wrong number of arguments (0 for 2)
    from /home/adam/hello.rb:2:in `initialize'
    from (irb):1:in `new'
    from (irb):1
    from /usr/bin/irb:12:in `<main>'
irb(main):002:0>, 20).x
=> 10
irb(main):003:0>, 20).y
=> 20
irb(main):004:0>, 20).adam
Called adam but I only respond to [:method_missing, :x, :x=, :y, :y=]
=> nil
irb(main):005:0> load('hello.rb')
=> true
irb(main):006:0>, 20).x
=> 10
irb(main):007:0>, 20).y
=> 20
irb(main):008:0>, 20).adam
Called adam but I only respond to [:!, :!=, :!~, :<=>, :==, :===, :=~, :__id__, :__send__, :class, :clone, :define_singleton_method, :display, :dup, :enum_for, :eql?, :equal?, :extend, :freeze, :frozen?, :hash, :initialize_clone, :initialize_dup, :inspect, :instance_eval, :instance_exec, :instance_of?, :instance_variable_defined?, :instance_variable_get, :instance_variable_set, :instance_variables, :is_a?, :kind_of?, :method, :method_missing, :methods, :nil?, :object_id, :private_methods, :protected_methods, :public_method, :public_methods, :public_send, :respond_to?, :respond_to_missing?, :send, :singleton_class, :singleton_methods, :taint, :tainted?, :tap, :to_enum, :to_s, :trust, :untaint, :untrust, :untrusted?, :x, :x=, :y, :y=]
=> nil
irb(main):009:0> str = "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14"
=> "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14"
irb(main):010:0> puts str
Adam 6
Brian 10
Carol 13
Brian 14
Adam 23
Carol 14
=> nil
irb(main):011:0> str.lines
=> #<Enumerator: "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14":lines>
irb(main):012:0> str.lines.to_a
=> ["Adam 6\n", "Brian 10\n", "Carol 13\n", "Brian 14\n", "Adam 23\n", "Carol 14"]
irb(main):013:0> { |line| line.split }
=> [["Adam", "6"], ["Brian", "10"], ["Carol", "13"], ["Brian", "14"], ["Adam", "23"], ["Carol", "14"]]
irb(main):014:0> { |line| line.split }.map { |name, score| [name, score.to_i] }
=> [["Adam", 6], ["Brian", 10], ["Carol", 13], ["Brian", 14], ["Adam", 23], ["Carol", 14]]
irb(main):015:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| name }
=> {"Adam"=>[["Adam", 6], ["Adam", 23]], "Brian"=>[["Brian", 10], ["Brian", 14]], "Carol"=>[["Carol", 13], ["Carol", 14]]}
irb(main):016:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| score }
=> {6=>[["Adam", 6]], 10=>[["Brian", 10]], 13=>[["Carol", 13]], 14=>[["Brian", 14], ["Carol", 14]], 23=>[["Adam", 23]]}
irb(main):017:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| name[0] }
=> {"A"=>[["Adam", 6], ["Adam", 23]], "B"=>[["Brian", 10], ["Brian", 14]], "C"=>[["Carol", 13], ["Carol", 14]]}
irb(main):018:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| name }
=> {"Adam"=>[["Adam", 6], ["Adam", 23]], "Brian"=>[["Brian", 10], ["Brian", 14]], "Carol"=>[["Carol", 13], ["Carol", 14]]}
irb(main):019:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| name }.map { |name, name_and_scores| name_and_scores }
=> [[["Adam", 6], ["Adam", 23]], [["Brian", 10], ["Brian", 14]], [["Carol", 13], ["Carol", 14]]]
irb(main):020:0> { |line| line.split }.map { |name, score| [name, score.to_i] }.group_by { |name, score| name }.map { |name, name_and_scores| }
=> [[6, 23], [10, 14], [13, 14]]
irb(main):021:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| }
=> [[6, 23], [10, 14], [13, 14]]
irb(main):022:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| }
=> [29, 24, 27]
irb(main):023:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [] }
=> [[29], [24], [27]]
irb(main):024:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }
=> [["Adam", 29], ["Brian", 24], ["Carol", 27]]
irb(main):025:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.to_h
NoMethodError: undefined method `to_h' for [["Adam", 29], ["Brian", 24], ["Carol", 27]]:Array
    from (irb):25
    from /usr/bin/irb:12:in `<main>'
irb(main):026:0> { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.to_hash
NoMethodError: undefined method `to_hash' for [["Adam", 29], ["Brian", 24], ["Carol", 27]]:Array
    from (irb):26
    from /usr/bin/irb:12:in `<main>'
irb(main):027:0> RUBY_VERSION
=> "1.9.3"
irb(main):028:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }]
=> {"Adam"=>29, "Brian"=>24, "Carol"=>27}
irb(main):029:0> str
=> "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14"
irb(main):030:0> str
=> "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14"
irb(main):031:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }]
=> {"Adam"=>29, "Brian"=>24, "Carol"=>27}
irb(main):032:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }].max_by(&:last)
=> ["Adam", 29]
irb(main):033:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.sort_by(&:last)]
=> {"Brian"=>24, "Carol"=>27, "Adam"=>29}
irb(main):034:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.sort_by(&:last).reverse]
=> {"Adam"=>29, "Carol"=>27, "Brian"=>24}
irb(main):035:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.sort_by(&:last).reverse.lazy]
NoMethodError: undefined method `lazy' for [["Adam", 29], ["Carol", 27], ["Brian", 24]]:Array
    from (irb):35
    from /usr/bin/irb:12:in `<main>'
irb(main):036:0> Hash[ { |name, score| [name, score.to_i] }.group_by(&:first).map { |name, name_and_scores| [name,] }.sort_by(&:last).reverse]
=> {"Adam"=>29, "Carol"=>27, "Brian"=>24}
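The second irb session exercised two more ideas: a `Point` whose constructor takes two arguments and whose `method_missing` reports unknown calls, and a chain of Enumerable methods that sums each person's scores. hello.rb is not shown, so the class below is my own guess at a minimal equivalent (it returns the message instead of printing it). The pipeline sticks to methods available on Ruby 1.9.3, the `RUBY_VERSION` reported in the session: `Array#to_h` only arrived in Ruby 2.1 and `Enumerable#lazy` in Ruby 2.0, which is why those calls failed, and `Hash[...]` is the 1.9-compatible replacement.

```ruby
class Point
  attr_accessor :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Ruby calls method_missing when no method with that name exists. Catching
  # every name like this is only for demonstration: it also hides typos.
  def method_missing(name, *args)
    "Called #{name} but I only respond to #{self.class.instance_methods(false).sort}"
  end
end

str = "Adam 6\nBrian 10\nCarol 13\nBrian 14\nAdam 23\nCarol 14"

# "Adam 6\n" becomes ["Adam", 6]
pairs = str.lines.map { |line| line.split }.map { |name, score| [name, score.to_i] }

# Sum the scores per name; inject(:+) instead of Array#sum, which needs Ruby 2.4.
totals = pairs.group_by(&:first).map do |name, name_and_scores|
  [name, name_and_scores.map(&:last).inject(:+)]
end

# Hash[...] instead of Array#to_h, which needs Ruby 2.1.
ranking = Hash[totals.sort_by(&:last).reverse]
p ranking                  # Adam 29, Carol 27, Brian 24 (highest first)
p totals.max_by(&:last)    # ["Adam", 29]
```

`Point.new(10, 20).x` returns 10, `Point.new` without arguments raises ArgumentError as in the session, and `Point.new(10, 20).adam` returns the "Called adam but I only respond to ..." message.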