Kodama's home / tips.

Japanese description.

Histogram class for Ruby

Download: histogram.rb

This class prints a character-based (text) histogram.

To get the top n data items, use the Ruby Sized Priority Queue class.
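
For readers without the Sized Priority Queue class at hand, the idea of keeping only the top n values can be sketched as follows. The class name TopN and its methods are hypothetical and are not the author's class or its API; this is a minimal stand-in built on Array#max.

```ruby
# Hypothetical stand-in, NOT the author's Sized Priority Queue class:
# keeps only the n largest values pushed so far.
class TopN
  def initialize(n)
    @n = n
    @items = []
  end

  # Add a value, then trim back to the n largest (descending order).
  def push(value)
    @items << value
    @items = @items.max(@n)
    self
  end

  def to_a
    @items.dup
  end
end

top = TopN.new(3)
[12, 80, 5, 47, 103, 33].each { |len| top.push(len) }
p top.to_a  # prints [103, 80, 47]
```

Re-sorting on every push is simple but not fast; a real priority queue would avoid that cost for large inputs.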

Example: line length in a text file

require "histogram"
h = Histogram.new(0, 99, 5)  # lower bound / upper bound / class width
readlines.each { |l| h.push(l.size) }  # scan lines; l.size includes the trailing newline
h.report  # print figure

Result: statistics of the file "cgi-lib.rb" in the Ruby library

$ ruby sample.rb < cgi-lib.rb 
  0-.:----+----+----+----+----+----+----+----+----+----+----+----+----+- 66
  5-.:----+----+----+----+----+----+ 30
 10-.:----+----+ 10
 15-.:----+----+----+--- 18
 20-.:----+----+----+----+----+-- 27
 25-.:----+----+----+----+-- 22
 30-.:----+----+----+----+- 21
 35-.:----+----+ 10
 40-.:----+---- 9
 45-.:----+-- 7
 50-.:----+ 5
 55-.:----+----+-- 12
 60-.:----+ 5
 65-.:----+----+---- 14
 70-.:----+- 6
 75-.:---- 4
 80-.:-- 2
 85-.: 0
 90-.: 0
 95-.:-- 2
number: 270, average: 25.733333
standard deviation: 23.516488, coefficient of variation: 0.913853
range: 103.000000, range/average: 4.002591
variance: 553.025185, skewness: 0.838520, kurtosis: 2.856878
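
The bar format in the figure above (one character per count, with "+" at every fifth position) can be reproduced with a few lines of Ruby. The helper name bar_row is hypothetical and is not taken from histogram.rb; this is only a sketch of the layout as it appears in the output.

```ruby
# Hypothetical helper, not from histogram.rb: draws one row in the
# style of the report shown above.
def bar_row(lower, count)
  # Position i (1-based) is "+" at every fifth character, "-" otherwise.
  bar = (1..count).map { |i| (i % 5).zero? ? "+" : "-" }.join
  format("%3d-.:%s %d", lower, bar, count)
end

puts bar_row(15, 18)  # prints " 15-.:----+----+----+--- 18"
puts bar_row(85, 0)   # prints " 85-.: 0"
```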

Note

In this example, a value that is out of bounds is pushed into the nearest class,
e.g.
h.push(200)  # pushed into the class "95-".
h.push(-200)  # pushed into the class "0-".
This behavior can be changed. See the method "push" in the source code for details.
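
The "nearest class" rule described in this note can be sketched as an index computation followed by a clamp. The function class_index is hypothetical and is not the code of "push" in histogram.rb; it only illustrates the rule under the assumption that classes are numbered from 0.

```ruby
# Hypothetical helper, not taken from histogram.rb: maps a value to a
# class index, clamping out-of-range values into the first or last class.
def class_index(value, lower, upper, width)
  last = (upper - lower) / width         # index of the last class (integer division)
  idx = ((value - lower) / width).floor  # raw index, may fall outside 0..last
  idx.clamp(0, last)
end

p class_index(200, 0, 99, 5)   # prints 19, the class "95-"
p class_index(-200, 0, 99, 5)  # prints 0, the class "0-"
p class_index(23, 0, 99, 5)    # prints 4, the class "20-"
```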
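The following is an English translation of the Japanese description.

Creating a histogram with Ruby

A histogram (frequency distribution chart, bar chart) can be viewed as text on the console. The point is that everything is done in text, without using images.

In addition, the mean, variance, higher-order moments, and so on are computed.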
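
The summary statistics named in the report (average, variance, skewness, kurtosis, and so on) can be sketched as below. This assumes population moments (division by n), non-excess kurtosis, and range taken as maximum minus minimum; the source does not say which definitions histogram.rb uses, and the function name summary is hypothetical.

```ruby
# Sketch only: population (divide-by-n) moments. histogram.rb may use
# different definitions; the name "summary" is hypothetical.
def summary(data)
  n = data.size.to_f
  mean = data.sum / n
  # k-th central moment: average of (x - mean)**k
  moment = ->(k) { data.sum { |x| (x - mean)**k } / n }
  variance = moment.(2)
  sd = Math.sqrt(variance)
  {
    number: data.size,
    average: mean,
    variance: variance,
    standard_deviation: sd,
    coefficient_of_variation: sd / mean,
    range: (data.max - data.min).to_f,
    skewness: moment.(3) / sd**3,
    kurtosis: moment.(4) / variance**2
  }
end

s = summary([2, 4, 4, 4, 5, 5, 7, 9])
printf("number: %d, average: %f\n", s[:number], s[:average])
# prints "number: 8, average: 5.000000"
printf("variance: %f, skewness: %f, kurtosis: %f\n",
       s[:variance], s[:skewness], s[:kurtosis])
# prints "variance: 4.000000, skewness: 0.656250, kurtosis: 2.781250"
```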

