Setting up HBase

Tuesday, May 6th, 2008 at 14:35

Author: chua | May be reproduced; when reproducing, you must credit the original source, the author, and the copyright notice with a hyperlink
URL: http://www.meichua.com/archives/45.html

Reference: http://hadoop.apache.org/hbase/docs/r0.1.1/api/overview-summary.html

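These steps build on an HDFS installation that is already running.
1: Edit hadoop/contrib/hbase/conf/hbase-env.sh
and set the JAVA_HOME path.

2: Edit hadoop/contrib/hbase/conf/hbase-site.xml and add the following inside the <configuration> element: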

  <property>
    <name>hbase.master</name>
    <value>10.0.4.121:11100</value>
    <description>The host and port that the HBase master runs at.</description>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.0.4.121:10100/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
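
The hbase.rootdir value is an HDFS URI, so its host and port are expected to match the NameNode address of the existing HDFS. The following is a minimal sketch, using only java.net.URI, of how that value breaks down; the class and method names are hypothetical and the address is simply the one from the configuration above.

```java
import java.net.URI;

// Sketch only: splits an hbase.rootdir value into its parts. Not part of HBase.
class RootDirSketch {

    // Returns "host:port" of the HDFS NameNode named in the URI.
    static String nameNodeAddress(String rootDir) {
        URI uri = URI.create(rootDir);
        return uri.getHost() + ":" + uri.getPort();
    }

    // Returns the directory inside HDFS where HBase keeps its data.
    static String hbasePath(String rootDir) {
        return URI.create(rootDir).getPath();
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://10.0.4.121:10100/hbase";
        System.out.println(nameNodeAddress(rootDir));
        System.out.println(hbasePath(rootDir));
    }
}
```

If the printed address differs from the one HDFS is actually listening on, the region servers will likely fail to reach the shared directory.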

3: Start HBase

hadoop/contrib/hbase/bin/start-hbase.sh

4: Work in the shell, following http://wiki.apache.org/hadoop/Hbase/HbaseShell

4.1 Enter the shell

 hadoop/contrib/hbase/bin/hbase shell

4.2 Create a table

 CREATE TABLE offer(image_big,image_small);

4.3 Insert, query, and delete data
For example:

  INSERT INTO offer(image_big:,image_small:) VALUES ('abcdefg','abc') WHERE row = 'testinsert';
  INSERT INTO offer(image_big:,image_small:) VALUES ('hijklmn','hij') WHERE row = 'testinsert';
  INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('abcdefg','path_big','abc','path_small') WHERE row = 'testinsert';
  INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('hijklmn','path_big','hij','path_small') WHERE row = 'testinsert';
 
  SELECT * FROM offer WHERE row = 'testinsert';

The query returns:

 +------------------------+-------------------------+
 | Column                 | Cell                    |
 +------------------------+-------------------------+
 | image_big:             | hijklmn                 |
 +------------------------+-------------------------+
 | image_big:content      | hijklmn                 |
 +------------------------+-------------------------+
 | image_big:path         | path_big                |
 +------------------------+-------------------------+
 | image_small:           | hij                     |
 +------------------------+-------------------------+
 | image_small:content    | hij                     |
 +------------------------+-------------------------+
 | image_small:path       | path_small              |
 +------------------------+-------------------------+

Counting the rows:

 SELECT count(*) FROM offer WHERE row = 'testinsert';

It returns:

 1 row(s) in set. (0.02 sec)

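As shown above, although we ran four INSERT statements, the count is 1. They all target the same row, and HBase overwrites a cell when the same row and column are written again: insert 2 overwrites insert 1, and insert 4 overwrites insert 3. INSERT therefore behaves like an update, and the shell documentation likewise shows that HQL provides no UPDATE statement.
At this point the stored data should look like this: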

 +------------+-------------------------------+-----------------------------+
 |            |       Column image_big        |     Column image_small      |
 |    key     +---------+----------+----------+-----+----------+------------+
 |            | :       | :content | :path    | :   | :content | :path      |
 +------------+---------+----------+----------+-----+----------+------------+
 | testinsert | hijklmn | hijklmn  | path_big | hij | hij      | path_small |
 +------------+---------+----------+----------+-----+----------+------------+
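
The overwrite behaviour can be pictured with a toy model in which a table maps a row key to a map of column names and values. This is only a sketch of the idea, not HBase code; the class and method names are hypothetical, and it deliberately ignores timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model only: a table as row key -> (column name -> value). This is not HBase code.
class OverwriteSketch {

    // Writes each value under its column; an existing (row, column) cell is replaced.
    static void insert(Map<String, Map<String, String>> table, String row,
                       String[] columns, String[] values) {
        Map<String, String> cells = table.computeIfAbsent(row, k -> new TreeMap<>());
        for (int i = 0; i < columns.length; i++) {
            cells.put(columns[i], values[i]);
        }
    }

    // Replays the four INSERT statements from the shell session.
    static Map<String, Map<String, String>> replay() {
        Map<String, Map<String, String>> table = new TreeMap<>();
        String[] bare = {"image_big:", "image_small:"};
        String[] full = {"image_big:content", "image_big:path",
                         "image_small:content", "image_small:path"};
        insert(table, "testinsert", bare, new String[] {"abcdefg", "abc"});
        insert(table, "testinsert", bare, new String[] {"hijklmn", "hij"});
        insert(table, "testinsert", full, new String[] {"abcdefg", "path_big", "abc", "path_small"});
        insert(table, "testinsert", full, new String[] {"hijklmn", "path_big", "hij", "path_small"});
        return table;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> table = replay();
        System.out.println(table.size() + " row(s)");
        table.get("testinsert").forEach((column, value) -> System.out.println(column + " = " + value));
    }
}
```

In this model the four inserts leave a single row with six columns, each holding the value from the later of the two inserts that wrote it.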

What happens if the INSERT statements also specify a TIMESTAMP?

  DELETE * FROM offer WHERE row = 'testinsert';
 
  INSERT INTO offer(image_big:,image_small:) VALUES ('abcdefg','abc') WHERE row = 'testinsert' timestamp '1209982310285';
  INSERT INTO offer(image_big:,image_small:) VALUES ('hijklmn','hij') WHERE row = 'testinsert' timestamp '1209982311285';
  INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('abcdefg','path_big','abc','path_small') WHERE row = 'testinsert' timestamp '1209982312285';
  INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('hijklmn','path_big','hij','path_small') WHERE row = 'testinsert' timestamp '1209982313285';

Whether the query is

  SELECT * FROM offer WHERE row = 'testinsert';

or

  SELECT * FROM offer WHERE row = 'testinsert' timestamp '1209982310285';

the result is the same in both cases:

  +-------------------------+----------------------+
  | Column                  | Cell                 |
  +-------------------------+----------------------+
  | image_big:              | hijklmn              |
  +-------------------------+----------------------+
  | image_big:content       | hijklmn              |
  +-------------------------+----------------------+
  | image_big:path          | path_big             |
  +-------------------------+----------------------+
  | image_small:            | hij                  |
  +-------------------------+----------------------+
  | image_small:content     | hij                  |
  +-------------------------+----------------------+
  | image_small:path        | path_small           |
  +-------------------------+----------------------+

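This puzzled me. According to the HBase architecture description, cells carry a timestamp and data is versioned by time, so how should this result be understood?
The thread at http://www.mail-archive.com/core-user@hadoop.apache.org/msg00222.html suggests that this is not supported yet, although my timestamped inserts succeeded. My own understanding is that row and timestamp both act as index keys that sit outside the data itself, so it is fine that they are not displayed. But the data appears to have been overwritten. Is it simply not supported yet?
First, run a DELETE: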

  DELETE * FROM offer WHERE row = 'testinsert';

Then run a SELECT:

  SELECT * FROM offer WHERE row = 'testinsert';

It now returns:

  +-------------------------+----------------------+
  | Column                  | Cell                 |
  +-------------------------+----------------------+
  | image_big:              | abcdefg              |
  +-------------------------+----------------------+
  | image_big:content       | abcdefg              |
  +-------------------------+----------------------+
  | image_big:path          | path_big             |
  +-------------------------+----------------------+
  | image_small:            | abc                  |
  +-------------------------+----------------------+
  | image_small:content     | abc                  |
  +-------------------------+----------------------+
  | image_small:path        | path_small           |
  +-------------------------+----------------------+

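This unexpected finding shows that older versions of the data are in fact kept; the earlier queries simply did not retrieve them. The timestamp condition in SELECT appears to have no effect, and every query returns the latest version. The architecture document states that when an INSERT carries no timestamp, the system applies the current time by default.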
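
One way to picture what was observed is a cell that keeps several timestamped versions, with reads returning the newest one and a delete removing only that newest one. The following is a toy model under that assumption, not HBase's implementation; the class name and the getAsOf method (what a timestamp-qualified read might look like) are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: one cell holding several timestamped versions. This is not HBase code.
class VersionedCellSketch {

    // Timestamps are sorted in descending order, so the first entry is the newest version.
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns the newest version, or null if the cell is empty.
    String getLatest() {
        Map.Entry<Long, String> entry = versions.firstEntry();
        return entry == null ? null : entry.getValue();
    }

    // Returns the newest version written at or before the given timestamp, or null.
    String getAsOf(long timestamp) {
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    // Removes only the newest version, which exposes the previous one.
    void deleteLatest() {
        versions.pollFirstEntry();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(1209982312285L, "abcdefg");
        cell.put(1209982313285L, "hijklmn");
        System.out.println(cell.getLatest()); // prints "hijklmn"
        cell.deleteLatest();
        System.out.println(cell.getLatest()); // prints "abcdefg"
    }
}
```

The two timestamps are the ones used for image_big:content in the shell session; after the newest version is removed, the older value becomes visible, which matches the result of the DELETE followed by SELECT.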

5 Accessing HBase from a client
As in the earlier HDFS access example, add hbase-site.xml and the lib jars to the project. The code is as follows:

  package com.chua.hadoop.client;
 
  import java.io.BufferedInputStream;
  import java.io.BufferedOutputStream;
  import java.io.DataInputStream;
  import java.io.File;
  import java.io.FileInputStream;
  import java.io.FileOutputStream;
  import java.io.IOException;
  import java.util.Iterator;
  import java.util.SortedMap;
 
  import org.apache.commons.httpclient.HttpClient;
  import org.apache.commons.httpclient.methods.GetMethod;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HTable;
  import org.apache.hadoop.io.Text;
 
  /**
   * Example client that stores two images in the HBase table "offer" and reads one back.
   * @author chua 2008-5-4 17:03:33
   */
  public class HBase {
 
      /**
       * @param args
       */
      public static void main(String[] args) throws Exception {
          String domain = "www.dlog.cn";
          String path_s = "/uploads/m/me/meichua/meichua_100.jpg";
          String path_b = "/uploads/m/me/meichua/200804/22094433_tLuyw.jpg";
          byte[] data_s = getData(domain, path_s);
          byte[] data_b = getData(domain,path_b);
 
          HBaseConfiguration config = new HBaseConfiguration();
          HTable table = new HTable(config, new Text("offer"));
          createRecord(table,"chua","image_big",data_b,path_b);
          createRecord(table,"chua","image_small",data_s,path_s);
 
          // Fetch all data for one row and iterate over its column names (the keySet)
          SortedMap map = table.getRow(new Text("chua"));
          if(!map.isEmpty()) {
              Iterator it = map.keySet().iterator();
              while(it.hasNext()){
                  System.out.println(it.next());
              }
          }
          // Fetch the data stored under one column name of a row
          byte[] data = table.get(new Text("chua"), new Text("image_big:content"));
          saveAsFile(data,"c:/chua_big.jpg");
      }
 
      // Writes the image bytes and the image path into one column family of the given row
      public static void createRecord(HTable table,String row, String column,byte[] data, String path) throws IOException {
          long lockId = table.startUpdate(new Text(row));
          table.put(lockId, new Text(column+":content"), data);
          table.put(lockId, new Text(column+":path"), path.getBytes());
          table.commit(lockId);
      }
 
      /**
       * Downloads an image over HTTP.
       * @param domain host name to connect to
       * @param path path of the image on that host
       * @return the response body, or null if the download fails
       */
      public static byte[] getData(String domain,String path){
          byte[] dataResource = null;
          GetMethod getMethod = new GetMethod(path);
          try {
              HttpClient client = new HttpClient();
              client.getHostConfiguration().setHost(domain,80,"http");
              int status = client.executeMethod(getMethod);
              if(status == 200) {
                  dataResource = getMethod.getResponseBody();
              }
          } catch(Exception e) {
              System.out.println("Download error: "+e);
          } finally {
              // Always release the connection, even when the request fails
              getMethod.releaseConnection();
          }
          return dataResource;
      }
 
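      /**
       * Reads a local file into a byte array.
       * @param path path of the file to read
       * @return the file contents, or null if reading fails
       */
      public static byte[] getData(String path) {
          File file = new File(path);
          DataInputStream dis = null;
          try {
              dis = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
              // Size the buffer from the file length; available() is only an estimate
              byte[] data = new byte[(int) file.length()];
              // readFully fills the whole buffer; a plain read() may return early
              dis.readFully(data);
              return data;
          } catch (Exception e) {
              e.printStackTrace();
              return null;
          } finally {
              if (dis != null) {
                  try {
                      dis.close();
                  } catch (IOException e) {
                      e.printStackTrace();
                  }
              }
          }
      }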
 
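      /**
       * Writes a byte array to a file.
       * @param data bytes to write; nothing is written if null
       * @param path destination file path
       */
      public static void saveAsFile(byte[] data,String path) {
          if(data != null) {
              BufferedOutputStream out = null;
              try {
                  out = new BufferedOutputStream(new FileOutputStream(path));
                  // Write the whole array in one call instead of byte by byte
                  out.write(data);
              } catch (Exception e) {
                  e.printStackTrace();
              } finally {
                  if (out != null) {
                      try {
                          out.close();
                      } catch (IOException e) {
                          e.printStackTrace();
                      }
                  }
              }
          }
      }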
  }

Output:
image_big:content
image_big:path
image_small:content
image_small:path
The above is a fairly simple example of a client accessing HBase.
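
Each name in the output has the form family:qualifier, which is how the client code builds its column names (column + ":content", column + ":path"). As a small illustration, the sketch below splits such names at the first colon and groups the qualifiers by family. The class and method names are hypothetical and it uses plain strings rather than the HBase Text type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: splits "family:qualifier" column names. Not part of the HBase API.
class ColumnNameSketch {

    // Returns the part before the first colon.
    static String family(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in column name: " + column);
        }
        return column.substring(0, colon);
    }

    // Returns the part after the first colon (empty for a bare "family:").
    static String qualifier(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in column name: " + column);
        }
        return column.substring(colon + 1);
    }

    // Groups qualifiers under their family, keeping families in sorted order.
    static Map<String, List<String>> groupByFamily(List<String> columns) {
        Map<String, List<String>> byFamily = new TreeMap<>();
        for (String column : columns) {
            byFamily.computeIfAbsent(family(column), k -> new ArrayList<>()).add(qualifier(column));
        }
        return byFamily;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList(
                "image_big:content", "image_big:path",
                "image_small:content", "image_small:path");
        // prints {image_big=[content, path], image_small=[content, path]}
        System.out.println(groupByFamily(columns));
    }
}
```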

6 HBase architecture overview

http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
