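Setting up HBase
Author: chua | Reposting is allowed; when reposting, be sure to cite the original source, the author information, and the copyright notice with a hyperlink
URL: http://www.meichua.com/archives/45.html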
Reference: http://hadoop.apache.org/hbase/docs/r0.1.1/api/overview-summary.html
This setup builds on top of an HDFS installation that already exists.
1: Edit hadoop/contrib/hbase/conf/hbase-env.sh
Add the JAVA_HOME path
2: Edit hadoop/contrib/hbase/conf/hbase-site.xml and add the following

    <property>
      <name>hbase.master</name>
      <value>10.0.4.121:11100</value>
      <description>The host and port that the HBase master runs at.</description>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://10.0.4.121:10100/hbase</value>
      <description>The directory shared by region servers.</description>
    </property>

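Before starting HBase it can help to confirm that the two properties read back as expected. The following is only a sketch built on the JDK's XML parser, not HBase's own configuration loader. The class name SiteConfigCheck and the method parseProperties are invented for illustration, and the sketch assumes the properties sit inside the usual Hadoop-style <configuration> root element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper (not part of HBase): reads <property> name/value pairs.
class SiteConfigCheck {

    // Returns the properties in document order, keyed by <name>.
    static Map<String, String> parseProperties(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<String, String>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a valid configuration document", e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the two properties are wrapped in a <configuration> root.
        String xml = "<configuration>"
                + "<property><name>hbase.master</name><value>10.0.4.121:11100</value></property>"
                + "<property><name>hbase.rootdir</name><value>hdfs://10.0.4.121:10100/hbase</value></property>"
                + "</configuration>";
        Map<String, String> props = parseProperties(xml);
        System.out.println("hbase.master  = " + props.get("hbase.master"));
        System.out.println("hbase.rootdir = " + props.get("hbase.rootdir"));
    }
}
```

The port in hbase.rootdir (10100 here) has to match the port the HDFS namenode actually listens on, so printing the value is a cheap way to spot a mismatch.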
3: Start HBase

    hadoop/contrib/hbase/bin/start-hbase.sh

4: See http://wiki.apache.org/hadoop/Hbase/HbaseShell for the shell operations
4.1 First, enter the shell

    hadoop/contrib/hbase/bin/hbase shell

4.2 Create a table

    CREATE TABLE offer(image_big,image_small);

4.3 Insert, query, and delete data
For example:

    INSERT INTO offer(image_big:,image_small:) VALUES ('abcdefg','abc') WHERE row = 'testinsert';
    INSERT INTO offer(image_big:,image_small:) VALUES ('hijklmn','hij') WHERE row = 'testinsert';
    INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('abcdefg','path_big','abc','path_small') WHERE row = 'testinsert';
    INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('hijklmn','path_big','hij','path_small') WHERE row = 'testinsert';

    SELECT * FROM offer WHERE row = 'testinsert';

The result is:

    +------------------------+-------------------------+
    | Column                 | Cell                    |
    +------------------------+-------------------------+
    | image_big:             | hijklmn                 |
    +------------------------+-------------------------+
    | image_big:content      | hijklmn                 |
    +------------------------+-------------------------+
    | image_big:path         | path_big                |
    +------------------------+-------------------------+
    | image_small:           | hij                     |
    +------------------------+-------------------------+
    | image_small:content    | hij                     |
    +------------------------+-------------------------+
    | image_small:path       | path_small              |
    +------------------------+-------------------------+

    SELECT count(*) FROM offer WHERE row = 'testinsert';

Returns:

    1 row(s) in set. (0.02 sec)

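As shown above, although we ran four INSERT statements, the count is 1: HBase overwrote the cells that share the same row and column. Insert 2 overwrote insert 1 and insert 4 overwrote insert 3, so an INSERT here behaves like an UPDATE. The shell documentation also shows that HQL provides no UPDATE statement.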
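The overwrite behaviour can be pictured with a toy model in which a table is a sorted map from row key to a sorted map from column name to value. This is only a sketch of the observed semantics, not how HBase stores data; the class RowOverwriteSketch and its methods are hypothetical names.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model (not HBase code): row key -> (column name -> latest value).
class RowOverwriteSketch {

    private final SortedMap<String, SortedMap<String, String>> rows =
            new TreeMap<String, SortedMap<String, String>>();

    // Writing to an existing row/column pair replaces the old value.
    void insert(String row, String[] columns, String[] values) {
        SortedMap<String, String> cells = rows.get(row);
        if (cells == null) {
            cells = new TreeMap<String, String>();
            rows.put(row, cells);
        }
        for (int i = 0; i < columns.length; i++) {
            cells.put(columns[i], values[i]);
        }
    }

    SortedMap<String, String> select(String row) {
        return rows.get(row);
    }

    int rowCount() {
        return rows.size();
    }

    // Replays the four INSERT statements from the shell session.
    static RowOverwriteSketch replay() {
        RowOverwriteSketch table = new RowOverwriteSketch();
        String[] bare = {"image_big:", "image_small:"};
        String[] full = {"image_big:content", "image_big:path",
                         "image_small:content", "image_small:path"};
        table.insert("testinsert", bare, new String[] {"abcdefg", "abc"});
        table.insert("testinsert", bare, new String[] {"hijklmn", "hij"});
        table.insert("testinsert", full, new String[] {"abcdefg", "path_big", "abc", "path_small"});
        table.insert("testinsert", full, new String[] {"hijklmn", "path_big", "hij", "path_small"});
        return table;
    }

    public static void main(String[] args) {
        RowOverwriteSketch table = replay();
        System.out.println("rows: " + table.rowCount());
        System.out.println(table.select("testinsert"));
    }
}
```

In this model the four inserts leave a single row with six columns, which matches what SELECT and count(*) returned in the shell.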
At this point the data should look like this:

    +------------+-------------------------------+-----------------------------+
    |            | Column image_big              | Column image_small          |
    | key        +---------+----------+----------+-----+----------+------------+
    |            | :       | :content | :path    | :   | :content | :path      |
    +------------+---------+----------+----------+-----+----------+------------+
    | testinsert | hijklmn | hijklmn  | path_big | hij | hij      | path_small |
    +------------+---------+----------+----------+-----+----------+------------+

What happens if we add a TIMESTAMP to the inserts?

    DELETE * FROM offer WHERE row = 'testinsert';

    INSERT INTO offer(image_big:,image_small:) VALUES ('abcdefg','abc') WHERE row = 'testinsert' timestamp '1209982310285';
    INSERT INTO offer(image_big:,image_small:) VALUES ('hijklmn','hij') WHERE row = 'testinsert' timestamp '1209982311285';
    INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('abcdefg','path_big','abc','path_small') WHERE row = 'testinsert' timestamp '1209982312285';
    INSERT INTO offer(image_big:content,image_big:path,image_small:content,image_small:path) VALUES ('hijklmn','path_big','hij','path_small') WHERE row = 'testinsert' timestamp '1209982313285';

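The timestamp literals look like milliseconds since the Unix epoch, which is the same unit that Java's System.currentTimeMillis() returns. Assuming that reading is right, a small sketch can decode them; the class name TimestampDecode is made up for this example.

```java
import java.time.Duration;
import java.time.Instant;

// Decodes the timestamp literals from the INSERT statements,
// assuming they are milliseconds since 1970-01-01T00:00:00Z.
class TimestampDecode {

    static Instant toInstant(String literal) {
        return Instant.ofEpochMilli(Long.parseLong(literal));
    }

    public static void main(String[] args) {
        String[] literals = {
            "1209982310285", "1209982311285", "1209982312285", "1209982313285"
        };
        Instant previous = null;
        for (String literal : literals) {
            Instant current = toInstant(literal);
            System.out.print(literal + " -> " + current);
            if (previous != null) {
                // Gap to the previous insert, in milliseconds.
                System.out.print("  (+" + Duration.between(previous, current).toMillis() + " ms)");
            }
            System.out.println();
            previous = current;
        }
    }
}
```

Under that assumption the four inserts are spaced exactly one second apart, so each write to the same column carries a strictly newer timestamp than the one before it.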
Whether we run

    SELECT * FROM offer WHERE row = 'testinsert'

or

    SELECT * FROM offer WHERE row = 'testinsert' timestamp '1209982310285';

the result is always

    +------------------------+-------------------------+
    | Column                 | Cell                    |
    +------------------------+-------------------------+
    | image_big:             | hijklmn                 |
    +------------------------+-------------------------+
    | image_big:content      | hijklmn                 |
    +------------------------+-------------------------+
    | image_big:path         | path_big                |
    +------------------------+-------------------------+
    | image_small:           | hij                     |
    +------------------------+-------------------------+
    | image_small:content    | hij                     |
    +------------------------+-------------------------+
    | image_small:path       | path_small              |
    +------------------------+-------------------------+

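This confused me. According to the HBase Architecture description, cells carry a timestamp and data is kept as versions over time. So how should this result be understood?
The thread at http://www.mail-archive.com/core-user@hadoop.apache.org/msg00222.html suggests that this is not supported yet, but my inserts here succeeded. My own understanding is that row and timestamp are both index-level keys that sit outside the data itself, so it is fine that they are not displayed. But the data appears to have been overwritten. Is this really not supported yet?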
First delete

    DELETE * FROM offer WHERE row = 'testinsert';

then select

    SELECT * FROM offer WHERE row = 'testinsert';

    +------------------------+-------------------------+
    | Column                 | Cell                    |
    +------------------------+-------------------------+
    | image_big:             | abcdefg                 |
    +------------------------+-------------------------+
    | image_big:content      | abcdefg                 |
    +------------------------+-------------------------+
    | image_big:path         | path_big                |
    +------------------------+-------------------------+
    | image_small:           | abc                     |
    +------------------------+-------------------------+
    | image_small:content    | abc                     |
    +------------------------+-------------------------+
    | image_small:path       | path_small              |
    +------------------------+-------------------------+

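This unexpected finding shows that older versions of the data are in fact kept; the queries just did not find them. The timestamp condition in SELECT seems to have no effect, and every query returns the latest data. The architecture document says that when an insert carries no timestamp, the system adds the current time by default.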
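One way to make sense of these observations is a toy model in which every column keeps several versions keyed by timestamp, a plain read returns the newest one, and a delete removes only the newest one. This is a guess at the semantics based purely on the shell output above, not HBase's actual implementation; VersionedCellSketch and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy model (not HBase code): column -> (timestamp -> value), newest first.
class VersionedCellSketch {

    private final Map<String, TreeMap<Long, String>> cells =
            new TreeMap<String, TreeMap<Long, String>>();

    void put(String column, long timestamp, String value) {
        TreeMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            // Reverse order so that firstEntry() is the newest version.
            versions = new TreeMap<Long, String>(Collections.<Long>reverseOrder());
            cells.put(column, versions);
        }
        versions.put(timestamp, value);
    }

    // What a plain SELECT appeared to return: the newest version.
    String latest(String column) {
        TreeMap<Long, String> versions = cells.get(column);
        return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
    }

    // What a timestamp-aware read would plausibly return:
    // the newest version written at or before the given timestamp.
    String asOf(String column, long timestamp) {
        TreeMap<Long, String> versions = cells.get(column);
        if (versions == null) {
            return null;
        }
        Map.Entry<Long, String> entry = versions.ceilingEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    // Assumed delete behaviour: drop only the newest version of each column.
    void deleteNewestVersions() {
        for (TreeMap<Long, String> versions : cells.values()) {
            versions.pollFirstEntry();
        }
    }

    public static void main(String[] args) {
        VersionedCellSketch row = new VersionedCellSketch();
        row.put("image_big:", 1209982310285L, "abcdefg");
        row.put("image_big:", 1209982311285L, "hijklmn");
        System.out.println("latest:       " + row.latest("image_big:"));
        System.out.println("as of first:  " + row.asOf("image_big:", 1209982310285L));
        row.deleteNewestVersions();
        System.out.println("after delete: " + row.latest("image_big:"));
    }
}
```

In this model the read after the delete falls back to the older value, which is the behaviour the shell showed. The asOf method shows what the timestamped SELECT was expected to return but did not.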
5 Accessing HBase from a client
As with the earlier HDFS access example, add hbase-site.xml and the lib jars to the project. The code is as follows

    package com.chua.hadoop.client;

    import java.io.BufferedInputStream;
    import java.io.BufferedOutputStream;
    import java.io.DataInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.SortedMap;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.methods.GetMethod;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTable;
    import org.apache.hadoop.io.Text;

    /**
     * Example client: stores two images in the HBase table "offer" and reads one back.
     * @author chua 2008-5-4 17:03:33
     */
    public class HBase {

        public static void main(String[] args) throws Exception {
            String domain = "www.dlog.cn";
            String path_s = "/uploads/m/me/meichua/meichua_100.jpg";
            String path_b = "/uploads/m/me/meichua/200804/22094433_tLuyw.jpg";
            byte[] data_s = getData(domain, path_s);
            byte[] data_b = getData(domain, path_b);
            if (data_s == null || data_b == null) {
                // getData returns null when the download fails
                System.err.println("Download failed, nothing written to HBase");
                return;
            }

            HBaseConfiguration config = new HBaseConfiguration();
            HTable table = new HTable(config, new Text("offer"));
            createRecord(table, "chua", "image_big", data_b, path_b);
            createRecord(table, "chua", "image_small", data_s, path_s);

            // Fetch all data of one row and print the column names
            SortedMap<?, ?> map = table.getRow(new Text("chua"));
            for (Object column : map.keySet()) {
                System.out.println(column);
            }

            // Fetch the data of a single column of the row
            byte[] data = table.get(new Text("chua"), new Text("image_big:content"));
            saveAsFile(data, "c:/chua_big.jpg");
        }

        /** Writes the image bytes and its path into column:content and column:path of one row. */
        public static void createRecord(HTable table, String row, String column,
                                        byte[] data, String path) throws IOException {
            long lockId = table.startUpdate(new Text(row));
            table.put(lockId, new Text(column + ":content"), data);
            table.put(lockId, new Text(column + ":path"), path.getBytes("UTF-8"));
            table.commit(lockId);
        }

        /**
         * Reads an image from the web.
         * @param domain host name
         * @param path   path on that host
         * @return the response body, or null if the download failed
         */
        public static byte[] getData(String domain, String path) {
            byte[] dataResource = null;
            GetMethod getMethod = new GetMethod(path);
            try {
                HttpClient client = new HttpClient();
                client.getHostConfiguration().setHost(domain, 80, "http");
                int status = client.executeMethod(getMethod);
                if (status == 200) {
                    dataResource = getMethod.getResponseBody();
                }
            } catch (Exception e) {
                System.out.println("Download error: " + e);
            } finally {
                // Release the connection even when the request fails
                getMethod.releaseConnection();
            }
            return dataResource;
        }

        /**
         * Reads a local file.
         * @param path file path
         * @return the file content, or null on error
         */
        public static byte[] getData(String path) {
            File file = new File(path);
            DataInputStream dis = null;
            try {
                dis = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
                byte[] data = new byte[(int) file.length()];
                // readFully keeps reading until the whole array is filled
                dis.readFully(data);
                return data;
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            } finally {
                if (dis != null) {
                    try { dis.close(); } catch (IOException ignored) { }
                }
            }
        }

        /**
         * Saves the data to a file.
         * @param data bytes to write (nothing is written if null)
         * @param path target file path
         */
        public static void saveAsFile(byte[] data, String path) {
            if (data == null) {
                return;
            }
            BufferedOutputStream out = null;
            try {
                out = new BufferedOutputStream(new FileOutputStream(path));
                out.write(data);
            } catch (Exception e) {
                e.printStackTrace();
            } finally {
                if (out != null) {
                    try { out.close(); } catch (IOException ignored) { }
                }
            }
        }
    }

Output:
image_big:content
image_big:path
image_small:content
image_small:path
The above is a fairly simple example of a client accessing HBase.
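The column names in the output all follow the pattern of a column name from CREATE TABLE, then a colon, then a free-form label. A minimal sketch of splitting such a name is shown below; the class ColumnName and the field names family and qualifier are labels chosen for this example, not identifiers taken from the HBase API.

```java
// Hypothetical helper (not part of HBase): splits "image_big:content"
// into the part before the first colon and the part after it.
class ColumnName {

    final String family;     // e.g. "image_big", as declared in CREATE TABLE
    final String qualifier;  // e.g. "content"; may be empty, as in "image_big:"

    private ColumnName(String family, String qualifier) {
        this.family = family;
        this.qualifier = qualifier;
    }

    static ColumnName parse(String column) {
        int colon = column.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in column name: " + column);
        }
        return new ColumnName(column.substring(0, colon), column.substring(colon + 1));
    }

    public static void main(String[] args) {
        String[] columns = {
            "image_big:content", "image_big:path",
            "image_small:content", "image_small:path"
        };
        for (String column : columns) {
            ColumnName name = parse(column);
            System.out.println(name.family + " / " + name.qualifier);
        }
    }
}
```

Only the part before the colon had to be declared when the table was created. The labels after the colon (content, path, or nothing at all) were introduced freely at insert time.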
6 Introduction to the HBase architecture