eBay readies next generation search built with Hadoop and HBase

wbj0110

Blog categories: HBase, Hadoop, MapReduce

eBay presented a keynote at Hadoop World describing the architecture of Cassini, its completely rebuilt search engine, slated to go live in 2012. Cassini indexes all content and user metadata to produce better rankings, and it refreshes its indexes hourly. It is built on Apache Hadoop for the hourly index updates and Apache HBase for random access to item information. Hugh E. Williams, VP of Search, Experience & Platforms for eBay Marketplaces, delivered the keynote, outlining the scale, the technologies used, and the experience gained from an 18-month effort by more than 100 engineers to completely rebuild eBay's core site search. The new platform, Cassini, will support:

97 million active buyers & sellers
250 million queries per day
200 million items live in over 50,000 categories
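For a sense of scale, the daily query volume above can be converted into an average per-second rate. This is simple arithmetic on the stated figure, assuming traffic spread evenly over the day; real traffic would peak higher, and the keynote summary gives no peak numbers.

```python
# Average query rate implied by the stated daily volume.
# Assumes a uniform spread over 24 hours, which real traffic will not follow.
SECONDS_PER_DAY = 24 * 60 * 60
queries_per_day = 250_000_000

average_qps = queries_per_day / SECONDS_PER_DAY
print(f"average queries per second: {average_qps:.0f}")
```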

eBay already stores 9 PB of data in Hadoop and Teradata clusters for analysis, but Cassini will be its first production application that users interact with directly. The new system will be more extensive than the current one, Galileo:

Old system (Galileo)	New system (Cassini)
Tens of factors used for ranking	Hundreds of factors used for ranking
Title-only match by default	All data used to match by default
Manual intervention for rollout, monitoring, remediation	Automated rollout, monitoring, remediation

Cassini will keep 90 days of historical data online (currently 1 billion items) and will include user and behavioral data for ranking. Most of the work required to support the search system is done in hourly batch jobs that run in Hadoop. The different kinds of indexes will all be generated in the same cluster, an improvement over Galileo, which had a separate cluster for each kind of indexing. The Hadoop environment allows eBay to restore or reclassify the entire site inventory as improvements are made.
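The batch indexing described above can be pictured as a map step that emits terms per item and a reduce step that groups them into posting lists. The following is a minimal sketch in plain Python, not Hadoop code and not eBay's implementation; the item fields and the names `map_item`, `reduce_postings`, and `build_index` are hypothetical. It also contrasts a title-only index with one built from all text fields.

```python
from collections import defaultdict

# Hypothetical item records; the field names are assumptions for illustration.
ITEMS = [
    {"id": 1, "title": "vintage camera", "description": "35mm film body with lens"},
    {"id": 2, "title": "camera lens", "description": "50mm prime, vintage glass"},
]


def map_item(item, fields):
    """Map step: emit (term, item_id) pairs for each selected field."""
    for field in fields:
        for term in item[field].lower().replace(",", " ").split():
            yield term, item["id"]


def reduce_postings(pairs):
    """Reduce step: group item ids by term into sorted posting lists."""
    postings = defaultdict(set)
    for term, item_id in pairs:
        postings[term].add(item_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def build_index(items, fields):
    """Run the map step over every item, then reduce into one index."""
    pairs = (pair for item in items for pair in map_item(item, fields))
    return reduce_postings(pairs)


title_index = build_index(ITEMS, ["title"])
full_index = build_index(ITEMS, ["title", "description"])

print(title_index["vintage"])  # matches on title only
print(full_index["vintage"])   # matches on title or description
```

In this toy data set, a search for "vintage" finds only item 1 in the title-only index but both items once descriptions are indexed, which is the kind of difference the Galileo and Cassini comparison points to.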

Items are stored in HBase, and are normally scanned during the hourly index updates. When a new item is listed, it will be looked up in HBase and added to the live index within minutes. HBase also allows for bulk and incremental item writes and fast item reads and writes for item annotation.
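The two access patterns in this paragraph, a full scan for the hourly rebuild and a point lookup for a newly listed item, can be sketched with an in-memory stand-in. The `ItemStore` class below is hypothetical and is not the HBase client API; it only mimics a row-keyed table with bulk writes, single-row reads, and ordered scans. All function and key names are assumptions for illustration.

```python
class ItemStore:
    """In-memory stand-in for a row-keyed table (not the HBase client API)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, title):
        """Incremental write of a single item."""
        self._rows[row_key] = title

    def put_many(self, rows):
        """Bulk write of (row_key, title) pairs."""
        for row_key, title in rows:
            self.put(row_key, title)

    def get(self, row_key):
        """Random-access read of a single item; returns None if absent."""
        return self._rows.get(row_key)

    def scan(self):
        """Full scan in row-key order."""
        for row_key in sorted(self._rows):
            yield row_key, self._rows[row_key]


def index_item(index, row_key, title):
    """Add one item's title terms to the index."""
    for term in title.lower().split():
        index.setdefault(term, set()).add(row_key)


def rebuild_index(store):
    """Hourly path: scan every item and build a fresh index."""
    index = {}
    for row_key, title in store.scan():
        index_item(index, row_key, title)
    return index


def add_new_listing(index, store, row_key):
    """Near-real-time path: look up one new item and add it to the live index."""
    title = store.get(row_key)
    if title is None:
        return False
    index_item(index, row_key, title)
    return True


store = ItemStore()
store.put_many([("item-001", "red bicycle"), ("item-002", "blue bicycle")])
live_index = rebuild_index(store)

# A new listing arrives between hourly rebuilds.
store.put("item-003", "red helmet")
added = add_new_listing(live_index, store, "item-003")
print(added, sorted(live_index["red"]))
```

The point of the split is that the expensive full scan runs only on the batch schedule, while a new listing costs a single keyed read plus an index update.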

Williams indicated that the team was already familiar with running Hadoop, which had worked reliably with few problems. By contrast, he said the "ride so far with HBase has been bumpy." Williams noted that eBay remains committed to the technology, has been contributing fixes for the issues it found, and is learning fast, and that the last two weeks have gone smoothly. The engineering team was new to HBase and ran into several issues when testing at scale, such as:

* production cluster configuration for their workloads

* hardware issues

* stability: unstable region servers, unstable master, regions stuck in transition

* monitoring HBase health: problems often went undetected until they affected the live service, so the team is adding much more monitoring

* managing multi-step MapReduce jobs

Overall, Williams felt the project was ambitious but had gone quickly and well, and that the team was able to use Hadoop and HBase to build a significantly improved search experience.

come from info

2013-10-13 13:04