eBay presented a keynote at Hadoop World describing the architecture of Cassini, its completely rebuilt search engine, which is slated to go live in 2012. Cassini indexes all content and user metadata to produce better rankings, and it refreshes its indexes hourly. It uses Apache Hadoop for the hourly index updates and Apache HBase for random access to item information. Hugh E. Williams, VP of Search, Experience & Platforms for eBay Marketplaces, delivered the keynote, outlining the scale, the technologies used, and the experience gained from an 18-month effort by more than 100 engineers to rebuild eBay's core site search. The new platform, Cassini, will support:
- 97 million active buyers & sellers
- 250 million queries per day
- 200 million items live in over 50,000 categories
eBay already stores 9 PB of data in Hadoop and Teradata clusters for analysis, but this will be its first production application that users interact with directly. The new system will be more extensive than the current one, Galileo:
| Galileo (current) | Cassini (new) |
| --- | --- |
| Tens of factors used for ranking | Hundreds of factors used for ranking |
| Title-only match by default | Use all data to match by default |
| Manual intervention for rollout, monitoring, remediation | Automated rollout, monitoring, remediation |
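The difference between title-only matching and matching on all data can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Item` fields, the whitespace tokenizer, and both matching functions are hypothetical and do not describe how Galileo or Cassini actually match queries.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical listing record; field names are illustrative only."""
    title: str
    description: str = ""
    attributes: dict = field(default_factory=dict)


def tokens(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return set(text.lower().split())


def matches_title_only(item, query):
    """Match only if every query term appears in the title."""
    return tokens(query) <= tokens(item.title)


def matches_all_data(item, query):
    """Match if every query term appears anywhere in the item's data."""
    searchable = " ".join(
        [item.title, item.description, *map(str, item.attributes.values())]
    )
    return tokens(query) <= tokens(searchable)


item = Item(
    title="Vintage film camera",
    description="Includes leather case",
    attributes={"brand": "Leica"},
)

# "leica" appears only in the attributes, so the title-only matcher misses it.
print(matches_title_only(item, "leica camera"))
print(matches_all_data(item, "leica camera"))
```

In this toy setup, a query term that appears only in the description or attributes is invisible to the title-only matcher, which is the kind of recall gap that matching on all data is meant to close.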
Cassini will keep 90 days of historical data online (currently 1 billion items) and will include user and behavioral data for ranking. Most of the work required to support the search system is done in hourly batch jobs that run in Hadoop. The different kinds of indexes will all be generated in the same cluster, an improvement over Galileo, which used a separate cluster for each kind of indexing. The Hadoop environment allows eBay to restore or reclassify the entire site inventory as improvements are created.
Items are stored in HBase and are normally scanned during the hourly index updates. When a new item is listed, it will be looked up in HBase and added to the live index within minutes. HBase also supports bulk and incremental item writes, as well as fast item reads and writes for item annotation.
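The two access patterns described above, a full scan for the periodic index build and a keyed lookup for newly listed items, can be sketched as follows. This is a minimal in-memory illustration, not the HBase API and not eBay's implementation: `ItemStore`, `build_index`, and `add_to_index` are hypothetical names, and the inverted index over title words is an assumed simplification.

```python
class ItemStore:
    """In-memory stand-in for a keyed item table (not the HBase API)."""

    def __init__(self):
        self._rows = {}

    def put(self, item_id, item):
        """Incremental write of a single item."""
        self._rows[item_id] = item

    def put_many(self, items):
        """Bulk write of many items at once."""
        self._rows.update(items)

    def get(self, item_id):
        """Random-access read by key."""
        return self._rows.get(item_id)

    def scan(self):
        """Iterate over every row in key order, as a batch job would."""
        yield from sorted(self._rows.items())


def add_to_index(index, item_id, item):
    """Add one item's title words to an inverted index."""
    for word in item["title"].lower().split():
        index.setdefault(word, set()).add(item_id)


def build_index(store):
    """Batch path: rebuild the whole index from a full scan."""
    index = {}
    for item_id, item in store.scan():
        add_to_index(index, item_id, item)
    return index


store = ItemStore()
store.put_many({"a1": {"title": "Red bicycle"}, "a2": {"title": "Blue bicycle"}})
live_index = build_index(store)  # periodic (e.g. hourly) batch rebuild

# Incremental path: a new listing is written, read back by key,
# and added to the live index without waiting for the next rebuild.
store.put("a3", {"title": "Red kettle"})
add_to_index(live_index, "a3", store.get("a3"))

print(sorted(live_index["red"]))
```

The design point the sketch tries to capture is that the same store serves both paths: the batch job reads everything sequentially, while the incremental path needs only a single keyed read, so the live index and the next full rebuild converge on the same contents.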
Williams indicated that the team was already familiar with running Hadoop, which had worked reliably with few problems. By contrast, he said the "ride so far with HBase has been bumpy." Williams noted that eBay remains committed to the technology, has been contributing fixes for the issues it found, and is learning fast, and that the last two weeks have gone smoothly. The engineering team was new to HBase and ran into some issues when testing at scale, such as:
* production cluster configuration for their workloads
* hardware issues
* stability: unstable region servers, unstable master, regions stuck in transition
* monitoring HBase health: problems often have not been detected until they impact live service, so the team is adding lots of monitoring
* managing multi-step MapReduce jobs
Overall, Williams felt the project was ambitious but had gone quickly and well, and that the team had been able to use Hadoop and HBase to build a significantly improved search experience.
come from info