1375 points 6 days ago by topshelf in 10000th position
Historical Discussions: Google is already pushing WEI into Chromium (July 26, 2023: 1375 points)
Historical Discussions: Tesla created secret team to suppress thousands of driving range complaints (July 27, 2023: 825 points)
825 points 5 days ago by mfiguiere in 181st position
In March, Alexandre Ponsin set out on a family road trip from Colorado to California in his newly purchased Tesla, a used 2021 Model 3. He expected to get something close to the electric sport sedan's advertised driving range: 353 miles on a fully charged battery.
He soon realized he was sometimes getting less than half that much range, particularly in cold weather – such severe underperformance that he was convinced the car had a serious defect.
"We're looking at the range, and you literally see the number decrease in front of your eyes," he said of his dashboard range meter.
Ponsin contacted Tesla and booked a service appointment in California. He later received two text messages, telling him that "remote diagnostics" had determined his battery was fine, and then: "We would like to cancel your visit."
What Ponsin didn't know was that Tesla employees had been instructed to thwart any customers complaining about poor driving range from bringing their vehicles in for service. Last summer, the company quietly created a "Diversion Team" in Las Vegas to cancel as many range-related appointments as possible.
The Austin, Texas-based electric carmaker deployed the team because its service centers were inundated with appointments from owners who had expected better performance based on the company's advertised estimates and the projections displayed by the in-dash range meters of the cars themselves, according to several people familiar with the matter.
Inside the Nevada team's office, some employees celebrated canceling service appointments by putting their phones on mute and striking a metal xylophone, triggering applause from coworkers who sometimes stood on desks. The team often closed hundreds of cases a week and staffers were tracked on their average number of diverted appointments per day.
Managers told the employees that they were saving Tesla about $1,000 for every canceled appointment, the people said. Another goal was to ease the pressure on service centers, some of which had long waits for appointments.
In most cases, the complaining customers' cars likely did not need repair, according to the people familiar with the matter. Rather, Tesla created the groundswell of complaints another way – by hyping the range of its futuristic electric vehicles, or EVs, raising consumer expectations beyond what the cars can deliver. Teslas often fail to achieve their advertised range estimates and the projections provided by the cars' own equipment, according to Reuters interviews with three automotive experts who have tested or studied the company's vehicles.
Neither Tesla nor Chief Executive Elon Musk responded to detailed questions from Reuters for this story.
Tesla years ago began exaggerating its vehicles' potential driving distance – by rigging their range-estimating software. The company decided about a decade ago, for marketing purposes, to write algorithms for its range meter that would show drivers "rosy" projections for the distance it could travel on a full battery, according to a person familiar with an early design of the software for its in-dash readouts.
Then, when the battery fell below 50% of its maximum charge, the algorithm would show drivers more realistic projections for their remaining driving range, this person said. To prevent drivers from getting stranded as their predicted range started declining more quickly, Teslas were designed with a "safety buffer," allowing about 15 miles (24 km) of additional range even after the dash readout showed an empty battery, the source said.
The directive to present the optimistic range estimates came from Tesla Chief Executive Elon Musk, this person said.
"Elon wanted to show good range numbers when fully charged," the person said, adding: "When you buy a car off the lot seeing 350-mile, 400-mile range, it makes you feel good."
Tesla's intentional inflation of in-dash range-meter projections and the creation of its range-complaints diversion team have not been previously reported.
Driving range is among the most important factors in consumer decisions on which electric car to buy, or whether to buy one at all. So-called range anxiety – the fear of running out of power before reaching a charger – has been a primary obstacle to boosting electric-vehicle sales.
At the time Tesla programmed in the rosy range projections, it was selling only two models: the two-door Roadster, its first vehicle, which was later discontinued; and the Model S, a luxury sport sedan launched in 2012. It now sells four models: two cars, the 3 and S; and two crossover SUVs, the X and Y. Tesla plans the return of the Roadster, along with a "Cybertruck" pickup.
Reuters could not determine whether Tesla still uses algorithms that boost in-dash range estimates. But automotive testers and regulators continue to flag the company for exaggerating the distance its vehicles can travel before their batteries run out.
Tesla was fined earlier this year by South Korean regulators who found the cars delivered as little as half their advertised range in cold weather. Another recent study found that three Tesla models averaged 26% below their advertised ranges.
The U.S. Environmental Protection Agency (EPA) has required Tesla since the 2020 model year to reduce the range estimates the automaker wanted to advertise for six of its vehicles by an average of 3%. The EPA told Reuters, however, that it expects some variation between the results of separate tests conducted by automakers and the agency.
Data collected in 2022 and 2023 from more than 8,000 Teslas by Recurrent, a Seattle-based EV analytics company, showed that the cars' dashboard range meters didn't change their estimates to reflect hot or cold outside temperatures, which can greatly reduce range.
Recurrent found that Tesla's four models almost always calculated that they could travel more than 90% of their advertised EPA range estimates regardless of external temperatures. Scott Case, Recurrent's chief executive, told Reuters that Tesla's range meters also ignore many other conditions affecting driving distance.
Electric cars can lose driving range for a lot of the same reasons as gasoline cars — but to a greater degree. The cold is a particular drag on EVs, slowing the chemical and physical reactions inside their batteries and requiring a heating system to protect them. Other drains on the battery include hilly terrain, headwinds, a driver's lead foot and running the heating or air-conditioning inside the cabin.
Tesla discusses the general effect of such conditions in a "Range Tips" section of its website. The automaker also recently updated its vehicle software to provide a breakdown of battery consumption during recent trips with suggestions on how range might have been improved.
Tesla vehicles provide range estimates in two ways: One through a dashboard meter of current range that's always on, and a second projection through its navigation system, which works when a driver inputs a specific destination. The navigation system's range estimate, Case said, does account for a wider set of conditions, including temperature. While those estimates are "more realistic," they still tend to overstate the distance the car can travel before it needs to be recharged, he said.
Recurrent tested other automakers' in-dash range meters – including the Ford Mustang Mach-E, the Chevrolet Bolt and the Hyundai Kona – and found them to be more accurate. The Kona's range meter generally underestimated the distance the car could travel, the tests showed. Recurrent conducted the study with the help of a National Science Foundation grant.
Tesla, Case said, has consistently designed the range meters in its cars to deliver aggressive rather than conservative estimates: "That's where Tesla has taken a different path from most other automakers."
Failed tests and false advertising
Tesla isn't the only automaker with cars that don't regularly achieve their advertised ranges.
One of the experts, Gregory Pannone, co-authored a study of 21 different brands of electric vehicles, published in April by SAE International, an engineering organization. The research found that, on average, the cars fell short of their advertised ranges by 12.5% in highway driving.
The study did not name the brands tested, but Pannone told Reuters that three Tesla models posted the worst performance, falling short of their advertised ranges by an average of 26%.
The EV pioneer pushes the limits of government testing regulations that govern the claims automakers put on window stickers, the three automotive experts told Reuters.
Like their gas-powered counterparts, new electric vehicles are required by U.S. federal law to display a label with fuel-efficiency information. In the case of EVs, this is stated in miles-per-gallon equivalent (MPGe), allowing consumers to compare them to gasoline or diesel vehicles. The labels also include estimates of total range: how far an EV can travel on a full charge, in combined city and highway driving.
"They've gotten really good at exploiting the rule book and maximizing certain points to work in their favor involving EPA tests."
EV makers have a choice in how to calculate a model's range. They can use a standard EPA formula that converts fuel-economy results from city and highway driving tests to calculate a total range figure. Or automakers can conduct additional tests to come up with their own range estimate. The only reason to conduct more tests is to generate a more favorable estimate, said Pannone, a retired auto-industry veteran.
Tesla conducts additional range tests on all of its models. By contrast, many other automakers, including Ford, Mercedes and Porsche, continue to rely on the EPA's formula to calculate potential range, according to agency data for 2023 models. That generally produces more conservative estimates, Pannone said.
Mercedes-Benz told Reuters it uses the EPA's formula because it believes it provides a more accurate estimate. "We follow a certification strategy that reflects the real-world driving behavior of our customers in the best possible way," the German carmaker said in a statement.
Ford and Porsche didn't respond to requests for comment.
Whatever an automaker decides, the EPA must approve the window-sticker numbers. The agency told Reuters it conducts its own tests on 15% to 20% of new electric vehicles each year as part of an audit program and has tested six Tesla models since the 2020 model year.
EPA data obtained by Reuters through the Freedom of Information Act showed that the audits resulted in Tesla being required to lower all the cars' estimated ranges by an average of 3%. The projected range for one vehicle, the 2021 Model Y Long Range AWD (all-wheel drive), dropped by 5.15%. The EPA said all the changes to Tesla's range estimates were made before the company used the figures on window stickers.
The EPA said it has seen "everything" in its audits of EV manufacturers' range testing, including low and high estimates from other automakers. "That is what we expect when we have new manufacturers and new technologies entering the market and why EPA prioritizes" auditing them, the agency said.
The EPA cautioned that individuals' actual experience with vehicle efficiency might differ from the estimates the agency approves. Independent automotive testers commonly examine the EPA-approved fuel-efficiency or driving range claims against their own experience in structured tests or real-world driving. Often, they get different results, as in the case of Tesla vehicles.
Pannone called Tesla "the most aggressive" electric-vehicle manufacturer when it comes to range calculations.
"I'm not suggesting they're cheating," Pannone said of Tesla. "What they're doing, at least minimally, is leveraging the current procedures more than the other manufacturers."
Jonathan Elfalan, vehicle testing director for the automotive website Edmunds.com, reached a similar conclusion to Pannone after an extensive examination of vehicles from Tesla and other major automakers, including Ford, General Motors, Hyundai and Porsche.
All five Tesla models tested by Edmunds failed to achieve their advertised range, the website reported in February 2021. All but one of 10 other models from other manufacturers exceeded their advertised range.
Tesla complained to Edmunds that the test failed to account for the safety buffer programmed into Tesla's in-dash range meters. So Edmunds did further testing, this time running the vehicles, as Tesla requested, past the point where their range meters indicated the batteries had run out.
Only two of six Teslas tested matched their advertised range, Edmunds reported in March 2021. The tests found no fixed safety buffer.
Edmunds has continued to test electric vehicles, using its own standard method, to see if they meet their advertised range estimates. As of July, no Tesla vehicle had, Elfalan said.
"They've gotten really good at exploiting the rule book and maximizing certain points to work in their favor involving EPA tests," Elfalan told Reuters. The practice can "misrepresent what their customers will experience with their vehicles."
South Korean regulators earlier this year fined Tesla about $2.1 million for falsely advertising driving ranges on its local website between August 2019 and December 2022. The Korea Fair Trade Commission (KFTC) found that Tesla failed to tell customers that cold weather can drastically reduce its cars' range. It cited tests by the country's environment ministry that showed Tesla cars lost up to 50.5% of the company's claimed ranges in cold weather.
The KFTC also flagged certain statements on Tesla's website, including one that claimed about a particular model: "You can drive 528 km (328 miles) or longer on a single charge." Regulators required Tesla to remove the "or longer" phrase.
Korean regulators required Tesla to publicly admit it had misled consumers. Musk and two local executives did so in a June 19 statement, acknowledging "false/exaggerated advertising."
Creating a diversion
By last year, sales of Tesla's electric vehicles were surging. The company delivered about 1.3 million cars in 2022, nearly 13 times more than five years before.
As sales grew, so did demand for service appointments. The wait for an available booking was sometimes a month, according to one of the sources familiar with the diversion team's operations.
Tesla instructs owners to book appointments through a phone app. The company found that many problems could be handled by its "virtual" service teams, who can remotely diagnose and fix various issues.
Tesla supervisors told some virtual team members to steer customers away from bringing their cars into service whenever possible. One current Tesla "Virtual Service Advisor" described part of his job in his LinkedIn profile: "Divert customers who do not require in person service."
Such advisors handled a variety of issues, including range complaints. But last summer, Tesla created the Las Vegas "Diversion Team" to handle only range cases, according to the people familiar with the matter.
The office atmosphere at times resembled that of a telemarketing boiler room. A supervisor had purchased the metallophone – a xylophone with metal keys – that employees struck to celebrate appointment cancellations, according to the people familiar with the office's operations.
Advisors would normally run remote diagnostics on customers' cars and try to call them, the people said. They were trained to tell customers that the EPA-approved range estimates were just a prediction, not an actual measurement, and that batteries degrade over time, which can reduce range. Advisors would offer tips on extending range by changing driving habits.
If the remote diagnostics found anything else wrong with the vehicle that was not related to driving range, advisors were instructed not to tell the customer, one of the sources said. Managers told them to close the cases.
Tesla also updated its phone app so that any customer who complained about range could no longer book service appointments, one of the sources said. Instead, they could request that someone from Tesla contact them. It often took several days before owners were contacted because of the large backlog of range complaints, the source said.
The update routed all U.S. range complaints to the Nevada diversion team, which started in Las Vegas and later moved to the nearby suburb of Henderson. The team was soon fielding up to 2,000 cases a week, which sometimes included multiple complaints from customers frustrated they couldn't book a service appointment, one of the people said.
The team was expected to close about 750 cases a week. To accomplish that, office supervisors told advisors to call a customer once and, if there was no answer, to close the case as unresponsive, the source said. When customers did respond, advisors were told to try to complete the call in no more than five minutes.
In late 2022, managers aiming to quickly close cases told advisors to stop running remote diagnostic tests on the vehicles of owners who had reported range problems, according to one of the people familiar with the diversion team's operations.
"Thousands of customers were told there is nothing wrong with their car" by advisors who had never run diagnostics, the person said.
Reuters could not establish how long the practice continued.
Tesla recently stopped using its diversion team in Nevada to handle range-related complaints, according to the person familiar with the matter. Virtual service advisors in an office in Utah are now handling range cases, the person said. Reuters could not determine why the change was made.
On the road
By the time Alexandre Ponsin reached California on his March road trip, he had stopped to charge his Model 3's battery about a dozen times.
Concerned that something was seriously wrong with the car, he had called and texted with several Tesla representatives. One of them booked the first available appointment in Santa Clara – about two weeks away – but advised him to show up at a Tesla service center as soon as he arrived in California.
Ponsin soon received a text saying that remote diagnostics had shown his battery "is in good health."
"We would like to cancel your visit for now if you have no other concerns," the text read.
"Of course I still have concerns," Ponsin shot back. "I have 150 miles of range on a full charge!"
The next day, he received another text message asking him to cancel the appointment. "I am sorry, but no I do not want to close the service appointment as I do not feel my concerns have been addressed," he replied.
Undeterred, Ponsin brought his car to the Santa Clara service center without an appointment. A technician there told him the car was fine. "It lasted 10 minutes," Ponsin said, "and they didn't even look at the car physically."
After doing more research into range estimates, he said he ultimately concluded there is nothing wrong with his car. The problem, he said, was that Tesla is overstating its performance. He believes Tesla "should be a lot more explicit about the variation in the range," especially in very cold weather.
"I do love my Tesla," the engineer said. "But I have just tempered my expectation of what it can do in certain conditions."
By Steve Stecklow in London and Norihiko Shirouzu in Austin
Additional reporting by Heekyong Yang and Ju-min Park in Seoul and Peter Henderson in San Francisco
Art direction and lead illustration: Eve Watling
Video Production: Lucy Ha and Ilan Rubens
Edited by Brian Thevenot
Historical Discussions: Building and operating a pretty big storage system called S3 (July 27, 2023: 794 points)
795 points 5 days ago by werner in 2565th position
Today, I am publishing a guest post from Andy Warfield, VP and distinguished engineer over at S3. I asked him to write this based on the Keynote address he gave at USENIX FAST '23 that covers three distinct perspectives on scale that come along with building and operating a storage system the size of S3.
In today's world of short-form snackable content, we're very fortunate to get an excellent in-depth exposé. It's one that I find particularly fascinating, and it provides some really unique insights into why people like Andy and I joined Amazon in the first place. The full recording of Andy presenting this paper at FAST is embedded at the end of this post.
Building and operating a pretty big storage system called S3
I've worked in computer systems software — operating systems, virtualization, storage, networks, and security — for my entire career. However, the last six years working with Amazon Simple Storage Service (S3) have forced me to think about systems in broader terms than I ever have before. In a given week, I get to be involved in everything from hard disk mechanics, firmware, and the physical properties of storage media at one end, to customer-facing performance experience and API expressiveness at the other. And the boundaries of the system are not just technical ones: I've had the opportunity to help engineering teams move faster, worked with finance and hardware teams to build cost-following services, and worked with customers to create gob-smackingly cool applications in areas like video streaming, genomics, and generative AI.
What I'd really like to share with you more than anything else is my sense of wonder at the storage systems that are all collectively being built at this point in time, because they are pretty amazing. In this post, I want to cover a few of the interesting nuances of building something like S3, and the lessons learned and sometimes surprising observations from my time in S3.
S3 launched on March 14th, 2006, which means it turned 17 this year. It's hard for me to wrap my head around the fact that for engineers starting their careers today, S3 has simply existed as an internet storage service for as long as you've been working with computers. Seventeen years ago, I was just finishing my PhD at the University of Cambridge. I was working in the lab that developed Xen, an open-source hypervisor that a few companies, including Amazon, were using to build the first public clouds. A group of us moved on from the Xen project at Cambridge to create a startup called XenSource that, instead of using Xen to build a public cloud, aimed to commercialize it by selling it as enterprise software. You might say that we missed a bit of an opportunity there. XenSource grew and was eventually acquired by Citrix, and I wound up learning a whole lot about growing teams and growing a business (and negotiating commercial leases, and fixing small server room HVAC systems, and so on) – things that I wasn't exposed to in grad school.
But at the time, what I was convinced I really wanted to do was to be a university professor. I applied for a bunch of faculty jobs and wound up finding one at UBC (which worked out really well, because my wife already had a job in Vancouver and we love the city). I threw myself into the faculty role and foolishly grew my lab to 18 students, which is something that I'd encourage anyone that's starting out as an assistant professor to never, ever do. It was thrilling to have such a large lab full of amazing people and it was absolutely exhausting to try to supervise that many graduate students all at once, but, I'm pretty sure I did a horrible job of it. That said, our research lab was an incredible community of people and we built things that I'm still really proud of today, and we wrote all sorts of really fun papers on security, storage, virtualization, and networking.
A little over two years into my professor job at UBC, a few of my students and I decided to do another startup. We started a company called Coho Data that took advantage of two really early technologies at the time: NVMe SSDs and programmable ethernet switches, to build a high-performance scale-out storage appliance. We grew Coho to about 150 people with offices in four countries, and once again it was an opportunity to learn things about stuff like the load bearing strength of second-floor server room floors, and analytics workflows in Wall Street hedge funds – both of which were well outside my training as a CS researcher and teacher. Coho was a wonderful and deeply educational experience, but in the end, the company didn't work out and we had to wind it down.
And so, I found myself sitting back in my mostly empty office at UBC. I realized that I'd graduated my last PhD student, and I wasn't sure that I had the strength to start building a research lab from scratch all over again. I also felt like if I was going to be in a professor job where I was expected to teach students about the cloud, that I might do well to get some first-hand experience with how it actually works.
I interviewed at some cloud providers, and had an especially fun time talking to the folks at Amazon and decided to join. And that's where I work now. I'm based in Vancouver, and I'm an engineer that gets to work across all of Amazon's storage products. So far, a whole lot of my time has been spent on S3.
When I joined Amazon in 2017, I arranged to spend most of my first day at work with Seth Markle. Seth is one of S3's early engineers, and he took me into a little room with a whiteboard and then spent six hours explaining how S3 worked.
It was awesome. We drew pictures, and I asked question after question non-stop and I couldn't stump Seth. It was exhausting, but in the best kind of way. Even then S3 was a very large system, but in broad strokes — which was what we started with on the whiteboard — it probably looks like most other storage systems that you've seen.
S3 is an object storage service with an HTTP REST API. There is a frontend fleet with a REST API, a namespace service, a storage fleet that's full of hard disks, and a fleet that does background operations. In an enterprise context we might call these background tasks "data services," like replication and tiering. What's interesting here, when you look at the highest-level block diagram of S3's technical design, is the fact that AWS tends to ship its org chart. This is a phrase that's often used in a pretty disparaging way, but in this case it's absolutely fascinating. Each of these broad components is a part of the S3 organization. Each has a leader, and a bunch of teams that work on it. And if we went into the next level of detail in the diagram, expanding one of these boxes out into the individual components that are inside it, what we'd find is that all the nested components are their own teams, have their own fleets, and, in many ways, operate like independent businesses.
All in, S3 today is composed of hundreds of microservices that are structured this way. Interactions between these teams are literally API-level contracts, and, just like the code that we all write, sometimes we get modularity wrong and those team-level interactions are kind of inefficient and clunky, and it's a bunch of work to go and fix it, but that's part of building software, and it turns out, part of building software teams too.
Before Amazon, I'd worked on research software, I'd worked on pretty widely adopted open-source software, and I'd worked on enterprise software and hardware appliances that were used in production inside some really large businesses. But by and large, that software was a thing we designed, built, tested, and shipped. It was the software that we packaged and the software that we delivered. Sure, we had escalations and support cases and we fixed bugs and shipped patches and updates, but we ultimately delivered software. Working on a global storage service like S3 was completely different: S3 is effectively a living, breathing organism. Everything, from developers writing code running next to the hard disks at the bottom of the software stack, to technicians installing new racks of storage capacity in our data centers, to customers tuning applications for performance, everything is one single, continuously evolving system. S3's customers aren't buying software, they are buying a service and they expect the experience of using that service to be continuously, predictably fantastic.
The first observation was that I was going to have to change, and really broaden how I thought about software systems and how they behave. This didn't just mean broadening thinking about software to include those hundreds of microservices that make up S3, it meant broadening to also include all the people who design, build, deploy, and operate all that code. It's all one thing, and you can't really think about it just as software. It's software, hardware, and people, and it's always growing and constantly evolving.
The second observation was that despite the fact that this whiteboard diagram sketched the broad strokes of the organization and the software, it was also wildly misleading, because it completely obscured the scale of the system. Each one of the boxes represents its own collection of scaled out software services, often themselves built from collections of services. It would literally take me years to come to terms with the scale of the system that I was working with, and even today I often find myself surprised at the consequences of that scale.
It probably isn't very surprising for me to mention that S3 is a really big system, and it is built using a LOT of hard disks. Millions of them. And if we're talking about S3, it's worth spending a little bit of time talking about hard drives themselves. Hard drives are amazing, and they've kind of always been amazing.
The first hard drive was built by Jacob Rabinow, who was a researcher for the predecessor of the National Institute of Standards and Technology (NIST). Rabinow was an expert in magnets and mechanical engineering, and he'd been asked to build a machine to do magnetic storage on flat sheets of media, almost like pages in a book. He decided that idea was too complex and inefficient, so, stealing the idea of a spinning disk from record players, he built an array of spinning magnetic disks that could be read by a single head. To make that work, he cut a pizza slice-style notch out of each disk that the head could move through to reach the appropriate platter. Rabinow described this as being "like reading a book without opening it." The first commercially available hard disk appeared 7 years later in 1956, when IBM introduced the 350 disk storage unit, as part of the 305 RAMAC computer system. We'll come back to the RAMAC in a bit.
Today, 67 years after that first commercial drive was introduced, the world uses lots of hard drives. Globally, the number of bytes stored on hard disks continues to grow every year, but the applications of hard drives are clearly diminishing. We just seem to be using hard drives for fewer and fewer things. Today, consumer devices are effectively all solid-state, and a large amount of enterprise storage is similarly switching to SSDs. Jim Gray predicted this direction in 2006, when he very presciently said: "Tape is Dead. Disk is Tape. Flash is Disk. RAM Locality is King." This quote has been used a lot over the past couple of decades to motivate flash storage, but the thing it observes about disks is just as interesting.
Hard disks don't fill the role of general storage media that they used to because they are big (physically and in terms of bytes), slower, and relatively fragile pieces of media. For almost every common storage application, flash is superior. But hard drives are absolute marvels of technology and innovation, and for the things they are good at, they are absolutely amazing. One of these strengths is cost efficiency, and in a large-scale system like S3, there are some unique opportunities to design around some of the constraints of individual hard disks.
As I was preparing for my talk at FAST, I asked Tim Rausch if he could help me revisit the old plane flying over blades of grass hard drive example. Tim did his PhD at CMU and was one of the early researchers on heat-assisted magnetic recording (HAMR) drives. Tim has worked on hard drives generally, and HAMR specifically for most of his career, and we both agreed that the plane analogy – where we scale up the head of a hard drive to be a jumbo jet and talk about the relative scale of all the other components of the drive – is a great way to illustrate the complexity and mechanical precision that's inside an HDD. So, here's our version for 2023.
Imagine a hard drive head as a 747 flying over a grassy field at 75 miles per hour. The air gap between the bottom of the plane and the top of the grass is two sheets of paper. Now, if we measure bits on the disk as blades of grass, the track width would be 4.6 blades of grass wide and the bit length would be one blade of grass. As the plane flew over the grass it would count blades of grass and only miss one blade for every 25 thousand times the plane circled the Earth.
That's a bit error rate of 1 in 10^15 bits read. In the real world, we see that blade of grass get missed pretty frequently – and it's actually something we need to account for in S3.
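To put that error rate in context, here is a rough back-of-the-envelope sketch; the daily read volume below is a made-up illustrative figure, not an S3 number.

```python
# Back-of-the-envelope: how often a 1-in-10^15 bit error rate bites at scale.
# The petabytes-read-per-day figure is purely illustrative, not an S3 number.

BIT_ERROR_RATE = 1e-15          # roughly one bad bit per 10^15 bits read
BITS_PER_PETABYTE = 8 * 10**15  # 1 PB = 8 * 10^15 bits

petabytes_read_per_day = 100    # hypothetical fleet-wide read volume

expected_bad_bits_per_day = petabytes_read_per_day * BITS_PER_PETABYTE * BIT_ERROR_RATE
print(f"Expected bad bits per day: {expected_bad_bits_per_day:.0f}")
# Even at this modest assumed volume that's ~800 bad bits a day, which is why
# checksums and redundant shards have to catch and repair them continuously.
```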
Now, let's go back to that first hard drive, the IBM RAMAC from 1956. Here are some specs on that thing: it stored about 3.75 MB on fifty 24-inch platters, and the whole unit weighed over a ton.
Now let's compare it to the largest HDD that you can buy as of publishing this, which is a Western Digital Ultrastar DC HC670 26TB. Since the RAMAC, capacity has improved 7.2M times over, while the physical drive has gotten 5,000x smaller. It's 6 billion times cheaper per byte in inflation-adjusted dollars. But despite all that, seek times – the time it takes to perform a random access to a specific piece of data on the drive – have only gotten 150x better. Why? Because they're mechanical. We have to wait for an arm to move, for the platter to spin, and those mechanical aspects haven't really improved at the same rate. If you are doing random reads and writes to a drive as fast as you possibly can, you can expect about 120 operations per second. The number was about the same in 2006 when S3 launched, and it was about the same even a decade before that.
This tension between HDDs growing in capacity but staying flat for performance is a central influence in S3's design. We need to scale the number of bytes we store by moving to the largest drives we can as aggressively as we can. Today's largest drives are 26TB, and industry roadmaps are pointing at a path to 200TB (200TB drives!) in the next decade. At that point, if we divide up our random accesses fairly across all our data, we will be allowed to do 1 I/O per second per 2TB of data on disk.
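As a quick sketch of that arithmetic, using only the figures above (roughly 120 random IOPS per drive, 26TB drives today, 200TB on the roadmap):

```python
# Random-I/O budget per terabyte stored, as drives grow but IOPS stay flat.
RANDOM_IOPS_PER_DRIVE = 120  # roughly unchanged for decades

for capacity_tb in (26, 200):  # today's largest drives vs. the roadmap
    iops_per_tb = RANDOM_IOPS_PER_DRIVE / capacity_tb
    print(f"{capacity_tb:>3} TB drive: {iops_per_tb:.2f} random I/Os per second per TB stored")

# 26 TB  -> ~4.6 IOPS per TB
# 200 TB -> ~0.6 IOPS per TB, i.e. roughly 1 I/O per second per 2 TB of data
```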
S3 doesn't have 200TB drives yet, but I can tell you that we anticipate using them when they're available. And all the drive sizes between here and there.
So, with all this in mind, one of the biggest and most interesting technical scale problems that I've encountered is in managing and balancing I/O demand across a really large set of hard drives. In S3, we refer to that problem as heat management.
By heat, I mean the number of requests that hit a given disk at any point in time. If we do a bad job of managing heat, then we end up focusing a disproportionate number of requests on a single drive, and we create hotspots because of the limited I/O that's available from that single disk. For us, this becomes an optimization challenge of figuring out how we can place data across our disks in a way that minimizes the number of hotspots.
Hotspots are small numbers of overloaded drives that end up getting bogged down, resulting in poor overall performance for requests dependent on those drives. When you get a hotspot, things don't fall over, but requests queue up and the customer experience is poor. Unbalanced load stalls requests that are waiting on busy drives, those stalls amplify up through layers of the software storage stack, they get amplified by dependent I/Os for metadata lookups or erasure coding, and they result in a very small proportion of higher latency requests — or "stragglers". In other words, hotspots at individual hard disks create tail latency, and ultimately, if you don't stay on top of them, they grow to eventually impact all request latency.
As S3 scales, we want to be able to spread heat as evenly as possible, and let individual users benefit from as much of the HDD fleet as possible. This is tricky, because we don't know when or how data is going to be accessed at the time that it's written, and that's when we need to decide where to place it. Before joining Amazon, I spent time doing research and building systems that tried to predict and manage this I/O heat at much smaller scales – like local hard drives or enterprise storage arrays – and it was basically impossible to do a good job of it. But this is a case where the sheer scale and multitenancy of S3 result in a system that is fundamentally different.
The more workloads we run on S3, the more that individual requests to objects become decorrelated with one another. Individual storage workloads tend to be really bursty; in fact, most storage workloads are completely idle most of the time and then experience sudden load peaks when data is accessed. That peak demand is much higher than the mean. But as we aggregate millions of workloads, a really, really cool thing happens: the aggregate demand smooths and it becomes way more predictable. In fact, and I found this to be a really intuitive observation once I saw it at scale, once you aggregate to a certain scale you hit a point where it is difficult or impossible for any given workload to really influence the aggregate peak at all! So, with aggregation flattening the overall demand distribution, we need to take this relatively smooth demand rate and translate it into a similarly smooth level of demand across all of our disks, balancing the heat of each workload.
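Here is a toy simulation of that smoothing effect (my own illustration, not S3's actual traffic model): each simulated workload is idle almost all the time with occasional large spikes, yet the aggregate of thousands of them has a peak that sits close to its mean.

```python
# Toy illustration of demand smoothing through aggregation.
import random

random.seed(0)
TIME_STEPS = 1000
N_WORKLOADS = 2_000

def bursty_workload():
    # Idle ~98% of the time; occasional spikes with a mean size of 100.
    return [random.expovariate(1 / 100) if random.random() < 0.02 else 0.0
            for _ in range(TIME_STEPS)]

def peak_to_mean(series):
    return max(series) / (sum(series) / len(series))

single = bursty_workload()

aggregate = [0.0] * TIME_STEPS
for _ in range(N_WORKLOADS):
    for t, demand in enumerate(bursty_workload()):
        aggregate[t] += demand

print(f"one workload:      peak/mean ~ {peak_to_mean(single):.0f}")
print(f"{N_WORKLOADS} workloads:  peak/mean ~ {peak_to_mean(aggregate):.1f}")
# The aggregate peak is only slightly above its mean, and no single workload
# can move it much, which is what makes heat balancing tractable.
```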
In storage systems, redundancy schemes are commonly used to protect data from hardware failures, but redundancy also helps manage heat. They spread load out and give you an opportunity to steer request traffic away from hotspots. As an example, consider replication as a simple approach to encoding and protecting data. Replication protects data if disks fail by just having multiple copies on different disks. But it also gives you the freedom to read from any of the disks. When we think about replication from a capacity perspective it's expensive. However, from an I/O perspective – at least for reading data – replication is very efficient.
We obviously don't want to pay a replication overhead for all of the data that we store, so in S3 we also make use of erasure coding. For example, we use an erasure coding algorithm, such as Reed-Solomon, to split our object into a set of k "identity" shards. Then we generate an additional set of m parity shards. As long as k of the (k+m) total shards remain available, we can read the object. This approach lets us reduce capacity overhead while surviving the same number of failures.
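As a small sketch of that capacity trade-off (the specific k and m values here are illustrative placeholders, not S3's actual encoding parameters):

```python
# Bytes stored per byte of user data: replication vs. erasure coding.
# The (k, m) values are illustrative placeholders, not S3's real parameters.

def replication_overhead(copies: int) -> float:
    return float(copies)

def erasure_coding_overhead(k: int, m: int) -> float:
    # k "identity" shards plus m parity shards; any k of (k + m) reconstruct the object.
    return (k + m) / k

print(f"3x replication:     {replication_overhead(3):.2f}x stored, tolerates 2 lost copies")
print(f"5+4 erasure coding: {erasure_coding_overhead(5, 4):.2f}x stored, tolerates 4 lost shards")
# Both schemes also give reads a choice of disks: any replica, or any k of the
# k+m shard locations, which is what lets us steer traffic away from hotspots.
```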
So, redundancy schemes let us divide our data into more pieces than we need to read in order to access it, and that in turn provides us with the flexibility to avoid sending requests to overloaded disks, but there's more we can do to avoid heat. The next step is to spread the placement of new objects broadly across our disk fleet. While individual objects may be encoded across tens of drives, we intentionally put different objects onto different sets of drives, so that each customer's accesses are spread over a very large number of disks.
There are two big benefits to spreading the objects within each bucket across lots and lots of disks:
- A customer's data only occupies a very small amount of any given disk, which helps achieve workload isolation, because individual workloads can't generate a hotspot on any one disk.
- Individual workloads can burst up to a scale of disks that would be really difficult and really expensive to build as a stand-alone system.
For instance, think about a sudden burst of demand: say, a genomics customer doing parallel analysis from thousands of Lambda functions at once. That burst of requests can be served by over a million individual disks. That's not an exaggeration. Today, we have tens of thousands of customers with S3 buckets that are spread across millions of drives. When I first started working on S3, I was really excited (and humbled!) by the systems work to build storage at this scale, but as I really started to understand the system I realized that it was the scale of customers and workloads using the system in aggregate that really allows it to be built differently, and building at this scale means that any one of those individual workloads is able to burst to a level of performance that just wouldn't be practical for them to build on their own.
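To make that burst concrete, here is some rough upper-bound arithmetic; the per-disk figures are assumptions for illustration rather than measured S3 numbers.

```python
# Rough upper-bound arithmetic for a burst spread over a huge shared disk fleet.
# Per-disk figures below are illustrative assumptions, not measured S3 numbers.
DISKS_TOUCHED_BY_BURST = 1_000_000   # "over a million individual disks"
RANDOM_IOPS_PER_DISK = 120
STREAMING_MB_PER_SEC_PER_DISK = 100  # conservative assumed sequential rate

random_iops = DISKS_TOUCHED_BY_BURST * RANDOM_IOPS_PER_DISK
streaming_tb_per_sec = DISKS_TOUCHED_BY_BURST * STREAMING_MB_PER_SEC_PER_DISK / 1_000_000

print(f"Random reads: ~{random_iops / 1e6:.0f} million IOPS in aggregate")
print(f"Streaming:    ~{streaming_tb_per_sec:.0f} TB/s in aggregate")
# A dedicated cluster sized for that peak would sit idle most of the time;
# sharing the fleet across many tenants is what makes the burst affordable.
```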
Beyond the technology itself, there are human factors that make S3 - or any complex system - what it is. One of the core tenets at Amazon is that we want engineers and teams to fail fast, and safely. We want them to always have the confidence to move quickly as builders, while still remaining completely obsessed with delivering highly durable storage. One strategy we use to help with this in S3 is a process called "durability reviews." It's a human mechanism that's not in the statistical 11 9s model, but it's every bit as important.
When an engineer makes changes that can result in a change to our durability posture, we do a durability review. The process borrows an idea from security research: the threat model. The goal is to provide a summary of the change, a comprehensive list of threats, then describe how the change is resilient to those threats. In security, writing down a threat model encourages you to think like an adversary and imagine all the nasty things that they might try to do to your system. In a durability review, we encourage the same "what are all the things that might go wrong" thinking, and really encourage engineers to be creatively critical of their own code. The process does two things very well:
- It encourages authors and reviewers to really think critically about the risks we should be protecting against.
- It separates risk from countermeasures, and lets us have separate discussions about the two sides.
When working through durability reviews we take the durability threat model, and then we evaluate whether we have the right countermeasures and protections in place. When we are identifying those protections, we really focus on identifying coarse-grained "guardrails". These are simple mechanisms that protect you from a large class of risks. Rather than nitpicking through each risk and identifying individual mitigations, we like simple and broad strategies that protect against a lot of stuff.
Another example of a broad strategy is demonstrated in a project we kicked off a few years back to rewrite the bottom-most layer of S3's storage stack – the part that manages the data on each individual disk. The new storage layer is called ShardStore, and when we decided to rebuild that layer from scratch, one guardrail we put in place was to adopt a really exciting set of techniques called "lightweight formal verification". Our team decided to shift the implementation to Rust in order to get type safety and structured language support to help identify bugs sooner, and even wrote libraries that extend that type safety to apply to on-disk structures. From a verification perspective, we built a simplified model of ShardStore's logic (also in Rust) and checked it into the same repository alongside the real production ShardStore implementation. This model dropped all the complexity of the actual on-disk storage layers and hard drives, and instead acted as a compact but executable specification. It wound up being about 1% of the size of the real system, but allowed us to perform testing at a level that would have been completely impractical to do against a hard drive with 120 available IOPS. We even managed to publish a paper about this work at SOSP.
From here, we've been able to build tools and use existing techniques, like property-based testing, to generate test cases that verify that the behaviour of the implementation matches that of the specification. The really cool bit of this work wasn't anything to do with either designing ShardStore or using formal verification tricks. It was that we managed to kind of "industrialize" verification, taking really cool, but kind of research-y techniques for program correctness, and get them into code where normal engineers who don't have PhDs in formal verification can contribute to maintaining the specification, and that we could continue to apply our tools with every single commit to the software. Using verification as a guardrail has given the team confidence to develop faster, and it has endured even as new engineers joined the team.
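As a minimal sketch of what checking an implementation against an executable specification looks like in practice: ShardStore and its model are written in Rust, so this Python version, using the Hypothesis property-based testing library and a toy key-value store, only illustrates the shape of the technique.

```python
# Property-based testing of an implementation against an executable specification.
# ShardStore and its model are Rust; this toy Python version only shows the shape.
from hypothesis import given, strategies as st

class ToyStore:
    """Stand-in 'implementation' with its own internal bookkeeping."""
    def __init__(self):
        self._log = []                      # append-only log of (key, value) writes

    def put(self, key, value):
        self._log.append((key, value))

    def get(self, key):
        for k, v in reversed(self._log):    # latest write wins
            if k == key:
                return v
        return None

operations = st.lists(st.tuples(st.text(max_size=3), st.integers()))

@given(operations)
def test_matches_specification(ops):
    impl, spec = ToyStore(), {}             # a plain dict acts as the specification
    for key, value in ops:
        impl.put(key, value)
        spec[key] = value
    for key, _ in ops:
        assert impl.get(key) == spec.get(key)

if __name__ == "__main__":
    test_matches_specification()            # Hypothesis generates the histories
    print("implementation agreed with the specification on all generated histories")
```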
Durability reviews and lightweight formal verification are two examples of how we take a really human, and organizational view of scale in S3. The lightweight formal verification tools that we built and integrated are really technical work, but they were motivated by a desire to let our engineers move faster and be confident even as the system becomes larger and more complex over time. Durability reviews, similarly, are a way to help the team think about durability in a structured way, but also to make sure that we are always holding ourselves accountable for a high bar for durability as a team. There are many other examples of how we treat the organization as part of the system, and it's been interesting to see how once you make this shift, you experiment and innovate with how the team builds and operates just as much as you do with what they are building and operating.
The last example of scale that I'd like to tell you about is an individual one. I joined Amazon as an entrepreneur and a university professor. I'd had tens of grad students and built an engineering team of about 150 people at Coho. In the roles I'd had in the university and in startups, I loved having the opportunity to be technically creative, to build really cool systems and incredible teams, and to always be learning. But I'd never had to do that kind of role at the scale of software, people, or business that I suddenly faced at Amazon.
One of my favourite parts of being a CS professor was teaching the systems seminar course to graduate students. This was a course where we'd read and generally have pretty lively discussions about a collection of "classic" systems research papers. One of my favourite parts of teaching that course was that about half way through it we'd read the SOSP Dynamo paper. I looked forward to a lot of the papers that we read in the course, but I really looked forward to the class where we read the Dynamo paper, because it was from a real production system that the students could relate to. It was Amazon, and there was a shopping cart, and that was what Dynamo was for. It's always fun to talk about research work when people can map it to real things in their own experience.
But also, technically, it was fun to discuss Dynamo, because Dynamo was eventually consistent, which meant it was possible for your shopping cart to be wrong.
I loved this, because it was where we'd discuss what you do, practically, in production, when Dynamo was wrong. When a customer was able to place an order only to later realize that the last item had already been sold. You detected the conflict but what could you do? The customer was expecting a delivery.
This example may have stretched the Dynamo paper's story a little bit, but it drove to a great punchline. Because the students would often spend a bunch of discussion trying to come up with technical software solutions. Then someone would point out that this wasn't it at all. That ultimately, these conflicts were rare, and you could resolve them by getting support staff involved and making a human decision. It was a moment where, if it worked well, you could take the class from being critical and engaged in thinking about tradeoffs and design of software systems, and you could get them to realize that the system might be bigger than that. It might be a whole organization, or a business, and maybe some of the same thinking still applied.
Now that I've worked at Amazon for a while, I've come to realize that my interpretation wasn't all that far from the truth — in terms of how the services that we run are hardly "just" the software. I've also realized that there's a bit more to it than what I'd gotten out of the paper when teaching it. Amazon spends a lot of time really focused on the idea of "ownership." The term comes up in a lot of conversations — like "does this action item have an owner?" — meaning who is the single person that is on the hook to really drive this thing to completion and make it successful.
The focus on ownership actually helps understand a lot of the organizational structure and engineering approaches that exist within Amazon, and especially in S3. To move fast, to keep a really high bar for quality, teams need to be owners. They need to own the API contracts with other systems their service interacts with, they need to be completely on the hook for durability and performance and availability, and ultimately, they need to step in and fix stuff at three in the morning when an unexpected bug hurts availability. But they also need to be empowered to reflect on that bug fix and improve the system so that it doesn't happen again. Ownership carries a lot of responsibility, but it also carries a lot of trust – because to let an individual or a team own a service, you have to give them the leeway to make their own decisions about how they are going to deliver it. It's been a great lesson for me to realize how much allowing individuals and teams to directly own software, and more generally own a portion of the business, allows them to be passionate about what they do and really push on it. It's also remarkable how much getting ownership wrong can have the opposite result.
I've spent a lot of time at Amazon thinking about how important and effective the focus on ownership is to the business, but also about how effective an individual tool it is when I work with engineers and teams. I realized that the idea of recognizing and encouraging ownership had actually been a really effective tool for me in other roles. Here's an example: In my early days as a professor at UBC, I was working with my first set of graduate students and trying to figure out how to choose great research problems for my lab. I vividly remember a conversation I had with a colleague that was also a pretty new professor at another school. When I asked them how they choose research problems with their students, they flipped. They had a surprisingly frustrated reaction. "I can't figure this out at all. I have like 5 projects I want students to do. I've written them up. They hum and haw and pick one up but it never works out. I could do the projects faster myself than I can teach them to do it."
And ultimately, that's actually what this person did — they were amazing, they did a bunch of really cool stuff, and wrote some great papers, and then went and joined a company and did even more cool stuff. But when I talked to grad students who worked with them, what I heard was, "I just couldn't get invested in that thing. It wasn't my idea."
As a professor, that was a pivotal moment for me. From that point forward, when I worked with students, I tried really hard to ask questions, and listen, and be excited and enthusiastic. But ultimately, my most successful research projects were never mine. They were my students and I was lucky to be involved. The thing that I don't think I really internalized until much later, working with teams at Amazon, was that one big contribution to those projects being successful was that the students really did own them. Once students really felt like they were working on their own ideas, and that they could personally evolve it and drive it to a new result or insight, it was never difficult to get them to really invest in the work and the thinking to develop and deliver it. They just had to own it.
And this is probably one area of my role at Amazon that I've thought about and tried to develop and be more intentional about than anything else I do. As a really senior engineer in the company, of course I have strong opinions and I absolutely have a technical agenda. But if I interact with engineers by just trying to dispense ideas, it's really hard for any of us to be successful. It's a lot harder to get invested in an idea that you don't own. So, when I work with teams, I've kind of taken the strategy that my best ideas are the ones that other people have instead of me. I consciously spend a lot more time trying to develop problems, and to do a really good job of articulating them, rather than trying to pitch solutions. There are often multiple ways to solve a problem, and picking the right one is letting someone own the solution. And I spend a lot of time being enthusiastic about how those solutions are developing (which is pretty easy) and encouraging folks to figure out how to have urgency and go faster (which is often a little more complex). But it has, very sincerely, been one of the most rewarding parts of my role at Amazon to approach scaling myself as an engineer being measured by making other engineers and teams successful, helping them own problems, and celebrating the wins that they achieve.
I came to Amazon expecting to work on a really big and complex piece of storage software. What I learned was that every aspect of my role was unbelievably bigger than that expectation. I've learned that the technical scale of the system is so enormous, that its workload, structure, and operations are not just bigger, but foundationally different from the smaller systems that I'd worked on in the past. I learned that it wasn't enough to think about the software, that "the system" was also the software's operation as a service, the organization that ran it, and the customer code that worked with it. I learned that the organization itself, as part of the system, had its own scaling challenges and provided just as many problems to solve and opportunities to innovate. And finally, I learned that to really be successful in my own role, I needed to focus on articulating the problems and not the solutions, and to find ways to support strong engineering teams in really owning those solutions.
I'm hardly done figuring any of this stuff out, but I sure feel like I've learned a bunch so far. Thanks for taking the time to listen.
Historical Discussions: Unpacking Google's Web Environment Integrity specification (July 26, 2023: 754 points)
Unpacking Google's new "dangerous" Web-Environment-Integrity specification (July 26, 2023: 6 points)
754 points 6 days ago by dagurp in 2876th position
Google seems to love creating specifications that are terrible for the open web and it feels like they find a way to create a new one every few months. This time, we have come across some controversy caused by a new Web Environment Integrity spec that Google seems to be working on.
At this time, I could not find any official message from Google about this spec, so it is possible that it is just the work of some misguided engineer at the company that has no backing from higher up, but it seems to be work that has gone on for more than a year, and the resulting spec is so toxic to the open Web that at this point, Google needs to at least give some explanation as to how it could go so far.
What is Web Environment Integrity? It is simply dangerous.
The spec in question, which is described at https://github.com/RupertBenWiser/Web-Environment-Integrity/blob/main/explainer.md, is called Web Environment Integrity. The idea of it is as simple as it is dangerous. It would provide websites with an API telling them whether the browser, and the platform it is running on, is currently trusted by an authoritative third party (called an attester). The details are nebulous, but the goal seems to be to prevent "fake" interactions with websites of all kinds. While this seems like a noble motivation, and the use cases listed seem very reasonable, the solution proposed is absolutely terrible and has already been equated with DRM for websites, with all that it implies.
It is also interesting to note that the first use case listed is about ensuring that interactions with ads are genuine. While this is not problematic on the surface, it certainly hints at the idea that Google is willing to use any means of bolstering its advertising platform, regardless of the potential harm to the users of the web.
Despite the text mentioning the incredible risk of excluding vendors (read, other browsers), it only makes a lukewarm attempt at addressing the issue and ends up without any real solution.
So, what is the issue?
Simply put, if an entity has the power to decide which browsers are trusted and which are not, there is no guarantee that it will trust any given browser. Any new browser would by default not be trusted until it had somehow demonstrated that it is trustworthy, at the discretion of the attesters. Also, anyone stuck running on legacy software where this spec is not supported would eventually be excluded from the web.
To make matters worse, the primary example given of an attester is Google Play on Android. This means Google decides which browser is trustworthy on its own platform. I do not see how they can be expected to be impartial.
On Windows, they would probably defer to Microsoft via the Windows Store, and on Mac, they would defer to Apple. So, we can expect that at least Edge and Safari are going to be trusted. Any other browser will be left to the good graces of those three companies.
Of course, you can note one glaring omission in the previous paragraph. What of Linux? Well, that is the big question. Will Linux be completely excluded from browsing the web? Or will Canonical become the decider by virtue of controlling the snaps package repositories? Who knows. But it's not looking good for Linux.
This alone would be bad enough, but it gets worse. The spec hints heavily that one aim is to ensure that real people are interacting with the website. It does not clarify in any way how it aims to do that, so we are left with some big questions about how it will achieve this.
Will behavioral data be used to see if the user behaves in a human-like fashion? Will this data be presented to the attesters? Will accessibility tools that rely on automating input to the browser cause it to become untrusted? Will it affect extensions? The spec does currently specify a carveout for browser modifications and extensions, but those can make automating interactions with a website trivial. So, either the spec is useless or restrictions will eventually be applied there too. It would otherwise be trivial for an attacker to bypass the whole thing.
Can we just refuse to implement it?
Unfortunately, it's not that simple this time. Any browser choosing not to implement this would not be trusted, and any website choosing to use this API could therefore reject users from those browsers. Google also has ways to drive adoption by websites themselves.
First, they can easily make all their properties depend on using these features, and not being able to use Google websites is a death sentence for most browsers already.
Furthermore, they could try to mandate that sites that use Google Ads use this API as well, which makes sense since the first goal is to prevent fake ad clicks. That would quickly ensure that any browser not supporting the API would be doomed.
There is hope.
There is an overwhelming likelihood that EU law will not allow a few companies to have a huge amount of power in deciding which browsers are allowed and which are not. There is no doubt that attesters would be under a huge amount of pressure to be as fair as possible.
Unfortunately, legislative and judicial machineries tend to be slow and there is no saying how much damage will be done while governments and judges are examining this. If this is allowed to move forward, it will be a hard time for the open web and might affect smaller vendors significantly.
It has long been known that Google's dominance of the web browser market gives them the potential to become an existential threat to the web. With every bad idea they have brought to the table, like FLoC, the Topics API, and Client Hints, they have come closer to realizing that potential.
Web Environment Integrity is more of the same but also a step above the rest in the threat it represents, especially since it could be used to encourage Microsoft and Apple to cooperate with Google to restrict competition both in the browser space and the operating system space. It is imperative that they be called out on this and prevented from moving forward.
While our vigilance allows us to notice and push back against all these attempts to undermine the web, the only long-term solution is to get Google to be on an even playing field. Legislation helps there, but so does reducing their market share.
Similarly, our voice grows in strength with every Vivaldi user, allowing us to be more effective in these discussions. We hope that users of the web realize this and choose their browsers accordingly.
The fight for the web to remain open is going to be a long one and there is much at stake. Let us fight together.
Historical Discussions: Cap'n Proto 1.0 (July 28, 2023: 719 points)
719 points 4 days ago by kentonv in 1939th position
Cap'n Proto 1.0
kentonv on 28 Jul 2023
It's been a little over ten years since the first release of Cap'n Proto, on April 1, 2013. Today I'm releasing version 1.0 of Cap'n Proto's C++ reference implementation.
Don't get too excited! There's not actually much new. Frankly, I should have declared 1.0 a long time ago – probably around version 0.6 (in 2017) or maybe even 0.5 (in 2014). I didn't mostly because there were a few advanced features (like three-party handoff, or shared-memory RPC) that I always felt like I wanted to finish before 1.0, but they just kept not reaching the top of my priority list. But the reality is that Cap'n Proto has been relied upon in production for a long time. In fact, you are using Cap'n Proto right now, to view this site, which is served by Cloudflare, which uses Cap'n Proto extensively (and is also my employer, although they used Cap'n Proto before they hired me). Cap'n Proto is used to encode millions (maybe billions) of messages and gigabits (maybe terabits) of data every single second of every day. As for those still-missing features, the real world has seemingly proven that they aren't actually that important. (I still do want to complete them though.)
Ironically, the thing that finally motivated the 1.0 release is so that we can start working on 2.0. But again here, don't get too excited! Cap'n Proto 2.0 is not slated to be a revolutionary change. Rather, there are a number of changes we (the Cloudflare Workers team) would like to make to Cap'n Proto's C++ API, and its companion, the KJ C++ toolkit library. Over the ten years these libraries have been available, I have kept their APIs pretty stable, despite being 0.x versioned. But for 2.0, we want to make some sweeping backwards-incompatible changes, in order to fix some footguns and improve developer experience for those on our team.
Some users probably won't want to keep up with these changes. Hence, I'm releasing 1.0 now as a sort of "long-term support" release. We'll backport bugfixes as appropriate to the 1.0 branch for the long term, so that people who aren't interested in changes can just stick with it.
What's actually new in 1.0?
Again, not a whole lot has changed since the last version, 0.10. But there are a few things worth mentioning:
A number of optimizations were made to improve performance of Cap'n Proto RPC. These include reducing the amount of memory allocation done by the RPC implementation and KJ I/O framework, adding the ability to elide certain messages from the RPC protocol to reduce traffic, and doing better buffering of small messages that are sent and received together to reduce syscalls. These are incremental improvements.
Breaking change: Previously, servers could opt into allowing RPC cancellation by calling context.allowCancellation() after a call was delivered. In 1.0, opting into cancellation is instead accomplished using an annotation on the schema (the allowCancellation annotation defined in c++.capnp). We made this change after observing that in practice, we almost always wanted to allow cancellation, but we almost always forgot to do so. The schema-level annotation can be set on a whole file at a time, which is easier not to forget. Moreover, the dynamic opt-in required a lot of bookkeeping that had a noticeable performance impact in practice; switching to the annotation provided a performance boost. For users that never used context.allowCancellation() in the first place, there's no need to change anything when upgrading to 1.0 – cancellation is still disallowed by default. (If you are affected, you will see a compile error. If there's no compile error, you have nothing to worry about.)
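For illustration, here is roughly what the schema-level opt-in looks like. This is a minimal sketch: the file ID and the Watcher interface are made up for the example, while the c++.capnp import and the allowCancellation annotation are the ones the release notes describe.

# example.capnp: hypothetical schema showing the file-wide opt-in (Cap'n Proto 1.0+)
@0xd1f3c8a94b7e2c10;                  # arbitrary example file ID

using Cxx = import "/capnp/c++.capnp";
$Cxx.allowCancellation;               # every method in this file now permits RPC cancellation

interface Watcher {
  watch @0 () -> (event :Text);
  # Long-running calls like this can now be cancelled by the caller without the
  # server calling context.allowCancellation() at runtime.
}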
KJ now uses kqueue() to handle asynchronous I/O on systems that have it (macOS and BSD derivatives). KJ has historically always used epoll on Linux, but until now had used a slower poll()-based approach on other Unix-like platforms.
KJ's HTTP client and server implementations now support the
A new class capnp::RevocableServer was introduced to assist in exporting RPC wrappers around objects whose lifetimes are not controlled by the wrapper. Previously, avoiding use-after-free bugs in such scenarios was tricky.
Many, many smaller bug fixes and improvements. See the PR history for details.
What's planned for 2.0?
The changes we have in mind for version 2.0 of Cap'n Proto's C++ implementation are mostly NOT related to the protocol itself, but rather to the C++ API and especially to KJ, the C++ toolkit library that comes with Cap'n Proto. These changes are motivated by our experience building a large codebase on top of KJ: namely, the Cloudflare Workers runtime.
KJ is a C++ toolkit library, arguably comparable to things like Boost, Google's Abseil, or Facebook's Folly. I started building KJ at the same time as Cap'n Proto in 2013, at a time when C++11 was very new and most libraries were not really designing around it yet. The intent was never to create a new standard library, but rather to address specific needs I had at the time. But over many years, I ended up building a lot of stuff. By the time I joined Cloudflare and started the Workers Runtime, KJ already featured a powerful async I/O framework, HTTP implementation, TLS bindings, and more.
Of course, KJ has nowhere near as much stuff as Boost or Abseil, and nowhere near as much engineering effort behind it. You might argue, therefore, that it would have been better to choose one of those libraries to build on. However, KJ had a huge advantage: that we own it, and can shape it to fit our specific needs, without having to fight with anyone to get those changes upstreamed.
One example among many: KJ's HTTP implementation features the ability to "suspend" the state of an HTTP connection, after receiving headers, and transfer it to a different thread or process to be resumed. This is an unusual thing to want, but is something we needed for resource management in the Workers Runtime. Implementing this required some deep surgery in KJ HTTP and definitely adds complexity. If we had been using someone else's HTTP library, would they have let us upstream such a change?
That said, even though we own KJ, we've still tried to avoid making any change that breaks third-party users, and this has held back some changes that would probably benefit Cloudflare Workers. We have therefore decided to "fork" it. Version 2.0 is that fork.
Development of version 2.0 will take place on Cap'n Proto's new v2 branch. The master branch will become the 1.0 LTS branch, so that existing projects which track master are not disrupted by our changes.
We don't yet know all the changes we want to make as we've only just started thinking seriously about it. But, here's some ideas we've had so far:
We will require a compiler with support for C++20, or maybe even C++23. Cap'n Proto 1.0 only requires C++14.
In particular, we will require a compiler that supports C++20 coroutines, as lots of KJ async code will be refactored to rely on coroutines. This should both make the code clearer and improve performance by reducing memory allocations. However, coroutine support is still spotty – as of this writing, GCC seems to ICE on KJ's coroutine implementation.
Cap'n Proto's RPC API, KJ's HTTP APIs, and others are likely to be revised to make them more coroutine-friendly.
kj::Maybe will become more ergonomic. It will no longer overload nullptr to represent the absence of a value; we will introduce kj::none instead. KJ_IF_MAYBE will no longer produce a pointer, but instead a reference (a trick that becomes possible by utilizing C++17 features).
We will drop support for compiling with exceptions disabled. KJ's coding style uses exceptions as a form of software fault isolation, or "catchable panics", such that errors can cause the "current task" to fail out without disrupting other tasks running concurrently. In practice, this ends up affecting every part of how KJ-style code is written. And yet, since the beginning, KJ and Cap'n Proto have been designed to accommodate environments where exceptions are turned off at compile time, using an elaborate system to fall back to callbacks and distinguish between fatal and non-fatal exceptions. In practice, maintaining this ability has been a drag on development – no-exceptions mode is constantly broken and must be tediously fixed before each release. Even when the tests are passing, it's likely that a lot of KJ's functionality realistically cannot be used in no-exceptions mode due to bugs and fragility. Today, I would strongly recommend against anyone using this mode except maybe for the most basic use of Cap'n Proto's serialization layer. Meanwhile, though, I'm honestly not sure if anyone uses this mode at all! In theory I would expect many people do, since many people choose to use C++ with exceptions disabled, but I've never actually received a single question or bug report related to it. It seems very likely that this was wasted effort all along. By removing support, we can simplify a lot of stuff and probably do releases more frequently going forward.
Similarly, we'll drop support for no-RTTI mode and other exotic modes that are a maintenance burden.
We may revise KJ's approach to reference counting, as the current design has proven to be unintuitive to many users.
We will fix a longstanding design flaw in kj::AsyncOutputStream, where EOF is currently signaled by destroying the stream. Instead, we'll add an explicit end() method that returns a Promise. Destroying the stream without calling end() will signal an erroneous disconnect. (There are several other aesthetic improvements I'd like to make to the KJ stream APIs as well.)
We may want to redesign several core I/O APIs to be a better fit for Linux's new-ish io_uring event notification paradigm.
The RPC implementation may switch to allowing cancellation by default. As discussed above, this is opt-in today, but in practice I find it's almost always desirable, and disallowing it can lead to subtle problems.
And so on.
It's worth noting that at present, there is no plan to make any backwards-incompatible changes to the serialization format or RPC protocol. The changes being discussed only affect the C++ API. Applications written in other languages are completely unaffected by all this.
It's likely that a formal 2.0 release will not happen for some time – probably a few years. I want to make sure we get through all the really big breaking changes we want to make, before we inflict update pain on most users. Of course, if you're willing to accept breakages, you can always track the v2 branch. Cloudflare Workers releases from v2 twice a week, so it should always be in good working order.
Historical Discussions: Sinead O'Connor has died (July 26, 2023: 653 points)
653 points 6 days ago by jbegley in 53rd position
Irish singer Sinéad O'Connor has died at the age of 56, her family has announced.
In a statement, the singer's family said: "It is with great sadness that we announce the passing of our beloved Sinéad. Her family and friends are devastated and have requested privacy at this very difficult time."
The acclaimed Dublin performer released 10 studio albums, while her song Nothing Compares 2 U was named the number one world single in 1990 by the Billboard Music Awards. Her version of the ballad, written by musician Prince, topped the charts around the globe and earned her three Grammy nominations.
The accompanying music video, directed by English filmmaker John Maybury, consisted mostly of a close-up of O'Connor's face as she sang the lyrics, and became as famous as her recording of the song.
In 1991, O'Connor was named artist of the year by Rolling Stone magazine on the back of the song's success.
O'Connor was presented with the inaugural award for Classic Irish Album at the RTÉ Choice Music Awards earlier this year.
The singer received a standing ovation as she dedicated the award for the album, I Do Not Want What I Haven't Got, to "each and every member of Ireland's refugee community".
"You're very welcome in Ireland. I love you very much and I wish you happiness," she said.
President Michael D Higgins led the tributes to O'Connor, saying his "first reaction on hearing the news of Sinéad's loss was to remember her extraordinarily beautiful, unique voice".
"To those of us who had the privilege of knowing her, one couldn't but always be struck by the depth of her fearless commitment to the important issues which she brought to public attention, no matter how uncomfortable those truths may have been," he said.
"What Ireland has lost at such a relatively young age is one of our greatest and most gifted composers, songwriters and performers of recent decades, one who had a unique talent and extraordinary connection with her audience, all of whom held such love and warmth for her ... May her spirit find the peace she sought in so many different ways."
Taoiseach Leo Varadkar expressed his sorrow at the death of the singer in a post on social media. "Her music was loved around the world and her talent was unmatched and beyond compare. Condolences to her family, her friends and all who loved her music," said Mr Varadkar.
Tánaiste Micheál Martin said he was "devastated" to learn of her death. "One of our greatest musical icons, and someone deeply loved by the people of Ireland, and beyond. Our hearts goes out to her children, her family, friends and all who knew and loved her," he said.
Minister for Culture and Arts Catherine Martin said she was "so sorry" that the "immensely talented" O'Connor had died.
"Her unique voice and innate musicality was incredibly special ... My thoughts are with her family and all who are heartbroken on hearing this news Ní bheidh a leithéid arís ann."
Sinn Féin vice president Michelle O'Neill said Ireland had lost "one of our most powerful and successful singer, songwriter and female artists".
"A big loss not least to her family & friends, but all her many followers across the world."
O'Connor drew controversy and divided opinion during her long career in music and time in public life.
In 1992, she tore up a photograph of Pope John Paul II on US television programme Saturday Night Live in an act of protest against child sex abuse in the Catholic Church.
"I'm not sorry I did it. It was brilliant," she later said of her protest. "But it was very traumatising," she added. "It was open season on treating me like a crazy bitch."
The year before that high-profile protest, she boycotted the Grammy Awards, the music industry's answer to the Oscars, saying she did not want "to be part of a world that measures artistic ability by material success".
She refused to allow the playing of the US national anthem before her concerts, drawing further public scorn.
In more recent years, O'Connor became better known for her spiritualism and activism, and spoke publicly about her mental health struggles.
In 2007, O'Connor told US talkshow host Oprah Winfrey that she had been diagnosed with bipolar disorder four years previously and that before her diagnosis she had struggled with thoughts of suicide and overwhelming fear.
She said at the time that medication had helped her find more balance, but "it's a work in progress". O'Connor had also voiced support for other young women performers facing intense public scrutiny, including Britney Spears and Miley Cyrus.
O'Connor, who married four times, was ordained a priest in the Latin Tridentine church, an independent Catholic church not in communion with Rome, in 1999.
The singer converted to Islam in 2018 and changed her name to Shuhada Sadaqat, though continued to perform under the name Sinéad O'Connor. In 2021, O'Connor released a memoir Rememberings, while last year a film on her life was directed by Kathryn Ferguson.
On July 12th, O'Connor posted on her official Facebook page that she had moved back to London, was finishing an album and planned to release it early next year. She said she intended to tour Australia and New Zealand towards the end of 2024 followed by Europe, the United States and other locations in early 2025.
The circumstances of her death remain unclear.
O'Connor is survived by her three children. Her son, Shane, died last year aged 17.
Former Late Late Show host Ryan Tubridy said he was "devastated" by the news of O'Connor's death.
"We spoke days ago and she was as kind, powerful, passionate, determined and decent as ever," he said in a post on Instagram.
Addressing O'Connor directly, he said: "Rest in peace Sinéad, you were ahead of your time and deserve whatever peace comes your way."
Broadcaster Dave Fanning said O'Connor would be remembered for her music and her "fearlessness" and "in terms of how she went out there all the time, believed in everything she was doing, wasn't always right and had absolutely no regrets at all".
Canadian rock star Bryan Adams said he loved working with the Irish singer. "I loved working with you making photos, doing gigs in Ireland together and chats, all my love to your family," he tweeted.
REM singer Michael Stipe said: "There are no words," on his Instagram account alongside a photograph he posted of himself with O'Connor.
Hollywood star Russell Crowe posted a story on Twitter recounting a chance meeting with O'Connor – whom he described as "a hero of mine" – outside a pub in Dalkey, south Dublin, while he was working in Ireland last year.
"What an amazing woman. Peace be with your courageous heart Sinéad," he tweeted.
Billy Corgan, lead singer of American rock band The Smashing Pumpkins, said O'Connor was "fiercely honest and sweet and funny".
"She was talented in ways I'm not sure she completely understood," he said.
Ian Brown of The Stone Roses tweeted: "RIP SINEAD O'CONNOR A Beautiful Soul. Collaborating with and hearing Sinead sing my songs in the studio in Dublin was magical and a highlight of my musical life."
Musician Tim Burgess of the Charlatans said: "Sinead was the true embodiment of a punk spirit. She did not compromise and that made her life more of a struggle. Hoping that she has found peace."
American rapper and actor Ice T paid tribute to O'Connor, saying she "stood for something". In a Twitter post, he wrote: "Respect to Sinead ... She stood for something ... Unlike most people ... Rest Easy".
The Irish Music Rights Organisation (IMRO) said: "Our hearts go out to family, friends, and all who were moved by her music, as we reflect on the profound impact she made on the world."
Irish band Aslan paid tribute to O'Connor – both originating from Dublin. O'Connor collaborated with the band on Up In Arms in 2001.
Aslan lead singer Christy Dignam died in June.
A post on the band's Facebook page read: "Two Legends taken from us so closely together... No words ... Rest in Peace Sinead".
British singer Alison Moyet said O'Connor had a voice that "cracked stone with force by increment". In a post on Twitter, she wrote: "Heavy hearted at the loss of Sinead O'Connor. Wanted to reach out to her often but didn't. I remember her launch. Astounding presence. Voice that cracked stone with force & by increment.
"As beautiful as any girl around & never traded on that card. I loved that about her. Iconoclast."
US film and TV composer Bear McCreary reflected on writing new songs with the "wise and visionary" Sinead O'Connor in a social media post. McCreary tweeted that he was "gutted".
"She was the warrior poet I expected her to be — wise and visionary, but also hilarious. She and I laughed a lot. We were writing new songs together, which will now never be complete. We've all lost an icon. I've lost a friend. #RIP."
The pair had worked together on the latest version of the theme for Outlander.
Historical Discussions: The U.K. government is close to eroding encryption worldwide (July 28, 2023: 634 points)
636 points 4 days ago by pwmtr in 10000th position
The U.K. Parliament is pushing ahead with a sprawling internet regulation bill that will, among other things, undermine the privacy of people around the world. The Online Safety Bill, now at the final stage before passage in the House of Lords, gives the British government the ability to force backdoors into messaging services, which will destroy end-to-end encryption. No amendments have been accepted that would mitigate the bill's most dangerous elements.
If it passes, the Online Safety Bill will be a huge step backwards for global privacy, and democracy itself. Requiring government-approved software in peoples' messaging services is an awful precedent. If the Online Safety Bill becomes British law, the damage it causes won't stop at the borders of the U.K.
The sprawling bill, which originated in a white paper on "online harms" that's now more than four years old, would be the most wide-ranging internet regulation ever passed. At EFF, we've been clearly speaking about its disastrous effects for more than a year now.
It would require content filtering, as well as age checks to access erotic content. The bill also requires detailed reports about online activity to be sent to the government. Here, we're discussing just one fatally flawed aspect of OSB—how it will break encryption.
An Obvious Threat To Human Rights
It's a basic human right to have a private conversation. To have those rights realized in the digital world, the best technology we have is end-to-end encryption. And it's utterly incompatible with the government-approved message-scanning technology required in the Online Safety Bill.
This is because of something that EFF has been saying for years—there is no backdoor to encryption that only gets used by the "good guys." Undermining encryption, whether by banning it, pressuring companies away from it, or requiring client side scanning, will be a boon to bad actors and authoritarian states.
The U.K. government wants to grant itself the right to scan every message online for content related to child abuse or terrorism—and says it will still, somehow, magically, protect peoples' privacy. That's simply impossible. U.K. civil society groups have condemned the bill, as have technical experts and human rights groups around the world.
The companies that provide encrypted messaging—such as WhatsApp, Signal, and the UK-based Element—have also explained the bill's danger. In an open letter published in April, they explained that OSB "could break end-to-end encryption, opening the door to routine, general and indiscriminate surveillance of personal messages of friends, family members, employees, executives, journalists, human rights activists and even politicians themselves." Apple joined this group in June, stating publicly that the bill threatens encryption and "could put U.K. citizens at greater risk."
U.K. Government Says: Nerd Harder
In response to this outpouring of resistance, the U.K. government has simply waved its hands and denied reality. In a response letter to the House of Lords seen by EFF, the U.K.'s Minister for Culture, Media and Sport simply re-hashes an imaginary world in which messages can be scanned while user privacy is maintained. "We have seen companies develop such solutions for platforms with end-to-end encryption before," the letter states, a reference to client-side scanning. "Ofcom should be able to require" the use of such technologies, and where "off-the-shelf solutions" are not available, "it is right that the Government has led the way in exploring these technologies."
The letter refers to the Safety Tech Challenge Fund, a program in which the U.K. gave small grants to companies to develop software that would allegedly protect user privacy while scanning files. But of course, they couldn't square the circle. The grant winners' descriptions of their own prototypes clearly describe different forms of client-side scanning, in which user files are scoped out with AI before they're allowed to be sent in an encrypted channel.
The Minister completes his response on encryption by writing:
We expect the industry to use its extensive expertise and resources to innovate and build robust solutions for individual platforms/services that ensure both privacy and child safety by preventing child abuse content from being freely shared on public and private channels.
This is just repeating a fallacy that we've heard for years: that if tech companies can't create a backdoor that magically defends users, they must simply "nerd harder."
British Lawmakers Still Can And Should Protect Our Privacy
U.K. lawmakers still have a chance to stop their nation from taking this shameful leap forward towards mass surveillance. End-to-end encryption was not fully considered and voted on during either committee or report stage in the House of Lords. The Lords can still add a simple amendment that would protect private messaging, and specify that end-to-end encryption won't be weakened or removed.
Earlier this month, EFF joined U.K. civil society groups and sent a briefing explaining our position to the House of Lords. The briefing explains the encryption-related problems with the current bill, and proposes the adoption of an amendment that will protect end-to-end encryption. If such an amendment is not adopted, those who pay the price will be "human rights defenders and journalists who rely on private messaging to do their jobs in hostile environments; and ... those who depend on privacy to be able to express themselves freely, like LGBTQ+ people."
It's a remarkable failure that the House of Lords has not even taken up a serious debate over protecting encryption and privacy, despite ample time to review every section of the bill.
Tell the U.K. Parliament: protect encryption and our privacy.
Finally, Parliament should reject this bill because universal scanning and surveillance is abhorrent to their own constituents. It is not what the British people want. A recent survey of U.K. citizens showed that 83% wanted the highest level of security and privacy available on messaging apps like Signal, WhatsApp, and Element.
Historical Discussions: Wavy walls use fewer bricks than a straight wall (2020) (July 27, 2023: 635 points)
Wavy walls use fewer bricks than a straight wall (2020) (December 09, 2020: 3 points)
635 points 5 days ago by caiobegotti in 543rd position
How cool is this! Popularized in England, these wavy walls actually use fewer bricks than a straight wall because they can be made just one brick thin, while a straight wall—without buttresses—would easily topple over.
According to Wikipedia, these wavy walls are also known as: crinkle crankle walls, crinkum crankum walls, serpentine walls, or ribbon walls. The alternate convex and concave curves in the wall provide stability and help it to resist lateral forces. [source]
The county of Suffolk seems to be home to countless examples of these crinkle crankle walls. On freston.net you can find 100 wavy walls that have been documented and photographed. In the United States, the best known serpentine wall can be found at the University of Virginia where Thomas Jefferson incorporated the wavy walls into the architecture. Although some authorities claim that Jefferson invented this design, he was merely adapting a well-established English style of construction. [source]
As for the mathematics behind these serpentine walls and why the waves make them more resistant to horizontal forces like wind vs straight walls, check out this post by John D. Cook.
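To sketch the brick-counting argument in rough form (my own back-of-the-envelope version, not a summary of the linked post): model the wall's footprint as a sine curve $y = A\sin(kx)$ of amplitude $A$. The length of brickwork needed per period is the arc length

$$L = \int_0^{2\pi/k} \sqrt{1 + A^2 k^2 \cos^2(kx)}\,dx \;\approx\; \frac{2\pi}{k}\left(1 + \frac{A^2 k^2}{4}\right) \quad \text{for } Ak \ll 1,$$

so a gentle wave adds only a few percent to the total length of wall, while widening its effective footprint from a single brick to roughly $2A$. That extra depth is what lets a one-brick-thick serpentine wall resist tipping where a straight one-brick wall would need to be doubled up or buttressed.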
Below you will find additional examples of these intriguing wavy walls that lawnmowers surely detest!
[h/t smell1s on reddit]
Historical Discussions: If we want a shift to walking we need to prioritize dignity (July 29, 2023: 593 points)
603 points 3 days ago by PaulHoule in 452nd position
Have you ever had a friend return from a vacation and gush about how great it was to walk in the place they'd visited? "You can walk everywhere! To a café, to the store. It was amazing!" Immediately after saying that, your friend hops in their car and drives across the parking lot to the Starbucks to which they could easily have walked.
Why does walking feel so intuitive when we're in a city built before cars, yet as soon as we return home, walking feels like an unpleasant chore that immediately drives us into a car?
A lot contributes to this dilemma, like the density of the city, or relative cheapness and convenience of driving. But there's a bigger factor here: We don't design the pedestrian experience for dignity.
This is a national problem, but certainly one we can see throughout our own Twin Cities metro: Even where pedestrian facilities are built, brand-new, ADA-compliant and everything else — using them feels like a chore, or even stressful and unpleasant.
Dignity is a really important concept in active transportation, but one that we often miss in the conversation about making streets better for walking and biking. I've been delighted to see the term appear on a social media account advocating for pedestrians. But as we plan and design better streets for active transportation, we need to consider the dignity of the pedestrian experience.
A Hierarchy of Needs
Three related concepts exist in designing great pedestrian spaces, and they can be arranged similarly to Maslow's hierarchy of needs. The base of the pyramid is the most essential, but having a complete and delightful pedestrian experience requires all three layers. The layers are: compliance, safety and dignity.
Compliance: Often Not Enough
At the bottom of the pyramid you have compliance — for pedestrian facilities, that mainly means complying with ADA rules. This requirement is non-negotiable for agencies because failure to obey exposes them to legal challenges. The ADA has done a great deal to make pedestrian facilities better for all — certainly wheelchair users, but also those who walk, use strollers, ride bicycles on sidewalks, etc.
Unfortunately, compliance with ADA rules alone often does not yield good pedestrian facilities.
For example, many agencies will simply remove pedestrian facilities to reduce the cost of compliance. A good example is the intersection of France and Parklawn avenues in Edina. If you were on the west side of France and wanted to walk to the Allina clinic in 2013, you could simply have crossed on the north crosswalk. But to improve ADA compliance, Edina removed the north crosswalk in 2014. Now, you would have to cross the busy signalized intersection three times just to continue on the north sidewalk.
In other cases, compliance is in good faith but not enough to make a pedestrian facility really usable — because complete compliance would entail a much larger project. This can be found when a broken-down sidewalk, or one with obstructions in the way, gets brand-new corner curb ramps but no other improvements. A wheelchair user can easily get up off the street at the corner, but can't go farther than 10 feet without hitting another impediment.
Safety: A Step Further, But What Is Still Lacking?
In the middle of the pyramid you have safety — both perceived and actual. It is possible to create a facility that is compliant but does not seem very safe. Picture sparkling new curb ramps to cross a 45-mph surface street with no marked crosswalk. In other cases, facilities are well-designed and safe, but may still not be dignified.
An example of this is in my own backyard, on Hennepin County's Nicollet Avenue. A very-welcome project last year installed new crosswalks to popular Augsburg Park. These have durable crosswalk markings, excellent signage and refuge medians. But crossing still feels like a negotiation with drivers. And the overall sidewalk experience on the 1950s street is still lacking, with sidewalks at the back-of-curb and little to no shade.
Dignity: Making Walking Feel Right
Finally, we have dignity. To determine whether a facility is dignified, I propose a simple test:
If you were driving past and saw a friend walking or rolling there, what would your first thought be:
1. "Oh, no, Henry's car must have broken down! I better offer him a ride."
2. "Oh, looks like Henry's out for a walk! I should text him later."
This is a surprisingly good test. Picture seeing your friend on a leafy sidewalk versus walking along a 45 mph suburban arterial. What would you think intuitively?
But to get more specific, these are the key factors in making a pedestrian experience dignified:
- Shade and light
- Enclosure and proportions
Shade and Light
A dignified facility needs consistent shade during hot summer months. At night, shadows should be minimal and the route should be clear. Especially when a tree canopy is present, this is best achieved with more individual fixtures installed lower to the ground and at a lower light output. However, a fairly consistent light level can be achieved even with basic cobraheads, as long as there are enough to light the corridor fully.
Routes should be intuitive, easy, and not feel tedious to navigate. Having to make sharp, 90° turns or go out of your way feels awkward and makes you feel like your time and effort is wasted — even if the detour is relatively minor.
Enclosure and Proportions
It's a very uncomfortable experience to walk along a wide-open corridor with no walls or edge definition — and it's a common experience along suburban arterials, where you may have a wide road on one side and a wide-open parking lot on the other. You feel exposed and vulnerable. At the same time, overgrown sidewalks or ones that encroach on pedestrian space can feel claustrophobic and inconvenient. The right balance is needed.
Finally, engaging frontage is always more appealing than blank frontage. The extreme of this principle is obvious: Walking down a traditional main street is more pleasurable than walking through an industrial park. But even where land uses are similar, engagement of frontage can vary a lot: picture the difference between walking past front doors of houses in a traditional neighborhood, and walking past privacy fences and back yards in cul-de-sac suburban neighborhoods. The traditional neighborhood is more interesting and engaging to walk through.
When I was visiting downtown Northfield, I noted a new building along Water Street (MN-3), which had similar materials to the older downtown buildings on Division: windows, brick, [cultured] stone base. Yet the back was turned to the street, and the experience walking past was undignified.
A Pedestrian Cannot Live on Compliance Alone
Creating compliant sidewalks and trails is a high priority for agencies seeking to avoid litigation and serve pedestrians on the most basic level. Although that has some benefits, it isn't enough. Whether actively undermining walkability (like removing crosswalks to achieve ADA compliance) or simply not doing enough (adding a new curb ramp to an otherwise wheelchair-hostile sidewalk), we need to go much further.
To make walking and rolling a desirable, everyday activity, we need facilities that are compliant, safe and dignified. We have many examples in our communities of great pedestrian ways — but we have a long way to go to make it universal, and truly move the needle toward walking.
Historical Discussions: Jujutsu – A Git-compatible DVCS that is both simple and powerful (February 19, 2022: 568 points)
jj v0.8: A Git-compatible DVCS that is both simple and powerful (July 17, 2023: 4 points)
Jujube: An experimental VCS inspired by Mercurial and Git (December 18, 2020: 3 points)
Show HN: The Best of Git, Mercurial, and Pijul in One VCS? (January 04, 2022: 2 points)
Jujutsu: A Git-compatible DVCS that is both simple and powerful (June 17, 2023: 2 points)
Jujutsu: A Git-Compatible DVCS, Combining Features from Git, Mercurial, Darcs (July 03, 2023: 2 points)
Jujutsu DVCS (February 17, 2022: 1 points)
Design of a lock-free DVCS (rsync-/Dropbox-/NFS-safe) (January 13, 2022: 1 points)
559 points about 14 hours ago by lemper in 10000th position
This is not a Google product. It is an experimental version-control system (VCS). I (Martin von Zweigbergk) started it as a hobby project in late 2019. That said, it is now my full-time project at Google. My presentation from Git Merge 2022 has information about Google's plans. See the slides or the recording.
Jujutsu is a Git-compatible DVCS. It combines features from Git (data model, speed), Mercurial (anonymous branching, simple CLI free from 'the index', revsets, powerful history-rewriting), and Pijul/Darcs (first-class conflicts), with features not found in most of them (working-copy-as-a-commit, undo functionality, automatic rebase, safe replication via rsync, Dropbox, or distributed file systems).
The command-line tool is called jj for now because it's easy to type and easy to replace (rare in English). The project is called 'Jujutsu' because it matches 'jj'.
If you have any questions, please join us on Discord. The glossary may also be helpful.
Jujutsu has two backends. One of them is a Git backend (the other is a native one 1). This lets you use Jujutsu as an alternative interface to Git. The commits you create will look like regular Git commits. You can always switch back to Git. The Git support uses the libgit2 C library.
Almost all Jujutsu commands automatically commit the working copy. That means that commands never fail because the working copy is dirty (no 'error: Your local changes to the following files...'), and there is no need for git stash. You also get an automatic backup of the working copy whenever you run a command. Also, because the working copy is a commit, commands work the same way on the working-copy commit as on any other commit, so you can set the commit message before you're done with the changes.
With Jujutsu, the working copy plays a smaller role than with Git. Commands snapshot the working copy before they start, then they update the repo, and then the working copy is updated (if the working-copy commit was modified). Almost all commands (even checkout!) operate on the commits in the repo, leaving the common functionality of snapshotting and updating of the working copy to centralized code. For example, jj restore (similar to git restore) can restore from any commit and into any commit, and jj describe can set the commit message of any commit (defaults to the working-copy commit).
All operations you perform in the repo are recorded, along with a snapshot of the repo state after the operation. This means that you can easily revert to an earlier repo state, or to simply undo a particular operation (which does not necessarily have to be the most recent operation).
If an operation results in conflicts, information about those conflicts will be recorded in the commit(s). The operation will succeed. You can then resolve the conflicts later. One consequence of this design is that there's no need to continue interrupted operations. Instead, you get a single workflow for resolving conflicts, regardless of which command caused them. This design also lets Jujutsu rebase merge commits correctly (unlike both Git and Mercurial).
Basic conflict resolution:
Whenever you modify a commit, any descendants of the old commit will be rebased onto the new commit. Thanks to the conflict design described above, that can be done even if there are conflicts. Branches pointing to rebased commits will be updated. So will the working copy if it points to a rebased commit.
Besides the usual rebase command, there's jj describe for editing the description (commit message) of an arbitrary commit. There's also jj diffedit, which lets you edit the changes in a commit without checking it out. To split a commit into two, use jj split. You can even move part of the changes in a commit to any other commit using jj move.
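As a rough illustration of how these commands fit together, here is a minimal sketch of a session (the commands exist in jj around version 0.8, but the edits and commit message are made up, and exact flags may differ between versions):

jj new                                # start a fresh working-copy commit on top of the current change
# ...edit some files; jj snapshots the working copy automatically on the next command...
jj describe -m "Refactor the parser"  # set the working-copy commit's message at any time
jj split                              # interactively split the working-copy commit in two
jj log                                # inspect the resulting history
jj undo                               # roll back the previous operation if it wasn't what you wanted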
The tool is quite feature-complete, but some important features like (the equivalent of) git blame are not yet supported. There are also several performance bugs. It's also likely that workflows and setups different from what the core developers use are not well supported.
I (Martin von Zweigbergk) have almost exclusively used jj to develop the project itself since early January 2021. I haven't had to re-clone from source (I don't think I've even had to restore from backup).
There will be changes to workflows and backward-incompatible changes to the on-disk formats before version 1.0.0. Even the binary's name may change (i.e. jj). For any format changes, we'll try to implement transparent upgrades (as we've done with recent changes), or provide upgrade commands or scripts if requested.
See below for how to build from source. There are also pre-built binaries for Windows, Mac, or Linux (musl).
On most distributions, you'll need to build from source using cargo. First make sure that you have the libssl-dev, openssl, and pkg-config packages installed by running something like this:
sudo apt-get install libssl-dev openssl pkg-config
cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli
If you're on NixOS, you can use the flake for this repository.
For example, if you want to run jj loaded from the flake, use:
nix run 'github:martinvonz/jj'
You can also add this flake url to your system input flakes. Or you can install the flake to your user profile:
nix profile install 'github:martinvonz/jj'
If you use Homebrew (including Linuxbrew), you can run:
brew install jj
You can also install jj via MacPorts (as the jujutsu port):
sudo port install jujutsu
You may need to run some or all of these:
xcode-select --install
brew install openssl
brew install pkg-config
export PKG_CONFIG_PATH="$(brew --prefix)/opt/openssl@3/lib/pkgconfig"
cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli
cargo install --git https://github.com/martinvonz/jj.git --locked --bin jj jj-cli --features vendored-openssl
You may want to configure your name and email so commits are made in your name.
Create a file at ~/.jjconfig.toml and make it look something like this:
$ cat ~/.jjconfig.toml
[user]
name = 'Martin von Zweigbergk'
email = '[email protected]'
To set up command-line completion, source the output of jj util completion --bash/--zsh/--fish (called jj debug completion in jj <= 0.7.0). Exactly how to source it depends on your shell.
source <(jj util completion) # --bash is the default
Or, with jj <= 0.7.0:
source <(jj debug completion) # --bash is the default
autoload -U compinit
compinit
source <(jj util completion --zsh)
Or, with jj <= 0.7.0:
autoload -U compinit
compinit
source <(jj debug completion --zsh)
jj util completion --fish | source
Or, with jj <= 0.7.0:
jj debug completion --fish | source
source-bash $(jj util completion)
Or, with jj <= 0.7.0:
source-bash $(jj debug completion)
There are several tools trying to solve similar problems as Jujutsu. See related work for details.
At this time, there's practically no reason to use the native backend. The backend exists mainly to make sure that it's possible to eventually add functionality that cannot easily be added to the Git backend. ↩
Historical Discussions: Show HN: Khoj – Chat offline with your second brain using Llama 2 (July 30, 2023: 526 points)
554 points 2 days ago by 110 in 10000th position