Category Archives: Computer and Website

Artificial neurology

Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory have developed a new computational model of a neural circuit in the brain, which could shed light on the biological role of inhibitory neurons — neurons that keep other neurons from firing.

The model describes a neural circuit consisting of an array of input neurons and an equivalent number of output neurons. The circuit performs what neuroscientists call a “winner-take-all” operation, in which signals from multiple input neurons induce a signal in just one output neuron.

Using the tools of theoretical computer science, the researchers prove that, within the context of their model, a certain configuration of inhibitory neurons provides the most efficient means of enacting a winner-take-all operation. Because the model makes empirical predictions about the behavior of inhibitory neurons in the brain, it offers a good example of the way in which computational analysis could aid neuroscience.

The researchers will present their results this week at the conference on Innovations in Theoretical Computer Science. Nancy Lynch, the NEC Professor of Software Science and Engineering at MIT, is the senior author on the paper. She’s joined by Merav Parter, a postdoc in her group, and Cameron Musco, an MIT graduate student in electrical engineering and computer science.

For years, Lynch’s group has studied communication and resource allocation in ad hoc networks — networks whose members are continually leaving and rejoining. But recently, the team has begun using the tools of network analysis to investigate biological phenomena.

“There’s a close correspondence between the behavior of networks of computers or other devices like mobile phones and that of biological systems,” Lynch says. “We’re trying to find problems that can benefit from this distributed-computing perspective, focusing on algorithms for which we can prove mathematical properties.”

Artificial neurology

In recent years, artificial neural networks — computer models roughly based on the structure of the brain — have been responsible for some of the most rapid improvement in artificial-intelligence systems, from speech transcription to face recognition software.

An artificial neural network consists of “nodes” that, like individual neurons, have limited information-processing power but are densely interconnected. Data are fed into the first layer of nodes. If the data received by a given node meet some threshold criterion — for instance, if they exceed a particular value — the node “fires,” or sends signals along all of its outgoing connections.

Each of those outgoing connections, however, has an associated “weight,” which can augment or diminish a signal. Each node in the next layer of the network receives weighted signals from multiple nodes in the first layer; it adds them together, and again, if their sum exceeds some threshold, it fires. Its outgoing signals pass to the next layer, and so on.
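
To make the mechanics concrete, here is a minimal Python sketch of that weighted-sum-and-threshold rule. It is an illustrative toy, not the researchers’ model; the weights and thresholds below are made up.

```python
# Minimal sketch of threshold-based "firing" in one layer of a network.
# Illustrative toy only; weights and thresholds are invented.

def layer_outputs(inputs, weights, thresholds):
    """inputs: 0/1 firings from the previous layer.
    weights[j][i]: weight on the connection from input i to node j.
    thresholds[j]: firing threshold of node j.
    Returns a list of 0/1 firings for this layer."""
    outputs = []
    for node_weights, threshold in zip(weights, thresholds):
        total = sum(w * x for w, x in zip(node_weights, inputs))
        outputs.append(1 if total > threshold else 0)  # fire only above threshold
    return outputs

# Example: two nodes reading three inputs.
inputs = [1, 0, 1]
weights = [[0.6, -0.2, 0.5],   # node A
           [0.1,  0.9, -0.4]]  # node B
thresholds = [1.0, 0.5]
print(layer_outputs(inputs, weights, thresholds))  # [1, 0]
```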

In artificial-intelligence applications, a neural network is “trained” on sample data, constantly adjusting its weights and firing thresholds until the output of its final layer consistently represents the solution to some computational problem.

Biological plausibility

Lynch, Parter, and Musco made several modifications to this design to make it more biologically plausible. The first was the addition of inhibitory “neurons.” In a standard artificial neural network, the weights on the connections are usually either all positive or free to be either positive or negative. But in the brain, some neurons appear to play a purely inhibitory role, preventing other neurons from firing. The MIT researchers modeled those neurons as nodes whose connections have only negative weights.
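
Purely as an illustration of how negative-weight, inhibitory feedback can enforce a winner-take-all outcome, consider the toy loop below. The researchers’ circuit is analyzed formally and is not this code; the step size and inputs here are arbitrary choices.

```python
# Toy winner-take-all via a single inhibitory node. Illustrative only:
# rising inhibition silences all but the strongest output.

def winner_take_all(input_signals, step=0.05):
    """Raise the inhibitory node's output until only one output neuron
    still exceeds it. Assumes a unique maximum among the inputs."""
    inhibition = 0.0
    active = [s > inhibition for s in input_signals]
    while sum(active) > 1:
        inhibition += step            # negative-weight feedback grows
        active = [s > inhibition for s in input_signals]
    return active

print(winner_take_all([0.2, 0.9, 0.6, 0.4]))  # [False, True, False, False]
```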

Technology for doing good

Garrett Parrish grew up singing and dancing as a theater kid, influenced by his older siblings, one of whom is an actor and the other a stage manager. But by the time he reached high school, Parrish had branched out significantly, drumming in his school’s jazz ensemble and helping to build a state-championship-winning robot.

MIT was the first place Parrish felt he was able to work meaningfully at the nexus of art and technology. “Being a part of the MIT culture, and having the resources that are available here, are really what opened my mind to that intersection,” the MIT senior says. “That’s always been my goal from the beginning: to be as emotionally educated as I am technically educated.”

Parrish, who is majoring in mechanical engineering, has worked on a dizzying array of projects, from app building to assistant directing to a robotic opera. Driving his work is an interest in shaping technology to serve others.

“The whole goal of my life is to fix all the people problems. I sincerely think that the biggest problems we have are how we deal with each other, and how we treat each other. [We need to be] promoting empathy and understanding, and technology is an enormous power to influence that in a good way,” he says.

Parrish began his academic career at Harvard University and transferred to MIT after his first year. Frustrated at how little power individuals often have in society, Parrish joined DoneGood co-founders Scott Jacobsen and Cullen Schwartz, and became the startup’s chief technology officer his sophomore year. “We kind of distilled our frustrations about the way things are into, ‘How do you actionably use people’s existing power to create real change?’” Parrish says.

The DoneGood app and Chrome extension help consumers find businesses that share their priorities and values, such as paying a living wage or using organic ingredients. The extension monitors a user’s online shopping and recommends alternatives. The mobile app offers a directory of local options and national brands that users can filter according to their values. “The two things that everyday people have at their disposal to create change are how they spend their time and how they spend their money. We direct money away from brands that aren’t sustainable, therefore creating an actionable incentive for them to become more sustainable,” Parrish says.

Finding and linking related data scattered across digital files

The age of big data has seen a host of new techniques for analyzing large data sets. But before any of those techniques can be applied, the target data has to be aggregated, organized, and cleaned up.

That turns out to be a shockingly time-consuming task. In a 2016 survey, 80 data scientists told the company CrowdFlower that, on average, they spent 80 percent of their time collecting and organizing data and only 20 percent analyzing it.

An international team of computer scientists hopes to change that, with a new system called Data Civilizer, which automatically finds connections among many different data tables and allows users to perform database-style queries across all of them. The results of the queries can then be saved as new, orderly data sets that may draw information from dozens or even thousands of different tables.

“Modern organizations have many thousands of data sets spread across files, spreadsheets, databases, data lakes, and other software systems,” says Sam Madden, an MIT professor of electrical engineering and computer science and faculty director of MIT’s bigdata@CSAIL initiative. “Civilizer helps analysts in these organizations quickly find data sets that contain information that is relevant to them and, more importantly, combine related data sets together to create new, unified data sets that consolidate data of interest for some analysis.”

The researchers presented their system last week at the Conference on Innovative Data Systems Research. The lead authors on the paper are Dong Deng and Raul Castro Fernandez, both postdocs at MIT’s Computer Science and Artificial Intelligence Laboratory; Madden is one of the senior authors. They’re joined by six other researchers from Technical University of Berlin, Nanyang Technological University, the University of Waterloo, and the Qatar Computing Research Institute. Although he’s not a co-author, MIT adjunct professor of electrical engineering and computer science Michael Stonebraker, who in 2014 won the Turing Award — the highest honor in computer science — contributed to the work as well.

Pairs and permutations

Data Civilizer assumes that the data it’s consolidating is arranged in tables. As Madden explains, in the database community, there’s a sizable literature on automatically converting data to tabular form, so that wasn’t the focus of the new research. Similarly, while the prototype of the system can extract tabular data from several different types of files, getting it to work with every conceivable spreadsheet or database program was not the researchers’ immediate priority. “That part is engineering,” Madden says.

The system begins by analyzing every column of every table at its disposal. First, it produces a statistical summary of the data in each column. For numerical data, that might include a distribution of the frequency with which different values occur; the range of values; and the “cardinality” of the values, or the number of different values the column contains. For textual data, a summary would include a list of the most frequently occurring words in the column and the number of different words. Data Civilizer also keeps a master index of every word occurring in every table and the tables that contain it.
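
As a rough picture of what such per-column summaries might contain, here is a simplified Python sketch. It is not Data Civilizer’s code, and the choice of statistics is illustrative.

```python
# Simplified sketch of per-column profiling in the spirit of Data Civilizer
# (not the system's actual implementation).
from collections import Counter

def profile_column(values):
    """Build a small statistical summary for one table column."""
    numeric = []
    for v in values:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass
    if numeric and len(numeric) == len(values):   # numerical column
        return {
            "type": "numeric",
            "min": min(numeric),
            "max": max(numeric),
            "cardinality": len(set(numeric)),     # number of distinct values
            "histogram": Counter(numeric),        # frequency of each value
        }
    words = [w for v in values for w in str(v).split()]
    return {
        "type": "text",
        "top_words": Counter(words).most_common(5),
        "cardinality": len(set(words)),
    }

print(profile_column(["12", "7", "12", "3"]))
print(profile_column(["New York", "Boston", "New Haven"]))
```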

Then the system compares all of the column summaries against each other, identifying pairs of columns that appear to have commonalities — similar data ranges, similar sets of words, and the like. It assigns every pair of columns a similarity score and, on that basis, produces a map, rather like a network diagram, that traces out the connections between individual columns and between the tables that contain them.
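
Continuing the sketch above, pairwise scoring and the resulting “map” might look like the following. The similarity measure here is deliberately crude and only stands in for whatever signals the real system combines.

```python
# Sketch of scoring column pairs and building a similarity "map",
# using the column summaries sketched above (illustrative only).
from itertools import combinations

def similarity(a, b):
    """Crude similarity: overlap of numeric ranges or of frequent words."""
    if a["type"] != b["type"]:
        return 0.0
    if a["type"] == "numeric":
        lo, hi = max(a["min"], b["min"]), min(a["max"], b["max"])
        span = max(a["max"], b["max"]) - min(a["min"], b["min"])
        return max(0.0, hi - lo) / span if span else 1.0
    words_a = {w for w, _ in a["top_words"]}
    words_b = {w for w, _ in b["top_words"]}
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def build_map(column_summaries, threshold=0.5):
    """Return edges (col_i, col_j, score) linking columns that look related."""
    edges = []
    for (name_a, a), (name_b, b) in combinations(column_summaries.items(), 2):
        score = similarity(a, b)
        if score >= threshold:
            edges.append((name_a, name_b, score))
    return edges

summaries = {
    "flights.duration": profile_column(["2.5", "3.0", "2.5"]),
    "legs.hours": profile_column(["2.4", "3.1"]),
}
print(build_map(summaries, threshold=0.3))  # one edge: the ranges overlap heavily
```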

Database caching in server farms increases speed and reliability

Today, loading a web page on a big website usually involves a database query — to retrieve the latest contributions to a discussion you’re participating in, a list of news stories related to the one you’re reading, links targeted to your geographic location, or the like.

But database queries are time consuming, so many websites store — or “cache” — the results of common queries on web servers for faster delivery.

If a site user changes a value in the database, however, the cache needs to be updated, too. The complex task of analyzing a website’s code to identify which operations necessitate updates to which cached values generally falls to the web programmer. Missing one such operation can result in an unusable site.

This week, at the Association for Computing Machinery’s Symposium on Principles of Programming Languages, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory presented a new system that automatically handles caching of database queries for web applications written in the web-programming language Ur/Web.

Although a website may be fielding many requests in parallel — sending different users different cached data, or even data cached on different servers — the system guarantees that, to the user, every transaction will look exactly as it would if requests were handled in sequence. So a user won’t, for instance, click on a link showing that tickets to an event are available, only to find that they’ve been snatched up when it comes time to pay.

In experiments involving two websites that had been built using Ur/Web, the new system’s automatic caching delivered speedups of twofold and 30-fold, respectively.

“Most very popular websites backed by databases don’t actually ask the database over and over again for each request,” says Adam Chlipala, an associate professor of electrical engineering and computer science at MIT and senior author on the conference paper. “They notice that, ‘Oh, I seem to have asked this question quite recently, and I saved the result, so I’ll just pull that out of memory.’”

“But the tricky part here is that you have to realize when you make changes to the database that some of your saved answers are no longer necessarily correct, and you have to do what’s called ‘invalidating’ them. And in the mainstream way of implementing this, the programmer needs to manually add invalidation logic. For every line of code that changes the database, the programmer has to sit down and think, ‘Okay, for every other line of code that reads the database and saves the result in a cache, which ones of those are going to be broken by the change I just made?’”
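
As a sketch of the manual pattern Chlipala describes, written in Python with SQLite rather than Ur/Web and with made-up table and query names, every write path has to remember to invalidate the cached reads it could affect:

```python
# Hand-rolled query cache with manual invalidation: the error-prone pattern
# that the Ur/Web compiler automates away. Table and query names are hypothetical.
import sqlite3

cache = {}

def cached_query(db, key, sql, params=()):
    """Return a cached result if present; otherwise run the query and cache it."""
    if key not in cache:
        cache[key] = db.execute(sql, params).fetchall()
    return cache[key]

def latest_comments(db, thread_id):
    return cached_query(db, ("comments", thread_id),
                        "SELECT body FROM comments WHERE thread_id = ?", (thread_id,))

def add_comment(db, thread_id, body):
    db.execute("INSERT INTO comments (thread_id, body) VALUES (?, ?)", (thread_id, body))
    # The programmer must remember this line for EVERY write path;
    # forgetting it serves stale data to readers of this thread.
    cache.pop(("comments", thread_id), None)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE comments (thread_id INTEGER, body TEXT)")
add_comment(db, 1, "first!")
print(latest_comments(db, 1))
```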

Chlipala is joined on the paper by Ziv Scully, a graduate student in computer science at Carnegie Mellon University, who worked in Chlipala’s lab as an MIT undergraduate.

Exhaustive search

Ur/Web, which Chlipala invented, lets web developers completely specify their sites’ functionality using just one programming language. The Ur/Web compiler then automatically generates all the different types of code required to power a website — HTML, JavaScript, SQL database queries, and cascading style sheets — while providing certain performance and security guarantees. Chlipala and Scully’s new system is a modification of the compiler, so Ur/Web users can simply recompile their existing code to get all the benefits of database caching. The language itself remains unchanged.

Making speech recognition ubiquitous in electronics

The butt of jokes as little as 10 years ago, automatic speech recognition is now on the verge of becoming people’s chief means of interacting with their principal computing devices.

In anticipation of the age of voice-controlled electronics, MIT researchers have built a low-power chip specialized for automatic speech recognition. Whereas a cellphone running speech-recognition software might require about 1 watt of power, the new chip requires between 0.2 and 10 milliwatts, depending on the number of words it has to recognize.

In a real-world application, that probably translates to a power savings of 90 to 99 percent, which could make voice control practical for relatively simple electronic devices. That includes power-constrained devices that have to harvest energy from their environments or go months between battery charges. Such devices form the technological backbone of what’s called the “internet of things,” or IoT, which refers to the idea that vehicles, appliances, civil-engineering structures, manufacturing equipment, and even livestock will soon have sensors that report information directly to networked servers, aiding with maintenance and the coordination of tasks.

“Speech input will become a natural interface for many wearable applications and intelligent devices,” says Anantha Chandrakasan, the Vannevar Bush Professor of Electrical Engineering and Computer Science at MIT, whose group developed the new chip. “The miniaturization of these devices will require a different interface than touch or keyboard. It will be critical to embed the speech functionality locally to save system energy consumption compared to performing this operation in the cloud.”

“I don’t think that we really developed this technology for a particular application,” adds Michael Price, who led the design of the chip as an MIT graduate student in electrical engineering and computer science and now works for chipmaker Analog Devices. “We have tried to put the infrastructure in place to provide better trade-offs to a system designer than they would have had with previous technology, whether it was software or hardware acceleration.”

Price, Chandrakasan, and Jim Glass, a senior research scientist at MIT’s Computer Science and Artificial Intelligence Laboratory, described the new chip in a paper Price presented last week at the International Solid-State Circuits Conference.

The sleeper wakes

Today, the best-performing speech recognizers are, like many other state-of-the-art artificial-intelligence systems, based on neural networks, virtual networks of simple information processors roughly modeled on the human brain. Much of the new chip’s circuitry is concerned with implementing speech-recognition networks as efficiently as possible.

But even the most power-efficient speech recognition system would quickly drain a device’s battery if it ran without interruption. So the chip also includes a simpler “voice activity detection” circuit that monitors ambient noise to determine whether it might be speech. If the answer is yes, the chip fires up the larger, more complex speech-recognition circuit.

In fact, for experimental purposes, the researchers’ chip had three different voice-activity-detection circuits, with different degrees of complexity and, consequently, different power demands. Which circuit is most power efficient depends on context, but in tests simulating a wide range of conditions, the most complex of the three circuits led to the greatest power savings for the system as a whole. Even though it consumed almost three times as much power as the simplest circuit, it generated far fewer false positives; the simpler circuits often chewed through their energy savings by spuriously activating the rest of the chip.
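
A back-of-the-envelope model shows why the most complex detector can still win overall: false positives multiply the duty cycle of the expensive recognition circuit. The numbers below are hypothetical, not measurements from the chip.

```python
# Hypothetical numbers illustrating the voice-activity-detection tradeoff;
# NOT measurements from the MIT chip.

def average_power(vad_mw, recognizer_mw, speech_fraction, false_positive_rate):
    """Average power when the recognizer runs on real speech plus false alarms."""
    recognizer_duty = speech_fraction + (1 - speech_fraction) * false_positive_rate
    return vad_mw + recognizer_mw * recognizer_duty

# Simple detector: cheap but trigger-happy.
print(average_power(vad_mw=0.02, recognizer_mw=5.0,
                    speech_fraction=0.05, false_positive_rate=0.20))  # ~1.22 mW
# Complex detector: about 3x the VAD power, far fewer false alarms.
print(average_power(vad_mw=0.06, recognizer_mw=5.0,
                    speech_fraction=0.05, false_positive_rate=0.01))  # ~0.36 mW
```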

A typical neural network consists of thousands of processing “nodes” capable of only simple computations but densely connected to each other. In the type of network commonly used for voice recognition, the nodes are arranged into layers. Voice data are fed into the bottom layer of the network, whose nodes process and pass them to the nodes of the next layer, whose nodes process and pass them to the next layer, and so on. The output of the top layer indicates the probability that the voice data represents a particular speech sound.

A voice-recognition network is too big to fit in a chip’s onboard memory, which is a problem because going off-chip for data is much more energy intensive than retrieving it from local stores. So the MIT researchers’ design concentrates on minimizing the amount of data that the chip has to retrieve from off-chip memory.
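
Some rough arithmetic, with an assumed rather than reported network size, shows why: even a modest acoustic model has millions of weights, more than typical on-chip memory can hold.

```python
# Rough arithmetic with an assumed (not reported) network size, showing
# why the weights overflow on-chip memory and off-chip traffic matters.
layers = [440, 1024, 1024, 1024, 1024, 2000]      # hypothetical layer widths
weights = sum(a * b for a, b in zip(layers, layers[1:]))
bytes_per_weight = 2                               # e.g., 16-bit fixed point
print(f"{weights:,} weights -> {weights * bytes_per_weight / 1e6:.1f} MB")
# Several megabytes of weights versus on-chip memories usually measured in
# hundreds of kilobytes: most of the model would have to be fetched from
# external memory for every frame unless the design minimizes that traffic.
```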

Adding human intuition to planning algorithms

Every other year, the International Conference on Automated Planning and Scheduling hosts a competition in which computer systems designed by conference participants try to find the best solution to a planning problem, such as scheduling flights or coordinating tasks for teams of autonomous satellites.

On all but the most straightforward problems, however, even the best planning algorithms still aren’t as effective as human beings with a particular aptitude for problem-solving — such as MIT students.

Researchers from MIT’s Computer Science and Artificial Intelligence Laboratory are trying to improve automated planners by giving them the benefit of human intuition. By encoding the strategies of high-performing human planners in a machine-readable form, they were able to improve the performance of competition-winning planning algorithms by 10 to 15 percent on a challenging set of problems.

The researchers are presenting their results this week at the Association for the Advancement of Artificial Intelligence’s annual conference.

“In the lab, in other investigations, we’ve seen that for things like planning and scheduling and optimization, there’s usually a small set of people who are truly outstanding at it,” says Julie Shah, an assistant professor of aeronautics and astronautics at MIT. “Can we take the insights and the high-level strategies from the few people who are truly excellent at it and allow a machine to make use of that to be better at problem-solving than the vast majority of the population?”

The first author on the conference paper is Joseph Kim, a graduate student in aeronautics and astronautics. He’s joined by Shah and Christopher Banks, an undergraduate at Norfolk State University who was a research intern in Shah’s lab in the summer of 2016.

The human factor

Algorithms entered in the automated-planning competition — called the International Planning Competition, or IPC — are given related problems with different degrees of difficulty. The easiest problems require satisfaction of a few rigid constraints: For instance, given a certain number of airports, a certain number of planes, and a certain number of people at each airport with particular destinations, is it possible to plan planes’ flight routes such that all passengers reach their destinations but no plane ever flies empty?

A more complex class of problems — numerical problems — adds some flexible numerical parameters: Can you find a set of flight plans that meets the constraints of the original problem but also minimizes planes’ flight time and fuel consumption?

Finally, the most complex problems — temporal problems — add temporal constraints to the numerical problems: Can you minimize flight time and fuel consumption while also ensuring that planes arrive and depart at specific times?

For each problem, an algorithm has a half-hour to generate a plan. The quality of the plans is measured according to some “cost function,” such as an equation that combines total flight time and total fuel consumption.
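
For concreteness, a cost function of that kind might be written as follows; the weights and the plan format are invented for illustration, not the competition’s actual scoring code.

```python
# Illustrative cost function for comparing candidate flight plans;
# the weighting and plan format are invented, not the IPC's scoring.

def plan_cost(flight_legs, time_weight=1.0, fuel_weight=0.5):
    """flight_legs: list of dicts with 'hours' and 'fuel_kg' per leg.
    Lower cost is better."""
    total_hours = sum(leg["hours"] for leg in flight_legs)
    total_fuel = sum(leg["fuel_kg"] for leg in flight_legs)
    return time_weight * total_hours + fuel_weight * total_fuel

plan_a = [{"hours": 3.0, "fuel_kg": 900}, {"hours": 2.5, "fuel_kg": 700}]
plan_b = [{"hours": 4.0, "fuel_kg": 650}, {"hours": 2.0, "fuel_kg": 600}]
print(plan_cost(plan_a), plan_cost(plan_b))  # the planner keeps the cheaper plan
```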

Cancer treatment with machine learning

Regina Barzilay is working with MIT students and medical doctors in an ambitious bid to revolutionize cancer care. She is relying on a tool largely unrecognized in the oncology world but deeply familiar to hers: machine learning.

Barzilay, the Delta Electronics Professor of Electrical Engineering and Computer Science, was diagnosed with breast cancer in 2014. She soon learned that good data about the disease is hard to find. “You are desperate for information — for data,” she says now. “Should I use this drug or that? Is that treatment best? What are the odds of recurrence? Without reliable empirical evidence, your treatment choices become your own best guesses.”

Across different areas of cancer care — be it diagnosis, treatment, or prevention — the data protocol is similar. Doctors start the process by mapping patient information into structured data by hand, and then run basic statistical analyses to identify correlations. The approach is primitive compared with what is possible in computer science today, Barzilay says.

These kinds of delays and lapses (which are not limited to cancer treatment) can really hamper scientific advances, Barzilay says. For example, 1.7 million people are diagnosed with cancer in the U.S. every year, but only about 3 percent enroll in clinical trials, according to the American Society of Clinical Oncology. Current research practice relies exclusively on data drawn from this tiny fraction of patients. “We need treatment insights from the other 97 percent receiving cancer care,” she says.

To be clear: Barzilay isn’t looking to up-end the way current clinical research is conducted. She just believes that doctors and biologists — and patients — could benefit if she and other data scientists lent them a helping hand. Innovation is needed and the tools are there to be used.

Barzilay has struck up new research collaborations, drawn in MIT students, launched projects with doctors at Massachusetts General Hospital, and begun empowering cancer treatment with the machine learning insight that has already transformed so many areas of modern life.

Machine learning, real people

At the MIT Stata Center, Barzilay, a lively presence, interrupts herself mid-sentence, leaps up from her office couch, and runs off to check on her students.

She returns with a laugh. An undergraduate group is assisting Barzilay with a federal grant application, and they’re down to the wire on the submission deadline. The funds, she says, would enable her to pay the students for their time. Like Barzilay, they are doing much of this research for free, because they believe in its power to do good. “In all my years at MIT I have never seen students get so excited about the research and volunteer so much of their time,” Barzilay says.

At the center of Barzilay’s project is machine learning, or algorithms that learn from data and find insights without being explicitly programmed where to look for them. This tool, just like the ones Amazon, Netflix, and other sites use to track and predict your preferences as a consumer, can make short work of gaining insight into massive quantities of data.

Applying it to patient data can offer tremendous assistance to people who, as Barzilay knows well, really need the help. Today, she says, a woman cannot retrieve answers to simple questions such as: What was the disease progression for women in my age range with the same tumor characteristics?
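
That question maps naturally onto a cohort query over structured records, if such records existed in machine-readable form. The sketch below is hypothetical: the fields and toy records are invented to show the shape of the query, not real patient data.

```python
# Hypothetical sketch of the kind of cohort query a patient or doctor might
# want to run. All field names and records are invented for illustration.
records = [
    {"age": 44, "tumor_grade": 2, "er_positive": True,  "recurrence_years": None},
    {"age": 47, "tumor_grade": 2, "er_positive": True,  "recurrence_years": 3.5},
    {"age": 62, "tumor_grade": 3, "er_positive": False, "recurrence_years": 1.2},
]

def matching_cohort(records, age_range, tumor_grade, er_positive):
    """Return records for patients with similar age and tumor characteristics."""
    return [r for r in records
            if age_range[0] <= r["age"] <= age_range[1]
            and r["tumor_grade"] == tumor_grade
            and r["er_positive"] == er_positive]

cohort = matching_cohort(records, age_range=(40, 50), tumor_grade=2, er_positive=True)
recurred = [r for r in cohort if r["recurrence_years"] is not None]
print(f"{len(recurred)}/{len(cohort)} similar patients had a recorded recurrence")
```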

Encouraging student innovators and sharing his latest projects

Much as Athena, MIT’s campus computing environment and a pre-cloud solution for letting files and applications follow the user wherever they go, Dropbox’s Drew Houston ’05 brings his alma mater everywhere he goes.

After earning his bachelor’s in electrical engineering and computer science, Houston grew frustrated with the clunky need to carry portable USB drives, which drove him to partner with a fellow MIT student, Arash Ferdowsi, to develop an online solution — what would become Dropbox.

Dropbox, which now has over 500 million users, continues to adapt. The file-sharing company recently crossed the $1-billion threshold in annual subscription revenue. It’s expanding its business model by selling at the corporate level — employees at companies with Dropbox can use, essentially, one big box.

True to his company’s goal of using technology to bring people (and files) together, Houston is keen to share his own wisdom with others, especially those at MIT. Houston gave the 2013 Commencement address, saying “The hardest-working people don’t work hard because they’re disciplined. They work hard because working on an exciting problem is fun.”

He has also been a guest speaker in “The Founder’s Journey,” a course designed to demystify entrepreneurship, and at the MIT Enterprise Forum Cambridge; a frequent and active participant in StartMIT, a workshop on entrepreneurship held over Independent Activities Period (IAP); and a staple of the MIT Better World tour, an alumni engagement event happening at cities all over the globe.

Houston stopped by MIT in February for the latest iteration of StartMIT to give a “fireside chat” about the early days of Dropbox, when it was run with a few of his friends from Course 6, and discussed the current challenges of the company: managing scale. His firm now employs over 1,000 people.

“With thousands of employees in the company, you need coordination, and it can become total chaos. Ultimately all the vectors need to point in the same direction,” he told students. It turns out that Dropbox’s new online collaboration suite, Paper, could play a key role in getting those vectors to line up, offering lessons to both those new to start-ups and more seasoned entrepreneurs.

Top University Rankings

MIT has been honored with 12 No. 1 subject rankings in the QS World University Rankings for 2017.

MIT received a No. 1 ranking in the following QS subject areas: Architecture/Built Environment; Linguistics; Computer Science and Information Systems; Chemical Engineering; Civil and Structural Engineering; Electrical and Electronic Engineering; Mechanical, Aeronautical and Manufacturing Engineering; Chemistry; Materials Science; Mathematics; Physics and Astronomy; and Economics.

Additional high-ranking MIT subjects include: Art and Design (No. 2), Biological Sciences (No. 2), Earth and Marine Sciences (No. 5), Environmental Sciences (No. 3), Accounting and Finance (No. 2), Business and Management Studies (No. 4), and Statistics and Operational Research (No. 2).

Quacquarelli Symonds Limited subject rankings, published annually, are designed to help prospective students find the leading schools in their field of interest. Rankings are based on research quality and accomplishments, academic reputation, and graduate employment.

MIT has been ranked as the No. 1 university in the world by QS World University Rankings for five straight years.

Summaries of online discussions

From Reddit to Quora, discussion forums can be equal parts informative and daunting. We’ve all fallen down rabbit holes of lengthy threads that are impossible to sift through. Comments can be redundant, off-topic or even inaccurate, but all that content is ultimately still there for us to try and untangle.

Sick of the clutter, a team from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) has developed “Wikum,” a system that helps users construct concise, expandable summaries that make it easier to navigate unruly discussions.

“Right now, every forum member has to go through the same mental labor of squeezing out key points from long threads,” says MIT Professor David Karger, who was senior author on a new paper about Wikum. “If every reader could contribute that mental labor back into the discussion, it would save that time and energy for every future reader, making the conversation more useful for everyone.”

The team tested Wikum against a Google document with tracked changes that aimed to mimic the collaborative editing structure of a wiki. They found that Wikum users completed reading much faster and recalled discussion points more accurately, and that editors made edits 40 percent faster.

Karger wrote the new paper with PhD students Lea Verou and Amy Zhang, who was lead author. The team presented the work last week at ACM’s Conference on Computer-Supported Cooperative Work and Social Computing in Portland, Oregon.

How it works

While wikis can be a good way for people to summarize discussions, they aren’t ideal, because contributors can’t easily see which parts of the discussion have already been summarized. This makes it difficult to break summarizing down into small steps that individual users can complete, because it requires that they spend a lot of energy figuring out what needs to happen next. Meanwhile, forums like Reddit let users “upvote” the best answers or comments, but they lack contextual summaries that give readers detailed overviews of discussions.

Wikum bridges the gap between forums and wikis by letting users work in small doses to refine a discussion’s main points, and giving readers an overall “map” of the conversation.

Readers can import discussions from places such as Disqus, a commenting platform used by publishers like The Atlantic. Then, once users create a summary, readers can examine the text and decide whether they want to expand the topic to read more. The system uses color-coded “summary trees” that show topics at different levels of depth and lets readers jump between original comments and summaries.
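
A summary tree of that sort can be pictured as a small recursive structure, sketched below. This is an illustration of the idea, not Wikum’s actual data model.

```python
# Illustrative model of Wikum-style "summary trees": a summary node covers a
# subtree of comments and can be expanded to show them. Sketch only.
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str                       # a comment, or an editor-written summary
    is_summary: bool = False
    children: list = field(default_factory=list)   # comments/summaries it covers

def render(node, depth=0, expand_summaries=False):
    """Print the tree; collapsed summaries hide the comments beneath them."""
    print("  " * depth + ("[summary] " if node.is_summary else "") + node.text)
    if node.is_summary and not expand_summaries:
        return
    for child in node.children:
        render(child, depth + 1, expand_summaries)

thread = Node("Most posters favor option A; two dissenters cite cost.", True, [
    Node("Option A is cleaner to implement."),
    Node("Agreed, and it matches the existing API."),
    Node("Cost is my worry.", False, [Node("Same here, the license fee is steep.")]),
])

render(thread)                          # just the one-line summary
render(thread, expand_summaries=True)   # drill down to the original comments
```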

“Our aim is to harness collaborative summarization to save th